AI Engineering

AI Agent Tool Calling: A Deep Dive Into How I Build It

February 2025 / 8 min read

Tool calling (function calling) is what transforms an LLM from a text generator into an AI agent that takes real actions. After building travel booking agents, ESG scoring agents, and medical retrieval systems, here’s my complete framework for designing tool-calling agents that work reliably.

01.

Tools are just functions with descriptions

In LangChain, a tool is a Python function decorated with @tool, with a docstring that the LLM reads to decide when to call it. The quality of that docstring determines whether the agent calls the right tool at the right time. I spend as much time writing tool descriptions as I do writing the functions themselves.
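To make the "function plus description" idea concrete, here is a minimal standard-library sketch. It is not LangChain's implementation: `simple_tool` and `ToolSpec` are hypothetical names invented for illustration, and the only behaviour they model is that the docstring becomes the text the model reads.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolSpec:
    """What the model is shown for one tool: a name and a description."""
    name: str
    description: str
    func: Callable


def simple_tool(func: Callable) -> ToolSpec:
    """Hypothetical stand-in for a tool decorator (not LangChain's @tool)."""
    description = inspect.getdoc(func)
    if not description:
        # Without a docstring the model has nothing to base its choice on.
        raise ValueError(f"{func.__name__} needs a docstring")
    return ToolSpec(name=func.__name__, description=description, func=func)


@simple_tool
def search_flights(origin: str, destination: str, date: str) -> list:
    """Search one-way flights between two airports on a given date (YYYY-MM-DD)."""
    return []  # placeholder body; a real tool would call a flight API


print(search_flights.name)
print(search_flights.description)
```

Rejecting undocumented functions at decoration time is one way to enforce the habit of writing the description alongside the function.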

02.

Tool design principles that matter

Each tool should do exactly one thing. Tools that try to be flexible confuse the LLM about when to use them. Return structured Pydantic objects, never raw strings. Include error information in the return type so the agent can handle failures gracefully without crashing the workflow.
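A rough sketch of a single-purpose tool that reports failure through its return type might look like this. Dataclasses stand in for Pydantic models so the example runs on the standard library alone; the `Hotel` and `HotelSearchResult` types, the `ok`/`error` fields, and the sample catalogue are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hotel:
    name: str
    price_per_night: float


@dataclass
class HotelSearchResult:
    """Structured result: either a list of hotels or an error message."""
    ok: bool
    hotels: list = field(default_factory=list)
    error: Optional[str] = None


def search_hotels(city: str, max_price: float) -> HotelSearchResult:
    """Search hotels in one city under a nightly price cap."""
    if max_price <= 0:
        # Report the problem in the return value instead of raising.
        return HotelSearchResult(ok=False, error="max_price must be positive")
    # Made-up sample data standing in for a real hotel API.
    catalogue = {"Lisbon": [Hotel("Hotel A", 90.0), Hotel("Hotel B", 150.0)]}
    matches = [h for h in catalogue.get(city, []) if h.price_per_night <= max_price]
    return HotelSearchResult(ok=True, hotels=matches)


result = search_hotels("Lisbon", max_price=100.0)
print(result.ok, [h.name for h in result.hotels])
```

Because the error travels inside the result, the caller can branch on `result.ok` rather than wrapping every call in its own exception handling.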

03.

The Travel AI case study

The travel assistant had 6 tools: search_flights, search_hotels, check_availability, create_booking, get_booking_status, cancel_booking. Each had a crisp one-sentence description. The LangGraph supervisor routed user intents to the correct tool sequence. This architecture handled the full booking lifecycle autonomously.
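The routing idea can be sketched as a plain lookup table. This is only an illustration, not the LangGraph supervisor itself: the six tool names come from the list above, but the description wording, the intent names, and the tool sequences are hypothetical.

```python
# One-sentence descriptions; the wording here is illustrative, not the original.
TOOL_DESCRIPTIONS = {
    "search_flights": "Find flights between two cities on a given date.",
    "search_hotels": "Find hotels in a city for a date range.",
    "check_availability": "Check whether a specific flight or hotel can still be booked.",
    "create_booking": "Create a booking for an available flight or hotel.",
    "get_booking_status": "Look up the current status of an existing booking.",
    "cancel_booking": "Cancel an existing booking by its ID.",
}

# Hypothetical mapping from a classified user intent to an ordered tool sequence.
ROUTES = {
    "book_flight": ["search_flights", "check_availability", "create_booking"],
    "book_hotel": ["search_hotels", "check_availability", "create_booking"],
    "check_status": ["get_booking_status"],
    "cancel": ["get_booking_status", "cancel_booking"],
}


def plan_for(intent: str) -> list:
    """Return the tool sequence for an intent, or an empty plan if unknown."""
    sequence = ROUTES.get(intent, [])
    # Guard against routes that mention a tool that was never registered.
    unknown = [name for name in sequence if name not in TOOL_DESCRIPTIONS]
    if unknown:
        raise KeyError(f"route references unregistered tools: {unknown}")
    return list(sequence)


print(plan_for("cancel"))
```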

04.

Handling tool failures and retries

LLMs will sometimes call tools with invalid parameters. I always wrap tool execution in try/except, return structured error responses, and configure the agent with a retry limit. LangGraph’s conditional edges make it easy to route to an error handler node when a tool fails repeatedly.
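The pattern described above can be sketched without any framework. Everything named here is an assumption for illustration: `run_tool`, `ToolOutcome`, the `repair` callback (a stand-in for the model revising its arguments after seeing the error), and the `next_node` function, which only mimics the kind of decision a conditional edge would make.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolOutcome:
    ok: bool
    value: Any = None
    error: Optional[str] = None
    attempts: int = 0


def run_tool(func: Callable[..., Any], args: dict, max_attempts: int = 3,
             repair: Optional[Callable[[dict, str], dict]] = None) -> ToolOutcome:
    """Run a tool with a retry limit and return a structured outcome."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return ToolOutcome(ok=True, value=func(**args), attempts=attempt)
        except Exception as exc:  # broad on purpose: nothing should escape
            last_error = f"{type(exc).__name__}: {exc}"
            if repair is not None:
                # Stand-in for feeding the error back to the model.
                args = repair(args, last_error)
    return ToolOutcome(ok=False, error=last_error, attempts=max_attempts)


def next_node(outcome: ToolOutcome) -> str:
    """Routing decision: carry on, or hand over to an error handler."""
    return "continue" if outcome.ok else "error_handler"


def cancel_booking(booking_id: str) -> str:
    """Cancel an existing booking by its ID (toy implementation)."""
    if not booking_id.startswith("BK-"):
        raise ValueError("booking_id must look like BK-123")
    return f"{booking_id} cancelled"


outcome = run_tool(cancel_booking, {"booking_id": "123"})
print(outcome.ok, outcome.error, next_node(outcome))
```

Without a `repair` callback the retries reuse the same arguments, so only transient failures can recover; invalid parameters fail on every attempt and end up at the error handler.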

05.

Testing tool-calling agents before production

I test each tool in isolation first with unit tests. Then I test the full agent against a fixed set of user scenarios stored as LangSmith datasets. The metric I track is tool call accuracy: the percentage of scenarios in which the agent calls the right tool with valid parameters. Anything under 90% means the tool description needs rewriting.
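As a sketch of how that metric could be computed once scenario results have been collected, the snippet below uses plain Python rather than the LangSmith API. The `ScenarioResult` record and its field names are hypothetical, and the sample results are invented to exercise the function.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    expected_tool: str
    called_tool: str
    params_valid: bool


def tool_call_accuracy(results: list) -> float:
    """Fraction of scenarios with the right tool AND valid parameters."""
    if not results:
        raise ValueError("no scenario results to score")
    correct = sum(
        1 for r in results
        if r.called_tool == r.expected_tool and r.params_valid
    )
    return correct / len(results)


# Invented sample run: 8 correct calls, 1 wrong tool, 1 invalid parameter set.
results = (
    [ScenarioResult("search_flights", "search_flights", True)] * 8
    + [ScenarioResult("cancel_booking", "get_booking_status", True)]
    + [ScenarioResult("create_booking", "create_booking", False)]
)

accuracy = tool_call_accuracy(results)
needs_rewrite = accuracy < 0.90  # the 90% threshold from the text
print(accuracy, needs_rewrite)
```

Counting a call as correct only when both the tool and its parameters are right keeps the metric aligned with the definition above; a right tool with bad arguments still counts as a miss.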

Usman Ghani, Full-Stack Developer & AI Engineer

Building production-grade AI systems and web applications for international clients. 3+ years shipping end-to-end products across the US and Australia.
