Skip to content

Tool Use and Function Calling

If the previous chapters built the brain of your agent, this chapter gives it hands, feet, and a toolbox. Without tools, an agent is just a really articulate parrot - it can reason about the world, but it can't do anything in it. Tool use is what separates a chatbot from an agent.

Think about it this way: you could ask a brilliant strategist locked in a room to plan a marketing campaign. They might give you a great plan. But if you give them a phone, a laptop, access to analytics dashboards, and a budget - now they can execute that plan. Tools transform intent into action.

In this chapter, we'll go deep on how modern LLMs call functions, how to design tools that agents can actually use well, and how to build a robust tool system that's secure, composable, and production-ready.

How Function Calling Works

Function calling is the mechanism by which an LLM says "I need to use a tool" and provides structured arguments for that tool. The model doesn't actually run the function - your application does. The model just decides which function to call and what arguments to pass.

Here's the basic flow:

  1. You describe available tools to the model (names, descriptions, parameter schemas)
  2. The model generates a response that includes a tool call request
  3. Your application executes the tool with the provided arguments
  4. You send the tool's result back to the model
  5. The model incorporates the result into its response

This loop can repeat multiple times in a single interaction. An agent might call five tools in sequence to answer one question.

Format Comparison Across Providers

Every major LLM provider supports function calling, but the formats differ. Here's a practical comparison:

Feature OpenAI Anthropic Google (Gemini)
Tool definition location tools array in request tools array in request tools array in request
Schema format JSON Schema JSON Schema OpenAPI-style Schema
Tool choice control tool_choice: "auto"/"required"/"none" tool_choice: {type: "auto"/"any"/"tool"} tool_config.function_calling_config
Parallel tool calls Supported natively Supported natively Supported natively
Response format tool_calls array in message tool_use content block function_call in parts
Result return format role: "tool" message tool_result content block function_response part
Streaming support Yes, chunked Yes, event-based Yes, chunked
Max tools per request 128 1000+ 128

Here's what a tool definition looks like in OpenAI's format:

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database by query. Returns matching products with prices and availability.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Natural language search query, e.g., 'red running shoes under $100'"
                    },
                    "category": {
                        "type": "string",
                        "enum": ["electronics", "clothing", "home", "sports"],
                        "description": "Product category to filter by"
                    },
                    "max_results": {
                        "type": "integer",
                        "default": 10,
                        "description": "Maximum number of results to return (1-50)"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

And the equivalent in Anthropic's format:

tools = [
    {
        "name": "search_database",
        "description": "Search the product database by query. Returns matching products with prices and availability.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Natural language search query, e.g., 'red running shoes under $100'"
                },
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "home", "sports"],
                    "description": "Product category to filter by"
                },
                "max_results": {
                    "type": "integer",
                    "default": 10,
                    "description": "Maximum number of results to return (1-50)"
                }
            },
            "required": ["query"]
        }
    }
]
Tip

The differences between providers are mostly cosmetic. If you design your internal tool representation well, you can write adapters that convert to any provider's format. Don't lock yourself into one provider's schema.

Designing Good Tool Interfaces

This is where most agent developers stumble. A poorly designed tool is like giving someone a Swiss Army knife with no labels - technically functional, but practically frustrating. The LLM needs to understand what a tool does, when to use it, and how to call it correctly.

Naming Matters More Than You Think

The tool name is the first thing the model sees. It should be a verb-noun pair that clearly communicates the action:

Bad Name Good Name Why
data query_database Verb-noun, specific action
do_thing send_email Descriptive and unambiguous
helper calculate_shipping_cost Self-documenting
search search_knowledge_base Specifies what is being searched
process_v2_final extract_invoice_data No versioning in names

Descriptions Are Your Documentation

The description field is arguably the most important part of a tool definition. It's the model's only documentation for understanding when and how to use the tool.

A good description includes:

  • What the tool does (one sentence)
  • When to use it (context)
  • What it returns (output format)
  • Edge cases (what happens with bad input)
# Bad description
"description": "Searches stuff"

# Good description
"description": "Search the company knowledge base for relevant documents. Use this when the user asks questions about company policies, procedures, or internal documentation. Returns a list of document snippets ranked by relevance, each with a title, excerpt, and confidence score. Returns an empty list if no relevant documents are found."

Parameter Schema Design

Your parameter schemas should be tight enough to prevent misuse but flexible enough for the model to work with. A few rules:

  1. Use enums when possible - they constrain the model's choices and prevent typos
  2. Provide defaults - reduce the number of required parameters
  3. Add descriptions to every parameter - the model reads them
  4. Include examples in descriptions - they help the model understand format expectations
Warning

Never create a tool with a single data parameter that accepts a JSON string. This defeats the purpose of structured schemas and dramatically increases error rates. Break your inputs into individual, typed parameters.

Tool Categories

Not all tools are created equal. Understanding the categories helps you make better design decisions, especially around security and error handling.

Information Retrieval Tools

These tools read from the world but don't change it. They're generally safe and idempotent.

  • Database queries
  • API lookups
  • File reading
  • Web search
  • Knowledge base retrieval

Computation Tools

These tools perform calculations or transformations. They're deterministic and side-effect-free.

  • Mathematical calculations
  • Data transformations
  • Format conversions
  • Statistical analysis

Side Effect Tools

These tools change the world. They require extra caution, confirmation, and often human approval.

  • Sending emails
  • Creating database records
  • Modifying files
  • Making purchases
  • Deploying code

Communication Tools

These tools interact with other systems or people.

  • Sending notifications
  • Posting to Slack
  • Creating tickets
  • Triggering webhooks
Note

The security posture you apply should match the category. Read-only tools can run freely. Side-effect tools should require confirmation in most scenarios. This isn't paranoia - it's engineering discipline.

Building a Tool Registry

Let's build a proper tool registry with validation, error handling, and execution management. This is production-grade code you can actually use.

import json
import time
import asyncio
import logging
from typing import Any, Callable, Optional
from dataclasses import dataclass, field
from enum import Enum
from jsonschema import validate, ValidationError

logger = logging.getLogger(__name__)


class ToolCategory(Enum):
    RETRIEVAL = "retrieval"
    COMPUTATION = "computation"
    SIDE_EFFECT = "side_effect"
    COMMUNICATION = "communication"


@dataclass
class ToolResult:
    """Standardized result from tool execution."""
    success: bool
    data: Any = None
    error: Optional[str] = None
    execution_time_ms: float = 0
    metadata: dict = field(default_factory=dict)

    def to_message(self) -> str:
        if self.success:
            return json.dumps(self.data) if not isinstance(self.data, str) else self.data
        return f"Error: {self.error}"


@dataclass
class ToolDefinition:
    """Complete tool definition with metadata and implementation."""
    name: str
    description: str
    parameters: dict  # JSON Schema
    handler: Callable
    category: ToolCategory
    requires_confirmation: bool = False
    rate_limit_per_minute: int = 60
    timeout_seconds: float = 30.0
    retry_count: int = 0


class ToolRegistry:
    """Central registry for all agent tools."""

    def __init__(self):
        self._tools: dict[str, ToolDefinition] = {}
        self._call_counts: dict[str, list[float]] = {}
        self._total_calls: int = 0

    def register(self, tool: ToolDefinition) -> None:
        if tool.name in self._tools:
            raise ValueError(f"Tool '{tool.name}' is already registered")
        self._tools[tool.name] = tool
        self._call_counts[tool.name] = []
        logger.info(f"Registered tool: {tool.name} ({tool.category.value})")

    def get_tool(self, name: str) -> Optional[ToolDefinition]:
        return self._tools.get(name)

    def list_tools(self, category: Optional[ToolCategory] = None) -> list[ToolDefinition]:
        tools = list(self._tools.values())
        if category:
            tools = [t for t in tools if t.category == category]
        return tools

    def _check_rate_limit(self, tool_name: str) -> bool:
        """Check if tool is within its rate limit."""
        now = time.time()
        window_start = now - 60
        # Clean old entries
        self._call_counts[tool_name] = [
            t for t in self._call_counts[tool_name] if t > window_start
        ]
        return len(self._call_counts[tool_name]) < self._tools[tool_name].rate_limit_per_minute

    def _validate_input(self, tool: ToolDefinition, arguments: dict) -> Optional[str]:
        """Validate arguments against the tool's JSON schema."""
        try:
            validate(instance=arguments, schema=tool.parameters)
            return None
        except ValidationError as e:
            return f"Invalid arguments: {e.message}"

    async def execute(self, tool_name: str, arguments: dict) -> ToolResult:
        """Execute a tool with full validation, rate limiting, and error handling."""
        tool = self.get_tool(tool_name)
        if not tool:
            return ToolResult(success=False, error=f"Unknown tool: {tool_name}")

        # Validate input
        validation_error = self._validate_input(tool, arguments)
        if validation_error:
            return ToolResult(success=False, error=validation_error)

        # Check rate limit
        if not self._check_rate_limit(tool_name):
            return ToolResult(
                success=False,
                error=f"Rate limit exceeded for {tool_name}. Max {tool.rate_limit_per_minute}/min."
            )

        # Execute with timeout and retries
        start_time = time.time()
        last_error = None

        for attempt in range(tool.retry_count + 1):
            try:
                result = await asyncio.wait_for(
                    tool.handler(**arguments),
                    timeout=tool.timeout_seconds
                )
                elapsed = (time.time() - start_time) * 1000
                self._call_counts[tool_name].append(time.time())
                self._total_calls += 1

                return ToolResult(
                    success=True,
                    data=result,
                    execution_time_ms=elapsed,
                    metadata={"tool": tool_name, "attempt": attempt + 1}
                )

            except asyncio.TimeoutError:
                last_error = f"Tool '{tool_name}' timed out after {tool.timeout_seconds}s"
                logger.warning(f"{last_error} (attempt {attempt + 1})")
            except Exception as e:
                last_error = f"Tool '{tool_name}' failed: {str(e)}"
                logger.error(f"{last_error} (attempt {attempt + 1})", exc_info=True)

        elapsed = (time.time() - start_time) * 1000
        return ToolResult(
            success=False,
            error=last_error,
            execution_time_ms=elapsed,
            metadata={"tool": tool_name, "attempts": tool.retry_count + 1}
        )

    def to_openai_format(self) -> list[dict]:
        """Export tools in OpenAI's function calling format."""
        return [
            {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters,
                }
            }
            for tool in self._tools.values()
        ]

    def to_anthropic_format(self) -> list[dict]:
        """Export tools in Anthropic's tool use format."""
        return [
            {
                "name": tool.name,
                "description": tool.description,
                "input_schema": tool.parameters,
            }
            for tool in self._tools.values()
        ]

Now let's register some tools:

import aiohttp

async def search_knowledge_base(query: str, max_results: int = 5) -> list[dict]:
    """Search the knowledge base for relevant documents."""
    # In production, this would hit a vector DB or search API
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "https://api.internal.com/search",
            params={"q": query, "limit": max_results}
        ) as resp:
            return await resp.json()


async def run_sql_query(query: str, database: str = "analytics") -> dict:
    """Execute a read-only SQL query."""
    # Validate it's actually read-only
    normalized = query.strip().upper()
    if not normalized.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed")
    # Execute query (using your preferred async DB driver)
    # ...
    return {"columns": [...], "rows": [...], "row_count": ...}


registry = ToolRegistry()

registry.register(ToolDefinition(
    name="search_knowledge_base",
    description="Search company documentation and knowledge base articles. Returns relevant document snippets ranked by relevance.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query in natural language"},
            "max_results": {"type": "integer", "default": 5, "minimum": 1, "maximum": 20}
        },
        "required": ["query"]
    },
    handler=search_knowledge_base,
    category=ToolCategory.RETRIEVAL,
    rate_limit_per_minute=30,
    timeout_seconds=10.0,
    retry_count=2
))

registry.register(ToolDefinition(
    name="run_sql_query",
    description="Execute a read-only SQL query against the analytics database. Only SELECT statements are allowed. Use this to look up metrics, aggregate data, or answer questions about business performance.",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "SQL SELECT query to execute"},
            "database": {
                "type": "string",
                "enum": ["analytics", "users", "products"],
                "default": "analytics"
            }
        },
        "required": ["query"]
    },
    handler=run_sql_query,
    category=ToolCategory.RETRIEVAL,
    requires_confirmation=False,
    rate_limit_per_minute=20,
    timeout_seconds=30.0
))

Parallel vs Sequential Tool Execution

When an agent needs to call multiple tools, you have a choice: run them one at a time, or run them in parallel. This decision has real performance implications.

Sequential execution is simpler and necessary when tool B depends on tool A's output:

# Sequential: query depends on search results
search_results = await registry.execute("search_knowledge_base", {"query": "Q3 revenue"})
sql_result = await registry.execute("run_sql_query", {
    "query": f"SELECT * FROM metrics WHERE doc_id IN ({search_results.data['ids']})"
})

Parallel execution is faster when tools are independent:

# Parallel: these tools don't depend on each other
results = await asyncio.gather(
    registry.execute("search_knowledge_base", {"query": "Q3 revenue"}),
    registry.execute("get_current_exchange_rates", {"currency": "USD"}),
    registry.execute("fetch_stock_price", {"symbol": "AAPL"}),
)

Modern LLMs (OpenAI, Anthropic, Google) support parallel tool calls natively - the model can request multiple tool calls in a single response. Your execution layer should detect independent calls and run them concurrently.

Tip

A good rule of thumb: if you're calling three or more independent tools sequentially and each takes 500ms, that's 1.5 seconds of unnecessary latency. Parallelize aggressively for independent tool calls.

Tool Composition - Chaining Tools Together

Individual tools are useful, but the real power comes from composing them. Tool composition is when the output of one tool feeds into the input of another, creating a pipeline.

class ToolChain:
    """Chain multiple tools together, passing results between them."""

    def __init__(self, registry: ToolRegistry):
        self.registry = registry
        self.steps: list[dict] = []

    def add_step(self, tool_name: str, argument_mapper: Callable[[Any], dict]) -> "ToolChain":
        self.steps.append({
            "tool_name": tool_name,
            "argument_mapper": argument_mapper,
        })
        return self

    async def execute(self, initial_input: Any) -> list[ToolResult]:
        results = []
        current_data = initial_input

        for step in self.steps:
            arguments = step["argument_mapper"](current_data)
            result = await self.registry.execute(step["tool_name"], arguments)
            results.append(result)

            if not result.success:
                break  # Stop chain on failure
            current_data = result.data

        return results


# Example: Search -> Query -> Chart
chain = ToolChain(registry)
chain.add_step(
    "search_knowledge_base",
    lambda input_data: {"query": input_data["question"]}
).add_step(
    "run_sql_query",
    lambda search_data: {"query": f"SELECT date, revenue FROM metrics WHERE topic = '{search_data[0]['topic']}'"}
).add_step(
    "create_chart",
    lambda sql_data: {"data": sql_data["rows"], "chart_type": "line", "title": "Revenue Over Time"}
)

results = await chain.execute({"question": "What's our revenue trend?"})

Security Considerations

Tool use is where agent security gets real. You're giving an AI the ability to execute code, query databases, and interact with external systems. If you're not careful, you're one prompt injection away from a disaster.

Sandboxing

Never let an agent tool run with full system privileges. Sandbox your tool execution:

# Bad: agent can run any shell command
async def run_command(command: str) -> str:
    process = await asyncio.create_subprocess_shell(command, ...)
    return await process.communicate()

# Good: agent has specific, constrained tools
async def list_project_files(directory: str) -> list[str]:
    # Validate directory is within allowed paths
    allowed_roots = ["/app/data", "/app/uploads"]
    resolved = os.path.realpath(directory)
    if not any(resolved.startswith(root) for root in allowed_roots):
        raise PermissionError(f"Access denied: {directory}")
    return os.listdir(resolved)

Input Validation

Always validate tool inputs beyond what JSON Schema provides:

import re

async def run_sql_query(query: str, database: str = "analytics") -> dict:
    # Schema validation ensures query is a string,
    # but we need semantic validation too

    # Block dangerous patterns
    dangerous = re.compile(
        r'\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE|EXEC|UNION)\b',
        re.IGNORECASE
    )
    if dangerous.search(query):
        raise ValueError("Query contains disallowed SQL operations")

    # Limit query complexity
    if query.count("JOIN") > 3:
        raise ValueError("Query too complex: maximum 3 JOINs allowed")

    # Set execution timeout at the database level too
    query = f"SET statement_timeout = '5s'; {query}"
    # ... execute

Output Sanitization

Tool outputs go back to the model, but they might also be shown to users. Sanitize accordingly:

def sanitize_tool_output(output: Any, max_length: int = 10000) -> str:
    """Sanitize tool output before returning to the model."""
    text = json.dumps(output) if not isinstance(output, str) else output

    # Truncate to prevent context window overload
    if len(text) > max_length:
        text = text[:max_length] + f"\n... [truncated, {len(text) - max_length} chars omitted]"

    # Remove any sensitive patterns that leaked through
    text = re.sub(r'(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*\S+', '[REDACTED]', text)

    return text
Warning

Never let tool outputs flow directly to the user without sanitization. Even if your tools are well-designed, the data they return might contain sensitive information from your backend systems.

Rate Limiting and Cost Management

Every tool call costs something - API quota, compute time, money. You need guardrails:

@dataclass
class UsageBudget:
    """Track and limit tool usage per agent session."""
    max_tool_calls: int = 50
    max_cost_usd: float = 1.00
    max_execution_time_seconds: float = 120.0

    # Running totals
    total_calls: int = 0
    total_cost_usd: float = 0.0
    total_time_seconds: float = 0.0

    def check(self, estimated_cost: float = 0.0) -> bool:
        if self.total_calls >= self.max_tool_calls:
            raise BudgetExceededError("Maximum tool calls reached")
        if self.total_cost_usd + estimated_cost > self.max_cost_usd:
            raise BudgetExceededError("Cost budget exceeded")
        if self.total_time_seconds > self.max_execution_time_seconds:
            raise BudgetExceededError("Time budget exceeded")
        return True

    def record(self, cost: float, time_seconds: float) -> None:
        self.total_calls += 1
        self.total_cost_usd += cost
        self.total_time_seconds += time_seconds

Advanced: Dynamic Tool Generation

Here's where things get really interesting. What if an agent could create its own tools? Instead of being limited to a predefined set, the agent generates a tool function when it encounters a novel problem.

async def generate_tool(
    agent,
    task_description: str,
    registry: ToolRegistry
) -> ToolDefinition:
    """Have the agent generate a new tool for a specific task."""
    prompt = f"""You need to create a Python tool function for this task:
    {task_description}

    Return a JSON object with:
    - name: snake_case function name
    - description: what the tool does
    - parameters: JSON Schema for the input
    - code: Python async function code (must be named the same as 'name')

    The function must be safe - no file system access, no network calls,
    pure computation only."""

    response = await agent.llm.generate(prompt)
    tool_spec = json.loads(response)

    # Safety: only allow pure computation tools
    disallowed = ["import os", "import subprocess", "open(", "exec(", "eval("]
    if any(pattern in tool_spec["code"] for pattern in disallowed):
        raise SecurityError("Generated tool contains disallowed operations")

    # Compile the function in a restricted namespace
    namespace = {"__builtins__": {"len": len, "range": range, "str": str,
                                   "int": int, "float": float, "list": list,
                                   "dict": dict, "sorted": sorted, "sum": sum,
                                   "min": min, "max": max, "round": round}}
    exec(tool_spec["code"], namespace)
    handler = namespace[tool_spec["name"]]

    tool = ToolDefinition(
        name=tool_spec["name"],
        description=tool_spec["description"],
        parameters=tool_spec["parameters"],
        handler=handler,
        category=ToolCategory.COMPUTATION,
        rate_limit_per_minute=100,
    )
    registry.register(tool)
    return tool
Warning

Dynamic tool generation is powerful but dangerous. Only allow it for sandboxed, computation-only tools. Never let a generated tool access the network, file system, or databases. The security implications of letting an LLM write executable code that accesses your infrastructure are severe.

Real Example: Data Analysis Agent

Let's bring everything together with a practical example - a data analysis agent that can query SQL databases, create charts, and write reports.

import matplotlib.pyplot as plt
import pandas as pd
import io
import base64


async def query_analytics_db(sql: str) -> dict:
    """Execute a read-only query against the analytics database."""
    # ... validate and execute SQL
    df = pd.read_sql(sql, connection)
    return {
        "columns": list(df.columns),
        "rows": df.values.tolist(),
        "row_count": len(df),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()}
    }


async def create_visualization(
    data: list[list], columns: list[str], chart_type: str = "bar",
    title: str = "", x_column: str = "", y_column: str = ""
) -> dict:
    """Create a chart from data and return as base64 image."""
    df = pd.DataFrame(data, columns=columns)
    fig, ax = plt.subplots(figsize=(10, 6))

    if chart_type == "bar":
        df.plot(x=x_column, y=y_column, kind="bar", ax=ax)
    elif chart_type == "line":
        df.plot(x=x_column, y=y_column, kind="line", ax=ax)
    elif chart_type == "pie":
        df.set_index(x_column)[y_column].plot(kind="pie", ax=ax)

    ax.set_title(title)
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return {
        "image_base64": base64.b64encode(buf.getvalue()).decode(),
        "format": "png"
    }


async def write_report(title: str, sections: list[dict], output_path: str) -> dict:
    """Write a markdown report to disk."""
    content = f"# {title}\n\n"
    for section in sections:
        content += f"## {section['heading']}\n\n{section['body']}\n\n"

    with open(output_path, "w") as f:
        f.write(content)
    return {"path": output_path, "size_bytes": len(content)}


# Register all tools
registry = ToolRegistry()

registry.register(ToolDefinition(
    name="query_analytics_db",
    description="Run a read-only SQL query on the analytics database. Contains tables: daily_metrics (date, revenue, users, sessions), products (id, name, category, price), orders (id, user_id, product_id, amount, created_at).",
    parameters={
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "SELECT query to execute"}
        },
        "required": ["sql"]
    },
    handler=query_analytics_db,
    category=ToolCategory.RETRIEVAL,
    rate_limit_per_minute=20
))

registry.register(ToolDefinition(
    name="create_visualization",
    description="Create a chart from tabular data. Supports bar, line, and pie charts.",
    parameters={
        "type": "object",
        "properties": {
            "data": {"type": "array", "description": "2D array of data rows"},
            "columns": {"type": "array", "items": {"type": "string"}},
            "chart_type": {"type": "string", "enum": ["bar", "line", "pie"]},
            "title": {"type": "string"},
            "x_column": {"type": "string"},
            "y_column": {"type": "string"}
        },
        "required": ["data", "columns", "chart_type", "x_column", "y_column"]
    },
    handler=create_visualization,
    category=ToolCategory.COMPUTATION
))

registry.register(ToolDefinition(
    name="write_report",
    description="Write a formatted markdown report to a file.",
    parameters={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "sections": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "heading": {"type": "string"},
                        "body": {"type": "string"}
                    }
                }
            },
            "output_path": {"type": "string"}
        },
        "required": ["title", "sections", "output_path"]
    },
    handler=write_report,
    category=ToolCategory.SIDE_EFFECT,
    requires_confirmation=True
))

With these tools registered, an agent can handle a prompt like "Analyze our Q3 revenue trends by product category and generate a report with charts" by orchestrating SQL queries, visualizations, and report writing - all autonomously.

Common Mistakes and How to Fix Them

Mistake Problem Fix
Too many tools Model gets confused choosing between similar tools Consolidate overlapping tools; keep toolset focused
Vague descriptions Model calls wrong tool or passes wrong arguments Write detailed descriptions with examples and edge cases
Giant catch-all tools Single tool does too much; hard to use correctly Split into focused, single-purpose tools
No error messages Model can't recover when a tool fails Return descriptive errors the model can reason about
Returning raw data Huge outputs fill context window Summarize or paginate large results
No rate limits Runaway agent burns through API quota Set per-tool and per-session limits
Trusting model input SQL injection, path traversal, etc. Validate and sanitize all inputs server-side
No timeout Hanging tool blocks entire agent Set timeouts on every tool execution

Best Practices Summary

Practice Description
Name tools clearly Use verb_noun format: search_documents, send_email
Write rich descriptions Include what, when, returns, and edge cases
Use strict schemas Leverage enums, required fields, and constraints
Categorize tools Separate retrieval, computation, side effects
Validate inputs twice JSON Schema validation + semantic validation
Sanitize outputs Truncate, redact secrets, format for readability
Set budgets Limit calls, cost, and execution time per session
Handle errors gracefully Return errors the model can understand and act on
Parallelize independent calls Use asyncio.gather for concurrent execution
Log everything Every tool call, input, output, and error
Start small Begin with 3-5 essential tools, add more as needed
Test with adversarial prompts Try to break your tools before users do

Tool use is the bridge between reasoning and action. Get it right, and your agent becomes genuinely useful. Get it wrong, and you've built an expensive liability. The tool registry pattern we built in this chapter gives you a solid foundation - now go build tools that let your agents change the world (safely).