Tool Use and Function Calling
If the previous chapters built the brain of your agent, this chapter gives it hands, feet, and a toolbox. Without tools, an agent is just a really articulate parrot - it can reason about the world, but it can't do anything in it. Tool use is what separates a chatbot from an agent.
Think about it this way: you could ask a brilliant strategist locked in a room to plan a marketing campaign. They might give you a great plan. But if you give them a phone, a laptop, access to analytics dashboards, and a budget - now they can execute that plan. Tools transform intent into action.
In this chapter, we'll go deep on how modern LLMs call functions, how to design tools that agents can actually use well, and how to build a robust tool system that's secure, composable, and production-ready.
How Function Calling Works
Function calling is the mechanism by which an LLM says "I need to use a tool" and provides structured arguments for that tool. The model doesn't actually run the function - your application does. The model just decides which function to call and what arguments to pass.
Here's the basic flow:
- You describe available tools to the model (names, descriptions, parameter schemas)
- The model generates a response that includes a tool call request
- Your application executes the tool with the provided arguments
- You send the tool's result back to the model
- The model incorporates the result into its response
This loop can repeat multiple times in a single interaction. An agent might call five tools in sequence to answer one question.
Format Comparison Across Providers
Every major LLM provider supports function calling, but the formats differ. Here's a practical comparison:
| Feature | OpenAI | Anthropic | Google (Gemini) |
|---|---|---|---|
| Tool definition location | tools array in request |
tools array in request |
tools array in request |
| Schema format | JSON Schema | JSON Schema | OpenAPI-style Schema |
| Tool choice control | tool_choice: "auto"/"required"/"none" |
tool_choice: {type: "auto"/"any"/"tool"} |
tool_config.function_calling_config |
| Parallel tool calls | Supported natively | Supported natively | Supported natively |
| Response format | tool_calls array in message |
tool_use content block |
function_call in parts |
| Result return format | role: "tool" message |
tool_result content block |
function_response part |
| Streaming support | Yes, chunked | Yes, event-based | Yes, chunked |
| Max tools per request | 128 | 1000+ | 128 |
Here's what a tool definition looks like in OpenAI's format:
tools = [
{
"type": "function",
"function": {
"name": "search_database",
"description": "Search the product database by query. Returns matching products with prices and availability.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Natural language search query, e.g., 'red running shoes under $100'"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "home", "sports"],
"description": "Product category to filter by"
},
"max_results": {
"type": "integer",
"default": 10,
"description": "Maximum number of results to return (1-50)"
}
},
"required": ["query"]
}
}
}
]
And the equivalent in Anthropic's format:
tools = [
{
"name": "search_database",
"description": "Search the product database by query. Returns matching products with prices and availability.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Natural language search query, e.g., 'red running shoes under $100'"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "home", "sports"],
"description": "Product category to filter by"
},
"max_results": {
"type": "integer",
"default": 10,
"description": "Maximum number of results to return (1-50)"
}
},
"required": ["query"]
}
}
]
The differences between providers are mostly cosmetic. If you design your internal tool representation well, you can write adapters that convert to any provider's format. Don't lock yourself into one provider's schema.
Designing Good Tool Interfaces
This is where most agent developers stumble. A poorly designed tool is like giving someone a Swiss Army knife with no labels - technically functional, but practically frustrating. The LLM needs to understand what a tool does, when to use it, and how to call it correctly.
Naming Matters More Than You Think
The tool name is the first thing the model sees. It should be a verb-noun pair that clearly communicates the action:
| Bad Name | Good Name | Why |
|---|---|---|
data |
query_database |
Verb-noun, specific action |
do_thing |
send_email |
Descriptive and unambiguous |
helper |
calculate_shipping_cost |
Self-documenting |
search |
search_knowledge_base |
Specifies what is being searched |
process_v2_final |
extract_invoice_data |
No versioning in names |
Descriptions Are Your Documentation
The description field is arguably the most important part of a tool definition. It's the model's only documentation for understanding when and how to use the tool.
A good description includes:
- What the tool does (one sentence)
- When to use it (context)
- What it returns (output format)
- Edge cases (what happens with bad input)
# Bad description
"description": "Searches stuff"
# Good description
"description": "Search the company knowledge base for relevant documents. Use this when the user asks questions about company policies, procedures, or internal documentation. Returns a list of document snippets ranked by relevance, each with a title, excerpt, and confidence score. Returns an empty list if no relevant documents are found."
Parameter Schema Design
Your parameter schemas should be tight enough to prevent misuse but flexible enough for the model to work with. A few rules:
- Use enums when possible - they constrain the model's choices and prevent typos
- Provide defaults - reduce the number of required parameters
- Add descriptions to every parameter - the model reads them
- Include examples in descriptions - they help the model understand format expectations
Never create a tool with a single data parameter that accepts a JSON string. This defeats the purpose of structured schemas and dramatically increases error rates. Break your inputs into individual, typed parameters.
Tool Categories
Not all tools are created equal. Understanding the categories helps you make better design decisions, especially around security and error handling.
Information Retrieval Tools
These tools read from the world but don't change it. They're generally safe and idempotent.
- Database queries
- API lookups
- File reading
- Web search
- Knowledge base retrieval
Computation Tools
These tools perform calculations or transformations. They're deterministic and side-effect-free.
- Mathematical calculations
- Data transformations
- Format conversions
- Statistical analysis
Side Effect Tools
These tools change the world. They require extra caution, confirmation, and often human approval.
- Sending emails
- Creating database records
- Modifying files
- Making purchases
- Deploying code
Communication Tools
These tools interact with other systems or people.
- Sending notifications
- Posting to Slack
- Creating tickets
- Triggering webhooks
The security posture you apply should match the category. Read-only tools can run freely. Side-effect tools should require confirmation in most scenarios. This isn't paranoia - it's engineering discipline.
Building a Tool Registry
Let's build a proper tool registry with validation, error handling, and execution management. This is production-grade code you can actually use.
import json
import time
import asyncio
import logging
from typing import Any, Callable, Optional
from dataclasses import dataclass, field
from enum import Enum
from jsonschema import validate, ValidationError
logger = logging.getLogger(__name__)
class ToolCategory(Enum):
RETRIEVAL = "retrieval"
COMPUTATION = "computation"
SIDE_EFFECT = "side_effect"
COMMUNICATION = "communication"
@dataclass
class ToolResult:
"""Standardized result from tool execution."""
success: bool
data: Any = None
error: Optional[str] = None
execution_time_ms: float = 0
metadata: dict = field(default_factory=dict)
def to_message(self) -> str:
if self.success:
return json.dumps(self.data) if not isinstance(self.data, str) else self.data
return f"Error: {self.error}"
@dataclass
class ToolDefinition:
"""Complete tool definition with metadata and implementation."""
name: str
description: str
parameters: dict # JSON Schema
handler: Callable
category: ToolCategory
requires_confirmation: bool = False
rate_limit_per_minute: int = 60
timeout_seconds: float = 30.0
retry_count: int = 0
class ToolRegistry:
"""Central registry for all agent tools."""
def __init__(self):
self._tools: dict[str, ToolDefinition] = {}
self._call_counts: dict[str, list[float]] = {}
self._total_calls: int = 0
def register(self, tool: ToolDefinition) -> None:
if tool.name in self._tools:
raise ValueError(f"Tool '{tool.name}' is already registered")
self._tools[tool.name] = tool
self._call_counts[tool.name] = []
logger.info(f"Registered tool: {tool.name} ({tool.category.value})")
def get_tool(self, name: str) -> Optional[ToolDefinition]:
return self._tools.get(name)
def list_tools(self, category: Optional[ToolCategory] = None) -> list[ToolDefinition]:
tools = list(self._tools.values())
if category:
tools = [t for t in tools if t.category == category]
return tools
def _check_rate_limit(self, tool_name: str) -> bool:
"""Check if tool is within its rate limit."""
now = time.time()
window_start = now - 60
# Clean old entries
self._call_counts[tool_name] = [
t for t in self._call_counts[tool_name] if t > window_start
]
return len(self._call_counts[tool_name]) < self._tools[tool_name].rate_limit_per_minute
def _validate_input(self, tool: ToolDefinition, arguments: dict) -> Optional[str]:
"""Validate arguments against the tool's JSON schema."""
try:
validate(instance=arguments, schema=tool.parameters)
return None
except ValidationError as e:
return f"Invalid arguments: {e.message}"
async def execute(self, tool_name: str, arguments: dict) -> ToolResult:
"""Execute a tool with full validation, rate limiting, and error handling."""
tool = self.get_tool(tool_name)
if not tool:
return ToolResult(success=False, error=f"Unknown tool: {tool_name}")
# Validate input
validation_error = self._validate_input(tool, arguments)
if validation_error:
return ToolResult(success=False, error=validation_error)
# Check rate limit
if not self._check_rate_limit(tool_name):
return ToolResult(
success=False,
error=f"Rate limit exceeded for {tool_name}. Max {tool.rate_limit_per_minute}/min."
)
# Execute with timeout and retries
start_time = time.time()
last_error = None
for attempt in range(tool.retry_count + 1):
try:
result = await asyncio.wait_for(
tool.handler(**arguments),
timeout=tool.timeout_seconds
)
elapsed = (time.time() - start_time) * 1000
self._call_counts[tool_name].append(time.time())
self._total_calls += 1
return ToolResult(
success=True,
data=result,
execution_time_ms=elapsed,
metadata={"tool": tool_name, "attempt": attempt + 1}
)
except asyncio.TimeoutError:
last_error = f"Tool '{tool_name}' timed out after {tool.timeout_seconds}s"
logger.warning(f"{last_error} (attempt {attempt + 1})")
except Exception as e:
last_error = f"Tool '{tool_name}' failed: {str(e)}"
logger.error(f"{last_error} (attempt {attempt + 1})", exc_info=True)
elapsed = (time.time() - start_time) * 1000
return ToolResult(
success=False,
error=last_error,
execution_time_ms=elapsed,
metadata={"tool": tool_name, "attempts": tool.retry_count + 1}
)
def to_openai_format(self) -> list[dict]:
"""Export tools in OpenAI's function calling format."""
return [
{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters,
}
}
for tool in self._tools.values()
]
def to_anthropic_format(self) -> list[dict]:
"""Export tools in Anthropic's tool use format."""
return [
{
"name": tool.name,
"description": tool.description,
"input_schema": tool.parameters,
}
for tool in self._tools.values()
]
Now let's register some tools:
import aiohttp
async def search_knowledge_base(query: str, max_results: int = 5) -> list[dict]:
"""Search the knowledge base for relevant documents."""
# In production, this would hit a vector DB or search API
async with aiohttp.ClientSession() as session:
async with session.get(
"https://api.internal.com/search",
params={"q": query, "limit": max_results}
) as resp:
return await resp.json()
async def run_sql_query(query: str, database: str = "analytics") -> dict:
"""Execute a read-only SQL query."""
# Validate it's actually read-only
normalized = query.strip().upper()
if not normalized.startswith("SELECT"):
raise ValueError("Only SELECT queries are allowed")
# Execute query (using your preferred async DB driver)
# ...
return {"columns": [...], "rows": [...], "row_count": ...}
registry = ToolRegistry()
registry.register(ToolDefinition(
name="search_knowledge_base",
description="Search company documentation and knowledge base articles. Returns relevant document snippets ranked by relevance.",
parameters={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query in natural language"},
"max_results": {"type": "integer", "default": 5, "minimum": 1, "maximum": 20}
},
"required": ["query"]
},
handler=search_knowledge_base,
category=ToolCategory.RETRIEVAL,
rate_limit_per_minute=30,
timeout_seconds=10.0,
retry_count=2
))
registry.register(ToolDefinition(
name="run_sql_query",
description="Execute a read-only SQL query against the analytics database. Only SELECT statements are allowed. Use this to look up metrics, aggregate data, or answer questions about business performance.",
parameters={
"type": "object",
"properties": {
"query": {"type": "string", "description": "SQL SELECT query to execute"},
"database": {
"type": "string",
"enum": ["analytics", "users", "products"],
"default": "analytics"
}
},
"required": ["query"]
},
handler=run_sql_query,
category=ToolCategory.RETRIEVAL,
requires_confirmation=False,
rate_limit_per_minute=20,
timeout_seconds=30.0
))
Parallel vs Sequential Tool Execution
When an agent needs to call multiple tools, you have a choice: run them one at a time, or run them in parallel. This decision has real performance implications.
Sequential execution is simpler and necessary when tool B depends on tool A's output:
# Sequential: query depends on search results
search_results = await registry.execute("search_knowledge_base", {"query": "Q3 revenue"})
sql_result = await registry.execute("run_sql_query", {
"query": f"SELECT * FROM metrics WHERE doc_id IN ({search_results.data['ids']})"
})
Parallel execution is faster when tools are independent:
# Parallel: these tools don't depend on each other
results = await asyncio.gather(
registry.execute("search_knowledge_base", {"query": "Q3 revenue"}),
registry.execute("get_current_exchange_rates", {"currency": "USD"}),
registry.execute("fetch_stock_price", {"symbol": "AAPL"}),
)
Modern LLMs (OpenAI, Anthropic, Google) support parallel tool calls natively - the model can request multiple tool calls in a single response. Your execution layer should detect independent calls and run them concurrently.
A good rule of thumb: if you're calling three or more independent tools sequentially and each takes 500ms, that's 1.5 seconds of unnecessary latency. Parallelize aggressively for independent tool calls.
Tool Composition - Chaining Tools Together
Individual tools are useful, but the real power comes from composing them. Tool composition is when the output of one tool feeds into the input of another, creating a pipeline.
class ToolChain:
"""Chain multiple tools together, passing results between them."""
def __init__(self, registry: ToolRegistry):
self.registry = registry
self.steps: list[dict] = []
def add_step(self, tool_name: str, argument_mapper: Callable[[Any], dict]) -> "ToolChain":
self.steps.append({
"tool_name": tool_name,
"argument_mapper": argument_mapper,
})
return self
async def execute(self, initial_input: Any) -> list[ToolResult]:
results = []
current_data = initial_input
for step in self.steps:
arguments = step["argument_mapper"](current_data)
result = await self.registry.execute(step["tool_name"], arguments)
results.append(result)
if not result.success:
break # Stop chain on failure
current_data = result.data
return results
# Example: Search -> Query -> Chart
chain = ToolChain(registry)
chain.add_step(
"search_knowledge_base",
lambda input_data: {"query": input_data["question"]}
).add_step(
"run_sql_query",
lambda search_data: {"query": f"SELECT date, revenue FROM metrics WHERE topic = '{search_data[0]['topic']}'"}
).add_step(
"create_chart",
lambda sql_data: {"data": sql_data["rows"], "chart_type": "line", "title": "Revenue Over Time"}
)
results = await chain.execute({"question": "What's our revenue trend?"})
Security Considerations
Tool use is where agent security gets real. You're giving an AI the ability to execute code, query databases, and interact with external systems. If you're not careful, you're one prompt injection away from a disaster.
Sandboxing
Never let an agent tool run with full system privileges. Sandbox your tool execution:
# Bad: agent can run any shell command
async def run_command(command: str) -> str:
process = await asyncio.create_subprocess_shell(command, ...)
return await process.communicate()
# Good: agent has specific, constrained tools
async def list_project_files(directory: str) -> list[str]:
# Validate directory is within allowed paths
allowed_roots = ["/app/data", "/app/uploads"]
resolved = os.path.realpath(directory)
if not any(resolved.startswith(root) for root in allowed_roots):
raise PermissionError(f"Access denied: {directory}")
return os.listdir(resolved)
Input Validation
Always validate tool inputs beyond what JSON Schema provides:
import re
async def run_sql_query(query: str, database: str = "analytics") -> dict:
# Schema validation ensures query is a string,
# but we need semantic validation too
# Block dangerous patterns
dangerous = re.compile(
r'\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE|EXEC|UNION)\b',
re.IGNORECASE
)
if dangerous.search(query):
raise ValueError("Query contains disallowed SQL operations")
# Limit query complexity
if query.count("JOIN") > 3:
raise ValueError("Query too complex: maximum 3 JOINs allowed")
# Set execution timeout at the database level too
query = f"SET statement_timeout = '5s'; {query}"
# ... execute
Output Sanitization
Tool outputs go back to the model, but they might also be shown to users. Sanitize accordingly:
def sanitize_tool_output(output: Any, max_length: int = 10000) -> str:
"""Sanitize tool output before returning to the model."""
text = json.dumps(output) if not isinstance(output, str) else output
# Truncate to prevent context window overload
if len(text) > max_length:
text = text[:max_length] + f"\n... [truncated, {len(text) - max_length} chars omitted]"
# Remove any sensitive patterns that leaked through
text = re.sub(r'(?i)(api[_-]?key|password|secret|token)\s*[:=]\s*\S+', '[REDACTED]', text)
return text
Never let tool outputs flow directly to the user without sanitization. Even if your tools are well-designed, the data they return might contain sensitive information from your backend systems.
Rate Limiting and Cost Management
Every tool call costs something - API quota, compute time, money. You need guardrails:
@dataclass
class UsageBudget:
"""Track and limit tool usage per agent session."""
max_tool_calls: int = 50
max_cost_usd: float = 1.00
max_execution_time_seconds: float = 120.0
# Running totals
total_calls: int = 0
total_cost_usd: float = 0.0
total_time_seconds: float = 0.0
def check(self, estimated_cost: float = 0.0) -> bool:
if self.total_calls >= self.max_tool_calls:
raise BudgetExceededError("Maximum tool calls reached")
if self.total_cost_usd + estimated_cost > self.max_cost_usd:
raise BudgetExceededError("Cost budget exceeded")
if self.total_time_seconds > self.max_execution_time_seconds:
raise BudgetExceededError("Time budget exceeded")
return True
def record(self, cost: float, time_seconds: float) -> None:
self.total_calls += 1
self.total_cost_usd += cost
self.total_time_seconds += time_seconds
Advanced: Dynamic Tool Generation
Here's where things get really interesting. What if an agent could create its own tools? Instead of being limited to a predefined set, the agent generates a tool function when it encounters a novel problem.
async def generate_tool(
agent,
task_description: str,
registry: ToolRegistry
) -> ToolDefinition:
"""Have the agent generate a new tool for a specific task."""
prompt = f"""You need to create a Python tool function for this task:
{task_description}
Return a JSON object with:
- name: snake_case function name
- description: what the tool does
- parameters: JSON Schema for the input
- code: Python async function code (must be named the same as 'name')
The function must be safe - no file system access, no network calls,
pure computation only."""
response = await agent.llm.generate(prompt)
tool_spec = json.loads(response)
# Safety: only allow pure computation tools
disallowed = ["import os", "import subprocess", "open(", "exec(", "eval("]
if any(pattern in tool_spec["code"] for pattern in disallowed):
raise SecurityError("Generated tool contains disallowed operations")
# Compile the function in a restricted namespace
namespace = {"__builtins__": {"len": len, "range": range, "str": str,
"int": int, "float": float, "list": list,
"dict": dict, "sorted": sorted, "sum": sum,
"min": min, "max": max, "round": round}}
exec(tool_spec["code"], namespace)
handler = namespace[tool_spec["name"]]
tool = ToolDefinition(
name=tool_spec["name"],
description=tool_spec["description"],
parameters=tool_spec["parameters"],
handler=handler,
category=ToolCategory.COMPUTATION,
rate_limit_per_minute=100,
)
registry.register(tool)
return tool
Dynamic tool generation is powerful but dangerous. Only allow it for sandboxed, computation-only tools. Never let a generated tool access the network, file system, or databases. The security implications of letting an LLM write executable code that accesses your infrastructure are severe.
Real Example: Data Analysis Agent
Let's bring everything together with a practical example - a data analysis agent that can query SQL databases, create charts, and write reports.
import matplotlib.pyplot as plt
import pandas as pd
import io
import base64
async def query_analytics_db(sql: str) -> dict:
"""Execute a read-only query against the analytics database."""
# ... validate and execute SQL
df = pd.read_sql(sql, connection)
return {
"columns": list(df.columns),
"rows": df.values.tolist(),
"row_count": len(df),
"dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()}
}
async def create_visualization(
data: list[list], columns: list[str], chart_type: str = "bar",
title: str = "", x_column: str = "", y_column: str = ""
) -> dict:
"""Create a chart from data and return as base64 image."""
df = pd.DataFrame(data, columns=columns)
fig, ax = plt.subplots(figsize=(10, 6))
if chart_type == "bar":
df.plot(x=x_column, y=y_column, kind="bar", ax=ax)
elif chart_type == "line":
df.plot(x=x_column, y=y_column, kind="line", ax=ax)
elif chart_type == "pie":
df.set_index(x_column)[y_column].plot(kind="pie", ax=ax)
ax.set_title(title)
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
plt.close(fig)
return {
"image_base64": base64.b64encode(buf.getvalue()).decode(),
"format": "png"
}
async def write_report(title: str, sections: list[dict], output_path: str) -> dict:
"""Write a markdown report to disk."""
content = f"# {title}\n\n"
for section in sections:
content += f"## {section['heading']}\n\n{section['body']}\n\n"
with open(output_path, "w") as f:
f.write(content)
return {"path": output_path, "size_bytes": len(content)}
# Register all tools
registry = ToolRegistry()
registry.register(ToolDefinition(
name="query_analytics_db",
description="Run a read-only SQL query on the analytics database. Contains tables: daily_metrics (date, revenue, users, sessions), products (id, name, category, price), orders (id, user_id, product_id, amount, created_at).",
parameters={
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SELECT query to execute"}
},
"required": ["sql"]
},
handler=query_analytics_db,
category=ToolCategory.RETRIEVAL,
rate_limit_per_minute=20
))
registry.register(ToolDefinition(
name="create_visualization",
description="Create a chart from tabular data. Supports bar, line, and pie charts.",
parameters={
"type": "object",
"properties": {
"data": {"type": "array", "description": "2D array of data rows"},
"columns": {"type": "array", "items": {"type": "string"}},
"chart_type": {"type": "string", "enum": ["bar", "line", "pie"]},
"title": {"type": "string"},
"x_column": {"type": "string"},
"y_column": {"type": "string"}
},
"required": ["data", "columns", "chart_type", "x_column", "y_column"]
},
handler=create_visualization,
category=ToolCategory.COMPUTATION
))
registry.register(ToolDefinition(
name="write_report",
description="Write a formatted markdown report to a file.",
parameters={
"type": "object",
"properties": {
"title": {"type": "string"},
"sections": {
"type": "array",
"items": {
"type": "object",
"properties": {
"heading": {"type": "string"},
"body": {"type": "string"}
}
}
},
"output_path": {"type": "string"}
},
"required": ["title", "sections", "output_path"]
},
handler=write_report,
category=ToolCategory.SIDE_EFFECT,
requires_confirmation=True
))
With these tools registered, an agent can handle a prompt like "Analyze our Q3 revenue trends by product category and generate a report with charts" by orchestrating SQL queries, visualizations, and report writing - all autonomously.
Common Mistakes and How to Fix Them
| Mistake | Problem | Fix |
|---|---|---|
| Too many tools | Model gets confused choosing between similar tools | Consolidate overlapping tools; keep toolset focused |
| Vague descriptions | Model calls wrong tool or passes wrong arguments | Write detailed descriptions with examples and edge cases |
| Giant catch-all tools | Single tool does too much; hard to use correctly | Split into focused, single-purpose tools |
| No error messages | Model can't recover when a tool fails | Return descriptive errors the model can reason about |
| Returning raw data | Huge outputs fill context window | Summarize or paginate large results |
| No rate limits | Runaway agent burns through API quota | Set per-tool and per-session limits |
| Trusting model input | SQL injection, path traversal, etc. | Validate and sanitize all inputs server-side |
| No timeout | Hanging tool blocks entire agent | Set timeouts on every tool execution |
Best Practices Summary
| Practice | Description |
|---|---|
| Name tools clearly | Use verb_noun format: search_documents, send_email |
| Write rich descriptions | Include what, when, returns, and edge cases |
| Use strict schemas | Leverage enums, required fields, and constraints |
| Categorize tools | Separate retrieval, computation, side effects |
| Validate inputs twice | JSON Schema validation + semantic validation |
| Sanitize outputs | Truncate, redact secrets, format for readability |
| Set budgets | Limit calls, cost, and execution time per session |
| Handle errors gracefully | Return errors the model can understand and act on |
| Parallelize independent calls | Use asyncio.gather for concurrent execution |
| Log everything | Every tool call, input, output, and error |
| Start small | Begin with 3-5 essential tools, add more as needed |
| Test with adversarial prompts | Try to break your tools before users do |
Tool use is the bridge between reasoning and action. Get it right, and your agent becomes genuinely useful. Get it wrong, and you've built an expensive liability. The tool registry pattern we built in this chapter gives you a solid foundation - now go build tools that let your agents change the world (safely).