
Building Your First Agent

Enough theory. Let's build something.

By the end of this chapter, you'll have a working AI agent - written from scratch, no frameworks - that can search the web, perform calculations, extract text from URLs, and synthesize findings into a coherent answer. More importantly, you'll understand every line of code and every design decision, so you can modify it for your own use cases.

We're building a research agent: give it a question, and it will autonomously search for information, read sources, and produce a well-reasoned answer. The kind of thing you'd use to research a topic before writing a report, compare product options, or investigate a technical question.


Prerequisites and Setup

You'll need:

  • Python 3.11+ (3.12 or 3.13 preferred)
  • An Anthropic API key (sign up at console.anthropic.com - you'll get free credits to start)
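  • A Brave Search API key for the web_search tool (or a key for another search provider, with web_search adapted to match)
  • A few Python packages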

Install dependencies

pip install anthropic httpx beautifulsoup4
| Package        | Purpose                                            |
|----------------|----------------------------------------------------|
| anthropic      | Official Anthropic Python SDK - for calling Claude  |
| httpx          | Modern HTTP client - for web requests              |
| beautifulsoup4 | HTML parser - for extracting text from web pages   |

Set your API key

export ANTHROPIC_API_KEY="sk-ant-your-key-here"
export BRAVE_API_KEY="your-brave-key-here"

Or, for a quick throwaway experiment only, in Python:

import os
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-your-key-here"
Warning

Never hardcode API keys in your source files. Use environment variables or a secrets manager. If you accidentally commit a key to git, rotate it immediately - bots scrape public repos for exposed credentials within minutes.


Architecture Overview

Here's what we're building:

┌──────────────────────────────────────────────────┐
│                  RESEARCH AGENT                  │
│                                                  │
│  ┌──────────┐     ┌──────────────────┐           │
│  │  Claude  │◄───►│    Agent Loop    │           │
│  │  (LLM)   │     │  (orchestrator)  │           │
│  └──────────┘     └────────┬─────────┘           │
│                            │                     │
│              ┌─────────────┼─────────────┐       │
│              │             │             │       │
│         ┌────▼────┐   ┌────▼────┐   ┌────▼────┐  │
│         │   Web   │   │Calculate│   │ Extract │  │
│         │ Search  │   │         │   │  Text   │  │
│         └─────────┘   └─────────┘   └─────────┘  │
└──────────────────────────────────────────────────┘

The components:

  1. Tools - Functions the agent can call (web search, calculator, text extraction)
  2. Tool Registry - Schema definitions that tell the LLM what tools are available and how to call them
  3. Agent Loop - The orchestrator that sends messages to the LLM, executes tool calls, and feeds results back
  4. Output Handling - Parsing and presenting the final result

Let's build each piece.


Step 1: Define the Tools

Tools are just Python functions. The key is making them robust - they'll be called by an LLM that might pass unexpected inputs, so defensive coding matters.

import httpx
from bs4 import BeautifulSoup
import json
import math
import os
import re

def web_search(query: str) -> str:
    """Search the web and return top results.

    Uses the Brave Search API (set BRAVE_API_KEY). Other providers such as
    Serper or Tavily work similarly; adapt the request and parsing to match.
    """
    try:
        # Using Brave Search API as an example
        # Replace with your preferred search provider
        headers = {
            "Accept": "application/json",
            "Accept-Encoding": "gzip",
            "X-Subscription-Token": os.environ.get("BRAVE_API_KEY", ""),
        }
        resp = httpx.get(
            "https://api.search.brave.com/res/v1/web/search",
            params={"q": query, "count": 5},
            headers=headers,
            timeout=10.0,
        )
        resp.raise_for_status()
        results = resp.json().get("web", {}).get("results", [])

        formatted = []
        for r in results[:5]:
            formatted.append(
                f"Title: {r['title']}\n"
                f"URL: {r['url']}\n"
                f"Snippet: {r.get('description', 'No description')}\n"
            )
        return "\n---\n".join(formatted) if formatted else "No results found."

    except Exception as e:
        return f"Search error: {str(e)}"


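def calculate(expression: str) -> str:
    """Evaluate a mathematical expression with a restricted set of names.

    Supports basic arithmetic, powers, sqrt, trig functions, etc.
    Caution: eval() with empty builtins is not a true sandbox, so don't rely
    on this as a security boundary for untrusted input.
    """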
    # Allow only safe math operations
    allowed_names = {
        "abs": abs, "round": round, "min": min, "max": max,
        "sqrt": math.sqrt, "pow": pow, "log": math.log,
        "log10": math.log10, "sin": math.sin, "cos": math.cos,
        "tan": math.tan, "pi": math.pi, "e": math.e,
    }

    try:
        # Strip everything except digits, operators, parentheses, commas,
        # whitespace, and lowercase letters/underscores (for function names)
        sanitized = re.sub(r'[^0-9+\-*/().,%\s a-z_]', '', expression.lower())
        result = eval(sanitized, {"__builtins__": {}}, allowed_names)
        return f"{expression} = {result}"
    except Exception as e:
        return f"Calculation error: {str(e)}"


def extract_text(url: str) -> str:
    """Fetch a URL and extract the main text content."""
    try:
        headers = {
            "User-Agent": (
                "Mozilla/5.0 (compatible; ResearchAgent/1.0; "
                "+https://example.com/bot)"
            )
        }
        resp = httpx.get(url, headers=headers, timeout=15.0, follow_redirects=True)
        resp.raise_for_status()

        soup = BeautifulSoup(resp.text, "html.parser")

        # Remove script, style, nav, footer elements
        for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
            tag.decompose()

        text = soup.get_text(separator="\n", strip=True)

        # Clean up excessive whitespace
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        text = "\n".join(lines)

        # Truncate to avoid blowing up context window
        max_chars = 8000
        if len(text) > max_chars:
            text = text[:max_chars] + "\n\n[... truncated for length]"

        return text

    except Exception as e:
        return f"Extraction error: {str(e)}"
Tip

Notice the max_chars truncation in extract_text. This is critical. Without it, a single web page could consume your entire context window, leaving no room for the agent to think. Always set sensible limits on tool output size.
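A related caveat about calculate: stripping __builtins__ and filtering characters does not make eval() a real sandbox. Attribute access such as ().__class__ passes the character filter, and an input like 9**9**9**9 can hang the process or exhaust memory. A sturdier alternative, sketched below rather than taken from the chapter's implementation, parses the expression with Python's ast module and evaluates only whitelisted node types. The name safe_calculate and the MAX_EXPONENT cap are assumptions for illustration.

```python
import ast
import math
import operator

MAX_EXPONENT = 1000  # assumption: cap exponents so inputs like 9**9**9**9 fail fast


def _checked_pow(base, exp):
    if abs(exp) > MAX_EXPONENT:
        raise ValueError("exponent too large")
    return operator.pow(base, exp)


# Whitelisted operators, functions, and constants (mirrors calculate's names)
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: _checked_pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {
    "abs": abs, "round": round, "min": min, "max": max, "pow": _checked_pow,
    "sqrt": math.sqrt, "log": math.log, "log10": math.log10,
    "sin": math.sin, "cos": math.cos, "tan": math.tan,
}
_CONSTS = {"pi": math.pi, "e": math.e}


def _eval_node(node: ast.AST):
    """Recursively evaluate a node, rejecting anything not whitelisted."""
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval_node(arg) for arg in node.args])
    # Attribute access, subscripts, strings, lambdas, etc. all end up here
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Like calculate(), but walks the parsed AST instead of calling eval()."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return f"{expression} = {_eval_node(tree)}"
    except Exception as e:  # SyntaxError, ValueError, ZeroDivisionError, ...
        return f"Calculation error: {e}"
```

It returns the same "expression = result" and "Calculation error: ..." strings as calculate, so trying it is a one-line change in the registry: TOOL_FUNCTIONS["calculate"] = safe_calculate.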


Step 2: Create the Tool Registry

The LLM needs to know what tools are available. Anthropic's API uses a specific JSON schema format for tool definitions. This is the "menu" the model reads to decide which tool to call.

TOOLS = [
    {
        "name": "web_search",
        "description": (
            "Search the web for current information. Use this when you need "
            "to find facts, recent data, or information you don't already know. "
            "Returns titles, URLs, and snippets from top results."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query. Be specific and include key terms.",
                }
            },
            "required": ["query"],
        },
    },
    {
        "name": "calculate",
        "description": (
            "Evaluate a mathematical expression. Supports arithmetic, powers, "
            "sqrt, trig functions, logarithms. Use this for any calculation "
            "rather than trying to compute in your head."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The math expression to evaluate, e.g., '(1500 * 0.03) + 200'",
                }
            },
            "required": ["expression"],
        },
    },
    {
        "name": "extract_text",
        "description": (
            "Fetch a web page and extract its text content. Use this when you "
            "have a specific URL and need to read the full article or page "
            "content. Returns cleaned text without HTML."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The full URL to fetch, e.g., 'https://example.com/article'",
                }
            },
            "required": ["url"],
        },
    },
]

# Map tool names to their implementations
TOOL_FUNCTIONS = {
    "web_search": web_search,
    "calculate": calculate,
    "extract_text": extract_text,
}

A few things worth noting about tool descriptions:

  1. Be specific about when to use each tool. The LLM uses the description to decide which tool fits the situation. Vague descriptions lead to wrong tool choices.
  2. Include examples in parameter descriptions. Showing the model what good input looks like often improves reliability noticeably.
  3. Describe what the tool returns. The model makes better decisions when it knows what to expect.
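Because TOOLS and TOOL_FUNCTIONS are maintained by hand, they can drift apart: rename a parameter in a function and forget its schema, and the model will keep sending arguments the function rejects. A small startup check catches this early. The sketch below uses the standard library's inspect module; check_registry is a hypothetical helper, not part of the Anthropic SDK.

```python
import inspect
from typing import Callable

_VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def check_registry(tools: list[dict], functions: dict[str, Callable]) -> list[str]:
    """Return human-readable mismatches between tool schemas and functions."""
    problems = []
    schema_names = {t["name"] for t in tools}
    fn_names = set(functions)
    for name in schema_names - fn_names:
        problems.append(f"schema '{name}' has no implementation")
    for name in fn_names - schema_names:
        problems.append(f"function '{name}' has no schema")

    for tool in tools:
        fn = functions.get(tool["name"])
        if fn is None:
            continue
        params = inspect.signature(fn).parameters
        props = set(tool["input_schema"].get("properties", {}))
        required = set(tool["input_schema"].get("required", []))
        # A **kwargs function accepts any property name
        if not any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
            for prop in props - set(params):
                problems.append(f"{tool['name']}: schema property '{prop}' is not a parameter")
        # Parameters without defaults must be marked required, or the model may omit them
        for pname, p in params.items():
            if p.kind not in _VARIADIC and p.default is inspect.Parameter.empty and pname not in required:
                problems.append(f"{tool['name']}: parameter '{pname}' is not marked required")
    return sorted(problems)
```

Calling check_registry(TOOLS, TOOL_FUNCTIONS) once at startup and refusing to run on a non-empty result turns a confusing runtime tool error into an immediate, readable failure.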

Step 3: Build the Agent Loop

This is the heart of the agent - the orchestrator that ties everything together. Pay attention to the error handling and the max-steps safety net.

import anthropic
import json
import os
import time

def run_agent(user_query: str, max_steps: int = 15, verbose: bool = True) -> str:
    """
    Run the research agent on a query.

    Args:
        user_query: The research question to answer.
        max_steps: Maximum number of tool-use cycles before stopping.
        verbose: If True, print each step for debugging.

    Returns:
        The agent's final answer as a string.
    """
    client = anthropic.Anthropic()  # Uses ANTHROPIC_API_KEY env var

    system_prompt = """You are a research agent. Your job is to answer the user's
question thoroughly and accurately by searching the web, reading sources, and
synthesizing information.

Guidelines:
- Always search for information rather than relying on memory for factual claims.
- Read the actual source (using extract_text) when a search snippet isn't detailed enough.
- Use the calculator for any mathematical computations - don't estimate.
- Cite your sources with URLs when providing factual information.
- If you can't find reliable information, say so rather than guessing.
- Aim for comprehensive but concise answers."""

    messages = [{"role": "user", "content": user_query}]

    for step in range(max_steps):
        if verbose:
            print(f"\n{'='*60}")
            print(f"Step {step + 1}/{max_steps}")
            print(f"{'='*60}")

        # Call Claude with the current conversation and available tools.
        # On an API error, pause briefly and retry once before giving up.
        for attempt in range(2):
            try:
                response = client.messages.create(
                    model="claude-sonnet-4-20250514",
                    max_tokens=4096,
                    system=system_prompt,
                    tools=TOOLS,
                    messages=messages,
                )
                break
            except anthropic.APIError as e:
                if attempt == 1:
                    return f"Agent failed after retry: {e}"
                print(f"API Error: {e}")
                time.sleep(2)

        # Process the response
        # Claude can return multiple content blocks - text and/or tool_use
        assistant_message = {"role": "assistant", "content": response.content}
        messages.append(assistant_message)

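        # Done unless Claude asked for tools. "end_turn" is the normal finish;
        # other stop reasons (e.g. "max_tokens") have no tool calls to answer,
        # so they are treated as final too rather than looping.
        if response.stop_reason != "tool_use":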
            # Extract the text content from the final response
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text
            if verbose:
                print(f"\nAgent finished. Final answer length: {len(final_text)} chars")
            return final_text

        # Execute any tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_name = block.name
                tool_input = block.input

                if verbose:
                    print(f"\nTool call: {tool_name}")
                    print(f"Input: {json.dumps(tool_input, indent=2)}")

                # Execute the tool
                if tool_name in TOOL_FUNCTIONS:
                    try:
                        result = TOOL_FUNCTIONS[tool_name](**tool_input)
                    except Exception as e:
                        result = f"Tool execution error: {str(e)}"
                else:
                    result = f"Unknown tool: {tool_name}"

                if verbose:
                    preview = result[:200] + "..." if len(result) > 200 else result
                    print(f"Result: {preview}")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # Feed tool results back to the conversation
        if tool_results:
            messages.append({"role": "user", "content": tool_results})

    return "Agent reached maximum steps without producing a final answer."

Let's break down the key design decisions:

Max steps safety net. Without this, a confused agent could loop forever, burning through your API budget. 15 steps is generous for most research queries - if the agent hasn't found an answer by then, something is wrong.

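Retry logic. API calls fail. Networks hiccup. Rate limits trigger. A single retry with a brief delay gets past many transient failures. The Anthropic Python SDK also retries some errors on its own before raising (connection errors and 408, 409, 429, and 5xx responses; two retries by default, configurable with max_retries when creating the client), so this loop-level retry is a second line of defense.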
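If you want more control than one fixed two-second pause, a common pattern is exponential backoff with jitter. Here is a generic sketch; with_retries and its default values are illustrative choices, not part of any SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on retry_on exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays of 1s, 2s, 4s, ... (capped), plus random jitter so that
            # many clients hitting the same rate limit don't retry in lockstep
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

In run_agent the model call would become something like with_retries(lambda: client.messages.create(...), retry_on=(anthropic.APIConnectionError, anthropic.RateLimitError)). Limiting retries to connection and rate-limit errors is deliberate: a malformed request fails the same way every time. The sleep parameter exists so tests can skip the real waits.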

Verbose mode. When developing an agent, you need to see what it's doing at each step. Print the tool calls and results. When it misbehaves (and it will), the trace is how you diagnose the problem.

Stop reason check. Anthropic's API signals stop_reason == "end_turn" when the model has finished its response without requesting a tool call. This is how we know the agent is done and has produced its final answer.
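It helps to see what the messages list actually looks like after one round trip. Written out as plain dicts (the SDK accepts dicts as well as the block objects the loop appends from response.content), a single tool round looks roughly like this. The question, text, and tool_use ID are made up for illustration.

```python
# The conversation after one tool round, written as plain dicts.
messages = [
    {"role": "user", "content": "What is 12 * 34?"},
    # Claude's turn: optional text plus a tool_use block (stop_reason == "tool_use")
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "I'll use the calculator."},
            {
                "type": "tool_use",
                "id": "toolu_example_01",  # hypothetical; real IDs come from the API
                "name": "calculate",
                "input": {"expression": "12 * 34"},
            },
        ],
    },
    # The loop's reply: a user turn holding one tool_result per tool_use block,
    # linked back to the request by tool_use_id
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_example_01",
                "content": "12 * 34 = 408",
            }
        ],
    },
]
# The next messages.create(...) call resends this whole list. If Claude can now
# answer without another tool, its reply comes back with stop_reason "end_turn".
```

The alternation matters: every tool_use in an assistant turn needs a matching tool_result in the very next user turn, which is exactly what the loop's tool_results list provides.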

Warning

The max_steps parameter isn't just about cost - it's about safety. An agent in a loop can take real-world actions repeatedly. Imagine a bug in a customer-support agent that keeps issuing refunds in a loop. Always set hard limits on agent execution.


Step 4: Add Structured Output Parsing

Sometimes you want the agent's output in a structured format - not just free text. For example, you might want a JSON object with specific fields. Here's a utility to request and parse structured output:

def run_agent_structured(
    user_query: str,
    output_schema: dict,
    max_steps: int = 15,
) -> dict:
    """
    Run the agent and parse the final output as structured JSON.

    Args:
        user_query: The research question.
        output_schema: A description of the expected JSON output format.
        max_steps: Maximum tool-use cycles.

    Returns:
        Parsed JSON dict from the agent's response.
    """
    structured_query = f"""{user_query}

IMPORTANT: Your final answer MUST be valid JSON matching this schema:
{json.dumps(output_schema, indent=2)}

Return ONLY the JSON object in your final response, no other text."""

    raw_result = run_agent(structured_query, max_steps=max_steps, verbose=False)

    # Extract JSON from the response, trying progressively looser strategies:
    # the whole response, then a fenced code block, then the outermost {...} span
    candidates = [raw_result]
    fence = re.search(r'```(?:json)?\s*(.*?)```', raw_result, re.DOTALL)
    if fence:
        candidates.append(fence.group(1).strip())
    braces = re.search(r'\{.*\}', raw_result, re.DOTALL)
    if braces:
        candidates.append(braces.group(0))

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue  # fall through to the next, looser strategy
    raise ValueError(f"Could not parse JSON from response: {raw_result[:500]}")
Note

Structured output parsing is inherently fragile when you're parsing free text. In production, use the Anthropic API's built-in structured output support or add validation and retries. The code above is a starting point, not a production solution.
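One widely used alternative is to let the tool-use mechanism enforce the shape: define a single "tool" whose input_schema is your output schema, and force the model to call it with tool_choice. The tool's input then arrives as an already-parsed dict. In this sketch the helper names and the tool name record_answer are hypothetical, and output_schema must be a real JSON Schema object rather than the informal description run_agent_structured accepts.

```python
from typing import Any

ANSWER_TOOL = "record_answer"  # hypothetical tool name; any valid name works


def structured_request_kwargs(prompt: str, output_schema: dict, model: str) -> dict[str, Any]:
    """Build keyword arguments for client.messages.create that force a
    single call to an answer tool whose input_schema is output_schema."""
    return {
        "model": model,
        "max_tokens": 1024,
        "tools": [{
            "name": ANSWER_TOOL,
            "description": "Record the final answer in the required structure.",
            "input_schema": output_schema,
        }],
        # Force the model to call this specific tool instead of replying in text
        "tool_choice": {"type": "tool", "name": ANSWER_TOOL},
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_structured(response: Any) -> dict:
    """Return the input of the forced tool call as a dict."""
    for block in response.content:
        if block.type == "tool_use" and block.name == ANSWER_TOOL:
            return block.input
    raise ValueError("response did not contain the expected tool call")
```

With a real client this would be used as response = client.messages.create(**structured_request_kwargs(prompt, schema, "claude-sonnet-4-20250514")) followed by extract_structured(response). Because forcing a tool means that call can't also do research, it fits best as a final formatting step after run_agent has gathered the facts. Validate the returned dict if correctness matters.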


Step 5: Run It End-to-End

Let's see the agent in action with a real query. Here's the main entry point:

if __name__ == "__main__":
    query = (
        "What are the top 3 most energy-efficient programming languages "
        "based on recent benchmarks? Compare their energy consumption "
        "and also compare their performance in terms of execution speed."
    )

    print(f"Query: {query}\n")
    print("Starting agent...\n")

    result = run_agent(query, verbose=True)

    print("\n" + "=" * 60)
    print("FINAL ANSWER")
    print("=" * 60)
    print(result)

Example trace

Here's what a typical run looks like (condensed for readability):

Query: What are the top 3 most energy-efficient programming languages...

Starting agent...

============================================================
Step 1/15
============================================================

Tool call: web_search
Input: {"query": "most energy efficient programming languages benchmark 2024 2025"}
Result: Title: Energy Efficiency across Programming Languages - 2024 Update
URL: https://example.com/energy-benchmarks
Snippet: A comprehensive study measuring energy consumption across 27 prog...

============================================================
Step 2/15
============================================================

Tool call: extract_text
Input: {"url": "https://example.com/energy-benchmarks"}
Result: Energy Efficiency across Programming Languages
Updated January 2024
This study measures the energy consumption, execution time, and memory usage...

============================================================
Step 3/15
============================================================

Tool call: web_search
Input: {"query": "programming language energy consumption comparison C Rust Java"}
Result: ...

============================================================
Step 4/15
============================================================

Tool call: calculate
Input: {"expression": "76.04 / 49.20"}
Result: 76.04 / 49.20 = 1.5455284552845529

============================================================
Step 5/15
============================================================

Agent finished. Final answer length: 1847 chars

============================================================
FINAL ANSWER
============================================================
Based on recent benchmarks, the top 3 most energy-efficient
programming languages are:

1. **C** - Baseline energy consumption (1.00x). Fastest execution...
2. **Rust** - 1.03x energy consumption relative to C...
3. **C++** - 1.34x energy consumption relative to C...

[Full formatted answer with comparisons and source citations]

Notice the agent's behavior: it searched for benchmarks, read the actual source for detailed data, searched again for additional context, used the calculator for a comparison ratio, and then synthesized everything into a clear answer. That's the agent loop in action.


Full Working Code Listing

Here's the complete agent in one file, ready to copy and run:

#!/usr/bin/env python3
"""A research agent built from scratch - no frameworks."""

import anthropic
import httpx
from bs4 import BeautifulSoup
import json
import math
import os
import re

# ── Tools ────────────────────────────────────────────────────

def web_search(query: str) -> str:
    """Search the web using Brave Search API."""
    try:
        headers = {"Accept": "application/json",
                   "X-Subscription-Token": os.environ.get("BRAVE_API_KEY", "")}
        resp = httpx.get("https://api.search.brave.com/res/v1/web/search",
                         params={"q": query, "count": 5},
                         headers=headers, timeout=10.0)
        resp.raise_for_status()
        results = resp.json().get("web", {}).get("results", [])
        return "\n---\n".join(
            f"Title: {r['title']}\nURL: {r['url']}\nSnippet: {r.get('description', 'N/A')}"
            for r in results[:5]
        ) or "No results found."
    except Exception as e:
        return f"Search error: {e}"

def calculate(expression: str) -> str:
    """Safely evaluate a math expression."""
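    safe = {"abs": abs, "round": round, "min": min, "max": max,
            "sqrt": math.sqrt, "pow": pow, "log": math.log,
            "log10": math.log10, "sin": math.sin, "cos": math.cos,
            "tan": math.tan, "pi": math.pi, "e": math.e}
    try:
        # Allow commas so multi-argument calls like pow(2, 3) survive sanitizing
        sanitized = re.sub(r'[^0-9+\-*/().,%\s a-z_]', '', expression.lower())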
        return f"{expression} = {eval(sanitized, {'__builtins__': {}}, safe)}"
    except Exception as e:
        return f"Calculation error: {e}"

def extract_text(url: str) -> str:
    """Fetch and extract text from a URL."""
    try:
        resp = httpx.get(url, headers={"User-Agent": "ResearchAgent/1.0"},
                         timeout=15.0, follow_redirects=True)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for tag in soup(["script", "style", "nav", "footer", "header", "aside"]):
            tag.decompose()
        text = "\n".join(line.strip() for line in soup.get_text("\n", strip=True).splitlines() if line.strip())
        return text[:8000] + "\n[truncated]" if len(text) > 8000 else text
    except Exception as e:
        return f"Extraction error: {e}"

# ── Tool registry ────────────────────────────────────────────

TOOLS = [
    {"name": "web_search",
     "description": "Search the web for current information. Returns titles, URLs, and snippets.",
     "input_schema": {"type": "object", "properties": {
         "query": {"type": "string", "description": "Search query"}}, "required": ["query"]}},
    {"name": "calculate",
     "description": "Evaluate a math expression. Example: '(1500 * 0.03) + 200'",
     "input_schema": {"type": "object", "properties": {
         "expression": {"type": "string", "description": "Math expression to evaluate"}}, "required": ["expression"]}},
    {"name": "extract_text",
     "description": "Fetch a URL and extract its text content for reading.",
     "input_schema": {"type": "object", "properties": {
         "url": {"type": "string", "description": "URL to fetch"}}, "required": ["url"]}},
]
TOOL_FN = {"web_search": web_search, "calculate": calculate, "extract_text": extract_text}

# ── Agent loop ───────────────────────────────────────────────

def run_agent(query: str, max_steps: int = 15, verbose: bool = True) -> str:
    client = anthropic.Anthropic()
    system = ("You are a research agent. Search the web for facts, read sources, "
              "use the calculator for math, and cite URLs in your final answer.")
    messages = [{"role": "user", "content": query}]

    for step in range(max_steps):
        if verbose: print(f"\n── Step {step+1}/{max_steps} ──")
        response = client.messages.create(model="claude-sonnet-4-20250514", max_tokens=4096,
                                          system=system, tools=TOOLS, messages=messages)
        messages.append({"role": "assistant", "content": response.content})

        # Anything other than a tool request (normally "end_turn") is the final answer
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if hasattr(b, "text"))

        results = []
        for block in response.content:
            if block.type == "tool_use":
                if verbose: print(f"  Tool: {block.name}({json.dumps(block.input)})")
                fn = TOOL_FN.get(block.name)
                try:
                    result = fn(**block.input) if fn else f"Unknown tool: {block.name}"
                except Exception as e:
                    result = f"Error: {e}"
                if verbose: print(f"  Result: {result[:150]}...")
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})

        if results:
            messages.append({"role": "user", "content": results})

    return "Max steps reached."

if __name__ == "__main__":
    print(run_agent("What is the population of the 5 largest cities in Japan?"))

That's approximately 80 lines of meaningful code (excluding comments). A fully functional agent.


Testing Your Agent

Testing non-deterministic systems is genuinely hard. The same query can produce different results on different runs. Here's a practical approach:

1. Test tools independently

calculate is a pure function; web_search and extract_text depend on the network. All three can still be tested in isolation:

def test_calculate():
    assert "= 42" in calculate("6 * 7")
    assert "= 3.0" in calculate("sqrt(9)")
    assert "error" in calculate("invalid!!").lower()

def test_extract_text():
    # Needs network access; example.com is a small, stable page
    text = extract_text("https://example.com")
    assert len(text) > 0
    assert "Example Domain" in text

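2. Test tool selection with mocked tools

This test still makes real LLM calls (so it needs an API key and costs a little); it only swaps in a fake web_search, which lets you check that the agent chose to search:
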
def test_agent_uses_search_for_factual_query():
    """Verify the agent calls web_search for a factual question."""
    tool_calls = []
    original_search = TOOL_FN["web_search"]

    def mock_search(query):
        tool_calls.append(query)
        return "Mock result: The answer is 42."

    TOOL_FN["web_search"] = mock_search
    try:
        run_agent("What is the GDP of France?", max_steps=3, verbose=False)
        assert len(tool_calls) > 0, "Agent should have called web_search"
    finally:
        TOOL_FN["web_search"] = original_search
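To exercise the loop's control flow with no network traffic at all, you need to fake the model itself. Since run_agent constructs its own client with anthropic.Anthropic(), one option is a variant of the loop that accepts the client as an argument. In the sketch below, run_agent_with_client, FakeClient, and the helper constructors are hypothetical names, and the loop is a condensed version of the chapter's run_agent.

```python
from types import SimpleNamespace
from typing import Any, Callable


class FakeClient:
    """Stands in for anthropic.Anthropic(): replays scripted responses."""

    def __init__(self, responses: list[Any]):
        self._responses = list(responses)
        self.calls: list[dict] = []
        # Mirror the SDK's client.messages.create(...) call shape
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs: Any) -> Any:
        # Snapshot messages: the loop keeps appending to the same list
        self.calls.append({**kwargs, "messages": list(kwargs["messages"])})
        return self._responses.pop(0)


def text_block(text: str) -> Any:
    return SimpleNamespace(type="text", text=text)


def tool_block(id_: str, name: str, input_: dict) -> Any:
    return SimpleNamespace(type="tool_use", id=id_, name=name, input=input_)


def run_agent_with_client(client: Any, query: str, tools: dict[str, Callable[..., str]],
                          max_steps: int = 5) -> str:
    """Condensed agent loop with the client injected instead of constructed."""
    messages: list[dict] = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        response = client.messages.create(model="fake", max_tokens=1024, messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": tools[b.name](**b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Max steps reached."


# Script: the fake model first requests a search, then gives a final answer
client = FakeClient([
    SimpleNamespace(stop_reason="tool_use",
                    content=[tool_block("t1", "web_search", {"query": "GDP of France"})]),
    SimpleNamespace(stop_reason="end_turn", content=[text_block("Mock final answer.")]),
])
answer = run_agent_with_client(client, "What is the GDP of France?",
                               {"web_search": lambda query: "Mock result"})
```

Because client.calls records every request, you can assert not just on the final answer but on what the loop sent back to the model, for example that the second request ends with a tool_result for ID "t1".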

3. Evaluation-based testing

For the final output, use another LLM call to evaluate:

def evaluate_answer(query: str, answer: str) -> dict:
    """Use an LLM to evaluate the quality of an agent's answer."""
    client = anthropic.Anthropic()
    eval_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{"role": "user", "content": f"""
Rate this answer on three dimensions (1-5 each):
- Accuracy: Are the facts correct?
- Completeness: Does it fully address the question?
- Citations: Does it cite sources?

Question: {query}
Answer: {answer}

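Return ONLY a JSON object, with no other text:
{{"accuracy": N, "completeness": N, "citations": N, "notes": "..."}}
"""}]
    )
    text = eval_response.content[0].text.strip()
    # Models sometimes wrap JSON in a ```json fence despite instructions; strip it
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text)
    return json.loads(text)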
Tip

Build a small evaluation suite of 10-20 queries with expected behaviors (not expected outputs - those will vary). Run it regularly. Track the scores over time. This is the closest thing to a test suite you'll get for agent systems.
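A minimal harness for such a suite might look like the sketch below. It takes the agent and the evaluator as plain functions (in practice run_agent and evaluate_answer from above), runs each query, and averages the scores per dimension so you can compare runs over time. run_eval_suite is a hypothetical helper, and the score keys are assumed to match the evaluate_answer prompt.

```python
from statistics import mean
from typing import Callable

DIMENSIONS = ("accuracy", "completeness", "citations")


def run_eval_suite(
    queries: list[str],
    agent: Callable[[str], str],
    evaluator: Callable[[str, str], dict],
) -> dict[str, float]:
    """Run each query through the agent, score the answer, and average per dimension."""
    scores: dict[str, list[float]] = {d: [] for d in DIMENSIONS}
    failures = 0
    for query in queries:
        try:
            answer = agent(query)
            result = evaluator(query, answer)
            row = {d: float(result[d]) for d in DIMENSIONS}  # KeyError if a score is missing
        except Exception as e:  # agent crash or unusable evaluation
            failures += 1
            print(f"FAILED {query!r}: {e}")
            continue
        for d in DIMENSIONS:
            scores[d].append(row[d])
    summary = {d: round(mean(v), 2) if v else 0.0 for d, v in scores.items()}
    summary["failures"] = failures
    return summary
```

Appending each summary, with a timestamp and the prompt or model version, to a log file gives you the score history the tip recommends tracking.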


Debugging Tips

When your agent misbehaves (and it will), here's your debugging playbook:

1. Enable verbose mode and read the trace

The most common issues are immediately obvious in the trace:

  • Agent calling the wrong tool? Your tool descriptions need work.
  • Agent stuck in a loop? It's probably not getting useful information from its tools and keeps retrying.
  • Agent ignoring tool results? The result might be too long and getting lost in the context.

2. Log everything

import json
import logging

logging.basicConfig(level=logging.DEBUG, filename="agent_trace.log")

# Call this from the agent loop right after each tool executes
def log_step(step: int, tool_name: str, tool_input: dict, result: str):
    logging.debug(json.dumps({
        "step": step,
        "tool": tool_name,
        "input": tool_input,
        "result_length": len(result),
        "result_preview": result[:500],
    }, indent=2))

3. Replay conversations

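Save the full messages list to a file. When something goes wrong, you can replay the conversation, feeding the exact same messages to the LLM to try to reproduce the issue. (Sampling is non-deterministic, so a replay won't always misbehave the same way.)

# Save after each run (e.g., inside run_agent, just before it returns).
# SDK content blocks are Pydantic models; model_dump() turns them into plain
# dicts, so the saved file can be loaded and sent to the API again.
with open("last_run.json", "w") as f:
    json.dump(messages, f, indent=2,
              default=lambda o: o.model_dump(exclude_none=True) if hasattr(o, "model_dump") else str(o))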

Common Pitfalls and How to Avoid Them

| Pitfall | Symptom | Fix |
|---|---|---|
| Infinite loops | Agent keeps calling the same tool with the same query | Set max_steps. Add a check: if the last 3 tool calls were identical, force stop. |
| Context overflow | API error about token limits, or agent "forgets" early steps | Truncate tool results. Summarize conversation history periodically. |
| Tool errors crashing the agent | Unhandled exception kills the loop | Wrap every tool call in try/except. Return error strings, not exceptions. |
| Wrong tool selection | Agent uses calculator when it should search, etc. | Improve tool descriptions. Add "Use this when..." and "Do NOT use this for..." |
| Hallucinated tool names | Agent tries to call a tool that doesn't exist | Check that tool_name is in TOOL_FUNCTIONS before calling. Return a clear error. |
| Overly verbose answers | Agent writes a 3,000-word essay for a simple question | Add instructions to the system prompt: "Be concise. Match your answer length to the question complexity." |
| Expensive runs | $2+ per query from too many LLM calls | Lower max_steps. Use a cheaper model for simple tool-selection steps. Cache search results. |
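The "last 3 tool calls were identical" check suggested for infinite loops takes only a few lines. This sketch (RepeatDetector is a hypothetical name) records each call as a (tool name, canonical JSON input) pair. In the agent loop you would call record() for each tool_use block and, when it returns True, either stop or send the model a tool_result telling it to try a different approach.

```python
import json
from collections import deque


class RepeatDetector:
    """Flags when the last `window` tool calls were exactly the same."""

    def __init__(self, window: int = 3):
        self.window = window
        self.recent: deque[tuple[str, str]] = deque(maxlen=window)

    def record(self, tool_name: str, tool_input: dict) -> bool:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal
        key = (tool_name, json.dumps(tool_input, sort_keys=True))
        self.recent.append(key)
        return len(self.recent) == self.window and len(set(self.recent)) == 1
```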

Making It Production-Ready

Our agent works, but it's a prototype. Here's what a production version would need:

Must-have for production

  • Authentication and rate limiting - Don't let users hammer your API
  • Input validation and sanitization - Users will send adversarial inputs
  • Structured logging and observability - You need to know what your agents are doing at scale
  • Cost tracking - Every LLM call costs money; track per-query costs
  • Timeout handling - Both per-tool and per-query timeouts
  • Graceful degradation - If a tool is down, the agent should work around it, not crash
  • Conversation memory - For multi-turn interactions, you need persistent state
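Cost tracking can start very small. Every Messages API response carries a usage object with input_tokens and output_tokens, so a per-query tally is just a sum across the loop. The per-million-token prices in this sketch are constructor arguments you fill in yourself, not actual rates; look up current pricing for your model. CostTracker is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    """Accumulates token usage across all LLM calls made for one query."""

    # Prices in USD per million tokens: supply real values for your model
    input_price_per_mtok: float
    output_price_per_mtok: float
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def add(self, usage) -> None:
        """Record one response's usage (e.g. response.usage from the SDK)."""
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.calls += 1

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * self.input_price_per_mtok
                + self.output_tokens * self.output_price_per_mtok) / 1_000_000
```

In run_agent you would call tracker.add(response.usage) right after each messages.create call and log tracker.cost_usd alongside the query. Expect input tokens to dominate on long runs: the whole conversation is resent on every step, so each extra step costs more than the one before.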

Nice-to-have

  • Caching - Same search query shouldn't hit the API twice
  • Streaming - Show the user what the agent is doing in real-time
  • Human-in-the-loop - Pause for approval before high-stakes actions
  • A/B testing - Compare different system prompts, models, or tool configurations
  • Fallback models - If your primary model is down, fall back to an alternative
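Caching is the easiest of these to try. Within a single run, repeating an identical search or fetch rarely tells the agent anything new, so a small in-memory cache keyed on the argument avoids paying for it twice. This sketch wraps any one-argument tool; cached_tool is a hypothetical helper, and the cache has no expiry, an assumption you would revisit before sharing it across long-lived sessions where results go stale.

```python
from functools import wraps
from typing import Callable

# Error prefixes returned by this chapter's tools; failures are never cached
_ERROR_PREFIXES = ("Search error:", "Extraction error:")


def cached_tool(fn: Callable[..., str], max_entries: int = 256) -> Callable[..., str]:
    """Wrap a one-argument tool with a small in-memory, insertion-ordered cache."""
    cache: dict[str, str] = {}

    @wraps(fn)
    def wrapper(*args: str, **kwargs: str) -> str:
        (value,) = (*args, *kwargs.values())  # the tools here take exactly one argument
        key = " ".join(value.split())  # collapse runs of whitespace
        if key in cache:
            return cache[key]
        result = fn(*args, **kwargs)
        if not result.startswith(_ERROR_PREFIXES):
            if len(cache) >= max_entries:
                cache.pop(next(iter(cache)))  # evict the oldest entry (FIFO)
            cache[key] = result
        return result

    return wrapper
```

Because the wrapper passes keyword arguments through, it works with the agent's TOOL_FUNCTIONS[name](**tool_input) call, e.g. TOOL_FUNCTIONS["web_search"] = cached_tool(web_search).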
Note

Don't try to build all of this before you've validated that your agent is useful. Ship the prototype. Get users. Then harden based on real failure modes, not imagined ones.


Exercise: Extend the Agent

Now it's your turn. Add a fourth tool to the agent: a date/time tool that can answer questions like "What day of the week was January 15, 2024?" or "How many days between March 1 and December 25?"

Here's your starting point:

def date_tool(operation: str, **kwargs) -> str:
    """Perform date/time operations.

    Operations:
    - "day_of_week": What day of the week is/was a given date?
    - "days_between": How many days between two dates?
    - "add_days": What date is N days from a given date?
    """
    from datetime import datetime, timedelta
    # Your implementation here
    pass

Tasks:

  1. Implement the date_tool function with all three operations
  2. Write the tool schema (name, description, input_schema) and add it to TOOLS
  3. Register it in TOOL_FN
  4. Test it with the query: "If a project started on January 15, 2025, and the deadline is 90 business days later, what date is the deadline? What day of the week does it fall on?"

This exercise forces you to think about tool design - what parameters does the LLM need to pass? How do you handle edge cases? What makes a good tool description?
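The business-days part of task 4 is a good example of that: add_days as specified counts calendar days, so the tool needs something extra. One way to handle it, assuming business days are Monday to Friday and ignoring public holidays (a real tool would need a holiday calendar), is a helper like this; add_business_days is a hypothetical name, not part of the starting code.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int) -> date:
    """Return the date n business days after start (Mon-Fri, no holidays)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current
```

Whether the LLM passes dates as ISO strings or separate fields is up to your schema; date.fromisoformat("2025-01-15") is a simple way to accept strings.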


What's Next

You now have a working agent and, more importantly, you understand how it works from the inside out. No magic, no black boxes - just a loop that asks an LLM what to do, does it, and reports back.

In the next chapter, we'll take this foundation and make it significantly more powerful by adding memory - giving your agent the ability to remember past conversations, learn from mistakes, and build up knowledge over time. We'll also explore how to manage context windows effectively when your agent needs to handle long, complex tasks.

The journey from "toy agent" to "useful agent" is mostly about the details: better error handling, smarter tool design, and thoughtful guardrails. You've got the foundation. Now let's build on it.