What Are AI Agents?

Imagine asking your computer: "Book me a round-trip flight to Tokyo for the last week of April, find a hotel near Shibuya under $200/night, and build a day-by-day itinerary that includes at least two Michelin-starred ramen shops." Then you walk away, make coffee, and come back to find it done - flights booked, hotel confirmed, itinerary in your inbox with reservation links.

That is not science fiction anymore. That is the promise - and increasingly the reality - of AI agents.

But what exactly is an AI agent? How is it different from ChatGPT, GitHub Copilot, or the autocomplete in your IDE? And why is everyone in tech suddenly obsessed with them?

Let's break it all down.


Defining AI Agents

An AI agent is autonomous software that can perceive its environment, decide what to do, and act on those decisions - often across multiple steps - to accomplish a goal.

That definition has three critical parts:

  1. Perceive - The agent takes in information. This could be a user's message, the contents of a web page, sensor data, an API response, or the result of running code.
  2. Decide - Using a reasoning engine (today, almost always a large language model), the agent figures out what to do next. This is where planning, prioritization, and judgment happen.
  3. Act - The agent executes actions in the real world: calling APIs, writing files, sending emails, running code, clicking buttons.

The key differentiator is autonomy. A traditional program does exactly what you tell it. A chatbot answers one question at a time. An agent pursues a goal across multiple steps, deciding on its own which actions to take and in what order.

"An agent is a system that uses an LLM to decide the control flow of an application." - Harrison Chase, creator of LangChain

That quote captures it well. The LLM isn't just generating text - it's driving the program's execution.
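
One way to see the difference is to put a fixed pipeline next to a model-driven one. The sketch below is illustrative only: `fake_model` is a hypothetical stand-in for an LLM call, and the step names are invented for the example.

```python
from typing import Callable

# Hypothetical steps; in a real system these would call APIs or run code.
STEPS: dict[str, Callable[[str], str]] = {
    "search": lambda state: state + " -> searched",
    "summarize": lambda state: state + " -> summarized",
}


def fixed_pipeline(task: str) -> str:
    """Traditional program: the developer hard-codes the order of steps."""
    state = task
    for name in ("search", "summarize"):
        state = STEPS[name](state)
    return state


def fake_model(state: str) -> str:
    """Stand-in for an LLM (an assumption of this sketch): picks the next step."""
    if "searched" not in state:
        return "search"
    if "summarized" not in state:
        return "summarize"
    return "done"


def model_driven(task: str, choose_next: Callable[[str], str], max_steps: int = 5) -> str:
    """Agent-style program: the model's output decides which step runs next."""
    state = task
    for _ in range(max_steps):
        choice = choose_next(state)
        if choice == "done":
            break
        state = STEPS[choice](state)
    return state
```

Both functions reach the same result here, but only in `model_driven` could a different model response change the order of steps, skip one, or stop early.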


The Agent Loop

Every AI agent, regardless of how sophisticated it is, runs on some variation of the same fundamental loop. Understanding this loop is understanding agents.

Here's the cycle:

┌─────────────────────────────────────┐
│           USER GIVES GOAL           │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│         OBSERVE / PERCEIVE          │◄──────────┐
│   (read inputs, tool results, etc.) │           │
└──────────────┬──────────────────────┘           │
               │                                  │
               ▼                                  │
┌─────────────────────────────────────┐           │
│        THINK / REASON / PLAN        │           │
│   (LLM decides the next action)     │           │
└──────────────┬──────────────────────┘           │
               │                                  │
               ▼                                  │
┌─────────────────────────────────────┐           │
│              ACT                    │           │
│   (call tool, write code, search)   │───────────┘
└──────────────┬──────────────────────┘
               │
               ▼ (when goal is met or max steps reached)
┌─────────────────────────────────────┐
│         RETURN RESULT               │
└─────────────────────────────────────┘

And here's that same loop expressed as a Python sketch. The llm client and the execute_tool function are passed in as parameters; their interfaces (chat, is_final_answer, tool_call) are placeholders, not any specific SDK's API:

from typing import Any, Callable

def agent_loop(
    goal: str,
    tools: list,
    llm: Any,
    execute_tool: Callable[[str, dict], str],
    max_steps: int = 10,
) -> str:
    """The fundamental agent loop."""
    messages = [{"role": "user", "content": goal}]

    for _ in range(max_steps):
        # THINK: Ask the LLM what to do next
        response = llm.chat(messages, tools=tools)

        # CHECK: Is the agent done?
        if response.is_final_answer:
            return response.content

        # ACT: Execute the tool the LLM chose
        tool_name = response.tool_call.name
        tool_args = response.tool_call.arguments
        result = execute_tool(tool_name, tool_args)

        # OBSERVE: Record the model's turn (its text, not the response object)
        # and the tool result so the next iteration can see both
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "tool", "content": result})

    return "Max steps reached without completing the goal."

That's it. Every agent you've ever used - Claude Code, Devin, AutoGPT, Perplexity - is a more sophisticated version of this loop. The sophistication comes from what tools are available, how good the reasoning is, and how the system handles errors and edge cases.
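
To watch the loop run end to end without an API key, here is a self-contained sketch. Everything in it is an assumption made for illustration: `scripted_llm` is a stand-in for a real model, `Reply` and `ToolCall` are simplified message types, and `calculator` is a toy tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Reply:
    """Simplified model reply: either a final answer or a tool call."""
    content: str = ""
    tool_call: Optional[ToolCall] = None


def calculator(expression: str) -> str:
    """Toy tool: adds integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS: dict[str, Callable[..., str]] = {"calculator": calculator}


def scripted_llm(messages: list[dict]) -> Reply:
    """Stand-in for a real model: calls the tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        return Reply(content=f"The answer is {last['content']}.")
    return Reply(tool_call=ToolCall("calculator", {"expression": "2+3"}))


def run_agent(goal: str, llm: Callable[[list[dict]], Reply], max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = llm(messages)                           # THINK
        if reply.tool_call is None:                     # CHECK: final answer?
            return reply.content
        call = reply.tool_call
        result = TOOLS[call.name](**call.arguments)     # ACT
        messages.append({"role": "assistant", "content": f"call {call.name}"})
        messages.append({"role": "tool", "content": result})  # OBSERVE
    return "Max steps reached without completing the goal."
```

Calling `run_agent("What is 2+3?", scripted_llm)` takes two passes through the loop: one to call the tool and one to turn the observation into an answer. Swapping `scripted_llm` for a real model client is the only change needed to make the decisions non-scripted.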

Tip

When you're building your first agent, start with this loop. Get it working with one tool. Then add complexity. Trying to build a sophisticated multi-agent system before you've internalized this loop is like trying to write a distributed system before you understand a function call.


Chatbots vs. Copilots vs. Agents

These three terms get thrown around interchangeably, and that causes real confusion. They are different things on a spectrum of autonomy.

| Dimension | Chatbot | Copilot | Agent |
|---|---|---|---|
| Interaction model | You ask, it answers | You work, it suggests | You set a goal, it works |
| Number of steps | Single turn (or simple multi-turn) | Inline suggestions, one at a time | Multiple steps, often dozens |
| Who drives? | The human | The human (with AI assist) | The AI (with human guardrails) |
| Tool use | Rarely | Sometimes (e.g., code completion) | Extensively - tools are core |
| Autonomy | Low | Medium | High |
| Error recovery | Tells you it can't help | Offers alternatives | Retries, replans, works around issues |
| Examples | ChatGPT (basic), customer support bots | GitHub Copilot, Gmail autocomplete | Claude Code, Devin, research agents |

The mental model I find most useful: a chatbot is an oracle you consult. A copilot is a pair programmer sitting next to you. An agent is a junior engineer you can delegate a task to and check on later.

Note

These categories exist on a spectrum, not as hard boundaries. Many products blend them. Claude, for example, acts as a chatbot when you ask it a question, but becomes an agent when you give it a multi-step task with tool access.


Why Agents Matter NOW

AI agents aren't a new idea. The concept goes back decades, at least to the expert systems of the 1970s and 1980s. So why is 2025 the year they're suddenly everywhere?

Because four developments converged at the same time:

1. LLMs Got Good Enough to Reason

Before GPT-4 (early 2023), language models could generate plausible-sounding text but struggled with multi-step reasoning. Modern models like Claude, GPT-4o, and Gemini can genuinely plan, decompose problems, and make judgment calls. Not perfectly - but well enough to be useful.

2. Function Calling / Tool Use Became Standard

In mid-2023, OpenAI introduced structured function calling. Anthropic, Google, and others followed. This gave models a reliable way to say "I want to call this specific tool with these specific arguments" instead of generating free-form text that you'd have to parse. This was the missing API primitive.
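
To make "this specific tool with these specific arguments" concrete, here is a minimal sketch of a tool definition and a dispatcher for a structured call. The `get_weather` tool is hypothetical, and the field names (`name`, `description`, `parameters`, `arguments`) follow a common JSON-Schema-style pattern rather than any one provider's exact format.

```python
import json

# Hypothetical tool definition. The "parameters" field uses JSON Schema
# conventions; exact field names vary between providers.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def get_weather(city: str, unit: str = "celsius") -> str:
    """Toy implementation that returns canned data instead of calling an API."""
    return f"18 degrees {unit} in {city}"


def dispatch(raw_call: str) -> str:
    """Parse a structured tool call emitted by the model and run the tool."""
    call = json.loads(raw_call)  # structured output: no free-text parsing
    if call["name"] != GET_WEATHER_SCHEMA["name"]:
        raise ValueError(f"Unknown tool: {call['name']}")
    args = call["arguments"]
    params = GET_WEATHER_SCHEMA["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    unknown = [key for key in args if key not in params["properties"]]
    if unknown:
        raise ValueError(f"Unexpected arguments: {unknown}")
    return get_weather(**args)
```

Because the call arrives as data, a malformed request fails with a clear error before anything runs. With free-form text, the same mistake would surface as a parsing bug.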

3. Context Windows Exploded

Agents need to keep track of what they've done. When context windows were 4K tokens, agents would "forget" what they were doing after a few steps. Today, 200K-token (and beyond) context windows mean agents can maintain coherent plans across dozens of actions.
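
The "forgetting" comes from having to drop old messages once the history no longer fits. The sketch below shows one simple policy: always keep the goal, then keep as many of the most recent messages as the budget allows. The four-characters-per-token estimate is a rough assumption; a real system would count with the model's own tokenizer and might summarize old steps instead of discarding them.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the first message (the goal) plus the newest messages that fit."""
    if not messages:
        return []
    goal, rest = messages[0], messages[1:]
    used = estimate_tokens(goal["content"])
    kept: list[dict] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return [goal] + list(reversed(kept))
```

With a small budget, only the last step or two survive alongside the goal, so the agent loses sight of earlier observations. A larger budget pushes that cutoff back by dozens of steps.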

4. The Ecosystem Caught Up

Frameworks like LangChain, CrewAI, AutoGen, and the Anthropic SDK matured. Observability tools (LangSmith, Braintrust, Arize) made agents debuggable. The infrastructure to build, test, and deploy agents went from research-grade to production-grade.

Warning

"Agents" is also the most overhyped term in AI right now. Many products slapping the "agent" label on their marketing are really just chatbots with a tool or two. Be skeptical. Ask: does it actually pursue multi-step goals autonomously? If not, it's not really an agent.


Historical Context: How We Got Here

AI agents didn't appear out of thin air. They're the culmination of decades of research.

1980s - Rule-Based Expert Systems: Programs in the tradition of MYCIN (a medical-diagnosis system first developed in the 1970s) used hand-coded if-then rules to make decisions. They were "agents" in the academic sense, but brittle - they could only handle situations their creators had anticipated.

1990s - Reinforcement Learning Agents: Researchers explored agents that learned through trial and error. TD-Gammon learned to play backgammon at a world-class level. These agents were powerful but narrow - trained for one specific task.

2000s - Multi-Agent Systems: Academic research on agents that could communicate and coordinate. Interesting theoretically, but limited by the AI capabilities of the time.

2010s - Virtual Assistants: Siri (2011), Alexa (2014), Google Assistant (2016). These felt like agents but were mostly sophisticated chatbots - they could handle simple commands but fell apart on complex, multi-step tasks.

2023 - The LLM Agent Explosion: ChatGPT plugins, AutoGPT (March 2023), BabyAGI, and a wave of agent frameworks. Most were demos that broke in practice, but they proved the concept.

2024-2025 - Production Agents: Claude Code, Devin, Cursor, Perplexity, and dozens of domain-specific agents that work reliably enough for production use. This period marks the transition from "cool demo" to "thing I use every day."


The Spectrum of Autonomy

Not every agent needs to (or should) operate with full autonomy. Think of it as a spectrum:

FULLY MANUAL ──── COPILOT ──── SEMI-AUTONOMOUS ──── FULLY AUTONOMOUS
    │                │                │                      │
 You do            AI suggests,     AI acts, you          AI acts,
 everything        you decide       approve key steps     you review after
                                                          (or not at all)

Fully Manual: Traditional software. You click every button, write every query.

Copilot: The AI watches what you're doing and offers suggestions. You accept or reject each one. Think GitHub Copilot for code completion.

Semi-Autonomous: The AI takes action but pauses at critical decision points for human approval. "I found 3 flights. Should I book the $450 direct one, or the $280 one with a layover?" This is where most production agents sit today - and for good reason.

Fully Autonomous: The AI does everything end-to-end without human intervention. "Trip booked. Confirmation in your email." This is the goal for low-stakes tasks but still risky for high-stakes ones.

Tip

Start semi-autonomous. Always. You can remove human checkpoints later once you've built trust in the system. Adding them after your agent has already booked a $5,000 non-refundable flight is not a fun conversation to have.
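
A human checkpoint can be as small as one function that sits between the model's decision and the tool call. This is a minimal sketch under stated assumptions: the tool names in `REQUIRES_APPROVAL` are hypothetical, and `approve` is any callable that returns True or False, so it could be a console prompt, a chat button, or a ticket queue.

```python
from typing import Callable

# Hypothetical policy: which tools need a human sign-off before running.
REQUIRES_APPROVAL = {"book_flight", "send_email"}


def run_action(
    name: str,
    args: dict,
    tools: dict[str, Callable[..., str]],
    approve: Callable[[str, dict], bool],
) -> str:
    """Run a tool, pausing for approval when the action is high-stakes."""
    if name in REQUIRES_APPROVAL and not approve(name, args):
        return f"Action '{name}' was rejected by the reviewer."
    return tools[name](**args)


def console_approve(name: str, args: dict) -> bool:
    """Ask a person at the terminal; any answer other than 'y' rejects."""
    return input(f"Run {name} with {args}? [y/N] ").strip().lower() == "y"
```

Removing a checkpoint later is then a one-line change: delete the tool's name from `REQUIRES_APPROVAL`. The rejection message is returned as an ordinary observation, so the agent can replan instead of crashing.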


Real-World Examples of AI Agents

Let's ground this in reality. Here are agents people are actually using in 2025:

Coding Agents

  • Claude Code - Anthropic's CLI agent that can read your codebase, write code, run tests, fix bugs, and create commits. It operates in your terminal with access to your file system and shell.
  • Cursor - An IDE with deep agent integration. It can make multi-file edits, explain code, and iterate based on test results.
  • Devin - Cognition's "AI software engineer" that can handle full development tasks in a sandboxed environment.

Research Agents

  • Perplexity - Searches the web, reads multiple sources, synthesizes answers with citations. It's an agent that makes the "tool calls" invisible to the user.
  • Elicit - An academic research agent that finds papers, extracts claims, and identifies consensus across studies.

Business Agents

  • Customer support agents - Companies like Klarna, Intercom, and Sierra deploy agents that handle customer issues end-to-end: processing refunds, updating orders, answering policy questions.
  • Data analysis agents - Connect to databases, write SQL, create visualizations, and narrate findings.
  • Sales agents - Research prospects, draft personalized emails, schedule meetings, and update CRM records.

What's notable is that the successful agents share common traits: they have clear goals, well-defined tools, appropriate guardrails, and graceful fallback to humans when they're uncertain.


Key Terminology

Before we go further, let's establish a shared vocabulary. You'll encounter these terms throughout this book and in the broader AI agent ecosystem.

| Term | Definition | Example |
|---|---|---|
| Agent | Software that autonomously pursues goals using perception, reasoning, and action | A coding agent that fixes a bug across multiple files |
| Environment | Everything external to the agent that it can observe or affect | File system, web, APIs, databases |
| Action | A discrete step the agent takes to affect its environment | Calling a search API, writing a file, running a command |
| Observation | Information the agent receives after taking an action | Search results, command output, API response |
| Tool | A specific capability the agent can use - a function it can call | Web search, calculator, code executor, file reader |
| Tool Schema | The structured definition of what a tool does and what inputs it takes | JSON schema describing a function's parameters |
| Episode | One complete run of an agent from receiving a goal to producing a result | Researching a topic and writing a summary |
| Trajectory | The sequence of thought-action-observation steps in an episode | Think → Search → Read → Think → Write → Done |
| Context Window | The maximum amount of text the LLM can "see" at once | 200K tokens for Claude, ~128K for GPT-4o |
| Grounding | Connecting the LLM's reasoning to real-world data via tools | Using web search to verify facts before answering |
| Guardrails | Constraints that prevent the agent from taking harmful actions | "Never delete files without confirmation" |
| Human-in-the-Loop | Design pattern where the agent pauses for human approval at key steps | "I'm about to send this email. Approve?" |
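
Several of these terms map directly onto data structures. As a rough sketch (the class and field names are illustrative, not taken from any framework), an episode can be modeled as a goal plus a trajectory of thought-action-observation steps:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str       # the model's reasoning before acting
    action: str        # the tool it chose, e.g. "Search"
    observation: str   # what came back from the environment


@dataclass
class Episode:
    goal: str
    trajectory: list[Step] = field(default_factory=list)
    result: str = ""

    def record(self, thought: str, action: str, observation: str) -> None:
        """Append one thought-action-observation step to the trajectory."""
        self.trajectory.append(Step(thought, action, observation))

    def summary(self) -> str:
        """Render the actions taken, in order, ending with 'Done'."""
        return " -> ".join([step.action for step in self.trajectory] + ["Done"])
```

Logging every episode in a shape like this is also what makes an agent debuggable: when a run goes wrong, the trajectory shows which observation led to which decision.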

Common Misconceptions About AI Agents

There's a lot of hype and misunderstanding around agents. Let's clear up the biggest ones.

Misconception 1: "Agents are just fancy prompts"

Prompts are one component. An agent also includes tool definitions, execution logic, error handling, memory management, and often multiple LLM calls orchestrated together. Saying agents are "just prompts" is like saying a web application is "just HTML."

Misconception 2: "Agents can do anything"

They can't. Agents are bounded by their tools, their LLM's reasoning capability, and their context window. An agent without a web search tool can't search the web, no matter how smart the underlying model is. An agent with a 4K context window will struggle with complex multi-step tasks. The design of the agent's toolset is often more important than the choice of model.

Misconception 3: "More autonomy is always better"

More autonomy means more potential for compounding errors. If an agent makes a wrong decision on step 3 of a 20-step plan, every subsequent step may be wasted (or worse, harmful). The best agents are designed with appropriate autonomy for their domain. A coding agent that auto-commits to main is not "more advanced" - it's poorly designed.
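
A quick back-of-the-envelope calculation shows how fast this compounds. It assumes each step succeeds independently with the same probability, which is a simplification (real agents can sometimes notice and repair earlier mistakes), but it gives a useful baseline.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


for p in (0.99, 0.95, 0.90):
    print(f"per-step {p:.2f} -> 20-step success {chain_success(p, 20):.1%}")
```

Under that assumption, a step that is right 95% of the time yields a fully correct 20-step run only about 36% of the time, and at 90% per step it drops to roughly 12%. Shorter plans and human checkpoints both work by cutting the number of unverified steps in a row.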

Misconception 4: "Agents will replace all software"

Agents are great for tasks that require judgment, flexibility, and natural language understanding. They're terrible for tasks that need to run the exact same way every time with 100% reliability. You don't need an agent to process payroll. You need an agent to figure out how to set up payroll processing for a new country.

Misconception 5: "You need a framework to build an agent"

You don't. An agent is a loop that calls an LLM and executes tools. You can build one in 50 lines of Python. Frameworks can help with common patterns, but they're not prerequisites. In fact, building one from scratch first (as we'll do in Chapter 3) is the best way to understand what frameworks are abstracting for you.

Note

A rule of thumb for when to use an agent vs. traditional software: if you can write a flowchart for the task that covers all cases, use traditional software. If the flowchart would need a "use judgment" box, consider an agent.


What's Coming Next

Now that you have a solid understanding of what agents are, how they work at a high level, and why they've become practical, we're going to go deeper.

In Chapter 2, we'll explore the different types of AI agents - from simple reactive agents to sophisticated hierarchical multi-agent systems. You'll learn which architecture fits which problem, and how the commercial products you use every day map to these categories.

In Chapter 3, we'll build a working agent from scratch. No frameworks, no hand-waving - just Python, an LLM API, and a clear understanding of the agent loop we covered here. By the end of that chapter, you'll have a functional research agent that can search the web, extract information, and synthesize findings.

The goal of this book is not to make you an AI agents theorist. It's to make you an AI agents practitioner. Let's keep building.