What Are AI Agents?
Imagine asking your computer: "Book me a round-trip flight to Tokyo for the last week of April, find a hotel near Shibuya under $200/night, and build a day-by-day itinerary that includes at least two Michelin-starred ramen shops." Then you walk away, make coffee, and come back to find it done - flights booked, hotel confirmed, itinerary in your inbox with reservation links.
That is not science fiction anymore. That is the promise - and increasingly the reality - of AI agents.
But what exactly is an AI agent? How is it different from ChatGPT, GitHub Copilot, or the autocomplete in your IDE? And why is everyone in tech suddenly obsessed with them?
Let's break it all down.
Defining AI Agents
An AI agent is autonomous software that can perceive its environment, decide what to do, and act on those decisions - often across multiple steps - to accomplish a goal.
That definition has three critical parts:
- Perceive - The agent takes in information. This could be a user's message, the contents of a web page, sensor data, an API response, or the result of running code.
- Decide - Using a reasoning engine (today, almost always a large language model), the agent figures out what to do next. This is where planning, prioritization, and judgment happen.
- Act - The agent executes actions in the real world: calling APIs, writing files, sending emails, running code, clicking buttons.
The key differentiator is autonomy. A traditional program does exactly what you tell it. A chatbot answers one question at a time. An agent pursues a goal across multiple steps, deciding on its own which actions to take and in what order.
"An agent is a system that uses an LLM to decide the control flow of an application." - Harrison Chase, creator of LangChain
That quote captures it well. The LLM isn't just generating text - it's driving the program's execution.
The Agent Loop
Every AI agent, regardless of how sophisticated it is, runs on some variation of the same fundamental loop. Understanding this loop is understanding agents.
Here's the cycle:
┌─────────────────────────────────────┐
│           USER GIVES GOAL           │
└──────────────┬──────────────────────┘
               │
               ▼
┌─────────────────────────────────────┐
│         OBSERVE / PERCEIVE          │◄──────────┐
│  (read inputs, tool results, etc.)  │           │
└──────────────┬──────────────────────┘           │
               │                                  │
               ▼                                  │
┌─────────────────────────────────────┐           │
│        THINK / REASON / PLAN        │           │
│    (LLM decides the next action)    │           │
└──────────────┬──────────────────────┘           │
               │                                  │
               ▼                                  │
┌─────────────────────────────────────┐           │
│                 ACT                 │           │
│   (call tool, write code, search)   │───────────┘
└──────────────┬──────────────────────┘
               │
               ▼  (when goal is met or max steps reached)
┌─────────────────────────────────────┐
│            RETURN RESULT            │
└─────────────────────────────────────┘
And here's that same loop expressed as Python pseudocode:
def agent_loop(goal: str, tools: list, max_steps: int = 10) -> str:
    """The fundamental agent loop.

    `llm` and `execute_tool` are placeholders for your model client and tool runner.
    """
    messages = [{"role": "user", "content": goal}]
    for step in range(max_steps):
        # THINK: Ask the LLM what to do next
        response = llm.chat(messages, tools=tools)

        # CHECK: Is the agent done?
        if response.is_final_answer:
            return response.content

        # ACT: Execute the tool the LLM chose
        tool_name = response.tool_call.name
        tool_args = response.tool_call.arguments
        result = execute_tool(tool_name, tool_args)

        # OBSERVE: Feed the tool call and its result back into the conversation
        messages.append({
            "role": "assistant",
            "content": response.content,      # text only, not the whole response object
            "tool_call": response.tool_call,  # so the model can see what it asked for
        })
        messages.append({"role": "tool", "content": str(result)})

    return "Max steps reached without completing the goal."
That's it. Agents like Claude Code, Devin, AutoGPT, and Perplexity are, at their core, more sophisticated versions of this loop. The sophistication comes from what tools are available, how good the reasoning is, and how the system handles errors and edge cases.
When you're building your first agent, start with this loop. Get it working with one tool. Then add complexity. Trying to build a sophisticated multi-agent system before you've internalized this loop is like trying to write a distributed system before you understand a function call.
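To make the "one tool" advice concrete, here is a minimal runnable sketch of the loop. Everything in it is an assumption for illustration: `FakeLLM` is a rule-based stand-in for a real model client, and `Reply`, `ToolCall`, `add_numbers`, and `run_agent` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Reply:
    content: str = ""
    tool_call: Optional[ToolCall] = None  # None means "this is the final answer"


class FakeLLM:
    """Stand-in for a real model (assumption): asks for the tool once, then answers."""

    def chat(self, messages: list[dict]) -> Reply:
        last = messages[-1]
        if last["role"] == "tool":
            return Reply(content=f"The sum is {last['content']}.")
        return Reply(tool_call=ToolCall("add_numbers", {"a": 2, "b": 3}))


def add_numbers(a: float, b: float) -> float:
    """The single tool this agent can use."""
    return a + b


TOOLS: dict[str, Callable] = {"add_numbers": add_numbers}


def run_agent(llm: FakeLLM, goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = llm.chat(messages)                       # THINK
        if reply.tool_call is None:                      # CHECK
            return reply.content
        call = reply.tool_call
        result = TOOLS[call.name](**call.arguments)      # ACT
        messages.append({"role": "assistant", "content": f"call {call.name}"})
        messages.append({"role": "tool", "content": str(result)})  # OBSERVE
    return "Max steps reached without completing the goal."


print(run_agent(FakeLLM(), "What is 2 + 3?"))  # prints "The sum is 5."
```

Swapping `FakeLLM` for a real client changes only the `chat` method. The loop itself stays the same, which is the point of getting it working before adding complexity.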
Chatbots vs. Copilots vs. Agents
These three terms get thrown around interchangeably, and that causes real confusion. They are different things on a spectrum of autonomy.
| Dimension | Chatbot | Copilot | Agent |
|---|---|---|---|
| Interaction model | You ask, it answers | You work, it suggests | You set a goal, it works |
| Number of steps | Single turn (or simple multi-turn) | Inline suggestions, one at a time | Multiple steps, often dozens |
| Who drives? | The human | The human (with AI assist) | The AI (with human guardrails) |
| Tool use | Rarely | Sometimes (e.g., code completion) | Extensively - tools are core |
| Autonomy | Low | Medium | High |
| Error recovery | Tells you it can't help | Offers alternatives | Retries, replans, works around issues |
| Examples | ChatGPT (basic), customer support bots | GitHub Copilot, Gmail autocomplete | Claude Code, Devin, research agents |
The mental model I find most useful: a chatbot is an oracle you consult. A copilot is a pair programmer sitting next to you. An agent is a junior engineer you can delegate a task to and check on later.
These categories exist on a spectrum, not as hard boundaries. Many products blend them. Claude, for example, acts as a chatbot when you ask it a question, but becomes an agent when you give it a multi-step task with tool access.
Why Agents Matter NOW
AI agents aren't a new idea. Researchers have been building them in one form or another for decades. So why is 2025 the year they're suddenly everywhere?
Because four developments converged:
1. LLMs Got Good Enough to Reason
Before GPT-4 (early 2023), language models could generate plausible-sounding text but struggled with multi-step reasoning. Modern models like Claude, GPT-4o, and Gemini can genuinely plan, decompose problems, and make judgment calls. Not perfectly - but well enough to be useful.
2. Function Calling / Tool Use Became Standard
In mid-2023, OpenAI introduced structured function calling. Anthropic, Google, and others followed. This gave models a reliable way to say "I want to call this specific tool with these specific arguments" instead of generating free-form text that you'd have to parse. This was the missing API primitive.
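Here is a rough sketch of what that primitive looks like in practice. The tool definition follows the general JSON Schema shape; the exact field names differ between providers, so treat `GET_WEATHER_TOOL`, `get_weather`, and `parse_tool_call` as hypothetical names rather than any vendor's actual API.

```python
import json

# Hypothetical tool definition. Real providers use similar, but not identical, fields.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}


def parse_tool_call(raw: str, tool: dict) -> dict:
    """Parse a model's structured tool call and check its required arguments."""
    call = json.loads(raw)
    if call["name"] != tool["name"]:
        raise ValueError(f"unknown tool: {call['name']}")
    required = tool["parameters"]["required"]
    missing = [key for key in required if key not in call["arguments"]]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call["arguments"]


# What a model might emit instead of free-form text:
raw_call = '{"name": "get_weather", "arguments": {"city": "Tokyo", "unit": "celsius"}}'
print(parse_tool_call(raw_call, GET_WEATHER_TOOL))
```

Because the call arrives as structured data, the program can validate it before running anything, which is far harder to do reliably with free-form text.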
3. Context Windows Exploded
Agents need to keep track of what they've done. When context windows were 4K tokens, agents would "forget" what they were doing after a few steps. Today, 200K-token (and beyond) context windows mean agents can maintain coherent plans across dozens of actions.
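To see why a small window makes an agent "forget," consider this sketch of history trimming. It assumes a crude estimate of about 4 characters per token (real tokenizers differ), and `estimate_tokens` and `trim_history` are hypothetical helpers, not library functions.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the first message (the goal) plus the newest messages that fit the budget."""
    goal, rest = messages[0], messages[1:]
    used = estimate_tokens(goal["content"])
    kept: list[dict] = []
    for msg in reversed(rest):               # walk from newest to oldest
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break                            # everything older is dropped
        kept.append(msg)
        used += cost
    return [goal] + kept[::-1]               # restore chronological order


history = [{"role": "user", "content": "Plan a trip to Tokyo."}]
history += [{"role": "tool", "content": f"result {i}: " + "x" * 400} for i in range(10)]

trimmed = trim_history(history, budget=350)
print(len(history), "->", len(trimmed))
```

With a tight budget, only the goal and the last few tool results survive, so earlier steps vanish from the model's view. A larger budget lets the whole history through untouched.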
4. The Ecosystem Caught Up
Frameworks like LangChain, CrewAI, AutoGen, and the Anthropic SDK matured. Observability tools (LangSmith, Braintrust, Arize) made agents debuggable. The infrastructure to build, test, and deploy agents went from research-grade to production-grade.
"Agents" is also the most overhyped term in AI right now. Many products slapping the "agent" label on their marketing are really just chatbots with a tool or two. Be skeptical. Ask: does it actually pursue multi-step goals autonomously? If not, it's not really an agent.
Historical Context: How We Got Here
AI agents didn't appear out of thin air. They're the culmination of decades of research.
1980s - Rule-Based Expert Systems: Expert systems, following the model of MYCIN (a medical diagnosis program developed in the 1970s), used hand-coded if-then rules to make decisions. They were "agents" in the academic sense, but brittle - they could only handle situations their creators had anticipated.
1990s - Reinforcement Learning Agents: Researchers explored agents that learned through trial and error. TD-Gammon learned to play backgammon at a world-class level. These agents were powerful but narrow - trained for one specific task.
2000s - Multi-Agent Systems: Academic research on agents that could communicate and coordinate. Interesting theoretically, but limited by the AI capabilities of the time.
2010s - Virtual Assistants: Siri (2011), Alexa (2014), Google Assistant (2016). These felt like agents but were mostly sophisticated chatbots - they could handle simple commands but fell apart on complex, multi-step tasks.
2023 - The LLM Agent Explosion: ChatGPT plugins, AutoGPT (March 2023), BabyAGI, and a wave of agent frameworks. Most were demos that broke in practice, but they proved the concept.
2024-2025 - Production Agents: Claude Code, Devin, Cursor, Perplexity, and dozens of domain-specific agents that work reliably enough for production use. This is the transition from "cool demo" to "thing I use every day."
The Spectrum of Autonomy
Not every agent needs to (or should) operate with full autonomy. Think of it as a spectrum:
FULLY MANUAL ──── COPILOT ──── SEMI-AUTONOMOUS ──── FULLY AUTONOMOUS
     │               │                │                    │
   You do      AI suggests,     AI acts, you            AI acts,
 everything     you decide    approve key steps     you review after
                                                    (or not at all)
Fully Manual: Traditional software. You click every button, write every query.
Copilot: The AI watches what you're doing and offers suggestions. You accept or reject each one. Think GitHub Copilot for code completion.
Semi-Autonomous: The AI takes action but pauses at critical decision points for human approval. "I found 3 flights. Should I book the $450 direct one, or the $280 one with a layover?" This is where most production agents sit today - and for good reason.
Fully Autonomous: The AI does everything end-to-end without human intervention. "Trip booked. Confirmation in your email." This is the goal for low-stakes tasks but still risky for high-stakes ones.
Start semi-autonomous. Always. You can remove human checkpoints later once you've built trust in the system. Adding them after your agent has already booked a $5,000 non-refundable flight is not a fun conversation to have.
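A human checkpoint can be as small as one function argument. The sketch below is an illustration under assumptions: the threshold value, `APPROVAL_THRESHOLD_USD`, and `book_flight` are all hypothetical, and the actual booking call is left out.

```python
from typing import Callable

# Hypothetical policy: anything at or above this price needs a human decision.
APPROVAL_THRESHOLD_USD = 100.0


def book_flight(price_usd: float, approve: Callable[[str], bool]) -> str:
    """Book automatically if cheap; otherwise pause and ask for approval first."""
    if price_usd >= APPROVAL_THRESHOLD_USD:
        question = f"About to book a ${price_usd:,.2f} flight. Approve?"
        if not approve(question):
            return "cancelled"
    return "booked"  # a real agent would call a booking API here


def always_deny(question: str) -> bool:
    """Test approver that rejects everything."""
    return False


print(book_flight(45.0, approve=always_deny))    # under the threshold: no question asked
print(book_flight(5000.0, approve=always_deny))  # over the threshold: human said no
```

In an interactive session, the `approve` callback could wrap `input()`. In production it might post to a review queue. Removing the checkpoint later is a one-line change to the threshold. Adding it after the fact is the expensive direction.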
Real-World Examples of AI Agents
Let's ground this in reality. Here are agents people are actually using in 2025:
Coding Agents
- Claude Code - Anthropic's CLI agent that can read your codebase, write code, run tests, fix bugs, and create commits. It operates in your terminal with access to your file system and shell.
- Cursor - An IDE with deep agent integration. It can make multi-file edits, explain code, and iterate based on test results.
- Devin - Cognition's "AI software engineer" that can handle full development tasks in a sandboxed environment.
Research Agents
- Perplexity - Searches the web, reads multiple sources, synthesizes answers with citations. It's an agent that makes the "tool calls" invisible to the user.
- Elicit - An academic research agent that finds papers, extracts claims, and identifies consensus across studies.
Business Agents
- Customer support agents - Companies like Klarna, Intercom, and Sierra deploy agents that handle customer issues end-to-end: processing refunds, updating orders, answering policy questions.
- Data analysis agents - Connect to databases, write SQL, create visualizations, and narrate findings.
- Sales agents - Research prospects, draft personalized emails, schedule meetings, and update CRM records.
What's notable is that the successful agents share common traits: they have clear goals, well-defined tools, appropriate guardrails, and graceful fallback to humans when they're uncertain.
Key Terminology
Before we go further, let's establish a shared vocabulary. You'll encounter these terms throughout this book and in the broader AI agent ecosystem.
| Term | Definition | Example |
|---|---|---|
| Agent | Software that autonomously pursues goals using perception, reasoning, and action | A coding agent that fixes a bug across multiple files |
| Environment | Everything external to the agent that it can observe or affect | File system, web, APIs, databases |
| Action | A discrete step the agent takes to affect its environment | Calling a search API, writing a file, running a command |
| Observation | Information the agent receives after taking an action | Search results, command output, API response |
| Tool | A specific capability the agent can use - a function it can call | Web search, calculator, code executor, file reader |
| Tool Schema | The structured definition of what a tool does and what inputs it takes | JSON schema describing a function's parameters |
| Episode | One complete run of an agent from receiving a goal to producing a result | Researching a topic and writing a summary |
| Trajectory | The sequence of thought-action-observation steps in an episode | Think → Search → Read → Think → Write → Done |
| Context Window | The maximum amount of text the LLM can "see" at once | 200K tokens for Claude, ~128K for GPT-4o |
| Grounding | Connecting the LLM's reasoning to real-world data via tools | Using web search to verify facts before answering |
| Guardrails | Constraints that prevent the agent from taking harmful actions | "Never delete files without confirmation" |
| Human-in-the-Loop | Design pattern where the agent pauses for human approval at key steps | "I'm about to send this email. Approve?" |
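Several of these terms map directly onto data structures. As a sketch (the class and field names here are illustrative assumptions, not a standard), an episode can be modeled as a goal plus a trajectory of thought-action-observation steps:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    thought: str       # what the model reasoned
    action: str        # the tool or operation it chose
    observation: str   # what came back from the environment


@dataclass
class Episode:
    goal: str
    trajectory: list[Step] = field(default_factory=list)
    result: Optional[str] = None

    def record(self, thought: str, action: str, observation: str) -> None:
        self.trajectory.append(Step(thought, action, observation))

    def actions(self) -> str:
        """Compact view of the trajectory: one action name per step."""
        return " → ".join(step.action for step in self.trajectory)


episode = Episode(goal="Research a topic and write a summary")
episode.record("I need sources.", "Search", "3 relevant articles")
episode.record("I should read the best one.", "Read", "Article text")
episode.record("I have enough to answer.", "Write", "Draft summary")
episode.result = "Draft summary"

print(episode.actions())  # prints "Search → Read → Write"
```

Recording trajectories this way is also what makes an agent debuggable: when an episode goes wrong, you can replay exactly what it thought, did, and saw at each step.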
Common Misconceptions About AI Agents
There's a lot of hype and misunderstanding around agents. Let's clear up the biggest ones.
Misconception 1: "Agents are just fancy prompts"
Prompts are one component. An agent also includes tool definitions, execution logic, error handling, memory management, and often multiple LLM calls orchestrated together. Saying agents are "just prompts" is like saying a web application is "just HTML."
Misconception 2: "Agents can do anything"
They can't. Agents are bounded by their tools, their LLM's reasoning capability, and their context window. An agent without a web search tool can't search the web, no matter how smart the underlying model is. An agent with a 4K context window will struggle with complex multi-step tasks. The design of the agent's toolset is often more important than the choice of model.
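The tool boundary is easy to see in code. In this sketch, `TOOLS`, `word_count`, and `execute_tool` are hypothetical names; the design choice worth noting is that a missing or failing tool comes back as an error observation rather than a crash, so the model has a chance to replan.

```python
from typing import Callable


def word_count(text: str) -> str:
    """A deliberately simple example tool."""
    return str(len(text.split()))


# The agent's entire capability surface: if a tool isn't here, it can't be used.
TOOLS: dict[str, Callable[..., str]] = {"word_count": word_count}


def execute_tool(name: str, arguments: dict) -> str:
    """Run a tool by name, returning errors as text the model can read."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: no tool named '{name}'. Available tools: {sorted(TOOLS)}"
    try:
        return tool(**arguments)
    except Exception as exc:  # bad arguments, tool failure, etc.
        return f"Error while running '{name}': {exc}"


print(execute_tool("word_count", {"text": "agents are bounded by tools"}))
print(execute_tool("web_search", {"query": "flights to Tokyo"}))
```

No matter how capable the model is, the second call can only ever produce an error message, because nothing in `TOOLS` can reach the web.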
Misconception 3: "More autonomy is always better"
More autonomy means more potential for compounding errors. If an agent makes a wrong decision on step 3 of a 20-step plan, every subsequent step may be wasted (or worse, harmful). The best agents are designed with appropriate autonomy for their domain. A coding agent that auto-commits to main is not "more advanced" - it's poorly designed.
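A quick back-of-the-envelope calculation shows how fast errors compound. It assumes each step succeeds or fails independently with the same probability, which is a simplification (real agents can sometimes notice and recover from mistakes), and `chance_all_steps_succeed` is just an illustrative helper.

```python
def chance_all_steps_succeed(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_success ** steps


for p in (0.99, 0.95, 0.90):
    overall = chance_all_steps_succeed(p, 20)
    print(f"{p:.0%} per step over 20 steps -> {overall:.0%} overall")
# 99% per step over 20 steps -> 82% overall
# 95% per step over 20 steps -> 36% overall
# 90% per step over 20 steps -> 12% overall
```

Under these assumptions, even a step that is right 95% of the time yields a 20-step plan that finishes cleanly only about a third of the time. That is the case for checkpoints and shorter plans.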
Misconception 4: "Agents will replace all software"
Agents are great for tasks that require judgment, flexibility, and natural language understanding. They're terrible for tasks that need to run the exact same way every time with 100% reliability. You don't need an agent to process payroll. You need an agent to figure out how to set up payroll processing for a new country.
Misconception 5: "You need a framework to build an agent"
You don't. An agent is a loop that calls an LLM and executes tools. You can build one in 50 lines of Python. Frameworks can help with common patterns, but they're not prerequisites. In fact, building one from scratch first (as we'll do in Chapter 3) is the best way to understand what frameworks are abstracting for you.
A rule of thumb for when to use an agent vs. traditional software: if you can write a flowchart for the task that covers all cases, use traditional software. If the flowchart would need a "use judgment" box, consider an agent.
What's Coming Next
Now that you have a solid understanding of what agents are, how they work at a high level, and why they've become practical, we're going to go deeper.
In Chapter 2, we'll explore the different types of AI agents - from simple reactive agents to sophisticated hierarchical multi-agent systems. You'll learn which architecture fits which problem, and how the commercial products you use every day map to these categories.
In Chapter 3, we'll build a working agent from scratch. No frameworks, no hand-waving - just Python, an LLM API, and a clear understanding of the agent loop we covered here. By the end of that chapter, you'll have a functional research agent that can search the web, extract information, and synthesize findings.
The goal of this book is not to make you an AI agents theorist. It's to make you an AI agents practitioner. Let's keep building.