Multi-Agent Systems
There's a moment in every agent project where you stare at your single-agent implementation and realize it's doing too much. It's researching, writing, fact-checking, formatting, and somehow also trying to manage its own workflow. The result? Mediocre output across the board. Your agent has become the developer who insists on doing frontend, backend, DevOps, and QA all by themselves - technically possible, but nobody's getting their best work.
This is where multi-agent systems come in. Instead of one overloaded agent, you build a team of specialized agents that collaborate, argue, and hold each other accountable. This chapter walks you through the when, why, and how of multi-agent architectures - with enough practical detail to actually build one.
When One Agent Isn't Enough
Before you start splitting your agent into a committee, let's be honest: most tasks don't need multiple agents. A single well-designed agent with good tools can handle an enormous range of work. Multi-agent systems add coordination overhead, debugging complexity, and cost.
So when do you need them?
| Signal | Example | Why Multi-Agent Helps |
|---|---|---|
| Conflicting expertise | A task needs both deep legal knowledge and creative writing | Specialized system prompts and tool sets per agent |
| Quality through verification | High-stakes outputs that need fact-checking | A separate agent can verify without anchoring bias |
| Pipeline workflows | Content goes through research → drafting → editing stages | Each stage has different requirements and evaluation criteria |
| Parallel subtasks | Analyzing 50 documents simultaneously | Fan-out to worker agents, fan-in results |
| Adversarial robustness | Security analysis that benefits from red-team/blue-team thinking | Agents deliberately challenge each other's conclusions |
A good heuristic: if you find yourself writing a system prompt longer than 800 words with multiple "modes" or "personas," you're probably cramming multiple agents into one. Split them.
Multi-Agent Architectures
There's no single "right" architecture for multi-agent systems. The choice depends on your workflow's shape, your latency tolerance, and how much coordination you need.
Orchestrator-Worker
The most common pattern. One agent (the orchestrator) decomposes tasks and delegates to specialized worker agents. Think of it as a project manager assigning tickets.
             ┌──────────────┐
             │ Orchestrator │
             └──────┬───────┘
        ┌───────────┼───────────┐
        ▼           ▼           ▼
  [Researcher]  [Writer]    [Editor]
Best for: Well-defined pipelines, tasks that decompose cleanly into subtasks.
Peer-to-Peer
Agents communicate directly with each other without a central coordinator. Each agent decides when to pass work to another agent.
Best for: Flexible, exploratory workflows where the sequence isn't predetermined.
Risk: Without coordination, agents can enter infinite loops or duplicate work. You need clear termination conditions.
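Termination is the part that tends to get skipped. Below is a minimal sketch of peer-to-peer handoffs with a hard hop limit. The agents (`drafter`, `reviewer`, `reviser`) are hypothetical stand-ins that return plain strings instead of calling an LLM. Each one either names the next agent or returns `None` to signal it is done. `max_hops` is an assumed safety cap, not a recommended value.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Handoff:
    """An agent's reply: the next agent to call (None when finished) and the updated payload."""
    next_agent: Optional[str]
    payload: str


# Hypothetical agents: plain functions standing in for LLM calls.
def drafter(payload: str) -> Handoff:
    return Handoff("reviewer", payload + " [drafted]")


def reviewer(payload: str) -> Handoff:
    # Approve once the draft has been revised; otherwise ask for a revision.
    if "[revised]" in payload:
        return Handoff(None, payload + " [approved]")
    return Handoff("reviser", payload + " [reviewed]")


def reviser(payload: str) -> Handoff:
    return Handoff("reviewer", payload + " [revised]")


AGENTS: dict[str, Callable[[str], Handoff]] = {
    "drafter": drafter,
    "reviewer": reviewer,
    "reviser": reviser,
}


def run_peer_to_peer(start: str, payload: str,
                     max_hops: int = 10) -> tuple[str, list[str]]:
    """Follow handoffs until an agent returns next_agent=None.

    Raises RuntimeError if the hop limit is reached first, so a
    ping-ponging pair of agents cannot loop forever.
    """
    current: Optional[str] = start
    trace: list[str] = []
    while current is not None:
        if len(trace) >= max_hops:
            raise RuntimeError(f"No termination after {max_hops} hops: {trace}")
        trace.append(current)
        handoff = AGENTS[current](payload)
        payload, current = handoff.payload, handoff.next_agent
    return payload, trace
```

Calling `run_peer_to_peer("drafter", "topic")` walks drafter → reviewer → reviser → reviewer and then stops. With `max_hops=2`, it raises instead of looping.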
Hierarchical
Multiple layers of orchestration. A top-level orchestrator delegates to mid-level orchestrators, which manage their own teams of workers. This is the corporate org chart of agent architectures.
Best for: Very complex tasks with natural groupings - like building an entire application where you have separate teams for frontend, backend, and testing.
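The hierarchy works because a mid-level orchestrator looks like just another worker to its parent. Below is a minimal sketch of that idea using hypothetical `Worker` and `Team` classes. The workers are stubs rather than LLM calls. Each team naively forwards the same task to every member, whereas a real orchestrator would decompose the task first.

```python
from typing import Protocol


class Agent(Protocol):
    """Anything with a name and a run(task) method can sit in the hierarchy."""
    name: str

    def run(self, task: str) -> str: ...


class Worker:
    """Leaf agent. A real worker would call an LLM; this stub just tags the task."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, task: str) -> str:
        return f"{self.name}({task})"


class Team:
    """Mid-level orchestrator that is itself an Agent, so teams can nest."""

    def __init__(self, name: str, members: list[Agent]) -> None:
        self.name = name
        self.members = members

    def run(self, task: str) -> str:
        # Simplification: every member gets the same task. A real orchestrator
        # would decompose the task and give each member its own subtask.
        outputs = [member.run(task) for member in self.members]
        return f"{self.name}[{', '.join(outputs)}]"


frontend = Team("frontend", [Worker("ui_dev"), Worker("ui_tester")])
backend = Team("backend", [Worker("api_dev"), Worker("db_dev")])
app = Team("app", [frontend, backend])  # the top-level orchestrator

# prints: app[frontend[ui_dev(login), ui_tester(login)], backend[api_dev(login), db_dev(login)]]
print(app.run("login"))
```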
Blackboard
All agents read from and write to a shared state (the "blackboard"). Any agent can pick up work when it sees something relevant to its expertise. This is event-driven and decoupled.
Best for: Problems where contributions from different specialists need to be integrated iteratively, like collaborative diagnosis.
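Below is a minimal blackboard sketch. It assumes a plain dict as the shared board and hypothetical "knowledge sources" that declare which keys they need and which key they produce. A simple control loop keeps firing whichever sources can contribute until nothing changes. The lambdas stand in for LLM-backed specialists in a toy diagnosis flow.

```python
from typing import Callable

Board = dict[str, str]


class KnowledgeSource:
    """A specialist that contributes once its inputs are on the board."""

    def __init__(self, name: str, requires: list[str], produces: str,
                 act: Callable[[Board], str]) -> None:
        self.name = name
        self.requires = requires
        self.produces = produces
        self.act = act

    def can_contribute(self, board: Board) -> bool:
        # Ready when all inputs exist and its own output has not been written yet.
        return (all(key in board for key in self.requires)
                and self.produces not in board)


def run_blackboard(board: Board, sources: list[KnowledgeSource],
                   max_cycles: int = 20) -> Board:
    """Fire ready sources until none can contribute or max_cycles is reached."""
    for _ in range(max_cycles):
        ready = [s for s in sources if s.can_contribute(board)]
        if not ready:
            break  # nothing left to add: the board is stable
        for source in ready:
            board[source.produces] = source.act(board)
    return board


# Toy diagnosis flow. List order does not matter: each source fires
# only when the data it needs has appeared on the board.
sources = [
    KnowledgeSource("diagnostician", ["symptoms", "lab_results"], "diagnosis",
                    lambda b: f"diagnosis from {b['symptoms']} + {b['lab_results']}"),
    KnowledgeSource("lab", ["tests_ordered"], "lab_results",
                    lambda b: f"results for {b['tests_ordered']}"),
    KnowledgeSource("triage", ["symptoms"], "tests_ordered",
                    lambda b: "blood panel"),
]

board = run_blackboard({"symptoms": "fever"}, sources)
```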
| Architecture | Coordination | Flexibility | Complexity | Best Use Case |
|---|---|---|---|---|
| Orchestrator-Worker | Centralized | Low-Medium | Medium | Pipelines, task decomposition |
| Peer-to-Peer | Decentralized | High | High | Exploratory, creative tasks |
| Hierarchical | Multi-level | Medium | Very High | Large, complex projects |
| Blackboard | Shared state | High | Medium-High | Iterative, multi-discipline problems |
Communication Protocols
Agents need to talk to each other. How they do it matters more than you'd think.
Message Passing
The simplest approach. Agents send structured messages to each other, typically through the orchestrator.
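from dataclasses import dataclass, field
from typing import Any

@dataclass
class AgentMessage:
    sender: str
    recipient: str
    content: str
    message_type: str  # "task", "result", "feedback", "error"
    # Dataclasses reject a mutable {} default, so build a fresh dict per message
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_prompt_context(self) -> str:
        """Render the message as a line of context for the recipient's prompt."""
        return f"[Message from {self.sender}]: {self.content}"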
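One way to route these messages "through the orchestrator" is a small in-process router with a queue per recipient and an audit log of everything sent. The sketch below works under those assumptions. `MessageRouter` is a hypothetical name, and the `AgentMessage` dataclass is repeated so the block runs on its own. A distributed system would swap the deques for a real message queue.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AgentMessage:
    sender: str
    recipient: str
    content: str
    message_type: str  # "task", "result", "feedback", "error"
    metadata: dict[str, Any] = field(default_factory=dict)


class MessageRouter:
    """Delivers messages to per-agent inboxes and keeps an audit log."""

    def __init__(self) -> None:
        self.inboxes: dict[str, deque[AgentMessage]] = {}
        self.log: list[AgentMessage] = []

    def send(self, message: AgentMessage) -> None:
        self.log.append(message)
        self.inboxes.setdefault(message.recipient, deque()).append(message)

    def receive(self, agent_name: str) -> Optional[AgentMessage]:
        """Pop the oldest message for this agent, or None if its inbox is empty."""
        inbox = self.inboxes.get(agent_name)
        return inbox.popleft() if inbox else None


router = MessageRouter()
router.send(AgentMessage("orchestrator", "researcher", "Find sources on X", "task"))
router.send(AgentMessage("researcher", "orchestrator", "Found 3 sources", "result"))
```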
Shared State
All agents read from and write to a common state object. This is simpler to implement but requires careful management to avoid conflicts.
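from typing import Any

class SharedState:
    """Key-value state shared by all agents, with a history of every write."""

    def __init__(self):
        self._state: dict[str, Any] = {}
        self._history: list[dict[str, Any]] = []

    def update(self, agent_name: str, key: str, value: Any) -> None:
        # Record who changed what, so conflicting writes can be audited later
        self._history.append({
            "agent": agent_name,
            "key": key,
            "old_value": self._state.get(key),
            "new_value": value,
        })
        self._state[key] = value

    def read(self, key: str) -> Any:
        return self._state.get(key)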
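One way to handle that "careful management" is optimistic concurrency. Every key carries a version number, and a write only succeeds if the writer saw the latest version. Below is a minimal sketch that assumes agents may run on separate threads, hence the lock. `VersionedSharedState` and `VersionConflict` are hypothetical names, not part of any library.

```python
import threading
from typing import Any


class VersionConflict(Exception):
    """Raised when an agent tries to write based on a stale read."""


class VersionedSharedState:
    """Shared state with a lock and per-key version numbers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: dict[str, Any] = {}
        self._versions: dict[str, int] = {}

    def read(self, key: str) -> tuple[Any, int]:
        """Return (value, version); missing keys are (None, 0)."""
        with self._lock:
            return self._state.get(key), self._versions.get(key, 0)

    def update(self, key: str, value: Any, expected_version: int) -> int:
        """Write only if nobody else wrote since the caller's read; return the new version."""
        with self._lock:
            current = self._versions.get(key, 0)
            if current != expected_version:
                raise VersionConflict(
                    f"{key!r} is at version {current}, expected {expected_version}"
                )
            self._state[key] = value
            self._versions[key] = current + 1
            return current + 1
```

An agent that hits `VersionConflict` should re-read the key and redo its work on the fresh value instead of overwriting another agent's update.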
Event-Driven
Agents subscribe to events and react when something relevant happens. This is the most decoupled approach and works well when you don't know the execution order upfront.
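Below is a minimal in-process sketch of the event-driven style. It assumes synchronous handlers and string event names; `draft.created` and `review.done` are made-up examples. Neither handler knows the other exists. Each only knows which events it reacts to and which it emits.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Minimal synchronous, in-process publish/subscribe bus."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> None:
        # Iterate over a copy so a handler can add subscribers safely.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


bus = EventBus()
log: list[str] = []


def reviewer_on_draft(event: dict[str, Any]) -> None:
    log.append(f"reviewer saw {event['doc_id']}")
    bus.publish("review.done", {"doc_id": event["doc_id"]})


def publisher_on_review(event: dict[str, Any]) -> None:
    log.append(f"publisher saw {event['doc_id']}")


bus.subscribe("draft.created", reviewer_on_draft)
bus.subscribe("review.done", publisher_on_review)
bus.publish("draft.created", {"doc_id": "a1"})
# log is now ["reviewer saw a1", "publisher saw a1"]
```

In production you would likely put a message broker or task queue behind the same subscribe/publish interface so that handlers can run asynchronously.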
In practice, most production systems use a hybrid: an orchestrator does message passing for the main workflow, with shared state for context that all agents need (like the original user request or accumulated results).
The Orchestrator Pattern in Depth
Let's build a real orchestrator. This isn't a toy example - it's the pattern I've used in production systems.
import json
from dataclasses import dataclass, field

import openai

@dataclass
class AgentConfig:
    name: str
    system_prompt: str
    model: str = "gpt-4o"
    tools: list = field(default_factory=list)
    max_tokens: int = 4096

class SpecialistAgent:
    def __init__(self, config: AgentConfig):
        self.config = config
        self.client = openai.OpenAI()

    def run(self, task: str, context: str = "") -> str:
        messages = [
            {"role": "system", "content": self.config.system_prompt},
        ]
        if context:
            messages.append({
                "role": "user",
                "content": f"Context from previous agents:\n{context}"
            })
        messages.append({"role": "user", "content": task})

        kwargs = {
            "model": self.config.model,
            "messages": messages,
            "max_tokens": self.config.max_tokens,
        }
        # Only include tools when this agent actually has some
        if self.config.tools:
            kwargs["tools"] = self.config.tools
        response = self.client.chat.completions.create(**kwargs)
        # content is None when the model replies with tool calls instead of text;
        # this simple agent does not execute tool calls
        return response.choices[0].message.content or ""

class Orchestrator:
    def __init__(self, agents: dict[str, SpecialistAgent]):
        self.agents = agents
        self.client = openai.OpenAI()
        self.execution_log = []

    def plan(self, user_request: str) -> list[dict]:
        """Ask the LLM to decompose the task into agent assignments."""
        agent_descriptions = "\n".join(
            f"- {name}: {agent.config.system_prompt[:100]}..."
            for name, agent in self.agents.items()
        )
        # JSON mode always returns an object, so ask for the steps under a "steps" key
        planning_prompt = f"""You are a task orchestrator. Given a user request,
decompose it into sequential steps and assign each step to the most
appropriate specialist agent.

Available agents:
{agent_descriptions}

Respond with a JSON object of this form:
{{"steps": [{{"agent": "agent_name", "task": "specific task description"}}]}}

User request: {user_request}"""

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": planning_prompt}],
            response_format={"type": "json_object"},
        )
        result = json.loads(response.choices[0].message.content)
        return result.get("steps", result.get("plan", []))

    def execute(self, user_request: str) -> dict:
        """Plan and execute the multi-agent workflow."""
        self.execution_log = []  # fresh log for each run
        steps = self.plan(user_request)
        accumulated_context = f"Original request: {user_request}\n\n"
        results = {}
        final_output = ""  # stays empty if no step succeeds

        for i, step in enumerate(steps):
            agent_name = step.get("agent")
            task = step.get("task", "")

            if agent_name not in self.agents:
                self.execution_log.append({
                    "step": i, "error": f"Unknown agent: {agent_name}"
                })
                continue

            agent = self.agents[agent_name]
            result = agent.run(task, context=accumulated_context)
            results[f"step_{i}_{agent_name}"] = result
            final_output = result  # output of the last successful step
            accumulated_context += f"\n--- Output from {agent_name} ---\n{result}\n"
            self.execution_log.append({
                "step": i,
                "agent": agent_name,
                "task": task,
                "output_length": len(result)
            })

        return {
            "results": results,
            "final_output": final_output,
            "execution_log": self.execution_log
        }
The critical design decision here is accumulated context - each agent receives the outputs from all previous agents. This allows downstream agents to build on (or critique) upstream work.
Accumulated context grows with every step. For pipelines with many agents, you'll blow through context windows fast. Consider summarizing intermediate outputs or only passing relevant portions to each agent.
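Below is a minimal sketch of the "only pass relevant portions" option. It uses a character budget as a rough proxy for tokens. `build_context` is a hypothetical helper, not part of the orchestrator above. It always keeps the original request and keeps the newest outputs verbatim. Once the budget gets tight, it shrinks older outputs to short previews. Summarizing with an LLM call would preserve more meaning at the cost of an extra request.

```python
def build_context(original_request: str, outputs: list[tuple[str, str]],
                  max_chars: int = 8000, preview_chars: int = 200) -> str:
    """Assemble context for the next agent within a rough character budget.

    `outputs` is a list of (agent_name, text) in execution order. The original
    request is always included. The newest outputs are kept verbatim; once one
    does not fit, it and every older output shrink to short previews. The first
    preview that still does not fit ends the context, and older outputs are dropped.
    """
    header = f"Original request: {original_request}\n"
    budget = max_chars - len(header)
    sections: list[str] = []
    truncating = False
    for agent_name, text in reversed(outputs):  # newest first
        full = f"\n--- Output from {agent_name} ---\n{text}\n"
        if not truncating and len(full) <= budget:
            sections.append(full)
            budget -= len(full)
            continue
        truncating = True
        short = f"\n--- Output from {agent_name} (truncated) ---\n{text[:preview_chars]}...\n"
        if len(short) > budget:
            break
        sections.append(short)
        budget -= len(short)
    # Restore chronological order for the reader.
    return header + "".join(reversed(sections))
```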
Agent Specialization: Why Focused Agents Win
This is counterintuitive if you've been impressed by how capable a single GPT-4 call can be. But specialization matters for agents just like it matters for software microservices.
A researcher agent with a system prompt laser-focused on finding facts, citing sources, and flagging uncertainty will consistently outperform a generalist agent asked to "research and then write." Why?
- Long system prompts dilute the model's attention. The more instructions you pack in, the more the model "forgets" or deprioritizes some of them.
- Tool sets can be tailored. A researcher needs web search and document retrieval. A writer needs... nothing but a clear brief. Giving both agents all tools leads to confusion.
- Evaluation is clearer. You can evaluate a researcher on factual accuracy and a writer on prose quality. Evaluating a generalist on both simultaneously is a mess.
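Tailoring tool sets can be as simple as a per-role allowlist over one shared registry. Below is a minimal sketch that assumes tools are described in the Chat Completions function-tool format; the tool names and schemas are hypothetical. The list returned by `tools_for()` is what you would put in an agent's `tools` config. `is_allowed()` is a guard you might check before executing any tool call the model requests.

```python
# Hypothetical tool schemas in the Chat Completions "function" tool format.
TOOL_REGISTRY: dict[str, dict] = {
    "web_search": {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    "retrieve_document": {
        "type": "function",
        "function": {
            "name": "retrieve_document",
            "description": "Fetch a stored document by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"doc_id": {"type": "string"}},
                "required": ["doc_id"],
            },
        },
    },
}

# Each role sees only the tools it needs; the writer gets none.
ROLE_TOOLS: dict[str, list[str]] = {
    "researcher": ["web_search", "retrieve_document"],
    "writer": [],
}


def tools_for(role: str) -> list[dict]:
    """Tool schemas for an agent's `tools` config; unknown roles get none."""
    return [TOOL_REGISTRY[name] for name in ROLE_TOOLS.get(role, [])]


def is_allowed(role: str, tool_name: str) -> bool:
    """Check before executing a tool call the model requested."""
    return tool_name in ROLE_TOOLS.get(role, [])
```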
"A multi-agent system is really just the single-responsibility principle applied to AI."
Debate and Consensus: Agents That Argue
One of the most powerful multi-agent patterns is adversarial collaboration - agents that deliberately challenge each other.
def debate_round(proposition: str, agents: list[SpecialistAgent],
                 rounds: int = 3) -> str:
    """Run a structured debate between agents."""
    transcript = f"Topic: {proposition}\n\n"

    for round_num in range(rounds):
        for agent in agents:
            response = agent.run(
                task=f"Round {round_num + 1}: Given the debate so far, "
                     "present your strongest argument. Challenge weak "
                     "points from other participants.",
                context=transcript
            )
            transcript += f"\n[{agent.config.name}, Round {round_num + 1}]:\n"
            transcript += f"{response}\n"

    # Final synthesis by a judge agent
    judge = SpecialistAgent(AgentConfig(
        name="judge",
        system_prompt="You are an impartial judge. Synthesize the debate "
                      "into a balanced conclusion, noting where agents "
                      "agreed and where legitimate disagreements remain."
    ))
    return judge.run("Provide your final judgment.", context=transcript)
This pattern is especially useful for:
- Code review: One agent writes code, another reviews it for bugs and security issues
- Risk assessment: Optimistic and pessimistic agents present cases
- Fact verification: A claim agent and a skeptic agent go back and forth
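The code-review case often works better as a loop than as a fixed number of rounds: the writer revises until the reviewer approves or a round limit is hit. Below is a minimal sketch. It assumes the reviewer signals approval by starting its reply with `APPROVED`, a convention you would set in its system prompt. The writer and reviewer here are stub functions; in practice they might wrap `SpecialistAgent.run`, for example `lambda task, fb: writer.run(task, context=fb)`.

```python
from typing import Callable

WriteFn = Callable[[str, str], str]  # (task, reviewer_feedback) -> draft
ReviewFn = Callable[[str], str]      # draft -> "APPROVED..." or feedback


def write_review_loop(task: str, write: WriteFn, review: ReviewFn,
                      max_rounds: int = 3) -> tuple[str, bool, int]:
    """Alternate writer and reviewer until approval or the round limit.

    Returns (final_draft, approved, rounds_used).
    """
    feedback = ""
    draft = ""
    for round_num in range(1, max_rounds + 1):
        draft = write(task, feedback)
        verdict = review(draft)
        if verdict.strip().upper().startswith("APPROVED"):
            return draft, True, round_num
        feedback = verdict  # the next draft must address this
    return draft, False, max_rounds


# Stub agents for illustration only.
def stub_writer(task: str, feedback: str) -> str:
    return task + (" + input validation" if "validate" in feedback else "")


def stub_reviewer(draft: str) -> str:
    return "APPROVED" if "validation" in draft else "Please validate user input."


draft, approved, rounds = write_review_loop("parse_config()", stub_writer, stub_reviewer)
```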
Practical Example: Content Creation Pipeline
Let's wire up a real pipeline that takes a topic and produces a polished article.
# Define the specialist agents
researcher = SpecialistAgent(AgentConfig(
    name="researcher",
    system_prompt="""You are a research specialist. Given a topic:
1. Identify key facts, statistics, and expert opinions
2. Cite sources with URLs where possible
3. Flag any claims you're uncertain about with [UNVERIFIED]
4. Organize findings into clear sections
Do NOT write prose. Provide structured research notes."""
))

writer = SpecialistAgent(AgentConfig(
    name="writer",
    system_prompt="""You are a skilled technical writer. Given research notes:
1. Write an engaging, well-structured article
2. Use clear examples and analogies
3. Maintain an authoritative but conversational tone
4. Include section headers, bullet points, and code examples where relevant
Write for a technical audience. No fluff."""
))

editor = SpecialistAgent(AgentConfig(
    name="editor",
    system_prompt="""You are a senior editor. Review the article for:
1. Clarity and readability (fix jargon, improve flow)
2. Logical structure (does the argument build properly?)
3. Technical accuracy (flag anything that seems wrong)
4. Grammar and style consistency
Return the improved article with your changes applied."""
))

fact_checker = SpecialistAgent(AgentConfig(
    name="fact_checker",
    system_prompt="""You are a fact-checker. Review the article against
the original research notes:
1. Verify all claims are supported by the research
2. Flag any statements that were added without evidence
3. Check that statistics and quotes are accurately represented
4. Mark issues as [FACTUAL ERROR], [UNSUPPORTED], or [NEEDS CONTEXT]
Return a fact-check report AND the corrected article."""
))

# Create and run the pipeline
pipeline = Orchestrator({
    "researcher": researcher,
    "writer": writer,
    "editor": editor,
    "fact_checker": fact_checker
})

result = pipeline.execute(
    "Write a 1500-word article about the environmental impact of "
    "large language model training"
)
In my experience, this pipeline consistently produces better output than asking a single agent to "research and write an article about X." The researcher finds better facts because that's all it's doing. The writer produces cleaner prose because it has structured input. The editor catches issues the writer was blind to. And the fact-checker keeps everyone honest.
Handling Conflicts Between Agents
When agents disagree, you need a resolution strategy. Here are the three I've found most practical:
1. Orchestrator as tiebreaker. The orchestrator reviews conflicting outputs and makes a decision. Simple but adds latency.
2. Voting. Run multiple agents on the same task and take the majority answer. Works well for factual questions, poorly for creative tasks.
3. Escalation to human. When agents can't agree and the stakes are high, surface the disagreement to a human reviewer with both perspectives summarized.
from collections import Counter
from typing import Union

def resolve_conflict(outputs: list[str],
                     strategy: str = "orchestrator") -> Union[str, dict]:
    """Resolve disagreeing agent outputs with one of three strategies."""
    if strategy == "vote":
        # Simple majority on exact string matches - works best with structured outputs
        votes = Counter(outputs)
        return votes.most_common(1)[0][0]

    elif strategy == "orchestrator":
        resolver = SpecialistAgent(AgentConfig(
            name="resolver",
            system_prompt="You resolve disagreements between agents. "
                          "Analyze each position, identify the strongest "
                          "arguments, and produce a final answer."
        ))
        conflict_summary = "\n\n".join(
            f"Agent {i+1} says:\n{output}" for i, output in enumerate(outputs)
        )
        return resolver.run("Resolve this disagreement.", context=conflict_summary)

    elif strategy == "escalate":
        # Hand off to a human: return a structured payload instead of an answer
        return {
            "status": "needs_human_review",
            "conflicting_outputs": outputs
        }

    raise ValueError(f"Unknown strategy: {strategy!r}")
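Exact-match voting is brittle. "Paris" and "paris." count as different answers, and a 2-2 split silently picks whichever answer came first. Below is a sketch of a slightly sturdier vote that assumes short free-text answers. It normalizes case, whitespace, and trailing punctuation. It returns `None` unless one answer has a strict majority, so the caller can fall back to the `"orchestrator"` or `"escalate"` strategy. `normalize` and `majority_vote` are hypothetical helpers.

```python
from collections import Counter
from typing import Optional


def normalize(answer: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods/exclamation marks."""
    return " ".join(answer.lower().split()).rstrip(".!")


def majority_vote(outputs: list[str], quorum: float = 0.5) -> Optional[str]:
    """Return the normalized answer backed by more than `quorum` of the voters.

    Returns None on a tie or a weak plurality, signalling that another
    resolution strategy is needed.
    """
    if not outputs:
        return None
    counts = Counter(normalize(o) for o in outputs)
    answer, votes = counts.most_common(1)[0]
    return answer if votes / len(outputs) > quorum else None
```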
Scaling Multi-Agent Systems
Multi-agent systems are expensive. Every agent call is an LLM invocation, and orchestration adds overhead. Here's how to keep costs manageable:
| Strategy | Implementation | Impact |
|---|---|---|
| Use cheaper models for simple agents | GPT-4o-mini for extraction, GPT-4o for reasoning | 5-10x cost reduction |
| Parallelize independent steps | Use asyncio.gather() for agents that don't depend on each other | 2-4x latency reduction |
| Cache intermediate results | Hash inputs and cache agent outputs | Eliminates redundant calls |
| Short-circuit when possible | Skip the fact-checker if the article is purely opinion | Reduces unnecessary steps |
| Set token budgets per agent | max_tokens=1000 for summarizers, more for writers | Prevents runaway costs |
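import asyncio

async def parallel_research(topics: list[str], researcher: SpecialistAgent) -> list[str]:
    """Fan out research tasks in parallel."""
    # run() blocks on the HTTP call, so run each one in a worker thread
    tasks = [
        asyncio.to_thread(researcher.run, topic)
        for topic in topics
    ]
    return await asyncio.gather(*tasks)

# Usage: results = asyncio.run(parallel_research(["topic A", "topic B"], researcher))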
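For the caching row in the table above, here is a minimal sketch. It wraps any `run(task, context)` function and keys an in-memory cache on a SHA-256 hash of everything that can change the output. `CachedAgent` is a hypothetical wrapper; for a `SpecialistAgent` you might build it as `CachedAgent("writer", writer.run, writer.config.model)`. LLM outputs vary between calls, so a cache pins the first answer. That is usually what you want for reruns, but not when you are sampling for diverse outputs.

```python
import hashlib
import json
from typing import Callable

RunFn = Callable[[str, str], str]  # (task, context) -> output


class CachedAgent:
    """Wraps a run function with an in-memory cache keyed by a hash of its inputs."""

    def __init__(self, name: str, run_fn: RunFn, model: str = "unknown") -> None:
        self.name = name
        self.model = model
        self._run_fn = run_fn
        self._cache: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, task: str, context: str) -> str:
        # Include everything that can change the output: agent, model, task, context.
        raw = json.dumps([self.name, self.model, task, context])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def run(self, task: str, context: str = "") -> str:
        key = self._key(task, context)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        result = self._run_fn(task, context)
        self._cache[key] = result
        return result
```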
Profile your pipeline before optimizing. I've seen teams spend days parallelizing agents when the real bottleneck was a single slow tool call. Measure first.
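Below is a lightweight way to measure first, assuming wall-clock time is what you care about. Wrap each agent call and tool call in a labeled timer, then sort by total time. `StepTimer` is a hypothetical helper built only on the standard library. The `time.sleep` calls stand in for real agent and tool calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Iterator


class StepTimer:
    """Accumulates wall-clock time per labeled step (agent call, tool call, ...)."""

    def __init__(self) -> None:
        self.totals: defaultdict[str, float] = defaultdict(float)
        self.counts: defaultdict[str, int] = defaultdict(int)

    @contextmanager
    def measure(self, label: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the step raised.
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def report(self) -> list[tuple[str, float, int]]:
        """(label, total_seconds, calls) rows, slowest first."""
        return sorted(
            ((label, self.totals[label], self.counts[label]) for label in self.totals),
            key=lambda row: row[1],
            reverse=True,
        )


timer = StepTimer()
with timer.measure("agent:researcher"):
    time.sleep(0.02)  # stand-in for an LLM call
with timer.measure("tool:web_search"):
    time.sleep(0.05)  # stand-in for a slow tool call
for label, seconds, calls in timer.report():
    print(f"{label:20s} {seconds:6.3f}s over {calls} call(s)")
```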
Real-World Multi-Agent Frameworks
You don't have to build everything from scratch. Here's how the major frameworks approach multi-agent:
| Framework | Architecture Style | Strengths | Weaknesses | Best For |
|---|---|---|---|---|
| AutoGen (Microsoft) | Conversation-based, agents chat | Flexible conversation patterns, GroupChat | Can be verbose, hard to control flow | Research, exploratory tasks |
| CrewAI | Role-based, sequential/parallel | Simple API, role abstraction | Less flexible for complex graphs | Content pipelines, business workflows |
| LangGraph | Graph-based state machine | Precise flow control, conditional edges | Steeper learning curve | Complex workflows with branching |
| OpenAI Swarm | Lightweight handoffs | Minimal abstraction, easy to understand | Experimental (not intended for production), limited built-in coordination | Simple agent-to-agent delegation |
| AutoGen Studio | Visual builder | No-code agent orchestration | Less customizable | Prototyping, non-developers |
My recommendation: start with LangGraph if you need precise control, CrewAI if you want simplicity, and plain code (like the examples in this chapter) if you want to understand what's happening.
Anti-Patterns: What Goes Wrong
The Committee Anti-Pattern. You create 8 agents for a task that one agent could handle. Each agent adds latency, cost, and potential failure points. More agents do not mean better output.
The Telephone Game. Information degrades as it passes through too many agents. By the time the 5th agent sees the original request, it's been summarized and reinterpreted so many times that critical details are lost.
The Infinite Loop. Agent A asks Agent B for clarification. Agent B asks Agent A for more context. Neither has a termination condition. Always set maximum iteration limits.
Over-Specified Orchestration. The orchestrator's plan is so detailed that the specialist agents have no room to apply their expertise. If you're dictating every sentence, you don't need specialists - you need a typist.
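import logging

logger = logging.getLogger(__name__)

# Always set a maximum number of steps
MAX_STEPS = 10

# Inside Orchestrator.execute(), where `steps` comes from self.plan()
for i, step in enumerate(steps):
    if i >= MAX_STEPS:
        logger.warning("Hit max steps (%d). Forcing completion.", MAX_STEPS)
        break
    # ... execute step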
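A step cap catches loops, but it does not catch runaway spend on a handful of very long calls. The sketch below extends the same idea to tokens and wall-clock time. `RunBudget` and `BudgetExceeded` are hypothetical names, and the default limits are arbitrary. The per-step token count could come from something like the `usage.total_tokens` field of a Chat Completions response. The `SpecialistAgent` above would need to return that field alongside its text.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a run hits any of its hard limits."""


@dataclass
class RunBudget:
    """Hard limits for one multi-agent run (illustrative defaults)."""
    max_steps: int = 10
    max_tokens: int = 50_000
    max_seconds: float = 120.0
    steps: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def check(self) -> None:
        """Call before each step; raises if any limit has been reached."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached")
        if self.tokens >= self.max_tokens:
            raise BudgetExceeded(
                f"token limit {self.max_tokens} reached ({self.tokens} used)"
            )
        if time.monotonic() - self.started >= self.max_seconds:
            raise BudgetExceeded(f"time limit {self.max_seconds}s reached")

    def charge(self, tokens_used: int) -> None:
        """Call after each step with the tokens it consumed."""
        self.steps += 1
        self.tokens += tokens_used
```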
When NOT to Use Multi-Agent
I want to end with the most important advice in this chapter: default to a single agent.
Use a single agent when:
- The task is well-defined and doesn't require conflicting perspectives
- Latency matters more than output quality
- Your budget is tight (multi-agent systems multiply costs)
- The task is simple enough that one good system prompt covers it
- You're prototyping and don't yet know the workflow shape
Use multi-agent when:
- A single agent consistently fails at part of the task
- You need adversarial verification (security, compliance, fact-checking)
- The workflow has natural pipeline stages with different requirements
- You need to parallelize work across many subtasks
- Quality is more important than speed or cost
"The best multi-agent system is the one you didn't build because a single agent was enough."
Multi-agent systems are a powerful tool, but they're a tool for specific problems. In the next chapters, we'll look at how to give these agents (single or multi) access to external knowledge through RAG, and how to keep them safe with guardrails.