Multi-Agent Architectures
One Claude is good. An orchestra of Claudes — each with a focused role — is transformative.
Why Multiple Agents?
Single-agent Claude hits limits: context windows fill up, tasks require parallel work, and some steps need a second opinion. Multi-agent systems divide work across specialised Claude instances that collaborate.
- Task is too long to fit in one context window
- Sub-tasks are independent and can run in parallel
- You need specialised roles (researcher, writer, critic)
- Quality matters enough to warrant a review step
Pattern 1: Orchestrator + Sub-Agents
One Claude (the orchestrator) breaks the task into subtasks and delegates to specialist sub-agents. Each sub-agent has its own system prompt and context — they never see each other's full conversation.
import anthropic
client = anthropic.Anthropic()
def call_claude(system: str, user: str, model="claude-opus-4-5") -> str:
r = client.messages.create(
model=model, max_tokens=1024,
messages=[
{"role": "user", "content": user}
],
system=system
)
return r.content[0].text
# Sub-agents with focused system prompts
def researcher(topic: str) -> str:
return call_claude(
"You are a research specialist. Extract key facts, "
"statistics, and examples. Return bullet points only.",
f"Research this topic thoroughly: {topic}"
)
def writer(research: str, angle: str) -> str:
return call_claude(
"You are a senior copywriter. Write engaging, "
"clear prose. No bullet points in output.",
f"Write a 400-word article using this research:
{research}
Angle: {angle}"
)
def editor(draft: str) -> str:
return call_claude(
"You are a meticulous editor. Fix grammar, improve "
"flow, tighten sentences. Return only the revised text.",
f"Edit this draft:
{draft}"
)
# Orchestrator
def orchestrate(topic: str, angle: str) -> str:
print("Researching...")
research = researcher(topic)
print("Writing...")
draft = writer(research, angle)
print("Editing...")
final = editor(draft)
return final
result = orchestrate(
topic="Prompt injection attacks in AI systems",
angle="Practical guide for developers"
)Pattern 2: Parallel Pipeline
Run sub-agents concurrently for independent tasks, then combine results. Uses Python's asyncio or ThreadPoolExecutor.
import asyncio
import anthropic
client = anthropic.AsyncAnthropic()
async def analyse_section(section: str, role: str) -> dict:
r = await client.messages.create(
model="claude-haiku-4-5-20251001", # fast + cheap for parallel work
max_tokens=512,
system=f"You are a {role}. Analyse the text and return JSON with "
'"findings": [list of key findings]',
messages=[{"role": "user", "content": section}]
)
import json
return json.loads(r.content[0].text)
async def parallel_analyse(document: str) -> dict:
sections = document.split("
")[:4] # first 4 paragraphs
roles = ["security auditor", "UX reviewer",
"performance engineer", "accessibility expert"]
tasks = [
analyse_section(s, r)
for s, r in zip(sections, roles)
]
results = await asyncio.gather(*tasks) # all run simultaneously
return {"analyses": results}
# Run it
report = asyncio.run(parallel_analyse(open("spec.txt").read()))claude-haiku-4-5 for parallel sub-agents — it's ~20x cheaper than Opus and fast. Reserve Opus for the orchestrator and final synthesis steps.Pattern 3: Critic + Reviser Loop
One Claude generates, another critiques. Loop until the critic is satisfied or a max iteration count is reached. Excellent for high-quality creative or technical output.
def critic_reviser_loop(task: str, max_rounds: int = 3) -> str:
draft = call_claude(
"You are an expert writer. Produce a high-quality first draft.",
task
)
for round in range(max_rounds):
critique = call_claude(
"You are a ruthless but constructive critic. "
"List specific improvements needed. If the draft is excellent, "
'respond with exactly "APPROVED".',
f"Critique this draft:
{draft}"
)
if "APPROVED" in critique:
print(f"Approved after {round + 1} round(s)")
break
draft = call_claude(
"You are an expert writer. Revise based on the critique.",
f"Original draft:
{draft}
Critique:
{critique}
Revise:"
)
return draftGuardrails & Cost Control
- Max iterations: always cap loops (3-5 rounds max)
- Token budgets: set lower
max_tokensfor sub-agents than orchestrator - Haiku for internals: sub-agents doing extraction/formatting don't need Opus
- Early stopping: check intermediate results — bail if something is clearly wrong
- Logging: log every agent call with token counts for cost monitoring
Hands-on: Build a Research Pipeline
Challenge: Build a 3-agent pipeline that takes a topic and produces a structured report.
- Agent 1 (Researcher): outline the topic into 4 sub-questions
- Agent 2 (Analyst): answer each sub-question in parallel with asyncio
- Agent 3 (Synthesiser): combine all answers into a coherent 500-word report
Stretch: Add a critic agent that scores the final report 1-10 on accuracy, clarity, and depth. If any score is below 7, trigger a revision.
Master agent that breaks tasks into subtasks and delegates
Specialised Claude with its own system prompt and context
asyncio.gather() to run independent agents simultaneously
Generate-critique-revise loop with APPROVED exit condition
Use cheaper/faster model for extraction; Opus for synthesis
Always cap loops — 3-5 rounds prevents runaway costs