Level 4Lesson 32⏱️ 100 min

Multi-Agent Architectures

One Claude is good. An orchestra of Claudes - each with a focused role - is transformative.

Why Multiple Agents?

Single-agent Claude hits limits: context windows fill up, tasks require parallel work, and some steps need a second opinion. Multi-agent systems divide work across specialised Claude instances that collaborate.

When to use multi-agent:

Task is too long to fit in one context window
Sub-tasks are independent and can run in parallel
You need specialised roles (researcher, writer, critic)
Quality matters enough to warrant a review step

Pattern 1: Orchestrator + Sub-Agents

One Claude (the orchestrator) breaks the task into subtasks and delegates to specialist sub-agents. Each sub-agent has its own system prompt and context - they never see each other's full conversation.

import anthropic
client = anthropic.Anthropic()

def call_claude(system: str, user: str, model="claude-opus-4-8") -> str:
    r = client.messages.create(
        model=model, max_tokens=1024,
        messages=[
            {"role": "user", "content": user}
        ],
        system=system
    )
    return r.content[0].text

# Sub-agents with focused system prompts
def researcher(topic: str) -> str:
    return call_claude(
        "You are a research specialist. Extract key facts, "
        "statistics, and examples. Return bullet points only.",
        f"Research this topic thoroughly: {topic}"
    )

def writer(research: str, angle: str) -> str:
    return call_claude(
        "You are a senior copywriter. Write engaging, "
        "clear prose. No bullet points in output.",
        f"Write a 400-word article using this research:
{research}
Angle: {angle}"
    )

def editor(draft: str) -> str:
    return call_claude(
        "You are a meticulous editor. Fix grammar, improve "
        "flow, tighten sentences. Return only the revised text.",
        f"Edit this draft:
{draft}"
    )

# Orchestrator
def orchestrate(topic: str, angle: str) -> str:
    print("Researching...")
    research = researcher(topic)
    print("Writing...")
    draft = writer(research, angle)
    print("Editing...")
    final = editor(draft)
    return final

result = orchestrate(
    topic="Prompt injection attacks in AI systems",
    angle="Practical guide for developers"
)

Pattern 2: Parallel Pipeline

Run sub-agents concurrently for independent tasks, then combine results. Uses Python's asyncio or ThreadPoolExecutor.

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

async def analyse_section(section: str, role: str) -> dict:
    r = await client.messages.create(
        model="claude-haiku-4-5-20251001",  # fast + cheap for parallel work
        max_tokens=512,
        system=f"You are a {role}. Analyse the text and return JSON with "
               '"findings": [list of key findings]',
        messages=[{"role": "user", "content": section}]
    )
    import json
    return json.loads(r.content[0].text)

async def parallel_analyse(document: str) -> dict:
    sections = document.split("

")[:4]   # first 4 paragraphs
    roles = ["security auditor", "UX reviewer",
             "performance engineer", "accessibility expert"]

    tasks = [
        analyse_section(s, r)
        for s, r in zip(sections, roles)
    ]
    results = await asyncio.gather(*tasks)   # all run simultaneously
    return {"analyses": results}

# Run it
report = asyncio.run(parallel_analyse(open("spec.txt").read()))

Cost tip: Use claude-haiku-4-5 for parallel sub-agents - it's ~20x cheaper than Opus and fast. Reserve Opus for the orchestrator and final synthesis steps.

Pattern 3: Critic + Reviser Loop

One Claude generates, another critiques. Loop until the critic is satisfied or a max iteration count is reached. Excellent for high-quality creative or technical output.

def critic_reviser_loop(task: str, max_rounds: int = 3) -> str:
    draft = call_claude(
        "You are an expert writer. Produce a high-quality first draft.",
        task
    )

    for round in range(max_rounds):
        critique = call_claude(
            "You are a ruthless but constructive critic. "
            "List specific improvements needed. If the draft is excellent, "
            'respond with exactly "APPROVED".',
            f"Critique this draft:
{draft}"
        )

        if "APPROVED" in critique:
            print(f"Approved after {round + 1} round(s)")
            break

        draft = call_claude(
            "You are an expert writer. Revise based on the critique.",
            f"Original draft:
{draft}

Critique:
{critique}

Revise:"
        )

    return draft

Guardrails & Cost Control

Multi-agent can get expensive fast. Use these controls:

Max iterations: always cap loops (3-5 rounds max)
Token budgets: set lower max_tokens for sub-agents than orchestrator
Haiku for internals: sub-agents doing extraction/formatting don't need Opus
Early stopping: check intermediate results - bail if something is clearly wrong
Logging: log every agent call with token counts for cost monitoring

Hands-on: Build a Research Pipeline

Challenge: Build a 3-agent pipeline that takes a topic and produces a structured report.

Agent 1 (Researcher): outline the topic into 4 sub-questions
Agent 2 (Analyst): answer each sub-question in parallel with asyncio
Agent 3 (Synthesiser): combine all answers into a coherent 500-word report

Stretch: Add a critic agent that scores the final report 1-10 on accuracy, clarity, and depth. If any score is below 7, trigger a revision.

Lesson 32 Quick Reference

Orchestrator

Master agent that breaks tasks into subtasks and delegates

Sub-agent

Specialised Claude with its own system prompt and context

Parallel pipeline

asyncio.gather() to run independent agents simultaneously

Critic + reviser

Generate-critique-revise loop with APPROVED exit condition

Haiku for sub-agents

Use cheaper/faster model for extraction; Opus for synthesis

Max iterations

Always cap loops - 3-5 rounds prevents runaway costs

← L31: Building RAG Systems

Unlocks in ~25 min of reading

L33: Claude for Teams & Orgs →