
Level 4 Capstone: Ship Your AI Product

Everything you've learned - API, RAG, agents, vision, production, evals - comes together in one real project.

You're now an AI builder.

Level 4 covered the full stack of building with Claude: calling the API, extracting structured data, building RAG pipelines, orchestrating multi-agent systems, processing images, shipping to production, building responsibly, and measuring quality with evals. The capstone is to ship something real.

Option A: AI-Powered Support Inbox

Build a full-stack Next.js app that triages, summarises, and drafts replies for a customer support inbox. This pulls together structured outputs, RAG over your knowledge base, and a clean UI.

What to build:

Ingest a knowledge base (docs, FAQs) into Supabase pgvector
API endpoint: triage incoming ticket (category + priority) using tool use
API endpoint: retrieve relevant docs via RAG, draft a reply
Next.js UI: ticket list, click to see AI summary + draft reply
Eval suite: 20 test tickets with expected triage labels
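
The ingest and retrieval steps above can be pictured with the following sketch. It is written in Python for brevity even though this option targets Next.js, and it rests on several assumptions that are not part of the course material: the names `chunk_text`, `cosine`, and `top_k` are hypothetical, chunks are fixed-size character windows with overlap, and retrieval runs in memory. In the real app the vectors would come from an embedding model and pgvector would perform the similarity search in SQL.

```python
from math import sqrt


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size].strip()
        if piece:
            chunks.append(piece)
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float],
          rows: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Return the content of the k rows whose embeddings best match the query."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [content for content, _ in ranked[:k]]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk. Character windows are the simplest choice; splitting on headings or paragraphs is usually a better fit for FAQ-style documents.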

Starter prompt for Claude Code:

Build a Next.js 14 App Router customer support inbox app.
Stack: TypeScript, Tailwind, Supabase, Anthropic SDK.

Features:
1. Supabase table: tickets (id, subject, body, status, category, priority, created_at)
2. Supabase table: knowledge_base (id, content, metadata, embedding vector(1536))
3. POST /api/ingest - chunk and embed a text document into knowledge_base
4. POST /api/triage - given a ticket body, return {category, priority} using
   claude-haiku-4-5 with tool use for structured JSON output
5. POST /api/draft-reply - RAG over knowledge_base, return a draft reply
   using claude-opus-4-8 with prompt caching on the system prompt
6. GET /api/tickets - list all tickets
7. UI: /inbox page with ticket list; click opens detail with:
   - Ticket body
   - AI triage badge (category + priority)
   - Suggested reply (editable textarea)
   - "Send" button (marks ticket resolved)

Add exponential backoff retry on all Claude calls.
Add per-request cost logging to console.
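
The last two lines of the prompt, retry with exponential backoff and per-request cost logging, can be sketched as below. This is a minimal illustration in Python rather than the TypeScript the prompt asks for, and everything named here is an assumption: `with_backoff`, `request_cost`, and `log_cost` are hypothetical helpers, and the per-million-token prices are parameters you supply from the current price list, not values taken from this course.

```python
import random
import time


def with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on failure wait 1s, 2s, 4s, ... (plus jitter) and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads retries out so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


def request_cost(input_tokens, output_tokens,
                 in_price_per_mtok, out_price_per_mtok):
    """Cost in USD, given prices per million input and output tokens."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000


def log_cost(label, input_tokens, output_tokens,
             in_price_per_mtok, out_price_per_mtok):
    """Print one cost line per request and return the cost."""
    cost = request_cost(input_tokens, output_tokens,
                        in_price_per_mtok, out_price_per_mtok)
    print(f"[cost] {label}: in={input_tokens} out={output_tokens} usd={cost:.6f}")
    return cost
```

In a real route, `fn` would wrap the SDK call, `retry_on` would be narrowed to transient failures such as rate limits and timeouts, and the token counts would be read from the usage data returned with each response.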

Option B: Multi-Agent Research Assistant

Build a pipeline that takes a research question and returns a structured report - with parallel sub-agents, a critic loop, and source citations.

What to build:

Orchestrator: decompose question into 4 sub-questions
Researcher agents: answer each sub-question in parallel (asyncio)
Synthesiser: combine answers into a coherent 600-word report
Critic loop: score report 1-10; revise if below 8 (max 3 rounds)
CLI or simple web UI to submit questions and display the report
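
The parallel researcher step in the list above can be sketched with `asyncio.gather`. The `researcher` coroutine here is a stand-in that returns a canned string instead of calling Claude, and `research_all` and its `limit` parameter are hypothetical names added for illustration; the semaphore is an optional extra that caps how many calls are in flight at once.

```python
import asyncio


async def researcher(sub_question: str) -> str:
    """Placeholder for a real model call; returns canned bullet-point findings."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"- findings for: {sub_question}"


async def research_all(sub_questions: list[str], limit: int = 4) -> list[str]:
    """Run one researcher per sub-question concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(question: str) -> str:
        async with semaphore:
            return await researcher(question)

    # gather() returns results in the same order as the inputs,
    # regardless of which call finishes first.
    return await asyncio.gather(*(bounded(q) for q in sub_questions))
```

From synchronous code such as a CLI entry point, this would be invoked as `asyncio.run(research_all(sub_questions))`.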

Starter prompt for Claude Code:

Build a Python multi-agent research assistant.

Architecture:
- orchestrator(question) -> list of 4 sub-questions
- researcher(sub_question) -> findings (bullet points)
  Use claude-haiku-4-5 for researchers (cheap + parallel)
- synthesiser(all_findings, original_question) -> 600-word report
  Use claude-opus-4-8 for synthesis
- critic(report) -> {"score": 1-10, "improvements": [...]}
  Loop max 3 times; stop if score >= 8 or "APPROVED" in output
- reviser(report, critique) -> improved report

Requirements:
- Use asyncio.gather() for parallel researcher calls
- Log token usage and cost per agent call to a CSV
- Add exponential backoff on all API calls
- CLI: python research.py "What are the tradeoffs of microservices?"
- Output: print final report + total cost summary

Eval: write 5 test questions with LLM-as-judge scoring
against ideal answers you write manually.
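
The critic loop in the prompt above is easy to get subtly wrong, so here is a minimal sketch of its control flow, separated from any API call. The `refine` function is a hypothetical name; `critic` and `reviser` are passed in as plain callables so the loop can be tested with stubs before real model calls are wired in. The sketch assumes the critic returns a dict with a numeric `score`, as in the architecture above.

```python
def refine(report, critic, reviser, threshold=8, max_rounds=3):
    """Critique and revise until the score reaches threshold or rounds run out.

    Returns the final report and the list of scores seen, one per round.
    """
    scores = []
    for _ in range(max_rounds):
        critique = critic(report)
        scores.append(critique["score"])
        if critique["score"] >= threshold:
            break  # good enough: keep this version
        report = reviser(report, critique)
    return report, scores
```

One design choice to be aware of: if the final round still scores below the threshold, the returned report is the revised version, which has not itself been scored. Logging the score history makes that visible in the cost and quality summary.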

Option C: Vision Document Processor

Build a document intelligence app that accepts uploaded images or PDFs and extracts structured data - receipts, invoices, forms, or contracts.

What to build:

Next.js page with drag-and-drop file upload (images + PDFs)
API route: send file to Claude with structured extraction prompt
Return extracted data as JSON (vendor, amount, date, line items)
Display results in a clean table with edit capability
Export to CSV button
History: store all processed documents in Supabase

Starter prompt for Claude Code:

Build a Next.js 14 document intelligence app.
Stack: TypeScript, Tailwind, Supabase, Anthropic SDK.

Features:
1. /upload page: drag-and-drop zone accepting JPEG, PNG, PDF (max 10MB)
2. POST /api/extract:
   - Accept file as FormData
   - Convert to base64 (image) or base64 pdf document block
   - Send to claude-opus-4-8 with this system prompt:
     "Extract all data from this document. Return JSON:
      {type: 'receipt'|'invoice'|'form'|'contract'|'other',
       vendor: string|null, date: string|null,
       total: number|null, currency: string|null,
       line_items: [{description, quantity, unit_price, total}],
       notes: string}"
   - Validate JSON with try/catch; retry once on failure
3. Display result as editable table
4. "Export CSV" button for line_items
5. Supabase table: documents (id, filename, type, extracted_data, created_at)
6. /history page: list all processed documents

Add output safety check: if extracted total > 100000, flag for manual review.
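
The "validate, retry once" step and the safety check in the prompt above can be sketched as below, again in Python for brevity. The names `extract_with_retry` and `needs_review` are hypothetical, and `call_model` is an assumed zero-argument callable that returns the model's raw text, so the logic can be exercised without a network call.

```python
import json

REVIEW_THRESHOLD = 100_000  # totals above this are flagged for manual review


def extract_with_retry(call_model, retries: int = 1) -> dict:
    """Call the model, parse its output as a JSON object, retry on bad output."""
    last_error = None
    for _ in range(retries + 1):
        raw = call_model()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
            continue  # malformed JSON: ask the model again
        if isinstance(data, dict):
            return data
        last_error = ValueError("expected a JSON object at the top level")
    raise ValueError("model did not return valid JSON") from last_error


def needs_review(data: dict) -> bool:
    """True when the extracted total is a number above the review threshold."""
    total = data.get("total")
    is_number = isinstance(total, (int, float)) and not isinstance(total, bool)
    return is_number and total > REVIEW_THRESHOLD
```

Models sometimes wrap JSON in a code fence or add a sentence before it; if that shows up in your evals, strip it before parsing or request the data through tool use instead.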

Capstone Requirements (all options)

Uses the Claude API directly - no UI wrappers, raw SDK calls

Structured outputs - at least one feature returns validated JSON

Production-ready - retry logic, error handling, cost logging

Eval suite - 10+ test cases with a measurable pass/fail score

Deployed - live on Vercel or Railway with a shareable URL

Responsible - input sanitisation, output safety check, AI disclosure in UI
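
For the eval suite requirement, a measurable pass/fail score can be as simple as the sketch below. It assumes a triage-style task where each test case carries an `id`, a `body`, and the `expected` fields; `run_eval` is a hypothetical name, and `triage` is any callable that maps a ticket body to a dict, so a stub can stand in for the model while you build the harness.

```python
def run_eval(triage, cases: list[dict]) -> tuple[float, list[dict]]:
    """Run triage over every case and return (pass_rate, per-case results)."""
    results = []
    for case in cases:
        try:
            got = triage(case["body"])
        except Exception as exc:
            # A crash counts as a failed case instead of aborting the whole run.
            got = {"error": str(exc)}
        passed = all(got.get(key) == value
                     for key, value in case["expected"].items())
        results.append({"id": case["id"], "passed": passed, "got": got})
    pass_rate = (sum(r["passed"] for r in results) / len(results)
                 if results else 0.0)
    return pass_rate, results
```

Keeping the per-case results alongside the pass rate makes regressions easy to spot: after a prompt change, compare which case ids flipped, not just the headline number.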

What You've Accomplished

Completing Level 4 means you can:

Call the Claude API from any language with streaming, retries, and cost tracking
Extract structured data reliably using JSON mode, instructor, and tool use
Build RAG systems over your own data with Supabase pgvector
Orchestrate multiple Claude agents in parallel and in critic-reviser loops
Deploy Claude into production with caching, rate limiting, and observability
Build responsibly - prompt injection defence, output filtering, transparency
Measure and improve prompt quality with systematic evals

You're not a Claude user anymore. You're a Claude builder.

🧠 Check your understanding

5 quick questions on Level 4. Answer each, then check your score.

1. What does RAG (Retrieval-Augmented Generation) do?

2. Structured outputs / tool use let the model...

3. In a production AI system you must handle...

4. A benefit of a multi-agent architecture is...

5. Why does prompt evaluation matter?
