Level 4 · Lesson 29 · ⏱️ 90 min

The Claude API: Direct Access

Stop relying on UIs - call Claude from code with streaming, multi-turn conversations, and full control

Why Go Direct to the API?

Claude Code and Cowork are powerful interfaces for your own work. But when you want to embed Claude inside a product, run it in a background job, process thousands of items in batch, or pipe its output into another system - you need the API. This is where Claude becomes infrastructure, not a tool you use.

What you need: An Anthropic API key from console.anthropic.com. Free credits on signup. Billing is per token, with input and output tokens priced separately. No subscription required.

The Messages API

Everything goes through one endpoint: POST /v1/messages. Here is the full anatomy:

POST https://api.anthropic.com/v1/messages
x-api-key: $ANTHROPIC_API_KEY
anthropic-version: 2023-06-01
Content-Type: application/json

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "You are a senior Python engineer. Reply with code only.",
  "messages": [
    { "role": "user", "content": "Write a function to validate an email address" }
  ]
}
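For reference, the same request can be assembled with nothing but Python's standard library. This is a minimal sketch, not the official client: the helper name build_request is hypothetical, and the request is only built here, not sent, so it runs without a key or network access.

```python
import json
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(api_key: str, user_text: str) -> urllib.request.Request:
    """Assemble (but do not send) a Messages API request."""
    body = {
        "model": "claude-sonnet-4-6",  # model ID as used in this lesson
        "max_tokens": 1024,
        "system": "You are a senior Python engineer. Reply with code only.",
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


# To actually send it (a billed network call), something like:
#   with urllib.request.urlopen(build_request(key, "...")) as resp:
#       print(json.load(resp)["content"][0]["text"])
```

Seeing the raw headers and body once makes it easier to debug SDK calls later, since the SDKs send this same shape.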

Response structure (abbreviated):

{
  "id": "msg_...",
  "type": "message",
  "role": "assistant",
  "content": [{ "type": "text", "text": "def validate_email..." }],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 28, "output_tokens": 87 }
}
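Because content is a list of blocks rather than a single string, it can help to pull the text out with a small helper instead of indexing content[0] everywhere. A minimal sketch working on the parsed JSON body; extract_text is a hypothetical helper name, and the sample dict simply mirrors the response shown above.

```python
def extract_text(response: dict) -> str:
    """Join the text of every text block in a Messages API response body."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


sample = {
    "role": "assistant",
    "content": [{"type": "text", "text": "def validate_email..."}],
    "model": "claude-sonnet-4-6",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 28, "output_tokens": 87},
}

print(extract_text(sample))  # def validate_email...
print(sample["usage"]["input_tokens"] + sample["usage"]["output_tokens"])  # 115
```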

SDK Setup - Python and Node

Python SDK

pip install anthropic

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain RAG in 3 sentences."}]
)

print(message.content[0].text)

Node.js / TypeScript

npm install @anthropic-ai/sdk

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Explain RAG in 3 sentences.' }],
})

console.log(message.content[0].text)

Best practice: Always set ANTHROPIC_API_KEY as an environment variable. Never hardcode it in source files. Use dotenv locally and your platform's secrets manager in production.
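One way to make a missing key fail loudly at startup rather than on the first request is a small loader. This is a sketch under the assumption that the key is exported as ANTHROPIC_API_KEY; load_api_key is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it locally or add it "
            "to your platform's secrets manager."
        )
    return key
```

The result can be passed explicitly, for example anthropic.Anthropic(api_key=load_api_key()), although the SDK also reads the variable on its own.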

Streaming Responses

Streaming sends tokens as they are generated. Essential for any user-facing feature - users see output immediately rather than waiting 5-30 seconds for a full response.

Python streaming

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about APIs."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print each text chunk as it arrives
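Under the hood, a streamed response arrives as server-sent events, and the SDK's text_stream collects the text deltas for you. The sketch below shows roughly what that parsing involves; iter_text_deltas is a hypothetical helper, and the sample lines are hand-written, abbreviated events rather than captured output.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from raw server-sent-event lines."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        event = json.loads(line[len("data:"):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta["text"]


# Hand-written sample events (illustrative, not a real capture)
sample_lines = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

print("".join(iter_text_deltas(sample_lines)))  # Hello, world
```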

Next.js API route streaming to browser

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function POST(req: Request) {
  const { message } = await req.json()
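  // .stream() returns the stream object directly, so no await is needed
  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })
  // toReadableStream() emits newline-delimited JSON events, not SSE frames
  return new Response(stream.toReadableStream(), {
    headers: { 'Content-Type': 'application/x-ndjson' },
  })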
}

Multi-Turn Conversations

The API is stateless - Claude remembers nothing between calls. You must pass the full conversation history on every request. You control exactly what context Claude has.

messages = []

# Turn 1
messages.append({"role": "user", "content": "My name is Jay."})
response = client.messages.create(model="claude-sonnet-4-6",
    max_tokens=256, messages=messages)
messages.append({"role": "assistant", "content": response.content[0].text})

# Turn 2 - Claude remembers because we pass the full history
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(model="claude-sonnet-4-6",
    max_tokens=256, messages=messages)
# "Your name is Jay."

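# Track cost for this call (input and output tokens are priced differently)
usage = response.usage
# Assumed Sonnet prices in USD per million tokens: 3 input, 15 output. Check current pricing.
cost_usd = (usage.input_tokens * 3 + usage.output_tokens * 15) / 1_000_000
total = usage.input_tokens + usage.output_tokens
print(f"Tokens used: {total} (cost: ~USD {cost_usd:.4f})")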
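The cost line above covers a single response, while a conversation is billed on every call. A minimal sketch of a running total follows; UsageTracker is a hypothetical class, the token counts are made up, and the prices are assumptions ($3/M input is this lesson's Sonnet figure, $15/M output is assumed).

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulate token usage across every call in a conversation."""

    input_price_per_mtok: float   # USD per million input tokens
    output_price_per_mtok: float  # USD per million output tokens
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, input_tokens: int, output_tokens: int) -> None:
        """Record the usage reported by one response."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost_usd(self) -> float:
        """Total cost so far, pricing input and output separately."""
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000


# Prices are assumptions; confirm them for the model you call.
tracker = UsageTracker(input_price_per_mtok=3.0, output_price_per_mtok=15.0)
tracker.add(input_tokens=12, output_tokens=20)  # turn 1 (made-up counts)
tracker.add(input_tokens=45, output_tokens=10)  # turn 2 (made-up counts)
print(tracker.input_tokens, tracker.output_tokens)  # 57 30
```

In real code, each call would pass response.usage.input_tokens and response.usage.output_tokens to add().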

Context cost warning: Long conversations get expensive. A 200K-token context window holds roughly 150K words of history, but every request is billed for the entire history as input tokens, so cost grows with each turn. Implement a sliding-window cutoff or summarisation for long-running conversations.
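A sliding-window cutoff can be as simple as keeping the last few messages. The sketch below counts messages rather than tokens, which is an assumption made for brevity; trim_history is a hypothetical helper, and a production version would more likely budget by token count.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep only the most recent messages, starting on a user turn."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # The Messages API expects the conversation to open with a user turn,
    # so drop any leading assistant message left over from the cut.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "My name is Jay."},
    {"role": "assistant", "content": "Nice to meet you, Jay."},
    {"role": "user", "content": "I work on billing systems."},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "What's my name?"},
]

window = trim_history(history, max_messages=4)
print(len(window), window[0]["content"])  # 3 I work on billing systems.
```

Note the trade-off this example exposes: after trimming, Claude no longer sees the first message, so it cannot answer the name question. Summarising the dropped turns into the system prompt is one way to keep such facts.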

Key API Parameters

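model - Haiku for bulk/cheap tasks ($0.25/M input), Sonnet for most production work ($3/M), Opus for the hardest reasoning ($15/M). These are input prices per million tokens; output tokens are billed at a higher rate, and prices differ between model generations, so confirm the figure for the exact model ID you call.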

max_tokens - Hard cap on output tokens. Set it to about 2x what you expect. Generation stops at this limit and stop_reason comes back as "max_tokens"; the cap does not truncate your input.

temperature - Ranges from 0 to 1. 0 = most consistent (classification, extraction), though not guaranteed to be fully deterministic. 1 = creative/varied (writing). Default is 1. For production pipelines use 0.

system - Your most powerful lever. A precise system prompt beats a vague one every time. Spend more time here than on the user message.

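stop_sequences - Stop generation when Claude is about to output one of these strings. Useful for structured output: ["```"] halts at the first code fence Claude emits. The matched string is not included in the output, and stop_reason comes back as "stop_sequence".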
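Both max_tokens and stop_sequences show up in the response's stop_reason, so checking that field is a cheap way to catch truncated output before it reaches downstream code. A minimal sketch on the parsed response body; check_stop_reason and its return labels are hypothetical.

```python
def check_stop_reason(response: dict) -> str:
    """Classify how generation ended, based on the stop_reason field."""
    reason = response.get("stop_reason")
    if reason == "end_turn":
        return "complete"   # Claude finished on its own
    if reason == "max_tokens":
        return "truncated"  # output hit the max_tokens cap
    if reason == "stop_sequence":
        return "stopped"    # one of the stop_sequences matched
    return "other"          # any other value, handled by the caller


print(check_stop_reason({"stop_reason": "max_tokens"}))  # truncated
```

A pipeline might retry with a larger max_tokens when the result is "truncated" instead of parsing a half-finished answer.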

Worked Example: Bulk Ticket Classifier

Classify support tickets in bulk with Haiku (fast + cheap). Four sample tickets stand in for a batch of 100.

import json

import anthropic

client = anthropic.Anthropic()

tickets = [
    "I can't log in - password reset isn't working",
    "Please add dark mode to the dashboard",
    "The CSV export generates a blank file every time",
    "How do I add a team member to my account?",
]

def classify(ticket: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku: the cheapest tier, suited to bulk tasks
        max_tokens=128,
        temperature=0,  # Deterministic for classification
        system="""Return JSON only. No markdown.
{
  "category": "bug" | "feature" | "billing" | "how_to",
  "priority": "critical" | "high" | "medium" | "low",
  "summary": "one sentence"
}""",
        messages=[{"role": "user", "content": ticket}]
    )
    return json.loads(response.content[0].text)

for ticket in tickets:
    result = classify(ticket)
    print(f"[{result['priority'].upper()}] {result['category']}: {result['summary']}")
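Calling json.loads directly on model output will raise if the reply is ever wrapped in a markdown fence or uses a label outside the schema. A hedged sketch of a stricter parser follows; parse_classification is a hypothetical helper, the allowed values are copied from the system prompt above, and it needs Python 3.9+ for str.removeprefix.

```python
import json

CATEGORIES = {"bug", "feature", "billing", "how_to"}
PRIORITIES = {"critical", "high", "medium", "low"}


def parse_classification(raw: str) -> dict:
    """Parse and validate the classifier's JSON reply."""
    text = raw.strip()
    # Tolerate a markdown fence even though the prompt forbids it.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)  # raises a ValueError subclass on bad JSON
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unexpected priority: {data.get('priority')!r}")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    return data


reply = '{"category": "bug", "priority": "high", "summary": "Login fails."}'
print(parse_classification(reply)["priority"])  # high
```

In the loop above, a ValueError could be caught per ticket so that one malformed reply does not abort the whole batch.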

Lesson 29 Quick Reference

Messages API

POST /v1/messages with model, max_tokens, system, messages array. Stateless - pass full history each call.

Streaming

Use .stream() to get text as it is generated. Essential for user-facing UIs. In Node, stream.toReadableStream() converts the stream into a ReadableStream for the browser response.

temperature=0

Makes output as consistent as possible, though not strictly deterministic. Use for classification, extraction, JSON output. Default is 1.

Model costs

Haiku: $0.25/M input. Sonnet: $3/M input. Opus: $15/M input. Output tokens cost more, and prices vary by model version. Route by task complexity to control costs.

Multi-turn

API is stateless. Append each user+assistant turn to messages array. You control context window content.

Never hardcode keys

Use ANTHROPIC_API_KEY env var. dotenv locally, platform secrets manager in production. Never commit keys.
