The Claude API: Direct Access
Stop relying on UIs — call Claude from code with streaming, multi-turn conversations, and full control
Why Go Direct to the API?
Claude Code and Cowork are powerful interfaces for your own work. But when you want to embed Claude inside a product, run it in a background job, process thousands of items in batch, or pipe its output into another system — you need the API. This is where Claude becomes infrastructure, not a tool you use.
Get an API key at console.anthropic.com. Free credits on signup. Billing is per token, input plus output. No subscription required.

The Messages API
Everything goes through one endpoint: POST /v1/messages. Here is the full anatomy:
POST https://api.anthropic.com/v1/messages
x-api-key: $ANTHROPIC_API_KEY
anthropic-version: 2023-06-01
Content-Type: application/json

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "system": "You are a senior Python engineer. Reply with code only.",
  "messages": [
    { "role": "user", "content": "Write a function to validate an email address" }
  ]
}

Response structure:
{
  "role": "assistant",
  "content": [{ "type": "text", "text": "def validate_email..." }],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "usage": { "input_tokens": 28, "output_tokens": 87 }
}

SDK Setup: Python and Node
pip install anthropic

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Explain RAG in 3 sentences."}],
)
print(message.content[0].text)

npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic() // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 1024,
  system: 'You are a helpful assistant.',
  messages: [{ role: 'user', content: 'Explain RAG in 3 sentences.' }],
})
console.log(message.content[0].text)

Set ANTHROPIC_API_KEY as an environment variable. Never hardcode it in source files. Use dotenv locally and your platform's secrets manager in production.

Streaming Responses
Streaming sends tokens as they are generated. It is essential for any user-facing feature: users see output start immediately rather than waiting 5-30 seconds for the full response.
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a poem about APIs."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print each text chunk as it arrives

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic()

export async function POST(req: Request) {
  const { message } = await req.json()
  const stream = await client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    messages: [{ role: 'user', content: message }],
  })
  return new Response(stream.toReadableStream(), {
    headers: { 'Content-Type': 'text/event-stream' },
  })
}

Multi-Turn Conversations
The API is stateless — Claude remembers nothing between calls. You must pass the full conversation history on every request. You control exactly what context Claude has.
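Because the history is an ordinary list that you own, deciding what Claude sees comes down to list manipulation. The following is a minimal sketch, assuming you want to cap how many messages go out with each request. `trim_history` is a hypothetical helper, not part of the SDK, and it assumes the conversation should still open with a user turn after trimming. It makes no API calls.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_messages: int) -> List[Message]:
    """Return the most recent messages, starting on a user turn (hypothetical helper)."""
    if max_messages <= 0:
        return []
    recent = messages[-max_messages:]
    # Drop leading assistant turns so the trimmed history opens with a user message.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return recent


history = [
    {"role": "user", "content": "My name is Jay."},
    {"role": "assistant", "content": "Nice to meet you, Jay."},
    {"role": "user", "content": "What's my name?"},
]

# With a budget of two messages, the orphaned assistant reply is dropped as well,
# so only the final question survives and the name is no longer in context.
print(trim_history(history, 2))
```

The trade-off is visible in the example: anything trimmed from the list is gone as far as Claude is concerned, so a budget that is too tight removes facts the next answer may depend on.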
messages = []

# Turn 1
messages.append({"role": "user", "content": "My name is Jay."})
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=256, messages=messages
)
messages.append({"role": "assistant", "content": response.content[0].text})

# Turn 2: Claude "remembers" only because we pass the full history
messages.append({"role": "user", "content": "What's my name?"})
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=256, messages=messages
)
# The reply will be along the lines of "Your name is Jay."

# Track usage for the last call
usage = response.usage
total = usage.input_tokens + usage.output_tokens
# Rough estimate at Sonnet's $3/M input rate. Output tokens are billed at a
# higher rate, so treat this figure as a lower bound.
cost_usd = total * 3 / 1_000_000
print(f"Tokens used: {total} (cost: ~USD {cost_usd:.4f})")

Key API Parameters
model: Haiku for bulk/cheap tasks ($0.25/M input), Sonnet for most production work ($3/M input), Opus for the hardest reasoning ($15/M input).
max_tokens: Hard cap on output. Set it to about 2x what you expect. Generation stops at this limit and the response reports stop_reason "max_tokens"; it does not truncate your input.
temperature: 0 = near-deterministic and consistent (classification, extraction). 1 = creative and varied (writing). Default is 1. For production pipelines use 0.
system: Your most powerful lever. A precise system prompt beats a vague one every time. Spend more time here than on the user message.
stop_sequences: Generation stops when Claude would output one of these strings; the matched string is not included in the response. Useful for structured output: ["```"] cuts the response at the first code fence.

Worked Example: Bulk Ticket Classifier
Classify 100 support tickets with Haiku (fast + cheap)
import json

import anthropic

client = anthropic.Anthropic()

tickets = [
    "I can't log in — password reset isn't working",
    "Please add dark mode to the dashboard",
    "The CSV export generates a blank file every time",
    "How do I add a team member to my account?",
]

def classify(ticket: str) -> dict:
    """Classify one ticket into a category, a priority, and a one-sentence summary."""
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku: 12x cheaper for bulk tasks
        max_tokens=128,
        temperature=0,  # Consistent output for classification
        system="""Return JSON only. No markdown.
{
  "category": "bug" | "feature" | "billing" | "how_to",
  "priority": "critical" | "high" | "medium" | "low",
  "summary": "one sentence"
}""",
        messages=[{"role": "user", "content": ticket}],
    )
    # Raises json.JSONDecodeError if the reply is not pure JSON
    return json.loads(response.content[0].text)

for ticket in tickets:
    result = classify(ticket)
    print(f"[{result['priority'].upper()}] {result['category']}: {result['summary']}")

POST /v1/messages with model, max_tokens, system, messages array. Stateless: pass full history each call.
Use .stream() to get tokens as generated. Essential for user-facing UIs. Returns ReadableStream for browser.
temperature=0 makes output near-deterministic and consistent. Use for classification, extraction, JSON output. Default is 1.
Haiku: $0.25/M input. Sonnet: $3/M input. Opus: $15/M input. Route by task complexity to control costs.
API is stateless. Append each user+assistant turn to messages array. You control context window content.
Use ANTHROPIC_API_KEY env var. dotenv locally, platform secrets manager in production. Never commit keys.
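The ticket classifier in the worked example passes the model's reply straight to json.loads, which fails if the reply is wrapped in a code fence or uses a label outside the requested set. The following is one possible sketch of a stricter parsing step. `parse_classification` and the two ALLOWED_* tuples are hypothetical names introduced here; the allowed values are copied from the example's system prompt. It runs offline on a sample reply string.

```python
import json

# Allowed labels, copied from the worked example's system prompt.
ALLOWED_CATEGORIES = ("bug", "feature", "billing", "how_to")
ALLOWED_PRIORITIES = ("critical", "high", "medium", "low")


def parse_classification(raw: str) -> dict:
    """Parse and validate one classifier reply; raise ValueError on bad output."""
    text = raw.strip()
    # Tolerate a reply wrapped in a Markdown code fence despite the instructions.
    if text.startswith("```"):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if data.get("category") not in ALLOWED_CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("priority") not in ALLOWED_PRIORITIES:
        raise ValueError(f"unexpected priority: {data.get('priority')!r}")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    return data


# Sample reply string standing in for response.content[0].text
reply = '{"category": "bug", "priority": "high", "summary": "CSV export is blank."}'
print(parse_classification(reply)["category"])
```

In a bulk run, catching ValueError per ticket lets one malformed reply be logged or retried without stopping the rest of the batch.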