Level 4 · Lesson 31 · ⏱️ 100 min

Building RAG Systems

Retrieval-Augmented Generation: teach Claude about YOUR data without fine-tuning.

What is RAG and Why Does It Matter?

Claude knows a lot - but not your company wiki, your product docs, or last week's sales data. RAG solves this by retrieving relevant chunks of your data at query time and stuffing them into Claude's context window.

RAG pipeline in 4 steps:

Ingest - chunk your documents, embed each chunk into a vector
Store - save vectors in a vector database (Supabase pgvector, Pinecone, etc.)
Retrieve - embed the user query, find closest chunks by cosine similarity
Generate - pass retrieved chunks + question to Claude, get grounded answer
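Before the real services come in, here is a toy, fully in-memory sketch of the four steps. It makes three substitutions. The embedding model is replaced by a hypothetical bag-of-words function, toy_embed. The vector database is replaced by a Python list. The final step builds the prompt instead of calling Claude. Only the shape of the pipeline carries over to the real thing.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: lowercase word counts
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Ingest: each chunk is paired with its vector
chunks = [
    "Employees get 20 days of paid vacation per year.",
    "Parental leave is 16 weeks at full pay.",
    "The office is closed on public holidays.",
]
# 2. Store: a plain list plays the role of the vector database
store = [(c, toy_embed(c)) for c in chunks]

def retrieve(question: str, k: int = 2) -> list[str]:
    # 3. Retrieve: rank stored chunks by cosine similarity to the query
    q = toy_embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    # 4. Generate: this context + question is what Claude would receive
    context = "\n---\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long is parental leave?"))
```

Real embeddings capture meaning rather than shared words. The control flow stays the same: embed, store, rank by similarity, then hand the top chunks to the model.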

Step 1: Chunking Strategy

How you split documents dramatically affects quality. Bad chunking = bad retrieval = hallucinations.

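# pip install anthropic supabase
def chunk_text(text: str, chunk_size: int = 500,
               overlap: int = 50) -> list[str]:
    """
    Sliding window chunker.
    - chunk_size: characters per chunk (~125 tokens)
    - overlap: characters shared between adjacent chunks
      (preserves context at boundaries)
    """
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would never advance the window
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = start + chunk_size
        chunks.append(text[start:end])
        if end >= len(text):
            break                       # this window reached the end of the text
        start += chunk_size - overlap   # slide with overlap
    return chunks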

# Better: use LangChain's RecursiveCharacterTextSplitter
# (pip install langchain-text-splitters), which tries to split on
# paragraphs, then lines, then sentences, then words
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_text(your_document)

Chunking rules of thumb:

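500-1000 characters (~125-250 tokens) is a reasonable starting point for most prose docs; tune it against real questions
Overlap adjacent chunks by 10-15% so text near a boundary keeps its surrounding context (overlap softens mid-sentence cuts; splitting on sentence boundaries avoids them)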
Store metadata (source, page, section) with every chunk
For structured data (FAQs, tables), chunk by logical unit, not character count
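For markdown sources, a "logical unit" is often one section per heading. The sketch below assumes ATX-style # headings. The function name and the "section" metadata key are illustrative and not part of any library. Each result has the same shape as a row in the documents table used later in this lesson: content plus metadata.

```python
import re

# An ATX heading: 1-6 '#' characters, whitespace, then the heading text
HEADING = re.compile(r"^#{1,6}\s+(.+)$")

def chunk_markdown_by_heading(text: str, source: str) -> list[dict]:
    """One chunk per markdown section, tagged with its source and heading."""
    sections: list[tuple[str, list[str]]] = [("(intro)", [])]
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))  # start a new section
        else:
            sections[-1][1].append(line)                   # body of current section
    chunks = []
    for heading, body_lines in sections:
        body = "\n".join(body_lines).strip()
        if body:  # skip headings with no text under them
            chunks.append({"content": body,
                           "metadata": {"source": source, "section": heading}})
    return chunks
```

This sketch does not know about fenced code blocks, so a line starting with # inside one is treated as a heading. Very long sections can still be passed through chunk_text afterwards.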

Step 2: Embeddings + Supabase pgvector

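The code in this lesson embeds text with Voyage AI's voyage-3 model. Anthropic doesn't offer its own embedding model and points to Voyage AI in its docs. OpenAI's embedding models also work if you change the vector dimension to match. Vectors are stored in Supabase's pgvector extension, which is available on the free tier.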

-- In Supabase SQL editor:
create extension if not exists vector;

create table documents (
  id        bigserial primary key,
  content   text,
  metadata  jsonb,
  embedding vector(1024)  -- must match your embedding model: voyage-3 = 1024
                          -- (use 1536 for OpenAI text-embedding-3-small)
);

-- Build this index after loading data. pgvector's guidance for lists:
-- rows / 1000 for up to 1M rows, sqrt(rows) beyond that
create index on documents
  using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);

-- Similarity search function
create or replace function match_documents(
  query_embedding vector(1024),
  match_count     int default 5,
  match_threshold float default 0.7
)
returns table(id bigint, content text, metadata jsonb, similarity float)
language sql stable as $$
  select id, content, metadata,
         1 - (embedding <=> query_embedding) as similarity
  from documents
  where 1 - (embedding <=> query_embedding) > match_threshold
  order by embedding <=> query_embedding
  limit match_count;
$$;
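pgvector's <=> operator returns cosine distance, which is 1 minus cosine similarity. So match_documents keeps the rows whose similarity exceeds match_threshold and returns the match_count closest ones. A pure-Python mirror of that logic can help when you are picking a threshold. The name match_documents_local is hypothetical, and this is not how Postgres executes the query: with an ivfflat index, the database search is approximate.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same quantity as pgvector's `a <=> b`; assumes non-zero vectors
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))

def match_documents_local(query_embedding: list[float], rows: list[dict],
                          match_count: int = 5,
                          match_threshold: float = 0.7) -> list[dict]:
    """rows: dicts with 'id', 'content' and 'embedding' keys."""
    scored = []
    for row in rows:
        similarity = 1 - cosine_distance(row["embedding"], query_embedding)
        if similarity > match_threshold:                      # the WHERE clause
            scored.append({"id": row["id"], "content": row["content"],
                           "similarity": similarity})
    scored.sort(key=lambda r: r["similarity"], reverse=True)  # ORDER BY distance
    return scored[:match_count]                               # LIMIT
```

A vector pointing the opposite way from the query gets similarity -1. The threshold therefore lives on a -1 to 1 scale, not 0 to 1.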

Step 3: Ingest Pipeline (Python)

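import os

import anthropic
import voyageai  # pip install voyageai (great for RAG)
from supabase import create_client

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# Assumes these environment variables hold your project credentials
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
vo = voyageai.Client(api_key=os.environ["VOYAGE_API_KEY"])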

def embed(texts: list[str]) -> list[list[float]]:
    result = vo.embed(texts, model="voyage-3", input_type="document")
    return result.embeddings

def ingest_document(text: str, metadata: dict):
    chunks = chunk_text(text)           # from Step 1
    embeddings = embed(chunks)
    rows = [
        {"content": c, "metadata": metadata, "embedding": e}
        for c, e in zip(chunks, embeddings)
    ]
    supabase.table("documents").insert(rows).execute()

# Usage
with open("handbook.txt", encoding="utf-8") as f:
    ingest_document(f.read(), {"source": "handbook", "version": "2024"})
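Embedding APIs limit how much text one request can carry, so embedding every chunk of a long document in a single embed call can fail. One workaround is to embed in fixed-size batches. In this minimal sketch, embed_in_batches is a hypothetical helper and the batch size of 128 is an assumption; check Voyage AI's documentation for the current per-request limits.

```python
from typing import Callable

def embed_in_batches(texts: list[str],
                     embed_fn: Callable[[list[str]], list[list[float]]],
                     batch_size: int = 128) -> list[list[float]]:
    """Call embed_fn on consecutive slices of texts and concatenate the results."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        vectors.extend(embed_fn(batch))  # one API request per batch
    return vectors
```

Inside ingest_document, embed_in_batches(chunks, embed) could replace embed(chunks). Order is preserved, so zip(chunks, embeddings) still pairs each chunk with its own vector.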

Step 4: Query + Generate

def ask(question: str) -> str:
    # Embed the question
    q_embedding = vo.embed(
        [question], model="voyage-3", input_type="query"
    ).embeddings[0]

    # Retrieve top-5 chunks from Supabase
    results = supabase.rpc("match_documents", {
        "query_embedding": q_embedding,
        "match_count": 5,
        "match_threshold": 0.7
    }).execute()

    if not results.data:
        return "I don't have information about that."

    # Build context from retrieved chunks
    context = "\n\n---\n\n".join(
        r["content"] for r in results.data
    )

    # Generate grounded answer
    response = claude.messages.create(
        model="claude-opus-4-8",
        max_tokens=512,
        system="""You are a helpful assistant. Answer using ONLY
the context provided. If the answer isn't in the context,
say "I don't have that information."

Context:
""" + context,
        messages=[{"role": "user", "content": question}]
    )
    return response.content[0].text

print(ask("What is our parental leave policy?"))

RAG Quality Improvements

When RAG gives bad answers, try these fixes:

Re-ranking: retrieve 20 chunks, use a cross-encoder to re-rank, keep top 5
Hybrid search: combine vector similarity with keyword (BM25) search
HyDE: ask Claude to generate a hypothetical answer, embed that for retrieval
Metadata filtering: filter by date, source, or category before similarity search
Smaller chunks for retrieval, larger for generation: match on small chunks for precision, then expand each hit to its surrounding section before passing it to Claude
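Hybrid search needs a way to merge two ranked lists: the vector hits and the keyword (BM25) hits. Reciprocal rank fusion (RRF) is one common choice. Each document's score is the sum of 1 / (k + rank) over every list it appears in. The sketch below assumes you already have both ranked lists of document ids. k = 60 is the value commonly used with RRF, not a requirement.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several ranked id lists (each best-first) into one fused ranking."""
    scores: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = [3, 1, 2]   # ids ranked by cosine similarity
keyword_hits = [1, 4, 3]  # ids ranked by BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

RRF uses only ranks, so it avoids having to rescale cosine similarities and BM25 scores onto a common range.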

Hands-on: Build a Docs Q&A Bot

Challenge: Build a RAG pipeline over a set of markdown docs (your own notes, a project README, or any text file).

Chunk 3+ documents and store in Supabase with the SQL schema above
Build a query function that retrieves relevant chunks and passes them to Claude
Test with 5 questions - note where it gets it right vs. wrong
Add source citation: include the metadata source in the response
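For the citation task, one option is to label each retrieved chunk with its source before it reaches Claude, so the model has something to cite. The sketch below assumes each row returned by match_documents carries a metadata dict with a "source" key, as in the ingest example. format_context is a hypothetical helper. You would still need to tell Claude in the system prompt to cite the bracketed numbers.

```python
def format_context(rows: list[dict]) -> str:
    """Number each retrieved chunk and tag it with its source for citation."""
    blocks = []
    for i, row in enumerate(rows, start=1):
        # Fall back to "unknown" when metadata or its source key is missing
        source = (row.get("metadata") or {}).get("source", "unknown")
        blocks.append(f"[{i}] (source: {source})\n{row['content']}")
    return "\n\n---\n\n".join(blocks)
```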

Stretch: Add a confidence score - if the highest similarity is below 0.75, have Claude say it's not sure rather than hallucinating.
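One way to approach the stretch goal is to check the best similarity among the retrieved rows before calling Claude. When it falls below the cutoff, add an instruction to the system prompt. The sketch assumes rows shaped like the match_documents output, with a "similarity" field. confidence_note is a hypothetical name, and the 0.75 default simply mirrors the challenge text.

```python
def confidence_note(rows: list[dict], cutoff: float = 0.75) -> str:
    """Return an extra system-prompt instruction when retrieval looks weak."""
    best = max((row["similarity"] for row in rows), default=0.0)
    if best >= cutoff:
        return ""  # retrieval looks strong enough; no extra instruction
    return (f"\nRetrieval confidence is low (best similarity {best:.2f} "
            f"< {cutoff}). If the context does not clearly answer the "
            "question, say you are not sure instead of guessing.")
```

Appending this string to the system prompt in ask() lets Claude phrase the uncertainty itself, instead of the code returning a fixed refusal.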

Lesson 31 Quick Reference

RAG

Retrieve relevant chunks at query time, pass to Claude as context

Chunking

500-1000 chars, 10-15% overlap, split on paragraphs first

pgvector

Postgres extension for vector storage and similarity search, available in Supabase (enable with create extension vector)

Cosine similarity

1 - (embedding <=> query_embedding); ranges from -1 to 1, higher means more similar

Voyage AI

Embedding provider recommended in Anthropic's docs; this lesson uses its voyage-3 model

HyDE

Generate hypothetical answer, embed it for better retrieval
