Ollama

Run powerful AI models on your own computer - completely private, completely free

Ollama makes it as easy to run an AI model locally as installing any other app. One command to pull a model, one command to chat. Your data never leaves your machine - no API keys, no usage limits, no monthly fees. Supports Llama 3, Mistral, Gemma, Phi-3, DeepSeek, and dozens more.

Cost

Free forever

Privacy

100% local

Internet

Not required

Platform

Mac, Win, Linux

RAM needed

8 GB min (16 GB+)

Website

ollama.com

⚡ Install & Setup (5 minutes)

Download Ollama

Go to ollama.com → Download. Pick your OS. On Mac, drag Ollama.app to Applications. It adds a menu bar icon.

Pull your first model

Open Terminal and run: ollama pull llama3.2 - this downloads the 2.0 GB Llama 3.2 3B model. For a bigger model: ollama pull llama3.1 (4.7 GB, much more capable).

Chat in terminal

Run: ollama run llama3.2 - a chat prompt appears. Type your message and press Enter. Type /bye to exit.

Or use a GUI (optional)

Install Open WebUI for a ChatGPT-like browser interface: run the Docker command from openwebui.com. Connects to your local Ollama automatically.

# Quick start commands:
ollama pull llama3.2      # Download model (one time)
ollama run llama3.2      # Start chatting
ollama list             # See downloaded models
ollama pull mistral      # Download Mistral 7B

📦 Recommended Models to Try

Model	Size	RAM	Best for
`llama3.2`	2 GB	8 GB	Fast chat, low-end hardware
`llama3.1`	4.7 GB	8 GB	General purpose, good quality
`llama3.1:70b`	40 GB	64 GB	Near GPT-4 quality, needs high-end Mac
`mistral`	4.1 GB	8 GB	Fast, strong coding tasks
`deepseek-coder`	776 MB	8 GB	Code generation, very fast
`phi3:mini`	2.3 GB	8 GB	Lightweight, good reasoning
`gemma2:2b`	1.6 GB	8 GB	Google model, very fast
`codellama`	3.8 GB	8 GB	Code completion, works with editors

🚀 Example: Chat + Code Assistance

Terminal Chat

$ ollama run llama3.1
>>> Write a Python function that parses a CSV file and returns a list of dicts

Here is a clean implementation:
def parse_csv(file_path: str) -> list[dict]:
    with open(file_path, 'r') as f:
        return list(csv.DictReader(f))

Use Ollama from Python

pip install ollama

import ollama

response = ollama.chat(
  model='llama3.1',
  messages=[{'role': 'user', 'content': 'Explain async/await in Python'}]
)
print(response['message']['content'])

Use with VS Code via Continue extension

Install the "Continue" extension in VS Code → set provider to Ollama → point to your local model. You now have free local AI completions in VS Code with no API costs.

💡 Pro Tips

Apple Silicon Macs (M1/M2/M3/M4) run Ollama models fast using the GPU - even 7B models feel snappy
Models are cached - pulling is a one-time download. Running them requires no internet
Use Open WebUI (openwebui.com) for a full browser-based ChatGPT-like UI over your local models
Ollama exposes an OpenAI-compatible API at localhost:11434 - swap it into any code that uses OpenAI
For max privacy: AI coding without any data leaving your machine - perfect for confidential codebases

🔗 Official Resources

Download Ollama ↗Mac, Windows, Linux - free Model library ↗Browse Llama, Mistral, Gemma, DeepSeek, Qwen...GitHub repo ↗Source, docs, and issues Blog ↗New models and features

Links open the official sites. Pricing and features change often - always confirm there. (Verified June 2026.)

→ Free Local AI 1-Pager → Hugging Face Guide → Hardware Buying Guide