Course: OpenClaw — Autonomous AI Agents | Pathway: Builder | Tier: Free | Level: Beginner | Estimated Reading Time: 10 minutes
So far, your agent has been using a cloud model — an AI model running on someone else's server. You send it a message, the message travels over the internet, gets processed, and the response comes back.
That works, but it comes with trade-offs:
Privacy. Every message you send is processed on a remote server. If your agent handles sensitive information — business emails, customer data, personal details — that data leaves your machine.
Cost. Many cloud models charge per message or per token (roughly per word). For an agent running cron jobs every few minutes, those costs add up quickly.
Reliability. Cloud services go down. Rate limits kick in. Free tiers get throttled. If your agent depends entirely on a cloud model, it stops working when the cloud has a bad day.
Speed. Network round trips add latency. A local model processes your request without needing to send anything over the internet.
Local models solve all of these problems. Your data stays on your machine. There are no usage fees. It works offline. And for smaller models, the response time can be impressively fast.
Ollama is a tool that makes it easy to download and run AI models on your own computer. Think of it as a one-stop shop for local AI.
Without Ollama, running a local model involves downloading model files, configuring GPU drivers, setting up Python environments, and dealing with compatibility issues. Ollama wraps all of that complexity into a simple command-line tool.
You tell Ollama which model you want, it downloads it, and it runs. That is about as complicated as it gets.
On macOS: download Ollama from ollama.com, open the downloaded file, and drag it to your Applications folder.
Alternatively, if you have Homebrew installed:
brew install ollama
On Linux: run this command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
To verify the installation, open a terminal and run:
ollama --version
You should see a version number. If you do, Ollama is installed and ready.
Ollama has a library of models you can download. To get started, we recommend Llama 3.2 — it is capable, well-rounded, and runs reasonably well on most modern machines.
ollama pull llama3.2
This will download the model. Depending on your internet speed, it might take a few minutes — model files are typically a few gigabytes.
Once it is downloaded, you can test it right away:
ollama run llama3.2
This opens an interactive chat. Type a message and see how it responds. Press Ctrl + D to exit when you are done.
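Beyond the interactive chat, you can also pass a prompt directly as an argument for one-shot use, which is handy in scripts. A minimal sketch, guarded so it is safe to run even on a machine where Ollama is not installed:

```shell
# One-shot prompt instead of an interactive session.
# The guard keeps the script harmless where Ollama is absent.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.2 "Summarize the benefits of local AI models in one sentence."
else
  echo "ollama is not installed"
fi
```

The command prints the model's answer and exits, so you can pipe it into other tools.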
Ollama supports many models. Which one you should run depends mostly on your hardware:
On a machine with limited RAM or an older CPU, stick with smaller models.
On a mid-range machine, you have more options: 7-8B parameter models are usually comfortable.
On a powerful machine with plenty of RAM or a dedicated GPU, you can run larger, more capable models.
Be honest with yourself about your hardware. A model that takes 30 seconds to respond to each message is not going to be fun for an interactive chat agent. It might be fine for cron jobs where speed does not matter as much.
If you are unsure, start with llama3.2 (the default 3B version). If it runs well, try a larger model. If it stutters, stick with what works.
For Apple Silicon Macs (M1, M2, M3, M4), you are in luck — these chips handle local models particularly well because of their unified memory architecture. An M1 MacBook Air with 16GB can comfortably run 7-8B parameter models.
Once Ollama is running, connecting it to OpenClaw is simple. Open your agent's configuration:
nano ~/openclaw/agents/helper/config.yaml
Change the model line to use an Ollama model:
name: Helper
model: ollama/llama3.2
description: A friendly general-purpose assistant
The ollama/ prefix tells OpenClaw to use Ollama instead of a cloud provider.
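Conceptually, that prefix is just a provider label in front of the model name. A hypothetical sketch of how such a string splits (OpenClaw's actual parsing may differ):

```shell
# Illustrative only: split a "provider/model" string the way a router might.
model_spec="ollama/llama3.2"
provider="${model_spec%%/*}"   # everything before the first slash -> "ollama"
model_name="${model_spec#*/}"  # everything after the first slash  -> "llama3.2"
echo "provider=$provider model=$model_name"
```

Everything after the prefix is passed through as the model name, which is why nested names like openrouter/anthropic/claude-sonnet-4 also work.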
Make sure Ollama is running before you start OpenClaw. You can start the Ollama server with:
ollama serve
Or on macOS, just open the Ollama application — it runs in the background automatically.
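If you want a quick pre-flight check before launching your agent, you can probe Ollama's local HTTP endpoint. Port 11434 is Ollama's default; the URL below assumes you have not changed it:

```shell
# Returns quickly whether or not the Ollama server is up.
if curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null; then
  echo "Ollama is running"
else
  echo "Ollama is not running; start it with 'ollama serve'"
fi
```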
Then start OpenClaw:
npm start
Your agent is now running entirely on your machine. No internet required for the AI part.
Here is something powerful: you do not have to choose one or the other. Different agents can use different models.
At Lalapanzi.ai, we use a mix of local and cloud models: fast local models for routine tasks, and more powerful cloud models for heavier work.
You can set this up by giving each agent a different model in its config:
# agents/helper/config.yaml
model: ollama/llama3.2
# agents/researcher/config.yaml
model: openrouter/anthropic/claude-sonnet-4
This way, your daily email summary runs on a free local model, while your research agent uses a more powerful cloud model only when needed.
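To see at a glance which model each agent is using, you can scan the config files. A small sketch, assuming the agents/<name>/config.yaml layout used earlier in this lesson:

```shell
# Print each agent's name alongside its model line.
for cfg in ~/openclaw/agents/*/config.yaml; do
  [ -f "$cfg" ] || continue   # skip if the glob matched nothing
  printf '%s -> %s\n' "$(basename "$(dirname "$cfg")")" "$(grep '^model:' "$cfg")"
done
```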
A few useful Ollama commands:
List downloaded models:
ollama list
Remove a model you no longer need:
ollama rm model-name
Update a model to the latest version:
ollama pull model-name
Check how much disk space models are using (they are stored in ~/.ollama/models/):
du -sh ~/.ollama/models/
Keep an eye on disk space. Each model can be several gigabytes, and they add up if you download many of them.
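To find which downloads are taking the most space, you can sort the stored model blobs by size. The path assumes the default ~/.ollama location; adjust it if yours differs:

```shell
# Show the five largest model blobs; prints nothing if no models are downloaded yet.
du -sh ~/.ollama/models/blobs/* 2>/dev/null | sort -rh | head -5
```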
Quality gap. Local models are improving rapidly, but the best cloud models (GPT-4, Claude, Gemini Pro) are still more capable, especially for complex reasoning, nuanced writing, and following intricate instructions. For many everyday tasks, the gap is small. For demanding tasks, it matters.
Hardware dependent. If your machine is slow, your agent is slow. There is no way around this. The model runs on your CPU and RAM (or GPU if you have one), and its speed is limited by your hardware.
No internet knowledge. Local models do not browse the internet. Their knowledge comes from their training data, which has a cutoff date. For current information, you need to give the agent tools that can access the internet.
Memory usage. While a model is loaded, it uses a significant chunk of your RAM. Running multiple large models at the same time on a modest machine will cause problems. Ollama handles loading and unloading models, but be aware of the constraint.
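Related to the memory point: Ollama can show you which models are currently loaded and how much memory they occupy. Guarded here so the snippet is harmless on machines without Ollama:

```shell
# `ollama ps` lists loaded models, their size, and how long they stay resident.
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama is not installed"
fi
```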
In the final lesson, we will look at practical projects you can build with OpenClaw — from email assistants to research agents to monitoring bots. You now have all the building blocks: agents, channels, cron jobs, and models.
Key Takeaways