Course: OpenClaw — Autonomous AI Agents | Pathway: Builder | Tier: Free | Level: Beginner | Estimated Reading Time: 10 minutes
So far, your agent has been using a cloud model — an AI model running on someone else's server. You send it a message, the message travels over the internet, gets processed, and the response comes back.
That works, but it comes with trade-offs:
Privacy. Every message you send is processed on a remote server. If your agent handles sensitive information — business emails, customer data, personal details — that data leaves your machine.
Cost. Many cloud models charge per message or per token (roughly per word). For an agent running cron jobs every few minutes, those costs add up quickly.
Reliability. Cloud services go down. Rate limits kick in. Free tiers get throttled. If your agent depends entirely on a cloud model, it stops working when the cloud has a bad day.
Speed. Network round trips add latency. A local model processes your request without needing to send anything over the internet.
Local models solve all of these problems. Your data stays on your machine. There are no usage fees. It works offline. And for smaller models, the response time can be impressively fast.
Ollama is a tool that makes it easy to download and run AI models on your own computer. Think of it as a one-stop shop for local AI.
Without Ollama, running a local model involves downloading model files, configuring GPU drivers, setting up Python environments, and dealing with compatibility issues. Ollama wraps all of that complexity into a simple command-line tool.
You tell Ollama which model you want, it downloads it, and it runs. That is about as complicated as it gets.
On macOS: download Ollama from ollama.com, open the downloaded file, and drag it to your Applications folder.
Alternatively, if you have Homebrew installed:
brew install ollama
On Linux: run this command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
To verify the installation, open a terminal and run:
ollama --version
You should see a version number. If you do, Ollama is installed and ready.
Ollama has a library of models you can download. To get started, we recommend Llama 3.2 — it is capable, well-rounded, and runs reasonably well on most modern machines.
ollama pull llama3.2
This will download the model. Depending on your internet speed, it might take a few minutes — model files are typically a few gigabytes.
Once it is downloaded, you can test it right away:
ollama run llama3.2
This opens an interactive chat. Type a message and see how it responds. Press Ctrl + D to exit when you are done.
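Beyond the interactive chat, you can also pass a prompt directly as an argument for one-shot use, which is handy in scripts. A minimal sketch, guarded so it is safe to run even on a machine where Ollama is not installed:

```shell
# One-shot prompt instead of an interactive session.
# The guard keeps the script harmless where Ollama is absent.
if command -v ollama >/dev/null 2>&1; then
  ollama run llama3.2 "Summarize the benefits of local AI models in one sentence."
else
  echo "ollama is not installed"
fi
```

The command prints the model's answer and exits, so you can pipe it into other tools.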
Ollama supports many models. Which one you should run depends mostly on your hardware:
On a machine with limited RAM or an older CPU, stick with smaller models.
On a mid-range machine, you have more options: 7-8B parameter models are usually comfortable.
On a powerful machine with plenty of RAM or a dedicated GPU, you can run larger, more capable models.
Be honest with yourself about your hardware. A model that takes 30 seconds to respond to each message is not going to be fun for an interactive chat agent. It might be fine for cron jobs where speed does not matter as much.
If you are unsure, start with llama3.2 (the default 3B version). If it runs well, try a larger model. If it stutters, stick with what works.
For Apple Silicon Macs (M1, M2, M3, M4), you are in luck — these chips handle local models particularly well because of their unified memory architecture. An M1 MacBook Air with 16GB can comfortably run 7-8B parameter models.
Once Ollama is running, connecting it to OpenClaw is simple. Open your agent's configuration:
nano ~/openclaw/agents/helper/config.yaml
Change the model line to use an Ollama model:
name: Helper
model: ollama/llama3.2
description: A friendly general-purpose assistant
The ollama/ prefix tells OpenClaw to use Ollama instead of a cloud provider.
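Conceptually, that prefix is just a provider label in front of the model name. A hypothetical sketch of how such a string splits (OpenClaw's actual parsing may differ):

```shell
# Illustrative only: split a "provider/model" string the way a router might.
model_spec="ollama/llama3.2"
provider="${model_spec%%/*}"   # everything before the first slash -> "ollama"
model_name="${model_spec#*/}"  # everything after the first slash  -> "llama3.2"
echo "provider=$provider model=$model_name"
```

Everything after the prefix is passed through as the model name, which is why nested names like openrouter/anthropic/claude-sonnet-4 also work.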
Make sure Ollama is running before you start OpenClaw. You can start the Ollama server with:
ollama serve
Or on macOS, just open the Ollama application — it runs in the background automatically.
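If you want a quick pre-flight check before launching your agent, you can probe Ollama's local HTTP endpoint. Port 11434 is Ollama's default; the URL below assumes you have not changed it:

```shell
# Returns quickly whether or not the Ollama server is up.
if curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null; then
  echo "Ollama is running"
else
  echo "Ollama is not running; start it with 'ollama serve'"
fi
```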
Then start OpenClaw:
npm start
Your agent is now running entirely on your machine. No internet required for the AI part.
Here is something powerful: you do not have to choose one or the other. Different agents can use different models.
At Lalapanzi.ai, we use a mix of local and cloud models: fast local models for routine tasks, and more powerful cloud models for heavier work.
You can set this up by giving each agent a different model in its config:
# agents/helper/config.yaml
model: ollama/llama3.2
# agents/researcher/config.yaml
model: openrouter/anthropic/claude-sonnet-4
This way, your daily email summary runs on a free local model, while your research agent uses a more powerful cloud model only when needed.
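To see at a glance which model each agent is using, you can scan the config files. A small sketch, assuming the agents/<name>/config.yaml layout used earlier in this lesson:

```shell
# Print each agent's name alongside its model line.
for cfg in ~/openclaw/agents/*/config.yaml; do
  [ -f "$cfg" ] || continue   # skip if the glob matched nothing
  printf '%s -> %s\n' "$(basename "$(dirname "$cfg")")" "$(grep '^model:' "$cfg")"
done
```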
A few useful Ollama commands:
List downloaded models:
ollama list
Remove a model you no longer need:
ollama rm model-name
Update a model to the latest version:
ollama pull model-name
Check how much disk space models are using (they are stored in ~/.ollama/models/):
du -sh ~/.ollama/models/
Keep an eye on disk space. Each model can be several gigabytes, and they add up if you download many of them.
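To find which downloads are taking the most space, you can sort the stored model blobs by size. The path assumes the default ~/.ollama location; adjust it if yours differs:

```shell
# Show the five largest model blobs; prints nothing if no models are downloaded yet.
du -sh ~/.ollama/models/blobs/* 2>/dev/null | sort -rh | head -5
```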
Quality gap. Local models are improving rapidly, but the best cloud models (GPT-4, Claude, Gemini Pro) are still more capable, especially for complex reasoning, nuanced writing, and following intricate instructions. For many everyday tasks, the gap is small. For demanding tasks, it matters.
Hardware dependent. If your machine is slow, your agent is slow. There is no way around this. The model runs on your CPU and RAM (or GPU if you have one), and its speed is limited by your hardware.
No internet knowledge. Local models do not browse the internet. Their knowledge comes from their training data, which has a cutoff date. For current information, you need to give the agent tools that can access the internet.
Memory usage. While a model is loaded, it uses a significant chunk of your RAM. Running multiple large models at the same time on a modest machine will cause problems. Ollama handles loading and unloading models, but be aware of the constraint.
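Related to the memory point: Ollama can show you which models are currently loaded and how much memory they occupy. Guarded here so the snippet is harmless on machines without Ollama:

```shell
# `ollama ps` lists loaded models, their size, and how long they stay resident.
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama is not installed"
fi
```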
In the final lesson, we will look at practical projects you can build with OpenClaw — from email assistants to research agents to monitoring bots. You now have all the building blocks: agents, channels, cron jobs, and models.
Key Takeaways