Ollama
Run open-weight models locally — no API key, no data leaves your machine. Ollama is the easiest way to host models on your own laptop or a workstation.
What you need
- Ollama installed and running on the machine Kenaz lives on (or any machine on your network)
- At least one model pulled: `ollama pull llama3.2` or similar
- ~10–80 GB of free disk per model, plus enough RAM/VRAM to run it
Hardware reality check:
- 8 GB RAM — small models only (`llama3.2:1b`, `qwen2.5:1.5b`)
- 16 GB RAM — `llama3.2`, `qwen2.5:7b`, `mistral`
- 32 GB+ RAM / GPU — `llama3.3:70b`, `qwen2.5:32b`, `deepseek-r1:32b`
A model that doesn't fit will load painfully slowly off swap or fail outright.
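Before a long pull, it's worth checking what you actually have room for. A minimal sketch, assuming a Linux shell (macOS has no `free`; use Activity Monitor there):

```
# On-disk size of each pulled model, roughly the memory a q4 quant needs plus KV-cache overhead
ollama list

# What's loaded right now, its memory footprint, and whether it's on CPU or GPU
ollama ps

# Free RAM (Linux)
free -h
```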
Steps
- Install Ollama. ollama.com/download — runs on macOS, Windows, Linux. The installer adds a system service that listens on `http://localhost:11434` by default.
- Pull a model: `ollama pull llama3.2`. List what you've got: `ollama list`.
- Add to Kenaz. Providers → Add provider → Ollama. The endpoint defaults to `http://localhost:11434` — change it if Ollama is running on a different host. No API key needed (set the `Bearer` field if you've put Ollama behind a reverse proxy that requires one). Click Test, Save.
Kenaz reads the list of locally available models on save. Pull a new model later via `ollama pull`, then click Refresh models in the Kenaz provider editor to pick it up.
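If Test fails or the model list comes back empty, check the same things from a shell. This is presumably close to what Kenaz does on save, though the exact requests it makes are an assumption here:

```
# A running Ollama answers the root endpoint with "Ollama is running"
curl http://localhost:11434

# /api/tags returns JSON listing every locally pulled model
curl http://localhost:11434/api/tags
```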
Models and what they're for
The full library is at ollama.com/library. Notable picks:
- `llama3.2` — Meta's daily-driver. Good general assistant, fast.
- `qwen2.5` — Alibaba's. Stronger at code than Llama.
- `deepseek-r1` — reasoning model, slow but strong on multi-step problems.
- `mistral` / `mixtral` — efficient European models.
- `gemma2` — Google's open-weight family.
- `phi3` — Microsoft's small, efficient models.
Tags (the part after `:`) pick the size variant: `llama3.2:1b`, `llama3.2:3b`, etc.
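To pull a specific size and confirm what you got, something like this works (using the 1B tag as an example):

```
# Pull the 1B-parameter variant explicitly
ollama pull llama3.2:1b

# Inspect it: architecture, parameter count, quantization, context length
ollama show llama3.2:1b
```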
Pricing
Free. Pay your electric bill.
Privacy posture
- Nothing leaves your machine. Period. Verifiable: pull the network cable and Ollama still works.
- Ollama itself collects no telemetry by default. You can verify with `lsof -i -P` while Ollama is running.
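If you want to run that check, filter the socket list to the Ollama process; on a default install the only entry should be the listener on port 11434. The `grep` pattern assumes the process is named `ollama`:

```
lsof -i -P | grep -i ollama
```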
Tool use
Ollama supports OpenAI-compatible function calling on models whose underlying GGUF advertises tool support (most recent Llama, Qwen, and Mistral models). Capability hints in Kenaz reflect what each Ollama model declares; tools won't show up for models that can't use them.
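You can test whether a particular model accepts tools outside Kenaz by calling Ollama's OpenAI-compatible endpoint directly. A sketch; `get_weather` is a made-up tool for illustration, not something Kenaz defines:

```
# A tool-capable model replies with a tool_calls entry; one without support errors out or answers in prose
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is the weather in Oslo right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```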
Tool quality on local models is materially worse than frontier hosted models. If your work depends on robust multi-step tool use, this isn't the right backend.
Troubleshooting
- `connection refused` on Test — Ollama isn't running. Run `ollama serve` (or restart the Ollama desktop app).
- Test passes, no models listed. You haven't pulled any. `ollama pull <model>`.
- Generation is unbearably slow. Model is too big for your RAM. Pick a smaller variant, or move Ollama to a machine with a GPU and point Kenaz at `http://that-host:11434`.
- Network access from Kenaz. Ollama listens on localhost by default. To reach it from another machine you need to set `OLLAMA_HOST=0.0.0.0:11434` and restart — but be aware that exposes your models to anyone on the network. A sketch of how to set it follows below.
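How you set that depends on how Ollama was installed. A sketch of the two common cases, following Ollama's documented approach for a systemd Linux service and the macOS app; adapt rather than copy-paste:

```
# Linux (systemd service): add an environment override, then restart
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama

# macOS (desktop app): set the variable for launchd, then quit and reopen Ollama
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
```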