# Local AI Setup (Mac mini + Laptop)
This is the practical setup/run guide for your stack.
## What You Have
From your dotfiles installer:
- ollama (model manager + local inference server)
- llama.cpp (llama-cli for low-level local inference)
Optional (off by default) in installer prompt:
- codex
- claude-code
- opencode-desktop
## Is Ollama an App or Terminal?
- Installed via Homebrew formula: terminal-first workflow.
- You can run it as a background service (`brew services start ollama`).
- If you want a chat UI, use a separate UI layer (see Open WebUI below).
- If you want cloud coding-agent apps, install the optional casks from the installer prompt.
## Minimal "Get Running" Flow
1. Start the Ollama service:

   ```sh
   brew services start ollama
   ```

2. Pull one coding model:

   ```sh
   ollama pull qwen2.5-coder:7b
   ```

3. Run a first prompt:

   ```sh
   ollama run qwen2.5-coder:7b "Write a Python script that renames files by date"
   ```

4. Check installed (`list`) and currently running (`ps`) models:

   ```sh
   ollama list
   ollama ps
   ```
This flow is enough to run local AI without Codex/Claude subscriptions.
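Once the service is up, local tools can also talk to Ollama's REST API directly on its default port 11434. A minimal stdlib-only sketch using the `/api/generate` endpoint (the helper names here are mine, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default API port

def build_generate_request(model: str, prompt: str, stream: bool = False) -> bytes:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream}).encode()

def generate(model: str, prompt: str) -> str:
    """Send a one-off prompt and return the full response text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires the Ollama service to be running:
# print(generate("qwen2.5-coder:7b", "Rename files by date, in Python"))
```

With `stream=False` the server returns one JSON object; with streaming enabled you would instead read newline-delimited JSON chunks.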
## Optional UI (Browser Chat)
Run Open WebUI with Docker and connect to local Ollama:
```sh
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main
```
Then open: http://localhost:3000
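If you prefer Compose, the same settings translate to a `docker-compose.yml` sketch (service and volume names are illustrative, matching the `docker run` flags above):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    volumes:
      - open-webui:/app/backend/data
    restart: unless-stopped

volumes:
  open-webui:
```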
## Remote Workflow (Mac mini at home, laptop anywhere)
Recommended secure path:
- Install Tailscale on both devices and join the same tailnet.
- Enable SSH on the Mac mini (System Settings -> General -> Sharing -> Remote Login).
- SSH from the laptop:

  ```sh
  ssh youruser@<mac-mini-tailscale-ip>
  ```

- Keep coding sessions persistent with tmux:

  ```sh
  tmux new -s dev        # start a named session
  # if the connection drops, reconnect and run: tmux attach -t dev
  ```
- (Optional) Use VS Code Remote SSH to edit and run directly on the Mac mini.
- (Optional) Tunnel the Ollama API to the laptop for local tools:

  ```sh
  ssh -N -L 11434:localhost:11434 youruser@<mac-mini-tailscale-ip>
  ```

  The laptop can now call http://localhost:11434 securely through the SSH tunnel.
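To avoid retyping the forwarding flags, the same tunnel can live in `~/.ssh/config` on the laptop (the `mini` host alias is illustrative; keep your own user and Tailscale IP):

```
Host mini
    HostName <mac-mini-tailscale-ip>
    User youruser
    LocalForward 11434 localhost:11434
```

Then `ssh mini` opens a normal session with the forward active, and `ssh -N mini` runs the tunnel alone in the foreground.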
## Suggested Model Strategy (16 GB RAM / 256 GB SSD)
- Keep 1-2 active models.
- Prefer 7B/8B quantized models.
- Add larger models only if needed.
- Move model cache to external SSD if library grows.
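As a rough sanity check before pulling a model: quantized weights take about `params × bits-per-weight / 8` bytes, plus runtime overhead for the KV cache and buffers. A back-of-the-envelope sketch (the 4.5 bits/weight and 25% overhead figures are my working assumptions for Q4-class quants, not measurements):

```python
def est_model_ram_gb(params_billions: float,
                     bits_per_weight: float = 4.5,
                     overhead: float = 1.25) -> float:
    """Rough resident-memory estimate for a quantized model, in GB."""
    weight_gb = params_billions * bits_per_weight / 8  # 1e9 params ~= 1 GB per 8 bits
    return round(weight_gb * overhead, 1)

print(est_model_ram_gb(7))   # a 7B Q4-class model: comfortable on 16 GB
print(est_model_ram_gb(14))  # a 14B model: tight once other apps are open
```

This is why the 7B/8B range is the sweet spot for a 16 GB machine.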
## Known Issue Seen on Current Machine
On this machine, `ollama` crashes immediately with an MLX/Metal exception (`NSRangeException`: no Metal device selected).
If you hit this:
- Ensure you run from a normal logged-in GUI user session (not a restricted shell context).
- Restart the Mac and try again.
- Reinstall Ollama:

  ```sh
  brew reinstall ollama
  ```

- Try the app/cask variant instead:

  ```sh
  brew uninstall ollama
  brew install --cask ollama
  ```

- Use `llama.cpp` as a fallback while debugging Ollama:

  ```sh
  llama-cli -m /path/to/model.gguf -p "Hello"
  ```
## Day-2 Operations
Update everything:

```sh
scripts/update.sh
```

Check environment health:

```sh
scripts/dev-check.sh
```