Local AI Setup (Mac mini + Laptop)

This is the practical guide to setting up and running your local AI stack.

What You Have

From your dotfiles installer:

- ollama (model manager + local inference server)
- llama.cpp (llama-cli for low-level local inference)

Optional (off by default) in the installer prompt:

- codex
- claude-code
- opencode-desktop

Is Ollama an App or Terminal?

The Homebrew formula installs Ollama as a command-line tool, so the workflow is terminal-first.
You can run it as a background service (brew services start ollama).
If you want a chat UI, use a separate UI layer (see Open WebUI below).
If you want cloud coding-agent apps, install the optional casks from the installer prompt.

Minimal "Get Running" Flow

Start Ollama service

brew services start ollama

Pull one coding model

ollama pull qwen2.5-coder:7b

Run first prompt

ollama run qwen2.5-coder:7b "Write a Python script that renames files by date"

Check downloaded models (ollama list) and models currently loaded in memory (ollama ps)

ollama list
ollama ps

This flow is enough to run local AI without Codex/Claude subscriptions.
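A script that starts the service and sends a prompt right away can race the server startup. The following is a minimal sketch of a readiness check, assuming Ollama listens on its default address http://localhost:11434 and that curl is installed. wait_for_ollama is a hypothetical helper name for this guide, not part of Ollama.

```shell
#!/bin/sh
# Poll the Ollama HTTP API until it answers, or give up after N attempts.
# Usage: wait_for_ollama [base_url] [attempts] [delay_seconds]
# (hypothetical helper; the defaults below are assumptions)
wait_for_ollama() {
  url=${1:-http://localhost:11434}
  attempts=${2:-10}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, --max-time: never hang
    if curl -fs --max-time 2 "$url/api/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

One possible use: wait_for_ollama && ollama run qwen2.5-coder:7b "Hello". The function returns non-zero if the server never answers, so the prompt is skipped instead of failing with a connection error.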

Optional UI (Browser Chat)

Run Open WebUI with Docker and connect to local Ollama:

docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --restart unless-stopped \
  ghcr.io/open-webui/open-webui:main

Then open: http://localhost:3000

Remote Workflow (Mac mini at home, laptop anywhere)

Recommended secure path:

Install Tailscale on both devices and join the same tailnet.
Enable SSH on Mac mini (System Settings -> General -> Sharing -> Remote Login).
SSH from laptop:

ssh youruser@<mac-mini-tailscale-ip>

Keep coding sessions persistent with tmux:

tmux new -s dev

(Optional) Use VS Code Remote SSH to edit/run directly on Mac mini.
(Optional) Tunnel Ollama API to laptop for local tools:

ssh -N -L 11434:localhost:11434 youruser@<mac-mini-tailscale-ip>

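The laptop can now call http://localhost:11434 securely through the SSH tunnel. If Ollama is also running on the laptop, local port 11434 is already in use; forward a different local port instead (for example -L 11435:localhost:11434).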
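With the tunnel open, tools on the laptop can talk to the Mac mini through Ollama's HTTP API. The following is a minimal sketch of sending one prompt to POST /api/generate with curl. The helper names (json_escape, generate_payload, ask_ollama) and the OLLAMA_URL variable are hypothetical and exist only in this example.

```shell
#!/bin/sh
# Escape backslashes and double quotes so a string fits inside a JSON string.
# Note: newlines and control characters are NOT handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Build a non-streaming request body for POST /api/generate.
# $1: model tag, $2: prompt text
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Send a prompt through the tunnel. OLLAMA_URL is a hypothetical override
# for this sketch; it defaults to the forwarded local port.
ask_ollama() {
  curl -fsS "${OLLAMA_URL:-http://localhost:11434}/api/generate" \
    -d "$(generate_payload "$1" "$2")"
}
```

For example, ask_ollama qwen2.5-coder:7b "Explain tmux in one sentence" prints a JSON object whose response field holds the generated text. If you forwarded a different local port, set OLLAMA_URL accordingly before calling it.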

Suggested Model Strategy (16GB RAM / 256GB SSD)

Keep 1-2 active models.
Prefer 7B/8B quantized models (roughly 4 to 5 GB each at 4-bit quantization, which leaves headroom in 16GB of RAM).
Add larger models only if needed.
Move the model cache to an external SSD if the library grows.
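To decide when the library has outgrown the internal SSD, it helps to know where the models live and how much space they take. This is a small sketch, assuming the default store under ~/.ollama/models and the OLLAMA_MODELS environment variable that Ollama reads to relocate it. models_dir and models_usage are hypothetical helper names.

```shell
#!/bin/sh
# Print the directory Ollama reads models from: $OLLAMA_MODELS if set,
# otherwise the default location under the home directory.
models_dir() {
  printf '%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}"
}

# Report how much disk space the model store uses.
# Prints a note to stderr if the directory does not exist.
models_usage() {
  dir=$(models_dir)
  if [ -d "$dir" ]; then
    du -sh "$dir"
  else
    echo "model directory not found: $dir" >&2
    return 1
  fi
}
```

To relocate the store, one option is to export OLLAMA_MODELS pointing at a directory on the external SSD before starting the server. A service started with brew services may not see variables exported in your shell, so verify the location with ollama list after moving.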

Known Issue Seen on Current Machine

On this machine, ollama crashes immediately with an MLX/Metal exception (NSRangeException, no Metal device selected).

If you hit this:

Ensure you run it from a normal logged-in GUI user session (not a restricted shell context).
Restart the Mac and try again.
Reinstall Ollama:

brew reinstall ollama

Try the app/cask variant instead:

brew uninstall ollama
brew install --cask ollama

Use llama.cpp as fallback while debugging Ollama:

llama-cli -m /path/to/model.gguf -p "Hello"
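While the crash is unresolved, a small wrapper can choose whichever backend is working so the rest of your scripts do not need to care. This is a sketch only: pick_backend and run_prompt are hypothetical helper names, and it assumes that ollama list fails when the Ollama server is not responding.

```shell
#!/bin/sh
# Print which local backend is usable right now: "ollama", "llama-cli", or "none".
pick_backend() {
  # "ollama list" talks to the server, so it fails if Ollama has crashed.
  if command -v ollama >/dev/null 2>&1 && ollama list >/dev/null 2>&1; then
    echo ollama
  elif command -v llama-cli >/dev/null 2>&1; then
    echo llama-cli
  else
    echo none
  fi
}

# Run one prompt on whichever backend works.
# $1: Ollama model tag, $2: path to a GGUF file, $3: prompt text
run_prompt() {
  case $(pick_backend) in
    ollama)    ollama run "$1" "$3" ;;
    llama-cli) llama-cli -m "$2" -p "$3" ;;
    *)         echo "no local backend available" >&2; return 1 ;;
  esac
}
```

For example: run_prompt qwen2.5-coder:7b /path/to/model.gguf "Hello". The GGUF path is a placeholder, as in the llama-cli command above.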

Day-2 Operations

Update everything:

scripts/update.sh

Check environment health:

scripts/dev-check.sh