Run Gemma 4 with Claude Code for Free
Google's Gemma 4 is one of the most capable open-weight coding models available right now — and with Ollama and Claude Code, you can be up and running in under 10 minutes. No API bills. No data leaving your machine. Just a powerful free coding assistant, running locally.
What you'll need
- 16GB RAM minimum (for the default 8B model)
- 24GB+ RAM or unified memory recommended (for the 26B MoE model)
- macOS, Linux, or Windows (WSL supported)
- Ollama v0.19+ for best performance on Apple Silicon
Install Ollama
Head to ollama.com and download the installer, or use Homebrew on Mac:
brew install --cask ollama-app
After installing via Homebrew, open Ollama from /Applications (or Spotlight) to trigger the initial launch. macOS will prompt for Touch ID or your password to approve the app.
Verify it's running:
ollama --version
Choose and pull the right Gemma 4 model
Not all Gemma 4 variants are equal. Pick based on your hardware:
# Default 8B model (~9.6GB) — good for 16GB machines
ollama pull gemma4
# 26B MoE model (~18GB) — best performance/speed balance
ollama pull gemma4:26b
Confirm the download succeeded:
ollama list
Install Claude Code
Claude Code is Anthropic's agentic coding CLI — it can read, write, and execute code directly in your working directory.
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash
# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
Launch Claude Code with Gemma 4
This is where everything comes together. One command and you're coding:
# Local 26B model (recommended for most setups)
ollama launch claude --model gemma4:26b
# Cloud-accelerated 31B model via Ollama + NVIDIA Blackwell
ollama launch claude --model gemma4:31b-cloud
Claude Code connects to Ollama using the Anthropic-compatible API. A context window of at least 64K tokens is recommended for best results.
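If the launch command isn't available in your Ollama build, you can wire Claude Code to Ollama by hand through environment variables. This is a sketch under two assumptions: that Ollama is serving its Anthropic-compatible API on its default port (11434), and that the token value is ignored by a local Ollama, so any placeholder works:

```shell
# Manual wiring -- assumes Ollama's Anthropic-compatible API is on the
# default port 11434; the token is a dummy value a local Ollama won't check.
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="gemma4:26b"
claude
```

Put the three exports in your shell profile if you want every Claude Code session to default to the local model.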
Expand your context window (optional)
For larger codebases or more complex tasks, boost the context window with a custom Modelfile:
printf 'FROM gemma4:26b\nPARAMETER num_ctx 65536' > /tmp/Modelfile-64k
ollama create gemma4-26b-64k -f /tmp/Modelfile-64k
ollama launch claude --model gemma4-26b-64k
If space on your Mac's internal drive is tight, you can also symlink Ollama's model storage to an external NVMe drive to free up room.
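That relocation tip can be sketched as a small script. The helper name and the `/Volumes/FastSSD` mount point are illustrative, not part of Ollama; stop Ollama first so no model files are in use:

```shell
# relocate_models: move Ollama's model store to another drive and leave
# a symlink behind, so Ollama keeps finding it at the original path.
relocate_models() {
  src="$1"    # current store, e.g. "$HOME/.ollama/models"
  dest="$2"   # target directory on the external drive
  mkdir -p "$(dirname "$dest")"
  if [ -d "$src" ] && [ ! -L "$src" ]; then
    mv "$src" "$dest"       # move existing model blobs across
  else
    mkdir -p "$dest"        # nothing to move; start a fresh store
  fi
  ln -sfn "$dest" "$src"    # Ollama still sees the original path
}

# Example ("/Volumes/FastSSD" is a hypothetical mount point -- adjust):
# relocate_models "$HOME/.ollama/models" "/Volumes/FastSSD/ollama-models"
```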
Keep Gemma 4 warm in memory (optional)
By default, Ollama unloads models after a period of inactivity, causing cold-start delays. To keep Gemma 4 loaded throughout your workday:
OLLAMA_KEEP_ALIVE=-1 ollama serve
Or add it permanently to your shell profile (~/.zshrc or ~/.bashrc):
export OLLAMA_KEEP_ALIVE=-1
Troubleshooting common issues
The launch command isn't recognized
Your Ollama version is outdated. Reinstall from ollama.com to get the latest version with the launch command.

Generation is slow
Make sure you're on Ollama v0.19+ for MLX acceleration on Apple Silicon. Close memory-heavy apps — browsers with many open tabs are the usual culprit.

Out-of-memory errors
Drop to the smaller gemma4 (8B) model, or reduce your context window to 32K.

Responses degrade on long inputs
On 16GB hardware, keep inputs under 32K tokens. If you notice degraded responses, memory pressure is likely the cause.

The model runs on CPU instead of GPU
Run ollama ps and look for GPU layers. If all layers show as CPU, check your GPU drivers and reinstall Ollama.
What to use Gemma 4 for
Gemma 4 is strongest on structured, routine tasks. Reserve expensive cloud API calls for what truly needs them.
Best for Gemma 4 locally
- Routine code generation
- Repo and file summaries
- Structured data extraction
- Boilerplate and scaffolding
- Status checks and API polling
- Lightweight research passes
Better suited for cloud models
- Complex multi-file refactors
- High-stakes architecture decisions
- Ambiguous or open-ended planning
- Legal or financial judgment tasks
- Long-form strategic reasoning
The result
Once set up, you have a fully local, Apache 2.0-licensed coding assistant — free to use, modify, and deploy commercially, with no data ever leaving your machine. Whether you're a developer who wants a private inference endpoint or someone experimenting with local AI, this is one of the cleanest setups you can run today.
Manoj
Editor