Run Gemma 4 with Claude Code for Free
Google's Gemma 4 is one of the most capable open-weight coding models available right now — and with Ollama and Claude Code, you can be up and running in under 10 minutes. No API bills. No data leaving your machine. Just a powerful free coding assistant, running locally.
What you'll need
- 16GB RAM minimum (for the default 8B model)
- 24GB+ RAM or unified memory recommended (for the 26B MoE model)
- macOS, Linux, or Windows (WSL supported)
- Ollama v0.19+ for best performance on Apple Silicon
Install Ollama
Head to ollama.com and download the installer, or use Homebrew on Mac:
brew install --cask ollama-app
After installing via Homebrew, open Ollama from /Applications (or Spotlight) to trigger the initial launch. macOS will prompt for Touch ID or your password to approve the app.
Verify it's running:
ollama --version
Choose and pull the right Gemma 4 model
Not all Gemma 4 variants are equal. Pick based on your hardware:
# Default 8B model (~9.6GB) — good for 16GB machines
ollama pull gemma4
# 26B MoE model (~18GB) — best performance/speed balance
ollama pull gemma4:26b
Confirm the download succeeded:
ollama list
Install Claude Code
Claude Code is Anthropic's agentic coding CLI — it can read, write, and execute code directly in your working directory.
# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash
# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd
Launch Claude Code with Gemma 4
This is where everything comes together. One command and you're coding:
# Local 26B model (recommended for most setups)
ollama launch claude --model gemma4:26b
# Cloud-accelerated 31B model via Ollama + NVIDIA Blackwell
ollama launch claude --model gemma4:31b-cloud
Claude Code connects to Ollama using the Anthropic-compatible API. A context window of at least 64K tokens is recommended for best results.
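If the launch command isn't available in your Ollama build, you can wire Claude Code to Ollama by hand through environment variables. This is a sketch under two assumptions: that Ollama is serving its Anthropic-compatible API on its default port (11434), and that the token value is ignored by a local Ollama, so any placeholder works:

```shell
# Manual wiring -- assumes Ollama's Anthropic-compatible API is on the
# default port 11434; the token is a dummy value a local Ollama won't check.
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="gemma4:26b"
claude
```

Put the three exports in your shell profile if you want every Claude Code session to default to the local model.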
Expand your context window (optional)
For larger codebases or more complex tasks, boost the context window with a custom Modelfile:
printf 'FROM gemma4:26b\nPARAMETER num_ctx 65536' > /tmp/Modelfile-64k
ollama create gemma4-26b-64k -f /tmp/Modelfile-64k
ollama launch claude --model gemma4-26b-64k
If space on your Mac's internal drive is tight, you can also symlink Ollama's model storage to an external NVMe drive to free up room.
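That relocation tip can be sketched as a small script. The helper name and the `/Volumes/FastSSD` mount point are illustrative, not part of Ollama; stop Ollama first so no model files are in use:

```shell
# relocate_models: move Ollama's model store to another drive and leave
# a symlink behind, so Ollama keeps finding it at the original path.
relocate_models() {
  src="$1"    # current store, e.g. "$HOME/.ollama/models"
  dest="$2"   # target directory on the external drive
  mkdir -p "$(dirname "$dest")"
  if [ -d "$src" ] && [ ! -L "$src" ]; then
    mv "$src" "$dest"       # move existing model blobs across
  else
    mkdir -p "$dest"        # nothing to move; start a fresh store
  fi
  ln -sfn "$dest" "$src"    # Ollama still sees the original path
}

# Example ("/Volumes/FastSSD" is a hypothetical mount point -- adjust):
# relocate_models "$HOME/.ollama/models" "/Volumes/FastSSD/ollama-models"
```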
Keep Gemma 4 warm in memory (optional)
By default, Ollama unloads models after a period of inactivity, causing cold-start delays. To keep Gemma 4 loaded throughout your workday:
OLLAMA_KEEP_ALIVE=-1 ollama serve
Or add it permanently to your shell profile (~/.zshrc or ~/.bashrc):
export OLLAMA_KEEP_ALIVE=-1
Troubleshooting common issues
The launch command isn't recognized
Your Ollama version is outdated. Reinstall from ollama.com to get the latest version with the launch command.

Generation is slow
Make sure you're on Ollama v0.19+ for MLX acceleration on Apple Silicon. Close memory-heavy apps — browsers with many open tabs are the usual culprit.

Out-of-memory errors
Drop to the smaller gemma4 (8B) model, or reduce your context window to 32K.

Responses degrade on long inputs
On 16GB hardware, keep inputs under 32K tokens. If you notice degraded responses, memory pressure is likely the cause.

The model runs on CPU instead of GPU
Run ollama ps and look for GPU layers. If all layers show as CPU, check your GPU drivers and reinstall Ollama.
What to use Gemma 4 for
Gemma 4 is strongest on structured, routine tasks. Reserve expensive cloud API calls for what truly needs them.
Best for Gemma 4 locally
- Routine code generation
- Repo and file summaries
- Structured data extraction
- Boilerplate and scaffolding
- Status checks and API polling
- Lightweight research passes
Better suited for cloud models
- Complex multi-file refactors
- High-stakes architecture decisions
- Ambiguous or open-ended planning
- Legal or financial judgment tasks
- Long-form strategic reasoning
The result
Once set up, you have a fully local, Apache 2.0-licensed coding assistant — free to use, modify, and deploy commercially, with no data ever leaving your machine. Whether you're a developer who wants a private inference endpoint or someone experimenting with local AI, this is one of the cleanest setups you can run today.
Manoj
Editor