Run Gemma 4 with Claude Code for Free

Google's Gemma 4 is one of the most capable open-weight coding models available right now — and with Ollama and Claude Code, you can be up and running in under 10 minutes. No API bills. No data leaving your machine. Just a powerful free coding assistant, running locally.

Manoj
April 10, 2026 · 4 min read

What you'll need

  • 16GB RAM minimum (for the default 8B model)
  • 24GB+ RAM or unified memory recommended (for the 26B MoE model)
  • macOS, Linux, or Windows (WSL supported)
  • Ollama v0.19+ for best performance on Apple Silicon
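The RAM guidance above can be wrapped in a small helper. `pick_model` is a hypothetical function, not part of Ollama; on macOS you could feed it `$(( $(sysctl -n hw.memsize) / 1073741824 ))` to get installed RAM in GB:

```shell
# Hypothetical helper: map available RAM (GB) to the Gemma 4 tag
# recommended above. Not an Ollama command, just a local convenience.
pick_model() {
  if [ "$1" -ge 24 ]; then
    echo "gemma4:26b"   # 24GB+: the 26B MoE model
  else
    echo "gemma4"       # 16GB: the default 8B model
  fi
}

pick_model 16   # prints: gemma4
```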
1. Install Ollama

Head to ollama.com and download the installer, or use Homebrew on Mac:

brew install --cask ollama-app

After installing via Homebrew, open Ollama from /Applications (or Spotlight) to trigger the initial launch. macOS will prompt for Touch ID or your password to approve the app.

Verify it's running:

ollama --version
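If you want to script the v0.19 minimum check, a sketch using `sort -V` for version ordering works; in practice you would pass it the number reported by `ollama --version`:

```shell
# Sketch: true if $2 meets the minimum version $1, using sort's
# version ordering. Pass the number from `ollama --version` as $2.
version_ok() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

version_ok 0.19 0.21 && echo "new enough"   # prints: new enough
```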

2. Choose and pull the right Gemma 4 model

Not all Gemma 4 variants are equal. Pick based on your hardware:

# Default 8B model (~9.6GB) — good for 16GB machines
ollama pull gemma4

# 26B MoE model (~17.99GB) — best performance/speed balance
ollama pull gemma4:26b

Pro tip: The 26B MoE variant is the sleeper pick. Its mixture-of-experts architecture activates only about 3.8B parameters per token, delivering roughly 10B-dense-equivalent quality at roughly 4B inference cost, the best balance for most machines.

Confirm the download succeeded:

ollama list
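To script that confirmation, a small grep helper over the listing works; `has_model` is a hypothetical helper, and the sample output below is illustrative (in real use you would pipe `ollama list` into it):

```shell
# Hypothetical helper: succeed if the given tag appears in `ollama list`
# output. Real use: ollama list | has_model gemma4:26b
has_model() { grep -q "^$1"; }

# Illustrative sample of `ollama list`-style output (size from above):
sample='NAME            ID      SIZE
gemma4:26b      abc123  18 GB'
printf '%s\n' "$sample" | has_model gemma4:26b && echo "model present"
```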

3. Install Claude Code

Claude Code is Anthropic's agentic coding CLI — it can read, write, and execute code directly in your working directory.

# macOS / Linux / WSL
curl -fsSL https://claude.ai/install.sh | bash

# Windows CMD
curl -fsSL https://claude.ai/install.cmd -o install.cmd && install.cmd && del install.cmd

4. Launch Claude Code with Gemma 4

This is where everything comes together. One command and you're coding:

# Local 26B model (recommended for most setups)
ollama launch claude --model gemma4:26b

# Cloud-accelerated 31B model via Ollama + NVIDIA Blackwell
ollama launch claude --model gemma4:31b-cloud

Claude Code connects to Ollama using the Anthropic-compatible API. A context window of at least 64K tokens is recommended for best results.


5. Expand your context window (optional)

For larger codebases or more complex tasks, boost the context window with a custom Modelfile:

printf 'FROM gemma4:26b\nPARAMETER num_ctx 65536\n' > /tmp/Modelfile-64k
ollama create gemma4-26b-64k -f /tmp/Modelfile-64k
ollama launch claude --model gemma4-26b-64k

If your Mac's internal drive is tight, you can also symlink Ollama's model storage to an external NVMe drive to free up space.
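The move-then-symlink pattern looks like the sketch below, which uses temporary stand-in paths so it is safe to dry-run. For real use, stop Ollama first and substitute `~/.ollama/models` (Ollama's default store on macOS) and your external drive's path, e.g. `/Volumes/YourNVMe/ollama-models`:

```shell
# Sketch with temporary stand-in paths; substitute the real Ollama store
# (~/.ollama/models) and your external drive's mount point in practice.
store="$(mktemp -d)/models"      # stands in for ~/.ollama/models
external="$(mktemp -d)/models"   # stands in for the external NVMe location
mkdir -p "$store"
touch "$store/example-blob"      # pretend this is a downloaded model

mv "$store" "$external"          # move the store to the external drive
ln -s "$external" "$store"       # leave a symlink at the original path

ls "$store"                      # prints: example-blob
```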


6. Keep Gemma 4 warm in memory (optional)

By default, Ollama unloads models after a period of inactivity, causing cold-start delays. To keep Gemma 4 loaded throughout your workday:

OLLAMA_KEEP_ALIVE=-1 ollama serve

Or add it permanently to your shell profile (~/.zshrc or ~/.bashrc):

export OLLAMA_KEEP_ALIVE=-1

Troubleshooting common issues

"Unknown command: launch"

Your Ollama version is outdated. Reinstall from ollama.com to get the latest version with the launch command.

Slow responses

Make sure you're on Ollama v0.19+ for MLX acceleration on Apple Silicon. Close memory-heavy apps — browsers with many open tabs are the usual culprit.

Out of memory error

Drop to the smaller gemma4 (8B) model, or reduce your context window to 32K.

Degraded output on long prompts

On 16GB hardware, keep inputs under 32K tokens; degraded responses on long prompts usually point to memory pressure.
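A rough way to stay under that ceiling is to estimate token count at about four characters per token (a common rule of thumb; actual tokenization varies). `approx_tokens` is a hypothetical helper:

```shell
# Hypothetical helper: rough token estimate at ~4 characters per token.
# A heuristic only; real tokenizers will differ somewhat.
approx_tokens() {
  echo $(( $(printf '%s' "$1" | wc -c) / 4 ))
}

budget=32768   # the 32K ceiling suggested for 16GB machines
prompt="Summarize the repository layout and list the public entry points."
[ "$(approx_tokens "$prompt")" -le "$budget" ] && echo "within budget"
```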

GPU not detected

Run ollama ps and look for GPU layers. If all layers show as CPU, check your GPU drivers and reinstall Ollama.


What to use Gemma 4 for

Gemma 4 is strongest on structured, routine tasks. Reserve expensive cloud API calls for what truly needs them.

Best for Gemma 4 locally

  • Routine code generation
  • Repo and file summaries
  • Structured data extraction
  • Boilerplate and scaffolding
  • Status checks and API polling
  • Lightweight research passes

Better suited for cloud models

  • Complex multi-file refactors
  • High-stakes architecture decisions
  • Ambiguous or open-ended planning
  • Legal or financial judgment tasks
  • Long-form strategic reasoning

The result

Once set up, you have a fully local, Apache 2.0-licensed coding assistant — free to use, modify, and deploy commercially, with no data ever leaving your machine. Whether you're a developer who wants a private inference endpoint or someone experimenting with local AI, this is one of the cleanest setups you can run today.
