Setting Up a Local Coding Agent on macOS with Gemma 4 and MTP
By
Kyle Howells
Toasted golden, schmeared with insight. Top of the rack.
Summary
A developer documents their experience setting up a local coding agent on macOS using Gemma 4 with Multi-Token Prediction (MTP) for faster inference. The setup runs on an Apple M1 Max with 64GB RAM, using llama.cpp with gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf and Qwen3.6 35B-A3B models. The article covers the configuration process, performance testing, and demonstrates real-time agent responsiveness without relying on cloud internet connectivity.
Key quotes
· 3 pulledI'd had my internet fail a few times recently leaving me stranded without a coding agent
This video is realtime. And shows the agent responding at a perfectly usable speed.
This was tested on an Apple M1 Max with 64 GB unified memory, running macOS 15.7.7.
You might also wanna read
Google DeepMind's Gemma 4 12B: Encoder-free multimodal AI runs locally on 16GB VRAM
Google DeepMind's Gemma 4 12B is an open-source multimodal AI model that processes text, images, and audio natively on consumer hardware wit
Google DeepMind's Gemma 4 12B brings native audio and vision AI to standard laptops
Google DeepMind has released Gemma 4 12B, an open multimodal AI model with native audio and vision processing capabilities designed to run l
Google's Gemma 4 12B matches larger model performance while running on standard laptops
Google has released Gemma 4 12B, a compact AI model that runs locally on consumer-grade laptops with just 16GB of VRAM or unified memory. Ac
bit.ly·4d agoGoogle DeepMind Releases Gemma 4 12B Unified Open Multimodal AI Model
Google DeepMind has released Gemma 4 12B Unified, an open multimodal AI model that processes text, audio, image, and video inputs natively w
Locally AI: Run AI Models Offline on Apple Devices
Locally AI is a software application that enables users to run various AI models (including Llama, Gemma, Qwen, and DeepSeek) locally on App
UTCP Agent: Lightweight Tool-Calling Protocol for AI Agents
UTCP Agent is a lightweight alternative to MCP (Model Context Protocol) that enables AI agents to call tools directly with minimal code. The
