All Topics

Technology

Art

Setting Up a Local Coding Agent on macOS with Gemma 4 and MTP

Kyle Howells

13h ago· 6 min readen

100/100

Golden Brown

Bagelometer↗

Toasted golden, schmeared with insight. Top of the rack.

Score100Typehow-toSentimentpositive

Summary

A developer documents their experience setting up a local coding agent on macOS using Gemma 4 with Multi-Token Prediction (MTP) for faster inference. The setup runs on an Apple M1 Max with 64GB RAM, using llama.cpp with gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf and Qwen3.6 35B-A3B models. The article covers the configuration process, performance testing, and demonstrates real-time agent responsiveness without relying on cloud internet connectivity.

Key quotes

· 3 pulled

I'd had my internet fail a few times recently leaving me stranded without a coding agent

This video is realtime. And shows the agent responding at a perfectly usable speed.

This was tested on an Apple M1 Max with 64 GB unified memory, running macOS 15.7.7.

Snippet from the RSS feed

Running Gemma 4 26B-A4B and Qwen3.6 35B-A3B locally with llama.cpp, MTP speculative decoding, multimodal support, and PI as a coding agent.

You might also wanna read

Google DeepMind's Gemma 4 12B: Encoder-free multimodal AI runs locally on 16GB VRAM

Google DeepMind's Gemma 4 12B is an open-source multimodal AI model that processes text, images, and audio natively on consumer hardware wit

Product Hunt·9d ago

Google DeepMind's Gemma 4 12B brings native audio and vision AI to standard laptops

Google DeepMind has released Gemma 4 12B, an open multimodal AI model with native audio and vision processing capabilities designed to run l

gadgetbond.com·6d ago

Google's Gemma 4 12B matches larger model performance while running on standard laptops

Google has released Gemma 4 12B, a compact AI model that runs locally on consumer-grade laptops with just 16GB of VRAM or unified memory. Ac

bit.ly·4d ago

Google DeepMind Releases Gemma 4 12B Unified Open Multimodal AI Model

Google DeepMind has released Gemma 4 12B Unified, an open multimodal AI model that processes text, audio, image, and video inputs natively w

huggingface.co·5d ago

Locally AI: Run AI Models Offline on Apple Devices

Locally AI is a software application that enables users to run various AI models (including Llama, Gemma, Qwen, and DeepSeek) locally on App

Product Hunt·3mo ago

UTCP Agent: Lightweight Tool-Calling Protocol for AI Agents

UTCP Agent is a lightweight alternative to MCP (Model Context Protocol) that enables AI agents to call tools directly with minimal code. The

Product Hunt·9mo ago