
Setting Up a Local Coding Agent on macOS with Gemma 4 and MTP

By Kyle Howells

13h ago · 6 min read

Summary

A developer documents setting up a local coding agent on macOS using Gemma 4 with Multi-Token Prediction (MTP) speculative decoding for faster inference. The setup runs on an Apple M1 Max with 64 GB of unified memory, using llama.cpp with two models: gemma-4-26B-A4B-it-UD-Q4_K_XL.gguf and Qwen3.6 35B-A3B. The article covers configuration and performance testing, and includes a realtime video of the agent responding at a usable speed with no internet connection required.

Key quotes

I'd had my internet fail a few times recently leaving me stranded without a coding agent
This video is realtime. And shows the agent responding at a perfectly usable speed.
This was tested on an Apple M1 Max with 64 GB unified memory, running macOS 15.7.7.
Snippet from the RSS feed
Running Gemma 4 26B-A4B and Qwen3.6 35B-A3B locally with llama.cpp, MTP speculative decoding, multimodal support, and PI as a coding agent.
