All Topics
All Topics
Technology
Technology
Design
Design
Programming
Programming
Science
Science
News
News
Gaming
Gaming
Entertainment
Entertainment
Business
Business
Finance
Finance
Sports
Sports
Health
Health
Food
Food
Travel
Travel
Art
Art
Music
Music
Books
Books
Education
Education
Politics
Politics
Personal
Personal
No algorithm. No AI slop. No ads. Just RSS. Pro-human. Indie writers. Real journalism. Open web. Chronological. Hand toasted.

Guide to Running Google Gemma 4 AI Model Locally with LM Studio CLI on macOS

By

vbtechguy

1mo ago· 19 min readen

Summary

This article provides a technical guide on running Google's Gemma 4 26B parameter model locally using LM Studio's new headless CLI tools. It explains the advantages of local AI models over cloud APIs, including cost savings, privacy, and avoiding rate limits. The article details how to set up Gemma 4 on macOS hardware, highlighting its mixture-of-experts architecture that allows the 26B model to run efficiently by only activating 4B parameters per forward pass. The guide includes practical setup instructions for using the model with Claude Code for local inference tasks.

Key quotes

· 4 pulled
Cloud AI APIs are great until they are not. Rate limits, usage costs, privacy concerns, and network latency all add up.
For quick tasks like code review, drafting, or testing prompts, a local model that runs entirely on your hardware has real advantages: zero API costs, no data leaving your machine, and consistent availability.
Google's Gemma 4 is interesting for local use because of its mixture-of-experts architecture. The 26B parameter model only activates 4B parameters per forward pass, which means it runs well on hardware that could never handle a dense 26B model.
LM Studio 0.4.0 introduced llmster and the lms CLI. Here is how I set up Gemma 4 26B for local inference on macOS that can be used with Claude Code.
Snippet from the RSS feed
LM Studio 0.4.0 introduced llmster and the lms CLI. Here is how I set up Gemma 4 26B for local inference on macOS that can be used with Claude Code.

You might also wanna read