Gemma-Tuner-Multimodal: Fine-Tuning Google's Gemma Models on Apple Silicon for Text, Images, and Audio
By MediaSquirrel
Summary
The article introduces gemma-tuner-multimodal, an open-source tool for fine-tuning Google's Gemma models (Gemma 4 and Gemma 3n) on multimodal data: text, images, and audio. Its main selling point is that it runs on Apple Silicon Macs through PyTorch's Metal Performance Shaders (MPS) backend, so no NVIDIA GPU is needed. The tool uses LoRA (Low-Rank Adaptation), which trains small adapter matrices instead of the full model weights, and it can stream training data from the cloud, so a dataset does not have to fit on the Mac's local storage. A comparison table in the article sets it against MLX-LM, Unsloth, and axolotl on multimodal support and Apple Silicon compatibility.
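To give a rough sense of why LoRA suits memory-constrained hardware, the sketch below counts trainable parameters for a single linear layer with and without LoRA. It is only an illustration of the general LoRA idea (a frozen weight W plus a trainable low-rank product B·A); the layer size and rank are hypothetical values, not taken from any Gemma configuration or from gemma-tuner-multimodal itself.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Weights in a dense d_out x d_in matrix (bias ignored)."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    d_in = d_out = 2048
    rank = 16

    dense = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)

    print(f"full fine-tuning: {dense} trainable weights")
    print(f"LoRA (rank {rank}): {lora} trainable weights")
    print(f"fraction trained: {lora / dense:.4%}")
```

Because only the two small factors receive gradients and optimizer state, the training footprint per adapted layer shrinks roughly in proportion to the rank, which is the general reason LoRA is a common choice on machines without a large discrete GPU.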
Key quotes
Fine-tune Gemma on text, images, and audio — on your Mac, on data that doesn't fit on your Mac.
LoRA for Gemma 4 & 3n — why not just use…?
Runs on Apple Silicon (MPS) ✅
No NVIDIA GPU required ✅
