Running Gemma 4 on a 2016 Xeon Server with No GPU: A Technical Walkthrough
By cafkafk
Summary
The article describes running Gemma 4 (a 25B-parameter Mixture-of-Experts model) on a recycled server with a single 2016 Intel Xeon E5-2620 v4 CPU, 128 GB of DDR3 RAM, and no GPU, discrete or integrated. By the author's estimate, the RAM is 5-6 times slower than the best current laptop RAM, and the CPU is about 5 times slower than the author's laptop CPU. The author walks through quantizing the model's Multi-Token Prediction (MTP) drafters, pairing them with a verifier (the draft-and-verify arrangement used in speculative decoding), and running inference on this hardware. The piece serves as both a technical walkthrough and a demonstration that large language models can run on recycled, decade-old enterprise hardware given the right optimizations.
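The drafter-and-verifier pairing mentioned in the summary can be illustrated with a toy sketch. This is not the author's setup or Gemma 4's actual MTP implementation: `toy_drafter`, `toy_verifier`, `speculative_step`, and `generate` are hypothetical names, the "models" are simple functions over integer tokens, and acceptance is plain greedy matching. It only shows the general shape of the idea: a cheap drafter proposes several tokens, the expensive verifier checks them, and the output matches what the verifier alone would have produced.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def toy_verifier(ctx: List[Token]) -> Token:
    # Stand-in for the large model (assumption): counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token]) -> Token:
    # Stand-in for a small drafter (assumption): agrees with the verifier
    # except after a 4, where it guesses 0 instead of 5.
    return 0 if ctx[-1] % 5 == 4 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     verify_next: NextToken, k: int) -> List[Token]:
    """One draft-then-verify round with greedy acceptance."""
    # 1. The drafter proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The verifier checks each proposed position. A real engine would
    #    score all k positions in one batched forward pass; this loop
    #    calls the verifier per position for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = verify_next(ctx)
        if expected != tok:
            # First mismatch: keep the verifier's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted, so the verifier adds one bonus token.
    accepted.append(verify_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextToken,
             verify_next: NextToken, k: int,
             max_new: int) -> Tuple[List[Token], int]:
    """Generate max_new tokens; return (tokens, number of verify rounds)."""
    out = list(prefix)
    rounds = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, verify_next, k))
        rounds += 1
    return out[:len(prefix) + max_new], rounds


if __name__ == "__main__":
    tokens, rounds = generate([0], toy_drafter, toy_verifier, k=3, max_new=12)
    print(tokens, rounds)
```

In this sketch each round yields between 1 and `k + 1` tokens, so the number of verifier rounds drops whenever the drafter guesses well. Any real speedup depends on the drafter's acceptance rate and on how cheaply the verifier can score several positions at once, neither of which this toy measures.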
Key quotes
I have a recycled server. To its credit, it has a whopping 128 GB RAM, but it's DDR3… That RAM is 5-6 times slower than the current best laptop ram.
It also has a single Intel Xeon E5-2620 v4 from 2016, which is about 5 times slower than my laptops CPU…
Oh, and as I did mention, we have no GPU. And no, the Xeon does not have an integrated GPU.
