Running Gemma 4 on a 2016 Xeon Server with No GPU: A Technical Walkthrough

cafkafk

4h ago · 15 min read · en

100/100

Golden Brown

Bagelometer↗

Slow-proofed and worth the wait. Worth its weight in flour.

Score: 100 · Type: how-to · Sentiment: positive

Summary

The article describes running Gemma 4 (a 25B-parameter Mixture-of-Experts model) on a recycled server with a 2016 Intel Xeon E5-2620 v4 CPU, 128 GB of DDR3 RAM, and no GPU. By the author's estimate, the RAM is 5-6 times slower than the best current laptop memory, and the CPU is about 5 times slower than the author's laptop CPU. The author details the process of quantizing the model's Multi-Token Prediction (MTP) drafters, pairing them with a verifier model for speculative decoding, and successfully running inference on this hardware. The piece serves as both a technical walkthrough and a demonstration that large language models can run on recycled, decade-old enterprise hardware when the inference setup is carefully tuned.
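The drafter/verifier pairing mentioned above is a form of speculative decoding. The sketch below shows the generic greedy draft-and-verify loop, not the author's actual setup: the article's engine, flags, and Gemma 4's MTP drafter internals are not reproduced here, and `toy_target` and `toy_drafter` are hypothetical stand-ins for a large verifier and a cheap drafter. The point it illustrates is that a cheap model guesses several tokens ahead, the expensive model checks them, and the output matches plain decoding with the expensive model alone while needing fewer verification rounds.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one expensive target step per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, verification_rounds).

    Each round, the drafter proposes k tokens and the target keeps the longest
    agreeing prefix, so the result equals greedy_decode(target, ...).
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        # 1. Cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(tokens + draft))
        # 2. Target checks every drafted position. A real engine does this in one
        #    batched forward pass; this toy loop calls target per position.
        accepted: List[int] = []
        for t in draft:
            expected = target(tokens + accepted)
            if expected != t:
                accepted.append(expected)  # target's token replaces the first mismatch
                break
            accepted.append(t)
        else:
            # All k drafts matched: the same pass yields one extra "bonus" token.
            accepted.append(target(tokens + accepted))
        rounds += 1
        tokens.extend(accepted)
    return tokens[:len(prompt) + max_new], rounds


# Toy usage: the target counts mod 10; the hypothetical drafter guesses wrong after a 2.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_drafter(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


out, rounds = speculative_decode(toy_target, toy_drafter, [0], max_new=12, k=4)
assert out == greedy_decode(toy_target, [0], max_new=12)
print(rounds)  # 3 verification rounds instead of 12 single-token target steps
```

On memory-bound hardware like the server described, the payoff comes from step 2: checking k drafted tokens in one batched pass reads the large model's weights from slow RAM once, instead of once per token.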

Key quotes

I have a recycled server. To its credit, it has a whopping 128 GB RAM, but it's DDR3… That RAM is 5-6 times slower than the current best laptop ram.

It also has a single Intel Xeon E5-2620 v4 from 2016, which is about 5 times slower than my laptops CPU…

Oh, and as I did mention, we have no GPU. And no, the Xeon does not have an integrated GPU.

Snippet from the RSS feed

Or running Gemma 4 on a 2016 Xeon with no GPU, 25 flags, 128 GB of DDR3, and a 25B-parameter MoE.

You might also wanna read

How Multi-Token Prediction drafters accelerate Gemma 4 inference by up to 3x

This article explains how Google's Gemma 4 models achieve up to 3x faster inference through Multi-Token Prediction (MTP) drafters and specul

blog.google·26d ago

Lucebox Hub: Hand-Tuned LLM Inference Optimization for Consumer Hardware

Lucebox Hub is an open-source optimization project focused on hand-tuning LLM inference for specific consumer hardware. The project rewrites

github.com·1mo ago

Optimizing Suspend/Resume Performance for FreeBSD on Thinkpad X220

The article discusses optimizing suspend/resume times for FreeBSD on a Thinkpad X220 laptop, comparing performance between FreeBSD 14.2 with

eugene-andrienko.com·10mo ago

Project Glasswing: AI-assisted vulnerability detection finds over 10,000 critical software flaws

Project Glasswing is a collaborative effort launched to secure critical software against potential threats from increasingly capable AI mode

anthropic.com·58m ago

Kefir C compiler development moves to private mode indefinitely

The developer of the Kefir C compiler announces the cessation of public development, transitioning the project to private mode indefinitely.

kefir.protopopov.lv·2h ago