Running Gemma 4 on a 2016 Xeon Server with No GPU: A Technical Walkthrough

By cafkafk

4h ago · 15 min read

Summary

The article describes running Gemma 4, a 25B-parameter Mixture-of-Experts (MoE) model, on a recycled server with a single 2016 Intel Xeon E5-2620 v4 CPU, 128 GB of DDR3 RAM, and no GPU (the Xeon has no integrated GPU either). The author details quantizing the model's MTP drafters, pairing them with a verifier, and running inference on this machine. By the author's estimate, the RAM is 5-6 times slower than the current best laptop RAM, and the CPU is about 5 times slower than the author's laptop CPU. The piece serves as both a technical walkthrough and a demonstration that large language models can run on recycled, decade-old enterprise hardware with the right optimizations.

Key quotes

I have a recycled server. To its credit, it has a whopping 128 GB RAM, but it's DDR3… That RAM is 5-6 times slower than the current best laptop RAM.
It also has a single Intel Xeon E5-2620 v4 from 2016, which is about 5 times slower than my laptop's CPU…
Oh, and as I did mention, we have no GPU. And no, the Xeon does not have an integrated GPU.
Snippet from the RSS feed
Or running Gemma 4 on a 2016 Xeon with no GPU, 25 flags, 128 GB of DDR3, and a 25B-parameter MoE.
