Transformer AI Model Successfully Runs on Commodore 64 with 25,000 Parameters
By adunk
A baker's-dozen of insight crammed into one ring.
Summary
A developer has implemented a transformer neural network with roughly 25,000 int8 parameters (the same architecture used by ChatGPT, Claude, and Gemini) on an unmodified Commodore 64. The 2-layer decoder-only model runs on the machine's 1 MHz CPU in hand-written 6502/6510 assembly, with real multi-head causal self-attention, softmax, and RMSNorm. The entire model fits on a floppy disk and produces about one token every 60 seconds, showing that a modern AI architecture can run, if slowly, on vintage 8-bit hardware.
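For readers unfamiliar with the three operations named in the summary, here is a minimal floating-point sketch in Python. It is not the project's code: the original is int8 arithmetic in 6502/6510 assembly, and the function names, the `eps` value, and the use of floats here are assumptions made purely for illustration.

```python
import math

def rmsnorm(x, gain, eps=1e-6):
    """Scale x so its root-mean-square is ~1, then apply a per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]

def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(q, k, v):
    """Single-head causal self-attention.

    q, k, v are lists of per-position vectors. Position t only attends
    to positions 0..t, which is what makes the attention "causal".
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(weights[s] * v[s][i] for s in range(t + 1))
            for i in range(len(v[0]))
        ])
    return out
```

A multi-head version would typically run `causal_attention` once per head on its own slice of the embedding (4 heads of 8 dimensions each, per the source) and concatenate the results.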
Key quotes
A real transformer running on a 1 MHz Commodore 64.
A 2-layer decoder-only transformer - the same architecture behind ChatGPT, Claude, and Gemini - implemented in hand-written 6502/6510 assembly and running on an unmodified Commodore 64.
~25,000 int8 parameters. Real multi-head causal self-attention, real softmax, real RMSNorm.
About 60 seconds per token. The whole thing fits on a floppy disk with room to spare.
2 layers, 4 attention heads × 8 dims, 32-dimensional embeddings, 64 FFN hidden units.
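The quoted dimensions can be checked against the ~25,000 figure with some quick arithmetic. The Python sketch below assumes a conventional decoder layer (Q, K, V and output projections, a two-matrix FFN, two RMSNorm gain vectors, and no biases). The source does not confirm that layout, and it does not give the vocabulary size or context length, so the embedding side is left uncounted.

```python
# Shape values quoted in the source.
N_LAYERS = 2
N_HEADS = 4
HEAD_DIM = 8
D_MODEL = 32
D_FFN = 64

def per_layer_weights(d_model: int, d_ffn: int) -> dict:
    """Weight counts for one decoder layer (assumed layout, no biases)."""
    return {
        "attention_qkvo": 4 * d_model * d_model,  # Q, K, V, output projections
        "ffn": 2 * d_model * d_ffn,               # up and down projections
        "rmsnorm_gains": 2 * d_model,             # one gain vector per sub-block
    }

# The heads must tile the embedding width exactly: 4 x 8 = 32.
assert N_HEADS * HEAD_DIM == D_MODEL

layer = per_layer_weights(D_MODEL, D_FFN)
layer_total = sum(layer.values())      # 4096 + 4096 + 64 = 8256
stack_total = N_LAYERS * layer_total   # 16512 for both layers

print(f"per layer: {layer_total}, both layers: {stack_total}")
```

Under these assumptions the two layers account for about 16,500 weights, which would leave roughly 8,500 of the quoted ~25,000 for token embeddings, any positional weights, and the output projection.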
You might also wanna read
Transformer Neural Network Implemented in HyperTalk for Classic Macintosh
A complete transformer neural network implemented entirely in HyperTalk, a 1987 scripting language designed for interactive card stacks, run
XORTRAN: A FORTRAN IV Neural Network for IBM 1130 and PDP-11 Computers
XORTRAN is a multilayer perceptron neural network written in FORTRAN IV that runs on vintage IBM 1130 (1965) and PDP-11 (1970-1980) computer
ZX Spectrum BASIC interpreter rebuilt from scratch to run natively in web browsers
A developer has rebuilt the ZX Spectrum's BASIC interpreter from scratch to run in a web browser, without emulating the original Z80 hardwar
Reflections on DwarfStar 4's rapid rise in local AI inference
The author reflects on the unexpected popularity of DwarfStar 4 (DS4), a local AI inference project. They attribute its success to the conve
MacSurf brings CSS3, ES5 JavaScript, and native HTTPS to Classic Mac OS 9 PowerPC systems
MacSurf is an early-alpha web browser for Classic Mac OS 9 PowerPC systems (like the G3 iMac) that brings modern web technologies — CSS3, ES
