Transformer AI Model Successfully Runs on Commodore 64 with 25,000 Parameters

By adunk · 1mo ago · 5 min read · en · Code

Summary

A developer has implemented a roughly 25,000-parameter transformer neural network on an unmodified Commodore 64. The 2-layer decoder-only model, the same architecture family used by ChatGPT, Claude, and Gemini, is implemented in hand-written 6502/6510 assembly and runs on the machine's 1 MHz CPU, with real multi-head causal self-attention, softmax, and RMSNorm. Its int8 weights fit on a floppy disk with room to spare, and generation takes about 60 seconds per token, showing that a modern AI architecture can run on vintage 8-bit hardware.
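For readers unfamiliar with the three operations named in the summary, here is a minimal floating-point sketch in Python. It is not the project's code: the original is 6502/6510 assembly working on int8 weights, and the function names, the `eps` value, and the use of floats here are all assumptions made for illustration.

```python
import math

def rmsnorm(x, gain, eps=1e-5):
    """Scale x so its root-mean-square is about 1, then apply a per-element gain.
    eps is an assumed small constant that avoids division by zero."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [g * v * scale for g, v in zip(gain, x)]

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_head(q, k, v):
    """One attention head. q, k, v are lists of T vectors of equal length d.
    Position t attends only to positions 0..t, which is the 'causal' part."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)  # future positions are never scored
        ]
        weights = softmax(scores)
        out.append([
            sum(weights[s] * v[s][j] for s in range(t + 1))
            for j in range(d)
        ])
    return out
```

In a multi-head layout such as the 4 heads × 8 dims quoted below, each head would run this on its own 8-wide slice of the 32-dimensional vectors and the four outputs would be concatenated; how the assembly version arranges that is not described in this summary.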

Key quotes

A real transformer running on a 1 MHz Commodore 64.
A 2-layer decoder-only transformer - the same architecture behind ChatGPT, Claude, and Gemini - implemented in hand-written 6502/6510 assembly and running on an unmodified Commodore 64.
~25,000 int8 parameters. Real multi-head causal self-attention, real softmax, real RMSNorm.
About 60 seconds per token. The whole thing fits on a floppy disk with room to spare.
2 layers, 4 attention heads × 8 dims, 32-dimensional embeddings, 64 FFN hidden units.
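The quoted dimensions allow a rough back-of-the-envelope check, sketched below in Python. Several things here are assumptions rather than facts from the source: that each layer has four square attention matrices (Q, K, V, output) and a plain two-matrix FFN, that there are no bias terms, and that the floppy is a standard 1541-format disk with 174,848 raw bytes. The vocabulary size and context length are not given, so the remainder is left unexplained rather than guessed.

```python
# Figures quoted in the source.
D_MODEL = 32          # embedding width
N_HEADS = 4
HEAD_DIM = 8
FFN_HIDDEN = 64
N_LAYERS = 2
TOTAL_PARAMS = 25_000         # "~25,000 int8 parameters"
SECONDS_PER_TOKEN = 60        # "about 60 seconds per token"
CLOCK_HZ = 1_000_000          # "1 MHz"

# Assumption: 1541-format disk, 683 sectors of 256 bytes.
DISK_BYTES = 683 * 256        # 174,848

def block_weight_count():
    """Weights in the transformer blocks, assuming no biases."""
    attn = 4 * D_MODEL * (N_HEADS * HEAD_DIM)   # Q, K, V, output projections
    ffn = 2 * D_MODEL * FFN_HIDDEN              # up and down projections
    return N_LAYERS * (attn + ffn)

block_weights = block_weight_count()
# Whatever is left covers embeddings, norm gains, and anything else;
# the source does not break this down.
remainder = TOTAL_PARAMS - block_weights

model_bytes = TOTAL_PARAMS * 1                  # int8 means one byte each
disk_fraction = model_bytes / DISK_BYTES

cycles_per_token = SECONDS_PER_TOKEN * CLOCK_HZ
cycles_per_parameter = cycles_per_token // TOTAL_PARAMS

print(block_weights, remainder)
print(f"{disk_fraction:.0%} of the assumed disk capacity")
print(cycles_per_token, cycles_per_parameter)
```

Under these assumptions the two layers account for 16,384 of the roughly 25,000 parameters, the weights occupy about a seventh of the disk, and the stated speed works out to a budget of about 60 million CPU cycles per token. These are estimates derived from the quoted numbers, not measurements from the project.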
Snippet from the RSS feed
A real 25k-parameter transformer running on a Commodore 64! - gizmo64k/soulplayer-c64