All Topics
All Topics
Technology
Technology
Design
Design
Programming
Programming
Science
Science
News
News
Gaming
Gaming
Entertainment
Entertainment
Business
Business
Finance
Finance
Sports
Sports
Health
Health
Food
Food
Travel
Travel
Art
Art
Music
Music
Books
Books
Education
Education
Politics
Politics
Personal
Personal
No algorithm. No AI slop. No ads. Just RSS. Pro-human. Indie writers. Real journalism. Open web. Chronological. Hand toasted.

Flash-MoE: Running 397B Parameter AI Model on MacBook Pro with 48GB RAM

By

mft_

2mo ago· 5 min readenCode

Summary

Flash-MoE is a pure C/Metal inference engine that enables running the massive Qwen3.5-397B-A17B model (397 billion parameters) on a MacBook Pro with 48GB RAM. The system achieves 4.4+ tokens/second with production-quality output including tool calling, streaming the entire 209GB model from SSD through a custom Metal compute pipeline. The project was built in 24 hours by an AI and human collaboration, using no Python or frameworks—just C, Objective-C, and hand-tuned Metal shaders.

Key quotes

· 4 pulled
Pure C/Metal inference engine that runs Qwen3.5-397B-A17B (a 397 billion parameter Mixture-of-Experts model) on a MacBook Pro with 48GB RAM at 4.4+ tokens/second with production-quality output including tool calling.
The entire 209GB model streams from SSD through a custom Metal compute pipeline. No Python. No frameworks. Just C, Objective-C, and hand-tuned Metal shaders.
Running a big model on a small laptop.
The story of how an AI and a human built this in 24 hours.
Snippet from the RSS feed
Running a big model on a small laptop. Contribute to danveloper/flash-moe development by creating an account on GitHub.

You might also wanna read