Efficient Performance of 120B Model on Minimal Hardware
By zigzag312
Summary
The article describes running a 120B-parameter mixture-of-experts (MoE) model on modest hardware. The expert layers run on the CPU and the attention layers run on the GPU, so the setup needs only about 5 to 8 GB of VRAM. The author highlights the low GPU memory use and responsive generation speed. They recommend a GPU with BF16 support, such as an RTX 3000-series card or newer, because every layer except the MoE layers is stored in BF16.
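The following minimal Python sketch puts rough numbers on why this split keeps VRAM use low. It assumes the formats the author describes: non-expert weights in BF16 (16 bits each) and expert weights in MXFP4. In the OCP Microscaling format, MXFP4 packs 32 FP4 values with one shared 8-bit scale, which works out to 4.25 bits per weight. The parameter counts below are hypothetical placeholders, not figures from the article. The estimate also ignores the KV cache, activations, and runtime overhead.

```python
BF16_BITS = 16.0
# OCP Microscaling MXFP4: 32 FP4 (E2M1) values share one 8-bit (E8M0) scale.
MXFP4_BITS = 4.0 + 8.0 / 32.0  # 4.25 bits per weight


def weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_param / 8.0


def split_footprint(dense_params: int, expert_params: int) -> tuple[float, float]:
    """Return (gpu_bytes, cpu_bytes) for weights only.

    Assumption: dense (attention, embeddings, etc.) weights are BF16 and live on
    the GPU; MoE expert weights are MXFP4 and stay in system RAM.
    """
    gpu = weight_bytes(dense_params, BF16_BITS)
    cpu = weight_bytes(expert_params, MXFP4_BITS)
    return gpu, cpu


if __name__ == "__main__":
    # Hypothetical split for a ~120B MoE model; real counts depend on the model.
    gpu, cpu = split_footprint(dense_params=2_000_000_000,
                               expert_params=115_000_000_000)
    print(f"GPU weights: {gpu / 1e9:.1f} GB, CPU weights: {cpu / 1e9:.1f} GB")
```

Under these placeholder counts, the GPU holds only a few gigabytes of weights, while the bulk of the model sits in system RAM.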
Key quotes
The expert layers run amazing on CPU (~17-25T/s on a 14900K) and you can force that with this new llama-cpp option: --cpu-moe.
No giant MLP weights are resident on the GPU, so memory use stays low.
This yields an amazing snappy system for a 120B model! Even something like a 3060Ti would be amazing!
GPU with BF16 support would be best (RTX3000+) because all layers except the MOE layers (which are mxfp4) are BF16.
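The following minimal Python sketch shows one way to launch llama.cpp's server with the `--cpu-moe` option. It also includes a small helper for the BF16 recommendation. It assumes a llama.cpp build recent enough to include `--cpu-moe`, with the `llama-server` binary on the PATH. The model path, context size, and port are hypothetical values. The BF16 check is based on NVIDIA compute capability. BF16 tensor-core support arrived with Ampere (compute capability 8.0 and up), which covers RTX 30-series cards at 8.6. Turing RTX 20-series cards report 7.5.

```python
import shlex


def build_llama_server_cmd(model_path: str, ctx_size: int = 8192,
                           port: int = 8080) -> list[str]:
    """Build a llama-server command that keeps MoE expert weights on the CPU.

    Assumption: llama-server is on PATH and supports --cpu-moe.
    model_path, ctx_size, and port are hypothetical example values.
    """
    return [
        "llama-server",
        "-m", model_path,
        "--n-gpu-layers", "999",  # offload every layer the GPU can take...
        "--cpu-moe",              # ...but keep MoE expert weights in system RAM
        "--ctx-size", str(ctx_size),
        "--port", str(port),
    ]


def has_native_bf16(compute_capability: tuple[int, int]) -> bool:
    """True if an NVIDIA GPU of this compute capability has BF16 tensor cores.

    Ampere (8.x, e.g. RTX 30-series at 8.6) and newer qualify;
    Turing (7.5, e.g. RTX 20-series) does not.
    """
    major, _minor = compute_capability
    return major >= 8


if __name__ == "__main__":
    cmd = build_llama_server_cmd("models/model-120b.gguf")
    print(shlex.join(cmd))
    print("RTX 3060 Ti (8.6) has BF16:", has_native_bf16((8, 6)))
    # To actually start the server: import subprocess; subprocess.run(cmd, check=True)
```

A large `--n-gpu-layers` value offloads everything the GPU can hold, while `--cpu-moe` pins the expert weights to system RAM. The GPU therefore ends up holding mainly the BF16 attention and shared layers.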
