Efficient Performance of 120B Model on Minimal Hardware

zigzag312

· 3 min read · en · News

Score: 75 · Type: news · Sentiment: positive

Summary

The article describes running a 120B model efficiently on modest hardware by splitting the work: the expert (MoE) layers run on the CPU while the attention layers run on the GPU, which then needs only 5 to 8 GB of VRAM. It highlights the low GPU memory use and snappy performance of this setup, and recommends a GPU with BF16 support (RTX 3000 series or newer) for best results.
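As a rough illustration of the split described above, the following Python sketch assembles a llama.cpp server command line that keeps the expert weights on the CPU. Only `--cpu-moe` comes from the article; `llama-server`, `-m`, `-c`, and `--n-gpu-layers` are standard llama.cpp names, while the model path, context size, and layer count are placeholder assumptions rather than values from the source.

```python
import shlex


def build_llama_server_args(model_path, ctx_size=8192, gpu_layers=99, cpu_moe=True):
    """Build an argv list for llama-server.

    model_path, ctx_size and gpu_layers are illustrative placeholders.
    With cpu_moe=True the MoE expert weights stay on the CPU, while the
    remaining layers are offloaded to the GPU via --n-gpu-layers.
    """
    args = [
        "llama-server",
        "-m", model_path,                   # path to a GGUF model file (hypothetical)
        "-c", str(ctx_size),                # context size in tokens (assumed value)
        "--n-gpu-layers", str(gpu_layers),  # offload all non-expert layers
    ]
    if cpu_moe:
        args.append("--cpu-moe")            # option named in the article
    return args


if __name__ == "__main__":
    # Print a shell-safe command line; nothing is executed here.
    print(shlex.join(build_llama_server_args("model.gguf")))
```

Building the argument list separately from running it makes it easy to inspect the command before launching, and to toggle `--cpu-moe` when comparing configurations.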

Key quotes

· 4 pulled

The expert layers run amazing on CPU (25T/s on a 14900K [revised from ~17T/s]) and you can force that with this new llama-cpp option: --cpu-moe.

No giant MLP weights are resident on the GPU, so memory use stays low.

This yields an amazing snappy system for a 120B model! Even something like a 3060Ti would be amazing!

GPU with BF16 support would be best (RTX3000+) because all layers except the MOE layers (which are mxfp4) are BF16.
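To see why keeping the MoE weights off the GPU matters for memory, a back-of-the-envelope sketch of weight storage per format may help. It assumes MXFP4 stores 32 four-bit values plus one 8-bit shared scale per block (4.25 bits per weight) and that BF16 uses 16 bits per weight. The parameter counts in the usage section are made-up placeholders, not figures from the article, and the estimate ignores the KV cache and activations.

```python
# Bits per weight for each storage format (assumptions stated above).
MXFP4_BITS_PER_WEIGHT = (32 * 4 + 8) / 32   # 32 x 4-bit values + one 8-bit scale
BF16_BITS_PER_WEIGHT = 16.0


def weight_bytes(n_params, bits_per_weight):
    """Approximate storage in bytes for n_params weights."""
    return n_params * bits_per_weight / 8


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical split of a large MoE model; NOT the real model's numbers.
    expert_params = 100e9   # MoE expert weights, stored as MXFP4, kept on CPU
    other_params = 2e9      # attention and other weights, stored as BF16, on GPU

    cpu_gib = to_gib(weight_bytes(expert_params, MXFP4_BITS_PER_WEIGHT))
    gpu_gib = to_gib(weight_bytes(other_params, BF16_BITS_PER_WEIGHT))
    print(f"expert weights in system RAM: {cpu_gib:.1f} GiB")
    print(f"non-expert weights in VRAM:   {gpu_gib:.1f} GiB")
```

Under these assumptions the expert weights dominate the total, which is consistent with the quoted point that no giant MLP weights need to be resident on the GPU.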

Snippet from the RSS feed

Here is the thing, the expert layers run amazing on CPU (~~\~17T/s~~ 25T/s on a 14900K) and you can force that with this new llama-cpp option:...
