MacBook vs. Dedicated GPU for LLM Inference: Unified Memory Trade-Offs
By
mzubairtahir
Summary
A brief Hacker News comment comparing MacBooks (with unified memory) to dedicated GPUs for running LLMs. The author notes MacBooks can run larger models slowly due to high unified memory, while dedicated GPUs run smaller models faster due to limited VRAM.
Source
Key quotes
· 3 pulledMacBooks with their unified memory behave like a slow GPU with enormous amount of video RAM.
So you can run large smart models slowly.
Dedicated GPUs have less video RAM so can run smaller less smart models quickly.
You might also wanna read
LLMs on Intel Macs with AMD GPUs in MacOS is here
Guide to Calculating GPU Memory for Self-Hosted LLM Inference
The article provides a guide on calculating GPU memory requirements and managing concurrent requests for self-hosted large language model (L
The Critical Role of GPU Kernel Quality in Machine Learning System Performance
This article discusses the critical role of GPU kernel quality in machine learning system performance. It highlights that end-to-end speed i
Nvidia Has a Plan to Put Its Chips in Personal Computers
APEX4: Platform-Dependent W4A4 LLM Inference via Intra-SM Compute Rebalancing
This paper presents APEX4, a system for efficient W4A4 (4-bit weights, 4-bit activations) LLM inference that addresses the bottleneck of gro
University of Twente researchers find GPU clock adjustment can cut LLM training energy by 14% without speed loss
Researchers at the University of Twente have discovered that dynamically adjusting GPU clock frequency during LLM training can save up to 14
spectrum.ieee.org·24d ago
Comments
Sign in to join the conversation.
No comments yet. Be the first.