Understanding Modern GPU Architecture for Machine Learning: H100 and B200 Technical Analysis

By matt_d · 9mo ago · 65 min read

Summary

This article is a technical deep dive into modern GPU architecture, focusing on the NVIDIA GPUs used for machine learning, such as the H100 and B200. It describes a GPU as essentially a collection of compute cores specialized for matrix multiplication (Streaming Multiprocessors, or SMs) connected to high-bandwidth memory (HBM). It compares GPU architecture with Google's TPUs and covers how each chip works, how chips are networked together, and what that means for large language model training and performance.
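
To make the "compute cores connected to fast memory" picture concrete, here is a minimal Python sketch that treats a GPU as a set of SMs sharing one pool of HBM and estimates whether a matrix multiplication is limited by compute or by memory bandwidth. The class `GpuModel`, the function `matmul_time`, and all hardware numbers are illustrative assumptions (loosely in the range of publicly quoted H100-class figures), not values taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuModel:
    """Toy model (hypothetical): a GPU as N SMs sharing one pool of HBM."""
    num_sms: int            # number of Streaming Multiprocessors (assumed)
    flops_per_sm: float     # peak matmul FLOP/s per SM (assumed)
    hbm_bandwidth: float    # HBM bandwidth in bytes/s (assumed)

    @property
    def peak_flops(self) -> float:
        # Aggregate peak throughput across all SMs.
        return self.num_sms * self.flops_per_sm

    @property
    def critical_intensity(self) -> float:
        # FLOPs per byte above which a kernel is compute-bound in this model.
        return self.peak_flops / self.hbm_bandwidth


def matmul_time(gpu: GpuModel, m: int, k: int, n: int, bytes_per_elem: int = 2):
    """Lower-bound time for an (m x k) @ (k x n) matmul, plus the limiting resource.

    Assumes each operand and the output cross the HBM interface exactly once
    and that compute and memory traffic overlap perfectly.
    """
    flops = 2 * m * k * n                                # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    t_compute = flops / gpu.peak_flops
    t_memory = bytes_moved / gpu.hbm_bandwidth
    bound = "compute" if t_compute >= t_memory else "memory"
    return max(t_compute, t_memory), bound


# Illustrative numbers only; treat every value here as an assumption.
gpu = GpuModel(num_sms=132, flops_per_sm=7.5e12, hbm_bandwidth=3.35e12)

if __name__ == "__main__":
    for shape in [(8192, 8192, 8192), (1, 8192, 8192)]:
        seconds, bound = matmul_time(gpu, *shape)
        print(shape, f"{seconds:.2e} s", f"{bound}-bound")
```

Under these assumed numbers, a large square matmul comes out compute-bound, while a single-row (batch size 1) matmul against the same weight matrix comes out memory-bound, which is the kind of trade-off that matters for LLM training versus inference.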

Key quotes

A modern ML GPU (e.g. H100, B200) is basically a bunch of compute cores that specialize in matrix multiplication (called Streaming Multiprocessors or SMs) connected to a stick of fast memory (called HBM)
Each SM, like a TPU's Tensor Core, has...
This chapter takes a deep dive into the world of NVIDIA GPUs – how each chip works, how they're networked together, and what that means for LLMs, especially compared to TPUs
Snippet from the RSS feed
We love TPUs at Google, but GPUs are great too. This chapter takes a deep dive into the world of NVIDIA GPUs – how each chip works, how they’re networked together, and what that means for LLMs, especially compared to TPUs. This section builds on