All Topics
All Topics
Technology
Technology
AI
AI
Business
Business
Entertainment
Entertainment
News
News
Programming
Programming
Security
Security
Science
Science
Design
Design
Environment
Environment
Finance
Finance
Crypto
Crypto
Politics
Politics
Sports
Sports
Education
Education
Gaming
Gaming
Art
Art
Music
Music
Health
Health
Books
Books
Food
Food
Travel
Travel
Personal
Personal
Bluesky
Twitter

Meta announces two 24k GPU clusters for Llama 3 training and AI infrastructure

By

By Kevin Lee, Adi Gangidi, Mathew Oldham

2h ago· 11 min readen

Summary

Meta is announcing two large-scale 24k GPU clusters built for training AI models, specifically Llama 3. The post details the hardware (Grand Teton, OpenRack), network, storage, design, performance, and software stack (PyTorch) used to achieve high throughput and reliability. Meta emphasizes its commitment to open compute and open source, positioning this as a key step in its ambitious AI infrastructure roadmap through 2024.

Source

Twitter / XMeta announces two 24k GPU clusters for Llama 3 training and AI infrastructureengineering.fb.com

Key quotes

· 5 pulled
Marking a major investment in Meta's AI future, we are announcing two 24k GPU clusters.
We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads.
We are strongly committed to open compute and open source.
We built these clusters on top of Grand Teton, OpenRack, and PyTorch and continue to push open innovation across the industry.
This announcement is one step in our ambitious infrastructure roadmap.
Snippet from the RSS feed
Marking a major investment in Meta’s AI future, we are announcing two 24k GPU clusters. We are sharing details on the hardware, network, storage, design, performance, and software that help us extr…

You might also wanna read

Comments

Sign in to join the conversation.

No comments yet. Be the first.