Exploring the Engineering Behind ChatGPT's Scalability for 700M Users

By superasn · 9mo ago · 1 min read · en · News

Summary

The post contrasts the difficulty of running a single GPT-4-class model locally with ChatGPT's ability to serve roughly 700 million weekly users. It asks which engineering optimizations, such as model sharding, custom hardware, and load balancing, make that scale possible while keeping latency low.
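
To make the VRAM complaint and the idea of model sharding concrete, here is a rough back-of-the-envelope sketch. Everything numeric in it is an assumption for illustration: the 1-trillion-parameter count is hypothetical (no parameter count is given in the post), the helper names `weight_memory_gb` and `min_shards` are made up, and the estimate covers weights only, ignoring the KV cache, activations, and any serving overhead.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_shards(num_params: float, bytes_per_param: float,
               gpu_memory_gb: float, usable_fraction: float = 0.8) -> int:
    """Smallest number of equal shards so each shard's weights fit on one GPU.

    usable_fraction is an assumed share of GPU memory left for weights
    after reserving room for everything else.
    """
    usable_gb = gpu_memory_gb * usable_fraction
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / usable_gb)


if __name__ == "__main__":
    hypothetical_params = 1e12  # assumption, not a published figure
    fp16_bytes = 2              # 16-bit weights

    total = weight_memory_gb(hypothetical_params, fp16_bytes)
    print(f"weights alone: {total:.0f} GB")
    print("shards on 80 GB GPUs:", min_shards(hypothetical_params, fp16_bytes, 80))
    print("shards on 24 GB GPUs:", min_shards(hypothetical_params, fp16_bytes, 24))
```

Under these assumptions the weights alone exceed any single consumer GPU by a wide margin, which is why splitting a model across many devices comes up as one of the candidate answers.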

Key quotes

Sam said yesterday that chatgpt handles ~700M weekly users.
Meanwhile, I can't even run a single GPT-4-class model locally without insane VRAM or painfully slow speeds.
What engineering tricks make this possible at such massive scale while keeping latency low?