How Modal built ultra-low-latency serverless routing with Pingora, Envoy, and Spanner

2d ago · 13 min read · en · Insight

Summary

Modal introduces "Servers", a new ultra-low-latency primitive for running HTTP, WebSocket, and gRPC workloads on its serverless platform. The article is a technical deep dive into how Modal built the routing layer for Servers using Pingora (Cloudflare's Rust-based proxy framework), Envoy, and Google Cloud Spanner. It covers the architectural decisions, trade-offs, and design patterns behind achieving sub-millisecond routing overhead for latency-sensitive applications such as LLM inference for interactive agents.

Source

Twitter / X · How Modal built ultra-low-latency serverless routing with Pingora, Envoy, and Spanner · modal.com

Key quotes

Servers are designed for applications where every millisecond counts, like LLM inference for interactive agents.
Servers give you a regionalized, autoscaling pool of HTTP server replicas behind Modal's routing layer, with the deployment ergonomics, fast feedback loops, and autoscaling we consider table stakes (for humans and for agents).
Now, you can also run ultra-low-latency Servers on Modal for HTTP, WebSocket, and gRPC traffic.
Snippet from the RSS feed
A deep dive inside our new ultra-low-latency primitive.
