How Modal built ultra-low-latency serverless routing with Pingora, Envoy, and Spanner
Summary
Modal introduces "Servers", a new ultra-low-latency primitive for running HTTP, WebSocket, and gRPC workloads on its serverless platform. The article is a technical deep dive into how Modal built the routing layer for Servers using Pingora (Cloudflare's Rust-based proxy framework), Envoy, and Google Cloud Spanner. It covers the architectural decisions, trade-offs, and design patterns behind achieving sub-millisecond routing overhead for latency-sensitive applications such as LLM inference for interactive agents.
Key quotes
Servers are designed for applications where every millisecond counts, like LLM inference for interactive agents.
Servers give you a regionalized, autoscaling pool of HTTP server replicas behind Modal's routing layer, with the deployment ergonomics, fast feedback loops, and autoscaling we consider table stakes (for humans and for agents).
Now, you can also run ultra-low-latency Servers on Modal for HTTP, WebSocket, and gRPC traffic.
