How Modal built ultra-low-latency serverless routing with Pingora, Envoy, and Spanner

2d ago · 13 min read · en · Insight

technology, programming, systems architecture, cloud infrastructure

Summary

Modal introduces "Servers", a new ultra-low-latency primitive for running HTTP, WebSocket, and gRPC workloads on its serverless platform. The article is a technical deep dive into how Modal built the routing layer for Servers using Pingora (Cloudflare's Rust-based proxy framework), Envoy, and Google Cloud Spanner. It covers the architectural decisions, trade-offs, and design patterns behind achieving sub-millisecond routing overhead for latency-sensitive applications such as LLM inference for interactive agents.

Source

Twitter / X · How Modal built ultra-low-latency serverless routing with Pingora, Envoy, and Spanner · modal.com

Key quotes

Servers are designed for applications where every millisecond counts, like LLM inference for interactive agents.

Servers give you a regionalized, autoscaling pool of HTTP server replicas behind Modal's routing layer, with the deployment ergonomics, fast feedback loops, and autoscaling we consider table stakes (for humans and for agents).

Now, you can also run ultra-low-latency Servers on Modal for HTTP, WebSocket, and gRPC traffic.

Snippet from the RSS feed

A deep dive inside our new ultra-low-latency primitive.
