Debugging Envoy Load Balancer Latency with eBPF Zero-Code Instrumentation
By sergiocipriano
Summary
The article describes a technical solution for debugging an Envoy Network Load Balancer using eBPF (Extended Berkeley Packet Filter) for zero-code instrumentation. The author faced HTTP 499 errors caused by latency in their cloud infrastructure, and Envoy's built-in options were insufficient for identifying bottlenecks. The solution involves using eBPF to instrument the TCP proxy without modifying code, allowing for detailed latency analysis and troubleshooting of network performance issues in a Load Balancer as a Service (LBaaS) environment.
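The latency analysis mentioned above can be pictured as a small post-processing step over timestamped probe events. The following is only a minimal sketch: the event tuple layout, the stage names `accepted` and `upstream_connected`, and the 10 ms threshold are all hypothetical and are not taken from the article, Envoy, or any eBPF tool.

```python
from collections import defaultdict

# Hypothetical probe events: (connection_id, stage, timestamp_ns).
# The stage names are illustrative only, not real Envoy or eBPF identifiers.
EVENTS = [
    (1, "accepted", 1_000_000),
    (1, "upstream_connected", 3_000_000),
    (2, "accepted", 2_000_000),
    (2, "upstream_connected", 52_000_000),
    (3, "accepted", 5_000_000),
    (3, "upstream_connected", 6_500_000),
]


def stage_latencies_ms(events, start="accepted", end="upstream_connected"):
    """Return {connection_id: milliseconds elapsed between two stages}."""
    seen = defaultdict(dict)
    for conn_id, stage, ts_ns in events:
        seen[conn_id][stage] = ts_ns
    # Connections missing either stage are skipped rather than guessed at.
    return {
        conn_id: (stages[end] - stages[start]) / 1e6
        for conn_id, stages in seen.items()
        if start in stages and end in stages
    }


def slow_connections(latencies, threshold_ms):
    """Return the sorted connection ids whose latency exceeds the threshold."""
    return sorted(c for c, ms in latencies.items() if ms > threshold_ms)


latencies = stage_latencies_ms(EVENTS)
print(latencies)
print(slow_connections(latencies, threshold_ms=10.0))
```

Pairing events per connection, rather than averaging across all of them, keeps a rare slow connection visible, which matters when only a small number of requests end in HTTP 499.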
Key quotes
We were seeing a small number of HTTP 499 errors caused by latency somewhere in our cloud, but it wasn't clear what the bottleneck was.
My team is responsible for the LBaaS product (Load Balancer as a Service) and, of course, we are the first suspects when this kind of problem appears.
Before going for the current solution, I read a lot of Envoy's documentation.
The options Envoy provides just weren't enough.
As a result, each team had to set up additional instrumentation to catch latency spikes and figure out what was going on.
