DigitalOcean Managed PostgreSQL Update Breaks Private VPC Connectivity to Kubernetes
By neilfrndes
Summary
A DigitalOcean customer experienced a production outage when a managed PostgreSQL update broke private VPC connectivity to their managed Kubernetes service. The root cause was a Cilium bug (#34503) where ARP entries become stale after infrastructure changes. While DigitalOcean support responded within 12 hours, their temporary fix involved deploying a DaemonSet from a random GitHub user to ping stale ARP entries every 10 seconds. The upstream Cilium fix is merged but not yet deployed to DigitalOcean Kubernetes Service (DOKS), with no ETA provided. The customer, a small startup, chose managed services specifically to avoid operational emergencies but still experienced downtime.
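The workaround described above, pinging a set of addresses on a fixed interval so their neighbor entries keep getting refreshed, can be sketched roughly as follows. This is only an illustration of the general idea, not the DaemonSet that support recommended. The source does not show that script, so the function names, the target addresses, and the `ping` flags (Linux iputils) are all assumptions.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Optional


def ping_once(ip: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo request with the system `ping` binary.

    Assumes Linux iputils flags: -c (count) and -W (timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def refresh_neighbors(
    targets: Iterable[str],
    ping: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Ping every target once and return the ones that did not reply."""
    return [ip for ip in targets if not ping(ip)]


def run(
    targets: List[str],
    interval_s: float = 10.0,          # the 10-second interval from the summary
    rounds: Optional[int] = None,      # None means loop forever
    ping: Callable[[str], bool] = ping_once,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Repeat refresh_neighbors, sleeping interval_s between rounds."""
    done = 0
    while rounds is None or done < rounds:
        unreachable = refresh_neighbors(targets, ping)
        if unreachable:
            print("no reply from: " + ", ".join(unreachable))
        done += 1
        if rounds is None or done < rounds:
            sleep(interval_s)
```

In a cluster, something like `run(["10.0.0.5"])` (a placeholder address) would have to run on every node, which is presumably why the workaround ships as a DaemonSet. The `ping` and `sleep` parameters are injectable only so the loop can be exercised without touching the network.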
Key quotes
Yesterday my production app went down. The cause? DigitalOcean's managed PostgreSQL update broke private VPC connectivity to their managed Kubernetes.
Root cause: a Cilium bug (#34503) where ARP entries go stale after infrastructure changes.
Their fix? Deploy a DaemonSet from a random GitHub user to ping stale ARP entries every 10 seconds.
The upstream Cilium fix is merged but not yet deployed to DOKS. No ETA.
I chose managed services specifically to avoid ops emergencies. We're a tiny startup.
