Using OR-Tools CP-SAT to Optimize Cloud Infrastructure Maintenance Scheduling at Akamai
By akutlay
Summary
The article discusses using Google's OR-Tools CP-SAT solver to optimize maintenance scheduling in Akamai's cloud infrastructure. It addresses the complex problem of scheduling disruptive maintenance on hypervisor hosts serving hundreds of thousands of guest VMs, balancing competing constraints like capacity, customer disruption SLAs, and concurrency limits across hosts, racks, and datacenters. The author compares various optimization tools including commercial and open-source MIP solvers before settling on OR-Tools CP-SAT for prototyping solutions.
Key quotes
I've been working on improving how we schedule maintenance in Akamai's cloud infrastructure, especially disruptive maintenance on hypervisor hosts serving hundreds of thousands of guest VMs.
The problem is fairly complex, with competing constraints such as capacity, customer disruption SLAs, and concurrency limits across hosts, racks, and datacenters.
After exploring different options, I found Google's OR-Tools library, particularly its CP-SAT solver.
