Cedana: AI/HPC GPU Checkpointing Startup Seeks Forward Deployed Engineer
By neelm
Summary
Cedana is a Y Combinator-backed startup that provides automated GPU checkpointing infrastructure to maximize AI and HPC cluster utilization and reliability. Their solution enables transparent migration of GPU workloads across instances without losing work, operating at the kernel/OS level with no code changes required. The company is hiring a Forward Deployed Engineer to lead customer integrations, deploy into SLURM, Kubernetes, and Dynamo environments, and drive product innovation from the field. The role requires 3-10 years of software engineering experience with SLURM deployment expertise, strong Linux fundamentals, and Kubernetes operations knowledge. The position is remote US-based with ~25% travel and offers $140K-$180K base salary plus equity.
Key quotes
Cedana maximizes AI+HPC cluster utilization and reliability with automated GPU checkpointing infrastructure.
We enable transparent and fast migration of GPU workloads across instances, without losing work.
Our system is at the kernel/OS level, requiring no code or config changes, and works seamlessly with Kubernetes, SLURM, and NVIDIA Dynamo.
This role will expose you to the cutting edge of AI and HPC infrastructure, working with the world's leading research and commercial customers to deliver a breakthrough solution.
Cedana's founding team has spent over a decade making computation run fast, productively, and reliably for AI.
