Platform Engineer - Benchmark Lead Position at ARC Prize Foundation for AI Benchmark Development
By gkamradt_
Summary
The ARC Prize Foundation is hiring a Platform Engineer - Benchmark Lead to own and evolve the platform behind their ARC-AGI series of AI benchmarks. This senior engineering role involves stabilizing the current benchmark infrastructure, building verification and testing layers, supporting early implementation of ARC-AGI-4, and setting the technical foundation for ARC-AGI-5. The position requires strong backend engineering skills with Python, distributed systems experience, and expertise in building evaluation harnesses and testing pipelines for AI/ML systems.
Key quotes
AI benchmarks that measure general intelligence and inspire new ideas
A senior engineer to own and evolve the platform behind ARC-AGI series of benchmarks
Stabilize and extend the V3 backend and infrastructure - Own performance to keep the current benchmark platform reliable
Strong backend engineering with Python, plus distributed systems, SQL, cloud infrastructure, and production reliability experience
Senior enough to act as a technical owner and architect of the benchmark platform (we have a high agency team)
You might also wanna read
Ndea Hiring Technical Staff for AGI Search Guidance Research and Engineering
Ndea is hiring technical staff for a full-time remote position focused on building AGI systems with search guidance. The role involves hands
NVIDIA Announces "Hack for Impact" London Event for Autonomous AI Agent Development
NVIDIA is hosting a "Hack for Impact" event in London, challenging participants to build autonomous agentic applications using open-source m
MerLean-Prover: A Recursive Agent Harness for Lean 4 Theorem Proving Outperforms Baselines
MerLean-Prover is an end-to-end Lean4 theorem prover that replaces 'sorry' declarations with kernel-checkable proofs using three agent types
Reflections on DwarfStar 4's rapid rise in local AI inference
The author reflects on the unexpected popularity of DwarfStar 4 (DS4), a local AI inference project. They attribute its success to the conve
Building a Personal AI Agent with Markdown-Based Skills and Local Models
The article describes a personal AI agent built on Pi that manages the author's inbox, calendar, deal pipeline, blog publishing, and researc
