Scaling Karpathy's Autoresearch: Parallel GPU Processing Enables New AI Experimentation Strategies

hopechong

2mo ago · 11 min read · en · Insight

Summary

The article describes an experiment that scaled Andrej Karpathy's autoresearch system by giving it access to 16 GPUs on a Kubernetes cluster instead of running one experiment at a time. Over 8 hours, the AI agent submitted approximately 910 experiments and found that scaling model width mattered more than any single hyperparameter. It autonomously learned to screen ideas on H100 GPUs and validate them on H200s, and it lowered validation bits per byte (val_bpb) from 1.003 to 0.974, a 2.87% improvement over baseline. The key insight is that parallelism changed the agent's search strategy: with one GPU it is limited to greedy hill-climbing, while with 16 GPUs it ran factorial grids of 10-13 experiments per wave, which catch interactions between hyperparameters that sequential execution would miss.
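The difference between the two search strategies can be illustrated with a toy sketch. This is not the article's code: `toy_loss`, the `width` and `lr` settings, and their values are all hypothetical, chosen so that a gain appears only when two settings change together. Under that assumption, a one-change-at-a-time search never accepts either change, while a single factorial wave finds the combination.

```python
from itertools import product

# Hypothetical toy objective (lower is better). It is NOT the article's model:
# the gain exists only when width and lr change together.
def toy_loss(width, lr):
    if width == 3 and lr == 4:
        return 0.95  # joint change pays off
    if width == 1 and lr == 1:
        return 1.00  # baseline
    return 1.01      # any other change looks slightly worse

def greedy_hill_climb(start, options):
    """Sequential search: change one setting, keep it only if loss improves."""
    best, best_loss = dict(start), toy_loss(**start)
    improved = True
    while improved:
        improved = False
        for name, values in options.items():
            for value in values:
                trial = dict(best, **{name: value})
                loss = toy_loss(**trial)
                if loss < best_loss:
                    best, best_loss, improved = trial, loss, True
    return best, best_loss

def factorial_wave(options):
    """Parallel-style search: evaluate every combination in one wave."""
    names = list(options)
    grid = [dict(zip(names, combo)) for combo in product(*options.values())]
    results = [(toy_loss(**cfg), cfg) for cfg in grid]  # one job per config
    loss, cfg = min(results, key=lambda r: r[0])
    return cfg, loss, len(grid)

options = {"width": [1, 2, 3], "lr": [1, 2, 3, 4]}  # 3 x 4 = 12 experiments
greedy_cfg, greedy_loss = greedy_hill_climb({"width": 1, "lr": 1}, options)
grid_cfg, grid_loss, n_jobs = factorial_wave(options)
print(greedy_cfg, greedy_loss)
print(grid_cfg, grid_loss, n_jobs)
```

The grid costs 12 evaluations in one wave, which is only practical when the jobs can run side by side; on a single GPU the same grid would take 12 sequential runs.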

Key quotes

Over 8 hours it submitted ~910 experiments, found that scaling model width mattered more than any single hyperparameter

taught itself to use H200s for validation while screening ideas on H100s

drove val_bpb from 1.003 down to 0.974 - a 2.87% improvement over baseline

With one GPU, it's stuck doing greedy hill-climbing - try one thing, check, repeat

With 16 GPUs, it ran factorial grids of 10-13 experiments per wave, catching interactions
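As a quick check on the val_bpb figures quoted above, the relative improvement can be recomputed from the two reported endpoints. The helper name `relative_improvement_pct` is made up for this sketch. Using the rounded values 1.003 and 0.974 gives about 2.89%, slightly above the quoted 2.87%; the gap is presumably due to the endpoints being rounded for display, which is an assumption rather than something the source states.

```python
def relative_improvement_pct(baseline, final):
    """Percent reduction of a lower-is-better metric relative to its baseline."""
    return (baseline - final) / baseline * 100

# Rounded endpoints as quoted; the unrounded values are not given in the source.
pct = relative_improvement_pct(1.003, 0.974)
print(f"{pct:.2f}%")
```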

Snippet from the RSS feed

Karpathy's autoresearch runs one experiment at a time. We gave it access to our GPU infra and let it run experiments in parallel.
