Karpathy Autoresearch: 700 Experiments Rewire AI Research
Karpathy's autoresearch project used an AI agent to autonomously run ~700 experiments on a small language model training setup, finding ~20 tweaks that, transferred to a larger model, cut its training time by ~11%. This shifts the research bottleneck from experiment execution to experiment design and hypothesis curation.
Why it matters
This project demonstrates how AI can automate the tedious parts of the research process, shifting the bottleneck to higher-level tasks like defining the right objectives and constraints.
Key Points
- Karpathy's autoresearch used an AI agent to run ~700 autonomous experiments on a language model training setup
- The agent found ~20 tweaks that cut a larger model's training time by ~11%
- This shifts the research bottleneck from experiment execution to experiment design and hypothesis curation
- Automated experimentation can also overfit the validation set, producing brittle gains that don't generalize
Details
Karpathy's autoresearch project took a small language model training setup, defined goals and constraints, and handed it to an AI agent that could propose code edits, run 5-minute training runs, and keep edits that improved the validation metric. Over ~2 days, this produced around 700 autonomous changes and about 20 additive improvements that transferred to a larger model, cutting its training time by ~11%. The key insight is that experiment execution is now cheap and automatic, so the hard problems become evaluation design and hypothesis curation. This favors organizations that control compute, benchmarks, and research direction, while also making it easier to generate metric overfitting and brittle gains at scale.
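The loop described here — propose a code edit, run a short training job, keep the edit only if the validation metric improves — is essentially greedy hill climbing over the training setup. A minimal sketch of that loop, with all function names and the toy loss function invented purely for illustration (this is not Karpathy's actual code):

```python
import random

def autoresearch_loop(run_training, propose_edit, budget, rng):
    """Greedy accept-if-improved loop over candidate edits.

    run_training(edits) -> validation loss for the current set of kept edits
                           (stands in for a cheap ~5-minute proxy run).
    propose_edit(rng)   -> one candidate change to the training setup.
    """
    kept = []
    best = run_training(kept)          # baseline run with no edits
    for _ in range(budget):
        candidate = propose_edit(rng)  # agent proposes a change
        loss = run_training(kept + [candidate])
        if loss < best:                # lower validation loss = improvement
            kept.append(candidate)
            best = loss
        # otherwise the edit is discarded (reverted)
    return kept, best

# Toy stand-in for the training run: each "edit" is a signed delta on
# validation loss, and most random proposals make things worse.
def toy_run(edits):
    return 3.0 + sum(edits)

kept, best = autoresearch_loop(
    toy_run,
    lambda rng: rng.uniform(-0.002, 0.05),  # helpful edits are rare
    budget=700,
    rng=random.Random(0),
)
print(f"kept {len(kept)} of 700 edits, final loss {best:.3f}")
```

Because every accepted edit is judged against the same validation metric, the loop can just as easily accumulate changes that overfit that metric, which is exactly the brittleness risk the article flags.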