Running Karpathy's Autoresearch with Local LLM — Zero API Cost Autonomous AI Research
This article describes a fork of Andrej Karpathy's 'autoresearch' experiment that replaces the cloud-based Claude API with a local 9B-parameter LLM, so the entire autonomous research loop runs on one machine.
Why it matters
This approach enables fully autonomous AI research on a single GPU with zero API costs, making it more accessible for individual researchers and hobbyists.
Key Points
- Runs Qwen 3.5 LLM locally via ollama alongside GPT training on the same GPU
- Adjusts hyperparameters to fit within 48GB VRAM constraints
- Autonomous research loop: LLM proposes code changes, runs 5-minute experiments, keeps improvements
- Elegant code extraction pipeline using regex and ast.parse() for syntax validation
Details
The key innovation in this fork is running both the LLM agent and the GPT training on the same GPU, fitting within 48GB VRAM constraints by reducing model depth, batch size, and total batch tokens. The autonomous research loop has the LLM propose specific code modifications to the 'train.py' script; each change is validated for syntax and then executed in a 5-minute experiment. If the validation loss (val_bpb, bits per byte) improves, the change is kept; otherwise it is discarded. A failsafe resets to the baseline after 3 consecutive crashes. The agent code is compact (250 lines) and includes pipelines for LLM interaction, Git operations, experiment execution, and results logging.