Running Karpathy's Autoresearch with Local LLM — Zero API Cost Autonomous AI Research
This article describes a fork of Andrej Karpathy's 'autoresearch' experiment that replaces the cloud-based Claude API with a local 9B-parameter LLM, so the entire autonomous research loop runs on one machine.
Why it matters
This approach enables fully autonomous AI research on a single GPU with zero API costs, making it more accessible for individual researchers and hobbyists.
Key Points
- Runs Qwen 3.5 LLM locally via ollama alongside GPT training on the same GPU
- Adjusts hyperparameters to fit within 48GB VRAM constraints
- Autonomous research loop: LLM proposes code changes, runs 5-minute experiments, keeps improvements
- Elegant code extraction pipeline using regex and ast.parse() for syntax validation
Details
The key innovation in this fork is running both the LLM agent and the GPT training on the same GPU, fitting within 48GB VRAM constraints by reducing model depth, batch size, and total batch tokens. The autonomous research loop has the LLM propose specific code modifications to the 'train.py' script; each change is validated for syntax and then executed in a 5-minute experiment. If the validation loss (val_bpb, bits per byte) improves, the change is kept; otherwise it is discarded. A failsafe resets to the baseline after 3 consecutive crashes. The agent code is compact (250 lines) and includes pipelines for LLM interaction, Git operations, experiment execution, and results logging.