Training GPT-2 Style Model on Vast.ai for $10
The article describes the author's journey into machine learning and AI, including their experience with the fast.ai course and Andrej Karpathy's videos. They built a training pipeline for the GPT-2 architecture and used Vast.ai to train a 124M-parameter model for $10. They then fine-tuned the model on the Alpaca dataset and made it available on Hugging Face.
Why it matters
The article shows how far a self-directed learner can get on a small budget: building a custom training pipeline, pretraining a 124M-parameter model on rented GPUs for about $10, and experimenting with architectural changes to improve performance.
Key Points
- Pivoted from fast.ai to Karpathy's videos to learn ML/AI
- Built a training pipeline for the GPT-2 architecture on Vast.ai
- Trained a 124M-parameter model for $10 on the fineweb-edu dataset
- Fine-tuned the model on the Alpaca dataset to create a chatbot
- Experimented with adding a causal 1D convolution layer before attention
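The pipeline's core is a GPT-2-style next-token training loop. A minimal sketch of that loop is below; the stand-in model (an embedding plus a linear head instead of real transformer blocks), the hyperparameters, and the random data are illustrative assumptions, not the author's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
vocab_size, dim, seq_len, batch = 128, 32, 16, 4

# Stand-in model: a real run would use GPT-2 blocks between these layers.
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),
    nn.Linear(dim, vocab_size),
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Next-token objective: inputs are tokens [0..T-1], targets are tokens [1..T].
tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))
inputs, targets = tokens[:, :-1], tokens[:, 1:]

first_loss = None
for step in range(20):
    logits = model(inputs)  # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()
final_loss = loss.item()
```

The shift between `inputs` and `targets` is what makes it a language-modeling objective: at each position the model is scored on predicting the following token.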
Details
The author started with the fast.ai course but was frustrated by its teaching style and outdated libraries, so they pivoted to Andrej Karpathy's 'Zero to Hero' playlist, following it closely and pausing to look things up and ask LLMs about parts they didn't understand. That work grew into the 'rudyon/pipeline' repository, which began as a simple GPT-2 training loop and evolved into a full pipeline for training on rented GPU instances from services like Vast.ai.

Using a 2x4090S Ti instance on Vast.ai, the author pretrained 'rudyon/rudygpt', a 124M-parameter model, on the fineweb-edu dataset; roughly 19 hours of training cost about $10. They then fine-tuned the model on the Alpaca dataset to produce the 'rudyon/rudygpt-instruct' chatbot. They also experimented with inserting a causal 1D convolution layer before the attention mechanism, which surprisingly improved the model's performance at a depth of 4.
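The causal-convolution experiment can be sketched as follows. "Causal" means the convolution is left-padded so position t only mixes information from positions ≤ t, preserving the autoregressive property attention relies on. The layer below is a plausible reconstruction, not the author's actual implementation; the kernel size and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d that left-pads so output at position t never sees positions > t."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(dim, dim, kernel_size)

    def forward(self, x):
        # x: (batch, seq_len, dim); Conv1d expects (batch, dim, seq_len)
        x = x.transpose(1, 2)
        x = F.pad(x, (self.left_pad, 0))  # pad only the past side
        return self.conv(x).transpose(1, 2)

torch.manual_seed(0)
layer = CausalConv1d(dim=8, kernel_size=3)
x = torch.randn(2, 10, 8)
out = layer(x)
```

In the author's setup this layer would sit just before the attention block inside each transformer layer, so tokens get a short local mixing step before global attention.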
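For the instruction fine-tuning step, Alpaca records (instruction, optional input, output) are typically flattened into a single training string. The article does not say which template the author used; the function below assumes the standard Alpaca template.

```python
# Hypothetical formatter using the standard Alpaca prompt template;
# the author's actual template is not given in the article.
def format_alpaca(example: dict) -> str:
    """Turn one Alpaca record into a single training string."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )

prompt = format_alpaca({"instruction": "Say hello.", "input": "", "output": "Hello!"})
```

Fine-tuning on strings like this is what turns a raw next-token predictor into a chatbot: at inference time the model is prompted up to "### Response:" and generates the rest.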