Reverse-Engineering Hermes 4's Training Stack
This article examines how Nous Research trained Hermes 4, a large language model, focusing on two key components of the training stack: DataForge, a graph-based synthetic data generator, and Atropos, a rejection-sampling and verification framework.
Why it matters
Nous' approach to Hermes 4 shows how a large language model can be post-trained at scale on synthetic data filtered by rejection sampling rather than online reinforcement learning, trading some peak accuracy for scalability and reliability.
Key Points
1. Nous built a composable synthetic data graph (DataForge) to generate diverse training data.
2. Atropos uses rejection sampling with ~1,000 task-specific verifiers, not online RL.
3. A second-stage SFT pass hard-caps reasoning traces at 30,000 tokens to reduce overlong outputs.
Details
Nous Research used DataForge, a synthetic data generator built on a directed acyclic graph of composable transformations, to create roughly 5 million training samples spanning distributions that would be difficult to design by hand. This scaled the training set to about 5x the sample count of Hermes 3, with a roughly 50x increase in total tokens.

Rather than relying on online reinforcement learning, Nous filtered the generated samples with Atropos, its rejection-sampling framework, which applies around 1,000 task-specific verifiers. Because each verifier checks a concrete, automatically checkable behavior, this approach scales more easily than online RL and covers a broader range of desired behaviors, such as structured output and tool use.

Finally, a second-stage SFT (supervised fine-tuning) pass hard-caps reasoning traces at 30,000 tokens. This reduced overlong outputs by 78.4%, at a cost of 4.7-12.7% accuracy depending on the benchmark.
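To make the rejection-sampling idea concrete, here is a minimal, hypothetical sketch of verifier-gated sampling in the spirit of Atropos as described above. The verifier functions, the sampler, and the toy generator are all illustrative stand-ins, not Nous Research's actual implementation or API.

```python
import json

def verify_nonempty(sample: str) -> bool:
    """Toy verifier: reject empty generations."""
    return len(sample.strip()) > 0

def verify_json_structure(sample: str) -> bool:
    """Toy verifier for structured output: the sample must parse as JSON."""
    try:
        json.loads(sample)
        return True
    except ValueError:
        return False

def rejection_sample(generate, verifiers, max_attempts=8):
    """Draw candidates until one passes every verifier, else give up."""
    for _ in range(max_attempts):
        candidate = generate()
        if all(v(candidate) for v in verifiers):
            return candidate
    return None  # prompt yields no usable sample; drop it from the dataset

# Deterministic stand-in generator: two failing candidates, then a good one.
candidates = iter(['not json', '', '{"answer": 42}'])
kept = rejection_sample(lambda: next(candidates),
                        [verify_nonempty, verify_json_structure])
print(kept)  # → {"answer": 42}
```

The appeal of this pattern is that each verifier is cheap, task-specific, and composable: adding coverage for a new behavior means writing one more check, not redesigning a reward model.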
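The 30,000-token hard cap can be sketched as a simple preprocessing step over reasoning traces. This is an assumption-laden illustration: the whitespace split stands in for a real tokenizer, and the `</think>` terminator is a hypothetical marker, not a confirmed detail of Nous' pipeline.

```python
MAX_REASONING_TOKENS = 30_000  # cap reported in the article

def cap_trace(trace: str, cap: int = MAX_REASONING_TOKENS) -> str:
    """Truncate an overlong reasoning trace and close it with a terminator.

    Whitespace tokenization is a stand-in for the model's real tokenizer.
    """
    tokens = trace.split()
    if len(tokens) <= cap:
        return trace
    # Hypothetical: end the truncated trace so the model learns to stop here.
    return " ".join(tokens[:cap]) + " </think>"

short = cap_trace("a brief chain of thought")
long = cap_trace("step " * 40_000)
print(len(long.split()))  # → 30001 (30,000 trace tokens plus the terminator)
```

Training a second SFT pass on traces capped this way is what, per the article, cut overlong outputs by 78.4% at a measurable accuracy cost.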