Reverse-Engineering Hermes 4's Training Stack
This article examines how Nous Research trained Hermes 4, a large language model, focusing on two key components of the training stack: DataForge, a graph-based synthetic data generator, and Atropos, a rejection-sampling and verification framework.
Why it matters
Nous' approach to Hermes 4 shows how a large language model can be post-trained at scale on synthetic data filtered by rejection sampling rather than online reinforcement learning, trading some peak accuracy for scalability and reliability.
Key Points
1. Nous built a composable synthetic data graph (DataForge) to generate diverse training data.
2. Atropos uses rejection sampling with ~1,000 task-specific verifiers, not online RL.
3. A second-stage SFT pass hard-caps reasoning traces at 30,000 tokens to reduce overlong outputs.
Details
Nous Research used DataForge, a synthetic data generator built on a directed acyclic graph of composable transformations, to create roughly 5 million training samples spanning distributions that would be difficult to design by hand. This scaled the training set to about 5x the sample count of Hermes 3, with a roughly 50x increase in total tokens.

Rather than relying on online reinforcement learning, Nous filtered the generated samples with Atropos, its rejection-sampling framework, which applies around 1,000 task-specific verifiers. Because each verifier checks a concrete, automatically checkable behavior, this approach scales more easily than online RL and covers a broader range of desired behaviors, such as structured output and tool use.

Finally, a second-stage SFT (supervised fine-tuning) pass hard-caps reasoning traces at 30,000 tokens. This reduced overlong outputs by 78.4%, at a cost of 4.7-12.7% accuracy depending on the benchmark.
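To make the rejection-sampling idea concrete, here is a minimal, hypothetical sketch of verifier-gated sampling in the spirit of Atropos as described above. The verifier functions, the sampler, and the toy generator are all illustrative stand-ins, not Nous Research's actual implementation or API.

```python
import json

def verify_nonempty(sample: str) -> bool:
    """Toy verifier: reject empty generations."""
    return len(sample.strip()) > 0

def verify_json_structure(sample: str) -> bool:
    """Toy verifier for structured output: the sample must parse as JSON."""
    try:
        json.loads(sample)
        return True
    except ValueError:
        return False

def rejection_sample(generate, verifiers, max_attempts=8):
    """Draw candidates until one passes every verifier, else give up."""
    for _ in range(max_attempts):
        candidate = generate()
        if all(v(candidate) for v in verifiers):
            return candidate
    return None  # prompt yields no usable sample; drop it from the dataset

# Deterministic stand-in generator: two failing candidates, then a good one.
candidates = iter(['not json', '', '{"answer": 42}'])
kept = rejection_sample(lambda: next(candidates),
                        [verify_nonempty, verify_json_structure])
print(kept)  # → {"answer": 42}
```

The appeal of this pattern is that each verifier is cheap, task-specific, and composable: adding coverage for a new behavior means writing one more check, not redesigning a reward model.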
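The 30,000-token hard cap can be sketched as a simple preprocessing step over reasoning traces. This is an assumption-laden illustration: the whitespace split stands in for a real tokenizer, and the `</think>` terminator is a hypothetical marker, not a confirmed detail of Nous' pipeline.

```python
MAX_REASONING_TOKENS = 30_000  # cap reported in the article

def cap_trace(trace: str, cap: int = MAX_REASONING_TOKENS) -> str:
    """Truncate an overlong reasoning trace and close it with a terminator.

    Whitespace tokenization is a stand-in for the model's real tokenizer.
    """
    tokens = trace.split()
    if len(tokens) <= cap:
        return trace
    # Hypothetical: end the truncated trace so the model learns to stop here.
    return " ".join(tokens[:cap]) + " </think>"

short = cap_trace("a brief chain of thought")
long = cap_trace("step " * 40_000)
print(len(long.split()))  # → 30001 (30,000 trace tokens plus the terminator)
```

Training a second SFT pass on traces capped this way is what, per the article, cut overlong outputs by 78.4% at a measurable accuracy cost.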