Hermes 4 Trains Tool-Calling as a Separate Skill

Nous Research's Atropos RL framework trains tool-calling as a separate skill, not just a prompt-format convention. This leads to more reliable and structurally valid tool invocations.

💡 Why it matters

This training methodology for tool-calling can lead to more reliable and production-ready AI agents, with real trade-offs around reasoning mode, token cost, and other benchmarks.

Key Points

  • Atropos trains tool-calling behavior via rejection sampling rather than typical RLHF fine-tuning
  • Hermes 4 uses in-turn XML-style tags for tool definitions and invocations
  • This approach makes the inner JSON reliably schema-valid, not just syntactically plausible
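To make the second point concrete, here is a minimal sketch of the in-turn format: the tool call is an XML-tagged block whose body is JSON. The `<tool_call>` tag name follows the publicly documented Hermes chat template; the weather tool and its parameters are hypothetical.

```python
import json
import re

# Illustrative assistant turn in the Hermes-style format: an XML-tagged
# block containing a JSON tool invocation (tool name/arguments invented).
assistant_turn = """<tool_call>
{"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}
</tool_call>"""

def extract_tool_calls(text: str) -> list[dict]:
    """Pull each <tool_call> body out of a turn and parse it as JSON."""
    bodies = re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    return [json.loads(body) for body in bodies]

calls = extract_tool_calls(assistant_turn)
print(calls[0]["name"])       # the function the model wants to invoke
print(calls[0]["arguments"])  # its JSON arguments
```

Because the outer envelope is a fixed tag rather than free prose, a harness can deterministically locate and parse the call before dispatching it.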

Details

The Atropos RL framework runs roughly 1,000 task-specific verifiers, including ones for Schema Adherence and Tool Use. This trains the model to emit structurally valid, constraint-respecting JSON for tool calls, not just "JSON-shaped text". The methodology differs from typical RLHF fine-tuning: Atropos generates candidate responses and filters them through the verifiers, each of which emits a binary pass/fail signal. That filtering shapes the model's tool-calling behavior to explicitly satisfy the schema's structural constraints. The article notes the absence of a published benchmark confirming this holds across arbitrary user-defined schemas, but the qualitative observations are consistent with the training approach.
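The generate-then-filter loop described above can be sketched as follows. This is not the Atropos implementation; the schema, verifier logic, and candidate strings are invented to illustrate a binary-signal rejection-sampling filter.

```python
import json

# Toy binary verifier in the spirit of a Schema Adherence check: a candidate
# passes only if it is valid JSON AND satisfies the tool schema's structural
# constraints. The schema below is illustrative, not from Atropos.
SCHEMA = {"required": {"name": str, "arguments": dict}}

def verify(candidate: str) -> bool:
    """Binary reward: True if the candidate is schema-valid JSON, else False."""
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return False  # not even parseable JSON
    if not isinstance(obj, dict):
        return False
    return all(
        key in obj and isinstance(obj[key], typ)
        for key, typ in SCHEMA["required"].items()
    )

def rejection_sample(candidates: list[str]) -> list[str]:
    """Keep only verifier-passing samples; these become training targets."""
    return [c for c in candidates if verify(c)]

candidates = [
    '{"name": "search", "arguments": {"q": "atropos"}}',  # structurally valid
    '{"name": "search", "arguments": "q=atropos"}',       # wrong argument type
    '{"name": "search"',                                  # truncated JSON
]
kept = rejection_sample(candidates)
print(len(kept))  # → 1: only the structurally valid call survives
```

Training on only the surviving samples is what distinguishes this from a pure prompt-format convention: the model is rewarded for satisfying the schema's constraints, not merely for producing JSON-looking output.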

AI Curator - Daily AI News Curation