Optimizing AI Agent Token Usage: Reducing Waste in System Prompts
The article discusses how a team optimized their AI agent's token usage by auditing the skills loaded in the system prompt. They found that the system prompt, not the API responses, was consuming the majority of the token budget.
Why it matters
Optimizing token usage in AI agents matters because the system prompt is paid on every message: a few thousand wasted tokens there multiply across every interaction, driving up both cost and latency.
Key Points
- The system prompt, including always-on skills and identity files, consumed over 20,000 tokens, while the user's actual message was only 20-50 tokens
- The 'always:true' skills together consumed over 17,500 tokens, the largest being the 'claw-search' skill at 5,100 tokens
- The team split skills into 'always:true' and 'always:false' categories so that on-demand skills load only when the conversation context matches their description
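The split described above can be sketched as a small loader. This is a minimal illustration, not the team's actual implementation: the `Skill` fields and the naive word-overlap relevance check are assumptions standing in for whatever frontmatter and matching logic the real agent uses.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    always: bool      # 'always:true' skills load on every message
    description: str  # used to decide when an on-demand skill is relevant
    body: str         # the prompt text the skill contributes

def build_system_prompt(skills: list[Skill], message: str) -> str:
    """Assemble the prompt from always-on skills plus any on-demand
    skills whose description shares a word with the user's message."""
    msg_words = set(message.lower().split())
    selected = [
        s for s in skills
        if s.always or bool(msg_words & set(s.description.lower().split()))
    ]
    return "\n\n".join(s.body for s in selected)

# Illustrative stand-ins, not the article's real skill contents.
skills = [
    Skill("identity", True, "", "You are a helpful assistant."),
    Skill("claw-search", False, "search the web", "Use the web-search tool when asked."),
]
```

With this loader, `build_system_prompt(skills, "please search for recent news")` includes the search skill, while an unrelated message pays only for the always-on identity text.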
Details
The article describes how the team built a compact API format to shrink their API responses, only to find that the real token budget was dominated by the system prompt, not the responses. A token audit across all 31 skills showed that the 'always:true' skills alone consumed around 17,500 tokens, with the top 5 skills accounting for over 50% of that. The team then split skills into 'always:true' and 'always:false' categories: 'always:true' skills load on every message, while 'always:false' skills load only when the conversation context matches their description. This cut the main agent's always-on cost from 20,600 to 17,500 tokens, a saving of roughly 3,100 tokens on every single message.
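An audit like the one above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the skill texts are hypothetical stand-ins, and token counts use a crude 4-characters-per-token heuristic where a real audit would run each file through the model's own tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token of English prose.
    # A real audit would use the target model's tokenizer instead.
    return max(1, len(text) // 4)

def audit(skill_files: dict[str, str]) -> list[tuple[str, int]]:
    """Report each skill's approximate token cost, largest first."""
    costs = sorted(
        ((name, approx_tokens(body)) for name, body in skill_files.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    total = sum(cost for _, cost in costs)
    for name, cost in costs:
        print(f"{name:<20} {cost:>6}  ({100 * cost / total:4.1f}%)")
    print(f"{'TOTAL':<20} {total:>6}")
    return costs

# Illustrative stand-ins sized to echo the article's numbers,
# not the team's actual skill files.
report = audit({
    "claw-search": "x" * 20_400,  # ~5,100 tokens, the largest skill
    "identity":    "x" * 4_000,   # ~1,000 tokens
})
```

Sorting largest-first is what surfaces findings like "the top 5 skills account for over 50% of the budget": the heavy hitters appear at the top of the report, making the best candidates for an 'always:false' demotion obvious.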