Optimizing AI Agent Token Usage: Reducing Waste in System Prompts
The article discusses how a team optimized their AI agent's token usage by auditing the skills loaded in the system prompt. They found that the system prompt, not the API responses, was consuming the majority of the token budget.
Why it matters
Optimizing token usage in AI agents matters because the system prompt is paid on every message: a few thousand wasted tokens there multiply across every interaction, driving up both cost and latency.
Key Points
- The system prompt, including always-on skills and identity files, consumed over 20,000 tokens, while the user's actual message was only 20-50 tokens
- The 'always:true' skills together consumed over 17,500 tokens, the largest being the 'claw-search' skill at 5,100 tokens
- The team split skills into 'always:true' and 'always:false' categories so that on-demand skills load only when the conversation context matches their description
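The split described above can be sketched as a small loader. This is a minimal illustration, not the team's actual implementation: the `Skill` fields and the naive word-overlap relevance check are assumptions standing in for whatever frontmatter and matching logic the real agent uses.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    always: bool      # 'always:true' skills load on every message
    description: str  # used to decide when an on-demand skill is relevant
    body: str         # the prompt text the skill contributes

def build_system_prompt(skills: list[Skill], message: str) -> str:
    """Assemble the prompt from always-on skills plus any on-demand
    skills whose description shares a word with the user's message."""
    msg_words = set(message.lower().split())
    selected = [
        s for s in skills
        if s.always or bool(msg_words & set(s.description.lower().split()))
    ]
    return "\n\n".join(s.body for s in selected)

# Illustrative stand-ins, not the article's real skill contents.
skills = [
    Skill("identity", True, "", "You are a helpful assistant."),
    Skill("claw-search", False, "search the web", "Use the web-search tool when asked."),
]
```

With this loader, `build_system_prompt(skills, "please search for recent news")` includes the search skill, while an unrelated message pays only for the always-on identity text.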
Details
The article describes how the team built a compact API format to shrink their API responses, only to find that the real token budget was dominated by the system prompt, not the responses. A token audit across all 31 skills showed that the 'always:true' skills alone consumed around 17,500 tokens, with the top 5 skills accounting for over 50% of that. The team then split skills into 'always:true' and 'always:false' categories: 'always:true' skills load on every message, while 'always:false' skills load only when the conversation context matches their description. This cut the main agent's always-on cost from 20,600 to 17,500 tokens, a saving of roughly 3,100 tokens on every single message.
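An audit like the one above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the skill texts are hypothetical stand-ins, and token counts use a crude 4-characters-per-token heuristic where a real audit would run each file through the model's own tokenizer.

```python
def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token of English prose.
    # A real audit would use the target model's tokenizer instead.
    return max(1, len(text) // 4)

def audit(skill_files: dict[str, str]) -> list[tuple[str, int]]:
    """Report each skill's approximate token cost, largest first."""
    costs = sorted(
        ((name, approx_tokens(body)) for name, body in skill_files.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    total = sum(cost for _, cost in costs)
    for name, cost in costs:
        print(f"{name:<20} {cost:>6}  ({100 * cost / total:4.1f}%)")
    print(f"{'TOTAL':<20} {total:>6}")
    return costs

# Illustrative stand-ins sized to echo the article's numbers,
# not the team's actual skill files.
report = audit({
    "claw-search": "x" * 20_400,  # ~5,100 tokens, the largest skill
    "identity":    "x" * 4_000,   # ~1,000 tokens
})
```

Sorting largest-first is what surfaces findings like "the top 5 skills account for over 50% of the budget": the heavy hitters appear at the top of the report, making the best candidates for an 'always:false' demotion obvious.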