Understanding the Token Consumption of Gemini 2.5 Flash
This article explains why Gemini 2.5 Flash and Pro models from Google may return a truncated response with a low number of output tokens, despite setting a high max_output_tokens limit. The root cause is the models' internal reasoning process that consumes a significant portion of the token budget.
Why it matters
Developers using Gemini 2.5 Flash or Pro need to account for this behavior to size max_output_tokens correctly and avoid unexpectedly truncated responses.
Key Points
- Gemini 2.5 Flash and Pro are reasoning models that burn tokens on internal thinking before generating the visible response
- Unlike OpenAI's models, Google counts the 'thinking tokens' against the max_output_tokens budget
- Gemini 2.5 Flash defaults to a dynamic thinking budget, which can consume 90-98% of the token limit
- The API response shows the 'thoughtsTokenCount' and 'candidatesTokenCount', indicating the token usage breakdown
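The shared-budget behavior can be sketched in a few lines of Python. This is an illustrative model of the accounting, not the actual API: the function name and the 95% figure are assumptions chosen to match the 90-98% range quoted above.

```python
def remaining_output_tokens(max_output_tokens: int, thoughts_token_count: int) -> int:
    """Illustrative only: thinking tokens and the visible answer
    draw from the same max_output_tokens budget."""
    return max(0, max_output_tokens - thoughts_token_count)

# If dynamic thinking consumes ~95% of an 8192-token budget...
budget = 8192
thoughts = int(budget * 0.95)   # 7782 tokens spent on internal reasoning
visible = remaining_output_tokens(budget, thoughts)
print(visible)  # only 410 tokens left for the actual answer
```

The takeaway: raising max_output_tokens raises the total budget, but a dynamic thinking phase can still claim most of it before any visible text is produced.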
Details
The article explains that Gemini 2.5 Flash and Pro are reasoning models, comparable to OpenAI's reasoning models. Unlike OpenAI, however, Google counts the internal 'thinking tokens' against the max_output_tokens budget set by the user. Even with a high token limit, the model may spend most of that budget on internal reasoning, leaving little room for the actual output. The token accounting works as follows:

1. The model thinks first, consuming some number of tokens tracked as 'thoughtsTokenCount'.
2. Once the combined 'thoughtsTokenCount' and 'candidatesTokenCount' hits the budget, generation stops.
3. If thinking consumed most of the budget, 'candidatesTokenCount' ends up near zero, producing a truncated response.

The article also notes that Gemini 2.5 Flash defaults to a dynamic thinking budget, which can consume 90-98% of the token limit on non-trivial tasks.
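The steps above suggest a simple diagnostic: compare the two counts in the response's usage metadata. The field names ('thoughtsTokenCount', 'candidatesTokenCount') come from the article; the surrounding dict, the sample numbers, and the 90% threshold are hypothetical, stand-ins for whatever your actual API response returns.

```python
# Hypothetical usage-metadata payload shaped like the fields the article names.
usage = {
    "thoughtsTokenCount": 7900,    # tokens burned on internal reasoning
    "candidatesTokenCount": 92,    # tokens in the visible answer
}

total = usage["thoughtsTokenCount"] + usage["candidatesTokenCount"]
thinking_share = usage["thoughtsTokenCount"] / total

# A high thinking share explains a truncated response despite a large limit.
if thinking_share > 0.9:
    print(f"Thinking consumed {thinking_share:.0%} of generated tokens; "
          "raise max_output_tokens or cap the thinking budget.")
```

If the check fires, the usual remedies are raising max_output_tokens or constraining thinking via the API's thinking-budget setting; consult the current Gemini API documentation for the exact parameter names before relying on them.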