Gemini Flash Hallucinates 91% of the Time When Unsure
The Gemini 3 Flash model has a 91% hallucination rate on the Artificial Analysis Omniscience (AA-Omniscience) Hallucination Rate benchmark. When it cannot answer a question correctly, it usually gives a wrong answer instead of declining or admitting that it does not know.
Why it matters
Hallucination rate is a critical metric for AI models, especially in applications that depend on accurate, reliable output, where a confident wrong answer can do more harm than no answer at all.
Key Points
- Gemini 3 Flash has a 91% hallucination rate on the AA-Omniscience Hallucination Rate benchmark
- Hallucination rate measures how often a model answers incorrectly when it should have refused or admitted to not knowing
- Other models, including Claude and GPT models, range from 26% to 93%; the best of them score far below Gemini 3 Flash, though not every model scores lower
- Hallucination rate may be an important factor for applications that require precise, reliable output, such as coding
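The metric in the key points can be sketched as a simple ratio. This is a minimal Python sketch under stated assumptions: each benchmark response is graded as correct, incorrect, or abstained, and the rate is the number of incorrect answers divided by all non-correct responses. The label names and the `hallucination_rate` helper are hypothetical and are not Artificial Analysis's published code.

```python
from collections import Counter
from typing import Iterable

# Assumed (hypothetical) grading labels for each benchmark response.
VALID_LABELS = {"correct", "incorrect", "abstain"}


def hallucination_rate(outcomes: Iterable[str]) -> float:
    """Share of non-correct responses that were wrong answers rather than abstentions.

    Assumption: rate = incorrect / (incorrect + abstain).
    """
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    non_correct = counts["incorrect"] + counts["abstain"]
    if non_correct == 0:
        # Every response was correct, so there was no chance to hallucinate.
        return 0.0
    return counts["incorrect"] / non_correct


if __name__ == "__main__":
    # Two hypothetical models, each answering 40 of 100 questions correctly.
    model_a = ["correct"] * 40 + ["incorrect"] * 54 + ["abstain"] * 6
    model_b = ["correct"] * 40 + ["incorrect"] * 15 + ["abstain"] * 45
    print(f"{hallucination_rate(model_a):.0%}")  # 90%
    print(f"{hallucination_rate(model_b):.0%}")  # 25%
```

Both hypothetical models have the same 40% accuracy, yet their hallucination rates differ sharply, because under this definition the metric only looks at what a model does when it fails to answer correctly.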
Details
The article compares AI models on the Artificial Analysis Omniscience (AA-Omniscience) Hallucination Rate benchmark. Gemini 3 Flash stands out with a 91% hallucination rate: when it does not know an answer, it usually responds with an incorrect one rather than declining or admitting uncertainty. Other models, including Claude and GPT models, range from 26% to 93%, so the best of them hallucinate far less often than Gemini 3 Flash, although not every model scores lower. The metric may matter most for applications that need precise, reliable output, such as coding, where a confident wrong answer can cause significant problems. The article suggests that the lower hallucination rates of Anthropic's Claude models may be a key factor in their strong performance on coding tasks.