https://dibz.me/blog/gemini-2-0-flash-001-at-0-7-hallucination-rate-why-your-production-pipeline-needs-a-reality-check-1160
Benchmarks are all over the map in 2026. HalluHard shows a 30.2% error rate even with web search enabled. You cannot just pick a single benchmark score and trust it for your stack.