Every tenth answer is a mistake: study questions the accuracy of Google’s AI answers
12.04.26
Google’s AI-powered search summaries demonstrate a high level of accuracy, yet a noticeable share of errors remains. According to the study, around 10% of responses are inaccurate — which, at the scale of Google Search, translates into a massive volume of misleading information.
How AI Overviews work
AI Overviews are a Google feature that generates concise answers to user queries using Gemini AI models. The technology was first introduced in 2024 and has since been widely rolled out across multiple regions, including Ukraine.
The system aggregates data from various sources and produces a short summary, allowing users to get information quickly without visiting multiple links.
Study findings
A joint study by The New York Times and the startup Oumi found that approximately 90% of AI Overviews responses are accurate. However, about one in ten answers contains errors or misleading information.
The evaluation was conducted using the SimpleQA benchmark — a set of 4,000 questions developed by OpenAI. Results showed that accuracy improved after model updates: earlier versions achieved around 85%, while newer iterations exceeded 90%.
Still, even this level of accuracy raises concerns given the scale of Google Search. When extrapolated, it may result in millions of incorrect responses every hour.
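The extrapolation is simple arithmetic: the number of responses served multiplied by the error rate. A minimal sketch follows, using the accuracy figures reported in the study. The hourly response volume is a hypothetical placeholder chosen only for illustration, since the article gives no figure, and the function name is likewise an assumption.

```python
def incorrect_per_hour(responses_per_hour: float, accuracy: float) -> float:
    """Expected number of inaccurate responses per hour at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return responses_per_hour * (1.0 - accuracy)


# Hypothetical volume, for illustration only; not a figure from the study.
HYPOTHETICAL_RESPONSES_PER_HOUR = 50_000_000

# Accuracy levels reported for earlier (about 85%) and newer (about 90%) models.
for accuracy in (0.85, 0.90):
    errors = incorrect_per_hour(HYPOTHETICAL_RESPONSES_PER_HOUR, accuracy)
    print(f"accuracy {accuracy:.0%}: ~{errors:,.0f} incorrect responses per hour")
```

Under any assumed volume in the tens of millions per hour, a 10% error rate yields millions of incorrect responses, which is the order of magnitude the study describes.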
Examples of inaccuracies
The report highlights several specific cases. For instance, when asked about the date Bob Marley’s former home became a museum, the system cited sources that either lacked clear dates or contained incorrect information.
In another example, the AI claimed that a particular classical music institution did not exist, despite referencing its official website. Such inconsistencies point to reliability issues in AI-generated responses.
Google’s response
Google criticized the study’s methodology, arguing that the benchmark used may contain inaccuracies and does not reflect real-world search behavior.
According to the company, internal evaluations rely on a more carefully curated dataset, providing a more accurate picture of system performance.
Why evaluating AI is difficult
Assessing generative AI systems remains a complex task. Different benchmarks can produce varying results, and models may generate different answers to the same question.
Additionally, AI Overviews does not rely on a single model — instead, it dynamically selects the most appropriate system for each query. More advanced models tend to be slower and more resource-intensive, so they are not always used.
The main risk: user trust
Despite clear progress, the biggest concern lies in how users perceive AI-generated answers. Many tend to trust them without verification, even when errors are possible.
Drawing on internet sources improves accuracy, but it also raises the risk of spreading misinformation when those sources are themselves wrong.
Although Google includes disclaimers that AI responses may be incorrect, in practice many users do not double-check the information they receive.


