Every tenth answer is a mistake: study questions the accuracy of Google’s AI answers
12.04.26
Google’s AI-powered search summaries demonstrate a high level of accuracy, yet a noticeable share of errors remains. According to the study, around 10% of responses are inaccurate — which, at the scale of Google Search, translates into a massive volume of misleading information.
How AI Overviews work
AI Overviews are a Google feature that generates concise answers to user queries using Gemini AI models. The technology was first introduced in 2024 and has since been widely rolled out across multiple regions, including Ukraine.
The system aggregates data from various sources and produces a short summary, allowing users to get information quickly without visiting multiple links.
Study findings
A joint study by The New York Times and the startup Oumi found that approximately 90% of AI Overviews responses are accurate. However, about one in ten answers contains errors or misleading information.
The evaluation was conducted using SimpleQA, a benchmark of roughly 4,000 short fact-seeking questions developed by OpenAI. Results showed that accuracy improved after model updates: earlier versions achieved around 85%, while newer iterations exceeded 90%.
Still, even this level of accuracy raises concerns given the scale of Google Search. When extrapolated, it may result in millions of incorrect responses every hour.
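The extrapolation is simple arithmetic: multiply the number of AI responses served by the share that is wrong. The short Python sketch below illustrates it. The hourly volume is a purely hypothetical placeholder (neither the study nor Google is quoted here with a figure), and the function name is our own; only the 85% and 90% accuracy rates come from the reported results.

```python
# Hypothetical volume for illustration only: NOT a figure from the study or Google.
ASSUMED_OVERVIEWS_PER_HOUR = 50_000_000


def expected_incorrect(responses: int, accuracy: float) -> int:
    """Expected number of inaccurate responses at a given accuracy rate."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(responses * (1.0 - accuracy))


# Accuracy rates as reported: ~85% for earlier models, ~90% for newer ones.
for label, accuracy in [("earlier models (~85%)", 0.85), ("newer models (~90%)", 0.90)]:
    wrong = expected_incorrect(ASSUMED_OVERVIEWS_PER_HOUR, accuracy)
    print(f"{label}: about {wrong:,} inaccurate responses per hour")
```

Under that assumed volume, a 10% error rate works out to 5 million inaccurate responses per hour. The real number depends on how many queries actually trigger an AI Overview, which is not stated here.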
Examples of inaccuracies
The report highlights several specific cases. For instance, when asked about the date Bob Marley’s former home became a museum, the system cited sources that either lacked clear dates or contained incorrect information.
In another example, the AI claimed that a particular classical music institution did not exist, despite referencing its official website. Such inconsistencies point to reliability issues in AI-generated responses.
Google’s response
Google criticized the study’s methodology, arguing that the benchmark used may contain inaccuracies and does not reflect real-world search behavior.
According to the company, internal evaluations rely on a more carefully curated dataset, providing a more accurate picture of system performance.
Why evaluating AI is difficult
Assessing generative AI systems remains a complex task. Different benchmarks can produce varying results, and models may generate different answers to the same question.
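A toy simulation can show why repeated measurements drift even when nothing about the model changes. The sketch below is not the study's methodology: it assumes a hypothetical 4,000-question benchmark and a model that answers each question correctly with a fixed 90% probability, then scores five independent runs.

```python
import random


def simulated_accuracy(num_questions: int, p_correct: float, rng: random.Random) -> float:
    """Score one simulated benchmark run where each answer is right with probability p_correct."""
    correct = sum(rng.random() < p_correct for _ in range(num_questions))
    return correct / num_questions


# Assumptions for illustration: 4,000 questions, 90% per-question accuracy, 5 runs.
rng = random.Random(0)
runs = [simulated_accuracy(4000, 0.90, rng) for _ in range(5)]

for i, acc in enumerate(runs, start=1):
    print(f"run {i}: {acc:.2%}")
```

Each run lands close to 90% but not on exactly the same value, so small differences between reported scores do not necessarily reflect a real difference in quality.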
Additionally, AI Overviews does not rely on a single model — instead, it dynamically selects the most appropriate system for each query. More advanced models tend to be slower and more resource-intensive, so they are not always used.
The main risk: user trust
Despite clear progress, the biggest concern lies in how users perceive AI-generated answers. Many tend to trust them without verification, even when errors are possible.
While using internet sources improves accuracy, it also increases the risk of spreading misinformation.
Although Google includes disclaimers that AI responses may be incorrect, in practice many users do not double-check the information they receive.