AI can’t fake toxicity – new Turing test
15.11.25
Researchers from the University of Zurich, the University of Amsterdam, Duke University, and New York University have found that modern artificial intelligence language models can still be easily distinguished from humans, primarily by their overly friendly and “smooth” emotional tone.
The researchers tested nine popular open-source models—Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509—on social media posts from X (Twitter), Bluesky, and Reddit. Classifier algorithms developed as part of the project were able to recognize AI-generated texts with an accuracy of 70–80%.
“Computational Turing Test”
Researchers have presented a new version of the “Computational Turing Test”—a metric that evaluates how closely AI speech resembles real internet communication. The system uses automated linguistic analysis to identify features by which neural network texts differ from human texts, primarily in emotional tone.
“Even after careful calibration, LLM results remain significantly different from human texts in emotional tone and expression,” the authors note.
Why AI is “too polite”
A team led by Nicolo Pagan from the University of Zurich found that even with complex optimization strategies (including fine-tuning and prompt refinement), the telltale emotional cues of AI-written text persist. When models responded to real social media posts, they struggled to reproduce the informal expressions, sarcasm, and mild negativity typical of human speech. Their toxicity levels remained significantly lower than those of human replies.
Attempts to increase realism—for example, adding example user posts or additional context—were only partially successful. Differences in sentence length and text structure were smoothed out, but the emotional differences remained.
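As a rough illustration of how a detection accuracy figure is obtained, the sketch below scores a single-feature detector on a handful of replies. It is not the researchers' classifier: the toxicity scores, the 0.2 threshold, and the names `detect_ai` and `accuracy` are all invented for the example, which assumes each reply already has a toxicity score between 0 and 1.

```python
from typing import List, Tuple


def detect_ai(toxicity: float, threshold: float = 0.2) -> bool:
    """Flag a reply as AI-written when its toxicity score is below the threshold."""
    return toxicity < threshold


def accuracy(samples: List[Tuple[float, bool]], threshold: float = 0.2) -> float:
    """Share of replies whose predicted label matches the true label."""
    correct = sum(
        detect_ai(score, threshold) == is_ai for score, is_ai in samples
    )
    return correct / len(samples)


# Hypothetical data: (toxicity score, True if the reply was written by AI).
samples = [
    (0.05, True), (0.10, True), (0.30, True), (0.08, True), (0.02, True),
    (0.45, False), (0.25, False), (0.15, False), (0.60, False), (0.35, False),
]

print(accuracy(samples))  # 0.8
```

Here the detector gets 8 of 10 replies right: it misses one unusually harsh AI reply and mislabels one unusually mild human reply.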
Unexpected Findings
Scientists found that instruction-tuned (Instruct) models imitate humans worse than their base versions. The base models Llama 3.1 8B and Mistral 7B v0.1 demonstrated better results, achieving 75–85% accuracy in imitating human responses. Furthermore, scaling the models yielded no benefits: Llama 3.1 70B proved less “human” than the 8-billion-parameter models.
Attempts to “disguise” texts as human-like reduced their semantic similarity to real user responses: the semantic similarity score dropped from 0.18–0.34 to 0.16–0.28 across different platforms. In other words, the harder a model was pushed to “appear human,” the further its replies drifted from what real users actually wrote.
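The article does not say how the semantic similarity score is computed. A common choice, assumed here purely for illustration, is cosine similarity between embedding vectors of the AI reply and the real reply. The sketch below shows only that calculation; the function name `cosine_similarity` and the toy three-dimensional vectors are hypothetical, and real embeddings would have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention for this sketch: a zero vector matches nothing.
    return dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of a human reply and an AI reply.
human_reply = [1.0, 0.0, 0.0]
ai_reply = [0.0, 1.0, 0.0]

print(cosine_similarity(human_reply, human_reply))  # 1.0
print(cosine_similarity(human_reply, ai_reply))     # 0.0
```

Under this assumption, scores in the 0.16–0.34 range would mean the AI replies point in a noticeably different direction from the human ones.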
Where AI Hides Best
Differences also emerged between platforms. On X (Twitter), neural networks imitated humans most successfully, so detection was least accurate there.
On Bluesky, results were average, while on Reddit the models performed worst: their texts differed most noticeably from human ones. The researchers believe this is due to differences in user communication styles and the extent to which data from specific platforms was used in training the models.
Modern LLMs still struggle with the spontaneous emotional expression and natural ambiguity characteristic of human communication. AI can imitate grammar and vocabulary, but its emotional “smoothness” remains a noticeable marker of artificiality.