We're All Wrong in the Same Places

A paper published earlier this year compared more than 350 AI models and looked at the questions they get wrong. When models fail, they tend to fail together, agreeing on the same wrong answer about 60% of the time, where random chance would land at 33%. The best models were the worst offenders, clustering on incorrect answers more reliably than weaker models did.
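
Where does a 33% baseline come from? Here is a minimal sketch that assumes four-option multiple-choice questions. That is an assumption on my part, and the paper's exact setup may differ. If two models both miss a question and each picks uniformly at random among the three wrong options, they match one time in three. The function names below are hypothetical and exist only for illustration.

```python
import random


def chance_agreement(num_options: int = 4) -> float:
    """Probability that two models which both answer wrong pick the same wrong option.

    Assumes (hypothetically) that each model picks uniformly at random
    among the num_options - 1 wrong options, independently of the other.
    """
    wrong = num_options - 1
    # Whatever wrong option model A picks, model B matches it with probability 1/wrong.
    return 1 / wrong


def simulate(num_options: int = 4, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check of chance_agreement under the same uniform-guessing assumption."""
    rng = random.Random(seed)
    wrong_options = list(range(1, num_options))  # option 0 stands in for the correct answer
    matches = 0
    for _ in range(trials):
        a = rng.choice(wrong_options)
        b = rng.choice(wrong_options)
        matches += a == b
    return matches / trials


print(f"analytic: {chance_agreement():.3f}")  # analytic: 0.333
print(f"simulated: {simulate():.3f}")
```

The baseline depends on the question format. With five options it drops to 25%, so a 60% agreement rate is only meaningful next to the chance rate for the same kind of question.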

For most of intellectual history, error correction across a field happened through variation. Your blind spot was someone else’s obvious objection. People approached the same problem from different angles, read different things, and reached different answers based on their own knowledge and intuition. One person’s error got caught by another person’s different framing.

Now everyone is consulting the same LLMs, trained on the same data. The variation doesn’t disappear altogether; it just stops doing its job.

There’s a 2026 paper that calls this “Invisible Groupthink.” When a whole team asks the same AI around the same time, they converge without any of the usual warning signs. Normally you can feel groupthink building: someone senior pushes a view, speaking up gets costly, and people go quiet instead of saying the thing that needs saying. None of that happens with LLMs, because nobody pushed and nobody went quiet. Everyone sent the same question to the same place, got a similar kind of answer, and it came back looking like genuine agreement. The signal that something’s wrong is gone because the mechanism that would produce it never operated.

Organizations are making strategy calls, investment theses, hiring decisions, and market reads from the same source. The variation that used to exist across those decisions didn’t disappear; it was replaced by correlation. When the source is right, everyone benefits. When it’s wrong, the mistakes compound. Everybody shares the blind spot, and nobody is standing at a different angle to catch it.

A paper in Nature Communications Psychology this year found the same pattern in research itself, with academic work bunching up around topics and methods AI happens to be good at. Work that doesn’t fit the AI story, the authors write, “may struggle for legitimacy or resources.” Nobody sat down and decided to narrow the field; it just drifted toward whatever the tools were trained on, and the questions that might have kept it honest quietly stopped getting funded.

You can see the surface version of this in content and professional work. A USC study published in Trends in Cognitive Sciences earlier this year found that after an LLM edits a piece of writing, the markers that identify the writer’s perspective start to disappear. Researchers tracking 52,000 chess players over five years found the same pattern: playing styles converged as players got feedback from the same AI system. This is the visible layer, and it’s easy to read as a style gripe and scroll past.

I wrote a few weeks ago about what happens when you ask AI for a second opinion: it tends to agree with you because it was trained to. The 60% error-clustering finding is the field-level version of that problem. If your second opinion agrees with you, and mine agrees with me, and we’re both asking the same model, neither of us got an independent perspective. We both got agreement from the same source.

Not everyone reads it this way. A 2024 study found that people exposed to AI suggestions actually produced more varied ideas than those who didn’t use the tools at all. That’s a real result. The catch is what those people were doing with the output: using it as a starting point and then pushing back, not taking it wholesale. Most people don’t push back, though, so the diversity gains found in that research probably don’t survive the typical workflow.

The tools are good and the work looks fine, which is what makes this hard to catch. A whole field converging on the same answer looks exactly like a lot of smart people independently arriving at the truth. From the inside, there’s no way to tell those two apart.