AI Productivity Measurement Just Broke

I ran across a LinkedIn post last week in which a consultant bragged about helping a client achieve “347% ROI on their AI initiative.” The comments were full of congratulations and requests for the methodology. Nobody asked the obvious question: how exactly did they measure that?

ROI math isn’t complicated. You spend money, you get money back, you compare the two. That part is real. You can track what you spent on an AI tool, what you spent on implementation, and what you spent on the people involved. The dollars-out side of the equation is concrete.
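To make the shape of that math concrete, here's a minimal sketch. Every figure is invented for illustration; the point is that the spend side is plain addition, while the return figure is the contested input everything else leans on.

```python
# Hypothetical figures, for illustration only.
license_cost = 120_000        # annual tool licenses
implementation_cost = 80_000  # integration and rollout
people_cost = 50_000          # training, admin, oversight

total_spend = license_cost + implementation_cost + people_cost

# The contested number: the value returned. The rest of this
# article is about why this input is so hard to pin down.
claimed_return = 600_000

roi = (claimed_return - total_spend) / total_spend
print(f"ROI: {roi:.0%}")  # prints "ROI: 140%"
```

The arithmetic is trivial. The trouble is entirely in where `claimed_return` comes from.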

The problem is what happens between spending the money and getting the return. Somewhere in that gap, you need to understand how AI changed the way people work. How much time did it save? How much did it cost in human effort to manage, verify, and correct? What did people do with the time they got back, and did that generate value?

That’s the productivity question. And it turns out even the best researchers in the world can’t answer it reliably anymore. (I wrote about a version of this problem in The AI Productivity Lie, but what’s happened since makes that piece look optimistic.)

What Happened at METR

METR is a research organization that runs controlled experiments on AI’s impact on developer productivity. Real developers, real open-source projects, randomized conditions. About as rigorous as this kind of research gets.

In early 2025, they published a result that surprised people: AI tools caused a 19% slowdown in task completion among experienced developers. Controversial, but the methodology was solid.

Then they ran a follow-up. Late 2025, 57 developers across 143 repositories, over 800 tasks. The new results showed the opposite: an 18% speedup for returning developers.

The researchers themselves flagged the new data as unreliable. Here’s why.

Between the two studies, developers started refusing to participate in tasks where AI wasn’t allowed. Thirty to fifty percent declined to submit tasks they wouldn’t attempt without AI. Recruitment got harder because experienced developers didn’t want to work under those conditions.

One developer put it this way: “My head’s going to explode if I try to do too much the old-fashioned way because it’s like trying to get across the city walking when all of a sudden I was more used to taking an Uber.”

The control group — people willing to work without AI — stopped being representative. The measurement methodology broke. Not because the researchers made mistakes, but because AI adoption crossed a threshold where you can’t create a clean baseline anymore.

The Hard Side of the Equation

ROI has a hard side and a soft side. The hard side is what you spent on licenses, infrastructure, and implementation: straightforward accounting. Nobody’s confused about that number.

The soft side is where it falls apart. That’s the human productivity piece: how much time did people save, how did their work change, and what’s the value of the work they did instead? This is the number that feeds every AI ROI calculation I’ve ever seen, and it’s the number that METR just proved we can’t measure cleanly.

When the people doing the work won’t do it without AI, you can’t isolate AI’s contribution anymore. The tool and the worker have merged into something you can’t cleanly separate. It’s like trying to measure the productivity impact of electricity by asking factory workers to go back to steam power. The experiment doesn’t measure what you think it measures because the conditions required to run it no longer reflect reality.

Most AI ROI calculations papers over this problem. Take the cost of three analysts, subtract the cost of the AI tool, and call the difference “savings.” Never mind that you still need two analysts to manage the tool, or that the third one got reassigned to a project that was already behind. The hard dollars look clean. The productivity assumptions underneath them are fiction. (This is the dashboard delusion applied to AI: clean numbers that feel precise but measure the wrong thing.)
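Here's that slide-deck calculation next to the honest one, with hypothetical salary and tool figures. The naive version only works if headcount actually left the payroll.

```python
# All figures hypothetical, for illustration.
analyst_salary = 90_000
ai_tool_cost = 30_000

# The slide-deck version: three analysts replaced by one tool.
naive_savings = 3 * analyst_salary - ai_tool_cost

# The honest version: two analysts still manage and verify the tool,
# and the third was reassigned to a backlogged project, not cut.
analysts_actually_removed = 0
actual_savings = analysts_actually_removed * analyst_salary - ai_tool_cost

print(naive_savings)   # prints 240000
print(actual_savings)  # prints -30000
```

Same inputs, opposite sign. The difference is a productivity assumption nobody wrote down.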

The Refusal Is the Data Point

Here’s what I think most organizations are missing: that developers’ refusal to work without AI is more meaningful than any productivity percentage METR could have produced.

Nobody refused because a study told them AI makes them 18% faster. They refused because the experience of working without it feels broken. That’s not a metric you can put in a spreadsheet, but it’s a stronger signal than any ROI calculation I’ve seen.

I don’t know a developer right now who doesn’t use AI in some form. There are holdouts who think it’s cheating or a crutch, but they’re increasingly outnumbered. The people actually doing the work have already made their judgment, and they made it with their behavior, not with benchmarks.

When your workforce reorganizes itself around a tool before you’ve finished your ROI analysis, the analysis is answering yesterday’s question. These are the quiet failures of systems shifting underneath you while you’re still measuring the old version.

Harder Math, Not No Math

None of this means you shouldn’t measure AI’s impact. But the productivity assumptions that feed the return side of the equation need a different approach.

Instead of measuring “AI productivity,” measure whether your business problems are getting solved. Track the outcomes that actually matter: resolution times, error rates, customer satisfaction, revenue per customer. Track them before AI, track them after, and account for all the other variables that affect them. That’s messy. It takes longer than a quarter. It won’t fit on a slide.
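As a sketch of what that before/after tracking looks like in its simplest form, here are three made-up outcome metrics (names and figures are invented). A real analysis would also have to control for seasonality, staffing changes, and everything else that moved in the same window.

```python
# Hypothetical before/after business-outcome metrics.
baseline = {"resolution_hours": 18.0, "error_rate": 0.042, "csat": 4.1}
current = {"resolution_hours": 14.5, "error_rate": 0.039, "csat": 4.3}

for metric, before in baseline.items():
    after = current[metric]
    change = (after - before) / before
    print(f"{metric}: {before} -> {after} ({change:+.1%})")
```

Note what's absent: no attempt to assign a percentage of the change to the AI tool itself. The question is whether the business outcomes moved, not how to apportion credit.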

It’s also the only version that reflects reality.

Extend your measurement timeline past the point where most organizations lose patience. AI projects typically look bad in months one through six while people learn the tools and rebuild processes. Break-even shows up between six and twelve months. Positive returns come after a year if the implementation was sound.

Measure business outcomes and ask whether AI contributed to improving them. It’s harder math, but it’s honest math.

The Question That Matters

Stop asking “What’s the ROI of our AI investment?” Or at least stop asking it the way most people mean it. The dollars-in side is knowable. The productivity-gains side isn’t, and pretending otherwise produces fantasy numbers.

Ask instead: Are we solving the business problems we set out to solve? Are resolution times dropping? Are customers more satisfied? Are we making fewer errors? Is revenue growing in the areas where we deployed AI?

If the answers are yes, the investment is working, even if you can’t pin a precise percentage on the AI tool. If the answers are no, a polished ROI deck won’t change what’s actually broken.

Getting this right means giving up on clean productivity numbers and focusing on whether the business is actually moving forward.
