Your AI velocity gauge reads backward
A controlled trial found experienced developers felt 20% faster with AI while the stopwatch clocked them 19% slower.
Ask a senior engineer whether AI coding tools make them faster and most will say yes. Ask them to prove it with a number and the room goes quiet. That gap between the confident yes and the missing number is now the most important thing in the AI tooling conversation, because a real experiment just measured it, and the yes was wrong.
METR, an AI research nonprofit, ran a randomized controlled trial on 16 experienced open-source developers across 246 real tasks in repositories they already knew well. Big, mature codebases: over a million lines, thousands of GitHub stars, the kind of software where these people are the resident experts. Each task was randomly assigned to be done with an AI assistant (Cursor Pro with Claude 3.5/3.7 Sonnet) or without one. Before starting, the developers expected AI to speed them up by 24%. After finishing, they reported it had sped them up by about 20%. The clock said they were 19% slower.
That is not measurement noise. It is a self-report pointing in the opposite direction from reality, off by nearly 40 points, under exactly the conditions most professional work happens in: skilled people, in code that already exists.
```mermaid
xychart-beta
    title "Expected vs felt vs measured speedup"
    x-axis ["Predicted", "Felt after", "Measured"]
    y-axis "Speedup %" -20 --> 30
    bar [24, 20, -19]
```
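To make the size of that gap concrete, here is the arithmetic as a minimal sketch. The three percentages are the ones reported above. The 60-minute baseline is a made-up task length, and reading "19% slower" as 19% more time and "felt 20% faster" as 20% less time is a simplifying assumption, not necessarily the trial's exact definition.

```python
# Figures reported from the trial (percent speedup; negative means slower).
predicted_pct = 24
felt_pct = 20
measured_pct = -19

# Distance between what developers reported and what the clock showed.
perception_gap_points = felt_pct - measured_pct
print(f"perception gap: {perception_gap_points} points")  # -> 39 points

# Hypothetical 60-minute task, under the simplifying assumption that
# "N% slower" means N% more time and "N% faster" means N% less time.
BASELINE_MINUTES = 60.0
actual_minutes = BASELINE_MINUTES * (100 - measured_pct) / 100
felt_minutes = BASELINE_MINUTES * (100 - felt_pct) / 100
print(f"felt like {felt_minutes:.1f} min, took {actual_minutes:.1f} min")
```

On that reading, an hour-long task feels like 48 minutes and actually takes a little over 71.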
The bottleneck moved and nobody re-staffed it
The mechanism is not mysterious once you stop treating typing as the hard part. For an expert in a familiar codebase, generating the code was never the expensive step. Knowing what to write, and confirming it's correct, was. AI attacks the cheap step and quietly inflates the expensive one. In the trial, developers using AI spent less time searching and writing and more time prompting, waiting, and reviewing. Over half the AI suggestions were reportedly not usable, and even the accepted ones needed cleanup.
The one-line version: generation got cheap, verification got expensive. You remove the old bottleneck and dump the work straight into a new one, and the new one is review.
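A toy model shows why. Everything in this sketch is invented for illustration: the function name, the weekly numbers, and the assumption that a PR only counts as delivered once a reviewer has gotten to it. It is not drawn from the trial or from any team's telemetry.

```python
def simulate(weeks, prs_opened_per_week, review_capacity_per_week):
    """Return (delivered, backlog) after running a two-stage pipeline.

    Stage 1 generates PRs; stage 2 reviews them. Delivery is capped by
    whichever stage is slower, and anything unreviewed piles up.
    """
    backlog = 0
    delivered = 0
    for _ in range(weeks):
        backlog += prs_opened_per_week
        reviewed = min(backlog, review_capacity_per_week)
        backlog -= reviewed
        delivered += reviewed
    return delivered, backlog


# Hypothetical team: review capacity stays at 20 PRs a week throughout.
before = simulate(weeks=10, prs_opened_per_week=20, review_capacity_per_week=20)
after = simulate(weeks=10, prs_opened_per_week=40, review_capacity_per_week=20)

print("before AI (delivered, backlog):", before)
print("after AI  (delivered, backlog):", after)
```

Doubling generation with flat review capacity delivers exactly the same amount and leaves a growing queue. A real team has a third option the model leaves out, which is merging without review, and that trades the queue for risk.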
The interesting part is that team telemetry says the same thing from the other side, at a scale a 16-person study never could. The analysis that's been circulating under the phrase "the gauge broke" pulls together several of these signals. Faros AI, looking across more than 10,000 developers, reported pull requests merged up 98% and PR size up over 150%, with review time up 91% and roughly no net change in delivery. Notably, 31% of PRs merged with no review at all. DORA's research has associated higher AI adoption with a measurable drop in delivery stability. GitClear, reading some 200 million changed lines, found copy-pasted code and churn rising while refactoring collapsed below 10% of changes, with 2024 the first year on record that developers pasted more code than they reorganized.
Treat those specific figures with the caution any single-sourced number deserves. But the shape is consistent and it matches the RCT: more generated, more merged, more churned, same amount actually delivered, and shakier when it lands. The volume exploded at the one stage nobody added headcount to. And the cost is invisible on the velocity dashboard because it shows up downstream, in incidents, rework, and reviewer fatigue, on a different page from the chart everyone's cheering.
Why the feeling lies
The dangerous claim here isn't "AI is bad." A bad tool you eventually notice. The claim is that the instrument leaders steer by, a team's felt sense of velocity, doesn't just have noise in it. It reads backward. The same developers who reported that the tool was speeding them up were, on average, the ones it was measurably slowing down.
That's worse than a bad tool, because you keep trusting a broken gauge. And nearly every AI adoption decision in the industry currently runs on it. "87% of developers use AI daily" is a sentiment survey, not a productivity measurement. Headcount plans, sprint commitments, and leadership decks claiming a team is twice as fast are built on self-report, and we now have direct evidence that self-report can be inverted for experienced engineers on existing code. Developers are famously bad at estimating their own timelines. Turns out they're bad at estimating their own speedups too.
Worth saying plainly: this is one small study. The authors are careful that it doesn't prove AI slows everyone everywhere. The effect is widely expected to flip positive for juniors and for greenfield work, though the trial itself only tested experienced devs on mature repos, so treat the "flips for juniors" line as reasonable inference rather than a measured result from this experiment.
The honest counter: this is probably the dip
The strongest argument against panicking is the J-curve. New tools cost you before they pay you, and a lot of the felt-versus-real gap is the cost arriving before the payoff. There's a supporting signal in the aggregate data: DORA has reported throughput recovering even while stability lagged, which is roughly what a team climbing out of the dip looks like. Greenfield code and junior developers also account for a large share of what gets built, and those are the cases the tools are expected to help most, even though this trial did not measure them.
So the reasonable read is not "stop using AI." It's that the current generation of tools makes experienced engineers on large legacy systems slower today, and we should stop pretending otherwise until a stopwatch says the payoff arrived.
You can watch the toolmakers concede the same point in where their money and attention went. The whole "agent-first IDE" pitch, stripped of branding, says you stop sitting at the keyboard generating and move to a dashboard where the job is reviewing what agents produced and deciding what to keep. That is a bet that the work is now verification. They're building the cockpit for the exact bottleneck this study put a clock on.
What to actually do about it
If you run a team, the discipline is one line: stop steering by how fast it feels.
- Kill felt-velocity as a metric. Don't accept "the team feels much faster" as evidence. It's the one reading we now know reads backward. Any productivity claim that lives in a feeling is unproven until the clock agrees.
- Measure what reaches production and stays standing. Lean on delivery outcomes over activity counts. Change-fail rate, time to restore, and cycle time to deploy tell you more than PRs merged or lines added. A 98% jump in merged PRs with flat delivery isn't productivity, it's motion.
- Watch review, not authorship. If review time is climbing and 31% of PRs merge unreviewed, that's your bottleneck and your risk. Re-staff it. Treat reviewer capacity as a first-class constraint, not an afterthought, and don't let generated volume outrun the humans who have to vouch for it.
- Track churn and refactoring ratio. Rising copy-paste and collapsing refactoring are leading indicators of a legacy nightmare you're building 19% slower while it feels 20% faster. If the codebase is accreting instead of being reshaped, generation is winning and comprehension is losing.
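Three of the outcome measures in that list can be computed from records most teams already have. The sketch below assumes a hypothetical `Change` record with four fields; the field names and sample data are made up, and a real version would pull them from your deploy log and PR history. Time to restore and churn classification are left out because they need incident and line-level data this sketch doesn't model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Change:
    """One change shipped to production (hypothetical record shape)."""
    opened_at: datetime
    deployed_at: datetime
    review_count: int        # human reviews recorded before merge
    caused_incident: bool    # did this deploy need a rollback or hotfix?


# All three functions assume a non-empty list of changes.
def change_fail_rate(changes):
    """Share of deployed changes that caused an incident."""
    return sum(c.caused_incident for c in changes) / len(changes)


def median_cycle_time_hours(changes):
    """Median hours from PR opened to running in production."""
    return median(
        (c.deployed_at - c.opened_at).total_seconds() / 3600 for c in changes
    )


def unreviewed_share(changes):
    """Share of changes that merged with zero human reviews."""
    return sum(c.review_count == 0 for c in changes) / len(changes)


# Made-up sample: four changes opened at the same moment.
t0 = datetime(2025, 1, 6, 9, 0)
sample = [
    Change(t0, t0 + timedelta(hours=2), review_count=1, caused_incident=False),
    Change(t0, t0 + timedelta(hours=4), review_count=0, caused_incident=True),
    Change(t0, t0 + timedelta(hours=6), review_count=2, caused_incident=False),
    Change(t0, t0 + timedelta(hours=8), review_count=1, caused_incident=False),
]

print(f"change-fail rate:  {change_fail_rate(sample):.0%}")
print(f"median cycle time: {median_cycle_time_hours(sample):.1f} h")
print(f"merged unreviewed: {unreviewed_share(sample):.0%}")
```

None of these numbers go up just because more PRs were opened, which is the point: they only move when what ships changes.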
For individuals, the takeaway is narrower and more honest. If you enjoy tab-complete and your output stays good, fine. But if you're claiming a speedup, that's a number, and you should have the number. Time yourself on a few comparable tasks with and without the assistant before you rearrange your workflow around a vibe.
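A personal version of the experiment can be this small. The sketch assumes you pick comparable tasks, let a coin flip decide which ones get the assistant before you start (as the trial did per task), and write down wall-clock minutes. The function names and the timings are hypothetical, and "speedup" here simply means percent of time saved against your own no-AI average.

```python
import random
from statistics import mean


def assign_conditions(task_names, seed=None):
    """Randomly assign each task to 'ai' or 'no_ai' before starting it."""
    rng = random.Random(seed)
    return {name: rng.choice(["ai", "no_ai"]) for name in task_names}


def measured_speedup_pct(minutes_without_ai, minutes_with_ai):
    """Percent of time saved with the assistant; negative means it cost time."""
    baseline = mean(minutes_without_ai)
    assisted = mean(minutes_with_ai)
    return (baseline - assisted) * 100 / baseline


# Made-up timings, in minutes, for comparable tasks.
without_ai = [50, 60, 70]
with_ai = [66, 72, 78]
felt_pct = 20  # the number you would have quoted from memory

measured_pct = measured_speedup_pct(without_ai, with_ai)
print(f"felt {felt_pct:+d}%, measured {measured_pct:+.0f}%")
```

Three tasks per condition is far too few to settle anything, so treat the result as a sanity check on your own gauge rather than a finding. The randomization matters more than the sample size: if you choose which tasks get the assistant, you will tend to hand it the ones that were going to go well anyway.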
The fear isn't that these tools are useless. It's that engineering organizations are being restructured on the assumption they're efficient, and one of the only rigorous studies to test that assumption directly found the gain isn't there for the people doing the hardest work. The gauge broke this summer. The teams that come out ahead are the ones that notice, and replace it, before they report a number they got from a feeling.
Sources & further reading
- The gauge broke: devs felt 20% faster with AI, measured 19% slower — intrepidkarthi.com
- AI Coding Tools Make Devs 19% Slower Despite Feeling Faster — glbgpt.com
- AI coders think they’re 20% faster — but they’re actually 19% slower — pivot-to-ai.com
- AI made devs feel 20% faster but measured 19% slower. Nobody's ready for that conversation. - DEV Community — dev.to
Rachel has been embedded in the developer tooling ecosystem for nearly eight years, covering everything from IDE wars and package-manager drama to the quiet rise of AI-assisted coding. She has a soft spot for open-source maintainers and an unhealthy number of terminal emulators installed on a single laptop.