A New Study Challenges the Hype

In the rapidly evolving world of software development, AI tools like Cursor Pro and Claude 3.5/3.7 Sonnet have been hailed as game changers, promising to supercharge productivity and streamline coding tasks. However, a recent study from METR (Model Evaluation & Threat Research) presents a surprising finding: for experienced open-source developers working on familiar codebases, AI tools may slow things down. Let’s dive into the findings of this randomized controlled trial (RCT) and explore what they mean for the future of AI in software development.
Published on July 10, 2025, METR’s study, titled "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity", set out to test how AI tools affect developers in realistic scenarios. Unlike many benchmarks that rely on synthetic tasks or algorithmic evaluations, this study focused on real-world coding tasks, such as bug fixes, feature additions, and code refactoring, in large, mature open-source repositories.
The researchers recruited 16 experienced developers, each with an average of 5 years of experience contributing to repositories boasting over 22,000 stars and 1 million lines of code. These developers tackled 246 tasks, with each task randomly assigned to either allow or disallow AI assistance. When AI was permitted, developers primarily used Cursor Pro with Claude 3.5/3.7 Sonnet, state-of-the-art tools at the time (February to March 2025). The tasks, averaging about two hours each, were recorded via screen captures, and developers self-reported their completion times.
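METR’s exact statistical model isn’t reproduced in the post, but the basic idea behind estimating a percentage slowdown from randomized task times can be sketched in a few lines. The numbers below are invented purely for illustration; only the approach (comparing log completion times across the two randomized conditions) is the point:

```python
import numpy as np

# Invented completion times in minutes -- NOT METR's data.
# Each task was randomly assigned to one of two conditions.
ai_allowed = np.array([150.0, 95.0, 210.0, 130.0, 170.0])
ai_disallowed = np.array([120.0, 80.0, 160.0, 115.0, 140.0])

# Task durations are skewed, so compare log times: the
# exponentiated difference in mean log time is the ratio of
# geometric means, a multiplicative "AI effect" on duration.
log_ratio = np.log(ai_allowed).mean() - np.log(ai_disallowed).mean()
ratio = float(np.exp(log_ratio))

print(f"Time ratio (AI vs. no AI): {ratio:.2f}")
print(f"Implied slowdown: {ratio - 1:.0%}")
```

A real analysis would also adjust for task difficulty and self-report error, which is exactly why the randomized design and screen recordings matter.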
The headline finding? When developers used AI tools, they took 19% longer to complete tasks compared to when they worked without AI. This result flies in the face of both developer expectations and industry hype. Before the study, participants predicted AI would speed them up by 24%, and even after completing the tasks, they estimated a 20% speedup. The reality—a 19% slowdown—reveals a striking gap between perception and actual performance.
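To make that gap concrete, here’s a quick back-of-the-envelope calculation using the article’s own figures: the roughly two-hour average task, the 24% predicted speedup, and the 19% measured slowdown (the 120-minute baseline is just the study’s average, not a fixed constant):

```python
# Back-of-the-envelope: what the percentages mean for an
# average two-hour (120-minute) task from the study.
baseline_min = 120.0

expected = baseline_min * (1 - 0.24)  # 24% predicted speedup
measured = baseline_min * (1 + 0.19)  # 19% measured slowdown

print(f"Expected with AI: {expected:.0f} min")  # ~91 min
print(f"Measured with AI: {measured:.0f} min")  # ~143 min
print(f"Perception gap:   {measured - expected:.0f} min")  # ~52 min
```

In other words, developers expected to save about half an hour per task and instead lost a bit over twenty minutes, a swing of more than fifty minutes between perceived and actual time.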
Why the slowdown? METR’s analysis of screen recordings offers some clues. While AI tools reduced time spent on active coding, testing, and searching for information, developers spent significantly more time prompting AI, reviewing its outputs, and waiting for responses. In some cases, they also experienced “idle/overhead time” with no activity at all. Only 44% of AI-generated code was accepted without modification, meaning developers often had to tweak or debug suggestions, which ate into their time savings.
Several factors likely contributed to the slowdown. The developers’ deep familiarity with their repositories left little room for AI to add value, and the large, mature codebases carried implicit context and quality standards that the tools couldn’t fully access. On top of that, AI suggestions were often unreliable: with fewer than half accepted as-is, the time spent prompting, reviewing, and reworking outputs outweighed the time saved on writing code.
One of the study’s most intriguing findings is the disconnect between developers’ perceptions and the actual outcomes. Despite the measured slowdown, 69% of participants continued using AI tools after the study, suggesting they found the experience less taxing or more enjoyable, even if it wasn’t faster. This aligns with broader productivity research, which shows that self-reported productivity often doesn’t match objective metrics. The reduced cognitive effort of using AI might make developers feel more productive, even when the clock tells a different story.
This perception gap has parallels in other fields. For example, studies have shown that people often overestimate the productivity gains from tools or substances (like Adderall) because they feel more engaged or energized, even when their measured output doesn’t improve. In coding, the satisfaction of seeing AI generate a quick prototype or handle boilerplate code can create an “IKEA effect,” where developers value the results more because they interacted with the tool, even if it took longer overall.
The METR study doesn’t spell doom for AI coding tools—it’s a snapshot of early-2025 capabilities in a specific context. The researchers themselves caution against overgeneralizing, noting that AI might offer greater benefits for less experienced developers, smaller projects, or unfamiliar codebases. They also point out that AI progress is rapid, and newer models (like Claude 4 Opus or Gemini 2.5 Pro, released after the study) could shift the results.
Still, the findings challenge the narrative that AI is a universal productivity booster. They highlight the importance of rigorous, real-world testing over anecdotal hype or synthetic benchmarks. As one participant put it, AI can be a “magic bullet” for tasks like prototyping or handling tedious boilerplate, but it’s not a one-size-fits-all solution. For complex, context-heavy tasks, human expertise still holds an edge.
METR plans to continue refining this methodology to track AI’s evolving impact on software development. Future studies could explore how AI performs with junior developers, greenfield projects, or different tools and models. They might also investigate whether AI’s benefits lie in areas beyond raw speed, like improving code quality or reducing burnout for developers with cognitive challenges, as one participant with ADHD noted.
For now, the study serves as a reality check. Developers and companies banking on AI to revolutionize coding should temper their expectations and invest in training to use these tools effectively. It’s also a reminder to measure actual outcomes, not just vibes. As AI continues to advance, the balance between human expertise and machine assistance will likely shift, but for now, don’t expect miracles.
For the full details, check out the study on METR’s website: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/).
What do you think? Have you found AI tools to be a help or a hindrance in your coding projects? Let’s keep the conversation going!
Ready to transform your software quality strategy? Visit NUCIDA to learn more about artificial intelligence solutions. The future of intelligent quality is here.
Want to know more? Watch our YouTube video, Ignite Your Business with Three Strategies in AI, to take your business processes to the next level.
Pictures from pixabay.com and NUCIDA Group
Article written and published by Torsten Zimmermann