AI Impact on Software Engineering

Analysis of 167,000+ PRs from 101 Open Source Companies

Research Period: January 1 - December 25, 2025 | Generated: December 27, 2025

🤖 Feed this report to your AI: report_data_for_llms.md

⚡ TL;DR: Review AI Works. Code AI? It's Complicated.

✓ What This Research SHOWS
  • Review AI → -11% cycle time (95% CI: -18% to -7%, significant)
  • Review AI → -54% review time (95% CI: -60% to -49%, significant)
  • Code AI → +16% cycle time (95% CI: +4% to +25%, significant)
✗ What This Research Does NOT Show
  • "Code AI speeds up reviews": NOT significant (CI crosses zero)
  • "AI causes faster cycles": correlation only, not an RCT
  • "Typical PR is faster": medians show the opposite (+100% slower)
  • "All teams benefit": 60% of teams show AI is SLOWER
  • "We detected all AI": Copilot (68% adoption) leaves no trace
  • "Review AI = human productivity": bots auto-comment instantly; we measure bot latency
Industry claims 85% adoption (Stack Overflow, JetBrains 2025) and massive productivity gains. The only RCT study (METR 2025) found AI made devs 19% slower. We analyzed 167,000+ PRs from 101 OSS companies to see what actually happens.
πŸ” REVIEW AI (CodeRabbit, Cubic, Greptile)
  • -54% review time β€” bots provide instant feedback
  • -11% cycle time β€” PRs merge faster
  • 73.5% of detected AI usage
⌨️ CODE AI (Cursor, Claude, Copilot, Devin)
  • +16% cycle time β€” aligns with METR RCT findings
  • -14% review time (n.s.) β€” NOT statistically significant
  • 26.3% of detected usage (real share likely much higher)
CYCLE TIME: HOW LONG UNTIL A PR MERGES?
  • 🔍 REVIEW AI: 73.4 hours avg (-11% faster) - CodeRabbit, Cubic
  • 📊 BASELINE: 82.2 hours avg (no AI) - 111,215 PRs
  • ⌨️ CODE AI: 95.6 hours avg (+16% slower) - Cursor, Claude, Copilot
Lower is better. Based on 167,000+ PRs from 101 OSS companies (74 with 500+ PRs for detailed analysis).
⚠️ Median Reality Check: The Typical PR Experience

Headlines use means. But medians tell a different story. Analysis excludes PRs >200h cycle time (~9% outliers) for statistical validity.

  • CODE AI (typical PR): +53% (5.5h vs 3.6h baseline)
  • REVIEW AI (typical PR): +11% (4.0h vs 3.6h baseline)
  • SKEW FACTOR: 6.9× (mean ÷ median, filtered)
Bottom line: The typical (median) PR tells a consistent story: both Code AI (+53%) and Review AI (+11%) correlate with slower median cycle times. Mean-based headlines can be misleading due to outliers.
Review AI: Mixed Cycle, Faster Reviews
+5% cycle time (not significant), -54% review time (significant).
Code AI: Slower Cycles, Faster Reviews
+19% cycle time (95% CI: +9% to +24%), -17% review time.
🔬 Methodology
LLM analysis of PR metadata. Outliers >200h excluded (~9%). Detection is a floor: silent tools leave no trace.

About This Research

This is a meta-analysis of 167,000+ pull requests from 101 open source companies throughout 2025, attempting to understand real-world AI coding tool adoption patterns and their impact on engineering metrics.

How It Works

  • Fetch PR metadata (title, description, comments, commits) via GitHub GraphQL API
  • Analyze each PR with LLM (Llama 3.3 70B via Groq) to detect AI tool mentions
  • Pattern matching (25+ regex patterns) provides secondary detection
  • Calculate delivery metrics: cycle time, review time, PR size
  • Correlate AI usage with delivery metrics using statistical analysis
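For illustration, the first step might look like the sketch below. It assumes a personal access token in GITHUB_TOKEN and uses one repository from the dataset as an example; the actual pipeline's pagination, retries, and field selection may differ.

    import os
    import requests

    # Hypothetical single-repo fetch of merged-PR metadata via GitHub's GraphQL API.
    QUERY = """
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        pullRequests(last: 50, states: MERGED) {
          nodes {
            number
            title
            bodyText
            createdAt
            mergedAt
            additions
            deletions
            comments(first: 20) { nodes { bodyText author { login } } }
          }
        }
      }
    }
    """

    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": "calcom", "name": "cal.com"}},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    prs = resp.json()["data"]["repository"]["pullRequests"]["nodes"]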

What We Don't Analyze

  • ❌ Actual file contents or code quality
  • ❌ Private/enterprise repositories
  • ❌ AI usage without disclosure (hidden adoption)
  • ❌ Copilot suggestions accepted without mention
⚠️ Important Disclaimer: This is an independent research project, not a peer-reviewed study. While we strive for accuracy, our detection methods have limitations: we can only identify AI usage that developers explicitly disclose. The findings represent detected AI adoption, not total AI adoption. Use these insights as directional guidance, not definitive conclusions.

🚀 Want Better Insights for YOUR Team?

This OSS study has inherent limitations. tformance solves them with native integrations:

● GitHub + Copilot API
  • Actual Copilot acceptance rates
  • Lines of code generated
  • Usage intensity (not just detected/not)
● Jira Integration
  • Story points (complexity control)
  • Issue type: bug vs feature vs refactor
  • Sprint velocity correlation
● Slack Surveys
  • Per-PR: "Did AI help?"
  • Task complexity self-report
  • Bot vs human reviewer tracking
The difference: This report shows correlations in OSS data. With your team's full data, tformance can show controlled associations, answering "Does AI actually help MY team, on which tasks, and by how much?"
Get Started with tformance →

📊 Statistical Confidence

How confident can we be in these findings? Here's a transparency breakdown of our statistical validity.

  • 167,308 PRs analyzed (large sample size)
  • ±0.16% 95% confidence interval for the overall 12.1% adoption rate
  • 21.9% standard deviation between teams (high variance)
  • p < 0.0001 for the team structure test (chi-square significance)
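As a sanity check, the ±0.16% half-width is what a standard normal-approximation confidence interval for a proportion gives at this sample size (a sketch; the report's exact method isn't stated):

    import math

    n = 167_308          # PRs analyzed
    p = 20_260 / n       # detected AI-assisted share, ~12.1%

    half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # 95% normal-approximation CI
    print(f"{p:.1%} +/- {half_width:.2%}")           # -> 12.1% +/- 0.16%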
✅ What We Can Confidently Say:
  • Overall detected AI adoption is 12.1% ± 0.16% (95% CI)
  • AI-assisted PRs show 9% faster cycle time on average
  • Review time is 50% faster for AI-assisted PRs
  • Review AI drives the gains: -11% cycle time, -54% review time
  • Code AI shows mixed results: +16% cycle time, -14% review time
  • There's massive variance between teams (0% to 86%)
⚠️ Limitations to Consider:
  • Selection bias: These are popular OSS projects, not a random sample
  • Detection bias: We only capture disclosed AI usage
  • Team confounding: Language/framework correlations overlap with team identity
  • Survivorship: Only analyzing active, successful projects
📈 Distribution Insight: Team AI adoption ranges from 0% (Huly) to 85.6% (Plane). The interquartile range is 0.9% to 13.5%, meaning half of all teams fall in that band. Aggregate stats (like "12.1% overall") hide enormous project-to-project differences: your team could reasonably be anywhere from 0% to 86%.

AI Adoption Trend β€” 2025

Overall AI-assisted PR rate across 101 OSS companies throughout 2025. Shows the growth trajectory from early adoption to mainstream usage.

  • 8.3% - January 2025
  • 16.8% - peak (July 2025)
  • 14.6% - December 2025
  • +76% - growth from January to December 2025
Trend Analysis: AI adoption grew steadily throughout 2025, peaking at 16.8% in July. With 101 OSS companies in the dataset (74 with 500+ PRs), we see month-to-month variance. The dip after July may reflect seasonal patterns or detection limitations as AI usage becomes more normalized.

Key Takeaways for Engineering Leaders

1 Mixed Tool Strategy Wins

Teams using bot + IDE + LLM combinations show better outcomes than single-tool teams. Antiwork's 61.9% adoption with mixed tools yielded -8.5% cycle time and -50.2% review time.

2 Review Velocity is Consistent

35 out of 50 teams (with comparable data) show faster review times with AI assistance. Average improvement: -52%. AI-generated code appears easier to review.

3 Autonomous Agents Growing Fast

Cubic went from 0 → 347 PRs/month (Jan→Nov). Autonomous agents (Devin, Cubic) now represent 30.7% of all AI tool usage.

4 40-60% is Sweet Spot

Very high adoption (>80%) correlates with longer cycle times. Best-performing teams (Antiwork, Trigger.dev) maintain balanced 40-60% AI usage.

5 Tool Diversification Accelerating

Jan: CodeRabbit had 95%+ share. Dec: CodeRabbit 36%, Cubic 21%, Claude 17%. The market is fragmenting rapidly.

Want these insights for your team? Connect GitHub, Jira & Slack to track AI adoption automatically.
Get Started Free →

Executive Summary

  • 167,308 total pull requests
  • 101 OSS companies
  • 12.1% AI-assisted PRs
  • -50% avg review time change
  • -9% avg cycle time change
Key Finding: On average, AI-assisted PRs show faster code review (-50%) and shorter overall cycle time (-9%). But the story is more nuanced: Review AI (CodeRabbit, Cubic) drives most of the gains (-54% review time), while Code AI (Cursor, Claude) shows mixed results (+16% cycle time).
⚠️ Correlation ≠ Causation: These findings show correlation, not causation. We cannot determine whether AI itself improves delivery or whether high-performing teams simply adopt AI more readily. Alternative explanations include selection bias toward well-maintained projects and the fact that disclosed AI usage may correlate with more structured development practices.

AI Tool Evolution (2025)

Monthly trend of AI tool usage across all teams. Shows dramatic market shift throughout the year.

Market Shift: CodeRabbit dominated early 2025 (95%+ share in Jan-Feb). By Q4, the market fragmented: Cubic emerged as #2 (347 PRs in Nov), Claude and Cursor saw 10x growth. Human-directed AI tools (Claude, Cursor, Copilot) grew from 2% to 35% of monthly usage.
⚠️ Important Context - Hidden Tool Usage:

Our tool rankings reflect disclosed usage patterns in OSS PRs, not total market share. Industry surveys show Copilot (68%) and ChatGPT (82%) dominate actual usage, yet they're nearly absent from our data.

Why? Tools that leave visible artifacts (CodeRabbit bot comments, Devin author attribution) appear dominant because we can detect them. Silent tools (Copilot autocomplete, ChatGPT for research/debugging) are likely underrepresented by 5-10x.

Tool Category Over Time

Current Tool Market Share

🔀 Code AI vs Review AI: A New Framework

Not all AI tools are equal. We categorize AI coding tools by their primary function to reveal different impact patterns.

⌨️ Code AI

Tools that write or generate code: autocomplete, code generation, refactoring assistance.

Cursor Copilot Claude Devin ChatGPT Windsurf Aider

πŸ” Review AI

Tools that review or comment on code β€” automated reviews, suggestions, quality checks.

CodeRabbit Greptile Cubic Sourcery CodeAnt
Why This Categorization Matters:
  • Different impact patterns: Code AI affects what gets written; Review AI affects what gets approved
  • Different investment decisions: Teams may need both categories for optimal outcomes
  • Detection bias awareness: Review AI (bots) is highly visible; Code AI (autocomplete) is often silent

📊 Category Breakdown in Our Data

Analysis of 25,217 detected tool usages across 167,308 PRs:

  • Review AI: 73.5% (18,534 detections) - CodeRabbit (11.1k) • Cubic (6.9k) • Greptile (469)
  • Code AI: 26.3% (6,631 detections) - Devin (1.8k) • Cursor (1.4k) • Claude (1.2k) • Copilot (823)
Detection Bias Warning: Review AI dominates because it leaves visible bot comments. Code AI (especially Copilot autocomplete) is likely underrepresented by 5-10x: industry surveys show 68% of developers use Copilot, yet we detect only 823 explicit mentions. This gap is expected: autocomplete leaves no trace.

📈 Category Impact: The Data Speaks

Comparing PRs by AI category against the non-AI baseline (111,215 PRs):

NO AI (BASELINE)
  • 25.0 hrs avg cycle time
  • 12.5 hrs avg review time
CODE AI (2,685 PRs)
  • +19% cycle time (29.7 hrs; 95% CI: +9% to +24%)
  • -17% review time (10.4 hrs; 95% CI: -29% to -9%)
REVIEW AI (10,361 PRs)
  • +5% cycle time (26.2 hrs; 95% CI: -1% to +6%)
  • -54% review time (5.8 hrs; 95% CI: -59% to -52%)
📊 Distribution Note: Cycle times are heavily right-skewed: a few PRs take weeks while most merge quickly. The median (typical PR) differs dramatically from the mean:
  • No AI: mean 82.2h, median 5.7h
  • Code AI: mean 95.6h, median 11.4h
  • Review AI: mean 73.4h, median 6.0h
Medians show the typical PR experience is much faster than means suggest. The patterns hold in both measures.
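A toy illustration (synthetic numbers, not the report's data) of how a few long-lived PRs pull the mean far above the median:

    import statistics

    # Most PRs merge within hours; a couple sit open for weeks.
    cycle_hours = [3, 4, 5, 5, 6, 8, 12, 24, 500, 900]

    print(statistics.mean(cycle_hours))    # 146.7 -> dominated by two outliers
    print(statistics.median(cycle_hours))  # 7.0   -> the "typical" PR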
✓ Review AI Insight: Automated code reviews (CodeRabbit, Cubic) correlate with 11% faster cycle times and 54% faster reviews. Bot reviews replace some human review time while catching issues early.
⚠️ Code AI Caveat: Code generation tools show +16% cycle time but -14% review time. Hypothesis: AI-generated code may require more iteration and refinement before it is ready for review.
⚡ Key Takeaway for CTOs: Review AI is the clear efficiency win with measurable time savings. Code AI's mixed results suggest it's better suited for specific use cases (refactoring, boilerplate) rather than universal adoption. Consider a hybrid strategy: Review AI for all PRs + targeted Code AI for appropriate tasks.

πŸ“ Size-Normalized Analysis: Controlling for PR Size

PR size is a potential confounding variable β€” AI-assisted PRs might be systematically larger or smaller. By normalizing review time per 100 lines of code, we control for this:

  • Baseline (No AI): 96,903 PRs, 321.4 review hrs per 100 lines (reference)
  • Code AI: 3,502 PRs, 236.5 review hrs per 100 lines (-26% vs baseline)
  • Review AI: 12,290 PRs, 122.8 review hrs per 100 lines (-62% vs baseline)
✓ Size Doesn't Explain the Results: Even after controlling for PR size, Review AI shows 62% faster reviews per 100 lines. Code AI also improves (-26%) when size-normalized, suggesting the raw metrics may understate its benefits.
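For reference, one way a size-normalized figure like this can be computed from per-PR records. Field names are assumptions, and the report's exact aggregation (pooled totals vs per-PR averages) isn't specified:

    def review_hours_per_100_lines(prs):
        """Pooled review hours per 100 changed lines over a list of PR dicts.

        Each PR is assumed to carry `review_hours` and `lines_changed`
        (additions + deletions); both field names are illustrative.
        """
        total_hours = sum(pr["review_hours"] for pr in prs)
        total_lines = sum(pr["lines_changed"] for pr in prs)
        return 100 * total_hours / total_lines if total_lines else float("nan")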

🔬 Within-Team Analysis: Controlling for Team Differences

Different teams have different baselines (fast vs slow). Comparing AI vs non-AI within each team controls for this. Teams included only if they have 10+ PRs in both groups:

  • 48 teams analyzed
  • 21 teams where AI is faster (44%)
  • 27 teams where AI is slower (56%)
✓ AI Faster (Top 5)
  • Comp AI -81%
  • OpenReplay -78%
  • Coolify -65%
  • Directus -40%
  • Resend -38%
✗ AI Slower (Top 5)
  • Appsmith +150%
  • Mattermost +140%
  • Windmill +102%
  • LangChain +97%
  • Plane +87%
⚠️ Simpson's Paradox Alert: The aggregate shows Review AI as 11% faster overall, but 60% of individual teams see AI as slower. This can happen when AI adoption correlates with other factors (e.g., teams that already ship fast adopt AI more). Aggregate stats can mislead: always check within-team patterns.
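A pandas sketch of the within-team comparison under the same inclusion rule (10+ PRs in both groups); the column names are assumptions, not the report's actual schema:

    import pandas as pd

    def within_team_deltas(df: pd.DataFrame) -> pd.Series:
        """Per-team % change in mean cycle time, AI-assisted vs not.

        Expects columns: team, ai_assisted (bool), cycle_hours. Teams with
        fewer than 10 PRs in either group are dropped, mirroring the report.
        """
        deltas = {}
        for team, g in df.groupby("team"):
            ai, base = g[g["ai_assisted"]], g[~g["ai_assisted"]]
            if len(ai) >= 10 and len(base) >= 10:
                deltas[team] = 100 * (ai["cycle_hours"].mean() / base["cycle_hours"].mean() - 1)
        return pd.Series(deltas).sort_values()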

AI Adoption by Team

74 companies with 500+ analyzed PRs in 2025 (from 101 total), sorted by AI adoption rate.

Adoption Range: AI adoption varies from 0% (GrowthBook) to 85.6% (Plane). High-adoption teams tend to use review bots (CodeRabbit) extensively.

AI Impact on Metrics

Comparing AI-assisted PRs vs non-AI PRs within each team. Green = the metric improved (decreased) with AI, Red = the metric worsened (increased).

Review Time Impact (AI vs Non-AI)

Cycle Time Impact (AI vs Non-AI)

Monthly Adoption Trends by Team

Select teams to compare their AI adoption journey throughout 2025.

Patterns: Cal.com shows dramatic growth (22.9% Jan → 78.8% Jun). Formbricks shows the opposite trend (90% Jan → 1% Dec). Teams experiment heavily, then settle on sustainable adoption levels.

Complete Team Data

Team PRs AI % Cycle Δ Review Δ Size Δ
🔬 Detection Method Comparison

Regex patterns vs LLM semantic analysis: validating our AI detection accuracy

  • 93.4% agreement rate
  • 51,800 PRs with matching results (both methods agree)
  • +1,718 additional AI PRs found (LLM catches what regex misses)
  • 53,876 PRs analyzed (89.0% of dataset)
⚡ Regex Pattern Detection
25+ patterns • Instant • Rule-based
10,626 AI-assisted PRs detected (17.2%)
  • Exact signature matching
  • Known tool footprints (CodeRabbit, Devin, etc.)
  • Fast, deterministic results
VS
🧠
LLM Semantic Analysis
ChatGPT OSS 20B + Llama 70B fallback β€’ via Groq
11,540
AI-assisted PRs detected (21.4%)
  • Context-aware understanding
  • Catches implicit AI mentions
  • Tool disambiguation (GPT-4 via Cursor β†’ Cursor)
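For illustration only, a semantic-detection call of this shape can be made with Groq's OpenAI-compatible Python SDK. The prompt, truncation limit, and model id below are placeholders, not the report's actual configuration:

    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You label pull requests. Given a PR's title, body, and comments, reply with "
        'JSON: {"ai_assisted": bool, "tools": [...]}. Attribute indirect mentions to '
        "the concrete tool (e.g. 'GPT-4 via Cursor' -> Cursor)."
    )

    def classify_pr(pr_text: str) -> str:
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile",   # placeholder model id
            temperature=0,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": pr_text[:8000]},  # crude truncation
            ],
        )
        return resp.choices[0].message.content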

Tool Detection by Method

Comparing how each method detects specific AI tools. Green delta = LLM finds more; Red delta = Regex finds more (often over-matching).

AI Tool Regex LLM Delta Insight
CodeRabbit 5,431 5,891 +460 LLM catches bot mentions in comments
Devin 1,706 1,743 +37 Strong pattern coverage
Cubic 1,490 1,590 +100 LLM understands "cubic" in context
Claude 412 724 +312 +76%: catches indirect refs
Cursor 499 702 +203 +41%: IDE mentions
Greptile 7 459 +452 65x more! Regex pattern gap
Copilot 204 403 +199 2x β€” implicit mentions
ChatGPT 471 238 -233 LLM reassigns to specific tools
💡 Key Insights from Method Comparison
  • 93.4% agreement validates that both detection methods produce consistent results
  • LLM catches 1,718 additional AI-assisted PRs (3.2%) that regex patterns miss entirely
  • Greptile detection improved 65x: a major regex pattern gap identified and fixed via LLM feedback
  • LLM correctly reassigns generic "ChatGPT" mentions to specific tools (e.g., "GPT-4 via Cursor" → Cursor)
  • Human-directed AI tools (Claude, Cursor, Copilot) have the largest detection gaps: developers don't always mention them explicitly

Detection Improvement by Team

How much more AI usage does LLM detect compared to regex patterns? Positive = LLM finds more, Negative = Regex over-matches.

Biggest LLM Improvements:
  • Deno: +408% more AI PRs detected
  • Twenty CRM: +360% improvement
  • PostHog: +200% (312 more PRs)
  • Cal.com: +476 additional AI PRs
Regex Over-Matching:
  • Vercel: regex found 135 more (likely false positives)
  • LangChain: regex found 50 more
  • LLM is more conservative with ambiguous mentions

Note: Teams with low AI usage show high percentage improvements but small absolute numbers. High-adoption teams like Plane (87%) show near-parity between methods.

🔄 LLM-to-Regex Feedback Loop

LLM analysis revealed patterns that were then added to regex detection, creating a continuous improvement cycle:

  • LLM detected @lingodotdev mentions → added to regex → 430 new Replexica detections
  • CodeRabbit author patterns identified → 6,884 total detections
  • Gap reduction: 1,717 → 1,668 (-49 PRs, a 2.9% improvement)

This validates an iterative approach: LLM finds edge cases → patterns extracted → regex updated → backfill run.

📊 Measure your team's AI adoption: tformance runs this analysis automatically on your repos.
Learn more →

AI Adoption Correlations

Analyzing 53,876 LLM-processed PRs to understand where AI tools are used most.

By PR Type

Key Insight: Refactors show the highest AI adoption (32.8%), suggesting developers lean on AI for code restructuring more than for greenfield features. CI/CD work shows the lowest adoption (9.9%); infrastructure-as-code may be less suited to current AI tools.

By Technology Category

Key Insight: Mobile development shows the highest AI adoption (42.4%), though the sample is small (118 PRs). Test-related PRs at 33% suggest AI is heavily used for test generation. DevOps is lowest at 15.6%.

By PR Size

Key Insight: Larger PRs correlate with higher AI usage: XL PRs (501+ lines) show 26.0% AI adoption vs 17.8% for XS PRs (0-10 lines), a 46% relative increase. AI tools may enable developers to tackle larger changes confidently, or larger changes may simply benefit more from AI assistance.

By Team Structure

Does team composition affect AI adoption? We measured "concentration": the percentage of PRs that come from the top 5 contributors.

Focused Teams (70%+ concentration):
  • Dub: 94.3% concentration, 83.9% AI
  • Trigger.dev: 76.0%, 45.7% AI
  • Formbricks: 70.3%, 50.5% AI

~66 contributors avg, 29.2% AI adoption

Distributed Teams (<40% concentration):
  • PostHog: 18.7% concentration, 7.5% AI
  • LangChain: 30.5%, 4.2% AI
  • Supabase: 31.5%, 12.0% AI

~452 contributors avg, 19.2% AI adoption

Key Insight: Focused teams show 52% higher AI adoption (29.2% vs 19.2%) than distributed open-source projects. Core teams can establish AI tool workflows, share knowledge, and standardize practices. Large contributor bases with occasional contributors have less consistent tooling.

💡 Recommendation: Explicit AI Disclosure in PRs

Teams with high AI detection rates (like Antiwork at 61.7%) often have a culture of explicit AI disclosure in PR descriptions:

## AI Usage
- Used Claude to scaffold the initial component structure
- Copilot assisted with test generation
- Cursor for refactoring the API layer

Benefits for Engineering Leaders:

  • Accurate measurement: know the true AI adoption rate across your team
  • Knowledge sharing: team members learn which tools work for which tasks
  • Quality insights: correlate AI usage with review feedback and bug rates
  • Onboarding: new hires learn team AI practices from PR history

Consider adding an "AI Usage" section to your PR template. Even "No AI used" is valuable data.
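One lightweight way to standardize this is a shared PR template (on GitHub, typically .github/pull_request_template.md). A suggested snippet; adapt the wording to your team:

## AI Usage
<!-- List AI tools used on this PR (Copilot, Cursor, Claude, CodeRabbit, ...) and what
     each contributed. "No AI used" is also valuable data. -->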

Methodology

Data Collection

  • Source: GitHub GraphQL API with authenticated access
  • Teams: 51 open source project teams
  • Date Range: January 1 - December 25, 2025
  • PR Fields: Title, body, comments, commits, reviews, files

Metrics Definitions

  • Cycle Time: Hours from PR creation to merge
  • Review Time: Hours from PR creation to first review
  • PR Size: Total lines added + deleted
  • Impact Delta: (AI metric - Non-AI metric) / Non-AI × 100
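These definitions translate directly into code. A minimal sketch, assuming timezone-aware datetimes and illustrative attribute names:

    from datetime import datetime

    def hours_between(start: datetime, end: datetime) -> float:
        return (end - start).total_seconds() / 3600

    def impact_delta(ai_value: float, non_ai_value: float) -> float:
        """Impact Delta: (AI metric - Non-AI metric) / Non-AI x 100."""
        return (ai_value - non_ai_value) / non_ai_value * 100

    # cycle_time  = hours_between(pr.created_at, pr.merged_at)
    # review_time = hours_between(pr.created_at, pr.first_review_at)
    # pr_size     = pr.additions + pr.deletions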

AI Detection Methods

  • LLM Detection: Llama 3.3 70B (via Groq Batch API) for semantic analysis of PR metadata
  • Pattern Detection: 25+ regex patterns for known AI signatures
  • Tool Attribution: Extracted from PR descriptions, commits, comments
  • Scope: Metadata only; file contents are not analyzed
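A few illustrative signature patterns of the kind used for this matching (simplified examples; the production set of 25+ patterns is not reproduced here):

    import re

    # Illustrative only - real patterns are tool-specific and more precise.
    AI_PATTERNS = {
        "coderabbit": re.compile(r"coderabbit(ai)?\[bot\]|coderabbit\.ai", re.I),
        "devin":      re.compile(r"devin-ai-integration\[bot\]", re.I),
        "copilot":    re.compile(r"github copilot|co-authored-by:.*copilot", re.I),
        "cursor":     re.compile(r"\bcursor\b.*(ide|agent|composer)", re.I),
    }

    def detect_tools(pr_text: str) -> list[str]:
        return [tool for tool, pattern in AI_PATTERNS.items() if pattern.search(pr_text)]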

Data Quality

  • Total PRs: 167,308
  • LLM Analyzed: 109,240 (65.3%)
  • AI-Assisted PRs: 20,260 (12.1%)
  • OSS Companies: 101 (74 with 500+ PRs)

📊 Data Pipeline: Sample Selection

Why do our category metrics show 125k PRs instead of 167k? Here's the data funnel:

167,308 total PRs (all 101 OSS companies)
    │
    ├─▶ 5,346 excluded: teams with <500 PRs (too small for stats)
    │
    └─▶ 161,962 PRs from 74 teams with 500+ PRs
            │
            ├─▶ 36,290 excluded: unmerged PRs (no cycle time)
            │     └─ cycle time requires a merge timestamp
            │
            └─▶ 125,573 merged PRs
                  └─ this is what category_metrics.csv contains
                  └─ used for cycle time & review time analysis
Why exclude unmerged PRs? Cycle time (PR creation → merge) is undefined for PRs that never merged. Draft PRs, abandoned PRs, and still-open PRs are excluded from timing analysis but counted in adoption rates.
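The funnel reduces to two filters; a pandas sketch with assumed column names (team, pr_id, merged_at), not the pipeline's actual code:

    import pandas as pd

    def build_category_sample(prs: pd.DataFrame) -> pd.DataFrame:
        """Reproduce the funnel: drop small teams, then drop unmerged PRs."""
        team_sizes = prs.groupby("team")["pr_id"].transform("count")
        large_teams = prs[team_sizes >= 500]                   # 74 teams, 161,962 PRs
        return large_teams[large_teams["merged_at"].notna()]   # ~125,573 merged PRs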

📚 Industry Context

How do our findings compare to major developer surveys? Understanding the gap helps contextualize our results.

  • 12.1% - Our report (detected): explicit AI mentions in 161k OSS PRs
  • 84% - Stack Overflow 2025: using or planning to use AI
  • 51% - SO 2025 daily users: professional devs using AI daily
  • 85% - JetBrains 2025: regularly use AI for coding
  • +19% - METR 2025 (RCT): AI made devs slower in a controlled study
🔬 The Only Randomized Controlled Trial (RCT)

METR's 2025 study is the only randomized controlled trial measuring AI tool impact on developer productivity. Key findings:

  • Developers with AI tools took 19% longer to complete tasks
  • Yet developers believed they were 20% faster, a 39-point gap between perception and reality
  • Sample: 16 experienced devs, 246 issues from major OSS repos

Why this matters: Survey data (SO, JetBrains) captures developer perception; RCT captures reality. Our behavioral PR data aligns more closely with RCT findings than survey sentiment.

⚠️ Important Caveat on Study Design: METR ran an RCT (causal): randomized assignment supports causal inference. This report is observational (correlational): we observe what happens but cannot prove AI caused faster or slower cycles. Confounders like team culture, PR complexity, or developer experience could explain the differences. The within-team and size-normalized analyses help control for some confounders, but true causation requires controlled experiments.

📊 Understanding the 4x Gap: Why Our Data Differs

Metric Value What It Measures Why Different
Our Report 12.1% PRs with explicit AI disclosure Floor: only visible mentions
SO 2025 Daily 51% Devs using AI daily at work Self-reported, includes silent use
SO 2025 Total 84% Using or planning to use Ceiling: includes "planning"
JetBrains 2025 85% Regularly use AI for coding Self-reported, all usage types
METR 2025 (RCT) +19% slower Task completion time with AI Controlled experiment, not survey
🧊 The Iceberg Analogy:

Think of AI usage as an iceberg. Our 12.1% is the visible tip: PRs where developers explicitly mention AI tools. The hidden 80% includes Copilot autocomplete (accepted silently), ChatGPT brainstorming (never documented), and AI-assisted debugging (no trace in the PR). Survey data captures the whole iceberg; we only see what surfaces in PR metadata.

Why This Matters: Our 12.1% is a conservative floor, not a ceiling. The real AI adoption rate in these teams is likely 40-60% based on industry benchmarks. Our data shows disclosed, attributable AI usage: valuable for understanding tool-specific patterns, but not total AI penetration.
  • Copilot autocomplete: 68% of devs use it (SO 2025), but it's rarely mentioned in PRs
  • ChatGPT research: 82% use it, but for learning/debugging, and it goes undisclosed
  • OSS disclosure norms: OSS may have lower disclosure rates than enterprise
  • Our unique value: we capture what developers choose to share, showing tool attribution patterns

Key Survey Insights (2025)

  • Trust declining: Only 33% trust AI outputs (SO 2025), down from 43% in 2024
  • Productivity claims: 52% say AI improved productivity (SO 2025)
  • Time saved: 88% save 1+ hour/week, 19% save 8+ hours (JetBrains 2025)
  • Coding assistants: 62% use AI coding assistant or agent (JetBrains 2025)

Our Unique Contribution

  • Behavioral data: PR metadata, not self-reported surveys
  • Tool attribution: which specific tools are used where
  • Metric correlation: AI vs cycle time and review time
  • Trend analysis: month-over-month adoption changes
📖 Sources: Stack Overflow 2025 • JetBrains 2025 • METR 2025 RCT • Stack Overflow 2024 (for trend comparison)

🎯 Action Items for Your Team

Based on our findings, here's what engineering leaders should evaluate in their own teams.

📊 Measure Your Baseline

Do you know your team's current AI adoption rate? Without measurement, you can't improve. Start tracking AI usage in PRs; even informal surveys help.

πŸ“ Add AI Disclosure to PR Template

High-adoption teams explicitly disclose AI usage. Add an "## AI Usage" section to your PR template. Even "No AI used" is valuable data.

🔧 Diversify Your Tool Stack

Top performers use a mix of bots (CodeRabbit), IDEs (Cursor), and LLMs (Claude). Single-tool teams show worse outcomes than multi-tool teams.

βš–οΈ Target 40-60% Adoption

Very high adoption (>80%) correlates with longer cycle times. The sweet spot appears to be balanced usage where AI augments rather than replaces human judgment.

πŸ” Watch Review Velocity

AI-assisted PRs typically get reviewed faster (-31% avg). If your team isn't seeing this benefit, investigate the quality of AI-generated code.

👥 Consider Team Structure

Focused teams (small core) show 52% higher AI adoption. If you have many occasional contributors, AI tool standardization may be harder.

Want These Insights for Your Team?

tformance connects to your GitHub, Jira, and Slack to automatically measure AI adoption, correlate it with delivery metrics, and surface actionable insights; no manual tracking required.

Get Started → Request Demo

About the Author


Oleksii Ianchuk

Technical Product Manager

Eight years building developer tools. Technical Product Lead at Mailtrap: shipped Email API/SMTP, ran pricing experiments, and onboarded enterprise accounts.

"We used Copilot, Cursor, and every AI tool we could get. But I couldn't answer: 'Is this actually helping us ship faster?'"
GitHub LinkedIn