AI Impact on Software Engineering

Analysis of 167,000+ PRs from 101 Open Source Companies

Research Period: January 1 - December 25, 2025 | Generated: December 27, 2025

🤖 Feed this report to your AI: report_data_for_llms.md

⚡ TL;DR: Review AI Works. Code AI? It's Complicated.

✓ What This Research SHOWS
  • Review AI → -11% cycle time (95% CI: -18% to -7%, significant)
  • Review AI → -54% review time (95% CI: -60% to -49%, significant)
  • Code AI → +16% cycle time (95% CI: +4% to +25%, significant)
✗ What This Research Does NOT Show
  • "Code AI speeds up reviews": NOT significant (CI crosses zero)
  • "AI causes faster cycles": correlation only, not an RCT
  • "Typical PR is faster": medians show the opposite (+100% slower)
  • "All teams benefit": 60% of teams show AI is SLOWER
  • "We detected all AI": Copilot (68% adoption) leaves no trace
  • "Review AI = human productivity": bots auto-comment instantly; we measure bot latency
Industry claims 85% adoption (Stack Overflow, JetBrains 2025) and massive productivity gains. The only RCT study (METR 2025) found AI made devs 19% slower. We analyzed 167,000+ PRs from 101 OSS companies to see what actually happens.
πŸ” REVIEW AI (CodeRabbit, Cubic, Greptile)
  • -54% review time β€” bots provide instant feedback
  • -11% cycle time β€” PRs merge faster
  • 73.5% of detected AI usage
⌨️ CODE AI (Cursor, Claude, Copilot, Devin)
  • +16% cycle time β€” aligns with METR RCT findings
  • -14% review time (n.s.) β€” NOT statistically significant
  • 26.3% of detected usage (real share likely much higher)
CYCLE TIME: HOW LONG UNTIL A PR MERGES?
  • 🔍 REVIEW AI: 73.4 hours avg (-11% faster) - CodeRabbit, Cubic
  • 📊 BASELINE: 82.2 hours avg (no AI) - 111,215 PRs
  • ⌨️ CODE AI: 95.6 hours avg (+16% slower) - Cursor, Claude, Copilot
Lower is better. Based on 167,000+ PRs from 101 OSS companies (74 with 500+ PRs for detailed analysis).
⚠️ Median Reality Check: The Typical PR Experience

Headlines use means. But medians tell a different story. Analysis excludes PRs >200h cycle time (~9% outliers) for statistical validity.

  • CODE AI (typical PR): +53% (5.5h vs 3.6h baseline)
  • REVIEW AI (typical PR): +11% (4.0h vs 3.6h baseline)
  • SKEW FACTOR: 6.9× (mean ÷ median, filtered)
Bottom line: The typical (median) PR tells a consistent story: both Code AI (+53%) and Review AI (+11%) correlate with slower median cycle times. Mean-based headlines can be misleading due to outliers.
Review AI: Mixed Cycle, Faster Reviews
+5% cycle time (not significant), -54% review time (significant).
Code AI: Slower Cycles, Faster Reviews
+19% cycle time (95% CI: +9% to +24%), -17% review time.
🔬 Methodology
LLM analysis of PR metadata. Outliers >200h excluded (~9%). Detection is a floor: silent tools leave no trace.

About This Research

This is a meta-analysis of 167,000+ pull requests from 101 open source companies throughout 2025, attempting to understand real-world AI coding tool adoption patterns and their impact on engineering metrics.

How It Works

  • Fetch PR metadata (title, description, comments, commits) via GitHub GraphQL API
  • Analyze each PR with LLM (Llama 3.3 70B via Groq) to detect AI tool mentions
  • Pattern matching (25+ regex patterns) provides secondary detection
  • Calculate delivery metrics: cycle time, review time, PR size
  • Correlate AI usage with delivery metrics using statistical analysis
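For illustration, the first step might look like the sketch below. It assumes a personal access token in GITHUB_TOKEN and uses one repository from the dataset as an example; the actual pipeline's pagination, retries, and field selection may differ.

    import os
    import requests

    # Hypothetical single-repo fetch of merged-PR metadata via GitHub's GraphQL API.
    QUERY = """
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        pullRequests(last: 50, states: MERGED) {
          nodes {
            number
            title
            bodyText
            createdAt
            mergedAt
            additions
            deletions
            comments(first: 20) { nodes { bodyText author { login } } }
          }
        }
      }
    }
    """

    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": QUERY, "variables": {"owner": "calcom", "name": "cal.com"}},
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    prs = resp.json()["data"]["repository"]["pullRequests"]["nodes"]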

What We Don't Analyze

  • ❌ Actual file contents or code quality
  • ❌ Private/enterprise repositories
  • ❌ AI usage without disclosure (hidden adoption)
  • ❌ Copilot suggestions accepted without mention
⚠️ Important Disclaimer: This is an independent research project, not a peer-reviewed study. While we strive for accuracy, our detection methods have limitations: we can only identify AI usage that developers explicitly disclose. The findings represent detected AI adoption, not total AI adoption. Use these insights as directional guidance, not definitive conclusions.

🚀 Want Better Insights for YOUR Team?

This OSS study has inherent limitations. tformance solves them with native integrations:

● GitHub + Copilot API
  • Actual Copilot acceptance rates
  • Lines of code generated
  • Usage intensity (not just detected/not)
● Jira Integration
  • Story points (complexity control)
  • Issue type: bug vs feature vs refactor
  • Sprint velocity correlation
● Slack Surveys
  • Per-PR: "Did AI help?"
  • Task complexity self-report
  • Bot vs human reviewer tracking
The difference: This report shows correlations in OSS data. With your team's full data, tformance can show controlled associations, answering "Does AI actually help MY team, on which tasks, and by how much?"
Get Started with tformance →

📊 Statistical Confidence

How confident can we be in these findings? Here's a transparency breakdown of our statistical validity.

  • 167,308 PRs analyzed (large sample size)
  • ±0.16% 95% confidence interval for the overall 12.1% adoption rate
  • 21.9% standard deviation between teams (high variance)
  • p < 0.0001 for the team structure test (chi-square significance)
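As a sanity check, the ±0.16% half-width is what a standard normal-approximation confidence interval for a proportion gives at this sample size (a sketch; the report's exact method isn't stated):

    import math

    n = 167_308          # PRs analyzed
    p = 20_260 / n       # detected AI-assisted share, ~12.1%

    half_width = 1.96 * math.sqrt(p * (1 - p) / n)   # 95% normal-approximation CI
    print(f"{p:.1%} +/- {half_width:.2%}")           # -> 12.1% +/- 0.16%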
✅ What We Can Confidently Say:
  • Overall detected AI adoption is 12.1% ± 0.16% (95% CI)
  • AI-assisted PRs show 9% faster cycle time on average
  • Review time is 50% faster for AI-assisted PRs
  • Review AI drives the gains: -11% cycle time, -54% review time
  • Code AI shows mixed results: +16% cycle time, -14% review time
  • There's massive variance between teams (0% to 86%)
⚠️ Limitations to Consider:
  • Selection bias: These are popular OSS projects, not a random sample
  • Detection bias: We only capture disclosed AI usage
  • Team confounding: Language/framework correlations overlap with team identity
  • Survivorship: Only analyzing active, successful projects
📈 Distribution Insight: Team AI adoption ranges from 0% (Huly) to 85.6% (Plane). The interquartile range is 0.9% to 13.5%, meaning half of all teams fall in that band. Aggregate stats (like "12.1% overall") hide enormous project-to-project differences: your team could reasonably be anywhere from 0% to 86%.

AI Adoption Trend β€” 2025

Overall AI-assisted PR rate across 101 OSS companies throughout 2025. Shows the growth trajectory from early adoption to mainstream usage.

  • 8.3% - January 2025
  • 16.8% - peak (July 2025)
  • 14.6% - December 2025
  • +76% - growth from January to December 2025
Trend Analysis: AI adoption grew steadily throughout 2025, peaking at 16.8% in July. With 101 OSS companies in the dataset (74 with 500+ PRs), we see month-to-month variance. The dip after July may reflect seasonal patterns or detection limitations as AI usage becomes more normalized.

Key Takeaways for Engineering Leaders

1 Mixed Tool Strategy Wins

Teams using bot + IDE + LLM combinations show better outcomes than single-tool teams. Antiwork's 61.9% adoption with mixed tools yielded -8.5% cycle time and -50.2% review time.

2 Review Velocity is Consistent

35 out of 50 teams (with comparable data) show faster review times with AI assistance. Average improvement: -52%. AI-generated code appears easier to review.

3 Autonomous Agents Growing Fast

Cubic went from 0 → 347 PRs/month (Jan→Nov). Autonomous agents (Devin, Cubic) now represent 30.7% of all AI tool usage.

4 40-60% is Sweet Spot

Very high adoption (>80%) correlates with longer cycle times. Best-performing teams (Antiwork, Trigger.dev) maintain balanced 40-60% AI usage.

5 Tool Diversification Accelerating

Jan: CodeRabbit had 95%+ share. Dec: CodeRabbit 36%, Cubic 21%, Claude 17%. The market is fragmenting rapidly.

Want these insights for your team? Connect GitHub, Jira & Slack to track AI adoption automatically.
Get Started Free →

Executive Summary

  • 167,308 total pull requests
  • 101 OSS companies
  • 12.1% AI-assisted PRs
  • -50% avg review time change
  • -9% avg cycle time change
Key Finding: On average, AI-assisted PRs show faster code review (-50%) and shorter overall cycle time (-9%). But the story is more nuanced: Review AI (CodeRabbit, Cubic) drives most of the gains (-54% review time), while Code AI (Cursor, Claude) shows mixed results (+16% cycle time).
⚠️ Correlation ≠ Causation: These findings show correlation, not causation. We cannot determine whether AI itself improves delivery or whether high-performing teams simply adopt AI more readily. Alternative explanations include selection bias toward well-maintained projects and the fact that disclosed AI usage may correlate with more structured development practices.

AI Tool Evolution (2025)

Monthly trend of AI tool usage across all teams. Shows dramatic market shift throughout the year.

Market Shift: CodeRabbit dominated early 2025 (95%+ share in Jan-Feb). By Q4, the market fragmented: Cubic emerged as #2 (347 PRs in Nov), Claude and Cursor saw 10x growth. Human-directed AI tools (Claude, Cursor, Copilot) grew from 2% to 35% of monthly usage.
⚠️ Important Context - Hidden Tool Usage:

Our tool rankings reflect disclosed usage patterns in OSS PRs, not total market share. Industry surveys show Copilot (68%) and ChatGPT (82%) dominate actual usage, yet they're nearly absent from our data.

Why? Tools that leave visible artifacts (CodeRabbit bot comments, Devin author attribution) appear dominant because we can detect them. Silent tools (Copilot autocomplete, ChatGPT for research/debugging) are likely underrepresented by 5-10x.

Tool Category Over Time

Current Tool Market Share

🔀 Code AI vs Review AI: A New Framework

Not all AI tools are equal. We categorize AI coding tools by their primary function to reveal different impact patterns.

⌨️ Code AI

Tools that write or generate code: autocomplete, code generation, refactoring assistance.

Cursor Copilot Claude Devin ChatGPT Windsurf Aider

πŸ” Review AI

Tools that review or comment on code β€” automated reviews, suggestions, quality checks.

CodeRabbit Greptile Cubic Sourcery CodeAnt
Why This Categorization Matters:
  • Different impact patterns: Code AI affects what gets written; Review AI affects what gets approved
  • Different investment decisions: Teams may need both categories for optimal outcomes
  • Detection bias awareness: Review AI (bots) is highly visible; Code AI (autocomplete) is often silent

📊 Category Breakdown in Our Data

Analysis of 25,217 detected tool usages across 167,308 PRs:

  • Review AI: 73.5% (18,534 detections) - CodeRabbit (11.1k) • Cubic (6.9k) • Greptile (469)
  • Code AI: 26.3% (6,631 detections) - Devin (1.8k) • Cursor (1.4k) • Claude (1.2k) • Copilot (823)
Detection Bias Warning: Review AI dominates because it leaves visible bot comments. Code AI (especially Copilot autocomplete) is likely underrepresented by 5-10x: industry surveys show 68% of developers use Copilot, yet we detect only 823 explicit mentions. This gap is expected: autocomplete leaves no trace.

📈 Category Impact: The Data Speaks

Comparing PRs by AI category against the non-AI baseline (111,215 PRs):

NO AI (BASELINE)
  • 25.0 hrs avg cycle time
  • 12.5 hrs avg review time
CODE AI (2,685 PRs)
  • +19% cycle time (29.7 hrs; 95% CI: +9% to +24%)
  • -17% review time (10.4 hrs; 95% CI: -29% to -9%)
REVIEW AI (10,361 PRs)
  • +5% cycle time (26.2 hrs; 95% CI: -1% to +6%)
  • -54% review time (5.8 hrs; 95% CI: -59% to -52%)
📊 Distribution Note: Cycle times are heavily right-skewed: a few PRs take weeks while most merge quickly. The median (typical PR) differs dramatically from the mean:
  • No AI: mean 82.2h, median 5.7h
  • Code AI: mean 95.6h, median 11.4h
  • Review AI: mean 73.4h, median 6.0h
Medians show the typical PR experience is much faster than means suggest. The patterns hold in both measures.
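A toy illustration (synthetic numbers, not the report's data) of how a few long-lived PRs pull the mean far above the median:

    import statistics

    # Most PRs merge within hours; a couple sit open for weeks.
    cycle_hours = [3, 4, 5, 5, 6, 8, 12, 24, 500, 900]

    print(statistics.mean(cycle_hours))    # 146.7 -> dominated by two outliers
    print(statistics.median(cycle_hours))  # 7.0   -> the "typical" PR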
✓ Review AI Insight: Automated code reviews (CodeRabbit, Cubic) correlate with 11% faster cycle times and 54% faster reviews. Bot reviews replace some human review time while catching issues early.
⚠️ Code AI Caveat: Code generation tools show +16% cycle time but -14% review time. Hypothesis: AI-generated code may require more iteration and refinement before it is ready for review.
⚡ Key Takeaway for CTOs: Review AI is the clear efficiency win with measurable time savings. Code AI's mixed results suggest it's better suited for specific use cases (refactoring, boilerplate) rather than universal adoption. Consider a hybrid strategy: Review AI for all PRs + targeted Code AI for appropriate tasks.

πŸ“ Size-Normalized Analysis: Controlling for PR Size

PR size is a potential confounding variable β€” AI-assisted PRs might be systematically larger or smaller. By normalizing review time per 100 lines of code, we control for this:

  • Baseline (No AI): 96,903 PRs, 321.4 review hrs per 100 lines (reference)
  • Code AI: 3,502 PRs, 236.5 review hrs per 100 lines (-26% vs baseline)
  • Review AI: 12,290 PRs, 122.8 review hrs per 100 lines (-62% vs baseline)
✓ Size Doesn't Explain the Results: Even after controlling for PR size, Review AI shows 62% faster reviews per 100 lines. Code AI also improves (-26%) when size-normalized, suggesting the raw metrics may understate its benefits.
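For reference, one way a size-normalized figure like this can be computed from per-PR records. Field names are assumptions, and the report's exact aggregation (pooled totals vs per-PR averages) isn't specified:

    def review_hours_per_100_lines(prs):
        """Pooled review hours per 100 changed lines over a list of PR dicts.

        Each PR is assumed to carry `review_hours` and `lines_changed`
        (additions + deletions); both field names are illustrative.
        """
        total_hours = sum(pr["review_hours"] for pr in prs)
        total_lines = sum(pr["lines_changed"] for pr in prs)
        return 100 * total_hours / total_lines if total_lines else float("nan")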

🔬 Within-Team Analysis: Controlling for Team Differences

Different teams have different baselines (fast vs slow). Comparing AI vs non-AI within each team controls for this. Teams included only if they have 10+ PRs in both groups:

  • 48 teams analyzed
  • 21 teams where AI is faster (44%)
  • 27 teams where AI is slower (56%)
✓ AI Faster (Top 5)
  • Comp AI -81%
  • OpenReplay -78%
  • Coolify -65%
  • Directus -40%
  • Resend -38%
✗ AI Slower (Top 5)
  • Appsmith +150%
  • Mattermost +140%
  • Windmill +102%
  • LangChain +97%
  • Plane +87%
⚠️ Simpson's Paradox Alert: The aggregate shows Review AI as 11% faster overall, but 60% of individual teams see AI as slower. This can happen when AI adoption correlates with other factors (e.g., teams that already ship fast adopt AI more). Aggregate stats can mislead: always check within-team patterns.
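A pandas sketch of the within-team comparison under the same inclusion rule (10+ PRs in both groups); the column names are assumptions, not the report's actual schema:

    import pandas as pd

    def within_team_deltas(df: pd.DataFrame) -> pd.Series:
        """Per-team % change in mean cycle time, AI-assisted vs not.

        Expects columns: team, ai_assisted (bool), cycle_hours. Teams with
        fewer than 10 PRs in either group are dropped, mirroring the report.
        """
        deltas = {}
        for team, g in df.groupby("team"):
            ai, base = g[g["ai_assisted"]], g[~g["ai_assisted"]]
            if len(ai) >= 10 and len(base) >= 10:
                deltas[team] = 100 * (ai["cycle_hours"].mean() / base["cycle_hours"].mean() - 1)
        return pd.Series(deltas).sort_values()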

AI Adoption by Team

74 companies with 500+ analyzed PRs in 2025 (from 101 total), sorted by AI adoption rate.

Adoption Range: AI adoption varies from 0% (GrowthBook) to 85.6% (Plane). High-adoption teams tend to use review bots (CodeRabbit) extensively.

AI Impact on Metrics

Comparing AI-assisted PRs vs non-AI PRs within each team. Green = the metric improved (decreased) with AI, Red = the metric worsened (increased).

Review Time Impact (AI vs Non-AI)

Cycle Time Impact (AI vs Non-AI)

Monthly Adoption Trends by Team

Select teams to compare their AI adoption journey throughout 2025.

Patterns: Cal.com shows dramatic growth (22.9% Jan → 78.8% Jun). Formbricks shows the opposite trend (90% Jan → 1% Dec). Teams experiment heavily, then settle on sustainable adoption levels.

Complete Team Data

Team PRs AI % Cycle Δ Review Δ Size Δ
🔬 Detection Method Comparison

Regex patterns vs LLM semantic analysis: validating our AI detection accuracy

  • 93.4% agreement rate
  • 51,800 PRs with matching results (both methods agree)
  • +1,718 additional AI PRs found (LLM catches what regex misses)
  • 53,876 PRs analyzed (89.0% of dataset)
⚡ Regex Pattern Detection
25+ patterns • Instant • Rule-based
10,626 AI-assisted PRs detected (17.2%)
  • Exact signature matching
  • Known tool footprints (CodeRabbit, Devin, etc.)
  • Fast, deterministic results
VS
🧠
LLM Semantic Analysis
ChatGPT OSS 20B + Llama 70B fallback β€’ via Groq
11,540
AI-assisted PRs detected (21.4%)
  • Context-aware understanding
  • Catches implicit AI mentions
  • Tool disambiguation (GPT-4 via Cursor β†’ Cursor)
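For illustration only, a semantic-detection call of this shape can be made with Groq's OpenAI-compatible Python SDK. The prompt, truncation limit, and model id below are placeholders, not the report's actual configuration:

    from groq import Groq

    client = Groq()  # reads GROQ_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You label pull requests. Given a PR's title, body, and comments, reply with "
        'JSON: {"ai_assisted": bool, "tools": [...]}. Attribute indirect mentions to '
        "the concrete tool (e.g. 'GPT-4 via Cursor' -> Cursor)."
    )

    def classify_pr(pr_text: str) -> str:
        resp = client.chat.completions.create(
            model="llama-3.3-70b-versatile",   # placeholder model id
            temperature=0,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": pr_text[:8000]},  # crude truncation
            ],
        )
        return resp.choices[0].message.content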

Tool Detection by Method

Comparing how each method detects specific AI tools. Green delta = LLM finds more; Red delta = Regex finds more (often over-matching).

AI Tool Regex LLM Delta Insight
CodeRabbit 5,431 5,891 +460 LLM catches bot mentions in comments
Devin 1,706 1,743 +37 Strong pattern coverage
Cubic 1,490 1,590 +100 LLM understands "cubic" in context
Claude 412 724 +312 +76%: catches indirect refs
Cursor 499 702 +203 +41%: IDE mentions
Greptile 7 459 +452 65x more! Regex pattern gap
Copilot 204 403 +199 2x β€” implicit mentions
ChatGPT 471 238 -233 LLM reassigns to specific tools
💡 Key Insights from Method Comparison
  • 93.4% agreement validates that both detection methods produce consistent results
  • LLM catches 1,718 additional AI-assisted PRs (3.2%) that regex patterns miss entirely
  • Greptile detection improved 65x: a major regex pattern gap identified and fixed via LLM feedback
  • LLM correctly reassigns generic "ChatGPT" mentions to specific tools (e.g., "GPT-4 via Cursor" → Cursor)
  • Human-directed AI tools (Claude, Cursor, Copilot) have the largest detection gaps: developers don't always mention them explicitly

Detection Improvement by Team

How much more AI usage does LLM detect compared to regex patterns? Positive = LLM finds more, Negative = Regex over-matches.

Biggest LLM Improvements:
  • Deno: +408% more AI PRs detected
  • Twenty CRM: +360% improvement
  • PostHog: +200% (312 more PRs)
  • Cal.com: +476 additional AI PRs
Regex Over-Matching:
  • Vercel: regex found 135 more (likely false positives)
  • LangChain: regex found 50 more
  • LLM is more conservative with ambiguous mentions

Note: Teams with low AI usage show high percentage improvements but small absolute numbers. High-adoption teams like Plane (87%) show near-parity between methods.

🔄 LLM-to-Regex Feedback Loop

LLM analysis revealed patterns that were then added to regex detection, creating a continuous improvement cycle:

  • LLM detected @lingodotdev mentions → added to regex → 430 new Replexica detections
  • CodeRabbit author patterns identified → 6,884 total detections
  • Gap reduction: 1,717 → 1,668 (-49 PRs, a 2.9% improvement)

This validates an iterative approach: LLM finds edge cases → patterns extracted → regex updated → backfill run.

📊 Measure your team's AI adoption: tformance runs this analysis automatically on your repos.
Learn more →

AI Adoption Correlations

Analyzing 53,876 LLM-processed PRs to understand where AI tools are used most.

By PR Type

Key Insight: Refactors show the highest AI adoption (32.8%), suggesting developers lean on AI for code restructuring more than for greenfield features. CI/CD work shows the lowest adoption (9.9%); infrastructure-as-code may be less suited to current AI tools.

By Technology Category

Key Insight: Mobile development shows the highest AI adoption (42.4%), though the sample is small (118 PRs). Test-related PRs at 33% suggest AI is heavily used for test generation. DevOps is lowest at 15.6%.

By PR Size

Key Insight: Larger PRs correlate with higher AI usage: XL PRs (501+ lines) show 26.0% AI adoption vs 17.8% for XS PRs (0-10 lines), a 46% relative increase. AI tools may enable developers to tackle larger changes confidently, or larger changes may simply benefit more from AI assistance.

By Team Structure

Does team composition affect AI adoption? We measured "concentration": the percentage of PRs that come from the top 5 contributors.

Focused Teams (70%+ concentration):
  • Dub: 94.3% concentration, 83.9% AI
  • Trigger.dev: 76.0%, 45.7% AI
  • Formbricks: 70.3%, 50.5% AI

~66 contributors avg, 29.2% AI adoption

Distributed Teams (<40% concentration):
  • PostHog: 18.7% concentration, 7.5% AI
  • LangChain: 30.5%, 4.2% AI
  • Supabase: 31.5%, 12.0% AI

~452 contributors avg, 19.2% AI adoption

Key Insight: Focused teams show 52% higher AI adoption (29.2% vs 19.2%) than distributed open-source projects. Core teams can establish AI tool workflows, share knowledge, and standardize practices. Large contributor bases with occasional contributors have less consistent tooling.

💡 Recommendation: Explicit AI Disclosure in PRs

Teams with high AI detection rates (like Antiwork at 61.7%) often have a culture of explicit AI disclosure in PR descriptions:

## AI Usage
- Used Claude to scaffold the initial component structure
- Copilot assisted with test generation
- Cursor for refactoring the API layer

Benefits for Engineering Leaders:

  • Accurate measurement: know the true AI adoption rate across your team
  • Knowledge sharing: team members learn which tools work for which tasks
  • Quality insights: correlate AI usage with review feedback and bug rates
  • Onboarding: new hires learn team AI practices from PR history

Consider adding an "AI Usage" section to your PR template. Even "No AI used" is valuable data.
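One lightweight way to standardize this is a shared PR template (on GitHub, typically .github/pull_request_template.md). A suggested snippet; adapt the wording to your team:

## AI Usage
<!-- List AI tools used on this PR (Copilot, Cursor, Claude, CodeRabbit, ...) and what
     each contributed. "No AI used" is also valuable data. -->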

Methodology

Data Collection

  • Source: GitHub GraphQL API with authenticated access
  • Teams: 51 open source project teams
  • Date Range: January 1 - December 25, 2025
  • PR Fields: Title, body, comments, commits, reviews, files

Metrics Definitions

  • Cycle Time: Hours from PR creation to merge
  • Review Time: Hours from PR creation to first review
  • PR Size: Total lines added + deleted
  • Impact Delta: (AI metric - Non-AI metric) / Non-AI × 100
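These definitions translate directly into code. A minimal sketch, assuming timezone-aware datetimes and illustrative attribute names:

    from datetime import datetime

    def hours_between(start: datetime, end: datetime) -> float:
        return (end - start).total_seconds() / 3600

    def impact_delta(ai_value: float, non_ai_value: float) -> float:
        """Impact Delta: (AI metric - Non-AI metric) / Non-AI x 100."""
        return (ai_value - non_ai_value) / non_ai_value * 100

    # cycle_time  = hours_between(pr.created_at, pr.merged_at)
    # review_time = hours_between(pr.created_at, pr.first_review_at)
    # pr_size     = pr.additions + pr.deletions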

AI Detection Methods

  • LLM Detection: Llama 3.3 70B (via Groq Batch API) for semantic analysis of PR metadata
  • Pattern Detection: 25+ regex patterns for known AI signatures
  • Tool Attribution: Extracted from PR descriptions, commits, comments
  • Scope: Metadata only; file contents are not analyzed
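A few illustrative signature patterns of the kind used for this matching (simplified examples; the production set of 25+ patterns is not reproduced here):

    import re

    # Illustrative only - real patterns are tool-specific and more precise.
    AI_PATTERNS = {
        "coderabbit": re.compile(r"coderabbit(ai)?\[bot\]|coderabbit\.ai", re.I),
        "devin":      re.compile(r"devin-ai-integration\[bot\]", re.I),
        "copilot":    re.compile(r"github copilot|co-authored-by:.*copilot", re.I),
        "cursor":     re.compile(r"\bcursor\b.*(ide|agent|composer)", re.I),
    }

    def detect_tools(pr_text: str) -> list[str]:
        return [tool for tool, pattern in AI_PATTERNS.items() if pattern.search(pr_text)]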

Data Quality

  • Total PRs: 167,308
  • LLM Analyzed: 109,240 (65.3%)
  • AI-Assisted PRs: 20,260 (12.1%)
  • OSS Companies: 101 (74 with 500+ PRs)

📊 Data Pipeline: Sample Selection

Why do our category metrics show 125k PRs instead of 167k? Here's the data funnel:

167,308 total PRs (all 101 OSS companies)
    │
    ├─▶ 5,346 excluded: teams with <500 PRs (too small for stats)
    │
    └─▶ 161,962 PRs from 74 teams with 500+ PRs
            │
            ├─▶ 36,290 excluded: unmerged PRs (no cycle time)
            │     └─ cycle time requires a merge timestamp
            │
            └─▶ 125,573 merged PRs
                  └─ this is what category_metrics.csv contains
                  └─ used for cycle time & review time analysis
Why exclude unmerged PRs? Cycle time (PR creation → merge) is undefined for PRs that never merged. Draft PRs, abandoned PRs, and still-open PRs are excluded from timing analysis but counted in adoption rates.
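The funnel reduces to two filters; a pandas sketch with assumed column names (team, pr_id, merged_at), not the pipeline's actual code:

    import pandas as pd

    def build_category_sample(prs: pd.DataFrame) -> pd.DataFrame:
        """Reproduce the funnel: drop small teams, then drop unmerged PRs."""
        team_sizes = prs.groupby("team")["pr_id"].transform("count")
        large_teams = prs[team_sizes >= 500]                   # 74 teams, 161,962 PRs
        return large_teams[large_teams["merged_at"].notna()]   # ~125,573 merged PRs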

📚 Industry Context

How do our findings compare to major developer surveys? Understanding the gap helps contextualize our results.

  • 12.1% - Our report (detected): explicit AI mentions in 161k OSS PRs
  • 84% - Stack Overflow 2025: using or planning to use AI
  • 51% - SO 2025 daily users: professional devs using AI daily
  • 85% - JetBrains 2025: regularly use AI for coding
  • +19% - METR 2025 (RCT): AI made devs slower in a controlled study
🔬 The Only Randomized Controlled Trial (RCT)

METR's 2025 study is the only randomized controlled trial measuring AI tool impact on developer productivity. Key findings:

  • Developers with AI tools took 19% longer to complete tasks
  • Yet developers believed they were 20% faster, a 39-point gap between perception and reality
  • Sample: 16 experienced devs, 246 issues from major OSS repos

Why this matters: Survey data (SO, JetBrains) captures developer perception; RCT captures reality. Our behavioral PR data aligns more closely with RCT findings than survey sentiment.

⚠️ Important Caveat on Study Design: METR ran an RCT (causal): randomized assignment supports causal inference. This report is observational (correlational): we observe what happens but cannot prove AI caused faster or slower cycles. Confounders like team culture, PR complexity, or developer experience could explain the differences. The within-team and size-normalized analyses help control for some confounders, but true causation requires controlled experiments.

📊 Understanding the 4x Gap: Why Our Data Differs

Metric Value What It Measures Why Different
Our Report 12.1% PRs with explicit AI disclosure Floor: only visible mentions
SO 2025 Daily 51% Devs using AI daily at work Self-reported, includes silent use
SO 2025 Total 84% Using or planning to use Ceiling: includes "planning"
JetBrains 2025 85% Regularly use AI for coding Self-reported, all usage types
METR 2025 (RCT) +19% slower Task completion time with AI Controlled experiment, not survey
🧊 The Iceberg Analogy:

Think of AI usage as an iceberg. Our 12.1% is the visible tip: PRs where developers explicitly mention AI tools. The hidden 80% includes Copilot autocomplete (accepted silently), ChatGPT brainstorming (never documented), and AI-assisted debugging (no trace in the PR). Survey data captures the whole iceberg; we only see what surfaces in PR metadata.

Why This Matters: Our 12.1% is a conservative floor, not a ceiling. The real AI adoption rate in these teams is likely 40-60% based on industry benchmarks. Our data shows disclosed, attributable AI usage: valuable for understanding tool-specific patterns, but not total AI penetration.
  • Copilot autocomplete: 68% of devs use it (SO 2025), but it's rarely mentioned in PRs
  • ChatGPT research: 82% use it, but for learning/debugging, and it goes undisclosed
  • OSS disclosure norms: OSS may have lower disclosure rates than enterprise
  • Our unique value: we capture what developers choose to share, showing tool attribution patterns

Key Survey Insights (2025)

  • Trust declining: Only 33% trust AI outputs (SO 2025), down from 43% in 2024
  • Productivity claims: 52% say AI improved productivity (SO 2025)
  • Time saved: 88% save 1+ hour/week, 19% save 8+ hours (JetBrains 2025)
  • Coding assistants: 62% use AI coding assistant or agent (JetBrains 2025)

Our Unique Contribution

  • Behavioral data: PR metadata, not self-reported surveys
  • Tool attribution: which specific tools are used where
  • Metric correlation: AI vs cycle time and review time
  • Trend analysis: month-over-month adoption changes
📖 Sources: Stack Overflow 2025 • JetBrains 2025 • METR 2025 RCT • Stack Overflow 2024 (for trend comparison)

🎯 Action Items for Your Team

Based on our findings, here's what engineering leaders should evaluate in their own teams.

📊 Measure Your Baseline

Do you know your team's current AI adoption rate? Without measurement, you can't improve. Start tracking AI usage in PRs; even informal surveys help.

πŸ“ Add AI Disclosure to PR Template

High-adoption teams explicitly disclose AI usage. Add an "## AI Usage" section to your PR template. Even "No AI used" is valuable data.

🔧 Diversify Your Tool Stack

Top performers use a mix of bots (CodeRabbit), IDEs (Cursor), and LLMs (Claude). Single-tool teams show worse outcomes than multi-tool teams.

βš–οΈ Target 40-60% Adoption

Very high adoption (>80%) correlates with longer cycle times. The sweet spot appears to be balanced usage where AI augments rather than replaces human judgment.

πŸ” Watch Review Velocity

AI-assisted PRs typically get reviewed faster (-31% avg). If your team isn't seeing this benefit, investigate the quality of AI-generated code.

👥 Consider Team Structure

Focused teams (small core) show 52% higher AI adoption. If you have many occasional contributors, AI tool standardization may be harder.

Want These Insights for Your Team?

tformance connects to your GitHub, Jira, and Slack to automatically measure AI adoption, correlate it with delivery metrics, and surface actionable insights; no manual tracking required.

Get Started → Request Demo

About the Author


Oleksii Ianchuk

Technical Product Manager

Eight years building developer tools. Technical Product Lead at Mailtrap: shipped Email API/SMTP, ran pricing experiments, and onboarded enterprise accounts.

"We used Copilot, Cursor, and every AI tool we could get. But I couldn't answer: 'Is this actually helping us ship faster?'"
GitHub LinkedIn