⚡ TL;DR: Review AI Works. Code AI? It's Complicated.
What This Research SHOWS
Review AI: -11% cycle time (95% CI: -18% to -7%, significant)
Review AI: -54% review time (95% CI: -60% to -49%, significant)
Code AI: +16% cycle time (95% CI: +4% to +25%, significant)
What This Research Does NOT Show
"Code AI speeds up reviews" → NOT significant (CI crosses zero)
"AI causes faster cycles" → Correlation only, not an RCT
"The typical PR is faster" → Medians show the opposite (+100% slower)
"All teams benefit" → 56% of teams show AI is SLOWER
"We detected all AI" → Copilot (68% adoption) leaves no trace
"Review AI = human productivity" → Bots auto-comment instantly; we measure bot latency
Industry claims 85% adoption (Stack Overflow, JetBrains 2025) and massive productivity gains.
The only RCT study (METR 2025) found AI made devs 19% slower.
We analyzed 167,000+ PRs from 101 OSS companies to see what actually happens.
REVIEW AI (CodeRabbit, Cubic, Greptile)
-54% review time → bots provide instant feedback
-11% cycle time → PRs merge faster
73.5% of detected AI usage
CODE AI (Cursor, Claude, Copilot, Devin)
+16% cycle time → aligns with METR RCT findings
-14% review time (n.s.) → NOT statistically significant
26.3% of detected usage (real share likely much higher)
CYCLE TIME: HOW LONG UNTIL PR MERGES?
Category | Avg cycle time | vs baseline | Notes
REVIEW AI | 73.4 hours | -11% faster | CodeRabbit, Cubic
BASELINE (no AI) | 82.2 hours | 0% | 111,215 PRs
CODE AI | 95.6 hours | +16% slower | Cursor, Claude, Copilot
Lower is better. Based on 167,000+ PRs from 101 OSS companies (74 with 500+ PRs for detailed analysis).
⚠️ Median Reality Check: The Typical PR Experience
Headlines use means. But medians tell a different story. Analysis excludes PRs >200h cycle time (~9% outliers) for statistical validity.
CODE AI (typical PR): +53% (5.5h vs 3.6h baseline)
REVIEW AI (typical PR): +11% (4.0h vs 3.6h baseline)
SKEW FACTOR: 6.9× (mean ÷ median, filtered)
Bottom line: The typical (median) PR tells a consistent story: both Code AI (+53%) and Review AI (+11%) correlate with slower cycle times.
Mean-based headlines can be misleading due to outliers.
Review AI: Mixed Cycle, Faster Reviews
+5% cycle time (not significant), -54% review time (significant).
Code AI: Slower Cycles, Faster Reviews
+19% cycle time (95% CI: +9% to +24%), -17% review time.
🔬 Methodology
LLM analysis of PR metadata. Outliers >200h excluded (~9%). Detection is a floor: silent tools leave no trace.
About This Research
This is a meta-analysis of 167,000+ pull requests from 101 open source companies throughout 2025, attempting to understand real-world AI coding tool adoption patterns and their impact on engineering metrics.
How It Works
Fetch PR metadata (title, description, comments, commits) via GitHub GraphQL API
Analyze each PR with LLM (Llama 3.3 70B via Groq) to detect AI tool mentions
Calculate delivery metrics: cycle time, review time, PR size
Correlate AI usage with delivery metrics using statistical analysis
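For readers who want to see the shape of this pipeline, here is a minimal sketch in Python. It assumes a `GITHUB_TOKEN` and `GROQ_API_KEY` in the environment and Groq's OpenAI-compatible chat endpoint with an assumed model id; the study's actual prompts, batching, and error handling are not published here.

```python
# A minimal sketch of the pipeline, assuming GITHUB_TOKEN and GROQ_API_KEY
# are set and that Groq's OpenAI-compatible chat endpoint is used.
# Prompts, batching, retries, and the exact model id in the real study may differ.
import os
import json
from datetime import datetime

import requests

GITHUB_GRAPHQL = "https://api.github.com/graphql"
GROQ_CHAT = "https://api.groq.com/openai/v1/chat/completions"

PR_QUERY = """
query($owner: String!, $name: String!, $first: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequests(first: $first, states: MERGED,
                 orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { title bodyText createdAt mergedAt additions deletions }
    }
  }
}
"""

def fetch_prs(owner: str, name: str, first: int = 50) -> list[dict]:
    """Step 1: pull PR metadata (no file contents) via the GraphQL API."""
    resp = requests.post(
        GITHUB_GRAPHQL,
        json={"query": PR_QUERY,
              "variables": {"owner": owner, "name": name, "first": first}},
        headers={"Authorization": f"bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]["repository"]["pullRequests"]["nodes"]

def detect_ai(pr: dict) -> dict:
    """Step 2: ask an LLM whether the PR metadata discloses AI tooling."""
    prompt = (
        "Does this pull request mention any AI coding or review tool "
        "(e.g. Copilot, Cursor, Claude, CodeRabbit)? "
        'Reply with JSON only: {"ai_assisted": true/false, "tools": []}.\n\n'
        f"Title: {pr['title']}\nBody: {pr['bodyText'][:2000]}"
    )
    resp = requests.post(
        GROQ_CHAT,
        headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
        json={"model": "llama-3.3-70b-versatile",  # assumed model id
              "messages": [{"role": "user", "content": prompt}],
              "temperature": 0},
        timeout=60,
    )
    resp.raise_for_status()
    # Assumes the model returns bare JSON, as instructed in the prompt.
    return json.loads(resp.json()["choices"][0]["message"]["content"])

def cycle_time_hours(pr: dict) -> float:
    """Step 3: delivery metric, PR creation to merge, in hours."""
    created = datetime.fromisoformat(pr["createdAt"].replace("Z", "+00:00"))
    merged = datetime.fromisoformat(pr["mergedAt"].replace("Z", "+00:00"))
    return (merged - created).total_seconds() / 3600
```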
What We Don't Analyze
Actual file contents or code quality
Private/enterprise repositories
AI usage without disclosure (hidden adoption)
Copilot suggestions accepted without mention
⚠️ Important Disclaimer: This is an independent research project, not a peer-reviewed study. While we strive for accuracy, our detection methods have limitations: we can only identify AI usage that developers explicitly disclose. The findings represent detected AI adoption, not total AI adoption. Use these insights as directional guidance, not definitive conclusions.
Want Better Insights for YOUR Team?
This OSS study has inherent limitations. tformance solves them with native integrations:
GitHub + Copilot API
Actual Copilot acceptance rates
Lines of code generated
Usage intensity (not just detected/not)
Jira Integration
Story points (complexity control)
Issue type: bug vs feature vs refactor
Sprint velocity correlation
Slack Surveys
Per-PR: "Did AI help?"
Task complexity self-report
Bot vs human reviewer tracking
The difference: This report shows correlations in OSS data.
With your team's full data, tformance can show controlled associations, answering
"Does AI actually help MY team, on which tasks, and by how much?"
How confident can we be in these findings? Here's a transparency breakdown of our statistical validity.
167,308 PRs analyzed (large sample size)
±0.16% 95% confidence interval (for the overall 12.1% adoption rate)
21.9% standard deviation between teams (high variance)
p < 0.0001 team structure test (chi-square significance)
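The ±0.16% figure is consistent with a standard normal-approximation confidence interval for a proportion; a quick check using the report's own counts:

```python
# Quick check of the reported interval using the report's own counts:
# a normal-approximation 95% CI for a proportion.
import math

n = 167_308           # PRs analyzed
p = 20_260 / n        # detected AI-assisted PRs -> ~12.1%

margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"adoption = {p:.1%} +/- {margin:.2%}")   # ~12.1% +/- 0.16%
```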
What We Can Confidently Say:
Overall detected AI adoption is 12.1% ± 0.16% (95% CI)
AI-assisted PRs show 9% faster cycle time on average
Review time is 50% faster for AI-assisted PRs
Review AI drives the gains: -11% cycle time, -54% review time
Code AI shows mixed results: +16% cycle time, -14% review time
There's massive variance between teams (0% to 86%)
⚠️ Limitations to Consider:
Selection bias: These are popular OSS projects, not a random sample
Detection bias: We only capture disclosed AI usage
Team confounding: Language/framework correlations overlap with team identity
Survivorship: Only analyzing active, successful projects
Distribution Insight:
Team AI adoption ranges from 0% (Huly) to 85.6% (Plane). The interquartile range is 0.9% to 13.5%,
meaning half of teams fall within this range. Aggregate stats (like "12.1% overall") hide enormous
project-to-project differences: your team could reasonably be anywhere from 0% to 86%.
AI Adoption Trend (2025)
Overall AI-assisted PR rate across 101 OSS companies throughout 2025. Shows the growth trajectory from early adoption to mainstream usage.
8.3% in January 2025
16.8% at the peak (July 2025)
14.6% in December 2025
+76% growth from January to December 2025
Trend Analysis: AI adoption grew steadily throughout 2025, peaking at 16.8% in July. With 101 OSS companies in the dataset (74 with 500+ PRs), we see variance month-to-month. The dip after July may reflect seasonal patterns or detection limitations as AI usage becomes more normalized.
Key Takeaways for Engineering Leaders
1. Mixed Tool Strategy Wins
Teams using bot + IDE + LLM combinations show better outcomes than single-tool teams. Antiwork's 61.9% adoption with mixed tools yielded -8.5% cycle time and -50.2% review time.
2. Review Velocity is Consistent
35 out of 50 teams (with comparable data) show faster review times with AI assistance. Average improvement: -52%. AI-generated code appears easier to review.
3. Autonomous Agents Growing Fast
Cubic went from 0 → 347 PRs/month (January to November). Autonomous agents (Devin, Cubic) now represent 30.7% of all AI tool usage.
4. 40-60% Is the Sweet Spot
Very high adoption (>80%) correlates with longer cycle times. Best-performing teams (Antiwork, Trigger.dev) maintain balanced 40-60% AI usage.
5. Tool Diversification Accelerating
Jan: CodeRabbit had 95%+ share. Dec: CodeRabbit 36%, Cubic 21%, Claude 17%. The market is fragmenting rapidly.
Want these insights for your team? Connect GitHub, Jira & Slack to track AI adoption automatically.
Key Finding: AI-assisted PRs show faster code review (-50% avg) and faster overall cycle time (-9% avg). But the story is more nuanced: Review AI (CodeRabbit, Cubic) drives most of the gains (-54% review time), while Code AI (Cursor, Claude) shows mixed results (+16% cycle time).
⚠️ Correlation ≠ Causation:
These findings show correlation, not causation. We cannot determine whether AI itself improves delivery or whether high-performing teams simply adopt AI more readily. Alternative explanations include selection bias toward well-maintained projects and the possibility that disclosed AI usage correlates with more structured development practices.
AI Tool Evolution (2025)
Monthly trend of AI tool usage across all teams. Shows dramatic market shift throughout the year.
Market Shift: CodeRabbit dominated early 2025 (95%+ share in Jan-Feb). By Q4, the market fragmented: Cubic emerged as #2 (347 PRs in Nov), Claude and Cursor saw 10x growth. Human-directed AI tools (Claude, Cursor, Copilot) grew from 2% to 35% of monthly usage.
⚠️ Important Context: Hidden Tool Usage
Our tool rankings reflect disclosed usage patterns in OSS PRs, not total market share. Industry surveys show Copilot (68%) and ChatGPT (82%) dominate actual usage, yet they're nearly absent from our data.
Why? Tools that leave visible artifacts (CodeRabbit bot comments, Devin author attribution) appear dominant because we can detect them. Silent tools (Copilot autocomplete, ChatGPT for research/debugging) are likely underrepresented by 5-10x.
Tool Category Over Time
Current Tool Market Share
Code AI vs Review AI: A New Framework
Not all AI tools are equal. We categorize AI coding tools by their primary function to reveal different impact patterns.
Code AI
Tools that write or generate code: autocomplete, code generation, refactoring assistance.
Cursor, Copilot, Claude, Devin, ChatGPT, Windsurf, Aider
Review AI
Tools that review or comment on code: automated reviews, suggestions, quality checks.
CodeRabbit, Greptile, Cubic, Sourcery, CodeAnt
Why This Categorization Matters:
Different impact patterns: Code AI affects what gets written; Review AI affects what gets approved
Different investment decisions: Teams may need both categories for optimal outcomes
Detection bias awareness: Review AI (bots) is highly visible; Code AI (autocomplete) is often silent
Category Breakdown in Our Data
Analysis of 25,217 detected tool usages across 167,308 PRs:
Detection Bias Warning:
Review AI dominates because it leaves visible bot comments. Code AI (especially Copilot autocomplete) is likely underrepresented by 5-10x: industry surveys show 68% of developers use Copilot, yet we detect only 823 explicit mentions. This gap is expected: autocomplete leaves no trace.
Category Impact: The Data Speaks
Comparing PRs by AI category against the non-AI baseline (111,215 PRs):
Category | PRs | Avg cycle time | Avg review time
No AI (baseline) | 111,215 | 25.0 hrs | 12.5 hrs
Code AI | 2,685 | 29.7 hrs (+19%; 95% CI: +9% to +24%) | 10.4 hrs (-17%; 95% CI: -29% to -9%)
Review AI | 10,361 | 26.2 hrs (+5%; 95% CI: -1% to +6%) | 5.8 hrs (-54%; 95% CI: -59% to -52%)
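The report does not state how these intervals were computed. For right-skewed duration data like cycle time, a bootstrap over the percentage difference in means is one common approach; the sketch below assumes per-PR cycle times in hours for each group (array names are illustrative).

```python
# Sketch: bootstrap 95% CI for the % difference in mean cycle time between
# an AI category and the no-AI baseline. Array names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def pct_diff_ci(treated_hours, baseline_hours, n_boot=10_000):
    treated = np.asarray(treated_hours, dtype=float)
    baseline = np.asarray(baseline_hours, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        t = rng.choice(treated, size=treated.size, replace=True)
        b = rng.choice(baseline, size=baseline.size, replace=True)
        diffs[i] = (t.mean() - b.mean()) / b.mean() * 100
    return np.percentile(diffs, [2.5, 97.5])

# usage: pct_diff_ci(code_ai_cycle_hours, no_ai_cycle_hours)
```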
Distribution Note:
Cycle times are heavily right-skewed: a few PRs take weeks while most merge quickly. The median (typical PR) differs dramatically from the mean:
No AI: mean 82.2h, median 5.7h
Code AI: mean 95.6h, median 11.4h
Review AI: mean 73.4h, median 6.0h
Medians show the typical PR experience is much faster than means suggest. The patterns hold in both measures.
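A small sketch of how these summary statistics fall out of the raw data, assuming a pandas DataFrame `prs` with `cycle_hours` and `category` columns (illustrative names), including the >200h outlier filter mentioned earlier:

```python
# Sketch: mean, median, and the mean/median "skew factor" per category,
# with the >200h outlier filter applied. Column names are illustrative.
import pandas as pd

def summarize(prs: pd.DataFrame, max_hours: float = 200) -> pd.DataFrame:
    filtered = prs[prs["cycle_hours"] <= max_hours]        # drops ~9% of PRs
    stats = filtered.groupby("category")["cycle_hours"].agg(["mean", "median"])
    stats["skew_factor"] = stats["mean"] / stats["median"]  # ~6.9x overall
    return stats.round(1)
```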
Review AI Insight:
Automated code reviews (CodeRabbit, Cubic) deliver 11% faster cycle times and 54% faster reviews. Bot reviews replace some human review time while catching issues early.
Code AI Caveat:
Code generation tools show +16% cycle time but -14% review time. Hypothesis: AI-generated code may require more iteration/refinement before review-readiness.
⚡ Key Takeaway for CTOs: Review AI is the clear efficiency win with measurable time savings. Code AI's mixed results suggest it's better suited for specific use cases (refactoring, boilerplate) rather than universal adoption. Consider a hybrid strategy: Review AI for all PRs + targeted Code AI for appropriate tasks.
Size-Normalized Analysis: Controlling for PR Size
PR size is a potential confounding variable: AI-assisted PRs might be systematically larger or smaller. By normalizing review time per 100 lines of code, we control for this:
Category | PRs | Review hours / 100 lines | vs baseline
Baseline (no AI) | 96,903 | 321.4 hrs | –
Code AI | 3,502 | 236.5 hrs | -26%
Review AI | 12,290 | 122.8 hrs | -62%
Size Doesn't Explain the Results:
Even after controlling for PR size, Review AI shows 62% faster reviews per 100 lines. Code AI also shows improvement (-26%) when size-normalized, suggesting the raw metrics may understate its benefits.
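The normalization itself is straightforward; a sketch, assuming per-PR columns `additions`, `deletions`, `review_hours`, and `category` (column and label names are illustrative, not the study's actual schema):

```python
# Sketch of the size normalization: review hours per 100 changed lines,
# expressed as % difference vs the no-AI baseline. Names are illustrative.
import pandas as pd

def review_time_per_100_lines(prs: pd.DataFrame) -> pd.Series:
    lines = (prs["additions"] + prs["deletions"]).clip(lower=1)
    per_100 = prs["review_hours"] / (lines / 100)
    by_category = per_100.groupby(prs["category"]).mean()
    baseline = by_category["no_ai"]                  # assumed baseline label
    return ((by_category - baseline) / baseline * 100).round(0)
```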
🔬 Within-Team Analysis: Controlling for Team Differences
Different teams have different baselines (fast vs slow). Comparing AI vs non-AI within each team controls for this. Teams included only if they have 10+ PRs in both groups:
48 teams analyzed: AI faster in 21 (44%), AI slower in 27 (56%)
AI Faster (Top 5)
Comp AI: -81%
OpenReplay: -78%
Coolify: -65%
Directus: -40%
Resend: -38%
AI Slower (Top 5)
Appsmith: +150%
Mattermost: +140%
Windmill: +102%
LangChain: +97%
Plane: +87%
⚠️ Simpson's Paradox Alert:
The aggregate shows Review AI at -11% cycle time overall, yet 27 of 48 individual teams (56%) see AI-assisted PRs as slower. This can happen when AI adoption correlates with other factors (e.g., teams that already ship fast adopt AI more). Aggregate stats can mislead; always check within-team patterns.
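A sketch of the within-team comparison, assuming per-PR columns `team`, `ai_assisted`, and `cycle_hours` (illustrative names); the study may compare means rather than medians:

```python
# Sketch of the within-team comparison: per team with 10+ PRs in both groups,
# % difference in median cycle time, AI vs non-AI. Column names illustrative;
# the study may compare means rather than medians.
import pandas as pd

def within_team_deltas(prs: pd.DataFrame, min_prs: int = 10) -> pd.Series:
    deltas = {}
    for team, group in prs.groupby("team"):
        ai = group.loc[group["ai_assisted"], "cycle_hours"]
        non_ai = group.loc[~group["ai_assisted"], "cycle_hours"]
        if len(ai) >= min_prs and len(non_ai) >= min_prs:
            deltas[team] = (ai.median() - non_ai.median()) / non_ai.median() * 100
    return pd.Series(deltas).sort_values()   # negative = AI faster for that team
```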
AI Adoption by Team
74 companies with 500+ analyzed PRs in 2025 (from 101 total), sorted by AI adoption rate.
Adoption Range: AI adoption varies from 0% (GrowthBook) to 85.6% (Plane). High-adoption teams tend to use review bots (CodeRabbit) extensively.
AI Impact on Metrics
Comparing AI-assisted PRs vs non-AI PRs within each team. Green = AI-assisted PRs are faster on the metric; red = slower.
Review Time Impact (AI vs Non-AI)
Cycle Time Impact (AI vs Non-AI)
Monthly Adoption Trends by Team
Select teams to compare their AI adoption journey throughout 2025.
Patterns: Cal.com shows dramatic growth (22.9% in January → 78.8% in June). Formbricks shows the opposite trend (90% in January → 1% in December). Teams experiment heavily then settle on sustainable adoption levels.
Regex patterns vs LLM semantic analysis: validating our AI detection accuracy
93.4% agreement rate between methods
51,800 PRs where both methods agree
+1,718 additional AI-assisted PRs found by the LLM (catches what regex misses)
53,876 PRs analyzed (89.0% of dataset)
Regex Pattern Detection (25+ patterns • instant • rule-based)
10,626 AI-assisted PRs detected (17.2%)
Exact signature matching
Known tool footprints (CodeRabbit, Devin, etc.)
Fast, deterministic results
vs
LLM Semantic Analysis (GPT-OSS 20B with Llama 70B fallback, via Groq)
11,540 AI-assisted PRs detected (21.4%)
Context-aware understanding
Catches implicit AI mentions
Tool disambiguation ("GPT-4 via Cursor" → Cursor)
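To make the contrast concrete, here is an illustrative fragment of regex-based detection. These patterns are examples only; the study's actual 25+ patterns are not reproduced here and certainly differ in detail.

```python
# Illustrative regex-based detection. These patterns are examples only;
# the study's actual 25+ patterns are not reproduced here.
import re

AI_TOOL_PATTERNS = {
    "coderabbit": re.compile(r"coderabbit(ai)?(\[bot\])?", re.I),
    "devin":      re.compile(r"\bdevin(-ai-integration)?\b", re.I),
    "cursor":     re.compile(r"\bcursor\b.{0,20}\b(ide|ai|agent)\b", re.I),
    "copilot":    re.compile(r"github copilot|co-?pilot", re.I),
    "claude":     re.compile(r"\bclaude( code)?\b", re.I),
}

def regex_detect(text: str) -> set[str]:
    """Return the AI tools whose signature appears in PR metadata."""
    return {tool for tool, pattern in AI_TOOL_PATTERNS.items()
            if pattern.search(text)}
```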
Tool Detection by Method
Comparing how each method detects specific AI tools. Positive delta = LLM finds more; negative delta = regex finds more (often over-matching).
AI Tool | Regex | LLM | Delta | Insight
CodeRabbit | 5,431 | 5,891 | +460 | LLM catches bot mentions in comments
Devin | 1,706 | 1,743 | +37 | Strong pattern coverage
Cubic | 1,490 | 1,590 | +100 | LLM understands "cubic" in context
Claude | 412 | 724 | +312 | +76%: catches indirect references
Cursor | 499 | 702 | +203 | +41%: IDE mentions
Greptile | 7 | 459 | +452 | 65x more: regex pattern gap
Copilot | 204 | 403 | +199 | 2x: implicit mentions
ChatGPT | 471 | 238 | -233 | LLM reassigns to specific tools
💡 Key Insights from Method Comparison
93.4% agreement validates that both detection methods produce consistent results
LLM catches 1,718 additional AI-assisted PRs (3.2%) that regex patterns miss entirely
Greptile detection improved 65x: a major regex pattern gap identified and fixed via LLM feedback
LLM correctly reassigns generic "ChatGPT" mentions to specific tools (e.g., "GPT-4 via Cursor" → Cursor)
Human-directed AI tools (Claude, Cursor, Copilot) have the largest detection gaps: developers don't always mention them explicitly
Detection Improvement by Team
How much more AI usage does LLM detect compared to regex patterns? Positive = LLM finds more, Negative = Regex over-matches.
Biggest LLM Improvements:
Deno: +408% more AI PRs detected
Twenty CRM: +360% improvement
PostHog: +200% (312 more PRs)
Cal.com: +476 additional AI PRs
Regex Over-Matching:
Vercel: regex found 135 more (likely false positives)
LangChain: regex found 50 more
LLM more conservative with ambiguous mentions
Note: Teams with low AI usage show high percentage improvements but small absolute numbers.
High-adoption teams like Plane (87%) show near-parity between methods.
LLM-to-Regex Feedback Loop
LLM analysis revealed patterns that were then added to regex detection, creating a continuous improvement cycle:
LLM detected @lingodotdev mentions → added to regex → 430 new Replexica detections
CodeRabbit author patterns identified → 6,884 total detections
Gap reduction: 1,717 → 1,668 (-49 PRs, a 2.9% improvement)
This validates an iterative approach: LLM finds edge cases → patterns extracted → regex updated → backfill run.
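A sketch of how such a feedback loop can be automated, assuming each PR record carries the tool sets found by each method (field names are illustrative):

```python
# Sketch of automating the feedback loop: list the tools the LLM finds
# that regex misses, so new patterns can be written and backfilled.
# Field names are illustrative.
from collections import Counter

def regex_gaps(prs) -> list[tuple[str, int]]:
    """prs: iterable of dicts with 'llm_tools' and 'regex_tools' sets."""
    missed = Counter()
    for pr in prs:
        for tool in pr["llm_tools"] - pr["regex_tools"]:
            missed[tool] += 1
    return missed.most_common()   # e.g. [("greptile", 452), ("claude", 312), ...]
```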
Measure your team's AI adoption: tformance runs this analysis automatically on your repos.
Analyzing 53,876 LLM-processed PRs to understand where AI tools are used most.
By PR Type
Key Insight: Refactors show the highest AI adoption (32.8%), suggesting developers leverage AI for code restructuring more than greenfield features. CI/CD work shows the lowest adoption (9.9%); infrastructure-as-code appears less suited to current AI tools.
By Technology Category
Key Insight: Mobile development shows the highest AI adoption (42.4%), though the sample is small (118 PRs). Test-related PRs at 33% suggest AI is heavily used for test generation. DevOps is lowest at 15.6%.
By PR Size
Key Insight: Larger PRs correlate with higher AI usage β XL PRs (501+ lines) show 26.0% AI adoption vs 17.8% for XS PRs (0-10 lines). This is a 46% relative increase. AI tools may enable developers to tackle larger changes confidently, or larger changes benefit more from AI assistance.
By Team Structure
Does team composition affect AI adoption? We measured "concentration": what % of PRs come from the top 5 contributors.
Focused Teams (70%+ concentration):
Dub: 94.3% concentration, 83.9% AI
Trigger.dev: 76.0%, 45.7% AI
Formbricks: 70.3%, 50.5% AI
~66 contributors avg, 29.2% AI adoption
Distributed Teams (<40% concentration):
PostHog: 18.7% concentration, 7.5% AI
LangChain: 30.5%, 4.2% AI
Supabase: 31.5%, 12.0% AI
~452 contributors avg, 19.2% AI adoption
Key Insight: Focused teams show 52% higher AI adoption (29.2% vs 19.2%) than distributed open-source projects. Core teams can establish AI tool workflows, share knowledge, and standardize practices. Large contributor bases with occasional contributors have less consistent tooling.
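A sketch of the concentration metric, assuming per-PR `team` and `author` columns (illustrative names):

```python
# Sketch of the concentration metric: share of a team's PRs authored by its
# top 5 contributors. Column names are illustrative.
import pandas as pd

def top5_concentration(prs: pd.DataFrame) -> pd.Series:
    def concentration(team_prs: pd.DataFrame) -> float:
        counts = team_prs["author"].value_counts()
        return counts.head(5).sum() / counts.sum() * 100
    return prs.groupby("team").apply(concentration).sort_values(ascending=False)
```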
💡 Recommendation: Explicit AI Disclosure in PRs
Teams with high AI detection rates (like Antiwork at 61.7%) often have a culture of explicit AI disclosure in PR descriptions:
## AI Usage
- Used Claude to scaffold the initial component structure
- Copilot assisted with test generation
- Cursor for refactoring the API layer
Benefits for Engineering Leaders:
Accurate measurement: know the true AI adoption rate across your team
Knowledge sharing: team members learn which tools work for which tasks
Quality insights: correlate AI usage with review feedback and bug rates
Onboarding: new hires learn team AI practices from PR history
Consider adding an "AI Usage" section to your PR template. Even "No AI used" is valuable data.
Methodology
Data Collection
Source: GitHub GraphQL API with authenticated access
LLM Detection: Llama 3.3 70B (via Groq Batch API) for semantic analysis of PR metadata
Pattern Detection: 25+ regex patterns for known AI signatures
Tool Attribution: Extracted from PR descriptions, commits, comments
Scope: Metadata only; file contents are not analyzed
Data Quality
Total PRs: 167,308
LLM Analyzed: 109,240 (65.3%)
AI-Assisted PRs: 20,260 (12.1%)
OSS Companies: 101 (74 with 500+ PRs)
Data Pipeline: Sample Selection
Why do our category metrics show 125k PRs instead of 167k? Here's the data funnel:
167,308 total PRs (all 101 OSS companies)
│
├──▶ 5,346 excluded: teams with <500 PRs (too small for stats)
│
├──▶ 161,962 PRs from 74 teams with 500+ PRs
│
├──▶ 36,290 excluded: unmerged PRs (no cycle_time)
│       └─ cycle time requires a merge timestamp
│
└──▶ 125,573 merged PRs
        └─ this is what category_metrics.csv contains
        └─ used for cycle time & review time analysis
Why exclude unmerged PRs? Cycle time (PR creation → merge) is undefined for PRs that never merged.
Draft PRs, abandoned PRs, and still-open PRs are excluded from timing analysis but counted in adoption rates.
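A sketch of the funnel as two filtering steps, assuming per-PR `team` and `merged_at` columns (illustrative names; `merged_at` is null for PRs that never merged):

```python
# Sketch of the sample-selection funnel as two filters. Column names are
# illustrative; `merged_at` is assumed null for PRs that never merged.
import pandas as pd

def build_funnel(prs: pd.DataFrame, min_team_prs: int = 500) -> pd.DataFrame:
    team_sizes = prs["team"].value_counts()
    big_teams = team_sizes[team_sizes >= min_team_prs].index
    eligible = prs[prs["team"].isin(big_teams)]        # ~161,962 PRs, 74 teams
    merged = eligible[eligible["merged_at"].notna()]   # ~125,573 merged PRs
    return merged                                      # input to timing analysis
```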
Industry Context
How do our findings compare to major developer surveys? Understanding the gap helps contextualize our results.
Value | Source | What it measures
12.1% | Our report (detected) | Explicit AI mentions in 161k OSS PRs
84% | Stack Overflow 2025 | Using or planning to use AI
51% | Stack Overflow 2025 (daily users) | Professional devs using AI daily
85% | JetBrains 2025 | Regularly use AI for coding
+19% | METR 2025 (RCT) | AI made devs slower in a controlled study
🔬 The Only Randomized Controlled Trial (RCT)
METR's 2025 study is the only randomized controlled trial measuring AI tool impact on developer productivity. Key findings:
Developers with AI tools took 19% longer to complete tasks
Yet developers believed they were 20% faster, a 39-point perception gap
Sample: 16 experienced devs, 246 issues from major OSS repos
Why this matters: Survey data (SO, JetBrains) captures developer perception; RCT captures reality. Our behavioral PR data aligns more closely with RCT findings than survey sentiment.
⚠️ Important Caveat on Study Design: METR used an RCT (causal); randomized assignment supports causal inference.
This report is observational (correlational): we observe what happens but cannot prove AI caused faster or slower cycles.
Confounders like team culture, PR complexity, or developer experience could explain differences.
The within-team and size-normalized analyses help control for some confounders, but true causation requires controlled experiments.
Understanding the 4x Gap: Why Our Data Differs
Metric | Value | What it measures | Why different
Our report | 12.1% | PRs with explicit AI disclosure | Floor: only visible mentions
SO 2025 (daily) | 51% | Devs using AI daily at work | Self-reported, includes silent use
SO 2025 (total) | 84% | Using or planning to use | Ceiling: includes "planning"
JetBrains 2025 | 85% | Regularly use AI for coding | Self-reported, all usage types
METR 2025 (RCT) | +19% slower | Task completion time with AI | Controlled experiment, not a survey
The Iceberg Analogy:
Think of AI usage as an iceberg. Our 12.1% is the visible tip: PRs where developers explicitly mention AI tools.
The hidden 80% includes Copilot autocomplete (accepted silently), ChatGPT brainstorming (never documented),
and AI-assisted debugging (no trace in the PR). Survey data captures the whole iceberg; we only see what surfaces in PR metadata.
Why This Matters: Our 12.1% is a conservative floor, not a ceiling. The real AI adoption rate in these teams is likely 40-60% based on industry benchmarks. Our data shows disclosed, attributable AI usage: valuable for understanding tool-specific patterns, but not total AI penetration.
Copilot autocomplete: 68% of devs use it (SO 2025), but it's rarely mentioned in PRs
ChatGPT research: 82% use it, but for learning/debugging, not disclosed
OSS disclosure norms: OSS may have lower disclosure rates than enterprise
Our unique value: we capture what developers choose to share, showing tool attribution patterns
Key Survey Insights (2025)
Trust declining: Only 33% trust AI outputs (SO 2025), down from 43% in 2024
Productivity claims: 52% say AI improved productivity (SO 2025)
Time saved: 88% save 1+ hour/week, 19% save 8+ hours (JetBrains 2025)
Coding assistants: 62% use AI coding assistant or agent (JetBrains 2025)
Our Unique Contribution
Behavioral data: PR metadata, not self-reported surveys
Tool attribution: which specific tools are used where
Metric correlation: AI usage vs cycle time and review time
Based on our findings, here's what engineering leaders should evaluate in their own teams.
Measure Your Baseline
Do you know your team's current AI adoption rate? Without measurement, you can't improve. Start tracking AI usage in PRs; even informal surveys help.
Add AI Disclosure to PR Template
High-adoption teams explicitly disclose AI usage. Add an "## AI Usage" section to your PR template. Even "No AI used" is valuable data.
Diversify Your Tool Stack
Top performers use a mix of bots (CodeRabbit), IDEs (Cursor), and LLMs (Claude). Single-tool teams show worse outcomes than multi-tool teams.
Target 40-60% Adoption
Very high adoption (>80%) correlates with longer cycle times. The sweet spot appears to be balanced usage where AI augments rather than replaces human judgment.
Watch Review Velocity
AI-assisted PRs typically get reviewed faster (-31% avg). If your team isn't seeing this benefit, investigate the quality of AI-generated code.
Consider Team Structure
Focused teams (small core) show 52% higher AI adoption. If you have many occasional contributors, AI tool standardization may be harder.
Focused teams (small core) show 52% higher AI adoption. If you have many occasional contributors, AI tool standardization may be harder.
Want These Insights for Your Team?
tformance connects to your GitHub, Jira, and Slack to automatically measure AI adoption, correlate with delivery metrics, and surface actionable insights β no manual tracking required.
8 years building developer tools. Technical Product Lead at Mailtrap: shipped Email API/SMTP, ran pricing experiments, onboarded enterprise accounts.
"We used Copilot, Cursor, and every AI tool we could get. But I couldn't answer: 'Is this actually helping us ship faster?'"