Microsoft — Engineering Performance
- Avg. perf / dev / mo (ETV): +55.0% (1.77 → 2.74)
- Active contributors: −6.0% (168.0 → 158.0)
- Growth: +2.9pp (41.2% → 44.1%)
- Fixes: −7.9pp (25.3% → 17.4%)
Microsoft vs. 500 OSS Performance Index
Per-engineer ETV for Microsoft plotted against the pooled 500 OSS Performance Index. Both series are 90-day trailing rolling averages scaled to a 30-day month, so the curves sit on the same scale (ETV / dev / mo) and can be compared point-for-point. Latest reading: Microsoft is 41% above the index (2.73 vs 1.94 ETV/dev/mo). Baseline gap was 31% above.
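The gap to the index quoted above can be reproduced from the two latest readings. The sketch below is illustrative only: the function name `gap_vs_index` is an assumption, not part of the dashboard, and it takes the gap to be the simple ratio of the two per-engineer values minus one.

```python
def gap_vs_index(org_etv_per_dev: float, index_etv_per_dev: float) -> float:
    """Return the org's gap to the index as a fraction (0.41 means 41% above)."""
    if index_etv_per_dev <= 0:
        raise ValueError("index_etv_per_dev must be positive")
    return org_etv_per_dev / index_etv_per_dev - 1.0


# Latest reading from the text: 2.73 vs 1.94 ETV/dev/mo.
latest_gap = gap_vs_index(2.73, 1.94)
print(f"{latest_gap:.0%}")  # 41%
```

Because both series are already on the same ETV / dev / mo scale, no further normalization is needed before taking the ratio.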
Monthly reports
Highlights
- Significant new capabilities were added to *Copilot* and *Chat*, including displaying Claude model details in chat sessions [2056350d · Justin Chen], adding a `CopilotAPI` service for future Claude agent integration [ddb2b117 · Tyler James Leonhardt], and improving the Chat Debug Tree View with cached token display [337820c3 · Bhavya U], [5d6d2593 · Bhavya U].
- New features for *Agents* include adding a diff summary action to the chat editing session changes view [0943c170 · Ladislau Szomoru], support for "Discard Changes" in "Uncommitted Changes" [b43191a5 · Ladislau Szomoru], and introducing session activity events to fix client tool stalls [120066c2 · Connor Peet].
- The *Playwright Dashboard* gained a new `openDashboardForContext` entry point with refactored session management [459fb7b6 · Pavel Feldman]. The *Playwright CLI* was enhanced to support `.zip` reports in `show-report` [26d198eb · Pavel Feldman] and improved argument validation [97b450c4 · Pavel Feldman].
- UI/UX and customization improvements include the reintroduction of the `update.titleBar` setting for update indicator visibility [753d67d9 · Dmitriy Vasyura] and a new theme selection step for *Agents* to preview/import VS Code themes [f2ccb575 · Sandeep Somavarapu].
- A comprehensive set of *Drawer components* was added to the `react-headless-components-preview` package [c7766b6f · Dmytro Kirpa].
Observations
- Total output increased 50% compared to the 5-month average (current: 420, avg: 281), indicating a strong focus on new feature development and enhancements.
- The grow score surged 49% compared to the 5-month average (current: 175, avg: 118), reflecting substantial progress in delivering new value.
- Maintenance score saw a substantial 85% increase compared to the 5-month average (current: 178, avg: 96), suggesting significant refactoring, test improvements, and infrastructure work.
- The waste score remained stable with a 0% change compared to the 5-month average (current: 67, avg: 67), which is positive given the high activity in other areas.
- A notable pattern of *security fixes* was observed across various Playwright components, including `fetch` request handling [703f0cbc · Pavel Feldman], `mcp` HTTP transport [4a80eed3 · Pavel Feldman], Trace Viewer snapshot validation [26219d51 · Pavel Feldman], `postMessage` origin validation [d6041b50 · Pavel Feldman], WebSocket server origin validation [212c3a40 · Pavel Feldman], and Trace CLI attachment confinement [1d38adbe · Pavel Feldman].
- Rework in *CLI* and *testing infrastructure* was evident with the revert of the `annotate` command [24db89b1 · Yury Semikhatsky] and ongoing test framework migrations (Jest to Mocha) [946cfe57 · Joshua Smithrud] and test mode cleanups [d57fb31e · Pavel Feldman].
- Several critical bug fixes were implemented to improve stability, such as fixing flaky network tests in `mcp` [10fbbbcb · Yury Semikhatsky], resolving client tool stalls in `agentHost` [120066c2 · Connor Peet], and addressing `TypeError` in `mapObservableArrayCached` [1d76c448 · copilot-swe-agent[bot]].
Repositories
Active repositories ranked by average performance per developer per month (over the last 90 days). The chart shows monthly performance composition — each repo as a stacked layer, with the top of the stack representing total org performance per month. Top 9 repos shown; the remainder is aggregated as “Other”.
| Repository | Active devs | Performance (ETV) | ETV / dev / mo | Since start |
|---|---|---|---|---|
| playwright | 8 | 122 | 5.1 | +52% since Q2 2025 |
| vscode | 74 | 942 | 4.2 | +310% since Q2 2025 |
| Agents-for-net | 3 | 23 | 2.6 | −3% since Q2 2025 |
| fluentui | 15 | 62 | 1.4 | −61% since Q2 2025 |
| PowerToys | 13 | 54 | 1.4 | +224% since Q2 2025 |
| FluidFramework | 28 | 69 | 0.8 | −29% since Q2 2025 |
| semantic-kernel | 7 | 16 | 0.8 | −72% since Q2 2025 |
| terminal | 4 | 6 | 0.5 | +72% since Q2 2025 |
| DeepSpeed | 5 | 4 | 0.3 | +27% since Q2 2025 |
| autogen | 1 | 1 | 0.2 | −92% since Q2 2025 |
| TypeScript | 2 | 1 | 0.1 | −22% since Q2 2025 |
| markitdown | 1 | 0 | 0.0 | −65% since Q2 2025 |
| Company total (12 repositories) | 158 unique devs | 1299 ETV total | 2.74 ETV / dev / mo | +115% since Q2 2025 |

Performance (ETV) is the sum of every repository above. Active devs at the company level counts unique contributors across all repos, so a contributor working in multiple repos is counted once here but appears in each repo's row (the per-repo column will sum higher). ETV / dev / mo = Company ETV ÷ unique devs ÷ 3 mo. The "Since start" column compares each repo's Q1 2026 quarterly performance to the first quarter it had any activity: for repos that existed in Q2 2025 (when this index began), that's Q2 2025; for younger repos it's the quarter they actually started. The company row uses Q2 2025 as the baseline since the index itself began then.
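The ETV / dev / mo formula in the note above can be checked against the table rows. This is a minimal sketch: the `REPOS` mapping and the function name `etv_per_dev_per_month` are illustrative, and the figures are copied from the table as published (already rounded).

```python
# (active devs, 90-day ETV) per repository, copied from the table above.
REPOS = {
    "playwright": (8, 122), "vscode": (74, 942), "Agents-for-net": (3, 23),
    "fluentui": (15, 62), "PowerToys": (13, 54), "FluidFramework": (28, 69),
    "semantic-kernel": (7, 16), "terminal": (4, 6), "DeepSpeed": (5, 4),
    "autogen": (1, 1), "TypeScript": (2, 1), "markitdown": (1, 0),
}


def etv_per_dev_per_month(total_etv: float, devs: int, months: int = 3) -> float:
    """ETV / dev / mo = ETV over the window / devs / months in the window."""
    if devs <= 0 or months <= 0:
        raise ValueError("devs and months must be positive")
    return total_etv / devs / months


# Company row: 1299 ETV across 158 unique devs over 3 months.
print(round(etv_per_dev_per_month(1299, 158), 2))  # 2.74

# Per-repo dev counts double-count people active in several repos,
# so their sum exceeds the 158 unique devs in the company row.
print(sum(devs for devs, _ in REPOS.values()))  # 161
```

The per-repo ETV values sum to 1300 rather than 1299 because each row is rounded to a whole number before display.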
Performance Growth vs Active Contributors
Shows how engineering performance scales relative to team growth. Left axis shows total performance score, right axis shows active contributor count. The gap between curves represents productivity gains — more delivered per person, not just more people. Unit: Engineering Throughput Value (ETV).
Cost per Performance Unit
−59%
Performance per engineer has more than doubled since the baseline 90-day window (ending 2025-06-29), so each unit of engineering performance now costs approximately 59% less than it did then. This is a directional estimate: the exact figure depends on fully-loaded engineer cost, but the direction is unambiguous.
Effective Capacity Added
+223 engineers
At today's productivity, the current 158-person team delivers the performance equivalent of 381 engineers working at the productivity of the baseline 90-day rolling window (ending 2025-06-29). That's roughly 223 engineers' worth of capacity added through productivity gains, not hiring.
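The −59% and +223 figures are consistent with one another under a simple assumption: cost per engineer is held constant, so cost per unit of performance moves inversely with performance per engineer. The dashboard does not state its exact method, so the sketch below is one plausible reconstruction, and all function names are illustrative.

```python
def productivity_ratio(current_team: int, equivalent_baseline_team: int) -> float:
    """How many baseline-era engineers one current engineer is worth."""
    return equivalent_baseline_team / current_team


def cost_per_unit_change(ratio: float) -> float:
    """Relative change in cost per unit, assuming constant cost per engineer."""
    return 1.0 / ratio - 1.0


def capacity_added(current_team: int, equivalent_baseline_team: int) -> int:
    """Engineers' worth of capacity gained without hiring."""
    return equivalent_baseline_team - current_team


ratio = productivity_ratio(158, 381)
print(f"{ratio:.2f}x")                       # 2.41x
print(f"{cost_per_unit_change(ratio):.0%}")  # -59%
print(capacity_added(158, 381))              # 223
```

If fully-loaded cost per engineer changed over the period, the cost figure would shift accordingly, which is why the text calls it a directional estimate.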
Performance Composition
Stacked bars show total complexity performance split into Growth (new value), Maintenance (sustaining systems), and Fixes (rework). The yellow line overlays performance per contributor — rising line means each engineer is delivering more, regardless of team size changes. Unit: Engineering Throughput Value (ETV).
CapEx vs OpEx
Monthly CapEx vs OpEx split. CapEx (capitalizable investment) is Growth — new features and capabilities. OpEx (operating expense) is Maintenance plus Fixes — keeping the lights on and reworking what's already shipped. The yellow line is the CapEx share, a quick read on how much of the month went into building new vs sustaining existing. Unit: Engineering Throughput Value (ETV).
Hours per Repository
Trailing 90-day window (64 working days). Org-level capacity is allocated to each repo by its share of org performance, then split CapEx / OpEx by that repo's own Growth vs Maintenance + Fixes mix.
| Repository | CapEx share | OpEx share |
|---|---|---|
| vscode | 48.1% | 51.9% |
| PowerToys | 41.7% | 58.3% |
| fluentui | 37.1% | 62.9% |
| FluidFramework | 34.9% | 65.1% |
| playwright | 32.9% | 67.1% |
| terminal | 29.7% | 70.3% |
| markitdown | 25.0% | 75.0% |
| Agents-for-net | 22.1% | 77.9% |
| semantic-kernel | 19.1% | 80.9% |
| DeepSpeed | 7.3% | 92.7% |
| TypeScript | 5.7% | 94.3% |
| autogen | 0.0% | 100.0% |
| Total | 44.1% | 55.9% |
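The allocation rule described above (org capacity split by each repo's share of performance, then by that repo's own CapEx / OpEx mix) can be sketched as follows. The 8-hour working day and the names `allocate_hours` and `HOURS_PER_DAY` are assumptions for illustration; the page gives only the 64 working days and the percentage split, not the hour totals. ETV values come from the Repositories table.

```python
HOURS_PER_DAY = 8  # assumption: not stated on the page

# repo -> (90-day ETV, CapEx share), copied from the two tables.
REPO_MIX = {
    "vscode": (942, 0.481), "PowerToys": (54, 0.417), "fluentui": (62, 0.371),
    "FluidFramework": (69, 0.349), "playwright": (122, 0.329),
    "terminal": (6, 0.297), "markitdown": (0, 0.250),
    "Agents-for-net": (23, 0.221), "semantic-kernel": (16, 0.191),
    "DeepSpeed": (4, 0.073), "TypeScript": (1, 0.057), "autogen": (1, 0.000),
}


def allocate_hours(org_hours: float, repo_mix: dict) -> dict:
    """Return repo -> (capex_hours, opex_hours)."""
    org_etv = sum(etv for etv, _ in repo_mix.values())
    if org_etv <= 0:
        raise ValueError("total ETV must be positive")
    allocation = {}
    for repo, (etv, capex_share) in repo_mix.items():
        repo_hours = org_hours * etv / org_etv  # share of org performance
        capex_hours = repo_hours * capex_share  # repo's own Growth mix
        allocation[repo] = (capex_hours, repo_hours - capex_hours)
    return allocation


org_hours = 158 * 64 * HOURS_PER_DAY  # devs x working days x hours per day
hours = allocate_hours(org_hours, REPO_MIX)
total_capex = sum(capex for capex, _ in hours.values())
print(f"Org CapEx share: {total_capex / org_hours:.1%}")
```

Weighting each repo's CapEx share by its ETV in this way gives an org-level share of about 44%, in line with the Total row.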
Fix Burden Distribution
Monthly rework volume broken down by who did it. Top contributors carry their named slice; everyone else is rolled into Others. Use this to spot whether fix work is concentrated on a small group (bus-factor risk) or distributed across the team.
Fix authorship over time
Monthly fix activity in this scope, split by who fixed the bug. Darker emerald = the original author fixed their own bug; lighter emerald = someone else cleaned it up. A persistently high 'fixed by another author' share is a signal of bug-debt landing on the team rather than its author. X-axis is fix-time — bugs introduced but not yet detected don't appear.
- Self-fix share: 31%
- Total bug impact: 926.9
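A self-fix share of this kind could be computed as sketched below. The page does not say whether the share is weighted by fix count or by bug impact, so this sketch assumes impact weighting; the `Fix` record, its field names, and the sample authors are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fix:
    introduced_by: str  # author of the commit that introduced the bug
    fixed_by: str       # author of the commit that fixed it
    impact: float       # bug impact attributed to this fix (hypothetical unit)


def self_fix_share(fixes: list[Fix]) -> float:
    """Fraction of total bug impact fixed by the bug's original author."""
    total = sum(f.impact for f in fixes)
    if total == 0:
        return 0.0
    own = sum(f.impact for f in fixes if f.introduced_by == f.fixed_by)
    return own / total


# Hypothetical sample: one self-fix and one fix by another author.
sample = [Fix("alice", "alice", 3.0), Fix("alice", "bob", 7.0)]
print(f"{self_fix_share(sample):.0%}")  # 30%
```

Switching to count weighting would only require setting every `impact` to 1.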
Quarterly Summary
The raw numbers behind the charts: commits analyzed, active contributors, total performance, performance per developer, and the Growth / Maintenance / Fixes split for each quarter.
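| Quarter | Commits analyzed | Active contributors | Total performance (ETV) | ETV / dev / mo | Growth | Maintenance | Fixes | Change vs. prior quarter |
|---|---|---|---|---|---|---|---|---|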
| Q2'25 | 4,641 | 160 | 556.91 | 1.2 | 37.7% | 47.1% | 15.2% | — |
| Q3'25 | 5,279 | 166 | 552.86 | 1.1 | 40.3% | 43.1% | 16.7% | −1% |
| Q4'25 | 6,137 | 164 | 595.17 | 1.2 | 33.2% | 44.8% | 22.1% | +8% |
| Q1'26 | 9,278 | 169 | 1,197.49 | 2.4 | 44.2% | 32.4% | 23.4% | +101% |
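The derived columns in the quarterly table can be recomputed from the raw ones. This sketch assumes the per-developer figure is total performance divided by active contributors and by three months, and that the last column is the quarter-over-quarter change in total performance; both readings match the published values, but the names `QUARTERS`, `per_dev_per_month`, and `qoq_change_pct` are illustrative.

```python
# (quarter, active contributors, total performance in ETV), from the table.
QUARTERS = [
    ("Q2'25", 160, 556.91),
    ("Q3'25", 166, 552.86),
    ("Q4'25", 164, 595.17),
    ("Q1'26", 169, 1197.49),
]


def per_dev_per_month(total_etv: float, devs: int, months: int = 3) -> float:
    """Quarterly performance per contributor, scaled to one month."""
    return total_etv / devs / months


def qoq_change_pct(previous: float, current: float) -> int:
    """Quarter-over-quarter change in total performance, in whole percent."""
    return round((current / previous - 1.0) * 100)


for i, (quarter, devs, etv) in enumerate(QUARTERS):
    rate = round(per_dev_per_month(etv, devs), 1)
    change = "n/a" if i == 0 else f"{qoq_change_pct(QUARTERS[i - 1][2], etv):+d}%"
    print(quarter, rate, change)
```

Run against the table, this reproduces the published per-developer values (1.2, 1.1, 1.2, 2.4) and changes (−1%, +8%, +101%).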
Top Contributors
Contributors ranked by performance per month (Growth + Maintenance + Fixes), over the last 90 days normalized to a 30-day calendar month.
The best way to measure AI efficiency
Sample: A preview of the Navigara engine running on a sample organization. The numbers below are illustrative, not part of the OSS500 benchmark above.
Measure
Score every commit by depth
GitHub commits are weighted by what it took to write them, not by lines of code. The result is ETV per developer per month.
Source: GitHub
Spend
Tie ETV to cost
AI token bills and seat costs are pulled per team and divided by the ETV produced. The result is your true cost per unit of work.
Source: Token usage + finance
Map
Tie work to objectives
Each ETV is mapped to your Jira epics and labels, so you can see what's key-aligned, aligned, or unmapped capacity.
Source: Jira
Performance
ETV delivered per developer / month
9.4 ETV / dev / month
5.6 below target
- Non-AI 5.8
- AI 3.6
AI Efficiency
AI spend per ETV unit delivered
$4.20
$0.60 over target
- Cost / ETV $4.20
Objective Alignment
Share of work mapped to key objectives
51%
24 pts below target
- Key-aligned 28%
- Aligned 23%