Why we built this
Everyone feels 100×. Nobody knows what 100× looks like.
We had the moment. Claude Code wrote in 20 minutes what would have taken our team a sprint to complete. We shipped it. It worked. We felt 100×.
Then the feeling faded. We asked the question that mattered. Did we actually move 100× faster across the quarter, or did one task feel fast and the average stay where it was?
Nobody had the number. Not one you would trust against payroll, anyway.
The expectations gap is louder than the productivity gap
Every engineering manager we talk to is having the same standup right now.
A non-technical stakeholder saw Cursor write a landing page in an evening and assumed the whole roadmap should compress by the same factor. Why is the migration not done? Why is this still estimated in weeks? My nephew built a CRUD app over the weekend.
The engineering side answers with what it has. “It is more complex than it looks.” “Sustainable pace matters.” “Bugs cost more than features.” All true. None of it data. None of it the kind of thing you put in front of a CEO who has already made up their mind.
That argument is being held in standups right now, in every company. The side without numbers loses it every time.
The old rulers describe a world that does not exist anymore
DORA was built for a pre-AI org chart. Velocity points were vibes before Copilot. Lines of code are quietly back, dressed up as “AI completion rate” and “AI-generated LOC,” and they correlate with shipped value even less than they did the first time.
So we built a new one.
What the 500 OSS Performance Index measures
We track engineering output across the orgs that built the industry. Public repos only. Refreshed daily.
Every merged commit is scored by the depth of work, not by size. Growth. Maintenance. Fix. The unit is ETV. The methodology is public. Revisions are versioned.
We report it in three ways:
- ETV per developer per month. What a contributor actually sustains.
- Work composition. How much of the output is growth versus rework?
- AI share of authorship. Directionally, how much of the merged code is AI-assisted?
This is what top teams sustain when nobody is performing for a quarterly review.
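As a rough illustration of how the three reported views could be derived from scored commits, here is a minimal sketch. It is not the index's actual pipeline. The `ScoredCommit` record, its field names, the `ai_assisted` flag, and the choice to weight AI share by ETV are all assumptions made for the example, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCommit:
    """Hypothetical record for one merged, already-scored commit."""
    author: str
    month: str         # e.g. "2025-07"
    kind: str          # "growth", "maintenance", or "fix"
    etv: float         # per-commit score (how it is computed is not shown here)
    ai_assisted: bool  # assumed flag; the real detection method is not specified


def etv_per_developer_per_month(commits):
    """Total ETV in each month divided by the number of active authors."""
    totals = defaultdict(float)
    authors = defaultdict(set)
    for c in commits:
        totals[c.month] += c.etv
        authors[c.month].add(c.author)
    return {month: totals[month] / len(authors[month]) for month in totals}


def work_composition(commits):
    """Share of total ETV contributed by each kind of work."""
    by_kind = defaultdict(float)
    for c in commits:
        by_kind[c.kind] += c.etv
    total = sum(by_kind.values())
    return {kind: value / total for kind, value in by_kind.items()} if total else {}


def ai_share(commits):
    """Share of total ETV from AI-assisted commits (ETV weighting is an assumption)."""
    total = sum(c.etv for c in commits)
    assisted = sum(c.etv for c in commits if c.ai_assisted)
    return assisted / total if total else 0.0


# Invented sample data, for illustration only.
commits = [
    ScoredCommit("ana", "2025-07", "growth", 6.0, True),
    ScoredCommit("ana", "2025-07", "fix", 2.0, False),
    ScoredCommit("ben", "2025-07", "maintenance", 2.0, True),
]

print(etv_per_developer_per_month(commits))  # {'2025-07': 5.0}
print(work_composition(commits))
print(ai_share(commits))
```

Which categories count as rework is a methodology decision, so the sketch only reports the share per category and leaves that grouping open.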
The evidence
Per-developer monthly performance across the index has increased by 137% since Q2 2025. The curve has not flattened. We will be the first to say if it does.
We track it daily. The day the average per-developer ETV hits 100× the Q2 2025 baseline, we will say it on the page. That is the day AI runs engineering instead of assisting it.
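For scale, it helps to put the 137% increase and the 100× target in the same units. The sketch below only converts between "percent increase" and "multiple of baseline". The function names are ours, and the only inputs are the two figures quoted above.

```python
def multiple_from_percent_increase(pct):
    """A +pct% increase expressed as a multiple of the baseline."""
    return 1 + pct / 100


def percent_increase_from_multiple(multiple):
    """A multiple of the baseline expressed as a percent increase."""
    return (multiple - 1) * 100


current_multiple = multiple_from_percent_increase(137)   # about 2.37x baseline
target_increase = percent_increase_from_multiple(100)    # 9900 (percent)
remaining_factor = 100 / current_multiple                # growth still needed

print(f"current: {current_multiple:.2f}x of baseline")
print(f"100x corresponds to a +{target_increase}% increase")
print(f"remaining: about {remaining_factor:.1f}x from today's level")
```

In other words, a 137% increase is roughly 2.4× the baseline, and 100× would show up on the same chart as a 9,900% increase.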
Yes, we thought about Goodhart
Every metric becomes a target. We know.
On the public index, we limit the gaming surface by including only well-known repositories whose maintainers have no real incentive to ship fake features for a leaderboard. We cannot see roadmap alignment on public data. We cannot see token spend. Those are real limits. We would rather name them than hide them.
In our engine for connected teams, we measure three things together:
- ETV per developer per month. Rewards shipping.
- Share of that ETV aligned to a roadmap epic or external ticket, pulled from Jira or Linear, not self-issued. Penalizes ticket-mining for the metric.
- Cost per ETV, including AI token spend. Penalizes burning AI budget for thin output.
We have not yet found a way to game all three at the same time, and we keep looking. Someone will probably find one. When they do, we will add a fourth.
That is our honest answer to “Will engineers just optimize for the number?” Build a metric that, when optimized, means doing the thing you wanted them to do anyway.
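To make the "three things together" idea concrete, here is a minimal sketch of how the three numbers could be computed side by side for one team-month. It is an illustration, not our engine. The `TeamMonth` record, its field names, and the split of cost into `other_cost` and `ai_token_cost` are assumptions, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamMonth:
    """Hypothetical monthly rollup for one connected team."""
    developers: int
    total_etv: float
    aligned_etv: float    # ETV linked to a roadmap epic or external ticket
    other_cost: float     # assumed non-AI cost, same currency as ai_token_cost
    ai_token_cost: float  # AI token spend


def guard_metrics(m):
    """Return the three measures side by side for one team-month."""
    if m.developers <= 0 or m.total_etv <= 0:
        raise ValueError("need at least one developer and some ETV")
    return {
        "etv_per_dev": m.total_etv / m.developers,
        "aligned_share": m.aligned_etv / m.total_etv,
        "cost_per_etv": (m.other_cost + m.ai_token_cost) / m.total_etv,
    }


# Invented figures: the second month pads output with unaligned work.
honest = TeamMonth(developers=4, total_etv=40.0, aligned_etv=32.0,
                   other_cost=9000.0, ai_token_cost=1000.0)
padded = TeamMonth(developers=4, total_etv=60.0, aligned_etv=32.0,
                   other_cost=9000.0, ai_token_cost=3000.0)

print(guard_metrics(honest))
print(guard_metrics(padded))
```

In this made-up case the padded month looks better on ETV per developer and on cost per ETV, but its aligned share drops from 0.8 to about 0.53. That is the point of reading the three numbers together rather than one at a time.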
The name is a target
We are calling it the 500 OSS Performance Index because that is where we are heading.
Right now, the index covers a smaller seed set of repos across the orgs we track. We will be adding the rest over the coming months. We chose to publish early on purpose. The methodology is the hard part. The repo list is not. Feedback while we are still building it is worth more than waiting for a launch number.
Adding more repositories costs us almost nothing. As soon as the seed set holds up under scrutiny, we widen it.
Coming next: the tooling map
A meaningful part of the index will be the tooling map. Which teams lean on Cursor versus Claude Code versus internal forks? Which repos run Copilot? Which have moved most of their authorship to agents?
Every vendor already has a leaderboard. This is something else: a signal of what is worth copying.
If the top quartile of merged output is sustained by teams running a specific stack, you should know that before your next purchasing cycle. If the gap between tools is smaller than the gap between teams using the same tool, you should know that too.
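One simple way to frame that second comparison is to set the spread between tool averages against the typical spread among teams on the same tool. The sketch below is only an illustration of that framing, not the tooling map itself. The tool labels are placeholders, the numbers are invented, and using a max-minus-min range as the "gap" is an assumption.

```python
from statistics import mean


def tool_vs_team_gap(etv_by_tool):
    """Compare the gap between tools with the gap between teams on one tool.

    etv_by_tool maps a tool label to a list of per-team ETV per developer.
    Returns (between_tools, within_tool), both as simple ranges.
    """
    tool_means = {tool: mean(values) for tool, values in etv_by_tool.items()}
    between_tools = max(tool_means.values()) - min(tool_means.values())
    within_tool = mean([max(v) - min(v) for v in etv_by_tool.values()])
    return between_tools, within_tool


# Invented numbers with placeholder tool labels.
sample = {
    "tool_a": [8.0, 12.0, 16.0],
    "tool_b": [10.0, 14.0, 18.0],
}

between, within = tool_vs_team_gap(sample)
print(between, within)  # 2.0 8.0
```

With these made-up values the difference between tools (2.0) is much smaller than the difference between teams on the same tool (8.0), which is the case where the team, not the tool, is the thing to study.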
What to do with this
If you are an engineering leader getting beaten up on timelines, there is now a number to point at. The top teams in the industry, with the best available tools, maintain a consistent output. It is high. It is also not infinite.
If you are a CTO trying to figure out where to aim, target above the median on the index for your stack first. Closing the gap on the top quartile is the harder part. It does not happen in one quarter.
If you want your own team on the chart, public or private, we will do that. The methodology runs the same either way.