Navigara

OpenAI — Engineering Performance

Avg. perf / dev / mo (ETV)

+71.0%

3.59 → 6.13

Active contributors

+2.7%

37.0 → 38.0

Growth

−7.6pp

48.2% → 40.6%

Fixes

−0.2pp

12.6% → 12.4%

OpenAI vs. 500 OSS Performance Index

Per-engineer ETV for OpenAI plotted against the pooled 500 OSS Performance Index. Both series are 90-day trailing rolling averages scaled to a 30-day month, so the curves sit on the same scale (ETV / dev / mo) and can be compared point-for-point. Latest reading: OpenAI is 216% above the index (6.13 vs 1.94 ETV/dev/mo). Baseline gap was 2% above.
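The gap figure quoted above can be reproduced from the two readings. The sketch below is illustrative: the helper names `monthly_rate` and `gap_vs_index` are ours, and the scaling of a 90-day window to a 30-day month is assumed to be a simple proportional rescale, which the source does not spell out.

```python
def monthly_rate(etv_in_window: float, window_days: int = 90, month_days: int = 30) -> float:
    """Scale a trailing-window ETV figure to a 30-day month (assumed proportional)."""
    return etv_in_window * month_days / window_days


def gap_vs_index(org: float, index: float) -> float:
    """Percentage by which the org reading exceeds the index reading."""
    return (org / index - 1.0) * 100.0


# Latest readings quoted in the text, both in ETV / dev / mo.
latest_gap = gap_vs_index(6.13, 1.94)
print(f"{latest_gap:.0f}% above the index")
```

Because both series use the same unit, the ratio of the two readings is all that is needed; no further normalization is applied here.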

Monthly reports

  • Highlights

    • A major *refactoring* decoupled the *plugin management subsystem* into a new `codex-core-plugins` crate to improve modularity and maintainability [pr/20348].
    • Significant *security enhancements* were implemented, including the addition of *Windows Filtering Platform (WFP) filters* to the *Windows Sandbox* to reduce network egress vulnerabilities [pr/20101] and a denylist for dangerous project-level config keys [pr/20098].
    • The *TUI* underwent a multi-part *refactoring* to reduce its dependency on the core protocol, introducing new TUI-owned local models and streamlining command handling, initiated in [pr/20324], [pr/20173], [pr/20174], and [pr/20172].
    • New *APIs for workspace plugin sharing* were introduced, enabling users to save, list, and delete custom plugins, impacting `app-server-protocol` and `core-plugins` [pr/20278].
    • The *plugin ecosystem* was expanded with the addition of *Canva* and several *Microsoft-curated plugins* to the `tool_suggest` discoverable allowlist, enhancing integration options [pr/20474], [pr/20278], and [pr/20282].
    • Improvements to *CI/CD robustness* were made by increasing workflow timeouts for Windows release builds [pr/20343] and general Rust release builds [pr/6eab7519], ensuring more reliable artifact generation.

  • Observations

    • Total output fell 14% against the 1-month average (202 vs 234).
    • Growth score fell 28% against the 1-month average (76 vs 105).
    • Maintenance score held steady, up 2% on the 1-month average (102 vs 101).
    • Waste score improved, down 16% from the 1-month average (23 vs 28).
    • Commit volume fell 9% against the 1-month average (728 vs 796).
    • Multiple critical bugs were fixed in *Windows sandbox* environments, including named-pipe access issues [06f3b483 · iceweasel-oai], pseudoconsole attribute handling [5cac3f89 · iceweasel-oai], and general process management edge cases [cecca5ae · iceweasel-oai].
    • Several bug fixes targeted *configuration management* and *feature gating*, such as ignoring dangerous project-level config keys [9ddb267e · Owen Lin], making missing config clears no-ops [a73403a8 · Eric Traut], and correctly gating *multi-agent v2* tools independently of `collab` [c37f7434 · jif-oai].
    • A bug fix for *network policy enforcement* correctly handles deferred network proxy denials, ensuring commands are terminated when network access is blocked [07c8b8c7 · viyatb-oai], which contributed 3 to the waste score.
    • A breaking change was introduced in the *realtime communication protocol* by renaming `session_id` to `realtime_session_id`, requiring client updates [8a97f3cf · Ahmed Ibrahim].
    • Fixes were implemented for *flaky tests* related to port fallback in the `codex-rs` login server [6014b667 · Owen Lin] and a CI regression impacting `apply_patch_cli` tests [c9f7c88f · Michael Bolin].
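The percentage figures in the score observations above can be recomputed from the printed current and average values. A minimal sketch; the helper name `pct_change` is ours and is not taken from the report.

```python
def pct_change(current: float, average: float) -> float:
    """Percent change of the current value relative to the 1-month average."""
    return (current / average - 1.0) * 100.0


# (current, 1-month average) pairs copied from the observations above.
observations = {
    "total output": (202, 234),
    "growth score": (76, 105),
    "commit volume": (728, 796),
}

changes = {name: pct_change(cur, avg) for name, (cur, avg) in observations.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

Applying the same formula to the printed maintenance (102 vs 101) and waste (23 vs 28) scores gives about +1% and −18%, slightly off the reported 2% and 16%. The report may be computing from unrounded scores, but that is a guess.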

Repositories

Active repositories ranked by average performance per developer per month (over the last 90 days). The chart shows monthly performance composition — each repo as a stacked layer, with the top of the stack representing total org performance per month. Top 7 repos shown; the remainder is aggregated as “Other”.

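| Repository | Active devs | Performance (ETV) | ETV / dev / mo | Since start |
| --- | --- | --- | --- | --- |
| openai-agents-js | 1 | 41 | 13.6 | +942% since Q2 2025 |
| codex | 31 | 603 | 6.5 | +1367% since Q2 2025 |
| openai-agents-python | 2 | 32 | 5.3 | +532% since Q2 2025 |
| openai-dotnet | 4 | 21 | 1.8 | +311% since Q2 2025 |
| plugins | 6 | 3 | 0.2 | |
| openai-node | 1 | 0 | 0.0 | −100% since Q2 2025 |
| Company total (6 repositories) | 38 unique devs | 699 | 6.13 | +1124% since Q2 2025 |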
Performance (ETV) is the sum of every repository above. Active devs at the company level counts unique contributors across all repos, so a contributor working in multiple repos is counted once here but appears in each repo's row (the per-repo column will sum higher). ETV / dev / mo = Company ETV ÷ unique devs ÷ 3 mo. The "Since start" column compares each repo's Q1 2026 quarterly performance to the first quarter it had any activity — for repos that existed in Q2 2025 (when this index began), that's Q2 2025; for younger repos it's the quarter they actually started. The company row uses Q2 2025 as the baseline since the index itself began then.
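The company-level formula in the note above can be checked directly against the printed totals. In this sketch the function name is ours, and the per-repo developer counts are our reading of the flattened repository rows, so treat them as an interpretation rather than exported data.

```python
def etv_per_dev_per_month(etv_total: float, unique_devs: int, months: int = 3) -> float:
    """Company ETV divided by unique devs, divided by the months in the window."""
    return etv_total / unique_devs / months


company = etv_per_dev_per_month(699, 38)
print(f"{company:.2f} ETV / dev / mo")

# Active devs per repo as we read them from the rows above (an interpretation).
devs_per_repo = {
    "openai-agents-js": 1,
    "codex": 31,
    "openai-agents-python": 2,
    "openai-dotnet": 4,
    "plugins": 6,
    "openai-node": 1,
}
per_repo_sum = sum(devs_per_repo.values())
print(per_repo_sum, "per-repo dev slots vs 38 unique devs")
```

The per-repo sum exceeding the unique count is the double counting the note describes: a contributor active in several repos appears once per repo row.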

Performance Growth vs Active Contributors

Engineering performance is outpacing team growth by 12.3×. Left axis shows total performance score, right axis shows active contributor count. The gap between curves represents productivity gains — more delivered per person, not just more people. Unit: Engineering Throughput Value (ETV).

Cost per Performance Unit

−86%

Because performance per engineer has more than doubled, each unit of engineering performance now costs approximately 86% less than at the baseline 90-day window (ending 2025-06-29). This is a directional estimate: the exact figure depends on fully-loaded engineer cost, but the direction is unambiguous.

Effective Capacity Added

+225 engineers

At today's productivity, the current 38-person team delivers the performance equivalent of 263 engineers working at the productivity of the baseline 90-day rolling window (ending 2025-06-29). That is roughly 225 engineers' worth of capacity added through productivity gains, not hiring.
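The capacity and cost figures are two views of the same productivity ratio. The sketch below derives both from the numbers in the text, under the stated assumption that fully-loaded cost per engineer is unchanged between the two windows; the variable names are ours.

```python
team_size = 38             # current active contributors
baseline_equivalent = 263  # engineers needed at baseline productivity

productivity_ratio = baseline_equivalent / team_size
capacity_added = baseline_equivalent - team_size

# Assumption: cost per engineer is the same in both windows, so cost per
# unit of performance scales with 1 / productivity_ratio.
cost_change_pct = (1.0 / productivity_ratio - 1.0) * 100.0

print(f"productivity ratio: {productivity_ratio:.2f}x")
print(f"capacity added: {capacity_added} engineers")
print(f"cost per unit: {cost_change_pct:.0f}%")
```

If engineer cost rose between the windows, the cost reduction would be smaller than this estimate, which is why the dashboard calls the figure directional.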

Performance Composition

Stacked bars show total complexity performance split into Growth (new value), Maintenance (sustaining systems), and Fixes (rework). The yellow line overlays performance per contributor — rising line means each engineer is delivering more, regardless of team size changes. Unit: Engineering Throughput Value (ETV).

CapEx vs OpEx

Monthly CapEx vs OpEx split. CapEx (capitalizable investment) is Growth — new features and capabilities. OpEx (operating expense) is Maintenance plus Fixes — keeping the lights on and reworking what's already shipped. The yellow line is the CapEx share, a quick read on how much of the month went into building new vs sustaining existing. Unit: Engineering Throughput Value (ETV).

Hours per Repository

Trailing 90-day window (64 working days). Org-level capacity is allocated to each repo by its share of org performance, then split CapEx / OpEx by that repo's own Growth vs Maintenance + Fixes mix.

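| Repository | CapEx | OpEx |
| --- | --- | --- |
| openai-dotnet | 74.6% | 25.4% |
| codex | 41.6% | 58.4% |
| plugins | 29.3% | 70.7% |
| openai-agents-js | 26.6% | 73.4% |
| openai-agents-python | 17.0% | 83.0% |
| openai-node | 0.0% | 0.0% |
| Total | 40.6% | 59.4% |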
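The allocation rule described for this section (org capacity split by each repo's share of performance, then by its CapEx / OpEx mix) can be sketched as follows. The 8-hour working day is an assumption not stated in the source, the per-repo ETV values are our reading of the Repositories rows, and `allocate_hours` is a hypothetical helper.

```python
WORKING_DAYS = 64   # from the section text
HOURS_PER_DAY = 8   # assumption, not stated in the source
UNIQUE_DEVS = 38    # company-level unique contributors

org_hours = UNIQUE_DEVS * WORKING_DAYS * HOURS_PER_DAY

# name: (90-day ETV as we read it from the Repositories rows, CapEx share)
repos = {
    "codex": (603, 0.416),
    "openai-agents-js": (41, 0.266),
    "openai-agents-python": (32, 0.170),
    "openai-dotnet": (21, 0.746),
    "plugins": (3, 0.293),
    "openai-node": (0, 0.0),
}


def allocate_hours(org_hours, repos):
    """Split org hours by ETV share, then by each repo's CapEx / OpEx mix."""
    total_etv = sum(etv for etv, _ in repos.values())
    allocation = {}
    for name, (etv, capex_share) in repos.items():
        repo_hours = org_hours * etv / total_etv
        allocation[name] = (repo_hours * capex_share, repo_hours * (1.0 - capex_share))
    return allocation


allocation = allocate_hours(org_hours, repos)
capex_hours = sum(capex for capex, _ in allocation.values())
for name, (capex, opex) in allocation.items():
    print(f"{name}: CapEx {capex:.0f} h, OpEx {opex:.0f} h")
print(f"org CapEx share: {capex_hours / org_hours:.1%}")
```

With these inputs the ETV-weighted CapEx share comes out at about 40.5%, close to the 40.6% in the Total row, which suggests the reading of the rows is at least consistent.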

Fix Burden Distribution

Monthly rework volume broken down by who did it. Top contributors carry their named slice; everyone else is rolled into Others. Use this to spot whether fix work is concentrated on a small group (bus-factor risk) or distributed across the team.

Fix authorship over time

Monthly fix activity in this scope, split by who fixed the bug. Darker emerald = the original author fixed their own bug; lighter emerald = someone else cleaned it up. A persistently high 'fixed by another author' share is a signal of bug-debt landing on the team rather than its author. X-axis is fix-time — bugs introduced but not yet detected don't appear.

Self-fix share
22%
Total bug impact
253.6
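One plausible way to compute a self-fix share is to count the fixes whose author also introduced the bug. This is a sketch under assumptions: the `Fix` record, its fields, and the sample data are all hypothetical, and the dashboard may weight by impact rather than by count.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    bug_author: str   # who introduced the bug
    fix_author: str   # who fixed it
    impact: float     # hypothetical per-fix weight


def self_fix_share(fixes):
    """Fraction of fixes where the original author fixed their own bug."""
    if not fixes:
        return 0.0
    self_fixed = sum(1 for f in fixes if f.bug_author == f.fix_author)
    return self_fixed / len(fixes)


# Made-up records for illustration only.
fixes = [
    Fix("alice", "alice", 2.0),
    Fix("alice", "bob", 1.5),
    Fix("carol", "bob", 3.0),
    Fix("dave", "erin", 0.5),
]
share = self_fix_share(fixes)
total_impact = sum(f.impact for f in fixes)
print(f"self-fix share: {share:.0%}, total bug impact: {total_impact}")
```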

Quarterly Summary

The raw numbers behind the charts: commits analyzed, active contributors, total performance, performance per developer, and the Growth / Maintenance / Fixes split for each quarter.

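| Quarter | Commits analyzed | Active contributors | Total performance | Performance per developer | Growth | Maintenance | Fixes | Change vs. prior quarter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Q2'25 | 448 | 18 | 47.99 | 0.9 | 52.8% | 35.2% | 12% | |
| Q3'25 | 1,092 | 25 | 160.91 | 2.1 | 43.2% | 45.8% | 11% | +235% |
| Q4'25 | 1,096 | 32 | 198.11 | 2.1 | 46.7% | 42.5% | 10.8% | +23% |
| Q1'26 | 2,507 | 38 | 587.59 | 5.2 | 46.9% | 40.3% | 12.8% | +197% |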
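The trailing percentage on each quarterly row appears to be the change in total performance against the previous quarter. The sketch below recomputes it from the totals as we read them from the rows above; that reading, and the helper name `pct_change`, are our assumptions.

```python
# Total performance (ETV) per quarter, as we read it from the rows above.
totals = {"Q2'25": 47.99, "Q3'25": 160.91, "Q4'25": 198.11, "Q1'26": 587.59}


def pct_change(new: float, old: float) -> float:
    """Percent change from old to new."""
    return (new / old - 1.0) * 100.0


quarters = list(totals)
qoq = [pct_change(totals[b], totals[a]) for a, b in zip(quarters, quarters[1:])]
since_start = pct_change(totals["Q1'26"], totals["Q2'25"])

for quarter, change in zip(quarters[1:], qoq):
    print(f"{quarter}: {change:+.0f}% vs prior quarter")
print(f"Q1'26 vs Q2'25: {since_start:+.0f}%")
```

The last line lines up with the company-level "+1124% since Q2 2025" figure in the Repositories section, which supports this reading of the columns.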

Top Contributors

Contributors ranked by performance per month (Growth + Maintenance + Fixes), over the last 90 days normalized to a 30-day calendar month.

The best way to measure AI efficiency

Sample

A preview of the Navigara engine running on a sample organization. The numbers below are illustrative, not part of the OSS500 benchmark above.

Measure

Score every commit by depth

GitHub commits are weighted by what it took to write them, not by lines of code. The result is ETV per developer per month.

Source: GitHub

Spend

Tie ETV to cost

AI token bills and seat costs are pulled per team and divided by the ETV produced. The result is your true cost per unit of work.

Source: Token usage + finance

Map

Tie work to objectives

Each ETV is mapped to your Jira epics and labels, so you can see what's key-aligned, aligned, or unmapped capacity.

Source: Jira

Performance

ETV delivered per developer / month

9.4 ETV / dev / month

5.6 below target
target 15
Chart x-axis: Jun '25, Dec '25, May '26
  • Non-AI 5.8
  • AI 3.6

AI Efficiency

AI spend per ETV unit delivered

$4.20

$0.60 over target
target $3.60
Chart x-axis: Jun '25, Dec '25, May '26
  • Cost / ETV $4.20

Objective Alignment

Share of work mapped to key objectives

51%

24 pts below target
target 75%
Chart x-axis: Jun '25, Dec '25, May '26
  • Key-aligned 28%
  • Aligned 23%
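The three sample cards above each compare a headline value with a target, and two of them decompose the headline into parts. A small sketch of that arithmetic follows; `cost_per_etv` and the spend and ETV figures passed to it are hypothetical, chosen only to illustrate the "spend divided by ETV" definition.

```python
def cost_per_etv(ai_spend: float, etv_delivered: float) -> float:
    """AI spend divided by ETV delivered (definition from the Spend card)."""
    return ai_spend / etv_delivered


# Performance card: Non-AI + AI ETV per developer per month, against target 15.
non_ai, ai = 5.8, 3.6
etv_per_dev = non_ai + ai
etv_gap = 15 - etv_per_dev

# AI Efficiency card: cost per ETV against target.
cost_gap = 4.20 - 3.60

# Objective Alignment card: key-aligned + aligned share, against target 75%.
key_aligned, aligned = 28, 23
aligned_total = key_aligned + aligned
alignment_gap = 75 - aligned_total

print(f"{etv_per_dev:.1f} ETV / dev / month, {etv_gap:.1f} below target")
print(f"${cost_gap:.2f} over target")
print(f"{aligned_total}% aligned, {alignment_gap} pts below target")

# Hypothetical inputs: $4,200 of AI spend for 1,000 ETV delivered.
print(f"${cost_per_etv(4200.0, 1000.0):.2f} per ETV")
```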