Navigara

github.com-microsoft-DeepSpeed

all · 6 devs · built 2026-06-13

Repository snapshot

Performance · 90d

5.0 ETV

+27.0% since Q2 2025

ETV / dev / mo

0.9 ETV

3 devs · 30-day window

Last 7 days · per dev

1.3 ETV

per developer per week

Work mix

  • Growth: 24% · 4.3
  • Maintenance: 57% · 10.4
  • Fixes: 19% · 3.5

18.2 ETV all-time · 254 commits · 6 all-time devs · Jan 2025 – Jun 2026
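
The work-mix percentages appear to be each category's share of the all-time ETV total. A minimal sketch that reproduces them from the three category values shown above, assuming that reading; the function name `work_mix_shares` is illustrative and not part of any Navigara API:

```python
def work_mix_shares(etv_by_category):
    """Return each category's share of total ETV as a whole-number percentage."""
    total = sum(etv_by_category.values())
    return {name: round(100 * value / total) for name, value in etv_by_category.items()}


# Category values as shown in the work-mix summary.
mix = {"Growth": 4.3, "Maintenance": 10.4, "Fixes": 3.5}

total = round(sum(mix.values()), 1)   # all-time ETV across the three categories
shares = work_mix_shares(mix)         # percentage share per category
print(total, shares)
```

Under this assumption the three categories sum to the 18.2 ETV all-time figure, and the rounded shares match the 24% / 57% / 19% split.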

Monthly reports

  • Highlights

    • Enabled `torch.func` transformations for the DeepSpeed engine with *ZeRO stages 0, 1, and 2*, allowing advanced gradient computations [4370aa50 · Sung Hyun Cho].
    • Introduced `engine.coalesce_grad_reduction()` for *ZeRO 1/2/3 multi-backward patterns*, improving efficiency by coalescing gradient reductions [60b242af · Sung Hyun Cho].
    • Added support for *bf16 optimizer states with CPU offload* for *ZeRO stages 1, 2, and 3*, reducing host RAM usage by storing Adam moments in bf16 precision [3c337b54 · lucaspirola].
    • Integrated an *SDMA allgather backend* for *AMD MI300 GPUs* in *DeepSpeed ZeRO-3*, optimizing parameter prefetch and speeding up training by 10-11% on MI300X hardware [66af8f03 · inkcherry].
    • Implemented *automatic Sequence Parallelism (AutoSP) support for multimodal models* (ViT encoders and LLM decoders), reducing the memory footprint for long-sequence inputs [4e668fce · nathon].
    • Streamlined setup for *DS4Sci EvoformerAttention* by *automating CUTLASS installation path discovery*, eliminating manual configuration [d5356e07 · Max Tretikov].
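
To put the bf16 optimizer-state highlight in context: Adam keeps two moment tensors per parameter, so storing them in bf16 (2 bytes per element) instead of fp32 (4 bytes) halves that part of the host-memory footprint. A back-of-the-envelope sketch follows. The 7-billion-parameter model size is a hypothetical example, and the calculation covers only the two Adam moments, not master weights or gradients, since the report does not say how those are stored.

```python
GIB = 1024 ** 3


def adam_moment_bytes(num_params, bytes_per_element):
    """Bytes needed for Adam's two moment tensors (first and second moment)."""
    return 2 * num_params * bytes_per_element


num_params = 7_000_000_000  # hypothetical model size, not taken from the report

fp32_gib = adam_moment_bytes(num_params, 4) / GIB  # moments stored in fp32
bf16_gib = adam_moment_bytes(num_params, 2) / GIB  # moments stored in bf16

print(f"fp32 moments: {fp32_gib:.1f} GiB, bf16 moments: {bf16_gib:.1f} GiB")
```

With ZeRO partitioning, each rank would hold only its shard of these states, so the per-host figure depends on the number of ranks per host.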

    Observations

    • Total output (Grow + Maintenance) increased 87% compared to the 2-month average (current: 7, average: 4), indicating a highly productive month.
    • Maintenance score surged 128% compared to the 2-month average (current: 4, average: 2), reflecting a strong emphasis on improving existing systems, CI stability, and release management.
    • Commit volume saw a moderate increase of 22% (36 commits this month vs 30-commit 2-month average).
    • Grow and Waste scores remained stable compared to the 2-month average (current Grow: 1 vs average: 1; current Waste: 1 vs average: 1).
    • A significant number of bug fixes were implemented, addressing critical issues such as a *ZeRO-3 forward crash* on modules with plain dict `_parameters` [d7a3972f · Sung Hyun Cho], a *critical file descriptor leak* in `FastFileWriter` [b01a0915 · jg-heo], and a *command injection vulnerability* in `data_analyzer.py` [8cdf8651 · OrbisAI Security].
    • Multiple compatibility fixes were delivered, including enabling `vmap` for *LinearFunctionForZeroStage3* [ae576f83 · Sung Hyun Cho], fixing *DeepCompile AOT kwargs patching* for PyTorch >= v2.11 [510ebe58 · Masahiro Tanaka], and supporting *flash-attn 2.7.0* in FPDT attention [45429221 · bincheng.xiong].
    • CI/CD robustness was a recurring theme, with fixes for *PR-target workflow concurrency* [b7aef4dc · Masahiro Tanaka], *AutoSP compile test sequencing* [2c8a007b · Masahiro Tanaka], and *full CI test isolation* for ZeRO chmod and NVMe quantization [4570c508 · Masahiro Tanaka].
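
The percentage changes in the observations compare the current month against a 2-month average. A minimal sketch of that comparison, assuming the usual relative-change formula; the function name `pct_change` is illustrative:

```python
def pct_change(current, baseline):
    """Relative change of `current` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (current - baseline) / baseline


# Inputs are the rounded values displayed in the observations.
commits_change = pct_change(36, 30)  # commit volume
output_change = pct_change(7, 4)     # total output
print(commits_change, output_change)
```

Applied to the rounded values shown, this gives 20% for commits and 75% for total output, not the reported 22% and 87%. The likely explanation is that the report computes its percentages from unrounded scores and averages, which are not displayed.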

Performance over time

ETV stacked by Growth, Maintenance and Fixes — 90-day moving average, normalized to ETV / month.

Average performance per developer

ETV per active developer per month — 30-day moving average.

Active developers over time

Unique developers committing each day — 90-day moving average.

Knowledge concentration

How dependent is this repo on a small number of contributors? Higher top-1 share = higher key-person risk.

Top 1
34.6 %
Top 3
79.9 %
Top 5
96.1 %

Masahiro Tanaka owns 34.6 % of commits.
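
The top-k figures can be read as the fraction of all commits authored by the k most active contributors. A minimal sketch of that calculation, assuming shares are computed from commit counts as the sentence above suggests. The per-developer counts below are made-up placeholders, not the repository's real data.

```python
def top_k_share(commit_counts, k):
    """Percentage of all commits authored by the k most active contributors."""
    counts = sorted(commit_counts.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return 100.0 * sum(counts[:k]) / total


# Hypothetical commit counts for six developers (placeholder data).
commits = {"dev_a": 50, "dev_b": 30, "dev_c": 10, "dev_d": 6, "dev_e": 3, "dev_f": 1}

top1 = top_k_share(commits, 1)
top3 = top_k_share(commits, 3)
top5 = top_k_share(commits, 5)
print(top1, top3, top5)
```

A top-1 share near or above 50% would mean a single contributor accounts for most of the history, which is the key-person risk this panel is meant to surface.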

Most impactful commits

Top 20 by ETV in the all-time window.