Luke Sandberg
lukesandberg@users.noreply.github.com
90d · built 2026-05-28
90-day totals
- Commits
- 93
- Grow
- 11.8
- Maintenance
- 23.5
- Fixes
- 4.5
- Total ETV
- 39.8
Where this dev ranks
Percentile against the global top-100 leaderboard (all-time totals).
- By commits
- Top 71 %
- By Growth share
- Top 67 %
30-day trajectory
Last 30 days vs. the 30 days before. Up arrows on Growth and ETV mean improvement; up arrow on Fixes share means more time on fixes (worse).
Daily performance
Daily ETV, stacked by Growth, Maintenance and Fixes.
Work-mix over time
Share of Growth / Maintenance / Fixes over a rolling 7-day window. Reads as 'where is effort flowing right now'.
Bug flow over time
Monthly bug flow attributed to this developer. The left bar (red) is bug impact this dev authored that was addressed in the given month — combining bugs others fixed for them and bugs they fixed themselves. The right bar is fixes they personally shipped that month, split between self-fixes (overlap with the red bar) and fixes done for someone else. X-axis is fix-time, not introduction-time — the Navigara API attributes bugs backward to the author at the moment the fix lands.
- Self-fix share
- 17%
- Bugs you introduced
- 6.1
- Bugs you fixed
- 14.5
Repository spread
Where this developer's commits land: concentrated work (top repository holds > 80% of commits) vs. polymath spread (top repository holds < 30%).
Most impactful commits
Top 20 by ETV in the 90-day window.
- 3.4 ETV · Proof of concept: task eviction after snapshot for turbo-tasks-backend (#91790) > **Note:** This is a **proof of concept** implementation. It is not yet ready for production use. ## Summary Implements memory eviction for the turbo-tasks engine. After a persistence snapshot completes, tasks that are safe to remove are evicted from in-memory storage and transparently restored from disk on next access. ### Eviction levels - **Full eviction**: Entire task removed from the in-memory map (restored from disk on access). Only possible when the task has no meaningful transient state (and other state is already on disk). - **DataAndMeta eviction**: Both data and meta categories cleared, but the task stays in the map to preserve transient state (e.g. `current_session_clean`, aggregated session-clean counts). - **DataOnly eviction**: Only data-category fields cleared; meta (graph structure, output, dirty state) stays in memory. - **MetaOnly eviction**: Only meta-category fields cleared; data stays in memory. Data and meta evictability are computed independently — if one category is modified but the other is clean, the clean category can still be dropped. Eviction is gated behind `BackendOptions::evict_after_snapshot` (off by default), and can be enabled in Next.js via the `TURBO_ENGINE_EVICT_AFTER_SNAPSHOT=1` env var for testing. ## Key changes - **Orthogonal eviction decision tree** (`storage_schema.rs`): Data and meta evictability are computed independently. Full eviction additionally requires no meaningful transient state (session-clean flags, aggregated session-clean counts). This replaces the previous sequential bail-out approach, which was too aggressive on full eviction (losing transient session state on leaf tasks) and not aggressive enough on partial eviction (blocking all eviction when only one category was modified). 
- **`drop_partial()` codegen** (`task_storage_macro.rs`): New generated methods to drop data. - **`restore_from_*()` codegen changes** (`task_storage_macro.rs`): New semantics for merging persistent data from the backend with transient data stored in memory. - **`task_cache` moved into `Storage`** (`storage.rs`): The `CachedTaskType → TaskId` deduplication map was previously a separate field on `TurboTasksBackendInner`. It is now owned by `Storage` so eviction can remove entries when a task is fully evicted. Because `task_cache` is a pure performance cache (entries are re-populated by `task_by_type()` on miss once the task type is persisted to backing storage), evicting entries is safe. After bulk eviction the map is shrunk when it is less than half full. - **Parallel shard eviction** (`storage.rs`): Eviction iterates all storage shards in parallel after snapshot, applying the appropriate eviction level per task. Each shard is shrunk after bulk eviction to reclaim slack capacity. - In principle this is O(N) work to scan, but because each pass drops >98% of tasks there is no wasted work, and the logic is fast, taking <100ms for even the largest applications. ## Design notes - **SessionDependent tasks**: SessionDependent tasks can still be evicted, but if `current_session_clean` is set we prevent full eviction to avoid rechecking. Within a session, the file watchers are responsible for invalidations after `current_session_clean` is set. ## Known limitations (proof of concept) - No LRU or access-frequency tracking — all eligible tasks are evicted on every snapshot cycle - No memory pressure feedback — eviction runs on a timer, not in response to actual memory pressure - Only runs after snapshotting, which tends to be a high point in memory - Future work will explore interleaving this logic with snapshotting to trim the peak · github.com-vercel-next.js · d81d5ab7 · 2026-05-10
- 3.4 ETV · Remove a ton of turbo task functions from Issue trait items (#92623) ## What? Remove `#[turbo_tasks::function]` from all methods on the `Issue` value_trait, following the same pattern as #92593 which did the same for `ValueDebug`. The `Issue` trait previously had 8 methods decorated with `#[turbo_tasks::function]`, which caused the turbo-tasks machinery to generate `NativeFunction` statics, task registration, vtable entries, and scheduling/caching infrastructure for each method on each of the ~48 `impl Issue for` types in the codebase — most of which are trivially synchronous (returning a constant string, enum variant, or stored struct field). ## How? - Change all 8 `Issue` trait methods to plain methods. Methods that are always synchronous become direct return-type methods (`severity`, `stage`, `source`, `documentation_link`). Methods that are genuinely async on at least one impl use `Pin<Box<dyn Future<Output = Result<T>> + Send + '_>>` for object safety. - Callers that need to invoke trait methods on a `Vc<Box<dyn Issue>>` use `issue.into_trait_ref().await?` to get a `TraitRef<Box<dyn Issue>>` and call methods directly on that. - `PlainIssue::from_issue` (the sole hot caller that materializes all fields) resolves one `trait_ref` and calls all methods on it, similar to the `ValueDebug` pattern. - `ResolvingIssueWithLocation` (which wraps an inner `ResolvedVc<Box<dyn Issue>>`) eagerly copies `severity`, `stage`, `documentation_link`, and `source` at construction time rather than delegating async calls to the inner issue. - Delete `OptionIssueSource` and `OptionStyledString` wrapper types, which existed solely as `Vc<T>` return type wrappers for the old turbo-tasks methods. ## Tradeoffs **Lost per-call caching:** Previously each `Issue` method was a turbo-tasks task, so results were cached per `(Vc<Issue>, method)`. 
In practice this caching was useless — issues are emitted once and their fields read once by `PlainIssue::from_issue`, so there was never a second call to hit the cache. The only real cost is that `IssueFilter::matches` previously cached per `(filter, issue)` pair; that caching is preserved because `matches` itself remains a `#[turbo_tasks::function]` and now calls `into_trait_ref().await?` once at the top. **`ResolvingIssueWithLocation` eagerly reads inner fields:** Instead of lazily delegating to the inner issue, `stage` and `documentation_link` are copied at construction. This is a minor semantic change (the values are snapshotted rather than live), but in practice these fields are always constant values, so it makes no difference. ## Binary size impact (darwin-arm64, compared against #92593) | Metric | #92593 baseline | This PR | Delta | |---|---|---|---| | Unstripped dylib | 124.6 MB | 124.0 MB | **-0.6 MB (-0.5%)** | | Stripped | 84.7 MB | 84.4 MB | **-0.3 MB (-0.4%)** | | Stripped + gzip (npm) | 29.2 MB | 29.1 MB | **-0.1 MB (-0.3%)** | The savings are smaller than for `ValueDebug` (~2.5 MB stripped) because there are ~48 `Issue` impls vs. one `ValueDebug` impl per `#[turbo_tasks::value]` type, but they are real and cumulative. · github.com-vercel-next.js · 35b55826 · 2026-04-10
- 2.9 ETV · Reimplement code frame rendering in native code (#85592) ## What Replaces the `@babel/code-frame` dependency with a new Rust-based implementation (the `next-code-frame` crate) for rendering code frames in error messages. ### Why - **Crash fix**: `@babel/code-frame` uses the `js-tokens` library for syntax highlighting, which has [known issues](https://github.com/lydell/js-tokens?tab=readme-ov-file#known-failures) with large string literals and long lines. This can cause Next.js to throw RangeErrors when rendering errors, hiding the original issue! - **Long line support**: The old implementation had no concept of terminal width, dumping entire lines into the output. The new implementation uses "horizontal scrolling" — truncating lines and centering the error location in the visible window. - **Performance**: The Rust implementation only processes the visible line range (typically ~6 lines), not the entire file. Syntax highlighting uses a skip-scan heuristic to start tokenizing near the visible window rather than from byte 0. - **Dependency reduction**: Drops the semi-unmaintained `@babel/code-frame` bundled dependency in favor of code we control. ### Benchmarks In-process benchmarks comparing `render_code_frame()` (Rust, via criterion) against `codeFrameColumns()` (Babel, via hrtime with DCE prevention). Both have syntax highlighting and color output enabled. No process startup or file I/O is included in the measurement. 
| Scenario | `next-code-frame` (Rust) | `@babel/code-frame` | Speedup | |---|---|---|---| | Small file (~490 lines TSX) | **5.4 µs** | 507 µs | **~94x** | | Large file (~39k lines JS) | **143 µs** | 82.9 ms | **~580x** | | Large file minified | **51 µs** | - | **-** | The gap widens with file size because Babel's `highlight()` runs a regex tokenizer over the **entire source** before slicing to the visible window, while the Rust implementation uses a windowed line index and skip-scan heuristic — only processing the visible window regardless of file size. However, for minified files we do end up tokenizing the whole file, so we are only 8x faster. ### How - New `crates/next-code-frame/` Rust crate with: - Frame rendering with terminal-width-aware horizontal scrolling - Regex-based syntax highlighting (matching Babel's color scheme) - Skip-scan heuristic for O(1)-ish highlighting regardless of file size - Windowed line index that only scans/stores offsets for the visible region - Comprehensive test suite (800+ lines) - Exposed via both NAPI (native) and WASM bindings - JS wrappers in `packages/next/src/shared/lib/errors/`: - `code-frame.ts` — primary wrapper using native bindings - `optional-code-frame.ts` — graceful fallback returning `undefined` if bindings are unavailable - All existing callsites (`diagnosticFormatter`, `parseScss`, dev overlay, turbopack utils) updated - For `patch-error-inspect.ts`, I adopted an injection-style approach to avoid coupling to the native dependency ### Concerns / review focus areas - **Reliability**: The regex-based tokenizer is best-effort and language-agnostic — it should never crash on invalid syntax, but highlighting accuracy may differ from Babel's `js-tokens` in edge cases. - **Native dependency**: This moves code frame rendering into the native binary. Performance should be better, but it is worth verifying there are no regressions in environments where native bindings behave differently (e.g. the WASM fallback path). 
- **Regressions**: The output format and color scheme closely match Babel's, but there may be subtle differences. The horizontal scrolling behavior is new. Fixes #85357 Closes PACK-5754 · github.com-vercel-next.js · ce14ca88 · 2026-03-03
- 2.2 ETV · turbo-tasks: task-storage memory wins (#93720) ## Summary Four small, independent changes that shrink `TaskStorage` and the data it owns (I recommend reviewing commit by commit): 1. **`Arc<CachedTaskType>` → `triomphe::Arc<CachedTaskType>`.** `triomphe::Arc` is already a workspace dep used in `ReadRef` / `SharedReference`. `CachedTaskType` never appears in a `Weak<...>`, so we can drop the weak count and the CAS in `drop_slow`. Saves one `usize` per allocation. Migrated via a `CachedTaskTypeArc` newtype so the bincode `Encode`/`Decode` impls don't need to cross the orphan rule. 2. **Niche-encode `CellDependency`.** The `cell_dependencies` / `cell_dependents` sets used to hold `(CellRef, Option<u64>)` tuples — `Option<u64>` cost a full 16 B (8 B discriminant + 8 B value, aligned), making each element 32 B. A `CellDependency` enum with two variants (`All(CellRef)` / `Hash(CellRef, u64)`) lets the layout algorithm reuse the niche on `ValueTypeId` (`NonZero<u16>`) inside `CellRef.cell.type_id` for the variant tag. Element size drops 32 → 24 B; `LazyField` from 56 → 48 B. The same enum backs both forward and reverse edges — for `cell_dependents` we re-point `CellRef.task` at the dependent task. Added `CellDependency::into_parts()` and use it in `iter_cell_dependents` / `iter_cell_dependencies` hot loops so the discriminant is checked once instead of twice via back-to-back `cell_ref()` + `key()` calls. 3. **`TaskStorage::lazy: Vec<LazyField>` → `TinyVec<LazyField>`.** The lazy vec only ever holds ~25 elements (one per declared lazy field in the schema). Swapping `Vec`'s 24 B `(ptr, len, cap)` header for `(ptr, len: u8, cap: u8)` + 6 B padding gives 16 B. Drops `size_of::<TaskStorage>()` from 136 → 128 B. `TinyVec` is hand-rolled, so I added a push/iter micro-benchmark to confirm it doesn't lose performance vs std `Vec`. Results below. 4. 
**Rightsize collections** → Explore the `AutoSet`/`AutoMap` types in storage_schema and ensure each one is maximally sized for its natural alignment. ## Benchmark results ### `next build` on a representative app (15 runs each, M4 Pro, `caffeinate -dimsu nice -n -20`) Fresh same-day baseline against branch: | metric | canary | branch | Δ | 95% CI | significant? | |---|---:|---:|---:|---|:---:| | wall time | 40.83s | 41.12s | +0.7% | [−1.07s, +1.64s] | no | | user time | 282.27s | 283.21s | +0.3% | [−1.02s, +2.89s] | no | | sys time | 69.38s | 71.26s | +2.7% | [−1.54s, +5.32s] | no | | **MaxRSS** | **12.47 GB** | **12.04 GB** | **−3.4%** | **[−0.48 GB, −0.38 GB]** | **yes** | **MaxRSS is the headline.** −0.43 GB on a 12.5 GB working set, with t=−17.86 (every branch run lower than every canary run, CV ≤ 0.6% on both sides). Wall / user / sys are all within noise — this PR is a memory win with no measurable timing impact. ### `TinyVec` vs `Vec` micro-bench (`turbo-tasks/benches/tiny_vec.rs`, 200 samples each) | n | Vec push | TinyVec push | Δ% | Vec iter | TinyVec iter | Δ% | |---:|---:|---:|---:|---:|---:|---:| | 0 | 1.31ns | 894ps | **−31.8%** | 598ps | 596ps | −0.4% | | 1 | 16.92ns | 14.75ns | **−12.9%** | 964ps | 952ps | −1.2% | | 4 | 17.93ns | 15.93ns | **−11.1%** | 1.49ns | 1.50ns | +0.5% | | 8 | 63.13ns | 45.24ns | **−28.3%** | 1.97ns | 1.96ns | −0.2% | | 16 | 97.35ns | 79.91ns | **−17.9%** | 3.16ns | 3.14ns | −0.5% | | 24 | 137.41ns | 119.88ns | **−12.8%** | 4.30ns | 4.30ns | +0.0% | TinyVec push is 11–32% faster than Vec push across all realistic sizes; iter is identical. Run with `cargo bench -p turbo-tasks --bench tiny_vec`. ### `task_overhead/turbo` Criterion bench (M4 Pro, `--sample-size 200`) | variant | dur | canary | branch | Δ | significant? 
| |---|---:|---:|---:|---:|:---:| | turbo-uncached | 1µs | 9.77 µs | 9.68 µs | −1.0% | yes | | turbo-uncached | 1000µs | 1.01 ms | 1.01 ms | −0.1% | yes | | turbo-cached-same-keys | 1µs | 198.6 ns | 191.9 ns | −3.4% | yes | | turbo-cached-same-keys | 100µs | 226.5 ns | 208.1 ns | −8.1% | yes | | turbo-cached-different-keys | 1µs | 233.8 ns | 224.1 ns | −4.2% | yes | | turbo-cached-different-keys | 100µs | 305.3 ns | 246.9 ns | −19.1% | yes | | turbo-uncached-parallel | 10µs | 1.63 µs | 1.54 µs | −5.8% | yes | | turbo-uncached-parallel | 100µs | 8.41 µs | 7.88 µs | −6.3% | yes | · github.com-vercel-next.js · 05553796 · 2026-05-20
- 1.9 ETV · Remove ineffective turbo-tasks (#91341) ## Remove ineffective turbo-tasks Identifies and removes turbo-tasks functions where the task overhead exceeds the value they provide. Each turbo-task carries ~4-6μs execution overhead per miss and ~200-500ns per cache hit, plus allocations and bookkeeping. ### What? Removes 22 `#[turbo_tasks::function]` implementations across resolve plugins, chunk items, and resolve-result helpers — converting them to plain methods or inlining their work. Changes fall into a few buckets: - **ResolvePlugin condition handling** (`AfterResolvePluginCondition::matches`, `BeforeResolvePluginCondition::matches`, `after_resolve_condition`, `before_resolve_condition`): conditions now store the resolved `Glob` as a `ReadRef<Glob>` on the plugin struct at construction, so `matches` is a pure sync function and the per-plugin `*_resolve_condition` getters are trivial field reads (no longer turbo-tasks). The `after_resolve` / `before_resolve` hooks themselves stay as `#[turbo_tasks::function]` — they synthesize virtual sources/modules and need memoization on `(self, lookup_path, reference_type, request)` to avoid distinct cells producing duplicate module-graph idents. - The basic theory here is that the right level of caching is at `resolve` and at the hook bodies themselves, not the conditions or condition getters. - `AfterResolvePluginCondition` and `BeforeResolvePluginCondition` are marked `serialization = "none"` because `ReadRef` cannot be persisted; plugin construction is cheap enough to re-derive on restore. - **ChunkItem trait methods** (`chunking_context`, `ty`, `content_with_async_module_info`): returned constants or simple field reads, zero cache hits and no `.await` calls (no invalidation value). - **ResolveResult / ModuleResolveResult helpers** (`primary_modules`, `first_module`, `first_source`, `primary_sources`, `is_unresolvable`, `primary_output_assets`): simple iterators over already-resolved data; converted to plain methods. 
Added a `Duplicate(usize)` variant to `ModuleResolveResultItem` to handle dedup at construction time instead of in a separate task. - The basic idea here is that it is reasonable to consume `ResolveResult`/`ModuleResolveResult` monolithically, and we get little to no benefit from fine-grained access. For example, `is_unresolved()` is in theory a valuable turbo-task because it rarely changes, but if we change how we resolve an import we generally have to regenerate code anyway, so saving a few boolean conditions is unlikely to be very valuable. - Misc: `EcmascriptModuleAsset::analyze`, `is_types_resolving_enabled`, `next_server::resolve::condition`. ### Impact (vercel-site build, dev first-compile) | Metric | Before | After | Δ | |---|---:|---:|---:| | Total cache hits | 30,885,827 | 29,201,314 | −1,684,513 | | Total cache misses | 6,473,123 | 5,953,626 | **−519,497** | | Overall hit rate | 82.67% | 83.06% | +0.39 pp | | Registered task functions | 1,294 | 1,272 | −22 | The 22 removed tasks were collectively responsible for ~519K misses per build — each miss previously paying the full execution overhead. Most of the work from `EcmascriptModuleAsset::analyze` naturally migrated into `analyze_ecmascript_module` (the task it was wrapping; +129K hits there). ### On-disk cache size (persistent caching) Each removed task also stops allocating cache cells on disk. 
Measured on the same vercel-site build with `.next/cache/turbopack` (persistent cache enabled): | | Size | |---|---:| | canary | 2.56 GiB | | this branch | 2.46 GiB | | **saved** | **~100 MiB (−3.81%)** | ### Build-time wall clock and peak memory Ran `pnpm next build --experimental-build-mode=compile` 5 times on each branch. **Peak RSS — clear reduction:** | | canary | branch | Δ | |---|---:|---:|---:| | min | 19.18 GiB | 18.94 GiB | | | **median** | **19.22 GiB** | **19.01 GiB** | **−217 MiB (−1.10%)** | | mean | 19.21 GiB | 19.02 GiB | −199 MiB (−1.01%) | | max | 19.23 GiB | 19.13 GiB | | Every branch run has lower RSS than every canary run — the distributions don't overlap. Welch's t = −6.03. **Wall time — no measurable change:** | | canary | branch | Δ | |---|---:|---:|---:| | min | 62.03s | 60.78s | | | **median** | **62.61s** | **62.65s** | **+0.04s (+0.06%)** | | mean | 62.83s | 63.80s | +0.96s (+1.53%) | | max | 64.25s | 68.23s | | | stddev | 0.84s | 3.42s | | Median is flat. The mean difference is within noise (Welch's t = +0.61, n = 5). Branch run-to-run variance is higher — one 68.23s outlier pulls the mean up — so this is neither a regression nor a measurable speedup at this sample size. · github.com-vercel-next.js · f0c1ffc4 · 2026-05-08
- 1.7 ETV · Turbopack: fix error reporting with crashing webpack loaders (#93926) ### What? When a Turbopack webpack-loader subprocess crashes (e.g. a loader calls `process.exit()`, a native fatal error, or the IPC socket otherwise closes mid-message), the error users see today is: ``` - Execution of <WebpackLoadersProcessedAsset as Asset>::content failed - Execution of WebpackLoadersProcessedAsset::process failed - Execution of evaluate_webpack_loader failed - failed to receive message - reading packet length - unexpected end of file ``` After this PR, the same crash produces: ``` ⨯ ./data/crash.data Error evaluating Node.js code Error: Node.js subprocess crashed while evaluating loaders [/path/to/loaders/crash-loader.js]: failed to receive message Caused by: - Node.js process exited with exit status: 7 - reading packet length - unexpected end of file Debug info: - failed to receive message - Node.js process exited with exit status: 7 Recent process stderr: <whatever the loader wrote to stderr before exiting> - reading packet length - unexpected end of file ``` ### Why? The original message gave no actionable information: no exit code, no captured stdout/stderr, no indication of which loader was running. It also looked like an internal turbopack bug rather than a user-fixable error, and a transient pool failure could cascade into an unrelated "issue formatter crashed while reading the source for a code frame" failure on the way out. ### How? Four orthogonal fixes, plus a regression test: 1. **Capture stdout/stderr on subprocess crash.** `OutputStreamHandler` now keeps a bounded ring buffer (last 100 lines per stream) shared with the owning `NodeJsPoolProcess`. When `NodeJsPoolProcess::recv` fails, the buffers and the child's exit status are attached to the error via `anyhow::Error::context`. 2. 
**Recover from subprocess crash in `pull_operation`.** Instead of propagating the recv error up through `evaluate_webpack_loader` → `process()` → `Asset::content` (the cascade above), `pull_operation` catches it, synthesizes a `StructuredError` via `evaluate_context.emit_error(...)`, disables process reuse, and returns `Ok(None)`. This mirrors the existing in-band loader-error path, so the asset's existing `FileContent::NotFound` degradation kicks in naturally — `Asset::content` never errors. 3. **Include the loader chain in the error message and issue detail.** `WebpackLoaderContext` gained a `loader_names: Vec<RcStr>` field. A new optional `EvaluateContext::crash_context_prefix()` trait method lets webpack-loader evaluations describe what was being evaluated ("loaders [a, b, c]") in the synthesized crash message. `EvaluationIssue` also gained an optional `detail` field for the same chain, surfacing it in `--log-detail` output. PostCSS evaluations are labelled "postcss". 4. **Crash-proof the issue formatter.** `PlainSource::from_source` and `IssueSource::into_plain` previously propagated errors from `asset.content()` with `?`. They now degrade to `FileContent::NotFound` (and `range = None`) on read failure, so a future regression in some other code path can never cause the issue reporter itself to crash on top of whatever the user was debugging. ### Tests - Added `test/e2e/app-dir/webpack-loader-errors/loaders/crash-loader.js`: a loader that writes a marker to stderr and calls `process.exit(7)`. - Added an e2e test that fetches `/crash` and asserts that the marker, the loader name, and the resource name are present in the CLI output and that the internal cascade is absent. - All 11 tests in `webpack-loader-errors.test.ts` pass; the 5 Rust `turbopack-node` pool tests still pass. Some snapshot/golden tests for error formatting may need updating in CI since `EvaluationIssue` now emits a non-empty `detail`. 
github.com-vercel-next.js · d46516ce · 2026-05-20
- 1.6 ETV · Turbopack: simplify asset ident constructors (#93213) ### What? Removes the per-method turbo-task constructors on `AssetIdent` (`from_path`, `with_query`, `with_fragment`, `with_modifier`, `with_part`, `with_path`, `with_layer`, `with_content_type`, `with_asset`, `rename_as`, and `path`). Each of those was its own cached task that returned a small projection or a one-field-changed copy. They are now plain Rust builder methods on the owned value, with a single `into_vc()` at the end of the chain that goes through the existing cached `new_inner` constructor. Call sites that previously chained `Vc` methods now look like: ```rust module .ident() .owned() .await? .with_modifier(rcstr!("async loader")) .into_vc() ``` ### Why? These constructors were tiny "projection" turbo-tasks that paid the cost of a task lookup, cell allocation, and dependency tracking but whose cache layer didn't meaningfully prevent recomputation. The trade-off is invalidation semantics: - **Before:** a caller doing `module.ident().path()` depended on the cached `path()` projection. If the source `AssetIdent` changed but its `.path` field was unchanged (e.g. a new modifier was added), `path()` re-ran, returned the same `FileSystemPath` cell, and the caller did not re-run. - **After:** the same caller does `module.ident().await?.path` and depends directly on the `AssetIdent` cell. Any change to the ident (modifier, query, layer, …) invalidates the caller, even if the path is unchanged. In practice this is rarely a real loss: when an ident changes, the `Module` typically changes too, and the dependent task was going to re-run anyway. `new_inner` already deduplicates structurally-equal idents, so the wrappers were paying overhead per call without buying meaningful invalidation isolation. 
Measured on a `vercel-site` build via `NEXT_TURBOPACK_TASK_STATISTICS` and `turbopack/scripts/analyze_cache_effectiveness.py`: | Task | canary (hits / misses) | this branch | | --------------------------------- | ---------------------- | ----------- | | `AssetIdent::path` | 778,273 / 98,018 | removed | | `AssetIdent::with_modifier` | 27,895 / 22,801 | removed | | `AssetIdent::from_path` | 2,954 / 29,650 | removed | | `AssetIdent::with_part` | 2 / 5,440 | removed | | `AssetIdent::with_layer` | 7 / 4,356 | removed | | `AssetIdent::rename_as` | 4,969 / 2,269 | removed | | `AssetIdent::with_query` | 0 / 521 | removed | | `AssetIdent::with_content_type` | 0 / 79 | removed | | `AssetIdent::new_inner` | 628 / 129,777 | 29,213 / 120,650 | Aggregate over the whole build: - Total cached tasks: 1,300 → 1,292 - Total task invocations: 39,361,186 → 38,208,036 (~1.15M fewer lookups) - Total cache misses: 6,812,198 → 6,639,937 (~172k fewer) - Overall hit rate: 82.7% → 82.6% (essentially unchanged) `new_inner` absorbs the construction work that used to be split across the wrappers. Four upstream tasks gained +519 cache hits each (`EsmAssetReference::resolve_reference`, `ReferencedAsset::from_resolve_result`, `NextServerUtilityModule::ident`, `NodeJsChunkingContext::chunk_item_id_strategy`); no task gained any new misses. ### How? - `AssetIdent::from_path` and the `with_*` methods are now plain `&mut self`/`self`-by-value builder methods on the struct itself, not `#[turbo_tasks::function]`s. - A new `AssetIdent::into_vc(self)` finalizes the builder by going through the still-cached `new_inner`. - `AssetIdent::path()` is removed; callers use `.path` on an owned `AssetIdent`. - All call sites across `turbopack-*` and `next-*` crates are updated. Most go from `ident.with_modifier(m)` (returning `Vc`) to `ident.owned().await?.with_modifier(m).into_vc()`. - A follow-up commit removes a few `.clone()`s introduced in the conversion that aren't needed once lifetimes are bound to a local. 
### Follow-ups (out of scope) While migrating call sites, two pre-existing entry builders surfaced as candidates for cleanup. Not addressed here, but worth noting: - `get_app_page_entry` (`crates/next-core/src/next_app/app_page_entry.rs`) replaces the *content* of the source returned by `load_next_js_template` (prefixing imports onto `result.build()`) but reuses the template's `ident` with a `?page=...` query suffix as a disambiguator. The new `VirtualSource` ends up with content from one place and an ident chain pointing at another. A cleaner shape would be to mint a fresh ident from the page path, since the caller already knows what it's building. - `create_page_ssr_entry_module` (`crates/next-pages/page_entry.rs`) has the same shape on the instrumentation-conflict branch: it appends `export const register = hoist(...)` to the template content and constructs a `VirtualSource` with the original `source.ident()` unchanged. Lower-frequency than the app-page case (fires at most once per build), but the ident still misrepresents the constructed content. · github.com-vercel-next.js · 0f38c522 · 2026-05-06
- 1.4 ETV · Add support for multi-valued tables (#89728) ## What Add support for multi-valued tables in `turbo-persistence`. A multi-valued table allows multiple distinct values to be associated with a single key. Each family is independently configured as `SingleValue` (existing behavior) or `MultiValue` via the new `FamilyKind` enum. ## Why This will support the `TaskCache` table (implemented in #88904), where keys will change to be _hashes_ instead of full `TaskType` values. This greatly decreases DB size and speeds up queries due to smaller key sizes, at the cost of hash collisions requiring multiple values per key. ## How ### API - New `FamilyKind` enum (`SingleValue` / `MultiValue`) and per-family `FamilyConfig` in `DbConfig` - `get()` for single-valued families (panics if called on multi-valued) - `get_multiple()` for multi-valued families, returns `SmallVec<[ArcBytes; 1]>` — stack-allocated for the common 0–1 result case, spilling to the heap when needed - `put()` and `delete()` are unchanged — the family kind controls dedup/compaction behavior ### Write path & compaction - **Single-valued** (unchanged): last-write-wins per key - **Multi-valued**: all values are maintained; deletions 'shadow' old values. - Deletion inserts a tombstone that shadows all older values for that key across SST layers. Values written *after* the tombstone in the same batch are retained. - To avoid extra buffering logic, the `MergeIter` semantics were changed so it produces 'newest' entries first * for `SingleValue` families this is no different, since we only keep one value * for `MultiValue` families this makes dealing with tombstones trivial, but does mean compaction will reverse the order of the set. For this reason we make no guarantees about ordering. 
### Read path - Controlled by a `FIND_ALL` const generic on the internal lookup methods - **Single-valued** (`FIND_ALL=false`): binary search, return _last_ match, stop * This fixes a bug where we might return Deleted when there is a value in the SST, depending on what the search algorithm found first - **Multi-valued** (`FIND_ALL=true`): scan all matching entries in the SST block, then continue to older SSTs. If a tombstone is found, stop searching older layers. --------- Co-authored-by: Tobias Koppers <tobias.koppers@googlemail.com> · github.com-vercel-next.js · 11823f84 · 2026-03-03
- 1.2ETV[turbopack] Optimize compaction cpu usage (#91468)

## Summary

Optimizes the turbo-persistence compaction and iteration paths with several targeted improvements:

### Iterator optimizations

- **Flatten index block iteration** — The iterator previously used a `Vec<CurrentIndexBlock>` stack, but SST files have exactly one index level. Inline the index block fields (`index_entries`, `index_block_count`, `index_pos`) directly into `StaticSortedFileIter`, eliminating the stack allocation and `Option` overhead.
- **Non-optional `CurrentKeyBlock`** — Parse the first key block during `try_into_iter()` construction so `current_key_block` is always populated, removing the `Option<CurrentKeyBlock>` wrapper and its `take()`/`Some()` ceremony in the hot loop.
- **Replace `ReadBytesExt` with direct byte indexing** — In `handle_key_match`, `parse_key_block`, and `next_internal`, replace `val.read_u16::<BE>()` etc. with `u16::from_be_bytes(val[0..2].try_into().unwrap())`. This eliminates the mutable slice-pointer advancement.
- **Extract `read_offset_entry` helper** — Read type + offset from the key block offset table in a single `u32` load + shift, replacing two separate `ReadBytesExt` calls.

### Refcounting optimization

- **Introduce `RcBytes`** — Thread-local byte slice type using `Rc` instead of `Arc`, eliminating atomic refcount overhead during single-threaded SST iteration. The iteration path (`StaticSortedFileIter`) now produces `RcBytes` slices backed by an `Rc<Mmap>`, so per-entry clone/drop operations are plain integer increments and decrements rather than atomic operations.

### Merge iterator simplification

- **Optimize `MergeIter::next`** — Replaced the straightforward `pop`/`push` pattern with a `PeekMut`-based replace-top pattern, which means we only need to adjust the heap once per iteration instead of twice.

## Benchmark results

### Compaction (`key_8/value_4/entries_16.00Mi/commits_128`)

| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| partial compaction | 1.949 s | 1.515 s | **-22%** |
| full compaction | 2.051 s | 1.542 s | **-25%** |

### Read path (`static_sorted_file_lookup/entries_1000.00Ki`)

No read regression — the branch is neutral to slightly faster:

| Benchmark | Canary | Optimized | Change |
|-----------|--------|-----------|--------|
| hit/uncached | 6.73 µs | 6.59 µs | **-2%** |
| hit/cached | 140.8 ns | 130.7 ns | **-7%** |
| miss/uncached | 5.10 µs | 5.02 µs | **-2%** |
| miss/cached | 230.1 ns | 233.1 ns | ~+1% (noise) |

## Test plan

- [x] `cargo test -p turbo-persistence` — 60/60 tests passing
- [x] Compaction benchmarks run and compared against canary baseline
- [x] Read path (lookup) benchmarks verified no regression

github.com-vercel-next.js · 8e9f9514 · 2026-03-20
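The replace-top idea behind the `MergeIter::next` change can be shown with the standard library's `BinaryHeap::peek_mut`. This is a generic k-way merge over sorted `Vec<u64>` inputs, not the actual `MergeIter`, which also handles newest-first ordering and SST entries. `merge_sorted` is a hypothetical name. Overwriting the top through `PeekMut` costs one sift-down when the guard is dropped. By contrast, `pop()` followed by `push()` re-balances the heap twice.

```rust
use std::cmp::Reverse;
use std::collections::binary_heap::{BinaryHeap, PeekMut};

/// Merges already-sorted inputs into one sorted output using the
/// replace-top pattern instead of pop + push.
pub fn merge_sorted(inputs: Vec<Vec<u64>>) -> Vec<u64> {
    let mut iters: Vec<std::vec::IntoIter<u64>> =
        inputs.into_iter().map(|v| v.into_iter()).collect();
    // Heap entries are (next value, source index); `Reverse` turns the
    // max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(mut top) = heap.peek_mut() {
        let Reverse((value, source)) = *top;
        out.push(value);
        match iters[source].next() {
            // Overwrite the top in place; the heap sifts down once when `top` drops.
            Some(next) => *top = Reverse((next, source)),
            // Source exhausted: remove its entry from the heap.
            None => {
                PeekMut::pop(top);
            }
        }
    }
    out
}
```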
- 1.2ETVturbo-persistence: streaming SST writer for reduced memory usage (#90617)

### What?

Replace the two-pass SST file write approach with a single-pass `StreamingSstWriter` that writes blocks incrementally as entries arrive. Also replace the compaction `Collector`'s double-buffered `last_entries` pattern with a streaming collector that wraps the new writer.

### Why?

The previous SST write path materialized all entries in memory (keys + values), then wrote all value blocks, then all key blocks. During compaction, the `Collector` maintained two full entry vectors (`entries` + `last_entries`) for size balancing. For large SST files (256MB+), this required holding hundreds of MB of entry data in memory simultaneously.

The streaming approach reduces peak memory per writer from hundreds of MB to ~200KB by writing blocks to disk as soon as possible rather than buffering everything.

### How?

**Phase 1: `StreamingSstWriter` in `static_sorted_file_builder.rs`**

The key insight is that the SST reader is **block-index-addressed** — it locates blocks by index via the offset table, not by file position. So interleaving value blocks and key blocks in any order is fully compatible with the reader.

The writer processes entries one at a time:

- **Medium values** are written to disk immediately as individual blocks
- **Small values** accumulate in a buffer and flush when reaching 8KB
- **Key blocks** flush incrementally as the resolved boundary advances

A `VecDeque<PendingKeyEntry>` holds entries waiting for their value blocks to be written. Each pending entry contains a `ValueRef` that tracks where its value lives:

- `Small { block_index, offset, size }` — already resolved
- `PendingSmall { small_block_id, offset, size }` — waiting for small block flush
- `Medium { block_index }` — already resolved (written immediately)
- `Inline`, `Blob`, `Deleted` — no value block needed

The `first_pending_small_index` partitions `pending_keys` into resolved entries (front) and unresolved entries (back). Key blocks are flushed **incrementally** via `try_flush_key_blocks()` at the two points where this boundary advances:

- When a small value block is flushed — resolves all `PendingSmall` entries, then calls `try_flush_key_blocks()` to flush any complete key blocks from the resolved region
- In `advance_boundary_to()` — processes resolved entries and flushes key blocks via `try_flush_key_block()` when they exceed size or entry count limits

This makes adding N entries **O(N) total** instead of O(N²) from the earlier per-call scan approach.

**Key block flushing**: The `try_flush_key_block()` method combines the check-and-flush logic — it tests whether adding the next entry would exceed `MAX_KEY_BLOCK_SIZE` (16KB) or `MAX_KEY_BLOCK_ENTRIES`, and if so, flushes the accumulated key block. Entries sharing the same key hash are never split across blocks.

**AMQF filter**: Created with `max_entry_count` (an upper bound), then shrunk via `shrink_to_fit()` on `close()` to reclaim unused capacity while preserving false positive rates.

**Index block writing**: `IndexBlockBuilder` is generic over `Write` and writes directly to the `BufWriter` during `close()`, avoiding an intermediate buffer allocation. Index blocks use `write_raw_block_to_file` since they are never compressed.

**File size**: Computed from the last block offset plus the offset table size, avoiding a `stream_position()` call (which would trigger an unnecessary flush + seek syscall).

`write_static_stored_file()` is reimplemented as a thin wrapper around `StreamingSstWriter`, so existing callers (`write_batch.rs`, benchmarks) need zero changes.

**Phase 2: Streaming compaction in `db.rs`**

The compaction `Collector` is simplified to wrap an `Option<(u32, StreamingSstWriter)>`. The old pattern of maintaining `entries` + `last_entries` vectors with swap/early-flush/split logic (~100 lines) is replaced by direct calls to `writer.add()`. The collector tracks total key/value sizes and a `ValueBlockCountTracker` to know when to finish the current writer and start a new one.

**Memory savings:**

| Aspect | Previous (compaction) | Streaming |
|---|---|---|
| Entry storage | `entries` + `last_entries` (up to 2× full entry vecs) | ~800 pending key entries (~80KB) |
| Value data | All values held in memory until written | Written immediately (medium) or after 8KB (small) |
| Peak for 256MB SST | Hundreds of MB | ~200KB per writer |

**Performance**

Both compaction and write benchmarks are within noise. A slight slowdown would be plausible, since the writer does a little more work per entry, but writing is dominated by other factors: `write` syscalls, compression, and memcopies into temporary buffers.

**Files changed:**

- `turbopack/crates/turbo-persistence/src/static_sorted_file_builder.rs` — `StreamingSstWriter` with incremental key block flushing via `try_flush_key_block()`, `PendingKeyEntry`, `ValueRef`; generic `IndexBlockBuilder<W: Write>`; `write_static_stored_file` as wrapper; AMQF `shrink_to_fit()` on close
- `turbopack/crates/turbo-persistence/src/db.rs` — Replaced compaction `Collector` with streaming version
- `turbopack/crates/turbo-persistence/src/lib.rs` — Export `StreamingSstWriter`
- `turbopack/crates/turbo-persistence/src/value_block_count_tracker.rs` — Removed dead `is_half_full()`/`reset_to()` methods

**Verification:**

- `cargo clippy -p turbo-persistence` — zero warnings
- `cargo test -p turbo-persistence` — all 46 tests pass
- `cargo fmt -p turbo-persistence -- --check` — clean
- Criterion benchmarks show no performance regressions (write + compaction within noise)

github.com-vercel-next.js · 28df39ba · 2026-02-28
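The sketch below is a toy model of the streaming writer's ordering logic. It assumes in-memory `Vec<u8>` blocks instead of a file and omits key blocks, compression, and the AMQF filter. `StreamingWriterSketch`, `MEDIUM_THRESHOLD` (its value is made up), and the cut-down `ValueRef` are hypothetical. Only the 8 KB small-block limit comes from the description above. A linear pass over the pending queue stands in for the `first_pending_small_index` boundary that the PR uses to keep total work O(N).

```rust
use std::collections::VecDeque;

/// Small values are packed into a shared block that is flushed at 8 KB (per the PR).
const SMALL_BLOCK_LIMIT: usize = 8 * 1024;
/// Hypothetical cutoff: values at least this large get their own block right away.
const MEDIUM_THRESHOLD: usize = 4 * 1024;

/// Where a key's value lives (a cut-down version of the PR's `ValueRef`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ValueRef {
    Small { block_index: usize, offset: usize, size: usize },
    PendingSmall { offset: usize, size: usize },
    Medium { block_index: usize },
}

#[derive(Default)]
pub struct StreamingWriterSketch {
    /// Blocks "written to disk", addressed by index (their position here).
    pub blocks: Vec<Vec<u8>>,
    /// Key entries whose value location is resolved, in insertion order.
    pub key_entries: Vec<(u64, ValueRef)>,
    small_buf: Vec<u8>,
    pending: VecDeque<(u64, ValueRef)>,
}

impl StreamingWriterSketch {
    pub fn add(&mut self, key: u64, value: &[u8]) {
        let value_ref = if value.len() >= MEDIUM_THRESHOLD {
            // Medium value: written immediately as its own block.
            self.blocks.push(value.to_vec());
            ValueRef::Medium { block_index: self.blocks.len() - 1 }
        } else {
            // Small value: buffered; its block index is not known yet.
            let offset = self.small_buf.len();
            self.small_buf.extend_from_slice(value);
            ValueRef::PendingSmall { offset, size: value.len() }
        };
        self.pending.push_back((key, value_ref));
        if self.small_buf.len() >= SMALL_BLOCK_LIMIT {
            self.flush_small_block();
        }
        self.drain_resolved();
    }

    /// Writes buffered small values as one block and resolves every `PendingSmall`.
    fn flush_small_block(&mut self) {
        if self.small_buf.is_empty() {
            return;
        }
        self.blocks.push(std::mem::take(&mut self.small_buf));
        let block_index = self.blocks.len() - 1;
        for (_, r) in self.pending.iter_mut() {
            if let ValueRef::PendingSmall { offset, size } = *r {
                *r = ValueRef::Small { block_index, offset, size };
            }
        }
    }

    /// Emits entries from the front of the queue while they are resolved, so key
    /// order is kept: a resolved `Medium` entry waits behind an earlier pending one.
    fn drain_resolved(&mut self) {
        while let Some(&(_, r)) = self.pending.front() {
            if matches!(r, ValueRef::PendingSmall { .. }) {
                break;
            }
            let entry = self.pending.pop_front().unwrap();
            self.key_entries.push(entry);
        }
    }

    /// Flushes the final partial small block and returns (blocks, key entries).
    pub fn close(mut self) -> (Vec<Vec<u8>>, Vec<(u64, ValueRef)>) {
        self.flush_small_block();
        self.drain_resolved();
        (self.blocks, self.key_entries)
    }
}
```

Suppose a small value is added and then a medium one. The medium block is written first (index 0), but its key entry is emitted only after the small block (index 1) is flushed. Key order is therefore preserved even though value blocks land out of order, which is exactly what a block-index-addressed reader tolerates.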
- 1.1ETVturbo-tasks-backend: fix snapshot coordination races + extract SnapshotCoordinator (#93416)

Hold a lock while persisting so two snapshots cannot execute concurrently.

* Currently, if `stop` is called while an idle snapshot is running, snapshotting can race with itself. This can corrupt the use of the `in_progress_operations` parameter, since two threads will `fetch_or` on it and wait for the bit to be cleared.

Abort the process if a `panic!` occurs during task spawning.

* Currently, if `try_start_task_execution` panics, it leaves a task hanging, which can deadlock the process. In this case we have no better option than to log and abort.
* I considered strategies that would 'poison' the task, or possibly all of turbo-tasks. This is attractive, but I believe it is fundamentally unsafe: the most likely cause of these panics is incorrect state tracking in the backend, so exiting is all we can do.

github.com-vercel-next.js · 2e1e5958 · 2026-05-04
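The two fixes can be illustrated with standard-library primitives. This is a hypothetical sketch, not the `SnapshotCoordinator` from the PR. `SnapshotCoordinatorSketch` serializes snapshots by holding a `Mutex` for the whole persist step. `spawn_or_abort` uses `std::panic::catch_unwind` plus `std::process::abort` to turn a panic during task spawning into a logged abort rather than a hung task.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::Mutex;

/// Hypothetical coordinator: the mutex is held for the entire persist, so a
/// snapshot triggered by `stop` can never overlap an idle snapshot.
pub struct SnapshotCoordinatorSketch {
    /// Number of completed snapshots, guarded by the persist lock.
    completed: Mutex<u64>,
}

impl SnapshotCoordinatorSketch {
    pub fn new() -> Self {
        SnapshotCoordinatorSketch { completed: Mutex::new(0) }
    }

    /// Runs `persist` while holding the lock; returns the snapshot's sequence number.
    pub fn snapshot(&self, persist: impl FnOnce()) -> u64 {
        let mut completed = self.completed.lock().unwrap();
        persist();
        *completed += 1;
        *completed
    }
}

/// Runs a task-spawning step. A panic here would leave the task hanging, so
/// log it and abort the whole process instead of trying to recover.
pub fn spawn_or_abort<T>(spawn: impl FnOnce() -> T) -> T {
    match panic::catch_unwind(AssertUnwindSafe(spawn)) {
        Ok(value) => value,
        Err(_) => {
            eprintln!("panic during task spawning; aborting");
            std::process::abort();
        }
    }
}
```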
- 1.0ETV[turbopack] Remove `turbo_tasks::function` from ModuleReference getters (#91229)

### What?

Refactors the `ModuleReference` trait so that the `chunking_type()` and `binding_usage()` methods return plain values instead of `Vc<T>`-wrapped values, removing the need for async task functions. Also removes the `get_referenced_asset` task from `EsmAssetReference`, inlining its logic into the callers.

### Why?

This change simplifies the API by eliminating unnecessary async overhead for methods that typically return simple, computed values. The previous implementation required `#[turbo_tasks::function]` annotations and `Vc<T>` wrappers even when the methods didn't need to perform async operations or benefit from caching.

### Impact

| Metric | Base | This PR | Delta |
|--------|------|---------|-------|
| Hits | 35,678,143 | 35,845,124 | **+166,981** |
| Misses | 9,418,378 | 7,910,986 | **-1,507,392** |
| Total | 45,096,521 | 43,756,110 | **-1,340,411** |
| Task types | 1,306 | 1,277 | **-29** |

29 task types were removed, eliminating **2.6M total task invocations** (1.1M hits + 1.5M misses):

- **`chunking_type`** — 21 task types removed across all `ModuleReference` implementors (~952k invocations)
- **`binding_usage`** — 6 task types removed (~527k invocations)
- **`BindingUsage::all`** — helper task removed (~36k invocations)
- **`EsmAssetReference::get_referenced_asset`** — removed and inlined (~1.08M invocations: 628k hits + 451k misses)

The removed `get_referenced_asset` hits reappear as +628k hits on `EsmAssetReference::resolve_reference` and `ReferencedAsset::from_resolve_result` (with zero increase in misses), confirming the work is now served from cache through the existing callers. No tasks had increased misses — the removal is clean with no cache invalidation spillover.

I also ran some builds to measure latency:

```
# This branch
$ hyperfine -p 'rm -rf .next' -w 2 -r 10 'pnpm next build --turbopack --experimental-build-mode=compile'
Benchmark 1: pnpm next build --turbopack --experimental-build-mode=compile
  Time (mean ± σ):     52.752 s ±  0.658 s    [User: 376.575 s, System: 106.375 s]
  Range (min … max):   51.913 s … 54.161 s    10 runs

# on canary
$ hyperfine -p 'rm -rf .next' -w 2 -r 10 'pnpm next build --turbopack --experimental-build-mode=compile'
Benchmark 1: pnpm next build --turbopack --experimental-build-mode=compile
  Time (mean ± σ):     54.675 s ±  1.394 s    [User: 389.273 s, System: 114.642 s]
  Range (min … max):   53.434 s … 58.189 s    10 runs
```

So this is a solid win of almost 2 seconds (54.675 s → 52.752 s mean).

MaxRSS also went from 16,474,324,992 bytes to 16,359,309,312 bytes (from one measurement), a saving of roughly 115 MB of peak memory.

### How?

- Changed `chunking_type()` method signature from `Vc<ChunkingTypeOption>` to `Option<ChunkingType>`
- Changed `binding_usage()` method signature from `Vc<BindingUsage>` to `BindingUsage`
- Removed `ChunkingTypeOption` type alias as it's no longer needed
- Updated all implementations across the codebase to return direct values instead of wrapped ones
- Removed `#[turbo_tasks::function]` annotations from these methods
- Updated call sites to use the `into_trait_ref().await?` pattern when accessing these methods from `Vc<dyn ModuleReference>`
- Removed `EsmAssetReference::get_referenced_asset`, inlining its logic into callers
- Added validation for `turbopack-chunking-type` annotation values in import analysis
- Fixed cache effectiveness analysis script

github.com-vercel-next.js · 236a76dd · 2026-03-13
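The shape of the API change can be sketched in plain Rust, leaving out turbo-tasks entirely. `ModuleReferenceSketch`, the two-variant `ChunkingType`, and the import structs are hypothetical stand-ins. The real trait is reached through `Vc<dyn ModuleReference>` and needs `into_trait_ref().await?` first, which this sketch skips. The point is that the getter returns an `Option<ChunkingType>` directly, so no task is created and nothing is cached.

```rust
/// Hypothetical, cut-down stand-in for turbopack's `ChunkingType`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ChunkingType {
    Parallel,
    Async,
}

/// Sketch of the new getter shape: a plain value computed from `&self`, with no
/// `#[turbo_tasks::function]`, no `Vc<_>` cell, and nothing to cache or invalidate.
pub trait ModuleReferenceSketch {
    fn chunking_type(&self) -> Option<ChunkingType> {
        Some(ChunkingType::Parallel)
    }
}

pub struct StaticImport;
pub struct DynamicImport;
/// Hypothetical reference that does not participate in chunking.
pub struct TypeOnlyImport;

impl ModuleReferenceSketch for StaticImport {}

impl ModuleReferenceSketch for DynamicImport {
    fn chunking_type(&self) -> Option<ChunkingType> {
        Some(ChunkingType::Async)
    }
}

impl ModuleReferenceSketch for TypeOnlyImport {
    fn chunking_type(&self) -> Option<ChunkingType> {
        None
    }
}

/// Callers read the value synchronously instead of awaiting a task result.
pub fn count_async(refs: &[&dyn ModuleReferenceSketch]) -> usize {
    refs.iter()
        .filter(|r| r.chunking_type() == Some(ChunkingType::Async))
        .count()
}
```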
- 1.0ETV[turbopack] Unify Cell Storage (#92974)

## What

Tightens the value-type persistence API and sets the table for a future eviction policy. Two user-visible changes on the `#[turbo_tasks::value(...)]` macro:

- **`serialization = "none"` → `serialization = "skip"`** — imperative ("skip persisting") instead of descriptive. This makes it clear that we are not missing a feature but choosing not to persist (persisting might of course be impossible, but that is rare).
- **New `evict = "always" | "last" | "never"` attribute** — replaces the old overloaded `"none"` semantic. Only valid with `serialization = "skip"`. Defaults to `"always"`.

Internally this collapses `persistent_cell_data` + `transient_cell_data` into one `cell_data` map and replaces the old `bincode: Option<(enc, dec)>` field with a four-variant `ValueTypePersistence` enum. The eviction machinery itself is a follow-up PR; this PR just gives each value type a precise, queryable persistence/eviction descriptor.

## Why break out a new parameter?

`serialization = "none"` on canary conflated three different intents:

1. **Cheap recomputable outputs** (SWC ASTs, codegen `Rope`s) — fine to evict, recompute is cheap.
2. **Expensive recomputable outputs** (WASM modules, Node process pools) — re-derivable but costly.
3. **Session-scoped state** (`State<>` cells, `Arc<Mutex<_>>` dedup histories) — can't be recomputed without losing accumulated mutations.

They all produced identical runtime behavior (stored in `transient_cell_data`), so eviction can't tell them apart. The fix is two orthogonal attributes:

```rust
// A cheap skip — default evict = "always"
#[turbo_tasks::value(serialization = "skip")]

// Expensive recompute — evict last under pressure
#[turbo_tasks::value(serialization = "skip", evict = "last")]

// Session-scoped state — never evict
#[turbo_tasks::value(serialization = "skip", evict = "never")]
```

The macro rejects `evict` on any other `serialization` mode.

## `ValueTypePersistence` enum

Replaces `ValueType.bincode: Option<(enc, dec)>`:

```rust
pub enum ValueTypePersistence {
    Persistable(AnyEncodeFn, AnyDecodeFn<SharedReference>), // auto, custom
    SkipPersist { expensive: bool },                        // skip (+ evict = last)
    HashOnly,                                               // hash
    SessionStateful,                                        // skip + evict = never
}
```

The existing `"hash"` mode gets its own `HashOnly` variant rather than being folded into `SkipPersist`, which lets the backend gate its hash-writing and hash-comparison paths precisely.

## Unified `cell_data` storage

`persistent_cell_data: AutoMap<CellId, TypedSharedReference>` + `transient_cell_data: AutoMap<CellId, SharedReference>` collapse into `cell_data: CellData`. `CellData` is a newtype over `AutoMap<CellId, SharedReference>` with a custom bincode impl that filters non-`Persistable` entries at encode time. This removes the `is_serializable_cell_content: bool` parameter that was threaded through ~14 read/write call sites.

Uses `SharedReference` instead of `TypedSharedReference` — `CellId` already carries the `ValueTypeId`.

## Annotation sweep

All prior `serialization = "none"` sites move to either `serialization = "skip", evict = "never"` or `serialization = "skip", evict = "last"` based on a per-site audit. Summary:

**`evict = "last"` (6 sites)** — re-derivable but expensive:

- `SwcPluginModule`, `EvaluatePool`, `ChildProcessPool`, `WorkerThreadPool`, `EffectInstance`, `Effects`

**`evict = "never"` (2 sites)** — interior-mutable state accumulated across the session:

- `ConsoleUi` (`Arc<Mutex<SeenIssues>>`), `VersionState` (`State<VersionRef>` with HMR invalidators)

The distinguishing rule: `evict = "never"` only when the value holds interior mutability accumulated across the session. Everything else can be re-derived (possibly expensively) by re-running the producing task.

## Follow-ups (separate PRs)

- Wire an eviction policy that consumes `ValueTypePersistence` — respects `SessionStateful` (never evict) and prefers evicting cheap `SkipPersist` values before `expensive: true` ones. Either as part of #91790 or afterwards, depending on when things land.

github.com-vercel-next.js · d069c2e1 · 2026-04-21
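The sketch below shows how the two macro attributes might map onto the four-variant enum, and how encode-time filtering keeps only persistable cells. Several simplifications are assumed. The enum is cut down, so `Persistable` carries no encode/decode functions here. Cells are held in a `BTreeMap<u32, (ValueTypePersistence, String)>` rather than `AutoMap<CellId, SharedReference>`. `from_attrs` and `encode_persistable` are hypothetical names. The accepted mode strings come from the description above, and the error wording is invented.

```rust
use std::collections::BTreeMap;

/// Simplified version of the PR's enum; the real `Persistable` variant also
/// carries the encode/decode functions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ValueTypePersistence {
    Persistable,
    SkipPersist { expensive: bool },
    HashOnly,
    SessionStateful,
}

/// Maps `#[turbo_tasks::value(serialization = ..., evict = ...)]` options to a descriptor.
pub fn from_attrs(serialization: &str, evict: Option<&str>) -> Result<ValueTypePersistence, String> {
    match (serialization, evict) {
        ("auto" | "custom", None) => Ok(ValueTypePersistence::Persistable),
        ("hash", None) => Ok(ValueTypePersistence::HashOnly),
        // `evict` defaults to "always" when omitted.
        ("skip", None | Some("always")) => Ok(ValueTypePersistence::SkipPersist { expensive: false }),
        ("skip", Some("last")) => Ok(ValueTypePersistence::SkipPersist { expensive: true }),
        ("skip", Some("never")) => Ok(ValueTypePersistence::SessionStateful),
        ("skip", Some(other)) => Err(format!("unknown evict mode `{other}`")),
        // `evict` is rejected on any serialization mode other than "skip".
        (mode, Some(_)) => Err(format!("`evict` requires serialization = \"skip\", got \"{mode}\"")),
        (mode, None) => Err(format!("unknown serialization mode `{mode}`")),
    }
}

/// Sketch of `CellData`'s encode-time filtering: only cells whose value type is
/// `Persistable` are written; everything else stays in memory only.
pub fn encode_persistable(cells: &BTreeMap<u32, (ValueTypePersistence, String)>) -> Vec<(u32, Vec<u8>)> {
    cells
        .iter()
        .filter(|(_, (p, _))| *p == ValueTypePersistence::Persistable)
        .map(|(id, (_, value))| (*id, value.as_bytes().to_vec()))
        .collect()
}
```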
- 0.9ETVturbo-tasks: Reduce allocations on cache hits (#92756)

### What?

Reduce heap allocations when turbo-tasks functions get cache hits (~85% of calls).

### Why?

Every turbo-tasks function call (generated by `#[turbo_tasks::function]`) was boxing its arguments into `Box<dyn MagicAny>` before looking up the task cache. This allocation is wasted on cache hits, which are the overwhelmingly common case.

### How?

**Deferred boxing via `StackMagicAny` trait object:**

Introduce a `StackMagicAny` trait that abstracts over a stack-resident `Option<T>`:

- `as_ref(&self) -> &dyn MagicAny` — borrow the argument for hash/equality (cache lookup)
- `take_box(&mut self) -> Box<dyn MagicAny>` — move the value to the heap (zero clones)
- `as_any_mut(&mut self) -> &mut dyn Any` — downcast to concrete type without boxing

The data flow:

1. **Callsite** (macro-generated): creates `StackMagicAnySlot::new((args...))` on the stack, calls `dynamic_call(..., &mut arg)`
2. **`dynamic_call`**: checks resolution via `arg.as_ref()`, routes to `native_call` (resolved) or boxes via `arg.take_box()` for `LocalTaskSpec` (unresolved)
3. **Backend `get_or_create_task_inner`**: does a read-only `raw_get` lookup using `hash_from_components` + `eq_components` with the borrowed `&dyn MagicAny`. On cache hit (~85%), returns immediately — **zero heap allocation**. On cache miss, re-checks under write lock using the same borrowed reference, and only calls `arg.take_box()` in the vacant entry case (true cache miss).

Boxing is now deferred past all of these:

- **Memory cache hit** — the common case, no allocation at all
- **Backing storage hit** — found in persistent storage, no allocation needed
- **Lost race under write lock** — another thread inserted while we upgraded; we use their task_id, still no allocation
- **Trait method dispatch (no filtering)** — `filter_owned` is now `Option<FilterOwnedArgsFunctor>`; when `None` (the common case where all args are used), the original `&mut dyn StackMagicAny` passes straight through to `dynamic_call` without boxing

**Optimized `filter_owned` for traits:**

When trait methods *do* need argument filtering (unused `_`-prefixed parameters), the old path did `take_box()` → `downcast_args_owned()` → dereference → repack. This is an unnecessary heap round-trip. The new `downcast_stack_args_owned()` function uses `as_any_mut()` to downcast directly to `&mut StackMagicAnySlot<T>` and calls `take()` on the inner `Option`, skipping the intermediate `Box` entirely.

**Additional changes:**

- `Backend::get_or_create_*_task` now takes decomposed parameters (`native_fn`, `this`, `&mut dyn StackMagicAny`) instead of a pre-constructed `CachedTaskType`
- Persistent and transient task creation merged into a shared `get_or_create_task_inner(transient: bool)`
- `connect_child` uses eagerly-set `persistent_task_type` from `initialize_new_task`
- `OwnedMagicAny` adapter wraps already-boxed args (from async resolution tasks) to fit the `StackMagicAny` interface
- Both `dynamic_call` and `trait_call` take `&mut dyn StackMagicAny` (trait dispatch also benefits)
- `CachedTaskType::hash_encode` now delegates to `hash_encode_components` (deduplicated)
- Removed `try_native_call`, `native_call_if_consistent`, `try_get_or_create_*` — the deferred boxing approach subsumes these

### Binary size

Binary size is neutral (linux-x86_64, `--release`, stripped + gzipped: 30.9 MB on both canary and this branch).

### Overhead benchmark (turbo-tasks-backend, median, lower is better)

Measured on an isolated Firecracker microVM (linux-x86_64). Variance is nontrivial in this environment, but every turbo-tasks benchmark moved in the improving direction.

| Benchmark | Canary | This PR | Delta |
|---|---|---|---|
| `turbo-cached-same-keys/1` | 512.6 ns | 490.2 ns | -4.4% |
| `turbo-cached-same-keys/10` | 493.4 ns | 483.8 ns | -2.0% |
| `turbo-cached-same-keys/100` | 502.1 ns | 484.9 ns | -3.4% |
| `turbo-cached-same-keys/1000` | 1.31 µs | 774.2 ns | -41.0% |
| `turbo-cached-different-keys/1` | 1.02 µs | 1.00 µs | -1.4% |
| `turbo-cached-different-keys/10` | 1.13 µs | 1.11 µs | -1.8% |
| `turbo-cached-different-keys/100` | 1.29 µs | 1.24 µs | -3.8% |
| `turbo-cached-different-keys/1000` | 2.37 µs | 2.32 µs | -2.5% |
| `turbo-uncached/1` | 23.43 µs | 20.47 µs | -12.6% |
| `turbo-uncached/10` | 32.23 µs | 29.71 µs | -7.8% |
| `turbo-uncached/100` | 126.01 µs | 124.95 µs | -0.8% |
| `turbo-uncached/1000` | 1.079 ms | 1.048 ms | -2.9% |
| `turbo-uncached-parallel/1` | 6.16 µs | 5.67 µs | -8.0% |
| `turbo-uncached-parallel/10` | 5.47 µs | 5.12 µs | -6.4% |
| `turbo-uncached-parallel/100` | 14.65 µs | 14.55 µs | -0.7% |
| `turbo-uncached-parallel/1000` | 132.09 µs | 129.13 µs | -2.2% |

github.com-vercel-next.js · 31a1b635 · 2026-04-16
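The deferred-boxing pattern can be sketched with `std::any::Any` standing in for `MagicAny`. `StackAny`, `StackSlot`, and `TaskCacheSketch` are hypothetical names, and the method `borrow_arg` plays the role of the PR's `as_ref`. The cache is a single-threaded `HashMap` keyed by a `u64` argument, so the read-lock / write-lock re-check is not modeled. The argument is moved into a `Box` only on a true miss.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical stand-in for `StackMagicAny`, using `dyn Any` in place of `dyn MagicAny`.
pub trait StackAny {
    /// Borrow the argument for hashing/equality (the PR's `as_ref`).
    fn borrow_arg(&self) -> &dyn Any;
    /// Move the argument to the heap; called only on a cache miss.
    fn take_box(&mut self) -> Box<dyn Any>;
}

/// Stack-resident slot holding the call arguments until (and unless) they must be boxed.
pub struct StackSlot<T>(Option<T>);

impl<T: Any> StackSlot<T> {
    pub fn new(value: T) -> Self {
        StackSlot(Some(value))
    }
}

impl<T: Any> StackAny for StackSlot<T> {
    fn borrow_arg(&self) -> &dyn Any {
        self.0.as_ref().expect("argument already taken")
    }
    fn take_box(&mut self) -> Box<dyn Any> {
        // Moves the value out of the slot; no clone is needed.
        Box::new(self.0.take().expect("argument already taken"))
    }
}

/// Toy task cache keyed by a `u64` argument; counts how often arguments were boxed.
#[derive(Default)]
pub struct TaskCacheSketch {
    tasks: HashMap<u64, usize>,
    stored_args: Vec<Box<dyn Any>>,
    pub boxed: usize,
}

impl TaskCacheSketch {
    pub fn get_or_create(&mut self, arg: &mut dyn StackAny) -> usize {
        // The lookup only borrows the argument.
        let key = *arg.borrow_arg().downcast_ref::<u64>().expect("u64 argument");
        if let Some(&id) = self.tasks.get(&key) {
            return id; // cache hit: zero heap allocation
        }
        // True cache miss: only now move the argument to the heap.
        let boxed = arg.take_box();
        self.boxed += 1;
        let id = self.stored_args.len();
        self.stored_args.push(boxed);
        self.tasks.insert(key, id);
        id
    }
}
```

Two calls with the same argument return the same task id, and `boxed` counts only the allocation made for the first one.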
- 0.8ETVOptimize how we track data for persistence (#89370)

## Summary

Reworks how the turbo-tasks backend tracks modified tasks for persistence snapshots, reducing overhead and simplifying the snapshot lifecycle.

**Key changes:**

- **Replace `modified` DashMap with per-shard atomic counters + inline flags.** Instead of maintaining a separate `FxDashMap<TaskId, ModifiedState>` that mirrors every modification, track modifications via flags already on `TaskStorage` and use per-shard `AtomicU64` counters to skip unmodified shards during snapshot iteration. This eliminates a major source of memory overhead.
  - The downside is that the map must now be scanned. Per-shard counters enable early exits, but we still need to scan entire shards. For a large site that means scanning thousands of
- **Merge task cache writes into the snapshot pipeline.** New tasks now carry a `new_task` flag and their type hash in `SnapshotItem`, so task cache entries are written in the same batch as task data/meta — removing the separate `persisted_task_cache_log` (`Sharded<ChunkedVec<...>>`) and its associated locking.
- **Remove `local_is_partial` optimization.** The backing storage layer already short-circuits on empty databases, and new tasks eagerly set `restored` flags at allocation time, making this redundant.
- **Simplify `end_snapshot`.** Instead of a multi-pass retain/iterate/update cycle over the `modified` map, `end_snapshot` now just drains the small `snapshots` map (only tasks concurrently accessed during snapshot mode) and promotes their `modified_during_snapshot` flags.
- **Delete unused utilities.** Removes `Sharded`, `ChunkedVec` (from backing_storage), and the `swap_retain` import now that they're no longer needed.

**Other cleanups:**

- `initialize_new_task` sets restored + new_task flags at allocation time for both persistent and transient tasks
- Fuzz test updated to use `active_tracking: true` and `StorageMode::ReadWrite`
- New KV storage test for batch write+flush+reopen pattern
- Minor fix: `SmallVec::into_boxed_slice()` instead of `into_vec().into_boxed_slice()`

## Build Benchmark Results

Measured over 9 runs (1 warm-up discarded), macOS, `TURBOPACK_PERSISTENT_CACHE=1`.

### Cold build (`rm -rf .next/`)

| Version | Time (avg) | Time (stddev) | MaxRSS (avg) |
|---|---|---|---|
| HEAD | 75.48s | 0.72s | 24,231 MiB |
| This PR | 75.20s | 1.98s | 23,829 MiB |
| **Delta** | −0.28s | — | **−402 MiB (−1.7%)** |

### Warm build (single file edit)

| Version | Time (avg) | MaxRSS (avg) |
|---|---|---|
| HEAD | 23.79s | 8,764 MiB |
| This PR | 23.67s | 8,766 MiB |
| **Delta** | −0.12s | flat |

Cold build time difference is within noise (< 1 stddev). The meaningful improvement is a **~400 MiB reduction in peak memory on cold builds**, consistent with the fixed overhead this PR targets. Warm builds are unaffected, as expected.

github.com-vercel-next.js · 29045997 · 2026-04-07
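The per-shard counter idea can be sketched minimally. This sketch assumes a `HashMap<u32, bool>` per shard (task id to inline modified flag) and sharding by `id % num_shards`. `ShardedStorage` and its methods are hypothetical. Shards with a zero counter are skipped without taking their lock. A shard with any modification must be scanned in full, which is the trade-off the PR describes.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// One shard of a hypothetical task store: each task carries an inline
/// "modified" flag, and the shard keeps an atomic count of flagged tasks.
struct Shard {
    modified_count: AtomicU64,
    tasks: Mutex<HashMap<u32, bool>>,
}

pub struct ShardedStorage {
    shards: Vec<Shard>,
}

impl ShardedStorage {
    pub fn new(num_shards: usize) -> Self {
        let shards = (0..num_shards)
            .map(|_| Shard { modified_count: AtomicU64::new(0), tasks: Mutex::new(HashMap::new()) })
            .collect();
        ShardedStorage { shards }
    }

    pub fn mark_modified(&self, task: u32) {
        let shard = &self.shards[task as usize % self.shards.len()];
        let mut tasks = shard.tasks.lock().unwrap();
        let flag = tasks.entry(task).or_insert(false);
        // Count each task once per snapshot cycle, not once per write.
        if !*flag {
            *flag = true;
            shard.modified_count.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Returns the sorted ids of modified tasks and the number of shards scanned.
    pub fn snapshot(&self) -> (Vec<u32>, usize) {
        let mut modified = Vec::new();
        let mut scanned = 0;
        for shard in &self.shards {
            // Early exit: a zero counter means nothing in this shard changed.
            if shard.modified_count.load(Ordering::Relaxed) == 0 {
                continue;
            }
            // Otherwise the whole shard must be scanned to find the flagged tasks.
            scanned += 1;
            let mut tasks = shard.tasks.lock().unwrap();
            for (id, flag) in tasks.iter_mut() {
                if *flag {
                    modified.push(*id);
                    *flag = false;
                }
            }
            shard.modified_count.store(0, Ordering::Relaxed);
        }
        modified.sort_unstable();
        (modified, scanned)
    }
}
```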
- 0.8ETV[turbopack-trace-server] optimize loading (#93264)

Land a few optimizations to the trace server:

* Change `SpanEvent` so it is 32 bytes instead of 40 bytes by triggering a `niche` optimization
* Change `args` and `events` to be a `SmallVec` with inline size 1
  * `args` has at most one element ~31% of the time
  * `events` has at most one element 69% of the time
* Compute min/max timestamps in a single pass instead of two when inserting into the selftimetree
* Bundle the dynamically computed 'total' fields behind a single `OnceLock`
  * saves 40 bytes per span due to `OnceLock` overheads
* Inline `SpanTimeData` and `SpanNames` into `Span`
  * We get little benefit from deferring the allocations, and by inlining we save time and improve memory locality.
  * Post-load, `SpanTimeData` is allocated for 94% of spans, but after loading `trace.nextjs.org` it is 100%
  * Post-load, `SpanNames` is allocated for 0% of spans, but after loading it is 96.2% of spans
* Remove the `inner` `OnceLock`s from `SpanNames`; we can just allocate these all together

Measuring with one 10 GB trace file, loading time went from 75.7 s (33 GB of RAM) to 60.5 s (19.5 GB of RAM), with load throughput occasionally exceeding 200 MB/s.

github.com-vercel-next.js · 8dee7acb · 2026-04-28
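The niche optimization can be demonstrated with `std::num::NonZeroU64`. The field layout below is hypothetical, since the real `SpanEvent` fields are not listed above. It is chosen so the arithmetic matches the 40 to 32 byte change. An `Option<u64>` needs its own discriminant and pads to 16 bytes on typical 64-bit targets. `Option<NonZeroU64>` stores `None` in the forbidden zero value, and the standard library documents that it is the same size as `u64`. This only works if the wrapped ids are never zero.

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

/// Hypothetical event layout: `Option<u64>` needs a separate discriminant,
/// so the struct is 40 bytes on typical 64-bit targets.
pub struct EventWithTag {
    pub start: u64,
    pub end: u64,
    pub parent: u64,
    pub child: Option<u64>,
}

/// Same data, but the optional id can never be zero, so `None` is encoded in
/// that forbidden value (the "niche") and no discriminant is needed: 32 bytes.
pub struct EventWithNiche {
    pub start: u64,
    pub end: u64,
    pub parent: u64,
    pub child: Option<NonZeroU64>,
}

/// Returns (size with a tag, size using the niche).
pub fn sizes() -> (usize, usize) {
    (size_of::<EventWithTag>(), size_of::<EventWithNiche>())
}
```

On x86_64, `sizes()` returns `(40, 32)`.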
- 0.7ETVsimplify session dependent tasks and add TTL support (#91729)

## Summary

Three related improvements to turbo-tasks session-dependent task handling:

### 1. Make `session_dependent` a function attribute

Previously, tasks called `mark_session_dependent()` at runtime to flag themselves. This PR makes it a compile-time `#[turbo_tasks::function(session_dependent)]` attribute instead. Benefits:

- **Enables eager aggregation number selection**: Session-dependent tasks change on every session restore, behaving like dirty leaf nodes. By knowing at task creation time (not mid-execution) that a task is session-dependent, the backend can assign a high initial aggregation number, preventing long dirty-propagation chains through intermediate aggregated nodes.
- **Simpler API**: No runtime `mark_session_dependent()` call needed — the attribute is declarative and statically checked.
- Removes `mark_session_dependent()`, `mark_own_task_as_session_dependent()`, and the `session_dependent` field from `InProgressStateInner`. The backend now reads `is_session_dependent` directly from the `NativeFunction` metadata via `TaskGuard::is_session_dependent()`.

### 2. Fix fetch to respect HTTP `Cache-Control` headers

Previously, `fetch` results were cached indefinitely, meaning they would never be refreshed (unless the cache was invalidated). Now they are `session_dependent` with a TTL, so we respect the HTTP caching settings (e.g. Google Fonts with `max-age=86400`).

New two-task pattern:

- **`fetch_inner`** (not `session_dependent`): Performs the HTTP request, grabs an `Invalidator` for itself, and returns the response + invalidator + absolute deadline. Cached across sessions.
- **`fetch`** (`network`, `session_dependent`): Reads the cached `fetch_inner` result and spawns a timer to invalidate when the TTL expires.

On warm cache restore, `fetch` re-executes (session-dependent), reads the persisted deadline from `fetch_inner`'s cached result, computes the remaining TTL, and spawns a timer — no HTTP request unless the TTL has already expired. Mid-session, when the timer fires, it triggers a re-fetch.

Error handling: On fetch failure, `fetch_inner` takes a dependency on `Completion::session_dependent()`, so transient errors (network down, DNS failure) are retried on the next session without busy-looping.

### 3. Drop `TransientState` in favor of an inline solution

`TransientState` was only used in one place (`EcmascriptModuleAsset::last_successful_parse`) and brought unnecessary complexity — it registered invalidators and called `mark_session_dependent()` on every read. Replaced with a simple `TransientCache<T>` local to `turbopack-ecmascript` that is just a `Mutex<Option<T>>` with `#[bincode(skip)]`.

## Test Plan

- 3 new integration tests in `turbo-tasks-fetch/tests/fetch.rs`:
  - `ttl_invalidates_within_session` — mock server returns `max-age=1`, body changes, verifies re-fetch after TTL
  - `ttl_invalidates_on_session_restore` — fetches with TTL, stops TT, waits for expiry, warm restores with new TT, verifies re-fetch
  - `errors_retried_on_session_restore` — server returns 500, stops TT, fixes server, warm restores, verifies success
- Existing 6 fetch tests continue to pass
- `cargo check -p turbo-tasks -p turbo-tasks-backend -p turbo-tasks-fetch`

github.com-vercel-next.js · fe99f0d6 · 2026-03-31
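The TTL bookkeeping can be sketched with `std::time`. The sketch assumes that only the `max-age` directive is parsed, with no handling of `no-store`, `s-maxage`, quoting, or the `Age` header. `max_age`, `deadline`, and `remaining_ttl` are hypothetical helpers rather than the `turbo-tasks-fetch` API. Persisting an absolute `SystemTime` deadline, as `fetch_inner` does, is what lets a later session compute how much of the TTL is left.

```rust
use std::time::{Duration, SystemTime};

/// Extracts `max-age` (in seconds) from a Cache-Control header value.
/// Minimal parser: other directives and quoted values are ignored.
pub fn max_age(cache_control: &str) -> Option<Duration> {
    cache_control.split(',').find_map(|directive| {
        let (name, value) = directive.trim().split_once('=')?;
        if name.trim().eq_ignore_ascii_case("max-age") {
            value.trim().parse::<u64>().ok().map(Duration::from_secs)
        } else {
            None
        }
    })
}

/// What a `fetch_inner`-style task could persist: an absolute deadline rather
/// than a relative TTL, so a later session can compute the time left.
pub fn deadline(fetched_at: SystemTime, cache_control: &str) -> Option<SystemTime> {
    max_age(cache_control).map(|ttl| fetched_at + ttl)
}

/// What a `fetch`-style task could do on restore: the delay for the invalidation
/// timer, or `None` if the deadline has passed and a re-fetch is due now.
pub fn remaining_ttl(deadline: SystemTime, now: SystemTime) -> Option<Duration> {
    deadline.duration_since(now).ok().filter(|d| !d.is_zero())
}
```

Suppose a response fetched at `t0` carries `max-age=60`. If it is restored 45 s later, it gets a 15 s timer. If it is restored after 90 s, `remaining_ttl` returns `None`, so the value should be re-fetched.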
- 0.6ETV[turbopack] Add support for fixed key blocks (#90844) ## What? Adds fixed-size key block types to turbo-persistence SST files. When all entries in a key block share the same key size and value type, the 4-byte-per-entry offset table is eliminated entirely. Entry positions become a direct arithmetic calculation: `header_size + index * stride`. Also fixes a bug in `sst_inspect` where value blocks were misidentified as key blocks (value blocks have no type header, so their raw data could coincidentally match a key block type byte). ## Why? Many turbo-persistence tables have uniform key and value sizes. For example, TaskCache stores 4-byte task ID keys with 4-byte inline values — every single entry is identical in structure. The existing variable-size key block format stores a 4-byte offset table entry per entry (1B type + 3B position) to support variable-length entries. For uniform blocks, this offset table is pure overhead in both space and read-path indirection (binary search must chase offset table pointers at each step). ## How? **New block types 3 and 4** (parallel to existing types 1/2 for hash/no-hash): ``` Variable: [1B type][3B count][4B offset × count][entries...] Fixed: [1B type][3B count][1B key_size][1B value_type][entries...] ``` The 2 extra header bytes (key_size + value_type) are amortized across all entries since we save 4 bytes per entry from removing the offset table. Break-even at 1 entry. 
  **Writer changes:**

  - `KeyBlockFormat` enum with a state machine (`Unknown → Fixed → Variable`) that tracks uniformity as entries are added to the accumulator
  - `FixedKeyBlockBuilder` writes the compact 6-byte header plus contiguous entries, with no offset table
  - Falls back to the variable-size format automatically when entries aren't uniform
  - `ValueRef::write_value_to()` is shared by both builder types to avoid duplication

  **Reader changes:**

  - `lookup_fixed_key_block()`: binary search using stride arithmetic (pure arithmetic, zero conditional branching per step)
  - `get_fixed_key_entry()`: direct index calculation instead of offset table indirection
  - Iterator refactored with a `CurrentKeyBlockKind` enum (`Variable` and `Fixed` variants)

  **sst_inspect fix:** Reads the index block first to determine which block indices are key blocks, rather than guessing from the first byte of block data.

  ### Real-world impact (vercel-site .next cache, ~9.5M entries per family)

  | Family | Fixed Blocks | Variable Blocks | Key Block Size | Notes |
  |--------|-------------|----------------|---------------|-------|
  | **TaskCache** | **16,274 (100%)** | 0 | **108.82 MB** (was 145 MB, **-25%**) | All inline 4B values |
  | TaskMeta | 10,078 (72%) | 3,877 (28%) | 118.86 MB (was ~145 MB) | Variable blocks contain rare medium values |
  | TaskData | 39 (0.3%) | 13,943 (99.7%) | 144.45 MB | Medium values spread across most blocks |
  | Infra | 0 | 1 | 25 B | Mixed inline sizes |

  This also reduces the overall cache size by 73M (2%).

  ### Read-path benchmarks (`static_sorted_file_lookup`, 8B keys + 4B values)

  | Benchmark | Canary | Fixed Blocks | Change |
  |---|---|---|---|
  | **1Ki hit/cached** | 534 ns | 510 ns | **-4.5%** |
  | **10Ki hit/cached** | 559 ns | 556 ns | ~0% |
  | **100Ki hit/cached** | 566 ns | 503 ns | **-11.1%** |
  | **1Mi hit/cached** | 573 ns | 519 ns | **-9.4%** |
  | 1Ki miss/cached | 571 ns | 571 ns | ~0% |
  | 10Ki miss/cached | 613 ns | 556 ns | **-9.3%** |
  | 100Ki miss/cached | 639 ns | 593 ns | **-7.2%** |
  | 1Mi miss/cached | 791 ns | 702 ns | **-11.3%** |
  | 1Ki hit/uncached | 4.07 µs | 3.81 µs | **-6.4%** |
  | 10Ki hit/uncached | 5.33 µs | 5.03 µs | **-5.6%** |
  | 100Ki hit/uncached | 5.55 µs | 5.24 µs | **-5.6%** |
  | 1Mi hit/uncached | 8.47 µs | 8.31 µs | **-1.9%** |
  | 1Ki miss/uncached | 3.37 µs | 3.19 µs | **-5.3%** |
  | 10Ki miss/uncached | 3.87 µs | 3.41 µs | **-11.9%** |
  | 100Ki miss/uncached | 4.15 µs | 3.70 µs | **-10.8%** |
  | 1Mi miss/uncached | 6.98 µs | 6.66 µs | **-4.6%** |

  Cached lookups are the hot path in production. There, binary search dominates and eliminating the offset table matters most. Most cached lookups improve by **4.5-11%**; the 10Ki hit and 1Ki miss cases are roughly unchanged. Uncached lookups also improve by 2-12%, thanks to smaller blocks and faster binary search.

  github.com-vercel-next.js · f1f83236 · 2026-03-10
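  The writer-side `Unknown → Fixed → Variable` tracking can be sketched as a small state machine. This illustrative version reuses the `KeyBlockFormat` name from the PR, but everything else is an assumption of this sketch:
  - the variant payloads
  - the `add_entry` and `classify` functions
  - the rule that key sizes above 255 force `Variable`, because the fixed header stores `key_size` in a single byte

  ```rust
  /// Illustrative writer-side tracker; only the `KeyBlockFormat` name comes from the PR.
  #[derive(Debug, Clone, Copy, PartialEq, Eq)]
  enum KeyBlockFormat {
      /// No entries added yet.
      Unknown,
      /// Every entry so far has this key size and value type.
      Fixed { key_size: u8, value_type: u8 },
      /// Entries differ; the block needs the offset-table layout. Absorbing state.
      Variable,
  }

  impl KeyBlockFormat {
      /// Returns the state after adding one entry.
      fn add_entry(self, key_size: usize, value_type: u8) -> Self {
          match self {
              KeyBlockFormat::Variable => KeyBlockFormat::Variable,
              // In this sketch, key sizes that don't fit the 1-byte header field force Variable.
              _ if key_size > u8::MAX as usize => KeyBlockFormat::Variable,
              KeyBlockFormat::Unknown => KeyBlockFormat::Fixed {
                  key_size: key_size as u8,
                  value_type,
              },
              KeyBlockFormat::Fixed { key_size: k, value_type: v } => {
                  if usize::from(k) == key_size && v == value_type {
                      self
                  } else {
                      KeyBlockFormat::Variable
                  }
              }
          }
      }
  }

  /// Folds a block's entries, given as (key size, value type) pairs, into a format.
  fn classify(entries: &[(usize, u8)]) -> KeyBlockFormat {
      entries
          .iter()
          .fold(KeyBlockFormat::Unknown, |format, &(key_size, value_type)| {
              format.add_entry(key_size, value_type)
          })
  }
  ```

  `Variable` is absorbing in this sketch. Once a single entry breaks uniformity, the block is written with an offset table, whatever the later entries look like.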
- 0.6 ETV [turbopack] fix feature usage telemetry (#93100)

  ## Report Turbopack feature-usage telemetry

  Turbopack never reported `NEXT_BUILD_FEATURE_USAGE` telemetry for production builds. This PR does three things:
  - wires that telemetry up
  - fixes a correctness bug in how the counts were computed
  - cleans up the API surface that carried the counts across the napi boundary

  ### Changes

  - **JS**: `turbopackBuild()` now records `EVENT_BUILD_FEATURE_USAGE` events after `writeAllEntrypointsToDisk` via a new `eventBuildFeatureUsageFromTurbopackDiagnostics` helper. Dev is out of scope; webpack's `TelemetryPlugin` is also `!dev && isClient`.
  - **Rust**: aligned feature names with the JS `EventBuildFeatureUsage['featureName']` union. The SWC triple is now `swc/target/<triple>`. Dropped `persistentCaching` (redundant with `turbopackFileSystemCache`) and `turbotrace: false` (hardcoded).

  ### Correctness fix: count unique importers, not resolves

  Previously, feature-usage counts for module imports (`next/image`, `next/font/google`, …) were computed from a `BeforeResolvePlugin` that emitted one event per resolve. Because Turbopack caches resolves, the emission fired at most **once per unique request**. The count was effectively `1` for every feature that was imported anywhere. Webpack's equivalent counts unique importing modules via `moduleGraph.getIncomingConnections(module).size`.

  This PR replaces the resolve-plugin emission with a single whole-app module-graph traversal on `Project`. For each tracked feature, it accumulates the set of unique parent modules of each matching node, mirroring webpack's "unique origin modules" semantics. Fonts are matched against the synthesized `/target.css?…` virtual modules produced by the SWC font-loader transform, matching webpack's `FEATURE_MODULE_REGEXP_MAP` approach. Paths are matched via `phf_map!` tables in `next_telemetry.rs`.
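  A toy module graph makes the "unique importers" counting easy to see. The sketch below is hypothetical. `feature_usage` is a made-up function, and it uses plain string module IDs and a `HashMap` in place of Turbopack's module graph and `phf_map!` tables. For each feature, it counts the distinct modules that import a matching module. A per-resolve emission behind a resolve cache would instead report `1` for each feature.

  ```rust
  use std::collections::{BTreeMap, HashMap, HashSet};

  /// Hypothetical sketch: for each feature, count the distinct modules that
  /// import a module mapped to that feature.
  fn feature_usage<'a>(
      edges: &[(&'a str, &'a str)],        // (importer, imported module)
      features: &HashMap<&'a str, &'a str>, // imported module -> feature name
  ) -> BTreeMap<&'a str, usize> {
      let mut importers: HashMap<&'a str, HashSet<&'a str>> = HashMap::new();
      for &(parent, child) in edges {
          if let Some(&feature) = features.get(child) {
              // A set, so a module importing the same feature twice counts once.
              importers.entry(feature).or_default().insert(parent);
          }
      }
      // BTreeMap only to make the output order deterministic.
      importers
          .into_iter()
          .map(|(feature, parents)| (feature, parents.len()))
          .collect()
  }
  ```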
  ### Incidental simplifications

  While in here, I first right-sized the `Diagnostic` collectibles subsystem and then removed it entirely, since feature usage was its only consumer:

  - `Project::project_feature_usage()` returns a structured `Vc<ProjectFeatureUsageSummary>` instead of emitting diagnostics. It is surfaced to JS as a dedicated `project.featureUsage(): Promise<BuildFeatureUsage[]>` napi method, called once at the end of the build.
  - `TurbopackResult<T>` loses its `diagnostics: BuildFeatureUsage[]` field; it's now just `{ result, issues }`. Every napi result type and ~10 construction sites are correspondingly simpler.
  - Deleted `turbopack_core::diagnostics` entirely (`Diagnostic` trait, `DiagnosticExt`, `DiagnosticContextExt`, `CapturedDiagnostics`, `PlainBuildFeatureUsage`). Also deleted:
    - `FeatureUsageTelemetry`
    - `ModuleFeatureReportResolvePlugin`
    - the `get_diagnostics()` aggregation
    - the `feature_usage`/`diagnostics` fields on `AllWrittenEntrypointsWithIssues`/`OperationResult`/`EntrypointsWithIssues`/`WrittenEndpointWithIssues`/`HmrUpdateWithIssues`/`HmrChunkNamesWithIssues`/`EndpointIssuesAndDiags`/`WriteAnalyzeResult`
    - the defensive `drop_collectibles::<Box<dyn Diagnostic>>()` scrub in `entrypoints_without_collectibles_operation`

  Feature-usage telemetry now flows as a plain return value end-to-end: `Project::project_feature_usage()` → napi `projectFeatureUsage()` → JS `project.featureUsage()` → `telemetry.record()`. No collectibles, no peeking, no emission-as-side-effect.

  ### Tests

  Un-skipped four previously webpack-only integration tests in `test/integration/telemetry/test/config.test.ts`: `image/script/dynamic`, `next/legacy/image`, `transpilePackages`, and middleware options. All pass under Turbopack. The remaining three skipped tests (`swc` flags, `@vercel/og`, `useCache`) cover features Turbopack doesn't emit yet, so they stay skipped with TODOs. Added a unit test for the helper at `packages/next/src/telemetry/events/build.test.ts`.
  Updated the Turbopack `next-rs-api` snapshot to reflect the new diagnostic shape.

  github.com-vercel-next.js · e8f8f498 · 2026-05-11
- 0.6 ETV Remove LMDB backend and ReadTransaction abstractions from turbo-tasks-backend (#91284)

  ## Summary

  Remove the unused LMDB database backend and the `ReadTransaction` abstraction layer from `turbo-tasks-backend`. This is a pure cleanup that deletes ~1,500 lines of code and eliminates all `unsafe` blocks related to transaction lifetime management.

  **Motivation:**

  - The LMDB backend was most useful in the early days of the persistence layer, when comparing against a "known good" database was valuable. It has since rotted, and turbo-persistence now has plenty of tests and real-world usage, so the comparison is no longer needed.
  - Carrying the code added a lot of complexity to APIs, for example:
    - The `ReadTransaction` abstraction existed solely to support LMDB's transaction-based read model.
    - The only remaining backend (`TurboKeyValueDatabase` via `turbo-persistence`) already used `()` as its `ReadTransaction` type, making the entire abstraction dead weight.
    - The transaction plumbing required `unsafe` transmute-based lifetime extension in `ExecuteContextImpl`, adding complexity and risk for no benefit.
    - The `WriteBatch::Serial` branches complicated already subtle code paths.

  github.com-vercel-next.js · 04a08435 · 2026-03-12
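  A hypothetical before/after trait shape in Rust shows why a `()` transaction type makes the abstraction dead weight. `KeyValueDatabaseBefore`, `KeyValueDatabaseAfter`, `MemDb`, `read_before`, and `read_after` are invented for this sketch; they are not the turbo-tasks-backend traits. The sketch also omits the lifetime plumbing that required `unsafe` in the real code.

  ```rust
  use std::collections::HashMap;

  // Hypothetical "before" shape: every read threads a transaction value.
  trait KeyValueDatabaseBefore {
      type ReadTransaction;
      fn begin_read_transaction(&self) -> Self::ReadTransaction;
      fn get(&self, tx: &Self::ReadTransaction, key: &[u8]) -> Option<Vec<u8>>;
  }

  // Hypothetical "after" shape: reads take just the key.
  trait KeyValueDatabaseAfter {
      fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
  }

  // Toy in-memory backend standing in for the single remaining backend.
  struct MemDb {
      map: HashMap<Vec<u8>, Vec<u8>>,
  }

  impl KeyValueDatabaseBefore for MemDb {
      // A unit transaction carries no state, so threading it adds nothing.
      type ReadTransaction = ();
      fn begin_read_transaction(&self) -> Self::ReadTransaction {}
      fn get(&self, _tx: &Self::ReadTransaction, key: &[u8]) -> Option<Vec<u8>> {
          self.map.get(key).cloned()
      }
  }

  impl KeyValueDatabaseAfter for MemDb {
      fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
          self.map.get(key).cloned()
      }
  }

  // Generic callers before and after the cleanup.
  fn read_before<D: KeyValueDatabaseBefore>(db: &D, key: &[u8]) -> Option<Vec<u8>> {
      let tx = db.begin_read_transaction();
      db.get(&tx, key)
  }

  fn read_after<D: KeyValueDatabaseAfter>(db: &D, key: &[u8]) -> Option<Vec<u8>> {
      db.get(key)
  }
  ```

  Both callers return the same values. When the only implementation's transaction is `()`, the "before" shape still forces every caller to create and pass along a value that carries no state.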