github.com-cloudflare-agents
all · 14 devs · built 2026-06-13
Repository snapshot
Monthly reports
Highlights
- Introduced a *first-class Agent Skills system* integrated with `@cloudflare/think`, enabling agents to execute various script types [commit/8700e27].
- Launched *first-class messenger support* for Project Think, allowing direct integration with external chat platforms like Telegram via the Chat SDK [32ea71ef · Sunil Pai].
- Added *declarative scheduled tasks* to Think agents, simplifying the definition and management of recurring background work [8ad724b1 · Sunil Pai].
- Implemented *SSE resumability* for `McpAgent` and `WorkerTransport`, improving reliability and resource efficiency by persisting events in Durable Object storage [cfc75bc9 · Matt].
- Integrated *Think workflow prompts* with Cloudflare Workflows, enabling durable and event-driven AI reasoning steps [f5e37bfa · Sunil Pai].
- Introduced a *new Telnyx voice provider* with PSTN telephony bridge, Speech-to-Text (STT), and Text-to-Speech (TTS) integration, extending voice support to phone calls [d44f59ad · whoiskatrin].
- Enhanced *chat recovery* by implementing early stashing of chat fiber snapshots before model inference, enabling automatic retries for interrupted pre-stream turns [f942ffe4 · Christopher Little-Savage].
- Significantly *hardened recovery foundations* for both *Think* and *AIChat*, introducing bounded and observable recovery mechanisms with incident-based chat recovery and transcript repair [02f93809 · Sunil Pai].
- Introduced *experimental Postgres session providers* with Hyperdrive support, enabling persistent storage for agent conversations, alongside a major refactoring that makes the core `Session` API fully asynchronous [d151e6d6 · Matt].
- Enhanced the `parentAgent()` helper to correctly resolve *facet-only parents*, allowing for more flexible sub-agent architectures [ce2af348 · Sunil Pai].
Observations
- *Maintenance activity* increased by 49% (24 this month vs. a 2-month average of 16), indicating a strong focus on refining and stabilizing existing systems.
- *Commit volume* decreased by 40% (96 this month vs. a 2-month average of 160), suggesting a shift from high-volume feature development to more targeted improvements and bug fixes.
- A substantial number of commits addressed *critical bug fixes and hardening* related to *chat recovery* and *Durable Object* stability, including issues with recovery abandonment during deployments [5e60034e · Sunil Pai], [51a771ff · Sunil Pai], stream resume race conditions [7c17736f · Christopher Little-Savage], and agent-tool recovery startup wedges [dfb3ecdd · Sunil Pai].
- Repeated modifications were observed in the *chat recovery mechanism* across multiple commits, such as [4c8b3712 · Sunil Pai], [fac44632 · Sunil Pai], [6d1a8f9d · Sunil Pai], and [02f93809 · Sunil Pai], highlighting an ongoing effort to improve resilience in dynamic deployment environments.
- Significant effort was invested in *CI/CD pipeline improvements* and *testing stability*, including adding retries to typecheck scripts [abf3cd85 · Sunil Pai] and Vitest on CI [a89b5561 · Sunil Pai], optimizing CI with `nx affected` [480d4f36 · Matt], and caching `node_modules` [cbb0d98c · Matt].
- Several refactoring efforts were undertaken to improve code consistency and maintainability, such as standardizing codemode tool call dispatching [f739ec9c · Matt] and making the core `Session` API fully asynchronous [d151e6d6 · Matt].
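The percentage figures in the observations above compare the current month against a two-month average. The sketch below assumes that comparison is a plain relative change; `relativeChange` and `formatPercent` are hypothetical names, not part of the report's tooling. Recomputing from the rounded counts shown gives -40% for commit volume and +50% for maintenance, so the 49% figure shown presumably comes from unrounded inputs.

```typescript
// Hypothetical helper: relative change of a current value against a baseline.
function relativeChange(current: number, baseline: number): number {
  if (baseline === 0) {
    throw new RangeError("baseline must be non-zero");
  }
  return (current - baseline) / baseline;
}

// Format a ratio as a signed whole-number percentage.
function formatPercent(ratio: number): string {
  const pct = Math.round(ratio * 100);
  return `${pct > 0 ? "+" : ""}${pct}%`;
}

// Inputs are the rounded counts quoted in the observations.
const commitVolume = formatPercent(relativeChange(96, 160));
const maintenance = formatPercent(relativeChange(24, 16));
console.log({ commitVolume, maintenance });
```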
Performance over time
ETV stacked by Growth, Maintenance and Fixes — 90-day moving average, normalized to ETV / month.
Average performance per developer
ETV per active developer per month — 30-day moving average.
Active developers over time
Unique developers committing each day — 90-day moving average.
Knowledge concentration
How dependent is this repo on a small number of contributors? Higher top-1 share = higher key-person risk.
Sunil Pai owns 72.3% of commits.
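The top-1 share can be read as the top contributor's commits divided by all commits. This is a minimal sketch under that assumption; the input shape, the function name `topContributorShare`, and the sample counts are all illustrative and are not taken from this repository's data or the report's actual method.

```typescript
// Hypothetical input shape: commit counts keyed by author name.
type CommitCounts = Record<string, number>;

interface TopShare {
  author: string;
  share: number; // fraction of all commits, between 0 and 1
}

// Returns the author with the most commits and their share of the total,
// or null when there are no commits to attribute.
function topContributorShare(counts: CommitCounts): TopShare | null {
  const entries = Object.entries(counts);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return null;
  const [author, commits] = entries.reduce((best, cur) =>
    cur[1] > best[1] ? cur : best
  );
  return { author, share: commits / total };
}

// Illustrative numbers only.
const example = topContributorShare({ alice: 70, bob: 20, carol: 10 });
console.log(example);
```

A higher `share` means more of the history depends on one person, which is the key-person risk the section describes.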
Top contributors
Most impactful commits
Top 20 by ETV in the all-time window.
- 5.2 ETV · Recovery: e2e + unit coverage, and fix runFiber recovery starvation/backoff (#1729) * test(ai-chat,think): fix racy fiber-cleanup checks and add continue-path e2e The recovery e2e tests asserted `hasFiberRows() === false` the instant recovery was detected, racing the continuation/retry turn that recovery legitimately re-runs in a fresh fiber. Poll until the fiber rows settle instead. Also fixes the ai-chat e2e worker, which emitted the legacy `0:{json}` stream framing that AIChatAgent never parses (it reads `data:` SSE frames), so no chunk was ever persisted and recovery only ever saw an empty partial. Emit proper `data:` frames and stream enough chunks to cross the ResumableStream flush threshold, enabling a new continue-path test (non-empty partial -> resume the same assistant message). Co-authored-by: Cursor <cursoragent@cursor.com> * test(ai-chat): e2e coverage for chat recovery budget exhaustion Adds a deterministic exhaustion harness: agents whose turn hangs and produces no recovery progress, so repeated SIGKILLs drive the recovery budget without racing real streamed content. Covers onExhausted firing with reason no_progress_timeout, recovery_aborted, and work_budget_exceeded, plus the persisted terminal banner (#1645). Extracts the shared wrangler/WebSocket e2e plumbing into harness.ts. max_attempts (alarm-debounce forces >30s spacing) and stable_timeout (not feasibly deterministic in-process) are left to unit coverage. Co-authored-by: Cursor <cursoragent@cursor.com> * test(ai-chat): e2e coverage for continue:false / persist:false recovery outcomes Interrupts a turn after a non-empty partial has flushed, then asserts the two onChatRecovery branches that suppress the default behavior: - { continue: false } persists the partial as a durable assistant message but does not re-run the turn (onChatMessage invoked once). - { persist: false, continue: false } drops a plain-text partial (no settled tool results) and does not re-run. 
Adds an onChatMessage invocation counter + assistant-text accessor to the test agent to distinguish "persisted partial" from a continuation. Co-authored-by: Cursor <cursoragent@cursor.com> * test(think): e2e for context-overflow compaction recovery Add ThinkContextOverflowE2EAgent plus an in-process (no process kill) e2e covering the opt-in contextOverflow recovery paths: reactive compact-and-retry that recovers a turn, reactive budget exhaustion that surfaces a terminal context_overflow error, and the proactive guard that compacts pre-step when reported usage crosses the headroom budget. Co-authored-by: Cursor <cursoragent@cursor.com> * test(ai-chat): e2e coverage for stream-buffer cleanup alarm (#1706) and recovering-status broadcast (#1620) Add two deterministic e2e tests in @cloudflare/ai-chat: - #1706 stream-buffer cleanup alarm: new ChatBufferCleanupAgent exposes @callable inspectors (buffer/chunk row counts, _cleanupStreamBuffers schedule count, forced future sweep, hasReclaimableStreams). Asserts a completed turn arms exactly one cleanup alarm, a second turn does not stack a duplicate, and a forced future-now sweep reclaims all buffers so a fully-swept DO reports no reclaimable streams. - #1620 recovering-status broadcast: drives a SIGKILL/restart recovery of a slow-stream turn and asserts the durable cf:chat:recovering flag transitions active -> cleared (via a new getRecoveringFlag @callable), plus a live WS frame collector observes the cf_agent_chat_recovering clear broadcast. The durable flag is the deterministic source of truth because the live frame is not replayed on connect. Adds a createFrameCollector helper to harness.ts and registers ChatBufferCleanupAgent under a new v5 migration tag. 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(think): e2e for durable-submission recovery on start Add ThinkSubmissionRecoveryE2EAgent plus an e2e covering the three _recoverSubmissionsOnStart transitions: messages-not-applied re-enqueues as pending, applied-but-unrecoverable surfaces as error, and a recoverable in-flight submission (real mid-stream SIGKILL) is left running and driven to completion by the scheduled continuation. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(agents): arm follow-up alarm for pending runFiber recovery `_scheduleNextAlarm()` only rescheduled for active keepAlive leases, due schedules, and facet runs — never for orphaned `cf_agents_runs` rows or interrupted/pending managed ledger fibers still awaiting recovery. Because orphaned fibers hold no keepAlive ref, a scan that yielded on `fiberRecoveryScanDeadlineMs` (or a pass that retained a repeatedly-throwing unmanaged recovery hook) never got another alarm, so the remaining fibers starved. Add `_hasPendingFiberRecovery()` and arm a follow-up alarm whenever recovery work is outstanding, so multi-pass recovery resumes and eventually drains every fiber (and ages out poison rows via `fiberRecoveryMaxAgeMs`). Co-authored-by: Cursor <cursoragent@cursor.com> * test(agents): e2e coverage for poison-row aging, scan-deadline yield, and concurrent fiber recovery Add three runFiber recovery e2e tests (real `wrangler dev` + SIGKILL/restart against `--persist-to`): - poison-row aging: an unmanaged fiber whose `onFiberRecovered` always throws is retained for retry across alarm passes, then dropped with a `max_age_exceeded` skip once it exceeds `fiberRecoveryMaxAgeMs`. - scan-deadline yield: a tiny `fiberRecoveryScanDeadlineMs` forces a single alarm pass to yield (`scan_deadline_exceeded`) partway through 20 orphaned fibers; subsequent passes drain the rest with no starvation. 
- concurrent fibers: N concurrent fibers (mixed managed + unmanaged) are all recovered after a kill, covering the gap that prior tests only recovered a single fiber. New DO test agents record recovery signals (hook invocations + skip reasons) into durable SQL so assertions survive DO eviction between polls. Shared spawn/kill/RPC harness lives in `recovery-helpers.ts`. Co-authored-by: Cursor <cursoragent@cursor.com> * test(think): e2e for messenger reply-fiber recovery Add ThinkMessengerRecoveryE2EAgent plus an e2e covering MESSENGER_REPLY_FIBER_NAME recovery via _handleInternalFiberRecovery: a streaming-stage interruption posts the apology (apologize mode), and an accepted-stage interruption recovers in answer mode and re-drives reply delivery. Uses an in-memory fake chat adapter that records posts into agent SQL; full streamed-answer rendering is deferred (needs a complete adapter/real transport). Co-authored-by: Cursor <cursoragent@cursor.com> * test(think): e2e for workflow-turn recovery + notification drain replay Add ThinkWorkflowRecoveryE2EAgent (reuses STEP_PROMPT_WORKFLOW with a deterministic mock structured model). Covers the happy path (structured workflow turn completes, notification drains, workflow resumes with the validated output) and the recovery path (mid-stream SIGKILL): on restart the turn is reconciled to a terminal submission and the workflow-notification drain replays it so the workflow is unblocked. Documents the deferred gap that an interrupted structured turn is recovered as skipped rather than completed. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): rename e2e callable to avoid Think.getWorkflow collision The workflow-recovery e2e agent's @callable shadowed the inherited Think.getWorkflow(workflowId) with an incompatible signature, failing typecheck. Rename it to inspectWorkflowRun. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(agents): exponential backoff for the runFiber-recovery follow-up alarm The follow-up alarm added for pending fiber recovery fired every keepAliveIntervalMs with no backoff, so a repeatedly-throwing recovery hook — or a `fiberRecoveryMaxAgeMs: 0` ("retain forever") row whose hook keeps throwing — would wake the DO on every tick indefinitely (the perpetual-heartbeat hazard #1707 guards against). Track consecutive no-progress recovery scans and back the alarm off exponentially (capped at 5 min); any scan that recovers a fiber (including a scan-deadline yield that drained part of a batch) resets it, so legitimate multi-pass draining stays prompt. Adds e2e coverage: retain-forever poison-row backoff cadence, and multi-pass recovery for a sub-agent (facet) child driven by the parent alarm. Co-authored-by: Cursor <cursoragent@cursor.com> * test(agents): fast unit coverage for runFiber recovery alarm re-arm + backoff Adds deterministic, in-process unit tests (no process kill / timers) that drive `_checkRunFibers` + `_scheduleNextAlarm` directly and inspect the physical alarm: the starvation re-arm (alarm armed while a retained recovery row is pending), exponential backoff across no-progress scans, backoff reset on forward progress, and no alarm once recovery drains. Previously this behavior was only covered by the nightly e2e suite. Adds getCurrentAlarm/getRecoveryNoProgressScans/simulateAlarmCycle test helpers to the run-fiber test agent. Co-authored-by: Cursor <cursoragent@cursor.com> * docs(agents): note the fiberRecoveryMaxAgeMs:0 warm-DO trade-off A repeatedly-throwing recovery hook with fiberRecoveryMaxAgeMs:0 ("retain forever") is retried on the capped backoff indefinitely, so the Durable Object never idle-evicts while the un-recoverable row exists. Document this in the option JSDoc and docs/durable-execution.md, and recommend a finite age. Bounding recovery by attempts is tracked in #1728. 
Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> · Sunil Pai · 1c8fdf58 · 2026-06-10
- 4.1 ETV · feat(codemode): connector model + durable runtime, snippets, and vite plugin (#1581) * feat(codemode): connector model with durable runtime, skills, and vite plugin Executor is the dumb code sandbox (DynamicWorkerExecutor, IframeSandboxExecutor). CodemodeRuntime is a DurableObject facet that wraps an executor and makes execution durable via abort-and-replay. Connectors — class-based service integrations (WorkerEntrypoint subclasses): CodemodeConnector, McpConnector, OpenApiConnector, ToolsetConnector Runtime — durable execution engine: - Every tool call recorded in a durable log (the replay spine) - Observations execute and record; approval-required actions abort the run - resumeCodemode() replays the log and runs the approved action - rejectCodemode() / rollbackCodemode() for HITL resolution - codemode.get/set persist scratchpad state across runs Model-facing tool: createProxyTool({ ctx, executor, connectors, skills }) → { code }. Sandbox SDK: codemode.search/describe/connectors/pending/run/get/set + connector globals. Skills: CodemodeSkillSource interface — pluggable reusable code patterns. Vite: @cloudflare/codemode/vite discovers *.codemode.ts, auto-exports connectors + runtime. Search: Executor-style ranked search with normalization/scoring. Connectors support revertAction() for rollback. 
* feat(codemode): runtime handle with pending(), spec+request OpenAPI surface - createCodemodeRuntime({ ctx, executor, connectors }) returns the runtime handle; runtime.tool() is the primary way to expose codemode to a model - add runtime.pending() (and pendingCodemode) so approval UIs can list actions awaiting approval, per the RFC runtime API - OpenApiConnector is now two overridable primitives: spec() returns the OpenAPI doc into the sandbox, request() performs an authenticated call; drop the search substring matcher and operationId dispatch - docs, README, changeset and PR body updated to snippets language and the runtime-first API * Merge origin/main into feat/codemode-executor-style-providers (pnpm migration) * feat(codemode): trim to the minimal API surface Sandbox SDK is now five methods: search, describe, step, save, run. Removed codemode.connectors()/pending()/fork()/get()/set()/snippets(): - pending() was dead code (a pause aborts the run, so there is never anything pending while model code is executing) - fork() is a host decision, not a model decision - connectors() duplicated the tool description and search/describe - get/set duplicated step (deterministic code recomputes on replay; nondeterministic work belongs in a step) - snippets() duplicated search Host runtime handle is now: tool, pending, approve, reject, rollback. - removed the resume() alias of approve() - removed runtime.fork() and the facet fork(): no concrete developer story yet; the replay log supports re-adding it later - removed the duplicate description option on createCodemodeRuntime (set it on runtime.tool({ description }) instead) - scratch state and parentId removed from ExecutionState The low-level proxy-tool functions (createProxyTool, resumeCodemode, rollbackCodemode, ...) are no longer exported: the runtime handle is the one public API, matching the RFC's one-way-of-doing-things thesis. Docs, changeset, PR body and the RFC wiki (v12) updated to match. 
* feat(codemode): one tools() record per connector, curated snippets, execution audit trail Connector authoring is now a single surface. A connector is three things: name(), instructions()?, and tools() — one record, one entry per tool, with each tool carrying its own description, schema, requiresApproval, execute, and optional revert. The old parallel string-keyed maps (loadDescriptors/annotations/executeTool/ revertAction) are now wire plumbing derived from the record, not something authors write. ToolsetConnector is deleted: AI SDK toolsets are shape-compatible and return from tools() directly. Derived connectors (MCP) are decorated via a single tool(name, t) hook. observation and approvalDescription annotations are gone (observation was behaviorally dead; the approval UI uses the tool's own description). setConnection two-phase init replaced by constructor injection in the example. Snippets are curated by the developer, not self-promoted by the model: codemode.save is removed from the sandbox; runtime.saveSnippet(name, { executionId? }) promotes any run's script, with runtime.snippets() and runtime.deleteSnippet(name) for management. runtime.executions() exposes the full run history (the audit trail) for developer UIs. The sandbox SDK is now four methods: search, describe, step, run. Also: docs/codemode overhauled around why/configure/use per page (search-and-describe.md folded into runtime.md), example dependency versions aligned with the workspace so sherif passes, RFC wiki updated to v15. 
* chore(codemode): refresh PR body with audience-split API summary * fix(codemode): address review findings - connector sandbox proxies guard non-string property access, matching the dispatcher proxy, so symbol lookups no longer produce bogus RPC calls - codemode.run executes snippets with the platform provider attached: snippets are saved execution code and may use codemode.step, which previously threw ReferenceError inside a snippet run - McpConnector throws on sanitized tool-name collisions instead of silently dropping tools; override toolName() to disambiguate - add connector base tests (describe derivation, execute/revert dispatch, tool() decoration hook, collision error) * fix(codemode): harden durable runtime — stateless, explicit executionId, resilient rollback Reworks the CodemodeRuntime durable-execution model for correctness under hibernation and concurrency, simplifies the OpenAPI connector, and adds an end-to-end test suite. Runtime architecture - Make CodemodeRuntime stateless across calls: no in-memory cursor or annotations. Every interaction is addressed by (executionId, seq), with seq allocated host-side, so a run survives eviction between any two tool calls. - Remove the global CURRENT_KEY "current execution" pointer and its helpers (#currentId, #current, #resolve). approve/reject/rollback/saveSnippet now require an explicit executionId, eliminating a class of races when multiple runs share one Durable Object. - Thread executionId through to every tool outcome: ProxyToolOutput now includes executionId on completed/paused/error so callers can follow up (e.g. saveSnippet) without guessing the newest run. Replay correctness - Add "executing" ToolLogEntryState: non-approval calls/steps are logged as executing by decide() and only promoted to "applied" once recordResult() stores the real value. A crash between the two re-executes instead of replaying undefined. 
- Detect replay divergence by hashing connector/method/args via a stable stringify (sorted keys, bigint-tagged). Divergence is recorded as a terminal error and surfaced as { status: "error" } rather than thrown across RPC. - Guard decide() on terminal/paused state: once a run is paused/error/ rolled_back, further decide() calls are inert and return a pause decision, so model code that swallows the pause sentinel cannot apply more side effects. Approvals & rollback - rollback() now reverts ALL applied reversible actions (any tool with a revert), not just approval-gated ones, in reverse order. requiresApproval (pause-before-do) and revert (undo-after-do) are orthogonal. - Make rollback resilient: each revert is wrapped in try/catch, all reverts are attempted, failures are aggregated and thrown, and the run is marked with the new "rolled_back" status when anything was undone. - listPending()/pending() aggregate pending actions across ALL paused runs when no executionId is given, fixing a racy single-run approvals view. - Document that reject() ends a paused run but does not undo applied actions. Execution retention - begin() accepts maxExecutions and prunes old terminal runs automatically; add explicit deleteExecution() and pruneExecutions() APIs. Connectors / DX - OpenApiConnector derives one typed tool per operation host-side (e.g. repoApi.get_repository) instead of making the model parse the raw spec; request() remains as an escape hatch. Adds module-level memoization of derived operations (WeakMap keyed by spec), deeper $ref resolution across allOf/oneOf/anyOf/additionalProperties, and collision warnings for operation names that clash or hit reserved names. - Pass connector bindings as RpcTarget evaluate() arguments instead of via worker env to fix DataCloneError; route pause via a control marker rather than throwing across the sandbox→host RPC boundary. 
- Switch DynamicWorkerExecutor to loader.load() for one-off dynamic workers (loader.get(random-id) gave no caching benefit). - Widen CodemodeConnector ctx to DurableObjectState | ExecutionContext so connectors inside a Durable Object no longer need to cast this.ctx. - revertAction() returns boolean to report whether a revert actually occurred. Tests, docs, cleanup - Add src/runtime-tests/ e2e suite (vitest-pool-workers) driving a real DO host: read-only, pause/approve, replay, reject, rollback (+rolled_back), divergence, step replay-safety, concurrent runs, retention, snippets, delete, pause-swallow guard, and pending aggregation. Wire vitest.runtime.config.ts into test script. - Rewrite examples/codemode-connectors with an approvals panel and snippet flow. - Update changeset and docs (runtime, approvals, connectors, index, READMEs). - Delete orphaned src/mcp-provider.ts and stray .pr-body.md / EXECUTOR_TODO.md. * feat(codemode): per-execution connector lifecycle + result shaping Adds the two codemode primitives needed for stateful connectors (e.g. reusable browser sessions) to ride on the durable runtime instead of reinventing session storage, plus a model-facing result transform. Both are additive. Per-execution resource lifecycle - A tool's execute(args, ctx) and revert(args, result, ctx) now receive a ToolExecuteContext carrying the run's executionId, stable across pause/resume, so a connector can lazily acquire/reconnect a resource keyed by that id. - CodemodeConnector.disposeExecution(executionId, status) is an optional override (default no-op) called when a run reaches a terminal state, so a connector can tear the resource down. It fires on each terminal transition (completed/error/rejected/rolled_back) and never on pause — a paused run may resume, so the resource must outlive a pause. 
Documented to be idempotent (a completed-then-rolled-back run disposes twice), to not rely on instance memory (keyed off durable storage), and to never throw (rejections ignored). - A stale/no-op reject() no longer triggers teardown: runtime.reject now returns whether it actually terminated the run, and dispose is gated on that, so a still-resumable run keeps its resources. rejected is now a first-class ExecutionStatus - reject() marks the execution "rejected" instead of masquerading as "error", so the audit trail distinguishes a user rejection from a failure, and ExecutionEndStatus is exactly the terminal subset of ExecutionStatus. Result shaping - createCodemodeRuntime accepts an optional transformResult that reshapes the model-facing result of a completed run (initial run and resume), after the raw value is recorded — so the audit trail keeps the full result while the model sees the shaped one. A throwing transform falls back to the raw result rather than failing a completed run. - New exported truncateResult/truncateResponse (token-aware, { maxChars?, maxTokens? }) as the default building blocks: small structured results pass through unchanged; oversized ones serialize to a bounded, marked string. Tests + docs - e2e: executionId threading, dispose on complete/reject/rollback, no dispose while paused, no dispose on a stale reject, transformResult on run + resume. - unit: truncateResponse/truncateResult. - Documented the lifecycle contract, result shaping, the rejected status, and the sequential-tool-call determinism constraint; updated the changeset. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): guard resume() to paused runs + make example methods callable Two correctness fixes surfaced in review, plus the docs/readme/changeset updates that go with them. 1. 
resume() no longer revives terminal runs CodemodeRuntime.resume() reset status to "running" unconditionally, so approve({ executionId }) on a completed/error/rejected/rolled_back run flipped it back to running and re-executed it — bypassing decide()'s terminal guard. The concrete hazards: a rejected action's log entry (state "reverted") fell through decide() to a fresh "pending" entry, re-offering the exact action the user rejected; and a rolled_back run re-applied the side effects rollback had just undone. resume() now changes nothing unless the run is "paused" (returns null otherwise). resumeCodemode() distinguishes missing vs. not-paused and returns a { status: "error", executionId, error } ProxyToolOutput instead of throwing — matching the divergence/pause paths, so the result crosses RPC cleanly and the agent loop is never broken by an exception. This is intentionally a safe no-op rather than a hard error: approve() is operator-initiated (never on the model's tool path), and a stale/racing approval UI hitting an already-finished run is an expected race, not a caller bug. 2. example server methods are now @callable() examples/codemode-connectors exposed pendingApprovals/approveExecution/ rejectExecution/rollbackExecution/executions/saveSnippet/snippets for the client's agent.call(), but none carried @callable(). The Agent RPC dispatcher rejects any method without callable metadata ("Method X is not callable"), so the entire approval/snippet UI threw at runtime. Added the import and the decorators. Tests: new e2e "refuses to approve a terminal run, never re-offering a rejected action" (reject a paused run, then assert approve returns status:"error", no new pending action, no leaked side effects, run stays rejected). 282 unit + 21 e2e + 33 browser pass; pnpm run check clean. 
Docs: approvals.md and the example README document approve() as a safe no-op on a non-paused run; README snippets fixed to show @callable() and the now-required saveSnippet executionId; changeset updated. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): listPending() only surfaces paused runs pendingOf() filters log entries by state === "pending" without regard to the execution's overall status, and the aggregate listPending() scanned every execution. A non-paused run can retain a stale "pending" entry — #diverge sets status to "error" but leaves the log untouched, so a resume that diverges before reaching the pending entry ends the run as "error" while that entry stays "pending". Those entries aren't actionable (approve() is a no-op on a non-paused run), so they must not clutter the approval queue. listPending() now considers only paused runs on both paths. The explicit executionId path is only ever called from runPass on a confirmed-paused run, so guarding it is safe and makes "pending = actionable approval on a paused run" the consistent contract. This matches the docs, which already said "all paused runs" — the code was the side out of sync. Regression: the divergence e2e now asserts that after the run ends "error" with a leftover pending entry, both pending() and pending(executionId) return []. 282 unit + 21 e2e + 33 browser pass; pnpm run check clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): reserve __connectors so a provider can't shadow RPC bindings RESERVED_NAMES guarded __dispatchers (one evaluate() parameter) but not __connectors (the other). A provider named __connectors passed validation and emitted `const __connectors = new Proxy(...)` into the same function scope as the `evaluate(__dispatchers = {}, __connectors = {})` parameter — clobbering the RPC bindings that every connector proxy reads from (`__connectors.<name>.callTool`), and in fact producing a SyntaxError (const redeclaring a parameter binding). 
The connector validation path already special-cased "__connectors"; the provider path didn't. Add __connectors to RESERVED_NAMES so both providers and connectors are checked against it, and drop the now-redundant special case in the connector loop. Regression: new executor test asserts a provider named __connectors is rejected as reserved. 283 unit + 21 e2e + 33 browser pass; pnpm run check clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): connector errors return a marker instead of rejecting across RPC The connector binding's comment promised "the RpcTarget method always resolves" (control signals are returned, not thrown, to avoid an unhandled rejection on the host), but the execute/record path didn't honor it: connector.executeTool, runtime.recordResult, and runtime.decide could all reject ConnectorCallTarget. A rejected promise returned from a DO/RPC method is tracked as an uncaught (in promise) on the host even though the sandbox awaits it — so any throwing connector tool (a failed API call, a bug) produced a misleading "unhandled rejection" host trace. Correctness was already fine (the sandbox try/catch ends the run as "error"), but the noise contradicted the design. Make the error path symmetric with pause: the whole binding body is wrapped so it always resolves — to a result, a { control: "pause" } marker, or a new { control: "error", message } marker. The sandbox connector proxy re-throws the error marker locally, so the run's own try/catch records it and the run ends "error" with the message exactly as before — just without a host-side rejection. Regression: ItemsConnector gains a boom tool that throws; a new e2e asserts the run ends "error" with the message and the suite completes with no unhandled rejection. 283 unit + 22 e2e + 33 browser pass; pnpm run check clean. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): log connector-call failures on the host before returning the marker Returning an error marker keeps the RPC call from rejecting (no misleading unhandled-rejection trace), but a genuine connector failure still deserves a host-side log with its stack for debugging. Add a console.error in the binding's catch with the connector/method and execution id. This restores the visibility the pre-marker throw had — minus the "uncaught (in promise)" framing — while the message continues to reach the model and the audit trail via the run's "error" outcome. Pause is unaffected (it isn't an error and isn't logged). 283 unit + 22 e2e + 33 browser pass; pnpm run check clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): mark approved action "executing" before running it (reject race) decide() returned { kind: "execute" } for a just-approved (pending) entry WITHOUT persisting, leaving the entry "pending" in storage for the entire tool-execution window. Between decide() returning and recordResult(), the DO is idle (the tool runs on the host worker), so a concurrent reject() — e.g. a second UI tab — could read "pending", mark it "reverted", and set status "rejected"; recordResult() then overwrote the entry back to "applied". Net result: the side effect ran even though the user rejected it, and the status was left inconsistent. The fresh-call path already guarded this window by persisting "executing" before returning; the pending→execute path skipped it. Now the pending→execute transition writes "executing" before returning, so a racing reject() sees "executing" and no-ops (reject only acts on "pending"). decide() also handles an existing "executing" entry explicitly — re-execute, never re-pause — so a crash mid-execution recovers without re-requesting approval for an already-approved action (which a naive "executing" flip would have caused via the requiresApproval branch on the fall-through). 
Either interleaving is now consistent: reject-before-decide ends the run before the action runs (decide sees status != running → pause); decide-before-reject runs and applies the action while reject no-ops. Regression: new e2e drives the facet directly (begin → decide → resume → decide → reject → recordResult) and asserts the approved action is "executing" at the decision boundary, the racing reject returns false and leaves the run "running", and the action records as "applied". 283 unit + 23 e2e + 33 browser pass; pnpm run check clean. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(codemode): normalize snippet code before embedding it in codemode.run A snippet stores the model's raw code (runtime.begin keeps it verbatim, and saveSnippet copies it). codemode.run embeds that raw text as an expression: `const snippet = (${snippet.code})`. Normal runs and replay survive fenced or statement-style code because they pass through normalizeCode (strip markdown fences, wrap non-expressions into an arrow), but the snippet wrapper bypassed that — so a snippet saved from ```ts-fenced output or a statement block (`const x = ...; return x;`) became a syntax error on re-run. Normalize snippet.code to a valid arrow expression before embedding it, the same transform the executor applies to a fresh run; runCode still normalizes the outer wrapper. The fix lives in the execution layer (proxy-tool) so the runtime facet stays pure storage and snippet.code remains the faithful raw model output. Regression: new e2e saves snippets from both fenced (```ts ... ```) and statement-block code, then re-runs each via codemode.run and asserts they complete with the right result. 283 unit + 24 e2e + 33 browser pass; pnpm run check clean. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat(examples): richer codemode-connectors UI + dark-mode fix Render assistant/user text through Streamdown (markdown + highlighted fences), add collapsible tool cards showing the model's code, result, console logs, and errors, and add a collapsible reasoning-trace block. Fix the user message bubble, which used a non-theme-aware `text-black` on the accent background, switching to `bg-kumo-contrast` + `**:text-kumo-inverse` so it reads correctly in both light and dark mode. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Matt Carey <matt@cloudflare.com> Co-authored-by: Sunil Pai <spai@cloudflare.com> Co-authored-by: Cursor <cursoragent@cursor.com> Matt · b2b67623 · 2026-06-10
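The approve/reject race described in this commit can be sketched as a small in-memory state machine. This is an illustration under stated assumptions, not the codemode runtime: the class `ApprovalLog`, its storage, and the method bodies are hypothetical, and only the entry states, the run statuses, and the rule that `reject()` acts solely on "pending" entries are taken from the commit message.

```typescript
// Illustrative sketch only: ApprovalLog and its method bodies are hypothetical.
// The state names and run statuses are the ones named in the commit message.
type EntryState = "pending" | "executing" | "applied" | "reverted";
type RunStatus = "running" | "paused" | "rejected";
type Decision = { kind: "execute" } | { kind: "pause" };

class ApprovalLog {
  private entries = new Map<string, EntryState>();
  private status: RunStatus = "running";

  // An action that needs approval is stored as "pending" and pauses the run.
  begin(id: string): void {
    this.entries.set(id, "pending");
    this.status = "paused";
  }

  // The user approved, so a paused run continues.
  resume(): void {
    if (this.status === "paused") this.status = "running";
  }

  decide(id: string): Decision {
    // A run that is no longer running (for example, already rejected) pauses.
    if (this.status !== "running") return { kind: "pause" };
    const state = this.entries.get(id);
    if (state === "pending" || state === "executing") {
      // Persist "executing" BEFORE returning, so a concurrent reject() can no
      // longer observe "pending". An existing "executing" entry (crash
      // mid-execution) is re-executed rather than re-paused.
      this.entries.set(id, "executing");
      return { kind: "execute" };
    }
    // Fresh calls and replay of finished entries are out of scope here.
    return { kind: "pause" };
  }

  // reject() only acts on "pending" entries; anything else is a no-op.
  reject(id: string): boolean {
    if (this.entries.get(id) !== "pending") return false;
    this.entries.set(id, "reverted");
    this.status = "rejected";
    return true;
  }

  recordResult(id: string): void {
    this.entries.set(id, "applied");
  }

  stateOf(id: string): EntryState | undefined {
    return this.entries.get(id);
  }

  runStatus(): RunStatus {
    return this.status;
  }
}
```

With this shape, the sequence begin, resume, decide, reject, recordResult leaves the entry "applied" and the run "running", while reject before decide leaves the entry "reverted" and makes decide return a pause.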
- 4.1ETVfeat(experimental): Postgres session providers with Hyperdrive support (#1297) * feat(experimental): PlanetScale SessionProvider + async interface * fix(session): address PR #1297 review feedback Responses to @mattzcarey review on PR #1297. Covers all 13 comments: 1. Drop wrapPgClient boilerplate — Postgres providers now accept raw pg.Client directly via new providers/postgres-adapter.ts. The adapter normalises pg.Client-style (`query`) into the internal PostgresConnection shape (`execute`) and rewrites `?` placeholders to `$1, $2, …` so providers keep a driver-agnostic SQL dialect. 2. appendMessage parentId semantics — both PostgresSessionProvider and AgentSessionProvider were using `parentId ?? latestLeaf`, which collapsed undefined (auto-detect) with null (explicit root). Fixed to honour the documented contract: - undefined / omitted → auto-detect - explicit null → root with no parent SessionProvider JSDoc now documents this. New tests cover both cases for both providers. 3. extractText — renamed to extractSearchableText, added JSDoc explaining it feeds text_content for FTS while the full JSON stays in `content`. 4/6/8. Restored `enum` on label params across set_context, load_context, unload_context, search_context — schema-level enforcement instead of free-text description hints so smaller models can't hallucinate invalid labels. 5. set_context metadata shape — switched from flat `title` to nested `metadata: { title?, description? }`. Tool description now explains metadata is optional and useful for longer loadable entries (skills). setSkill() receives description ?? title so behaviour is preserved when only title is passed. 7. Dropped 'e.g. memory' example from search_context description — avoids seeding models with a non-existent block name. 9. Renamed Session.create(storageOrAgent) → Session.create(provider). 10. Async skill restore — _ensureReady() now kicks off restoration as a background _restorePromise; a new _ensureRestored() awaits it. 
Every async Session public method awaits _ensureRestored() before touching storage or skill state. unloadSkill / getLoadedSkillKeys are now async (internal callers only). Async SessionProviders (Postgres) now correctly rehydrate loaded-skill tracking after DO hibernation instead of silently dropping it. 11. Added JSDoc to _reclaimLoadedSkill explaining it reclaims context-window tokens by collapsing a load_context tool result to a short marker (kept the name per review feedback). 12. Clarified addContext JSDoc: it's a builder/host API, not an LLM tool; the LLM writes via set_context. 13. Added a comment on the Think._cachedMessages in-place patch explaining why it's not a full _syncMessages() call (in-flight streaming messages would be dropped — see commits 3f615a24, 6e76bd49). Example server.ts + docs/sessions.md updated for the new API and fixed the Devin-flagged premature client-caching bug (client is only assigned after connect() resolves). Tests: +4 in postgres-providers.test.ts (parentId null/undefined, raw pg.Client adapter for session/context/search providers), +2 in provider.test.ts (same parentId semantics for AgentSessionProvider). 161/161 session-related tests pass. * chore(session-planetscale): align kumo + ai dep versions with workspace Bump @cloudflare/kumo from ^1.18.0 to ^1.19.0 and ai from ^6.0.159 to ^6.0.168 so the session-planetscale example matches every other package in the monorepo. Makes `npm run check` (sherif) pass without multiple-dependency-versions errors. * fix(session): resolve postgres provider rebase fallout Align the new PlanetScale example with the rebased workspace dependencies and update Think async-session call sites/tests so the branch stays typecheck- and lint-clean on current main. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(session): harden postgres provider follow-ups Tighten the Postgres session provider for shared database usage by scoping message id conflicts to (session_id, id) and validating explicit parent ids against the current session before storing them. This keeps caller-provided message ids safe across sessions and preserves the SQLite provider's fallback-to-root behavior for invalid parents. Make generated keys for keyed context writes deterministic but collision-resistant when the model omits metadata.title. Title-based writes remain stable update keys, while content-derived keys now include a short hash so long shared prefixes and non-Latin content do not silently overwrite unrelated skill or search entries. Clean up the new PlanetScale example and docs for merge readiness: remove committed Cloudflare account/resource IDs, document the required Hyperdrive placeholder, use raw pg.Client in examples, initialize the client/session from onStart instead of request-created promises, update Session docs for the async API, document the Postgres composite message primary key, and add the relevant changeset for the new public providers and async session surface. Tests cover cross-session duplicate message ids, foreign-session parent fallback, and generated key collision cases. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): keep session message cache coherent Teach Session to notify internal listeners after message mutations so cache-owning framework code can mirror durable storage changes without widening the public API. Think now registers that hook during startup, treats `messages` as its live cached view, and routes writes through history helpers that sanitize and enforce row-size limits before delegating to Session. This avoids full storage rereads during active streaming turns, while still refreshing at safe boundaries for duplicate appends, branch writes, deletes, clears, and compaction overlays. 
It also makes direct `this.session.appendMessage()` calls from advanced Think subclasses update the live cache through the same observer path. Add regression coverage for duplicate message IDs, compaction-triggered refreshes, direct Session appends, subclass append helpers, `getMessages()` copy semantics, and host-injected messages. Update the Session docs and PlanetScale example README for async APIs, Postgres-backed search/storage wording, Durable Object persistence semantics, and mark the changeset as a minor bump because the async Session API is breaking for 0.x consumers. Co-authored-by: Cursor <cursoragent@cursor.com> * Render skill blocks and label keyed block kinds Treat empty skill blocks as renderable in ContextBlocks so the LLM can discover loadable skill collections (add !block.isSkill to the skip logic). Include a human-readable kind for keyed/writable blocks in the set_context description ("skill collection, keyed entries", "searchable, keyed entries", or "writable"). Add unit tests and small test providers (EmptySkillProvider, WritableSkillProvider, WritableSearchProvider) to verify empty skill block rendering and that tools().set_context lists keyed block kinds and metadata fields. * Normalize Postgres timestamps and patch cache Normalize created_at values returned from Postgres to ISO strings (handle Date objects and other types) in PostgresSessionProvider and add a unit test for this behavior. In Think, replace an upsert on session update events with a patch-only _patchCachedMessage implementation so updateMessage no longer inserts messages that are missing from the live cache; add a test helper and a test to ensure missing messages are not appended. These changes prevent Date objects from leaking into API fields and stop update events from creating unexpected cached entries. --------- Co-authored-by: Matt <matt@test.com> Co-authored-by: Sunil Pai <spai@cloudflare.com> Co-authored-by: Cursor <cursoragent@cursor.com>Matt · d151e6d6 · 2026-05-19
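Two of the behaviors in this commit lend themselves to a short sketch: rewriting `?` placeholders to `$1, $2, …`, and the `parentId` contract where `undefined` means auto-detect and explicit `null` means root. The function names `toNumberedPlaceholders` and `resolveParentId` are hypothetical and this is not the code in `providers/postgres-adapter.ts`; the quote handling is a simplifying assumption that covers single-quoted SQL literals only.

```typescript
// Hypothetical helper: rewrite driver-agnostic `?` placeholders into the
// numbered `$1, $2, ...` form that Postgres expects. Question marks inside
// single-quoted literals are left alone (comments and dollar-quoted strings
// are NOT handled in this sketch).
function toNumberedPlaceholders(sql: string): string {
  let index = 0;
  let inString = false;
  let out = "";
  for (const ch of sql) {
    // An escaped quote ('') toggles twice, which leaves the state unchanged.
    if (ch === "'") inString = !inString;
    if (ch === "?" && !inString) {
      index += 1;
      out += `$${index}`;
    } else {
      out += ch;
    }
  }
  return out;
}

// Hypothetical helper for the parentId contract described above:
//   undefined / omitted -> auto-detect (use the latest leaf)
//   explicit null       -> root message with no parent
// Using `parentId ?? latestLeaf` would wrongly treat null like undefined.
function resolveParentId(
  parentId: string | null | undefined,
  latestLeaf: string | null
): string | null {
  return parentId === undefined ? latestLeaf : parentId;
}
```

For example, `toNumberedPlaceholders("SELECT * FROM m WHERE a = ? AND b = ?")` yields `SELECT * FROM m WHERE a = $1 AND b = $2`, and `resolveParentId(null, "m3")` stays `null` instead of collapsing to the latest leaf.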
- 3.8ETVfix: stop oversized sessions from bricking the DO with SQLITE_NOMEM on wake (#1724) * fix: stop oversized sessions from bricking the DO with SQLITE_NOMEM on wake (#1710) Four coordinated changes across agents + @cloudflare/think: 1. AgentSessionProvider.getHistory() no longer carries message content through the recursive CTE and its ORDER BY sorter (2-3 transient copies of the whole transcript inside SQLite's allocator); content is fetched in bounded chunks via json_each. 2. Think.onStart degrades instead of throwing when a data-driven step fails (transcript hydration, declared-task reconcile, durable-work recovery) — a throw there is re-run on every wake, including alarm retries, permanently bricking the DO. 3. hydrationByteBudget (default 24MB): oversized transcripts hydrate as a bounded recent window instead of materializing fully in memory. 4. mediaEviction (default on): aged inline media (data-URL file parts, large strings in tool outputs) is evicted from stored messages in background passes, preserved as workspace files by default. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(think): document budget-exemption contract on _readMessagesFromStorage (#1710) * fix: close remaining full-history reads and harden memory bounds (#1710) Deep-review follow-up to the SQLITE_NOMEM fix. The original change bounded onStart hydration, but three helper paths still materialized the full transcript, undermining the budget; this closes them and hardens the new mechanisms. Session layer (agents): - Skill restore no longer bypasses budgeted hydration. The init-time loaded-skill scan is skipped entirely when no skill-capable context provider is configured, and when one is, it enumerates rows via getHistoryRowStats and fetches assistant messages ONE AT A TIME instead of reading the full history (full-read fallback for providers without row stats). A skill block added later via addContext() triggers the scan at that point. 
HistoryRowStat gains `role` to support the filter. - New Session.internal_rewriteMessage(): maintenance write path that skips the public updateMessage side effects (status broadcast + its FULL-history token estimate) while still notifying the message-change listener. Media eviction rewrites rows through it, so a pass no longer triggers a full-history read per rewritten row. - getRecentHistory(maxContentBytes, minRecentMessages?) gains a window floor: the most recent N rows are always included even when they exceed the byte budget (rows are write-capped, so the floor stays bounded). - Honest fallback: providers without getRecentHistory now report the real serialized size (not 0) and warn once that the budget is unenforced. - Content hydration chunks are bounded by cumulative stored bytes (4MB) as well as row count, removing the 50-near-cap-rows (~90MB) worst case. Think: - Budgeted hydration passes MODEL_RECENT_WINDOW (4, the truncateOlderMessages default) as the floor, so windowing can never shrink this.messages below the span the model replays at full fidelity. - Media eviction: keepRecentMessages is clamped to that same window (a misconfigured low value can never strip content the model still sees); a pass that stops at maxRowsPerPass with eligible rows remaining schedules the next pass itself so backlogs drain; providers without row-stats support warn once instead of silently no-opping. - chat:hydration:windowed emits on change rather than on every safe-boundary sync (a chronically oversized session syncs many times per turn and would spam identical events). - Public getOnStartDegradations() accessor; stale onStart/JSDoc comments updated to describe the budgeted behavior. 
Tests (~30 new): restore-scan gating and bounded-restore call counts via stub providers, addContext late-skill scan, honest fallback metadata, silent-rewrite no-broadcast contract, floor semantics and corrupt-leaf behavior at the provider, role in row stats, 6MB multi-chunk round-trip, pure-function coverage for media-eviction.ts (markers, shape preservation, depth limit, no-mutation), eviction clamp and automatic pass chaining, and observability event assertions for chat:onstart:degraded / chat:hydration:windowed / chat:media:evicted. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com> Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: Sunil Pai <spai@cloudflare.com> Co-authored-by: Cursor <cursoragent@cursor.com> whoiskatrin · c18a446d · 2026-06-12
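The byte-budgeted window with a floor, as described for `getRecentHistory(maxContentBytes, minRecentMessages?)`, can be sketched as a pure function over per-row sizes. This is a minimal sketch under assumptions: `RowSize` and `selectRecentWindow` are hypothetical names, rows are assumed to be ordered oldest to newest, and the real provider works against SQLite rather than an in-memory array.

```typescript
// Hypothetical row shape: an id plus the stored size of the row's content.
interface RowSize {
  id: string;
  bytes: number;
}

interface WindowResult {
  window: RowSize[]; // selected rows, still ordered oldest -> newest
  usedBytes: number; // total bytes of the selected rows
  windowed: boolean; // true when older rows were left out
}

// Walk from the newest row backwards, keeping rows while they fit the byte
// budget. The newest `minRecentMessages` rows are always kept, even if they
// exceed the budget (the "floor" described in the commit message).
function selectRecentWindow(
  rows: RowSize[],
  maxContentBytes: number,
  minRecentMessages = 0
): WindowResult {
  let usedBytes = 0;
  let kept = 0;
  for (const row of [...rows].reverse()) {
    const insideFloor = kept < minRecentMessages;
    if (!insideFloor && usedBytes + row.bytes > maxContentBytes) break;
    usedBytes += row.bytes;
    kept += 1;
  }
  return {
    window: rows.slice(rows.length - kept),
    usedBytes,
    windowed: kept < rows.length
  };
}
```

In the commit's terms, Think would pass its 24MB default budget and a floor of 4 (`MODEL_RECENT_WINDOW`); the sketch uses small numbers so the behavior is easy to check by hand.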
- 3.5ETVHarden Think and AIChat recovery foundations (#1611) * test: cover Think recovery failure surfaces Add characterization coverage for the production recovery gaps before changing runtime behavior. These tests capture the current failure surfaces so the follow-up fixes can prove that they improve behavior rather than only reshaping APIs. The Think recovery e2e harness now covers repeated restart churn around an interrupted turn, the post-persist/pre-turn request failure path, and parent agent-tool recovery interruption after restart. The test worker persists recovery observations across restarts and exposes test-only controls for forced turn failures and retained agent-tool rows. The focused regression tests also document two non-restart reliability issues: poisoned transcripts with orphan tool calls fail later turns, and createCompactFunction can skip summarization when tool-heavy histories are under-counted by the heuristic tail budget. This commit intentionally contains only tests and test harness changes. Runtime recovery behavior is left unchanged for the subsequent fix commits. Co-authored-by: Cursor <cursoragent@cursor.com> * test: cover AIChatAgent recovery failure surfaces Mirror the Think restart-churn characterization coverage in the AIChatAgent recovery e2e harness before changing shared recovery behavior. Both chat packages use the same underlying fiber recovery pattern, so the shared fix work needs coverage that proves AIChatAgent receives the same protection rather than only improving Think. The e2e worker now persists recovery observations in Durable Object storage so restart churn does not lose the evidence we need to assert on. The new e2e test starts a slow recovered chat turn, repeatedly kills and restarts wrangler around the interrupted fiber, and verifies recovery still fires and stale fiber rows are cleaned up. This commit is intentionally limited to characterization coverage. 
It excludes Think-only concerns such as Session compaction and durable submissions, which do not apply to AIChatAgent. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: add recovery observability channels Introduce dedicated observability event types and diagnostics channels for the recovery work that follows. Fiber recovery, chat recovery, transcript repair, and agent-tool reconciliation now have stable event names instead of being mixed into unrelated message or lifecycle streams. The channel routing tests lock in the public names for the new channels and ensure chat:transcript:* events land on the transcript channel while chat recovery events stay on the chat channel. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: bound and observe fiber recovery Make the generic fiber substrate safer before adding chat-specific retry policy. Recovered fibers now carry a recoveryReason, interrupted run rows emit fiber recovery lifecycle events, and internal framework recovery hooks are bounded by a default timeout so startup cannot wedge forever on a broken internal recovery path. Managed fibers whose recovery hook throws are now marked terminal error instead of staying indefinitely interrupted with only an error message attached. Unmanaged run rows continue to be pruned after recovery handling so the same broken stale row does not re-trigger forever across boots. This also exposes the agentTool observability channel under the camelCase subscribe key while preserving the diagnostics channel name agents:agent_tool. Co-authored-by: Cursor <cursoragent@cursor.com> * feat: add bounded chat recovery incidents Add shared chat recovery configuration and incident context so Think and AIChatAgent recovery are bounded by framework-owned attempt state. Recovery hooks now receive incidentId, attempt, maxAttempts, and recoveryKind, with attempt === 1 representing the first handling of an incident. 
Both chat runtimes persist recovery incidents in Durable Object storage, emit chat recovery lifecycle events, use configurable stable-state timeouts, and stop scheduling more recovery work once maxAttempts is exceeded. Exhaustion emits a terminal chat recovery event and sends a user-visible terminal chat error frame; Think also marks any matching running submission as error. The default behavior is enabled for existing chatRecovery=true users with maxAttempts 6 and stableTimeoutMs 10000, while custom policy can be supplied via chatRecovery object configuration. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: surface Think chat request failures Extend Think's existing onChatError hook with optional stage context and route post-persist request failures through it. The request path now emits chat:request:failed with request id, stage, persistence state, and sanitized error text before sending the terminal chat error frame. This gives applications a server-side hook for failures that occur after user messages have been accepted but before or during the model turn, without routing those chat failures through the generic Agent onError hook. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: bound agent-tool recovery scans Add a total recovery deadline for parent-side agent-tool reconciliation so a restart cannot spend unbounded time inspecting stale child rows. Per-child inspection remains bounded, and any rows reached after the total deadline are terminalized as interrupted with an explicit recovery-deadline error. The recovery loop now emits structured agent_tool recovery lifecycle events for begin, per-row outcome, deadline, completion, and unexpected failure, giving operators visibility into exactly which child run was finalized and why. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: repair poisoned chat history before model calls Repair incomplete tool-call transcript shapes before Think converts UI messages for provider calls. 
Persisted orphan tool calls are stripped from the active Session history, clients are rebroadcast with the repaired transcript, and a chat:transcript:repaired event records the repair counts and tool call ids. Also let createCompactFunction use an optional tokenCounter for tail-budget selection. Callers with tokenizer or model-reported accounting can avoid the heuristic under-count that otherwise protects too much tool-heavy history and skips summarization. Co-authored-by: Cursor <cursoragent@cursor.com> * chore: add recovery foundation changeset Record the public recovery API, observability, and behavior changes across the published packages so the release notes explain the new default recovery bounds and transcript/compaction fixes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: export chat recovery lifecycle types Expose the new chat recovery config and exhaustion context types from agents/chat so downstream packages can typecheck against the shared lifecycle surface. Update the Think test fixture to construct the enriched fiber recovery context. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: harden recovery edge cases and docs Stabilize chat recovery incident accounting across retry fibers, preserve failed internal recovery rows for later scans, and avoid stale continuation targets after orphan persistence. Tighten transcript repair behavior and document the new recovery and observability surfaces so users can configure and debug bounded recovery. Co-authored-by: Cursor <cursoragent@cursor.com> * docs: clarify recovery UX surfaces Document the distinction between stream resumption and durable chat recovery so users can configure and observe recovery behavior correctly. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: persist normalized transcript repairs Track length-preserving part replacements so repaired tool inputs are written back before provider calls. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix: remove dead transcript repair counter Align transcript repair observability with AI SDK v6 message parts by dropping the unused removedToolResults field. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: harden chat recovery incident bounds and fiber retry limits Follow-up hardening from a full review of the recovery foundation. These close edge cases where bounded recovery could still leak storage, exceed its attempt budget, or fail to surface a stuck turn. Chat recovery incidents (Think + AIChatAgent): - Drop `recoveryKind` from the incident identity so a single interrupted turn that flips between `retry` (no chunks persisted) and `continue` (partial chunks exist) across restarts shares one attempt budget. The kind is still tracked as a mutable field on the incident record. - Delete the incident record on `completed` (success) and add a TTL sweep (1h inactivity) on each new incident so durable storage no longer grows without bound. Exhausted/failed/skipped records are retained for inspection until they age out. - Guard a throwing `onExhausted` hook so the terminal error frame (and Think's running-submission interruption) is always delivered. - Wrap the post-begin recovery dispatch (onChatRecovery, orphan persist, scheduling) so any throw flips the incident to a terminal `failed` state and emits `chat:recovery:failed` instead of leaking in `attempting`. Generic fiber recovery (Agent): - Add `fiberRecoveryMaxAgeMs` (default 24h). A repeatedly-throwing unmanaged `onFiberRecovered()` row is still retried while fresh but is evicted with a `fiber:recovery:skipped` / `max_age_exceeded` event once it ages out, so a poison row cannot re-trigger forever across boots. - Note that the fiber recovery hook timeout bounds the wait but does not cancel the underlying internal operation. 
Observability: - Replace the hand-coded `agentTool` subscribe special-case with a `CHANNEL_DIAGNOSTIC_NAME_OVERRIDES` lookup to prevent future drift between camelCase keys and snake_case diagnostics channel names. Tests: - Think + AIChatAgent: shared attempt budget across retry/continue flip, incident deletion on completion, stale incident sweep, `failed` transition when onChatRecovery throws, and terminal UX delivery when onExhausted throws. - Agent fibers: a fresh throwing unmanaged row is retained (retryable), an aged throwing row is evicted with `max_age_exceeded`. - Update existing incident-id assertions to the kind-less format. All unit and e2e suites pass (Think, AIChatAgent, Agent fiber recovery, observability routing). Co-authored-by: Cursor <cursoragent@cursor.com> * docs: document fiber recovery max-age bound Clarify that an `onFiberRecovered()` hook which always throws is retried only until the row exceeds `fiberRecoveryMaxAgeMs` (default 24h), after which it is discarded with a `fiber:recovery:skipped` / `max_age_exceeded` event. Previously the bullet implied a thrown hook kept the row indefinitely. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>Sunil Pai · 02f93809 · 2026-05-29
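The bounded incident accounting described in this commit can be sketched with an in-memory tracker. This is a hedged illustration, not the Think or AIChatAgent implementation: `IncidentTracker`, the `turnId` key, and the method names are hypothetical, and storage is a `Map` instead of Durable Object storage. Taken from the commit message are the defaults (6 attempts, 1h inactivity TTL), the rule that `attempt === 1` is the first handling, the kind-less incident identity, deletion on completion, and the sweep on each new incident.

```typescript
type RecoveryKind = "retry" | "continue";

interface Incident {
  attempt: number;
  recoveryKind: RecoveryKind; // mutable field, NOT part of the identity
  lastActivityMs: number;
}

type BeginResult =
  | { status: "attempting"; attempt: number; maxAttempts: number }
  | { status: "exhausted"; maxAttempts: number };

class IncidentTracker {
  private incidents = new Map<string, Incident>();
  private maxAttempts: number;
  private ttlMs: number;

  constructor(maxAttempts = 6, ttlMs = 60 * 60 * 1000) {
    this.maxAttempts = maxAttempts;
    this.ttlMs = ttlMs;
  }

  // Called each time recovery handles the interrupted turn. The key is a
  // hypothetical turn id; the kind may flip between calls without resetting
  // the attempt budget.
  begin(turnId: string, kind: RecoveryKind, nowMs: number): BeginResult {
    const existing = this.incidents.get(turnId);
    if (!existing) this.sweep(nowMs); // TTL sweep on each NEW incident
    const attempt = (existing ? existing.attempt : 0) + 1;
    this.incidents.set(turnId, { attempt, recoveryKind: kind, lastActivityMs: nowMs });
    if (attempt > this.maxAttempts) {
      return { status: "exhausted", maxAttempts: this.maxAttempts };
    }
    return { status: "attempting", attempt, maxAttempts: this.maxAttempts };
  }

  // Successful completion deletes the record so storage does not grow.
  complete(turnId: string): void {
    this.incidents.delete(turnId);
  }

  size(): number {
    return this.incidents.size;
  }

  // Drop records that have been inactive for longer than the TTL.
  private sweep(nowMs: number): void {
    this.incidents.forEach((incident, id) => {
      if (nowMs - incident.lastActivityMs > this.ttlMs) this.incidents.delete(id);
    });
  }
}
```

Calling `begin` seven times for the same turn returns attempts 1 through 6 and then "exhausted", regardless of whether the kind alternates between `retry` and `continue`.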
- 3.4ETVAdd first-class Agent Skills (`agents/skills`) with Think integration (#1584) * Add first-class Think agent skills Introduce a first-class Agent Skills integration for Think so agents can declare skill sources with getSkills(), advertise a compact skill catalog in the session prompt, and let the model activate matching skills through dedicated tools instead of generic Session context loading. This adds the core SkillSource model, manifest/frontmatter helpers, a SkillRegistry, and Think wiring for activate_skill and read_skill_resource. Skill fingerprints are stored in Think config and used to refresh cached prompts when bundled or runtime skill catalogs change. Skill names now fail fast on duplicates across sources so ambiguous registrations do not silently pick the first provider. Add bundled local skills support to agents/vite via import attributes such as import skills from "./skills" with { type: "skills" }. The Vite plugin parses SKILL.md YAML frontmatter, bundles allowed Agent Skills resource directories, emits a deterministic manifest-backed SkillSource, and keeps resource collection limited to references/, scripts/, and assets/ to avoid leaking unrelated local files. Add an R2-backed Think skill source with skills.r2(bucket, options). It reads standard Agent Skills directory layouts from R2, parses SKILL.md files, exposes resource descriptors, lazily fetches individual resources, supports metadata or content fingerprinting, and refreshes mutable bucket indexes on an interval so prompt catalogs can update without restarting the Durable Object. Add an agent-skills example demonstrating bundled skills, including release notes, debug planning, brand voice, and pirate voice skills. Document the design decisions in design/skills.md and add focused tests for parsing, manifest sources, registry behavior, R2 directory discovery, resource reads, duplicate detection, and fingerprint refresh behavior. 
Also make the affected Think and AI Chat vitest worker suites retry consistently to reduce transient worker test flakes. Co-authored-by: Cursor <cursoragent@cursor.com> * Expand Think skills with script execution Add first-class skill script execution to Think so agent skills can bundle and run task-specific helper scripts alongside SKILL.md instructions. This introduces the run_skill_script tool, a SkillScriptRunner contract, and a workerScriptRunner implementation with explicit capability boundaries for workspace, tools, network, and timeouts. Support JavaScript and TypeScript scripts through the existing DynamicWorkerExecutor path, using @cloudflare/worker-bundler to compile TypeScript skill scripts before execution. Add Bash support through just-bash and Python support through Python Dynamic Workers with a host bridge for tool calls and workspace access. Tighten the runtime UX and safety model by validating script paths, restricting executable resources to scripts/ with supported extensions, defaulting omitted script input to {}, defaulting script timeout to 30 seconds, and granting read-only workspace access when a workspaceInstance is provided while keeping writes, network, and tools as explicit opt-ins. Update the agent-skills example to demonstrate bundled skills, the Vite skills import attribute, worker_loaders, TypeScript/Python/Bash release-note scripts, and the simplified default runner configuration. Refresh README and design documentation so the first-class Think skills API is the recommended path going forward. Add focused test coverage for skill registry script tools and worker script execution across TypeScript, Bash, and Python, including context/input handling, tool invocation, workspace read/write permissions, validation failures, and script error surfacing. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Update Think skills RFC status Mark the implemented skills MVP as complete and leave Git-backed sources plus R2 write/delete helpers as explicit follow-up work. Co-authored-by: Cursor <cursoragent@cursor.com> * Improve Think skill resource and script compatibility Broaden Agent Skills compatibility by making bundled and R2 resources binary-safe, adding encoding and MIME metadata to resource descriptors, and supporting qualified cross-skill resource reads. This lets skills expose non-text assets without corrupting content while giving model-visible tools clearer metadata and diagnostics. Tighten resource path handling across manifest, R2, and script execution so malformed paths cannot escape the intended skill resource namespace. R2 content fingerprints now hash binary resources through base64 content instead of lossy text decoding, preserving correctness when resources include images, fonts, PDFs, or other binary files. Make skill script execution friendlier to CLI-style skills without committing to a JavaScript filesystem shim yet. Python and Bash scripts receive /input.json, /context.json, and mounted /skill resources, while JavaScript and TypeScript keep top-level execution and function-style compatibility. TypeScript and JavaScript scripts can now import sibling script resources because worker-bundler receives all text script files and bundles multi-file script packages when needed. Update the agent-skills example, README, design note, and changeset to describe the new resource behavior, script defaults, and the deferred JavaScript filesystem compatibility design. Add regression coverage for binary resources, unsafe resource paths, qualified reads, sibling script imports, Python CLI-style scripts, and CPU-bound Python timeouts. 
Co-authored-by: Cursor <cursoragent@cursor.com> * Add skill script filesystem compatibility Add a stable worker-bundler virtualModules option so callers can provide generated modules for exact import specifiers without reaching for esbuild plugin internals. This lets framework integrations alias modules like node:fs, fs/promises, or other virtual runtime APIs while preserving the existing virtual filesystem resolver and transform-only warnings. Use that capability in Think skill scripts to provide a partial fs/node:fs/path compatibility layer for JavaScript and TypeScript skills. Skill-local files, input, and context can be read synchronously; workspace access remains async-only through fs.promises because it crosses the host Worker boundary; /output writes are returned as scratch artifacts instead of mutating durable workspace state; and /workspace writes require explicit read-write workspace permission. Keep the authoring model aligned with Agent Skills by continuing to bundle sibling script imports through worker-bundler, supporting both static fs imports and dynamic import("node:fs"), and preserving the function-style run(input, ctx) compatibility path while surfacing any output artifacts it writes. Document virtualModules in worker-bundler, update Think and skills design docs for the partial filesystem contract, and extend the agent-skills example with a bundled release-notes style guide read via node:fs. Add regression coverage for virtual module aliases, JS/TS fs reads and writes, output artifacts, workspace permission boundaries, dynamic imports, and Node-like workspace readdir/stat behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * Polish Think skills runtime and docs Tighten the skill script runtime before PR by making Bash resource handling and exit semantics match the path-based script contract, adding Python /output artifact collection, and improving dev-mode invalidation for bundled skill imports. 
This also expands Think skills documentation so the public docs explain skill sources, tool exposure, script execution requirements, and runtime dependencies instead of leaving the feature mostly in examples and package README text. Co-authored-by: Cursor <cursoragent@cursor.com> * Add default Think workspace bash tool Expose a sandboxed bash tool from Think's built-in workspace tools so agents can use shell-style workflows for multi-file operations without each app wiring its own executor. The tool mounts a bounded snapshot of the workspace into just-bash, runs with network disabled by default, and syncs created, updated, deleted, and empty-directory changes back to the durable workspace. Make the write-back path conservative: directory snapshots are paginated, oversized or unreadable files are reported as skipped and treated as protected paths, /tmp and system-like paths are ignored for new-file sync, and write/delete failures are returned as structured tool errors instead of being reported as successful changes. Preserve binary writes when the workspace supports writeFileBytes, fall back to text writes when safe, and return structured stdout, stderr, exitCode, changedFiles, skippedFiles, and errors for both successful and failed bash execution. Add workspaceBash as an opt-out/option property on Think and keep createWorkspaceTools typed with a concrete WorkspaceTools shape so callers can discover the optional bash tool cleanly. Document the default behavior, snapshot limits, tuning options, and opt-out path in the Think README and docs. Cover the new behavior with assistant tool tests for persisted file changes, non-zero exits, skipped-file protection, empty directory sync, paginated snapshots, and structured timeout/error output. 
Verified with: - npm run check - npx vitest --run -c packages/think/src/tests/vitest.config.ts packages/think/src/tests/assistant-tools.test.ts packages/think/src/tests/skill-runner.test.ts Co-authored-by: Cursor <cursoragent@cursor.com> * Harden Think skills and promote the engine to `agents/skills` This builds on the initial first-class Think skills work, hardening the runtime, simplifying the script API, and relocating the engine so it is framework-agnostic rather than Think-specific. Engine moved to `agents/skills` - Move the skills engine (types, frontmatter parser, `SkillRegistry`, the bundled-manifest and R2 sources, and the script runner) from `@cloudflare/think` into a new framework-agnostic `agents/skills` export. The runner depends only on a local structural `SkillWorkspace` interface, so the engine no longer couples to Think or `@cloudflare/shell`. - `@cloudflare/think` now re-exports the engine as `skills` (so `import { skills } from "@cloudflare/think"` is unchanged) and keeps only the integration wiring: `getSkills()`, `getSkillScriptRunner()`, the Session catalog context block, and the fingerprint refresh. - Any AI SDK caller (including `@cloudflare/ai-chat` in `onChatMessage`) can now build a `SkillRegistry` and merge `registry.tools()` + `registry.systemPrompt()` directly. - Move the skill/runner/r2 tests into `packages/agents/src/tests` and add the `LOADER` worker-loader binding to the agents test worker so script execution runs in the agents workers pool. Import API: `agents:skills` specifier - Replace the `import x from "./skills" with { type: "skills" }` import attribute (and its per-project `.ts` type shim) with an explicit `agents:skills` virtual specifier resolved by `agents/vite`. The path is optional and defaults to a `./skills` directory next to the importer; `agents:skills/<dir>` targets a sibling directory. 
- Ship ambient types from `agents` (`skills-module.d.ts`), referenced from the built `dist/index.d.ts`, so importing `agents` (directly or via `@cloudflare/think`) types the specifier with no per-project shim. Graceful skill loading (never throw in the turn path) - `SkillRegistry` skips duplicate skill names (first source wins) and sources that fail to list, recording diagnostics in `warnings` instead of throwing; warnings reset each load. `Think` wraps init/refresh and logs warnings deduped by message. The Vite plugin warns on duplicate bundled names at build time. Experimental, simpler script runner - Rename `skills.workerScriptRunner` to `skills.runner`, flag it `@experimental`, and log a one-time warning on first use. - JS/TS scripts are now function-style only: `export default run(input, ctx)` with `ctx = { skill, files, workspace, tools, output }`. Removed the hand-rolled `node:fs`/`path` compatibility shim; bundled text resources are exposed via `ctx.files` and scratch artifacts via `ctx.output.writeFile`. - Unify capabilities and permission enforcement behind a single `SkillScriptHostBridge`, constructed fresh per `run()` so `/output` artifacts never leak between concurrent runs. JS providers, Python RPC, and Bash commands all delegate to it. Python and Bash keep the path-based `/skill` / `/input.json` / `/output` contract. Bundled asset guardrails - The Vite plugin warns when a bundled skill asset (or the total) exceeds size thresholds and recommends `skills.r2()` for large assets. Example (`examples/agent-skills`) - Switch to `import bundledSkills from "agents:skills"` and `skills.runner`, delete the type shim. - Keep script execution TypeScript-only (function-style, reading the style guide from `ctx.files`); drop the Python and Bash demo scripts. - Replace the `brand-voice` persona skill with a procedure-style `test-plan` skill. - Render skill tool activity (`activate_skill` / `run_skill_script`) inline and light up activated skills in the sidebar. 
Docs and changesets - Update `docs/think/index.md`, `packages/think/README.md`, `design/skills.md`, and `packages/agents/AGENTS.md` for the new home, specifier, function-style `ctx` API, and ordering/first-source-wins semantics. - Split the changeset: skills bumps `@cloudflare/think` + `agents`; the `worker-bundler` `virtualModules` option gets its own changeset. Co-authored-by: Cursor <cursoragent@cursor.com> * Fix review nits: R2 fingerprint boundaries, Think deps, tool-merge docs - r2.ts: fold a part boundary into `stableHash` so different catalogs whose concatenated metadata/content streams would otherwise be identical (e.g. ["ab","cd"] vs ["abcd"]) now hash differently, preventing a missed catalog refresh. - think/package.json: `@cloudflare/codemode` and `@cloudflare/shell` were listed in both `dependencies` and `peerDependencies` (with codemode also flagged optional). They are required at runtime, so keep them as plain dependencies and drop the contradictory peer/optional entries. - docs/think/index.md: rewrite the dependency table — split provided peers (agents/ai/zod/telegram) from bundled deps (shell/codemode/just-bash), drop the stale `@cloudflare/worker-bundler` row, and note the skills engine lives in `agents/skills`. - docs/think/tools.md: correct the tool merge order to match code (extension tools before session tools) and add the missing skill-tools entry. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Sunil Pai · 87006e27 · 2026-05-29
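The R2 fingerprint fix described above (folding a part boundary into `stableHash`) can be illustrated with a minimal sketch. The names `naiveFrame`, `frameParts`, `fnv1a`, and `fingerprint` are hypothetical, and FNV-1a is only a stand-in hash; the repository's actual `stableHash` may work differently. The sketch only shows why length-prefixing each part keeps `["ab","cd"]` and `["abcd"]` from collapsing into the same input stream.

```typescript
// Hypothetical sketch: NOT the repository's `stableHash`.
// FNV-1a (32-bit) is used here only as a simple stand-in hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Naive framing: parts are concatenated, so part boundaries are lost.
function naiveFrame(parts: string[]): string {
  return parts.join("");
}

// Boundary-aware framing: every part is prefixed with its length,
// so the same characters split differently produce a different stream.
function frameParts(parts: string[]): string {
  return parts.map((part) => `${part.length}:${part}`).join("");
}

// Fingerprint of a catalog, computed over the boundary-aware stream.
function fingerprint(parts: string[]): string {
  return fnv1a(frameParts(parts));
}

// Both catalogs concatenate to "abcd", so naive framing cannot tell them apart.
console.log(naiveFrame(["ab", "cd"]) === naiveFrame(["abcd"]));
// The framed streams differ: "2:ab2:cd" versus "4:abcd".
console.log(frameParts(["ab", "cd"]), frameParts(["abcd"]));
```

Length-prefixing is one of several ways to encode a boundary; a reserved separator byte would serve the same purpose as long as it cannot appear inside a part.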
- 3.4 ETV feat(observability): replace console.log with diagnostics_channel, add typed subscribe helper and ai-chat events (#1024) * Use diagnostics_channel for agent observability Replace console.log-based observability with Node's diagnostics_channel API. Events are now published to named channels (agents:state, agents:rpc, agents:message, agents:schedule, agents:lifecycle, agents:workflow, agents:mcp) via a new channels map and a getChannel(type) helper that routes event types to the appropriate channel. The genericObservability.emit implementation now publishes events to diagnostics_channel instead of printing them, and the previous local-mode console-printing logic and getCurrentAgent usage were removed. A changeset was added documenting the change and noting that messages are forwarded to Tail Workers in production. * Use diagnostics_channel and add typed subscribe Replace console.log-based observability with Node's diagnostics_channel API and update docs/changeset. Events are now published to named channels (agents:state, agents:rpc, agents:message, agents:schedule, agents:lifecycle, agents:workflow, agents:mcp) instead of unconditionally logging to stdout. Add a ChannelEventMap type and a typed subscribe(channel, callback) helper that returns an unsubscribe function. Changes also document Tail Worker integration where published events are forwarded to production tailing. * Observability: diagnostics_channel & typed events Replace console.log observability with Node diagnostics_channel and stricter typed events. Breaking changes to agents/observability types: BaseEvent no longer includes id or displayMessage and payloads are now strict types; Observability.emit signature removed the optional ctx parameter (emit(event: ObservabilityEvent): void). Exported ObservabilityEvent type and refined per-channel unions (including new error event types such as rpc:error, schedule:error, queue:error).
Add Agent._emit helper to auto-generate timestamps and replace ~20 inline emit blocks (removed nanoid usage and per-call ids). Update MCP observability emissions to drop legacy fields. Docs updated to describe named channels, typed subscribe helper, Tail Worker integration, and event reference. Tests and example agents cleaned up to stop overriding observability. Bump agents package to minor. If you implement a custom Observability, update your emit signature and narrow on event.type before accessing payload fields. Sunil Pai · e9ae0701 · 2026-02-28
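The typed `subscribe(channel, callback)` helper described above can be sketched on top of Node's `diagnostics_channel` module. This is a minimal illustration, not the `agents/observability` implementation: the channel name `agents:rpc` and the `rpc:error` event type come from the commit message, while the payload fields and the `"rpc"` event type are assumptions made for the example.

```typescript
import {
  channel,
  subscribe as dcSubscribe,
  unsubscribe as dcUnsubscribe
} from "node:diagnostics_channel";

// Illustrative event shapes; the payload fields are assumptions.
type RpcEvent =
  | { type: "rpc"; timestamp: number; payload: { method: string } }
  | { type: "rpc:error"; timestamp: number; payload: { method: string; error: string } };

// Maps each channel name to the union of events published on it.
interface ChannelEventMap {
  "agents:rpc": RpcEvent;
}

// Typed wrapper: the callback receives the event union for that channel,
// and the returned function removes the subscription again.
function subscribe<K extends keyof ChannelEventMap>(
  name: K,
  callback: (event: ChannelEventMap[K]) => void
): () => void {
  const handler = (message: unknown): void => callback(message as ChannelEventMap[K]);
  dcSubscribe(name, handler);
  return () => {
    dcUnsubscribe(name, handler);
  };
}

const errors: string[] = [];
const stop = subscribe("agents:rpc", (event) => {
  // Narrow on event.type before touching payload fields.
  if (event.type === "rpc:error") {
    errors.push(`${event.payload.method}: ${event.payload.error}`);
  }
});

const rpcChannel = channel("agents:rpc");
rpcChannel.publish({
  type: "rpc:error",
  timestamp: Date.now(),
  payload: { method: "getState", error: "boom" }
});

stop(); // After unsubscribing, later publishes are not delivered to the callback.
rpcChannel.publish({
  type: "rpc:error",
  timestamp: Date.now(),
  payload: { method: "getState", error: "ignored" }
});
```

Because `publish` delivers to subscribers synchronously, the first event is recorded before `stop()` runs and the second is never seen.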
- 3.3 ETV feat: add Workspace class — persistent virtual filesystem for Agents (#1069) * feat: add workspace mixin + refactor mixin patterns to use typeof Agent * Add just-bash and refine fiber typings Add just-bash as a dependency in packages/agents and refine the fiber mixin typings. Introduce a FiberAgentClass generic constructor return type so consumers extending the mixin retain FiberMethods on `this`, broaden spawnFiber's methodName to string, export FiberMethods for external use, and add several internal tracking properties to the interface. Update a test to pass a string methodName accordingly. * Add Workspace class and tests Introduce a new Workspace class that provides durable hybrid file storage (SQLite inline + optional R2 for large files) and optional bash execution. Replaces the previous withWorkspace mixin pattern with a class-based API (usage: new Workspace(this, { namespace, r2, r2Prefix, inlineThreshold, bashLimits })). The Workspace includes namespace validation, per-host registration to avoid duplicate namespaces, a scoped SQL helper, lazy table initialization, and R2 key prefixing. Add TestWorkspaceAgent exposing workspace operations, a comprehensive vitest suite (workspace.test.ts) covering file I/O, dirs, rm, listing, path normalization and bash integration, and wire the agent into the test worker and wrangler config. Add just-bash as a dependency in package.json and include it in the test runtime config. Update changelog (.changeset) to document the new Workspace class. * Add Workspace class; rename idPrefix -> r2Prefix Introduce a new Workspace class providing durable file storage for Agents with a hybrid SQLite+R2 backend and optional just-bash execution (usage: new Workspace(this, { r2, r2Prefix })). Update package exports, tests, and workspace implementation to use the new API (idPrefix renamed to r2Prefix). Also update package metadata and regenerate package-lock.json to reflect dependency changes and added example entries.
* Add experimental Workspace and workspace-chat demo Introduce an experimental Workspace virtual filesystem (hybrid SQLite+R2) under agents/experimental/workspace with BashSession, persistent cwd/env sessions, streaming I/O, symlinks, change events, and diagnostics/observability hooks. Add design documentation (design/workspace.md) and register it in design indices. Add a workspace-chat example (frontend, server, configs, vite/wrangler, types) demonstrating AIChatAgent integration and tools (read/write/list/mkdir/bash/glob). Expose the workspace API under the package exports as ./experimental/workspace, add observability helpers and tests updates, and make a small example model comment tweak in the OpenAI SDK sample. * Persist user ID and use for agent name Add a STORAGE_KEY and getUserId() helper that retrieves a persisted UUID from localStorage (or generates one with crypto.randomUUID() and stores it). Fall back to "default" when window is undefined. Use the returned ID as the agent name when initializing the WorkspaceChatAgent so the client retains a stable identifier across sessions. * chore: changeset bump to patch * Delete fix-preserve-server-messages.md * Add turndown stub and Vite alias Introduce a minimal TurndownService stub (remove and turndown methods) at experimental/workspace-chat/src/turndown-stub.ts and update vite.config.ts to import node:path and alias 'turndown' to the stub. This avoids pulling in the real turndown package for the workspace-chat experiment during bundling. Sunil Pai · b5238de6 · 2026-03-05
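The hybrid storage idea above (small files inline in SQLite, large files in R2 under a key prefix) can be sketched with in-memory maps standing in for both backends. `HybridFileStore` is a hypothetical class written for this illustration and is not the `Workspace` API; only the option names `r2Prefix` and `inlineThreshold` are taken from the commit message, and the exact threshold comparison is an assumption.

```typescript
// Hypothetical stand-in for the hybrid SQLite + R2 layout; NOT the Workspace API.
type StorageTier = "inline" | "r2";

class HybridFileStore {
  private readonly r2Prefix: string;
  private readonly inlineThreshold: number;
  // path -> tier; stands in for file metadata rows.
  private readonly index = new Map<string, StorageTier>();
  // Stands in for file contents stored inline in SQLite.
  private readonly inline = new Map<string, Uint8Array>();
  // Stands in for an R2 bucket, keyed by prefixed object key.
  readonly r2 = new Map<string, Uint8Array>();

  constructor(options: { r2Prefix: string; inlineThreshold: number }) {
    this.r2Prefix = options.r2Prefix;
    this.inlineThreshold = options.inlineThreshold;
  }

  private r2Key(path: string): string {
    return `${this.r2Prefix}/${path}`;
  }

  writeFile(path: string, bytes: Uint8Array): StorageTier {
    // Assumption: files strictly larger than the threshold go to R2.
    const tier: StorageTier = bytes.byteLength > this.inlineThreshold ? "r2" : "inline";
    // Clear any copy a previous write left in either tier.
    this.inline.delete(path);
    this.r2.delete(this.r2Key(path));
    if (tier === "r2") {
      this.r2.set(this.r2Key(path), bytes);
    } else {
      this.inline.set(path, bytes);
    }
    this.index.set(path, tier);
    return tier;
  }

  readFile(path: string): Uint8Array | undefined {
    const tier = this.index.get(path);
    if (tier === undefined) return undefined;
    return tier === "r2" ? this.r2.get(this.r2Key(path)) : this.inline.get(path);
  }
}

const store = new HybridFileStore({ r2Prefix: "agent-1", inlineThreshold: 8 });
const smallTier = store.writeFile("notes/a.txt", new TextEncoder().encode("hi"));
const largeTier = store.writeFile("notes/big.bin", new Uint8Array(16));
console.log(smallTier, largeTier, [...store.r2.keys()]);
```

Deleting from both tiers on every write keeps a file that shrinks below the threshold from leaving an orphaned large object behind.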
- 3.3 ETV fix(think,ai-chat,agents): harden recovery, transcript integrity & compaction under deploy churn (#1623) * fix(think,ai-chat): stop recovery falsely erroring a turn under repeated mid-turn deploys Under repeated real `wrangler deploy`s mid-turn, chat recovery runs a chain of continuations. Three bugs combined to mark a turn's durable submission `error` even when it actually completed every step (validated end-to-end with the deploy-churn harness + a recovery trace): 1. Lost ownership: the submission link (`recoveredRequestId`) was derived from each continuation's own fresh requestId, so chained continuations dropped it and the continuation that finally completed the turn could not mark the submission `completed`. Now keyed off the stable recovery root and threaded through the whole chain. 2. Stale-continuation clobber: a superseded continuation tripped the `conversation_changed` guard because the leaf had advanced via recovery's own forward progress (a new assistant message), not a new user turn, and overwrote the still-running submission to `error`. Now a superseded continuation skips benignly; only a genuinely newer user turn marks the submission `skipped` (never `error`). 3. Premature stable_timeout: a timeout while waiting for the isolate to settle (common while a deploy is in flight) failed the turn terminally at attempt 1. Now it reschedules within the `maxAttempts` budget. `@cloudflare/ai-chat` shares the recovery machinery but has no durable-submission layer, so it receives only the stable_timeout reschedule fix (mirrored in both `_chatRecoveryContinue` and `_chatRecoveryRetry`). Tests: 7 deterministic unit tests (5 in think, 2 in ai-chat) covering chained ownership, benign superseded skip, newer-user-turn -> skipped, stable_timeout reschedule within budget, and exhaustion. think 441 / ai-chat 475 green.
Co-authored-by: Cursor <cursoragent@cursor.com> * test(deploy-churn): add tool-result rollback harness for real-deploy recovery testing Extends the deploy-churn example to drive a long, tool-using session via HTTP (no browser) against a REAL model (Workers AI or Anthropic) while firing real `wrangler deploy`s mid-turn, and measures whether completed tool calls re-run or the durable submission is wrongly errored. - `recordStep` tool: one ledger row per execution, so a re-run of a completed step shows up as a duplicate index (the "rollback" signal). - provider switch (workers-ai | anthropic) stored in a SQL config table so getModel()/getTools() observe it on a fresh post-deploy isolate. - `/drive/start|status|reset` HTTP routes driving `submitMessages` + ledger. - `scripts/deploy-rollback.ts` orchestrator: real deploys during the session, then a CLEAN / MINIMAL / ROLLBACK verdict plus submission status. This reproduced and validated both the #1621 tool-result durability fix and the recovery submission-status fix in the preceding commit. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think,ai-chat): reschedule stable-timeout recovery on a fresh schedule row The stable-timeout retry added in the previous commit used `schedule(..., { idempotent: true })` from INSIDE the currently-executing one-shot `_chatRecoveryContinue`/`_chatRecoveryRetry` schedule row. Because `alarm()` deletes that one-shot row only AFTER the callback returns, the idempotent reschedule deduped onto the still-present executing row and was then deleted with it — so the retry silently never fired and the turn STALLED (incident frozen at `stable_timeout_retry`, submission stuck `running`). A real-deploy repro with a 12s tool reproduced this: the turn stalled at 4/8. With the reschedule switched to a fresh (non-idempotent) delayed row it now completes 8/8 under the same churn. The unit tests previously passed only because they drive the callback directly (no executing row to dedup against). 
They now pre-insert a matching schedule row to simulate the executing one-shot, and assert the reschedule creates a NEW row. Co-authored-by: Cursor <cursoragent@cursor.com> * test(think,deploy-churn): rollback-depth + task-amplification repros under churn Adds rigorous reproductions used to characterize recovery behavior under deploy churn (all show the framework is BOUNDED — re-runs at most the in-flight step, no deep rollback, no task amplification): - think e2e `tool-rollback.test.ts` + `ThinkToolRollbackE2EAgent`: a long deterministic tool loop with a non-idempotent ledger, rapid SIGKILL/restart; measures rollback DEPTH (re-runs vs evictions). - think e2e `task-amplification.test.ts` + `ThinkTaskParentE2EAgent`: a parent `runTask` tool driving a child agent; verifies an eviction mid-task does NOT re-run the whole child turn. - deploy-churn: configurable per-tool delay (`--delay-ms` / `delayMs`) so a real ~33s `wrangler deploy` lands DURING a tool execution (code-update reset mid-tool). This repro surfaced the stable-timeout reschedule stall fixed in the previous commit. Co-authored-by: Cursor <cursoragent@cursor.com> * test(think): assert re-reconstructing an interrupted stream is idempotent Pins the property that protects against the "disappearing/duplicated completed tool calls" failure mode under churn: re-running recovery on the same interrupted stream (e.g. a second eviction during the persist window) replaces the reconstructed assistant message by its stable id (taken from the stream's `start` chunk) rather than appending a duplicate or losing it. Verified the content is preserved across two recovery passes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): preserve interrupted tool calls as errored instead of deleting them `_repairToolTranscriptParts` deleted any tool call with no recorded output before the next turn. 
For a tool interrupted mid-execution (deploy/eviction) or an `ask_user` answered by the next message, that: - removed the call from the durable + broadcast transcript (it visibly "disappeared" — the exact symptom in the customer's deploy-churn video), and - let the model silently re-run it, duplicating non-idempotent side effects. Now the orphan is flipped to `state: "output-error"` with an explanatory message: the record is preserved, the model is told the tool errored (so it doesn't blindly re-run it), and conversion still gets a valid tool-result so the provider doesn't 400 with AI_MissingToolResultsError. Stringified `input`s are normalized in the same pass. Blast radius was a single test (which asserted the old deletion); it now asserts preservation. Full think suite green (442). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): ignoreIncompleteToolCalls at convert as a last-line backstop After `_repairTranscriptForProvider` heals orphan tool calls (preserving them as errored results), pass `ignoreIncompleteToolCalls: true` to `convertToModelMessages` so any incomplete tool call that still slips through (compaction edges, addToolOutput races, unrecognized part shapes) is dropped at conversion instead of throwing AI_MissingToolResultsError and wedging the turn. No-op in the common path (the repair runs first); verified no test churn. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(agents): flow Session tokenCounter into compaction boundary logic (#1593) A tokenCounter set on Session.compactAfter() only drove the fire/no-fire trigger; createCompactFunction's tail-budget boundary still used the chars/4 heuristic. On tool-heavy histories that under-counts ~4-5x, so the tail budget covered the whole history and compaction fired every turn but returned null — never shortening history (worse than not configuring it). 
The Session now passes its counter to the compaction function via a new CompactContext argument; createCompactFunction uses it for the tail walk when no explicit CompactOptions.tokenCounter was given. One counter on compactAfter() now drives both "should we compact?" and "what should we compact?". If the trigger fires but compaction still returns null, the Session logs a one-time warning instead of looping silently. CompactFunction gains an optional second context?: CompactContext arg (backward compatible). Session suite green (74). Co-authored-by: Cursor <cursoragent@cursor.com> * compaction: re-arm no-op warning on success + document per-message counter caveat - Reset the one-time auto-compaction no-op warning when compaction succeeds, so a later regression is surfaced again instead of staying silent. - Document that the Session counter flowed into createCompactFunction is invoked per-message: usage-only counters degrade the tail budget to minTailMessages, and the counter runs O(n) per compaction. Recommend an explicit per-message CompactOptions.tokenCounter for precise budgeting. Co-authored-by: Cursor <cursoragent@cursor.com> * chore: format changeset files with oxfmt * fix(think): give getScheduledChatRecoveryPayloadForTest a serializable return type A `Record<string, unknown>` return collapses to `never` across the Durable Object RPC stub boundary (Workers RPC drops `unknown`-valued records as non-serializable), so the chained-continuation test saw `payload` as `never` and failed typecheck. Return the concrete recovery-link fields instead. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): default a missing tool input to {} during transcript repair Transcript repair already parsed stringified-JSON tool inputs, but a tool call with a missing or null `input` was left unrepaired — Anthropic rejects a `tool_use` block whose `input` is absent, so the turn 400s forever. 
A new `_normalizeToolInput` helper now also defaults a missing/null input to `{}` on both the orphan-healing and settled-part paths. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(agents): structured retryable failure envelope for agentTool() agentTool() collapsed every non-completed sub-agent run to an opaque { ok: false, error: string }, so a parent agent could not tell a transient interruption (child reset/superseded by a deploy or parent recovery) apart from a terminal failure or an intentional cancellation — and would often parrot the interruption text to the user as final. Failures now return AgentToolFailure { ok: false, status, error, retryable }: interrupted -> retryable: true (and surfaces the interruption reason), while aborted and error -> retryable: false. Backward compatible for consumers reading ok/error; AgentToolFailure is exported from `agents`. Co-authored-by: Cursor <cursoragent@cursor.com> * feat(think): opt-in inactivity watchdog for the streaming read loop A model stream that parks without ever throwing (no chunk, no error, no `done`) left the chat read loop waiting forever — an infinite spinner with no terminal state. There was no detection for a silently hung turn. Add `chatStreamStallTimeoutMs` (default 0 = off): if no UI-message-stream chunk arrives within the window, the watchdog aborts the turn so the loop exits with a terminal stream error (routed through onChatError stage "stream") and emits a new `chat:stream:stalled` observability event. Applies to both the WebSocket turn loop and the chat()/sub-agent callback loop. The watchdog aborts the turn's signal (not a reader cancel) so the AI SDK pipeline tears down without writing to an already-cancelled readable; the abandoned read's rejection is pre-caught to avoid an unhandled rejection. It measures inter-chunk inactivity (which includes tool execution), so it must be set above the slowest expected model TTFT and tool latency. 
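The inactivity watchdog described above can be approximated by wrapping an async iterable and racing each read against a timer. This is a minimal sketch under stated assumptions, not Think's implementation: `withStallWatchdog`, `nextOrStall`, `StreamStalledError`, and the `onStall` callback are hypothetical names, and a real caller would abort the turn's `AbortSignal` inside `onStall`.

```typescript
// Hypothetical sketch of an inter-chunk inactivity watchdog; NOT Think's code.
const STALLED = Symbol("stalled");

class StreamStalledError extends Error {
  constructor(timeoutMs: number) {
    super(`No stream chunk arrived within ${timeoutMs}ms`);
    this.name = "StreamStalledError";
  }
}

// Races one read against a timer; the timer is always cleared afterwards.
async function nextOrStall<T>(
  iterator: AsyncIterator<T>,
  timeoutMs: number
): Promise<IteratorResult<T> | typeof STALLED> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof STALLED>((resolve) => {
    timer = setTimeout(() => resolve(STALLED), timeoutMs);
  });
  const pending = iterator.next();
  // Pre-catch so an abandoned read cannot surface as an unhandled rejection.
  pending.catch(() => {});
  try {
    return await Promise.race([pending, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function* withStallWatchdog<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
  onStall: () => void
): AsyncGenerator<T> {
  if (timeoutMs <= 0) {
    // 0 means the watchdog is off.
    yield* source;
    return;
  }
  const iterator = source[Symbol.asyncIterator]();
  let stalled = false;
  let finished = false;
  try {
    while (true) {
      const result = await nextOrStall(iterator, timeoutMs);
      if (result === STALLED) {
        stalled = true;
        onStall(); // e.g. abort the turn's signal here
        throw new StreamStalledError(timeoutMs);
      }
      if (result.done) {
        finished = true;
        return;
      }
      yield result.value;
    }
  } finally {
    // On early consumer exit, cancel the source. Skipped after a stall,
    // because the upstream is expected to be aborted by onStall instead.
    if (!stalled && !finished) await iterator.return?.();
  }
}
```

The timer is re-armed for every chunk, so a slow but steady stream never trips the watchdog; only a gap longer than `timeoutMs` between two chunks does.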
Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(think): address PR #1623 review — dedup reschedule, retry-path test, sharper diagnostics - Extract the stable-timeout reschedule into a shared `_rescheduleRecoveryAfterStableTimeout` helper (mirrors @cloudflare/ai-chat), removing the two inlined near-identical copies in `_chatRecoveryRetry` / `_chatRecoveryContinue` so a future fix can't diverge between them. - Add a `_chatRecoveryRetry` stable-timeout reschedule test (the path previously only covered for `_chatRecoveryContinue`). - Surface when an incomplete tool call survives transcript repair and is about to be dropped by `ignoreIncompleteToolCalls` (warns + emits), so the backstop can't silently mask a repair gap. - Make the compaction no-op warning distinguish a per-message vs whole-prompt (usage) tokenCounter, since "configure a tokenCounter" was misleading when one was already configured. - Document the single-field `_activeChatRecoveryRootRequestId` serialization invariant (safe only because turns are serialized by the turn queue). Co-authored-by: Cursor <cursoragent@cursor.com> * docs: align with PR #1623 behavior changes - chat-agents: transcript repair now heals orphaned tool calls (preserved as errored results) and normalizes malformed/missing inputs — was "removing". - observability: add the new chat:stream:stalled event (agents:chat channel) and clarify chat:transcript:repaired counts (preserved-as-errored + backstop). - think README: document the opt-in chatStreamStallTimeoutMs inactivity watchdog. - agent-tools: document the AgentToolFailure shape and retryable semantics. - sessions: note the compactAfter tokenCounter now also drives the boundary walk (CompactContext), with the per-message/usage-counter caveat. 
Co-authored-by: Cursor <cursoragent@cursor.com> * docs(think): add chatStreamStallTimeoutMs to the docs-site config reference Mirrors the package README so the developers.cloudflare.com Think config table (the target of the observability chat:stream:stalled link) documents the new inactivity watchdog. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): cancel the source stream when the stall-watchdog wrapper exits early The inactivity-watchdog generator wraps the model stream, but on an early consumer exit (a `break` on an in-band stream error, where the abort signal is NOT set) it never forwarded `.return()` to the source — leaking the wrapped ReadableStream that the old direct `for await` would have cancelled. Add a top-level finally that cancels the source on early termination, skipped after a watchdog stall (which already aborted the upstream, where a late cancel would make the AI SDK write to an already-cancelled readable). Tests: watchdog does not false-fire on a slow-but-steady stream (timer resets per chunk), and an in-band error under an armed watchdog terminates cleanly with no unhandled rejection. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): key submission abandon-paths off the recovery root, not the continuation id _markRecoveredSubmissionInterrupted was called with the per-continuation `requestId` in both terminal abandon paths — recovery exhaustion (`_exhaustChatRecovery`) and `{ continue: false }`. Under chained continuations (recoveryRootRequestId !== requestId) the durable submission row still carries the root id, so the `WHERE request_id = ?` lookup missed it and left the submission stuck `running` forever instead of `error`. Thread the recovery root through both paths (storing it on the incident record for the exhaustion path). Regression test drives a disabled-recovery chained continuation and asserts the root submission flips to `error` (verified to fail without the fix). 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): treat output-error as settled in transcript repair _repairToolTranscriptParts' hasOutput check omitted `state === "output-error"`, so a tool part already healed to output-error (no `output` field) — or a tool that legitimately errored — re-entered the heal branch on every turn. That clobbered a real errorText with the generic "interrupted" message and emitted a spurious chat:transcript:repaired event + updateMessage write + broadcast each turn for the life of the conversation. Treat output-error as a settled terminal state (matching _incompleteToolCallIds). Regression test asserts a real errorText survives a follow-up turn (verified to fail without the fix). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(think): treat output-denied as a settled tool state; centralize the enumeration Sweep for the recurring "incomplete terminal-state enumeration" bug class found one more instance: `output-denied` (a user-denied tool approval) is a settled state the AI SDK converts into a denial tool-result, but transcript repair, the backstop detector, and the immediate-flush check all omitted it. Repair therefore flipped a denial into a generic "interrupted" error (losing the denial), and the denied result wasn't durably flushed. - Centralize the terminal-state check into `_toolPartHasSettledResult` (output-available | output-error | output-denied, plus legacy output/result), shared by `_repairToolTranscriptParts` and `_incompleteToolCallIds` so the two can no longer drift. - Flush `tool-output-denied` chunks immediately, like other settled results. - Regression test: an output-denied part survives a follow-up turn (verified to fail without the fix). Reviewed and confirmed complete (no change needed): _isTerminalSubmissionStatus, streamIsTerminal, _messageHasPendingInteraction, shouldMarkSkippedAfterGenerationChange. 
Co-authored-by: Cursor <cursoragent@cursor.com> * fix(agents): reconcile protects errored/denied tool results from stale-client clobber AIChatAgent sweep for the terminal-state-enumeration bug class: ai-chat's own tool-state handling already covers output-denied (it's the HITL home), but the shared reconciler did not. reconcileMessages (run at persist by both Think and AIChatAgent) only carried over the server's `output-available` result into a stale client part — so a client that persisted a stale `input-available` for a tool the server had already resolved to `output-error`/`output-denied` clobbered the resolved result, losing the error or the user's denial. Index all three terminal states and overlay the matching result field (output / errorText / approval). Tests assert a server output-error and output-denied survive a stale client input-available (verified to fail before). Co-authored-by: Cursor <cursoragent@cursor.com> * refactor: address PR #1623 deep-review-2 hardening items - think: remove the now-dead message-deletion branch in _repairTranscriptForProvider (repair preserves every message, never deletes). - think: _normalizeToolInput now also parses a stringified-ARRAY input (`[...]`), not just objects. + test. - agents(reconciler): make the server-state overlay state-driven so only the field matching the terminal state is carried (a stray `output` on an output-error part can't ride along). + test. - ai-chat: document why _chatRecoveryContinue's conversation_changed skip does NOT split assistant-leaf vs user-leaf like Think (no submission layer to protect) — guards against a future regression if submissions are added. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: ask-bonk[bot] <ask-bonk[bot]@users.noreply.github.com> Sunil Pai · 4c8b3712 · 2026-05-31
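The transcript-repair behavior described in this commit (preserve orphaned tool calls as errored, treat `output-error` and `output-denied` as settled, default or parse malformed inputs) can be sketched as follows. The `ToolPart` type is a simplified assumption rather than the AI SDK's full part shape, the function names are hypothetical counterparts of `_toolPartHasSettledResult`, `_normalizeToolInput`, and `_repairToolTranscriptParts`, and the legacy `output`/`result` handling mentioned in the commit is omitted.

```typescript
// Simplified, assumed part shape; the real UI message parts carry more fields.
type ToolPart = {
  type: string;
  toolCallId: string;
  state: "input-streaming" | "input-available" | "output-available" | "output-error" | "output-denied";
  input?: unknown;
  output?: unknown;
  errorText?: string;
};

// One shared check for every settled (terminal) tool state.
function hasSettledResult(part: ToolPart): boolean {
  return (
    part.state === "output-available" ||
    part.state === "output-error" ||
    part.state === "output-denied"
  );
}

// Default a missing input to {} and parse stringified JSON objects or arrays.
function normalizeToolInput(input: unknown): unknown {
  if (input === undefined || input === null) return {};
  if (typeof input === "string") {
    const trimmed = input.trim();
    if (trimmed.startsWith("{") || trimmed.startsWith("[")) {
      try {
        return JSON.parse(trimmed);
      } catch {
        return input; // Leave unparseable strings untouched.
      }
    }
  }
  return input;
}

// Orphaned calls are flipped to output-error instead of being deleted.
// `repaired` counts only the orphans that were healed in this pass.
function repairToolParts(parts: ToolPart[]): { parts: ToolPart[]; repaired: number } {
  let repaired = 0;
  const next = parts.map((part): ToolPart => {
    const input = normalizeToolInput(part.input);
    if (hasSettledResult(part)) return { ...part, input };
    repaired++;
    return {
      ...part,
      input,
      state: "output-error",
      errorText: "Tool call was interrupted before it produced a result."
    };
  });
  return { parts: next, repaired };
}
```

Because a healed part is itself in a settled state, running the repair a second time leaves it alone, which is the property the `output-error` fix above restores.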
- 3.2ETVWIP - Add Telnyx voice provider (#1461) * Add Telnyx voice provider * Add empty changeset for Telnyx provider * Address Telnyx provider review feedback * Fix Telnyx example typecheck * Update voice-providers/telnyx/src/transport/phone-transport.ts Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com> * Refocus Telnyx example on phone bridge * dependency version pinning * port telnyx tests * Bump deps, add aria-label, adjust tests Update dependency versions in examples/telnyx-voice-agent/package.json (bumping @cloudflare/kumo, ai, react, and several devDependencies) to newer releases for compatibility. Add an aria-label ("Text message") to the text input in the Telnyx voice agent UI to improve accessibility. Harden durable-chat recovery tests by checking scheduled callback counts before running retry/continue tasks and waiting for the agent to become idle, reducing flakiness in CI. * Harden Telnyx voice provider lifecycle and exports Keep the Telnyx package root server-safe by moving browser WebRTC helpers to the /browser entrypoint, and update the example/docs to use the clearer root-plus-browser import model. This avoids pulling @telnyx/webrtc into Worker bundles while keeping STT, TTS, and JWT helpers available from server-safe paths. Tighten the phone bridge and credential lifecycle so reconnects preserve the preferred audio format, stale async bridge setup is cancelled without leaking browser audio resources, in-flight starts settle on stop, late STT WebSockets are closed, and failed JWT token creation cleans up orphaned Telnyx credentials. The example now warns about local-only unauthenticated token minting, handles overlapping connect/disconnect attempts, and documents the live browser bridge requirement. Co-authored-by: Cursor <cursoragent@cursor.com> * Notify on websocket close error Ensure the TTS stream is unblocked if websocket.close() throws by calling notify() in the close handler catch. 
Add a test that simulates close() throwing (via MockWebSocket) and verifies aborting the request immediately resolves the synthesize promise to null.
  Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  Co-authored-by: Christopher Little-Savage <clittle-savage@cloudflare.com>
  Co-authored-by: Sunil Pai <spai@cloudflare.com>
  Co-authored-by: Cursor <cursoragent@cursor.com>
  whoiskatrin · d44f59ad · 2026-05-22
- 3.2ETVfeat: worker-bundler package — runtime app bundler with asset handling (#1079) * Add worker-bundler package and playground Introduce a new packages/worker-bundler package (bundler, config, resolver, installer, transformer, types, utils, build script and tests) and an experimental Worker Bundler Playground app under experimental/worker-bundler-playground. The playground includes a React/Vite UI, a Durable Object AI agent (WorkerPlayground) that uses @cloudflare/ai-chat and @cloudflare/worker-bundler to generate, bundle and load Workers at runtime, plus tooling (vite.config, wrangler.jsonc, tsconfig, styles, README). The agent exposes callable tools to generate and test Workers and persists source files to a Workspace. This change wires up the new package in the repo and adds example/demo app to exercise the bundler end-to-end. * Add createApp app bundler and playground support Introduce full-stack app bundling and preview support: add createApp API and app bundler (packages/worker-bundler/src/app.ts) to build server Workers + client bundles + static assets, asset manifest/storage handling, and optional Durable Object wrapper. Update build script to prebundle asset runtime code and ignore generated file. Extend playground UI and agent to use "app" terminology, support assets, preview iframe proxy, tabs (preview/source/test), and new callable tools (generateApp/testApp). Update wrangler config, README with createApp/docs and examples, and add tests and mime/asset-handler sources. These changes enable generating, bundling, persisting, and previewing full-stack apps in the playground. * Mark worker-bundler experimental and adjust builds Add an experimental warning utility and surface it in the worker-bundler API, plus standardize build configs across packages. 
Changes include: add src/experimental.ts with showExperimentalWarning(), call it from createApp and createWorker, and add an experimental note to the worker-bundler README; add a ts-expect-error in the playground server for the experimental API; and update build scripts for agents, ai-chat, codemode, and hono-agents to move skipNodeModulesBundle/external into a deps object using neverBundle. These updates warn users about the package's unstable API and unify dependency bundling behavior.
  Sunil Pai · 91d3ebab · 2026-03-07
- 3.1ETV Enable alarm-backed APIs in sub-agents (#1418)
  * Add sub-agent alarm recovery support (Made-with: Cursor)
  * Tighten facet cleanup bookkeeping (Made-with: Cursor)
  * Hide internal schedule storage fields (Made-with: Cursor)
  * Stabilize destroy cleanup schema test (Made-with: Cursor)
  Sunil Pai · 8de0ce39 · 2026-04-30
- 2.9ETVfeat: readonly connections — restrict WebSocket clients from modifying agent state (#610) * Add readonly connection support to agents Introduces readonly connections to restrict certain WebSocket clients from modifying agent state while allowing state updates and RPC calls. Adds server-side methods for managing readonly status, persists status in SQL for hibernation, and client-side error handling via onStateUpdateError. Updates documentation and relevant types, client, and React hook implementations. * Add tests and agent for readonly connections Introduces a new TestReadonlyAgent Durable Object and comprehensive tests for readonly connection behavior, including state update restrictions, RPC permissions, persistence, and cleanup. Updates wrangler config to register the new agent for testing. * add a changeset * Refactor readonly-connections tests for type safety Improved type safety in readonly-connections.test.ts by introducing explicit message interfaces and type guards, replacing 'any' with specific types in test helpers and assertions. Also updated TestReadonlyAgent to ignore unused connection parameter in shouldConnectionBeReadonly. These changes enhance code reliability and maintainability in the test suite. * Update linter comments and fix test import Replace biome-ignore comments in packages/agents/src/index.ts with oxlint-disable-next-line typescript-eslint(no-explicit-any) to satisfy the linter while allowing variadic args to be passed through. Also update the test import in packages/agents/src/tests/readonly-connections.test.ts from "../ai-types" to "../types" to match the correct module path. * Add readonly connection support Introduce readonly connections to prevent certain WebSocket clients from modifying agent state while still allowing them to receive updates and call non-mutating RPCs. 
Adds server APIs (shouldConnectionBeReadonly, setConnectionReadonly, isConnectionReadonly), client onStateUpdateError handling, and enforces restrictions in both the client message handler and Agent.setState(). Internally stores the flag in a namespaced connection attachment (_cf_readonly) and wraps connection.state/setState to preserve the flag across hibernation. Includes design doc, user docs, tests, and a playground demo (ReadonlyDemo + readonly-agent). Also updates example manifests and bumps minor dependencies (hono) where applicable. * Add test agents and rework test worker exports Introduce a suite of test agent classes under packages/agents/src/tests/agents (callable, email, mcp, oauth, race, readonly, schedule, state, workflow) to cover RPC, streaming, MCP tooling, OAuth flows, state management, schedules, workflows and concurrency/read-only behaviors. Add an agents index that re-exports these test agents and update packages/agents/src/tests/worker.ts to re-export the agents from ./agents, simplify imports, and expose an Env type via import-types to avoid runtime circulars. These changes centralize test helpers and streamline test worker wiring. * Deprecate onStateUpdate in favor of onStatePersisted Introduce onStatePersisted as the new server-side state notification hook and deprecate onStateUpdate. Add internal dispatch that calls onStatePersisted (or the deprecated onStateUpdate), emits a one-time console warning per class when the old hook is used, and throws if a class overrides both hooks. Ensure validateStateChange rejections propagate a CF_AGENT_STATE_ERROR message back to the client. Update docs, examples, tests, and test harness (wrangler) to cover the new hook and error behavior. Add changeset documenting the patch. * Rename onStatePersisted to onStateChanged Rename the server-side persistence hook from `onStatePersisted` to `onStateChanged` across the codebase (docs, READMEs, examples, tests, playground, and package implementation). 
Update Agent internals to detect and call `onStateChanged`, adjust deprecation/error messages (one-time warning for `onStateUpdate`, and error if both hooks are overridden), and update tests/assertions to match the new name. Also update the changeset metadata to deprecate `onStateUpdate` in favor of `onStateChanged`. (Note: validateStateChange rejection behavior that propagates a `CF_AGENT_STATE_ERROR` message to clients is preserved as documented in the changeset.)
  Sunil Pai · f59f3053 · 2026-02-08
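The hook dispatch described in this entry (call `onStateChanged`, fall back to the deprecated `onStateUpdate` with a one-time warning per class, and throw if both are overridden) might look roughly like the sketch below. `BaseAgent`, `setState`, and `dispatchStateChange` here are hypothetical stand-ins, not the actual `Agent` internals.

```typescript
// Hypothetical base class for illustration; not the real Agent implementation.
class BaseAgent<State> {
  // Classes that have already received the deprecation warning.
  private static warnedClasses = new Set<unknown>();

  onStateChanged(_state: State, _source: string): void {}

  /** @deprecated use onStateChanged */
  onStateUpdate(_state: State, _source: string): void {}

  setState(state: State, source = "server"): void {
    // Persistence would happen here in a real agent; this sketch only dispatches.
    this.dispatchStateChange(state, source);
  }

  private dispatchStateChange(state: State, source: string): void {
    const proto = BaseAgent.prototype;
    // A hook counts as overridden when it no longer resolves to the base method.
    const overridesNew = this.onStateChanged !== proto.onStateChanged;
    const overridesOld = this.onStateUpdate !== proto.onStateUpdate;
    if (overridesNew && overridesOld) {
      throw new Error("Override onStateChanged or onStateUpdate, not both");
    }
    if (overridesOld) {
      const ctor = this.constructor;
      if (!BaseAgent.warnedClasses.has(ctor)) {
        BaseAgent.warnedClasses.add(ctor);
        console.warn(`${ctor.name}: onStateUpdate is deprecated, use onStateChanged`);
      }
      this.onStateUpdate(state, source);
      return;
    }
    this.onStateChanged(state, source);
  }
}
```

Comparing against the base prototype is one simple way to detect an override; it assumes hooks are declared as methods rather than as class-field arrow functions.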
- 2.8ETV fix(ai-chat): wrap onRequest in constructor to guarantee /get-messages works (#953)
  * fix(ai-chat): wrap onRequest in constructor to guarantee /get-messages works. Move /get-messages handling from the prototype override to a constructor wrapper. This ensures the endpoint works even if users override onRequest without calling super.onRequest(). Not a breaking change - fixes broken code without affecting working code.
  * changeset
  * test: add test for onRequest override calling super
  * fix: remove unused import
  * Support get-messages route; add test agent. Detect the /get-messages endpoint by comparing the last path segment (url.pathname.split('/').pop()) instead of using endsWith. Add a new AgentWithoutSuperCall Durable Object and register it in Env and the wrangler test config. Include tests to ensure /get-messages works when onRequest is overridden without calling super, and that non-get-messages routes are still delegated to the user's onRequest override.
  Co-authored-by: Sunil Pai <spai@cloudflare.com>
  Matt · bd22d600 · 2026-02-23
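The constructor-wrapper technique from this entry can be illustrated with a small sketch. `ChatBase`, `SimpleRequest`, `SimpleResponse`, and `lastPathSegment` are hypothetical stand-ins (the real `AIChatAgent` is a Durable Object handling `Request`/`Response`); only the last-path-segment comparison is taken from the commit text.

```typescript
// Stand-in request/response types so the sketch runs outside a Workers runtime.
type SimpleRequest = { url: string };
type SimpleResponse = { status: number; body: string };

// Compare the last path segment, as the commit describes.
function lastPathSegment(url: string): string | undefined {
  return new URL(url).pathname.split("/").pop();
}

// Hypothetical base class for illustration.
class ChatBase {
  messages: string[] = [];

  constructor() {
    // `this.onRequest` already resolves to a subclass override here, because the
    // prototype chain is in place before the constructor body runs.
    const userOnRequest = this.onRequest.bind(this);
    // Install a per-instance wrapper that handles /get-messages first.
    this.onRequest = async (request: SimpleRequest): Promise<SimpleResponse> => {
      if (lastPathSegment(request.url) === "get-messages") {
        return { status: 200, body: JSON.stringify(this.messages) };
      }
      return userOnRequest(request);
    };
  }

  async onRequest(_request: SimpleRequest): Promise<SimpleResponse> {
    return { status: 404, body: "Not found" };
  }
}
```

A subclass that overrides `onRequest` as a method and never calls `super.onRequest()` still gets the `/get-messages` route, since the wrapper lives on the instance. One caveat of this sketch: an override written as a class-field arrow function would be assigned after the base constructor runs and would replace the wrapper.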
- 2.5ETV feat: experimental sub-agent API and examples (#1060)
  * Add experimental sub-agent API and examples. Introduce an experimental sub-agent system and integrate it into example gadgets. Adds a new RFC (design/rfc-sub-agents.md) describing sub-agents, plus an implementation (packages/agents/src/experimental/sub-agent.ts) with a withSubAgents mixin and typed RPC stubs. Refactors Agents package (index/build/scripts/tests) and adds tests for sub-agents. Update example apps to use sub-agents: experimental/gadgets-chat now uses SubAgent facets, a StreamRelay RPC target, and a client-side AgentChatTransport that supports streaming, cancel, and resume; other gadgets and servers updated to the new API. Misc: docs/README and design listings updated and various package metadata/build changes to wire everything together.
  * docs: migrate gadgets READMEs to sub-agents. Replace experimental Durable Object "facet" docs with the new sub-agent pattern across four READMEs. Updated terminology, diagrams and TypeScript examples to use SubAgent / withSubAgents, added Key Pattern snippets, streaming protocol details (chat), and Related links. Files changed: experimental/gadgets-chat/README.md, experimental/gadgets-gatekeeper/README.md, experimental/gadgets-sandbox/README.md, experimental/gadgets-subagents/README.md. Primarily documentation updates to reflect API/architecture changes (no functional code changes).
  Sunil Pai · 054e65d1 · 2026-03-04
- 2.5ETV fix(ai-chat): preserve assistant messages across chained continuations (#1162)
  * Bump core deps: agents, hono, AI providers. Upgrade various dependencies across the repo: bump @openai/agents and @openai/agents-extensions to ^0.8.0, workers-ai-provider to ^3.1.7, hono to ^4.12.9, and ai-gateway-provider to ^3.1.2. Update package-lock.json to reflect these new versions (including updated optional provider sub-dependencies). Also apply minor formatting/whitespace cleanup in ai-chat and codemode CHANGELOGs and update example and site package.json files to use the new versions.
  * fix(ai-chat): preserve assistant messages across chained continuations
  * Clear _pendingAutoContinuation when continuation response has no body
  * fix(ai-chat): reuse resume handlers for tool continuations
  * Update packages/ai-chat/src/index.ts (Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>)
  * fix(ai-chat): close stream controller when resume request send fails
  Co-authored-by: Sunil Pai <spai@cloudflare.com>
  Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
  whoiskatrin · 7053b495 · 2026-03-24
- 2.4ETVfix(chat): orphaned-stream recovery no longer merges a new turn into the previous message (#1691) (#1693) * fix(chat): orphaned-stream recovery no longer merges a new turn into the previous message (#1691) When an AIChatAgent stream is interrupted before its assistant message is persisted (Durable Object hibernation, deploy churn, isolate restart, reconnect), orphan recovery reconstructs the message from stored chunks. If the chunks carry no provider `start.messageId` — the common case with `streamText(...).toUIMessageStreamResponse()`, where the id is assigned client-side — recovery used to fall back to the LAST assistant message in history. That is correct for a continuation, but wrong for a normal new turn after a later user message: the recovered chunks were appended onto the PREVIOUS assistant message, corrupting both the persisted transcript and future model context. Core fix - ResumableStream now persists the allocated assistant message id in stream metadata (`message_id` column, added via a one-time, schema-checked migration) and exposes `getStreamMessageId()`. - `_persistOrphanedStream` keys recovery on that stored id when the chunks carry no provider `start.messageId`, so a new turn becomes its own message and a continuation still merges into the message it was extending (it stored the cloned last-assistant id). A provider `start.messageId` still wins when present. Pre-migration rows keep the legacy last-assistant fallback. - Dropped the now-unused `is_continuation` metadata column. Two related variants of the same corruption on the durable (chatRecovery) continuation path, found during review and fixed here: - Early-persist + recovery (e.g. a tool-approval pause) re-appended chunks it had already stored, duplicating a tool call's parts. Recovery now skips reconstructed parts whose `toolCallId` already exists on the message. 
- A new turn interrupted before any assistant part was persisted — cut off before the first chunk materialized, or discarded via `onChatRecovery` returning `{ persist: false }` — was "continued" by cloning the previous assistant message and merging into it. `_handleInternalFiberRecovery` now detects that the conversation leaf is still the unanswered user message (no partial to continue) and re-runs the turn fresh, so it becomes its own message. @cloudflare/think is unaffected — its session-tree recovery already allocates a distinct message id per orphan and never falls back to the last assistant message. Tests - New regression + wiring tests in durable-chat-recovery, resumable-streaming, and the test worker, including the fiber-continuation happy path and the two edge cases (empty partial, persist:false) that previously merged. Verification - Verified live against real LLMs (Workers AI, OpenAI, Anthropic) and Think via a SIGKILL-mid-stream / restart harness (wip/issue-1691-live): the recovered turn always lands as its own message and the previous turn is untouched. - Cross-model continuation with large partials is clean (no duplication, no restarts); OpenAI and Anthropic resume a truncated partial to completion. The harness and its methodology notes are documented in its README. * chore(chat): address PR review nits on #1691 recovery fix - Report `recoveryKind: "retry"` to `onChatRecovery` and the incident record for an empty-partial new turn (interrupted before any chunk), since that case is deterministically a retry — it's knowable before the hook runs. The `persist: false` sibling case still reports "continue" (it only becomes a retry based on the hook's own return value) and the comment documents why. - Await `_persistOrphanedStream` in the `triggerInterruptedStreamCheck` test helper so it matches the production fiber-recovery path (latent test-only race, harmless in practice but now correct). 
- Rename the two `wip/` package.json names to the `@cloudflare/agents-*` prefix so changesets' ignore glob excludes them from versioning/release.
  Sunil Pai · 6496c802 · 2026-06-06
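The recovery precedence described in this entry (a provider `start.messageId` wins, then the id stored in stream metadata, then the legacy last-assistant fallback for pre-migration rows) and the `toolCallId` de-duplication can be sketched as below. The types and the function names `resolveRecoveryMessageId` and `mergeRecoveredParts` are assumptions for illustration, not the `ResumableStream` or `_persistOrphanedStream` code.

```typescript
// Hypothetical shapes for illustration only.
type StoredPart = { type: string; toolCallId?: string };
type Message = { id: string; role: "user" | "assistant"; parts: StoredPart[] };

// Pick the message that recovered chunks should land on.
function resolveRecoveryMessageId(opts: {
  providerMessageId?: string;
  storedMessageId?: string | null; // null or undefined models a pre-migration row
  history: Message[];
}): string | undefined {
  if (opts.providerMessageId) return opts.providerMessageId;
  if (opts.storedMessageId) return opts.storedMessageId;
  // Legacy fallback: the last assistant message in history.
  const lastAssistant = [...opts.history].reverse().find((m) => m.role === "assistant");
  return lastAssistant?.id;
}

// Skip reconstructed parts whose toolCallId already exists on the message.
function mergeRecoveredParts(existing: StoredPart[], recovered: StoredPart[]): StoredPart[] {
  const seen = new Set<string>();
  for (const part of existing) {
    if (part.toolCallId !== undefined) seen.add(part.toolCallId);
  }
  const fresh = recovered.filter(
    (part) => part.toolCallId === undefined || !seen.has(part.toolCallId)
  );
  return [...existing, ...fresh];
}
```

In this model, a new turn stores its own freshly allocated id and so becomes its own message, while a continuation stores the id of the message it extends and merges into it.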
- 2.3ETVAdd retained streaming agent tools (#1421) * Add agent tool orchestration Introduce first-class agent tools for running chat-capable Think sub-agents from a parent agent. This adds the parent run registry, event replay, cleanup, cancellation wiring, the AI SDK `agentTool` wrapper, React event aggregation, and the Think child adapter needed to stream retained child timelines through the parent connection. Rewrite the agents-as-tools example to consume the public APIs instead of the old helper-event prototype, and refresh docs, READMEs, design notes, tests, and release metadata so the feature is discoverable as the supported agent tools surface. Made-with: Cursor * Support AIChatAgent agent tools Extend the agent-tool child adapter contract to AIChatAgent so existing chat agents can run as retained, streaming tools with durable inspection, replay, and cancellation. Also update the shared live-tail transport for Durable Object RPC byte streams and document the headless client-tool limitation for follow-up work. Made-with: Cursor * Harden agent tool edge cases Persist structured agent-tool outputs, make AIChatAgent stream errors terminal, and expand cancellation/idempotency coverage so retained runs behave consistently across retries and replays. Refresh the docs and schema-version tests to reflect AIChatAgent support and the new parent registry column. Made-with: Cursor * Harden agent tool cancellation cleanup Clean up parent abort listeners after completed agent-tool runs and avoid acquiring stream readers when forwarding starts from an already-aborted signal. Add regression coverage for both edge cases so future cancellation changes preserve the resource cleanup behavior. Made-with: Cursor * Use polling helper for root keepAlive ref count Add expectRootKeepAliveRefCount helper that polls agent.getRootKeepAliveRefCount (up to 20 attempts with a short delay) and use it in sub-agent tests instead of ad-hoc setTimeout waits. 
This replaces fragile fixed delays with a deterministic polling assert to reduce test flakiness in packages/agents/src/tests/sub-agent.test.ts. * Skip malformed agent tool stream frames Drop malformed or shape-invalid NDJSON frames during agent-tool stream forwarding so a corrupted display chunk does not fail an otherwise completed child run. Add regression coverage for the byte-stream forwarding path. Made-with: Cursor * Test and fix agent-tool in-memory cleanup Add a unit test (packages/think/src/tests/agent-tools.test.ts) that verifies in-memory agent-tool bookkeeping is cleared after a run completes. Extend ThinkTestAgent with helpers to seed a last-error for a run and to inspect map sizes (seedAgentToolLastErrorForTest, getAgentToolCleanupMapSizesForTest). Fix cleanup logic in think.ts to remove entries from _agentToolLastErrors and _agentToolPreTurnAssistantIds when an agent-tool run is torn down to avoid retained in-memory state. * Add types for agent tool test utilities Introduce AgentToolInspection and ThinkAgentToolTestStub types and tighten test helpers' signatures. freshAgent now returns a Promise<ThinkAgentToolTestStub> (with a cast from getAgentByName) and waitForAgentToolRun accepts the stub and returns AgentToolInspection. These changes improve TypeScript safety for agent tool tests and make available explicit method shapes used in the tests (inspectAgentToolRun, seedAgentToolLastErrorForTest, startAgentToolRun, getAgentToolCleanupMapSizesForTest).Sunil Pai · 1b65ff55 · 2026-04-30
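The "skip malformed agent tool stream frames" behavior described in this entry can be illustrated with a tolerant NDJSON parser. This is a sketch under assumptions: the `Frame` shape (an object with a string `type`) and the function name `parseNdjsonFrames` are hypothetical, and the real forwarding path works on byte streams rather than a complete string.

```typescript
// Hypothetical frame shape: any JSON object with a string `type` field.
type Frame = { type: string; [key: string]: unknown };

function isFrame(value: unknown): value is Frame {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Parse newline-delimited JSON, dropping lines that are not valid JSON or that
// do not match the expected shape, so one corrupted frame cannot fail the run.
function parseNdjsonFrames(chunk: string): Frame[] {
  const frames: Frame[] = [];
  for (const line of chunk.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (isFrame(parsed)) frames.push(parsed);
    } catch {
      // Malformed JSON: skip this frame and keep going.
    }
  }
  return frames;
}
```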
- 2.3ETV Upgrade to Vitest 4.1, Vite 8, and @cloudflare/vitest-pool-workers 0.13 (#1138)
  * Add decorator transform & bump deps. Add a vite decorator-transform plugin and wire it into many example/experimental vite.configs, migrate tests for Vitest (new env.d.ts files and numerous test updates), and add a changeset describing MCP schema conversion (replace dynamic import("ai") with z.fromJSONSchema and remove ensureJsonSchema). Also bump multiple example/experimental dependencies (kumo, tailwindcss, nanoid, viem, jose, postal-mime, cronstrue, etc.), update package.json/package-lock, add new scripts and patch files, and remove an old vitest-browser-react patch.
  * Update vitest.config.ts
  * Require Zod v4 and warm up workers in tests. Bump peer dependency range to require Zod ^4.0.0 across packages and update the changeset to reflect Zod v4 and MCP tool schema conversion (replace dynamic import with z.fromJSONSchema(), remove ensureJsonSchema()). Add Vitest test setup that warms up the Cloudflare worker module graph (beforeAll exports.default.fetch) and retains a short afterAll delay to avoid noisy Durable Object close-handler logs. Add a new setup file and enable setupFiles for the think package, and increase a flaky resumable-streaming test delay from 200ms to 1000ms to reduce CI timeouts.
  Sunil Pai · 36e2020d · 2026-03-20
- 2.3ETVfeat(think): add create-think package and starter templates (#1695) * feat(think): add create-think package and starter templates Introduce `create-think` (`npm create think`) and a top-level `think-starters/` directory of complete, runnable starter apps, and rework `think init` to scaffold from them via a `--template` flag. Templates - New `think-starters/` workspace members (added to pnpm-workspace.yaml): - basic — minimal Think chat agent + small React UI - personal-assistant — persistent memory (configureSession) + scheduled tasks - coding-agent — workspace file tools + a coding skill (Worker Loader) - customer-support — custom tools + an escalation skill - Each is a self-contained, deployable Workers app (own package.json, wrangler.jsonc, vite.config.ts, agents/**, generated think.d.ts) and uses `workspace:*` deps so they build/test in CI as in-repo examples. think init / @cloudflare/think/cli - Add `--template` (default `basic`) and `--ref` flags to `think init`. - Replace the single inline scaffolder with a template-fetch model: copy from the local `think-starters/` dir when in-repo, otherwise use an injected remote fetcher. On fetch, set the package name and rewrite `workspace:*` deps to published ranges so the app installs standalone. - Expose `initCommand` and template helpers via a new side-effect-free `@cloudflare/think/cli` export (added to build entries + package exports). create-think - New `create-think` package: a thin bin that forwards argv to `initCommand` and injects a degit (tiged) fetcher pulling starters from `cloudflare/agents/think-starters`. Tests / housekeeping - Rewrite CLI init tests for the template model: default template, all templates + workspace-version rewrite, unknown template, injected fetcher (ref/name handling), non-empty/outside-root guards, existing-app no-op, dry-run, and inspect/types on a generated app. 
- Normalize pending changesets to patch bumps; add changesets for `@cloudflare/think` and the new `create-think` package. * fix(think): rewrite Worker name on scaffold and drop dead --route-prefix - finalizeTemplate now also rewrites the `name` field in the scaffolded wrangler config to the user's project name (targeted replacement that preserves JSONC comments/formatting), so apps no longer all deploy under the shared template Worker name (e.g. "think-basic-starter"). Renamed finalizeTemplatePackageJson -> finalizeTemplate. - Remove the `--route-prefix` option from `think init`: the template-based scaffolder no longer generates config, so the flag was accepted but silently ignored. Also drop the now-unused `routePrefix` from InitCommandOptions and refresh the stale init command description. - Extend the all-templates init test to assert the Worker name is rewritten.Sunil Pai · b545e867 · 2026-06-07
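The scaffold finalization described in this entry (set the package name, rewrite `workspace:*` deps to published ranges, and rewrite the Worker `name` in the wrangler config without disturbing JSONC comments) could be sketched as follows. The function names, the `published` version map, and the first-match regex approach are assumptions for illustration, not the actual `finalizeTemplate` implementation.

```typescript
type DepMap = Record<string, string>;
type PackageJson = { name?: string; dependencies?: DepMap; devDependencies?: DepMap };

// Replace workspace-protocol ranges with published ranges where one is known.
function rewriteDeps(deps: DepMap | undefined, published: DepMap): DepMap | undefined {
  if (!deps) return undefined;
  const out: DepMap = {};
  for (const [name, range] of Object.entries(deps)) {
    out[name] = range.startsWith("workspace:") ? published[name] ?? range : range;
  }
  return out;
}

function finalizePackageJson(pkg: PackageJson, projectName: string, published: DepMap): PackageJson {
  return {
    ...pkg,
    name: projectName,
    dependencies: rewriteDeps(pkg.dependencies, published),
    devDependencies: rewriteDeps(pkg.devDependencies, published),
  };
}

// Targeted replacement of the first "name": "..." pair. Editing the text rather
// than parsing and re-serializing keeps comments and formatting intact.
// Assumption: the first "name" key in the file is the top-level Worker name.
function rewriteWranglerName(jsonc: string, projectName: string): string {
  return jsonc.replace(
    /("name"\s*:\s*)"(?:[^"\\]|\\.)*"/,
    (_match, prefix: string) => `${prefix}${JSON.stringify(projectName)}`
  );
}
```

Using a replacer function instead of a replacement string avoids surprises if the project name contains `$` sequences.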