ADR-026: Chain Re-Materialization Triggers — Sync, Evaluation, Deploy

Status

Proposed — 2026-05-19. Implementation lands in a separate sv0-platform PR that cites this ADR.

Context

execution_chains is a derived collection (ADR-008). Chains are assembled by src/ingestion/chain-builder.ts, which BFS-walks from every workload whose identitySubtype is in the ENTRY_POINT_SUBTYPES constant (chain-builder.ts:31). Today the only path that calls assembleExecutionChains is Job 1 of the sync pipeline (src/workers/handlers/sync-ingestion.ts step 9). The behavior is correct as long as either of these holds:

- ENTRY_POINT_SUBTYPES is stable, OR
- every tenant runs a fresh sync after the constant changes.

Neither holds in practice.
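A simplified model of the BFS shows why a stale constant leaves anchors missing. This sketch is not chain-builder.ts. The Workload and GraphEdge shapes, assembleChainsSketch, and the placeholder subtype "example_subtype" are hypothetical. Only ai_agent is taken from the real ENTRY_POINT_SUBTYPES change.

```typescript
// Hypothetical, simplified shapes: the real types and BFS live in
// src/ingestion/chain-builder.ts and are richer than this.
interface Workload {
  id: string;
  identitySubtype: string;
}

interface GraphEdge {
  from: string;
  to: string;
}

// Illustrative subtype sets. "example_subtype" is a placeholder; "ai_agent" is
// one of the subtypes PR #973 added to the real ENTRY_POINT_SUBTYPES.
const ENTRY_POINTS_BEFORE: ReadonlySet<string> = new Set(["example_subtype"]);
const ENTRY_POINTS_AFTER: ReadonlySet<string> = new Set(["example_subtype", "ai_agent"]);

// Returns one chain per entry-point workload, keyed by the anchor workload id.
// Each chain lists the node ids reachable from the anchor in BFS order.
function assembleChainsSketch(
  workloads: readonly Workload[],
  edges: readonly GraphEdge[],
  entryPointSubtypes: ReadonlySet<string>,
): Map<string, string[]> {
  // Build an adjacency list once so each BFS step is a map lookup.
  const adjacency = new Map<string, string[]>();
  for (const edge of edges) {
    const targets = adjacency.get(edge.from) ?? [];
    targets.push(edge.to);
    adjacency.set(edge.from, targets);
  }

  const chains = new Map<string, string[]>();
  for (const workload of workloads) {
    // Workloads outside the entry-point set never become anchors.
    if (!entryPointSubtypes.has(workload.identitySubtype)) continue;

    const visited = new Set<string>([workload.id]);
    const order: string[] = [workload.id];
    const queue: string[] = [workload.id];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      for (const next of adjacency.get(current) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          order.push(next);
          queue.push(next);
        }
      }
    }
    chains.set(workload.id, order);
  }
  return chains;
}
```

Chains persisted from a run under ENTRY_POINTS_BEFORE contain no anchor for the ai_agent workload. Nothing rewrites them until assembly runs again under the new set.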

The live regression (2026-05-19)

PR #973 (issue #963, merged 2026-05-15) extended ENTRY_POINT_SUBTYPES to include ai_agent (Foundry coverage), plus script_include and the canonical AWS Step Functions subtypes. The deploy shipped to dev and staging the same day. Seeded tenants on both environments retained execution_chains documents written by earlier syncs — those documents did not contain anchors for the new subtypes, and no subsequent sync re-assembled them.

On 2026-05-19, ~4 days after the deploy, the v0.6 /access-paths/:id page dead-ended with "No access chain materialized for this path yet" because src/api/routes/authority-paths.ts:224 calls getExecutionChainByAnchor(tenantId, path.workload_id) and falls through when the anchor is missing. Live measurement:

- default tenant on dev.securityv0.com: 75% of authority paths had no chain.
- demo-nimbus tenant on staging.securityv0.com: 100% of authority paths had no chain.
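Below is a minimal sketch of how such a coverage figure can be computed. It assumes the tenant's authority paths and the set of execution_chains anchor ids have already been loaded. AuthorityPathRef and missingChainRatio are hypothetical names, and the loaders are omitted.

```typescript
// Hypothetical minimal shape: only workload_id matters for the anchor lookup.
interface AuthorityPathRef {
  id: string;
  workload_id: string;
}

// Share of paths whose workload has no execution chain anchor, i.e. the paths
// whose chain lookup by workload_id comes back empty and render the empty state.
function missingChainRatio(
  paths: readonly AuthorityPathRef[],
  chainAnchors: ReadonlySet<string>,
): number {
  if (paths.length === 0) return 0;
  const missing = paths.filter((p) => !chainAnchors.has(p.workload_id)).length;
  return missing / paths.length;
}
```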

A tactical fix restored chains on both environments by invoking assembleExecutionChains directly against live Mongo (no PR, ops-only — operator script + dev/staging credentials, ~2 minutes per tenant). This is the second occurrence of the same shape: feat(chains): treat AWS workloads as execution-chain entry points (issue #841, PR #843, merged 2026-05-11) had the same property, only it was discovered while the surface still tolerated empty chains.

Why this is an architectural gap, not a hotfix

The pipeline doc (02-processing-pipeline.md) pins chain assembly to Job 1 step 9. Job 2 (evaluate_findings) is documented as read-only over derived state (02-processing-pipeline.md §2.3, current lines 147-189). The pipeline is consistent — but it has no answer for "the code that produces the derived state changed; the source-system inputs did not." Code change does not show up as a connector sync event; it shows up as a deploy.

A deploy gate is the trigger source the pipeline is missing.

Decision

Chain (re-)materialization is triggered by three sources only, in the layering below. This layering is the adopted option, referred to elsewhere in this ADR as Plan E.

(a) `sync_ingestion` worker — continuous

First-class assembly during the sync pipeline (Job 1 step 9). Runs every time a connector submits a NormalizedGraph. Source-system graph changes propagate to execution_chains through this path. This is the existing behavior and does not change. Owned by src/workers/handlers/sync-ingestion.ts.

(b) Deploy-gate `assemble_chains` job — one-shot per active tenant

When a deploy includes changes to chain-builder.ts (or to the schema files that define ChainEntityRef, ENTRY_POINT_SUBTYPES, or the BFS edge list), the deploy enqueues one assemble_chains job per active tenant. The job invokes assembleExecutionChains against the existing entity graph and upserts the resulting chains. It does not re-ingest source-system data.

This is the trigger that closes the gap surfaced 2026-05-19. The job kind is new; the assembly code path is the existing one used by Job 1.
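The sketch below shows the enqueue step. It assumes a tenant list with an active flag and a job payload shape that the implementing PR has not yet fixed. AssembleChainsJob, TenantRecord, and planDeployGateJobs are all hypothetical names.

```typescript
// Hypothetical job payload; the real assemble_chains schema is decided in the implementing PR.
interface AssembleChainsJob {
  kind: "assemble_chains";
  tenantId: string;
  trigger: "deploy_gate" | "operator";
}

// Hypothetical tenant record; "active" stands in for whatever the platform uses.
interface TenantRecord {
  id: string;
  active: boolean;
}

// One assemble_chains job per active tenant. The job carries no graph data:
// the handler re-reads the existing entity graph; it does not re-ingest.
function planDeployGateJobs(tenants: readonly TenantRecord[]): AssembleChainsJob[] {
  const seen = new Set<string>();
  const jobs: AssembleChainsJob[] = [];
  for (const tenant of tenants) {
    // Skip inactive tenants and duplicate ids so each tenant gets exactly one job.
    if (!tenant.active || seen.has(tenant.id)) continue;
    seen.add(tenant.id);
    jobs.push({ kind: "assemble_chains", tenantId: tenant.id, trigger: "deploy_gate" });
  }
  return jobs;
}
```

The deploy still has to decide when to call this, which is the open deploy-detection question. The sketch covers only the per-tenant fan-out.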

(c) Operator-run recovery — same job kind, manual enqueue

A documented operator script (scripts/cli/assemble-chains.ts or equivalent — name decided at implementation time) enqueues the same assemble_chains job for one or more tenants. This is the cold-recovery path for incidents where the deploy gate missed an enqueue, or where an operator opts to re-materialize a single tenant during investigation. It uses the same job kind as (b) so it inherits the same idempotency guarantees.
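The idempotency argument rests on (b) and (c) producing the same job and sharing one handler. The sketch below assumes exactly that. It uses a stand-in for assembleExecutionChains because the real signature is not given here. operatorEnqueue, handleAssembleChains, and ChainAssembler are hypothetical names.

```typescript
// Same hypothetical payload as the deploy gate uses; only `trigger` differs.
interface AssembleChainsJob {
  kind: "assemble_chains";
  tenantId: string;
  trigger: "deploy_gate" | "operator";
}

// Stand-in for assembleExecutionChains; returns the number of chains upserted.
// The real function's signature is not specified by this ADR.
type ChainAssembler = (tenantId: string) => number;

// Hypothetical body of the operator script: one job per requested tenant,
// with duplicate tenant ids collapsed (insertion order is preserved).
function operatorEnqueue(tenantIds: readonly string[]): AssembleChainsJob[] {
  return Array.from(new Set(tenantIds)).map(
    (tenantId): AssembleChainsJob => ({ kind: "assemble_chains", tenantId, trigger: "operator" }),
  );
}

// Single handler for the job kind. The trigger is available for logging but
// does not change what runs, so (b) and (c) behave identically.
function handleAssembleChains(job: AssembleChainsJob, assemble: ChainAssembler): number {
  return assemble(job.tenantId);
}
```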

What is explicitly excluded

Evaluation never triggers chain assembly. evaluateTenant and evaluate_findings are read-only over execution_chains. The evaluator does not write to that collection under any condition.

This carves out a clean Job 1 / Job 2 separation that the pipeline doc already documents and the code already enforces. The ADR is the durable record of why a future Claude session should not propose moving assembly into Job 2 to "solve" a sync miss.
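The ADR states that the code already enforces the read-only contract but does not say how. One way to make it visible at compile time is to hand the evaluator an interface that has no write methods. This is an illustrative sketch only. ChainReader, ChainWriter, InMemoryChainStore, and evaluateSketch are hypothetical names.

```typescript
// Hypothetical, trimmed chain document.
interface ExecutionChainDoc {
  anchor: string;
  composition_hash: string;
  last_seen_at: Date;
}

// Read-only view: this is all the evaluator is given.
interface ChainReader {
  getByAnchor(tenantId: string, anchor: string): Readonly<ExecutionChainDoc> | null;
}

// Writable store: handed only to Job 1 and the assemble_chains handler.
interface ChainWriter extends ChainReader {
  upsert(tenantId: string, doc: ExecutionChainDoc): void;
}

class InMemoryChainStore implements ChainWriter {
  private readonly docs = new Map<string, ExecutionChainDoc>();

  getByAnchor(tenantId: string, anchor: string): Readonly<ExecutionChainDoc> | null {
    return this.docs.get(`${tenantId}:${anchor}`) ?? null;
  }

  upsert(tenantId: string, doc: ExecutionChainDoc): void {
    this.docs.set(`${tenantId}:${doc.anchor}`, doc);
  }
}

// Evaluation sketch: counts anchors that have a chain. Because the parameter is
// typed as ChainReader, calling chains.upsert(...) here does not compile.
function evaluateSketch(tenantId: string, anchors: readonly string[], chains: ChainReader): number {
  return anchors.filter((a) => chains.getByAnchor(tenantId, a) !== null).length;
}
```

Structural typing still allows a deliberate cast. This pattern therefore complements review of writes to execution_chains rather than replacing it.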

Alternatives considered

Plan B — `evaluateTenant` assembles chains before evaluation

The simplest local fix: add an assembleExecutionChains call at the top of evaluateTenant. Every evaluation cycle re-materializes; the regression cannot recur because evaluation runs frequently.

Rejected. Adversarial review surfaced four blocking objections:

- Violates Job 1 / Job 2 separation. 02-processing-pipeline.md pins chain assembly to sync (Job 1 step 9, current lines 102-104 and 117-126) and documents Job 2 as read-only over derived state. Plan B silently moves a write into Job 2, so the doc says one thing and the code does another.
- Poisons last_seen_at. chain-builder.ts:147 writes last_seen_at on every assembly, and ADR-008 cites last_seen_at as the GC signal for stale chain anchors. If evaluation overwrites last_seen_at on every cycle, every chain looks fresh regardless of whether the source system still produces it. The GC signal is destroyed before any GC code is written against it. (As of 2026-05-19 the GC code does not exist; see the ADR-008 amendment that ships with this ADR. The documented intent does exist, and Plan B forecloses it.)
- Unbounded perf cost on every evaluation. findEntryPoints (chain-builder.ts:169) queries with limit: 0, i.e. no upper bound. On a tenant with many workload entry points the BFS is O(entry points × graph fan-out) and would run on every evaluation. For an event that fires once per deploy of chain-builder.ts, that is a permanent cost for a rare trigger.
- Operator scripts amplify the cost. scripts/reevaluate-findings.ts iterates evaluateTenant across all production tenants. With Plan B, every operator-driven re-evaluation triggers a full assembly across every production tenant, and the blast radius is invisible at the call site.
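A toy GC predicate makes the last_seen_at objection concrete. No GC code exists yet, so isGcCandidate, the seven-day TTL used in the example, and planBTouchAll are all hypothetical. The point is only that a rule keyed on last_seen_at can never fire if every evaluation refreshes the field.

```typescript
interface ChainStamp {
  anchor: string;
  last_seen_at: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical GC rule: a chain not re-produced within ttlMs is a deletion candidate.
function isGcCandidate(chain: ChainStamp, now: Date, ttlMs: number): boolean {
  return now.getTime() - chain.last_seen_at.getTime() > ttlMs;
}

// What Plan B amounts to for this field: every evaluation cycle re-assembles
// and stamps every chain with the current time (returns new objects).
function planBTouchAll(chains: readonly ChainStamp[], now: Date): ChainStamp[] {
  return chains.map((c) => ({ ...c, last_seen_at: now }));
}
```

A chain last produced by a sync a month ago is a GC candidate under a seven-day TTL. After one Plan B cycle it is not, and it never will be again.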

Plan F — Lazy on-read assembly in the API route

/access-paths/:id detects a missing chain and calls assembleExecutionChains inline before responding.

Rejected. Inverts architectural layers: an HTTP route writes to a derived collection. Adds latency to the first request after a deploy. Concurrent requests race to write the same chain. The API layer becomes the only writer for some chains and shares the write with sync for others — split ownership is the failure shape this ADR is trying to remove.

Plan G — Treat as one-time ops debt, no durable trigger

Accept that chain-builder.ts changes are rare. After each such change, an operator runs the recovery script (path (c) only). No automation.

Rejected for production tenants; acceptable for nimbus-cloud-class tenants. The two-occurrence baseline (PR #843 and PR #973, merged four days apart on 2026-05-11 and 2026-05-15) suggests "rare" is wrong at the relevant time-scale. For demo tenants, which are recreated on cadence, the manual path is sufficient and the deploy gate is overhead. The decision keeps path (c) for that case while gating production tenants behind path (b).

Consequences

Positive

- Clean Job 1 / Job 2 separation preserved. Evaluation stays read-only over derived state. The pipeline doc and the code agree.
- Trigger source matches the root cause. The regression is caused by a code deploy, not by a connector sync. The fix lives at the deploy layer.
- last_seen_at semantics preserved. A future GC implementation has a usable signal.
- One assembly code path. assembleExecutionChains is invoked from sync, the deploy gate, and the operator script. Behavior diverges at the trigger, not at the assembler.
- Idempotent at the job level. Re-running assemble_chains against an unchanged graph upserts the same composition_hash and produces no net mutation beyond last_seen_at.
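The sketch below shows what "no net mutation beyond last_seen_at" means for an upsert keyed by anchor. The hash is a stand-in (32-bit FNV-1a over the joined member ids), and the StoredChain shape is trimmed. The real composition_hash computation in chain-builder.ts may differ. StoredChain, compositionHash, and upsertChain are hypothetical names.

```typescript
// Hypothetical, trimmed chain document.
interface StoredChain {
  anchor: string;
  composition_hash: string;
  members: string[];
  last_seen_at: Date;
}

// Stand-in hash: 32-bit FNV-1a over UTF-16 code units of the joined member ids.
function compositionHash(members: readonly string[]): string {
  const input = members.join("\u0000");
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

type UpsertOutcome = "inserted" | "touched" | "replaced";

// Upsert keyed by anchor. An unchanged composition only refreshes last_seen_at;
// a changed composition replaces the document.
function upsertChain(
  chains: Map<string, StoredChain>,
  anchor: string,
  members: string[],
  now: Date,
): UpsertOutcome {
  const hash = compositionHash(members);
  const existing = chains.get(anchor);
  if (existing === undefined) {
    chains.set(anchor, { anchor, composition_hash: hash, members: members.slice(), last_seen_at: now });
    return "inserted";
  }
  if (existing.composition_hash === hash) {
    existing.last_seen_at = now;
    return "touched";
  }
  chains.set(anchor, { anchor, composition_hash: hash, members: members.slice(), last_seen_at: now });
  return "replaced";
}
```

Running the deploy-gate job twice against the same graph yields "touched" for every anchor on the second run. That is the job-level idempotency the ADR relies on.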

Negative

- Requires deploy-detection plumbing: a CI hook, a version-stamp comparison, or an explicit chain_builder_version constant. The choice is deferred to implementation review (see Open questions below).
- Does not GC stale anchor zombies. When a workload that previously matched ENTRY_POINT_SUBTYPES no longer matches, its chain document persists until a separate GC pass deletes it. This ADR does not introduce GC; a future ADR addresses it.
- Operators and their scripts must stay aware. reevaluate-findings.ts and similar tools do not re-materialize chains, and no one should assume that re-evaluation does. This ADR is the durable note that explains why.
- stitched_paths shares the same vulnerability. stitched_paths (introduced by sv0-platform PR #1121) is a peer derived materialization with the same code-deploy-without-resync vulnerability. This ADR does not extend the deploy-gate trigger to that pipeline; a follow-up ADR scopes that decision. Implementers of Plan E should design the trigger mechanism so it can be extended to stitched_paths without re-architecture.

Neutral

- No schema change to execution_chains. ADR-008's schema stands as-is.
- No change to API contracts. getExecutionChainByAnchor continues to behave as today; it just gets called against a freshly materialized collection after a deploy.

Open questions

These are deliberately left for the implementing PR — recording them here so the implementer is briefed.

Deploy-detection mechanism. Three candidates:
- CI file-path heuristic — the deploy workflow inspects the merge commit, sees a touch on chain-builder.ts (or the schema files), and emits an enqueue step.
- Version-stamp on execution_chains — each chain doc carries the chain_builder_version it was assembled under; the deploy compares the current code version to the most-recent stamp in Mongo and enqueues if they diverge.
- Explicit chain_builder_version constant in code — bumped manually when the BFS or ENTRY_POINT_SUBTYPES changes; the deploy reads the constant.
Recommend the explicit version constant: it is auditable in the code review, it does not depend on git path heuristics that miss schema changes, and it does not require a Mongo read at deploy time. Needs implementation review.
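The sketch below illustrates the recommended option. It assumes the previous deploy's version values come from release metadata rather than Mongo; how the deploy obtains them is not settled here. As a further assumption, it generalizes the single chain_builder_version constant to a per-collection map, so the same check could later cover stitched_paths. BUILDER_VERSIONS and collectionsToRematerialize are hypothetical names.

```typescript
type DerivedCollection = "execution_chains" | "stitched_paths";

// Hypothetical per-collection version constants, bumped by hand in the PR that
// changes the BFS, ENTRY_POINT_SUBTYPES, or the equivalent for stitched_paths.
const BUILDER_VERSIONS: Readonly<Record<DerivedCollection, number>> = {
  execution_chains: 3,
  stitched_paths: 1,
};

// Returns the in-scope collections whose builder version differs from the
// previous deploy. A missing previous entry counts as changed, so a first
// deploy re-materializes. Only execution_chains is in scope by default.
function collectionsToRematerialize(
  previous: Partial<Record<DerivedCollection, number>>,
  current: Readonly<Record<DerivedCollection, number>> = BUILDER_VERSIONS,
  inScope: readonly DerivedCollection[] = ["execution_chains"],
): DerivedCollection[] {
  return inScope.filter((c) => previous[c] !== current[c]);
}
```

Keeping stitched_paths out of inScope by default mirrors this ADR's scope. With this shape, extending the trigger means adding a value rather than building a new mechanism.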

References

- ADR-008: Execution Chains Collection. Establishes execution_chains and pins assembly to sync. This ADR adds the path (b) deploy-gate trigger; it does not reverse ADR-008. The companion update to ADR-008 (same PR) strikes the obsolete claim that last_seen_at already serves as a working GC signal (no GC code exists yet) and tracks a follow-up issue against sv0-platform.
- 02-processing-pipeline.md §2: Job 1 step 9 (sync assembly) and §2.3 (Job 2 read-only contract). The companion update in the same PR adds a Deploy-triggered re-materialization section and corrects the stale "deprecated" label on execution_chains in the collection mutation matrix.
- sv0-platform issue #963: "Chain page is unreachable on every path in every tenant — no execution chains exist." The live bug that surfaced the gap.
- sv0-platform PR #973: fix(chain): materialize chains for AI agents + script_include + canonical AWS Step Functions (#963). The deploy that triggered the regression (merged to main 2026-05-15).
- sv0-platform issue #841 and PR #843: feat(chains): treat AWS workloads as execution-chain entry points (merged 2026-05-11). Same shape, earlier occurrence.
- Live restoration, 2026-05-19: an ops-only invocation of assembleExecutionChains against default (dev) and demo-nimbus (staging) restored 100% of dead-ended paths within minutes. No PR; the runbook entry for the operator script ships with the implementation PR for path (c).

Honored North Star clauses

C-13 (authoring-artifact, north-star.md line 405 — "SIEM landing supported, not a SIEM console") — the chain page must work if an analyst lands from a SIEM. A /access-paths/:id deep-link from a SIEM alert must render the access chain; the dead-end "No access chain materialized for this path yet" empty state breaks SIEM-cold landing. Plan E closes this by ensuring chain materialization survives chain-builder.ts vocabulary changes without requiring opportunistic re-sync.
C-15 (LOCKED-IN-CODE via ui/src/pages/__tests__/chain-per-path-differentiation.test.tsx, sv0-platform PR #1008; north-star.md line 377) — three paths sharing a workload must render distinguishable content. The 2026-05-19 regression broke per-path chain rendering for every active path on demo-nimbus (100%) and 75% on default — distinguishability collapses to zero when no chain materializes at all. The chain-per-path follow-up workstream (sv0-platform #1020) is a separate forward-looking schema; this ADR does not modify it.
