ADR-026: Chain Re-Materialization Triggers — Sync, Evaluation, Deploy
Status
Proposed — 2026-05-19. Implementation lands in a separate sv0-platform PR that cites this ADR.
Context
execution_chains is a derived collection (ADR-008). Chains are assembled by src/ingestion/chain-builder.ts, which BFS-walks from every workload whose identitySubtype is in the ENTRY_POINT_SUBTYPES constant (chain-builder.ts:31). Today the only path that calls assembleExecutionChains is Job 1 of the sync pipeline (src/workers/handlers/sync-ingestion.ts step 9). The behavior is correct as long as either of these holds:
- ENTRY_POINT_SUBTYPES is stable, OR
- every tenant runs a fresh sync after the constant changes.
Neither holds in practice.
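A minimal sketch of why a constant change alone alters the derived output, assuming heavily simplified shapes. The Workload and Edge types, the walkChain and assembleChains helpers, and the lambda_function subtype are hypothetical stand-ins for illustration; they are not the real chain-builder.ts API.

```typescript
// Hypothetical, simplified shapes; the real types live in chain-builder.ts.
type Workload = { id: string; identitySubtype: string };
type Edge = { from: string; to: string };

// Breadth-first walk from one entry point; returns the ids reached, in visit order.
function walkChain(anchorId: string, edges: Edge[]): string[] {
  const visited = new Set<string>([anchorId]);
  const queue: string[] = [anchorId];
  const order: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    order.push(current);
    for (const edge of edges) {
      if (edge.from === current && !visited.has(edge.to)) {
        visited.add(edge.to);
        queue.push(edge.to);
      }
    }
  }
  return order;
}

// One chain per workload whose subtype is in the entry-point set.
function assembleChains(
  workloads: Workload[],
  edges: Edge[],
  entryPointSubtypes: ReadonlySet<string>,
): Map<string, string[]> {
  const chains = new Map<string, string[]>();
  for (const w of workloads) {
    if (entryPointSubtypes.has(w.identitySubtype)) {
      chains.set(w.id, walkChain(w.id, edges));
    }
  }
  return chains;
}

// Illustrative graph only. It is identical for both assemblies below.
const workloads: Workload[] = [
  { id: "w1", identitySubtype: "lambda_function" }, // hypothetical subtype
  { id: "w2", identitySubtype: "ai_agent" },
];
const edges: Edge[] = [
  { from: "w1", to: "role-a" },
  { from: "w2", to: "role-b" },
  { from: "role-b", to: "bucket-c" },
];

// Same graph, different constant: only the second run anchors a chain at w2.
const before = assembleChains(workloads, edges, new Set(["lambda_function"]));
const after = assembleChains(workloads, edges, new Set(["lambda_function", "ai_agent"]));
```

In this sketch the graph never changes between the two calls, so no sync event would fire, yet the set of anchors differs. That is the condition under which stored chains go stale.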
The live regression (2026-05-19)
PR #973 (issue #963, merged 2026-05-15) extended ENTRY_POINT_SUBTYPES to include ai_agent (Foundry coverage), plus script_include and the canonical AWS Step Functions subtypes. The deploy shipped to dev and staging the same day. Seeded tenants on both environments retained execution_chains documents written by earlier syncs — those documents did not contain anchors for the new subtypes, and no subsequent sync re-assembled them.
On 2026-05-19, ~4 days after the deploy, the v0.6 /access-paths/:id page dead-ended with "No access chain materialized for this path yet" because src/api/routes/authority-paths.ts:224 calls getExecutionChainByAnchor(tenantId, path.workload_id) and falls through when the anchor is missing. Live measurement:
- default tenant on dev.securityv0.com: 75% of authority paths had no chain.
- demo-nimbus tenant on staging.securityv0.com: 100% of authority paths had no chain.
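A coverage figure of this kind can be computed along the following lines. This is a sketch under assumptions: the AuthorityPath shape and the missingChainRatio helper are hypothetical, only the workload_id field name comes from the route described above, and the sample data is illustrative rather than the measured tenants.

```typescript
// Hypothetical minimal shape; only workload_id is taken from the text above.
type AuthorityPath = { id: string; workload_id: string };

// Fraction of paths whose workload has no chain anchor (0 when there are no paths).
function missingChainRatio(
  paths: AuthorityPath[],
  chainAnchorIds: ReadonlySet<string>,
): number {
  if (paths.length === 0) return 0;
  const missing = paths.filter((p) => !chainAnchorIds.has(p.workload_id)).length;
  return missing / paths.length;
}

// Illustrative data only, not the measured tenants.
const samplePaths: AuthorityPath[] = [
  { id: "p1", workload_id: "w1" },
  { id: "p2", workload_id: "w2" },
  { id: "p3", workload_id: "w3" },
  { id: "p4", workload_id: "w4" },
];

// Only w1 has a chain anchor, so three of four paths would dead-end.
const sampleRatio = missingChainRatio(samplePaths, new Set(["w1"]));
```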
A tactical fix restored chains on both environments by invoking assembleExecutionChains directly against live Mongo (no PR, ops-only — operator script + dev/staging credentials, ~2 minutes per tenant). This is the second occurrence of the same shape: feat(chains): treat AWS workloads as execution-chain entry points (issue #841, PR #843, merged 2026-05-11) had the same property, only it was discovered while the surface still tolerated empty chains.
Why this is an architectural gap, not a hotfix
The pipeline doc (02-processing-pipeline.md) pins chain assembly to Job 1 step 9. Job 2 (evaluate_findings) is documented as read-only over derived state (02-processing-pipeline.md §2.3, current lines 147-189). The pipeline is consistent — but it has no answer for "the code that produces the derived state changed; the source-system inputs did not." Code change does not show up as a connector sync event; it shows up as a deploy.
A deploy gate is the trigger source the pipeline is missing.
Decision
Chain (re-)materialization is triggered by three sources only, in this layering:
(a) sync_ingestion worker — continuous
First-class assembly during the sync pipeline (Job 1 step 9). Runs every time a connector submits a NormalizedGraph. Source-system graph changes propagate to execution_chains through this path. This is the existing behavior and does not change. Owned by src/workers/handlers/sync-ingestion.ts.
(b) Deploy-gate assemble_chains job — one-shot per active tenant
When a deploy includes changes to chain-builder.ts (or to the schema files that define ChainEntityRef, ENTRY_POINT_SUBTYPES, or the BFS edge list), the deploy enqueues one assemble_chains job per active tenant. The job invokes assembleExecutionChains against the existing entity graph and upserts the resulting chains. It does not re-ingest source-system data.
This is the trigger that closes the gap surfaced 2026-05-19. The job kind is new; the assembly code path is the existing one used by Job 1.
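The enqueue step for path (b) might look like the sketch below. It assumes the deploy already knows whether chain-assembly code changed. The Tenant and AssembleChainsJob shapes, the dedupeKey field, and the planAssembleChainsJobs helper are hypothetical; the ADR fixes only the job kind, assemble_chains, and the one-job-per-active-tenant rule.

```typescript
// Hypothetical shapes; the real queue and tenant types are not specified in this ADR.
type Tenant = { id: string; active: boolean };
type AssembleChainsJob = {
  kind: "assemble_chains";
  tenantId: string;
  dedupeKey: string; // assumed: lets a queue drop duplicate enqueues for one deploy
};

// Returns the jobs a deploy would enqueue: one per active tenant, or none
// when the deploy did not touch chain-assembly code.
function planAssembleChainsJobs(
  tenants: Tenant[],
  chainBuilderChanged: boolean,
  deployId: string,
): AssembleChainsJob[] {
  if (!chainBuilderChanged) return [];
  return tenants
    .filter((t) => t.active)
    .map((t) => ({
      kind: "assemble_chains" as const,
      tenantId: t.id,
      dedupeKey: `assemble_chains:${deployId}:${t.id}`,
    }));
}
```

Keeping the planning step a pure function, separate from the actual queue call, would let the operator script in path (c) reuse it with an explicit tenant list. That split is a design suggestion, not something the ADR mandates.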
(c) Operator-run recovery — same job kind, manual enqueue
A documented operator script (scripts/cli/assemble-chains.ts or equivalent — name decided at implementation time) enqueues the same assemble_chains job for one or more tenants. This is the cold-recovery path for incidents where the deploy gate missed an enqueue, or where an operator opts to re-materialize a single tenant during investigation. It uses the same job kind as (b) so it inherits the same idempotency guarantees.
What is explicitly excluded
Evaluation never triggers chain assembly. evaluateTenant and evaluate_findings are read-only over execution_chains. The evaluator does not write to that collection under any condition.
This carves out a clean Job 1 / Job 2 separation that the pipeline doc already documents and the code already enforces. The ADR is the durable record of why a future Claude session should not propose moving assembly into Job 2 to "solve" a sync miss.
Alternatives considered
Plan B — evaluateTenant assembles chains before evaluation
The simplest local fix: add an assembleExecutionChains call at the top of evaluateTenant. Every evaluation cycle re-materializes; the regression cannot recur because evaluation runs frequently.
Rejected. Adversarial review surfaced four blocking objections:
- Violates Job 1 / Job 2 separation. 02-processing-pipeline.md step 9 (Job 1 step 9, current lines 102-104 and 117-126) pins chain assembly to sync. Job 2 is documented as read-only over derived state. Plan B moves a write into Job 2 silently: the doc says one thing and the code does another.
- Poisons last_seen_at. chain-builder.ts:147 writes last_seen_at on every assembly. ADR-008 cites last_seen_at as the GC signal for stale chain anchors. If evaluation overwrites last_seen_at on every cycle, every chain looks fresh regardless of whether the source system still produces it. The GC signal is destroyed before any GC code is written against it. (Note: as of 2026-05-19 the GC code does not exist; see the ADR-008 amendment that ships with this ADR. The documented intent does exist, however, and Plan B forecloses it.)
- Unbounded perf cost on every evaluate. findEntryPoints (chain-builder.ts:169) queries with limit: 0, that is, no upper bound. On a tenant with many workload entry-points the BFS is O(entry points × graph fan-out) and runs on every evaluation. For an event that fires once per deploy of chain-builder.ts, this is permanent cost for a rare trigger.
- Operator scripts amplify the cost. scripts/reevaluate-findings.ts iterates evaluateTenant across all production tenants. With Plan B, every operator-driven re-evaluation triggers a full assembly across every prod tenant. The blast radius is invisible at the call site.
Plan F — Lazy on-read assembly in the API route
/access-paths/:id detects a missing chain and calls assembleExecutionChains inline before responding.
Rejected. Inverts architectural layers: an HTTP route writes to a derived collection. Adds latency to the first request after a deploy. Concurrent requests race to write the same chain. The API layer becomes the only writer for some chains and shares the write with sync for others — split ownership is the failure shape this ADR is trying to remove.
Plan G — Treat as one-time ops debt, no durable trigger
Accept that chain-builder.ts changes are rare. After each such change, an operator runs the recovery script (path (c) only). No automation.
Rejected for production tenants; acceptable for nimbus-cloud-class tenants. The two-occurrence baseline (PR #843 for issue #841, then PR #973, inside three months) suggests "rare" is wrong at the relevant time-scale. For demo tenants, which are recreated on cadence, the manual path is sufficient and the deploy gate is overhead. The decision keeps path (c) for that case while gating production tenants behind path (b).
Consequences
Positive
- Clean Job 1 / Job 2 separation preserved. Evaluation stays read-only over derived state. The pipeline doc and the code agree.
- Trigger source matches the root cause. The regression is caused by a code deploy, not by a connector sync. The fix lives at the deploy layer.
- last_seen_at semantics preserved. A future GC implementation has a usable signal.
- One assembly code path. assembleExecutionChains is invoked from sync, the deploy gate, and the operator script. Behavior diverges at the trigger, not at the assembler.
- Idempotent at the job level. Re-running assemble_chains against an unchanged graph upserts the same composition_hash and produces no net mutation beyond last_seen_at.
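The idempotency property in the last bullet can be illustrated with an in-memory stand-in for the collection. This is a sketch under assumptions: the ChainDoc shape, the anchorId field, the upsertChain helper, and the string-join compositionHash are all hypothetical. Only the composition_hash and last_seen_at field names come from this ADR, and the real hash function is not specified here.

```typescript
// Hypothetical document shape; only composition_hash and last_seen_at are named in the ADR.
type ChainDoc = { anchorId: string; composition_hash: string; last_seen_at: number };
type UpsertResult = "inserted" | "updated" | "unchanged";

// Stand-in for the real hash: an order-preserving join of member ids (assumption).
function compositionHash(memberIds: string[]): string {
  return memberIds.join(">");
}

// Upsert keyed by anchor. last_seen_at is always refreshed; the result reports
// whether the chain composition itself changed.
function upsertChain(
  store: Map<string, ChainDoc>,
  anchorId: string,
  memberIds: string[],
  now: number,
): UpsertResult {
  const hash = compositionHash(memberIds);
  const existing = store.get(anchorId);
  store.set(anchorId, { anchorId, composition_hash: hash, last_seen_at: now });
  if (existing === undefined) return "inserted";
  return existing.composition_hash === hash ? "unchanged" : "updated";
}
```

Running the same assembly twice against an unchanged graph yields "inserted" and then "unchanged", with only last_seen_at moving forward. This is the behavior the deploy gate and the operator script both rely on.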
Negative
- Requires deploy-detection plumbing. A CI hook, a version-stamp comparison, or an explicit chain_builder_version constant; choice deferred to implementation review (see Open questions below).
- Does not GC stale anchor zombies. When a workload that previously matched ENTRY_POINT_SUBTYPES no longer matches, its chain document persists until a separate GC pass deletes it. This ADR does not introduce GC. A future ADR addresses it.
- Operator scripts must remain aware. reevaluate-findings.ts and similar tools must not be confused into thinking re-evaluation also re-materializes chains. The ADR is the durable note that explains why it does not.
- stitched_paths shares the same vulnerability. stitched_paths (introduced by sv0-platform PR #1121) is a peer-derived materialization with the same code-deploy-without-resync vulnerability. This ADR does not extend the deploy-gate trigger to that pipeline; a follow-up ADR scopes that decision. Implementers of Plan E should design the trigger mechanism so it can be extended to stitched_paths without re-architecture.
Neutral
- No schema change to execution_chains. ADR-008's schema stands as-is.
- No change to API contracts. getExecutionChainByAnchor continues to behave as today; it just gets called against a freshly-materialized collection after a deploy.
Open questions
These are deliberately left for the implementing PR — recording them here so the implementer is briefed.
- Deploy-detection mechanism. Three candidates:
  - CI file-path heuristic: the deploy workflow inspects the merge commit, sees a touch on chain-builder.ts (or the schema files), and emits an enqueue step.
  - Version-stamp on execution_chains: each chain doc carries the chain_builder_version it was assembled under; the deploy compares the current code version to the most-recent stamp in Mongo and enqueues if they diverge.
  - Explicit chain_builder_version constant in code: bumped manually when the BFS or ENTRY_POINT_SUBTYPES changes; the deploy reads the constant.

  Recommend the explicit version constant: it is auditable in the code review, it does not depend on git path heuristics that miss schema changes, and it does not require a Mongo read at deploy time. Needs implementation review.
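A sketch of how the explicit-constant option could gate the enqueue, under stated assumptions. The CHAIN_BUILDER_VERSION name and value, the needsRematerialization helper, and the idea that the deploy step can read the constant from the previously deployed release are all hypothetical; the ADR leaves the mechanism to implementation review.

```typescript
// Hypothetical constant; it would be bumped by hand whenever the BFS or
// ENTRY_POINT_SUBTYPES changes.
const CHAIN_BUILDER_VERSION = 2;

// previousVersion is the constant's value in the previously deployed release
// (assumed to be available to the deploy step), or undefined when unknown.
function needsRematerialization(
  previousVersion: number | undefined,
  currentVersion: number,
): boolean {
  // Unknown history: re-materialize rather than risk stale chains (assumption).
  if (previousVersion === undefined) return true;
  // Any difference triggers, including a rollback to an older version.
  return previousVersion !== currentVersion;
}
```

Comparing for inequality rather than "greater than" means a rollback also re-materializes, which seems the safer default because a rolled-back assembler produces different chains too. Whether that is desired is a question for the implementing PR.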
References
- ADR-008: Execution Chains Collection. Establishes execution_chains and pins assembly to sync. This ADR adds the (b) deploy-gate trigger; it does not reverse ADR-008. The companion update to ADR-008 (same PR) strikes the obsolete last_seen_at-as-GC-signal claim and tracks a follow-up issue against sv0-platform.
- 02-processing-pipeline.md §2. Job 1 step 9 (sync assembly) and §2.3 (Job 2 read-only contract). The companion update in the same PR adds a Deploy-triggered re-materialization section and corrects the stale "deprecated" label on execution_chains in the collection mutation matrix.
- sv0-platform issue #963: "Chain page is unreachable on every path in every tenant — no execution chains exist." The live bug that surfaced the gap.
- sv0-platform PR #973: fix(chain): materialize chains for AI agents + script_include + canonical AWS Step Functions (#963). The deploy that triggered the regression (merged to main 2026-05-15).
- sv0-platform issue #841 and PR #843: feat(chains): treat AWS workloads as execution-chain entry points (merged 2026-05-11). Same shape, earlier occurrence.
- Live restoration, 2026-05-19. Ops-only invocation of assembleExecutionChains against default (dev) and demo-nimbus (staging) restored 100% of dead-ended paths within minutes. No PR; the runbook entry for the operator script ships with the implementation PR for path (c).
Honored North Star clauses
- C-13 (authoring-artifact, north-star.md line 405, "SIEM landing supported, not a SIEM console"): the chain page must work if an analyst lands from a SIEM. A /access-paths/:id deep-link from a SIEM alert must render the access chain; the dead-end "No access chain materialized for this path yet" empty state breaks SIEM-cold landing. Plan E closes this by ensuring chain materialization survives chain-builder.ts vocabulary changes without requiring opportunistic re-sync.
- C-15 (LOCKED-IN-CODE via ui/src/pages/__tests__/chain-per-path-differentiation.test.tsx, sv0-platform PR #1008; north-star.md line 377): three paths sharing a workload must render distinguishable content. The 2026-05-19 regression broke per-path chain rendering for every active path on demo-nimbus (100%) and 75% on default; distinguishability collapses to zero when no chain materializes at all. The chain-per-path follow-up workstream (sv0-platform #1020) is a separate forward-looking schema; this ADR does not modify it.