ADR-019: Narrative Composition and Cluster-Overlay Accountability

Status

Accepted — 2026-04-29

Companion implementation plan: sv0-platform/docs/plans/2026-04-29-narrative-backend-redesign.md. Umbrella tracking: SecurityV0/sv0-platform#633.

Context

PR #420 on redesign/narrative-ui-pilot deployed a V2 narrative UI to pr-420-dev.securityv0.com. A 2026-04-29 critical review confirmed Sergey's suspicion that the redesign was UI-templated, not backend-driven:

ui/src/hooks/use-brief-stories.ts emits [founder draft pending] for every cluster except support-creds (the only entry in VERBATIM_CLUSTER_KEYS).
The Resolution Tracker's Create ticket / Link ticket / Track action / Attach evidence buttons have no onClick handlers.
"8 of 8 stories unowned" reduces to clusters.length because the V2 transform never assigns owner.
A real backend exists but is disconnected from the V2 UI: mitigation-actions.ts (path-level state machine), ownership-assignments.ts, posture-service.ts, risk-cluster-service.ts (with founder-authored RISK_CLUSTER_DEFS cluster prose).

The CTO walkthrough on 2026-04-29 reframed the gap. The customer-visible narrative is not pre-written prose; it is a composition over two inputs at query time:

The idea (editorial). What we want to communicate when a cluster of this kind fires. Authored once per cluster type. Lives in code; overridable per tenant.
What is forcing this to appear (data). The triggering signals from the deterministic graph + evaluator: which paths matched, how many, which owners departed when, what the top destination is.

The customer reads natural English; the platform always cites which graph node, evaluator output, or count produced each slot. LLM rendering becomes an additive future toggle over the same slot contract.

The same walkthrough also locked in the accountability model: path remains the unit-of-work (existing mitigation-actions); cluster is an overlay record that ties path-level work into a campaign.

Decision

This ADR records four decisions. The implementation detail is in the companion plan; this document is the durable record of why future Claude sessions should not re-litigate these.

Decision 1 — Narrative is composition, not pre-written prose

The customer-visible narrative for a risk cluster is computed at query time as template(idea) + slots(forcing-data):

Idea template lives per cluster_key. Default in code (extending RISK_CLUSTER_DEFS with an idea_template field). Tenant overrides in a new cluster_narratives MongoDB collection, keyed by (tenant_id, cluster_key), with a review_state of draft | published. Only published rows are read.
Slot values are pure functions over deterministic graph state — matched paths, ownership history, finding outputs, execution evidence. Each slot carries provenance pointing to the evidence IDs that produced its value.
Render is a pure function. Same template + same slot values → same rendered prose. Replayable against a baseline.
LLM render mode is a future toggle over the same slot contract. Toggle off → deterministic template render. Toggle on → LLM rewrites for tone, slot values stay authoritative. The LLM never invents numbers.
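
A minimal sketch of the composition described above, assuming a hypothetical `{{slot_name}}` placeholder syntax and a hypothetical `Slot` shape (neither is specified in this ADR). It illustrates the two properties this decision relies on: rendering is a pure function, and every substituted slot carries the evidence IDs behind its value.

```typescript
// Hypothetical slot shape: a value plus the evidence IDs that produced it.
interface Slot {
  value: string | number;
  provenance: string[]; // e.g. graph node IDs, evaluator output IDs
}

type SlotBag = Record<string, Slot>;

interface RenderedNarrative {
  text: string;
  citations: Record<string, string[]>; // slot name -> evidence IDs
}

// Pure function: the same template and slot values always yield the same output.
function renderNarrative(template: string, slots: SlotBag): RenderedNarrative {
  const citations: Record<string, string[]> = {};
  const text = template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const slot = slots[name];
    if (slot === undefined) {
      // Fail loudly rather than emit prose with a missing fact.
      throw new Error(`Template references unknown slot "${name}"`);
    }
    citations[name] = slot.provenance;
    return String(slot.value);
  });
  return { text, citations };
}

// Illustrative data only; the slot names and IDs are made up.
const exampleTemplate =
  "{{path_count}} access chains reach {{top_destination}} through accounts whose owners have departed.";
const exampleSlots: SlotBag = {
  path_count: { value: 8, provenance: ["finding:f-101", "finding:f-102"] },
  top_destination: { value: "billing-db", provenance: ["node:n-77"] },
};

const rendered = renderNarrative(exampleTemplate, exampleSlots);
console.log(rendered.text);
console.log(rendered.citations);
```

An LLM render mode, if added later, would take the same `SlotBag` as input, so the citation map stays valid regardless of which renderer produced the prose.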

This decision rejects two alternatives:

Pre-written prose blobs (per-tenant or global). Brittle: copy drifts from data, requires re-authoring on every cluster definition change.
LLM-only narrative at query time. Not falsifiable, breaks the "evidence-grade, explainable" property from 00-overview.md.

It also clarifies the relationship to ADR-005 (platform-only findings): findings remain deterministic; narrative is a presentation layer over findings, also deterministic, also explainable.

Decision 2 — Path is primary, cluster is overlay

Accountability persists at two levels:

Path-level (existing). ownership-assignments (target_type ∈ {"path", "entity"}) for who owns the work. mitigation-actions for the state machine, optimistic locking, audit trail, ticket linkage. Unchanged. Not extended.
Cluster-level (new). A new cluster_resolution_records collection, keyed by (tenant_id, cluster_key). Stores epic-style metadata: owner, target_date, blocker, parent_ticket_url, snapshot of path_ids at open time, audit trail, version (optimistic locking). Same state machine vocabulary as mitigation-actions.

The relationship is 1:N: one cluster_resolution_records row per (tenant_id, cluster_key) open epic, with N mitigation-actions rows tagged with the same cluster_key. The Tracker reads both — cluster card from cluster_resolution_records, drill-in path list from mitigation-actions filtered by cluster_key.
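
The 1:N read described above might look roughly like the following sketch. The field names on `ClusterResolutionRecord` follow the list in this decision; the `MitigationAction` shape is trimmed to the fields needed here, and its `status` field and the `buildTrackerView` helper are hypothetical. In-memory arrays stand in for the two collections.

```typescript
// Epic-style overlay record, keyed by (tenant_id, cluster_key).
interface ClusterResolutionRecord {
  tenant_id: string;
  cluster_key: string;
  owner: string;
  target_date: string; // ISO date string (assumed representation)
  blocker?: string;
  parent_ticket_url?: string;
  snapshot: { path_ids: string[] }; // captured at epic-open time
  version: number; // optimistic locking
}

// Path-level row, reduced to the fields this sketch needs.
interface MitigationAction {
  tenant_id: string;
  path_id: string;
  cluster_key?: string; // tag only, not an epic primary key
  status: string; // hypothetical field name
}

interface TrackerView {
  card: ClusterResolutionRecord; // cluster card
  paths: MitigationAction[]; // drill-in path list
}

// Joins one epic with the N path-level actions tagged with the same cluster_key.
function buildTrackerView(
  epic: ClusterResolutionRecord,
  actions: MitigationAction[]
): TrackerView {
  const paths = actions.filter(
    (a) => a.tenant_id === epic.tenant_id && a.cluster_key === epic.cluster_key
  );
  return { card: epic, paths };
}

const exampleEpic: ClusterResolutionRecord = {
  tenant_id: "t-1",
  cluster_key: "support-creds",
  owner: "alice",
  target_date: "2026-06-30",
  snapshot: { path_ids: ["p-1", "p-2"] },
  version: 1,
};
const exampleActions: MitigationAction[] = [
  { tenant_id: "t-1", path_id: "p-1", cluster_key: "support-creds", status: "proposed" },
  { tenant_id: "t-1", path_id: "p-9", status: "accepted" }, // untagged, excluded
  { tenant_id: "t-2", path_id: "p-1", cluster_key: "support-creds", status: "proposed" }, // other tenant
];

console.log(buildTrackerView(exampleEpic, exampleActions).paths.length);
```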

We considered two consolidation options and rejected both:

Option (a) — extend ownership-assignments.target_type += "cluster". Rejected: cluster-as-overlay is conceptually different from path-as-unit-of-work, and conflating them in one schema would force every consumer of ownership-assignments to handle a cluster discriminator they don't otherwise care about.
Option (b) — reuse mitigation-actions for both path-level tasks and cluster-level epics, leveraging the existing cluster_key?: string field (src/domain/mitigation-actions/types.ts:60). Rejected because:
1. mitigation-actions is path-anchored by contract: path_id, entity_id, and finding_id are required. An epic spans multiple paths and has no single owning path. Modeling an epic would force these required fields to be relaxed to optional, breaking the invariant that every existing consumer relies on (path-detail page, OLD UI remediation list, evidence-pack assembly).
2. Epic-only fields don't fit a path-level row: snapshot.path_ids[], narrative_rendered_at_capture, parent_ticket_url, campaign-scoped target_date. Adding them as optional fields scatters polymorphism through every mitigation-actions consumer.
3. State machines diverge meaningfully. Path-level: proposed → accepted → in_progress → completed → verified with verification fields. Epic-level: open → in_progress → blocked → resolved — verification happens at child path rows, not at epic level. Forcing one shape would either water down the path-level state machine or carry verify fields the epic never sets.
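
The divergence in point 3 can be sketched as two separate transition tables. The state names come from this ADR; the exact edges (for example, `blocked` returning to `in_progress`, or `in_progress` moving straight to `resolved`) are assumptions made for illustration, not the implemented state machines.

```typescript
type PathState = "proposed" | "accepted" | "in_progress" | "completed" | "verified";
type EpicState = "open" | "in_progress" | "blocked" | "resolved";

// Path-level: linear, ending in a verification step.
const PATH_TRANSITIONS: Record<PathState, PathState[]> = {
  proposed: ["accepted"],
  accepted: ["in_progress"],
  in_progress: ["completed"],
  completed: ["verified"],
  verified: [],
};

// Epic-level: no verification state; verification happens on child path rows.
const EPIC_TRANSITIONS: Record<EpicState, EpicState[]> = {
  open: ["in_progress"],
  in_progress: ["blocked", "resolved"], // assumed edges
  blocked: ["in_progress"], // assumed edge
  resolved: [],
};

// Generic check that works against either table without merging their vocabularies.
function canTransition<S extends string>(
  table: Record<S, S[]>,
  from: S,
  to: S
): boolean {
  return table[from].indexOf(to) !== -1;
}

console.log(canTransition<PathState>(PATH_TRANSITIONS, "completed", "verified"));
console.log(canTransition<EpicState>(EPIC_TRANSITIONS, "blocked", "resolved"));
```

Keeping the two tables as distinct types means a call such as moving an epic to `verified` fails at compile time instead of requiring a runtime discriminator.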

The existing mitigation-actions.cluster_key is a tag on path-level rows (used to filter "show me all actions for cluster X") — it is not, and is not promoted to, an epic primary key. Both fields stay; they describe different things at different layers.

A tenant-level config field (tenants.config.cluster_tracking_mode) controls the relationship between epic and child path actions: epic_only (default; epic stands alone, paths managed independently), epic_plus_paths (deferred; epic creation auto-creates child path actions), disabled (Tracker page hidden).
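
As a rough illustration of how the three modes might be interpreted by the Tracker, assuming a hypothetical `trackerBehavior` helper and `TrackerBehavior` shape (only the mode names and their described effects come from this ADR):

```typescript
type ClusterTrackingMode = "epic_only" | "epic_plus_paths" | "disabled";

interface TrackerBehavior {
  trackerVisible: boolean; // is the Tracker page shown at all
  autoCreateChildActions: boolean; // does epic creation create child path actions
}

// Falls back to the documented default (epic_only) when the tenant config is unset.
function trackerBehavior(mode: ClusterTrackingMode = "epic_only"): TrackerBehavior {
  switch (mode) {
    case "epic_only":
      return { trackerVisible: true, autoCreateChildActions: false };
    case "epic_plus_paths": // deferred mode; shown here only for completeness
      return { trackerVisible: true, autoCreateChildActions: true };
    case "disabled":
      return { trackerVisible: false, autoCreateChildActions: false };
  }
}

console.log(trackerBehavior());
console.log(trackerBehavior("disabled"));
```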

Decision 3 — Read-only-connector rule applies to ingestion only

Documentation update required: the literal wording in 00-overview.md:329 ("Read-only API access only — platform never writes to source systems") and 05-connectors.md is correct for connectors but reads as a blanket prohibition on platform-side writes. The same review cycle as this ADR (or immediately after) should patch those two documents to add a sentence distinguishing connector ingestion (read-only, unchanged) from platform-issued tickets (permitted, this ADR). Tracked as open question 12 in the companion implementation plan.

The connector framework's read-only rule (05-connectors.md, 00-overview.md) is about data ingestion: connectors never modify source systems they read from. The rule does not prohibit the platform from issuing tickets to external systems on the user's behalf as part of a remediation action. Ticket creation:

is initiated by the user (not the connector)
is the platform proposing a remediation, not modifying customer data
never touches PII or business records on the source side
is an additional selling point ("we close the loop, not just surface the problem")

This ADR carves out platform-issued ticket creation as a permitted outbound action. Connectors remain read-only — the carve-out does not apply to them.

For v1, outbound is Layer 0: paste-link plus deeplink-prefill. The platform composes a pre-filled URL (https://acme.atlassian.net/secure/CreateIssue.jspa?summary=…&description=…) from the cluster narrative. The user clicks; their browser opens Jira/GitHub/ServiceNow/Linear with the body pre-loaded; they click Create in the destination tool; they paste the resulting URL back. The platform never holds Jira credentials.
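
A minimal sketch of deeplink-prefill, assuming a hypothetical `buildCreateIssueDeeplink` helper. It uses only the `summary` and `description` query parameters shown in the example URL above; a real destination tool may require further parameters that this ADR does not specify.

```typescript
// Hypothetical input: text already composed from the rendered cluster narrative.
interface TicketDraft {
  summary: string;
  description: string;
}

// Builds a pre-filled URL. No credentials and no API call are involved:
// the user's own browser session in the destination tool does the create.
function buildCreateIssueDeeplink(createIssueUrl: string, draft: TicketDraft): string {
  const query = [
    "summary=" + encodeURIComponent(draft.summary),
    "description=" + encodeURIComponent(draft.description),
  ].join("&");
  return createIssueUrl + "?" + query;
}

const deeplink = buildCreateIssueDeeplink(
  "https://acme.atlassian.net/secure/CreateIssue.jspa",
  { summary: "Orphaned access to billing-db", description: "8 paths matched" }
);
console.log(deeplink);
```

Because the link is computed from the same rendered narrative the customer reads, the ticket body and the Tracker card cannot disagree at creation time.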

Layer 1 (real outbound API write) is deferred until first customer ask. It will require a follow-up ADR amendment specifying:

Per-tenant credential storage for outbound API keys
The list of external ticketing systems we'll write to (ServiceNow, Jira, Linear, GitHub Issues)
Idempotency (don't double-create if the user clicks twice)
Webhook / polling for status sync (or "no sync, link only" for v1.5)

Layer 2 (bidirectional sync) is deferred indefinitely; only entered if a paying customer requires it.

Decision 4 — Tenant override editor is deferred; v1 ships read path only

The cluster_narratives collection ships in v1 with a fully functional read path. Tenants get sensible default copy from RISK_CLUSTER_DEFS.idea_template from day one, without any authoring action. A founder or CS person can manually run db.cluster_narratives.insertOne(...) to demo or pilot the override mechanism for a specific tenant; that is enough to validate the contract.
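
The read path can be sketched as a lookup that prefers a published tenant override and otherwise falls back to the code default. The `resolveIdeaTemplate` helper is hypothetical, an in-memory array stands in for the cluster_narratives collection, and the `idea_template` field name on the override document is an assumption.

```typescript
type ReviewState = "draft" | "published";

// Assumed document shape for cluster_narratives, keyed by (tenant_id, cluster_key).
interface ClusterNarrativeOverride {
  tenant_id: string;
  cluster_key: string;
  review_state: ReviewState;
  idea_template: string;
}

// Only published overrides are read; drafts are ignored entirely.
function resolveIdeaTemplate(
  tenantId: string,
  clusterKey: string,
  codeDefaults: Record<string, string>,
  overrides: ClusterNarrativeOverride[]
): string {
  const published = overrides.filter(
    (o) =>
      o.tenant_id === tenantId &&
      o.cluster_key === clusterKey &&
      o.review_state === "published"
  );
  const hit = published[0];
  if (hit !== undefined) {
    return hit.idea_template;
  }
  const fallback = codeDefaults[clusterKey];
  if (fallback === undefined) {
    throw new Error(`No default idea_template for cluster "${clusterKey}"`);
  }
  return fallback;
}

// Illustrative data only.
const codeDefaults: Record<string, string> = {
  orphaned_sensitive: "{{path_count}} access chains reach {{top_destination}}.",
};
const storedOverrides: ClusterNarrativeOverride[] = [
  {
    tenant_id: "t-1",
    cluster_key: "orphaned_sensitive",
    review_state: "draft",
    idea_template: "{{path_count}} service principals access {{top_destination}}.",
  },
];

// The draft row is skipped, so tenant t-1 still sees the code default.
console.log(resolveIdeaTemplate("t-1", "orphaned_sensitive", codeDefaults, storedOverrides));
```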

A first-class tenant-facing editor UI for narrative overrides is deferred to v1.5+. Reasons:

v1 is about closing the gap between "polished mockup" and "real implementation," not adding new authored-content surfaces.
Until we know what a tenant actually wants to override, the editor design is speculative.
The slot contract may evolve through v1; freezing an editor UI now risks rework.

The promotion path "founder writes a great thesis for a cluster on one tenant; we want it to become the cross-tenant default" is documented as a manual procedure in the founder-driven onboarding runbook: edit RISK_CLUSTER_DEFS.idea_template directly, ship via the platform image.

Consequences

Positive

Truthful by construction. Slot values come from the graph; numbers can never drift from reality. A demo tenant cannot show a wrong count because a copywriter forgot to update prose.
Editable shape, immutable facts. A tenant can rewrite "access chains reach" to "service principals access" without touching the count or ownership fields.
LLM-additive later. Same slot bag, different render. The slot contract is the seam.
Accountability is real. Tracker buttons stop being decorative. Create epic writes to MongoDB. Path-level remediation work continues to use the existing battle-tested mitigation-actions state machine.
Outbound integration is shippable today (Layer 0). No new credential vault, no new connector category. A customer with Jira can paste a link in 10 seconds; with deeplink-prefill, the link is pre-loaded with the narrative.
Read-only-connector rule is preserved for the ingestion path it was actually designed for. Future PRs proposing outbound connector writes still require an ADR.

Negative / cost

Two new collections add operational surface area. Indexes, retention policy, baseline integration. Mitigated by following existing patterns (mitigation-actions schema is the template).
Slot contract is stickier than prose. Changing a slot name (e.g. top_path_label → top_chain_label) requires migrating every cluster_narratives document that references it. Mitigated by a versioned template_id (e.g. orphaned_sensitive.v3 → .v4); an old override is marked stale until it is migrated.
Cluster-as-overlay introduces a model the team hasn't had to reason about before. Some users may expect a cluster to be a first-class accountability object (owner-of-cluster), not an epic. The OPEN QUESTIONS in the implementation plan flag this for Sergey + CISO advisory before GA.
Layer 0 puts the user in the loop on every ticket. A heavy-volume tenant will ask for Layer 1 quickly. Mitigated by treating Layer 1 as a "first customer pulls it" event, not "we'll get to it eventually."
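
The versioned template_id mitigation above could be checked along these lines. This sketch assumes the `<cluster_key>.v<N>` format implied by the `orphaned_sensitive.v3` example; the `parseTemplateId` and `isOverrideStale` helpers are hypothetical.

```typescript
interface TemplateId {
  clusterKey: string;
  version: number;
}

// Parses IDs of the assumed form "<cluster_key>.v<integer>".
function parseTemplateId(id: string): TemplateId {
  const match = /^(.+)\.v(\d+)$/.exec(id);
  if (match === null) {
    throw new Error(`Malformed template_id "${id}"`);
  }
  return { clusterKey: match[1], version: parseInt(match[2], 10) };
}

// An override is stale when it was written against an older slot contract
// than the one the code default currently declares.
function isOverrideStale(overrideTemplateId: string, currentTemplateId: string): boolean {
  const pinned = parseTemplateId(overrideTemplateId);
  const current = parseTemplateId(currentTemplateId);
  if (pinned.clusterKey !== current.clusterKey) {
    throw new Error("template_ids belong to different clusters");
  }
  return pinned.version < current.version;
}

console.log(isOverrideStale("orphaned_sensitive.v3", "orphaned_sensitive.v4"));
console.log(isOverrideStale("orphaned_sensitive.v4", "orphaned_sensitive.v4"));
```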

Risks deliberately accepted

Cross-tenant author propagation is manual. No automated promotion from tenant override → code default. Acceptable while the platform has <10 tenants. Revisit at scale.
Snapshot drift. When an epic is opened, the path snapshot is captured. Live cluster membership may diverge over time. The Tracker exposes a refresh-snapshot action; the audit trail records snapshot regenerations.
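
Snapshot drift can be made visible with a simple set difference between the path IDs captured at epic-open time and the live cluster membership. The `diffSnapshot` helper and `SnapshotDrift` shape below are hypothetical, shown only to illustrate what a refresh-snapshot action would need to compute.

```typescript
interface SnapshotDrift {
  added: string[]; // in the live cluster, not in the snapshot
  removed: string[]; // in the snapshot, no longer in the live cluster
}

function diffSnapshot(snapshotPathIds: string[], livePathIds: string[]): SnapshotDrift {
  const added = livePathIds.filter((id) => snapshotPathIds.indexOf(id) === -1);
  const removed = snapshotPathIds.filter((id) => livePathIds.indexOf(id) === -1);
  return { added, removed };
}

// Illustrative IDs only.
const drift = diffSnapshot(["p-1", "p-2", "p-3"], ["p-2", "p-3", "p-4"]);
console.log(drift);
```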

Alternatives considered

A. Per-tenant pre-written prose in `cluster_narratives`, no slots

Each tenant authors a complete description / whyItMatters / topAction for each cluster. No template, no slots. Rejected: drift between prose and data; copy becomes stale on every sync.

B. Extend `ownership-assignments.target_type += "cluster"`

Reuse the existing collection for cluster-level ownership. Rejected: conflates distinct concepts (epic vs ownership-of-task), forces every consumer to discriminate, and the ownership-assignments schema doesn't naturally model "snapshot of children at epic-open time."

C. LLM-only narrative

Generate the entire prose at query time from the graph, no founder authoring at all. Rejected: not deterministic, not falsifiable. Breaks evidence-grade explainability.

D. Layer 1 (real outbound API) for v1

Build the credential vault, OAuth flows, and idempotency machinery up front. Rejected: 6+ weeks of work that no current customer has asked for. Layer 0 is shippable today and may cover 80% of the value.

References

Implementation plan: sv0-platform/docs/plans/2026-04-29-narrative-backend-redesign.md
Umbrella issue: SecurityV0/sv0-platform#633
Related: ADR-005 (platform-only findings), ADR-009 (OAA export projection), ADR-013 (container registry)
Critical review: local handoff sv0-platform/.claude/session-notes/2026-04-29-pr-420-deploy-and-critical-review-handoff.md (gitignored, not committed)
Existing real-but-disconnected backend cited in the plan:
- sv0-platform/src/api/routes/mitigation-actions.ts
- sv0-platform/src/api/routes/ownership-assignments.ts
- sv0-platform/src/services/risk-cluster-service.ts (RISK_CLUSTER_DEFS)
- sv0-platform/src/services/posture-service.ts
- sv0-platform/src/services/remediation-service.ts
