ADR-019: Narrative Composition and Cluster-Overlay Accountability
Status
Accepted — 2026-04-29
Companion implementation plan: sv0-platform/docs/plans/2026-04-29-narrative-backend-redesign.md. Umbrella tracking: SecurityV0/sv0-platform#633.
Context
PR #420 on redesign/narrative-ui-pilot deployed a V2 narrative UI to pr-420-dev.securityv0.com. A 2026-04-29 critical review confirmed Sergey's suspicion that the redesign was UI-templated, not backend-driven:
- ui/src/hooks/use-brief-stories.ts emits [founder draft pending] for every cluster except support-creds (the only entry in VERBATIM_CLUSTER_KEYS).
- The Resolution Tracker's Create ticket / Link ticket / Track action / Attach evidence buttons have no onClick handlers.
- "8 of 8 stories unowned" reduces to clusters.length because the V2 transform never assigns owner.
- Real backend exists but is disconnected: mitigation-actions.ts (path-level state machine), ownership-assignments.ts, posture-service.ts, risk-cluster-service.ts (with founder-authored RISK_CLUSTER_DEFS cluster prose).
The CTO walkthrough on 2026-04-29 reframed the gap. The customer-visible narrative is not pre-written prose; it is a composition over two inputs at query time:
- The idea (editorial). What we want to communicate when a cluster of this kind fires. Authored once per cluster type. Lives in code; overridable per tenant.
- What is forcing this to appear (data). The triggering signals from the deterministic graph + evaluator: which paths matched, how many, which owners departed when, what the top destination is.
The customer reads natural English; the platform always cites which graph node, evaluator output, or count produced each slot. LLM rendering becomes an additive future toggle over the same slot contract.
The same walkthrough also locked in the accountability model: path remains the unit-of-work (existing mitigation-actions); cluster is an overlay record that ties path-level work into a campaign.
Decision
This ADR records four decisions. The implementation detail is in the companion plan; this document is the durable record of why future Claude sessions should not re-litigate these.
Decision 1 — Narrative is composition, not pre-written prose
The customer-visible narrative for a risk cluster is computed at query time as template(idea) + slots(forcing-data):
- Idea template lives per cluster_key. Default in code (extending RISK_CLUSTER_DEFS with an idea_template field). Tenant overrides in a new cluster_narratives MongoDB collection, keyed by (tenant_id, cluster_key), with a review_state of draft | published. Only published rows are read.
- Slot values are pure functions over deterministic graph state: matched paths, ownership history, finding outputs, execution evidence. Each slot carries provenance pointing to the evidence IDs that produced its value.
- Render is a pure function. Same template + same slot values → same rendered prose. Replayable against a baseline.
- LLM render mode is a future toggle over the same slot contract. Toggle off → deterministic template render. Toggle on → LLM rewrites for tone, slot values stay authoritative. The LLM never invents numbers.
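The slot contract and pure render described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Slot` and `SlotBag` shapes, the `{{slot_name}}` placeholder syntax, the `renderNarrative` name, and the sample template and evidence IDs are all assumptions made for the example.

```typescript
// Hypothetical shapes: the ADR names slots and provenance but not their fields.
interface Slot {
  value: string | number;
  provenance: string[]; // evidence IDs that produced the value
}

type SlotBag = Record<string, Slot>;

interface RenderedNarrative {
  text: string;
  citations: Record<string, string[]>; // slot name -> evidence IDs
}

// Pure function: same template + same slot values -> same rendered prose.
// Placeholders use an assumed {{slot_name}} syntax.
function renderNarrative(template: string, slots: SlotBag): RenderedNarrative {
  const citations: Record<string, string[]> = {};
  const text = template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const slot = slots[name];
    if (slot === undefined) {
      // Fail loudly rather than inventing a value.
      throw new Error(`Missing slot: ${name}`);
    }
    citations[name] = slot.provenance;
    return String(slot.value);
  });
  return { text, citations };
}

// Illustrative template and slot values (not real cluster copy).
const exampleTemplate =
  "{{path_count}} access chains reach {{top_destination}} through accounts whose owners have departed.";

const exampleSlots: SlotBag = {
  path_count: { value: 8, provenance: ["path:p1", "path:p2"] },
  top_destination: { value: "billing-db", provenance: ["node:n42"] },
};

const exampleNarrative = renderNarrative(exampleTemplate, exampleSlots);
```

Because the function reads nothing but its arguments, replaying it against a baseline is a matter of storing the template and the slot bag. An LLM render mode would receive the same `SlotBag` and could be checked against the same `citations`.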
This decision rejects two alternatives:
- Pre-written prose blobs (per-tenant or global). Brittle: copy drifts from data, requires re-authoring on every cluster definition change.
- LLM-only narrative at query time. Not falsifiable, breaks the "evidence-grade, explainable" property from 00-overview.md.
It also clarifies the relationship to ADR-005 (platform-only findings): findings remain deterministic; narrative is a presentation layer over findings, also deterministic, also explainable.
Decision 2 — Path is primary, cluster is overlay
Accountability persists at two levels:
- Path-level (existing). ownership-assignments (target_type ∈ {"path", "entity"}) for who owns the work. mitigation-actions for the state machine, optimistic locking, audit trail, ticket linkage. Unchanged. Not extended.
- Cluster-level (new). A new cluster_resolution_records collection, keyed by (tenant_id, cluster_key). Stores epic-style metadata: owner, target_date, blocker, parent_ticket_url, snapshot of path_ids at open time, audit trail, version (optimistic locking). Same state machine vocabulary as mitigation-actions.
The relationship is 1:N: one cluster_resolution_records row per (tenant_id, cluster_key) open epic, with N mitigation-actions rows tagged with the same cluster_key. The Tracker reads both — cluster card from cluster_resolution_records, drill-in path list from mitigation-actions filtered by cluster_key.
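A rough sketch of the Tracker read described here, assuming trimmed-down record shapes. The interface fields beyond those named in this ADR, the `buildTrackerCard` helper, and the sample rows are hypothetical; the real schemas are in the companion plan.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface ClusterResolutionRecord {
  tenant_id: string;
  cluster_key: string;
  owner: string | null;
  target_date: string | null;
  version: number; // optimistic locking
}

interface MitigationAction {
  tenant_id: string;
  path_id: string;
  cluster_key?: string; // tag on a path-level row, not an epic key
  state: string;
}

interface TrackerCard {
  epic: ClusterResolutionRecord; // cluster card
  paths: MitigationAction[]; // drill-in path list
}

// One epic row plus the N path-level rows tagged with the same cluster_key.
function buildTrackerCard(
  epic: ClusterResolutionRecord,
  actions: MitigationAction[]
): TrackerCard {
  const paths = actions.filter(
    (a) => a.tenant_id === epic.tenant_id && a.cluster_key === epic.cluster_key
  );
  return { epic, paths };
}

const exampleEpic: ClusterResolutionRecord = {
  tenant_id: "acme",
  cluster_key: "support-creds",
  owner: null,
  target_date: null,
  version: 1,
};

const exampleActions: MitigationAction[] = [
  { tenant_id: "acme", path_id: "p1", cluster_key: "support-creds", state: "proposed" },
  { tenant_id: "acme", path_id: "p2", cluster_key: "support-creds", state: "accepted" },
  { tenant_id: "acme", path_id: "p3", state: "proposed" }, // untagged: not part of the epic
];

const exampleCard = buildTrackerCard(exampleEpic, exampleActions);
```

In a real service the filter would presumably be a database query on (tenant_id, cluster_key) rather than an in-memory pass; the in-memory form only shows which rows belong to which layer.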
We considered two consolidation options and rejected both:
- Option (a): extend ownership-assignments.target_type += "cluster". Rejected: cluster-as-overlay is conceptually different from path-as-unit-of-work, and conflating them in one schema would force every consumer of ownership-assignments to handle a cluster discriminator they don't otherwise care about.
- Option (b): reuse mitigation-actions for both path-level tasks and cluster-level epics, leveraging the existing cluster_key?: string field (src/domain/mitigation-actions/types.ts:60). Rejected because:
  - mitigation-actions is path-anchored by contract (required path_id, entity_id, finding_id). An epic spans 8 paths and has no single owning path. Modeling an epic would force these required fields to be relaxed to optional, breaking the invariant that every existing consumer relies on (path-detail page, OLD UI remediation list, evidence-pack assembly).
  - Epic-only fields don't fit a path-level row: snapshot.path_ids[], narrative_rendered_at_capture, parent_ticket_url, campaign-scoped target_date. Adding them as optional fields scatters polymorphism through every mitigation-actions consumer.
  - State machines diverge meaningfully. Path-level: proposed → accepted → in_progress → completed → verified with verification fields. Epic-level: open → in_progress → blocked → resolved; verification happens at child path rows, not at epic level. Forcing one shape would either water down the path-level state machine or carry verify fields the epic never sets.
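The divergence between the two state machines can be made concrete with separate types. The state names below come from this ADR; the transition tables are an assumption, since the ADR lists the states in order but does not spell out every allowed transition (for example, how an epic leaves blocked).

```typescript
type PathState = "proposed" | "accepted" | "in_progress" | "completed" | "verified";
type EpicState = "open" | "in_progress" | "blocked" | "resolved";

// Assumed transitions: path-level work moves strictly forward.
const PATH_TRANSITIONS: Record<PathState, PathState[]> = {
  proposed: ["accepted"],
  accepted: ["in_progress"],
  in_progress: ["completed"],
  completed: ["verified"],
  verified: [],
};

// Assumed transitions: an epic can be blocked and unblocked, and has no
// verification step of its own.
const EPIC_TRANSITIONS: Record<EpicState, EpicState[]> = {
  open: ["in_progress"],
  in_progress: ["blocked", "resolved"],
  blocked: ["in_progress", "resolved"],
  resolved: [],
};

function canTransition<S extends string>(
  table: Record<S, S[]>,
  from: S,
  to: S
): boolean {
  return table[from].indexOf(to) !== -1;
}

const pathCanVerify = canTransition<PathState>(PATH_TRANSITIONS, "completed", "verified");
const epicCanUnblock = canTransition<EpicState>(EPIC_TRANSITIONS, "blocked", "in_progress");
```

Keeping the types separate means an assignment such as `const s: EpicState = "verified"` fails to compile, which is the property a shared row shape would lose.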
The existing mitigation-actions.cluster_key is a tag on path-level rows (used to filter "show me all actions for cluster X") — it is not, and is not promoted to, an epic primary key. Both fields stay; they describe different things at different layers.
A tenant-level config field (tenants.config.cluster_tracking_mode) controls the relationship between epic and child path actions: epic_only (default; epic stands alone, paths managed independently), epic_plus_paths (deferred; epic creation auto-creates child path actions), disabled (Tracker page hidden).
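As a small sketch, the three modes map naturally onto a string union. The `TrackerBehavior` shape and the `trackerBehavior` helper are hypothetical names used only to show how a consumer might branch on the setting.

```typescript
type ClusterTrackingMode = "epic_only" | "epic_plus_paths" | "disabled";

interface TrackerBehavior {
  trackerVisible: boolean;
  autoCreateChildActions: boolean;
}

// Defaults to epic_only, matching the default stated in this ADR.
function trackerBehavior(mode: ClusterTrackingMode = "epic_only"): TrackerBehavior {
  switch (mode) {
    case "epic_only":
      // Epic stands alone; paths are managed independently.
      return { trackerVisible: true, autoCreateChildActions: false };
    case "epic_plus_paths":
      // Deferred mode: epic creation would auto-create child path actions.
      return { trackerVisible: true, autoCreateChildActions: true };
    case "disabled":
      // Tracker page hidden.
      return { trackerVisible: false, autoCreateChildActions: false };
  }
}

const defaultBehavior = trackerBehavior();
```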
Decision 3 — Read-only-connector rule applies to ingestion only
Documentation update required: the literal wording in 00-overview.md:329 ("Read-only API access only — platform never writes to source systems") and 05-connectors.md is correct for connectors but reads as a blanket prohibition on platform-side writes. The same review cycle as this ADR (or immediately after) should patch those two documents to add a sentence distinguishing connector ingestion (read-only, unchanged) from platform-issued tickets (permitted, this ADR). Tracked as open question 12 in the companion implementation plan.
The connector framework's read-only rule (05-connectors.md, 00-overview.md) is about data ingestion: connectors never modify source systems they read from. The rule does not prohibit the platform from issuing tickets to external systems on the user's behalf as part of a remediation action. Ticket creation:
- is initiated by the user (not the connector)
- is the platform proposing a remediation, not modifying customer data
- never touches PII or business records on the source side
- is an additional selling point ("we close the loop, not just surface the problem")
This ADR carves out platform-issued ticket creation as a permitted outbound action. Connectors remain read-only — the carve-out does not apply to them.
For v1, outbound is Layer 0: paste-link plus deeplink-prefill. The platform composes a pre-filled URL (https://acme.atlassian.net/secure/CreateIssue.jspa?summary=…&description=…) from the cluster narrative. The user clicks; their browser opens Jira/GitHub/ServiceNow/Linear with the body pre-loaded; they click Create in the destination tool; they paste the resulting URL back. The platform never holds Jira credentials.
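A minimal sketch of the deeplink-prefill step, assuming the destination accepts `summary` and `description` query parameters as in the example URL above. The `buildPrefillUrl` helper and the sample strings are hypothetical; each destination tool's actual prefill parameters would need to be confirmed against its own documentation.

```typescript
// Assumption: the destination tool reads `summary` and `description`
// from the query string, as in the example URL in the text.
function buildPrefillUrl(
  createIssueUrl: string,
  summary: string,
  description: string
): string {
  const query = [
    `summary=${encodeURIComponent(summary)}`,
    `description=${encodeURIComponent(description)}`,
  ].join("&");
  // Append to an existing query string if the base URL already has one.
  const separator = createIssueUrl.indexOf("?") === -1 ? "?" : "&";
  return `${createIssueUrl}${separator}${query}`;
}

const prefillUrl = buildPrefillUrl(
  "https://acme.atlassian.net/secure/CreateIssue.jspa",
  "Orphaned access: 8 paths",
  "8 access chains reach billing-db."
);
// prefillUrl ===
// "https://acme.atlassian.net/secure/CreateIssue.jspa?summary=Orphaned%20access%3A%208%20paths&description=8%20access%20chains%20reach%20billing-db."
```

Nothing here touches credentials or the destination's API: the platform only produces a string, and the user's own browser session does the write.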
Layer 1 (real outbound API write) is deferred until first customer ask. It will require a follow-up ADR amendment specifying:
- Per-tenant credential storage for outbound API keys
- The list of source systems we'll write to (ServiceNow, Jira, Linear, GitHub Issues)
- Idempotency (don't double-create if the user clicks twice)
- Webhook / polling for status sync (or "no sync, link only" for v1.5)
Layer 2 (bidirectional sync) is deferred indefinitely; only entered if a paying customer requires it.
Decision 4 — Tenant override editor is deferred; v1 ships read path only
The cluster_narratives collection ships in v1 with a fully-functional read path. Tenants get sensible default copy from RISK_CLUSTER_DEFS.idea_template from day one, without any authoring action. A founder or CS person can manually db.cluster_narratives.insertOne(...) to demo or pilot the override mechanism for a specific tenant; that's enough to validate the contract.
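The v1 read path could look roughly like this. The sketch assumes the override document carries an `idea_template` field and uses an in-memory array in place of the cluster_narratives collection; `DEFAULT_IDEA_TEMPLATES` stands in for RISK_CLUSTER_DEFS, and the template strings are invented for the example.

```typescript
type ReviewState = "draft" | "published";

// Hypothetical document shape for the cluster_narratives collection.
interface ClusterNarrativeOverride {
  tenant_id: string;
  cluster_key: string;
  review_state: ReviewState;
  idea_template: string;
}

// Stand-in for RISK_CLUSTER_DEFS[...].idea_template; illustrative copy only.
const DEFAULT_IDEA_TEMPLATES: Record<string, string> = {
  "support-creds": "{{path_count}} support credentials reach {{top_destination}}.",
};

// Published tenant override wins; otherwise fall back to the code default.
function resolveIdeaTemplate(
  tenantId: string,
  clusterKey: string,
  overrides: ClusterNarrativeOverride[]
): string {
  const published = overrides.filter(
    (o) =>
      o.tenant_id === tenantId &&
      o.cluster_key === clusterKey &&
      o.review_state === "published" // draft rows are never read
  );
  const match = published[0];
  if (match !== undefined) {
    return match.idea_template;
  }
  const fallback = DEFAULT_IDEA_TEMPLATES[clusterKey];
  if (fallback === undefined) {
    throw new Error(`No default idea template for cluster: ${clusterKey}`);
  }
  return fallback;
}

const draftOnly: ClusterNarrativeOverride[] = [
  { tenant_id: "acme", cluster_key: "support-creds", review_state: "draft", idea_template: "WIP" },
];
const resolvedWithDraft = resolveIdeaTemplate("acme", "support-creds", draftOnly);
```

A manually inserted row only takes effect once its review_state is published, which is what makes the manual pilot procedure safe to try on a live tenant.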
A first-class tenant-facing editor UI for narrative overrides is deferred to v1.5+. Reasons:
- v1 is about closing the gap between "polished mockup" and "real implementation," not adding new authored-content surfaces.
- Until we know what a tenant actually wants to override, the editor design is speculative.
- The slot contract may evolve through v1; freezing an editor UI now risks rework.
The promotion path "founder writes a great thesis for a cluster on one tenant; we want it to become the cross-tenant default" is documented as a manual procedure in the founder-driven onboarding runbook: edit RISK_CLUSTER_DEFS.idea_template directly, ship via the platform image.
Consequences
Positive
- Truthful by construction. Slot values come from the graph; numbers can never drift from reality. A demo tenant cannot show a wrong count because a copywriter forgot to update prose.
- Editable shape, immutable facts. A tenant can rewrite "access chains reach" to "service principals access" without touching the count or ownership fields.
- LLM-additive later. Same slot bag, different render. The slot contract is the seam.
- Accountability is real. Tracker buttons stop being decorative. Create epic writes to MongoDB. Path-level remediation work continues to use the existing battle-tested mitigation-actions state machine.
- Outbound integration is shippable today (Layer 0). No new credential vault, no new connector category. A customer with Jira can paste a link in 10 seconds; with deeplink-prefill, the link is pre-loaded with the narrative.
- Read-only-connector rule is preserved for the ingestion path it was actually designed for. Future PRs proposing outbound connector writes still require an ADR.
Negative / cost
- Two new collections add operational surface area. Indexes, retention policy, baseline integration. Mitigated by following existing patterns (mitigation-actions schema is the template).
- Slot contract is stickier than prose. Changing a slot name (e.g. top_path_label → top_chain_label) requires migrating every cluster_narratives document that references it. Mitigated by versioned template_id (e.g. orphaned_sensitive.v3 → .v4); old override is marked stale until it migrates.
- Cluster-as-overlay introduces a model the team hasn't had to reason about before. Some users may expect a cluster to be a first-class accountability object (owner-of-cluster), not an epic. The OPEN QUESTIONS in the implementation plan flag this for Sergey + CISO advisory before GA.
- Layer 0 puts the user in the loop on every ticket. A heavy-volume tenant will ask for Layer 1 quickly. Mitigated by treating Layer 1 as a "first customer pulls it" event, not "we'll get to it eventually."
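The versioned template_id mitigation above might be checked at read time along these lines. The sketch assumes an override is stale whenever its template_id differs from the current code default, and that a stale override falls back to the default copy; the fallback behavior and the `selectTemplate` name are assumptions, not something this ADR specifies.

```typescript
interface VersionedTemplate {
  template_id: string; // e.g. "orphaned_sensitive.v4"
  idea_template: string;
}

interface TemplateSelection {
  template: string;
  staleOverride: boolean;
}

// Assumption: a version mismatch marks the override stale and the
// code default is rendered until the override is migrated.
function selectTemplate(
  codeDefault: VersionedTemplate,
  override: VersionedTemplate | null
): TemplateSelection {
  if (override === null) {
    return { template: codeDefault.idea_template, staleOverride: false };
  }
  if (override.template_id !== codeDefault.template_id) {
    return { template: codeDefault.idea_template, staleOverride: true };
  }
  return { template: override.idea_template, staleOverride: false };
}

const v4Default: VersionedTemplate = {
  template_id: "orphaned_sensitive.v4",
  idea_template: "Top chain: {{top_chain_label}}",
};
const v3Override: VersionedTemplate = {
  template_id: "orphaned_sensitive.v3",
  idea_template: "Top path: {{top_path_label}}", // references the renamed slot
};

const staleSelection = selectTemplate(v4Default, v3Override);
```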
Risks deliberately accepted
- Cross-tenant author propagation is manual. No automated promotion from tenant override → code default. Acceptable while the platform has <10 tenants. Revisit at scale.
- Snapshot drift. When an epic is opened, the path snapshot is captured. Live cluster membership may diverge over time. The Tracker exposes a refresh-snapshot action; the audit trail records snapshot regenerations.
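Snapshot drift is a set difference between the path_ids captured at open time and the live cluster membership. The sketch below shows one way to compute it; the `SnapshotDrift` shape and `diffSnapshot` name are hypothetical, and how the Tracker surfaces the result is left to the implementation plan.

```typescript
interface SnapshotDrift {
  added: string[]; // in the live cluster, not in the snapshot
  removed: string[]; // in the snapshot, no longer in the live cluster
}

function diffSnapshot(snapshotPathIds: string[], livePathIds: string[]): SnapshotDrift {
  const added = livePathIds.filter((id) => snapshotPathIds.indexOf(id) === -1);
  const removed = snapshotPathIds.filter((id) => livePathIds.indexOf(id) === -1);
  return { added, removed };
}

function hasDrifted(drift: SnapshotDrift): boolean {
  return drift.added.length > 0 || drift.removed.length > 0;
}

// Illustrative IDs only.
const exampleDrift = diffSnapshot(["p1", "p2", "p3"], ["p2", "p3", "p4"]);
```

A refresh-snapshot action could record `exampleDrift` in the audit trail alongside the new snapshot, so a reviewer can see what changed between regenerations.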
Alternatives considered
A. Per-tenant pre-written prose in cluster_narratives, no slots
Each tenant authors a complete description / whyItMatters / topAction for each cluster. No template, no slots. Rejected: drift between prose and data; copy becomes stale on every sync.
B. Extend ownership-assignments.target_type += "cluster"
Reuse the existing collection for cluster-level ownership. Rejected: conflates distinct concepts (epic vs ownership-of-task), forces every consumer to discriminate, and the ownership-assignments schema doesn't naturally model "snapshot of children at epic-open time."
C. LLM-only narrative
Generate the entire prose at query time from the graph, no founder authoring at all. Rejected: not deterministic, not falsifiable. Breaks evidence-grade explainability.
D. Layer 1 (real outbound API) for v1
Build the credential vault, OAuth flows, and idempotency machinery up front. Rejected: 6+ weeks of work that no current customer has asked for. Layer 0 is shippable today and may cover 80% of the value.
References
- Implementation plan: sv0-platform/docs/plans/2026-04-29-narrative-backend-redesign.md
- Umbrella issue: SecurityV0/sv0-platform#633
- Related: ADR-005 (platform-only findings), ADR-009 (OAA export projection), ADR-013 (container registry)
- Critical review: local handoff sv0-platform/.claude/session-notes/2026-04-29-pr-420-deploy-and-critical-review-handoff.md (gitignored, not committed)
- Existing real-but-disconnected backend cited in the plan:
  - sv0-platform/src/api/routes/mitigation-actions.ts
  - sv0-platform/src/api/routes/ownership-assignments.ts
  - sv0-platform/src/services/risk-cluster-service.ts (RISK_CLUSTER_DEFS)
  - sv0-platform/src/services/posture-service.ts
  - sv0-platform/src/services/remediation-service.ts