Combined Platform Pipeline Architecture

Date: 2026-02-18
Status: Proposed - revised after critical review (CTO/CISO/CEO/Architect lenses)
Combines: W1.1 (Doc 14) + Phase 4 (Reconciled Roadmap 4D/4F/4I)
Scope boundary: This document defines W1.1 implementation scope (persistent authority paths + path-level temporal findings). Baseline W1 wedge docs remain valid for conceptual UX/logic framing but do not override this architecture plan's persistence/lifecycle contracts.

0. Critical Review Outcomes

This document was reviewed for logical consistency, CISO workflow fit, CEO product direction, and CTO operational viability.

Terminology decision (2026-02-18): "Finding" remains the user-facing, API-facing, and UI-facing term. There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." Authority Paths are the primary durable objects and navigation units. Findings are extended with path-level fields (path_id, effective_from, resolved_at, intervals[]) to support temporal lifecycle on paths.

Severity	Finding	Resolution in this revision
High	Finding ID model (`hash(tenant, path, type)`) did not clearly support open-resolve-reopen history while staying idempotent	Keep deterministic finding key, but make interval history explicit via append-only `intervals[]` on `FindingDoc`
High	Pipeline target described correctness but not run-time observability/alerting needed for production operations	Add explicit operational requirement and dedicated architecture doc reference: `architecture/02-processing-pipeline.md` (sections 9-11)
High	UX screens needed to match Feb 18 mockups: Overview, Risk Cluster Detail, Authority Paths List, Authority Path Detail	Implementation sequence updated with UX phases and visual QA fix items
Medium	Retry, stall handling, and failure escalation were implied but not specified	Define as mandatory execution concerns and track in implementation sequence
Medium	Cleanup/deprecation could remove legacy paths before reliability is proven	Add observability gates before cleanup

1. Why Combine

Two planned changes touch the same pipeline:

Planned Change	What It Does	Pipeline Stage Affected
W1.1: Persistent Authority Paths	Materialize durable path records from entity graph	After entity upsert, before evaluation
W1.1: Path-Level Findings	Time-bound findings on specific paths (with `path_id`, `effective_from`, `intervals[]`)	Evaluation stage
4F: Import → Resolve → Reconcile → Project → Evaluate → Publish	Restructure ingestion into explicit stages	Entire pipeline
4I: Platform-side security_relevance	Move classification from connector to platform	Evaluation stage
4D: Platform-side correlation	Cross-connector entity matching	Reconcile stage

Building W1.1 authority paths now and then rebuilding the pipeline in Phase 4 means doing the work twice. Combining them means we build the target pipeline once with all capabilities.

The key insight: Authority Path materialization IS the "Project" stage of 4F. Path-level finding evaluation IS the "Evaluate" stage. They're not additions to Phase 4 — they are Phase 4.

2. Current Pipeline (What Exists Today)

POST /ingest/normalized-graph
  │
  ▼
┌─────────────────────────────────────────────────┐
│  Job 1: sync_ingestion                          │
│                                                 │
│  1. Create ConnectorSyncDoc (running)           │
│  2. transformGraph() → entities + evidence      │
│  3. computeDiff() → events, changed/created IDs │
│  4. upsertEntities()                            │
│  5. insertEvents()                              │
│  6. insertEntityVersions() for changed          │
│  7. soft-delete absent entities                 │
│  8. upsertExecutionEvidence()                   │
│  9. materializeExecutionPaths()    ← paths      │
│ 10. assembleExecutionChains()      ← chains     │
│ 11. Update sync → completed + metrics           │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│  Job 2: evaluate_findings                       │
│                                                 │
│  1. Gate: sync.status === "completed"           │
│  2. Query entities (workloads, identities, etc) │
│  3. Run 12 rules against each entity            │
│  4. Upsert FindingDocs                          │
│  5. Enqueue build_evidence_pack per finding     │
└──────────────────────┬──────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────┐
│  Job 3..N: build_evidence_pack (per finding)    │
│                                                 │
│  1. Fetch finding + entity + related entities   │
│  2. Fetch evidence + versions + events          │
│  3. Build 9-section evidence pack               │
│  4. Compute integrity hash (SHA256 + tenant)    │
│  5. Insert EvidencePackDoc                      │
│  6. Update finding with evidence_pack_id        │
└─────────────────────────────────────────────────┘

Problems with current pipeline:

Execution paths are embedded arrays on entities — not queryable, not versioned, rewritten every sync
Execution chains are workload-centric (1 chain per workload) — wrong grain for authority path investigation
Findings are entity-bound — "entity E has orphaned ownership" not "path P has orphaned ownership"
No temporal tracking on findings — no effective_from, no finding duration on specific paths
No path-level finding lifecycle — no open/resolve/reopen history, no intervals[]
Connector does too much — builds NormalizedGraph with pre-computed relationships, classification, filtering

3. Target Pipeline (Combined W1.1 + Phase 4)

POST /ingest/normalized-graph     (current — preserved for backward compat)
POST /ingest/discovered-entities  (new — ADR-004 flat entity format)
  │
  ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 1: IMPORT                                                │
│  Worker: sync_import                                            │
│                                                                 │
│  Accept either NormalizedGraph (legacy) or DiscoveredEntities   │
│  (ADR-004). Normalize to internal entity representation.        │
│                                                                 │
│  Steps:                                                         │
│    1. Create ConnectorSyncDoc (status: importing)               │
│    2. transformGraph() or transformDiscoveredEntities()          │
│    3. Validate entity types, required fields, edge targets      │
│    4. Output: RawEntityBatch (entities + evidence + edges)      │
│                                                                 │
│  Writes: connector_syncs                                        │
│  Reads: nothing                                                 │
│  Correctness: schema validation, duplicate detection            │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 2: RESOLVE                                               │
│  Worker: sync_resolve                                           │
│                                                                 │
│  Resolve cross-entity relationships. Today this happens in the  │
│  connector (EdgeResolver). Target: platform does it.            │
│                                                                 │
│  Steps:                                                         │
│    1. computeDiff() — detect created/changed/deleted entities   │
│    2. Resolve edges: client_id matching (OAuth→SP), script-text │
│       search (workload→REST), trigger matching                  │
│    3. Classify entities platform-side:                           │
│       - execution_mode (autonomous/operator_assisted/etc)       │
│       - security_relevance (active_external/dormant/internal)   │
│       - egress_category (from endpoint analysis)                │
│    4. upsertEntities()                                          │
│    5. insertEvents()                                            │
│    6. insertEntityVersions()                                    │
│    7. upsertExecutionEvidence()                                 │
│    8. Update sync status: resolved                              │
│                                                                 │
│  Writes: entities, events, entity_versions, execution_evidence  │
│  Reads: entities (existing state for diff)                      │
│  Correctness: diff produces deterministic events,               │
│    entity IDs are stable hashes, versions are temporal           │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 3: RECONCILE (future — Phase 4D)                         │
│  Worker: sync_reconcile                                         │
│                                                                 │
│  Cross-connector entity matching. When multiple connectors      │
│  discover the same real-world entity (e.g., Entra SP seen by    │
│  both Entra connector and ServiceNow connector), reconcile      │
│  into a single canonical entity.                                │
│                                                                 │
│  Steps:                                                         │
│    1. Match entities across source_systems by deterministic      │
│       keys (client_id, objectId, email, sys_id)                 │
│    2. Merge properties (prefer source-of-truth system)          │
│    3. Create/update SAME_AS relationships                       │
│    4. Update sync status: reconciled                            │
│                                                                 │
│  Writes: entities (merge), relationships                        │
│  Reads: entities (cross-system query)                           │
│  Correctness: deterministic matching rules, no fuzzy/ML,        │
│    merge conflicts logged as events                             │
│                                                                 │
│  NOTE: Can be a no-op initially (single connector).             │
│  Becomes relevant with 2+ connectors per tenant.                │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 4: PROJECT                                               │
│  Worker: sync_project                                           │
│                                                                 │
│  Materialize durable Authority Paths from the entity graph.     │
│  This is the core new stage — replaces embedded execution_paths │
│  and execution_chains with persistent, versioned path records.  │
│                                                                 │
│  Steps:                                                         │
│    1. For each affected workload/identity entity:               │
│       a. BFS: HAS_ROLE → GRANTS → APPLIES_TO → resource        │
│       b. Follow RUNS_AS → identity (cross-lookup)               │
│       c. Follow AUTHENTICATES_TO (depth-limited, cross-system)  │
│       d. For each reachable resource, emit AuthorityPathDoc     │
│    2. Compute path_lineage_id = hash(tenant, workload, resource)│
│    3. Compute _id = hash(tenant, workload, identity, resource)  │
│    4. Compute composition_hash = hash(identity, roles, actions) │
│    5. Upsert into authority_paths collection:                   │
│       - New path: first_seen_at = now, status = active          │
│       - Existing, same hash: update last_seen_at only           │
│       - Existing, different hash: update fields + composition   │
│    6. Mark paths NOT seen in this sync: status = removed,       │
│       removed_at = now                                          │
│    7. Update current_state snapshot on each active path:         │
│       - execution_30d: count evidence in last 30 days           │
│       - ownership_status: check OWNED_BY relationships          │
│       - egress_category: from workload properties               │
│       - active_finding_count: (updated after Stage 5)           │
│    8. Backward compat: continue writing execution_paths[] on    │
│       entities and accessible_by[] on resources (deprecated)    │
│    9. Update sync status: projected                             │
│                                                                 │
│  Writes: authority_paths (upsert), entities (backward compat)   │
│  Reads: entities, execution_evidence (for current_state)        │
│  Correctness:                                                   │
│    - Path IDs are deterministic hashes (idempotent upserts)     │
│    - composition_hash detects mutations (role added/removed)    │
│    - Removed paths are soft-deleted, never hard-deleted         │
│    - path_lineage_id groups identity rotations                  │
│    - Sync metrics: paths_created, paths_updated,                │
│      paths_unchanged, paths_removed                             │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 5: EVALUATE                                              │
│  Worker: sync_evaluate                                          │
│                                                                 │
│  Produce path-level findings in the `findings` collection.      │
│  Extends FindingDoc with path-level fields (path_id,            │
│  effective_from, resolved_at, intervals[]). "Finding" is the    │
│  user/API/UI term. "Exposure" is a derived concept: the         │
│  combined posture of an Authority Path expressed by its         │
│  currently active findings.                                     │
│                                                                 │
│  Steps:                                                         │
│    1. Query active authority_paths for this tenant               │
│    2. For each path, evaluate finding rules:                    │
│                                                                 │
│       ┌──────────────────────────────────────────────────────┐  │
│       │  Finding Rules (path-level)                          │  │
│       │                                                      │  │
│       │  orphaned_ownership: no active OWNED_BY on workload   │  │
│       │  dormant_authority:  no evidence in 90 days          │  │
│       │  reachable_sensitive_domain: sensitivity = restricted │  │
│       │  unknown_identity_binding: identity_id is null       │  │
│       │  unproven_execution: no execution evidence linked    │  │
│       │  scope_drift:        roles increased vs baseline     │  │
│       │  llm_egress:         egress_category = llm           │  │
│       │  external_egress:    egress_category = external      │  │
│       │  ownership_ambiguous: only group/team owners         │  │
│       │  ownership_unknown:  insufficient ownership metadata │  │
│       └──────────────────────────────────────────────────────┘  │
│                                                                 │
│    3. For each rule that fires:                                 │
│       a. Compute finding_key = hash(tenant, path_id, type)      │
│       b. Upsert FindingDoc by finding_key (in `findings`        │
│          collection — same collection, extended schema)          │
│       c. If new OR existing status is remediated:               │
│          - set status = active                                   │
│          - set effective_from = detected_at = now                │
│          - append new interval in intervals[]                    │
│       d. If existing + active: update last_evaluated_at          │
│       e. Generate deterministic explanation                      │
│                                                                 │
│    4. Auto-resolve findings whose conditions no longer hold:    │
│       a. Query active path-level findings for evaluated paths   │
│       b. If finding type NOT in current evaluation results:     │
│          - Set status = remediated, resolved_at = now           │
│          - Set resolution_reason (owner_assigned, evidence_     │
│            appeared, path_removed, sensitivity_downgraded)      │
│          - Close current interval in intervals[]                 │
│       c. If status = acknowledged|false_positive and condition   │
│          still fires: keep status, update last_evaluated_at      │
│                                                                 │
│    5. Update current_state.active_finding_count on paths        │
│    6. Update current_state.max_finding_severity on paths        │
│                                                                 │
│    7. Update sync status: evaluated                             │
│    8. Enqueue build_evidence_pack for changed findings          │
│                                                                 │
│  Writes: findings (upsert), authority_paths (current_state)     │
│  Reads: authority_paths, entities, execution_evidence,          │
│         entity_versions, findings (existing)                    │
│  Correctness:                                                   │
│    - Finding IDs are deterministic (idempotent)                 │
│    - Auto-resolve is re-entrant: same rule set re-evaluated     │
│      each sync, condition gone = finding remediated             │
│    - effective_from is fixed within an interval; a reopen       │
│      appends a new interval with a fresh effective_from         │
│    - resolved_at + resolution_reason create audit trail         │
│    - Finding duration = resolved_at - effective_from            │
│    - Sync metrics: findings_opened, findings_resolved,          │
│      findings_unchanged                                         │
└──────────────────────┬──────────────────────────────────────────┘
                       │
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│  Stage 6: PUBLISH                                               │
│  Worker: sync_publish                                           │
│                                                                 │
│  Build evidence packs for changed findings. Update sync to      │
│  completed. Record final metrics.                               │
│                                                                 │
│  Steps:                                                         │
│    1. For each changed finding:                                 │
│       a. Fetch path + workload + identity + related entities    │
│       b. Build 9-section evidence pack                          │
│       c. Compute integrity hash (SHA256 + tenant_id)            │
│       d. Insert EvidencePackDoc with previous_pack_id chain     │
│       e. Update finding with evidence_pack_id                   │
│                                                                 │
│    2. Update ConnectorSyncDoc:                                  │
│       - status: completed                                       │
│       - completed_at: now                                       │
│       - metrics: {                                              │
│           entities_created, entities_updated, entities_deleted,  │
│           events_created,                                       │
│           paths_created, paths_updated, paths_removed,          │
│           findings_opened, findings_resolved,                   │
│           evidence_packs_built                                  │
│         }                                                       │
│                                                                 │
│  Writes: evidence_packs, findings (pack_id), connector_syncs    │
│  Reads: findings, authority_paths, entities, execution_evidence │
│         entity_versions, events                                 │
│  Correctness:                                                   │
│    - Evidence packs are immutable after creation                │
│    - Integrity hash includes tenant_id (cross-tenant tamper)    │
│    - Pack chaining via previous_pack_id (audit evolution)       │
│    - Sync metrics are the definitive record of what happened    │
└─────────────────────────────────────────────────────────────────┘
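The upsert rules in Stage 4 (steps 5 and 6) can be read as a pure classification over the paths already stored and the paths emitted by the current sync. The following is a minimal sketch of that reading, not the worker implementation: `classifyPaths`, `StoredPath`, and `EmittedPath` are hypothetical names, and it assumes path IDs and composition hashes have already been computed. The source does not say what happens when a previously removed path reappears, so the sketch simply treats it like any other existing path.

```typescript
// Hypothetical shapes: only the fields needed for classification.
interface EmittedPath {
  id: string;
  compositionHash: string;
}
interface StoredPath extends EmittedPath {
  status: "active" | "removed";
}

interface PathClassification {
  created: string[];   // not stored yet: first_seen_at = now, status = active
  updated: string[];   // stored with a different composition_hash
  unchanged: string[]; // stored with the same hash: only last_seen_at moves
  removed: string[];   // stored as active but absent from this sync
}

function classifyPaths(stored: StoredPath[], emitted: EmittedPath[]): PathClassification {
  const storedById = new Map(stored.map((p) => [p.id, p] as const));
  const emittedIds = new Set(emitted.map((p) => p.id));
  const result: PathClassification = { created: [], updated: [], unchanged: [], removed: [] };

  for (const path of emitted) {
    const existing = storedById.get(path.id);
    if (!existing) {
      result.created.push(path.id);
    } else if (existing.compositionHash !== path.compositionHash) {
      result.updated.push(path.id);
    } else {
      result.unchanged.push(path.id);
    }
  }
  // Soft delete only: paths that were active and were not seen in this sync.
  for (const path of stored) {
    if (path.status === "active" && !emittedIds.has(path.id)) {
      result.removed.push(path.id);
    }
  }
  return result;
}
```

The four array lengths map directly onto the Stage 4 sync metrics, which keeps the counters and the writes derived from the same classification.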

4. MongoDB Collections (Target State)

Existing (extended)

Collection	Purpose	Written By	Changes
`connector_syncs`	Sync metadata + metrics	Import, Publish	Metrics renamed (see Section 7)
`entities`	Entity graph (9 types)	Resolve	Unchanged
`entity_versions`	Temporal snapshots	Resolve	Unchanged
`events`	Change audit log	Resolve	Unchanged
`execution_evidence`	Activity records from source systems	Resolve	Unchanged
`findings`	Path-level and entity-level findings	Evaluate	Extended with `path_id`, `effective_from`, `resolved_at`, `intervals[]` (see Section 5)
`evidence_packs`	Immutable evidence attestations	Publish	Unchanged

New

Collection	Purpose	Written By
`authority_paths`	Durable workload→resource routes	Project

Note: There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." All finding records (entity-level and path-level) live in the findings collection. Path-level findings are distinguished by the presence of path_id.

Deprecated (remove after migration)

Collection	Replaced By
`execution_chains`	`authority_paths`
`entities.execution_paths[]`	`authority_paths` collection (embedded array kept temporarily for backward compat)
`entities.accessible_by[]`	Derived from `authority_paths` at query time

5. Data Model (Target)

AuthorityPathDoc

interface AuthorityPathDoc {
  _id: string;                    // hash(tenant, workload_id, identity_id, destination_id)
  tenant_id: string;
  path_lineage_id: string;        // hash(tenant, workload_id, destination_id)

  // Path nodes
  workload_id: string;
  identity_id: string | null;     // null = unbound
  destination_id: string;
  data_domain: string;

  // Path metadata
  sensitivity: string;
  via_roles: string[];
  actions: string[];
  source_system: string;
  auth_chain_depth: number;

  // Denormalized state (updated each sync by Project + Evaluate)
  current_state: {
    execution_30d: number;
    ownership_status: string;     // valid, orphaned, ambiguous, unknown
    egress_category: string;      // external, llm, internal, none
    active_finding_count: number;
    max_finding_severity: string | null;
  };

  // Mutation detection
  composition_hash: string;       // hash(identity, roles, actions)

  // Temporal
  first_seen_at: Date;
  last_seen_at: Date;
  status: "active" | "removed";
  removed_at?: Date;

  // Sync tracking
  sync_version: number;
  created_at: Date;
  updated_at: Date;
}
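
The three hashes on AuthorityPathDoc can be sketched as follows. The document only says `hash(...)`, so the choice of SHA-256 over a JSON-encoded array is an assumption, as are the helper names `stableHash`, `pathId`, `pathLineageId`, and `compositionHash`. Sorting roles and actions before hashing is also an assumption, made so that ordering differences between syncs do not register as a mutation.

```typescript
import { createHash } from "node:crypto";

// Assumption: SHA-256 over a JSON array of the parts, hex-encoded.
function stableHash(parts: ReadonlyArray<string | null | ReadonlyArray<string>>): string {
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}

// _id: one document per (workload, identity, destination) route.
function pathId(
  tenant: string,
  workloadId: string,
  identityId: string | null,
  destinationId: string,
): string {
  return stableHash([tenant, workloadId, identityId, destinationId]);
}

// path_lineage_id: omits the identity, so it is stable across identity rotations.
function pathLineageId(tenant: string, workloadId: string, destinationId: string): string {
  return stableHash([tenant, workloadId, destinationId]);
}

// composition_hash: changes when the identity, roles, or actions change.
function compositionHash(identityId: string | null, roles: string[], actions: string[]): string {
  return stableHash([identityId, [...roles].sort(), [...actions].sort()]);
}
```

With these definitions, rotating the identity behind a workload yields a different `pathId` but the same `pathLineageId`, which is the grouping behavior the lineage field is meant to provide.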

Indexes:

{ tenant_id: 1, workload_id: 1, status: 1 } — paths from a workload
{ tenant_id: 1, identity_id: 1 } — paths through an identity
{ tenant_id: 1, data_domain: 1, sensitivity: 1 } — paths to sensitive domains
{ tenant_id: 1, status: 1, "current_state.max_finding_severity": 1 } — active paths with findings
{ tenant_id: 1, path_lineage_id: 1 } — lineage grouping

FindingDoc (Extended for Path-Level Findings)

The existing FindingDoc in src/domain/findings/types.ts is extended with path-level fields. There is no separate ExposureDoc. Both entity-level findings (legacy, path_id absent) and path-level findings (path_id present) coexist in the same findings collection.

interface FindingDoc {
  // Existing fields (unchanged)
  _id: string;                    // path-level: hash(tenant, path_id, finding_type)
                                  // entity-level (legacy): "eval:" + hash
  tenant_id: string;
  entity_id: string;              // workload (denormalized for both types)
  finding_type: FindingType;      // orphaned_ownership, dormant_authority, etc.
  severity: FindingSeverity;
  explanation: string;
  status: "active" | "acknowledged" | "remediated" | "false_positive";
  resolution_reason?: string;
  evidence_refs: Record<string, unknown>;
  evidence_completeness: EvidenceCompletenessSection;
  evidence_pack_id?: string;
  sync_version: number;
  detected_at: Date;
  last_evaluated_at: Date;

  // === NEW: Path-level fields (added for path-level findings) ===

  path_id?: string;               // links to authority_paths._id
                                  // absent = entity-level finding (legacy)
                                  // present = path-level finding (new)

  effective_from?: Date;          // when condition became true on this path
  resolved_at?: Date;             // when condition ended on this path
                                  // NOTE: "resolved_at" is the canonical field name
                                  // regardless of resolution reason (remediated,
                                  // false_positive, path_removed). The resolution_reason
                                  // field captures WHY. UI renders status "remediated"
                                  // as "Resolved".

  intervals?: Array<{
    effective_from: Date;         // interval open timestamp
    resolved_at?: Date;           // interval close timestamp
    resolution_reason?: string;   // set when remediated/closed
  }>;                             // append-only history for open/resolve/reopen cycles
                                  // SAFEGUARD: if intervals.length > 50, rotate to a new
                                  // FindingDoc (new _id, link via previous_finding_id) to
                                  // prevent mega-document performance degradation in MongoDB.
                                  // 50 intervals ≈ 25 flicker cycles — well above normal
                                  // operational patterns (typical: 1-3 intervals per finding).
  previous_finding_id?: string;   // links to rotated-out finding doc (if intervals exceeded cap)
}
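
The open/resolve/reopen cycle that `intervals[]` records can be sketched as two pure transitions, one for "the rule fired this sync" and one for "the rule no longer fires". This is an illustrative sketch under stated assumptions: `PathFindingState`, `onRuleFired`, and `onRuleCleared` are hypothetical names, the state is reduced to the lifecycle fields, and only findings in status `active` are auto-resolved, since the evaluation steps query active findings.

```typescript
interface Interval {
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
}
type Status = "active" | "acknowledged" | "remediated" | "false_positive";

// Hypothetical subset of FindingDoc: lifecycle fields only.
interface PathFindingState {
  status: Status;
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
  last_evaluated_at: Date;
  intervals: Interval[];
}

// The rule fired for this path in the current sync.
function onRuleFired(existing: PathFindingState | undefined, now: Date): PathFindingState {
  if (!existing || existing.status === "remediated") {
    // New finding or reopen: start a new interval, keep earlier intervals untouched.
    return {
      status: "active",
      effective_from: now,
      last_evaluated_at: now,
      intervals: [...(existing?.intervals ?? []), { effective_from: now }],
    };
  }
  // active, acknowledged, or false_positive: keep status, refresh evaluation time.
  return { ...existing, last_evaluated_at: now };
}

// The rule no longer fires for this path.
function onRuleCleared(existing: PathFindingState, reason: string, now: Date): PathFindingState {
  if (existing.status !== "active") return existing; // assumption: only active findings auto-resolve
  const lastIndex = existing.intervals.length - 1;
  const intervals = existing.intervals.map((interval, i) =>
    i === lastIndex && !interval.resolved_at
      ? { ...interval, resolved_at: now, resolution_reason: reason }
      : interval,
  );
  return {
    ...existing,
    status: "remediated",
    resolved_at: now,
    resolution_reason: reason,
    last_evaluated_at: now,
    intervals,
  };
}
```

Opening, resolving, and reopening a finding under this sketch leaves two intervals: the first closed with its `resolution_reason`, the second open with a new `effective_from`.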

New indexes (in addition to existing):

{ tenant_id: 1, path_id: 1, status: 1 } — findings on a specific path
{ tenant_id: 1, path_id: 1, finding_type: 1 } — unique finding per path+type (for deterministic upsert)
{ tenant_id: 1, status: 1, severity: 1, path_id: 1 } — active path-level findings by severity

Note on finding types: Keep the canonical FindingType taxonomy (for example: orphaned_ownership, dormant_authority, reachable_sensitive_domain, unknown_identity_binding, unproven_execution, scope_drift, llm_egress, external_egress, ownership_ambiguous, ownership_unknown). Do not introduce parallel _path-suffixed finding types. Path scope is represented by path_id, not by duplicating type names.
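
One way to express "path scope is represented by path_id" in code is to keep a single type union and derive scope from the document itself. The sketch below assumes the ten types named in the note are the full taxonomy (the note presents them as examples), and `findingScope` and `isFindingType` are hypothetical helper names.

```typescript
// Assumption: the ten types listed in the note form the canonical set.
const FINDING_TYPES = [
  "orphaned_ownership",
  "dormant_authority",
  "reachable_sensitive_domain",
  "unknown_identity_binding",
  "unproven_execution",
  "scope_drift",
  "llm_egress",
  "external_egress",
  "ownership_ambiguous",
  "ownership_unknown",
] as const;

type FindingType = (typeof FINDING_TYPES)[number];

// Runtime guard, useful when validating a finding_type filter from a request.
function isFindingType(value: string): value is FindingType {
  return (FINDING_TYPES as readonly string[]).includes(value);
}

// Scope comes from path_id, never from the type name.
function findingScope(finding: { path_id?: string }): "path" | "entity" {
  return finding.path_id ? "path" : "entity";
}
```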

6. API Endpoints (Target)

Authority Paths

Row model decision: Each Authority Path is one row (path-level, not lineage-level). When the same workload reaches the same destination through multiple identities, each identity produces a separate row. Findings are bound to specific paths. Lineage-level grouping (collapsible rows by path_lineage_id) may be added later as a UI-only presentation concern without schema or API changes.

GET  /api/v1/authority-paths
     ?status=active|removed
     ?sensitivity=restricted,confidential
     ?data_domain=finance
     ?workload_id=...
     ?identity_id=...
     ?has_findings=true
     ?limit=50&cursor=...

GET  /api/v1/authority-paths/:id

GET  /api/v1/authority-paths/:id/findings
     ?status=active|acknowledged|remediated|false_positive
     ?limit=50&cursor=...

List row contract (AuthorityPathListItem) — shared by both /authority-paths and /risk-clusters/:key/authority-paths:

interface AuthorityPathListItem {
  _id: string;
  path_lineage_id: string;

  // Path nodes (display names resolved server-side)
  workload: { id: string; display_name: string };
  identity: { id: string; display_name: string } | null;  // null = unbound
  destination: { id: string; display_name: string };
  data_domain: string;

  // Path metadata
  sensitivity: string;
  via_roles: string[];
  source_system: string;

  // UX row fields
  first_seen_at: string;                // ISO date, shown as "First Seen" + recency tag
  last_seen_at: string;                 // ISO date, shown as "Last Execution"
  execution_30d: number;                // "30d Executions" column
  ownership_status: string;             // "valid" | "orphaned" | "ambiguous" | "unknown"
  egress_category: string;              // "external" | "llm" | "internal" | "none"
  is_autonomous: boolean;               // true if workload execution_mode = "autonomous"

  // Finding pills (rendered as severity-colored badges)
  active_finding_count: number;
  max_finding_severity: string | null;  // "critical" | "high" | "medium" | "low" | null
  finding_types: string[];              // active finding types for pill labels

  // Status
  status: "active" | "removed";
}

Identity column in collapsed row: Each row represents one full path, so identity is always a single value. For paths with identity: null (unbound workloads), display "Unbound" with a warning indicator. The identity column shows the identity's display_name (e.g., "svc-billing-sync (Service Principal)").
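
The server-side name resolution and the "Unbound" rule for the identity column can be sketched as below. `resolveNodes` and `identityCell` are hypothetical helpers, the display-name lookup is modeled as a plain `Map`, and falling back to the raw ID when a name is missing is an assumption rather than documented behavior.

```typescript
interface NamedRef {
  id: string;
  display_name: string;
}

// Hypothetical subset of AuthorityPathDoc: the three path nodes.
interface PathNodes {
  workload_id: string;
  identity_id: string | null;
  destination_id: string;
}

function resolveNodes(path: PathNodes, displayNames: Map<string, string>) {
  // Assumption: fall back to the ID when no display name is known.
  const ref = (id: string): NamedRef => ({ id, display_name: displayNames.get(id) ?? id });
  return {
    workload: ref(path.workload_id),
    identity: path.identity_id === null ? null : ref(path.identity_id),
    destination: ref(path.destination_id),
  };
}

// Identity column: a null identity renders as "Unbound" with a warning indicator.
function identityCell(identity: NamedRef | null): { text: string; warning: boolean } {
  return identity === null
    ? { text: "Unbound", warning: true }
    : { text: identity.display_name, warning: false };
}
```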

Findings (extended — primary API surface)

"Finding" is the user/API/UI-facing term. Path-level findings have path_id set; entity-level findings (legacy) do not.

GET  /api/v1/findings
     ?status=active|acknowledged|remediated|false_positive
     ?severity=critical,high
     ?finding_type=orphaned_ownership,dormant_authority
     ?path_id=...                    (filter to specific authority path)
     ?entity_id=...
     ?scope=path|entity|all          (default: all — filter by path-level vs entity-level)
     ?limit=50&cursor=...

GET  /api/v1/findings/:id

GET  /api/v1/findings/:id/evidence-pack
     ?format=json|markdown

PATCH /api/v1/findings/:id/status
      { status: "acknowledged" | "false_positive" | "remediated", reason?: string }

UI mapping note: status = remediated is rendered as Resolved in path-level finding panels.
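
A request-body check for the PATCH endpoint, together with the status-to-label mapping, might look like the sketch below. `parseStatusPatch` and `STATUS_LABELS` are hypothetical names; only the `remediated` to "Resolved" label comes from this document, and the remaining labels are placeholders.

```typescript
type FindingStatus = "active" | "acknowledged" | "remediated" | "false_positive";
type PatchableStatus = Exclude<FindingStatus, "active">;

const PATCHABLE: readonly PatchableStatus[] = ["acknowledged", "false_positive", "remediated"];

interface StatusPatch {
  status: PatchableStatus;
  reason?: string;
}

function isPatchable(value: unknown): value is PatchableStatus {
  return typeof value === "string" && (PATCHABLE as readonly string[]).includes(value);
}

// Validates an untrusted body against { status, reason? } from the endpoint contract.
function parseStatusPatch(body: unknown): StatusPatch {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be an object");
  }
  const raw = body as { status?: unknown; reason?: unknown };
  const status = raw.status;
  if (!isPatchable(status)) {
    throw new Error("status must be acknowledged, false_positive, or remediated");
  }
  if (raw.reason === undefined) return { status };
  if (typeof raw.reason !== "string") {
    throw new Error("reason must be a string");
  }
  return { status, reason: raw.reason };
}

// UI labels: "Resolved" for remediated is from the document; the rest are placeholders.
const STATUS_LABELS: Record<FindingStatus, string> = {
  active: "Active",
  acknowledged: "Acknowledged",
  remediated: "Resolved",
  false_positive: "False positive",
};
```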

Posture Summary

GET  /api/v1/posture/summary
     Returns: active autonomous authority path count, active paths with
     invalid ownership count, delta since last refresh
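
The two counts and the delta returned by the summary could be derived from list rows roughly as follows. This is a sketch under assumptions: `postureCounts` and `postureDelta` are hypothetical names, the response field names are invented for illustration, and "invalid ownership" is taken to mean any `ownership_status` other than `valid`.

```typescript
// Hypothetical subset of AuthorityPathListItem.
interface PostureRow {
  status: "active" | "removed";
  is_autonomous: boolean;
  ownership_status: string;
}

interface PostureCounts {
  active_autonomous_paths: number;
  active_paths_invalid_ownership: number;
}

function postureCounts(rows: PostureRow[]): PostureCounts {
  const active = rows.filter((r) => r.status === "active");
  return {
    active_autonomous_paths: active.filter((r) => r.is_autonomous).length,
    // Assumption: orphaned, ambiguous, and unknown all count as invalid.
    active_paths_invalid_ownership: active.filter((r) => r.ownership_status !== "valid").length,
  };
}

// Delta since the previous refresh: positive means the count grew.
function postureDelta(current: PostureCounts, previous: PostureCounts): PostureCounts {
  return {
    active_autonomous_paths: current.active_autonomous_paths - previous.active_autonomous_paths,
    active_paths_invalid_ownership:
      current.active_paths_invalid_ownership - previous.active_paths_invalid_ownership,
  };
}
```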

Risk Clusters (computed from active findings on paths)

UX decision: Risk clusters are pre-configured compound conditions. The cluster detail page shows a locked, read-only view of authority paths matching the cluster's condition. Users cannot edit filters on the cluster page. A "View in Authority Paths" link carries the cluster's conditions as pre-filled editable filters to the Authority Paths inventory page for ad-hoc exploration. Custom/configurable clusters may be added in a future release.

GET  /api/v1/posture/risk-clusters
     Returns: cluster summaries computed from active path-level findings

GET  /api/v1/risk-clusters/:key/authority-paths
     Returns: AuthorityPathListItem[] matching cluster compound condition
     Uses the same row contract as /authority-paths (see above)
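
A pre-configured compound condition can be modeled as a predicate over list rows, which keeps the cluster page read-only by construction. The sketch below is illustrative only: the cluster keys and their conditions are invented for the example and do not come from this document, and `allOf`, `CLUSTERS`, and `pathsForCluster` are hypothetical names.

```typescript
// Hypothetical subset of AuthorityPathListItem.
interface PathRow {
  status: "active" | "removed";
  sensitivity: string;
  is_autonomous: boolean;
  egress_category: string;
  finding_types: string[];
}

type ClusterCondition = (row: PathRow) => boolean;

// A compound condition holds only when every part holds.
const allOf = (...parts: ClusterCondition[]): ClusterCondition =>
  (row) => parts.every((part) => part(row));

const isActive: ClusterCondition = (row) => row.status === "active";

// Example clusters: keys and conditions are assumptions for illustration.
const CLUSTERS: Record<string, ClusterCondition> = {
  "orphaned-autonomous": allOf(
    isActive,
    (row) => row.is_autonomous,
    (row) => row.finding_types.includes("orphaned_ownership"),
  ),
  "llm-egress-restricted": allOf(
    isActive,
    (row) => row.egress_category === "llm",
    (row) => row.sensitivity === "restricted",
  ),
};

function pathsForCluster(key: string, rows: PathRow[]): PathRow[] {
  const condition = CLUSTERS[key];
  if (!condition) throw new Error(`unknown risk cluster: ${key}`);
  return rows.filter(condition);
}
```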

Backward Compat (deprecated, kept temporarily)

GET  /api/v1/chains            — alias for /authority-paths

7. Correctness Model

How We Know the Pipeline Worked Correctly

Each sync produces a ConnectorSyncDoc with complete metrics. This is the correctness record.

interface SyncMetrics {
  // Stage 1: Import
  nodes_received: number;
  edges_received: number;
  validation_errors: number;

  // Stage 2: Resolve
  entities_created: number;
  entities_updated: number;
  entities_deleted: number;
  events_created: number;
  evidence_upserted: number;
  classifications_computed: number;     // security_relevance, execution_mode

  // Stage 3: Reconcile
  cross_connector_matches: number;      // 0 until multi-connector

  // Stage 4: Project
  paths_created: number;               // new authority paths discovered
  paths_updated: number;               // existing paths with composition change
  paths_unchanged: number;             // existing paths confirmed (last_seen_at updated)
  paths_removed: number;               // paths not seen → status: removed

  // Stage 5: Evaluate
  findings_opened: number;             // new path-level finding conditions detected
  findings_resolved: number;           // conditions no longer true
  findings_unchanged: number;          // existing active findings re-confirmed

  // Stage 6: Publish
  evidence_packs_built: number;

  // Timing
  stage_durations: {
    import_ms: number;
    resolve_ms: number;
    reconcile_ms: number;
    project_ms: number;
    evaluate_ms: number;
    publish_ms: number;
    total_ms: number;
  };
}

Invariants (always true after a successful sync)

Path completeness: Every entity with entity_type = workload|identity that has reachable resources via HAS_ROLE→GRANTS→APPLIES_TO has at least one active authority_path record
Path consistency: authority_paths.composition_hash matches the current entity graph state. If roles or actions changed, the hash changed, and paths_updated counter incremented
Finding re-entrancy: Re-running the same sync produces identical finding state. Rules are pure functions. Same input = same output
Temporal monotonicity: effective_from on a path-level finding never changes after creation. resolved_at is set exactly once per interval. last_evaluated_at always increases
Soft-delete guarantee: No authority_path or finding document is ever hard-deleted. Removal and resolution are recorded as status changes and counted in paths_removed and findings_resolved, so both counters are always >= 0
Evidence integrity: Every evidence_pack has a SHA256 hash that includes tenant_id. Tampering is detectable
Metric accounting: paths_created + paths_updated + paths_unchanged + paths_removed = paths seen in this sync + previously active paths not seen in this sync
Interval cap: No FindingDoc has intervals.length > 50. If a finding reaches the cap, it is rotated to a new document (new _id, linked via previous_finding_id). This prevents mega-document performance degradation in MongoDB (16MB doc limit, slow updates on large arrays)
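
The temporal monotonicity and interval cap invariants above can be illustrated with a small open/resolve/reopen state machine over `intervals[]`. This is a sketch, not the actual evaluator: the interval field name `opened_at`, the `PathFinding` shape, and the `newId` callback are assumptions, since the `FindingDoc` definition is not repeated in this section.

```typescript
// Assumed interval shape: opened_at is a hypothetical name; resolved_at is from the document.
interface FindingInterval {
  opened_at: string;          // ISO timestamp when the condition became true
  resolved_at: string | null; // set exactly once, when the condition clears
}

interface PathFinding {
  _id: string;
  path_id: string;
  intervals: FindingInterval[]; // append-only history
  previous_finding_id?: string; // set when a capped document is rotated
}

const MAX_INTERVALS = 50; // cap from the Interval cap invariant

function lastInterval(f: PathFinding): FindingInterval | undefined {
  return f.intervals[f.intervals.length - 1];
}

// Condition is true in this sync: open or reopen, unless an interval is already open.
// Returns the document to persist, which is a new linked document if the cap is reached.
function openInterval(f: PathFinding, now: string, newId: () => string): PathFinding {
  const last = lastInterval(f);
  if (last && last.resolved_at === null) return f; // already open: idempotent no-op
  const opened: FindingInterval = { opened_at: now, resolved_at: null };
  if (f.intervals.length >= MAX_INTERVALS) {
    // Rotate: start a fresh document that points back to the capped one.
    return { _id: newId(), path_id: f.path_id, previous_finding_id: f._id, intervals: [opened] };
  }
  return { ...f, intervals: [...f.intervals, opened] };
}

// Condition is no longer true: close the open interval, if there is one.
function resolveInterval(f: PathFinding, now: string): PathFinding {
  const last = lastInterval(f);
  if (!last || last.resolved_at !== null) return f; // nothing open: no-op
  const closed: FindingInterval = { ...last, resolved_at: now }; // opened_at is untouched
  return { ...f, intervals: [...f.intervals.slice(0, -1), closed] };
}
```

Because both functions are no-ops when the finding is already in the requested state, re-running the same sync leaves `intervals[]` unchanged, which is the behavior the finding re-entrancy invariant asks for.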

Correctness Checks (automated, run after each sync)

async function validateSyncCorrectness(tenantId: string, syncId: string): Promise<ValidationResult> {
  const sync = await storage.getSync(tenantId, syncId);
  const errors: string[] = [];

  // 1. Every active path references a workload and a destination entity that exist
  const activePaths = await storage.queryAuthorityPaths(tenantId, { status: "active" });
  for (const path of activePaths) {
    const workload = await storage.getEntity(tenantId, path.workload_id);
    if (!workload) errors.push(`Path ${path._id}: workload ${path.workload_id} not found`);
    const resource = await storage.getEntity(tenantId, path.destination_id);
    if (!resource) errors.push(`Path ${path._id}: destination ${path.destination_id} not found`);
  }

  // 2. All active path-level findings reference an existing active path
  const activeFindings = await storage.queryFindings(tenantId, { status: "active", scope: "path" });
  for (const finding of activeFindings) {
    if (!finding.path_id) continue; // entity-level finding, skip
    const path = await storage.getAuthorityPath(tenantId, finding.path_id);
    if (!path) errors.push(`Finding ${finding._id}: path ${finding.path_id} not found`);
    if (path && path.status === "removed") {
      errors.push(`Finding ${finding._id}: active finding on removed path ${finding.path_id}`);
    }
  }

  // 3. current_state.active_finding_count matches actual count
  for (const path of activePaths) {
    const pathFindings = activeFindings.filter(f => f.path_id === path._id);
    if (pathFindings.length !== path.current_state.active_finding_count) {
      errors.push(`Path ${path._id}: finding count mismatch (${path.current_state.active_finding_count} vs ${pathFindings.length})`);
    }
  }

  // 4. Metrics add up
  const totalPathsProcessed = sync.metrics.paths_created + sync.metrics.paths_updated
    + sync.metrics.paths_unchanged + sync.metrics.paths_removed;
  // Should equal: active paths from previous sync + new paths from this sync.
  // TODO: totalPathsProcessed is computed here but not yet compared against that total.

  return { valid: errors.length === 0, errors, sync_id: syncId };
}
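
Step 4 of `validateSyncCorrectness` computes the counter total without comparing it to anything. One way to close that gap is a pure check against the metric accounting invariant, sketched below. The two ID sets are assumed inputs; this document does not specify how the pre-sync active set would be captured.

```typescript
// Subset of SyncMetrics needed for the accounting check.
interface PathMetrics {
  paths_created: number;
  paths_updated: number;
  paths_unchanged: number;
  paths_removed: number;
}

// pathsSeen: IDs of paths projected in this sync (assumed input).
// previouslyActive: IDs of paths that were active before this sync (assumed input).
function checkPathAccounting(
  metrics: PathMetrics,
  pathsSeen: ReadonlySet<string>,
  previouslyActive: ReadonlySet<string>,
): string[] {
  const errors: string[] = [];

  // Previously active paths that this sync did not see.
  let notSeen = 0;
  previouslyActive.forEach((id) => {
    if (!pathsSeen.has(id)) notSeen++;
  });

  const processed = metrics.paths_created + metrics.paths_updated
    + metrics.paths_unchanged + metrics.paths_removed;
  const expected = pathsSeen.size + notSeen;
  if (processed !== expected) {
    errors.push(`Path accounting mismatch: counters sum to ${processed}, expected ${expected}`);
  }
  // paths_removed counts exactly the previously active paths that were not seen.
  if (metrics.paths_removed !== notSeen) {
    errors.push(`paths_removed is ${metrics.paths_removed}, but ${notSeen} previously active paths were not seen`);
  }
  return errors;
}
```

The returned strings can be appended to the `errors` array that `validateSyncCorrectness` already builds.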

8. Connector Responsibility (Current → Target)

Current (connector does too much)

Connector:
Discover entities from source APIs
Resolve cross-entity edges (EdgeResolver)
Classify entities (execution_mode, security_relevance, egress_category)
Filter entities (internal_inventory pre-filter)
Build NormalizedGraph with full relationship set
Submit to platform

Platform:
Store entities
Compute execution paths (BFS)
Assemble execution chains
Evaluate findings (entity-level)
Build evidence packs

Target (connector is thin, platform does the work)

Connector:
Discover entities from source APIs (flat DiscoveredEntities)
Resolve source-system-specific edges (client_id match, script search)
     ↑ Only edges that require source-system knowledge
Submit to platform

Platform:
Import: validate, normalize
Resolve: cross-connector matching, platform-side classification
Reconcile: multi-connector entity merge (future)
Project: materialize authority paths (BFS → persistent path records)
Evaluate: compute path-level findings (persistent, temporal, with intervals[])
Publish: evidence packs, sync metrics, correctness validation
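
The Publish stage builds evidence packs, and the evidence integrity invariant in section 7 requires each pack to carry a SHA256 hash that includes tenant_id. A minimal sketch of such a hash, using Node's built-in `crypto` module, is shown here. The `EvidencePackPayload` shape and the length-prefixed encoding are assumptions, not the documented format.

```typescript
import { createHash } from "node:crypto";

// Hypothetical payload shape: only tenant_id and the SHA256 requirement come from the document.
interface EvidencePackPayload {
  tenant_id: string;
  finding_id: string;
  body: string; // serialized evidence content
}

// Hashes tenant_id together with the pack content.
// Each field is length-prefixed so bytes cannot shift between fields without changing the hash.
function hashEvidencePack(pack: EvidencePackPayload): string {
  const h = createHash("sha256");
  for (const field of [pack.tenant_id, pack.finding_id, pack.body]) {
    h.update(`${Buffer.byteLength(field, "utf8")}:`);
    h.update(field, "utf8");
  }
  return h.digest("hex");
}

// Recomputes the hash and compares it with the stored value to detect tampering.
function verifyEvidencePack(pack: EvidencePackPayload, expectedHash: string): boolean {
  return hashEvidencePack(pack) === expectedHash;
}
```

Including tenant_id in the hashed input means a pack copied to another tenant no longer verifies, even if its body is unchanged.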

What moves from connector to platform:

security_relevance classification → platform evaluator (4I)
execution_mode classification → platform evaluator (can derive from trigger types in properties)
egress_category classification → platform evaluator (can derive from endpoint URLs in properties)
internal_inventory filtering → platform UI filter (show/hide toggle)
Cross-connector entity matching → platform reconcile stage (4D)

What stays in the connector:

Source API authentication and discovery
Source-system-specific edge resolution (OAuth client_id → SP matching requires ServiceNow + Entra API knowledge)
Raw property extraction (display names, trigger types, endpoints)
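
The split above means the connector submits raw properties and the platform derives classifications from them. The sketch below illustrates that shape for execution_mode and egress_category. Every trigger-type name, category label, and property key in it is a hypothetical placeholder; the actual rules belong to the 4I classifier and are not defined here.

```typescript
// All value names below are illustrative placeholders, not the platform's vocabulary.
type ExecutionMode = "autonomous" | "human_triggered" | "unknown";
type EgressCategory = "internal" | "external" | "none";

// Raw properties as extracted by the connector (assumed keys).
interface RawProperties {
  trigger_types?: string[];
  endpoints?: string[];
}

// Hypothetical set of trigger types that run without a human in the loop.
const AUTONOMOUS_TRIGGERS = new Set(["schedule", "event", "webhook"]);

// Derives execution_mode from trigger types, as section 8 suggests is possible.
function classifyExecutionMode(props: RawProperties): ExecutionMode {
  const triggers = props.trigger_types ?? [];
  if (triggers.length === 0) return "unknown";
  return triggers.some((t) => AUTONOMOUS_TRIGGERS.has(t)) ? "autonomous" : "human_triggered";
}

// Derives egress_category from endpoint URLs by comparing hostnames
// against a caller-supplied list of internal domain suffixes.
function classifyEgress(props: RawProperties, internalSuffixes: readonly string[]): EgressCategory {
  const endpoints = props.endpoints ?? [];
  if (endpoints.length === 0) return "none";
  const allInternal = endpoints.every((endpoint) => {
    let host: string;
    try {
      host = new URL(endpoint).hostname;
    } catch {
      return false; // unparseable endpoint: treat conservatively as external
    }
    return internalSuffixes.some((s) => host === s || host.endsWith(`.${s}`));
  });
  return allInternal ? "internal" : "external";
}
```

Both functions depend only on their inputs, so they can be re-run on stored properties when classification rules change, without a connector release.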

9. Implementation Sequence

W1.1 implementation note: Section 3 describes the long-term target 6-stage pipeline with separate workers per stage. For W1.1, we follow architecture/02-processing-pipeline.md: the monolithic sync_ingestion handler gains materializeAuthorityPaths() as a new step, and the existing evaluate_findings handler gains a path-level evaluation pass. No separate stage workers are created. The stages below map to functions within the existing handlers, not to new workers.

Phase 1: Authority Path Persistence (Project stage)

Scope: New authority_paths collection, path materializer writes path records, new API endpoints.

Task	What	Touches
1a	Define `AuthorityPathDoc` type in `src/domain/paths/types.ts`	New file
1b	Add `authority_paths` to schema manager (indexes)	`src/storage/mongo/schema.ts`
1c	Add path storage methods to `StorageAdapter`	`src/storage/storage-adapter.ts`
1d	Implement `materializeAuthorityPaths()` function	`src/ingestion/authority-path-materializer.ts` (new module)
1e	Call materializer from `sync_ingestion` handler after chain assembly (step 11 in Doc 09)	`src/workers/handlers/sync-ingestion.ts`
1f	Add `GET /authority-paths` and `GET /authority-paths/:id` API	New route file
1g	Tests: path materialization, upsert idempotency, removal	New test files

Backward compat: Continue writing execution_paths[] on entities. Both exist during transition.
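
The core of `materializeAuthorityPaths()` is a diff between the paths projected in the current sync and the paths already stored, keyed by composition_hash. The sketch below shows one way to compute the four Stage 4 counters from that diff. The type shapes and the `diffAuthorityPaths` name are hypothetical, and treating a reappearing removed path as "updated" is an assumption this document does not settle.

```typescript
// Minimal assumed shapes; the real AuthorityPathDoc has more fields.
interface ProjectedPath { path_id: string; composition_hash: string; }
interface StoredPath extends ProjectedPath { status: "active" | "removed"; }

interface PathDiff {
  created: string[];   // not stored before
  updated: string[];   // stored, but composition changed (or path reappeared)
  unchanged: string[]; // stored with the same composition_hash
  removed: string[];   // previously active, not seen in this sync
}

function diffAuthorityPaths(
  projected: readonly ProjectedPath[],
  stored: readonly StoredPath[],
): PathDiff {
  const storedById = new Map(stored.map((p) => [p.path_id, p] as const));
  const seen = new Set<string>();
  const diff: PathDiff = { created: [], updated: [], unchanged: [], removed: [] };

  for (const p of projected) {
    seen.add(p.path_id);
    const existing = storedById.get(p.path_id);
    if (!existing) {
      diff.created.push(p.path_id);
    } else if (existing.status === "removed" || existing.composition_hash !== p.composition_hash) {
      diff.updated.push(p.path_id); // assumption: reactivation counts as an update
    } else {
      diff.unchanged.push(p.path_id);
    }
  }
  // Soft-delete: active paths that were not seen get status "removed", never a hard delete.
  for (const s of stored) {
    if (s.status === "active" && !seen.has(s.path_id)) diff.removed.push(s.path_id);
  }
  return diff;
}
```

The array lengths map directly to paths_created, paths_updated, paths_unchanged, and paths_removed, and applying the diff and then re-running it on the same input yields only unchanged entries, which is the upsert idempotency that task 1g tests.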

Phase 2: Path-Level Finding Extension (Evaluate stage)

Scope: Extend FindingDoc with path-level fields (path_id, effective_from, resolved_at, intervals[]). Evaluator produces path-level findings in the existing findings collection. No new collection.

Task	What	Touches
2a	Extend `FindingDoc` type with `path_id`, `effective_from`, `resolved_at`, `intervals[]`	`src/domain/findings/types.ts`
2b	Add new finding indexes for path-level queries	`src/storage/mongo/schema.ts`
2c	Add path-level finding query methods to `StorageAdapter` (`queryFindings` with `scope` param)	`src/storage/storage-adapter.ts`
2d	Create path-level finding evaluator (wraps existing rules, maps to paths)	New evaluator module
2e	Implement finding lifecycle on paths (open/resolve/reopen with `intervals[]`)	`src/workers/handlers/evaluate-findings.ts`
2f	Extend `GET /findings` with `?path_id=...&scope=path|entity|all` filters	`src/api/routes/findings.ts`
2g	Add `GET /authority-paths/:id/findings`	New route
2h	Add `FindingType` enum values for path-level types	`src/domain/findings/types.ts`
2i	Tests: path-level finding lifecycle, auto-resolve, interval append, idempotency	New test files

Key decision: No ExposureDoc, no exposures collection. Path-level findings coexist with entity-level findings in findings. Distinguish by presence of path_id.
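
Two details of this phase lend themselves to a short sketch: the deterministic finding key (`hash(tenant, path, type)` from section 0) and the `scope` filter that separates path-level from entity-level findings by the presence of path_id. The SHA-256 choice, the JSON-array encoding, and the function names below are assumptions.

```typescript
import { createHash } from "node:crypto";

// Deterministic key: the same (tenant, path, type) always maps to the same ID,
// so re-evaluating a sync upserts the existing finding instead of creating a duplicate.
// JSON-encoding the tuple keeps field boundaries unambiguous.
function findingKey(tenantId: string, pathId: string, findingType: string): string {
  return createHash("sha256")
    .update(JSON.stringify([tenantId, pathId, findingType]))
    .digest("hex");
}

type FindingScope = "path" | "entity" | "all";

// Only the fields needed to tell the two kinds of finding apart.
interface FindingLike {
  _id: string;
  path_id?: string; // present on path-level findings, absent on entity-level ones
}

// In-memory equivalent of the ?scope=path|entity|all query filter.
function filterByScope<T extends FindingLike>(findings: readonly T[], scope: FindingScope): T[] {
  if (scope === "all") return [...findings];
  return findings.filter((f) => (scope === "path") === Boolean(f.path_id));
}
```

Because the key does not include a timestamp, reopen history cannot live in the ID and has to be carried by `intervals[]`, which is the resolution recorded in section 0.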

Phase 3: UX — Overview + Risk Clusters (Feb 18 mockups)

Scope: Implement the Overview homepage and Risk Cluster Detail screens per Feb 18 mockups.

Task	What	Touches
3a	Add `GET /api/v1/posture/summary` API (active path count, invalid ownership count, delta)	New route
3b	Add `GET /api/v1/posture/risk-clusters` API (top clusters from active findings on paths)	New route
3c	Add `GET /api/v1/risk-clusters/:key/authority-paths` API (paths matching cluster condition)	New route
3d	Overview page — "Autonomous Authority Posture": 2 posture cards (Active Autonomous Authority Paths + Dormant Autonomous Authority Paths), "Since Last Refresh" delta section (`+X new paths`, `+Y ownership invalidations`), Top 5 Risk Clusters list	New UI page
3e	Risk Cluster Detail page — click cluster → locked, read-only table of Authority Paths matching cluster condition (no ad-hoc filter editing). Includes inline expand (findings, ownership, authority diagram, evidence completeness). "View in Authority Paths" link carries filters to inventory page for ad-hoc exploration	New UI page
3f	Update nav: Overview | Risk Clusters | Authority Paths | Graph | Settings	UI layout

Phase 4: UX — Authority Paths List + Detail (Feb 18 mockups)

Scope: Implement the Authority Paths List and Authority Path Detail screens.

Task	What	Touches
4a	Authority Paths List page — all paths with search, filters, pagination. Columns: ID, Authority Path (workload→destination), Last Execution, 30d Executions, First Seen + tag pills	New UI page
4b	Authority Path Detail page — breadcrumb nav, dagre diagram (Workload → Identity → Destination → Data Domain), findings panel, authority state, ownership breakdown, automation metadata, linkage proof	New UI page
4c	Path-level finding timeline in detail view (open/resolve/reopen intervals)	UI component
4d	Evidence completeness bar on path detail	UI component
4e	Update TanStack Query hooks for authority-paths, path-findings	`ui/src/hooks/`

Phase 5: Visual QA Fixes

Scope: Address visual QA items with migration-aware sequencing (avoid over-investing in soon-to-be-replaced surfaces).

Batch	ID	Category	Fix	Touches
Pre-migration must-fix	B1	Bug	Dashboard posture summary shows contradictory totals — fix API query and tenant/type filter correctness	API + UI
Pre-migration must-fix	D1	Design	Truncated finding descriptions — add tooltip/full-text affordance	UI components
Pre-migration must-fix	D2	Design	Date format inconsistency across pages — standardize formatter	UI date utils
Pre-migration must-fix	D4	Design	Graph legend gap — add missing node/edge legend entries	Graph Explorer
Pre-migration must-fix	U2	UX	Badge contrast — improve unknown/low-contrast status badges	UI theme
Pre-migration must-fix	U6	UX	Graph centering — auto-fit graph on load	Graph Explorer
Post-migration/defer	D3	Design	Chains empty-state guidance (only if Chains survives migration window)	UI pages
Post-migration/defer	U1	UX	Posture card tooltips (rework on new path-based cards only)	Overview page
Post-migration/defer	U3	UX	Domain name formatting	UI formatters
Post-migration/defer	U4	UX	Redundant filter cleanup	Filter sidebar
Post-migration/defer	U5	UX	Temporal compare guidance text	Temporal Compare
Post-migration/defer	U7	UX	Sync row expand affordance	Syncs page

Phase 6: Platform-Side Classification (4I)

Scope: Move security_relevance, execution_mode, egress_category from connector to platform.

Task	What	Touches
6a	Add classification rules to Resolve stage	New classifier module
6b	Remove connector-side classification code	Connector repo
6c	Add UI filter toggle for `security_relevance`	UI filter sidebar

Phase 7: Cleanup (blocked until Phase 8 gates pass)

Task	What
7a	Deprecate `execution_chains` collection
7b	Remove `execution_paths[]` embedded array from entities
7c	Remove entity-level finding types (after confirming all consumers use path-level)
7d	Run correctness validation across all tenant data

Phase 8: Operational Hardening (CTO requirement)

Task	What
8a	Adopt observability contract in `architecture/02-processing-pipeline.md` (sections 9-11)
8b	Add stage-level metrics, traces, and structured logs across all 6 stages
8c	Configure early alerts for failure, stall, ingestion silence, and backlog growth
8d	Run failure drills: transient failure retry, permanent failure escalation, replay from checkpoint
8e	Gate Phase 7 cleanup on 2 consecutive weeks of stable SLO compliance
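
The gate in 8e can be expressed as a simple check over daily SLO results. This is only a sketch: this document does not define how SLO compliance is recorded, so the `DailySloResult` shape is hypothetical, and the function assumes exactly one entry per calendar day with no gaps.

```typescript
// Hypothetical daily SLO record; one entry per calendar day is assumed.
interface DailySloResult {
  date: string;       // YYYY-MM-DD
  compliant: boolean; // true if all pipeline SLOs were met that day
}

// Returns true when the most recent `requiredDays` entries are all compliant.
// The default of 14 days corresponds to the "2 consecutive weeks" gate in 8e.
function cleanupGateOpen(history: readonly DailySloResult[], requiredDays = 14): boolean {
  if (history.length < requiredDays) return false;
  // ISO dates sort correctly as strings.
  const sorted = [...history].sort((a, b) => a.date.localeCompare(b.date));
  return sorted.slice(-requiredDays).every((day) => day.compliant);
}
```

A single non-compliant day inside the window closes the gate again, so Phase 7 cleanup would restart its two-week count from the next compliant day.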

10. Operational Requirement (Non-Optional)

For production rollout, this pipeline must be implemented with explicit runtime observability and alerting. The detailed architecture is specified in:

docs/architecture/02-processing-pipeline.md (sections 9-11: observability, SLOs, alerts, dashboards)

This is required to satisfy CTO requirements for:

Clear batch/ETL pipeline execution model and stage transitions
Runtime observability of stage health, queue depth, and data freshness
Early failure alerting with actionable ownership and escalation

11. Acceptance Criteria

Clicking a Risk Cluster shows Authority Paths as primary rows
Each path row shows execution magnitude (30d), ownership status, active finding count, and max severity — without additional API calls
Path detail page shows finding timeline with effective_from and status transitions
A finding can be opened, remediated (shown as Resolved in UI), and re-opened on the same path without losing history (via intervals[])
Removed paths and remediated findings remain historically queryable
Every sync produces a SyncMetrics record with complete accounting of paths/findings processed
Correctness validation passes after every sync (no orphaned findings, no stale counts)
Platform can classify security_relevance and execution_mode without connector changes
Re-running the same sync produces identical path and finding state (deterministic, idempotent)
Re-opened findings preserve prior intervals in intervals[] and remain audit-queryable
Pipeline emits operational telemetry and triggers alerts before silent data staleness occurs
Overview page matches Feb 18 mockup: 2 posture cards (Active + Dormant) + delta + Top 5 Risk Clusters
Authority Paths list and detail pages match Feb 18 mockups (search, filters, dagre diagram, findings panel)
Navigation: Overview | Risk Clusters | Authority Paths | Graph | Settings
