Combined Platform Pipeline Architecture
Date: 2026-02-18
Status: Proposed - revised after critical review (CTO/CISO/CEO/Architect lenses)
Combines: W1.1 (Doc 14) + Phase 4 (Reconciled Roadmap 4D/4F/4I)
Scope boundary: This document defines W1.1 implementation scope (persistent authority paths + path-level temporal findings). Baseline W1 wedge docs remain valid for conceptual UX/logic framing but do not override this architecture plan's persistence/lifecycle contracts.
0. Critical Review Outcomes
This document was reviewed for logical consistency, CISO workflow fit, CEO product direction, and CTO operational viability.
Terminology decision (2026-02-18): "Finding" remains the user-facing, API-facing, and UI-facing term. There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." Authority Paths are the primary durable objects and navigation units. Findings are extended with path-level fields (path_id, effective_from, resolved_at, intervals[]) to support temporal lifecycle on paths.
| Severity | Finding | Resolution in this revision |
|---|---|---|
| High | Finding ID model (hash(tenant, path, type)) did not clearly support open-resolve-reopen history while staying idempotent | Keep deterministic finding key, but make interval history explicit via append-only intervals[] on FindingDoc |
| High | Pipeline target described correctness but not run-time observability/alerting needed for production operations | Add explicit operational requirement and dedicated architecture doc reference: architecture/02-processing-pipeline.md (sections 9-11) |
| High | UX screens needed to match Feb 18 mockups: Overview, Risk Cluster Detail, Authority Paths List, Authority Path Detail | Implementation sequence updated with UX phases and visual QA fix items |
| Medium | Retry, stall handling, and failure escalation were implied but not specified | Define as mandatory execution concerns and track in implementation sequence |
| Medium | Cleanup/deprecation could remove legacy paths before reliability is proven | Add observability gates before cleanup |
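The first resolution in the table (a deterministic finding key combined with append-only intervals[]) can be sketched as follows. This is a minimal illustration, not the project's implementation: SHA-256, the "|" separator, and the helper names findingKey, openFinding, and resolveFinding are all assumptions.

```typescript
import { createHash } from "node:crypto";

interface Interval {
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
}

interface PathFinding {
  _id: string;
  status: "active" | "remediated";
  intervals: Interval[];
}

// Deterministic key: the same (tenant, path, type) always yields the same _id.
// SHA-256 and the "|" separator are assumptions made for this sketch.
function findingKey(tenantId: string, pathId: string, findingType: string): string {
  return createHash("sha256").update([tenantId, pathId, findingType].join("|")).digest("hex");
}

// Open a finding, or reopen a remediated one by appending a new interval.
// Calling this on an already-active finding returns it unchanged (idempotent).
function openFinding(existing: PathFinding | undefined, id: string, now: Date): PathFinding {
  if (!existing) {
    return { _id: id, status: "active", intervals: [{ effective_from: now }] };
  }
  if (existing.status === "active") return existing;
  return {
    ...existing,
    status: "active",
    intervals: [...existing.intervals, { effective_from: now }],
  };
}

// Close the currently open (last) interval; earlier intervals are never rewritten.
function resolveFinding(finding: PathFinding, reason: string, now: Date): PathFinding {
  if (finding.status !== "active") return finding;
  const last = finding.intervals.length - 1;
  const intervals = finding.intervals.map((iv, i) =>
    i === last ? { ...iv, resolved_at: now, resolution_reason: reason } : iv
  );
  return { ...finding, status: "remediated", intervals };
}
```

Because the key never changes across reopen cycles, the upsert stays idempotent while intervals[] carries the open/resolve/reopen history.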
1. Why Combine
Two planned changes touch the same pipeline:
| Planned Change | What It Does | Pipeline Stage Affected |
|---|---|---|
| W1.1: Persistent Authority Paths | Materialize durable path records from entity graph | After entity upsert, before evaluation |
| W1.1: Path-Level Findings | Time-bound findings on specific paths (with path_id, effective_from, intervals[]) | Evaluation stage |
| 4F: Import → Resolve → Reconcile → Project → Evaluate → Publish | Restructure ingestion into explicit stages | Entire pipeline |
| 4I: Platform-side security_relevance | Move classification from connector to platform | Evaluation stage |
| 4D: Platform-side correlation | Cross-connector entity matching | Reconcile stage |
Building W1.1 authority paths now and then rebuilding the pipeline in Phase 4 means doing the work twice. Combining them means we build the target pipeline once with all capabilities.
The key insight: Authority Path materialization IS the "Project" stage of 4F. Path-level finding evaluation IS the "Evaluate" stage. They're not additions to Phase 4 — they are Phase 4.
2. Current Pipeline (What Exists Today)
POST /ingest/normalized-graph
│
▼
┌─────────────────────────────────────────────────┐
│ Job 1: sync_ingestion │
│ │
│ 1. Create ConnectorSyncDoc (running) │
│ 2. transformGraph() → entities + evidence │
│ 3. computeDiff() → events, changed/created IDs │
│ 4. upsertEntities() │
│ 5. insertEvents() │
│ 6. insertEntityVersions() for changed │
│ 7. soft-delete absent entities │
│ 8. upsertExecutionEvidence() │
│ 9. materializeExecutionPaths() ← paths │
│ 10. assembleExecutionChains() ← chains │
│ 11. Update sync → completed + metrics │
└──────────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Job 2: evaluate_findings │
│ │
│ 1. Gate: sync.status === "completed" │
│ 2. Query entities (workloads, identities, etc) │
│ 3. Run 12 rules against each entity │
│ 4. Upsert FindingDocs │
│ 5. Enqueue build_evidence_pack per finding │
└──────────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Job 3..N: build_evidence_pack (per finding) │
│ │
│ 1. Fetch finding + entity + related entities │
│ 2. Fetch evidence + versions + events │
│ 3. Build 9-section evidence pack │
│ 4. Compute integrity hash (SHA256 + tenant) │
│ 5. Insert EvidencePackDoc │
│ 6. Update finding with evidence_pack_id │
└─────────────────────────────────────────────────┘
Problems with current pipeline:
- Execution paths are embedded arrays on entities: not queryable, not versioned, rewritten every sync
- Execution chains are workload-centric (1 chain per workload): wrong grain for authority path investigation
- Findings are entity-bound: "entity E has orphaned ownership", not "path P has orphaned ownership"
- No temporal tracking on findings: no effective_from, no finding duration on specific paths
- No path-level finding lifecycle: no open/resolve/reopen history, no intervals[]
- Connector does too much: builds NormalizedGraph with pre-computed relationships, classification, filtering
3. Target Pipeline (Combined W1.1 + Phase 4)
POST /ingest/normalized-graph (current — preserved for backward compat)
POST /ingest/discovered-entities (new — ADR-004 flat entity format)
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 1: IMPORT │
│ Worker: sync_import │
│ │
│ Accept either NormalizedGraph (legacy) or DiscoveredEntities │
│ (ADR-004). Normalize to internal entity representation. │
│ │
│ Steps: │
│ 1. Create ConnectorSyncDoc (status: importing) │
│ 2. transformGraph() or transformDiscoveredEntities() │
│ 3. Validate entity types, required fields, edge targets │
│ 4. Output: RawEntityBatch (entities + evidence + edges) │
│ │
│ Writes: connector_syncs │
│ Reads: nothing │
│ Correctness: schema validation, duplicate detection │
└──────────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 2: RESOLVE │
│ Worker: sync_resolve │
│ │
│ Resolve cross-entity relationships. Today this happens in the │
│ connector (EdgeResolver). Target: platform does it. │
│ │
│ Steps: │
│ 1. computeDiff() — detect created/changed/deleted entities │
│ 2. Resolve edges: client_id matching (OAuth→SP), script-text │
│ search (workload→REST), trigger matching │
│ 3. Classify entities platform-side: │
│ - execution_mode (autonomous/operator_assisted/etc) │
│ - security_relevance (active_external/dormant/internal) │
│ - egress_category (from endpoint analysis) │
│ 4. upsertEntities() │
│ 5. insertEvents() │
│ 6. insertEntityVersions() │
│ 7. upsertExecutionEvidence() │
│ 8. Update sync status: resolved │
│ │
│ Writes: entities, events, entity_versions, execution_evidence │
│ Reads: entities (existing state for diff) │
│ Correctness: diff produces deterministic events, │
│ entity IDs are stable hashes, versions are temporal │
└──────────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 3: RECONCILE (future — Phase 4D) │
│ Worker: sync_reconcile │
│ │
│ Cross-connector entity matching. When multiple connectors │
│ discover the same real-world entity (e.g., Entra SP seen by │
│ both Entra connector and ServiceNow connector), reconcile │
│ into a single canonical entity. │
│ │
│ Steps: │
│ 1. Match entities across source_systems by deterministic │
│ keys (client_id, objectId, email, sys_id) │
│ 2. Merge properties (prefer source-of-truth system) │
│ 3. Create/update SAME_AS relationships │
│ 4. Update sync status: reconciled │
│ │
│ Writes: entities (merge), relationships │
│ Reads: entities (cross-system query) │
│ Correctness: deterministic matching rules, no fuzzy/ML, │
│ merge conflicts logged as events │
│ │
│ NOTE: Can be a no-op initially (single connector). │
│ Becomes relevant with 2+ connectors per tenant. │
└──────────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 4: PROJECT │
│ Worker: sync_project │
│ │
│ Materialize durable Authority Paths from the entity graph. │
│ This is the core new stage — replaces embedded execution_paths │
│ and execution_chains with persistent, versioned path records. │
│ │
│ Steps: │
│ 1. For each affected workload/identity entity: │
│ a. BFS: HAS_ROLE → GRANTS → APPLIES_TO → resource │
│ b. Follow RUNS_AS → identity (cross-lookup) │
│ c. Follow AUTHENTICATES_TO (depth-limited, cross-system) │
│ d. For each reachable resource, emit AuthorityPathDoc │
│ 2. Compute path_lineage_id = hash(tenant, workload, resource)│
│ 3. Compute _id = hash(tenant, workload, identity, resource) │
│ 4. Compute composition_hash = hash(identity, roles, actions) │
│ 5. Upsert into authority_paths collection: │
│ - New path: first_seen_at = now, status = active │
│ - Existing, same hash: update last_seen_at only │
│ - Existing, different hash: update fields + composition │
│ 6. Mark paths NOT seen in this sync: status = removed, │
│ removed_at = now │
│ 7. Update current_state snapshot on each active path: │
│ - execution_30d: count evidence in last 30 days │
│ - ownership_status: check OWNED_BY relationships │
│ - egress_category: from workload properties │
│ - active_finding_count: (updated after Stage 5) │
│ 8. Backward compat: continue writing execution_paths[] on │
│ entities and accessible_by[] on resources (deprecated) │
│ 9. Update sync status: projected │
│ │
│ Writes: authority_paths (upsert), entities (backward compat) │
│ Reads: entities, execution_evidence (for current_state) │
│ Correctness: │
│ - Path IDs are deterministic hashes (idempotent upserts) │
│ - composition_hash detects mutations (role added/removed) │
│ - Removed paths are soft-deleted, never hard-deleted │
│ - path_lineage_id groups identity rotations │
│ - Sync metrics: paths_created, paths_updated, paths_removed │
└──────────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 5: EVALUATE │
│ Worker: sync_evaluate │
│ │
│ Produce path-level findings in the `findings` collection. │
│ Extends FindingDoc with path-level fields (path_id, │
│ effective_from, resolved_at, intervals[]). "Finding" is the │
│ user/API/UI term. "Exposure" is a derived concept: the │
│ combined posture of an Authority Path expressed by its │
│ currently active findings. │
│ │
│ Steps: │
│ 1. Query active authority_paths for this tenant │
│ 2. For each path, evaluate finding rules: │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Finding Rules (path-level) │ │
│ │ │ │
│ │ orphaned_ownership: no active OWNED_BY on workload │ │
│ │ dormant_authority: no evidence in 90 days │ │
│ │ reachable_sensitive_domain: sensitivity = restricted │ │
│ │ unknown_identity_binding: identity_id is null │ │
│ │ unproven_execution: no execution evidence linked │ │
│ │ scope_drift: roles increased vs baseline │ │
│ │ llm_egress: egress_category = llm │ │
│ │ external_egress: egress_category = external │ │
│ │ ownership_ambiguous: only group/team owners │ │
│ │ ownership_unknown: insufficient ownership metadata │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ 3. For each rule that fires: │
│ a. Compute finding_key = hash(tenant, path_id, type) │
│ b. Upsert FindingDoc by finding_key (in `findings` │
│ collection — same collection, extended schema) │
│ c. If new OR existing status is remediated: │
│ - set status = active │
│ - set effective_from = detected_at = now │
│ - append new interval in intervals[] │
│ d. If existing + active: update last_evaluated_at │
│ e. Generate deterministic explanation │
│ │
│ 4. Auto-resolve findings whose conditions no longer hold: │
│ a. Query active path-level findings for evaluated paths │
│ b. If finding type NOT in current evaluation results: │
│ - Set status = remediated, resolved_at = now │
│ - Set resolution_reason (owner_assigned, evidence_ │
│ appeared, path_removed, sensitivity_downgraded) │
│ - Close current interval in intervals[] │
│ c. If status = acknowledged|false_positive and condition │
│ still fires: keep status, update last_evaluated_at │
│ │
│ 5. Update current_state.active_finding_count on paths │
│ 6. Update current_state.max_finding_severity on paths │
│ │
│ 7. Update sync status: evaluated │
│ 8. Enqueue build_evidence_pack for changed findings │
│ │
│ Writes: findings (upsert), authority_paths (current_state) │
│ Reads: authority_paths, entities, execution_evidence, │
│ entity_versions, findings (existing) │
│ Correctness: │
│ - Finding IDs are deterministic (idempotent) │
│ - Auto-resolve is re-entrant: same rule set re-evaluated │
│ each sync, condition gone = finding remediated │
│ - effective_from never changes within an open interval │
│ - resolved_at + resolution_reason create audit trail │
│ - Finding duration = resolved_at - effective_from │
│ - Sync metrics: findings_opened, findings_resolved, │
│ findings_unchanged │
└──────────────────────┬──────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Stage 6: PUBLISH │
│ Worker: sync_publish │
│ │
│ Build evidence packs for changed findings. Update sync to │
│ completed. Record final metrics. │
│ │
│ Steps: │
│ 1. For each changed finding: │
│ a. Fetch path + workload + identity + related entities │
│ b. Build 9-section evidence pack │
│ c. Compute integrity hash (SHA256 + tenant_id) │
│ d. Insert EvidencePackDoc with previous_pack_id chain │
│ e. Update finding with evidence_pack_id │
│ │
│ 2. Update ConnectorSyncDoc: │
│ - status: completed │
│ - completed_at: now │
│ - metrics: { │
│ entities_created, entities_updated, entities_deleted, │
│ events_created, │
│ paths_created, paths_updated, paths_removed, │
│ findings_opened, findings_resolved, │
│ evidence_packs_built │
│ } │
│ │
│ Writes: evidence_packs, findings (pack_id), connector_syncs │
│ Reads: findings, authority_paths, entities, execution_evidence │
│ entity_versions, events │
│ Correctness: │
│ - Evidence packs are immutable after creation │
│ - Integrity hash includes tenant_id (cross-tenant tamper) │
│ - Pack chaining via previous_pack_id (audit evolution) │
│ - Sync metrics are the definitive record of what happened │
└─────────────────────────────────────────────────────────────────┘
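The three hashes computed in Stage 4 and the upsert branches in step 5 can be sketched as below. This is a hedged sketch only: SHA-256, the "|" separator, the sorting of roles and actions, and the names pathIds and classifyUpsert are assumptions, not the project's actual scheme.

```typescript
import { createHash } from "node:crypto";

// Hash helper. SHA-256 over "|"-joined parts is an assumption for illustration;
// a null identity (unbound path) is encoded as an empty string.
const h = (...parts: (string | null)[]): string =>
  createHash("sha256").update(parts.map((p) => p ?? "").join("|")).digest("hex");

interface PathInput {
  tenant: string;
  workload: string;
  identity: string | null;
  resource: string;
  roles: string[];
  actions: string[];
}

function pathIds(p: PathInput) {
  return {
    // Stable per (tenant, workload, identity, resource): idempotent upserts.
    _id: h(p.tenant, p.workload, p.identity, p.resource),
    // Omits identity, so identity rotations share one lineage.
    path_lineage_id: h(p.tenant, p.workload, p.resource),
    // Sorted copies so that role/action ordering does not change the hash.
    composition_hash: h(p.identity, [...p.roles].sort().join(","), [...p.actions].sort().join(",")),
  };
}

type UpsertOutcome = "created" | "unchanged" | "updated";

// Mirrors step 5: new path, same composition hash, or changed composition hash.
function classifyUpsert(existingHash: string | undefined, newHash: string): UpsertOutcome {
  if (existingHash === undefined) return "created";
  return existingHash === newHash ? "unchanged" : "updated";
}
```

The outcomes map directly onto the paths_created, paths_unchanged, and paths_updated counters; paths_removed comes from the separate "not seen in this sync" pass in step 6.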
4. MongoDB Collections (Target State)
Existing (extended)
| Collection | Purpose | Written By | Changes |
|---|---|---|---|
| connector_syncs | Sync metadata + metrics | Import, Publish | Metrics renamed (see Section 7) |
| entities | Entity graph (9 types) | Resolve | Unchanged |
| entity_versions | Temporal snapshots | Resolve | Unchanged |
| events | Change audit log | Resolve | Unchanged |
| execution_evidence | Activity records from source systems | Resolve | Unchanged |
| findings | Path-level and entity-level findings | Evaluate | Extended with path_id, effective_from, resolved_at, intervals[] (see Section 5) |
| evidence_packs | Immutable evidence attestations | Publish | Unchanged |
New
| Collection | Purpose | Written By |
|---|---|---|
| authority_paths | Durable workload→resource routes | Project |
Note: There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." All finding records (entity-level and path-level) live in the findings collection. Path-level findings are distinguished by the presence of path_id.
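Because "exposure" is derived rather than stored, it can be computed on demand from the findings collection. The sketch below assumes the four severity values used elsewhere in this document and an ordering of low < medium < high < critical; the names Exposure and deriveExposure are hypothetical.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

interface FindingLite {
  path_id?: string; // absent on entity-level (legacy) findings
  status: string;
  severity: Severity;
  finding_type: string;
}

interface Exposure {
  active_finding_count: number;
  max_finding_severity: Severity | null;
  finding_types: string[];
}

// Assumed ordering; the document does not define a numeric rank.
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Exposure of one path = posture expressed by its currently active findings.
function deriveExposure(pathId: string, findings: FindingLite[]): Exposure {
  const active = findings.filter((f) => f.path_id === pathId && f.status === "active");
  let max: Severity | null = null;
  for (const f of active) {
    if (max === null || SEVERITY_RANK[f.severity] > SEVERITY_RANK[max]) max = f.severity;
  }
  return {
    active_finding_count: active.length,
    max_finding_severity: max,
    finding_types: active.map((f) => f.finding_type),
  };
}
```

The same derivation is what Stage 5 denormalizes into current_state.active_finding_count and current_state.max_finding_severity.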
Deprecated (remove after migration)
| Collection | Replaced By |
|---|---|
| execution_chains | authority_paths |
| entities.execution_paths[] | authority_paths collection (embedded array kept temporarily for backward compat) |
| entities.accessible_by[] | Derived from authority_paths at query time |
5. Data Model (Target)
AuthorityPathDoc
interface AuthorityPathDoc {
_id: string; // hash(tenant, workload_id, identity_id, resource_id)
tenant_id: string;
path_lineage_id: string; // hash(tenant, workload_id, resource_id)
// Path nodes
workload_id: string;
identity_id: string | null; // null = unbound
destination_id: string;
data_domain: string;
// Path metadata
sensitivity: string;
via_roles: string[];
actions: string[];
source_system: string;
auth_chain_depth: number;
// Denormalized state (updated each sync by Project + Evaluate)
current_state: {
execution_30d: number;
ownership_status: string; // valid, orphaned, ambiguous, unknown
egress_category: string; // external, llm, internal, none
active_finding_count: number;
max_finding_severity: string | null;
};
// Mutation detection
composition_hash: string; // hash(identity, roles, actions)
// Temporal
first_seen_at: Date;
last_seen_at: Date;
status: "active" | "removed";
removed_at?: Date;
// Sync tracking
sync_version: number;
created_at: Date;
updated_at: Date;
}
Indexes:
- { tenant_id: 1, workload_id: 1, status: 1 }: paths from a workload
- { tenant_id: 1, identity_id: 1 }: paths through an identity
- { tenant_id: 1, data_domain: 1, sensitivity: 1 }: paths to sensitive domains
- { tenant_id: 1, status: 1, "current_state.max_finding_severity": 1 }: active paths with findings
- { tenant_id: 1, path_lineage_id: 1 }: lineage grouping
FindingDoc (Extended for Path-Level Findings)
The existing FindingDoc in src/domain/findings/types.ts is extended with path-level fields. There is no separate ExposureDoc. Both entity-level findings (legacy, path_id absent) and path-level findings (path_id present) coexist in the same findings collection.
interface FindingDoc {
// Existing fields (unchanged)
_id: string; // path-level: hash(tenant, path_id, finding_type)
// entity-level (legacy): "eval:" + hash
tenant_id: string;
entity_id: string; // workload (denormalized for both types)
finding_type: FindingType; // orphaned_ownership, dormant_authority, etc.
severity: FindingSeverity;
explanation: string;
status: "active" | "acknowledged" | "remediated" | "false_positive";
resolution_reason?: string;
evidence_refs: Record<string, unknown>;
evidence_completeness: EvidenceCompletenessSection;
evidence_pack_id?: string;
sync_version: number;
detected_at: Date;
last_evaluated_at: Date;
// === NEW: Path-level fields (added for path-level findings) ===
path_id?: string; // links to authority_paths._id
// absent = entity-level finding (legacy)
// present = path-level finding (new)
effective_from?: Date; // when condition became true on this path
resolved_at?: Date; // when condition ended on this path
// NOTE: "resolved_at" is the canonical field name
// regardless of resolution reason (remediated,
// false_positive, path_removed). The resolution_reason
// field captures WHY. UI renders status "remediated"
// as "Resolved".
intervals?: Array<{
effective_from: Date; // interval open timestamp
resolved_at?: Date; // interval close timestamp
resolution_reason?: string; // set when remediated/closed
}>; // append-only history for open/resolve/reopen cycles
// SAFEGUARD: if intervals.length > 50, rotate to a new
// FindingDoc (new _id, link via previous_finding_id) to
// prevent mega-document performance degradation in MongoDB.
// 50 intervals ≈ 25 flicker cycles — well above normal
// operational patterns (typical: 1-3 intervals per finding).
previous_finding_id?: string; // links to rotated-out finding doc (if intervals exceeded cap)
}
New indexes (in addition to existing):
- { tenant_id: 1, path_id: 1, status: 1 }: findings on a specific path
- { tenant_id: 1, path_id: 1, finding_type: 1 }: unique finding per path+type (for deterministic upsert)
- { tenant_id: 1, status: 1, severity: 1, path_id: 1 }: active path-level findings by severity
Note on finding types: Keep the canonical FindingType taxonomy (for example: orphaned_ownership, dormant_authority, reachable_sensitive_domain, unknown_identity_binding, unproven_execution, scope_drift, llm_egress, external_egress, ownership_ambiguous, ownership_unknown). Do not introduce parallel _path-suffixed finding types. Path scope is represented by path_id, not by duplicating type names.
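The interval-cap safeguard described in the FindingDoc comments can be sketched as a pure function. How the rotated document's new _id is generated is not specified in this document, so the newId callback below is a hypothetical placeholder, as is the name appendInterval.

```typescript
interface Interval {
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
}

interface RotatableFinding {
  _id: string;
  intervals: Interval[];
  previous_finding_id?: string;
}

const MAX_INTERVALS = 50; // cap stated in the FindingDoc comments

// Append an interval, rotating to a fresh document once the cap is reached.
// `newId` is a hypothetical ID generator supplied by the caller.
function appendInterval(
  doc: RotatableFinding,
  interval: Interval,
  newId: () => string
): { current: RotatableFinding; rotatedOut?: RotatableFinding } {
  if (doc.intervals.length < MAX_INTERVALS) {
    return { current: { ...doc, intervals: [...doc.intervals, interval] } };
  }
  // At the cap: leave the old document untouched and start a linked successor.
  return {
    current: { _id: newId(), intervals: [interval], previous_finding_id: doc._id },
    rotatedOut: doc,
  };
}
```

With this shape no document ever holds more than 50 intervals, and the full history stays reachable by following previous_finding_id.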
6. API Endpoints (Target)
Authority Paths
Row model decision: Each Authority Path is one row (path-level, not lineage-level). When the same workload reaches the same destination through multiple identities, each identity produces a separate row. Findings are bound to specific paths. Lineage-level grouping (collapsible rows by path_lineage_id) may be added later as a UI-only presentation concern without schema or API changes.
GET /api/v1/authority-paths
?status=active|removed
?sensitivity=restricted,confidential
?data_domain=finance
?workload_id=...
?identity_id=...
?has_findings=true
?limit=50&cursor=...
GET /api/v1/authority-paths/:id
GET /api/v1/authority-paths/:id/findings
?status=active|acknowledged|remediated|false_positive
?limit=50&cursor=...
List row contract (AuthorityPathListItem) — shared by both /authority-paths and /risk-clusters/:key/authority-paths:
interface AuthorityPathListItem {
_id: string;
path_lineage_id: string;
// Path nodes (display names resolved server-side)
workload: { id: string; display_name: string };
identity: { id: string; display_name: string } | null; // null = unbound
destination: { id: string; display_name: string };
data_domain: string;
// Path metadata
sensitivity: string;
via_roles: string[];
source_system: string;
// UX row fields
first_seen_at: string; // ISO date, shown as "First Seen" + recency tag
last_seen_at: string; // ISO date, shown as "Last Execution"
execution_30d: number; // "30d Executions" column
ownership_status: string; // "valid" | "orphaned" | "ambiguous" | "unknown"
egress_category: string; // "external" | "llm" | "internal" | "none"
is_autonomous: boolean; // true if workload execution_mode = "autonomous"
// Finding pills (rendered as severity-colored badges)
active_finding_count: number;
max_finding_severity: string | null; // "critical" | "high" | "medium" | "low" | null
finding_types: string[]; // active finding types for pill labels
// Status
status: "active" | "removed";
}
Identity column in collapsed row: Each row represents one full path, so identity is always a single value. For paths with identity: null (unbound workloads), display "Unbound" with a warning indicator. The identity column shows the identity's display_name (e.g., "svc-billing-sync (Service Principal)").
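A small sketch of the identity-cell rule above, assuming the UI receives the identity field of AuthorityPathListItem as-is. The function name identityCell and the returned shape are hypothetical.

```typescript
interface IdentityRef {
  id: string;
  display_name: string;
}

// Null identity means an unbound workload: show "Unbound" with a warning flag.
function identityCell(identity: IdentityRef | null): { label: string; warning: boolean } {
  if (identity === null) return { label: "Unbound", warning: true };
  return { label: identity.display_name, warning: false };
}
```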
Findings (extended — primary API surface)
"Finding" is the user/API/UI-facing term. Path-level findings have path_id set; entity-level findings (legacy) do not.
GET /api/v1/findings
?status=active|acknowledged|remediated|false_positive
?severity=critical,high
?finding_type=orphaned_ownership,dormant_authority
?path_id=... (filter to specific authority path)
?entity_id=...
?scope=path|entity|all (default: all — filter by path-level vs entity-level)
?limit=50&cursor=...
GET /api/v1/findings/:id
GET /api/v1/findings/:id/evidence-pack
?format=json|markdown
PATCH /api/v1/findings/:id/status
{ status: "acknowledged" | "false_positive" | "remediated", reason?: string }
UI mapping note: status = remediated is rendered as Resolved in path-level finding panels.
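The PATCH body contract and the UI label mapping can be sketched as follows. Only the remediated → "Resolved" mapping comes from this document; the other labels and the names parseStatusPatch and statusLabel are assumptions.

```typescript
type FindingStatus = "active" | "acknowledged" | "remediated" | "false_positive";
type PatchableStatus = Exclude<FindingStatus, "active">;

const PATCHABLE: readonly string[] = ["acknowledged", "false_positive", "remediated"];

interface StatusPatch {
  status: PatchableStatus;
  reason?: string;
}

// Validate an untrusted PATCH body; returns null when it does not match the contract.
function parseStatusPatch(body: unknown): StatusPatch | null {
  if (typeof body !== "object" || body === null) return null;
  const { status, reason } = body as { status?: unknown; reason?: unknown };
  if (typeof status !== "string" || !PATCHABLE.includes(status)) return null;
  if (reason === undefined) return { status: status as PatchableStatus };
  if (typeof reason !== "string") return null;
  return { status: status as PatchableStatus, reason };
}

// UI label for a stored status. Labels other than "Resolved" are assumed.
function statusLabel(status: FindingStatus): string {
  switch (status) {
    case "remediated":
      return "Resolved";
    case "active":
      return "Active";
    case "acknowledged":
      return "Acknowledged";
    case "false_positive":
      return "False positive";
  }
}
```

Rejecting "active" in the PATCH body follows the endpoint contract above: findings become active again only through evaluation, not through a manual status change.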
Posture Summary
GET /api/v1/posture/summary
Returns: active autonomous authority path count, active paths with
invalid ownership count, delta since last refresh
Risk Clusters (computed from active findings on paths)
UX decision: Risk clusters are pre-configured compound conditions. The cluster detail page shows a locked, read-only view of authority paths matching the cluster's condition. Users cannot edit filters on the cluster page. A "View in Authority Paths" link carries the cluster's conditions as pre-filled editable filters to the Authority Paths inventory page for ad-hoc exploration. Custom/configurable clusters may be added in a future release.
GET /api/v1/posture/risk-clusters
Returns: cluster summaries computed from active path-level findings
GET /api/v1/risk-clusters/:key/authority-paths
Returns: AuthorityPathListItem[] matching cluster compound condition
Uses the same row contract as /authority-paths (see above)
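One way to picture a "pre-configured compound condition" is as a fixed set of predicates that must all hold for a path row. The ClusterCondition shape and its field names below are hypothetical; this document does not define how clusters are configured.

```typescript
interface PathRow {
  _id: string;
  status: "active" | "removed";
  sensitivity: string;
  is_autonomous: boolean;
  finding_types: string[]; // active finding types on this path
}

// Hypothetical cluster definition: every listed condition must hold (logical AND).
interface ClusterCondition {
  key: string;
  requires_finding_types: string[];
  sensitivity_in?: string[];
  autonomous_only?: boolean;
}

function matchesCluster(path: PathRow, c: ClusterCondition): boolean {
  if (path.status !== "active") return false; // clusters are computed from active paths
  if (c.autonomous_only && !path.is_autonomous) return false;
  if (c.sensitivity_in && !c.sensitivity_in.includes(path.sensitivity)) return false;
  return c.requires_finding_types.every((t) => path.finding_types.includes(t));
}

// Rows for GET /risk-clusters/:key/authority-paths under this hypothetical model.
function clusterPaths(paths: PathRow[], c: ClusterCondition): PathRow[] {
  return paths.filter((p) => matchesCluster(p, c));
}
```

Since the condition is plain data, the "View in Authority Paths" link could serialize the same fields into editable filters on the inventory page.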
Backward Compat (deprecated, kept temporarily)
GET /api/v1/chains — alias for /authority-paths
7. Correctness Model
How We Know the Pipeline Worked Correctly
Each sync produces a ConnectorSyncDoc with complete metrics. This is the correctness record.
interface SyncMetrics {
// Stage 1: Import
nodes_received: number;
edges_received: number;
validation_errors: number;
// Stage 2: Resolve
entities_created: number;
entities_updated: number;
entities_deleted: number;
events_created: number;
evidence_upserted: number;
classifications_computed: number; // security_relevance, execution_mode
// Stage 3: Reconcile
cross_connector_matches: number; // 0 until multi-connector
// Stage 4: Project
paths_created: number; // new authority paths discovered
paths_updated: number; // existing paths with composition change
paths_unchanged: number; // existing paths confirmed (last_seen_at updated)
paths_removed: number; // paths not seen → status: removed
// Stage 5: Evaluate
findings_opened: number; // new path-level finding conditions detected
findings_resolved: number; // conditions no longer true
findings_unchanged: number; // existing active findings re-confirmed
// Stage 6: Publish
evidence_packs_built: number;
// Timing
stage_durations: {
import_ms: number;
resolve_ms: number;
reconcile_ms: number;
project_ms: number;
evaluate_ms: number;
publish_ms: number;
total_ms: number;
};
}
Invariants (always true after a successful sync)
- Path completeness: Every entity with entity_type = workload|identity that has reachable resources via HAS_ROLE→GRANTS→APPLIES_TO has at least one active authority_path record
- Path consistency: authority_paths.composition_hash matches the current entity graph state. If roles or actions changed, the hash changed, and the paths_updated counter incremented
- Finding re-entrancy: Re-running the same sync produces identical finding state. Rules are pure functions. Same input = same output
- Temporal monotonicity: effective_from never changes within an interval (a reopen appends a new interval instead of rewriting an earlier one). resolved_at is set exactly once per interval. last_evaluated_at always increases
- Soft-delete guarantee: No authority_path or finding document is ever hard-deleted. paths_removed + findings_resolved are always >= 0
- Evidence integrity: Every evidence_pack has a SHA256 hash that includes tenant_id. Tampering is detectable
- Metric accounting: paths_created + paths_updated + paths_unchanged + paths_removed = total paths seen + total paths previously active but not seen
- Interval cap: No FindingDoc has intervals.length > 50. If a finding reaches the cap, it is rotated to a new document (new _id, linked via previous_finding_id). This prevents mega-document performance degradation in MongoDB (16 MB doc limit, slow updates on large arrays)
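The temporal and interval-cap invariants lend themselves to a per-document check. The sketch below is one possible reading: it additionally assumes that intervals are stored in chronological order and that only the last interval may be open, which this document implies but does not state outright. The name checkIntervals is hypothetical.

```typescript
interface Interval {
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
}

const MAX_INTERVALS = 50;

// Returns a list of violations; an empty list means the history is well-formed.
function checkIntervals(intervals: Interval[]): string[] {
  const errors: string[] = [];
  if (intervals.length > MAX_INTERVALS) {
    errors.push(`interval cap exceeded: ${intervals.length} > ${MAX_INTERVALS}`);
  }
  intervals.forEach((iv, i) => {
    const isLast = i === intervals.length - 1;
    // Only the most recent interval may still be open.
    if (!iv.resolved_at && !isLast) errors.push(`interval ${i}: open but not last`);
    // An interval cannot close before it opened.
    if (iv.resolved_at && iv.resolved_at.getTime() < iv.effective_from.getTime()) {
      errors.push(`interval ${i}: resolved_at precedes effective_from`);
    }
    // A reopen cannot start before the previous interval closed.
    const prev = i > 0 ? intervals[i - 1] : undefined;
    if (prev?.resolved_at && iv.effective_from.getTime() < prev.resolved_at.getTime()) {
      errors.push(`interval ${i}: overlaps interval ${i - 1}`);
    }
  });
  return errors;
}
```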
Correctness Checks (automated, run after each sync)
async function validateSyncCorrectness(tenantId: string, syncId: string): Promise<ValidationResult> {
const sync = await storage.getSync(tenantId, syncId);
const errors: string[] = [];
// 1. All active paths have at least one node reference that exists
const activePaths = await storage.queryAuthorityPaths(tenantId, { status: "active" });
for (const path of activePaths) {
const workload = await storage.getEntity(tenantId, path.workload_id);
if (!workload) errors.push(`Path ${path._id}: workload ${path.workload_id} not found`);
const resource = await storage.getEntity(tenantId, path.destination_id);
if (!resource) errors.push(`Path ${path._id}: destination ${path.destination_id} not found`);
}
// 2. All active path-level findings reference an existing active path
const activeFindings = await storage.queryFindings(tenantId, { status: "active", scope: "path" });
for (const finding of activeFindings) {
if (!finding.path_id) continue; // entity-level finding, skip
const path = await storage.getAuthorityPath(tenantId, finding.path_id);
if (!path) errors.push(`Finding ${finding._id}: path ${finding.path_id} not found`);
if (path && path.status === "removed") {
errors.push(`Finding ${finding._id}: active finding on removed path ${finding.path_id}`);
}
}
// 3. current_state.active_finding_count matches actual count
for (const path of activePaths) {
const pathFindings = activeFindings.filter(f => f.path_id === path._id);
if (pathFindings.length !== path.current_state.active_finding_count) {
errors.push(`Path ${path._id}: finding count mismatch (${path.current_state.active_finding_count} vs ${pathFindings.length})`);
}
}
// 4. Metrics add up
const totalPathsProcessed = sync.metrics.paths_created + sync.metrics.paths_updated
+ sync.metrics.paths_unchanged + sync.metrics.paths_removed;
// Should equal: active paths from previous sync + new paths from this sync
return { valid: errors.length === 0, errors, sync_id: syncId };
}
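Check 4 in the function above computes the total but does not compare it to anything. A hedged sketch of the comparison follows, assuming the caller can supply the two counts named in the metric accounting invariant. The per-counter breakdown is an inference from the SyncMetrics comments rather than a stated rule, and checkPathAccounting is a hypothetical name.

```typescript
interface PathMetrics {
  paths_created: number;
  paths_updated: number;
  paths_unchanged: number;
  paths_removed: number;
}

// pathsSeen: paths emitted by the Project stage in this sync.
// previouslyActiveNotSeen: paths active before this sync that were not emitted.
function checkPathAccounting(
  m: PathMetrics,
  pathsSeen: number,
  previouslyActiveNotSeen: number
): string[] {
  const errors: string[] = [];
  const total = m.paths_created + m.paths_updated + m.paths_unchanged + m.paths_removed;
  if (total !== pathsSeen + previouslyActiveNotSeen) {
    errors.push(`path metrics sum to ${total}, expected ${pathsSeen + previouslyActiveNotSeen}`);
  }
  // Inferred breakdown: created/updated/unchanged partition the seen paths.
  const seen = m.paths_created + m.paths_updated + m.paths_unchanged;
  if (seen !== pathsSeen) errors.push(`seen-path counters sum to ${seen}, expected ${pathsSeen}`);
  // Inferred breakdown: every previously active path that was not seen is removed.
  if (m.paths_removed !== previouslyActiveNotSeen) {
    errors.push(`paths_removed is ${m.paths_removed}, expected ${previouslyActiveNotSeen}`);
  }
  return errors;
}
```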
8. Connector Responsibility (Current → Target)
Current (connector does too much)
Connector:
1. Discover entities from source APIs
2. Resolve cross-entity edges (EdgeResolver)
3. Classify entities (execution_mode, security_relevance, egress_category)
4. Filter entities (internal_inventory pre-filter)
5. Build NormalizedGraph with full relationship set
6. Submit to platform
Platform:
1. Store entities
2. Compute execution paths (BFS)
3. Assemble execution chains
4. Evaluate findings (entity-level)
5. Build evidence packs
Target (connector is thin, platform does the work)
Connector:
1. Discover entities from source APIs (flat DiscoveredEntities)
2. Resolve source-system-specific edges (client_id match, script search)
↑ Only edges that require source-system knowledge
3. Submit to platform
Platform:
1. Import: validate, normalize
2. Resolve: cross-entity edge resolution, platform-side classification
3. Reconcile: multi-connector entity merge (future)
4. Project: materialize authority paths (BFS → persistent path records)
5. Evaluate: compute path-level findings (persistent, temporal, with intervals[])
6. Publish: evidence packs, sync metrics, correctness validation
What moves from connector to platform:
- security_relevance classification → platform evaluator (4I)
- execution_mode classification → platform evaluator (can derive from trigger types in properties)
- egress_category classification → platform evaluator (can derive from endpoint URLs in properties)
- internal_inventory filtering → platform UI filter (show/hide toggle)
- Cross-connector entity matching → platform reconcile stage (4D)
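As an illustration of deriving egress_category platform-side from endpoint URLs, the sketch below classifies by hostname. The host lists are caller-supplied configuration, and both the EgressConfig shape and the precedence (llm over external over internal over none) are assumptions; this document does not specify the actual rules.

```typescript
type EgressCategory = "external" | "llm" | "internal" | "none";

// Hypothetical configuration; real host lists would come from platform settings.
interface EgressConfig {
  llmHosts: string[]; // exact hostnames treated as LLM providers
  internalSuffixes: string[]; // hostname suffixes treated as internal
}

function classifyEgress(endpoints: string[], cfg: EgressConfig): EgressCategory {
  let sawExternal = false;
  let sawInternal = false;
  for (const endpoint of endpoints) {
    let host: string;
    try {
      host = new URL(endpoint).hostname.toLowerCase();
    } catch {
      continue; // skip values that are not absolute URLs
    }
    if (cfg.llmHosts.includes(host)) return "llm"; // assumed highest precedence
    const internal = cfg.internalSuffixes.some((s) => host === s || host.endsWith("." + s));
    if (internal) sawInternal = true;
    else sawExternal = true;
  }
  if (sawExternal) return "external";
  return sawInternal ? "internal" : "none";
}
```

Keeping the rule a pure function of entity properties means the connector only has to ship raw endpoint strings, which matches the thin-connector split described in this section.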
What stays in the connector:
- Source API authentication and discovery
- Source-system-specific edge resolution (OAuth client_id → SP matching requires ServiceNow + Entra API knowledge)
- Raw property extraction (display names, trigger types, endpoints)
9. Implementation Sequence
W1.1 implementation note: Section 3 describes the long-term target 6-stage pipeline with separate workers per stage. For W1.1, we follow architecture/02-processing-pipeline.md: the monolithic sync_ingestion handler gains materializeAuthorityPaths() as a new step, and the existing evaluate_findings handler gains a path-level evaluation pass. No separate stage workers are created. The stages below map to functions within the existing handlers, not to new workers.
Phase 1: Authority Path Persistence (Project stage)
Scope: New authority_paths collection, path materializer writes path records, new API endpoints.
| Task | What | Touches |
|---|---|---|
| 1a | Define AuthorityPathDoc type in src/domain/paths/types.ts | New file |
| 1b | Add authority_paths to schema manager (indexes) | src/storage/mongo/schema.ts |
| 1c | Add path storage methods to StorageAdapter | src/storage/storage-adapter.ts |
| 1d | Implement materializeAuthorityPaths() function | src/ingestion/authority-path-materializer.ts (new module) |
| 1e | Call materializer from sync_ingestion handler after chain assembly (step 11 in Doc 09) | src/workers/handlers/sync-ingestion.ts |
| 1f | Add GET /authority-paths and GET /authority-paths/:id API | New route file |
| 1g | Tests: path materialization, upsert idempotency, removal | New test files |
Backward compat: Continue writing execution_paths[] on entities. Both exist during transition.
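The upsert and removal behavior that task 1g tests can be sketched as below. This is a minimal in-memory illustration, not the planned implementation: the field names on AuthorityPathDoc, the AssembledChain input shape, the "|"-joined path key, and the function name materializePaths are all assumptions, and a Map stands in for the authority_paths collection.

```typescript
// Hypothetical document shape; the real AuthorityPathDoc is defined in task 1a.
interface AuthorityPathDoc {
  path_id: string;
  tenant_id: string;
  workload_id: string;
  identity_id: string;
  destination_id: string;
  status: "active" | "removed";
  first_seen: string; // ISO timestamps
  last_seen: string;
  removed_at: string | null;
}

// Assumed output of chain assembly: one record per workload-to-destination chain.
interface AssembledChain {
  workload_id: string;
  identity_id: string;
  destination_id: string;
}

// Deterministic key so the same chain always maps to the same path record.
// A production key would more likely be a hash of these parts.
function pathKey(tenantId: string, chain: AssembledChain): string {
  return [tenantId, chain.workload_id, chain.identity_id, chain.destination_id].join("|");
}

// In-memory stand-in for materializeAuthorityPaths(); `store` replaces the collection.
function materializePaths(
  store: Map<string, AuthorityPathDoc>,
  tenantId: string,
  chains: AssembledChain[],
  syncTime: string,
): void {
  const seen = new Set<string>();
  for (const chain of chains) {
    const id = pathKey(tenantId, chain);
    seen.add(id);
    const existing = store.get(id);
    if (existing) {
      // Upsert: refresh the record and revive it if it had been removed.
      existing.status = "active";
      existing.last_seen = syncTime;
      existing.removed_at = null;
    } else {
      store.set(id, {
        path_id: id,
        tenant_id: tenantId,
        ...chain,
        status: "active",
        first_seen: syncTime,
        last_seen: syncTime,
        removed_at: null,
      });
    }
  }
  // Paths absent from this sync are marked removed, never deleted,
  // so they stay historically queryable.
  store.forEach((doc, id) => {
    if (doc.tenant_id === tenantId && doc.status === "active" && !seen.has(id)) {
      doc.status = "removed";
      doc.removed_at = syncTime;
    }
  });
}
```

Running the function twice with the same chains and sync time leaves the store unchanged, which is the idempotency property the tests need to pin down.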
Phase 2: Path-Level Finding Extension (Evaluate stage)
Scope: Extend FindingDoc with path-level fields (path_id, effective_from, resolved_at, intervals[]). Evaluator produces path-level findings in the existing findings collection. No new collection.
| Task | What | Touches |
|---|---|---|
| 2a | Extend FindingDoc type with path_id, effective_from, resolved_at, intervals[] | src/domain/findings/types.ts |
| 2b | Add new finding indexes for path-level queries | src/storage/mongo/schema.ts |
| 2c | Add path-level finding query methods to StorageAdapter (queryFindings with scope param) | src/storage/storage-adapter.ts |
| 2d | Create path-level finding evaluator (wraps existing rules, maps to paths) | New evaluator module |
| 2e | Implement finding lifecycle on paths (open/resolve/reopen with intervals[]) | src/workers/handlers/evaluate-findings.ts |
| 2f | Extend GET /findings with ?path_id=...&scope=path\|entity\|all filters | src/api/routes/findings.ts |
| 2g | Add GET /authority-paths/:id/findings | New route |
| 2h | Add FindingType enum values for path-level types | src/domain/findings/types.ts |
| 2i | Tests: path-level finding lifecycle, auto-resolve, interval append, idempotency | New test files |
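The open/resolve/reopen lifecycle in task 2e can be illustrated with a pure state-transition function. This is a sketch under stated assumptions: the interval shape (opened_at, resolved_at), the meaning of effective_from as the start of the current interval, the joined-string stand-in for hash(tenant, path, type), and the function names are hypothetical and not taken from the codebase.

```typescript
// Hypothetical interval shape; the source only names the intervals[] field.
interface FindingInterval {
  opened_at: string;
  resolved_at: string | null;
}

// Subset of FindingDoc relevant to path-level lifecycle (field semantics assumed).
interface PathFinding {
  finding_id: string;
  path_id: string;
  type: string;
  status: "open" | "resolved";
  effective_from: string; // start of the current or most recent interval
  resolved_at: string | null;
  intervals: FindingInterval[];
}

// Stand-in for hash(tenant, path, type): deterministic, so re-evaluation hits the same doc.
function findingKey(tenantId: string, pathId: string, type: string): string {
  return [tenantId, pathId, type].join("|");
}

function applyEvaluation(
  existing: PathFinding | undefined,
  key: { tenant_id: string; path_id: string; type: string },
  detected: boolean,
  now: string,
): PathFinding | undefined {
  if (!existing) {
    if (!detected) return undefined;
    return {
      finding_id: findingKey(key.tenant_id, key.path_id, key.type),
      path_id: key.path_id,
      type: key.type,
      status: "open",
      effective_from: now,
      resolved_at: null,
      intervals: [{ opened_at: now, resolved_at: null }],
    };
  }
  if (detected && existing.status === "resolved") {
    // Reopen: append a new interval; earlier intervals are untouched.
    return {
      ...existing,
      status: "open",
      effective_from: now,
      resolved_at: null,
      intervals: [...existing.intervals, { opened_at: now, resolved_at: null }],
    };
  }
  if (!detected && existing.status === "open") {
    // Resolve: close only the trailing open interval.
    const last = existing.intervals.length - 1;
    const intervals = existing.intervals.map((iv, i) =>
      i === last ? { ...iv, resolved_at: now } : iv,
    );
    return { ...existing, status: "resolved", resolved_at: now, intervals };
  }
  return existing; // No state change: repeated evaluation is a no-op.
}
```

Because the finding key is deterministic and the no-change branch returns the existing document, evaluating the same sync twice produces the same state, while each reopen adds exactly one interval.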
Key decision: No ExposureDoc, no exposures collection. Path-level findings coexist with entity-level findings in findings. Distinguish by presence of path_id.
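A short sketch of how the scope filter from task 2f could rely on this rule. It assumes entity-level findings carry either no path_id or a null one; the type and function names are illustrative only.

```typescript
type FindingScope = "path" | "entity" | "all";

// Minimal finding shape for filtering; path_id is absent or null on entity-level findings (assumed).
interface FindingLike {
  finding_id: string;
  path_id?: string | null;
}

function matchesScope(finding: FindingLike, scope: FindingScope): boolean {
  if (scope === "all") return true;
  const isPathLevel = finding.path_id != null;
  return scope === "path" ? isPathLevel : !isPathLevel;
}

// Applies the scope filter, plus an optional exact path_id match.
function filterFindings<T extends FindingLike>(
  findings: T[],
  scope: FindingScope,
  pathId?: string,
): T[] {
  return findings.filter(
    (f) => matchesScope(f, scope) && (pathId === undefined || f.path_id === pathId),
  );
}
```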
Phase 3: UX — Overview + Risk Clusters (Feb 18 mockups)
Scope: Implement the Overview homepage and Risk Cluster Detail screens per Feb 18 mockups.
| Task | What | Touches |
|---|---|---|
| 3a | Add GET /api/v1/posture/summary API (active path count, invalid ownership count, delta) | New route |
| 3b | Add GET /api/v1/posture/risk-clusters API (top clusters from active findings on paths) | New route |
| 3c | Add GET /api/v1/risk-clusters/:key/authority-paths API (paths matching cluster condition) | New route |
| 3d | Overview page — "Autonomous Authority Posture": 2 posture cards (Active Autonomous Authority Paths + Dormant Autonomous Authority Paths), "Since Last Refresh" delta section (+X new paths, +Y ownership invalidations), Top 5 Risk Clusters list | New UI page |
| 3e | Risk Cluster Detail page — click cluster → locked, read-only table of Authority Paths matching cluster condition (no ad-hoc filter editing). Includes inline expand (findings, ownership, authority diagram, evidence completeness). "View in Authority Paths" link carries filters to inventory page for ad-hoc exploration | New UI page |
| 3f | Update nav: Overview \| Risk Clusters \| Authority Paths \| Graph \| Settings | UI layout |
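One way the risk-clusters endpoint in task 3b might derive its "top clusters from active findings on paths" is sketched below. The cluster key is assumed to be the finding type, and ranking by distinct path count is an assumption; the source does not define either.

```typescript
interface ActivePathFinding {
  path_id: string;
  type: string;
  status: "open" | "resolved";
}

interface RiskCluster {
  key: string; // assumed: cluster key equals finding type
  path_count: number; // distinct paths with at least one open finding of this type
}

function topRiskClusters(findings: ActivePathFinding[], limit = 5): RiskCluster[] {
  const pathsByKey = new Map<string, Set<string>>();
  for (const f of findings) {
    if (f.status !== "open") continue; // only currently active findings count
    const paths = pathsByKey.get(f.type) ?? new Set<string>();
    paths.add(f.path_id);
    pathsByKey.set(f.type, paths);
  }
  const clusters: RiskCluster[] = [];
  pathsByKey.forEach((paths, key) => clusters.push({ key, path_count: paths.size }));
  // Largest clusters first; tie-break on key so the output order is deterministic.
  clusters.sort((a, b) => b.path_count - a.path_count || a.key.localeCompare(b.key));
  return clusters.slice(0, limit);
}
```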
Phase 4: UX — Authority Paths List + Detail (Feb 18 mockups)
Scope: Implement the Authority Paths List and Authority Path Detail screens.
| Task | What | Touches |
|---|---|---|
| 4a | Authority Paths List page — all paths with search, filters, pagination. Columns: ID, Authority Path (workload→destination), Last Execution, 30d Executions, First Seen + tag pills | New UI page |
| 4b | Authority Path Detail page — breadcrumb nav, dagre diagram (Workload → Identity → Destination → Data Domain), findings panel, authority state, ownership breakdown, automation metadata, linkage proof | New UI page |
| 4c | Path-level finding timeline in detail view (open/resolve/reopen intervals) | UI component |
| 4d | Evidence completeness bar on path detail | UI component |
| 4e | Update TanStack Query hooks for authority-paths, path-findings | ui/src/hooks/ |
Phase 5: Visual QA Fixes
Scope: Address visual QA items with migration-aware sequencing (avoid over-investing in soon-to-be-replaced surfaces).
| Batch | ID | Category | Fix | Touches |
|---|---|---|---|---|
| Pre-migration must-fix | B1 | Bug | Dashboard posture summary shows contradictory totals — fix API query and tenant/type filter correctness | API + UI |
| Pre-migration must-fix | D1 | Design | Truncated finding descriptions — add tooltip/full-text affordance | UI components |
| Pre-migration must-fix | D2 | Design | Date format inconsistency across pages — standardize formatter | UI date utils |
| Pre-migration must-fix | D4 | Design | Graph legend gap — add missing node/edge legend entries | Graph Explorer |
| Pre-migration must-fix | U2 | UX | Badge contrast — improve unknown/low-contrast status badges | UI theme |
| Pre-migration must-fix | U6 | UX | Graph centering — auto-fit graph on load | Graph Explorer |
| Post-migration/defer | D3 | Design | Chains empty-state guidance (only if Chains survives migration window) | UI pages |
| Post-migration/defer | U1 | UX | Posture card tooltips (rework on new path-based cards only) | Overview page |
| Post-migration/defer | U3 | UX | Domain name formatting | UI formatters |
| Post-migration/defer | U4 | UX | Redundant filter cleanup | Filter sidebar |
| Post-migration/defer | U5 | UX | Temporal compare guidance text | Temporal Compare |
| Post-migration/defer | U7 | UX | Sync row expand affordance | Syncs page |
Phase 6: Platform-Side Classification (4I)
Scope: Move security_relevance, execution_mode, egress_category from connector to platform.
| Task | What | Touches |
|---|---|---|
| 6a | Add classification rules to Resolve stage | New classifier module |
| 6b | Remove connector-side classification code | Connector repo |
| 6c | Add UI filter toggle for security_relevance | UI filter sidebar |
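The classifier module in task 6a could derive execution_mode and egress_category from raw connector properties roughly as follows. This is only a sketch: the property names, the trigger vocabulary, the category values, and the idea of a tenant-supplied list of internal domain suffixes are assumptions made for illustration.

```typescript
// Hypothetical raw properties, as a connector might emit them once classification moves out.
interface RawWorkloadProperties {
  trigger_types: string[];
  endpoints: string[];
}

type ExecutionMode = "autonomous" | "interactive" | "unknown";
type EgressCategory = "internal" | "external" | "none";

// Assumed trigger vocabulary; real values are source-system specific.
const AUTONOMOUS_TRIGGERS = new Set(["schedule", "event", "webhook"]);
const INTERACTIVE_TRIGGERS = new Set(["manual", "ui_action"]);

function classifyExecutionMode(props: RawWorkloadProperties): ExecutionMode {
  if (props.trigger_types.some((t) => AUTONOMOUS_TRIGGERS.has(t))) return "autonomous";
  if (props.trigger_types.some((t) => INTERACTIVE_TRIGGERS.has(t))) return "interactive";
  return "unknown";
}

// Simplified hostname extraction (ignores userinfo); returns null if no scheme://host is found.
function hostnameOf(endpoint: string): string | null {
  const match = /^[a-z][a-z0-9+.-]*:\/\/([^/:?#]+)/i.exec(endpoint);
  return match ? match[1].toLowerCase() : null;
}

function classifyEgressCategory(
  props: RawWorkloadProperties,
  internalSuffixes: string[],
): EgressCategory {
  if (props.endpoints.length === 0) return "none";
  const allInternal = props.endpoints.every((endpoint) => {
    const host = hostnameOf(endpoint);
    // Unparseable endpoints are treated as external, the conservative choice.
    if (host === null) return false;
    return internalSuffixes.some((s) => host === s || host.endsWith("." + s));
  });
  return allInternal ? "internal" : "external";
}
```

Keeping these rules as pure functions over stored properties means a rule change can be re-applied to existing entities without a new connector release.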
Phase 7: Cleanup (blocked until Phase 8 gates pass)
| Task | What |
|---|---|
| 7a | Deprecate execution_chains collection |
| 7b | Remove execution_paths[] embedded array from entities |
| 7c | Remove entity-level finding types (after confirming all consumers use path-level) |
| 7d | Run correctness validation across all tenant data |
Phase 8: Operational Hardening (CTO requirement)
| Task | What |
|---|---|
| 8a | Adopt observability contract in architecture/02-processing-pipeline.md (sections 9-11) |
| 8b | Add stage-level metrics, traces, and structured logs across all 6 stages |
| 8c | Configure early alerts for failure, stall, ingestion silence, and backlog growth |
| 8d | Run failure drills: transient failure retry, permanent failure escalation, replay from checkpoint |
| 8e | Gate Phase 7 cleanup on 2 consecutive weeks of stable SLO compliance |
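The gate in task 8e reduces to a check over recent SLO results. A minimal sketch, assuming one compliance report per week with no gaps; the WeeklySloReport shape and function name are hypothetical.

```typescript
// Hypothetical weekly rollup of SLO compliance; assumes exactly one report per week.
interface WeeklySloReport {
  week_start: string; // ISO date, e.g. "2026-03-02"
  compliant: boolean;
}

// Returns true only when the most recent `requiredWeeks` reports are all compliant.
function cleanupGateOpen(reports: WeeklySloReport[], requiredWeeks = 2): boolean {
  if (reports.length < requiredWeeks) return false;
  // ISO dates sort correctly as strings; copy first to avoid mutating the input.
  const sorted = reports.slice().sort((a, b) => a.week_start.localeCompare(b.week_start));
  return sorted.slice(-requiredWeeks).every((r) => r.compliant);
}
```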
10. Operational Requirement (Non-Optional)
For production rollout, this pipeline must be implemented with explicit runtime observability and alerting. The detailed architecture is specified in:
- docs/architecture/02-processing-pipeline.md (sections 9-11: observability, SLOs, alerts, dashboards)
This is required to satisfy CTO requirements for:
- Clear batch/ETL pipeline execution model and stage transitions
- Runtime observability of stage health, queue depth, and data freshness
- Early failure alerting with actionable ownership and escalation
11. Acceptance Criteria
- Clicking a Risk Cluster shows Authority Paths as primary rows
- Each path row shows execution magnitude (30d), ownership status, active finding count, and max severity without additional API calls
- Path detail page shows finding timeline with effective_from and status transitions
- A finding can be opened, remediated (shown as Resolved in UI), and re-opened on the same path without losing history (via intervals[])
- Removed paths and remediated findings remain historically queryable
- Every sync produces a SyncMetrics record with complete accounting of paths/findings processed
- Correctness validation passes after every sync (no orphaned findings, no stale counts)
- Platform can classify security_relevance and execution_mode without connector changes
- Re-running the same sync produces identical path and finding state (deterministic, idempotent)
- Re-opened findings preserve prior intervals in intervals[] and remain audit-queryable
- Pipeline emits operational telemetry and triggers alerts before silent data staleness occurs
- Overview page matches Feb 18 mockup: 2 posture cards (Active + Dormant) + delta + Top 5 Risk Clusters
- Authority Paths list and detail pages match Feb 18 mockups (search, filters, dagre diagram, findings panel)
- Navigation: Overview | Risk Clusters | Authority Paths | Graph | Settings
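The "no orphaned findings, no stale counts" criterion above could be checked after each sync along these lines. This is a sketch: the denormalized active_finding_count field on the path record and the report shape are assumptions, chosen because path rows must show the count without additional API calls.

```typescript
// Assumed: each path record carries a denormalized count of its open findings.
interface PathRecord {
  path_id: string;
  active_finding_count: number;
}

interface FindingRecord {
  finding_id: string;
  path_id?: string | null; // absent or null on entity-level findings (assumed)
  status: "open" | "resolved";
}

interface ValidationReport {
  orphaned_findings: string[]; // path-level findings whose path_id matches no path
  stale_count_paths: string[]; // paths whose stored count differs from the recomputed one
  ok: boolean;
}

function validateSync(paths: PathRecord[], findings: FindingRecord[]): ValidationReport {
  const pathIds = new Set(paths.map((p) => p.path_id));
  const openCounts = new Map<string, number>();
  const orphaned: string[] = [];
  for (const f of findings) {
    if (f.path_id == null) continue; // entity-level findings are out of scope here
    if (!pathIds.has(f.path_id)) {
      orphaned.push(f.finding_id);
      continue;
    }
    if (f.status === "open") {
      openCounts.set(f.path_id, (openCounts.get(f.path_id) ?? 0) + 1);
    }
  }
  const stale = paths
    .filter((p) => p.active_finding_count !== (openCounts.get(p.path_id) ?? 0))
    .map((p) => p.path_id);
  return {
    orphaned_findings: orphaned,
    stale_count_paths: stale,
    ok: orphaned.length === 0 && stale.length === 0,
  };
}
```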