
Combined Platform Pipeline Architecture

Date: 2026-02-18
Status: Proposed - revised after critical review (CTO/CISO/CEO/Architect lenses)
Combines: W1.1 (Doc 14) + Phase 4 (Reconciled Roadmap 4D/4F/4I)
Scope boundary: This document defines W1.1 implementation scope (persistent authority paths + path-level temporal findings). Baseline W1 wedge docs remain valid for conceptual UX/logic framing but do not override this architecture plan's persistence/lifecycle contracts.


0. Critical Review Outcomes

This document was reviewed for logical consistency, CISO workflow fit, CEO product direction, and CTO operational viability.

Terminology decision (2026-02-18): "Finding" remains the user-facing, API-facing, and UI-facing term. There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." Authority Paths are the primary durable objects and navigation units. Findings are extended with path-level fields (path_id, effective_from, resolved_at, intervals[]) to support temporal lifecycle on paths.

| Severity | Finding | Resolution in this revision |
|---|---|---|
| High | Finding ID model (hash(tenant, path, type)) did not clearly support open-resolve-reopen history while staying idempotent | Keep deterministic finding key, but make interval history explicit via append-only intervals[] on FindingDoc |
| High | Pipeline target described correctness but not run-time observability/alerting needed for production operations | Add explicit operational requirement and dedicated architecture doc reference: architecture/02-processing-pipeline.md (sections 9-11) |
| High | UX screens needed to match Feb 18 mockups: Overview, Risk Cluster Detail, Authority Paths List, Authority Path Detail | Implementation sequence updated with UX phases and visual QA fix items |
| Medium | Retry, stall handling, and failure escalation were implied but not specified | Define as mandatory execution concerns and track in implementation sequence |
| Medium | Cleanup/deprecation could remove legacy paths before reliability is proven | Add observability gates before cleanup |

1. Why Combine

Two planned changes touch the same pipeline:

| Planned Change | What It Does | Pipeline Stage Affected |
|---|---|---|
| W1.1: Persistent Authority Paths | Materialize durable path records from entity graph | After entity upsert, before evaluation |
| W1.1: Path-Level Findings | Time-bound findings on specific paths (with path_id, effective_from, intervals[]) | Evaluation stage |
| 4F: Import → Resolve → Reconcile → Project → Evaluate → Publish | Restructure ingestion into explicit stages | Entire pipeline |
| 4I: Platform-side security_relevance | Move classification from connector to platform | Evaluation stage |
| 4D: Platform-side correlation | Cross-connector entity matching | Reconcile stage |

Building W1.1 authority paths now and then rebuilding the pipeline in Phase 4 means doing the work twice. Combining them means we build the target pipeline once with all capabilities.

The key insight: Authority Path materialization IS the "Project" stage of 4F. Path-level finding evaluation IS the "Evaluate" stage. They're not additions to Phase 4 — they are Phase 4.


2. Current Pipeline (What Exists Today)

POST /ingest/normalized-graph


┌─────────────────────────────────────────────────┐
│ Job 1: sync_ingestion │
│ │
│ 1. Create ConnectorSyncDoc (running) │
│ 2. transformGraph() → entities + evidence │
│ 3. computeDiff() → events, changed/created IDs │
│ 4. upsertEntities() │
│ 5. insertEvents() │
│ 6. insertEntityVersions() for changed │
│ 7. soft-delete absent entities │
│ 8. upsertExecutionEvidence() │
│ 9. materializeExecutionPaths() ← paths │
│ 10. assembleExecutionChains() ← chains │
│ 11. Update sync → completed + metrics │
└──────────────────────┬──────────────────────────┘


┌─────────────────────────────────────────────────┐
│ Job 2: evaluate_findings │
│ │
│ 1. Gate: sync.status === "completed" │
│ 2. Query entities (workloads, identities, etc) │
│ 3. Run 12 rules against each entity │
│ 4. Upsert FindingDocs │
│ 5. Enqueue build_evidence_pack per finding │
└──────────────────────┬──────────────────────────┘


┌─────────────────────────────────────────────────┐
│ Job 3..N: build_evidence_pack (per finding) │
│ │
│ 1. Fetch finding + entity + related entities │
│ 2. Fetch evidence + versions + events │
│ 3. Build 9-section evidence pack │
│ 4. Compute integrity hash (SHA256 + tenant) │
│ 5. Insert EvidencePackDoc │
│ 6. Update finding with evidence_pack_id │
└─────────────────────────────────────────────────┘

Problems with current pipeline:

  1. Execution paths are embedded arrays on entities — not queryable, not versioned, rewritten every sync
  2. Execution chains are workload-centric (1 chain per workload) — wrong grain for authority path investigation
  3. Findings are entity-bound — "entity E has orphaned ownership" not "path P has orphaned ownership"
  4. No temporal tracking on findings — no effective_from, no finding duration on specific paths
  5. No path-level finding lifecycle — no open/resolve/reopen history, no intervals[]
  6. Connector does too much — builds NormalizedGraph with pre-computed relationships, classification, filtering

3. Target Pipeline (Combined W1.1 + Phase 4)

POST /ingest/normalized-graph     (current — preserved for backward compat)
POST /ingest/discovered-entities (new — ADR-004 flat entity format)


┌─────────────────────────────────────────────────────────────────┐
│ Stage 1: IMPORT │
│ Worker: sync_import │
│ │
│ Accept either NormalizedGraph (legacy) or DiscoveredEntities │
│ (ADR-004). Normalize to internal entity representation. │
│ │
│ Steps: │
│ 1. Create ConnectorSyncDoc (status: importing) │
│ 2. transformGraph() or transformDiscoveredEntities() │
│ 3. Validate entity types, required fields, edge targets │
│ 4. Output: RawEntityBatch (entities + evidence + edges) │
│ │
│ Writes: connector_syncs │
│ Reads: nothing │
│ Correctness: schema validation, duplicate detection │
└──────────────────────┬──────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Stage 2: RESOLVE │
│ Worker: sync_resolve │
│ │
│ Resolve cross-entity relationships. Today this happens in the │
│ connector (EdgeResolver). Target: platform does it. │
│ │
│ Steps: │
│ 1. computeDiff() — detect created/changed/deleted entities │
│ 2. Resolve edges: client_id matching (OAuth→SP), script-text │
│ search (workload→REST), trigger matching │
│ 3. Classify entities platform-side: │
│ - execution_mode (autonomous/operator_assisted/etc) │
│ - security_relevance (active_external/dormant/internal) │
│ - egress_category (from endpoint analysis) │
│ 4. upsertEntities() │
│ 5. insertEvents() │
│ 6. insertEntityVersions() │
│ 7. upsertExecutionEvidence() │
│ 8. Update sync status: resolved │
│ │
│ Writes: entities, events, entity_versions, execution_evidence │
│ Reads: entities (existing state for diff) │
│ Correctness: diff produces deterministic events, │
│ entity IDs are stable hashes, versions are temporal │
└──────────────────────┬──────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Stage 3: RECONCILE (future — Phase 4D) │
│ Worker: sync_reconcile │
│ │
│ Cross-connector entity matching. When multiple connectors │
│ discover the same real-world entity (e.g., Entra SP seen by │
│ both Entra connector and ServiceNow connector), reconcile │
│ into a single canonical entity. │
│ │
│ Steps: │
│ 1. Match entities across source_systems by deterministic │
│ keys (client_id, objectId, email, sys_id) │
│ 2. Merge properties (prefer source-of-truth system) │
│ 3. Create/update SAME_AS relationships │
│ 4. Update sync status: reconciled │
│ │
│ Writes: entities (merge), relationships │
│ Reads: entities (cross-system query) │
│ Correctness: deterministic matching rules, no fuzzy/ML, │
│ merge conflicts logged as events │
│ │
│ NOTE: Can be a no-op initially (single connector). │
│ Becomes relevant with 2+ connectors per tenant. │
└──────────────────────┬──────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Stage 4: PROJECT │
│ Worker: sync_project │
│ │
│ Materialize durable Authority Paths from the entity graph. │
│ This is the core new stage — replaces embedded execution_paths │
│ and execution_chains with persistent, versioned path records. │
│ │
│ Steps: │
│ 1. For each affected workload/identity entity: │
│ a. BFS: HAS_ROLE → GRANTS → APPLIES_TO → resource │
│ b. Follow RUNS_AS → identity (cross-lookup) │
│ c. Follow AUTHENTICATES_TO (depth-limited, cross-system) │
│ d. For each reachable resource, emit AuthorityPathDoc │
│ 2. Compute path_lineage_id = hash(tenant, workload, resource)│
│ 3. Compute _id = hash(tenant, workload, identity, resource) │
│ 4. Compute composition_hash = hash(identity, roles, actions) │
│ 5. Upsert into authority_paths collection: │
│ - New path: first_seen_at = now, status = active │
│ - Existing, same hash: update last_seen_at only │
│ - Existing, different hash: update fields + composition │
│ 6. Mark paths NOT seen in this sync: status = removed, │
│ removed_at = now │
│ 7. Update current_state snapshot on each active path: │
│ - execution_30d: count evidence in last 30 days │
│ - ownership_status: check OWNED_BY relationships │
│ - egress_category: from workload properties │
│ - active_finding_count: (updated after Stage 5) │
│ 8. Backward compat: continue writing execution_paths[] on │
│ entities and accessible_by[] on resources (deprecated) │
│ 9. Update sync status: projected │
│ │
│ Writes: authority_paths (upsert), entities (backward compat) │
│ Reads: entities, execution_evidence (for current_state) │
│ Correctness: │
│ - Path IDs are deterministic hashes (idempotent upserts) │
│ - composition_hash detects mutations (role added/removed) │
│ - Removed paths are soft-deleted, never hard-deleted │
│ - path_lineage_id groups identity rotations │
│ - Sync metrics: paths_created, paths_updated, paths_removed │
└──────────────────────┬──────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Stage 5: EVALUATE │
│ Worker: sync_evaluate │
│ │
│ Produce path-level findings in the `findings` collection. │
│ Extends FindingDoc with path-level fields (path_id, │
│ effective_from, resolved_at, intervals[]). "Finding" is the │
│ user/API/UI term. "Exposure" is a derived concept: the │
│ combined posture of an Authority Path expressed by its │
│ currently active findings. │
│ │
│ Steps: │
│ 1. Query active authority_paths for this tenant │
│ 2. For each path, evaluate finding rules: │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Finding Rules (path-level) │ │
│ │ │ │
│ │ orphaned_ownership: no active OWNED_BY on workload │ │
│ │ dormant_authority: no evidence in 90 days │ │
│ │ reachable_sensitive_domain: sensitivity = restricted │ │
│ │ unknown_identity_binding: identity_id is null │ │
│ │ unproven_execution: no execution evidence linked │ │
│ │ scope_drift: roles increased vs baseline │ │
│ │ llm_egress: egress_category = llm │ │
│ │ external_egress: egress_category = external │ │
│ │ ownership_ambiguous: only group/team owners │ │
│ │ ownership_unknown: insufficient ownership metadata │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ 3. For each rule that fires: │
│ a. Compute finding_key = hash(tenant, path_id, type) │
│ b. Upsert FindingDoc by finding_key (in `findings` │
│ collection — same collection, extended schema) │
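│ c. If new: set status = active, effective_from = │
│ detected_at = now, open first interval in intervals[] │
│ If existing status is remediated (reopen): set │
│ status = active, clear resolved_at, append a new │
│ interval in intervals[] (effective_from unchanged) │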
│ d. If existing + active: update last_evaluated_at │
│ e. Generate deterministic explanation │
│ │
│ 4. Auto-resolve findings whose conditions no longer hold: │
│ a. Query active path-level findings for evaluated paths │
│ and for paths removed in this sync (Stage 4 step 6) │
│ b. If finding type NOT in current evaluation results: │
│ - Set status = remediated, resolved_at = now │
│ - Set resolution_reason (owner_assigned, evidence_ │
│ appeared, path_removed, sensitivity_downgraded) │
│ - Close current interval in intervals[] │
│ c. If status = acknowledged|false_positive and condition │
│ still fires: keep status, update last_evaluated_at │
│ │
│ 5. Update current_state.active_finding_count on paths │
│ 6. Update current_state.max_finding_severity on paths │
│ │
│ 7. Update sync status: evaluated │
│ 8. Enqueue build_evidence_pack for changed findings │
│ │
│ Writes: findings (upsert), authority_paths (current_state) │
│ Reads: authority_paths, entities, execution_evidence, │
│ entity_versions, findings (existing) │
│ Correctness: │
│ - Finding IDs are deterministic (idempotent) │
│ - Auto-resolve is re-entrant: same rule set re-evaluated │
│ each sync, condition gone = finding remediated │
│ - effective_from never changes after creation │
│ - resolved_at + resolution_reason create audit trail │
│ - Finding duration = resolved_at - effective_from │
│ - Sync metrics: findings_opened, findings_resolved, │
│ findings_unchanged │
└──────────────────────┬──────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────┐
│ Stage 6: PUBLISH │
│ Worker: sync_publish │
│ │
│ Build evidence packs for changed findings. Update sync to │
│ completed. Record final metrics. │
│ │
│ Steps: │
│ 1. For each changed finding: │
│ a. Fetch path + workload + identity + related entities │
│ b. Build 9-section evidence pack │
│ c. Compute integrity hash (SHA256 + tenant_id) │
│ d. Insert EvidencePackDoc with previous_pack_id chain │
│ e. Update finding with evidence_pack_id │
│ │
│ 2. Update ConnectorSyncDoc: │
│ - status: completed │
│ - completed_at: now │
│ - metrics: { │
│ entities_created, entities_updated, entities_deleted, │
│ events_created, │
│ paths_created, paths_updated, paths_removed, │
│ findings_opened, findings_resolved, │
│ evidence_packs_built │
│ } │
│ │
│ Writes: evidence_packs, findings (pack_id), connector_syncs │
│ Reads: findings, authority_paths, entities, execution_evidence │
│ entity_versions, events │
│ Correctness: │
│ - Evidence packs are immutable after creation │
│ - Integrity hash includes tenant_id (detects cross-tenant tampering) │
│ - Pack chaining via previous_pack_id (audit evolution) │
│ - Sync metrics are the definitive record of what happened │
└─────────────────────────────────────────────────────────────────┘
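The status hand-off between stages can be sketched as a small state machine. This is a minimal sketch under the assumption that each worker advances ConnectorSyncDoc.status by exactly one step; `nextStatus` and `assertCanRun` are hypothetical helpers, and failure or retry statuses are omitted because this document does not define them.

```typescript
// Statuses written to ConnectorSyncDoc.status by each stage (from the diagram above).
type SyncStatus =
  | "importing"  // Stage 1: IMPORT (set when the sync is created)
  | "resolved"   // Stage 2: RESOLVE
  | "reconciled" // Stage 3: RECONCILE (a no-op while there is one connector)
  | "projected"  // Stage 4: PROJECT
  | "evaluated"  // Stage 5: EVALUATE
  | "completed"; // Stage 6: PUBLISH

const STATUS_ORDER: readonly SyncStatus[] = [
  "importing", "resolved", "reconciled", "projected", "evaluated", "completed",
];

// Next status in the fixed order, or null once the sync is completed.
function nextStatus(current: SyncStatus): SyncStatus | null {
  const i = STATUS_ORDER.indexOf(current);
  return i < STATUS_ORDER.length - 1 ? STATUS_ORDER[i + 1] : null;
}

// Guard for a stage worker: it may only run when the sync holds the status
// written by the immediately preceding stage (e.g. sync_project needs "reconciled").
function assertCanRun(current: SyncStatus, produces: SyncStatus): void {
  if (nextStatus(current) !== produces) {
    throw new Error(`cannot produce "${produces}" from status "${current}"`);
  }
}
```

A worker such as sync_project would call `assertCanRun(sync.status, "projected")` before writing anything. Whether a re-delivered job for an already-advanced sync should be a no-op or an error is left open here.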

4. MongoDB Collections (Target State)

Existing (extended)

| Collection | Purpose | Written By | Changes |
|---|---|---|---|
| connector_syncs | Sync metadata + metrics | Import, Publish | Metrics renamed (see Section 7) |
| entities | Entity graph (9 types) | Resolve | Unchanged |
| entity_versions | Temporal snapshots | Resolve | Unchanged |
| events | Change audit log | Resolve | Unchanged |
| execution_evidence | Activity records from source systems | Resolve | Unchanged |
| findings | Path-level and entity-level findings | Evaluate, Publish (evidence_pack_id) | Extended with path_id, effective_from, resolved_at, intervals[] (see Section 5) |
| evidence_packs | Immutable evidence attestations | Publish | Unchanged |

New

| Collection | Purpose | Written By |
|---|---|---|
| authority_paths | Durable workload→resource routes | Project (Evaluate updates current_state) |

Note: There is no separate exposures collection. "Exposure" is a derived concept meaning "the combined posture of an Authority Path as expressed by its currently active findings." All finding records (entity-level and path-level) live in the findings collection. Path-level findings are distinguished by the presence of path_id.
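Because exposure is derived rather than stored, the `current_state.active_finding_count` and `current_state.max_finding_severity` values written in Stage 5 can be computed from findings alone. A minimal sketch, assuming only `status = active` counts and severities rank critical > high > medium > low (the values used by the list row contract in Section 6); `deriveExposure` is a hypothetical name.

```typescript
type FindingSeverity = "critical" | "high" | "medium" | "low";
type FindingStatus = "active" | "acknowledged" | "remediated" | "false_positive";

// Minimal subset of FindingDoc needed for the derivation.
interface FindingLike {
  path_id?: string;
  severity: FindingSeverity;
  status: FindingStatus;
}

// Higher rank = more severe.
const SEVERITY_RANK: Record<FindingSeverity, number> = {
  low: 1, medium: 2, high: 3, critical: 4,
};

// Exposure of one path, computed from its currently active path-level findings.
function deriveExposure(pathId: string, findings: FindingLike[]) {
  let count = 0;
  let max: FindingSeverity | null = null;
  for (const f of findings) {
    if (f.path_id !== pathId || f.status !== "active") continue;
    count += 1;
    if (max === null || SEVERITY_RANK[f.severity] > SEVERITY_RANK[max]) max = f.severity;
  }
  return { active_finding_count: count, max_finding_severity: max };
}
```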

Deprecated (remove after migration)

| Collection | Replaced By |
|---|---|
| execution_chains | authority_paths |
| entities.execution_paths[] | authority_paths collection (embedded array kept temporarily for backward compat) |
| entities.accessible_by[] | Derived from authority_paths at query time |

5. Data Model (Target)

AuthorityPathDoc

interface AuthorityPathDoc {
_id: string; // hash(tenant, workload_id, identity_id, resource_id)
tenant_id: string;
path_lineage_id: string; // hash(tenant, workload_id, resource_id)

// Path nodes
workload_id: string;
identity_id: string | null; // null = unbound
destination_id: string;
data_domain: string;

// Path metadata
sensitivity: string;
via_roles: string[];
actions: string[];
source_system: string;
auth_chain_depth: number;

// Denormalized state (updated each sync by Project + Evaluate)
current_state: {
execution_30d: number;
ownership_status: string; // valid, orphaned, ambiguous, unknown
egress_category: string; // external, llm, internal, none
active_finding_count: number;
max_finding_severity: string | null;
};

// Mutation detection
composition_hash: string; // hash(identity, roles, actions)

// Temporal
first_seen_at: Date;
last_seen_at: Date;
status: "active" | "removed";
removed_at?: Date;

// Sync tracking
sync_version: number;
created_at: Date;
updated_at: Date;
}

Indexes:

  • { tenant_id: 1, workload_id: 1, status: 1 } — paths from a workload
  • { tenant_id: 1, identity_id: 1 } — paths through an identity
  • { tenant_id: 1, data_domain: 1, sensitivity: 1 } — paths to sensitive domains
  • { tenant_id: 1, status: 1, "current_state.max_finding_severity": 1 } — active paths with findings
  • { tenant_id: 1, path_lineage_id: 1 } — lineage grouping
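Stage 4 specifies the three hashes only by their inputs. The sketch below assumes SHA-256 from Node's `crypto` module over a JSON-encoded array of parts, plus a hypothetical `"path"`/`"lineage"` prefix to keep the two id spaces visibly separate; none of these choices is fixed by this document, and the function names are illustrative.

```typescript
import { createHash } from "node:crypto";

// SHA-256 over a JSON-encoded array: encoding the parts as an array keeps
// ("a|b", "c") and ("a", "b|c") distinct, and null distinct from "null".
function stableHash(parts: unknown[]): string {
  return createHash("sha256").update(JSON.stringify(parts)).digest("hex");
}

// _id: one record per (workload, identity, resource)
function authorityPathId(tenantId: string, workloadId: string, identityId: string | null, resourceId: string): string {
  return stableHash(["path", tenantId, workloadId, identityId, resourceId]);
}

// path_lineage_id: groups all identities through which a workload reaches a resource
function pathLineageId(tenantId: string, workloadId: string, resourceId: string): string {
  return stableHash(["lineage", tenantId, workloadId, resourceId]);
}

// composition_hash: roles/actions are sorted so source ordering is not a "mutation"
function compositionHash(identityId: string | null, roles: string[], actions: string[]): string {
  return stableHash([identityId, [...roles].sort(), [...actions].sort()]);
}
```

Sorting roles and actions before hashing means a reordered API response does not register as a composition change; that is a design assumption of this sketch, not a stated requirement.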

FindingDoc (Extended for Path-Level Findings)

The existing FindingDoc in src/domain/findings/types.ts is extended with path-level fields. There is no separate ExposureDoc. Both entity-level findings (legacy, path_id absent) and path-level findings (path_id present) coexist in the same findings collection.

interface FindingDoc {
// Existing fields (unchanged)
_id: string; // path-level: hash(tenant, path_id, finding_type)
// entity-level (legacy): "eval:" + hash
tenant_id: string;
entity_id: string; // workload (denormalized for both types)
finding_type: FindingType; // orphaned_ownership, dormant_authority, etc.
severity: FindingSeverity;
explanation: string;
status: "active" | "acknowledged" | "remediated" | "false_positive";
resolution_reason?: string;
evidence_refs: Record<string, unknown>;
evidence_completeness: EvidenceCompletenessSection;
evidence_pack_id?: string;
sync_version: number;
detected_at: Date;
last_evaluated_at: Date;

// === NEW: Path-level fields (added for path-level findings) ===

path_id?: string; // links to authority_paths._id
// absent = entity-level finding (legacy)
// present = path-level finding (new)

effective_from?: Date; // when condition became true on this path
resolved_at?: Date; // when condition ended on this path
// NOTE: "resolved_at" is the canonical field name
// regardless of resolution reason (remediated,
// false_positive, path_removed). The resolution_reason
// field captures WHY. UI renders status "remediated"
// as "Resolved".

intervals?: Array<{
effective_from: Date; // interval open timestamp
resolved_at?: Date; // interval close timestamp
resolution_reason?: string; // set when remediated/closed
}>; // append-only history for open/resolve/reopen cycles
// SAFEGUARD: if intervals.length > 50, rotate to a new
// FindingDoc (new _id, link via previous_finding_id) to
// prevent mega-document performance degradation in MongoDB.
// 50 intervals ≈ 25 flicker cycles — well above normal
// operational patterns (typical: 1-3 intervals per finding).
previous_finding_id?: string; // links to rotated-out finding doc (if intervals exceeded cap)
}

New indexes (in addition to existing):

  • { tenant_id: 1, path_id: 1, status: 1 } — findings on a specific path
  • { tenant_id: 1, path_id: 1, finding_type: 1 } — unique finding per path+type (for deterministic upsert)
  • { tenant_id: 1, status: 1, severity: 1, path_id: 1 } — active path-level findings by severity
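The open/close rules for `intervals[]`, including the 50-interval SAFEGUARD, can be sketched as two helpers. This assumes at most one open interval per FindingDoc and leaves the actual rotation (new `_id`, `previous_finding_id`) to the caller; `openInterval`, `closeInterval` and `MAX_INTERVALS` are hypothetical names.

```typescript
interface Interval {
  effective_from: Date;
  resolved_at?: Date;
  resolution_reason?: string;
}

const MAX_INTERVALS = 50; // cap from the SAFEGUARD comment on FindingDoc

// Append a new open interval. Returns "rotate" when the document is at the cap,
// telling the caller to start a new FindingDoc linked via previous_finding_id.
function openInterval(intervals: Interval[], now: Date): "opened" | "rotate" {
  const last = intervals[intervals.length - 1];
  if (last && last.resolved_at === undefined) {
    throw new Error("cannot open: current interval is still open");
  }
  if (intervals.length >= MAX_INTERVALS) return "rotate";
  intervals.push({ effective_from: now });
  return "opened";
}

// Close the open interval; each interval gets resolved_at exactly once.
function closeInterval(intervals: Interval[], now: Date, reason: string): void {
  const last = intervals[intervals.length - 1];
  if (!last || last.resolved_at !== undefined) {
    throw new Error("cannot close: no open interval");
  }
  last.resolved_at = now;
  last.resolution_reason = reason;
}
```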

Note on finding types: Keep the canonical FindingType taxonomy (for example: orphaned_ownership, dormant_authority, reachable_sensitive_domain, unknown_identity_binding, unproven_execution, scope_drift, llm_egress, external_egress, ownership_ambiguous, ownership_unknown). Do not introduce parallel _path-suffixed finding types. Path scope is represented by path_id, not by duplicating type names.
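The path-level rules in Stage 5 can be written as pure predicates keyed by canonical FindingType, which is what invariant 3 in Section 7 (same input, same output) relies on. A sketch under stated assumptions: `PathFacts` is a hypothetical pre-joined input, ownership rules read `ownership_status`, and `dormant_authority` only fires when some evidence exists (a path with no evidence at all is treated as `unproven_execution`). `scope_drift` and `ownership_unknown` are omitted because their inputs (baseline roles, metadata completeness) are not modelled here.

```typescript
// Subset of the canonical FindingType values (no _path-suffixed variants).
type PathFindingType =
  | "orphaned_ownership" | "dormant_authority" | "reachable_sensitive_domain"
  | "unknown_identity_binding" | "unproven_execution" | "llm_egress"
  | "external_egress" | "ownership_ambiguous";

// Hypothetical pre-joined facts for one authority path.
interface PathFacts {
  identity_id: string | null;
  sensitivity: string;
  ownership_status: "valid" | "orphaned" | "ambiguous" | "unknown";
  egress_category: "external" | "llm" | "internal" | "none";
  last_evidence_at: Date | null; // most recent linked execution evidence, if any
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Each rule is a pure predicate; `now` is passed in rather than read from the clock.
const PATH_RULES: Record<PathFindingType, (p: PathFacts, now: Date) => boolean> = {
  orphaned_ownership: (p) => p.ownership_status === "orphaned",
  ownership_ambiguous: (p) => p.ownership_status === "ambiguous",
  unknown_identity_binding: (p) => p.identity_id === null,
  reachable_sensitive_domain: (p) => p.sensitivity === "restricted",
  llm_egress: (p) => p.egress_category === "llm",
  external_egress: (p) => p.egress_category === "external",
  unproven_execution: (p) => p.last_evidence_at === null,
  // Evidence exists, but none in the last 90 days.
  dormant_authority: (p, now) =>
    p.last_evidence_at !== null && now.getTime() - p.last_evidence_at.getTime() > 90 * DAY_MS,
};

// Sorted so the output is stable across runs.
function evaluatePathRules(p: PathFacts, now: Date): PathFindingType[] {
  return (Object.keys(PATH_RULES) as PathFindingType[]).filter((t) => PATH_RULES[t](p, now)).sort();
}
```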


6. API Endpoints (Target)

Authority Paths

Row model decision: Each Authority Path is one row (path-level, not lineage-level). When the same workload reaches the same destination through multiple identities, each identity produces a separate row. Findings are bound to specific paths. Lineage-level grouping (collapsible rows by path_lineage_id) may be added later as a UI-only presentation concern without schema or API changes.

GET  /api/v1/authority-paths
?status=active|removed
?sensitivity=restricted,confidential
?data_domain=finance
?workload_id=...
?identity_id=...
?has_findings=true
?limit=50&cursor=...

GET /api/v1/authority-paths/:id

GET /api/v1/authority-paths/:id/findings
?status=active|acknowledged|remediated|false_positive
?limit=50&cursor=...
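Query parameters for `GET /api/v1/authority-paths` could be translated into a MongoDB filter roughly as follows. A hypothetical sketch: the default `status=active`, the limit cap of 200 and the function name are assumptions, and cursor handling is left out.

```typescript
// Raw query-string values as a router would hand them over.
interface AuthorityPathQuery {
  status?: string;
  sensitivity?: string;  // comma-separated list
  data_domain?: string;
  workload_id?: string;
  identity_id?: string;
  has_findings?: string; // "true" | "false"
  limit?: string;
}

const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 200; // hypothetical cap

function buildAuthorityPathFilter(tenantId: string, q: AuthorityPathQuery) {
  const status = q.status ?? "active"; // assumed default
  if (status !== "active" && status !== "removed") {
    throw new Error(`invalid status: ${status}`);
  }
  const filter: Record<string, unknown> = { tenant_id: tenantId, status };
  if (q.sensitivity) {
    filter["sensitivity"] = { $in: q.sensitivity.split(",").map((s) => s.trim()).filter((s) => s.length > 0) };
  }
  if (q.data_domain) filter["data_domain"] = q.data_domain;
  if (q.workload_id) filter["workload_id"] = q.workload_id;
  if (q.identity_id) filter["identity_id"] = q.identity_id;
  if (q.has_findings === "true") filter["current_state.active_finding_count"] = { $gt: 0 };
  if (q.has_findings === "false") filter["current_state.active_finding_count"] = 0;
  const parsed = Number.parseInt(q.limit ?? "", 10);
  const limit = Number.isNaN(parsed) ? DEFAULT_LIMIT : Math.min(Math.max(parsed, 1), MAX_LIMIT);
  return { filter, limit };
}
```

Because tenant_id is always set, every list query is tenant-scoped and can use the tenant-prefixed indexes listed under AuthorityPathDoc.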

List row contract (AuthorityPathListItem) — shared by both /authority-paths and /risk-clusters/:key/authority-paths:

interface AuthorityPathListItem {
_id: string;
path_lineage_id: string;

// Path nodes (display names resolved server-side)
workload: { id: string; display_name: string };
identity: { id: string; display_name: string } | null; // null = unbound
destination: { id: string; display_name: string };
data_domain: string;

// Path metadata
sensitivity: string;
via_roles: string[];
source_system: string;

// UX row fields
first_seen_at: string; // ISO date, shown as "First Seen" + recency tag
last_seen_at: string; // ISO date, shown as "Last Execution"
execution_30d: number; // "30d Executions" column
ownership_status: string; // "valid" | "orphaned" | "ambiguous" | "unknown"
egress_category: string; // "external" | "llm" | "internal" | "none"
is_autonomous: boolean; // true if workload execution_mode = "autonomous"

// Finding pills (rendered as severity-colored badges)
active_finding_count: number;
max_finding_severity: string | null; // "critical" | "high" | "medium" | "low" | null
finding_types: string[]; // active finding types for pill labels

// Status
status: "active" | "removed";
}

Identity column in collapsed row: Each row represents one full path, so identity is always a single value. For paths with identity: null (unbound workloads), display "Unbound" with a warning indicator. The identity column shows the identity's display_name (e.g., "svc-billing-sync (Service Principal)").
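Resolving display names server-side and rendering the Identity column could look like this. A sketch assuming a prefetched `names` map from entity id to display name and a fallback to the raw id when a name is missing; both are assumptions, as are the helper names.

```typescript
interface NodeRef {
  id: string;
  display_name: string;
}

// Hypothetical subset of AuthorityPathDoc holding the node ids.
interface PathNodeIds {
  workload_id: string;
  identity_id: string | null;
  destination_id: string;
}

// Falls back to the raw id when no display name is known (assumption).
function toNodeRef(id: string, names: ReadonlyMap<string, string>): NodeRef {
  return { id, display_name: names.get(id) ?? id };
}

// Node fields of an AuthorityPathListItem.
function resolvePathNodes(p: PathNodeIds, names: ReadonlyMap<string, string>) {
  return {
    workload: toNodeRef(p.workload_id, names),
    identity: p.identity_id === null ? null : toNodeRef(p.identity_id, names),
    destination: toNodeRef(p.destination_id, names),
  };
}

// One identity per row; a null identity renders as "Unbound" with a warning.
function identityColumn(identity: NodeRef | null): { text: string; warning: boolean } {
  return identity === null
    ? { text: "Unbound", warning: true }
    : { text: identity.display_name, warning: false };
}
```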

Findings (extended — primary API surface)

"Finding" is the user/API/UI-facing term. Path-level findings have path_id set; entity-level findings (legacy) do not.

GET  /api/v1/findings
?status=active|acknowledged|remediated|false_positive
?severity=critical,high
?finding_type=orphaned_ownership,dormant_authority
?path_id=... (filter to specific authority path)
?entity_id=...
?scope=path|entity|all (default: all — filter by path-level vs entity-level)
?limit=50&cursor=...

GET /api/v1/findings/:id

GET /api/v1/findings/:id/evidence-pack
?format=json|markdown

PATCH /api/v1/findings/:id/status
{ status: "acknowledged" | "false_positive" | "remediated", reason?: string }

UI mapping note: status = remediated is rendered as Resolved in path-level finding panels.
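The `scope` parameter and the Resolved label reduce to two small lookups. A sketch assuming MongoDB's `$exists` operator is used to separate path-level from entity-level findings; only the `remediated` to "Resolved" mapping comes from this document, and the other labels are placeholders.

```typescript
type FindingScope = "path" | "entity" | "all";
type FindingStatus = "active" | "acknowledged" | "remediated" | "false_positive";

// Path-level findings are exactly those with path_id set.
const SCOPE_FILTERS: Record<FindingScope, Record<string, unknown>> = {
  path: { path_id: { $exists: true } },
  entity: { path_id: { $exists: false } },
  all: {},
};

function findingScopeFilter(scope: FindingScope = "all"): Record<string, unknown> {
  return { ...SCOPE_FILTERS[scope] }; // shallow copy so callers can add fields
}

// Placeholder labels, except the path-level "Resolved" rule below.
const STATUS_LABELS: Record<FindingStatus, string> = {
  active: "Active",
  acknowledged: "Acknowledged",
  remediated: "Remediated",
  false_positive: "False positive",
};

function findingStatusLabel(status: FindingStatus, isPathLevel: boolean): string {
  return status === "remediated" && isPathLevel ? "Resolved" : STATUS_LABELS[status];
}
```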

Posture Summary

GET  /api/v1/posture/summary
Returns: active autonomous authority path count, active paths with
invalid ownership count, delta since last refresh

Risk Clusters (computed from active findings on paths)

UX decision: Risk clusters are pre-configured compound conditions. The cluster detail page shows a locked, read-only view of authority paths matching the cluster's condition. Users cannot edit filters on the cluster page. A "View in Authority Paths" link carries the cluster's conditions as pre-filled editable filters to the Authority Paths inventory page for ad-hoc exploration. Custom/configurable clusters may be added in a future release.

GET  /api/v1/posture/risk-clusters
Returns: cluster summaries computed from active path-level findings

GET /api/v1/risk-clusters/:key/authority-paths
Returns: AuthorityPathListItem[] matching cluster compound condition
Uses the same row contract as /authority-paths (see above)
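One way to keep the locked cluster view and the "View in Authority Paths" link consistent is to define each cluster once as a set of `/authority-paths` query parameters. A hypothetical sketch: the cluster key, title, conditions and the UI route `/authority-paths` are illustrative, not defined by this document.

```typescript
// Hypothetical cluster definition; conditions reuse /authority-paths parameter names.
interface RiskCluster {
  key: string;
  title: string;
  conditions: Record<string, string>;
}

// Illustrative only: concrete cluster keys are not defined in this document.
const EXAMPLE_CLUSTER: RiskCluster = {
  key: "restricted-with-findings",
  title: "Restricted-data paths with active findings",
  conditions: { status: "active", sensitivity: "restricted", has_findings: "true" },
};

// Fields of AuthorityPathListItem that the example conditions inspect.
interface ClusterCandidate {
  status: string;
  sensitivity: string;
  active_finding_count: number;
}

// Locked cluster view: server-side match against the fixed conditions.
function matchesCluster(item: ClusterCandidate, cluster: RiskCluster): boolean {
  const { status, sensitivity, has_findings: hasFindings } = cluster.conditions;
  if (status && item.status !== status) return false;
  if (sensitivity && !sensitivity.split(",").includes(item.sensitivity)) return false;
  if (hasFindings === "true" && item.active_finding_count === 0) return false;
  return true;
}

// "View in Authority Paths": the same conditions, now as editable pre-filled filters.
function viewInAuthorityPathsHref(cluster: RiskCluster): string {
  const qs = Object.entries(cluster.conditions)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return `/authority-paths?${qs}`;
}
```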

Backward Compat (deprecated, kept temporarily)

GET  /api/v1/chains            — alias for /authority-paths

7. Correctness Model

How We Know the Pipeline Worked Correctly

Each sync produces a ConnectorSyncDoc with complete metrics. This is the correctness record.

interface SyncMetrics {
// Stage 1: Import
nodes_received: number;
edges_received: number;
validation_errors: number;

// Stage 2: Resolve
entities_created: number;
entities_updated: number;
entities_deleted: number;
events_created: number;
evidence_upserted: number;
classifications_computed: number; // security_relevance, execution_mode

// Stage 3: Reconcile
cross_connector_matches: number; // 0 until multi-connector

// Stage 4: Project
paths_created: number; // new authority paths discovered
paths_updated: number; // existing paths with composition change
paths_unchanged: number; // existing paths confirmed (last_seen_at updated)
paths_removed: number; // paths not seen → status: removed

// Stage 5: Evaluate
findings_opened: number; // new path-level finding conditions detected
findings_resolved: number; // conditions no longer true
findings_unchanged: number; // existing active findings re-confirmed

// Stage 6: Publish
evidence_packs_built: number;

// Timing
stage_durations: {
import_ms: number;
resolve_ms: number;
reconcile_ms: number;
project_ms: number;
evaluate_ms: number;
publish_ms: number;
total_ms: number;
};
}

Invariants (always true after a successful sync)

  1. Path completeness: Every entity with entity_type = workload|identity that has reachable resources via HAS_ROLE→GRANTS→APPLIES_TO has at least one active authority_path record
  2. Path consistency: authority_paths.composition_hash matches the current entity graph state. If roles or actions changed, the hash changed, and paths_updated counter incremented
  3. Finding re-entrancy: Re-running the same sync produces identical finding state, apart from last_evaluated_at (see invariant 4). Rules are pure functions: same input, same output
  4. Temporal monotonicity: effective_from on a path-level finding never changes after creation. resolved_at is set exactly once per interval. last_evaluated_at always increases
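  5. Soft-delete guarantee: No authority_path or finding document is ever hard-deleted. Removed paths keep their document with status = removed and resolved findings keep theirs with status = remediated, so the document count in either collection never decreases between syncs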
  6. Evidence integrity: Every evidence_pack has a SHA256 hash that includes tenant_id. Tampering is detectable
  7. Metric accounting: paths_created + paths_updated + paths_unchanged + paths_removed = total paths seen + total paths previously active but not seen
  8. Interval cap: No FindingDoc has intervals.length > 50. If a finding reaches the cap, it is rotated to a new document (new _id, linked via previous_finding_id). This prevents mega-document performance degradation in MongoDB (16MB doc limit, slow updates on large arrays)
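Invariant 7 follows directly if Project classifies every path in one pass. A minimal sketch, assuming paths are compared by their deterministic `_id` and `composition_hash`, and that a previously removed path that reappears is counted as created; `classifyProjectedPaths` is a hypothetical name.

```typescript
// Minimal view of a path: its deterministic _id and composition_hash.
interface PathSnapshot {
  _id: string;
  composition_hash: string;
}

interface ProjectMetrics {
  paths_created: number;
  paths_updated: number;
  paths_unchanged: number;
  paths_removed: number;
}

// previouslyActive: active paths before this sync; seen: paths produced by this sync.
function classifyProjectedPaths(previouslyActive: PathSnapshot[], seen: PathSnapshot[]): ProjectMetrics {
  const prevHash = new Map<string, string>();
  for (const p of previouslyActive) prevHash.set(p._id, p.composition_hash);

  const m: ProjectMetrics = { paths_created: 0, paths_updated: 0, paths_unchanged: 0, paths_removed: 0 };
  const seenIds = new Set<string>();
  for (const p of seen) {
    if (seenIds.has(p._id)) continue; // a path reached twice is counted once
    seenIds.add(p._id);
    const old = prevHash.get(p._id);
    if (old === undefined) m.paths_created++; // includes reappearing removed paths (assumption)
    else if (old !== p.composition_hash) m.paths_updated++;
    else m.paths_unchanged++;
  }
  for (const p of previouslyActive) {
    if (!seenIds.has(p._id)) m.paths_removed++;
  }
  return m;
}
```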

Correctness Checks (automated, run after each sync)

// Assumes a module-level `storage: StorageAdapter` and a ValidationResult type of
// { valid: boolean; errors: string[]; sync_id: string }. Run immediately after the
// sync completes, before any later sync for the same tenant.
async function validateSyncCorrectness(tenantId: string, syncId: string): Promise<ValidationResult> {
  const errors: string[] = [];
  const sync = await storage.getSync(tenantId, syncId);
  if (!sync) {
    return { valid: false, errors: [`Sync ${syncId} not found`], sync_id: syncId };
  }

  // 1. Every active path references a workload and a destination that exist
  const activePaths = await storage.queryAuthorityPaths(tenantId, { status: "active" });
  for (const path of activePaths) {
    const workload = await storage.getEntity(tenantId, path.workload_id);
    if (!workload) errors.push(`Path ${path._id}: workload ${path.workload_id} not found`);
    const destination = await storage.getEntity(tenantId, path.destination_id);
    if (!destination) errors.push(`Path ${path._id}: destination ${path.destination_id} not found`);
  }

  // 2. Every active path-level finding references an existing, active path
  const activeFindings = await storage.queryFindings(tenantId, { status: "active", scope: "path" });
  for (const finding of activeFindings) {
    if (!finding.path_id) continue; // defensive: entity-level findings have no path_id
    const path = await storage.getAuthorityPath(tenantId, finding.path_id);
    if (!path) {
      errors.push(`Finding ${finding._id}: path ${finding.path_id} not found`);
    } else if (path.status === "removed") {
      errors.push(`Finding ${finding._id}: active finding on removed path ${finding.path_id}`);
    }
  }

  // 3. current_state.active_finding_count matches the actual number of active findings
  const countsByPath = new Map<string, number>();
  for (const f of activeFindings) {
    if (f.path_id) countsByPath.set(f.path_id, (countsByPath.get(f.path_id) ?? 0) + 1);
  }
  for (const path of activePaths) {
    const actual = countsByPath.get(path._id) ?? 0;
    if (actual !== path.current_state.active_finding_count) {
      errors.push(`Path ${path._id}: finding count mismatch (stored ${path.current_state.active_finding_count}, actual ${actual})`);
    }
  }

  // 4. Metrics add up: every path seen in this sync is active afterwards and every
  //    unseen path was removed, so created + updated + unchanged must equal the
  //    active path count (assumes one connector per tenant, per the Stage 3 note)
  const m = sync.metrics;
  const seenThisSync = m.paths_created + m.paths_updated + m.paths_unchanged;
  if (seenThisSync !== activePaths.length) {
    errors.push(`Sync ${syncId}: path metrics report ${seenThisSync} seen paths but ${activePaths.length} paths are active`);
  }

  return { valid: errors.length === 0, errors, sync_id: syncId };
}

8. Connector Responsibility (Current → Target)

Current (connector does too much)

Connector:
1. Discover entities from source APIs
2. Resolve cross-entity edges (EdgeResolver)
3. Classify entities (execution_mode, security_relevance, egress_category)
4. Filter entities (internal_inventory pre-filter)
5. Build NormalizedGraph with full relationship set
6. Submit to platform

Platform:
1. Store entities
2. Compute execution paths (BFS)
3. Assemble execution chains
4. Evaluate findings (entity-level)
5. Build evidence packs

Target (connector is thin, platform does the work)

Connector:
1. Discover entities from source APIs (flat DiscoveredEntities)
2. Resolve source-system-specific edges (client_id match, script search)
↑ Only edges that require source-system knowledge
3. Submit to platform

Platform:
1. Import: validate, normalize
2. Resolve: diff, edge resolution, platform-side classification
3. Reconcile: multi-connector entity merge (future)
4. Project: materialize authority paths (BFS → persistent path records)
5. Evaluate: compute path-level findings (persistent, temporal, with intervals[])
6. Publish: evidence packs, sync metrics, correctness validation

What moves from connector to platform:

  • security_relevance classification → platform evaluator (4I)
  • execution_mode classification → platform evaluator (can derive from trigger types in properties)
  • egress_category classification → platform evaluator (can derive from endpoint URLs in properties)
  • internal_inventory filtering → platform UI filter (show/hide toggle)
  • Cross-connector entity matching → platform reconcile stage (4D)
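Deriving `egress_category` from endpoint URLs on the platform could be sketched as below. The host lists, the precedence llm > external > internal > none, and the choice to skip unparseable URLs are assumptions; real values would come from platform configuration, and the function names are hypothetical.

```typescript
import { URL } from "node:url";

type EgressCategory = "external" | "llm" | "internal" | "none";

// Example LLM API hosts (suffix match); a real list would be configuration.
const LLM_HOST_SUFFIXES = ["api.openai.com", "api.anthropic.com", "openai.azure.com"];

// True if host equals suffix or is a subdomain of it
// ("x.corp.com" matches "corp.com"; "evilcorp.com" does not).
function hostMatches(host: string, suffix: string): boolean {
  return host === suffix || host.endsWith(`.${suffix}`);
}

// Lower-cased hostname, or null if the string is not a valid absolute URL.
function parseHost(raw: string): string | null {
  try {
    return new URL(raw).hostname.toLowerCase();
  } catch {
    return null;
  }
}

function classifyEgress(endpointUrls: string[], internalSuffixes: string[]): EgressCategory {
  let sawInternal = false;
  let sawExternal = false;
  for (const raw of endpointUrls) {
    const host = parseHost(raw);
    if (host === null) continue; // unparseable endpoint: skipped (assumption)
    if (LLM_HOST_SUFFIXES.some((s) => hostMatches(host, s))) return "llm"; // highest precedence
    if (internalSuffixes.some((s) => hostMatches(host, s))) sawInternal = true;
    else sawExternal = true;
  }
  if (sawExternal) return "external";
  return sawInternal ? "internal" : "none";
}
```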

What stays in the connector:

  • Source API authentication and discovery
  • Source-system-specific edge resolution (OAuth client_id → SP matching requires ServiceNow + Entra API knowledge)
  • Raw property extraction (display names, trigger types, endpoints)

9. Implementation Sequence

W1.1 implementation note: Section 3 describes the long-term target 6-stage pipeline with separate workers per stage. For W1.1, we follow architecture/02-processing-pipeline.md: the monolithic sync_ingestion handler gains materializeAuthorityPaths() as a new step, and the existing evaluate_findings handler gains a path-level evaluation pass. No separate stage workers are created. The stages below map to functions within the existing handlers, not to new workers.

Phase 1: Authority Path Persistence (Project stage)

Scope: New authority_paths collection, path materializer writes path records, new API endpoints.

| Task | What | Touches |
|---|---|---|
| 1a | Define AuthorityPathDoc type in src/domain/paths/types.ts | New file |
| 1b | Add authority_paths to schema manager (indexes) | src/storage/mongo/schema.ts |
| 1c | Add path storage methods to StorageAdapter | src/storage/storage-adapter.ts |
| 1d | Implement materializeAuthorityPaths() function | src/ingestion/authority-path-materializer.ts (new module) |
| 1e | Call materializer from sync_ingestion handler after chain assembly (step 11 in Doc 09) | src/workers/handlers/sync-ingestion.ts |
| 1f | Add GET /authority-paths and GET /authority-paths/:id API | New route file |
| 1g | Tests: path materialization, upsert idempotency, removal | New test files |

Backward compatibility: continue writing execution_paths[] on entities. Both representations coexist during the transition, until Phase 7 removes the embedded array.

Phase 2: Path-Level Finding Extension (Evaluate stage)

Scope: Extend FindingDoc with path-level fields (path_id, effective_from, resolved_at, intervals[]). Evaluator produces path-level findings in the existing findings collection. No new collection.

Task | What | Touches
---|---|---
2a | Extend FindingDoc type with path_id, effective_from, resolved_at, intervals[] | src/domain/findings/types.ts
2b | Add new finding indexes for path-level queries | src/storage/mongo/schema.ts
2c | Add path-level finding query methods to StorageAdapter (queryFindings with scope param) | src/storage/storage-adapter.ts
2d | Create path-level finding evaluator (wraps existing rules, maps to paths) | New evaluator module
2e | Implement finding lifecycle on paths (open/resolve/reopen with intervals[]) | src/workers/handlers/evaluate-findings.ts
2f | Extend GET /findings with ?path_id=... and ?scope= filters (values: path, entity, all) | src/api/routes/findings.ts
2g | Add GET /authority-paths/:id/findings | New route
2h | Add FindingType enum values for path-level types | src/domain/findings/types.ts
2i | Tests: path-level finding lifecycle, auto-resolve, interval append, idempotency | New test files
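The open/resolve/reopen lifecycle in task 2e can be written as a pure transition over the stored finding and the latest rule result. This sketch assumes a composite string standing in for hash(tenant, path, type) and a trimmed FindingDoc; PathFindingDoc, applyEvaluation, and the example finding type are hypothetical names. Only the currently open interval's resolved_at is ever written; earlier intervals are copied forward unchanged.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface FindingInterval {
  opened_at: string;
  resolved_at: string | null;
}

// Hypothetical trimmed FindingDoc carrying the path-level fields from task 2a.
interface PathFindingDoc {
  finding_id: string; // deterministic per (tenant, path, type)
  tenant_id: string;
  path_id: string;
  type: string;
  severity: Severity;
  status: "open" | "resolved";
  effective_from: string; // start of the current or most recent interval
  resolved_at: string | null;
  intervals: FindingInterval[]; // history; closed intervals are never modified
}

// Composite string standing in for hash(tenant, path, type).
function findingKey(tenant: string, pathId: string, type: string): string {
  return `${tenant}|${pathId}|${type}`;
}

// Applies one evaluation result; `firing` says whether the rule currently matches the path.
function applyEvaluation(
  existing: PathFindingDoc | undefined,
  tenant: string,
  pathId: string,
  type: string,
  severity: Severity,
  firing: boolean,
  evalAt: string,
): PathFindingDoc | undefined {
  if (existing === undefined) {
    if (!firing) return undefined; // rule never matched: nothing to record
    return {
      finding_id: findingKey(tenant, pathId, type),
      tenant_id: tenant,
      path_id: pathId,
      type,
      severity,
      status: "open",
      effective_from: evalAt,
      resolved_at: null,
      intervals: [{ opened_at: evalAt, resolved_at: null }],
    };
  }
  // Copy intervals so the caller's document is never mutated.
  const intervals = existing.intervals.map((i) => ({ ...i }));
  if (firing && existing.status === "resolved") {
    // Re-open: append a new interval; prior intervals are kept as-is.
    intervals.push({ opened_at: evalAt, resolved_at: null });
    return { ...existing, severity, status: "open", effective_from: evalAt, resolved_at: null, intervals };
  }
  if (!firing && existing.status === "open") {
    // Auto-resolve: close only the currently open (last) interval.
    intervals[intervals.length - 1].resolved_at = evalAt;
    return { ...existing, status: "resolved", resolved_at: evalAt, intervals };
  }
  // No transition (open and still firing, or resolved and still not firing): idempotent.
  return firing ? { ...existing, severity } : existing;
}
```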

Key decision: no ExposureDoc and no exposures collection. Path-level findings coexist with entity-level findings in the findings collection and are distinguished by the presence of path_id.
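Because both finding kinds share one collection, the scope filter in task 2f reduces to a presence check on path_id. A sketch of the filter construction, assuming entity-level findings omit path_id entirely (if they stored path_id: null instead, $exists would not separate them); buildFindingsFilter is a hypothetical helper, and the returned object is a plain MongoDB-style filter document.

```typescript
type FindingScope = "path" | "entity" | "all";

// Hypothetical helper: translates GET /findings query params into a MongoDB-style filter.
function buildFindingsFilter(tenantId: string, scope: FindingScope = "all", pathId?: string): Record<string, unknown> {
  const filter: Record<string, unknown> = { tenant_id: tenantId };
  if (pathId !== undefined) {
    filter.path_id = pathId; // an explicit path already implies path scope
  } else if (scope === "path") {
    filter.path_id = { $exists: true };
  } else if (scope === "entity") {
    filter.path_id = { $exists: false };
  }
  // scope === "all" adds no path_id constraint.
  return filter;
}
```

Letting an explicit path_id win over scope avoids a contradictory combination such as scope=entity with a path_id, which could otherwise never match anything.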

Phase 3: UX — Overview + Risk Clusters (Feb 18 mockups)

Scope: Implement the Overview homepage and Risk Cluster Detail screens per Feb 18 mockups.

Task | What | Touches
---|---|---
3a | Add GET /api/v1/posture/summary API (active path count, invalid ownership count, delta) | New route
3b | Add GET /api/v1/posture/risk-clusters API (top clusters from active findings on paths) | New route
3c | Add GET /api/v1/risk-clusters/:key/authority-paths API (paths matching cluster condition) | New route
3d | Overview page ("Autonomous Authority Posture"): 2 posture cards (Active Autonomous Authority Paths + Dormant Autonomous Authority Paths), "Since Last Refresh" delta section (+X new paths, +Y ownership invalidations), Top 5 Risk Clusters list | New UI page
3e | Risk Cluster Detail page: clicking a cluster opens a locked, read-only table of Authority Paths matching the cluster condition (no ad-hoc filter editing). Rows expand inline to show findings, ownership, authority diagram, and evidence completeness. A "View in Authority Paths" link carries the filters to the inventory page for ad-hoc exploration | New UI page
3f | Update nav order: Overview, Risk Clusters, Authority Paths, Graph, Settings | UI layout
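Task 3b derives the Top Risk Clusters from active findings on paths. The cluster condition is not defined in this section, so the sketch below assumes a cluster is keyed by finding type, counts distinct affected paths (not findings), and ranks by maximum severity, then path count, then key; topRiskClusters and the types are hypothetical.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// Minimal view of an active path-level finding (assumed fields).
interface ActivePathFinding {
  path_id: string;
  type: string;
  severity: Severity;
}

interface RiskCluster {
  key: string;
  pathCount: number;
  maxSeverity: Severity;
}

const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

// Assumption: cluster key = finding type; the real cluster condition may combine more attributes.
function topRiskClusters(findings: ActivePathFinding[], limit = 5): RiskCluster[] {
  const byKey = new Map<string, { paths: Set<string>; max: Severity }>();
  for (const f of findings) {
    const entry = byKey.get(f.type) ?? { paths: new Set<string>(), max: f.severity };
    entry.paths.add(f.path_id); // distinct paths, so two findings on one path count once
    if (SEVERITY_RANK[f.severity] > SEVERITY_RANK[entry.max]) entry.max = f.severity;
    byKey.set(f.type, entry);
  }
  return Array.from(byKey.entries(), ([key, e]) => ({ key, pathCount: e.paths.size, maxSeverity: e.max }))
    .sort(
      (a, b) =>
        SEVERITY_RANK[b.maxSeverity] - SEVERITY_RANK[a.maxSeverity] ||
        b.pathCount - a.pathCount ||
        (a.key < b.key ? -1 : a.key > b.key ? 1 : 0),
    )
    .slice(0, limit);
}
```

Counting paths rather than findings keeps the cluster size consistent with the Risk Cluster Detail table, whose rows are Authority Paths.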

Phase 4: UX — Authority Paths List + Detail (Feb 18 mockups)

Scope: Implement the Authority Paths List and Authority Path Detail screens.

Task | What | Touches
---|---|---
4a | Authority Paths List page: all paths with search, filters, pagination. Columns: ID, Authority Path (workload→destination), Last Execution, 30d Executions, First Seen + tag pills | New UI page
4b | Authority Path Detail page: breadcrumb nav, dagre diagram (Workload → Identity → Destination → Data Domain), findings panel, authority state, ownership breakdown, automation metadata, linkage proof | New UI page
4c | Path-level finding timeline in detail view (open/resolve/reopen intervals) | UI component
4d | Evidence completeness bar on path detail | UI component
4e | Update TanStack Query hooks for authority-paths, path-findings | ui/src/hooks/
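For the timeline in task 4c, the intervals[] history maps directly onto display events. A sketch with hypothetical names (TimelineEvent, toTimeline), assuming uniformly formatted ISO-8601 UTC timestamps so that plain string comparison orders them chronologically:

```typescript
interface FindingInterval {
  opened_at: string;
  resolved_at: string | null;
}

interface TimelineEvent {
  at: string;
  kind: "opened" | "resolved" | "reopened";
}

// Hypothetical helper: flattens a finding's intervals[] into ordered timeline events.
function toTimeline(intervals: FindingInterval[]): TimelineEvent[] {
  // Sort a copy by open time (lexicographic order equals chronological order for uniform ISO strings).
  const ordered = [...intervals].sort((a, b) => (a.opened_at < b.opened_at ? -1 : a.opened_at > b.opened_at ? 1 : 0));
  const events: TimelineEvent[] = [];
  ordered.forEach((iv, idx) => {
    // The first interval is the original opening; every later one is a re-open.
    events.push({ at: iv.opened_at, kind: idx === 0 ? "opened" : "reopened" });
    if (iv.resolved_at !== null) events.push({ at: iv.resolved_at, kind: "resolved" });
  });
  return events;
}
```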

Phase 5: Visual QA Fixes

Scope: Address visual QA items with migration-aware sequencing (avoid over-investing in soon-to-be-replaced surfaces).

Batch | ID | Category | Fix | Touches
---|---|---|---|---
Pre-migration must-fix | B1 | Bug | Dashboard posture summary shows contradictory totals; fix API query and tenant/type filter correctness | API + UI
Pre-migration must-fix | D1 | Design | Truncated finding descriptions; add tooltip/full-text affordance | UI components
Pre-migration must-fix | D2 | Design | Date format inconsistency across pages; standardize formatter | UI date utils
Pre-migration must-fix | D4 | Design | Graph legend gap; add missing node/edge legend entries | Graph Explorer
Pre-migration must-fix | U2 | UX | Badge contrast; improve unknown/low-contrast status badges | UI theme
Pre-migration must-fix | U6 | UX | Graph centering; auto-fit graph on load | Graph Explorer
Post-migration/defer | D3 | Design | Chains empty-state guidance (only if Chains survives migration window) | UI pages
Post-migration/defer | U1 | UX | Posture card tooltips (rework on new path-based cards only) | Overview page
Post-migration/defer | U3 | UX | Domain name formatting | UI formatters
Post-migration/defer | U4 | UX | Redundant filter cleanup | Filter sidebar
Post-migration/defer | U5 | UX | Temporal compare guidance text | Temporal Compare
Post-migration/defer | U7 | UX | Sync row expand affordance | Syncs page
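For D2, one option is a single shared formatter module that every page imports instead of formatting dates inline. The sketch assumes the team standardizes on ISO-style dates rendered in UTC; that convention and the function names (toValidDate, formatDate, formatDateTime) are hypothetical, not a decision recorded in this plan.

```typescript
// Hypothetical shared date helpers for task D2 (assumed convention: ISO-style, always UTC).
function toValidDate(input: string | Date): Date | null {
  const d = typeof input === "string" ? new Date(input) : input;
  return Number.isNaN(d.getTime()) ? null : d;
}

// YYYY-MM-DD in UTC; unparseable input renders as "Unknown" rather than "Invalid Date".
function formatDate(input: string | Date): string {
  const d = toValidDate(input);
  return d === null ? "Unknown" : d.toISOString().slice(0, 10);
}

// YYYY-MM-DD HH:mm UTC, minute precision.
function formatDateTime(input: string | Date): string {
  const d = toValidDate(input);
  return d === null ? "Unknown" : `${d.toISOString().slice(0, 16).replace("T", " ")} UTC`;
}
```

Rendering in UTC means two users in different time zones see the same date for the same event; a product that prefers local time would swap toISOString for Intl.DateTimeFormat with an explicit timeZone.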

Phase 6: Platform-Side Classification (4I)

Scope: Move security_relevance, execution_mode, egress_category from connector to platform.

Task | What | Touches
---|---|---
6a | Add classification rules to Resolve stage | New classifier module
6b | Remove connector-side classification code | Connector repo
6c | Add UI filter toggle for security_relevance | UI filter sidebar

Phase 7: Cleanup (blocked until Phase 8 gates pass)

Task | What
---|---
7a | Deprecate execution_chains collection
7b | Remove execution_paths[] embedded array from entities
7c | Remove entity-level finding types (after confirming all consumers use path-level findings)
7d | Run correctness validation across all tenant data

Phase 8: Operational Hardening (CTO requirement)

Task | What
---|---
8a | Adopt observability contract in architecture/02-processing-pipeline.md (sections 9-11)
8b | Add stage-level metrics, traces, and structured logs across all 6 stages
8c | Configure early alerts for failure, stall, ingestion silence, and backlog growth
8d | Run failure drills: transient failure retry, permanent failure escalation, replay from checkpoint
8e | Gate Phase 7 cleanup on 2 consecutive weeks of stable SLO compliance
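The four alert conditions in task 8c can each be checked from a small per-stage health snapshot. This is a sketch only: StageHealth, the threshold fields, and the rule definitions (for example, treating backlog growth as strictly increasing queue depth over the last N samples) are assumptions; the authoritative metric and threshold definitions belong in architecture/02-processing-pipeline.md.

```typescript
// Hypothetical per-stage health snapshot (epoch milliseconds throughout).
interface StageHealth {
  stage: string;
  lastSuccessAt: number; // end of the last successful run
  runningSince: number | null; // start of the in-flight run, null if idle
  consecutiveFailures: number;
  backlog: number[]; // recent queue-depth samples, oldest first
}

interface Thresholds {
  maxFailures: number;
  stallMs: number;
  silenceMs: number;
  backlogGrowthSamples: number; // must be >= 2 for "growth" to mean anything
}

type AlertKind = "failure" | "stall" | "ingestion_silence" | "backlog_growth";

function evaluateAlerts(h: StageHealth, now: number, t: Thresholds): AlertKind[] {
  const alerts: AlertKind[] = [];
  // Failure: too many consecutive failed runs (retries exhausted).
  if (h.consecutiveFailures >= t.maxFailures) alerts.push("failure");
  // Stall: a run has been in flight longer than allowed.
  if (h.runningSince !== null && now - h.runningSince > t.stallMs) alerts.push("stall");
  // Silence: no successful run recently, i.e. data is going stale without any error being raised.
  if (now - h.lastSuccessAt > t.silenceMs) alerts.push("ingestion_silence");
  // Backlog growth: queue depth strictly increased across the last N samples.
  const recent = h.backlog.slice(-t.backlogGrowthSamples);
  const growing =
    t.backlogGrowthSamples >= 2 &&
    recent.length === t.backlogGrowthSamples &&
    recent.every((v, i) => i === 0 || v > recent[i - 1]);
  if (growing) alerts.push("backlog_growth");
  return alerts;
}
```

The silence rule is the one that catches the failure mode criterion 11 targets: a pipeline that has stopped running raises no errors, so only the age of the last success reveals it.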

10. Operational Requirement (Non-Optional)

For production rollout, this pipeline must be implemented with explicit runtime observability and alerting. The detailed architecture is specified in:

  • docs/architecture/02-processing-pipeline.md (sections 9-11: observability, SLOs, alerts, dashboards)

This is required to satisfy CTO requirements for:

  1. Clear batch/ETL pipeline execution model and stage transitions
  2. Runtime observability of stage health, queue depth, and data freshness
  3. Early failure alerting with actionable ownership and escalation

11. Acceptance Criteria

  1. Clicking a Risk Cluster shows Authority Paths as primary rows
  2. Each path row shows execution magnitude (30d), ownership status, active finding count, and max severity — without additional API calls
  3. Path detail page shows finding timeline with effective_from and status transitions
  4. A finding can be opened, remediated (shown as Resolved in UI), and re-opened on the same path without losing history (via intervals[])
  5. Removed paths and remediated findings remain historically queryable
  6. Every sync produces a SyncMetrics record with complete accounting of paths/findings processed
  7. Correctness validation passes after every sync (no orphaned findings, no stale counts)
  8. Platform can classify security_relevance and execution_mode without connector changes
  9. Re-running the same sync produces identical path and finding state (deterministic, idempotent)
  10. Re-opened findings preserve prior intervals in intervals[] and remain audit-queryable
  11. Pipeline emits operational telemetry and triggers alerts before silent data staleness occurs
  12. Overview page matches Feb 18 mockup: 2 posture cards (Active + Dormant) + delta + Top 5 Risk Clusters
  13. Authority Paths list and detail pages match Feb 18 mockups (search, filters, dagre diagram, findings panel)
  14. Navigation: Overview | Risk Clusters | Authority Paths | Graph | Settings
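Criterion 7 names two invariants. A sketch of how they could be checked for one tenant after a sync, assuming path rows carry a denormalized active_finding_count (implied by criterion 2, where rows show the count without extra calls) and that entity-level findings have no path_id; PathRow, FindingRow, and validateSync are hypothetical names.

```typescript
// Hypothetical minimal shapes for one tenant's data; real documents carry many more fields.
interface PathRow {
  path_id: string;
  active_finding_count: number; // denormalized count shown in list rows
}

interface FindingRow {
  finding_id: string;
  path_id?: string; // absent for entity-level findings
  status: "open" | "resolved";
}

interface ValidationResult {
  orphanedFindings: string[]; // path-level findings whose path_id matches no stored path
  staleCounts: string[]; // paths whose stored count disagrees with their open findings
  ok: boolean;
}

function validateSync(paths: PathRow[], findings: FindingRow[]): ValidationResult {
  const pathIds = new Set(paths.map((p) => p.path_id));
  const actualOpen = new Map<string, number>();
  const orphanedFindings: string[] = [];
  for (const f of findings) {
    if (f.path_id === undefined) continue; // entity-level finding: not path-scoped
    if (!pathIds.has(f.path_id)) orphanedFindings.push(f.finding_id);
    else if (f.status === "open") actualOpen.set(f.path_id, (actualOpen.get(f.path_id) ?? 0) + 1);
  }
  const staleCounts = paths
    .filter((p) => p.active_finding_count !== (actualOpen.get(p.path_id) ?? 0))
    .map((p) => p.path_id);
  return { orphanedFindings, staleCounts, ok: orphanedFindings.length === 0 && staleCounts.length === 0 };
}
```

Soft-removed paths still count as existing here, so their historical findings (criterion 5) are not reported as orphans.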