Automation Persistence Architecture — Architect Analysis

Role: Architect
Date: 2026-02-13
Scope: Whether the platform needs a separate execution_chains collection (or equivalent first-class entity) to track autonomous execution chains over time, vs. keeping the current model where automations are entity_type: identity with identitySubtype

Executive Summary

The founder's concern is architecturally valid: an "automation chain" is a higher-order concept -- a named subgraph -- that the current data model cannot address, track over time, or present to a CISO as a stable, listable object. The current model stores individual entities and their pairwise relationships. It has no concept of "this set of 6 entities, connected in this order, constitutes a single automation." That is a real gap.

However, the gap is narrower than it appears. The current model already has all the data needed to reconstruct any execution chain on demand -- the entities exist, the edges exist, the temporal history exists per-entity. What is missing is:

A stable identifier for the chain as a whole
A chain-level diff that answers "what changed in this automation since last scan?"
A listable, queryable object that a CISO can bookmark, filter, and track

My recommendation is Option B+: a lightweight execution_chains collection shipped first, with versioned snapshots added in a second phase. This gives stable identity, chain-level temporal tracking, and CISO-oriented querying without duplicating the entity-level temporal machinery that already works.

The key architectural insight: execution chains are projections, not source data. They are computed from the entity graph, not received from connectors. This means they can be built incrementally, rebuilt from scratch if the definition changes, and safely discarded if a better approach emerges. The risk of getting this wrong is bounded.

1. The Identity Problem: Chains as Subgraphs

1a. What is an "automation chain"?

An automation chain is a directed path through the entity graph, following specific edge types in execution order, that represents a complete autonomous execution flow from trigger to terminal effect.

Example (AzureGraphRouter):

incident table (resource)
  --[TRIGGERS_ON]<-- BR: "Auto-route identity tickets via Entra" (identity, business_rule)
    --[CALLS]--> SI: "AzureGraphRouter" (identity, system_execution)
      --[EXECUTES_ON]--> REST Message: "graph.microsoft.com sn-ticket-router" (resource)
        --[AUTHENTICATES_VIA]--> OAuth Client: "Azure Graph OAuth Client" (identity, oauth_app)
          --[AUTHENTICATES_TO]<-- SP: "sn-ticket-router" (identity, service_principal)

This chain contains 6 entities of 2 entity types (identity and resource; the identities span 4 subtypes) connected by 5 edges of 5 edge types. No single entity in the chain "is" the automation. The chain is the automation.

1b. Why the current model cannot represent this

The current model stores entities individually. Each entity knows its direct neighbors (via relationships[]), and the path materializer computes reachable resources for each identity. But:

No chain-level identity. There is no document in any collection that says "these 6 entities, in this order, are the AzureGraphRouter automation." The chain must be recomputed by BFS from a seed entity on every query.
No chain-level history. If the OAuth client ID rotates (new credential, same target), the chain conceptually stays the same. But the platform records this as a property change on the OAuth entity. There is no mechanism to say "the AzureGraphRouter chain's auth credential was rotated on 2026-03-15."
No chain-level listing. A CISO cannot ask "show me all automation chains" because chains are not queryable objects. They can ask "show me all identities with identitySubtype=business_rule" which is a partial proxy, but it excludes chains anchored at SIs, flows, or scheduled jobs.
No chain-level diff. When a scan runs and an entity in the chain changes, the diff engine emits events at the entity level (property_changed, role_assigned). There is no event that says "the AzureGraphRouter chain gained access to the hr_case table" because the diff engine does not know what a chain is.

1c. How graph databases handle this

In Neo4j, the equivalent concept is a named path or path pattern:

// Define a path pattern
MATCH path = (trigger:Resource)<-[:TRIGGERS_ON]-(entry:Identity)
  -[:CALLS*0..3]->(code:Identity)
  -[:EXECUTES_ON]->(outbound:Resource)
  -[:AUTHENTICATES_VIA]->(cred:Identity)
  <-[:AUTHENTICATES_TO]-(dest:Identity)
WHERE entry.identitySubtype IN ['business_rule', 'flow_designer_flow', 'scheduled_job']
RETURN path

Neo4j can:

Return paths as first-class objects with identity (the path itself has a structure)
Apply temporal predicates (WHERE entry.valid_at <= $asOf) if using bi-temporal properties
Return path results for two different time points, which the application can then diff

But Neo4j does not natively give a path a stable ID. If entity E3 is removed and E3' is added, Neo4j returns a different path object. The "is it the same automation?" question remains an application-level concern regardless of database.

2. Chain Identity and Stability

2a. What defines "the same automation"?

This is the Ship of Theseus problem applied to execution chains. Consider five change scenarios:

Scenario	What changed	Same chain?	Why
OAuth client secret rotated	Credential property (expires_at, key_id)	Yes	Business logic unchanged; auth refreshed
Script Include replaced with equivalent	Entity removed, new entity added	Debatable	Same function, different implementation
Trigger table changed from `incident` to `change_request`	TRIGGERS_ON target changed	No	Different business context
New role added to downstream SP	New HAS_ROLE edge	Yes	Same chain, expanded authority
Business Rule disabled	BR status changed to `disabled`	Yes (dormant)	Chain exists but is inactive

The answer depends on what aspect of the chain we anchor identity to.

2b. Anchor options

Option 1: Anchor = entry point entity (the automation that starts the chain)

The Business Rule or Flow Designer Flow that has TRIGGERS_ON is the anchor. The chain identity is sha256(tenantId:anchorEntityId).

Pro: Most stable. BRs and Flows rarely change their sys_id in ServiceNow.
Pro: Maps to how connectors discover chains -- starting from trigger automations.
Con: A single anchor can have multiple chains (a BR might have CALLS edges to two different SIs leading to different destinations).
Con: If the anchor entity is replaced (deleted + recreated), the chain gets a new ID.

Option 2: Anchor = entry point + destination pair

Chain identity = sha256(tenantId:anchorEntityId:destinationEntityId) where destination is the terminal identity or resource in the chain.

Pro: Distinguishes multiple chains from the same entry point.
Pro: Captures the business intent (this automation routes incidents to graph.microsoft.com).
Con: If the destination changes (OAuth client rotated to a different SP), the chain gets a new ID even though the business function is the same.

Option 3: Anchor = trigger + entry point pair

Chain identity = sha256(tenantId:triggerResourceId:anchorEntityId).

Pro: Captures what activates the chain and what code runs.
Con: If the trigger table changes, the chain gets a new ID.
Con: Multiple triggers can activate the same chain.
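
To make the three anchor options concrete, the sketch below derives a candidate chain ID for each one. It assumes Node's built-in crypto module and a colon-joined input, mirroring the sha256(tenantId:...) notation above. The helper names (stableId, idByAnchor, idByAnchorAndDestination, idByTriggerAndAnchor) are hypothetical and not part of the platform.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash the colon-joined parts and add the "chain-" prefix.
// Assumes no part contains ":" (true for UUIDs and ServiceNow sys_ids).
function stableId(...parts: string[]): string {
  return "chain-" + createHash("sha256").update(parts.join(":")).digest("hex");
}

// Option 1: entry point only.
function idByAnchor(tenantId: string, anchorEntityId: string): string {
  return stableId(tenantId, anchorEntityId);
}

// Option 2: entry point + destination.
function idByAnchorAndDestination(
  tenantId: string,
  anchorEntityId: string,
  destinationEntityId: string
): string {
  return stableId(tenantId, anchorEntityId, destinationEntityId);
}

// Option 3: trigger + entry point.
function idByTriggerAndAnchor(
  tenantId: string,
  triggerResourceId: string,
  anchorEntityId: string
): string {
  return stableId(tenantId, triggerResourceId, anchorEntityId);
}

// Swapping the destination changes the Option 2 ID but leaves the Option 1 ID alone.
const before = idByAnchorAndDestination("tenant-1", "br-1", "sp-old");
const after = idByAnchorAndDestination("tenant-1", "br-1", "sp-new");
console.log(before === after);                                                  // false
console.log(idByAnchor("tenant-1", "br-1") === idByAnchor("tenant-1", "br-1")); // true
```

The comparison at the end illustrates the trade-off listed above: the more components the ID includes, the more kinds of change give the chain a new identity.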

2c. Recommendation: Anchor = entry point entity, with chain fingerprint for change detection

Use the entry point entity ID as the stable anchor. This is the "branch name" in the git analogy -- it persists even as the underlying "commits" (entity versions) change.

For change detection, compute a chain fingerprint: a hash of the ordered list of (entity_id, role) tuples in the chain, where role is the entity's role within the chain (trigger, entry_point, executor, and so on). When the fingerprint changes, a new chain version is created. When it does not change, only entity-level changes are recorded.

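import { createHash } from "node:crypto";

// Hash the ordered (entity_id, role) pairs; entity properties are deliberately excluded.
function computeChainFingerprint(entityRefs: ChainEntityRef[]): string {
  const input = [...entityRefs]
    .sort((a, b) => a.position - b.position) // defensive: do not rely on caller ordering
    .map(ref => `${ref.entity_id}:${ref.role}`)
    .join("|");
  return createHash("sha256").update(input).digest("hex").slice(0, 16);
}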

This means:

OAuth rotation (property change, same entities) --> same fingerprint, no new chain version
New role added to SP (relationship change, same entities in chain) --> same fingerprint, entity-level event captures the change
Script Include replaced with different SI --> different fingerprint, new chain version created
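Trigger table changed --> different fingerprint if the trigger resource is included in entity_refs (the Option B example document includes it at position 0); same fingerprint if it is excluded. The chain ID is unaffected either way because it is derived from the entry point only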
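
The behaviour listed above can be checked with a small self-contained sketch. It re-declares the fingerprint function so the block runs on its own, and adds a defensive sort by position, which is an assumption rather than something the original function does. The entity IDs are illustrative.

```typescript
import { createHash } from "node:crypto";

interface ChainEntityRef {
  entity_id: string;
  role: string;
  position: number;
}

function computeChainFingerprint(entityRefs: ChainEntityRef[]): string {
  const input = [...entityRefs]
    .sort((a, b) => a.position - b.position) // assumption: order by position
    .map(ref => `${ref.entity_id}:${ref.role}`)
    .join("|");
  return createHash("sha256").update(input).digest("hex").slice(0, 16);
}

const baseline: ChainEntityRef[] = [
  { entity_id: "uuid-br-auto-route", role: "entry_point", position: 1 },
  { entity_id: "uuid-si-azure-graph", role: "executor", position: 2 },
  { entity_id: "uuid-oauth-client", role: "auth_credential", position: 4 },
];

// OAuth secret rotation: only entity properties change, so the refs are identical.
const afterRotation = baseline.map(ref => ({ ...ref }));

// Script Include replaced: the executor slot now points at a different entity.
const afterSiSwap = baseline.map(ref =>
  ref.role === "executor" ? { ...ref, entity_id: "uuid-si-replacement" } : ref
);

const sameAfterRotation =
  computeChainFingerprint(baseline) === computeChainFingerprint(afterRotation);
const sameAfterSiSwap =
  computeChainFingerprint(baseline) === computeChainFingerprint(afterSiSwap);

console.log({ sameAfterRotation, sameAfterSiSwap }); // { sameAfterRotation: true, sameAfterSiSwap: false }
```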

2d. The git analogy

Git concept	Chain analogy
Branch name	Chain anchor (entry point entity ID)
Commit	Chain version (fingerprint snapshot)
Commit diff	Chain-level diff (entities added/removed, authority changed)
Tree (files)	Entity refs list
HEAD	Current chain version
`git log`	Chain version history

A branch (chain) persists even when all its commits change. The branch name (anchor entity ID) is stable. Individual commits (chain versions) capture the state at each sync.

2e. When identity breaks

There is one scenario where anchor-based identity genuinely breaks: the entry point entity is deleted and recreated in the source system. In ServiceNow, this happens when an admin deletes a Business Rule and creates a new one with the same name. The new BR gets a different sys_id, therefore a different entity_id, therefore a different chain anchor.

For this case, the platform should support chain aliasing: a mechanism to declare that chain X is the successor of chain Y. This is a P3 concern. For now, accept that chain identity breaks on anchor deletion and document it.
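
As a rough illustration of what chain aliasing could look like, the sketch below follows successor links from an old chain ID to the current one. The ChainAlias shape and the resolveCurrentChainId name are hypothetical. The document does not define an aliasing schema, so this is only one possible design.

```typescript
// Hypothetical alias record: `successorId` is declared the continuation of `predecessorId`.
interface ChainAlias {
  predecessorId: string;
  successorId: string;
}

// Follow successor links until a chain with no declared successor is reached.
// A visited set guards against accidental cycles in operator-entered aliases.
function resolveCurrentChainId(chainId: string, aliases: ChainAlias[]): string {
  const next = new Map<string, string>(
    aliases.map((a): [string, string] => [a.predecessorId, a.successorId])
  );
  const seen = new Set<string>();
  let current = chainId;
  while (next.has(current) && !seen.has(current)) {
    seen.add(current);
    current = next.get(current)!;
  }
  return current;
}

const aliases: ChainAlias[] = [
  { predecessorId: "chain-br-a", successorId: "chain-br-c" },
];
console.log(resolveCurrentChainId("chain-br-a", aliases)); // "chain-br-c"
```

Because aliases are declared manually, the platform never infers succession on its own, which matches the stance that anchor replacement yields a new chain by default.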

3. Deep Technical Analysis of Options

Option A: Computed View (No Schema Change)

How it works: No new collection. Chain data is computed on-demand by running a BFS/DFS from seed entities.

// Pseudocode: compute chain on API request
async function getExecutionChain(tenantId: string, seedId: string): Promise<ChainView> {
  const subgraph = await storageAdapter.getSubgraph(tenantId, {
    seedId,
    mode: "execution_flow",
    depth: 6,
    limit: 50
  });

  // Order nodes by execution flow
  const ordered = topologicalSort(subgraph.nodes, subgraph.edges);

  return {
    name: deriveChainName(ordered),
    entities: ordered,
    edges: subgraph.edges,
    // No history -- this is the current state only
  };
}

What works:

Zero migration, zero new infrastructure
Uses existing BFS (executionFlowTraversal in adapter.ts)
No staleness problem -- always reflects current state

What breaks:

No stable ID. Every computation might return a different chain if the graph changed.
No history. Cannot answer "what did this chain look like 3 months ago?"
No chain-level events. Cannot say "the AzureGraphRouter chain was modified."
No listing. Cannot query "all chains sorted by last_changed" without computing every chain.
Performance: computing all chains for a tenant listing requires N BFS traversals (one per potential anchor).

Complexity budget:

Code changes: 0 platform, ~50 lines API endpoint
New tests: ~10
Migration risk: None

Neo4j portability: Excellent. In Neo4j, this IS the natural query pattern. No data to migrate or reconcile.

Verdict: Insufficient for the founder's requirements. Cannot track chains over time, cannot list them, cannot diff them. Useful only as a building block for other options.

Option B: `execution_chains` Collection (Lightweight Reference)

How it works: A new collection stores chain metadata as lightweight reference documents. Entities remain in the entities collection; the chain document contains ordered references.

// execution_chains collection
{
  _id: "chain-sha256(tenantId:anchorEntityId)",   // Deterministic, stable
  tenant_id: "uuid-...",
  anchor_entity_id: "uuid-br-auto-route",          // The entry point

  // Human-readable identification
  name: "AzureGraphRouter Incident Routing",        // Derived or user-assigned
  chain_type: "trigger_to_external",                // Classification
  status: "active",                                 // active | dormant | disabled | broken

  // Ordered execution flow
  entity_refs: [
    { entity_id: "uuid-incident-table",     role: "trigger",             position: 0 },
    { entity_id: "uuid-br-auto-route",      role: "entry_point",         position: 1 },
    { entity_id: "uuid-si-azure-graph",     role: "executor",            position: 2 },
    { entity_id: "uuid-rest-msg-graph",     role: "outbound_target",     position: 3 },
    { entity_id: "uuid-oauth-client",       role: "auth_credential",     position: 4 },
    { entity_id: "uuid-sp-sn-ticket",       role: "destination_identity", position: 5 }
  ],

  // Chain fingerprint for change detection
  fingerprint: "a3f8c2e1b5d94720",

  // Computed summary (refreshed each sync)
  summary: {
    trigger_table: "incident",
    trigger_event: "on_insert",
    destination_system: "graph.microsoft.com",
    egress_category: "external",
    blast_radius_domains: ["identity_platform"],
    ownership_status: "orphaned",                    // Worst ownership status in chain
    has_workflow_suppression: false,
    total_roles: 4,
    total_permissions: 12,
    total_resources_reachable: 3,
    sensitivity_max: "confidential"
  },

  // Temporal markers
  first_detected_at: ISODate("2026-02-12T10:00:00Z"),
  last_seen_at: ISODate("2026-02-13T10:00:00Z"),
  last_changed_at: ISODate("2026-02-13T10:00:00Z"),  // When fingerprint last changed
  sync_version: 42,

  // Chain-level findings
  active_finding_ids: ["uuid-finding-orphaned", "uuid-finding-scope-drift"]
}

Chain computation pipeline (runs during sync):

// Pseudocode: chain builder runs after entity upsert + path materialization
async function buildExecutionChains(
  tenantId: string,
  storageAdapter: StorageAdapter,
  syncVersion: number              // version of the sync that triggered this build
): Promise<ChainBuildResult> {
  // Use one timestamp for the whole build so all chains from a sync agree
  const now = new Date();

  // 1. Find all anchor candidates: identities whose subtype can start a chain
  const anchors = await storageAdapter.queryEntities(tenantId, {
    entityType: "identity",
    identitySubtype: "business_rule,flow_designer_flow,scheduled_job"
  });

  const chains: ExecutionChainDoc[] = [];

  for (const anchor of anchors) {
    // 2. BFS forward from anchor following execution edges
    const subgraph = await storageAdapter.getSubgraph(tenantId, {
      seedId: anchor._id,
      mode: "execution_flow",
      depth: 6,
      limit: 30
    });

    // 3. Order entities by execution flow
    const entityRefs = orderByExecutionFlow(subgraph);

    // 4. Compute fingerprint
    const fingerprint = computeChainFingerprint(entityRefs);

    // 5. Compute summary from entity data
    const summary = computeChainSummary(subgraph, entityRefs);

    // 6. Build chain ID
    const chainId = buildStableChainId(tenantId, anchor._id);

    // 7. Check if fingerprint changed
    const existing = await storageAdapter.getExecutionChain(tenantId, chainId);
    const changed = !existing || existing.fingerprint !== fingerprint;

    chains.push({
      _id: chainId,
      tenant_id: tenantId,
      anchor_entity_id: anchor._id,
      name: deriveChainName(entityRefs, subgraph),
      chain_type: classifyChainType(summary),
      status: deriveChainStatus(subgraph),
      entity_refs: entityRefs,
      fingerprint,
      summary,
      first_detected_at: existing?.first_detected_at ?? now,
      last_seen_at: now,
      last_changed_at: changed ? now : (existing?.last_changed_at ?? now),
      sync_version: syncVersion     // sync counter (e.g. 42), not a wall-clock timestamp
    });
  }

  // 8. Upsert all chains
  await storageAdapter.upsertExecutionChains(chains);

  // 9. Mark chains not seen in this sync as potentially broken
  // (anchor entity was deleted or no longer matches criteria)
  // Not implemented in this pseudocode.

  return { chainsComputed: chains.length };
}
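
Step 9 is left as a comment in the builder. A minimal sketch of it, assuming the stored chains and the IDs rebuilt in the current sync are both available in memory, might look like the following. The StoredChain shape and the markUnseenChainsBroken name are hypothetical, and setting the status directly to "broken" (rather than some intermediate "potentially broken" state) is an assumption.

```typescript
type ChainStatus = "active" | "dormant" | "disabled" | "broken";

// Hypothetical minimal view of a stored chain document.
interface StoredChain {
  _id: string;
  status: ChainStatus;
  last_seen_at: Date;
}

// Returns updated copies of chains that exist in storage but were not rebuilt
// in this sync. last_seen_at is left untouched so it still records the last
// sync in which the chain was actually observed.
function markUnseenChainsBroken(
  existing: StoredChain[],
  seenChainIds: Set<string>
): StoredChain[] {
  return existing
    .filter(chain => !seenChainIds.has(chain._id) && chain.status !== "broken")
    .map(chain => ({ ...chain, status: "broken" as const }));
}

const stored: StoredChain[] = [
  { _id: "chain-a", status: "active", last_seen_at: new Date("2026-02-12T10:00:00Z") },
  { _id: "chain-b", status: "active", last_seen_at: new Date("2026-02-12T10:00:00Z") },
];
const toUpdate = markUnseenChainsBroken(stored, new Set(["chain-a"]));
console.log(toUpdate.map(c => `${c._id}:${c.status}`)); // [ 'chain-b:broken' ]
```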

What works:

Stable ID (deterministic from anchor entity ID)
Listable: db.execution_chains.find({ tenant_id: "..." }).sort({ "summary.sensitivity_max": -1 })
Filterable: by chain_type, ownership_status, egress_category, etc.
Chain-level change detection via fingerprint comparison
Lightweight: references entities rather than duplicating their data
Chain-level summary answers CISO questions without N+1 entity lookups
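
One caveat on the listing example: if sensitivity_max is stored as a string, a database sort orders it alphabetically, which puts "internal" above "confidential" in descending order. The sketch below shows an application-side ordering by rank instead. Only "internal" and "confidential" appear in this document; the "public" and "restricted" levels and their relative order are assumptions for illustration.

```typescript
// Assumed ranking; only "internal" and "confidential" are taken from the document.
const SENSITIVITY_RANK: Record<string, number> = {
  public: 0,
  internal: 1,
  confidential: 2,
  restricted: 3,
};

interface ChainListItem {
  name: string;
  summary: { sensitivity_max: string };
}

// Sort most sensitive first; unknown labels sort last.
function sortBySensitivityDesc<T extends ChainListItem>(chains: T[]): T[] {
  const rank = (c: T): number => SENSITIVITY_RANK[c.summary.sensitivity_max] ?? -1;
  return [...chains].sort((a, b) => rank(b) - rank(a));
}

const listed = sortBySensitivityDesc([
  { name: "A", summary: { sensitivity_max: "internal" } },
  { name: "B", summary: { sensitivity_max: "restricted" } },
  { name: "C", summary: { sensitivity_max: "confidential" } },
]);
console.log(listed.map(c => c.name)); // [ 'B', 'C', 'A' ]
```

An alternative with the same effect is to store a numeric rank next to the label in summary so the database can sort on it directly.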

What does not work (yet):

No temporal history for the chain itself (only current state)
Cannot answer "what did this chain look like 3 months ago?" -- only entity-level history available
last_changed_at tracks fingerprint changes but not what specifically changed

Complexity budget:

New collection: execution_chains (1 new collection, bringing total to 11)
New code: ~200 lines chain builder + ~100 lines chain types + ~50 lines API endpoints
New tests: ~30 (builder logic, fingerprint stability, chain classification)
StorageAdapter additions: 5 new methods (see StorageAdapter Interface Additions)
Migration risk: Low. Additive -- no existing data or APIs change.
Rebuild safety: Chains can be fully rebuilt from entity data at any time (they are projections).

Neo4j portability:

In Neo4j, chain metadata could live as a virtual node with CONTAINS edges to member entities
Or it could be a pure application-layer concept with Neo4j providing the traversal
The execution_chains collection data would be partially redundant in Neo4j (the traversal replaces the entity_refs), but the summary and temporal markers would still live in MongoDB
Verdict: Compatible. Chain metadata stays in MongoDB regardless of graph backend.

Option C: `execution_chains` + `execution_chain_versions` (Rich Temporal)

How it works: Like Option B, plus an execution_chain_versions collection that stores a full snapshot of the chain document each time the fingerprint changes.

// execution_chain_versions collection
{
  _id: ObjectId,
  chain_id: "chain-sha256(...)",
  tenant_id: "uuid-...",
  valid_at: ISODate("2026-02-12T10:00:00Z"),
  expired_at: ISODate("2026-02-13T10:00:00Z"),   // null = current
  sync_version: 42,

  // Full chain snapshot at this point in time
  entity_refs: [ /* same structure as chain doc */ ],
  fingerprint: "a3f8c2e1b5d94720",
  summary: { /* same structure */ },

  // What changed from previous version
  diff: {
    entities_added: ["uuid-new-role"],
    entities_removed: [],
    summary_changes: {
      total_roles: { old: 3, new: 4 },
      sensitivity_max: { old: "internal", new: "confidential" }
    }
  }
}

What this adds over Option B:

Full temporal chain history: "show me this chain at any point in time"
Chain-level diffs: "what changed between version N and N+1?"
Answers the founder's scenario directly: "OAuth client rotated, same chain, here is the diff"
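
The diff field in the version document could be produced by a function along these lines. This is a sketch under the assumption that summary values are scalars (the array-valued blast_radius_domains would need a deep comparison). The ChainSnapshot and ChainDiff types and the diffChainVersions name are hypothetical.

```typescript
interface ChainEntityRef {
  entity_id: string;
  role: string;
  position: number;
}

type Summary = Record<string, string | number | boolean>;

interface ChainSnapshot {
  entity_refs: ChainEntityRef[];
  summary: Summary;
}

interface ChainDiff {
  entities_added: string[];
  entities_removed: string[];
  summary_changes: Record<string, { old: unknown; new: unknown }>;
}

// Compare two snapshots of the same chain and report membership and summary changes.
function diffChainVersions(prev: ChainSnapshot, next: ChainSnapshot): ChainDiff {
  const prevIds = new Set(prev.entity_refs.map(r => r.entity_id));
  const nextIds = new Set(next.entity_refs.map(r => r.entity_id));

  const keys = Array.from(
    new Set(Object.keys(prev.summary).concat(Object.keys(next.summary)))
  );
  const summary_changes: ChainDiff["summary_changes"] = {};
  for (const key of keys) {
    if (prev.summary[key] !== next.summary[key]) {
      summary_changes[key] = { old: prev.summary[key], new: next.summary[key] };
    }
  }

  return {
    entities_added: Array.from(nextIds).filter(id => !prevIds.has(id)),
    entities_removed: Array.from(prevIds).filter(id => !nextIds.has(id)),
    summary_changes,
  };
}

const chainDiff = diffChainVersions(
  {
    entity_refs: [
      { entity_id: "uuid-br-auto-route", role: "entry_point", position: 1 },
      { entity_id: "uuid-si-old", role: "executor", position: 2 },
    ],
    summary: { total_roles: 3, sensitivity_max: "internal" },
  },
  {
    entity_refs: [
      { entity_id: "uuid-br-auto-route", role: "entry_point", position: 1 },
      { entity_id: "uuid-si-new", role: "executor", position: 2 },
    ],
    summary: { total_roles: 4, sensitivity_max: "internal" },
  }
);
console.log(JSON.stringify(chainDiff));
```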

What it costs:

Storage growth: at most one chain version per chain per sync, written only when the fingerprint changes. For 50 chains with weekly changes, that is 50 x 52 = ~2,600 documents/year. Manageable.
Complexity: diff computation at the chain level, not just entity level
Partial redundancy with entity_versions: chain version snapshots entity_refs, but the entity details live in entity_versions

Complexity budget:

New collections: 2 (execution_chains + execution_chain_versions)
New code: ~350 lines chain builder + ~150 lines version/diff + ~100 lines types + ~80 lines API
New tests: ~50
StorageAdapter additions: 8 new methods (5 for chains, 3 for chain versions)
Migration risk: Low-medium. More code to maintain, but still additive.

Neo4j portability: Same as Option B. Chain versions are an application-level concept that lives in MongoDB regardless.

Option D: Virtual Entity in `entities` Collection

How it works: Add entity_type: "execution_chain" to the existing single-collection model.

{
  _id: "chain-sha256(tenantId:anchorEntityId)",
  tenant_id: "uuid-...",
  entity_type: "execution_chain",          // New entity type
  source_system: "platform",               // Platform-computed, not from connector
  source_id: "chain:uuid-br-auto-route",

  properties: {
    display_name: "AzureGraphRouter Incident Routing",
    status: "active",
    chain_type: "trigger_to_external",
    anchor_entity_id: "uuid-br-auto-route",
    fingerprint: "a3f8c2e1b5d94720",
    // ... summary fields ...
  },

  relationships: [
    { type: "CONTAINS", target_id: "uuid-incident-table", properties: { role: "trigger", position: 0 } },
    { type: "CONTAINS", target_id: "uuid-br-auto-route", properties: { role: "entry_point", position: 1 } },
    // ... one CONTAINS per chain member ...
  ],

  execution_paths: [],   // Not applicable -- chains are not identities
  sync_version: 42,
  last_synced_at: ISODate("2026-02-13"),
  created_at: ISODate("2026-02-12"),
  updated_at: ISODate("2026-02-13")
}

What works:

Uses ALL existing temporal machinery: entity_versions, events, baselines
Uses existing indexes on entities collection
Uses existing API endpoints (queryEntities with entityType=execution_chain)
Diff engine automatically produces events when chain relationships change
Path materializer skips non-identity entities, so no interference

What breaks:

Semantic violation. entity_type was designed for source-system entities. An execution chain is a platform-computed projection, not something discovered from a source system. This blurs a critical boundary.
source_system = "platform" has no connector. The ENTITY_TYPES const, every type guard, and every switch/case on entity_type need a new branch.
Relationships are overloaded. CONTAINS does not mean the same thing as OWNED_BY or HAS_ROLE. Entity relationships represent authority graph edges. Chain membership is a grouping operation, not an authority relationship.
Index pollution. Every entities index now includes execution_chain documents. Queries that filter on entity_type: "identity" are unaffected, but queries that scan all entities (e.g., baseline creation) now include chains.
Circular reference risk. A chain CONTAINS entities, but entities in the chain have relationships to each other. If any downstream code follows CONTAINS edges during path materialization or subgraph traversal, it could create unexpected results.
Event noise. The diff engine would produce events like "relationship_added: CONTAINS uuid-br-auto-route" on the chain document, which would mix with real entity events in the events collection.

Complexity budget:

Schema changes: Add "execution_chain" to ENTITY_TYPES const, EntityType union
Code changes: Significant. Every place that branches on entity_type needs review: path materializer, evaluator, subgraph traversal, API response shaping, UI rendering.
New tests: ~40, plus regression tests for every entity_type branch
Migration risk: Medium-high. Touches core type system.

Neo4j portability: Poor. In Neo4j, a chain would be a node with CONTAINS edges to member entities. But this creates a "virtual" node in the graph that does not correspond to any real-world entity. Neo4j queries would need to exclude chain nodes from traversals, adding complexity to every Cypher query.

4. Neo4j Portability Analysis

The planned Neo4j integration uses Neo4j as a "thin graph index" with MongoDB remaining source of truth. The MongoNeo4jStorageAdapter dual-writes thin nodes and edges to Neo4j.

How each option interacts with Neo4j

Aspect	Option A (Computed)	Option B (Lightweight)	Option C (Rich Temporal)	Option D (Virtual Entity)
Neo4j stores chain data?	No -- computed from graph	No -- MongoDB only	No -- MongoDB only	Yes -- as a node with CONTAINS edges
Neo4j traversal finds chains?	Yes, natively	Yes, but listing requires MongoDB	Yes, but listing requires MongoDB	Yes, but pollutes real graph
Redundancy in Neo4j?	None	None	None	High -- CONTAINS edges duplicate path data
Migration complexity	None	None	None	Must add/exclude chain nodes in Neo4j sync
Best fit?	For real-time traversal	For listing + tracking	For full temporal analysis	Worst fit -- semantic pollution

Verdict: Options B and C are the most Neo4j-friendly. They keep chain metadata in MongoDB (where it belongs as application-level state) and let Neo4j do what it does best (traversal). Option D would pollute the Neo4j graph with virtual nodes that every Cypher query must filter out.

Named paths in Neo4j

When Neo4j is added, the platform could define named path patterns that replace or supplement the execution_chains collection for real-time queries:

// Named path pattern (Neo4j 5.x)
CALL {
  MATCH path = (trigger:Resource)<-[:TRIGGERS_ON]-(entry:Identity)
    -[:CALLS*0..3]->(code:Identity)
    -[:EXECUTES_ON]->(outbound:Resource)
  WHERE entry.identitySubtype IN ['business_rule', 'flow_designer_flow', 'scheduled_job']
  RETURN path, entry._id AS anchor_id
}
RETURN anchor_id, nodes(path), relationships(path)

The execution_chains collection would still hold:

Stable chain IDs (Neo4j paths are ephemeral)
Chain-level summaries (pre-computed for listing)
Temporal chain versions (Neo4j is not the temporal store)
Chain-level findings linkage

Neo4j replaces the BFS traversal used to build chains, but MongoDB holds the chain metadata that the BFS cannot provide.

5. Complexity Budget Analysis

Current codebase state: 205 unit + 84 integration tests, 10 collections, ~4,500 lines of platform TypeScript.

Option A: +0 complexity, -0 capability debt

Almost no new code (about 50 lines for an on-demand API endpoint). But the founder's requirements remain unmet. This is not a viable long-term answer.

Option B: +430 lines, +33 tests, +1 collection

Component	Lines	Tests	Risk
ChainBuilder service	200	15	Low -- can be rebuilt from entities
Chain types + StorageAdapter additions	100	5	Low -- additive interface
API endpoints (list, get, history)	80	8	Low -- read-only queries
Chain computation in sync pipeline	50	5	Medium -- adds time to sync
Total	~430	~33	Low overall

Sync pipeline impact: Chain building runs after path materialization. For 20 anchor entities with depth-6 BFS each, this adds an estimated ~200ms to sync time (about 10ms per traversal). Acceptable.

Collection count: 10 --> 11. Within reasonable bounds.

Option C: +710 lines, +55 tests, +2 collections

Component	Lines	Tests	Risk
Everything in Option B	430	33	Low
Chain versioning + diff	150	12	Medium -- diff logic complexity
Version query endpoints	80	5	Low
Chain-level event emission	50	5	Low
Total	~710	~55	Low-medium

Collection count: 10 --> 12. Still reasonable, but the execution_chain_versions collection adds storage management concerns (retention policy, index sizing).

Option D: +430 lines, +50 tests, +0 collections (but mutates core type system)

Component	Lines	Tests	Risk
Entity type additions	20	5	High -- touches core discriminator
Type guard updates across codebase	80	15	High -- regression risk
Path materializer guards	30	5	Medium
UI rendering for new type	100	10	Low
Chain builder (same as B)	200	15	Low
Total	~430	~50	Medium-high

The risk is not in the volume of code but in the surface area of change. Every entity_type branch needs review. The current codebase has these branch points:

path-materializer.ts line 27: entity.entity_type !== "identity"
graph-transformer.ts line 35: mapNodeType() switch
diff-engine.ts line 34: relationshipToEventType()
adapter.ts line 582: AUTOMATION_SUBTYPES
sync-ingestion.ts line 132: e.entity_type === "identity"
UI: at least 8 components branch on entity_type for colors, icons, routing
API routes: entity queries accept entity_type as filter

Each of these is low-risk individually but the aggregate regression surface is substantial.

What breaks if we get the chain definition wrong?

Option B/C: Nothing breaks. Chains are projections. If the chain definition is wrong:

Delete the execution_chains collection
Adjust the chain builder
Re-run on next sync
No entity data is affected

Option D: More disruptive. Chain entities are mixed into the entities collection with entity_versions, events, and baselines. Removing them requires:

Deleting all documents with entity_type: "execution_chain" from entities, entity_versions, events
Removing the entity_type from the const/union
Reviewing all code paths that might have consumed chain entities

This is the strongest argument for B/C over D: projection safety. A projection can be wrong without damaging source data. A type system change is far harder to reverse.

6. The Ship of Theseus Problem

6a. When every entity is replaced

Consider this timeline:

Sync 1 (Feb):  BR-A --> SI-A --> REST-A --> OAuth-A --> SP-A
Sync 2 (Mar):  BR-A --> SI-A --> REST-A --> OAuth-A' --> SP-A    // OAuth client rotated
Sync 3 (Apr):  BR-A --> SI-B --> REST-A --> OAuth-A' --> SP-A    // SI refactored (new sys_id)
Sync 4 (May):  BR-A --> SI-B --> REST-B --> OAuth-A' --> SP-A    // REST message endpoint changed
Sync 5 (Jun):  BR-A --> SI-B --> REST-B --> OAuth-B --> SP-B     // Full auth chain replaced

By Sync 5, only the Business Rule (BR-A) remains from Sync 1. Is this still the same automation?

Yes. The anchor entity (BR-A) persists. The chain identity is anchored to BR-A. Every other entity was replaced, but the CISO's question remains the same: "what does the incident-triggered auto-routing automation do now, and how has it changed?"

This is precisely the git branch analogy. After a force push that rewrites every commit, the branch (main, feature/x) keeps its name and is still the same branch. Only the content changed.

6b. When the anchor itself is replaced

If BR-A is deleted and BR-C is created with the same name and trigger:

BR-C gets a new sys_id, therefore a new entity_id
The chain anchored to BR-A shows status: "broken" (anchor entity deleted)
A new chain is created anchored to BR-C
The CISO sees two chains: one broken (historical), one new (current)

This is the correct behavior. The platform should NOT automatically assume BR-C is the successor of BR-A. That is an inference. The CISO (or an operator) can manually link them via chain aliasing (P3 feature).

6c. Minimal anchor for chain identity

The minimal anchor is: the entry point entity that has a TRIGGERS_ON relationship (or equivalent initiator pattern).

Why this and not the trigger resource (incident table)?

Multiple automations can trigger on the same table. The table is not unique to any chain.
The entry point entity is the first piece of code that runs. It defines the chain.

Why not the full path?

The path changes too frequently. Every entity swap changes the fingerprint.
Chain identity should be more stable than chain content.

Why not a user-assigned name?

User-assigned names require manual maintenance.
Deterministic anchor IDs are computed automatically.
Users CAN override the chain name (stored in the name field) but the ID remains deterministic.

6d. Interaction with connector discovery

Connectors have no concept of "chain." They discover individual entities and pairwise relationships. The chain is assembled by the platform from connector output.

This is architecturally correct. Connectors are source-system-specific. Chains are cross-system concepts (an Entra SP in the chain is discovered by the Entra connector; the ServiceNow BR is discovered by the ServiceNow connector). Only the platform, which has visibility across all connectors, can assemble the complete chain.

The chain builder runs AFTER ingestion (after all connector outputs are merged, entities upserted, and paths materialized). It consumes the unified entity graph, not the raw connector output.

7. Recommendation

Phase 1: Option B (Lightweight `execution_chains` collection)

Ship in the current sprint. This gives:

Stable chain IDs
Listable, queryable chain objects
Chain-level summaries for CISO dashboard
Change detection via fingerprint comparison
last_changed_at for temporal awareness

Do NOT ship chain versions yet. Phase 1 gives the CISO a list of automations to review. Chain-level temporal history (Option C) is a Phase 2 addition when the first customer asks "show me what changed in this chain over the last 3 months."

Phase 2: Add `execution_chain_versions` (Option C extension)

Ship when temporal chain tracking becomes a requirement. This adds:

Full chain snapshots at each fingerprint change
Chain-level diffs (entities added/removed, authority changes)
Chain-level events ("chain_authority_expanded", "chain_entity_replaced")

Phase 3: Chain aliasing and user overrides

Ship when customers hit the anchor-deletion scenario. This adds:

Chain successor linking (chain X is the continuation of chain Y)
User-assigned chain names that survive re-anchoring
Chain-level finding rules (cross-entity detections at the chain level)

StorageAdapter Interface Additions

// Phase 1 additions
interface StorageAdapter {
  // ... existing methods ...

  // === Execution Chains ===
  upsertExecutionChain(chain: ExecutionChainDoc): Promise<{ upserted: boolean }>;
  upsertExecutionChains(chains: ExecutionChainDoc[]): Promise<{ upserted: number; updated: number }>;
  getExecutionChain(tenantId: string, chainId: string): Promise<ExecutionChainDoc | null>;
  queryExecutionChains(tenantId: string, query: ChainQuery): Promise<ExecutionChainDoc[]>;
  countExecutionChains(tenantId: string, query?: ChainQuery): Promise<number>;
}

interface ChainQuery {
  chainType?: string;           // trigger_to_external, trigger_to_internal, etc.
  status?: string;              // active, dormant, disabled, broken, partial
  ownershipStatus?: string;     // from summary.ownership_status
  egressCategory?: string;      // from summary.egress_category
  anchorEntityId?: string;      // specific anchor
  q?: string;                   // name search
  sort?: string;                // -last_changed_at, -summary.sensitivity_max
  limit?: number;
  offset?: number;
}

// Phase 2 additions
interface StorageAdapter {
  // ... Phase 1 methods ...

  // === Execution Chain Versions ===
  insertChainVersion(version: ExecutionChainVersionDoc): Promise<void>;
  getChainVersion(tenantId: string, chainId: string, asOf: Date): Promise<ExecutionChainVersionDoc | null>;
  getChainVersionHistory(tenantId: string, chainId: string, limit?: number): Promise<ExecutionChainVersionDoc[]>;
}

Type Definitions

// execution-chains/types.ts (Phase 1)

export interface ChainEntityRef {
  entity_id: string;
  role: ChainEntityRole;
  position: number;
}

export type ChainEntityRole =
  | "trigger"
  | "entry_point"
  | "executor"
  | "outbound_target"
  | "auth_credential"
  | "destination_identity";

export type ChainType =
  | "trigger_to_external"       // ends at external system (egress)
  | "trigger_to_internal"       // internal-only chain
  | "scheduled_external"        // no trigger resource, scheduled entry point
  | "scheduled_internal";       // scheduled, internal-only

export type ChainStatus =
  | "active"                    // all entities exist, anchor is active
  | "dormant"                   // all entities exist but no recent execution
  | "disabled"                  // anchor entity is disabled
  | "broken"                    // anchor entity deleted or chain cannot be assembled
  | "partial";                  // some entities in chain are missing/deleted

export interface ChainSummary {
  trigger_table?: string;
  trigger_event?: string;
  destination_system?: string;
  egress_category: string;
  blast_radius_domains: string[];
  ownership_status: string;
  has_workflow_suppression: boolean;
  total_roles: number;
  total_permissions: number;
  total_resources_reachable: number;
  sensitivity_max: string;
}

export interface ExecutionChainDoc {
  _id: string;                           // Deterministic: sha256(tenantId:anchorEntityId)
  tenant_id: string;
  anchor_entity_id: string;
  name: string;
  chain_type: ChainType;
  status: ChainStatus;
  entity_refs: ChainEntityRef[];
  fingerprint: string;
  summary: ChainSummary;
  first_detected_at: Date;
  last_seen_at: Date;
  last_changed_at: Date;
  sync_version: number;
  active_finding_ids?: string[];
}

// execution-chain-versions/types.ts (Phase 2)

export interface ChainDiff {
  entities_added: string[];
  entities_removed: string[];
  summary_changes: Record<string, { old: unknown; new: unknown }>;
}

export interface ExecutionChainVersionDoc {
  _id?: string;
  chain_id: string;
  tenant_id: string;
  valid_at: Date;
  expired_at: Date | null;
  sync_version: number;
  entity_refs: ChainEntityRef[];
  fingerprint: string;
  summary: ChainSummary;
  diff?: ChainDiff;
}
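
As an illustration of how a `ChainDiff` could be produced from two consecutive versions, here is a minimal sketch. The function name `diffChainVersions` is hypothetical, the summaries are passed as plain records, and comparing summary values through JSON.stringify is an assumption that is adequate for the flat fields and string arrays in `ChainSummary`.

```typescript
interface ChainEntityRef {
  entity_id: string;
  role: string;
  position: number;
}

interface ChainDiff {
  entities_added: string[];
  entities_removed: string[];
  summary_changes: Record<string, { old: unknown; new: unknown }>;
}

// Hypothetical helper: computes the diff stored on a new chain version.
function diffChainVersions(
  prevRefs: ChainEntityRef[],
  nextRefs: ChainEntityRef[],
  prevSummary: Record<string, unknown>,
  nextSummary: Record<string, unknown>
): ChainDiff {
  const prevIds = new Set(prevRefs.map(r => r.entity_id));
  const nextIds = new Set(nextRefs.map(r => r.entity_id));

  const summary_changes: ChainDiff["summary_changes"] = {};
  const keys = Array.from(
    new Set([...Object.keys(prevSummary), ...Object.keys(nextSummary)])
  );
  for (const key of keys) {
    // JSON comparison also covers array fields such as blast_radius_domains
    if (JSON.stringify(prevSummary[key]) !== JSON.stringify(nextSummary[key])) {
      summary_changes[key] = { old: prevSummary[key], new: nextSummary[key] };
    }
  }

  return {
    entities_added: Array.from(nextIds).filter(id => !prevIds.has(id)).sort(),
    entities_removed: Array.from(prevIds).filter(id => !nextIds.has(id)).sort(),
    summary_changes,
  };
}
```

The added and removed lists are sorted so that two runs over the same pair of versions yield an identical diff document.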

Indexes

// === execution_chains (Phase 1) ===

// Primary lookup
db.execution_chains.createIndex(
  { tenant_id: 1, _id: 1 },
  { unique: true }
);

// List chains by type and status
db.execution_chains.createIndex(
  { tenant_id: 1, chain_type: 1, status: 1 }
);

// Find chains by anchor entity
db.execution_chains.createIndex(
  { tenant_id: 1, anchor_entity_id: 1 },
  { unique: true }
);

// Sort by last changed (for CISO dashboard: "recently changed automations")
db.execution_chains.createIndex(
  { tenant_id: 1, last_changed_at: -1 }
);

// Filter by summary fields (ownership, egress, sensitivity)
db.execution_chains.createIndex(
  { tenant_id: 1, "summary.ownership_status": 1, "summary.egress_category": 1 }
);

// Find chains containing a specific entity
db.execution_chains.createIndex(
  { tenant_id: 1, "entity_refs.entity_id": 1 }
);

// === execution_chain_versions (Phase 2) ===

// Point-in-time lookup
db.execution_chain_versions.createIndex(
  { chain_id: 1, valid_at: 1, expired_at: 1 }
);

// Version history for a chain
db.execution_chain_versions.createIndex(
  { tenant_id: 1, chain_id: 1, valid_at: -1 }
);
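
The point-in-time lookup that the first version index serves can be sketched as below. This is an in-memory illustration, not the adapter implementation: `findVersionAsOf` and `buildAsOfFilter` are hypothetical names, and the half-open validity window [valid_at, expired_at) is an assumption about how versions are closed.

```typescript
interface VersionWindow {
  chain_id: string;
  valid_at: Date;
  expired_at: Date | null; // null marks the current version
}

// Hypothetical in-memory equivalent of getChainVersion(tenantId, chainId, asOf).
// Assumption: a version is valid from valid_at (inclusive) to expired_at (exclusive).
function findVersionAsOf<T extends VersionWindow>(
  versions: T[],
  chainId: string,
  asOf: Date
): T | null {
  const t = asOf.getTime();
  const match = versions.find(
    v =>
      v.chain_id === chainId &&
      v.valid_at.getTime() <= t &&
      (v.expired_at === null || v.expired_at.getTime() > t)
  );
  return match ?? null;
}

// The same predicate written as a MongoDB-style filter document.
function buildAsOfFilter(tenantId: string, chainId: string, asOf: Date) {
  return {
    tenant_id: tenantId,
    chain_id: chainId,
    valid_at: { $lte: asOf },
    $or: [{ expired_at: null }, { expired_at: { $gt: asOf } }],
  };
}
```

Under the half-open assumption, the instant at which one version expires belongs to its successor, so a lookup never matches two versions of the same chain.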

Ingestion Pipeline Changes

The sync pipeline currently has this structure (from sync-ingestion.ts):

1. Create ConnectorSyncDoc
2. Transform graph -> entities + evidence
3. Diff against existing state -> events
4. Upsert entities
5. Insert events
6. Create entity versions for changed entities
7. Insert execution evidence
8. Materialize execution paths
9. Update ConnectorSyncDoc with metrics

Chain building is inserted as step 8.5:

8. Materialize execution paths
8.5 Build/update execution chains          <-- NEW
9. Update ConnectorSyncDoc with metrics

Why after path materialization: The chain builder needs entity_refs that include the full execution flow. The path materializer resolves RUNS_AS and AUTHENTICATES_TO traversals that determine which entities are in the chain. Without materialized paths, the chain builder would need to duplicate traversal logic.

Why before metrics: Chain build metrics (chains computed, chains changed) should be included in the sync metrics.

// In sync-ingestion.ts, after step 8:

// 8.5 Build/update execution chains
const chainResult = await buildExecutionChains(tenantId, storageAdapter);

// Then include in metrics:
metrics.chains_computed = chainResult.chainsComputed;
metrics.chains_changed = chainResult.chainsChanged;

API Endpoints

GET  /api/v1/execution-chains                    # List chains (paginated, filterable)
GET  /api/v1/execution-chains/:chainId           # Get chain detail
GET  /api/v1/execution-chains/:chainId/entities  # Get all entities in chain (expanded)
GET  /api/v1/execution-chains/:chainId/history   # Phase 2: chain version history
GET  /api/v1/execution-chains/:chainId/diff      # Phase 2: diff between two versions

Query parameters for listing:

?chainType=trigger_to_external
&status=active
&ownershipStatus=orphaned
&sort=-last_changed_at
&limit=20
&offset=0
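
A sketch of how the listing endpoint might turn these parameters into a structured query. The parser name `parseChainListQuery`, the default limit of 20, and the cap of 100 are assumptions; only the parameter names and the leading "-" convention for descending sort come from the examples above.

```typescript
import { URLSearchParams } from "node:url";

interface ParsedChainQuery {
  chainType?: string;
  status?: string;
  ownershipStatus?: string;
  sort?: { field: string; direction: 1 | -1 };
  limit: number;
  offset: number;
}

// Hypothetical parser; the default (20) and the cap (100) for limit are assumptions.
function parseChainListQuery(queryString: string): ParsedChainQuery {
  const params = new URLSearchParams(queryString);
  const rawSort = params.get("sort");
  const limit = Number.parseInt(params.get("limit") ?? "20", 10);
  const offset = Number.parseInt(params.get("offset") ?? "0", 10);

  return {
    chainType: params.get("chainType") ?? undefined,
    status: params.get("status") ?? undefined,
    ownershipStatus: params.get("ownershipStatus") ?? undefined,
    // A leading "-" means descending, as in sort=-last_changed_at
    sort: rawSort
      ? rawSort.startsWith("-")
        ? { field: rawSort.slice(1), direction: -1 }
        : { field: rawSort, direction: 1 }
      : undefined,
    // Clamp pagination values so a malformed request cannot ask for everything
    limit: Number.isNaN(limit) ? 20 : Math.min(Math.max(limit, 1), 100),
    offset: Number.isNaN(offset) ? 0 : Math.max(offset, 0),
  };
}
```

One caveat for the sort parameter: summary.sensitivity_max is stored as a string, so a plain database sort on it would order values alphabetically rather than by severity. Sorting by severity would need a numeric rank, for example one derived from SENSITIVITY_ORDER in section 8c.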

8. Chain Builder Algorithm

8a. Anchor Discovery

async function discoverAnchors(
  tenantId: string,
  storageAdapter: StorageAdapter
): Promise<EntityDoc[]> {
  // Anchors are identity entities with TRIGGERS_ON relationships
  // (business_rule, flow_designer_flow, scheduled_job)
  const candidates = await storageAdapter.queryEntities(tenantId, {
    entityType: "identity",
    identitySubtype: "business_rule,flow_designer_flow,scheduled_job",
    limit: 0 // no limit
  });

  // Keep entities that have a TRIGGERS_ON relationship. Scheduled jobs qualify
  // even without one, because they have no trigger resource.
  return candidates.filter(entity => {
    const hasTriggersOn = entity.relationships.some(r => r.type === "TRIGGERS_ON");
    const isScheduledJob = entity.properties.identitySubtype === "scheduled_job";
    return hasTriggersOn || isScheduledJob;
  });
}

8b. Chain Assembly

function orderByExecutionFlow(
  subgraph: SubgraphResult,
  anchorId: string
): ChainEntityRef[] {
  const refs: ChainEntityRef[] = [];
  const visited = new Set<string>();
  const nodeMap = new Map(subgraph.nodes.map(n => [n._id, n]));

  // 1. Find trigger resources (targets of TRIGGERS_ON from anchor)
  const anchor = nodeMap.get(anchorId);
  if (!anchor) return refs;

  for (const edge of subgraph.edges) {
    if (edge.source_id === anchorId && edge.relationship_type === "TRIGGERS_ON") {
      if (!visited.has(edge.target_id)) {
        visited.add(edge.target_id);
        refs.push({ entity_id: edge.target_id, role: "trigger", position: refs.length });
      }
    }
  }

  // 2. Entry point (the anchor itself)
  visited.add(anchorId);
  refs.push({ entity_id: anchorId, role: "entry_point", position: refs.length });

  // 3. Follow execution edges forward: CALLS -> EXECUTES_ON -> AUTHENTICATES_VIA
  const FORWARD_EDGES = ["CALLS", "EXECUTES_ON", "AUTHENTICATES_VIA"];
  const REVERSE_EDGES = ["AUTHENTICATES_TO"]; // SP -> OAuth direction

  let frontier = [anchorId];
  while (frontier.length > 0) {
    const nextFrontier: string[] = [];

    for (const nodeId of frontier) {
      // Forward edges from this node
      for (const edge of subgraph.edges) {
        if (edge.source_id === nodeId && FORWARD_EDGES.includes(edge.relationship_type)) {
          if (!visited.has(edge.target_id)) {
            visited.add(edge.target_id);
            const role = classifyEntityRole(nodeMap.get(edge.target_id), edge.relationship_type);
            refs.push({ entity_id: edge.target_id, role, position: refs.length });
            nextFrontier.push(edge.target_id);
          }
        }
      }

      // Reverse AUTHENTICATES_TO edges (SP -> this node)
      for (const edge of subgraph.edges) {
        if (edge.target_id === nodeId && REVERSE_EDGES.includes(edge.relationship_type)) {
          if (!visited.has(edge.source_id)) {
            visited.add(edge.source_id);
            refs.push({
              entity_id: edge.source_id,
              role: "destination_identity",
              position: refs.length
            });
            nextFrontier.push(edge.source_id);
          }
        }
      }

      // Follow RUNS_AS from this node (automation -> SP it runs as)
      for (const edge of subgraph.edges) {
        if (edge.source_id === nodeId && edge.relationship_type === "RUNS_AS") {
          if (!visited.has(edge.target_id)) {
            visited.add(edge.target_id);
            refs.push({
              entity_id: edge.target_id,
              role: "destination_identity",
              position: refs.length
            });
            nextFrontier.push(edge.target_id);
          }
        }
      }
    }

    frontier = nextFrontier;
  }

  return refs;
}
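
The assembly loop above scans subgraph.edges several times for every frontier node. For larger subgraphs, one inexpensive variation is to index the edges by source once and walk the index instead. The sketch below shows the idea for forward edges only; `Edge`, `indexEdgesBySource`, and `reachableFrom` are hypothetical names, and role classification and the reverse AUTHENTICATES_TO handling are left out for brevity.

```typescript
interface Edge {
  source_id: string;
  target_id: string;
  relationship_type: string;
}

// Builds a source_id -> outgoing edges index in a single pass over the edge list.
function indexEdgesBySource(edges: Edge[]): Map<string, Edge[]> {
  const bySource = new Map<string, Edge[]>();
  for (const edge of edges) {
    const bucket = bySource.get(edge.source_id);
    if (bucket) bucket.push(edge);
    else bySource.set(edge.source_id, [edge]);
  }
  return bySource;
}

// Breadth-first walk from the anchor that follows only the allowed edge types.
// Returns entity IDs in the order they were first reached.
function reachableFrom(anchorId: string, edges: Edge[], allowedTypes: string[]): string[] {
  const bySource = indexEdgesBySource(edges);
  const allowed = new Set(allowedTypes);
  const visited = new Set<string>([anchorId]);
  const order: string[] = [anchorId];

  let frontier = [anchorId];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const nodeId of frontier) {
      for (const edge of bySource.get(nodeId) ?? []) {
        if (allowed.has(edge.relationship_type) && !visited.has(edge.target_id)) {
          visited.add(edge.target_id);
          order.push(edge.target_id);
          next.push(edge.target_id);
        }
      }
    }
    frontier = next;
  }
  return order;
}
```

With the index in place each edge is examined once per traversal, so the cost grows with the number of nodes plus edges rather than with their product.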

8c. Summary Computation

function computeChainSummary(
  subgraph: SubgraphResult,
  entityRefs: ChainEntityRef[]
): ChainSummary {
  const nodeMap = new Map(subgraph.nodes.map(n => [n._id, n]));

  // Trigger info
  const triggerRef = entityRefs.find(r => r.role === "trigger");
  const triggerEntity = triggerRef ? nodeMap.get(triggerRef.entity_id) : undefined;

  // Destination info
  const destRef = entityRefs.find(r => r.role === "destination_identity");
  const destEntity = destRef ? nodeMap.get(destRef.entity_id) : undefined;

  // Aggregate across all entities in chain
  let worstOwnership = "active";
  let hasWorkflowSuppression = false;
  const blastRadiusDomains = new Set<string>();
  let maxSensitivity = "public";
  let totalRoles = 0;
  let totalPermissions = 0;
  let totalResources = 0;

  for (const ref of entityRefs) {
    const entity = nodeMap.get(ref.entity_id);
    if (!entity) continue;

    // Ownership: worst status in chain
    const ownershipStatus = (entity.properties.ownership_status as string | undefined) ?? "unknown";
    if (ownershipStatus === "orphaned") worstOwnership = "orphaned";
    else if (ownershipStatus === "degraded" && worstOwnership !== "orphaned") worstOwnership = "degraded";

    // Workflow suppression
    if (entity.properties.workflow_suppression) hasWorkflowSuppression = true;

    // Execution paths (from identity entities)
    for (const path of entity.execution_paths ?? []) {
      blastRadiusDomains.add(path.business_domain);
      totalResources++;
      if (SENSITIVITY_ORDER[path.sensitivity] > SENSITIVITY_ORDER[maxSensitivity]) {
        maxSensitivity = path.sensitivity;
      }
    }

    // Count roles and permissions via relationships
    for (const rel of entity.relationships) {
      if (rel.type === "HAS_ROLE") totalRoles++;
      if (rel.type === "GRANTS") totalPermissions++;
    }
  }

  return {
    trigger_table: triggerEntity?.properties.resource_name as string,
    trigger_event: getTriggerEvent(entityRefs, subgraph),
    destination_system: destEntity?.source_system,
    egress_category: deriveEgressCategory(destEntity),
    blast_radius_domains: [...blastRadiusDomains],
    ownership_status: worstOwnership,
    has_workflow_suppression: hasWorkflowSuppression,
    total_roles: totalRoles,
    total_permissions: totalPermissions,
    total_resources_reachable: totalResources,
    sensitivity_max: maxSensitivity
  };
}

const SENSITIVITY_ORDER: Record<string, number> = {
  public: 0,
  internal: 1,
  confidential: 2,
  restricted: 3
};

9. Migration Plan

Phase 1 Implementation (estimated: 12-16 hours)

Step	Effort	Description
1. Types	1h	Create `ExecutionChainDoc`, `ChainQuery`, `ChainEntityRef` types
2. Collection	0.5h	Add `execution_chains` to collections.ts, schema.ts
3. StorageAdapter	2h	Add 5 methods to interface + MongoStorageAdapter implementation
4. Indexes	0.5h	Add 6 indexes to schema.ts
5. Chain builder	4h	Anchor discovery, chain assembly, fingerprint, summary computation
6. Sync pipeline integration	1h	Add step 8.5 to sync-ingestion.ts
7. API endpoints	2h	List, get, get-entities endpoints
8. Unit tests	2h	Chain builder, fingerprint, summary, ordering
9. Integration tests	2h	End-to-end: ingest -> build chains -> query
Total	~15h

Phase 2 Implementation (estimated: 8-10 hours, deferred)

Step	Effort	Description
1. Version types	0.5h	`ExecutionChainVersionDoc`, `ChainDiff`
2. Version collection + indexes	0.5h	Add `execution_chain_versions`
3. StorageAdapter additions	1.5h	3 new methods
4. Version builder	3h	Diff computation, version creation on fingerprint change
5. API endpoints	1.5h	History, diff endpoints
6. Tests	2h	Version creation, diff accuracy, history ordering
Total	~9h

Rollback Plan

If the chain model proves wrong:

Drop execution_chains collection: db.execution_chains.drop()
Remove chain builder step from sync pipeline (comment out the buildExecutionChains call and its two metrics lines in sync-ingestion.ts)
Remove API endpoints (delete route file)
No entity data is affected. No events are lost. No versions are corrupted.

Total rollback time: 15 minutes of code changes + 1 deploy.

This is the primary advantage of the projection approach: zero blast radius on rollback.

10. The Founder's Scenarios, Addressed

Scenario 1: "OAuth client_id updated, but logic is the same"

With Option B:

Sync N: Chain exists with fingerprint F1, entity_refs includes OAuth-A
OAuth-A's properties change (new client_id). Entity-level diff emits updated event.
Chain builder runs. BFS finds same entities. entity_refs unchanged. Fingerprint F1 unchanged.
Chain last_seen_at updated. last_changed_at NOT updated (fingerprint and summary both unchanged).
CISO sees: same chain, last changed 2 weeks ago, property change on OAuth entity visible in entity history.

With Option C (Phase 2):

No chain version created (fingerprint and summary unchanged). Entity version history shows the OAuth change.
CISO can drill into entity history for the OAuth client to see the property change.

Scenario 2: "Automation gets permissions to sensitive data"

With Option B:

Sync N: Chain exists. SP has 2 roles. summary.sensitivity_max = "internal".
New role hr_admin granted to SP. Entity-level diff emits role_assigned event.
Path materializer computes new execution paths including hr_case (confidential).
Chain builder runs. BFS returns same entities (same fingerprint). But summary.sensitivity_max changes to "confidential".
Chain last_changed_at updated (summary changed even though fingerprint did not).
Finding evaluator fires scope_drift on the SP entity.
Chain's active_finding_ids updated to include the new finding.
CISO dashboard: chain "AzureGraphRouter" shows red badge (new finding, sensitivity escalation).

With Option C (Phase 2):

New chain version created with diff: summary_changes: { sensitivity_max: { old: "internal", new: "confidential" } }
CISO can see the exact moment the chain's blast radius expanded.

Scenario 3: "Security breach -- CISO needs to find why quickly"

With Option B:

1. CISO goes to chain listing, filters by ownershipStatus=orphaned and egressCategory=external.
2. Finds "AzureGraphRouter Incident Routing" chain with 2 active findings.
3. Clicks chain -> sees all 6 entities in execution order with roles and permissions.
4. Clicks SP entity -> sees entity version history: role added on 2026-01-15, ownership decayed on 2025-07-15.
5. Clicks finding -> evidence pack with full temporal context.

With Option C (Phase 2):

Step 3 also shows chain version history: "3 months ago this chain had 2 roles and internal-only sensitivity. Today it has 4 roles and confidential sensitivity."
The chain-level diff is the "timeline view" the founder wants.

11. What This Analysis Does NOT Cover

UI design for chain listing/detail pages. That is the product owner and developer's scope.
Finding rules at the chain level. Currently, findings fire per-entity. Chain-level findings (e.g., "this entire chain is unowned end-to-end") are a Phase 3 concern.
Cross-tenant chains. The current model is single-tenant. Cross-tenant chains (Entra in tenant A authenticating to ServiceNow in tenant B) require multi-tenant chain assembly, which is out of scope.
Performance optimization. The chain builder does N BFS traversals per sync (one per anchor). For <100 anchors this is fine. For 1,000+ anchors, batch optimization or incremental rebuilding would be needed.

12. Summary Table

Criterion	Option A (Computed)	Option B (Lightweight)	Option C (Rich Temporal)	Option D (Virtual Entity)
Stable chain ID	No	Yes	Yes	Yes
Chain-level listing	No	Yes	Yes	Yes
Chain-level history	No	No	Yes	Yes (via entity_versions)
Chain-level diff	No	Fingerprint only	Full diff	Via event replay
Neo4j portability	Best	Good	Good	Poor
Rollback safety	N/A	Safe (drop collection)	Safe (drop 2 collections)	Risky (type system change)
Implementation effort	0	15h	24h	15h + high regression risk
New collections	0	1	2	0 (but mutates core types)
Meets founder requirements	No	Partially (no temporal)	Yes	Yes (at high cost)
Recommendation	Stepping stone	Phase 1	Phase 2	Reject

Final recommendation: Phase 1 = Option B. Phase 2 = extend to Option C when temporal chain tracking is validated by customer need. Do not implement Option D.

Appendix A: Chain Entity Role Classification

function classifyEntityRole(
  entity: EntityDoc | undefined,
  edgeType: string
): ChainEntityRole {
  if (!entity) return "executor";

  const subtype = entity.properties.identitySubtype as string;
  const entityType = entity.entity_type;

  // Resource at the end of EXECUTES_ON
  if (entityType === "resource" && edgeType === "EXECUTES_ON") return "outbound_target";

  // OAuth/credential via AUTHENTICATES_VIA
  if (entityType === "credential" || subtype === "oauth_app") return "auth_credential";

  // Service Principal at the end of RUNS_AS or AUTHENTICATES_TO
  if (subtype === "service_principal" || subtype === "machine_account") return "destination_identity";

  // Code artifacts (SI, script)
  if (subtype === "system_execution") return "executor";

  // Default
  return "executor";
}

Appendix B: Chain Fingerprint Stability Contract

The fingerprint MUST be stable across syncs if and only if the set of entities in the chain and their roles have not changed. This means:

Property changes do not change the fingerprint. A role name change, a status change, a credential rotation -- none of these change which entities are in the chain or their roles.
Entity addition/removal changes the fingerprint. If a new Script Include is added to the chain (BR now CALLS two SIs), the fingerprint changes.
Entity replacement changes the fingerprint. If SI-A is replaced by SI-B (different entity_id), the fingerprint changes.
Role changes change the fingerprint. If an entity's role in the chain changes (e.g., from "executor" to "auth_credential", typically because the relationship type that reaches it changed), the fingerprint changes.
Entity reordering does NOT change the fingerprint. The fingerprint is based on sorted entity_id:role pairs, not position. This prevents phantom changes when BFS traversal order varies.

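import { createHash } from "node:crypto";

function computeChainFingerprint(entityRefs: ChainEntityRef[]): string {
  // Sort by entity_id for determinism (BFS order may vary). Plain code-unit
  // comparison is used because localeCompare results depend on the runtime locale.
  const sorted = [...entityRefs].sort((a, b) =>
    a.entity_id < b.entity_id ? -1 : a.entity_id > b.entity_id ? 1 : 0
  );

  const input = sorted
    .map(ref => `${ref.entity_id}:${ref.role}`)
    .join("|");

  // Keep the first 16 hex characters (64 bits) of the SHA-256 digest
  return createHash("sha256").update(input).digest("hex").slice(0, 16);
}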
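
The stability contract can be exercised with a few small cases. The sketch below uses a compact stand-in, `fingerprintOf` (a hypothetical name), which sorts the entity_id:role strings rather than the refs themselves; it is included only so the example runs on its own, and the entity IDs are made up.

```typescript
import { createHash } from "node:crypto";

interface Ref {
  entity_id: string;
  role: string;
  position: number;
}

// Compact stand-in for the fingerprint function: hashes sorted entity_id:role pairs
// and ignores position entirely.
function fingerprintOf(refs: Ref[]): string {
  const input = refs
    .map(r => `${r.entity_id}:${r.role}`)
    .sort()
    .join("|");
  return createHash("sha256").update(input).digest("hex").slice(0, 16);
}

const br: Ref = { entity_id: "br-1", role: "entry_point", position: 0 };
const si: Ref = { entity_id: "si-1", role: "executor", position: 1 };
const base = [br, si];

// Same entities and roles, different traversal order
const reordered = [{ ...si, position: 0 }, { ...br, position: 1 }];
// SI-A replaced by SI-B (different entity_id)
const replaced = [br, { ...si, entity_id: "si-2" }];
// Same entity, different role
const reRoled = [br, { ...si, role: "auth_credential" }];

const stableUnderReorder = fingerprintOf(base) === fingerprintOf(reordered);
const changesOnReplacement = fingerprintOf(base) !== fingerprintOf(replaced);
const changesOnRoleChange = fingerprintOf(base) !== fingerprintOf(reRoled);
```

Cases like these make reasonable unit tests for the chain builder, since each one corresponds to a clause of the contract.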

Appendix C: Chain-Level Events (Phase 2)

When a chain version is created (fingerprint or summary changed), the platform emits chain-level events:

// New event types for chains
type ChainEventType =
  | "chain_detected"           // First time this chain is seen
  | "chain_entity_added"       // New entity joined the chain
  | "chain_entity_removed"     // Entity left the chain
  | "chain_authority_expanded" // summary.total_roles or sensitivity increased
  | "chain_authority_reduced"  // summary.total_roles or sensitivity decreased
  | "chain_ownership_decayed"  // summary.ownership_status worsened
  | "chain_broken"             // Anchor entity deleted or chain cannot be assembled
  | "chain_restored";          // Previously broken chain reassembled

These events go into the existing events collection with entity_id = chain_id and entity_type = "execution_chain". This reuses the existing event infrastructure without creating a new collection.

Note: This means Option C's chain-level events DO use the "execution_chain" concept in the events collection, but only as an entity_type string in event documents -- NOT as a new EntityType in the type system. Events already store entity_type as a free string field, not a typed enum. This is a subtle but important distinction from Option D.
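
One way the event types could be derived from two consecutive chain states is sketched below. It is a sketch under assumptions: `deriveChainEvents` and `SummarySlice` are hypothetical names, the ownership ranking (active < degraded < orphaned) is inferred from the worst-status logic in section 8c, and chain_detected, chain_broken, and chain_restored are omitted because they depend on chain status rather than on a comparison of two versions.

```typescript
type ChainEventType =
  | "chain_entity_added"
  | "chain_entity_removed"
  | "chain_authority_expanded"
  | "chain_authority_reduced"
  | "chain_ownership_decayed";

interface SummarySlice {
  total_roles: number;
  sensitivity_max: string;
  ownership_status: string;
}

const SENSITIVITY_RANK: Record<string, number> = {
  public: 0, internal: 1, confidential: 2, restricted: 3,
};
// Assumption: ordering inferred from the worst-ownership aggregation in 8c
const OWNERSHIP_RANK: Record<string, number> = { active: 0, degraded: 1, orphaned: 2 };

function deriveChainEvents(
  prevIds: string[],
  nextIds: string[],
  prev: SummarySlice,
  next: SummarySlice
): ChainEventType[] {
  const events: ChainEventType[] = [];
  const prevSet = new Set(prevIds);
  const nextSet = new Set(nextIds);
  if (nextIds.some(id => !prevSet.has(id))) events.push("chain_entity_added");
  if (prevIds.some(id => !nextSet.has(id))) events.push("chain_entity_removed");

  // Unknown labels rank lowest so they never trigger an escalation by themselves
  const sens = (s: string) => SENSITIVITY_RANK[s] ?? 0;
  const own = (s: string) => OWNERSHIP_RANK[s] ?? 0;

  if (next.total_roles > prev.total_roles || sens(next.sensitivity_max) > sens(prev.sensitivity_max)) {
    events.push("chain_authority_expanded");
  }
  if (next.total_roles < prev.total_roles || sens(next.sensitivity_max) < sens(prev.sensitivity_max)) {
    events.push("chain_authority_reduced");
  }
  if (own(next.ownership_status) > own(prev.ownership_status)) {
    events.push("chain_ownership_decayed");
  }
  return events;
}
```

Note that a single version change can yield both an expanded and a reduced event, for example when a role is added while the maximum sensitivity drops. Whether that should collapse into one event is a product decision this sketch does not make.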

Appendix D: Files to Create/Modify

New Files (Phase 1)

File	Purpose
`src/domain/execution-chains/types.ts`	ExecutionChainDoc, ChainQuery, ChainEntityRef types
`src/domain/execution-chains/chain-builder.ts`	Anchor discovery, chain assembly, fingerprint, summary
`src/api/routes/execution-chains.ts`	API endpoints for chain listing and detail
`test/domain/execution-chains/chain-builder.test.ts`	Unit tests
`test/api/execution-chains.test.ts`	Integration tests

Modified Files (Phase 1)

File	Change
`src/storage/storage-adapter.ts`	Add 5 chain methods to interface
`src/storage/mongo/adapter.ts`	Implement 5 chain methods
`src/storage/mongo/collections.ts`	Add execution_chains collection
`src/storage/mongo/schema.ts`	Add indexes
`src/workers/handlers/sync-ingestion.ts`	Add step 8.5 (chain building)
`src/api/routes/index.ts`	Register chain routes

New Files (Phase 2, deferred)

File	Purpose
`src/domain/execution-chains/chain-versioner.ts`	Version creation, diff computation
`src/domain/execution-chain-versions/types.ts`	Version and diff types

End of architect analysis. Recommendation: Option B (Phase 1) + Option C extension (Phase 2). Estimated effort: 15h Phase 1, 9h Phase 2. Zero-risk rollback path.
