Automation Persistence — Integrator Analysis

Date: 2026-02-13
Author: Integrator (connector architecture, data pipeline, cross-system correlation)
Status: Draft for team review
Related: CEO synthesis, Execution Flow Analysis

Executive Summary

The question of whether automations need a separate collection is fundamentally a connector architecture and data pipeline question, not primarily a UI or database question.

Core finding: The problem is real, but a separate automations collection is the wrong solution. The right solution is:

1. Stable chain anchors in NormalizedGraph: connectors emit chainId metadata for each automation node
2. Platform-side chain assembly: the ingestion pipeline builds chain entities from nodes + edges, tracking them over time
3. Backward-compatible schema evolution: extend NormalizedGraph, don't break it

Why this matters: When connector sync schedules diverge (Entra Monday, ServiceNow Friday), chain stability must be computable from partial graphs — the platform can't wait for "all entities present" to identify chains.

Implementation path: 3-phase rollout (chain hints on nodes → optional chain metadata block → platform assembly), fully backward compatible.

Problem Statement (Connector Perspective)

The CEO's Concern

"If at one point we find an automation with certain list of entities and connections, it has certain business logic and flow. Then some entities change, eg. oauth client id is updated in the chain, but logic is the same. From end user/CISO perspective — everything remains the same. So logically we should continue to show exactly same automation in the UI."

This is a cross-scan identity problem: how does the platform know that these two graphs represent the "same" automation?

Scan 1 (Monday):

BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:abc123 → SP:GraphApp

Scan 2 (Friday, OAuth client_id rotated):

BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:xyz789 → SP:GraphApp
                                                          ↑ new oauth_entity record

From the CISO perspective: same BR, same SI, same REST Message, same Azure SP, same functionality. The OAuth credential rotated (good security hygiene!) but the automation chain is unchanged.

From the platform's perspective (today): the OAuth entity sys_id changed, so the graph diff shows oauth_entity deleted + oauth_entity created. The BR→REST Message path looks like it was re-created.

Connector question: Should the connector emit metadata that says "this is the same chain despite entity changes"?
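
To make the identity problem concrete, here is a minimal sketch of the two scans above. It is illustrative only and not the platform's diff logic: the list-of-strings snapshot format, `naive_diff`, and `anchored_chain_id` are hypothetical, and the rule "ignore nodes with an `OAuth:` prefix" is a stand-in for whatever the team decides counts as a credential.

```python
import hashlib

# Hypothetical, simplified scan snapshots: node labels in chain order.
scan_monday = ["BR:auto-route", "SI:AzureGraphRouter", "REST:Graph-Router",
               "OAuth:abc123", "SP:GraphApp"]
scan_friday = ["BR:auto-route", "SI:AzureGraphRouter", "REST:Graph-Router",
               "OAuth:xyz789", "SP:GraphApp"]


def naive_diff(old, new):
    """Entity-level diff: roughly what an ID-based graph diff reports."""
    return {
        "deleted": sorted(set(old) - set(new)),
        "created": sorted(set(new) - set(old)),
    }


def anchored_chain_id(chain):
    """Illustrative chain ID that skips credential nodes (assumed rule)."""
    stable = [node for node in chain if not node.startswith("OAuth:")]
    return hashlib.sha256("|".join(stable).encode()).hexdigest()[:16]


diff = naive_diff(scan_monday, scan_friday)
same_chain = anchored_chain_id(scan_monday) == anchored_chain_id(scan_friday)
print(diff)        # one credential deleted, one created
print(same_chain)  # the credential-agnostic ID does not move
```

The entity diff reports a delete plus a create, while an ID that leaves the credential out of its input stays the same across both scans. The rest of this analysis is about where that second computation should live.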

1. Chain Discovery Across Connectors

Today's Architecture (Single Connector)

The Entra-ServiceNow connector is monolithic — it discovers both ServiceNow and Entra entities in a single sync run:

# servicenow_client.discover_all()
business_rules = client.get_business_rules()
script_includes = client.get_script_includes()
rest_messages = client.get_rest_messages()
oauth_entities = client.get_oauth_entities()

# entra_client.discover_service_principals()
service_principals = client.get_service_principals()
sign_ins = client.get_sign_ins()

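# correlator.correlate_execution_chains()
# Matches OAuth.client_id → SP.app_id IN MEMORY
sp_by_app_id = {sp['app_id']: sp for sp in service_principals}  # lookup index
chains = []
for oauth in oauth_entities:
    client_id = oauth['client_id']
    matched_sp = sp_by_app_id.get(client_id)
    if matched_sp:
        chains.append(ExecutionChain(
            rest_message=rm,  # REST Message that references this OAuth entity (lookup not shown)
            oauth_entity=oauth,
            azure_sp=matched_sp,
            ...
        ))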

Key insight: Chain assembly happens in the connector because all entities are available in the same process, at the same time.

Tomorrow's Architecture (Separate Connectors)

When we split into entra-connector and servicenow-connector, chain assembly becomes asynchronous:

Monday 9am — ServiceNow connector runs:

{
  "nodes": [
    {"nodeId": "sn-br-abc", "nodeType": "autonomous_identity", "identitySubtype": "business_rule"},
    {"nodeId": "sn-si-def", "nodeType": "autonomous_identity", "identitySubtype": "system_execution"},
    {"nodeId": "sn-restmsg-ghi", "nodeType": "resource", "resourceType": "rest_message"},
    {"nodeId": "sn-oauth-jkl", "nodeType": "autonomous_identity", "identitySubtype": "oauth_app", "properties": {"clientId": "abc-123"}}
  ],
  "edges": [
    {"edgeType": "CALLS", "sourceNodeId": "sn-br-abc", "targetNodeId": "sn-si-def"},
    {"edgeType": "EXECUTES_ON", "sourceNodeId": "sn-si-def", "targetNodeId": "sn-restmsg-ghi"},
    {"edgeType": "AUTHENTICATES_VIA", "sourceNodeId": "sn-restmsg-ghi", "targetNodeId": "sn-oauth-jkl"}
  ]
}

Tuesday 3pm — Entra connector runs:

{
  "nodes": [
    {"nodeId": "entra-sp-xyz", "nodeType": "autonomous_identity", "identitySubtype": "service_principal", "properties": {"appId": "abc-123"}}
  ],
  "edges": []
}

Platform ingestion (Tuesday 3pm): Now the platform must:

1. Match sn-oauth-jkl.clientId == entra-sp-xyz.appId ✓ (already works via AUTHENTICATES_TO edge creation)
2. Realize that BR→SI→REST→OAuth→SP is a chain
3. Assign that chain a stable ID that survives:
   - OAuth credential rotation
   - REST Message endpoint URL changes
   - BR script updates (logic unchanged)
   - SP credential rotation

Question for connector architecture: Who emits the chain ID? Connector or platform?

2. Chain Stability Across Scans

Option A: Connector Emits Chain Metadata

Approach: Add chainMetadata to NormalizedGraph:

interface NormalizedGraph {
  syncId: string;
  connectorId: string;
  tenantId: string;
  transformedAt: string;
  nodes: NormalizedNode[];
  edges: NormalizedEdge[];
  temporalMarkers: TemporalMarker[];
  evidenceCompleteness: EvidenceCompletenessReport;

  // NEW: Connector-computed chain definitions
  chainMetadata?: ChainDefinition[];
}

interface ChainDefinition {
  chainId: string;           // Stable ID (see options below)
  chainType: 'trigger_to_destination' | 'scheduled_task' | 'approval_workflow';
  displayName: string;       // Human-readable name
  anchorEntityId: string;    // Primary stable reference (e.g., BR source_id)

  // Member entities (by nodeId)
  entryPoints: string[];     // BR/Flow sys_ids that trigger this chain
  executors: string[];       // SI sys_ids in the execution path
  credentials: string[];     // OAuth entity sys_ids
  destinations: string[];    // SP/endpoint identities

  // Semantic fingerprint
  triggerPattern: {
    tables: string[];        // Tables that trigger this chain
    events: string[];        // insert/update/delete
  };
  egressPattern: {
    category: 'external' | 'cloud' | 'internal';
    destinations: string[]; // Base URLs or SP display names
  };

  // Provenance
  firstSeenAt: string;
  lastSeenAt: string;
  scanVersions: string[];    // syncIds where this chain was observed
}

Connector implementation (transformer.py):

# Assumes module-level imports: hashlib, and Optional from typing
def _build_chain_metadata(self, chain: ExecutionChain) -> Optional[dict]:
    """Emit chain-level metadata for platform chain tracking."""

    # Option 1: Anchor on entry point (BR/Flow)
    anchor_source_id = None
    if chain.business_rules:
        anchor_source_id = chain.business_rules[0].get('sys_id')
    elif chain.flows:
        anchor_source_id = chain.flows[0].get('sys_id')

    if not anchor_source_id:
        return None  # Can't build stable chain without entry point

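    # Chain ID = hash(entry_point + trigger_table + destination)
    # Note: the endpoint is part of the hash input, so a REST Message
    # endpoint URL change produces a new chain ID with this recipe.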
    chain_id_input = f"{anchor_source_id}:{chain.trigger_info.get('table', '')}:{chain.rest_message.get('endpoint', '')}"
    chain_id = hashlib.sha256(chain_id_input.encode()).hexdigest()[:16]

    return {
        "chainId": chain_id,
        "chainType": "trigger_to_destination",
        "displayName": chain.display_name,
        "anchorEntityId": anchor_source_id,
        "entryPoints": [br['sys_id'] for br in chain.business_rules],
        "executors": [si['sys_id'] for si in chain.script_includes],
        "credentials": [chain.oauth_entity.get('sys_id')] if chain.oauth_entity else [],
        "destinations": [chain.azure_sp.get('id')] if chain.azure_sp else [],
        "triggerPattern": {
            "tables": [chain.trigger_info.get('table')],
            "events": chain.trigger_info.get('events', [])
        },
        "egressPattern": {
            "category": classify_egress(chain.rest_message.get('endpoint')),
            "destinations": [chain.rest_message.get('endpoint')]
        },
        "firstSeenAt": self._current_sync_time,
        "lastSeenAt": self._current_sync_time,
        "scanVersions": [self.config.sync_id]
    }

Platform ingestion (chain merger):

// sv0-platform/src/ingestion/chain-tracker.ts
class ChainTracker {
  async mergeChainMetadata(
    tenantId: string,
    incomingChains: ChainDefinition[]
  ): Promise<void> {
    for (const newChain of incomingChains) {
      // Find existing chain by chainId
      const existing = await this.db.collection('automation_chains').findOne({
        tenant_id: tenantId,
        chain_id: newChain.chainId
      });

      if (existing) {
        // Chain stability check: did member entities change?
        const credentialChanged = !arraysEqual(
          existing.credentials,
          newChain.credentials
        );

        if (credentialChanged) {
          // Credential rotation — update chain, preserve chainId
          await this.db.collection('automation_chains').updateOne(
            { _id: existing._id },
            {
              $set: {
                credentials: newChain.credentials,
                lastSeenAt: newChain.lastSeenAt
              },
              $push: {
                scanVersions: newChain.scanVersions[0],
                credentialHistory: {
                  timestamp: newChain.lastSeenAt,
                  oldCredentials: existing.credentials,
                  newCredentials: newChain.credentials
                }
              }
            }
          );
        } else {
          // No changes — just update lastSeenAt
          await this.db.collection('automation_chains').updateOne(
            { _id: existing._id },
            {
              $set: { lastSeenAt: newChain.lastSeenAt },
              $push: { scanVersions: newChain.scanVersions[0] }
            }
          );
        }
      } else {
        // New chain — insert with history initialized
        await this.db.collection('automation_chains').insertOne({
          tenant_id: tenantId,
          chain_id: newChain.chainId,
          ...newChain,
          credentialHistory: []
        });
      }
    }
  }
}

Option A Pros/Cons

Pros:

Connector has full context — knows BR calls SI, knows trigger table, knows destination
Chain ID computed deterministically from semantic properties
Works with monolithic connector (Entra-ServiceNow) TODAY
Backward compatible — platform ignores chainMetadata if not present

Cons:

Breaks down with separate connectors — ServiceNow connector can't compute full chain without Azure SP
Connector must maintain chain ID stability logic (complex, error-prone)
What if two BRs share the same SI? Which BR is the anchor?
Requires connector to understand "what makes a chain the same" — this is business logic leaking into extraction layer
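
One possible answer to the shared-SI question is to treat every entry point as its own anchor, so two BRs that call the same SI yield two chains. The sketch below reuses the hash recipe from the transformer snippet above; the sample records, the `calls` field, and the endpoint URLs are hypothetical.

```python
import hashlib


def chain_id(anchor_sys_id: str, trigger_table: str, endpoint: str) -> str:
    """hash(anchor:table:endpoint), truncated to 16 hex characters."""
    raw = f"{anchor_sys_id}:{trigger_table}:{endpoint}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]


# Hypothetical data: two Business Rules calling one shared Script Include.
shared_si = "si-router"
business_rules = [
    {"sys_id": "br-incident", "table": "incident", "calls": shared_si},
    {"sys_id": "br-change", "table": "change_request", "calls": shared_si},
]
endpoint = "https://graph.microsoft.com/v1.0"

# One chain per entry point: the shared SI does not merge them.
chains = {
    br["sys_id"]: chain_id(br["sys_id"], br["table"], endpoint)
    for br in business_rules
}

# Same BR, same table, different endpoint URL: the ID changes.
moved = chain_id("br-incident", "incident", "https://graph.microsoft.com/beta")

print(chains)
print(moved != chains["br-incident"])
```

Anchoring per entry point removes the ambiguity, at the cost of showing the shared SI in two chains. The last line also shows that an ID built from the endpoint URL does not survive an endpoint change, which matters if that is one of the changes the ID is expected to survive.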

Option B: Platform Assembles Chains from Edges

Approach: Connectors emit nodes + edges only. Platform builds chains via graph traversal.

NormalizedGraph (no changes):

// Existing schema — nodes + edges only
interface NormalizedGraph {
  nodes: NormalizedNode[];
  edges: NormalizedEdge[];
  // No chainMetadata field
}

Platform chain assembler:

// sv0-platform/src/ingestion/chain-assembler.ts
class ChainAssembler {
  /**
   * Build automation chains from ingested graph via BFS traversal.
   *
   * Algorithm:
   * 1. Find all automation entry points (nodes with TRIGGERS_ON edges)
   * 2. For each entry point, BFS traverse:
   *    - Follow CALLS edges (BR → SI, SI → SI)
   *    - Follow EXECUTES_ON edges (SI → REST Message, BR → REST Message)
   *    - Follow AUTHENTICATES_VIA edges (REST Message → OAuth)
   *    - Follow AUTHENTICATES_TO edges (SP → OAuth)
   * 3. Each traversal produces a chain subgraph
   * 4. Compute chain ID from anchor + trigger + destination
   */
  async assembleChains(tenantId: string, syncId: string): Promise<void> {
    const entryPoints = await this.storage.queryNodes({
      tenant_id: tenantId,
      node_type: 'autonomous_identity',
      'properties.identitySubtype': {
        $in: ['business_rule', 'flow_designer_flow', 'scheduled_job']
      }
    });

    for (const entryNode of entryPoints) {
      const chain = await this._traverseChain(entryNode);
      if (!chain) continue;

      // Compute stable chain ID
      const chainId = this._computeChainId(chain);

      // Merge with existing chain
      await this._mergeChain(tenantId, chainId, chain, syncId);
    }
  }

  private async _traverseChain(
    entryNode: NormalizedNode
  ): Promise<ChainSubgraph | null> {
    const visited = new Set<string>();
    const subgraph: ChainSubgraph = {
      entryPoint: entryNode,
      executors: [],
      resources: [],
      credentials: [],
      crossSystemIdentities: [],
      edges: []
    };

    // BFS from entry point
    const queue = [entryNode];
    while (queue.length > 0) {
      const current = queue.shift()!;
      if (visited.has(current.nodeId)) continue;
      visited.add(current.nodeId);

      // Get outgoing edges
      const edges = await this.storage.getEdgesForNode(
        current.nodeId,
        'outgoing'
      );

      for (const edge of edges) {
        subgraph.edges.push(edge);
        const target = await this.storage.getNode(edge.targetNodeId);
        if (!target) continue;

        // Classify target and continue traversal
        if (edge.edgeType === 'CALLS') {
          // Code invocation: continue chain
          subgraph.executors.push(target);
          queue.push(target);
        } else if (edge.edgeType === 'EXECUTES_ON') {
          // Resource access: record it and keep walking so that a
          // REST Message's AUTHENTICATES_VIA edge is picked up
          if (target.nodeType === 'resource') {
            subgraph.resources.push(target);
            queue.push(target);
          }
        } else if (edge.edgeType === 'AUTHENTICATES_VIA') {
          // Credential (OAuth entity): record it, then look for the SP
          subgraph.credentials.push(target);
          // AUTHENTICATES_TO is stored as SP → OAuth, so from the OAuth
          // node it is an incoming edge. ('incoming' is assumed to be
          // supported by getEdgesForNode, mirroring 'outgoing'.)
          const inbound = await this.storage.getEdgesForNode(
            target.nodeId,
            'incoming'
          );
          for (const authTo of inbound) {
            if (authTo.edgeType !== 'AUTHENTICATES_TO') continue;
            const sp = await this.storage.getNode(authTo.sourceNodeId);
            if (!sp) continue;  // Entra has not synced yet
            subgraph.edges.push(authTo);
            subgraph.crossSystemIdentities.push(sp);
          }
        }
      }
    }

    return subgraph.credentials.length > 0 || subgraph.resources.length > 0
      ? subgraph
      : null;  // Incomplete chain: missing destination
  }

  private _computeChainId(chain: ChainSubgraph): string {
    // Stable anchor: entry point source_id (BR/Flow sys_id)
    const anchor = chain.entryPoint.sourceId;

    // Trigger pattern: tables from TRIGGERS_ON edges
    const triggerTables = chain.edges
      .filter(e => e.edgeType === 'TRIGGERS_ON')
      .map(e => e.targetNodeId)  // Table resource node IDs
      .sort()
      .join(',');

    // Destination pattern: REST Message endpoints only. SP appIds are
    // left out because the SP may arrive in a later sync, and the ID
    // must not change when an incomplete chain becomes complete.
    const destinations = chain.resources
      .filter(r => r.properties.resourceType === 'rest_message')
      .map(r => r.properties.endpoint_url as string)
      .filter(Boolean)
      .sort()
      .join(',');

    // Hash(anchor + triggers + destinations)
    const input = `${anchor}|${triggerTables}|${destinations}`;
    return crypto.createHash('sha256').update(input).digest('hex').slice(0, 16);
  }

  private async _mergeChain(
    tenantId: string,
    chainId: string,
    chain: ChainSubgraph,
    syncId: string
  ): Promise<void> {
    const existing = await this.db.collection('automation_chains').findOne({
      tenant_id: tenantId,
      chain_id: chainId
    });

    if (existing) {
      // Detect entity changes
      const credentialIds = chain.credentials.map(c => c.nodeId);
      const credentialChanged = !arraysEqual(
        existing.credential_node_ids,
        credentialIds
      );

      await this.db.collection('automation_chains').updateOne(
        { _id: existing._id },
        {
          $set: {
            entry_point_node_id: chain.entryPoint.nodeId,
            executor_node_ids: chain.executors.map(e => e.nodeId),
            resource_node_ids: chain.resources.map(r => r.nodeId),
            credential_node_ids: credentialIds,
            cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
            last_seen_at: new Date(),
            last_seen_sync_id: syncId
          },
          $push: {
            scan_versions: syncId,
            ...(credentialChanged && {
              change_history: {
                timestamp: new Date(),
                change_type: 'credential_rotation',
                old_credential_ids: existing.credential_node_ids,
                new_credential_ids: credentialIds
              }
            })
          }
        }
      );
    } else {
      // New chain
      await this.db.collection('automation_chains').insertOne({
        tenant_id: tenantId,
        chain_id: chainId,
        entry_point_node_id: chain.entryPoint.nodeId,
        executor_node_ids: chain.executors.map(e => e.nodeId),
        resource_node_ids: chain.resources.map(r => r.nodeId),
        credential_node_ids: chain.credentials.map(c => c.nodeId),
        cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
        first_seen_at: new Date(),
        last_seen_at: new Date(),
        last_seen_sync_id: syncId,
        scan_versions: [syncId],
        change_history: []
      });
    }
  }
}

interface ChainSubgraph {
  entryPoint: NormalizedNode;
  executors: NormalizedNode[];     // Script Includes
  resources: NormalizedNode[];     // REST Messages, tables
  credentials: NormalizedNode[];   // OAuth entities
  crossSystemIdentities: NormalizedNode[];  // Azure SPs
  edges: NormalizedEdge[];
}

Option B Pros/Cons

Pros:

Connector simplicity — connectors just emit nodes + edges, no chain logic
Works with separate connectors — platform assembles chains when both sides are present
Handles partial graphs — if ServiceNow syncs but Entra doesn't, platform still builds "incomplete" chain and waits for Entra data
Platform owns stability — chain ID computation in one place, easier to debug/evolve
No breaking changes — existing connectors work, chain assembly is additive

Cons:

Platform must traverse graphs (CPU cost, complexity)
Chain assembly happens after ingestion — can't validate chains at ingestion time
BFS traversal logic must be kept in sync with edge semantics
What if platform traversal logic has bugs? Connector can't verify chain was assembled correctly

Option C: Hybrid (Connector Hints, Platform Assembles)

Approach: Connectors emit lightweight chain hints in node properties, platform uses hints + edges to assemble chains.

Node properties extension:

interface AutonomousIdentityProperties {
  identitySubtype: 'business_rule' | 'flow_designer_flow' | ...;

  // NEW: Chain membership hints (optional, non-breaking)
  chainMembership?: {
    role: 'entry_point' | 'executor' | 'credential' | 'destination';
    anchorEntityId?: string;  // Entry point source_id if known
    chainSemanticHash?: string;  // Hash of trigger+destination pattern
  };
}

Connector emits hints (transformer.py):

def _process_execution_chain(self, chain: ExecutionChain) -> None:
    # Compute chain semantic hash
    trigger_table = chain.trigger_info.get('table', '')
    endpoint = chain.rest_message.get('endpoint', '')
    chain_hash = hashlib.sha256(f"{trigger_table}:{endpoint}".encode()).hexdigest()[:8]

    # Entry point (BR)
    for br in chain.business_rules:
        br_node = self._add_node(
            node_id=f"sn-br-{br['sys_id']}",
            properties={
                "identitySubtype": "business_rule",
                "chainMembership": {
                    "role": "entry_point",
                    "anchorEntityId": br['sys_id'],
                    "chainSemanticHash": chain_hash
                }
            }
        )

    # Executor (SI)
    for si in chain.script_includes:
        si_node = self._add_node(
            node_id=f"sn-si-{si['sys_id']}",
            properties={
                "identitySubtype": "system_execution",
                "chainMembership": {
                    "role": "executor",
                    "anchorEntityId": chain.business_rules[0]['sys_id'] if chain.business_rules else None,
                    "chainSemanticHash": chain_hash
                }
            }
        )

    # ... similar for credentials, destinations

Platform uses hints to optimize assembly:

class ChainAssembler {
  async assembleChains(tenantId: string, syncId: string): Promise<void> {
    // Fast path: group nodes by chainSemanticHash
    const nodesByChainHash = await this.storage.queryNodes({
      tenant_id: tenantId,
      'properties.chainMembership.chainSemanticHash': { $exists: true }
    }).then(nodes => groupBy(nodes, n =>
      n.properties.chainMembership?.chainSemanticHash
    ));

    for (const [chainHash, members] of Object.entries(nodesByChainHash)) {
      // Hint-based grouping: verify with edge traversal.
      // The hash covers trigger + endpoint only, so several entry points
      // can share it; each one anchors its own chain.
      const entryPoints = members.filter(
        m => m.properties.chainMembership?.role === 'entry_point'
      );

      for (const entryPoint of entryPoints) {
        // Traverse edges to confirm chain structure
        const chain = await this._traverseChain(entryPoint);
        if (!chain) continue;  // Hints present, but edges do not form a chain

        // Compute stable chain ID from anchor + pattern
        const anchor = entryPoint.properties.chainMembership?.anchorEntityId
          || entryPoint.sourceId;
        const chainId = crypto.createHash('sha256')
          .update(`${anchor}|${chainHash}`)
          .digest('hex')
          .slice(0, 16);

        await this._mergeChain(tenantId, chainId, chain, syncId);
      }
    }
  }
}

Option C Pros/Cons

Pros:

Connector provides hints (fast grouping), platform verifies via edges (correctness)
Backward compatible — nodes without hints still work, platform falls back to pure traversal
Partial connector support — ServiceNow connector can emit hints, Entra connector doesn't need to
Platform can detect hint-edge mismatches (data quality signal)

Cons:

More complex — two code paths (hint-based + edge-based)
Hints can go stale if connector logic changes but platform doesn't update
Duplication — chain pattern encoded in both hints and edge structure
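
The "hint-edge mismatch" signal listed under Pros, and the stale-hint risk listed under Cons, can both be checked by comparing the set of nodes that share the entry point's hash with the set actually reachable over edges. The sketch below assumes a simplified in-memory shape (a dict of node properties and a list of source/target pairs); `reachable` and `hint_mismatches` are hypothetical helpers, not platform APIs.

```python
def reachable(entry_id, edges):
    """Node IDs reachable from entry_id by following outgoing edges."""
    seen, stack = set(), [entry_id]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for src, dst in edges if src == node)
    return seen


def hint_mismatches(nodes, edges, entry_id):
    """Compare hint-based grouping with edge-based reachability."""
    chain_hash = nodes[entry_id].get("chainSemanticHash")
    hinted = {nid for nid, props in nodes.items()
              if props.get("chainSemanticHash") == chain_hash}
    walked = reachable(entry_id, edges)
    return {
        # Hint claims membership, but no edge path leads there.
        "hinted_but_unreachable": sorted(hinted - walked),
        # Edge path exists, but the node carries a different hash.
        "reachable_with_other_hash": sorted(
            nid for nid in walked - hinted
            if nodes.get(nid, {}).get("chainSemanticHash")
        ),
    }


nodes = {
    "sn-br-abc": {"chainSemanticHash": "h1"},
    "sn-si-def": {"chainSemanticHash": "h1"},
    "sn-restmsg-ghi": {},                         # resources carry no hint
    "sn-oauth-jkl": {"chainSemanticHash": "h0"},  # stale hint
    "sn-si-orphan": {"chainSemanticHash": "h1"},  # hinted, no edge
}
edges = [("sn-br-abc", "sn-si-def"),
         ("sn-si-def", "sn-restmsg-ghi"),
         ("sn-restmsg-ghi", "sn-oauth-jkl")]

result = hint_mismatches(nodes, edges, "sn-br-abc")
print(result)
```

Nodes without any hint are deliberately not flagged, since unhinted nodes are expected during the backward-compatible rollout.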

3. Connector vs Platform Responsibility

Separation of Concerns

Concern                  Connector          Platform              Rationale
Entity discovery         ✓                  -                     Connector knows source API structure
Relationship resolution  ✓                  -                     Connector has its own system's data in memory
Cross-system matching    ✓ (monolithic)     ✓ (split connectors)  OAuth.client_id → SP.app_id needs data from both systems; only the monolithic connector holds both in one process
Chain identification     Hints only         ✓                     Platform sees all connectors, handles partial graphs
Chain stability          -                  ✓                     Platform has temporal view, connector sees one snapshot
Chain history            -                  ✓                     Platform stores baselines, connector is stateless

Recommended boundary:

Connector: Discover entities → Resolve edges → Emit NormalizedGraph + optional chain hints
Platform: Ingest graph → Assemble chains → Track stability → Expose via API

Why Platform Should Own Chain Assembly

Reason 1: Partial graph handling

When ServiceNow and Entra connectors run at different times:

Monday: ServiceNow sync → BR + SI + REST + OAuth
         Platform assembles: "Incomplete chain (no Azure SP yet)"

Tuesday: Entra sync → SP
         Platform assembles: "Complete chain (OAuth → SP link now present)"
         Platform merges: Same chain_id, now complete

Connector can't handle this — it only sees its own data.
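
The Monday/Tuesday sequence above can be sketched against a small in-memory graph. This is a simplified illustration, not the platform assembler: `assemble` is a hypothetical function, edges are plain (source, type, target) tuples, and the chain ID is derived from the entry point alone so that it cannot depend on whether the SP has arrived.

```python
import hashlib
from collections import deque


def assemble(entry_id, nodes, edges):
    """BFS over outgoing edges from the entry point; returns a chain summary."""
    seen, queue = set(), deque([entry_id])
    while queue:
        current = queue.popleft()
        if current in seen or current not in nodes:
            continue
        seen.add(current)
        queue.extend(dst for src, _etype, dst in edges if src == current)
    # AUTHENTICATES_TO runs SP -> OAuth, so SPs are found via incoming edges.
    sps = sorted(src for src, etype, dst in edges
                 if etype == "AUTHENTICATES_TO" and dst in seen and src in nodes)
    # Simplifying assumption: the ID depends only on the entry point.
    chain_id = hashlib.sha256(entry_id.encode()).hexdigest()[:16]
    return {"chain_id": chain_id, "members": sorted(seen),
            "service_principals": sps, "complete": bool(sps)}


# Monday: ServiceNow sync only.
nodes = {"sn-br-abc": "business_rule", "sn-si-def": "system_execution",
         "sn-restmsg-ghi": "rest_message", "sn-oauth-jkl": "oauth_app"}
edges = [("sn-br-abc", "CALLS", "sn-si-def"),
         ("sn-si-def", "EXECUTES_ON", "sn-restmsg-ghi"),
         ("sn-restmsg-ghi", "AUTHENTICATES_VIA", "sn-oauth-jkl")]
monday = assemble("sn-br-abc", nodes, edges)

# Tuesday: Entra sync adds the SP, stitching adds the cross-system edge.
nodes["entra-sp-xyz"] = "service_principal"
edges.append(("entra-sp-xyz", "AUTHENTICATES_TO", "sn-oauth-jkl"))
tuesday = assemble("sn-br-abc", nodes, edges)

print(monday["complete"], tuesday["complete"])
print(monday["chain_id"] == tuesday["chain_id"])
```

The chain is assembled as incomplete on Monday and completed on Tuesday under the same ID. That only holds because nothing from the Entra side feeds the ID, which is a design constraint on whatever inputs the real chain ID uses.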

Reason 2: Multi-connector chains

Future: GitHub Actions → AWS Lambda → Slack

github-connector:  GH Action → GH OIDC Token
aws-connector:     Lambda Function → IAM Role
slack-connector:   Slack Bot Token

No single connector can emit the full chain. Platform must stitch.
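
A rough sketch of what rule-driven stitching could look like is shown below. The only rule included is the OAuth.clientId to SP.appId match already described in this document; rules for GitHub, AWS, or Slack would be additional entries and are not attempted here. `STITCH_RULES` and `stitch` are hypothetical names, and the upper-case appId is sample data chosen to show case-insensitive matching.

```python
# (source subtype, source property, target subtype, target property, edge type)
STITCH_RULES = [
    ("service_principal", "appId", "oauth_app", "clientId", "AUTHENTICATES_TO"),
]


def stitch(nodes, rules=STITCH_RULES):
    """Create cross-connector edges where a rule's key values match."""
    edges = []
    for src_subtype, src_prop, dst_subtype, dst_prop, edge_type in rules:
        # Index target nodes by their normalized key value.
        targets = {}
        for node in nodes:
            props = node["properties"]
            value = props.get(dst_prop)
            if props.get("identitySubtype") == dst_subtype and value:
                targets.setdefault(value.lower(), []).append(node["nodeId"])
        # Match source nodes against the index.
        for node in nodes:
            props = node["properties"]
            value = props.get(src_prop)
            if props.get("identitySubtype") == src_subtype and value:
                for target_id in targets.get(value.lower(), []):
                    edges.append((edge_type, node["nodeId"], target_id))
    return edges


nodes = [
    {"nodeId": "sn-br-abc",
     "properties": {"identitySubtype": "business_rule"}},
    {"nodeId": "sn-oauth-jkl",
     "properties": {"identitySubtype": "oauth_app", "clientId": "abc-123-xyz"}},
    {"nodeId": "entra-sp-xyz",
     "properties": {"identitySubtype": "service_principal", "appId": "ABC-123-XYZ"}},
]

stitched = stitch(nodes)
print(stitched)
```

Keeping the match keys in a rule table means a new connector pair adds data rather than a new code path, though whether every cross-system link reduces to a single equality match is an open question.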

Reason 3: Chain evolution

Scan 1: BR → SI → REST → OAuth-v1 → SP
Scan 2: BR → SI → REST → OAuth-v2 → SP  (credential rotated)
Scan 3: BR → SI-updated → REST → OAuth-v2 → SP  (SI script changed)

Platform must decide: are these the same chain? Connector sees one scan only.
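
The three scans above can be run through a small classifier to show the kind of decision the platform has to make. The snapshot fields (`entry_point`, `executor_hashes`, `credentials`, `destination`) and the labels returned are hypothetical, and whether a "logic_changed" result keeps the same chain is a policy choice this sketch does not settle.

```python
def classify_change(previous: dict, current: dict) -> str:
    """Compare two chain snapshots and name the most significant difference."""
    if previous["entry_point"] != current["entry_point"]:
        return "different_chain"
    if previous["destination"] != current["destination"]:
        return "destination_changed"
    if previous["executor_hashes"] != current["executor_hashes"]:
        return "logic_changed"
    if previous["credentials"] != current["credentials"]:
        return "credential_rotation"
    return "unchanged"


# Scans 1-3 from the listing above; executor_hashes stands in for a
# hash of each Script Include's script body (assumed to be available).
scan1 = {"entry_point": "BR", "executor_hashes": ["si-v1"],
         "credentials": ["OAuth-v1"], "destination": "SP"}
scan2 = {"entry_point": "BR", "executor_hashes": ["si-v1"],
         "credentials": ["OAuth-v2"], "destination": "SP"}
scan3 = {"entry_point": "BR", "executor_hashes": ["si-v2"],
         "credentials": ["OAuth-v2"], "destination": "SP"}

print(classify_change(scan1, scan2))
print(classify_change(scan2, scan3))
```

The comparison needs two snapshots side by side, which is why it belongs on the platform: a stateless connector only ever holds the current one.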

4. Cross-Connector Chain Assembly

Scenario: Separate Entra + ServiceNow Connectors

ServiceNow connector emits:

{
  "connectorId": "servicenow_v1",
  "nodes": [
    {
      "nodeId": "sn-br-abc",
      "nodeType": "autonomous_identity",
      "properties": {
        "identitySubtype": "business_rule",
        "chainMembership": {
          "role": "entry_point",
          "anchorEntityId": "abc",
          "chainSemanticHash": "tr-incident-ep-graph"
        }
      }
    },
    {
      "nodeId": "sn-oauth-jkl",
      "nodeType": "autonomous_identity",
      "properties": {
        "identitySubtype": "oauth_app",
        "clientId": "abc-123-xyz",
        "chainMembership": {
          "role": "credential",
          "anchorEntityId": "abc",
          "chainSemanticHash": "tr-incident-ep-graph"
        }
      }
    }
  ],
  "edges": [
    {
      "edgeType": "AUTHENTICATES_VIA",
      "sourceNodeId": "sn-restmsg-ghi",
      "targetNodeId": "sn-oauth-jkl"
    }
  ]
}

Entra connector emits:

{
  "connectorId": "entra_id_v1",
  "nodes": [
    {
      "nodeId": "entra-sp-xyz",
      "nodeType": "autonomous_identity",
      "properties": {
        "identitySubtype": "service_principal",
        "appId": "abc-123-xyz"
      }
    }
  ],
  "edges": []
}

Platform stitching:

// After Entra sync completes
class CrossSystemStitcher {
  async stitchAuthenticationEdges(tenantId: string): Promise<void> {
    // Find OAuth entities with clientId
    const oauthNodes = await this.storage.queryNodes({
      tenant_id: tenantId,
      node_type: 'autonomous_identity',
      'properties.identitySubtype': 'oauth_app',
      'properties.clientId': { $exists: true }
    });

    // Find SPs with matching appId
    const spNodes = await this.storage.queryNodes({
      tenant_id: tenantId,
      node_type: 'autonomous_identity',
      'properties.identitySubtype': 'service_principal'
    });

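    const spByAppId = new Map(
      spNodes
        // Skip SPs without an appId; the query above does not require one
        .filter(sp => typeof sp.properties.appId === 'string')
        .map(sp => [
          (sp.properties.appId as string).toLowerCase(),
          sp
        ])
    );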

    for (const oauthNode of oauthNodes) {
      const clientId = (oauthNode.properties.clientId as string).toLowerCase();
      const matchedSp = spByAppId.get(clientId);

      if (matchedSp) {
        // Create AUTHENTICATES_TO edge (SP → OAuth)
        await this.storage.upsertEdge({
          edgeId: `AUTHENTICATES_TO:${matchedSp.nodeId}:${oauthNode.nodeId}`,
          edgeType: 'AUTHENTICATES_TO',
          sourceNodeId: matchedSp.nodeId,
          targetNodeId: oauthNode.nodeId,
          properties: {
            evidenceReferences: {
              matchingField: 'client_id',
              matchingValue: clientId,
              issuingSystemId: matchedSp.properties.appId,
              targetSystemId: clientId
            }
          }
        });

        // Copy chain hints from OAuth to SP
        if (oauthNode.properties.chainMembership) {
          await this.storage.updateNode(matchedSp.nodeId, {
            'properties.chainMembership': {
              role: 'destination',
              anchorEntityId: oauthNode.properties.chainMembership.anchorEntityId,
              chainSemanticHash: oauthNode.properties.chainMembership.chainSemanticHash
            }
          });
        }
      }
    }
  }
}

Result: Platform creates cross-system edges after both connectors run, then reassembles chains to include the new SP nodes.

5. NormalizedGraph Schema Changes

Backward Compatibility Strategy

Phase 1: Additive properties (non-breaking)

Add optional chainMembership to node properties:

// types.ts
interface AutonomousIdentityProperties {
  identitySubtype: string;
  // ... existing properties

  // NEW (optional, backward compatible)
  chainMembership?: {
    role: 'entry_point' | 'executor' | 'credential' | 'destination';
    anchorEntityId?: string;
    chainSemanticHash?: string;
  };
}

Phase 2: Chain metadata block (optional)

Add top-level chainMetadata array:

interface NormalizedGraph {
  // ... existing fields

  // NEW (optional, backward compatible)
  chainMetadata?: ChainDefinition[];
}

Phase 3: Platform assembly (future)

Once chain assembler is stable, deprecate connector-emitted chainMetadata:

// Deprecation notice in docs:
// @deprecated Connectors should not emit chainMetadata.
// The platform assembles chains from nodes + edges.
// This field is ignored as of sv0-platform v2.0.
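
As an illustration of why all three phases stay backward compatible, a reader that treats both new fields as optional handles graphs from old and new connectors alike. This is a sketch over plain dicts; `read_chain_inputs` and its return shape are hypothetical.

```python
def read_chain_inputs(graph: dict) -> dict:
    """Extract chain-related inputs, tolerating graphs from older connectors."""
    hints = {}
    for node in graph.get("nodes", []):
        membership = node.get("properties", {}).get("chainMembership")
        if membership:
            hints[node["nodeId"]] = membership
    return {
        "hints": hints,                                    # Phase 1, optional
        "chain_metadata": graph.get("chainMetadata", []),  # Phase 2, optional
        "node_count": len(graph.get("nodes", [])),
    }


# Graph from a connector that predates chain hints.
legacy_graph = {
    "nodes": [{"nodeId": "sn-br-abc",
               "properties": {"identitySubtype": "business_rule"}}],
    "edges": [],
}

# Graph from a connector that emits Phase 1 hints.
hinted_graph = {
    "nodes": [{"nodeId": "sn-br-abc",
               "properties": {"identitySubtype": "business_rule",
                              "chainMembership": {"role": "entry_point",
                                                  "anchorEntityId": "abc"}}}],
    "edges": [],
}

legacy = read_chain_inputs(legacy_graph)
hinted = read_chain_inputs(hinted_graph)
print(legacy)
print(hinted["hints"])
```

In Phase 3 the `chain_metadata` value would simply be ignored by the assembler, so connectors that still emit it keep working.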

Validation Strategy

Connector-side validation:

# In transformer._enrich_automation_properties()
if node.get('properties', {}).get('chainMembership'):
    # Validate role is one of allowed values
    role = node['properties']['chainMembership'].get('role')
    if role not in ('entry_point', 'executor', 'credential', 'destination'):
        logger.warning(f"Invalid chainMembership role: {role}")
        del node['properties']['chainMembership']

Platform-side validation:

// In ingestion/validator.ts
function validateChainMembership(node: NormalizedNode): ValidationResult {
  const membership = node.properties.chainMembership;
  if (!membership) return { valid: true };

  // Check role consistency
  const expectedRoles: Record<string, string[]> = {
    'business_rule': ['entry_point'],
    'flow_designer_flow': ['entry_point'],
    'scheduled_job': ['entry_point'],  // treated as an entry point by the chain assembler
    'system_execution': ['executor'],
    'oauth_app': ['credential'],
    'service_principal': ['destination']
  };

  const subtype = node.properties.identitySubtype as string;
  const allowedRoles = expectedRoles[subtype] || [];

  if (!allowedRoles.includes(membership.role)) {
    return {
      valid: false,
      error: `Invalid chainMembership.role "${membership.role}" for subtype "${subtype}"`
    };
  }

  return { valid: true };
}

6. Incremental Sync & Chain Updates

Problem: Audit Log Syncs Don't Re-Send Full Chains

When running in syncMode: 'audit_log':

// Entra audit log sync
{
  "auditRecords": [
    {
      "operation": "Remove service principal credentials",
      "targetResources": [
        {"id": "sp-xyz", "type": "servicePrincipal"}
      ],
      "modifiedProperties": [
        {"name": "KeyCredential", "oldValue": "cert-abc", "newValue": null}
      ]
    }
  ]
}

The connector emits:

- GraphEvent with eventType: 'credential_deleted'
- Updated NormalizedNode for the SP (minus the deleted credential)

But it does not re-send the full chain (BR → SI → REST → OAuth → SP).

Platform challenge: How does the chain tracker know to update the chain's credential list?

Solution: Event-Driven Chain Updates

class ChainTracker {
  async processGraphEvent(event: GraphEvent): Promise<void> {
    // Find chains containing this entity
    const chains = await this.db.collection('automation_chains').find({
      tenant_id: event.tenantId,
      $or: [
        { entry_point_node_id: event.entityId },
        { executor_node_ids: event.entityId },
        { credential_node_ids: event.entityId },
        { cross_system_identity_node_ids: event.entityId }
      ]
    }).toArray();

    for (const chain of chains) {
      if (event.eventType === 'credential_deleted') {
        // Remove deleted credential from chain
        await this.db.collection('automation_chains').updateOne(
          { _id: chain._id },
          {
            $pull: { credential_node_ids: event.entityId },
            $push: {
              change_history: {
                timestamp: new Date(),
                change_type: 'credential_deleted',
                entity_id: event.entityId,
                sync_id: event.syncId
              }
            }
          }
        );
      } else if (event.eventType === 'status_changed') {
        // Mark chain as potentially inactive
        await this.db.collection('automation_chains').updateOne(
          { _id: chain._id },
          {
            $set: { requires_revalidation: true },
            $push: {
              change_history: {
                timestamp: new Date(),
                change_type: 'member_status_changed',
                entity_id: event.entityId,
                sync_id: event.syncId
              }
            }
          }
        );
      }
    }
  }
}

Revalidation strategy:

// Background job: daily chain revalidation
async function revalidateChains(tenantId: string): Promise<void> {
  const staleChains = await db.collection('automation_chains').find({
    tenant_id: tenantId,
    requires_revalidation: true
  }).toArray();

  for (const chain of staleChains) {
    // Re-traverse chain from entry point
    const entryNode = await storage.getNode(chain.entry_point_node_id);
    if (!entryNode || entryNode.status === 'deleted') {
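      // Entry point gone: mark chain as deleted and clear the revalidation flag
      await db.collection('automation_chains').updateOne(
        { _id: chain._id },
        { $set: { status: 'deleted', deleted_at: new Date(), requires_revalidation: false } }
      );
      continue;
    }

    // Re-assemble chain
    const freshChain = await chainAssembler._traverseChain(entryNode);
    await chainAssembler._mergeChain(tenantId, chain.chain_id, freshChain);

    // Clear the flag so the chain is not re-processed on every run
    // (assumes _mergeChain does not already reset it)
    await db.collection('automation_chains').updateOne(
      { _id: chain._id },
      { $set: { requires_revalidation: false } }
    );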
  }
}

7. Multi-Platform Chains (Future)

Scenario: GitHub Actions → AWS Lambda → Slack

GitHub connector:

{
  "nodes": [
    {
      "nodeId": "gh-workflow-abc",
      "nodeType": "autonomous_identity",
      "properties": {
        "identitySubtype": "github_workflow",
        "chainMembership": {
          "role": "entry_point",
          "anchorEntityId": "workflow-abc",
          "chainSemanticHash": "gh-issue-aws-notify"
        }
      }
    },
    {
      "nodeId": "gh-oidc-token",
      "nodeType": "credential",
      "properties": {
        "credentialSubtype": "oidc_token",
        "audience": "sts.amazonaws.com"
      }
    }
  ],
  "edges": [
    {
      "edgeType": "AUTHENTICATES_VIA",
      "sourceNodeId": "gh-workflow-abc",
      "targetNodeId": "gh-oidc-token"
    }
  ]
}

AWS connector:

{
  "nodes": [
    {
      "nodeId": "aws-role-xyz",
      "nodeType": "role",
      "properties": {
        "roleName": "GitHubActionsRole",
        "trustPolicy": {
          "principal": "token.actions.githubusercontent.com",
          "condition": { "StringEquals": { "aud": "sts.amazonaws.com" } }
        }
      }
    },
    {
      "nodeId": "aws-lambda-notify",
      "nodeType": "autonomous_identity",
      "properties": {
        "identitySubtype": "lambda_function"
      }
    }
  ],
  "edges": [
    {
      "edgeType": "HAS_ROLE",
      "sourceNodeId": "aws-lambda-notify",
      "targetNodeId": "aws-role-xyz"
    }
  ]
}

Slack connector:

{
  "nodes": [
    {
      "nodeId": "slack-bot-token",
      "nodeType": "credential",
      "properties": {
        "credentialSubtype": "api_key"
      }
    }
  ]
}

Platform stitching:

// Cross-platform chain assembly
class CrossPlatformStitcher {
  async stitchOIDCChains(tenantId: string): Promise<void> {
    // Find OIDC tokens with AWS audience
    const oidcTokens = await this.storage.queryNodes({
      tenant_id: tenantId,
      node_type: 'credential',
      'properties.credentialSubtype': 'oidc_token',
      'properties.audience': 'sts.amazonaws.com'
    });

    // Find AWS roles with OIDC trust policies
    const awsRoles = await this.storage.queryNodes({
      tenant_id: tenantId,
      node_type: 'role',
      'properties.trustPolicy.principal': { $regex: /github|gitlab/ }
    });

    for (const token of oidcTokens) {
      for (const role of awsRoles) {
        // Match based on trust policy conditions
        if (this._oidcTrustMatches(token, role)) {
          await this.storage.upsertEdge({
            edgeId: `DELEGATES_TO:${token.nodeId}:${role.nodeId}`,
            edgeType: 'DELEGATES_TO',
            sourceNodeId: token.nodeId,
            targetNodeId: role.nodeId,
            properties: {
              delegationType: 'oidc_federation',
              evidenceReferences: {
                trustPolicy: role.properties.trustPolicy
              }
            }
          });
        }
      }
    }
  }
}

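Result: The platform adds the DELEGATES_TO edge, connecting GH Workflow → OIDC Token -[DELEGATES_TO]-> AWS Role. The Lambda attaches to that role through its own HAS_ROLE edge (Lambda → Role), so chain traversal must follow HAS_ROLE in reverse to reach it. The Slack credential is emitted without edges and is not linked by this stitcher; that leg would need its own correlation rule.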

Connector implication: Each connector only knows its own platform. Platform must have cross-platform correlation logic.
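
The stitcher above calls _oidcTrustMatches without defining it. One possible shape for that check is sketched below, using only the properties present in the GitHub and AWS examples. The function name, the type names, and the expectedIssuer parameter are assumptions; the token node in the example carries no issuer, so the sketch takes it as an argument.

```typescript
// Minimal node shapes, limited to the properties used in the examples above
interface TrustPolicy {
  principal: string;
  condition?: { StringEquals?: Record<string, string> };
}

interface OidcTokenNode {
  nodeId: string;
  properties: { credentialSubtype: string; audience: string };
}

interface RoleNode {
  nodeId: string;
  properties: { roleName?: string; trustPolicy?: TrustPolicy };
}

// Hypothetical stand-in for _oidcTrustMatches: true when the role trusts the
// expected OIDC issuer and requires exactly the audience the token carries.
function oidcTrustMatches(
  token: OidcTokenNode,
  role: RoleNode,
  expectedIssuer: string = 'token.actions.githubusercontent.com'
): boolean {
  const policy = role.properties.trustPolicy;
  if (!policy) return false;

  // The role must trust the issuer that minted the token
  if (policy.principal !== expectedIssuer) return false;

  // Without an explicit audience condition, report no match rather than guess
  const requiredAudience = policy.condition?.StringEquals?.aud;
  if (requiredAudience === undefined) return false;

  return requiredAudience === token.properties.audience;
}
```

Matching on issuer and audience alone would link every GitHub OIDC token in a tenant to every role that trusts GitHub. A stricter version would probably also compare the token's subject claim (repository and ref) against the trust policy, which requires connectors to emit that data.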

8. Practical Recommendation

Three-Phase Implementation

Phase 1: Node-Level Chain Hints (2-3 days)

Connector work:

Add chainMembership properties to automation nodes
Emit anchorEntityId and chainSemanticHash in transformer

Platform work:

Extend NormalizedNode schema with optional chainMembership property
No behavioral changes — hints are informational only

Result: Connectors emit chain context; the platform validates and accepts it but does not yet act on it (forward compatibility).

Phase 2: Platform Chain Assembler (1-2 weeks)

Platform work:

Implement ChainAssembler class (BFS traversal from entry points)
Implement automation_chains collection with chain tracking
Implement chain stability logic (chain_id computation, credential rotation detection)
Add /api/v1/tenants/:id/automation-chains endpoint

Connector work:

None — existing nodes + edges are sufficient

Result: Platform can show "Automation Chains" view with stable IDs across scans.
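
The BFS traversal at the heart of the ChainAssembler could look roughly like the sketch below. It is an in-memory illustration under stated assumptions, not the planned implementation: the traverseChain name, the Edge shape (mirroring the edge fields in the connector examples), and the maxDepth guard are hypothetical, and the real assembler would read edges from storage.

```typescript
// Edge shape mirrors the edgeType/sourceNodeId/targetNodeId fields used in
// the connector payloads above.
interface Edge {
  edgeType: string;
  sourceNodeId: string;
  targetNodeId: string;
}

// Breadth-first walk from an entry point, following edges source -> target.
// Returns node IDs in visit order. Nodes more than maxDepth hops from the
// entry point are not included.
function traverseChain(entryNodeId: string, edges: Edge[], maxDepth: number = 10): string[] {
  // Build an adjacency list once so each lookup is O(1)
  const outgoing = new Map<string, string[]>();
  for (const edge of edges) {
    const targets = outgoing.get(edge.sourceNodeId) ?? [];
    targets.push(edge.targetNodeId);
    outgoing.set(edge.sourceNodeId, targets);
  }

  const visited = new Set<string>([entryNodeId]);
  const order: string[] = [];
  let frontier: string[] = [entryNodeId];

  for (let depth = 0; frontier.length > 0 && depth <= maxDepth; depth++) {
    const next: string[] = [];
    for (const nodeId of frontier) {
      order.push(nodeId);
      for (const target of outgoing.get(nodeId) ?? []) {
        // The visited set keeps cycles from looping forever
        if (!visited.has(target)) {
          visited.add(target);
          next.push(target);
        }
      }
    }
    frontier = next;
  }

  return order;
}
```

A node with no outgoing edges simply ends its branch, so the same walk works on a partial graph when one connector has not synced yet; the chain is shorter until the missing edges arrive.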

Phase 3: Incremental Chain Updates (1 week)

Platform work:

Implement event-driven chain updates (listen to GraphEvent stream)
Implement chain revalidation background job
Add change_history array to chains for audit trail

Result: Chains update incrementally without full re-sync.

Total Effort Estimate

Phase	Component	Effort	Risk
Phase 1	Connector hints	2-3 days	Low
Phase 1	Platform schema	2 hours	None
Phase 2	Chain assembler	5 days	Medium (graph traversal complexity)
Phase 2	Chain tracker	3 days	Low
Phase 2	API endpoints	2 days	Low
Phase 3	Event-driven updates	3 days	Medium (incremental correctness)
Phase 3	Revalidation job	2 days	Low
Total		~3-4 weeks	Medium

Why Not Separate Collection?

Reason 1: Chains are derived, not source entities

Chains are computed from nodes + edges. They're not discovered by connectors — they're assembled by the platform.

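Analogy: A "blast radius" query doesn't create a blast_radius collection; it's a traversal result. The same holds for chains: the automation_chains collection only tracks identity and membership over time, while the chain itself remains a traversal of the graph.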

Reason 2: Storage doesn't solve the identity problem

The question "is this the same chain?" isn't answered by having a separate collection. It's answered by having a stable chain ID algorithm.
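
To make "stable chain ID algorithm" concrete, here is one way such an ID could be derived. This is an illustrative sketch only, and the algorithm under architect review may use different inputs. It assumes the ID is computed from the tenant plus the anchorEntityId and chainSemanticHash hints that connectors emit; computeChainId and ChainAnchor are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// Inputs are the chain hints from the connector examples plus the tenant.
// Credential identifiers (for example an OAuth client_id) are left out on
// purpose, so rotating a credential does not change the chain ID.
interface ChainAnchor {
  tenantId: string;
  anchorEntityId: string;
  chainSemanticHash: string;
}

// Hypothetical helper: deterministic ID from the anchor fields
function computeChainId(anchor: ChainAnchor): string {
  // JSON array encoding keeps field boundaries unambiguous
  const canonical = JSON.stringify([
    anchor.tenantId,
    anchor.anchorEntityId,
    anchor.chainSemanticHash
  ]);
  const digest = createHash('sha256').update(canonical).digest('hex');
  // Truncated for readability; the length is an arbitrary choice here
  return 'chain_' + digest.slice(0, 16);
}
```

Two scans that emit the same anchor and semantic hash resolve to the same ID regardless of which credentials appear in the chain. Where the ID is stored does not affect that property.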

Reason 3: Chains change faster than baselines

Chains can change multiple times per day (credential rotation, script updates). Storing them as versioned baselines bloats the database.

Better: Store chains as {chain_id, member_node_ids[], change_history[]} — lightweight tracking, heavy lifting is in the graph.
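
A lightweight tracking record along those lines might be typed as follows. The field names are the ones the ChainTracker and revalidation code in section 6 already query and update; the interface names and the memberNodeIds helper are assumptions, and the helper shows how the member_node_ids[] shorthand above could be derived from the per-role arrays rather than stored separately.

```typescript
interface ChainChange {
  timestamp: Date;
  change_type: string;
  entity_id: string;
  sync_id?: string;
}

// Hypothetical record shape for the automation_chains collection
interface AutomationChainRecord {
  chain_id: string;
  tenant_id: string;
  status?: string;
  entry_point_node_id: string;
  executor_node_ids: string[];
  credential_node_ids: string[];
  cross_system_identity_node_ids: string[];
  requires_revalidation: boolean;
  change_history: ChainChange[];
}

// Flattens the per-role arrays into one de-duplicated member list,
// entry point first.
function memberNodeIds(chain: AutomationChainRecord): string[] {
  const all = [
    chain.entry_point_node_id,
    ...chain.executor_node_ids,
    ...chain.credential_node_ids,
    ...chain.cross_system_identity_node_ids
  ];
  // A Set preserves insertion order, so the result stays deterministic
  return Array.from(new Set(all));
}
```

The record holds only node IDs and a change log. Node properties stay in the graph, so a credential rotation costs one small array update and one history entry.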

Files Produced

/Users/lucky/dev/securityv0/sv0-documentation/docs/analysis/2026-02-12-automation-classification/03-automation-persistence-integrator.md (this document)

Next Steps

Architect review — Is the chain ID stability algorithm correct?
Developer review — Does the platform schema design make sense?
CEO decision — Phase 1 only? Or commit to full 3-phase implementation?
If approved: Create ADR for chain tracking architecture
