
Automation Persistence — Integrator Analysis

Date: 2026-02-13
Author: Integrator (connector architecture, data pipeline, cross-system correlation)
Status: Draft for team review
Related: CEO synthesis, Execution Flow Analysis


Executive Summary

The question of whether automations need a separate collection is fundamentally a connector architecture and data pipeline question, not primarily a UI or database question.

Core finding: The problem is real, but a separate automations collection is the wrong solution. The right solution is:

  1. Stable chain anchors in NormalizedGraph: connectors emit optional chain hints (anchorEntityId, chainSemanticHash) on automation nodes
  2. Platform-side chain assembly: the ingestion pipeline builds chain entities from nodes + edges and tracks them over time
  3. Backward-compatible schema evolution: extend NormalizedGraph, don't break it

Why this matters: When connector sync schedules diverge (Entra Monday, ServiceNow Friday), chain stability must be computable from partial graphs — the platform can't wait for "all entities present" to identify chains.

Implementation path: 3-phase rollout (node-level chain hints → platform chain assembler → incremental chain updates), fully backward compatible.


Problem Statement (Connector Perspective)

The CEO's Concern

"If at one point we find an automation with certain list of entities and connections, it has certain business logic and flow. Then some entities change, eg. oauth client id is updated in the chain, but logic is the same. From end user/CISO perspective — everything remains the same. So logically we should continue to show exactly same automation in the UI."

This is a cross-scan identity problem: how does the platform know that these two graphs represent the "same" automation?

Scan 1 (Monday):

BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:abc123 → SP:GraphApp

Scan 2 (Friday, OAuth client_id rotated):

BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:xyz789 → SP:GraphApp
(OAuth:xyz789 is a new oauth_entity record)

From the CISO perspective: same BR, same SI, same REST Message, same Azure SP, same functionality. The OAuth credential rotated (good security hygiene!) but the automation chain is unchanged.

From the platform's perspective (today): the OAuth entity sys_id changed, so the graph diff shows oauth_entity deleted + oauth_entity created. The BR→REST Message path looks like it was re-created.
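To make the cross-scan identity problem concrete, here is a minimal Python sketch that diffs the two scans above by node ID, the way a node-level graph diff would. The labels come from the Monday/Friday example; `diff_scans` is a hypothetical helper for illustration, not an existing platform function.

```python
def diff_scans(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Naive node-level diff: any ID not present in both scans counts as a change."""
    prev_set, curr_set = set(previous), set(current)
    return {
        "deleted": sorted(prev_set - curr_set),
        "created": sorted(curr_set - prev_set),
        "unchanged": sorted(prev_set & curr_set),
    }


scan_monday = ["BR:auto-route", "SI:AzureGraphRouter", "REST:Graph-Router", "OAuth:abc123", "SP:GraphApp"]
scan_friday = ["BR:auto-route", "SI:AzureGraphRouter", "REST:Graph-Router", "OAuth:xyz789", "SP:GraphApp"]

result = diff_scans(scan_monday, scan_friday)
print(result["deleted"], result["created"])
```

Four of the five members are unchanged, but a node-level diff has no vocabulary for "same chain, one member replaced". That gap is what a chain-level identity has to fill.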

Connector question: Should the connector emit metadata that says "this is the same chain despite entity changes"?


1. Chain Discovery Across Connectors

Today's Architecture (Single Connector)

The Entra-ServiceNow connector is monolithic — it discovers both ServiceNow and Entra entities in a single sync run:

# servicenow_client.discover_all()
business_rules = client.get_business_rules()
script_includes = client.get_script_includes()
rest_messages = client.get_rest_messages()
oauth_entities = client.get_oauth_entities()

# entra_client.discover_service_principals()
service_principals = client.get_service_principals()
sign_ins = client.get_sign_ins()

# correlator.correlate_execution_chains()
# Matches OAuth.client_id → SP.app_id IN MEMORY
# Index SPs by app ID (the 'app_id' key name is assumed here)
sp_by_app_id = {sp['app_id']: sp for sp in service_principals}
chains = []
for oauth in oauth_entities:
    client_id = oauth['client_id']
    matched_sp = sp_by_app_id.get(client_id)
    if matched_sp:
        chains.append(ExecutionChain(
            rest_message=rm,  # REST Message that uses this OAuth entity (lookup elided)
            oauth_entity=oauth,
            azure_sp=matched_sp,
            # ...
        ))

Key insight: Chain assembly happens in the connector because all entities are available in the same process, at the same time.

Tomorrow's Architecture (Separate Connectors)

When we split into entra-connector and servicenow-connector, chain assembly becomes asynchronous:

Monday 9am — ServiceNow connector runs:

{
"nodes": [
{"nodeId": "sn-br-abc", "nodeType": "autonomous_identity", "identitySubtype": "business_rule"},
{"nodeId": "sn-si-def", "nodeType": "autonomous_identity", "identitySubtype": "system_execution"},
{"nodeId": "sn-restmsg-ghi", "nodeType": "resource", "resourceType": "rest_message"},
{"nodeId": "sn-oauth-jkl", "nodeType": "autonomous_identity", "identitySubtype": "oauth_app", "properties": {"clientId": "abc-123"}}
],
"edges": [
{"edgeType": "CALLS", "sourceNodeId": "sn-br-abc", "targetNodeId": "sn-si-def"},
{"edgeType": "EXECUTES_ON", "sourceNodeId": "sn-si-def", "targetNodeId": "sn-restmsg-ghi"},
{"edgeType": "AUTHENTICATES_VIA", "sourceNodeId": "sn-restmsg-ghi", "targetNodeId": "sn-oauth-jkl"}
]
}

Tuesday 3pm — Entra connector runs:

{
"nodes": [
{"nodeId": "entra-sp-xyz", "nodeType": "autonomous_identity", "identitySubtype": "service_principal", "properties": {"appId": "abc-123"}}
],
"edges": []
}

Platform ingestion (Tuesday 3pm): Now the platform must:

  1. Match sn-oauth-jkl.clientId == entra-sp-xyz.appId ✓ (already works via AUTHENTICATES_TO edge creation)
  2. Realize that BR→SI→REST→OAuth→SP is a chain
  3. Assign that chain a stable ID that survives:
    • OAuth credential rotation
    • REST Message endpoint URL changes
    • BR script updates (logic unchanged)
    • SP credential rotation

Question for connector architecture: Who emits the chain ID? Connector or platform?
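Whoever ends up computing it, the requirement above amounts to hashing only properties that do not change when credentials rotate. The sketch below is one possible shape, not the settled algorithm: `stable_chain_id` is a hypothetical function, and reducing the endpoint to its host (so that URL path changes do not alter the ID) is an assumption made for this example.

```python
import hashlib
from urllib.parse import urlsplit


def stable_chain_id(anchor_sys_id: str, trigger_tables: list[str], endpoint_url: str) -> str:
    """Hash the entry point, trigger tables and destination host; never credentials."""
    # Assumption: the destination is identified by host only, so a change to the
    # path of the REST Message endpoint does not produce a new chain ID.
    host = urlsplit(endpoint_url).hostname or ""
    key = "|".join([anchor_sys_id, ",".join(sorted(trigger_tables)), host])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


before = stable_chain_id("abc", ["incident"], "https://graph.microsoft.com/v1.0/users")
# Credential rotation is invisible here, and the path change is ignored.
after = stable_chain_id("abc", ["incident"], "https://graph.microsoft.com/beta/users")
# A different trigger table is a different automation.
other = stable_chain_id("abc", ["change_request"], "https://graph.microsoft.com/v1.0/users")
```

Because OAuth and SP credentials never enter the key, rotating them cannot change the ID. The trade-off is that anything left out of the key (here, the URL path) can change without the chain being treated as new.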


2. Chain Stability Across Scans

Option A: Connector Emits Chain Metadata

Approach: Add chainMetadata to NormalizedGraph:

interface NormalizedGraph {
syncId: string;
connectorId: string;
tenantId: string;
transformedAt: string;
nodes: NormalizedNode[];
edges: NormalizedEdge[];
temporalMarkers: TemporalMarker[];
evidenceCompleteness: EvidenceCompletenessReport;

// NEW: Connector-computed chain definitions
chainMetadata?: ChainDefinition[];
}

interface ChainDefinition {
chainId: string; // Stable ID (see options below)
chainType: 'trigger_to_destination' | 'scheduled_task' | 'approval_workflow';
displayName: string; // Human-readable name
anchorEntityId: string; // Primary stable reference (e.g., BR source_id)

// Member entities (by nodeId)
entryPoints: string[]; // BR/Flow sys_ids that trigger this chain
executors: string[]; // SI sys_ids in the execution path
credentials: string[]; // OAuth entity sys_ids
destinations: string[]; // SP/endpoint identities

// Semantic fingerprint
triggerPattern: {
tables: string[]; // Tables that trigger this chain
events: string[]; // insert/update/delete
};
egressPattern: {
category: 'external' | 'cloud' | 'internal';
destinations: string[]; // Base URLs or SP display names
};

// Provenance
firstSeenAt: string;
lastSeenAt: string;
scanVersions: string[]; // syncIds where this chain was observed
}

Connector implementation (transformer.py):

import hashlib
from typing import Optional

def _build_chain_metadata(self, chain: ExecutionChain) -> Optional[dict]:
    """Emit chain-level metadata for platform chain tracking."""

    # Option 1: Anchor on entry point (BR/Flow)
    anchor_source_id = None
    if chain.business_rules:
        anchor_source_id = chain.business_rules[0].get('sys_id')
    elif chain.flows:
        anchor_source_id = chain.flows[0].get('sys_id')

    if not anchor_source_id:
        return None  # Can't build stable chain without entry point

    trigger_table = chain.trigger_info.get('table', '')
    endpoint = chain.rest_message.get('endpoint', '')

    # Chain ID = hash(entry_point + trigger_table + destination)
    # Note: because the endpoint is hashed, an endpoint URL change yields a new chainId.
    chain_id_input = f"{anchor_source_id}:{trigger_table}:{endpoint}"
    chain_id = hashlib.sha256(chain_id_input.encode()).hexdigest()[:16]

    return {
        "chainId": chain_id,
        "chainType": "trigger_to_destination",
        "displayName": chain.display_name,
        "anchorEntityId": anchor_source_id,
        "entryPoints": [br['sys_id'] for br in chain.business_rules],
        "executors": [si['sys_id'] for si in chain.script_includes],
        "credentials": [chain.oauth_entity.get('sys_id')] if chain.oauth_entity else [],
        "destinations": [chain.azure_sp.get('id')] if chain.azure_sp else [],
        "triggerPattern": {
            "tables": [trigger_table] if trigger_table else [],
            "events": chain.trigger_info.get('events', [])
        },
        "egressPattern": {
            "category": classify_egress(endpoint),
            "destinations": [endpoint] if endpoint else []
        },
        "firstSeenAt": self._current_sync_time,
        "lastSeenAt": self._current_sync_time,
        "scanVersions": [self.config.sync_id]
    }

Platform ingestion (chain merger):

// sv0-platform/src/ingestion/chain-tracker.ts
class ChainTracker {
async mergeChainMetadata(
tenantId: string,
incomingChains: ChainDefinition[]
): Promise<void> {
for (const newChain of incomingChains) {
// Find existing chain by chainId
const existing = await this.db.collection('automation_chains').findOne({
tenant_id: tenantId,
chain_id: newChain.chainId
});

if (existing) {
// Chain stability check: did member entities change?
const credentialChanged = !arraysEqual(
existing.credentials,
newChain.credentials
);

if (credentialChanged) {
// Credential rotation — update chain, preserve chainId
await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: {
credentials: newChain.credentials,
lastSeenAt: newChain.lastSeenAt
},
$push: {
scanVersions: newChain.scanVersions[0],
credentialHistory: {
timestamp: newChain.lastSeenAt,
oldCredentials: existing.credentials,
newCredentials: newChain.credentials
}
}
}
);
} else {
// No changes — just update lastSeenAt
await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: { lastSeenAt: newChain.lastSeenAt },
$push: { scanVersions: newChain.scanVersions[0] }
}
);
}
} else {
// New chain — insert with history initialized
await this.db.collection('automation_chains').insertOne({
tenant_id: tenantId,
chain_id: newChain.chainId,
...newChain,
credentialHistory: []
});
}
}
}
}

Option A Pros/Cons

Pros:

  • Connector has full context — knows BR calls SI, knows trigger table, knows destination
  • Chain ID computed deterministically from semantic properties
  • Works with monolithic connector (Entra-ServiceNow) TODAY
  • Backward compatible — platform ignores chainMetadata if not present

Cons:

  • Breaks down with separate connectors — ServiceNow connector can't compute full chain without Azure SP
  • Connector must maintain chain ID stability logic (complex, error-prone)
  • What if two BRs share the same SI? Which BR is the anchor?
  • Requires connector to understand "what makes a chain the same" — this is business logic leaking into extraction layer
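One possible answer to the shared-SI question is to treat every entry point as its own anchor, so a shared Script Include simply appears in more than one chain. The sketch below illustrates that reading only; the node IDs and the `chains_by_entry_point` helper are hypothetical.

```python
from collections import defaultdict


def chains_by_entry_point(calls_edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group executors under the entry point that calls them (one chain per anchor)."""
    chains: dict[str, set[str]] = defaultdict(set)
    for source_id, target_id in calls_edges:
        chains[source_id].add(target_id)
    return {anchor: sorted(members) for anchor, members in chains.items()}


# Two Business Rules calling the same Script Include.
calls = [("sn-br-one", "sn-si-shared"), ("sn-br-two", "sn-si-shared")]
chains = chains_by_entry_point(calls)
```

Under this choice no BR has to "win" the anchor role, at the cost of the shared SI being listed as a member of both chains.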

Option B: Platform Assembles Chains from Edges

Approach: Connectors emit nodes + edges only. Platform builds chains via graph traversal.

NormalizedGraph (no changes):

// Existing schema — nodes + edges only
interface NormalizedGraph {
nodes: NormalizedNode[];
edges: NormalizedEdge[];
// No chainMetadata field
}

Platform chain assembler:

// sv0-platform/src/ingestion/chain-assembler.ts
class ChainAssembler {
/**
* Build automation chains from ingested graph via BFS traversal.
*
* Algorithm:
* 1. Find all automation entry points (nodes with TRIGGERS_ON edges)
* 2. For each entry point, BFS traverse:
* - Follow CALLS edges (BR → SI, SI → SI)
* - Follow EXECUTES_ON edges (SI → REST Message, BR → REST Message)
* - Follow AUTHENTICATES_VIA edges (REST Message → OAuth)
* - Follow AUTHENTICATES_TO edges (SP → OAuth)
* 3. Each traversal produces a chain subgraph
* 4. Compute chain ID from anchor + trigger + destination
*/
async assembleChains(tenantId: string, syncId: string): Promise<void> {
const entryPoints = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
'properties.identitySubtype': {
$in: ['business_rule', 'flow_designer_flow', 'scheduled_job']
}
});

for (const entryNode of entryPoints) {
const chain = await this._traverseChain(entryNode);
if (!chain) continue;

// Compute stable chain ID
const chainId = this._computeChainId(chain);

// Merge with existing chain
await this._mergeChain(tenantId, chainId, chain, syncId);
}
}

private async _traverseChain(
entryNode: NormalizedNode
): Promise<ChainSubgraph | null> {
const visited = new Set<string>();
const subgraph: ChainSubgraph = {
entryPoint: entryNode,
executors: [],
resources: [],
credentials: [],
crossSystemIdentities: [],
edges: []
};

// BFS from entry point
const queue = [entryNode];
while (queue.length > 0) {
const current = queue.shift()!;
if (visited.has(current.nodeId)) continue;
visited.add(current.nodeId);

// Get outgoing edges
const edges = await this.storage.getEdgesForNode(
current.nodeId,
'outgoing'
);

for (const edge of edges) {
subgraph.edges.push(edge);
const target = await this.storage.getNode(edge.targetNodeId);
if (!target) continue;

// Classify target and continue traversal
if (edge.edgeType === 'CALLS') {
// Code invocation — continue chain
subgraph.executors.push(target);
queue.push(target);
} else if (edge.edgeType === 'EXECUTES_ON') {
// Resource access — terminal or continuation
if (target.nodeType === 'resource') {
subgraph.resources.push(target);
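// Continue from the resource so its AUTHENTICATES_VIA edge is followed
queue.push(target);
}
} else if (edge.edgeType === 'AUTHENTICATES_VIA') {
// Credential used by the resource (REST Message → OAuth)
subgraph.credentials.push(target);
// AUTHENTICATES_TO points SP → OAuth, so the SP is reached via incoming edges
// (assumes getEdgesForNode also accepts 'incoming')
const incoming = await this.storage.getEdgesForNode(target.nodeId, 'incoming');
for (const authTo of incoming.filter(e => e.edgeType === 'AUTHENTICATES_TO')) {
subgraph.edges.push(authTo);
const sp = await this.storage.getNode(authTo.sourceNodeId);
if (sp) subgraph.crossSystemIdentities.push(sp);
}
}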
}
}

return subgraph.credentials.length > 0 || subgraph.resources.length > 0
? subgraph
: null; // Incomplete chain — missing destination
}

private _computeChainId(chain: ChainSubgraph): string {
// Stable anchor: entry point source_id (BR/Flow sys_id)
const anchor = chain.entryPoint.sourceId;

// Trigger pattern: tables from TRIGGERS_ON edges
const triggerTables = chain.edges
.filter(e => e.edgeType === 'TRIGGERS_ON')
.map(e => e.targetNodeId) // Table resource node IDs
.sort()
.join(',');

// Destination pattern: REST Message endpoints only. SP appIds are left out
// because the SP may arrive in a later sync and the ID must not change then.
const destinations = chain.resources
.filter(r => r.properties.resourceType === 'rest_message')
.map(r => r.properties.endpoint_url as string)
.filter(Boolean)
.sort()
.join(',');

// Hash(anchor + triggers + destinations)
const input = `${anchor}|${triggerTables}|${destinations}`;
return crypto.createHash('sha256').update(input).digest('hex').slice(0, 16);
}

private async _mergeChain(
tenantId: string,
chainId: string,
chain: ChainSubgraph,
syncId: string
): Promise<void> {
const existing = await this.db.collection('automation_chains').findOne({
tenant_id: tenantId,
chain_id: chainId
});

if (existing) {
// Detect entity changes
const credentialIds = chain.credentials.map(c => c.nodeId);
const credentialChanged = !arraysEqual(
existing.credential_node_ids,
credentialIds
);

await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: {
entry_point_node_id: chain.entryPoint.nodeId,
executor_node_ids: chain.executors.map(e => e.nodeId),
resource_node_ids: chain.resources.map(r => r.nodeId),
credential_node_ids: credentialIds,
cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
last_seen_at: new Date(),
last_seen_sync_id: syncId
},
$push: {
scan_versions: syncId,
...(credentialChanged && {
change_history: {
timestamp: new Date(),
change_type: 'credential_rotation',
old_credential_ids: existing.credential_node_ids,
new_credential_ids: credentialIds
}
})
}
}
);
} else {
// New chain
await this.db.collection('automation_chains').insertOne({
tenant_id: tenantId,
chain_id: chainId,
entry_point_node_id: chain.entryPoint.nodeId,
executor_node_ids: chain.executors.map(e => e.nodeId),
resource_node_ids: chain.resources.map(r => r.nodeId),
credential_node_ids: chain.credentials.map(c => c.nodeId),
cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
first_seen_at: new Date(),
last_seen_at: new Date(),
last_seen_sync_id: syncId,
scan_versions: [syncId],
change_history: []
});
}
}
}

interface ChainSubgraph {
entryPoint: NormalizedNode;
executors: NormalizedNode[]; // Script Includes
resources: NormalizedNode[]; // REST Messages, tables
credentials: NormalizedNode[]; // OAuth entities
crossSystemIdentities: NormalizedNode[]; // Azure SPs
edges: NormalizedEdge[];
}

Option B Pros/Cons

Pros:

  • Connector simplicity — connectors just emit nodes + edges, no chain logic
  • Works with separate connectors — platform assembles chains when both sides are present
  • Handles partial graphs — if ServiceNow syncs but Entra doesn't, platform still builds "incomplete" chain and waits for Entra data
  • Platform owns stability — chain ID computation in one place, easier to debug/evolve
  • No breaking changes — existing connectors work, chain assembly is additive

Cons:

  • Platform must traverse graphs (CPU cost, complexity)
  • Chain assembly happens after ingestion — can't validate chains at ingestion time
  • BFS traversal logic must be kept in sync with edge semantics
  • What if platform traversal logic has bugs? Connector can't verify chain was assembled correctly
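The partial-graph behaviour claimed above can be sketched with an in-memory BFS. This is a simplified Python illustration rather than the platform assembler: it uses the edge types and node IDs from the examples in this document, `traverse_chain` is a hypothetical function, and "complete" is assumed to mean that at least one SP is linked through an AUTHENTICATES_TO edge.

```python
from collections import deque

# Edge types followed forward from an entry point.
FORWARD_EDGE_TYPES = {"CALLS", "EXECUTES_ON", "AUTHENTICATES_VIA"}


def traverse_chain(entry_id: str, edges: list[dict]) -> dict:
    """BFS from the entry point; reports members and whether an SP was reached."""
    outgoing: dict[str, list[str]] = {}
    incoming_auth: dict[str, list[str]] = {}
    for edge in edges:
        if edge["edgeType"] in FORWARD_EDGE_TYPES:
            outgoing.setdefault(edge["sourceNodeId"], []).append(edge["targetNodeId"])
        elif edge["edgeType"] == "AUTHENTICATES_TO":
            # SP -> OAuth: the chain reaches the SP through an incoming edge.
            incoming_auth.setdefault(edge["targetNodeId"], []).append(edge["sourceNodeId"])

    members: list[str] = []
    destinations: list[str] = []
    seen = {entry_id}
    queue = deque([entry_id])
    while queue:
        node_id = queue.popleft()
        members.append(node_id)
        destinations.extend(incoming_auth.get(node_id, []))
        for target_id in outgoing.get(node_id, []):
            if target_id not in seen:
                seen.add(target_id)
                queue.append(target_id)
    return {"members": members, "destinations": destinations, "complete": bool(destinations)}


servicenow_edges = [
    {"edgeType": "CALLS", "sourceNodeId": "sn-br-abc", "targetNodeId": "sn-si-def"},
    {"edgeType": "EXECUTES_ON", "sourceNodeId": "sn-si-def", "targetNodeId": "sn-restmsg-ghi"},
    {"edgeType": "AUTHENTICATES_VIA", "sourceNodeId": "sn-restmsg-ghi", "targetNodeId": "sn-oauth-jkl"},
]
stitched_edge = {"edgeType": "AUTHENTICATES_TO", "sourceNodeId": "entra-sp-xyz", "targetNodeId": "sn-oauth-jkl"}

monday = traverse_chain("sn-br-abc", servicenow_edges)
tuesday = traverse_chain("sn-br-abc", servicenow_edges + [stitched_edge])
```

The ServiceNow-side members are identical on both days; only the destination list and the completeness flag change once the stitched edge exists.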

Option C: Hybrid (Connector Hints, Platform Assembles)

Approach: Connectors emit lightweight chain hints in node properties, platform uses hints + edges to assemble chains.

Node properties extension:

interface AutonomousIdentityProperties {
identitySubtype: 'business_rule' | 'flow_designer_flow' | ...;

// NEW: Chain membership hints (optional, non-breaking)
chainMembership?: {
role: 'entry_point' | 'executor' | 'credential' | 'destination';
anchorEntityId?: string; // Entry point source_id if known
chainSemanticHash?: string; // Hash of trigger+destination pattern
};
}

Connector emits hints (transformer.py):

import hashlib

def _process_execution_chain(self, chain: ExecutionChain) -> None:
    # Compute chain semantic hash
    trigger_table = chain.trigger_info.get('table', '')
    endpoint = chain.rest_message.get('endpoint', '')
    chain_hash = hashlib.sha256(f"{trigger_table}:{endpoint}".encode()).hexdigest()[:8]

    # Anchor on the first Business Rule, if the chain has one
    anchor_id = chain.business_rules[0]['sys_id'] if chain.business_rules else None

    # Entry point (BR)
    for br in chain.business_rules:
        self._add_node(
            node_id=f"sn-br-{br['sys_id']}",
            properties={
                "identitySubtype": "business_rule",
                "chainMembership": {
                    "role": "entry_point",
                    "anchorEntityId": br['sys_id'],
                    "chainSemanticHash": chain_hash
                }
            }
        )

    # Executor (SI)
    for si in chain.script_includes:
        self._add_node(
            node_id=f"sn-si-{si['sys_id']}",
            properties={
                "identitySubtype": "system_execution",
                "chainMembership": {
                    "role": "executor",
                    "anchorEntityId": anchor_id,
                    "chainSemanticHash": chain_hash
                }
            }
        )

    # ... similar for credentials, destinations

Platform uses hints to optimize assembly:

class ChainAssembler {
  async assembleChains(tenantId: string, syncId: string): Promise<void> {
    // Fast path: group nodes by chainSemanticHash
    const hintedNodes = await this.storage.queryNodes({
      tenant_id: tenantId,
      'properties.chainMembership.chainSemanticHash': { $exists: true }
    });
    const nodesByChainHash = groupBy(
      hintedNodes,
      n => n.properties.chainMembership?.chainSemanticHash
    );

    for (const [chainHash, members] of Object.entries(nodesByChainHash)) {
      // Several entry points can share one semantic hash (same trigger table
      // and endpoint), so assemble one chain per entry point.
      const entryPoints = members.filter(
        m => m.properties.chainMembership?.role === 'entry_point'
      );

      for (const entryPoint of entryPoints) {
        // Traverse edges to confirm chain structure
        const chain = await this._traverseChain(entryPoint);
        if (!chain) continue; // Hints present, but no edges back them up

        // Compute stable chain ID from anchor + pattern
        const anchor = entryPoint.properties.chainMembership?.anchorEntityId
          || entryPoint.sourceId;
        const chainId = crypto.createHash('sha256')
          .update(`${anchor}|${chainHash}`)
          .digest('hex')
          .slice(0, 16);

        await this._mergeChain(tenantId, chainId, chain, syncId);
      }
    }
  }
}

Option C Pros/Cons

Pros:

  • Connector provides hints (fast grouping), platform verifies via edges (correctness)
  • Backward compatible — nodes without hints still work, platform falls back to pure traversal
  • Partial connector support — ServiceNow connector can emit hints, Entra connector doesn't need to
  • Platform can detect hint-edge mismatches (data quality signal)

Cons:

  • More complex — two code paths (hint-based + edge-based)
  • Hints can go stale if connector logic changes but platform doesn't update
  • Duplication — chain pattern encoded in both hints and edge structure
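The "hint-edge mismatch" signal listed under Pros can be as simple as a set comparison between the nodes grouped by hint and the nodes reached by traversal. A minimal sketch, with a hypothetical helper name and example node IDs:

```python
def hint_edge_mismatches(hinted: set[str], traversed: set[str]) -> dict[str, list[str]]:
    """Compare hint-based grouping with the result of edge traversal."""
    return {
        # Hinted as a chain member, but no edge path reaches the node.
        "hinted_but_unreachable": sorted(hinted - traversed),
        # Reached by traversal, but the connector emitted no matching hint.
        "reachable_but_unhinted": sorted(traversed - hinted),
    }


hinted_members = {"sn-br-abc", "sn-si-def", "sn-oauth-jkl"}
traversed_members = {"sn-br-abc", "sn-si-def", "sn-restmsg-ghi"}
report = hint_edge_mismatches(hinted_members, traversed_members)
```

A non-empty report does not say which side is wrong; it only flags that the connector's hints and the emitted edges disagree and that one of them needs attention.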

3. Connector vs Platform Responsibility

Separation of Concerns

| Concern | Connector | Platform | Rationale |
|---|---|---|---|
| Entity discovery | ✓ | | Connector knows source API structure |
| Relationship resolution | ✓ | | Connector has all data in memory |
| Cross-system matching | ✓ | | OAuth.client_id → SP.app_id is connector domain knowledge |
| Chain identification | Hints only | ✓ | Platform sees all connectors, handles partial graphs |
| Chain stability | | ✓ | Platform has temporal view, connector sees one snapshot |
| Chain history | | ✓ | Platform stores baselines, connector is stateless |

Recommended boundary:

Connector: Discover entities → Resolve edges → Emit NormalizedGraph + optional chain hints
Platform: Ingest graph → Assemble chains → Track stability → Expose via API

Why Platform Should Own Chain Assembly

Reason 1: Partial graph handling

When ServiceNow and Entra connectors run at different times:

Monday: ServiceNow sync → BR + SI + REST + OAuth
Platform assembles: "Incomplete chain (no Azure SP yet)"

Tuesday: Entra sync → SP
Platform assembles: "Complete chain (OAuth → SP link now present)"
Platform merges: Same chain_id, now complete

Connector can't handle this — it only sees its own data.
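The Monday/Tuesday sequence can be sketched as an upsert keyed by chain ID. Note the precondition: the same `chain_id` can only be produced on both days if it is derived from data already present in the first sync, so it must not depend on the SP. The in-memory `store` and the `upsert_chain` helper below are hypothetical stand-ins for the platform's persistence layer.

```python
def upsert_chain(store: dict[str, dict], chain_id: str, member_ids: list[str],
                 has_sp: bool, sync_id: str) -> dict:
    """Create the chain on first sight, then update the same record in place."""
    record = store.setdefault(
        chain_id,
        {"chain_id": chain_id, "first_seen_sync_id": sync_id, "scan_versions": []},
    )
    record["member_node_ids"] = member_ids
    record["status"] = "complete" if has_sp else "incomplete"
    record["scan_versions"].append(sync_id)
    return record


store: dict[str, dict] = {}
servicenow_members = ["sn-br-abc", "sn-si-def", "sn-restmsg-ghi", "sn-oauth-jkl"]

# Monday: ServiceNow sync only, no Azure SP yet.
upsert_chain(store, "chain-1", servicenow_members, has_sp=False, sync_id="sync-monday")
# Tuesday: Entra sync adds the SP; the existing record is updated, not duplicated.
upsert_chain(store, "chain-1", servicenow_members + ["entra-sp-xyz"], has_sp=True, sync_id="sync-tuesday")
```

The record keeps its first-seen sync while its status moves from incomplete to complete, which is the behaviour a single connector cannot provide on its own.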

Reason 2: Multi-connector chains

Future: GitHub Actions → AWS Lambda → Slack

github-connector:  GH Action → GH OIDC Token
aws-connector: Lambda Function → IAM Role
slack-connector: Slack Bot Token

No single connector can emit the full chain. Platform must stitch.

Reason 3: Chain evolution

Scan 1: BR → SI → REST → OAuth-v1 → SP
Scan 2: BR → SI → REST → OAuth-v2 → SP (credential rotated)
Scan 3: BR → SI-updated → REST → OAuth-v2 → SP (SI script changed)

Platform must decide: are these the same chain? Connector sees one scan only.
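One way to frame that decision is to compare chain members per role between scans and apply a policy about which roles may change without creating a new chain. The policy below (credentials and executors may change, entry points may not) is an assumption chosen for illustration, and the function names are hypothetical.

```python
# Assumption: changes confined to these roles keep the chain's identity.
IDENTITY_PRESERVING_ROLES = {"credentials", "executors"}


def changed_roles(previous: dict[str, list[str]], current: dict[str, list[str]]) -> list[str]:
    """Return the roles whose member lists differ between two scans."""
    roles = sorted(set(previous) | set(current))
    return [r for r in roles if sorted(previous.get(r, [])) != sorted(current.get(r, []))]


def is_same_chain(previous: dict[str, list[str]], current: dict[str, list[str]]) -> bool:
    """Same chain if every change falls within an identity-preserving role."""
    return set(changed_roles(previous, current)) <= IDENTITY_PRESERVING_ROLES


scan1 = {"entry_points": ["BR"], "executors": ["SI"], "resources": ["REST"],
         "credentials": ["OAuth-v1"], "destinations": ["SP"]}
scan2 = {**scan1, "credentials": ["OAuth-v2"]}    # credential rotated
scan3 = {**scan2, "executors": ["SI-updated"]}    # SI script changed
rewired = {**scan1, "entry_points": ["BR-other"]}  # different trigger
```

With this policy, all three scans in the example resolve to the same chain, while a change of entry point would be treated as a different automation.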


4. Cross-Connector Chain Assembly

Scenario: Separate Entra + ServiceNow Connectors

ServiceNow connector emits:

{
"connectorId": "servicenow_v1",
"nodes": [
{
"nodeId": "sn-br-abc",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "business_rule",
"chainMembership": {
"role": "entry_point",
"anchorEntityId": "abc",
"chainSemanticHash": "tr-incident-ep-graph"
}
}
},
{
"nodeId": "sn-oauth-jkl",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "oauth_app",
"clientId": "abc-123-xyz",
"chainMembership": {
"role": "credential",
"anchorEntityId": "abc",
"chainSemanticHash": "tr-incident-ep-graph"
}
}
}
],
"edges": [
{
"edgeType": "AUTHENTICATES_VIA",
"sourceNodeId": "sn-restmsg-ghi",
"targetNodeId": "sn-oauth-jkl"
}
]
}

Entra connector emits:

{
"connectorId": "entra_id_v1",
"nodes": [
{
"nodeId": "entra-sp-xyz",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "service_principal",
"appId": "abc-123-xyz"
}
}
],
"edges": []
}

Platform stitching:

// After Entra sync completes
class CrossSystemStitcher {
async stitchAuthenticationEdges(tenantId: string): Promise<void> {
// Find OAuth entities with clientId
const oauthNodes = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
'properties.identitySubtype': 'oauth_app',
'properties.clientId': { $exists: true }
});

// Find SPs that carry an appId to match against
const spNodes = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
'properties.identitySubtype': 'service_principal',
'properties.appId': { $exists: true }
});

const spByAppId = new Map(
spNodes.map(sp => [
(sp.properties.appId as string).toLowerCase(),
sp
])
);

for (const oauthNode of oauthNodes) {
const clientId = (oauthNode.properties.clientId as string).toLowerCase();
const matchedSp = spByAppId.get(clientId);

if (matchedSp) {
// Create AUTHENTICATES_TO edge (SP → OAuth)
await this.storage.upsertEdge({
edgeId: `AUTHENTICATES_TO:${matchedSp.nodeId}:${oauthNode.nodeId}`,
edgeType: 'AUTHENTICATES_TO',
sourceNodeId: matchedSp.nodeId,
targetNodeId: oauthNode.nodeId,
properties: {
evidenceReferences: {
matchingField: 'client_id',
matchingValue: clientId,
issuingSystemId: matchedSp.properties.appId,
targetSystemId: clientId
}
}
});

// Copy chain hints from OAuth to SP
if (oauthNode.properties.chainMembership) {
await this.storage.updateNode(matchedSp.nodeId, {
'properties.chainMembership': {
role: 'destination',
anchorEntityId: oauthNode.properties.chainMembership.anchorEntityId,
chainSemanticHash: oauthNode.properties.chainMembership.chainSemanticHash
}
});
}
}
}
}
}

Result: Platform creates cross-system edges after both connectors run, then reassembles chains to include the new SP nodes.


5. NormalizedGraph Schema Changes

Backward Compatibility Strategy

Phase 1: Additive properties (non-breaking)

Add optional chainMembership to node properties:

// types.ts
interface AutonomousIdentityProperties {
identitySubtype: string;
// ... existing properties

// NEW (optional, backward compatible)
chainMembership?: {
role: 'entry_point' | 'executor' | 'credential' | 'destination';
anchorEntityId?: string;
chainSemanticHash?: string;
};
}

Phase 2: Chain metadata block (optional)

Add top-level chainMetadata array:

interface NormalizedGraph {
// ... existing fields

// NEW (optional, backward compatible)
chainMetadata?: ChainDefinition[];
}

Phase 3: Platform assembly (future)

Once chain assembler is stable, deprecate connector-emitted chainMetadata:

// Deprecation notice in docs:
// @deprecated Connectors should not emit chainMetadata.
// The platform assembles chains from nodes + edges.
// This field is ignored as of sv0-platform v2.0.

Validation Strategy

Connector-side validation:

# In transformer._enrich_automation_properties()
ALLOWED_ROLES = ('entry_point', 'executor', 'credential', 'destination')

membership = node.get('properties', {}).get('chainMembership')
if membership:
    # Validate role is one of allowed values
    role = membership.get('role')
    if role not in ALLOWED_ROLES:
        logger.warning("Invalid chainMembership role: %s", role)
        del node['properties']['chainMembership']

Platform-side validation:

// In ingestion/validator.ts
function validateChainMembership(node: NormalizedNode): ValidationResult {
const membership = node.properties.chainMembership;
if (!membership) return { valid: true };

// Check role consistency
const expectedRoles: Record<string, string[]> = {
'business_rule': ['entry_point'],
'flow_designer_flow': ['entry_point'],
'scheduled_job': ['entry_point'],
'system_execution': ['executor'],
'oauth_app': ['credential'],
'service_principal': ['destination']
};

const subtype = node.properties.identitySubtype as string;
const allowedRoles = expectedRoles[subtype] || [];

if (!allowedRoles.includes(membership.role)) {
return {
valid: false,
error: `Invalid chainMembership.role "${membership.role}" for subtype "${subtype}"`
};
}

return { valid: true };
}

6. Incremental Sync & Chain Updates

Problem: Audit Log Syncs Don't Re-Send Full Chains

When running in syncMode: 'audit_log':

// Entra audit log sync
{
"auditRecords": [
{
"operation": "Remove service principal credentials",
"targetResources": [
{"id": "sp-xyz", "type": "servicePrincipal"}
],
"modifiedProperties": [
{"name": "KeyCredential", "oldValue": "cert-abc", "newValue": null}
]
}
]
}

The connector emits:

  • GraphEvent with eventType: 'credential_deleted'
  • Updated NormalizedNode for the SP (minus the deleted credential)

But it does not re-send the full chain (BR → SI → REST → OAuth → SP).

Platform challenge: How does the chain tracker know to update the chain's credential list?

Solution: Event-Driven Chain Updates

class ChainTracker {
async processGraphEvent(event: GraphEvent): Promise<void> {
// Find chains containing this entity
const chains = await this.db.collection('automation_chains').find({
tenant_id: event.tenantId,
$or: [
{ entry_point_node_id: event.entityId },
{ executor_node_ids: event.entityId },
{ credential_node_ids: event.entityId },
{ cross_system_identity_node_ids: event.entityId }
]
}).toArray();

for (const chain of chains) {
if (event.eventType === 'credential_deleted') {
// Remove deleted credential from chain
await this.db.collection('automation_chains').updateOne(
{ _id: chain._id },
{
$pull: { credential_node_ids: event.entityId },
$push: {
change_history: {
timestamp: new Date(),
change_type: 'credential_deleted',
entity_id: event.entityId,
sync_id: event.syncId
}
}
}
);
} else if (event.eventType === 'status_changed') {
// Mark chain as potentially inactive
await this.db.collection('automation_chains').updateOne(
{ _id: chain._id },
{
$set: { requires_revalidation: true },
$push: {
change_history: {
timestamp: new Date(),
change_type: 'member_status_changed',
entity_id: event.entityId
}
}
}
);
}
}
}
}

Revalidation strategy:

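// Background job: daily chain revalidation
async function revalidateChains(tenantId: string): Promise<void> {
  const staleChains = await db.collection('automation_chains').find({
    tenant_id: tenantId,
    requires_revalidation: true
  }).toArray();

  for (const chain of staleChains) {
    // Re-traverse chain from entry point
    const entryNode = await storage.getNode(chain.entry_point_node_id);
    if (!entryNode || entryNode.status === 'deleted') {
      // Entry point gone: mark chain as deleted and clear the flag
      await db.collection('automation_chains').updateOne(
        { _id: chain._id },
        { $set: { status: 'deleted', deleted_at: new Date(), requires_revalidation: false } }
      );
      continue;
    }

    // Re-assemble chain. Assumes _traverseChain and _mergeChain are exposed to
    // this job; revalidation is not a new sync, so the last sync ID is reused.
    const freshChain = await chainAssembler._traverseChain(entryNode);
    if (freshChain) {
      await chainAssembler._mergeChain(
        tenantId, chain.chain_id, freshChain, chain.last_seen_sync_id
      );
    }

    // Clear the flag so the chain is not re-processed on every run
    await db.collection('automation_chains').updateOne(
      { _id: chain._id },
      { $set: { requires_revalidation: false } }
    );
  }
}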

7. Multi-Platform Chains (Future)

Scenario: GitHub Actions → AWS Lambda → Slack

GitHub connector:

{
"nodes": [
{
"nodeId": "gh-workflow-abc",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "github_workflow",
"chainMembership": {
"role": "entry_point",
"anchorEntityId": "workflow-abc",
"chainSemanticHash": "gh-issue-aws-notify"
}
}
},
{
"nodeId": "gh-oidc-token",
"nodeType": "credential",
"properties": {
"credentialSubtype": "oidc_token",
"audience": "sts.amazonaws.com"
}
}
],
"edges": [
{
"edgeType": "AUTHENTICATES_VIA",
"sourceNodeId": "gh-workflow-abc",
"targetNodeId": "gh-oidc-token"
}
]
}

AWS connector:

{
"nodes": [
{
"nodeId": "aws-role-xyz",
"nodeType": "role",
"properties": {
"roleName": "GitHubActionsRole",
"trustPolicy": {
"principal": "token.actions.githubusercontent.com",
"condition": { "StringEquals": { "aud": "sts.amazonaws.com" } }
}
}
},
{
"nodeId": "aws-lambda-notify",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "lambda_function"
}
}
],
"edges": [
{
"edgeType": "HAS_ROLE",
"sourceNodeId": "aws-lambda-notify",
"targetNodeId": "aws-role-xyz"
}
]
}

Slack connector:

{
"nodes": [
{
"nodeId": "slack-bot-token",
"nodeType": "credential",
"properties": {
"credentialSubtype": "api_key"
}
}
]
}

Platform stitching:

// Cross-platform chain assembly
class CrossPlatformStitcher {
async stitchOIDCChains(tenantId: string): Promise<void> {
// Find OIDC tokens with AWS audience
const oidcTokens = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'credential',
'properties.credentialSubtype': 'oidc_token',
'properties.audience': 'sts.amazonaws.com'
});

// Find AWS roles with OIDC trust policies
const awsRoles = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'role',
'properties.trustPolicy.principal': { $regex: /github|gitlab/ }
});

for (const token of oidcTokens) {
for (const role of awsRoles) {
// Match based on trust policy conditions
if (this._oidcTrustMatches(token, role)) {
await this.storage.upsertEdge({
edgeId: `DELEGATES_TO:${token.nodeId}:${role.nodeId}`,
edgeType: 'DELEGATES_TO',
sourceNodeId: token.nodeId,
targetNodeId: role.nodeId,
properties: {
delegationType: 'oidc_federation',
evidenceReferences: {
trustPolicy: role.properties.trustPolicy
}
}
});
}
}
}
}
}

Result: Platform creates GH Workflow → OIDC Token -[DELEGATES_TO]-> AWS Role → Lambda chain.

Connector implication: Each connector only knows its own platform. Platform must have cross-platform correlation logic.
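The stitcher above relies on a trust-matching helper that is not defined in this document. A minimal Python sketch of what such a check could look like follows; it works on the node properties from the GitHub and AWS examples, compares only the audience, and is hypothetical in both name and logic.

```python
def oidc_trust_matches(token_props: dict, role_props: dict) -> bool:
    """Hypothetical matcher: token audience must equal the trust policy's `aud` condition."""
    trust_policy = role_props.get("trustPolicy") or {}
    condition = trust_policy.get("condition") or {}
    expected_aud = condition.get("StringEquals", {}).get("aud")
    return expected_aud is not None and token_props.get("audience") == expected_aud


token = {"credentialSubtype": "oidc_token", "audience": "sts.amazonaws.com"}
role = {
    "roleName": "GitHubActionsRole",
    "trustPolicy": {
        "principal": "token.actions.githubusercontent.com",
        "condition": {"StringEquals": {"aud": "sts.amazonaws.com"}},
    },
}

matches = oidc_trust_matches(token, role)
```

Audience alone is a weak signal, since many tokens share the same audience. A production matcher would likely also need to compare the issuer and the subject claim, which would require the connectors to emit those fields.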


8. Practical Recommendation

Three-Phase Implementation

Phase 1: Node-Level Chain Hints (2-3 days)

Connector work:

  • Add chainMembership properties to automation nodes
  • Emit anchorEntityId and chainSemanticHash in transformer

Platform work:

  • Extend NormalizedNode schema with optional chainMembership property
  • No behavioral changes — hints are informational only

Result: Connector emits chain context, platform ignores it (forward compatibility).

Phase 2: Platform Chain Assembler (1-2 weeks)

Platform work:

  • Implement ChainAssembler class (BFS traversal from entry points)
  • Implement automation_chains collection with chain tracking
  • Implement chain stability logic (chain_id computation, credential rotation detection)
  • Add /api/v1/tenants/:id/automation-chains endpoint

Connector work:

  • None — existing nodes + edges are sufficient

Result: Platform can show "Automation Chains" view with stable IDs across scans.

Phase 3: Incremental Chain Updates (1 week)

Platform work:

  • Implement event-driven chain updates (listen to GraphEvent stream)
  • Implement chain revalidation background job
  • Add change_history array to chains for audit trail

Result: Chains update incrementally without full re-sync.

Total Effort Estimate

| Phase | Component | Effort | Risk |
|---|---|---|---|
| Phase 1 | Connector hints | 2-3 days | Low |
| | Platform schema | 2 hours | None |
| Phase 2 | Chain assembler | 5 days | Medium (graph traversal complexity) |
| | Chain tracker | 3 days | Low |
| | API endpoints | 2 days | Low |
| Phase 3 | Event-driven updates | 3 days | Medium (incremental correctness) |
| | Revalidation job | 2 days | Low |
| Total | | ~3-4 weeks | Medium |
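As a sanity check on the total, the per-component estimates can be summed directly. The sketch assumes an 8-hour day, a 5-day week, and strictly sequential work by one person; none of these are stated in the estimate itself.

```python
HOURS_PER_DAY = 8  # assumption
DAYS_PER_WEEK = 5  # assumption

# (component, low estimate in days, high estimate in days)
estimates = [
    ("Connector hints", 2.0, 3.0),
    ("Platform schema", 2 / HOURS_PER_DAY, 2 / HOURS_PER_DAY),
    ("Chain assembler", 5.0, 5.0),
    ("Chain tracker", 3.0, 3.0),
    ("API endpoints", 2.0, 2.0),
    ("Event-driven updates", 3.0, 3.0),
    ("Revalidation job", 2.0, 2.0),
]

total_low_days = sum(low for _, low, _ in estimates)
total_high_days = sum(high for _, _, high in estimates)
total_low_weeks = total_low_days / DAYS_PER_WEEK
total_high_weeks = total_high_days / DAYS_PER_WEEK
print(f"{total_low_days}-{total_high_days} days, about {total_low_weeks:.2f}-{total_high_weeks:.2f} weeks")
```

Under those assumptions the components add up to between three and four weeks, consistent with the stated total.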

Why Not Separate Collection?

Reason 1: Chains are derived, not source entities

Chains are computed from nodes + edges. They're not discovered by connectors — they're assembled by the platform.

Analogy: A "blast radius" query doesn't create a blast_radius collection — it's a traversal result. Same with chains.

Reason 2: Storage doesn't solve the identity problem

The question "is this the same chain?" isn't answered by having a separate collection. It's answered by having a stable chain ID algorithm.

Reason 3: Chains change faster than baselines

Chains can change multiple times per day (credential rotation, script updates). Storing them as versioned baselines bloats the database.

Better: Store chains as {chain_id, member_node_ids[], change_history[]} — lightweight tracking, heavy lifting is in the graph.
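The lightweight record described above could look roughly like the following. This is a sketch of the shape only: the `ChainRecord` class and its `replace_member` method are hypothetical, and the node data itself is assumed to stay in the graph.

```python
from dataclasses import dataclass, field


@dataclass
class ChainRecord:
    """Tracking record: references to graph nodes plus an audit trail, no node data."""
    chain_id: str
    member_node_ids: list[str]
    change_history: list[dict] = field(default_factory=list)

    def replace_member(self, old_id: str, new_id: str, change_type: str) -> None:
        """Swap one member reference and log the change; chain_id is untouched."""
        self.member_node_ids = [new_id if m == old_id else m for m in self.member_node_ids]
        self.change_history.append({"change_type": change_type, "old": old_id, "new": new_id})


record = ChainRecord(
    chain_id="chain-1",
    member_node_ids=["sn-br-abc", "sn-si-def", "sn-restmsg-ghi", "sn-oauth-jkl"],
)
record.replace_member("sn-oauth-jkl", "sn-oauth-new", "credential_rotation")
```

A rotation then costs one list update and one history entry, instead of a new versioned copy of the whole chain.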


Files Produced

  • /Users/lucky/dev/securityv0/sv0-documentation/docs/analysis/2026-02-12-automation-classification/03-automation-persistence-integrator.md (this document)

Next Steps

  1. Architect review — Is the chain ID stability algorithm correct?
  2. Developer review — Does the platform schema design make sense?
  3. CEO decision — Phase 1 only? Or commit to full 3-phase implementation?
  4. If approved: Create ADR for chain tracking architecture