Automation Persistence — Integrator Analysis
Date: 2026-02-13
Author: Integrator (connector architecture, data pipeline, cross-system correlation)
Status: Draft for team review
Related: CEO synthesis, Execution Flow Analysis
Executive Summary
The question of whether automations need a separate collection is fundamentally a connector architecture and data pipeline question, not primarily a UI or database question.
Core finding: The problem is real, but a separate automations collection is the wrong solution. The right solution is:
- Stable chain anchors in NormalizedGraph: connectors emit chainId metadata for each automation node
- Platform-side chain assembly: the ingestion pipeline builds chain entities from nodes + edges, tracking them over time
- Backward-compatible schema evolution: extend NormalizedGraph, don't break it
Why this matters: When connector sync schedules diverge (Entra Monday, ServiceNow Friday), chain stability must be computable from partial graphs — the platform can't wait for "all entities present" to identify chains.
Implementation path: 3-phase rollout (metadata-only → chain hints → platform assembly), fully backward compatible.
Problem Statement (Connector Perspective)
The CEO's Concern
"If at one point we find an automation with certain list of entities and connections, it has certain business logic and flow. Then some entities change, eg. oauth client id is updated in the chain, but logic is the same. From end user/CISO perspective — everything remains the same. So logically we should continue to show exactly same automation in the UI."
This is a cross-scan identity problem: how does the platform know that these two graphs represent the "same" automation?
Scan 1 (Monday):
BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:abc123 → SP:GraphApp
Scan 2 (Friday, OAuth client_id rotated):
BR:auto-route → SI:AzureGraphRouter → REST:Graph-Router → OAuth:xyz789 → SP:GraphApp
↑ new oauth_entity record
From the CISO perspective: same BR, same SI, same REST Message, same Azure SP, same functionality. The OAuth credential rotated (good security hygiene!) but the automation chain is unchanged.
From the platform's perspective (today): the OAuth entity sys_id changed, so the graph diff shows oauth_entity deleted + oauth_entity created. The BR→REST Message path looks like it was re-created.
Connector question: Should the connector emit metadata that says "this is the same chain despite entity changes"?
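To make the identity problem concrete, the following is a minimal sketch (not the platform's implementation) that fingerprints the two scans above in two ways: once over every member, and once over only the members assumed to be stable. The `fingerprint` helper and the choice of `STABLE_ROLES` are illustrative assumptions.

```python
import hashlib

# The two scans from the example above, as role -> entity identifier.
scan_1 = {"br": "auto-route", "si": "AzureGraphRouter", "rest": "Graph-Router",
          "oauth": "abc123", "sp": "GraphApp"}
scan_2 = {**scan_1, "oauth": "xyz789"}  # OAuth client_id rotated

ALL_ROLES = ("br", "si", "rest", "oauth", "sp")
STABLE_ROLES = ("br", "si", "rest", "sp")  # assumption: credentials are excluded


def fingerprint(chain: dict, roles: tuple) -> str:
    """Hash the identifiers of the selected roles in a fixed order."""
    material = "|".join(f"{role}={chain[role]}" for role in roles)
    return hashlib.sha256(material.encode()).hexdigest()[:16]


# Hashing every member treats the rotation as a brand-new chain.
naive_same = fingerprint(scan_1, ALL_ROLES) == fingerprint(scan_2, ALL_ROLES)
# Hashing only the stable members keeps the chain's identity.
anchored_same = fingerprint(scan_1, STABLE_ROLES) == fingerprint(scan_2, STABLE_ROLES)
print(naive_same, anchored_same)
```

Which roles count as stable is exactly the design decision the rest of this analysis works through; the sketch only shows that the answer determines whether a rotation reads as "same automation" or "new automation".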
1. Chain Discovery Across Connectors
Today's Architecture (Single Connector)
The Entra-ServiceNow connector is monolithic — it discovers both ServiceNow and Entra entities in a single sync run:
# servicenow_client.discover_all()
business_rules = client.get_business_rules()
script_includes = client.get_script_includes()
rest_messages = client.get_rest_messages()
oauth_entities = client.get_oauth_entities()
# entra_client.discover_service_principals()
service_principals = client.get_service_principals()
sign_ins = client.get_sign_ins()
# correlator.correlate_execution_chains()
# Matches OAuth.client_id → SP.app_id IN MEMORY
for oauth in oauth_entities:
    client_id = oauth['client_id']
    matched_sp = sp_by_app_id.get(client_id)
    if matched_sp:
        chains.append(ExecutionChain(
            rest_message=rm,
            oauth_entity=oauth,
            azure_sp=matched_sp,
            ...
        ))
Key insight: Chain assembly happens in the connector because all entities are available in the same process, at the same time.
Tomorrow's Architecture (Separate Connectors)
When we split into entra-connector and servicenow-connector, chain assembly becomes asynchronous:
Monday 9am — ServiceNow connector runs:
{
"nodes": [
{"nodeId": "sn-br-abc", "nodeType": "autonomous_identity", "identitySubtype": "business_rule"},
{"nodeId": "sn-si-def", "nodeType": "autonomous_identity", "identitySubtype": "system_execution"},
{"nodeId": "sn-restmsg-ghi", "nodeType": "resource", "resourceType": "rest_message"},
{"nodeId": "sn-oauth-jkl", "nodeType": "autonomous_identity", "identitySubtype": "oauth_app", "properties": {"clientId": "abc-123"}}
],
"edges": [
{"edgeType": "CALLS", "sourceNodeId": "sn-br-abc", "targetNodeId": "sn-si-def"},
{"edgeType": "EXECUTES_ON", "sourceNodeId": "sn-si-def", "targetNodeId": "sn-restmsg-ghi"},
{"edgeType": "AUTHENTICATES_VIA", "sourceNodeId": "sn-restmsg-ghi", "targetNodeId": "sn-oauth-jkl"}
]
}
Tuesday 3pm — Entra connector runs:
{
"nodes": [
{"nodeId": "entra-sp-xyz", "nodeType": "autonomous_identity", "identitySubtype": "service_principal", "properties": {"appId": "abc-123"}}
],
"edges": []
}
Platform ingestion (Tuesday 3pm): Now the platform must:
- Match sn-oauth-jkl.clientId == entra-sp-xyz.appId ✓ (already works via AUTHENTICATES_TO edge creation)
- Realize that BR→SI→REST→OAuth→SP is a chain
- Assign that chain a stable ID that survives:
- OAuth credential rotation
- REST Message endpoint URL changes
- BR script updates (logic unchanged)
- SP credential rotation
Question for connector architecture: Who emits the chain ID? Connector or platform?
2. Chain Stability Across Scans
Option A: Connector Emits Chain Metadata
Approach: Add chainMetadata to NormalizedGraph:
interface NormalizedGraph {
syncId: string;
connectorId: string;
tenantId: string;
transformedAt: string;
nodes: NormalizedNode[];
edges: NormalizedEdge[];
temporalMarkers: TemporalMarker[];
evidenceCompleteness: EvidenceCompletenessReport;
// NEW: Connector-computed chain definitions
chainMetadata?: ChainDefinition[];
}
interface ChainDefinition {
chainId: string; // Stable ID (see options below)
chainType: 'trigger_to_destination' | 'scheduled_task' | 'approval_workflow';
displayName: string; // Human-readable name
anchorEntityId: string; // Primary stable reference (e.g., BR source_id)
// Member entities (by nodeId)
entryPoints: string[]; // BR/Flow sys_ids that trigger this chain
executors: string[]; // SI sys_ids in the execution path
credentials: string[]; // OAuth entity sys_ids
destinations: string[]; // SP/endpoint identities
// Semantic fingerprint
triggerPattern: {
tables: string[]; // Tables that trigger this chain
events: string[]; // insert/update/delete
};
egressPattern: {
category: 'external' | 'cloud' | 'internal';
destinations: string[]; // Base URLs or SP display names
};
// Provenance
firstSeenAt: string;
lastSeenAt: string;
scanVersions: string[]; // syncIds where this chain was observed
}
Connector implementation (transformer.py):
# Module-level imports this method relies on
import hashlib
from typing import Optional

def _build_chain_metadata(self, chain: ExecutionChain) -> Optional[dict]:
    """Emit chain-level metadata for platform chain tracking.

    Returns None when the chain has no entry point to anchor on.
    """
    # Option 1: Anchor on entry point (BR/Flow)
    anchor_source_id = None
    if chain.business_rules:
        anchor_source_id = chain.business_rules[0].get('sys_id')
    elif chain.flows:
        anchor_source_id = chain.flows[0].get('sys_id')
    if not anchor_source_id:
        return None  # Can't build stable chain without entry point
    # Chain ID = hash(entry_point + trigger_table + destination)
    chain_id_input = f"{anchor_source_id}:{chain.trigger_info.get('table', '')}:{chain.rest_message.get('endpoint', '')}"
    chain_id = hashlib.sha256(chain_id_input.encode()).hexdigest()[:16]
    return {
        "chainId": chain_id,
        "chainType": "trigger_to_destination",
        "displayName": chain.display_name,
        "anchorEntityId": anchor_source_id,
        "entryPoints": [br['sys_id'] for br in chain.business_rules],
        "executors": [si['sys_id'] for si in chain.script_includes],
        "credentials": [chain.oauth_entity.get('sys_id')] if chain.oauth_entity else [],
        "destinations": [chain.azure_sp.get('id')] if chain.azure_sp else [],
        "triggerPattern": {
            "tables": [chain.trigger_info.get('table')],
            "events": chain.trigger_info.get('events', [])
        },
        "egressPattern": {
            "category": classify_egress(chain.rest_message.get('endpoint')),
            "destinations": [chain.rest_message.get('endpoint')]
        },
        "firstSeenAt": self._current_sync_time,
        "lastSeenAt": self._current_sync_time,
        "scanVersions": [self.config.sync_id]
    }
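One property of this formula is worth checking against the survival list in section 1: the OAuth sys_id is not an input, but the REST Message endpoint is. The sketch below mirrors the formula on plain dicts to show the effect. The sample values and the `normalize_endpoint` option (hashing only the URL host) are assumptions for illustration, not something the connector does today.

```python
import hashlib
from urllib.parse import urlsplit


def option_a_chain_id(chain: dict, normalize_endpoint: bool = False) -> str:
    """Mirror the hash(entry_point:table:endpoint) formula on a plain dict."""
    endpoint = chain["endpoint"]
    if normalize_endpoint:
        # Hypothetical normalization: keep only the host part of the URL.
        endpoint = urlsplit(endpoint).hostname or ""
    chain_id_input = f"{chain['br_sys_id']}:{chain['trigger_table']}:{endpoint}"
    return hashlib.sha256(chain_id_input.encode()).hexdigest()[:16]


base = {"br_sys_id": "abc", "trigger_table": "incident",
        "endpoint": "https://graph.example.com/v1/route", "oauth_sys_id": "jkl"}
rotated = {**base, "oauth_sys_id": "mno"}  # credential rotation
moved = {**base, "endpoint": "https://graph.example.com/v2/route"}  # URL path change

survives_rotation = option_a_chain_id(base) == option_a_chain_id(rotated)
survives_url_change = option_a_chain_id(base) == option_a_chain_id(moved)
survives_url_change_normalized = (
    option_a_chain_id(base, normalize_endpoint=True)
    == option_a_chain_id(moved, normalize_endpoint=True)
)
print(survives_rotation, survives_url_change, survives_url_change_normalized)
```

With the raw endpoint in the hash, a credential rotation preserves the ID but an endpoint URL change does not. If URL changes are meant to preserve identity, some normalization of the endpoint would be needed before hashing; how coarse that normalization should be is an open design choice.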
Platform ingestion (chain merger):
// sv0-platform/src/ingestion/chain-tracker.ts
class ChainTracker {
async mergeChainMetadata(
tenantId: string,
incomingChains: ChainDefinition[]
): Promise<void> {
for (const newChain of incomingChains) {
// Find existing chain by chainId
const existing = await this.db.collection('automation_chains').findOne({
tenant_id: tenantId,
chain_id: newChain.chainId
});
if (existing) {
// Chain stability check: did member entities change?
const credentialChanged = !arraysEqual(
existing.credentials,
newChain.credentials
);
if (credentialChanged) {
// Credential rotation — update chain, preserve chainId
await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: {
credentials: newChain.credentials,
lastSeenAt: newChain.lastSeenAt
},
$push: {
scanVersions: newChain.scanVersions[0],
credentialHistory: {
timestamp: newChain.lastSeenAt,
oldCredentials: existing.credentials,
newCredentials: newChain.credentials
}
}
}
);
} else {
// No changes — just update lastSeenAt
await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: { lastSeenAt: newChain.lastSeenAt },
$push: { scanVersions: newChain.scanVersions[0] }
}
);
}
} else {
// New chain — insert with history initialized
await this.db.collection('automation_chains').insertOne({
tenant_id: tenantId,
chain_id: newChain.chainId,
...newChain,
credentialHistory: []
});
}
}
}
}
Option A Pros/Cons
Pros:
- Connector has full context — knows BR calls SI, knows trigger table, knows destination
- Chain ID computed deterministically from semantic properties
- Works with monolithic connector (Entra-ServiceNow) TODAY
- Backward compatible (platform ignores chainMetadata if not present)
Cons:
- Breaks down with separate connectors — ServiceNow connector can't compute full chain without Azure SP
- Connector must maintain chain ID stability logic (complex, error-prone)
- What if two BRs share the same SI? Which BR is the anchor?
- Requires connector to understand "what makes a chain the same" — this is business logic leaking into extraction layer
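The anchor question has a practical edge: `business_rules[0]` depends on the order in which the rules were discovered, and this document does not establish that the order is stable across scans. A minimal sketch of one possible mitigation, assuming a deterministic pick (here the smallest sys_id) is acceptable; `pick_anchor` is a hypothetical helper, and emitting one chain per entry point would be an equally valid alternative.

```python
from typing import Optional


def pick_anchor(business_rule_sys_ids: list) -> Optional[str]:
    """Pick the anchor independently of discovery order (smallest sys_id)."""
    return min(business_rule_sys_ids) if business_rule_sys_ids else None


first_scan = ["br-b", "br-a"]   # API happened to return br-b first
second_scan = ["br-a", "br-b"]  # same rules, different order

# Taking element 0 gives a different anchor per scan; the sorted pick does not.
order_dependent = first_scan[0] == second_scan[0]
order_independent = pick_anchor(first_scan) == pick_anchor(second_scan)
print(order_dependent, order_independent)
```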
Option B: Platform Assembles Chains from Edges
Approach: Connectors emit nodes + edges only. Platform builds chains via graph traversal.
NormalizedGraph (no changes):
// Existing schema — nodes + edges only
interface NormalizedGraph {
nodes: NormalizedNode[];
edges: NormalizedEdge[];
// No chainMetadata field
}
Platform chain assembler:
// sv0-platform/src/ingestion/chain-assembler.ts
class ChainAssembler {
/**
* Build automation chains from ingested graph via BFS traversal.
*
* Algorithm:
* 1. Find all automation entry points (nodes with TRIGGERS_ON edges)
* 2. For each entry point, BFS traverse:
* - Follow CALLS edges (BR → SI, SI → SI)
* - Follow EXECUTES_ON edges (SI → REST Message, BR → REST Message)
* - Follow AUTHENTICATES_VIA edges (REST Message → OAuth)
* - Follow AUTHENTICATES_TO edges (SP → OAuth)
* 3. Each traversal produces a chain subgraph
* 4. Compute chain ID from anchor + trigger + destination
*/
async assembleChains(tenantId: string, syncId: string): Promise<void> {
const entryPoints = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
'properties.identitySubtype': {
$in: ['business_rule', 'flow_designer_flow', 'scheduled_job']
}
});
for (const entryNode of entryPoints) {
const chain = await this._traverseChain(entryNode);
if (!chain) continue;
// Compute stable chain ID
const chainId = this._computeChainId(chain);
// Merge with existing chain
await this._mergeChain(tenantId, chainId, chain, syncId);
}
}
private async _traverseChain(
entryNode: NormalizedNode
): Promise<ChainSubgraph | null> {
const visited = new Set<string>();
const subgraph: ChainSubgraph = {
entryPoint: entryNode,
executors: [],
resources: [],
credentials: [],
crossSystemIdentities: [],
edges: []
};
// BFS from entry point
const queue = [entryNode];
while (queue.length > 0) {
const current = queue.shift()!;
if (visited.has(current.nodeId)) continue;
visited.add(current.nodeId);
// Get outgoing edges
const edges = await this.storage.getEdgesForNode(
current.nodeId,
'outgoing'
);
for (const edge of edges) {
subgraph.edges.push(edge);
const target = await this.storage.getNode(edge.targetNodeId);
if (!target) continue;
// Classify target and continue traversal
if (edge.edgeType === 'CALLS') {
// Code invocation — continue chain
subgraph.executors.push(target);
queue.push(target);
} else if (edge.edgeType === 'EXECUTES_ON') {
// Resource access — terminal or continuation
if (target.nodeType === 'resource') {
subgraph.resources.push(target);
// If REST Message, look for AUTHENTICATES_VIA
const authEdges = await this.storage.getEdgesForNode(
target.nodeId,
'outgoing'
);
const authViaEdge = authEdges.find(
e => e.edgeType === 'AUTHENTICATES_VIA'
);
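if (authViaEdge) {
// Record the OAuth credential; subgraph.credentials was never populated before
const credential = await this.storage.getNode(authViaEdge.targetNodeId);
if (credential) {
subgraph.credentials.push(credential);
// AUTHENTICATES_TO runs SP → OAuth, so the SP sits on an incoming edge
// (assumes getEdgesForNode also accepts 'incoming')
const incoming = await this.storage.getEdgesForNode(credential.nodeId, 'incoming');
for (const authTo of incoming.filter(e => e.edgeType === 'AUTHENTICATES_TO')) {
const sp = await this.storage.getNode(authTo.sourceNodeId);
if (sp) subgraph.crossSystemIdentities.push(sp);
}
}
}
}
}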
}
}
return subgraph.credentials.length > 0 || subgraph.resources.length > 0
? subgraph
: null; // Incomplete chain — missing destination
}
private _computeChainId(chain: ChainSubgraph): string {
// Stable anchor: entry point source_id (BR/Flow sys_id)
const anchor = chain.entryPoint.sourceId;
// Trigger pattern: tables from TRIGGERS_ON edges
const triggerTables = chain.edges
.filter(e => e.edgeType === 'TRIGGERS_ON')
.map(e => e.targetNodeId) // Table resource node IDs
.sort()
.join(',');
// Destination pattern: REST Message endpoints or SP appIds
const destinations = [
...chain.resources
.filter(r => r.properties.resourceType === 'rest_message')
.map(r => r.properties.endpoint_url as string),
...chain.crossSystemIdentities
.map(sp => sp.properties.appId as string)
]
.filter(Boolean)
.sort()
.join(',');
// Hash(anchor + triggers + destinations)
const input = `${anchor}|${triggerTables}|${destinations}`;
return crypto.createHash('sha256').update(input).digest('hex').slice(0, 16);
}
private async _mergeChain(
tenantId: string,
chainId: string,
chain: ChainSubgraph,
syncId: string
): Promise<void> {
const existing = await this.db.collection('automation_chains').findOne({
tenant_id: tenantId,
chain_id: chainId
});
if (existing) {
// Detect entity changes
const credentialIds = chain.credentials.map(c => c.nodeId);
const credentialChanged = !arraysEqual(
existing.credential_node_ids,
credentialIds
);
await this.db.collection('automation_chains').updateOne(
{ _id: existing._id },
{
$set: {
entry_point_node_id: chain.entryPoint.nodeId,
executor_node_ids: chain.executors.map(e => e.nodeId),
resource_node_ids: chain.resources.map(r => r.nodeId),
credential_node_ids: credentialIds,
cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
last_seen_at: new Date(),
last_seen_sync_id: syncId
},
$push: {
scan_versions: syncId,
...(credentialChanged && {
change_history: {
timestamp: new Date(),
change_type: 'credential_rotation',
old_credential_ids: existing.credential_node_ids,
new_credential_ids: credentialIds
}
})
}
}
);
} else {
// New chain
await this.db.collection('automation_chains').insertOne({
tenant_id: tenantId,
chain_id: chainId,
entry_point_node_id: chain.entryPoint.nodeId,
executor_node_ids: chain.executors.map(e => e.nodeId),
resource_node_ids: chain.resources.map(r => r.nodeId),
credential_node_ids: chain.credentials.map(c => c.nodeId),
cross_system_identity_node_ids: chain.crossSystemIdentities.map(i => i.nodeId),
first_seen_at: new Date(),
last_seen_at: new Date(),
last_seen_sync_id: syncId,
scan_versions: [syncId],
change_history: []
});
}
}
}
interface ChainSubgraph {
entryPoint: NormalizedNode;
executors: NormalizedNode[]; // Script Includes
resources: NormalizedNode[]; // REST Messages, tables
credentials: NormalizedNode[]; // OAuth entities
crossSystemIdentities: NormalizedNode[]; // Azure SPs
edges: NormalizedEdge[];
}
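The traversal above can also be sketched over an in-memory edge list, which makes the edge-direction handling easy to check. This is a simplified illustration in Python, not the platform assembler: the tuple-based edge format and the `traverse_chain` name are assumptions, and the node IDs come from the Monday/Tuesday example in section 1.

```python
from collections import deque

# Edge tuples: (edge_type, source_node_id, target_node_id).
EDGES = [
    ("CALLS", "sn-br-abc", "sn-si-def"),
    ("EXECUTES_ON", "sn-si-def", "sn-restmsg-ghi"),
    ("AUTHENTICATES_VIA", "sn-restmsg-ghi", "sn-oauth-jkl"),
    ("AUTHENTICATES_TO", "entra-sp-xyz", "sn-oauth-jkl"),  # SP -> OAuth
]

# Forward edge types and the chain bucket their target belongs to.
FORWARD = {"CALLS": "executors", "EXECUTES_ON": "resources",
           "AUTHENTICATES_VIA": "credentials"}


def traverse_chain(entry_id: str, edges: list) -> dict:
    """BFS from an entry point, collecting chain members by role."""
    chain = {"entry_point": entry_id, "executors": [], "resources": [],
             "credentials": [], "cross_system_identities": []}
    visited, queue = {entry_id}, deque([entry_id])
    while queue:
        current = queue.popleft()
        for edge_type, source, target in edges:
            # Forward edges continue the chain from the current node.
            if source == current and edge_type in FORWARD and target not in visited:
                visited.add(target)
                chain[FORWARD[edge_type]].append(target)
                queue.append(target)
            # AUTHENTICATES_TO points SP -> OAuth, so it is followed in reverse.
            elif target == current and edge_type == "AUTHENTICATES_TO" and source not in visited:
                visited.add(source)
                chain["cross_system_identities"].append(source)
    return chain


full = traverse_chain("sn-br-abc", EDGES)
partial = traverse_chain("sn-br-abc", EDGES[:3])  # before the Entra sync
print(full["cross_system_identities"], partial["cross_system_identities"])
```

The partial run shows the Monday state: the chain is assembled up to the OAuth credential, with no cross-system identity yet.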
Option B Pros/Cons
Pros:
- Connector simplicity — connectors just emit nodes + edges, no chain logic
- Works with separate connectors — platform assembles chains when both sides are present
- Handles partial graphs — if ServiceNow syncs but Entra doesn't, platform still builds "incomplete" chain and waits for Entra data
- Platform owns stability — chain ID computation in one place, easier to debug/evolve
- No breaking changes — existing connectors work, chain assembly is additive
Cons:
- Platform must traverse graphs (CPU cost, complexity)
- Chain assembly happens after ingestion — can't validate chains at ingestion time
- BFS traversal logic must be kept in sync with edge semantics
- What if platform traversal logic has bugs? Connector can't verify chain was assembled correctly
Option C: Hybrid (Connector Hints, Platform Assembles)
Approach: Connectors emit lightweight chain hints in node properties, platform uses hints + edges to assemble chains.
Node properties extension:
interface AutonomousIdentityProperties {
identitySubtype: 'business_rule' | 'flow_designer_flow' | ...;
// NEW: Chain membership hints (optional, non-breaking)
chainMembership?: {
role: 'entry_point' | 'executor' | 'credential' | 'destination';
anchorEntityId?: string; // Entry point source_id if known
chainSemanticHash?: string; // Hash of trigger+destination pattern
};
}
Connector emits hints (transformer.py):
def _process_execution_chain(self, chain: ExecutionChain) -> None:
    # Compute chain semantic hash
    trigger_table = chain.trigger_info.get('table', '')
    endpoint = chain.rest_message.get('endpoint', '')
    chain_hash = hashlib.sha256(f"{trigger_table}:{endpoint}".encode()).hexdigest()[:8]
    # Entry point (BR)
    for br in chain.business_rules:
        br_node = self._add_node(
            node_id=f"sn-br-{br['sys_id']}",
            properties={
                "identitySubtype": "business_rule",
                "chainMembership": {
                    "role": "entry_point",
                    "anchorEntityId": br['sys_id'],
                    "chainSemanticHash": chain_hash
                }
            }
        )
    # Executor (SI)
    for si in chain.script_includes:
        si_node = self._add_node(
            node_id=f"sn-si-{si['sys_id']}",
            properties={
                "identitySubtype": "system_execution",
                "chainMembership": {
                    "role": "executor",
                    "anchorEntityId": chain.business_rules[0]['sys_id'] if chain.business_rules else None,
                    "chainSemanticHash": chain_hash
                }
            }
        )
    # ... similar for credentials, destinations
Platform uses hints to optimize assembly:
class ChainAssembler {
async assembleChains(tenantId: string, syncId: string): Promise<void> {
// Fast path: group nodes by chainSemanticHash
const nodesByChainHash = await this.storage.queryNodes({
tenant_id: tenantId,
'properties.chainMembership.chainSemanticHash': { $exists: true }
}).then(nodes => groupBy(nodes, n =>
n.properties.chainMembership?.chainSemanticHash
));
for (const [chainHash, members] of Object.entries(nodesByChainHash)) {
// Hint-based grouping; verify with edge traversal
const entryPoint = members.find(
m => m.properties.chainMembership?.role === 'entry_point'
);
if (!entryPoint) continue;
// Traverse edges to confirm chain structure; skip hints the edges do not support
const chain = await this._traverseChain(entryPoint);
if (!chain) continue;
// Compute stable chain ID from anchor + pattern
const anchor = entryPoint.properties.chainMembership?.anchorEntityId
|| entryPoint.sourceId;
const chainId = crypto.createHash('sha256')
.update(`${anchor}|${chainHash}`)
.digest('hex')
.slice(0, 16);
// _mergeChain takes the sync ID as its fourth argument
await this._mergeChain(tenantId, chainId, chain, syncId);
}
}
}
Option C Pros/Cons
Pros:
- Connector provides hints (fast grouping), platform verifies via edges (correctness)
- Backward compatible — nodes without hints still work, platform falls back to pure traversal
- Partial connector support — ServiceNow connector can emit hints, Entra connector doesn't need to
- Platform can detect hint-edge mismatches (data quality signal)
Cons:
- More complex — two code paths (hint-based + edge-based)
- Hints can go stale if connector logic changes but platform doesn't update
- Duplication — chain pattern encoded in both hints and edge structure
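The hint-edge mismatch check listed under Pros could look roughly like the sketch below: group nodes by chainSemanticHash, walk the edges from the hinted entry point, and report hinted members the walk never reaches. `find_hint_mismatches` and `hinted` are hypothetical helpers, and treating edges as undirected for reachability is an assumption made so that SP → OAuth edges still connect the SP to the chain.

```python
def find_hint_mismatches(nodes: list, edges: list) -> dict:
    """Map each chainSemanticHash to hinted members the edges do not reach."""
    members_by_hash = {}
    for node in nodes:
        hint = node.get("properties", {}).get("chainMembership")
        if hint and hint.get("chainSemanticHash"):
            members_by_hash.setdefault(hint["chainSemanticHash"], []).append(node)

    # Undirected adjacency: membership is about connectivity, not direction.
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge["sourceNodeId"], []).append(edge["targetNodeId"])
        adjacency.setdefault(edge["targetNodeId"], []).append(edge["sourceNodeId"])

    mismatches = {}
    for chain_hash, members in members_by_hash.items():
        entry = next((m for m in members
                      if m["properties"]["chainMembership"].get("role") == "entry_point"), None)
        if entry is None:
            # No entry point hinted: every member of the group is suspect.
            mismatches[chain_hash] = sorted(m["nodeId"] for m in members)
            continue
        reachable, stack = set(), [entry["nodeId"]]
        while stack:
            current = stack.pop()
            if current not in reachable:
                reachable.add(current)
                stack.extend(adjacency.get(current, []))
        unreachable = sorted(m["nodeId"] for m in members if m["nodeId"] not in reachable)
        if unreachable:
            mismatches[chain_hash] = unreachable
    return mismatches


def hinted(node_id: str, role: str, chain_hash: str) -> dict:
    """Build a node carrying only a chainMembership hint."""
    return {"nodeId": node_id,
            "properties": {"chainMembership": {"role": role, "chainSemanticHash": chain_hash}}}


nodes = [hinted("sn-br-abc", "entry_point", "h1"),
         hinted("sn-si-def", "executor", "h1"),
         hinted("sn-oauth-jkl", "credential", "h1")]
# The REST Message hop is missing, so the OAuth node is hinted but unreachable.
edges = [{"edgeType": "CALLS", "sourceNodeId": "sn-br-abc", "targetNodeId": "sn-si-def"}]
mismatches = find_hint_mismatches(nodes, edges)
print(mismatches)
```

A non-empty result is the data quality signal: either the connector's hints are stale or an edge was not emitted.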
3. Connector vs Platform Responsibility
Separation of Concerns
| Concern | Connector | Platform | Rationale |
|---|---|---|---|
| Entity discovery | ✓ | | Connector knows source API structure |
| Relationship resolution | ✓ | | Connector has all data in memory |
| Cross-system matching | ✓ | | OAuth.client_id → SP.app_id is connector domain knowledge |
| Chain identification | Hints only | ✓ | Platform sees all connectors, handles partial graphs |
| Chain stability | | ✓ | Platform has temporal view, connector sees one snapshot |
| Chain history | | ✓ | Platform stores baselines, connector is stateless |
Recommended boundary:
Connector: Discover entities → Resolve edges → Emit NormalizedGraph + optional chain hints
Platform: Ingest graph → Assemble chains → Track stability → Expose via API
Why Platform Should Own Chain Assembly
Reason 1: Partial graph handling
When ServiceNow and Entra connectors run at different times:
Monday: ServiceNow sync → BR + SI + REST + OAuth
Platform assembles: "Incomplete chain (no Azure SP yet)"
Tuesday: Entra sync → SP
Platform assembles: "Complete chain (OAuth → SP link now present)"
Platform merges: Same chain_id, now complete
Connector can't handle this — it only sees its own data.
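For the Monday and Tuesday observations to merge under the same chain_id, the ID has to be computable from the ServiceNow side alone. Note that `_computeChainId` in Option B, as written, also hashes SP appIds, so its output would differ before and after the Entra sync. The sketch below assumes the ID is restricted to anchor, trigger tables, and REST endpoints; the function names, the in-memory `store`, and the sample endpoint are illustrative.

```python
import hashlib


def servicenow_side_chain_id(anchor: str, trigger_tables: list, endpoints: list) -> str:
    """Chain ID from fields the ServiceNow sync alone provides (an assumption)."""
    material = f"{anchor}|{','.join(sorted(trigger_tables))}|{','.join(sorted(endpoints))}"
    return hashlib.sha256(material.encode()).hexdigest()[:16]


def merge_observation(store: dict, chain_id: str, sync_id: str, sp_node_ids: list) -> dict:
    """Upsert a chain record; mark it complete once a matching SP is known."""
    record = store.setdefault(chain_id, {
        "chain_id": chain_id, "status": "incomplete",
        "cross_system_identity_node_ids": [], "scan_versions": [],
    })
    record["scan_versions"].append(sync_id)
    if sp_node_ids:
        record["cross_system_identity_node_ids"] = sorted(sp_node_ids)
        record["status"] = "complete"
    return record


store = {}
cid = servicenow_side_chain_id("abc", ["incident"], ["https://graph.example.com/route"])

# Monday: ServiceNow sync only, no SP yet.
monday_status = merge_observation(store, cid, "sync-monday", [])["status"]
# Tuesday: Entra sync supplies the SP; same chain_id, record becomes complete.
tuesday = merge_observation(store, cid, "sync-tuesday", ["entra-sp-xyz"])
print(monday_status, tuesday["status"], len(store))
```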
Reason 2: Multi-connector chains
Future: GitHub Actions → AWS Lambda → Slack
github-connector: GH Action → GH OIDC Token
aws-connector: Lambda Function → IAM Role
slack-connector: Slack Bot Token
No single connector can emit the full chain. Platform must stitch.
Reason 3: Chain evolution
Scan 1: BR → SI → REST → OAuth-v1 → SP
Scan 2: BR → SI → REST → OAuth-v2 → SP (credential rotated)
Scan 3: BR → SI-updated → REST → OAuth-v2 → SP (SI script changed)
Platform must decide: are these the same chain? Connector sees one scan only.
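One way to frame that decision is to first label what changed between two observations of a chain, then apply policy to the labels. The sketch below is an assumption-heavy illustration: it supposes each member carries a content hash (not a field the document defines) and distinguishes a member being replaced (different node ID) from being modified (same ID, different content).

```python
ROLES = ("entry_point", "executors", "resources", "credentials", "destinations")


def classify_changes(previous: dict, current: dict) -> list:
    """Compare two observations of one chain (role -> {node_id: content_hash})."""
    changes = []
    for role in ROLES:
        before, after = previous.get(role, {}), current.get(role, {})
        if set(before) != set(after):
            changes.append(f"{role}_replaced")   # membership changed
        elif before != after:
            changes.append(f"{role}_modified")   # same members, new content
    return changes


scan_1 = {"executors": {"si-def": "hash-a"}, "credentials": {"oauth-v1": "hash-c1"}}
scan_2 = {"executors": {"si-def": "hash-a"}, "credentials": {"oauth-v2": "hash-c2"}}
scan_3 = {"executors": {"si-def": "hash-b"}, "credentials": {"oauth-v2": "hash-c2"}}

rotation = classify_changes(scan_1, scan_2)
script_edit = classify_changes(scan_2, scan_3)
print(rotation, script_edit)
```

Whether an `executors_modified` change keeps the chain's identity (script logic unchanged) or breaks it is a policy question the labels alone do not answer.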
4. Cross-Connector Chain Assembly
Scenario: Separate Entra + ServiceNow Connectors
ServiceNow connector emits:
{
"connectorId": "servicenow_v1",
"nodes": [
{
"nodeId": "sn-br-abc",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "business_rule",
"chainMembership": {
"role": "entry_point",
"anchorEntityId": "abc",
"chainSemanticHash": "tr-incident-ep-graph"
}
}
},
{
"nodeId": "sn-oauth-jkl",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "oauth_app",
"clientId": "abc-123-xyz",
"chainMembership": {
"role": "credential",
"anchorEntityId": "abc",
"chainSemanticHash": "tr-incident-ep-graph"
}
}
}
],
"edges": [
{
"edgeType": "AUTHENTICATES_VIA",
"sourceNodeId": "sn-restmsg-ghi",
"targetNodeId": "sn-oauth-jkl"
}
]
}
Entra connector emits:
{
"connectorId": "entra_id_v1",
"nodes": [
{
"nodeId": "entra-sp-xyz",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "service_principal",
"appId": "abc-123-xyz"
}
}
],
"edges": []
}
Platform stitching:
// After Entra sync completes
class CrossSystemStitcher {
async stitchAuthenticationEdges(tenantId: string): Promise<void> {
// Find OAuth entities with clientId
const oauthNodes = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
'properties.identitySubtype': 'oauth_app',
'properties.clientId': { $exists: true }
});
// Find SPs with matching appId
const spNodes = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'autonomous_identity',
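'properties.identitySubtype': 'service_principal',
// Require appId so the toLowerCase() call below cannot hit undefined
'properties.appId': { $exists: true }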
});
const spByAppId = new Map(
spNodes.map(sp => [
(sp.properties.appId as string).toLowerCase(),
sp
])
);
for (const oauthNode of oauthNodes) {
const clientId = (oauthNode.properties.clientId as string).toLowerCase();
const matchedSp = spByAppId.get(clientId);
if (matchedSp) {
// Create AUTHENTICATES_TO edge (SP → OAuth)
await this.storage.upsertEdge({
edgeId: `AUTHENTICATES_TO:${matchedSp.nodeId}:${oauthNode.nodeId}`,
edgeType: 'AUTHENTICATES_TO',
sourceNodeId: matchedSp.nodeId,
targetNodeId: oauthNode.nodeId,
properties: {
evidenceReferences: {
matchingField: 'client_id',
matchingValue: clientId,
issuingSystemId: matchedSp.properties.appId,
targetSystemId: clientId
}
}
});
// Copy chain hints from OAuth to SP
if (oauthNode.properties.chainMembership) {
await this.storage.updateNode(matchedSp.nodeId, {
'properties.chainMembership': {
role: 'destination',
anchorEntityId: oauthNode.properties.chainMembership.anchorEntityId,
chainSemanticHash: oauthNode.properties.chainMembership.chainSemanticHash
}
});
}
}
}
}
}
Result: Platform creates cross-system edges after both connectors run, then reassembles chains to include the new SP nodes.
5. NormalizedGraph Schema Changes
Backward Compatibility Strategy
Phase 1: Additive properties (non-breaking)
Add optional chainMembership to node properties:
// types.ts
interface AutonomousIdentityProperties {
identitySubtype: string;
// ... existing properties
// NEW (optional, backward compatible)
chainMembership?: {
role: 'entry_point' | 'executor' | 'credential' | 'destination';
anchorEntityId?: string;
chainSemanticHash?: string;
};
}
Phase 2: Chain metadata block (optional)
Add top-level chainMetadata array:
interface NormalizedGraph {
// ... existing fields
// NEW (optional, backward compatible)
chainMetadata?: ChainDefinition[];
}
Phase 3: Platform assembly (future)
Once chain assembler is stable, deprecate connector-emitted chainMetadata:
// Deprecation notice in docs:
// @deprecated Connectors should not emit chainMetadata.
// The platform assembles chains from nodes + edges.
// This field is ignored as of sv0-platform v2.0.
Validation Strategy
Connector-side validation:
# In transformer._enrich_automation_properties()
if node.get('properties', {}).get('chainMembership'):
    # Validate role is one of allowed values
    role = node['properties']['chainMembership'].get('role')
    if role not in ('entry_point', 'executor', 'credential', 'destination'):
        logger.warning(f"Invalid chainMembership role: {role}")
        del node['properties']['chainMembership']
Platform-side validation:
// In ingestion/validator.ts
function validateChainMembership(node: NormalizedNode): ValidationResult {
const membership = node.properties.chainMembership;
if (!membership) return { valid: true };
// Check role consistency
const expectedRoles: Record<string, string[]> = {
'business_rule': ['entry_point'],
'flow_designer_flow': ['entry_point'],
'system_execution': ['executor'],
'oauth_app': ['credential'],
'service_principal': ['destination']
};
const subtype = node.properties.identitySubtype as string;
const allowedRoles = expectedRoles[subtype] || [];
if (!allowedRoles.includes(membership.role)) {
return {
valid: false,
error: `Invalid chainMembership.role "${membership.role}" for subtype "${subtype}"`
};
}
return { valid: true };
}
6. Incremental Sync & Chain Updates
Problem: Audit Log Syncs Don't Re-Send Full Chains
When running in syncMode: 'audit_log':
// Entra audit log sync
{
"auditRecords": [
{
"operation": "Remove service principal credentials",
"targetResources": [
{"id": "sp-xyz", "type": "servicePrincipal"}
],
"modifiedProperties": [
{"name": "KeyCredential", "oldValue": "cert-abc", "newValue": null}
]
}
]
}
The connector emits:
- GraphEvent with eventType: 'credential_deleted'
- Updated NormalizedNode for the SP (minus the deleted credential)
But it does not re-send the full chain (BR → SI → REST → OAuth → SP).
Platform challenge: How does the chain tracker know to update the chain's credential list?
Solution: Event-Driven Chain Updates
class ChainTracker {
async processGraphEvent(event: GraphEvent): Promise<void> {
// Find chains containing this entity
const chains = await this.db.collection('automation_chains').find({
tenant_id: event.tenantId,
$or: [
{ entry_point_node_id: event.entityId },
{ executor_node_ids: event.entityId },
{ credential_node_ids: event.entityId },
{ cross_system_identity_node_ids: event.entityId }
]
}).toArray();
for (const chain of chains) {
if (event.eventType === 'credential_deleted') {
// Remove deleted credential from chain
await this.db.collection('automation_chains').updateOne(
{ _id: chain._id },
{
$pull: { credential_node_ids: event.entityId },
$push: {
change_history: {
timestamp: new Date(),
change_type: 'credential_deleted',
entity_id: event.entityId,
sync_id: event.syncId
}
}
}
);
} else if (event.eventType === 'status_changed') {
// Mark chain as potentially inactive
await this.db.collection('automation_chains').updateOne(
{ _id: chain._id },
{
$set: { requires_revalidation: true },
$push: {
change_history: {
timestamp: new Date(),
change_type: 'member_status_changed',
entity_id: event.entityId
}
}
}
);
}
}
}
}
Revalidation strategy:
// Background job: daily chain revalidation
async function revalidateChains(tenantId: string): Promise<void> {
const staleChains = await db.collection('automation_chains').find({
tenant_id: tenantId,
requires_revalidation: true
}).toArray();
for (const chain of staleChains) {
// Re-traverse chain from entry point
const entryNode = await storage.getNode(chain.entry_point_node_id);
if (!entryNode || entryNode.status === 'deleted') {
// Entry point gone — mark chain as deleted
await db.collection('automation_chains').updateOne(
{ _id: chain._id },
{ $set: { status: 'deleted', deleted_at: new Date() } }
);
continue;
}
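// Re-assemble chain; skip if traversal no longer yields a chain
const freshChain = await chainAssembler._traverseChain(entryNode);
if (!freshChain) continue;
// No sync is running here, so reuse the last recorded sync ID, then clear the flag
await chainAssembler._mergeChain(tenantId, chain.chain_id, freshChain, chain.last_seen_sync_id);
await db.collection('automation_chains').updateOne({ _id: chain._id }, { $set: { requires_revalidation: false } });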
}
}
7. Multi-Platform Chains (Future)
Scenario: GitHub Actions → AWS Lambda → Slack
GitHub connector:
{
"nodes": [
{
"nodeId": "gh-workflow-abc",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "github_workflow",
"chainMembership": {
"role": "entry_point",
"anchorEntityId": "workflow-abc",
"chainSemanticHash": "gh-issue-aws-notify"
}
}
},
{
"nodeId": "gh-oidc-token",
"nodeType": "credential",
"properties": {
"credentialSubtype": "oidc_token",
"audience": "sts.amazonaws.com"
}
}
],
"edges": [
{
"edgeType": "AUTHENTICATES_VIA",
"sourceNodeId": "gh-workflow-abc",
"targetNodeId": "gh-oidc-token"
}
]
}
AWS connector:
{
"nodes": [
{
"nodeId": "aws-role-xyz",
"nodeType": "role",
"properties": {
"roleName": "GitHubActionsRole",
"trustPolicy": {
"principal": "token.actions.githubusercontent.com",
"condition": { "StringEquals": { "aud": "sts.amazonaws.com" } }
}
}
},
{
"nodeId": "aws-lambda-notify",
"nodeType": "autonomous_identity",
"properties": {
"identitySubtype": "lambda_function"
}
}
],
"edges": [
{
"edgeType": "HAS_ROLE",
"sourceNodeId": "aws-lambda-notify",
"targetNodeId": "aws-role-xyz"
}
]
}
Slack connector:
{
"nodes": [
{
"nodeId": "slack-bot-token",
"nodeType": "credential",
"properties": {
"credentialSubtype": "api_key"
}
}
]
}
Platform stitching:
// Cross-platform chain assembly
class CrossPlatformStitcher {
async stitchOIDCChains(tenantId: string): Promise<void> {
// Find OIDC tokens with AWS audience
const oidcTokens = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'credential',
'properties.credentialSubtype': 'oidc_token',
'properties.audience': 'sts.amazonaws.com'
});
// Find AWS roles with OIDC trust policies
const awsRoles = await this.storage.queryNodes({
tenant_id: tenantId,
node_type: 'role',
'properties.trustPolicy.principal': { $regex: /github|gitlab/ }
});
for (const token of oidcTokens) {
for (const role of awsRoles) {
// Match based on trust policy conditions
if (this._oidcTrustMatches(token, role)) {
await this.storage.upsertEdge({
edgeId: `DELEGATES_TO:${token.nodeId}:${role.nodeId}`,
edgeType: 'DELEGATES_TO',
sourceNodeId: token.nodeId,
targetNodeId: role.nodeId,
properties: {
delegationType: 'oidc_federation',
evidenceReferences: {
trustPolicy: role.properties.trustPolicy
}
}
});
}
}
}
}
}
Result: Platform creates GH Workflow → OIDC Token -[DELEGATES_TO]-> AWS Role → Lambda chain.
Connector implication: Each connector only knows its own platform. Platform must have cross-platform correlation logic.
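The stitcher above calls `_oidcTrustMatches` without defining it. A minimal sketch of what such a matcher might check, using the simplified trust policy shape from the AWS connector example rather than real IAM policy JSON. The `issuer` token property is a hypothetical addition: the example OIDC token node carries only an audience, and audience alone would match every role that trusts that audience.

```python
def oidc_trust_matches(token: dict, role: dict) -> bool:
    """Check a token node against a role's simplified trust policy."""
    token_props = token.get("properties", {})
    trust = role.get("properties", {}).get("trustPolicy", {})
    if token_props.get("credentialSubtype") != "oidc_token":
        return False
    # Audience must equal the trust policy's StringEquals "aud" condition.
    expected_aud = trust.get("condition", {}).get("StringEquals", {}).get("aud")
    if expected_aud is None or token_props.get("audience") != expected_aud:
        return False
    # Issuer must equal the trusted principal ('issuer' is a hypothetical property).
    issuer = token_props.get("issuer")
    return issuer is not None and issuer == trust.get("principal")


role = {"nodeId": "aws-role-xyz", "properties": {"trustPolicy": {
    "principal": "token.actions.githubusercontent.com",
    "condition": {"StringEquals": {"aud": "sts.amazonaws.com"}}}}}
token = {"nodeId": "gh-oidc-token", "properties": {
    "credentialSubtype": "oidc_token", "audience": "sts.amazonaws.com",
    "issuer": "token.actions.githubusercontent.com"}}
token_without_issuer = {"nodeId": "gh-oidc-token", "properties": {
    "credentialSubtype": "oidc_token", "audience": "sts.amazonaws.com"}}

matched = oidc_trust_matches(token, role)
matched_without_issuer = oidc_trust_matches(token_without_issuer, role)
print(matched, matched_without_issuer)
```

Requiring the issuer makes the match conservative: a token node without it produces no DELEGATES_TO edge, which favors missing a link over inventing one.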
8. Practical Recommendation
Three-Phase Implementation
Phase 1: Node-Level Chain Hints (2-3 days)
Connector work:
- Add chainMembership properties to automation nodes
- Emit anchorEntityId and chainSemanticHash in transformer
Platform work:
- Extend NormalizedNode schema with optional chainMembership property
- No behavioral changes: hints are informational only
Result: Connector emits chain context, platform ignores it (forward compatibility).
Phase 2: Platform Chain Assembler (1-2 weeks)
Platform work:
- Implement ChainAssembler class (BFS traversal from entry points)
- Implement automation_chains collection with chain tracking
- Implement chain stability logic (chain_id computation, credential rotation detection)
- Add /api/v1/tenants/:id/automation-chains endpoint
Connector work:
- None — existing nodes + edges are sufficient
Result: Platform can show "Automation Chains" view with stable IDs across scans.
Phase 3: Incremental Chain Updates (1 week)
Platform work:
- Implement event-driven chain updates (listen to GraphEvent stream)
- Implement chain revalidation background job
- Add change_history array to chains for audit trail
Result: Chains update incrementally without full re-sync.
Total Effort Estimate
| Phase | Component | Effort | Risk |
|---|---|---|---|
| Phase 1 | Connector hints | 2-3 days | Low |
| | Platform schema | 2 hours | None |
| Phase 2 | Chain assembler | 5 days | Medium (graph traversal complexity) |
| | Chain tracker | 3 days | Low |
| | API endpoints | 2 days | Low |
| Phase 3 | Event-driven updates | 3 days | Medium (incremental correctness) |
| | Revalidation job | 2 days | Low |
| Total | | ~3-4 weeks | Medium |
Why Not Separate Collection?
Reason 1: Chains are derived, not source entities
Chains are computed from nodes + edges. They're not discovered by connectors — they're assembled by the platform.
Analogy: A "blast radius" query doesn't create a blast_radius collection — it's a traversal result. Same with chains.
Reason 2: Storage doesn't solve the identity problem
The question "is this the same chain?" isn't answered by having a separate collection. It's answered by having a stable chain ID algorithm.
Reason 3: Chains change faster than baselines
Chains can change multiple times per day (credential rotation, script updates). Storing them as versioned baselines bloats the database.
Better: Store chains as {chain_id, member_node_ids[], change_history[]} — lightweight tracking, heavy lifting is in the graph.
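The lightweight record described above might look like the following sketch, where membership is overwritten in place and history grows only when membership actually changes. `ChainRecord` and its `observe` method are hypothetical names; the real storage would be the platform's database rather than an in-memory object.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChainRecord:
    chain_id: str
    member_node_ids: list
    change_history: list = field(default_factory=list)

    def observe(self, member_node_ids: list, sync_id: str) -> None:
        """Update membership; append a history entry only when it changed."""
        if sorted(member_node_ids) == sorted(self.member_node_ids):
            return  # unchanged scan: nothing stored
        self.change_history.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "sync_id": sync_id,
            "removed": sorted(set(self.member_node_ids) - set(member_node_ids)),
            "added": sorted(set(member_node_ids) - set(self.member_node_ids)),
        })
        self.member_node_ids = list(member_node_ids)


record = ChainRecord("chain-1", ["sn-br-abc", "sn-oauth-v1", "entra-sp-xyz"])
record.observe(["sn-br-abc", "sn-oauth-v1", "entra-sp-xyz"], "sync-2")  # no change
record.observe(["sn-br-abc", "sn-oauth-v2", "entra-sp-xyz"], "sync-3")  # rotation
print(len(record.change_history), record.change_history[0]["added"])
```

Unchanged scans cost nothing, and a rotation costs one small history entry, which is the storage behavior the versioned-baseline approach cannot offer.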
Files Produced
/Users/lucky/dev/securityv0/sv0-documentation/docs/analysis/2026-02-12-automation-classification/03-automation-persistence-integrator.md (this document)
Next Steps
- Architect review — Is the chain ID stability algorithm correct?
- Developer review — Does the platform schema design make sense?
- CEO decision — Phase 1 only? Or commit to full 3-phase implementation?
- If approved: Create ADR for chain tracking architecture