
Critical Architectural Review: SaaS Connector ETL Pipeline

Date: 2026-02-20
Reviewer: Staff+ Software Architect
Focus: CISO Security Perspective (Execution-risk + Auditability) under SaaS Constraints

A. High-Level Architecture Map

Components:

  1. Source System (ServiceNow): Contains identities (sys_user), workloads (sys_script, sysauto_script), outbound integrations (sys_rest_message), and credentials (oauth_entity). Constraint: Read-only access via API. No custom instrumentation or structural changes allowed.
  2. Target System (Azure AI Foundry / Fabric): Ingress point for ServiceNow calls. Authenticates via Service Principals (SPN) / Managed Identities. Executes jobs, maintains agents. Constraint: Read-only access via API.
  3. Execution/Storage (SV0 Platform): Ingests the Normalized Graph via REST API (/api/v1/ingest/normalized-graph), applies diffs, evaluates rules, and generates SHA256-hashed evidence_packs.

SaaS Visibility Boundaries: As a SaaS provider, we are bounded by the default telemetry emitted by customer environments. We cannot force customers to inject X-Correlation-ID headers or alter their core logging levels simply to satisfy our graph linkage.


B. Execution-Path Walkthrough (SaaS Reality)

  1. Trigger: A ServiceNow Business Rule fires on a table event.
    • Evidence: SN sys_audit or sysevent tables (if enabled for the target table).
  2. Workload Execution: The Business Rule calls a Script Include, which invokes a sys_rest_message.
    • Evidence: Difficult to prove definitively without outbound HTTP logs. We rely on structural scraping (the script can call this endpoint) rather than runtime proof (the script did call this endpoint).
  3. Outbound Call (Egress) -> Azure Ingress: The REST Message uses an oauth_profile to get a token and calls Azure.
    • Evidence: Entra ID Sign-in logs for the corresponding Service Principal (client_id).
  4. Platform Ingestion: Azure Connector constructs the NormalizedGraph representing the Azure state; ServiceNow Connector constructs the NormalizedGraph representing the SN state.
  5. Correlation (The SaaS Hop): Because we lack a shared trace ID across the SN-to-Azure boundary, correlation is probabilistic based on configuration rather than deterministic based on execution. We match the SN OAuth Entity client_id to the Entra ID Service Principal appId.
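
A minimal sketch of the client_id-to-appId match in step 5, assuming both connectors hand over flat records. The OAuthEntity and ServicePrincipal classes and the match_by_client_id function are hypothetical names for illustration, not the actual correlator.py interface. Comparing the identifiers case-insensitively is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OAuthEntity:
    """Hypothetical projection of a ServiceNow oauth_entity record."""
    sys_id: str
    client_id: str


@dataclass(frozen=True)
class ServicePrincipal:
    """Hypothetical projection of an Entra ID Service Principal."""
    object_id: str
    app_id: str


def _normalize(identifier: str) -> str:
    # Assumption: identifiers are GUID-like, so case and padding are not significant.
    return identifier.strip().lower()


def match_by_client_id(entities, principals):
    """Return (entity, principal) pairs where client_id equals appId."""
    by_app_id = {_normalize(p.app_id): p for p in principals}
    pairs = []
    for entity in entities:
        principal = by_app_id.get(_normalize(entity.client_id))
        if principal is not None:
            pairs.append((entity, principal))
    return pairs
```

Each returned pair is a structural link only: it shows that the ServiceNow tenant holds a credential for the Azure identity, not that a particular call was made.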

C. Evidence Gaps & The "Surfaced Uncertainty" Strategy

Since we cannot mandate configuration changes, we must adopt an "Embrace and Surface Uncertainty" strategy. When execution evidence is weak, the platform must explicitly downgrade the "Evidence Confidence" score of that specific authority_path.

| Evidence Gap | SaaS Reality | Mitigation Strategy (Platform UI & Logic) |
| --- | --- | --- |
| 1. No Outbound Trace ID | We cannot force SN scripts to send a trace ID to Azure. The exact user trigger cannot be cryptographically linked to the exact Azure execution. | Surface the Gap: UI must display "Correlation: Structural (Implicit)" rather than "Deterministic". We prove the capability exists, not the exact causal chain per run. |
| 2. Unreliable Outbound Payloads | Customers rarely log full HTTP bodies out of ServiceNow due to PII/storage concerns. We cannot see what data was sent. | Compensating Control: Focus on State Diffing. Compare the target state in Fabric before and after the timestamp of the Entra Service Principal login to infer what was changed. |
| 3. Execution Fidelity Loss | We know a script contains .setValue(), but we don't know if that code path actually triggered in production. | Probabilistic Inference: Use time-window correlation. If SN sys_audit shows incident #123 updated at 10:01:00, and the Azure SPN logged in at 10:01:02, draw a dotted "Inferred Execution" edge in the graph. |
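
One way the surfaced-uncertainty labels could be modeled is sketched here. The three level names are the ones this review proposes for evidenceConfidence; the classify_path function, its boolean inputs, the UI_HINTS mapping, and the DETERMINISTIC label text are assumptions made for illustration.

```python
from enum import Enum


class EvidenceConfidence(str, Enum):
    STRUCTURAL = "STRUCTURAL"
    TEMPORAL_INFERRED = "TEMPORAL_INFERRED"
    DETERMINISTIC = "DETERMINISTIC"


# Hypothetical UI hints. The STRUCTURAL label and the "Inferred Execution"
# wording come from this review; the rest is assumed.
UI_HINTS = {
    EvidenceConfidence.STRUCTURAL: {
        "label": "Correlation: Structural (Implicit)", "line": "solid"},
    EvidenceConfidence.TEMPORAL_INFERRED: {
        "label": "Inferred Execution", "line": "dashed"},
    EvidenceConfidence.DETERMINISTIC: {
        "label": "Correlation: Deterministic", "line": "solid"},
}


def classify_path(has_shared_trace_id: bool,
                  has_time_window_match: bool) -> EvidenceConfidence:
    """Pick the label for an authority_path from the evidence available.

    Assumption: only a shared trace ID justifies DETERMINISTIC. A time-window
    match is surfaced as inferred, and everything else stays structural.
    """
    if has_shared_trace_id:
        return EvidenceConfidence.DETERMINISTIC
    if has_time_window_match:
        return EvidenceConfidence.TEMPORAL_INFERRED
    return EvidenceConfidence.STRUCTURAL
```

Under the constraints described above, has_shared_trace_id would be false for almost every path, so the UI would rarely claim more than the evidence supports.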

D. Deterministic Correlation Strategy (Without Instrumentation)

Since we cannot rely on standard Span IDs propagated across customer systems, our correlation engine (correlator.py) must be hardened to use environmental anchors:

  1. Identity Anchoring (Strong): The primary linchpin is the OAuth client_id. If a ServiceNow oauth_entity holds the credential for Entra ID Application X, any execution in Azure by X provides a strong structural link back to the ServiceNow tenant.
  2. Temporal Slicing (Weak/Probabilistic):
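    • Connector A (ServiceNow) pulls the sys_updated_on for business rules. This field records when the rule record was last modified, not when the rule fired, so the SN event timestamp used for matching has to come from sys_audit or sysevent where those are enabled.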
    • Connector B (Azure) pulls the last_sign_in_at for Service Principals.
    • If a business rule triggers, resulting in an outbound call, we attempt to match the SN event timestamp with the Entra sign-in timestamp (within a 5-minute sliding window).
    • UI Representation: These temporal links must be visually distinct (e.g., dashed lines) from deterministic structural links (solid lines) in the SV0 Platform UI.
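
The temporal slicing above can be sketched as follows. This is an illustration under stated assumptions, not the correlator.py implementation: the infer_temporal_edges function and the tuple shapes are hypothetical, and the sketch assumes the Entra sign-in happens at or after the ServiceNow event, never before it.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # window size stated in this review


def infer_temporal_edges(sn_events, sign_ins, window=WINDOW):
    """Pair each SN event with sign-ins that follow it within `window`.

    sn_events: list of (event_id, datetime) from ServiceNow
    sign_ins:  list of (sign_in_id, datetime) from Entra ID
    Returns a list of edge dicts tagged TEMPORAL_INFERRED.
    """
    edges = []
    ordered = sorted(sign_ins, key=lambda item: item[1])
    for event_id, event_ts in sn_events:
        for sign_in_id, sign_in_ts in ordered:
            lag = sign_in_ts - event_ts
            # Assumption: a sign-in before the event cannot be caused by it.
            if timedelta(0) <= lag <= window:
                edges.append({
                    "from": event_id,
                    "to": sign_in_id,
                    "lag_seconds": lag.total_seconds(),
                    "confidence": "TEMPORAL_INFERRED",
                })
    return edges


events = [("INC123", datetime(2026, 2, 20, 10, 1, 0))]
logins = [("signin-1", datetime(2026, 2, 20, 10, 1, 2)),
          ("signin-2", datetime(2026, 2, 20, 10, 7, 0))]
inferred = infer_temporal_edges(events, logins)
```

With these inputs only signin-1 is linked, with a lag of two seconds; signin-2 falls outside the window. One event can match several sign-ins, which is why these edges stay probabilistic.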

E. SaaS-Grade ETL Data Correctness

We cannot control the source telemetry, but we must absolutely control our ingestion pipeline's integrity.

Run Manifests & Invariants: Every connector_syncs document requires a cryptographic manifest:

  • Inputs: SHA256 of the raw Extracted Graph received via /api/v1/ingest/normalized-graph.
  • Outputs: Entity/Edge counts, sum of execution_evidence nodes committed to the DB.
  • Invariant: nodesCreated + nodesUpdated + nodesDeleted == Total Discovered Nodes. If this invariant fails, the sync_ingestion queue worker must halt and mark the sync as degraded. This guarantees that none of the data the customer did provide is silently lost.
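
A minimal sketch of such a manifest, assuming the raw payload is available as bytes and the counters arrive in a dict. The build_manifest function and the manifest field names (inputSha256, invariantOk, status) are hypothetical; only the counter names and the invariant itself come from this review.

```python
import hashlib


def build_manifest(raw_graph_bytes: bytes, counts: dict, total_discovered: int) -> dict:
    """Build a run manifest for one sync and evaluate the node invariant."""
    accounted = (counts["nodesCreated"]
                 + counts["nodesUpdated"]
                 + counts["nodesDeleted"])
    invariant_ok = accounted == total_discovered
    return {
        # SHA256 of the payload exactly as received by the ingest endpoint.
        "inputSha256": hashlib.sha256(raw_graph_bytes).hexdigest(),
        "counts": dict(counts),
        "totalDiscoveredNodes": total_discovered,
        "invariantOk": invariant_ok,
        # A failed invariant marks the sync as degraded, as required above.
        "status": "ok" if invariant_ok else "degraded",
    }


manifest = build_manifest(
    b'{"nodes": []}',
    {"nodesCreated": 3, "nodesUpdated": 1, "nodesDeleted": 0},
    total_discovered=5,
)
```

Here the counters account for four nodes against five discovered, so the manifest carries status "degraded" and the worker would be expected to halt rather than commit.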

F. Threat Model: Execution Evidence Fraud

Threat: A malicious actor compromises the customer's ServiceNow instance and alters the sys_audit logs or script definitions to hide unauthorized calls to Azure Fabric.

SaaS Defense:

  1. Immutability of the Target: We pull Entra ID sign-in logs and Azure Activity logs independently of ServiceNow. Even if SN logs are wiped, the Entra SP login and subsequent Fabric actions are recorded on the Microsoft side.
  2. Cross-Checking: The SV0 Platform flags anomalies. If Entra shows 500 logins for SPN 'A', but ServiceNow shows 0 business rule triggers for the associated OAuth entity, an "Evidence Discrepancy" alert is generated.
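
The cross-check can be sketched as a comparison of two per-identity counters. The find_evidence_discrepancies function, the input shapes, and the alert fields are hypothetical, and the rule shown (sign-ins present, zero triggers) is only the simplest case described above.

```python
def find_evidence_discrepancies(entra_sign_ins: dict, sn_triggers: dict) -> list:
    """Flag identities that signed in on the Azure side with no SN activity.

    entra_sign_ins: client_id -> number of Entra sign-ins in the period
    sn_triggers:    client_id -> number of SN business rule triggers
                    attributed to the associated OAuth entity
    """
    alerts = []
    for client_id, sign_in_count in sorted(entra_sign_ins.items()):
        trigger_count = sn_triggers.get(client_id, 0)
        if sign_in_count > 0 and trigger_count == 0:
            alerts.append({
                "type": "Evidence Discrepancy",
                "client_id": client_id,
                "entra_sign_ins": sign_in_count,
                "sn_triggers": trigger_count,
            })
    return alerts


alerts = find_evidence_discrepancies({"A": 500, "B": 3}, {"B": 3})
```

In this example only SPN 'A' is flagged. A production rule would likely need a tolerance rather than a strict zero, since sys_audit coverage depends on what the customer has enabled.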

G. Recommendations & Product Roadmap

1. "Best Effort" Advisor Engine: Build an advisory module within the platform. When viewing an authority path, if evidence is weak, display a contextual recommendation: "To achieve deterministic tracing for this path, advise the customer to enable glide.outbound_http.log.body for REST message X." This turns a gap into a consulting opportunity rather than a technical blocker.

2. Visualizing "Confidence Scores": The NormalizedGraph schema must be updated to include an evidenceConfidence enum (e.g., STRUCTURAL, TEMPORAL_INFERRED, DETERMINISTIC). The frontend UI should use this value to color-code the reliability of the execution chains presented to the auditor.

3. State-Delta Inferencing: Shift focus away from perfect execution tracing. If we can't prove how the data was changed (due to lack of trace IDs), double down on proving what changed by ensuring the Diff Engine in the SV0 platform takes extremely high-fidelity snapshots of the source and target states.
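
A sketch of the kind of snapshot comparison this implies, assuming each snapshot is a mapping from entity ID to a flat attribute dict. The diff_snapshots function and the snapshot shape are hypothetical and do not describe the existing Diff Engine.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two state snapshots (entity_id -> attribute dict).

    Returns created and deleted entity IDs, plus per-attribute
    (old, new) pairs for entities present in both snapshots.
    """
    created = sorted(after.keys() - before.keys())
    deleted = sorted(before.keys() - after.keys())
    updated = {}
    for entity_id in sorted(before.keys() & after.keys()):
        old, new = before[entity_id], after[entity_id]
        changed = {
            key: (old.get(key), new.get(key))
            for key in sorted(old.keys() | new.keys())
            if old.get(key) != new.get(key)
        }
        if changed:
            updated[entity_id] = changed
    return {"created": created, "deleted": deleted, "updated": updated}


before = {"agent-1": {"owner": "spn-a", "enabled": True},
          "job-7": {"schedule": "daily"}}
after = {"agent-1": {"owner": "spn-a", "enabled": False},
         "job-9": {"schedule": "hourly"}}
delta = diff_snapshots(before, after)
```

Taking the before and after snapshots on either side of an Entra Service Principal sign-in lets the delta stand in for the payload that could not be observed.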