
Critical Connector ETL Architecture Review (Execution Evidence & Determinism)

Date: 2026-02-20
Scope: ServiceNow (origin), Azure (transport/compute/identity/monitoring), Microsoft Fabric/Foundry (destination analytics/evidence path)
Review lens: Correctness + security evidence + auditability

Assessment

The current pipeline is not deterministic end-to-end to an audit-grade standard.
Chain of custody breaks at three hops: execution proof (ServiceNow outbound runtime), identity provenance (platform ingest auth), and destination lineage closure (Fabric stage evidence not present in reviewed code path).


A) High-Level Architecture Map

Components

  • ServiceNow discovery + chain inference in connector:
    • ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:785
    • ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193
  • Azure/Foundry discovery in connector:
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/discoverer.py:107
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:533
  • Platform ingest + worker + Mongo:
    • src/api/routes/ingest.ts:145
    • src/workers/runtime.ts:26
    • src/workers/handlers/sync-ingestion.ts:37
  • Evidence pack integrity hashing:
    • src/evidence/integrity.ts:8

Trust Boundaries

  • Customer SaaS boundary (ServiceNow) -> connector runtime.
  • Connector runtime -> Azure APIs (Graph/ARM/Foundry) using client-secret credentials:
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:126
  • Connector -> platform API over HTTP:
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/platform_client.py:65
  • Platform API -> worker runtime -> MongoDB persistence.

Identity Boundaries

  • ServiceNow integration user identity.
  • Azure app/SP identity for discovery API calls.
  • Platform ingest identity fidelity is weak today:
    • API-key callers collapse to shared principal api-key-client:
      • src/api/middleware/auth.ts:55
    • Bearer JWT payload is decoded without signature verification:
      • src/api/middleware/auth.ts:113

Data Lineage Boundaries

  • Source records -> NormalizedGraph.
  • NormalizedGraph -> entities/events/evidence/paths/chains in Mongo:
    • src/workers/handlers/sync-ingestion.ts:75
    • src/workers/handlers/sync-ingestion.ts:78
    • src/workers/handlers/sync-ingestion.ts:130
  • No code-verified Fabric evidence-store write path in reviewed repos (hypothesis requiring environment validation).

B) Execution-Path Walkthrough (One Run)

  1. Trigger starts connector run
  • Principal: scheduler/operator (not persisted in platform sync record).
  • Correlation: random connector syncId.
    • ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:163
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:467
  2. ServiceNow discovery & chain reconstruction
  • Principal: ServiceNow integration user.
  • Authority checks: table ACLs / API role grants.
  • Runtime execution evidence:
    • Deterministic for some sources (sys_flow_context, sys_trigger).
    • Not deterministic for Business Rule (BR) / Script Include (SI) runtime today:
      • ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193
  • Correlation mechanism:
    • Structural caller inference via scriptLIKE (script-text contains) queries:
      • ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:687
  3. Azure/Foundry discovery
  • Principal: Azure app/SP via ClientSecretCredential.
    • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:126
  • Execution evidence quality:
    • Aggregated/non-blocking run summaries; thread timestamp used as proxy:
      • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:533
      • ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:572
  4. Connector submits graph to platform
  • Principal seen by platform:
    • API key -> shared principal (api-key-client):
      • src/api/middleware/auth.ts:55
    • Bearer JWT -> subject taken from unverified token:
      • src/api/middleware/auth.ts:113
  • Request correlation:
    • Server generates per-request UUID, not propagated end-to-end:
      • src/api/middleware/request-id.ts:9
  5. Platform ingest + processing
  • Sync creation and multi-stage write:
    • src/workers/handlers/sync-ingestion.ts:27
    • src/workers/handlers/sync-ingestion.ts:75
    • src/workers/handlers/sync-ingestion.ts:130
  • Principal provenance is not persisted to ConnectorSyncDoc:
    • src/ingestion/transport/ingest-service.ts:45
    • src/domain/syncs/types.ts:41
  6. Destination analytics/evidence store closure
  • No Fabric destination chain implementation confirmed in reviewed code path (hypothesis; requires deployment/runtime validation).
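
The correlation break in step 4 (server-generated request UUID, never propagated) can be narrowed with a small accept-or-mint rule. The sketch below is an assumption-laden illustration, not the platform's middleware: the header name x-request-id comes from this review, while the validation pattern, function names, and record fields are hypothetical.

```python
import re
import uuid

# Hypothetical acceptance rule: short, log-safe identifiers only.
_REQUEST_ID_RE = re.compile(r"^[A-Za-z0-9._-]{8,128}$")


def resolve_request_id(headers: dict[str, str]) -> str:
    """Reuse a well-formed caller-supplied x-request-id, else mint a UUID."""
    for name, value in headers.items():
        candidate = value.strip()
        if name.lower() == "x-request-id" and _REQUEST_ID_RE.fullmatch(candidate):
            return candidate
    return str(uuid.uuid4())


def outbound_headers(request_id: str) -> dict[str, str]:
    """Connector side: attach the run's correlation ID to every platform call."""
    return {"x-request-id": request_id}


def build_sync_record(sync_id: str, headers: dict[str, str], principal_id: str) -> dict:
    """Platform side: persist correlation and submitter on the sync record."""
    return {
        "sync_id": sync_id,
        "request_id": resolve_request_id(headers),
        "principal_id": principal_id,
    }
```

Rejecting malformed inbound IDs (rather than echoing them) keeps attacker-controlled strings out of logs while still letting a well-behaved connector carry one ID from trigger to persisted sync record.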

C) Evidence Gaps (Ranked)

| Gap | Chain-of-Custody Break | Impact | How to Prove Gap Exists Today | Minimal Fix | Best Fix |
|---|---|---|---|---|---|
| JWT not verified; API-key identity collapsed | Submitter identity fidelity | Critical (security + audit) | src/api/middleware/auth.ts:113, src/api/middleware/auth.ts:55 | Verify JWT via JWKS; map each API key to unique principal | mTLS/OIDC workload identity + immutable signed submitter claims in manifest |
| No deterministic SN BR/SI runtime proof | Execution -> outbound call | Critical (auditability) | ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193 | Enable SN outbound logs (elevated) and ingest sys_outbound_http_log | Hybrid proof: outbound logs + Azure independent receipts + structural client_id chain |
| No end-to-end correlation propagation | Run -> logs -> artifacts mapping | Critical (determinism) | src/api/middleware/request-id.ts:9; no request-id headers in connector client | Accept/propagate x-request-id, persist on sync docs | Full trace context (traceparent) + stage span IDs + hash-linked manifests |
| In-memory queue + non-transactional writes | Partial-run correctness | High | src/workers/runtime.ts:26; src/workers/handlers/sync-ingestion.ts:75 | Persist queue and retry state; explicitly emit partial status | Stage checkpoint model with resumable idempotent commits |
| In-memory sync dedupe | Retry/restart idempotency | High | src/ingestion/transport/ingest-service.ts:13 | DB-backed dedupe keyed by run and stage | Full idempotency key strategy + run state machine |
| Execution evidence upsert overwrites prior proof | Historical provenance | High | src/storage/mongo/schema.ts:313; src/storage/mongo/adapters/execution-evidence-adapter.ts:54; stable Foundry summary ID at ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:403 | Include run/window in source_record_id | Append-only evidence events + separate summary rollups |
| Evidence row can persist with empty entity_id; synthetic timestamps | Output -> input linkage | High | src/ingestion/graph-transformer.ts:168; src/ingestion/graph-transformer.ts:207 | Reject/quarantine evidence with unresolved entity/timestamp | Strict evidence schema + DLQ + mandatory source hash |
| Schema drift tolerated (agent_run_summary not in platform union) | Semantics consistency | Medium-high | src/domain/evidence/types.ts:1; connector emits at ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:420 | Extend enum + validate inputs | Versioned schema registry with compatibility gates |
| Time-window boundary nondeterminism | Replay consistency | Medium-high | Date-only SN cutoff ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:1552; Foundry proxy timestamps ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:572 | Explicit UTC window start/end persisted per run | Monotonic watermark ledger + boundary replay tests |
| Fabric destination evidence chain not code-evidenced | End-to-end closure | Medium to critical (depends on deployment) | No corresponding implementation in reviewed repos | Document current trust boundary explicitly | Implement destination receipts + partition/hash attestations |
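
For the "upsert overwrites prior proof" gap, the minimal fix (include run/window in source_record_id) could look roughly like the sketch below. It is a toy model under stated assumptions: the ID layout, the `agent_run_summary:` prefix usage, and the in-memory `AppendOnlyEvidence` class are hypothetical stand-ins for the Mongo adapter, chosen only to show why a window-scoped ID stops one summary from replacing another.

```python
import hashlib


def summary_source_record_id(agent_id: str, window_start_utc: str, window_end_utc: str) -> str:
    """Derive a stable ID that differs per observation window (hypothetical layout)."""
    # The unit separator keeps "a" + "bc" from colliding with "ab" + "c".
    material = "\x1f".join([agent_id, window_start_utc, window_end_utc])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"agent_run_summary:{digest[:32]}"


class AppendOnlyEvidence:
    """In-memory stand-in for an append-only evidence collection."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def append(self, source_record_id: str, payload: dict) -> bool:
        """Insert once; a repeat of the same ID is a no-op, never an overwrite."""
        if source_record_id in self._rows:
            return False
        self._rows[source_record_id] = dict(payload)
        return True

    def get(self, source_record_id: str) -> dict:
        return self._rows[source_record_id]

    def __len__(self) -> int:
        return len(self._rows)
```

With this shape, re-submitting the same window is idempotent, while a later window for the same agent produces a new row and leaves the earlier proof intact.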

D) ServiceNow Outbound Evidence Plan

Evidence required from ServiceNow (per outbound call)

  • sys_id, created_on, response_time, endpoint_url, http_method, http_status, rest_message
  • Optional: payload/body hash fields if policy allows.

Reference requirement and field availability:

  • ../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:619
  • ../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:668

Minimum config/instrumentation changes

  1. Set glide.rest.outbound_log_level=elevated (or all if allowed).
    • ../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:625
  2. Ensure connector read access to sys_outbound_http_log.
  3. Add connector collector for sys_outbound_http_log and emit execution evidence nodes with deterministic source_record_id.
  4. Persist source timestamps from SN logs exactly (no fallback-to-now).
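
Steps 3 and 4 above can be sketched as a single row-to-evidence mapping. This is a hedged illustration, not the connector's collector: the field names are the ones listed in this section, the `source_record_id` layout and function name are hypothetical, and the timestamp format (`YYYY-MM-DD HH:MM:SS`, interpreted as UTC) is an assumption that should be checked against the instance's API settings.

```python
from datetime import datetime, timezone


def to_evidence_node(instance: str, row: dict) -> dict:
    """Map one sys_outbound_http_log row to an execution evidence node."""
    sys_id = str(row.get("sys_id") or "").strip()
    created_on = str(row.get("created_on") or "").strip()
    if not sys_id or not created_on:
        # Quarantine upstream instead of inventing an ID or a timestamp.
        raise ValueError("outbound log row lacks sys_id or created_on")
    # strptime raises ValueError on malformed input; there is no fallback-to-now.
    observed_at = datetime.strptime(created_on, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    return {
        # Deterministic: the same source row always yields the same ID.
        "source_record_id": f"servicenow:{instance}:sys_outbound_http_log:{sys_id}",
        "observed_at": observed_at.isoformat(),
        "endpoint_url": row.get("endpoint_url"),
        "http_method": row.get("http_method"),
        "http_status": row.get("http_status"),
        "rest_message": row.get("rest_message"),
    }
```

Raising on a missing or unparseable timestamp pushes the row to a quarantine path, which is the behavior the "no fallback-to-now" requirement implies.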

Fallback if SN logging cannot be raised

  1. Add Azure-side receipt logs at ingress with canonical request hash, caller principal, timestamp, nonce.
  2. Require signed request envelopes from connector (HMAC/asymmetric signature).
  3. Correlate receipts with Entra sign-ins + SN structural client_id chain and mark confidence level explicitly.

E) Deterministic Verification Plan

Run Manifest Fields

  • run_id, parent_run_id, connector_id, connector_version, tenant_id
  • Trigger: trigger_type, trigger_source_id, triggered_at, trigger_principal
  • Submitter auth: auth_method, principal_id, credential_fingerprint, request_id, trace_id
  • Windows/watermarks: per-source window_start_utc, window_end_utc, watermark_before, watermark_after
  • Input artifacts: URI + sha256 + record counts
  • Stage attestations: stage input/output hashes, counts, timestamps, status, prior-stage hash
  • Outputs: entity/event/evidence/path/chain counts + artifact pointers
  • Integrity: detached signature + key ID
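
As a rough illustration of how the stage attestations and prior-stage hash could fit together, the sketch below models a reduced manifest. It covers only a subset of the fields listed above; the class names, the all-zero genesis hash, and the canonical-JSON hashing convention are assumptions made for the example, and detached signing is left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

GENESIS_HASH = "0" * 64  # hypothetical marker for "no prior stage"


def sha256_json(value) -> str:
    """Hash a JSON-serializable value in canonical form."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class StageAttestation:
    stage: str
    input_hash: str
    output_hash: str
    record_count: int
    prior_stage_hash: str
    stage_hash: str = ""


@dataclass
class RunManifest:
    run_id: str
    connector_id: str
    connector_version: str
    tenant_id: str
    window_start_utc: str
    window_end_utc: str
    stages: list = field(default_factory=list)

    def add_stage(self, stage: str, input_hash: str, output_hash: str,
                  record_count: int) -> StageAttestation:
        """Append a stage whose hash commits to the previous stage's hash."""
        prior = self.stages[-1].stage_hash if self.stages else GENESIS_HASH
        att = StageAttestation(stage, input_hash, output_hash, record_count, prior)
        body = {k: v for k, v in asdict(att).items() if k != "stage_hash"}
        att.stage_hash = sha256_json(body)
        self.stages.append(att)
        return att

    def manifest_hash(self) -> str:
        """Hash of the whole manifest; this is what a detached signature would cover."""
        return sha256_json(asdict(self))
```

Because no wall-clock value enters the hash inputs here, two runs over identical inputs produce identical manifest hashes; a real manifest would keep timestamps as fields but would need to decide deliberately whether they participate in the hash chain.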

Invariants

  1. Same input hashes + connector version => same deterministic IDs.
  2. No execution evidence row with empty entity_id.
  3. Stage count equations reconcile with diff outputs.
  4. Watermarks are monotonic (except explicit replay mode).
  5. Every output artifact/partition is tagged with run_id and manifest hash.
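
Invariants 2, 4, and 5 are cheap to check mechanically at the end of a run. The sketch below shows one possible shape, with hypothetical field names (`entity_id`, `source_record_id`, `uri`, `manifest_hash`) and a hypothetical function signature; invariants 1 and 3 are omitted because they require re-running the transform and the diff, which is outside what a short example can show.

```python
from datetime import datetime


def check_invariants(run_id: str, evidence_rows: list[dict], artifacts: list[dict],
                     watermark_before: datetime, watermark_after: datetime,
                     replay_mode: bool = False) -> list[str]:
    """Return a list of human-readable violations; an empty list means all passed."""
    violations: list[str] = []

    # Invariant 2: no execution evidence row with empty entity_id.
    for row in evidence_rows:
        if not row.get("entity_id"):
            violations.append(
                f"evidence {row.get('source_record_id')} has empty entity_id"
            )

    # Invariant 4: watermarks are monotonic unless this is an explicit replay.
    if not replay_mode and watermark_after < watermark_before:
        violations.append("watermark moved backwards outside replay mode")

    # Invariant 5: every output artifact carries run_id and manifest hash.
    for artifact in artifacts:
        if artifact.get("run_id") != run_id or not artifact.get("manifest_hash"):
            violations.append(
                f"artifact {artifact.get('uri')} not tagged with run_id and manifest hash"
            )

    return violations
```

Returning violations rather than raising on the first one lets the run record every broken invariant in its manifest, which is more useful to an auditor than a single failure.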

Storage and Audit Replay

  • Hot index: manifest metadata in platform DB.
  • Immutable archive: WORM-capable object store/OneLake with retained manifests and detached signatures.
  • Auditor replay:
    1. Fetch manifest by run_id.
    2. Verify signature and stage hash chain.
    3. Recompute hashes from source artifacts.
    4. Re-run transform on pinned connector version.
    5. Compare outputs and invariants.
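
Replay steps 2 and 3 (hash-chain check and artifact re-hashing) could be sketched as below. The manifest is treated as plain JSON-like dicts; the `stage_hash` / `prior_stage_hash` field names, the all-zero genesis value, and the canonical-JSON convention are assumptions for illustration. Signature verification is deliberately not shown, since it depends on the key scheme chosen.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical marker for "no prior stage"


def sha256_json(value) -> str:
    """Hash a JSON-serializable value in canonical form."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify_stage_chain(stages: list[dict]) -> list[str]:
    """Recompute each stage hash and check it links to the previous stage."""
    errors: list[str] = []
    prior = GENESIS_HASH
    for index, stage in enumerate(stages):
        body = {k: v for k, v in stage.items() if k != "stage_hash"}
        if body.get("prior_stage_hash") != prior:
            errors.append(f"stage {index}: prior_stage_hash does not match previous stage")
        if sha256_json(body) != stage.get("stage_hash"):
            errors.append(f"stage {index}: stage_hash mismatch")
        # Continue from the recorded hash so one bad stage does not mask later ones.
        prior = stage.get("stage_hash", "")
    return errors


def verify_artifact(recorded_sha256: str, artifact_bytes: bytes) -> bool:
    """Replay step 3: recompute an input artifact's hash and compare to the manifest."""
    return hashlib.sha256(artifact_bytes).hexdigest() == recorded_sha256
```

An auditor running this against an archived manifest learns which stage was altered, not merely that the chain is broken, because each stage is checked independently of the outcome of earlier ones.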

F) Recommendations Roadmap

Now (1-3 days)

  1. Harden ingest identity provenance: JWT signature verification + per-key principal IDs.
  2. Add/persist deterministic correlation fields (run_id, request_id) in sync records.
  3. Enable SN outbound logging at the elevated level and ingest the resulting evidence.

Next (1-2 weeks)

  1. Replace in-memory queue/dedupe with persisted retry-safe run state.
  2. Implement run manifest schema and stage hash chaining.
  3. Split append-only execution evidence from summary rollups.

Later

  1. Full execution graph with trace propagation across SN/Azure/Fabric.
  2. Detached signatures + immutable/WORM evidence archive.
  3. Independent verifier service for run truth reconstruction.

G) If You Only Fix 3 Things

  1. Enable ServiceNow outbound runtime evidence (sys_outbound_http_log at elevated) and ingest it.
  2. Signed run manifests + end-to-end correlation propagation (run_id, trace_id, hash chain).
  3. Identity provenance hardening at ingest (verified JWT, unique caller identity, immutable submitter fields).

Validation Inputs Still Required

To convert remaining hypotheses into verified findings:

  1. ServiceNow logging/retention/ACL config snapshot for outbound/runtime tables.
  2. Azure diagnostics configuration and receipt logs for ingress/compute/storage.
  3. Fabric pipeline/job/lakehouse audit logs and partition naming standards.
  4. One complete real run packet with timestamps/IDs/artifact pointers/hashes across all hops.