Critical Connector ETL Architecture Review (Execution Evidence & Determinism)
Date: 2026-02-20
Scope: ServiceNow (origin), Azure (transport/compute/identity/monitoring), Microsoft Fabric/Foundry (destination analytics/evidence path)
Review lens: Correctness + security evidence + auditability
Assessment
Current pipeline is not audit-grade deterministic end-to-end.
Chain-of-custody breaks at multiple hops: execution proof (ServiceNow outbound runtime), identity provenance (platform ingest auth), and destination lineage closure (Fabric stage evidence not present in reviewed code path).
A) High-Level Architecture Map
Components
- ServiceNow discovery + chain inference in connector:
../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:785
../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193
- Azure/Foundry discovery in connector:
../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/discoverer.py:107
../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:533
- Platform ingest + worker + Mongo:
src/api/routes/ingest.ts:145
src/workers/runtime.ts:26
src/workers/handlers/sync-ingestion.ts:37
- Evidence pack integrity hashing:
src/evidence/integrity.ts:8
Trust Boundaries
- Customer SaaS boundary (ServiceNow) -> connector runtime.
- Connector runtime -> Azure APIs (Graph/ARM/Foundry) using client-secret credentials:
../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:126
- Connector -> platform API over HTTP:
../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/platform_client.py:65
- Platform API -> worker runtime -> MongoDB persistence.
Identity Boundaries
- ServiceNow integration user identity.
- Azure app/SP identity for discovery API calls.
- Platform ingest identity fidelity is weak today:
  - API-key callers collapse to shared principal api-key-client:
    src/api/middleware/auth.ts:55
  - Bearer JWT payload is decoded without signature verification:
    src/api/middleware/auth.ts:113
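To illustrate how API-key callers could each resolve to their own principal rather than the shared api-key-client, here is a minimal sketch. It is written in Python for brevity, although the platform middleware is TypeScript. KEY_REGISTRY, fingerprint, resolve_principal, the demo keys, and the principal ID format are all hypothetical names introduced for illustration and do not come from the reviewed code.

```python
import hashlib
import hmac


def fingerprint(api_key: str) -> str:
    """Return a non-reversible SHA-256 fingerprint of an API key."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


# Hypothetical registry: key fingerprint -> unique principal ID.
KEY_REGISTRY = {
    fingerprint("demo-key-entra-servicenow"): "connector:entra-servicenow",
    fingerprint("demo-key-azure-foundry"): "connector:azure-foundry",
}


def resolve_principal(api_key: str) -> dict:
    """Map a presented API key to its own principal instead of a shared one."""
    presented = fingerprint(api_key)
    for known, principal_id in KEY_REGISTRY.items():
        # Constant-time comparison avoids leaking match position via timing.
        if hmac.compare_digest(presented, known):
            return {
                "auth_method": "api_key",
                "principal_id": principal_id,
                # Truncated fingerprint is enough to identify the credential in logs.
                "credential_fingerprint": presented[:16],
            }
    raise PermissionError("unknown API key")
```

Storing only fingerprints means the registry never holds raw keys, and the returned fields line up with the submitter auth fields proposed for the run manifest.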
Data Lineage Boundaries
- Source records -> NormalizedGraph.
- NormalizedGraph -> entities/events/evidence/paths/chains in Mongo:
src/workers/handlers/sync-ingestion.ts:75
src/workers/handlers/sync-ingestion.ts:78
src/workers/handlers/sync-ingestion.ts:130
- No code-verified Fabric evidence-store write path in reviewed repos (hypothesis requiring environment validation).
B) Execution-Path Walkthrough (One Run)
- Trigger starts connector run
  - Principal: scheduler/operator (not persisted in platform sync record).
  - Correlation: random connector syncId.
    ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:163
    ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:467
- ServiceNow discovery & chain reconstruction
  - Principal: ServiceNow integration user.
  - Authority checks: table ACLs / API role grants.
  - Runtime execution evidence:
    - Deterministic for some sources (sys_flow_context, sys_trigger).
    - Not deterministic for BR/SI runtime today:
      ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193
  - Correlation mechanism:
    - Structural scriptLIKE caller inference:
      ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:687
- Azure/Foundry discovery
  - Principal: Azure app/SP via ClientSecretCredential.
    ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:126
  - Execution evidence quality:
    - Aggregated/non-blocking run summaries; thread timestamp used as proxy:
      ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:533
      ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:572
- Connector submits graph to platform
  - Principal seen by platform:
    - API key -> shared principal (api-key-client):
      src/api/middleware/auth.ts:55
    - or Bearer JWT subject from unverified token:
      src/api/middleware/auth.ts:113
  - Request correlation:
    - Server generates per-request UUID, not propagated end-to-end:
      src/api/middleware/request-id.ts:9
- Platform ingest + processing
  - Sync creation and multi-stage write:
    src/workers/handlers/sync-ingestion.ts:27
    src/workers/handlers/sync-ingestion.ts:75
    src/workers/handlers/sync-ingestion.ts:130
  - Principal provenance is not persisted to ConnectorSyncDoc:
    src/ingestion/transport/ingest-service.ts:45
    src/domain/syncs/types.ts:41
- Destination analytics/evidence store closure
  - No Fabric destination chain implementation confirmed in reviewed code path (hypothesis; requires deployment/runtime validation).
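The request-correlation break in the walkthrough (a server-generated UUID that the connector never sees or sends) could be closed roughly as follows. This is a sketch under assumptions, in Python for brevity: the x-run-id header, the ID pattern, and both function names are hypothetical, and only x-request-id is named elsewhere in this review.

```python
import re
import uuid

# x-request-id is the header named in the gap table; x-run-id is a hypothetical addition.
REQUEST_ID_HEADER = "x-request-id"
RUN_ID_HEADER = "x-run-id"

# Assumed policy: accept only short, log-safe identifiers from callers.
_SAFE_ID = re.compile(r"[A-Za-z0-9._-]{8,128}")


def outbound_headers(run_id: str) -> dict:
    """Connector side: mint one request ID per call and attach the run ID."""
    return {REQUEST_ID_HEADER: str(uuid.uuid4()), RUN_ID_HEADER: run_id}


def resolve_request_id(headers: dict) -> tuple[str, str]:
    """Platform side: reuse a well-formed inbound ID, otherwise generate one.

    Returns (request_id, origin), where origin is "caller" or "server".
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    inbound = lowered.get(REQUEST_ID_HEADER, "")
    if _SAFE_ID.fullmatch(inbound):
        return inbound, "caller"
    return str(uuid.uuid4()), "server"
```

Recording the origin alongside the ID on the sync record lets an auditor tell a caller-supplied correlation from a server fallback.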
C) Evidence Gaps (Ranked)
| Gap | Chain-of-Custody Break | Impact | How to Prove Gap Exists Today | Minimal Fix | Best Fix |
|---|---|---|---|---|---|
| JWT not verified; API-key identity collapsed | Submitter identity fidelity | Critical (security + audit) | src/api/middleware/auth.ts:113, src/api/middleware/auth.ts:55 | Verify JWT via JWKS; map each API key to unique principal | mTLS/OIDC workload identity + immutable signed submitter claims in manifest |
| No deterministic SN BR/SI runtime proof | Execution -> outbound call | Critical (auditability) | ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/core/transformer.py:1193 | Enable SN outbound logs (elevated) and ingest sys_outbound_http_log | Hybrid proof: outbound logs + Azure independent receipts + structural client_id chain |
| No end-to-end correlation propagation | Run -> logs -> artifacts mapping | Critical (determinism) | src/api/middleware/request-id.ts:9; no request-id headers in connector client | Accept/propagate x-request-id, persist on sync docs | Full trace context (traceparent) + stage span IDs + hash-linked manifests |
| In-memory queue + non-transactional writes | Partial-run correctness | High | src/workers/runtime.ts:26; src/workers/handlers/sync-ingestion.ts:75 | Persist queue and retry state; explicitly emit partial status | Stage checkpoint model with resumable idempotent commits |
| In-memory sync dedupe | Retry/restart idempotency | High | src/ingestion/transport/ingest-service.ts:13 | DB-backed dedupe keyed by run and stage | Full idempotency key strategy + run state machine |
| Execution evidence upsert overwrites prior proof | Historical provenance | High | src/storage/mongo/schema.ts:313; src/storage/mongo/adapters/execution-evidence-adapter.ts:54; stable Foundry summary ID at ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:403 | Include run/window in source_record_id | Append-only evidence events + separate summary rollups |
| Evidence row can persist with empty entity_id; synthetic timestamps | Output->input linkage | High | src/ingestion/graph-transformer.ts:168; src/ingestion/graph-transformer.ts:207 | Reject/quarantine evidence with unresolved entity/timestamp | Strict evidence schema + DLQ + mandatory source hash |
| Schema drift tolerated (agent_run_summary not in platform union) | Semantics consistency | Medium-high | src/domain/evidence/types.ts:1; connector emits at ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/core/transformer.py:420 | Extend enum + validate inputs | Versioned schema registry with compatibility gates |
| Time-window boundary nondeterminism | Replay consistency | Medium-high | Date-only SN cutoff ../sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py:1552; Foundry proxy timestamps ../sv0-connectors/integrations/azure-foundry/src/azure_foundry/adapters/foundry_client.py:572 | Explicit UTC window start/end persisted per run | Monotonic watermark ledger + boundary replay tests |
| Fabric destination evidence chain not code-evidenced | End-to-end closure | Medium to critical (depends on deployment) | No corresponding implementation in reviewed repos | Document current trust boundary explicitly | Implement destination receipts + partition/hash attestations |
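
For the "execution evidence upsert overwrites prior proof" row, the minimal fix is to include the run/window in source_record_id. A sketch of one way to derive such an ID follows; evidence_record_id and its parameter names are hypothetical, and the choice of SHA-256 over a delimited string is an assumption rather than the connector's current scheme.

```python
import hashlib


def evidence_record_id(source: str, subject_id: str,
                       window_start_utc: str, window_end_utc: str) -> str:
    """Derive a deterministic ID that is unique per source, subject, and window."""
    # A unit-separator delimiter avoids ambiguous concatenation such as
    # ("ab", "c") versus ("a", "bc").
    canonical = "\x1f".join([source, subject_id, window_start_utc, window_end_utc])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same summary subject observed in two windows yields two distinct IDs,
# so a later upsert no longer replaces the earlier evidence row.
first = evidence_record_id("foundry", "agent-1",
                           "2026-02-19T00:00:00Z", "2026-02-20T00:00:00Z")
second = evidence_record_id("foundry", "agent-1",
                            "2026-02-20T00:00:00Z", "2026-02-21T00:00:00Z")
```

Re-running the same window reproduces the same ID, which keeps retries idempotent while preserving history across windows.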
D) ServiceNow Outbound Evidence Plan
Evidence required from ServiceNow (per outbound call)
- sys_id, created_on, response_time, endpoint_url, http_method, http_status, rest_message
- Optional: payload/body hash fields if policy allows.
Reference requirement and field availability:
../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:619
../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:668
Minimum config/instrumentation changes
- Set glide.rest.outbound_log_level=elevated (or all if allowed).
../sv0-connectors/integrations/entra-servicenow/docs/architecture/execution-chain-discovery.md:625
- Ensure connector read access to sys_outbound_http_log.
- Add connector collector for sys_outbound_http_log and emit execution evidence nodes with deterministic source_record_id.
- Persist source timestamps from SN logs exactly (no fallback-to-now).
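The collector behavior described above can be sketched as a single row-to-node mapping. The field names are the ones listed earlier in this section; the function name, the node shape, the source_record_id format, and the assumed created_on format ("YYYY-MM-DD HH:MM:SS", treated as UTC) are illustrative assumptions that should be checked against the actual instance and connector schema.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("sys_id", "created_on", "endpoint_url", "http_method", "http_status")


def to_evidence_node(row: dict, instance: str) -> dict:
    """Convert one sys_outbound_http_log row into an execution evidence node."""
    missing = [f for f in REQUIRED_FIELDS if not row.get(f)]
    if missing:
        raise ValueError(f"outbound log row missing fields: {missing}")
    # Assumed timestamp format, treated as UTC. A parse failure raises
    # instead of silently falling back to the current time.
    observed = datetime.strptime(row["created_on"], "%Y-%m-%d %H:%M:%S")
    observed = observed.replace(tzinfo=timezone.utc)
    return {
        # Deterministic: the same log row always maps to the same ID.
        "source_record_id": f"servicenow:{instance}:sys_outbound_http_log:{row['sys_id']}",
        "observed_at": observed.isoformat(),
        "endpoint_url": row["endpoint_url"],
        "http_method": row["http_method"],
        "http_status": row["http_status"],
        "response_time": row.get("response_time"),
        "rest_message": row.get("rest_message"),
    }
```

Rows that fail validation are better routed to a quarantine path than dropped, so the gap itself stays visible to an auditor.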
Fallback if SN logging cannot be raised
- Add Azure-side receipt logs at ingress with canonical request hash, caller principal, timestamp, nonce.
- Require signed request envelopes from connector (HMAC/asymmetric signature).
- Correlate receipts with Entra sign-ins + SN structural client_id chain and mark confidence level explicitly.
E) Deterministic Verification Plan
Run Manifest Fields
- run_id, parent_run_id, connector_id, connector_version, tenant_id
- Trigger: trigger_type, trigger_source_id, triggered_at, trigger_principal
- Submitter auth: auth_method, principal_id, credential_fingerprint, request_id, trace_id
- Windows/watermarks: per-source window_start_utc, window_end_utc, watermark_before, watermark_after
- Input artifacts: URI + sha256 + record counts
sha256+ record counts - Stage attestations: stage input/output hashes, counts, timestamps, status, prior-stage hash
- Outputs: entity/event/evidence/path/chain counts + artifact pointers
- Integrity: detached signature + key ID
Invariants
- Same input hashes + connector version => same deterministic IDs.
- No execution evidence row with empty entity_id.
- Stage count equations reconcile with diff outputs.
- Watermarks are monotonic (except explicit replay mode).
- Every output artifact/partition is tagged with run_id and manifest hash.
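Two of these invariants (no empty entity_id, monotonic watermarks) are simple enough to check mechanically. The sketch below assumes watermarks are ISO-8601 UTC strings in a single fixed format, so that lexicographic order matches time order; check_invariants and its parameters are hypothetical names.

```python
def check_invariants(evidence_rows: list[dict], watermarks: list[str],
                     replay_mode: bool = False) -> list[str]:
    """Return a list of human-readable invariant violations (empty if clean)."""
    violations = []
    for i, row in enumerate(evidence_rows):
        # Covers a missing key, None, and the empty string.
        if not row.get("entity_id"):
            violations.append(f"evidence[{i}] has empty entity_id")
    if not replay_mode:
        # Compare each watermark with its predecessor; equal values are allowed.
        for prev, cur in zip(watermarks, watermarks[1:]):
            if cur < prev:
                violations.append(f"watermark regressed: {cur} < {prev}")
    return violations
```

Returning every violation rather than failing on the first gives the run manifest a complete record of what went wrong.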
Storage and Audit Replay
- Hot index: manifest metadata in platform DB.
- Immutable archive: WORM-capable object store/OneLake with retained manifests and detached signatures.
- Auditor replay:
  - Fetch manifest by run_id.
  - Verify signature and stage hash chain.
  - Recompute hashes from source artifacts.
  - Re-run transform on pinned connector version.
  - Compare outputs and invariants.
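The stage hash chain check in the replay steps can be sketched as follows, assuming each stage attestation is a JSON-serializable dict and each link stores the prior stage's hash. The link layout and function names are hypothetical, and signature verification is deliberately omitted because it depends on the key infrastructure chosen.

```python
import hashlib
import json


def stage_hash(stage: dict, prior_hash: str) -> str:
    """Hash a stage attestation together with the previous stage's hash."""
    payload = json.dumps({"stage": stage, "prior_hash": prior_hash},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(stages: list[dict]) -> list[dict]:
    """Producer side: link each stage to its predecessor (first link uses "")."""
    chain, prior = [], ""
    for stage in stages:
        digest = stage_hash(stage, prior)
        chain.append({"stage": stage, "prior_hash": prior, "hash": digest})
        prior = digest
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Auditor side: recompute every link and confirm the chain is unbroken."""
    prior = ""
    for link in chain:
        if link["prior_hash"] != prior:
            return False
        if link["hash"] != stage_hash(link["stage"], prior):
            return False
        prior = link["hash"]
    return True
```

Because each hash covers the previous one, altering, removing, or reordering any stage invalidates every later link.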
F) Recommendations Roadmap
Now (1-3 days)
- Harden ingest identity provenance: JWT signature verification + per-key principal IDs.
- Add/persist deterministic correlation fields (run_id, request_id) in sync records.
- Enable and ingest SN outbound evidence at elevated.
Next (1-2 weeks)
- Replace in-memory queue/dedupe with persisted retry-safe run state.
- Implement run manifest schema and stage hash chaining.
- Split append-only execution evidence from summary rollups.
Later
- Full execution graph with trace propagation across SN/Azure/Fabric.
- Detached signatures + immutable/WORM evidence archive.
- Independent verifier service for run truth reconstruction.
G) If You Only Fix 3 Things
- Enable ServiceNow outbound runtime evidence (sys_outbound_http_log at elevated) and ingest it.
- Signed run manifests + end-to-end correlation propagation (run_id, trace_id, hash chain).
- Identity provenance hardening at ingest (verified JWT, unique caller identity, immutable submitter fields).
Validation Inputs Still Required
To convert remaining hypotheses into verified findings:
- ServiceNow logging/retention/ACL config snapshot for outbound/runtime tables.
- Azure diagnostics configuration and receipt logs for ingress/compute/storage.
- Fabric pipeline/job/lakehouse audit logs and partition naming standards.
- One complete real run packet with timestamps/IDs/artifact pointers/hashes across all hops.