
Execution Evidence Fidelity

1. Executive Summary

SecurityV0 tells customers "this workload executed N times in the last 30 days." N gates dormant_authority, unproven_execution, and risk-cluster severity (src/evaluator/rules/dormant-authority.ts:52-64, src/evaluator/rules/unproven-execution.ts:26, src/services/risk-cluster-service.ts:500). It is also one of the hardest numbers in the product to get right. Every connector produces N from a different source. Most sources measure something subtly different from "the thing actually ran once." Several sources require licensed features the customer may not have enabled.

We propose a four-tier fidelity model (GROUND_TRUTH, HIGH_FIDELITY, APPROXIMATE, UNAVAILABLE) and score each shipping connector. Today every connector's overall tier is APPROXIMATE or below; only individual ServiceNow Flow and scheduled-job paths reach HIGH_FIDELITY. Zero GROUND_TRUTH paths exist across the three connectors.

The thread-count proxy used for Foundry today can land anywhere from 0.1× to 3×+ of the true call count. It is also project-wide, so every agent in a project reports the same total: roughly an N_agents-fold inflation per agent when usage is even. The canonical Sergey scenario lands at roughly 1.14× (our 8 vs Azure Monitor's 7). That happens only because this specific workload does one thread = one completion = no tools = single active agent. Any customer workload using tools, batched completions, or multiple agents per project can diverge by 2× or more in either direction. Do not cite the Sergey match as a general accuracy claim.

| Connector | Today's tier | Method | What it counts | Ground-truth source we'd match against |
| --- | --- | --- | --- | --- |
| Entra → ServiceNow | APPROXIMATE | mixed: per-record (Flow/Job), proxy-derived (BR/SI via trigger records), per-record (sys_log opt-in) | Sign-ins (P1/P2 only), Flow/Job exec records, BR trigger-record proxies, sys_log (when opted in) | sys_script_execution_history, sys_audit, Azure Monitor per-app metrics |
| Azure AI Foundry | STRUCTURAL_ONLY (multi-agent) / APPROXIMATE (single-agent) | summary-read (project-wide threads, NOT per-agent) | Threads opened in the project, counted as a proxy for agent runs; same number on every agent in the project | Azure Monitor TotalCalls per deployment, /threads/{id}/runs filtered by assistant_id |
| AWS | UNAVAILABLE (zero emitted today on demo) | none (no Lambda Invocations metric fallback; CloudTrail disabled on demo) | CloudTrail S3 archive when configured; CloudWatch Invocations fallback NOT available today | CloudWatch AWS/Lambda Invocations, CloudTrail Lake, S3 trail |

The honest reading: our execution numbers are useful directional signals that get meaningfully better as customers enable standard telemetry (Entra P1/P2, CloudTrail, Azure Monitor diagnostic settings), and meaningfully worse when they don't.

Field-engineer cheat sheet

One-page summary for anyone walking into a customer call. If you read nothing else in this doc, read this table and Section 8.

| Connector | Tier | Method | What to say on the call | What NOT to say |
| --- | --- | --- | --- | --- |
| Entra-SN Flow/Job | HIGH_FIDELITY | per-record enumeration (sys_flow_context, sys_trigger) | "We match sys_flow_context 1:1" | "We count every BR execution" |
| Entra-SN BR/SI | APPROXIMATE | proxy-derived (trigger records on monitored tables) | "We count trigger records as a proxy" | Anything implying determinism |
| Foundry | STRUCTURAL_ONLY (multi-agent) / APPROXIMATE (single-agent) | summary-read (project-wide threads, not per-agent) | "We count threads project-wide, not per-agent" | "We count Foundry executions per agent" |
| AWS Lambda | UNAVAILABLE without CloudTrail | none emitted today | "Enable CloudTrail for this to work" | "The Lambda is dormant" |
| AWS any without CloudTrail | UNAVAILABLE | none | "We cannot tell without CloudTrail" | Any number |

Recent platform fixes that change what you can say:

  • sv0-platform #500 (merged 2026-04-23) — per-path exec_30d is now destination-scoped on Azure Cognitive Services (Foundry) paths; the BR-SI-SP double/triple-count bug filed as #465 is fixed for AWS + Azure Cognitive Services destinations. ServiceNow and Entra destinations still rely on the entity-only fallback pending sv0-connectors #91.
  • sv0-platform #502 (merged 2026-04-23) — non-AWS authority-path UI rows now render "recent activity (events)" instead of "executions" when the evidence tier is below GROUND_TRUTH. The worst of the label problem is dampened but the underlying number remains a proxy.

2. Terminology

Product copy uses these words interchangeably today. They are not interchangeable.

  • Execution — "the workload ran to completion and produced its side-effect." For a ServiceNow BR, one script evaluation. For a Foundry agent, one run record. For a Lambda, one Invoke the runtime returned. No system-neutral source exists; every platform exposes a different proxy.
  • Call ≠ Invocation ≠ Run: these three are NOT interchangeable, and conflating them is the single most common mistake in demo copy.
    • A run is a Foundry agent execution (one POST /threads/{id}/runs).
    • An Invocation is AWS's word for a Lambda that was called (one Invoke the runtime returned).
    • A call is a raw API call to the backing service. For example, Azure Monitor's TotalCalls counts chat-completion HTTP requests regardless of caller.
    • One agent run can produce many calls (tool loops). One Lambda Invocation is one call. Neither a run nor an Invocation equals a thread.
  • Call — an API call to the backing service. Azure Monitor's TotalCalls on a Foundry deployment counts chat-completion HTTPs regardless of caller. One agent execution can produce many calls (tool loops).
  • Trigger — an external event that could have caused an execution. ServiceNow incident.insert triggers the Auto-route identity tickets BR. Observing the trigger is evidence the BR likely ran, not proof.
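  • Invocation: AWS's word for "a Lambda was called." It is equivalent to execution when read from either of two sources. The first is the CloudWatch Invocations metric, which we do not read today. The second is CloudTrail Invoke events, which we read only when the customer has a trail with Lambda data events enabled; no demo tenant has had one to date.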
  • Sign-in — Entra's record of a token being issued. An SP with token caching signs in once per token lifetime (60-90 min) and serves many calls. Sign-ins are a lower bound on call volume.
  • Token request — exchanging credentials for a token. For service principals in Entra this equals sign-in; for OAuth clients that rotate per-call it diverges.

When a UI surface says "executions" we mean the first. Every count derived from the other terms is a proxy.


3. Fidelity Tier System

Four tiers. No sub-tiers.

Tier 1 — GROUND_TRUTH

  • Source: Platform emits a per-execution record with unique ID, timestamp, and attributable principal. We read every record in the window.
  • Accuracy: ±1% (covers clock-skew and in-flight delivery).
  • Example we would score this: CloudTrail Invoke events for AWS Lambda.
    • There is one record per invocation, with userIdentity, eventTime, and full request context. Each event is per-execution, not an aggregated metric.
    • Lambda Invoke is a CloudTrail data event. It is recorded only when the trail has Lambda data events enabled; a management-events-only trail contains no Invoke records.
    • No shipping connector operates at this tier today. CloudTrail is disabled on every demo tenant to date.
    • TotalCalls on Azure Monitor is a metric, not per-execution records. It lands between Tier 1 and Tier 2 depending on how it is read. We treat it as GROUND_TRUTH only for the "how many calls hit the deployment" question, not for "who called" or per-execution attribution.
  • Failure modes: Record loss in delivery-delay windows; out-of-region events invisible; per-event volume can hit billing limits on chatty services.

Tier 2 — HIGH_FIDELITY

  • Source: Deterministic execution log exists and is read, but either (a) attribution is to the workload not the triggering identity, or (b) aggregation is done upstream (summary endpoints). Counts are correct; attribution may be lossy.
  • Accuracy: ±10%, with known direction of error (overcount from retry-duplication; undercount from upstream sampling).
  • Example: ServiceNow sys_flow_context per flow — one record per run, reliable timestamps, attributed to the flow but not the triggering user (entra-servicenow/src/entra_servicenow/core/transformer.py:2040-2082). sys_log structured evidence (opt-in, transformer.py:2182-2241) also lives here.
  • Failure modes: Attribution gap; records beyond platform retention invisible.

Tier 3 — APPROXIMATE

  • Source: No per-execution log read. Count derived from a related signal — threads opened as proxy for agent runs, trigger records as proxy for BR evaluations, sign-ins as proxy for API calls.
  • Accuracy: Error bound ranges from 0.1× to 3×+ depending on tool use, batched completions, and multi-agent projects. Useful for "is this used at all," not for "how often."
    • A thread with 3 tool-use turns + 1 completion produces 4 create_completions API calls but counts as 1 thread: a 4× underestimate.
    • A project with 5 equally active agents over-reports each agent by 5×, because every agent receives the project-wide count.
    • The 2026-04-20 result (our 8 vs TotalCalls=7) is a single-workload coincidence: one completion per thread, no tool use, single agent active. This does not generalize.
  • Example: Foundry run_count_30d = count of threads with created_at in the 30d window, project-wide, NOT filtered by agent (azure-foundry/src/azure_foundry/adapters/foundry_client.py:611-724). See the self-authored commentary at lines 30-46 of that file.
  • Failure modes: Systematic bias in either direction; silent divergence as usage patterns shift (e.g., an agent adopting long-running tool-use sessions will diverge further from its thread count).

Tier 4 — UNAVAILABLE

  • Source: None read. Customer hasn't enabled the prerequisite (license, diagnostic setting, trail), or API returned 403, or the service isn't configured. Connector emits zero evidence; platform shows execution_30d = 0.
  • Accuracy: Structurally wrong. Zero means "we don't know," not "it hasn't run."
  • Example: AWS CloudTrail on dev-scan-2026-04-20.json, where aws_cloudtrail.status = "unavailable_not_enabled" (sv0-connectors/integrations/aws/reports/dev-scan-2026-04-20.json). Every workload in that scan reports zero.
  • Failure mode: The critical one — a customer's high-risk workload running thousands of times per day still shows execution_30d = 0, firing dormant_authority as a false positive.

Rule: UNAVAILABLE MUST be reported via evidenceCompleteness.sources.*.status (see azure-foundry/src/azure_foundry/core/discoverer.py:200-269). The UI should suppress dormant-authority findings when the relevant source is unavailable_not_enabled. This suppression is not yet implemented — tracked as a Roadmap item below.
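A minimal sketch of the suppression rule follows. It assumes the platform can see each source's evidenceCompleteness status next to the path's execution count. It is written in Python for illustration only; the real rule would live in the TypeScript evaluator. should_suppress_dormant_authority and UNKNOWN_STATUSES are hypothetical names. Treating unavailable_no_access, or a missing status, as "unknown" goes beyond the rule as stated above.

```python
from typing import Mapping, Sequence

# Statuses under which a zero count means "we don't know", not "it hasn't run".
# unavailable_not_enabled is the case named in the rule; including
# unavailable_no_access is an assumption of this sketch.
UNKNOWN_STATUSES = frozenset({"unavailable_not_enabled", "unavailable_no_access"})


def should_suppress_dormant_authority(
    execution_30d: int,
    source_statuses: Mapping[str, str],
    relevant_sources: Sequence[str],
) -> bool:
    """Hypothetical check run before a dormant_authority finding is emitted.

    Suppress only when the count is zero AND at least one source that could
    have produced evidence for this workload was not actually readable.
    """
    if execution_30d > 0:
        return False  # positive evidence exists; nothing to suppress
    for source in relevant_sources:
        status = source_statuses.get(source)
        # A source with no recorded status is treated as unknown (assumption).
        if status is None or status in UNKNOWN_STATUSES:
            return True
    return False
```

Under this sketch, take any Lambda on the 2026-04-20 dev scan: execution_30d = 0, with aws_cloudtrail reporting unavailable_not_enabled. It would be suppressed instead of being flagged as dormant.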


4. Per-Connector Deep Dive

4.1 Entra-ServiceNow connector

Code: sv0-connectors/integrations/entra-servicenow/

Sources read today

Five execution-evidence sources, each at a different tier. They compose into the execution_count_30d property on ServiceNow workload nodes.

| # | Source | Reader | Evidence confidence (connector claim) | Tier |
| --- | --- | --- | --- | --- |
| 1 | Entra /auditLogs/signIns | adapters/azure_client.py:169-211 | Structural sign-in record | APPROXIMATE |
| 2 | ServiceNow sys_flow_context | emitted via _add_flow_execution_node, core/transformer.py:2040-2082 | DETERMINISTIC for flows | HIGH_FIDELITY |
| 3 | ServiceNow sys_trigger | _add_job_execution_node, transformer.py:2085-2122 | DETERMINISTIC for scheduled jobs | HIGH_FIDELITY |
| 4 | ServiceNow trigger-record examples (e.g. incident, change_request) | _add_trigger_record_node, transformer.py:2125-2180 | TEMPORAL_INFERRED | APPROXIMATE |
| 5 | ServiceNow sys_log structured entries | _add_syslog_evidence_node, transformer.py:2182-2241 | DETERMINISTIC (opt-in) | HIGH_FIDELITY |

How execution_count_30d is computed

Critical junction at transformer.py:1496-1537:

  • For flows / scheduled jobs with records in sys_flow_context / sys_trigger, execution_count_30d is the record count — HIGH_FIDELITY.
  • For BRs and Script Includes (no native execution table exposed to REST until sys_audit ships), the count comes from _evidence_count_by_workload — a map incremented once per trigger-record example emitted (transformer.py:2177-2180). A BR monitoring incident.insert gets execution_count_30d = (recent incident records observed) — a trigger-record proxy.
  • A Script Include with no direct evidence but CALLED by a BR that does inherits the caller's count (transformer.py:1514-1531).
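The three rules above can be sketched in a few lines of Python. This illustrates the logic as described; it is not the code at transformer.py:1496-1537. The Workload dataclass, its fields, and the function name are hypothetical. Following the Script Include note in Section 6.2, an SI with several calling BRs is assumed here to inherit the sum of their counts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Workload:
    """Hypothetical, simplified ServiceNow workload record."""
    name: str
    kind: str  # "flow", "scheduled_job", "business_rule", or "script_include"
    execution_records: int = 0  # rows read from sys_flow_context / sys_trigger
    callers: list[str] = field(default_factory=list)  # BRs that call this SI


def execution_count_30d(workloads: list[Workload], trigger_examples: list[str]) -> dict[str, int]:
    """trigger_examples holds one workload name per trigger-record example emitted."""
    # Mirrors _evidence_count_by_workload: +1 per trigger-record example.
    evidence_count_by_workload = Counter(trigger_examples)

    counts: dict[str, int] = {}
    for wl in workloads:
        if wl.kind in ("flow", "scheduled_job"):
            counts[wl.name] = wl.execution_records  # record count: HIGH_FIDELITY
        else:
            counts[wl.name] = evidence_count_by_workload[wl.name]  # proxy: APPROXIMATE

    # A Script Include with no direct evidence inherits its callers' counts.
    for wl in workloads:
        if wl.kind == "script_include" and counts[wl.name] == 0 and wl.callers:
            counts[wl.name] = sum(counts.get(caller, 0) for caller in wl.callers)
    return counts
```

The inheritance step is what later feeds the double-count described under Known discrepancies: the same trigger records now appear on both the BR and the SI.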

Trigger-record propagation is where the platform's 8 for servicenow-openai-client on 2026-04-20 largely comes from. The UI number is the path-materializer rollup across ALL evidence nodes touching that principal (Graph-leg BR triggers + Foundry-leg summary), summed in sv0-platform/src/ingestion/authority-path-materializer.ts:262-336. It is not a count of one system's records.

Ground-truth comparison (canonical Sergey scenario, 2026-04-20)

  • sys_script_execution_history for the BR: 21 records, all 2026-04-20, polluted by developer/test runs.
  • incident updates (upper bound of what could trigger the BR): 14.
  • Platform Peak Executions 30D, Foundry-leg path: 8.
  • Azure Monitor TotalCalls on create_completions, gpt-nano-for-summary: 7.

The 8 is not a direct sum of any of these. It's the path-materializer's rollup of evidence nodes attached to the workload or its RUNS_AS targets — combining BR trigger-record evidence (5 in the scan report, but platform-side propagation through RUNS_AS changes the count) with Foundry agent-run-summary evidence. Landing within ±1 of Azure Monitor truth here is coincidence, not guarantee.

Customer-side requirements

| Source | Requires |
| --- | --- |
| Sign-in logs | Entra ID P1 or P2 license (Microsoft.com MSRP: ~$6/user/month P1). azure_client.py:170 documents this. |
| sys_flow_context | Read access to sys_flow_context; Flow Designer flows enabled. Default-on for modern ServiceNow. |
| sys_trigger | Read access to sys_trigger. Default-on. |
| Trigger records | Read access to the BR's monitored tables (may include incident, change_request, PII tables). Often the hardest permission to get in a customer demo. |
| sys_log | The customer's scripts must emit structured log entries (gs.error('[SI_NAME] START ...')). We detect the pattern; we don't write it. Most customers don't have it. |

Known discrepancies

  • Sign-ins count token issuances, not calls. An SP with SDK token caching signs in once per token lifetime and serves many calls. Our sign-in count is a lower bound.
  • Trigger-record evidence is not execution proof. Five incident.insert events prove the monitored table saw traffic — not that the BR ran (condition evaluation may have skipped it). Flagged explicitly with confidence: "TEMPORAL_INFERRED" + proof_notes (transformer.py:2161-2166).
  • Propagated evidence double-counts. This was live everywhere until 2026-04-23 and is now fixed for AWS and Azure Cognitive Services destinations only.
    • Example: a BR with 5 trigger records calling two SIs gave each SI 5 inherited records. A risk cluster containing the BR + both SIs summed 15 for 5 real events.
    • Cause: pre-fix, risk-cluster-service.ts performed a raw total_execution_30d += path.current_state.execution_30d, with no dedupe on source event, evidence node, or resource_key. authority-path-materializer.ts summed across [workload._id, ...runsAsTargets], so a BR already counted on the workload was counted again under each RUNS_AS SP. A cluster containing BR + Script Include + Service Principal (a BR-SI-SP triangle) triple-counted the same source events.
    • Fix: filed as sv0-platform #465 and fixed by PR #500 (merged 2026-04-23, commit e0c05d6). Post-fix, per-path evidence dedupes by source_record_id (sumDistinctExecutionEvidenceCount at risk-cluster-service.ts:574). Aggregation also scopes to destination_resource_key on paths whose destination has a derivable canonical resource key.
    • Remaining gap: AWS and Azure Cognitive Services destinations now scope correctly. ServiceNow and Entra destinations remain on the entity-only fallback until connector-side target_resource_key emission lands (sv0-connectors #91). On those paths the pre-#500 over-report characterization still applies directionally, though #500 at minimum deduplicates per source event.
  • execution_count_30d is workload-wide, not per-resource. authority-path-materializer.ts:273-280 acknowledges this as intentional v1 fallback (limitation P1-1).
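The difference between the pre-#500 sum and per-source-event dedupe can be shown with the BR + two SIs example above. This Python sketch only illustrates the idea behind sumDistinctExecutionEvidenceCount; it is not that TypeScript function. The EvidenceRef shape and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRef:
    """One evidence item as it appears on a path (hypothetical, simplified)."""
    source_record_id: str  # the upstream event, e.g. an incident record's sys_id
    path_id: str           # the path the evidence was attached or propagated to


def raw_total(evidence: list[EvidenceRef]) -> int:
    """Pre-#500 behavior: every per-path copy is added to the cluster total."""
    return len(evidence)


def distinct_total(evidence: list[EvidenceRef]) -> int:
    """Post-#500 idea: each source event counts once, however many paths carry it."""
    return len({ref.source_record_id for ref in evidence})


# One BR with 5 trigger records; two SIs inherit all 5 through propagation.
events = [f"INC{n:04d}" for n in range(5)]
cluster = [EvidenceRef(event, path) for path in ("br", "si_a", "si_b") for event in events]
```

Here raw_total(cluster) is 15 and distinct_total(cluster) is 5, matching the 15-for-5 over-count described above.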

Failure-mode reporting

When P1/P2 is missing, the connector correctly emits execution_evidence.status = "unavailable_no_access" + note "Azure AD P1/P2 license required for signIns API" (transformer.py:2309-2313). ServiceNow permission failures on sys_flow_context, however, surface as silent-zero counts rather than unavailable_no_access — a gap.
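One way to close that gap is to carry an explicit status alongside the count instead of defaulting to zero. Below is a minimal Python sketch, assuming a client that raises on an authorization failure; PermissionDenied and read_flow_context_count are hypothetical names. It only helps when the failure surfaces as an error. An empty result caused by access controls is still indistinguishable from a genuinely idle flow.

```python
from typing import Callable


class PermissionDenied(Exception):
    """Stand-in for whatever the ServiceNow client raises on HTTP 401/403 (hypothetical)."""


def read_flow_context_count(fetch_records: Callable[[], list]) -> dict:
    """Return the 30-day record count plus an explicit evidence status.

    fetch_records is a hypothetical callable that reads sys_flow_context rows
    for the window.
    """
    try:
        records = fetch_records()
    except PermissionDenied:
        # Unknown, not zero: downstream rules must not read this as "idle".
        return {"execution_count_30d": None, "status": "unavailable_no_access"}
    return {"execution_count_30d": len(records), "status": "available"}
```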

Current tier and upgrade path

  • Flow / scheduled-job paths: HIGH_FIDELITY.
  • BR / SI paths: APPROXIMATE. Upgrade to HIGH_FIDELITY requires sys_audit integration (Phase 2 of the BR/SI evidence plan).
  • Overall connector tier: APPROXIMATE (lowest common denominator).

4.2 Azure Foundry connector

Code: sv0-connectors/integrations/azure-foundry/

Tier: STRUCTURAL_ONLY by default; APPROXIMATE only for verified single-agent projects. Previous drafts scored this connector APPROXIMATE across the board — that was wrong for multi-agent projects (the realistic case) and has been corrected here.

FIELD WARNING: READ BEFORE ANY CUSTOMER DEMO WITH MULTIPLE AGENTS. Every agent in a Foundry project reports the same execution_count_30d, because the count is project-wide, not per-agent.

The root cause is at azure-foundry/src/azure_foundry/core/transformer.py:287-290. The connector assigns exec_count_30d = run_summary.run_count_30d directly onto every agent workload node with no per-agent filter, because foundry_client.py:611-724 does not filter threads by assistant_id.

A customer with 5 agents in a project will see all 5 rows reporting the same N. With evenly spread usage, that is a 5× over-report per agent, and an agent that never ran still shows N instead of 0. The project-wide count never under-reports an agent's threads; under-reporting comes from the threads-vs-runs bias described below. On the 2026-04-20 dev scan, every agent reported execution_count_30d = 1 from a single project-wide thread.

ui/src/pages/AuthorityPathsListPage.tsx:642 renders the "Peak Executions (30d)" column from this value. sv0-platform #502 (merged 2026-04-23) now renders non-AWS paths as "recent activity (events)" instead of "executions" when the evidence tier is below GROUND_TRUTH — so the worst of the label problem is dampened on Foundry rows as of 2026-04-23 — but the underlying number remains project-wide, not per-agent. Do not walk a customer through multi-agent Foundry views without explaining this first.

For multi-agent Foundry projects: use this data to detect whether the project is used at all — not to quantify per-agent usage. See the tier guidance below.

Sources read today

Only one: the /agents/v1.0/threads endpoint (new /threads on AIFoundry endpoints). The connector calls it per-project and treats each thread's created_at as a proxy for an agent run. The source comment at adapters/foundry_client.py:20-46 should be read verbatim by anyone writing demo copy — it pre-empts every misleading claim we might otherwise make.

How the count is computed

get_agent_run_summary (foundry_client.py:611-724):

  1. Enumerate project threads, paginated.
  2. Filter to threads with created_at ≥ now - 30d.
  3. Return run_count_30d = number of qualifying threads, last_run_at = max created_at.
  4. transformer.py:287-290 writes execution_count_30d = run_summary.run_count_30d directly onto every agent workload node in the project with no per-agent filter (because foundry_client.py:611-724 does not filter threads by assistant_id). One aggregate execution_evidence node per agent (DQ-4 summary-first).
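Steps 2 to 4 reduce to the following Python sketch. A plain iterable of created_at timestamps stands in for the paginated enumeration of step 1. The function names are hypothetical, and computing last_run_at over the qualifying threads only is an assumption. The point is that nothing in the computation looks at assistant_id.

```python
from datetime import datetime, timedelta
from typing import Iterable


def agent_run_summary(thread_created_at: Iterable[datetime], now: datetime) -> dict:
    """Steps 2-3: count project threads created in the last 30 days."""
    cutoff = now - timedelta(days=30)
    recent = [ts for ts in thread_created_at if ts >= cutoff]
    return {
        "run_count_30d": len(recent),  # threads, not runs, and not per agent
        "last_run_at": max(recent) if recent else None,
    }


def assign_to_agents(agent_ids: list[str], summary: dict) -> dict[str, int]:
    """Step 4: the same project-wide count is written onto every agent."""
    return {agent_id: summary["run_count_30d"] for agent_id in agent_ids}
```

Feeding it one recent thread and five agent IDs reproduces the shape of the 2026-04-20 dev scan: all five agents get run_count_30d = 1.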

Three named biases the code itself acknowledges:

  • The count is project-wide and NOT filtered by assistant_id. Every agent in the same project gets the same thread count. That is roughly an N_agents-fold over-report per agent when usage is even, and a nonzero count for agents that never ran. Each agent's own threads are a subset of the project's, so this bias only ever inflates an agent's thread count. In practice on the 2026-04-20 dev scan, all five agents reported run_count_30d = 1, reflecting one thread created across the project.
  • A thread can have zero runs inside it (client opened a session, never asked anything). Each still counts.
  • A thread can have many runs inside it. It still counts once.

Ground-truth comparison (same scenario as 4.1)

  • Our derived count (path-materialized, rolled up): 8.
  • Azure Monitor TotalCalls, create_completions, gpt-nano-for-summary: 7.

This alignment is a coincidence, not a guarantee. The caller makes one completion per thread, with no tool use and one thread per inbound incident: the most favorable possible scenario. Real-world divergence looks like this:

  • A single thread with 3 tool-use turns + 1 completion = 4 Azure Monitor TotalCalls but 1 thread in our count (0.25× ratio).
  • A project with 5 equally active agents sharing those 8 threads = all 5 agents report 8, while the true per-agent count is roughly 8/5 (5× over-report).

Do not cite the 8 vs 7 match as evidence of ±14% accuracy; the error bound in production is 0.1× to 3×+.
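The two Foundry biases compound per agent. Below is a minimal sketch of the arithmetic. It assumes, hypothetically, that each thread belongs to a single agent and that calls per thread are uniform. thread_proxy_ratio is an illustrative name, not connector code.

```python
def thread_proxy_ratio(agent_share: float, calls_per_thread: float) -> float:
    """Reported / true calls for one agent under the thread-count proxy.

    reported = project_threads                      (project-wide count on every agent)
    true     = project_threads * agent_share * calls_per_thread
    project_threads cancels, leaving 1 / (agent_share * calls_per_thread).
    A ratio above 1 is an over-report; below 1 is an under-report.
    """
    if agent_share <= 0 or calls_per_thread <= 0:
        raise ValueError("agent made no calls: any nonzero report is unbounded over-report")
    return 1.0 / (agent_share * calls_per_thread)


single_agent_no_tools = thread_proxy_ratio(1.0, 1.0)   # the favorable case
single_agent_tool_loop = thread_proxy_ratio(1.0, 4.0)  # 3 tool turns + 1 completion
five_agents_no_tools = thread_proxy_ratio(0.2, 1.0)    # 5x over-report
five_agents_tool_loop = thread_proxy_ratio(0.2, 4.0)   # looks close; both biases present
```

The last case is the trap: two large, opposite errors net out to 1.25, a plausible-looking number. This is why a single close match says nothing about accuracy.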

Customer-side requirements

| Source | Requires |
| --- | --- |
| Threads endpoint (/agents/v1.0/threads) | Azure AI User or Azure AI Project Manager role on the Foundry project. Data-plane token. Not granted by default; our own demo app registration needed an explicit grant. |
| Azure Monitor TotalCalls (would-be GROUND_TRUTH source) | Diagnostic setting on the Foundry deployment routing AllMetrics to a workspace we read. Not configured today in the demo tenant, and not read by the connector today. |
| Entra sign-in logs for the SP (attribution) | Entra P1/P2 license. The Foundry connector's evidence_completeness.sign_in_logs.status = "unavailable_not_enabled" on the 2026-04-20 dev scan is exactly this. |

Known discrepancies

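  • Project-wide, not agent-specific. Five agents in a project with one thread all report execution_count_30d = 1. Every agent receives the project total. With evenly spread usage, each agent is over-reported by a factor of N_agents, and idle agents show a nonzero count.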
  • Threads ≠ runs. See above.
  • No chat-completion call count. A 30-call multi-tool session shows 1. TotalCalls shows 30.
  • No attribution. Cannot answer "who opened this thread" without joining sign-in logs on SP object_id.

Failure-mode reporting

Correctly scored. discoverer.py:256-269 distinguishes available / partial / unavailable_no_access / unavailable_not_applicable / unavailable_not_enabled. 403 on threads → empty summary + degraded status. No agents → unavailable_not_applicable, not unavailable_no_access — correct.

Current tier and upgrade path

  • Current: STRUCTURAL_ONLY for multi-agent projects, APPROXIMATE for single-agent projects. Do NOT quote per-agent execution counts from Foundry projects with more than one agent; the counts are not agent-attributed. For single-agent projects, treat the number as a directional signal that may be off by 0.1× to 3×+. For multi-agent projects, use it only to detect whether the project is used at all, not to quantify per-agent usage.
  • HIGH_FIDELITY: fetch /threads/{id}/runs per thread, filter by assistant_id, aggregate runs with status. ~3 days. Gap G03/G04 in ETL plan.
  • GROUND_TRUTH: add Azure Monitor TotalCalls query per deployment + sign-ins cross-reference for attribution. ~1 week. Requires customer diagnostic setting; we should fire a specific onboarding_gap finding when absent.
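For the HIGH_FIDELITY step, the aggregation itself is small once runs are fetched. Below is a Python sketch. It assumes run objects expose assistant_id, status, and a Unix-seconds created_at; these are Assistants-style field names, so verify them against the Foundry Agents API version in use. Fetching /threads/{id}/runs with pagination is left out, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def runs_by_agent_and_status(
    runs_by_thread: Mapping[str, Iterable[dict]], window_start_unix: int
) -> Counter:
    """Count runs per (assistant_id, status) across all threads in the window.

    Grouping by assistant_id removes the project-wide bias; counting runs
    instead of threads removes the zero-run and many-run biases.
    """
    counts: Counter = Counter()
    for runs in runs_by_thread.values():
        for run in runs:
            if run["created_at"] >= window_start_unix:
                counts[(run["assistant_id"], run["status"])] += 1
    return counts


def completed_runs(counts: Counter, assistant_id: str) -> int:
    """Per-agent execution count, counting only runs that completed."""
    return counts[(assistant_id, "completed")]
```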

4.3 AWS connector

Code: sv0-connectors/integrations/aws/

Sources read today

Structurally: IAM, Lambda config, ECS task defs, Bedrock agents, Secrets inventory. Execution evidence: only CloudTrail, read from S3 archive (extractors/cloudtrail_extractor.py).

The Lambda extractor does not fetch CloudWatch Invocations. It reads list_functions, get_function, event source mappings, and resource policies, not GetMetricStatistics.

When CloudTrail is disabled, a second APPROXIMATE data source could be added. However, cloudwatch:GetMetricStatistics and cloudwatch:GetMetricData are NOT in the managed IAM policy today (sv0-connectors/integrations/aws/cfn/securityv0-readonly-role.yaml, 353 lines, zero cloudwatch:* statements). Enabling this fallback is therefore not a connector-only change. It requires:

  (a) updating the CFN template to add the CloudWatch actions;
  (b) every customer re-deploying the CFN stack to widen the role's permissions policy (the trust policy is unaffected);
  (c) connector code to call the API.

Until (a) + (b) ship, calling CloudWatch from the connector will be denied (403) on every customer role.
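If (a) and (b) ship, step (c) could look roughly like the Python sketch below. It is hypothetical and not implemented. It assumes a boto3 CloudWatch client, using get_metric_data with MetricDataQueries, StartTime, EndTime, and NextToken pagination. It sums the daily Sum datapoints of the AWS/Lambda Invocations metric for one function. Against today's managed role, the call would be denied.

```python
from datetime import datetime, timedelta


def build_invocations_query(function_name: str, now: datetime) -> dict:
    """Keyword arguments for GetMetricData: daily Sum of AWS/Lambda Invocations."""
    return {
        "MetricDataQueries": [
            {
                "Id": "invocations",
                "MetricStat": {
                    "Metric": {
                        "Namespace": "AWS/Lambda",
                        "MetricName": "Invocations",
                        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                    },
                    "Period": 86400,  # one datapoint per day
                    "Stat": "Sum",
                },
                "ReturnData": True,
            }
        ],
        "StartTime": now - timedelta(days=30),
        "EndTime": now,
    }


def lambda_invocations_30d(cloudwatch, function_name: str, now: datetime) -> int:
    """Sum daily datapoints across pages.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    kwargs = build_invocations_query(function_name, now)
    total = 0.0
    while True:
        response = cloudwatch.get_metric_data(**kwargs)
        for result in response["MetricDataResults"]:
            total += sum(result["Values"])
        token = response.get("NextToken")
        if not token:
            return int(total)
        kwargs["NextToken"] = token
```

This would give a count with no attribution, which is why it only lifts AWS to APPROXIMATE.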

How the count is computed (when CloudTrail IS enabled)

CloudTrailExtractor.extract_for_workload (cloudtrail_extractor.py:193-265):

  1. Walk S3 prefix AWSLogs/<account>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/ for the 30-day window.
  2. Decompress, parse, iterate Records.
  3. Filter to event names the workload cares about (Invoke for Lambda, RunTask for ECS, InvokeModel for Bedrock — cloudtrail_extractor.py:53-81).
  4. Match on exact ARN, or on userIdentity.sessionContext.sessionIssuer.arn == role_arn for non-Invoke calls produced by the workload's role (S3 GetObject, KMS Decrypt, etc.).
  5. Emit one CloudTrailEvidence per matching event.
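Steps 1, 4, and 5 can be sketched in Python as follows, assuming records have already been decompressed and parsed (steps 2-3). Several details are assumptions of this sketch: the function names, the EVENT_NAMES subset, and matching direct invocations on the record's resources[].ARN field. The real extractor may match on requestParameters or use a longer event-name table.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

# Illustrative subset of the event-name table (the real one is in
# cloudtrail_extractor.py:53-81).
EVENT_NAMES = {"lambda": "Invoke", "ecs": "RunTask", "bedrock": "InvokeModel"}


def day_prefixes(account_id: str, region: str, end: date, days: int = 30) -> list[str]:
    """Step 1: one S3 prefix per day in the window, per-account layout."""
    return [
        f"AWSLogs/{account_id}/CloudTrail/{region}/{day:%Y/%m/%d}/"
        for day in (end - timedelta(days=offset) for offset in range(days))
    ]


def matches_workload(
    record: dict, kind: str, workload_arn: str, role_arn: Optional[str]
) -> bool:
    """Step 4: direct invocation by ARN, or role-mediated activity by session issuer."""
    if record.get("eventName") == EVENT_NAMES[kind]:
        return any(r.get("ARN") == workload_arn for r in record.get("resources") or [])
    if role_arn is None:
        # No role_arn: skip non-Invoke events rather than risk false attribution.
        return False
    issuer = (
        record.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
        .get("arn")
    )
    return issuer == role_arn


def matching_evidence(
    records: Iterable[dict], kind: str, workload_arn: str, role_arn: Optional[str] = None
) -> list[dict]:
    """Step 5: one evidence item per matching record."""
    return [r for r in records if matches_workload(r, kind, workload_arn, role_arn)]
```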

When the trail exists, this is the only path in the codebase producing per-execution attribution (events carry principalId + sessionIssuer). Hypothetically, with full CloudTrail coverage (including Lambda data events, since Invoke is not logged as a management event), Lambda paths score GROUND_TRUTH for direct invocations and HIGH_FIDELITY for indirect role-mediated evidence. We have never achieved this on any demo tenant.

Ground-truth comparison

Impossible today: aws_cloudtrail.status = "unavailable_not_enabled" on both dev-scan-2026-04-20.json and prod-post-461-aws.json. Every AWS workload reports execution_count_30d = 0. No empirical ±% — only structural analysis.

Customer-side requirements

| Source | Requires |
| --- | --- |
| CloudTrail (per-account) | Trail configured in every region of interest (or a multi-region trail), writing to an S3 bucket we read. The first copy of management events is free; additional copies are ~$2/100k events. Lambda Invoke is a data event: it must be enabled on the trail explicitly and is billed separately (~$0.10/100k data events at list price). The free 90-day console Event history covers management events only and is not what the connector reads. |
| CloudTrail (org trail) | Organization-level trail with AWSLogs/<org_id>/<account_id>/CloudTrail/... prefix layout. Connector supports this via CLOUDTRAIL_LAYOUTS (cloudtrail_extractor.py:102-115). |
| CloudWatch Invocations metric (potential fallback) | IAM cloudwatch:GetMetricData / cloudwatch:GetMetricStatistics. NOT in cfn/securityv0-readonly-role.yaml today. Enabling this fallback requires (a) a CFN template update to add the CloudWatch actions, (b) the customer re-deploying the CloudFormation stack, and (c) connector code to call the API. Every existing deployed role must be re-onboarded. |
| CloudTrail Lake (long-window queries) | Event data store configured. Not currently supported. |
| Cross-account S3 read | The trail's S3 bucket must be readable by our scanning principal. Common gotcha. |

Known discrepancies

  • Zero-not-unknown. CloudTrail-off emits zero evidence + scan-level unavailable_not_enabled, but each workload's execution_count_30d silently reads 0 — indistinguishable from an idle workload with trail enabled. No path-level degraded-confidence flag.
  • Attribution is per-call, not aggregated. Unlike Foundry summary-first, each CloudTrail evidence is a full event. Great for attribution, but 30 days × chatty Lambda can hit millions. CloudTrailScanBudgetExceededError (cloudtrail_extractor.py:118-123) caps it.
  • Non-Invoke attribution requires role_arn. Without it, non-Invoke events are skipped (cloudtrail_extractor.py:390-398) — intentional undercount to avoid false attribution.
  • Region scoping. Only regions explicitly listed are scanned; out-of-scope regions emit zero.

Failure-mode reporting

Scan-level: correct. Workload-level: execution_count_30d = 0 on each Lambda is indistinguishable from a real idle Lambda. This is the meta-finding — we silently communicate UNAVAILABLE as APPROXIMATE_ZERO, and the dormant_authority rule (src/evaluator/rules/dormant-authority.ts:52-64) does not yet suppress itself when evidence completeness is unavailable_not_enabled.

Current tier and upgrade path

  • Current: UNAVAILABLE on every demo scenario to date.
  • Connector + CFN update win: CloudWatch Invocations fallback for Lambda. This needs 3 days of connector code, a CFN update adding cloudwatch:GetMetricStatistics / cloudwatch:GetMetricData to cfn/securityv0-readonly-role.yaml, and a customer re-stack for every deployed role. Moves UNAVAILABLE → APPROXIMATE. Attribution still absent.
  • 1-week win: trails when present, CloudWatch fallback when not, first-class onboarding_gap:cloudtrail_not_enabled finding. Moves to HIGH_FIDELITY when trail is present.

5. Attribution vs Counting

Counting "how many executions" and attributing "which identity executed" are separable problems. Today's connectors conflate them.

| System | Count fidelity | Attribution fidelity | Why they diverge |
| --- | --- | --- | --- |
| Azure Monitor TotalCalls | GROUND_TRUTH | UNAVAILABLE | Metric is aggregated at the deployment, not attributed to caller. |
| Entra sign-in logs | APPROXIMATE (lower bound) | GROUND_TRUTH | Each sign-in names a principal, but one sign-in covers many calls. |
| CloudTrail | GROUND_TRUTH when enabled | GROUND_TRUTH when enabled | Per-event, carries userIdentity. |
| ServiceNow sys_flow_context | HIGH_FIDELITY | HIGH_FIDELITY | Flows record the user, but sys_audit is needed for the full causal chain. |
| Foundry /threads | APPROXIMATE | UNAVAILABLE | No agent filter, no principal on the thread record. |

Practical implication: when a customer asks "who is calling this Foundry agent seven times a day," we cannot answer from Azure Monitor alone — we need sign-ins for the principal join. That's why P1/P2 is a hard dependency for Foundry's usefulness, even though Foundry will list threads without it. Not intuitive to field engineers; should be in the onboarding checklist.
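The principal join can be sketched in two steps. First, find which principals obtained tokens for the deployment's resource in the window. Second, attribute the metric only when that is unambiguous. This Python sketch assumes sign-ins have already been normalized to principal_id / resource_id keys. Those are hypothetical names; the Graph signIn schema uses its own field names. The single-caller rule is also an assumption, not current connector behavior.

```python
from collections import Counter
from typing import Iterable, Optional


def candidate_callers(sign_ins: Iterable[dict], resource_id: str) -> Counter:
    """Principals that signed in to resource_id, with their sign-in counts."""
    return Counter(s["principal_id"] for s in sign_ins if s["resource_id"] == resource_id)


def attribute_calls(total_calls: int, callers: Counter) -> Optional[dict[str, int]]:
    """Attribute a deployment-level call count to a principal only if unambiguous.

    With one candidate caller, the whole metric can be assigned to it. With zero
    or several, an aggregated metric such as TotalCalls cannot be split.
    """
    if len(callers) == 1:
        (only_caller,) = callers
        return {only_caller: total_calls}
    return None
```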

CloudTrail gives us both for free when enabled. That's the structural reason AWS is the easiest path to GROUND_TRUTH — if the customer already pays for CloudTrail, we just read it.


6. Where We Give Real Numbers vs Approximations

6.1 Within ±10% of ground truth today

  • ServiceNow Flow executions with sys_flow_context access. One evidence node per flow-context record; 1:1 of source table. (Evidence: dev-scan-2026-04-20.json reports 67 flow_execution nodes.)
  • ServiceNow scheduled-job executions with sys_trigger access. Same mechanism.
  • ServiceNow Script Include executions via sys_log structured logging (Auto-route identity tickets when the opt-in pattern is used).

6.2 Trigger-count proxies (off by factor-of-N)

  • BR execution via trigger-record evidence. Proxies BR eligibility, not actual execution. Direction depends on condition evaluation; factor 0.1× (skipped) to 5× (retry loop).
  • Script Include counts via BR propagation. An SI called from three BRs inherits three sets of trigger records even if only one reached it.
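  • Foundry agent runs via thread count. Over-reported by roughly N_agents, because every agent gets the project-wide count. Under-reported by the runs-per-thread ratio. The two biases compound and can partially cancel.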
  • Entra sign-ins as proxy for calls. Lower bound; divide by token-cache duration for a rough ceiling.

6.3 Cases where we have no number

  • Any AWS workload without CloudTrail. Every Lambda/ECS/Bedrock on the current demo environment.
  • Any Foundry project we cannot data-plane-access (403 on /threads).
  • Any ServiceNow workload where the scanning principal lacks sys_flow_context read.
  • Any principal whose sign-ins are unavailable due to P1/P2 absence — Foundry run summary still works, attribution gone.

7. Customer Licensing Implications

Open this table on a first-call qualifying conversation. MSRP is public retail list; enterprise agreements vary.

Note on Entra ID P1 pricing: the ~$6/user/month figure is per workforce user, tenant-wide — not per privileged user and not per service principal. A customer with 10,000 employees pays for 10,000 P1 seats to make sign-ins available for any SP in the tenant. This is usually a sunk cost for mid-market+ customers who already have P1 via M365 E3/E5 bundles; greenfield tenants on Microsoft 365 Business Basic/Standard will need an uplift.

| Connector | Prereq | MSRP (approx) | Tier unlock |
| --- | --- | --- | --- |
| Entra-ServiceNow | Entra ID P1 | ~$6/user/month | Sign-in attribution → HIGH_FIDELITY on identity attribution |
| Entra-ServiceNow | Entra ID P2 | ~$9/user/month | P1 features + risk detections; same tier unlock as P1 for our purposes |
| Entra-ServiceNow | ServiceNow read on sys_flow_context, sys_trigger, sys_log | Included in Platform license; access control is customer-side | HIGH_FIDELITY on ServiceNow executions |
| Entra-ServiceNow | ServiceNow read on monitored tables (incident, change_request, PII) | Customer-side RBAC | APPROXIMATE BR trigger evidence |
| Entra-ServiceNow | (future) sys_audit read access | Customer-side RBAC | HIGH_FIDELITY BR execution (requires sys_audit integration, not yet implemented) |
| Azure Foundry | Azure AI User role on project | Included in Azure AI | Thread-count APPROXIMATE |
| Azure Foundry | Azure Monitor diagnostic setting on deployment | Azure Monitor ingestion (~$2.30/GB) | GROUND_TRUTH call-count (requires connector upgrade, not yet implemented) |
| Azure Foundry | Entra P1/P2 | ~$6-9/user/month | Attribution layer for Foundry findings |
| AWS | CloudTrail trail per account with Lambda data events enabled (Lambda Invoke is logged as a data event, not a management event) | First copy of management events free; data events ~$0.10/100k | GROUND_TRUTH Lambda invocation evidence |
| AWS | CloudTrail org trail (same data-event selection) | Same pricing; org-level admin | Same as above at org scope |
| AWS | CloudWatch GetMetricStatistics / GetMetricData | Negligible API cost; requires an IAM permission that is NOT in the managed policy today | APPROXIMATE invocation fallback (requires CFN update + customer re-stack + connector change; not yet implemented) |
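The CloudWatch fallback in the last row amounts to one daily-granularity query per function. The sketch below shows the request shape and the aggregation in TypeScript. Assumptions: the field names mirror the public CloudWatch GetMetricStatistics API, the local interfaces stand in for AWS SDK types, the helper names are hypothetical, and no AWS call is made. The production extractor (lambda_extractor.py) is Python and would issue the call through the AWS SDK.

```typescript
// Local stand-in for the CloudWatch GetMetricStatistics request fields we need.
// Assumption: names follow the public CloudWatch API; no SDK import, no network call.
interface MetricStatisticsRequest {
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  StartTime: Date;
  EndTime: Date;
  Period: number; // seconds per datapoint
  Statistics: ("Sum" | "Average" | "Maximum" | "Minimum" | "SampleCount")[];
}

// Local stand-in for one returned datapoint (only the fields used here).
interface Datapoint {
  Timestamp: Date;
  Sum?: number;
}

const DAY_SECONDS = 86_400;

// Hypothetical helper: daily Invocations sums for one function over the window
// (1-day period, 30-day window, as described in roadmap item 2).
function buildInvocationsRequest(functionName: string, now: Date, windowDays = 30): MetricStatisticsRequest {
  const end = new Date(now.getTime());
  const start = new Date(end.getTime() - windowDays * DAY_SECONDS * 1000);
  return {
    Namespace: "AWS/Lambda",
    MetricName: "Invocations",
    Dimensions: [{ Name: "FunctionName", Value: functionName }],
    StartTime: start,
    EndTime: end,
    Period: DAY_SECONDS,
    Statistics: ["Sum"],
  };
}

// Hypothetical helper: collapse daily datapoints into one 30-day candidate count.
// A datapoint without a Sum contributes zero.
function totalInvocations(datapoints: Datapoint[]): number {
  return datapoints.reduce((acc, dp) => acc + (dp.Sum ?? 0), 0);
}
```

Querying on the FunctionName dimension alone aggregates all versions and aliases of the function. Days with no invocations simply produce no datapoint, so summing whatever comes back is sufficient.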

Meta-finding on our own demo tenant (as of 2026-04-20):

  • Entra P1/P2: absent on the primary demo tenant. This is why sign-ins report unavailable_not_enabled on the Foundry scan, and why Foundry findings lack attribution even when threads are readable.
  • Azure Monitor diagnostic settings on Foundry deployments: not configured.
  • CloudTrail on the demo AWS account: not enabled.

Our canonical demo environment is misconfigured in all three dimensions that would materially improve our output. Field engineers walking into a customer call should know this: customers with real production workloads often already have at least CloudTrail and Entra P1 in place, and will see meaningfully higher-fidelity numbers than the demo tenant does.


8. Product Positioning Guidance

Do not claim "we count every Foundry call." We count threads. A thread with 30 tool-use iterations shows as 1. The first customer to put Azure Monitor next to our UI will catch it.
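The two ways the thread-count proxy diverges (tool loops inside one thread, and one project-wide count shared by every agent) can be shown with a toy model. This is an illustration, not connector code: ThreadActivity and both functions are hypothetical, and the model assumes each tool-use iteration costs one completion call.

```typescript
// Hypothetical, simplified record of what one Foundry thread cost.
// Assumption: every tool-use iteration triggers one model completion call.
interface ThreadActivity {
  agentId: string; // agent that served the thread
  completions: number; // completion calls made while serving it
}

// Today's proxy: the project-wide thread count, reported identically on every agent.
function proxyCountForAgent(projectThreads: ThreadActivity[]): number {
  return projectThreads.length;
}

// What a per-call count would show for one agent, assuming calls can be
// attributed to the agent that made them.
function trueCallsForAgent(projectThreads: ThreadActivity[], agentId: string): number {
  return projectThreads
    .filter((t) => t.agentId === agentId)
    .reduce((sum, t) => sum + t.completions, 0);
}

// Failure mode 1: one thread with 30 tool-use iterations.
const toolLoop: ThreadActivity[] = [{ agentId: "summary", completions: 30 }];
console.log(proxyCountForAgent(toolLoop), trueCallsForAgent(toolLoop, "summary")); // 1 30

// Failure mode 2: two agents share a project; each is shown all 5 threads.
const sharedProject: ThreadActivity[] = [
  { agentId: "triage", completions: 1 },
  { agentId: "triage", completions: 1 },
  { agentId: "triage", completions: 1 },
  { agentId: "summary", completions: 1 },
  { agentId: "summary", completions: 1 },
];
console.log(proxyCountForAgent(sharedProject), trueCallsForAgent(sharedProject, "summary")); // 5 2
```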

Do not claim "we show every Lambda execution." That holds only if CloudTrail was logging Lambda data events in the relevant regions for the whole 30-day window.

Do not claim "this workload is dormant." Claim "no execution evidence in 30d from the sources we scanned." The qualifier is load-bearing.

Claim accurately: "We correlate executions across systems into a single access-path view, with fidelity that improves as customers enable standard telemetry: Entra P1, CloudTrail, and Azure Monitor diagnostic settings. Our evidence-completeness blocks tell you, per finding, which sources were available, and we are surfacing that same signal at the workload/path level next (see Roadmap item #1)."

The strongest feature of the current system is not the accuracy of the number; it is the evidenceCompleteness block that tells the customer what we could and couldn't see. That is the honest product, and it should be foregrounded in the UI.


9. Roadmap — What Moves the Needle

Five prioritized upgrades. Each is scoped; none is aspirational.

  1. Surface evidence-completeness in dormancy findings. dormant_authority today fires on execution_count_30d === 0 regardless of whether that zero is UNAVAILABLE or real absence (src/evaluator/rules/dormant-authority.ts:52-64). Change the rule to read ctx.getEvidenceCompleteness(entity) and suppress or downgrade when the relevant source is unavailable_not_enabled. Effort: 2 days. Tier impact: none (eliminates false positives). Customer benefit: the #1 dormant-authority false positive goes away. Demo-tenant success: zero AWS dormant_authority findings on the CloudTrail-disabled demo.

  2. Add CloudWatch Invocations fallback to AWS. Extend lambda_extractor.py with GetMetricStatistics for AWS/Lambda Invocations, 1-day period, 30-day window. Emit as execution_evidence with confidence: TEMPORAL_INFERRED. Effort: 3 days connector code + CFN template update + customer change-management (every existing deployed role must re-stack). cloudwatch:GetMetricData / cloudwatch:GetMetricStatistics are NOT in cfn/securityv0-readonly-role.yaml today; the CFN update is a prerequisite and must ship before connector code is useful. Tier impact: Lambda paths UNAVAILABLE → APPROXIMATE. Customer benefit: dormant detection works for Lambda on tenants without CloudTrail (most of them, once they re-stack). Demo-tenant success: every Lambda in dev-scan-2026-04-20.json reports execution_count_30d > 0 where the metric shows invocations.

  3. Fetch /threads/{id}/runs for Foundry. Gap G03/G04 in the ETL plan. Filter by assistant_id and aggregate runs by status. Effort: 3 days. Tier impact: Foundry agents APPROXIMATE → HIGH_FIDELITY on count (attribution still needs sign-ins). Customer benefit: per-agent counts instead of a project-wide thread count. Demo-tenant success: the 5 agents in the demo project report independent run counts.

  4. Add Azure Monitor TotalCalls as a second Foundry source. azure-monitor-query per deployment; cross-check threads vs calls; emit secondary execution_evidence per deployment per day. Effort: 1 week. Tier impact: Foundry deployment paths → GROUND_TRUTH on call count. Customer benefit: our number matches the Azure Monitor blade on the same screen. Demo-tenant success: Foundry call count on UI matches TotalCalls (±1 for clock skew) on the Sergey scenario.

  5. ServiceNow sys_audit integration for BR / SI. Phase 2 of the BR/SI evidence plan. Moves BR/SI from TEMPORAL_INFERRED to DETERMINISTIC. Effort: 2 weeks. Tier impact: BR / SI paths APPROXIMATE → HIGH_FIDELITY. Customer benefit: a clean answer to "how do you know that BR actually ran?" Demo-tenant success: the Auto-route identity tickets BR reports execution_count_30d from sys_audit, consistent within ±10% of the incident trigger count.
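Item 1 is the smallest change with the largest false-positive reduction, so here is a minimal sketch of its gating logic. Item 1 names ctx.getEvidenceCompleteness(entity), but its return type is not documented here; EvidenceCompleteness, the status values other than unavailable_not_enabled, and evaluateDormancy are hypothetical. The message text reuses the qualified dormancy claim from Section 8.

```typescript
// Assumed per-source status. unavailable_not_enabled appears in this document;
// "available" and "unavailable_forbidden" are hypothetical placeholders.
type SourceStatus = "available" | "unavailable_not_enabled" | "unavailable_forbidden";

// Hypothetical evidence-completeness block: execution source name -> status.
type EvidenceCompleteness = Record<string, SourceStatus>;

interface DormancyDecision {
  action: "no_finding" | "suppress" | "downgrade" | "fire";
  message: string;
}

// Hypothetical rule core: a zero only counts as dormancy if it came from sources we could read.
function evaluateDormancy(
  executionCount30d: number,
  completeness: EvidenceCompleteness,
  relevantSources: string[],
): DormancyDecision {
  if (executionCount30d > 0) {
    return { action: "no_finding", message: `Executed ${executionCount30d} times in the last 30 days.` };
  }
  const scanned = relevantSources.filter((s) => completeness[s] === "available");
  const notScanned = relevantSources.filter((s) => completeness[s] !== "available");
  if (scanned.length === 0) {
    // Every relevant source was unreadable: the zero is UNAVAILABLE, not absence.
    return { action: "suppress", message: `Dormancy unknown: no execution source available (${notScanned.join(", ")}).` };
  }
  const claim = `No execution evidence in 30d from the sources we scanned (${scanned.join(", ")}).`;
  if (notScanned.length === 0) {
    return { action: "fire", message: claim };
  }
  // Partial coverage: keep the finding but lower it and name the gaps.
  return { action: "downgrade", message: `${claim} Not scanned: ${notScanned.join(", ")}.` };
}
```

Whether a partially covered zero should be downgraded or suppressed is a product decision; this sketch downgrades it and names the unscanned sources. A Lambda whose only relevant source is CloudTrail would be suppressed on the CloudTrail-disabled demo, which is the item-1 success criterion.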

Sergey's canonical demo tenant is the test case for all five. A fix is "merged" only when the demo tenant passes the success criterion.
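For roadmap item 3, once runs have been fetched thread by thread (HTTP calls and pagination omitted), the per-agent count is a group-by on assistant_id. A sketch under assumptions: the Run fields follow the Assistants-style run object and should be checked against the Foundry Agent Service API version in use, and runsByAgent is a hypothetical name.

```typescript
// Assumed subset of a run object returned by GET /threads/{id}/runs
// (Assistants-style shape; verify field names against the API version in use).
interface Run {
  id: string;
  thread_id: string;
  assistant_id: string;
  status: string; // e.g. "completed", "failed", "cancelled", "expired"
  created_at: number; // Unix seconds
}

// Hypothetical aggregation: per-agent run counts inside the window, broken down
// by status, instead of one project-wide thread count shown on every agent.
function runsByAgent(runs: Run[], windowStartUnix: number): Map<string, Map<string, number>> {
  const out = new Map<string, Map<string, number>>();
  for (const run of runs) {
    if (run.created_at < windowStartUnix) continue; // outside the 30-day window
    const byStatus = out.get(run.assistant_id) ?? new Map<string, number>();
    byStatus.set(run.status, (byStatus.get(run.status) ?? 0) + 1);
    out.set(run.assistant_id, byStatus);
  }
  return out;
}
```

Keeping the status breakdown lets the connector decide later whether failed or cancelled runs count as executions, without refetching.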


10. Appendix — 2026-04-20 Reconciliation Worked Example

Scenario: Sergey's ServiceNow → Entra → Foundry canonical demo. Workload in focus: servicenow-openai-client (Entra SP) and the BR on the ServiceNow side that invokes OpenAI via the Azure gateway. Foundry agent: gpt-nano-for-summary.

| Source | 30-day count | Tier | What it measures |
| --- | --- | --- | --- |
| Platform UI (Foundry path, rolled up) | 8 | APPROXIMATE | Aggregate evidence-node count on the workload, summed across Foundry summary + Graph-leg trigger evidence. Path-materializer output at authority-path-materializer.ts:289-321. |
| Azure Monitor TotalCalls (create_completions, gpt-nano-for-summary) | 7 | GROUND_TRUTH | Chat-completion HTTP calls to the deployment, aggregated at 1-minute granularity. |
| Entra sign-in logs via Graph API | n/a | UNAVAILABLE | Demo tenant has no P1/P2 license; Graph API returned 403. Status correctly flagged as unavailable_not_enabled. |
| ServiceNow sys_script_execution_history (raw) | 21 | GROUND_TRUTH (of a different thing) | BR evaluation traces, including developer/test runs during demo setup. Noisy. |
| ServiceNow incident table updates (upper bound) | 14 | APPROXIMATE | Records that could have triggered the BR. Doesn't prove the BR ran. |

Observations:

  • The 8-vs-7 match is the closest we have ever observed on any connector or scenario. It is a coincidence: this workload makes exactly one completion per thread, uses no tools, and has one active agent. The real error bound for the thread-as-run proxy is 0.1× to 3×+ in the general case. Do not cite this match as evidence of ~14% accuracy; it does not generalize.
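  • The 21 in sys_script_execution_history (inflated by developer/test runs) and the 14 incident updates (an upper bound) are both proxies; the true "how many times did the BR run" number is unknown. If the upper-bound label holds, that number is at most 14 rather than somewhere between 14 and 21. sys_audit would give us the specific number, HIGH_FIDELITY.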
  • Sign-ins absent (n/a) means the Foundry 8 cannot be attributed to servicenow-openai-client. Attribution is inferred from the structural graph (SP → BR → REST message → Foundry agent), not from runtime evidence.
  • The UI presents Peak Executions 30D = 8 without a qualifier. A customer reading that next to their Azure Monitor blade sees 7. The delta is small; the burden is in the explanation, not the match.
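The 8-vs-7 comparison and the success criteria in roadmap items 4 and 5 all reduce to one check: our count against a reference source under a tolerance. A sketch with a hypothetical reconcile helper; it passes if any supplied tolerance is met.

```typescript
interface Tolerance {
  absolute?: number; // e.g. 1, for the ±1 clock-skew band in roadmap item 4
  relative?: number; // e.g. 0.10, for the ±10% band in roadmap item 5
}

interface Reconciliation {
  ratio: number; // ours / reference; NaN when the reference is zero
  withinTolerance: boolean;
}

// Hypothetical helper: compare our count with a reference (ground-truth) count.
function reconcile(ours: number, reference: number, tol: Tolerance): Reconciliation {
  const diff = Math.abs(ours - reference);
  const absoluteOk = tol.absolute !== undefined && diff <= tol.absolute;
  const relativeOk = tol.relative !== undefined && reference > 0 && diff / reference <= tol.relative;
  return {
    ratio: reference > 0 ? ours / reference : Number.NaN,
    withinTolerance: absoluteOk || relativeOk,
  };
}

// Sergey scenario: platform UI 8 vs Azure Monitor TotalCalls 7.
const sergeyAbs = reconcile(8, 7, { absolute: 1 });
console.log(sergeyAbs.ratio.toFixed(2), sergeyAbs.withinTolerance); // 1.14 true
console.log(reconcile(8, 7, { relative: 0.1 }).withinTolerance); // false
```

The same pair passes the ±1 band used for item 4 but fails a ±10% relative band like item 5's, so each success criterion has to state which tolerance it uses.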

Rerun this appendix for any new canonical scenario and store the result in the scenario's session-notes.