Execution Evidence Fidelity

1. Executive Summary

SecurityV0 tells customers "this workload executed N times in the last 30 days." N gates dormant_authority, unproven_execution, and risk-cluster severity (src/evaluator/rules/dormant-authority.ts:52-64, src/evaluator/rules/unproven-execution.ts:26, src/services/risk-cluster-service.ts:500). It is also one of the hardest numbers in the product: every connector produces N from a different source, most sources measure something subtly different from "the thing actually ran once," and several sources require licensed features the customer may not have enabled.

We propose a four-tier fidelity model (GROUND_TRUTH, HIGH_FIDELITY, APPROXIMATE, UNAVAILABLE) and score each shipping connector. Scored at the connector level, every production execution count today is APPROXIMATE or below (Foundry drops to STRUCTURAL_ONLY, a presence-only reading, for multi-agent projects); zero GROUND_TRUTH paths exist across the three connectors. The thread-count proxy used for Foundry today can be 0.1× to 3× of the true call count, and is 1:N per agent because the same project-wide number is stamped on every agent. The canonical Sergey scenario lands at roughly 1.14× (our 8 vs Azure Monitor's 7) because that specific workload does one thread = one completion, with no tools and a single active agent. Any customer workload using tools, batched completions, or multiple agents per project will diverge by at least 2× in either direction. Do not cite the Sergey match as a general accuracy claim.

Connector	Today's tier	Method	What it counts	Ground-truth source we'd match against
Entra → ServiceNow	`APPROXIMATE`	mixed: per-record (Flow/Job), proxy-derived (BR/SI via trigger records), per-record (sys_log opt-in)	Sign-ins (P1/P2 only), Flow/Job exec records, BR trigger-record proxies, sys_log (when opted in)	`sys_script_execution_history`, `sys_audit`, Azure Monitor per-app metrics
Azure AI Foundry	`STRUCTURAL_ONLY` (multi-agent) / `APPROXIMATE` (single-agent)	summary-read (project-wide threads, NOT per-agent)	Threads opened in the project, counted as a proxy for agent runs — same number on every agent in the project	Azure Monitor `TotalCalls` per deployment, `/threads/{id}/runs` filtered by `assistant_id`
AWS	`UNAVAILABLE` (zero emitted today on demo)	none (no Lambda `Invocations` metric fallback; CloudTrail disabled on demo)	CloudTrail S3 archive when configured; CloudWatch `Invocations` fallback NOT available today	CloudWatch `AWS/Lambda Invocations`, CloudTrail Lake, S3 trail

The honest reading: our execution numbers are useful directional signals that get meaningfully better as customers enable standard telemetry (Entra P1/P2, CloudTrail, Azure Monitor diagnostic settings), and meaningfully worse when they don't.

Field-engineer cheat sheet

One-page summary for anyone walking into a customer call. If you read nothing else in this doc, read this table and Section 8.

Connector	Tier	Method	What to say on the call	What NOT to say
Entra-SN Flow/Job	`HIGH_FIDELITY`	per-record enumeration (`sys_flow_context`, `sys_trigger`)	"We match `sys_flow_context` 1:1"	"We count every BR execution"
Entra-SN BR/SI	`APPROXIMATE`	proxy-derived (trigger records on monitored tables)	"We count trigger records as a proxy"	Anything implying determinism
Foundry	`STRUCTURAL_ONLY` (multi-agent) / `APPROXIMATE` (single-agent)	summary-read (project-wide threads, not per-agent)	"We count threads project-wide, not per-agent"	"We count Foundry executions per agent"
AWS Lambda	`UNAVAILABLE` without CloudTrail	none emitted today	"Enable CloudTrail for this to work"	"The Lambda is dormant"
AWS any without CloudTrail	`UNAVAILABLE`	none	"We cannot tell without CloudTrail"	Any number

Recent platform fixes that change what you can say:

sv0-platform #500 (merged 2026-04-23) — per-path exec_30d is now destination-scoped on Azure Cognitive Services (Foundry) paths; the BR-SI-SP double/triple-count bug filed as #465 is fixed for AWS + Azure Cognitive Services destinations. ServiceNow and Entra destinations still rely on the entity-only fallback pending sv0-connectors #91.
sv0-platform #502 (merged 2026-04-23) — non-AWS authority-path UI rows now render "recent activity (events)" instead of "executions" when the evidence tier is below GROUND_TRUTH. The worst of the label problem is dampened but the underlying number remains a proxy.

2. Terminology

Product copy uses these words interchangeably today. They are not interchangeable.

Execution — "the workload ran to completion and produced its side-effect." For a ServiceNow BR, one script evaluation. For a Foundry agent, one run record. For a Lambda, one Invoke the runtime returned. No system-neutral source exists; every platform exposes a different proxy.
Call ≠ Invocation ≠ Run — these three are NOT interchangeable and conflating them is the single most common mistake in demo copy. run is a Foundry agent execution (one POST /threads/{id}/runs). Invocation is AWS's word for a Lambda that was called (one Invoke the runtime returned). call is a raw API call to the backing service (e.g. Azure Monitor's TotalCalls counts chat-completion HTTPs regardless of caller). One agent run can produce many calls (tool loops); one Lambda Invocation is one call; neither run nor Invocation equals thread.
Call — an API call to the backing service. Azure Monitor's TotalCalls on a Foundry deployment counts chat-completion HTTPs regardless of caller. One agent execution can produce many calls (tool loops).
Trigger — an external event that could have caused an execution. ServiceNow incident.insert triggers the Auto-route identity tickets BR. Observing the trigger is evidence the BR likely ran, not proof.
Invocation — AWS's word for "a Lambda was called." Equivalent to execution when read from Invocations metric or Invoke CloudTrail events — which we do not read today.
Sign-in — Entra's record of a token being issued. An SP with token caching signs in once per token lifetime (60-90 min) and serves many calls. Sign-ins are a lower bound on call volume.
Token request — exchanging credentials for a token. For service principals in Entra this equals sign-in; for OAuth clients that rotate per-call it diverges.

When a UI surface says "executions" we mean the first. Every count derived from any of the other terms is a proxy.
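To make the difference between threads, runs, and calls concrete, here is a minimal sketch that counts the same made-up activity three ways. The `Thread` and `Run` classes and the sample data are illustrative assumptions, not the connector's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One agent run (hypothetical model, not the connector's schema)."""
    assistant_id: str
    calls: int  # chat-completion HTTP calls made by this run, tool loops included


@dataclass
class Thread:
    """One thread; it may hold zero, one, or many runs."""
    runs: list = field(default_factory=list)


def count_threads(threads):
    return len(threads)


def count_runs(threads):
    return sum(len(t.runs) for t in threads)


def count_calls(threads):
    return sum(run.calls for t in threads for run in t.runs)


# Made-up sample: an idle thread, a simple thread, and a tool-heavy thread.
SAMPLE_THREADS = [
    Thread(),                                  # session opened, nothing asked
    Thread(runs=[Run("agent-a", calls=1)]),    # one run, one completion
    Thread(runs=[
        Run("agent-b", calls=4),               # 3 tool-use turns + 1 completion
        Run("agent-b", calls=2),
        Run("agent-a", calls=1),
    ]),
]
```

On this sample the three counters return 3 threads, 4 runs, and 8 calls, so a UI number built from any one of them cannot stand in for the others.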

3. Fidelity Tier System

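Four tiers. No sub-tiers. STRUCTURAL_ONLY, which Section 4.2 applies to multi-agent Foundry projects, is a usage restriction on APPROXIMATE (presence signal only), not a fifth tier.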

Tier 1 — `GROUND_TRUTH`

Source: Platform emits a per-execution record with unique ID, timestamp, and attributable principal. We read every record in the window.
Accuracy: ±1% (covers clock-skew and in-flight delivery).
Example we would score this: CloudTrail Invoke events for AWS Lambda — one record per invocation with userIdentity, eventTime, and full request context. Each event is per-execution, not an aggregated metric. No shipping connector operates at this tier today (CloudTrail is disabled on every demo tenant to date). TotalCalls on Azure Monitor is a metric, not per-execution records — it lands between Tier 1 and Tier 2 depending on how it is read; we treat it as GROUND_TRUTH only for the "how many calls hit the deployment" question, not for "who called" or per-execution attribution.
Failure modes: Record loss in delivery-delay windows; out-of-region events invisible; per-event volume can hit billing limits on chatty services.

Tier 2 — `HIGH_FIDELITY`

Source: Deterministic execution log exists and is read, but either (a) attribution is to the workload not the triggering identity, or (b) aggregation is done upstream (summary endpoints). Counts are correct; attribution may be lossy.
Accuracy: ±10%, with known direction of error (overcount from retry-duplication; undercount from upstream sampling).
Example: ServiceNow sys_flow_context per flow — one record per run, reliable timestamps, attributed to the flow but not the triggering user (entra-servicenow/src/entra_servicenow/core/transformer.py:2040-2082). sys_log structured evidence (opt-in, transformer.py:2182-2241) also lives here.
Failure modes: Attribution gap; records beyond platform retention invisible.

Tier 3 — `APPROXIMATE`

Source: No per-execution log read. Count derived from a related signal — threads opened as proxy for agent runs, trigger records as proxy for BR evaluations, sign-ins as proxy for API calls.
Accuracy: Error bound ranges from 0.1× to 3×+ depending on tool use, batched completions, and multi-agent projects. Useful for "is this used at all," not for "how often." A thread with 3 tool-use turns + 1 completion produces 4 create_completions API calls but counts as 1 thread — a 4× underestimate. A project with 5 agents sharing 1 thread over-reports by 5× per agent. The 2026-04-20 result (our 8 vs TotalCalls=7) is a single-workload coincidence: one completion per thread, no tool use, single agent active. This does not generalize.
Example: Foundry run_count_30d = count of threads with created_at in the 30d window, project-wide, NOT filtered by agent (azure-foundry/src/azure_foundry/adapters/foundry_client.py:611-724). See the self-authored commentary at lines 30-46 of that file.
Failure modes: Systematic bias in either direction; silent divergence as usage patterns shift (e.g., an agent adopting long-running tool-use sessions will diverge further from its thread count).

Tier 4 — `UNAVAILABLE`

Source: None read. Customer hasn't enabled the prerequisite (license, diagnostic setting, trail), or API returned 403, or the service isn't configured. Connector emits zero evidence; platform shows execution_30d = 0.
Accuracy: Structurally wrong. Zero means "we don't know," not "it hasn't run."
Example: AWS CloudTrail on dev-scan-2026-04-20.json — aws_cloudtrail.status = "unavailable_not_enabled" (sv0-connectors/integrations/aws/reports/dev-scan-2026-04-20.json). Every workload in that scan reports zero.
Failure mode: The critical one — a customer's high-risk workload running thousands of times per day still shows execution_30d = 0, firing dormant_authority as a false positive.

Rule: UNAVAILABLE MUST be reported via evidenceCompleteness.sources.*.status (see azure-foundry/src/azure_foundry/core/discoverer.py:200-269). The UI should suppress dormant-authority findings when the relevant source is unavailable_not_enabled. This suppression is not yet implemented — tracked as a Roadmap item below.
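The suppression described in the rule can be sketched as follows. This is a hedged Python illustration, not the evaluator's implementation (the real rule lives in TypeScript); the function names are hypothetical, and treating unavailable_no_access the same as unavailable_not_enabled is an assumption beyond what the rule states.

```python
from typing import Optional

# Assumption: both statuses mean "we could not read the source".
UNREADABLE_STATUSES = {"unavailable_not_enabled", "unavailable_no_access"}


def resolve_execution_count(execution_30d: int, source_statuses: dict) -> Optional[int]:
    """Return None (unknown) instead of 0 when no source could be read.

    source_statuses maps a source name to its evidenceCompleteness status,
    e.g. {"aws_cloudtrail": "unavailable_not_enabled"}.
    """
    readable = [s for s in source_statuses.values() if s not in UNREADABLE_STATUSES]
    if execution_30d == 0 and not readable:
        return None  # zero here means "we don't know", not "it hasn't run"
    return execution_30d


def should_fire_dormant_authority(execution_30d: int, source_statuses: dict) -> bool:
    """Fire only on an observed zero, never on an unknown."""
    return resolve_execution_count(execution_30d, source_statuses) == 0
```

With this shape, a workload whose only source is unavailable_not_enabled resolves to unknown and the finding stays silent, while the same zero backed by an available source still fires.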

4. Per-Connector Deep Dive

4.1 Entra-ServiceNow connector

Code: sv0-connectors/integrations/entra-servicenow/

Sources read today

Five execution-evidence sources, each at a different tier. They compose into the execution_count_30d property on ServiceNow workload nodes.

#	Source	Reader	Evidence confidence (connector claim)	Tier
1	Entra `/auditLogs/signIns`	`adapters/azure_client.py:169-211`	Structural sign-in record	`APPROXIMATE`
2	ServiceNow `sys_flow_context`	emitted via `_add_flow_execution_node`, `core/transformer.py:2040-2082`	`DETERMINISTIC` for flows	`HIGH_FIDELITY`
3	ServiceNow `sys_trigger`	`_add_job_execution_node`, `transformer.py:2085-2122`	`DETERMINISTIC` for scheduled jobs	`HIGH_FIDELITY`
4	ServiceNow trigger-record examples (e.g. `incident`, `change_request`)	`_add_trigger_record_node`, `transformer.py:2125-2180`	`TEMPORAL_INFERRED`	`APPROXIMATE`
5	ServiceNow `sys_log` structured entries	`_add_syslog_evidence_node`, `transformer.py:2182-2241`	`DETERMINISTIC` (opt-in)	`HIGH_FIDELITY`

How `execution_count_30d` is computed

Critical junction at transformer.py:1496-1537:

For flows / scheduled jobs with records in sys_flow_context / sys_trigger, execution_count_30d is the record count — HIGH_FIDELITY.
For BRs and Script Includes (no native execution table exposed to REST until sys_audit ships), the count comes from _evidence_count_by_workload — a map incremented once per trigger-record example emitted (transformer.py:2177-2180). A BR monitoring incident.insert gets execution_count_30d = (recent incident records observed) — a trigger-record proxy.
A Script Include with no direct evidence but CALLED by a BR that does inherits the caller's count (transformer.py:1514-1531).
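The three branches above can be sketched in a few lines. This is an illustration of the described behavior, not the code at transformer.py:1496-1537; the function name, argument shapes, and workload ids are hypothetical, and accumulating inherited counts across several calling BRs is an assumption based on the description in Section 6.2.

```python
from collections import Counter


def execution_counts(record_counts, trigger_examples, calls):
    """Sketch of how execution_count_30d is assigned per workload.

    record_counts:    {workload_id: rows read from sys_flow_context / sys_trigger}
    trigger_examples: one BR workload_id per trigger-record example emitted
    calls:            {br_workload_id: [script_include_workload_id, ...]}
    """
    counts = dict(record_counts)               # flows / jobs: the record count itself
    counts.update(Counter(trigger_examples))   # BRs: +1 per trigger-record example
    direct = set(counts)                       # workloads that have their own evidence
    for br, script_includes in calls.items():
        for si in script_includes:
            if si in direct:
                continue                       # direct evidence wins over inheritance
            # Assumption: inherited counts accumulate across calling BRs.
            counts[si] = counts.get(si, 0) + counts.get(br, 0)
    return counts
```

For example, a BR with five trigger-record examples that calls two Script Includes yields a count of 5 on the BR and 5 on each Script Include, all from the same five source events.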

Trigger-record propagation is where the platform's 8 for servicenow-openai-client on 2026-04-20 largely comes from. The UI number is the path-materializer rollup across ALL evidence nodes touching that principal (Graph-leg BR triggers + Foundry-leg summary), summed in sv0-platform/src/ingestion/authority-path-materializer.ts:262-336. It is not a count of one system's records.

Ground-truth comparison (canonical Sergey scenario, 2026-04-20)

sys_script_execution_history for the BR: 21 records, all 2026-04-20, polluted by developer/test runs.
incident updates (upper bound of what could trigger the BR): 14.
Platform Peak Executions 30D, Foundry-leg path: 8.
Azure Monitor TotalCalls on create_completions, gpt-nano-for-summary: 7.

The 8 is not a direct sum of any of these. It's the path-materializer's rollup of evidence nodes attached to the workload or its RUNS_AS targets — combining BR trigger-record evidence (5 in the scan report, but platform-side propagation through RUNS_AS changes the count) with Foundry agent-run-summary evidence. Landing within ±1 of Azure Monitor truth here is coincidence, not guarantee.

Customer-side requirements

Source	Requires
Sign-in logs	Entra ID P1 or P2 license (Microsoft.com MSRP: ~$6/user/month P1). `azure_client.py:170` documents this.
`sys_flow_context`	Read access to `sys_flow_context`; Flow Designer flows enabled. Default-on for modern ServiceNow.
`sys_trigger`	Read access to `sys_trigger`. Default-on.
Trigger records	Read access to the BR's monitored tables (may include `incident`, `change_request`, PII tables) — often the hardest permission to get in a customer demo.
`sys_log`	The customer's scripts must emit structured log entries (`gs.error('[SI_NAME] START ...')`). We detect the pattern; we don't write it. Most customers don't have it.

Known discrepancies

Sign-ins count token issuances, not calls. An SP with SDK token caching signs in once per token lifetime and serves many calls. Our sign-in count is a lower bound.
Trigger-record evidence is not execution proof. Five incident.insert events prove the monitored table saw traffic — not that the BR ran (condition evaluation may have skipped it). Flagged explicitly with confidence: "TEMPORAL_INFERRED" + proof_notes (transformer.py:2161-2166).
Propagated evidence double-counts — previously live, fixed 2026-04-23. A BR with 5 trigger records calling two SIs gave each SI 5 inherited; a risk cluster containing the BR + both SIs summed 15 for 5 real events. Pre-fix, risk-cluster-service.ts performed a raw total_execution_30d += path.current_state.execution_30d with no dedupe on source event, evidence node, or resource_key. authority-path-materializer.ts summed across [workload._id, ...runsAsTargets], so a BR already counted on the workload was counted again under each RUNS_AS SP. A cluster containing BR + Script Include + Service Principal (a BR-SI-SP triangle) triple-counted the same source events. Filed as sv0-platform #465 and fixed by PR #500 (merged 2026-04-23, commit e0c05d6). Post-fix: per-path evidence dedupes by source_record_id (sumDistinctExecutionEvidenceCount at risk-cluster-service.ts:574), and aggregation scopes to destination_resource_key on paths whose destination has a derivable canonical resource key. AWS and Azure Cognitive Services destinations now scope correctly; ServiceNow and Entra destinations remain on the entity-only fallback until connector-side target_resource_key emission lands (sv0-connectors #91). On those remaining paths the pre-#500 over-report characterization still applies directionally, though #500 at minimum deduplicates per source event.
execution_count_30d is workload-wide, not per-resource. authority-path-materializer.ts:273-280 acknowledges this as intentional v1 fallback (limitation P1-1).
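The double-count and its fix can be reproduced on toy data. The sketch below is a Python illustration of the idea behind deduplicating on source_record_id; it is not a port of sumDistinctExecutionEvidenceCount, and the `PathEvidence` class and incident ids are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEvidence:
    path_id: str            # which path in the cluster carries the evidence
    source_record_id: str   # the underlying source event


def raw_total(evidence):
    """Pre-fix behavior: every evidence row counts, even for the same event."""
    return len(evidence)


def distinct_total(evidence):
    """Post-fix idea: count each source event once across the cluster."""
    return len({e.source_record_id for e in evidence})


# Five real incidents, each attached to the BR and to both inheriting Script Includes.
INCIDENTS = [f"INC{n:04d}" for n in range(1, 6)]
CLUSTER = [PathEvidence(p, inc) for p in ("br", "si-1", "si-2") for inc in INCIDENTS]
```

Here `raw_total(CLUSTER)` gives 15 and `distinct_total(CLUSTER)` gives 5, matching the "15 for 5 real events" case described above.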

Failure-mode reporting

When P1/P2 is missing, the connector correctly emits execution_evidence.status = "unavailable_no_access" + note "Azure AD P1/P2 license required for signIns API" (transformer.py:2309-2313). ServiceNow permission failures on sys_flow_context, however, surface as silent-zero counts rather than unavailable_no_access — a gap.
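One way to close the sys_flow_context gap is to make the permission failure explicit at the point where the table is read. The sketch below is a hypothetical shape, not connector code; it assumes a denied read surfaces as HTTP 401 or 403, which may not hold for every ServiceNow ACL configuration.

```python
def flow_context_evidence(http_status: int, rows: list) -> dict:
    """Map a sys_flow_context read result to an evidence status.

    Assumption: a denied read arrives as 401/403 rather than an empty 200.
    """
    if http_status in (401, 403):
        return {
            "status": "unavailable_no_access",
            "execution_count_30d": None,  # unknown, deliberately not 0
            "note": "read access to sys_flow_context denied",
        }
    if http_status != 200:
        raise RuntimeError(f"unexpected status {http_status} reading sys_flow_context")
    return {"status": "available", "execution_count_30d": len(rows)}
```

Returning None for the count keeps a denied read distinguishable from a flow that really has no records in the window.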

Current tier and upgrade path

Flow / scheduled-job paths: HIGH_FIDELITY.
BR / SI paths: APPROXIMATE. Upgrade to HIGH_FIDELITY requires sys_audit integration (Phase 2 of the BR/SI evidence plan).
Overall connector tier: APPROXIMATE (lowest common denominator).

4.2 Azure Foundry connector

Code: sv0-connectors/integrations/azure-foundry/

Tier: STRUCTURAL_ONLY by default; APPROXIMATE only for verified single-agent projects. Previous drafts scored this connector APPROXIMATE across the board — that was wrong for multi-agent projects (the realistic case) and has been corrected here.

FIELD WARNING — READ BEFORE ANY CUSTOMER DEMO WITH MULTIPLE AGENTS: Every agent in a Foundry project reports the same execution_count_30d — the count is project-wide, not per-agent. The root cause is at azure-foundry/src/azure_foundry/core/transformer.py:287-290: the connector assigns exec_count_30d = run_summary.run_count_30d directly onto every agent workload node with no per-agent filter, because foundry_client.py:611-724 does not filter threads by assistant_id. A customer with 5 agents in a project will see all 5 rows reporting the same N — a factor-5 over-report for the least-active agents, factor-5 under-report for the most-active. On the 2026-04-20 dev scan every agent reported execution_count_30d = 1 from a single project-wide thread.

ui/src/pages/AuthorityPathsListPage.tsx:642 renders the "Peak Executions (30d)" column from this value. sv0-platform #502 (merged 2026-04-23) now renders non-AWS paths as "recent activity (events)" instead of "executions" when the evidence tier is below GROUND_TRUTH — so the worst of the label problem is dampened on Foundry rows as of 2026-04-23 — but the underlying number remains project-wide, not per-agent. Do not walk a customer through multi-agent Foundry views without explaining this first.

For multi-agent Foundry projects: use this data to detect whether the project is used at all — not to quantify per-agent usage. See the tier guidance below.

Sources read today

Only one: the /agents/v1.0/threads endpoint (new /threads on AIFoundry endpoints). The connector calls it per-project and treats each thread's created_at as a proxy for an agent run. The source comment at adapters/foundry_client.py:20-46 should be read verbatim by anyone writing demo copy — it pre-empts every misleading claim we might otherwise make.

How the count is computed

get_agent_run_summary (foundry_client.py:611-724):

Enumerate project threads, paginated.
Filter to threads with created_at ≥ now - 30d.
Return run_count_30d = number of qualifying threads, last_run_at = max created_at.
transformer.py:287-290 writes execution_count_30d = run_summary.run_count_30d directly onto every agent workload node in the project with no per-agent filter (because foundry_client.py:611-724 does not filter threads by assistant_id). One aggregate execution_evidence node per agent (DQ-4 summary-first).
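The steps above reduce to a window filter followed by a broadcast. The following sketch shows that shape under stated assumptions: `created_at` is taken to be Unix epoch seconds, and `summarize_threads` / `stamp_agents` are hypothetical names, not the functions in foundry_client.py or transformer.py.

```python
from datetime import datetime, timedelta, timezone


def summarize_threads(created_at_epochs, now, window_days=30):
    """Count project threads created inside the window (project-wide)."""
    cutoff = now - timedelta(days=window_days)
    recent = [
        created
        for created in (
            datetime.fromtimestamp(ts, tz=timezone.utc) for ts in created_at_epochs
        )
        if created >= cutoff
    ]
    return {
        "run_count_30d": len(recent),
        "last_run_at": max(recent) if recent else None,
    }


def stamp_agents(agent_ids, summary):
    """Write the same project-wide count onto every agent (no per-agent filter)."""
    return {agent_id: summary["run_count_30d"] for agent_id in agent_ids}
```

Because `stamp_agents` never looks at which agent a thread belongs to, every agent in the project ends up with an identical value, which is the behavior the field warning describes.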

Three named biases the code itself acknowledges:

The count is project-wide and NOT filtered by assistant_id. Every agent in the same project gets the same thread count — a factor-N over-report for the least-active agents and a factor-N under-report for the most-active. In practice on the 2026-04-20 dev scan, all five agents reported run_count_30d = 1, reflecting one thread created across the project.
A thread can have zero runs inside it (client opened a session, never asked anything). Each still counts.
A thread can have many runs inside it. It still counts once.

Ground-truth comparison (same scenario as 4.1)

Our derived count (path-materialized, rolled up): 8.
Azure Monitor TotalCalls, create_completions, gpt-nano-for-summary: 7.

This alignment is a coincidence, not a guarantee. The caller makes one completion per thread, with no tool use and one thread per inbound incident: the most favorable possible scenario. Real-world divergence: a single thread with 3 tool-use turns + 1 completion = 4 Azure Monitor TotalCalls but 1 thread in our count (0.25× ratio). A project with 5 agents and 8 threads in the window = all 5 agents report 8, while the true per-agent count averages roughly 8/5 (a 5× over-report). Do not cite the 8 vs 7 match as evidence of ±14% accuracy; the error bound in production is 0.1× to 3×+.

Customer-side requirements

Source	Requires
Threads endpoint (`/agents/v1.0/threads`)	`Azure AI User` or `Azure AI Project Manager` role on the Foundry project. Data-plane token. Default not granted — our own demo app registration needed an explicit grant.
Azure Monitor `TotalCalls` (would-be `GROUND_TRUTH` source)	Diagnostic setting on the Foundry deployment routing `AllMetrics` to a workspace we read. Not configured today in the demo tenant, and not read by the connector today.
Entra sign-in logs for the SP (attribution)	Entra P1/P2 license. The Foundry connector's `evidence_completeness.sign_in_logs.status = "unavailable_not_enabled"` on the 2026-04-20 dev scan is exactly this.

Known discrepancies

Project-wide, not agent-specific. Five agents with one active thread all report execution_count_30d = 1, so the project shows N_agents reported executions for one real thread (N_agents:1).
Threads ≠ runs. See above.
No chat-completion call count. A 30-call multi-tool session shows 1. TotalCalls shows 30.
No attribution. Cannot answer "who opened this thread" without joining sign-in logs on SP object_id.

Failure-mode reporting

Correctly scored. discoverer.py:256-269 distinguishes available / partial / unavailable_no_access / unavailable_not_applicable / unavailable_not_enabled. 403 on threads → empty summary + degraded status. No agents → unavailable_not_applicable, not unavailable_no_access — correct.

Current tier and upgrade path

Current: STRUCTURAL_ONLY for multi-agent projects, APPROXIMATE for verified single-agent projects. Do NOT quote per-agent execution counts from Foundry projects with more than one agent; the counts are not agent-attributed. For single-agent projects, the count is a directional signal that may be off by 0.1× to 3×+. For multi-agent projects, use it to detect whether the project is used at all, not to quantify per-agent usage.
HIGH_FIDELITY: fetch /threads/{id}/runs per thread, filter by assistant_id, aggregate runs with status. ~3 days. Gap G03/G04 in ETL plan.
GROUND_TRUTH: add Azure Monitor TotalCalls query per deployment + sign-ins cross-reference for attribution. ~1 week. Requires customer diagnostic setting; we should fire a specific onboarding_gap finding when absent.
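The HIGH_FIDELITY step amounts to grouping runs by agent instead of counting threads. The sketch below works on run records that have already been fetched per thread; it assumes each record is a dict with `assistant_id` and `status` fields and that a finished run has status "completed". The function name is hypothetical and no HTTP calls are shown.

```python
from collections import Counter


def runs_per_agent(runs_by_thread, completed_only=True):
    """Aggregate fetched run records into a per-agent run count.

    runs_by_thread: {thread_id: [run_record, ...]} where each run_record is
    assumed to carry "assistant_id" and "status".
    """
    counts = Counter()
    for runs in runs_by_thread.values():
        for run in runs:
            if completed_only and run.get("status") != "completed":
                continue  # skip failed, cancelled, or still-running runs
            counts[run["assistant_id"]] += 1
    return counts
```

A thread with no runs contributes nothing and a thread with many runs contributes one per run, which removes two of the three biases named in this section.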

4.3 AWS connector

Code: sv0-connectors/integrations/aws/

Sources read today

Structurally: IAM, Lambda config, ECS task defs, Bedrock agents, Secrets inventory. Execution evidence: only CloudTrail, read from S3 archive (extractors/cloudtrail_extractor.py).

The connector does not fetch CloudWatch Invocations for Lambda. The Lambda extractor reads list_functions, get_function, event source mappings, and resource policies, but not GetMetricStatistics. When CloudTrail is disabled, a second APPROXIMATE data source could be added, but cloudwatch:GetMetricStatistics / cloudwatch:GetMetricData are NOT in the managed IAM policy today (sv0-connectors/integrations/aws/cfn/securityv0-readonly-role.yaml, 353 lines, zero cloudwatch:* statements). Enabling this fallback is therefore not a connector-only change: it requires (a) updating the CFN template to add the CloudWatch actions, (b) every customer re-deploying the CFN stack to widen the role's permissions policy (the trust policy is unaffected), and (c) connector code to call the API. Until (a) + (b) ship, calling CloudWatch from the connector will 403 on every customer role.

How the count is computed (when CloudTrail IS enabled)

CloudTrailExtractor.extract_for_workload (cloudtrail_extractor.py:193-265):

Walk S3 prefix AWSLogs/<account>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/ for the 30-day window.
Decompress, parse, iterate Records.
Filter to event names the workload cares about (Invoke for Lambda, RunTask for ECS, InvokeModel for Bedrock — cloudtrail_extractor.py:53-81).
Match on exact ARN, or on userIdentity.sessionContext.sessionIssuer.arn == role_arn for non-Invoke calls produced by the workload's role (S3 GetObject, KMS Decrypt, etc.).
Emit one CloudTrailEvidence per matching event.
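Two of the steps above, building the daily S3 prefixes and matching a record to a workload, can be sketched without any AWS calls. This is an illustration under assumptions, not the code in cloudtrail_extractor.py: the function names and `DIRECT_EVENTS` are hypothetical, and the split "direct events match on ARN, everything else matches on the role" is one reading of the matching rule.

```python
from datetime import date, timedelta

# Event names taken from the description above; the constant name is hypothetical.
DIRECT_EVENTS = frozenset({"Invoke", "RunTask", "InvokeModel"})


def day_prefixes(account, region, end, days=30):
    """Yield one S3 key prefix per day, newest first, for the scan window."""
    for offset in range(days):
        day = end - timedelta(days=offset)
        yield f"AWSLogs/{account}/CloudTrail/{region}/{day:%Y/%m/%d}/"


def matches(record, workload_arn, role_arn, direct_events=DIRECT_EVENTS):
    """Decide whether one CloudTrail record is evidence for the workload."""
    if record.get("eventName") in direct_events:
        arns = {r.get("ARN") for r in record.get("resources") or []}
        return workload_arn in arns
    if role_arn is None:
        return False  # no role to attribute to: skip rather than guess
    issuer = (
        record.get("userIdentity", {})
        .get("sessionContext", {})
        .get("sessionIssuer", {})
    )
    return issuer.get("arn") == role_arn
```

Skipping role-mediated events when `role_arn` is missing mirrors the intentional undercount described under known discrepancies.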

When the trail exists, this is the only path in the codebase producing per-execution attribution (events carry principalId + sessionIssuer). Hypothetically, with full CloudTrail coverage, Lambda paths score GROUND_TRUTH for direct invocations and HIGH_FIDELITY for indirect role-mediated evidence. We have never achieved this on any demo tenant.

Ground-truth comparison

Impossible today: aws_cloudtrail.status = "unavailable_not_enabled" on both dev-scan-2026-04-20.json and prod-post-461-aws.json. Every AWS workload reports execution_count_30d = 0. No empirical ±% — only structural analysis.

Customer-side requirements

Source	Requires
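CloudTrail (per-account)	Trail configured in every region of interest, writing to an S3 bucket we read. Lambda Invoke is logged as a data event, so data-event logging for Lambda must be enabled on the trail. ~$2/100k events typical cost for additional management-event copies; data events are billed separately. The first management-event copy is free, and the console Event history retains only 90 days.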
CloudTrail (org trail)	Organization-level trail with `AWSLogs/<org_id>/<account_id>/CloudTrail/...` prefix layout. Connector supports this via `CLOUDTRAIL_LAYOUTS` (`cloudtrail_extractor.py:102-115`).
CloudWatch `Invocations` metric (potential fallback)	IAM `cloudwatch:GetMetricData` / `cloudwatch:GetMetricStatistics`. NOT in `cfn/securityv0-readonly-role.yaml` today — enabling this fallback requires (a) a CFN template update to add the CloudWatch actions, (b) customer re-deploys the CloudFormation stack, (c) connector code to call the API. Every existing deployed role must be re-onboarded.
CloudTrail Lake (long-window queries)	Event data store configured. Not currently supported.
Cross-account S3 read	The trail's S3 bucket must be readable by our scanning principal. Common gotcha.

Known discrepancies

Zero-not-unknown. CloudTrail-off emits zero evidence + scan-level unavailable_not_enabled, but each workload's execution_count_30d silently reads 0 — indistinguishable from an idle workload with trail enabled. No path-level degraded-confidence flag.
Attribution is per-call, not aggregated. Unlike Foundry summary-first, each CloudTrail evidence is a full event. Great for attribution, but 30 days × chatty Lambda can hit millions. CloudTrailScanBudgetExceededError (cloudtrail_extractor.py:118-123) caps it.
Non-Invoke attribution requires role_arn. Without it, non-Invoke events are skipped (cloudtrail_extractor.py:390-398) — intentional undercount to avoid false attribution.
Region scoping. Only regions explicitly listed are scanned; out-of-scope regions emit zero.

Failure-mode reporting

Scan-level: correct. Workload-level: execution_count_30d = 0 on each Lambda is indistinguishable from a real idle Lambda. This is the meta-finding — we silently communicate UNAVAILABLE as APPROXIMATE_ZERO, and the dormant_authority rule (src/evaluator/rules/dormant-authority.ts:52-64) does not yet suppress itself when evidence completeness is unavailable_not_enabled.

Current tier and upgrade path

Current: UNAVAILABLE on every demo scenario to date.
Connector + CFN update win: CloudWatch Invocations fallback for Lambda (3 days connector code, plus CFN update to add cloudwatch:GetMetricStatistics / cloudwatch:GetMetricData to cfn/securityv0-readonly-role.yaml, plus customer re-stack for every deployed role). Moves UNAVAILABLE → APPROXIMATE. Attribution still absent.
1-week win: trails when present, CloudWatch fallback when not, first-class onboarding_gap:cloudtrail_not_enabled finding. Moves to HIGH_FIDELITY when trail is present.
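For the CloudWatch fallback, the connector-side read could look roughly like the sketch below. It assumes `cloudwatch` is a boto3 CloudWatch client created by the caller and that the customer role already grants cloudwatch:GetMetricStatistics, which is not true of the managed policy today; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def lambda_invocations_30d(cloudwatch, function_name, now=None):
    """Sum the AWS/Lambda Invocations metric over the last 30 days.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    The result is a count only; it carries no caller attribution.
    """
    now = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Invocations",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=now - timedelta(days=30),
        EndTime=now,
        Period=86400,          # one datapoint per day
        Statistics=["Sum"],
    )
    return int(sum(point["Sum"] for point in response.get("Datapoints", [])))
```

Taking the client as a parameter keeps the function easy to exercise with a stub, and makes it clear that an access-denied error from a role without the CloudWatch actions has to be handled by the caller.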

5. Attribution vs Counting

Counting "how many executions" and attributing "which identity executed" are separable problems. Today's connectors conflate them.

System	Count fidelity	Attribution fidelity	Why they diverge
Azure Monitor `TotalCalls`	`GROUND_TRUTH`	`UNAVAILABLE`	Metric is aggregated at the deployment, not attributed to caller.
Entra sign-in logs	`APPROXIMATE` (lower bound)	`GROUND_TRUTH`	Each sign-in names a principal, but one sign-in covers many calls.
CloudTrail	`GROUND_TRUTH` when enabled	`GROUND_TRUTH` when enabled	Per-event, carries `userIdentity`.
ServiceNow `sys_flow_context`	`HIGH_FIDELITY`	`HIGH_FIDELITY`	Flows record the user, but `sys_audit` is needed for the full causal chain.
Foundry `/threads`	`APPROXIMATE`	`UNAVAILABLE`	No agent filter, no principal on the thread record.

Practical implication: when a customer asks "who is calling this Foundry agent seven times a day," we cannot answer from Azure Monitor alone; we need sign-ins for the principal join. That's why P1/P2 is a hard dependency for Foundry's usefulness, even though Foundry will list threads without it. This is not intuitive to field engineers and should be in the onboarding checklist.

CloudTrail gives us both for free when enabled. That's the structural reason AWS is the easiest path to GROUND_TRUTH — if the customer already pays for CloudTrail, we just read it.
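Treating count and attribution as two separate axes can be modeled directly. The sketch below is illustrative only: the `SourceFidelity` class and the ranking are assumptions, and the sample rows copy three entries from the table above.

```python
from dataclasses import dataclass

TIER_RANK = {"UNAVAILABLE": 0, "APPROXIMATE": 1, "HIGH_FIDELITY": 2, "GROUND_TRUTH": 3}


@dataclass(frozen=True)
class SourceFidelity:
    name: str
    count: str        # tier for "how many executions"
    attribution: str  # tier for "which identity executed"


def best_sources(sources):
    """Pick the best source for counting and the best for attribution."""
    best_count = max(sources, key=lambda s: TIER_RANK[s.count])
    best_attribution = max(sources, key=lambda s: TIER_RANK[s.attribution])
    return best_count.name, best_attribution.name


FOUNDRY_SOURCES = [
    SourceFidelity("Azure Monitor TotalCalls", "GROUND_TRUTH", "UNAVAILABLE"),
    SourceFidelity("Entra sign-in logs", "APPROXIMATE", "GROUND_TRUTH"),
    SourceFidelity("Foundry /threads", "APPROXIMATE", "UNAVAILABLE"),
]
```

For the Foundry rows, `best_sources(FOUNDRY_SOURCES)` picks Azure Monitor for the count and Entra sign-in logs for attribution, which is the two-source dependency described above.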

6. Where We Give Real Numbers vs Approximations

6.1 Within ±10% of ground truth today

ServiceNow Flow executions with sys_flow_context access. One evidence node per flow-context record; 1:1 of source table. (Evidence: dev-scan-2026-04-20.json reports 67 flow_execution nodes.)
ServiceNow scheduled-job executions with sys_trigger access. Same mechanism.
ServiceNow Script Include executions via sys_log structured logging (Auto-route identity tickets when the opt-in pattern is used).

6.2 Trigger-count proxies (off by factor-of-N)

BR execution via trigger-record evidence. Proxies BR eligibility, not actual execution. Direction depends on condition evaluation; factor 0.1× (skipped) to 5× (retry loop).
Script Include counts via BR propagation. An SI called from three BRs inherits three sets of trigger records even if only one reached it.
Foundry agent runs via thread count. Off by N_agents × (runs/thread ratio).
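Entra sign-ins as proxy for calls. Lower bound only: one sign-in covers every call made during a token lifetime (60-90 min), so sign-ins bound how many token windows were active, not how many calls were made.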

6.3 Cases where we have no number

Any AWS workload without CloudTrail. Every Lambda/ECS/Bedrock on the current demo environment.
Any Foundry project we cannot data-plane-access (403 on /threads).
Any ServiceNow workload where the scanning principal lacks sys_flow_context read.
Any principal whose sign-ins are unavailable due to P1/P2 absence — Foundry run summary still works, attribution gone.

7. Customer Licensing Implications

Open this table on a first-call qualifying conversation. MSRP is public retail list; enterprise agreements vary.

Note on Entra ID P1 pricing: the ~$6/user/month figure is per workforce user, tenant-wide — not per privileged user and not per service principal. A customer with 10,000 employees pays for 10,000 P1 seats to make sign-ins available for any SP in the tenant. This is usually a sunk cost for mid-market+ customers who already have P1 via M365 E3/E5 bundles; greenfield tenants on Microsoft 365 Business Basic/Standard will need an uplift.

Connector	Prereq	MSRP (approx)	Tier unlock
Entra-ServiceNow	Entra ID P1	~$6/user/month	Sign-in attribution → `HIGH_FIDELITY` on identity attribution
Entra-ServiceNow	Entra ID P2	~$9/user/month	P1 features + risk detections; same tier unlock as P1 for our purposes
Entra-ServiceNow	ServiceNow read on `sys_flow_context`, `sys_trigger`, `sys_log`	Included in Platform license; access control is customer-side	`HIGH_FIDELITY` on ServiceNow executions
Entra-ServiceNow	ServiceNow read on monitored tables (`incident`, `change_request`, PII)	Customer-side RBAC	`APPROXIMATE` BR trigger evidence
Entra-ServiceNow	(future) `sys_audit` read access	Customer-side RBAC	`HIGH_FIDELITY` BR execution (requires sys_audit integration, not yet implemented)
Azure Foundry	`Azure AI User` role on project	Included in Azure AI	Thread-count `APPROXIMATE`
Azure Foundry	Azure Monitor diagnostic setting on deployment	Azure Monitor ingestion (~$2.30/GB)	`GROUND_TRUTH` call-count (requires connector upgrade, not yet implemented)
Azure Foundry	Entra P1/P2	~$6-9/user/month	Attribution layer for Foundry findings
AWS	CloudTrail per-account (management events, plus Lambda data events for Invoke)	First management-event trail free, ~$2/100k events beyond; data events billed separately	`GROUND_TRUTH` Lambda invocation evidence
AWS	CloudTrail org trail	Same pricing; org-level admin	Same as above at org scope
AWS	CloudWatch `GetMetricStatistics` / `GetMetricData`	No cost; IAM permission — but NOT in the managed policy today	`APPROXIMATE` invocation fallback (requires CFN update + customer re-stack + connector change; not yet implemented)

Meta-finding on our own demo tenant (as of 2026-04-20):

Entra P1/P2: absent on the primary demo tenant. This is why sign-ins are reported as unavailable_not_enabled on the Foundry scan, and why Foundry findings lack attribution even when threads are readable.
Azure Monitor diagnostic settings on Foundry deployments: not configured.
CloudTrail on the demo AWS account: not enabled.

Our canonical demo environment is misconfigured in all three dimensions that would materially improve our output. Field engineers walking into a customer call should know this: customers with real production workloads often have at least CloudTrail and Entra P1 already in place, and will see meaningfully better numbers than the demo tenant does.

8. Product Positioning Guidance

Do not claim "we count every Foundry call." We count threads. A thread with 30 tool-use iterations shows as 1. The first customer to put Azure Monitor next to our UI will catch it.
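The divergence between threads and calls can be illustrated with a small sketch. The `Thread` record and the agent names below are hypothetical stand-ins, not connector types; the only behavior taken from this document is that the thread count is project-wide and is stamped identically on every agent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical flattened view of one Foundry thread (not a real SDK type)."""
    thread_id: str
    agent: str             # agent that handled the thread
    completion_calls: int  # model calls made, including tool-use iterations


def thread_count_proxy(threads: list[Thread], agents: list[str]) -> dict[str, int]:
    # Project-wide thread count, reported as the same number for every agent.
    return {agent: len(threads) for agent in agents}


def calls_per_agent(threads: list[Thread]) -> dict[str, int]:
    # What a call-level metric would show if it were attributed per agent.
    totals: Counter[str] = Counter()
    for t in threads:
        totals[t.agent] += t.completion_calls
    return dict(totals)


threads = [
    Thread("t1", "summarizer", 1),
    Thread("t2", "router", 30),   # one thread, 30 tool-use iterations
    Thread("t3", "summarizer", 1),
]
print(thread_count_proxy(threads, ["summarizer", "router"]))
# {'summarizer': 3, 'router': 3}
print(calls_per_agent(threads))
# {'summarizer': 2, 'router': 30}
```

In this made-up project the proxy overstates one agent and understates the other by a factor of ten, which is the kind of gap a customer sees when comparing against Azure Monitor.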

Do not claim "we show every Lambda execution." Only if CloudTrail is enabled in the relevant regions and window.

Do not claim "this workload is dormant." Claim "no execution evidence in 30d from the sources we scanned." The qualifier is load-bearing.

Claim accurately: "We correlate executions across systems into a single access-path view, with fidelity that improves as customers enable standard telemetry — Entra P1, CloudTrail, Azure Monitor diagnostic settings. Our evidence-completeness blocks tell you per-finding which sources were available — and we are surfacing that same signal at the workload/path level next (see Roadmap item #1)."

The strongest feature of the current system is not the accuracy of the number — it is the evidenceCompleteness block that tells the customer what we could and couldn't see. That is the honest product, and it should be foregrounded in the UI.
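As a rough illustration of foregrounding completeness, the sketch below renders the qualified dormancy wording from a per-source status map. The map shape, the source keys, and the `available` status are assumptions made for this example; only `unavailable_not_enabled` appears in this document, and the real evidenceCompleteness block may be structured differently.

```python
def dormancy_statement(workload: str, execution_count_30d: int,
                       completeness: dict[str, str]) -> str:
    """Render a claim that names the sources behind a zero count."""
    if execution_count_30d > 0:
        return f"{workload}: {execution_count_30d} executions observed in 30d."
    # Split sources into those we could read and those we could not.
    scanned = sorted(s for s, st in completeness.items() if st == "available")
    missing = sorted(s for s, st in completeness.items() if st != "available")
    if not scanned:
        return f"{workload}: no execution sources were readable; dormancy unknown."
    text = f"{workload}: no execution evidence in 30d from {', '.join(scanned)}."
    if missing:
        text += f" Not visible to this scan: {', '.join(missing)}."
    return text


# Hypothetical source keys; the status string is the one used in this document.
completeness = {
    "foundry_threads": "available",
    "entra_sign_ins": "unavailable_not_enabled",
}
print(dormancy_statement("servicenow-openai-client", 0, completeness))
```

The design choice worth noting is that a zero count never produces the bare word "dormant"; the sentence always carries the list of sources it was derived from.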

9. Roadmap — What Moves the Needle

Five prioritized upgrades. Each is scoped; none is aspirational.

1. Surface evidence-completeness in dormancy findings. dormant_authority today fires on execution_count_30d === 0 regardless of whether that zero is UNAVAILABLE or real absence (src/evaluator/rules/dormant-authority.ts:52-64). Change the rule to read ctx.getEvidenceCompleteness(entity) and suppress or downgrade when the relevant source is unavailable_not_enabled. Effort: 2 days. Tier impact: none (eliminates false positives). Customer benefit: the #1 dormant-authority false positive goes away. Demo-tenant success: zero AWS dormant_authority findings on the CloudTrail-disabled demo.
2. Add CloudWatch Invocations fallback to AWS. Extend lambda_extractor.py with GetMetricStatistics for AWS/Lambda Invocations, 1-day period, 30-day window. Emit as execution_evidence with confidence: TEMPORAL_INFERRED. Effort: 3 days connector code + CFN template update + customer change-management (every existing deployed role must re-stack). cloudwatch:GetMetricData / cloudwatch:GetMetricStatistics are NOT in cfn/securityv0-readonly-role.yaml today; the CFN update is a prerequisite and must ship before connector code is useful. Tier impact: Lambda paths UNAVAILABLE → APPROXIMATE. Customer benefit: dormant detection works for Lambda on tenants without CloudTrail, which is most of them, once they re-stack. Demo-tenant success: every Lambda in dev-scan-2026-04-20.json reports execution_count_30d > 0 where the metric shows invocations.
3. Fetch /threads/{id}/runs for Foundry. Gap G03/G04 in the ETL plan. Filter by assistant_id, aggregate runs with status. Effort: 3 days. Tier impact: Foundry agents APPROXIMATE → HIGH_FIDELITY on count (attribution still needs sign-ins). Customer benefit: per-agent counts instead of project-wide thread count. Demo-tenant success: the 5 agents in the demo project report independent run counts.
4. Add Azure Monitor TotalCalls as a second Foundry source. azure-monitor-query per deployment; cross-check threads vs calls; emit secondary execution_evidence per deployment per day. Effort: 1 week. Tier impact: Foundry deployment paths → GROUND_TRUTH on call count. Customer benefit: our number matches the Azure Monitor blade on the same screen. Demo-tenant success: Foundry call count on UI matches TotalCalls (±1 for clock skew) on the Sergey scenario.
5. ServiceNow sys_audit integration for BR / SI. Phase 2 of the BR/SI evidence plan. Moves BR/SI from TEMPORAL_INFERRED to DETERMINISTIC. Effort: 2 weeks. Tier impact: BR / SI paths APPROXIMATE → HIGH_FIDELITY. Customer benefit: clean answer to "how do you know that BR actually ran?" Demo-tenant success: Auto-route identity tickets BR reports execution_count_30d from sys_audit, consistent within ±10% of the incident trigger count.
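The rule change in the dormancy item can be sketched as follows. The production rule lives in TypeScript; this is a Python rendering of the decision logic only, and the function name and the flat `source_status` argument are assumptions standing in for whatever ctx.getEvidenceCompleteness(entity) actually returns.

```python
from typing import Optional

# Status string taken from this document; other statuses are treated as readable.
UNAVAILABLE_STATUSES = {"unavailable_not_enabled"}


def evaluate_dormant_authority(execution_count_30d: int,
                               source_status: str) -> Optional[str]:
    """Return the finding name, or None when no finding should fire."""
    if execution_count_30d != 0:
        return None                  # executed at least once: not dormant
    if source_status in UNAVAILABLE_STATUSES:
        return None                  # zero means "could not look": suppress
    return "dormant_authority"       # zero from a source we could actually read


print(evaluate_dormant_authority(0, "unavailable_not_enabled"))  # None
print(evaluate_dormant_authority(0, "available"))                # dormant_authority
```

This variant suppresses the finding outright. A downgrade variant would return a lower-severity finding from the second branch instead of None; which of the two is preferable is left open by the roadmap item.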

Sergey's canonical demo tenant is the test case for all five. A fix is "merged" only when the demo tenant passes the success criterion.
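For the CloudWatch Invocations fallback, a minimal sketch of the query shape and the aggregation is shown below. The parameter names follow the CloudWatch GetMetricStatistics API as exposed by boto3; the two helper functions and the canned response are assumptions made for illustration and are not taken from lambda_extractor.py.

```python
from datetime import datetime, timedelta, timezone


def invocation_query(function_name: str, end: datetime) -> dict:
    """Keyword arguments for GetMetricStatistics: 1-day period, 30-day window."""
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(days=30),
        "EndTime": end,
        "Period": 86400,          # one bucket per day, in seconds
        "Statistics": ["Sum"],
    }


def execution_count_30d(response: dict) -> int:
    """Sum the daily 'Sum' datapoints; no datapoints means no reported invocations."""
    return int(sum(dp["Sum"] for dp in response.get("Datapoints", [])))


now = datetime(2026, 4, 20, tzinfo=timezone.utc)
query = invocation_query("example-function", now)  # hypothetical function name

# Canned response in the documented shape, so the sketch runs without AWS access.
sample = {
    "Datapoints": [
        {"Timestamp": now - timedelta(days=2), "Sum": 4.0, "Unit": "Count"},
        {"Timestamp": now - timedelta(days=1), "Sum": 3.0, "Unit": "Count"},
    ]
}
print(execution_count_30d(sample))  # 7
```

Against a live account the query would be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`, which only succeeds once the scanning role carries the CloudWatch read permission noted in the roadmap item.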

10. Appendix — 2026-04-20 Reconciliation Worked Example

Scenario: Sergey's ServiceNow → Entra → Foundry canonical demo. Workload in focus: servicenow-openai-client (Entra SP) and the BR on the ServiceNow side that invokes OpenAI via the Azure gateway. Foundry agent: gpt-nano-for-summary.

Source	30-day count	Tier	What it measures
Platform UI (Foundry path, rolled up)	8	`APPROXIMATE`	Aggregate evidence-node count on the workload, summed across Foundry summary + Graph-leg trigger evidence. Path-materializer output at `authority-path-materializer.ts:289-321`.
Azure Monitor `TotalCalls` (`create_completions`, `gpt-nano-for-summary`)	7	`GROUND_TRUTH`	Chat-completion HTTP calls to the deployment. Aggregated 1-minute granularity.
Entra sign-in logs via Graph API	n/a	`UNAVAILABLE`	Demo tenant has no P1/P2 license; Graph API returned 403. Status correctly flagged as `unavailable_not_enabled`.
ServiceNow `sys_script_execution_history` (raw)	21	`GROUND_TRUTH` (of a different thing)	BR evaluation traces, including developer/test runs during demo setup. Noisy.
ServiceNow `incident` table updates (upper bound)	14	`APPROXIMATE`	Records that could have triggered the BR. Doesn't prove BR ran.

Observations:

The 8 vs 7 gap is the best we have ever observed on any connector, any scenario. It is a coincidence: this workload makes exactly one completion per thread, uses no tools, and has one active agent. The real error bound for the thread-as-run proxy is 0.1× to 3×+ in the general case. Do not cite this match as evidence of ~14% error (8/7 ≈ 1.14×); it does not generalize.
21 in sys_script_execution_history and 14 in incident updates bracket the true "how many times did the BR run" number, which is unknown. sys_audit would give us a specific number in that bracket at HIGH_FIDELITY.
Sign-ins absent (n/a) means the Foundry 8 cannot be attributed to servicenow-openai-client. Attribution is inferred from the structural graph (SP → BR → REST message → Foundry agent), not from runtime evidence.
The UI presents Peak Executions 30D = 8 without qualifier. A customer reading that next to their Azure Monitor blade sees 7. The delta is small; the burden is in the explanation, not the match.

Rerun this appendix for any new canonical scenario and store the result in the scenario's session notes.
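When rerunning the reconciliation, the comparison itself is simple arithmetic. The sketch below is one way to compute it; the function name, the returned keys, and the default tolerance (borrowed from the ±10% threshold used elsewhere in this document) are assumptions.

```python
def reconcile(ours: int, ground_truth: int, tolerance: float = 0.10) -> dict:
    """Compare a platform count against a ground-truth count."""
    if ground_truth <= 0:
        raise ValueError("ground-truth count must be positive to form a ratio")
    ratio = ours / ground_truth
    return {
        "ratio": round(ratio, 2),
        "relative_error": round(abs(ratio - 1.0), 2),
        "within_tolerance": abs(ours - ground_truth) <= tolerance * ground_truth,
    }


# Platform UI (8) vs Azure Monitor TotalCalls (7) from the table above.
print(reconcile(8, 7))
# {'ratio': 1.14, 'relative_error': 0.14, 'within_tolerance': False}
```

Even the best match on record falls outside ±10%, which is consistent with the warning above against citing it as an accuracy claim.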
