Execution Evidence Fidelity
1. Executive Summary
SecurityV0 tells customers "this workload executed N times in the last 30 days." N gates dormant_authority, unproven_execution, and risk-cluster severity (src/evaluator/rules/dormant-authority.ts:52-64, src/evaluator/rules/unproven-execution.ts:26, src/services/risk-cluster-service.ts:500). It is also one of the hardest numbers in the product: every connector produces N from a different source, most sources measure something subtly different from "the thing actually ran once," and several sources require licensed features the customer may not have enabled.
We propose a four-tier fidelity model (GROUND_TRUTH, HIGH_FIDELITY, APPROXIMATE, UNAVAILABLE) and score each shipping connector. At the connector level, every production execution count today is either APPROXIMATE or UNAVAILABLE; zero GROUND_TRUTH paths exist across the three connectors. The thread-count proxy used for Foundry today can be 0.1× to 3×+ of the true call count, and in multi-agent projects the same project-wide number is reported on each of the N agents. The canonical Sergey scenario lands at roughly 1.14× (our 8 vs Azure Monitor's 7) because that specific workload does one thread = one completion = no tools = single active agent. Any customer workload using tools, batched completions, or multiple agents per project will diverge by at least 2× in either direction. Do not cite the Sergey match as a general accuracy claim.
| Connector | Today's tier | Method | What it counts | Ground-truth source we'd match against |
|---|---|---|---|---|
| Entra → ServiceNow | APPROXIMATE | mixed: per-record (Flow/Job), proxy-derived (BR/SI via trigger records), per-record (sys_log opt-in) | Sign-ins (P1/P2 only), Flow/Job exec records, BR trigger-record proxies, sys_log (when opted in) | sys_script_execution_history, sys_audit, Azure Monitor per-app metrics |
| Azure AI Foundry | STRUCTURAL_ONLY (multi-agent) / APPROXIMATE (single-agent) | summary-read (project-wide threads, NOT per-agent) | Threads opened in the project, counted as a proxy for agent runs — same number on every agent in the project | Azure Monitor TotalCalls per deployment, /threads/{id}/runs filtered by assistant_id |
| AWS | UNAVAILABLE (zero emitted today on demo) | none (no Lambda Invocations metric fallback; CloudTrail disabled on demo) | CloudTrail S3 archive when configured; CloudWatch Invocations fallback NOT available today | CloudWatch AWS/Lambda Invocations, CloudTrail Lake, S3 trail |
The honest reading: our execution numbers are useful directional signals that get meaningfully better as customers enable standard telemetry (Entra P1/P2, CloudTrail, Azure Monitor diagnostic settings), and meaningfully worse when they don't.
Field-engineer cheat sheet
One-page summary for anyone walking into a customer call. If you read nothing else in this doc, read this table and Section 8.
| Connector | Tier | Method | What to say on the call | What NOT to say |
|---|---|---|---|---|
| Entra-SN Flow/Job | HIGH_FIDELITY | per-record enumeration (sys_flow_context, sys_trigger) | "We match sys_flow_context 1:1" | "We count every BR execution" |
| Entra-SN BR/SI | APPROXIMATE | proxy-derived (trigger records on monitored tables) | "We count trigger records as a proxy" | Anything implying determinism |
| Foundry | STRUCTURAL_ONLY (multi-agent) / APPROXIMATE (single-agent) | summary-read (project-wide threads, not per-agent) | "We count threads project-wide, not per-agent" | "We count Foundry executions per agent" |
| AWS Lambda | UNAVAILABLE without CloudTrail | none emitted today | "Enable CloudTrail for this to work" | "The Lambda is dormant" |
| AWS any without CloudTrail | UNAVAILABLE | none | "We cannot tell without CloudTrail" | Any number |
Recent platform fixes that change what you can say:
- sv0-platform #500 (merged 2026-04-23): per-path `exec_30d` is now destination-scoped on Azure Cognitive Services (Foundry) paths; the BR-SI-SP double/triple-count bug filed as #465 is fixed for AWS + Azure Cognitive Services destinations. ServiceNow and Entra destinations still rely on the entity-only fallback pending sv0-connectors #91.
- sv0-platform #502 (merged 2026-04-23): non-AWS authority-path UI rows now render "recent activity (events)" instead of "executions" when the evidence tier is below `GROUND_TRUTH`. The worst of the label problem is dampened but the underlying number remains a proxy.
2. Terminology
Product copy uses these words interchangeably today. They are not interchangeable.
- Execution: "the workload ran to completion and produced its side-effect." For a ServiceNow BR, one script evaluation. For a Foundry agent, one `run` record. For a Lambda, one `Invoke` the runtime returned. No system-neutral source exists; every platform exposes a different proxy.
- Call ≠ Invocation ≠ Run: these three are NOT interchangeable and conflating them is the single most common mistake in demo copy. `run` is a Foundry agent execution (one `POST /threads/{id}/runs`). `Invocation` is AWS's word for a Lambda that was called (one `Invoke` the runtime returned). `call` is a raw API call to the backing service (e.g. Azure Monitor's `TotalCalls` counts chat-completion HTTPs regardless of caller). One agent `run` can produce many `calls` (tool loops); one Lambda `Invocation` is one `call`; neither `run` nor `Invocation` equals `thread`.
- Call: an API call to the backing service. Azure Monitor's `TotalCalls` on a Foundry deployment counts chat-completion HTTPs regardless of caller. One agent execution can produce many calls (tool loops).
- Trigger: an external event that could have caused an execution. ServiceNow `incident.insert` triggers the `Auto-route identity tickets` BR. Observing the trigger is evidence the BR likely ran, not proof.
- Invocation: AWS's word for "a Lambda was called." Equivalent to execution when read from the `Invocations` metric or `Invoke` CloudTrail events, which we do not read today.
- Sign-in: Entra's record of a token being issued. An SP with token caching signs in once per token lifetime (60-90 min) and serves many calls. Sign-ins are a lower bound on call volume.
- Token request: exchanging credentials for a token. For service principals in Entra this equals sign-in; for OAuth clients that rotate per-call it diverges.
When a UI surface says "executions" we mean the first. Every derivation from the other terms is a proxy.
3. Fidelity Tier System
Four tiers. No sub-tiers.
Tier 1 — GROUND_TRUTH
- Source: Platform emits a per-execution record with unique ID, timestamp, and attributable principal. We read every record in the window.
- Accuracy: ±1% (covers clock-skew and in-flight delivery).
- Example we would score at this tier: CloudTrail `Invoke` events for AWS Lambda, with one record per invocation carrying `userIdentity`, `eventTime`, and full request context. Each event is per-execution, not an aggregated metric. No shipping connector operates at this tier today (CloudTrail is disabled on every demo tenant to date). `TotalCalls` on Azure Monitor is a metric, not per-execution records; it lands between Tier 1 and Tier 2 depending on how it is read. We treat it as `GROUND_TRUTH` only for the "how many calls hit the deployment" question, not for "who called" or per-execution attribution.
- Failure modes: Record loss in delivery-delay windows; out-of-region events invisible; per-event volume can hit billing limits on chatty services.
Tier 2 — HIGH_FIDELITY
- Source: Deterministic execution log exists and is read, but either (a) attribution is to the workload not the triggering identity, or (b) aggregation is done upstream (summary endpoints). Counts are correct; attribution may be lossy.
- Accuracy: ±10%, with known direction of error (overcount from retry-duplication; undercount from upstream sampling).
- Example: ServiceNow `sys_flow_context` per flow, with one record per run, reliable timestamps, attributed to the flow but not the triggering user (`entra-servicenow/src/entra_servicenow/core/transformer.py:2040-2082`). `sys_log` structured evidence (opt-in, `transformer.py:2182-2241`) also lives here.
- Failure modes: Attribution gap; records beyond platform retention invisible.
Tier 3 — APPROXIMATE
- Source: No per-execution log read. Count derived from a related signal — threads opened as proxy for agent runs, trigger records as proxy for BR evaluations, sign-ins as proxy for API calls.
- Accuracy: Error bound ranges from 0.1× to 3×+ depending on tool use, batched completions, and multi-agent projects. Useful for "is this used at all," not for "how often." A thread with 3 tool-use turns + 1 completion produces 4 `create_completions` API calls but counts as 1 thread, a 4× underestimate. A project with 5 agents sharing 1 thread over-reports by 5× per agent. The 2026-04-20 result (our `8` vs `TotalCalls=7`) is a single-workload coincidence: one completion per thread, no tool use, single agent active. This does not generalize.
- Example: Foundry `run_count_30d` = count of threads with `created_at` in the 30d window, project-wide, NOT filtered by agent (`azure-foundry/src/azure_foundry/adapters/foundry_client.py:611-724`). See the self-authored commentary at lines 30-46 of that file.
- Failure modes: Systematic bias in either direction; silent divergence as usage patterns shift (e.g., an agent adopting long-running tool-use sessions will diverge further from its thread count).
Tier 4 — UNAVAILABLE
- Source: None read. Customer hasn't enabled the prerequisite (license, diagnostic setting, trail), or API returned 403, or the service isn't configured. Connector emits zero evidence; platform shows `execution_30d = 0`.
- Accuracy: Structurally wrong. Zero means "we don't know," not "it hasn't run."
- Example: AWS CloudTrail on `dev-scan-2026-04-20.json`, where `aws_cloudtrail.status = "unavailable_not_enabled"` (`sv0-connectors/integrations/aws/reports/dev-scan-2026-04-20.json`). Every workload in that scan reports zero.
- Failure mode: The critical one. A customer's high-risk workload running thousands of times per day still shows `execution_30d = 0`, firing `dormant_authority` as a false positive.
Rule: UNAVAILABLE MUST be reported via evidenceCompleteness.sources.*.status (see azure-foundry/src/azure_foundry/core/discoverer.py:200-269). The UI should suppress dormant-authority findings when the relevant source is unavailable_not_enabled. This suppression is not yet implemented — tracked as a Roadmap item below.
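The suppression rule above is not implemented yet, so the following is only a minimal sketch of the intended behaviour. The `PathEvidence` class, the `dormant_authority_verdict` function, and the verdict strings are hypothetical names chosen for illustration; only the `unavailable_not_enabled` status string comes from this document. Treating a source with no recorded status as unavailable is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict

# Status value quoted in this document; the suppression set itself is an assumption.
SUPPRESSING_STATUSES = {"unavailable_not_enabled"}


@dataclass
class PathEvidence:
    """Hypothetical, simplified view of one path's execution evidence."""
    execution_30d: int
    source_status: Dict[str, str] = field(default_factory=dict)


def dormant_authority_verdict(evidence: PathEvidence, required_source: str) -> str:
    """Return 'suppressed', 'dormant', or 'active' for one path."""
    # Assumption: a source with no recorded status is treated as not enabled,
    # so a missing completeness block never produces a 'dormant' verdict.
    status = evidence.source_status.get(required_source, "unavailable_not_enabled")
    if status in SUPPRESSING_STATUSES:
        return "suppressed"  # zero here means "we don't know", not "it hasn't run"
    return "dormant" if evidence.execution_30d == 0 else "active"


no_trail = PathEvidence(0, {"aws_cloudtrail": "unavailable_not_enabled"})
idle = PathEvidence(0, {"aws_cloudtrail": "available"})
print(dormant_authority_verdict(no_trail, "aws_cloudtrail"))
print(dormant_authority_verdict(idle, "aws_cloudtrail"))
```

The point of the sketch is that the same `execution_30d = 0` yields two different verdicts depending on source status, which is the distinction the platform currently loses.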
4. Per-Connector Deep Dive
4.1 Entra-ServiceNow connector
Code: sv0-connectors/integrations/entra-servicenow/
Sources read today
Five execution-evidence sources, each at a different tier. They compose into the execution_count_30d property on ServiceNow workload nodes.
| # | Source | Reader | Evidence confidence (connector claim) | Tier |
|---|---|---|---|---|
| 1 | Entra /auditLogs/signIns | adapters/azure_client.py:169-211 | Structural sign-in record | APPROXIMATE |
| 2 | ServiceNow sys_flow_context | emitted via _add_flow_execution_node, core/transformer.py:2040-2082 | DETERMINISTIC for flows | HIGH_FIDELITY |
| 3 | ServiceNow sys_trigger | _add_job_execution_node, transformer.py:2085-2122 | DETERMINISTIC for scheduled jobs | HIGH_FIDELITY |
| 4 | ServiceNow trigger-record examples (e.g. incident, change_request) | _add_trigger_record_node, transformer.py:2125-2180 | TEMPORAL_INFERRED | APPROXIMATE |
| 5 | ServiceNow sys_log structured entries | _add_syslog_evidence_node, transformer.py:2182-2241 | DETERMINISTIC (opt-in) | HIGH_FIDELITY |
How execution_count_30d is computed
Critical junction at transformer.py:1496-1537:
- For flows / scheduled jobs with records in `sys_flow_context` / `sys_trigger`, `execution_count_30d` is the record count (`HIGH_FIDELITY`).
- For BRs and Script Includes (no native execution table exposed to REST until sys_audit ships), the count comes from `_evidence_count_by_workload`, a map incremented once per trigger-record example emitted (`transformer.py:2177-2180`). A BR monitoring `incident.insert` gets `execution_count_30d = (recent incident records observed)`, a trigger-record proxy.
- A Script Include with no direct evidence but CALLED by a BR that does inherits the caller's count (`transformer.py:1514-1531`).
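The three branches above can be sketched as a single lookup. This is an illustrative simplification, not the code in `transformer.py`: the function name, parameter names, and dictionary shapes are hypothetical, and summing over multiple calling BRs is an assumption based on the propagation behaviour described in Section 6.2.

```python
from typing import Dict, List


def execution_count_30d(
    workload_id: str,
    native_counts: Dict[str, int],    # flows/jobs: rows seen in sys_flow_context / sys_trigger
    trigger_counts: Dict[str, int],   # BRs: trigger-record examples emitted
    callers: Dict[str, List[str]],    # Script Include id -> ids of BRs that call it
) -> int:
    # 1. Flows and scheduled jobs: the record count itself.
    if workload_id in native_counts:
        return native_counts[workload_id]
    # 2. BRs: trigger records on the monitored table, a proxy.
    if workload_id in trigger_counts:
        return trigger_counts[workload_id]
    # 3. Script Includes with no direct evidence inherit from calling BRs.
    #    Assumption: counts from several callers are summed.
    return sum(trigger_counts.get(br, 0) for br in callers.get(workload_id, []))


native = {"flow_onboard": 67}
triggers = {"br_auto_route": 5}
calls = {"si_openai_client": ["br_auto_route"]}
for wid in ("flow_onboard", "br_auto_route", "si_openai_client", "si_unused"):
    print(wid, execution_count_30d(wid, native, triggers, calls))
```

Only the first branch counts something that actually ran; the second and third return numbers that look identical in the UI but are proxies.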
Trigger-record propagation is where the platform's 8 for servicenow-openai-client on 2026-04-20 largely comes from. The UI number is the path-materializer rollup across ALL evidence nodes touching that principal (Graph-leg BR triggers + Foundry-leg summary), summed in sv0-platform/src/ingestion/authority-path-materializer.ts:262-336. It is not a count of one system's records.
Ground-truth comparison (canonical Sergey scenario, 2026-04-20)
- `sys_script_execution_history` for the BR: 21 records, all `2026-04-20`, polluted by developer/test runs.
- `incident` updates (upper bound of what could trigger the BR): 14.
- Platform `Peak Executions 30D`, Foundry-leg path: 8.
- Azure Monitor `TotalCalls` on `create_completions`, `gpt-nano-for-summary`: 7.
The 8 is not a direct sum of any of these. It's the path-materializer's rollup of evidence nodes attached to the workload or its RUNS_AS targets — combining BR trigger-record evidence (5 in the scan report, but platform-side propagation through RUNS_AS changes the count) with Foundry agent-run-summary evidence. Landing within ±1 of Azure Monitor truth here is coincidence, not guarantee.
Customer-side requirements
| Source | Requires |
|---|---|
| Sign-in logs | Entra ID P1 or P2 licence (Microsoft.com MSRP: ~$6/user/month P1). azure_client.py:170 documents this. |
| `sys_flow_context` | Read access to sys_flow_context; Flow Designer flows enabled. Default-on for modern ServiceNow. |
| `sys_trigger` | Read access to sys_trigger. Default-on. |
| Trigger records | Read access to the BR's monitored tables (may include incident, change_request, PII tables) — often the hardest permission to get in a customer demo. |
| `sys_log` | The customer's scripts must emit structured log entries (gs.error('[SI_NAME] START ...')). We detect the pattern; we don't write it. Most customers don't have it. |
Known discrepancies
- Sign-ins count token issuances, not calls. An SP with SDK token caching signs in once per token lifetime and serves many calls. Our sign-in count is a lower bound.
- Trigger-record evidence is not execution proof. Five `incident.insert` events prove the monitored table saw traffic, not that the BR ran (condition evaluation may have skipped it). Flagged explicitly with `confidence: "TEMPORAL_INFERRED"` + `proof_notes` (`transformer.py:2161-2166`).
- Propagated evidence double-counts (previously live, fixed 2026-04-23). A BR with 5 trigger records calling two SIs gave each SI 5 inherited; a risk cluster containing the BR + both SIs summed 15 for 5 real events. Pre-fix, `risk-cluster-service.ts` performed a raw `total_execution_30d += path.current_state.execution_30d` with no dedupe on source event, evidence node, or resource_key. `authority-path-materializer.ts` summed across `[workload._id, ...runsAsTargets]`, so a BR already counted on the workload was counted again under each RUNS_AS SP. A cluster containing BR + Script Include + Service Principal (a BR-SI-SP triangle) triple-counted the same source events. Filed as sv0-platform #465 and fixed by PR #500 (merged 2026-04-23, commit `e0c05d6`). Post-fix: per-path evidence dedupes by `source_record_id` (`sumDistinctExecutionEvidenceCount` at `risk-cluster-service.ts:574`), and aggregation scopes to `destination_resource_key` on paths whose destination has a derivable canonical resource key. AWS and Azure Cognitive Services destinations now scope correctly; ServiceNow and Entra destinations remain on the entity-only fallback until connector-side `target_resource_key` emission lands (sv0-connectors #91). On those remaining paths the pre-#500 over-report characterization still applies directionally, though #500 at minimum deduplicates per source event.
- `execution_count_30d` is workload-wide, not per-resource. `authority-path-materializer.ts:273-280` acknowledges this as intentional v1 fallback (limitation P1-1).
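To make the 15-versus-5 arithmetic concrete, here is a small sketch contrasting a raw sum with a dedupe keyed on `source_record_id`. It is written in Python for illustration only; the platform code is TypeScript, and the function names and the `(node, source_record_id)` tuple shape below are hypothetical rather than the actual `sumDistinctExecutionEvidenceCount` implementation.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical shape: (node that carries the evidence, id of the source event).
Evidence = Tuple[str, str]


def naive_cluster_total(per_node_counts: Iterable[int]) -> int:
    """Pre-fix behaviour as described above: add every node's count."""
    return sum(per_node_counts)


def distinct_cluster_total(evidence: Iterable[Evidence]) -> int:
    """Post-fix idea: count each source event once, however many nodes carry it."""
    seen: Set[str] = set()
    for _node, source_record_id in evidence:
        seen.add(source_record_id)
    return len(seen)


# Five real incident events, inherited by a BR and the two Script Includes it calls.
source_events: List[str] = [f"incident-{i}" for i in range(5)]
evidence: List[Evidence] = [
    (node, record) for node in ("br", "si_a", "si_b") for record in source_events
]

naive = naive_cluster_total([5, 5, 5])
deduped = distinct_cluster_total(evidence)
print(naive, deduped)
```

The dedupe only helps when every propagated evidence item carries the id of the event it came from; evidence without a stable source id would still be over-counted.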
Failure-mode reporting
When P1/P2 is missing, the connector correctly emits execution_evidence.status = "unavailable_no_access" + note "Azure AD P1/P2 license required for signIns API" (transformer.py:2309-2313). ServiceNow permission failures on sys_flow_context, however, surface as silent-zero counts rather than unavailable_no_access — a gap.
Current tier and upgrade path
- Flow / scheduled-job paths: `HIGH_FIDELITY`.
- BR / SI paths: `APPROXIMATE`. Upgrade to `HIGH_FIDELITY` requires `sys_audit` integration (Phase 2 of the BR/SI evidence plan).
- Overall connector tier: `APPROXIMATE` (lowest common denominator).
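The "lowest common denominator" roll-up can be sketched as a minimum over ordered tiers. The `Tier` enum and `connector_tier` function are hypothetical names; the numeric ordering is an assumption that simply mirrors the Tier 1 to Tier 4 ranking in Section 3.

```python
from enum import IntEnum
from typing import Iterable


class Tier(IntEnum):
    # Higher value means higher fidelity (assumed ordering, per Section 3).
    UNAVAILABLE = 0
    APPROXIMATE = 1
    HIGH_FIDELITY = 2
    GROUND_TRUTH = 3


def connector_tier(path_tiers: Iterable[Tier]) -> Tier:
    """A connector is only as good as its weakest path type."""
    return min(path_tiers, default=Tier.UNAVAILABLE)


entra_servicenow = {
    "flow": Tier.HIGH_FIDELITY,
    "scheduled_job": Tier.HIGH_FIDELITY,
    "business_rule": Tier.APPROXIMATE,
    "script_include": Tier.APPROXIMATE,
}
overall = connector_tier(entra_servicenow.values())
print(overall.name)
```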
4.2 Azure Foundry connector
Code: sv0-connectors/integrations/azure-foundry/
Tier: STRUCTURAL_ONLY by default; APPROXIMATE only for verified single-agent projects. Previous drafts scored this connector APPROXIMATE across the board — that was wrong for multi-agent projects (the realistic case) and has been corrected here.
FIELD WARNING: READ BEFORE ANY CUSTOMER DEMO WITH MULTIPLE AGENTS. Every agent in a Foundry project reports the same `execution_count_30d`; the count is project-wide, not per-agent. The root cause is at `azure-foundry/src/azure_foundry/core/transformer.py:287-290`: the connector assigns `exec_count_30d = run_summary.run_count_30d` directly onto every agent workload node with no per-agent filter, because `foundry_client.py:611-724` does not filter threads by `assistant_id`. A customer with 5 agents in a project will see all 5 rows reporting the same N: a factor-5 over-report for the least-active agents, factor-5 under-report for the most-active. On the 2026-04-20 dev scan every agent reported `execution_count_30d = 1` from a single project-wide thread.

`ui/src/pages/AuthorityPathsListPage.tsx:642` renders the "Peak Executions (30d)" column from this value. sv0-platform #502 (merged 2026-04-23) now renders non-AWS paths as "recent activity (events)" instead of "executions" when the evidence tier is below `GROUND_TRUTH`, so the worst of the label problem is dampened on Foundry rows as of 2026-04-23, but the underlying number remains project-wide, not per-agent. Do not walk a customer through multi-agent Foundry views without explaining this first.

For multi-agent Foundry projects: use this data to detect whether the project is used at all, not to quantify per-agent usage. See the tier guidance below.
Sources read today
Only one: the /agents/v1.0/threads endpoint (new /threads on AIFoundry endpoints). The connector calls it per-project and treats each thread's created_at as a proxy for an agent run. The source comment at adapters/foundry_client.py:20-46 should be read verbatim by anyone writing demo copy — it pre-empts every misleading claim we might otherwise make.
How the count is computed
get_agent_run_summary (foundry_client.py:611-724):
- Enumerate project threads, paginated.
- Filter to threads with `created_at` ≥ `now - 30d`.
- Return `run_count_30d` = number of qualifying threads, `last_run_at` = max created_at.
- `transformer.py:287-290` writes `execution_count_30d = run_summary.run_count_30d` directly onto every agent workload node in the project with no per-agent filter (because `foundry_client.py:611-724` does not filter threads by `assistant_id`). One aggregate `execution_evidence` node per agent (DQ-4 summary-first).
Three named biases the code itself acknowledges:
- The count is project-wide and NOT filtered by `assistant_id`. Every agent in the same project gets the same thread count: a factor-N over-report for the least-active agents and a factor-N under-report for the most-active. In practice on the 2026-04-20 dev scan, all five agents reported `run_count_30d = 1`, reflecting one thread created across the project.
- A thread can have zero runs inside it (client opened a session, never asked anything). Each still counts.
- A thread can have many runs inside it. It still counts once.
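The three biases are easier to see side by side with a per-agent count. The sketch below operates on an in-memory list of threads, not on the Foundry API: the dictionary shape (a `runs` list nested under each thread, `datetime` values for `created_at`) and both function names are hypothetical, chosen only to contrast the current proxy with a run-level, `assistant_id`-filtered count.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

WINDOW = timedelta(days=30)


def thread_count_proxy(threads: List[dict], now: datetime) -> int:
    """Current behaviour as described: threads created in the window, project-wide."""
    cutoff = now - WINDOW
    return sum(1 for t in threads if t["created_at"] >= cutoff)


def runs_per_agent(threads: List[dict], now: datetime) -> Dict[str, int]:
    """Alternative: count individual runs in the window, grouped by assistant_id."""
    cutoff = now - WINDOW
    counts: Dict[str, int] = {}
    for thread in threads:
        for run in thread.get("runs", []):
            if run["created_at"] >= cutoff:
                agent = run["assistant_id"]
                counts[agent] = counts.get(agent, 0) + 1
    return counts


now = datetime(2026, 4, 20, tzinfo=timezone.utc)
recent = now - timedelta(days=1)
threads = [
    {"created_at": recent, "runs": [{"assistant_id": "agent_a", "created_at": recent}] * 3},
    {"created_at": recent, "runs": []},  # session opened, nothing asked
    {"created_at": recent, "runs": [{"assistant_id": "agent_b", "created_at": recent}]},
    {"created_at": now - timedelta(days=40), "runs": []},  # outside the window
]
proxy = thread_count_proxy(threads, now)
per_agent = runs_per_agent(threads, now)
print(proxy, per_agent)
```

In this toy data the proxy assigns 3 to every agent in the project, while the run-level view shows 3 for `agent_a` and 1 for `agent_b`.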
Ground-truth comparison (same scenario as 4.1)
- Our derived count (path-materialized, rolled up): `8`.
- Azure Monitor `TotalCalls`, `create_completions`, `gpt-nano-for-summary`: `7`.
This alignment is a coincidence, not a guarantee. The caller makes one completion per thread, no tool use, one thread per inbound incident: the most favorable possible scenario. Real-world divergence: a single thread with 3 tool-use turns + 1 completion = 4 Azure Monitor TotalCalls but 1 thread in our count (0.25× ratio). A project with 5 agents and 8 project-wide threads = all 5 agents report 8, while the true per-agent average is roughly 8/5 = 1.6 (5× over-report). Do not cite the 8 vs 7 match as evidence of ±14% accuracy; the error bound in production is 0.1× to 3×+.
Customer-side requirements
| Source | Requires |
|---|---|
| Threads endpoint (`/agents/v1.0/threads`) | Azure AI User or Azure AI Project Manager role on the Foundry project. Data-plane token. Default not granted; our own demo app registration needed an explicit grant. |
| Azure Monitor `TotalCalls` (would-be GROUND_TRUTH source) | Diagnostic setting on the Foundry deployment routing AllMetrics to a workspace we read. Not configured today in the demo tenant, and not read by the connector today. |
| Entra sign-in logs for the SP (attribution) | Entra P1/P2 license. The Foundry connector's evidence_completeness.sign_in_logs.status = "unavailable_not_enabled" on the 2026-04-20 dev scan is exactly this. |
Known discrepancies
- Project-wide, not agent-specific. Five agents with one active thread all report `execution_count_30d = 1`; ratio of true-to-reported error is N_agents:1.
- Threads ≠ runs. See above.
- No chat-completion call count. A 30-call multi-tool session shows `1`. `TotalCalls` shows `30`.
- No attribution. Cannot answer "who opened this thread" without joining sign-in logs on SP object_id.
Failure-mode reporting
Correctly scored. discoverer.py:256-269 distinguishes available / partial / unavailable_no_access / unavailable_not_applicable / unavailable_not_enabled. 403 on threads → empty summary + degraded status. No agents → unavailable_not_applicable, not unavailable_no_access — correct.
Current tier and upgrade path
- Current: `APPROXIMATE`, with a critical caveat for multi-agent projects. Do NOT quote per-agent execution counts from Foundry projects with more than one agent; the counts are not agent-attributed. For single-agent projects, `APPROXIMATE` (directional signal, may be off by 0.1× to 3×+). For multi-agent projects, treat as STRUCTURAL ONLY: use to detect whether the project is used at all, not to quantify per-agent usage.
- `HIGH_FIDELITY`: fetch `/threads/{id}/runs` per thread, filter by `assistant_id`, aggregate runs with status. ~3 days. Gap G03/G04 in ETL plan.
- `GROUND_TRUTH`: add Azure Monitor `TotalCalls` query per deployment + sign-ins cross-reference for attribution. ~1 week. Requires customer diagnostic setting; we should fire a specific `onboarding_gap` finding when absent.
4.3 AWS connector
Code: sv0-connectors/integrations/aws/
Sources read today
Structurally: IAM, Lambda config, ECS task defs, Bedrock agents, Secrets inventory. Execution evidence: only CloudTrail, read from S3 archive (extractors/cloudtrail_extractor.py).
The Lambda extractor does not fetch the CloudWatch Invocations metric. It reads list_functions, get_function, event source mappings, and resource policies, not GetMetricStatistics. When CloudTrail is disabled, CloudWatch could serve as a second, APPROXIMATE data source, but cloudwatch:GetMetricStatistics / cloudwatch:GetMetricData are NOT in the managed IAM policy today (sv0-connectors/integrations/aws/cfn/securityv0-readonly-role.yaml, 353 lines, zero cloudwatch:* statements). Enabling this fallback is therefore not a connector-only change: it requires (a) updating the CFN template to add the CloudWatch actions, (b) every customer re-deploying the CFN stack to widen the role's permissions policy, and (c) connector code to call the API. Until (a) + (b) ship, calling CloudWatch from the connector will 403 on every customer role.
How the count is computed (when CloudTrail IS enabled)
CloudTrailExtractor.extract_for_workload (cloudtrail_extractor.py:193-265):
- Walk S3 prefix `AWSLogs/<account>/CloudTrail/<region>/<YYYY>/<MM>/<DD>/` for the 30-day window.
- Decompress, parse, iterate `Records`.
- Filter to event names the workload cares about (Invoke for Lambda, RunTask for ECS, InvokeModel for Bedrock; `cloudtrail_extractor.py:53-81`).
- Match on exact ARN, or on `userIdentity.sessionContext.sessionIssuer.arn == role_arn` for non-Invoke calls produced by the workload's role (S3 GetObject, KMS Decrypt, etc.).
- Emit one `CloudTrailEvidence` per matching event.
When the trail exists, this is the only path in the codebase producing per-execution attribution (events carry principalId + sessionIssuer). Hypothetically, with full CloudTrail coverage, Lambda paths score GROUND_TRUTH for direct invocations and HIGH_FIDELITY for indirect role-mediated evidence. We have never achieved this on any demo tenant.
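The prefix walk and the matching rule in the steps above can be sketched without touching S3. This is a simplified illustration, not `CloudTrailExtractor`: the function names are hypothetical, and reading the target ARN from the record's `resources` list is an assumption about where the exact-ARN match is made.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional


def daily_prefixes(account: str, region: str, end: date, days: int = 30) -> Iterator[str]:
    """Yield one S3 prefix per day, newest first, for the scan window."""
    for offset in range(days):
        day = end - timedelta(days=offset)
        yield f"AWSLogs/{account}/CloudTrail/{region}/{day:%Y}/{day:%m}/{day:%d}/"


def matches_workload(
    record: dict,
    workload_arn: str,
    role_arn: Optional[str],
    direct_event_names: Iterable[str] = ("Invoke",),
) -> bool:
    """Decide whether one CloudTrail record is evidence for the workload."""
    if record.get("eventName") in set(direct_event_names):
        # Direct evidence: the event names the workload itself.
        # Assumption: the target ARN is read from the record's resources list.
        arns = {r.get("ARN") for r in record.get("resources") or []}
        return workload_arn in arns
    if role_arn is None:
        # Without the role ARN, indirect events are skipped rather than guessed.
        return False
    issuer = record.get("userIdentity", {}).get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("arn") == role_arn


prefixes = list(daily_prefixes("123456789012", "us-east-1", date(2026, 4, 20)))
print(prefixes[0], prefixes[-1], len(prefixes))
```

Skipping indirect events when `role_arn` is unknown trades an undercount for the guarantee that no event is attributed to the wrong workload.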
Ground-truth comparison
Impossible today: aws_cloudtrail.status = "unavailable_not_enabled" on both dev-scan-2026-04-20.json and prod-post-461-aws.json. Every AWS workload reports execution_count_30d = 0. No empirical ±% — only structural analysis.
Customer-side requirements
| Source | Requires |
|---|---|
| CloudTrail (per-account) | Trail configured in every region of interest, writing to an S3 bucket we read. Typical cost ~$2/100k events; one free trail is allowed per account, with 90-day UI retention. |
| CloudTrail (org trail) | Organization-level trail with AWSLogs/<org_id>/<account_id>/CloudTrail/... prefix layout. Connector supports this via CLOUDTRAIL_LAYOUTS (cloudtrail_extractor.py:102-115). |
| CloudWatch `Invocations` metric (potential fallback) | IAM cloudwatch:GetMetricData / cloudwatch:GetMetricStatistics. NOT in cfn/securityv0-readonly-role.yaml today; enabling this fallback requires (a) a CFN template update to add the CloudWatch actions, (b) customer re-deploys the CloudFormation stack, (c) connector code to call the API. Every existing deployed role must be re-onboarded. |
| CloudTrail Lake (long-window queries) | Event data store configured. Not currently supported. |
| Cross-account S3 read | The trail's S3 bucket must be readable by our scanning principal. Common gotcha. |
Known discrepancies
- Zero-not-unknown. CloudTrail-off emits zero evidence + scan-level `unavailable_not_enabled`, but each workload's `execution_count_30d` silently reads `0`, indistinguishable from an idle workload with trail enabled. No path-level degraded-confidence flag.
- Attribution is per-call, not aggregated. Unlike Foundry summary-first, each CloudTrail evidence is a full event. Great for attribution, but 30 days × chatty Lambda can hit millions. `CloudTrailScanBudgetExceededError` (`cloudtrail_extractor.py:118-123`) caps it.
- Non-Invoke attribution requires `role_arn`. Without it, non-Invoke events are skipped (`cloudtrail_extractor.py:390-398`), an intentional undercount to avoid false attribution.
- Region scoping. Only regions explicitly listed are scanned; out-of-scope regions emit zero.
Failure-mode reporting
Scan-level: correct. Workload-level: execution_count_30d = 0 on each Lambda is indistinguishable from a real idle Lambda. This is the meta-finding — we silently communicate UNAVAILABLE as APPROXIMATE_ZERO, and the dormant_authority rule (src/evaluator/rules/dormant-authority.ts:52-64) does not yet suppress itself when evidence completeness is unavailable_not_enabled.
Current tier and upgrade path
- Current: `UNAVAILABLE` on every demo scenario to date.
- Connector + CFN update win: CloudWatch `Invocations` fallback for Lambda (3 days connector code, plus CFN update to add `cloudwatch:GetMetricStatistics` / `cloudwatch:GetMetricData` to `cfn/securityv0-readonly-role.yaml`, plus customer re-stack for every deployed role). Moves `UNAVAILABLE` → `APPROXIMATE`. Attribution still absent.
- 1-week win: trails when present, CloudWatch fallback when not, first-class `onboarding_gap:cloudtrail_not_enabled` finding. Moves to `HIGH_FIDELITY` when trail is present.
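A possible shape for the CloudWatch fallback is sketched below; it does not exist in the connector today. The keyword arguments follow boto3's CloudWatch `get_metric_statistics` call, while the function name, the choice to return `None` on an access-denied error, and the exact error codes checked are assumptions. The client is passed in so the sketch runs against a stub without AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def lambda_invocations_30d(cloudwatch: Any, function_name: str, now: datetime) -> Optional[int]:
    """Sum the AWS/Lambda Invocations metric over 30 days; None means 'unknown'."""
    try:
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/Lambda",
            MetricName="Invocations",
            Dimensions=[{"Name": "FunctionName", "Value": function_name}],
            StartTime=now - timedelta(days=30),
            EndTime=now,
            Period=86400,  # one datapoint per day, 30 datapoints in total
            Statistics=["Sum"],
        )
    except Exception as exc:
        # botocore's ClientError exposes the error code under .response;
        # the codes checked here are an assumption.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
        if code in ("AccessDenied", "AccessDeniedException"):
            return None  # role lacks cloudwatch:* today, so report unknown, not zero
        raise
    return int(sum(point["Sum"] for point in response.get("Datapoints", [])))


class StubCloudWatch:
    """Stand-in client for illustration; returns two daily datapoints."""

    def get_metric_statistics(self, **kwargs: Any) -> dict:
        return {"Datapoints": [{"Sum": 3.0}, {"Sum": 4.0}]}


now = datetime(2026, 4, 20, tzinfo=timezone.utc)
print(lambda_invocations_30d(StubCloudWatch(), "demo-function", now))
```

Returning `None` instead of `0` on a permissions failure keeps the fallback from reintroducing the zero-not-unknown problem described above. The result is still a count without attribution.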
5. Attribution vs Counting
Counting "how many executions" and attributing "which identity executed" are separable problems. Today's connectors conflate them.
| System | Count fidelity | Attribution fidelity | Why they diverge |
|---|---|---|---|
| Azure Monitor `TotalCalls` | GROUND_TRUTH | UNAVAILABLE | Metric is aggregated at the deployment, not attributed to caller. |
| Entra sign-in logs | APPROXIMATE (lower bound) | GROUND_TRUTH | Each sign-in names a principal, but one sign-in covers many calls. |
| CloudTrail | GROUND_TRUTH when enabled | GROUND_TRUTH when enabled | Per-event, carries userIdentity. |
| ServiceNow `sys_flow_context` | HIGH_FIDELITY | HIGH_FIDELITY | Flows record the user, but sys_audit is needed for the full causal chain. |
| Foundry `/threads` | APPROXIMATE | UNAVAILABLE | No agent filter, no principal on the thread record. |
Practical implication: when a customer asks "who is calling this Foundry agent seven times a day," we cannot answer from Azure Monitor alone — we need sign-ins for the principal join. That's why P1/P2 is a hard dependency for Foundry's usefulness, even though Foundry will list threads without it. Not intuitive to field engineers; should be in the onboarding checklist.
CloudTrail gives us both for free when enabled. That's the structural reason AWS is the easiest path to GROUND_TRUTH — if the customer already pays for CloudTrail, we just read it.
6. Where We Give Real Numbers vs Approximations
6.1 Within ±10% of ground truth today
- ServiceNow Flow executions with `sys_flow_context` access. One evidence node per flow-context record; 1:1 of source table. (Evidence: `dev-scan-2026-04-20.json` reports 67 `flow_execution` nodes.)
- ServiceNow scheduled-job executions with `sys_trigger` access. Same mechanism.
- ServiceNow Script Include executions via `sys_log` structured logging (`Auto-route identity tickets` when the opt-in pattern is used).
6.2 Trigger-count proxies (off by factor-of-N)
- BR execution via trigger-record evidence. Proxies BR eligibility, not actual execution. Direction depends on condition evaluation; factor 0.1× (skipped) to 5× (retry loop).
- Script Include counts via BR propagation. An SI called from three BRs inherits three sets of trigger records even if only one reached it.
- Foundry agent runs via thread count. Off by N_agents × (runs/thread ratio).
- Entra sign-ins as proxy for calls. Lower bound; divide by token-cache duration for a rough ceiling.
6.3 Cases where we have no number
- Any AWS workload without CloudTrail. Every Lambda/ECS/Bedrock on the current demo environment.
- Any Foundry project we cannot data-plane-access (403 on `/threads`).
- Any ServiceNow workload where the scanning principal lacks `sys_flow_context` read.
- Any principal whose sign-ins are unavailable due to P1/P2 absence: Foundry run summary still works, attribution gone.
7. Customer Licensing Implications
Open this table on a first-call qualifying conversation. MSRP is public retail list; enterprise agreements vary.
Note on Entra ID P1 pricing: the ~$6/user/month figure is per workforce user, tenant-wide — not per privileged user and not per service principal. A customer with 10,000 employees pays for 10,000 P1 seats to make sign-ins available for any SP in the tenant. This is usually a sunk cost for mid-market+ customers who already have P1 via M365 E3/E5 bundles; greenfield tenants on Microsoft 365 Business Basic/Standard will need an uplift.
| Connector | Prereq | MSRP (approx) | Tier unlock |
|---|---|---|---|
| Entra-ServiceNow | Entra ID P1 | ~$6/user/month | Sign-in attribution → HIGH_FIDELITY on identity attribution |
| Entra-ServiceNow | Entra ID P2 | ~$9/user/month | P1 features + risk detections; same tier unlock as P1 for our purposes |
| Entra-ServiceNow | ServiceNow read on sys_flow_context, sys_trigger, sys_log | Included in Platform license; access control is customer-side | HIGH_FIDELITY on ServiceNow executions |
| Entra-ServiceNow | ServiceNow read on monitored tables (incident, change_request, PII) | Customer-side RBAC | APPROXIMATE BR trigger evidence |
| Entra-ServiceNow | (future) sys_audit read access | Customer-side RBAC | HIGH_FIDELITY BR execution (requires sys_audit integration, not yet implemented) |
| Azure Foundry | Azure AI User role on project | Included in Azure AI | Thread-count APPROXIMATE |
| Azure Foundry | Azure Monitor diagnostic setting on deployment | Azure Monitor ingestion (~$2.30/GB) | GROUND_TRUTH call-count (requires connector upgrade, not yet implemented) |
| Azure Foundry | Entra P1/P2 | ~$6-9/user/month | Attribution layer for Foundry findings |
| AWS | CloudTrail per-account (management events) | First trail free, ~$2/100k events beyond | GROUND_TRUTH Lambda invocation evidence |
| AWS | CloudTrail org trail | Same pricing; org-level admin | Same as above at org scope |
| AWS | CloudWatch GetMetricStatistics / GetMetricData | No cost; IAM permission — but NOT in the managed policy today | APPROXIMATE invocation fallback (requires CFN update + customer re-stack + connector change; not yet implemented) |
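The tenant-wide seat arithmetic from the Entra ID P1 note can be made concrete with a short sketch. The function name is hypothetical, and the prices are the approximate list figures from the table, not quotes.

```typescript
// Hypothetical helper: annual cost of a per-workforce-user license.
// Prices are the approximate MSRP figures from the table (USD per user per month).
function annualSeatCostUsd(workforceUsers: number, pricePerUserMonthUsd: number): number {
  return workforceUsers * pricePerUserMonthUsd * 12;
}

// The 10,000-employee tenant from the note, at ~$6 (P1) and ~$9 (P2).
const p1Annual = annualSeatCostUsd(10000, 6); // 720000
const p2Annual = annualSeatCostUsd(10000, 9); // 1080000
```

The seat count is the whole workforce, not the number of service principals being scanned, which is why the figure matters mainly for greenfield tenants without an existing bundle.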
Meta-finding on our own demo tenant (as of 2026-04-20):
- Entra P1/P2: absent on the primary demo tenant. Why sign-ins are unavailable_not_enabled on the Foundry scan; why Foundry findings lack attribution even when threads are readable.
- Azure Monitor diagnostic settings on Foundry deployments: not configured.
- CloudTrail on the demo AWS account: not enabled.
Our canonical demo environment is misconfigured in all three dimensions that would materially improve our output. Field engineers walking into a customer call should know this: customers with real production workloads often have at least CloudTrail and Entra P1 already in place, and will see meaningfully better numbers than the demo tenant does.
8. Product Positioning Guidance
Do not claim "we count every Foundry call." We count threads. A thread with 30 tool-use iterations shows as 1. First customer to look at Azure Monitor next to our UI will catch it.
Do not claim "we show every Lambda execution." Only if CloudTrail is enabled in the relevant regions and window.
Do not claim "this workload is dormant." Claim "no execution evidence in 30d from the sources we scanned." The qualifier is load-bearing.
Claim accurately: "We correlate executions across systems into a single access-path view, with fidelity that improves as customers enable standard telemetry — Entra P1, CloudTrail, Azure Monitor diagnostic settings. Our evidence-completeness blocks tell you per-finding which sources were available — and we are surfacing that same signal at the workload/path level next (see Roadmap item #1)."
The strongest feature of the current system is not the accuracy of the number — it is the evidenceCompleteness block that tells the customer what we could and couldn't see. That is the honest product, and it should be foregrounded in the UI.
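One way to keep the qualifier attached to the number is to generate the claim text from the count together with the per-source availability. This is a minimal sketch, not the shipped UI logic: the function name, the source keys, and the "available" status are assumptions, and only unavailable_not_enabled appears in this document.

```typescript
// Assumed status values; only "unavailable_not_enabled" is named in this document.
type SourceStatus = "available" | "unavailable_not_enabled";

// Build a customer-facing claim that always names what was and was not scanned.
function executionClaim(count30d: number, sources: Record<string, SourceStatus>): string {
  const names = Object.keys(sources).sort();
  const scanned = names.filter((n) => sources[n] === "available");
  const missing = names.filter((n) => sources[n] !== "available");
  const scannedText = scanned.length > 0 ? scanned.join(", ") : "none";
  const base = count30d > 0
    ? `${count30d} executions observed in 30d`
    : "no execution evidence in 30d";
  let claim = `${base} from the sources we scanned (${scannedText})`;
  if (missing.length > 0) {
    claim += `; unavailable: ${missing.join(", ")}`;
  }
  return claim;
}
```

For a workload with a zero count and sign-ins unavailable, this yields "no execution evidence in 30d from the sources we scanned (foundry_threads); unavailable: entra_sign_ins" rather than a bare "dormant".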
9. Roadmap — What Moves the Needle
Five prioritized upgrades. Each is scoped; none is aspirational.
- Surface evidence-completeness in dormancy findings. dormant_authority today fires on execution_count_30d === 0 regardless of whether that zero is UNAVAILABLE or real absence (src/evaluator/rules/dormant-authority.ts:52-64). Change the rule to read ctx.getEvidenceCompleteness(entity) and suppress or downgrade when the relevant source is unavailable_not_enabled. Effort: 2 days. Tier impact: none (eliminates false positives). Customer benefit: the #1 dormant-authority false positive goes away. Demo-tenant success: zero AWS dormant_authority findings on the CloudTrail-disabled demo.
- Add CloudWatch Invocations fallback to AWS. Extend lambda_extractor.py with GetMetricStatistics for AWS/Lambda Invocations, 1-day period, 30-day window. Emit as execution_evidence with confidence: TEMPORAL_INFERRED. Effort: 3 days connector code + CFN template update + customer change-management (every existing deployed role must re-stack). cloudwatch:GetMetricData / cloudwatch:GetMetricStatistics are NOT in cfn/securityv0-readonly-role.yaml today; the CFN update is a prerequisite and must ship before connector code is useful. Tier impact: Lambda paths UNAVAILABLE → APPROXIMATE. Customer benefit: dormant detection works for Lambda on tenants without CloudTrail (most of them, once they re-stack). Demo-tenant success: every Lambda in dev-scan-2026-04-20.json reports execution_count_30d > 0 where the metric shows invocations.
- Fetch /threads/{id}/runs for Foundry. Gap G03/G04 in the ETL plan. Filter by assistant_id, aggregate runs with status. Effort: 3 days. Tier impact: Foundry agents APPROXIMATE → HIGH_FIDELITY on count (attribution still needs sign-ins). Customer benefit: per-agent counts instead of project-wide thread count. Demo-tenant success: the 5 agents in the demo project report independent run counts.
- Add Azure Monitor TotalCalls as a second Foundry source. azure-monitor-query per deployment; cross-check threads vs calls; emit secondary execution_evidence per deployment per day. Effort: 1 week. Tier impact: Foundry deployment paths → GROUND_TRUTH on call count. Customer benefit: our number matches the Azure Monitor blade on the same screen. Demo-tenant success: Foundry call count on UI matches TotalCalls (±1 for clock skew) on the Sergey scenario.
- ServiceNow sys_audit integration for BR / SI. Phase 2 of the BR/SI evidence plan. Moves BR/SI from TEMPORAL_INFERRED to DETERMINISTIC. Effort: 2 weeks. Tier impact: BR / SI paths APPROXIMATE → HIGH_FIDELITY. Customer benefit: clean answer to "how do you know that BR actually ran?" Demo-tenant success: Auto-route identity tickets BR reports execution_count_30d from sys_audit, consistent within ±10% of the incident trigger count.
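The rule change in the first roadmap item could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of dormant-authority.ts: the return shape of ctx.getEvidenceCompleteness(entity) is not specified in this document, so the EvidenceCompleteness interface, the "available" status, and the verdict type are all hypothetical. This sketch suppresses the finding only when no source was readable; a downgrade policy would be an equally valid reading of "suppress or downgrade".

```typescript
// Hypothetical shapes. Only dormant_authority, execution_count_30d and
// unavailable_not_enabled are names taken from this document.
type SourceStatus = "available" | "unavailable_not_enabled";

interface EvidenceCompleteness {
  // One entry per execution-evidence source relevant to the entity.
  sources: Record<string, SourceStatus>;
}

interface WorkloadEntity {
  id: string;
  execution_count_30d: number;
}

type DormancyVerdict =
  | { kind: "not_dormant" }
  | { kind: "fire"; rule: "dormant_authority" }
  | { kind: "suppress"; reason: string };

function evaluateDormantAuthority(
  entity: WorkloadEntity,
  completeness: EvidenceCompleteness,
): DormancyVerdict {
  if (entity.execution_count_30d > 0) {
    return { kind: "not_dormant" };
  }
  const names = Object.keys(completeness.sources);
  const readable = names.filter((n) => completeness.sources[n] === "available");
  if (readable.length === 0) {
    // A zero with no readable source means "could not see", not "did not run".
    const listed = names.length > 0 ? names.join(", ") : "none reported";
    return { kind: "suppress", reason: `no readable execution source (${listed})` };
  }
  // At least one source was readable and it reported nothing in the window.
  return { kind: "fire", rule: "dormant_authority" };
}
```

On the CloudTrail-disabled demo account, a Lambda with a zero count and only an unavailable_not_enabled source would come back as suppress, which matches the stated demo-tenant success criterion.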
Sergey's canonical demo tenant is the test case for all five. A fix is "merged" only when the demo tenant passes the success criterion.
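For the Foundry runs item, the per-agent aggregation itself is small once the run records are in hand. The sketch below assumes the runs from every thread have already been fetched and flattened into one array. The ThreadRun shape is a minimal assumption that uses only the assistant_id and status fields named in the roadmap item, and the function name is hypothetical.

```typescript
// Minimal assumed shape of a run record; only assistant_id and status are used.
interface ThreadRun {
  id: string;
  assistant_id: string;
  status: string; // e.g. "completed" or "failed"
}

interface AgentRunSummary {
  total: number;
  byStatus: Record<string, number>;
}

// Group runs by agent so each agent gets its own count instead of
// inheriting the project-wide thread count.
function summarizeRunsByAgent(runs: ThreadRun[]): Record<string, AgentRunSummary> {
  const perAgent: Record<string, AgentRunSummary> = {};
  for (const run of runs) {
    if (!Object.prototype.hasOwnProperty.call(perAgent, run.assistant_id)) {
      perAgent[run.assistant_id] = { total: 0, byStatus: {} };
    }
    const summary = perAgent[run.assistant_id];
    summary.total += 1;
    summary.byStatus[run.status] = (summary.byStatus[run.status] ?? 0) + 1;
  }
  return perAgent;
}
```

An agent with no runs in the window simply has no key in the result, so callers would need to treat a missing key as zero runs observed rather than as an error.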
10. Appendix — 2026-04-20 Reconciliation Worked Example
Scenario: Sergey's ServiceNow → Entra → Foundry canonical demo. Workload in focus: servicenow-openai-client (Entra SP) and the BR on the ServiceNow side that invokes OpenAI via the Azure gateway. Foundry agent: gpt-nano-for-summary.
| Source | 30-day count | Tier | What it measures |
|---|---|---|---|
| Platform UI (Foundry path, rolled up) | 8 | APPROXIMATE | Aggregate evidence-node count on the workload, summed across Foundry summary + Graph-leg trigger evidence. Path-materializer output at authority-path-materializer.ts:289-321. |
| Azure Monitor TotalCalls (create_completions, gpt-nano-for-summary) | 7 | GROUND_TRUTH | Chat-completion HTTP calls to the deployment. Aggregated 1-minute granularity. |
| Entra sign-in logs via Graph API | n/a | UNAVAILABLE | Demo tenant has no P1/P2 license; Graph API returned 403. Status correctly flagged as unavailable_not_enabled. |
| ServiceNow sys_script_execution_history (raw) | 21 | GROUND_TRUTH (of a different thing) | BR evaluation traces, including developer/test runs during demo setup. Noisy. |
| ServiceNow incident table updates (upper bound) | 14 | APPROXIMATE | Records that could have triggered the BR. Doesn't prove BR ran. |
Observations:
- The 8 vs 7 gap is the best we have ever observed on any connector, any scenario. It is a coincidence: this workload makes exactly one completion per thread, uses no tools, and has one active agent. The real error bound for the thread-as-run proxy is 0.1× to 3×+ in the general case. Do not cite this match as evidence that the proxy is accurate to within ~14%; it does not generalize.
- 21 in sys_script_execution_history and 14 in incident updates bracket the true "how many times did the BR run" number, which is unknown. sys_audit would give us a specific number in that bracket, HIGH_FIDELITY.
- Sign-ins absent (n/a) means the Foundry 8 cannot be attributed to servicenow-openai-client. Attribution is inferred from the structural graph (SP → BR → REST message → Foundry agent), not from runtime evidence.
- The UI presents Peak Executions 30D = 8 without qualifier. A customer reading that next to their Azure Monitor blade sees 7. The delta is small; the burden is in the explanation, not the match.
Rerun this appendix for any new canonical scenario and store the result in the scenario's session notes.
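When rerunning the reconciliation, the comparison between the platform count and a reference count can be reduced to a ratio, a delta, and a tolerance check. The helper below is a sketch with hypothetical names. The default tolerance of 1 borrows the "±1 for clock skew" criterion from the Azure Monitor roadmap item. A pass on one scenario says nothing about the general error bound.

```typescript
interface Reconciliation {
  ratio: number;            // platform count / reference count, rounded to 2 decimals
  delta: number;            // platform count minus reference count
  withinTolerance: boolean; // |delta| <= tolerance
}

// Returns null when the reference count is zero or negative,
// because the ratio is undefined in that case.
function reconcile(
  platformCount: number,
  referenceCount: number,
  tolerance: number = 1,
): Reconciliation | null {
  if (referenceCount <= 0) {
    return null;
  }
  const ratio = Math.round((platformCount / referenceCount) * 100) / 100;
  const delta = platformCount - referenceCount;
  return { ratio, delta, withinTolerance: Math.abs(delta) <= tolerance };
}

// The 2026-04-20 figures: platform UI 8 vs Azure Monitor TotalCalls 7.
const sergey = reconcile(8, 7); // ratio 1.14, delta 1, withinTolerance true
```

Storing the ratio and delta alongside the tier of each source keeps the session notes comparable across scenarios.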