Evidence Classification Model -- Research Brief

Executive Summary

Sergey's feedback on the March 2026 sprint review identified a fundamental gap: when SecurityV0 surfaces a finding, users cannot tell whether the claim is proven from observed execution logs or inferred from the structure of the identity graph. Today, the platform has the raw machinery -- EvidenceConfidenceLevel on execution evidence records, EvidenceCompletenessSection on findings, and structured evidence_refs -- but these are implementation details invisible to the user. No single field on a finding answers: "Is this proven?"

This research proposes an EvidenceClassification enum with five values (observed_execution, observed_absence, correlated_pattern, structural_authority, inferred_capability) and an EvidenceClaim type that wraps every finding in a human-readable claim statement, a classification, a runtime confidence overlay, and per-section confidence metadata. Each of the 14 rule files (producing 15 finding types) is analyzed and assigned a classification based on what data it actually checks.

The impact is threefold: (1) every finding in the UI can display a trust indicator ("Proven" vs. "Inferred"), (2) API consumers can filter and sort by evidence class, and (3) the platform satisfies the four-part requirement: what is proven, why it matters, what to do first, and who owns it. The classification is deterministic -- it is a combination of the rule's static data access pattern and the runtime confidence of the underlying evidence records, not a probabilistic model.

Note: Evidence classification alone does not solve the scanability problem identified by Sergey (too many clicks to understand a finding). See #217 (Wiz UX research) and #224 (expand default-visible info) for the complementary efforts needed to surface the "why/action/owner" payload without requiring detail-view clicks.

Current State Analysis

Existing Infrastructure

The platform already tracks evidence quality at multiple layers, but they are disconnected:

Layer 1: Execution Evidence Records (ExecutionEvidenceDoc in src/domain/evidence/types.ts)

  • confidence?: EvidenceConfidenceLevel -- set by the connector, values: DETERMINISTIC, TEMPORAL_INFERRED, STRUCTURAL
  • proof_notes?: string -- human-readable explanation of what this evidence does and does not prove
  • Populated during ingestion via resolveConfidence() in src/ingestion/graph-transformer.ts

Layer 2: Evidence Completeness (EvidenceCompletenessSection in src/domain/evidence-packs/types.ts)

  • Six categories: current_roles, role_history, execution_evidence, ownership_records, approval_records, credential_state
  • Each has an EvidenceAvailability value: available, unavailable_not_enabled, unavailable_no_access, unavailable_not_applicable, partial
  • Free-text notes per category
  • Populated by each evaluator rule via defaultEvidenceCompleteness()

Layer 3: Finding-Level References (FindingDoc in src/domain/findings/types.ts)

  • evidence_refs: Record<string, unknown> -- rule-specific structured data (counts, IDs, timestamps)
  • deterministic_explanation: string -- human-readable but not structured as a claim
  • evidence_completeness: EvidenceCompletenessSection -- completeness, not classification

What is missing:

  • No field says whether a finding is based on observed execution logs vs. structural graph analysis
  • No claim statement that a non-technical stakeholder can read
  • No way to filter findings by "how sure are we?"
  • The EvidenceConfidenceLevel on individual evidence records never propagates to the finding level

Per-Rule Evidence Analysis

1. dormant_authority (dormant-authority.ts)

What it checks: Queries execution evidence records for the entity and its RUNS_AS targets. Compares the most recent source_timestamp against a 90-day threshold. Also checks the connector-computed last_observed_execution_timestamp property.

Evidence basis: Observed execution log timestamps (or absence thereof). The rule queries real ExecutionEvidenceDoc records and makes a temporal determination.

evidence_completeness: Sets execution_evidence: "available" with a note about the age of the most recent record or the absence of records.

evidence_refs: execution_path_count, last_evidence_timestamp, threshold_days

Classification: observed_absence -- The finding fires when execution evidence is absent or stale. Although the rule queries real ExecutionEvidenceDoc records, the claim is about absence of recent activity, not about observed activity. Labelling this observed_execution ("Proven from execution logs") would mislead a user who sees "no logs found." The observed_absence classification makes the claim honest: "We looked and confirmed nothing recent exists."
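
The staleness decision described above can be sketched as follows. This is a minimal illustration, not the rule's actual code: the record shape, the `checkDormancy` name, and the strict "older than 90 days" comparison are assumptions, and the connector-computed last_observed_execution_timestamp fallback is left out.

```typescript
// Hypothetical, simplified record shape; the real ExecutionEvidenceDoc has more fields.
interface EvidenceRecordSketch {
  source_timestamp: Date;
}

const DORMANT_THRESHOLD_DAYS = 90; // threshold stated in the rule description
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface DormancyResult {
  dormant: boolean;
  // null when no evidence record exists at all
  daysSinceActivity: number | null;
}

// Dormant when there are no records, or when the newest record is older than the threshold.
function checkDormancy(records: EvidenceRecordSketch[], now: Date): DormancyResult {
  if (records.length === 0) {
    return { dormant: true, daysSinceActivity: null };
  }
  const newest = Math.max(...records.map((r) => r.source_timestamp.getTime()));
  const days = Math.floor((now.getTime() - newest) / MS_PER_DAY);
  return { dormant: days > DORMANT_THRESHOLD_DAYS, daysSinceActivity: days };
}
```

Both branches that return `dormant: true` support the same claim ("nothing recent exists"), which is why the static classification is observed_absence in either case.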


2. external_egress (external-egress.ts)

What it checks: Reads entity.properties.egress_category and checks for the value "external". No execution evidence is queried.

Evidence basis: A connector-assigned property on the entity. The connector determines egress category from integration metadata (e.g., HTTP connector target URL classification).

evidence_completeness: Sets current_roles: "available" only.

evidence_refs: egress_category, execution_path_count

Classification: structural_authority -- The finding is based on a structural property of the entity in the graph. The egress classification comes from connector metadata, not from observed runtime data.


3. llm_egress (llm-egress.ts)

What it checks: Reads entity.properties.egress_category and checks for the value "llm". Identical pattern to external_egress.

Evidence basis: Connector-assigned entity property.

evidence_completeness: Sets current_roles: "available" only.

evidence_refs: egress_category, execution_path_count

Classification: structural_authority -- Same as external_egress. The LLM endpoint classification is a structural property from the connector.


4. orphaned_ownership (orphaned-ownership.ts)

What it checks: Reads OWNED_BY and CREATED_BY relationships. Fetches owner entities and checks their status property against a list of non-active statuses (deleted, departed, disabled, disbanded, restructured, expired). Also produces ownership_degraded findings when primary owners are non-active but secondary/inherited owners remain.

Evidence basis: Graph structure (relationships) plus entity status properties. No execution evidence, no version history.

evidence_completeness: Sets current_roles: "available", ownership_records: "available".

evidence_refs: relationships (filtered OWNED_BY/CREATED_BY), execution_path_count

Classification: structural_authority -- Based entirely on the current graph structure: who owns what and whether those owners are active. The status values come from the source system via the connector.


5. ownership_degraded (emitted by orphaned-ownership.ts)

What it checks: Same rule as orphaned_ownership -- fires the ownership_degraded variant when primary owners are non-active but secondary/inherited owners are still active.

Evidence basis: Same as orphaned_ownership.

Classification: structural_authority -- Same rationale.


6. ownership_ambiguous (ownership-ambiguous.ts)

What it checks: Reads OWNED_BY relationships and fetches owner entities. Checks whether all owners are group/team entities (by owner_type or entity_type). Also checks version history to rule out cases where an individual owner previously existed.

Evidence basis: Graph structure (current relationships + entity types) and version history to confirm this is not a degradation.

evidence_completeness: Sets ownership_records: "available".

evidence_refs: owner_ids, owner_types, version_count_checked

Classification: structural_authority -- Based on entity types and relationship structure. The version history check is used to differentiate from ownership_degraded, but the finding itself is about the current structural state.


7. ownership_unknown (ownership-unknown.ts)

What it checks: Verifies that the entity has execution paths (it matters who owns it) but has no OWNED_BY relationships, no CREATED_BY relationships, and no ownership-indicating properties (sys_created_by, created_by, owner, managed_by).

Evidence basis: Absence of metadata -- a metadata quality signal.

evidence_completeness: Sets ownership_records: "unavailable_no_access" with note "No ownership metadata available from source systems".

evidence_refs: execution_path_count, has_owned_by, has_created_by, has_ownership_properties

Classification: structural_authority -- The finding is about missing graph structure. The rule does not infer anything; it reports what is not there.


8. privilege_justification_gap (privilege-justification-gap.ts)

What it checks: Identifies elevated (confidential/restricted) execution paths and queries execution evidence to determine whether the granted write-level actions are actually being used. Detects two gap types: no_activity (no evidence for a resource at all) and action_mismatch (evidence exists but only shows read-level actions when write-level is granted).

Evidence basis: Combination of graph structure (execution paths, sensitivity levels, granted actions) and execution evidence records (observed actions). This is the only rule that correlates structural grants with observed usage patterns.

evidence_completeness: Sets current_roles: "available", execution_evidence: "available".

evidence_refs: elevated_paths_total, gap_count, no_activity_count, action_mismatch_count, observed_evidence_count, gap_resources

Classification: correlated_pattern -- The finding correlates two data sources: structural authority grants and observed execution logs. A gap means the structure says "can write" but the evidence says "only reads" (or nothing). This is neither pure observation nor pure structure -- it is pattern correlation.
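
A rough sketch of how the two gap types could be told apart for a single resource. The input shape, the `classifyGap` name, and the list of write-level action names are assumptions made for illustration; the brief does not specify them.

```typescript
type GapType = "no_activity" | "action_mismatch";

// Hypothetical input: actions granted on one resource and actions seen in execution evidence.
interface ResourceUsageSketch {
  resourceId: string;
  grantedActions: string[];
  observedActions: string[];
}

// Assumed write-level action names; the real rule's list is not given in this brief.
const WRITE_ACTIONS = new Set(["write", "update", "delete", "create"]);

// Returns the gap type for the resource, or null when there is no gap to report.
function classifyGap(usage: ResourceUsageSketch): GapType | null {
  const grantsWrite = usage.grantedActions.some((a) => WRITE_ACTIONS.has(a));
  if (!grantsWrite) return null; // only write-level grants are examined
  if (usage.observedActions.length === 0) return "no_activity";
  const usedWrite = usage.observedActions.some((a) => WRITE_ACTIONS.has(a));
  return usedWrite ? null : "action_mismatch";
}
```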


9. reachable_sensitive_domain (reachable-sensitive-domain.ts)

What it checks: Reads execution_paths and filters for paths with sensitivity of confidential, restricted, or high. No execution evidence is queried.

Evidence basis: Purely structural -- the computed execution paths with their sensitivity labels.

evidence_completeness: Sets current_roles: "available".

evidence_refs: sensitive_path_count, total_path_count, sensitivity_levels, business_domains

Classification: structural_authority -- The finding reports what an entity can reach based on its role grants and path computation. Whether it actually accesses those resources is not checked by this rule.


10. unknown_identity_binding (unknown-identity-binding.ts)

What it checks: For workload entities, checks for RUNS_AS relationships. Three failure modes: (a) no RUNS_AS at all, (b) RUNS_AS targets exist but cannot be resolved to known entities, (c) multiple RUNS_AS targets resolve -- ambiguous binding.

Evidence basis: Graph structure only -- relationship existence and entity resolution.

evidence_completeness: Sets credential_state: "available" with a note about the specific failure mode.

evidence_refs: runs_as_targets, resolved_count

Classification: structural_authority -- Based entirely on graph relationship structure.
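
The three failure modes can be expressed as a small decision function. This is a sketch under assumptions: `classifyBinding` and its inputs are hypothetical, and treating a partially resolvable target list as healthy when exactly one target resolves is a guess about the rule's behavior.

```typescript
type BindingFailure = "missing" | "unresolvable" | "ambiguous";

// runsAsTargets: target IDs of the workload's RUNS_AS relationships.
// knownEntityIds: IDs that resolve to known entities in the graph.
function classifyBinding(runsAsTargets: string[], knownEntityIds: Set<string>): BindingFailure | null {
  if (runsAsTargets.length === 0) return "missing"; // (a) no RUNS_AS at all
  const resolved = runsAsTargets.filter((id) => knownEntityIds.has(id));
  if (resolved.length === 0) return "unresolvable"; // (b) targets exist, none resolve
  if (resolved.length > 1) return "ambiguous"; // (c) more than one target resolves
  return null; // exactly one resolved identity: no finding
}
```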


11. unproven_execution (unproven-execution.ts)

What it checks: For workload entities with execution paths, checks execution_count_30d property, then queries direct execution evidence, then checks evidence for RUNS_AS targets. Fires only when no evidence exists anywhere.

Evidence basis: Absence of execution evidence across the entity and all linked identities.

evidence_completeness: Sets execution_evidence: "available" with note about the query scope.

evidence_refs: execution_path_count, runs_as_target_count

Classification: observed_absence -- The finding fires when zero execution evidence records exist across the entity and all linked identities. The claim is about confirmed absence ("we queried and found nothing"), not about observed activity. Using observed_execution ("Proven from execution logs") would be misleading when the entire point is that no logs were found.


12. unresolved_cross_system_auth (unresolved-auth.ts)

What it checks: Reads identity_binding_status property and identitySubtype/workloadSubtype property. Only fires on oauth_app entities with identity_binding_status: "unlinked" -- meaning the connector could not match the OAuth app's client_id to a governed Azure service principal.

Evidence basis: Connector-determined binding status. The connector ran a join operation (client_id matching) and reported the result.

evidence_completeness: Sets credential_state: "available".

evidence_refs: identity_binding_status, identity_subtype, identity_type (fallback when subtype is absent), execution_path_count

Classification: structural_authority -- The rule reads a connector-set property (identity_binding_status) and a subtype filter. This is the same pattern as external_egress (which reads egress_category): a connector-assigned property on the entity. The connector performed the cross-system join and recorded the result as a property; the evaluator rule simply reads that property. Both should be classified consistently as structural_authority.


13. scope_drift (scope-drift.ts)

What it checks: Compares current relationships (HAS_ROLE, GRANTS, USES, RUNS_AS) against the oldest version in history. Detects added/removed targets. Also queries execution evidence (1 record) to determine if authority is "exercised". Checks whether new roles reach sensitive domains via execution paths.

Evidence basis: Version history (temporal comparison) + execution evidence (exercised flag) + execution path sensitivity.

evidence_completeness: Sets current_roles: "available", role_history: "partial" (with note about version diff inference), execution_evidence and credential_state conditionally.

evidence_refs: current_role_count, baseline_role_count, added_role_targets, baseline_version_date, sensitive_domains_affected, exercised, drift_categories, grants_added, uses_added, uses_removed, runs_as_added, runs_as_removed.

Classification: correlated_pattern -- Correlates temporal drift (version history diffs) with structural authority (execution paths, sensitivity) and optionally with observed execution. The "exercised" flag elevates severity but even without it, the finding correlates two data sources.


14. reachability_drift (reachability-drift.ts)

What it checks: Compares current execution_paths against the oldest version's paths. Identifies new destinations and new business domains. Queries execution evidence to determine if authority is exercised. Checks for sensitive domain impact.

Evidence basis: Version history (temporal comparison) + execution paths + execution evidence.

evidence_completeness: Sets current_roles: "available", role_history: "partial", execution_evidence conditionally.

evidence_refs: baseline_destination_count, current_destination_count, new_destination_ids, new_destination_names, new_domains, sensitive_domains_affected, baseline_version_date, exercised

Classification: correlated_pattern -- Same pattern as scope_drift: correlates temporal change with structural authority and optional execution observation.


15. ownership_drift (ownership-drift.ts)

What it checks: Compares current OWNED_BY relationships against the oldest version in history. Identifies removed owners and owners whose status changed from active to non-active since baseline.

Evidence basis: Version history (temporal comparison) + entity status properties.

evidence_completeness: Sets ownership_records: "available", role_history: "partial" (inferred from version diffs).

evidence_refs: baseline_owner_count, current_owner_count, removed_owner_ids, removed_owner_names, disabled_owner_ids, disabled_owner_names, baseline_version_date

Classification: structural_authority -- Although the rule uses version history to detect which owners were removed, the finding's claim is about the current state: owners are gone or disabled now. This is the same pattern as ownership_ambiguous (which also checks version history for differentiation but reports on current structure). Both are classified as structural_authority for consistency.

Proposed Type Definitions

EvidenceClassification Enum

/**
* Classification of the evidence basis for a finding.
*
* Determines how the finding's claim was derived and what level of
* trust a reviewer should place in it.
*
* Values are ordered from highest to lowest confidence:
* 1. observed_execution -- based on actual execution log records showing activity
* 2. observed_absence -- based on querying execution logs and confirming absence of activity
* 3. correlated_pattern -- correlates multiple data sources (e.g., version history + execution evidence)
* 4. structural_authority -- based on the static graph structure (roles, paths, relationships, connector-set properties)
* 5. inferred_capability -- connector or platform inferred a capability from indirect signals
*/
export const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability"
] as const;

export type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];
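
Because the array order encodes confidence, ranking, validating, and sorting by evidence class need no extra lookup table. The helpers below are a sketch; `classificationRank`, `isEvidenceClassification`, and `sortByConfidence` are hypothetical names that do not appear elsewhere in this brief.

```typescript
// Mirrors the constant above so the sketch is self-contained.
const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability"
] as const;

type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];

// Position in the ordering: 0 is the highest confidence.
function classificationRank(c: EvidenceClassification): number {
  return EVIDENCE_CLASSIFICATIONS.indexOf(c);
}

// Type guard for untrusted input such as a filter value supplied by an API consumer.
function isEvidenceClassification(value: unknown): value is EvidenceClassification {
  return (
    typeof value === "string" &&
    (EVIDENCE_CLASSIFICATIONS as readonly string[]).indexOf(value) !== -1
  );
}

// Returns a new array sorted from highest to lowest confidence.
function sortByConfidence<T extends { classification: EvidenceClassification }>(items: T[]): T[] {
  return items
    .slice()
    .sort((a, b) => classificationRank(a.classification) - classificationRank(b.classification));
}
```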

EvidenceClaim Type

/**
* A structured claim statement attached to every finding.
*
* Answers Sergey's four questions:
* 1. What is proven vs. inferred? -> classification + basis
* 2. Why does it matter? -> business_impact
* 3. What is the safest first action? -> recommended_action
* 4. Who should own that action? -> owner_role
*/
export interface EvidenceClaim {
/**
* One-sentence claim statement suitable for display in a finding card.
* Written in plain language for a non-technical reviewer.
*
* @example "This workload has write access to the Payroll domain but
* has only been observed reading data in the last 90 days."
*/
claim_statement: string;

/**
* Static classification of the evidence basis (rule-level).
* Determined by the rule's data access pattern and does not change at runtime.
*/
classification: EvidenceClassification;

/**
* Runtime confidence derived from the actual EvidenceConfidenceLevel
* of the underlying evidence records consulted when producing this finding.
* Maps EvidenceConfidenceLevel -> EvidenceClassification:
* DETERMINISTIC -> observed_execution
* TEMPORAL_INFERRED -> correlated_pattern
* STRUCTURAL -> structural_authority
*
* When no execution evidence records are consulted, this field is omitted.
*/
runtime_confidence?: EvidenceClassification;

/**
* Effective classification shown to the user.
* Computed as the weaker (lower-confidence) of classification and
* runtime_confidence under the EVIDENCE_CLASSIFICATIONS ordering, i.e. the
* value with the higher index (lower index = higher confidence).
* When runtime_confidence is absent, equals classification.
*/
effective_classification: EvidenceClassification;

/**
* Human-readable label for the effective classification.
* Used in the UI trust indicator.
*
* @example "Proven from execution logs"
* @example "Confirmed absent from execution logs"
* @example "Derived from graph structure"
*/
classification_label: string;

/**
* What data sources contributed to this finding.
* Each entry describes one source of evidence.
*
* @example ["Execution evidence: 3 records from ServiceNow (most recent: 2026-02-15)"]
* @example ["Graph structure: 2 OWNED_BY relationships, both targets status=departed"]
*/
basis: string[];

/**
* One-sentence business impact statement.
* Explains why a business stakeholder should care.
*
* @example "Unreviewed access to restricted financial data increases
* regulatory exposure under SOX controls."
*/
business_impact: string;

/**
* The safest first action to take.
* Not the full remediation plan -- just the lowest-risk next step.
*
* Cross-reference: This maps to `MitigationActionDoc.action` from the
* ownership workflow research (#215). Both should share vocabulary to
* ensure consistency between evidence claims and mitigation workflows.
*
* @example "Review the 3 execution paths to Payroll resources and
* confirm whether write access is still required."
*/
recommended_action: string;

/**
* The role that should own this action.
* Maps to an organizational function, not a specific person.
*
* This is a static recommendation from DEFAULT_OWNER_ROLES. It should
* eventually resolve through the `OwnershipAssignmentDoc` defined in
* the ownership workflow research (#215). The text label alone does NOT
* satisfy the "who should own that action" requirement -- it needs
* integration with the ownership assignment workflow once that ships.
*
* @example "Application Owner"
* @example "Identity Governance Team"
* @example "Security Operations"
*/
owner_role: string;

/**
* Per-section evidence availability snapshot from the finding's evidence completeness.
* This is the standard `EvidenceCompletenessSection` (the same shape already on `FindingDoc`),
* NOT the extended `ClassifiedEvidenceCompleteness` type defined later in this document.
* The `ClassifiedEvidenceCompleteness` type lives on the *evidence pack*, not on the finding.
*
* In other words:
* - `EvidenceClaim.section_confidence` (on the finding) = lightweight availability snapshot
* - `ClassifiedEvidenceCompleteness` (on the evidence pack) = extended version with
* `contributed_to_classification` and `influence_note` per section
*/
section_confidence: EvidenceCompletenessSection;
}

Classification Label Map

/**
* Human-readable labels for each evidence classification.
* Used in UI trust indicators and API responses.
*/
export const CLASSIFICATION_LABELS: Record<EvidenceClassification, string> = {
  observed_execution: "Proven from execution logs",
  observed_absence: "Confirmed absent from execution logs",
  correlated_pattern: "Correlated across data sources",
  structural_authority: "Derived from graph structure",
  inferred_capability: "Inferred from indirect signals"
};

Owner Role Map

Cross-reference: These default owner roles are static text recommendations. They do not replace the ownership workflow from #215. Once OwnershipAssignmentDoc is implemented, the owner_role on EvidenceClaim should resolve through that workflow rather than relying solely on this lookup table. The vocabulary used here should align with the role taxonomy defined in the ownership research.

/**
* Default owner role for each finding type.
* Can be overridden per-tenant in configuration.
* See #215 (ownership workflow research) for how these map to OwnershipAssignmentDoc.
*/
export const DEFAULT_OWNER_ROLES: Record<FindingType, string> = {
  orphaned_ownership: "Identity Governance Team",
  ownership_degraded: "Identity Governance Team",
  ownership_ambiguous: "Identity Governance Team",
  ownership_unknown: "Identity Governance Team",
  ownership_drift: "Identity Governance Team",
  dormant_authority: "Application Owner",
  privilege_justification_gap: "Application Owner",
  unresolved_cross_system_auth: "Security Operations",
  unproven_execution: "Application Owner",
  unknown_identity_binding: "Security Operations",
  reachable_sensitive_domain: "Data Protection Officer",
  llm_egress: "Security Operations",
  external_egress: "Security Operations",
  scope_drift: "Application Owner",
  reachability_drift: "Application Owner"
};
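
The per-tenant override mentioned in the comment above could look roughly like this. The override shape and the `resolveOwnerRole` name are assumptions, and only three finding types are listed to keep the sketch short.

```typescript
// Reduced finding-type union for illustration; the real FindingType has 15 members.
type FindingTypeSketch = "dormant_authority" | "orphaned_ownership" | "llm_egress";

const DEFAULT_OWNER_ROLES_SKETCH: Record<FindingTypeSketch, string> = {
  dormant_authority: "Application Owner",
  orphaned_ownership: "Identity Governance Team",
  llm_egress: "Security Operations"
};

// Hypothetical tenant configuration: any subset of finding types may be overridden.
type TenantOwnerRoleOverrides = Partial<Record<FindingTypeSketch, string>>;

// A tenant override wins when present; otherwise the static default applies.
function resolveOwnerRole(type: FindingTypeSketch, overrides?: TenantOwnerRoleOverrides): string {
  return overrides?.[type] ?? DEFAULT_OWNER_ROLES_SKETCH[type];
}
```

Keeping the lookup behind one function leaves a single place to swap in the OwnershipAssignmentDoc workflow from #215 once it exists.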

Integration with FindingDoc

The EvidenceClaim field should be added to FindingDoc as an optional field during migration, becoming required once all rules populate it:

export interface FindingDoc {
  // ... existing fields ...

  /**
   * Structured evidence claim. Populated by the evaluator when the
   * finding is created or updated. Contains the classification,
   * claim statement, business impact, and recommended action.
   *
   * Optional during migration (Phase 1). Required after Phase 2.
   */
  evidence_claim?: EvidenceClaim;
}

The RuleFindingCandidate in src/evaluator/types.ts should also gain the field. During Phase 1, the field is optional to avoid a big-bang migration (all 14 rule files would need updating simultaneously before anything compiles). Rules are updated incrementally; once all rules populate the field, it becomes required in Phase 2:

export interface RuleFindingCandidate {
  // ... existing fields ...

  /**
   * Evidence claim metadata.
   * Optional during Phase 1 (incremental rule migration).
   * Required after Phase 2 (all rules populated).
   */
  evidenceClaim?: EvidenceClaim;
}
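
One way to decide when Phase 2 is safe is to list the finding types that still emit candidates without a claim. The sketch below assumes a `findingType` field on the candidate and a hypothetical `findUnmigratedTypes` helper; neither is confirmed by this brief.

```typescript
// Hypothetical, reduced candidate shape for illustration.
interface CandidateSketch {
  findingType: string;
  evidenceClaim?: { claim_statement: string };
}

// Returns the sorted, de-duplicated finding types whose candidates lack an evidence claim.
function findUnmigratedTypes(candidates: CandidateSketch[]): string[] {
  const missing = new Set<string>();
  for (const c of candidates) {
    if (c.evidenceClaim === undefined) missing.add(c.findingType);
  }
  return Array.from(missing).sort();
}
```

An empty result over a representative evaluation run would suggest the field can be made required.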

Classification Table

| Finding Type | Rule File | Evidence Basis | Classification | Default Severity | Owner Role | Notes |
|---|---|---|---|---|---|---|
| dormant_authority | dormant-authority.ts | Execution evidence timestamps, RUNS_AS traversal, connector-computed last_observed_execution_timestamp | observed_absence | high | Application Owner | Fires on absence of recent execution evidence |
| external_egress | external-egress.ts | egress_category entity property from connector | structural_authority | medium | Security Operations | Connector classifies egress target; no runtime verification |
| llm_egress | llm-egress.ts | egress_category entity property from connector | structural_authority | high | Security Operations | Same pattern as external_egress |
| orphaned_ownership | orphaned-ownership.ts | OWNED_BY/CREATED_BY relationships, owner entity status | structural_authority | critical | Identity Governance Team | Status values come from source system via connector |
| ownership_degraded | orphaned-ownership.ts | OWNED_BY relationships, owner entity status, ownership_level | structural_authority | high | Identity Governance Team | Emitted by same rule as orphaned_ownership |
| ownership_ambiguous | ownership-ambiguous.ts | OWNED_BY relationships, owner entity types, version history | structural_authority | medium | Identity Governance Team | Version history used for differentiation only |
| ownership_unknown | ownership-unknown.ts | Absence of OWNED_BY/CREATED_BY relationships and ownership properties | structural_authority | medium | Identity Governance Team | Metadata quality signal |
| privilege_justification_gap | privilege-justification-gap.ts | Execution paths (sensitivity, actions) + execution evidence (observed actions) | correlated_pattern | medium | Application Owner | Only rule that correlates granted vs. observed actions |
| reachable_sensitive_domain | reachable-sensitive-domain.ts | Execution paths with elevated sensitivity | structural_authority | medium/high | Data Protection Officer | Severity depends on restricted vs. confidential |
| unknown_identity_binding | unknown-identity-binding.ts | RUNS_AS relationships, entity resolution | structural_authority | high | Security Operations | Three failure modes: missing, unresolvable, ambiguous |
| unproven_execution | unproven-execution.ts | Execution evidence query (zero results), RUNS_AS traversal, execution_count_30d | observed_absence | high | Application Owner | Fires on confirmed absence of any execution evidence |
| unresolved_cross_system_auth | unresolved-auth.ts | Connector-set identity_binding_status, identity_subtype, identity_type (fallback) | structural_authority | medium | Security Operations | Reads connector-set property, same pattern as external_egress |
| scope_drift | scope-drift.ts | Version history diffs (HAS_ROLE, GRANTS, USES, RUNS_AS) + execution evidence + path sensitivity. Refs: grants_added, uses_added, uses_removed, runs_as_added, runs_as_removed | correlated_pattern | medium-critical | Application Owner | Severity escalates with execution evidence and sensitive domains |
| reachability_drift | reachability-drift.ts | Version history diffs (execution_paths) + execution evidence + domain sensitivity | correlated_pattern | medium-critical | Application Owner | Same escalation pattern as scope_drift |
| ownership_drift | ownership-drift.ts | Version history diffs (OWNED_BY) + owner entity status | structural_authority | medium/high | Identity Governance Team | Claim is about current state (owners gone/disabled now); version history used for detection, not for the claim |

Evidence Pack Integration

Per-Section Confidence

The existing EvidenceCompletenessSection already tracks six categories. The proposal adds a confidence interpretation layer on top:

/**
* Per-section confidence metadata for an evidence pack.
* Extends the existing EvidenceCompletenessSection with
* classification-relevant context.
*/
export interface SectionConfidenceMetadata {
  /** The existing availability status */
  availability: EvidenceAvailability;

  /**
   * Whether this section contributed to the finding classification.
   * Not all sections are relevant to every finding type.
   */
  contributed_to_classification: boolean;

  /**
   * How this section influenced the classification.
   * Only populated when contributed_to_classification is true.
   *
   * @example "Execution evidence records provided the temporal threshold comparison"
   * @example "Role history inferred from version diffs (not from audit logs)"
   */
  influence_note?: string;
}

/**
 * Extended evidence completeness with classification metadata.
 * Carried in the evidence pack, not in the finding itself.
 */
export interface ClassifiedEvidenceCompleteness {
  sections: Record<keyof Omit<EvidenceCompletenessSection, "notes">, SectionConfidenceMetadata>;
  notes: Record<string, string>;
}
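
To illustrate how the extended shape could be produced from the existing one, the sketch below lifts an EvidenceCompletenessSection into a ClassifiedEvidenceCompleteness. The field layout of EvidenceCompletenessSection (six availability fields plus a notes record) is inferred from the descriptions in this brief, and `classifyCompleteness` is a hypothetical helper.

```typescript
type EvidenceAvailability =
  | "available"
  | "unavailable_not_enabled"
  | "unavailable_no_access"
  | "unavailable_not_applicable"
  | "partial";

// Assumed shape: six categories plus free-text notes.
interface EvidenceCompletenessSection {
  current_roles: EvidenceAvailability;
  role_history: EvidenceAvailability;
  execution_evidence: EvidenceAvailability;
  ownership_records: EvidenceAvailability;
  approval_records: EvidenceAvailability;
  credential_state: EvidenceAvailability;
  notes: Record<string, string>;
}

type SectionKey = keyof Omit<EvidenceCompletenessSection, "notes">;

interface SectionConfidenceMetadata {
  availability: EvidenceAvailability;
  contributed_to_classification: boolean;
  influence_note?: string;
}

interface ClassifiedEvidenceCompleteness {
  sections: Record<SectionKey, SectionConfidenceMetadata>;
  notes: Record<string, string>;
}

const SECTION_KEYS: SectionKey[] = [
  "current_roles",
  "role_history",
  "execution_evidence",
  "ownership_records",
  "approval_records",
  "credential_state"
];

// influence: sections that contributed to the classification, keyed to their influence note.
function classifyCompleteness(
  completeness: EvidenceCompletenessSection,
  influence: Partial<Record<SectionKey, string>>
): ClassifiedEvidenceCompleteness {
  const sections = {} as Record<SectionKey, SectionConfidenceMetadata>;
  for (const key of SECTION_KEYS) {
    const note = influence[key];
    sections[key] =
      note !== undefined
        ? { availability: completeness[key], contributed_to_classification: true, influence_note: note }
        : { availability: completeness[key], contributed_to_classification: false };
  }
  return { sections, notes: { ...completeness.notes } };
}
```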

Claim Derivation

The overall finding classification is derived deterministically from the rule's data access pattern and, where execution evidence is consulted, from the confidence of those records:

  1. Rule declares its static classification. Each rule knows what data it accesses. The static classification is a property of the rule that does not change between invocations.

  2. Runtime confidence overlays the static classification. When a rule consults execution evidence records, the EvidenceConfidenceLevel of those records is mapped to an EvidenceClassification and stored as runtime_confidence. The effective_classification shown to the user is the weaker (lower-confidence) of the static classification and the runtime confidence under the EVIDENCE_CLASSIFICATIONS ordering, that is, whichever of the two appears later in the array. For example, a scope_drift finding (static: correlated_pattern) backed by STRUCTURAL-confidence evidence records gets an effective classification of structural_authority, not correlated_pattern. When no execution evidence records are consulted, runtime_confidence is omitted and effective_classification equals the static classification.

  3. Severity may vary; the static classification does not. A scope_drift finding is always statically correlated_pattern regardless of whether execution evidence was found. The severity escalates when execution evidence exists. The effective classification may differ from the static one based on runtime confidence.

  4. Claim statement is templated per rule. Each rule provides a template function that interpolates rule-specific values (counts, names, timestamps) into a human-readable sentence.

  5. Business impact is derived from the affected domains. When a finding touches sensitive business domains (from execution paths), the business impact references those domains. When it does not, a generic impact statement is used.
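
Steps 4 and 5 can be sketched as two small functions. Everything here is illustrative: the path shape, the sentence wording, and the generic fallback text are assumptions, and `buildBusinessImpact` is shown only as one plausible reading of the helper named in the example that follows.

```typescript
// Hypothetical, simplified execution path shape.
interface ExecutionPathSketch {
  business_domain?: string;
  sensitivity?: string;
}

// Sensitivity labels treated as sensitive elsewhere in this brief.
const SENSITIVE_LEVELS = new Set(["confidential", "restricted", "high"]);

// Placeholder wording; the real generic statement is not specified in this brief.
const GENERIC_IMPACT = "Unreviewed authority increases the exposure of this identity.";

// Step 5: reference the affected sensitive domains, or fall back to a generic statement.
function buildBusinessImpact(paths: ExecutionPathSketch[]): string {
  const domains = new Set<string>();
  for (const p of paths) {
    if (p.business_domain && p.sensitivity && SENSITIVE_LEVELS.has(p.sensitivity)) {
      domains.add(p.business_domain);
    }
  }
  if (domains.size === 0) return GENERIC_IMPACT;
  const list = Array.from(domains).sort().join(", ");
  return `This finding touches sensitive business domain(s): ${list}.`;
}

// Step 4: a per-rule template interpolates rule-specific values into one sentence.
type ClaimTemplate<Refs> = (refs: Refs) => string;

const reachableSensitiveDomainClaim: ClaimTemplate<{
  sensitive_path_count: number;
  total_path_count: number;
}> = (refs) =>
  `This identity can reach sensitive resources through ${refs.sensitive_path_count} of its ${refs.total_path_count} execution path(s).`;
```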

Example claim derivation for dormant_authority:

// In dormant-authority.ts evaluate():
const staticClassification = "observed_absence" as const;
// confidence is optional on ExecutionEvidenceDoc, so guard before mapping.
const runtimeConfidence = mostRecent?.confidence
  ? mapConfidenceToClassification(mostRecent.confidence) // e.g., DETERMINISTIC -> observed_execution
  : undefined;
const effectiveClassification = runtimeConfidence
  ? minClassification(staticClassification, runtimeConfidence)
  : staticClassification;

const claim: EvidenceClaim = {
  claim_statement: mostRecent
    ? `This ${label.toLowerCase()} has ${paths.length} execution path(s) but has not been active for ${daysSinceActivity} days.`
    : `This ${label.toLowerCase()} has ${paths.length} execution path(s) but has never produced execution evidence.`,
  classification: staticClassification,
  runtime_confidence: runtimeConfidence,
  effective_classification: effectiveClassification,
  classification_label: CLASSIFICATION_LABELS[effectiveClassification],
  basis: mostRecent
    ? [`Execution evidence: most recent record at ${mostRecent.source_timestamp.toISOString()} (${daysSinceActivity} days ago)`]
    : ["Execution evidence: queried entity and RUNS_AS targets, zero records found"],
  business_impact: buildBusinessImpact(paths),
  recommended_action: "Review the execution paths and confirm whether this authority is still needed. If not, revoke the associated roles.",
  owner_role: DEFAULT_OWNER_ROLES.dormant_authority,
  section_confidence: evidenceCompleteness
};
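
The example calls mapConfidenceToClassification and minClassification, which this brief does not define. A minimal sketch of both follows, assuming EvidenceConfidenceLevel is a string union of the three documented values and that "min" means the weaker (later in the array) of the two classifications.

```typescript
const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability"
] as const;

type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];

// Assumption: the real type may be an enum rather than a string union.
type EvidenceConfidenceLevel = "DETERMINISTIC" | "TEMPORAL_INFERRED" | "STRUCTURAL";

// Mapping as documented on EvidenceClaim.runtime_confidence.
const CONFIDENCE_TO_CLASSIFICATION: Record<EvidenceConfidenceLevel, EvidenceClassification> = {
  DETERMINISTIC: "observed_execution",
  TEMPORAL_INFERRED: "correlated_pattern",
  STRUCTURAL: "structural_authority"
};

function mapConfidenceToClassification(level: EvidenceConfidenceLevel): EvidenceClassification {
  return CONFIDENCE_TO_CLASSIFICATION[level];
}

// Returns the weaker classification: the one with the higher index in the ordering.
function minClassification(a: EvidenceClassification, b: EvidenceClassification): EvidenceClassification {
  return EVIDENCE_CLASSIFICATIONS.indexOf(a) >= EVIDENCE_CLASSIFICATIONS.indexOf(b) ? a : b;
}
```

Note that no confidence level maps to observed_absence or inferred_capability, so those two values can only come from a rule's static classification.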

Two-Layer Confidence Model

Classification is not purely a static per-rule property. A rule classified as correlated_pattern (e.g., scope_drift) may, at runtime, consult execution evidence records whose EvidenceConfidenceLevel is only STRUCTURAL. In that case, the user-facing classification should reflect the weaker runtime confidence, not the optimistic static label.

The model works as follows:

  1. Static rule classification (classification field): Determined by the rule's data access pattern. Immutable per rule -- dormant_authority is always observed_absence, scope_drift is always correlated_pattern, etc.

  2. Runtime evidence confidence (runtime_confidence field): Derived from the actual EvidenceConfidenceLevel of the execution evidence records consulted during evaluation. The mapping is:

    EvidenceConfidenceLevel    Maps to EvidenceClassification
    DETERMINISTIC              observed_execution
    TEMPORAL_INFERRED          correlated_pattern
    STRUCTURAL                 structural_authority

    When a rule consults multiple evidence records, the minimum confidence (weakest link) is used. When a rule does not consult any execution evidence records, runtime_confidence is omitted.

  3. Effective classification (effective_classification field): The minimum of (static classification, runtime confidence) using the EVIDENCE_CLASSIFICATIONS ordering. This is the value shown to the user and used for filtering/sorting.
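
The three steps above can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes EVIDENCE_CLASSIFICATIONS is ordered strongest to weakest in the order the five enum values are listed in this brief, and that EvidenceConfidenceLevel has only the three levels shown in the mapping. The helper name effectiveClassification() is hypothetical; mapConfidenceToClassification() and minClassification() are the utilities named in Phase 1.

```typescript
// Assumed ordering: strongest evidence first, weakest last.
const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability",
] as const;
type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];

// Assumption: only the three levels listed in the mapping table exist.
type EvidenceConfidenceLevel = "DETERMINISTIC" | "TEMPORAL_INFERRED" | "STRUCTURAL";

const CONFIDENCE_TO_CLASSIFICATION: Record<EvidenceConfidenceLevel, EvidenceClassification> = {
  DETERMINISTIC: "observed_execution",
  TEMPORAL_INFERRED: "correlated_pattern",
  STRUCTURAL: "structural_authority",
};

function mapConfidenceToClassification(level: EvidenceConfidenceLevel): EvidenceClassification {
  return CONFIDENCE_TO_CLASSIFICATION[level];
}

// "Minimum" means the weaker of the two, i.e. the later position in the ordering.
function minClassification(a: EvidenceClassification, b: EvidenceClassification): EvidenceClassification {
  return EVIDENCE_CLASSIFICATIONS.indexOf(a) >= EVIDENCE_CLASSIFICATIONS.indexOf(b) ? a : b;
}

// Hypothetical helper combining the static and runtime layers.
function effectiveClassification(
  staticClassification: EvidenceClassification,
  consultedLevels: EvidenceConfidenceLevel[],
): { runtime_confidence?: EvidenceClassification; effective_classification: EvidenceClassification } {
  if (consultedLevels.length === 0) {
    // No execution evidence consulted: runtime_confidence is omitted.
    return { effective_classification: staticClassification };
  }
  // Weakest link across all consulted evidence records.
  const runtime = consultedLevels.map((l) => mapConfidenceToClassification(l)).reduce(minClassification);
  return {
    runtime_confidence: runtime,
    effective_classification: minClassification(staticClassification, runtime),
  };
}
```

Representing the ordering as a single array keeps the comparison rule in one place, so sorting, filtering, and the two-layer minimum cannot drift apart.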

Example: A scope_drift finding (static: correlated_pattern) backed by execution evidence with confidence: STRUCTURAL gets:

  • classification: correlated_pattern
  • runtime_confidence: structural_authority
  • effective_classification: structural_authority (the weaker of the two)
  • classification_label: "Derived from graph structure"

Example: A dormant_authority finding (static: observed_absence) where the most recent evidence record has confidence: DETERMINISTIC gets:

  • classification: observed_absence
  • runtime_confidence: observed_execution
  • effective_classification: observed_absence (absence is weaker than presence)
  • classification_label: "Confirmed absent from execution logs"

This ensures the user-facing label never overstates the actual evidence quality.

API Contract Changes

GET /findings/:id

Add evidence_claim to the detail response:

{
  "data": {
    "id": "abc123",
    "finding_type": "dormant_authority",
    "severity": "high",
    // ... existing fields ...

    // NEW: structured evidence claim
    "evidence_claim": {
      "claim_statement": "This workload has 4 execution paths but has not been active for 127 days.",
      "classification": "observed_absence",
      "runtime_confidence": "observed_execution",
      "effective_classification": "observed_absence",
      "classification_label": "Confirmed absent from execution logs",
      "basis": [
        "Execution evidence: most recent record at 2025-11-19T14:22:00Z (127 days ago)",
        "Checked 2 linked identities via RUNS_AS"
      ],
      "business_impact": "Dormant authority to Payroll and HR domains creates standing risk without active justification.",
      "recommended_action": "Review the 4 execution paths and confirm whether this authority is still needed. If not, revoke the associated roles.",
      "owner_role": "Application Owner",
      "section_confidence": {
        "current_roles": "unavailable_not_applicable",
        "role_history": "unavailable_not_applicable",
        "execution_evidence": "available",
        "ownership_records": "unavailable_not_applicable",
        "approval_records": "unavailable_not_applicable",
        "credential_state": "unavailable_not_applicable",
        "notes": {
          "execution_evidence": "Most recent evidence is 127 days old (beyond 90-day threshold)"
        }
      }
    }
  }
}

GET /findings (list)

Add summary classification fields to the normalized list response:

{
  "data": [
    {
      "id": "abc123",
      "finding_type": "dormant_authority",
      "severity": "high",
      // ... existing fields ...

      // NEW: classification summary (lightweight, no full claim)
      "evidence_classification": "observed_absence",
      "effective_classification": "observed_absence",
      "evidence_classification_label": "Confirmed absent from execution logs"
    }
  ],
  "meta": {
    "total_count": 42,
    "bySeverity": { "high": 12, "medium": 20, "critical": 5, "low": 5 },
    "byType": { /* ... */ },

    // NEW: classification distribution
    "byClassification": {
      "observed_execution": 5,
      "observed_absence": 6,
      "correlated_pattern": 15,
      "structural_authority": 16
    }
  }
}
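
A rough sketch of how the byClassification distribution could be computed from list items. It assumes the counts are keyed on effective_classification (the brief does not say which of the two classification fields feeds the distribution) and that classes with zero findings are omitted, as in the example above. FindingListItem and countByClassification() are hypothetical names.

```typescript
type EvidenceClassification =
  | "observed_execution"
  | "observed_absence"
  | "correlated_pattern"
  | "structural_authority"
  | "inferred_capability";

// Hypothetical minimal shape of a list item; real items carry more fields.
interface FindingListItem {
  id: string;
  effective_classification?: EvidenceClassification;
}

// Count findings per effective classification. Classes with no findings are
// left out of the result; findings without a classification are skipped.
function countByClassification(
  items: FindingListItem[],
): Partial<Record<EvidenceClassification, number>> {
  const counts: Partial<Record<EvidenceClassification, number>> = {};
  for (const item of items) {
    const key = item.effective_classification;
    if (key === undefined) continue; // unclassified finding
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}
```

In practice this aggregation would more likely run in the database than in application code; the function only illustrates the intended semantics.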

GET /exposures and GET /exposures/:id

Add the highest-confidence classification across all findings for an exposure:

{
  "data": {
    "id": "EXP-abc123",
    // ... existing fields ...

    // NEW: distinct classifications across all findings, plus the highest-confidence one
    "evidence_classifications": ["observed_execution", "structural_authority"],
    "primary_classification": "observed_execution"
  }
}
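
One way the exposure summary could be derived, shown as a sketch. It assumes primary_classification is the strongest classification present among the exposure's findings (matching "highest-confidence" above) and that evidence_classifications lists distinct values in confidence order. summarizeExposureClassifications() is a hypothetical name.

```typescript
// Assumed ordering: strongest evidence first, weakest last.
const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability",
] as const;
type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];

interface ExposureClassificationSummary {
  evidence_classifications: EvidenceClassification[];
  primary_classification: EvidenceClassification | undefined;
}

// Collapse the effective classifications of an exposure's findings into a
// de-duplicated, confidence-ordered list and pick the strongest as primary.
function summarizeExposureClassifications(
  findingClassifications: EvidenceClassification[],
): ExposureClassificationSummary {
  const distinct = EVIDENCE_CLASSIFICATIONS.filter((c) => findingClassifications.includes(c));
  return {
    evidence_classifications: distinct,
    primary_classification: distinct[0], // undefined when the exposure has no classified findings
  };
}
```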

Filtering and Sorting

New query parameters for GET /findings:

Parameter              Type      Description
classification         string    Filter by evidence classification. Comma-separated for multiple values.
sort=classification    string    Sort by classification confidence order (observed_execution > observed_absence > correlated_pattern > structural_authority > inferred_capability).

Example: GET /api/v1/findings?classification=observed_execution,correlated_pattern&sort=classification
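
A sketch of how the route handler might parse the classification parameter and order results. Rejecting unknown values with an error is an assumption (the brief does not specify the behavior for invalid input), and parseClassificationParam() and compareByClassification() are hypothetical names.

```typescript
// Assumed ordering: strongest evidence first, weakest last.
const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability",
] as const;
type EvidenceClassification = (typeof EVIDENCE_CLASSIFICATIONS)[number];

function isEvidenceClassification(value: string): value is EvidenceClassification {
  return (EVIDENCE_CLASSIFICATIONS as readonly string[]).includes(value);
}

// Parse "observed_execution,correlated_pattern" into a validated list.
// An absent or empty parameter means "no classification filter".
function parseClassificationParam(raw: string | undefined): EvidenceClassification[] {
  if (!raw) return [];
  const values = raw.split(",").map((v) => v.trim()).filter((v) => v.length > 0);
  const invalid = values.filter((v) => !isEvidenceClassification(v));
  if (invalid.length > 0) {
    // Assumption: unknown values are rejected rather than silently ignored.
    throw new Error(`Unknown classification value(s): ${invalid.join(", ")}`);
  }
  return values.filter(isEvidenceClassification);
}

// Comparator for sort=classification: strongest evidence first.
function compareByClassification(a: EvidenceClassification, b: EvidenceClassification): number {
  return EVIDENCE_CLASSIFICATIONS.indexOf(a) - EVIDENCE_CLASSIFICATIONS.indexOf(b);
}
```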

Scanability caveat: Adding a classification label to the findings list improves filtering but does not, by itself, solve the "too many clicks" problem. The "why/action/owner" payload (business_impact, recommended_action, owner_role) is still only available in the detail response. Evidence classification is one piece of the puzzle; see #217 (Wiz UX research) for user-experience direction and #224 (expand default-visible info in list view) for the complementary API work needed.

MongoDB Index Requirement

The classification filter requires a compound index on the findings collection:

{ tenant_id: 1, "evidence_claim.effective_classification": 1, status: 1 }

This supports the GET /findings?classification=... query without a collection scan. The index should be created as part of the Phase 2 API migration.
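
To illustrate why the index is ordered this way, here is a sketch of a filter document whose fields follow the index prefix (tenant_id, then effective classification, then status). The field paths and the $in operator are standard MongoDB query syntax; buildFindingsFilter() and the idea that status is an optional filter on this endpoint are assumptions for illustration.

```typescript
// Shape of the filter document passed to a MongoDB find() call.
interface FindingsFilter {
  tenant_id: string;
  "evidence_claim.effective_classification"?: { $in: string[] };
  status?: string;
}

// Build a filter that always constrains tenant_id first, so the compound
// index { tenant_id, evidence_claim.effective_classification, status } applies.
function buildFindingsFilter(
  tenantId: string,
  classifications: string[],
  findingStatus?: string,
): FindingsFilter {
  const filter: FindingsFilter = { tenant_id: tenantId };
  if (classifications.length > 0) {
    filter["evidence_claim.effective_classification"] = { $in: classifications };
  }
  if (findingStatus !== undefined) {
    filter.status = findingStatus;
  }
  return filter;
}
```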

UI Direction

Specific UI treatment should be informed by the Wiz UX research (#217). Sergey explicitly asked for UX research before proposing a final solution. The following are directional principles only:

  • Make evidence classification visible. The classification (and its human-readable label) should be surfaced wherever findings appear -- list views, detail views, and exposure summaries.
  • Don't bury it. Classification should be as prominent as severity, not hidden behind an expand/collapse.
  • Use plain-English labels. Labels like "Confirmed absent from execution logs" and "Derived from graph structure" are more useful than enum values. The classification_label field exists for this purpose.
  • Distinguish observed-presence from observed-absence. Users should understand the difference between "we saw this happen" (observed_execution) and "we looked and confirmed it didn't happen" (observed_absence). The labels and any visual treatment should not conflate these.
  • Show the two-layer model when relevant. When runtime_confidence differs from classification, the UI should make it possible to understand why the effective classification was downgraded (e.g., "Rule classification: correlated pattern, but underlying evidence is structural-only").

Detailed component design (badges, colors, icons, layouts, panels) is deferred until after #217 research is complete.

Implementation Sequence

Phase 1: Types and Rule Changes (Sprint S+1)

Deliverables:

  1. Add EvidenceClassification, EvidenceClaim, CLASSIFICATION_LABELS, DEFAULT_OWNER_ROLES to src/domain/findings/types.ts
  2. Add evidence_claim?: EvidenceClaim to FindingDoc
  3. Add evidenceClaim?: EvidenceClaim to RuleFindingCandidate (optional during Phase 1 to allow incremental rule migration)
  4. Add mapConfidenceToClassification() and minClassification() utilities for the two-layer confidence model
  5. Incrementally update evaluator rules to populate evidenceClaim (can be done rule-by-rule; no big-bang required since the field is optional)
  6. Add a buildBusinessImpact(paths: ExecutionPath[]): string utility that generates business impact statements from affected domains
  7. Add a buildClaimStatement(findingType: FindingType, templateVars: Record<string, unknown>): string utility for templated claim generation
  8. Update src/evaluator/ orchestrator to pass evidenceClaim through to FindingDoc
  9. Migration: backfill evidence_claim for existing findings (offline script -- see Backfill Strategy below)

Backfill Strategy:

The backfill script re-evaluates existing findings to populate evidence_claim. Key decisions:

  • Non-authoritative marking: Backfilled claims are inherently ahistorical -- they reflect the state at backfill time, not the posture that originally produced the finding. For a product positioning itself around deterministic truth and repeatable outputs, this distinction matters. Backfilled claims MUST carry backfilled: true and claim_generated_at (distinct from the finding's created_at). The UI should render these with a subtle indicator (e.g., "Classification generated retroactively") so users know the claim was not computed at evaluation time.
  • Entity state constraints: The backfill uses current entity state because historical snapshots are not stored, so an old finding may carry a claim based on today's posture. To mitigate: (a) the backfilled flag alerts consumers; (b) the runtime_confidence field reflects current evidence availability, which may differ from what was available originally; (c) if the static classification and the current runtime_confidence yield a different effective_classification than the original evaluation would have produced, the backfilled flag signals that the discrepancy is expected.
  • Deleted entities: When a finding's target entity has been deleted, the backfill script cannot produce a meaningful claim. These findings should have evidence_claim set to null with a note in the finding's deterministic_explanation field: "Entity deleted; evidence claim could not be generated retroactively." They will appear as unclassified in the UI until resolved/archived.
  • Idempotency: The script must be idempotent -- re-running it on a finding that already has evidence_claim should be a no-op unless the claim schema version has changed.
  • Scope limitation: Backfilled claims should NOT be used for audit or compliance purposes. They are a best-effort enrichment for UI display. Only claims generated at evaluation time (where backfilled is false/absent) carry full deterministic trust.
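
The decisions above can be sketched as a small planning function plus a stamping helper. This is illustrative only: CLAIM_SCHEMA_VERSION, the schema_version field, the entity_deleted flag, and both function names are hypothetical, and the choice to leave an older-version claim untouched when its entity has been deleted is an assumption.

```typescript
const CLAIM_SCHEMA_VERSION = 1; // hypothetical current schema version

interface StoredClaim {
  schema_version?: number;
  backfilled?: boolean;
  claim_generated_at?: string;
}

interface StoredFinding {
  id: string;
  entity_deleted: boolean; // hypothetical flag: target entity no longer exists
  evidence_claim?: StoredClaim | null;
}

type BackfillAction = "skip" | "mark_unclassified" | "generate";

// Decide what the backfill script should do for one finding.
function planBackfill(finding: StoredFinding): BackfillAction {
  const existing = finding.evidence_claim;
  // Idempotency: a claim at the current schema version is left alone.
  if (existing && existing.schema_version === CLAIM_SCHEMA_VERSION) return "skip";
  if (finding.entity_deleted) {
    // Cannot regenerate. Record null once; leave null or older claims untouched.
    return existing === undefined ? "mark_unclassified" : "skip";
  }
  return "generate";
}

// Mark a freshly generated claim as retroactive.
function stampBackfilled<T extends object>(
  claim: T,
  now: Date,
): T & { backfilled: true; claim_generated_at: string; schema_version: number } {
  return {
    ...claim,
    backfilled: true,
    claim_generated_at: now.toISOString(),
    schema_version: CLAIM_SCHEMA_VERSION,
  };
}
```

Separating the decision (planBackfill) from the write makes the script easy to dry-run before it touches any findings.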

Estimated effort: 4-5 days for one engineer (extra day for backfill script and two-layer confidence utilities).

Phase 2: API Exposure and Connector-Report Classification (Sprint S+1)

Deliverables:

  1. Make evidenceClaim required on RuleFindingCandidate (all rules now populate it)
  2. Update GET /findings/:id to include evidence_claim in response
  3. Update GET /findings to include evidence_classification, effective_classification, and evidence_classification_label in list items
  4. Add byClassification to findings list meta
  5. Add classification query parameter for filtering
  6. Add classification as a valid sort field
  7. Create MongoDB compound index: { tenant_id: 1, "evidence_claim.effective_classification": 1, status: 1 }
  8. Update GET /exposures and GET /exposures/:id to include classification summary
  9. Update OpenAPI/Zod schemas for all changed endpoints
  10. Connector-report findings: Classify all connector-report findings (those coming through FindingsStore in-memory rather than evaluator rules) as structural_authority by default. Add a classification field to the connector report schema so that connectors can override the default. Connector-report findings should produce an EvidenceClaim with a generic claim statement derived from the connector's description field and a basis of ["Connector detection logic"].
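
Deliverable 10 could look roughly like the sketch below. ConnectorReportFinding and claimForConnectorFinding() are hypothetical names, and only a subset of EvidenceClaim fields is shown; classification_label is left out because this brief section does not list the full label map. The sketch assumes connector-report findings consult no execution evidence, so runtime_confidence is omitted and the effective classification equals the static one.

```typescript
type EvidenceClassification =
  | "observed_execution"
  | "observed_absence"
  | "correlated_pattern"
  | "structural_authority"
  | "inferred_capability";

// Hypothetical shape of a connector-report finding after the schema change.
interface ConnectorReportFinding {
  description: string;
  classification?: EvidenceClassification; // optional connector override
}

// Subset of EvidenceClaim fields relevant to connector-report findings.
interface ConnectorClaim {
  claim_statement: string;
  classification: EvidenceClassification;
  effective_classification: EvidenceClassification;
  basis: string[];
}

function claimForConnectorFinding(report: ConnectorReportFinding): ConnectorClaim {
  // Default to structural_authority unless the connector overrides it.
  const classification = report.classification ?? "structural_authority";
  return {
    claim_statement: report.description,
    classification,
    // No runtime evidence consulted, so effective equals static.
    effective_classification: classification,
    basis: ["Connector detection logic"],
  };
}
```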

Estimated effort: 3-4 days for one engineer (extra time for connector-report integration).

Phase 3: UI Visualization (Sprint S+2)

Prerequisite: Wiz UX research (#217) should be complete before finalizing UI deliverables. Phase 3 scope will be defined based on #217 findings.

Tentative deliverables (subject to #217 research):

  1. Surface effective_classification and classification_label in finding list and detail views
  2. Classification filter in the findings list
  3. Exposure detail: classification summary
  4. Specific component design (badge style, colors, icons, layout) to be determined after #217

Estimated effort: TBD after #217.

Open Questions

  1. Should classification be immutable per finding type, or can it change at runtime? Resolved: the two-layer model (static rule classification + runtime evidence confidence) addresses this. The static classification is immutable per rule. The effective_classification can differ based on the runtime confidence of underlying evidence records. See "Claim Derivation" section.

  2. Should correlated_pattern findings show which correlation elevated them? For example, a scope_drift finding where execution evidence was found could display "Correlated: version history + execution logs" while one without execution evidence could display "Correlated: version history + path sensitivity". This adds complexity to the basis field but makes it more informative. Recommendation: yes, populate basis dynamically based on which data sources were actually present.

  3. How should connector-report findings (non-evaluator) be classified? Resolved: moved to Phase 2 implementation plan. Connector-report findings default to structural_authority (consistent with how evaluator rules that read connector-set properties are classified). Connectors can override via a classification field in the report schema.

  4. Should the owner_role be tenant-configurable? The DEFAULT_OWNER_ROLES map provides sensible defaults, but different organizations have different role names. Recommendation: make it configurable via tenant settings in a future phase. For Phase 1, use defaults.

  5. What about findings that span multiple entities? Currently, every finding is scoped to a single entity. If a future rule produces cross-entity findings, the claim statement and owner role may need to handle multiple subjects. Recommendation: defer until a concrete cross-entity rule is designed.

  6. How does this interact with compliance_references? The existing ComplianceReference field on findings maps finding types to compliance frameworks (SOX, SOC2, etc.). The business_impact field in EvidenceClaim should reference these frameworks when available. Recommendation: have the claim builder consult getComplianceReferences(findingType) and include relevant framework names in the business impact statement.

References

Source Files

  • Evidence types: src/domain/evidence/types.ts -- EvidenceConfidenceLevel, ExecutionEvidenceDoc
  • Finding types: src/domain/findings/types.ts -- FindingDoc, FindingType, FINDING_TYPES
  • Evidence pack types: src/domain/evidence-packs/types.ts -- EvidenceCompletenessSection, EvidencePackContent, EvidencePackDoc
  • Evidence sections builder: src/evidence/sections.ts -- buildEvidencePackContent(), buildEvidenceCompleteness()
  • Evaluator types: src/evaluator/types.ts -- RuleFindingCandidate, FindingRule, EvaluationContext
  • Graph transformer: src/ingestion/graph-transformer.ts -- resolveConfidence()
  • Ingest route: src/api/routes/ingest.ts -- NormalizedGraphSchema, EVIDENCE_CONFIDENCE_LEVELS
  • Findings route: src/api/routes/findings.ts -- createFindingsRoutes()
  • Exposures route: src/api/routes/exposures.ts -- createExposureRoutes()
  • Evaluator rules index: src/evaluator/rules/index.ts -- ALL_RULES
  • Entity types: src/domain/entities/types.ts -- EntityDoc, ExecutionPath, EntityVersionDoc

Evaluator Rules (14 rule files producing 15 finding types)

#     Rule                            File
1     dormant_authority               src/evaluator/rules/dormant-authority.ts
2     external_egress                 src/evaluator/rules/external-egress.ts
3     llm_egress                      src/evaluator/rules/llm-egress.ts
4     orphaned_ownership              src/evaluator/rules/orphaned-ownership.ts
5     ownership_degraded              src/evaluator/rules/orphaned-ownership.ts (variant)
6     ownership_ambiguous             src/evaluator/rules/ownership-ambiguous.ts
7     ownership_unknown               src/evaluator/rules/ownership-unknown.ts
8     privilege_justification_gap     src/evaluator/rules/privilege-justification-gap.ts
9     reachable_sensitive_domain      src/evaluator/rules/reachable-sensitive-domain.ts
10    unknown_identity_binding        src/evaluator/rules/unknown-identity-binding.ts
11    unproven_execution              src/evaluator/rules/unproven-execution.ts
12    unresolved_cross_system_auth    src/evaluator/rules/unresolved-auth.ts
13    scope_drift                     src/evaluator/rules/scope-drift.ts
14    reachability_drift              src/evaluator/rules/reachability-drift.ts
15    ownership_drift                 src/evaluator/rules/ownership-drift.ts

GitHub