
Evidence Model Separation: Claim Type vs Evidence Strength

Problem Statement

The current EvidenceClassification type conflates two orthogonal concerns into a single enum:

  • Claim type — What are we asserting? (execution happened, permission exists structurally, capability is inferred, absence confirmed)

  • Evidence strength — How confident are we? (deterministic proof, strong correlation, structural derivation, inference)

The existing model defines five values that each encode both dimensions simultaneously:

export const EVIDENCE_CLASSIFICATIONS = [
  "observed_execution",
  "observed_absence",
  "correlated_pattern",
  "structural_authority",
  "inferred_capability",
] as const;

As Sergey noted in his March 31 founder feedback: the platform must distinguish between "the type of claim being made and the strength of the supporting evidence." These are separate axes. A claim about execution can have deterministic or correlated evidence. A claim about permission can have structural or inferred evidence. Flattening this into a single dimension loses information and constrains how the UI can present trust to users.

Current Model Analysis

Each of the five current classifications implicitly encodes a (claim_type, strength) pair:

| Current Classification | Implicit Claim Type | Implicit Evidence Strength | Rank |
| --- | --- | --- | --- |
| observed_execution | Execution happened | Deterministic | 0 |
| observed_absence | Execution did NOT happen | Deterministic | 1 |
| correlated_pattern | Execution likely | Correlation | 2 |
| structural_authority | Permission exists | Structural derivation | 3 |
| inferred_capability | Execution possible | Inference | 4 |

This is a 2D matrix being flattened into a 1D ranking. The flattening creates several problems:

  1. No way to express new combinations. A structurally-derived execution claim (e.g., "this identity executed based on permission + temporal correlation") has no classification. It would need to be forced into either correlated_pattern or structural_authority, losing precision.

  2. Ranking conflates confidence with claim semantics. observed_absence (rank 1) is ranked lower than observed_execution (rank 0) even though both are deterministic. The ranking is really about claim importance, not evidence quality.

  3. UI cannot independently display trust. The badge system (PR #242) must map a single value to both "what kind of thing is this?" and "how much should I trust it?" — forcing color to carry two meanings.

  4. effective_classification logic is constrained. The current weakestClassification function compares two values on a single axis, but "weakest" is only meaningful for evidence strength, not for claim type.

Proposed Two-Axis Model

Claim Types (what we assert)

| Claim Type | Meaning |
| --- | --- |
| execution_observed | We saw this identity execute this action in source system logs |
| execution_absent | We confirmed this identity did NOT execute this action |
| permission_exists | Structural authority exists (role, permission, or policy grant) |
| capability_inferred | Indirect signals suggest this identity could perform this action |

Claim types are mutually exclusive for a given evidence claim. Each claim asserts exactly one thing.

Evidence Strength (how confident we are)

| Evidence Strength | Meaning |
| --- | --- |
| deterministic | Direct proof from source system logs or records |
| correlated | Cross-source pattern match, high confidence |
| structural | Derived from graph structure, permissions, or configuration |
| inferred | Indirect signals only, lowest confidence |

Evidence strength is orderable — deterministic > correlated > structural > inferred. This ordering is what the renamed weakestStrength function should operate on (currently weakestClassification).
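A minimal sketch of how the two axes could be declared, following the `as const` style of EVIDENCE_CLASSIFICATIONS. The names CLAIM_TYPES, EVIDENCE_STRENGTHS, strengthRank, and compareStrength are assumptions for illustration; only the string values and their ordering come from this document.

```typescript
// Hypothetical declarations; only the string values are taken from the doc.
const CLAIM_TYPES = [
  "execution_observed",
  "execution_absent",
  "permission_exists",
  "capability_inferred",
] as const;

// Listed strongest -> weakest, so the array index doubles as a rank.
const EVIDENCE_STRENGTHS = [
  "deterministic",
  "correlated",
  "structural",
  "inferred",
] as const;

type ClaimType = (typeof CLAIM_TYPES)[number];
type EvidenceStrength = (typeof EVIDENCE_STRENGTHS)[number];

// Rank 0 is the strongest evidence; a higher rank means weaker evidence.
function strengthRank(strength: EvidenceStrength): number {
  return EVIDENCE_STRENGTHS.indexOf(strength);
}

// Comparator that sorts strongest evidence first.
function compareStrength(a: EvidenceStrength, b: EvidenceStrength): number {
  return strengthRank(a) - strengthRank(b);
}
```

Only the strength axis gets a comparator here. Claim types are listed but deliberately left without a rank, since they are not comparable on a confidence scale.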

The 2D Matrix

This separation allows the full matrix of valid combinations:

| Claim \ Strength | deterministic | correlated | structural | inferred |
| --- | --- | --- | --- | --- |
| execution_observed | Direct log proof | Cross-source correlation | - | - |
| execution_absent | Confirmed no logs | Absence across sources | - | - |
| permission_exists | - | - | Role/permission graph | Indirect permission signal |
| capability_inferred | - | - | Structural reachability | Behavioral inference |

Not all cells are valid. Execution claims require deterministic or correlated evidence. Permission claims require structural or inferred evidence. This constraint should be encoded in the type system.
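One way the constraint could be encoded is a union of allowed pairs, so invalid cells fail at compile time. This is a sketch under assumptions: the type and function names are hypothetical, and treating capability_inferred like permission_exists (structural or inferred only) is read from the matrix rather than stated in the rule above.

```typescript
type ExecutionClaimType = "execution_observed" | "execution_absent";
type ExecutionStrength = "deterministic" | "correlated";

// Assumption: capability claims share the permission row's valid columns.
type NonExecutionClaimType = "permission_exists" | "capability_inferred";
type NonExecutionStrength = "structural" | "inferred";

// Only pairs that match one branch of the union type-check.
type ValidClaimPair =
  | { claim_type: ExecutionClaimType; evidence_strength: ExecutionStrength }
  | { claim_type: NonExecutionClaimType; evidence_strength: NonExecutionStrength };

const structuralPermission: ValidClaimPair = {
  claim_type: "permission_exists",
  evidence_strength: "structural",
};

// Rejected by the compiler if uncommented:
// const bad: ValidClaimPair = { claim_type: "execution_observed", evidence_strength: "inferred" };

// Runtime guard for untyped input such as stored documents or API payloads.
const VALID_STRENGTHS = new Map<string, readonly string[]>([
  ["execution_observed", ["deterministic", "correlated"]],
  ["execution_absent", ["deterministic", "correlated"]],
  ["permission_exists", ["structural", "inferred"]],
  ["capability_inferred", ["structural", "inferred"]],
]);

function isValidPair(claimType: string, strength: string): boolean {
  return VALID_STRENGTHS.get(claimType)?.includes(strength) ?? false;
}
```

The compile-time union covers code written against the types, while the runtime guard covers data that arrives as plain strings.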

Mapping from Current Model

The migration is deterministic — each old value maps to exactly one (claim_type, strength) pair:

| Current | claim_type | evidence_strength |
| --- | --- | --- |
| observed_execution | execution_observed | deterministic |
| observed_absence | execution_absent | deterministic |
| correlated_pattern | execution_observed | correlated |
| structural_authority | permission_exists | structural |
| inferred_capability | capability_inferred | inferred |
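The mapping table translates directly into a lookup. The sketch below assumes hypothetical names (CLASSIFICATION_TO_PAIR, migrateClassification, ClaimPair); the value pairs themselves are copied from the table.

```typescript
type EvidenceClassification =
  | "observed_execution"
  | "observed_absence"
  | "correlated_pattern"
  | "structural_authority"
  | "inferred_capability";

type ClaimType =
  | "execution_observed"
  | "execution_absent"
  | "permission_exists"
  | "capability_inferred";

type EvidenceStrength = "deterministic" | "correlated" | "structural" | "inferred";

interface ClaimPair {
  claim_type: ClaimType;
  evidence_strength: EvidenceStrength;
}

// Record<EvidenceClassification, ...> makes the compiler flag any missing old value.
const CLASSIFICATION_TO_PAIR: Record<EvidenceClassification, ClaimPair> = {
  observed_execution: { claim_type: "execution_observed", evidence_strength: "deterministic" },
  observed_absence: { claim_type: "execution_absent", evidence_strength: "deterministic" },
  correlated_pattern: { claim_type: "execution_observed", evidence_strength: "correlated" },
  structural_authority: { claim_type: "permission_exists", evidence_strength: "structural" },
  inferred_capability: { claim_type: "capability_inferred", evidence_strength: "inferred" },
};

// Hypothetical helper: converts one legacy value into the two-axis pair.
function migrateClassification(classification: EvidenceClassification): ClaimPair {
  return CLASSIFICATION_TO_PAIR[classification];
}
```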

Aggregation Semantics

Evidence Strength Aggregation

When an access chain contains multiple evidence claims, the effective strength is the weakest strength across all claims in the chain. This is the existing weakestClassification logic, renamed to weakestStrength. It operates purely on the strength axis:

deterministic > correlated > structural > inferred

If a chain has one deterministic claim and one inferred claim, the effective strength is inferred. This is conservative by design — the chain is only as strong as its weakest link.
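The weakest-link rule could be sketched as a pairwise comparison folded over the chain. The signature of weakestStrength and the chainStrength helper are assumptions; the actual weakestClassification implementation is not shown in this document.

```typescript
type EvidenceStrength = "deterministic" | "correlated" | "structural" | "inferred";

// Strongest -> weakest; a later index means weaker evidence.
const STRENGTH_ORDER: readonly EvidenceStrength[] = [
  "deterministic",
  "correlated",
  "structural",
  "inferred",
];

// Returns the weaker of two strengths (assumed signature).
function weakestStrength(a: EvidenceStrength, b: EvidenceStrength): EvidenceStrength {
  return STRENGTH_ORDER.indexOf(a) >= STRENGTH_ORDER.indexOf(b) ? a : b;
}

// Hypothetical helper: effective strength of a chain, or undefined for an empty chain.
function chainStrength(strengths: readonly EvidenceStrength[]): EvidenceStrength | undefined {
  if (strengths.length === 0) return undefined;
  return strengths.reduce(weakestStrength);
}
```

Returning undefined for an empty chain is a choice made for this sketch; the document does not say how an empty chain should be handled.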

Claim Type Aggregation

Claim type does not aggregate the same way. An access chain may contain claims of different types — e.g., an execution_observed claim (identity ran a query) alongside a permission_exists claim (identity has a role grant). These are not comparable on a single axis.

For access chain summarization, the effective claim type follows a priority rule — the chain is characterized by the strongest assertion it contains:

  1. execution_observed — chain includes observed activity (strongest: proves usage)
  2. execution_absent — chain includes confirmed non-usage
  3. permission_exists — chain is structural only (permission granted, no execution data)
  4. capability_inferred — chain is entirely inferred (weakest: no direct evidence)

If a chain has both execution_observed and capability_inferred claims, the effective claim type is execution_observed — the chain demonstrably includes real activity, even if some paths within it are inferred.

This is semantically different from strength aggregation: strength takes the weakest (conservative for trust), claim type takes the strongest (most informative for triage).
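The priority rule can be sketched as a scan of the priority list for the first claim type present in the chain. CLAIM_TYPE_PRIORITY and effectiveClaimType are hypothetical names; the ordering is the one listed above.

```typescript
// Highest priority first, per the list above.
const CLAIM_TYPE_PRIORITY = [
  "execution_observed",
  "execution_absent",
  "permission_exists",
  "capability_inferred",
] as const;

type ClaimType = (typeof CLAIM_TYPE_PRIORITY)[number];

// Returns the highest-priority claim type found in the chain,
// or undefined when the chain has no claims.
function effectiveClaimType(claimTypes: readonly ClaimType[]): ClaimType | undefined {
  return CLAIM_TYPE_PRIORITY.find((candidate) => claimTypes.includes(candidate));
}
```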

Decided: The UI shows the effective claim type as the primary label, with an inline count breakdown (e.g., "4 observed · 1 inferred") visible without hover. This keeps cards scannable while answering "how much of this chain is proven?" without requiring the detail view.
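A rough sketch of how the inline breakdown string might be built. The short labels "observed" and "inferred" come from the example above; "absent" and "permitted" are placeholder labels invented for this sketch, as are the function and constant names.

```typescript
type ClaimType =
  | "execution_observed"
  | "execution_absent"
  | "permission_exists"
  | "capability_inferred";

// "absent" and "permitted" are placeholders, not decided copy.
const SHORT_LABELS: Record<ClaimType, string> = {
  execution_observed: "observed",
  execution_absent: "absent",
  permission_exists: "permitted",
  capability_inferred: "inferred",
};

// Display segments in claim-type priority order.
const DISPLAY_ORDER: readonly ClaimType[] = [
  "execution_observed",
  "execution_absent",
  "permission_exists",
  "capability_inferred",
];

// Counts claims per type and joins the non-zero counts with a middle dot.
function claimBreakdown(claimTypes: readonly ClaimType[]): string {
  const counts = new Map<ClaimType, number>();
  for (const t of claimTypes) {
    counts.set(t, (counts.get(t) ?? 0) + 1);
  }
  return DISPLAY_ORDER
    .filter((t) => counts.has(t))
    .map((t) => `${counts.get(t)} ${SHORT_LABELS[t]}`)
    .join(" · ");
}
```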

User-Facing Trust Language

The four evidence strength values need simplified, non-technical labels for the UI. Proposed mapping:

| Evidence Strength | User-Facing Label | Badge Color | Meaning for the User |
| --- | --- | --- | --- |
| deterministic | Confirmed | Green | Direct proof from source systems |
| correlated | Likely | Blue | Correlated across data sources |
| structural | Configured | Amber | Derived from permissions and configuration |
| inferred | Possible | Gray | Inferred from indirect signals |

"Configured" was chosen over "Structural" because it communicates to non-technical users that the evidence comes from how systems are set up (roles, permissions, policies) rather than from observed behavior. "Derived" was rejected as too vague; "Granted" implies a deliberate human act, which doesn't cover inherited or default permissions. Decided: "Configured".

Claim type should be communicated via icon or label, not color:

  • Execution observed → activity/log icon
  • Execution absent → empty/check icon
  • Permission exists → key/lock icon
  • Capability inferred → question/signal icon

This gives users two independent visual channels: color for "how much do I trust this?" and icon for "what kind of claim is this?"
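The two visual channels could be kept independent with two separate lookup tables, as in this sketch. The labels and colors come from the table above; the icon identifiers, interface, and function name are placeholders and do not refer to any particular icon library.

```typescript
type EvidenceStrength = "deterministic" | "correlated" | "structural" | "inferred";
type ClaimType =
  | "execution_observed"
  | "execution_absent"
  | "permission_exists"
  | "capability_inferred";

interface StrengthBadge {
  label: string;
  color: string;
}

// Color channel: driven only by evidence strength.
const STRENGTH_BADGES: Record<EvidenceStrength, StrengthBadge> = {
  deterministic: { label: "Confirmed", color: "green" },
  correlated: { label: "Likely", color: "blue" },
  structural: { label: "Configured", color: "amber" },
  inferred: { label: "Possible", color: "gray" },
};

// Icon channel: driven only by claim type. Icon names are placeholders.
const CLAIM_TYPE_ICONS: Record<ClaimType, string> = {
  execution_observed: "activity",
  execution_absent: "check",
  permission_exists: "key",
  capability_inferred: "question",
};

// Combines both channels into the props a badge component might take.
function badgeFor(claimType: ClaimType, strength: EvidenceStrength) {
  return { ...STRENGTH_BADGES[strength], icon: CLAIM_TYPE_ICONS[claimType] };
}
```

Because neither table references the other axis, changing a color cannot change an icon, and the reverse also holds.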

Impact Assessment

EvidenceClaim Interface

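Add a claim_type field and rename classification to evidence_strength. The derived fields (effective_classification, classification_rank, classification_label) are renamed to match: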

export interface EvidenceClaim {
  claim_statement: string;
  claim_type: ClaimType; // NEW
  evidence_strength: EvidenceStrength; // RENAMED from classification
  runtime_confidence?: EvidenceStrength;
  effective_strength: EvidenceStrength; // RENAMED from effective_classification
  strength_rank: number; // RENAMED from classification_rank
  strength_label: string; // RENAMED from classification_label
  basis: string[];
  business_impact: string;
  recommended_action: string;
  section_confidence: EvidenceCompletenessSection;
}

Effective Strength Logic (replaces effective_classification)

The computeEffectiveClassification function becomes computeEffectiveStrength and operates only on the strength axis. The weakestClassification function becomes weakestStrength. Claim type does not participate in strength computation — it passes through unchanged.
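A possible shape for the renamed function is sketched below. It assumes that effective strength is the weaker of evidence_strength and the optional runtime_confidence; the document does not spell out the existing computeEffectiveClassification logic, so treat that rule, the StrengthInput type, and the signature as assumptions.

```typescript
type ClaimType =
  | "execution_observed"
  | "execution_absent"
  | "permission_exists"
  | "capability_inferred";
type EvidenceStrength = "deterministic" | "correlated" | "structural" | "inferred";

// Strongest -> weakest.
const STRENGTH_ORDER: readonly EvidenceStrength[] = [
  "deterministic",
  "correlated",
  "structural",
  "inferred",
];

function weakestStrength(a: EvidenceStrength, b: EvidenceStrength): EvidenceStrength {
  return STRENGTH_ORDER.indexOf(a) >= STRENGTH_ORDER.indexOf(b) ? a : b;
}

// Hypothetical subset of EvidenceClaim used as input.
interface StrengthInput {
  claim_type: ClaimType;
  evidence_strength: EvidenceStrength;
  runtime_confidence?: EvidenceStrength;
}

// Assumed rule: runtime confidence can only lower the declared strength.
// claim_type is never read here; it passes through unchanged.
function computeEffectiveStrength(claim: StrengthInput): EvidenceStrength {
  return claim.runtime_confidence === undefined
    ? claim.evidence_strength
    : weakestStrength(claim.evidence_strength, claim.runtime_confidence);
}
```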

Evaluator Rules

All 24 evaluator rule files call buildEvidenceClaim with a classification parameter. Each call must be updated to provide both claim_type and evidence_strength. Since every rule currently passes a single classification that deterministically maps to one (claim_type, strength) pair, this is a mechanical change.

UI Badges (PR #242)

The current badge system maps classification to a single color. Under the new model:

  • Badge color maps to evidence_strength (Confirmed=green, Likely=blue, Configured=amber, Possible=gray)

  • A separate icon or label maps to claim_type

Evidence Packs

section_confidence in evidence packs is unaffected. It measures data availability (which connector sections returned data), not claim strength. No changes needed.

API Responses

  • evidence_classification field → evidence_strength

  • New field: evidence_claim_type

  • effective_classification → effective_strength

  • classification_rank → strength_rank

  • classification_label → strength_label

Migration Feasibility

Since each current classification value deterministically maps to exactly one (claim_type, evidence_strength) pair, migration can be done in a single pass:

  1. Compute the new fields from the old field for all stored documents.

  2. No ambiguity, no manual review needed.

  3. Backwards compatibility can be maintained by computing the old field from the new pair during the transition period.
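For the transition period, the old field could be derived with a reverse lookup, sketched here with hypothetical names (PAIR_TO_LEGACY, legacyClassification). Only the five pairs from the mapping table have a legacy value; what to return for any other pair is not decided in this document, so the sketch returns undefined.

```typescript
type EvidenceClassification =
  | "observed_execution"
  | "observed_absence"
  | "correlated_pattern"
  | "structural_authority"
  | "inferred_capability";
type ClaimType =
  | "execution_observed"
  | "execution_absent"
  | "permission_exists"
  | "capability_inferred";
type EvidenceStrength = "deterministic" | "correlated" | "structural" | "inferred";

// Keys are "claim_type:evidence_strength"; entries mirror the mapping table.
const PAIR_TO_LEGACY = new Map<string, EvidenceClassification>([
  ["execution_observed:deterministic", "observed_execution"],
  ["execution_absent:deterministic", "observed_absence"],
  ["execution_observed:correlated", "correlated_pattern"],
  ["permission_exists:structural", "structural_authority"],
  ["capability_inferred:inferred", "inferred_capability"],
]);

// Returns the legacy value, or undefined for pairs the old enum cannot express.
function legacyClassification(
  claimType: ClaimType,
  strength: EvidenceStrength,
): EvidenceClassification | undefined {
  return PAIR_TO_LEGACY.get(`${claimType}:${strength}`);
}
```

Pairs that only exist in the new model, such as execution_absent with correlated strength, have no legacy equivalent, so consumers of the old field would need a fallback while both are served.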

Migration Path

Phase 1: Add New Fields (Backwards Compatible)

  • Add claim_type and evidence_strength fields to EvidenceClaim interface as optional fields

  • Populate new fields alongside existing ones in buildEvidenceClaim

  • API returns both old and new field names

  • No breaking changes for consumers

Phase 2: Update Evaluator Rules

  • Update all 24 evaluator rule files to explicitly provide claim_type and evidence_strength

  • Update buildEvidenceClaim signature to require both new fields

  • Rename computeEffectiveClassification to computeEffectiveStrength

Phase 3: Update UI

  • Update badge rendering to use evidence_strength for color

  • Add claim type icons

  • Update user-facing labels to Confirmed / Likely / Configured / Possible

  • Update any filtering or sorting logic to use new field names

Phase 4: Deprecate Old Fields

  • Mark classification, effective_classification, classification_rank, and classification_label as deprecated in API

  • Remove old fields from EvidenceClaim interface

  • Remove backwards-compatibility mapping

  • Clean up stored documents

Next Action

Status: adopted

Decisions made:

  • Evidence model: Full two-axis separation (claim type + evidence strength). Not a cosmetic rename — the 2D matrix and independent UI channels are required.

  • User-facing labels: Confirmed / Likely / Configured / Possible. "Configured" chosen over "Structural", "Derived", and "Granted".

  • Claim type display: Effective claim type as primary label, with inline count breakdown (e.g., "4 observed · 1 inferred").

  • Timing: Implement before access chain UI work so the new model is the foundation, not a retrofit.

Implementation:

  1. Create GitHub issues in sv0-platform for each migration phase

  2. Phase 1 (backwards-compatible new fields) can start immediately

  3. Phases 2–4 follow sequentially per the migration path above