
Access Chain — Identity-Anchored Control Primitive

GitHub Issue: SecurityV0/sv0-platform#216

Trigger: Sergey's March 26 feedback: "We should re-examine whether the current one-row-per-path presentation is the right unit for risk, drift, and remediation."

Reframe (March 31): Founder response elevated the question from grouping/noise-reduction to defining the canonical product unit. The access chain is the thing the customer acts on — not a grouped view over paths.

We show actual access chains based on observed execution, not just assigned permissions.


1. Current Data Model

Path Primary Key (4-tuple)

AuthorityPathDoc._id = SHA256(tenant_id : workload_id : identity_id : destination_id)

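One row per (workload, identity, destination) triple within a tenant; tenant_id is the fourth component of the key. Multiple roles/actions reaching the same destination through the same identity are merged into a single path, and composition_hash tracks changes to the merged roles and actions.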
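
A minimal sketch of how such a key could be derived, assuming a ":" separator and an empty-string placeholder for unbound paths. The helper name buildPathId and both of those encoding choices are hypothetical; the actual ID builder is not shown in this document.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: the ":" separator and the empty-string placeholder
// for unbound paths are assumptions, not the platform's confirmed encoding.
function buildPathId(
  tenantId: string,
  workloadId: string,
  identityId: string | null,
  destinationId: string,
): string {
  const key = [tenantId, workloadId, identityId ?? "", destinationId].join(":");
  return createHash("sha256").update(key).digest("hex");
}

// Roles and actions are not part of the key, so the same 4-tuple always
// maps to the same row; a different destination produces a different row.
const idFirst = buildPathId("t1", "wl-invoice-rule", "id-svc-finance", "res-financial-api");
const idAgain = buildPathId("t1", "wl-invoice-rule", "id-svc-finance", "res-financial-api");
const idOther = buildPathId("t1", "wl-invoice-rule", "id-svc-finance", "res-apar-ledger");
console.log(idFirst === idAgain, idFirst === idOther);
```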

Existing Grouping Primitives

| Primitive | Key | Indexed | Groups by |
| --- | --- | --- | --- |
| path_lineage_id | SHA256(tenant, workload, destination) | Yes | All identities reaching same destination from same workload |
| identity_id | Entity ID | Yes | All paths through one identity (queryable, not aggregated) |
| workload_id | Entity ID | Yes (compound) | All paths from one workload |
| composition_hash | SHA256(identity, roles, actions) | No | Change detection for upserts |

What's Missing

  • No aggregation key for (identity + all destinations) or (identity + all data domains)

  • No API endpoint: "show all destinations reachable by identity X"

  • No UI component for collapsed/expanded identity grouping

  • No grouped drift delta showing role changes per identity across all affected paths


2. Noise Analysis — Demo Data

Path Distribution

| Metric | Count |
| --- | --- |
| Total authority paths | 29 |
| Unique identities (with binding) | 7 |
| Paths without identity (unbound) | 6 |
| Paths with identity | 23 |
| Avg paths per identity | 3.3 (23 paths / 7 identities) |
| Max paths for single identity | 6 (id-svc-ascribe-prod) |
| Shared identity across workloads | 1 (id-svc-foundry: 5 paths across 2 workloads) |

Identity Fan-Out Detail

| Identity | Workload(s) | Paths | Destinations | Roles |
| --- | --- | --- | --- | --- |
| id-svc-finance | wl-invoice-rule | 5 | 3 (financial-api, invoice-archive, apar-ledger) | 5 |
| id-svc-ascribe-prod | wl-ascribe-summarizer | 6 | 5 (clinical-notes, oncology, psych, billing, health-files) | 2 |
| id-svc-foundry | wl-foundry-agent + wl-foundry-provisioner | 5 | 5 (azure-openai, azure-func, sn-incident-api, logic-app, sn-incident-table) | 4 |
| id-svc-hr-sync | wl-hr-sync | 2 | 2 | 2 |
| id-svc-audit | wl-audit-export | 2 | 2 | 1 |
| id-svc-it-ops | wl-it-router | 2 | 2 | 2 |
| id-svc-sec-audit | wl-sec-logger | 1 | 1 | 1 |

Noise Observations

  1. id-svc-finance appears as 5 separate rows. The real story is: "one service account reaches 3 financial systems via 5 roles." That's one remediation conversation, not five.

  2. id-svc-ascribe-prod appears as 6 rows touching clinical data. The risk is the combined reachable surface (5 healthcare destinations), not 6 individual path risks.

  3. id-svc-foundry spans 2 workloads (agent + provisioner) — 5 rows. This is actually the most important case: one identity shared across workloads, meaning compromise of that identity affects both.

  4. Remediation overlap: In the current flat view, the action "review roles assigned to id-svc-finance" would appear identically on 5 separate path rows.

  5. Drift fragmentation: If id-svc-finance gains a new role, scope_drift fires on all 5 paths separately. The analyst sees 5 drift findings when the root cause is one role change on one identity.

  6. Exposure is combinatorial. Risk is not the sum of isolated paths — it is the union of reachable surface. Several individually modest paths can create a materially different class of risk when combined across domains and systems.
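
The drift fragmentation in observation 5 can be illustrated with a short sketch that collapses per-path scope_drift findings into one entry per identity. The PathFinding and IdentityDrift shapes and the function name are assumptions made for illustration, not existing platform types.

```typescript
// Hypothetical shapes for illustration only.
interface PathFinding {
  findingType: string;
  pathId: string;
  identityId: string | null;
}

interface IdentityDrift {
  identityId: string;
  pathIds: string[];
}

// Collapse per-path scope_drift findings into one entry per identity.
// Unbound paths (identityId === null) are skipped in this sketch.
function groupDriftByIdentity(findings: PathFinding[]): IdentityDrift[] {
  const byIdentity = new Map<string, string[]>();
  for (const f of findings) {
    if (f.findingType !== "scope_drift" || f.identityId === null) continue;
    const pathIds = byIdentity.get(f.identityId) ?? [];
    pathIds.push(f.pathId);
    byIdentity.set(f.identityId, pathIds);
  }
  const result: IdentityDrift[] = [];
  byIdentity.forEach((pathIds, identityId) => result.push({ identityId, pathIds }));
  return result;
}

// One role change on id-svc-finance fires scope_drift on all 5 of its paths.
const driftFindings: PathFinding[] = ["p1", "p2", "p3", "p4", "p5"].map((pathId) => ({
  findingType: "scope_drift",
  pathId,
  identityId: "id-svc-finance",
}));
const groupedDrift = groupDriftByIdentity(driftFindings);
console.log(groupedDrift.length, groupedDrift[0].pathIds.length);
```

Five path-level findings become one identity-level entry that still lists the five paths as evidence.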

Projected Scale

At production scale (100+ workloads, 50+ identities), the flat path list could grow to 500-2000 rows. Access chains would reduce this to 50-200 objects, roughly a 10x reduction at either end of the range, and each object is directly actionable.


3. Impact Assessment

3.1 Evaluator Rules

14 entity-level rules: Evaluate on EntityDoc, not path. Reference entity.execution_paths[] for severity/evidence. No impact from access chain introduction.

10 path-level rules: Evaluate on individual AuthorityPathDoc. Finding IDs are keyed by path._id:

stableFindingId(tenantId, findingType, path._id)

| Impact Area | Severity | Detail |
| --- | --- | --- |
| Finding ID stability | HIGH if changed | Switching from path-keyed to identity-keyed finding IDs would orphan all historical findings |
| Path-level evaluation logic | None | Rules check individual path fields (sensitivity, execution_30d, via_roles), which remain valid per-path |
| Drift comparison | None | scope_drift compares baseline vs current via_roles per path, so it works regardless of presentation |

Conclusion: Path-level rules must continue to operate per-path. The access chain is a presentation/query layer — not a change to the evaluation primitive.
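
To make the finding ID stability risk concrete, here is a small sketch. The real stableFindingId implementation is not shown in this document, so sketchFindingId below is a stand-in that simply joins its inputs; only the argument order is taken from the call above.

```typescript
// Stand-in for stableFindingId: a plain joined string, used purely to show
// that the ID depends on the third argument. Not the real implementation.
function sketchFindingId(tenantId: string, findingType: string, subjectId: string): string {
  return [tenantId, findingType, subjectId].join("|");
}

// Historical findings are stored under path-keyed IDs.
const historicalFindingIds = new Set<string>([
  sketchFindingId("t1", "scope_drift", "path-abc"),
]);

const pathKeyedId = sketchFindingId("t1", "scope_drift", "path-abc");
const identityKeyedId = sketchFindingId("t1", "scope_drift", "id-svc-finance");

// The path-keyed ID matches existing history; the identity-keyed ID does not,
// which is how re-keying would orphan historical findings.
console.log(historicalFindingIds.has(pathKeyedId));
console.log(historicalFindingIds.has(identityKeyedId));
```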

3.2 Risk Clusters

Risk cluster service groups paths by finding type combinations. Key metrics:

path_count     = matchingPathIds.length  // Stays as evidence count within access chains
identity_count = unique identity_ids     // Becomes access chain count
workload_count = unique workload_ids     // Stays meaningful as facet

| Impact Area | Severity | Detail |
| --- | --- | --- |
| path_count metric | Low | Stays valid: counts individual paths in cluster, even within access chains |
| identity_count | Changes meaning | Currently a derived count; becomes the primary access chain count |
| Cluster remediation | Medium | Currently deduplicates by workloadName (falling back to workload_id), not by identity. runs_as_name is optionally carried but not the dedup key. Access-chain-level remediation requires semantic change to dedup by identity. |
| Governance checklist | None | Already flags identity reuse: "N identities across M paths" |

Conclusion: Clusters already track identity_count. Making access chains the primary dimension is a natural fit.
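
The change in cluster remediation dedup can be sketched as below. The RemediationCandidate shape, the dedupeBy helper, and the split of paths between the two foundry workloads are illustrative assumptions, not the current service code.

```typescript
// Hypothetical candidate shape for illustration.
interface RemediationCandidate {
  path_id: string;
  workload_id: string;
  identity_id: string;
}

// Keep the first candidate seen for each key.
function dedupeBy(
  items: RemediationCandidate[],
  keyOf: (item: RemediationCandidate) => string,
): RemediationCandidate[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    const key = keyOf(item);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const foundryCandidates: RemediationCandidate[] = [
  { path_id: "p1", workload_id: "wl-foundry-agent", identity_id: "id-svc-foundry" },
  { path_id: "p2", workload_id: "wl-foundry-agent", identity_id: "id-svc-foundry" },
  { path_id: "p3", workload_id: "wl-foundry-provisioner", identity_id: "id-svc-foundry" },
];

// Workload-keyed dedup leaves one action per workload;
// identity-keyed dedup leaves one action for the shared identity.
const byWorkload = dedupeBy(foundryCandidates, (c) => c.workload_id);
const byIdentity = dedupeBy(foundryCandidates, (c) => c.identity_id);
console.log(byWorkload.length, byIdentity.length);
```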

3.3 Evidence Packs

Evidence sections reference entity.execution_paths[] (resource_id level), not path._id. The only path-ID dependency is in PathRemediationAction (remediation guidance).

| Impact Area | Severity | Detail |
| --- | --- | --- |
| authority_snapshot section | None | Lists all execution_paths for entity (identity-agnostic) |
| scope_drift_detail section | None | Maps role changes to affected resources (already identity-aware) |
| blast_radius section | None | Groups by sensitivity/domain (no path IDs) |
| Remediation actions | Low | Include path_id for targeting; could include access_chain_id instead |

Conclusion: Evidence packs are effectively unaffected. They're entity-scoped with resource-level detail.

3.4 API Layer

Current API: GET /api/v1/authority-paths returns flat list with identity_id filter support.

The access chain model needs a dedicated endpoint: GET /api/v1/access-chains. Flat path endpoint remains for backwards compatibility and evaluator-level queries.

3.5 UI

Currently flat tables in both AuthorityPathsListPage and RiskClusterDetailPage. One existing grouping computation:

// RiskClusterDetailPage — PathInlineExpand (lines 282-290)
const sameIdentityPaths = allPaths.filter((p) => p.identity?.id === identityId);
const distinctRoles = new Set(sameIdentityPaths.flatMap((p) => p.via_roles));
// Displays: "Identity total: N roles across M paths"

This is already doing identity-level aggregation — but only as an annotation inside expanded rows, not as a top-level object.


4. Option Analysis

Option A: Access chain as virtual first-class object

Approach: AuthorityPathDoc stays as-is. Introduce AccessChain as a virtual first-class object computed at query time. The access chain IS the product unit — paths are expandable supporting evidence within it.

Canonical axis for v1: identity. This is a product decision, not an open question. For the current wedge, identity is the anchor. Workloads, destinations, and data domains are context inside the access chain object. Unbound paths (where identity_id is null) fall back to workload as the grouping key, clearly labelled "[Unbound]".
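
A minimal sketch of the grouping rule, with identity as the anchor and workload as the fallback for unbound paths. The PathRef shape and the "identity:" / "unbound:" key prefixes are assumptions introduced here to keep the two key spaces apart; the workload names for the unbound paths are made up.

```typescript
// Hypothetical minimal path shape for illustration.
interface PathRef {
  id: string;
  workload_id: string;
  identity_id: string | null;
}

// Identity anchors the chain; unbound paths fall back to their workload.
function chainKey(path: PathRef): string {
  return path.identity_id !== null
    ? `identity:${path.identity_id}`
    : `unbound:${path.workload_id}`;
}

function groupIntoChains(paths: PathRef[]): Map<string, PathRef[]> {
  const chains = new Map<string, PathRef[]>();
  for (const p of paths) {
    const key = chainKey(p);
    const bucket = chains.get(key);
    if (bucket) bucket.push(p);
    else chains.set(key, [p]);
  }
  return chains;
}

const groupedChains = groupIntoChains([
  { id: "p1", workload_id: "wl-foundry-agent", identity_id: "id-svc-foundry" },
  { id: "p2", workload_id: "wl-foundry-provisioner", identity_id: "id-svc-foundry" },
  { id: "p3", workload_id: "wl-example-a", identity_id: null },
  { id: "p4", workload_id: "wl-example-b", identity_id: null },
]);
console.log(groupedChains.size);
```

The shared identity yields one chain spanning both workloads, while each unbound path lands in a chain keyed by its own workload.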

Data model changes:

  • None to AuthorityPathDoc

  • New virtual type: AccessChain (computed at query time)

interface AccessChain {
  identity: { id: string; display_name: string; source_system: string } | null;
  workloads: Array<{ id: string; display_name: string }>;
  destinations: Array<{
    id: string;
    display_name: string;
    data_domain: string;
    sensitivity: string;
  }>;
  combined_roles: string[];         // union of via_roles across all paths
  combined_actions: string[];       // union of actions across all paths
  path_ids: string[];               // underlying path IDs (supporting evidence)
  path_count: number;
  total_execution_30d: number;      // sum across paths
  last_execution_at: string | null; // max across paths
  ownership_status: string;         // worst-case across paths
  max_finding_severity: string | null;
  finding_types: string[];          // union across paths
  active_finding_count: number;     // sum across paths
  behavior_pattern: string;         // see section 4.3
  chain_rank: number;               // see section 4.2
}
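
The aggregation comments in the interface (union, sum, max, worst-case) could be computed at query time roughly as follows. This is a sketch over a reduced, hypothetical PathSummary shape; the ownership ordering is borrowed from the ranking table in section 4.2, and the sample values other than the 847 total and the 2026-03-25 timestamp are invented.

```typescript
// Reduced path shape for illustration; not the full AuthorityPathDoc.
interface PathSummary {
  id: string;
  via_roles: string[];
  execution_30d: number;
  last_execution_at: string | null; // ISO 8601 UTC
  ownership_status: "owned" | "unknown" | "orphaned";
}

// Assumed worst-case order: orphaned > unknown > owned.
const OWNERSHIP_RANK = { owned: 0, unknown: 1, orphaned: 2 };

function summarizeChain(paths: PathSummary[]) {
  const roles = new Set<string>();
  let totalExecution = 0;
  let lastExecution: string | null = null;
  let ownership: PathSummary["ownership_status"] = "owned";
  for (const p of paths) {
    p.via_roles.forEach((r) => roles.add(r));
    totalExecution += p.execution_30d;
    // Same-format ISO 8601 UTC timestamps compare correctly as strings.
    if (p.last_execution_at !== null && (lastExecution === null || p.last_execution_at > lastExecution)) {
      lastExecution = p.last_execution_at;
    }
    if (OWNERSHIP_RANK[p.ownership_status] > OWNERSHIP_RANK[ownership]) {
      ownership = p.ownership_status;
    }
  }
  return {
    combined_roles: Array.from(roles).sort(),
    path_ids: paths.map((p) => p.id),
    path_count: paths.length,
    total_execution_30d: totalExecution,
    last_execution_at: lastExecution,
    ownership_status: ownership,
  };
}

const financeSummary = summarizeChain([
  { id: "p1", via_roles: ["finance-read", "ap-write"], execution_30d: 421, last_execution_at: "2026-03-25T14:30:00Z", ownership_status: "owned" },
  { id: "p2", via_roles: ["invoice-view", "ar-write"], execution_30d: 312, last_execution_at: "2026-03-24T09:00:00Z", ownership_status: "owned" },
  { id: "p3", via_roles: ["ledger-admin"], execution_30d: 114, last_execution_at: null, ownership_status: "owned" },
]);
console.log(financeSummary);
```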

API changes:

  • New endpoint: GET /api/v1/access-chains?workload_id=&identity_id=&...

  • Existing flat endpoint unchanged (backwards compatible)

Evaluator impact: None. Path-level findings continue per-path. Access chains aggregate findings for presentation.

Evidence impact: None. Evidence packs stay entity/path-scoped.

Risk cluster impact: Minimal. Could add access_chain_count alongside path_count in cluster metadata.

4.1 One Remediation Object per Access Chain

Each access chain produces one primary remediation object. This is not visual dedup of path-level actions — it is a semantically distinct remediation primitive.

Example:

Reduce svc-finance from 5 roles to 2, with impact across 3 destinations.

Not 5 path-level actions that happen to collapse visually.

The access-chain remediation object contains:

  • Identity: which service account or principal to act on

  • Current state: N roles reaching M destinations across K workloads

  • Recommended action: reduce to target role set, with justification per removed role

  • Impact scope: which destinations and workloads are affected by the change

  • Supporting evidence: individual path-level findings that drive the recommendation

Path-level PathRemediationAction records continue to exist as evidence. The access-chain remediation object is the unit the customer sees and acts on. The relationship is: one AccessChainRemediation references N PathRemediationAction records and may produce one MitigationActionDoc (see #215).
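
One possible shape for the chain-level remediation object, sketched from the bullet list above. AccessChainRemediation is named in this document but its fields are not defined, so every field name here, the PathRemediationRef shape, and the role-to-destination mapping in the sample data are assumptions.

```typescript
// Hypothetical reduced view of a path-level remediation record.
interface PathRemediationRef {
  path_id: string;
  workload_id: string;
  destination_id: string;
  role: string;
}

// Hypothetical field layout for the chain-level object.
interface AccessChainRemediation {
  identity_id: string;
  current: { roles: number; destinations: number; workloads: number };
  keep_roles: string[];
  remove_roles: string[];
  impacted_destinations: string[];
  impacted_workloads: string[];
  evidence_path_ids: string[]; // path-level records kept as supporting evidence
  summary: string;
}

const uniqueValues = (xs: string[]): string[] => Array.from(new Set(xs));

function buildChainRemediation(
  identityId: string,
  displayName: string,
  actions: PathRemediationRef[],
  targetRoles: string[],
): AccessChainRemediation {
  const roles = uniqueValues(actions.map((a) => a.role));
  const keep = roles.filter((r) => targetRoles.indexOf(r) !== -1);
  const remove = roles.filter((r) => targetRoles.indexOf(r) === -1);
  // Only paths that use a removed role are affected by the change.
  const affected = actions.filter((a) => remove.indexOf(a.role) !== -1);
  const impactedDestinations = uniqueValues(affected.map((a) => a.destination_id));
  return {
    identity_id: identityId,
    current: {
      roles: roles.length,
      destinations: uniqueValues(actions.map((a) => a.destination_id)).length,
      workloads: uniqueValues(actions.map((a) => a.workload_id)).length,
    },
    keep_roles: keep,
    remove_roles: remove,
    impacted_destinations: impactedDestinations,
    impacted_workloads: uniqueValues(affected.map((a) => a.workload_id)),
    evidence_path_ids: actions.map((a) => a.path_id),
    summary:
      `Reduce ${displayName} from ${roles.length} roles to ${keep.length}, ` +
      `with impact across ${impactedDestinations.length} destinations.`,
  };
}

// Illustrative data: which role reaches which destination is assumed.
const financeActions: PathRemediationRef[] = [
  { path_id: "p1", workload_id: "wl-invoice-rule", destination_id: "res-financial-api", role: "finance-read" },
  { path_id: "p2", workload_id: "wl-invoice-rule", destination_id: "res-invoice-archive", role: "invoice-view" },
  { path_id: "p3", workload_id: "wl-invoice-rule", destination_id: "res-financial-api", role: "ap-write" },
  { path_id: "p4", workload_id: "wl-invoice-rule", destination_id: "res-invoice-archive", role: "ar-write" },
  { path_id: "p5", workload_id: "wl-invoice-rule", destination_id: "res-apar-ledger", role: "ledger-admin" },
];
const financeRemediation = buildChainRemediation(
  "id-svc-finance", "svc-finance", financeActions, ["finance-read", "invoice-view"],
);
console.log(financeRemediation.summary);
```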

4.2 Access Chain Ranking Model

max_finding_severity and active_finding_count are not sufficient for prioritization. Access chains are ranked by a composite model driven by these factors:

| Factor | Signal | Weight rationale |
| --- | --- | --- |
| Blast radius | Number of distinct destinations reachable, weighted by data domain diversity | More destinations across more domains = wider damage if compromised |
| Sensitivity | Worst-case sensitivity across all destinations (restricted > confidential > internal > public) | A chain reaching one restricted system outranks one reaching five internal systems |
| Execution intensity | total_execution_30d normalized against peer chains in the same tenant | High-activity chains are confirmed attack surface, not theoretical |
| Drift | Count of scope_drift findings across paths, weighted by recency | Active drift signals loss of control; prioritize chains that are changing now |
| Cross-workload reuse | Number of distinct workloads sharing this identity | Shared identities are force multipliers: compromise spreads across workload boundaries |
| Ownership quality | ownership_status: orphaned > unknown > owned | Orphaned chains have no accountable owner and are hardest to remediate |

The ranking model is deterministic (no ML, no probabilistic scoring). Implementation computes a composite rank per chain based on these factors and sorts the access chain list accordingly.
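
A deterministic composite could look like the sketch below. It assumes equal weights, normalizes each factor against the largest value among peer chains, approximates blast radius as destinations multiplied by domains, ignores drift recency, and breaks ties by chain ID. None of these choices are settled in this document; the ChainFactors shape and the sample chains are hypothetical.

```typescript
// Hypothetical per-chain inputs for illustration.
interface ChainFactors {
  id: string;
  destinations: number;
  domains: number;
  worst_sensitivity: "public" | "internal" | "confidential" | "restricted";
  total_execution_30d: number;
  drift_findings: number;
  workloads: number;
  ownership_status: "owned" | "unknown" | "orphaned";
}

const SENSITIVITY_LEVEL = { public: 0, internal: 1, confidential: 2, restricted: 3 };
const OWNERSHIP_LEVEL = { owned: 0, unknown: 1, orphaned: 2 };

function rankChains(chains: ChainFactors[]): Array<{ id: string; chain_rank: number }> {
  // Largest value among peers, floored at 1 to avoid division by zero.
  const maxOf = (f: (c: ChainFactors) => number) => Math.max(1, ...chains.map(f));
  const blast = (c: ChainFactors) => c.destinations * c.domains;
  const maxBlast = maxOf(blast);
  const maxExec = maxOf((c) => c.total_execution_30d);
  const maxDrift = maxOf((c) => c.drift_findings);
  const maxWorkloads = maxOf((c) => c.workloads);

  // Equal-weight sum of six factors, each scaled to the range 0..1.
  const score = (c: ChainFactors) =>
    blast(c) / maxBlast +
    SENSITIVITY_LEVEL[c.worst_sensitivity] / 3 +
    c.total_execution_30d / maxExec +
    c.drift_findings / maxDrift +
    c.workloads / maxWorkloads +
    OWNERSHIP_LEVEL[c.ownership_status] / 2;

  return chains
    .map((c) => ({ id: c.id, score: score(c) }))
    // Sort by score descending; the ID tie-break keeps the order deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .map((c, i) => ({ id: c.id, chain_rank: i + 1 }));
}

const ranked = rankChains([
  { id: "chain-b", destinations: 3, domains: 1, worst_sensitivity: "internal", total_execution_30d: 100, drift_findings: 0, workloads: 1, ownership_status: "owned" },
  { id: "chain-a", destinations: 5, domains: 2, worst_sensitivity: "restricted", total_execution_30d: 700, drift_findings: 2, workloads: 2, ownership_status: "orphaned" },
]);
console.log(ranked);
```

Because every input is a plain count or an ordinal lookup, the same set of chains always produces the same order.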

4.3 Behavior and Drift Narrative

Aggregating execution_30d and last_execution_at is not enough to tell the story of an access chain over time. Each access chain carries a behavior pattern classification:

  • Newly active: First observed execution within the last 30 days. May indicate a newly provisioned identity or a previously dormant one that has started executing.

  • Steadily active: Consistent execution volume over multiple observation windows. This is the baseline — expected behavior for a legitimate workload.

  • Bursty: Execution spikes followed by quiet periods. May indicate batch jobs, incident response automation, or anomalous usage patterns.

  • Expanding over time: The set of destinations or roles is growing across snapshots. More destinations or more roles than the previous baseline — drift in action.

  • Dormant but dangerous: No recent execution, but the identity retains broad permissions. These are standing privileges that could be exploited without triggering execution-based alerts.

The behavior pattern is derived from the execution history across snapshots (not a single point-in-time metric). It is displayed prominently on the access chain card to give the analyst immediate context about the chain's trajectory.
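
As a starting point for the Phase 5 design pass, the five patterns could be derived from snapshot history along these lines. The ChainSnapshot shape, the precedence order of the checks, the burst ratio of 2, and the treatment of "broad permissions" as "holds at least one role" are all placeholder assumptions.

```typescript
type BehaviorPattern =
  | "newly_active"
  | "steadily_active"
  | "bursty"
  | "expanding_over_time"
  | "dormant_but_dangerous";

// Hypothetical per-window snapshot of one access chain.
interface ChainSnapshot {
  executions: number;   // executions observed in this window
  destinations: number; // distinct destinations reachable
  roles: number;        // distinct roles held
}

// Assumed threshold: a peak more than twice the mean counts as a burst.
const BURST_RATIO = 2;

// Expects snapshots ordered oldest first; the first one is the baseline.
function classifyBehavior(history: ChainSnapshot[]): BehaviorPattern {
  if (history.length === 0) throw new Error("at least one snapshot is required");
  const baseline = history[0];
  const latest = history[history.length - 1];
  const earlier = history.slice(0, -1);

  if (latest.executions === 0 && latest.roles > 0) return "dormant_but_dangerous";
  if (latest.destinations > baseline.destinations || latest.roles > baseline.roles) {
    return "expanding_over_time";
  }
  if (earlier.length > 0 && latest.executions > 0 && earlier.every((s) => s.executions === 0)) {
    return "newly_active";
  }
  const counts = history.map((s) => s.executions);
  const mean = counts.reduce((sum, n) => sum + n, 0) / counts.length;
  const peak = Math.max(...counts);
  if (mean > 0 && peak > BURST_RATIO * mean) return "bursty";
  return "steadily_active";
}

const snap = (executions: number, destinations = 3, roles = 3): ChainSnapshot =>
  ({ executions, destinations, roles });

console.log(classifyBehavior([snap(0), snap(0), snap(40)]));
console.log(classifyBehavior([snap(100), snap(110), snap(105)]));
console.log(classifyBehavior([snap(5), snap(100), snap(5)]));
console.log(classifyBehavior([snap(50, 3, 3), snap(55, 3, 5)]));
console.log(classifyBehavior([snap(50), snap(60), snap(0)]));
```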

Option A Summary

Pros:

  • Zero migration risk — path-level findings, IDs, evidence all unchanged

  • Backwards compatible API

  • Can ship incrementally (API first, then UI)

  • Access chain is the product unit from day one — not a toggle layered on top of flat paths

  • Ranking model and behavior narrative make each chain self-explanatory

Cons:

  • Access chain is computed at query time (no materialized document)

  • Two views coexist, so reports and evidence packs need a declared canonical one: access chains for customer-facing output, paths for evaluator internals

Option B: New primary entity — AccessChain as stored document

Approach: Materialize access chains as a new MongoDB collection alongside paths.

Data model changes:

  • New collection: access_chains

  • New document type: AccessChainDoc (materialized, not virtual)

  • New ID builder: buildAccessChainId(tenantId, identityId)

  • Path documents gain access_chain_id foreign key

API changes:

  • New resource: /api/v1/access-chains with full CRUD-like query support

  • Paths become children: /api/v1/access-chains/:id/paths

  • Findings could be queried at chain level: /api/v1/access-chains/:id/findings

Evaluator impact:

  • Path-level rules continue per-path (finding IDs unchanged)

  • Could add chain-level rules in the future (e.g., "this identity's combined blast radius exceeds threshold")

  • Would need chain refresh after path materialization

Evidence impact:

  • Evidence packs could reference access_chain_id in addition to path_id

  • New evidence section: "access chain snapshot" showing all destinations

Risk cluster impact:

  • Clusters could group by chain instead of path

  • identity_count becomes chain_count (same semantics, clearer naming)

  • Remediation targets by chain instead of path

Pros:

  • Clean, queryable entity with its own lifecycle

  • Enables chain-level findings, drift tracking, and remediation

  • Natural unit for "what can this identity reach?" question

  • Could support future "access review" workflow (approve/reject per access chain)

Cons:

  • New collection, new materialization step, new indexes

  • Chain-to-path sync complexity (what if paths change but chain isn't refreshed?)

  • Finding ID namespace decision: chain-scoped findings would need new IDs

  • More infrastructure to build and maintain

  • Risk of premature abstraction if the grouping key changes


5. Recommendation

Option A — access chain as virtual first-class object — is the right first step.

This is not a grouping toggle. It is an access-chain-first product model implemented via a virtual computation layer.

Rationale:

  1. Lower risk. No migration, no new collection, no finding ID changes. Ship it and validate with real users before committing to a materialized entity.

  2. Identity is the canonical axis for v1. This is a product decision per founder direction — not an open question. Workloads, destinations, and data domains are facets inside the access chain.

  3. Demo data confirms the value. Even with 29 paths, access chains reduce the list to 13 objects (7 identity-anchored chains plus 6 unbound paths, assuming each unbound path sits on its own workload), each directly actionable. The remediation deduplication alone justifies the model.

  4. Infrastructure exists. The cluster detail page already computes identity totals. The API already supports identity_id filtering. The leap to an access chain endpoint is small.

  5. Path to Option B is clear. If access chains prove to be the right permanent model, materializing AccessChainDoc from the virtual logic is straightforward. Option A is a reversible bet; Option B is not.

Proposed Sequence

| Phase | Scope | Effort |
| --- | --- | --- |
| Phase 1 | API: GET /api/v1/access-chains returns AccessChain[] with identity as canonical axis | 2-3 days |
| Phase 2 | UI: Access-chain-first list page (replaces flat path list as default view) | 2-3 days |
| Phase 3 | UI: access-chain default on RiskClusterDetailPage, cluster navigation by chain | 1-2 days |
| Phase 4 | Remediation: one remediation object per access chain (deduplicated by identity). Must define how chain-level actions relate to path-level PathRemediationAction and MitigationActionDoc (#215). | 3-4 days |
| Phase 5 | Ranking + behavior: access chain ranking model and behavior pattern classification. Must define derivation rules for composite rank and how behavior patterns are computed from snapshot history. | 3-4 days |

Phase 1-3 are implementation-ready (~1 week). Phase 4-5 need one more design pass before implementation — they introduce access-chain-level remediation, ranking, and behavior as new product concepts. Total: ~2 weeks of focused work, with a design checkpoint between Phase 3 and Phase 4.

API Contract Sketch

GET /api/v1/access-chains?status=active

Response:
{
  "data": [
    {
      "identity": { "id": "id-svc-finance", "display_name": "svc-finance", "source_system": "entra_id" },
      "workloads": [{ "id": "wl-invoice-rule", "display_name": "Invoice Rule Engine" }],
      "destinations": [
        { "id": "res-financial-api", "display_name": "Financial API", "data_domain": "Financial", "sensitivity": "restricted" },
        { "id": "res-invoice-archive", "display_name": "Invoice Archive", "data_domain": "Financial", "sensitivity": "confidential" },
        { "id": "res-apar-ledger", "display_name": "AP/AR Ledger", "data_domain": "Financial", "sensitivity": "restricted" }
      ],
      "combined_roles": ["sn-role-finance-read", "sn-role-invoice-view", "sn-role-ap-write", "sn-role-ar-write", "sn-role-ledger-admin"],
      "path_count": 5,
      "path_ids": ["abc123", "def456", "..."],
      "total_execution_30d": 847,
      "last_execution_at": "2026-03-25T14:30:00Z",
      "ownership_status": "owned",
      "max_finding_severity": "high",
      "finding_types": ["scope_drift", "reachable_sensitive_domain"],
      "active_finding_count": 10,
      "behavior_pattern": "expanding_over_time",
      "chain_rank": 1
    }
  ],
  "meta": {
    "total_chains": 13,
    "total_paths": 29
  }
}

UX Model — Access-Chain-First

The product UX centers on the access chain, not on a table of paths with a grouping toggle. The mental model for every screen is:

  1. One identity — who is this?
  2. Actual access chain — what can it reach, via which roles?
  3. What it reaches — destinations, data domains, sensitivity levels
  4. Why it matters — ranking factors, behavior pattern, drift narrative
  5. What to do next — one remediation action per chain

Access Chain List Page

The primary view is a ranked list of access chain cards. Each card is a self-contained summary:

┌─────────────────────────────────────────────────────────────────┐
│ #1 svc-finance Owned │
│ Invoice Rule Engine │
│ │
│ 5 roles → 3 destinations (Financial) │
│ 847 exec/30d · Expanding over time │
│ Findings: scope_drift, reachable_sensitive_domain │
│ │
│ Action: Reduce from 5 roles to 2, │
│ impact across 3 destinations │
├─────────────────────────────────────────────────────────────────┤
│ #2 svc-ascribe-prod Orphaned │
│ Ascribe Summarizer │
│ │
│ 2 roles → 5 destinations (Healthcare) │
│ 312 exec/30d · Steadily active │
│ Findings: reachable_sensitive_domain │
│ │
│ Action: Assign owner, review clinical data access │
├─────────────────────────────────────────────────────────────────┤
│ #3 svc-foundry Orphaned │
│ Foundry Agent + Foundry Provisioner │
│ !! Shared across 2 workloads │
│ │
│ 4 roles → 5 destinations (Infrastructure, IT) │
│ 715 exec/30d · Bursty │
│ Findings: scope_drift, cross_workload_identity │
│ │
│ Action: Split into per-workload identities │
└─────────────────────────────────────────────────────────────────┘

Access Chain Detail Page

Clicking into a chain shows the full detail:

┌─────────────────────────────────────────────────────────────────┐
│ svc-finance · Invoice Rule Engine Owned │
│ │
│ CHAIN SUMMARY │
│ 5 roles → 3 destinations │ 847 exec/30d │ Expanding over time │
│ Rank: #1 of 13 chains │
│ │
│ WHY IT MATTERS │
│ · Blast radius: 3 financial systems across 1 data domain │
│ · Sensitivity: restricted (Financial API, AP/AR Ledger) │
│ · Drift: scope expanded — 2 new roles since baseline │
│ · Execution: 847 calls/30d, trending up │
│ │
│ WHAT TO DO NEXT │
│ Reduce svc-finance from 5 roles to 2, with impact across │
│ 3 destinations. Remove: ap-write, ar-write, ledger-admin. │
│ Keep: finance-read, invoice-view. │
│ │
│ DESTINATIONS (expandable supporting evidence) │
│ ▸ Financial API restricted │ 421 exec │ scope_drift │
│ ▸ Invoice Archive confidential│ 312 exec │ scope_drift │
│ ▸ AP/AR Ledger restricted │ 114 exec │ scope_drift │
│ │
│ ROLES │
│ finance-read, invoice-view, ap-write, ar-write, ledger-admin │
└─────────────────────────────────────────────────────────────────┘

Expanding a destination row shows the underlying path-level detail: specific roles, actions, execution counts, and individual findings. This is the supporting evidence layer — not the primary frame.


6. Risks and Open Questions

  1. Unbound paths. Paths with identity_id = null don't form access chains naturally. Proposed: group by workload for unbound paths, clearly labelled "[Unbound]".

  2. Cross-workload identity sharing. id-svc-foundry spans 2 workloads. The access chain should surface this prominently — it's a higher-risk pattern and a force multiplier.

  3. Pagination. Access chain response may not fit cursor-based pagination cleanly (chains have variable path counts). May need offset pagination or full-result with client-side virtualization.

  4. Reports and exports. If reports currently list flat paths, they'll need an access-chain variant. Defer until after Phase 3.

  5. Grouped drift aggregation. When drift is shown at the access chain level, how are severity, finding counts, and finding types derived? Rules: (a) max_finding_severity = worst across paths, (b) active_finding_count = sum across paths (with dedup against entity-level findings to avoid inflation), (c) finding_types = union across paths.

  6. Behavior pattern derivation. Behavior classification requires access to multiple snapshots. Phase 5 must define which snapshot fields are compared and the threshold logic for each pattern category.

  7. Ranking model calibration. The composite rank needs weight calibration against real customer environments. Start with equal weights, adjust based on feedback.
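
The roll-up rules in item 5 can be sketched as follows. The severity scale and the choice to deduplicate by finding ID (so an entity-level finding that surfaces on several paths is counted once) are assumptions; the document only specifies worst, sum with dedup, and union.

```typescript
// Assumed severity scale; only "high" appears elsewhere in this document.
type Severity = "low" | "medium" | "high" | "critical";
const SEVERITY_ORDER: Severity[] = ["low", "medium", "high", "critical"];

// Hypothetical finding shape for illustration.
interface ActiveFinding {
  id: string;
  type: string;
  severity: Severity;
}

function rollUpFindings(findingsPerPath: ActiveFinding[][]) {
  // (b) Dedup by finding ID so a shared entity-level finding counts once.
  const uniqueFindings = new Map<string, ActiveFinding>();
  for (const findings of findingsPerPath) {
    for (const f of findings) uniqueFindings.set(f.id, f);
  }
  const all = Array.from(uniqueFindings.values());

  // (a) Worst severity across paths, or null when there are no findings.
  let worst: Severity | null = null;
  for (const f of all) {
    if (worst === null || SEVERITY_ORDER.indexOf(f.severity) > SEVERITY_ORDER.indexOf(worst)) {
      worst = f.severity;
    }
  }

  return {
    max_finding_severity: worst,
    active_finding_count: all.length,
    // (c) Union of finding types across paths.
    finding_types: Array.from(new Set(all.map((f) => f.type))).sort(),
  };
}

const entityLevel: ActiveFinding = { id: "f-entity-1", type: "reachable_sensitive_domain", severity: "medium" };
const rollUp = rollUpFindings([
  [{ id: "f-path-1", type: "scope_drift", severity: "high" }, entityLevel],
  [{ id: "f-path-2", type: "scope_drift", severity: "low" }, entityLevel],
]);
console.log(rollUp);
```

Four raw findings across two paths roll up to three, because the shared entity-level finding is counted once.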


Next Action

Status: research-complete

Decision needed from: Sergey (product direction), Ivan (engineering scope)

Options:

  1. Adopt Phase 1-3 — API access chain endpoint + UI access-chain-first pages. ~1 week. Validates the model with minimal risk. Implementation-ready now.

  2. Adopt Phase 1-5 — Full sequence including remediation, ranking, and behavior. ~2 weeks. Phase 4-5 require an additional design pass — they introduce access-chain remediation and ranking as new product concepts.

  3. Defer — Wait for more connector data to validate access chain model at scale.
