
Revocation Rehearsal — Deep-Dive (applicability · feasibility · implementation · integration)

Founder TL;DR. The idea survives the deep-dive, and gets bigger. (1) Rehearsing a revocation needs nothing we don't already have — the scanned graph is the rehearsal environment; the engineering lift is a bounded dry-run mode, and there is a way to spike it in 2–3 days without touching the materializer at all. (2) Rehearsing a planned deployment does not require imagining infrastructure or correlating dev with prod: the customer hands us the declared change (a Terraform plan, an agent manifest), we compile it through the same connector transformers we already ship into graph nodes, overlay it on the real production graph we already scan, and re-run the same deterministic engine. The prod graph is the ground truth; only the delta is hypothetical. (3) The MCP-server interaction you described is realistic and on-pattern — but as a front door to a deterministic engine, never as "agentic prediction." AWS is shipping exactly that shape. (4) Your instinct on cross-environment correlation is right: nobody does it, it's convention-dependent, and the overlay approach makes it unnecessary. Ask: fund the M0 spike (below) this sprint.

Working issue: sv0-documentation#379 · umbrella #299 · concept origin: the counterfactual-angles doc (Angle 1, PR #376).


1. The question that unlocks the product: "how can we rehearse if we don't scan anything?"

The question splits into three modes with very different costs. The founder's prior framing (cross-correlate dev/pre-prod/prod environments) is Mode C, the hardest one, and it turns out to be unnecessary for the core value because Mode B substitutes for it.

| Mode | What's rehearsed | What the hypothetical part is | Do we need data we don't scan? |
| --- | --- | --- | --- |
| A: Revoke what exists | "What breaks if we revoke grant G / disable identity I?" | Nothing: we mask edges that are already in the scanned graph | No. The scanned graph is the simulator state. |
| B: Deploy what's declared | "We plan to deploy X: what authority does it gain, what does it expose?" | The delta only: synthetic nodes/edges compiled from a declared artifact (Terraform plan JSON, ARM template, Entra app registration, agent manifest) | No. The delta is overlaid on the real scanned prod graph. We never model the environment; the customer's actual environment, as scanned, is the model. |
| C: Predict prod from pre-prod | "This is how the agent behaves in dev: what happens in prod?" | Cross-account identity correlation (the same workload under different ARNs/subscriptions/tenants) | Yes, structurally. Requires per-customer naming/tagging convention mapping. Market scan: no vendor does this; SPIFFE-style portable workload identity is still aspiration-stage, with env-encoding left entirely to convention. |

The reframe that answers the founder's question: you don't predict prod from dev observation. You take the prod graph you already have (Mode A's asset), and the declared description of the change — which is exactly what a deployment pipeline already produces as a machine-readable artifact — and overlay it (Mode B). The thing that made Mode C feel impossible (correlating proprietary environments across accounts "without real SaaS ingestion") never enters the picture: in Mode B the customer's CI hands us the delta explicitly, no correlation needed.

Mode C is not dead — it degrades gracefully into a convention-config feature (the same shape as our existing endpoint-URI config-bridge stitching: customer declares "role names map dev→prod by this rule"), for opinionated estates only, later. It should not gate anything.
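To make the "convention-config" shape concrete, here is a minimal sketch of a customer-declared mapping rule. The rule format, the `RULES` list and `map_to_prod` are hypothetical illustrations, not the existing config-bridge stitching code.

```python
import re

# Hypothetical customer-declared convention: "<app>-dev-role" maps to "<app>-prod-role".
RULES = [
    (re.compile(r"(?P<app>[a-z0-9-]+)-dev-role"), "{app}-prod-role"),
]


def map_to_prod(dev_name, rules=RULES):
    """Return the prod-side name for a dev role, or None when no rule matches."""
    for pattern, template in rules:
        match = pattern.fullmatch(dev_name)
        if match:
            return template.format(**match.groupdict())
    # No declared convention covers this name, so no correlation is claimed.
    return None
```

Returning `None` instead of guessing keeps the mapping deterministic: correlation exists only where the customer has declared a rule.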

What Mode B requires technically — and why it's cheaper than it sounds

Verified against code (file references in §3):

  1. The graph schema already accepts synthetic nodes. NormalizedGraph / NormalizedNode / NormalizedEdge (src/ingestion/types.ts) carry no assumption that a node came from a live API. A Terraform plan can be expressed in this schema without platform changes.
  2. The connector transformers are reusable on declared artifacts. Both sampled transformers (AWS transformer.py, entra-servicenow transformer.py) operate on extracted Python dicts/dataclasses, not on live API clients. A Terraform plan JSON parsed into the same dataclass shapes (IAMRole, IAMPolicy, LambdaFunction, …) flows through the existing transformer unmodified and emits normalized nodes/edges. The "plan compiler" is a parser, not a new transformer.
  3. What does NOT exist: any ephemeral/preview ingestion. Today everything upserts into the entities collection. The overlay needs a shadow path — materialize over (real graph + overlay) in memory, write nothing. That is the same in-memory machinery Mode A's dry-run needs, so Modes A and B share one engine investment.
  4. One real correctness gap to engineer (flagged honestly): cross-system correlation edges (SAME_ENTITY / BRIDGES_TO) are created by stitching rules at ingest time. Synthetic overlay nodes won't have them unless the stitching rules are also run over (real + overlay) in the shadow path. Without this, a planned deployment that bridges systems (the most interesting case) under-reports its reach. Bounded work — the rules are deterministic functions — but it must be in the plan, not discovered later.
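As a rough illustration of points 1 and 2, the sketch below parses the `resource_changes` array of a Terraform plan JSON (the output of `terraform show -json`) into a dataclass and emits synthetic nodes and edges. The `IAMRole` fields, the node/edge dictionaries and the `CAN_ASSUME` edge type are assumptions made for this example; the real connector dataclasses and the NormalizedGraph schema may look different.

```python
import json
from dataclasses import dataclass


@dataclass
class IAMRole:
    # Hypothetical stand-in for the connector's extracted-role dataclass.
    name: str
    trusted_services: list


def roles_from_plan(plan_json):
    """Collect IAM roles that a Terraform plan would create or update."""
    plan = json.loads(plan_json)
    roles = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if rc.get("type") != "aws_iam_role":
            continue
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        after = change.get("after") or {}
        # The AWS provider stores the trust policy as a JSON-encoded string.
        policy = json.loads(after.get("assume_role_policy") or "{}")
        statements = policy.get("Statement", [])
        if isinstance(statements, dict):
            statements = [statements]
        services = []
        for stmt in statements:
            principal = stmt.get("Principal", {})
            svc = principal.get("Service", []) if isinstance(principal, dict) else []
            services.extend([svc] if isinstance(svc, str) else svc)
        # Fall back to the resource address when the name is not known at plan time.
        roles.append(IAMRole(after.get("name") or rc["address"], services))
    return roles


def to_overlay(roles):
    """Emit synthetic nodes and trust edges; field names are illustrative only."""
    nodes, edges = [], []
    for role in roles:
        role_id = f"overlay:iam-role:{role.name}"
        nodes.append({"id": role_id, "type": "iam_role", "synthetic": True})
        for svc in role.trusted_services:
            edges.append({"from": f"service:{svc}", "to": role_id, "type": "CAN_ASSUME"})
    return nodes, edges
```

The parser only reshapes declared data. In this sketch nothing is written anywhere, which is the property the shadow path in point 3 has to preserve.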

2. Applicability — who asks the question, and when

Four concrete moments where the rehearsal verdict is the artifact someone already needs:

  1. Before executing a Safe-to-Revoke item (Mode A). The verdict — breakage cone / risk closure / redundancy — is the missing line in every recertification decision. Demand is quantified: reviewers approve >95% of access in certification campaigns at <10 seconds per decision, and >75% of orgs admit rubber-stamping, because nobody can assess impact (Clarity, Core Security). This mode also hardens our own Safe-to-Revoke List (#348): every list item ships pre-rehearsed.
  2. Before a credential rotation/decommission (Mode A). The canonical citable incident exists: Cloudflare R2, 2025-03-21 — credentials rotated into the dev environment instead of prod (an omitted --env production flag), old credentials deleted, 100% of R2 writes failed globally for 67 minutes (Cloudflare post-mortem). One incident validates both the rehearsal premise and the env-confusion premise. Oasis Security already markets against this incident — with context pages, not simulation.
  3. Before merging an infrastructure PR (Mode B). terraform plan → predicted authority delta as a PR comment / CI gate. The buyer here is platform engineering, and the consumption pattern is market-proven by Overmind (plan-time blast radius on a live AWS graph, bought via CLI/PR-comment) — but Overmind computes generic resource dependencies and outsources "risk" to an LLM; nobody computes the authority/permission delta, deterministically, cross-system.
  4. Before deploying an AI agent (Mode B, the strategic one). The agent manifest (identities it runs as, scopes requested) compiles to an overlay; the verdict is "this agent, in your environment as it actually is, reaches these resources through these chains — including the ones the vendor's docs don't mention because they only exist in your wiring." This is the agent-governance leg's pre-deployment gate, and it composes with the Agent Authority Manifest angle (same artifact, before vs after deployment).

Who it does not apply to (honest limits): estates where our connector coverage is thin — the breakage cone is only as complete as observed-execution coverage, so every verdict must carry the per-connector blind-spot statement ("no observed use in 90d, and observation for this surface is X-complete"). AWS's own docs concede the failure mode (DR roles: legitimate but unused). This hard-couples Rehearsal to Decision 0 (blind-spot fixes) — which we recommend funding anyway.


3. Feasibility — what the code says (verified 2026-06-09)

| Component | Status | Detail |
| --- | --- | --- |
| Traversal engine | Exists; needs bounded decoupling | materializeExecutionPaths (src/ingestion/path-materializer.ts:124) takes a StorageAdapter. Storage touchpoints are few and enumerable: entity reads (getEntity / getEntitiesByIds, ~7 sites), correlation reads (queryCorrelations, lines ~766–769, fetched per-entity during recursion), entity writes (upsertEntity, 2 sites: execution_paths ~line 205, accessible_by ~line 247). |
| Cheapest spike path | Zero materializer changes | StorageAdapter is an interface ⇒ implement an in-memory adapter over a tenant snapshot (we already have snapshot/restore tooling from #1437), mask the candidate edges in the in-memory copy, run the unmodified materializer against it, diff the outputs. Throwaway for production, perfect for the feasibility spike. |
| Overlay input | Feasible as-is | NormalizedGraph schema accepts synthetic nodes/edges unchanged; connector transformers are dict-based, no API-client coupling (verified on AWS + entra-servicenow transformers). |
| Shadow (no-write) ingestion | Does not exist; must build | No ephemeral/preview mode anywhere; everything upserts. The dry-run materializer and the shadow overlay are the same in-memory build. |
| Stitching on overlay | Gap, bounded | Correlation edges are minted at ingest by deterministic stitching rules; the shadow path must run them over (real + overlay) or cross-system reach is under-reported. |
| Diff layer | New, small | No ExecutionPath-set diff exists (staleness/drift machinery diffs per-entity authority state, not path sets). Diffing two MaterializeResult.computedPathsByEntity maps is O(paths); straightforward. |
| Downstream readers | Caution | Blast-radius endpoint and risk-cluster service read pre-computed storage docs (entity.execution_paths, authority_paths). The rehearsal verdict must render directly from the dry-run result; it must never write through the production collections, or the simulation contaminates the record. |
| Scale envelope | Plausibly interactive; spike must confirm | Real tenants today: ~200 entities, auth-chain depth capped at 2, well under the 5,000-path caps. No timing instrumentation exists anywhere; the spike's primary deliverable is the number. |
| Job runtime | Acceptable for this product | The in-memory, non-durable worker queue (src/workers/runtime.ts:34) disqualifies write-path products (adversarial review §6), but rehearsal is read-only and idempotent: a lost job re-runs with zero consequence. Rehearsal is the one pivot bet the current runtime can carry. |
| Determinism rule | Fully compliant | Masked re-materialization + set diff + deterministic stitching: no ML anywhere, and the counterfactual inherits the evidence grading of the real graph. |

Net feasibility verdict: the original angles doc sized this M with "the dry-run refactor as the bulk." The deep-dive improves that: the spike needs no refactor at all (in-memory StorageAdapter), and the production refactor is a bounded decoupling of ~10 enumerated call sites, not a rewrite. The two genuinely new builds are the shadow-stitching pass and the diff layer — both small, both deterministic.
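The spike shape (in-memory copy, mask the candidate edge, re-run, diff) can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the platform code: `InMemoryGraph` stands in for an in-memory StorageAdapter, `materialize` is a simplified bounded path enumeration rather than materializeExecutionPaths, and the `max_hops` default is arbitrary.

```python
from collections import defaultdict


class InMemoryGraph:
    """Hypothetical in-memory stand-in for a storage adapter over a tenant snapshot."""

    def __init__(self, edges, masked=frozenset()):
        self._out = defaultdict(list)
        for src, dst in edges:
            if (src, dst) not in masked:  # masked edges are simply left out
                self._out[src].append(dst)

    def neighbors(self, node):
        return self._out.get(node, [])


def materialize(graph, starts, max_hops):
    """Enumerate simple paths of up to max_hops edges from each start node."""
    paths = {}
    for start in starts:
        found, stack = set(), [(start,)]
        while stack:
            path = stack.pop()
            if len(path) > 1:
                found.add(path)
            if len(path) - 1 < max_hops:
                for nxt in graph.neighbors(path[-1]):
                    if nxt not in path:  # keep paths simple (no cycles)
                        stack.append(path + (nxt,))
        paths[start] = found
    return paths


def rehearse(edges, starts, masked, max_hops=3):
    """Diff path sets before and after masking; nothing is written anywhere."""
    before = materialize(InMemoryGraph(edges), starts, max_hops)
    after = materialize(InMemoryGraph(edges, masked), starts, max_hops)
    report = {}
    for start in starts:
        lost = before[start] - after[start]
        lost_targets = {p[-1] for p in lost}
        reachable_after = {p[-1] for p in after[start]}
        report[start] = {
            "lost_paths": lost,
            "broken_targets": lost_targets - reachable_after,
            "redundant_targets": lost_targets & reachable_after,
        }
    return report
```

In this toy model a target is "redundant" when it was on a lost path but stays reachable through a parallel path, which is one possible reading of the redundancy verdict. The diff is a plain set difference, so the cost is linear in the number of paths.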


4. Ease of implementation — the milestone ladder

| Milestone | Scope | Size | Proves |
| --- | --- | --- | --- |
| M0: Spike (fund now) | In-memory StorageAdapter over a snapshot tenant (contoso / nimbus-cloud); mask each dormant_authority finding's grant; run unmodified materializer; diff. Output: the sales-deck table ("of N revoke candidates: K have redundant paths (zero-benefit revocations), M have observed traversals in 30d (would break something)") plus the timing number (interactive vs batch). | 2–3 days | The kill criterion (performance) and the demo artifact, simultaneously. |
| M1: Rehearse-a-revocation product | Dry-run materializer path (decouple the enumerated call sites); diff layer; rehearse worker job + API; verdict panel on dormant-authority findings + a section in the evidence pack; blind-spot statement attached to every verdict. | ~3–5 weeks | Mode A shippable; Safe-to-Revoke List (#348) upgrades from "list with proof" to "list with pre-flight verdict." |
| M2: Pre-deployment overlay | Terraform-plan→dataclass parser feeding the existing AWS transformer; shadow (no-write) overlay ingestion incl. stitching pass; "planned-change authority delta" report; CI/PR-comment delivery. | ~4–6 weeks | Mode B shippable for the AWS leg; agent-manifest overlay follows the same rail for the Microsoft/agent leg. |
| M3: MCP front door | An MCP server exposing two tools, rehearse_revocation(target) and rehearse_plan(artifact), that call the deterministic engine and return the verdict with evidence references. No platform MCP code exists today (verified: the one "mcp-server" string in the codebase is a stitching-rule name); this is a thin new service over M1+M2 APIs. | ~1–2 weeks after M2 | The conversational interaction the founder described, on the credible architecture (§5). |

Sequencing note: M0 needs nothing merged; M1 is platform-only; M2 touches sv0-connectors (parser) + platform (shadow path); M3 is additive. None of it touches the read-only connector model — the engine never writes anywhere, including our own production collections.
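The M0 output table (N candidates, K redundant, M with observed traversals) could be tallied roughly as follows. The `Candidate` fields and the reading of "redundant" as "reachable target set unchanged after masking the grant" are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record for one revoke candidate after a rehearsal run.
    grant_id: str
    targets_before: frozenset
    targets_after: frozenset       # reachable targets once the grant is masked
    last_observed: Optional[date]  # last observed traversal of the grant, if any


def summarize(candidates, today, window_days=30):
    """Count the N / K / M figures for the rehearsal table."""
    cutoff = today - timedelta(days=window_days)
    redundant = [c for c in candidates if c.targets_before == c.targets_after]
    observed = [
        c for c in candidates
        if c.last_observed is not None and c.last_observed >= cutoff
    ]
    return {
        "N": len(candidates),
        "K_redundant": len(redundant),
        "M_observed": len(observed),
    }
```

The two counts are computed independently, so a grant can fall into both, one, or neither bucket. How such overlaps should be reported is a product decision this sketch leaves open.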


5. Ease of integration — how the customer actually touches this

The interaction surfaces (each maps to a market-proven consumption pattern)

| Surface | Mode | Who consumes | Precedent |
| --- | --- | --- | --- |
| Verdict panel on the finding / Safe-to-Revoke item + evidence-pack section | A | Security analyst, recertification reviewer | Our own evidence-pack model; the missing line in the >95%-rubber-stamp workflow |
| Ticket enrichment (rides Evidence-Backed Tickets) | A | IAM / app owner in their ITSM | Every vendor's ticket motion; ours carries the verdict, not just the finding |
| CI gate / PR comment on terraform plan | B | Platform engineering | Overmind (proven buyer + delivery), Checkov plan scanning; none compute authority deltas |
| MCP tools (rehearse_revocation, rehearse_plan) | A + B | The customer's own agents/copilots; our demo motion | awslabs IAM + Access Analyzer MCP servers; Wiz MCP server over its security graph |
| Change record attachment (ServiceNow change/CAB) | A + B | Change manager | ServiceNow's own Predictive Intelligence for Change (GA 2025-12) proves the budget; it scores risk with ML on ticket similarity. We attach a deterministic authority-graph verdict to the same change record, via a connector we already ship |

The MCP question, answered directly

The founder asked: "will we talk to an MCP server saying 'we plan to deploy X — imagine the impact', like agentic-level predictions?"

Yes to the interface, no to the imagination. The market has already converged on the credible architecture: deterministic analyzer behind a conversational front-end. AWS exposes IAM and Access Analyzer through MCP servers rather than letting the model predict; Wiz's MCP server answers from its security graph; SOC-adoption research finds analysts refuse to delegate critical decisions to non-deterministic output. Overmind is the counterexample (LLM-predicted impact, real customers) — and the cautionary tale: their risk layer is the part buyers must take on faith. Our version: the LLM/agent layer translates ("we're deploying a payment-reconciliation agent with these scopes" → an overlay artifact) and narrates the verdict; the verdict itself is computed by the masked/overlay re-materialization and is bit-for-bit reproducible. This is also the only version compatible with our no-ML platform rule — the rule turns out to be the differentiator, not the constraint.

Practical demo shape: customer pastes a Terraform plan (or agent manifest) into any MCP-capable assistant connected to our server → rehearse_plan → "this change grants identity X reach to 4 production resources through 2 chains, one of which crosses into ServiceNow via the integration user; here are the evidence references." Nothing in that sentence is generated — it's read off a deterministic diff.
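A transport-agnostic sketch of that front door follows. It deliberately uses no MCP SDK: `handle_tool_call` and `build_verdict` are hypothetical names, and `engine` is an assumed callable standing in for the M1/M2 APIs. The point it illustrates is that the verdict is canonicalized and hashed, so identical inputs yield an identical digest.

```python
import hashlib
import json

TOOLS = ("rehearse_revocation", "rehearse_plan")


def build_verdict(tool, delta):
    """Turn an engine delta into a canonical verdict with a reproducibility digest."""
    verdict = {
        "tool": tool,
        "new_reach": sorted(delta["new_reach"]),
        "evidence_refs": sorted(delta["evidence_refs"]),
    }
    # Canonical JSON (sorted keys, fixed separators) makes the digest stable.
    canonical = json.dumps(verdict, sort_keys=True, separators=(",", ":"))
    verdict["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return verdict


def handle_tool_call(name, arguments, engine):
    """Dispatch a tool call to the deterministic engine; no model output enters the verdict."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return build_verdict(name, engine(name, arguments))
```

An assistant can narrate the returned fields in any wording it likes, while the digest lets a reviewer check that two narrations rest on the same computed verdict.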

The cross-environment idea, ruled

Confirmed: no vendor correlates the same workload across dev/staging/prod accounts to predict prod impact; the industry treats account separation as a blast-radius control, not a correlation opportunity; portable workload identity (SPIFFE) leaves environment encoding to per-estate convention. Deterministic cross-env correlation is possible only as customer-declared convention mapping — which our config-bridge stitching pattern could express — but it is strictly dominated by the Mode B overlay for the actual product question ("what will this change do in prod"), because the overlay uses the real prod graph directly. Recommendation: park Mode C as a convention-config roadmap item; do not let it near the critical path.


6. Collisions, demand, and the honest limits

Collision verdict (market scan, 2026-06-09, sources verified): the whitespace holds. Nobody ships cross-system + observed-execution + redundancy-aware revocation simulation, and the redundancy verdict ("this revocation is pointless, a parallel path remains") appears to be claimed by no one. Two adjacencies to manage:

  • Veza literally markets "preview blast radius before changes" (Access AuthZ, 2025-11). Evidence says it's an entitlement-delta preview on a declared-permission graph — no execution-chain breakage, no redundancy. Action: demo-level diligence before we use blast-radius language in any deck, and position on what theirs isn't: observed-execution breakage + redundancy proof + determinism.
  • Overmind proves plan-time blast radius on a live graph is buildable and bought — AWS+K8s resource dependencies only, no authority semantics, LLM risk layer. Existence proof for Mode B's delivery model; differentiation is the authority delta + determinism.

Also instructive: the industry's two shipping answers to revocation fear are context (Oasis: "decommission without fear" — shows consumers, doesn't simulate) and undo (Sonrai, P0: block + just-in-time restore). Undo-instead-of-foresight is a structural concession that nobody has solved prediction. We'd be selling foresight; undo remains the customer's belt-and-braces, not our competitor.

Honest limits (these go in the product, not just this doc):

  1. Absence-of-use is not proof of safety. The breakage cone covers observed chains; an unobserved-but-real dependency (DR role, annual job) escapes it. Every verdict carries the coverage statement; the blind-spot map (Decision 0) is a hard dependency of honest output.
  2. Overlay fidelity is bounded by the declared artifact. A Terraform plan describes intended infrastructure; runtime behavior (what the deployed thing calls) appears only after deployment, when the normal scan picks it up. Pre-deployment verdicts are authority-reach claims (permission_exists-grade), never execution claims — the existing evidence taxonomy already expresses exactly this distinction.
  3. Performance is still the kill criterion. Nothing in the codebase measures materialization time; today's tenants are small (~200 entities). If masked re-materialization can't answer interactively at 10–100× that, the product degrades to batch reports — still sellable into recertification campaigns, not the wedge. M0 exists to get this number first.
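Limits 1 and 2 can be enforced in the verdict's data shape rather than left to convention. The sketch below is one possible encoding: `Coverage`, `Verdict`, `pre_deployment_verdict` and `render` are hypothetical names, and only the `permission_exists` grade is taken from the existing evidence taxonomy.

```python
from dataclasses import dataclass

# Grade named in the evidence taxonomy; pre-deployment verdicts never exceed it.
PRE_DEPLOYMENT_GRADE = "permission_exists"


@dataclass(frozen=True)
class Coverage:
    surface: str          # connector or surface the statement applies to
    window_days: int      # observation window
    completeness: float   # fraction of the surface that is observed, 0.0 to 1.0


@dataclass(frozen=True)
class Verdict:
    summary: str
    evidence_grade: str
    coverage: tuple       # one Coverage entry per surface touched


def pre_deployment_verdict(summary, coverage):
    """Build an overlay verdict; the grade is fixed, never an execution claim."""
    return Verdict(summary, PRE_DEPLOYMENT_GRADE, tuple(coverage))


def render(verdict):
    """Render a verdict, refusing any verdict that lacks a coverage statement."""
    if not verdict.coverage:
        raise ValueError("verdict has no coverage statement")
    lines = [f"{verdict.summary} [{verdict.evidence_grade}]"]
    for cov in verdict.coverage:
        lines.append(
            f"coverage: {cov.surface}, {cov.window_days}d window, "
            f"{cov.completeness:.0%} complete"
        )
    return "\n".join(lines)
```

Making `render` fail on an empty coverage tuple means a verdict without its blind-spot statement cannot reach a reviewer by accident.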

7. Decision asks

  1. Fund M0 now (2–3 days, one engineer/agent, snapshot tenants, zero merge risk). Output: timing number + the K/M/N rehearsal table on real data — simultaneously the kill-criterion test and the first sales artifact. (Platform spike issue to be opened in sv0-platform on approval.)
  2. Adopt the mode framing (A: revoke-what-exists → B: deploy-what's-declared → C: parked) in the decision brief — it answers "how do we rehearse what we don't scan" in one table, and it tells the Mode C idea's story honestly: not wrong, just dominated.
  3. Schedule Veza diligence (demo of Access AuthZ "blast radius preview") before any external use of blast-radius language.
  4. Sequence note for the brief: Rehearsal M1 slots between the blind-spot fixes (Decision 0, which it depends on for honest verdicts) and the Safe-to-Revoke List pilot (#348, which it upgrades). It is not a competitor to the Governed Remediation Loop — it is the loop's safety case, productized.

Next Action

Status: research-complete

Decision needed from: Sergey + Ivan — fund M0, and confirm Mode B (declared-delta overlay) replaces the cross-environment correlation idea on the board.

GitHub Issue: #379 · umbrella #299