
Scalability Sizing and Decision Points

Companion to Graph Scalability and Migration Strategy. The position paper describes the architectural staircase qualitatively; this document supplies the numbers.

This document refines the position paper's Step 4 into two compositional tiers (Tier 3a — per-tenant Atlas cluster; Tier 3b — per-tenant worker pool) plus full cells, because the difference matters for both deal-signing and pricing. Tier 3a is independently sellable as a Premium SKU long before Step 4 is on the roadmap.

How to read this document

Read the position paper first for the architectural story; read this document for the per-archetype cutoff numbers, per-tier accounting, and pricing math.

Key vocabulary used throughout — defined in the position paper's "Terms used in this document" section:

  • Staircase — five-step migration plan (Step 0 → Step 1 → Step 2 → Step 3 → Step 4) plus two intermediate isolation tiers (Tier 3a, Tier 3b) and one parallel workstream.
  • Cliffs 1–4 — the four named scaling failure modes (1: full-tenant evaluator read; 2: synchronous evidence aggregation; 3: in-memory job queue; 4: role fan-out write amplification).
  • Archetypes A–E — five customer-size profiles; defined in the §Customer archetypes table below.
  • ADR-020 Phase 0 — the live shared-cluster Atlas footprint with logical control-plane / data-plane separation. Phase 1 splits them into physically separate clusters and is triggered by the first paying EU customer.
  • NHI — non-human identity (service principal, IAM role, OAuth app, automation account, AI agent — every identity that is not a human user). All estate sizes in this document are in NHIs.

Methodology

The infrastructure baseline anchors on the production target — MongoDB Atlas (already live per ADR-020) and Azure VM compute (in flight per ADR-022) — rather than on the Docker Compose stack that runs the dev / PR-preview environment on Hetzner. The dev/preview stack remains a real artifact and is described where relevant, but it is not the reference frame for "what bites at customer X."

Three sources feed every claim:

  1. Code (verified file:line citations). Every cost claim is anchored in sv0-platform. Where prior reviews cited a number, the file was re-read to confirm.
  2. Existing architecture docs. Independent reviews already contain observed numbers for some scenarios; those are surfaced with their citations and labelled measured-by-audit.
  3. Industry-standard ratios. Identity counts, role/permission fan-out, CloudTrail event volumes, and human-to-NHI ratios use published vendor numbers (CyberArk Identity Security Threat Landscape, Astrix State of NHI Security, Wiz State of AWS IAM). Labelled estimated (ratio) with confidence called out per use.

Confidence labels:

  • Verified (code): read directly from production code.
  • Measured-by-audit: prior reviews report the number from running through the code.
  • Estimated (ratio): derived from a public benchmark applied to a customer-profile assumption.
  • Estimated (forward): assumption made for sizing; would need production observation to confirm.

Critical caveat. The platform has not yet run in production at scale. Every "this is what the platform will do at customer X" statement is a forward projection. The numbers are tight enough to make migration-step decisions and to discuss deal viability with investors; they are not tight enough to commit to SLAs in customer contracts.


Infrastructure baseline (production target = Atlas; dev/preview = Docker Compose multi-instance)

Production target (the reference frame for this document)

| Component | Setting | Source |
| --- | --- | --- |
| MongoDB | Atlas M10 in aws:eu-west-1 (Ireland), 2 GB RAM / 2 vCPU / 10–128 GB storage | ADR-020 Phase 0 (live) |
| Compute | Azure VM Standard_B2s (2 vCPU, 4 GB RAM), 2 prod replicas, zone-spread | ADR-022 (in-flight, IaaS-only) |
| App container | Single Node process per VM, ~3 GB usable RAM after OS + sidecars | Estimated (forward): the VM hosts API + UI + cloudflared + image-watcher; ~1 GB of overhead is realistic |
| API + workers | Same Node process | src/index.ts:77 (WorkerRuntime constructed before server.listen at src/index.ts:204); verified |
| Job queue | In-process JS array, FIFO, non-persistent | src/workers/runtime.ts:33 (private readonly queue: WorkerJob[] = []); verified |
| Hop ceiling | MAX_AUTH_CHAIN_DEPTH = 2 (raised from 1; reaches a three-system chain) | src/ingestion/path-materializer.ts:30; verified |
| Storage seam | StorageAdapter TypeScript interface, every method takes tenantId as first arg | src/storage/storage-adapter.ts:226 (interface declaration); verified |
| Materializer write-amplification formula | O(I × R × P × Res) per role-permission change, where I = identities holding the role, R = roles, P = permissions per role, Res = resources reached per permission | derived from materializeExecutionPaths at src/ingestion/path-materializer.ts:76; verified |
| Evaluator full-tenant entity read | queryEntities(tenantId, { limit: 0 }): unbounded .toArray() into memory | src/evaluator/index.ts:51; verified (Cliff 1, in-API-process) |
| Evaluator full-tenant authority-path read | queryAuthorityPaths(tenantId, { status: "active", limit: 0 }): same shape | src/evaluator/index.ts:103; verified |

The production architecture is deliberately stateless on the compute side: Atlas is the durable layer, the Azure VMs hold no per-tenant state. ADR-022 fixes the prod fleet at two B2s VMs (2 vCPU / 4 GB), sized to absorb a single-VM failure without an SLO breach. The compute path can scale vertically (B4ms = 4 vCPU / 16 GB; D4as_v5 = 4 vCPU / 16 GB non-burstable) for ~5–10× headroom before any horizontal split — that vertical headroom is what shifts the cliff arrivals out by 5–10× compared to the dev frame.

No managed-PaaS lock-in. ADR-022 explicitly rejects Container Apps / AKS / ACI in favor of IaaS primitives, on cloud-portability grounds. Sizing assumes the API process is a Node container running on a VM, scaled vertically first and horizontally second.

Dev / PR-preview baseline (NOT a production reference)

| Component | Setting | Source |
| --- | --- | --- |
| Hetzner CPX21 host | 3 vCPU, 4 GB RAM, 80 GB SSD | Provider spec |
| MongoDB cache | 256 MB pinned (wiredTigerCacheSizeGB=0.25) | docker-compose.deploy.yml:23 |
| MongoDB container mem_limit | 512 MB pinned | docker-compose.deploy.yml:35 |
| API container | 512 MB / 1 vCPU pinned | docker-compose.yml:62-63 |
| Concurrency model | Multiple pr-N-dev instances co-resident on one CPX21 host | ADR-022 §1 |

The 256 MB cache pin and 512 MB API container exist because multiple PR-preview instances coexist on one CPX21 host — each open PR runs its own pr-N-dev compose project via Caddy drop-ins. The pins are intentionally small to allow many parallel preview environments per host, not because that is the right size for a customer-facing process. These dev/preview container limits are not a production reference; production sizing in this document anchors on the Atlas + Azure VM baseline above.

Quick win still available on the dev side. The 256 MB cache pin is a known overly aggressive default (sv0-platform#872); raising it to 1 GB on the same CPX21 still leaves room for ~3 PR previews. This is a dev-experience improvement, unrelated to production sizing.


Customer archetypes

Five archetypes, ordered by estate size. Ratios anchored to published vendor benchmarks (CyberArk Identity Security Threat Landscape, Astrix State of NHI Security, Wiz State of AWS IAM). NHI count is the dominant size driver — at archetype D the platform sizes against ~10× workforce headcount.

| Archetype | Profile | Identities (NHI) | Resources | Roles | Edges (rough) |
| --- | --- | --- | --- | --- | --- |
| A: Nimbus-Lab (the demo tenant) | 50-employee fictional SaaS, 1 cloud | ~150 | ~1.2K | ~80 | ~10K |
| B: Series-A SaaS (typical pilot) | 500 employees, 1 cloud + Entra + ServiceNow | ~5K | ~50K | ~400 | ~250K |
| C: Mid-Market Multi-Cloud (first whale) | 2,500 employees, 2-3 clouds | ~25K | ~500K | ~2K | ~3M |
| D: Enterprise Multi-Cloud (Fortune 1000) | 15K employees, 4+ clouds | ~150K | ~5M | ~15K | ~40M |
| E: Regulated Mega (FedRAMP / GDPR-strict / pharma) | 50K employees, all clouds + on-prem | ~600K | ~25M | ~80K | ~250M |

What each tier solves vs doesn't

This is the load-bearing accounting in this report. Step and tier names follow the position paper §Migration Staircase. Different tiers solve different problems and introduce different operational debt.

Step 0 — Promote chain contract; lift hop limit further + surface safety breakers

  • Solves: ordered-chain provenance (the real gap now that depth is 2, not 1); demoability of deeper multi-system chains; operator visibility when the materializer blocks protectively.
  • Does NOT solve: any of Cliffs 1–4. Lifting depth further makes Cliffs 1, 2, 4 arrive sooner.
  • New operational concern: evaluator semantics shift with depth — finding rules were built around shallow chains and need re-validation as depth rises past the current 2-hop bound.

Step 1 — Pipeline stabilization

  • Solves: Cliff 1 (bounded streamed reads), Cliff 2 (async projection), Cliff 3 (durable queue + fair-share lanes).
  • Does NOT solve: Cliff 4 (write amplification on role fan-out is structural to the materialized model). No isolation, no residency, no premium pricing lever.
  • New operational concern: event-log lifecycle (retention, replay, freshness markers); a real durable queue needs its own monitoring.
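To make the Cliff 1 fix concrete, here is a minimal sketch of a bounded read, assuming keyset pagination over a sortable id. The text only says "bounded streamed reads"; keyset paging is one possible technique and is not confirmed as the platform's approach. `EntityRecord`, `PageReader`, `foldInPages`, and the in-memory store are hypothetical names, and the reader is synchronous only so the sketch runs without a database (a real driver cursor would be asynchronous).

```typescript
interface EntityRecord {
  id: number;
  kind: string;
}

// Hypothetical page reader: returns up to `limit` entities with id > afterId,
// ordered by id. A real implementation would issue a bounded database query.
type PageReader = (afterId: number, limit: number) => EntityRecord[];

// Fold over the whole tenant while holding at most one page in memory.
function foldInPages<T>(
  readPage: PageReader,
  pageSize: number,
  initial: T,
  step: (acc: T, entity: EntityRecord) => T
): { result: T; peakResident: number } {
  let acc = initial;
  let afterId = -1;
  let peakResident = 0;
  for (;;) {
    const page = readPage(afterId, pageSize);
    if (page.length === 0) break;
    peakResident = Math.max(peakResident, page.length);
    for (let i = 0; i < page.length; i++) acc = step(acc, page[i]);
    afterId = page[page.length - 1].id; // resume after the last id seen
  }
  return { result: acc, peakResident: peakResident };
}

// In-memory stand-in for one tenant's entity collection (ids 0..9999).
const tenantStore: EntityRecord[] = [];
for (let id = 0; id < 10000; id++) {
  tenantStore.push({ id: id, kind: id % 4 === 0 ? "role" : "identity" });
}
const readFromStore: PageReader = function (afterId, limit) {
  return tenantStore.slice(afterId + 1, afterId + 1 + limit);
};

const roleCount = foldInPages(readFromStore, 500, 0, function (n, e) {
  return e.kind === "role" ? n + 1 : n;
});
console.log(roleCount.result, roleCount.peakResident);
```

The point of the sketch is the memory profile: the fold visits all 10,000 records but never holds more than one page of 500, whereas an unbounded read would hold the full set at once.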

Step 2 — Native graph engine as a read model

  • Solves: the traversal-side of Cliff 4 (graph queries are O(traversal) at query time, no fan-out write storm). Eliminates the hop ceiling for reads.
  • Does NOT solve: Cliff 1, 2, 3. The materializer, evaluator, and evidence pack assembly still consume materialized shapes until refactored.
  • New operational concern: two stores to keep in sync; per-tenant cache warmup; restart-rebuild from event log.

Tier 3a — Per-tenant Atlas cluster, shared API + workers

  • Solves: storage-side isolation per tenant — backup posture, blast radius, residency, contractual physical-isolation language. Removes Cliff 4's cross-tenant contagion (a fan-out storm on tenant D no longer slows tenant C's reads). Bounds Cliff 4 per cluster.
  • Does NOT solve: Cliff 1 (the eval still loads the full tenant entity set into the shared API process; the load is just routed to a different cluster). Cliff 2 (synchronous aggregation still inflates the sync window for that tenant). Cliff 3 (one tenant's sync still occupies a slot in the shared FIFO; other tenants still wait). Cliff 4 within a single whale tenant still write-amplifies inside that tenant's cluster and is now bounded by that cluster's IOPS; see the Atlas connection-pool budget and Atlas IOPS sections below.
  • New operational concern: per-tenant cluster lifecycle — provisioning, backup verification per cluster, version-skew across clusters, connection-pool budgeting (the API process holds N pools instead of one), per-cluster monitoring fan-out, billing reconciliation per tenant.

Tier 3b — Per-tenant worker pool

  • Solves: Cliff 2 (a tenant's heavy aggregation no longer competes with other tenants' sync windows). Cliff 3 (no cross-tenant queue starvation). Partially Cliff 1 (if 3b is implemented as a per-tenant process, the OOM is contained to that tenant's process; if it's just a logical lane in the shared process, Cliff 1 persists).
  • Does NOT solve: Cliff 4. No storage-side isolation, no residency. The API request path is still shared.
  • New operational concern: worker pool autoscaling per tenant; cold-pool startup latency; cost-attribution surface; "noisy tenant" detection now needs to drive pool sizing automatically.

Step 4 — Full cell architecture

  • Solves: all of the above plus FedRAMP-grade physical isolation, regional sovereignty, cell-level blast-radius bound, ability to retire/restore a cell as one unit.
  • Does NOT solve: the single-whale-tenant problem inside one cell — that still needs the parallel intra-tenant partitioning workstream.
  • New operational concern: global routing layer (tenant→cell), cell-provisioning automation, cross-cell schema-migration discipline, cross-cell observability, cross-cell auth principal model.

Parallel — Intra-tenant partitioning

  • Solves: the whale-tenant case (1 tenant > 100K identities, very high churn) regardless of which tier they sit in. Partition by account/region/source; process deltas not full re-projection.
  • Does NOT solve: anything multi-tenant.
  • New operational concern: delta correctness across partition boundaries; partition-aware materialization; partition rebalancing.

The corrected solves-vs-doesn't table

| Cliff / Concern | Step 1 | Step 2 | Tier 3a (per-tenant cluster) | Tier 3b (per-tenant workers) | Step 4 (cells) |
| --- | --- | --- | --- | --- | --- |
| Cliff 1: eval OOM in API process | Fixes (bounded streaming reads) | n/a | Does NOT fix (load still hits shared API process) | Partial (only if pools are separate processes) | Fixes |
| Cliff 2: sync evidence aggregation | Fixes (async projection) | n/a | Does NOT fix | Fixes per tenant | Fixes |
| Cliff 3: queue starvation | Fixes (fair-share lanes) | n/a | Does NOT fix (same FIFO) | Fixes per tenant | Fixes |
| Cliff 4: write amp on role fan-out | n/a | Fixes for traversal reads | Bounds to one cluster's IOPS | n/a | Bounds to one cell |
| Cross-tenant data isolation | n/a | n/a | Storage-level (real) | API-process-level (weaker) | Full physical |
| Storage residency (GDPR) | No | No | Yes (region per cluster) | No | Yes |
| FedRAMP Moderate (SC-4 / SC-7) | No | No | Partial (storage isolation, but shared compute fails strict reading) | Partial | Yes (full physical separation) |
| Premium pricing tier (sellable line item) | No | No | Yes | Maybe | Yes |
| Whale-tenant write throughput | No | Partial (read-side relief) | Yes (dedicated IOPS/RAM; tier-bounded; see §IOPS) | Yes (compute side) | Yes |
| Single-tenant restore granularity | No | No | Yes (per-cluster snapshots) | No | Yes |

The single line that matters most: Tier 3a does not fix any of the three pipeline cliffs. A buyer who pays for "your own database" on top of the shared platform can still suffer a sync stall when another tenant is mid-evaluation. That is exactly why Tier 3b exists as a separate column.


Atlas tier reference (production)

Public May 2026 Atlas pricing, AWS single-region single-AZ. Add ~25–40% real-world overhead for backup, data transfer, PrivateLink, and cross-AZ replica-set fees once any tier moves above Phase 0.

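| Tier | RAM | vCPU | CPU class | Storage range | Base IOPS | Max conns | Monthly (~720h, AWS) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| M0 (shared) | shared | shared | shared | 512 MB | n/a | 500 | $0 |
| M2 (shared) | shared | shared | shared | 2 GB | n/a | 500 | $9 |
| M5 (shared) | shared | shared | shared | 5 GB | n/a | 500 | $25 |
| M10 (dedicated) | 2 GB | 2 | burstable | 10–128 GB | ~3K (gp3) | 500 | ~$58 |
| M20 (dedicated) | 4 GB | 2 | burstable | 20–256 GB | ~3K (gp3) | 2,000 | ~$144 |
| M30 (dedicated) | 8 GB | 2 | dedicated | 40–512 GB | 3,000 | 3,000 | ~$389 |
| M40 (dedicated) | 16 GB | 4 | dedicated | 80 GB–1 TB | 3,000 | 6,000 | ~$749 |
| M50 (dedicated) | 32 GB | 8 | dedicated | 160 GB–4 TB | 7,500 | 16,000 | ~$1,440 |
| M60 (dedicated) | 64 GB | 16 | dedicated | 320 GB–4 TB | 7,500 | 32,000 | ~$2,880 |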

Two columns added relative to the previous version: Base IOPS and Max conns. Both bind real cliffs.

  • IOPS binds Cliff 4 inside a single cluster: write amplification on a fan-out storm is bounded by provisioned IOPS, not by RAM. M30's 3K base IOPS is the soft ceiling for whale-tenant write bursts; M50's 7.5K base is the next step up. Atlas allows IOPS bursting on gp3 volumes within limits, but sustained write storms saturate at the provisioned rate.
  • Max connections binds Tier 3a's pool budget: at the default Node MongoDB driver pool size (maxPoolSize=100), the shared API process can hold at most min(N_clusters × 100, sum(per_cluster_max_conns)) connections. See the pool budget section below.
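The connection bound in the last bullet can be written out directly. This is a minimal sketch of that formula only; `maxConnectionsHeld` and the example fleet are hypothetical, and the per-cluster ceilings are the figures from the tier table above.

```typescript
// Upper bound from the text:
// min(N_clusters × maxPoolSize, sum of per-cluster max connections).
function maxConnectionsHeld(clusterMaxConns: number[], maxPoolSize: number): number {
  const driverSide = clusterMaxConns.length * maxPoolSize;
  let serverSide = 0;
  for (let i = 0; i < clusterMaxConns.length; i++) serverSide += clusterMaxConns[i];
  return Math.min(driverSide, serverSide);
}

// Hypothetical fleet: 25 dedicated M30 clusters at 3,000 connections each.
const m30Fleet: number[] = [];
for (let i = 0; i < 25; i++) m30Fleet.push(3000);

console.log(maxConnectionsHeld(m30Fleet, 100)); // 2500
```

In this example the driver side binds (25 × 100 = 2,500), so the number that matters for the API process is resident sockets rather than any single cluster's ceiling.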

The MongoDB CPU-class boundary (burstable below M30, dedicated at M30+) drove the previous report's M30-as-fallback widening. It still applies: an M10 or M20 cluster handling sustained CPU pressure (e.g. an evaluator cycle scanning 25K entities + materializer recomputing paths) burns CPU credits and degrades; M30 is the first tier that holds steady under sustained load.

Tier mapping per archetype

| Archetype | Right-sized cluster | All-in monthly storage cost (with overhead) | Connection-pool budget impact |
| --- | --- | --- | --- |
| A (demo) | shared M0 / pooled M10 | $0–80 | trivial (pooled tenants share one client pool) |
| B | pooled M10 (≤ 5 tenants/cluster) | ~$80 / tenant when pooled | one pool per cluster; ≤ 5 tenants per pool |
| C | dedicated M30 | ~$500–700 fully loaded | one pool per tenant; see §Pool budget |
| D | dedicated M50 (or M50 + read-replica) | ~$1,800–3,000 fully loaded | one pool per tenant; cap at 50 conns to fit 16K limit comfortably |
| E | dedicated M50+ multi-region + Step 4 | $5K+ fully loaded | one pool per cell-region |

Cliff arrival per archetype

The four cliffs arrive in code-structural order: 1 → 2 → 3 → 4. What changes per archetype is which tier moves which cliff and at what entity count it bites.

Cliff legend (see position paper §Where the cliffs actually are for full descriptions): Cliff 1 = full-tenant eval read into memory; Cliff 2 = sync evidence aggregation in materialization; Cliff 3 = in-memory FIFO queue; Cliff 4 = role fan-out write amplification.

On the production baseline, the API process is a Node container on a 4 GB Azure VM with ~3 GB usable. Cliff 1's working-set ceiling for a single tenant is ~300K–400K entities at this size — large enough that archetypes B and C are comfortable on baseline.

| Archetype | Cliff 1 (eval OOM) | Cliff 2 (sync agg) | Cliff 3 (queue) | Cliff 4 (fan-out) |
| --- | --- | --- | --- | --- |
| A (150 NHI) | Never on production baseline | Never | Only if shared-FIFO contention with multiple co-tenants | Never |
| B (5K NHI) | Comfortable on production baseline (≤ 30K total entities; ~1–2% of 3 GB usable) | Visible after 30 days of execution evidence accrual; tolerable | Bites if ≥3 large tenants sync simultaneously and one dominates | No |
| C (25K NHI) | Comfortable on production single-tenant baseline (~10–15% of 3 GB usable). Becomes risky once 2–3 C-class tenants are co-resident: still well within RAM individually, but concurrent eval cycles compound | Bites within first ~14 days of evidence accrual; sync window grows but still finishes | Bites whenever a co-resident tenant is also large. Shared FIFO problem | Approaching (largest role ~500–800 holders); bounded by M10/M20 IOPS if pooled, comfortable on M30 |
| D (150K NHI) | Bites on a 4 GB VM API at single-tenant scale during peak eval: full-tenant entity + active-path read approaches GB-scale working set. Cannot run safely without Step 1 (bounded reads) | Sync window measured in hours without async projection | Will starve every co-resident tenant | Bites: single Azure built-in role can hold 5K+ identities. M30's 3K IOPS saturates under sustained fan-out; M50's 7.5K IOPS is the floor |
| E (600K NHI) | Cannot onboard without Step 1 + Step 2 + Tier 3a | Cannot onboard without Step 1 + parallel partitioning | Cannot onboard without per-tenant pool (3b) or cell | Critical: needs Step 2 + intra-tenant partitioning + M50 multi-region |

Reading sideways: archetypes A, B, and a single C tenant ride the production baseline (Atlas M10 + Azure B2s, possibly upgraded to M30 for C) without any staircase work. A second C tenant still triggers the shared-FIFO Cliff 3 problem the moment both run heavy aggregation simultaneously, but the OOM no longer dominates the picture. C requires Step 1 fully deployed by the time multiple C-class tenants are co-resident. D requires Step 1 + Step 2 + at least Tier 3a + at least M30→M50 storage. E requires the full staircase plus parallel partitioning.

Cliff 1 is not an archetype-B/C concern on production baseline — it is a multi-tenancy concern (more than one C-class tenant co-resident) or an archetype-D concern (single-tenant scale exceeds VM working set). Step 1 (pipeline stabilization) becomes the gate for "second concurrent C tenant or the first D pilot," not for the first commercial deal.

The read-side dimension the four-cliff table omits

The four cliffs above are all write/infra cliffs — memory, sync window, queue, IOPS, write amplification. There is a separate read-side limit that bites on a different axis, and it bites earlier than the write-side numbers suggest:

  • Read-side queryability cliff. Graph queryability today is seed-anchored: the graph query layer (GET /api/v1/graph/subgraph) requires a seed_id (returns 400 MISSING_SEED_ID without one), there is no predicate/browse query endpoint, and the analyst browse surface loads a capped entity inventory (useEntities({ limit: 200 }) in GraphExplorerPage.tsx) and filters client-side. At Archetype C (~3M edges) and Archetype D (~40M edges) this read model is inadequate for an analyst long before write-side role fan-out (Cliff 4) bites. The fix — a predicate query layer and large-answer aggregation — is ADR-031; this paper does not duplicate it.
  • Cliff 4 has a read face that arrives at C, not D. The table above scores Cliff 4 (role fan-out) as a write-amplification event arriving around D. That is the write face. The same high-holder role also produces a traversal/blast answer too large for a human to use — and that read/render face bites at C (largest role ~500–800 holders), earlier than the write face at D. The platform already handles it bluntly: server-side traversal safety caps (MAX_EDGES_PER_ENTITY_PER_HOP=1000 in subgraph-adapter.ts, subgraph MAX_LIMIT=500 in graph.ts, truncation_reason flags) and client-side aggregation (blast MAX_REACHABLE=200, plus group/supergroup/overflow nodes). Results are capped and aggregated, not returned raw — the gap is that aggregation is client-side and counts are approximate, not server-side with true counts. ADR-031 D4 is the home for that refinement. Provenance for both the read-time cross-system merge and the correlation-blindness audit that motivated this read-side framing: sv0-platform #1292 / #1289 (shipped 2026-05).
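The "capped, with true counts" idea in the last bullet can be sketched as a breadth-first traversal that applies a per-entity, per-hop edge cap and reports how many edges it dropped. This is an illustration under stated assumptions, not the code in subgraph-adapter.ts or graph.ts: `cappedSubgraph`, `AdjacencyList`, and the reason strings "edge_cap" and "node_limit" are hypothetical; only the cap concept and the existence of a truncation flag come from the text.

```typescript
type AdjacencyList = { [entityId: string]: string[] };

interface CappedSubgraph {
  nodes: string[];
  edgesDropped: number;            // exact count of edges skipped by the per-entity cap
  truncationReason: string | null; // hypothetical values: "edge_cap", "node_limit"
}

function neighboursOf(adj: AdjacencyList, id: string): string[] {
  return Object.prototype.hasOwnProperty.call(adj, id) ? adj[id] : [];
}

function cappedSubgraph(
  adj: AdjacencyList,
  seedId: string,
  maxDepth: number,
  maxEdgesPerEntityPerHop: number,
  maxNodes: number
): CappedSubgraph {
  const seen: { [id: string]: boolean } = Object.create(null);
  seen[seedId] = true;
  const nodes: string[] = [seedId];
  let frontier: string[] = [seedId];
  let edgesDropped = 0;
  let truncationReason: string | null = null;

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (let i = 0; i < frontier.length; i++) {
      const neighbours = neighboursOf(adj, frontier[i]);
      const limit = Math.min(neighbours.length, maxEdgesPerEntityPerHop);
      if (neighbours.length > limit) {
        edgesDropped += neighbours.length - limit;
        truncationReason = truncationReason || "edge_cap";
      }
      for (let j = 0; j < limit; j++) {
        const n = neighbours[j];
        if (seen[n]) continue;
        if (nodes.length >= maxNodes) {
          return { nodes: nodes, edgesDropped: edgesDropped, truncationReason: "node_limit" };
        }
        seen[n] = true;
        nodes.push(n);
        next.push(n);
      }
    }
    frontier = next;
  }
  return { nodes: nodes, edgesDropped: edgesDropped, truncationReason: truncationReason };
}

// A role with five holders, traversed two hops with a cap of three edges per entity.
const demoGraph: AdjacencyList = {
  "role:reader": ["id:1", "id:2", "id:3", "id:4", "id:5"],
  "id:1": ["res:a"],
};
const capped = cappedSubgraph(demoGraph, "role:reader", 2, 3, 500);
console.log(capped.nodes.length, capped.edgesDropped, capped.truncationReason);
```

Returning `edgesDropped` from the server is the difference the bullet points at: the caller learns exactly how much was cut instead of approximating it from a capped client-side list.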

Atlas connection-pool budget under Tier 3a

The failure mode here is silent until it bites — worth treating as a first-class operational concern, not a footnote.

The Node MongoDB driver defaults to maxPoolSize=100 per MongoClient instance. In Tier 3a the shared API process holds one client per tenant cluster. Five working scenarios:

| Tier 3a tenant count | Sockets held at default pool (100) | M10 ceiling (500/cluster) | M30 ceiling (3,000/cluster) | M50 ceiling (16,000/cluster) | Process FD pressure |
| --- | --- | --- | --- | --- | --- |
| 5 | 500 | OK (100/500 per cluster) | comfortable | comfortable | trivial |
| 25 | 2,500 | breaches if any cluster maxes | OK | comfortable | moderate (need ulimit ≥ 4096) |
| 50 | 5,000 | breaches | OK if pool kept to ~50/cluster | comfortable | high (need ulimit ≥ 8192) |
| 100 | 10,000 | breaches | breaches at default pool | OK if pool kept to ~50/cluster | very high (exceeds default ulimit) |
| 200 | 20,000 | breaches | breaches | breaches at default pool | not viable on a single API process |

Two recommendations follow:

  1. Cap per-tenant pool at 20–50 connections, not the 100 default. A per-tenant pool of 20 is comfortable for the shared API + workers process: most concurrent operations are short reads and bulk writes that don't need 100 sockets in flight. This pushes the safe Tier 3a ceiling to ~250–500 tenants per shared API process before pool-budget pressure forces a horizontal split.
  2. Use a MongoClient pool factory keyed by tenant, with idle eviction. Tenants that haven't been seen in N minutes drop their pool. This bounds resident socket count to active tenants, not provisioned tenants — a critical optimization once the long tail of B-class tenants on pooled M10s becomes large.
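The second recommendation can be sketched as a small factory that hands out one client per tenant and drops clients that have been idle too long. This is a sketch under assumptions, not the platform's implementation: `TenantClientFactory` and its methods are hypothetical, the client type is left generic so the example runs without a database driver, and the clock is passed in explicitly to keep it testable. With a real MongoClient, closing is asynchronous and eviction would also need to skip clients with operations in flight.

```typescript
interface PoolEntry<C> {
  client: C;
  lastUsedMs: number;
}

class TenantClientFactory<C> {
  // Null-prototype object used as a map keyed by tenant id.
  private entries: { [tenantId: string]: PoolEntry<C> } = Object.create(null);

  constructor(
    private create: (tenantId: string) => C,
    private close: (client: C) => void,
    private idleTtlMs: number
  ) {}

  // Return the tenant's client, creating it on first use, and mark it active.
  get(tenantId: string, nowMs: number): C {
    let entry = this.entries[tenantId];
    if (!entry) {
      entry = { client: this.create(tenantId), lastUsedMs: nowMs };
      this.entries[tenantId] = entry;
    } else {
      entry.lastUsedMs = nowMs;
    }
    return entry.client;
  }

  // Close and drop every client idle for longer than the TTL; returns evicted tenant ids.
  evictIdle(nowMs: number): string[] {
    const evicted: string[] = [];
    const ids = Object.keys(this.entries);
    for (let i = 0; i < ids.length; i++) {
      const entry = this.entries[ids[i]];
      if (nowMs - entry.lastUsedMs > this.idleTtlMs) {
        this.close(entry.client);
        delete this.entries[ids[i]];
        evicted.push(ids[i]);
      }
    }
    return evicted;
  }

  residentCount(): number {
    return Object.keys(this.entries).length;
  }
}

// Usage with string stand-ins for clients and a 1,000 ms idle TTL.
const closedClients: string[] = [];
const clientFactory = new TenantClientFactory<string>(
  function (id) { return "client-" + id; },
  function (c) { closedClients.push(c); },
  1000
);
clientFactory.get("tenant-a", 0);
clientFactory.get("tenant-b", 500);
clientFactory.get("tenant-a", 900); // refreshes tenant-a
const evictedIds = clientFactory.evictIdle(1600);
console.log(evictedIds, clientFactory.residentCount());
```

In the usage above only tenant-b is evicted, because tenant-a was touched 700 ms before the sweep. Resident sockets therefore track active tenants, which is the property the recommendation is after.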

The M10 max-connections ceiling (500) is the binding constraint for pooled tenants on a shared M10: the cluster's 500-conn budget is split across all tenants pooled onto it. For 5 tenants per pooled M10 with a 50-conn pool each, that's 250/500, which is safe. For 10 tenants per pooled M10 it's 500/500, right at the ceiling. The pooled-M10 multi-tenancy ratio therefore caps at ~5–8 tenants per cluster for connection-pool reasons, before storage or compute capacity becomes the binding constraint.
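The pooled-M10 arithmetic can be checked with two small helpers. Both function names are hypothetical, and the 80% target utilisation used to reproduce the upper end of the "~5–8 tenants" range is an illustrative assumption, not a figure from the text. The sketch also assumes a single API process; if each production replica opens its own pools, the per-cluster usage would scale with the replica count.

```typescript
// Connections used on one pooled cluster by a single API process.
function pooledConnectionsUsed(tenants: number, poolPerTenant: number): number {
  return tenants * poolPerTenant;
}

// Largest tenant count that keeps usage at or below a target share of the
// cluster's connection ceiling. An integer percent avoids floating-point drift.
function maxTenantsPerCluster(
  clusterMaxConns: number,
  poolPerTenant: number,
  targetPercent: number
): number {
  return Math.floor((clusterMaxConns * targetPercent) / 100 / poolPerTenant);
}

console.log(pooledConnectionsUsed(5, 50));      // 250 of 500
console.log(pooledConnectionsUsed(10, 50));     // 500 of 500
console.log(maxTenantsPerCluster(500, 50, 50)); // 5
console.log(maxTenantsPerCluster(500, 50, 80)); // 8
```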

Cross-reference to ADR-020. Atlas Phase 0 collapses control-plane and tenant-data-plane onto a single M10 with logical database separation. That is fine until multi-tenancy scales — the 500-conn ceiling applies to the whole cluster, not per-database. Phase 1 cluster split is partly motivated by exactly this: the data plane gets its own connection budget separate from the control plane.


Atlas IOPS and Cliff 4

Cliff 4 (write amplification per role-permission change) is bounded by both compute and Atlas IOPS. The compute bound is the materializer's per-write cost; the IOPS bound is what Atlas can sustain before queueing.

Order-of-magnitude math for a worst-case archetype-D fan-out event — one high-population role gets a permission added, materializer must update every identity that holds that role and every resource the role can now reach:

  • Identities holding role: 5,000 (Azure built-in role; measured-by-audit on enterprise estates).
  • Resources reached via the new permission: 500.
  • Permissions affected: 10.
  • Materialization writes per change: 5,000 × 500 × 10 = 25M writes.
  • Realistic batched bulk-write fan-out (Mongo bulkWrite with batch size 1,000): 25K bulk operations.
  • Per-bulk-op I/O cost: ~5 I/O operations (write + index update).
  • Total I/O demand: ~125K I/O operations over the duration of the burst (25K bulk operations × ~5).

That demand spread over 60 seconds = ~2.1K sustained write IOPS. Spread over 30 seconds = ~4.2K. Spread over 10 seconds (operator-visible) = 12.5K.
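The same order-of-magnitude math, written out so the inputs can be varied. The helper names are hypothetical; the batch size of 1,000 and the ~5 I/O operations per bulk operation are the assumptions stated in the list above.

```typescript
interface FanOutEvent {
  identitiesHoldingRole: number;
  resourcesPerPermission: number;
  permissionsAffected: number;
}

// Assumptions taken from the worked example: 1,000 writes per bulkWrite batch
// and roughly 5 I/O operations per bulk operation (write + index update).
const BATCH_SIZE = 1000;
const IO_PER_BULK_OP = 5;

// Total I/O operations for one fan-out event.
function totalIoOperations(e: FanOutEvent): number {
  const writes = e.identitiesHoldingRole * e.resourcesPerPermission * e.permissionsAffected;
  const bulkOps = Math.ceil(writes / BATCH_SIZE);
  return bulkOps * IO_PER_BULK_OP;
}

// Sustained IOPS if the event is spread evenly over `burstSeconds`.
function sustainedIops(e: FanOutEvent, burstSeconds: number): number {
  return totalIoOperations(e) / burstSeconds;
}

const worstCaseD: FanOutEvent = {
  identitiesHoldingRole: 5000,
  resourcesPerPermission: 500,
  permissionsAffected: 10,
};

const burstWindows = [60, 30, 10];
for (let i = 0; i < burstWindows.length; i++) {
  const iops = Math.round(sustainedIops(worstCaseD, burstWindows[i]));
  console.log(burstWindows[i] + " s burst: ~" + iops + " IOPS");
}
```

Comparing each result against a tier's base IOPS gives a first-pass read on whether the burst fits or queues.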

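| Cluster tier | Base IOPS | Burst over 60 s (~2.1K IOPS) | Burst over 30 s (~4.2K IOPS) | Burst over 10 s (12.5K IOPS) |
| --- | --- | --- | --- | --- |
| M10 (gp3) | ~3K | comfortable | queueing | hard saturation, errors |
| M30 | 3,000 | comfortable | queueing | hard saturation |
| M50 | 7,500 | comfortable | comfortable | queueing tolerable |
| M50 + provisioned IOPS upgrade | up to 64K | comfortable | comfortable | comfortable |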

Implication for sales: an archetype-D tenant with frequent role-permission churn (typical for an enterprise still consolidating IAM) needs M50 from day 1, not M30. M30 will technically run that workload but will saturate IOPS during peak fan-out events and operator dashboards will show write queueing measured in seconds. The marginal cost of M50 over M30 (~$1,050/mo) is an order of magnitude smaller than the contract value of an archetype-D deal — never undersize the cluster to save $1K/mo on a $100K+ ACV.

Implication for Step 2 prioritization: Step 2 (graph engine as read model) eliminates the traversal-side of Cliff 4 but does NOT eliminate the materializer write storm — the materializer still runs to keep the document store consistent until each downstream surface is refactored to consume graph-derived deltas. Step 2 buys time but does not eliminate the IOPS demand until those refactors complete.


Decision-point grid (5 archetypes × 7 tiers — re-derived)

Verdicts: Required / Recommended / Optional / Comfortable / Not applicable.

| Archetype | Step 0 | Step 1 | Step 2 | Tier 3a | Tier 3b | Step 4 | Parallel partitioning |
| --- | --- | --- | --- | --- | --- | --- | --- |
| A (Nimbus demo) | Recommended (demo realism) | Comfortable | Not applicable | Not applicable | Not applicable | Not applicable | Not applicable |
| B (Series-A SaaS) | Recommended | Comfortable | Comfortable | Optional (pricing lever only) | Not applicable | Not applicable | Not applicable |
| C (Mid-Market) | Required | Required by 2nd concurrent C tenant | Recommended | Recommended (residency / premium tier) | Optional | Not applicable | Not applicable |
| D (Enterprise) | Required | Required | Required | Required | Required | Recommended (regulated subset) | Recommended |
| E (Regulated Mega) | Required | Required | Required | Required | Required | Required | Required |

Three things to notice:

  1. Tier 3a's "Required" verdict starts at archetype D, not C. C should be on Tier 3a for storage residency and the premium pricing line, but it can survive on shared storage with logical separation if necessary.
  2. Tier 3b becomes Required at D — once one D-class tenant exists, their evidence-aggregation runs alone are enough to starve everyone else's queue even with Step 1's fair-share lanes. Lanes give fairness; they do not give isolation.
  3. Step 4 (cells) is only Required at E — the regulated mega case. Many founders mistakenly believe "enterprise" means cells; in practice Tier 3a + Tier 3b carry an enterprise tenant well into nine-figure ARR, and full cells become Required only at the regulatory layer above that.
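The "lanes give fairness; they do not give isolation" point in item 2 is easier to see with a toy scheduler. The sketch below assumes a simple round-robin over per-tenant lanes; the text names fair-share lanes but does not specify the scheduling policy, so `FairShareQueue` and its behaviour are hypothetical.

```typescript
interface SyncJob {
  tenantId: string;
  name: string;
}

class FairShareQueue {
  // One FIFO lane per tenant, plus a rotation of tenants that have work waiting.
  private lanes: { [tenantId: string]: SyncJob[] } = Object.create(null);
  private rotation: string[] = [];

  enqueue(job: SyncJob): void {
    if (!this.lanes[job.tenantId]) {
      this.lanes[job.tenantId] = [];
      this.rotation.push(job.tenantId);
    }
    this.lanes[job.tenantId].push(job);
  }

  // Take one job from the tenant at the head of the rotation, then move that
  // tenant to the back if it still has work.
  dequeue(): SyncJob | undefined {
    const tenantId = this.rotation.shift();
    if (tenantId === undefined) return undefined;
    const lane = this.lanes[tenantId];
    const job = lane.shift();
    if (lane.length > 0) {
      this.rotation.push(tenantId);
    } else {
      delete this.lanes[tenantId];
    }
    return job;
  }
}

// A D-class tenant queues three jobs before a B-class tenant queues one.
const fairQueue = new FairShareQueue();
fairQueue.enqueue({ tenantId: "D", name: "d1" });
fairQueue.enqueue({ tenantId: "D", name: "d2" });
fairQueue.enqueue({ tenantId: "D", name: "d3" });
fairQueue.enqueue({ tenantId: "B", name: "b1" });

const served: string[] = [];
let nextJob = fairQueue.dequeue();
while (nextJob !== undefined) {
  served.push(nextJob.name);
  nextJob = fairQueue.dequeue();
}
console.log(served.join(",")); // d1,b1,d2,d3
```

A plain FIFO would serve d1, d2, d3, b1. Lanes move b1 up to second place, but b1 still waits for d1 to finish on the shared worker. If d1 runs for hours, the B tenant waits for hours, which is the gap a per-tenant worker pool closes.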

Cutoff thresholds in business language

Twelve numbered, anchored cutoffs.

  1. You can run any number of demo / archetype-A tenants on the production baseline indefinitely. The Atlas M10 + Azure B2s hold Nimbus-Lab's working set with two orders of magnitude of headroom.

  2. You can comfortably onboard a single archetype-C tenant (25K NHIs) on production baseline (Atlas M10 + Azure B2s) without any staircase work — the working set is ~10–15% of the 4 GB API VM's usable RAM. The real binding constraint on a single C tenant is Atlas IOPS during fan-out events, which pushes the cluster recommendation to M30.

  3. Once a second C-class tenant is co-resident on the same shared API process, Cliff 3 (queue starvation) bites before Cliff 1 (OOM). The fix is Step 1 sub-step 1 (fair-share lanes), not Step 1 sub-step 4 (bounded reads). This is the cheapest single piece of pipeline work that unblocks "second concurrent commercial tenant."

  4. You cannot onboard any archetype-D tenant (150K NHIs) without all four sub-steps of Step 1. This is the platform-wide load-bearing threshold for going from "we can demo at scale" to "we can serve at scale." Below it, shared infrastructure is honest at single-tenant scale; at D it is fiction.

  5. You cannot sell a contract clause that says "our data lives on dedicated storage" without Tier 3a. Per-tenant Atlas cluster delivers it without the cell control plane.

  6. You cannot sell GDPR data residency stronger than "logical separation" without Tier 3a. Tier 3a delivers per-region storage placement at the cluster level; this is the threshold buyers actually accept for residency commitments. Step 4 (full cells) is required for residency plus compute residency, which is a different and rarer ask. Note: ADR-020's Phase 0 today does not deliver residency — it ships when the first paying EU customer triggers the Phase 1 split.

  7. You cannot serve a single tenant whose write throughput exceeds the M30 IOPS ceiling (~3K sustained write IOPS, ~12K burst) without Tier 3a on M50. This is the whale-write case — one D-class tenant pushing hard would saturate a shared M30 and back-pressure every other tenant on that cluster.

  8. You can serve a single archetype-D tenant (150K NHIs) with Step 1 + Tier 3a (M50) + Tier 3b, no Step 4 required. Step 4 is only the gate for the regulatory properties D-class buyers may attach to that contract.

  9. FedRAMP Moderate is partially satisfied by Tier 3a; full satisfaction requires Step 4. FedRAMP (the US federal cloud security authorization framework) at the "Moderate" impact level rests on controls SC-4 ("information in shared resources") and SC-7 ("boundary protection"). These have a strict reading and a permissive reading. The strict reading — common at federal civilian agencies and DoD impact levels — requires both compute and storage isolation, which only Step 4 delivers. The permissive reading — accepted by some commercial regulated buyers — allows shared compute with isolated storage and per-tenant key encryption (Atlas BYOK [customer-managed keys via AWS KMS] is already enabled per ADR-020 Phase 0), which Tier 3a delivers. Default position for sales: do not promise FedRAMP Moderate on Tier 3a alone; do promise "FedRAMP-aligned data handling" with per-tenant Atlas cluster, BYOK + PrivateLink (private VPC connectivity) + Database Auditing + PITR (point-in-time recovery), then design Step 4 when an actual authorization is in flight.

  10. You cannot serve more than ~250 tenants on a single shared API process under Tier 3a without per-tenant pool caps and idle eviction. At the Node MongoDB driver default maxPoolSize=100, 250 tenants × 100 sockets = 25K resident TCP connections per process — exceeds default Linux file-descriptor limits and stresses the kernel. With per-tenant pool capped at 20 and idle eviction, the practical ceiling moves to ~500–1,000 active tenants per process before horizontal split. (Independent of Atlas connection-budget; that ceiling is per-cluster, see §Atlas connection-pool budget.)

  11. Premium isolation tier is sellable at Tier 3a. "Your data on its own dedicated Atlas cluster, in your selected region, with BYOK encryption, PrivateLink, Database Auditing, and PITR" is a CISO-readable contract clause. The price uplift this commands is meaningful (see §Premium pricing tier argument). The line on the order form does not need to wait for Step 4.

  12. You cannot serve archetype E (600K NHIs, regulated) without the full staircase plus parallel partitioning.
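
The pool arithmetic in item 10 can be sketched as a small calculation. This is a minimal illustration, not a sizing tool: the function names and the 20,000-descriptor budget are assumptions chosen for the example, and the real per-process limit depends on how the container's file-descriptor limit is configured. It counts one pool per tenant, as the text does; the driver keeps a pool per cluster member, so the true worst case may be higher.

```python
def worst_case_sockets(tenants: int, max_pool_size: int) -> int:
    """Upper bound on sockets if every tenant's pool fills to its cap."""
    return tenants * max_pool_size


def max_tenants_within(fd_budget: int, max_pool_size: int) -> int:
    """Largest tenant count whose worst-case sockets fit the budget."""
    return fd_budget // max_pool_size


if __name__ == "__main__":
    FD_BUDGET = 20_000  # hypothetical per-process descriptor budget
    for pool in (100, 20):  # driver default vs. the capped value from item 10
        print(pool, worst_case_sockets(250, pool), max_tenants_within(FD_BUDGET, pool))
```

Pools fill lazily, so these are ceilings rather than steady-state counts; idle eviction is what keeps the observed number well below them.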


Connector vs platform bottleneck per growth shape

Connector-side and platform-side bottlenecks have different remediation paths and different owners; conflating them produces wrong roadmap calls.

The connector audit findings (Azure Entra serial RPS, ServiceNow break-on-429, AWS no jitter) are orthogonal to the platform staircase. Fixing them gets faster connectors; it does nothing for the platform pipeline cliffs. Conversely, no amount of pipeline work fixes a connector that has serialized itself behind a single async loop.
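
For readers unfamiliar with the "no jitter" and "break-on-429" findings, one common remediation is capped exponential backoff with full jitter that also honors a server-supplied retry hint. The sketch below is illustrative only: `backoff_delay` and its parameters are hypothetical names, the defaults are arbitrary, and nothing here describes what sv0-connectors currently does or will adopt.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  retry_after_s: Optional[float] = None,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Full jitter: pick uniformly between 0 and the capped exponential
    ceiling, so concurrent clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    delay = rng.uniform(0.0, ceiling)
    if retry_after_s is not None:
        # Never retry sooner than the server asked (e.g. on HTTP 429).
        delay = max(delay, retry_after_s)
    return delay


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, round(backoff_delay(attempt), 2))
```

A connector that retries on 429 with a delay like this, instead of aborting the scan, addresses the connector-side finding without touching the platform pipeline.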

| Growth shape | Dominant bottleneck | Where the time is spent | Fix path |
| --- | --- | --- | --- |
| Many small tenants (lots of B archetype) | Platform queue (Cliff 3): concurrent sync_ingestion jobs | Waiting in shared FIFO | Step 1 (durable queue, fair-share). Tier 3b only needed when one tenant alone monopolizes a pool. |
| One large tenant (single D or E) | Both: connector scan time + platform materialization | Connector scan: ~hours on Entra at 2 RPS effective. Platform materialization: synchronous evidence rollup per path. They serialize. | Connector: per-connector concurrency + jitter. Platform: Step 1 (async aggregation) + Tier 3a M50 (write throughput per cluster) + parallel partitioning. |
| Mixed (typical early commercial) | Cross-tenant queue starvation: one D-tenant's sync blocks every B-tenant's | Shared FIFO occupied by long-running D job for hours | Step 1 sub-step 1 (fair-share lanes) is the cheap mitigation; Tier 3b is the durable fix once contention is constant. |
| Whale-only (single E tenant, no others) | Connector scan + within-tenant fan-out + Atlas IOPS | Connector throughput; within-tenant write amp; eval load | Connector concurrency + Step 1 + Step 2 + parallel intra-tenant partitioning + M50+ provisioned IOPS. Tier 3a alone doesn't help: there is only one tenant. |

Diagnostic heuristic for sales: if the pipeline looks healthy on a single-tenant demo and falls over the moment a real customer turns on Entra + ServiceNow + AWS simultaneously, the bottleneck is almost certainly connector RPS, not the platform. And vice versa: if a single-customer workload is fine but adding a second customer immediately degrades the first, the bottleneck is the shared FIFO (Cliff 3), and Tier 3b, not Tier 3a, is the fix that actually changes the customer's experience.
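
The difference between a shared FIFO and fair-share lanes can be shown with a toy scheduler. This is a sketch under stated assumptions, not the Step 1 design: `FairShareQueue` is a hypothetical name, jobs are plain strings, and the model assumes a tenant's sync is enqueued as several small jobs rather than one multi-hour job.

```python
from collections import OrderedDict, deque


class FairShareQueue:
    """Round-robin across per-tenant lanes instead of one global FIFO."""

    def __init__(self):
        self._lanes = OrderedDict()  # tenant_id -> deque of pending jobs

    def put(self, tenant_id, job):
        self._lanes.setdefault(tenant_id, deque()).append(job)

    def get(self):
        """Return (tenant_id, job) from the lane at the front, or None."""
        if not self._lanes:
            return None
        tenant_id, lane = next(iter(self._lanes.items()))
        job = lane.popleft()
        # Move this tenant to the back so the other lanes are served next.
        del self._lanes[tenant_id]
        if lane:
            self._lanes[tenant_id] = lane
        return tenant_id, job


if __name__ == "__main__":
    q = FairShareQueue()
    for chunk in ("d-1", "d-2", "d-3"):
        q.put("tenant-D", chunk)
    q.put("tenant-B1", "b1-sync")
    q.put("tenant-B2", "b2-sync")
    while (item := q.get()) is not None:
        print(item)
```

In a plain FIFO both B tenants would wait behind all three D jobs; here each is served after the first D job. Lanes only reorder work, so a tenant that saturates the workers on its own still needs the dedicated pool that Tier 3b provides.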


Premium pricing tier argument (refined with Atlas-anchored marginal cost)

Tier 3a is independently sellable. It is the cleanest premium SKU available for at least the next 12–18 months because:

  • The CISO-readable contract clause is one sentence: "Your tenant's data resides on a dedicated MongoDB Atlas cluster in <region of your choice>, with per-cluster BYOK encryption keys (AWS KMS), PrivateLink network isolation, Database Auditing, point-in-time recovery, and an independent restore posture." This sells. (Every property in that sentence is already provisioned in ADR-020 Phase 0 on the shared cluster — Tier 3a is the per-tenant generalization of patterns already in place.)
  • The implementation cost per added tenant is the Atlas cluster monthly fee plus a fraction of a person-day of provisioning — both fully variable, both directly billable.
  • It does not commit to the Step 4 roadmap.

Atlas-anchored marginal cost per archetype

Numbers below include ~30% real-world overhead (backup, cross-AZ replication, PrivateLink hours, data transfer):

| Archetype | Right-sized cluster | Bare Atlas (single-AZ) | All-in monthly (with overhead) | All-in annual | Suggested SKU uplift over shared | Uplift / cost ratio |
| --- | --- | --- | --- | --- | --- | --- |
| A (demo) | shared M0 / pooled M10 | $0–58 | $0–80 | $0–960 | n/a | n/a |
| B | pooled M10 (≤ 5 tenants) | ~$58 / cluster | ~$80 / cluster ÷ 5 = ~$16/tenant | ~$190/tenant | $6–12K/yr ("dedicated cluster, your region") | 30–60× |
| C | dedicated M30 | $389 | ~$500 | ~$6K | $24–36K/yr uplift ("Premium Isolation") | 4–6× |
| D | dedicated M50 (single-region) | $1,440 | ~$1,900 | ~$23K | $60–120K/yr uplift ("Enterprise Isolation") | 3–5× |
| D + provisioned IOPS | M50 + extra IOPS | $1,440 + ~$500 IOPS | ~$2,500 | ~$30K | $60–120K/yr uplift | 2–4× |
| E | M50+ multi-region (3-region replica set) | ~$4,300 | ~$5,500 | ~$66K | bundled into 7-figure ACV | n/a |

The arithmetic that matters: at archetype C, the marginal infrastructure cost of delivering Tier 3a is ~$500/month ($6K/yr), anchored on actual M30 line-item pricing. The CISO-readable contract clause supports a $24–36K/yr SKU uplift, or 4–6× cost. That is a structurally healthy gross margin on a SKU that requires zero net-new product engineering once the storage seam is upgraded to route by tenant, which is a one-time, bounded change.

The same arithmetic at D: marginal infrastructure cost is ~$23K/yr; the Enterprise Isolation uplift sits at $60–120K/yr (3–5×). Less fat than C because the cluster is more expensive, but still healthy. The point at which the ratio compresses to "barely worth it" is E + multi-region — at which point the contract is seven-figure ACV and the uplift is bundled, not line-itemed.
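
The C and D arithmetic can be reproduced in a few lines. This is a minimal sketch: the function names are hypothetical, the 30% overhead and the bare prices are the figures quoted in this section, and nothing here queries live Atlas pricing.

```python
OVERHEAD = 0.30  # the "~30% real-world overhead" quoted for the cost table


def all_in_annual(bare_monthly_usd: float, overhead: float = OVERHEAD) -> float:
    """Bare monthly cluster price -> all-in annual cost with overhead."""
    return bare_monthly_usd * (1.0 + overhead) * 12


def uplift_ratio(uplift_low_usd: float, uplift_high_usd: float,
                 annual_cost_usd: float) -> tuple:
    """(low, high) ratio of annual SKU uplift to annual infrastructure cost."""
    return (uplift_low_usd / annual_cost_usd, uplift_high_usd / annual_cost_usd)


if __name__ == "__main__":
    rows = [("C / M30", 389, 24_000, 36_000),
            ("D / M50", 1_440, 60_000, 120_000)]
    for name, bare, low, high in rows:
        cost = all_in_annual(bare)
        lo, hi = uplift_ratio(low, high, cost)
        print(f"{name}: ~${cost:,.0f}/yr, uplift {lo:.1f}x to {hi:.1f}x cost")
```

The table rounds at each step, so unrounded results differ slightly: the C range lands just under 4–6×, and the D range comes out a little wider than the quoted 3–5×.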

The same arithmetic applied to Step 4 (cells) is much worse — fully provisioned cells cost real engineering time per cell, the routing layer is real work, and the per-cell base cost is dominated by control-plane services, not storage. Tier 3a is the right premium SKU now; Step 4 is the right premium SKU only when a regulated buyer demands it.


Company stage timeline

A single archetype-C tenant on Atlas M30 + Azure B2s is plausible at Series A, not just at Growth — the production baseline can carry a single C tenant without any Step 1 work done. Step 1 becomes the gate for concurrent multi-tenant operation, not for the first C tenant per se.

Seed (now → ~10 paid tenants, archetype mix A/B)

  • Sufficient: baseline + Step 0 (lift hop limit, surface safety breakers).
  • Open work: scope Step 1 design.
  • Storage shape: Phase 0 — single shared Atlas M10 with logical control-plane / data-plane separation per ADR-020.

Series A (~10–40 paid tenants, first archetype-C tenant lands)

  • Comfortable on baseline for the first C tenant: a single C on Atlas M30 + 4 GB API VM is the production-ready shape. Step 1 is not strictly required.
  • Required for the second concurrent C tenant or first D pilot: Step 1 fully deployed.
  • Recommended: Step 2 prototype begins once the first ~1K-holder role appears in production data.
  • Storage shape: Phase 0 → Phase 1 transition triggers per ADR-020. First C tenant likely fires the "first paying EU enterprise customer" trigger and promotes to per-data-plane cluster split + multi-region control plane. Tier 3a Premium SKU sold but not yet provisioned for non-C tenants.

Growth (40–200 paid tenants, first D-class tenant in commercial pilot)

  • Required: Tier 3a ships. First C/D-class buyers land on dedicated clusters (M30 for C, M50 for D). Premium Isolation SKU on the order form. Per-tenant pool caps and idle eviction in place (see §Atlas connection-pool budget).
  • Required (when one D-class tenant constantly starves the shared queue): Tier 3b — that tenant gets its own worker pool. Trigger is operational pain, not headcount.
  • In design: Step 4 (cells), driven by the first regulated commercial conversation; do not build until a real contract is in negotiation.
  • Storage shape: mix of pooled M10 (for B-class), dedicated M30 (for C-class on Premium), dedicated M50 (for D-class). ADR-020 Phase 1+ control-plane / data-plane split fully shipped.

Late Growth (200+ tenants, first regulated commercial deal)

  • Required: Step 4 (cells) for the regulated subset. Tier 3a tenants on a "commercial cell"; regulated tenants on a separate cell.
  • Required (for first whale tenant): Parallel intra-tenant partitioning workstream, regardless of which cell they sit in.
  • Storage shape: fleet of clusters managed by cell. Cell-level routing fronts everything.

The intuition that drives the staircase: Tier 3a lands after Step 1 + Step 2 are operating, sold as Premium, before any cell investment. Tier 3b only lands when fair-share lanes are no longer enough — when one tenant alone produces enough work to monopolize a pool. The first archetype-C deal is signable at Series A on production baseline alone.


Deal types that force re-architecture

| Deal type | What the buyer says | What it forces | Earliest point we can sign honestly |
| --- | --- | --- | --- |
| Standard B-class commercial | "Read-only posture for our 5K NHIs" | Production baseline only (Atlas M10 + Azure B2s) | Seed (today) |
| Single C-class commercial, no isolation clause | "Our 25K NHIs, posture only" | Production baseline + cluster upgrade to M30 | Seed–Series A (M30 upgrade is config, not engineering) |
| Mid-market with residency clause | "Our data must stay in EU" | Tier 3a with EU-region cluster | Series A (3a routing change shipped) |
| Mid-market with isolation clause | "Our data on its own database" | Tier 3a | Series A (3a routing change shipped) |
| Multiple concurrent C-class commercial deals | "Just like the first one, on the same platform" | Step 1 sub-step 1 (fair-share lanes) | Series A (cheapest single piece of pipeline work) |
| Enterprise with throughput SLA | "We need predictable sync time regardless of your other customers" | Tier 3a + Tier 3b | Growth (3b shipped) |
| Enterprise with FedRAMP-aligned ask | "Our data handling must align with FedRAMP Moderate" | Tier 3a + per-tenant key separation (Atlas BYOK already in place) | Growth (with security-team review of contract language; do not promise authorization) |
| Federal / DoD with FedRAMP authorization | "We need Moderate (or higher) authorization" | Step 4 + dedicated cell | Late Growth (Step 4 operating) |
| Whale tenant (>100K NHI) | "Onboard our whole estate" | Step 1 + Step 2 + Tier 3a (M50) + parallel partitioning | Growth (assuming 3a + Step 2 are operating; partitioning is a 1–2 quarter project) |
| Cross-region latency SLA | "EU users need <200ms p95" | Step 4 with regional cell | Late Growth |

The biggest practical implication: the residency, isolation, and high-throughput deals all become signable by Growth stage on Tier 3a (with Tier 3b for the throughput SLA), well before Step 4 is on the roadmap, and the first C-class deal becomes signable at Series A on production baseline alone. The investor pitch is correspondingly more credible: three deal classes are not deferred to a far-future cell architecture but unlocked at the next reasonable milestone, and the entry-level commercial deal is unlocked today.


Caveats and open numbers

  1. Production runtime is mid-cutover. ADR-022 commits to Azure VMs (B2s, 2 vCPU / 4 GB) with two prod replicas, but the migration is in flight. Sizing assumes the API process runs as a Node container on a VM with ~3 GB usable after OS + sidecars. Refine when the cutover is complete and observed RSS (resident set size) data is available.

  2. Tier 3a's connection-pool budget is a real operational cost. With per-tenant pool capped at 20 and idle eviction, ~250–500 active tenants per shared API process is achievable. Without those caps, the default 100-conn pool exhausts process file descriptors at ~50–100 tenants. Cap-and-evict is a one-time engineering change that should ship alongside the Tier 3a routing change.

  3. Atlas IOPS bound is real for Cliff 4. M30's 3K base IOPS saturates under archetype-D fan-out events; M50's 7.5K is the floor. Provisioned IOPS upgrades are available on M50+ up to 64K IOPS — a $300–700/mo line item that is essential, not optional, for whale-write workloads.

  4. FedRAMP cutoff (#9) is a defensible default, not a settled position. Whether SC-4 admits Tier 3a as compliant is buyer-specific and assessor-specific. Treat the "Tier 3a partially satisfies" line as a position to defend in a security review, not as a published guarantee.

  5. Premium pricing uplifts are anchored at 3–6× infrastructure cost. That ratio is conservative for security software. Real uplifts should be benchmarked against comparable vendors (Snyk Premium, Wiz Enterprise, Lacework "Sovereign") rather than pure cost-plus.

  6. Tier 3b's "per tenant process" vs "per tenant logical lane" choice is left open. The table treats them as different (full process gives Cliff 1 isolation; logical lane does not). Pick a position before the first 3b implementation work; the investor-facing answer should be "per-process pools, isolated runtimes."

  7. Atlas pricing assumes AWS single-region single-AZ. GCP and Azure are within ~10%, but multi-region deployments (which any real Step 4 cell will use) are roughly 2.5–3× the single-region figure. Update the Premium Pricing Tier tables before quoting to a buyer who needs multi-region. Note: the prod cluster is currently AWS-Ireland (eu-west-1) per ADR-020 even though compute is Azure — the Atlas-on-Azure pricing sensitivity is a future revisit gated by ADR-022.

  8. The connector-side audit findings (Entra serial RPS, ServiceNow break-on-429, AWS no jitter) are not addressed by any platform tier. They live in sv0-connectors and need their own roadmap. A great platform fed by a slow connector is still slow to the customer.

  9. Role fan-out distribution per customer is genuinely unknown. The "1,200 holders at archetype C" figure could be 5,000 holders for a customer with one bad legacy "Developer" role. Cliff 4 arrives much earlier in that case — and the Atlas IOPS bound bites correspondingly earlier.
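
The cap-and-evict change in caveat 2 can be sketched as a small per-tenant pool cache. This is an illustration under stated assumptions, not the planned implementation: `TenantPoolCache`, `open_pool`, and `close_pool` are hypothetical names, and in a real service `open_pool` would construct a driver client for the tenant's cluster with `maxPoolSize` set to the capped value.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class TenantPoolCache:
    """Keeps one pool per active tenant and closes pools that sit idle."""

    def __init__(self, open_pool: Callable[[str], Any],
                 close_pool: Callable[[Any], None],
                 idle_ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._open_pool = open_pool
        self._close_pool = close_pool
        self._idle_ttl_s = idle_ttl_s
        self._clock = clock
        self._pools: Dict[str, Tuple[Any, float]] = {}  # tenant -> (pool, last_used)

    def get(self, tenant_id: str) -> Any:
        """Return the tenant's pool, opening it on first use."""
        now = self._clock()
        self.evict_idle(now)
        entry = self._pools.get(tenant_id)
        pool = entry[0] if entry is not None else self._open_pool(tenant_id)
        self._pools[tenant_id] = (pool, now)  # refresh last-used time
        return pool

    def evict_idle(self, now: Optional[float] = None) -> List[str]:
        """Close and drop pools unused for longer than idle_ttl_s."""
        now = self._clock() if now is None else now
        stale = [t for t, (_, last) in self._pools.items()
                 if now - last > self._idle_ttl_s]
        for tenant_id in stale:
            pool, _ = self._pools.pop(tenant_id)
            self._close_pool(pool)
        return stale
```

Evicting on access keeps the sketch free of background threads; a production version would more likely evict on a timer and guard the map against concurrent requests.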


References