ADR-020: Multi-Region MongoDB Strategy via Per-Region Clusters
Status
Accepted (2026-05-03; Phase 0 carve-out added 2026-05-04; original draft 2026-05-02; verified against Atlas docs 2026-05-03)
The 2026-05-02 draft was reverted to Proposed after an adversarial architectural review surfaced four claims that needed verification or restructuring. The 2026-05-03 amendment added the control-plane / tenant-data-plane split, regional workers from Phase 2, region-tagged connector API keys, an explicit SLA, and the full residency control set. The 2026-05-03 verification pass against current MongoDB Atlas docs (see ADR-020 Atlas Verification) resolved three of the five open questions with citations and surfaced one substantive correction: M10 and M20 are both burstable CPU per Atlas Cluster Tiers; the real "dedicated CPU" fallback is M30. The cost band in §3 was widened accordingly. The strategic decision (per-region clusters over Global Cluster) is unchanged and survives even the M30-fallback worst case (~3× cheaper than a 2-zone Global Cluster).
The 2026-05-04 amendment adds Phase 0 (§0) — a pre-revenue carve-out that collapses the control plane and EU tenant data plane onto a single M10 single-region cluster with logical database separation. This defers the physical control-plane / data-plane split and the EU multi-region replica set until specific commercial triggers fire (first paying EU customer, first US client, or procurement requirement). Phase 1+ architecture below is unchanged — Phase 0 is a sequencing decision, not a reversal. Cost falls from ~$190/mo to ~$95/mo (prod + paused-off-hours staging), and the $500 startup credit now covers ~5 months instead of ~2.5.
Phase 0 carve-out (pre-revenue, single-cluster baseline)
The architecture in §1–§9 below describes the target state from Phase 1 onward. Phase 0 is the simpler shape we run today through the first commercial trigger, justified by the fact that the first onboarding client is an unpaid demo and no procurement officer is currently asking residency or DR questions.
Phase 0 shape
- One M10 single-region Atlas cluster in aws:eu-west-1 (Ireland). Holds both planes, separated logically by database name:
  - control_plane database: tenants, users, memberships (per §1)
  - tenant_data_eu database: everything keyed by tenant_id (per §1)
- One M10 single-region staging cluster in the same region, paused outside business hours via Atlas API to amortize ~$25/mo of compute cost.
- No multi-region replica set. Control plane SLA in Phase 0 inherits the data plane single-region 99.5% — no automatic failover. Documented as such in §8.
- No US data plane. US is provisioned only when a US client signs (paid or unpaid) — that's a residency hard requirement, not a deferrable structural one.
- No regional API or workers. Single EU API + single EU worker pool serves all traffic. The Cloudflare edge router from §5 is not yet provisioned.
- Cedar Resource Policy is provisioned in Phase 0. Cheap, no reason to defer; carries forward unchanged into Phase 1+.
- BYOK + PrivateLink + Database Auditing + PITR all enabled in Phase 0. These are M10-tier features and the cost-of-procurement-evidence math doesn't change with cluster count.
Why the storage adapter abstraction makes this safe
The application code looks identical in Phase 0 and Phase 1+. controlPlane() returns a connection scoped to the control_plane database; forTenant(tenantId) returns a connection scoped to tenant_data_eu. In Phase 0 both pointers resolve to the same MongoClient (different .db() calls); in Phase 1+ they resolve to two different MongoClient instances against two different clusters. The Phase 1 cutover is a config change (new MONGO_URI_EU_DATA secret), not a code refactor. The logical split must be present in code from Phase 0 — this is what makes Phase 1 cluster split mechanical instead of structural.
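A minimal sketch of how one adapter can cover both phases, assuming the adapter keeps one client per distinct connection URI. `ClientLike`, `DbLike`, `makeClient`, and the `AdapterConfig` field names are hypothetical stand-ins so the example runs without the `mongodb` package; the real adapter would hold `MongoClient` instances.

```typescript
// Hypothetical stand-ins for the driver's MongoClient / Db types.
interface DbLike { readonly name: string; readonly clientId: string }
interface ClientLike { readonly id: string; db(name: string): DbLike }

// Fake connector used only for illustration; the id is the URI it was built from.
function makeClient(uri: string): ClientLike {
  return { id: uri, db: (name: string) => ({ name, clientId: uri }) };
}

interface AdapterConfig {
  controlPlaneUri: string; // assumed config field for the control-plane cluster
  euDataUri: string;       // MONGO_URI_EU_DATA; equal to controlPlaneUri in Phase 0
}

class StorageAdapter {
  private readonly clients = new Map<string, ClientLike>();
  private readonly config: AdapterConfig;
  private readonly connect: (uri: string) => ClientLike;

  constructor(config: AdapterConfig, connect: (uri: string) => ClientLike) {
    this.config = config;
    this.connect = connect;
  }

  // One client per distinct URI, so identical URIs (Phase 0) share a single client.
  private clientFor(uri: string): ClientLike {
    let client = this.clients.get(uri);
    if (client === undefined) {
      client = this.connect(uri);
      this.clients.set(uri, client);
    }
    return client;
  }

  controlPlane(): DbLike {
    return this.clientFor(this.config.controlPlaneUri).db("control_plane");
  }

  // Only the EU data plane exists before Phase 2, so tenantId does not affect routing yet.
  forTenant(_tenantId: string): DbLike {
    return this.clientFor(this.config.euDataUri).db("tenant_data_eu");
  }
}
```

Under these assumptions, pointing both config fields at the same URI gives the Phase 0 shape, and changing only `euDataUri` gives the Phase 1 shape; no call site changes.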
Phase 0 → Phase 1 triggers
Promote out of Phase 0 the moment any of these fires. Triggers are evaluated independently — meeting one is sufficient.
| Trigger | Source | Action on fire |
|---|---|---|
| First paying EU enterprise customer signs (any ARR) | Sales | Provision separate sv0-data-plane-eu-prod cluster; migrate tenant_data_eu database via Atlas Live Migration. Promote control plane to multi-region replica set (eu-west-1 + eu-central-1). |
| Total tenant_data_eu storage exceeds 20 GB OR P95 query latency exceeds 300 ms sustained for >7 days | Atlas metrics | Same as above: capacity-driven cluster split. |
| Procurement / DPIA explicitly requires control-plane / data-plane physical separation in writing | Customer questionnaire | Same as above — provision before contract signature. |
| First US-resident client signs (paid or unpaid demo) | Sales | Provision sv0-data-plane-us-prod immediately. US separation is regulatory, not deferrable. Phase 1 EU split happens in the same rollout if not already done. |
| Customer contracts 99.9% SLA with financial penalties | Sales | Promote control plane to multi-region replica set; that customer's data plane upgrades to M30 multi-region within their jurisdiction (per §8 Phase 4 trigger). |
Phase 0 has no time limit. It runs as long as no trigger fires. The expectation is months, not years: MediaPro/Pelayo paid contracts (first trigger) or a US demo (fourth trigger) will fire by 2026 Q3 at the latest.
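The capacity trigger in the table is the only one that can be evaluated mechanically. A minimal sketch, assuming one metrics sample per day (oldest first) and reading "sustained for >7 days" as applying to latency only, with more than seven consecutive daily samples over the threshold. `DailySample` and its field names are hypothetical; how the samples are pulled from Atlas metrics is out of scope here.

```typescript
// Hypothetical daily rollup of Atlas metrics, oldest sample first.
interface DailySample { storageGb: number; p95LatencyMs: number }

const STORAGE_LIMIT_GB = 20;
const P95_LIMIT_MS = 300;
const SUSTAINED_DAYS = 7;

function capacityTriggerFired(samples: DailySample[]): boolean {
  const latest = samples[samples.length - 1];
  if (latest === undefined) return false;

  // Storage is a point-in-time check against the most recent sample.
  if (latest.storageGb > STORAGE_LIMIT_GB) return true;

  // Latency must stay over the limit for more than SUSTAINED_DAYS consecutive days.
  let run = 0;
  for (const sample of samples) {
    run = sample.p95LatencyMs > P95_LIMIT_MS ? run + 1 : 0;
    if (run > SUSTAINED_DAYS) return true;
  }
  return false;
}
```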
Cost model under Phase 0
| Environment | Setup | Active monthly | Paused-off-hours monthly |
|---|---|---|---|
| Prod | 1 × M10 single-region (eu-west-1), BYOK + PrivateLink + Audit + PITR | ~$60 | n/a (never paused) |
| Staging | 1 × M10 single-region (eu-west-1), same features | ~$60 | ~$30 (compute paused 16h/day weekdays + all weekend) |
| Phase 0 total | | ~$120 | ~$90 |
The $500 MongoDB for Startups credit covers ~5 months of Phase 0 with staging paused off-hours, ~4 months without pausing.
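The runway figures follow directly from the table. A small worked check, using only the base $500 credit and the approximate monthly totals above (the function name is illustrative):

```typescript
// Months of Atlas spend covered by a credit at a flat monthly cost.
function runwayMonths(creditUsd: number, monthlyUsd: number): number {
  return creditUsd / monthlyUsd;
}

const pausedStaging = runwayMonths(500, 60 + 30); // about 5.6 months
const alwaysOn = runwayMonths(500, 60 + 60);      // about 4.2 months
```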
What stays unchanged from §1–§9 in Phase 0
- Cedar Resource Policy at the org level — applies to all clusters from day 1.
- BYOK encryption with AWS KMS — provisioned on the Phase 0 cluster.
- PrivateLink, Database Auditing, PITR — all on.
- Connector API key prefix convention (sv0_prod_eu_…): keep this from day 1 even though there's only one region; Phase 1 cutover stays mechanical.
- Storage adapter controlPlane() / forTenant(tenantId) interface: already required, just resolves to the same client in Phase 0.
What is deferred in Phase 0
- Multi-region replica set on the control plane (§3, §8) — promotes when first paying EU customer or 99.9% SLA trigger fires.
- Physical control-plane / data-plane cluster split (§1, §6) — promotes on the first three triggers above.
- US data plane provisioning, US Atlas project, US Resource Policy exclusion (§2, §6) — provisioned on US-client trigger only.
- Regional API + worker pools (§5) — provisioned on US-client trigger only.
- Cloudflare Worker edge routing (§5) — provisioned on US-client trigger only.
Context
The first paying clients are Spanish (MediaPro, Pelayo). The next wave will likely include US enterprises. Both groups will eventually be asked by their procurement teams where their data lives. EU customers expect EU-region storage under GDPR + LOPDGDD; US enterprises (especially F500 CISOs) increasingly expect US-region storage as a default in vendor questionnaires.
The platform today runs a single self-hosted MongoDB instance in Docker. Migrating to MongoDB Atlas is happening in the next few weeks ahead of MediaPro onboarding. We have unredeemed MongoDB for Startups credits (base $500, up to $5K with referral) covering the first 12 months of Atlas spend.
The architectural question: how do we serve clients in different regions, with per-tenant data residency, from the single application domain app.securityv0.com?
Two paths considered
Option A — Atlas Global Cluster (sharded, single logical cluster)
MongoDB's named product for this use case. One logical cluster, one connection string, sharded across geographic zones. Each tenant's data is pinned to a region via a {location, tenant_id} compound shard key. The mongos router transparently routes queries to the correct shard.
- Tier minimum: M30 (confirmed in Atlas docs).
- Cost (2-zone EU+US, replication factor 3): ~$2.5K–$3.5K/month all-in, including data nodes, config servers, backup storage, and cross-region transfer.
- Operational footprint: one cluster, one billing line, one upgrade/patch surface.
- Founder constraint satisfied: yes — exactly the product MongoDB built for "single ops footprint, multi-region residency."
Option B — Per-region clusters with app-side tenant routing
Run one independent Atlas cluster per region. The application maintains a tenant-to-region map. The storage adapter holds a connection pool per region and selects the right one based on the tenant of the incoming request. A small control plane lives in one region (EU) holding cross-tenant operational metadata; a tenant data plane per region holds all tenant-scoped data.
- Tier minimum: M10 baseline per region; M30 fallback only if a customer's procurement requires dedicated CPU. M10 already carries BYOK, PrivateLink, and Database Auditing, and M20 is burstable like M10 (see §3).
- Cost (1 EU + 1 US): ~$120–$280/month for both clusters at the burstable tiers, rising to ~$900–$1,000/month in the M30 worst case (§3), plus modest backup storage and cross-region egress.
- Operational footprint: one cluster per region (N clusters total), each with its own connection string, backup policy, and schema migration run.
- Founder constraint satisfied: partially — single ops footprint at the application layer (storage adapter abstracts the routing), but multiple clusters at the infrastructure layer.
The decision criterion
The cost delta is ~3–25× at the point we have the second region (depending on whether procurement forces the M30 dedicated-CPU fallback, §3). Atlas Global Cluster makes economic sense at the scale where database cost is small relative to revenue per tenant, typically once ARR per cluster exceeds ~$100K. At pilot scale, $3K/month for two zones is unjustifiable when $120–$280/month delivers the same residency guarantee with a small amount of routing code in the storage adapter.
The "ops nightmare of N regional clusters" the founder originally feared is real but bounded: it's two clusters, not twenty. Backup policy, schema migrations, and monitoring are configured once per cluster and survive unchanged for years. The marginal operational cost of cluster #2 is much smaller than the marginal cost of going from one cluster to a Global Cluster.
Decision
Adopt per-region MongoDB Atlas clusters with application-side tenant-to-region routing, an explicit control-plane / tenant-data-plane split, region-tagged connector API keys, and regional worker pools from Phase 2. Do not adopt Global Clusters at this stage.
The decision has nine parts.
1. Control plane vs tenant data plane (the central split)
Phase 0 note: the split is logical (separate databases on the same cluster) until a Phase 1 trigger fires. See §0 Phase 0 carve-out. The architecture below describes the Phase 1+ target state.
The platform has two operationally distinct namespaces. Conflating them was the central gap of the initial draft.
Control plane: cross-tenant operational metadata. Lives in one location: a dedicated _control database alongside the EU tenant data plane in the EU cluster (Phase 1/2). Replicated read-only mirrors are added only if a customer specifically objects (§9 trigger).
| Collection | Purpose | Why it must be cross-tenant |
|---|---|---|
| tenants | tenant registry, includes region field | Routing requires looking up the region before knowing which cluster to query |
| users | user identity mirror, indexed by provider_user_id | Bearer/session middleware resolves the user before knowing the tenant |
| memberships | user × tenant × role | Login flow loops memberships to render the tenant picker |
Tenant data plane — per-region cluster. Holds everything keyed by tenant_id:
entities, entity_versions, events, findings, evidence_packs, posture_snapshots, correlations, stitched_paths, connector_instances, connector_syncs, scan_scopes, connector_instance_api_keys, audit_log, ...
Residency classification of the control plane: operational metadata under SCCs/DPF. Customer-facing language for procurement:
"Operational metadata (user identities, tenant configuration, billing) is processed in our EU control plane under Standard Contractual Clauses and the EU-US Data Privacy Framework. Customer security posture data — entities, findings, evidence packs, audit trails — is residency-pinned to the contracted region and never leaves it."
Caching: the API caches the tenant_id → region map in-process with 15-minute TTL plus explicit invalidation on tenant config write (admin API or change stream). Region changes are once-per-tenant-lifetime; the long TTL is cheap.
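A minimal sketch of that cache, assuming an in-process map with an injectable clock so expiry can be tested. The class name and method names are illustrative; on a miss the caller is expected to read the region from the control plane and call `set`, and the admin API or change-stream handler calls `invalidate`.

```typescript
type Region = "eu" | "us";

class TenantRegionCache {
  private readonly entries = new Map<string, { region: Region; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL is 15 minutes, matching the text above.
  constructor(ttlMs: number = 15 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns undefined on a miss or an expired entry; the caller then
  // looks the tenant up in the control plane and calls set().
  get(tenantId: string): Region | undefined {
    const entry = this.entries.get(tenantId);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(tenantId);
      return undefined;
    }
    return entry.region;
  }

  set(tenantId: string, region: Region): void {
    this.entries.set(tenantId, { region, expiresAt: this.now() + this.ttlMs });
  }

  // Explicit invalidation on tenant config write.
  invalidate(tenantId: string): void {
    this.entries.delete(tenantId);
  }
}
```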
2. Per-region tenant data plane clusters — and regional compute on the same trigger
- Phase 1 (today through first US client): one cluster in aws:eu-west-1 (Ireland). Holds the EU control plane (_control database) and the EU tenant data plane (tenant_data database). One API + one worker pool, both EU. All current and near-term tenants land here.
- Phase 2 (trigger: first US client whose contract specifies US-resident storage): ship the complete US plane:
  - US Atlas cluster in aws:us-east-1 (US tenant data plane only; control plane stays in EU)
  - US API server (Hetzner CX22 or Cloud Run, ~€5/mo)
  - US worker pool (same)
  - Cloudflare Worker at the edge that reads the session → routes to the regional API
  - All shipped in one rollout, not staggered
- Phase 3 (only if procurement specifically demands it): add apac or other regions on the same pattern. No region is added speculatively; each is tied to a paying customer.
Why API + workers go regional together in Phase 2 (revised after 2026-05-03 review): an earlier draft kept the API single-region in Phase 2 and deferred the regional API split to Phase 3. That was wrong. A typical dashboard request issues 5+ sequential queries to fetch session, user, tenant, findings, and entity details; at 160ms RTT each, an EU API serving a US user injects ~800ms–1s of pure network latency per page load — unusable UX. It also leaves the residency story dependent on a "transient processing in EU memory" disclaimer that strict customers will challenge in procurement. Deploying a regional API alongside the regional worker pool costs ~€5/mo extra and removes both problems. The first reviewer flagged the latency softly; an independent second-pass review (Gemini 2026-05-03) named it as fatal and the team agreed.
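The latency estimate is simple multiplication, shown here as a worked check. It assumes the queries are strictly sequential (none pipelined or parallelised) and that each pays one full round trip; the function name is illustrative.

```typescript
// Network latency added by N sequential queries, each paying one round trip.
function addedLatencyMs(sequentialQueries: number, rttMs: number): number {
  return sequentialQueries * rttMs;
}

const fiveQueries = addedLatencyMs(5, 160); // 800 ms
const sixQueries = addedLatencyMs(6, 160);  // 960 ms
```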
Each cluster is a fully independent Atlas project with its own backup policy, monitoring, alerting, and Atlas Resource Policy restricting it to the region it serves.
3. Tier baseline (M10), fallback (M30 if dedicated CPU is required)
Phase 0 note: runs at M10 single-region in aws:eu-west-1 only, with no multi-region replica set on the control plane until Phase 1 triggers fire. Phase 0 cost is ~$60/mo prod + ~$30/mo paused staging = ~$90/mo. The §3 description below applies from Phase 1 onward; see §0 Phase 0 carve-out for the pre-revenue baseline.
Plan for M10 dedicated per region as the baseline (~$80–$120/mo all-in including BYOK surcharge and PrivateLink endpoint cost). M10 in aws:eu-west-1 and aws:us-east-1 carries every procurement feature we need, verified against current MongoDB Atlas docs (2026-05-03):
- ✅ BYOK encryption with AWS KMS: minimum tier M10. Source
- ✅ PrivateLink (AWS): minimum tier M10; both eu-west-1 and us-east-1 are mainstream regions, supported. Source
- ✅ Database Auditing: minimum tier M10; default destination is cluster-local (matches §6 requirement). Source
- ✅ Continuous Cloud Backup with PITR: minimum tier M10; default 7-day daily snapshots + 24h PITR oplog. Source
- ❌ Dedicated CPU is NOT available at M10 or M20: per Atlas Cluster Tiers, both M10 and M20 are burstable (shared CPU on t3-class instances). Dedicated CPU starts at M30 (~$394/mo base, ~$450–$500/mo all-in per region).
The fallback is M30, not M20. If a customer's procurement language explicitly requires "dedicated compute" or "no noisy-neighbor guarantees" (some F500 questionnaires do), upgrade that region's cluster to M30. Worst-case Phase 2 with both regions at M30: $900–$1,000/mo. Strategic decision unchanged — even at M30 fallback, the cost gap to a 2-zone M30 Global Cluster ($3K/mo) remains ~3×.
The M10 → M30 upgrade is a live operation in Atlas (~10–20 min cluster scale-up, no downtime), so the M30 fallback is genuinely a deferred decision: ship M10 by default, upgrade if a customer asks the question. Don't pre-pay for dedicated CPU before someone demands it in writing.
Control plane runs multi-region within EU jurisdiction (decided 2026-05-03 — see §8). The control-plane cluster is M10 with electable nodes in both aws:eu-west-1 (3 nodes, primary) and aws:eu-central-1 (2 nodes, secondary), giving Atlas-managed automatic failover within ~5 minutes. This adds ~$60–$90/mo over a single-region M10. Tenant data planes remain single-region per residency model (multi-region data plane would only happen at the Phase 4 99.9%-SLA trigger).
Open evidence to attach (procedural, not decision-blocking): live Atlas console screenshots of M10 default backup policy + PITR retention, M10 PrivateLink badge in the region picker, and the BYOK per-project surcharge dollar amount (the docs page does not publish it). Bundle in docs/compliance/residency-evidence-<region>.md once the live Atlas org is provisioned.
4. Region-tagged connector API keys (with lazy migration of legacy keys)
Connector API keys must carry their region in the key prefix:
sv0_prod_eu_<32-byte-random>
sv0_prod_us_<32-byte-random>
The bearer middleware reads the prefix → routes the hash lookup to the correct region's cluster → no fan-out across clusters, no cross-region scan on the hottest write path (connector ingest).
Legacy key migration plan (decided 2026-05-03): lazy now, forced reissuance at Phase 2. Until a second region exists, every legacy sv0_prod_<random> key is unambiguously EU by definition (only one cluster). The bearer middleware treats keys with no region prefix as EU. When Phase 2 ships (first US client signs), the same coordinated rollout that adds the US plane also reissues every legacy connector key in the new format and removes the legacy fallback in the bearer middleware. No customer action required pre-Phase-2; the Phase-2 reissuance folds into the same customer-facing change window. A legacy_connector_key_usage metric tracks the population so Phase 2 cutover verification is observable.
This requires:
- Wire-format change in sv0-connectors (the connector receives its key at provisioning time and uses it as-is; it does not need to parse the prefix).
- One change in the bearer middleware (src/api/middleware/bearer-token-middleware.ts) to extract the region prefix and route the hash lookup.
- Connector key issuance must be region-aware: when a tenant is provisioned in a region, all keys minted for that tenant carry that region's prefix.
Today's keys (legacy sv0_prod_<random>) are EU-region by definition (only the EU cluster exists). Until the forced Phase 2 reissuance, migration to region-tagged keys happens lazily: all new keys are issued in the new format, and legacy keys remain valid against the EU cluster only.
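A minimal sketch of the prefix routing the bearer middleware needs, assuming the region list is `eu` and `us` and that a boolean flag controls the legacy fallback (on before Phase 2, off after the reissuance). `parseConnectorKey` and `ParsedKey` are hypothetical names, not the contents of the existing middleware. One caveat under these assumptions: a legacy key whose random part happens to begin with `eu_` or `us_` would be read as region-tagged, so the random-part alphabet needs checking before relying on this.

```typescript
type Region = "eu" | "us";
const REGIONS: readonly Region[] = ["eu", "us"];

interface ParsedKey { region: Region; legacy: boolean }

// Returns the region to route the hash lookup to, or null if the key is not routable.
function parseConnectorKey(key: string, allowLegacy: boolean): ParsedKey | null {
  const base = "sv0_prod_";
  if (!key.startsWith(base)) return null;

  for (const region of REGIONS) {
    if (key.startsWith(`${base}${region}_`)) return { region, legacy: false };
  }

  // No region prefix: legacy keys are EU by definition until Phase 2 removes the fallback.
  return allowLegacy ? { region: "eu", legacy: true } : null;
}
```

The `legacy` flag on the result is where a counter such as legacy_connector_key_usage could be incremented.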
5. Regional API + worker plane from Phase 2 (revised)
The compute plane mirrors the data plane region-for-region from the moment Phase 2 begins:
| Phase | API location | Worker plane location | Edge routing |
|---|---|---|---|
| 1 | EU only | EU only | None (single region) |
| 2 | EU + US | EU + US | Cloudflare Worker reads session → routes to regional API |
| 3 | EU + US + APAC | EU + US + APAC | Same Cloudflare Worker, more regions |
API provisioning per region: a small Hetzner CX22 (~€5/month) or Cloud Run instance per region. Each API process is configured with the region's MONGO_URI_<REGION> for the tenant data plane plus a read-only connection to the EU control plane. The Cloudflare Worker at the edge reads the user's session/JWT and dispatches to the right regional API.
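A minimal sketch of the dispatch decision inside the edge Worker, assuming the session already exposes the tenant's region as a string field. The origin hostnames, `SessionClaims`, and `resolveApiOrigin` are hypothetical placeholders; the sketch fails closed on an unknown region rather than defaulting to EU, which is a design choice this ADR does not prescribe.

```typescript
type Region = "eu" | "us";

// Hypothetical origins; the real hostnames are not specified in this ADR.
const REGIONAL_API: Record<Region, string> = {
  eu: "https://api-eu.example.internal",
  us: "https://api-us.example.internal",
};

// Assumed shape of the decoded session/JWT.
interface SessionClaims { tenantId: string; region?: string }

function resolveApiOrigin(claims: SessionClaims): string {
  const region = claims.region;
  if (region === "eu" || region === "us") return REGIONAL_API[region];
  // Fail closed: routing a request to the wrong region would break residency.
  throw new Error(`no regional API for tenant ${claims.tenantId}`);
}

// Build the upstream URL the Worker would forward the request to.
function upstreamUrl(claims: SessionClaims, pathAndQuery: string): string {
  return resolveApiOrigin(claims) + pathAndQuery;
}
```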
Worker provisioning per region: identical pattern. Each worker process only claims jobs from its region's scan_scopes. Each region runs its own scheduler — no cluster-wide claim primitive, no cross-region race conditions.
Connector secrets storage: create regional 1Password vaults (sv0-bots-eu, sv0-bots-us). The credential broker reference (credentials_ref in connector_instances) becomes region-aware (op://sv0-bots-us/...). Tenant secrets never cross the residency boundary.
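A minimal sketch of a guard the credential broker could apply before resolving a secret, assuming vaults follow the sv0-bots-<region> naming above. `vaultRegion` and `secretStaysInRegion` are hypothetical helpers, and the item path after the vault name is made up for the example.

```typescript
// Extract the region suffix from an op://sv0-bots-<region>/... reference.
function vaultRegion(credentialsRef: string): string | null {
  const match = /^op:\/\/sv0-bots-([a-z0-9]+)\//.exec(credentialsRef);
  return match?.[1] ?? null;
}

// True only when the referenced vault is in the tenant's own region.
function secretStaysInRegion(credentialsRef: string, tenantRegion: string): boolean {
  return vaultRegion(credentialsRef) === tenantRegion;
}

// Example: a US tenant must not resolve a reference into the EU vault.
const sameRegion = secretStaysInRegion("op://sv0-bots-us/acme/token", "us");  // true
const crossRegion = secretStaysInRegion("op://sv0-bots-eu/acme/token", "us"); // false
```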
Total Phase 2 compute footprint per region: one API box + one worker box = ~€10/month per region added. Total Phase 2 compute increment over Phase 1: ~€10/month. This is small enough that the previous rationale ("defer regional API to Phase 3 to save ops complexity") is not worth the latency and procurement cost it incurs.
6. Atlas Resource Policy + full residency control set
Cluster region is constrained by an Atlas Resource Policy (Cedar). Per MongoDB's docs and the GA announcement (2025-04-14), Resource Policies are:
- GA since 2025-04-14, blocking (not advisory) — Atlas refuses cluster create/modify operations that violate them
- Organization-scoped by default with optional per-project exclusion (not project-scoped, as an earlier draft of this ADR claimed)
- Enforced across UI, Administration API, Terraform, and CloudFormation — every surface the team uses
- Modifiable only by Organization Owners
Practical setup for SecurityV0: one EU-only policy at the org with the US project excluded; one US-only policy at the org with the EU project excluded. Both policies are stored in Terraform via mongodbatlas_resource_policy and inherit the same review/approval gate as any other infra change.
Resource Policies cover cluster region only — they do not automatically constrain backup region, KMS region, audit log destination, or replica member regions. The full control set must be locked down per cluster:
| Knob | Required setting | Where it's configured |
|---|---|---|
| Cluster region | Single allowed region | Resource Policy (Cedar) at org level, project-scoped via exclusion |
| Backup snapshot region | Same as cluster region (default per Atlas docs) | No copy_settings in mongodbatlas_cloud_backup_schedule |
| Backup copy region | None enabled | Same — omit copy_settings block |
| KMS key | AWS KMS in same region | BYOK config → key ARN; snapshots inherit cluster CMK per Cloud Backup Encryption |
| Replica set members | All in cluster's region (no cross-region electable) | Cluster topology |
| Database audit log | Cluster-local destination (default) | Database Auditing config — do not enable S3 export |
| Application logs (Grafana Cloud, BetterStack) | Per-region project (see §7) | App-side logger config |
Sample Cedar policy (EU-only, org level):
    forbid (
      principal,
      action in [
        ResourcePolicy::Action::"cluster.create",
        ResourcePolicy::Action::"cluster.modify"
      ],
      resource
    )
    unless {
      [
        ResourcePolicy::Region::"aws:eu-west-1",
        ResourcePolicy::Region::"aws:eu-west-2",
        ResourcePolicy::Region::"aws:eu-central-1"
      ].containsAll(context.cluster.regions)
    };
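The unless clause reads as "forbid unless every region of the cluster is in the allowed set". A small TypeScript restatement of that set logic, useful as a unit-testable mirror of the policy in CI; it is only an illustration of the semantics, not how Atlas evaluates Cedar, and `isForbidden` is a hypothetical helper.

```typescript
// Allowed regions copied from the sample policy above.
const EU_ALLOWED: readonly string[] = ["aws:eu-west-1", "aws:eu-west-2", "aws:eu-central-1"];

// Mirrors `forbid ... unless { allowed.containsAll(cluster.regions) }`.
function isForbidden(clusterRegions: readonly string[], allowed: readonly string[] = EU_ALLOWED): boolean {
  const allowedContainsAll = clusterRegions.every((region) => allowed.includes(region));
  return !allowedContainsAll;
}

const euOnly = isForbidden(["aws:eu-west-1", "aws:eu-central-1"]); // false: permitted
const mixed = isForbidden(["aws:eu-west-1", "aws:us-east-1"]);     // true: blocked
```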
Procurement evidence pack per region: Cedar policy export (GET /orgs/{ORG-ID}/resourcePolicies JSON) + the Terraform state showing mongodbatlas_resource_policy body verbatim + screenshots of backup, KMS, audit, and topology configs, bundled in docs/compliance/residency-evidence-<region>.md. Refresh quarterly. The companion endpoint GET /orgs/{ORG-ID}/nonCompliantResources should return empty — capture that snapshot too.
7. Observability and audit log routing
Application logs already carry tenant_id by design; see PR #763, which deliberately moved per-tenant verdict attribution from Prometheus metrics to structured logs because the metrics surface was a tenant-enumeration oracle. Do not strip tenant_id at the logger: that would defeat the recent fix and lose Loki-based per-tenant queryability.
The right answer is per-region log destinations. Grafana Cloud supports regional projects:
- EU API + EU workers → Grafana Cloud EU project
- US API + US workers (both from Phase 2) → Grafana Cloud US project
Routing is by environment variable at process startup (GRAFANA_LOKI_URL per deployment). No tenant-side change. Adds a small operational tax (two Grafana dashboards instead of one) but keeps tenant_id log labels intact and residency-clean.
Database audit logs stay cluster-local (Atlas Database Auditing writes to the cluster itself). They are residency-safe by construction. Do not export them to a central observability stack.
8. SLA commitment
Phase 0 note: under Phase 0, the control plane is co-resident with the EU data plane on a single-region M10. Both inherit the 99.5% single-region SLA — no automatic failover, no cross-region promotion. Phase 0 is unsuitable for any customer contract that names a 99.9% target with penalties; if such a contract is on the table, the trigger has fired and Phase 0 is over. See §0 Phase 0 carve-out.
The platform has two distinct SLA tiers because the control plane and data planes have different residency constraints and therefore different DR postures.
Control plane: automatic failover within EU jurisdiction. RTO target: under 5 minutes for cross-region failover between aws:eu-west-1 and aws:eu-central-1. Atlas handles election and failover transparently — application connection string is unchanged. Cost: ~$120–$150/mo (M10 multi-region replica set within EU). This is mandatory infrastructure, not a customer-tier upgrade — without it, an eu-west-1 outage takes down auth for every customer globally including healthy US data planes.
Tenant data planes: 99.5% monthly uptime per cluster, matching Atlas's single-region M10/M20 SLA. Per residency model, data planes are intentionally NOT multi-region — that would require putting customer data in a region the customer didn't contract for.
Customer-facing language:
"Control plane: automatic failover within the EU jurisdiction. Target RTO under 5 minutes for cross-region failover between Ireland and Frankfurt."
"Tenant data plane: 99.5% monthly uptime per cluster. Excludes scheduled maintenance windows and upstream cloud provider region outages (e.g., AWS regional incidents). Service credits per [terms]."
The cloud-provider carve-out is industry-standard (Stripe, Vercel, Snowflake all use it). Simultaneous outages across both EU regions qualify as an EU-wide AWS incident under the carve-out.
Sales must not commit to 99.9% on the data plane under this architecture. 99.9% requires multi-AZ replicas at minimum, and full coverage of regional outages requires multi-region failover — which contradicts our residency model for tenant data.
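For sales conversations it helps to translate the two percentages into a monthly downtime budget. A worked check, assuming a 30-day month and ignoring the maintenance and cloud-provider carve-outs (the function name is illustrative):

```typescript
// Minutes of downtime allowed per month at a given uptime percentage.
function allowedDowntimeMinutes(uptimePct: number, daysInMonth: number = 30): number {
  return (1 - uptimePct / 100) * daysInMonth * 24 * 60;
}

const budget995 = allowedDowntimeMinutes(99.5); // about 216 minutes (3.6 hours)
const budget999 = allowedDowntimeMinutes(99.9); // about 43 minutes
```

The gap between the two budgets is a factor of five, which is why a single slow restore can consume a 99.9% budget on its own.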
Phase 4 trigger for 99.9% on a specific tenant's data plane: an enterprise customer signs a $250K+ ARR contract that contractually requires 99.9% with financial penalties. At that point upgrade their region's cluster to M30+ multi-region replicated within a single legal jurisdiction (e.g., Ireland + Frankfurt for EU residency). Bake the cost into the contract. Other tenants in the same region stay at 99.5% per the standard SLA.
9. Schema migrations apply per-cluster, with mandatory expand-and-contract discipline
A migration runner script iterates over the configured regions (EU control plane + EU data plane + US data plane + …), connects to each cluster in turn, and applies pending migrations. Migration files are region-agnostic (all data planes run the same schema; the control plane has its own migration namespace). The runner reports per-region success/failure independently.
Hard rule (added 2026-05-03): every breaking schema change MUST follow expand-and-contract across at least three deploys. The API code is deployed globally — there is no way to atomically synchronize a schema change across N independently-migrated clusters, so a single-step "drop a column" migration that succeeds in EU and fails in US would instantly produce 500s for US tenants on the next API deploy.
The expand-and-contract pattern:
- Migration N — expand. Add the new field, populate it via dual-write, do not remove the old field. Deploy the API to write to both. All clusters now hold both old and new shapes.
- Migration N+1 — switch reads. Update the API to read from the new field. Old field still exists. Deploy.
- Migration N+2 — contract. Drop the old field. Deploy.
For the migration runner this means:
- The runner refuses to apply a migration tagged breaking unless the previous deploy is at least N-1 in the sequence (tracked in a _migrations_state collection in the control plane).
- Pre-PR review checklist requires every schema change to be tagged either non-breaking (additive) or breaking (with the three-step plan documented).
- Rollback procedure for a single-region failure: pause deploy, fix the failed cluster, resume. Never fast-forward past a partial migration.
Why this is mandatory and not optional: with a single global cluster the deploy pipeline can sequence migrate-then-deploy atomically. With per-region clusters we cannot — the migration is by definition asynchronous across regions, and the global API deploy that follows must tolerate any cluster being one step behind. Expand-and-contract is the only correct pattern.
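A minimal sketch of the runner's two behaviours: independent per-cluster reporting, and refusing a breaking migration while any cluster is behind. It simplifies the "previous deploy is at least N-1" rule to "every cluster has applied migration N-1", keeps state in an in-memory map standing in for _migrations_state, and uses a synchronous `apply` for brevity; `Migration`, `RunReport`, and `runMigration` are hypothetical names.

```typescript
type Tag = "non-breaking" | "breaking";

interface Migration { id: number; tag: Tag; apply(cluster: string): void }
interface RunReport { cluster: string; ok: boolean; error?: string }

// `state` maps cluster name to the highest migration id applied there.
function runMigration(
  migration: Migration,
  clusters: readonly string[],
  state: Map<string, number>,
): RunReport[] {
  if (migration.tag === "breaking") {
    // Refuse outright if any cluster has not yet applied the preceding step.
    const behind = clusters.filter((c) => (state.get(c) ?? 0) < migration.id - 1);
    if (behind.length > 0) {
      throw new Error(`refusing breaking migration ${migration.id}; behind: ${behind.join(", ")}`);
    }
  }

  // Apply per cluster; one cluster failing does not stop or roll back the others.
  return clusters.map((cluster): RunReport => {
    try {
      migration.apply(cluster);
      state.set(cluster, migration.id);
      return { cluster, ok: true };
    } catch (err) {
      return { cluster, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}
```

With this shape, a partial failure leaves the failed cluster's recorded id unchanged, so the next breaking migration is blocked until that cluster is fixed and re-run, which matches the "never fast-forward past a partial migration" rule.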
Consequences
Positive
- ~3–25× cheaper than a Global Cluster across the entire tier band (~$160–$1,000/mo for two regions vs ~$3K/mo for a 2-zone Global Cluster). Strategic decision survives even the M30-fallback worst case.
- Stronger residency claim. A US-region cluster physically cannot hold EU data, because the cluster does not exist outside aws:us-*. Resource Policies + the residency control set (§6) make this auditable end-to-end. Verified against current Atlas docs (2026-05-03).
- Compute mirrors data from Phase 2. Regional API + regional workers from the moment the second region exists. No "transient processing in EU memory" disclaimer to defend in procurement; no N+1 trans-Atlantic latency on US user dashboard loads.
- Hot-path lookups stay region-local because connector API keys are region-tagged at the wire. No cross-region scans on the connector ingest path.
- Smaller blast radius. A schema migration that goes wrong in one region doesn't take down the other. A noisy-neighbor tenant in one region doesn't affect tenants in another.
- Per-region disaster recovery is independent — backup, restore, and PITR are each scoped to a smaller M10/M20/M30 replica set.
- Path to single-tenant deployments is the same code path. A dedicated-deployment customer is just a region with one tenant; the same routing logic handles it.
Negative (Acceptable Trade-offs)
- Control plane is a single point of failure. If it is down, all auth and tenant resolution stops globally, including for US tenants whose data plane is up. Phase 0 has no automatic failover; from Phase 1 the multi-region replica set within the EU (§8) covers a single-region outage. Acceptable for Phase 1/2; control-plane read replicas are added in Phase 3 only if a customer's procurement requires it.
- Two Atlas projects, two backup policies, two KMS keys, two audit log destinations, two API+worker deploys. Acceptable — done once via Terraform (per ADR-019) and rarely revisited.
- Connector API key wire format change. Acceptable — bounded change in bearer middleware + connector provisioning; legacy keys remain valid against EU.
- Expand-and-contract migration discipline is mandatory for every breaking schema change (§9). Slows the deploy cycle for any breaking change to three deploys instead of one. Acceptable — the only correct pattern across asynchronously-migrated clusters.
- Tenant region changes require a manual migration. Acceptable — extremely rare event, performed as a coordinated operation rather than an automated capability.
- No cross-region queries. Acceptable: confirmed against the codebase that all current cross-tenant operations (listAllTenants, findUserByProviderUserId, findApiKeyByHash, scheduler claim) either belong in the control plane (first three) or have been refactored to be region-scoped (scheduler).
Risks Mitigated
- ✅ Storage adapter abstraction (per ADR-001) absorbs the routing logic without leaking into business logic. forTenant(tenantId) returns the right region's MongoClient; control-plane operations go through a separate controlPlaneClient.
- ✅ Tenant isolation via X-Tenant-Id already exists; region-routing extends the same middleware.
- ✅ Atlas Resource Policy + the full residency control set (§6) provide auditable, end-to-end region pinning.
- ✅ Region-tagged API keys (§4) eliminate the cross-cluster scan on connector ingest.
- ✅ Regional API + workers from Phase 2 (§5) eliminate both the latency hit and the residency processing concern flagged by independent reviewers.
- ✅ Expand-and-contract migrations (§9) eliminate the partial-migration deployment trap that would otherwise produce 500s for one region after a global API deploy.
- ✅ Per-region MongoClient maxPoolSize capped at 20 (down from default 100) — keeps connection pool memory bounded as region count grows.
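The expand-and-contract discipline referenced in the list above is specified in §9; the following is only a sketch of the general pattern under assumed details. It uses a hypothetical rename of a `name` field to `displayName`, and the three-step split in the comments is one plausible reading of "three deploys", not a restatement of §9.

```typescript
// Hypothetical rename of `name` to `displayName`; field names are illustrative only.
interface TenantDocV1 { _id: string; name: string }
interface TenantDocV2 { _id: string; displayName: string }
type TenantDocBoth = TenantDocV1 & TenantDocV2;
type TenantDoc = TenantDocV1 | TenantDocV2 | TenantDocBoth;

// Deploy 1 (expand): readers accept either shape, so a region whose data
// migration has not run yet keeps serving requests instead of returning 500s.
function readDisplayName(doc: TenantDoc): string {
  if ("displayName" in doc) return doc.displayName;
  return doc.name;
}

// Deploy 1 also writes both fields, so a rollback to the old code still finds `name`.
function toWritableDoc(id: string, displayName: string): TenantDocBoth {
  return { _id: id, name: displayName, displayName };
}

// Deploy 2 (migrate): each region backfills its own documents on its own schedule.
function backfill(doc: TenantDoc): TenantDocBoth {
  return toWritableDoc(doc._id, readDisplayName(doc));
}

// Deploy 3 (contract): once every region reports a completed backfill, the
// `name` branch in readDisplayName and the duplicate write can be removed.

console.log(readDisplayName({ _id: "t1", name: "Acme" }));             // prints "Acme"
console.log(readDisplayName({ _id: "t1", displayName: "Acme GmbH" })); // prints "Acme GmbH"
```

The point of the tolerant reader is that a single global API build stays correct against clusters at different migration stages, at the cost of carrying both shapes until the contract step.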
Risks Accepted
- ⚠ EU outage = global outage of the control plane. All auth stops if the EU cluster is down, even for US tenants whose data plane is healthy. RTO is bounded by Atlas Cloud Backup PITR (~1 hour to restore + DNS propagation). Document this in the SLA terms (§8).
- ⚠ Auth path makes a control-plane round-trip on cache miss — every (API instance × tenant) combination pays one ~80ms cross-region hop per 15-minute TTL window. Acceptable; amortized cost is near-zero. Phase 3 optimization: embed region as a custom claim in the WorkOS JWT so the API can route on the JWT itself without ever consulting the control plane on the auth path. Eliminates the SPOF coupling for steady-state requests; control plane is only hit on tenant config writes and new-user onboarding.
- ⚠ Operational drift between clusters is possible if changes are made directly in the Atlas console. Mitigated by Terraforming all cluster configuration once sv0-infrastructure covers the Atlas provider.
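The cache-miss behaviour described in the auth-path risk above can be sketched as a per-instance TTL cache in front of the control-plane lookup. The `TenantRegionCache` class and its injected `lookup` and `now` functions are hypothetical. A real lookup would be asynchronous; it is synchronous here only to keep the sketch short.

```typescript
// Hypothetical per-API-instance cache of tenant -> region, with the 15-minute TTL from this ADR.
class TenantRegionCache {
  private readonly entries = new Map<string, { region: string; expiresAt: number }>();
  private readonly lookup: (tenantId: string) => string;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    lookup: (tenantId: string) => string, // stands in for the ~80ms control-plane round-trip
    ttlMs: number = 15 * 60 * 1000,
    now: () => number = () => Date.now(),
  ) {
    this.lookup = lookup;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  resolve(tenantId: string): string {
    const current = this.now();
    const hit = this.entries.get(tenantId);
    if (hit !== undefined && hit.expiresAt > current) return hit.region; // no control-plane call
    const region = this.lookup(tenantId); // cache miss: one cross-region hop
    this.entries.set(tenantId, { region, expiresAt: current + this.ttlMs });
    return region;
  }
}

let lookups = 0;
const cache = new TenantRegionCache((tenantId) => {
  lookups += 1;
  return tenantId.startsWith("us-") ? "us" : "eu";
});
cache.resolve("us-tenant-1");
cache.resolve("us-tenant-1");
console.log(lookups); // prints 1
```

As an illustration of the amortization claim, at an assumed rate of one request per second from a tenant to one API instance, 900 requests share a single ~80ms miss per window, which averages to roughly 0.09ms per request.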
When to Reconsider
Reopen this ADR if any of these conditions hold:
- Cross-region analytics become a product requirement that cannot be satisfied by an aggregation layer — e.g., a large customer asks for a single dashboard across their EU and US subsidiaries.
- A single-tenant deployment customer signs in a region we don't already serve (e.g., aws:ca-central-1). Trigger to extend the region enum and provision a third cluster — small operational lift, but the ADR should track that this is a real trigger, not just "Phase 3."
- More than four regions are in operation. At that point the per-region operational overhead starts to approach Global Cluster ops, and the cost gap narrows because the Global Cluster spreads across more zones.
- A single tenant outgrows an M10/M20 to the point of needing M30+ in their region. Once we're paying M30 for a single-tenant cluster, the Global Cluster math shifts.
- A customer with $250K+ ARR contractually requires 99.9% SLA with financial penalties. Triggers the Phase 4 multi-AZ-within-jurisdiction upgrade described in §8.
- A procurement requirement appears for a globally-replicated read tier (e.g., a customer with offices in 5 regions wanting sub-50ms read latency from each). Global Cluster is the cleaner answer for that pattern; per-region clusters are not.
- Control-plane outage repeatedly takes down healthy regional data planes. Trigger to add control-plane read replicas in each region.
Until then, per-region clusters with app-side routing remain the answer.
Resolved items
All five open questions from earlier drafts have been closed:
- ✅ M10 feature coverage — verified against current Atlas docs. M10 supports BYOK + PrivateLink + Database Auditing + Continuous Cloud Backup with PITR. M10 does NOT have dedicated CPU — neither does M20; the real fallback is M30. Cost band documented in §3.
- ✅ Atlas Resource Policy enforcement — verified blocking, GA since 2025-04-14, enforced across UI/API/Terraform/CloudFormation. Org-scoped with project exclusion. Sample Cedar policy in §6.
- ✅ Atlas Cloud Backup snapshot region default — verified same-region by default; cross-region copies require explicit copy_settings. Snapshots inherit cluster CMK. Documented in §6.
- ✅ Connector API key prefix migration plan — decided lazy now, forced reissuance at Phase 2. Documented in §4.
- ✅ Control plane DR posture — decided multi-region replica set within EU jurisdiction (aws:eu-west-1 + aws:eu-central-1), Atlas-managed automatic failover, RTO < 5 minutes. Documented in §3 and §8.
Procedural follow-ups (not decisions, just things to capture once Atlas is provisioned — bundled in docs/compliance/residency-evidence-<region>.md):
- M10 PrivateLink badge for eu-west-1, eu-central-1, and us-east-1 in the region picker
- M10 default backup policy + PITR retention window
- Advanced Security (BYOK) per-project surcharge dollar amount (docs page does not publish it)
- Saved EU-only Resource Policy in the Atlas console UI
- Multi-region replica set topology screenshot (3 nodes Ireland + 2 nodes Frankfurt) as evidence of automatic failover capability
Related Documents
- 14-multi-region-database-deployment.md — How the per-region setup actually serves clients (with diagrams)
- ADR-001 — MongoDB-only decision and StorageAdapter abstraction
- ADR-016 — Tenant identification via X-Tenant-Id
- ADR-017 — Auth boundary and session model
- ADR-019 — Terraform for Atlas project provisioning
- PR #763 — Strip tenant_id from Prometheus metrics; per-tenant attribution moved to structured logs (informs §7 logging strategy)
- Research memo: ~/dev/securityv0/.scratch/session-notes/sv0-platform/mongodb-atlas-residency-recommendation.md (residency analysis)
- Adversarial review (internal): ~/dev/securityv0/.scratch/session-notes/sv0-platform/adr-020-adversarial-review.md (drove control-plane split, regional workers, SLA, residency control set)
- Adversarial review (Gemini, second pass): drove regional API in Phase 2 + expand-and-contract migration discipline + JWT-region-claim Phase 3 optimization
- Atlas verification: ~/dev/securityv0/.scratch/session-notes/sv0-platform/adr-020-atlas-verification.md (closed 3 of 5 open questions with citations)