
Cross-Env Tenant Reconciliation

TL;DR

CANONICAL_TENANTS (sv0-platform/src/domain/tenants/canonical-tenants.ts) is the single source of truth for the SV0-owned tenants (the demos + securityv0-internal). To make a deployed environment match it, run the apply-canonical-tenants.ts reconciler from a local main checkout, pointed at that env's Mongo. It is dry-run by default and read-only until you pass --apply.

The reconciler only syncs identity fields (display_name, tenant_class, description, status) on tenants that already carry the canonical slug. It does not rename slugs and does not create tenants; both are separate procedures outside this runbook (see Scope).

Scope — reconcile only

The reconciler syncs identity on tenants that already carry the canonical slug; it does not rename slugs or create tenants. Renaming is a separate one-shot Mongo migration: tenant_id is the slug string, so renaming the tenants row alone orphans every document that still references the old slug. Creating is either a seed (demo_seed) or a manual WorkOS-org + scan (demo_real). Both are out of scope here; see the design note if you hit them.
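To make the orphaning risk concrete, here is a minimal in-memory sketch. Only slug and tenant_id come from this runbook; the findings collection, the helper names, and the target slug "demo-example" are hypothetical, and the real migration is not shown here.

```typescript
// Illustrative in-memory model, not the real schema or migration.
interface TenantRowModel { slug: string }
interface OwnedRowModel { tenant_id: string; title: string }

// Counts the rows that reference a given slug through tenant_id.
function countOwnedRows(rows: OwnedRowModel[], slug: string): number {
  return rows.filter((r) => r.tenant_id === slug).length;
}

// Naive rename: touches the tenants row only.
function renameTenantRowOnly(tenants: TenantRowModel[], from: string, to: string): TenantRowModel[] {
  return tenants.map((t) => (t.slug === from ? { ...t, slug: to } : t));
}

// Migration-style rename: also rewrites tenant_id on every owned row.
function rewriteOwnedRows(rows: OwnedRowModel[], from: string, to: string): OwnedRowModel[] {
  return rows.map((r) => (r.tenant_id === from ? { ...r, tenant_id: to } : r));
}

const tenantsBefore: TenantRowModel[] = [{ slug: "default" }];
const findingsBefore: OwnedRowModel[] = [
  { tenant_id: "default", title: "finding A" },
  { tenant_id: "default", title: "finding B" },
];

const tenantsRenamed = renameTenantRowOnly(tenantsBefore, "default", "demo-example");
// The owned rows were not touched, so they still point at the old slug.
const orphanedCount = countOwnedRows(findingsBefore, "default");
const visibleAfterNaive = countOwnedRows(findingsBefore, "demo-example");
const visibleAfterMigration = countOwnedRows(
  rewriteOwnedRows(findingsBefore, "default", "demo-example"),
  "demo-example",
);
console.log({ renamedSlug: tenantsRenamed[0]?.slug, orphanedCount, visibleAfterNaive, visibleAfterMigration });
```

In this toy model the naive rename leaves the renamed tenant with no visible rows, which is why a rename has to be a migration across every collection keyed by tenant_id.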

Reconciler contract

  • Update-only: it never creates or renames tenants (creating one would force fabricating a per-env provider_org_id). Missing tenants are reported, not created.
  • Hard-refuses (exit 1, zero writes) if any SV0 slug already holds a customer_* row.
  • Treats the const as authoritative, so it overwrites admin-PATCH edits to the synced fields; the dry-run shows the diff first. status is the highest-impact field (it flips active↔evaluation), so scrutinise status diffs.
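The contract above can be sketched as a pure planning function. This is not the code of apply-canonical-tenants.ts: the type and function names are hypothetical, and it assumes that "a customer_* row" means a tenant_class starting with customer_.

```typescript
// Sketch of the contract only; names and shapes are assumptions, not the script's API.
type SyncedField = "display_name" | "tenant_class" | "description" | "status";
const SYNCED_FIELDS: readonly SyncedField[] = ["display_name", "tenant_class", "description", "status"];

interface TenantIdentity {
  slug: string;
  display_name: string;
  tenant_class: string;
  description: string;
  status: string;
}

interface FieldDiff { slug: string; field: SyncedField; from: string; to: string }
interface ReconcilePlan { refused: string[]; missing: string[]; diffs: FieldDiff[] }

function planReconcile(canonical: TenantIdentity[], existing: TenantIdentity[]): ReconcilePlan {
  const bySlug = new Map<string, TenantIdentity>();
  for (const row of existing) bySlug.set(row.slug, row);

  // Hard refuse: an SV0 slug that is held by a customer row means zero writes.
  const refused = canonical
    .filter((c) => bySlug.get(c.slug)?.tenant_class.startsWith("customer_") === true)
    .map((c) => c.slug);
  if (refused.length > 0) return { refused, missing: [], diffs: [] };

  const missing: string[] = [];
  const diffs: FieldDiff[] = [];
  for (const want of canonical) {
    const have = bySlug.get(want.slug);
    if (have === undefined) {
      missing.push(want.slug); // update-only: report the gap, never create
      continue;
    }
    for (const field of SYNCED_FIELDS) {
      if (have[field] !== want[field]) {
        diffs.push({ slug: want.slug, field, from: have[field], to: want[field] });
      }
    }
  }
  return { refused, missing, diffs };
}
```

A dry-run would print the plan and stop; an apply would write only the diffs, and only when refused is empty. Keeping the plan separate from the writes is what lets the dry-run show exactly what --apply would change.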

Per-env Mongo access

Aspect | Hetzner dev | Azure dev | Azure staging
URL | dev.securityv0.com | dev-azure.securityv0.com | staging.securityv0.com
Mongo | self-hosted sv0-main-mongo | self-hosted sv0-mongo container | Atlas (sv0-staging cluster)
DB | sv0_platform | sv0_dev | sv0_staging
Access | SSH tunnel to deploy@178.156.217.150 | cloudflared SSH to sv0admin@dev-azure-ssh.securityv0.com, then tunnel the container port (no docker-group needed; the port is host-published) | Atlas TLS from an allowlisted IP only; laptop is blocked, run from the VM / CI (see below)

Always pass MONGODB_URI + MONGODB_DB inline. If neither is set, the script defaults to mongodb://127.0.0.1:27017 / sv0_platform, so a stray local tunnel bound to :27017 makes that default silently target the wrong database.
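One way to rule out the silent default is a small guard in a wrapper that refuses to run unless both variables are set. The sketch below is hypothetical (the function name and the idea of a wrapper are not part of the existing script); it only illustrates the rule stated above.

```typescript
// Hypothetical guard for a wrapper script; the real reconciler falls back to defaults instead.
interface MongoTarget { uri: string; db: string }

function requireExplicitMongoTarget(env: Record<string, string | undefined>): MongoTarget {
  const uri = env["MONGODB_URI"]?.trim();
  const db = env["MONGODB_DB"]?.trim();
  if (!uri || !db) {
    // Fail loudly rather than fall back to mongodb://127.0.0.1:27017 / sv0_platform.
    throw new Error("Set both MONGODB_URI and MONGODB_DB explicitly before running the reconciler");
  }
  return { uri, db };
}
```

In a Node script this would be called as requireExplicitMongoTarget(process.env) before any connection is opened, and the resolved URI and DB name would be echoed so the operator can confirm the target before passing --apply.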

Hetzner dev

ssh -fN -L 27018:127.0.0.1:27017 deploy@178.156.217.150
cd sv0-platform # a checkout of merged main
MONGODB_URI=mongodb://127.0.0.1:27018 MONGODB_DB=sv0_platform npx tsx scripts/apply-canonical-tenants.ts # dry-run
MONGODB_URI=mongodb://127.0.0.1:27018 MONGODB_DB=sv0_platform npx tsx scripts/apply-canonical-tenants.ts --apply # if drift

Azure dev

VM vm-sv0-dev-1 in rg-sv0-dev. Reached through Cloudflare Access (see the Agent Auth for Deployed Envs and Azure VM Landing Zone runbooks) — requires the one-time ~/.ssh/config cloudflared block and a browser CF Access login (interactive; a headless agent cannot complete the OAuth). Mongo container is sv0-mongo, DB sv0_dev.

# 1. confirm the container's published Mongo port (run via az — sv0admin is NOT in the
# docker group, so `docker ...` over SSH returns "permission denied"). az runs as root:
az vm run-command invoke -g rg-sv0-dev -n vm-sv0-dev-1 --command-id RunShellScript \
--scripts "docker port sv0-mongo" --query "value[0].message" -o tsv
# -> 27017/tcp -> 127.0.0.1:27017 (published to the VM host, so the SSH tunnel works
# even without docker-group access)
# 2. tunnel it
ssh -fN -L 27019:127.0.0.1:27017 sv0admin@dev-azure-ssh.securityv0.com
# 3. dry-run against the sv0_dev DB
MONGODB_URI=mongodb://127.0.0.1:27019 MONGODB_DB=sv0_dev npx tsx scripts/apply-canonical-tenants.ts
  • Host-key change after a VM rebuild. A reprovisioned VM regenerates its SSH host keys, so you'll get REMOTE HOST IDENTIFICATION HAS CHANGED (a changed key is a hard refuse, no prompt). Verify the new key authoritatively before trusting it — read it from inside the VM via az and compare to the fingerprint SSH was offered:
    az vm run-command invoke -g rg-sv0-dev -n vm-sv0-dev-1 --command-id RunShellScript \
    --scripts "ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub" --query "value[0].message" -o tsv
    If the SHA256:… matches, it's a legitimate rebuild (not MITM): ssh-keygen -R dev-azure-ssh.securityv0.com, then reconnect (-o StrictHostKeyChecking=accept-new). If it does not match, stop.
  • Azure dev (sv0_dev) already holds the 6 canonical slugs (no default/demo-nimbus), so no rename is needed; it's pure reconcile.
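
The fingerprint comparison in the host-key step is easy to get wrong by eye. A minimal sketch of an exact comparison follows, assuming you paste the ssh-keygen -lf output from the VM and the text of the SSH warning; the helper names are hypothetical.

```typescript
// Pulls the first "SHA256:<base64>" token out of arbitrary text, or null if none is present.
function extractSha256Fingerprint(text: string): string | null {
  const match = /SHA256:[A-Za-z0-9+\/]+/.exec(text);
  return match?.[0] ?? null;
}

// True only when both inputs contain a fingerprint and the two are identical.
function hostKeyFingerprintsMatch(vmSideOutput: string, sshWarningText: string): boolean {
  const fromVm = extractSha256Fingerprint(vmSideOutput);
  const offered = extractSha256Fingerprint(sshWarningText);
  return fromVm !== null && offered !== null && fromVm === offered;
}
```

A false result, including the case where either text has no fingerprint, should be treated the same as a mismatch: stop and do not remove the old known_hosts entry.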

Azure staging (Atlas)

The persistent staging env (vm-sv0-staging-1 in rg-sv0-staging) is Atlas-only, DB sv0_staging on the sv0-staging Atlas cluster. The connection string lives in /etc/sv0/app.env on the VM and in the staging GitHub Environment secret MONGODB_URI.

The laptop path does not work — Atlas only accepts connections from allowlisted IPs. Per sv0-infrastructure/envs/staging/terraform.tfvars, persistent staging reaches Atlas via the shared NAT static egress IP, allowlisted on the project (the staging_atlas_ip_allowlist / drill_ip_allowlist vars are for the ephemeral drill cluster only, default off). The staging VM's egress is allowlisted; an operator laptop is not, so a direct connection fails with:

MongoNetworkError: ... tlsv1 alert internal error ... SSL alert number 80

That TLS alert 80 is Atlas's signature for a source IP that is not in the access list. It is not a credential or cluster problem (the staging app connects fine from the VM). Note that envs/prod/terraform.tfvars shows 0.0.0.0/0, but the live allowlist is restrictive: treat the live behaviour as the authority, assume IaC drift, and do not trust the checked-in 0.0.0.0/0.
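
If this check is scripted, the error can be classified from its message text. The sketch below is a heuristic based only on the message quoted above; the function name is hypothetical, and other failures could in principle produce the same alert.

```typescript
// Heuristic: matches the error text quoted in this runbook for an allowlist block.
function looksLikeAtlasAllowlistBlock(errorMessage: string): boolean {
  return /SSL alert number 80\b/.test(errorMessage) || /tlsv1 alert internal error/i.test(errorMessage);
}
```

A true result suggests moving to an allowlisted context (the VM or CI) before spending time on credentials or cluster health.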

So reconcile staging from an allowlisted context, not a laptop. Two practical options:

# A. Read-only dry-run / introspection from the VM (allowlisted), via the running api container:
az vm run-command invoke -g rg-sv0-staging -n vm-sv0-staging-1 --command-id RunShellScript \
--scripts 'docker exec sv0-api node -e "const {MongoClient}=require(\"mongodb\");(async()=>{const c=new MongoClient(process.env.MONGODB_URI);await c.connect();console.log(JSON.stringify(await c.db(process.env.MONGODB_DB).collection(\"tenants\").find({},{projection:{_id:0,slug:1,display_name:1,tenant_class:1,status:1}}).toArray()));await c.close();})()"' \
--query "value[0].message" -o tsv

# B. To actually reconcile: add the operator IP to the LIVE staging allowlist (resolve the
# IaC drift first), then run scripts/apply-canonical-tenants.ts from the laptop; or run it
# from a CI runner / the VM whose egress is already allowlisted.

Staging already holds all 6 canonical tenants (it is not create-first) — so it is reconcile-only, just from the right network.

First reconciliation (2026-05-20)

Order: dev → dev-azure → staging — dry-run and read the report before every --apply.

All three non-prod envs turned out to already hold the 6 canonical slugs (no rename and no create needed, just identity drift). The drift was nearly identical everywhere: the demo-w1 seed created the row with display_name equal to the slug, and because $setOnInsert only writes when the upsert inserts a new document, re-running the seed never corrected the existing row once the const set "Demo W1".
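
The $setOnInsert behaviour behind this drift can be modelled in a few lines. This is an in-memory imitation of MongoDB's upsert semantics ($setOnInsert applies only on insert, $set applies on both insert and update), not the seed's or the reconciler's actual code; all names are hypothetical.

```typescript
// Toy model of an upsert keyed by slug.
interface SeedFields { display_name?: string }
interface SeedDoc extends SeedFields { slug: string }
interface UpsertUpdate { $set?: SeedFields; $setOnInsert?: SeedFields }

function upsertBySlug(store: Map<string, SeedDoc>, slug: string, update: UpsertUpdate): void {
  const existing = store.get(slug);
  if (existing === undefined) {
    // Insert path: both operators contribute fields.
    store.set(slug, { slug, ...update.$setOnInsert, ...update.$set });
  } else {
    // Update path: $setOnInsert is ignored, only $set is applied.
    store.set(slug, { ...existing, ...update.$set });
  }
}

const tenantStore = new Map<string, SeedDoc>();
// First seed run: the row is created with display_name equal to the slug.
upsertBySlug(tenantStore, "demo-w1", { $setOnInsert: { display_name: "demo-w1" } });
// Later seed run after the const changed: the row already exists, so nothing changes.
upsertBySlug(tenantStore, "demo-w1", { $setOnInsert: { display_name: "Demo W1" } });
const afterReseed = tenantStore.get("demo-w1")?.display_name;
// Reconciler-style write: $set overwrites the existing value.
upsertBySlug(tenantStore, "demo-w1", { $set: { display_name: "Demo W1" } });
const afterReconcile = tenantStore.get("demo-w1")?.display_name;
console.log({ afterReseed, afterReconcile });
```

The seed is safe to re-run precisely because it never overwrites, which is also why correcting an existing row needs an explicit update like the reconciler's.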

env | DB | drift found | applied
Hetzner dev | sv0_platform | demo-w1 display_name "demo-w1" → "Demo W1" | ✅ → 6 in sync
Azure dev | sv0_dev | demo-w1 display_name; securityv0-internal status "evaluation" → "internal" | ✅ → 6 in sync
Azure staging | sv0_staging | demo-w1 display_name only | ✅ applied VM-side via az (laptop is allowlist-blocked) → 6 in sync

Production

Not covered by routine reconciliation. app.securityv0.com uses a different WorkOS app (powerful-falcon-83) and is customer-facing; apply only with explicit approval, same dry-run-first procedure, extra care on status. The full cross-env auto-promotion provisioner is deliberately deferred (see the design note sv0-platform .scratch/design-notes/2026-05-19-tenant-provisioning-promotion.md); this runbook is the manual interim.