ADR-016: Multi-Tenant Authentication Architecture
Status
Proposed (2026-04-09)
Supersedes: ADR-012: User Authentication Strategy — the dual-mode self-built approach (GitHub OAuth + Resend magic link) is superseded in scope. ADR-012 solved the "auth for <10 users" problem but did not address the multi-tenant data model, per-tenant SSO, /t/:slug URL scoping, SecurityV0 super-admin cross-tenant access, or per-tenant configuration — all of which are now explicit requirements.
Paired with: ADR-017: WorkOS as Authentication Provider — captures the vendor selection within the architecture defined here. ADR-016 is deliberately vendor-independent; if we ever migrate off the chosen provider, the architectural decisions in this ADR remain.
Context
The platform has no production-grade authentication or user model. Current state (verified in sv0-platform at commit 7dbce1b):
- Auth middleware (src/api/middleware/auth.ts): JWT/JWKS scaffolding exists but is switched off (REQUIRE_AUTH=false) in every non-production environment and in most production environments. When the bypass is active, every request receives principalId: "dev-auth-bypass", wildcard scopes, and whatever X-Tenant-Id header the client sends.
- Tenant model: No tenants collection. The tenant is a denormalized string on every document in all 17 Mongo collections. Tenants are implicitly created on first write. No metadata. No owner. No config.
- User model: No users collection. AuthContext is { principalId, method, scopes, tenantId }, built per-request, never persisted. Even when REQUIRE_AUTH=true, the JWT sub claim is trusted as-is with no local user directory.
- Frontend auth state (ui/src/context/auth-context.tsx, ui/src/pages/SettingsPage.tsx): Tenant ID and API key are entered by hand into a Settings form, stored in localStorage under sv0-auth, and injected into every request as x-tenant-id / x-api-key headers. No login page. No OTP. No magic link. No tenant list. No URL-based tenant scoping.
- Cloudflare Access: app.securityv0.com and dev.securityv0.com sit behind CF Zero Trust, but the backend does not read Cf-Access-Jwt-Assertion. CF Access is a network gate only; it does not participate in identity decisions inside the app.
- Per-tenant configuration (Jira integration URLs, branding, connector credentials, feature flags): does not exist.
What changed since ADR-012
ADR-012 was written for a "pre-pilot, <10 users, no vendor" constraint. Three things have shifted since then:
- Enterprise pipeline is forming. If a Fortune 500 CISO asks to install sv0-platform today, we have nothing to offer them — no SAML, no SCIM, no per-tenant isolation story, no IT-admin self-serve SSO onboarding. This is no longer a hypothetical; it is a sales blocker.
- Internal team is growing. The SecurityV0 team has started dogfooding and reviewing multiple tenants concurrently. Manually editing localStorage to switch between tenants is now a daily friction point, and there is no way to discover which tenants exist without asking the person who seeded them.
- Demo/evaluation motion is real. We run multiple concurrent prospect evaluations. Each needs an isolated tenant with demo data, a shareable URL, and a login flow that doesn't require us to hand out API keys over Slack.
These pressures invalidate three of ADR-012's foundational assumptions: (a) that tenant switching is rare enough to live in localStorage, (b) that internal admins are tenant-pinned at login, and (c) that a self-built GitHub OAuth + Resend magic link is cheaper than a vendor. At <5 users the self-built cost was right; at the scale we're headed toward, it isn't.
Requirements this ADR must satisfy
- Enterprise SAML/OIDC SSO per customer, with self-serve onboarding so a customer's IT admin can configure their own IdP connection without us writing code.
- Magic-link evaluation path for prospects in POC, demos, and small customers who won't set up SSO. Same login code path as enterprise customers.
- First-class tenant model — tenants are real objects with metadata, not a denormalized string.
- Cross-tenant SecurityV0 super-admins — our team can see and act on every tenant, select from a dropdown, and paste shareable links into Slack.
- URL-based tenant scoping — the tenant is in the URL path (e.g., /t/acme/clusters/abc123), not in a header or localStorage. Share links work. Deep links work. Tenant-lost-on-reload bugs are impossible.
- Per-tenant configuration surface — Jira base URL, branding, feature flags, connector credential references. Pre-provisioned by us now, self-service later.
- SecurityV0 as a first-class tenant too — we authenticate through the same provider as our customers, using the same flows. We do not maintain a parallel "admin login" that would drift from the customer flow.
- Clear ownership boundary between identity and authorization. Identity (who you are, what orgs you belong to) should live in the provider. Authorization (what you can do in sv0 specifically) should live in sv0. This ADR makes that boundary explicit.
- Low vendor lock-in. The architecture must survive a future migration to a different provider without touching the domain layer.
Decision
Adopt a B2B multi-tenant authentication architecture built around an external identity provider as the identity source of truth, with a thin local mirror for authorization and query performance. The specific vendor is selected in ADR-017; this ADR defines the architecture that would apply regardless of vendor choice.
The decision has six interlocking parts.
1. Identity source of truth lives in the provider
Users, organization membership, and authentication methods live in the external identity provider. We do not build our own user management UI, password reset flow, MFA flow, session management, or invitation system. The provider owns:
- User identity (email, name, profile metadata)
- Credentials (password, passkey, magic link tokens, social login, SAML assertion)
- Organization membership (which user is a member of which organization)
- Basic organization roles (e.g., admin, member within an organization)
- Session lifecycle (issuance, refresh, revocation)
- Admin-facing UIs for managing all of the above (for both SecurityV0 staff and customer IT admins)
Rationale: This is the single highest-leverage architectural decision. Building user management is a 6–8 week project with no product differentiation and permanent ongoing security surface area. A vendor that specializes in B2B auth will always do it better than we will. By drawing the boundary here, we reduce our auth code to ~500 lines (middleware + webhooks + mirror maintenance) and eliminate entire classes of bugs (session fixation, token replay, SCIM deprovisioning bugs, SAML XML signing flaws).
2. Tenants are first-class, modeled as provider Organizations
Every tenant — customer or SecurityV0-internal — is represented as an Organization in the identity provider and as a row in a new tenants collection in Mongo. The two are linked 1:1 by tenants.provider_org_id.
The tenants collection becomes the central hub for per-tenant metadata: slug, display name, status (evaluation | active | churned | internal), SSO enforcement flag, provisioning timestamps, and foreign keys to per-tenant configuration.
Rationale: Promoting the tenant from "denormalized string" to "real entity" unlocks everything downstream — the tenant switcher, the admin portal onboarding flow, per-tenant config, status reporting, and the SecurityV0 super-admin query ("show me all active tenants"). The 1:1 link with provider Organizations means we can flip a customer from magic-link eval to SAML enterprise by toggling a setting in the provider without any data migration on our side.
3. Local authorization mirror, webhook-synced
The backend maintains a thin local mirror of users and memberships for two reasons that cannot be served by calling the provider's API on every request:
- Performance. The auth middleware runs on every API request. A network call to the provider per request is unacceptable. We need local, indexed lookup.
- Cross-tenant queries. "Show me all tenants this super-admin can access" is a query over our own data, not the provider's. The provider knows about one organization at a time.
New collections:
- users: { _id, provider_user_id, email, display_name, is_super_admin, created_at, updated_at }. One row per provider user. is_super_admin is derived from membership in the SecurityV0 internal organization and cached for fast lookup.
- memberships: { _id, user_id, tenant_id, role, created_at, updated_at }. One row per (user, tenant) pair. Role is mirrored from the provider's organization role.
- tenants: { _id, slug, display_name, provider_org_id, status, sso_enforced, created_at, archived_at? }.
- tenant_configs: { _id, tenant_id, jira_base_url?, jira_project_key?, branding?, feature_flags?, connector_credential_refs? }. One row per tenant.
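The four documents above can be written down as TypeScript types. This is a sketch, not the schema in collections.ts: field names follow the list above, while the concrete types (string IDs instead of ObjectId, Date timestamps, Record-shaped branding, flag, and credential-reference maps) and the isTenantStatus guard are assumptions.

```typescript
// Sketch of the local mirror documents. Field names follow the list above;
// the concrete TypeScript types are assumptions, not the real collections.ts.

const TENANT_STATUSES = ["evaluation", "active", "churned", "internal"] as const;
type TenantStatus = (typeof TENANT_STATUSES)[number];

interface UserDoc {
  _id: string;
  provider_user_id: string;
  email: string;
  display_name: string;
  is_super_admin: boolean; // derived from membership in securityv0-internal
  created_at: Date;
  updated_at: Date;
}

interface MembershipDoc {
  _id: string;
  user_id: string;
  tenant_id: string;
  role: string; // mirrored verbatim from the provider's organization role
  created_at: Date;
  updated_at: Date;
}

interface TenantDoc {
  _id: string;
  slug: string;
  display_name: string;
  provider_org_id: string; // 1:1 link to the provider Organization
  status: TenantStatus;
  sso_enforced: boolean;
  created_at: Date;
  archived_at?: Date;
}

interface TenantConfigDoc {
  _id: string;
  tenant_id: string;
  jira_base_url?: string;
  jira_project_key?: string;
  branding?: Record<string, string>;
  feature_flags?: Record<string, boolean>;
  connector_credential_refs?: Record<string, string>; // references, not the credentials themselves
}

// Runtime guard for status values coming from scripts or admin tooling.
function isTenantStatus(value: string): value is TenantStatus {
  return (TENANT_STATUSES as readonly string[]).includes(value);
}
```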
The mirror is kept in sync by a webhook receiver at POST /api/v1/webhooks/<provider>. The provider sends events for user.created, user.updated, organization_membership.created, organization_membership.updated, organization_membership.deleted, organization.created, and (if supported) organization.deleted. Each event upserts or deletes the corresponding local mirror row. The receiver verifies the signature on every webhook and processes events idempotently, so a redelivered event has no additional effect.
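A minimal sketch of the two receiver properties, assuming an HMAC-SHA256 scheme in which the provider signs `${timestamp}.${rawBody}` with a shared secret and sends a millisecond timestamp plus a hex digest. The real header names and signing format come from the provider selected in ADR-017; verifySignature, processOnce, the five-minute tolerance, and the in-memory seen-set are all hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex(HMAC-SHA256(secret, `${timestamp}.${rawBody}`)) with a
// millisecond timestamp. Confirm against the ADR-017 provider's docs before use.
function verifySignature(
  rawBody: string,
  timestamp: string,
  signatureHex: string,
  secret: string,
  nowMs: number = Date.now(),
  toleranceMs: number = 5 * 60 * 1000,
): boolean {
  const ts = Number(timestamp);
  // Reject malformed or stale timestamps to narrow the replay window.
  if (!Number.isFinite(ts) || Math.abs(nowMs - ts) > toleranceMs) return false;

  const expected = createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

interface WebhookEvent {
  id: string; // provider-assigned event ID, assumed stable across redeliveries
  event: string; // e.g. "organization_membership.created"
  data: unknown;
}

// In-memory stand-in; a real receiver would persist processed IDs (e.g. in a
// Mongo collection with a unique index) so deduplication survives restarts.
const processedEventIds = new Set<string>();

// Applies each event at most once. The ID is recorded only after apply()
// succeeds, so a failed apply can be retried on the provider's next delivery.
function processOnce(evt: WebhookEvent, apply: (evt: WebhookEvent) => void): "applied" | "duplicate" {
  if (processedEventIds.has(evt.id)) return "duplicate";
  apply(evt);
  processedEventIds.add(evt.id);
  return "applied";
}
```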
The mirror is never the source of truth. If the mirror and the provider disagree, the provider wins. A daily reconciliation job compares the mirror against the provider and corrects any drift, including rows whose webhook events were missed. Local mutations never write back to the provider; user management happens in the provider's dashboards and Admin Portal, not in sv0.
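The reconciliation step reduces to a pure diff: given the provider's membership list (authoritative) and the local mirror, compute what to upsert and what to delete. A sketch, assuming memberships are keyed by (provider user ID, provider org ID) and that paging through the provider API has already happened; diffMemberships and MembershipRecord are hypothetical names.

```typescript
interface MembershipRecord {
  providerUserId: string;
  providerOrgId: string;
  role: string;
}

interface ReconcilePlan {
  upserts: MembershipRecord[]; // missing locally, or the role has drifted
  deletes: MembershipRecord[]; // present locally but gone from the provider
}

const membershipKey = (m: MembershipRecord): string => `${m.providerUserId}|${m.providerOrgId}`;

// The provider is the source of truth: local rows are only ever corrected toward it.
function diffMemberships(provider: MembershipRecord[], local: MembershipRecord[]): ReconcilePlan {
  const localByKey = new Map(local.map((m) => [membershipKey(m), m] as const));
  const providerKeys = new Set(provider.map(membershipKey));

  const upserts = provider.filter((p) => {
    const existing = localByKey.get(membershipKey(p));
    return existing === undefined || existing.role !== p.role;
  });
  const deletes = local.filter((l) => !providerKeys.has(membershipKey(l)));
  return { upserts, deletes };
}
```

Keeping the diff pure means the job's Mongo writes can be reviewed or dry-run separately from the comparison logic.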
Rationale: Provider-owned identity with a local mirror is the conventional pattern for B2B SaaS on managed auth providers; B2B auth vendors recommend it, and it is common at our target scale. It gives us fast authorization decisions without reinventing user management, and the webhook+reconciliation combo keeps drift bounded.
4. URL-scoped tenant routing
All tenant-scoped UI routes move from their current flat structure (/clusters, /findings/:id) to a tenant-prefixed structure (/t/:tenantSlug/clusters, /t/:tenantSlug/findings/:id). The tenant slug in the URL is the sole source of tenant context — localStorage and headers no longer carry tenant state.
The backend derives the tenant from two places and requires them to match:
- The URL parameter passed by the frontend (or the X-Tenant-Id header for programmatic API clients).
- The user's validated membership in that tenant from the local mirror.
If the authenticated user is not a member of the URL's tenant (and is not a super-admin), the request is rejected with 404 (not 403 — we do not leak tenant existence).
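The matching rule can be sketched framework-free: resolve the slug, look up the tenant and the caller's membership in the local mirror, and collapse "no such tenant" and "not a member" into the same 404. The Mirror interface, authorizeTenant, and the result shape are hypothetical; a real version would sit inside the Express middleware and query Mongo.

```typescript
interface Principal {
  userId: string;
  isSuperAdmin: boolean;
}

// Hypothetical read interface over the local mirror (tenants + memberships).
interface Mirror {
  findTenantBySlug(slug: string): { id: string; slug: string } | undefined;
  findMembership(userId: string, tenantId: string): { role: string } | undefined;
}

type TenantDecision =
  | { ok: true; tenantId: string; role: string | null } // role is null for super-admin implicit access
  | { ok: false; status: 404 };

function authorizeTenant(principal: Principal, slug: string | undefined, mirror: Mirror): TenantDecision {
  const notFound: TenantDecision = { ok: false, status: 404 };
  // A missing slug is treated like an unknown tenant here (an assumption).
  if (!slug) return notFound;

  const tenant = mirror.findTenantBySlug(slug);
  // "No such tenant" and "not a member" return the same 404, so existence is not leaked.
  if (!tenant) return notFound;

  // Super-admins skip the membership check (§5).
  if (principal.isSuperAdmin) return { ok: true, tenantId: tenant.id, role: null };

  const membership = mirror.findMembership(principal.userId, tenant.id);
  if (!membership) return notFound;
  return { ok: true, tenantId: tenant.id, role: membership.role };
}
```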
Rationale: URL-scoped tenants solve three problems at once:
- Shareable links — SecurityV0 staff and customers can paste /t/acme/findings/123 into Slack and it works for everyone authorized.
- No tenant drift — reloading the page, opening a new tab, or navigating via browser back never loses tenant context.
- Clean middleware contract — the auth middleware has a single deterministic source of truth for tenant context and rejects mismatches immediately.
This does break every existing deep link, but the existing deep links are internal and can be rewritten in the same PR.
5. SecurityV0 internal is its own tenant
The SecurityV0 team is not a separate "admin subsystem". It is an internal tenant — specifically, a special organization in the identity provider with slug securityv0-internal — authenticated through the exact same flow as our customers. Team members log in via the provider, get a session, and appear in the local users table like any other user.
The difference is one flag: users.is_super_admin = true is set for any user who is a member of the securityv0-internal organization (derived from the membership webhook). A super-admin gets implicit membership in every tenant when making API requests — the auth middleware skips the membership check for super-admins.
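Because the flag is derived rather than edited, it can be recomputed from the user's mirrored memberships whenever a membership webhook for that user arrives. A sketch, assuming the internal organization is identified by its provider org ID (the constant below is a placeholder; a real deployment would read it from configuration) and that the user's memberships are already loaded; deriveIsSuperAdmin is a hypothetical name.

```typescript
// Placeholder for the provider org ID of the securityv0-internal organization.
const INTERNAL_ORG_ID = "org_securityv0_internal";

interface MirroredMembership {
  providerOrgId: string;
  role: string;
}

// Recomputed from the user's full membership set on every organization_membership.*
// event, so removal from the internal org also clears the flag on the next sync.
function deriveIsSuperAdmin(
  memberships: readonly MirroredMembership[],
  internalOrgId: string = INTERNAL_ORG_ID,
): boolean {
  return memberships.some((m) => m.providerOrgId === internalOrgId);
}
```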
Internal roles (owner, engineer, analyst, read-only) are modeled as memberships within the SecurityV0 internal organization using the provider's role system, not as a separate sv0-specific table. This means role changes for SecurityV0 staff happen in the provider's dashboard, not in our code.
Rationale: "Eat our own dog food" is not a cliché here — it is the only way to ensure the customer auth flow works. If we maintained a parallel internal login (e.g., GitHub OAuth for us, SAML for them), the two paths would drift, we would debug them in isolation, and the customer path would accumulate bugs we never see. Using the same provider, organization model, and login flow for ourselves guarantees the customer path is continuously exercised.
The is_super_admin flag is the only sv0-specific authorization concept that sits outside the provider's role model. It is deliberately binary: you either can see every tenant or you can't. Finer-grained internal roles (who can delete findings, who can edit tenant configs) are expressed as roles within the internal org, not as new global flags.
6. Authorization split between provider roles and sv0 permissions
The clean boundary between identity and authorization:
| Concern | Owned by | Examples |
|---|---|---|
| Identity: who is this person, what orgs do they belong to | Provider | user.email, user.provider_user_id, membership.tenant_id, membership.role |
| Authorization: what can this person do inside sv0 | sv0 | "Can edit tenant config", "can mark finding as accepted risk", "can invite users to this tenant", "can generate Admin Portal link" |
The provider gives us roles (e.g., admin, member). sv0 maps those roles to permissions at the middleware level — a small, explicit, versioned table of {role → [permission, ...]}, living in code. Adding a new permission means editing a file and writing tests, not clicking around a vendor dashboard.
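A minimal sketch of such a table, assuming the provider roles admin and member named above. The permission identifiers are hypothetical renderings of the examples in the table, and which role receives which permission is illustrative rather than a decision recorded here.

```typescript
// Hypothetical permission IDs, named after the examples in the table above.
type Permission =
  | "tenant_config:edit"
  | "finding:accept_risk"
  | "tenant_users:invite"
  | "admin_portal_link:generate";

// Role-to-permissions table, versioned in code. Assignments are illustrative only.
// A Map (rather than a plain object) avoids accidental hits on inherited keys
// such as "constructor" when the role string comes from an external token.
const ROLE_PERMISSIONS: ReadonlyMap<string, ReadonlySet<Permission>> = new Map([
  ["admin", new Set<Permission>(["tenant_config:edit", "finding:accept_risk", "tenant_users:invite"])],
  ["member", new Set<Permission>(["finding:accept_risk"])],
]);

// Deny by default: an unknown role has no permissions.
function hasPermission(role: string, permission: Permission): boolean {
  return ROLE_PERMISSIONS.get(role)?.has(permission) ?? false;
}
```

Adding a permission then means extending the Permission union and the table in one reviewed diff, which the type checker forces to stay consistent.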
Rationale: Pushing all authorization into the provider's role system would couple us tightly to the vendor and make permission changes slow (dashboard clicks, no version control). Keeping authorization in our code keeps it grep-able, testable, and version-controlled. The provider remains the source of truth for who has which role, but what each role can do is sv0's business.
Alternatives Considered
A. Build it ourselves (ADR-012 approach)
GitHub OAuth + Resend magic link + self-hosted sessions. This is the path ADR-012 chose and the one the existing 2026-03-01-user-authentication-plan.md documents.
Why rejected: Acceptable for <10 users; not acceptable for enterprise. Does not provide SAML, OIDC, SCIM, or Admin Portal. Every enterprise customer would become a custom integration. Building these features ourselves is a 6–8 week project with permanent ongoing maintenance burden and a large security surface area. The cost to reach parity with a managed provider is higher than the cost of simply using one.
B. Hostname-scoped tenants (acme.app.securityv0.com)
Classic SaaS multi-tenancy. Each tenant gets a subdomain, TLS is wildcard, and the tenant is derived from req.hostname.
Why rejected: Subdomains add three sources of friction: wildcard TLS management, per-subdomain Cloudflare Access applications (our infrastructure-level gate), and CORS configuration for every new tenant. /t/:slug URL scoping gives us the same "shareable link" property without any of these. We may revisit hostname scoping later if we need to satisfy a compliance requirement that mandates stronger origin isolation, but it should not be a day-1 cost.
C. Provider owns all authorization
Push every permission decision into the provider's role and permission model. sv0 has no local authorization code — it just checks the provider's API for "can user X perform action Y on resource Z?".
Why rejected: Unacceptable coupling. Authorization changes become vendor-dashboard changes. Permission logic is no longer grep-able or version-controlled. Latency per API call balloons because every decision requires a provider call. And if we ever migrate vendors, we have to re-build our entire authorization model in the new provider's language. The hybrid approach — provider owns identity and roles, sv0 owns permissions — gives us the best of both.
D. No local mirror — query the provider on every request
Technically possible. Avoids the webhook complexity and the eventual-consistency questions.
Why rejected: Every API request adds a network roundtrip to the provider. For endpoints that make multiple internal calls (evidence pack generation, graph queries), this compounds. Response times go from <100ms to 500ms+. The local mirror is the industry-standard solution, and the webhook and reconciliation machinery is mostly a one-time build cost rather than a cost paid on every request.
E. Cloudflare Access as the only auth layer
Use the existing CF Zero Trust gate as the identity provider. Backend reads Cf-Access-Jwt-Assertion and extracts identity from there. No other vendor needed.
Why rejected: CF Access is a network gate, not a B2B identity provider. It has no concept of organizations, memberships, roles, self-serve SSO configuration, or Admin Portal. Every customer SSO setup would require us to manually configure a CF Access application in the CF dashboard, which caps our customer count at ~20 before it operationally breaks. CF Access remains valuable as a defense-in-depth network perimeter (we keep it) but cannot substitute for the app-layer identity provider. See the operating policy in the auth architecture doc for how the two layers compose.
Consequences
Positive
- Enterprise SSO is unblocked. Any customer IT admin can self-serve their SAML/OIDC configuration the same day we sign the contract, without engineering involvement.
- Evaluation and paid flows share code. Magic link for POCs and SAML for enterprise are the same login path with different settings per organization. No "demo mode" vs "real mode" divergence.
- SecurityV0 team unblocked for cross-tenant work. The tenant switcher dropdown replaces localStorage editing. Shareable URLs replace Slack messages saying "go to Settings, paste this tenant ID".
- Security surface area shrinks dramatically. Session management, CSRF, MFA, passwordless token rotation, SAML XML signing, SCIM parsing — none of these are our code to maintain or audit.
- Clean vendor exit story. If we ever migrate off the chosen provider, the domain layer (findings, entities, evaluator, UI) is untouched. We rewrite the auth middleware, update the webhook receiver, and backfill the local mirror from the new provider. The architecture in this ADR survives the migration.
- Per-tenant configuration gets a home. tenant_configs gives us a stable place to put Jira integration URLs, branding, connector credential references, and feature flags — all things we have wanted but had no schema for.
- We eat our own dog food. SecurityV0 staff use the same login flow as our customers, which continuously exercises the customer path.
Negative
- Vendor cost. See ADR-017 for the specific provider's pricing; at the scale we care about (first several enterprise customers), this is on the order of $125/enterprise-customer/month, which is a rounding error against ACV but is a new line item we did not have before.
- Vendor availability risk. If the provider has an outage, new logins fail. Existing sessions continue to work because the local mirror and session cookies do not require a live provider connection for the duration of the session, but new users can't log in. Mitigation: document the runbook, consider session length, monitor provider status.
- Webhook complexity. We now maintain a webhook receiver with signature verification, idempotency, and reconciliation. This is a new operational surface but a small one (~200 lines).
- Breaking change for the UI. The switch to /t/:slug/... routes invalidates existing deep links. Internal deep links are rewritten in the same PR; external deep links (if any have been shared) get a 410 or a redirect. X-Tenant-Id header and X-API-Key auth are removed. Any external tooling that relies on these (scripts, seed tools, Postman collections) must be updated. Seed tools get a service-token path (see ADR-017 §Consequences).
- Local dev ergonomics require care. REQUIRE_AUTH=false is replaced with a "dev bootstrap" that mints a seeded super-admin user in the local DB. Developers do not need to log in through the provider to run the platform locally, but the code paths that run locally are identical to production (same middleware, same mirror lookup).
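The one property this bootstrap must guarantee is that it can never run in production. A sketch of that guard, assuming an opt-in flag (SV0_DEV_BOOTSTRAP is a hypothetical name) and a NODE_ENV allow-list. The returned row would then be upserted into the users collection so the normal middleware and mirror lookup find it; persistence and session minting are omitted.

```typescript
// Hypothetical flag name; the ADR only says REQUIRE_AUTH=false is replaced by a dev bootstrap.
const BOOTSTRAP_FLAG = "SV0_DEV_BOOTSTRAP";

interface SeedSuperAdmin {
  provider_user_id: string;
  email: string;
  display_name: string;
  is_super_admin: true;
}

// Returns the user to upsert into the local users collection, or null when the
// bootstrap is not requested. Allow-listing NODE_ENV means an unset or
// misspelled environment fails closed instead of minting a super-admin.
function devBootstrapUser(env: Record<string, string | undefined>): SeedSuperAdmin | null {
  if (env[BOOTSTRAP_FLAG] !== "true") return null;
  if (env.NODE_ENV !== "development" && env.NODE_ENV !== "test") {
    throw new Error(`${BOOTSTRAP_FLAG} is only allowed when NODE_ENV is development or test`);
  }
  return {
    provider_user_id: "dev_local_super_admin", // placeholder ID for local use only
    email: "dev@localhost",
    display_name: "Local Dev Super-Admin",
    is_super_admin: true,
  };
}
```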
Graduation criteria
This ADR is considered successfully implemented when all of the following are true:
- A SecurityV0 team member can log in via the provider, see every tenant in the dropdown, click into any tenant, and land on a working page.
- A SecurityV0 team member can paste /t/acme/findings/abc123 into Slack, and another team member clicks it and lands on the same view (after logging in if needed).
- A brand-new prospect can be provisioned via one script command (provision-eval-tenant.ts), which creates the provider Organization, creates the tenants row, sends an invitation email, and seeds demo data.
- A prospect can click the invitation, log in via magic link, and see their evaluation tenant with demo data.
- A customer can be converted from evaluation to paid by the SecurityV0 team generating an Admin Portal link and sending it to their IT admin, who configures SAML without any sv0 engineer involvement.
- REQUIRE_AUTH=false is replaced with a dev-bootstrap super-admin, and the old header-based auth is removed from the codebase.
- Old SettingsPage tenant/api-key form is deleted.
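The one-command provisioning criterion implies a fixed ordering of side effects. A sketch of that orchestration in the order listed above, assuming a hypothetical ProvisioningDeps interface over the provider SDK, Mongo, the invitation sender, and the demo-data seeder; none of these names come from the real script or any vendor SDK, and the slug rule is an assumption.

```typescript
// Hypothetical dependency surface; not a vendor SDK or the real script's API.
interface ProvisioningDeps {
  createProviderOrg(displayName: string): Promise<{ id: string }>;
  insertTenant(row: {
    slug: string;
    display_name: string;
    provider_org_id: string;
    status: "evaluation";
  }): Promise<{ id: string }>;
  sendInvitation(providerOrgId: string, email: string): Promise<void>;
  seedDemoData(tenantId: string): Promise<void>;
}

interface EvalTenantInput {
  slug: string;
  displayName: string;
  inviteeEmail: string;
}

// Assumed slug rule: lowercase alphanumerics with inner hyphens, 1 to 40 characters.
const SLUG_PATTERN = /^[a-z0-9](?:[a-z0-9-]{0,38}[a-z0-9])?$/;

async function provisionEvalTenant(deps: ProvisioningDeps, input: EvalTenantInput): Promise<string> {
  // Validate before any side effect, so a bad slug leaves nothing half-created.
  if (!SLUG_PATTERN.test(input.slug)) throw new Error(`invalid tenant slug: ${input.slug}`);

  // 1. Provider Organization first: the tenants row needs its ID for the 1:1 link.
  const org = await deps.createProviderOrg(input.displayName);
  // 2. Local tenants row, linked via provider_org_id and starting in "evaluation".
  const tenant = await deps.insertTenant({
    slug: input.slug,
    display_name: input.displayName,
    provider_org_id: org.id,
    status: "evaluation",
  });
  // 3. Invitation email, then 4. demo data.
  await deps.sendInvitation(org.id, input.inviteeEmail);
  await deps.seedDemoData(tenant.id);
  return tenant.id;
}
```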
Deferred to later (not in this ADR)
- SCIM (Directory Sync). Included in the chosen provider but enabled per-customer on demand, not day-1.
- Audit log UI. We will emit audit events from the webhook receiver and key auth middleware paths, but the UI for viewing them is a follow-up.
- MFA enforcement policy per tenant. The provider supports this; we surface the toggle in the tenant config UI as a follow-up.
- Service tokens for programmatic API access (e.g., CI/CD, seed scripts) — design in ADR-017 but delivery is phased.
- Migration tool for existing tenant data — currently every tenant is a string literal in documents; no existing tenant has a tenants row. The implementation plan includes a backfill step.
Related
- Paired decision: ADR-017: WorkOS as Authentication Provider — vendor choice.
- Superseded ADR: ADR-012: User Authentication Strategy — the previous self-built approach.
- Architecture reference: 13 — Authentication and User Management — the living doc describing how auth actually works in sv0 once this ADR is implemented.
- Implementation plan: 2026-04-09 WorkOS Auth Implementation — phased rollout.
- Research: 2026-04-09 Provider comparison — the evaluation that informed ADR-017.
- Infrastructure context: Cloudflare Zero Trust perimeter — how CF Access composes with the app-layer identity provider.
- Current code that will change: sv0-platform/src/api/middleware/auth.ts, sv0-platform/ui/src/context/auth-context.tsx, sv0-platform/ui/src/pages/SettingsPage.tsx, sv0-platform/ui/src/App.tsx, sv0-platform/src/storage/mongo/collections.ts.
Addendum — 2026-04-22: URL tenant scoping revisited
Triggered by Ivan questioning Decision §4 (URL-scoped tenant routing) on two grounds: (a) /t/:slug/... exposes tenant slug in every URL and may read as a dev tool to enterprise buyers, and (b) inclusion in the sv0_http_request_duration_seconds{route} Prometheus label could inflate cardinality. Opus research agent investigated with file:line evidence. Findings:
Current implementation state (as of 2026-04-22)
ADR-016 is approximately 70% implemented, with the remaining 30% deliberately deferred:
- UI routing — live. /t/:tenantSlug/* is mounted at sv0-platform/ui/src/App.tsx:157 with 22 child routes (clusters, findings, reports, graph, etc.). LegacyRedirect (same file, line 207) rewrites non-prefixed URLs, using sessionStorage("sv0-last-tenant") as the fallback. The tenant source of truth in the UI is the URL, not localStorage (ui/src/context/auth-context.tsx:14-18). Merged in dbb4cf4.
- API routing — NOT migrated. Every route is still mounted at /api/v1/* (sv0-platform/src/api/app.ts:160-177), not under /t/:slug/api/v1/*. The auth middleware resolves the tenant from req.params.tenantSlug or req.header("x-tenant-id"), and the OR always falls through to the header, because :tenantSlug never appears on API routes (sv0-platform/src/api/middleware/auth-middleware.ts:201-203). The UI reads the slug from the URL and injects it into the x-tenant-id header (ui/src/api/client.ts:33,63).
- Practical meaning: the browser URL reads "tenant-scoped," but the HTTP API is still header-driven. Whether to migrate the API under /t/:slug/ is an open question; doing so is 4–8 hours of work and is not required to preserve the UI's deep-link / no-tenant-drift properties that motivated §4.
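The LegacyRedirect behaviour described above amounts to a small path-rewriting rule. A sketch as a pure function, assuming the last-used slug has already been read from sessionStorage("sv0-last-tenant") and that a missing slug sends the user to a tenant picker (the /tenants path is hypothetical); this is not the code at App.tsx:207.

```typescript
// Rewrites a pre-ADR-016 flat path (e.g. /findings/123) into its tenant-scoped form.
// lastTenantSlug is whatever was read from sessionStorage("sv0-last-tenant"), or null.
function legacyRedirectTarget(pathname: string, search: string, lastTenantSlug: string | null): string {
  // Already tenant-scoped: leave it alone.
  if (pathname === "/t" || pathname.startsWith("/t/")) return pathname + search;
  // No remembered tenant: send the user to a picker (hypothetical route).
  if (!lastTenantSlug) return "/tenants";
  const rest = pathname === "/" ? "" : pathname;
  return `/t/${encodeURIComponent(lastTenantSlug)}${rest}${search}`;
}
```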
Cardinality — not an issue today
The HTTP route label in sv0_http_request_duration_seconds uses req.route.path (the parameterized Express pattern, e.g., /api/v1/findings/:id), not req.path (raw URL). See sv0-platform/src/api/middleware/metrics.ts:11. Tenant slug never lands in that label even if the API were mounted under /t/:tenantSlug/..., because the parameter name — not the value — is what appears in the label.
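The distinction can be made concrete with a label function over the two request fields. A sketch with the request reduced to those fields so it runs without Express; routeLabel is hypothetical, and the single "unmatched" value for requests that hit no route is an assumed safeguard (whether metrics.ts:11 does the same is not stated here).

```typescript
// Reduced request shape so the sketch runs without Express.
interface LabelSource {
  route?: { path: string }; // matched pattern, e.g. "/api/v1/findings/:id"; undefined when nothing matched
  path: string; // raw path, e.g. "/api/v1/findings/abc123"; deliberately never used as a label
}

function routeLabel(req: LabelSource): string {
  // Unmatched requests (404s, scanners) share one label instead of minting a series per URL.
  return req.route ? req.route.path : "unmatched";
}
```

One caveat on the claim above: the parameter-name property holds for whatever part of the pattern lives in req.route.path. If the API were instead mounted with app.use("/t/:tenantSlug/...", router), the tenant segment would sit in the mount path, and Express exposes the concrete matched mount path as req.baseUrl, so a label that prepends req.baseUrl would reintroduce the slug value.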
The real cardinality leak is sv0_job_duration_seconds{tenant_id} (sv0-platform/src/shared/metrics/metrics.ts:25) — an explicit metric label independent of URL shape. Fix is P0-5 in the 2026-04-21 readiness review, orthogonal to this ADR.
Enterprise aesthetic — split, not dispositive
Vendor URL survey across comparable tools:
| Pattern | Examples |
|---|---|
| Tenant hidden in session / no tenant in URL | Wiz, Orca Security, Vanta, Drata |
| Slug in path | Snyk (/org/:slug/...), Linear, GitHub, Auth0 |
| Subdomain per tenant | Okta, Slack |
There is no universal pattern. The pure security-tool segment (Wiz, Orca, Vanta, Drata) leans toward hiding tenant; engineering-adjacent security tools (Snyk, Auth0) expose it in path. /t/mediapro/clusters/abc reads as intentional, not amateurish. MediaPro-specific pushback, if it materializes, is the trigger to revisit — not speculative concern.
Dedicated-deployment mode (future)
In a dedicated single-tenant deployment (sv0.client.com/... dedicated to one customer), the /t/:slug/ prefix is redundant and confusing. ADR-016 §4 did not anticipate this mode.
Resolution: add a SINGLE_TENANT_SLUG environment variable to be introduced when the first dedicated-deployment client signs, not speculatively now. When set, the UI mounts routes flat (without /t/:slug/ prefix) and useTenantPath becomes a no-op; the API middleware resolves tenant from the env var rather than URL/header. Estimated effort 8–16 hours at the time it is needed. This deliberately trades a slightly larger later-cost for zero up-front cost and no constraint on the pre-pilot sprint.
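The proposed precedence can be sketched as one resolver for the middleware: the env var, when set, wins outright; otherwise the route parameter and then the x-tenant-id header, mirroring the current fallthrough at auth-middleware.ts:201-203 described earlier in this addendum. resolveTenantSlug and its input shape are hypothetical, and rejecting a conflicting request slug in single-tenant mode is an assumption rather than part of the resolution above.

```typescript
interface SlugSources {
  singleTenantSlug?: string; // SINGLE_TENANT_SLUG (proposed, not yet implemented)
  paramSlug?: string; // req.params.tenantSlug (absent on today's flat /api/v1 routes)
  headerSlug?: string; // x-tenant-id header injected by ui/src/api/client.ts
}

// Returns the tenant slug to authorize against, or null when the request cannot be scoped.
function resolveTenantSlug(src: SlugSources): string | null {
  const requested = src.paramSlug || src.headerSlug || undefined;
  if (src.singleTenantSlug) {
    // Dedicated deployment: the env var is authoritative. A conflicting slug from
    // the request is treated as an error (assumption) rather than silently ignored.
    if (requested && requested !== src.singleTenantSlug) return null;
    return src.singleTenantSlug;
  }
  return requested ?? null;
}
```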
Decision
ADR-016 §4 stands for the MediaPro pilot. The /t/:slug/... UI pattern remains; the API remains flat and header-driven; no subdomain migration; no reversal.
Revisit thresholds
Reconsider subdomain scoping (Option B: mediapro.app.securityv0.com/...) when any of these fire:
- >50 active tenants (cross-tenant deep-link sharing + per-tenant branding start pulling their weight);
- Explicit enterprise-buyer pushback on URL aesthetic tied to a deal;
- First dedicated-deployment client signs — at that point, add the SINGLE_TENANT_SLUG flag (NOT a subdomain migration); the two modes can coexist.
Options considered and rejected
| Option | Engineering hours | Rejected because |
|---|---|---|
| A. Status quo (/t/:slug/) | 0 (kept) | Not rejected (retained) |
| B. Subdomain per tenant (mediapro.app.securityv0.com) | 60–120 | Cost exceeds benefit for a pilot sprint; revisit at one of the thresholds above |
| C. Clean URL + tenant-in-session | 20–30 | Reintroduces the tenant-lost-on-reload bug class §4 was designed to prevent |
| D. Hybrid subdomain + /t/ super-admin view | 80–140 | Worst complexity; two code paths, cross-subdomain cookie dance |
| E. Cloudflare Worker vanity alias | 40–80 | Fragile; the Worker becomes a failure mode |
References
- Research: readiness review v2.x research thread, agent output archived in the PR #190 commit trail
- Related: §2.1 of docs/architecture/research/2026-04-22-observability-stack.md (cardinality discussion)