
ADR-016: Multi-Tenant Authentication Architecture

Status

Proposed (2026-04-09)

Supersedes: ADR-012: User Authentication Strategy — the dual-mode self-built approach (GitHub OAuth + Resend magic link) is superseded in scope. ADR-012 solved the "auth for <10 users" problem but did not address the multi-tenant data model, per-tenant SSO, /t/:slug URL scoping, SecurityV0 super-admin cross-tenant access, or per-tenant configuration — all of which are now explicit requirements.

Paired with: ADR-017: WorkOS as Authentication Provider — captures the vendor selection within the architecture defined here. ADR-016 is deliberately vendor-independent; if we ever migrate off the chosen provider, the architectural decisions in this ADR remain.


Context

The platform has no production-grade authentication or user model. Current state (verified in sv0-platform at commit 7dbce1b):

  • Auth middleware (src/api/middleware/auth.ts): JWT/JWKS scaffolding exists but is bypassed whenever REQUIRE_AUTH=false, which is the setting in every non-production environment and in most production environments. When the bypass is active, every request receives principalId: "dev-auth-bypass", wildcard scopes, and whatever X-Tenant-Id header the client sends.
  • Tenant model: No tenants collection. The tenant is a denormalized string on every document in all 17 Mongo collections. Tenants are implicitly created on first write. No metadata. No owner. No config.
  • User model: No users collection. AuthContext is { principalId, method, scopes, tenantId }, built per-request, never persisted. Even when REQUIRE_AUTH=true, the JWT sub claim is trusted as-is with no local user directory.
  • Frontend auth state (ui/src/context/auth-context.tsx, ui/src/pages/SettingsPage.tsx): Tenant ID and API key are entered by hand into a Settings form, stored in localStorage under sv0-auth, and injected into every request as x-tenant-id / x-api-key headers. No login page. No OTP. No magic link. No tenant list. No URL-based tenant scoping.
  • Cloudflare Access: app.securityv0.com and dev.securityv0.com sit behind CF Zero Trust, but the backend does not read Cf-Access-Jwt-Assertion. CF Access is a network gate only; it does not participate in identity decisions inside the app.
  • Per-tenant configuration (Jira integration URLs, branding, connector credentials, feature flags): does not exist.

What changed since ADR-012

ADR-012 was written for a "pre-pilot, <10 users, no vendor" constraint. Three things have shifted since then:

  1. Enterprise pipeline is forming. If a Fortune 500 CISO asks to install sv0-platform today, we have nothing to offer them — no SAML, no SCIM, no per-tenant isolation story, no IT-admin self-serve SSO onboarding. This is no longer a hypothetical; it is a sales blocker.
  2. Internal team is growing. The SecurityV0 team has started dogfooding and reviewing multiple tenants concurrently. Manually editing localStorage to switch between tenants is now a daily friction point, and there is no way to discover which tenants exist without asking the person who seeded them.
  3. Demo/evaluation motion is real. We run multiple concurrent prospect evaluations. Each needs an isolated tenant with demo data, a shareable URL, and a login flow that doesn't require us to hand out API keys over Slack.

These pressures invalidate three of ADR-012's foundational assumptions: (a) that tenant switching is rare enough to live in localStorage, (b) that internal admins are tenant-pinned at login, and (c) that a self-built GitHub OAuth + Resend magic link is cheaper than a vendor. At <10 users the self-built cost was right; at the scale we're headed toward, it isn't.

Requirements this ADR must satisfy

  1. Enterprise SAML/OIDC SSO per customer, with self-serve onboarding so a customer's IT admin can configure their own IdP connection without us writing code.
  2. Magic-link evaluation path for prospects in POC, demos, and small customers who won't set up SSO. Same login code path as enterprise customers.
  3. First-class tenant model — tenants are real objects with metadata, not a denormalized string.
  4. Cross-tenant SecurityV0 super-admins — our team can see and act on every tenant, select from a dropdown, and paste shareable links into Slack.
  5. URL-based tenant scoping — the tenant is in the URL path (e.g., /t/acme/clusters/abc123), not in a header or localStorage. Share links work. Deep links work. Tenant-lost-on-reload bugs are impossible.
  6. Per-tenant configuration surface — Jira base URL, branding, feature flags, connector credential references. Pre-provisioned by us now, self-service later.
  7. SecurityV0 as a first-class tenant too — we authenticate through the same provider as our customers, using the same flows. We do not maintain a parallel "admin login" that would drift from the customer flow.
  8. Clear ownership boundary between identity and authorization. Identity (who you are, what orgs you belong to) should live in the provider. Authorization (what you can do in sv0 specifically) should live in sv0. This ADR makes that boundary explicit.
  9. Low vendor lock-in. The architecture must survive a future migration to a different provider without touching the domain layer.

Decision

Adopt a B2B multi-tenant authentication architecture built around an external identity provider as the identity source of truth, with a thin local mirror for authorization and query performance. The specific vendor is selected in ADR-017; this ADR defines the architecture that would apply regardless of vendor choice.

The decision has six interlocking parts.

1. Identity source of truth lives in the provider

Users, organization membership, and authentication methods live in the external identity provider. We do not build our own user management UI, password reset flow, MFA flow, session management, or invitation system. The provider owns:

  • User identity (email, name, profile metadata)
  • Credentials (password, passkey, magic link tokens, social login, SAML assertion)
  • Organization membership (which user is a member of which organization)
  • Basic organization roles (e.g., admin, member within an organization)
  • Session lifecycle (issuance, refresh, revocation)
  • Admin-facing UIs for managing all of the above (for both SecurityV0 staff and customer IT admins)

Rationale: This is the single highest-leverage architectural decision. Building user management is a 6–8 week project with no product differentiation and permanent ongoing security surface area. A vendor that specializes in B2B auth will always do it better than we will. By drawing the boundary here, we reduce our auth code to ~500 lines (middleware + webhooks + mirror maintenance) and eliminate entire classes of bugs (session fixation, token replay, SCIM deprovisioning bugs, SAML XML signing flaws).

2. Tenants are first-class, modeled as provider Organizations

Every tenant — customer or SecurityV0-internal — is represented as an Organization in the identity provider and as a row in a new tenants collection in Mongo. The two are linked 1:1 by tenants.provider_org_id.

The tenants collection becomes the central hub for per-tenant metadata: slug, display name, status (evaluation | active | churned | internal), SSO enforcement flag, provisioning timestamps, and foreign keys to per-tenant configuration.

Rationale: Promoting the tenant from "denormalized string" to "real entity" unlocks everything downstream — the tenant switcher, the admin portal onboarding flow, per-tenant config, status reporting, and the SecurityV0 super-admin query ("show me all active tenants"). The 1:1 link with provider Organizations means we can flip a customer from magic-link eval to SAML enterprise by toggling a setting in the provider without any data migration on our side.

3. Local authorization mirror, webhook-synced

The backend maintains a thin local mirror of users and memberships for two reasons that cannot be served by calling the provider's API on every request:

  1. Performance. The auth middleware runs on every API request. A network call to the provider per request is unacceptable. We need local, indexed lookup.
  2. Cross-tenant queries. "Show me all tenants this super-admin can access" is a query over our own data, not the provider's. The provider knows about one organization at a time.

New collections:

  • users: { _id, provider_user_id, email, display_name, is_super_admin, created_at, updated_at }. One row per provider user. is_super_admin is derived from membership in the SecurityV0 internal organization and cached for fast lookup.
  • memberships: { _id, user_id, tenant_id, role, created_at, updated_at }. One row per (user, tenant) pair. Role is mirrored from the provider's organization role.
  • tenants: { _id, slug, display_name, provider_org_id, status, sso_enforced, created_at, archived_at? }. One row per tenant.
  • tenant_configs: { _id, tenant_id, jira_base_url?, jira_project_key?, branding?, feature_flags?, connector_credential_refs? }. One row per tenant.
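The four collections above could be written as TypeScript types roughly as follows. This is a minimal sketch: the field names follow the list, but the concrete types (string IDs, Date timestamps, the role union, the shapes of branding and feature_flags) and the isValidSlug helper are assumptions for illustration, not the project's actual schema.

```typescript
// Sketch of the local mirror collections as TypeScript types.
// ASSUMPTIONS: string IDs, Date timestamps, and the role names are illustrative.
type TenantStatus = "evaluation" | "active" | "churned" | "internal";
type MembershipRole = "admin" | "member"; // assumed; mirrored from the provider

interface UserDoc {
  _id: string;
  provider_user_id: string;
  email: string;
  display_name: string;
  is_super_admin: boolean; // derived from membership in the internal org
  created_at: Date;
  updated_at: Date;
}

interface MembershipDoc {
  _id: string;
  user_id: string;
  tenant_id: string;
  role: MembershipRole;
  created_at: Date;
  updated_at: Date;
}

interface TenantDoc {
  _id: string;
  slug: string;
  display_name: string;
  provider_org_id: string; // 1:1 link to the provider Organization
  status: TenantStatus;
  sso_enforced: boolean;
  created_at: Date;
  archived_at?: Date;
}

interface TenantConfigDoc {
  _id: string;
  tenant_id: string;
  jira_base_url?: string;
  jira_project_key?: string;
  branding?: Record<string, string>; // assumed shape
  feature_flags?: Record<string, boolean>; // assumed shape
  connector_credential_refs?: string[]; // references only, never raw secrets
}

// Hypothetical helper: slugs end up in URLs, so keep them lowercase and URL-safe.
function isValidSlug(slug: string): boolean {
  return /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/.test(slug);
}

const exampleTenant: TenantDoc = {
  _id: "t_1",
  slug: "securityv0-internal",
  display_name: "SecurityV0 Internal",
  provider_org_id: "org_example",
  status: "internal",
  sso_enforced: false,
  created_at: new Date(),
};
```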

The mirror is kept in sync by a webhook receiver at POST /api/v1/webhooks/<provider>. The provider sends events for user.created, user.updated, organization_membership.created, organization_membership.updated, organization_membership.deleted, organization.created, and (if supported) organization.deleted. Each event upserts or deletes the corresponding local mirror row. Event handling is idempotent (a redelivered event leaves the mirror unchanged), and every payload is signature-verified before it is processed.
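A minimal sketch of such a receiver, showing signature verification followed by an idempotent upsert. The signing scheme (hex HMAC-SHA256 over the raw body with a shared secret), the event id field, and the in-memory stores are assumptions for illustration; the real scheme depends on the provider selected in ADR-017.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: the provider signs the raw body with HMAC-SHA256 and sends a hex digest.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

interface ProviderEvent {
  id: string; // assumed unique per event; used as the idempotency key
  type: "user.created" | "user.updated";
  data: { provider_user_id: string; email: string; display_name: string };
}

interface MirrorUser {
  provider_user_id: string;
  email: string;
  display_name: string;
}

class WebhookReceiver {
  private readonly secret: string;
  private readonly seenEventIds = new Set<string>();
  readonly users = new Map<string, MirrorUser>(); // stand-in for the users collection

  constructor(secret: string) {
    this.secret = secret;
  }

  // Returns the HTTP status the endpoint would respond with.
  handle(rawBody: string, signatureHex: string): 200 | 401 {
    if (!verifySignature(rawBody, signatureHex, this.secret)) return 401;
    const event = JSON.parse(rawBody) as ProviderEvent;
    if (this.seenEventIds.has(event.id)) return 200; // duplicate delivery: no-op
    this.users.set(event.data.provider_user_id, { ...event.data }); // upsert
    this.seenEventIds.add(event.id);
    return 200;
  }
}
```

Verifying against the raw body, before any JSON parsing, matters: re-serializing parsed JSON can change byte order or whitespace and break the signature check.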

The mirror is never the source of truth. If the mirror and the provider disagree, the provider wins. A periodic reconciliation job (daily) detects drift and backfills missing events. Local mutations never write back to the provider — user management happens in the provider's dashboards and Admin Portal, not in sv0.
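The "provider wins" rule can be sketched as a pure diff that the reconciliation job would apply. The Membership shape and the planReconciliation name are hypothetical; fetching the provider's membership list is left out because it depends on the vendor API.

```typescript
// Sketch of drift detection between the provider (source of truth) and the local mirror.
interface Membership {
  provider_user_id: string;
  provider_org_id: string;
  role: string;
}

interface DriftPlan {
  upsert: Membership[]; // missing locally, or present with a stale role
  remove: Membership[]; // present locally but gone from the provider
}

function membershipKey(m: Membership): string {
  return `${m.provider_user_id}:${m.provider_org_id}`;
}

function planReconciliation(provider: Membership[], mirror: Membership[]): DriftPlan {
  const providerByKey = new Map(provider.map((m) => [membershipKey(m), m] as const));
  const mirrorByKey = new Map(mirror.map((m) => [membershipKey(m), m] as const));

  // Provider wins: anything the mirror lacks or disagrees on is overwritten.
  const upsert = provider.filter((p) => mirrorByKey.get(membershipKey(p))?.role !== p.role);
  // Anything the provider no longer knows about is deleted locally.
  const remove = mirror.filter((m) => !providerByKey.has(membershipKey(m)));

  return { upsert, remove };
}
```

Because the plan never contains a write toward the provider, the job cannot violate the one-directional sync described above.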

Rationale: The hybrid pattern (provider as identity source of truth plus a local mirror) is the conventional approach for B2B SaaS on managed auth providers and is widely recommended by B2B auth vendors. It gives us fast authorization decisions without reinventing user management, and the webhook+reconciliation combo keeps drift bounded.

4. URL-scoped tenant routing

All tenant-scoped UI routes move from their current flat structure (/clusters, /findings/:id) to a tenant-prefixed structure (/t/:tenantSlug/clusters, /t/:tenantSlug/findings/:id). The tenant slug in the URL is the sole source of tenant context — localStorage and headers no longer carry tenant state.

The backend derives the tenant from two places and requires them to match:

  1. The URL parameter passed by the frontend (or the X-Tenant-Id header for programmatic API clients).
  2. The user's validated membership in that tenant from the local mirror.

If the authenticated user is not a member of the URL's tenant (and is not a super-admin), the request is rejected with 404 (not 403 — we do not leak tenant existence).
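The membership check and the 404-not-403 rule can be sketched as a single pure function. The type names, the in-memory lookups, and the "super-admin" role label are assumptions for illustration; the real middleware would read from the local mirror.

```typescript
// Sketch of the tenant access decision made by the auth middleware.
interface AuthUser {
  id: string;
  is_super_admin: boolean;
}

interface MembershipRow {
  user_id: string;
  tenant_slug: string;
  role: string;
}

type TenantAccess =
  | { status: 200; tenantSlug: string; role: string }
  | { status: 404 };

function resolveTenantAccess(
  user: AuthUser,
  requestedSlug: string,
  knownSlugs: ReadonlySet<string>,
  memberships: readonly MembershipRow[],
): TenantAccess {
  // Unknown tenant and "exists but you are not a member" are indistinguishable to the caller.
  if (!knownSlugs.has(requestedSlug)) return { status: 404 };
  if (user.is_super_admin) {
    return { status: 200, tenantSlug: requestedSlug, role: "super-admin" }; // assumed label
  }
  const membership = memberships.find(
    (m) => m.user_id === user.id && m.tenant_slug === requestedSlug,
  );
  return membership
    ? { status: 200, tenantSlug: requestedSlug, role: membership.role }
    : { status: 404 };
}
```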

Rationale: URL-scoped tenants solve three problems at once:

  • Shareable links — SecurityV0 staff and customers can paste /t/acme/findings/123 into Slack and it works for everyone authorized.
  • No tenant drift — reloading the page, opening a new tab, or navigating via browser back never loses tenant context.
  • Clean middleware contract — the auth middleware has a single deterministic source of truth for tenant context and rejects mismatches immediately.

This does break every existing deep link, but the existing deep links are internal and can be rewritten in the same PR.
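Rewriting flat paths into the tenant-prefixed form is mechanical, as the sketch below suggests. Both helper names are hypothetical, and where the fallback slug comes from (for example, the last tenant the user visited) is an assumption left to the caller.

```typescript
// Sketch: build and rewrite tenant-prefixed UI paths.
function tenantPath(tenantSlug: string, path: string): string {
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return `/t/${encodeURIComponent(tenantSlug)}${suffix}`;
}

// Returns the tenant-scoped path, or null when no tenant can be inferred
// (the caller might then show a tenant picker instead of redirecting).
function rewriteLegacyPath(path: string, fallbackSlug: string | null): string | null {
  if (path.startsWith("/t/")) return path; // already tenant-scoped
  if (fallbackSlug === null) return null;
  return tenantPath(fallbackSlug, path);
}
```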

5. SecurityV0 internal is its own tenant

The SecurityV0 team is not a separate "admin subsystem". It is an internal tenant — specifically, a special organization in the identity provider with slug securityv0-internal — authenticated through the exact same flow as our customers. Team members log in via the provider, get a session, and appear in the local users table like any other user.

The difference is one flag: users.is_super_admin = true is set for any user who is a member of the securityv0-internal organization (derived from the membership webhook). A super-admin gets implicit membership in every tenant when making API requests — the auth middleware skips the membership check for super-admins.
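Deriving the flag from membership events might look like the sketch below. The event shape (in particular an org_slug field on the payload) is an assumption; a real payload would more likely carry a provider organization ID that is mapped to the slug via the tenants collection.

```typescript
// Sketch: keep the set of super-admin user IDs in step with membership events.
const INTERNAL_ORG_SLUG = "securityv0-internal";

interface MembershipEvent {
  type: "organization_membership.created" | "organization_membership.deleted";
  user_id: string;
  org_slug: string; // assumed field, see lead-in
}

function applyMembershipEvent(
  superAdmins: ReadonlySet<string>,
  event: MembershipEvent,
): ReadonlySet<string> {
  if (event.org_slug !== INTERNAL_ORG_SLUG) return superAdmins; // customer orgs never grant the flag
  const next = new Set(superAdmins);
  if (event.type === "organization_membership.created") {
    next.add(event.user_id);
  } else {
    next.delete(event.user_id); // leaving the internal org revokes cross-tenant access
  }
  return next;
}

// The middleware only performs the per-tenant membership lookup for non-super-admins.
function mustCheckMembership(userId: string, superAdmins: ReadonlySet<string>): boolean {
  return !superAdmins.has(userId);
}
```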

Internal roles (owner, engineer, analyst, read-only) are modeled as memberships within the SecurityV0 internal organization using the provider's role system, not as a separate sv0-specific table. This means role changes for SecurityV0 staff happen in the provider's dashboard, not in our code.

Rationale: "Eat our own dog food" is not a cliché here — it is the only way to ensure the customer auth flow works. If we maintained a parallel internal login (e.g., GitHub OAuth for us, SAML for them), the two paths would drift, we would debug them in isolation, and the customer path would accumulate bugs we never see. Using the same provider, organization model, and login flow for ourselves guarantees the customer path is continuously exercised.

The is_super_admin flag is the only sv0-specific authorization concept that sits outside the provider's role model. It is deliberately binary: you either can see every tenant or you can't. Finer-grained internal roles (who can delete findings, who can edit tenant configs) are expressed as roles within the internal org, not as new global flags.

6. Authorization split between provider roles and sv0 permissions

The clean boundary between identity and authorization:

| Concern | Owned by | Examples |
| --- | --- | --- |
| Identity: who is this person, what orgs do they belong to | Provider | user.email, user.provider_user_id, membership.tenant_id, membership.role |
| Authorization: what can this person do inside sv0 | sv0 | "Can edit tenant config", "can mark finding as accepted risk", "can invite users to this tenant", "can generate Admin Portal link" |

The provider gives us roles (e.g., admin, member). sv0 maps those roles to permissions at the middleware level — a small, explicit, versioned table of {role → [permission, ...]}, living in code. Adding a new permission means editing a file and writing tests, not clicking around a vendor dashboard.
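A minimal sketch of that in-code table. The permission identifiers and the assignment of permissions to roles are assumptions for illustration (the names are loosely derived from the examples in the table above); only the admin and member role names come from this ADR.

```typescript
// Sketch: role -> permission mapping kept in code, so it is grep-able and testable.
type Role = "admin" | "member";

type Permission =
  | "finding:read" // assumed baseline permission
  | "finding:accept_risk"
  | "tenant_config:edit"
  | "user:invite"
  | "admin_portal:generate_link";

const ROLE_PERMISSIONS: Record<Role, readonly Permission[]> = {
  admin: [
    "finding:read",
    "finding:accept_risk",
    "tenant_config:edit",
    "user:invite",
    "admin_portal:generate_link",
  ],
  member: ["finding:read", "finding:accept_risk"],
};

function can(role: Role, permission: Permission): boolean {
  return ROLE_PERMISSIONS[role].includes(permission);
}
```

Typing the table as Record<Role, ...> means adding a new provider role without giving it a permission list is a compile error rather than a silent deny-all or allow-all.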

Rationale: Pushing all authorization into the provider's role system would couple us tightly to the vendor and make permission changes slow (dashboard clicks, no version control). Keeping authorization in our code keeps it grep-able, testable, and version-controlled. The provider remains the source of truth for who has which role, but what each role can do is sv0's business.


Alternatives Considered

A. Build it ourselves (ADR-012 approach)

GitHub OAuth + Resend magic link + self-hosted sessions. This is the path ADR-012 chose and the one the existing 2026-03-01-user-authentication-plan.md documents.

Why rejected: Acceptable for <10 users; not acceptable for enterprise. Does not provide SAML, OIDC, SCIM, or Admin Portal. Every enterprise customer would become a custom integration. Building these features ourselves is a 6–8 week project with permanent ongoing maintenance burden and a large security surface area. The cost to reach parity with a managed provider is higher than the cost of simply using one.

B. Hostname-scoped tenants (acme.app.securityv0.com)

Classic SaaS multi-tenancy. Each tenant gets a subdomain, TLS is wildcard, and the tenant is derived from req.hostname.

Why rejected: Subdomains add three sources of friction: wildcard TLS management, per-subdomain Cloudflare Access applications (our infrastructure-level gate), and CORS configuration for every new tenant. /t/:slug URL scoping gives us the same "shareable link" property without any of these. We may revisit hostname scoping later if we need to satisfy a compliance requirement that mandates stronger origin isolation, but it should not be a day-1 cost.

C. Provider owns all authorization

Push every permission decision into the provider's role and permission model. sv0 has no local authorization code — it just checks the provider's API for "can user X perform action Y on resource Z?".

Why rejected: Unacceptable coupling. Authorization changes become vendor-dashboard changes. Permission logic is no longer grep-able or version-controlled. Latency per API call balloons because every decision requires a provider call. And if we ever migrate vendors, we have to re-build our entire authorization model in the new provider's language. The hybrid approach — provider owns identity and roles, sv0 owns permissions — gives us the best of both.

D. No local mirror — query the provider on every request

Technically possible. Avoids the webhook complexity and the eventual-consistency questions.

Why rejected: Every API request adds a network roundtrip to the provider. For endpoints that make multiple internal calls (evidence pack generation, graph queries), this compounds. Response times go from <100ms to 500ms+. The local mirror is the industry-standard solution, and the webhook and reconciliation work is mostly a one-time build cost with a small ongoing operational footprint.

E. Cloudflare Access as the only auth layer

Use the existing CF Zero Trust gate as the identity provider. Backend reads Cf-Access-Jwt-Assertion and extracts identity from there. No other vendor needed.

Why rejected: CF Access is a network gate, not a B2B identity provider. It has no concept of organizations, memberships, roles, self-serve SSO configuration, or Admin Portal. Every customer SSO setup would require us to manually configure a CF Access application in the CF dashboard, which caps our customer count at ~20 before it operationally breaks. CF Access remains valuable as a defense-in-depth network perimeter (we keep it) but cannot substitute for the app-layer identity provider. See the operating policy in the auth architecture doc for how the two layers compose.


Consequences

Positive

  1. Enterprise SSO is unblocked. Any customer IT admin can self-serve their SAML/OIDC configuration the same day we sign the contract, without engineering involvement.
  2. Evaluation and paid flows share code. Magic link for POCs and SAML for enterprise are the same login path with different settings per organization. No "demo mode" vs "real mode" divergence.
  3. SecurityV0 team unblocked for cross-tenant work. The tenant switcher dropdown replaces localStorage editing. Shareable URLs replace Slack messages saying "go to Settings, paste this tenant ID".
  4. Security surface area shrinks dramatically. Session management, CSRF, MFA, passwordless token rotation, SAML XML signing, SCIM parsing — none of these are our code to maintain or audit.
  5. Clean vendor exit story. If we ever migrate off the chosen provider, the domain layer (findings, entities, evaluator, UI) is untouched. We rewrite the auth middleware, update the webhook receiver, and backfill the local mirror from the new provider. The architecture in this ADR survives the migration.
  6. Per-tenant configuration gets a home. tenant_configs gives us a stable place to put Jira integration URLs, branding, connector credentials references, and feature flags — all things we have wanted but had no schema for.
  7. We eat our own dog food. SecurityV0 staff use the same login flow as our customers, which continuously exercises the customer path.

Negative

  1. Vendor cost. See ADR-017 for the specific provider's pricing; at the scale we care about (first several enterprise customers), this is on the order of $125/enterprise-customer/month, which is a rounding error against ACV but is a new line item we did not have before.
  2. Vendor availability risk. If the provider has an outage, new logins fail. Existing sessions keep working for their remaining lifetime because the local mirror and session cookies do not require a live provider connection. Mitigation: document the runbook, choose session length with outages in mind, monitor provider status.
  3. Webhook complexity. We now maintain a webhook receiver with signature verification, idempotency, and reconciliation. This is a new operational surface but a small one (~200 lines).
  4. Breaking change for the UI. The switch to /t/:slug/... routes invalidates existing deep links. Internal deep links are rewritten in the same PR; external deep links (if any have been shared) get a 410 or a redirect.
  5. X-Tenant-Id header and X-API-Key auth are removed. Any external tooling that relies on these (scripts, seed tools, Postman collections) must be updated. Seed tools get a service-token path (see ADR-017 §Consequences).
  6. Local dev ergonomics require care. REQUIRE_AUTH=false is replaced with a "dev bootstrap" that mints a seeded super-admin user in the local DB. Developers do not need to log in through the provider to run the platform locally, but the code paths that run locally are identical to production (same middleware, same mirror lookup).
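The dev bootstrap might be as small as the sketch below. The seeded user's field values, the function name, and the NODE_ENV-style production guard are all assumptions; the point is that it writes a normal users row so the unmodified middleware and mirror lookup run locally.

```typescript
// Sketch: seed a super-admin into the local mirror instead of bypassing auth.
interface LocalUser {
  provider_user_id: string;
  email: string;
  display_name: string;
  is_super_admin: boolean;
}

const DEV_USER: LocalUser = {
  provider_user_id: "dev-bootstrap", // hypothetical ID
  email: "dev@localhost",
  display_name: "Local Developer",
  is_super_admin: true,
};

function bootstrapDevUser(users: Map<string, LocalUser>, nodeEnv: string | undefined): LocalUser {
  // Hard guard: a seeded super-admin in production would recreate the old bypass.
  if (nodeEnv === "production") {
    throw new Error("dev bootstrap must never run in production");
  }
  users.set(DEV_USER.provider_user_id, DEV_USER); // upsert, safe to run on every start
  return DEV_USER;
}
```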

Graduation criteria

This ADR is considered successfully implemented when all of the following are true:

  • A SecurityV0 team member can log in via the provider, see every tenant in the dropdown, click into any tenant, and land on a working page.
  • A SecurityV0 team member can paste /t/acme/findings/abc123 into Slack, and another team member clicks it and lands on the same view (after logging in if needed).
  • A brand-new prospect can be provisioned via one script command (provision-eval-tenant.ts) which creates the provider Organization, creates the tenants row, sends an invitation email, and seeds demo data.
  • A prospect can click the invitation, log in via magic link, and see their evaluation tenant with demo data.
  • A customer can be converted from evaluation to paid by the SecurityV0 team generating an Admin Portal link and sending it to their IT admin, who configures SAML without any sv0 engineer involvement.
  • REQUIRE_AUTH=false is replaced with a dev-bootstrap super-admin, and the old header-based auth is removed from the codebase.
  • Old SettingsPage tenant/api-key form is deleted.
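The provisioning criterion above (provision-eval-tenant.ts) describes four steps in order, which can be sketched as follows. The ProviderClient and TenantStore interfaces are hypothetical stand-ins, not a real vendor SDK or the project's data layer, and the sketch omits rollback on partial failure.

```typescript
// Sketch of the provisioning flow behind provision-eval-tenant.ts.
interface ProviderClient {
  createOrganization(name: string): Promise<{ id: string }>;
  sendInvitation(orgId: string, email: string): Promise<void>;
}

interface NewTenant {
  slug: string;
  display_name: string;
  provider_org_id: string;
  status: "evaluation";
}

interface TenantStore {
  insertTenant(tenant: NewTenant): Promise<void>;
  seedDemoData(slug: string): Promise<void>;
}

interface ProvisionInput {
  slug: string;
  displayName: string;
  inviteEmail: string;
}

// Returns the shareable tenant URL path for the new evaluation tenant.
async function provisionEvalTenant(
  provider: ProviderClient,
  store: TenantStore,
  input: ProvisionInput,
): Promise<string> {
  const org = await provider.createOrganization(input.displayName); // 1. provider Organization
  await store.insertTenant({
    slug: input.slug,
    display_name: input.displayName,
    provider_org_id: org.id,
    status: "evaluation",
  }); // 2. tenants row, linked 1:1 to the Organization
  await provider.sendInvitation(org.id, input.inviteEmail); // 3. invitation email
  await store.seedDemoData(input.slug); // 4. demo data
  return `/t/${input.slug}`;
}
```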

Deferred to later (not in this ADR)

  • SCIM (Directory Sync). Included in the chosen provider but enabled per-customer on demand, not day-1.
  • Audit log UI. We will emit audit events from the webhook receiver and key auth middleware paths, but the UI for viewing them is a follow-up.
  • MFA enforcement policy per tenant. The provider supports this; we surface the toggle in the tenant config UI as a follow-up.
  • Service tokens for programmatic API access (e.g., CI/CD, seed scripts) — design in ADR-017 but delivery is phased.
  • Migration tool for existing tenant data — currently every tenant is a string literal in documents; no existing tenant has a tenants row. The implementation plan includes a backfill step.


Addendum — 2026-04-22: URL tenant scoping revisited

Triggered by Ivan questioning Decision §4 (URL-scoped tenant routing) on two grounds: (a) /t/:slug/... exposes the tenant slug in every URL and may read as a dev tool to enterprise buyers, and (b) including it in the sv0_http_request_duration_seconds{route} Prometheus label could inflate cardinality. An Opus research agent investigated and cited file:line evidence. Findings:

Current implementation state (as of 2026-04-22)

ADR-016 is approximately 70% implemented, with the remaining 30% deliberately deferred:

  • UI routing — live. /t/:tenantSlug/* is mounted at sv0-platform/ui/src/App.tsx:157 with 22 child routes (clusters, findings, reports, graph, etc.). LegacyRedirect (same file, line 207) rewrites non-prefixed URLs using sessionStorage("sv0-last-tenant") for fallback. Tenant source of truth in the UI is the URL, not localStorage (ui/src/context/auth-context.tsx:14-18). Merged in dbb4cf4.
  • API routing — NOT migrated. Every route is still mounted at /api/v1/* (sv0-platform/src/api/app.ts:160-177), not under /t/:slug/api/v1/*. The auth middleware resolves tenant from req.params.tenantSlug or req.header("x-tenant-id") with the OR falling through to the header path because :tenantSlug never appears on API routes (sv0-platform/src/api/middleware/auth-middleware.ts:201-203). The UI reads the slug from the URL and injects it into the x-tenant-id header (ui/src/api/client.ts:33,63).
  • Practical meaning: the browser URL reads "tenant-scoped," but the HTTP API is still header-driven. Whether to migrate the API under /t/:slug/ is an open question; doing so is 4–8 hours of work and is not required to preserve the UI's deep-link / no-tenant-drift properties that motivated §4.

Cardinality — not an issue today

The HTTP route label in sv0_http_request_duration_seconds uses req.route.path (the parameterized Express pattern, e.g., /api/v1/findings/:id), not req.path (raw URL). See sv0-platform/src/api/middleware/metrics.ts:11. Tenant slug never lands in that label even if the API were mounted under /t/:tenantSlug/..., because the parameter name — not the value — is what appears in the label.
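The difference between the two label sources can be illustrated with a small sketch. RequestLike is a minimal stand-in for the Express request object, and both the routeLabel name and the "unmatched" fallback are assumptions, not the contents of metrics.ts.

```typescript
// Sketch: label by the matched route pattern, never by the raw URL.
interface RequestLike {
  path: string; // raw URL path, e.g. /t/acme/api/v1/findings/42
  route?: { path: string }; // matched pattern, e.g. /t/:tenantSlug/api/v1/findings/:id
}

function routeLabel(req: RequestLike): string {
  // The pattern contains parameter names, so its cardinality is bounded by the
  // number of routes rather than the number of tenants or resource IDs.
  return req.route?.path ?? "unmatched"; // assumed fallback for 404s
}

const pattern = "/t/:tenantSlug/api/v1/findings/:id";
const labels = new Set(
  [
    { path: "/t/acme/api/v1/findings/1", route: { path: pattern } },
    { path: "/t/mediapro/api/v1/findings/2", route: { path: pattern } },
  ].map(routeLabel),
);
console.log(labels.size); // two tenants, two finding IDs, one label value
```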

The real cardinality leak is sv0_job_duration_seconds{tenant_id} (sv0-platform/src/shared/metrics/metrics.ts:25) — an explicit metric label independent of URL shape. Fix is P0-5 in the 2026-04-21 readiness review, orthogonal to this ADR.

Enterprise aesthetic — split, not dispositive

Vendor URL survey across comparable tools:

| Pattern | Examples |
| --- | --- |
| Tenant hidden in session / no tenant in URL | Wiz, Orca Security, Vanta, Drata |
| Slug in path | Snyk (/org/:slug/...), Linear, GitHub, Auth0 |
| Subdomain per tenant | Okta, Slack |

There is no universal pattern. The pure security-tool segment (Wiz, Orca, Vanta, Drata) leans toward hiding tenant; engineering-adjacent security tools (Snyk, Auth0) expose it in path. /t/mediapro/clusters/abc reads as intentional, not amateurish. MediaPro-specific pushback, if it materializes, is the trigger to revisit — not speculative concern.

Dedicated-deployment mode (future)

In a dedicated single-tenant deployment (sv0.client.com/... dedicated to one customer), the /t/:slug/ prefix is redundant and confusing. ADR-016 §4 did not anticipate this mode.

Resolution: introduce a SINGLE_TENANT_SLUG environment variable when the first dedicated-deployment client signs, not speculatively now. When it is set, the UI mounts routes flat (without the /t/:slug/ prefix) and useTenantPath becomes a no-op; the API middleware resolves the tenant from the env var rather than from the URL or header. Estimated effort is 8–16 hours at the time it is needed. This deliberately trades a slightly larger later cost for zero up-front cost and no constraint on the pre-pilot sprint.
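A sketch of how the two modes could coexist. The request shape, the function names, and the precedence order (env var, then URL parameter, then x-tenant-id header) are assumptions; tenantPathFor stands in for the logic inside useTenantPath rather than reproducing it.

```typescript
// Sketch: tenant resolution with an optional single-tenant override.
interface TenantRequest {
  params: { tenantSlug?: string };
  headers: Record<string, string | undefined>;
}

interface DeploymentEnv {
  SINGLE_TENANT_SLUG?: string;
}

function resolveTenantSlug(req: TenantRequest, env: DeploymentEnv): string | null {
  // Dedicated deployment: the env var wins and client-supplied values are ignored.
  if (env.SINGLE_TENANT_SLUG) return env.SINGLE_TENANT_SLUG;
  return req.params.tenantSlug ?? req.headers["x-tenant-id"] ?? null;
}

// UI-side counterpart: a no-op in single-tenant mode, /t/:slug prefix otherwise.
function tenantPathFor(slug: string, path: string, env: DeploymentEnv): string {
  return env.SINGLE_TENANT_SLUG ? path : `/t/${slug}${path}`;
}
```

Ignoring the header entirely when the env var is set means a dedicated deployment cannot be pointed at another tenant by a crafted request.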

Decision

ADR-016 §4 stands for the MediaPro pilot. The /t/:slug/... UI pattern remains; the API remains flat and header-driven; no subdomain migration; no reversal.

Revisit thresholds

Reconsider subdomain scoping (Option B: mediapro.app.securityv0.com/...) when any of these fire:

  1. >50 active tenants (cross-tenant deep-link sharing + per-tenant branding start pulling their weight);
  2. Explicit enterprise-buyer pushback on URL aesthetic tied to a deal;
  3. First dedicated-deployment client signs — at that point, add the SINGLE_TENANT_SLUG flag (NOT a subdomain migration); the two modes can coexist.

Options considered and rejected

| Option | Engineering hours | Rejected because |
| --- | --- | --- |
| A. Status quo /t/:slug/ | 0 | (kept) |
| B. Subdomain per tenant (mediapro.app.sv0.com) | 60–120 | Cost exceeds benefit for a pilot sprint; revisit at a threshold above |
| C. Clean URL + tenant-in-session | 20–30 | Reintroduces tenant-lost-on-reload bug class §4 was designed to prevent |
| D. Hybrid subdomain + /t/ super-admin view | 80–140 | Worst complexity; two code paths, cross-subdomain cookie dance |
| E. Cloudflare Worker vanity alias | 40–80 | Fragile; Worker becomes a failure mode |

References

  • Research: readiness review v2.x research thread, agent output archived in the PR #190 commit trail
  • Related: §2.1 of docs/architecture/research/2026-04-22-observability-stack.md (cardinality discussion)