Pre-Client Enterprise Readiness Plan (v2.0)
TL;DR
MediaPro pilot targets early May 2026 (~10–14 days from 2026-04-22). Six parallel adversarial reviews (security, ops, connectors, enterprise buyer, Tier-2 analyst, CISO UI) identified 15 P0 ship-blockers at v1.0. Two architectural decisions — taken between v1.0 and v1.4 — collapse most of those P0s:
- WorkOS for auth eliminates the production auth bypass, attacker-controlled x-tenant-id, and hardcoded dev cookie password (original P0-1/2/3) by replacing them with real production auth.
- MongoDB Atlas M10 migration in the pilot window eliminates the unauthenticated-MongoDB and same-disk-backup risks (original P0-4/6) — Atlas provides auth, HA, and native PITR backups out of the box.
What remains is 11 items of real pilot work across four capabilities (observability, connectors, analyst workflow, marketing/trust hygiene) plus a compute-substrate migration that is deferred to post-pilot. Three tracks run in parallel: engineering, client-facing (Isaac + Ivan + Sergey), and trust/legal (de-escalated because MediaPro waived DPO for read-only metadata scope).
The single highest-leverage action is not engineering — it is booking the WorkOS sales call (Sally Park thread, response received 2026-04-16). Nothing in Track A past Day 2 unblocks until that call completes.
1. Client context — MediaPro
Global media company (EMEA HQ). CIO Sergi and head of IT/infra Ezequiel are the buying committee. They stopped engagement with Fortinet and Cisco on AI security (both delivered incomplete proposals). Stack: Azure-heavy (Copilot, Foundry, Copilot Studio) + AWS + ServiceNow + Jira; Google and Oracle are present but not the primary concern; dev team uses Claude.
Compliance posture (confirmed in meeting)
- Access scope is read-only, metadata-only. Ezequiel confirmed this scope does not require DPO involvement on their side; Sergi (Corporate Security Officer within the CIO org) signed off verbally.
- Data-residency constraints have not been formally stated; MediaPro's stack is EMEA-based, so the Atlas region choice should match that, or the plan should explain any deviation.
- Net effect on this plan: Trust/legal artifacts (original P0-13, Trust & Controls Summary) are de-escalated from "legal gate" to "credibility hygiene." Still required for the next client, no longer pilot-blocking.
Pilot scope and client commitments
- Pilot environment: TBD (prod or staging) — they decide after stack inventory.
- They are offering us: Jira Cloud test environment (unblocks sv0-connectors#72), AWS test access (validates our AWS multi-account work on real data), full stack disclosure by end of this week.
- Client-side next step (two weeks): Ezequiel's team prepares credentials and env access for early-May pilot.
Use-case coverage — MediaPro's six concerns vs. our model
Tracked in detail in sv0-documentation#194.
| # | MediaPro concern | Our coverage | Confidence |
|---|---|---|---|
| 1 | Lack of AI tool visibility (shadow AI, licenses) | Foundry + Copilot Studio discovery via service principals; workload entity type | High for discovery; gap on license-usage framing |
| 2 | Hallucinations / guardrails | Out of scope — frame explicitly, do not overpromise | n/a |
| 3 | Accountability over time (ownership decay) | Scope-drift evaluator + temporal diff + orphaned-sensitive cluster | Direct hit — our core value prop |
| 4 | Illicit automation via script with egress | Scope drift + cross-system auth + data-domain egress category | Direct hit pending verification we catch IAM-roled scripts, not only registered apps |
| 5 | Platform coverage / global reader access | Read-only metadata connector model matches exactly | High; Google + Oracle on roadmap |
| 6 | Admin enforcement for out-of-report-line teams | Out of scope — reframe as visibility | n/a |
Ezequiel's emphasis on concern #4 — "automation via script with egress traffic, not DSPM" — is exactly our execution-authority story. This is the single most important positioning point for our discovery call.
2. Architectural decisions
Four decisions, taken between v1.0 and v1.4, shape everything downstream. Together they pre-resolve or collapse multiple items from the original P0 inventory.
2.1 Auth — WorkOS (decision: v1.0, unchanged)
Decision: WorkOS as the identity provider for production auth, per ADR-016 and ADR-017. WorkOS thread active with Sally Park (response 2026-04-16 offering startup credits, non-prod test environment, shared Slack channel).
What it resolves:
- Original P0-1: prod runs NODE_ENV=development + AUTH_PROVIDER=dev — resolved by Phase 0/1 rollout flipping to NODE_ENV=production + AUTH_PROVIDER=workos.
- Original P0-2: attacker-controlled x-tenant-id — auto-resolved once P0-1 lands; the new middleware derives tenant from the verified JWT's provider_org_id.
- Original P0-3: hardcoded DEV_COOKIE_PASSWORD — obsolete once AUTH_PROVIDER=dev is no longer in prod.
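The P0-2 fix can be illustrated with a small, language-neutral sketch (written in Python here, although the platform middleware itself lives in the TypeScript API). It assumes the JWT signature has already been verified upstream; `resolve_tenant`, `org_to_tenant`, and `TenantResolutionError` are hypothetical names, and only `provider_org_id` and `x-tenant-id` come from the plan.

```python
from typing import Mapping


class TenantResolutionError(Exception):
    """Raised when no tenant can be derived from verified claims."""


def resolve_tenant(
    verified_claims: Mapping[str, str],
    headers: Mapping[str, str],
    org_to_tenant: Mapping[str, str],
) -> str:
    """Return the tenant ID for a request.

    `verified_claims` must come from a JWT whose signature was checked
    upstream. `headers` is accepted only to make the point that it is
    never consulted: a client-supplied x-tenant-id has no effect.
    """
    org_id = verified_claims.get("provider_org_id")
    if not org_id:
        raise TenantResolutionError("verified JWT carries no provider_org_id")
    try:
        # Hypothetical lookup table: identity-provider org -> internal tenant.
        return org_to_tenant[org_id]
    except KeyError:
        raise TenantResolutionError(f"unknown organization: {org_id}") from None
```

With this shape, a request that sends `x-tenant-id: tenant_b` while holding a token for an org mapped to `tenant_a` still resolves to `tenant_a`.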
What it enables:
- Magic-link test accounts for pilot-window QA (free, 2–3 hours setup, no Google Workspace required).
- Session-stacking "act as user" impersonation feature, post-pilot (3–4 days of engineering on already-scaffolded code: Permission.INTERNAL_IMPERSONATE, AuthMethod.impersonation, is_super_admin session fields are already present in src/api/auth/).
Day-0 blocker: Book WorkOS sales call. Nothing past Day 2 of Track A unblocks until Phase 0/1 is live.
Tracking: sv0-platform#373, sv0-platform#492 (epic).
2.2 Database — MongoDB Atlas M10, direct billing, in-pilot (decision: v1.4)
Decision: Migrate prod MongoDB from Hetzner self-managed to Atlas M10 inside the pilot window. Direct MongoDB billing (not Azure Marketplace). Region: pending MediaPro confirmation; default to EU (Frankfurt) unless they explicitly approve US residency. MediaPro is an EMEA-based media company; defaulting to US assumes a residency posture they haven't signed off on. Frankfurt also aligns with the Grafana Cloud region picked in the observability research.
Why direct billing, not Marketplace: Azure Sponsorship and Founders Hub credits explicitly exclude Marketplace third-party products. Atlas via Azure Marketplace only helps with billing/procurement under MACC (a paid commitment we don't have). Atlas-direct preserves the ability to burn MongoDB for Startups credits (application in-flight).
What it resolves (once cutover is complete and verified):
- Original P0-4: unauthenticated MongoDB — resolved natively; Atlas requires auth.
- Original P0-6: no offsite backups; backups on same disk — resolved natively; Atlas ships with PITR and offsite backups.
- Single-node risk — resolved; M10 is a three-node replica set by default.
Resolution evidence required (must all pass before these P0s are marked closed):
- Atlas cluster has DB auth enabled; admin user is rotated to a managed secret.
- PITR + backup policy visible in Atlas console; retention matches committed SLA.
- App pointed at Atlas MONGODB_URI; Hetzner mongo compose service is dev-only or removed.
- Restore + smoke test passed (verified mongorestore from a backup, followed by a read/write smoke).
Until all four are green, mark P0-4 / P0-6 as "in-flight, expected-resolved-by-Atlas" rather than "resolved."
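The gating rule above can be written down as a tiny status function. This is a sketch only: the `AtlasEvidence` field names are hypothetical labels for the four checklist items, not identifiers from the codebase.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class AtlasEvidence:
    """One boolean per item in the resolution-evidence checklist."""

    db_auth_enabled_and_admin_rotated: bool
    pitr_and_retention_verified: bool
    app_on_atlas_uri_and_hetzner_mongo_retired: bool
    restore_and_smoke_test_passed: bool


def p0_status(evidence: AtlasEvidence) -> str:
    """P0-4 / P0-6 count as resolved only when every check is green."""
    if all(astuple(evidence)):
        return "resolved"
    return "in-flight, expected-resolved-by-Atlas"
```

A single failing check, for example a restore test that has not been run yet, keeps both P0s in the in-flight state.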
Migration plan (from Codex, validated):
- Externalize prod MONGODB_URI in docker-compose.deploy.yml; make the mongo service dev-local-only. Zero production impact.
- Stand up Atlas M10 with DB auth, IP allowlist from Hetzner static egress, backups on. No traffic yet.
- 30-minute maintenance window: mongodump Hetzner → mongorestore Atlas. Flip MONGODB_URI. Smoke test. Prod Mongo has near-zero data today; the data migration is trivial.
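For the maintenance-window step, a minimal sketch of the dump and restore invocations is shown below. It only builds the command lines rather than running them. The function names and the archive path are assumptions; the `--uri`, `--archive`, and `--gzip` flags are standard options of the MongoDB Database Tools. In practice the URIs would come from a secret store rather than literals.

```python
import shlex
from typing import List


def build_cutover_commands(source_uri: str, target_uri: str, archive: str) -> List[List[str]]:
    """Return argv lists for the dump (Hetzner) and restore (Atlas) steps."""
    dump = ["mongodump", f"--uri={source_uri}", f"--archive={archive}", "--gzip"]
    restore = ["mongorestore", f"--uri={target_uri}", f"--archive={archive}", "--gzip"]
    return [dump, restore]


def render(commands: List[List[str]]) -> str:
    """Render the commands as shell-safe lines for a runbook or dry-run log."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Keeping the commands as argv lists means they can be passed to `subprocess.run` without a shell once the dry-run output has been reviewed.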
Anti-patterns locked (per Codex review):
- No self-hosted MongoDB on cloud VMs. Replaces one operational burden with another.
- No Cosmos DB for MongoDB (vCore or RU). Wire-compatible but ops-divergent; weakens portability and MongoDB behavioral confidence.
- No Atlas-via-Marketplace billing. Adds indirection without unlocking credit burn.
- No app-in-cloud-A + Atlas-in-cloud-B split pre-revenue.
Funding:
- MongoDB for Startups application — submitted this week. Eligibility confirmed (pre-Series-A, <7 yrs, single software product). Credit tier opaque publicly; typical $500–$5K with VC-partner referral unlocking higher.
- Cash overflow: Atlas M10 ~$57/mo if credits don't stretch. Tolerable.
Tracking: sv0-platform#493.
2.3 Compute — Hetzner through pilot, cloud-agnostic VM post-pilot (decision: v1.3)
Decision: Keep the API container on Hetzner for the pilot window. Migrate compute post-pilot to a cloud-agnostic Linux VM + Docker Compose. Cloud choice (Azure VM vs AWS VM) defers until post-pilot, contingent on credit approval status.
Why staged: Opus research concluded that migrating compute inside the 10-day pilot window introduces change-risk for cosmetic gain. ADR-018 explicitly allows managed-platform deferral ("we can defer through pilot as long as MongoDB is hardened"). With Atlas covering the DB layer, the Hetzner compute substrate is acceptable for a single-pilot duration.
Why cloud-agnostic VM, not Container Apps / ECS: Keep the first cloud migration boring and portable so a future cloud flip (Azure↔AWS) is a MONGODB_URI + DNS change, not a rewrite. Container Apps, ECS, and Fargate all bake in cloud-specific deployment semantics.
Post-pilot credit picture:
| Program | Status | What it pays for |
|---|---|---|
| AWS Mercury Activate | $5K activated | AWS compute if we go AWS post-pilot |
| Azure Founders Hub | NOT applied | Azure VM + networking + Key Vault if we go Azure post-pilot. Entry tier realistically $1–5K for bootstrapped/pre-revenue; $150K ceiling requires investor-backing. Decision: apply after post-pilot compute choice is locked, or at raise via VC-partner path. |
| MongoDB for Startups | Applying 2026-04 | Atlas (independent of cloud choice) |
| WorkOS startup credits | Awaiting Sally | SSO + SCIM + Impersonation |
Tracking: sv0-platform#493 (Phase 2 = post-pilot compute migration).
2.4 Trust & legal — de-escalated per DPO waiver (decision: v1.2)
Decision: MediaPro confirmed read-only metadata access does not require DPO involvement. Trust & Controls Summary is still produced (credibility hygiene, required for client #2+), but no longer gated by external counsel sign-off for the MediaPro pilot.
What it resolves: Original P0-13 framing of "no DPA will block signature" is materially weaker in the MediaPro-specific path. Track C runs in parallel with engineering but is not on the critical path for pilot go-live.
Tracking: sv0-documentation#192 (Trust & Controls Summary v1), sv0-website#41 (website EU claim + status string hygiene).
3. Active work — what's left to ship for the pilot
The 11 items below remain after architectural decisions. Grouped by capability, each with owner and concrete action.
3.1 Observability stack + security hygiene
Decision (2026-04-22): Grafana Cloud free tier + BetterStack free tier + grafana/mcp-grafana MCP server. Two engineer-days total. $0 at pilot scale. Portable across Hetzner → Azure VM / AWS VM post-pilot migration. Full analysis in research/2026-04-22-observability-stack.md. Tracked in sv0-platform#494.
Decision drivers: (1) we already emit prom-client metrics + JSON logs, so no instrumentation rewrite; (2) grafana/mcp-grafana is best-in-class for Claude Code agent access (logs, metrics, alerts, dashboards, traces via MCP); (3) survives the post-pilot compute migration unchanged. Datadog explicitly rejected (free tier is a trap); self-host LGTM deferred until post-pilot if cost becomes an issue.
Day-1 security fixes (fold into observability rollout):
- Close P0-5 — metrics tenant-ID leak. sv0_job_duration_seconds carries a tenant_id label in src/shared/metrics/metrics.ts:25; call sites at src/workers/runtime.ts:125,133. Strip the label; move per-tenant duration into log context. Required before any external scraper is pointed at /metrics. Note: the HTTP route label on sv0_http_request_duration_seconds is already parameterized via req.route.path (src/api/middleware/metrics.ts:11) — no tenant cardinality leak from URL scoping. ~15 min.
- Gate /metrics and /diagnostics at the route level. Removing from PUBLIC_PATHS alone is not sufficient — createSystemRoutes() is mounted at src/api/app.ts:80, before the bearer and session middleware at lines 108 and 114. The route never reaches PUBLIC_PATHS evaluation. Fix: add a route-local bearer-check middleware wrapping /metrics and /diagnostics inside createSystemRoutes (src/api/routes/system.ts:52,62), leaving /health and /ready public for Docker / k8s / Cloudflare probes. Alloy scrapes with the token. ~30 min.
- Close P0-7 — external uptime monitor. BetterStack free, 60-second checks on /api/v1/health, SMS + email, CF Access service-token headers to bypass login. ~30 min.
- Session cookie secure=true unconditionally. Decouple from NODE_ENV. ~5 min.
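The route-level gate can be sketched as decision logic, independent of the web framework. The real fix is Express middleware inside createSystemRoutes; the Python below only illustrates the intended behavior. The function names and status codes are assumptions, while the four paths come from the item above.

```python
import hmac
from typing import Optional

PUBLIC_PROBE_PATHS = frozenset({"/health", "/ready"})
TOKEN_GATED_PATHS = frozenset({"/metrics", "/diagnostics"})


def bearer_token_matches(authorization: Optional[str], expected_token: str) -> bool:
    """Check an Authorization header against the scrape token in constant time."""
    if not authorization or not expected_token:
        return False
    scheme, _, presented = authorization.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(presented.strip().encode(), expected_token.encode())


def system_route_status(path: str, authorization: Optional[str], expected_token: str) -> int:
    """Return the status a system route should answer with before its handler runs."""
    if path in PUBLIC_PROBE_PATHS:
        return 200  # probes stay public for Docker / k8s / Cloudflare
    if path in TOKEN_GATED_PATHS:
        return 200 if bearer_token_matches(authorization, expected_token) else 401
    return 404
```

An empty or unset expected token fails closed, so a missing secret in the deploy environment cannot silently reopen `/metrics`.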
Day 2-5: Grafana Cloud signup + Alloy + dashboards + alerts + MCP server wiring. Full runbook in sv0-platform#494. Owner: Ivan + Ops.
3.2 Connector resilience at MediaPro scale — 3 items
- AWS multi-account enumeration (original P0-8, sv0-connectors#32). MediaPro uses AWS alongside its Azure-heavy stack (§1), so a single-account scan is pilot-blocking. To be validated against the MediaPro-provided test env. Owner: Connectors.
- Streaming pagination for Entra + AWS (original P0-9). azure_client.py:59-79 and aws_client.py:306-343 accumulate all pages in memory; 500K Entra SPs = OOM on 4GB host. Convert to generators. Owner: Connectors.
- Azure shared retry_session() honors Retry-After (original P0-10). shared/sv0_azure/sv0_azure/auth.py:40-59. Pin respect_retry_after_header=True, retry_after_max=300, retries=6. ServiceNow client is the reference impl. Owner: Connectors.
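Two of the items above lend themselves to a short sketch: converting page accumulation into a generator, and capping a Retry-After delay at 300 seconds. Both functions are illustrative and not taken from the connector code. `fetch_page` is a hypothetical callable returning `(items, next_token)`, and only the delta-seconds form of Retry-After is handled.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (items, next_page_token)


def iter_items(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield items page by page instead of accumulating every page in memory."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return


def retry_after_delay(header_value: Optional[str], max_delay: float = 300.0) -> Optional[float]:
    """Parse a delta-seconds Retry-After value, capped at max_delay.

    Returns None when the header is absent or not plain integer seconds;
    the HTTP-date form is deliberately left out of this sketch.
    """
    if header_value is None:
        return None
    value = header_value.strip()
    if not (value.isascii() and value.isdigit()):
        return None
    return min(float(value), max_delay)
```

Because `iter_items` fetches the next page only after the current one has been consumed, peak memory is bounded by one page rather than by the total number of service principals.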
3.3 Analyst workflow — 2 items
- Connector-finding lifecycle parity (original P0-11). src/api/routes/findings.ts:67 hardcodes status: "active" for connector findings; they have no acknowledge/FP/close/reopen path. Give them the same lifecycle as evaluator findings. Owner: Platform.
- Finding assignment + audit trail (original P0-12). Wire acknowledged_by end-to-end, add comments collection + UI, allow remediated → active reopen with audit. Owner: Platform + UI.
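A shared lifecycle for connector and evaluator findings could look roughly like the state machine below. This is a sketch under assumptions: the state names other than "active" and "remediated", the transition table, and the audit-entry shape are all hypothetical, and the platform code itself is TypeScript.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical transition table; only remediated -> active is named in the plan.
TRANSITIONS = {
    "active": {"acknowledged", "false_positive", "remediated"},
    "acknowledged": {"false_positive", "remediated"},
    "false_positive": {"active"},
    "remediated": {"active"},
}


@dataclass
class Finding:
    finding_id: str
    source: str  # "connector" or "evaluator"; both share one lifecycle
    status: str = "active"
    acknowledged_by: Optional[str] = None
    audit: List[dict] = field(default_factory=list)


def transition(finding: Finding, new_status: str, actor: str, comment: str = "") -> Finding:
    """Apply a status change and record who made it, or reject it."""
    if new_status not in TRANSITIONS.get(finding.status, set()):
        raise ValueError(f"{finding.status} -> {new_status} is not allowed")
    finding.audit.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "from": finding.status,
        "to": new_status,
        "comment": comment,
    })
    if new_status == "acknowledged":
        finding.acknowledged_by = actor
    finding.status = new_status
    return finding
```

Every accepted transition, including a reopen, appends an audit entry before the status changes, so the trail can be replayed to reconstruct the finding's history.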
3.4 Marketing / trust hygiene — 3 items, ~15 minutes total (plus Trust Summary)
- Remove four EU-infrastructure claims from website (original P0-14, sv0-website#41):
  - src/pages/index.astro:33 feature list
  - src/pages/evaluate.astro:58 detail text
  - src/pages/platform.astro:62-65 FAQ
  - src/pages/platform.astro:345 metric card
- Remove or wire "All Systems Operational" (original P0-15) — hardcoded footer string not connected to any monitor.
- Trust & Controls Summary v1 (original P0-13 reframed, sv0-documentation#192). De-escalated but still needed. CEO + external counsel.
3.5 Quality-of-life (P1 — ship in pilot-week-1 if not sooner)
- PDF / printable report export (currently Markdown-only; audit committees cannot consume .md).
- Evidence pack SHA256 hash visible on UI (computed but hidden today).
- Access-path nodes show display names, not truncated hex IDs (FindingDetail.tsx:79,87,95 — display names already on payload).
- Connector-report findings show meaningful evidence, not empty JSON viewer.
- Jira Cloud connector Phase 1 ready against MediaPro test env (sv0-connectors#72).
- Tenant onboarding runbook + credential rotation runbook per connector.
3.6 Post-pilot follow-ups (weeks 3–6)
- Compute migration to Azure VM or AWS VM per credit status (sv0-platform#493 Phase 2).
- Session-stacking acting_as impersonation feature (3–4 days, gated on Permission.INTERNAL_IMPERSONATE).
- Add Atlas Private Link, Terraform the stack, secrets to Key Vault / Secrets Manager.
- Log aggregation / tenant-tagged search.
- express-rate-limit IPv6 bypass patch.
- HMAC on evidence pack integrity (SHA256-only today).
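The difference between today's SHA256-only hash and the proposed HMAC can be shown with the standard library. This is a sketch, not the platform implementation: the function names are hypothetical and key management is out of scope.

```python
import hashlib
import hmac


def evidence_digest(pack: bytes) -> str:
    """Plain SHA-256: detects accidental change, but anyone can recompute it."""
    return hashlib.sha256(pack).hexdigest()


def evidence_tag(pack: bytes, key: bytes) -> str:
    """HMAC-SHA256: only holders of the key can produce a valid tag."""
    return hmac.new(key, pack, hashlib.sha256).hexdigest()


def verify_evidence(pack: bytes, key: bytes, tag: str) -> bool:
    """Constant-time comparison of the expected tag against the presented one."""
    return hmac.compare_digest(evidence_tag(pack, key), tag)
```

A bare digest lets a reader confirm the pack was not corrupted; the keyed tag additionally shows that the pack was sealed by a party holding the key, which is the property an auditor would typically ask about.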
4. Remediation calendar — three parallel tracks
Day 0 — single blocking action
Book WorkOS sales call. Ivan-owned. Nothing downstream unblocks until this call completes.
Track A — Platform / Ops / Connectors
| Day | Action | Owner | Ref |
|---|---|---|---|
| 1 | MongoDB for Startups application submitted. Externalize prod MONGODB_URI. External uptime monitor wired. Hetzner snapshots on. Strip tenant_id metric label. Gate /metrics and /diagnostics with a route-local bearer check. Session cookie secure=true. | Ivan + Ops | §3.1 |
| 2 | Atlas M10 cluster stood up. Default region: Frankfurt unless MediaPro confirms US-residency is acceptable. DB auth + IP allowlist + backups on. No traffic yet. | Ops | §2.2 |
| 2-3 | WorkOS Phase 0/1 (securityv0-internal org, Google Workspace OAuth). NODE_ENV=production + AUTH_PROVIDER=workos in prod env. Kill dev-auth-bypass. | Platform | §2.1 |
| 3 | Atlas cutover, 30-min maintenance window. mongodump → mongorestore. Flip MONGODB_URI. Smoke test. | Ops | §2.2 |
| 4 | Seed 3–5 magic-link test accounts in demo tenant. Manual walkthrough as a client-role user. | Platform | §5 |
| 4-7 | AWS multi-account enumeration validated against MediaPro test env. | Connectors | §3.2 |
| 5-8 | Connector streaming pagination; Azure Retry-After. | Connectors | §3.2 |
| 6-9 | Connector-finding lifecycle parity; finding assignment + audit. | Platform + UI | §3.3 |
| 8-11 | Jira Cloud connector Phase 1 against MediaPro test env. | Connectors | §3.5 |
| 10-12 | Tenant onboarding + credential rotation runbooks. Pilot provisioning dry-run. | Docs + Ops | §3.5 |
Track B — Client-facing
| Day | Action | Owner |
|---|---|---|
| 1-2 | Isaac probes MediaPro on: prod vs staging pilot target, data-residency expectations, SSO requirements, compliance attestations. | Isaac |
| 1-3 | MediaPro requirements packet v1 drafted (Azure / AWS / Foundry / Copilot Studio / ServiceNow / Jira / Entra permissions and IAM role templates). | Ivan + Isaac |
| 2-4 | Use-case coverage doc: MediaPro's 6 concerns → our rules → gap list → Sergey's fast-track recommendations. | Ivan |
| 5-7 | Requirements packet sent to Ezequiel. Two-week credential-prep window starts. | Isaac |
| ongoing | Weekly sync with Ezequiel on credential prep progress. | Isaac |
Track C — Trust / legal (de-escalated, parallel)
| Day | Action | Owner |
|---|---|---|
| 1 | Remove four EU claims from website; remove / wire "All Systems Operational". | Website |
| 2-7 | Trust & Controls Summary v1 drafted, reconciled with Atlas + Hetzner + WorkOS infrastructure. Supersedes fragmented Inetum/Deloitte partner-prep content. | CEO |
| 8-12 | Sign-off, version-controlled commit, share with MediaPro alongside requirements packet. | CEO |
If a track slips
- Track A slips: pilot goes live with documented exceptions in the client agreement annex. Exceptions: unfinished connector resilience = "one-account pilot" scope, deferred assignment UI = "contact us" support model, etc.
- Track B slips: MediaPro cannot prepare credentials in two weeks; pilot date pushes. Worse commercial outcome than a technical slip.
- Track C slips: acceptable for MediaPro (DPO waived). Not acceptable for client #2.
5. Multi-user testing and impersonation strategy
Current state in code (audit 2026-04-22)
The impersonation scaffolding is more built-out than ADR-016's "proposed" status suggests. Present in src/api/auth/:
- Permission.INTERNAL_IMPERSONATE defined and assigned to InternalRole.owner
- AuthMethod union includes "impersonation"
- workos-provider.ts maps WorkOS's "Impersonation" response
- session.ts stores is_super_admin with 8-hour TTL
- resolvePermissions() distinguishes super_admin_derived vs. membership sources
What's missing is the easy part: a POST /api/v1/admin/impersonate route, acting_as_user_id session field, UI banner, audit log emit. ~3–4 days of engineering.
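The missing pieces can be sketched as a session-stacking model: the operator's identity stays on the session while `acting_as_user_id` overlays it, and every start and stop emits an audit event. The Python below is illustrative only; the `Session` shape, function names, and audit event names are assumptions, and the real route would live in the TypeScript API.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional

INTERNAL_IMPERSONATE = "INTERNAL_IMPERSONATE"


@dataclass(frozen=True)
class Session:
    user_id: str
    permissions: FrozenSet[str]
    acting_as_user_id: Optional[str] = None

    @property
    def effective_user_id(self) -> str:
        """Identity used for data access; the real operator stays in user_id."""
        return self.acting_as_user_id or self.user_id


def start_impersonation(session: Session, target_user_id: str, audit_log: List[dict]) -> Session:
    if INTERNAL_IMPERSONATE not in session.permissions:
        raise PermissionError("missing INTERNAL_IMPERSONATE")
    if session.acting_as_user_id is not None:
        raise ValueError("already impersonating; stop first")
    audit_log.append({"event": "impersonation.start",
                      "actor": session.user_id, "target": target_user_id})
    return replace(session, acting_as_user_id=target_user_id)


def stop_impersonation(session: Session, audit_log: List[dict]) -> Session:
    if session.acting_as_user_id is None:
        raise ValueError("not impersonating")
    audit_log.append({"event": "impersonation.stop",
                      "actor": session.user_id, "target": session.acting_as_user_id})
    return replace(session, acting_as_user_id=None)
```

Keeping the operator's `user_id` intact while impersonating is what allows a UI banner and the audit log to name the real actor.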
Gating dependency
All multi-user testing paths require WorkOS Phase 0/1 live. Until then, REQUIRE_AUTH=false is in effect and the users / memberships / tenants collections don't exist.
Path comparison
| Path | Cost | Fit | Verdict |
|---|---|---|---|
| Second Google Workspace | $35–70/mo + 1–2 days | Operationally clunky | Overkill |
| Users in securityv0.com Workspace | $0–70/mo | Domain-collision risk — users may inherit is_super_admin from securityv0-internal org | Dangerous; skip |
| Magic-link test accounts | $0 + 2–3 hours | Any email works; no IdP required | Recommended interim |
| WorkOS Dashboard Impersonation | Included (confirm with Sally) | Must trigger from admin.workos.com | Free fallback |
| sv0 session-stacking | $0 + 3–4 days engineering | Uses existing permission scaffolding | Recommended long-term (post-pilot) |
Recommendation
Pilot-window: magic-link test accounts, seeded post-WorkOS-Phase-0 (Track A day 4). Week 1 post-pilot: build the acting_as feature — gated by existing Permission.INTERNAL_IMPERSONATE, with mandatory audit log and server-rendered banner.
6. Trust & Controls Summary v1 — contents
De-escalated from "legal blocker" to "credibility artifact." Still needed.
- Named sub-processors: Hetzner (transitional pilot hosting), MongoDB Atlas (hosted on Azure or AWS per region choice), Cloudflare (Zero Trust + CDN), WorkOS (auth), GitHub (CI/CD + GHCR).
- Data controller identity + DPA commitment timeline. Unsigned draft DPA attached as annex.
- Breach notification SLA — 72 hours (GDPR-grade).
- Deletion-on-termination guarantee with timeframe and verification.
- Current security controls with honest caveats + remediation dates.
- SOC 2 readiness timeline. Interim: completed CAIQ.
- Named security contact + escalation path.
Tracked in sv0-documentation#192.
7. Issue tracking — standing readiness board
Umbrella tracker: sv0-documentation#195 — EPIC: MediaPro pilot readiness is the single pane of glass with per-track checklists. Use that issue for status skimming; use this document for rationale and evidence.
| Repo | Tracking issue | Covers |
|---|---|---|
| sv0-platform | #492 — epic: pre-client readiness P0s (platform) | Remaining platform P0s (after auth + DB auto-resolutions) |
| sv0-platform | #493 — Atlas cutover + compute migration | Atlas this week, compute post-pilot |
| sv0-platform | #373 — WorkOS auth rollout | Auth auto-resolutions |
| sv0-platform | #366 — capabilities endpoint | Settings hardcoded drift |
| sv0-platform | #494 — observability stack rollout | §3.1 — Grafana Cloud + BetterStack + MCP |
| sv0-connectors | #89 — epic: pre-client readiness (connectors) | §3.2 connector resilience |
| sv0-connectors | #32 — AWS multi-account | AWS scale |
| sv0-connectors | #72 — Jira Cloud Phase 1 | MediaPro-offered test env |
| sv0-website | #41 — pre-client readiness (website) | EU claims + status string |
| sv0-documentation | #192 — Trust & Controls Summary v1 | §6 |
| sv0-documentation | #193 — MediaPro requirements packet | Track B |
| sv0-documentation | #194 — MediaPro use-case coverage | Sergey fast-track input |
Appendix A: Original P0 inventory — what was found, how resolved
The v1.0 six-perspective review identified 15 P0 ship-blockers. Annotations reflect v1.4 architectural decisions.
| # | P0 | v2.0 status |
|---|---|---|
| P0-1 | Prod runs NODE_ENV=development + AUTH_PROVIDER=dev | Active — fixed by WorkOS Phase 0/1 (§2.1) |
| P0-2 | Attacker-controlled x-tenant-id | Auto-resolved once P0-1 lands; new middleware derives from JWT |
| P0-3 | Hardcoded DEV_COOKIE_PASSWORD | Auto-resolved once AUTH_PROVIDER=dev is no longer in prod |
| P0-4 | MongoDB runs with no authentication | In-flight, resolved once Atlas cutover verified against evidence checklist in §2.2 |
| P0-5 | /metrics publicly exposed + leaks tenant_id labels | Active — §3.1, 15 min fix |
| P0-6 | No offsite backups; backups on same disk | In-flight, resolved once Atlas cutover verified (native PITR; evidence checklist §2.2) |
| P0-7 | No external uptime monitor | Active — §3.1, 30 min fix |
| P0-8 | AWS connector single-account only | Active — §3.2, pilot-blocking at MediaPro scale |
| P0-9 | Connector pagination buffers in memory | Active — §3.2, OOM risk at scale |
| P0-10 | Azure retry_session() doesn't honor Retry-After | Active — §3.2 |
| P0-11 | Connector findings stuck at status: "active" | Active — §3.3 |
| P0-12 | No assignment / comments / audit trail | Active — §3.3 |
| P0-13 | No canonical Trust & Controls package | Active but de-escalated — §2.4, MediaPro waived DPO |
| P0-14 | Website EU-infrastructure claims (4 locations) | Active — §3.4, 5 min fix |
| P0-15 | Hardcoded "All Systems Operational" footer | Active — §3.4, 5 min fix |
Net: 2 in-flight via Atlas cutover (P0-4, P0-6, resolution gated on verification checklist), 2 in-flight via WorkOS Phase 0/1 (P0-2, P0-3), 11 active pilot work items.
Appendix B: Infrastructure options not taken
From Opus research agent (2026-04-22). Documented to audit the path not taken.
| Option | 1-mo cash | Why not |
|---|---|---|
| A. AWS t3.small + Atlas M10 | $72/mo | Deferred to post-pilot per Opus change-risk argument |
| B. AWS t3.medium + Atlas M10 | $87/mo | Same as A with more headroom |
| C. Azure Container Apps + Cosmos DB vCore | $0 cash, ~$210/mo credits | Cosmos weakens portability; explicit anti-pattern |
| D. Azure Container Apps + Atlas-on-Azure | ~$57 cash + ~$60 credits | Container Apps = cloud-specific deployment semantics; breaks portability promise |
| E. Hetzner + Atlas M10 (chosen hybrid) | ~$81/mo ($24 Hetzner + $57 Atlas) | Lowest-risk for pilot window, preserves cloud optionality post-pilot |
Revision history
| Date | Change |
|---|---|
| 2026-04-21 | v1.0 — initial six-perspective adversarial review, 15 P0s, single-track 14-day calendar |
| 2026-04-22 | v1.1 — Codex review applied: issue tracking table, P0-13 reframed, P0-14 expanded, Track A/B split |
| 2026-04-22 | v1.2 — MediaPro client context folded in; initial infra direction (migrate to AWS+Atlas during pilot) |
| 2026-04-22 | v1.3 — Opus + Sonnet research: deferred AWS migration to post-pilot; multi-user testing section added; Day-0 WorkOS sales call |
| 2026-04-22 | v1.4 — Codex review: split DB/compute decisions; Atlas cutover in-pilot; compute stays Hetzner; anti-patterns locked |
| 2026-04-22 | v2.0 — structural rewrite. Architectural decisions (WorkOS, Atlas, compute, trust) moved to the front. Active work regrouped by capability (observability, connectors, analyst workflow, marketing). Original 15-P0 inventory moved to Appendix A with per-item resolution status. Opus options table moved to Appendix B. Purpose: eliminate the incongruity between "MongoDB has no auth" style P0s and the "we're migrating to Atlas" decisions that pre-resolve them. |
| 2026-04-22 | v2.1 — Observability stack research completed (Opus subagent). Decision: Grafana Cloud free + BetterStack free + grafana/mcp-grafana MCP server. Rationale: best agentic-ops surface, zero ops burden, survives post-pilot compute migration, $0 at pilot scale. §3.1 rewritten to incorporate 5-day rollout; new issue sv0-platform#494. Full research in docs/architecture/research/2026-04-22-observability-stack.md. |
| 2026-04-22 | v2.2 — URL tenant-scoping research (Opus subagent) + Codex review applied: (1) Decided to keep /t/:slug/ per ADR-016; Path 1 (add SINGLE_TENANT_SLUG env flag when first dedicated-deployment client signs) rather than subdomain migration. ADR-016 amended with cardinality verification (route label is parameterized, no URL-cardinality leak), enterprise-tool survey, revisit thresholds. (2) §3.1 corrected: removing /metrics from PUBLIC_PATHS alone is insufficient because createSystemRoutes() mounts at app.ts:80 before auth middleware — requires route-local bearer-check on /metrics and /diagnostics. (3) "Resolved by Atlas" language softened to "in-flight, resolved once cutover verified" with explicit evidence checklist. (4) Atlas region language tightened: pending MediaPro confirmation, default EU/Frankfurt (aligns with Grafana Cloud region) unless US residency is explicitly approved. |
| 2026-04-22 | v2.3 — Umbrella epic sv0-documentation#195 opened as single-pane-of-glass tracker with per-track checklists. This review doc remains the source of rationale; the epic is for status skimming. |
Owner and review cadence
Owner: Ivan Fofanov (CTO).
Review cadence: Twice-weekly status update against Track A/B/C items until MediaPro go-live. Close items with commit/PR reference, not silently.
Definition of done: Every §3 active item is closed with evidence, or documented as a client-agreement exception with Ezequiel's sign-off.