
Deployment and Cloud Strategy Research

Date: 2026-02-07
Scope: Core platform (api, query, trigger evaluator, evidence generator, ui), database, connectors, scheduling/triggers, logging/observability, and autonomous operations support.

1. Decision Criteria

Primary criteria used for the options below:

  • Speed to first pilot (days/weeks, not months)
  • Deterministic operability (easy to inspect logs and state via CLI)
  • Cost predictability at low volume
  • Migration path without re-platforming core code
  • Support for autonomous troubleshooting (agents/automation can query logs/metrics with narrow permissions)

2. Deployment Options

Option A: MVP Self-Hosted VM (Hetzner + Docker Compose)

Shape

  • One VM for app services (api, ui, connectors, trigger, evidence) via Docker Compose
  • One VM for the data plane (mongodb + backup job), or the same VM as the app services for the earliest MVP
  • Optional third small VM for observability stack (Loki/Prometheus/Grafana)

Pros

  • Fastest setup and lowest fixed cost
  • Full root/SSH control for debugging
  • Easy to run all services together and iterate architecture quickly

Cons

  • You own HA, backups, patching, and security hardening
  • Manual scaling and failover
  • Higher operational risk for customer-facing production

Fit

  • Best for internal demo, design-partner pilot, and rapid product iteration

Option B: AWS ECS on EC2 (Production-leaning, lower complexity than EKS)

Shape

  • ECS services for core platform
  • Connector workers as separate ECS services or scheduled tasks
  • MongoDB remains self-managed initially (EC2) or moved to Atlas later
  • CloudWatch logs/metrics + ECS Exec for diagnostics

Pros

  • ECS EC2 launch type has no additional ECS control plane charge
  • Better IAM, networking, and security controls than single VM
  • Easier ops than EKS

Cons

  • You still manage EC2 capacity and patching
  • Still non-trivial if MongoDB remains self-managed

Fit

  • Strong mid-stage target when you want AWS controls without Kubernetes overhead

Option C: AWS ECS on Fargate (Managed container runtime)

Shape

  • Core services on ECS Fargate
  • Connector jobs as on-demand/scheduled Fargate tasks
  • DB likely moved to managed offering for better reliability

Pros

  • No node management
  • Clean scaling model per task
  • Good fit for bursty connectors

Cons

  • Can cost more than EC2 at steady state
  • Per-task compute pricing needs tight right-sizing

Fit

  • Good for production where team bandwidth for infra ops is limited

Option D: AWS EKS (Strategic only if Kubernetes-native operating model is required)

Shape

  • Full Kubernetes platform for core services and workers
  • GitOps (Argo CD/Flux) + HPA/KEDA + service mesh (optional)

Pros

  • Maximum flexibility and ecosystem
  • Standardized platform if broader org already runs Kubernetes

Cons

  • Highest platform complexity
  • EKS cluster fee applies regardless of workload size

Fit

  • Use only when there is a clear multi-team/platform reason

Option E: Event-Driven Connectors/Triggers with Lambda (Selective)

Shape

  • Keep core API/query services containerized
  • Move selected connector ingestion and trigger evaluation jobs to Lambda + EventBridge

Pros

  • Excellent for bursty workloads
  • Fine-grained cost for sporadic jobs

Cons

  • Distributed debugging complexity
  • Cold start/runtime constraints for heavy workloads

Fit

  • Good complement later, not a full replacement of core graph platform

3. Recommended Phased Path

Phase 0 (MVP, now): Hetzner + Docker Compose

Target timeline: 1-3 weeks

Baseline architecture

  • VM-1: api, query, trigger, evidence, ui, connector workers
  • VM-2: mongodb + nightly encrypted backup + restore test
  • Optional VM-3: observability stack if needed

Critical controls for MVP

  • Immutable image tags per deploy
  • Backup + restore drill (weekly)
  • Basic SLO dashboard (API error rate, connector failure rate, sync latency)
  • Structured JSON logs with correlation IDs (tenant_id, sync_id, entity_id, finding_id)
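
As a rough illustration of the structured-logging control, the sketch below emits one JSON object per log line and carries the four correlation IDs named above. The logger name, the `ts`/`level`/`message` keys, and the sample ID values are assumptions for illustration, not an agreed schema.

```python
import json
import logging
from datetime import datetime, timezone

# Correlation fields named in the list above; the other keys are illustrative.
CORRELATION_FIELDS = ("tenant_id", "sync_id", "entity_id", "finding_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy correlation IDs passed via `extra=`; IDs that are absent are omitted.
        for field in CORRELATION_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                payload[field] = value
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str = "connector") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr (hypothetical helper)."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("sync finished", extra={"tenant_id": "t-001", "sync_id": "s-42"})
```

Because every line is a self-contained JSON object, the same output can be filtered by `tenant_id` or `sync_id` whether it is read through docker logs or a centralized log store.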

Why this is acceptable

  • Fastest path to pilot while preserving deterministic debugging via SSH + Docker logs

Phase 1 (Pilot to early production): AWS ECS on EC2

Target timeline: after first pilots, before broader customer rollout

Move plan

  • Migrate app services from Compose to ECS task definitions
  • Keep connectors as separate services/tasks
  • Use CloudWatch for centralized logs/metrics
  • Enable ECS Exec for break-glass diagnostics

Reasoning

  • Materially better security and operability than raw VMs without full Kubernetes burden

Phase 2 (Production scale): ECS Fargate or EKS by clear trigger

Choose ECS Fargate when

  • Team wants managed runtime and predictable operational model
  • Workloads are moderate and scaling is service/task oriented

Choose EKS when

  • Organization already has Kubernetes platform team
  • Need advanced Kubernetes-native controls that ECS cannot reasonably satisfy

Phase 3 (Optimization): Selective Lambda for burst jobs

Use Lambda/EventBridge for:

  • scheduled lightweight connectors
  • enrichment jobs
  • periodic housekeeping/reconciliation

Keep long-running graph/API workloads containerized.
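
A scheduled housekeeping job of this kind could look roughly like the following Lambda handler sketch. The `run_housekeeping` function and the shape of the return value are hypothetical; the only assumption about the platform is that EventBridge scheduled events carry an ISO-8601 UTC timestamp in their `time` field.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def run_housekeeping(as_of: datetime) -> int:
    """Hypothetical job body; returns the number of records reconciled."""
    # Placeholder: a real job would query the database here.
    return 0


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Lambda entry point intended for an EventBridge scheduled rule."""
    raw_time = event.get("time")
    if raw_time:
        # Normalize the trailing "Z" so older Python versions can parse it.
        as_of = datetime.fromisoformat(raw_time.replace("Z", "+00:00"))
    else:
        as_of = datetime.now(timezone.utc)
    reconciled = run_housekeeping(as_of)
    return {
        "job": "housekeeping",
        "as_of": as_of.isoformat(),
        "reconciled": reconciled,
    }
```

Keeping the job body in a plain function makes it possible to run the same code from a container or a cron entry before it is moved to Lambda.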

4. Observability and Logging Strategy

MVP observability (low cost, high control)

Option MVP-Obs-A: Self-hosted LGTM stack

  • Loki + Promtail for logs
  • Prometheus + Alertmanager for metrics
  • Grafana for dashboards
  • Optional Tempo for traces

Pros:

  • Minimal vendor cost
  • Full control
  • Easy CLI access (docker logs, logcli, promtool)

Cons:

  • You operate it
  • Retention and scaling need discipline

Option MVP-Obs-B: Managed SaaS light footprint

  • Grafana Cloud or Better Stack for logs/metrics/traces
  • Keep app on Hetzner

Pros:

  • Lower operational overhead
  • Fast setup and team-friendly UI

Cons:

  • Ongoing ingest/retention costs
  • Vendor dependency

Production observability

Option Prod-Obs-A: AWS-native

  • CloudWatch Logs + CloudWatch metrics + alarms
  • Optional Amazon Managed Grafana and AMP

Pros:

  • IAM-native access control
  • Deep ECS/EKS integration
  • ECS Exec and the AWS CLI (aws logs) improve remote diagnostics

Cons:

  • Cost can rise quickly with log volume and query scanning

Option Prod-Obs-B: Hybrid

  • Keep CloudWatch for platform logs
  • Route application logs/metrics to Grafana Cloud or Better Stack

Pros:

  • Better query UX/correlation in some cases
  • Can reduce operational toil

Cons:

  • Dual tooling and data egress considerations

5. MVP Cost Comparison And Ready-to-Go Estimate

All numbers below are high-level estimates using public list pricing and simple assumptions.

5.1 Assumptions Used

  • Region assumptions:
      • Hetzner EU pricing from public cloud page.
      • AWS us-east-1 style list pricing anchors.
  • Workload assumptions:
      • Core services run 24x7.
      • 1 pilot tenant, low traffic.
      • 30 GB/month log ingestion.
      • 300 GB persistent block storage for MongoDB data and snapshots.
  • CI/CD assumptions:
      • GitHub Actions private repo includes 2,000 free Linux minutes/month; overage priced at $0.008/min.
  • Exclusions:
      • VAT/tax, support plans, data egress spikes, incident-response labor.
      • One-time engineering setup cost.

5.2 Unit Cost Anchors (Public Pricing)

  • Hetzner Cloud examples:
      • CPX21 = EUR 9.49/month
      • CPX31 = EUR 16.49/month
      • CPX41 = EUR 30.49/month
      • Hetzner Load Balancer = from EUR 5.39/month
      • Hetzner Object Storage = from EUR 4.99/month
  • AWS:
      • ECS on EC2: no additional ECS fee beyond underlying resources.
      • EKS control plane: $0.10/hour per cluster.
      • Fargate Linux x86 (on-demand): vCPU $0.000011244/second, memory $0.000001235/GB-second.
      • EC2 on-demand (reference class):
          • t2.medium = $0.0464/hour
          • t2.large = $0.0928/hour
      • Application Load Balancer:
          • base $0.0225/hour
          • LCU $0.008/LCU-hour
      • CloudWatch Logs examples:
          • ingest $0.50/GB
          • archive $0.03/GB-month
  • GitHub Actions:
      • private repos include 2,000 minutes/month free
      • Linux 2-core overage $0.008/min
  • Grafana Cloud Pro:
      • $19/month platform fee
      • includes 50 GB logs + 50 GB traces
      • beyond included logs/traces $0.50/GB
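
To make the per-second and per-hour anchors easier to compare, the sketch below converts them to monthly figures. The 730-hour month is an assumption (a common averaging convention), and the helper names are illustrative only.

```python
HOURS_PER_MONTH = 730  # assumption: average month length used for estimates

# Unit prices copied from the anchors above (USD).
FARGATE_VCPU_PER_SECOND = 0.000011244
FARGATE_GB_PER_SECOND = 0.000001235
EC2_HOURLY = {"t2.medium": 0.0464, "t2.large": 0.0928}
ALB_BASE_HOURLY = 0.0225
EKS_CLUSTER_HOURLY = 0.10


def hourly_to_monthly(hourly_usd: float, hours: int = HOURS_PER_MONTH) -> float:
    """Scale an hourly price to a full month of 24x7 usage."""
    return hourly_usd * hours


def fargate_monthly(vcpu: float, memory_gb: float, hours: int = HOURS_PER_MONTH) -> float:
    """Monthly cost of one always-on Fargate footprint of the given size."""
    seconds = hours * 3600
    per_second = vcpu * FARGATE_VCPU_PER_SECOND + memory_gb * FARGATE_GB_PER_SECOND
    return seconds * per_second


if __name__ == "__main__":
    for name, hourly in EC2_HOURLY.items():
        print(f"{name}: ${hourly_to_monthly(hourly):.2f}/month")
    print(f"ALB base: ${hourly_to_monthly(ALB_BASE_HOURLY):.2f}/month")
    print(f"EKS control plane: ${hourly_to_monthly(EKS_CLUSTER_HOURLY):.2f}/month")
    print(f"Fargate 3 vCPU / 6 GB: ${fargate_monthly(3, 6):.2f}/month")
```

Under the 730-hour assumption, a 3 vCPU / 6 GB Fargate footprint comes to roughly USD 108/month and the EKS control plane to USD 73/month, before any workload runs on it.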

5.3 MVP Option Comparison (Monthly, High-Level)

| MVP Option | What is included | Estimated Monthly Cost |
| --- | --- | --- |
| A. Hetzner Lean | 1x CPX31 all-in-one app+db, object storage backups | ~EUR 21.48 |
| B. Hetzner Ready-to-Go (recommended MVP) | 1x CPX31 app/workers + 1x CPX41 MongoDB + 1x CPX21 observability + LB + object storage | ~EUR 66.85 |
| C. AWS ECS on EC2 (MVP production-like) | 2x t2.medium ECS nodes + 1x t2.large MongoDB + ALB + 300GB EBS + 3 IPv4 + 30GB CloudWatch logs | ~USD 205 to USD 240 |
| D. AWS ECS Fargate + EC2 MongoDB | Fargate core services (3 vCPU, 6 GB) + 1x t2.large MongoDB + ALB + EBS + IPv4 + CloudWatch logs | ~USD 245 to USD 280 |

Notes:

  • AWS ranges reflect whether NAT Gateway is needed (+~USD 33/month base before traffic processing) and modest ALB LCU variability.
  • Option C/D are intentionally sized as practical MVP production baselines, not ultra-minimal single-instance demos.
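
The option totals can be cross-checked from the unit anchors in 5.2. The sketch below is a rough check, not a pricing tool: the EBS rate (USD 0.08/GB-month) and public IPv4 rate (USD 0.005/hour) are assumptions that do not appear in the anchors, as is the 730-hour month, and ALB LCU charges are left out.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730  # assumption: average month length

# Hetzner list prices from section 5.2 (EUR/month).
HETZNER_EUR = {
    "CPX21": Decimal("9.49"),
    "CPX31": Decimal("16.49"),
    "CPX41": Decimal("30.49"),
    "load_balancer": Decimal("5.39"),
    "object_storage": Decimal("4.99"),
}
OPTION_A = ["CPX31", "object_storage"]
OPTION_B = ["CPX31", "CPX41", "CPX21", "load_balancer", "object_storage"]


def hetzner_total(items):
    """Sum the monthly EUR price of the listed Hetzner line items."""
    return sum((HETZNER_EUR[item] for item in items), Decimal("0"))


def aws_option_c_base(
    ebs_usd_per_gb_month: float = 0.08,  # assumption, not in the anchors
    ipv4_usd_per_hour: float = 0.005,    # assumption, not in the anchors
    with_nat: bool = False,
) -> float:
    """Approximate Option C monthly cost in USD, excluding ALB LCU usage."""
    ec2 = (2 * 0.0464 + 0.0928) * HOURS_PER_MONTH  # 2x t2.medium + 1x t2.large
    alb = 0.0225 * HOURS_PER_MONTH                 # ALB base charge only
    ebs = 300 * ebs_usd_per_gb_month               # 300 GB block storage
    ipv4 = 3 * ipv4_usd_per_hour * HOURS_PER_MONTH # 3 public IPv4 addresses
    logs = 30 * 0.50                               # 30 GB CloudWatch ingest
    nat = 33.0 if with_nat else 0.0                # NAT Gateway base from the notes
    return ec2 + alb + ebs + ipv4 + logs + nat


if __name__ == "__main__":
    print("Option A:", hetzner_total(OPTION_A), "EUR")
    print("Option B:", hetzner_total(OPTION_B), "EUR")
    print(f"Option C base: {aws_option_c_base():.2f} USD")
    print(f"Option C with NAT: {aws_option_c_base(with_nat=True):.2f} USD")
```

With these inputs the Hetzner totals reproduce EUR 21.48 and EUR 66.85 exactly, while Option C lands at about USD 202 without NAT and about USD 235 with it, slightly under the rounded range in the table, which also allows for LCU usage.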

5.4 Ready-to-Go Estimate

Recommended for first customer pilot:

  • Platform: Hetzner Compose split-node MVP (CPX31 + CPX41 + CPX21 + LB + Object Storage)
  • Observability: self-hosted LGTM on the CPX21 node
  • CI/CD: GitHub Actions auto-deploy to staging on every main commit, prod via protected approval

Estimated monthly run cost:

  • Infrastructure subtotal: ~EUR 66.85/month
  • GitHub Actions overage:
      • 0 if under included 2,000 private Linux minutes
      • Example overage (+1,000 min) = ~USD 8/month
  • Optional managed observability alternative:
      • replace self-hosted observability node with Grafana Cloud Pro (USD 19/month), typically reducing ops burden
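
The two variable add-ons above follow simple "included allowance plus overage" arithmetic, sketched below using the rates from section 5.2. The function names are illustrative, and the Grafana calculation assumes logs and traces are each metered against their own 50 GB allowance.

```python
def actions_overage_usd(minutes_used: int, included: int = 2000, rate: float = 0.008) -> float:
    """GitHub Actions cost for private-repo Linux minutes beyond the free tier."""
    return max(0, minutes_used - included) * rate


def grafana_cloud_pro_usd(
    log_gb: float,
    trace_gb: float = 0.0,
    platform_fee: float = 19.0,
    included_gb: float = 50.0,
    overage_per_gb: float = 0.50,
) -> float:
    """Grafana Cloud Pro estimate: platform fee plus per-GB overage.

    Assumption: logs and traces each have a separate included allowance.
    """
    log_overage = max(0.0, log_gb - included_gb) * overage_per_gb
    trace_overage = max(0.0, trace_gb - included_gb) * overage_per_gb
    return platform_fee + log_overage + trace_overage


if __name__ == "__main__":
    print(f"CI overage at 3,000 min: ${actions_overage_usd(3000):.2f}")
    print(f"Grafana Cloud Pro at 30 GB logs: ${grafana_cloud_pro_usd(30):.2f}")
```

At the assumed 30 GB/month of log ingestion the pilot stays inside the included allowance, so the managed option costs only the platform fee.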

Planning budget for pilot:

  • Target run-rate: ~EUR 67 + USD 0 to 20/month (depending on CI and observability choice)
  • Operationally safe envelope with contingency: ~EUR 90 to 140/month equivalent

5.5 Hetzner-Range Competitors (US-Focused)

Reference date: 2026-02-07. Prices are entry-level and can change.

| Provider | Entry Price (Approx) | US Presence | Fit Notes |
| --- | --- | --- | --- |
| OVHcloud US VPS | from $4.20/month | US regions including Hillsboro and Vint Hill | Strong price/perf, daily backups and anti-DDoS included in VPS range. |
| Vultr Cloud Compute | from $2.50/month (IPv6-only) or $5/month standard entry | Broad US city coverage (e.g., Atlanta, Chicago, Dallas, Los Angeles, Miami, New York area, Seattle, SF Bay Area) | Good balance of low price and many US regions. |
| DigitalOcean Droplets | from $6/month for 1GB basic droplets | US datacenters in NYC, SFO, ATL | Usually higher than Hetzner at equal specs, but simpler operations and good DX. |
| Contabo Cloud VPS | Cloud VPS 10 total shown around EUR 5.45 to 5.90 with US location fees | US East, US West, US Central | Very low monthly price; verify performance consistency for production workloads. |
| IONOS Cloud Cubes | from $5.76 per 30 days | Newark, NJ (US) | Cost-effective and simple billing model for small footprints. |
| UpCloud Developer Plans | from $3.50/month (USD pricing) | US in Chicago, New York, San Jose | Competitive low end with good US footprint and predictable plans. |
| AWS Lightsail | from $3.50/month (IPv6-only) and $5/month standard entry | Multiple US regions | Simple AWS entry point; costs can rise once add-ons/managed services are added. |

Selection guidance for SecurityV0 MVP:

  • Choose Vultr when you want many US regions with low entry cost and simple VM operations.
  • Choose OVHcloud US when you prioritize cost and built-in VPS protections.
  • Choose DigitalOcean when developer workflow and operational simplicity are more important than minimum price.
  • Choose Contabo/IONOS/UpCloud when monthly floor cost is the main driver and you can validate workload behavior early.

6. CLI and Agent Access Requirements

Required for both human and autonomous troubleshooting:

  • Centralized logs accessible by CLI
  • Narrow-scoped read-only credentials for diagnostic agents
  • Correlation IDs across services and jobs
  • Ability to access runtime shell only via auditable controls

CLI paths by platform

  • Compose/VM: ssh, docker compose logs, docker logs
  • ECS: aws logs tail, aws logs start-query, aws ecs execute-command
  • EKS: kubectl logs, kubectl describe, kubectl top
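
For automation that needs the same "recent logs" view on any of these platforms, the CLI paths above can be wrapped in one helper. This is a sketch only: the function name and the meaning of `target` (Compose service name, CloudWatch log group, or Kubernetes resource such as deployment/api) are assumptions, and the commands are returned as argument lists rather than executed.

```python
from typing import List


def log_tail_command(platform: str, target: str, since: str = "15m") -> List[str]:
    """Build the argv for fetching recent logs on the given platform.

    `target` is a Compose service name, a CloudWatch log group name,
    or a Kubernetes resource reference, depending on `platform`.
    """
    if platform == "compose":
        return ["docker", "compose", "logs", "--since", since, target]
    if platform == "ecs":
        return ["aws", "logs", "tail", target, "--since", since]
    if platform == "eks":
        return ["kubectl", "logs", target, f"--since={since}"]
    raise ValueError(f"unknown platform: {platform}")


if __name__ == "__main__":
    # Hypothetical targets, for illustration only.
    print(" ".join(log_tail_command("compose", "api")))
    print(" ".join(log_tail_command("ecs", "/ecs/api")))
    print(" ".join(log_tail_command("eks", "deployment/api")))
```

Returning an argument list keeps the helper safe to pass to `subprocess.run` without shell quoting concerns.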

Security model

  • No shared root credentials
  • Role-based temporary credentials (OIDC where possible)
  • Full audit trail for interactive access (ECS Exec supports CloudTrail auditing)
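
On AWS, the narrow read-only credential for a diagnostic agent might be expressed as an IAM policy along these lines. This is a minimal sketch: the region, account ID, and log group name are placeholders, and which actions accept resource-level scoping should be verified against the current IAM documentation before use.

```python
import json
from typing import Any, Dict


def readonly_logs_policy(region: str, account_id: str, log_group: str) -> Dict[str, Any]:
    """Build a read-only CloudWatch Logs policy for one log group."""
    arn = f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOneLogGroup",
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:FilterLogEvents",
                    "logs:GetLogEvents",
                    "logs:StartQuery",
                ],
                # Cover both the log group itself and its streams.
                "Resource": [arn, f"{arn}:*"],
            },
            {
                "Sid": "ReadQueryResults",
                "Effect": "Allow",
                # Assumption: these are not scoped per log group here.
                "Action": [
                    "logs:DescribeLogGroups",
                    "logs:GetQueryResults",
                    "logs:StopQuery",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real resources.
    print(json.dumps(readonly_logs_policy("us-east-1", "123456789012", "/ecs/api"), indent=2))
```

The policy grants no write, delete, or shell actions, so an agent holding it can query logs but cannot change retention, remove data, or open an ECS Exec session.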

7. CI/CD Plan (MVP Mandatory)

GitHub Actions should deploy automatically to staging on every commit to main; production deploys stay behind manual approval until the pipeline is stable.

MVP pipeline (Hetzner + Compose)

  1. lint-test
      • Run tests and static checks.
  2. build-publish
      • Build container images.
      • Push to registry.
  3. deploy-staging (auto on main)
      • SSH to staging VM.
      • Pull latest images.
      • docker compose up -d.
      • Run smoke checks.
  4. deploy-prod (manual approval until stable)
      • Same flow as staging with protected environment approval.
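
The smoke-check step in the deploy stages could be as small as the retrying probe sketched below. The health endpoint URLs, retry counts, and function names are assumptions; the probe is injectable so the retry logic can be exercised without a network.

```python
import time
from typing import Callable, Iterable
from urllib.error import URLError
from urllib.request import urlopen


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        # Covers HTTP errors, refused connections, and timeouts.
        return False


def smoke_check(
    urls: Iterable[str],
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 5,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry until every URL passes the probe or attempts run out."""
    pending = list(urls)
    for attempt in range(attempts):
        # Keep only the URLs that are still failing.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

A deploy job would call `smoke_check` with the staging health URLs right after docker compose up -d and fail the job when it returns False, which keeps a broken image from being promoted to the production stage.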

Auth and secrets

  • Prefer OIDC to cloud where applicable.
  • For VM SSH, use short-lived deploy keys and restricted command scope.

Drift prevention

  • Store deployment manifests in Git.
  • Record deployed image digest.
  • Keep rollback command and previous digest available.
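
One way to keep the previous digest available is a small per-service deploy history, sketched below. The class, its fields, and the sample image name are hypothetical; the only convention relied on is that an image can be pinned as name@sha256:digest.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeployHistory:
    """Ordered record of image digests deployed for one service."""

    service: str
    digests: List[str] = field(default_factory=list)

    def record(self, digest: str) -> None:
        """Append a newly deployed digest, ignoring immediate repeats."""
        if not digest.startswith("sha256:"):
            raise ValueError("expected a sha256 image digest")
        if not self.digests or self.digests[-1] != digest:
            self.digests.append(digest)

    @property
    def current(self) -> Optional[str]:
        return self.digests[-1] if self.digests else None

    @property
    def previous(self) -> Optional[str]:
        return self.digests[-2] if len(self.digests) >= 2 else None

    def rollback_ref(self, image: str) -> str:
        """Return the digest-pinned image reference to roll back to."""
        if self.previous is None:
            raise LookupError(f"no previous deploy recorded for {self.service}")
        return f"{image}@{self.previous}"
```

Pinning the rollback target by digest rather than by tag means the rollback deploys exactly the bytes that ran before, even if a tag has since been moved.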

8. Strategic CI/CD Evolution

As platform matures:

  • Move from SSH-based deploys to GitOps (ECS task defs in Git, or EKS manifests via Argo CD/Flux).
  • Add canary or blue/green deployment patterns.
  • Add policy gates for schema migrations and evidence-pack integrity checks.

9. Recommended Sequence

  1. Start with Hetzner Compose MVP for speed, but split DB onto separate node early.
  2. Implement structured logging and correlation IDs before pilot.
  3. Add minimal observability stack now (self-hosted LGTM or managed low-tier).
  4. Enforce auto-deploy via GitHub Actions to staging on each main commit.
  5. Migrate to ECS on EC2 as first production-grade target.
  6. Re-evaluate Fargate vs EKS only when scale/team constraints justify.
  7. Introduce Lambda selectively for bursty connector/trigger jobs.

10. Trigger-Based Reassessment Rules

Re-evaluate platform choice when one or more thresholds are crossed:

  • 10 production tenants

  • 200 connector sync jobs/day

  • 100 GB/day log ingestion

  • 99.9% uptime target with strict recovery objectives

  • Need multi-region failover

At that point, prioritize managed runtime and managed observability to reduce operations risk.
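
These rules are simple enough to evaluate automatically, for example in a periodic report. The sketch below assumes a threshold counts as crossed once the value reaches or exceeds it; the snapshot fields and function name are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlatformSnapshot:
    """Current values for the metrics named in the reassessment rules."""

    production_tenants: int
    sync_jobs_per_day: int
    log_ingest_gb_per_day: float
    uptime_target_pct: float
    needs_multi_region_failover: bool


def crossed_thresholds(s: PlatformSnapshot) -> List[str]:
    """Return the labels of every threshold the snapshot has reached."""
    checks = [
        (s.production_tenants >= 10, "10 production tenants"),
        (s.sync_jobs_per_day >= 200, "200 connector sync jobs/day"),
        (s.log_ingest_gb_per_day >= 100, "100 GB/day log ingestion"),
        (s.uptime_target_pct >= 99.9, "99.9% uptime target"),
        (s.needs_multi_region_failover, "multi-region failover"),
    ]
    return [label for hit, label in checks if hit]


if __name__ == "__main__":
    pilot = PlatformSnapshot(1, 20, 1.0, 99.0, False)
    print(crossed_thresholds(pilot) or "no reassessment needed")
```

A non-empty result is the signal to revisit the platform choice.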

Next Action

Status: adopted. The Docker + Colima + GitHub Actions CI/CD model has shipped. Deployment config lives in docker-compose.deploy.yml (Caddy TLS) and .github/workflows/. No further action required.