
ADR-001: MongoDB Only for MVP

Status

Accepted (2026-01-22)

Context

SecurityV0 has two primary data workloads:

  1. Graph traversal - Execution path queries: "What can this identity reach through which roles?"
  2. Temporal analysis - Point-in-time queries, drift detection over time

The question: Should we use a single database (MongoDB) or a hybrid approach (Neo4j for graph + TimescaleDB for temporal)?

Evaluation Criteria

  • Rich document storage (full policy JSON, raw API responses)
  • Graph traversal performance (3-5 hop paths typical)
  • Point-in-time subgraph queries
  • Operational simplicity
  • Team familiarity

Decision

MongoDB as the single database for MVP. All data — entities, relationships, versions, events, findings, evidence — lives in MongoDB.

Graph queries are handled through:

  • Materialized execution paths - Pre-computed on each identity during sync
  • Denormalized reverse lookups - accessible_by arrays on resource documents
  • Application-level traversal - Follow relationship references for on-demand queries
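As a rough illustration of the first two mechanisms, the sketch below walks relationship references breadth-first from an identity and inverts the result into an accessible_by lookup. The edge data, function names, and the max_hops cap are assumptions made for this example, and plain dictionaries stand in for MongoDB collections.

```python
from collections import deque

# Hypothetical relationship edges: source id -> list of target ids.
EDGES = {
    "user:alice": ["role:dev"],
    "role:dev": ["role:deploy", "bucket:logs"],
    "role:deploy": ["db:prod"],
}


def materialize_paths(identity, edges, max_hops=5):
    """Breadth-first walk from one identity; returns every path of 1..max_hops hops."""
    paths = []
    queue = deque([[identity]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:  # path already uses the hop budget
            continue
        for target in edges.get(path[-1], []):
            if target in path:  # skip cycles
                continue
            new_path = path + [target]
            paths.append(new_path)
            queue.append(new_path)
    return paths


def build_accessible_by(identities, edges, max_hops=5):
    """Invert materialized paths into a resource -> identities reverse lookup."""
    accessible_by = {}
    for identity in identities:
        for path in materialize_paths(identity, edges, max_hops):
            accessible_by.setdefault(path[-1], set()).add(identity)
    return accessible_by
```

In a sync job, the list returned by materialize_paths would be stored on the identity document and the inverted sets written to the accessible_by arrays on resource documents, so both query directions become plain reads.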

Future enhancement: When scale requires it (for example 10,000+ identities or 5+ hop paths; see When to Reconsider), add Neo4j as a thin graph index over MongoDB. The StorageAdapter interface enables this without changing the connectors or the API.
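A minimal sketch of how such an adapter boundary might look. Only the name StorageAdapter comes from this ADR; the method names, signatures, and the in-memory backend are hypothetical and shown purely to illustrate the isolation.

```python
from typing import Protocol


class StorageAdapter(Protocol):
    """Hypothetical read-side contract; the real interface may differ."""

    def get_execution_paths(self, identity_id: str) -> list[list[str]]: ...

    def get_accessible_by(self, resource_id: str) -> list[str]: ...


class InMemoryAdapter:
    """Stand-in backend; a MongoDB- or Neo4j-backed class would expose the same methods."""

    def __init__(self, paths: dict[str, list[list[str]]]) -> None:
        self._paths = paths

    def get_execution_paths(self, identity_id: str) -> list[list[str]]:
        return self._paths.get(identity_id, [])

    def get_accessible_by(self, resource_id: str) -> list[str]:
        # Identities that have at least one path ending at the resource.
        return sorted(
            identity
            for identity, paths in self._paths.items()
            if any(path[-1] == resource_id for path in paths)
        )


def blast_radius(adapter: StorageAdapter, identity_id: str) -> set[str]:
    """API-layer code depends only on the adapter contract, not on the backend."""
    return {path[-1] for path in adapter.get_execution_paths(identity_id)}
```

Because blast_radius only sees the protocol, swapping the backing store would mean adding another class with the same methods rather than touching callers.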

Rationale

MongoDB Advantages

Capability              | MongoDB                       | Neo4j + TimescaleDB
Rich documents          | Native (embedded JSON)        | Poor (flat properties)
Point-in-time queries   | Direct read (versioned docs)  | Event reconstruction
Operational complexity  | Single system                 | Two systems to manage
Team familiarity        | High                          | Low
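To make the "direct read" of versioned documents concrete, here is a sketch assuming each version carries hypothetical entity_id, valid_from, and valid_to fields, with valid_to unset on the current version. The first function builds a MongoDB-style filter; the second applies the same rule to plain dictionaries so the logic can be checked without a database.

```python
from datetime import datetime, timezone


def point_in_time_filter(entity_id, at):
    """MongoDB-style filter for the version that was valid at `at`."""
    return {
        "entity_id": entity_id,
        "valid_from": {"$lte": at},
        # A null or missing valid_to marks the current (open-ended) version.
        "$or": [{"valid_to": None}, {"valid_to": {"$gt": at}}],
    }


def state_at(versions, entity_id, at):
    """Same rule applied in memory: return the version valid at `at`, or None."""
    for doc in versions:
        if doc["entity_id"] != entity_id:
            continue
        ends = doc.get("valid_to")
        if doc["valid_from"] <= at and (ends is None or ends > at):
            return doc
    return None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Two hypothetical versions of the same role policy.
VERSIONS = [
    {"entity_id": "role:dev", "valid_from": utc(2026, 1, 1),
     "valid_to": utc(2026, 1, 15), "policy": {"actions": ["read"]}},
    {"entity_id": "role:dev", "valid_from": utc(2026, 1, 15),
     "valid_to": None, "policy": {"actions": ["read", "write"]}},
]
```

With this layout, a point-in-time query is one filtered read per entity; no event log has to be replayed to rebuild the earlier state.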

MongoDB Limitations (Acceptable)

  • $graphLookup has memory limits and single-collection constraint
  • No index-free adjacency (index lookup at each hop)
  • Variable-length path queries less efficient than native graph DB
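For reference, a $graphLookup traversal might be expressed as the aggregation pipeline below. The collection name relationships and the source_id / target_id fields are assumptions for this sketch, not the project's schema. Note that "from" names exactly one collection, which is the single-collection constraint listed above, and each recursion step matches target_id against source_id through an index lookup.

```python
def reachability_pipeline(identity_id, max_hops=3):
    """Build an aggregation pipeline that follows relationship edges from one identity."""
    if max_hops < 1:
        raise ValueError("max_hops must be at least 1")
    return [
        {"$match": {"_id": identity_id}},
        {
            "$graphLookup": {
                "from": "relationships",         # hypothetical edge collection
                "startWith": "$_id",             # first lookup: edges leaving the identity
                "connectFromField": "target_id",  # follow each edge's target...
                "connectToField": "source_id",    # ...to edges that start there
                "maxDepth": max_hops - 1,         # depth 0 is the first hop
                "depthField": "hop",
                "as": "reachable_edges",
            }
        },
    ]
```

With pymongo, such a pipeline would be passed to aggregate() on whichever collection holds identity documents. The stage returns matched edges as one array per input document, so reconstructing ordered paths still falls to application code.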

Why These Limitations Are Acceptable

At MVP scale (< 1,000 identities, 2-3 connectors):

  • Materialized paths turn a blast radius query into a single document read (O(1) lookups, independent of graph size)
  • Application-level traversal is fast enough for on-demand queries
  • Reverse queries via denormalized arrays are efficient

Why Not Neo4j for MVP

  • Overkill for MVP scale - Native graph traversal not needed at < 1,000 identities
  • Poor document support - Flat properties can't store rich policy JSON efficiently
  • Dual-write complexity - Two databases means consistency challenges
  • Operational overhead - Two systems to manage, backup, monitor

Consequences

Positive

  • Simpler mental model (one database)
  • Rich document storage for policy JSON and raw API responses
  • Direct point-in-time queries (no event reconstruction)
  • Team can move fast with familiar technology
  • StorageAdapter abstraction protects future migration

Negative (Acceptable Trade-offs)

  • Graph queries require application-level traversal
  • Materialized paths must be recomputed on sync
  • Deep path queries (5+ hops) become expensive at scale

Risks Mitigated

  • ✅ StorageAdapter interface isolates connectors/API from storage implementation
  • ✅ Neo4j migration path designed and documented
  • ✅ Materialized paths keep blast radius reads O(1) as data grows; the cost shifts to path recomputation during sync, which is tracked under When to Reconsider

When to Reconsider

Add Neo4j when ANY of these occur:

  • 10,000+ identities per tenant with 5+ connectors
  • 5+ hop transitive chains (cross-system paths)
  • Path recomputation becomes sync bottleneck
  • Real-time reverse queries required at scale
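The triggers above could be encoded as a small check, for example in a capacity review script. The numeric thresholds come from the list; the TenantMetrics fields are hypothetical, and the last two triggers are taken as boolean inputs because this ADR does not define a measurable threshold for them.

```python
from dataclasses import dataclass


@dataclass
class TenantMetrics:
    """Hypothetical per-tenant measurements gathered elsewhere."""
    identities: int
    connectors: int
    max_hops: int
    recompute_is_sync_bottleneck: bool
    needs_realtime_reverse_queries: bool


def neo4j_triggers(m: TenantMetrics) -> list[str]:
    """Return the reconsideration triggers that currently apply (ANY one is enough)."""
    reasons = []
    if m.identities >= 10_000 and m.connectors >= 5:
        reasons.append("10,000+ identities with 5+ connectors")
    if m.max_hops >= 5:
        reasons.append("5+ hop transitive chains")
    if m.recompute_is_sync_bottleneck:
        reasons.append("path recomputation is the sync bottleneck")
    if m.needs_realtime_reverse_queries:
        reasons.append("real-time reverse queries required at scale")
    return reasons
```

An empty list means the MongoDB-only decision still holds for that tenant; any non-empty result is a prompt to revisit this ADR.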