ADR-001: MongoDB Only for MVP
Status
Accepted (2026-01-22)
Context
SecurityV0 has two primary data workloads:
- Graph traversal - Execution path queries: "What can this identity reach through which roles?"
- Temporal analysis - Point-in-time queries, drift detection over time
The question: Should we use a single database (MongoDB) or a hybrid approach (Neo4j for graph + TimescaleDB for temporal)?
Evaluation Criteria
- Rich document storage (full policy JSON, raw API responses)
- Graph traversal performance (3-5 hop paths typical)
- Point-in-time subgraph queries
- Operational simplicity
- Team familiarity
Decision
MongoDB as the single database for MVP. All data — entities, relationships, versions, events, findings, evidence — lives in MongoDB.
Graph queries are handled through:
- Materialized execution paths - Pre-computed on each identity during sync
- Denormalized reverse lookups - accessible_by arrays on resource documents
- Application-level traversal - Follow relationship references for on-demand queries
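As a rough illustration of the first two mechanisms, the sketch below materializes execution paths for one identity during a sync and maintains the accessible_by arrays on the resources it reaches. It uses plain in-memory dictionaries in place of MongoDB collections. The edge data, the execution_paths field name, and the sync_identity and materialize_paths helpers are all assumptions made for this example, not the project's actual schema.

```python
from collections import deque

# Hypothetical relationship edges: source id -> list of (relation, target id).
EDGES = {
    "user:alice": [("assumes", "role:dev")],
    "role:dev": [("assumes", "role:deploy"), ("reads", "bucket:logs")],
    "role:deploy": [("writes", "bucket:artifacts")],
}


def materialize_paths(identity, edges, max_hops=5):
    """Breadth-first walk recording every path from an identity, up to max_hops."""
    paths = []
    queue = deque([(identity, [identity])])
    while queue:
        node, path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # hop budget used up on this branch
        for _relation, target in edges.get(node, []):
            if target in path:
                continue  # skip cycles
            new_path = path + [target]
            paths.append(new_path)
            queue.append((target, new_path))
    return paths


def sync_identity(identity, edges, identities, resources):
    """Store paths on the identity document and update accessible_by on resources."""
    paths = materialize_paths(identity, edges)
    identities[identity] = {"_id": identity, "execution_paths": paths}
    for path in paths:
        target = path[-1]
        if target in resources:
            accessible = resources[target].setdefault("accessible_by", [])
            if identity not in accessible:
                accessible.append(identity)


identities = {}
resources = {
    "bucket:logs": {"_id": "bucket:logs"},
    "bucket:artifacts": {"_id": "bucket:artifacts"},
}
sync_identity("user:alice", EDGES, identities, resources)

# Forward query: read one precomputed document instead of traversing.
blast_radius = sorted({p[-1] for p in identities["user:alice"]["execution_paths"]})
# Reverse query: read the denormalized array on the resource.
who_can_reach_logs = resources["bucket:logs"]["accessible_by"]
```

The traversal cost is paid once at sync time; both the forward and the reverse query then become single-document reads.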
Future enhancement: When scale requires it (10,000+ identities, 5+ hop paths), add Neo4j as a thin graph index over MongoDB. The StorageAdapter interface enables this without changing connectors or API.
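One possible shape for that boundary is sketched below. Only the name StorageAdapter comes from this ADR. The method names, the InMemoryAdapter stand-in, and the reachable_targets helper are hypothetical and exist only to show how callers can depend on the interface rather than on a specific database.

```python
from typing import Dict, List, Optional, Protocol


class StorageAdapter(Protocol):
    """Hypothetical storage boundary; the method names are assumptions."""

    def upsert_entity(self, entity: dict) -> None: ...
    def get_entity(self, entity_id: str) -> Optional[dict]: ...
    def find_paths(self, identity_id: str) -> List[List[str]]: ...


class InMemoryAdapter:
    """Stand-in for a MongoDB-backed adapter that reads materialized paths."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def upsert_entity(self, entity: dict) -> None:
        self._docs[entity["_id"]] = entity

    def get_entity(self, entity_id: str) -> Optional[dict]:
        return self._docs.get(entity_id)

    def find_paths(self, identity_id: str) -> List[List[str]]:
        doc = self._docs.get(identity_id) or {}
        return doc.get("execution_paths", [])


def reachable_targets(store: StorageAdapter, identity_id: str) -> List[str]:
    """API-layer code depends only on the adapter, not on the database behind it."""
    return sorted({path[-1] for path in store.find_paths(identity_id)})


store = InMemoryAdapter()
store.upsert_entity({
    "_id": "user:alice",
    "execution_paths": [
        ["user:alice", "role:dev"],
        ["user:alice", "role:dev", "bucket:logs"],
    ],
})
targets = reachable_targets(store, "user:alice")
```

Under this assumption, a later graph-backed adapter would only need to provide its own find_paths; reachable_targets and the connectors would stay unchanged.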
Rationale
MongoDB Advantages
| Capability | MongoDB | Neo4j+TimescaleDB |
|---|---|---|
| Rich documents | Native (embedded JSON) | Poor (flat properties) |
| Point-in-time queries | Direct read (versioned docs) | Event reconstruction |
| Operational complexity | Single system | Two systems to manage |
| Team familiarity | High | Low |
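To make the "direct read" row concrete, here is a minimal sketch of a point-in-time lookup and a simple drift comparison over versioned documents. The valid_from and valid_to fields, the sample policy data, and both helper functions are assumptions for illustration; the real version schema may differ.

```python
from datetime import datetime, timezone


def at(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical versioned documents: each version carries its validity window.
VERSIONS = [
    {"entity_id": "role:dev", "valid_from": at(2026, 1, 1), "valid_to": at(2026, 1, 10),
     "policy": {"actions": ["s3:GetObject"]}},
    {"entity_id": "role:dev", "valid_from": at(2026, 1, 10), "valid_to": None,
     "policy": {"actions": ["s3:GetObject", "s3:PutObject"]}},
]


def version_at(versions, entity_id, when):
    """Return the version whose window contains `when`, or None.

    A MongoDB filter with the same meaning (field names assumed) would be:
    {"entity_id": entity_id, "valid_from": {"$lte": when},
     "$or": [{"valid_to": None}, {"valid_to": {"$gt": when}}]}
    """
    for version in versions:
        if version["entity_id"] != entity_id:
            continue
        ends = version["valid_to"]
        if version["valid_from"] <= when and (ends is None or when < ends):
            return version
    return None


def drift(versions, entity_id, before, after):
    """Compare allowed actions at two points in time."""
    old = version_at(versions, entity_id, before)
    new = version_at(versions, entity_id, after)
    old_actions = set(old["policy"]["actions"]) if old else set()
    new_actions = set(new["policy"]["actions"]) if new else set()
    return {
        "added": sorted(new_actions - old_actions),
        "removed": sorted(old_actions - new_actions),
    }


snapshot = version_at(VERSIONS, "role:dev", at(2026, 1, 5))
changes = drift(VERSIONS, "role:dev", at(2026, 1, 5), at(2026, 1, 15))
```

Because each version stores the full document, the state at a given time is read directly rather than rebuilt by replaying events.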
MongoDB Limitations (Acceptable)
- $graphLookup has memory limits and single-collection constraint
- No index-free adjacency (index lookup at each hop)
- Variable-length path queries less efficient than native graph DB
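For reference, the sketch below builds a $graphLookup aggregation pipeline as a plain Python structure, which shows where the constraints above come from. The collection name entities, the target_ids field, and the reachable_pipeline helper are assumptions for this example. The stage options themselves (from, startWith, connectFromField, connectToField, as, maxDepth, depthField) are standard $graphLookup fields.

```python
def reachable_pipeline(identity_id, max_hops=3):
    """Build an aggregation pipeline that follows references up to max_hops.

    Assumes each document in a hypothetical `entities` collection keeps the
    ids it points to in a `target_ids` array.
    """
    return [
        {"$match": {"_id": identity_id}},
        {
            "$graphLookup": {
                # Single-collection constraint: `from` names exactly one collection.
                "from": "entities",
                "startWith": "$target_ids",
                "connectFromField": "target_ids",
                # Every hop is an index lookup on this field, not pointer chasing.
                "connectToField": "_id",
                "as": "reachable",
                # maxDepth 0 is one lookup (one hop), so N hops is maxDepth N - 1.
                "maxDepth": max_hops - 1,
                "depthField": "hops",
            }
        },
    ]


pipeline = reachable_pipeline("user:alice", max_hops=3)
```

With a driver such as PyMongo, a list like this would be passed to the collection's aggregate method. The whole reachable set is collected into one array per input document, which is where the stage's memory limit becomes relevant for large graphs.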
Why These Limitations Are Acceptable
At MVP scale (< 1,000 identities, 2-3 connectors):
- Materialized paths turn a blast radius query into a single document read per identity (O(1) with respect to graph size)
- Application-level traversal is fast enough for on-demand queries
- Reverse queries via denormalized arrays are efficient
Why Not Neo4j for MVP
- Overkill for MVP scale - Native graph traversal not needed at < 1,000 identities
- Poor document support - Flat properties can't store rich policy JSON efficiently
- Dual-write complexity - Two databases mean consistency challenges
- Operational overhead - Two systems to manage, backup, monitor
Consequences
Positive
- Simpler mental model (one database)
- Rich document storage for policy JSON and raw API responses
- Direct point-in-time queries (no event reconstruction)
- Team can move fast with familiar technology
- StorageAdapter abstraction protects future migration
Negative (Acceptable Trade-offs)
- Graph queries require application-level traversal
- Materialized paths must be recomputed on sync
- Deep path queries (5+ hops) become expensive at scale
Risks Mitigated
- ✅ StorageAdapter interface isolates connectors/API from storage implementation
- ✅ Neo4j migration path designed and documented
- ✅ Materialized paths keep blast radius reads O(1) at any scale (the recomputation cost is paid at sync time)
When to Reconsider
Add Neo4j when ANY of these occur:
- 10,000+ identities per tenant with 5+ connectors
- 5+ hop transitive chains (cross-system paths)
- Path recomputation becomes sync bottleneck
- Real-time reverse queries required at scale
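The triggers above could be checked mechanically, for example in a periodic capacity review. The sketch below is one way to encode them. The TenantMetrics fields and the reconsider_reasons helper are hypothetical, and the two qualitative triggers are modeled as boolean flags because this ADR gives no numeric threshold for them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TenantMetrics:
    """Hypothetical per-tenant measurements; field names are assumptions."""
    identities: int
    connectors: int
    max_path_hops: int
    recompute_is_sync_bottleneck: bool      # qualitative trigger, judged by the team
    needs_realtime_reverse_at_scale: bool   # qualitative trigger, judged by the team


def reconsider_reasons(m: TenantMetrics) -> List[str]:
    """Return every trigger that fired; any non-empty result means revisit."""
    reasons = []
    if m.identities >= 10_000 and m.connectors >= 5:
        reasons.append("10,000+ identities per tenant with 5+ connectors")
    if m.max_path_hops >= 5:
        reasons.append("5+ hop transitive chains")
    if m.recompute_is_sync_bottleneck:
        reasons.append("path recomputation is the sync bottleneck")
    if m.needs_realtime_reverse_at_scale:
        reasons.append("real-time reverse queries required at scale")
    return reasons


mvp = TenantMetrics(800, 3, 3, False, False)
grown = TenantMetrics(12_000, 6, 5, False, False)
mvp_reasons = reconsider_reasons(mvp)
grown_reasons = reconsider_reasons(grown)
```

The first condition combines identity and connector counts because the bullet states them together; the conditions are otherwise independent, matching "ANY of these".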
Related Documents
- 03-database.md - Full database architecture
- ADR-002 - Single collection decision
- ADR-003 - Apache AGE rejection