ADR-002: Single Entities Collection
Status
Accepted (2026-01-27)
Context
During implementation, we evaluated whether the single entities collection approach would scale, or if we should split entities into separate collections.
Options Considered
- Approach A (Current): Single `entities` collection with `entity_type` discriminator
- Approach B: Separate collection per type (identities, owners, roles, permissions, resources, credentials)
- Approach C: Type collections + separate `relationships` edge collection
Concerns Addressed
- MongoDB 16MB document limit
- Query performance for security investigations
- Operational complexity (backups, indexes)
- Neo4j migration path
- Alignment with materialized execution paths strategy
Decision
Keep the single `entities` collection with embedded relationships.
All entity types (identity, owner, role, permission, resource, credential) are stored in one collection, discriminated by the `entity_type` field.
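The sketch below shows what a single collection holding all six types could look like. The `EntityType` enum, the `relationships` and `execution_paths` field names, the ID format, and both helpers are hypothetical illustrations, not the platform's actual schema. Under this layout, adding a new entity type means adding an enum member; existing documents are untouched.

```python
from enum import Enum
from typing import Any


class EntityType(str, Enum):
    """Values of the entity_type discriminator (the six types named in this ADR)."""
    IDENTITY = "identity"
    OWNER = "owner"
    ROLE = "role"
    PERMISSION = "permission"
    RESOURCE = "resource"
    CREDENTIAL = "credential"


# Hypothetical documents: every type lives in the same `entities` collection
# and differs only in entity_type and its type-specific fields.
ENTITIES: list[dict[str, Any]] = [
    {"_id": "identity:svc-deploy", "entity_type": "identity",
     "relationships": {"owned_by": ["owner:platform-team"], "has_role": ["role:deployer"]},
     "execution_paths": []},
    {"_id": "owner:platform-team", "entity_type": "owner", "relationships": {}},
    {"_id": "role:deployer", "entity_type": "role",
     "relationships": {"grants": ["permission:s3-write"]}},
]


def entity_type_of(doc: dict[str, Any]) -> EntityType:
    """Read the discriminator; raises ValueError for an unknown type."""
    return EntityType(doc["entity_type"])


def by_type(docs: list[dict[str, Any]], entity_type: EntityType) -> list[dict[str, Any]]:
    """Client-side equivalent of the MongoDB filter {"entity_type": <value>}."""
    return [d for d in docs if d["entity_type"] == entity_type.value]
```

For example, `by_type(ENTITIES, EntityType.ROLE)` returns only the `role:deployer` document.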
Rationale
1. Document Size Not a Concern
- Typical identity with 50 execution paths: ~10KB
- High-privilege identity with 200 paths: ~40KB
- Would need roughly 80,000 paths (at ~200 bytes per path) to approach the 16MB limit
- At 1K identities: average 50KB per document (well within limits)
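A quick check of the size estimates, assuming document size scales linearly with path count and ignoring the fixed overhead of non-path fields (both simplifications). Exact fractions are used so the division is not affected by float rounding.

```python
from fractions import Fraction

KB = 1024
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16MB BSON document limit

# (approximate document size in bytes, number of execution paths), from the estimates above
TYPICAL = (10 * KB, 50)
HIGH_PRIVILEGE = (40 * KB, 200)


def bytes_per_path(size_and_paths: tuple[int, int]) -> Fraction:
    """Average bytes contributed by one embedded execution path."""
    size, paths = size_and_paths
    return Fraction(size, paths)


def max_paths_before_limit(per_path: Fraction) -> int:
    """Paths that fit under the limit; an upper bound since non-path fields are ignored."""
    return int(MAX_BSON_DOCUMENT_BYTES // per_path)


per_path = bytes_per_path(TYPICAL)  # 1024/5, i.e. 204.8 bytes
```

Both estimates imply the same ~205 bytes per path, which puts the ceiling at 81,920 paths per identity document.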
2. Query Performance Comparison
| Query Pattern | Single Collection | Type Collections | Edge Collection |
|---|---|---|---|
| Blast radius | 1 query (O(1)) | 1 query | 1-2 queries |
| Ownership chain | 2 queries | 2+ queries | 3+ queries |
| On-demand path | 4 queries | 6 queries | 7+ queries |
| Mixed entity query | 1 query | 6 queries | 6+ queries |
Single collection minimizes round-trips for security investigation queries.
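A sketch of the single-collection query shapes, written as plain filter and projection dicts so it runs without a database. The field names are hypothetical. With PyMongo these would be passed as `db.entities.find_one(filter, projection)` and `db.entities.find(filter)`; `type_collection_fanout` shows the per-type queries Approach B would need for the same mixed lookup.

```python
from typing import Any

Filter = dict[str, Any]
Projection = dict[str, int]


def blast_radius_query(identity_id: str) -> tuple[Filter, Projection]:
    """One find_one on one collection: the materialized paths are already on the identity."""
    return {"_id": identity_id, "entity_type": "identity"}, {"execution_paths": 1}


def mixed_entity_query(entity_ids: list[str]) -> Filter:
    """Fetch entities of any type in a single round-trip."""
    return {"_id": {"$in": entity_ids}}


def type_collection_fanout(ids_by_collection: dict[str, list[str]]) -> list[tuple[str, Filter]]:
    """Approach B equivalent: one query per type-specific collection that has IDs to fetch."""
    return [
        (collection, {"_id": {"$in": ids}})
        for collection, ids in ids_by_collection.items()
        if ids
    ]
```

The mixed lookup stays one query however many types it spans, while the fan-out grows with the number of collections touched.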
3. Operational Simplicity
| Dimension | Single | Type Collections | Edge Collection |
|---|---|---|---|
| Collections to manage | 6 | 11 | 12+ |
| Entity indexes | 8 | 30+ | 40+ |
| Write routing | Simple | Complex (6-way) | Very complex |
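The write-routing difference can be sketched as follows. The per-type collection names follow Approach B; the routing and batching functions are hypothetical.

```python
from typing import Any, Callable

Doc = dict[str, Any]

# Approach B: hypothetical mapping from discriminator value to collection name.
TYPE_COLLECTIONS = {
    "identity": "identities", "owner": "owners", "role": "roles",
    "permission": "permissions", "resource": "resources", "credential": "credentials",
}


def route_single(doc: Doc) -> str:
    """Approach A: every write targets the same collection."""
    return "entities"


def route_by_type(doc: Doc) -> str:
    """Approach B: 6-way routing; a missing or unknown type is a routing error."""
    try:
        return TYPE_COLLECTIONS[doc["entity_type"]]
    except KeyError as exc:
        raise ValueError(f"no collection for entity_type={doc.get('entity_type')!r}") from exc


def group_batch(docs: list[Doc], route: Callable[[Doc], str]) -> dict[str, list[Doc]]:
    """Split one bulk write into one batch per target collection."""
    batches: dict[str, list[Doc]] = {}
    for doc in docs:
        batches.setdefault(route(doc), []).append(doc)
    return batches
```

With a single collection, a mixed ingestion batch stays one bulk write; with type collections it becomes one bulk write per type present.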
4. Aligns with Materialized Paths
The platform's core performance strategy is pre-computed execution paths stored directly on identity documents. The single-collection design fits this strategy:
- ✅ Paths embedded on identities
- ✅ No join complexity at query time
- ✅ Application-level traversal is simple
An edge collection would conflict with materialized paths, forcing one of three choices:
- Keep paths embedded (defeats the purpose of the edge collection)
- Move paths to a separate collection (extra lookup overhead)
- Recompute paths on query (defeats materialization)
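Application-level traversal over embedded paths can be as small as the sketch below. It assumes, hypothetically, that each path is stored as an ordered list of entity IDs from the identity to a resource; the real path schema may differ.

```python
from typing import Any

# Hypothetical identity document with three precomputed execution paths.
identity = {
    "_id": "identity:svc-deploy",
    "entity_type": "identity",
    "execution_paths": [
        {"hops": ["identity:svc-deploy", "role:deployer", "permission:s3-write", "resource:artifacts-bucket"]},
        {"hops": ["identity:svc-deploy", "role:deployer", "permission:s3-read", "resource:artifacts-bucket"]},
        {"hops": ["identity:svc-deploy", "role:reader", "permission:db-read", "resource:orders-db"]},
    ],
}


def reachable_resources(identity_doc: dict[str, Any]) -> set[str]:
    """Blast radius from embedded paths alone: no joins or graph traversal at query time."""
    return {path["hops"][-1] for path in identity_doc.get("execution_paths", [])}


def paths_to(identity_doc: dict[str, Any], resource_id: str) -> list[list[str]]:
    """All precomputed paths that end at one resource."""
    return [
        path["hops"]
        for path in identity_doc.get("execution_paths", [])
        if path["hops"][-1] == resource_id
    ]
```

Both questions are answered from the one document fetched by the blast-radius query; an edge collection would instead require walking relationships hop by hop.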
5. Industry Standard Pattern
A single collection with a discriminator field is the idiomatic MongoDB pattern for polymorphic documents; MongoDB's own schema design guidance describes it as the Polymorphic Pattern.
Consequences
Positive
- Simplest mental model (one place for entities)
- Optimal performance for security queries
- Minimal operational overhead
- Clean Neo4j migration (single collection to iterate)
- New entity types need no data migration (add a value to the `entity_type` enum)
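The Neo4j migration point rests on reading every node and edge in one scan. A minimal sketch, assuming embedded relationships are stored as `{relationship_type: [target ids]}` (a hypothetical shape); the resulting tuples could then be loaded with batched Cypher `UNWIND ... MERGE` statements, which this sketch does not cover.

```python
from typing import Any, Iterable

Node = tuple[str, str]       # (entity id, node label)
Edge = tuple[str, str, str]  # (source id, relationship type, target id)


def to_graph(entities: Iterable[dict[str, Any]]) -> tuple[list[Node], list[Edge]]:
    """One pass over the single collection yields all nodes and all edges.

    Assumes (hypothetically) relationships shaped as {"rel_type": [target ids]}.
    """
    nodes: list[Node] = []
    edges: list[Edge] = []
    for doc in entities:
        # entity_type becomes the node label, e.g. "identity" -> "Identity"
        nodes.append((doc["_id"], doc["entity_type"].capitalize()))
        for rel_type, targets in doc.get("relationships", {}).items():
            # Neo4j convention: upper-case relationship types
            edges.extend((doc["_id"], rel_type.upper(), target) for target in targets)
    return nodes, edges
```

With type collections, the same export would need one scan per collection plus a check that every edge target exists in some other collection.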
Negative (Acceptable Trade-offs)
- Identity documents grow with execution paths (5-50KB typical)
- Non-atomic path materialization across documents (eventual consistency)
- Full collection scans if queries do not filter by `entity_type` (mitigated by indexes)
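The index mitigation relies on MongoDB's index-prefix behavior: a compound index narrows a query only when the query constrains the index's leading field. The index definitions below are hypothetical (including `tenant_id`), written in PyMongo's `[(field, direction)]` key format so each could be passed to `create_index`; `has_prefix_index` is a deliberate simplification of the query planner's rules.

```python
from typing import Any

IndexKeys = list[tuple[str, int]]

# Hypothetical index set; 1 = ascending, -1 = descending (PyMongo key format).
ENTITY_INDEXES: list[IndexKeys] = [
    [("entity_type", 1), ("tenant_id", 1)],
    [("entity_type", 1), ("name", 1)],
    [("tenant_id", 1), ("entity_type", 1), ("updated_at", -1)],
]


def scoped(query: dict[str, Any], entity_type: str) -> dict[str, Any]:
    """Add the discriminator so the query can use an entity_type-prefixed index."""
    return {**query, "entity_type": entity_type}


def has_prefix_index(query: dict[str, Any], indexes: list[IndexKeys]) -> bool:
    """Simplified prefix check: does any index's leading field appear in the filter?"""
    return any(keys[0][0] in query for keys in indexes)
```

A filter on `name` alone matches no index prefix here and would scan every entity type; scoping it by `entity_type` lets the second index bound the scan.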
When to Reconsider
Trigger to Split Identities
Only if ALL of:
- Tenants exceed 10,000 identities AND
- Identity documents consistently exceed 100KB AND
- Query performance degrades on entity_type scans
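The trigger is a strict conjunction. A sketch, assuming per-tenant statistics are available; the `TenantStats` type is hypothetical, and reading "consistently exceed 100KB" as "more than half of identity documents" is an assumption, exposed as a parameter.

```python
from dataclasses import dataclass

IDENTITY_COUNT_THRESHOLD = 10_000
DOC_SIZE_THRESHOLD_BYTES = 100 * 1024


@dataclass(frozen=True)
class TenantStats:
    """Hypothetical per-tenant measurements feeding the split decision."""
    identity_count: int
    identity_doc_sizes_bytes: list[int]
    entity_type_scans_degraded: bool  # assumed to come from query-latency monitoring


def should_split_identities(stats: TenantStats, consistently: float = 0.5) -> bool:
    """True only if ALL three triggers hold.

    Assumption: "consistently exceed 100KB" means more than `consistently`
    (a fraction) of identity documents exceed the size threshold.
    """
    if not stats.identity_doc_sizes_bytes:
        return False
    oversized = sum(size > DOC_SIZE_THRESHOLD_BYTES for size in stats.identity_doc_sizes_bytes)
    return (
        stats.identity_count > IDENTITY_COUNT_THRESHOLD
        and oversized / len(stats.identity_doc_sizes_bytes) > consistently
        and stats.entity_type_scans_degraded
    )
```

Any single condition on its own, such as a large tenant with small documents, leaves the single collection in place.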
Response: Split into 2 collections (identities, other_entities), a minimal change made inside StorageAdapter.
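This ADR does not show StorageAdapter's interface. The hypothetical sketch below only illustrates why the split stays small: if callers never name a collection directly, the change is confined to one routing decision inside the adapter.

```python
class StorageAdapter:
    """Hypothetical sketch; the real StorageAdapter's methods are not defined in this ADR.

    Callers ask for entities by type; only the adapter knows which collection holds them.
    """

    def __init__(self, split_identities: bool = False) -> None:
        self.split_identities = split_identities

    def collection_for(self, entity_type: str) -> str:
        """Collection name for a given entity_type value."""
        if not self.split_identities:
            return "entities"
        return "identities" if entity_type == "identity" else "other_entities"

    def type_filter(self, entity_type: str) -> dict[str, str]:
        """Discriminator filter to include in queries for this type."""
        if self.split_identities and entity_type == "identity":
            # The identities collection holds one type, so no discriminator filter is needed.
            return {}
        return {"entity_type": entity_type}
```

Flipping `split_identities` changes where identities live without touching any caller that goes through the adapter.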
Trigger for Edge Collection
Never for a MongoDB-only architecture.
Only consider if Neo4j is already deployed AND graph analytics on raw relationships are needed.
Related Documents
- 03-database.md - Full database architecture
- ADR-001 - MongoDB-only decision