ADR-002: Single Entities Collection

Status

Accepted (2026-01-27)

Context

During implementation, we evaluated whether the single entities collection approach would scale, or whether we should split entities into separate collections.

Options Considered

  1. Approach A (Current): Single entities collection with entity_type discriminator
  2. Approach B: Separate collection per type (identities, owners, roles, permissions, resources, credentials)
  3. Approach C: Type collections + separate relationships edge collection

Concerns Addressed

  • MongoDB 16MB document limit
  • Query performance for security investigations
  • Operational complexity (backups, indexes)
  • Neo4j migration path
  • Alignment with materialized execution paths strategy

Decision

Keep single entities collection with embedded relationships.

All entity types (identity, owner, role, permission, resource, credential) are stored in one collection, discriminated by the entity_type field.
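
A minimal sketch of what the discriminator looks like in practice. Plain Python dicts stand in for MongoDB documents. Apart from entity_type and the six type names, every field name (`_id`, `name`) and the helper `of_type` are illustrative assumptions, not the platform's actual schema.

```python
from enum import Enum


class EntityType(str, Enum):
    """The six entity types listed in this ADR."""
    IDENTITY = "identity"
    OWNER = "owner"
    ROLE = "role"
    PERMISSION = "permission"
    RESOURCE = "resource"
    CREDENTIAL = "credential"


# Documents that would all live in the single `entities` collection.
# Everything except `entity_type` is a hypothetical field.
entities = [
    {"_id": "id-1", "entity_type": EntityType.IDENTITY.value, "name": "svc-deploy"},
    {"_id": "role-1", "entity_type": EntityType.ROLE.value, "name": "admin"},
    {"_id": "res-1", "entity_type": EntityType.RESOURCE.value, "name": "prod-db"},
]


def of_type(docs, *types):
    """Select documents by discriminator, like a filter on entity_type."""
    wanted = {t.value for t in types}
    return [d for d in docs if d["entity_type"] in wanted]


roles_and_resources = of_type(entities, EntityType.ROLE, EntityType.RESOURCE)
```

Adding a seventh entity type in this sketch means adding one enum member; existing documents are untouched.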

Rationale

1. Document Size Not a Concern

  • Typical identity with 50 execution paths: ~10KB
  • High-privilege identity with 200 paths: ~40KB
  • At roughly 200 bytes per path (implied by the two figures above), a document would need on the order of 80,000 paths to approach the 16MB limit
  • At 1K identities: average 50KB per document (well within limits)
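
A back-of-the-envelope check of the sizing bullets, as a sketch. It assumes document size is dominated by execution paths and grows linearly with them, and it treats 1KB as 1,000 bytes; neither assumption is stated in this ADR.

```python
MONGODB_DOC_LIMIT_BYTES = 16 * 1024 * 1024  # MongoDB's 16MB document limit

# Figures from the bullets above: (paths, approximate document bytes).
typical = (50, 10_000)
high_privilege = (200, 40_000)


def bytes_per_path(paths, doc_bytes):
    """Average bytes attributed to one execution path."""
    return doc_bytes / paths


per_path = bytes_per_path(*typical)
# Both data points imply the same per-path cost.
assert per_path == bytes_per_path(*high_privilege)

# Assumption: linear growth, so limit / per-path cost gives the path budget.
paths_to_hit_limit = int(MONGODB_DOC_LIMIT_BYTES // per_path)

# Share of the limit used by the 200-path high-privilege case.
high_privilege_share = high_privilege[1] / MONGODB_DOC_LIMIT_BYTES

print(per_path, paths_to_hit_limit, f"{high_privilege_share:.2%}")
```

Under these assumptions the 200-path high-privilege case uses well under 1% of the document limit.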

2. Query Performance Comparison

Query Pattern      | Single Collection | Type Collections | Edge Collection
Blast radius       | 1 query (O(1))    | 1 query          | 1-2 queries
Ownership chain    | 2 queries         | 2+ queries       | 3+ queries
On-demand path     | 4 queries         | 6 queries        | 7+ queries
Mixed entity query | 1 query           | 6 queries        | 6+ queries

Single collection minimizes round-trips for security investigation queries.
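
The "Mixed entity query" row can be sketched as query plans. The functions below only build (collection, filter) pairs and do not talk to a database. The `tenant_id` field and the function names are assumptions for illustration; the per-type collection names come from Approach B above, and `$in` is MongoDB's standard query operator.

```python
ENTITY_TYPES = ["identity", "owner", "role", "permission", "resource", "credential"]

# Per-type collection names as listed under Approach B.
TYPE_COLLECTIONS = {
    "identity": "identities",
    "owner": "owners",
    "role": "roles",
    "permission": "permissions",
    "resource": "resources",
    "credential": "credentials",
}


def plan_single(tenant_id, types):
    """Approach A: one find() on `entities` with an $in filter on entity_type."""
    query = {"tenant_id": tenant_id, "entity_type": {"$in": list(types)}}
    return [("entities", query)]


def plan_per_type(tenant_id, types):
    """Approach B: one find() per type collection, merged in the application."""
    return [(TYPE_COLLECTIONS[t], {"tenant_id": tenant_id}) for t in types]


single = plan_single("tenant-1", ENTITY_TYPES)
per_type = plan_per_type("tenant-1", ENTITY_TYPES)
```

Each tuple corresponds to one round-trip, so the sketch yields one query for the single collection and six for type collections, matching the table.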

3. Operational Simplicity

Dimension             | Single | Type Collections | Edge Collection
Collections to manage | 6      | 11               | 12+
Entity indexes        | 8      | 30+              | 40+
Write routing         | Simple | Complex (6-way)  | Very complex

4. Aligns with Materialized Paths

The platform's core performance strategy is pre-computed execution paths stored directly on identity documents. This approach:

  • ✅ Paths embedded on identities
  • ✅ No join complexity at query time
  • ✅ Application-level traversal is simple

An edge collection would conflict with materialized paths, forcing one of three options:

  • Keep paths embedded (defeats edge collection purpose)
  • Move paths to separate collection (extra lookup overhead)
  • Recompute on query (defeats materialization)
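
To make the embedded-paths argument concrete, here is a minimal sketch of a blast-radius lookup that reads pre-computed paths straight off an identity document. The `execution_paths` field and its inner keys (`via_role`, `permission`, `resource`) are hypothetical; this ADR does not specify the path schema.

```python
# Hypothetical identity document with pre-computed execution paths embedded.
identity = {
    "_id": "id-1",
    "entity_type": "identity",
    "execution_paths": [
        {"via_role": "role-admin", "permission": "db:write", "resource": "res-prod-db"},
        {"via_role": "role-admin", "permission": "db:read", "resource": "res-prod-db"},
        {"via_role": "role-ops", "permission": "s3:read", "resource": "res-logs"},
    ],
}


def blast_radius(identity_doc):
    """Distinct resources reachable by this identity, with no further lookups."""
    paths = identity_doc.get("execution_paths", [])
    return sorted({p["resource"] for p in paths})


reachable = blast_radius(identity)
```

Once the identity document is fetched, the traversal is a loop over an in-memory list. With an edge collection, the same answer would need extra lookups or a recomputation at query time.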

5. Industry Standard Pattern

A single collection with a discriminator field is the idiomatic MongoDB pattern for polymorphic documents, and it is widely used in production.

Consequences

Positive

  • Simplest mental model (one place for entities)
  • Optimal performance for security queries
  • Minimal operational overhead
  • Clean Neo4j migration (single collection to iterate)
  • Schema evolution without migration (add enum value)

Negative (Acceptable Trade-offs)

  • Identity documents grow with execution paths (5-50KB typical)
  • Non-atomic path materialization across documents (eventual consistency)
  • Full collection scans if not filtering by entity_type (mitigated by indexes)

When to Reconsider

Trigger to Split Identities

Only if ALL of:

  • Tenants exceed 10,000 identities AND
  • Identity documents consistently exceed 100KB AND
  • Query performance degrades on entity_type scans

Response: Split to 2 collections (identities, other_entities) — minimal change via StorageAdapter.
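
A sketch of how the trigger and the response could fit together. The `TenantStats` type, its field names, and the `collection_for` method are hypothetical: this ADR names StorageAdapter but does not show its interface. 100KB is treated as 100,000 bytes.

```python
from dataclasses import dataclass


@dataclass
class TenantStats:
    """Hypothetical per-tenant measurements used to evaluate the trigger."""
    identity_count: int
    typical_identity_doc_bytes: int   # size identity documents consistently reach
    entity_type_scans_degraded: bool  # would come from query monitoring


def should_split_identities(stats):
    """True only when ALL three triggers from this ADR hold."""
    return (
        stats.identity_count > 10_000
        and stats.typical_identity_doc_bytes > 100_000
        and stats.entity_type_scans_degraded
    )


class StorageAdapter:
    """Hypothetical shape of the adapter; only collection routing is shown."""

    def __init__(self, split_identities=False):
        self.split_identities = split_identities

    def collection_for(self, entity_type):
        """Route an entity type to its collection name."""
        if self.split_identities:
            return "identities" if entity_type == "identity" else "other_entities"
        return "entities"


stats = TenantStats(12_000, 150_000, entity_type_scans_degraded=False)
adapter = StorageAdapter(split_identities=should_split_identities(stats))
```

In this example only two of the three conditions hold, so the adapter keeps routing everything to the single entities collection.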

Trigger for Edge Collection

Never for a MongoDB-only architecture.

Only consider it if Neo4j is already deployed AND graph analytics on raw relationships are needed.