Architecture and Data Model Review
Date: 2026-02-07
Scope
This review evaluates the SecurityV0 architecture, data model, and evidence strategy against the PRD constraints for deterministic, evidence-grade findings. It incorporates new automation research and ServiceNow evidence sources.
Strengths That Should Be Preserved
- Deterministic scope and non-goals keep the MVP credible and enforce evidence-grade discipline.
- AUTHENTICATES_TO enables cross-system execution paths and is the correct abstraction for Entra-to-ServiceNow chains.
- Audit-log provenance provides strong traceability for findings and evidence packs.
- Materialized execution paths enable fast blast-radius queries at small tenant scale.
- Connector contract cleanly separates extraction from storage and allows parallel development.
Evidence-Grade Blocking Gaps
- Baseline storage design breaks at scale. Current baselines embed all entities in a single document, which will exceed MongoDB's 16 MB document size limit at entity counts well below those of real-world tenants.
- Ownership model mismatch. The data model treats Owners as distinct entities, but the connector schema lacks an Owner node type, which makes OWNED_BY semantics ambiguous and compromises deterministic ownership decay.
- Execution evidence is not a first-class artifact. The PRD requires proof of autonomous execution, but the model only stores EXECUTES_ON edges without immutable evidence records or source log linkage.
- Embedded execution paths will hit document size limits. The execution_paths and accessible_by arrays can exceed the same 16 MB limit for high-fan-out identities and resources.
- No-inference constraint conflicts with drift narratives. Any finding that describes scope drift or approval status must be backed by explicit approval records; where none exist, the approval evidence must be labeled as unavailable.
- raw_api_response risks policy violations. Storing raw API responses without redaction risks exposing secrets or regulated data, which violates the metadata-only constraint.
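To make the first gap concrete, a back-of-the-envelope sketch of how many entities fit in one embedded baseline document. The average entity size used in the usage note is an illustrative assumption, not a measurement from SecurityV0, and the helper name is hypothetical.

```python
# MongoDB's maximum BSON document size is 16 MiB.
MONGODB_MAX_DOC_BYTES = 16 * 1024 * 1024


def max_embedded_entities(avg_entity_bytes: int, overhead_bytes: int = 0) -> int:
    """Upper bound on entities that fit in one document at a given average size."""
    if avg_entity_bytes <= 0:
        raise ValueError("avg_entity_bytes must be positive")
    return (MONGODB_MAX_DOC_BYTES - overhead_bytes) // avg_entity_bytes
```

Under an assumed average of 2 KiB per entity, `max_embedded_entities(2048)` returns 8192, so a single-document baseline would stop fitting at a few thousand entities.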
Architecture Improvements (Prioritized)
P0: Must Fix for MVP Integrity
- Redesign baselines. Store baselines as per-entity documents or bounded chunks keyed by baseline_id to stay under the 16 MB document limit and enable partial retrieval.
- Align ownership modeling. Add an Owner node type in the normalized schema or enforce owner_type on human/team nodes with clear OWNED_BY semantics. Do not overload sys_created_by as owner.
- Add execution evidence artifacts. Introduce an execution_evidence entity with source_table, sys_id, source_timestamp, and payload_hash. Link EXECUTES_ON and EXECUTES edges to these records.
- Add deterministic link evidence for AUTHENTICATES_TO. Include source IDs from OAuth Application Registry and token mapping configuration (client_id, user field) to prove Entra-to-ServiceNow linkage.
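The execution_evidence entity proposed above could be sketched roughly as follows. The four field names come from this review; the class name, the helper, the use of SHA-256 over canonical JSON, and the choice of sys_created_on as the timestamp field are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ExecutionEvidence:
    source_table: str      # e.g. "sys_flow_context"
    sys_id: str            # sys_id of the source log record
    source_timestamp: str  # timestamp exactly as reported by the source record
    payload_hash: str      # SHA-256 of the canonicalized source record


def make_evidence(
    source_table: str,
    record: dict[str, Any],
    timestamp_field: str = "sys_created_on",  # assumed timestamp field
) -> ExecutionEvidence:
    """Build an evidence record without persisting the raw payload."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return ExecutionEvidence(
        source_table=source_table,
        sys_id=record["sys_id"],
        source_timestamp=record[timestamp_field],
        payload_hash=hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    )
```

Canonicalizing with sorted keys makes the hash independent of field order, so the same source record always yields the same payload_hash.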
P1: Required for Realistic Tenant Scale
- Move execution_paths and accessible_by into a dedicated collection when thresholds are exceeded. Keep embedded arrays only for small tenants with size guards.
- Introduce evidence completeness flags. If transaction logs or role audit logs are not enabled, findings should explicitly note incomplete history rather than imply approval or execution.
- Define a redaction policy. Replace raw_api_response with a hashed payload and a field-level allowlist for persisted metadata.
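One possible shape for the redaction step, as a minimal sketch. The allowlisted field names below are hypothetical placeholders; the real list would be set by the redaction policy itself.

```python
import hashlib
import json
from typing import Any

# Hypothetical allowlist for illustration only.
ALLOWED_FIELDS = frozenset({"sys_id", "name", "active", "sys_updated_on"})


def redact(raw: dict[str, Any], allowed: frozenset = ALLOWED_FIELDS) -> dict[str, Any]:
    """Keep only allowlisted fields and replace the raw payload with its hash."""
    canonical = json.dumps(raw, sort_keys=True, separators=(",", ":"))
    return {
        "metadata": {k: v for k, v in raw.items() if k in allowed},
        "payload_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

An allowlist fails closed: a field the policy has never seen is dropped by default, which is the safer direction for a metadata-only constraint.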
P2: Improves Deterministic Drift and Automation Coverage
- Extend drift detection to automation artifacts. Track changes in flow triggers, schedules, run_as identity, and activation states as deterministic drift signals.
- Add automation-centric events. Examples include automation_created, automation_updated, run_as_changed, schedule_changed, flow_published.
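A minimal sketch of how two snapshots of one automation artifact could be compared to emit these events deterministically. The snapshot field names (run_as, schedule, trigger, active) are assumptions, and flow_published is left out because it would need a publication signal that this sketch does not model.

```python
from typing import Any, Optional

# Assumed snapshot fields mapped to the event names listed above.
SPECIFIC_EVENTS = {"run_as": "run_as_changed", "schedule": "schedule_changed"}
# Other tracked fields whose changes produce a generic update event.
GENERIC_FIELDS = ("trigger", "active")


def drift_events(before: Optional[dict[str, Any]], after: dict[str, Any]) -> list[str]:
    """Compare two snapshots of one automation and return drift event names."""
    if before is None:
        return ["automation_created"]
    events = [
        event
        for field, event in SPECIFIC_EVENTS.items()
        if before.get(field) != after.get(field)
    ]
    if any(before.get(f) != after.get(f) for f in GENERIC_FIELDS):
        events.append("automation_updated")
    return events
```

Because the output depends only on the two snapshots, re-running the comparison on the same inputs always produces the same events, which is the property the no-inference constraint needs.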
Automation Coverage Additions
New automation research requires explicit modeling of ServiceNow automation artifacts beyond identities and roles.
Automation Artifacts As First-Class Entities
- Flow Designer flows
- Business Rules
- Script Actions
- Scheduled Jobs
- Script Includes
- Workflow Activities
Required Relationships
- CREATED_BY (automation to human)
- RUNS_AS (automation to identity)
- TRIGGERS_ON (automation to resource or event)
- EXECUTES (automation to execution_evidence)
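The four relationships above can be enforced with a small endpoint check, sketched here. The Node and Edge classes and the node kind strings are hypothetical; only the relationship names and their directions come from the list.

```python
from dataclasses import dataclass

# Relationship type -> (required source kind, permitted target kinds).
RELATIONSHIP_ENDPOINTS = {
    "CREATED_BY": ("automation", {"human"}),
    "RUNS_AS": ("automation", {"identity"}),
    "TRIGGERS_ON": ("automation", {"resource", "event"}),
    "EXECUTES": ("automation", {"execution_evidence"}),
}


@dataclass(frozen=True)
class Node:
    id: str
    kind: str


@dataclass(frozen=True)
class Edge:
    type: str
    source: Node
    target: Node


def make_edge(rel_type: str, source: Node, target: Node) -> Edge:
    """Create an edge only if its endpoints match the relationship definition."""
    source_kind, target_kinds = RELATIONSHIP_ENDPOINTS[rel_type]
    if source.kind != source_kind or target.kind not in target_kinds:
        raise ValueError(f"{rel_type} cannot link {source.kind} to {target.kind}")
    return Edge(rel_type, source, target)
```

Rejecting malformed edges at construction time keeps connectors from silently emitting, for example, a RUNS_AS edge that points from an identity back to an automation.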
Identity Subtype for System Execution
Many ServiceNow automations run as System. Model a system identity subtype as a privileged non-human identity (NHI) so that blast-radius calculations account for it.
Data Model Alignment Issues
- OWNED_BY should represent accountable ownership, not mere creation. Use CREATED_BY for sys_created_by. Ownership should be explicit and reassigned when possible.
- ownership_state should be derived by the evaluator, not ingested from connectors.
- AUTHENTICATES_VIA and AUTHENTICATES_TO should carry evidence references for deterministic linkage.
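As an illustration of the second point, the evaluator might derive ownership_state from the OWNED_BY target alone. The state names and the OwnerRef shape are assumptions made for this sketch; the review does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OwnerRef:
    owner_id: str
    active: bool  # whether the owning human or team is still active


def derive_ownership_state(owner: Optional[OwnerRef]) -> str:
    """Derive ownership state from the OWNED_BY target only.

    sys_created_by is deliberately not an input: creation is not ownership.
    """
    if owner is None:
        return "unowned"
    return "owned" if owner.active else "orphaned"
```

Keeping the derivation in one evaluator function means every connector feeds the same rule, so two connectors cannot disagree about what counts as orphaned.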
Evidence Chain Requirements (ServiceNow)
Required evidence sources for deterministic claims:
- syslog_transaction for inbound REST execution evidence.
- sys_flow_context for Flow Designer execution evidence.
- ecc_queue for MID Server execution evidence.
- sys_audit_role for role change evidence when enabled.
- sys_user_has_role for current role state.
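The source list above lends itself to a simple completeness check, sketched below. The table names are taken from the list; the claim labels and function names are hypothetical.

```python
# Claim label (hypothetical) -> ServiceNow table that backs it (from the list above).
REQUIRED_SOURCES = {
    "inbound_rest_execution": "syslog_transaction",
    "flow_execution": "sys_flow_context",
    "mid_server_execution": "ecc_queue",
    "role_change_history": "sys_audit_role",
    "current_role_state": "sys_user_has_role",
}


def evidence_completeness(available_tables: set[str]) -> dict[str, bool]:
    """Flag, per claim type, whether its backing evidence table is available."""
    return {
        claim: table in available_tables
        for claim, table in REQUIRED_SOURCES.items()
    }


def incomplete_claims(available_tables: set[str]) -> list[str]:
    """Claim types that a finding must mark as lacking evidence."""
    flags = evidence_completeness(available_tables)
    return sorted(claim for claim, ok in flags.items() if not ok)
```

A finding generator could call `incomplete_claims` once per instance and attach the result to every finding, so missing history is stated explicitly instead of being read as absence of activity.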
Open Questions
- Which ServiceNow tables and fields will be approved as authoritative evidence in the MVP?
- Is sys_audit_role enabled in the target instances, and is syslog_transaction accessible for API ingestion?
- How will Entra appId (client_id) be mapped to ServiceNow oauth_entity records deterministically?
- What scale thresholds trigger externalization of execution paths and baselines?
- What approval system is authoritative for role expansion or scope change evidence?
Recommended Next Steps
- Update core architecture docs to incorporate baseline redesign, evidence artifacts, and automation entities.
- Add a formal evidence schema to the data model and define evidence completeness semantics.
- Validate ServiceNow table availability and field mappings in a real instance.
Next Action
Status: adopted — shipped
Findings incorporated into docs/architecture/01-data-model.md, 05-connectors.md, and associated ADRs. No further action required.