Validation of an Enterprise Autonomous Execution Architecture and Data Model
Executive summary
Your documents describe an Autonomous Execution Exposure Management platform that is intentionally deterministic, read-only, explainable, temporal, and evidence-grade, centered on reconstructing “standing execution paths” that persist after human ownership decays. Your core architectural decomposition (connector layer → normalization/diff → single system-of-record datastore → query/trigger/evidence services → export surfaces) is directionally correct for an MVP and strongly aligned with modern identity realities such as workload federation, token exchange, and short-lived credentials. The overall design is also consistent with contemporary identity/security fundamentals: separation of authentication material from principals (credentials vs identities), explicit modeling of delegation and token exchange concepts, and audit-grade logging and provenance concerns. citeturn5search1turn4search0turn13search4turn14search1
The largest correctness risks cluster into four areas:
- Principal taxonomy ambiguity (human vs non-human vs agent; identity vs credential): Several integration patterns (PATs, job tokens, delegated OAuth flows, workload federation) require a very strict separation between principal and credential. OAuth and OIDC standards, plus modern workload federation patterns, make this separation non-negotiable for correctness, blast-radius fidelity, and auditability. citeturn5search1turn0search1turn6search0turn7search1
- Authorization semantics beyond simple RBAC: Major target environments (AWS, GCP, Kubernetes, data platforms) require representing policy conditions, resource hierarchy, and evaluation constraints (e.g., AWS identity vs resource-based policies, permission boundaries / session policies, and “deny-by-default” semantics). A model that only supports “Identity → Role → Permission → Resource” without first-class conditionality risks false positives/negatives when you extend beyond the initial environments. citeturn10search0turn10search5turn13search1
- Graph scale and document bloat: Materializing large execution paths and reverse reachability lists inside MongoDB documents can hit both performance and document size ceilings as tenants grow (especially once you add multi-cloud, Kubernetes, CI/CD, and data platforms). Your own docs anticipate a graph index (e.g., Neo4j) at scale; the risk is that you’ll hit practical scaling limits before that threshold if execution paths are high-cardinality or frequently recomputed. citeturn1search6turn10search5
- Evidence “immutability” requires a cryptographic trust anchor: Hashing evidence packs is a solid start, but tamper evidence only becomes “audit-grade” if the hash is anchored to a signing key and/or an immutable storage control outside the mutable primary database. NIST log-management guidance emphasizes integrity protections and governance around log/evidence handling; the same logic applies to evidence packs. citeturn13search4turn1search6
Overall fit for future integrations: with modest schema hardening (especially around principal classification, policy conditionality, resource scoping, and evidence integrity anchoring), your 9-type graph can generalize to major cloud providers, Kubernetes, CI/CD OIDC federation, Databricks, Snowflake, identity providers, and on-prem—but you should expect schema evolution for (a) policy/binding objects and (b) ephemeral session identities. citeturn6search0turn10search2turn8search2turn11search0turn12search9
Assumptions and evaluation criteria
Assumptions
Unspecified constraints are treated as “no specific constraint.” In addition, this validation assumes:
- The platform’s intent is visibility and evidence, not enforcement (consistent with a mission-control / system-of-record posture). citeturn14search1
- Integrations will expand from early targets to a mix of SaaS, IaaS/PaaS, Kubernetes, CI/CD, and on-prem with heterogeneous authN/authZ models, including OAuth2/OIDC, SAML, SCIM provisioning, API keys/PATs, certificates, and workload federation. citeturn0search1turn5search39turn1search1turn6search0
- “Evidence-grade” means tamper-evident, time-bounded, provable chain-of-custody, and reproducible outputs suitable for audits and incident response—not merely “exportable JSON.” citeturn13search4turn1search6turn14search1
Documents reviewed
These findings are grounded in your provided documents (listed for traceability):
- SecurityV0 Platform Architecture Overview
- Database Architecture
- Data Model
- API Layer
- Connector Framework
- SecurityV0 Vision Summary
- ServiceNow Pilot PRD – Phase 1
- ServiceNow Phase 2 Scope
Validation criteria (what “correct” means here)
The architecture and data model are assessed against:
- Identity correctness: explicit separation of principal identity vs credential/authenticator, and accurate semantics for federation, delegation, impersonation, and token exchange. citeturn4search0turn5search1turn7search1
- Authorization model coverage: RBAC plus the minimum required structure to represent ABAC-like conditions and policy evaluation constraints. citeturn13search1turn10search5turn8search0
- Trust boundaries and tenancy mapping: ability to prevent cross-tenant confusion, correctly scope identities/resources to the right cloud account/directory/workspace/cluster/instance, and encode “zone of trust.” citeturn6search0turn10search2
- Auditability & evidence: integrity, provenance, and replayability consistent with log-management and security-control expectations. citeturn13search4turn1search6
- Scalability and operability: connector rate limits + incremental sync patterns, eventing vs polling, and graph query performance under realistic enterprise cardinalities. citeturn10search5turn12search0turn6search0
Architecture correctness assessment with risks and mitigations
What is correct and robust
Your architecture (as documented) makes several correct strategic calls for an “autonomous execution” system-of-record:
- Connector contract + normalization boundary: Treating connectors as independent extract/transform producers and centralizing diff/load and data-store concerns is correct for scaling integration development and preserving determinism. This aligns with the reality that each platform has its own API models, rate limits, and logging surfaces. citeturn6search0turn12search0turn8search2
- Temporal-first design: Modeling drift over time is the correct lens for the governance gap you target. Most identity ecosystems now emphasize short-lived tokens, frequent permission changes, and federated trust configurations that evolve continuously. citeturn6search0turn5search1turn10search5
- Separation of execution chain vs authorization path: This is conceptually strong and matches real-world autonomy: automation code “runs as” a principal; that principal holds authority. This is consistent with how OAuth-based service-to-service access, workload federation, and platform-managed identities actually work. citeturn0search2turn7search3turn6search4
Architecture risks that become acute at “broad enterprise coverage”
Risk: Evidence packs are “hashed” but not inherently tamper-resistant
A SHA-256 hash stored alongside the evidence in the same mutable datastore is not a complete integrity story if an attacker (or even an over-privileged admin) can rewrite both the evidence and the hash. NIST log management guidance emphasizes integrity, retention governance, and trustworthy handling for security-relevant records; evidence packs are effectively a specialized log artifact. citeturn13search4turn1search6
Mitigations (practical, incremental):
- Sign evidence packs using a managed signing key (KMS/HSM-backed) and store the detached signature and signing certificate chain; optionally timestamp via an RFC 3161-style TSA if you need stronger non-repudiation. (This is a standard approach for making “evidence-grade” outputs independently verifiable.)
- Externalize immutable storage for sealed packs (e.g., object storage with WORM / retention controls) while still indexing metadata in MongoDB.
- Control mapping: tie this to the NIST SP 800-53 control families that expect strong audit/log protections (AU), incident response support (IR), and system/information integrity (SI). citeturn1search6turn13search4
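A minimal sketch of the sealing step, assuming evidence packs are JSON-serializable dicts. It uses HMAC-SHA256 from the Python standard library purely as a stand-in for a KMS/HSM-backed asymmetric signature (a real signature, unlike HMAC, lets auditors verify without holding the secret). The helper names `canonical_bytes`, `seal`, and `verify` are hypothetical, not part of the documented platform.

```python
import hashlib
import hmac
import json


def canonical_bytes(pack: dict) -> bytes:
    """Serialize deterministically so identical content always produces identical bytes."""
    return json.dumps(pack, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def seal(pack: dict, signing_key: bytes) -> dict:
    """Return a detached seal: the content digest plus a keyed tag over that digest."""
    digest = hashlib.sha256(canonical_bytes(pack)).hexdigest()
    # Stand-in for a KMS/HSM signature: the key lives outside the primary database.
    tag = hmac.new(signing_key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return {"alg": "HMAC-SHA256 (stand-in)", "sha256": digest, "tag": tag}


def verify(pack: dict, seal_record: dict, signing_key: bytes) -> bool:
    """Recompute digest and tag; rewriting the pack, the stored hash, or both fails."""
    digest = hashlib.sha256(canonical_bytes(pack)).hexdigest()
    if not hmac.compare_digest(digest, seal_record["sha256"]):
        return False
    expected = hmac.new(signing_key, digest.encode("ascii"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, seal_record["tag"])
```

The point of the design: an actor who can rewrite both the pack and its stored SHA-256 in MongoDB still cannot produce a matching tag without the key, which is exactly the gap a "hash stored next to content" leaves open.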
Risk: MongoDB document growth from materialized paths and reverse lookups
Modern enterprise environments can easily produce very high-cardinality effective access sets (especially on data platforms and cloud IAM). If you embed large execution_paths arrays on identities and accessible_by arrays on resources, you risk:
- pressure on MongoDB's 16 MB maximum BSON document size,
- “hot document” update contention during syncs, and
- expensive recompute cycles when many relationships change. citeturn10search5turn8search0
Your database design correctly anticipates that a graph index may be needed at scale; the key issue is when, not if, particularly once you ingest cloud policy and Kubernetes RBAC at enterprise sizes. citeturn10search5turn8search2
Mitigations:
- Move materialized paths into a dedicated edge/materialization collection (appendable, chunked) rather than embedding unbounded arrays in “identity” documents.
- Introduce a graph-compute pipeline with incremental recompute based on changed subgraphs (rather than recomputing all reachable paths per affected identity).
- Treat Neo4j (or another graph index) as an index/compute accelerator rather than a second source of truth; keep MongoDB authoritative, but offload k-hop traversals and reachability caching. citeturn10search5turn14search6
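The incremental-recompute idea can be sketched as: walk the reverse graph from each changed node to find only the identities whose reachability could have changed, then recompute a depth-bounded reachable set for those identities alone. This assumes a plain adjacency dict standing in for the edge collection or graph index, and that `changed_nodes` contains the source node of every added or removed edge; the function names and the default depth of 6 are hypothetical.

```python
from collections import defaultdict, deque


def reverse_adjacency(edges: dict[str, set[str]]) -> dict[str, set[str]]:
    rev: dict[str, set[str]] = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            rev[dst].add(src)
    return rev


def bounded_reach(edges: dict[str, set[str]], start: str, max_depth: int) -> set[str]:
    """Breadth-first search: nodes reachable from start in at most max_depth hops."""
    seen, reached = {start}, set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                reached.add(nxt)
                frontier.append((nxt, depth + 1))
    return reached


def affected_identities(edges, identities: set[str], changed_nodes: set[str], max_depth: int) -> set[str]:
    """Identities upstream of any changed node; only these need recomputation."""
    rev = reverse_adjacency(edges)
    affected = set()
    for node in changed_nodes:
        if node in identities:
            affected.add(node)
        affected |= bounded_reach(rev, node, max_depth) & identities
    return affected


def recompute(edges, identities, changed_nodes, cache: dict, max_depth: int = 6) -> dict:
    for ident in affected_identities(edges, identities, changed_nodes, max_depth):
        cache[ident] = bounded_reach(edges, ident, max_depth)
    return cache
```

Cached entries for unaffected identities are left untouched, which is what keeps sync-time recompute proportional to the changed subgraph rather than to the tenant.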
Risk: Connector permissions and rate-limit behavior can undermine determinism and completeness
Major APIs often throttle, paginate inconsistently, and provide partial audit log retention. GitHub OIDC tokens, cloud workload federation, Kubernetes ServiceAccount token validation patterns, and identity audit APIs all impose constraints that can create “blind spots” if you do not (a) model evidence completeness and (b) preserve exact sync cursor semantics. citeturn12search0turn8search2turn6search0
Mitigations:
- Preserve and surface an explicit evidence completeness state for every finding and reachability claim (e.g., “proof-grade,” “config-grade,” “unlinked,” “not-applicable”), consistent with your deterministic goal. citeturn13search4turn10search5
- Prefer audit-log-driven deltas where available; otherwise use incremental APIs (delta tokens, cursors) and persist cursors as first-class data. citeturn6search0turn10search5
- Build a connector discipline around “deterministic joins only,” consistent with OAuth/OIDC’s security properties: issuer/subject/audience and stable IDs must drive correlation, not string matching. citeturn0search1turn7search0turn12search2
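One way to make completeness deterministic is to persist sync state next to the data it produced and let it cap the grade of any claim. A minimal sketch, assuming the four grade labels from the mitigation above; `SyncState`, `apply_page`, `record_gap`, and `grade_for` are hypothetical names, and the grading rule is illustrative rather than the platform's defined logic.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class EvidenceGrade(Enum):
    PROOF = "proof-grade"
    CONFIG = "config-grade"
    UNLINKED = "unlinked"
    NOT_APPLICABLE = "not-applicable"


@dataclass
class SyncState:
    """Persisted per connector scope, alongside the data it produced."""
    connector_id: str
    cursor: Optional[str] = None
    complete: bool = True
    gaps: list = field(default_factory=list)


def apply_page(state: SyncState, items: list, next_cursor: Optional[str], store: list) -> None:
    # Write the page first, then advance the cursor: a crash between the two
    # replays the page on restart instead of silently skipping it
    # (the real write should be an idempotent upsert).
    store.extend(items)
    state.cursor = next_cursor


def record_gap(state: SyncState, reason: str) -> None:
    # Rate limits, truncated audit retention, or missing API scopes make the sync incomplete.
    state.complete = False
    state.gaps.append(reason)


def grade_for(observed_in_audit_log: bool, backed_by_config: bool, state: SyncState) -> EvidenceGrade:
    # A claim is never graded higher than the completeness of the sync behind it.
    if observed_in_audit_log and state.complete:
        return EvidenceGrade.PROOF
    if backed_by_config:
        return EvidenceGrade.CONFIG
    return EvidenceGrade.UNLINKED
```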
Risk/priority chart
Data model validation with entity-relationship table and suggested schema changes
Core correctness of the proposed entity system
Your “9 entity types + typed relationships” model is a strong minimum spanning set for “autonomous execution” because it cleanly separates:
- Principal (Identity): who can authenticate and act,
- Executable logic (Automation): what causes actions to occur,
- Auth material (Credential) and destination (Connection): how actions cross trust boundaries, and
- Authority (Role/Permission/Resource): what the acting principal can do. citeturn0search2turn5search1turn10search0
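The separation above can be sketched as two traversals over typed relationships: execution-chain edges lead from an automation to the identity it acts as, and authority edges lead from that identity to resources. INVOKES, USES, AUTHENTICATES_AS, and RUNS_AS are taken from the integration-pattern table later in this document; HAS_ROLE, GRANTS, and APPLIES_TO, and the "type:" prefix on node IDs, are hypothetical choices for illustration.

```python
from collections import defaultdict, deque
from typing import Callable

Edge = tuple[str, str, str]  # (source_id, relationship, target_id)

EXECUTION_RELS = {"INVOKES", "USES", "AUTHENTICATES_AS", "RUNS_AS"}
AUTHORITY_RELS = {"HAS_ROLE", "GRANTS", "APPLIES_TO"}  # hypothetical names


def follow(edges: list[Edge], start: str, allowed: set[str]) -> set[str]:
    """All nodes reachable from start using only the allowed relationship types."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        if rel in allowed:
            adj[src].append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


def blast_radius(edges: list[Edge], automation_id: str, kind_of: Callable[[str], str]):
    """Execution chain first (who does it run as), then authority (what can that principal touch)."""
    identities = {n for n in follow(edges, automation_id, EXECUTION_RELS) if kind_of(n) == "identity"}
    resources: set[str] = set()
    for ident in identities:
        resources |= {n for n in follow(edges, ident, AUTHORITY_RELS) if kind_of(n) == "resource"}
    return identities, resources
```

Keeping the two relationship families disjoint means an explanation can always say which half of a path a given hop belongs to.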
This separation matches modern standards:
- OAuth 2.0 defines flows where tokens are issued to clients and used at protected resources; treating tokens/keys as credentials rather than identities reduces modeling error. citeturn0search2turn5search1
- OIDC builds an identity layer on OAuth; issuer/subject/audience claims are foundational for verifying who a token represents and whether it was minted for the right target. citeturn0search1turn12search2
- Token exchange (STS) explicitly introduces impersonation and delegation semantics (subject token vs actor token), which strongly suggests you should model delegation chains as first-class relationships rather than burying them in opaque metadata. citeturn4search0turn5search1
Conceptual ER diagram
Entity-relationship table (what must be representable everywhere)
| Conceptual object | Maps to your entity type | Why it matters for broad integrations | Minimum attributes for correctness |
|---|---|---|---|
| Principal that can authenticate | identity | Needed for all autonomous / workload / service access, and for federation where principals are externalized | principal_kind (human/workload/agent/system), stable source_id, source_scope (tenant/account/workspace), lifecycle state |
| Executable logic / job / workflow | automation | CI/CD pipelines, schedulers, workflows, agents, serverless functions | trigger type, execution mode, code provenance (repo/ref/build), runtime environment (cluster/workspace/project) |
| Outbound integration endpoint | connection | Defines trust boundary crossings (targets, protocols, egress classification) | URL/host, protocol, target system identifier, network boundary classification |
| Authenticator / secret / token / trust config | credential | Supports client secrets, certs, PATs, federation trust, key pairs, token exchange | credential type (secret/cert/PAT/OIDC/SAML/federation/assume-role), issuer, subject, audience, expiry/rotation, last-used evidence |
| Business accountability | owner | Governance and ownership decay, assignment history | owner subtype (human/team/BU/org), status, authoritative source |
| Grouping of privileges | role | RBAC across cloud/SaaS/K8s/data platforms | role name/id, source system, role scope |
| Atomic privileges | permission | Needed for explainable blast radius and drift | action/verb, effect, condition (if any), target resource patterns |
| Protected object | resource | Needed to show “what data/system can be touched” | resource kind, hierarchical path, sensitivity classification, residency/scope tags |
| Proof of execution | execution_evidence | Bridges “standing authority” to observed execution | timestamp, actor principal, target resource/action, source log reference, integrity metadata |
Suggested schema changes (high-value, minimal-disruption)
The following changes are recommended because they reduce ambiguity and improve fit for AWS/GCP/Kubernetes/CI/CD/data platforms without forcing a wholesale redesign.
Add explicit “source scope” and “trust boundary” fields on every entity
Problem: source_system + source_id is not enough to prevent accidental collisions or incorrect joins when the same connector type is deployed across multiple directories/accounts/instances (e.g., multiple AWS accounts, multiple Kubernetes clusters, multiple Snowflake accounts, multiple SaaS tenants). This becomes a correctness and security issue in multi-tenant aggregations. citeturn6search0turn10search2turn8search0
Change: add a normalized source_scope object to all entities (and reuse it in relationship evidence):
- `scope_type`: `entra_tenant` | `aws_account` | `gcp_project` | `k8s_cluster` | `snowflake_account` | `databricks_workspace` | `servicenow_instance` | etc.
- `scope_id`: stable canonical ID (tenant GUID, account number, cluster UID, instance URL).
- Optional: `region`, `org_id`, `environment` where relevant.
This aligns directly with workload federation realities where issuer/subject/audience are scoped and validated per trust configuration. citeturn7search0turn6search0turn12search2
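A minimal sketch of `source_scope` as a frozen value object, and of a join key that includes it, so the same `source_id` in two AWS accounts can never collide. The `entity_key` helper and the choice to leave the optional fields out of the key are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ScopeType(Enum):
    # Values from the list above; extend as connectors are added.
    ENTRA_TENANT = "entra_tenant"
    AWS_ACCOUNT = "aws_account"
    GCP_PROJECT = "gcp_project"
    K8S_CLUSTER = "k8s_cluster"
    SNOWFLAKE_ACCOUNT = "snowflake_account"
    DATABRICKS_WORKSPACE = "databricks_workspace"
    SERVICENOW_INSTANCE = "servicenow_instance"


@dataclass(frozen=True)
class SourceScope:
    scope_type: ScopeType
    scope_id: str  # tenant GUID, account number, cluster UID, instance URL
    region: Optional[str] = None
    org_id: Optional[str] = None
    environment: Optional[str] = None


def entity_key(tenant_id: str, source_system: str, scope: SourceScope, source_id: str) -> tuple:
    # region/org_id/environment describe the scope but are deliberately not part of its identity.
    return (tenant_id, source_system, scope.scope_type.value, scope.scope_id, source_id)
```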
Treat “principal kind” as a first-class discriminator, not only a subtype
Problem: Across vendors, the same “identity” label can represent: a human user, a service principal, an application instance, an agent, an ephemeral job identity, or a service account. Modern ecosystems also distinguish “app-only” vs “delegated” permissions (and workload identity objects vs user objects). citeturn7search2turn11search1turn12search4
Change: add principal_kind and principal_execution_style:
- `principal_kind`: `human` | `workload` | `agent` | `system`
- `principal_execution_style`: `interactive_session` | `headless_daemon` | `federated_ephemeral` | `unknown`
This is consistent with NIST’s identity framing (separating authenticators, lifecycle, and assurance concepts) and with OIDC/OAuth token semantics. citeturn0search1turn0search7turn5search1
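A sketch of the two discriminators as enums, with connector-side classification that refuses to guess when no rule exists. The `DEFAULTS` table is a hypothetical starting point (for example, an Entra user account that is actually driven by automation would still need an override), and `classify` is an illustrative name.

```python
from dataclasses import dataclass
from enum import Enum


class PrincipalKind(Enum):
    HUMAN = "human"
    WORKLOAD = "workload"
    AGENT = "agent"
    SYSTEM = "system"


class ExecutionStyle(Enum):
    INTERACTIVE_SESSION = "interactive_session"
    HEADLESS_DAEMON = "headless_daemon"
    FEDERATED_EPHEMERAL = "federated_ephemeral"
    UNKNOWN = "unknown"


# Hypothetical defaults: (source_system, native_type) -> (kind, style).
DEFAULTS = {
    ("entra", "user"): (PrincipalKind.HUMAN, ExecutionStyle.INTERACTIVE_SESSION),
    ("entra", "servicePrincipal"): (PrincipalKind.WORKLOAD, ExecutionStyle.HEADLESS_DAEMON),
    ("kubernetes", "ServiceAccount"): (PrincipalKind.WORKLOAD, ExecutionStyle.HEADLESS_DAEMON),
}


@dataclass(frozen=True)
class Principal:
    source_system: str
    native_type: str
    source_id: str
    principal_kind: PrincipalKind
    principal_execution_style: ExecutionStyle


def classify(source_system: str, native_type: str, source_id: str) -> Principal:
    rule = DEFAULTS.get((source_system, native_type))
    if rule is None:
        # Deterministic by design: an unmapped type is a connector bug, not a guess.
        raise ValueError(f"no principal_kind rule for {source_system}/{native_type}")
    kind, style = rule
    return Principal(source_system, native_type, source_id, kind, style)
```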
Elevate policy conditionality into a normalized structure
Problem: Cloud IAM is not only “RBAC.” Conditions, boundaries, and resource-based policies change effective permissions materially. AWS explicitly distinguishes identity-based vs resource-based policies, and its evaluation logic denies by default, lets an explicit deny override any allow, and lets permission boundaries, SCPs, and session policies cap (but never expand) what other policies allow. citeturn10search0turn10search5
Change: add a normalized policy_condition structure under permission.properties (or as a separate “policy statement” entity if you prefer), capturing:
- `condition_language`: `aws_iam_condition` | `cel` | `rego` | `unknown`
- `raw_condition`: machine-parsable blob (allowlisted)
- `normalized_keys`: extracted key/value constraints when deterministic.
This enables explainability (“allowed because condition X matched”) without requiring probabilistic inference. citeturn13search1turn10search5turn6search0
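A minimal sketch of `policy_condition` for one narrow case: extracting AWS `StringEquals` constraints into `normalized_keys` and returning a three-valued answer (True, False, or None for "cannot decide deterministically"). The extra `fully_normalized` flag and the function names are assumptions; every other operator is kept only in `raw_condition`. Because AWS ANDs operators within a condition block, a failed `StringEquals` is a definite False even when other operators were not normalized.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ConditionLanguage(Enum):
    AWS_IAM_CONDITION = "aws_iam_condition"
    CEL = "cel"
    REGO = "rego"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class PolicyCondition:
    condition_language: ConditionLanguage
    raw_condition: str  # verbatim, for audit and replay
    normalized_keys: dict = field(default_factory=dict)  # key -> allowed values
    fully_normalized: bool = False  # hypothetical: True only if nothing was left unparsed


def from_aws_condition(block: dict) -> PolicyCondition:
    keys = {}
    for key, value in block.get("StringEquals", {}).items():
        keys[key] = sorted(value) if isinstance(value, list) else [value]
    return PolicyCondition(
        ConditionLanguage.AWS_IAM_CONDITION,
        json.dumps(block, sort_keys=True),
        keys,
        fully_normalized=set(block) <= {"StringEquals"},
    )


def evaluate_condition(cond: PolicyCondition, context: dict) -> Optional[bool]:
    for key, allowed in cond.normalized_keys.items():
        if key not in context:
            return None  # we lack the request attribute, so we cannot decide
        if context[key] not in allowed:
            return False  # values under one key are ORed; keys are ANDed
    return True if cond.fully_normalized else None
```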
Separate high-cardinality materializations from core entity documents
Problem: “Execution paths” and reverse reachability can grow extremely large (think: data platforms, cloud storage, Kubernetes secrets). citeturn8search0turn12search9turn10search5
Change: represent them as:
- An `effective_access_edges` collection (or similar) keyed by `(tenant_id, identity_id, resource_id, action)`, with a computed timestamp and evidence grade.
- Keep a small summary counter on identity/resource documents for UI and filtering.
This preserves your explainability while controlling document growth.
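A sketch of the edge record and its upsert semantics, using an in-memory dict keyed exactly as described above. In MongoDB the same four fields would typically back a unique compound index; the class and helper names here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EffectiveAccessEdge:
    tenant_id: str
    identity_id: str
    resource_id: str
    action: str
    computed_at: datetime
    evidence_grade: str

    @property
    def key(self) -> tuple:
        return (self.tenant_id, self.identity_id, self.resource_id, self.action)


def upsert(collection: dict, edge: EffectiveAccessEdge) -> None:
    # One small record per edge replaces an ever-growing array embedded in an identity document.
    collection[edge.key] = edge


def summary_counts(collection: dict, tenant_id: str):
    """Edge counts per identity and per resource: the small summaries kept on core documents."""
    per_identity, per_resource = Counter(), Counter()
    for tenant, identity, resource, _action in collection:
        if tenant == tenant_id:
            per_identity[identity] += 1
            per_resource[resource] += 1
    return per_identity, per_resource
```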
Distinguishing autonomous automation and separate identities
The conceptual distinction is correct—but must be hardened for “agentic + federated” reality
Your current separation (automation = executable logic; identity = authenticating principal; credential = authenticator; connection = outbound target) is the right abstraction for modern enterprise autonomy. It matches how:
- OAuth client credentials enable app-only access without a human session. citeturn0search2turn5search1
- Workload identity federation replaces long-lived secrets with federated tokens exchanged for short-lived access, removing key management burden. citeturn7search1turn6search0turn6search4
- CI/CD systems mint job-scoped OIDC tokens that cloud providers validate via issuer/subject/audience constraints, yielding short-lived cloud access tokens. citeturn12search2turn11search5turn10search2
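To make the issuer/subject/audience check concrete, here is a sketch of evaluating a GitHub Actions OIDC token's claims against a cloud trust configuration. It assumes the token's signature and expiry were already validated against the issuer's keys, and it only models the claim constraints a trust configuration enforces (comparable to conditions on `token.actions.githubusercontent.com:sub` and `:aud` in an AWS role trust policy). The `trust` dict shape and the `acme/payments` repository are hypothetical, and `glob_match` only approximates AWS `StringLike`.

```python
import re

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def glob_match(pattern: str, value: str) -> bool:
    """Approximate StringLike: '*' matches any run of characters, '?' exactly one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, value) is not None


def trust_allows(trust: dict, claims: dict) -> bool:
    if claims.get("iss") != trust["issuer"]:
        return False
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # JWT "aud" may be a string or a list
    if trust["audience"] not in audiences:
        return False
    subject = claims.get("sub", "")
    return any(glob_match(p, subject) for p in trust["subject_patterns"])
```

A subject pattern such as `repo:acme/*` is worth surfacing as a finding in its own right: it lets any repository, branch, or pull-request workflow in the organization assume the role.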
However, two boundary cases will pressure your taxonomy if not made explicit:
- Agent systems can be both “automation” (code that runs) and “identity” (principal that authenticates), and can spawn sub-agents/jobs with ephemeral credentials.
- Delegated vs app-only identity: An application object can exist without credentials but act only via delegated user flows; conversely it can act autonomously only when it uses app-only credentials (client credentials, managed identity, federation). citeturn7search2turn11search1turn0search2
A practical classification you can standardize on
To keep your model deterministic and explainable, I recommend you standardize on these definitions:
- Human identity: a principal representing a person (employee/contractor). Usually interactive, but can possess long-lived tokens (PATs) that function as credentials. citeturn0search7turn12search9
- Service identity (workload identity): a principal representing software acting as itself (service principal, service account, managed identity, cloud role). May authenticate via secrets, certs, or federation. citeturn7search2turn7search3turn6search4
- Autonomous agent identity: a special case of workload identity where the automation is adaptive/orchestrated (but still must authenticate via the same primitives: tokens, certs, federation). In practice, treat “agent” as a principal_kind and the running components as automations.
Then enforce:
- PATs/API keys are always credentials, never identities.
- Federation trust configurations (OIDC provider, federation credential bindings) are credentials of type `federation_trust` (not identities), because they are authenticators/trust rules, not actors. citeturn6search0turn10search2turn7search0
- Token exchange and delegation must produce a deterministic chain: actor principal → (token exchange) → delegated principal, consistent with OAuth Token Exchange semantics. citeturn4search0turn5search1
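Both invariants can be enforced mechanically at ingestion. The sketch below assumes a hypothetical `native_kind` field on connector output and rejects any identity whose native kind is an authenticator. It also unwinds the nested `act` claim defined by OAuth 2.0 Token Exchange (RFC 8693), where the outermost `act` is the current actor and more deeply nested ones are earlier actors; the returned order (subject first, then actors from most to least recent) is an illustrative choice.

```python
CREDENTIAL_KINDS = {
    "secret", "cert", "pat", "api_key", "job_token",
    "oidc", "saml", "federation_trust", "assume_role",
}


def check_entity(entity: dict) -> None:
    """Reject connector output that models an authenticator as a principal."""
    if entity["type"] == "identity" and entity.get("native_kind") in CREDENTIAL_KINDS:
        raise ValueError(f"{entity['id']}: {entity['native_kind']} must be a credential, not an identity")


def delegation_chain(claims: dict) -> list[str]:
    """[subject, current actor, earlier actor, ...] from nested RFC 8693 'act' claims."""
    chain = [claims["sub"]]
    actor = claims.get("act")
    while actor:
        chain.append(actor["sub"])
        actor = actor.get("act")
    return chain
```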
Integration fit analysis across SaaS, cloud, Kubernetes, data platforms, CI/CD, IdPs, and on-prem
Mapping future integration families to your model
Your 9 entity types can generalize across major environments if you explicitly support: (a) federation/token exchange, (b) policy conditionality, and (c) resource scoping. The table below describes high-value targets and what they imply.
Entity types across integrations and whether they fit
| Integration family | Typical real entities encountered | Does your model fit “as-is”? | Where changes are likely needed |
|---|---|---|---|
| Major cloud IAM (AWS) | IAM users/roles, policies, trust policies, resource policies; STS sessions; OIDC providers | Mostly yes | Represent policy evaluation constraints (identity vs resource-based, explicit deny, boundaries/session policies) citeturn10search0turn10search5turn10search2 |
| Major cloud IAM (GCP) | Principals, IAM roles, IAM policy bindings, conditional bindings (CEL), workload identity pools/providers | Mostly yes | Model hierarchy + conditional bindings + federation as token exchange citeturn6search0turn13search1turn4search0 |
| Major cloud IAM (Azure) | Application objects vs service principals, managed identities, RBAC roles; workload identity federation | Yes | Need principled modeling of app object vs tenant instance; federation credential limits and issuer/subject validation citeturn7search2turn7search1turn7search0 |
| Kubernetes clusters | ServiceAccounts, Role/ClusterRole, RoleBindings; projected tokens; TokenReview | Yes with scoping | Need explicit cluster+namespace scoping; token audience & token binding semantics for evidence-grade claims citeturn8search2turn8search0 |
| CI/CD (GitHub Actions) | Workflow jobs, OIDC job token, cloud role trust conditions, short-lived cloud creds | Yes with federation support | Treat job as automation, OIDC token as credential, role session as identity; model claim constraints and audience citeturn12search2turn12search1turn10search2 |
| CI/CD (GitLab) | CI job, OIDC ID token, vault/cloud auth with bounded claims | Yes with federation support | Same as above; ensure bounded-claim modeling (issuer/subject/aud) citeturn11search5turn11search8 |
| Data platforms (Databricks) | Service principals, OAuth tokens vs PATs, app authorization vs user authorization | Yes | Must represent U2M vs M2M flows explicitly; avoid modeling PAT as identity citeturn11search0turn11search1 |
| Data platforms (Snowflake) | Roles, privileges, OAuth client credentials, key-pair auth, PATs, external OAuth integration | Yes | Need object hierarchy + external OAuth integration properties; treat keys/PATs as credentials citeturn12search9turn12search3turn12search5 |
| Identity providers (OIDC/SAML/SCIM) | SAML assertions, OIDC claims, SCIM users/groups, federation trusts | Yes | Requires schema extensions for SCIM beyond users/groups and for non-human principals citeturn0search1turn5search39turn1search1 |
| On-prem (AD/Kerberos/LDAP, legacy apps) | Users/service accounts, groups, ACLs, Kerberos tickets, X.509, app-to-app secrets | Mostly yes | Model legacy authenticators and protocol-specific evidence; add explicit trust boundary markers citeturn13search4turn5search39 |
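For the Kubernetes row, "explicit cluster+namespace scoping" can start from the username Kubernetes assigns to ServiceAccounts, `system:serviceaccount:<namespace>:<name>`. The sketch below parses it into a key that also carries a cluster UID; Kubernetes has no single built-in cluster ID, so how `cluster_uid` is chosen (the `kube-system` namespace UID is a common convention) is an assumption, as are the helper names.

```python
from dataclasses import dataclass

SA_PREFIX = "system:serviceaccount:"


@dataclass(frozen=True)
class K8sServiceAccountKey:
    cluster_uid: str  # assumption: a stable per-cluster ID chosen by the connector
    namespace: str
    name: str


def parse_sa_username(cluster_uid: str, username: str) -> K8sServiceAccountKey:
    """Turn system:serviceaccount:<namespace>:<name> into a cluster-scoped identity key."""
    if not username.startswith(SA_PREFIX):
        raise ValueError(f"not a ServiceAccount username: {username}")
    namespace, sep, name = username[len(SA_PREFIX):].partition(":")
    if not sep or not namespace or not name:
        raise ValueError(f"malformed ServiceAccount username: {username}")
    return K8sServiceAccountKey(cluster_uid, namespace, name)
```

The same `ci/builder` ServiceAccount in two clusters yields two distinct keys, which is the collision the scoping requirement exists to prevent.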
Mapping integration patterns to entities, relationships, and required attributes
This table is the most operationally useful “fit check” for connector builders and schema reviewers.
| Integration/auth pattern | Entities involved | Required relationships | Required attributes/dimensions to preserve correctness |
|---|---|---|---|
| OAuth2 client credentials (secret/cert) | automation, connection, credential, identity | automation→connection (INVOKES); connection→credential (USES); credential→identity (AUTHENTICATES_AS) | grant type, client_id, token endpoint/issuer, credential expiry, rotation state, scopes/roles granted citeturn0search2turn5search1 |
| Managed identity (cloud metadata token service) | automation, identity, credential | automation RUNS_AS identity; identity obtains tokens via platform | identity lifecycle tied to resource (system-assigned) vs standalone (user-assigned); no secret rotation requirement but still track permissions citeturn7search3turn7search6 |
| Workload federation (OIDC → cloud STS) | automation, credential, identity, role/permission/resource | automation obtains OIDC token (credential); token exchanged into cloud access identity | issuer/subject/audience validation, token exchange linkage, time-bounded tokens, claim constraints to prevent confused deputy citeturn4search0turn12search2turn6search0turn10search2 |
| SAML assertion grant | automation, credential (assertion), identity | credential assertion presented to token endpoint; yields authorized identity | assertion audience restriction, subject confirmation, binding to token endpoint; expiry and replay considerations citeturn3search0turn5search39 |
| SCIM provisioning visibility | identity (human + service), role/group, owner | SCIM resource mapping; membership relationships | SCIM schema + extensions; tenancy/instance scoping; immutable IDs; least-privilege sync service account citeturn1search1turn1search5 |
| Kubernetes ServiceAccount token auth | automation/workload, identity (serviceaccount), credential (bound token), role/permission/resource | serviceaccount has RBAC bindings; token is credential evidence | token audience; token binding to pod; TokenReview-based validation; namespace scope; cluster UID citeturn8search2turn8search0 |
| SPIFFE mTLS identity | identity (workload), credential (X.509-SVID), connection | workload proves identity via SVID; mTLS channel | trust domain, SVID lifetime/rotation, attestation method; prefer X.509-SVIDs over JWT-SVIDs where possible citeturn9search0turn9search5 |
| AWS identity vs resource-based policy authorization | identity, role, permission(policy stmt), resource | role/policy enables actions; resource-based policy may directly grant | effect/condition/principal/resource patterns; explicit deny precedence; session policies/boundaries/SCP involvement citeturn10search0turn10search5 |
| ABAC-like conditional policies | identity, permission, resource | permission includes conditionality | subject/object/environment attributes; policy language; deterministically extractable constraints citeturn13search1turn6search0 |
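The AWS row's "explicit deny precedence" and guardrail involvement can be sketched as a deliberately simplified evaluator that also returns a reason, so a finding can say why access was allowed or denied. Assumptions: actions are exact strings (no wildcards, resources, or conditions), grant layers are identity- and resource-based policies, guardrail layers are whichever SCPs, permission boundaries, and session policies are present, and the same-account resource-based-policy exceptions and cross-account rules are omitted. `Layer` and `evaluate_aws` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    EXPLICIT_DENY = "explicit_deny"
    IMPLICIT_DENY = "implicit_deny"


@dataclass
class Layer:
    name: str
    allows: set[str] = field(default_factory=set)
    denies: set[str] = field(default_factory=set)


def evaluate_aws(action: str, grants: list[Layer], guardrails: list[Layer]) -> tuple[Decision, str]:
    # 1. An explicit deny in any layer wins.
    for layer in grants + guardrails:
        if action in layer.denies:
            return Decision.EXPLICIT_DENY, layer.name
    # 2. Default deny: something must actually grant the action.
    if not any(action in layer.allows for layer in grants):
        return Decision.IMPLICIT_DENY, "no grant"
    # 3. Every guardrail that is present caps the result; none can add permissions.
    for layer in guardrails:
        if action not in layer.allows:
            return Decision.IMPLICIT_DENY, layer.name
    return Decision.ALLOW, "granted and within all guardrails"
```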
Roadmap, prioritized recommendations, and migration plan
Roadmap of changes to support broad SaaS/cloud/on-prem coverage
Phase A: Schema hardening for correctness (no new connectors required)
Deliverables are primarily data-model and ingestion-contract refinements:
- Add `source_scope` to every entity and encode it into cross-system relationships for deterministic joins. citeturn6search0turn7search0
- Add `principal_kind` and `principal_execution_style` and enforce “credential ≠ identity” as an invariant (especially for PATs and job tokens). citeturn12search9turn12search2turn5search1
- Add a normalized `policy_condition` representation and a minimal hierarchical resource path model (“org/account/project/workspace/cluster/namespace/object”). citeturn10search5turn13search1turn8search0
- Move high-cardinality reachability materializations to dedicated collections.
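The hierarchical resource path can be sketched as an ordered tuple of `level=value` segments, where a grant on an ancestor covers every descendant. Inheritance of this kind matches GCP's resource hierarchy, but not every platform inherits grants down its object tree, so a real model would need a per-platform switch; `resource_path` and `covers` are hypothetical names.

```python
LEVELS = ("org", "account", "project", "workspace", "cluster", "namespace", "object")


def resource_path(**parts: str) -> tuple[str, ...]:
    """Build a path in LEVELS order, skipping levels a platform does not have."""
    unknown = set(parts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    # "level=value" segments keep paths unambiguous when levels are skipped.
    return tuple(f"{level}={parts[level]}" for level in LEVELS if level in parts)


def covers(grant_path: tuple[str, ...], target_path: tuple[str, ...]) -> bool:
    """True if the grant sits on the target itself or on one of its ancestors."""
    return target_path[: len(grant_path)] == grant_path
```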
Phase B: Federation-first connectors (CI/CD + cloud federation + Kubernetes)
This is where autonomous execution becomes cross-platform and your architecture differentiates:
- CI/CD OIDC (GitHub Actions + GitLab) modeling: job automation + minted ID token + cloud trust config + resulting session identity. citeturn12search2turn11search5turn10search2
- Cloud workload federation modeling:
- GCP Workload Identity Federation (pools/providers, token exchange). citeturn6search0turn4search0
- Entra workload identity federation (federated identity credentials with issuer/subject validation). citeturn7search1turn7search0
- AWS OIDC provider + role trust conditions. citeturn10search2turn6search4
- Kubernetes cluster identity/RBAC ingestion: service accounts, role bindings, and token semantics for evidence. citeturn8search2turn8search0
Phase C: Data platform depth (Databricks + Snowflake) and policy richness
- Databricks: unify service principal inventory, OAuth M2M tokens, and explicitly represent app identity vs user identity authorization. citeturn11search0turn11search1
- Snowflake: roles/privileges/object hierarchy plus external OAuth integration and secrets objects. citeturn12search9turn12search3turn12search5
Phase D: Evidence grade + governance exports
- Cryptographic signing + immutable retention strategy for evidence packs and key lifecycle controls aligned to audit expectations. citeturn13search4turn1search6
- SCIM export as interoperability surface; expect schema extensions for non-human identities. citeturn1search1turn1search5
Phase E: Scale-out graph compute
- Introduce a graph index/engine when tenant cardinalities and k-hop queries demand it; ensure MongoDB remains source of truth and the graph store is a derived index. citeturn10search5turn14search6
Prioritized recommendations
Security recommendations
- Anchor evidence integrity with signing keys and immutable retention controls; do not rely on “hash stored next to content” as the end state. citeturn13search4turn1search6
- Model token exchange and delegation explicitly using actor/subject semantics and claims-based constraints (issuer/subject/audience), consistent with OAuth Token Exchange and OIDC verification patterns. citeturn4search0turn0search1turn12search2
- Treat policy conditionality as first-class (especially AWS/GCP), because explicit deny and conditional grants change effective access materially. citeturn10search5turn6search0turn13search1
- Make “zone of trust” computable by encoding source scope, tenant mapping, and trust boundary crossings on every reachability edge. This aligns with zero trust’s emphasis on identity-centric policy rather than perimeter assumptions. citeturn14search1turn10search2
Operational recommendations
- Separate high-cardinality materializations into dedicated collections to avoid document bloat and reduce recompute contention.
- Design connectors around deterministic completeness states, including explicit reporting for audit-log availability/retention gaps. citeturn13search4turn6search0
- Plan for rate limits and cursor correctness as part of your “evidence-grade” claim: if data cannot be retrieved reliably, your findings must reflect that deterministically. citeturn12search0turn6search0
Product recommendations
- Standardize identity vocabulary (human vs workload vs agent; app-only vs delegated) and use it consistently in UI, APIs, and exports. This prevents category errors that will otherwise surface as confusing findings. citeturn7search2turn11search1turn0search2
- Explicitly separate standing authority vs observed execution in the user experience, with a clear evidence grade and provenance chain per finding. citeturn13search4turn1search6
- Make “trust boundary crossing” a first-class narrative: for autonomy risk, the most important paths are those that cross systems via federation, token exchange, and outbound connections. citeturn4search0turn6search0turn12search1
Migration and implementation plan with milestones
Milestone 1: Schema invariants and taxonomy lock (foundation)
- Implement `source_scope`, `principal_kind`, and a strict “PAT/API key/job token = credential” rule.
- Add a deterministic join contract: issuer/subject/audience (for federation) and stable IDs (for intra-system). citeturn7search0turn12search2turn0search1
Milestone 2: Policy conditionality and hierarchy (cloud readiness)
- Add normalized condition model and hierarchical resource addressing.
- Extend permission normalization rules to preserve effect/deny and condition fragments so explainability remains intact in AWS/GCP-style evaluations. citeturn10search5turn13search1turn10search0
Milestone 3: Materialization refactor (scale readiness)
- Move reachability materializations to dedicated collections; implement incremental recompute based on changed subgraph.
- Add guardrails: a default maximum path depth per tenant tier; cached summaries for UI. citeturn10search5turn14search6
Milestone 4: Federation-first connector tranche (largest leverage)
- CI/CD OIDC connectors (GitHub Actions + GitLab). citeturn12search0turn11search5
- Cloud federation connectors (AWS OIDC provider + assume-role; GCP WIF; Entra WIF). citeturn10search2turn6search0turn7search1
- Kubernetes RBAC + ServiceAccount token semantics. citeturn8search2turn8search0
Milestone 5: Data platform connectors (blast radius into “data”)
- Databricks service principals + OAuth/PAT classification + UC privileges. citeturn11search0turn11search1
- Snowflake auth methods + role/privilege graph + external OAuth + secrets. citeturn12search9turn12search3turn12search5
Milestone 6: Evidence-grade hardening
- Sign evidence packs, store externally immutably, expose independent verification artifacts.
- Align audit and incident-response (IR) operations with NIST log-management guidance and the relevant SP 800-53 control families (AU, IR, SI). citeturn13search4turn1search6
Milestone 7: Graph acceleration
- Introduce a derived graph index when your observed tenant sizes and hop counts require it; ensure derived index rebuild is deterministic and verifiable against MongoDB source-of-truth. citeturn10search5turn14search6
Next Action
Status: adopted — shipped
V1 validation confirmed the entity model and authority path approach. Findings were refined further in V2 and incorporated into data model hardening in 01-data-model.md. No further action required.