API Data Quality & Security Audit

Summary

Endpoints tested: 26 (all listed endpoints + edge cases)
Findings: 1 CRITICAL, 3 HIGH, 5 MEDIUM, 4 INFO
Data quality issues: 6
All endpoints return valid JSON: Yes
Response structures match TypeScript types: Mostly (exceptions noted)

Part 1: Security Findings

CRITICAL

[C1] Auth Middleware Does Not Verify JWT Signatures

File: src/api/middleware/auth.ts:116-136
Issue: The decodeJwtPayload() function decodes JWT payloads without verifying signatures. The code explicitly acknowledges this with a WARNING comment. Any client can craft a JWT with arbitrary sub, tid, and scopes claims and the server will trust it.
Impact: An attacker can impersonate any user, claim any tenant ID, and gain any scope by constructing a fake JWT. This bypasses all authorization if REQUIRE_AUTH=true and Bearer tokens are used.
Fix: Implement proper JWT verification using jose library with JWKS endpoint validation from the IdP, as the comment already suggests for Phase 2. Until then, this is a critical vulnerability if Bearer auth is enabled in production.
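To make the gap concrete, the following is a minimal sketch of what "verify before trusting claims" involves. It uses only Node's built-in crypto module with a locally generated RS256 key pair as a stand-in for the jose library and JWKS lookup recommended above. The names verifyRs256, signRs256, demoKeys, and the Claims shape are illustrative assumptions, not code from auth.ts.

```typescript
import { createSign, createVerify, generateKeyPairSync, KeyObject } from "node:crypto";

// Hypothetical claim shape, based on the claims named in this finding.
interface Claims {
  sub: string;
  tid: string;
  scopes: string[];
  exp: number; // expiry, seconds since epoch
}

const encode = (obj: object): string =>
  Buffer.from(JSON.stringify(obj), "utf8").toString("base64url");

// Demo helper only: issues a signed token so the verifier can be exercised.
function signRs256(claims: Claims, privateKey: KeyObject): string {
  const signingInput = `${encode({ alg: "RS256", typ: "JWT" })}.${encode(claims)}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Returns the claims only if the algorithm, signature, and expiry all check out.
function verifyRs256(token: string, publicKey: KeyObject, nowSec: number): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts as [string, string, string];
  try {
    const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
    if (header.alg !== "RS256") return null; // never accept "none" or an unexpected algorithm
    const valid = createVerify("RSA-SHA256")
      .update(`${headerB64}.${payloadB64}`)
      .verify(publicKey, Buffer.from(signatureB64, "base64url"));
    if (!valid) return null;
    const claims = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8")) as Claims;
    return typeof claims.exp === "number" && claims.exp > nowSec ? claims : null;
  } catch {
    return null; // malformed JSON in the header or payload
  }
}

// Locally generated key pair; in production the public key would come from the IdP's JWKS.
const demoKeys = generateKeyPairSync("rsa", { modulusLength: 2048 });
```

A decode-only function accepts a token whose payload was swapped after signing; the verifier above rejects it because the signature no longer matches the signing input. Issuer and audience checks are omitted here and would also be needed.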

HIGH

[H1] Missing Tenant Header Returns 200 Instead of 400/401

File: src/api/middleware/auth.ts:29-35
Issue: When REQUIRE_AUTH=false (current production configuration), requests without an X-Tenant-Id header are accepted and default to "dev-tenant". The API returns HTTP 200 with empty data arrays instead of rejecting the request. This means unauthenticated requests succeed silently.
Impact: No data leaks because the dev-tenant has no data, but the API accepts requests without any tenant context and returns a valid response shape. Clients cannot distinguish between "no tenant header" and "tenant exists but has no data." If dev-tenant ever gets data seeded, it would be exposed to anyone.
Tested:
- GET /api/v1/findings (no header) -> 200, empty data
- GET /api/v1/findings (empty header) -> 200, empty data
- GET /api/v1/findings -H "X-Tenant-Id: other-tenant" -> 200, empty data
Fix: In production, set REQUIRE_AUTH=true and ensure the requireTenant middleware (src/api/middleware/require-tenant.ts) is applied before all API routes. Currently it exists but its enforcement depends on the auth middleware running first.
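As an illustration of the expected behavior, here is a minimal, framework-agnostic sketch of a tenant check that rejects missing or empty headers instead of defaulting to "dev-tenant". The function name resolveTenant, the result type, and the TENANT_REQUIRED code are hypothetical; the error shape follows the {error: {code, message, status}} structure noted under Verified OK.

```typescript
type TenantResult =
  | { ok: true; tenantId: string }
  | { ok: false; error: { code: string; message: string; status: number } };

// Node lowercases incoming header names, so the lookup key is lowercase.
function resolveTenant(headers: Record<string, string | undefined>): TenantResult {
  const tenantId = (headers["x-tenant-id"] ?? "").trim();
  if (tenantId === "") {
    // No silent fallback to a default tenant: the caller must identify itself.
    return {
      ok: false,
      error: { code: "TENANT_REQUIRED", message: "X-Tenant-Id header is required", status: 400 },
    };
  }
  return { ok: true, tenantId };
}
```

With this shape, the first two tested requests above would receive 400 rather than 200 with empty data. Validating that the tenant actually exists is a separate step not shown here.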

[H2] Diagnostics and Metrics Endpoints Served by UI SPA Catch-All

File: src/api/routes/system.ts:53-69
Issue: The /diagnostics and /metrics endpoints are defined in the backend but on the production domain (app.securityv0.com) they return the UI SPA HTML instead. This indicates the reverse proxy is routing /diagnostics and /metrics to the UI static server rather than the API. The /diagnostics endpoint exposes process.memoryUsage(), process.uptime(), process.version, and worker queue details.
Impact: If the proxy routing is corrected, /diagnostics would leak internal server details (memory, uptime, Node version, worker status) to any request with a tenant header. The endpoint only checks for x-tenant-id presence, not validity.
Fix: Either (1) remove the /diagnostics endpoint in production, (2) require proper authentication, or (3) ensure the reverse proxy blocks it. The /metrics endpoint should only be accessible to internal monitoring.

[H3] Entity tenant_id Exposed in API Responses

File: src/api/routes/entities.ts:70-80, src/api/routes/exposures.ts:333-371
Issue: Entity detail and exposure detail responses include the raw tenant_id field from the database document. This is not a cross-tenant leak (the returned data belongs to the requesting tenant), but including tenant_id in API responses exposes internal infrastructure details to clients.
Tested: GET /api/v1/entities/{id} -> response includes "tenant_id": "demo-w1" in the entity object. Also visible in: authority path findings, execution evidence records, timeline events, and execution chains.
Impact: Low risk since the client already sends the tenant header, but it reveals the internal tenant naming convention and could be used for targeted attacks on other tenants.
Fix: Strip tenant_id from API response serialization. The NormalizedFinding type in api-types.ts does not include tenant_id, but the raw EntityDoc type does -- the entity routes return raw MongoDB documents without filtering.
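One way to apply this at the serialization boundary is sketched below. The helper stripTenantId is a hypothetical name, and the sketch assumes the value has already been reduced to plain JSON data (strings, numbers, arrays, plain objects); class instances such as Date or ObjectId would need to be converted first.

```typescript
// Recursively drops every "tenant_id" key from plain JSON-like data.
// Assumption: input contains only JSON-compatible values, not class instances.
function stripTenantId(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => stripTenantId(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      if (key !== "tenant_id") out[key] = stripTenantId(child);
    }
    return out;
  }
  return value;
}
```

An explicit response mapper per type (listing only the fields each API type declares) is stricter than a blanket filter like this one, but the recursive version also covers the nested records mentioned above, such as timeline events and execution chains.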

MEDIUM

[M1] Rate Limiting on Production API Not Confirmed

File: src/api/middleware/rate-limit.ts
Issue: Rate limiting middleware exists but it is unclear whether it is enabled in production. The exposed endpoints (findings, entities, subgraph) could be used for resource exhaustion if not rate-limited.
Fix: Verify rate limiting is active on the production reverse proxy or enable the middleware.

Part 2: Data Quality Issues

HIGH

[DQ1] Posture Summary Path Count Does Not Match Authority Paths List

Issue: The posture summary endpoint reports active_paths: 29, dormant_paths: 3 (total: 32), but the authority-paths list endpoint returns total_count: 30 with default filters. Additionally, within the 30 listed paths, 25 have execution_30d > 0 and 5 have execution_30d == 0, which also does not match the posture summary's 29/3 split.
Tested:
- GET /api/v1/posture/summary -> {"active_paths": 29, "dormant_paths": 3}
- GET /api/v1/authority-paths?limit=50 -> {"meta": {"total_count": 30}}
- GET /api/v1/authority-paths?status=removed -> {"meta": {"total_count": 1}}
- Manual count of returned paths: 25 with execution, 5 without
Impact: The posture summary counts are likely computed differently than the list API. The PostureService may include removed paths or use a different dormancy definition than simple execution_30d == 0. This creates confusion in the UI where the dashboard number does not match the detail table.
Fix: Align the PostureService computation with the authority-path list definition, or document the difference. The posture summary should either (a) count only active-status paths, or (b) explain what additional paths are included.
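The arithmetic behind the mismatch can be reproduced with a small sketch. It assumes the simplest possible definition (a non-removed path is dormant when execution_30d is 0, active otherwise); the actual PostureService definition is not known from this audit, and PathRow and countPaths are hypothetical names.

```typescript
interface PathRow {
  status: "active" | "removed";
  execution_30d: number;
}

interface PathCounts {
  active_paths: number;
  dormant_paths: number;
}

// Counts paths the way the list endpoint's data suggests:
// removed paths are excluded, dormancy means no execution in 30 days.
function countPaths(paths: PathRow[]): PathCounts {
  const live = paths.filter((p) => p.status !== "removed");
  const dormant = live.filter((p) => p.execution_30d === 0).length;
  return { active_paths: live.length - dormant, dormant_paths: dormant };
}
```

Applied to the tested data (25 listed paths with execution, 5 without, 1 removed), this yields 25 active and 5 dormant, not 29 and 3. Note also that the summary total of 32 exceeds the 31 paths obtained by adding the removed path to the 30 listed ones, so including removed paths does not by itself explain the gap.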

[DQ2] Findings bySeverity/byType Reflects Current Page, Not Total Dataset

Issue: The meta.bySeverity and meta.byType breakdowns in the findings list response are computed from the current page of results, not from the total dataset. When has_more: true, these breakdowns are misleading.
Tested:
- GET /api/v1/findings?source=evaluator&limit=3 -> total_count=205, bySeverity sum=3
- GET /api/v1/findings?source=evaluator&limit=200 -> total_count=205, bySeverity sum=200
File: src/api/routes/findings.ts:158-167 -- the loop that builds bySeverity and byType iterates over the normalized array after pagination truncation.
Impact: The UI severity breakdown pie chart / badges will only reflect the visible page. If a user loads 50 findings sorted by date, the severity breakdown may show 0 critical even though there are critical findings on later pages.
Fix: Compute bySeverity and byType before pagination truncation (from the full result set), or use a separate aggregation query. Alternatively, add a note in the API docs that these reflect the current page.
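The first option can be sketched as follows: build the breakdowns from the full filtered result set, then truncate. The Finding and Page shapes and the function names are simplified assumptions, not the types in findings.ts.

```typescript
interface Finding {
  id: string;
  severity: string;
  type: string;
}

interface Page {
  data: Finding[];
  meta: {
    total_count: number;
    has_more: boolean;
    bySeverity: Record<string, number>;
    byType: Record<string, number>;
  };
}

function tally(items: Finding[], key: "severity" | "type"): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    counts[item[key]] = (counts[item[key]] ?? 0) + 1;
  }
  return counts;
}

function buildPage(all: Finding[], limit: number): Page {
  // Breakdowns are computed BEFORE truncation so they describe the whole result set.
  const bySeverity = tally(all, "severity");
  const byType = tally(all, "type");
  const data = all.slice(0, limit);
  return {
    data,
    meta: { total_count: all.length, has_more: all.length > limit, bySeverity, byType },
  };
}
```

With this ordering, the bySeverity counts always sum to total_count regardless of limit. For large tenants, a separate aggregation query avoids loading every finding into memory just to count it.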

[DQ3] Evidence Pack scope_drift_detail Has Inconsistent added_roles

Issue: For finding eval:080ff3bc4c6fbce622fe9b94f23d2da2, the evidence pack shows baseline_role_count: 0, current_role_count: 2, but added_roles: [] (empty). The finding's evidence_refs correctly lists new_roles: ["ap_write", "ar_write"], but the scope_drift_detail section does not populate its added_roles array. Furthermore, the remediation text says "Review 0 role assignment(s) added since baseline" which is factually wrong -- 2 roles were added.
Tested: GET /api/v1/findings/eval:080ff3bc4c6fbce622fe9b94f23d2da2/evidence-pack
- scope_drift_detail.added_roles = []
- scope_drift_detail.blast_radius_before = scope_drift_detail.blast_radius_after (identical, suggesting no expansion computed)
- scope_drift_detail.new_resources = []
Impact: The evidence pack provides incorrect forensic data. An auditor reading the evidence pack would see "0 roles added" despite the finding text saying "gained 'ap_write', 'ar_write'". This undermines the "evidence-grade" design constraint.
Fix: The evidence pack assembly for scope_drift_detail needs to populate added_roles from evidence_refs.new_roles and compute before/after blast radius correctly.
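A minimal sketch of the intended derivation, so that the role list, the counts, and the remediation text cannot disagree. The function names and the reduced ScopeDriftDetail shape are hypothetical; blast radius computation is not covered.

```typescript
interface ScopeDriftDetail {
  baseline_role_count: number;
  current_role_count: number;
  added_roles: string[];
  remediation: string;
}

// Roles present now that were absent at baseline.
function diffRoles(baseline: string[], current: string[]): string[] {
  const before = new Set(baseline);
  return current.filter((role) => !before.has(role));
}

function buildScopeDriftDetail(baseline: string[], current: string[]): ScopeDriftDetail {
  const added_roles = diffRoles(baseline, current);
  return {
    baseline_role_count: baseline.length,
    current_role_count: current.length,
    added_roles,
    // The count in the text is derived from the same array it describes.
    remediation: `Review ${added_roles.length} role assignment(s) added since baseline`,
  };
}
```

For the finding above (empty baseline, current roles "ap_write" and "ar_write"), this produces two added roles and a remediation text that says 2 rather than 0.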

MEDIUM

[DQ4] Exposure Detail: Identity Node Labels Show Raw IDs Instead of Display Names

Issue: In the GET /api/v1/exposures/{id} response, authority path nodes of type "identity" have their label set to the raw entity ID instead of the identity's display name. For example, identity 7b9b2ccee67941b6448175ea appears as label "7b9b2ccee67941b6448175ea" instead of "svc-foundry-ascribe-prod".
File: src/api/routes/exposures.ts:286 -- the identity node is built with label: p.via_identity where p.via_identity is just the entity ID string, not the display name.
Tested: GET /api/v1/exposures/f6787c51af06a78527269d57 -- all 5 authority path identity nodes show raw ID as label.
Impact: The UI graph visualization shows raw hex IDs for identity nodes instead of human-readable names. This is a user-facing display issue that makes the authority paths harder to understand.
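Fix: In exposures.ts:286, resolve p.via_identity to a display name by looking it up among the identity entities already fetched for the identities array (the identityMap). For example, change label: p.via_identity to label: identityEntities.find(e => String(e._id) === p.via_identity)?.properties.display_name ?? p.via_identity, so the raw ID remains the fallback when no display name is found. The String() conversion keeps the comparison valid whether _id is stored as a string or as an ObjectId.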
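The lookup with fallback can be isolated into a small helper, sketched here with a Map keyed by string ID. IdentityEntity and identityLabel are illustrative names; the real entity type and the exact structure of the fetched identities may differ.

```typescript
interface IdentityEntity {
  _id: string;
  properties: { display_name?: string };
}

// Prefer the identity's display name; fall back to the raw ID when it is unknown.
function identityLabel(viaIdentity: string, identities: Map<string, IdentityEntity>): string {
  return identities.get(viaIdentity)?.properties.display_name ?? viaIdentity;
}
```

A Map gives constant-time lookups per node, which matters little for 5 paths but avoids a repeated linear scan when an exposure has many authority paths.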

[DQ5] Execution Evidence Records Have Empty target_resource and payload_hash

Issue: All execution evidence records for entity f6787c51af06a78527269d57 have empty strings for target_resource and payload_hash fields. The ExecutionEvidenceRecord type in api-types.ts defines target_resource: string and payload_hash: string as required fields.
Tested: GET /api/v1/entities/f6787c51af06a78527269d57/execution-evidence -- all 70 records have target_resource: "" and payload_hash: "".
Impact: The execution evidence lacks specificity about which resource was accessed. For the "evidence-grade" design constraint, knowing the target of each API call is critical for proving whether a specific execution path was exercised. Without target_resource, the evidence can only prove that execution occurred, not what was accessed.
Fix: Populate target_resource during the connector transform or seed script. If the source data does not contain this information, update evidence_completeness.notes to indicate this limitation.
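If the source data cannot supply these fields, the limitation can at least be reported rather than hidden behind empty strings. The sketch below counts blank fields and produces note strings of the kind evidence_completeness.notes could carry; the function name, the report shape, and the note wording are assumptions.

```typescript
interface ExecutionEvidenceRecord {
  target_resource: string;
  payload_hash: string;
}

interface CompletenessReport {
  total: number;
  missing_target_resource: number;
  missing_payload_hash: number;
  notes: string[];
}

function assessCompleteness(records: ExecutionEvidenceRecord[]): CompletenessReport {
  const isBlank = (value: string): boolean => value.trim() === "";
  const missingTarget = records.filter((r) => isBlank(r.target_resource)).length;
  const missingHash = records.filter((r) => isBlank(r.payload_hash)).length;
  const notes: string[] = [];
  if (missingTarget > 0) {
    notes.push(`target_resource unavailable for ${missingTarget} of ${records.length} records`);
  }
  if (missingHash > 0) {
    notes.push(`payload_hash unavailable for ${missingHash} of ${records.length} records`);
  }
  return {
    total: records.length,
    missing_target_resource: missingTarget,
    missing_payload_hash: missingHash,
    notes,
  };
}
```

Run against the 70 records tested above, such a check would flag all 70 for both fields.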

[DQ6] Remediation applies_to Uses Generic Labels Instead of Object Names

Issue: Path-level remediation actions use generic applies_to labels like "LLM endpoint access", "egress path", "execution path" instead of specific object names. The resolveAppliesTo() function in remediation-service.ts:576-607 can resolve names, but the authority-paths route handler at src/api/routes/authority-paths.ts:108 does not pass entityContext to generatePathRemediation().
Tested: GET /api/v1/authority-paths/0a3a4bb896821dc3813c9608/remediation
- applies_to: "LLM endpoint access" instead of "Egress: Agent Ascribe_Summarizer"
- applies_to: "egress path" instead of "Egress: Agent Ascribe_Summarizer"
- applies_to: "execution path" instead of "Path: Agent Ascribe_Summarizer -> Billing_Payment_Methods"
Note: The cluster remediation endpoint does pass entityContext and produces enriched labels like "Egress: Agent Ascribe_Summarizer". The path-level endpoint is inconsistent.
File: src/api/routes/authority-paths.ts:108 -- generatePathRemediation(path, findings) missing entityContext parameter.

Fix: In the authority-paths remediation route, resolve the workload, identity, and destination entities and pass them as RemediationEntityContext:

// Look up the entities at both ends of the path so their display names are available.
const workload = await storageAdapter.getEntity(tenantId, path.workload_id);
const dest = await storageAdapter.getEntity(tenantId, path.destination_id);
// A missing entity or a null display name becomes undefined.
const entityContext = {
  workload_name: workload?.properties.display_name ?? undefined,
  destination_name: dest?.properties.display_name ?? undefined,
  // The identity can be resolved the same way if RemediationEntityContext has a field for it.
};
const guidance = generatePathRemediation(path, findings, entityContext);

Part 3: Cross-Endpoint Data Consistency

Entity Consistency (5 entities checked)

Entity	Entity Detail	Exposure	Blast Radius	Findings	Consistent?
Agent Ascribe_Summarizer (`f678...`)	workload, orphaned, 5 paths	orphaned, bound, 5 paths	5 paths, customer domain	30 findings	Yes
Invoice Processing Rule (`c32d...`)	workload, owned, 3 paths	exists in exposures	3 paths, finance domain	referenced	Yes
AI Assist Flow (`b9fb...`)	workload, ambiguous	owned (exposure), unbound	2 paths	6+ findings	Inconsistent
Compliance Audit Exporter (`faf2...`)	workload, owned	exists in exposures	exists	referenced	Yes
Security Log Collector (`da76...`)	workload, owned	exists in exposures	exists	referenced	Yes

AI Assist Flow Inconsistency: The entity detail shows ownership_status: "ambiguous" in properties and has group ownership (OWNED_BY relationship), but the exposure summary reports ownership_status: "owned". This is because the exposure route's deriveOwnershipStatus() returns "owned" when an OWNED_BY relationship exists, while the entity stores the evaluator's assessment of "ambiguous" (group ownership only). These represent different semantics but should be aligned.

Finding Consistency (3 findings checked)

Finding	Detail	Evidence Pack	Remediation	Consistent?
`eval:080ff3...` (scope_drift)	entity_id matches entity	Pack exists, sealed, SHA256 hash present	3 actions returned	Partially (see DQ3 - added_roles empty)
`eval:0b74c7...` (orphaned_ownership)	entity_id matches entity	Pack exists	refs valid entities	Yes
`eval:c444b0...` (scope_drift)	entity_id matches entity	Pack exists	valid evidence refs	Yes

Authority Path Role Analysis (Sprint Issue Check)

Question: Does the API return ALL roles for a path or just one?
Answer: The API correctly returns all roles. Path 0a3a4bb896821dc3813c9608 has via_roles: ["sql_clinical_reader", "sql_admin_reader"] (2 roles) and via_role_ids: ["49d60495127bd9136a23f723", "d7a0accb85f0b56922d12c07"] (2 role IDs). Across all 30 authority paths, multiple paths show 2+ roles:
- Agent Ascribe_Summarizer -> Billing_Payment_Methods: 2 roles
- Invoice Processing Rule -> AP/AR Ledger: 2 roles
- Invoice Processing Rule -> Financial Records API: 2 roles
Conclusion: The "1 role when there are 4" sprint issue appears to be resolved in the current API: the paths inspected return multiple roles in via_roles and via_role_ids rather than a single one.

Graph Subgraph Endpoint

Note: The task specification listed the parameter as entityId but the actual API uses seed_id. This is a documentation/spec discrepancy, not a bug.
Tested: GET /api/v1/graph/subgraph?seed_id=f6787c51af06a78527269d57 returns 5 nodes, 4 edges, truncated: false.

Part 4: Code-Level Audit

Files Audited

File	Lines	Notes
`src/api/routes/authority-paths.ts`	134	Missing entityContext in remediation (DQ6)
`src/api/routes/findings.ts`	335	bySeverity page-scoped (DQ2)
`src/api/routes/entities.ts`	190	Exposes tenant_id (H3)
`src/api/routes/exposures.ts`	376	Identity label bug (DQ4)
`src/api/routes/posture.ts`	49	Clean
`src/api/routes/risk-clusters.ts`	74	Clean
`src/api/routes/chains.ts`	71	Exposes tenant_id (H3); otherwise clean
`src/api/routes/graph.ts`	57	Good input validation
`src/api/routes/syncs.ts`	43	Clean
`src/api/routes/evidence.ts`	108	Good cursor validation
`src/api/routes/system.ts`	73	Diagnostics info leak (H2)
`src/api/routes/paths.ts`	202	Good BFS depth limit
`src/api/routes/ingest.ts`	263	Good Zod validation
`src/api/middleware/auth.ts`	153	JWT not verified (C1)
`src/api/middleware/require-tenant.ts`	20	Clean
`src/api/middleware/tenant-context.ts`	31	Clean
`src/services/remediation-service.ts`	665	Comprehensive, deterministic
`src/services/risk-cluster-service.ts`	800	Good cap limits, no unbounded queries

Verified OK

Tenant Isolation in DB Queries: All route handlers use req.tenantId! which comes from the auth/tenant middleware chain. Every storageAdapter call passes tenantId as the first argument. No cross-tenant query paths found.
Read-Only Invariant: No outbound HTTP calls to source systems found in any route handler or service. Ingestion is receive-only via POST endpoints.
Input Validation: The ingest.ts route uses comprehensive Zod schemas for both NormalizedGraph and ConnectorReport payloads. The graph route validates mode, depth, and limit with bounds. The evidence route validates cursor format including ObjectId regex and date parsing.
Determinism: No Math.random() found in evaluation logic. The remediation service uses pure functions with computeCategoryRank() for deterministic ranking.
Finding Status Transitions: VALID_STATUS_TRANSITIONS is imported and enforced in the PATCH endpoint. Required fields (reason for false_positive, resolved_by for remediated) are validated.
Tenant ID in Evidence Pack Integrity: Per the security-auditor agent spec, evidence packs should include tenant_id in the SHA256 hash. The pack for eval:080ff3bc has integrity_hash: "sha256:fe7a78d98636...", so a hash is present. Whether tenant_id is part of the hashed input was not verified; that requires reading the evidence assembly code.
No Secret Leakage in API Responses: No API keys, passwords, connection strings, or internal credentials found in any tested response.
Error Responses: All error responses use a consistent {error: {code, message, status}} structure. No stack traces or internal details leaked.
Pagination Safety: All list endpoints use Math.min() to cap limits (typically 200) and use cursor-based pagination. The exposure and risk-cluster endpoints use internal caps (FINDINGS_CAP=5000, PATHS_CAP=5000) to prevent OOM.
BFS Depth Limit: The cross-system-paths BFS in paths.ts:184 is limited to 5 hops to prevent runaway graph traversal.
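For reference, a hop-limited breadth-first traversal of the kind described in the last item can be sketched as below. This is a generic illustration over an adjacency map, not the code in paths.ts; Graph, MAX_HOPS, and reachableWithin are hypothetical names.

```typescript
type Graph = Map<string, string[]>;

const MAX_HOPS = 5; // mirrors the 5-hop limit noted above

// Returns each reachable node with its hop distance from the seed, up to maxHops.
function reachableWithin(graph: Graph, seed: string, maxHops: number = MAX_HOPS): Map<string, number> {
  const distance = new Map<string, number>([[seed, 0]]);
  let frontier: string[] = [seed];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of graph.get(node) ?? []) {
        if (!distance.has(neighbor)) {
          // First visit is the shortest hop count; revisits are skipped, so cycles terminate.
          distance.set(neighbor, hop);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return distance;
}
```

The visited check bounds work on cyclic graphs, and the hop cap bounds it on long chains; both are needed for a traversal to be safe on arbitrary tenant graphs.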

Endpoint Coverage Matrix

Endpoint	Status	HTTP	JSON	Types Match	Notes
`GET /health`	Tested	200	Yes	Yes	No auth required
`GET /ready`	Tested	200	Yes	Yes	No auth required
`GET /api/v1/posture/summary`	Tested	200	Yes	Yes	Path count discrepancy (DQ1)
`GET /api/v1/posture/risk-clusters`	Tested	200	Yes	Yes
`GET /api/v1/findings`	Tested	200	Yes	Yes	bySeverity issue (DQ2)
`GET /api/v1/findings?severity=critical`	Tested	200	Yes	Yes
`GET /api/v1/findings/{id}`	Tested	200	Yes	Yes
`GET /api/v1/findings/{id}/evidence-pack`	Tested	200	Yes	Yes	scope_drift_detail issue (DQ3)
`GET /api/v1/authority-paths`	Tested	200	Yes	Yes
`GET /api/v1/authority-paths/{id}`	Tested	200	Yes	Yes	Roles complete
`GET /api/v1/authority-paths/{id}/findings`	Tested	200	Yes	Yes
`GET /api/v1/authority-paths/{id}/remediation`	Tested	200	Yes	Yes	Generic applies_to (DQ6)
`GET /api/v1/entities`	Tested	200	Yes	Yes
`GET /api/v1/entities/{id}`	Tested	200	Yes	Yes	Exposes tenant_id (H3)
`GET /api/v1/entities/{id}/timeline`	Tested	200	Yes	Yes
`GET /api/v1/entities/{id}/cross-system-paths`	Tested	200	Yes	Yes
`GET /api/v1/entities/{id}/blast-radius`	Tested	200	Yes	Yes
`GET /api/v1/entities/{id}/execution-evidence`	Tested	200	Yes	Yes	Empty fields (DQ5)
`GET /api/v1/execution-chains`	Tested	200	Yes	Yes
`GET /api/v1/execution-chains/{id}`	Tested	200	Yes	Yes	Exposes tenant_id
`GET /api/v1/execution-chains/{id}/entities`	Tested	200	Yes	Yes
`GET /api/v1/exposures`	Tested	200	Yes	Yes
`GET /api/v1/exposures/{id}`	Tested	200	Yes	Yes	Identity label bug (DQ4)
`GET /api/v1/risk-clusters/{key}/remediation`	Tested	200	Yes	Yes	Has entity names
`GET /api/v1/risk-clusters/{key}/authority-paths`	Tested	200	Yes	Yes
`GET /api/v1/graph/subgraph?seed_id={id}`	Tested	200	Yes	Yes	Note: param is `seed_id` not `entityId`
`GET /api/v1/syncs`	Tested	200	Yes	Yes

Edge Cases Tested

Test	Expected	Actual	Pass?
No X-Tenant-Id header	400 or 401	200 (empty data)	FAIL (H1)
Empty X-Tenant-Id header	400	200 (empty data)	FAIL (H1)
Different tenant (other-tenant)	200 empty	200 empty	PASS (isolation works)
Invalid finding ID	404	404	PASS
Missing seed_id for subgraph	400	400	PASS
Invalid subgraph mode	400	400	PASS

Recommendations (Priority Order)

[CRITICAL] Implement JWT signature verification before enabling Bearer auth in production
[HIGH] Enable REQUIRE_AUTH=true in production and ensure tenant header enforcement returns 400/401
[HIGH] Fix evidence pack scope_drift_detail to populate added_roles and compute blast radius deltas
[MEDIUM] Fix exposure detail identity node labels to show display names
[MEDIUM] Fix path-level remediation to pass entity context for specific applies_to labels
[MEDIUM] Compute bySeverity/byType from full dataset or add server-side aggregation query
[MEDIUM] Investigate and align posture summary path counts with authority-paths list
[LOW] Strip tenant_id from API responses where not needed by the client
[LOW] Populate target_resource and payload_hash in execution evidence records
