API Data Quality & Security Audit
Summary
- Endpoints tested: 26 (all listed endpoints + edge cases)
- Findings: 1 CRITICAL, 3 HIGH, 5 MEDIUM, 4 INFO
- Data quality issues: 6
- All endpoints return valid JSON: Yes
- Response structures match TypeScript types: Mostly (exceptions noted)
Part 1: Security Findings
CRITICAL
[C1] Auth Middleware Does Not Verify JWT Signatures
- File: `src/api/middleware/auth.ts:116-136`
- Issue: The `decodeJwtPayload()` function decodes JWT payloads without verifying signatures. The code explicitly acknowledges this with a `WARNING` comment. Any client can craft a JWT with arbitrary `sub`, `tid`, and `scopes` claims and the server will trust it.
- Impact: An attacker can impersonate any user, claim any tenant ID, and gain any scope by constructing a fake JWT. This bypasses all authorization if `REQUIRE_AUTH=true` and Bearer tokens are used.
- Fix: Implement proper JWT verification using the `jose` library with JWKS endpoint validation from the IdP, as the comment already suggests for Phase 2. Until then, this is a critical vulnerability if Bearer auth is enabled in production.
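To illustrate the difference between decoding and verifying, here is a minimal sketch of signature checking. It is not the recommended `jose`/JWKS implementation: it swaps in an HS256 shared-secret check built on Node's `node:crypto` so the example runs without dependencies. The names `verifyHs256`, `signHs256`, and `Claims` are hypothetical and do not come from `auth.ts`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical claim shape, mirroring the sub/tid/scopes claims named in C1.
interface Claims { sub?: string; tid?: string; scopes?: string[]; exp?: number }

// Returns the claims only if the signature matches; returns null otherwise.
function verifyHs256(token: string, secret: string): Claims | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;

  // Pin the expected algorithm so tokens declaring "alg": "none" are rejected.
  let alg: unknown;
  try {
    alg = JSON.parse(Buffer.from(header, "base64url").toString("utf8")).alg;
  } catch {
    return null;
  }
  if (alg !== "HS256") return null;

  // Recompute the MAC over "header.payload" and compare in constant time.
  const expected = createHmac("sha256", secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;

  try {
    const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Claims;
    // Reject tokens whose exp (seconds since epoch) has passed.
    if (typeof claims.exp === "number" && claims.exp * 1000 <= Date.now()) return null;
    return claims;
  } catch {
    return null;
  }
}

// Signing helper, used here only to build example tokens.
function signHs256(claims: Claims, secret: string): string {
  const enc = (v: object) => Buffer.from(JSON.stringify(v)).toString("base64url");
  const body = `${enc({ alg: "HS256", typ: "JWT" })}.${enc(claims)}`;
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}
```

A token whose payload is swapped after signing fails the comparison and yields `null`, which is the property the current decode-only path lacks. A production fix would still need the IdP's public keys, plus issuer and audience checks.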
HIGH
[H1] Missing Tenant Header Returns 200 Instead of 400/401
- File: `src/api/middleware/auth.ts:29-35`
- Issue: When `REQUIRE_AUTH=false` (current production configuration), requests without an `X-Tenant-Id` header are accepted and default to `"dev-tenant"`. The API returns HTTP 200 with empty data arrays instead of rejecting the request. This means unauthenticated requests succeed silently.
- Impact: No data leaks because the `dev-tenant` has no data, but the API accepts requests without any tenant context and returns a valid response shape. Clients cannot distinguish between "no tenant header" and "tenant exists but has no data." If `dev-tenant` ever gets data seeded, it would be exposed to anyone.
- Tested:
  - `GET /api/v1/findings` (no header) -> 200, empty data
  - `GET /api/v1/findings` (empty header) -> 200, empty data
  - `GET /api/v1/findings -H "X-Tenant-Id: other-tenant"` -> 200, empty data
- Fix: In production, set `REQUIRE_AUTH=true` and ensure the `requireTenant` middleware (`src/api/middleware/require-tenant.ts`) is applied before all API routes. Currently it exists but its enforcement depends on the auth middleware running first.
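A minimal sketch of the rejection behavior the fix asks for, assuming an Express-style `(req, res, next)` middleware signature. The `Req` and `Res` interfaces, the function name `requireTenantHeader`, and the error code `TENANT_REQUIRED` are hypothetical and are not taken from `require-tenant.ts`. The error body follows the `{error: {code, message, status}}` shape this audit observed elsewhere.

```typescript
// Minimal stand-ins for the framework's request/response types (assumed shapes).
interface Req { headers: Record<string, string | undefined>; tenantId?: string }
interface Res { status(code: number): Res; json(body: unknown): void }

// Rejects requests with a missing or blank X-Tenant-Id instead of defaulting to a dev tenant.
function requireTenantHeader(req: Req, res: Res, next: () => void): void {
  const tenantId = req.headers["x-tenant-id"]?.trim();
  if (!tenantId) {
    res.status(400).json({
      error: { code: "TENANT_REQUIRED", message: "X-Tenant-Id header is required", status: 400 },
    });
    return; // Do not call next(): the route handler must not run without a tenant.
  }
  req.tenantId = tenantId;
  next();
}
```

Treating an empty or whitespace-only header the same as a missing one covers both failing edge cases listed under Tested.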
[H2] Diagnostics and Metrics Endpoints Served by UI SPA Catch-All
- File: `src/api/routes/system.ts:53-69`
- Issue: The `/diagnostics` and `/metrics` endpoints are defined in the backend but on the production domain (app.securityv0.com) they return the UI SPA HTML instead. This indicates the reverse proxy is routing `/diagnostics` and `/metrics` to the UI static server rather than the API. The `/diagnostics` endpoint exposes `process.memoryUsage()`, `process.uptime()`, `process.version`, and worker queue details.
- Impact: If the proxy routing is corrected, `/diagnostics` would leak internal server details (memory, uptime, Node version, worker status) to any request with a tenant header. The endpoint only checks for `x-tenant-id` presence, not validity.
- Fix: Either (1) remove the `/diagnostics` endpoint in production, (2) require proper authentication, or (3) ensure the reverse proxy blocks it. The `/metrics` endpoint should only be accessible to internal monitoring.
[H3] Entity tenant_id Exposed in API Responses
- File: `src/api/routes/entities.ts:70-80`, `src/api/routes/exposures.ts:333-371`
- Issue: Entity detail and exposure detail responses include the raw `tenant_id` field from the database document. While this is not a cross-tenant leak (the data is correct), including `tenant_id` in API responses exposes internal infrastructure details to clients.
- Tested: `GET /api/v1/entities/{id}` -> response includes `"tenant_id": "demo-w1"` in the entity object. Also visible in: authority path findings, execution evidence records, timeline events, and execution chains.
- Impact: Low risk since the client already sends the tenant header, but it reveals the internal tenant naming convention and could be used for targeted attacks on other tenants.
- Fix: Strip `tenant_id` from API response serialization. The `NormalizedFinding` type in `api-types.ts` does not include `tenant_id`, but the raw `EntityDoc` type does -- the entity routes return raw MongoDB documents without filtering.
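One way to strip the field at the serialization boundary is a small generic helper, sketched below. The name `stripTenantId` is hypothetical, and the helper only removes a top-level `tenant_id`; nested documents (for example, embedded findings or evidence records) would need the same treatment applied to each one.

```typescript
// Returns a shallow copy of the document without its top-level tenant_id field.
function stripTenantId<T extends { tenant_id?: unknown }>(doc: T): Omit<T, "tenant_id"> {
  const { tenant_id: _omitted, ...rest } = doc;
  return rest;
}

// Usage with an illustrative document shape (not the real EntityDoc type).
const rawEntity = { _id: "f6787c51af06a78527269d57", tenant_id: "demo-w1", type: "workload" };
const publicEntity = stripTenantId(rawEntity);
```

Because the helper copies rather than mutates, the original document still carries `tenant_id` for internal use such as tenant-scoped queries.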
MEDIUM
[M1] No Rate Limiting on Production API
- File: `src/api/middleware/rate-limit.ts`
- Issue: Rate limiting middleware exists but it is unclear whether it is enabled in production. The exposed endpoints (findings, entities, subgraph) could be used for resource exhaustion if not rate-limited.
- Fix: Verify rate limiting is active on the production reverse proxy or enable the middleware.
Part 2: Data Quality Issues
HIGH
[DQ1] Posture Summary Path Count Does Not Match Authority Paths List
- Issue: The posture summary endpoint reports `active_paths: 29, dormant_paths: 3` (total: 32), but the authority-paths list endpoint returns `total_count: 30` with default filters. Additionally, within the 30 listed paths, 25 have `execution_30d > 0` and 5 have `execution_30d == 0`, which also does not match the posture summary's 29/3 split.
- Tested:
  - `GET /api/v1/posture/summary` -> `{"active_paths": 29, "dormant_paths": 3}`
  - `GET /api/v1/authority-paths?limit=50` -> `{"meta": {"total_count": 30}}`
  - `GET /api/v1/authority-paths?status=removed` -> `{"meta": {"total_count": 1}}`
  - Manual count of returned paths: 25 with execution, 5 without
- Impact: The posture summary counts are likely computed differently than the list API. The PostureService may include removed paths or use a different dormancy definition than simple `execution_30d == 0`. This creates confusion in the UI where the dashboard number does not match the detail table.
- Fix: Align the PostureService computation with the authority-path list definition, or document the difference. The posture summary should either (a) count only active-status paths, or (b) explain what additional paths are included.
[DQ2] Findings bySeverity/byType Reflects Current Page, Not Total Dataset
- Issue: The `meta.bySeverity` and `meta.byType` breakdowns in the findings list response are computed from the current page of results, not from the total dataset. When `has_more: true`, these breakdowns are misleading.
- Tested:
  - `GET /api/v1/findings?source=evaluator&limit=3` -> total_count=205, bySeverity sum=3
  - `GET /api/v1/findings?source=evaluator&limit=200` -> total_count=205, bySeverity sum=200
- File: `src/api/routes/findings.ts:158-167` -- the loop that builds `bySeverity` and `byType` iterates over the `normalized` array after pagination truncation.
- Impact: The UI severity breakdown pie chart / badges will only reflect the visible page. If a user loads 50 findings sorted by date, the severity breakdown may show 0 critical even though there are critical findings on later pages.
- Fix: Compute `bySeverity` and `byType` before pagination truncation (from the full result set), or use a separate aggregation query. Alternatively, add a note in the API docs that these reflect the current page.
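The ordering change can be sketched as follows: aggregate over the full filtered result set first, then truncate to the page. The `Finding` shape and the `countBy` and `buildFindingsPage` names are hypothetical. The real handler in `findings.ts` uses cursor-based pagination, which this simplified version replaces with a plain slice.

```typescript
// Illustrative finding shape; the real NormalizedFinding has more fields.
interface Finding { id: string; severity: string; type: string }

// Counts items per key, e.g. findings per severity.
function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

function buildFindingsPage(all: Finding[], limit: number) {
  // Aggregate BEFORE truncation so the breakdowns describe the whole dataset.
  const bySeverity = countBy(all, (f) => f.severity);
  const byType = countBy(all, (f) => f.type);
  const data = all.slice(0, limit);
  return {
    data,
    meta: { total_count: all.length, has_more: all.length > limit, bySeverity, byType },
  };
}
```

With this ordering, the sum of `bySeverity` values equals `total_count` regardless of `limit`. For large result sets, a database-side aggregation would avoid loading every finding into memory.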
[DQ3] Evidence Pack scope_drift_detail Has Inconsistent added_roles
- Issue: For finding `eval:080ff3bc4c6fbce622fe9b94f23d2da2`, the evidence pack shows `baseline_role_count: 0`, `current_role_count: 2`, but `added_roles: []` (empty). The finding's `evidence_refs` correctly lists `new_roles: ["ap_write", "ar_write"]`, but the scope_drift_detail section does not populate its `added_roles` array. Furthermore, the remediation text says "Review 0 role assignment(s) added since baseline" which is factually wrong -- 2 roles were added.
- Tested: `GET /api/v1/findings/eval:080ff3bc4c6fbce622fe9b94f23d2da2/evidence-pack`
  - `scope_drift_detail.added_roles` = `[]`
  - `scope_drift_detail.blast_radius_before` = `scope_drift_detail.blast_radius_after` (identical, suggesting no expansion computed)
  - `scope_drift_detail.new_resources` = `[]`
- Impact: The evidence pack provides incorrect forensic data. An auditor reading the evidence pack would see "0 roles added" despite the finding text saying "gained 'ap_write', 'ar_write'". This undermines the "evidence-grade" design constraint.
- Fix: The evidence pack assembly for scope_drift_detail needs to populate `added_roles` from `evidence_refs.new_roles` and compute before/after blast radius correctly.
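A sketch of the `added_roles` half of the fix, assuming the role names are available as a string array from `evidence_refs.new_roles`. The function name `describeAddedRoles` is hypothetical, and the blast radius computation is not covered because its inputs are not visible from the API responses alone.

```typescript
// Derives added_roles and the matching remediation sentence from the same source,
// so the count in the text cannot drift from the array contents.
function describeAddedRoles(newRoles: string[]): { added_roles: string[]; remediation: string } {
  const added_roles = [...new Set(newRoles)]; // de-duplicate, keep first-seen order
  return {
    added_roles,
    remediation: `Review ${added_roles.length} role assignment(s) added since baseline`,
  };
}

const drift = describeAddedRoles(["ap_write", "ar_write"]);
```

Building the remediation text from the populated array, rather than from a separately computed count, removes the "Review 0 role assignment(s)" mismatch observed for this finding.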
MEDIUM
[DQ4] Exposure Detail: Identity Node Labels Show Raw IDs Instead of Display Names
- Issue: In the `GET /api/v1/exposures/{id}` response, authority path nodes of type `"identity"` have their `label` set to the raw entity ID instead of the identity's display name. For example, identity `7b9b2ccee67941b6448175ea` appears as label `"7b9b2ccee67941b6448175ea"` instead of `"svc-foundry-ascribe-prod"`.
- File: `src/api/routes/exposures.ts:286` -- the identity node is built with `label: p.via_identity` where `p.via_identity` is just the entity ID string, not the display name.
- Tested: `GET /api/v1/exposures/f6787c51af06a78527269d57` -- all 5 authority path identity nodes show raw ID as label.
- Impact: The UI graph visualization shows raw hex IDs for identity nodes instead of human-readable names. This is a user-facing display issue that makes the authority paths harder to understand.
- Fix: In `exposures.ts:286`, resolve `p.via_identity` to a display name by looking up the identity entity from the `identityMap` (already fetched for the `identities` array). Change `label: p.via_identity` to `label: identityEntities.find(e => e._id === p.via_identity)?.properties.display_name ?? p.via_identity`.
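The lookup-with-fallback can be sketched in isolation as below. It assumes the identities are held in a `Map` keyed by entity ID; the `IdentityEntity` shape and the `identityLabel` helper are hypothetical, and the actual variable in `exposures.ts` may be an array rather than a map.

```typescript
// Illustrative entity shape; only the fields needed for labeling are modeled.
interface IdentityEntity { _id: string; properties: { display_name?: string } }

// Resolves an identity ID to its display name, falling back to the raw ID
// when the entity is unknown or has no display name.
function identityLabel(viaIdentity: string, identityMap: Map<string, IdentityEntity>): string {
  return identityMap.get(viaIdentity)?.properties.display_name ?? viaIdentity;
}

const identities = new Map<string, IdentityEntity>([
  ["7b9b2ccee67941b6448175ea", {
    _id: "7b9b2ccee67941b6448175ea",
    properties: { display_name: "svc-foundry-ascribe-prod" },
  }],
]);
const label = identityLabel("7b9b2ccee67941b6448175ea", identities);
```

A map lookup keeps each label resolution constant-time, which matters when an exposure has many authority paths. The fallback preserves the current behavior for identities that cannot be resolved.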
[DQ5] Execution Evidence Records Have Empty target_resource and payload_hash
- Issue: All execution evidence records for entity `f6787c51af06a78527269d57` have empty strings for `target_resource` and `payload_hash` fields. The `ExecutionEvidenceRecord` type in `api-types.ts` defines `target_resource: string` and `payload_hash: string` as required fields.
- Tested: `GET /api/v1/entities/f6787c51af06a78527269d57/execution-evidence` -- all 70 records have `target_resource: ""` and `payload_hash: ""`.
- Impact: The execution evidence lacks specificity about which resource was accessed. For the "evidence-grade" design constraint, knowing the target of each API call is critical for proving whether a specific execution path was exercised. Without `target_resource`, the evidence can only prove that execution occurred, not what was accessed.
- Fix: Populate `target_resource` during the connector transform or seed script. If the source data does not contain this information, update `evidence_completeness.notes` to indicate this limitation.
[DQ6] Remediation applies_to Uses Generic Labels Instead of Object Names
- Issue: Path-level remediation actions use generic `applies_to` labels like "LLM endpoint access", "egress path", "execution path" instead of specific object names. The `resolveAppliesTo()` function in `remediation-service.ts:576-607` can resolve names, but the authority-paths route handler at `src/api/routes/authority-paths.ts:108` does not pass `entityContext` to `generatePathRemediation()`.
- Tested: `GET /api/v1/authority-paths/0a3a4bb896821dc3813c9608/remediation`
  - `applies_to: "LLM endpoint access"` instead of `"Egress: Agent Ascribe_Summarizer"`
  - `applies_to: "egress path"` instead of `"Egress: Agent Ascribe_Summarizer"`
  - `applies_to: "execution path"` instead of `"Path: Agent Ascribe_Summarizer -> Billing_Payment_Methods"`
- Note: The cluster remediation endpoint does pass `entityContext` and produces enriched labels like "Egress: Agent Ascribe_Summarizer". The path-level endpoint is inconsistent.
- File: `src/api/routes/authority-paths.ts:108` -- `generatePathRemediation(path, findings)` missing `entityContext` parameter.
- Fix: In the authority-paths remediation route, resolve the workload, identity, and destination entities and pass them as `RemediationEntityContext`:

```typescript
// Look up the entities at both ends of the path so labels can use their names.
const workload = await storageAdapter.getEntity(tenantId, path.workload_id);
const dest = await storageAdapter.getEntity(tenantId, path.destination_id);
const entityContext = {
  workload_name: workload?.properties.display_name ?? undefined,
  destination_name: dest?.properties.display_name ?? undefined,
};
// Pass the context as the third argument, matching the cluster remediation endpoint.
const guidance = generatePathRemediation(path, findings, entityContext);
```
Part 3: Cross-Endpoint Data Consistency
Entity Consistency (5 entities checked)
| Entity | Entity Detail | Exposure | Blast Radius | Findings | Consistent? |
|---|---|---|---|---|---|
| Agent Ascribe_Summarizer (f678...) | workload, orphaned, 5 paths | orphaned, bound, 5 paths | 5 paths, customer domain | 30 findings | Yes |
| Invoice Processing Rule (c32d...) | workload, owned, 3 paths | exists in exposures | 3 paths, finance domain | referenced | Yes |
| AI Assist Flow (b9fb...) | workload, ambiguous | owned (exposure), unbound | 2 paths | 6+ findings | Inconsistent |
| Compliance Audit Exporter (faf2...) | workload, owned | exists in exposures | exists | referenced | Yes |
| Security Log Collector (da76...) | workload, owned | exists in exposures | exists | referenced | Yes |
AI Assist Flow Inconsistency: The entity detail shows ownership_status: "ambiguous" in properties and has group ownership (OWNED_BY relationship), but the exposure summary reports ownership_status: "owned". This is because the exposure route's deriveOwnershipStatus() returns "owned" when an OWNED_BY relationship exists, while the entity stores the evaluator's assessment of "ambiguous" (group ownership only). These represent different semantics but should be aligned.
Finding Consistency (3 findings checked)
| Finding | Detail | Evidence Pack | Remediation | Consistent? |
|---|---|---|---|---|
| eval:080ff3... (scope_drift) | entity_id matches entity | Pack exists, sealed, SHA256 hash present | 3 actions returned | Partially (see DQ3 - added_roles empty) |
| eval:0b74c7... (orphaned_ownership) | entity_id matches entity | Pack exists | refs valid entities | Yes |
| eval:c444b0... (scope_drift) | entity_id matches entity | Pack exists | valid evidence refs | Yes |
Authority Path Role Analysis (Sprint Issue Check)
- Question: Does the API return ALL roles for a path or just one?
- Answer: The API correctly returns all roles. Path `0a3a4bb896821dc3813c9608` has `via_roles: ["sql_clinical_reader", "sql_admin_reader"]` (2 roles) and `via_role_ids: ["49d60495127bd9136a23f723", "d7a0accb85f0b56922d12c07"]` (2 role IDs). Across all 30 authority paths, multiple paths show 2+ roles:
  - Agent Ascribe_Summarizer -> Billing_Payment_Methods: 2 roles
  - Invoice Processing Rule -> AP/AR Ledger: 2 roles
  - Invoice Processing Rule -> Financial Records API: 2 roles
- Conclusion: The "1 role when there are 4" sprint issue appears to be resolved in the current API. All paths show their complete role sets.
Graph Subgraph Endpoint
- Note: The task specification listed the parameter as `entityId` but the actual API uses `seed_id`. This is a documentation/spec discrepancy, not a bug.
- Tested: `GET /api/v1/graph/subgraph?seed_id=f6787c51af06a78527269d57` returns 5 nodes, 4 edges, `truncated: false`.
Part 4: Code-Level Audit
Files Audited
| File | Lines | Notes |
|---|---|---|
| src/api/routes/authority-paths.ts | 134 | Missing entityContext in remediation (DQ6) |
| src/api/routes/findings.ts | 335 | bySeverity page-scoped (DQ2) |
| src/api/routes/entities.ts | 190 | Exposes tenant_id (H3) |
| src/api/routes/exposures.ts | 376 | Identity label bug (DQ4) |
| src/api/routes/posture.ts | 49 | Clean |
| src/api/routes/risk-clusters.ts | 74 | Clean |
| src/api/routes/chains.ts | 71 | Clean, exposes tenant_id |
| src/api/routes/graph.ts | 57 | Good input validation |
| src/api/routes/syncs.ts | 43 | Clean |
| src/api/routes/evidence.ts | 108 | Good cursor validation |
| src/api/routes/system.ts | 73 | Diagnostics info leak (H2) |
| src/api/routes/paths.ts | 202 | Good BFS depth limit |
| src/api/routes/ingest.ts | 263 | Good Zod validation |
| src/api/middleware/auth.ts | 153 | JWT not verified (C1) |
| src/api/middleware/require-tenant.ts | 20 | Clean |
| src/api/middleware/tenant-context.ts | 31 | Clean |
| src/services/remediation-service.ts | 665 | Comprehensive, deterministic |
| src/services/risk-cluster-service.ts | 800 | Good cap limits, no unbounded queries |
Verified OK
- Tenant Isolation in DB Queries: All route handlers use `req.tenantId!` which comes from the auth/tenant middleware chain. Every `storageAdapter` call passes `tenantId` as the first argument. No cross-tenant query paths found.
- Read-Only Invariant: No outbound HTTP calls to source systems found in any route handler or service. Ingestion is receive-only via POST endpoints.
- Input Validation: The `ingest.ts` route uses comprehensive Zod schemas for both NormalizedGraph and ConnectorReport payloads. The graph route validates `mode`, `depth`, and `limit` with bounds. The evidence route validates cursor format including ObjectId regex and date parsing.
- Determinism: No `Math.random()` found in evaluation logic. The remediation service uses pure functions with `computeCategoryRank()` for deterministic ranking.
- Finding Status Transitions: `VALID_STATUS_TRANSITIONS` is imported and enforced in the PATCH endpoint. Required fields (`reason` for false_positive, `resolved_by` for remediated) are validated.
- Tenant ID in Evidence Pack Integrity: Per the security-auditor agent spec, evidence packs should include `tenant_id` in the SHA256 hash. The pack for `eval:080ff3bc` has `integrity_hash: "sha256:fe7a78d98636..."` -- verified present. (Full verification of hash computation would require reading the evidence assembly code.)
- No Secret Leakage in API Responses: No API keys, passwords, connection strings, or internal credentials found in any tested response. Error messages are generic and do not leak stack traces or internal paths.
- Error Responses: All error responses use a consistent `{error: {code, message, status}}` structure. No stack traces or internal details leaked.
- Pagination Safety: All list endpoints use `Math.min()` to cap limits (typically 200) and use cursor-based pagination. The exposure and risk-cluster endpoints use internal caps (FINDINGS_CAP=5000, PATHS_CAP=5000) to prevent OOM.
- BFS Depth Limit: The cross-system-paths BFS in `paths.ts:184` is limited to 5 hops to prevent runaway graph traversal.
Endpoint Coverage Matrix
| Endpoint | Status | HTTP | JSON | Types Match | Notes |
|---|---|---|---|---|---|
| GET /health | Tested | 200 | Yes | Yes | No auth required |
| GET /ready | Tested | 200 | Yes | Yes | No auth required |
| GET /api/v1/posture/summary | Tested | 200 | Yes | Yes | Path count discrepancy (DQ1) |
| GET /api/v1/posture/risk-clusters | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings | Tested | 200 | Yes | Yes | bySeverity issue (DQ2) |
| GET /api/v1/findings?severity=critical | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings/{id} | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings/{id}/evidence-pack | Tested | 200 | Yes | Yes | scope_drift_detail issue (DQ3) |
| GET /api/v1/authority-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/authority-paths/{id} | Tested | 200 | Yes | Yes | Roles complete |
| GET /api/v1/authority-paths/{id}/findings | Tested | 200 | Yes | Yes | |
| GET /api/v1/authority-paths/{id}/remediation | Tested | 200 | Yes | Yes | Generic applies_to (DQ6) |
| GET /api/v1/entities | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id} | Tested | 200 | Yes | Yes | Exposes tenant_id (H3) |
| GET /api/v1/entities/{id}/timeline | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/cross-system-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/blast-radius | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/execution-evidence | Tested | 200 | Yes | Yes | Empty fields (DQ5) |
| GET /api/v1/execution-chains | Tested | 200 | Yes | Yes | |
| GET /api/v1/execution-chains/{id} | Tested | 200 | Yes | Yes | Exposes tenant_id |
| GET /api/v1/execution-chains/{id}/entities | Tested | 200 | Yes | Yes | |
| GET /api/v1/exposures | Tested | 200 | Yes | Yes | |
| GET /api/v1/exposures/{id} | Tested | 200 | Yes | Yes | Identity label bug (DQ4) |
| GET /api/v1/risk-clusters/{key}/remediation | Tested | 200 | Yes | Yes | Has entity names |
| GET /api/v1/risk-clusters/{key}/authority-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/graph/subgraph?seed_id={id} | Tested | 200 | Yes | Yes | Note: param is seed_id not entityId |
| GET /api/v1/syncs | Tested | 200 | Yes | Yes | |
Edge Cases Tested
| Test | Expected | Actual | Pass? |
|---|---|---|---|
| No X-Tenant-Id header | 400 or 401 | 200 (empty data) | FAIL (H1) |
| Empty X-Tenant-Id header | 400 | 200 (empty data) | FAIL (H1) |
| Different tenant (other-tenant) | 200 empty | 200 empty | PASS (isolation works) |
| Invalid finding ID | 404 | 404 | PASS |
| Missing seed_id for subgraph | 400 | 400 | PASS |
| Invalid subgraph mode | 400 | 400 | PASS |
Recommendations (Priority Order)
- [CRITICAL] Implement JWT signature verification before enabling Bearer auth in production
- [HIGH] Enable `REQUIRE_AUTH=true` in production and ensure tenant header enforcement returns 400/401
- [HIGH] Fix evidence pack scope_drift_detail to populate `added_roles` and compute blast radius deltas
- [MEDIUM] Fix exposure detail identity node labels to show display names
- [MEDIUM] Fix path-level remediation to pass entity context for specific `applies_to` labels
- [MEDIUM] Compute `bySeverity`/`byType` from full dataset or add server-side aggregation query
- [MEDIUM] Investigate and align posture summary path counts with authority-paths list
- [LOW] Strip `tenant_id` from API responses where not needed by the client
- [LOW] Populate `target_resource` and `payload_hash` in execution evidence records