API Data Quality & Security Audit


Summary

  • Endpoints tested: 26 (all listed endpoints + edge cases)
  • Findings: 1 CRITICAL, 3 HIGH, 5 MEDIUM, 4 INFO
  • Data quality issues: 6
  • All endpoints return valid JSON: Yes
  • Response structures match TypeScript types: Mostly (exceptions noted)

Part 1: Security Findings

CRITICAL

[C1] Auth Middleware Does Not Verify JWT Signatures

  • File: src/api/middleware/auth.ts:116-136
  • Issue: The decodeJwtPayload() function decodes JWT payloads without verifying signatures. The code explicitly acknowledges this with a WARNING comment. Any client can craft a JWT with arbitrary sub, tid, and scopes claims and the server will trust it.
  • Impact: An attacker can impersonate any user, claim any tenant ID, and gain any scope by constructing a fake JWT. This bypasses all authorization if REQUIRE_AUTH=true and Bearer tokens are used.
  • Fix: Implement proper JWT verification using the jose library, validating signatures against the IdP's JWKS endpoint, as the code comment already proposes for Phase 2. Until then, this is a critical vulnerability if Bearer auth is enabled in production.
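To show what signature verification adds over payload decoding alone, here is a minimal sketch that uses Node's built-in node:crypto with an RS256 key pair generated in-process. The helpers signJwtRs256 and verifyJwtRs256 are hypothetical and only illustrate the check. The production fix should still use jose (jwtVerify with a createRemoteJWKSet key set pointing at the IdP) and should also validate issuer and audience.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Encode a JSON value as one base64url segment of a compact JWT.
function b64urlJson(value: object): string {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// Demo-only helper: produce an RS256-signed token (the IdP does this in practice).
function signJwtRs256(payload: object, privateKey: KeyObject): string {
  const signingInput = `${b64urlJson({ alg: "RS256", typ: "JWT" })}.${b64urlJson(payload)}`;
  const signature = sign("sha256", Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Return the claims only if the token is well formed, pins RS256, has a signature that
// verifies against the trusted public key, and is not expired; otherwise return null.
function verifyJwtRs256(token: string, publicKey: KeyObject, nowSec: number): Record<string, unknown> | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, sigB64] = parts;
  try {
    const header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
    if (header.alg !== "RS256") return null; // rejects alg "none" and algorithm switching
    const signedBytes = Buffer.from(`${headerB64}.${payloadB64}`);
    const signature = Buffer.from(sigB64, "base64url");
    if (!verify("sha256", signedBytes, publicKey, signature)) return null;
    const claims = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
    if (typeof claims.exp === "number" && claims.exp <= nowSec) return null;
    return claims;
  } catch {
    return null; // malformed base64url or JSON
  }
}

// Demo: an attacker keeps the genuine header and signature but swaps in new claims.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });
const now = Math.floor(Date.now() / 1000);
const genuine = signJwtRs256({ sub: "user-1", tid: "demo-w1", exp: now + 300 }, privateKey);
const [hdr, , sig] = genuine.split(".");
const forged = `${hdr}.${b64urlJson({ sub: "admin", tid: "other-tenant", exp: now + 300 })}.${sig}`;
```

A decode-only helper such as decodeJwtPayload() would accept `forged` and trust tid "other-tenant". The signature check rejects it because the claims no longer match the signed bytes.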

HIGH

[H1] Missing Tenant Header Returns 200 Instead of 400/401

  • File: src/api/middleware/auth.ts:29-35
  • Issue: When REQUIRE_AUTH=false (current production configuration), requests without an X-Tenant-Id header are accepted and default to "dev-tenant". The API returns HTTP 200 with empty data arrays instead of rejecting the request. This means unauthenticated requests succeed silently.
  • Impact: No data leaks today because dev-tenant holds no data, but the API accepts requests without any tenant context and returns a valid response shape. Clients cannot distinguish between "no tenant header" and "tenant exists but has no data." If dev-tenant is ever seeded with data, that data would be exposed to anyone.
  • Tested:
    • GET /api/v1/findings (no header) -> 200, empty data
    • GET /api/v1/findings (empty header) -> 200, empty data
    • GET /api/v1/findings -H "X-Tenant-Id: other-tenant" -> 200, empty data
  • Fix: In production, set REQUIRE_AUTH=true and ensure the requireTenant middleware (src/api/middleware/require-tenant.ts) is applied before all API routes. The middleware already exists, but its enforcement currently depends on the auth middleware running first.
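A minimal sketch of the enforcement H1 asks for follows: a framework-agnostic check plus an Express-style wrapper. The names resolveTenant, requireTenantStrict, TenantCheck, and the optional knownTenants allowlist are hypothetical, and the real require-tenant.ts is not reproduced in this report. With REQUIRE_AUTH=true, the tenant should ultimately come from the verified token (see C1) rather than from the raw header.

```typescript
// Result of checking the tenant context on one request.
type TenantCheck =
  | { ok: true; tenantId: string }
  | { ok: false; status: 400 | 401; code: string; message: string };

// Missing or blank header -> 400. If an allowlist is supplied, unknown tenants -> 401.
// There is deliberately no fallback to "dev-tenant".
function resolveTenant(
  headerValue: string | string[] | undefined,
  knownTenants?: ReadonlySet<string>,
): TenantCheck {
  const raw = Array.isArray(headerValue) ? headerValue[0] : headerValue;
  const tenantId = (raw ?? "").trim();
  if (tenantId === "") {
    return { ok: false, status: 400, code: "TENANT_REQUIRED", message: "X-Tenant-Id header is required" };
  }
  if (knownTenants && !knownTenants.has(tenantId)) {
    return { ok: false, status: 401, code: "TENANT_UNKNOWN", message: "Unknown tenant" };
  }
  return { ok: true, tenantId };
}

// Minimal structural types so the sketch runs without Express installed.
interface TenantRequest { headers: Record<string, string | string[] | undefined>; tenantId?: string }
interface JsonResponse { status(code: number): JsonResponse; json(body: unknown): void }

function requireTenantStrict(knownTenants?: ReadonlySet<string>) {
  return (req: TenantRequest, res: JsonResponse, next: () => void): void => {
    // Node lowercases incoming header names.
    const result = resolveTenant(req.headers["x-tenant-id"], knownTenants);
    if (!result.ok) {
      // Reuse the API's existing {error: {code, message, status}} envelope.
      res.status(result.status).json({ error: { code: result.code, message: result.message, status: result.status } });
      return;
    }
    req.tenantId = result.tenantId;
    next();
  };
}
```

Rejecting unknown tenants is optional here. The edge-case table treats other-tenant returning empty data as acceptable isolation, so the allowlist is a stricter choice rather than a requirement.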

[H2] Diagnostics and Metrics Endpoints Served by UI SPA Catch-All

  • File: src/api/routes/system.ts:53-69
  • Issue: The /diagnostics and /metrics endpoints are defined in the backend, but on the production domain (app.securityv0.com) they return the UI SPA HTML instead. This indicates that the reverse proxy routes /diagnostics and /metrics to the UI static server rather than to the API. The /diagnostics endpoint exposes process.memoryUsage(), process.uptime(), process.version, and worker queue details.
  • Impact: If the proxy routing is corrected, /diagnostics would leak internal server details (memory, uptime, Node version, worker status) to any request with a tenant header. The endpoint only checks for x-tenant-id presence, not validity.
  • Fix: Either (1) remove the /diagnostics endpoint in production, (2) require proper authentication, or (3) ensure the reverse proxy blocks it. The /metrics endpoint should only be accessible to internal monitoring.
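Options (2) and (3) can be combined into a single access policy for the system endpoints. The sketch below is hypothetical throughout: the SystemAccessContext shape, the "system:diagnostics" scope name, and the way fromInternalNetwork would be derived from the ingress are assumptions, not part of the current code.

```typescript
// Hypothetical access policy for the system endpoints. fromInternalNetwork would be
// derived from the ingress/proxy; verifiedScopes from a *verified* token (see C1).
interface SystemAccessContext {
  nodeEnv: string;
  fromInternalNetwork: boolean;
  verifiedScopes: readonly string[] | null; // null when there is no verified credential
}

function systemEndpointAllowed(path: "/diagnostics" | "/metrics", ctx: SystemAccessContext): boolean {
  if (ctx.nodeEnv !== "production") return true; // keep local debugging easy
  if (path === "/metrics") return ctx.fromInternalNetwork; // monitoring scrapes from inside only
  // /diagnostics: an x-tenant-id header is not a credential; require an explicit scope.
  return ctx.verifiedScopes?.includes("system:diagnostics") ?? false;
}
```

The key change from the current behavior is that the mere presence of x-tenant-id no longer grants access.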

[H3] Entity tenant_id Exposed in API Responses

  • File: src/api/routes/entities.ts:70-80, src/api/routes/exposures.ts:333-371
  • Issue: Entity detail and exposure detail responses include the raw tenant_id field from the database document. While this is not a cross-tenant leak (the data is correct), including tenant_id in API responses exposes internal infrastructure details to clients.
  • Tested: GET /api/v1/entities/{id} -> response includes "tenant_id": "demo-w1" in the entity object. Also visible in: authority path findings, execution evidence records, timeline events, and execution chains.
  • Impact: Low risk, since the client already sends the tenant header, but it reveals the internal tenant naming convention (e.g. "demo-w1"), which makes other tenants' IDs easier to guess for targeted attacks.
  • Fix: Strip tenant_id from API response serialization. The NormalizedFinding type in api-types.ts does not include tenant_id, but the raw EntityDoc type does -- the entity routes return raw MongoDB documents without filtering.
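The stripping can be done with a small serializer applied to every document before it is returned. The sketch below assumes documents are plain objects. withoutTenantId, withoutTenantIds, and the StoredEntity sample type are hypothetical, since the actual EntityDoc type is not reproduced here.

```typescript
// Drop tenant_id from a stored document before it is serialized into a response.
// The rest-spread copy leaves the original document untouched.
function withoutTenantId<T extends { tenant_id?: unknown }>(doc: T): Omit<T, "tenant_id"> {
  const { tenant_id: _tenantId, ...rest } = doc;
  return rest;
}

// Apply the same rule to lists (timeline events, evidence records, chains, ...).
function withoutTenantIds<T extends { tenant_id?: unknown }>(docs: readonly T[]): Omit<T, "tenant_id">[] {
  return docs.map((d) => withoutTenantId(d));
}

// Sample shape standing in for a raw MongoDB entity document.
interface StoredEntity {
  _id: string;
  tenant_id: string;
  type: string;
  properties: { display_name: string };
}

const stored: StoredEntity = {
  _id: "f6787c51af06a78527269d57",
  tenant_id: "demo-w1",
  type: "workload",
  properties: { display_name: "Agent Ascribe_Summarizer" },
};
const publicEntity = withoutTenantId(stored);
```

Because the return type is Omit<T, "tenant_id">, a handler that accidentally returns the raw document instead of the stripped one will no longer match a response type that omits the field.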

MEDIUM

[M1] No Rate Limiting on Production API

  • File: src/api/middleware/rate-limit.ts
  • Issue: Rate limiting middleware exists but it is unclear whether it is enabled in production. The exposed endpoints (findings, entities, subgraph) could be used for resource exhaustion if not rate-limited.
  • Fix: Verify rate limiting is active on the production reverse proxy or enable the middleware.
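If the middleware route is chosen, a fixed-window counter keyed by tenant or client IP is the simplest scheme. This is a hypothetical sketch and not the contents of rate-limit.ts. Its state lives in one process, so with several API replicas a shared store or the reverse proxy's limiter is needed for a global limit.

```typescript
// Hypothetical fixed-window limiter: at most `limit` requests per key per window.
// In-process state: each replica counts separately, and idle keys are replaced
// when they next appear rather than evicted.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // true = serve the request; false = respond 429 Too Many Requests.
  allow(key: string, nowMs: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || nowMs - w.start >= this.windowMs) {
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

Passing the clock in as nowMs keeps the limiter deterministic and easy to test. A fixed window allows short bursts at window boundaries; a sliding-window or token-bucket scheme smooths those at the cost of more state.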

Part 2: Data Quality Issues

HIGH

[DQ1] Posture Summary Path Count Does Not Match Authority Paths List

  • Issue: The posture summary endpoint reports active_paths: 29, dormant_paths: 3 (total: 32), but the authority-paths list endpoint returns total_count: 30 with default filters. Additionally, within the 30 listed paths, 25 have execution_30d > 0 and 5 have execution_30d == 0, which also does not match the posture summary's 29/3 split.
  • Tested:
    • GET /api/v1/posture/summary -> {"active_paths": 29, "dormant_paths": 3}
    • GET /api/v1/authority-paths?limit=50 -> {"meta": {"total_count": 30}}
    • GET /api/v1/authority-paths?status=removed -> {"meta": {"total_count": 1}}
    • Manual count of returned paths: 25 with execution, 5 without
  • Impact: The posture summary counts are likely computed differently than the list API. The PostureService may include removed paths or use a different dormancy definition than simple execution_30d == 0. This creates confusion in the UI where the dashboard number does not match the detail table.
  • Fix: Align the PostureService computation with the authority-path list definition, or document the difference. The posture summary should either (a) count only active-status paths, or (b) explain what additional paths are included.
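One way to keep the dashboard and the table from drifting is to derive both from one classification function. This sketch assumes the definition the list view appears to use: removed paths are excluded, and "dormant" means execution_30d == 0. PathLike, classifyPath, and postureCounts are hypothetical names. Whether PostureService should adopt this definition is the open question in DQ1.

```typescript
// Field names status and execution_30d are taken from the API responses above.
interface PathLike {
  status?: string;
  execution_30d: number;
}

type PathState = "active" | "dormant" | "removed";

// Single source of truth for how a path is bucketed.
function classifyPath(p: PathLike): PathState {
  if (p.status === "removed") return "removed";
  return p.execution_30d > 0 ? "active" : "dormant";
}

// Posture summary counts computed with the same rule the list view would use.
function postureCounts(paths: readonly PathLike[]) {
  const counts = { active_paths: 0, dormant_paths: 0, removed_paths: 0 };
  for (const p of paths) {
    const state = classifyPath(p);
    if (state === "active") counts.active_paths++;
    else if (state === "dormant") counts.dormant_paths++;
    else counts.removed_paths++;
  }
  return counts;
}
```

With this rule, the manual count from the test (25 with execution, 5 without, 1 removed) would give a summary of 25 active / 5 dormant, which adds up to the list's total_count of 30.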

[DQ2] Findings bySeverity/byType Reflects Current Page, Not Total Dataset

  • Issue: The meta.bySeverity and meta.byType breakdowns in the findings list response are computed from the current page of results, not from the total dataset. When has_more: true, these breakdowns are misleading.
  • Tested:
    • GET /api/v1/findings?source=evaluator&limit=3 -> total_count=205, bySeverity sum=3
    • GET /api/v1/findings?source=evaluator&limit=200 -> total_count=205, bySeverity sum=200
  • File: src/api/routes/findings.ts:158-167 -- the loop that builds bySeverity and byType iterates over the normalized array after pagination truncation.
  • Impact: The UI severity breakdown pie chart and badges reflect only the visible page. If a user loads 50 findings sorted by date, the severity breakdown may show 0 critical even though there are critical findings on later pages.
  • Fix: Compute bySeverity and byType before pagination truncation (from the full result set), or use a separate aggregation query. Alternatively, add a note in the API docs that these reflect the current page.
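The first fix amounts to counting before slicing. This is a minimal sketch that assumes the full filtered result set is in memory. For brevity it uses a simple slice rather than the API's cursor pagination, and FindingLike, breakdown, and listFindings are hypothetical names. For large datasets the same counts would come from a separate grouping query instead.

```typescript
interface FindingLike {
  severity: string;
  type: string;
}

// Count over the full filtered result set, independent of the page being returned.
function breakdown(findings: readonly FindingLike[]) {
  const bySeverity: Record<string, number> = {};
  const byType: Record<string, number> = {};
  for (const f of findings) {
    bySeverity[f.severity] = (bySeverity[f.severity] ?? 0) + 1;
    byType[f.type] = (byType[f.type] ?? 0) + 1;
  }
  return { bySeverity, byType };
}

// Paginate only after the breakdown has been computed from all rows.
function listFindings(all: readonly FindingLike[], limit: number) {
  const page = all.slice(0, limit);
  return {
    data: page,
    meta: { total_count: all.length, has_more: all.length > limit, ...breakdown(all) },
  };
}
```

With this ordering, the bySeverity sum always equals total_count regardless of limit, which is the property the limit=3 test above shows is currently violated.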

[DQ3] Evidence Pack scope_drift_detail Has Inconsistent added_roles

  • Issue: For finding eval:080ff3bc4c6fbce622fe9b94f23d2da2, the evidence pack shows baseline_role_count: 0, current_role_count: 2, but added_roles: [] (empty). The finding's evidence_refs correctly lists new_roles: ["ap_write", "ar_write"], but the scope_drift_detail section does not populate its added_roles array. Furthermore, the remediation text says "Review 0 role assignment(s) added since baseline" which is factually wrong -- 2 roles were added.
  • Tested: GET /api/v1/findings/eval:080ff3bc4c6fbce622fe9b94f23d2da2/evidence-pack
    • scope_drift_detail.added_roles = []
    • scope_drift_detail.blast_radius_before = scope_drift_detail.blast_radius_after (identical, suggesting the expansion was never computed)
    • scope_drift_detail.new_resources = []
  • Impact: The evidence pack provides incorrect forensic data. An auditor reading the evidence pack would see "0 roles added" despite the finding text saying "gained 'ap_write', 'ar_write'". This undermines the "evidence-grade" design constraint.
  • Fix: The evidence pack assembly for scope_drift_detail needs to populate added_roles from evidence_refs.new_roles and compute before/after blast radius correctly.
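Deriving the counts, the role lists, and the remediation text from the same two role sets makes the DQ3 contradiction impossible. A pre-seal check can also catch it if the detail is assembled elsewhere. In this sketch only added_roles, baseline_role_count, current_role_count, and evidence_refs.new_roles come from the report. removed_roles and all function names are hypothetical.

```typescript
interface ScopeDriftDetail {
  baseline_role_count: number;
  current_role_count: number;
  added_roles: string[];
  removed_roles: string[];
}

// Derive the detail from the two role sets so counts and lists cannot disagree.
function buildScopeDriftDetail(baselineRoles: readonly string[], currentRoles: readonly string[]): ScopeDriftDetail {
  const baseline = new Set(baselineRoles);
  const current = new Set(currentRoles);
  return {
    baseline_role_count: baseline.size,
    current_role_count: current.size,
    added_roles: Array.from(current).filter((r) => !baseline.has(r)).sort(), // sorted for deterministic output
    removed_roles: Array.from(baseline).filter((r) => !current.has(r)).sort(),
  };
}

// Pre-seal check: the detail must reconcile with itself and with evidence_refs.new_roles.
function driftProblems(d: ScopeDriftDetail, evidenceNewRoles: readonly string[]): string[] {
  const problems: string[] = [];
  for (const role of evidenceNewRoles) {
    if (!d.added_roles.includes(role)) problems.push(`evidence role ${role} missing from added_roles`);
  }
  // |current| - |baseline| always equals |added| - |removed| for a correct set difference.
  if (d.current_role_count - d.baseline_role_count !== d.added_roles.length - d.removed_roles.length) {
    problems.push("role counts do not reconcile with added_roles/removed_roles");
  }
  return problems;
}

// Remediation text derived from the same detail, so it cannot say "0" when roles were added.
function driftRemediationText(d: ScopeDriftDetail): string {
  return `Review ${d.added_roles.length} role assignment(s) added since baseline`;
}
```

The pack described above (0 baseline, 2 current, empty added_roles) fails both checks in driftProblems.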

MEDIUM

[DQ4] Exposure Detail: Identity Node Labels Show Raw IDs Instead of Display Names

  • Issue: In the GET /api/v1/exposures/{id} response, authority path nodes of type "identity" have their label set to the raw entity ID instead of the identity's display name. For example, identity 7b9b2ccee67941b6448175ea appears as label "7b9b2ccee67941b6448175ea" instead of "svc-foundry-ascribe-prod".
  • File: src/api/routes/exposures.ts:286 -- the identity node is built with label: p.via_identity where p.via_identity is just the entity ID string, not the display name.
  • Tested: GET /api/v1/exposures/f6787c51af06a78527269d57 -- all 5 authority path identity nodes show raw ID as label.
  • Impact: The UI graph visualization shows raw hex IDs for identity nodes instead of human-readable names. This is a user-facing display issue that makes the authority paths harder to understand.
  • Fix: In exposures.ts:286, resolve p.via_identity to a display name by looking up the identity entity from the identityMap (already fetched for the identities array). Change label: p.via_identity to label: identityEntities.find(e => e._id === p.via_identity)?.properties.display_name ?? p.via_identity.
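The find() call works, but it scans the identity list once per path node. Building a Map once per request keeps the lookup constant-time, and the fallback to the raw ID stays the same. The sketch assumes _id is a string. If _id is stored as a MongoDB ObjectId, key the map on String(e._id), because === between an ObjectId and a string is always false. IdentityEntityLike and identityLabeler are hypothetical names.

```typescript
interface IdentityEntityLike {
  _id: string;
  properties: { display_name?: string | null };
}

// Build the lookup once, then label every identity node from it.
function identityLabeler(identities: readonly IdentityEntityLike[]): (id: string) => string {
  const names = new Map<string, string>();
  for (const e of identities) {
    const name = e.properties.display_name?.trim();
    if (name) names.set(e._id, name);
  }
  // Fall back to the raw ID when no display name is known.
  return (id) => names.get(id) ?? id;
}
```

In the route, the label would then be computed as labelFor(p.via_identity), where labelFor = identityLabeler(identityEntities) is built once before the paths are mapped.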

[DQ5] Execution Evidence Records Have Empty target_resource and payload_hash

  • Issue: All execution evidence records for entity f6787c51af06a78527269d57 have empty strings for target_resource and payload_hash fields. The ExecutionEvidenceRecord type in api-types.ts defines target_resource: string and payload_hash: string as required fields.
  • Tested: GET /api/v1/entities/f6787c51af06a78527269d57/execution-evidence -- all 70 records have target_resource: "" and payload_hash: "".
  • Impact: The execution evidence lacks specificity about which resource was accessed. For the "evidence-grade" design constraint, knowing the target of each API call is critical for proving whether a specific execution path was exercised. Without target_resource, the evidence can only prove that execution occurred, not what was accessed.
  • Fix: Populate target_resource during the connector transform or seed script. If the source data does not contain this information, update evidence_completeness.notes to indicate this limitation.
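For the fallback case, the completeness note can be generated from the records themselves, so the stated gap always matches the data. This sketch assumes records are checked at the end of the connector transform or seed script. ExecutionEvidenceLike and completenessNotes are hypothetical, and the exact shape of evidence_completeness.notes is not shown in this report.

```typescript
interface ExecutionEvidenceLike {
  target_resource: string;
  payload_hash: string;
}

// Summarize which required-but-empty fields exist so the gap is stated, not hidden.
function completenessNotes(records: readonly ExecutionEvidenceLike[]): string[] {
  const notes: string[] = [];
  const fields = ["target_resource", "payload_hash"] as const;
  for (const field of fields) {
    const blank = records.filter((r) => r[field].trim() === "").length;
    if (blank > 0) {
      notes.push(`${field} is empty in ${blank}/${records.length} execution evidence records`);
    }
  }
  return notes;
}
```

For the entity above, this would produce one note for each of the two fields, each reporting 70/70 empty records.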

[DQ6] Remediation applies_to Uses Generic Labels Instead of Object Names

  • Issue: Path-level remediation actions use generic applies_to labels like "LLM endpoint access", "egress path", "execution path" instead of specific object names. The resolveAppliesTo() function in remediation-service.ts:576-607 can resolve names, but the authority-paths route handler at src/api/routes/authority-paths.ts:108 does not pass entityContext to generatePathRemediation().
  • Tested: GET /api/v1/authority-paths/0a3a4bb896821dc3813c9608/remediation
    • applies_to: "LLM endpoint access" instead of "Egress: Agent Ascribe_Summarizer"
    • applies_to: "egress path" instead of "Egress: Agent Ascribe_Summarizer"
    • applies_to: "execution path" instead of "Path: Agent Ascribe_Summarizer -> Billing_Payment_Methods"
  • Note: The cluster remediation endpoint does pass entityContext and produces enriched labels like "Egress: Agent Ascribe_Summarizer". The path-level endpoint is inconsistent.
  • File: src/api/routes/authority-paths.ts:108 -- generatePathRemediation(path, findings) missing entityContext parameter.
  • Fix: In the authority-paths remediation route, resolve the workload, identity, and destination entities and pass them as RemediationEntityContext:
    // Tenant-scoped lookups for the entities the path connects (identity lookup omitted)
    const [workload, dest] = await Promise.all([
      storageAdapter.getEntity(tenantId, path.workload_id),
      storageAdapter.getEntity(tenantId, path.destination_id),
    ]);
    // Display names let resolveAppliesTo() emit specific labels; "?? undefined" turns null into undefined
    const entityContext = {
      workload_name: workload?.properties.display_name ?? undefined,
      destination_name: dest?.properties.display_name ?? undefined,
    };
    const guidance = generatePathRemediation(path, findings, entityContext);

Part 3: Cross-Endpoint Data Consistency

Entity Consistency (5 entities checked)

| Entity | Entity Detail | Exposure | Blast Radius | Findings | Consistent? |
|---|---|---|---|---|---|
| Agent Ascribe_Summarizer (f678...) | workload, orphaned, 5 paths | orphaned, bound, 5 paths | 5 paths, customer domain | 30 findings | Yes |
| Invoice Processing Rule (c32d...) | workload, owned, 3 paths | exists in exposures | 3 paths, finance domain | referenced | Yes |
| AI Assist Flow (b9fb...) | workload, ambiguous | owned (exposure), unbound | 2 paths | 6+ findings | Inconsistent |
| Compliance Audit Exporter (faf2...) | workload, owned | exists in exposures | exists | referenced | Yes |
| Security Log Collector (da76...) | workload, owned | exists in exposures | exists | referenced | Yes |

AI Assist Flow Inconsistency: The entity detail shows ownership_status: "ambiguous" in properties and has group ownership (OWNED_BY relationship), but the exposure summary reports ownership_status: "owned". This is because the exposure route's deriveOwnershipStatus() returns "owned" when an OWNED_BY relationship exists, while the entity stores the evaluator's assessment of "ambiguous" (group ownership only). These represent different semantics but should be aligned.
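Both routes could be aligned by having them call one shared rule. This sketch assumes, as described above, that "ambiguous" means the only OWNED_BY targets are groups. OwnerRef, its kind values, and deriveOwnership are hypothetical, and the evaluator's actual rule may weigh other signals.

```typescript
type OwnershipStatus = "owned" | "ambiguous" | "orphaned";

// One OWNED_BY target, reduced to whether it is an individual or a group.
interface OwnerRef {
  kind: "user" | "group";
}

// Shared by the entity route and the exposure route so both report the same status.
function deriveOwnership(ownedBy: readonly OwnerRef[]): OwnershipStatus {
  if (ownedBy.length === 0) return "orphaned";
  // Group-only ownership names no accountable individual.
  return ownedBy.some((o) => o.kind !== "group") ? "owned" : "ambiguous";
}
```

Under this rule, AI Assist Flow (group ownership only) would be reported as "ambiguous" by both the entity detail and the exposure summary.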

Finding Consistency (3 findings checked)

| Finding | Detail | Evidence Pack | Remediation | Consistent? |
|---|---|---|---|---|
| eval:080ff3... (scope_drift) | entity_id matches entity | Pack exists, sealed, SHA256 hash present | 3 actions returned | Partially (see DQ3: added_roles empty) |
| eval:0b74c7... (orphaned_ownership) | entity_id matches entity | Pack exists | refs valid entities | Yes |
| eval:c444b0... (scope_drift) | entity_id matches entity | Pack exists | valid evidence refs | Yes |

Authority Path Role Analysis (Sprint Issue Check)

  • Question: Does the API return ALL roles for a path or just one?
  • Answer: The API correctly returns all roles. Path 0a3a4bb896821dc3813c9608 has via_roles: ["sql_clinical_reader", "sql_admin_reader"] (2 roles) and via_role_ids: ["49d60495127bd9136a23f723", "d7a0accb85f0b56922d12c07"] (2 role IDs). Across all 30 authority paths, multiple paths show 2+ roles:
    • Agent Ascribe_Summarizer -> Billing_Payment_Methods: 2 roles
    • Invoice Processing Rule -> AP/AR Ledger: 2 roles
    • Invoice Processing Rule -> Financial Records API: 2 roles
  • Conclusion: The "1 role when there are 4" sprint issue appears to be resolved in the current API. All paths show their complete role sets.

Graph Subgraph Endpoint

  • Note: The task specification listed the parameter as entityId but the actual API uses seed_id. This is a documentation/spec discrepancy, not a bug.
  • Tested: GET /api/v1/graph/subgraph?seed_id=f6787c51af06a78527269d57 returns 5 nodes, 4 edges, truncated: false.

Part 4: Code-Level Audit

Files Audited

| File | Lines | Notes |
|---|---|---|
| src/api/routes/authority-paths.ts | 134 | Missing entityContext in remediation (DQ6) |
| src/api/routes/findings.ts | 335 | bySeverity page-scoped (DQ2) |
| src/api/routes/entities.ts | 190 | Exposes tenant_id (H3) |
| src/api/routes/exposures.ts | 376 | Identity label bug (DQ4) |
| src/api/routes/posture.ts | 49 | Clean |
| src/api/routes/risk-clusters.ts | 74 | Clean |
| src/api/routes/chains.ts | 71 | Clean, but exposes tenant_id |
| src/api/routes/graph.ts | 57 | Good input validation |
| src/api/routes/syncs.ts | 43 | Clean |
| src/api/routes/evidence.ts | 108 | Good cursor validation |
| src/api/routes/system.ts | 73 | Diagnostics info leak (H2) |
| src/api/routes/paths.ts | 202 | Good BFS depth limit |
| src/api/routes/ingest.ts | 263 | Good Zod validation |
| src/api/middleware/auth.ts | 153 | JWT not verified (C1) |
| src/api/middleware/require-tenant.ts | 20 | Clean |
| src/api/middleware/tenant-context.ts | 31 | Clean |
| src/services/remediation-service.ts | 665 | Comprehensive, deterministic |
| src/services/risk-cluster-service.ts | 800 | Good cap limits, no unbounded queries |

Verified OK

  • Tenant Isolation in DB Queries: All route handlers use req.tenantId! which comes from the auth/tenant middleware chain. Every storageAdapter call passes tenantId as the first argument. No cross-tenant query paths found.
  • Read-Only Invariant: No outbound HTTP calls to source systems found in any route handler or service. Ingestion is receive-only via POST endpoints.
  • Input Validation: The ingest.ts route uses comprehensive Zod schemas for both NormalizedGraph and ConnectorReport payloads. The graph route validates mode, depth, and limit with bounds. The evidence route validates cursor format including ObjectId regex and date parsing.
  • Determinism: No Math.random() found in evaluation logic. The remediation service uses pure functions with computeCategoryRank() for deterministic ranking.
  • Finding Status Transitions: VALID_STATUS_TRANSITIONS is imported and enforced in the PATCH endpoint. Required fields (reason for false_positive, resolved_by for remediated) are validated.
  • Tenant ID in Evidence Pack Integrity: Per the security-auditor agent spec, evidence packs should include tenant_id in the SHA256 hash. The pack for eval:080ff3bc has integrity_hash: "sha256:fe7a78d98636...", so the hash is verified present. Whether tenant_id is part of the hash input was not confirmed; that would require reading the evidence assembly code.
  • No Secret Leakage in API Responses: No API keys, passwords, connection strings, or internal credentials found in any tested response. Error messages are generic and do not leak stack traces or internal paths.
  • Error Responses: All error responses use a consistent {error: {code, message, status}} structure. No stack traces or internal details leaked.
  • Pagination Safety: All list endpoints use Math.min() to cap limits (typically 200) and use cursor-based pagination. The exposure and risk-cluster endpoints use internal caps (FINDINGS_CAP=5000, PATHS_CAP=5000) to prevent OOM.
  • BFS Depth Limit: The cross-system-paths BFS in paths.ts:184 is limited to 5 hops to prevent runaway graph traversal.
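The depth cap described in the BFS bullet can be illustrated with a breadth-first search that records each node's hop count and stops expanding at the limit. This is a hypothetical sketch over a plain adjacency map, not the paths.ts implementation. The visited map also keeps cycles from causing repeated work.

```typescript
// Adjacency list: node ID -> IDs reachable in one hop.
type Graph = ReadonlyMap<string, readonly string[]>;

// Return every node within maxDepth hops of the seed, with its hop count.
function reachableWithin(graph: Graph, seed: string, maxDepth = 5): Map<string, number> {
  const depth = new Map<string, number>([[seed, 0]]);
  const queue: string[] = [seed];
  // Index-based queue avoids O(n) shift() calls.
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head];
    const d = depth.get(node)!;
    if (d === maxDepth) continue; // depth cap: do not expand further
    for (const next of graph.get(node) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return depth;
}
```

Because each node is enqueued at most once, the work is bounded by the size of the reachable subgraph within the cap, even on cyclic graphs.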

Endpoint Coverage Matrix

| Endpoint | Status | HTTP | JSON | Types Match | Notes |
|---|---|---|---|---|---|
| GET /health | Tested | 200 | Yes | Yes | No auth required |
| GET /ready | Tested | 200 | Yes | Yes | No auth required |
| GET /api/v1/posture/summary | Tested | 200 | Yes | Yes | Path count discrepancy (DQ1) |
| GET /api/v1/posture/risk-clusters | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings | Tested | 200 | Yes | Yes | bySeverity issue (DQ2) |
| GET /api/v1/findings?severity=critical | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings/{id} | Tested | 200 | Yes | Yes | |
| GET /api/v1/findings/{id}/evidence-pack | Tested | 200 | Yes | Yes | scope_drift_detail issue (DQ3) |
| GET /api/v1/authority-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/authority-paths/{id} | Tested | 200 | Yes | Yes | Roles complete |
| GET /api/v1/authority-paths/{id}/findings | Tested | 200 | Yes | Yes | |
| GET /api/v1/authority-paths/{id}/remediation | Tested | 200 | Yes | Yes | Generic applies_to (DQ6) |
| GET /api/v1/entities | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id} | Tested | 200 | Yes | Yes | Exposes tenant_id (H3) |
| GET /api/v1/entities/{id}/timeline | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/cross-system-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/blast-radius | Tested | 200 | Yes | Yes | |
| GET /api/v1/entities/{id}/execution-evidence | Tested | 200 | Yes | Yes | Empty fields (DQ5) |
| GET /api/v1/execution-chains | Tested | 200 | Yes | Yes | |
| GET /api/v1/execution-chains/{id} | Tested | 200 | Yes | Yes | Exposes tenant_id |
| GET /api/v1/execution-chains/{id}/entities | Tested | 200 | Yes | Yes | |
| GET /api/v1/exposures | Tested | 200 | Yes | Yes | |
| GET /api/v1/exposures/{id} | Tested | 200 | Yes | Yes | Identity label bug (DQ4) |
| GET /api/v1/risk-clusters/{key}/remediation | Tested | 200 | Yes | Yes | Has entity names |
| GET /api/v1/risk-clusters/{key}/authority-paths | Tested | 200 | Yes | Yes | |
| GET /api/v1/graph/subgraph?seed_id={id} | Tested | 200 | Yes | Yes | Param is seed_id, not entityId |
| GET /api/v1/syncs | Tested | 200 | Yes | Yes | |

Edge Cases Tested

| Test | Expected | Actual | Pass? |
|---|---|---|---|
| No X-Tenant-Id header | 400 or 401 | 200 (empty data) | FAIL (H1) |
| Empty X-Tenant-Id header | 400 | 200 (empty data) | FAIL (H1) |
| Different tenant (other-tenant) | 200, empty | 200, empty | PASS (isolation works) |
| Invalid finding ID | 404 | 404 | PASS |
| Missing seed_id for subgraph | 400 | 400 | PASS |
| Invalid subgraph mode | 400 | 400 | PASS |

Recommendations (Priority Order)

  1. [CRITICAL] Implement JWT signature verification before enabling Bearer auth in production
  2. [HIGH] Enable REQUIRE_AUTH=true in production and ensure tenant header enforcement returns 400/401
  3. [HIGH] Fix evidence pack scope_drift_detail to populate added_roles and compute blast radius deltas
  4. [MEDIUM] Fix exposure detail identity node labels to show display names
  5. [MEDIUM] Fix path-level remediation to pass entity context for specific applies_to labels
  6. [MEDIUM] Compute bySeverity/byType from full dataset or add server-side aggregation query
  7. [MEDIUM] Investigate and align posture summary path counts with authority-paths list
  8. [LOW] Strip tenant_id from API responses where not needed by the client
  9. [LOW] Populate target_resource and payload_hash in execution evidence records