API Data Quality Analysis: Automation Classification & Execution Evidence

Author: DEVELOPER (automation-analysis team)
Date: 2026-02-12
Context: Analysis of /api/v1/entities response structure for automation classification gaps
Dataset: 92 identity entities from http://localhost:3000/api/v1/entities

Executive Summary

Primary Finding: The API response structure conflates "no data collected" with "confirmed zero", leaves execution_mode unclassified for 29% of internal_inventory entities (22 of 77), and lacks confidence indicators. This makes downstream analysis ambiguous and prevents reliable automation risk assessment.

Critical Questions:

Does execution_count_30d: 0 mean "we checked and found zero" or "we didn't check"?
Is a 29% execution_mode: "unknown" rate acceptable for security-relevant automation classification?
Should the platform validate/override connector-provided classifications?

Recommendation: Add data quality metadata to entity schema, expose classification confidence levels in API responses, and provide reclassification endpoints.

1. Hypothesis

The Null Ambiguity Problem:

The current entity schema uses default values (0, null, "unknown") without distinguishing data unavailability from confirmed absence. This creates three failure modes:

1.1 Semantic Overload

execution_count_30d: 0 could mean:
- A) We queried sys_flow_context and found exactly 0 matching records
- B) We skipped execution data collection (connector config/permissions)
- C) This automation type has no deterministic execution log table (business_rule, system_execution)

1.2 Classification Gap Propagation

Connector returns execution_mode: "unknown" for 29% of flows
Platform ingests this verbatim with no validation or fallback
Every downstream consumer (UI, evaluator, reporting) inherits the gap

1.3 Temporal Data Quality Decay

No last_data_collection_timestamp field
No way to know if execution_count_30d: 0 reflects data from today or last month
Stale data presented as current
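
With a collection timestamp in place, staleness becomes checkable. A minimal Python sketch, assuming a hypothetical ISO 8601 timestamp value and a hypothetical 24-hour freshness budget (neither exists in the current schema):

```python
from datetime import datetime, timedelta, timezone


def is_execution_data_stale(collected_at, now, max_age=timedelta(hours=24)):
    """Return True when execution data is missing a timestamp or is too old.

    collected_at: ISO 8601 string, or None when no timestamp was recorded.
    now: timezone-aware datetime used as the reference point.
    max_age: assumed freshness budget; the 24-hour default is illustrative.
    """
    if collected_at is None:
        # Without a timestamp, recency cannot be proven, so treat as stale.
        return True
    # Accept a trailing "Z" on Python versions whose parser rejects it.
    parsed = datetime.fromisoformat(collected_at.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return now - parsed > max_age
```

Treating a missing timestamp as stale is a deliberate choice here: it keeps "unknown age" from being presented as current.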

Core Issue: The schema assumes the connector is always authoritative and complete. In practice, connectors have permissions gaps, API limits, and implementation bugs.

2. API Data Quality Audit

2.1 Field-by-Field Reliability Assessment

Based on analysis of 92 identity entities (77 internal_inventory, 9 dormant_authority, 5 unknown, 1 active_external):

Field	Populated	Null/Default	Trustworthy?	Notes
`display_name`	100%	0%	✅ Yes	Always present from source system
`status`	100%	0%	✅ Yes	"active" or "disabled" from source
`identitySubtype`	100%	0%	✅ Yes	Deterministic from sys_class_name
`automation_type`	100%	0%	✅ Yes	Derived from subtype (flow, business_rule, etc.)
`sys_created_by`	100%	0%	✅ Yes	ServiceNow audit field
`sys_updated_by`	100%	0%	✅ Yes	ServiceNow audit field
`triggerTypes`	~95%*	~5%	⚠️ Partial	Flow trigger types extracted from sys_hub_trigger_instance
`endpoint_url`	~15%	85% null	⚠️ Partial	Only populated when REST step detected in flow actions
`last_observed_execution_timestamp`	0%**	100% null	❌ No	ALL internal_inventory have null (0 exec → no timestamp)
`execution_count_30d`	100%	0% (all zeros)	❓ AMBIGUOUS	See §2.2
`execution_evidence_refs`	0%	100% empty	❓ AMBIGUOUS	Empty means "no executions" OR "didn't check"
`identity_binding_status`	100%	0%	✅ Yes	"bound" or "unlinked" from RUNS_AS edge resolution
`egress_host`	~15%	85% null	✅ Yes	Null correctly means "no external egress detected"
`egress_base_url`	~15%	85% null	✅ Yes	Null correctly means "no external egress detected"
`egress_category`	100%	0%	✅ Yes	"none", "internal", "external", "cloud", "llm", "unknown"
`referenced_tables`	~90%	~10% empty	✅ Yes	Extracted from flow actions/triggers
`data_domains`	100%	0%	⚠️ Partial	Falls back to "unknown" if table→domain mapping missing
`ownership_status`	100%	0%	✅ Yes	"valid" or "orphaned" from OWNED_BY edge validation
`risk_group`	100%	0%	✅ Yes	Deterministic from egress_category + data_domains
`risk_group_label`	100%	0%	✅ Yes	Display label for risk_group
`risk_group_priority`	100%	0%	✅ Yes	P1-P4 from risk_group
`execution_mode`	71%	29% "unknown"	❌ GAP	Measured on the 77 internal_inventory entities (22 unknown); see §4
`security_relevance`	100%	0%	⚠️ DERIVED	Computed from other fields; trustworthy IFF inputs are

* Some flows have an empty triggerTypes array → classified as unknown
** Within internal_inventory subset

2.2 The `execution_count_30d: 0` Problem

Observation: ALL 77 internal_inventory entities have execution_count_30d: 0.

Three Possible Interpretations:

Interpretation A: Confirmed Zero (Optimistic)

The connector successfully queried sys_flow_context for each flow and confirmed 0 matching records in the last 30 days.

Evidence supporting:

Connector code has explicit discover_flow_executions() method
Uses two-pass approach: _get_table_count() for count, then _get_table() for evidence
Returns {} (empty dict) only if count query returns 0

Evidence against:

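Zero flows with execution_count > 0 in the internal_inventory set
All 77 flows reporting zero executions in 30 days looks statistically unlikely, although §6.1 (C) shows this subset is selected on exec_count == 0, so the zero count alone proves little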
Some flows are system-default ITSM workflows (Change, Incident) — these should execute

Interpretation B: Data Collection Skipped (Pessimistic)

The connector didn't collect execution data for these flows (permissions/config/bug).

Evidence supporting:

No data_collection_timestamp field to prove recency
execution_evidence_refs: [] for all — no proof that query was attempted
Connector code has execution_data: dict[str, dict] | None = None — optional parameter

Evidence against:

Connector doesn't have a "skip execution collection" flag
Code shows execution data is collected before transform step
No error logs in connector output about missing permissions

Interpretation C: Heterogeneous (Most Likely)

Flows/Jobs: execution data WAS collected, 0 is accurate

Business Rules/System Execution: execution data CANNOT be collected (no deterministic log table)

elif subtype in ("business_rule", "system_execution"):
    # No deterministic SN-side execution log for BRs/SIs
    props["last_observed_execution_timestamp"] = None
    props["execution_count_30d"] = 0
    props["execution_evidence_refs"] = []

Conclusion: The API response conflates "no execution records found" (flows/jobs) with "no execution records exist in ServiceNow" (business_rules/system_execution). There's no field indicating data availability.

3. The "0 vs null" Problem: Proposed Schema

3.1 Current Schema (Ambiguous)

interface EntityProperties {
  execution_count_30d: number;  // 0 means ???
  execution_evidence_refs: string[];  // [] means ???
  last_observed_execution_timestamp: string | null;  // null means ???
}

3.2 Proposed Schema (Explicit Data Quality)

interface EntityProperties {
  // Execution data
  execution_count_30d: number;
  execution_evidence_refs: string[];
  last_observed_execution_timestamp: string | null;

  // NEW: Data quality metadata
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string;  // ISO timestamp
  execution_data_source?: string;  // "sys_flow_context" | "sys_trigger" | "unavailable"
  execution_data_notes?: string;  // "Permissions denied" | "No execution log for business_rule"
}

type ExecutionDataAvailability =
  | "available"           // Data was collected, count is accurate
  | "partial"             // Data was collected but incomplete (API limit, timeout)
  | "unavailable_no_log"  // Source system has no execution log for this automation type
  | "unavailable_no_access"  // Connector lacks permissions to query execution logs
  | "not_collected";      // Execution data collection was skipped (connector config)
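
On the connector side, the availability value could be set at the same point where execution properties are assigned today. A minimal Python sketch, assuming a hypothetical helper `execution_props` with hypothetical `collection_attempted` and `access_denied` flags (none of these exist in the connector code quoted in this document); the source table names come from §6.2:

```python
from datetime import datetime, timezone

# Subtypes with no deterministic execution log, per the connector excerpt in §2.2.
NO_LOG_SUBTYPES = {"business_rule", "system_execution"}
# Execution log table per subtype, taken from the table in §6.2.
SOURCE_TABLES = {
    "flow_designer_flow": "sys_flow_context",
    "scheduled_job": "sys_trigger",
}


def execution_props(subtype, count, *, collection_attempted=True,
                    access_denied=False, now=None):
    """Build execution properties with an explicit availability value.

    count: number of execution records found, or None when no result exists.
    All parameter names are hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    if subtype in NO_LOG_SUBTYPES:
        availability = "unavailable_no_log"
    elif access_denied:
        availability = "unavailable_no_access"
    elif not collection_attempted or count is None:
        availability = "not_collected"
    else:
        availability = "available"
    available = availability == "available"
    return {
        # Schema keeps this numeric; availability says whether 0 is meaningful.
        "execution_count_30d": count if available else 0,
        "execution_data_availability": availability,
        "execution_data_source": (
            SOURCE_TABLES.get(subtype, "unavailable") if available else "unavailable"
        ),
        # Only stamp a collection time when a query result was actually used.
        "execution_data_collected_at": now.isoformat() if available else None,
    }
```

The sketch never emits "partial"; detecting API limits or timeouts would need signals from the query layer that are not described here.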

3.3 Interpretation Rules

function interpretExecutionCount(entity: EntityDoc): string {
  // execution_count_30d lives under properties, alongside the availability field
  const { execution_count_30d, execution_data_availability: availability } = entity.properties;

  if (availability === "available" && execution_count_30d === 0) {
    return "Confirmed zero executions in last 30 days";
  }
  if (availability === "unavailable_no_log") {
    return "Execution count unavailable (no execution log for this automation type)";
  }
  if (availability === "not_collected") {
    return "Execution count not collected";
  }
  if (execution_count_30d > 0) {
    return `${execution_count_30d} executions in last 30 days`;
  }
  return "Unknown execution status";
}

3.4 UI Impact

Current UI (Ambiguous):

Executions (30d): 0

Proposed UI (Explicit):

Executions (30d): 0 ✓ (verified 2026-02-12)
Executions (30d): 0 ⚠ (no execution log available)
Executions (30d): — (data not collected)

4. `execution_mode` Gap Analysis

4.1 Current State

77 internal_inventory entities
45 autonomous (58%)
10 operator_assisted (13%)
22 unknown (29%) ← PROBLEM

4.2 Root Cause: Trigger Type Gaps

The connector classifies execution_mode based on triggerTypes:

# Connector: transformer.py lines 1159-1209
_AUTONOMOUS_TRIGGERS = {
    "record", "schedule", "event", "data_change",
    "record_create", "record_update", "record_create_or_update",
    "daily", "weekly", "run_once", "repeat",
}
_OPERATOR_ASSISTED_TRIGGERS = {"service_catalog", "email", "inbound_action"}
_HUMAN_TRIGGERED_TRIGGERS = {"ui_action", "manual"}

# Classification logic
if not trigger_types:
    return "unknown"  # ← GAP SOURCE 1

for tt in trigger_types:
    if tt in _AUTONOMOUS_TRIGGERS:
        return "autonomous"
# ... check operator_assisted, human_triggered ...

return "unknown"  # ← GAP SOURCE 2: unrecognized trigger type

Gap Sources:

Empty triggerTypes array — Flow has no triggers configured
Unrecognized trigger types — ServiceNow emits trigger types not in allowlist
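
The two gap sources can be told apart at classification time, which would let the platform attach a reason to each "unknown". A minimal Python sketch reusing the allowlists from the excerpt above; the function name, the returned reason strings, and the precedence between the three allowlists are assumptions, since the connector excerpt elides that part:

```python
AUTONOMOUS_TRIGGERS = {
    "record", "schedule", "event", "data_change",
    "record_create", "record_update", "record_create_or_update",
    "daily", "weekly", "run_once", "repeat",
}
OPERATOR_ASSISTED_TRIGGERS = {"service_catalog", "email", "inbound_action"}
HUMAN_TRIGGERED_TRIGGERS = {"ui_action", "manual"}


def classify_execution_mode(trigger_types):
    """Return (execution_mode, gap_reason); gap_reason is None when classified."""
    if not trigger_types:
        # Gap source 1: no triggers configured (or none extracted).
        return "unknown", "empty_trigger_types"
    # Assumed precedence: autonomous, then operator_assisted, then human_triggered.
    for mode, allowlist in (
        ("autonomous", AUTONOMOUS_TRIGGERS),
        ("operator_assisted", OPERATOR_ASSISTED_TRIGGERS),
        ("human_triggered", HUMAN_TRIGGERED_TRIGGERS),
    ):
        if any(tt in allowlist for tt in trigger_types):
            return mode, None
    # Gap source 2: every trigger type falls outside all allowlists.
    unrecognized = ", ".join(sorted(set(trigger_types)))
    return "unknown", f"unrecognized_trigger_types: {unrecognized}"
```

Logging the reason string per entity would also produce the trigger type gap statistics needed to expand the allowlists.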

Example Entity with Gap:

{
  "display_name": "Knowledge - Approval Publish",
  "identitySubtype": "flow_designer_flow",
  "triggerTypes": ["knowledge management"],  // Not in any allowlist
  "execution_mode": "unknown"  // ← Classification gap
}

4.3 Is 29% Acceptable?

Arguments FOR (acceptable gap):

Internal inventory flows are low-priority (not displayed by default)
71% classification success may be enough for low-priority inventory (the rate for security-relevant automations is not yet measured)
Trigger type allowlists can be expanded iteratively

Arguments AGAINST (blocking issue):

execution_mode is used in findings generation and risk scoring
"Unknown" execution mode prevents accurate dormant authority detection
A 29% gap means more than 1 in 4 flows can't be properly risk-assessed
Gap rate may be higher for security-relevant automations (not yet tested)

Recommendation:

✅ Acceptable for internal_inventory (hidden by default)
❌ Blocking for dormant_authority and active_external
ACTION: Collect trigger type gap stats for security-relevant subset

4.4 Should Platform Override Connector Classification?

Current: Connector is authoritative. Platform ingests execution_mode verbatim.

Option A: Platform Fallback (Conservative)

// During ingestion, apply subtype fallbacks only when the connector returned "unknown"
if (properties.execution_mode === "unknown") {
  if (properties.identitySubtype === "business_rule") {
    properties.execution_mode = "autonomous";  // Business rules always run autonomously
  }
  if (properties.identitySubtype === "system_execution") {
    properties.execution_mode = "autonomous";  // Script includes are autonomous
  }
  // For flows, keep "unknown" → user must manually classify
}

Option B: Platform Re-Classification (Aggressive)

// Platform computes execution_mode from trigger types + subtype + egress signals
// Ignore connector value entirely
// Pro: Single source of truth, consistent across connectors
// Con: Duplicates connector logic, divergence risk

Option C: Platform Validation + Override Flag (Hybrid)

interface EntityProperties {
  execution_mode: ExecutionMode;
  execution_mode_source: "connector" | "platform_override" | "user_override";
  execution_mode_confidence: "high" | "low" | "unknown";
}

// Platform validates connector value, flags low-confidence classifications
if (properties.execution_mode === "unknown" && properties.identitySubtype === "business_rule") {
  properties.execution_mode = "autonomous";
  properties.execution_mode_source = "platform_override";
  properties.execution_mode_confidence = "high";
}

Recommendation: Option C (validation + metadata). Preserves connector authority while providing quality guardrails.

5. API Improvement Proposals

5.1 Automation Summary Endpoint

Current Gap: No way to get aggregate automation stats without fetching all entities.

Proposed Endpoint: GET /api/v1/automations/summary

interface AutomationSummaryResponse {
  total_count: number;
  by_subtype: Record<string, number>;  // "flow_designer_flow": 83
  by_execution_mode: Record<string, number>;  // "autonomous": 45, "unknown": 22
  by_security_relevance: Record<string, number>;  // "internal_inventory": 77
  by_egress_category: Record<string, number>;  // "none": 77, "external": 5
  with_execution_evidence: number;  // count where execution_count_30d > 0
  with_identity_binding: number;  // count where identity_binding_status == "bound"
  classification_gaps: {
    execution_mode_unknown: number;  // 22
    trigger_types_empty: number;  // count where triggerTypes == []
    execution_data_unavailable: number;  // count where execution_data_availability != "available"
  };
}

Use Cases:

Dashboard: show automation inventory at a glance
Data quality monitoring: track classification gap rate over time
Connector validation: confirm execution data collection success

Effort: 4-6 hours (new route + aggregation pipeline)
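
The aggregation itself is small. A minimal in-memory Python sketch of the response shape above, assuming entities arrive as dicts with a `properties` mapping and that a missing classification field counts as "unknown" (both are assumptions; a production version would more likely run as a database aggregation):

```python
from collections import Counter


def summarize_automations(entities):
    """Aggregate classification counts over a list of entity dicts."""
    props = [e.get("properties", {}) for e in entities]

    def count_by(field):
        # Assumption: an absent field is reported under "unknown".
        return dict(Counter(p.get(field, "unknown") for p in props))

    return {
        "total_count": len(props),
        "by_subtype": count_by("identitySubtype"),
        "by_execution_mode": count_by("execution_mode"),
        "by_security_relevance": count_by("security_relevance"),
        "by_egress_category": count_by("egress_category"),
        "with_execution_evidence": sum(
            1 for p in props if (p.get("execution_count_30d") or 0) > 0
        ),
        "with_identity_binding": sum(
            1 for p in props if p.get("identity_binding_status") == "bound"
        ),
        "classification_gaps": {
            "execution_mode_unknown": sum(
                1 for p in props if p.get("execution_mode", "unknown") == "unknown"
            ),
            # Counts only entities whose triggerTypes is present and empty.
            "trigger_types_empty": sum(
                1 for p in props if p.get("triggerTypes") == []
            ),
            "execution_data_unavailable": sum(
                1 for p in props
                if p.get("execution_data_availability") != "available"
            ),
        },
    }
```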

5.2 Classification Override/Reclassification Endpoint

Current Gap: No way to manually override execution_mode or security_relevance when connector gets it wrong.

Proposed Endpoint: PATCH /api/v1/entities/:id/classification

interface ClassificationOverrideRequest {
  execution_mode?: ExecutionMode;
  security_relevance?: SecurityRelevance;
  override_reason?: string;  // Required when overriding connector value
}

interface ClassificationOverrideResponse {
  entity_id: string;
  previous_classification: {
    execution_mode: string;
    execution_mode_source: string;
  };
  updated_classification: {
    execution_mode: string;
    execution_mode_source: "user_override";
    override_reason: string;
    overridden_at: string;
    overridden_by: string;  // user ID from auth context
  };
}

Workflow:

User views entity detail page
Sees execution_mode: "unknown" with confidence indicator
Clicks "Manually Classify"
Selects "Autonomous" from dropdown, provides reason: "Business rule always runs on record insert"
API updates entity properties + creates audit event
Evaluator re-runs on next sync to pick up classification change

Persistence:

interface EntityProperties {
  execution_mode: ExecutionMode;
  execution_mode_source: "connector" | "platform_override" | "user_override";
  execution_mode_override_reason?: string;
  execution_mode_overridden_at?: string;
  execution_mode_overridden_by?: string;
}

Sync Behavior:

Next connector sync should NOT overwrite user override
Add protected_fields: string[] to entity metadata
During ingestion, skip updates to protected fields; if the connector value has changed since the override, flag the entity for review instead of overwriting

Effort: 8-12 hours (endpoint + UI + sync protection logic)
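
The sync protection step could be sketched as a merge that leaves protected fields untouched and reports what it skipped. This is a minimal Python sketch under one reading of the sync behavior: a differing connector value on a protected field is surfaced for review, never applied. The function name and return shape are hypothetical.

```python
def merge_connector_properties(stored, incoming, protected_fields):
    """Merge connector properties into stored ones without touching protected fields.

    Returns (merged, skipped). `skipped` maps each protected field whose
    connector value differs from the stored value to that connector value,
    so the difference can be reviewed rather than silently dropped.
    """
    protected = set(protected_fields)
    merged = dict(stored)  # copy, so the stored document is not mutated
    skipped = {}
    for key, value in incoming.items():
        if key in protected:
            if stored.get(key) != value:
                skipped[key] = value
            continue
        merged[key] = value
    return merged, skipped
```

For example, with `execution_mode` protected by a user override, a sync that still reports "unknown" updates the execution count but keeps the manual classification:

```python
stored = {"execution_mode": "autonomous", "execution_count_30d": 0}
incoming = {"execution_mode": "unknown", "execution_count_30d": 5}
merged, skipped = merge_connector_properties(stored, incoming, ["execution_mode"])
# merged keeps "autonomous" and takes the new count; skipped records "unknown"
```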

5.3 Data Quality Indicators per Entity

Current Gap: No visibility into which entity fields are trustworthy vs. defaulted/stale.

Proposed Addition: data_quality metadata in entity response

interface EntityDoc {
  // ... existing fields ...
  data_quality: DataQualityReport;
}

interface DataQualityReport {
  overall_score: number;  // 0-100, weighted sum of field confidence
  field_confidence: Record<string, FieldConfidence>;
  warnings: string[];  // ["execution_mode classification unknown", "no execution data collected"]
  last_validated_at?: string;
}

interface FieldConfidence {
  level: "high" | "medium" | "low" | "unavailable";
  source: "connector" | "platform_derived" | "user_override" | "default";
  collected_at?: string;
  notes?: string;
}

// Example
{
  "data_quality": {
    "overall_score": 72,
    "field_confidence": {
      "execution_mode": {
        "level": "low",
        "source": "connector",
        "notes": "Unrecognized trigger type 'knowledge management'"
      },
      "execution_count_30d": {
        "level": "high",
        "source": "connector",
        "collected_at": "2026-02-12T14:23:00Z"
      },
      "egress_category": {
        "level": "high",
        "source": "platform_derived",
        "notes": "Derived from endpoint_url analysis"
      }
    },
    "warnings": [
      "execution_mode classification unknown - manual review recommended",
      "No execution evidence in last 30 days"
    ]
  }
}

UI Impact:

Entity detail page shows data quality score badge (🟢 High / 🟡 Medium / 🔴 Low)
Field-level tooltips explain confidence level
Warnings surface in "Data Quality" tab

Effort: 12-16 hours (schema extension + computation logic + UI)

5.4 Filter: `classification_status=incomplete`

Current Gap: No way to find entities that need manual review/classification.

Proposed Query Parameter: GET /api/v1/entities?classification_status=incomplete

Logic:

function isClassificationIncomplete(entity: EntityDoc): boolean {
  return (
    entity.properties.execution_mode === "unknown" ||
    entity.properties.security_relevance === "unknown" ||
    entity.properties.execution_data_availability === "not_collected" ||
    (entity.properties.triggerTypes?.length === 0 && entity.properties.identitySubtype === "flow_designer_flow")
  );
}

Use Cases:

Connector validation: "Show me all automations with classification gaps"
User task list: "Review these 22 flows with unknown execution mode"
Data quality dashboard: "Incomplete classifications: 22 of 92 (24%)"

Implementation:

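// In MongoStorageAdapter.queryEntities()
if (query.classificationStatus === "incomplete") {
  // Mirrors isClassificationIncomplete(), including the empty-triggerTypes check for flows
  filter.$or = [
    { "properties.execution_mode": "unknown" },
    { "properties.security_relevance": "unknown" },
    { "properties.execution_data_availability": "not_collected" },
    { "properties.identitySubtype": "flow_designer_flow", "properties.triggerTypes": { $size: 0 } },
  ];
}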

Effort: 2-3 hours (query parameter + filter logic)

6. Collaboration with INTEGRATOR

6.1 Question: Is It Suspicious That All 77 Entities Have execution_count_30d: 0?

Data Point: 77 internal_inventory flows, 100% have execution_count_30d: 0.

Possible Explanations:

A. Accurate Reflection of Reality

These are template flows, system-default workflows, or disabled automations
They genuinely have not executed in the last 30 days
The connector correctly queried sys_flow_context and found 0 matching records

B. Data Collection Issue

Connector has a bug in execution data collection
Permissions issue: can't read sys_flow_context table
API limit: only fetched execution data for first N flows, rest defaulted to 0

C. Classification Filter Bias

The security_relevance classification logic is:

if has_external_egress and exec_count > 0:
    props["security_relevance"] = "active_external"
elif has_external_egress or binding == "bound":
    props["security_relevance"] = "dormant_authority"
elif exec_count > 0:
    props["security_relevance"] = "dormant_authority"
else:
    props["security_relevance"] = "internal_inventory"

By definition, anything in internal_inventory MUST have exec_count == 0 (otherwise it would be dormant_authority).

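So the question becomes: Are there ANY flows in the full 92-entity dataset with execution_count_30d > 0? Under the logic above, the single active_external entity can only have been classified that way with exec_count > 0, so it is the first record to check.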
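
The selection effect can be confirmed by enumerating inputs. A minimal Python sketch that wraps the classification branches quoted above in a function (the function names and the sample execution counts are illustrative, not connector code) and lists every input combination that lands in internal_inventory:

```python
from itertools import product


def classify_security_relevance(has_external_egress, binding, exec_count):
    """Same branch order as the connector excerpt above."""
    if has_external_egress and exec_count > 0:
        return "active_external"
    if has_external_egress or binding == "bound":
        return "dormant_authority"
    if exec_count > 0:
        return "dormant_authority"
    return "internal_inventory"


def internal_inventory_inputs(exec_counts=(0, 1, 25)):
    """Return every (egress, binding, exec_count) classified as internal_inventory."""
    return [
        (egress, binding, count)
        for egress, binding, count in product(
            (False, True), ("bound", "unlinked"), exec_counts
        )
        if classify_security_relevance(egress, binding, count) == "internal_inventory"
    ]
```

The only combination returned is no external egress, unlinked binding, and a count of 0, so a zero count across the internal_inventory subset says nothing about whether collection works.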

INTEGRATOR Action Items:

Run connector with debug logging: confirm execution data collection was attempted for all flows
Check sys_flow_context table permissions: can the OAuth integration read it?
Manually query sys_flow_context in ServiceNow for 2-3 sample flows: confirm 0 records exist
Check for flows in dormant_authority or active_external with execution_count_30d > 0 → proves collection works

Expected Outcome:

If collection works: some flows should have exec_count > 0
If all 92 entities have exec_count=0: connector bug or permissions issue

6.2 Question: Can Execution Data Collection Be Improved?

Current Limitations:

Automation Type	Execution Log Table	Deterministic Join?	Supported?
Flow Designer Flow	`sys_flow_context`	Yes (flow reference)	✅ Yes
Scheduled Job	`sys_trigger`	Yes (document reference)	✅ Yes
Business Rule	❌ None	N/A	❌ No
System Execution (Script Include)	❌ None	N/A	❌ No

Potential ServiceNow APIs to Explore:

syslog Table (sys_log)
- Generic execution log for scripts, business rules, scheduled jobs
- Contains: timestamp, source (script name), message, level
- Join: fuzzy match on source field (not deterministic)
- Risk: high false positive rate, noise from unrelated logs
System Execution Tracker (sys_execution_tracker)
- Tracks long-running jobs and async operations
- May contain business rule executions if they take >N seconds
- Join: source_table + source field
- Risk: only captures slow executions, not representative
Table History (sys_audit)
- Tracks record changes (insert, update, delete)
- Business rules execute on these events
- Indirect signal: if sys_audit shows record changes on tables that have business_rule triggers, infer execution
- Risk: correlation, not causation
Flow Designer Execution Context (sys_hub_action_instance)
- Granular action-level execution log (individual steps within a flow)
- Join: flow reference
- Benefit: proves flow executed AND which steps ran (egress actions)
- Connector currently uses sys_flow_context (flow-level) — sys_hub_action_instance is more detailed

INTEGRATOR Recommendations:

Priority 1: Validate sys_flow_context collection is working (see §6.1)
Priority 2: Explore sys_hub_action_instance for action-level execution evidence
Priority 3: Research sys_log for business_rule execution inference (high effort, low confidence)

If execution data is truly unavailable for business_rules:

Set execution_data_availability: "unavailable_no_log" explicitly
Update UI to show "Execution count unavailable for this automation type"
Don't default to 0 — use null or a sentinel value (-1)

7. Schema Enhancement Proposals

7.1 Connector-Side Schema (NormalizedNode)

File: /Users/lucky/dev/securityv0/sv0-platform/src/ingestion/types.ts

Current:

export interface NormalizedNode {
  nodeId: string;
  nodeType: NormalizedNodeType;
  sourceSystem: string;
  sourceId: string;
  displayName: string;
  status: NodeStatus;
  createdAt?: string;
  lastModifiedAt?: string;
  properties: Record<string, unknown>;  // ← Unstructured
}

Proposed Addition (Automation Properties):

// New type for automation-specific properties
export interface AutomationProperties {
  // Existing fields
  identitySubtype: IdentitySubtype;
  automation_type: string;
  triggerTypes?: string[];
  endpoint_url?: string | null;

  // Execution evidence
  execution_count_30d: number;
  execution_evidence_refs: string[];
  last_observed_execution_timestamp?: string | null;

  // NEW: Data quality metadata
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string;
  execution_data_source?: string;
  execution_data_notes?: string;

  // Classification
  execution_mode: ExecutionMode;
  execution_mode_confidence: "high" | "low" | "unknown";
  security_relevance: SecurityRelevance;

  // Egress
  egress_category: EgressCategory;
  egress_host?: string | null;
  egress_base_url?: string | null;

  // Identity binding
  identity_binding_status: "bound" | "unlinked";

  // Risk assessment
  risk_group: string;
  risk_group_label: string;
  risk_group_priority: string;
  ownership_status: "valid" | "orphaned";

  // Referenced data
  referenced_tables?: string[];
  data_domains?: string[];
}

export type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
export type SecurityRelevance = "active_external" | "dormant_authority" | "internal_inventory" | "unknown";
export type EgressCategory = "none" | "internal" | "external" | "cloud" | "llm" | "unknown";
export type ExecutionDataAvailability =
  | "available"
  | "partial"
  | "unavailable_no_log"
  | "unavailable_no_access"
  | "not_collected";
export type IdentitySubtype =
  | "flow_designer_flow"
  | "business_rule"
  | "scheduled_job"
  | "system_execution"
  | "oauth_app"
  | "service_principal";

Migration Strategy:

Add types to ingestion/types.ts
Connector already emits the existing properties (they're in properties: Record<string, unknown>); the fields marked NEW still need connector support
Platform ingestion validates against type (runtime check, not compile-time)
UI can now type-safely access entity.properties.execution_mode as ExecutionMode

Effort: 2-3 hours (type definitions + validation)
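
The runtime check in the migration strategy could start as a plain validator that reports problems instead of raising. A minimal Python sketch covering a few of the fields above; the function name and message wording are hypothetical, and the allowed values are copied from the type definitions in this section:

```python
EXECUTION_MODES = {"autonomous", "operator_assisted", "human_triggered", "unknown"}
SECURITY_RELEVANCE = {
    "active_external", "dormant_authority", "internal_inventory", "unknown",
}
EGRESS_CATEGORIES = {"none", "internal", "external", "cloud", "llm", "unknown"}


def validate_automation_properties(props):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    enum_fields = {
        "execution_mode": EXECUTION_MODES,
        "security_relevance": SECURITY_RELEVANCE,
        "egress_category": EGRESS_CATEGORIES,
    }
    for field, allowed in enum_fields.items():
        value = props.get(field)
        # Guard with isinstance so unhashable values do not raise on lookup.
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")

    count = props.get("execution_count_30d")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or count < 0:
        problems.append(f"execution_count_30d: expected non-negative int, got {count!r}")

    refs = props.get("execution_evidence_refs")
    if not isinstance(refs, list) or not all(isinstance(r, str) for r in refs):
        problems.append("execution_evidence_refs: expected list of strings")
    return problems
```

Returning problems rather than raising lets ingestion store the entity and record the issues as data quality warnings.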

7.2 Platform-Side Schema (EntityDoc)

File: /Users/lucky/dev/securityv0/sv0-platform/src/domain/entities/types.ts

Current:

export interface EntityDoc {
  _id: string;
  tenant_id: string;
  entity_type: EntityType;
  source_system: string;
  source_id: string;
  properties: Record<string, unknown>;  // ← Unstructured
  relationships: EntityRelationship[];
  execution_paths?: ExecutionPath[];
  accessible_by?: AccessibleByEntry[];
  sync_version: number;
  last_synced_at: Date;
  created_at: Date;
  updated_at: Date;
}

Proposed Addition:

export interface EntityDoc {
  // ... existing fields ...

  // NEW: Data quality metadata
  data_quality?: DataQualityReport;

  // NEW: User overrides
  user_overrides?: UserOverrideMetadata;
}

export interface DataQualityReport {
  overall_score: number;  // 0-100
  field_confidence: Record<string, FieldConfidence>;
  warnings: string[];
  last_validated_at?: Date;
}

export interface FieldConfidence {
  level: "high" | "medium" | "low" | "unavailable";
  source: "connector" | "platform_derived" | "user_override" | "default";
  collected_at?: Date;
  notes?: string;
}

export interface UserOverrideMetadata {
  protected_fields: string[];  // Fields that won't be overwritten by connector sync
  overrides: Record<string, FieldOverride>;
}

export interface FieldOverride {
  field_name: string;
  original_value: unknown;
  override_value: unknown;
  override_reason: string;
  overridden_at: Date;
  overridden_by: string;  // user ID
}

Computation Logic (during ingestion):

// In ingestion/normalizer.ts
function computeDataQuality(entity: EntityDoc): DataQualityReport {
  const confidence: Record<string, FieldConfidence> = {};
  const warnings: string[] = [];

  if (entity.properties.execution_mode === "unknown") {
    confidence.execution_mode = {
      level: "low",
      source: "connector",
      notes: "Trigger type not recognized by connector"
    };
    warnings.push("execution_mode classification unknown - manual review recommended");
  } else {
    confidence.execution_mode = {
      level: "high",
      source: "connector",
      collected_at: new Date(entity.last_synced_at)
    };
  }

  if (entity.properties.execution_count_30d === 0 && !entity.properties.execution_data_availability) {
    confidence.execution_count_30d = {
      level: "medium",
      source: "connector",
      notes: "Zero count, but availability status unknown"
    };
    warnings.push("Execution count is 0 - unclear if data was collected");
  }

  // ... more field checks ...

  const overall_score = computeOverallScore(confidence);

  return {
    overall_score,
    field_confidence: confidence,
    warnings,
    last_validated_at: new Date()
  };
}

Effort: 8-12 hours (schema + computation + storage)
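
computeOverallScore is referenced above but not defined. One possible reading of "weighted sum of field confidence" is a weighted average of per-level scores, which keeps the result in the 0-100 range. A minimal Python sketch; the level scores and default weight are assumptions, not values from the platform:

```python
# Assumed score per confidence level; these numbers are illustrative only.
LEVEL_SCORES = {"high": 100, "medium": 60, "low": 30, "unavailable": 0}


def compute_overall_score(field_confidence, weights=None):
    """Weighted average of per-field level scores, rounded to an int in 0-100.

    field_confidence: mapping of field name to a dict with a "level" key.
    weights: optional mapping of field name to weight; missing fields get 1.0.
    """
    weights = weights or {}
    weighted_sum = 0.0
    total_weight = 0.0
    for field, confidence in field_confidence.items():
        weight = weights.get(field, 1.0)
        weighted_sum += weight * LEVEL_SCORES[confidence["level"]]
        total_weight += weight
    if total_weight == 0:
        # No fields assessed (or all weights zero): report the lowest score.
        return 0
    return round(weighted_sum / total_weight)
```

Weighting lets security-critical fields such as execution_mode pull the score down harder than display-only fields.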

7.3 MongoDB Index Additions

File: /Users/lucky/dev/securityv0/sv0-platform/src/storage/mongo/collections.ts

Proposed Indexes:

// For classification_status filter
await entities.createIndex({
  tenant_id: 1,
  "properties.execution_mode": 1,
  "properties.security_relevance": 1
});

// For data quality queries
await entities.createIndex({
  tenant_id: 1,
  "data_quality.overall_score": 1
});

// For user override tracking
await entities.createIndex({
  tenant_id: 1,
  "user_overrides.protected_fields": 1
});

Effort: 1 hour (index creation + migration script)

8. Pre-Ingest Filter Analysis

8.1 Current State

Connector Code: transformer.py lines 103-111

# Optionally filter internal_inventory automations (connector-side pre-filter).
# Default OFF to preserve Phase 1 inventory completeness gate.
if filter_internal_inventory:
    filtered_count = self._filter_internal_inventory()
    if filtered_count > 0:
        logging.getLogger(__name__).info(
            "Filtered %d internal_inventory automation(s) from NormalizedGraph output",
            filtered_count,
        )

Filter Logic: Lines 1211-1260

def _filter_internal_inventory(self) -> int:
    """Remove internal_inventory automation nodes and their orphaned edges/owner nodes.

    Filtering criteria: security_relevance == "internal_inventory" means:
    - egress_category in (none, internal, unknown)
    - identity_binding_status == "unlinked"
    - execution_count_30d == 0
    """
    # Find automation node IDs to remove
    remove_node_ids: set[str] = set()
    for node in self._nodes:
        if node.get("nodeType") == "autonomous_identity":
            rel = node.get("properties", {}).get("security_relevance")
            if rel == "internal_inventory":
                remove_node_ids.add(node["nodeId"])

    # Remove nodes
    self._nodes = [n for n in self._nodes if n["nodeId"] not in remove_node_ids]

    # Remove orphaned edges
    self._edges = [
        e for e in self._edges
        if e["sourceNodeId"] not in remove_node_ids
        and e["targetNodeId"] not in remove_node_ids
    ]

    # Remove orphaned owner nodes (OWNED_BY targets with no other edges)
    # ... (omitted for brevity)

    return len(remove_node_ids)

8.2 Question: Should Filtering Happen Pre-Ingest or Post-Ingest?

Option A: Pre-Ingest (Current Implementation, Disabled by Default)

Pros:

Reduces entity count before platform ingestion (lower storage, faster queries)
Simplifies platform by not storing irrelevant data
Graph layout is immediately clean (no 77 internal_inventory nodes)

Cons:

Inventory incompleteness — can't retroactively include entities if criteria change
Audit gap — no record that these automations exist in the source system
Temporal loss — can't track when internal_inventory automations become security-relevant
Discovery validation impossible — can't prove connector scanned all flows if some are filtered out

Option B: Post-Ingest (Platform Filters)

Pros:

Complete inventory — every discovered automation is stored
Temporal tracking — can see when execution_count changes from 0 → N (dormant → active)
Audit trail — proves connector scanned all entities, none were lost
Flexible filtering — UI can show/hide internal_inventory on demand
Reclassification — if connector gets security_relevance wrong, platform can override

Cons:

Higher entity count (92 instead of 15)
Requires UI/API default filters to hide noise
Graph layout requires filtering logic

Recommendation: Option B (Post-Ingest with Default Filters)

Rationale:

Discovery is broad, analysis is narrow — connector should discover everything, platform should filter for security relevance
Temporal use case — a flow with 0 executions today may have 10 executions tomorrow. If it's filtered pre-ingest, we lose that transition.
Audit/compliance — "How many automations exist in ServiceNow?" should be 92, not 15
Data quality validation — can compare connector output to manual ServiceNow queries only if all entities are ingested
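
The temporal use case depends on both snapshots being stored. A minimal Python sketch of the dormant-to-active check between two syncs, assuming each snapshot is a dict mapping entity ID to its properties (the function name and snapshot shape are hypothetical):

```python
def detect_activations(previous, current):
    """Return sorted IDs whose execution_count_30d went from 0 to a positive value.

    Entities absent from `previous` are ignored: without an earlier record
    there is no transition to show, which is exactly what pre-ingest
    filtering would cause for internal_inventory automations.
    """
    activated = []
    for entity_id, props in current.items():
        before = previous.get(entity_id)
        if before is None:
            continue
        was_zero = before.get("execution_count_30d") == 0
        now_positive = (props.get("execution_count_30d") or 0) > 0
        if was_zero and now_positive:
            activated.append(entity_id)
    return sorted(activated)
```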

Implementation:

Keep filter_internal_inventory: bool = False (default OFF in connector)
Add default filter to platform API: GET /api/v1/entities?entity_type=identity&security_relevance!=internal_inventory
Add default filter to UI Automations page
Graph browse mode defaults to same filter (see automation-filtering-graph-strategy.md §S1)

Migration Path:

Currently deployed: internal_inventory entities ARE ingested (filter is off)
No migration needed
Just add default filters to API/UI

8.3 When Should Pre-Ingest Filtering Be Used?

Valid Use Cases:

Connector has a bug that discovers non-existent/duplicate entities → filter in connector until bug is fixed
Source system permissions limit — can't query execution data for some entities → filter them to avoid misleading 0 counts
Scale issues — 10,000+ automations discovered, platform can't handle load → filter to top N by relevance

Invalid Use Cases:

Hiding false positives — this should be done via UI filters, not pre-ingest
Improving graph layout — this is a UI problem, not a data problem
"Cleaning up" the inventory — defeats the purpose of deterministic discovery

Recommendation for sv0-connectors:

Remove filter_internal_inventory parameter entirely once the platform default filters are in place (simplifies connector interface)
Always emit all discovered entities
Let platform handle relevance filtering

9. Challenge Questions for Other Roles

9.1 For Product Owner

Q1: Should we expose data quality confidence levels in the UI?

Context: 29% of flows have execution_mode: "unknown". Currently, the UI shows this as a plain value. Should we add visual indicators (🟢 High / 🟡 Low / 🔴 Unknown) to signal data quality?

Impact:

Users can prioritize manual review of low-confidence entities
Reduces false confidence in incomplete data
Adds visual noise to UI

Recommendation: Yes, but make it subtle (icon + tooltip, not full badge).

Q2: Is "Show all automations" a toggle or a separate page?

Context: 77 of 92 automations are internal_inventory (hidden by default). Should users see them via:

A. Toggle switch "Show internal inventory" on main Automations page
B. Separate "Automation Inventory (All)" page
C. Filter dropdown: "Security-relevant only" / "All automations"

Recommendation: Option C (filter dropdown) — most flexible, matches existing filter patterns.

Q3: Should incomplete classification block findings generation?

Context: If execution_mode: "unknown", should the evaluator still generate dormant_authority findings, or skip the entity?

Options:

A. Block: conservative, avoids false positives, but reduces finding coverage
B. Allow: aggressive, treats "unknown" as "autonomous" (assume worst case)
C. Flag: generate finding but mark it as "low_confidence"

Recommendation: Option C (flag with low_confidence).
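
The flag option could look roughly like the sketch below. The Finding shape and the buildDormantAuthorityFinding helper are assumptions made for illustration; the evaluator's real finding schema is not shown in this analysis.

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";

// Hypothetical finding shape; only the confidence flag matters for this sketch.
interface Finding {
  entityId: string;
  type: "dormant_authority";
  confidence: "normal" | "low_confidence";
}

// The entity is never skipped; an unknown execution_mode only downgrades confidence.
function buildDormantAuthorityFinding(entityId: string, executionMode: ExecutionMode): Finding {
  return {
    entityId,
    type: "dormant_authority",
    confidence: executionMode === "unknown" ? "low_confidence" : "normal",
  };
}
```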

9.2 For CISO

Q1: Is "incomplete classification" itself a finding we should surface?

Example Finding:

Finding Type: INCOMPLETE_AUTOMATION_CLASSIFICATION
Severity: Low
Title: 22 automations with unknown execution mode
Description: 22 of 92 discovered automations have execution_mode="unknown" due to
             unrecognized trigger types. Manual classification recommended to ensure
             complete risk assessment.
Evidence:
  - Automation IDs: [list of 22 entity IDs]
  - Trigger types causing gaps: ["knowledge management", "email handler", ...]
  - Recommendation: Expand connector trigger type allowlist or manually classify

Pro: Surfaces data quality gaps as actionable items
Con: Not a security risk per se, more of an operational issue

Recommendation: Yes, but as severity "Informational" (not Low/Medium/High).

Q2: What is the acceptable classification gap rate?

Current: 29% of flows have execution_mode: "unknown"

Question: What threshold should trigger an alert?

0% (perfect classification required)?
<10% (acceptable noise)?
<30% (current state is acceptable)?

Recommendation: <10% for security-relevant automations, <50% for internal_inventory.
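
A short sketch of how the two thresholds could be checked. It assumes that every entity not marked internal_inventory (including security_relevance "unknown") counts as security-relevant; that grouping is an assumption, not something this analysis specifies.

```typescript
interface AutomationLike {
  security_relevance: string;
  execution_mode: string;
}

// Thresholds from the recommendation: <10% for security-relevant, <50% for internal_inventory.
const MAX_GAP_RATE = { securityRelevant: 0.1, internalInventory: 0.5 };

// Share of entities whose execution_mode is "unknown" (0 for an empty group).
function gapRate(entities: AutomationLike[]): number {
  if (entities.length === 0) return 0;
  const unknown = entities.filter((e) => e.execution_mode === "unknown").length;
  return unknown / entities.length;
}

// True means the group's gap rate has reached its threshold and should alert.
function gapAlerts(entities: AutomationLike[]): { securityRelevant: boolean; internalInventory: boolean } {
  const internal = entities.filter((e) => e.security_relevance === "internal_inventory");
  // Assumption: anything that is not internal_inventory is treated as security-relevant.
  const relevant = entities.filter((e) => e.security_relevance !== "internal_inventory");
  return {
    securityRelevant: gapRate(relevant) >= MAX_GAP_RATE.securityRelevant,
    internalInventory: gapRate(internal) >= MAX_GAP_RATE.internalInventory,
  };
}
```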

Q3: Should we trust execution_count=0 or require manual verification?

Context: ALL 77 internal_inventory flows have execution_count_30d: 0. No execution_data_availability metadata to confirm this is accurate.

Options:

A. Trust it: assume connector is correct, proceed with analysis
B. Flag it: show warning "Execution count may be incomplete"
C. Block it: require INTEGRATOR to validate before accepting data

Recommendation: Flag it (option B) until INTEGRATOR confirms collection works (see §6).

9.3 For Architect

Q1: Should classification be a platform concern or remain connector-side?

Current: Connector computes execution_mode, security_relevance, risk_group. Platform ingests verbatim.

Alternative: Platform recomputes these during ingestion based on normalized properties.

Pros of Platform Classification:

Single source of truth
Consistent across all connectors
Easier to evolve classification logic (no connector updates needed)

Cons:

Duplicates logic between connector and platform
Connector loses autonomy
What if connector has better context (e.g., ServiceNow-specific trigger types)?

Recommendation: Hybrid — connector provides raw signals (trigger types, egress URLs), platform derives classification. Connector can provide hints, but platform is authoritative.

Q2: Should execution_data_availability be part of the NormalizedGraph schema?

Context: Proposed new field to distinguish "confirmed zero" from "data not collected".

Question: Should this be:

A. Required field in NormalizedGraph (connector MUST provide it)
B. Optional field (connector MAY provide it, platform infers if absent)
C. Platform-computed only (connector doesn't emit it, platform adds during ingestion)

Recommendation: Option A (required). Data quality is critical — connectors should explicitly declare availability.

Q3: Should we support connector-to-platform data quality feedback?

Context: Connector knows when it hits API limits, permissions errors, or timeouts during data collection.

Proposed: Add collectionWarnings to NormalizedGraph:

export interface NormalizedGraph {
  // ... existing fields ...
  collectionWarnings?: CollectionWarning[];
}

export interface CollectionWarning {
  field: string;  // "execution_count_30d"
  severity: "info" | "warning" | "error";
  message: string;  // "API limit reached, execution count may be incomplete"
  affected_entities?: string[];  // nodeIds
}

Benefit: Platform can surface connector issues in UI, not just logs.

Recommendation: Yes — this closes the feedback loop between connector and platform.
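
To surface warnings in the UI rather than only in logs, the platform would need to look them up per entity. A minimal sketch, assuming a hypothetical indexWarnings helper and treating warnings without affected_entities as graph-wide:

```typescript
interface CollectionWarning {
  field: string;
  severity: "info" | "warning" | "error";
  message: string;
  affected_entities?: string[]; // nodeIds
}

// Groups warnings by nodeId so an entity detail view can show its own warnings.
// Warnings that name no entities are returned separately as graph-wide.
function indexWarnings(warnings: CollectionWarning[]): {
  byEntity: Record<string, CollectionWarning[]>;
  graphWide: CollectionWarning[];
} {
  const byEntity: Record<string, CollectionWarning[]> = {};
  const graphWide: CollectionWarning[] = [];
  for (const w of warnings) {
    if (!w.affected_entities || w.affected_entities.length === 0) {
      graphWide.push(w);
      continue;
    }
    for (const nodeId of w.affected_entities) {
      const list = byEntity[nodeId] ?? [];
      list.push(w);
      byEntity[nodeId] = list;
    }
  }
  return { byEntity, graphWide };
}
```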

9.4 For Integrator

Q1: What additional ServiceNow APIs would improve execution_count reliability?

Current: Uses sys_flow_context (flow-level) and sys_trigger (job-level).

Gaps:

Business rules: no execution log
Flows: only flow-level count, not action-level detail

Proposed Research:

sys_hub_action_instance — action-level execution log (which flow steps ran)
sys_log — generic script execution log (may contain business rule executions)
sys_execution_tracker — long-running job tracker
sys_audit — table history (indirect signal for business rule executions)

Question: Which of these are feasible with OAuth app permissions?

Expected Effort: 4-8 hours research + testing

Q2: Can we get last_modified_date for flows?

Context: Currently missing from entity properties. Would help identify recently-edited flows (potential new risk).

ServiceNow Field: sys_updated_on in sys_hub_flow table

Question: Already collected but not emitted, or not collected?

Action: Check connector code, add to properties if available.

Q3: Should connector emit a "data collection report" after each scan?

Proposed: After discovery, connector emits a summary. The key metric is flows_with_execution_data.

{
  "collection_summary": {
    "flows_discovered": 83,
    "flows_with_execution_data": 0,
    "flows_skipped_no_permissions": 0,
    "execution_data_sources": ["sys_flow_context", "sys_trigger"],
    "collection_duration_seconds": 42,
    "api_calls_made": 156,
    "api_limits_hit": 0
  }
}

Benefit: Immediate visibility into connector health, not just entity data.

Recommendation: Yes — include in connector sync metadata.
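
A sketch of how the platform might read such a summary. The checkCollectionHealth helper and its messages are hypothetical; the field names follow the proposed collection_summary.

```typescript
interface CollectionSummary {
  flows_discovered: number;
  flows_with_execution_data: number;
  flows_skipped_no_permissions: number;
  api_limits_hit: number;
}

// Returns human-readable health flags; an empty array means nothing looked suspicious.
function checkCollectionHealth(s: CollectionSummary): string[] {
  const flags: string[] = [];
  if (s.flows_discovered > 0 && s.flows_with_execution_data === 0) {
    flags.push("no flow has execution data: zero counts may reflect a collection gap");
  }
  if (s.flows_skipped_no_permissions > 0) {
    flags.push(`${s.flows_skipped_no_permissions} flows skipped for missing permissions`);
  }
  if (s.api_limits_hit > 0) {
    flags.push("API limits hit: execution counts may be incomplete");
  }
  return flags;
}
```

Run against the example summary above (83 flows discovered, 0 with execution data), this returns a single flag for the missing execution data.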

10. Summary & Recommendations

10.1 Critical Issues (Blocking)

Issue	Impact	Recommendation	Effort
Null Ambiguity	Can't distinguish "confirmed zero" from "not checked"	Add `execution_data_availability` to schema	8-12h
29% execution_mode Gap	Can't classify 1 in 3 automations	Expand trigger type allowlist + platform fallback	4-6h
No Data Quality Metadata	Can't assess confidence in entity properties	Add `data_quality` to EntityDoc	12-16h

10.2 High-Value Improvements (Recommended)

Feature	Use Case	Effort
Automation Summary Endpoint	Dashboard stats, connector validation	4-6h
Classification Override API	Manual review workflow	8-12h
`classification_status=incomplete` Filter	Find entities needing review	2-3h
Pre-Ingest Filter Removal	Preserve inventory completeness	1h

10.3 INTEGRATOR Action Items

Priority 1: Validate execution data collection works (check for flows with execution_count_30d > 0)
Priority 2: Research sys_hub_action_instance for action-level execution evidence
Priority 3: Add last_modified_date to flow properties
Priority 4: Emit data collection summary in sync metadata

10.4 Platform Schema Enhancements

// 1. Add to NormalizedNode properties (connector emits)
interface AutomationProperties {
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string;
  execution_mode_confidence: "high" | "low" | "unknown";
}

// 2. Add to EntityDoc (platform computes)
interface EntityDoc {
  data_quality?: DataQualityReport;
  user_overrides?: UserOverrideMetadata;
}

// 3. Add to NormalizedGraph (connector emits)
interface NormalizedGraph {
  collectionWarnings?: CollectionWarning[];
}

10.5 API Additions

GET  /api/v1/automations/summary
GET  /api/v1/entities?classification_status=incomplete
PATCH /api/v1/entities/:id/classification

10.6 Answers to Core Questions

Q: Does execution_count_30d: 0 mean "we checked and found zero" or "we didn't check"?
A: Currently ambiguous. Recommendation: Add execution_data_availability field to make this explicit.

Q: Is a 29% execution_mode: "unknown" rate acceptable?
A: Acceptable for internal_inventory (hidden by default), blocking for security-relevant automations. Recommendation: Platform fallback for known subtypes (business_rule → autonomous).

Q: Should the platform have a fallback classification if the connector returns "unknown"?
A: Yes, with metadata indicating override. Use execution_mode_source: "platform_override" to track provenance.
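
A minimal sketch of such a fallback with provenance tracking. The applyPlatformFallback helper and the SUBTYPE_FALLBACK table are hypothetical; business_rule → autonomous is the only mapping named in this analysis, so it is the only entry.

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
type ExecutionModeSource = "connector" | "platform_override" | "user_override";

interface Classified {
  identitySubtype: string;
  execution_mode: ExecutionMode;
  execution_mode_source?: ExecutionModeSource;
}

// Subtype fallbacks. Further entries would need to be agreed separately.
const SUBTYPE_FALLBACK: Record<string, ExecutionMode> = {
  business_rule: "autonomous",
};

function applyPlatformFallback(entity: Classified): Classified {
  const fallback = SUBTYPE_FALLBACK[entity.identitySubtype];
  if (entity.execution_mode !== "unknown" || fallback === undefined) {
    return entity; // connector or user value stands; nothing to override
  }
  // Record that the platform, not the connector, produced this value.
  return { ...entity, execution_mode: fallback, execution_mode_source: "platform_override" };
}
```

Values that are already classified are returned untouched, so the fallback cannot overwrite a connector result or a user override.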

Appendix A: TypeScript Type Definitions

File: /Users/lucky/dev/securityv0/sv0-platform/src/domain/entities/automation-types.ts (new)

/**
 * Automation-specific types for identity entities.
 * These types provide structure for properties that were previously untyped (Record<string, unknown>).
 */

export type IdentitySubtype =
  | "flow_designer_flow"
  | "business_rule"
  | "scheduled_job"
  | "system_execution"
  | "oauth_app"
  | "service_principal";

export type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";

export type SecurityRelevance =
  | "active_external"      // Has external egress + execution evidence
  | "dormant_authority"    // Has capability but no recent execution
  | "internal_inventory"   // No external egress, no execution, unlinked
  | "unknown";

export type EgressCategory = "none" | "internal" | "external" | "cloud" | "llm" | "unknown";

export type ExecutionDataAvailability =
  | "available"                 // Data was collected, count is accurate
  | "partial"                   // Data was collected but incomplete (API limit, timeout)
  | "unavailable_no_log"        // Source system has no execution log for this automation type
  | "unavailable_no_access"     // Connector lacks permissions to query execution logs
  | "not_collected";            // Execution data collection was skipped

export interface AutomationProperties {
  // Identity classification
  identitySubtype: IdentitySubtype;
  automation_type: string;  // "flow", "business_rule", "job", "script"

  // Trigger configuration
  triggerTypes?: string[];

  // Execution evidence
  execution_count_30d: number;
  execution_evidence_refs: string[];
  last_observed_execution_timestamp?: string | null;

  // Data quality metadata
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string;  // ISO 8601 timestamp
  execution_data_source?: string;  // "sys_flow_context" | "sys_trigger" | "unavailable"
  execution_data_notes?: string;

  // Classification
  execution_mode: ExecutionMode;
  execution_mode_confidence: "high" | "low" | "unknown";
  execution_mode_source?: "connector" | "platform_override" | "user_override";
  security_relevance: SecurityRelevance;

  // Egress analysis
  egress_category: EgressCategory;
  egress_host?: string | null;
  egress_base_url?: string | null;
  endpoint_url?: string | null;

  // Identity binding
  identity_binding_status: "bound" | "unlinked";

  // Risk assessment
  risk_group: string;  // "RG1" | "RG2" | "RG3" | "RG4" | "RG5"
  risk_group_label: string;
  risk_group_priority: string;  // "P1" | "P2" | "P3" | "P4"

  // Ownership
  ownership_status: "valid" | "orphaned";
  sys_created_by?: string;
  sys_updated_by?: string;

  // Referenced data
  referenced_tables?: string[];
  data_domains?: string[];
}

export interface DataQualityReport {
  overall_score: number;  // 0-100, weighted sum of field confidence scores
  field_confidence: Record<string, FieldConfidence>;
  warnings: string[];
  last_validated_at?: Date;
}

export interface FieldConfidence {
  level: "high" | "medium" | "low" | "unavailable";
  source: "connector" | "platform_derived" | "user_override" | "default";
  collected_at?: Date;
  notes?: string;
}

export interface UserOverrideMetadata {
  protected_fields: string[];  // Field names that won't be overwritten by connector sync
  overrides: Record<string, FieldOverride>;
}

export interface FieldOverride {
  field_name: string;
  original_value: unknown;
  override_value: unknown;
  override_reason: string;
  overridden_at: Date;
  overridden_by: string;  // User ID from auth context
}

export interface CollectionWarning {
  field: string;  // Property name that was affected
  severity: "info" | "warning" | "error";
  message: string;
  affected_entities?: string[];  // nodeIds of entities affected by this warning
}
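
The availability values above suggest how a consumer might read execution_count_30d. The sketch below is one possible interpretation, not a definition from the schema: the CountInterpretation type and interpretExecutionCount helper are hypothetical, and treating "partial" as a lower bound is an assumption.

```typescript
type ExecutionDataAvailability =
  | "available"
  | "partial"
  | "unavailable_no_log"
  | "unavailable_no_access"
  | "not_collected";

type CountInterpretation =
  | { kind: "confirmed"; count: number }    // count can be trusted, including 0
  | { kind: "lower_bound"; count: number }  // assumption: partial data undercounts
  | { kind: "unknown"; reason: ExecutionDataAvailability };

function interpretExecutionCount(
  count: number,
  availability: ExecutionDataAvailability
): CountInterpretation {
  switch (availability) {
    case "available":
      return { kind: "confirmed", count };
    case "partial":
      return { kind: "lower_bound", count };
    case "unavailable_no_log":
    case "unavailable_no_access":
    case "not_collected":
      // The stored number carries no information, so it is dropped.
      return { kind: "unknown", reason: availability };
  }
}
```

With this reading, execution_count_30d: 0 is a confirmed zero only when availability is "available"; in every other case a consumer sees either a lower bound or an explicit unknown.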

Appendix B: API Endpoint Specifications

B.1 Automation Summary Endpoint

GET /api/v1/automations/summary

Query Parameters:

tenant_id (from auth context)
source_system (optional): filter by source system

Response:

{
  "total_count": 92,
  "by_subtype": {
    "flow_designer_flow": 83,
    "business_rule": 2,
    "oauth_app": 3,
    "service_principal": 2,
    "system_execution": 2
  },
  "by_execution_mode": {
    "autonomous": 45,
    "operator_assisted": 10,
    "unknown": 22,
    "human_triggered": 0
  },
  "by_security_relevance": {
    "internal_inventory": 77,
    "dormant_authority": 9,
    "active_external": 1,
    "unknown": 5
  },
  "by_egress_category": {
    "none": 77,
    "external": 5,
    "llm": 3,
    "cloud": 2,
    "internal": 3,
    "unknown": 2
  },
  "with_execution_evidence": 0,
  "with_identity_binding": 9,
  "classification_gaps": {
    "execution_mode_unknown": 22,
    "trigger_types_empty": 5,
    "execution_data_unavailable": 2
  },
  "data_quality": {
    "overall_average_score": 72,
    "entities_with_warnings": 24,
    "low_confidence_count": 22
  }
}

B.2 Classification Override Endpoint

PATCH /api/v1/entities/:id/classification

Request Body:

{
  "execution_mode": "autonomous",
  "override_reason": "Business rule always runs on record insert, not human-triggered"
}

Response:

{
  "entity_id": "027e16c40dc1009472308597",
  "previous_classification": {
    "execution_mode": "unknown",
    "execution_mode_source": "connector"
  },
  "updated_classification": {
    "execution_mode": "autonomous",
    "execution_mode_source": "user_override",
    "execution_mode_confidence": "high",
    "override_reason": "Business rule always runs on record insert, not human-triggered",
    "overridden_at": "2026-02-12T15:30:00Z",
    "overridden_by": "user-123"
  },
  "protected_fields": ["execution_mode"]
}
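
The protected_fields list in the response implies that a later connector sync must leave the overridden field alone. A minimal sketch of that merge, assuming a hypothetical mergeOnSync helper that works on flat property objects:

```typescript
type Props = Record<string, unknown>;

// Merges incoming connector properties into the stored ones, skipping any
// field listed in protectedFields so user overrides survive the sync.
function mergeOnSync(stored: Props, incoming: Props, protectedFields: string[]): Props {
  const merged: Props = { ...stored };
  for (const key of Object.keys(incoming)) {
    if (protectedFields.indexOf(key) !== -1) continue; // keep the user's value
    merged[key] = incoming[key];
  }
  return merged;
}
```

This corresponds to the manual test in Appendix C: override execution_mode, run a sync, and verify the value is not reverted.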

B.3 Classification Status Filter

GET /api/v1/entities?classification_status=incomplete

Logic: Returns entities where ANY of:

properties.execution_mode === "unknown"
properties.security_relevance === "unknown"
properties.execution_data_availability === "not_collected"
properties.triggerTypes.length === 0 AND properties.identitySubtype === "flow_designer_flow"
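
The conditions above can be sketched as a single predicate. The isClassificationIncomplete helper is hypothetical, and treating a missing triggerTypes array the same as an empty one is an assumption (the field is optional in AutomationProperties, so reading .length directly could fail).

```typescript
interface ClassifiableProperties {
  identitySubtype: string;
  execution_mode: string;
  security_relevance: string;
  execution_data_availability?: string;
  triggerTypes?: string[];
}

// True when ANY of the listed conditions holds.
function isClassificationIncomplete(p: ClassifiableProperties): boolean {
  // Assumption: an absent triggerTypes array counts as empty.
  const noTriggers = (p.triggerTypes ?? []).length === 0;
  return (
    p.execution_mode === "unknown" ||
    p.security_relevance === "unknown" ||
    p.execution_data_availability === "not_collected" ||
    (noTriggers && p.identitySubtype === "flow_designer_flow")
  );
}
```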

Response:

{
  "data": [
    {
      "_id": "027e16c40dc1009472308597",
      "entity_type": "identity",
      "properties": {
        "display_name": "Knowledge - Approval Publish",
        "execution_mode": "unknown",
        "triggerTypes": ["knowledge management"],
        "execution_data_availability": "available"
      },
      "data_quality": {
        "overall_score": 65,
        "warnings": ["execution_mode classification unknown - manual review recommended"]
      }
    }
  ],
  "cursor": null,
  "meta": {
    "total_count": 22
  }
}

Appendix C: Implementation Checklist

Phase 1: Schema Enhancements (8-12 hours)

Add automation-types.ts with structured types
Add execution_data_availability to AutomationProperties
Add execution_mode_confidence to AutomationProperties
Add DataQualityReport to EntityDoc
Add UserOverrideMetadata to EntityDoc
Create MongoDB indexes for new fields

Phase 2: API Endpoints (12-16 hours)

Implement GET /api/v1/automations/summary
Implement PATCH /api/v1/entities/:id/classification
Add classification_status=incomplete query parameter
Add data quality computation during ingestion
Add protected_fields sync behavior

Phase 3: Connector Updates (INTEGRATOR) (8-12 hours)

Add execution_data_availability to transformer output
Add execution_mode_confidence to transformer output
Add execution_data_collected_at timestamp
Add collectionWarnings to NormalizedGraph
Validate execution data collection works (check for execution_count_30d > 0)
Research sys_hub_action_instance API

Phase 4: UI Updates (8-12 hours)

Add data quality badge to entity detail page
Add "Manually Classify" button for entities with execution_mode=unknown
Add classification override modal
Add "Show internal inventory" filter toggle
Add classification_status=incomplete filter to Automations page
Add automation summary dashboard widget

Phase 5: Testing (4-6 hours)

Unit tests for data quality computation
Integration tests for classification override
E2E test for incomplete classification filter
Connector test: verify execution_data_availability is populated
Manual test: override execution_mode, verify sync doesn't revert

END OF ANALYSIS
