API Data Quality Analysis: Automation Classification & Execution Evidence
Author: DEVELOPER (automation-analysis team)
Date: 2026-02-12
Context: Analysis of /api/v1/entities response structure for automation classification gaps
Dataset: 92 identity entities from http://localhost:3000/api/v1/entities
Executive Summary
Primary Finding: The API response structure conflates "no data collected" with "confirmed zero", leaves execution_mode as "unknown" for 29% of internal_inventory entities (22 of 77), and lacks confidence indicators. This makes downstream analysis ambiguous and prevents reliable automation risk assessment.
Critical Questions:
- Does execution_count_30d: 0 mean "we checked and found zero" or "we didn't check"?
- Is a 29% execution_mode: "unknown" rate acceptable for security-relevant automation classification?
- Should the platform validate/override connector-provided classifications?
Recommendation: Add data quality metadata to entity schema, expose classification confidence levels in API responses, and provide reclassification endpoints.
1. Hypothesis
The Null Ambiguity Problem:
The current entity schema uses default values (0, null, "unknown") without distinguishing data unavailability from confirmed absence. This creates three failure modes:
1.1 Semantic Overload
execution_count_30d: 0 could mean:
- A) We queried sys_flow_context and found exactly 0 matching records
- B) We skipped execution data collection (connector config/permissions)
- C) This automation type has no deterministic execution log table (business_rule, system_execution)
1.2 Classification Gap Propagation
- Connector returns execution_mode: "unknown" for 29% of flows
- Platform ingests this verbatim with no validation or fallback
- Every downstream consumer (UI, evaluator, reporting) inherits the gap
1.3 Temporal Data Quality Decay
- No last_data_collection_timestamp field
- No way to know if execution_count_30d: 0 reflects data from today or last month
- Stale data presented as current
Core Issue: The schema assumes the connector is always authoritative and complete. In practice, connectors have permissions gaps, API limits, and implementation bugs.
2. API Data Quality Audit
2.1 Field-by-Field Reliability Assessment
Based on analysis of 92 identity entities (77 internal_inventory, 9 dormant_authority, 5 unknown, 1 active_external):
| Field | Populated | Null/Default | Trustworthy? | Notes |
|---|---|---|---|---|
display_name | 100% | 0% | ✅ Yes | Always present from source system |
status | 100% | 0% | ✅ Yes | "active" or "disabled" from source |
identitySubtype | 100% | 0% | ✅ Yes | Deterministic from sys_class_name |
automation_type | 100% | 0% | ✅ Yes | Derived from subtype (flow, business_rule, etc.) |
sys_created_by | 100% | 0% | ✅ Yes | ServiceNow audit field |
sys_updated_by | 100% | 0% | ✅ Yes | ServiceNow audit field |
triggerTypes | ~95%* | ~5% | ⚠️ Partial | Flow trigger types extracted from sys_hub_trigger_instance |
endpoint_url | ~15% | 85% null | ⚠️ Partial | Only populated when REST step detected in flow actions |
last_observed_execution_timestamp | 0%** | 100% null | ❌ No | ALL internal_inventory have null (0 exec → no timestamp) |
execution_count_30d | 100% | 0% (all zeros) | ❓ AMBIGUOUS | See §2.2 |
execution_evidence_refs | 0% | 100% empty | ❓ AMBIGUOUS | Empty means "no executions" OR "didn't check" |
identity_binding_status | 100% | 0% | ✅ Yes | "bound" or "unlinked" from RUNS_AS edge resolution |
egress_host | ~15% | 85% null | ✅ Yes | Null correctly means "no external egress detected" |
egress_base_url | ~15% | 85% null | ✅ Yes | Null correctly means "no external egress detected" |
egress_category | 100% | 0% | ✅ Yes | "none", "internal", "external", "cloud", "llm", "unknown" |
referenced_tables | ~90% | ~10% empty | ✅ Yes | Extracted from flow actions/triggers |
data_domains | 100% | 0% | ⚠️ Partial | Falls back to "unknown" if table→domain mapping missing |
ownership_status | 100% | 0% | ✅ Yes | "valid" or "orphaned" from OWNED_BY edge validation |
risk_group | 100% | 0% | ✅ Yes | Deterministic from egress_category + data_domains |
risk_group_label | 100% | 0% | ✅ Yes | Display label for risk_group |
risk_group_priority | 100% | 0% | ✅ Yes | P1-P4 from risk_group |
execution_mode | 71% | 29% "unknown" | ❌ GAP | See §4 |
security_relevance | 100% | 0% | ⚠️ DERIVED | Computed from other fields; trustworthy IFF inputs are |
* Some flows have empty triggerTypes array → classified as unknown
** Within internal_inventory subset
2.2 The execution_count_30d: 0 Problem
Observation: ALL 77 internal_inventory entities have execution_count_30d: 0.
Three Possible Interpretations:
Interpretation A: Confirmed Zero (Optimistic)
The connector successfully queried sys_flow_context for each flow and confirmed 0 matching records in the last 30 days.
Evidence supporting:
- Connector code has explicit discover_flow_executions() method
- Uses two-pass approach: _get_table_count() for count, then _get_table() for evidence
- Returns {} (empty dict) only if count query returns 0
Evidence against:
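- Zero flows with execution_count > 0 in the internal_inventory set (partly by construction: any flow with a nonzero count is classified as dormant_authority; see §6.1)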
- Statistically unlikely that EXACTLY ZERO of 77 flows executed in 30 days
- Some flows are system-default ITSM workflows (Change, Incident) — these should execute
Interpretation B: Data Collection Skipped (Pessimistic)
The connector didn't collect execution data for these flows (permissions/config/bug).
Evidence supporting:
- No data_collection_timestamp field to prove recency
- execution_evidence_refs: [] for all, so there is no proof that the query was attempted
- Connector code has execution_data: dict[str, dict] | None = None (an optional parameter)
Evidence against:
- Connector doesn't have a "skip execution collection" flag
- Code shows execution data is collected before transform step
- No error logs in connector output about missing permissions
Interpretation C: Heterogeneous (Most Likely)
- Flows/Jobs: execution data WAS collected, 0 is accurate
- Business Rules/System Execution: execution data CANNOT be collected (no deterministic log table)
elif subtype in ("business_rule", "system_execution"):
# No deterministic SN-side execution log for BRs/SIs
props["last_observed_execution_timestamp"] = None
props["execution_count_30d"] = 0
props["execution_evidence_refs"] = []
Conclusion: The API response conflates "no execution records found" (flows/jobs) with "no execution records exist in ServiceNow" (business_rules/system_execution). There's no field indicating data availability.
3. The "0 vs null" Problem: Proposed Schema
3.1 Current Schema (Ambiguous)
interface EntityProperties {
execution_count_30d: number; // 0 means ???
execution_evidence_refs: string[]; // [] means ???
last_observed_execution_timestamp: string | null; // null means ???
}
3.2 Proposed Schema (Explicit Data Quality)
interface EntityProperties {
// Execution data
execution_count_30d: number;
execution_evidence_refs: string[];
last_observed_execution_timestamp: string | null;
// NEW: Data quality metadata
execution_data_availability: ExecutionDataAvailability;
execution_data_collected_at?: string; // ISO timestamp
execution_data_source?: string; // "sys_flow_context" | "sys_trigger" | "unavailable"
execution_data_notes?: string; // "Permissions denied" | "No execution log for business_rule"
}
type ExecutionDataAvailability =
| "available" // Data was collected, count is accurate
| "partial" // Data was collected but incomplete (API limit, timeout)
| "unavailable_no_log" // Source system has no execution log for this automation type
| "unavailable_no_access" // Connector lacks permissions to query execution logs
| "not_collected"; // Execution data collection was skipped (connector config)
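To keep the availability value honest, it could be derived at the single point where the collector knows whether it attempted a query. A minimal sketch, assuming a hypothetical CollectionOutcome type produced by the execution-log query (the current connector has no such type) and the subtype coverage listed in the §6.2 table:

```typescript
type ExecutionDataAvailability =
  | "available"
  | "partial"
  | "unavailable_no_log"
  | "unavailable_no_access"
  | "not_collected";

// Hypothetical outcome of one execution-log query attempt (not an existing connector type).
type CollectionOutcome =
  | { kind: "skipped" } // collection disabled by connector config
  | { kind: "forbidden" } // e.g. read access denied on the log table
  | { kind: "ok"; count: number; truncated: boolean }; // truncated: hit a page/API limit

// Subtypes that have a deterministic execution log table (per §6.2).
const SUBTYPES_WITH_EXEC_LOG = new Set(["flow_designer_flow", "scheduled_job"]);

// Hypothetical helper: maps subtype + query outcome to an availability value.
function resolveAvailability(
  subtype: string,
  outcome: CollectionOutcome,
): ExecutionDataAvailability {
  // No log table: a 0 count is meaningless regardless of the query outcome.
  if (!SUBTYPES_WITH_EXEC_LOG.has(subtype)) return "unavailable_no_log";
  if (outcome.kind === "skipped") return "not_collected";
  if (outcome.kind === "forbidden") return "unavailable_no_access";
  return outcome.truncated ? "partial" : "available";
}
```

Under this mapping, the business_rule branch quoted in §2.2 would carry unavailable_no_log alongside its 0 instead of a bare 0.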
3.3 Interpretation Rules
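function interpretExecutionCount(entity: EntityDoc): string {
  // Execution fields live under entity.properties (see §3.2), not on the EntityDoc root
  const props = entity.properties as unknown as EntityProperties;
  const count = props.execution_count_30d;
  const availability = props.execution_data_availability;
  if (availability === "unavailable_no_log") {
    return "Execution count unavailable (no execution log for this automation type)";
  }
  if (availability === "unavailable_no_access") {
    return "Execution count unavailable (connector lacks access to execution logs)";
  }
  if (availability === "not_collected") {
    return "Execution count not collected";
  }
  if (count > 0) {
    // "partial" means the true count may be higher
    return availability === "partial"
      ? `At least ${count} executions in last 30 days (partial data)`
      : `${count} executions in last 30 days`;
  }
  if (availability === "available") {
    return "Confirmed zero executions in last 30 days";
  }
  if (availability === "partial") {
    return "No executions found (partial data)";
  }
  return "Unknown execution status";
}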
3.4 UI Impact
Current UI (Ambiguous):
Executions (30d): 0
Proposed UI (Explicit):
Executions (30d): 0 ✓ (verified 2026-02-12)
Executions (30d): 0 ⚠ (no execution log available)
Executions (30d): — (data not collected)
4. execution_mode Gap Analysis
4.1 Current State
- 77 internal_inventory entities
- 45 autonomous (58%)
- 10 operator_assisted (13%)
- 22 unknown (29%) ← PROBLEM
4.2 Root Cause: Trigger Type Gaps
The connector classifies execution_mode based on triggerTypes:
# Connector: transformer.py lines 1159-1209
_AUTONOMOUS_TRIGGERS = {
"record", "schedule", "event", "data_change",
"record_create", "record_update", "record_create_or_update",
"daily", "weekly", "run_once", "repeat",
}
_OPERATOR_ASSISTED_TRIGGERS = {"service_catalog", "email", "inbound_action"}
_HUMAN_TRIGGERED_TRIGGERS = {"ui_action", "manual"}
# Classification logic
if not trigger_types:
return "unknown" # ← GAP SOURCE 1
for tt in trigger_types:
if tt in _AUTONOMOUS_TRIGGERS:
return "autonomous"
# ... check operator_assisted, human_triggered ...
return "unknown" # ← GAP SOURCE 2: unrecognized trigger type
Gap Sources:
- Empty triggerTypes array — Flow has no triggers configured
- Unrecognized trigger types — ServiceNow emits trigger types not in allowlist
Example Entity with Gap:
{
"display_name": "Knowledge - Approval Publish",
"identitySubtype": "flow_designer_flow",
"triggerTypes": ["knowledge management"], // Not in any allowlist
"execution_mode": "unknown" // ← Classification gap
}
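Recording why a flow ended up "unknown" (no triggers vs. an unrecognized trigger such as "knowledge management" in the example above) would make the gap actionable: empty triggers point at the flow, unrecognized ones point at the allowlists. The sketch below is a TypeScript port of the classification logic in §4.2 for illustration; the allowlists are copied from the excerpt, and because the excerpt elides the operator_assisted/human_triggered checks, the precedence used here (autonomous, then operator_assisted, then human_triggered) is an assumption.

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
type GapReason = "empty_trigger_types" | "unrecognized_trigger_type";

// Allowlists copied from the transformer.py excerpt in §4.2.
const AUTONOMOUS_TRIGGERS = new Set([
  "record", "schedule", "event", "data_change",
  "record_create", "record_update", "record_create_or_update",
  "daily", "weekly", "run_once", "repeat",
]);
const OPERATOR_ASSISTED_TRIGGERS = new Set(["service_catalog", "email", "inbound_action"]);
const HUMAN_TRIGGERED_TRIGGERS = new Set(["ui_action", "manual"]);

interface ModeClassification {
  mode: ExecutionMode;
  gap?: GapReason; // set only when mode === "unknown"
  unrecognized: string[]; // trigger types matched by no allowlist
}

// Hypothetical helper: classifies and reports the gap source.
function classifyExecutionMode(triggerTypes: string[]): ModeClassification {
  if (triggerTypes.length === 0) {
    return { mode: "unknown", gap: "empty_trigger_types", unrecognized: [] };
  }
  const unrecognized = triggerTypes.filter(
    (tt) =>
      !AUTONOMOUS_TRIGGERS.has(tt) &&
      !OPERATOR_ASSISTED_TRIGGERS.has(tt) &&
      !HUMAN_TRIGGERED_TRIGGERS.has(tt),
  );
  // Assumed precedence for mixed triggers: autonomous > operator_assisted > human_triggered.
  if (triggerTypes.some((tt) => AUTONOMOUS_TRIGGERS.has(tt))) return { mode: "autonomous", unrecognized };
  if (triggerTypes.some((tt) => OPERATOR_ASSISTED_TRIGGERS.has(tt))) return { mode: "operator_assisted", unrecognized };
  if (triggerTypes.some((tt) => HUMAN_TRIGGERED_TRIGGERS.has(tt))) return { mode: "human_triggered", unrecognized };
  return { mode: "unknown", gap: "unrecognized_trigger_type", unrecognized };
}
```

Aggregating the unrecognized arrays across all entities gives a ranked list of allowlist candidates.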
4.3 Is 29% Acceptable?
Arguments FOR (acceptable gap):
- Internal inventory flows are low-priority (not displayed by default)
- 71% classification success (measured on internal_inventory) may be enough for now
- Trigger type allowlists can be expanded iteratively
Arguments AGAINST (blocking issue):
- execution_mode is used in findings generation and risk scoring
- "Unknown" execution mode prevents accurate dormant authority detection
- A 29% gap means roughly 3 in 10 flows can't be properly risk-assessed
- Gap rate may be higher for security-relevant automations (not yet tested)
Recommendation:
- ✅ Acceptable for internal_inventory (hidden by default)
- ❌ Blocking for dormant_authority and active_external
- ACTION: Collect trigger type gap stats for security-relevant subset
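The ACTION item amounts to computing the same gap metric grouped by security_relevance. A minimal sketch over the entity field names from §2.1 (the helper name and reduced entity shape are hypothetical):

```typescript
// Minimal entity shape for this sketch (field names from §2.1).
interface RelevanceGapEntity {
  properties: { security_relevance: string; execution_mode: string };
}

interface GapStats {
  total: number;
  unknown: number; // entities with execution_mode === "unknown"
  rate: number; // unknown / total
}

// Hypothetical helper: execution_mode gap rate per security_relevance bucket.
function gapStatsByRelevance(entities: RelevanceGapEntity[]): Record<string, GapStats> {
  const stats: Record<string, GapStats> = {};
  for (const { properties: p } of entities) {
    let s = stats[p.security_relevance];
    if (!s) {
      s = { total: 0, unknown: 0, rate: 0 };
      stats[p.security_relevance] = s;
    }
    s.total += 1;
    if (p.execution_mode === "unknown") s.unknown += 1;
  }
  for (const key of Object.keys(stats)) {
    stats[key].rate = stats[key].unknown / stats[key].total;
  }
  return stats;
}
```

Run over the full entity set, this shows directly whether the internal_inventory rate carries over to dormant_authority and active_external.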
4.4 Should Platform Override Connector Classification?
Current: Connector is authoritative. Platform ingests execution_mode verbatim.
Option A: Platform Fallback (Conservative)
// During ingestion: only fill in values the connector left as "unknown"
if (properties.execution_mode === "unknown") {
  if (properties.identitySubtype === "business_rule") {
    properties.execution_mode = "autonomous"; // Business rules always run autonomously
  } else if (properties.identitySubtype === "system_execution") {
    properties.execution_mode = "autonomous"; // Script includes are autonomous
  }
  // For flows, keep "unknown" → user must manually classify
}
Option B: Platform Re-Classification (Aggressive)
// Platform computes execution_mode from trigger types + subtype + egress signals
// Ignore connector value entirely
// Pro: Single source of truth, consistent across connectors
// Con: Duplicates connector logic, divergence risk
Option C: Platform Validation + Override Flag (Hybrid)
interface EntityProperties {
execution_mode: ExecutionMode;
execution_mode_source: "connector" | "platform_override" | "user_override";
execution_mode_confidence: "high" | "low" | "unknown";
}
// Platform validates connector value, flags low-confidence classifications
if (properties.execution_mode === "unknown" && properties.identitySubtype === "business_rule") {
properties.execution_mode = "autonomous";
properties.execution_mode_source = "platform_override";
properties.execution_mode_confidence = "high";
}
Recommendation: Option C (validation + metadata). Preserves connector authority while providing quality guardrails.
5. API Improvement Proposals
5.1 Automation Summary Endpoint
Current Gap: No way to get aggregate automation stats without fetching all entities.
Proposed Endpoint: GET /api/v1/automations/summary
interface AutomationSummaryResponse {
total_count: number;
by_subtype: Record<string, number>; // "flow_designer_flow": 83
by_execution_mode: Record<string, number>; // "autonomous": 45, "unknown": 22
by_security_relevance: Record<string, number>; // "internal_inventory": 77
by_egress_category: Record<string, number>; // "none": 77, "external": 5
with_execution_evidence: number; // count where execution_count_30d > 0
with_identity_binding: number; // count where identity_binding_status == "bound"
classification_gaps: {
execution_mode_unknown: number; // 22
trigger_types_empty: number; // count where triggerTypes == []
execution_data_unavailable: number; // count where execution_data_availability != "available"
};
}
Use Cases:
- Dashboard: show automation inventory at a glance
- Data quality monitoring: track classification gap rate over time
- Connector validation: confirm execution data collection success
Effort: 4-6 hours (new route + aggregation pipeline)
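In production the route would likely be a MongoDB aggregation (for example a $facet stage with one $group per breakdown). An in-memory reference computation is still useful to pin down the expected numbers, for instance in tests. A minimal sketch; the helper names and reduced entity shape are hypothetical:

```typescript
// Minimal entity shape for this sketch (field names from §2.1 and §3.2).
interface SummaryEntity {
  properties: {
    identitySubtype: string;
    execution_mode: string;
    security_relevance: string;
    egress_category: string;
    execution_count_30d: number;
    identity_binding_status: string;
    triggerTypes?: string[];
    execution_data_availability?: string;
  };
}

interface AutomationSummaryResponse {
  total_count: number;
  by_subtype: Record<string, number>;
  by_execution_mode: Record<string, number>;
  by_security_relevance: Record<string, number>;
  by_egress_category: Record<string, number>;
  with_execution_evidence: number;
  with_identity_binding: number;
  classification_gaps: {
    execution_mode_unknown: number;
    trigger_types_empty: number;
    execution_data_unavailable: number;
  };
}

// Count occurrences of each distinct value.
function countBy(values: string[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const v of values) out[v] = (out[v] ?? 0) + 1;
  return out;
}

function summarizeAutomations(entities: SummaryEntity[]): AutomationSummaryResponse {
  const props = entities.map((e) => e.properties);
  const count = (pred: (p: SummaryEntity["properties"]) => boolean): number =>
    props.filter(pred).length;
  return {
    total_count: props.length,
    by_subtype: countBy(props.map((p) => p.identitySubtype)),
    by_execution_mode: countBy(props.map((p) => p.execution_mode)),
    by_security_relevance: countBy(props.map((p) => p.security_relevance)),
    by_egress_category: countBy(props.map((p) => p.egress_category)),
    with_execution_evidence: count((p) => p.execution_count_30d > 0),
    with_identity_binding: count((p) => p.identity_binding_status === "bound"),
    classification_gaps: {
      execution_mode_unknown: count((p) => p.execution_mode === "unknown"),
      trigger_types_empty: count((p) => p.triggerTypes?.length === 0),
      // Entities without the §3.2 field yet count as not available
      execution_data_unavailable: count((p) => p.execution_data_availability !== "available"),
    },
  };
}
```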
5.2 Classification Override/Reclassification Endpoint
Current Gap: No way to manually override execution_mode or security_relevance when connector gets it wrong.
Proposed Endpoint: PATCH /api/v1/entities/:id/classification
interface ClassificationOverrideRequest {
execution_mode?: ExecutionMode;
security_relevance?: SecurityRelevance;
override_reason?: string; // Required when overriding connector value
}
interface ClassificationOverrideResponse {
entity_id: string;
previous_classification: {
execution_mode: string;
execution_mode_source: string;
};
updated_classification: {
execution_mode: string;
execution_mode_source: "user_override";
override_reason: string;
overridden_at: string;
overridden_by: string; // user ID from auth context
};
}
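Because override_reason is typed optional but documented as required, the handler needs a runtime check. A minimal sketch of that validation (hypothetical helper); rejecting "unknown" as an override value is an assumed policy, not something stated above:

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
type SecurityRelevance = "active_external" | "dormant_authority" | "internal_inventory" | "unknown";

interface ClassificationOverrideRequest {
  execution_mode?: ExecutionMode;
  security_relevance?: SecurityRelevance;
  override_reason?: string;
}

// Returns validation errors; an empty array means the request can be applied.
function validateOverrideRequest(req: ClassificationOverrideRequest): string[] {
  const errors: string[] = [];
  const overridesSomething =
    req.execution_mode !== undefined || req.security_relevance !== undefined;
  if (!overridesSomething) {
    errors.push("at least one of execution_mode or security_relevance is required");
  } else if (!req.override_reason?.trim()) {
    errors.push("override_reason is required when overriding a classification");
  }
  // Assumed policy: a manual override should resolve a gap, not re-create one.
  if (req.execution_mode === "unknown" || req.security_relevance === "unknown") {
    errors.push('manual overrides cannot set a classification to "unknown"');
  }
  return errors;
}
```

A non-empty result would map naturally to a 400 response listing the errors.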
Workflow:
- User views entity detail page
- Sees execution_mode: "unknown" with confidence indicator
- Clicks "Manually Classify"
- Selects "Autonomous" from dropdown, provides reason: "Business rule always runs on record insert"
- API updates entity properties + creates audit event
- Evaluator re-runs on next sync to pick up classification change
Persistence:
interface EntityProperties {
execution_mode: ExecutionMode;
execution_mode_source: "connector" | "platform_override" | "user_override";
execution_mode_override_reason?: string;
execution_mode_overridden_at?: string;
execution_mode_overridden_by?: string;
}
Sync Behavior:
- Next connector sync should NOT overwrite user override
- Add protected_fields: string[] to entity metadata
- During ingestion, skip update of protected fields unless connector value changed
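One reading of the sync rule: the override stands while the connector keeps reporting the value the user overrode, and a genuinely new connector value wins and clears protection so the field can be re-reviewed. A minimal sketch under that interpretation; the helper names and the JSON-based equality are assumptions, not existing platform code:

```typescript
// Shapes follow FieldOverride / UserOverrideMetadata in §7.2 (Date fields omitted here).
interface FieldOverride {
  field_name: string;
  original_value: unknown; // connector value at the time of the override
  override_value: unknown;
}

interface UserOverrideMetadata {
  protected_fields: string[];
  overrides: Record<string, FieldOverride>;
}

// Structural equality adequate for the string/array values involved here.
function sameValue(a: unknown, b: unknown): boolean {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Hypothetical merge step run during connector sync.
function mergeConnectorUpdate(
  current: Record<string, unknown>,
  incoming: Record<string, unknown>,
  meta: UserOverrideMetadata,
): { properties: Record<string, unknown>; protected_fields: string[] } {
  const properties: Record<string, unknown> = { ...current, ...incoming };
  const stillProtected: string[] = [];
  for (const field of meta.protected_fields) {
    const override = meta.overrides[field];
    if (!override) continue; // no override record: the connector value applies
    const connectorUnchanged =
      !(field in incoming) || sameValue(incoming[field], override.original_value);
    if (connectorUnchanged) {
      properties[field] = override.override_value; // user decision survives the sync
      stillProtected.push(field);
    }
    // Otherwise the connector value changed since the override: accept it and
    // drop protection so the field can be re-reviewed.
  }
  return { properties, protected_fields: stillProtected };
}
```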
Effort: 8-12 hours (endpoint + UI + sync protection logic)
5.3 Data Quality Indicators per Entity
Current Gap: No visibility into which entity fields are trustworthy vs. defaulted/stale.
Proposed Addition: data_quality metadata in entity response
interface EntityDoc {
// ... existing fields ...
data_quality: DataQualityReport;
}
interface DataQualityReport {
overall_score: number; // 0-100, weighted sum of field confidence
field_confidence: Record<string, FieldConfidence>;
warnings: string[]; // ["execution_mode classification unknown", "no execution data collected"]
last_validated_at?: string;
}
interface FieldConfidence {
level: "high" | "medium" | "low" | "unavailable";
source: "connector" | "platform_derived" | "user_override" | "default";
collected_at?: string;
notes?: string;
}
// Example
{
"data_quality": {
"overall_score": 72,
"field_confidence": {
"execution_mode": {
"level": "low",
"source": "connector",
"notes": "Unrecognized trigger type 'knowledge management'"
},
"execution_count_30d": {
"level": "high",
"source": "connector",
"collected_at": "2026-02-12T14:23:00Z"
},
"egress_category": {
"level": "high",
"source": "platform_derived",
"notes": "Derived from endpoint_url analysis"
}
},
"warnings": [
"execution_mode classification unknown - manual review recommended",
"No execution evidence in last 30 days"
]
}
}
UI Impact:
- Entity detail page shows data quality score badge (🟢 High / 🟡 Medium / 🔴 Low)
- Field-level tooltips explain confidence level
- Warnings surface in "Data Quality" tab
Effort: 12-16 hours (schema extension + computation logic + UI)
5.4 Filter: classification_status=incomplete
Current Gap: No way to find entities that need manual review/classification.
Proposed Query Parameter: GET /api/v1/entities?classification_status=incomplete
Logic:
function isClassificationIncomplete(entity: EntityDoc): boolean {
return (
entity.properties.execution_mode === "unknown" ||
entity.properties.security_relevance === "unknown" ||
entity.properties.execution_data_availability === "not_collected" ||
(entity.properties.triggerTypes?.length === 0 && entity.properties.identitySubtype === "flow_designer_flow")
);
}
Use Cases:
- Connector validation: "Show me all automations with classification gaps"
- User task list: "Review these 22 flows with unknown execution mode"
- Data quality dashboard: "Incomplete classifications: 22 of 92 (24%)"
Implementation:
// In MongoStorageAdapter.queryEntities()
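if (query.classificationStatus === "incomplete") {
  // Mirrors isClassificationIncomplete() above, including the empty-trigger flow case
  filter.$or = [
    { "properties.execution_mode": "unknown" },
    { "properties.security_relevance": "unknown" },
    { "properties.execution_data_availability": "not_collected" },
    {
      "properties.identitySubtype": "flow_designer_flow",
      "properties.triggerTypes": { $size: 0 },
    },
  ];
}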
Effort: 2-3 hours (query parameter + filter logic)
6. Collaboration with INTEGRATOR
6.1 Question: Is It Suspicious That All 77 Entities Have execution_count=0?
Data Point: 77 internal_inventory flows, 100% have execution_count_30d: 0.
Possible Explanations:
A. Accurate Reflection of Reality
- These are template flows, system-default workflows, or disabled automations
- They genuinely have not executed in the last 30 days
- The connector correctly queried sys_flow_context and found 0 matching records
B. Data Collection Issue
- Connector has a bug in execution data collection
- Permissions issue: can't read sys_flow_context table
- API limit: only fetched execution data for first N flows, rest defaulted to 0
C. Classification Filter Bias
The security_relevance classification logic is:
if has_external_egress and exec_count > 0:
props["security_relevance"] = "active_external"
elif has_external_egress or binding == "bound":
props["security_relevance"] = "dormant_authority"
elif exec_count > 0:
props["security_relevance"] = "dormant_authority"
else:
props["security_relevance"] = "internal_inventory"
By definition, anything in internal_inventory MUST have exec_count == 0 (otherwise it would be dormant_authority).
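Porting the branch order makes the consequence mechanical: internal_inventory is reachable only when execCount > 0 is false, so the uniform zeros in that bucket say nothing about whether collection worked. A TypeScript port for illustration (the connector itself is Python; the function names are hypothetical):

```typescript
type SecurityRelevance = "active_external" | "dormant_authority" | "internal_inventory";

// Line-for-line port of the connector branches quoted above.
function classifySecurityRelevance(
  hasExternalEgress: boolean,
  binding: "bound" | "unlinked",
  execCount: number,
): SecurityRelevance {
  if (hasExternalEgress && execCount > 0) return "active_external";
  if (hasExternalEgress || binding === "bound") return "dormant_authority";
  if (execCount > 0) return "dormant_authority";
  return "internal_inventory";
}

// Enumerate a small input grid and collect any case where a nonzero count
// still lands in internal_inventory. By construction there are none.
function internalInventoryViolations(): string[] {
  const violations: string[] = [];
  for (const egress of [false, true]) {
    for (const binding of ["bound", "unlinked"] as const) {
      for (const count of [0, 1, 250]) {
        if (count > 0 && classifySecurityRelevance(egress, binding, count) === "internal_inventory") {
          violations.push(`egress=${egress} binding=${binding} count=${count}`);
        }
      }
    }
  }
  return violations;
}
```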
So the question becomes: Are there ANY flows in the full 92-entity dataset with execution_count_30d > 0?
INTEGRATOR Action Items:
- Run connector with debug logging: confirm execution data collection was attempted for all flows
- Check sys_flow_context table permissions: can the OAuth integration read it?
- Manually query sys_flow_context in ServiceNow for 2-3 sample flows: confirm 0 records exist
- Check for flows in dormant_authority or active_external with execution_count_30d > 0 → proves collection works
Expected Outcome:
- If collection works: some flows should have exec_count > 0
- If all 92 entities have exec_count=0: connector bug or permissions issue
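The expected-outcome check can be run against the API response rather than by eye. A minimal sketch (hypothetical helper and verdict names) that reports which security_relevance buckets contain any nonzero count; given the classifier above, positives can only appear outside internal_inventory:

```typescript
interface ExecutionEvidenceRow {
  security_relevance: string;
  execution_count_30d: number;
}

type CollectionVerdict =
  | "collection_path_returns_data"
  | "suspect_connector_or_permissions"
  | "no_entities";

function diagnoseExecutionCollection(entities: ExecutionEvidenceRow[]): {
  verdict: CollectionVerdict;
  nonZeroByBucket: Record<string, number>;
} {
  // Count entities with at least one observed execution, per security_relevance bucket.
  const nonZeroByBucket: Record<string, number> = {};
  for (const e of entities) {
    if (e.execution_count_30d > 0) {
      nonZeroByBucket[e.security_relevance] = (nonZeroByBucket[e.security_relevance] ?? 0) + 1;
    }
  }
  const anyNonZero = Object.keys(nonZeroByBucket).length > 0;
  const verdict: CollectionVerdict =
    entities.length === 0
      ? "no_entities"
      : anyNonZero
        ? "collection_path_returns_data"
        : "suspect_connector_or_permissions";
  return { verdict, nonZeroByBucket };
}
```

A positive verdict shows the query path works for some flows; it does not rule out the partial-fetch (API limit) failure mode listed above, which still needs the debug-log check.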
6.2 Question: Can Execution Data Collection Be Improved?
Current Limitations:
| Automation Type | Execution Log Table | Deterministic Join? | Supported? |
|---|---|---|---|
| Flow Designer Flow | sys_flow_context | Yes (flow reference) | ✅ Yes |
| Scheduled Job | sys_trigger | Yes (document reference) | ✅ Yes |
| Business Rule | ❌ None | N/A | ❌ No |
| System Execution (Script Include) | ❌ None | N/A | ❌ No |
Potential ServiceNow APIs to Explore:
- syslog Table (sys_log)
  - Generic execution log for scripts, business rules, scheduled jobs
  - Contains: timestamp, source (script name), message, level
  - Join: fuzzy match on source field (not deterministic)
  - Risk: high false positive rate, noise from unrelated logs
- System Execution Tracker (sys_execution_tracker)
  - Tracks long-running jobs and async operations
  - May contain business rule executions if they take >N seconds
  - Join: source_table + source field
  - Risk: only captures slow executions, not representative
- Table History (sys_audit)
  - Tracks record changes (insert, update, delete)
  - Business rules execute on these events
  - Indirect signal: if sys_audit shows record changes on tables that have business_rule triggers, infer execution
  - Risk: correlation, not causation
- Flow Designer Execution Context (sys_hub_action_instance)
  - Granular action-level execution log (individual steps within a flow)
  - Join: flow reference
  - Benefit: proves flow executed AND which steps ran (egress actions)
  - The connector currently uses sys_flow_context (flow-level); sys_hub_action_instance is more detailed
INTEGRATOR Recommendations:
- Priority 1: Validate that sys_flow_context collection is working (see §6.1)
- Priority 2: Explore sys_hub_action_instance for action-level execution evidence
- Priority 3: Research sys_log for business_rule execution inference (high effort, low confidence)
If execution data is truly unavailable for business_rules:
- Set execution_data_availability: "unavailable_no_log" explicitly
- Update UI to show "Execution count unavailable for this automation type"
- Don't default to 0: use null or a sentinel value (-1)
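On the API side, the "don't default to 0" rule could be enforced at serialization time. A minimal sketch that picks null over the -1 sentinel (the helper name is hypothetical); it assumes the availability field from §3.2 is stored:

```typescript
type ExecutionDataAvailability =
  | "available"
  | "partial"
  | "unavailable_no_log"
  | "unavailable_no_access"
  | "not_collected";

interface StoredExecutionFields {
  execution_count_30d: number;
  execution_data_availability: ExecutionDataAvailability;
}

// A count is only meaningful if a query actually ran; otherwise expose null.
function toApiExecutionCount(f: StoredExecutionFields): number | null {
  const queried =
    f.execution_data_availability === "available" ||
    f.execution_data_availability === "partial";
  return queried ? f.execution_count_30d : null;
}
```

null keeps the field a nullable number in JSON, whereas -1 would silently corrupt sums and averages in any consumer that forgets the sentinel.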
7. Schema Enhancement Proposals
7.1 Connector-Side Schema (NormalizedNode)
File: /Users/lucky/dev/securityv0/sv0-platform/src/ingestion/types.ts
Current:
export interface NormalizedNode {
nodeId: string;
nodeType: NormalizedNodeType;
sourceSystem: string;
sourceId: string;
displayName: string;
status: NodeStatus;
createdAt?: string;
lastModifiedAt?: string;
properties: Record<string, unknown>; // ← Unstructured
}
Proposed Addition (Automation Properties):
// New type for automation-specific properties
export interface AutomationProperties {
// Existing fields
identitySubtype: IdentitySubtype;
automation_type: string;
triggerTypes?: string[];
endpoint_url?: string | null;
// Execution evidence
execution_count_30d: number;
execution_evidence_refs: string[];
last_observed_execution_timestamp?: string | null;
// NEW: Data quality metadata
execution_data_availability: ExecutionDataAvailability;
execution_data_collected_at?: string;
execution_data_source?: string;
execution_data_notes?: string;
// Classification
execution_mode: ExecutionMode;
execution_mode_confidence: "high" | "low" | "unknown";
security_relevance: SecurityRelevance;
// Egress
egress_category: EgressCategory;
egress_host?: string | null;
egress_base_url?: string | null;
// Identity binding
identity_binding_status: "bound" | "unlinked";
// Risk assessment
risk_group: string;
risk_group_label: string;
risk_group_priority: string;
ownership_status: "valid" | "orphaned";
// Referenced data
referenced_tables?: string[];
data_domains?: string[];
}
export type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
export type SecurityRelevance = "active_external" | "dormant_authority" | "internal_inventory" | "unknown";
export type EgressCategory = "none" | "internal" | "external" | "cloud" | "llm" | "unknown";
export type ExecutionDataAvailability =
| "available"
| "partial"
| "unavailable_no_log"
| "unavailable_no_access"
| "not_collected";
export type IdentitySubtype =
| "flow_designer_flow"
| "business_rule"
| "scheduled_job"
| "system_execution"
| "oauth_app"
| "service_principal";
Migration Strategy:
- Add types to ingestion/types.ts
- Connector already emits these properties (they're in properties: Record<string, unknown>)
- Platform ingestion validates against type (runtime check, not compile-time)
- UI can now type-safely access entity.properties.execution_mode as ExecutionMode
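The runtime check could take the form of a type guard, so that code after the check gets the narrowed type without a bare cast. A minimal sketch covering a few of the AutomationProperties fields (the guard and helper names are hypothetical; other fields would follow the same pattern):

```typescript
const EXECUTION_MODES = ["autonomous", "operator_assisted", "human_triggered", "unknown"] as const;
const AVAILABILITY_VALUES = [
  "available", "partial", "unavailable_no_log", "unavailable_no_access", "not_collected",
] as const;

type ExecutionMode = (typeof EXECUTION_MODES)[number];
type ExecutionDataAvailability = (typeof AVAILABILITY_VALUES)[number];

// Subset of AutomationProperties checked in this sketch.
interface AutomationPropertiesCore {
  execution_mode: ExecutionMode;
  execution_count_30d: number;
  execution_evidence_refs: string[];
  execution_data_availability: ExecutionDataAvailability;
}

// True if v is one of the allowed string literals.
function isOneOf<T extends string>(values: readonly T[], v: unknown): v is T {
  return typeof v === "string" && (values as readonly string[]).indexOf(v) !== -1;
}

function isAutomationPropertiesCore(
  p: Record<string, unknown>,
): p is Record<string, unknown> & AutomationPropertiesCore {
  const count = p.execution_count_30d;
  const refs = p.execution_evidence_refs;
  return (
    isOneOf(EXECUTION_MODES, p.execution_mode) &&
    typeof count === "number" && Number.isInteger(count) && count >= 0 &&
    Array.isArray(refs) && refs.every((r) => typeof r === "string") &&
    isOneOf(AVAILABILITY_VALUES, p.execution_data_availability)
  );
}
```

Entities that fail the guard could be stored with a data_quality warning rather than rejected, which keeps the inventory complete.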
Effort: 2-3 hours (type definitions + validation)
7.2 Platform-Side Schema (EntityDoc)
File: /Users/lucky/dev/securityv0/sv0-platform/src/domain/entities/types.ts
Current:
export interface EntityDoc {
_id: string;
tenant_id: string;
entity_type: EntityType;
source_system: string;
source_id: string;
properties: Record<string, unknown>; // ← Unstructured
relationships: EntityRelationship[];
execution_paths?: ExecutionPath[];
accessible_by?: AccessibleByEntry[];
sync_version: number;
last_synced_at: Date;
created_at: Date;
updated_at: Date;
}
Proposed Addition:
export interface EntityDoc {
// ... existing fields ...
// NEW: Data quality metadata
data_quality?: DataQualityReport;
// NEW: User overrides
user_overrides?: UserOverrideMetadata;
}
export interface DataQualityReport {
overall_score: number; // 0-100
field_confidence: Record<string, FieldConfidence>;
warnings: string[];
last_validated_at?: Date;
}
export interface FieldConfidence {
level: "high" | "medium" | "low" | "unavailable";
source: "connector" | "platform_derived" | "user_override" | "default";
collected_at?: Date;
notes?: string;
}
export interface UserOverrideMetadata {
protected_fields: string[]; // Fields that won't be overwritten by connector sync
overrides: Record<string, FieldOverride>;
}
export interface FieldOverride {
field_name: string;
original_value: unknown;
override_value: unknown;
override_reason: string;
overridden_at: Date;
overridden_by: string; // user ID
}
Computation Logic (during ingestion):
// In ingestion/normalizer.ts
function computeDataQuality(entity: EntityDoc): DataQualityReport {
const confidence: Record<string, FieldConfidence> = {};
const warnings: string[] = [];
if (entity.properties.execution_mode === "unknown") {
confidence.execution_mode = {
level: "low",
source: "connector",
notes: "Trigger type not recognized by connector"
};
warnings.push("execution_mode classification unknown - manual review recommended");
} else {
confidence.execution_mode = {
level: "high",
source: "connector",
collected_at: new Date(entity.last_synced_at)
};
}
if (entity.properties.execution_count_30d === 0 && !entity.properties.execution_data_availability) {
confidence.execution_count_30d = {
level: "medium",
source: "connector",
notes: "Zero count, but availability status unknown"
};
warnings.push("Execution count is 0 - unclear if data was collected");
}
// ... more field checks ...
const overall_score = computeOverallScore(confidence);
return {
overall_score,
field_confidence: confidence,
warnings,
last_validated_at: new Date()
};
}
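computeOverallScore() is left undefined above. One possible definition as a weighted average scaled to 0-100, assuming per-level scores and per-field weights that would need tuning (none of these numbers come from the analysis):

```typescript
type ConfidenceLevel = "high" | "medium" | "low" | "unavailable";

// Subset of FieldConfidence from §7.2 needed for scoring.
interface FieldConfidence {
  level: ConfidenceLevel;
  notes?: string;
}

// Assumed scores per confidence level (illustrative, not calibrated).
const LEVEL_SCORE: Record<ConfidenceLevel, number> = {
  high: 100,
  medium: 60,
  low: 25,
  unavailable: 0,
};

// Assumed weights: classification fields count more; unlisted fields weigh 1.
const FIELD_WEIGHT: Record<string, number> = {
  execution_mode: 3,
  execution_count_30d: 2,
  egress_category: 2,
};

function computeOverallScore(confidence: Record<string, FieldConfidence>): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const field of Object.keys(confidence)) {
    const weight = FIELD_WEIGHT[field] ?? 1;
    weightedSum += weight * LEVEL_SCORE[confidence[field].level];
    totalWeight += weight;
  }
  // No assessed fields: report 0 rather than dividing by zero.
  return totalWeight === 0 ? 0 : Math.round(weightedSum / totalWeight);
}
```

With these assumed weights, the three fields in the §5.3 example score 68; the 72 shown there is illustrative and was not produced by this function.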
Effort: 8-12 hours (schema + computation + storage)
7.3 MongoDB Index Additions
File: /Users/lucky/dev/securityv0/sv0-platform/src/storage/mongo/collections.ts
Proposed Indexes:
// For classification_status filter
await entities.createIndex({
tenant_id: 1,
"properties.execution_mode": 1,
"properties.security_relevance": 1
});
// For data quality queries
await entities.createIndex({
tenant_id: 1,
"data_quality.overall_score": 1
});
// For user override tracking
await entities.createIndex({
tenant_id: 1,
"user_overrides.protected_fields": 1
});
Effort: 1 hour (index creation + migration script)
8. Pre-Ingest Filter Analysis
8.1 Current State
Connector Code: transformer.py lines 103-111
# Optionally filter internal_inventory automations (connector-side pre-filter).
# Default OFF to preserve Phase 1 inventory completeness gate.
if filter_internal_inventory:
filtered_count = self._filter_internal_inventory()
if filtered_count > 0:
logging.getLogger(__name__).info(
"Filtered %d internal_inventory automation(s) from NormalizedGraph output",
filtered_count,
)
Filter Logic: Lines 1211-1260
def _filter_internal_inventory(self) -> int:
"""Remove internal_inventory automation nodes and their orphaned edges/owner nodes.
Filtering criteria: security_relevance == "internal_inventory" means:
- egress_category in (none, internal, unknown)
- identity_binding_status == "unlinked"
- execution_count_30d == 0
"""
# Find automation node IDs to remove
remove_node_ids: set[str] = set()
for node in self._nodes:
if node.get("nodeType") == "autonomous_identity":
rel = node.get("properties", {}).get("security_relevance")
if rel == "internal_inventory":
remove_node_ids.add(node["nodeId"])
# Remove nodes
self._nodes = [n for n in self._nodes if n["nodeId"] not in remove_node_ids]
# Remove orphaned edges
self._edges = [
e for e in self._edges
if e["sourceNodeId"] not in remove_node_ids
and e["targetNodeId"] not in remove_node_ids
]
# Remove orphaned owner nodes (OWNED_BY targets with no other edges)
# ... (omitted for brevity)
return len(remove_node_ids)
8.2 Question: Should Filtering Happen Pre-Ingest or Post-Ingest?
Option A: Pre-Ingest (Current Implementation, Disabled by Default)
Pros:
- Reduces entity count before platform ingestion (lower storage, faster queries)
- Simplifies platform by not storing irrelevant data
- Graph layout is immediately clean (no 77 internal_inventory nodes)
Cons:
- Inventory incompleteness — can't retroactively include entities if criteria change
- Audit gap — no record that these automations exist in the source system
- Temporal loss — can't track when internal_inventory automations become security-relevant
- Discovery validation impossible — can't prove connector scanned all flows if some are filtered out
Option B: Post-Ingest (Platform Filters)
Pros:
- Complete inventory — every discovered automation is stored
- Temporal tracking — can see when execution_count changes from 0 → N (dormant → active)
- Audit trail — proves connector scanned all entities, none were lost
- Flexible filtering — UI can show/hide internal_inventory on demand
- Reclassification — if connector gets security_relevance wrong, platform can override
Cons:
- Higher entity count (92 instead of ~15)
- Requires UI/API default filters to hide noise
- Graph layout requires filtering logic
Recommendation: Option B (Post-Ingest with Default Filters)
Rationale:
- Discovery is broad, analysis is narrow — connector should discover everything, platform should filter for security relevance
- Temporal use case — a flow with 0 executions today may have 10 executions tomorrow. If it's filtered pre-ingest, we lose that transition.
- Audit/compliance — "How many automations exist in ServiceNow?" should be 92, not 15
- Data quality validation — can compare connector output to manual ServiceNow queries only if all entities are ingested
Implementation:
- Keep filter_internal_inventory: bool = False (default OFF in connector)
- Add default filter to platform API: GET /api/v1/entities?entity_type=identity&security_relevance!=internal_inventory
- Add default filter to UI Automations page
- Graph browse mode defaults to same filter (see automation-filtering-graph-strategy.md §S1)
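The default API filter could be built so that internal_inventory is hidden unless the caller opts in or asks for a bucket explicitly. A minimal sketch producing a MongoDB filter document; the query option names are hypothetical, not the existing handler's signature:

```typescript
// Hypothetical query options for GET /api/v1/entities.
interface EntityQuery {
  tenantId: string;
  entityType?: string;
  securityRelevance?: string; // explicit filter overrides the default
  includeInternalInventory?: boolean; // opt in to the full inventory
}

type MongoFilter = Record<string, unknown>;

function buildEntityFilter(q: EntityQuery): MongoFilter {
  const filter: MongoFilter = { tenant_id: q.tenantId };
  if (q.entityType) filter.entity_type = q.entityType;
  if (q.securityRelevance) {
    filter["properties.security_relevance"] = q.securityRelevance;
  } else if (!q.includeInternalInventory) {
    // Everything is stored; the default view just hides internal_inventory.
    filter["properties.security_relevance"] = { $ne: "internal_inventory" };
  }
  return filter;
}
```

Note that $ne also matches documents where the field is missing, so entities that predate security_relevance classification stay visible by default.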
Migration Path:
- Currently deployed: internal_inventory entities ARE ingested (filter is off)
- No migration needed
- Just add default filters to API/UI
8.3 When Should Pre-Ingest Filtering Be Used?
Valid Use Cases:
- Connector has a bug that discovers non-existent/duplicate entities → filter in connector until bug is fixed
- Source system permissions limit — can't query execution data for some entities → filter them to avoid misleading 0 counts
- Scale issues — 10,000+ automations discovered, platform can't handle load → filter to top N by relevance
Invalid Use Cases:
- Hiding false positives — this should be done via UI filters, not pre-ingest
- Improving graph layout — this is a UI problem, not a data problem
- "Cleaning up" the inventory — defeats the purpose of deterministic discovery
Recommendation for sv0-connectors:
- Remove filter_internal_inventory parameter entirely (simplifies connector interface)
- Always emit all discovered entities
- Let platform handle relevance filtering
9. Challenge Questions for Other Roles
9.1 For Product Owner
Q1: Should we expose data quality confidence levels in the UI?
Context: 29% of flows have execution_mode: "unknown". Currently, the UI shows this as a plain value. Should we add visual indicators (🟢 High / 🟡 Low / 🔴 Unknown) to signal data quality?
Impact:
- Users can prioritize manual review of low-confidence entities
- Reduces false confidence in incomplete data
- Adds visual noise to UI
Recommendation: Yes, but make it subtle (icon + tooltip, not full badge).
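Below is a sketch of the icon + tooltip mapping. It assumes the execution_mode_confidence field proposed in §10.4 ("high" | "low" | "unknown"). indicatorFor and the tooltip texts are hypothetical.

```typescript
type Confidence = "high" | "low" | "unknown";

interface ConfidenceIndicator {
  icon: string;
  tooltip: string;
}

// Icon + tooltip per confidence level, following the 🟢 / 🟡 / 🔴 scheme.
// Tooltip wording is illustrative.
const INDICATORS: Record<Confidence, ConfidenceIndicator> = {
  high: { icon: "🟢", tooltip: "High confidence in execution_mode" },
  low: { icon: "🟡", tooltip: "Low confidence: manual review suggested" },
  unknown: { icon: "🔴", tooltip: "Execution mode unknown: manual classification recommended" },
};

// A missing confidence value is treated as "unknown", never as "high", so
// absent metadata cannot render as a confirmed classification.
function indicatorFor(props: { execution_mode_confidence?: Confidence }): ConfidenceIndicator {
  return INDICATORS[props.execution_mode_confidence ?? "unknown"];
}
```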
Q2: Is "Show all automations" a toggle or a separate page?
Context: 77 of 92 automations are internal_inventory (hidden by default). Should users see them via:
- A. Toggle switch "Show internal inventory" on main Automations page
- B. Separate "Automation Inventory (All)" page
- C. Filter dropdown: "Security-relevant only" / "All automations"
Recommendation: Option C (filter dropdown) — most flexible, matches existing filter patterns.
Q3: Should incomplete classification block findings generation?
Context: If execution_mode: "unknown", should the evaluator still generate dormant_authority findings, or skip the entity?
Options:
- A. Block: conservative; avoids false positives but reduces finding coverage
- B. Allow: aggressive; treats "unknown" as "autonomous" (assumes the worst case)
- C. Flag: generate the finding but mark it as "low_confidence"
Recommendation: Option C (flag with low_confidence).
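The following sketch makes the three options concrete. It shows how an evaluator might apply each policy to an entity it has already judged to hold dormant authority. The function, the finding shape, and its low_confidence flag are assumptions for illustration, not the evaluator's current interface.

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
type UnknownModePolicy = "block" | "allow" | "flag";

interface DormantAuthorityFinding {
  entity_id: string;
  finding_type: "dormant_authority";
  evaluated_mode: ExecutionMode;
  low_confidence: boolean;
}

// Only the execution_mode gap is handled here; the dormant-authority
// judgement itself is assumed to have been made upstream.
function dormantAuthorityFinding(
  entityId: string,
  mode: ExecutionMode,
  policy: UnknownModePolicy,
): DormantAuthorityFinding | null {
  const base = { entity_id: entityId, finding_type: "dormant_authority" as const };
  if (mode !== "unknown") {
    return { ...base, evaluated_mode: mode, low_confidence: false };
  }
  switch (policy) {
    case "block":
      return null; // skip unclassified entities entirely
    case "allow":
      return { ...base, evaluated_mode: "autonomous", low_confidence: false }; // assume worst case
    case "flag":
      return { ...base, evaluated_mode: "unknown", low_confidence: true }; // emit, but mark it
  }
}
```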
9.2 For CISO
Q1: Is "incomplete classification" itself a finding we should surface?
Example Finding:
```text
Finding Type: INCOMPLETE_AUTOMATION_CLASSIFICATION
Severity: Low
Title: 22 automations with unknown execution mode
Description: 22 of 92 discovered automations have execution_mode="unknown" due to
  unrecognized trigger types. Manual classification recommended to ensure
  complete risk assessment.
Evidence:
  - Automation IDs: [list of 22 entity IDs]
  - Trigger types causing gaps: ["knowledge management", "email handler", ...]
  - Recommendation: Expand connector trigger type allowlist or manually classify
```
Pro: Surfaces data quality gaps as actionable items
Con: Not a security risk per se, more of an operational issue
Recommendation: Yes, but as severity "Informational" (not Low/Medium/High).
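If the finding is adopted, it can be derived directly from ingested entities. Below is a minimal sketch. It assumes each entity exposes execution_mode and triggerTypes as in Appendix A. The function name and output shape are hypothetical. They follow the example finding above and use the recommended Informational severity.

```typescript
interface AutomationEntity {
  id: string;
  execution_mode: string;
  triggerTypes?: string[];
}

interface ClassificationGapFinding {
  finding_type: "INCOMPLETE_AUTOMATION_CLASSIFICATION";
  severity: "Informational";
  title: string;
  description: string;
  automation_ids: string[];
  gap_trigger_types: string[];
}

// Rolls every entity with execution_mode "unknown" into one finding and
// collects the trigger types that caused the gap as evidence.
function classificationGapFinding(entities: AutomationEntity[]): ClassificationGapFinding | null {
  const unknown = entities.filter((e) => e.execution_mode === "unknown");
  if (unknown.length === 0) return null;
  const triggers = new Set<string>();
  for (const e of unknown) {
    for (const t of e.triggerTypes ?? []) triggers.add(t);
  }
  return {
    finding_type: "INCOMPLETE_AUTOMATION_CLASSIFICATION",
    severity: "Informational",
    title: `${unknown.length} automations with unknown execution mode`,
    description: `${unknown.length} of ${entities.length} discovered automations have execution_mode="unknown". Manual classification recommended.`,
    automation_ids: unknown.map((e) => e.id),
    gap_trigger_types: Array.from(triggers).sort(),
  };
}
```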
Q2: What is the acceptable classification gap rate?
Current: 29% of flows have execution_mode: "unknown"
Question: What threshold should trigger an alert?
- 0% (perfect classification required)?
- <10% (acceptable noise)?
- <30% (current state is acceptable)?
Recommendation: <10% for security-relevant automations, <50% for internal_inventory.
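The recommended thresholds can be checked per relevance group at the end of each sync. The sketch below assumes "security-relevant" means every entity not labeled internal_inventory. Entities with security_relevance "unknown" are conservatively counted as relevant. The function names are hypothetical.

```typescript
interface ClassifiedEntity {
  execution_mode: string;
  security_relevance: string;
}

// Thresholds from the recommendation above; the gap rate must stay below them.
const MAX_GAP_RATE = { security_relevant: 0.1, internal_inventory: 0.5 };

function gapRate(entities: ClassifiedEntity[]): number {
  if (entities.length === 0) return 0;
  const unknown = entities.filter((e) => e.execution_mode === "unknown").length;
  return unknown / entities.length;
}

// Assumption: anything not labeled internal_inventory (including
// security_relevance "unknown") counts as security-relevant.
function gapRateAlerts(entities: ClassifiedEntity[]): string[] {
  const internal = entities.filter((e) => e.security_relevance === "internal_inventory");
  const relevant = entities.filter((e) => e.security_relevance !== "internal_inventory");
  const alerts: string[] = [];
  const relevantRate = gapRate(relevant);
  if (relevantRate >= MAX_GAP_RATE.security_relevant) {
    alerts.push(`security-relevant gap rate ${(relevantRate * 100).toFixed(1)}% is not below 10%`);
  }
  const internalRate = gapRate(internal);
  if (internalRate >= MAX_GAP_RATE.internal_inventory) {
    alerts.push(`internal_inventory gap rate ${(internalRate * 100).toFixed(1)}% is not below 50%`);
  }
  return alerts;
}
```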
Q3: Should we trust execution_count=0 or require manual verification?
Context: ALL 77 internal_inventory flows have execution_count_30d: 0. No execution_data_availability metadata to confirm this is accurate.
Options:
- A. Trust it: assume the connector is correct and proceed with analysis
- B. Flag it: show the warning "Execution count may be incomplete"
- C. Block it: require INTEGRATOR to validate before accepting data
Recommendation: Flag it (option B) until INTEGRATOR confirms collection works (see §6).
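Flagging can key off the proposed execution_data_availability field when it exists. When the field is absent, it can fall back to a generic warning. Below is a sketch; the function name and warning texts are hypothetical.

```typescript
type ExecutionDataAvailability =
  | "available"
  | "partial"
  | "unavailable_no_log"
  | "unavailable_no_access"
  | "not_collected";

// Returns the UI warning for an execution count, or null when it can be shown as-is.
function executionCountWarning(count: number, availability?: ExecutionDataAvailability): string | null {
  switch (availability) {
    case "available":
      return null;
    case "partial":
      return "Execution count may be incomplete (collection was partial)";
    case "unavailable_no_log":
      return "Source system keeps no execution log for this automation type";
    case "unavailable_no_access":
      return "Connector lacks permission to read execution logs";
    case "not_collected":
      return "Execution data was not collected";
    default:
      // Field absent (today's payloads). Assumption: a non-zero count proves
      // collection ran for this entity; a zero is ambiguous and gets flagged.
      return count === 0 ? "Execution count may be incomplete" : null;
  }
}
```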
9.3 For Architect
Q1: Should classification be a platform concern or remain connector-side?
Current: Connector computes execution_mode, security_relevance, risk_group. Platform ingests verbatim.
Alternative: Platform recomputes these during ingestion based on normalized properties.
Pros of Platform Classification:
- Single source of truth
- Consistent across all connectors
- Easier to evolve classification logic (no connector updates needed)
Cons:
- Duplicates logic between connector and platform
- Connector loses autonomy
- What if connector has better context (e.g., ServiceNow-specific trigger types)?
Recommendation: Hybrid — connector provides raw signals (trigger types, egress URLs), platform derives classification. Connector can provide hints, but platform is authoritative.
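Below is a sketch of the hybrid precedence, assuming a platform-side rule table keyed on identitySubtype. Only the business_rule → autonomous rule comes from this analysis. The table, RawSignals, and resolveExecutionMode are illustrative. A real rule set would also inspect triggerTypes.

```typescript
type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";
type ModeSource = "connector" | "platform_override";

interface RawSignals {
  identitySubtype: string;
  triggerTypes: string[]; // raw signal; unused by this minimal rule table
  connectorHint?: ExecutionMode; // the connector's own classification, if any
}

interface ResolvedMode {
  execution_mode: ExecutionMode;
  execution_mode_source: ModeSource;
}

// Illustrative platform rules. A Map avoids accidental hits on inherited
// object keys such as "constructor".
const SUBTYPE_RULES = new Map<string, ExecutionMode>([["business_rule", "autonomous"]]);

// Platform derivation wins; the connector hint is used only when the
// platform has no rule for this entity.
function resolveExecutionMode(signals: RawSignals): ResolvedMode {
  const derived = SUBTYPE_RULES.get(signals.identitySubtype);
  const hint: ExecutionMode = signals.connectorHint ?? "unknown";
  if (derived !== undefined) {
    return {
      execution_mode: derived,
      execution_mode_source: derived === hint ? "connector" : "platform_override",
    };
  }
  return { execution_mode: hint, execution_mode_source: "connector" };
}
```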
Q2: Should execution_data_availability be part of the NormalizedGraph schema?
Context: Proposed new field to distinguish "confirmed zero" from "data not collected".
Question: Should this be:
- A. Required field in NormalizedGraph (connector MUST provide it)
- B. Optional field (connector MAY provide it, platform infers if absent)
- C. Platform-computed only (connector doesn't emit it, platform adds during ingestion)
Recommendation: Option A (required). Data quality is critical — connectors should explicitly declare availability.
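Under option A, ingestion would reject automation nodes that omit the field instead of defaulting it. Below is a minimal validation sketch, assuming nodes arrive with a free-form properties record. The function names are hypothetical.

```typescript
const AVAILABILITY_VALUES = [
  "available",
  "partial",
  "unavailable_no_log",
  "unavailable_no_access",
  "not_collected",
] as const;
type ExecutionDataAvailability = (typeof AVAILABILITY_VALUES)[number];

interface IncomingNode {
  id: string;
  properties: Record<string, unknown>;
}

function isAvailability(value: unknown): value is ExecutionDataAvailability {
  return typeof value === "string" && (AVAILABILITY_VALUES as readonly string[]).includes(value);
}

// Returns one error per automation node that omits or misspells the field,
// instead of letting a missing value default to something that looks like data.
function validateExecutionDataAvailability(nodes: IncomingNode[]): string[] {
  const errors: string[] = [];
  for (const node of nodes) {
    const value = node.properties["execution_data_availability"];
    if (value === undefined) {
      errors.push(`${node.id}: execution_data_availability is required`);
    } else if (!isAvailability(value)) {
      errors.push(`${node.id}: invalid execution_data_availability ${JSON.stringify(value)}`);
    }
  }
  return errors;
}
```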
Q3: Should we support connector-to-platform data quality feedback?
Context: Connector knows when it hits API limits, permissions errors, or timeouts during data collection.
Proposed: Add collectionWarnings to NormalizedGraph:
```typescript
export interface NormalizedGraph {
  // ... existing fields ...
  collectionWarnings?: CollectionWarning[];
}

export interface CollectionWarning {
  field: string; // "execution_count_30d"
  severity: "info" | "warning" | "error";
  message: string; // "API limit reached, execution count may be incomplete"
  affected_entities?: string[]; // nodeIds
}
```
Benefit: Platform can surface connector issues in UI, not just logs.
Recommendation: Yes — this closes the feedback loop between connector and platform.
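On the platform side, graph-level warnings would need to be indexed by node so the UI can show them on each entity. The sketch below uses the CollectionWarning shape above. Treating warnings without affected_entities as scan-wide is an assumption, and indexWarnings is a hypothetical name.

```typescript
interface CollectionWarning {
  field: string;
  severity: "info" | "warning" | "error";
  message: string;
  affected_entities?: string[]; // nodeIds
}

interface IndexedWarnings {
  byNode: Map<string, CollectionWarning[]>;
  scanWide: CollectionWarning[]; // warnings not tied to specific nodes
}

// Groups warnings by affected nodeId; a warning naming several nodes is
// attached to each of them.
function indexWarnings(warnings: CollectionWarning[]): IndexedWarnings {
  const byNode = new Map<string, CollectionWarning[]>();
  const scanWide: CollectionWarning[] = [];
  for (const warning of warnings) {
    const ids = warning.affected_entities ?? [];
    if (ids.length === 0) {
      scanWide.push(warning);
      continue;
    }
    for (const id of ids) {
      const list = byNode.get(id) ?? [];
      list.push(warning);
      byNode.set(id, list);
    }
  }
  return { byNode, scanWide };
}
```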
9.4 For Integrator
Q1: What additional ServiceNow APIs would improve execution_count reliability?
Current: Uses sys_flow_context (flow-level) and sys_trigger (job-level).
Gaps:
- Business rules: no execution log
- Flows: only flow-level count, not action-level detail
Proposed Research:
- sys_hub_action_instance: action-level execution log (which flow steps ran)
- sys_log: generic script execution log (may contain business rule executions)
- sys_execution_tracker: long-running job tracker
- sys_audit: table history (indirect signal for business rule executions)
Question: Which of these are feasible with OAuth app permissions?
Expected Effort: 4-8 hours research + testing
Q2: Can we get last_modified_date for flows?
Context: Currently missing from entity properties. Would help identify recently-edited flows (potential new risk).
ServiceNow Field: sys_updated_on in sys_hub_flow table
Question: Already collected but not emitted, or not collected?
Action: Check connector code, add to properties if available.
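If sys_updated_on is already fetched, emitting it is mostly a format conversion. The sketch below assumes the Table API returns the raw value (the default, sysparm_display_value=false) as "YYYY-MM-DD HH:MM:SS" in UTC. Confirm this against the instance before relying on it. toIsoTimestamp and withLastModified are hypothetical helpers.

```typescript
// Converts a raw ServiceNow date-time string ("YYYY-MM-DD HH:MM:SS", assumed
// UTC) to ISO 8601. Returns null for missing or unexpected values instead of
// guessing, so "not collected" stays distinguishable from a real date.
function toIsoTimestamp(raw: string | undefined): string | null {
  if (!raw) return null;
  const match = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2})$/.exec(raw.trim());
  if (!match) return null;
  return `${match[1]}T${match[2]}Z`;
}

// Hypothetical transformer step: copy sys_hub_flow.sys_updated_on into the
// proposed last_modified_date property.
function withLastModified(
  props: Record<string, unknown>,
  record: { sys_updated_on?: string },
): Record<string, unknown> {
  return { ...props, last_modified_date: toIsoTimestamp(record.sys_updated_on) };
}
```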
Q3: Should connector emit a "data collection report" after each scan?
Proposed: After discovery, connector emits a summary:
```jsonc
{
  "collection_summary": {
    "flows_discovered": 83,
    "flows_with_execution_data": 0, // ← KEY METRIC
    "flows_skipped_no_permissions": 0,
    "execution_data_sources": ["sys_flow_context", "sys_trigger"],
    "collection_duration_seconds": 42,
    "api_calls_made": 156,
    "api_limits_hit": 0
  }
}
```
Benefit: Immediate visibility into connector health, not just entity data.
Recommendation: Yes — include in connector sync metadata.
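The summary is most useful if the platform turns it into explicit health messages, so operators do not have to read raw counts. Below is a sketch over the example fields above. The checks, the message texts, and summaryHealthChecks are assumptions.

```typescript
interface CollectionSummary {
  flows_discovered: number;
  flows_with_execution_data: number;
  flows_skipped_no_permissions: number;
  execution_data_sources: string[];
  collection_duration_seconds: number;
  api_calls_made: number;
  api_limits_hit: number;
}

// Turns a collection summary into health messages; an empty array means no
// issue was detected.
function summaryHealthChecks(s: CollectionSummary): string[] {
  const issues: string[] = [];
  if (s.flows_discovered > 0 && s.flows_with_execution_data === 0) {
    issues.push(
      `0 of ${s.flows_discovered} flows have execution data: verify collection before trusting execution_count_30d = 0`,
    );
  }
  if (s.flows_skipped_no_permissions > 0) {
    issues.push(`${s.flows_skipped_no_permissions} flows skipped for missing permissions`);
  }
  if (s.api_limits_hit > 0) {
    issues.push(`API limits hit ${s.api_limits_hit} time(s): execution counts may be partial`);
  }
  if (s.execution_data_sources.length === 0) {
    issues.push("No execution data sources were queried");
  }
  return issues;
}
```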
10. Summary & Recommendations
10.1 Critical Issues (Blocking)
| Issue | Impact | Recommendation | Effort |
|---|---|---|---|
| Null Ambiguity | Can't distinguish "confirmed zero" from "not checked" | Add execution_data_availability to schema | 8-12h |
| 29% execution_mode Gap | Can't classify roughly 3 in 10 flows | Expand trigger type allowlist + platform fallback | 4-6h |
| No Data Quality Metadata | Can't assess confidence in entity properties | Add data_quality to EntityDoc | 12-16h |
10.2 High-Value Improvements (Recommended)
| Feature | Use Case | Effort |
|---|---|---|
| Automation Summary Endpoint | Dashboard stats, connector validation | 4-6h |
| Classification Override API | Manual review workflow | 8-12h |
| classification_status=incomplete Filter | Find entities needing review | 2-3h |
| Pre-Ingest Filter Removal | Preserve inventory completeness | 1h |
10.3 INTEGRATOR Action Items
- Priority 1: Validate execution data collection works (check for flows with exec_count > 0)
- Priority 2: Research sys_hub_action_instance for action-level execution evidence
- Priority 3: Add last_modified_date to flow properties
- Priority 4: Emit data collection summary in sync metadata
10.4 Platform Schema Enhancements
```typescript
// 1. Add to NormalizedNode properties (connector emits)
interface AutomationProperties {
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string;
  execution_mode_confidence: "high" | "low" | "unknown";
}

// 2. Add to EntityDoc (platform computes)
interface EntityDoc {
  data_quality?: DataQualityReport;
  user_overrides?: UserOverrideMetadata;
}

// 3. Add to NormalizedGraph (connector emits)
interface NormalizedGraph {
  collectionWarnings?: CollectionWarning[];
}
```
10.5 API Additions
```text
GET /api/v1/automations/summary
GET /api/v1/entities?classification_status=incomplete
PATCH /api/v1/entities/:id/classification
```
10.6 Answers to Core Questions
Q: Does execution_count_30d: 0 mean "we checked and found zero" or "we didn't check"?
A: Currently ambiguous. Recommendation: Add execution_data_availability field to make this explicit.
Q: Is a 29% execution_mode: "unknown" rate acceptable?
A: Acceptable for internal_inventory (hidden by default), blocking for security-relevant automations. Recommendation: Platform fallback for known subtypes (business_rule → autonomous).
Q: Should the platform have a fallback classification if the connector returns "unknown"?
A: Yes, with metadata indicating override. Use execution_mode_source: "platform_override" to track provenance.
Appendix A: TypeScript Type Definitions
File: /Users/lucky/dev/securityv0/sv0-platform/src/domain/entities/automation-types.ts (new)
```typescript
/**
 * Automation-specific types for identity entities.
 * These types provide structure for properties that were previously untyped (Record<string, unknown>).
 */
export type IdentitySubtype =
  | "flow_designer_flow"
  | "business_rule"
  | "scheduled_job"
  | "system_execution"
  | "oauth_app"
  | "service_principal";

export type ExecutionMode = "autonomous" | "operator_assisted" | "human_triggered" | "unknown";

export type SecurityRelevance =
  | "active_external" // Has external egress + execution evidence
  | "dormant_authority" // Has capability but no recent execution
  | "internal_inventory" // No external egress, no execution, unlinked
  | "unknown";

export type EgressCategory = "none" | "internal" | "external" | "cloud" | "llm" | "unknown";

export type ExecutionDataAvailability =
  | "available" // Data was collected, count is accurate
  | "partial" // Data was collected but incomplete (API limit, timeout)
  | "unavailable_no_log" // Source system has no execution log for this automation type
  | "unavailable_no_access" // Connector lacks permissions to query execution logs
  | "not_collected"; // Execution data collection was skipped

export interface AutomationProperties {
  // Identity classification
  identitySubtype: IdentitySubtype;
  automation_type: string; // "flow", "business_rule", "job", "script"

  // Trigger configuration
  triggerTypes?: string[];

  // Execution evidence
  execution_count_30d: number;
  execution_evidence_refs: string[];
  last_observed_execution_timestamp?: string | null;

  // Data quality metadata
  execution_data_availability: ExecutionDataAvailability;
  execution_data_collected_at?: string; // ISO 8601 timestamp
  execution_data_source?: string; // "sys_flow_context" | "sys_trigger" | "unavailable"
  execution_data_notes?: string;

  // Classification
  execution_mode: ExecutionMode;
  execution_mode_confidence: "high" | "low" | "unknown";
  execution_mode_source?: "connector" | "platform_override" | "user_override";
  security_relevance: SecurityRelevance;

  // Egress analysis
  egress_category: EgressCategory;
  egress_host?: string | null;
  egress_base_url?: string | null;
  endpoint_url?: string | null;

  // Identity binding
  identity_binding_status: "bound" | "unlinked";

  // Risk assessment
  risk_group: string; // "RG1" | "RG2" | "RG3" | "RG4" | "RG5"
  risk_group_label: string;
  risk_group_priority: string; // "P1" | "P2" | "P3" | "P4"

  // Ownership
  ownership_status: "valid" | "orphaned";
  sys_created_by?: string;
  sys_updated_by?: string;

  // Referenced data
  referenced_tables?: string[];
  data_domains?: string[];
}

export interface DataQualityReport {
  overall_score: number; // 0-100, weighted sum of field confidence scores
  field_confidence: Record<string, FieldConfidence>;
  warnings: string[];
  last_validated_at?: Date;
}

export interface FieldConfidence {
  level: "high" | "medium" | "low" | "unavailable";
  source: "connector" | "platform_derived" | "user_override" | "default";
  collected_at?: Date;
  notes?: string;
}

export interface UserOverrideMetadata {
  protected_fields: string[]; // Field names that won't be overwritten by connector sync
  overrides: Record<string, FieldOverride>;
}

export interface FieldOverride {
  field_name: string;
  original_value: unknown;
  override_value: unknown;
  override_reason: string;
  overridden_at: Date;
  overridden_by: string; // User ID from auth context
}

export interface CollectionWarning {
  field: string; // Property name that was affected
  severity: "info" | "warning" | "error";
  message: string;
  affected_entities?: string[]; // nodeIds of entities affected by this warning
}
```
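The overall_score comment calls for a weighted sum of field confidence scores but does not give the weights or per-level points. The sketch below uses assumed values: high 100, medium 70, low 40, unavailable 0, with unweighted fields counting 1. The sum is normalized by total weight so the result stays in 0-100. overallScore is a hypothetical name.

```typescript
type ConfidenceLevel = "high" | "medium" | "low" | "unavailable";

// Assumed points per level; the analysis does not specify them.
const LEVEL_POINTS: Record<ConfidenceLevel, number> = {
  high: 100,
  medium: 70,
  low: 40,
  unavailable: 0,
};

// Weighted average of per-field points: the weighted sum divided by the total
// weight, rounded to an integer in 0-100. Fields without a weight count as 1.
function overallScore(
  fieldConfidence: Record<string, { level: ConfidenceLevel }>,
  weights: Record<string, number> = {},
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [field, confidence] of Object.entries(fieldConfidence)) {
    const weight = Object.prototype.hasOwnProperty.call(weights, field) ? weights[field] : 1;
    weightedSum += weight * LEVEL_POINTS[confidence.level];
    totalWeight += weight;
  }
  return totalWeight === 0 ? 0 : Math.round(weightedSum / totalWeight);
}
```

For example, execution_mode at "low" with weight 2 and execution_count_30d at "high" with the default weight give (2 × 40 + 100) / 3 = 60.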
Appendix B: API Endpoint Specifications
B.1 Automation Summary Endpoint
GET /api/v1/automations/summary
Query Parameters:
- tenant_id (from auth context)
- source_system (optional): filter by source system
Response:
```json
{
  "total_count": 92,
  "by_subtype": {
    "flow_designer_flow": 83,
    "business_rule": 2,
    "oauth_app": 3,
    "service_principal": 2,
    "system_execution": 2
  },
  "by_execution_mode": {
    "autonomous": 45,
    "operator_assisted": 10,
    "unknown": 22,
    "human_triggered": 0
  },
  "by_security_relevance": {
    "internal_inventory": 77,
    "dormant_authority": 9,
    "active_external": 1,
    "unknown": 5
  },
  "by_egress_category": {
    "none": 77,
    "external": 5,
    "llm": 3,
    "cloud": 2,
    "internal": 3,
    "unknown": 2
  },
  "with_execution_evidence": 0,
  "with_identity_binding": 9,
  "classification_gaps": {
    "execution_mode_unknown": 22,
    "trigger_types_empty": 5,
    "execution_data_unavailable": 2
  },
  "data_quality": {
    "overall_average_score": 72,
    "entities_with_warnings": 24,
    "low_confidence_count": 22
  }
}
```
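Below is a minimal in-memory sketch of how the counts above could be computed. It assumes properties shaped like AutomationProperties. It also defines with_execution_evidence as execution_count_30d > 0, which is an assumption; execution_evidence_refs would be an alternative signal. A production endpoint would more likely push this into a database aggregation. countBy and buildAutomationSummary are hypothetical names.

```typescript
interface SummaryInput {
  identitySubtype: string;
  execution_mode: string;
  security_relevance: string;
  egress_category: string;
  execution_count_30d: number;
}

// Counts items per key value. A null-prototype object avoids collisions with
// inherited keys such as "constructor".
function countBy<T>(items: T[], key: (item: T) => string): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (const item of items) {
    const k = key(item);
    counts[k] = (counts[k] ?? 0) + 1;
  }
  return counts;
}

function buildAutomationSummary(items: SummaryInput[]) {
  return {
    total_count: items.length,
    by_subtype: countBy(items, (i) => i.identitySubtype),
    by_execution_mode: countBy(items, (i) => i.execution_mode),
    by_security_relevance: countBy(items, (i) => i.security_relevance),
    by_egress_category: countBy(items, (i) => i.egress_category),
    // Assumption: "execution evidence" means a non-zero 30-day count.
    with_execution_evidence: items.filter((i) => i.execution_count_30d > 0).length,
  };
}
```

Every by_* breakdown is computed over the same entity set, so each one sums to total_count. That gives a cheap consistency check on the endpoint's output.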
B.2 Classification Override Endpoint
PATCH /api/v1/entities/:id/classification
Request Body:
```json
{
  "execution_mode": "autonomous",
  "override_reason": "Business rule always runs on record insert, not human-triggered"
}
```
Response:
```json
{
  "entity_id": "027e16c40dc1009472308597",
  "previous_classification": {
    "execution_mode": "unknown",
    "execution_mode_source": "connector"
  },
  "updated_classification": {
    "execution_mode": "autonomous",
    "execution_mode_source": "user_override",
    "execution_mode_confidence": "high",
    "override_reason": "Business rule always runs on record insert, not human-triggered",
    "overridden_at": "2026-02-12T15:30:00Z",
    "overridden_by": "user-123"
  },
  "protected_fields": ["execution_mode"]
}
```
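The override and the protected_fields sync behavior can be sketched as two pure functions. The sketch assumes properties are stored as a flat record. It also assumes that overriding execution_mode pins execution_mode_source and execution_mode_confidence as well, so a later sync cannot pair a user-set mode with connector-provided confidence. That companion-field rule is a design assumption, not part of the spec above. applyOverride, mergeConnectorSync, and COMPANION_FIELDS are hypothetical names.

```typescript
interface FieldOverride {
  field_name: string;
  original_value: unknown;
  override_value: unknown;
  override_reason: string;
  overridden_at: Date;
  overridden_by: string;
}

interface StoredEntity {
  properties: Record<string, unknown>;
  user_overrides: { protected_fields: string[]; overrides: Record<string, FieldOverride> };
}

// Assumption: fields pinned together with a protected field.
const COMPANION_FIELDS = new Map<string, string[]>([
  ["execution_mode", ["execution_mode_source", "execution_mode_confidence"]],
]);

// Records the override, updates provenance, and protects the field.
function applyOverride(entity: StoredEntity, field: string, value: unknown, reason: string, userId: string, now: Date = new Date()): StoredEntity {
  const properties: Record<string, unknown> = { ...entity.properties, [field]: value };
  if (field === "execution_mode") {
    properties["execution_mode_source"] = "user_override";
    properties["execution_mode_confidence"] = "high";
  }
  const current = entity.user_overrides.protected_fields;
  const protectedFields = current.includes(field) ? current : [...current, field];
  const override: FieldOverride = {
    field_name: field,
    original_value: entity.properties[field],
    override_value: value,
    override_reason: reason,
    overridden_at: now,
    overridden_by: userId,
  };
  return {
    properties,
    user_overrides: {
      protected_fields: protectedFields,
      overrides: { ...entity.user_overrides.overrides, [field]: override },
    },
  };
}

// Connector sync: accept incoming values except protected fields and their companions.
function mergeConnectorSync(entity: StoredEntity, incoming: Record<string, unknown>): StoredEntity {
  const pinned = new Set<string>();
  for (const f of entity.user_overrides.protected_fields) {
    pinned.add(f);
    for (const c of COMPANION_FIELDS.get(f) ?? []) pinned.add(c);
  }
  const properties: Record<string, unknown> = { ...entity.properties };
  for (const [key, value] of Object.entries(incoming)) {
    if (!pinned.has(key)) properties[key] = value;
  }
  return { ...entity, properties };
}
```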
B.3 Classification Status Filter
GET /api/v1/entities?classification_status=incomplete
Logic: Returns entities where ANY of:
- properties.execution_mode === "unknown"
- properties.security_relevance === "unknown"
- properties.execution_data_availability === "not_collected"
- properties.triggerTypes.length === 0 AND properties.identitySubtype === "flow_designer_flow"
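The same condition can be written as an in-memory predicate, for example to unit-test the API's query against it. Below is a sketch. Treating a missing triggerTypes as empty is an assumption, since the field is optional in AutomationProperties. isClassificationIncomplete is a hypothetical name.

```typescript
interface ClassificationProps {
  execution_mode?: string;
  security_relevance?: string;
  execution_data_availability?: string;
  triggerTypes?: string[];
  identitySubtype?: string;
}

// True when ANY of the incomplete-classification conditions holds.
function isClassificationIncomplete(p: ClassificationProps): boolean {
  return (
    p.execution_mode === "unknown" ||
    p.security_relevance === "unknown" ||
    p.execution_data_availability === "not_collected" ||
    (p.identitySubtype === "flow_designer_flow" && (p.triggerTypes?.length ?? 0) === 0)
  );
}
```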
Response:
```json
{
  "data": [
    {
      "_id": "027e16c40dc1009472308597",
      "entity_type": "identity",
      "properties": {
        "display_name": "Knowledge - Approval Publish",
        "execution_mode": "unknown",
        "triggerTypes": ["knowledge management"],
        "execution_data_availability": "available"
      },
      "data_quality": {
        "overall_score": 65,
        "warnings": ["execution_mode classification unknown - manual review recommended"]
      }
    }
  ],
  "cursor": null,
  "meta": {
    "total_count": 22
  }
}
```
Appendix C: Implementation Checklist
Phase 1: Schema Enhancements (8-12 hours)
- Add automation-types.ts with structured types
- Add execution_data_availability to AutomationProperties
- Add execution_mode_confidence to AutomationProperties
- Add DataQualityReport to EntityDoc
- Add UserOverrideMetadata to EntityDoc
- Create MongoDB indexes for new fields
Phase 2: API Endpoints (12-16 hours)
- Implement GET /api/v1/automations/summary
- Implement PATCH /api/v1/entities/:id/classification
- Add classification_status=incomplete query parameter
- Add data quality computation during ingestion
- Add protected_fields sync behavior
Phase 3: Connector Updates (INTEGRATOR) (8-12 hours)
- Add execution_data_availability to transformer output
- Add execution_mode_confidence to transformer output
- Add execution_data_collected_at timestamp
- Add collectionWarnings to NormalizedGraph
- Validate execution data collection works (check for exec_count > 0)
- Research sys_hub_action_instance API
Phase 4: UI Updates (8-12 hours)
- Add data quality badge to entity detail page
- Add "Manually Classify" button for entities with execution_mode=unknown
- Add classification override modal
- Add "Show internal inventory" filter toggle
- Add classification_status=incomplete filter to Automations page
- Add automation summary dashboard widget
Phase 5: Testing (4-6 hours)
- Unit tests for data quality computation
- Integration tests for classification override
- E2E test for incomplete classification filter
- Connector test: verify execution_data_availability is populated
- Manual test: override execution_mode, verify sync doesn't revert
END OF ANALYSIS