Comprehensive Plan: Sergey's Phase 1 Deltas + UI Upgrades
Date: 2026-02-10
Status: Track 1 COMPLETE (pilot gate cleared). Ready for Track 2/3.
Context
Sergey expanded the ServiceNow Phase 1 PRD from 3 to 7 requirements. Three are new additions pulled into Phase 1: data egress classification (#4), data origin classification (#5), and risk-based grouping (#7). Three existing items need verification and completion: identity binding (#2), execution detection (#3), ownership validation (#6). Automation inventory (#1) needs Flow Designer added.
The UI upgrades plan has Phase 1 (tables) complete, with Phases 2-5 pending. Graph focus "Execution Flow" mode maps directly to Sergey's automation chain concept.
Two Key Architectural Decisions
Decision 1: Import by type (connector emits entities independently)
Current: Connector correlator builds ExecutionChain objects by pre-linking BR→REST→OAuth→SP. Transformer decomposes chains into NormalizedGraph.
Target: Connector discovers entities by type independently. Each type emits as individual NormalizedNode with direct relationships (NormalizedEdge). No intermediate ExecutionChain representation. Platform path materializer reconstructs full execution flows.
Delivery strategy: This refactor is Track 2 — it runs AFTER the pilot-critical requirements are satisfied. Track 1 ships Sergey's 7 requirements using the existing connector architecture with targeted additions. Track 2 refactors the architecture for long-term scalability.
Decision 2: Platform evaluator only for findings (Target State)
Current: Connector detectors.py generates Detection objects for CLI. Platform evaluator generates Finding objects.
Target: All findings from platform evaluator. Connector is pure discovery + classification.
Delivery strategy: Track 1 adds new connector-side classifications (egress, origin, risk group) as entity properties that flow through to platform. Existing detectors continue working. Track 2 migrates all finding logic to platform evaluator and deprecates detectors.py.
Delivery Tracks
Track 1: Pilot-Critical (Sergey's 7 requirements) — SHIP FIRST
Minimal changes to satisfy all 7 PRD requirements. Uses existing connector architecture. Adds new modules alongside existing code.
Track 2: Architecture Refactor — AFTER PILOT GATE
Full import-by-type refactor, platform-only findings, ExecutionChain deprecation. Can be feature-flagged or phased in after pilot is stable.
Track 3: UI Upgrades (Phases 2-5) — PARALLEL
Graph focus, drawer, navigation. Can start once Track 1 data flows through platform.
Gate 0: PRD Lock (HARD PREREQUISITE)
Nothing starts until Gate 0 passes.
0a: Resolve merge conflict in PRD
File: sv0-documentation/docs/product/MVP1 - ServiceNow.md
- Resolve merge conflict (lines 149-261)
- Publish clean version with all 7 requirements
0b: Publish requirement-to-acceptance matrix
Create a locked acceptance matrix for all 7 requirements with concrete, testable outputs:
| # | Requirement | Required Outputs (per automation) | Acceptance Test |
|---|---|---|---|
| 1 | Automation inventory | sys_id, automation_type, sys_created_by, sys_updated_by | All 4 types (BR, SI, Flow, Job) enumerated with sys_id. No execution claims. |
| 2 | Identity binding | SP object_id, app_id, credential_type (secret\|certificate), SP creation_timestamp, permission_assignments_snapshot | Each automation with REST Message → OAuth → SP chain produces all 5 fields. |
| 3 | Execution detection | last_observed_execution_timestamp, execution_count_30d, execution_evidence_refs (max 10 record IDs), identity_binding_status (bound\|unlinked) | Deterministic first-party linkage. No heuristic string matching. Script Includes output "unlinked" when join unavailable. |
| 4 | Data egress classification | egress_host, egress_base_url, egress_category (llm\|external\|internal\|none\|unknown) | Same-instance = internal (not all *.service-now.com). LLM catalog match. No endpoint observed = none. No payload/header inspection. |
| 5 | Data origin classification | referenced_tables, data_domains (hr\|identity\|customer\|financial\|unknown) | First-party table references only (BR collection field, Flow trigger table, REST Message target). Script-parsed origins → unknown. |
| 6 | Ownership validation | ownership_status (valid\|invalid\|ambiguous) | Deterministic rules: no owner = invalid, all disabled = invalid, all inactive = invalid, groups only = ambiguous, at least one active individual = valid. |
| 7 | Risk-based grouping | risk_group (RG1-RG5), risk_group_label, risk_group_priority | Matrix of egress (#4) × origin (#5). Hardcoded rules. |
0c: Update automation-types.md
File: sv0-documentation/docs/integrations/servicenow/automation-types.md
- Flows moved from "Phase 2" to Phase 1
- Clarify: BR + SI + Scheduled Jobs are ALREADY detected. Flows are the new addition.
- Update "Current Detection Coverage" to reflect actual state
0d: Save delivery plan
Save this plan to sv0-documentation/docs/plans/2026-02-10-phase1-deltas-plan.md
Track 1: Pilot-Critical Phases
Phase Map
Gate 0 ──→ T1-A (Inventory: add Flows)
       ──→ T1-B (Identity binding: verify + complete)
       ──→ T1-C (Execution detection: verify + complete)
       ──→ T1-D (Egress classification: new module)
       ──→ T1-E (Origin classification: new module)
       ──→ T1-F (Ownership validation: align to spec)
       ──→ T1-G (Risk grouping: new module, depends on D+E)
       ──→ T1-H (Platform: ensure properties flow through)
       ──→ T1-QA (PRD-assertion tests)
T1-A through T1-F can run in parallel. T1-G depends on T1-D + T1-E. T1-QA runs last.
T1-A: Automation Inventory — Add Flow Designer
Goal: Add Flow Designer as 4th automation type. (Sergey's requirement #1)
Status: discover_flows() IMPLEMENTED but trigger filter incomplete — excludes service_catalog triggers.
Clarification: BR, SI, Scheduled Jobs are already discovered by servicenow_client.py. Flows are the only new addition.
File: sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py
discover_flows() exists (lines 1168-1328) but queries:
query="trigger_typeINrecord,schedule" # ← MISSING: service_catalog
Gap found 2026-02-11: AI Triage Flow (Service Catalog trigger → Azure OpenAI) not discovered. Fix: expand to trigger_typeINrecord,schedule,service_catalog. See reconciled roadmap item 2A.
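A minimal sketch of building the expanded filter from a single list of trigger types, so the set of autonomous triggers is declared in one place. The constant and function names are hypothetical and are not taken from servicenow_client.py.

```python
# Hypothetical names; only the resulting query string comes from the plan.
AUTONOMOUS_TRIGGER_TYPES = ("record", "schedule", "service_catalog")


def build_trigger_query(trigger_types=AUTONOMOUS_TRIGGER_TYPES) -> str:
    """Build a ServiceNow encoded query using the IN operator."""
    if not trigger_types:
        raise ValueError("at least one trigger type is required")
    return "trigger_typeIN" + ",".join(trigger_types)


if __name__ == "__main__":
    print(build_trigger_query())
```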
Tables:
- sys_hub_flow: flow definitions (sys_created_by, sys_updated_by)
- sys_hub_trigger_instance: triggers (trigger_type: record/schedule/service_catalog = autonomous)
- sys_hub_action_instance: actions (HTTP steps, REST calls)
File: sv0-connectors/integrations/entra-servicenow/transformer.py
Add Flow nodes as autonomous_identity with identitySubtype: "flow_designer_flow". Emit relationships:
- Flow → TRIGGERS_ON → table (from trigger config)
- Flow → RUNS_AS → identity (if run_as configured)
- Flow → EXECUTES_ON → REST Message (if HTTP action found)
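The node and the three optional relationships above could be emitted roughly as follows. This is a sketch only: the Node and Edge classes stand in for NormalizedNode and NormalizedEdge, and the input keys trigger_table, run_as and rest_message_id are assumed names for values the client has already resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # stand-in for NormalizedNode
    source_id: str
    entity_type: str
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:  # stand-in for NormalizedEdge
    source_id: str
    relationship: str
    target_id: str


def emit_flow(flow: dict) -> tuple[Node, list[Edge]]:
    """Turn one discovered flow record into a node plus its one-hop edges."""
    node = Node(
        source_id=flow["sys_id"],
        entity_type="autonomous_identity",
        properties={
            "identitySubtype": "flow_designer_flow",
            "automation_type": "flow",
            "sys_created_by": flow.get("sys_created_by"),
            "sys_updated_by": flow.get("sys_updated_by"),
        },
    )
    # Each edge is emitted only when its target was found (assumed input keys).
    optional_targets = [
        ("TRIGGERS_ON", flow.get("trigger_table")),
        ("RUNS_AS", flow.get("run_as")),
        ("EXECUTES_ON", flow.get("rest_message_id")),
    ]
    edges = [Edge(node.source_id, rel, target)
             for rel, target in optional_targets if target]
    return node, edges
```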
Output per automation:
- sys_id as source_id
- automation_type: business_rule / script_include / flow / scheduled_job
- sys_created_by, sys_updated_by as properties
T1-B: Identity Binding — Verify + Complete
Goal: Verify existing identity binding outputs match PRD spec. (Requirement #2)
File: sv0-connectors/integrations/entra-servicenow/correlator.py
Verify these outputs exist per automation with a REST Message → OAuth → SP chain:
| Required Output | Current Field | Status |
|---|---|---|
| Entra SP object ID | azure_sp.id | Verify present |
| App (client) ID | azure_sp.app_id | Verify present |
| Credential type (secret\|certificate) | ? | Check (may need to add) |
| SP creation timestamp | azure_sp.created_date_time | Verify present |
| Permission assignments snapshot | canonical_permissions | Verify present |
Action: Audit Integration dataclass fields. Add credential_type to SP entity properties if missing (derive from keyCredentials vs passwordCredentials on the app registration).
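One possible derivation, assuming the app registration is available as a dict with the Microsoft Graph keyCredentials and passwordCredentials arrays. The function name is hypothetical. It returns a list because the plan does not say how to collapse an app that has both kinds into the single credential_type value the PRD expects; that rule is left open here.

```python
def credential_types(app_registration: dict) -> list[str]:
    """List the credential kinds present on an app registration."""
    found = []
    if app_registration.get("keyCredentials"):
        found.append("certificate")  # keyCredentials hold certificates
    if app_registration.get("passwordCredentials"):
        found.append("secret")       # passwordCredentials hold client secrets
    return found
```

An empty list means no credential was found on the registration, which is worth surfacing separately from "secret" or "certificate".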
T1-C: Execution Detection — Verify + Complete
Goal: Verify and complete execution detection with all required outputs. (Requirement #3)
This is a dedicated phase, not deferred.
PRD required outputs per automation:
- last_observed_execution_timestamp: from SN execution/transaction/flow logs
- execution_count_30d: count of execution records in the last 30 days
- execution_evidence_refs: up to 10 record IDs (store no more than 10 in Phase 1)
- identity_binding_status: bound (deterministic linkage exists) or unlinked (platform does not expose a deterministic join)
PRD constraints:
- Correlation MUST use first-party identifiers when exposed by the platform
- MUST NOT rely on heuristic string matching
- Each execution record is a distinct event regardless of timestamp granularity
- Script Includes will likely produce many unlinked results (no deterministic join from execution log → SI)
Current state to audit:
- Azure sign-in logs: already discovered, linked via EXECUTES_ON edges in transformer
- ServiceNow execution logs: check what servicenow_client.py fetches (syslog_transaction, sys_flow_context, etc.)
Files to modify:
- sv0-connectors/integrations/entra-servicenow/servicenow_client.py: ensure execution log discovery for all 4 automation types
- sv0-connectors/integrations/entra-servicenow/transformer.py: add execution evidence properties:
# On each automation entity
properties["last_observed_execution_timestamp"] = timestamp_or_null  # None when no execution was observed
properties["execution_count_30d"] = count
properties["execution_evidence_refs"] = record_ids[:10]  # PRD cap: at most 10 record IDs
properties["identity_binding_status"] = binding_status  # "bound" or "unlinked"
Execution log sources by type:
| Automation Type | Execution Log Source | Deterministic Join | Expected Status |
|---|---|---|---|
| Business Rules | syslog_transaction (table operation logs) | BR sys_id in log → yes, IF platform exposes it | bound or unlinked |
| Script Includes | syslog_transaction | No deterministic join from log → SI | mostly unlinked |
| Flows | sys_flow_context (flow run records) | Flow sys_id in context → yes | bound |
| Scheduled Jobs | sysauto_script run history / sys_trigger | Job sys_id in trigger → yes | bound |
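A sketch of how the four execution properties could be computed once records have been joined by sys_id. The function name and the input shape are assumptions: a list of (record_id, executed_at) pairs, or None when the platform exposes no deterministic join. Reporting the count as None for unlinked automations is also an assumption, chosen so that "not observable" is not confused with "zero executions".

```python
from datetime import datetime, timedelta, timezone

MAX_EVIDENCE_REFS = 10  # PRD cap for Phase 1


def summarize_executions(records, now: datetime) -> dict:
    """records: list of (record_id, executed_at) pairs, or None if no join exists."""
    if records is None:
        return {
            "last_observed_execution_timestamp": None,
            "execution_count_30d": None,  # assumption: unknown, not zero
            "execution_evidence_refs": [],
            "identity_binding_status": "unlinked",
        }
    newest_first = sorted(records, key=lambda r: r[1], reverse=True)
    cutoff = now - timedelta(days=30)
    return {
        "last_observed_execution_timestamp":
            newest_first[0][1].isoformat() if newest_first else None,
        # Every record counts as its own event, even with identical timestamps.
        "execution_count_30d": sum(1 for _, ts in newest_first if ts >= cutoff),
        "execution_evidence_refs": [rid for rid, _ in newest_first[:MAX_EVIDENCE_REFS]],
        "identity_binding_status": "bound",
    }


if __name__ == "__main__":
    now = datetime(2026, 2, 10, tzinfo=timezone.utc)
    print(summarize_executions([("rec1", now - timedelta(days=1))], now))
```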
T1-D: Data Egress Classification
Goal: Classify outbound egress per automation. (Requirement #4)
New file: sv0-connectors/integrations/entra-servicenow/egress_classifier.py
from fnmatch import fnmatch
from urllib.parse import urlparse

LLM_CATALOG = [
    "api.openai.com", "*.openai.azure.com", "api.anthropic.com",
    "generativelanguage.googleapis.com", "api.cohere.ai",
]

def classify_egress(endpoint_url: str | None, instance_host: str) -> dict:
    if not endpoint_url:
        return {"egress_host": None, "egress_base_url": None, "egress_category": "none"}
    # _is_dynamic_url is a module helper that flags templated endpoints
    if _is_dynamic_url(endpoint_url):
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    parsed = urlparse(endpoint_url)
    host = parsed.hostname
    if not host:  # e.g. URL without a scheme: host cannot be determined
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    base_url = f"{parsed.scheme}://{host}"
    if any(fnmatch(host, p) for p in LLM_CATALOG):
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "llm"}
    if host == instance_host:  # SAME INSTANCE ONLY, not all *.service-now.com
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "internal"}
    return {"egress_host": host, "egress_base_url": base_url, "egress_category": "external"}
Internal = same instance only (compare against configured instance_host), NOT all *.service-now.com. PRD defines internal as "within the same ServiceNow tenant."
Sources: REST Message endpoint URL, Flow HTTP action static endpoint.
Constraints: No payload/header inspection. Orchestration layers = egress point.
T1-E: Data Origin Classification
Goal: Map automations to SN tables, classify into data domains. (Requirement #5)
New file: sv0-connectors/integrations/entra-servicenow/origin_classifier.py
Only use first-party table references from platform configuration. Do NOT infer from script code parsing in Phase 1.
Acceptable first-party sources:
- Business Rules: collection field (the table the BR triggers on), a platform configuration field
- Flows: trigger table from sys_hub_trigger_instance (platform configuration)
- Scheduled Jobs: target table from job configuration (if available)
- REST Messages: target table reference (if available in configuration)
NOT acceptable in Phase 1:
- GlideRecord table names parsed from script bodies (analyze_script_mutations())
- Any table reference extracted by code parsing/regex
from fnmatch import fnmatch

DEFAULT_DOMAINS = {
    "hr": ["sn_hr_*", "hr_*"],
    "identity": ["sys_user", "sys_user_role", "sys_user_grmember"],
    "customer": ["customer_*", "account*", "contact*"],
    "financial": ["alm_*", "*finance*"],
}

def classify_origin(tables: list[str], domains: dict = DEFAULT_DOMAINS) -> dict:
    """Only accepts first-party table references from platform config fields."""
    matched = set()
    for table in tables:
        for domain, patterns in domains.items():
            if any(fnmatch(table, p) for p in patterns):
                matched.add(domain)
    # No match (including an empty table list) falls back to "unknown"
    return {"referenced_tables": tables, "data_domains": sorted(matched) or ["unknown"]}
Script Includes and other script-based automation where table reference is not a platform config field → data_domains: ["unknown"]
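A sketch of the step that feeds classify_origin, restricted to configuration fields. The function name and the trigger_table key are hypothetical; collection is the Business Rule table field named above. Scheduled Jobs and REST Messages are left out because the plan marks their table references as "if available".

```python
def first_party_tables(automation_type: str, record: dict) -> list[str]:
    """Return table references taken from platform configuration only."""
    if automation_type == "business_rule":
        table = record.get("collection")
    elif automation_type == "flow":
        table = record.get("trigger_table")  # assumed key, resolved from the trigger
    else:
        # Script Includes and anything script-only: never inspect record["script"]
        table = None
    return [table] if table else []
```

An empty list passed to classify_origin yields data_domains of ["unknown"], which is the required outcome for script-only automation.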
T1-F: Ownership Validation
Goal: Align ownership detection to Sergey's spec with deterministic rules. (Requirement #6)
All rules are deterministic — no threshold ambiguity:
| Condition | ownership_status | Rationale |
|---|---|---|
| No owner exists (SN or Entra) | invalid | No accountability |
| All owners disabled (active=false) | invalid | No functioning accountability |
| All owners inactive (no sign-in in configured window) | invalid | No functioning accountability |
| All owners are groups (no individual) | ambiguous | Accountability exists but is diffuse |
| At least one active individual owner | valid | Clear individual accountability |
The "configured window" for inactive is a tenant-level setting (default: 90 days). This is a configuration, not a heuristic — it's a deterministic threshold applied uniformly.
Files: Add compute_ownership_status() to transformer. Reads from existing owner discovery data (Azure SP owners + SN creator lookup).
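A sketch of compute_ownership_status() under stated assumptions. The Owner shape is hypothetical. Two cases the table leaves open are resolved here by assumption: an individual with no recorded sign-in counts as inactive, and a mix of groups plus only disabled or inactive individuals is reported as invalid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Owner:  # hypothetical shape for merged SN + Entra owner data
    kind: str                        # "individual" or "group"
    active: bool = True              # account enabled flag
    last_sign_in: Optional[datetime] = None


def compute_ownership_status(owners, now: datetime, inactive_days: int = 90) -> str:
    if not owners:
        return "invalid"             # no owner exists
    individuals = [o for o in owners if o.kind == "individual"]
    if not individuals:
        return "ambiguous"           # groups only
    cutoff = now - timedelta(days=inactive_days)
    for owner in individuals:
        signed_in_recently = owner.last_sign_in is not None and owner.last_sign_in >= cutoff
        if owner.active and signed_in_recently:
            return "valid"           # at least one active individual
    return "invalid"                 # every individual is disabled or inactive


if __name__ == "__main__":
    now = datetime(2026, 2, 10, tzinfo=timezone.utc)
    print(compute_ownership_status([Owner("group")], now))
```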
T1-G: Risk-Based Grouping
Goal: Assign RG1-RG5 per automation. (Requirement #7)
Depends on T1-D (egress) and T1-E (origin).
New file: sv0-connectors/integrations/entra-servicenow/risk_grouper.py
def assign_risk_group(egress_category: str | None, data_domains: list[str]) -> dict:
    sensitive = bool(set(data_domains) & {"hr", "identity", "customer", "financial"})
    if sensitive and egress_category == "llm":
        return {"risk_group": "RG1", "label": "Sensitive Data → LLM Egress", "priority": "P0"}
    if sensitive and egress_category == "external":
        return {"risk_group": "RG2", "label": "Sensitive Data → External Egress", "priority": "P1"}
    if egress_category in ("llm", "external"):
        return {"risk_group": "RG3", "label": "External/LLM Egress (Non-sensitive)", "priority": "P2"}
    if egress_category in ("internal", "none", None):
        return {"risk_group": "RG4", "label": "Internal Only / No Observed Egress", "priority": "P3"}
    # Anything else (for example egress_category == "unknown") is unclassified
    return {"risk_group": "RG5", "label": "Unclassified / Unknown", "priority": "P3"}
Hardcoded for Phase 1.
T1-H: Platform — Ensure Properties Flow Through
Goal: Verify new entity properties from T1-A through T1-G flow through ingestion to storage and are queryable.
File: sv0-platform/src/ingestion/types.ts
NormalizedNode.properties is Record<string, unknown> — already flexible. New properties (egress_category, data_domains, risk_group, ownership_status, execution_count_30d, identity_binding_status) flow through without schema changes.
Verify:
- Properties survive ingestion → storage round-trip
- Properties are included in entity API responses
- Properties are available to evaluator via EvaluationContext
File: sv0-platform/src/evaluator/rules/
Verify existing rules (orphaned_ownership, dormant_authority) fire correctly on automation identity subtypes. If execution_paths are not populated for automations yet, the evaluator still works on the raw entity properties.
T1-QA: PRD-Assertion Tests
Goal: Tests that directly assert PRD required outputs for each requirement.
| Test | Asserts |
|---|---|
| test_req1_automation_inventory | All 4 types enumerated. Each has sys_id, automation_type, sys_created_by. |
| test_req2_identity_binding | Bound automations produce: SP object_id, app_id, credential_type, creation_timestamp, permission_snapshot. |
| test_req3_execution_detection | Each automation produces: last_observed_execution_timestamp, execution_count_30d, evidence_refs (<=10), binding_status. Script Includes produce "unlinked". No heuristic matching. |
| test_req4_egress_classification | Same-instance → internal. LLM catalog → llm. Other external → external. Dynamic URL → unknown. |
| test_req5_origin_classification | BR collection field → classified. Script-only automation → unknown. No script-parsed tables used. |
| test_req6_ownership_validation | No owner → invalid. Disabled → invalid. Inactive → invalid. Groups only → ambiguous. Active individual → valid. |
| test_req7_risk_grouping | RG1: sensitive + llm. RG2: sensitive + external. RG3: non-sensitive + external. RG4: internal. RG5: unknown. |
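Several of these tests can share one helper that checks a property bag for the output names in the acceptance matrix. This is a sketch covering a few requirements only. The helper name is hypothetical, and it checks key presence rather than non-null values because None is a legitimate value for fields such as egress_host.

```python
# Output names copied from the acceptance matrix; helper name is hypothetical.
REQUIRED_OUTPUTS = {
    1: ["sys_id", "automation_type", "sys_created_by", "sys_updated_by"],
    3: ["last_observed_execution_timestamp", "execution_count_30d",
        "execution_evidence_refs", "identity_binding_status"],
    4: ["egress_host", "egress_base_url", "egress_category"],
    5: ["referenced_tables", "data_domains"],
    6: ["ownership_status"],
}


def missing_outputs(requirement: int, properties: dict) -> list[str]:
    """Return required output names absent from an entity's properties."""
    return [name for name in REQUIRED_OUTPUTS[requirement] if name not in properties]
```

A test would then assert that missing_outputs(n, entity.properties) is empty for every automation entity before checking the values themselves.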
Track 2: Architecture Refactor (AFTER Pilot Gate)
Phase Map
T2-A: ADR — Import-by-Type Connector Architecture
T2-B: ADR — Platform-Only Finding Generation
T2-C: Connector refactor (DiscoveredEntities, deprecate ExecutionChain)
T2-D: Platform path materializer extension (automation chain traversal)
T2-E: Evaluator rule migration (detectors.py → platform evaluator)
T2-F: Update connector interface spec (05-connectors.md)
T2-G: Update data model (01-data-model.md)
Connector output target (post-refactor)
Individual entities by type:
- Business Rules → autonomous_identity (identitySubtype: business_rule)
- Script Includes → autonomous_identity (identitySubtype: system_execution)
- Flows → autonomous_identity (identitySubtype: flow_designer_flow)
- Scheduled Jobs → autonomous_identity (identitySubtype: scheduled_job)
- REST Messages → resource (resourceType: rest_message)
- OAuth Entities → autonomous_identity (identitySubtype: oauth_app)
- Azure SPs → autonomous_identity (identitySubtype: service_principal)
- Users/Owners → human_identity
Direct relationships (one-hop):
- Automation → TRIGGERS_ON → Table Resource
- Automation → EXECUTES_ON → REST Message
- Automation → RUNS_AS → SP or User
- REST Message → AUTHENTICATES_VIA → OAuth Entity
- SP → AUTHENTICATES_TO → OAuth Entity *(edge direction: SP is source, OAuth is target)*
- SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
- Automation → OWNED_BY / CREATED_BY → User
Platform path materializer extension
File: sv0-platform/src/ingestion/path-materializer.ts
New traversal pattern for automation chains:
Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
└→ SP → AUTHENTICATES_TO → OAuth Entity (cross-system tracing, no paths produced)
Note: The materializer follows RUNS_AS (identity binding) to reach the SP, then traverses the SP's own HAS_ROLE chain. AUTHENTICATES_TO edge direction is SP → OAuth (SP is source). The EXECUTES_ON / AUTHENTICATES_VIA chain is not used by the materializer — it is a parallel provenance path for UI display.
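The traversal can be illustrated with a short sketch. It is written in Python for consistency with the connector examples, although the materializer itself lives in path-materializer.ts. Edges are assumed to be (source, relationship, target) triples and the function name is hypothetical.

```python
from collections import defaultdict

# Relationship types followed, in order, from the automation to the resource.
CHAIN = ["RUNS_AS", "HAS_ROLE", "GRANTS", "APPLIES_TO"]


def materialize_paths(edges, automation_id):
    """Return every complete Automation → SP → Role → Permission → Resource path."""
    index = defaultdict(list)
    for source, rel, target in edges:
        index[(source, rel)].append(target)
    paths = [[automation_id]]
    for rel in CHAIN:
        # Extend each partial path by one hop; paths with no matching edge drop out.
        paths = [path + [nxt] for path in paths for nxt in index[(path[-1], rel)]]
    return paths
```

AUTHENTICATES_TO, EXECUTES_ON and AUTHENTICATES_VIA edges are never in CHAIN, so they produce no paths, which matches the note above.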
Track 3: UI Upgrades (Phases 2-5) — Parallel
Execution Flow semantics (UNIFIED DEFINITION)
"Execution Flow" mode performs a directed traversal from a seed entity through execution-chain relationship types. TWO patterns depending on seed entity type:
Pattern A — Seed is an automation entity (business_rule, script_include, flow, scheduled_job):
Forward edges (materializer path):
Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
Provenance display (UI overlay, requires reverse AUTHENTICATES_TO lookup):
Automation → EXECUTES_ON → REST Message → AUTHENTICATES_VIA → OAuth Entity
OAuth Entity ←AUTHENTICATES_TO← SP (reverse edge, SP is source)
Also: Automation → TRIGGERS_ON → Table Resource (provenance display only)
Pattern B — Seed is an identity entity (service_principal, oauth_app):
SP → AUTHENTICATES_TO → OAuth Entity (SP is source, OAuth is target)
SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
Edge direction reference: SP → AUTHENTICATES_TO → OAuth (SP is always source). UI "execution flow" that needs to show OAuth → SP must perform a reverse edge lookup.
Both plans must reference this canonical definition. UI upgrades plan Phase 2 Section 2b should be updated.
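The reverse edge lookup needed for the OAuth → SP display can be served by a small index built once per graph load. This is a sketch: edges are assumed to be (source, relationship, target) triples and the function name is hypothetical.

```python
from collections import defaultdict


def build_reverse_index(edges, relationship="AUTHENTICATES_TO"):
    """Map each edge target to the sources that point at it."""
    reverse = defaultdict(list)
    for source, rel, target in edges:
        if rel == relationship:
            reverse[target].append(source)
    return reverse
```

With this index, reverse.get(oauth_entity_id, []) lists the SPs to draw behind an OAuth Entity without changing the stored edge direction.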
Phase H: Graph Focus Mode (UI plan Phase 2)
Existing plan applies. Execution Flow mode uses canonical definition above.
Phase I: Node Details Drawer (UI plan Phase 3)
Existing plan applies. Drawer shows new properties (egress, origin, risk group).
Phase J: Navigation Continuity (UI plan Phase 4)
Existing plan applies unchanged.
Phase K: Hardening & QA (UI plan Phase 5)
Existing plan applies.
Execution Order
Gate 0: Resolve PRD conflict, publish acceptance matrix, save plan
↓ (HARD GATE — nothing proceeds until Gate 0 passes)
Track 1 (Parallel): T1-A, T1-B, T1-C, T1-D, T1-E, T1-F (all can parallel)
→ T1-G (after T1-D + T1-E)
→ T1-H (after all T1-* modules done)
→ T1-QA
→ PILOT GATE
Track 2 (After pilot): T2-A, T2-B (ADRs) → T2-C → T2-D → T2-E → T2-F, T2-G
Track 3 (Parallel): Can start when T1-H confirms data flows through
H → I → J → K
Verification
# Connector (Track 1)
cd sv0-connectors/integrations/entra-servicenow && pytest
# Platform
cd sv0-platform && npm test && npm run test:integration && npm run typecheck
# UI
cd sv0-platform/ui && npm run ci
# Full stack
cd sv0-platform && docker compose up --build
# Trigger connector → verify entities with all 7 requirement outputs
# → verify findings → verify graph visualization