Comprehensive Plan: Sergey's Phase 1 Deltas + UI Upgrades

Date: 2026-02-10
Status: Track 1 COMPLETE; pilot gate cleared. Ready for Track 2/3.

Context

Sergey expanded the ServiceNow Phase 1 PRD from 3 to 7 requirements. Three are new additions pulled into Phase 1: data egress classification (#4), data origin classification (#5), and risk-based grouping (#7). Three existing items need verification and completion: identity binding (#2), execution detection (#3), ownership validation (#6). Automation inventory (#1) needs Flow Designer added.

The UI upgrades plan has Phase 1 (tables) complete, with Phases 2-5 pending. Graph focus "Execution Flow" mode maps directly to Sergey's automation chain concept.

Two Key Architectural Decisions

Decision 1: Import by type (connector emits entities independently)

Current: Connector correlator builds ExecutionChain objects by pre-linking BR→REST→OAuth→SP. Transformer decomposes chains into NormalizedGraph.

Target: The connector discovers entities by type, independently of one another. Each entity is emitted as an individual NormalizedNode with direct relationships (NormalizedEdge), with no intermediate ExecutionChain representation. The platform path materializer reconstructs full execution flows.

Delivery strategy: This refactor is Track 2 — it runs AFTER the pilot-critical requirements are satisfied. Track 1 ships Sergey's 7 requirements using the existing connector architecture with targeted additions. Track 2 refactors the architecture for long-term scalability.

Decision 2: Platform evaluator only for findings (Target State)

Current: Connector detectors.py generates Detection objects for CLI. Platform evaluator generates Finding objects.

Target: All findings from platform evaluator. Connector is pure discovery + classification.

Delivery strategy: Track 1 adds new connector-side classifications (egress, origin, risk group) as entity properties that flow through to platform. Existing detectors continue working. Track 2 migrates all finding logic to platform evaluator and deprecates detectors.py.

Delivery Tracks

Track 1: Pilot-Critical (Sergey's 7 requirements) — SHIP FIRST

Minimal changes to satisfy all 7 PRD requirements. Uses existing connector architecture. Adds new modules alongside existing code.

Track 2: Architecture Refactor — AFTER PILOT GATE

Full import-by-type refactor, platform-only findings, ExecutionChain deprecation. Can be feature-flagged or phased in after pilot is stable.

Track 3: UI Upgrades (Phases 2-5) — PARALLEL

Graph focus, drawer, navigation. Can start once Track 1 data flows through platform.

Gate 0: PRD Lock (HARD PREREQUISITE)

Nothing starts until Gate 0 passes.

0a: Resolve merge conflict in PRD

File: sv0-documentation/docs/product/MVP1 - ServiceNow.md

Resolve merge conflict (lines 149-261)
Publish clean version with all 7 requirements

0b: Publish requirement-to-acceptance matrix

Create a locked acceptance matrix for all 7 requirements with concrete, testable outputs:

#	Requirement	Required Outputs (per automation)	Acceptance Test
1	Automation inventory	sys_id, automation_type, sys_created_by, sys_updated_by	All 4 types (BR, SI, Flow, Job) enumerated with sys_id. No execution claims.
2	Identity binding	SP object_id, app_id, credential_type (secret|certificate), SP creation_timestamp, permission_assignments_snapshot	Each automation with REST Message → OAuth → SP chain produces all 5 fields.
3	Execution detection	last_observed_execution_timestamp, execution_count_30d, execution_evidence_refs (max 10 record IDs), identity_binding_status (bound|unlinked)	Deterministic first-party linkage. No heuristic string matching. Script Includes output "unlinked" when join unavailable.
4	Data egress classification	egress_host, egress_base_url, egress_category (llm|external|internal|none|unknown)	Same-instance = internal (not all *.service-now.com). LLM catalog match. No endpoint observed = none. No payload/header inspection.
5	Data origin classification	referenced_tables, data_domains (hr|identity|customer|financial|unknown)	First-party table references only (BR collection field, Flow trigger table, REST Message target). Script-parsed origins → unknown.
6	Ownership validation	ownership_status (valid|invalid|ambiguous)	Deterministic rules: no owner = invalid, all disabled = invalid, all inactive = invalid, groups only = ambiguous, at least one active individual = valid.
7	Risk-based grouping	risk_group (RG1-RG5), risk_group_label, risk_group_priority	Matrix of egress (#4) × origin (#5). Hardcoded rules.

0c: Update automation-types.md

File: sv0-documentation/docs/integrations/servicenow/automation-types.md

Move Flows from "Phase 2" to Phase 1
Clarify: BR + SI + Scheduled Jobs are ALREADY detected. Flows are the new addition.
Update "Current Detection Coverage" to reflect actual state

0d: Save delivery plan

Save this plan to sv0-documentation/docs/plans/2026-02-10-phase1-deltas-plan.md

Track 1: Pilot-Critical Phases

Phase Map

Gate 0 ──→ T1-A (Inventory: add Flows)
       ──→ T1-B (Identity binding: verify + complete)
       ──→ T1-C (Execution detection: verify + complete)
       ──→ T1-D (Egress classification: new module)
       ──→ T1-E (Origin classification: new module)
       ──→ T1-F (Ownership validation: align to spec)
       ──→ T1-G (Risk grouping: new module, depends on D+E)
       ──→ T1-H (Platform: ensure properties flow through)
       ──→ T1-QA (PRD-assertion tests)

T1-A through T1-F can run in parallel. T1-G depends on T1-D + T1-E. T1-QA runs last.

T1-A: Automation Inventory — Add Flow Designer

Goal: Add Flow Designer as 4th automation type. (Sergey's requirement #1)

Status: discover_flows() IMPLEMENTED but trigger filter incomplete — excludes service_catalog triggers.

Clarification: BR, SI, Scheduled Jobs are already discovered by servicenow_client.py. Flows are the only new addition.

File: sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py

discover_flows() exists (lines 1168-1328) but queries:

query="trigger_typeINrecord,schedule"  # ← MISSING: service_catalog

Gap found 2026-02-11: AI Triage Flow (Service Catalog trigger → Azure OpenAI) not discovered. Fix: expand to trigger_typeINrecord,schedule,service_catalog. See reconciled roadmap item 2A.
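One way to keep the trigger filter from silently drifting again is to build it from a single list of trigger types. The sketch below is illustrative only: AUTONOMOUS_TRIGGER_TYPES and build_trigger_query are hypothetical names and are not taken from servicenow_client.py.

```python
# Hypothetical constant: trigger types treated as autonomous in Phase 1.
AUTONOMOUS_TRIGGER_TYPES = ("record", "schedule", "service_catalog")


def build_trigger_query(trigger_types=AUTONOMOUS_TRIGGER_TYPES) -> str:
    """Build the encoded-query filter string in the form discover_flows() uses."""
    return "trigger_typeIN" + ",".join(trigger_types)


print(build_trigger_query())
```

With the three types above, the helper yields the expanded filter named in the fix, so adding a further trigger type later would be a one-line change.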

Tables:

sys_hub_flow — flow definitions (sys_created_by, sys_updated_by)
sys_hub_trigger_instance — triggers (trigger_type: record/schedule/service_catalog = autonomous)
sys_hub_action_instance — actions (HTTP steps, REST calls)

File: sv0-connectors/integrations/entra-servicenow/transformer.py

Add Flow nodes as autonomous_identity with identitySubtype: "flow_designer_flow". Emit relationships:

Flow → TRIGGERS_ON → table (from trigger config)
Flow → RUNS_AS → identity (if run_as configured)
Flow → EXECUTES_ON → REST Message (if HTTP action found)

Output per automation:

sys_id as source_id
automation_type: business_rule / script_include / flow / scheduled_job
sys_created_by, sys_updated_by as properties

T1-B: Identity Binding — Verify + Complete

Goal: Verify existing identity binding outputs match PRD spec. (Requirement #2)

File: sv0-connectors/integrations/entra-servicenow/correlator.py

Verify these outputs exist per automation with a REST Message → OAuth → SP chain:

Required Output	Current Field	Status
Entra SP object ID	`azure_sp.id`	Verify present
App (client) ID	`azure_sp.app_id`	Verify present
Credential type (secret|certificate)	?	Check; may need to add
SP creation timestamp	`azure_sp.created_date_time`	Verify present
Permission assignments snapshot	`canonical_permissions`	Verify present

Action: Audit Integration dataclass fields. Add credential_type to SP entity properties if missing (derive from keyCredentials vs passwordCredentials on the app registration).
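The derivation described above could look roughly like the following. This is a sketch under assumptions: derive_credential_type is a hypothetical helper, the input is assumed to be the app registration as a plain dict, and the tie-break when both credential lists are populated is a guess that the PRD does not settle.

```python
from __future__ import annotations


def derive_credential_type(app_registration: dict) -> str | None:
    """Return "certificate", "secret", or None for an app registration payload.

    Assumption: when both credential lists are populated, this sketch reports
    "certificate"; the PRD does not say which value should win.
    """
    if app_registration.get("keyCredentials"):
        return "certificate"
    if app_registration.get("passwordCredentials"):
        return "secret"
    # Neither list is populated: no credential type can be derived.
    return None
```

Returning None rather than a default keeps a missing credential visible to the T1-QA assertion for requirement #2.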

T1-C: Execution Detection — Verify + Complete

Goal: Verify and complete execution detection with all required outputs. (Requirement #3)

This is a dedicated phase, not deferred.

PRD required outputs per automation:

last_observed_execution_timestamp — from SN execution/transaction/flow logs
execution_count_30d — count of execution records in last 30 days
execution_evidence_refs — up to 10 record IDs (store no more than 10 in Phase 1)
identity_binding_status — bound (deterministic linkage exists) or unlinked (platform does not expose a deterministic join)

PRD constraints:

Correlation MUST use first-party identifiers when exposed by the platform
MUST NOT rely on heuristic string matching
Each execution record is a distinct event regardless of timestamp granularity
Script Includes will likely produce many unlinked results (no deterministic join from execution log → SI)

Current state to audit:

Azure sign-in logs: already discovered, linked via EXECUTES_ON edges in transformer
ServiceNow execution logs: check what servicenow_client.py fetches (syslog_transaction, sys_flow_context, etc.)

Files to modify:

sv0-connectors/integrations/entra-servicenow/servicenow_client.py — Ensure execution log discovery for all 4 automation types

sv0-connectors/integrations/entra-servicenow/transformer.py — Add execution evidence properties:

# On each automation entity
properties["last_observed_execution_timestamp"] = timestamp_or_null
properties["execution_count_30d"] = count
properties["execution_evidence_refs"] = record_ids[:10]  # at most 10 in Phase 1
properties["identity_binding_status"] = binding_status  # "bound" or "unlinked"

Execution log sources by type:

Automation Type	Execution Log Source	Deterministic Join	Expected Status
Business Rules	`syslog_transaction` (table operation logs)	BR sys_id in log → yes, IF platform exposes it	bound or unlinked
Script Includes	`syslog_transaction`	No deterministic join from log → SI	mostly unlinked
Flows	`sys_flow_context` (flow run records)	Flow sys_id in context → yes	bound
Scheduled Jobs	`sysauto_script` run history / `sys_trigger`	Job sys_id in trigger → yes	bound
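As an illustration of how the four outputs might be reduced from raw log records, here is a minimal sketch. The record shape (a dict with sys_id and executed_at), the summarize_execution name, and the join_available flag are all assumptions made for the example. Reporting a count of 0 for unlinked automations is also an assumption; the PRD excerpt here does not say whether null would be preferred.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def summarize_execution(records: list[dict], now: datetime, join_available: bool) -> dict:
    """Reduce execution log records to the four PRD outputs.

    Each record is a hypothetical dict: {"sys_id": str, "executed_at": datetime}.
    """
    if not join_available:
        # No deterministic join exposed by the platform: report nothing rather than guess.
        return {
            "last_observed_execution_timestamp": None,
            "execution_count_30d": 0,
            "execution_evidence_refs": [],
            "identity_binding_status": "unlinked",
        }
    newest_first = sorted(records, key=lambda r: r["executed_at"], reverse=True)
    window_start = now - timedelta(days=30)
    return {
        "last_observed_execution_timestamp": (
            newest_first[0]["executed_at"].isoformat() if newest_first else None
        ),
        # Every record counts as its own event, even if timestamps collide.
        "execution_count_30d": sum(1 for r in newest_first if r["executed_at"] >= window_start),
        # Keep only the 10 most recent record IDs as evidence.
        "execution_evidence_refs": [r["sys_id"] for r in newest_first[:10]],
        "identity_binding_status": "bound",
    }
```

In this sketch the binding status reflects whether a deterministic join exists for the automation type, not whether any executions were found.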

T1-D: Data Egress Classification

Goal: Classify outbound egress per automation. (Requirement #4)

New file: sv0-connectors/integrations/entra-servicenow/egress_classifier.py

from fnmatch import fnmatch
from urllib.parse import urlparse

LLM_CATALOG = [
    "api.openai.com", "*.openai.azure.com", "api.anthropic.com",
    "generativelanguage.googleapis.com", "api.cohere.ai",
]

def classify_egress(endpoint_url: str | None, instance_host: str) -> dict:
    # No endpoint observed at all
    if not endpoint_url:
        return {"egress_host": None, "egress_base_url": None, "egress_category": "none"}
    # _is_dynamic_url() is a module helper (not shown): true for templated endpoints
    if _is_dynamic_url(endpoint_url):
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    parsed = urlparse(endpoint_url)
    host = parsed.hostname
    if host is None:  # no parseable host (e.g. missing scheme): cannot classify
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    base_url = f"{parsed.scheme}://{host}"
    if any(fnmatch(host, p) for p in LLM_CATALOG):
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "llm"}
    if host == instance_host.lower():  # same instance only, not all *.service-now.com
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "internal"}
    return {"egress_host": host, "egress_base_url": base_url, "egress_category": "external"}

Internal = same instance only (compare against configured instance_host), NOT all *.service-now.com. PRD defines internal as "within the same ServiceNow tenant."

Sources: REST Message endpoint URL, Flow HTTP action static endpoint.

Constraints: No payload/header inspection. An orchestration layer is treated as the egress point.

T1-E: Data Origin Classification

Goal: Map automations to SN tables, classify into data domains. (Requirement #5)

New file: sv0-connectors/integrations/entra-servicenow/origin_classifier.py

Only use first-party table references from platform configuration. Do NOT infer from script code parsing in Phase 1.

Acceptable first-party sources:

Business Rules: collection field (the table the BR triggers on) — platform configuration field
Flows: trigger table from sys_hub_trigger_instance — platform configuration
Scheduled Jobs: target table from job configuration (if available)
REST Messages: target table reference (if available in configuration)

NOT acceptable in Phase 1:

GlideRecord table names parsed from script bodies (analyze_script_mutations())
Any table reference extracted by code parsing/regex

from fnmatch import fnmatch

DEFAULT_DOMAINS = {
    "hr": ["sn_hr_*", "hr_*"],
    "identity": ["sys_user", "sys_user_role", "sys_user_grmember"],
    "customer": ["customer_*", "account*", "contact*"],
    "financial": ["alm_*", "*finance*"],
}

def classify_origin(tables: list[str], domains: dict = DEFAULT_DOMAINS) -> dict:
    """Only accepts first-party table references from platform config fields."""
    matched = set()
    for table in tables:
        for domain, patterns in domains.items():
            if any(fnmatch(table, p) for p in patterns):
                matched.add(domain)
    return {"referenced_tables": tables, "data_domains": sorted(matched) or ["unknown"]}

Script Includes and other script-based automation where table reference is not a platform config field → data_domains: ["unknown"]
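The first-party-only rule can be made explicit in the step that gathers tables before classification. The sketch below is an assumption-laden illustration: first_party_tables is a hypothetical helper, and the trigger_table and target_table keys are placeholder names for whatever the connector actually stores. Only collection is named in the plan.

```python
from __future__ import annotations


def first_party_tables(automation: dict) -> list[str]:
    """Collect table references from platform configuration fields only."""
    kind = automation.get("automation_type")
    if kind == "business_rule":
        table = automation.get("collection")      # table the BR triggers on
    elif kind == "flow":
        table = automation.get("trigger_table")   # hypothetical key for the Flow trigger table
    elif kind == "scheduled_job":
        table = automation.get("target_table")    # hypothetical key, if the job config has one
    else:
        table = None                              # e.g. script_include: no config-level table
    # Script bodies are never parsed here, so script-only automations yield no tables.
    return [table] if table else []
```

An empty list passed on to classify_origin then falls through to data_domains: ["unknown"], which matches the rule for Script Includes.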

T1-F: Ownership Validation

Goal: Align ownership detection to Sergey's spec with deterministic rules. (Requirement #6)

All rules are deterministic — no threshold ambiguity:

Condition	ownership_status	Rationale
No owner exists (SN or Entra)	`invalid`	No accountability
All owners disabled (`active=false`)	`invalid`	No functioning accountability
All owners inactive (no sign-in in configured window)	`invalid`	No functioning accountability
All owners are groups (no individual)	`ambiguous`	Accountability exists but is diffuse
At least one active individual owner	`valid`	Clear individual accountability

The "configured window" for inactive is a tenant-level setting (default: 90 days). This is a configuration, not a heuristic — it's a deterministic threshold applied uniformly.

Files: Add compute_ownership_status() to transformer. Reads from existing owner discovery data (Azure SP owners + SN creator lookup).
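A minimal sketch of compute_ownership_status() following the rules table is shown below. The Owner shape and its field names are hypothetical, a missing sign-in timestamp is assumed to count as inactive, and the mixed case of group owners alongside only disabled or inactive individuals resolves to invalid here even though the table does not cover it.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Owner:
    kind: str                            # "user" or "group" (hypothetical field)
    active: bool = True
    last_sign_in: datetime | None = None


def compute_ownership_status(
    owners: list[Owner], now: datetime, inactive_window_days: int = 90
) -> str:
    if not owners:
        return "invalid"                 # no owner exists
    individuals = [o for o in owners if o.kind == "user"]
    if not individuals:
        return "ambiguous"               # all owners are groups
    cutoff = now - timedelta(days=inactive_window_days)
    for owner in individuals:
        signed_in_recently = owner.last_sign_in is not None and owner.last_sign_in >= cutoff
        if owner.active and signed_in_recently:
            return "valid"               # at least one active individual
    return "invalid"                     # every individual is disabled or inactive
```

The 90-day default mirrors the tenant-level setting described above and would be read from configuration in practice.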

T1-G: Risk-Based Grouping

Goal: Assign RG1-RG5 per automation. (Requirement #7)

Depends on T1-D (egress) and T1-E (origin).

New file: sv0-connectors/integrations/entra-servicenow/risk_grouper.py

def assign_risk_group(egress_category: str | None, data_domains: list[str]) -> dict:
    # Keys match the required outputs for requirement #7: risk_group, risk_group_label, risk_group_priority
    sensitive = bool(set(data_domains) & {"hr", "identity", "customer", "financial"})
    if sensitive and egress_category == "llm":
        return {"risk_group": "RG1", "risk_group_label": "Sensitive Data → LLM Egress", "risk_group_priority": "P0"}
    if sensitive and egress_category == "external":
        return {"risk_group": "RG2", "risk_group_label": "Sensitive Data → External Egress", "risk_group_priority": "P1"}
    if egress_category in ("llm", "external"):
        return {"risk_group": "RG3", "risk_group_label": "External/LLM Egress (Non-sensitive)", "risk_group_priority": "P2"}
    if egress_category in ("internal", "none", None):
        return {"risk_group": "RG4", "risk_group_label": "Internal Only / No Observed Egress", "risk_group_priority": "P3"}
    # Remaining case: egress_category is "unknown" (or an unexpected value)
    return {"risk_group": "RG5", "risk_group_label": "Unclassified / Unknown", "risk_group_priority": "P3"}

Hardcoded for Phase 1.

T1-H: Platform — Ensure Properties Flow Through

Goal: Verify new entity properties from T1-A through T1-G flow through ingestion to storage and are queryable.

File: sv0-platform/src/ingestion/types.ts

NormalizedNode.properties is Record<string, unknown> — already flexible. New properties (egress_category, data_domains, risk_group, ownership_status, execution_count_30d, identity_binding_status) flow through without schema changes.

Verify:

Properties survive ingestion → storage round-trip
Properties are included in entity API responses
Properties are available to evaluator via EvaluationContext

File: sv0-platform/src/evaluator/rules/

Verify existing rules (orphaned_ownership, dormant_authority) fire correctly on automation identity subtypes. If execution_paths are not populated for automations yet, the evaluator still works on the raw entity properties.

T1-QA: PRD-Assertion Tests

Goal: Tests that directly assert PRD required outputs for each requirement.

Test	Asserts
`test_req1_automation_inventory`	All 4 types enumerated. Each has sys_id, automation_type, sys_created_by, sys_updated_by.
`test_req2_identity_binding`	Bound automations produce: SP object_id, app_id, credential_type, creation_timestamp, permission_snapshot.
`test_req3_execution_detection`	Each automation produces: last_observed_execution_timestamp, execution_count_30d, evidence_refs (<=10), binding_status. Script Includes produce "unlinked". No heuristic matching.
`test_req4_egress_classification`	Same-instance → internal. LLM catalog → llm. Other external → external. No endpoint → none. Dynamic URL → unknown.
`test_req5_origin_classification`	BR collection field → classified. Script-only automation → unknown. No script-parsed tables used.
`test_req6_ownership_validation`	No owner → invalid. Disabled → invalid. Inactive → invalid. Groups only → ambiguous. Active individual → valid.
`test_req7_risk_grouping`	RG1: sensitive + llm. RG2: sensitive + external. RG3: non-sensitive + llm or external. RG4: internal or none. RG5: unknown egress.

Track 2: Architecture Refactor (AFTER Pilot Gate)

Phase Map

T2-A: ADR — Import-by-Type Connector Architecture
T2-B: ADR — Platform-Only Finding Generation
T2-C: Connector refactor (DiscoveredEntities, deprecate ExecutionChain)
T2-D: Platform path materializer extension (automation chain traversal)
T2-E: Evaluator rule migration (detectors.py → platform evaluator)
T2-F: Update connector interface spec (05-connectors.md)
T2-G: Update data model (01-data-model.md)

Connector output target (post-refactor)

Individual entities by type:
  - Business Rules → autonomous_identity (identitySubtype: business_rule)
  - Script Includes → autonomous_identity (identitySubtype: system_execution)
  - Flows → autonomous_identity (identitySubtype: flow_designer_flow)
  - Scheduled Jobs → autonomous_identity (identitySubtype: scheduled_job)
  - REST Messages → resource (resourceType: rest_message)
  - OAuth Entities → autonomous_identity (identitySubtype: oauth_app)
  - Azure SPs → autonomous_identity (identitySubtype: service_principal)
  - Users/Owners → human_identity

Direct relationships (one-hop):
  - Automation → TRIGGERS_ON → Table Resource
  - Automation → EXECUTES_ON → REST Message
  - Automation → RUNS_AS → SP or User
  - REST Message → AUTHENTICATES_VIA → OAuth Entity
  - SP → AUTHENTICATES_TO → OAuth Entity  *(edge direction: SP is source, OAuth is target)*
  - SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
  - Automation → OWNED_BY / CREATED_BY → User

Platform path materializer extension

File: sv0-platform/src/ingestion/path-materializer.ts

New traversal pattern for automation chains:

Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
                         └→ SP → AUTHENTICATES_TO → OAuth Entity (cross-system tracing, no paths produced)

Note: The materializer follows RUNS_AS (identity binding) to reach the SP, then traverses the SP's own HAS_ROLE chain. AUTHENTICATES_TO edge direction is SP → OAuth (SP is source). The EXECUTES_ON / AUTHENTICATES_VIA chain is not used by the materializer — it is a parallel provenance path for UI display.
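The traversal pattern can be illustrated with a small sketch. The real materializer lives in path-materializer.ts; Python is used here only to stay consistent with the connector examples in this plan. The edge-tuple shape and the materialize_paths name are assumptions, not the platform's actual API.

```python
from collections import defaultdict

# Edge types followed in order, starting from the automation entity.
CHAIN = ("RUNS_AS", "HAS_ROLE", "GRANTS", "APPLIES_TO")


def materialize_paths(edges, automation_id):
    """Return every full Automation -> ... -> Resource path as a list of node IDs.

    edges: iterable of (source_id, edge_type, target_id) tuples (assumed shape).
    """
    outgoing = defaultdict(list)
    for source, edge_type, target in edges:
        outgoing[(source, edge_type)].append(target)

    paths = [[automation_id]]
    for edge_type in CHAIN:
        # Extend each partial path by one hop; paths with no matching edge are dropped.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in outgoing.get((path[-1], edge_type), [])
        ]
    return paths
```

Because only the four edge types in CHAIN are followed, AUTHENTICATES_TO, EXECUTES_ON, and AUTHENTICATES_VIA edges never contribute to a materialized path, in line with the note above.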

Track 3: UI Upgrades (Phases 2-5) — Parallel

Execution Flow semantics (UNIFIED DEFINITION)

"Execution Flow" mode performs a directed traversal from a seed entity through execution-chain relationship types. TWO patterns depending on seed entity type:

Pattern A — Seed is an automation entity (business_rule, script_include, flow, scheduled_job):

Forward edges (materializer path):
  Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource

Provenance display (UI overlay, requires reverse AUTHENTICATES_TO lookup):
  Automation → EXECUTES_ON → REST Message → AUTHENTICATES_VIA → OAuth Entity
  OAuth Entity ←AUTHENTICATES_TO← SP  (reverse edge, SP is source)

Also: Automation → TRIGGERS_ON → Table Resource (provenance display only)

Pattern B — Seed is an identity entity (service_principal, oauth_app):

SP → AUTHENTICATES_TO → OAuth Entity (SP is source, OAuth is target)
SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource

Edge direction reference: SP → AUTHENTICATES_TO → OAuth (SP is always source). UI "execution flow" that needs to show OAuth → SP must perform a reverse edge lookup.
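The reverse lookup mentioned above amounts to indexing AUTHENTICATES_TO edges by their target. The following sketch assumes the same (source, edge_type, target) tuple shape used for illustration elsewhere; index_authenticating_sps is a hypothetical name.

```python
from collections import defaultdict


def index_authenticating_sps(edges):
    """Map each OAuth entity ID to the SPs that authenticate to it.

    Stored direction is SP -> AUTHENTICATES_TO -> OAuth, so the index is keyed
    by the edge target to answer "which SP sits behind this OAuth entity?".
    """
    by_oauth = defaultdict(list)
    for source, edge_type, target in edges:
        if edge_type == "AUTHENTICATES_TO":
            by_oauth[target].append(source)
    return dict(by_oauth)
```

A UI overlay could consult this index after following EXECUTES_ON and AUTHENTICATES_VIA to an OAuth entity, without changing the stored edge direction.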

Both plans must reference this canonical definition. UI upgrades plan Phase 2 Section 2b should be updated.

Phase H: Graph Focus Mode (UI plan Phase 2)

Existing plan applies. Execution Flow mode uses canonical definition above.

Phase I: Node Details Drawer (UI plan Phase 3)

Existing plan applies. Drawer shows new properties (egress, origin, risk group).

Phase J: Navigation Continuity (UI plan Phase 4)

Existing plan applies unchanged.

Phase K: Hardening & QA (UI plan Phase 5)

Existing plan applies.

Execution Order

Gate 0:                Resolve PRD conflict, publish acceptance matrix, save plan
                       ↓ (HARD GATE — nothing proceeds until Gate 0 passes)

Track 1 (Parallel):   T1-A, T1-B, T1-C, T1-D, T1-E, T1-F (all can parallel)
                       → T1-G (after T1-D + T1-E)
                       → T1-H (after all T1-* modules done)
                       → T1-QA
                       → PILOT GATE

Track 2 (After pilot): T2-A, T2-B (ADRs) → T2-C → T2-D → T2-E → T2-F, T2-G

Track 3 (Parallel):    Can start when T1-H confirms data flows through
                       H → I → J → K

Verification

# Connector (Track 1)
cd sv0-connectors/integrations/entra-servicenow && pytest

# Platform
cd sv0-platform && npm test && npm run test:integration && npm run typecheck

# UI
cd sv0-platform/ui && npm run ci

# Full stack
cd sv0-platform && docker compose up --build
# Trigger connector → verify entities with all 7 requirement outputs
# → verify findings → verify graph visualization
