
Comprehensive Plan: Sergey's Phase 1 Deltas + UI Upgrades

Date: 2026-02-10
Status: Track 1 COMPLETE, pilot gate cleared. Ready for Track 2/3.

Context

Sergey expanded the ServiceNow Phase 1 PRD from 3 to 7 requirements. Three are new additions pulled into Phase 1: data egress classification (#4), data origin classification (#5), and risk-based grouping (#7). Three existing items need verification and completion: identity binding (#2), execution detection (#3), ownership validation (#6). Automation inventory (#1) needs Flow Designer added.

The UI upgrades plan has Phase 1 (tables) complete, with Phases 2-5 pending. Graph focus "Execution Flow" mode maps directly to Sergey's automation chain concept.


Two Key Architectural Decisions

Decision 1: Import by type (connector emits entities independently)

Current: Connector correlator builds ExecutionChain objects by pre-linking BR→REST→OAuth→SP. Transformer decomposes chains into NormalizedGraph.

Target: Connector discovers entities by type independently. Each type emits as individual NormalizedNode with direct relationships (NormalizedEdge). No intermediate ExecutionChain representation. Platform path materializer reconstructs full execution flows.

Delivery strategy: This refactor is Track 2 — it runs AFTER the pilot-critical requirements are satisfied. Track 1 ships Sergey's 7 requirements using the existing connector architecture with targeted additions. Track 2 refactors the architecture for long-term scalability.

Decision 2: Platform evaluator only for findings (Target State)

Current: Connector detectors.py generates Detection objects for CLI. Platform evaluator generates Finding objects.

Target: All findings from platform evaluator. Connector is pure discovery + classification.

Delivery strategy: Track 1 adds new connector-side classifications (egress, origin, risk group) as entity properties that flow through to platform. Existing detectors continue working. Track 2 migrates all finding logic to platform evaluator and deprecates detectors.py.


Delivery Tracks

Track 1: Pilot-Critical (Sergey's 7 requirements) — SHIP FIRST

Minimal changes to satisfy all 7 PRD requirements. Uses existing connector architecture. Adds new modules alongside existing code.

Track 2: Architecture Refactor — AFTER PILOT GATE

Full import-by-type refactor, platform-only findings, ExecutionChain deprecation. Can be feature-flagged or phased in after pilot is stable.

Track 3: UI Upgrades (Phases 2-5) — PARALLEL

Graph focus, drawer, navigation. Can start once Track 1 data flows through platform.


Gate 0: PRD Lock (HARD PREREQUISITE)

Nothing starts until Gate 0 passes.

0a: Resolve merge conflict in PRD

File: sv0-documentation/docs/product/MVP1 - ServiceNow.md

  • Resolve merge conflict (lines 149-261)
  • Publish clean version with all 7 requirements

0b: Publish requirement-to-acceptance matrix

Create a locked acceptance matrix for all 7 requirements with concrete, testable outputs:

| # | Requirement | Required Outputs (per automation) | Acceptance Test |
|---|---|---|---|
| 1 | Automation inventory | sys_id, automation_type, sys_created_by, sys_updated_by | All 4 types (BR, SI, Flow, Job) enumerated with sys_id. No execution claims. |
| 2 | Identity binding | SP object_id, app_id, credential_type (secret / certificate), SP creation_timestamp, permission_assignments_snapshot | Each automation with REST Message → OAuth → SP chain produces all 5 fields. |
| 3 | Execution detection | last_observed_execution_timestamp, execution_count_30d, execution_evidence_refs (max 10 record IDs), identity_binding_status (bound / unlinked) | Deterministic first-party linkage. No heuristic string matching. Script Includes output "unlinked" when join unavailable. |
| 4 | Data egress classification | egress_host, egress_base_url, egress_category (llm / external / internal / none / unknown) | Same-instance = internal (not all *.service-now.com). LLM catalog match. No endpoint observed = none. No payload/header inspection. |
| 5 | Data origin classification | referenced_tables, data_domains (hr / identity / customer / financial / unknown) | First-party table references only (BR collection field, Flow trigger table, REST Message target). Script-parsed origins → unknown. |
| 6 | Ownership validation | ownership_status (valid / invalid / ambiguous) | Deterministic rules: no owner = invalid, all disabled = invalid, all inactive = invalid, groups only = ambiguous, at least one active individual = valid. |
| 7 | Risk-based grouping | risk_group (RG1-RG5), risk_group_label, risk_group_priority | Matrix of egress (#4) × origin (#5). Hardcoded rules. |

0c: Update automation-types.md

File: sv0-documentation/docs/integrations/servicenow/automation-types.md

  • Flows moved from "Phase 2" to Phase 1
  • Clarify: BR + SI + Scheduled Jobs are ALREADY detected. Flows are the new addition.
  • Update "Current Detection Coverage" to reflect actual state

0d: Save delivery plan

Save this plan to sv0-documentation/docs/plans/2026-02-10-phase1-deltas-plan.md


Track 1: Pilot-Critical Phases

Phase Map

Gate 0 ──→ T1-A (Inventory: add Flows)
──→ T1-B (Identity binding: verify + complete)
──→ T1-C (Execution detection: verify + complete)
──→ T1-D (Egress classification: new module)
──→ T1-E (Origin classification: new module)
──→ T1-F (Ownership validation: align to spec)
──→ T1-G (Risk grouping: new module, depends on D+E)
──→ T1-H (Platform: ensure properties flow through)
──→ T1-QA (PRD-assertion tests)

T1-A through T1-F can run in parallel. T1-G depends on T1-D + T1-E. T1-QA runs last.
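The dependency rules above can be checked mechanically. The following is a minimal sketch using Python's standard `graphlib`; the `DEPENDS_ON` mapping and `execution_order()` are illustrative names, and the T1-H dependency on all module phases is taken from the Execution Order section of this plan.

```python
from graphlib import TopologicalSorter

# Phases that may run in parallel once Gate 0 passes.
PARALLEL = ["T1-A", "T1-B", "T1-C", "T1-D", "T1-E", "T1-F"]

# Phase -> set of phases it depends on (illustrative encoding of the phase map).
DEPENDS_ON = {phase: {"Gate 0"} for phase in PARALLEL}
DEPENDS_ON["T1-G"] = {"T1-D", "T1-E"}            # risk grouping needs egress + origin
DEPENDS_ON["T1-H"] = set(PARALLEL) | {"T1-G"}    # platform check after all modules
DEPENDS_ON["T1-QA"] = {"T1-H"}                   # QA runs last


def execution_order() -> list[str]:
    """Return one valid ordering; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())
```

Any ordering it returns starts with Gate 0 and ends with T1-QA; the relative order of T1-A through T1-F is not significant.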


T1-A: Automation Inventory — Add Flow Designer

Goal: Add Flow Designer as 4th automation type. (Sergey's requirement #1)

Status: discover_flows() IMPLEMENTED but trigger filter incomplete — excludes service_catalog triggers.

Clarification: BR, SI, Scheduled Jobs are already discovered by servicenow_client.py. Flows are the only new addition.

File: sv0-connectors/integrations/entra-servicenow/src/entra_servicenow/adapters/servicenow_client.py

discover_flows() exists (lines 1168-1328) but queries:

query="trigger_typeINrecord,schedule"  # ← MISSING: service_catalog

Gap found 2026-02-11: AI Triage Flow (Service Catalog trigger → Azure OpenAI) not discovered. Fix: expand to trigger_typeINrecord,schedule,service_catalog. See reconciled roadmap item 2A.
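One way to keep the trigger filter from drifting again is to build the encoded query from a single list of autonomous trigger types. This is a sketch only: `AUTONOMOUS_TRIGGER_TYPES` and `build_trigger_query()` are hypothetical names, not existing code in servicenow_client.py.

```python
# Trigger types treated as autonomous, per the fix described above.
AUTONOMOUS_TRIGGER_TYPES = ("record", "schedule", "service_catalog")


def build_trigger_query(
    trigger_types: tuple[str, ...] = AUTONOMOUS_TRIGGER_TYPES,
) -> str:
    """Build the encoded query string for the trigger_type IN filter."""
    if not trigger_types:
        raise ValueError("at least one trigger type is required")
    return "trigger_typeIN" + ",".join(trigger_types)
```

With the defaults this yields the expanded filter named in the fix, so adding a future trigger type becomes a one-line change.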

Tables:

  • sys_hub_flow — flow definitions (sys_created_by, sys_updated_by)
  • sys_hub_trigger_instance — triggers (trigger_type: record/schedule/service_catalog = autonomous)
  • sys_hub_action_instance — actions (HTTP steps, REST calls)

File: sv0-connectors/integrations/entra-servicenow/transformer.py

Add Flow nodes as autonomous_identity with identitySubtype: "flow_designer_flow". Emit relationships:

  • Flow → TRIGGERS_ON → table (from trigger config)
  • Flow → RUNS_AS → identity (if run_as configured)
  • Flow → EXECUTES_ON → REST Message (if HTTP action found)

Output per automation:

  • sys_id as source_id
  • automation_type: business_rule / script_include / flow / scheduled_job
  • sys_created_by, sys_updated_by as properties
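The node and edge emission described above might look roughly like the sketch below. The dataclass shapes are simplified stand-ins for the real NormalizedNode and NormalizedEdge types, and the input keys `trigger_table`, `run_as`, and `rest_message_ids` are assumed names for whatever discover_flows() actually returns.

```python
from dataclasses import dataclass, field


@dataclass
class NormalizedNode:  # simplified stand-in, not the real type
    source_id: str
    entity_type: str
    properties: dict = field(default_factory=dict)


@dataclass
class NormalizedEdge:  # simplified stand-in, not the real type
    source_id: str
    relationship: str
    target_id: str


def flow_to_graph(flow: dict) -> tuple[NormalizedNode, list[NormalizedEdge]]:
    """Turn one discovered flow record into a node plus its one-hop edges."""
    node = NormalizedNode(
        source_id=flow["sys_id"],
        entity_type="autonomous_identity",
        properties={
            "identitySubtype": "flow_designer_flow",
            "automation_type": "flow",
            "sys_created_by": flow.get("sys_created_by"),
            "sys_updated_by": flow.get("sys_updated_by"),
        },
    )
    edges = []
    if flow.get("trigger_table"):  # assumed key: table from trigger config
        edges.append(NormalizedEdge(node.source_id, "TRIGGERS_ON", flow["trigger_table"]))
    if flow.get("run_as"):  # assumed key: only set when run_as is configured
        edges.append(NormalizedEdge(node.source_id, "RUNS_AS", flow["run_as"]))
    for rest_id in flow.get("rest_message_ids", []):  # assumed key: HTTP actions
        edges.append(NormalizedEdge(node.source_id, "EXECUTES_ON", rest_id))
    return node, edges
```

Edges are emitted only when the corresponding configuration is present, which matches the "if configured" and "if found" conditions in the relationship list.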

T1-B: Identity Binding — Verify + Complete

Goal: Verify existing identity binding outputs match PRD spec. (Requirement #2)

File: sv0-connectors/integrations/entra-servicenow/correlator.py

Verify these outputs exist per automation with a REST Message → OAuth → SP chain:

| Required Output | Current Field | Status |
|---|---|---|
| Entra SP object ID | azure_sp.id | Verify present |
| App (client) ID | azure_sp.app_id | Verify present |
| Credential type (secret / certificate) | ? | Check; may need to add |
| SP creation timestamp | azure_sp.created_date_time | Verify present |
| Permission assignments snapshot | canonical_permissions | Verify present |

Action: Audit Integration dataclass fields. Add credential_type to SP entity properties if missing (derive from keyCredentials vs passwordCredentials on the app registration).


T1-C: Execution Detection — Verify + Complete

Goal: Verify and complete execution detection with all required outputs. (Requirement #3)

This is a dedicated phase, not deferred.

PRD required outputs per automation:

  1. last_observed_execution_timestamp — from SN execution/transaction/flow logs
  2. execution_count_30d — count of execution records in last 30 days
  3. execution_evidence_refs — up to 10 record IDs (store no more than 10 in Phase 1)
  4. identity_binding_status: bound (deterministic linkage exists) or unlinked (platform does not expose a deterministic join)

PRD constraints:

  • Correlation MUST use first-party identifiers when exposed by the platform
  • MUST NOT rely on heuristic string matching
  • Each execution record is a distinct event regardless of timestamp granularity
  • Script Includes will likely produce many unlinked results (no deterministic join from execution log → SI)

Current state to audit:

  • Azure sign-in logs: already discovered, linked via EXECUTES_ON edges in transformer
  • ServiceNow execution logs: check what servicenow_client.py fetches (syslog_transaction, sys_flow_context, etc.)

Files to modify:

  • sv0-connectors/integrations/entra-servicenow/servicenow_client.py — Ensure execution log discovery for all 4 automation types
  • sv0-connectors/integrations/entra-servicenow/transformer.py — Add execution evidence properties:
    # On each automation entity
    properties["last_observed_execution_timestamp"] = timestamp_or_null
    properties["execution_count_30d"] = count
    properties["execution_evidence_refs"] = record_ids[:10]
    properties["identity_binding_status"] = binding_status  # "bound" or "unlinked"

Execution log sources by type:

| Automation Type | Execution Log Source | Deterministic Join | Expected Status |
|---|---|---|---|
| Business Rules | syslog_transaction (table operation logs) | BR sys_id in log → yes, IF platform exposes it | bound or unlinked |
| Script Includes | syslog_transaction | No deterministic join from log → SI | mostly unlinked |
| Flows | sys_flow_context (flow run records) | Flow sys_id in context → yes | bound |
| Scheduled Jobs | sysauto_script run history / sys_trigger | Job sys_id in trigger → yes | bound |
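The four execution outputs could be derived from already-attributed log records along these lines. This is a sketch under assumptions: `summarize_executions()` and the record shape (`sys_id`, `executed_at`) are hypothetical, and the caller is assumed to pass only records that were linked to the automation by a first-party identifier.

```python
from datetime import datetime, timedelta, timezone

MAX_EVIDENCE_REFS = 10  # Phase 1 stores no more than 10 record IDs


def summarize_executions(
    records: list[dict], now: datetime, deterministic_join: bool
) -> dict:
    """Compute the PRD execution outputs for one automation.

    Each record is assumed to look like
    {"sys_id": str, "executed_at": timezone-aware datetime}.
    """
    window_start = now - timedelta(days=30)
    newest_first = sorted(records, key=lambda r: r["executed_at"], reverse=True)
    last = newest_first[0]["executed_at"].isoformat() if newest_first else None
    return {
        "last_observed_execution_timestamp": last,
        # Every record counts as a distinct event, even with equal timestamps.
        "execution_count_30d": sum(
            1 for r in newest_first if r["executed_at"] >= window_start
        ),
        "execution_evidence_refs": [r["sys_id"] for r in newest_first[:MAX_EVIDENCE_REFS]],
        "identity_binding_status": "bound" if deterministic_join else "unlinked",
    }
```

For a Script Include with no deterministic join, the expected call is an empty record list with `deterministic_join=False`, which yields a null timestamp, a zero count, and "unlinked".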

T1-D: Data Egress Classification

Goal: Classify outbound egress per automation. (Requirement #4)

New file: sv0-connectors/integrations/entra-servicenow/egress_classifier.py

from fnmatch import fnmatch
from urllib.parse import urlparse

LLM_CATALOG = [
    "api.openai.com", "*.openai.azure.com", "api.anthropic.com",
    "generativelanguage.googleapis.com", "api.cohere.ai",
]

def classify_egress(endpoint_url: str | None, instance_host: str) -> dict:
    # _is_dynamic_url() is referenced here but not defined in this plan.
    if not endpoint_url:
        return {"egress_host": None, "egress_base_url": None, "egress_category": "none"}
    if _is_dynamic_url(endpoint_url):
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    parsed = urlparse(endpoint_url)
    host = parsed.hostname  # lowercased by urlparse; None if no host can be parsed
    if host is None:
        return {"egress_host": None, "egress_base_url": None, "egress_category": "unknown"}
    base_url = f"{parsed.scheme}://{host}"
    if any(fnmatch(host, p) for p in LLM_CATALOG):
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "llm"}
    if host == instance_host.lower():  # SAME INSTANCE ONLY, not all *.service-now.com
        return {"egress_host": host, "egress_base_url": base_url, "egress_category": "internal"}
    return {"egress_host": host, "egress_base_url": base_url, "egress_category": "external"}

Internal = same instance only (compare against configured instance_host), NOT all *.service-now.com. PRD defines internal as "within the same ServiceNow tenant."

Sources: REST Message endpoint URL, Flow HTTP action static endpoint.

Constraints: No payload/header inspection. Orchestration layers = egress point.


T1-E: Data Origin Classification

Goal: Map automations to SN tables, classify into data domains. (Requirement #5)

New file: sv0-connectors/integrations/entra-servicenow/origin_classifier.py

Only use first-party table references from platform configuration. Do NOT infer from script code parsing in Phase 1.

Acceptable first-party sources:

  • Business Rules: collection field (the table the BR triggers on) — platform configuration field
  • Flows: trigger table from sys_hub_trigger_instance — platform configuration
  • Scheduled Jobs: target table from job configuration (if available)
  • REST Messages: target table reference (if available in configuration)

NOT acceptable in Phase 1:

  • GlideRecord table names parsed from script bodies (analyze_script_mutations())
  • Any table reference extracted by code parsing/regex

from fnmatch import fnmatch

DEFAULT_DOMAINS = {
    "hr": ["sn_hr_*", "hr_*"],
    "identity": ["sys_user", "sys_user_role", "sys_user_grmember"],
    "customer": ["customer_*", "account*", "contact*"],
    "financial": ["alm_*", "*finance*"],
}

def classify_origin(tables: list[str], domains: dict = DEFAULT_DOMAINS) -> dict:
    """Only accepts first-party table references from platform config fields."""
    matched = set()
    for table in tables:
        for domain, patterns in domains.items():
            if any(fnmatch(table, p) for p in patterns):
                matched.add(domain)
    # An empty match set falls back to ["unknown"].
    return {"referenced_tables": tables, "data_domains": sorted(matched) or ["unknown"]}

Script Includes and other script-based automation where table reference is not a platform config field → data_domains: ["unknown"]
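To keep script-parsed tables out of the classifier, the table list can be assembled from configuration fields only. The sketch below assumes each automation is a dict with an `automation_type` key. `collection` is the Business Rule field named above, while `trigger_table` and `target_table` are hypothetical key names for the Flow and Scheduled Job sources.

```python
def first_party_tables(automation: dict) -> list[str]:
    """Collect table references from platform configuration fields only."""
    kind = automation.get("automation_type")
    if kind == "business_rule":
        candidates = [automation.get("collection")]
    elif kind == "flow":
        candidates = [automation.get("trigger_table")]  # assumed key
    elif kind == "scheduled_job":
        candidates = [automation.get("target_table")]  # assumed key, may be absent
    else:
        # script_include and anything else: no config-level table reference
        candidates = []
    return [table for table in candidates if table]
```

An empty result passed to classify_origin() produces data_domains of ["unknown"], which is the intended outcome for Script Includes.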


T1-F: Ownership Validation

Goal: Align ownership detection to Sergey's spec with deterministic rules. (Requirement #6)

All rules are deterministic — no threshold ambiguity:

| Condition | ownership_status | Rationale |
|---|---|---|
| No owner exists (SN or Entra) | invalid | No accountability |
| All owners disabled (active=false) | invalid | No functioning accountability |
| All owners inactive (no sign-in in configured window) | invalid | No functioning accountability |
| All owners are groups (no individual) | ambiguous | Accountability exists but is diffuse |
| At least one active individual owner | valid | Clear individual accountability |

The "configured window" for inactive is a tenant-level setting (default: 90 days). This is a configuration, not a heuristic — it's a deterministic threshold applied uniformly.

Files: Add compute_ownership_status() to transformer. Reads from existing owner discovery data (Azure SP owners + SN creator lookup).
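A sketch of how compute_ownership_status() might apply these rules is shown below. The owner record shape (`kind`, `enabled`, `last_sign_in`) is an assumption about the owner discovery data. The rules do not cover a mix of groups and non-active individuals; this sketch returns "ambiguous" for that case, which is an assumption to confirm against the PRD.

```python
from datetime import datetime, timedelta, timezone


def compute_ownership_status(
    owners: list[dict], now: datetime, inactive_after_days: int = 90
) -> str:
    """Apply the deterministic ownership rules to one automation's owners.

    Each owner is assumed to look like
    {"kind": "user" | "group", "enabled": bool, "last_sign_in": datetime | None}.
    """
    if not owners:
        return "invalid"  # no owner exists
    individuals = [o for o in owners if o["kind"] == "user"]
    if not individuals:
        return "ambiguous"  # groups only
    cutoff = now - timedelta(days=inactive_after_days)
    for owner in individuals:
        last = owner.get("last_sign_in")
        if owner.get("enabled") and last is not None and last >= cutoff:
            return "valid"  # at least one active individual
    # ASSUMPTION: groups plus only disabled/inactive individuals is ambiguous.
    has_group = len(individuals) < len(owners)
    return "ambiguous" if has_group else "invalid"
```

The 90-day default mirrors the tenant-level setting described above and is passed in rather than hardcoded so tests can vary it.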


T1-G: Risk-Based Grouping

Goal: Assign RG1-RG5 per automation. (Requirement #7)

Depends on T1-D (egress) and T1-E (origin).

New file: sv0-connectors/integrations/entra-servicenow/risk_grouper.py

def assign_risk_group(egress_category: str | None, data_domains: list[str]) -> dict:
    # Output keys follow the acceptance matrix: risk_group, risk_group_label, risk_group_priority.
    sensitive = bool(set(data_domains) & {"hr", "identity", "customer", "financial"})
    if sensitive and egress_category == "llm":
        return {"risk_group": "RG1", "risk_group_label": "Sensitive Data → LLM Egress", "risk_group_priority": "P0"}
    if sensitive and egress_category == "external":
        return {"risk_group": "RG2", "risk_group_label": "Sensitive Data → External Egress", "risk_group_priority": "P1"}
    if egress_category in ("llm", "external"):
        return {"risk_group": "RG3", "risk_group_label": "External/LLM Egress (Non-sensitive)", "risk_group_priority": "P2"}
    if egress_category in ("internal", "none", None):
        return {"risk_group": "RG4", "risk_group_label": "Internal Only / No Observed Egress", "risk_group_priority": "P3"}
    # Remaining case: egress_category == "unknown" (or any unrecognized value).
    return {"risk_group": "RG5", "risk_group_label": "Unclassified / Unknown", "risk_group_priority": "P3"}

Hardcoded for Phase 1.


T1-H: Platform — Ensure Properties Flow Through

Goal: Verify new entity properties from T1-A through T1-G flow through ingestion to storage and are queryable.

File: sv0-platform/src/ingestion/types.ts

NormalizedNode.properties is Record<string, unknown> — already flexible. New properties (egress_category, data_domains, risk_group, ownership_status, execution_count_30d, identity_binding_status) flow through without schema changes.

Verify:

  1. Properties survive ingestion → storage round-trip
  2. Properties are included in entity API responses
  3. Properties are available to evaluator via EvaluationContext

File: sv0-platform/src/evaluator/rules/

Verify existing rules (orphaned_ownership, dormant_authority) fire correctly on automation identity subtypes. If execution_paths are not populated for automations yet, the evaluator still works on the raw entity properties.


T1-QA: PRD-Assertion Tests

Goal: Tests that directly assert PRD required outputs for each requirement.

| Test | Asserts |
|---|---|
| test_req1_automation_inventory | All 4 types enumerated. Each has sys_id, automation_type, sys_created_by. |
| test_req2_identity_binding | Bound automations produce: SP object_id, app_id, credential_type, creation_timestamp, permission_snapshot. |
| test_req3_execution_detection | Each automation produces: last_observed_execution_timestamp, execution_count_30d, evidence_refs (<=10), binding_status. Script Includes produce "unlinked". No heuristic matching. |
| test_req4_egress_classification | Same-instance → internal. LLM catalog → llm. Other external → external. Dynamic URL → unknown. |
| test_req5_origin_classification | BR collection field → classified. Script-only automation → unknown. No script-parsed tables used. |
| test_req6_ownership_validation | No owner → invalid. Disabled → invalid. Inactive → invalid. Groups only → ambiguous. Active individual → valid. |
| test_req7_risk_grouping | RG1: sensitive + llm. RG2: sensitive + external. RG3: non-sensitive + external. RG4: internal. RG5: unknown. |
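As an illustration of what a PRD-assertion test can check, the sketch below validates the requirement #3 outputs on a single entity's properties. `check_req3_outputs()` is a hypothetical helper, not an existing test utility; a real test would call it for every automation entity the connector emits.

```python
REQ3_KEYS = (
    "last_observed_execution_timestamp",
    "execution_count_30d",
    "execution_evidence_refs",
    "identity_binding_status",
)


def check_req3_outputs(properties: dict) -> list[str]:
    """Return a list of PRD violations; an empty list means the entity passes."""
    problems = [f"missing {key}" for key in REQ3_KEYS if key not in properties]
    if problems:
        return problems  # skip value checks until all keys are present
    if len(properties["execution_evidence_refs"]) > 10:
        problems.append("execution_evidence_refs exceeds 10 record IDs")
    if properties["identity_binding_status"] not in ("bound", "unlinked"):
        problems.append("identity_binding_status must be bound or unlinked")
    count = properties["execution_count_30d"]
    if not isinstance(count, int) or count < 0:
        problems.append("execution_count_30d must be a non-negative integer")
    return problems
```

Returning the violations as a list, rather than raising on the first one, lets a single test run report every gap for an entity at once.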

Track 2: Architecture Refactor (AFTER Pilot Gate)

Phase Map

T2-A: ADR — Import-by-Type Connector Architecture
T2-B: ADR — Platform-Only Finding Generation
T2-C: Connector refactor (DiscoveredEntities, deprecate ExecutionChain)
T2-D: Platform path materializer extension (automation chain traversal)
T2-E: Evaluator rule migration (detectors.py → platform evaluator)
T2-F: Update connector interface spec (05-connectors.md)
T2-G: Update data model (01-data-model.md)

Connector output target (post-refactor)

Individual entities by type:
- Business Rules → autonomous_identity (identitySubtype: business_rule)
- Script Includes → autonomous_identity (identitySubtype: system_execution)
- Flows → autonomous_identity (identitySubtype: flow_designer_flow)
- Scheduled Jobs → autonomous_identity (identitySubtype: scheduled_job)
- REST Messages → resource (resourceType: rest_message)
- OAuth Entities → autonomous_identity (identitySubtype: oauth_app)
- Azure SPs → autonomous_identity (identitySubtype: service_principal)
- Users/Owners → human_identity

Direct relationships (one-hop):
- Automation → TRIGGERS_ON → Table Resource
- Automation → EXECUTES_ON → REST Message
- Automation → RUNS_AS → SP or User
- REST Message → AUTHENTICATES_VIA → OAuth Entity
- SP → AUTHENTICATES_TO → OAuth Entity *(edge direction: SP is source, OAuth is target)*
- SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
- Automation → OWNED_BY / CREATED_BY → User

Platform path materializer extension

File: sv0-platform/src/ingestion/path-materializer.ts

New traversal pattern for automation chains:

Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource
└→ SP → AUTHENTICATES_TO → OAuth Entity (cross-system tracing, no paths produced)

Note: The materializer follows RUNS_AS (identity binding) to reach the SP, then traverses the SP's own HAS_ROLE chain. AUTHENTICATES_TO edge direction is SP → OAuth (SP is source). The EXECUTES_ON / AUTHENTICATES_VIA chain is not used by the materializer — it is a parallel provenance path for UI display.
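The traversal pattern can be illustrated with a small sketch. The real materializer lives in path-materializer.ts; this Python version only demonstrates the edge-following logic, and `materialize_paths()` with its `(source, relationship, target)` tuple format is a hypothetical simplification.

```python
from collections import defaultdict

# Relationship types followed in order, starting from the automation.
CHAIN = ("RUNS_AS", "HAS_ROLE", "GRANTS", "APPLIES_TO")


def materialize_paths(edges: list[tuple[str, str, str]], automation_id: str) -> list[list[str]]:
    """Return every full Automation → SP → Role → Permission → Resource path."""
    outgoing = defaultdict(list)
    for source, relationship, target in edges:
        outgoing[(source, relationship)].append(target)
    paths = [[automation_id]]
    for relationship in CHAIN:
        # Extend each partial path by one hop; paths with no matching edge drop out.
        paths = [
            path + [target]
            for path in paths
            for target in outgoing.get((path[-1], relationship), [])
        ]
    return paths
```

Edges of other types, such as AUTHENTICATES_TO or EXECUTES_ON, are simply never followed, which reflects the note that they produce no paths.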


Track 3: UI Upgrades (Phases 2-5) — Parallel

Execution Flow semantics (UNIFIED DEFINITION)

"Execution Flow" mode performs a directed traversal from a seed entity through execution-chain relationship types. TWO patterns depending on seed entity type:

Pattern A — Seed is an automation entity (business_rule, script_include, flow, scheduled_job):

Forward edges (materializer path):
Automation → RUNS_AS → SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource

Provenance display (UI overlay, requires reverse AUTHENTICATES_TO lookup):
Automation → EXECUTES_ON → REST Message → AUTHENTICATES_VIA → OAuth Entity
OAuth Entity ←AUTHENTICATES_TO← SP (reverse edge, SP is source)

Also: Automation → TRIGGERS_ON → Table Resource (provenance display only)

Pattern B — Seed is an identity entity (service_principal, oauth_app):

SP → AUTHENTICATES_TO → OAuth Entity (SP is source, OAuth is target)
SP → HAS_ROLE → Role → GRANTS → Permission → APPLIES_TO → Resource

Edge direction reference: SP → AUTHENTICATES_TO → OAuth (SP is always source). UI "execution flow" that needs to show OAuth → SP must perform a reverse edge lookup.
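The reverse lookup amounts to filtering AUTHENTICATES_TO edges by target instead of by source. A minimal sketch, assuming edges are available as `(source, relationship, target)` tuples; `service_principals_for_oauth()` is a hypothetical name.

```python
def service_principals_for_oauth(
    edges: list[tuple[str, str, str]], oauth_id: str
) -> list[str]:
    """Find SPs whose AUTHENTICATES_TO edge targets the given OAuth entity."""
    return sorted(
        {
            source
            for source, relationship, target in edges
            if relationship == "AUTHENTICATES_TO" and target == oauth_id
        }
    )
```

The stored edge direction stays SP → OAuth; only the query direction changes, so no extra edges need to be written for the UI.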

Both plans must reference this canonical definition. UI upgrades plan Phase 2 Section 2b should be updated.

Phase H: Graph Focus Mode (UI plan Phase 2)

Existing plan applies. Execution Flow mode uses canonical definition above.

Phase I: Node Details Drawer (UI plan Phase 3)

Existing plan applies. Drawer shows new properties (egress, origin, risk group).

Phase J: Navigation Continuity (UI plan Phase 4)

Existing plan applies unchanged.

Phase K: Hardening & QA (UI plan Phase 5)

Existing plan applies.


Execution Order

Gate 0:                Resolve PRD conflict, publish acceptance matrix, save plan
↓ (HARD GATE — nothing proceeds until Gate 0 passes)

Track 1 (Parallel): T1-A, T1-B, T1-C, T1-D, T1-E, T1-F (all can run in parallel)
→ T1-G (after T1-D + T1-E)
→ T1-H (after all T1-* modules done)
→ T1-QA
→ PILOT GATE

Track 2 (After pilot): T2-A, T2-B (ADRs) → T2-C → T2-D → T2-E → T2-F, T2-G

Track 3 (Parallel): Can start when T1-H confirms data flows through
H → I → J → K

Verification

# Connector (Track 1)
cd sv0-connectors/integrations/entra-servicenow && pytest

# Platform
cd sv0-platform && npm test && npm run test:integration && npm run typecheck

# UI
cd sv0-platform/ui && npm run ci

# Full stack
cd sv0-platform && docker compose up --build
# Trigger connector → verify entities with all 7 requirement outputs
# → verify findings → verify graph visualization