
Claude Code UI Vision Review

GitHub Issue: SecurityV0/sv0-documentation#4

Problem: UI testing requires many manual iterations — a human reviews each page, decides whether it communicates the right thing to the right user, and feeds back. The correctness checks (renders, errors, data) are already automated by visual-qa.ts and ux-audit.ts. What's missing is the qualitative review: does the UI serve the business vision?

Goal: Build a /review-ui skill that uses Claude's browser + vision capabilities to evaluate the UI against product docs, user feedback, and business goals.


1. What We Already Have

The sv0-platform already has two deterministic QA scripts:

| Script | What it does | Command |
|---|---|---|
| visual-qa.ts | Headless Playwright. 11 pages. Console errors, network failures, load times, layout checks, screenshots, data presence, graph node counting. | npm run qa:visual |
| ux-audit.ts | Deep UX audit. User flows, interactions, accessibility, performance, error states. Categorized findings (blocker/major/minor/polish). | npm run qa:ux |

These cover correctness well. They answer "does it work?" They do NOT answer:

  • Does the Dashboard communicate risk at a glance? (feedback item #2)
  • Would a CISO scan this in 5 seconds and know what to act on?
  • Are we showing the right information hierarchy — or just dumping data?
  • Does "Sensitive Data → External Egress → Orphaned Owner → Active Execution" come through? (feedback item #1)
  • Is the graph readable as complexity grows? (feedback item #5)
  • Does each finding surface sensitivity of data touched? (feedback item #6)

Those questions require judgment. That's what the skill does.


2. How Claude Code Can See the UI

Three options, each suited to different workflows:

2.1 Chrome Integration (Interactive — Local Dev)

claude --chrome    # or /chrome within a session

Claude connects to your actual Chrome browser. It can navigate, click, read DOM, read console, record GIFs. Uses your session cookies — no auth setup needed.

Best for: Working on the UI locally, when you want Claude to look at what you're seeing right now and give feedback in real time.

Limitation: Requires you to be present. No CI/CD. Beta (Chrome/Edge only).

2.2 Playwright MCP (Headless — Automated)

claude mcp add playwright -- npx @playwright/mcp@1.58.2

Headless browser controlled by Claude via MCP tools (browser_navigate, browser_take_screenshot, browser_snapshot). 25 tools total. 143 device profiles.

Best for: Automated review runs. Claude navigates every page, takes screenshots, analyzes them without you watching.

Limitation: Needs tenant context injected (see §3). Every screenshot costs tokens (~2,700 tokens per 1920×1080 screenshot).
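
The per-screenshot figure can be sanity-checked with the rule of thumb from Anthropic's vision documentation: roughly width × height / 750 tokens per image. The following is a minimal sketch, assuming the screenshot reaches the model at full resolution (large images may be downscaled first, which would lower the count). `estimateImageTokens` is a hypothetical helper and not part of visual-qa.ts.

```typescript
// Rough per-image token estimate: tokens ≈ (width * height) / 750.
// Assumption: the screenshot is not downscaled before it reaches the model.
function estimateImageTokens(widthPx: number, heightPx: number): number {
  return Math.ceil((widthPx * heightPx) / 750);
}

const perScreenshot = estimateImageTokens(1920, 1080); // 2765
const fullRun = perScreenshot * 11; // 11 pages captured by visual-qa.ts: 30415
console.log(perScreenshot, fullRun);
```

The result lands close to the ~2,700 figure above and to the ~30k input tokens quoted for 11 screenshots in §6.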

2.3 Existing Screenshots (Cheapest)

The visual-qa.ts script already saves screenshots to reports/visual-qa/screenshots/. Claude can read those with the Read tool (it's multimodal).

Best for: Running the review against already-captured screenshots. No browser needed. Cheapest option.

This is the recommended default for the skill.
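
As an illustration of the screenshot-based mode, the sketch below collects the files the skill would hand to the Read tool. It assumes the screenshots are saved as .png files directly inside reports/visual-qa/screenshots/. Both function names are hypothetical.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Directory named in the text; adjust if visual-qa.ts writes elsewhere.
const SCREENSHOT_DIR = "reports/visual-qa/screenshots";

// Keep only PNG files and return their paths in a stable, sorted order.
// Assumption: screenshots are saved with a .png extension.
function selectScreenshots(fileNames: string[], dir: string): string[] {
  return fileNames
    .filter((name) => name.toLowerCase().endsWith(".png"))
    .sort()
    .map((name) => join(dir, name));
}

// Read the directory from disk; throws if the directory does not exist,
// which usually means npm run qa:visual has not been run yet.
function listScreenshots(dir: string = SCREENSHOT_DIR): string[] {
  return selectScreenshots(readdirSync(dir), dir);
}
```

Keeping the filter separate from the disk read makes the selection logic easy to check without any screenshots present.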


3. Authentication & Tenant Context

The UI requires tenantId in localStorage to show data. Without it, every page shows an empty state.

The existing visual-qa.ts already handles this: it reads the QA_TENANT_ID env var and injects it with Playwright's addInitScript. When using Playwright MCP directly, you'd need to inject the value yourself, either by using browser_navigate to open a setup page first or by using addInitScript in the MCP config.
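
The injection pattern can be sketched as follows. This is not the code from visual-qa.ts; it is a minimal illustration that assumes the UI reads the tenant from a localStorage key literally named "tenantId". `tenantInitScript` and `injectTenant` are hypothetical names, and the small interface stands in for Playwright's BrowserContext or Page so the sketch compiles without the Playwright dependency.

```typescript
// Minimal structural type so the sketch compiles without importing Playwright.
// Playwright's BrowserContext and Page both expose a compatible addInitScript.
interface InitScriptTarget {
  addInitScript(script: { content: string }): Promise<void>;
}

// Build a script that seeds localStorage before any page code runs.
// Assumption: the UI reads the tenant from the localStorage key "tenantId".
function tenantInitScript(tenantId: string): string {
  return `window.localStorage.setItem("tenantId", ${JSON.stringify(tenantId)});`;
}

// Register the script on a browser context or page, failing fast when no tenant is given.
async function injectTenant(
  target: InitScriptTarget,
  tenantId: string | undefined
): Promise<void> {
  if (!tenantId) {
    throw new Error("QA_TENANT_ID is not set; every page would show an empty state");
  }
  await target.addInitScript({ content: tenantInitScript(tenantId) });
}
```

With Playwright, a call such as `await injectTenant(context, process.env.QA_TENANT_ID)` before the first navigation would apply the script to every page opened in that context.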

When using the Chrome integration, you're already logged in — no issue.

When using existing screenshots from visual-qa.ts — tenant context was already injected when the screenshots were taken.


4. The Skill: /review-ui

What It Does

Reads screenshots of every UI page, then evaluates each one against:

  1. Product vision — What SecurityV0 is supposed to communicate
  2. User feedback — Specific feedback from stakeholders (MIMA, CISOs)
  3. Target persona — What a CISO needs to see in 5 seconds
  4. W1 gap analysis — What the W1 UX spec calls for vs. what exists today

Produces a structured report with concrete, actionable findings — not scores or platitudes.
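
One possible shape for such a report is sketched below. The field names, the `Lens` values, and `renderReport` are illustrative assumptions, not the skill's actual output format. The point is that each finding pairs an observation with a concrete suggestion and carries no numeric score.

```typescript
// Hypothetical shape for one review finding; field names are illustrative.
type Lens = "product-vision" | "user-feedback" | "persona" | "w1-gap";

interface Finding {
  page: string;          // page the screenshot shows, e.g. "Dashboard"
  lens: Lens;            // which of the four inputs the page was judged against
  observation: string;   // what is visible in the screenshot
  suggestion: string;    // the concrete change proposed
  feedbackItem?: number; // stakeholder feedback item number, if one applies
}

// Render findings grouped by page as plain text. No scores are produced.
function renderReport(findings: Finding[]): string {
  const byPage = new Map<string, Finding[]>();
  for (const f of findings) {
    const group = byPage.get(f.page) ?? [];
    group.push(f);
    byPage.set(f.page, group);
  }
  const lines: string[] = [];
  byPage.forEach((group, page) => {
    lines.push(page);
    for (const f of group) {
      const ref = f.feedbackItem !== undefined ? ` (feedback item #${f.feedbackItem})` : "";
      lines.push(`  - [${f.lens}] ${f.observation}${ref} -> ${f.suggestion}`);
    }
  });
  return lines.join("\n");
}
```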

What It Does NOT Do

  • Visual correctness (use npm run qa:visual)
  • Functional testing (use npm run qa:ux)
  • Pixel-diff regression (use BackstopJS/Percy if needed)
  • Replace actual CISO feedback (this is a proxy, not a substitute)

5. Critical Review Findings (Incorporated)

The initial research was critically reviewed. Key adjustments made:

| Finding | Severity | How Addressed |
|---|---|---|
| Deterministic QA scripts already exist | Critical | Skill builds on top of them, doesn't replace |
| LLM visual regression is non-deterministic | Major | Skill is advisory, not a CI gate. No pass/fail scores. |
| Token costs understated | Major | Default mode uses pre-captured screenshots (cheap). Honest cost estimate: $3-8 per full run with live browser. |
| W1 UX spec ≠ current UI | Major | Explicitly framed as "gap analysis" not "bug report" |
| @xyflow/react graph not fully capturable by accessibility tree | Major | Uses screenshots for graph pages, not accessibility tree |
| Auth/tenant context ignored | Major | Documented in §3. Existing scripts already solve this. |
| Layer 3 produces unfalsifiable output | Major | Reframed: concrete observations and checklists, not 1-5 scores |

6. Cost Estimate (Honest)

| Mode | Token Cost | Wall Time |
|---|---|---|
| Screenshots from visual-qa.ts (default) | ~30k input (11 screenshots) + ~20k product docs + 10k output ≈ **60k tokens ($1-2)** | 2-3 min |
| Live Playwright MCP browsing | ~150k-300k tokens (tool call overhead, accumulated context) ≈ $5-10 | 10-15 min |
| Chrome interactive | Similar to Playwright MCP but conversational | Depends on session |

The screenshot-based mode is recommended for routine reviews. Use live browsing only when you need Claude to interact with the UI (click through flows, test specific scenarios).
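
The arithmetic behind the screenshot-mode row can be sketched as follows. Token counts come from the table above. The per-million-token prices are illustrative assumptions supplied by the caller, not figures from this document, and should be replaced with the current price list for the model in use.

```typescript
// Token budget for one review run.
interface TokenBudget {
  inputTokens: number;
  outputTokens: number;
}

// Prices are in USD per million tokens and are caller-supplied assumptions.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

function estimateCostUsd(budget: TokenBudget, pricing: Pricing): number {
  const input = (budget.inputTokens / 1000000) * pricing.inputPerMillionUsd;
  const output = (budget.outputTokens / 1000000) * pricing.outputPerMillionUsd;
  return input + output;
}

// ~30k screenshot tokens + ~20k product docs in, ~10k out.
const screenshotMode: TokenBudget = { inputTokens: 30000 + 20000, outputTokens: 10000 };

// Illustrative rates only, not a quoted price list.
const exampleRates: Pricing = { inputPerMillionUsd: 15, outputPerMillionUsd: 75 };

console.log(estimateCostUsd(screenshotMode, exampleRates).toFixed(2)); // prints "1.50"
```

At these example rates the default mode falls inside the $1-2 range quoted in the table. Cheaper rates would bring it lower.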


7. Implementation

Phase 1 — Skill + Screenshot Mode

  1. Create /review-ui skill in sv0-platform/.claude/skills/review-ui/
  2. Skill runs npm run qa:visual first (captures fresh screenshots)
  3. Claude reads screenshots + product docs
  4. Produces structured review report

Phase 2 — Live Browser Mode

  1. Add Playwright MCP to sv0-platform (.mcp.json)
  2. Skill optionally uses live browsing for deeper investigation
  3. Claude can navigate to specific entities, click through flows

Phase 3 — Feedback Loop Integration

  1. After each stakeholder feedback session, update the feedback doc
  2. Re-run /review-ui to check whether feedback was addressed
  3. Track alignment improvements over time
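
Tracking over time could be as simple as comparing which feedback items two consecutive runs judged addressed. The sketch below assumes each run is reduced to a checklist keyed by feedback item number. `RunStatus` and `newlyAddressed` are hypothetical names, not part of the shipped skill.

```typescript
// Hypothetical per-run checklist: feedback item number -> judged addressed?
type RunStatus = Record<number, boolean>;

// Items that were not addressed in the previous run but are in the current one.
function newlyAddressed(previous: RunStatus, current: RunStatus): number[] {
  return Object.keys(current)
    .map(Number)
    .filter((item) => current[item] === true && previous[item] !== true)
    .sort((a, b) => a - b);
}
```

Items missing from the previous run are treated as not yet addressed, so feedback added between sessions shows up as soon as it is resolved.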

Next Action

Status: adopted and shipped. The Playwright-based visual review is implemented as the /visual-review skill in sv0-platform. Environment comparison mode was added in PR #78. No further action required.