Claude Code UI Vision Review

GitHub Issue: SecurityV0/sv0-documentation#4

Problem: UI testing requires many manual iterations — a human reviews each page, decides whether it communicates the right thing to the right user, and feeds back. The correctness checks (renders, errors, data) are already automated by visual-qa.ts and ux-audit.ts. What's missing is the qualitative review: does the UI serve the business vision?

Goal: Build a /review-ui skill that uses Claude's browser + vision capabilities to evaluate the UI against product docs, user feedback, and business goals.

1. What We Already Have

The sv0-platform already has two deterministic QA scripts:

Script	What it does	Command
`visual-qa.ts`	Headless Playwright. 11 pages. Console errors, network failures, load times, layout checks, screenshots, data presence, graph node counting.	`npm run qa:visual`
`ux-audit.ts`	Deep UX audit. User flows, interactions, accessibility, performance, error states. Categorized findings (blocker/major/minor/polish).	`npm run qa:ux`

These cover correctness well. They answer "does it work?" They do NOT answer:

Does the Dashboard communicate risk at a glance? (feedback item #2)
Would a CISO scan this in 5 seconds and know what to act on?
Are we showing the right information hierarchy — or just dumping data?
Does "Sensitive Data → External Egress → Orphaned Owner → Active Execution" come through? (feedback item #1)
Is the graph readable as complexity grows? (feedback item #5)
Does each finding surface sensitivity of data touched? (feedback item #6)

Those questions require judgment. That's what the skill does.

2. How Claude Code Can See the UI

Three options, each suited to different workflows:

2.1 Chrome Integration (Interactive — Local Dev)

claude --chrome    # or /chrome within a session

Claude connects to your actual Chrome browser. It can navigate, click, read DOM, read console, record GIFs. Uses your session cookies — no auth setup needed.

Best for: You're working on the UI, want Claude to look at what you're seeing right now, give feedback in real-time.

Limitation: Requires you to be present. No CI/CD. Beta (Chrome/Edge only).

2.2 Playwright MCP (Headless — Automated)

claude mcp add playwright -- npx @playwright/mcp@1.58.2

Headless browser controlled by Claude via MCP tools (browser_navigate, browser_take_screenshot, browser_snapshot). 25 tools total. 143 device profiles.

Best for: Automated review runs. Claude navigates every page, takes screenshots, analyzes them without you watching.

Limitation: Needs tenant context injected (see §3). Costs tokens for every screenshot (~2,700 tokens per 1920×1080).

2.3 Existing Screenshots (Cheapest)

The visual-qa.ts script already saves screenshots to reports/visual-qa/screenshots/. Claude can read those with the Read tool (it's multimodal).

Best for: Running the review against already-captured screenshots. No browser needed. Cheapest option.

This is the recommended default for the skill.
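A minimal sketch of how the skill (or a helper script) could enumerate the captured screenshots before handing them to Claude. The function name `listScreenshots` and the accepted file extensions are assumptions for illustration; only the directory path comes from the text above.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Directory named in the text above; adjust if visual-qa.ts writes elsewhere.
const DEFAULT_SCREENSHOT_DIR = "reports/visual-qa/screenshots";

// Hypothetical helper: returns image paths in alphabetical order so that
// repeated reviews of the same capture run see pages in the same sequence.
function listScreenshots(dir: string = DEFAULT_SCREENSHOT_DIR): string[] {
  if (!fs.existsSync(dir)) {
    return []; // no capture run yet: run `npm run qa:visual` first
  }
  return fs
    .readdirSync(dir)
    .filter((name) => /\.(png|jpe?g)$/i.test(name)) // keep image files only
    .sort()
    .map((name) => path.join(dir, name));
}
```

An empty result is a useful signal that `npm run qa:visual` has not been run yet, which the skill can report instead of reviewing nothing.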

3. Authentication & Tenant Context

The UI requires tenantId in localStorage to show data. Without it, every page shows an empty state.

visual-qa.ts already handles this via the QA_TENANT_ID env var and Playwright's addInitScript. When using Playwright MCP directly, the same value has to be written to localStorage before the app loads: either navigate first (browser_navigate) to a setup page that sets it, or configure an init script for the MCP server.

When using the Chrome integration, you're already logged in, so nothing needs to be injected.

When using existing screenshots from visual-qa.ts, tenant context was already injected at capture time.
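The injection step can be sketched as follows. This is not the code in visual-qa.ts, which is not shown here; it assumes the localStorage key is literally `tenantId` as described above, and the names `InitScriptTarget`, `buildTenantInitScript`, and `injectTenant` are hypothetical. The interface is a structural stand-in for Playwright's `BrowserContext` or `Page`, both of which accept an init script passed as a string.

```typescript
// Structural stand-in for Playwright's BrowserContext / Page (assumption:
// only the string form of addInitScript is used here).
interface InitScriptTarget {
  addInitScript(script: string): Promise<void>;
}

// Builds the JS source that runs before the app's own scripts on each page load.
// JSON.stringify quotes and escapes the values so they embed safely.
function buildTenantInitScript(tenantId: string, key: string = "tenantId"): string {
  return `window.localStorage.setItem(${JSON.stringify(key)}, ${JSON.stringify(tenantId)});`;
}

// Registers the script, failing loudly when no tenant is configured, since
// otherwise every page would silently render its empty state.
async function injectTenant(
  target: InitScriptTarget,
  tenantId: string | undefined
): Promise<void> {
  if (!tenantId) {
    throw new Error("QA_TENANT_ID is not set; pages would render their empty state");
  }
  await target.addInitScript(buildTenantInitScript(tenantId));
}
```

In a Playwright script this would be called once per browser context, before the first navigation, for example `await injectTenant(context, process.env.QA_TENANT_ID)`.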

4. The Skill: `/review-ui`

What It Does

Reads screenshots of every UI page, then evaluates each one against:

Product vision — What SecurityV0 is supposed to communicate
User feedback — Specific feedback from stakeholders (MIMA, CISOs)
Target persona — What a CISO needs to see in 5 seconds
W1 gap analysis — What the W1 UX spec calls for vs. what exists today

Produces a structured report with concrete, actionable findings — not scores or platitudes.
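One possible shape for such a report, sketched under assumptions: the `Finding` fields, the `Lens` names, and `renderReport` are illustrative and not taken from the shipped skill. The point is that each finding carries an observation and a suggested change rather than a numeric score.

```typescript
// The four evaluation lenses listed above (names are illustrative).
type Lens = "product-vision" | "user-feedback" | "target-persona" | "w1-gap";

interface Finding {
  page: string;          // page the screenshot shows, e.g. "Dashboard"
  lens: Lens;            // which lens the finding was made under
  observation: string;   // what is actually visible in the screenshot
  suggestion: string;    // concrete change to consider
  feedbackItem?: number; // stakeholder feedback item number, if one applies
}

// Groups findings by page (in first-seen order) and renders plain text.
function renderReport(findings: Finding[]): string {
  const pages: string[] = [];
  const byPage: Record<string, Finding[]> = {};
  for (const f of findings) {
    if (!byPage[f.page]) {
      byPage[f.page] = [];
      pages.push(f.page);
    }
    byPage[f.page].push(f);
  }
  const lines: string[] = [];
  for (const page of pages) {
    lines.push(page);
    for (const f of byPage[page]) {
      const ref = f.feedbackItem !== undefined ? ` (feedback item #${f.feedbackItem})` : "";
      lines.push(`  [${f.lens}] ${f.observation}${ref}`);
      lines.push(`    Suggest: ${f.suggestion}`);
    }
  }
  return lines.join("\n");
}
```

Leaving out a score field is deliberate: it keeps the output to statements a reviewer can check against the screenshot.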

What It Does NOT Do

Visual correctness (use npm run qa:visual)
Functional testing (use npm run qa:ux)
Pixel-diff regression (use BackstopJS/Percy if needed)
Replace actual CISO feedback (this is a proxy, not a substitute)

5. Critical Review Findings (Incorporated)

The initial research was critically reviewed. Key adjustments made:

Finding	Severity	How Addressed
Deterministic QA scripts already exist	Critical	Skill builds on top of them, doesn't replace
LLM visual regression is non-deterministic	Major	Skill is advisory, not a CI gate. No pass/fail scores.
Token costs understated	Major	Default mode uses pre-captured screenshots (cheap). Honest cost estimate: $3-8 per full run with live browser.
W1 UX spec ≠ current UI	Major	Explicitly framed as "gap analysis" not "bug report"
@xyflow/react graph not fully capturable by accessibility tree	Major	Uses screenshots for graph pages, not accessibility tree
Auth/tenant context ignored	Major	Documented in §3. Existing scripts already solve this.
Layer 3 produces unfalsifiable output	Major	Reframed: concrete observations and checklists, not 1-5 scores

6. Cost Estimate (Honest)

Mode	Token Cost	Wall Time
Screenshots from `visual-qa.ts` (default)	~30k input (11 screenshots) + ~20k product docs + ~10k output ≈ 60k tokens (~$1-2)	2-3 min
Live Playwright MCP browsing	~150k-300k tokens (tool call overhead, accumulated context) ≈ $5-10	10-15 min
Chrome interactive	Similar to Playwright MCP but conversational	Depends on session

The screenshot-based mode is recommended for routine reviews. Use live browsing only when you need Claude to interact with the UI (click through flows, test specific scenarios).
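The arithmetic behind the default-mode row can be reproduced with a short sketch. It uses the approximation from Anthropic's vision documentation, tokens ≈ (width × height) / 750, together with the document's own figures (11 screenshots at 1920×1080, ~20k tokens of product docs, ~10k output). The function names are hypothetical, and because the API may downscale large images before counting them, the result is best read as an upper bound.

```typescript
// Approximate image cost: tokens ≈ (width × height) / 750.
function estimateImageTokens(width: number, height: number): number {
  return Math.ceil((width * height) / 750);
}

interface RunEstimate {
  screenshotTokens: number;
  totalTokens: number;
}

// Sums screenshot, product-doc, and output tokens for one screenshot-mode run.
function estimateScreenshotRun(
  screenshotCount: number,
  width: number,
  height: number,
  docsTokens: number,
  outputTokens: number
): RunEstimate {
  const screenshotTokens = screenshotCount * estimateImageTokens(width, height);
  return {
    screenshotTokens,
    totalTokens: screenshotTokens + docsTokens + outputTokens,
  };
}

const estimate = estimateScreenshotRun(11, 1920, 1080, 20000, 10000);
console.log(estimate); // { screenshotTokens: 30415, totalTokens: 60415 }
```

This lands on the ~30k screenshot tokens and ≈ 60k total quoted in the table; dollar figures depend on the model's pricing and are not computed here.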

7. Implementation

Phase 1 — Skill + Screenshot Mode

Create /review-ui skill in sv0-platform/.claude/skills/review-ui/
Skill runs npm run qa:visual first (captures fresh screenshots)
Claude reads screenshots + product docs
Produces structured review report

Phase 2 — Live Browser Mode

Add Playwright MCP to sv0-platform (.mcp.json)
Skill optionally uses live browsing for deeper investigation
Claude can navigate to specific entities, click through flows

Phase 3 — Feedback Loop Integration

After each stakeholder feedback session, update the feedback doc
Re-run /review-ui to check whether feedback was addressed
Track alignment improvements over time

Next Action

Status: adopted and shipped. The Playwright-based visual review is implemented as the /visual-review skill in sv0-platform. Environment comparison mode was added in PR #78. No further action required.
