Claude Code UI Vision Review
GitHub Issue: SecurityV0/sv0-documentation#4
Problem: UI testing requires many manual iterations, in which a human reviews each page, decides whether it communicates the right thing to the right user, and feeds back. The correctness checks (renders, errors, data) are already automated by visual-qa.ts and ux-audit.ts. What's missing is the qualitative review: does the UI serve the business vision?
Goal: Build a /review-ui skill that uses Claude's browser + vision capabilities to evaluate the UI against product docs, user feedback, and business goals.
1. What We Already Have
The sv0-platform already has two deterministic QA scripts:
| Script | What it does | Command |
|---|---|---|
| visual-qa.ts | Headless Playwright. 11 pages. Console errors, network failures, load times, layout checks, screenshots, data presence, graph node counting. | npm run qa:visual |
| ux-audit.ts | Deep UX audit. User flows, interactions, accessibility, performance, error states. Categorized findings (blocker/major/minor/polish). | npm run qa:ux |
These cover correctness well. They answer "does it work?" They do NOT answer:
- Does the Dashboard communicate risk at a glance? (feedback item #2)
- Would a CISO scan this in 5 seconds and know what to act on?
- Are we showing the right information hierarchy — or just dumping data?
- Does "Sensitive Data → External Egress → Orphaned Owner → Active Execution" come through? (feedback item #1)
- Is the graph readable as complexity grows? (feedback item #5)
- Does each finding surface sensitivity of data touched? (feedback item #6)
Those questions require judgment. That's what the skill does.
2. How Claude Code Can See the UI
Three options, each suited to different workflows:
2.1 Chrome Integration (Interactive — Local Dev)
```bash
claude --chrome   # or /chrome within a session
```
Claude connects to your actual Chrome browser. It can navigate, click, read the DOM and console, and record GIFs. It uses your existing session cookies, so no auth setup is needed.
Best for: working on the UI when you want Claude to look at what you're seeing right now and give feedback in real time.
Limitation: Requires you to be present. No CI/CD. Beta (Chrome/Edge only).
2.2 Playwright MCP (Headless — Automated)
```bash
claude mcp add playwright -- npx @playwright/mcp@1.58.2
```
Headless browser controlled by Claude via MCP tools (browser_navigate, browser_take_screenshot, browser_snapshot). 25 tools total. 143 device profiles.
Best for: Automated review runs. Claude navigates every page, takes screenshots, and analyzes them without you watching.
Limitation: Needs tenant context injected (see §3). Every screenshot costs tokens (~2,700 tokens per 1920×1080 screenshot).
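The ~2,700-token figure is in line with Anthropic's published vision heuristic, tokens ≈ width × height / 750, applied to a full 1920×1080 frame. The same documentation says larger images are first downscaled, preserving aspect ratio, until the long edge is at most 1568 px and the estimate is at most about 1,600 tokens. The effective cost per screenshot may therefore be lower. The sketch below computes both estimates under those stated limits, which may change or differ by model, so check the current vision docs. The function names are illustrative.

```typescript
// Assumption: Anthropic's vision heuristic, tokens ≈ (width × height) / 750,
// with oversized images downscaled until long edge ≤ 1568 px and ≤ ~1,600 tokens.
const PIXELS_PER_TOKEN = 750;
const MAX_LONG_EDGE = 1568;
const MAX_TOKENS = 1600;

// Estimate before any downscaling (full-resolution frame).
function rawImageTokens(width: number, height: number): number {
  return Math.round((width * height) / PIXELS_PER_TOKEN);
}

// Estimate after the assumed downscale step; small images are left untouched.
function downscaledImageTokens(width: number, height: number): number {
  const longEdgeScale = MAX_LONG_EDGE / Math.max(width, height);
  const tokenScale = Math.sqrt((MAX_TOKENS * PIXELS_PER_TOKEN) / (width * height));
  const scale = Math.min(1, longEdgeScale, tokenScale);
  const w = Math.floor(width * scale);
  const h = Math.floor(height * scale);
  return Math.round((w * h) / PIXELS_PER_TOKEN);
}

console.log(rawImageTokens(1920, 1080), downscaledImageTokens(1920, 1080));
```

For a 1920×1080 frame, rawImageTokens gives about 2,765 tokens and downscaledImageTokens gives roughly 1,600. Eleven raw frames come to about 30k, in line with the default-mode estimate in §6.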
2.3 Existing Screenshots (Cheapest)
The visual-qa.ts script already saves screenshots to reports/visual-qa/screenshots/. Claude can read those with the Read tool (it's multimodal).
Best for: Running the review against already-captured screenshots. No browser needed. Cheapest option.
This is the recommended default for the skill.
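The default mode reviews whatever screenshots are already on disk, so it helps to confirm they are recent before Claude reads them. Below is a minimal sketch. collectScreenshots and partitionByAge are hypothetical names. The one-hour staleness threshold and the .png extension are assumptions, though Playwright does write PNG by default.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface ScreenshotEntry { path: string; mtimeMs: number; }
interface ScreenshotSet { fresh: string[]; stale: string[]; }

// Pure partition step, kept apart from the filesystem so it is easy to test.
function partitionByAge(entries: ScreenshotEntry[], maxAgeMs: number, now: number): ScreenshotSet {
  const result: ScreenshotSet = { fresh: [], stale: [] };
  for (const entry of entries) {
    (now - entry.mtimeMs <= maxAgeMs ? result.fresh : result.stale).push(entry.path);
  }
  return result;
}

// Hypothetical helper: list screenshots written by visual-qa.ts and split them by age.
function collectScreenshots(
  dir = "reports/visual-qa/screenshots",
  maxAgeMs = 60 * 60 * 1000, // assumption: older than one hour counts as stale
  now = Date.now(),
): ScreenshotSet {
  const entries = readdirSync(dir)
    .filter((name) => name.toLowerCase().endsWith(".png")) // assumption: PNG output
    .sort()
    .map((name) => {
      const path = join(dir, name);
      return { path, mtimeMs: statSync(path).mtimeMs };
    });
  return partitionByAge(entries, maxAgeMs, now);
}
```

Pass only the fresh list to the review and report any stale paths. This keeps an outdated capture from being critiqued as the current UI. Rerunning npm run qa:visual refreshes the screenshots.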
3. Authentication & Tenant Context
The UI requires tenantId in localStorage to show data. Without it, every page shows an empty state.
The existing visual-qa.ts already handles this via QA_TENANT_ID env var and Playwright's addInitScript. When using Playwright MCP directly, you'd need to inject this via browser_navigate to a setup page or use addInitScript in the MCP config.
When using the Chrome integration, you're already logged in — no issue.
When using existing screenshots from visual-qa.ts, tenant context was already injected when the screenshots were taken.
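According to §3, visual-qa.ts handles tenant context with the QA_TENANT_ID env var and Playwright's addInitScript. Its actual code isn't reproduced here. Below is a minimal sketch of the same idea, assuming the app reads the key "tenantId" from localStorage as described above. InitScriptHost and injectTenant are hypothetical names. Playwright's BrowserContext and Page both provide addInitScript(script, arg), which runs the script in each new document before the page's own scripts.

```typescript
// Minimal structural type so the sketch does not depend on Playwright's typings.
// Playwright's BrowserContext and Page both expose addInitScript(script, arg).
interface InitScriptHost {
  addInitScript(script: (tenantId: string) => void, arg: string): Promise<void>;
}

// Hypothetical helper mirroring what visual-qa.ts is described as doing.
async function injectTenant(
  host: InitScriptHost,
  tenantId: string | undefined = process.env.QA_TENANT_ID,
): Promise<void> {
  if (!tenantId) {
    // Without a tenant every page renders its empty state, so fail fast.
    throw new Error("QA_TENANT_ID is not set");
  }
  await host.addInitScript((id: string) => {
    // Runs inside the browser, before the app's own scripts, on every navigation.
    window.localStorage.setItem("tenantId", id);
  }, tenantId);
}
```

In a Playwright script you would call await injectTenant(context) right after browser.newContext() and before opening any pages. Failing fast on a missing QA_TENANT_ID avoids a run that captures nothing but empty states.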
4. The Skill: /review-ui
What It Does
Reads screenshots of every UI page, then evaluates each one against:
- Product vision — What SecurityV0 is supposed to communicate
- User feedback — Specific feedback from stakeholders (MIMA, CISOs)
- Target persona — What a CISO needs to see in 5 seconds
- W1 gap analysis — What the W1 UX spec calls for vs. what exists today
Produces a structured report with concrete, actionable findings — not scores or platitudes.
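One way to keep findings concrete and score-free is to give each one a fixed shape: what was observed, what to change, and which criterion it falls under. Below is a minimal sketch with hypothetical names (Finding, renderReport). It assumes the skill reuses ux-audit.ts's blocker/major/minor/polish categories. The four criteria mirror the list above.

```typescript
// Hypothetical finding shape for /review-ui; note there is no numeric score field.
type Criterion = "product-vision" | "user-feedback" | "persona" | "w1-gap";
type Severity = "blocker" | "major" | "minor" | "polish"; // assumed reuse of ux-audit.ts categories

interface Finding {
  page: string;           // e.g. "Dashboard"
  criterion: Criterion;
  severity: Severity;
  observation: string;    // what is on screen, stated concretely
  recommendation: string; // the specific change to make
  feedbackItem?: number;  // numbered stakeholder feedback item, if any
}

const SEVERITY_ORDER: Severity[] = ["blocker", "major", "minor", "polish"];

// Groups findings by page (alphabetical) and lists the most severe first.
function renderReport(findings: Finding[]): string {
  const lines: string[] = ["# UI Vision Review"];
  const pages = Array.from(new Set(findings.map((f) => f.page))).sort();
  for (const page of pages) {
    lines.push("", `## ${page}`);
    const onPage = findings
      .filter((f) => f.page === page)
      .sort((a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity));
    for (const f of onPage) {
      const ref = f.feedbackItem !== undefined ? ` (feedback item #${f.feedbackItem})` : "";
      lines.push(`- [${f.severity}] ${f.criterion}${ref}: ${f.observation} -> ${f.recommendation}`);
    }
  }
  return lines.join("\n");
}
```

Requiring both an observation and a recommendation in the type makes vague output ("the dashboard could be clearer") harder to produce than a checkable statement about a specific page.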
What It Does NOT Do
- Visual correctness (use npm run qa:visual)
- Functional testing (use npm run qa:ux)
- Pixel-diff regression (use BackstopJS/Percy if needed)
- Replace actual CISO feedback (this is a proxy, not a substitute)
5. Critical Review Findings (Incorporated)
The initial research was critically reviewed. Key adjustments made:
| Finding | Severity | How Addressed |
|---|---|---|
| Deterministic QA scripts already exist | Critical | Skill builds on top of them, doesn't replace |
| LLM visual regression is non-deterministic | Major | Skill is advisory, not a CI gate. No pass/fail scores. |
| Token costs understated | Major | Default mode uses pre-captured screenshots (cheap). Honest cost estimate: $3-8 per full run with live browser. |
| W1 UX spec ≠ current UI | Major | Explicitly framed as "gap analysis" not "bug report" |
| @xyflow/react graph not fully capturable by accessibility tree | Major | Uses screenshots for graph pages, not accessibility tree |
| Auth/tenant context ignored | Major | Documented in §3. Existing scripts already solve this. |
| Layer 3 produces unfalsifiable output | Major | Reframed: concrete observations and checklists, not 1-5 scores |
6. Cost Estimate (Honest)
| Mode | Token Cost | Wall Time |
|---|---|---|
| Screenshots from visual-qa.ts (default) | ~30k input (11 screenshots) + ~20k product docs + … | 2-3 min |
| Live Playwright MCP browsing | ~150k-300k tokens (tool call overhead, accumulated context) ≈ $5-10 | 10-15 min |
| Chrome interactive | Similar to Playwright MCP but conversational | Depends on session |
The screenshot-based mode is recommended for routine reviews. Use live browsing only when you need Claude to interact with the UI (click through flows, test specific scenarios).
7. Implementation
Phase 1 — Skill + Screenshot Mode
- Create the /review-ui skill in sv0-platform/.claude/skills/review-ui/
- Skill runs npm run qa:visual first (captures fresh screenshots)
- Claude reads screenshots + product docs
- Produces structured review report
Phase 2 — Live Browser Mode
- Add Playwright MCP to sv0-platform (.mcp.json)
- Skill optionally uses live browsing for deeper investigation
- Claude can navigate to specific entities and click through flows
Phase 3 — Feedback Loop Integration
- After each stakeholder feedback session, update the feedback doc
- Re-run /review-ui to check whether feedback was addressed
- Track alignment improvements over time
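Checking whether feedback was addressed amounts to comparing which feedback items still have findings from one run to the next. Below is a hypothetical sketch (FeedbackFinding and compareRuns are illustrative names). It assumes each finding is tagged with the numbered feedback item it relates to, as in the #1, #2, #5, #6 references in §1.

```typescript
// Hypothetical: each finding carries the stakeholder feedback item it relates to.
interface FeedbackFinding { feedbackItem: number; }

interface FeedbackDelta {
  resolved: number[];     // flagged in the previous run, absent now
  stillOpen: number[];    // flagged in both runs
  newlyFlagged: number[]; // flagged only in the current run
}

function compareRuns(previous: FeedbackFinding[], current: FeedbackFinding[]): FeedbackDelta {
  const before = new Set(previous.map((f) => f.feedbackItem));
  const after = new Set(current.map((f) => f.feedbackItem));
  const sorted = (ids: number[]) => ids.sort((a, b) => a - b);
  return {
    resolved: sorted(Array.from(before).filter((id) => !after.has(id))),
    stillOpen: sorted(Array.from(before).filter((id) => after.has(id))),
    newlyFlagged: sorted(Array.from(after).filter((id) => !before.has(id))),
  };
}
```

The review is non-deterministic (§5). An item that drops out of the current run is therefore a prompt to re-check that page, not proof the feedback was resolved.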
Next Action
Status: adopted — shipped
Playwright-based visual review implemented as /visual-review skill in sv0-platform. Environment comparison mode added in PR #78. No further action required.