Layer 4: Storybook + Chromatic — Component Visual Regression
Why Storybook Over Figma
This decision is based on an analysis of 7 of Sergey's feedback documents (Mar 16 → Apr 7), 63+ review files, and the 42 existing components.
| Factor | Storybook | Figma |
|---|---|---|
| Matches Sergey's feedback style | Stories = page scenarios ("CISO sees overview, gets WOW in 15 seconds") | Figma = pixel specs (Sergey gives architecture feedback, not pixel feedback) |
| Design tool compatibility | Independent of Stitch — stories are code | Would require moving designs from Stitch to Figma (second tool) |
| Living catalog | Stories become the missing component documentation | Figma is a separate tool outside the dev workflow |
| Iteration pattern | 17 rounds on Overview = story variants documenting each iteration | Figma can't show runtime behavior or data states |
| Chromatic | Native integration — every story = visual regression test | Would need Applitools ($500+/mo) for Figma comparison |
| Existing tokens | CSS custom properties in index.css work in Storybook automatically | Would need to export tokens to Figma Variables |
| Cost | Chromatic free tier (5K snapshots/mo) | Applitools enterprise pricing |
Evidence from March Sprint
Sergey's feedback operates at two levels:
- Page architecture (dominant, 90%): "Top section should be cluster-driven hero, not generic stats"
- Component-level (rare, 10%): "Fingerprint icon wrong", "Jira CTA too quiet"
Page-level stories map directly to how Sergey reviews. Component stories catch the token regressions that slip through page-level review.
Current State
| Asset | Count | Status |
|---|---|---|
| React components | 42 | Well-structured, pure-presentational, use CSS tokens |
| Pages | 24 | Data-fetching via TanStack Query hooks |
| Storybook | 0 | Not installed |
| Component tests | 0 | No unit tests for components |
| DESIGN.md | 1 | 120-line design system (19 color tokens, typography, rules) |
| Design Principles | 6 | Codified with acceptance tests |
Components are already reusable and well-bounded: badges, cards, tables, and graphs are organized by domain, so stories will map onto them naturally.
Phased Rollout
Phase A: Page-Level Stories (2-3 days)
Goal: Stories for the 3 pages Sergey reviews most — Overview, Risk Cluster Detail, Remediation Brief.
What to build:
- OverviewPage.stories.tsx (variants: cluster-driven hero, empty state, single cluster, many clusters)
- RiskClusterDetailPage.stories.tsx (variants: orphaned sensitive, scope drift, with/without remediation)
- RemediationBriefPage.stories.tsx (variants: 404 state (old), full brief (new), action pending)
Mock data: Create fixture factories for FindingDoc, RiskCluster, RemediationPlan that return realistic demo-w1 data. Use TanStack Query's QueryClient with pre-populated cache for page stories.
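A minimal sketch of the fixture-factory idea follows. The real FindingDoc and RiskCluster types are not shown in this plan, so every field name here is a hypothetical stand-in. Each factory returns a complete default object and accepts overrides, so a story variant only states what differs from the default.

```typescript
// Hypothetical shapes: the real FindingDoc / RiskCluster types are not shown
// in this plan, so every field name below is an assumption for illustration.
type Severity = "critical" | "high" | "medium" | "low" | "informational";

interface FindingDoc {
  id: string;
  title: string;
  severity: Severity;
}

interface RiskCluster {
  id: string;
  name: string;
  severity: Severity;
  findings: FindingDoc[];
}

let nextId = 0;
// Counter-based ids (no randomness) stay the same as long as call order is the same.
const makeId = (prefix: string): string => `${prefix}-${++nextId}`;

function makeFinding(overrides: Partial<FindingDoc> = {}): FindingDoc {
  return {
    id: makeId("finding"),
    title: "Service account with unused admin scope",
    severity: "high",
    ...overrides,
  };
}

function makeRiskCluster(overrides: Partial<RiskCluster> = {}): RiskCluster {
  return {
    id: makeId("cluster"),
    name: "Orphaned sensitive access",
    severity: "critical",
    findings: [makeFinding(), makeFinding({ severity: "medium" })],
    ...overrides,
  };
}

// Data sets for the Overview variants listed in Phase A.
const overviewFixtures = {
  emptyState: [] as RiskCluster[],
  singleCluster: [makeRiskCluster()],
  manyClusters: Array.from({ length: 12 }, (_, i) =>
    makeRiskCluster({ name: `Cluster ${i + 1}` }),
  ),
};
```

In a page story, each data set can be written into a fresh QueryClient with queryClient.setQueryData(queryKey, data) before rendering. Setting staleTime: Infinity in the client's default query options keeps the cached data from being refetched on mount. The query key has to match the one the page's hook uses, which this plan does not specify.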
Validation: Compare story screenshots against prod/dev screenshots from the visual-verify run. This shows whether the stories reproduce what is actually deployed.
Phase B: Shared Primitives (2-3 days)
Goal: Stories for the 15 most-reused components.
Priority order (by reuse count across pages):
| Component | Used in | Story variants |
|---|---|---|
| SeverityBadge | 8+ pages | critical, high, medium, low, informational |
| DeltaBadge / DeltaIndicator | 5+ pages | positive, negative, zero, large delta |
| EvidenceBadge | 5+ pages | execution_confirmed, standing_authority, structural, inferred, correlated |
| DataTable + filters + pagination | 6+ pages | empty, loading, populated, filtered, sorted |
| StatCard | 4+ pages | number, trend, with delta, compact |
| RiskClusterCard | 3+ pages | critical cluster, info cluster, with/without remediation |
| EntityBadge | 6+ pages | identity, workload, resource, data domain |
| TabNav | 5+ pages | 2 tabs, 5 tabs, with counts |
| EmptyState / ErrorMessage | all pages | no data, error, 404 |
| PathLabel | 4+ pages | short path, long path, with execution count |
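The variant column translates into one story object per state. The sketch below does this for SeverityBadge, assuming it takes a single severity prop (the real props are not shown in this plan). A local Story type stands in for Storybook's StoryObj so the example compiles on its own. The Record at the end is one way to have the compiler flag a variant that has no story.

```typescript
// Assumption: SeverityBadge takes a single `severity` prop. The real props
// are not shown in this plan.
type Severity = "critical" | "high" | "medium" | "low" | "informational";

interface SeverityBadgeProps {
  severity: Severity;
}

// Local stand-in for Storybook's StoryObj type so the sketch compiles without
// Storybook installed. In a real stories file, import StoryObj from
// "@storybook/react", export each story by name, and add a default export
// that sets `component`.
type Story = { args: SeverityBadgeProps };

const Critical: Story = { args: { severity: "critical" } };
const High: Story = { args: { severity: "high" } };
const Medium: Story = { args: { severity: "medium" } };
const Low: Story = { args: { severity: "low" } };
const Informational: Story = { args: { severity: "informational" } };

// Record<Severity, Story> fails to compile if a severity is added to the
// union without a matching entry, so a missing variant is caught by tsc.
// Keep this object unexported in a real stories file; Storybook would treat
// a named export as a story.
const storyForEverySeverity: Record<Severity, Story> = {
  critical: Critical,
  high: High,
  medium: Medium,
  low: Low,
  informational: Informational,
};
```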
Phase C: Chromatic CI Integration (1 day)
Goal: Every PR auto-snapshots changed stories.
Setup:
npx storybook@latest init # one-time setup
npx chromatic --project-token=<token> # connect to Chromatic
CI workflow addition (.github/workflows/chromatic.yml):
- Trigger: PRs modifying ui/**
- Run: npx chromatic --exit-zero-on-changes
- TurboSnap: Only snapshot stories whose dependencies changed (85% faster)
- Post: PR comment with Chromatic build link + change count
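The workflow described in this list might look like the following sketch. It assumes the official chromaui/action, a repository secret named CHROMATIC_PROJECT_TOKEN (a hypothetical name), npm as the package manager, and a Storybook project rooted at ui/. The PR comment step is left out because this plan does not say how the comment is posted.

```yaml
# .github/workflows/chromatic.yml
# Sketch only: secret name, Node version, and package manager are assumptions.
name: Chromatic
on:
  pull_request:
    paths:
      - "ui/**"
jobs:
  chromatic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full git history, needed to locate baselines
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
        working-directory: ui
      - uses: chromaui/action@latest
        with:
          projectToken: ${{ secrets.CHROMATIC_PROJECT_TOKEN }}
          workingDir: ui
          exitZeroOnChanges: true # same effect as --exit-zero-on-changes
          onlyChanged: true # enables TurboSnap
```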
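Free tier: 5,000 snapshots/month. At ~30 stories and one snapshot per story, a full build uses ~30 snapshots, so the quota covers roughly 160 single-build PRs per month (5,000 / 30 ≈ 166). TurboSnap lowers the per-PR cost further.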
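The quota estimate is plain division, sketched here so the assumptions are explicit: one snapshot per story (a single browser and viewport) and one Chromatic build per PR. Extra pushes, browsers, or viewports multiply the cost.

```typescript
// Estimate how many full Chromatic builds fit in a monthly snapshot quota.
// Assumes one snapshot per story per build (one browser, one viewport).
function fullBuildsPerMonth(monthlyQuota: number, storyCount: number): number {
  if (storyCount <= 0) throw new RangeError("storyCount must be positive");
  return Math.floor(monthlyQuota / storyCount);
}

console.log(fullBuildsPerMonth(5000, 30)); // 166
console.log(fullBuildsPerMonth(5000, 100)); // 50
```

The second call shows why TurboSnap matters more as the catalog grows in Phase D: with 100 stories, the same quota covers only 50 full builds.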
Phase D: Full Component Catalog (Ongoing)
Goal: Remaining 27+ components get stories over time.
Rule: Any PR that modifies a component must add/update its story. This grows the catalog organically without a big upfront investment.
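A rule like this is easier to keep if CI checks it. The sketch below takes the list of files changed in a PR and returns component files whose story file did not change alongside them. It assumes a Foo.tsx / Foo.stories.tsx sibling naming convention, and the file paths in the example are hypothetical.

```typescript
// Given the files changed in a PR, list component files that changed without
// their story file changing too.
// Assumption: Foo.tsx pairs with a sibling Foo.stories.tsx in the same folder.
function componentsMissingStoryUpdate(changedFiles: string[]): string[] {
  const changed = new Set(changedFiles);
  return changedFiles.filter((file) => {
    const isComponent =
      file.endsWith(".tsx") &&
      !file.endsWith(".stories.tsx") &&
      !file.endsWith(".test.tsx");
    if (!isComponent) return false;
    const storyFile = file.replace(/\.tsx$/, ".stories.tsx");
    return !changed.has(storyFile);
  });
}

// Hypothetical paths for illustration.
const missing = componentsMissingStoryUpdate([
  "ui/src/components/SeverityBadge.tsx",
  "ui/src/components/SeverityBadge.stories.tsx",
  "ui/src/components/StatCard.tsx",
]);
// `missing` contains only "ui/src/components/StatCard.tsx".
console.log(missing);
```

In practice the input would be filtered to the components directory first, and deferred components would need an allowlist so the check does not block unrelated PRs.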
Graph components (9 files) are the hardest to cover because they combine ELK.js layout with @xyflow/react. Defer these unless graph visualization changes are planned.
How This Integrates with Existing Layers
Layer 1: Agent rules (AGENTS.md)
→ Agent must reference evidence before claiming done
Layer 2: Dual-output diff report (visual-diff-report.ts --agent-report)
→ Page-level pixel comparison, structured PASS/FAIL
Layer 3: Local deploy + CI gate (visual-deploy.ts)
→ Immediate Cloudflare Pages links, PR comments with summaries
Layer 4: Storybook + Chromatic (THIS PLAN)
→ Component-level visual regression, catches what page-level misses
→ Stories document component states (living catalog)
→ Chromatic catches regressions before they propagate to pages
What each layer catches:
| Issue type | Layer that catches it |
|---|---|
| Agent blindly claims "fixed" | Layer 1 (rules block it) |
| Page layout regression | Layer 2 (pixel diff) |
| Token color drift on a badge | Layer 4 (Chromatic component snapshot) |
| Intentional redesign needs review | Layer 3 (/visual-verify verdicts) |
| Component state missing (empty, error) | Layer 4 (story variants) |
Validation Plan
Before committing to full Chromatic rollout, validate with Phase A:
- Create stories for Overview + Remediation Brief
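- Capture story screenshots locally (npx storybook build, then npx test-storybook against the served build; the test runner needs a screenshot hook because it does not save images by default)
- Compare against production screenshots from the visual-verify run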
- If stories reproduce the actual deployed UI → Storybook works for our stack
- If stories diverge significantly → investigate mock data / token issues before proceeding
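"Diverge significantly" needs a number to be actionable. One option, sketched here, is the fraction of differing pixels between a story screenshot and the production screenshot, once both are decoded to raw RGBA at the same size. PNG decoding is left out, and the tolerance and any pass threshold are assumptions to be tuned, not values from this plan. If the Layer 2 pixel diff already reports a comparable metric, reusing it would be preferable.

```typescript
// Fraction of pixels that differ between two same-sized RGBA buffers.
// `channelTolerance` absorbs small anti-aliasing differences; its default is
// an arbitrary assumption, not a value from this plan.
function mismatchRatio(
  a: Uint8Array,
  b: Uint8Array,
  channelTolerance = 8,
): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("buffers must be same-sized RGBA data");
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) return 0;
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > channelTolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing / pixelCount;
}

// Two 2x1 images: the first pixel matches, the second differs in red.
const storyPixels = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 255]);
const prodPixels = new Uint8Array([255, 255, 255, 255, 200, 0, 0, 255]);
console.log(mismatchRatio(storyPixels, prodPixels)); // 0.5
```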
Next Action
Status: draft
Decision needed from: CTO
Options:
- Start Phase A now — create stories for 3 key pages as proof-of-concept alongside current UI work
- Start after current sprint — defer until Overview + Remediation pages are stable
- Skip Layer 4 — Layers 1-3 are sufficient for current team size
GitHub Issue: SecurityV0/sv0-platform — to be created when Phase A starts