Auto-Fix CI Failures with Claude
Date: 2026-02-23
Status: Proposed
Context
CI failures from pushes by any team member (CEO, developer) currently require manual investigation. Recent example: a broken link in product/index.md broke the docs build for 3 consecutive pushes before it was noticed and fixed. This plan adds automated analysis and auto-fix for simple failures across all 3 repos.
How It Works
Push → CI fails → claude-ci-fix.yml triggers → reads failure logs
→ Simple fix? → create branch, fix, verify, open PR → reviewer agent approves → you merge
→ Complex fix? → create GitHub issue with analysis → you assign to developer
New Files (5 total)
| Repo | File | Purpose |
|---|---|---|
| sv0-documentation | .github/workflows/claude-ci-fix.yml | Analyze + fix docs build failures |
| sv0-platform | .github/workflows/claude-ci-fix.yml | Analyze + fix lint/type/test failures |
| sv0-platform | .github/workflows/claude-review.yml | Review auto-fix PRs |
| sv0-connectors | .github/workflows/claude-ci-fix.yml | Analyze + fix Python lint/type/test failures |
| sv0-connectors | .github/workflows/claude-review.yml | Review auto-fix PRs |
No reviewer workflow for sv0-documentation — the mkdocs build --strict re-run on the fix PR serves as the review gate.
Workflow Design
Analyzer (claude-ci-fix.yml)
Trigger: workflow_run with types: [completed] — fires when an existing CI workflow finishes.
Loop prevention (3 layers):
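- if: github.event.workflow_run.conclusion == 'failure' (only on failures)
- if: ${{ !startsWith(github.event.workflow_run.head_branch, 'claude/') }} (skip fix branches)
- if: ${{ !contains(github.event.workflow_run.head_commit.message, '[ci-fix]') }} (skip fix commits)

Expressions that begin with ! are wrapped in ${{ }} because a bare leading ! is YAML tag syntax. A job takes a single if, so in the workflow file the three conditions are joined with &&.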
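The three guards amount to a single predicate over the triggering run. A minimal Python sketch of that logic, for reasoning about edge cases only: the function name `should_run_analyzer` and its parameters are hypothetical, and the real checks live in the workflow's if expression.

```python
def should_run_analyzer(conclusion: str, head_branch: str, commit_message: str) -> bool:
    """Mirror the three loop-prevention layers (illustrative only)."""
    if conclusion != "failure":
        return False  # layer 1: react only to failed runs
    if head_branch.startswith("claude/"):
        return False  # layer 2: skip runs on fix branches
    if "[ci-fix]" in commit_message:
        return False  # layer 3: skip runs caused by fix commits
    return True
```

For example, a failed run on claude/fix-ci-123 is skipped by layer 2 even if its commit message lacks the [ci-fix] marker, so either layer alone is enough to stop a loop.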
Steps:
- Checkout the failed branch
- Install deps (node/python/mkdocs depending on repo)
- Run claude-code-action with a prompt that:
  - Uses gh run view $RUN_ID --log-failed to read failure logs
  - Classifies failure as SIMPLE or COMPLEX
  - SIMPLE → create claude/fix-ci-* branch, minimal fix, verify, open PR with label ci-auto-fix
  - COMPLEX → create GitHub issue with analysis, labels ci-failure + needs-human
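In the plan, Claude runs these commands itself from the prompt. The sketch below only spells out which gh invocation each classification maps to. The helper names `failed_log_command` and `outcome_command` are hypothetical, and treating any unrecognized classification as COMPLEX is an assumption based on the "if unsure, create an issue" rule.

```python
from typing import List


def failed_log_command(run_id: str) -> List[str]:
    """argv for reading only the failed steps' logs of a workflow run."""
    return ["gh", "run", "view", run_id, "--log-failed"]


def outcome_command(classification: str, title: str, body: str) -> List[str]:
    """argv for the PR (SIMPLE) or issue (COMPLEX / unknown) that reports the result."""
    if classification == "SIMPLE":
        return ["gh", "pr", "create", "--title", title, "--body", body,
                "--label", "ci-auto-fix"]
    # COMPLEX, or anything unrecognized: fall back to an issue (assumption).
    return ["gh", "issue", "create", "--title", title, "--body", body,
            "--label", "ci-failure", "--label", "needs-human"]
```

Returning argv lists rather than shell strings keeps titles and bodies containing quotes or newlines from being reinterpreted by a shell.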
Watched workflows (exact names from repos):
- sv0-documentation: "Documentation CI"
- sv0-platform: "ci"
- sv0-connectors: "Entra-ServiceNow CI", "Entra-ServiceNow Quality Gates" (excludes the scan workflow; those failures are credential/API issues)
Reviewer (claude-review.yml)
Trigger: pull_request with types: [opened] — only when head_ref starts with claude/fix-ci-.
Steps:
- Checkout
- Run claude-code-action as reviewer:
  - Check if fix is minimal and correct
  - Approve → gh pr review --approve
  - Issues found → gh pr review --request-changes
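The reviewer's trigger filter and its two outcomes can be sketched the same way. The function names here are hypothetical. Passing --body with --request-changes is an addition to what the plan lists, on the assumption that a change request should say what is wrong.

```python
from typing import List


def is_auto_fix_pr(head_ref: str) -> bool:
    """True only for PR branches created by the analyzer."""
    return head_ref.startswith("claude/fix-ci-")


def review_command(pr_number: int, approved: bool, comment: str = "") -> List[str]:
    """argv for the review verdict on an auto-fix PR."""
    if approved:
        return ["gh", "pr", "review", str(pr_number), "--approve"]
    return ["gh", "pr", "review", str(pr_number), "--request-changes", "--body", comment]
```

Note that the filter is narrower than the analyzer's loop guard: a branch named claude/experiment is ignored by the analyzer but is not reviewed either.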
Prompt Constraints (all repos)
Every analyzer prompt includes:
- Never modify workflow files (.github/workflows/*)
- Never modify CLAUDE.md
- Never commit secrets or .env files
- Keep fixes minimal: only what's needed to pass CI
- Use conventional commits: fix: <description> [ci-fix] (no Co-Authored-By per project convention)
- If unsure, create an issue instead of a PR
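Prompt constraints are instructions, not enforcement, so the mechanical ones could also be checked against a fix PR's changed files and commit message. The plan does not include such a check; this is a sketch under assumptions. The helper names are hypothetical, matching CLAUDE.md at any depth is an assumption, and ".env files" is interpreted as files named .env or .env.*.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List

# Subject line format from the constraints: "fix: <description> [ci-fix]"
COMMIT_SUBJECT = re.compile(r"^fix: .+ \[ci-fix\]$")


def forbidden_paths(changed: List[str]) -> List[str]:
    """Return the changed paths that the prompt constraints put off-limits."""
    bad = []
    for path in changed:
        name = PurePosixPath(path).name
        if (
            fnmatchcase(path, ".github/workflows/*")    # workflow files
            or name == "CLAUDE.md"                      # assumption: at any depth
            or name == ".env" or name.startswith(".env.")  # assumption: .env variants
        ):
            bad.append(path)
    return bad


def valid_commit_message(message: str) -> bool:
    """Check the subject format and the absence of a Co-Authored-By trailer."""
    subject = message.splitlines()[0] if message else ""
    has_trailer = "co-authored-by:" in message.lower()
    return bool(COMMIT_SUBJECT.match(subject)) and not has_trailer
```

A check like this cannot judge whether a fix is minimal or whether a file contains a secret; those constraints still depend on the prompt and on review.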
Execution Options
Two approaches for running Claude in CI, depending on cost preference:
Option A: API Key on GitHub-hosted runners (simplest setup)
Add ANTHROPIC_API_KEY as an org-level or per-repo secret. Workflows use claude-code-action on GitHub's runners. Pay-per-token on Anthropic API plan (~$4-48/month).
Option B: Self-hosted runner on Mac Mini / VPS (use Max subscription)
Install GitHub Actions runner + Claude Code CLI on your own machine, authenticated with your Max login. Workflows call claude -p "..." directly instead of claude-code-action. No API costs — covered by Max subscription.
ToS note (Feb 2026): Running the official Claude Code CLI binary headlessly / via cron / on self-hosted runners is explicitly allowed under the Consumer Terms. What's prohibited is extracting OAuth tokens for use in third-party tools or the Agent SDK. The claude-code-action GitHub Action uses the Agent SDK internally, so it requires an API key — it cannot use Max auth. The self-hosted approach calls the CLI directly, which is fine.
| | API key (Option A) | Self-hosted (Option B) |
|---|---|---|
| Cost | ~$4-48/mo API usage | $0 (Max subscription) |
| Setup | Add 1 secret | Install runner + Claude CLI |
| Reliability | GitHub-managed | Your machine must be on |
| Maintenance | None | Runner updates |
Secrets Required
| Secret | Where | Notes |
|---|---|---|
| ANTHROPIC_API_KEY | Org-level or per-repo | Only needed for Option A |
Labels to Create (all 3 repos)
- ci-auto-fix (green): auto-generated fix PR
- ci-failure (red): needs human attention
- needs-human (yellow): requires manual intervention
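Creating three labels in three repos is nine gh label create calls, which are easy to generate. This is a sketch only: the hex values are assumed stand-ins for "green", "red", and "yellow", the org slug is passed in rather than guessed, and `label_commands` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# name -> (hex color without '#', description); hex values are assumptions
LABELS: Dict[str, Tuple[str, str]] = {
    "ci-auto-fix": ("0E8A16", "auto-generated fix PR"),
    "ci-failure": ("D73A4A", "needs human attention"),
    "needs-human": ("FBCA04", "requires manual intervention"),
}
REPOS = ["sv0-documentation", "sv0-platform", "sv0-connectors"]


def label_commands(org: str) -> List[List[str]]:
    """One `gh label create` argv per (repo, label) pair."""
    return [
        ["gh", "label", "create", name, "--color", color,
         "--description", description, "--repo", f"{org}/{repo}"]
        for repo in REPOS
        for name, (color, description) in LABELS.items()
    ]
```

Each argv list can be passed to subprocess.run(cmd, check=True) from a machine where gh is already authenticated.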
Model Choice
Sonnet for both analyzer and reviewer — CI fixes are mechanical (lint, types, broken links), not architectural. ~$0.20-0.80 per failure. Estimated $4-48/month at 5-15 failures/week.
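The monthly range follows from the per-failure and per-week figures if a month is counted as 4 weeks. That 4-week month is an assumption inferred from the numbers rather than stated in the plan, and the function name is hypothetical.

```python
from typing import Tuple


def monthly_cost_range(
    per_failure: Tuple[float, float] = (0.20, 0.80),   # USD per analyzed failure
    failures_per_week: Tuple[int, int] = (5, 15),
    weeks_per_month: int = 4,                           # assumption
) -> Tuple[float, float]:
    """Low and high monthly estimates: cheapest-and-fewest vs. priciest-and-most."""
    low = per_failure[0] * failures_per_week[0] * weeks_per_month
    high = per_failure[1] * failures_per_week[1] * weeks_per_month
    return round(low, 2), round(high, 2)
```

With the defaults this reproduces the $4 and $48 bounds: 0.20 × 5 × 4 and 0.80 × 15 × 4. Both bounds scale linearly with failure volume, so a noisy week moves the estimate proportionally.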
Implementation Order
- Org secret: Add ANTHROPIC_API_KEY to SecurityV0 org (or per-repo)
- Labels: gh label create in all 3 repos
- sv0-documentation first (simplest, lowest risk):
  - Create claude-ci-fix.yml
  - Test by intentionally breaking a link
  - Verify: Claude detects, fixes, opens PR, docs-ci validates
- sv0-platform:
  - Create claude-ci-fix.yml + claude-review.yml
  - Test with a lint error on a branch
- sv0-connectors:
  - Create claude-ci-fix.yml + claude-review.yml
  - Test with a ruff violation
Verification
For each repo:
- Create a test branch
- Introduce a known simple failure (broken link / lint error / type error)
- Push → CI fails → claude-ci-fix.yml triggers
- Confirm: fix PR created, CI passes on fix PR, reviewer approves (or docs-ci validates for docs)
- Merge the fix PR manually
What This Does NOT Cover
- Cross-repo failures (platform type change breaks connector)
- Credential/API failures (scan workflow excluded)
- Complex logic bugs (filed as issues, not auto-fixed)
- Auto-merge (always requires human approval to merge)