CI/CD Operations

GitHub Actions workflows across the SecurityV0 workspace. All repos live under the securityv0 GitHub organization.

Workflow Inventory

sv0-platform

Inventory current as of 2026-05-22. 18 workflows. Cost behaviour of ci.yml is governed by ADR-030 — see § Cost and Actions Minutes.

Workflow	Trigger	Runner	Purpose
`ci.yml`	Push `main`/`redesign/v06-pilot`/`v*` tags + PRs	ubuntu-latest	Lint, typecheck, test, build; push Docker images to GHCR. amd64-only on PRs, multi-arch on main/tags; image build is path-gated to app changes; superseded PR runs cancelled (ADR-030)
`deploy-dev.yml`	`workflow_run` (ci success) + dispatch	ubuntu-latest	Auto-deploy to Hetzner `dev.securityv0.com` + PR previews `pr-N-dev.securityv0.com`
`deploy-prod.yml`	Manual dispatch	ubuntu-latest	Deploy to Hetzner `app.securityv0.com` (approval gate)
`deploy-dev-cleanup.yml`	Schedule + dispatch	ubuntu-latest	GC stale PR-preview instances (scheduled sweep; reaps closed PRs)
`deploy-azure-dev.yml`	`workflow_run` (ci success) + dispatch	ubuntu-latest	Deploy to Azure dev demo VM via OIDC Run Command (ADR-024)
`deploy-azure-staging.yml`	`workflow_run` (ci success) + dispatch	ubuntu-latest	Deploy to Azure staging VM via OIDC Run Command
`smoke-staging.yml`	`workflow_run` + schedule + dispatch	ubuntu-latest	Post-deploy smoke tests against staging
`visual-review.yml`	PR (ui/api changes) + dispatch	ubuntu-latest	Before/after screenshots + visual diff on `sv0-reviews.pages.dev`
`visual-review-cleanup.yml`	PR closed	ubuntu-latest	Delete Cloudflare Pages visual-review deployments
`visual-review-stale-cleanup.yml`	Schedule + dispatch	ubuntu-latest	GC stale Cloudflare Pages visual-review deployments
`visual-regression.yml`	PR (ui changes)	ubuntu-latest	Visual regression checks
`token-health.yml`	Weekly cron + dispatch	ubuntu-latest	Check Cloudflare Access service-token expiry, open issues near expiration
`demo-data-health.yml`	Manual dispatch	ubuntu-latest	On-demand demo data-health probe via `/admin/data-health` (staging)
`seed-jira-aws-smoke.yml`	Schedule + dispatch	ubuntu-latest	Jira→AWS demo seed smoke test
`azure-ops.yml`	Manual dispatch	ubuntu-latest	Demo-data ops (`docker run` from the deployed image) on Azure
`pr-preview-admin.yml`	Manual dispatch	ubuntu-latest	Seed/restore admin helper for Hetzner PR previews
`chain-builder-version-bump.yml`	PR	ubuntu-latest	Guard: fails the PR if chain-builder source changed without bumping `CHAIN_BUILDER_VERSION`
`bootstrap-cf-access.yml`	Manual dispatch	ubuntu-latest	Bootstrap/reconcile the "SecurityV0 PR Previews" Cloudflare Access app

sv0-website

Workflow	Trigger	Runner	Purpose
`deploy.yml`	PR to `main`	ubuntu-latest	Preview deploy to `pr-N.securityv0.pages.dev`
`deploy-prod.yml`	Manual dispatch	ubuntu-latest	Deploy to `securityv0.com` (approval gate)
`staging.yml`	Push to `main`	ubuntu-latest	Deploy staging, generate report, await approval, deploy prod
`report.yml`	Reusable workflow	ubuntu-latest	Lighthouse audit, screenshots, visual diff, business logic checks, post to issue #18
`visual-review.yml`	PR (src/public changes)	ubuntu-latest	Visual diff on `sv0-website-reviews.pages.dev`
`visual-review-cleanup.yml`	PR closed	ubuntu-latest	Delete Cloudflare Pages deployments

sv0-connectors

Workflow	Trigger	Runner	Purpose
`azure-foundry-ci.yml`	Push/PR (azure-foundry paths)	ubuntu-latest	Lint + test (Python 3.11-3.13 matrix)
`entra-servicenow-ci.yml`	Push/PR (entra-servicenow paths)	ubuntu-latest	Test + connector reports
`entra-servicenow-quality.yml`	Push/PR	ubuntu-latest	Lint, format, typecheck, test, build (Python matrix)
`entra-servicenow-scan.yml`	Push/PR + dispatch	ubuntu-latest	Run security scans against live Azure/ServiceNow
`servicenow-keepalive.yml`	Every 30 min	ubuntu-latest	Ping ServiceNow dev instance to prevent hibernation

sv0-documentation

Workflow	Trigger	Runner	Purpose
`docs-ci.yml`	Push/PR (docs paths)	ubuntu-latest	Build Docusaurus, deploy to `sv0-docs.pages.dev`

sv0-intelligence

Workflow	Trigger	Runner	Purpose
`weekly-incident.yml`	Mondays 8am UTC + dispatch	ubuntu-latest	Gather AI security signals, score with Claude, open PR to sv0-website

Dependency Graph

sv0-platform:
  ci.yml ──workflow_run──> deploy-dev.yml ──creates──> PR preview instances
                                                         │
  visual-review.yml ──screenshots──> PR preview instances ┘

  Schedule  ──> deploy-dev-cleanup.yml (reaps previews of closed PRs)
  PR closed ──> visual-review-cleanup.yml

  token-health.yml ──monitors──> CF_ACCESS_* service tokens

sv0-website:
  staging.yml ──calls──> report.yml (reusable) ──approval──> deploy-prod

  PR opened  ──> deploy.yml (preview)
  PR opened  ──> visual-review.yml
  PR closed  ──> visual-review-cleanup.yml

sv0-connectors:
  servicenow-keepalive.yml ──every 30min──> ServiceNow dev instance (prevent hibernation)

sv0-intelligence:
  weekly-incident.yml ──opens PR──> sv0-website ──triggers──> website CI
                                                 (deploy.yml preview + visual-review.yml)

Secrets Inventory

Secret Name	Repo(s)	Purpose	Rotation
`GITHUB_TOKEN` (implicit)	All repos	GHCR, GH API	Auto-managed
`DEPLOY_SSH_KEY`	sv0-platform	SSH to Hetzner servers	Manual rotation
`DEPLOY_HOST` / `DEPLOY_HOST_KEY`	sv0-platform	Server address + host key	Change on server migration
`CLOUDFLARE_API_TOKEN`	sv0-platform, sv0-website, sv0-documentation	Pages deployments	Manual rotation
`CLOUDFLARE_ACCOUNT_ID`	sv0-platform, sv0-website, sv0-documentation	Cloudflare account	Static
`CF_ACCESS_CLIENT_ID_DEPLOY` / `CF_ACCESS_CLIENT_SECRET_DEPLOY`	sv0-platform	CI deploy bot Cloudflare Access	Expires 2027-03-31, monitored by `token-health.yml`
`CF_ACCESS_CLIENT_ID_VISUAL` / `CF_ACCESS_CLIENT_SECRET_VISUAL`	sv0-platform	Visual review bot Cloudflare Access	Expires 2027-03-31, monitored by `token-health.yml`
`CLOUDFLARE_API_TOKEN_ZERO_TRUST`	sv0-platform	Zero Trust management API	Manual rotation
`ENTRA_SERVICENOW_AZURE_*` (3 secrets)	sv0-connectors	Azure Entra connector	Manual rotation
`ENTRA_SERVICENOW_SNOW_*` (3 secrets)	sv0-connectors	ServiceNow connector	Manual rotation
`ANTHROPIC_API_KEY`	sv0-intelligence	Claude API for signal scoring	Manual rotation
`GH_TOKEN`	sv0-intelligence	Cross-repo PR creation	Manual rotation

Credential Rotation Strategy

The infrastructure strategy doc (2026-03-31-infrastructure-strategy.md) defines a tiered secrets management approach. Operational details:

Automated monitoring -- The token-health.yml workflow runs weekly and on-demand. It queries the Cloudflare Zero Trust API for service token expiry dates and opens GitHub issues when tokens are within 30 days of expiration.

Cloudflare service tokens -- 1-year expiry (current tokens expire 2027-03-31). Rotation is automated via Cloudflare API: create new token, update GitHub secrets, delete old token. Monitored by token-health.yml.

GitHub secrets -- Manual rotation, no built-in expiry tracking. Rely on documentation and calendar reminders.

Future expansion -- Extend token-health.yml to check:

Azure client secrets (Entra connector) via Microsoft Graph API
ServiceNow passwords via ServiceNow Table API
Anthropic API key validity via a lightweight API call
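The 30-day warning window described above comes down to a date subtraction. The following is a minimal sketch of that check, not the actual logic inside token-health.yml. The function names `days_until` and `expires_soon` are hypothetical, and it assumes GNU `date` (as on ubuntu-latest).

```shell
#!/usr/bin/env bash
# Sketch: whole days between two ISO dates, plus a warning-window check.
# Assumes GNU date (-d); BSD/macOS date uses different flags.
days_until() {
  local expiry=$1 today=${2:-$(date -u +%F)}
  local e t
  e=$(date -u -d "$expiry" +%s) || return 1
  t=$(date -u -d "$today" +%s) || return 1
  echo $(( (e - t) / 86400 ))
}

# Exit 0 when the expiry falls within the window (default 30 days).
expires_soon() {
  local window=${3:-30}
  [ "$(days_until "$1" "$2")" -le "$window" ]
}

days_until 2027-03-31 2027-03-01   # prints 30
```

The same comparison would apply to any of the future-expansion credentials once an expiry date can be read from the relevant API.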

Cloudflare Pages Projects

Project	Repo	Branch Pattern	Purpose
`securityv0`	sv0-website	main / staging / pr-N	Marketing website
`sv0-reviews`	sv0-platform	pr-N / custom	Platform visual review reports
`sv0-website-reviews`	sv0-website	pr-N / staging	Website visual review reports
`sv0-docs` / `sv0-docs-docusaurus`	sv0-documentation	main / pr-N	Documentation site

Runner Infrastructure

All workflows currently use ubuntu-latest (GitHub-hosted runners). The workspace switched from self-hosted mac-mini runners in March 2026 for reliability and reduced maintenance.

The self-hosted mac-mini runner is still registered but not actively used by any workflow. It remains available as a fallback if GitHub-hosted runners become insufficient (e.g., for tasks requiring macOS or persistent local state). Do not move heavy CI (multi-arch Docker builds) onto it — the host is memory-constrained and has caused kernel panics under load. If native arm64 builds are ever needed, use GitHub's native ubuntu-24.04-arm runners (no QEMU), not self-hosting. See ADR-030.

Cost and Actions Minutes

The org has 50,000 included GitHub Actions Linux-minutes per month (resets on the 1st). Overage is billed at $0.006/min, so steady-state burn costs about $60 for 10k minutes over the 50k pool and about $300 at 100k total. The dollar amount is trivial. The real risks are operational: a $0 budget cap halting all CI, multi-hour build hangs, and unbounded runaway burn (a wedged job or loop has no steady-state ceiling). See ADR-030 for the decision; this section is the operational how-to.
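The overage figures above can be reproduced with a few lines of shell. This is a minimal sketch that assumes the 50,000-minute pool and the $0.006/min rate quoted in this section; the function name `overage_usd` is illustrative and does not refer to an existing script.

```shell
#!/usr/bin/env bash
# Sketch: estimate monthly Actions overage in USD from total Linux minutes.
# Assumes the figures quoted in this section: 50,000 included minutes and
# $0.006/min overage. Integer math in mills (thousandths of a dollar)
# avoids floating point; the result is truncated to whole cents.
INCLUDED_MINUTES=50000
RATE_MILLS_PER_MIN=6   # $0.006 per minute

overage_usd() {
  local total=$1
  local over=$(( total > INCLUDED_MINUTES ? total - INCLUDED_MINUTES : 0 ))
  local mills=$(( over * RATE_MILLS_PER_MIN ))
  printf '%d.%02d\n' $(( mills / 1000 )) $(( (mills % 1000) / 10 ))
}

overage_usd 60000    # 10k minutes over the pool: prints 60.00
overage_usd 100000   # 50k minutes over the pool: prints 300.00
```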

The pool follows active development

The org-wide pool is effectively a single-repo pool — it is consumed almost entirely by whichever repo is under heaviest development that month:

Month	Linux minutes	Repo consuming ~all of it
March 2026	3,829	`excalidraw-diagram-skill`
April 2026	33,780	`sv0-connectors`
May 2026 (22 days)	45,019	`sv0-platform`

So when you investigate a spike, start from the billing-by-repo breakdown, then drill into that repo's heaviest workflow (almost always its ci). In May 2026, sv0-platform's ci was ~80% of the pool — and the spend was bimodal: a typical run was ~20 billed min, but a tail of ~67 runs hung for 300–1,078 min each on multi-arch arm64-via-QEMU image builds (no job timeout to cap them). Look for that long-tail shape, not a high average.
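To make the long-tail shape visible, it can help to split per-run billed minutes at a threshold instead of averaging them. The sketch below assumes one integer per line on stdin; the function name `summarise_runs` and the sample numbers are illustrative, not real billing data.

```shell
#!/usr/bin/env bash
# Sketch: summarise per-run billed minutes (one integer per line on stdin).
# Reports the run count, total minutes, and how much of the total sits in
# runs at or above a threshold (default 300 min, the lower bound of the
# hang tail described above).
summarise_runs() {
  local threshold=${1:-300}
  awk -v t="$threshold" '
    { n++; total += $1; if ($1 >= t) { tail_runs++; tail_min += $1 } }
    END { printf "runs=%d total=%d tail_runs=%d tail_min=%d\n", n, total, tail_runs, tail_min }
  '
}

# Illustrative input: three typical runs and two hangs.
printf '%s\n' 20 20 20 300 1078 | summarise_runs 300
# prints: runs=5 total=1438 tail_runs=2 tail_min=1378
```

In this made-up sample the mean is about 288 min, which describes none of the runs; the tail split shows that two runs account for nearly all of the minutes.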

Diagnosing a spike

# 1. Spend by repo for a month (Actions Linux minutes). The legacy
#    /orgs/.../settings/billing/actions endpoint is gone (HTTP 410).
gh api /organizations/SecurityV0/settings/billing/usage --jq '
  [.usageItems[] | select(.product=="actions" and .sku=="Actions Linux" and (.date|startswith("2026-05")))]
  | sort_by(.quantity) | reverse | .[] | "\(.quantity|floor) min   \(.repositoryName)"'

# 2. Find the heavy workflow's id (or just use its filename below):
gh workflow list --repo SecurityV0/<repo>

# 3. True run count for a workflow. (gh run list defaults to 20 and needs an
#    explicit --limit; for an exact count use the API total_count field. The
#    filename form is easiest — no id lookup needed.)
gh api "/repos/SecurityV0/<repo>/actions/workflows/ci.yml/runs?created=>=2026-05-01&per_page=1" --jq '.total_count'

# 4. Per-run billed minutes = sum over jobs of ceil(job_seconds / 60). The
#    /workflows/{id}/timing endpoint returns an empty {"billable":{}} and is
#    useless — sum job durations from /actions/runs/{run_id}/jobs instead.
#    Look at the DISTRIBUTION (a hang tail), not just the mean, and include
#    failed/cancelled runs — they bill too.
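Step 4 describes the calculation without a command, so here is one way it might be scripted. This is a sketch under assumptions: the function names are hypothetical, it relies on the `started_at` and `completed_at` fields of the jobs endpoint, and it needs an authenticated `gh`. It approximates billing from wall-clock job durations and has not been reconciled against an invoice.

```shell
#!/usr/bin/env bash
# Sketch: per-run billed minutes = sum over jobs of ceil(job_seconds / 60).
# Function names are illustrative; the repo and run id are placeholders.

# Emit one duration in seconds per job of a run. Jobs that never started
# or have not completed are skipped.
run_job_seconds() {
  local repo=$1 run_id=$2
  gh api --paginate "/repos/${repo}/actions/runs/${run_id}/jobs" --jq '
    .jobs[]
    | select(.started_at != null and .completed_at != null)
    | (.completed_at | fromdateiso8601) - (.started_at | fromdateiso8601)'
}

# Read job durations (seconds, one per line) and print the billed-minute sum.
billed_minutes() {
  local total=0 secs
  while read -r secs; do
    [ -n "$secs" ] || continue
    [ "$secs" -gt 0 ] || continue      # skipped jobs contribute nothing
    total=$(( total + (secs + 59) / 60 ))   # round each job up to a minute
  done
  echo "$total"
}

# Usage (placeholders):
#   run_job_seconds "SecurityV0/<repo>" "<run_id>" | billed_minutes
```

By default the jobs endpoint lists only the latest attempt of a run, so re-run attempts would need the `filter=all` query parameter to be counted. Feeding the output for many runs into a distribution summary is what exposes a hang tail.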

Levers (highest ROI first)

Cap heavy jobs with timeout-minutes — the May spike was 67 runs hanging up to 18h on wedged QEMU builds with no timeout. A timeout-minutes: 30 on build-images fails fast instead of billing to the 6-hour default. Highest-value, lowest-risk. (Follow-up — not yet shipped.)
amd64-only image builds on PRs — removes the arm64-via-QEMU emulation that causes the hangs; keep multi-arch on main/tags only. (ADR-030, shipped.)
concurrency + cancel-in-progress for PR refs — stop stacking full runs from rapid pushes. Scope to PRs only; main/tags/redesign/v06-pilot pushes are not auto-cancelled (they must publish their images). (ADR-030, shipped.)
Path-gate the non-required image build — docs/test-only PRs skip it; keep required checks always-on. Never make build-images a required check (a skipped required check blocks merge forever). (ADR-030, shipped.)
Label-gate PR-preview builds — only 4 dev preview slots exist; don't build images for PRs that can't deploy. (Follow-up.)
Move expensive optional checks to workflow_dispatch / label-gated — visual-regression, release multi-arch builds. (Follow-up.)
Native ARM runners, not self-hosting, if arm64 is ever required on PRs (self-hosted + fork PRs = code execution on persistent hardware).
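Since the first lever depends on every heavy job carrying a timeout, a quick audit of which workflow files never mention `timeout-minutes` can be a useful starting point. The sketch below is an assumption-laden helper, not an existing script; the function name `workflows_without_timeout` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list workflow files that never mention timeout-minutes.
# File-level check only: a file passes if ANY job sets a timeout, so treat
# a clean result as a hint, not proof that every job is capped.
workflows_without_timeout() {
  local dir=${1:-.github/workflows} f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue          # skip unmatched globs
    grep -q 'timeout-minutes:' "$f" || printf '%s\n' "${f##*/}"
  done
}

# Usage, from a repo root:
#   workflows_without_timeout .github/workflows
```

A per-job check would need a YAML parser; the grep is deliberately crude so it runs anywhere without extra tooling.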

For budget: do not set a $0 Actions budget — it converts a cost event into a CI outage. Use two layers: an alert budget with headroom (detection) and a non-zero hard ceiling set well above expected burn plus per-job timeout-minutes (containment). Alerts alone don't stop a 3am runaway.

Security Note

This document is internal to SecurityV0. Secret names are listed for operational reference — actual secret values are stored in GitHub Actions secrets and are not accessible without repository admin access. Do not share this document externally without redacting the secrets inventory.
