Case Study
StageFlow
Self-hosted website scan workbench with live execution visibility and evidence-rich unified reports.
StageFlow runs up to eight heterogeneous scanners against live URLs or static-site ZIP archives, streams progress in real time over SSE, and merges everything into a single unified report with stable content-based issue IDs. One submission, one report, one severity scale. I built it because running axe, Lighthouse, and LinkChecker separately meant separate reports, separate issue IDs, and no way to compare severity across tools.
Why this matters
Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.
Overview
What the product does and why I built it that way.
Single-scanner tools each cover one audit category. Running them separately means separate reports, separate issue IDs, separate severity scales, and no way to see whether a 'critical' in one tool maps to a 'serious' in another. I wanted one submission that produces one report with one severity scale and one issue ID namespace. The CLI target is CI integration: push a change, CI runs `stageflow scan` with `--fail-on serious`, and the step fails if any new serious issue appears. Without stable issue IDs, a human has to review every run. With content-based fingerprints and `stageflow diff`, CI can distinguish new regressions from pre-existing known issues.
Highlights
Quick read
Eight scanners covering accessibility (axe), performance (Lighthouse), SEO, security headers, broken links, spelling/grammar, Open Graph metadata, and AI-driven navigation testing.
Issue fingerprinting uses 12-character prefixes of SHA-256 hashes over scanner, rule ID, page ID, and selector. Same page, same rule, same element produces the same ID across runs, so diff works without a database query.
The CLI `--fail-on critical` flag exits with code 1 if any issue meets the severity threshold, making it directly usable as a CI gate in GitHub Actions.
Project Mode reads `.stageflow/config.yaml` from a git repo, starts the dev server, runs the scan, and stops the server. Dev server lifecycle and scan in one command.
ZIP job support lets you scan a static-site build artifact by uploading the ZIP. The archive extractor safely unpacks it, discovers HTML pages, and serves them locally so scanners run without a live deployment.
Cross-scanner deduplication uses a static equivalence table. Only confirmed rule overlaps get merged. Conservative by design because hiding a real finding is worse than showing a duplicate.
Architecture
How the system is structured.
Four services, each owning a strict data boundary. The platform API owns SQLite project records and in-memory job status projections. The orchestrator owns PostgreSQL job state and Podman pod lifecycle. The archive extractor owns ZIP validation and provenance generation. The scanner runner owns browser automation and audit output. They communicate exclusively through NATS JetStream. No service calls another service's HTTP API except for one admin fallback path on cold start. The whole codebase lives in a Go workspace monorepo with 21 modules and shared JSON Schema contracts.
Platform API
Accepts job submissions, validates URLs with SSRF protection (17 CIDR ranges checked against every DNS response), streams real-time status via SSE, and manages project/baseline data in SQLite. Never reads from PostgreSQL. Maintains an in-memory status projection rebuilt from NATS events on startup, with a fallback to the orchestrator admin API for cold starts.
Orchestrator
Manages the seven-state job FSM in PostgreSQL, launches and monitors Podman pods via the Libpod HTTP API over a Unix socket, tracks scanner completion with idempotent transactions, and assembles the final deduplicated report. Follows clean architecture: domain layer has zero external imports, application layer wires domain to adapter interfaces, adapter layer implements PostgreSQL, Podman, NATS, and MinIO integrations. Never reads from SQLite.
Archive Extractor
Ephemeral container launched per ZIP job. Validates archives against ZIP bomb thresholds (5000 max entries, 100x expansion ratio, 1 GiB total uncompressed), blocks path traversal, discovers HTML pages, generates provenance.json, and serves extracted files on port 8080. Crashes here do not affect the orchestrator.
Scanner Runner
A TypeScript container, built with Bun and running on the Node runtime, that drives Playwright with Chromium. Each scanner is a plugin that produces a UnifiedReportV2 JSON. Axe injects into the live DOM. Lighthouse drives Chrome via CDP. Results upload to MinIO and completion events publish to NATS. Up to eight scanner containers run fully in parallel per job.
NATS JetStream Event Bus
Three streams (jobs, extraction, scan) with eight event subjects. 72-hour retention, 10-minute ACK waits, max 10 deliveries, 5-second NAK delay. All inter-service state changes flow through here. Durable consumers survive service restarts and replay from the last acknowledged sequence number.
Features
What it does and how each piece works.
Eight-Scanner Unified Audit
Axe for WCAG 2.1 AA accessibility with full DOM context. Lighthouse for performance scores via CDP. SEO for meta tags, headings, structured data. Security headers for CSP, HSTS, X-Frame-Options via raw HTTP GET. Link checker for broken references. Spelling/grammar for content errors. Open Graph for social metadata. AI Navigator for goal-directed browser automation using a vision model loop through OpenRouter. All mapped to one five-level severity scale.
Content-Based Issue Fingerprinting
Every issue gets a 12-character ID from SHA-256 of scanner, rule ID, page ID, and CSS selector. The same violation on the same element produces the same fingerprint every time, regardless of when the scan ran. This makes `stageflow diff` a simple set intersection by ID. No database join needed to detect regressions.
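A minimal sketch of the fingerprint and the diff-as-set-intersection idea, assuming a `|` field separator (the real separator and hashing details are not specified in the text):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint sketches the content-based issue ID: a 12-character prefix of
// the SHA-256 hash over scanner, rule ID, page ID, and CSS selector.
// The "|" separator is an assumption for illustration.
func fingerprint(scanner, ruleID, pageID, selector string) string {
	sum := sha256.Sum256([]byte(scanner + "|" + ruleID + "|" + pageID + "|" + selector))
	return hex.EncodeToString(sum[:])[:12]
}

// newIssues sketches the diff: IDs present in the current run but absent
// from the baseline are regressions; everything else is already known.
func newIssues(baseline, current []string) []string {
	seen := make(map[string]bool, len(baseline))
	for _, id := range baseline {
		seen[id] = true
	}
	var out []string
	for _, id := range current {
		if !seen[id] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	a := fingerprint("axe", "image-alt", "home", "img.hero")
	b := fingerprint("axe", "image-alt", "home", "img.hero")
	fmt.Println(a == b, len(a)) // deterministic across runs, 12 characters
}
```

Because the ID is pure content addressing, two scans taken weeks apart can be diffed as plain string sets with no shared storage.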
Real-Time SSE Progress Streaming
The platform API fans out typed Change structs to per-subscriber buffered channels. A three-select non-blocking write evicts one pending event rather than blocking the publisher on a slow consumer. 15-second keepalive heartbeats prevent proxy idle timeout disconnects. On reconnect, EventSource gets a full status snapshot immediately. I chose SSE over WebSockets because automatic reconnection with full state recovery is built into the browser's EventSource API.
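The three-select pattern can be sketched in plain Go channels; the `Change` type here is a stand-in for the real typed struct:

```go
package main

import "fmt"

// Change is a stand-in for the typed status event the platform API fans out.
type Change struct{ JobID, Phase string }

// publish sketches the three-select non-blocking write: try to send; if the
// subscriber's buffer is full, evict its oldest pending event and try once
// more. A slow consumer loses an old event instead of blocking the publisher.
func publish(sub chan Change, ev Change) {
	select {
	case sub <- ev:
		return
	default:
	}
	select {
	case <-sub: // buffer full: evict the oldest pending event
	default:
	}
	select {
	case sub <- ev:
	default: // a racing consumer refilled the buffer: drop rather than block
	}
}

func main() {
	sub := make(chan Change, 2)
	publish(sub, Change{"job1", "QUEUED"})
	publish(sub, Change{"job1", "RUNNING"})
	publish(sub, Change{"job1", "DONE"}) // buffer full: QUEUED is evicted
	fmt.Println(len(sub), (<-sub).Phase)
}
```

The eviction trades per-event delivery for freshness, which is the right trade for status updates where the latest state supersedes earlier ones.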
CLI with CI Gate Support
The Go CLI does scan submission, progress streaming, report rendering, baseline diffing, and project mode. `--fail-on serious` exits with code 1 for CI gates. Output formats include colored text for humans, markdown for PR comments, and a JSON envelope wrapping UnifiedReportV2 for machine consumers. `stageflow project` reads config from the repo, starts the dev server, scans, and tears down.
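The gate itself reduces to a threshold comparison over the five-level scale. A sketch, assuming the fifth level is named "info" and assuming this particular rank ordering (only "critical", "serious", "moderate", and "minor" are named in the text):

```go
package main

import "fmt"

// severityRank orders the five-level scale. The exact numeric ranks and the
// "info" level are assumptions for illustration.
var severityRank = map[string]int{
	"info": 0, "minor": 1, "moderate": 2, "serious": 3, "critical": 4,
}

// shouldFail sketches the `--fail-on` gate: true if any issue severity meets
// or exceeds the threshold. In the real CLI a true result becomes exit code 1.
func shouldFail(issueSeverities []string, threshold string) bool {
	for _, s := range issueSeverities {
		if severityRank[s] >= severityRank[threshold] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldFail([]string{"minor", "serious"}, "serious")) // true → CI step fails
}
```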
ZIP Archive Scanning
Upload a static-site build artifact as a ZIP. The archive extractor validates against bomb thresholds, blocks path traversal with sanitizeZipEntryName, discovers HTML pages, generates provenance.json, and serves the extracted files locally. Scanners use the same code path as URL scans. Adding ZIP support required zero changes to the scanner runner.
Cross-Scanner Deduplication
Axe, Lighthouse, and the SEO scanner all detect missing image alt text under different rule IDs. The deduplication table maps scanner-prefixed rule IDs to canonical IDs. Merging only happens when two issues share both the canonical rule ID and the same page ID. The higher-priority scanner's version is kept; the lower-priority versions are noted in alsoDetectedBy. About 15 known equivalences. Conservative by design.
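A sketch of the merge rule. The table entries, canonical ID, and scanner priority ordering here are illustrative assumptions; the real table holds roughly 15 confirmed equivalences:

```go
package main

import "fmt"

// Issue is a pared-down view of a unified-report issue.
type Issue struct {
	Scanner, RuleID, PageID string
	AlsoDetectedBy          []string
}

// canonical maps scanner-prefixed rule IDs to a shared canonical ID.
// These entries are illustrative, not the real table.
var canonical = map[string]string{
	"axe/image-alt":        "missing-img-alt",
	"lighthouse/image-alt": "missing-img-alt",
	"seo/img-missing-alt":  "missing-img-alt",
}

// scannerPriority decides which duplicate is kept; this ordering is an assumption.
var scannerPriority = map[string]int{"axe": 0, "lighthouse": 1, "seo": 2}

// dedupe merges issues only when both the canonical rule ID and the page ID
// match, keeping the higher-priority scanner's copy and noting the rest.
func dedupe(issues []Issue) []Issue {
	kept := map[string]*Issue{} // key: canonicalID + "|" + pageID
	var order []string
	for _, is := range issues {
		canon, ok := canonical[is.Scanner+"/"+is.RuleID]
		if !ok {
			canon = is.Scanner + "/" + is.RuleID // no known overlap: never merged
		}
		key := canon + "|" + is.PageID
		if prev, ok := kept[key]; ok {
			if scannerPriority[is.Scanner] < scannerPriority[prev.Scanner] {
				is.AlsoDetectedBy = append(is.AlsoDetectedBy, prev.Scanner)
				*prev = is
			} else {
				prev.AlsoDetectedBy = append(prev.AlsoDetectedBy, is.Scanner)
			}
			continue
		}
		cp := is
		kept[key] = &cp
		order = append(order, key)
	}
	out := make([]Issue, 0, len(order))
	for _, k := range order {
		out = append(out, *kept[k])
	}
	return out
}

func main() {
	merged := dedupe([]Issue{
		{Scanner: "lighthouse", RuleID: "image-alt", PageID: "home"},
		{Scanner: "axe", RuleID: "image-alt", PageID: "home"},
		{Scanner: "axe", RuleID: "image-alt", PageID: "about"}, // different page: not merged
	})
	fmt.Println(len(merged), merged[0].Scanner, merged[0].AlsoDetectedBy)
}
```

Rule IDs outside the table never merge, which is what keeps the behavior conservative: an unrecognized overlap surfaces as a duplicate rather than a hidden finding.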
Partial Scan Completion
Eight scanners run in parallel and some will fail. A site might block automated browsers, have CSP headers that prevent axe injection, or time out the link checker. If any scanner failure killed the whole job, you'd get no output for a large class of real sites. DecideScanFailureCompletion returns partial results when at least one scanner succeeded. Failed scanners show up in the report with their error message, but the summary counts stay internally consistent.
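The decision rule is small enough to sketch directly; the status names below are assumptions, not the actual FSM state labels:

```go
package main

import "fmt"

// ScanResult is a minimal stand-in for a per-scanner outcome.
type ScanResult struct {
	Scanner string
	Err     string // empty on success
}

// decideCompletion sketches the DecideScanFailureCompletion rule described in
// the text: the job completes (possibly partially) when at least one scanner
// succeeded; it fails outright only when every scanner failed.
func decideCompletion(results []ScanResult) (status string, failed []string) {
	succeeded := 0
	for _, r := range results {
		if r.Err == "" {
			succeeded++
		} else {
			failed = append(failed, r.Scanner)
		}
	}
	switch {
	case succeeded == len(results):
		return "DONE", nil
	case succeeded > 0:
		return "DONE_PARTIAL", failed // failed scanners still appear in the report
	default:
		return "FAILED", failed
	}
}

func main() {
	status, failed := decideCompletion([]ScanResult{
		{Scanner: "axe"},
		{Scanner: "lighthouse", Err: "blocked by bot detection"},
	})
	fmt.Println(status, failed)
}
```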
SSRF Protection at Intake
Three-tier IP classification is applied to every IP returned by DNS resolution for each submitted URL, not just the final resolved address, so a hostname with one valid IP and one private IP gets rejected. 17 hardcoded CIDR ranges cover RFC1918, link-local, CGNAT, multicast, and cloud metadata endpoints. The browser-level BlockList in the scanner runner adds a second layer, intercepting navigations via page.route to validate destination IPs.
Data flow
How data moves through the system.
A job enters as an HTTP request, becomes a NATS event, gets orchestrated through a seven-state PostgreSQL FSM, spawns Podman containers that publish scan results back to NATS, and ends as a UnifiedReportV2 JSON in MinIO. The handoffs that matter: the job UUID generated at intake, the provenance file mapping pages to scannable URLs, per-scanner results.json objects in MinIO, and the NATS event subjects gating each state transition.
Platform API validates the request, generates a UUID job ID, stages ZIP to MinIO if applicable, publishes jobs.events.created to NATS, seeds the in-memory status cache, returns 201 with the job ID
Orchestrator consumer picks up the event, inserts the job row with ON CONFLICT DO NOTHING for idempotency, creates a Podman pod with workspace and results volumes on stageflow_net
For ZIP jobs: archive extractor container validates the ZIP, extracts to the workspace volume, discovers HTML pages, generates provenance.json, starts a local HTTP server, publishes extraction.events.ready
Orchestrator resolves scanner types from the job config, sets expected_scanners in PostgreSQL, launches all scanner containers in parallel with environment variables for job ID, scanner type, NATS URL, and MinIO endpoint
Each scanner container loads its plugin, iterates pages with concurrency 4, publishes scan.events.page.completed per page for real-time progress, then uploads results.json to MinIO and publishes scan.events.completed
RecordScannerCompletion opens a PostgreSQL transaction: reads completed_scanners, checks for duplicates, appends the result, computes allComplete. When all scanners have reported, transitions to COMPLETING.
BuildAggregatedReport downloads results in sorted scanner order, applies deduplicateIssues using the equivalence table, recalculates severity counts from the deduplicated set, uploads the final report.json to MinIO, transitions to DONE
Client retrieves results via GET /api/v1/jobs/{id}/results which returns a 302 redirect to a presigned MinIO URL. The platform API never buffers report content.
Tradeoffs
Decisions that had real alternatives.
Four services instead of two
The archive extractor could have been a library inside the orchestrator. The scanner runner could have been a goroutine pool. I split them because they have different trust levels (the extractor handles untrusted ZIPs), different language runtimes (scanner runner needs Node.js and Playwright), and different failure modes. A Playwright crash in the scanner runner should not affect the orchestrator's job tracking loop. The cost is four container images to build and maintain.
NATS over HTTP for service coordination
The orchestrator could call the scanner runner via HTTP and poll for results. NATS gives at-least-once delivery with replay on restart, durable consumers that survive orchestrator restarts, and a clear audit trail via sequence numbers. The cost is an additional infrastructure dependency. I judged the reliability benefit worth it because no missed completion events on restart means no polling loop and no stuck jobs.
One severity scale for all scanners
Each scanner has its own notion of severity. Axe has critical/serious/moderate/minor. Lighthouse has 0-1 scores. The security-headers scanner has OWASP grades. I mapped all of these to a five-level scale. The Lighthouse mapping is particularly lossy: a score of 0.45 and a score of 0.05 both become 'serious'. The alternative was a scanner-native severity field, but that would make cross-scanner filtering and `--fail-on` thresholds much harder to implement.
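The lossiness is visible in a sketch of the score mapping. The cut points below are assumptions chosen only so that 0.45 and 0.05 land on the same level, as the text describes:

```go
package main

import "fmt"

// lighthouseSeverity sketches the lossy 0-1 score-to-severity mapping.
// The exact cut points are assumptions; the real mapping is not specified.
func lighthouseSeverity(score float64) string {
	switch {
	case score >= 0.9:
		return "info" // passing audit
	case score >= 0.5:
		return "moderate"
	default:
		return "serious"
	}
}

func main() {
	// A near-miss and a disaster collapse to the same level.
	fmt.Println(lighthouseSeverity(0.45), lighthouseSeverity(0.05))
}
```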
Flat issues array over per-page nesting
UnifiedReportV2 has a flat issues array with pageId and pageUrl per issue, plus a separate pages array. I could have nested issues under each page. The flat structure makes deduplication, sorting, filtering, and CLI rendering easier. You can slice the issues array by severity or scanner without nested loops. The cost is redundancy: each issue carries its page URL rather than a reference.
PostgreSQL idempotency over NATS dedup
NATS JetStream's key-value store could hold a dedup set for processed event IDs. I used PostgreSQL transactions instead because the state mutations involved need to be atomic: updating completed_scanners, checking allComplete, and potentially triggering aggregation. PostgreSQL gives me BEGIN/COMMIT with serializable isolation. NATS KV does not have multi-key atomicity.
Challenges
Problems that required specific solutions.
Problem
NATS at-least-once delivery means a scan.events.completed event can arrive twice. If RecordScannerCompletion is not idempotent, the second delivery triggers aggregation again or produces an incorrect allComplete value.
Solution
A PostgreSQL transaction reads completed_scanners, checks whether the incoming scanner type is already present, and only writes if it is not. Two concurrent redeliveries race on the same transaction. The loser finds the type already present and returns without effect. The allComplete boolean is computed from the post-transaction state, so it is always accurate regardless of delivery count.
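Stripped of the PostgreSQL transaction wrapper, the core append-if-absent logic looks roughly like this (the function shape and return values are illustrative, not the actual signature):

```go
package main

import "fmt"

// recordCompletion sketches the idempotent step: append the scanner only if
// it is not already present, then compute allComplete from the post-write
// state. In the real system this runs inside a PostgreSQL transaction, and
// aggregation is only triggered when allComplete AND the write was new.
func recordCompletion(completed []string, expected int, scanner string) (updated []string, allComplete, wasNew bool) {
	for _, s := range completed {
		if s == scanner {
			// Duplicate delivery: no write, no second aggregation trigger.
			return completed, len(completed) == expected, false
		}
	}
	updated = append(completed, scanner)
	return updated, len(updated) == expected, true
}

func main() {
	done, all, _ := recordCompletion([]string{"axe"}, 2, "lighthouse")
	fmt.Println(all) // both scanners reported
	_, all, isNew := recordCompletion(done, 2, "lighthouse") // NATS redelivery
	fmt.Println(all, isNew)                                  // still complete, but not new
}
```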
Problem
ZIP files can be malicious: ZIP bombs with high compression ratios, path traversal entries like ../../etc/passwd, too many files causing inode exhaustion, NUL-byte filenames. The extractor handles untrusted input from the internet.
Solution
validateZIP checks maxEntries (5000), maxExpansionRatio (100x), maxUncompressedSize (1 GiB), and maxEntryUncompressedSize (250 MiB) before extracting a single byte. sanitizeZipEntryName rejects entries containing .., starting with /, or containing NUL bytes. The extractor runs in its own container, so a crash does not affect the orchestrator.
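A sketch of both checks using the thresholds above. This is illustrative logic, not the actual implementation; the real code inspects the archive's central directory before touching entry contents:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Thresholds from the text.
const (
	maxEntries        = 5000
	maxExpansionRatio = 100
	maxUncompressed   = 1 << 30 // 1 GiB total
)

// sanitizeZipEntryName mirrors the rules described: reject path traversal,
// absolute paths, and NUL bytes in entry names.
func sanitizeZipEntryName(name string) error {
	switch {
	case strings.Contains(name, "\x00"):
		return errors.New("NUL byte in entry name")
	case strings.HasPrefix(name, "/"):
		return errors.New("absolute path")
	case name == ".." || strings.Contains(name, "../") || strings.HasSuffix(name, "/.."):
		return errors.New("path traversal")
	}
	return nil
}

// checkTotals enforces the bomb thresholds before extracting a single byte,
// using the sizes declared in the archive metadata.
func checkTotals(entries int, compressed, uncompressed uint64) error {
	switch {
	case entries > maxEntries:
		return errors.New("too many entries")
	case uncompressed > maxUncompressed:
		return errors.New("total uncompressed size too large")
	case compressed > 0 && uncompressed/compressed > maxExpansionRatio:
		return errors.New("expansion ratio too high")
	}
	return nil
}

func main() {
	fmt.Println(sanitizeZipEntryName("../../etc/passwd"))
	fmt.Println(checkTotals(10, 1<<20, 200<<20)) // 200x expansion: rejected
}
```

Declared sizes can lie, which is why the container boundary matters: even if a crafted archive defeats the metadata checks, the blast radius is one ephemeral extractor container.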
Problem
Podman containers can fail without producing NATS events. A scanner can be OOM-killed by the kernel, the runtime can lose the process, or the scanner can crash before connecting to NATS. Relying only on NATS events for completion detection would leave jobs stuck.
Solution
Two mechanisms close the gap. spawnMonitorContainer launches a goroutine per scanner that blocks on WaitContainer via the Podman socket. If the container exits non-zero, it fetches the last 500 bytes of logs and calls failJobSafe. A deadline sweeper runs every 30 seconds querying PostgreSQL for jobs stuck past their timeout. After any failure path, CleanupJob removes the pod, both volumes, and the staged ZIP.
Problem
The models.Job struct is the central type needed by the platform API, orchestrator, CLI, and scanner runner, but it accumulates fields that only the orchestrator uses, like ScanStageLogKey and ExtractionRecipeKey.
Solution
I put it in libs/go/models as a pure data type with no service-specific logic. One fat struct with omitempty on fields consumers do not need. The alternative was service-specific view types with translation layers at every boundary. For a solo project, the single struct with omitempty is simpler and the field bloat is manageable.
Outcomes
What the work produced.
The eight-scanner design covers a wider audit surface than any single tool. Axe catches DOM-level accessibility violations that Lighthouse misses. The security-headers scanner catches HTTP-level misconfigurations that browser-based tools cannot see. The spelling scanner catches content issues that neither accessibility nor performance tools detect.
Adding a new scanner requires changes in at most three places: the scanner runner plugin, the scanner manifest catalog, and, if the new scanner overlaps with existing ones, the rule deduplication table. No changes to the platform API, archive extractor, or deployment infrastructure.
The idempotency design has survived every NATS redelivery scenario in production, including orchestrator restarts mid-scan. No stuck jobs have been traced to duplicate scan.events.completed processing.
ZIP validation has rejected malformed uploads at the API layer (100 MB body limit) and malicious archives at the extractor layer (expansion ratio checks) without any successful extraction of a ZIP bomb or path-traversal payload.
The partial completion path means that on sites where Lighthouse is blocked by the target site's bot policy, the remaining seven scanners still produce a complete report. Most real-world scans hit at least one scanner failure.
The provenance handoff means URL and ZIP jobs use the same scanner code path. Scanners do not know or care whether the URLs point to live sites or to a local HTTP server in the pod. Adding ZIP support was a matter of adding the extraction phase and static server with zero changes to the scanner runner.
The whole stack runs without any root privileges on the host. Rootless Podman maps container processes to unprivileged UIDs via user namespaces. A Playwright process inside a scanner container is a non-root process on the host even if it runs as root inside the container.
Continue reading
See the rest of the work.
Each case study covers architecture, tradeoffs, and delivery detail. The skills page shows how these technologies connect across projects.