StageFlow
Self-hosted website scan workbench with live execution visibility and evidence-rich unified reports.
At a glance:
- 6 built-in scanners
- 3 user phases (configure, observe, triage)
- 3 durable event streams (jobs, extraction, scan)
- 3 audience modes
Product Screenshots
- StageFlow homepage hero showing the product promise and live scan summary.
- Live StageFlow playground for configuring URL or ZIP scans and scanner options.
- Live scan status page with stage progress, stepper, and terminal output.
- Unified report overview with issue density, severity breakdown, and audience tabs.
- Pages evidence view with an annotated screenshot and issue markers.
Overview
I built StageFlow to give teams one place to configure scans, watch execution, and triage findings without sending targets or artifacts to third-party SaaS. The product accepts live URLs and ZIP-based static builds through the same playground, runs scanners in rootless Podman pods, and streams job progress back over SSE while artifacts and reports are assembled behind the scenes.
The result is a self-hosted scan workbench: fast enough for release-day checks, strict enough for untrusted inputs, and productized enough that engineers, designers, and PMs can all work from the same report.
TL;DR
Built a self-hosted scan workbench that runs six scanners against live URLs or ZIP builds, streams job state in real time, and merges the results into one evidence-rich triage surface.
Highlights
- One playground handles both live URL and ZIP-based scan intake.
- Per-job Podman pods isolate extractors and scanners behind durable JetStream events.
- Unified reports combine severity, page overlays, artifacts, and scanner drill-downs.
- CLI project mode extends the product from hosted targets to local pre-release scans.
Problem
Website quality checks were fragmented, opaque, and hard to trust.
I needed one repeatable workflow that could handle public URLs and uploaded static builds while keeping execution isolated and reproducible. The tools I had used before split results across separate scanners, made it hard to understand current job state, and provided very little evidence when a run failed.
I also wanted a product I could run on infrastructure I control. That meant treating target URLs and archives as untrusted input, exposing progress in real time, and normalizing the final output into one report instead of asking people to compare five tools by hand.
Solution
I turned scanning into a product flow: configure, observe, triage.
StageFlow models each scan as a durable lifecycle of events over NATS JetStream. The API validates intake, publishes a job event, and keeps a lightweight SQLite projection for live status. The orchestrator persists authoritative job state in PostgreSQL, launches per-job Podman pods, runs extractors and scanners in parallel, uploads artifacts to MinIO, and emits terminal outcomes.
On the frontend I focused on product clarity, not just infrastructure truth. The playground configures URL and ZIP runs from one surface, the scan page streams SSE updates with reconnect-friendly behavior, and the unified report pulls severity, artifacts, scanner status, and page-level evidence into one view so triage starts immediately.
Workflow
Configure -> Observe -> Triage
The user-facing loop starts in the playground, moves through live scan status with streamed events and logs, and finishes in a unified report with evidence overlays and scanner drill-downs.
Lifecycle
Event Streams
jobs
- job.created
- job.completed
- job.failed
extraction
- extraction.ready
- extraction.failed
scan
- scan.page.completed
- scan.completed
- scan.failed
Key Endpoints
/api/v1/jobs/zip
Upload a static build ZIP and create a scan job
/api/v1/jobs/urls
Submit one or more public URLs for scanning
/api/v1/jobs/:id
Fetch current job state, progress, and artifact links
/api/v1/jobs/:id/stream
Subscribe to live SSE updates for a job
/api/v1/jobs/:id/results
Retrieve the unified report payload for completed scans
/api/v1/scanners
List available scanner manifests and capabilities
Architecture
Single-host event-driven architecture with strict trust boundaries. The API validates intake and serves status projections, JetStream carries lifecycle events, the orchestrator owns legal transitions and job pods, PostgreSQL stores durable job state, and MinIO serves artifacts and report evidence.
Ingress
- Caddy reverse proxy
- TLS termination
- Path-based routing
Frontend
- SvelteKit 2
- Playground UI
- SSE status + report views
Platform API
- URL / ZIP intake
- SSRF validation
- SQLite status projection
Durable State
- NATS JetStream
- PostgreSQL job state/events
- MinIO artifacts
Orchestrator
- FSM transitions
- Scanner coordination
- Report aggregation
Execution
- Rootless Podman job pods
- Extractor container
- Scanner runner containers
Product Surfaces
8 Shipped Capabilities
Dual Intake Playground (core)
One product surface configures public URL scans and ZIP-based static build scans.
Evidence-Rich Accessibility (core)
axe-core results include severity, issue screenshots, and page-overlay evidence for triage.
Unified Report Contract (dx)
A versioned report schema merges scanner summaries, pages, issues, artifacts, and errors.
Performance + SEO + Headers (performance)
Lighthouse, SEO, and security-header scanners run in the same job and land in one report.
Manifest-Driven Scanners (integration)
Scanner manifests validate identity, resource hints, and option schemas before runtime.
Project Mode CLI (integration)
CLI project mode boots a local app, waits for readiness, and scans pre-release targets safely.
Rootless Isolation (security)
Each job gets scoped containers, volumes, and cleanup boundaries instead of a shared worker.
Live Scan Status (dx)
SSE streams stage progress, logs, and terminal state so long-running jobs stay debuggable.
Tech Stack
Core Services
Go 1.25
Platform API, orchestrator, and extractor services
TypeScript + Bun
Scanner runtime and manifest-driven plugins
PostgreSQL
Durable job state, event history, and orchestration truth
Runtime Infrastructure
SQLite (WAL)
Fast status projections served by the API
NATS JetStream
Durable lifecycle events and replayable consumers
Podman (rootless)
Per-job isolation and reproducible execution
MinIO
Artifact storage with presigned download links
Frontend
SvelteKit 2
Playground, live status, and report routing
Svelte 5 Runes
Reactive stores for scan status and reports
Tailwind CSS v4
Shared design system and editorial UI styling
Scan Engine
Playwright
Browser automation across multiple scanner modules
axe-core
WCAG-focused accessibility checks and evidence
Lighthouse
Performance and quality scoring within the same run
Tradeoffs & Decisions
Why model scans as durable events instead of direct service chaining?
I wanted explicit state transitions, replay, and crash recovery. JetStream lets the orchestrator resume from persisted events instead of rebuilding job context from logs after a failure.
Why use rootless Podman pods per job?
Targets and uploaded archives are untrusted. Job-scoped pods and volumes reduce cross-job leakage, keep cleanup predictable, and make the execution model easier to reason about than a shared worker pool.
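A sketch of how an orchestrator might assemble the per-job Podman invocations. The pod naming, volume layout, and limits here are illustrative, not StageFlow's exact flags; `--pod` and `--network` are standard podman options:

```go
package main

import (
	"fmt"
	"strings"
)

// podCreateArgs builds the `podman pod create` invocation for one job,
// so every container in the job shares one scoped pod.
func podCreateArgs(jobID string) []string {
	return []string{"pod", "create", "--name", "job-" + jobID}
}

// scannerRunArgs runs one scanner container inside the job's pod with a
// read-only workspace volume; network policy would vary per scanner
// (URL scans need egress, extracted static builds may not).
func scannerRunArgs(jobID, image, volume string) []string {
	return []string{
		"run", "--rm",
		"--pod", "job-" + jobID,
		"--network", "none",
		"-v", volume + ":/workspace:ro",
		image,
	}
}

func main() {
	fmt.Println("podman", strings.Join(podCreateArgs("123"), " "))
	fmt.Println("podman", strings.Join(scannerRunArgs("123", "stageflow/axe:latest", "job-123-data"), " "))
	// The orchestrator would feed these slices to exec.Command("podman", args...),
	// then tear down the pod and volume when the job reaches a terminal state.
}
```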
Why create a unified report contract instead of keeping scanner outputs separate?
The product promise is faster triage, not just faster scanning. A versioned unified report schema lets the frontend compare findings, show audience-specific views, and attach evidence without custom per-scanner logic.
Why make scanner manifests a first-class contract?
Manifest validation keeps scanners extensible without making the runtime unsafe. The runner can verify IDs, option schemas, resource hints, and capabilities before a container ever starts.
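A minimal sketch of that pre-runtime check. The real manifest schema is richer (capabilities, option JSON Schemas, full resource hints); the fields and limits below are illustrative:

```go
package main

import (
	"fmt"
	"regexp"
)

// Manifest is a simplified scanner manifest.
type Manifest struct {
	ID       string // e.g. "axe-core"
	Version  string
	MemoryMB int // resource hint
}

var idPattern = regexp.MustCompile(`^[a-z0-9][a-z0-9-]*$`)

// Validate rejects a manifest before any container is started, so a
// malformed scanner definition never reaches the runtime.
func Validate(m Manifest) error {
	if !idPattern.MatchString(m.ID) {
		return fmt.Errorf("invalid scanner id %q", m.ID)
	}
	if m.Version == "" {
		return fmt.Errorf("scanner %s: missing version", m.ID)
	}
	if m.MemoryMB <= 0 || m.MemoryMB > 4096 {
		return fmt.Errorf("scanner %s: memory hint out of range", m.ID)
	}
	return nil
}

func main() {
	fmt.Println(Validate(Manifest{ID: "axe-core", Version: "4.10", MemoryMB: 512}))
	fmt.Println(Validate(Manifest{ID: "Bad ID!", Version: "1", MemoryMB: 512}))
}
```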
Challenges
Accepting untrusted inputs safely from both URLs and uploaded ZIP archives
I validate URL schemes and block private, loopback, metadata, and link-local targets, and I enforce archive entry, expansion, size, and traversal checks before extraction ever reaches the scanner runtime.
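The URL side of that validation can be sketched with the standard library's address classification. This covers scheme and IP-literal checks only; a full implementation would also resolve hostnames, re-check every resolved address, and pin it for the actual request to avoid DNS rebinding:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// blockedIP reports whether a target address must be rejected: loopback,
// RFC 1918 private ranges, link-local (which covers the 169.254.169.254
// cloud metadata endpoint), and unspecified addresses.
func blockedIP(ip net.IP) bool {
	return ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() ||
		ip.IsLinkLocalMulticast() || ip.IsUnspecified()
}

// checkTarget validates the scheme and, for IP-literal hosts, the address.
func checkTarget(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if ip := net.ParseIP(u.Hostname()); ip != nil && blockedIP(ip) {
		return fmt.Errorf("blocked address %s", ip)
	}
	return nil
}

func main() {
	for _, t := range []string{
		"https://example.com",
		"http://169.254.169.254/latest/meta-data",
		"file:///etc/passwd",
	} {
		fmt.Println(t, "->", checkTarget(t))
	}
}
```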
Coordinating scanners that start and finish independently
I persist the expected scanner set, run one container per scanner inside a job pod, and only advance the state machine once the orchestrator has accounted for every required outcome.
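The accounting step reduces to tracking outcomes against the expected scanner set. A minimal in-memory sketch; the real orchestrator persists this in PostgreSQL so it survives restarts:

```go
package main

import "fmt"

// JobTracker records terminal outcomes for the expected scanner set and
// reports when the state machine may advance.
type JobTracker struct {
	expected map[string]bool // scanner id -> outcome recorded
}

func NewJobTracker(scanners []string) *JobTracker {
	m := make(map[string]bool, len(scanners))
	for _, s := range scanners {
		m[s] = false
	}
	return &JobTracker{expected: m}
}

// Record marks one scanner's terminal outcome; success and failure both count.
func (t *JobTracker) Record(scanner string) error {
	if _, ok := t.expected[scanner]; !ok {
		return fmt.Errorf("unexpected scanner %q", scanner)
	}
	t.expected[scanner] = true
	return nil
}

// Complete is true only once every required scanner has an outcome.
func (t *JobTracker) Complete() bool {
	for _, done := range t.expected {
		if !done {
			return false
		}
	}
	return true
}

func main() {
	tr := NewJobTracker([]string{"axe-core", "lighthouse", "headers"})
	tr.Record("axe-core")
	fmt.Println(tr.Complete()) // false: still waiting on two scanners
	tr.Record("lighthouse")
	tr.Record("headers")
	fmt.Println(tr.Complete()) // true: every outcome accounted for
}
```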
Keeping real-time status useful when browser connections are unreliable
I use an SSE hub with buffered channels, keepalive frames, and backpressure-aware eviction on the API side while the frontend reconnects and falls back to status fetches when streams drop.
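The eviction idea can be sketched with a non-blocking send on a buffered channel: a subscriber that falls behind is dropped instead of stalling the publisher. Keepalive frames, HTTP flushing, and the mutex a concurrent hub would need are omitted from this single-goroutine sketch:

```go
package main

import "fmt"

// Hub fans job events out to SSE subscribers over buffered channels.
type Hub struct {
	subs map[chan string]struct{}
}

func NewHub() *Hub { return &Hub{subs: make(map[chan string]struct{})} }

// Subscribe returns a buffered channel of events for one SSE connection.
func (h *Hub) Subscribe(buffer int) chan string {
	ch := make(chan string, buffer)
	h.subs[ch] = struct{}{}
	return ch
}

// Publish delivers without blocking; slow subscribers are evicted.
func (h *Hub) Publish(event string) {
	for ch := range h.subs {
		select {
		case ch <- event:
		default: // buffer full: evict rather than stall the job pipeline
			delete(h.subs, ch)
			close(ch)
		}
	}
}

func main() {
	h := NewHub()
	fast := h.Subscribe(4)
	slow := h.Subscribe(1)
	h.Publish("scan.page.completed")
	h.Publish("scan.completed") // overflows slow's buffer of one
	fmt.Println(len(fast))      // fast holds both events
	if _, ok := h.subs[slow]; !ok {
		fmt.Println("slow subscriber evicted")
	}
}
```

An evicted client is not stranded: the frontend reconnects or falls back to a plain status fetch, as described above.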
Merging heterogeneous scanner outputs into one actionable report
I normalize scanner results into a unified contract, deduplicate known overlaps with explicit priority rules, and preserve enough evidence and metadata that the frontend can still explain where each finding came from.
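The dedup step can be sketched as keying findings by (page, rule) and letting an explicit priority table pick the winner. The priority ordering below is illustrative, not StageFlow's actual rules, and the real Finding carries evidence and artifact links:

```go
package main

import "fmt"

// Finding is a normalized issue in the unified report contract (simplified).
type Finding struct {
	Page    string
	Rule    string
	Scanner string
}

// priority decides which scanner wins when two report the same (page, rule).
var priority = map[string]int{"axe-core": 3, "lighthouse": 2, "headers": 1}

// Dedupe keeps one finding per (page, rule), preferring higher-priority
// scanners, and preserves the first-seen order of surviving keys so the
// report stays stable across runs.
func Dedupe(in []Finding) []Finding {
	type key struct{ page, rule string }
	best := map[key]Finding{}
	var order []key
	for _, f := range in {
		k := key{f.Page, f.Rule}
		cur, seen := best[k]
		if !seen {
			order = append(order, k)
			best[k] = f
		} else if priority[f.Scanner] > priority[cur.Scanner] {
			best[k] = f
		}
	}
	out := make([]Finding, 0, len(order))
	for _, k := range order {
		out = append(out, best[k])
	}
	return out
}

func main() {
	merged := Dedupe([]Finding{
		{"/", "color-contrast", "lighthouse"},
		{"/", "color-contrast", "axe-core"}, // same issue, higher priority wins
		{"/about", "missing-alt", "axe-core"},
	})
	for _, f := range merged {
		fmt.Println(f.Page, f.Rule, "via", f.Scanner)
	}
}
```

Keeping the losing scanner's metadata (rather than discarding it, as this sketch does) is what lets the report still explain where each finding came from.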
Outcomes
- I shipped one playground that handles both live URL scans and ZIP uploads through the same intake model
- I expose real-time job visibility through SSE status, stage progress, logs, and terminal events
- I normalize six scanner outputs into one report with severity, page evidence, artifacts, and scanner drill-downs
- I preserve explicit trust boundaries from intake validation through extraction, runtime isolation, and presigned artifact access
- I support a CLI project mode so teams can scan local apps before deploy using the same platform backend
- I built report views that serve PM, engineer, and designer audiences without forking the underlying scan data
Inspect the implementation
Review intake validation, orchestrator state transitions, scanner manifests, report aggregation, and the CLI project mode in the StageFlow repository.