Live in Production · 2024-01 to Ongoing · Solo

StageFlow

Self-hosted website scan workbench with live execution visibility and evidence-rich unified reports.

  • 6 Built-In Scanners
  • 3 User Phases
  • 3 Durable Streams
  • 3 Audience Modes

Self-Hosted · JetStream · Rootless Podman · SSE · Unified Report · CLI

Product Screenshots

StageFlow homepage hero showing the product promise and live scan summary.

Live StageFlow playground for configuring URL or ZIP scans and scanner options.

Live scan status page with stage progress, stepper, and terminal output.

Unified report overview with issue density, severity breakdown, and audience tabs.

Pages evidence view with an annotated screenshot and issue markers.

Overview

I built StageFlow to give teams one place to configure scans, watch execution, and triage findings without sending targets or artifacts to third-party SaaS. The product accepts live URLs and ZIP-based static builds through the same playground, runs scanners in rootless Podman pods, and streams job progress back over SSE while artifacts and reports are assembled behind the scenes.

The result is a self-hosted scan workbench: fast enough for release-day checks, strict enough for untrusted inputs, and productized enough that engineers, designers, and PMs can all work from the same report.

TL;DR

Built a self-hosted scan workbench that runs six scanners against live URLs or ZIP builds, streams job state in real time, and merges the results into one evidence-rich triage surface.

Highlights

  • One playground handles both live URL and ZIP-based scan intake.
  • Per-job Podman pods isolate extractors and scanners behind durable JetStream events.
  • Unified reports combine severity, page overlays, artifacts, and scanner drill-downs.
  • CLI project mode extends the product from hosted targets to local pre-release scans.

Problem

Website quality checks were fragmented, opaque, and hard to trust.

I needed one repeatable workflow that could handle public URLs and uploaded static builds while keeping execution isolated and reproducible. The tools I had used before split results across separate scanners, made it hard to understand current job state, and provided very little evidence when a run failed.

I also wanted a product I could run on infrastructure I control. That meant treating target URLs and archives as untrusted input, exposing progress in real time, and normalizing the final output into one report instead of asking people to compare five tools by hand.

Solution

I turned scanning into a product flow: configure, observe, triage.

StageFlow models each scan as a durable lifecycle of events over NATS JetStream. The API validates intake, publishes a job event, and keeps a lightweight SQLite projection for live status. The orchestrator persists authoritative job state in PostgreSQL, launches per-job Podman pods, runs extractors and scanners in parallel, uploads artifacts to MinIO, and emits terminal outcomes.

On the frontend I focused on product clarity, not just infrastructure truth. The playground configures URL and ZIP runs from one surface, the scan page streams SSE updates with reconnect-friendly behavior, and the unified report pulls severity, artifacts, scanner status, and page-level evidence into one view so triage starts immediately.

Workflow

Configure -> Observe -> Triage

The user-facing loop starts in the playground, moves through live scan status with streamed events and logs, and finishes in a unified report with evidence overlays and scanner drill-downs.

Lifecycle

PENDING -> EXTRACTING -> READY_TO_SCAN -> SCANNING -> COMPLETING -> DONE | FAILED
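The lifecycle implies an explicit transition table. A minimal Go sketch of how an orchestrator might enforce legal transitions; the state names are StageFlow's, the table shape is an assumption:

```go
package main

import "fmt"

// legalNext encodes the lifecycle as an explicit transition table.
// DONE and FAILED are terminal and have no entries.
var legalNext = map[string][]string{
	"PENDING":       {"EXTRACTING", "FAILED"},
	"EXTRACTING":    {"READY_TO_SCAN", "FAILED"},
	"READY_TO_SCAN": {"SCANNING", "FAILED"},
	"SCANNING":      {"COMPLETING", "FAILED"},
	"COMPLETING":    {"DONE", "FAILED"},
}

// TransitionAllowed reports whether moving from one state to another is legal.
func TransitionAllowed(from, to string) bool {
	for _, next := range legalNext[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(TransitionAllowed("PENDING", "EXTRACTING")) // true
	fmt.Println(TransitionAllowed("PENDING", "SCANNING"))   // false
}
```

An explicit table makes illegal jumps (say, PENDING straight to SCANNING) a rejected event rather than silent corruption of job state.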

Event Streams

jobs

  • job.created
  • job.completed
  • job.failed

extraction

  • extraction.ready
  • extraction.failed

scan

  • scan.page.completed
  • scan.completed
  • scan.failed

Key Endpoints

POST /api/v1/jobs/zip
Upload a static build ZIP and create a scan job

POST /api/v1/jobs/urls
Submit one or more public URLs for scanning

GET /api/v1/jobs/:id
Fetch current job state, progress, and artifact links

GET /api/v1/jobs/:id/stream
Subscribe to live SSE updates for a job

GET /api/v1/jobs/:id/results
Retrieve the unified report payload for completed scans

GET /api/v1/scanners
List available scanner manifests and capabilities

Architecture

Single-host, event-driven architecture with strict trust boundaries. The API validates intake and serves status projections, JetStream carries lifecycle events, the orchestrator owns legal state transitions and job pods, PostgreSQL stores durable job state, and MinIO serves artifacts and report evidence.

Ingress

  • Caddy reverse proxy
  • TLS termination
  • Path-based routing

Frontend

  • SvelteKit 2
  • Playground UI
  • SSE status + report views

Platform API

  • URL / ZIP intake
  • SSRF validation
  • SQLite status projection

Durable State

  • NATS JetStream
  • PostgreSQL job state/events
  • MinIO artifacts

Orchestrator

  • FSM transitions
  • Scanner coordination
  • Report aggregation

Execution

  • Rootless Podman job pods
  • Extractor container
  • Scanner runner containers

Product Surfaces

8 Shipped Capabilities

Dual Intake Playground (core)

One product surface configures public URL scans and ZIP-based static build scans.

Evidence-Rich Accessibility (core)

axe-core results include severity, issue screenshots, and page-overlay evidence for triage.

Unified Report Contract (dx)

A versioned report schema merges scanner summaries, pages, issues, artifacts, and errors.

Performance + SEO + Headers (performance)

Lighthouse, SEO, and security-header scanners run in the same job and land in one report.

Manifest-Driven Scanners (integration)

Scanner manifests validate identity, resource hints, and option schemas before runtime.

Project Mode CLI (integration)

CLI project mode boots a local app, waits for readiness, and scans pre-release targets safely.

Rootless Isolation (security)

Each job gets scoped containers, volumes, and cleanup boundaries instead of a shared worker.

Live Scan Status (dx)

SSE streams stage progress, logs, and terminal state so long-running jobs stay debuggable.

Tech Stack

Core Services

Go 1.25

Platform API, orchestrator, and extractor services

TypeScript + Bun

Scanner runtime and manifest-driven plugins

PostgreSQL

Durable job state, event history, and orchestration truth

Runtime Infrastructure

SQLite (WAL)

Fast status projections served by the API

NATS JetStream

Durable lifecycle events and replayable consumers

Podman (rootless)

Per-job isolation and reproducible execution

MinIO

Artifact storage with presigned download links

Frontend

SvelteKit 2

Playground, live status, and report routing

Svelte 5 Runes

Reactive stores for scan status and reports

Tailwind CSS v4

Shared design system and editorial UI styling

Scan Engine

Playwright

Browser automation across multiple scanner modules

axe-core

WCAG-focused accessibility checks and evidence

Lighthouse

Performance and quality scoring within the same run

Tradeoffs & Decisions

Why model scans as durable events instead of direct service chaining?

I wanted explicit state transitions, replay, and crash recovery. JetStream lets the orchestrator resume from persisted events instead of rebuilding job context from logs after a failure.

Alternatives: direct REST chaining, RPC-only orchestration, an external workflow engine.

Why use rootless Podman pods per job?

Targets and uploaded archives are untrusted. Job-scoped pods and volumes reduce cross-job leakage, keep cleanup predictable, and make the execution model easier to reason about than a shared worker pool.

Alternatives: a shared scanner worker, Docker daemon-based workers, a single long-lived scanner container.

Why create a unified report contract instead of keeping scanner outputs separate?

The product promise is faster triage, not just faster scanning. A versioned unified report schema lets the frontend compare findings, show audience-specific views, and attach evidence without custom per-scanner logic.

Alternatives: separate reports only, frontend-only merge logic, loose JSON blobs per scanner.

Why make scanner manifests a first-class contract?

Manifest validation keeps scanners extensible without making the runtime unsafe. The runner can verify IDs, option schemas, resource hints, and capabilities before a container ever starts.

Alternatives: a hardcoded scanner list, ad-hoc flags, runtime reflection only.
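A sketch of what pre-runtime manifest validation can look like in Go. The ScannerManifest fields and rules below are illustrative assumptions, not StageFlow's actual contract, which also carries option schemas and capabilities:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// ScannerManifest is an illustrative subset of a scanner manifest.
type ScannerManifest struct {
	ID       string
	Version  string
	MemoryMB int // resource hint
}

// idPattern constrains scanner IDs to lowercase kebab-case (assumed rule).
var idPattern = regexp.MustCompile(`^[a-z][a-z0-9-]*$`)

// Validate rejects a manifest before any container starts.
func (m ScannerManifest) Validate() error {
	if !idPattern.MatchString(m.ID) {
		return errors.New("manifest: invalid scanner id")
	}
	if m.Version == "" {
		return errors.New("manifest: missing version")
	}
	if m.MemoryMB <= 0 {
		return errors.New("manifest: resource hint must be positive")
	}
	return nil
}

func main() {
	ok := ScannerManifest{ID: "axe-core", Version: "1.0.0", MemoryMB: 512}
	fmt.Println(ok.Validate()) // <nil>
	bad := ScannerManifest{ID: "Axe Core!", Version: "1.0.0", MemoryMB: 512}
	fmt.Println(bad.Validate())
}
```

Failing fast here means a malformed plugin never reaches the container runtime at all.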

Challenges

Accepting untrusted inputs safely from both URLs and uploaded ZIP archives

I validate URL schemes and block private, loopback, metadata, and link-local targets, and I enforce archive entry, expansion, size, and traversal checks before extraction ever reaches the scanner runtime.

Coordinating scanners that start and finish independently

I persist the expected scanner set, run one container per scanner inside a job pod, and only advance the state machine once the orchestrator has accounted for every required outcome.
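That accounting might look like the minimal Go sketch below; the tracker type and outcome strings are hypothetical names, not StageFlow's API:

```go
package main

import "fmt"

// ScanTracker records the expected scanner set for a job and the
// terminal outcomes seen so far.
type ScanTracker struct {
	expected map[string]bool
	outcomes map[string]string // scanner -> "completed" | "failed"
}

func NewScanTracker(scanners []string) *ScanTracker {
	t := &ScanTracker{expected: map[string]bool{}, outcomes: map[string]string{}}
	for _, s := range scanners {
		t.expected[s] = true
	}
	return t
}

// Record stores an outcome; scanners outside the expected set are ignored.
func (t *ScanTracker) Record(scanner, outcome string) {
	if t.expected[scanner] {
		t.outcomes[scanner] = outcome
	}
}

// Done reports whether every expected scanner has a terminal outcome,
// which is when the orchestrator may advance the state machine.
func (t *ScanTracker) Done() bool {
	return len(t.outcomes) == len(t.expected)
}

func main() {
	t := NewScanTracker([]string{"axe", "lighthouse", "headers"})
	t.Record("axe", "completed")
	t.Record("lighthouse", "failed")
	fmt.Println(t.Done()) // false
	t.Record("headers", "completed")
	fmt.Println(t.Done()) // true
}
```

Persisting the expected set up front means a scanner that never reports keeps the job visibly incomplete instead of letting it silently advance.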

Keeping real-time status useful when browser connections are unreliable

I use an SSE hub with buffered channels, keepalive frames, and backpressure-aware eviction on the API side while the frontend reconnects and falls back to status fetches when streams drop.

Merging heterogeneous scanner outputs into one actionable report

I normalize scanner results into a unified contract, deduplicate known overlaps with explicit priority rules, and preserve enough evidence and metadata that the frontend can still explain where each finding came from.
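The normalize-then-deduplicate step can be sketched as follows; the Finding shape, the page-plus-rule dedup key, and the severity ranks are assumptions for illustration:

```go
package main

import "fmt"

// Finding is a normalized issue from any scanner; the originating
// scanner is kept so the report can explain where a finding came from.
type Finding struct {
	Scanner  string
	Page     string
	RuleID   string
	Severity string
}

// severityRank orders severities so overlaps reduce by explicit priority.
var severityRank = map[string]int{
	"info": 0, "minor": 1, "moderate": 2, "serious": 3, "critical": 4,
}

// Merge deduplicates findings that share a page and rule, keeping the
// highest-severity instance and preserving first-seen order.
func Merge(findings []Finding) []Finding {
	best := map[string]Finding{}
	order := []string{}
	for _, f := range findings {
		key := f.Page + "|" + f.RuleID
		cur, seen := best[key]
		if !seen {
			order = append(order, key)
			best[key] = f
			continue
		}
		if severityRank[f.Severity] > severityRank[cur.Severity] {
			best[key] = f
		}
	}
	out := make([]Finding, 0, len(order))
	for _, key := range order {
		out = append(out, best[key])
	}
	return out
}

func main() {
	merged := Merge([]Finding{
		{"axe", "/", "img-alt", "serious"},
		{"lighthouse", "/", "img-alt", "moderate"},
		{"headers", "/", "hsts-missing", "moderate"},
	})
	fmt.Println(len(merged)) // 2
}
```

Doing this merge behind a versioned contract, rather than in the frontend, keeps every report view working from the same reduced data.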

Outcomes

  • I shipped one playground that handles both live URL scans and ZIP uploads through the same intake model
  • I expose real-time job visibility through SSE status, stage progress, logs, and terminal events
  • I normalize six scanner outputs into one report with severity, page evidence, artifacts, and scanner drill-downs
  • I preserve explicit trust boundaries from intake validation through extraction, runtime isolation, and presigned artifact access
  • I support a CLI project mode so teams can scan local apps before deploy using the same platform backend
  • I built report views that serve PM, engineer, and designer audiences without forking the underlying scan data

Inspect the implementation

Review intake validation, orchestrator state transitions, scanner manifests, report aggregation, and the CLI project mode in the StageFlow repository.