Matthew Boback · Backend & Platform Engineer
Live in Production · 2024-01 to Ongoing · Solo

Case Study

StageFlow

Self-hosted website scan workbench with live execution visibility and evidence-rich unified reports.

I built a self-hosted scan workbench that runs eight scanners against live URLs or ZIP builds, streams job state over SSE, fingerprints each issue with a 12-character SHA-256 hash, and merges deduplicated findings into one report with page-level evidence.

8 Built-In Scanners · 3 User Phases · 8 Durable Consumers · 3 Audience Modes

Self-Hosted · JetStream · Rootless Podman · SSE · Unified Report · CLI · Vision AI
StageFlow homepage hero showing the product promise and live scan summary.

Why this matters

Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

I shipped one playground that handles both live URL scans and ZIP uploads through the same intake model and provenance handoff

I expose real-time job visibility through SSE with 15-second keepalives, snapshot-on-connect, and backpressure-aware eviction

I built an AI Vision Agent that navigates websites autonomously by iterating screenshots through a vision model with goal completion checking and loop detection

Product Proof

Screens that show the system in context.

Live StageFlow playground for configuring URL or ZIP scans and scanner options.

Live scan status page with stage progress, stepper, and terminal output.

Unified report overview with issue density, severity breakdown, and audience tabs.

Pages evidence view with an annotated screenshot and issue markers.

Overview

What the product does and why I built it that way.

I built StageFlow to give teams one place to configure scans, watch execution, and triage findings without sending data to third-party SaaS. The product accepts live URLs and ZIP-based static builds through the same playground, runs up to eight scanners in rootless Podman pods, and streams job progress over SSE.

The platform API blocks SSRF attempts across 17 CIDR ranges (covering cloud metadata at 169.254.169.254, RFC1918, link-local, multicast, and CGNAT) before publishing to NATS. The orchestrator manages a seven-state FSM in PostgreSQL, launches per-job pods through the Podman Libpod HTTP API, and monitors each scanner with a dedicated goroutine on WaitContainer. A 30-second deadline sweeper catches jobs stuck in EXTRACTING (5-minute limit) or SCANNING (30-minute limit). When all scanners have reported, results are downloaded from MinIO, deduplicated with a conservative rule equivalence table, and uploaded as a UnifiedReportV2.

Highlights

Quick read

One playground handles both live URL and ZIP-based scan intake through the same job model.

Per-job Podman pods isolate extractors and up to eight scanners behind durable JetStream consumers with 10-minute ACK waits and PostgreSQL-backed idempotency.

Issue fingerprinting uses 12-character SHA-256 hashes of scanner, rule, page, and selector, so baseline diffs work across runs without a database join.

Partial failure handling produces a report when at least one scanner succeeds. Failed scanners appear explicitly in the output instead of killing the whole job.

Problem

Website quality checks were fragmented, opaque, and hard to trust.

I needed one repeatable workflow that could handle public URLs and uploaded static builds while keeping execution isolated and reproducible. The tools I had used before split results across separate dashboards, offered no visibility into running jobs, and gave little evidence when something failed. Comparing findings from five different scanners by hand was slow and unreliable.

I also wanted to run this on infrastructure I control. That meant treating target URLs and uploaded archives as untrusted input from day one: SSRF protection at intake, ZIP bomb detection before extraction, per-container memory limits, and a report that normalizes eight scanner outputs into one severity scale instead of asking people to mentally map between separate rating systems.

Solution

I turned scanning into a product flow: configure, observe, triage.

StageFlow models each scan as a durable lifecycle of events over NATS JetStream. The platform API validates intake, checks every resolved IP against a three-tier SSRF blocklist, publishes a single job event, and maintains an in-memory status projection with 15-minute TTL for live reads. The orchestrator persists authoritative job state in PostgreSQL, launches per-job Podman pods, runs extractors and up to eight scanners in parallel, uploads artifacts to MinIO, and emits terminal outcomes. Eight durable consumers across three streams handle at-least-once delivery, with idempotent PostgreSQL transactions preventing double-counted completions.

On the frontend I focused on product clarity. The playground configures URL and ZIP runs from one surface. The scan page streams SSE updates with a 15-second keepalive heartbeat and reconnect-friendly snapshot delivery. The unified report pulls severity breakdowns, page-level evidence, scanner status, and artifacts into one view so triage starts the moment a scan finishes.

Workflow

Configure -> Observe -> Triage
The user-facing loop starts in the playground, moves through live scan status with streamed events and logs, and finishes in a unified report with evidence overlays and scanner drill-downs.
States: PENDING -> EXTRACTING -> READY_TO_SCAN -> SCANNING -> COMPLETING -> DONE | FAILED

jobs stream: job.created, job.completed, job.failed
extraction stream: extraction.ready, extraction.failed
scan stream: scan.page.completed, scan.completed, scan.failed
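A minimal sketch of how that lifecycle could be guarded in Go. The exact transition edges are assumptions inferred from the stage list above (URL jobs skipping EXTRACTING is a guess), not the orchestrator's actual table:

```go
package main

import "fmt"

// transitions is an assumed edge set for the seven-state lifecycle; the
// real orchestrator enforces transitions inside PostgreSQL transactions.
var transitions = map[string][]string{
	"PENDING":       {"EXTRACTING", "READY_TO_SCAN", "FAILED"}, // URL jobs may skip extraction
	"EXTRACTING":    {"READY_TO_SCAN", "FAILED"},
	"READY_TO_SCAN": {"SCANNING", "FAILED"},
	"SCANNING":      {"COMPLETING", "FAILED"},
	"COMPLETING":    {"DONE", "FAILED"},
	"DONE":          {}, // terminal
	"FAILED":        {}, // terminal
}

// canTransition reports whether a state change is legal under the map above.
func canTransition(from, to string) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("SCANNING", "COMPLETING")) // true
	fmt.Println(canTransition("DONE", "SCANNING"))       // false: terminal state
}
```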

Key Endpoints

POST

/api/v1/jobs/zip

Upload a static build ZIP and create a scan job. Streams directly to MinIO without disk buffering.

POST

/api/v1/jobs/urls

Submit up to 100 public URLs for scanning with SSRF validation on every resolved IP.

GET

/api/v1/jobs/:id

Fetch current job state from in-memory cache with orchestrator admin API fallback on cache miss.

GET

/api/v1/jobs/:id/stream

Subscribe to live SSE updates with snapshot-on-connect and 15-second keepalives.

GET

/api/v1/jobs/:id/results

302 redirect to a presigned MinIO URL for the unified report JSON.

GET

/api/v1/jobs/:id/report

302 redirect to a presigned MinIO URL for the rendered HTML report.

GET

/api/v1/jobs/:id/diff

On-demand diff against the project baseline, computed in-process from two MinIO report downloads.

GET

/api/v1/scanners

List available scanner manifests, capabilities, and config schemas from the registry.

Architecture

The system shape behind the product.

Single-host event-driven architecture with strict data ownership boundaries. The platform API validates intake and maintains status projections in memory. JetStream carries lifecycle events across three streams with 72-hour retention. The orchestrator owns state transitions, job pods, and report assembly in PostgreSQL. MinIO stores artifacts and reports. Each service owns its data: the orchestrator never reads SQLite, the platform API never reads PostgreSQL, and neither service accesses the other’s MinIO namespaces.

Ingress


Caddy reverse proxy

TLS termination

Path-based routing

Frontend


SvelteKit 2 with Svelte 5 Runes

Playground UI

SSE status + report views

Platform API


URL / ZIP intake with SSRF validation (17 CIDR ranges)

In-memory status projection (15-min TTL)

SQLite project and baseline storage

Durable State


NATS JetStream (3 streams, 8 consumers, 72h retention)

PostgreSQL job state, events, and scanner results

MinIO artifact and report storage

Orchestrator


Seven-state FSM with idempotent transitions

Scanner dispatch and completion tracking

Report aggregation with cross-scanner deduplication

Execution


Rootless Podman job pods via Libpod HTTP API

Archive extractor container (ZIP jobs)

Up to 8 scanner runner containers in parallel

Product Surfaces

8 Shipped Capabilities

Dual Intake Playground

core

One product surface configures public URL scans and ZIP-based static build scans. ZIP uploads stream directly to MinIO without touching disk.

AI Vision Agent

core

The ai-navigator scanner runs a two-phase vision-model loop. PageAnalyzer extracts up to 75 interactive DOM elements and sends them with a compressed screenshot to a vision model via OpenRouter. ActionDecider checks goal completion and loop conditions before each step. Disabled by default; requires OPENROUTER_API_KEY.

Unified Report with Issue Fingerprinting

dx

A versioned UnifiedReportV2 schema merges scanner summaries, pages, issues, and artifacts. Each issue gets a 12-character SHA-256 fingerprint of scanner, ruleId, pageId, and selector, making content-based diffing across runs possible without database queries.
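The fingerprinting scheme can be sketched in a few lines of Go. The `|` separator and field order here are assumptions, not the exact StageFlow implementation; the stable property is what matters, identical fields always hash to the same 12 characters:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes the identifying fields of an issue and keeps the
// first 12 hex characters, so reports can be diffed by content alone.
func fingerprint(scanner, ruleID, pageID, selector string) string {
	joined := strings.Join([]string{scanner, ruleID, pageID, selector}, "|")
	sum := sha256.Sum256([]byte(joined))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	fp := fingerprint("axe", "color-contrast", "/index.html", "#nav a")
	fmt.Println(fp, len(fp)) // deterministic across runs, always 12 chars
}
```

Because the fingerprint depends only on issue content, two reports can be diffed by comparing fingerprint sets, with no database join or run-specific IDs involved.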

Eight-Scanner Audit Coverage

performance

Axe (WCAG accessibility), Lighthouse (performance/quality), SEO (meta tags, headings, structured data), security-headers (CSP, HSTS, X-Frame-Options), link-checker (HEAD requests with status code classification), spelling-grammar (DOM text extraction), open-graph (og:* and twitter:* meta validation), and AI Navigator (goal-directed browser automation).

Manifest-Driven Plugin System

integration

Scanner manifests are validated against JSON Schema via AJV before any code loads. Discovery searches dist/scanners, /plugins, ~/.stageflow/plugins, and PLUGIN_PATHS. Invalid manifests fail at startup, not at scan time.

Project Mode CLI

integration

stageflow project reads .stageflow/config.yaml from a git repo, starts the configured dev server as a subprocess, polls a readiness URL, submits the scan, and stops the server on completion. Works locally and in CI with the --fail-on flag for severity gating.
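A hypothetical `.stageflow/config.yaml` illustrating the shape project mode consumes. The key names here are invented for illustration; only the behavior (dev server command, readiness polling, scan submission) comes from the description above:

```yaml
# Illustrative only: key names are not the real StageFlow schema.
server:
  command: "npm run dev"            # dev server started as a subprocess
  readiness_url: "http://localhost:5173"  # polled until the server answers
scan:
  urls:
    - "http://localhost:5173/"
```

In CI this would pair with the documented severity gate, e.g. `stageflow project --fail-on` with a threshold.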

SSRF and Archive Security

security

URL intake checks every resolved IP against a three-tier classification with 17 CIDR ranges blocking cloud metadata (169.254.169.254), RFC1918, link-local, multicast, and CGNAT. ZIP extraction enforces five checks: 5,000 max entries, 100x max expansion ratio, 1 GiB max uncompressed size, 250 MiB per-entry limit, and path traversal plus NUL-byte rejection.

Live Scan Status

dx

SSE streams stage progress, per-scanner completion, logs, and terminal state. The hub uses buffered channels with backpressure-aware eviction, 15-second keepalive heartbeats, and sends a full status snapshot on each new connection so reconnecting clients recover immediately.

Partial Failure and Deduplication

dx

DecideScanFailureCompletion returns partial results when at least one scanner succeeds. Cross-scanner deduplication uses a conservative static equivalence table of about 15 known overlaps, merging only when issues share both a canonical rule ID and the same pageId.

Tradeoffs

The decisions worth calling out.

Why model scans as durable events instead of direct service chaining?

I wanted explicit state transitions, replay, and crash recovery. JetStream lets the orchestrator resume from persisted events instead of rebuilding job context from logs after a failure. The 72-hour retention window and 10-minute ACK waits give the system room to recover from restarts without losing events.

Alternatives considered: Direct REST chaining · RPC-only orchestration · External workflow engine

Why use rootless Podman pods per job?

Targets and uploaded archives are untrusted. Job-scoped pods and volumes reduce cross-job leakage, keep cleanup predictable, and make the execution model easier to reason about than a shared worker pool. Podman's rootless mode maps container processes to unprivileged UIDs on the host, so a compromised scanner has no path to root.

Alternatives considered: Shared scanner worker pool · Docker daemon-based workers · Single long-lived scanner container

Why create a unified report contract instead of keeping scanner outputs separate?

The product promise is faster triage, not faster scanning. A versioned UnifiedReportV2 schema lets the frontend compare findings, show audience views, and attach evidence without custom per-scanner logic. Content-based fingerprints make regression diffing possible across runs.

Alternatives considered: Separate reports only · Frontend-only merge logic · Loose JSON blobs per scanner

Why validate scanner manifests with JSON Schema before code loads?

AJV validates manifests at plugin discovery time, before any module import. A scanner with an invalid manifest fails at startup, not at scan time. For a CI tool where a missing scanner would produce a false clean report, early failure is the right mode.

Alternatives considered: Hardcoded scanner list · Runtime reflection only · Ad-hoc validation at scan time

Why implement idempotency in PostgreSQL instead of a NATS dedup layer?

RecordScannerCompletion needs to atomically read completed_scanners, check for duplicates, write the update, and compute allComplete. PostgreSQL gives me BEGIN/COMMIT with row-level locking. NATS KV does not have multi-key atomicity. The cost is PostgreSQL as a required dependency, which it already was for job state.

Alternatives considered: NATS KV dedup set · Application-level dedup cache · Exactly-once delivery middleware

Why use a static rule equivalence table instead of semantic matching?

The cross-scanner overlap set is small, about 15 known equivalences. A false merge hides a real finding, which is worse than a duplicate in an audit tool. The table lives in one file and takes seconds to update. Semantic matching on descriptions or DOM selectors would have higher merge recall but also a higher false-merge rate.

Alternatives considered: Semantic description matching · DOM selector comparison · No deduplication

Tech Stack

What actually shipped the system.

Core Services

Go 1.26.1

Platform API, orchestrator, archive extractor, and CLI in a 21-module workspace monorepo

TypeScript + Bun 1.3.8

Scanner runtime build toolchain and manifest-driven plugins (Node.js at runtime for Playwright compatibility)

PostgreSQL 17

Durable job state, scanner results, and event audit trail with serializable transactions and raw SQL

Runtime Infrastructure

SQLite (WAL)

Project metadata and baseline storage in the platform API

NATS JetStream 2.12.2

Durable lifecycle events with 72-hour retention, 8 consumers, 10-min ACK waits, and 5-second NAK delay

Podman (rootless)

Per-job pod isolation via Libpod HTTP API with per-container memory limits and swap disabled

MinIO

Artifact and report storage with presigned URL redirects (S3-compatible, swappable to AWS S3)

Frontend

SvelteKit 2

Playground, live status, and report routing compiled to static assets served by Caddy

Svelte 5 Runes

Reactive stores for scan status and report data

Tailwind CSS v4

Design system and editorial UI styling

Scan Engine

Playwright v1.57.0

Browser automation with request interception for SSRF protection at the browser level

axe-core

WCAG 2.1 AA accessibility checks with full DOM context, selectors, and bounding boxes

Lighthouse

Performance and quality scoring via Chrome DevTools Protocol, serialized per page due to chrome-launcher constraints

Challenges

What was hard and how I dealt with it.

Accepting untrusted inputs safely from both URLs and uploaded ZIP archives

I validate every resolved IP against a three-tier classification covering 17 CIDR ranges (cloud metadata at 169.254.169.254, RFC1918, link-local, multicast, CGNAT) before any NATS message is published. A single blocked IP in a DNS response rejects the entire URL. For ZIPs, the extractor enforces five checks before extracting a byte: 5,000 max entries, 100x max expansion ratio, 1 GiB total uncompressed size, 250 MiB per-entry limit, and path traversal plus NUL-byte rejection.

Making NATS at-least-once delivery safe for state mutations

RecordScannerCompletion opens a PostgreSQL transaction, reads completed_scanners, checks if the scanner is already present, and only writes if it is not. Two concurrent redeliveries race on the same row, and the loser finds the scanner already recorded. CreateJobIfAbsent uses INSERT ON CONFLICT DO NOTHING with a boolean return. Idempotency lives in the database transaction, not in a separate dedup layer.
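The pattern can be modeled in memory. Here a mutex stands in for the row-level lock a PostgreSQL transaction would take; this is an illustrative sketch of the read-check-write-compute flow, not the real storage layer:

```go
package main

import (
	"fmt"
	"sync"
)

// jobState models the job row; the mutex plays the role of the
// transaction's row-level lock.
type jobState struct {
	mu        sync.Mutex
	expected  map[string]bool // scanners the job must hear from
	completed map[string]bool
}

// recordScannerCompletion reads completed scanners, skips duplicates, writes,
// and computes allComplete inside the same critical section, so redeliveries
// and out-of-order arrivals cannot double-count.
func (j *jobState) recordScannerCompletion(scanner string) (recorded, allComplete bool) {
	j.mu.Lock()
	defer j.mu.Unlock()
	if j.completed[scanner] {
		return false, len(j.completed) == len(j.expected) // redelivery: no-op
	}
	j.completed[scanner] = true
	return true, len(j.completed) == len(j.expected)
}

func main() {
	j := &jobState{
		expected:  map[string]bool{"axe": true, "lighthouse": true},
		completed: map[string]bool{},
	}
	j.recordScannerCompletion("axe")
	j.recordScannerCompletion("axe") // duplicate delivery is a no-op
	_, done := j.recordScannerCompletion("lighthouse")
	fmt.Println(done) // true: all expected scanners reported exactly once
}
```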

Coordinating scanners that start and finish independently

I persist the expected scanner set in PostgreSQL, run one container per scanner inside a job pod, and only advance the state machine once every scanner has reported success or failure. The allComplete boolean is computed inside the same transaction that records each completion, so it stays accurate regardless of delivery order or count.

Detecting scanner failures when containers exit without publishing NATS events

Each StartScanner call ends with spawnMonitorContainer, which launches a goroutine that blocks on WaitContainer via the Podman socket. If the container exits with a non-zero code, the goroutine fetches the last 500 bytes of logs and calls failJobSafe. A separate deadline sweeper runs every 30 seconds, catching jobs stuck in EXTRACTING for over 5 minutes or SCANNING for over 30 minutes.

Keeping real-time status useful when browser connections are unreliable

I use an SSE hub with buffered channels and a three-select non-blocking write that evicts one pending event rather than blocking the publisher on a slow consumer. A 15-second keepalive heartbeat prevents proxy idle timeouts. On each new connection the handler sends a full status snapshot before entering the event loop, so reconnecting clients recover immediately without tracking missed events.
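The three-select eviction pattern looks roughly like this in Go; a sketch of the backpressure strategy, not the hub's exact code:

```go
package main

import "fmt"

// publish tries a non-blocking send; if the subscriber's buffer is full it
// evicts the oldest pending event and retries once. The publisher never
// blocks on a slow consumer.
func publish(ch chan string, event string) (evicted bool) {
	select {
	case ch <- event: // fast path: buffer has room
		return false
	default:
	}
	select {
	case <-ch: // buffer full: drop the oldest buffered event
	default:
	}
	select {
	case ch <- event: // retry once; give up silently if still full
	default:
	}
	return true
}

func main() {
	ch := make(chan string, 2)
	publish(ch, "a")
	publish(ch, "b")
	evicted := publish(ch, "c") // buffer full: "a" is evicted
	fmt.Println(evicted, <-ch, <-ch) // true b c
}
```

Dropping an intermediate event is safe here because each new connection (and reconnection) starts from a full status snapshot, so clients never depend on having seen every event.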

Producing a useful report when some scanners fail but others succeed

DecideScanFailureCompletion checks whether at least one ScannerResult has Success: true. If yes, it returns ScanFailureCompleteWithPartialResults and aggregation proceeds with whatever succeeded. Failed scanners are recorded in the report with status: failed and their error message. Summary counts come from successful results only, so the totals stay internally consistent.
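The decision rule itself is small. A sketch with illustrative type and outcome names, the real function's signature may differ:

```go
package main

import "fmt"

// ScannerResult models the per-scanner outcome fields used by the decision.
type ScannerResult struct {
	Scanner string
	Success bool
	Error   string
}

// decideScanFailureCompletion: if at least one scanner succeeded, complete
// with partial results; only fail the job when every scanner failed.
func decideScanFailureCompletion(results []ScannerResult) string {
	for _, r := range results {
		if r.Success {
			return "CompleteWithPartialResults"
		}
	}
	return "FailJob"
}

func main() {
	results := []ScannerResult{
		{Scanner: "axe", Success: true},
		{Scanner: "lighthouse", Success: false, Error: "chrome crashed"},
	}
	fmt.Println(decideScanFailureCompletion(results)) // CompleteWithPartialResults
}
```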

Merging heterogeneous scanner outputs into one actionable report without hiding real findings

I normalize results into a unified contract and deduplicate with a conservative static equivalence table of about 15 known cross-scanner overlaps. Merging only happens when issues share both a canonical rule ID and the same pageId. Issues on different pages pass through unchanged. The higher-priority scanner's version is kept and annotated with alsoDetectedBy.
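The merge rule can be sketched as a single pass keyed on canonical rule plus page. The two equivalence entries below are illustrative stand-ins for the real ~15-entry table:

```go
package main

import "fmt"

// Issue carries only the fields the merge rule needs.
type Issue struct {
	Scanner        string
	RuleID         string
	PageID         string
	AlsoDetectedBy []string
}

// equivalent maps scanner-specific rule IDs to canonical ones.
var equivalent = map[string]string{
	"axe/color-contrast":        "color-contrast",
	"lighthouse/color-contrast": "color-contrast",
}

// canonical falls back to the scanner-qualified rule ID when no equivalence
// is known, so unrelated issues can never merge.
func canonical(scanner, ruleID string) string {
	if c, ok := equivalent[scanner+"/"+ruleID]; ok {
		return c
	}
	return scanner + "/" + ruleID
}

// dedupe keeps the first (higher-priority) issue per canonical rule + page
// and annotates it with the scanners that also detected it.
func dedupe(issues []Issue) []Issue {
	seen := map[string]int{} // key -> index into out
	var out []Issue
	for _, is := range issues {
		key := canonical(is.Scanner, is.RuleID) + "@" + is.PageID
		if idx, ok := seen[key]; ok {
			out[idx].AlsoDetectedBy = append(out[idx].AlsoDetectedBy, is.Scanner)
			continue
		}
		seen[key] = len(out)
		out = append(out, is)
	}
	return out
}

func main() {
	merged := dedupe([]Issue{
		{Scanner: "axe", RuleID: "color-contrast", PageID: "p1"},
		{Scanner: "lighthouse", RuleID: "color-contrast", PageID: "p1"}, // merged
		{Scanner: "lighthouse", RuleID: "color-contrast", PageID: "p2"}, // different page: kept
	})
	fmt.Println(len(merged), merged[0].AlsoDetectedBy) // 2 [lighthouse]
}
```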

Outcomes

What shipped and what improved.

I shipped one playground that handles both live URL scans and ZIP uploads through the same intake model and provenance handoff

I expose real-time job visibility through SSE with 15-second keepalives, snapshot-on-connect, and backpressure-aware eviction

I built an AI Vision Agent that navigates websites autonomously by iterating screenshots through a vision model with goal completion checking and loop detection

I normalize eight scanner outputs into one report with content-based fingerprints, cross-scanner deduplication, severity breakdowns, and page-level evidence

I enforce trust boundaries at every layer: SSRF blocking across 17 CIDR ranges at intake, five ZIP extraction safety checks, browser-level request interception in the scanner runner, and per-container memory limits in rootless Podman pods

I handle partial scan failure gracefully, so a job with seven working scanners and one failing scanner still produces a usable report with explicit failure attribution

I support a CLI project mode that reads .stageflow/config.yaml, manages the dev server lifecycle, and gates CI pipelines with --fail-on severity thresholds

Next Step

Inspect the implementation

Review intake validation, orchestrator state transitions, scanner manifests, report aggregation, and the CLI project mode in the StageFlow repository.