Matthew Boback · Backend & Platform Engineer
Open Source · 2024-01 to Ongoing · Solo

Case Study

Clear11y

Containerized accessibility scanner for pre-deployment static site testing.

Clear11y scans ZIP archives of static site builds for WCAG 2.1 violations before deployment. It runs axe-core (90+ rules) and a custom keyboard navigation engine inside Docker containers, producing scored reports with screenshots. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class.

9.3K Python LOC · 90+ axe-core Rules · 7 Pipeline Stages · 5 Arch Layers

Docker-First · CLI · REST API · GitHub Action · WCAG 2.1 · Keyboard Testing · Security Hardened

Why this matters

Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

I bypassed Playwright's file:// restrictions by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.

I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.

I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.

Overview

What the product does and why I built it that way.

I built Clear11y to close a gap in static-site workflows: you can't reliably scan a `dist/` build with browser tools before you deploy it. Browser extensions can't access `file://` URLs, and Playwright's `file://` handling varies across browsers and versions. Clear11y takes a ZIP-packaged build, extracts it, serves it over an ephemeral HTTP server bound to port 0, and runs both axe-core and a custom keyboard engine against it. The result is a scored WCAG report with screenshot evidence, so accessibility issues get caught before users ever see them.

Highlights

Quick read

Scans build artifacts from ZIP archives, bypassing the file:// restriction that blocks every browser-based accessibility tool

Two testing engines: axe-core for static WCAG rules and a custom KeyboardAccessibilityService that detects focus issues axe-core cannot see

IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties before focus, compares after, across 5 indicator categories (outline, box-shadow, border, background-color, text-decoration)

Ephemeral HTTP server binds port 0 (kernel-assigned), runs in a daemon thread, shuts down with a 5-second join timeout

Zip Slip protection runs two-pass validation: before and after os.path.normpath(), plus os.path.commonpath() as a final guard

Problem

Static builds can't be scanned before deployment.

Static site generators produce final HTML in `dist/` or `build/`, but the most common accessibility workflow (browser extensions) breaks because Chrome and Firefox block extension access to `file://` URLs as a security measure. Playwright can navigate `file://` paths, but the behavior changes between Chromium versions and requires different flags per browser.

I kept seeing teams deploy first and test in production, or skip automated testing entirely. That's how accessibility regressions slip into releases. And even when teams do run axe-core, it's a static analyzer. It cannot detect focus visibility, tab order, or focus traps, because those only exist when a user is pressing Tab.

Solution

Containerized scanning that simulates real deployment.

Clear11y takes the zipped build artifact and runs it through an isolated Playwright container. Instead of relying on `file://` navigation, it spins up an ephemeral HTTP server bound to port 0 (kernel-assigned) inside the container, making the build look like a live site to the browser.
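
As a rough sketch of that idea (not Clear11y's actual HttpService; `serve_build` is a hypothetical helper), the standard library is enough to bind port 0 and serve an extracted build from a daemon thread:

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_build(extract_dir: str) -> tuple[ThreadingHTTPServer, str]:
    """Serve an extracted build on a kernel-assigned port from a daemon thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=extract_dir)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 = kernel picks a free port
    port = server.server_address[1]                          # read the assigned port back
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{port}/"

# server, base_url = serve_build("/tmp/clear11y-extract")
# ... point Playwright at base_url ...
# server.shutdown()  # the real Pipeline.run() stops its server in a finally block
```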

Two engines scan each page. axe-core checks 90+ static WCAG rules (missing alt text, color contrast, ARIA misuse). A custom keyboard engine presses Tab up to 150 times, compares 8 computed style properties before and after `focus()` to detect missing indicators, catches focus traps when the same element appears 3+ times consecutively, and renders SVG overlays of the tab journey.

The result is a scored report (0-100, graded A+ through F) with screenshot evidence. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class. If violations exceed a threshold, the pipeline fails and you fix issues before deployment.

Workflow

Build → Scan → Deploy
Package your static build as a ZIP, scan it for WCAG violations, fix issues before deployment. Only accessible builds ship to production.
Job states: PENDING → RUNNING → SUCCESS / FAILED

Key Endpoints

POST /api/jobs/zip · Submit ZIP archive for scanning (100 MB limit, returns 202 with job ID)

POST /api/jobs/url · Submit up to 50 URLs for scanning (SSRF-validated, returns 202)

GET /api/jobs/:id · Poll job status, violation counts, and report link

GET /api/jobs/:id/report · Self-contained HTML report with base64-embedded screenshots

GET /api/jobs/:id/consolidated · Full ReportData as JSON (metadata, per-page violations, keyboard results)

GET /healthz · Health check endpoint

GET /prometheus · Prometheus text-format metrics (job counts, latency, errors)

Architecture

The system shape behind the product.

I organized Clear11y into five layers. Three interfaces (CLI, API, GitHub Action) all construct the same Pipeline with injected services. The Pipeline coordinates stateless scanning services in sequence. None of the interface code knows about scanning; none of the service code knows about HTTP or CLI flags. Adding the API required writing only `web/server.py` with zero changes to any service or the pipeline.

Interface Layer

CLI (click + Rich)

REST API (FastAPI)

GitHub Action (composite)

Orchestration Layer

Pipeline (pipeline.py)

Container Manager

Settings (TOML + env + defaults)

Scanning Services

PlaywrightAxeService (stateless)

KeyboardAccessibilityService

BrowserManager

HttpService (port 0)

Storage & Reporting

ZipService

HtmlDiscoveryService

Jinja2 Report Generator

DatabaseJobStore (SQLAlchemy)

Runtime Environment

Docker (playwright/python base)

Chromium (headless)

Python 3.10+

uv + hatchling

Capabilities

12 Capabilities

ZIP Archive HTTP Scanning

core

Extracts static builds from ZIP archives and serves them over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread with a 5-second join timeout on shutdown. Requests to analytics and tracking domains matching BLOCKED_URL_PATTERNS are aborted so scans stay deterministic.

Focus Indicator Detection

core

IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties as primitive strings before calling element.focus({ focusVisible: true }), then compares after values. Detects missing indicators across 5 categories: outline, box-shadow, border, background-color, and text-decoration. The focusVisible: true option triggers :focus-visible UA styles, matching what keyboard users actually see.
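
A condensed sketch of that snapshot-and-compare logic, trimmed to three properties for brevity (the real IS_FOCUS_VISIBLE_SCRIPT tracks 8 and reports which indicator category changed):

```python
# JavaScript evaluated against a focusable element via Playwright; values are
# copied into plain strings *before* focusing because getComputedStyle() is live.
FOCUS_VISIBLE_SKETCH = """
(element) => {
  const props = ['outline-style', 'outline-width', 'box-shadow'];
  const read = () => {
    const style = getComputedStyle(element);
    const snapshot = {};
    for (const p of props) snapshot[p] = String(style.getPropertyValue(p));
    return snapshot;
  };
  const before = read();                  // primitive strings, not a live object
  element.focus({ focusVisible: true });  // trigger :focus-visible UA styles
  const after = read();
  return props.some((p) => before[p] !== after[p]);  // any change => visible indicator
}
"""

# changed = element_handle.evaluate(FOCUS_VISIBLE_SKETCH)
```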

Tab Journey Visualization

core

Simulates full Tab traversal (up to 150 keypresses), recording each focused element's selector, role, accessible name, and bounding box. Injects an SVG overlay with magenta numbered circles (r=14) connected by lines showing exact tab order. Captures a full-page screenshot so you can see at a glance if navigation is out of order.

Focus Trap Detection

core

Presses Tab 20 times from a clean page and tracks focus history. If the same element appears 3+ times consecutively, that indicates a focus trap. Reported as a critical severity violation mapped to WCAG 2.1.2 (No Keyboard Trap).
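
A minimal sketch of that heuristic (hypothetical `detect_focus_trap` helper on Playwright's sync API; the real service also records role and accessible name):

```python
def detect_focus_trap(page, presses: int = 20, threshold: int = 3) -> str | None:
    """Return a rough label for a trapped element, or None if no trap is seen."""
    history: list[str] = []
    for _ in range(presses):
        page.keyboard.press("Tab")
        # Coarse identification of the focused element; real code builds a precise selector.
        label = page.evaluate(
            "() => document.activeElement"
            " ? document.activeElement.tagName +"
            "   (document.activeElement.id ? '#' + document.activeElement.id : '')"
            " : ''"
        )
        history.append(label)
        if len(history) >= threshold and len(set(history[-threshold:])) == 1:
            return label  # same element focused `threshold` times in a row
    return None
```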

Self-Validating Keyboard Engine

core

Before any real scan, _self_validate() creates an inline test page with three buttons: one with proper outline focus, one with outline: none, and one with box-shadow. The validator checks that IS_FOCUS_VISIBLE_SCRIPT correctly distinguishes them. If validation fails, every subsequent scan logs a warning. This caught a real regression when a Playwright version changed focusVisible behavior.

Playwright Context Reuse

performance

I eliminated browser cold starts by sharing one Chromium process across all pages and creating isolated BrowserContexts per page. Each context resets cookies and storage without the ~500ms browser launch cost. The axe engine and keyboard engine use separate Playwright instances because they have different lifecycle requirements.
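
In outline, the pattern looks like this (a sketch, not BrowserManager's actual interface):

```python
from playwright.sync_api import sync_playwright

def scan_pages(urls: list[str]) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)   # launched once, kept warm
        try:
            for url in urls:
                context = browser.new_context()      # cheap; isolated cookies and storage
                page = context.new_page()
                page.goto(url)
                # ... inject axe-core and collect violations here ...
                context.close()
        finally:
            browser.close()
```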

Screenshot Evidence

dx

Captures violations with red-highlighted CSS overlays injected via page.evaluate(). Default highlight is outline: 4px solid #ff0000 with box-shadow. Up to 4 elements per violation (configurable via A11Y_SHOT_MAX_TARGETS), with overlapping elements merged into single contextual screenshots.

WCAG Scoring

dx

Penalty weights: critical=40, serious=20, moderate=5, minor=1. Score is 100 minus the sum of penalties, floored at 0. Keyboard violations count too, with focus traps and unreachable elements rated critical, poor indicators rated serious. Grades: 90+ is A+, 80+ is A, 70+ is B, 60+ is C, 50+ is D, below 50 is F.
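
The scoring rule is simple enough to sketch directly from those numbers (function names are illustrative):

```python
PENALTIES = {"critical": 40, "serious": 20, "moderate": 5, "minor": 1}
GRADE_BANDS = ((90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D"))

def wcag_score(counts: dict[str, int]) -> int:
    """100 minus summed penalties, never below 0."""
    return max(0, 100 - sum(PENALTIES[sev] * n for sev, n in counts.items()))

def grade(score: int) -> str:
    return next((letter for floor, letter in GRADE_BANDS if score >= floor), "F")

# wcag_score({"critical": 1, "serious": 2, "minor": 3}) == 17, which grades as "F"
```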

Zip Slip & Zip Bomb Protection

security

Two-pass path validation: rejects absolute paths and '..' components before os.path.normpath(), then checks again after normalization (which can introduce traversal from safe-looking inputs). Final guard via os.path.commonpath() on resolved absolute paths. Zip Bomb protection reads uncompressed sizes from the ZIP central directory and aborts at 500 MB before any bytes hit disk.
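
A sketch of that layered check with illustrative names and the limits described above:

```python
import os
import zipfile

MAX_UNCOMPRESSED = 500 * 1024 * 1024  # 500 MB

def safe_extract(zip_path: str, dest: str) -> None:
    dest = os.path.abspath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        # Zip Bomb guard: sizes come from the central directory, so nothing
        # is decompressed before this check runs.
        total = sum(info.file_size for info in zf.infolist())
        if total > MAX_UNCOMPRESSED:
            raise ValueError(f"archive expands to {total} bytes, over the {MAX_UNCOMPRESSED} limit")

        for info in zf.infolist():
            name = info.filename
            # Pass 1: reject absolute paths and '..' components before normalization.
            if name.startswith(("/", "\\")) or ".." in name.split("/"):
                raise ValueError(f"unsafe path: {name}")
            # Pass 2: normalization itself can surface traversal from odd inputs.
            normalized = os.path.normpath(name)
            if os.path.isabs(normalized) or normalized.startswith(".."):
                raise ValueError(f"unsafe path after normpath: {name}")
            # Final guard: the resolved target must stay inside the destination.
            target = os.path.abspath(os.path.join(dest, normalized))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"path escapes destination: {name}")
            zf.extract(info, dest)
```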

SSRF Mitigation

security

For URL scans, _validate_public_http_url() resolves hostnames via socket.getaddrinfo() and requires every resolved IP to pass ipaddress.is_global. Blocks RFC 1918, loopback, and link-local addresses. Localhost is also blocked by name before DNS lookup to catch trivial bypasses. ZIP scans restrict Playwright navigation to localhost origins only.
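
A sketch of the same check using the standard library (the helper name stands in for `_validate_public_http_url`):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def validate_public_http_url(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("only http(s) URLs with a hostname are allowed")
    host = parsed.hostname
    if host.lower() == "localhost":  # blocked by name, before any DNS lookup
        raise ValueError("localhost is not allowed")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        results = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host}") from exc
    for *_, sockaddr in results:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # strip any IPv6 scope id
        # is_global rejects RFC 1918, loopback, link-local, and other reserved ranges.
        if not ip.is_global:
            raise ValueError(f"{host} resolves to non-global address {ip}")
```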

Job-based FastAPI

integration

POST endpoints return HTTP 202 with a job ID immediately. Scans run in asyncio background tasks offloaded via asyncio.to_thread() to avoid blocking the uvicorn event loop. Jobs expire after a configurable retention period (default 24 hours) and get cleaned up at startup and on a periodic schedule.
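
A stripped-down sketch of the 202-then-poll pattern (the in-memory dict stands in for DatabaseJobStore, and the real endpoints enforce size limits and retention):

```python
import asyncio
import uuid
from fastapi import FastAPI, UploadFile

app = FastAPI()
jobs: dict[str, dict] = {}  # stand-in for the persistent job store

def run_scan(job_id: str, payload: bytes) -> None:
    # The synchronous Playwright pipeline would run here; to_thread keeps it off the event loop.
    jobs[job_id]["status"] = "SUCCESS"

@app.post("/api/jobs/zip", status_code=202)
async def submit_zip(archive: UploadFile):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING"}
    payload = await archive.read()

    async def background() -> None:
        jobs[job_id]["status"] = "RUNNING"
        await asyncio.to_thread(run_scan, job_id, payload)

    asyncio.create_task(background())       # respond immediately with 202 + job ID
    return {"job_id": job_id, "status": "PENDING"}

@app.get("/api/jobs/{job_id}")
async def job_status(job_id: str):
    return jobs.get(job_id, {"status": "NOT_FOUND"})
```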

GitHub Action Gate

integration

Composite action with 4 steps: setup Python, install package, build Docker image, run scan. Outputs 7 variables including total-violations, critical-violations, keyboard-violations, and scan-status. The fail-on-violations input exits 1 to block deployment when thresholds are exceeded. Reports upload as build artifacts automatically.

Tradeoffs

The decisions worth calling out.

Why Docker over native Playwright?

I optimized for repeatability. Docker pins the Playwright + browser versions so scans produce identical results regardless of the host machine. Microsoft's playwright/python base image includes all Chromium system dependencies, removing a multi-step apt installation. The tradeoff is startup overhead in exchange for results I can trust in CI.

Alternatives considered: Native Playwright · Browser-specific containers · Cloud browser services

Why ZIP scanning over live URL scanning?

I wanted to scan exactly what would be deployed. ZIP archives let the scanner analyze the real build output, including routing and asset paths, without requiring a staging deploy. The ephemeral HTTP server makes the ZIP indistinguishable from a live site as far as the browser is concerned.

Alternatives considered: Live URL-only scanning · Local HTTP server (manual) · Direct directory scanning

Why port 0 for the ephemeral server?

Picking a port manually and hoping it is free causes intermittent failures in CI. Port 0 tells the kernel to assign a guaranteed-free port atomically. The server reads the assigned port back via s.getsockname()[1]. The tradeoff is that callers don't know the URL until after start() returns, so Pipeline.run() accesses it via http_service.base_url.

Alternatives considered: Fixed port (e.g., 8080) · Port scanning for free ports · Caller-specified port

Why SQLite + PostgreSQL over a single DB?

I wanted zero-friction local runs while keeping a production path open. SQLite works out of the box for single-instance deployments. PostgreSQL supports teams that need concurrent writers or replication. The same job model works in both via SQLAlchemy's DatabaseJobStore, so switching costs nothing.

Alternatives considered: PostgreSQL-only · NoSQL document store · In-memory only
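
A small sketch of how the connection URL alone selects the backend (illustrative; DATABASE_URL is a hypothetical variable name, and the real DatabaseJobStore adds its own session and locking layer):

```python
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# SQLite by default for zero-config local runs; point DATABASE_URL at PostgreSQL
# (e.g. postgresql+psycopg2://...) for deployments that need concurrent writers.
db_url = os.environ.get("DATABASE_URL", "sqlite:///clear11y-jobs.db")
engine = create_engine(db_url)
SessionLocal = sessionmaker(bind=engine)
```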

Why sequential keyboard tests with a separate browser?

Axe scans are stateless enough to share browser contexts, but keyboard tests require isolated focus state. I split the pipeline so axe scanning and keyboard testing run sequentially with separate Playwright instances. Playwright's sync API uses greenlets that can't transfer between threads, which rules out true parallelism anyway.

Alternatives considered: All sequential · All concurrent · Per-page workers

Why a custom keyboard engine instead of relying on axe-core?

axe-core is a static DOM analyzer. It cannot simulate keyboard input. Focus visibility, tab order, focus traps, and unreachable interactive elements only exist at runtime when someone presses Tab. I built a separate engine with three test suites: tab_navigation, focus_indicators, and focus_traps. A _self_validate() step tests the engine against known-good and known-bad pages before each scan session.

Alternatives considered: axe-core only · Manual keyboard testing · Third-party keyboard testing library

Tech Stack

What actually shipped the system.

Backend

Python 3.10+

Core scanner, match statements, X | Y union syntax

FastAPI

REST API with async background tasks and auto-generated OpenAPI docs

Playwright

Browser automation (sync API via greenlets)

axe_playwright_python

Injects axe-core into Playwright pages, returns violations as Python dicts

click

CLI framework for scanner/clear11y commands

Rich

Terminal progress bars, tables, and warning panels

Frontend

Jinja2

HTML report templates with ChoiceLoader (filesystem then packaged)

Vanilla JS

Interactive report UI with severity filtering

Infrastructure

Docker

Containers built on mcr.microsoft.com/playwright/python base image

GitHub Actions

Composite action with 4 steps and 7 output variables

uv + hatchling

Build and install from uv.lock in seconds, no pip resolver

Data

SQLite

Zero-config job tracking for local and single-node use

PostgreSQL

Multi-node job tracking with concurrent writers

SQLAlchemy 2.0

ORM with declarative_base, cascade deletes, RLock thread safety

jsonschema

Validates ReportData against consolidated-report-v0.5.0.json schema

Challenges

What was hard and how I dealt with it.

Playwright doesn't reliably support file:// URLs across browsers

I serve the extracted build over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread. Pipeline.run() calls http_service.stop() in a finally block, which shuts down the server and joins the thread with a 5-second timeout.

getComputedStyle() returns a live object, so comparing it to itself after focus always shows equal values

IS_FOCUS_VISIBLE_SCRIPT reads all 8 property values into a plain before object as primitive strings, then calls focus({ focusVisible: true }), then reads again into an after object. A test in test_keyboard_focus.py verifies that 'const before' appears before 'element.focus(' in the script source.

ZIP archives could contain Zip Slip traversal paths or Zip Bombs

Two-pass validation: reject absolute paths and '..' components before os.path.normpath(), then check again after normalization. Final guard: os.path.commonpath() on resolved absolute paths. Zip Bomb protection sums uncompressed sizes from the central directory and aborts at 500 MB before extracting a single byte.

The URL scan endpoint could be used for SSRF to probe internal networks

I resolve hostnames via socket.getaddrinfo() and check every returned IP with ipaddress.is_global. RFC 1918, loopback, and link-local addresses produce HTTP 400. Localhost is blocked by name before DNS to catch trivial bypasses. For ZIP scans, Playwright navigation is restricted to localhost origins.

Playwright's sync API uses greenlets that can't transfer between threads

I use sequential scanning with a single BrowserManager context that spans all pages. Each page gets a fresh BrowserContext (isolated cookies and storage) without restarting Chromium. For the FastAPI server, asyncio.to_thread() offloads the synchronous scan to a thread pool worker so the event loop isn't blocked.

Keyboard tests require isolated focus state and can't share a browser with axe scans

I separated execution models: axe scanning runs first with BrowserManager, then keyboard testing runs with its own Playwright instance. Two Chromium processes run sequentially. Sharing a browser would create implicit coupling between service lifecycles.

Docker bind-mount ownership varies across hosts, breaking writes from non-root containers

The entrypoint runs as root only long enough to chown the data directory to pwuser, detect the Docker socket GID at runtime, and add pwuser to the matching group. Then it drops to pwuser via exec su. The double exec (exec su -c 'exec python') propagates signals correctly to the Python process.

Outcomes

What shipped and what improved.

I bypassed Playwright's file:// restrictions by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.

I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.

I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.

I shipped a GitHub Action composite step with 7 output variables. Teams can gate deployments on accessibility scores without writing custom CI logic.

I hardened file extraction with two-pass Zip Slip validation (pre and post normpath) plus a commonpath final guard. Zip Bomb protection reads the central directory and aborts at 500 MB before decompressing anything.

I blocked SSRF at the socket level using getaddrinfo() resolution and ipaddress.is_global checks. RFC 1918, loopback, and link-local addresses get rejected before the browser navigates.

I structured the reporting pipeline so any failed stage can be re-run in isolation. Consolidation reads from the results/ directory without caring how the files got there.

Next Step

Try it yourself

Clear11y is open source. Pull the Docker image, scan your static builds, and ship accessible websites.