Case Study
Clear11y
Containerized accessibility scanner for pre-deployment static site testing.
Clear11y scans ZIP archives of static site builds for WCAG 2.1 violations before deployment. It runs axe-core (90+ rules) and a custom keyboard navigation engine inside Docker containers, producing scored reports with screenshots. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class.
9.3K
Python LOC
90+
axe-core Rules
7
Pipeline Stages
5
Arch Layers
Why this matters
I worked around unreliable file:// handling by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.
I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.
I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.
Overview
What the product does and why I built it that way.
I built Clear11y to close a gap in static-site workflows: you can't reliably scan a `dist/` build with browser tools before you deploy it. Browser extensions can't access `file://` URLs, and Playwright's `file://` handling varies across browsers and versions. Clear11y takes a ZIP-packaged build, extracts it, serves it over an ephemeral HTTP server bound to port 0, and runs both axe-core and a custom keyboard engine against it. The result is a scored WCAG report with screenshot evidence, so accessibility issues get caught before users ever see them.
Highlights
Scans build artifacts from ZIP archives, sidestepping the file:// restrictions that break browser-based accessibility tooling
Two testing engines: axe-core for static WCAG rules and a custom KeyboardAccessibilityService that detects focus issues axe-core cannot see
IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties before focus, compares after, across 5 indicator categories (outline, box-shadow, border, background-color, text-decoration)
Ephemeral HTTP server binds port 0 (kernel-assigned), runs in a daemon thread, shuts down with a 5-second join timeout
Zip Slip protection runs two-pass validation: before and after os.path.normpath(), plus os.path.commonpath() as a final guard
Problem
Static site generators produce final HTML in `dist/` or `build/`, but the most common accessibility workflow (browser extensions) breaks because Chrome and Firefox block extension access to `file://` URLs as a security measure. Playwright can navigate `file://` paths, but the behavior changes between Chromium versions and requires different flags per browser.
I kept seeing teams deploy first and test in production, or skip automated testing entirely. That's how accessibility regressions slip into releases. And even when teams do run axe-core, it's a static analyzer. It cannot detect focus visibility, tab order, or focus traps, because those only exist when a user is pressing Tab.
Solution
Clear11y takes the zipped build artifact and runs it through an isolated Playwright container. Instead of relying on `file://` navigation, it spins up an ephemeral HTTP server bound to port 0 (kernel-assigned) inside the container, making the build look like a live site to the browser.
Two engines scan each page. axe-core checks 90+ static WCAG rules (missing alt text, color contrast, ARIA misuse). A custom keyboard engine presses Tab up to 150 times, compares 8 computed style properties before and after `focus()` to detect missing indicators, catches focus traps when the same element appears 3+ times consecutively, and renders SVG overlays of the tab journey.
The result is a scored report (0-100, graded A+ through F) with screenshot evidence. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class. If violations exceed a threshold, the pipeline fails, blocking the deployment until the issues are fixed.
Workflow
Key Endpoints
/api/jobs/zip
Submit ZIP archive for scanning (100 MB limit, returns 202 with job ID)
/api/jobs/url
Submit up to 50 URLs for scanning (SSRF-validated, returns 202)
/api/jobs/:id
Poll job status, violation counts, and report link
/api/jobs/:id/report
Self-contained HTML report with base64-embedded screenshots
/api/jobs/:id/consolidated
Full ReportData as JSON (metadata, per-page violations, keyboard results)
/healthz
Health check endpoint
/prometheus
Prometheus text format metrics (job counts, latency, errors)
Architecture
The system shape behind the product.
I organized Clear11y into five layers. Three interfaces (CLI, API, GitHub Action) all construct the same Pipeline with injected services. The Pipeline coordinates stateless scanning services in sequence. None of the interface code knows about scanning; none of the service code knows about HTTP or CLI flags. Adding the API required writing only `web/server.py` with zero changes to any service or the pipeline.
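The dependency-injection shape can be sketched in a few lines. The service bodies below are simplified stand-ins, not the real implementations; only the Pipeline-with-injected-services pattern mirrors the project:

```python
from dataclasses import dataclass

class ZipService:
    def extract(self, archive: str) -> str:
        return "/tmp/extracted"  # stand-in: real service unpacks the ZIP

class HttpService:
    def start(self, root: str) -> str:
        return "http://127.0.0.1:49152"  # stand-in: real service binds port 0

@dataclass
class Pipeline:
    zip_service: ZipService
    http_service: HttpService

    def run(self, archive: str) -> str:
        root = self.zip_service.extract(archive)
        return self.http_service.start(root)

# Every interface (CLI, API, Action) constructs the same object graph:
pipeline = Pipeline(zip_service=ZipService(), http_service=HttpService())
```

Because the interfaces only build this graph and call run(), adding a new entry point never touches service code.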
Interface Layer
CLI (click + Rich)
REST API (FastAPI)
GitHub Action (composite)
Orchestration Layer
Pipeline (pipeline.py)
Container Manager
Settings (TOML + env + defaults)
Scanning Services
PlaywrightAxeService (stateless)
KeyboardAccessibilityService
BrowserManager
HttpService (port 0)
Storage & Reporting
ZipService
HtmlDiscoveryService
Jinja2 Report Generator
DatabaseJobStore (SQLAlchemy)
Runtime Environment
Docker (playwright/python base)
Chromium (headless)
Python 3.10+
uv + hatchling
Capabilities
ZIP Archive HTTP Scanning
core · Extracts static builds from ZIP archives and serves them over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread with a 5-second join timeout on shutdown. BLOCKED_URL_PATTERNS aborts requests to analytics and tracking domains for deterministic scans.
Focus Indicator Detection
core · IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties as primitive strings before calling element.focus({ focusVisible: true }), then compares after values. Detects missing indicators across 5 categories: outline, box-shadow, border, background-color, and text-decoration. The focusVisible: true option triggers :focus-visible UA styles, matching what keyboard users actually see.
Tab Journey Visualization
core · Simulates full Tab traversal (up to 150 keypresses), recording each focused element's selector, role, accessible name, and bounding box. Injects an SVG overlay with magenta numbered circles (r=14) connected by lines showing exact tab order. Captures a full-page screenshot so you can see at a glance if navigation is out of order.
Focus Trap Detection
core · Presses Tab 20 times from a clean page and tracks focus history. If the same element appears 3+ times consecutively, that indicates a focus trap. Reported as a critical severity violation mapped to WCAG 2.1.2 (No Keyboard Trap).
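The consecutive-repeat heuristic is simple enough to sketch. This is an illustrative stand-alone helper (`has_focus_trap` is a hypothetical name, not the project's actual function):

```python
from typing import Sequence

def has_focus_trap(focus_history: Sequence[str], threshold: int = 3) -> bool:
    """Return True if any selector appears `threshold`+ times in a row."""
    run = 0
    previous = None
    for selector in focus_history:
        # Extend the run on a repeat, otherwise restart it at this element.
        run = run + 1 if selector == previous else 1
        previous = selector
        if run >= threshold:
            return True
    return False
```

Feeding it the recorded selectors from 20 Tab presses flags a modal that never releases focus, while normal traversal (each element focused once) passes.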
Self-Validating Keyboard Engine
core · Before any real scan, _self_validate() creates an inline test page with three buttons: one with proper outline focus, one with outline: none, and one with box-shadow. The validator checks that IS_FOCUS_VISIBLE_SCRIPT correctly distinguishes them. If validation fails, every subsequent scan logs a warning. This caught a real regression when a Playwright version changed focusVisible behavior.
Playwright Context Reuse
performance · I eliminated browser cold starts by sharing one Chromium process across all pages and creating isolated BrowserContexts per page. Each context resets cookies and storage without the ~500ms browser launch cost. The axe engine and keyboard engine use separate Playwright instances because they have different lifecycle requirements.
Screenshot Evidence
dx · Captures violations with red-highlighted CSS overlays injected via page.evaluate(). Default highlight is outline: 4px solid #ff0000 with box-shadow. Up to 4 elements per violation (configurable via A11Y_SHOT_MAX_TARGETS), with overlapping elements merged into single contextual screenshots.
WCAG Scoring
dx · Penalty weights: critical=40, serious=20, moderate=5, minor=1. Score is 100 minus the sum of penalties, capped at 0. Keyboard violations count too, with focus traps and unreachable elements rated critical, poor indicators rated serious. Grades: 90+ is A+, 80+ is A, 70+ is B, 60+ is C, 50+ is D, below 50 is F.
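The scoring rule can be written out directly. A minimal illustration of the stated weights and grade cutoffs, not the project's actual implementation:

```python
SEVERITY_PENALTIES = {"critical": 40, "serious": 20, "moderate": 5, "minor": 1}

def score_page(violation_severities: list[str]) -> int:
    """Score is 100 minus summed penalties, floored at 0."""
    penalty = sum(SEVERITY_PENALTIES.get(s, 0) for s in violation_severities)
    return max(0, 100 - penalty)

def grade(score: int) -> str:
    """Map a 0-100 score to the letter grades from the report."""
    for cutoff, letter in ((90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D")):
        if score >= cutoff:
            return letter
    return "F"
```

One critical plus one minor violation lands at 59, a D; three criticals bottom out at 0, an F.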
Zip Slip & Zip Bomb Protection
security · Two-pass path validation: rejects absolute paths and '..' components before os.path.normpath(), then checks again after normalization (which can introduce traversal from safe-looking inputs). Final guard via os.path.commonpath() on resolved absolute paths. Zip Bomb protection reads uncompressed sizes from the ZIP central directory and aborts at 500 MB before any bytes hit disk.
SSRF Mitigation
security · For URL scans, _validate_public_http_url() resolves hostnames via socket.getaddrinfo() and requires every resolved IP to pass ipaddress.is_global. Blocks RFC 1918, loopback, and link-local addresses. Localhost is also blocked by name before DNS lookup to catch trivial bypasses. ZIP scans restrict Playwright navigation to localhost origins only.
Job-based FastAPI
integration · POST endpoints return HTTP 202 with a job ID immediately. Scans run in asyncio background tasks offloaded via asyncio.to_thread() to avoid blocking the uvicorn event loop. Jobs expire after a configurable retention period (default 24 hours) and get cleaned up at startup and on a periodic schedule.
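The offloading pattern looks roughly like this. A minimal sketch with a stand-in `run_scan` function in place of the real synchronous Playwright pipeline:

```python
import asyncio
import time

def run_scan(job_id: str) -> dict:
    """Stand-in for the blocking, synchronous scan work."""
    time.sleep(0.1)  # simulate blocking browser automation
    return {"job_id": job_id, "status": "completed"}

async def handle_submit(job_id: str) -> dict:
    # Offload the blocking call to a worker thread so the event loop
    # stays free to accept other requests while the scan runs.
    return await asyncio.to_thread(run_scan, job_id)

result = asyncio.run(handle_submit("job-42"))
```

In the real API the endpoint returns the 202 response first and awaits this in a background task; here the call is awaited directly for brevity.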
GitHub Action Gate
integration · Composite action with 4 steps: setup Python, install package, build Docker image, run scan. Outputs 7 variables including total-violations, critical-violations, keyboard-violations, and scan-status. The fail-on-violations input exits 1 to block deployment when thresholds are exceeded. Reports upload as build artifacts automatically.
Tradeoffs
Why Docker over native Playwright?
I optimized for repeatability. Docker pins the Playwright + browser versions so scans produce identical results regardless of the host machine. Microsoft's playwright/python base image includes all Chromium system dependencies, removing a multi-step apt installation. The tradeoff is startup overhead in exchange for results I can trust in CI.
Why ZIP scanning over live URL scanning?
I wanted to scan exactly what would be deployed. ZIP archives let the scanner analyze the real build output, including routing and asset paths, without requiring a staging deploy. The ephemeral HTTP server makes the ZIP indistinguishable from a live site as far as the browser is concerned.
Why port 0 for the ephemeral server?
Picking a port manually and hoping it is free causes intermittent failures in CI. Port 0 tells the kernel to assign a guaranteed-free port atomically. The server reads the assigned port back via s.getsockname()[1]. The tradeoff is that callers don't know the URL until after start() returns, so Pipeline.run() accesses it via http_service.base_url.
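The port-0 handshake is a few lines of standard-library socket code. A minimal sketch, independent of the project's HttpService:

```python
import socket

# Bind port 0: the kernel assigns a guaranteed-free port atomically,
# avoiding the race where a manually chosen port is taken between
# the availability check and the bind.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
assigned_port = s.getsockname()[1]  # read back the kernel-assigned port
base_url = f"http://127.0.0.1:{assigned_port}"
s.close()
```

This is why the URL is only knowable after start() returns: the port does not exist until the bind completes.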
Why SQLite + PostgreSQL over a single DB?
I wanted zero-friction local runs while keeping a production path open. SQLite works out of the box for single-instance deployments. PostgreSQL supports teams that need concurrent writers or replication. The same job model works in both via SQLAlchemy's DatabaseJobStore, so switching costs nothing.
Why sequential keyboard tests with a separate browser?
Axe scans are stateless enough to share browser contexts, but keyboard tests require isolated focus state. I split the pipeline so axe scanning and keyboard testing run sequentially with separate Playwright instances. Playwright's sync API uses greenlets that can't transfer between threads, which rules out true parallelism anyway.
Why a custom keyboard engine instead of relying on axe-core?
axe-core is a static DOM analyzer. It cannot simulate keyboard input. Focus visibility, tab order, focus traps, and unreachable interactive elements only exist at runtime when someone presses Tab. I built a separate engine with three test suites: tab_navigation, focus_indicators, and focus_traps. A _self_validate() step tests the engine against known-good and known-bad pages before each scan session.
Tech Stack
Backend
Python 3.10+
Core scanner, match statements, X | Y union syntax
FastAPI
REST API with async background tasks and auto-generated OpenAPI docs
Playwright
Browser automation (sync API via greenlets)
axe_playwright_python
Injects axe-core into Playwright pages, returns violations as Python dicts
click
CLI framework for scanner/clear11y commands
Rich
Terminal progress bars, tables, and warning panels
Frontend
Jinja2
HTML report templates with ChoiceLoader (filesystem then packaged)
Vanilla JS
Interactive report UI with severity filtering
Infrastructure
Docker
Containers built on mcr.microsoft.com/playwright/python base image
GitHub Actions
Composite action with 4 steps and 7 output variables
uv + hatchling
Build and install from uv.lock in seconds, no pip resolver
Data
SQLite
Zero-config job tracking for local and single-node use
PostgreSQL
Multi-node job tracking with concurrent writers
SQLAlchemy 2.0
ORM with declarative_base, cascade deletes, RLock thread safety
jsonschema
Validates ReportData against consolidated-report-v0.5.0.json schema
Challenges
Playwright doesn't reliably support file:// URLs across browsers
I serve the extracted build over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread. Pipeline.run() calls http_service.stop() in a finally block, which shuts down the server and joins the thread with a 5-second timeout.
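A minimal standard-library sketch of this pattern (the class and method names here are illustrative, not the project's actual HttpService):

```python
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

class EphemeralServer:
    """Serve an extracted build directory on a kernel-assigned port."""

    def __init__(self, root: str) -> None:
        handler = partial(SimpleHTTPRequestHandler, directory=root)
        # Port 0 asks the kernel for any free port.
        self._httpd = HTTPServer(("127.0.0.1", 0), handler)
        self._thread = threading.Thread(
            target=self._httpd.serve_forever, daemon=True
        )

    @property
    def base_url(self) -> str:
        return f"http://127.0.0.1:{self._httpd.server_address[1]}"

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._httpd.shutdown()
        self._thread.join(timeout=5)  # mirror the 5-second join timeout
```

The daemon flag means a hung server cannot keep the process alive; the bounded join in stop() mirrors the finally-block cleanup described above.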
getComputedStyle() returns a live object, so comparing it to itself after focus always shows equal values
IS_FOCUS_VISIBLE_SCRIPT reads all 8 property values into a plain before object as primitive strings, then calls focus({ focusVisible: true }), then reads again into an after object. A test in test_keyboard_focus.py verifies that 'const before' appears before 'element.focus(' in the script source.
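A reduced sketch of the snapshot-then-compare idea, including the source-ordering regression guard. The script body here is a hypothetical simplification; the real IS_FOCUS_VISIBLE_SCRIPT tracks 8 properties across 5 indicator categories:

```python
# Simplified stand-in for the injected JavaScript. Reading property
# values into plain strings defeats getComputedStyle()'s live object:
# the "before" snapshot cannot mutate when focus changes the styles.
FOCUS_SNAPSHOT_SCRIPT = """
(element) => {
  const props = ['outline-style', 'outline-width', 'box-shadow'];
  const read = () => Object.fromEntries(
    props.map(p => [p, getComputedStyle(element).getPropertyValue(p)])
  );
  const before = read();            // primitive strings, not the live object
  element.focus({ focusVisible: true });
  const after = read();             // re-read after focus
  return props.some(p => before[p] !== after[p]);
}
"""

# Regression guard in the spirit of test_keyboard_focus.py: the snapshot
# must be taken before focus() is called.
assert (FOCUS_SNAPSHOT_SCRIPT.index("const before")
        < FOCUS_SNAPSHOT_SCRIPT.index("element.focus("))
```

Checking token order in the script source is a cheap way to catch a refactor that accidentally moves the snapshot after the focus call.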
ZIP archives could contain Zip Slip traversal paths or Zip Bombs
Two-pass validation: reject absolute paths and '..' components before os.path.normpath(), then check again after normalization. Final guard: os.path.commonpath() on resolved absolute paths. Zip Bomb protection sums uncompressed sizes from the central directory and aborts at 500 MB before extracting a single byte.
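The two-pass check and the central-directory size guard can be sketched as follows (function names are illustrative, not the project's actual API):

```python
import os
import zipfile

MAX_UNCOMPRESSED = 500 * 1024 * 1024  # 500 MB cap

def validate_member(name: str, dest: str) -> None:
    """Two-pass Zip Slip check: before and after normalization."""
    # Pass 1: reject obvious traversal in the raw archive name.
    if os.path.isabs(name) or ".." in name.split("/"):
        raise ValueError(f"unsafe path (pre-normalization): {name}")
    # Pass 2: normalization can surface traversal from odd inputs.
    normalized = os.path.normpath(name)
    if os.path.isabs(normalized) or normalized.startswith(".."):
        raise ValueError(f"unsafe path (post-normalization): {name}")
    # Final guard: the resolved target must stay inside dest.
    target = os.path.abspath(os.path.join(dest, normalized))
    if os.path.commonpath([os.path.abspath(dest), target]) != os.path.abspath(dest):
        raise ValueError(f"escapes destination: {name}")

def check_total_size(zf: zipfile.ZipFile) -> None:
    """Zip Bomb guard: sum declared sizes before extracting anything."""
    total = sum(info.file_size for info in zf.infolist())
    if total > MAX_UNCOMPRESSED:
        raise ValueError(f"declared uncompressed size {total} exceeds limit")
```

file_size comes from the central directory, so the bomb check costs nothing: no member is decompressed before the sum is known.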
The URL scan endpoint could be used for SSRF to probe internal networks
I resolve hostnames via socket.getaddrinfo() and check every returned IP with ipaddress.is_global. RFC 1918, loopback, and link-local addresses produce HTTP 400. Localhost is blocked by name before DNS to catch trivial bypasses. For ZIP scans, Playwright navigation is restricted to localhost origins.
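A minimal sketch of the resolution-based guard (`validate_public_host` is an illustrative stand-in for the project's `_validate_public_http_url`):

```python
import ipaddress
import socket

def validate_public_host(hostname: str) -> None:
    """Reject hostnames that resolve to any non-global address."""
    # Block localhost by name before DNS, catching trivial bypasses.
    if hostname.lower() == "localhost":
        raise ValueError("localhost is blocked by name")
    # Resolve and check every returned address, not just the first:
    # an attacker-controlled DNS record can return a mix.
    for info in socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP):
        ip = ipaddress.ip_address(info[4][0])
        if not ip.is_global:
            raise ValueError(f"non-global address resolved: {ip}")
```

is_global covers RFC 1918 ranges, loopback, and link-local in one predicate, which is why the check happens at the IP layer rather than with a hostname denylist.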
Playwright's sync API uses greenlets that can't transfer between threads
I use sequential scanning with a single BrowserManager context that spans all pages. Each page gets a fresh BrowserContext (isolated cookies and storage) without restarting Chromium. For the FastAPI server, asyncio.to_thread() offloads the synchronous scan to a thread pool worker so the event loop isn't blocked.
Keyboard tests require isolated focus state and can't share a browser with axe scans
I separated execution models: axe scanning runs first with BrowserManager, then keyboard testing runs with its own Playwright instance. Two Chromium processes run sequentially. Sharing a browser would create implicit coupling between service lifecycles.
Docker bind-mount ownership varies across hosts, breaking writes from non-root containers
The entrypoint runs as root only long enough to chown the data directory to pwuser, detect the Docker socket GID at runtime, and add pwuser to the matching group. Then it drops to pwuser via exec su. The double exec (exec su -c 'exec python') propagates signals correctly to the Python process.
Outcomes
I worked around unreliable file:// handling by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.
I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.
I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.
I shipped a GitHub Action composite step with 7 output variables. Teams can gate deployments on accessibility scores without writing custom CI logic.
I hardened file extraction with two-pass Zip Slip validation (pre and post normpath) plus a commonpath final guard. Zip Bomb protection reads the central directory and aborts at 500 MB before decompressing anything.
I blocked SSRF at the socket level using getaddrinfo() resolution and ipaddress.is_global checks. RFC 1918, loopback, and link-local addresses get rejected before the browser navigates.
I structured the reporting pipeline so any failed stage can be re-run in isolation. Consolidation reads from the results/ directory without caring how the files got there.
Next Step
Try it yourself
Clear11y is open source. Pull the Docker image, scan your static builds, and ship accessible websites.