Matthew Boback · Backend & Platform Engineer
Open Source · 2024-01 to Ongoing · Solo

Case Study

Clear11y

Containerized accessibility scanner for pre-deployment static site testing.

Clear11y scans ZIP archives of static site builds for WCAG 2.1 violations before deployment. It runs axe-core (90+ rules) and a custom keyboard navigation engine inside Docker containers, producing scored reports with screenshots. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class.

9.3K Python LOC · 90+ axe-core Rules · 7 Pipeline Stages · 5 Arch Layers

Docker-First · CLI · REST API · GitHub Action · WCAG 2.1 · Keyboard Testing · Security Hardened

Why this matters

Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

I bypassed Playwright's file:// restrictions by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.

I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.

I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.

Overview

What the product does and why I built it that way.

I built Clear11y to close a gap in static-site workflows: you can't reliably scan a `dist/` build with browser tools before you deploy it. Browser extensions can't access `file://` URLs, and Playwright's `file://` handling varies across browsers and versions. Clear11y takes a ZIP-packaged build, extracts it, serves it over an ephemeral HTTP server bound to port 0, and runs both axe-core and a custom keyboard engine against it. The result is a scored WCAG report with screenshot evidence, so accessibility issues get caught before users ever see them.

Highlights

Quick read

Scans build artifacts from ZIP archives, bypassing the file:// restriction that blocks every browser-based accessibility tool

Two testing engines: axe-core for static WCAG rules and a custom KeyboardAccessibilityService that detects focus issues axe-core cannot see

IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties before focus, compares after, across 5 indicator categories (outline, box-shadow, border, background-color, text-decoration)

Ephemeral HTTP server binds port 0 (kernel-assigned), runs in a daemon thread, shuts down with a 5-second join timeout

Zip Slip protection runs two-pass validation: before and after os.path.normpath(), plus os.path.commonpath() as a final guard

Problem

Static builds can't be scanned before deployment.

Static site generators produce final HTML in `dist/` or `build/`, but the most common accessibility workflow (browser extensions) breaks because Chrome and Firefox block extension access to `file://` URLs as a security measure. Playwright can navigate `file://` paths, but the behavior changes between Chromium versions and requires different flags per browser.

I kept seeing teams deploy first and test in production, or skip automated testing entirely. That's how accessibility regressions slip into releases. And even when teams do run axe-core, it's a static analyzer. It cannot detect focus visibility, tab order, or focus traps, because those only exist when a user is pressing Tab.

Solution

Containerized scanning that simulates real deployment.

Clear11y takes the zipped build artifact and runs it through an isolated Playwright container. Instead of relying on `file://` navigation, it spins up an ephemeral HTTP server bound to port 0 (kernel-assigned) inside the container, making the build look like a live site to the browser.
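
As a rough sketch of that idea (not Clear11y's actual HttpService; `serve_build` is a hypothetical helper), the standard library is enough to bind port 0 and serve an extracted build from a daemon thread:

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def serve_build(extract_dir: str) -> tuple[ThreadingHTTPServer, str]:
    """Serve an extracted build on a kernel-assigned port from a daemon thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=extract_dir)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0 = kernel picks a free port
    port = server.server_address[1]                          # read the assigned port back
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{port}/"

# server, base_url = serve_build("/tmp/clear11y-extract")
# ... point Playwright at base_url ...
# server.shutdown()  # the real Pipeline.run() stops its server in a finally block
```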

Two engines scan each page. axe-core checks 90+ static WCAG rules (missing alt text, color contrast, ARIA misuse). A custom keyboard engine presses Tab up to 150 times, compares 8 computed style properties before and after `focus()` to detect missing indicators, catches focus traps when the same element appears 3+ times consecutively, and renders SVG overlays of the tab journey.

The result is a scored report (0-100, graded A+ through F) with screenshot evidence. The CLI, FastAPI REST API, and GitHub Action all call the same Pipeline class. If violations exceed a threshold, the pipeline fails and you fix issues before deployment.

Workflow

Build → Scan → Deploy
Package your static build as a ZIP, scan it for WCAG violations, fix issues before deployment. Only accessible builds ship to production.
Job states: PENDING → RUNNING → SUCCESS / FAILED

Key Endpoints

POST /api/jobs/zip · Submit ZIP archive for scanning (100 MB limit, returns 202 with job ID)

POST /api/jobs/url · Submit up to 50 URLs for scanning (SSRF-validated, returns 202)

GET /api/jobs/:id · Poll job status, violation counts, and report link

GET /api/jobs/:id/report · Self-contained HTML report with base64-embedded screenshots

GET /api/jobs/:id/consolidated · Full ReportData as JSON (metadata, per-page violations, keyboard results)

GET /healthz · Health check endpoint

GET /prometheus · Prometheus text-format metrics (job counts, latency, errors)

Architecture

The system shape behind the product.

I organized Clear11y into five layers. Three interfaces (CLI, API, GitHub Action) all construct the same Pipeline with injected services. The Pipeline coordinates stateless scanning services in sequence. None of the interface code knows about scanning; none of the service code knows about HTTP or CLI flags. Adding the API required writing only `web/server.py` with zero changes to any service or the pipeline.

Interface Layer

CLI (click + Rich)

REST API (FastAPI)

GitHub Action (composite)

Orchestration Layer

Pipeline (pipeline.py)

Container Manager

Settings (TOML + env + defaults)

Scanning Services

PlaywrightAxeService (stateless)

KeyboardAccessibilityService

BrowserManager

HttpService (port 0)

Storage & Reporting

ZipService

HtmlDiscoveryService

Jinja2 Report Generator

DatabaseJobStore (SQLAlchemy)

Runtime Environment

Docker (playwright/python base)

Chromium (headless)

Python 3.10+

uv + hatchling

Capabilities

12 Capabilities

ZIP Archive HTTP Scanning

core

Extracts static builds from ZIP archives and serves them over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread with a 5-second join timeout on shutdown. Requests to analytics and tracking domains matching BLOCKED_URL_PATTERNS are aborted so scans stay deterministic.

Focus Indicator Detection

core

IS_FOCUS_VISIBLE_SCRIPT snapshots 8 computed style properties as primitive strings before calling element.focus({ focusVisible: true }), then compares after values. Detects missing indicators across 5 categories: outline, box-shadow, border, background-color, and text-decoration. The focusVisible: true option triggers :focus-visible UA styles, matching what keyboard users actually see.
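
A condensed sketch of that snapshot-and-compare logic, trimmed to three properties for brevity (the real IS_FOCUS_VISIBLE_SCRIPT tracks 8 and reports which indicator category changed):

```python
# JavaScript evaluated against a focusable element via Playwright; values are
# copied into plain strings *before* focusing because getComputedStyle() is live.
FOCUS_VISIBLE_SKETCH = """
(element) => {
  const props = ['outline-style', 'outline-width', 'box-shadow'];
  const read = () => {
    const style = getComputedStyle(element);
    const snapshot = {};
    for (const p of props) snapshot[p] = String(style.getPropertyValue(p));
    return snapshot;
  };
  const before = read();                  // primitive strings, not a live object
  element.focus({ focusVisible: true });  // trigger :focus-visible UA styles
  const after = read();
  return props.some((p) => before[p] !== after[p]);  // any change => visible indicator
}
"""

# changed = element_handle.evaluate(FOCUS_VISIBLE_SKETCH)
```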

Tab Journey Visualization

core

Simulates full Tab traversal (up to 150 keypresses), recording each focused element's selector, role, accessible name, and bounding box. Injects an SVG overlay with magenta numbered circles (r=14) connected by lines showing exact tab order. Captures a full-page screenshot so you can see at a glance if navigation is out of order.

Focus Trap Detection

core

Presses Tab 20 times from a clean page and tracks focus history. If the same element appears 3+ times consecutively, that indicates a focus trap. Reported as a critical severity violation mapped to WCAG 2.1.2 (No Keyboard Trap).
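
A minimal sketch of that heuristic (hypothetical `detect_focus_trap` helper on Playwright's sync API; the real service also records role and accessible name):

```python
def detect_focus_trap(page, presses: int = 20, threshold: int = 3) -> str | None:
    """Return a rough label for a trapped element, or None if no trap is seen."""
    history: list[str] = []
    for _ in range(presses):
        page.keyboard.press("Tab")
        # Coarse identification of the focused element; real code builds a precise selector.
        label = page.evaluate(
            "() => document.activeElement"
            " ? document.activeElement.tagName +"
            "   (document.activeElement.id ? '#' + document.activeElement.id : '')"
            " : ''"
        )
        history.append(label)
        if len(history) >= threshold and len(set(history[-threshold:])) == 1:
            return label  # same element focused `threshold` times in a row
    return None
```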

Self-Validating Keyboard Engine

core

Before any real scan, _self_validate() creates an inline test page with three buttons: one with proper outline focus, one with outline: none, and one with box-shadow. The validator checks that IS_FOCUS_VISIBLE_SCRIPT correctly distinguishes them. If validation fails, every subsequent scan logs a warning. This caught a real regression when a Playwright version changed focusVisible behavior.

Playwright Context Reuse

performance

I eliminated browser cold starts by sharing one Chromium process across all pages and creating isolated BrowserContexts per page. Each context resets cookies and storage without the ~500ms browser launch cost. The axe engine and keyboard engine use separate Playwright instances because they have different lifecycle requirements.
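
In outline, the pattern looks like this (a sketch, not BrowserManager's actual interface):

```python
from playwright.sync_api import sync_playwright

def scan_pages(urls: list[str]) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)   # launched once, kept warm
        try:
            for url in urls:
                context = browser.new_context()      # cheap; isolated cookies and storage
                page = context.new_page()
                page.goto(url)
                # ... inject axe-core and collect violations here ...
                context.close()
        finally:
            browser.close()
```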

Screenshot Evidence

dx

Captures violations with red-highlighted CSS overlays injected via page.evaluate(). Default highlight is outline: 4px solid #ff0000 with box-shadow. Up to 4 elements per violation (configurable via A11Y_SHOT_MAX_TARGETS), with overlapping elements merged into single contextual screenshots.

WCAG Scoring

dx

Penalty weights: critical=40, serious=20, moderate=5, minor=1. Score is 100 minus the sum of penalties, floored at 0. Keyboard violations count too, with focus traps and unreachable elements rated critical, poor indicators rated serious. Grades: 90+ is A+, 80+ is A, 70+ is B, 60+ is C, 50+ is D, below 50 is F.
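
The scoring rule is simple enough to sketch directly from those numbers (function names are illustrative):

```python
PENALTIES = {"critical": 40, "serious": 20, "moderate": 5, "minor": 1}
GRADE_BANDS = ((90, "A+"), (80, "A"), (70, "B"), (60, "C"), (50, "D"))

def wcag_score(counts: dict[str, int]) -> int:
    """100 minus summed penalties, never below 0."""
    return max(0, 100 - sum(PENALTIES[sev] * n for sev, n in counts.items()))

def grade(score: int) -> str:
    return next((letter for floor, letter in GRADE_BANDS if score >= floor), "F")

# wcag_score({"critical": 1, "serious": 2, "minor": 3}) == 17, which grades as "F"
```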

Zip Slip & Zip Bomb Protection

security

Two-pass path validation: rejects absolute paths and '..' components before os.path.normpath(), then checks again after normalization (which can introduce traversal from safe-looking inputs). Final guard via os.path.commonpath() on resolved absolute paths. Zip Bomb protection reads uncompressed sizes from the ZIP central directory and aborts at 500 MB before any bytes hit disk.
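
A sketch of that layered check with illustrative names and the limits described above:

```python
import os
import zipfile

MAX_UNCOMPRESSED = 500 * 1024 * 1024  # 500 MB

def safe_extract(zip_path: str, dest: str) -> None:
    dest = os.path.abspath(dest)
    with zipfile.ZipFile(zip_path) as zf:
        # Zip Bomb guard: sizes come from the central directory, so nothing
        # is decompressed before this check runs.
        total = sum(info.file_size for info in zf.infolist())
        if total > MAX_UNCOMPRESSED:
            raise ValueError(f"archive expands to {total} bytes, over the {MAX_UNCOMPRESSED} limit")

        for info in zf.infolist():
            name = info.filename
            # Pass 1: reject absolute paths and '..' components before normalization.
            if name.startswith(("/", "\\")) or ".." in name.split("/"):
                raise ValueError(f"unsafe path: {name}")
            # Pass 2: normalization itself can surface traversal from odd inputs.
            normalized = os.path.normpath(name)
            if os.path.isabs(normalized) or normalized.startswith(".."):
                raise ValueError(f"unsafe path after normpath: {name}")
            # Final guard: the resolved target must stay inside the destination.
            target = os.path.abspath(os.path.join(dest, normalized))
            if os.path.commonpath([dest, target]) != dest:
                raise ValueError(f"path escapes destination: {name}")
            zf.extract(info, dest)
```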

SSRF Mitigation

security

For URL scans, _validate_public_http_url() resolves hostnames via socket.getaddrinfo() and requires every resolved IP to pass ipaddress.is_global. Blocks RFC 1918, loopback, and link-local addresses. Localhost is also blocked by name before DNS lookup to catch trivial bypasses. ZIP scans restrict Playwright navigation to localhost origins only.
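
A sketch of the same check using the standard library (the helper name stands in for `_validate_public_http_url`):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def validate_public_http_url(url: str) -> None:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError("only http(s) URLs with a hostname are allowed")
    host = parsed.hostname
    if host.lower() == "localhost":  # blocked by name, before any DNS lookup
        raise ValueError("localhost is not allowed")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        results = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host}") from exc
    for *_, sockaddr in results:
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])  # strip any IPv6 scope id
        # is_global rejects RFC 1918, loopback, link-local, and other reserved ranges.
        if not ip.is_global:
            raise ValueError(f"{host} resolves to non-global address {ip}")
```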

Job-based FastAPI

integration

POST endpoints return HTTP 202 with a job ID immediately. Scans run in asyncio background tasks offloaded via asyncio.to_thread() to avoid blocking the uvicorn event loop. Jobs expire after a configurable retention period (default 24 hours) and get cleaned up at startup and on a periodic schedule.
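
A stripped-down sketch of the 202-then-poll pattern (the in-memory dict stands in for DatabaseJobStore, and the real endpoints enforce size limits and retention):

```python
import asyncio
import uuid
from fastapi import FastAPI, UploadFile

app = FastAPI()
jobs: dict[str, dict] = {}  # stand-in for the persistent job store

def run_scan(job_id: str, payload: bytes) -> None:
    # The synchronous Playwright pipeline would run here; to_thread keeps it off the event loop.
    jobs[job_id]["status"] = "SUCCESS"

@app.post("/api/jobs/zip", status_code=202)
async def submit_zip(archive: UploadFile):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING"}
    payload = await archive.read()

    async def background() -> None:
        jobs[job_id]["status"] = "RUNNING"
        await asyncio.to_thread(run_scan, job_id, payload)

    asyncio.create_task(background())       # respond immediately with 202 + job ID
    return {"job_id": job_id, "status": "PENDING"}

@app.get("/api/jobs/{job_id}")
async def job_status(job_id: str):
    return jobs.get(job_id, {"status": "NOT_FOUND"})
```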

GitHub Action Gate

integration

Composite action with 4 steps: setup Python, install package, build Docker image, run scan. Outputs 7 variables including total-violations, critical-violations, keyboard-violations, and scan-status. The fail-on-violations input exits 1 to block deployment when thresholds are exceeded. Reports upload as build artifacts automatically.

Tradeoffs

The decisions worth calling out.

Why Docker over native Playwright?

I optimized for repeatability. Docker pins the Playwright + browser versions so scans produce identical results regardless of the host machine. Microsoft's playwright/python base image includes all Chromium system dependencies, removing a multi-step apt installation. The tradeoff is startup overhead in exchange for results I can trust in CI.

Alternatives considered: Native Playwright · Browser-specific containers · Cloud browser services

Why ZIP scanning over live URL scanning?

I wanted to scan exactly what would be deployed. ZIP archives let the scanner analyze the real build output, including routing and asset paths, without requiring a staging deploy. The ephemeral HTTP server makes the ZIP indistinguishable from a live site as far as the browser is concerned.

Alternatives considered: Live URL-only scanning · Local HTTP server (manual) · Direct directory scanning

Why port 0 for the ephemeral server?

Picking a port manually and hoping it is free causes intermittent failures in CI. Port 0 tells the kernel to assign a guaranteed-free port atomically. The server reads the assigned port back via s.getsockname()[1]. The tradeoff is that callers don't know the URL until after start() returns, so Pipeline.run() accesses it via http_service.base_url.

Alternatives considered: Fixed port (e.g., 8080) · Port scanning for free ports · Caller-specified port

Why SQLite + PostgreSQL over a single DB?

I wanted zero-friction local runs while keeping a production path open. SQLite works out of the box for single-instance deployments. PostgreSQL supports teams that need concurrent writers or replication. The same job model works in both via SQLAlchemy's DatabaseJobStore, so switching costs nothing.

Alternatives considered: PostgreSQL-only · NoSQL document store · In-memory only
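
A small sketch of how the connection URL alone selects the backend (illustrative; DATABASE_URL is a hypothetical variable name, and the real DatabaseJobStore adds its own session and locking layer):

```python
import os
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# SQLite by default for zero-config local runs; point DATABASE_URL at PostgreSQL
# (e.g. postgresql+psycopg2://...) for deployments that need concurrent writers.
db_url = os.environ.get("DATABASE_URL", "sqlite:///clear11y-jobs.db")
engine = create_engine(db_url)
SessionLocal = sessionmaker(bind=engine)
```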

Why sequential keyboard tests with a separate browser?

Axe scans are stateless enough to share browser contexts, but keyboard tests require isolated focus state. I split the pipeline so axe scanning and keyboard testing run sequentially with separate Playwright instances. Playwright's sync API uses greenlets that can't transfer between threads, which rules out true parallelism anyway.

Alternatives considered: All sequential · All concurrent · Per-page workers

Why a custom keyboard engine instead of relying on axe-core?

axe-core is a static DOM analyzer. It cannot simulate keyboard input. Focus visibility, tab order, focus traps, and unreachable interactive elements only exist at runtime when someone presses Tab. I built a separate engine with three test suites: tab_navigation, focus_indicators, and focus_traps. A _self_validate() step tests the engine against known-good and known-bad pages before each scan session.

Alternatives considered: axe-core only · Manual keyboard testing · Third-party keyboard testing library

Tech Stack

What actually shipped the system.

Backend

Python 3.10+

Core scanner, match statements, X | Y union syntax

FastAPI

REST API with async background tasks and auto-generated OpenAPI docs

Playwright

Browser automation (sync API via greenlets)

axe_playwright_python

Injects axe-core into Playwright pages, returns violations as Python dicts

click

CLI framework for scanner/clear11y commands

Rich

Terminal progress bars, tables, and warning panels

Frontend

Jinja2

HTML report templates with ChoiceLoader (filesystem then packaged)

Vanilla JS

Interactive report UI with severity filtering

Infrastructure

Docker

Containers built on mcr.microsoft.com/playwright/python base image

GitHub Actions

Composite action with 4 steps and 7 output variables

uv + hatchling

Build and install from uv.lock in seconds, no pip resolver

Data

SQLite

Zero-config job tracking for local and single-node use

PostgreSQL

Multi-node job tracking with concurrent writers

SQLAlchemy 2.0

ORM with declarative_base, cascade deletes, RLock thread safety

jsonschema

Validates ReportData against consolidated-report-v0.5.0.json schema

Challenges

What was hard and how I dealt with it.

Playwright doesn't reliably support file:// URLs across browsers

I serve the extracted build over an ephemeral HTTP server bound to port 0 (kernel-assigned). The server runs in a daemon thread. Pipeline.run() calls http_service.stop() in a finally block, which shuts down the server and joins the thread with a 5-second timeout.

getComputedStyle() returns a live object, so comparing it to itself after focus always shows equal values

IS_FOCUS_VISIBLE_SCRIPT reads all 8 property values into a plain before object as primitive strings, then calls focus({ focusVisible: true }), then reads again into an after object. A test in test_keyboard_focus.py verifies that 'const before' appears before 'element.focus(' in the script source.

ZIP archives could contain Zip Slip traversal paths or Zip Bombs

Two-pass validation: reject absolute paths and '..' components before os.path.normpath(), then check again after normalization. Final guard: os.path.commonpath() on resolved absolute paths. Zip Bomb protection sums uncompressed sizes from the central directory and aborts at 500 MB before extracting a single byte.

The URL scan endpoint could be used for SSRF to probe internal networks

I resolve hostnames via socket.getaddrinfo() and check every returned IP with ipaddress.is_global. RFC 1918, loopback, and link-local addresses produce HTTP 400. Localhost is blocked by name before DNS to catch trivial bypasses. For ZIP scans, Playwright navigation is restricted to localhost origins.

Playwright's sync API uses greenlets that can't transfer between threads

I use sequential scanning with a single BrowserManager context that spans all pages. Each page gets a fresh BrowserContext (isolated cookies and storage) without restarting Chromium. For the FastAPI server, asyncio.to_thread() offloads the synchronous scan to a thread pool worker so the event loop isn't blocked.

Keyboard tests require isolated focus state and can't share a browser with axe scans

I separated execution models: axe scanning runs first with BrowserManager, then keyboard testing runs with its own Playwright instance. Two Chromium processes run sequentially. Sharing a browser would create implicit coupling between service lifecycles.

Docker bind-mount ownership varies across hosts, breaking writes from non-root containers

The entrypoint runs as root only long enough to chown the data directory to pwuser, detect the Docker socket GID at runtime, and add pwuser to the matching group. Then it drops to pwuser via exec su. The double exec (exec su -c 'exec python') propagates signals correctly to the Python process.

Outcomes

What shipped and what improved.

I bypassed Playwright's file:// restrictions by embedding an ephemeral HTTP server that binds port 0 inside the container. ZIP builds become indistinguishable from live sites to the browser.

I reduced per-page scan time by keeping one Playwright browser warm and creating lightweight BrowserContexts per page. This cuts ~500ms of browser launch overhead per page.

I built IS_FOCUS_VISIBLE_SCRIPT to detect focus indicator issues that axe-core's static analysis misses. It snapshots 8 computed style properties before focus and compares after, catching outline:none suppressions and 4 other indicator patterns.

I shipped a GitHub Action composite step with 7 output variables. Teams can gate deployments on accessibility scores without writing custom CI logic.

I hardened file extraction with two-pass Zip Slip validation (pre and post normpath) plus a commonpath final guard. Zip Bomb protection reads the central directory and aborts at 500 MB before decompressing anything.

I blocked SSRF at the socket level using getaddrinfo() resolution and ipaddress.is_global checks. RFC 1918, loopback, and link-local addresses get rejected before the browser navigates.

I structured the reporting pipeline so any failed stage can be re-run in isolation. Consolidation reads from the results/ directory without caring how the files got there.

Next Step

Try it yourself

Clear11y is open source. Pull the Docker image, scan your static builds, and ship accessible websites.