Matthew Boback · Backend & Platform Engineer
Open Source · 2025-01 to Ongoing · Solo

Case Study

DisplayAnalysis

Detect the display artifacts that cause eye strain.

I built DisplayAnalysis to detect screen artifacts that cause eye strain and headaches: PWM flicker, temporal dithering, brightness non-uniformity. It processes video recordings using FFT analysis, Welford's online algorithm, and CIELAB color science, then produces a risk-assessed PDF report. Every measurement is numeric, every threshold is explicit, and every risk rating has a specific number behind it.

3.8K Python LOC
154 Tests
480Hz+ Video Support
9 PDF Pages

CLI · GUI · FFT · CIELAB · Welford's Algorithm · PDF Reports · Docker · Contract Testing

Why this matters

Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

I process multi-minute 4K video using 16 MB for pixel statistics regardless of duration, via Welford's two-delta online algorithm in PixelStatsAccumulator

Flicker detection accuracy validated against synthetic signals: frequency within ±1 Hz, amplitude within 15%, using Hanning-windowed FFT with coherent gain correction

TestSummaryKeyContract catches silent field-rename bugs before they ship. 13 fields verified across both calculate_eye_strain_risk() and generate_pdf_report()

Overview

What the product does and why I built it that way.

I built DisplayAnalysis to quantify display artifacts that are hard to see but easy to feel. It analyzes high-frame-rate captures to detect temporal dithering, PWM backlight flicker, and uniformity issues, then generates a nine-page PDF report with risk assessments and plain-language explanations. Under the hood, Welford's online algorithm keeps pixel statistics at 16 MB regardless of video length, Hanning-windowed FFT detects flicker frequencies within ±1 Hz, and CIELAB color science makes uniformity measurements perceptually meaningful. 154 tests across 10 files validate every threshold boundary.

Highlights

Quick read

PixelStatsAccumulator cuts per-pixel variance memory from O(N×H×W) to O(H×W) using Welford's two-delta update; finalize() output verified within 1% of batch np.std()

calculate_flicker_metrics() applies a Hanning window and corrects amplitude with (2.0 / window_sum) * dominant_mag, detecting a synthetic 10 Hz signal within 1 Hz and 15% amplitude tolerance

TestSummaryKeyContract uses sentinel value 13.83 to prevent silent .get(key, 0) fallbacks from producing wrong risk ratings across 13 required AnalysisSummary fields

Nine-page PDF with matplotlib PdfPages: heatmaps, box plots, FFT spectrum, worst-case frame extraction, per-pixel temporal stability map

Problem

Display artifacts cause eye strain but are invisible in a single frame.

Modern displays use PWM to dim the backlight by rapidly switching it on and off, sometimes hundreds of times per second. They use temporal dithering (FRC) to simulate colors the panel can't produce by alternating between nearby pixel values across frames. None of this shows up in a screenshot.

The artifacts live in the temporal domain and in sub-perceptual color shifts. That makes them hard to pin down even when the symptoms are obvious: headaches, fatigue, the vague sense that one screen feels worse than another. I wanted an objective way to answer "is my display doing something weird?" without lab equipment. Record a high-speed capture, compute metrics, generate a report.

Solution

Frame-by-frame analysis makes the invisible measurable.

I run a five-stage pipeline: frame generation yields one frame at a time through a generator; per-frame metric extraction computes 12 values per frame; Welford's algorithm accumulates pixel statistics in constant memory; post-loop aggregation runs the FFT and assembles typed results; and the report renderer produces a nine-page PDF.

The output is quantified and explainable. You get a risk assessment with specific thresholds: dither below 1% of pixels is LOW, PWM below 0.5% modulation depth is exempt, frequencies below 100 Hz are HIGH. One bad category makes the overall rating HIGH, because eye strain doesn't average out across categories.

Workflow

Capture → Analyze → Report
Record your display with a high-speed camera, run the analysis, and receive a comprehensive PDF report with quantified metrics and risk assessments.
FRAME_GENERATION
METRIC_EXTRACTION
PIXEL_ACCUMULATION
POST_LOOP_AGGREGATION
REPORT_GENERATION

Command Line Interface

CLI

analyze-display <input>

Run full analysis pipeline on a video file or image sequence

CLI

analyze-display --interactive

Launch guided wizard with FPS camera examples and ROI validation

CLI

analyze-display-gui

Launch Tkinter GUI with visual ROI selection via cv2.selectROI()

Python

run_analysis(AnalysisConfig(...))

Programmatic entry; accepts AnalysisConfig or argparse.Namespace via isinstance check

Docker

docker run --rm -v $PWD:/data display-analysis /data/video.mp4

Headless container. CMD ["--help"] makes it self-documenting without arguments

Architecture

The system shape behind the product.

Five modules with one-way dependencies. utils.py handles I/O, metrics.py does pure computation with no side effects (imports nothing from the package except utils.safe_div), models.py defines three typed dataclasses, analyze_display.py orchestrates everything through run_analysis(), and reporting.py at 1042 lines generates the nine-page PDF. The AnalysisResults dataclass is the single typed handoff between analysis and reporting.

User Interface

CLI (argparse)

Interactive Wizard

Tkinter GUI

Docker Entry Point

Orchestration

analyze_display.py (908 lines)

run_analysis()

_process_all_frames()

_build_results()

Analysis

metrics.py (pure, no I/O)

PixelStatsAccumulator

Temporal/Spatial/Color Metrics

FFT Flicker Detection

Data Model

AnalysisConfig

AnalysisSummary

AnalysisResults

WorstCase

Output

PDF Reports (9 pages)

Risk Assessment Engine

CSV/JSON Export

Heatmap PNGs

Analysis Capabilities

8 Detection Methods

Streaming Pixel Variance (Welford's Algorithm)

core

PixelStatsAccumulator.update() uses the two-delta form: delta computed before the mean update, delta2 computed after. This is the numerically stable variant. Memory stays at O(H×W), about 16 MB for a 1080p ROI, regardless of video length. The finalize() output matches batch np.std() within 1%.
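Here is a minimal sketch of that update, assuming a grayscale float frame per call; the bookkeeping in the real accumulator may differ, but the two-delta core looks like this:

```python
import numpy as np

class PixelStatsAccumulator:
    """Streaming per-pixel mean/std via Welford's algorithm; memory is O(H*W), not O(N*H*W)."""

    def __init__(self, height: int, width: int):
        self.n = 0
        self.mean = np.zeros((height, width), dtype=np.float64)
        self.m2 = np.zeros((height, width), dtype=np.float64)  # running sum of squared deviations

    def update(self, frame: np.ndarray) -> None:
        frame = frame.astype(np.float64, copy=False)
        self.n += 1
        delta = frame - self.mean       # delta: computed against the old mean
        self.mean += delta / self.n
        delta2 = frame - self.mean      # delta2: computed against the updated mean
        self.m2 += delta * delta2       # the numerically stable two-delta form

    def finalize(self) -> np.ndarray:
        """Per-pixel standard deviation; tracks batch np.std() closely."""
        return np.sqrt(self.m2 / self.n) if self.n else np.zeros_like(self.m2)
```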

PWM Flicker Analysis (Hanning-Windowed FFT)

core

calculate_flicker_metrics() applies np.hanning(N) to the brightness time series before scipy.fft.rfft. Without windowing, non-integer cycle counts cause spectral leakage that smears energy into adjacent bins. After finding the dominant bin, amplitude is corrected with (2.0 / window_sum) * dominant_mag. Integration tests confirm frequency within ±1 Hz and amplitude within 15%.
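A sketch of the windowed-FFT step, assuming the input is a per-frame mean-brightness series; the function name and signature here are illustrative, not the project's exact API:

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

def dominant_flicker(brightness: np.ndarray, fps: float):
    """Return (frequency_hz, amplitude) of the strongest periodic component."""
    n = len(brightness)
    signal = brightness - brightness.mean()          # remove DC so bin 0 doesn't dominate
    window = np.hanning(n)
    spectrum = np.abs(rfft(signal * window))
    freqs = rfftfreq(n, d=1.0 / fps)

    peak = int(np.argmax(spectrum[1:])) + 1          # skip the residual DC bin
    dominant_mag = spectrum[peak]
    amplitude = (2.0 / window.sum()) * dominant_mag  # undo the window's coherent gain
    return freqs[peak], amplitude

# synthetic check: a 10 Hz flicker of amplitude 5 riding on a bright baseline
t = np.arange(480) / 240.0                           # 2 s recorded at 240 fps
freq, amp = dominant_flicker(100 + 5 * np.sin(2 * np.pi * 10 * t), fps=240.0)
# freq comes back within a bin of 10 Hz, amp close to 5
```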

Dither Pixel Detection (Float32-Safe)

core

Counts pixels that changed by ±1 between frames, the signature of FRC temporal dithering. Uses np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8) instead of exact equality because float32 arithmetic on the converted frames can leave a nominal ±1 difference fractionally off 1.0.
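A sketch of that comparison (the helper name is mine; only the np.isclose call reflects the approach described here):

```python
import numpy as np

def count_dither_pixels(prev: np.ndarray, curr: np.ndarray) -> int:
    """Count pixels whose value moved by exactly ±1 between frames, the FRC signature."""
    abs_diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    # exact equality (abs_diff == 1.0) misses differences that land a hair off 1.0 after
    # float32 processing; np.isclose absorbs that without admitting 0.9 or 1.1
    dither_mask = np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8)
    return int(np.count_nonzero(dither_mask))
```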

Risk Assessment Engine

core

Four categories with explicit thresholds: dither below 1% is LOW, 1–10% MODERATE, 10%+ HIGH. PWM modulation depth below 0.5% is exempt. Frequency below 100 Hz is HIGH, 100–250 Hz MODERATE, above 250 Hz LOW. Text/Edge stability boundaries at 0.5 and 2.0. Overall rule: any single HIGH makes the entire rating HIGH.
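Expressed as code, the per-category thresholds look roughly like this; the function names are illustrative, the boundaries are the ones listed above, and I'm treating the sub-0.5% PWM exemption as an automatic LOW:

```python
def dither_risk(dithering_pixel_pct: float) -> str:
    if dithering_pixel_pct < 1.0:
        return "LOW"
    return "MODERATE" if dithering_pixel_pct < 10.0 else "HIGH"

def pwm_risk(modulation_depth_pct: float, frequency_hz: float) -> str:
    if modulation_depth_pct < 0.5:     # too shallow to matter at any frequency
        return "LOW"
    if frequency_hz < 100.0:
        return "HIGH"
    return "MODERATE" if frequency_hz <= 250.0 else "LOW"

def stability_risk(normalized_value: float) -> str:
    if normalized_value < 0.5:         # text/edge stability boundaries at 0.5 and 2.0
        return "LOW"
    return "MODERATE" if normalized_value < 2.0 else "HIGH"
```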

Temporal Stability (MAD/RMS/StdDev)

core

Three complementary frame-to-frame measures: MAD catches widespread small changes, RMS amplifies large isolated changes, StdDev reveals spatial unevenness. All normalized by mean ROI brightness.
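A sketch of those three measures on a pair of grayscale ROI frames (helper name and normalization details are my assumptions):

```python
import numpy as np

def temporal_stability(prev: np.ndarray, curr: np.ndarray) -> dict:
    """Frame-to-frame change measured three ways, each normalized by mean ROI brightness."""
    diff = curr.astype(np.float64) - prev.astype(np.float64)
    mean_brightness = max(float(curr.mean()), 1e-9)                   # guard against an all-black ROI
    return {
        "mad": float(np.mean(np.abs(diff))) / mean_brightness,        # widespread small changes
        "rms": float(np.sqrt(np.mean(diff ** 2))) / mean_brightness,  # weights large isolated changes
        "stddev": float(np.std(diff)) / mean_brightness,              # spatial unevenness of the change
    }
```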

CIELAB Color Uniformity

core

Block-based color analysis using scikit-image's rgb2lab, chosen over OpenCV's implementation for correct CIE D65 white point. Perceptual uniformity means a delta‑E of ~1 corresponds to a just-noticeable difference. Measures L*, a*, b* channels independently via vectorized block means.
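A sketch of the block-mean measurement, assuming an OpenCV BGR frame and an 8×8 grid; the grid size and helper name are my choices:

```python
import numpy as np
from skimage.color import rgb2lab

def lab_block_uniformity(frame_bgr: np.ndarray, blocks: int = 8) -> dict:
    """Spread of block-mean L*, a*, b* across a grid of tiles; larger spread = less uniform."""
    rgb = frame_bgr[..., ::-1].astype(np.float64) / 255.0   # BGR -> RGB in [0, 1] for scikit-image
    lab = rgb2lab(rgb)                                       # CIE 1976 L*a*b*, D65 white point

    h, w = lab.shape[:2]
    bh, bw = h // blocks, w // blocks
    tiles = lab[: bh * blocks, : bw * blocks]                # crop to a multiple of the tile size
    block_means = tiles.reshape(blocks, bh, blocks, bw, 3).mean(axis=(1, 3))

    return {
        "L_std": float(block_means[..., 0].std()),   # lightness non-uniformity
        "a_std": float(block_means[..., 1].std()),   # green-red drift
        "b_std": float(block_means[..., 2].std()),   # blue-yellow drift
    }
```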

Contract Testing (TestSummaryKeyContract)

core

Guards against the most common silent failure: dictionary field renames between analysis and reporting. Three tests verify all 13 AnalysisSummary.to_dict() fields match what calculate_eye_strain_risk() and generate_pdf_report() consume. Sentinel value 13.83 catches wrong-field fallbacks that would produce incorrect risk ratings.
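A self-contained toy version of the pattern. The summarize_risk stand-in and the single field name are mine; the real tests exercise calculate_eye_strain_risk() and generate_pdf_report() with all 13 fields:

```python
SENTINEL = 13.83  # lands in the HIGH range and never matches an accidental .get(key, 0) default

def summarize_risk(summary: dict) -> str:
    """Stand-in consumer that reads fields the way the real risk engine does, via .get()."""
    dither_pct = summary.get("dithering_pixel_percentage", 0)  # a renamed key silently becomes 0
    return f"dithering affects {dither_pct}% of pixels"

def test_sentinel_survives_the_pipeline():
    summary = {"dithering_pixel_percentage": SENTINEL}
    # if producer and consumer ever disagree on the key, 13.83 never reaches the output
    assert str(SENTINEL) in summarize_risk(summary)
```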

Interactive Setup Wizard

dx

State-machine text wizard that validates ROI coordinates, FPS values, and skip settings before anything reaches the analysis pipeline. Includes common camera examples for FPS selection and an --override-fps escape hatch for slow-motion recordings where container metadata lies about capture rate.

Tkinter GUI Interface

dx

Native file selection, ROI drawing with cv2.selectROI() on the first frame, and background-thread analysis so the UI stays responsive. 216 lines. Log messages forwarded to the text pane via a queue polled every 200 ms.
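A sketch of the queue-polling arrangement; the widget layout and the log-callback signature are illustrative, not the project's exact code:

```python
import queue
import threading
import tkinter as tk
from tkinter.scrolledtext import ScrolledText

def launch_gui(run_analysis_fn):
    root = tk.Tk()
    log_pane = ScrolledText(root, height=20)
    log_pane.pack(fill="both", expand=True)
    log_queue = queue.Queue()

    def worker():
        # the analysis thread only pushes strings; it never touches Tk widgets directly
        run_analysis_fn(log=log_queue.put)

    def poll_queue():
        while True:
            try:
                line = log_queue.get_nowait()
            except queue.Empty:
                break
            log_pane.insert("end", line + "\n")
            log_pane.see("end")
        root.after(200, poll_queue)   # re-arm the 200 ms poll on the Tk main loop

    threading.Thread(target=worker, daemon=True).start()
    root.after(200, poll_queue)
    root.mainloop()
```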

Worst-Case Frame Capture

core

Nine WorstCase instances track the single highest-valued frame per metric during the loop. Pages 6 and 7 of the PDF show the actual display frame at the worst temporal instability moment and the brightness block-mean heatmap at the worst spatial uniformity moment.
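A sketch of the per-metric tracker; the field names beyond the WorstCase class name are my guesses:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class WorstCase:
    """Remember only the single worst frame for one metric, not the whole video."""
    metric_name: str
    value: float = float("-inf")
    frame_index: int = -1
    frame: Optional[np.ndarray] = None

    def consider(self, value: float, frame_index: int, frame: np.ndarray) -> None:
        if value > self.value:
            self.value = value
            self.frame_index = frame_index
            self.frame = frame.copy()   # copy so the frame generator can discard the original
```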

Docker Deployment (Headless + noVNC)

dx

Two Dockerfiles: headless CLI image with COPY --from=ghcr.io/astral-sh/uv multi-stage install, and GUI image with Xvfb + x11vnc + websockify stack accessible at http://localhost:6080. _has_display_environment() checks DISPLAY, WAYLAND_DISPLAY, and MIR_SOCKET to auto-detect headless mode.
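The headless check is essentially an environment-variable probe; a minimal version:

```python
import os

def _has_display_environment() -> bool:
    """True when a display server looks reachable; False means skip cv2.imshow and Tkinter paths."""
    return any(os.environ.get(var) for var in ("DISPLAY", "WAYLAND_DISPLAY", "MIR_SOCKET"))
```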

Tradeoffs

The decisions worth calling out.

Why Welford's online algorithm instead of batch np.std()?

A 30-second 60fps video at 1080p produces 1800 frames at ~2 MB each, totaling 3.6 GB in RAM. Welford's two-delta update computes per-pixel variance in one pass with O(H×W) state, about 16 MB for 1080p. The 1% accuracy tradeoff against batch computation is acceptable because risk thresholds sit at round numbers (0.5, 2.0).

Alternatives considered: Batch np.std() at end (crashes on long videos) · Downsample to fewer frames · Naive running sum-of-squares (numerically unstable)

Why FFT with Hanning windowing for flicker detection?

Frame differencing catches obvious changes but misses periodic flicker that repeats every few frames. FFT gives frequency and amplitude, not just a boolean. The Hanning window prevents spectral leakage from non-integer cycle counts. Without it, a 60 Hz signal might smear from 55–65 Hz. Amplitude correction via (2.0 / window_sum) * dominant_mag recovers the true modulation depth that windowing attenuates.

Alternatives considered: Frame differencing · Peak detection · Autocorrelation · Rectangular-window FFT

Why scikit-image over OpenCV for CIELAB conversion?

OpenCV's cv2.COLOR_BGR2Lab uses a D65 white point with non-standard normalization. scikit-image's rgb2lab implements the CIE 1976 standard exactly, with L* in [0, 100] and a*, b* in [-128, 127]. For uniformity measurements, the numbers need to be interpretable in absolute terms. A delta‑E of ~1 in CIELAB corresponds to a just-noticeable difference for an average observer.

Alternatives considered: OpenCV cvtColor · RGB Euclidean distance · HSV/HSL · XYZ color space

Why contract testing with a sentinel value?

AnalysisSummary.to_dict() produces data that both calculate_eye_strain_risk() and generate_pdf_report() consume via .get(field, 0). If a field is renamed, the fallback silently returns zero and produces a wrong risk rating with no exception. TestSummaryKeyContract passes 13.83 through the pipeline and checks that exact value appears in the output. 13.83 lands in the HIGH range and is unlikely to appear by accident.

Alternatives considered: Shared dataclasses everywhere (requires rewriting the 1042-line reporting.py) · Implicit kwargs · Schema validation registries

Why np.isclose instead of exact equality for dither detection?

FRC dithering shows up as ±1 pixel changes between frames. But once frames are converted to float32 and processed, a difference that should be exactly 1 can land a hair away from 1.0. Using abs_diff == 1.0 misses those legitimate dither pixels. np.isclose with default tolerances (rtol=1e-5, atol=1e-8) accepts values within roughly 1e-5 of 1.0 without false positives at 0.9 or 1.1.

Alternatives considered: Integer arithmetic on uint8 · Larger tolerance threshold · Exact float comparison

Why four interface modes sharing one pipeline?

CLI enables automation and scripting. The interactive wizard reduces setup mistakes with FPS/ROI validation and camera examples. The GUI provides visual ROI confirmation with cv2.selectROI(). Docker eliminates OpenCV's native library dependencies. All four converge on run_analysis(), so a bug fixed in CLI testing is fixed for the GUI too.

Alternatives considered: CLI only · GUI only · Web interface

Why PDF reports over interactive dashboards?

I optimized for shareability. PDFs are self-contained, printable, work offline, and make it straightforward to compare displays or send results to someone else. The nine-page structure is deterministic: same input always produces the same report.

Alternatives considered: HTML dashboard · Jupyter notebook · Terminal output only

Why conservative overall risk (any HIGH = overall HIGH)?

Eye strain doesn't average out. A display with perfect uniformity and zero dithering but 60 Hz PWM flicker is still unusable. One bad dimension ruins the experience. The scoring maps LOW=0, MODERATE=1, HIGH=2. Overall is MODERATE if the average exceeds 0.5 and no category is HIGH.

Alternatives considered: Weighted average of all categories · Majority voting · User-configurable severity weights
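A sketch of that combination rule; the function name is mine, while the LOW/MODERATE/HIGH mapping and the 0.5 average cutoff are the ones described above:

```python
def overall_risk(category_ratings: dict) -> str:
    scores = {"LOW": 0, "MODERATE": 1, "HIGH": 2}
    values = [scores[rating] for rating in category_ratings.values()]
    if max(values) == 2:                    # any single HIGH poisons the overall rating
        return "HIGH"
    if sum(values) / len(values) > 0.5:     # elevated on average, nothing severe
        return "MODERATE"
    return "LOW"
```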

Tech Stack

What actually shipped the system.

Language

Python 3.8+

Core implementation with src layout via hatchling build backend

Computer Vision

OpenCV 4.8+

Video I/O, frame extraction, ROI selection, BGR color pipeline

scikit-image 0.21+

CIELAB conversion via rgb2lab (correct CIE D65 white point, unlike OpenCV's variant)

Scientific Computing

NumPy 1.24+

Welford accumulator arrays (float64), np.isclose for dither detection, np.hanning for FFT windowing

SciPy 1.10+

rfft/rfftfreq for flicker frequency detection (faster than NumPy's FFT for large N)

pandas 2.0+

Per-frame metrics DataFrame, CSV/JSON export, .describe().T for report statistics

Visualization

Matplotlib 3.7+

Nine-page PDF via PdfPages backend, heatmaps (inferno/coolwarm), FFT spectrum plots

Infrastructure

Docker

Two images: headless CLI (python:3.12-slim) and GUI (noVNC via Xvfb + x11vnc + websockify)

uv

Dependency management with frozen lockfile; COPY --from=ghcr.io/astral-sh/uv in Dockerfiles

GitHub Actions

CI pipeline: uv sync, pytest across Python versions

Testing & Quality

pytest 7.0+

154 tests across 10 files; conftest fixtures provide synthetic frames without real video

ruff

Linter and formatter replacing flake8, black, isort, and pyupgrade in one binary

mypy

Gradual static type checking with per-file overrides for Tkinter in gui.py

Challenges

What was hard and how I dealt with it.

Per-pixel temporal analysis requires O(N×H×W) memory. A 30-second 60fps 1080p video needs 3.6 GB.

I replaced the frame list with PixelStatsAccumulator using Welford's two-delta update. Memory dropped to ~16 MB for 1080p regardless of video length. The 1% accuracy difference against batch np.std() is well within the risk threshold boundaries (0.5, 2.0).

FFT on finite recordings produces spectral leakage when the signal doesn't contain integer cycle counts. A 60 Hz signal can smear across 55–65 Hz, causing wrong frequency classification.

I apply a Hanning window before the transform, which tapers the signal to zero at both ends. The window attenuates amplitude, so I correct with (2.0 / window_sum) * dominant_mag. The integration test confirms ±1 Hz frequency accuracy and 15% amplitude accuracy against a known 10 Hz synthetic signal.

Dictionary field renames between AnalysisSummary.to_dict() and calculate_eye_strain_risk() fail silently. The .get(field, 0) fallback returns zero, producing a wrong risk rating with no exception.

I wrote TestSummaryKeyContract with three tests: verify all 13 consumed fields exist, and pass sentinel value 13.83 through the pipeline to confirm it appears in the output string. 13.83 is in the HIGH range and won't match any accidental default.

FRC dither detection needs to count ±1 pixel changes, but after uint8 frames are converted to float32 and processed, a nominal ±1 difference can land slightly off 1.0. Using abs_diff == 1.0 misses real dither pixels.

I switched to np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8), which catches values within roughly 1e-5 of 1.0 without false positives from actual motion artifacts at 0.9 or 1.1.

ROI preview calls cv2.imshow(), which crashes in Docker, SSH, and CI environments where no display server exists.

I added _has_display_environment() that checks DISPLAY, WAYLAND_DISPLAY, and MIR_SOCKET before attempting GUI operations. For container users who still need visual ROI selection, I built a second Docker image with Xvfb + x11vnc + websockify, accessible via browser at port 6080.

OpenCV reads BGR, scikit-image expects RGB normalized to [0,1] for CIELAB, and OpenCV's own CIELAB conversion uses a non-standard D65 white point normalization.

I made the conversion chain explicit: BGR to RGB, divide by 255.0 to float64, then skimage.color.rgb2lab. I chose scikit-image specifically because it implements the CIE 1976 standard exactly, making block-mean StdDev values interpretable in absolute perceptual terms.

Outcomes

What shipped and what improved.

I process multi-minute 4K video using 16 MB for pixel statistics regardless of duration, via Welford's two-delta online algorithm in PixelStatsAccumulator

Flicker detection accuracy validated against synthetic signals: frequency within ±1 Hz, amplitude within 15%, using Hanning-windowed FFT with coherent gain correction

TestSummaryKeyContract catches silent field-rename bugs before they ship. 13 fields verified across both calculate_eye_strain_risk() and generate_pdf_report()

154 tests across 10 files. test_eye_strain_risk.py alone has 41 tests covering every threshold boundary for all four risk categories

Nine-page PDF generated headlessly via matplotlib PdfPages: risk summary, descriptive stats, temporal metrics, spatial/color plots, FFT spectrum, worst-case frames, CIELAB heatmaps, per-pixel stability map

No external service dependencies. The tool runs fully offline and produces deterministic output for the same input and configuration

Next Step

Investigate your display

If you’re experiencing eye strain, headaches, or fatigue, use this tool to objectively measure your display’s flicker and dithering artifacts.