Case Study
DisplayAnalysis
Detect the display artifacts that cause eye strain.
I built DisplayAnalysis to detect screen artifacts that cause eye strain and headaches: PWM flicker, temporal dithering, brightness non-uniformity. It processes video recordings using FFT analysis, Welford's online algorithm, and CIELAB color science, then produces a risk-assessed PDF report. Every measurement is numeric, every threshold is explicit, and every risk rating has a specific number behind it.
3.8K
Python LOC
154
Tests
480Hz+
Video Support
9
PDF Pages
Why this matters
I process multi-minute 4K video using 16 MB for pixel statistics regardless of duration, via Welford's two-delta online algorithm in PixelStatsAccumulator
Flicker detection accuracy validated against synthetic signals: frequency within ±1 Hz, amplitude within 15%, using Hanning-windowed FFT with coherent gain correction
TestSummaryKeyContract catches silent field-rename bugs before they ship. 13 fields verified across both calculate_eye_strain_risk() and generate_pdf_report()
Overview
What the product does and why I built it that way.
I built DisplayAnalysis to quantify display artifacts that are hard to see but easy to feel. It analyzes high-frame-rate captures to detect temporal dithering, PWM backlight flicker, and uniformity issues, then generates a nine-page PDF report with risk assessments and plain-language explanations. Under the hood, Welford's online algorithm keeps pixel statistics at 16 MB regardless of video length, Hanning-windowed FFT detects flicker frequencies within ±1 Hz, and CIELAB color science makes uniformity measurements perceptually meaningful. 154 tests across 10 files validate every threshold boundary.
Highlights
PixelStatsAccumulator cuts per-pixel variance memory from O(N×H×W) to O(H×W) using Welford's two-delta update; finalize() output verified within 1% of batch np.std()
calculate_flicker_metrics() applies a Hanning window and corrects amplitude with (2.0 / window_sum) * dominant_mag, detecting a synthetic 10 Hz signal within 1 Hz and 15% amplitude tolerance
TestSummaryKeyContract uses sentinel value 13.83 to prevent silent .get(key, 0) fallbacks from producing wrong risk ratings across 13 required AnalysisSummary fields
Nine-page PDF with matplotlib PdfPages: heatmaps, box plots, FFT spectrum, worst-case frame extraction, per-pixel temporal stability map
Problem
Modern displays use PWM to dim the backlight by rapidly switching it on and off, sometimes hundreds of times per second. They use temporal dithering (FRC) to simulate colors the panel can't produce by alternating between nearby pixel values across frames. None of this shows up in a screenshot.
The artifacts live in the temporal domain and in sub-perceptual color shifts. That makes them hard to pin down even when the symptoms are obvious: headaches, fatigue, the vague sense that one screen feels worse than another. I wanted an objective way to answer "is my display doing something weird?" without lab equipment. Record a high-speed capture, compute metrics, generate a report.
Solution
I run a five-stage pipeline: frame generation yields one frame at a time through a generator; per-frame metric extraction computes 12 values per frame; Welford's algorithm accumulates pixel statistics in constant memory; post-loop aggregation runs the FFT and assembles typed results; and the report renderer produces a nine-page PDF.
The output is quantified and explainable. You get a risk assessment with specific thresholds: dither below 1% of pixels is LOW, PWM below 0.5% modulation depth is exempt, frequencies below 100 Hz are HIGH. One bad category makes the overall rating HIGH, because eye strain doesn't average out across categories.
Workflow
Command Line Interface
analyze-display <input>
Run full analysis pipeline on a video file or image sequence
analyze-display --interactive
Launch guided wizard with FPS camera examples and ROI validation
analyze-display-gui
Launch Tkinter GUI with visual ROI selection via cv2.selectROI()
run_analysis(AnalysisConfig(...))
Programmatic entry; accepts AnalysisConfig or argparse.Namespace via isinstance check (sketched after this list)
docker run --rm -v $PWD:/data display-analysis /data/video.mp4
Headless container. CMD ["--help"] makes it self-documenting without arguments
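A minimal sketch of the programmatic path, assuming an illustrative package name and AnalysisConfig fields (input_path, override_fps, roi) rather than the project's exact signature:

from display_analysis import AnalysisConfig, run_analysis

config = AnalysisConfig(
    input_path="capture_480fps.mp4",   # high-frame-rate recording of the display
    override_fps=480.0,                # for recordings whose container metadata lies
    roi=(100, 100, 800, 600),          # x, y, width, height of the screen region
)
results = run_analysis(config)         # same pipeline behind all four interfaces
print(results.summary.to_dict())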
Architecture
The system shape behind the product.
Five modules with one-way dependencies. utils.py handles I/O, metrics.py does pure computation with no side effects (imports nothing from the package except utils.safe_div), models.py defines three typed dataclasses, analyze_display.py orchestrates everything through run_analysis(), and reporting.py at 1042 lines generates the nine-page PDF. The AnalysisResults dataclass is the single typed handoff between analysis and reporting.
User Interface
CLI (argparse)
Interactive Wizard
Tkinter GUI
Docker Entry Point
Orchestration
analyze_display.py (908 lines)
run_analysis()
_process_all_frames()
_build_results()
Analysis
metrics.py (pure, no I/O)
PixelStatsAccumulator
Temporal/Spatial/Color Metrics
FFT Flicker Detection
Data Model
AnalysisConfig
AnalysisSummary
AnalysisResults
WorstCase
Output
PDF Reports (9 pages)
Risk Assessment Engine
CSV/JSON Export
Heatmap PNGs
Analysis Capabilities
Streaming Pixel Variance (Welford's Algorithm)
PixelStatsAccumulator.update() uses the two-delta form: delta computed before the mean update, delta2 computed after. This is the numerically stable variant. Memory stays at O(H×W), about 16 MB for a 1080p ROI, regardless of video length. The finalize() output matches batch np.std() within 1%.
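A minimal sketch of that update, assuming the internals follow the textbook two-delta form; method names mirror the description above:

import numpy as np

class PixelStatsAccumulator:
    def __init__(self, height, width):
        self.n = 0
        self.mean = np.zeros((height, width), dtype=np.float64)
        self.m2 = np.zeros((height, width), dtype=np.float64)  # running sum of squared deviations

    def update(self, frame):
        self.n += 1
        delta = frame - self.mean        # first delta: against the old mean
        self.mean += delta / self.n
        delta2 = frame - self.mean       # second delta: against the updated mean
        self.m2 += delta * delta2        # the numerically stable product form

    def finalize(self):
        # per-pixel standard deviation, comparable to batch np.std()
        return np.sqrt(self.m2 / max(self.n, 1))

State is just the two float64 arrays, so memory stays flat no matter how many frames stream through update().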
PWM Flicker Analysis (Hanning-Windowed FFT)
calculate_flicker_metrics() applies np.hanning(N) to the brightness time series before scipy.fft.rfft. Without windowing, non-integer cycle counts cause spectral leakage that smears energy into adjacent bins. After finding the dominant bin, amplitude is corrected with (2.0 / window_sum) * dominant_mag. Integration tests confirm frequency within ±1 Hz and amplitude within 15%.
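A sketch of the same shape, assuming a 1-D brightness-per-frame series; the real calculate_flicker_metrics() may differ in detail:

import numpy as np
from scipy.fft import rfft, rfftfreq

def dominant_flicker(brightness, fps):
    n = len(brightness)
    window = np.hanning(n)
    # subtract the mean so the DC bin doesn't swamp the flicker peak
    spectrum = np.abs(rfft((brightness - brightness.mean()) * window))
    freqs = rfftfreq(n, d=1.0 / fps)
    peak = int(np.argmax(spectrum[1:])) + 1            # skip the DC bin
    amplitude = (2.0 / window.sum()) * spectrum[peak]  # undo the window's coherent gain
    return freqs[peak], amplitude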
Dither Pixel Detection (Float32-Safe)
Counts pixels that changed by ±1 between frames, the signature of FRC temporal dithering. Uses np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8) instead of exact equality because uint8-to-float32 conversion introduces precision artifacts at some pixel values.
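A sketch of that comparison, assuming grayscale frames:

import numpy as np

def count_dither_pixels(prev, curr):
    abs_diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    # exact equality (abs_diff == 1.0) misses pixels perturbed by the
    # uint8 -> float32 round trip; np.isclose tolerates the perturbation
    return int(np.count_nonzero(np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8)))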
Risk Assessment Engine
Four categories with explicit thresholds: dither below 1% is LOW, 1–10% MODERATE, 10%+ HIGH. PWM modulation depth below 0.5% is exempt. Frequency below 100 Hz is HIGH, 100–250 Hz MODERATE, above 250 Hz LOW. Text/Edge stability boundaries at 0.5 and 2.0. Overall rule: any single HIGH makes the entire rating HIGH.
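A sketch of two of those rules (the dither ladder and the overall rule), with helper names of my own; only the thresholds come from the project:

def dither_risk(dither_pct):
    if dither_pct < 1.0:
        return "LOW"
    return "MODERATE" if dither_pct < 10.0 else "HIGH"

def overall_risk(categories):
    if "HIGH" in categories:             # one bad category ruins the display
        return "HIGH"
    score = {"LOW": 0, "MODERATE": 1}
    avg = sum(score[c] for c in categories) / len(categories)
    return "MODERATE" if avg > 0.5 else "LOW"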
Temporal Stability (MAD/RMS/StdDev)
Three complementary frame-to-frame measures: MAD catches widespread small changes, RMS amplifies large isolated changes, StdDev reveals spatial unevenness. All normalized by mean ROI brightness.
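A sketch of the three measures under the normalization described, with names of my own:

import numpy as np

def temporal_stability(prev, curr):
    diff = curr.astype(np.float64) - prev.astype(np.float64)
    norm = max(float(curr.mean()), 1e-9)                     # mean ROI brightness
    return {
        "mad": float(np.mean(np.abs(diff))) / norm,          # widespread small changes
        "rms": float(np.sqrt(np.mean(diff ** 2))) / norm,    # large isolated changes
        "std": float(np.std(diff)) / norm,                   # spatial unevenness
    }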
CIELAB Color Uniformity
Block-based color analysis using scikit-image's rgb2lab, chosen over OpenCV's implementation for correct CIE D65 white point. Perceptual uniformity means a delta‑E of ~1 corresponds to a just-noticeable difference. Measures L*, a*, b* channels independently via vectorized block means.
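A sketch of the block-mean computation; the 32-pixel block size is an assumption, not a documented value:

import numpy as np
from skimage.color import rgb2lab

def lab_block_means(frame_bgr, block=32):
    # explicit chain: BGR -> RGB -> [0, 1] float -> CIELAB
    rgb = frame_bgr[:, :, ::-1].astype(np.float64) / 255.0
    lab = rgb2lab(rgb)                      # L* in [0, 100]; a*, b* roughly [-128, 127]
    h, w = lab.shape[:2]
    h, w = h - h % block, w - w % block     # crop to whole blocks
    tiles = lab[:h, :w].reshape(h // block, block, w // block, block, 3)
    return tiles.mean(axis=(1, 3))          # per-block L*, a*, b* means, fully vectorized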
Contract Testing (TestSummaryKeyContract)
Guards against the most common silent failure: dictionary field renames between analysis and reporting. Three tests verify all 13 AnalysisSummary.to_dict() fields match what calculate_eye_strain_risk() and generate_pdf_report() consume. Sentinel value 13.83 catches wrong-field fallbacks that would produce incorrect risk ratings.
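A sketch of the sentinel pattern; make_summary(), REQUIRED_FIELDS, and the report's string output are simplified assumptions, not the project's actual test code:

SENTINEL = 13.83   # lands in the HIGH range; unlikely to appear by accident

def test_risk_engine_reads_every_field():
    data = make_summary().to_dict()            # hypothetical fixture
    for field in REQUIRED_FIELDS:              # the 13 consumed keys
        report = calculate_eye_strain_risk({**data, field: SENTINEL})
        # if the engine looked up a renamed key, .get(key, 0) would
        # swallow the sentinel and this assertion would fail
        assert str(SENTINEL) in str(report)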
Interactive Setup Wizard
State-machine text wizard that validates ROI coordinates, FPS values, and skip settings before anything reaches the analysis pipeline. Includes common camera examples for FPS selection and an --override-fps escape hatch for slow-motion recordings where container metadata lies about capture rate.
Tkinter GUI Interface
Native file selection, ROI drawing with cv2.selectROI() on the first frame, and background-thread analysis so the UI stays responsive. 216 lines. Log messages forwarded to the text pane via a queue polled every 200 ms.
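A sketch of that polling loop, with illustrative widget names:

import queue
import tkinter as tk

log_queue = queue.Queue()   # the worker thread puts log lines here

def poll_log(root, pane):
    try:
        while True:         # drain everything queued since the last poll
            pane.insert(tk.END, log_queue.get_nowait() + "\n")
    except queue.Empty:
        pass
    root.after(200, poll_log, root, pane)   # re-arm the 200 ms poll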
Worst-Case Frame Capture
Nine WorstCase instances track the single highest-valued frame per metric during the loop. Pages 6 and 7 of the PDF show the actual display frame at the worst temporal instability moment and the brightness block-mean heatmap at the worst spatial uniformity moment.
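A sketch of a per-metric tracker consistent with that description; the field names are a reconstruction:

from __future__ import annotations

from dataclasses import dataclass

import numpy as np

@dataclass
class WorstCase:
    metric: str
    value: float = float("-inf")
    frame_index: int = -1
    frame: np.ndarray | None = None

    def update(self, value, index, frame):
        if value > self.value:           # keep only the single worst frame
            self.value, self.frame_index = value, index
            self.frame = frame.copy()    # the streaming loop discards originals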
Docker Deployment (Headless + noVNC)
Two Dockerfiles: headless CLI image with COPY --from=ghcr.io/astral-sh/uv multi-stage install, and GUI image with Xvfb + x11vnc + websockify stack accessible at http://localhost:6080. _has_display_environment() checks DISPLAY, WAYLAND_DISPLAY, and MIR_SOCKET to auto-detect headless mode.
Tradeoffs
Why Welford's online algorithm instead of batch np.std()?
A 30-second 60fps video at 1080p produces 1800 frames at ~2 MB each, totaling 3.6 GB in RAM. Welford's two-delta update computes per-pixel variance in one pass with O(H×W) state, about 16 MB for 1080p. The 1% accuracy tradeoff against batch computation is acceptable because risk thresholds sit at round numbers (0.5, 2.0).
Why FFT with Hanning windowing for flicker detection?
Frame differencing catches obvious changes but misses periodic flicker that repeats every few frames. FFT gives frequency and amplitude, not just a boolean. The Hanning window prevents spectral leakage from non-integer cycle counts. Without it, a 60 Hz signal might smear from 55–65 Hz. Amplitude correction via (2.0 / window_sum) * dominant_mag recovers the true modulation depth that windowing attenuates.
Why scikit-image over OpenCV for CIELAB conversion?
OpenCV's cv2.COLOR_BGR2Lab uses a D65 white point with non-standard normalization. scikit-image's rgb2lab implements the CIE 1976 standard exactly, with L* in [0, 100] and a*, b* in [-128, 127]. For uniformity measurements, the numbers need to be interpretable in absolute terms. A delta‑E of ~1 in CIELAB corresponds to a just-noticeable difference for an average observer.
Why contract testing with a sentinel value?
AnalysisSummary.to_dict() produces data that both calculate_eye_strain_risk() and generate_pdf_report() consume via .get(field, 0). If a field is renamed, the fallback silently returns zero and produces a wrong risk rating with no exception. TestSummaryKeyContract passes 13.83 through the pipeline and checks that exact value appears in the output. 13.83 lands in the HIGH range and is unlikely to appear by accident.
Why np.isclose instead of exact equality for dither detection?
FRC dithering shows up as ±1 pixel changes between frames. But uint8-to-float32 conversion is not perfectly exact for all values, so abs_diff == 1.0 misses legitimate dither pixels at certain values. np.isclose with default tolerances (rtol=1e-5, atol=1e-8) accepts values within about 1e-5 of 1.0, wide enough to absorb conversion error with no false positives at 0.9 or 1.1.
Why four interface modes sharing one pipeline?
CLI enables automation and scripting. The interactive wizard reduces setup mistakes with FPS/ROI validation and camera examples. The GUI provides visual ROI confirmation with cv2.selectROI(). Docker eliminates OpenCV's native library dependencies. All four converge on run_analysis(), so a bug fixed in CLI testing is fixed for the GUI too.
Why PDF reports over interactive dashboards?
I optimized for shareability. PDFs are self-contained, printable, work offline, and make it straightforward to compare displays or send results to someone else. The nine-page structure is deterministic: same input always produces the same report.
Why conservative overall risk (any HIGH = overall HIGH)?
Eye strain doesn't average out. A display with perfect uniformity and zero dithering but 60 Hz PWM flicker is still unusable. One bad dimension ruins the experience. The scoring maps LOW=0, MODERATE=1, HIGH=2. Overall is MODERATE if the average exceeds 0.5 and no category is HIGH.
Tech Stack
Language
Python 3.8+
Core implementation with src layout via hatchling build backend
Computer Vision
OpenCV 4.8+
Video I/O, frame extraction, ROI selection, BGR color pipeline
scikit-image 0.21+
CIELAB conversion via rgb2lab (correct CIE D65 white point, unlike OpenCV's variant)
Scientific Computing
NumPy 1.24+
Welford accumulator arrays (float64), np.isclose for dither detection, np.hanning for FFT windowing
SciPy 1.10+
rfft/rfftfreq for flicker frequency detection (faster than NumPy's FFT for large N)
pandas 2.0+
Per-frame metrics DataFrame, CSV/JSON export, .describe().T for report statistics
Visualization
Matplotlib 3.7+
Nine-page PDF via PdfPages backend, heatmaps (inferno/coolwarm), FFT spectrum plots
Infrastructure
Docker
Two images: headless CLI (python:3.12-slim) and GUI (noVNC via Xvfb + x11vnc + websockify)
uv
Dependency management with frozen lockfile; COPY --from=ghcr.io/astral-sh/uv in Dockerfiles
GitHub Actions
CI pipeline: uv sync, pytest across Python versions
Testing & Quality
pytest 7.0+
154 tests across 10 files; conftest fixtures provide synthetic frames without real video
ruff
Linter and formatter replacing flake8, black, isort, and pyupgrade in one binary
mypy
Gradual static type checking with per-file overrides for Tkinter in gui.py
Challenges
Per-pixel temporal analysis requires O(N×H×W) memory. A 30-second 60fps 1080p video needs 3.6 GB.
I replaced the frame list with PixelStatsAccumulator using Welford's two-delta update. Memory dropped to ~16 MB for 1080p regardless of video length. The 1% accuracy difference against batch np.std() is well within the risk threshold boundaries (0.5, 2.0).
FFT on finite recordings produces spectral leakage when the signal doesn't contain integer cycle counts. A 60 Hz signal can smear across 55–65 Hz, causing wrong frequency classification.
I apply a Hanning window before the transform, which tapers the signal to zero at both ends. The window attenuates amplitude, so I correct with (2.0 / window_sum) * dominant_mag. The integration test confirms ±1 Hz frequency accuracy and 15% amplitude accuracy against a known 10 Hz synthetic signal.
Dictionary field renames between AnalysisSummary.to_dict() and calculate_eye_strain_risk() fail silently. The .get(field, 0) fallback returns zero, producing a wrong risk rating with no exception.
I wrote TestSummaryKeyContract with three tests: verify all 13 consumed fields exist, and pass sentinel value 13.83 through the pipeline to confirm it appears in the output string. 13.83 is in the HIGH range and won't match any accidental default.
FRC dither detection needs to count ±1 pixel changes, but uint8-to-float32 conversion loses precision at some values. Using abs_diff == 1.0 misses real dither pixels.
I switched to np.isclose(abs_diff, 1.0, rtol=1e-5, atol=1e-8), which accepts values within about 1e-5 of 1.0, absorbing conversion error without false positives from actual motion artifacts at 0.9 or 1.1.
ROI preview calls cv2.imshow(), which crashes in Docker, SSH, and CI environments where no display server exists.
I added _has_display_environment() that checks DISPLAY, WAYLAND_DISPLAY, and MIR_SOCKET before attempting GUI operations. For container users who still need visual ROI selection, I built a second Docker image with Xvfb + x11vnc + websockify, accessible via browser at port 6080.
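A sketch of that check, reconstructed from the description:

import os

def has_display_environment():
    # any of these being set suggests a reachable display server
    return any(os.environ.get(var)
               for var in ("DISPLAY", "WAYLAND_DISPLAY", "MIR_SOCKET"))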
OpenCV reads BGR, scikit-image expects RGB normalized to [0,1] for CIELAB, and OpenCV's own CIELAB conversion uses a non-standard D65 white point normalization.
I made the conversion chain explicit: BGR to RGB, divide by 255.0 to float64, then skimage.color.rgb2lab. I chose scikit-image specifically because it implements the CIE 1976 standard exactly, making block-mean StdDev values interpretable in absolute perceptual terms.
Outcomes
I process multi-minute 4K video using 16 MB for pixel statistics regardless of duration, via Welford's two-delta online algorithm in PixelStatsAccumulator
Flicker detection accuracy validated against synthetic signals: frequency within ±1 Hz, amplitude within 15%, using Hanning-windowed FFT with coherent gain correction
TestSummaryKeyContract catches silent field-rename bugs before they ship. 13 fields verified across both calculate_eye_strain_risk() and generate_pdf_report()
154 tests across 10 files. test_eye_strain_risk.py alone has 41 tests covering every threshold boundary for all four risk categories
Nine-page PDF generated headlessly via matplotlib PdfPages: risk summary, descriptive stats, temporal metrics, spatial/color plots, FFT spectrum, worst-case frames, CIELAB heatmaps, per-pixel stability map
No external service dependencies. The tool runs fully offline and produces deterministic output for the same input and configuration
Next Step
Investigate your display
If you’re experiencing eye strain, headaches, or fatigue, use this tool to objectively measure your display’s flicker and dithering artifacts.