Matthew Boback · Backend & Platform Engineer
Live in Production · 2024-12 to Ongoing · Solo

Case Study

AlchemizeCV

Job-search workflow platform that turns a master profile into tailored resumes, grounded project bullets, and tracked applications.

Built a job-search workflow platform that turns one master profile into tailored resumes, grounded project bullets, and tracked application runs through a replayable four-phase generation pipeline.

4 Pipeline Phases · 3 Job-Hunt Surfaces · <1s Render Time · 2 Provider Paths

Context Pruning · Content Hashing · React + Vike · Playwright Pool · AST Redaction · WebSocket
AlchemizeCV marketing hero promising role-ready resumes from one master profile.

Why this matters

Proof first, implementation second.
I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

I built Context-Aware Pruning into the pipeline to prevent the LLM from repeating the same accomplishments across multiple past roles

I dramatically reduced PDF render times by shifting from cold-start headless rendering to a warm, semaphore-limited Playwright browser pool

I shipped a deterministic Go microservice that extracts and redacts code context before generation, relying strictly on AST analysis without LLMs

Product Proof

Screens that show the system in context.

Authenticated AlchemizeCV profile builder with completion progress and section ordering.

Authenticated recon discover screen for pending discoveries and recon runs.

Authenticated API settings screen showing BYOK provider and model configuration.

Overview

What the product does and why I built it that way.

I built AlchemizeCV because resume tailoring is only one piece of a bigger problem: people need a reusable profile, grounded project context, clear privacy boundaries, and a workflow that survives dozens of applications. The product manages this through a replayable four-phase pipeline whose core innovation is Context-Aware Pruning: as the LLM selects bullets for your most recent job, it passes that selection state forward, ensuring it never repeats the same capability for an older role.

Under the hood, the web app is a React 18 + Vike thin client, FastAPI owns the workflow with content-hashed artifact caching, a Go service uses tree-sitter for deterministic repository analysis and entropy-based secret redaction, and a warm Playwright browser pool delivers sub-second PDF rendering.

Highlights

Quick read

Context-Aware Pruning prevents redundancy by tracking what accomplishments have already been used across previous jobs.

A content-hashed generation cache allows instant partial reruns and rapid prompt iteration without redundant LLM costs.

A Playwright warm browser pool cuts PDF rendering time from 5–10s down to sub-second delivery under concurrent demand.

The product extends beyond resume generation into recon discovery, job tracking, and application runs.

Problem

Resume tailoring kept losing context, burning tokens, and repeating itself.

Tailoring a resume for every role is already expensive, but the deeper issue is context drift. Experience lives in one place, project evidence lives in repositories, and every application adds more manual copy, prompt tweaking, and second-guessing.

Worse, naive LLM generation tends to repeat your strongest accomplishments across every job you've had. I wanted a product that treats the whole job-search loop as a system: grounded project context, replayable generation runs with targeted caching, and LLM pruning that actually remembers what it already wrote.

Solution

I built a context-aware workflow powered by a four-phase generation pipeline.

AlchemizeCV combines a structured profile builder, deterministic GitHub-backed project analysis, and an iterative generation pipeline. The core innovation is Context-Aware Pruning: the LLM passes an accumulating context string between experiences to ensure it produces a diverse, dense resume instead of repeating the same generic management bullets across four different roles.
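
A minimal sketch of that pruning loop, assuming a generic call_llm helper and simplified data shapes (the shipped prompts, models, and data model differ):

```python
# Sketch of Context-Aware Pruning: select bullets newest-first and carry
# the selection state forward so older roles avoid repeating it.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Experience:
    title: str
    candidate_bullets: list[str]

def prune_experiences(experiences: list[Experience], job_context: str,
                      call_llm: Callable[[str], list[str]]) -> dict[str, list[str]]:
    used_context = ""          # accumulating summary of accomplishments already chosen
    selections: dict[str, list[str]] = {}

    for exp in experiences:    # ordered most recent -> oldest
        prompt = (
            f"Target job:\n{job_context}\n\n"
            f"Accomplishments already covered by more recent roles:\n"
            f"{used_context or '(none yet)'}\n\n"
            f"Pick the strongest non-redundant bullets for '{exp.title}':\n"
            + "\n".join(exp.candidate_bullets)
        )
        chosen = call_llm(prompt)              # returns the selected bullets
        selections[exp.title] = chosen
        # Snowball the selection state so the next (older) role sees it.
        used_context += "\n".join(f"- {b}" for b in chosen) + "\n"

    return selections
```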

The backend relies heavily on caching: content hashes and config hashes allow instant partial reruns without duplicating LLM costs. Under the hood, privacy boundaries are explicit: a Go service redacts secrets from code before it ever reaches an LLM, BYOK lets users route through Gemini or OpenRouter directly, and a Firefox extension proxies all automation through the backend to keep keys out of local browser storage.
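
The caching idea reduces to a content-plus-config hash per phase; a sketch, with an in-memory dict standing in for the PostgreSQL-backed artifact store:

```python
# Sketch of content-hash caching: a phase output is keyed by a hash of its
# inputs and its config, so unchanged work is reused instead of re-generated.
import hashlib
import json
from typing import Callable

_cache: dict[str, str] = {}   # stand-in for the persisted artifact store

def phase_cache_key(phase: str, inputs: dict, config: dict) -> str:
    payload = json.dumps({"phase": phase, "inputs": inputs, "config": config},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode()).hexdigest()

def run_phase(phase: str, inputs: dict, config: dict,
              execute: Callable[[dict, dict], str]) -> str:
    key = phase_cache_key(phase, inputs, config)
    if key in _cache:                 # unchanged inputs + config => free rerun
        return _cache[key]
    output = execute(inputs, config)  # only pay the LLM cost on a miss
    _cache[key] = output
    return output
```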

Workflow

Onboard -> Ground -> Generate -> Apply
The product flow starts with profile onboarding and BYOK setup, grounds project evidence through GitHub imports and semantic context, generates artifacts through a replayable four-phase pipeline, and continues into cover letters and application runs.
Run states: PENDING -> GENERATING -> GENERATED -> RENDERING -> COMPLETE (or FAILED)

Key Endpoints

POST /api/jobs/extract
Extract job requirements from a pasted URL or posting

POST /api/jobs/:label/async
Start an async generation run for a job

GET /api/jobs/:label/generate/status
Fetch current generation and rendering state

POST /api/jobs/:label/render
Render the current bundle to PDF

POST /api/projects/import/github
Import repositories and generate semantic project context

POST /api/discover
Accept browser-extension discoveries into the review queue
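
A hedged example of driving a run through these endpoints with httpx; the paths and run states come from this page, while the host, auth header, payload fields, and response shape are assumptions:

```python
# Example client flow: extract a job, start an async run, poll status, render.
import time
import httpx

BASE = "https://alchemizecv.example"           # placeholder host
headers = {"Authorization": "Bearer <token>"}  # assumed auth scheme

with httpx.Client(base_url=BASE, headers=headers) as client:
    job = client.post("/api/jobs/extract",
                      json={"url": "https://example.com/posting"}).json()
    label = job["label"]                       # assumed response field

    client.post(f"/api/jobs/{label}/async")    # kick off the async generation run

    while True:                                # poll until rendering can start
        status = client.get(f"/api/jobs/{label}/generate/status").json()
        if status.get("state") in ("GENERATED", "COMPLETE", "FAILED"):
            break
        time.sleep(2)

    if status.get("state") != "FAILED":
        client.post(f"/api/jobs/{label}/render")   # render the bundle to PDF
```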

Architecture

The system shape behind the product.

Polyglot workflow platform with a React 18 + Vike web app, FastAPI feature slices for profile/jobs/settings/applications, a Go tree-sitter analysis service for GitHub project context, and PostgreSQL-backed persistence for users, jobs, runs, and artifacts.

Ingress: Caddy reverse proxy · HTTPS/TLS · Cookie + token boundaries

Web App: React 18 · Vike routing · TypeScript UI + live preview

Resume API: FastAPI feature slices · WebSocket progress · Profile / jobs / settings flows

Generation Pipeline: Raw extraction · Synthesis · Context-aware pruning · Assembly + rendering

Code Context: Go portfolio service · tree-sitter parser pool · Semantic digest artifacts

Data: PostgreSQL 16 · Run lineage + prompts · Rendered artifacts + settings

Product Surfaces

11 Shipped Capabilities

Guided Onboarding + Profile Builder

core

Nine-step onboarding, structured profile editing, and live completion guidance replace blank-state prompting.

GitHub Context Editor

integration

Repository imports generate semantic digests and let users curate exactly which code facts the model can see.

Replayable 4-Phase Pipeline

core

Runs persist intermediate artifacts so prompt changes and partial reruns stay inspectable.

Live Generation Observability

dx

WebSocket progress, phase visualization, and LLM call traces keep long-running jobs debuggable.

BYOK Provider Settings

security

Users configure Gemini or OpenRouter-backed models without relying on shared server keys.

Cover Letters + Application Runs

core

Generated job artifacts feed into tracked application runs instead of stopping at the PDF.

Recon Discovery Queue

integration

Extension-driven discovery surfaces pending jobs, recon runs, and events for later review.

Content-Hash Caching

performance

Generation phases cache outputs against inputs and config hashes, enabling instant, cost-free partial pipeline reruns.

Warm Browser-Pool Rendering

performance

A lazy-initialized Playwright browser pool drops PDF render times from 5–10s to sub-second delivery while capping memory usage.

LLM Backend Proxy

security

The Firefox extension proxies automation tasks through the FastAPI backend to keep API keys completely out of local browser storage.

Entropy-Based AST Redaction

security

The Go portfolio service combines tree-sitter AST analysis with entropy heuristics to detect and redact credentials before code context ever reaches an LLM.
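
The shipped redactor is a Go service built on tree-sitter, but the entropy heuristic it applies to string literals can be sketched in Python; the regex literal matcher below is a stand-in for the real AST walk:

```python
# Sketch of entropy-based redaction: flag long, high-entropy string literals
# as probable secrets and replace them before any context leaves the service.
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; high values suggest keys or tokens."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def redact_literals(source: str, threshold: float = 4.0, min_len: int = 20) -> str:
    """Replace suspiciously high-entropy double-quoted literals with a placeholder."""
    def maybe_redact(match: re.Match) -> str:
        literal = match.group(1)
        if len(literal) >= min_len and shannon_entropy(literal) > threshold:
            return '"[REDACTED]"'
        return match.group(0)
    # Naive literal matcher for illustration; the real service walks the AST.
    return re.sub(r'"([^"\n]*)"', maybe_redact, source)
```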

Tradeoffs

The decisions worth calling out.

Why keep the generation pipeline split into four phases?

I wanted better output quality and better debugging. Separating extraction, synthesis, pruning, and assembly makes each prompt easier to tune and lets me persist intermediate artifacts for replay or inspection; a sketch of that split follows the alternatives below.

Alternatives considered: Single end-to-end prompt · Template-only generation · Client-only prompting
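
A sketch of that four-phase split with persisted intermediate artifacts; the phase callables here are placeholders and a plain dict stands in for the database:

```python
# Sketch of a replayable phase runner: each phase persists its output so a
# prompt or settings change can rerun from the affected phase only.
from typing import Callable

PHASES: list[tuple[str, Callable[[object], object]]] = [
    ("raw_extraction", lambda prev: {"requirements": "..."}),   # placeholder
    ("synthesis",      lambda prev: {"draft_bullets": prev}),   # placeholder
    ("pruning",        lambda prev: {"pruned": prev}),          # placeholder
    ("assembly",       lambda prev: {"resume": prev}),          # placeholder
]

artifacts: dict[str, object] = {}   # stand-in for PostgreSQL-backed run artifacts

def run_pipeline(start_phase: str = "raw_extraction") -> None:
    started = False
    prev: object = None
    for name, execute in PHASES:
        if name == start_phase:
            started = True
        if not started:
            prev = artifacts[name]   # reuse the persisted artifact from a prior run
            continue
        prev = execute(prev)
        artifacts[name] = prev       # persist the intermediate result for replay
```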

Why build a token-aware Context Editor for project imports?

Project bullets are stronger when they are grounded in real code, but raw repository context is too large and too noisy. Semantic digests plus a user-facing context editor give me controllable evidence instead of blind prompt stuffing.

Alternatives considered: Direct README ingestion · Uncurated full-repo context · Manual project entry only

Why use a React + Vike thin client over a heavier frontend architecture?

The product already has a complex backend workflow, so I kept the web app focused on task-specific UI, typed API access, and live status instead of duplicating domain logic in the browser.

Alternatives considered: Fully stateful SPA architecture · Server-rendered forms only · Desktop app

Why isolate rendering behind a shared browser pool?

PDF rendering has real cost. A semaphore-limited Playwright browser pool gives good throughput and warm performance without spinning up an uncontrolled number of Chromium instances under concurrent demand.

Alternatives considered: Spawn a browser per render · Third-party PDF API · Client-side rendering only

Tech Stack

What actually shipped the system.

Web App

React 18

Authenticated product UI and interactive editing surfaces

Vike

Route structure and SSR/client rendering split

TypeScript

Typed APIs, feature state, and UI logic

Tailwind CSS

Shared design primitives and responsive styling

Backend

Python 3.13

Workflow orchestration and feature-slice backend code

FastAPI

REST endpoints, auth surfaces, and WebSocket progress

SQLAlchemy Async

Async persistence for profiles, jobs, and runs

Playwright

HTML-to-PDF rendering and browser pool execution

Code Analysis

Go 1.25

Repository parsing and structured project context generation

tree-sitter

Incremental AST parsing across imported repositories

Parser pooling

Reuse language parsers across large repository imports

Integrations

OpenRouter

Model catalog access and multi-provider routing

Google Gemini

Direct provider path for BYOK users

GitHub App

Repository import and curated code evidence flows

Firefox Extension

Recon capture and guarded automation handoff

Challenges

What was hard and how I dealt with it.

LLM context windows collapse when profile, projects, and job context all compete for tokens

I split generation into phases, produced canonical semantic digests for projects, and let users curate context before the pruner applies its evolving-context snowball across the final selection pass.

Generation runs need to stay debuggable after prompts or settings change

I persist run lineage, prompt choices, and intermediate artifacts in the database so I can partially rerun from the affected phase instead of forcing a full restart from scratch.

PDF rendering can become a bottleneck under concurrent demand

I use a semaphore-limited shared browser pool with isolated contexts so rendering stays fast and bounded instead of spawning uncontrolled browser processes.
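
A minimal sketch of that pool, assuming Chromium via the async Playwright API; pool size, lifecycle, and error handling are simplified:

```python
# Sketch of a warm, semaphore-limited Playwright pool: one lazily launched
# browser, isolated contexts per render, bounded concurrency.
import asyncio
from playwright.async_api import async_playwright, Browser

_semaphore = asyncio.Semaphore(4)   # cap concurrent renders
_init_lock = asyncio.Lock()
_browser: Browser | None = None

async def _get_browser() -> Browser:
    """Lazily launch one shared Chromium instance instead of one per render."""
    global _browser
    async with _init_lock:
        if _browser is None:
            pw = await async_playwright().start()
            _browser = await pw.chromium.launch()
    return _browser

async def render_pdf(html: str) -> bytes:
    async with _semaphore:                       # bound concurrency
        browser = await _get_browser()
        context = await browser.new_context()    # isolated per-render context
        try:
            page = await context.new_page()
            await page.set_content(html, wait_until="networkidle")
            return await page.pdf(format="A4", print_background=True)
        finally:
            await context.close()
```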

Users need privacy and provider control without turning the app into a billing proxy

I kept the provider model BYOK-first, added direct Gemini and OpenRouter paths, and separated web auth, WebSocket auth, and extension-token auth so each surface has a clear trust boundary.
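
A sketch of the BYOK routing decision, assuming per-user settings carry the provider choice, model, and the user's own key; the request bodies are abbreviated:

```python
# Sketch of BYOK provider routing: each request uses the user's key and goes
# directly to Gemini or OpenRouter, never through a shared server key.
from dataclasses import dataclass
import httpx

@dataclass
class ProviderSettings:
    provider: str    # "gemini" or "openrouter"
    api_key: str     # user-supplied key
    model: str

async def complete(settings: ProviderSettings, prompt: str) -> httpx.Response:
    async with httpx.AsyncClient(timeout=60) as client:
        if settings.provider == "openrouter":
            return await client.post(
                "https://openrouter.ai/api/v1/chat/completions",
                headers={"Authorization": f"Bearer {settings.api_key}"},
                json={"model": settings.model,
                      "messages": [{"role": "user", "content": prompt}]},
            )
        # Direct Gemini path for BYOK users.
        return await client.post(
            f"https://generativelanguage.googleapis.com/v1beta/models/"
            f"{settings.model}:generateContent",
            params={"key": settings.api_key},
            json={"contents": [{"parts": [{"text": prompt}]}]},
        )
```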

Outcomes

What shipped and what improved.

I built Context-Aware Pruning into the pipeline to prevent the LLM from repeating the same accomplishments across multiple past roles

I dramatically reduced PDF render times by shifting from cold-start headless rendering to a warm, semaphore-limited Playwright browser pool

I shipped a deterministic Go microservice that extracts and redacts code context before generation, relying strictly on AST analysis without LLMs

I kept API costs low and debugging high by persisting intermediate generation artifacts and utilizing content-hash caching for instant reruns

I architected the browser extension to proxy all LLM requests through the backend, guaranteeing user API keys never leak to local storage

I expose live generation state through WebSocket progress, phase visualization, and LLM call detail rather than opaque loading spinners
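
A sketch of what that progress stream could look like over a FastAPI WebSocket; the route path and event shape are assumptions rather than the shipped API:

```python
# Sketch of WebSocket progress: stream run-state events to the client
# instead of leaving it with an opaque loading spinner.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def watch_run(label: str):
    """Placeholder generator yielding run state; real state comes from the DB."""
    for state in ("PENDING", "GENERATING", "GENERATED", "RENDERING", "COMPLETE"):
        yield {"label": label, "state": state}
        await asyncio.sleep(1)

@app.websocket("/ws/jobs/{label}/progress")
async def progress(websocket: WebSocket, label: str):
    await websocket.accept()
    try:
        async for event in watch_run(label):
            await websocket.send_json(event)   # phase updates as they happen
    finally:
        await websocket.close()
```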

Next Step

Explore the code

This case study focuses on the onboarding-to-application workflow, the replayable generation pipeline, the GitHub context model, and the polyglot services behind AlchemizeCV.