Live in production · December 2024 to ongoing · Solo build

Case Study

AlchemizeCV

Job-search workflow platform that turns a master profile into tailored resumes, grounded project bullets, and tracked applications.

I got tired of rewriting my resume for every application. AlchemizeCV is the platform I built to fix that. It takes a master profile with freeform experience narratives, pairs it with a job description, and runs a four-phase LLM pipeline that produces tailored resume bullets grounded in what I actually did. No hallucinations. The output is a downloadable PDF from one of six templates, rendered by Playwright or Typst. A browser extension fills out ATS forms automatically using an LLM agent that issues CLI commands over a WebSocket relay.

11 Feature Slices · 6 Runtimes · 4 Languages · 17 DB Migrations · 5 Domains on Caddy
Python 3.13 · FastAPI · TypeScript · React 19 · Go · PostgreSQL · Vike SSR · Typst · Playwright · Dagger CI · Podman Quadlet · Caddy

Why this matters

Proof first, implementation second.

I want the reader to see the product, understand the operator workflow, and then dig into the architecture and tradeoffs behind it.

Overview

What the product does and why I built it that way.

AlchemizeCV is a polyglot monorepo with seven runtimes: a Python API, a TypeScript web frontend, a TypeScript browser extension, and four Go services (control plane, CLI, agent harness, repo analysis). The API is organized into 11 feature slices, each owning its own router, service, models, and schemas. A single workflow directory adopts Clean Architecture for the resume generation pipeline because the domain logic there (phase inheritance, fork semantics, retry backoff) is complex enough to justify the abstraction cost. Profile CRUD does not get that treatment. Workers coordinate through PostgreSQL's `SELECT ... FOR UPDATE SKIP LOCKED` instead of a message broker. The whole stack runs on bare metal with rootless Podman containers managed by systemd Quadlet units, behind a Caddy reverse proxy that handles automatic TLS for five domains.

Highlights

Quick read

Four-phase LLM pipeline (RAW → SYNTHESIS → PRUNER → BUNDLING) with feedforward context that prevents duplicate bullets across resume sections

Browser extension with an LLM agent that fills ATS forms by issuing CLI commands over a WebSocket relay, using XML tags instead of function-calling schemas

Partial reruns from any pipeline phase. Inherited phases cost zero LLM calls. Users experiment with prompt changes without paying full generation cost.

Real-time WebSocket generation stream with automatic HTTP polling fallback when the socket drops mid-run

PostgreSQL SKIP LOCKED for worker coordination. No Redis, no Celery, no dead-letter queues to manage.

Dual PDF rendering: Playwright for HTML-based templates, Typst for native typeset output. Six templates across both renderers.

Project context extraction via tree-sitter AST parsing of GitHub repos, with user-curated context that feeds into generation

Architecture

How the system is structured.

The API uses feature slices as the dominant pattern. Each of the 11 slices owns its router, service, models, schemas, dependencies, and integration files. Cross-slice calls go through explicit integration.py files so the boundaries stay visible. The platform layer handles cross-cutting concerns like database sessions, LLM clients, rendering, security, and worker infrastructure. One workflow directory breaks from the slice pattern and uses full Clean Architecture with five layers, because the resume generation domain has enough complexity to earn it.

Feature Slices (features/)

Eleven vertical slices: auth, profile, education, experience, jobs, projects, settings, developer_tokens, github, resume_upload, templates. Each slice has a consistent internal structure. Adding a new field to profile touches three files in one directory, never files in another feature's slice.

Platform Layer (platform/)

Shared horizontal infrastructure. Database session management, async SQLAlchemy engine, LLM client facade over Gemini/OpenAI/OpenRouter, Playwright browser pool, Typst CLI wrapper, Jinja2 template catalog, JWT security, structlog tracing with X-Request-ID correlation, and the lease-based worker primitives. Nothing in platform/ imports from features/.

Clean Architecture Workflow (workflows/resume_generation/)

Five layers from outside-in: transport (HTTP + WebSocket), orchestration (workflow coordination), application (use cases), infrastructure (run repository with SKIP LOCKED queries), domain (enums, errors, no framework imports). The domain layer defines phase/stage mappings and business rules without knowing about FastAPI or SQLAlchemy.

Generation Pipeline (features/jobs/generation/)

Pure functional module with pipeline.py, raw.py, synthesis.py, pruner.py, bundler.py. None of these files import from the jobs router or service. The pruner runs items sequentially with feedforward context so already-finalized bullets inform later selections. A parse-time assertion enforces exact bullet counts.

Composition Layer (composition/)

Explicit model registration in model_imports.py. All nine SQLAlchemy model groups are imported here so that Alembic can see them. No ORM auto-discovery. Adding a model means adding one import line. The startup sequence in app/startup.py launches the browser pool, recovers stale runs, and spawns three asyncio worker tasks.

App Shell (app/)

FastAPI lifespan, dependency injection wiring, HTTP error mapping, and middleware. The central raise_mapped_http_error() function converts domain exceptions into HTTP status codes. Request-scoped AsyncSession injection via get_db_session(). CORS, request ID, and structured logging middleware.

Features

What it does and how each piece works.

Core Pipeline

Four-Phase Resume Generation

RAW generates 12-16 candidate bullets per experience/project item grounded in the user's freeform narrative. SYNTHESIS produces a professional summary and skills taxonomy from all raw bullets in one LLM call. PRUNER selects exactly target_bullets per item using feedforward context that carries already-finalized selections. BUNDLING assembles the final ResumeBundleV1 with no LLM call. Each phase writes a GenerationStep row with the full output payload as JSON.

Frontend

Real-Time Generation Stream

The frontend connects via WebSocket to receive typed event frames during generation. A Redux-style reducer tracks current phase, per-phase state, LLM call metadata, and token usage. An inline LLM call viewer shows prompt/completion pairs while generation runs. If the WebSocket drops, the hook falls back to HTTP polling at 2-second intervals without user action.

Automation

Browser Automation Agent

A Go agent harness spawns ephemeral LLM sessions that drive a browser through CLI commands in XML tags. The model sees a sliding window of the last 3 commit frames (before/after state snapshots with diffs). A failure tracker breaks command loops without requiring the model to remember its own history. Protocol errors from malformed XML recover in 1-2 correction turns.

Data Enrichment

Project Context from Source Code

Users import a GitHub repo and the repo-analysis service extracts structured context using tree-sitter AST parsing. The extracted context is stored separately from user-curated context. Only the curated version feeds into generation, so users control exactly what evidence the pipeline has. Artifact generation produces enriched project narratives with technical claims.

Output

Dual PDF Rendering

Six resume templates across two rendering backends. Playwright renders HTML+CSS templates through headless Chromium. Typst compiles .typ source files with a JSON data sidecar. A manifest-driven template catalog maps each template to its supported renderers. The Typst path is faster and more memory-efficient. The Playwright path is more flexible for custom styling.

Core Pipeline

Partial Reruns and Fork Semantics

Users can regenerate from any pipeline phase. A fork run inherits GenerationStep payloads from the parent via source_run_id pointers. Inherited phases show as "skipped" in the UI and cost zero LLM calls. The fork uses the original ProfileSnapshot, not current profile data, so reruns are deterministic against the same inputs.
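The inheritance rule is simple to state: everything upstream of the rerun point is copied from the parent, everything downstream runs again. A minimal sketch of that decision, assuming the four phase names from the pipeline (`plan_fork` is a hypothetical helper, not the actual implementation):

```python
from enum import IntEnum

class Phase(IntEnum):
    RAW = 1
    SYNTHESIS = 2
    PRUNER = 3
    BUNDLING = 4

def plan_fork(rerun_from: Phase) -> dict[Phase, str]:
    """Phases before the rerun point are inherited from the parent run
    (zero LLM calls, shown as "skipped" in the UI); the rest re-execute."""
    return {
        phase: "inherited" if phase < rerun_from else "re-executed"
        for phase in Phase
    }

# Rerunning from PRUNER inherits the RAW and SYNTHESIS payloads
# via the parent run's GenerationStep rows.
plan = plan_fork(Phase.PRUNER)
```

Because the fork also reuses the parent's ProfileSnapshot, the inherited payloads and the re-executed phases both see the same inputs.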

Onboarding

Resume Import and Profile Bootstrap

Users upload an existing PDF resume. A background worker parses it into a structured draft via the repo-analysis service. The draft is presented for review, then merged into the user's profile, education, and experience tables. This means nobody starts from a blank profile.

Workflow

Job Application Tracking

Jobs list with status pills, bulk archive/delete via multi-select, and automatic status transitions on generation completion. Bulk operations required a separate UI state layer for selection management and three additional API endpoints. Without them, managing 50+ applications would be impractical.

Data flow

How data moves through the system.

Data flows linearly from job submission to signed-off PDF. Every LLM call is persisted with prompt and response previews. GenerationEvent rows serve double duty as the live WebSocket stream source and the post-hoc audit log. The pruner phase is the most structurally interesting: it runs items sequentially, carrying a feedforward context dict where finished items have state "pruned" and pending items have state "raw_candidates".
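The feedforward dict described above can be sketched as follows (a sketch under assumed key names; the state strings "pruned" and "raw_candidates" are from the pipeline, the helper itself is hypothetical):

```python
def feedforward_context(
    items: dict[str, list[str]],
    finalized: dict[str, list[str]],
) -> dict[str, dict]:
    """Build the pruner's feedforward context: items already finalized
    carry their pruned bullets; pending items still show raw candidates."""
    ctx = {}
    for name, candidates in items.items():
        if name in finalized:
            ctx[name] = {"state": "pruned", "bullets": finalized[name]}
        else:
            ctx[name] = {"state": "raw_candidates", "bullets": candidates}
    return ctx

items = {"job_a": ["b1", "b2", "b3"], "job_b": ["c1", "c2"]}
# job_a has been pruned already; job_b is still pending.
ctx = feedforward_context(items, finalized={"job_a": ["b2"]})
```

Each sequential pruner call receives this dict, which is how already-finalized bullets steer later selections away from duplicates.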

1. User pastes a job description. POST /api/jobs/ writes a Job row with status "backlog" and the job_text. No LLM call yet.

2. User clicks Generate. POST /api/resume-runs/ snapshots the current profile/experience/project data into snapshot_json on a GenerationRun row with status QUEUED.

3. The generation worker claims the run via SELECT ... FOR UPDATE SKIP LOCKED. Sets status RUNNING, records worker_id, starts a lease heartbeat every 30 seconds.

4. Phase 1 (RAW): One LLM call per experience/project item. Each call gets the item narrative, job description, and prompt template. Produces 12-16 candidate bullets per item. Bullet quality audit flags weak and overblown verbs.

5. Phase 2 (SYNTHESIS): One LLM call total. All raw bullets from all items go into a single prompt. Produces a professional summary and a three-bucket skills taxonomy.

6. Phase 3 (PRUNER): Sequential LLM calls in render order. Each item sees already-finalized bullets from prior items as feedforward context. parse_pruner_response() raises ValueError if the count is wrong. Retry up to 2 times.

7. Phase 4 (BUNDLING): No LLM call. Assembles ResumeBundleV1 from pruner selections, synthesis output, profile metadata, and education. Writes bundle JSON to Job.bundle_text. Sets Job.status to "ready_to_apply".

8. User triggers render. POST /api/jobs/{label}/render enqueues a RenderRun. The render worker resolves the template from the manifest, runs Jinja2 for HTML, then Playwright or Typst for PDF. Stores rendered_html and rendered_pdf on the Job row.

9. Throughout all phases, GenerationEvent rows record every transition. Eight event types: started, phase_start, phase_progress, phase_end, completed, failed, llm_call, llm_response. These stream live over WebSocket and are queryable after the fact.
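The eight event types listed above can be captured as a small enum (the class name `GenerationEventType` is an assumption; the string values are from the event list):

```python
from enum import Enum

class GenerationEventType(str, Enum):
    """The eight GenerationEvent types. The same rows feed the live
    WebSocket stream and serve as the post-hoc audit log."""
    STARTED = "started"
    PHASE_START = "phase_start"
    PHASE_PROGRESS = "phase_progress"
    PHASE_END = "phase_end"
    COMPLETED = "completed"
    FAILED = "failed"
    LLM_CALL = "llm_call"
    LLM_RESPONSE = "llm_response"
```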

Tradeoffs

Decisions that had real alternatives.

Freeform narrative over structured experience fields

I could have asked users to fill in structured fields for accomplishments, technologies, and metrics. I chose a freeform narrative instead. Structured fields constrain what the LLM can find. A narrative lets users write about their work in their own words and lets the model identify what's relevant to a specific role. The downside is real: users who write detailed narratives get better bullets than users who write brief summaries.

PostgreSQL SKIP LOCKED over a message broker

Adding Redis or RabbitMQ means another service, another failure mode, another thing to monitor. PostgreSQL is already running. SKIP LOCKED gives exclusive claim semantics with no deadlocks. The tradeoff is that worker observability requires querying the database rather than reading a broker dashboard. For three workers on a solo project, that is the right call.
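The claim pattern boils down to one statement. A minimal sketch, assuming hypothetical table and column names (`generation_runs`, `worker_id`, `claimed_at`), not the project's actual schema:

```python
# Atomic claim: the inner SELECT skips rows another worker already holds
# a row lock on, so two workers can never claim the same run and neither
# ever blocks waiting for the other.
CLAIM_RUN_SQL = """
UPDATE generation_runs
SET status = 'RUNNING', worker_id = :worker_id, claimed_at = now()
WHERE id = (
    SELECT id FROM generation_runs
    WHERE status = 'QUEUED'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id;
"""
```

`SKIP LOCKED` is the whole trick: a locked row is invisible to the inner SELECT rather than a thing to wait on, which is what makes the pattern deadlock-free.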

XML tags over native LLM function calling

Gemini supports structured tool calls. I chose not to use them. Function-call schemas consumed 3,000-5,000 tokens per spawn before the model saw a single pixel of the form. Two XML tags are easier to parse, easier to recover from when malformed, and cheaper per request. I gave up type-safety at the protocol boundary and got reliability and lower cost.

Feature slices for CRUD, Clean Architecture for the pipeline

I could have refactored all 11 feature slices into CA layers. Profile CRUD does not justify a repository abstraction. The resume generation workflow has a real domain model with phase inheritance, fork semantics, and retry backoff. CA earns its keep there. Both patterns coexist in the same codebase, applied where they fit.

Vike over Next.js for the SSR split

I needed exactly three rendering behaviors scoped to route groups: prerendered marketing pages, public pages, and CSR-only app pages. Next.js App Router would have required Server Components, which adds complexity without benefit for a heavily auth-gated application. Vike's +config.ts inheritance is explicit. The cost is a smaller ecosystem and less documentation.

Bare metal with Podman Quadlet over managed containers

A managed Kubernetes cluster or ECS would handle orchestration and auto-scaling. Podman Quadlet generates systemd unit files from .container files. Restarts, dependency ordering, and boot persistence are handled by systemd. No control plane to maintain. The tradeoff is manual scaling, but for a single-host deployment this is simpler to reason about.

Challenges

Problems that required specific solutions.

Problem

Chrome's MV3 service workers terminate after 30 seconds of inactivity, killing the extension's WebSocket connection to the control plane mid-session.

Solution

The control plane sends a session.heartbeat message every 30 seconds. The extension responds with session.heartbeat.ack. The WebSocket message handler counts as "activity" from Chrome's perspective, keeping the worker alive. If the worker does terminate, the GatewayClient reconnects with exponential backoff and re-runs the pairing handshake statelessly.

Problem

MV3 extensions can't set Authorization headers on WebSocket upgrade requests, and cookies don't attach to cross-origin WebSocket connections. Standard auth patterns don't work.

Solution

The extension passes the JWT as the second item in the Sec-WebSocket-Protocol array: ["auth", "<token>"]. The Go control plane reads it from the upgrade header, validates it against the API's /auth/me endpoint, and closes with code 4001 on failure. Auth is proxied without the control plane implementing JWT verification itself.

Problem

The LLM frequently returned the wrong number of bullets in the pruner phase. Prompt instructions to "select exactly N" are treated as soft guidance by the model.

Solution

parse_pruner_response() raises ValueError if the count does not match expected_count. The pipeline catches this, emits a PRUNER_PARSE_OR_PROVIDER_ERROR event, and retries up to 2 times. The parse-time assertion is what makes the count reliable, not the prompt. Wrong-count outputs never silently produce a malformed resume.
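The count check can be sketched as follows (the function name and the retry budget are from the source; the exact signature is an assumption):

```python
def parse_pruner_response(selected: list[str], expected_count: int) -> list[str]:
    """Parse-time hard assertion: the prompt asks for exactly N bullets,
    but this check, not the prompt, is what makes N reliable."""
    if len(selected) != expected_count:
        raise ValueError(
            f"pruner returned {len(selected)} bullets, "
            f"expected {expected_count}"
        )
    return selected
```

The pipeline wraps each call in a retry loop (initial attempt plus up to 2 retries), emitting a PRUNER_PARSE_OR_PROVIDER_ERROR event on each failure, so a wrong-count output can delay a run but never reach the bundle.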

Problem

The LLM agent in browser automation sessions would repeat failed commands indefinitely, clicking the same broken element because earlier turns had scrolled off the context window.

Solution

Two mechanisms. A failureTracker records each failed command by exact text. If the same command fails twice with the same exit code, it returns "[REPEATED FAILURE: try a different approach]" without executing. And the CommitFrame NoChange flag tells the model when a click produced no visible state change, so it knows to try something different.

Problem

Playwright and Typst produce different PDF output for the same resume data. Some templates support both renderers, some support only one. Keeping the rendering paths consistent is hard.

Solution

A manifest-driven template catalog maps each template to its supported renderers and correct template files. resolve_render_plan() normalizes the template value and renderer selection. Playwright renders HTML+CSS via Jinja2. Typst reads a JSON sidecar at compile time. Both paths produce bytes stored as Job.rendered_pdf. The manifest makes adding templates a config change, not a code change.

Outcomes

What the work produced.

The full journey from job description to downloadable PDF takes under 5 minutes for a configured user with an existing profile.

Every LLM call in every run is persisted with prompt preview and response preview. I can trace any final bullet back to the source narrative and the exact model response that produced it.

Partial reruns from any phase complete in seconds for inherited phases. Only re-executing phases cost LLM time and money.

The browser automation agent fills ATS forms that previously required 20-40 minutes of manual copy-paste. The trace viewer shows exactly what the agent did after the session completes.

Worker races have never produced a duplicate-claimed run since SKIP LOCKED was adopted. No broker, no dead-letter queue, no operational overhead.

The Dagger CI pipeline runs all lint, test, and build checks for six runtimes in parallel. Total CI time is bounded by the slowest runtime, not the sum.

Caddy automatic TLS eliminates certificate management entirely. All five domains renew without intervention.


Matthew Boback

Backend-heavy products, cleaner operator workflows, and delivery details that hold up under inspection.

Built with SvelteKit, Tailwind v4, and a light-only editorial system tuned for hiring review.

© 2026 Matthew Boback