AlchemizeCV

AI-powered resume generation pipeline from job descriptions.

AlchemizeCV turns resume writing into a pipeline problem instead of a prompt problem. You store experience once, paste a job description (or import it), and a four-phase generator produces a tailored, ATS-friendly resume while streaming progress live.

At a glance (from the codebase)

  • Scale: ~146K LOC across TypeScript, Python, and Go
  • Core backend: services/resume (FastAPI + async SQLAlchemy + Playwright rendering)
  • Deterministic code facts: services/portfolio (Go + tree-sitter + GitHub API)
  • Shipped frontend: React + Vike (apps/react-web)
  • Pipeline model: four phases under services/resume/alchemizeresume/generation/phases/

Measured numbers (2026-01-16)

  • Python requirement (resume service package): >=3.13,<3.14 (pyproject.toml).
  • cloc (shipped scope: services/, apps/react-web/, packages/, tests/): 94,502 LOC across 702 files.
  • cloc (full repo, excluding .git,node_modules,dist,build,.venv): 146,376 LOC across 1,169 files (includes archived apps/web/).
  • Services: 2 (services/resume, services/portfolio).
  • Resume templates: 16 total template files, 9 CSS variants (services/resume/alchemizeresume/templates/).
  • Python test files: 101 (tests/**/*.py); frontend test files: 15 (apps/react-web/tests/).

Problem

Tailoring a resume per job posting is a pile of repeated work: copy/paste keywords, re-order bullets, keep the story consistent, and avoid accidentally changing factual claims.

Most “AI resume tools” try to solve this with one giant prompt. That fails in predictable ways:

  • You can’t tell why a bullet exists or where it came from.
  • You can’t retry a single step without rerunning the whole thing.
  • You can’t track regression or cost without durable intermediate artifacts.

Constraints

  • Inspectable: every phase produces an artifact you can view, diff, and replay.
  • Deterministic where possible: caching keyed by content + config, not “vibes”.
  • Fast feedback: stream progress so the UI stays responsive during slow model calls.
  • Evidence-based project bullets: derive some content from code, not memory.

Solution (what shipped)

AlchemizeCV is intentionally split into deterministic and generative parts:

  • The resume service (services/resume) owns the four-phase pipeline, persistence, and PDF rendering.
  • The portfolio service (services/portfolio) produces deterministic “code facts” (AST-derived project signals) that the pipeline can safely reference.

The generation pipeline lives in services/resume/alchemizeresume/generation/ and is implemented as four phases:

  1. Raw (phases/raw.py)
  2. Synthesis (phases/synthesis.py)
  3. Pruning (phases/pruner.py)
  4. Bundle/assembly (phases/bundler.py)

Artifacts are stored so failures are debuggable and runs are replayable without redoing everything.
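
The shape of that pattern, as a minimal sketch (Artifact, Phase, run_pipeline, and store are illustrative names, not the repo's actual API):

```python
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Artifact:
    """Output of one phase, persisted so the run can be inspected and replayed."""
    phase: str
    content: dict


# A phase reads the accumulated context and returns its artifact.
Phase = Callable[[dict], Awaitable[Artifact]]


async def run_pipeline(context: dict, phases: list[tuple[str, Phase]], store) -> dict:
    """Run phases in order, saving each artifact before moving on.

    If a later phase fails, earlier artifacts are already persisted, so the
    run can resume from the failed phase instead of starting over.
    """
    for name, phase in phases:
        artifact = await phase(context)
        await store.save(phase=name, artifact=artifact)  # durable intermediate output
        context[name] = artifact.content                 # later phases read earlier outputs
    return context
```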

Architecture

flowchart TB
  Browser[Browser] -->|HTTPS| Caddy["Caddy<br/>TLS + routing"]
 
  Caddy --> FE["Frontend<br/>React + Vike"]
  Caddy --> RESUME["Resume Service<br/>FastAPI (async)"]
  Caddy --> PORT["Portfolio Service<br/>Go + tree-sitter"]
  Browser --> EXT["Chrome extension (MV3)"]
 
  RESUME --> DB[(PostgreSQL)]
  RESUME --> OR["OpenRouter (LLM gateway)"]
  PORT --> GH["GitHub API"]
  PORT --> OR
 
  FE <-->|WebSocket| RESUME
  EXT -->|Import jobs / automation loops| RESUME

The generation pipeline itself:

flowchart LR
  JD["Job description"] --> P1["Raw: structure + candidates"]
  P1 --> P2["Synthesis: summary + skills"]
  P2 --> P3["Prune: enforce counts, remove redundancy"]
  P3 --> P4["Bundle: final markdown document"]
  P4 --> PDF["Render PDF (Playwright templates)"]

Evidence (placeholders)

  • Screenshot (TODO): case-studies/alchemizecv/pipeline-progress.png
    • Capture: an in-progress generation run showing phase transitions (Raw → Synthesis → Prune → Bundle) and live progress.
    • Alt text: “Resume generation pipeline showing phase transitions and live progress updates.”
    • Why it matters: supports the claim that the pipeline is phase-structured and observable.
  • Screenshot (TODO): case-studies/alchemizecv/artifacts-view.png
    • Capture: a view where you can see saved intermediate artifacts for a run (phase outputs).
    • Alt text: “Generation run artifacts listing per-phase outputs.”
    • Why it matters: supports the claim that phase artifacts persist for debugging and replay.
  • Screenshot (TODO): case-studies/alchemizecv/rendered-pdf.png
    • Capture: a rendered PDF preview (with sensitive info redacted) showing layout stability.
    • Alt text: “Rendered resume PDF preview with consistent layout.”
    • Why it matters: supports the claim that PDF rendering is a first-class subsystem, not an afterthought.
  • Screenshot (TODO): case-studies/alchemizecv/github-import.png
    • Capture: the “import project from GitHub” flow (or the resulting project summary) that uses code-derived facts.
    • Alt text: “GitHub project import flow producing project data derived from code.”
    • Why it matters: supports the claim that project bullets can be backed by code-derived artifacts.

Deep dive: the interesting parts

1) “Pipelines, not prompts”

Each phase is small enough to reason about, and the boundaries are visible in the code:

  • Pipeline orchestration: generation/pipeline_runner.py + generation/db_pipeline.py
  • Phase-specific logic: generation/phases/*
  • Progress fan-out: generation/progress.py (feeds WebSocket updates in features/jobs/ws.py)

This structure makes it straightforward to:

  • retry a single phase,
  • compare outputs across models,
  • and debug “why did this bullet appear?” questions by following artifacts.
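
As a rough illustration of the progress fan-out idea, assuming one asyncio queue per connected client (ProgressBus and the route path are hypothetical; the real generation/progress.py and features/jobs/ws.py may differ):

```python
import asyncio

from fastapi import FastAPI, WebSocket

app = FastAPI()


class ProgressBus:
    """Fan out phase-progress events to every WebSocket subscribed to a run."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[asyncio.Queue]] = {}

    def subscribe(self, run_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.setdefault(run_id, set()).add(queue)
        return queue

    def publish(self, run_id: str, event: dict) -> None:
        for queue in self._subscribers.get(run_id, set()):
            queue.put_nowait(event)


bus = ProgressBus()


@app.websocket("/ws/runs/{run_id}")
async def run_progress(ws: WebSocket, run_id: str):
    await ws.accept()
    queue = bus.subscribe(run_id)
    # The pipeline would call bus.publish(run_id, {"phase": "synthesis", "status": "done"})
    # as each phase advances; the browser receives events as they happen.
    while True:
        event = await queue.get()
        await ws.send_json(event)
```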

2) Deterministic caching keyed by content + config

Caching lives under services/resume/alchemizeresume/cache/ and is keyed by:

  • “what you asked for” (job + experience content), and
  • “how you asked” (model + settings).

If you change the model, temperature, or template inputs, you intentionally bust the cache.
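
A minimal sketch of that keying strategy, assuming a canonical-JSON plus SHA-256 approach (cache_key is illustrative, not the actual cache/ API):

```python
import hashlib
import json


def cache_key(job_text: str, experience: dict, model: str, settings: dict) -> str:
    """Derive a deterministic cache key from what was asked and how it was asked.

    Any change to the job description, stored experience, model, or generation
    settings produces a different key, which intentionally busts the cache.
    """
    payload = {
        "job": job_text,
        "experience": experience,
        "model": model,
        "settings": settings,
    }
    # sort_keys makes the serialization stable regardless of dict insertion order
    canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```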

3) Code-derived project bullets (tree-sitter service)

The portfolio service (services/portfolio) is the deterministic counterweight to LLM output:

  • It parses source with tree-sitter and produces structured artifacts.
  • The resume service consumes those artifacts as input evidence when generating project bullets.

This is how AlchemizeCV avoids the classic “vague achievements you never actually built” failure mode.
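
For illustration only (the artifact schema isn't shown in this write-up), the code facts consumed by the resume service might look roughly like this:

```python
from pydantic import BaseModel


class CodeFact(BaseModel):
    """One deterministic, AST-derived signal about a project (e.g. from tree-sitter)."""
    repo: str
    kind: str         # e.g. "language", "framework", "public_api", "test_suite"
    value: str
    source_path: str  # where in the source tree the fact was observed


def facts_as_evidence(facts: list[CodeFact]) -> str:
    """Render code facts into a compact evidence block for generation, so project
    bullets can cite things that verifiably exist in the code."""
    lines = [f"- {f.kind}: {f.value} ({f.repo}:{f.source_path})" for f in facts]
    return "Evidence from code analysis:\n" + "\n".join(lines)
```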

4) Rendering is a real production subsystem

PDF generation is not “print a page and pray”. The resume service has dedicated rendering code under:

  • rendering/playwright_render.py
  • rendering/browser_pool.py
  • templates/ (HTML + CSS variants)

The point is predictable layout and repeatable output across runs.
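
A minimal sketch of the HTML-to-PDF step using Playwright's async Python API; the real rendering code adds browser pooling and template handling on top of this:

```python
from playwright.async_api import async_playwright


async def render_pdf(html: str) -> bytes:
    """Render an HTML resume to PDF with headless Chromium.

    Waiting for the page to settle and pinning the page format keeps the
    layout repeatable across runs.
    """
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.set_content(html, wait_until="networkidle")
        pdf = await page.pdf(format="A4", print_background=True)
        await browser.close()
        return pdf
```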

Tech stack

  • Frontend: React + Vike + TypeScript
  • Backend: Python 3.13 + FastAPI (async) + async SQLAlchemy
  • Code analysis: Go + tree-sitter (deterministic code facts)
  • Data: PostgreSQL 16 + Alembic migrations
  • LLM gateway: OpenRouter (multi-model), BYOK support
  • Rendering: Playwright (headless Chromium)

Key decisions

  • Phase boundaries: smaller prompts and artifacts per phase make failures diagnosable.
  • Deterministic “code facts” service: tree-sitter artifacts ground project claims in source code structure.
  • Async FastAPI: mixes fan-out and sequential phases without thread-pool contention (sketched below).
  • Playwright rendering: consistent PDF output across runs and templates.
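
To illustrate that async decision (function names are hypothetical): phases run sequentially, while independent model calls inside a phase fan out with asyncio.gather rather than tying up a thread pool.

```python
import asyncio


async def synthesis_phase(sections: list[dict], generate) -> list[dict]:
    """Sequential at the phase level, concurrent within the phase.

    `generate` stands in for an async LLM call; independent sections are
    produced concurrently instead of blocking worker threads.
    """
    return list(await asyncio.gather(*(generate(section) for section in sections)))
```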

Tradeoffs

  • More moving parts than a single-service “prompt and render” app.
  • Artifact persistence increases storage and schema complexity.
  • Code analysis services introduce extra operational surface area.

Security and reliability

  • Pipeline progress is streamed and persisted so long-running jobs are observable.
  • Phase-level artifacts make partial retries and post-mortems possible.

Testing and quality

  • Python tests live under tests/ and are structured with pytest markers (unit/integration/e2e).
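
A sketch of how such markers are typically used to slice the suite (test names and bodies are illustrative, not taken from the repo):

```python
import pytest

# Markers let the suite be sliced: `pytest -m unit` for fast local feedback,
# `pytest -m "integration or e2e"` in CI.


@pytest.mark.unit
def test_pruner_drops_duplicate_bullets():
    bullets = ["Shipped X", "Shipped X", "Led Y"]
    assert sorted(set(bullets)) == ["Led Y", "Shipped X"]


@pytest.mark.integration
def test_run_artifacts_are_persisted(tmp_path):
    artifact = tmp_path / "synthesis.json"
    artifact.write_text("{}")
    assert artifact.read_text() == "{}"
```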

What I’d point to in an interview

  • Feature-slice backend: features/* groups routes, service logic, and persistence per feature instead of a horizontal “controllers/services/models” split.
  • LLM system design: phase boundaries + caching + artifacts are what make the system diagnosable (not just “better prompts”).
  • Deterministic facts feeding generative output: tree-sitter artifacts are a reliability strategy, not a novelty.

Outcomes

  • Resume generation is now debuggable like a normal backend system: phase artifacts persist and can be replayed.
  • The UX stays fast even when models are slow because generation is streamed in phases over WebSocket.
  • Project bullets can be backed by code-derived facts instead of only narrative text.