reverse-headhunter · status: operational
Personal build · job-search infrastructure

I got tired of reading job boards.
So I built the execution engine that reads them for me.

An automated monitoring system that watches 100+ companies across five industries and dozens of European cities, decides which of their postings fit a specified profile, and surfaces only those. Each decision costs about five cents; keeping watch costs nothing.

pipeline: detect → decide
01 Detect · A script polls every company's careers page daily: via their hiring-platform APIs where they exist, parsing pages where they don't. · LLM cost: €0.00
02 Diff · Each run compares against yesterday's snapshot. Only genuinely new postings move forward, typically a handful, not an ocean. · LLM cost: €0.00
03 Filter · A mechanical title filter bins obvious non-fits (engineers, artists, the org-chart long tail) before any model is involved. · LLM cost: €0.00
04 Vet · Each survivor is read in full and scored 0–100 against a written manifesto. The only step that uses model tokens. · cost: ~€0.05
05 Brief · Matches above threshold land in a briefing with scores, reasoning, and an apply-or-not posture. Priority hits interrupt immediately. · output only
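
The last three stages can be sketched in a few lines. This is an illustration under stated assumptions, not the actual script: the posting shape ({ id, title }), the excluded title words, the threshold values (70 and 90), and the injected `scoreFn` are all hypothetical. The point it shows is that the scorer, the only paid step, is called only for postings that survive the free title filter.

```javascript
// Illustrative only: these words and thresholds are assumptions, not the real configuration.
const EXCLUDED_TITLE_WORDS = ['engineer', 'artist'];

// Stage 03: a mechanical, model-free check on the title alone.
function passesTitleFilter(title, excluded = EXCLUDED_TITLE_WORDS) {
  const lower = title.toLowerCase();
  return !excluded.some((word) => lower.includes(word));
}

// Stages 03-05: filter, vet, brief.
// scoreFn stands in for the manifesto-based vetting call and must return
// { score, reasoning } with score in the range 0-100.
function runPipeline(newPostings, scoreFn, { threshold = 70, priority = 90 } = {}) {
  const survivors = newPostings.filter((p) => passesTitleFilter(p.title));
  const briefing = [];
  for (const posting of survivors) {
    const { score, reasoning } = scoreFn(posting); // the only step that spends tokens
    if (score >= threshold) {
      briefing.push({
        id: posting.id,
        title: posting.title,
        score,
        reasoning,
        priority: score >= priority, // priority hits would interrupt immediately
      });
    }
  }
  // Highest-scoring matches first.
  return briefing.sort((a, b) => b.score - a.score);
}
```

A substring filter this blunt will also drop titles such as "Engineering Operations Manager", which is the kind of trade-off the filter word list has to be tuned against.
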
Manual search
You browse boards, read listings, judge each one, track what you've seen.
Your time: all of it
Generic AI agent
It fetches with little purpose, and you steer it, wait on it, and sift its output.
Your time: babysitting + filtering
This system
Searching and judging run on their own.
I read a briefing and decide.
My time: judgment + applying
01

The insight the system hinges on

Most job-search automation tries to make a keyword search clever: to make it express fit. That fails categorically: a search engine matches words, not meaning, so it returns noise dressed as signal. I learned this the hard way, across many runs, before drawing the line that defines the architecture.

Separate the dumb work from the judgment. Retrieval should be broad, cheap, and mechanical. Meaning is the only thing worth spending an expensive reasoning model on, and only on the few candidates that survive the cheap filters. Every design decision below follows from putting those two things on opposite sides of a wall.

100+
companies under watch, across 5 industries & dozens of European cities
7
strong matches surfaced from the 59 active sources on the first pass.
€0.00
cost to keep watching, per company, per day
~€0.05
typical cost of one fit-decision (up to €0.14 on a hard call), before per-session savings
02

One system, two branches

The system retrieves from two kinds of source. They share everything downstream of retrieval: the same manifesto, the same scoring, the same briefing format. They differ only in how they find postings in the first place.

Active · primary

Careers-page monitoring

Polls named target companies directly. Because the targets are pre-vetted at the company level, retrieval collapses to a daily diff: fetch, compare, surface what's new.

  • Reads hiring-platform APIs (Lever, Personio, Teamtailor…) where they exist: clean structured data, no scraping
  • Parses pages directly where they don't
  • Zero external dependencies; runs as a scheduled local job
  • Detection latency under 24 hours, at no token cost
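
The daily diff described above reduces to a set comparison. A minimal sketch, assuming each posting carries a stable `id` (for example the platform's posting id or its URL); the field name and the snapshot shape are assumptions made for illustration.

```javascript
// Returns the postings present today that were not in yesterday's snapshot.
// Assumption: `id` is stable across runs for the same posting.
function diffSnapshots(yesterday, today) {
  const seen = new Set(yesterday.map((p) => p.id));
  return today.filter((p) => !seen.has(p.id));
}

// Example with made-up data:
const exampleYesterday = [{ id: 'ops-1', title: 'Operations Lead' }];
const exampleToday = [
  { id: 'ops-1', title: 'Operations Lead' },
  { id: 'cos-7', title: 'Chief of Staff' },
];
const exampleNew = diffSnapshots(exampleYesterday, exampleToday);
// exampleNew contains only the 'cos-7' posting
```

Keying on an id rather than the title means a reworded posting is not mistaken for a new one, provided the source keeps its ids stable.
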
Built · held in reserve

LinkedIn scraping

The original branch, for open-ended search when the target set is unknown: cast broad role-family nets, let the judgment layer do all the filtering. Correct for discovery; demoted once the target set was mapped.

  • Broad nets, deliberately dumb. Intelligence lives in vetting, never the query
  • A funnel (dedupe → cull → cap) protects the expensive step
  • Retired, not deleted: the right tool for a different task
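
The funnel named above (dedupe → cull → cap) could look roughly like this. It is a sketch under assumptions: listings are keyed by a `url` field, and the cull predicate and cap size are supplied by the caller; none of these names come from the actual build.

```javascript
// dedupe -> cull -> cap: shrink a broad net before anything expensive runs.
// Assumptions: each listing has a `url`; isPlausible is a cheap, model-free check.
function funnel(listings, { isPlausible, cap }) {
  // dedupe: keep the first occurrence of each URL
  const byUrl = new Map();
  for (const listing of listings) {
    if (!byUrl.has(listing.url)) byUrl.set(listing.url, listing);
  }
  // cull: drop obvious non-fits mechanically
  const culled = [...byUrl.values()].filter(isPlausible);
  // cap: bound how many candidates can ever reach the vetting step
  return culled.slice(0, cap);
}
```

The cap is what makes the cost predictable: however wide the net, the number of paid judgments per task has a fixed ceiling.
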

Why this matters for an operations role: the hard part wasn't the code. It was deciding where meaning gets adjudicated, what each layer is allowed to do, and how to keep a cheap layer cheap without letting anything real slip through. The system is mostly a set of enforced boundaries.

03

How I operate a system I don’t trust blindly

The system has been wrong, and catching that is part of running it. Below are a few operating principles, each one learned because an early version failed and the fix belonged in the design.

scoring drift
When the same role scored differently across runs, the cause wasn't the model. It was an underspecified rule. The durable fix is tightening the specification, not reaching for a bigger model. Vague instructions, not weak tools, are usually the problem.
silent misses
A monitor's one unforgivable failure is a confident false all-clear, reporting "nothing new" when it simply failed to see. So the system is built to distrust its own silence and flag suspicious quiet, rather than assume success.
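
One way to make a monitor distrust its own silence is to classify every run before trusting its "nothing new". The heuristic below is an assumption for illustration (the text does not say how suspicious quiet is detected): a source that used to list postings and now returns far fewer is flagged rather than treated as an all-clear.

```javascript
// Classifies one source's run as 'failed', 'suspicious', or 'healthy'.
// Assumptions: `ok` reflects whether the fetch and parse completed;
// minRatio (0.5 here) is an arbitrary illustrative threshold.
function classifyRun({ ok, fetchedCount }, previousCount, minRatio = 0.5) {
  if (!ok) return 'failed';
  // A sharp drop in visible postings is more likely a broken parser
  // or a changed page than a company deleting most of its openings.
  if (previousCount > 0 && fetchedCount < previousCount * minRatio) {
    return 'suspicious';
  }
  return 'healthy';
}
```

Only a 'healthy' run is allowed to report "nothing new"; the other two states surface in the briefing as something to check.
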
two-tier spec
Cost is a design choice, not an afterthought. The system is structured so the easy decisions stay cheap and only genuinely hard calls draw on expensive reasoning, which is why a fit-decision averages around five cents rather than many times that.
division of labour
The system makes factual calls silently and with authority: does this experience transfer, is this a sensible stretch. It escalates only subjective calls a human must own, batched into one block, never dribbled. Knowing which is which is most of the value.
04

What it costs, measured honestly

Real runs were measured, counting the actual characters read and written at each stage, so these figures are grounded rather than guessed. Token counts are an approximation (chars ÷ 4) priced at published Sonnet rates. The system runs on a subscription, so these are notional compute costs, not billed amounts.
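
The chars ÷ 4 approximation can be written down directly. In this sketch the rate values in the example are placeholders, not quoted prices, and the character counts are invented; substitute the current published rates and measured counts.

```javascript
const CHARS_PER_TOKEN = 4; // the approximation used above

function estimateTokens(chars) {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// rates: price per million tokens, in whatever currency is being reported.
function notionalCost({ charsRead, charsWritten }, rates) {
  const inputTokens = estimateTokens(charsRead);
  const outputTokens = estimateTokens(charsWritten);
  return (
    (inputTokens / 1e6) * rates.inputPerMillion +
    (outputTokens / 1e6) * rates.outputPerMillion
  );
}

// Placeholder numbers for illustration only.
const exampleCost = notionalCost(
  { charsRead: 40000, charsWritten: 4000 },
  { inputPerMillion: 3, outputPerMillion: 15 },
);
// 10,000 input tokens and 1,000 output tokens at these placeholder rates
```

Because the token count is estimated from characters, the result is an order-of-magnitude figure; exact counts would need the provider's tokenizer.
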

stage | unit | cost
Detection: poll every company, daily | per day | €0.00
Diff + mechanical filter | per run | €0.00
Vetting, typical decision | per role | ~€0.05
Vetting, difficult decision | per role | ≤ €0.14
Vetting, each extra decision in the same session | per role | < €0.02
LinkedIn branch: one task = 100+ listings processed for one briefing | per 100 listings | €0.32–0.53

The structure is the point: watching is free, deciding is cheap, and the only real cost scales with the number of judgments actually made, not with how many companies are monitored. Adding companies costs nothing until they post something worth reading.

Per-session savings: in steady state the vetting session processes a handful of new postings. A single-posting session has the worst per-vet cost at ~€0.05–€0.14. The cost per extra vet is under €0.02. A session of ~4 new postings costs roughly €0.08–€0.16.
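
The per-session arithmetic follows a simple linear model: the first vet in a session carries the fixed overhead, and each further vet adds only a marginal cost. A sketch, assuming a marginal cost of €0.01 (the text only says "under €0.02", so the exact value is an assumption):

```javascript
// Cost of one vetting session with `postings` new postings.
// firstVetCost: cost of a single-posting session (~0.05 typical).
// extraVetCost: marginal cost of each additional vet (assumed value).
function sessionCost(postings, firstVetCost, extraVetCost) {
  if (postings <= 0) return 0;
  return firstVetCost + (postings - 1) * extraVetCost;
}

const fourPostings = sessionCost(4, 0.05, 0.01); // about 0.08
const perVet = fourPostings / 4;                 // about 0.02 per decision
```

Under these assumed inputs a four-posting session lands at the low end of the €0.08–€0.16 range quoted above, and the average cost per decision falls well below the single-posting figure.
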

05

How it was built

I designed the architecture and made every judgment call: the layer boundaries, the scoring rubric, the cost structure, the calibration fixes. I directed Claude to do the implementation: the monitoring script, the platform integrations, the diagnostics. The collaboration is the point, not a caveat. The role of the person is to hold the problem top-down, decide what good looks like, and keep the machine honest. That's the same disposition I'd bring to operationalising a research org's execution: absorb what the specialists are doing, build the systems that turn it into coordinated, measurable progress, and make it legible to everyone who needs it.

Stack, for the curious: a zero-dependency Node script for retrieval and diffing; hiring-platform APIs over scraping wherever possible; a written manifesto as the scoring authority; scheduled local execution; the reasoning layer reserved strictly for judgment. 100+ companies configured, rolling out in waves; built and calibrated over a sustained series of sessions.

06

Where it could go as a product

I built this for myself, but I designed it thinking about whether the problem it solves is mine alone. It isn't. So here's the shipped reality, and the product it could become.

Built & running
The engine on this page. Autonomous monitoring, manifesto-scored vetting, easy decisions. Running now, against my own search.
↓ the same engine, as a product ↓
01 Free conversational taste
A no-commitment first experience that shows the system’s value before asking for anything, and is designed to cost little to give away.
the hook
02 Voice-interview onboarding
A richer, more honest profile than a written form could produce, handled to keep the user in control of their data. This is where the real differentiation lives.
the moat
03 The paid service
Two tiers: one finds and surfaces roles, the other also helps you apply, offered at the moment a strong match appears.
initial revenue
04 B2B headhunting
The same profile inverted for talent teams: the higher-value transaction, and where a frontier-AI approach sets the service apart.
the larger value

The problem it solves: people want a better job and stay stuck, because the effort to find one competes with the energy their current job already takes.

The honest read: a timing play. Defensibility would come from accumulated data and trust over time, not from the technology itself.