The Reverse Headhunter — Peter Schmidt

The insight the system hinges on

Most job-search automation tries to make a keyword search clever: to make it express fit. That fails categorically: a search engine matches words, not meaning, so it returns noise dressed as signal. I learned this the hard way, across many runs, before drawing the line that defines the architecture.

Separate the dumb work from the judgment. Retrieval should be broad, cheap, and mechanical. Meaning is the only thing worth spending an expensive reasoning model on, and only on the few candidates that survive the cheap filters. Every design decision below follows from putting those two things on opposite sides of a wall.

100+

companies under watch, across 5 industries & dozens of European cities

strong matches surfaced from the 59 active sources on the first pass.

€0.00

cost to keep watching, per company, per day

~€0.05

typical cost of one fit-decision (up to €0.14 on a hard call), before per-session savings

One system, two branches

The system retrieves from two kinds of source. They share everything downstream of retrieval: the same manifesto, the same scoring, the same briefing format. They differ only in how they find postings in the first place.

Active · primary

Careers-page monitoring

Polls named target companies directly. Because the targets are pre-vetted at the company level, retrieval collapses to a daily diff: fetch, compare, surface what's new.

Reads hiring-platform APIs (Lever, Personio, Teamtailor…) where they exist: clean structured data, no scraping
Parses pages directly where they don't
Zero external dependencies; runs as a scheduled local job
Detection latency under 24 hours, at no token cost
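The daily diff described above can be sketched in a few lines. This is an illustrative Python sketch, not the Node script itself; the posting shape (an "id" and a "title") and the function name are assumptions made for the example.

```python
def diff_postings(previous_ids, current_postings):
    """Return the postings whose id was not present on the previous run."""
    seen = set(previous_ids)
    return [p for p in current_postings if p["id"] not in seen]


# Hypothetical data: ids stored from yesterday's run, and today's fetch.
yesterday = {"eng-101", "ops-204"}
today = [
    {"id": "eng-101", "title": "Backend Engineer"},
    {"id": "ops-204", "title": "Operations Lead"},
    {"id": "ops-309", "title": "Chief of Staff"},
]

new_postings = diff_postings(yesterday, today)
print([p["title"] for p in new_postings])
```

Nothing here needs a model: the comparison is a set lookup, which is why this layer can run daily at no token cost.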

Built · held in reserve

LinkedIn scraping

The original branch, for open-ended search when the target set is unknown: cast broad role-family nets, let the judgment layer do all the filtering. Correct for discovery; demoted once the target set was mapped.

Broad nets, deliberately dumb. Intelligence lives in vetting, never the query
A funnel (dedupe → cull → cap) protects the expensive step
Retired, not deleted: the right tool for a different task
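A minimal sketch of the dedupe → cull → cap funnel, in Python for illustration. The cull rule used here (dropping titles that contain a blocked term) and the field names are assumptions; the source does not say what the real cull criteria are.

```python
def funnel(listings, blocked_terms, cap):
    """Reduce a broad scrape to a bounded batch for the expensive vetting step."""
    # Dedupe: keep the first listing seen for each (company, title) pair.
    seen = set()
    unique = []
    for item in listings:
        key = (item["company"].lower(), item["title"].lower())
        if key not in seen:
            seen.add(key)
            unique.append(item)

    # Cull: drop listings whose title contains a blocked term (assumed rule).
    survivors = [
        item for item in unique
        if not any(term in item["title"].lower() for term in blocked_terms)
    ]

    # Cap: never hand more than `cap` listings to the judgment layer.
    return survivors[:cap]


# Hypothetical scrape results.
scraped = [
    {"company": "Acme", "title": "Operations Manager"},
    {"company": "Acme", "title": "Operations Manager"},   # duplicate
    {"company": "Beta", "title": "Operations Intern"},    # culled
    {"company": "Gamma", "title": "Chief of Staff"},
    {"company": "Delta", "title": "Program Lead"},
]

batch = funnel(scraped, blocked_terms=["intern"], cap=2)
print([(b["company"], b["title"]) for b in batch])
```

The cap is what protects the budget: however broad the net, the number of model calls per run stays bounded.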

Why this matters for an operations role: the hard part wasn't the code. It was deciding where meaning gets adjudicated, what each layer is allowed to do, and how to keep a cheap layer cheap without letting anything real slip through. The system is mostly a set of enforced boundaries.

How I operate a system I don’t trust blindly

The system has been wrong, and catching that is part of running it. Below are a few operating principles, each one learned because an early version failed and the fix belonged in the design.

scoring drift

When the same role scored differently across runs, the cause wasn't the model. It was an underspecified rule. The durable fix is tightening the specification, not reaching for a bigger model. Vague instructions, not weak tools, are usually the problem.

silent misses

A monitor's one unforgivable failure is a confident false all-clear, reporting "nothing new" when it simply failed to see. So the system is built to distrust its own silence and flag suspicious quiet, rather than assume success.
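One way to make a monitor distrust its own silence is to keep "nothing new" and "could not see" as separate outcomes. The sketch below is a hypothetical Python illustration; the status names and the specific heuristic (a page that listed roles yesterday and lists none today is suspect) are assumptions, not the system's actual rules.

```python
def classify_poll(fetch_ok, listed_now, listed_before, new_count):
    """Classify one company's poll so silence is never mistaken for success."""
    if not fetch_ok:
        # The request or the parse failed: we saw nothing, so we know nothing.
        return "failed"
    if listed_now == 0 and listed_before > 0:
        # A careers page that empties overnight is more likely a broken
        # parser or a moved page than a real hiring freeze.
        return "suspect"
    if new_count > 0:
        return "new"
    # Fetched fine, listings still present, nothing unseen: a genuine all-clear.
    return "quiet"


print(classify_poll(fetch_ok=True, listed_now=12, listed_before=12, new_count=0))
print(classify_poll(fetch_ok=True, listed_now=0, listed_before=12, new_count=0))
print(classify_poll(fetch_ok=False, listed_now=0, listed_before=12, new_count=0))
```

Only the "quiet" status is allowed to report "nothing new"; "suspect" and "failed" go to the operator instead.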

two-tier spec

Cost is a design choice, not an afterthought. The system is structured so the easy decisions stay cheap and only genuinely hard calls draw on expensive reasoning, which is why a fit-decision averages around five cents rather than many times that.

division of labour

The system makes factual calls silently and with authority: does this experience transfer, is this a sensible stretch. It escalates only subjective calls a human must own, batched into one block, never dribbled. Knowing which is which is most of the value.
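The batching rule can be illustrated with a small Python sketch. It assumes each decision already carries a "kind" label ("factual" or "subjective"); how the real system assigns that label is not described here, and the field names are hypothetical.

```python
def split_decisions(decisions):
    """Settle factual calls silently; gather subjective ones into one block."""
    settled = [d for d in decisions if d["kind"] == "factual"]
    escalated = [d for d in decisions if d["kind"] == "subjective"]

    block = None
    if escalated:
        lines = [
            f"{i}. {d['role']}: {d['question']}"
            for i, d in enumerate(escalated, start=1)
        ]
        # One block for the whole run, rather than one interruption per question.
        block = "Decisions needing your call:\n" + "\n".join(lines)
    return settled, block


# Hypothetical decisions from one vetting run.
run = [
    {"kind": "factual", "role": "Ops Lead", "question": "Does the logistics experience transfer?"},
    {"kind": "subjective", "role": "Chief of Staff", "question": "Is relocating to Lisbon acceptable?"},
    {"kind": "subjective", "role": "Program Lead", "question": "Is a 10% pay cut acceptable?"},
]

settled, block = split_decisions(run)
print(block)
```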

What it costs, measured honestly

Real runs were measured by counting the actual characters read and written at each stage, so these figures are grounded rather than guessed. Token counts are approximated as characters ÷ 4 and priced at published Sonnet rates. The system runs on a subscription, so these are notional compute costs rather than billed amounts.
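The estimation method can be written down as a short Python sketch. The rates are passed in as parameters, and the values in the example call are placeholders chosen for round arithmetic, not the published rates and not measurements from the runs described here.

```python
CHARS_PER_TOKEN = 4  # the chars ÷ 4 approximation stated above


def estimate_cost(chars_read, chars_written, rate_in_per_mtok, rate_out_per_mtok):
    """Estimate a notional cost from character counts.

    Rates are per million tokens; input (read) and output (written)
    are priced separately.
    """
    tokens_in = chars_read / CHARS_PER_TOKEN
    tokens_out = chars_written / CHARS_PER_TOKEN
    return (tokens_in * rate_in_per_mtok + tokens_out * rate_out_per_mtok) / 1_000_000


# Placeholder inputs: 40,000 characters read, 4,000 written,
# hypothetical rates of 3.0 (input) and 15.0 (output) per million tokens.
cost = estimate_cost(40_000, 4_000, rate_in_per_mtok=3.0, rate_out_per_mtok=15.0)
print(round(cost, 4))
```

Because the result is in whatever currency the rates are quoted in, any conversion to euros has to happen on the rates before they are passed in.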

stage | unit | cost
Detection: poll every company, daily | per day | €0.00
Diff + mechanical filter | per run | €0.00
Vetting, typical decision | per role | ~€0.05
Vetting, difficult decision | per role | ≤ €0.14
Vetting, each additional decision in the same session | per role | < €0.02
LinkedIn branch: one task = 100+ listings processed for one briefing | per 100 listings | €0.32–0.53

The structure is the point: watching is free, deciding is cheap, and the only real cost scales with the number of judgments actually made, not with how many companies are monitored. Adding companies costs nothing until they post something worth reading.

Per-session savings: in steady state, a vetting session processes a handful of new postings. A single-posting session has the worst per-vet cost, at ~€0.05–€0.14. Each extra vet in the same session adds under €0.02, so a session of ~4 new postings costs roughly €0.08–€0.16 in total.
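The session arithmetic can be checked with a small Python sketch. It works in whole cents to avoid floating-point noise and computes only an envelope: the lowest total assumes extra vets cost nothing, the highest assumes a hard first call and every extra vet at the €0.02 ceiling. The function name and parameters are illustrative.

```python
def session_cost_range_cents(n_postings, first_low=5, first_high=14, extra_cap=2):
    """Return (lowest, highest) plausible session cost in cents.

    first_low / first_high: cost range of the first vet in a session.
    extra_cap: upper bound on the cost of each additional vet.
    """
    if n_postings < 1:
        raise ValueError("a session needs at least one posting")
    extras = n_postings - 1
    return first_low, first_high + extras * extra_cap


print(session_cost_range_cents(1))  # single-posting session
print(session_cost_range_cents(4))  # session of four new postings
```

For four postings the envelope is 5 to 20 cents, so the rough €0.08–€0.16 quoted above sits inside it, which suggests extra vets typically cost well under the €0.02 ceiling.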

How it was built

I designed the architecture and made every judgment call: the layer boundaries, the scoring rubric, the cost structure, the calibration fixes. I directed Claude to do the implementation: the monitoring script, the platform integrations, the diagnostics. The collaboration is the point, not a caveat. The role of the person is to hold the problem top-down, decide what good looks like, and keep the machine honest. That's the same disposition I'd bring to operationalising a research org's execution: absorb what the specialists are doing, build the systems that turn it into coordinated, measurable progress, and make it legible to everyone who needs it.

Stack, for the curious: a zero-dependency Node script for retrieval and diffing; hiring-platform APIs over scraping wherever possible; a written manifesto as the scoring authority; scheduled local execution; the reasoning layer reserved strictly for judgment. 100+ companies configured, rolling out in waves; built and calibrated over a sustained series of sessions.

Where it could go as a product

I built this for myself, but I designed it thinking about whether the problem it solves is mine alone. It isn't. So here's the shipped reality, and the product it could become.

Built & running: the engine on this page. Autonomous monitoring, manifesto-scored vetting, easy decisions. Running now, against my own search.

↓ the same engine, as a product ↓

01 Free conversational taste

A no-commitment first experience that shows the system’s value before asking for anything, and is designed to cost little to give away.

the hook

02 Voice-interview onboarding

A richer, more honest profile than a written form could produce, handled to keep the user in control of their data. This is where the real differentiation lives.

the moat

03 The paid service

Two tiers: one finds and surfaces roles, the other also helps you apply, offered at the moment a strong match appears.

initial revenue

04 B2B headhunting

The same profile inverted for talent teams: the higher-value transaction, and where a frontier-AI approach sets the service apart.

the larger value

The problem it solves: people want a better job and stay stuck, because the effort to find one competes with the energy their current job already takes.

The honest read: a timing play. Defensibility would come from accumulated data and trust over time, not from the technology itself.