An automated monitoring system that watches 100+ companies across five industries and dozens of European cities, decides which of their postings fit a specified profile, and surfaces only those — for about five cents a decision, and nothing at all to keep watching.
Most job-search automation tries to make a keyword search clever enough to express fit. That fails categorically: a search engine matches words, not meaning, so it returns noise dressed as signal. I learned this the hard way, over many runs, before drawing the line that defines the architecture.
Separate the dumb work from the judgment. Retrieval should be broad, cheap, and mechanical. Judging meaning is the only job worth an expensive reasoning model, and even then only for the few candidates that survive the cheap filters. Every design decision below follows from putting those two things on opposite sides of a wall.
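The wall can be sketched as a two-stage function. Cheap, mechanical predicates run over everything retrieved, and the expensive judgment call runs only on what survives. This is a minimal sketch, not the system's code. The posting shape, the predicates and the `judge` callback are hypothetical stand-ins. `judge` is synchronous only to keep the example short; a real model call would be async.

```javascript
// Minimal sketch of the retrieval/judgment wall. All names are hypothetical.

// Cheap stage: pure predicates, no model calls. A posting must pass every one.
function cheapFilter(postings, predicates) {
  return postings.filter((p) => predicates.every((keep) => keep(p)));
}

// Expensive stage: `judge` stands in for a reasoning-model call and only ever
// sees survivors of the cheap stage. Synchronous here for brevity.
function runPipeline(postings, predicates, judge) {
  const survivors = cheapFilter(postings, predicates);
  const judged = survivors.map((p) => ({ posting: p, verdict: judge(p) }));
  return {
    retrieved: postings.length,
    judged: judged.length, // the only count that drives cost
    surfaced: judged.filter((j) => j.verdict.fits).map((j) => j.posting),
  };
}

// Hypothetical cheap predicates: a city allowlist and an internship exclusion.
const inCities = (cities) => (p) => cities.includes(p.city);
const notInternship = (p) => !/\bintern(ship)?\b/i.test(p.title);

// Usage with a stub judge that counts how often it is called.
let judgeCalls = 0;
const stubJudge = (p) => {
  judgeCalls += 1;
  return { fits: p.title.includes("Operations") };
};

const result = runPipeline(
  [
    { id: 1, title: "Operations Lead", city: "Berlin" },
    { id: 2, title: "Operations Intern", city: "Berlin" },
    { id: 3, title: "Operations Manager", city: "Austin" },
  ],
  [inCities(["Berlin", "Amsterdam"]), notInternship],
  stubJudge,
);
```

Three postings are retrieved, but only one reaches the judge. The predicates never need to understand fit, only to be cheap and safe to apply to everything.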
The system retrieves from two kinds of source. They share everything downstream of retrieval: the same manifesto, the same scoring, the same briefing format. They differ only in how they find postings in the first place.
One branch polls named target companies directly. Because the targets are pre-vetted at the company level, retrieval collapses to a daily diff: fetch, compare, surface what's new.
The original branch handles open-ended search when the target set is unknown: it casts broad role-family nets and lets the judgment layer do all the filtering. It was the right tool for discovery, and it was demoted once the target set had been mapped.
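The daily diff in the company-polling branch reduces to a set difference on posting ids. This is a minimal sketch using only Node built-ins. It assumes each posting carries a stable `id` and that the previous run's ids are kept in a JSON file. The file layout, the function names and the injected `fetchPostings` callback are hypothetical.

```javascript
const fs = require("fs");

// Postings present now whose id was absent from the previous snapshot.
function diffPostings(previousIds, currentPostings) {
  const seen = new Set(previousIds);
  return currentPostings.filter((p) => !seen.has(p.id));
}

// Last run's ids; a missing file means first run, so everything counts as new.
function loadSnapshot(file) {
  if (!fs.existsSync(file)) return [];
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Overwrite the snapshot with today's full set of ids.
function saveSnapshot(file, postings) {
  fs.writeFileSync(file, JSON.stringify(postings.map((p) => p.id)));
}

// One daily cycle: fetch (injected), compare, persist, surface what's new.
async function dailyCycle(fetchPostings, snapshotFile) {
  const current = await fetchPostings();
  const fresh = diffPostings(loadSnapshot(snapshotFile), current);
  saveSnapshot(snapshotFile, current);
  return fresh;
}
```

The snapshot holds only the current set. As a result, a posting that disappears and later returns with the same id is surfaced again. Keeping a cumulative history of seen ids is the alternative if reposts should stay quiet.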
Why this matters for an operations role: the hard part wasn't the code. It was deciding where meaning gets adjudicated, what each layer is allowed to do, and how to keep a cheap layer cheap without letting anything real slip through. The system is mostly a set of enforced boundaries.
The system has been wrong, and catching that is part of running it. Below are a few operating principles, each one learned because an early version failed and the fix belonged in the design.
Real runs were measured, counting the actual characters read and written at each stage, so these figures are grounded rather than guessed. Token counts are approximated as characters ÷ 4 and priced at published Sonnet rates. Because the system runs on a subscription, the costs are notional compute costs, not billed ones.
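The estimation method above fits in a few lines. In this sketch, tokens are approximated as characters ÷ 4 and notional cost is tokens times a per-million-token rate, summed over measured stages. The rates are parameters, not quoted prices. The values in the test are placeholders to be replaced with current published Sonnet rates, and currency conversion is left out.

```javascript
// Approximate tokens from characters (chars ÷ 4, rounded up).
function estimateTokens(chars) {
  return Math.ceil(chars / 4);
}

// Notional cost of one stage from measured characters in and out.
// Rates are per million tokens and are supplied by the caller (placeholders).
function notionalCost({ charsIn, charsOut }, { inPerMTok, outPerMTok }) {
  const tokensIn = estimateTokens(charsIn);
  const tokensOut = estimateTokens(charsOut);
  return (tokensIn * inPerMTok + tokensOut * outPerMTok) / 1e6;
}

// Notional cost of a whole run: the sum over its measured stages.
function runCost(stages, rates) {
  return stages.reduce((sum, stage) => sum + notionalCost(stage, rates), 0);
}
```

Input and output are priced separately because output tokens usually cost more per token than input. A stage that writes a long verdict can therefore outweigh one that reads a long posting.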
The structure is the point: watching is free, deciding is cheap, and the only real cost scales with the number of judgments actually made, not with how many companies are monitored. Adding companies costs nothing until they post something worth reading.
Per-session cost: in steady state the vetting session processes a handful of new postings. A single-posting session has the worst per-vet cost, ~€0.05–€0.14, because the session's fixed overhead is carried by one vet. Each extra vet adds under €0.02. A session of ~4 new postings costs roughly €0.08–€0.16.
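The cost shape described above is a fixed per-session overhead plus a small marginal cost per vet, with nothing charged for monitoring. This minimal sketch of that model works in euro cents so the arithmetic stays exact. The inputs are illustrative, not measurements: 5 cents for a one-posting session and 1 cent per extra vet is one pair consistent with the low end of the stated ranges. The sketch also assumes no vetting session runs when nothing new was found.

```javascript
// Session cost = cost of a one-posting session + a marginal cost per extra vet.
// Euro cents as integers keep the arithmetic exact. Assumes no vetting session
// runs at all when nothing new was found.
function sessionCostCents(n, singleCents, marginalCents) {
  if (n === 0) return 0;
  return singleCents + (n - 1) * marginalCents;
}

function perDecisionCents(n, singleCents, marginalCents) {
  if (n <= 0) throw new RangeError("per-decision cost needs at least one vet");
  return sessionCostCents(n, singleCents, marginalCents) / n;
}

// Illustrative inputs: 5 cents for a one-posting session, 1 cent per extra vet.
// The number of monitored companies is not an input anywhere.
for (const n of [1, 2, 4, 8]) {
  console.log(`${n} vets: ${sessionCostCents(n, 5, 1)}c total, ${perDecisionCents(n, 5, 1)}c each`);
}
```

With those inputs, four postings cost 8 cents in total, 2 cents each, which matches the low end of the four-posting figure. Per-decision cost keeps falling as sessions grow, because the fixed overhead is shared.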
I designed the architecture and made every judgment call: the layer boundaries, the scoring rubric, the cost structure, the calibration fixes. I directed Claude to do the implementation: the monitoring script, the platform integrations, the diagnostics. The collaboration is the point, not a caveat. The role of the person is to hold the problem top-down, decide what good looks like, and keep the machine honest. That's the same disposition I'd bring to operationalising a research org's execution: absorb what the specialists are doing, build the systems that turn it into coordinated, measurable progress, and make it legible to everyone who needs it.
Stack, for the curious: a zero-dependency Node script for retrieval and diffing; hiring-platform APIs over scraping wherever possible; a written manifesto as the scoring authority; scheduled local execution; the reasoning layer reserved strictly for judgment. 100+ companies configured, rolling out in waves; built and calibrated over a sustained series of sessions.
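The retrieval side of that stack can be sketched with Node built-ins only. The sketch uses a config of companies, each tagged with a platform and a rollout wave, and one adapter per platform that normalises API responses into a shared posting shape. Everything named here is hypothetical: the companies, the `boardA` platform key, the endpoint (on the reserved `.invalid` domain) and the response fields are stand-ins, not any real hiring platform's API. `fetch` is the global built into Node 18+. It is injectable so the sketch can run without a network.

```javascript
// Hypothetical config: each company names a platform, a board slug and a rollout wave.
const companies = [
  { name: "ExampleCo", platform: "boardA", slug: "exampleco", wave: 1 },
  { name: "SampleOrg", platform: "boardA", slug: "sampleorg", wave: 2 },
];

// One adapter per platform: where to fetch, and how to normalise the JSON into
// the shared posting shape. The URL and response fields are placeholders.
const adapters = {
  boardA: {
    url: (slug) => `https://jobs.example.invalid/${slug}/postings.json`,
    normalise: (json, company) =>
      json.postings.map((j) => ({
        id: `${company.slug}:${j.id}`, // namespaced so ids never collide across companies
        title: j.title,
        city: j.location,
        url: j.link,
        company: company.name,
      })),
  },
};

// Retrieve every company in the active waves. fetchImpl defaults to the global
// fetch built into Node 18+, and can be swapped for a stub in tests.
async function retrieve(activeWaves, fetchImpl = fetch) {
  const postings = [];
  for (const company of companies.filter((c) => activeWaves.includes(c.wave))) {
    const adapter = adapters[company.platform];
    if (!adapter) {
      console.warn(`no adapter for ${company.platform}; skipping ${company.name}`);
      continue;
    }
    const res = await fetchImpl(adapter.url(company.slug));
    if (!res.ok) {
      console.warn(`${company.name}: HTTP ${res.status}; skipping this run`);
      continue; // one failing board should not sink the whole run
    }
    postings.push(...adapter.normalise(await res.json(), company));
  }
  return postings;
}
```

Namespacing ids by company lets a single snapshot diff cover every board at once. Rolling out a new wave is then a one-argument change rather than a code change.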
I built this for myself, but I designed it thinking about whether the problem it solves is mine alone. It isn't. So here's the shipped reality, and the product it could become.
The problem it solves: people want a better job and stay stuck, because the effort to find one competes with the energy their current job already takes.
The honest read: a timing play. Defensibility would come from accumulated data and trust over time, not from the technology itself.