LEARN handed you a grounded picture of reality. ANALYZE turns that pile of facts into a single decision: sort every gap into the right bucket, rate the handful that are actually yours to do now, and pick the one most valuable bounded unit to execute next. Not three. One.
Every turn of the loop has five steps — LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE. This lesson is the second step. ANALYZE is the moment between seeing reality and changing it: you take the gaps LEARN surfaced and decide, on purpose, the single next thing to do.
It has three small moves, in order. First you classify each gap into one of four buckets — is it on-scope work, a do-now blocker, something that needs the user, or plainly out of scope? Second, for the gaps that survive (the on-scope and do-now ones), you rate each candidate unit on five quick axes: Fit, Risk, Proof, Blocker, Next. Third, you pick exactly one — the most valuable bounded unit — and hand it to EXECUTE.
The discipline that makes this hard is the word one. When you can see ten things wrong, every instinct says "I'll just fix them all while I'm here". The loop forbids that. A pass executes a single bounded unit so that VERIFY can prove that one change at the real boundary, and so the human watching can tell exactly what moved. Batch five fixes together and a red test tells you nothing about which fix broke it.
Continuing the running example from the last lesson: in the RHG service, LEARN already confirmed the gap — /search returns every row on an empty query, the handler is in api.py:42, and there's a skipped test test_empty_query_returns_400 waiting. ANALYZE's job now is not to find more gaps. It's to decide which single unit to ship first.
Think of it like… a triage nurse in a full emergency room. Ten people are waiting and all of them want help. The nurse doesn't treat anyone yet — she sorts. Who is critical and must be seen now, who can wait, who is in the wrong department entirely, who needs a doctor's call she can't make herself. Only after the sort does one patient go through the door. Where the analogy breaks: the nurse runs many rooms at once; an ANALYZE pass commits to a single patient, because the loop wants one provable change per turn.
A bounded unit is the smallest change that is independently verifiable at the real boundary. The bound is what lets VERIFY (lesson 6) make a clean claim: this test went from skipped to green, this endpoint now returns 400, nothing else moved. If a unit bundles two behaviours, a failure is ambiguous — you can't tell which half regressed without unpicking it. Picking one is therefore not productivity advice; it's what keeps the proof gate meaningful and the loop debuggable.
ANALYZE is allowed to reason only over grounded facts — the picture LEARN established. It does not go look again (that was LEARN) and it does not edit (that's EXECUTE). If a rating needs a fact you don't have, that's a signal to drop back to LEARN, not to guess. Keeping the steps separate is what stops "analysis" from quietly becoming "I started fixing it".
A finished ANALYZE pass produces one chosen unit and a short, legible rationale: which bucket each gap landed in, how the top candidates rated, and why this one won. That rationale is observability — it lands in LOOP-LOG.md so the human can read the decision without re-deriving it (lesson 7).
Classifying means dropping each gap into exactly one of four boxes. The boxes are not about how big or hard the work is — they're about whether it's yours to do, and whether it's now. Here they are.
On-scope: does the done-when contract (lesson 2) require it? If yes, and it isn't blocking anything urgent, it's normal queued work. Do-now: would the current unit be impossible or untrustworthy without it? A broken test runner, a missing dependency, a red baseline you'd otherwise mistake for your own breakage — these jump ahead because they poison VERIFY. Needs-user: is there a genuine fork only the human can settle — a product choice, an irreversible action, an ambiguity scope didn't pin down? Then it's a handoff, decision-ready (lesson 9), not a guess. Out-of-scope: is it real but outside this goal? Park it where it won't be lost, and move on.
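The four questions above can be read as a small decision function. The sketch below is one possible encoding, not the lesson's own implementation: the `Gap` field names and the order in which the questions are asked are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    ON_SCOPE = "on-scope"
    DO_NOW = "do-now"
    NEEDS_USER = "needs-user"
    OUT_OF_SCOPE = "out-of-scope"


@dataclass(frozen=True)
class Gap:
    """Grounded facts about one gap (hypothetical field names)."""
    title: str
    poisons_verify: bool         # broken runner, missing dependency, red baseline
    needs_human_call: bool       # product choice, irreversible action, ambiguity
    required_by_done_when: bool  # the done-when contract asks for it


def classify(gap: Gap) -> Bucket:
    """Return exactly one bucket per gap; the question order is an assumption."""
    if gap.poisons_verify:
        return Bucket.DO_NOW
    if gap.needs_human_call:
        return Bucket.NEEDS_USER
    if gap.required_by_done_when:
        return Bucket.ON_SCOPE
    return Bucket.OUT_OF_SCOPE


guard = Gap("empty-q guard", False, False, True)
print(classify(guard).value)  # on-scope
```

Because `classify` returns a single enum member, a gap cannot land in two buckets, which is the property the sort depends on.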
"Out of scope" doesn't mean "ignore". A parked item is written down (a backlog note, a follow-up ticket) so the observability trail stays honest and nothing real evaporates. The loop is allowed to not do something; it is not allowed to pretend it didn't see it.
Everything else runs AFK. The whole suite's contract is that the human only observes — they read the log, they don't execute. The single exception is a needs-user fork: the loop pauses and hands a decision-ready question over (never a half-built guess). That handoff is the one place a human is in the path, and it's deliberately rare.
Now do the classifying yourself. Below is a triage board whose four columns are the four buckets. The cards are the exact gaps LEARN confirmed on the RHG task, plus a couple of tempting distractions. Drag each card — or press its arrow — into the bucket where it belongs. There's no "in progress" here; classifying is a sort, and every gap lands in precisely one bucket.
The counts at the top stay honest: how many gaps are actionable now (on-scope + do-now), and how many you've set aside (needs-user + out-of-scope). Only the actionable ones move on to rating.
Think of it like… sorting the morning mail into four trays before you open anything. Bills to pay, something urgent that can't wait, a letter you must ask your partner about, junk for the recycling. You don't act on any of it while sorting — you just make sure each piece is in the right tray. Acting comes after, one piece at a time.
Each gap is one object with a col field constrained to the four buckets. The columns on screen are not the source of truth — the array is — so a card physically cannot sit in two buckets and the counts can never drift. This mirrors the discipline exactly: classification is a function from a gap to a single category, not a vibe. The "actionable now" count sums the first two columns, because those are the only gaps that earn a rating in the next section.
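As a rough illustration of that data model, here is a Python sketch in which the list of gaps is the only source of truth and both counts are derived from it. The lesson's board presumably runs in the browser; the class and function names here are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Literal, get_args

Bucket = Literal["on-scope", "do-now", "needs-user", "out-of-scope"]
ACTIONABLE = ("on-scope", "do-now")  # the two buckets that earn a rating


@dataclass
class Gap:
    title: str
    col: Bucket  # a single value, so a gap cannot sit in two buckets

    def __post_init__(self) -> None:
        # Reject anything outside the four buckets at construction time.
        if self.col not in get_args(Bucket):
            raise ValueError(f"unknown bucket: {self.col!r}")


def counts(gaps: list[Gap]) -> tuple[int, int]:
    """Return (actionable now, set aside), both derived from the list."""
    per_bucket = Counter(gap.col for gap in gaps)
    actionable = sum(per_bucket[b] for b in ACTIONABLE)
    return actionable, len(gaps) - actionable


board = [
    Gap("empty-q guard", "on-scope"),
    Gap("400 vs 422 status code", "needs-user"),
    Gap("rewrite search index", "out-of-scope"),
]
print(counts(board))  # (1, 2)
```

Since the counts are recomputed from the list rather than stored, they cannot drift from what the board shows.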
Where this picks up from lesson 3: LEARN's board ended with observations confirmed; this board takes those same items and assigns each a bucket. Confirm first (LEARN), classify second (ANALYZE) — the two boards are the two steps, in order.
The whole step is a funnel. Many gaps go in; they get sorted into buckets; only the actionable ones get rated; and exactly one comes out the bottom as the chosen unit. Everything else is either parked, handed off, or queued for a later pass.
The one rule
ANALYZE always ends with a single chosen unit. If a pass ends with "and also these four", it didn't finish analyzing — it just made a to-do list, and to-do lists don't survive contact with the proof gate.
Once a gap is in an actionable bucket, you rate it on five quick axes. They're chosen so a single glance tells you whether a unit is a good next move — not just whether it's worth doing eventually, but whether it's worth doing now, and whether you'll be able to prove it when you're done.
Fit — how squarely the unit serves the done-when. A unit that directly satisfies a contract line scores high; a tangential nice-to-have scores low even if it's tempting. Risk — the chance it breaks something else or balloons past its bound (high risk is bad, so it counts against). Proof — how cleanly VERIFY can confirm it at the real boundary: a unit with a waiting test scores high; one whose only proof is "looks right" scores low. Blocker — is there anything that stops it starting right now (a missing fact, a needs-user fork, an unrun test suite)? A live blocker can veto an otherwise great unit. Next — how much finishing it unblocks afterwards; a small unit that clears the path for three more punches above its weight.
Beginners rate units by excitement ("this would be cool"). The rubric forces the unglamorous questions instead: can I prove it, will it break the build, is it even unblocked. A high-Fit unit with a live Blocker is not the next move — the blocker is. That re-ordering is most of the value of rating at all.
You're not adding the axes into a precise number; you're using them to make the trade-off legible. A unit with stellar Fit but no Proof should make you nervous, and the rubric is what surfaces that before you've sunk a pass into it. The scorecard in the next section makes the weighing tangible.
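One way to make the trade-off legible without collapsing it into a number is to keep the five axes as a record and surface only the warning signs. This is a sketch under assumptions: the 0-100 scale matches the ratings used elsewhere in this lesson, but the thresholds (70 and 50) and the names `Rating` and `concerns` are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rating:
    fit: int                       # how squarely it serves the done-when
    risk: int                      # higher is worse
    proof: int                     # how cleanly VERIFY can confirm it
    blocker: Optional[str] = None  # None means nothing stops it starting
    next_value: str = "unknown"    # how much finishing it unblocks


def concerns(r: Rating) -> list[str]:
    """List the unglamorous red flags; thresholds are assumptions."""
    notes = []
    if r.blocker is not None:
        notes.append(f"blocked ({r.blocker}): the blocker is the next move")
    if r.fit >= 70 and r.proof < 50:
        notes.append("high Fit but weak Proof")
    if r.risk >= 70:
        notes.append("high Risk")
    return notes


guard = Rating(fit=92, risk=18, proof=95, next_value="high")
index_rewrite = Rating(fit=40, risk=78, proof=35, blocker="many callers")
print(concerns(guard))          # []
print(concerns(index_rewrite))
```

An empty list does not mean "pick it"; it only means none of the rubric's warning signs fired.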
Triage left three candidates on the RHG task. Here they are as cards, each rated on the rubric. Think of them as variants of one decision (the same component, the next move, in different flavours) laid side by side so you can compare on purpose. Use the buttons to sort by an axis; the card that wins on that axis lights up. The one that wins overall carries a green badge.
on-scope
Add the empty-query guard
Return 400 when q is missing or empty; un-skip the waiting test.
out-of-scope-ish
Rewrite the search index layer
Make index.lookup reject short queries at the data layer too.
needs-user
Pick the status code: 400 vs 422
A product call on which error code the API should return.
Each card carries its raw ratings in data attributes (data-fit, data-risk, data-proof). Sorting re-orders the cards by the chosen axis and lights the leader — exactly the way a component gallery lets you compare one variant across a size axis. The "overall" sort uses a simple legible score: Fit and Proof add, Risk subtracts, and a live Blocker caps the result. Card A wins not because it's the most ambitious but because it's the most shippable next: high Fit, high Proof, zero Blocker.
Card B is the classic scope-creep trap: a real improvement that balloons risk and proves nothing the guard doesn't. Card C is genuinely valuable but lives in needs-user — it's a handoff, not this pass's unit. Rating doesn't just find the winner; it tells you why each loser is parked or handed off, which is the rationale that lands in the log.
The cards showed fixed ratings. Now you turn the dials. Drag the three sliders to rate a unit on Fit, Risk, and Proof; the gauge and verdict update live. Try the presets to feel how a real candidate scores — and watch what a live Blocker does to even a high-Fit unit.
Think of it like… a sound desk. Push Fit and Proof up and the level rises into the green; push Risk up and it drops back toward red. A blocker is the mute switch — flip it and it doesn't matter how good the mix is, nothing comes out until you clear it.
The score is deliberately simple so you can predict it: value = Fit + Proof − Risk, clamped to 0–200. At 120 or above it reads "pick now" (green), from 70 to 119 "maybe", and below 70 "defer". The Blocker switch is special: flip it on and the verdict is forced to defer regardless of the rest, because a live blocker means the unit literally cannot start. That hard veto encodes the rubric's sharpest rule: a blocker beats a high score every time.
// Clamp x into the inclusive range [lo, hi].
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

function score(fit, risk, proof, blocked) {
  if (blocked) return { v: 0, verdict: 'defer' }; // hard veto
  const v = clamp(fit + proof - risk, 0, 200);
  return { v, verdict: v >= 120 ? 'go' : v >= 70 ? 'maybe' : 'defer' };
}
Move the dials to the guard preset and you'll land deep in the green; move to the index-rewrite preset and high Risk plus low Proof drag it down to defer — the same verdicts the cards reached, now under your own hands.
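To check those verdicts by hand, here is a Python port of the same scoring rule, run on the Fit / Risk / Proof ratings this lesson records for the three candidates. The port is a sketch for working the arithmetic, not the page's own code.

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """Clamp x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def score(fit: int, risk: int, proof: int, blocked: bool) -> tuple[int, str]:
    """Fit and Proof add, Risk subtracts, a live blocker is a hard veto."""
    if blocked:
        return 0, "defer"
    v = clamp(fit + proof - risk, 0, 200)
    if v >= 120:
        return v, "go"
    if v >= 70:
        return v, "maybe"
    return v, "defer"


print(score(92, 18, 95, blocked=False))  # guard: 92 + 95 - 18 = 169 -> (169, 'go')
print(score(40, 78, 35, blocked=False))  # index rewrite: -3 clamps to 0 -> (0, 'defer')
print(score(55, 22, 50, blocked=False))  # status-code call, unblocked: (83, 'maybe')
print(score(55, 22, 50, blocked=True))   # same ratings with a live blocker: (0, 'defer')
```

The last two calls show the veto at work: identical ratings, but the live blocker forces defer.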
Sometimes the decision isn't "which gap" but "how big a bite". For the empty-query fix there are at least three reasonable units, each a different size of the same change. None is wrong; each makes a different bargain between how fast it ships, how much it proves, and how much risk it carries. Lay them side by side and choose the bound on purpose. Hover or focus a card to bring it forward.
Unit A. Add one early return for empty q, un-skip the one waiting test. The smallest provable unit.
# api.py: the whole change
if not (q := request.args.get("q", "").strip()):
    return jsonify(error="empty query"), 400
Unit B. Extract a require_query() helper and call it from /search now, ready for siblings later.
from flask import abort  # assumes the service uses Flask

def require_query(args):
    q = args.get("q", "").strip()
    if not q:
        abort(400, "empty query")
    return q
Unit C. Sweep all six query endpoints and add validation to each in one pass.
# touches 6 handlers + 6 new tests
for ep in (search, suggest, facet, related, recent, popular):
    add_guard(ep)  # one big unit
All three units would improve RHG. The tiebreaker is the proof gate: unit A maps one-to-one onto an existing test, so VERIFY can make an unambiguous claim. Unit C bundles six behaviours, so one failure is a mystery and the human watching can't tell what moved. The loop's preference for the smallest provable bite isn't timidity; it's what keeps every pass debuggable and every claim honest. Unit C isn't rejected; it's re-shaped into six single-endpoint units: the /search guard chosen now, plus five queued units, one per sibling endpoint, each its own future pass.
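Re-shaping the sweep can be sketched as splitting one list of handlers into a chosen unit and a queue. The handler names come from the sweep snippet in this lesson; the function name and the unit wording are hypothetical.

```python
HANDLERS = ("search", "suggest", "facet", "related", "recent", "popular")


def split_sweep(handlers: tuple[str, ...], chosen: str) -> tuple[str, list[str]]:
    """Turn one big sweep into a chosen unit plus one queued unit per sibling."""
    if chosen not in handlers:
        raise ValueError(f"{chosen!r} is not one of the swept handlers")
    unit = f"add empty-query guard to {chosen}"
    queued = [f"add empty-query guard to {h}" for h in handlers if h != chosen]
    return unit, queued


unit, queued = split_sweep(HANDLERS, "search")
print(unit)         # add empty-query guard to search
print(len(queued))  # 5
```

Each queued entry is its own future pass, so each one gets its own proof at the boundary.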
Unit B extracts a helper before a second caller exists, which is speculative generality — risk with no present payoff. If LEARN had shown two endpoints already needing the guard today, B's reuse would be real and its rating would climb. The rubric only credits reuse you can point at, not reuse you imagine.
Two of the five axes do most of the picking, so it helps to plot just them. Put Fit on the vertical axis and Risk on the horizontal and you get four quadrants. The one you want is the top-left: high fit and low risk, so high value and unlikely to bite. That's where the empty-query guard lands.
Top-left (high fit, low risk) — pick it now; this is the sweet spot. Top-right (high fit, high risk) — the work matters but the bite is too big or too dangerous; split it into smaller provable units and pick the safest slice. Bottom-left (low fit, low risk) — harmless filler; it's safe but it doesn't move the done-when, so it waits. Bottom-right (low fit, high risk) — avoid; lots of danger for little value. The matrix is a fast first pass; Proof, Blocker and Next then break ties among the survivors.
The index rewrite scored low Fit (the guard already satisfies the contract line) and high Risk (it touches every caller with no waiting test). That's the avoid quadrant: not because it's a bad idea forever, but because as this pass's unit it's the worst trade on the board.
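The quadrant lookup is simple enough to sketch directly. The midpoint of 50 on a 0-100 scale is an assumption, as are the function name and the short labels; the Fit and Risk values are the ones this lesson gives for the guard and the index rewrite.

```python
def quadrant(fit: int, risk: int, midpoint: int = 50) -> str:
    """Map a (Fit, Risk) pair to one of the four quadrants."""
    high_fit = fit >= midpoint
    high_risk = risk >= midpoint
    if high_fit and not high_risk:
        return "pick now"  # top-left: the sweet spot
    if high_fit and high_risk:
        return "split"     # top-right: carve out the safest slice
    if not high_fit and not high_risk:
        return "wait"      # bottom-left: harmless filler
    return "avoid"         # bottom-right: danger for little value


print(quadrant(fit=92, risk=18))  # pick now
print(quadrant(fit=40, risk=78))  # avoid
```

This is only the first pass; Proof, Blocker and Next still decide among anything that lands in the same quadrant.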
You've picked the unit: add the empty-query guard. But even one unit can be built more than one way, and ANALYZE is also where you choose the approach. Switch between three ways to implement the guard; the diagram and the rating note update together so you can feel each trade-off before EXECUTE touches a line.
The guard is three lines at the top of /search. The change lives exactly where its one effect is.
A @require_query wrapper holds the check; you opt routes in one at a time. Reuse without a global blast radius.
Validation runs on every request before any route. Powerful — and exactly why it's risky: it touches endpoints that have no q at all.
Choosing how to build the unit is still analysis: it changes Risk and Proof, so it belongs before any code is written. All three diagrams ship in one inline SVG; the tablist swaps which <g data-diagram> is shown and rewrites the caption and aria-label, with arrow-key roving focus per the WAI-ARIA tabs pattern. For RHG, approach A (the inline guard) wins the same way unit A did: highest Fit and Proof, lowest Risk. The decorator only becomes the right call once a real second caller exists; the middleware overshoots the scope and drags in routes that have no query at all, such as /upload, that shouldn't be touched.
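To make the opt-in character of the decorator approach concrete, here is a framework-free sketch. Handlers take a plain dict of query arguments and return a (body, status) pair, which is a simplification of whatever web framework the service really uses; `search` and `upload` are stand-in handlers.

```python
from functools import wraps


def require_query(handler):
    """Reject a missing or blank q before the wrapped handler runs."""
    @wraps(handler)
    def wrapper(args: dict):
        q = args.get("q", "").strip()
        if not q:
            return {"error": "empty query"}, 400
        return handler({**args, "q": q})
    return wrapper


@require_query  # opted in: this route needs q
def search(args: dict):
    return {"results": [f"hit for {args['q']}"]}, 200


def upload(args: dict):  # not opted in: this route has no q at all
    return {"ok": True}, 200


print(search({"q": "  "})[1])  # 400
print(upload({})[1])           # 200
```

A global middleware would have run the same check on `upload` as well, which is the blast radius the opt-in decorator avoids.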
This is the habit ANALYZE exists to enforce, so look at it head-on. On the left, the loop's way: pick one unit, prove it, then go again. On the right, the tempting way: grab everything at once. They feel similar at the start and diverge completely at the proof gate.
Why one wins
The bound marks the largest change VERIFY can still make an unambiguous claim about. Cross that line and a red result stops telling you anything, and the human watching the log can no longer see what moved. "Pick one" is the price of a meaningful proof gate.
The decision is made: ship the empty-query guard. ANALYZE's last act is to shape that one unit into a tiny plan EXECUTE can follow — what it does, the steps, the risks, and the exit bar that proves it's done. Click along the strip to read each phase. Notice the whole plan is for one unit; this is a bounded change, not a project.
Think of it like… a recipe card for a single dish, not the whole dinner. Ingredients, three steps, the one way to know it's cooked (the test goes green). You don't plan the entire menu — you plan the one plate going out next.
Reproduce. Goal: See the bug with your own eyes before fixing it, so EXECUTE starts from a confirmed failure, not a belief.
Steps: GET /search?q= and confirm it returns all rows; run pytest -q to confirm the baseline is green (11 passed); confirm test_empty_query_returns_400 is still skipped.
Add guard. Goal: One early return in api.py that rejects a missing or empty q with a 400, and nothing else.
Steps: read q with a default and .strip() to catch both None and ""; return 400 with a short error body when it's empty.
Risk: checking only if q is None lets ?q= through. Mitigation: the grounded fact from LEARN, so strip and test for falsy.
Un-skip test. Goal: Turn the waiting spec into a live check, so the change has a real boundary to be judged against.
Steps: remove @skip from test_empty_query_returns_400; assert ?q= (empty string) alongside the missing-param case; check that both missing and empty q are covered.
Prove. Goal: Hand a unit to VERIFY that proves itself at the real boundary: the measurable gate that says "done", not "looks done".
Exit bar: test_empty_query_returns_400 goes from skipped → green; /search?q= returns 400, not all rows.
Notice the last phase isn't "ship it"; it's "clear the exit bar". The exit bar is a measurable gate: a specific test going from skipped to green, the rest of the suite staying green, the endpoint returning 400. "It looks right" is not a gate. Shaping the unit this way at ANALYZE time means EXECUTE knows precisely what success is, and VERIFY (lesson 6) has an unambiguous thing to prove. The milestone bar is a small tablist driven by per-step state; in a real run those states would reflect the live tracker, not the plan as written.
Reproduce → add guard → un-skip test → prove is not four units; it's the internal shape of one bounded change. Each step is a read or a minimal edit that builds to a single provable outcome. That's the difference between a plan for a unit and a project plan.
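The risk named in the plan (a None-only check lets ?q= through) is easy to see in a small sketch. The two functions below are stand-ins for the handler, taking a plain dict of query arguments and returning only a status code; the real handler and test in the RHG service will look different.

```python
def search_none_only(args: dict) -> int:
    """The risky version: only a missing q is rejected."""
    q = args.get("q")
    if q is None:
        return 400
    return 200


def search_guarded(args: dict) -> int:
    """The planned version: strip, then test for falsy."""
    q = args.get("q", "").strip()
    if not q:
        return 400
    return 200


def test_empty_query_returns_400() -> None:
    """Covers both the missing-param case and ?q= (empty string)."""
    assert search_guarded({}) == 400
    assert search_guarded({"q": ""}) == 400


test_empty_query_returns_400()
print(search_none_only({"q": ""}))  # 200: the empty string slips through
print(search_guarded({"q": ""}))    # 400
```

Asserting both cases in the un-skipped test is what would turn that risk into something VERIFY can catch.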
ANALYZE is the hinge between looking and doing. It only runs after LEARN has grounded the facts, and it must finish before EXECUTE touches anything — because EXECUTE needs exactly one chosen unit to act on. Here's the whole cycle with ANALYZE lit, and the one thing it must hand forward.
An ANALYZE pass doesn't change a single file — it produces a decision and writes it down. Here's what that looks like as the note that lands in the observability log: each gap's bucket, the ratings of the live candidates, and the one unit chosen, with its reason. The human can read this and know the next move without re-deriving it.
# ANALYZE: classify, rate, pick
classify:
  empty-q guard ................. on-scope      # done-when line
  tests don't run? (they do) .... n/a           # baseline green
  400 vs 422 status code ........ needs-user    # product call → handoff
  rewrite search index .......... out-of-scope  # parked: backlog#214
rate (Fit / Risk / Proof):
  A · add guard ................. 92 / 18 / 95  # blocker: none
  B · index rewrite ............ 40 / 78 / 35  # blocker: many callers
  C · status-code call ......... 55 / 22 / 50  # blocker: human-only
pick: A, add the empty-query guard (inline)
  # highest Fit + Proof, zero blocker, maps to the waiting test.
  # C handed off as decision-ready; B parked. no files changed.
The ANALYZE rationale lives in the run's observability file — typically LOOP-LOG.md at the repo root. Read it with sed -n '1,40p' LOOP-LOG.md or jump to the latest pass with grep -n "ANALYZE" LOOP-LOG.md | tail -1. It is append-only: each pass adds its classify / rate / pick block so the decision history is auditable.
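For readers who prefer a script to the shell one-liners, here is a small sketch of the same two operations: appending a block and finding the most recent ANALYZE entry. The function names and the blank-line separator between blocks are assumptions; only the file name and the append-only rule come from the lesson.

```python
from pathlib import Path
from typing import Optional


def append_block(log_path: Path, block: str) -> None:
    """Append one pass's block; mode 'a' never rewrites earlier entries."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(block.rstrip("\n") + "\n\n")


def latest_analyze_line(log_path: Path) -> Optional[int]:
    """1-based number of the last line containing ANALYZE, or None.

    Mirrors: grep -n "ANALYZE" LOOP-LOG.md | tail -1
    """
    last = None
    lines = log_path.read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if "ANALYZE" in line:
            last = number
    return last
```

A call such as `append_block(Path("LOOP-LOG.md"), block)` followed by `latest_analyze_line(Path("LOOP-LOG.md"))` would give the line to start reading from.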
Crucially there are no edit commands here — an ANALYZE pass is pure decision. If you ever see a file change attributed to ANALYZE, the step bled into EXECUTE; that's the boundary the loop keeps clean (lesson 5). And any external fact a rating leaned on is grounded the one allowed way — the Bright Data CLI (lesson 11), never memory, never WebSearch/WebFetch.
Tie it together on the RHG task, start to finish — the ANALYZE part only. LEARN handed over a grounded picture with several loose threads. Watch a careful ANALYZE pass turn that pile into a single, defensible next move.
The empty-query guard is squarely on-scope — it's the done-when line itself. "Which status code, 400 or 422?" is a product call, so it's needs-user. "Rewrite the search index" is real but unrelated to this goal, so it's out-of-scope — parked as a backlog note, not dropped. The baseline being green means there's no do-now blocker. Four gaps, four boxes.
Only the on-scope guard truly survives as a unit this pass. Rated: Fit 92 (it is the contract line), Risk 18 (one handler, no callers affected), Proof 95 (a waiting test makes verification trivial), Blocker none, Next high (closes the core gap). The index rewrite rates the opposite way and the status-code call is a handoff — neither is this pass's unit.
The guard wins on every axis that matters for a next move: high value, low risk, trivially provable, unblocked. Pick it — and resist the pull to also fix the five sibling endpoints "while I'm here". Those become their own queued units. One unit goes to EXECUTE.
The status-code question goes to the human as a decision-ready handoff (400 vs 422, with the trade-off stated) so EXECUTE isn't blocked guessing. The index rewrite is parked in the backlog. Nothing real is dropped; everything is either chosen, handed off, or recorded.
What ANALYZE produced
One chosen unit (the inline empty-query guard), three routed gaps (one handed off, one parked, one already-fine baseline), and a one-paragraph rationale in the log. Zero files changed. EXECUTE now has a single, bounded, provable thing to do — and the human can see exactly why. That is an ANALYZE pass done right.
The whole point is a legible decision the human reads without re-doing it (the observability the suite runs on — lesson 7). Decision only; every rating has a reason, every routed gap has a destination.
# ANALYZE result: fix/empty-query
chosen unit  : add empty-query guard (inline, /search)
rationale    : Fit 92 · Risk 18 · Proof 95 · blocker none
handoff      : status code 400 vs 422 → decision-ready to user
parked       : index-layer rewrite → backlog#214
deferred     : 5 sibling endpoints → one unit each, later passes
files changed: 0   # ANALYZE never edits
Recall beats re-reading. Answer each from memory before you peek — the option you pick grades instantly, with a note on why. No tells in the formatting; the answers are spread around on purpose.
Q1. What are the three moves of an ANALYZE pass, in order?
Q2. A gap is a real product decision only the human can make. Which bucket?
Q3. Why does an ANALYZE pass pick only one bounded unit?
Q4. A unit has high Fit but a live Blocker. What does the rubric say?
Q5. What does a finished ANALYZE pass change in the repo?