EXECUTE built one bounded unit. Now comes the step that decides whether the cycle is allowed to end: VERIFY. You do not get to say it works — you have to show it at the real boundary. This lesson is about the three gates a turn must clear before it closes: Scope, Proof, and Course — and about the one rule that holds the whole method together: a claim is not evidence.
Every turn of the loop runs five steps — LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE. You have walked the first three. This lesson is the fourth, and it is the one that protects all the others: VERIFY.
Here is the failure VERIFY exists to prevent. You make a change, it looks right, and you write "done — the endpoint now returns 200." You believe it. You move on. But you never actually called the endpoint. The change had a typo, or hit a config you forgot, or worked on your machine and nowhere else. The word "done" was a claim, and a claim is just a sentence. It carries no proof inside it.
VERIFY replaces the claim with evidence. Evidence is what you get when you go to the real boundary and make the thing happen: you run the program and read the output; you hit the endpoint and look at the status code and the body; you feed it the hardest input you can think of and watch what it does. The difference between "it should work" and "I ran it, here is the output" is the entire difference between a hope and a fact.
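As a minimal sketch of what "I ran it, here is the output" can look like, the snippet below runs a command and keeps the exact command, exit code, and output together. The `Evidence` record and the `observe` helper are hypothetical names invented for this illustration, and the one-line script is only a stand-in for whatever program the turn actually changed.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """What was actually observed at the boundary (hypothetical record shape)."""
    command: list
    exit_code: int
    stdout: str
    stderr: str


def observe(command: list) -> Evidence:
    """Run the real command and capture what came back, instead of asserting it."""
    done = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return Evidence(command, done.returncode, done.stdout, done.stderr)


# Stand-in for "the program the turn changed": a one-line script.
evidence = observe([sys.executable, "-c", "print('hello from the boundary')"])
print(evidence.exit_code, evidence.stdout.strip())
```

Because the record stores the command alongside the output, the next reader can replay the same observation rather than trust a summary of it.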
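To keep this honest, a turn of the loop is not allowed to close until it passes three gates, each of which demands evidence, not words: Scope (every done-when is met), Proof (the change was proven at the real boundary), and Course (the full visual course is built).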
Think of it like… a science fair. It is not enough to say your baking-soda volcano erupts — the judge stands there while you pour the vinegar in and watches it foam over. Saying "trust me, it works at home" gets you nothing. The eruption, performed live, in front of the judge, is the proof. Where the analogy breaks: at a fair you demo once; in the loop you re-run the proof after every single change, because the last change could have quietly broken it.
VERIFY gathers and judges evidence; DECIDE acts on that judgment. After VERIFY you are holding a verdict — pass or fail, per gate. DECIDE is the branch that follows: all gates green → close the turn and either pick the next unit or converge; a gate red → loop again (re-LEARN with the new state and run another bounded unit), or, if the failure is a genuine user-only fork, hand off. Keeping them separate stops you from rationalizing a failing result into a "pass" just because you want to move on.
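One way to picture the split is a DECIDE function that only routes on verdicts it is handed and never looks at the evidence again. This is a sketch under assumptions: the `Gate` enum, its three values, and the `decide` name are invented here, and treating an unreachable boundary as taking priority over an ordinary failure is a design choice of this example, not something the method prescribes.

```python
from enum import Enum


class Gate(Enum):
    PASS = "pass"
    FAIL = "fail"                # something the agent can fix on another pass
    UNREACHABLE = "unreachable"  # a user-only fork: the agent cannot prove it


def decide(verdicts: dict) -> str:
    """Route on the verdict VERIFY produced; never re-judge the evidence here."""
    if any(v is Gate.UNREACHABLE for v in verdicts.values()):
        return "handoff"
    if any(v is Gate.FAIL for v in verdicts.values()):
        return "loop"
    return "close"


print(decide({"scope": Gate.PASS, "proof": Gate.FAIL, "course": Gate.PASS}))  # loop
```

Since `decide` has no access to the raw output, there is no place inside it to talk a red gate into a green one.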
The boundary is wherever the truth actually lives for this change: the process exit code and stdout/stderr for a script; the HTTP status and response body for an endpoint; the rendered DOM for a UI; the row that did or did not get written for a database mutation. A test double, a mock, a stub, or a static type-check models that boundary — usefully — but a model can diverge from reality (the mock returns what you told it to; the real service does not). So those are supplements: run them, keep them, but the gate is only satisfied when reality itself was observed.
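To see how a model can diverge from reality, here is a small sketch using Python's standard `unittest.mock`. The `real_lookup` and `greeting` functions are hypothetical stand-ins written for this example; `real_lookup` plays the part of the real service.

```python
from unittest.mock import Mock


def real_lookup(user_id: int) -> dict:
    """Stand-in for the real service: it rejects ids it does not know."""
    users = {1: {"id": 1, "name": "ada"}}
    if user_id not in users:
        raise KeyError(f"no such user: {user_id}")
    return users[user_id]


def greeting(lookup, user_id: int) -> str:
    return f"hello, {lookup(user_id)['name']}"


# The mock returns whatever it was told to, for ANY id.
mock_lookup = Mock(return_value={"id": 99, "name": "ghost"})
print(greeting(mock_lookup, 99))  # succeeds against the model

# The real boundary disagrees for the same input.
try:
    print(greeting(real_lookup, 99))
except KeyError as err:
    print("boundary disagreed:", err)
```

The mocked run is still worth keeping as a fast supplement; it just cannot tell you what the real lookup does with id 99.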
Sometimes you genuinely cannot reach the boundary inside the turn — no credentials for the live service, a hardware device that is not attached, a destructive action that needs human sign-off. The rule is explicit and strict: stop before claiming. You do not write "done" over an unverified change. You surface the blocker (a decision-ready handoff) and let the human unblock it. An honest "blocked, here is exactly what I need" is worth far more than a confident "done" that is false.
The gates are not a menu you pick from — they are in series, like three locked doors on one corridor. The turn can only leave when all three are open. If any one is BLOCKED, the turn stays inside: you loop again, fix the gap, and re-check. Below, watch a turn try to pass. The first attempt trips on Proof; after a re-run, all three flip to PASS.
Order is deliberate. Scope first: there is no point proving a change at the boundary if it does not even address the agreed done-when — you would be proving the wrong thing. Proof second: once scope says "this is the right target," you confirm reality agrees. Course last: it is the packaging/observability deliverable that rides along with every turn in this method, so it is checked at the end, when the substance is settled. A red gate short-circuits the rest exactly like the guard clauses you saw in the auth flow — no reason to evaluate door three if door two is locked.
A blocked gate is not an error state to be embarrassed about; it is the system working. It means the turn is honestly reporting "not yet" instead of lying with "done." The DECIDE step reads the block and routes: loop again for something the agent can fix, or hand off for a genuine user-only fork. The cost of one honest BLOCKED is one more pass; the cost of one false PASS can be a shipped bug, a broken build, or a lie in the log that the next person trusts.
Here is the whole of VERIFY on one canvas. The bounded unit comes out of EXECUTE. It is checked against the scope, proven at the real boundary, and packaged in the course. The output is a single verdict that DECIDE then acts on. Notice the dashed arrow: any change made to fix a gate sends you back to re-prove — verification is not a checkbox you tick once.
Back in lesson 2 you wrote a scope contract: a list of measurable "done-when" conditions. The Scope Gate is where that list gets cashed in. You go down it line by line and, for each condition, you answer one question: is it met — and what is the evidence?
The word measurable is doing all the work. "The login is better" cannot pass a gate — there is nothing to point at. "Logging in with a wrong password returns HTTP 401 and the message bad_credentials" can: you either see that response or you do not. The Scope Gate turns the contract into a checklist where every row is either a green tick with proof attached, or it is not done yet.
Think of it like… a pre-flight checklist. The pilot does not "feel ready" — they read each line aloud and physically confirm it: flaps set, fuel quantity confirmed, doors armed. A line you cannot confirm is a line that is not done, and the plane does not leave the gate. Same here: a done-when you cannot point evidence at is not met.
It cuts both ways. A turn fails Scope if a done-when is unmet — but you also do not get credit for things outside the contract. If you "improved" five other files nobody asked about, that is not a pass; at best it is noise, at worst it is unverified risk you smuggled in. The gate measures the turn against the agreed list, no more and no less. That is what keeps a bounded unit bounded.
Each done-when carries its own evidence: a command's output, a screenshot of the rendered state, the exact HTTP exchange, a row count from the database. "Everything works" is a claim about the whole; the gate wants a proof about each part. When the evidence is attached row-by-row, the next reader can re-check any single line without re-running the world.
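The row-by-row idea can be sketched as a checklist where a row only counts when it is both met and carries its captured observation. The `DoneWhen` record and `scope_gate` function are hypothetical names for this illustration; the two rows reuse the wrong-password example from above.

```python
from dataclasses import dataclass


@dataclass
class DoneWhen:
    condition: str
    met: bool = False
    evidence: str = ""  # the captured observation, not a summary of it


def scope_gate(rows: list) -> tuple:
    """Pass only if every row is met AND has evidence attached."""
    gaps = [r.condition for r in rows if not (r.met and r.evidence.strip())]
    return (not gaps, gaps)


rows = [
    DoneWhen("wrong password returns HTTP 401", True, "HTTP/1.1 401"),
    DoneWhen("body carries bad_credentials", True, ""),  # claimed, no evidence
]
print(scope_gate(rows))  # (False, ['body carries bad_credentials'])
```

Returning the list of gaps, not just a boolean, tells the next pass exactly which line still needs an observation.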
The Proof Gate is the one this whole course keeps coming back to. It has a single demand: the change was proven at the real boundary. Not argued. Not assumed. Not "the types check, so it must be fine." You went to where the truth lives and you made the thing happen, with your own command, and you read what came back.
Below is a focused deep-dive on the Proof Gate — what counts as the real boundary, how mocks and type-checks fit (they supplement, they never replace), what to do when the boundary is out of reach, and the questions people actually ask. There is a live evidence meter: each kind of proof you collect fills it up, and you can watch a hope turn into a fact.
Gate deep-dive · The Proof Gate — proving a change at the real boundary
A claim is a sentence; proof is an observation. The Proof Gate only opens when you have observed the real boundary. The boundary is wherever the truth of this change actually lives: for a script it is the exit code and the text it prints; for an endpoint it is the HTTP status and the body; for a page it is what actually renders; for a database write it is whether the row is really there.
You collect proof by doing the real thing — and the strongest proof is feeding it the hardest input you can devise, not the easy happy path. A type-check and a mock are welcome, but they sit under the real-boundary proof: they catch cheap mistakes early and let you go fast, yet a green type-check has never once returned a 200 from a live endpoint. They supplement; they do not substitute.
Think of it like… proving a bridge holds. A blueprint that pencils out (the type-check) and a scale model in a wind tunnel (the mock) are real, useful work — but nobody opens the bridge until a loaded truck has driven across the actual span. The truck on the real bridge is the Proof Gate.
Rank what you can collect, weakest to strongest: (1) "it should work" — a claim, zero evidence; (2) a static type-check / lint — proves shape, not behavior; (3) a unit test against a mock — proves your logic given assumptions you wrote; (4) the real boundary, happy path — you actually ran it once and it worked; (5) the real boundary, hardest input — you ran it against the nastiest case and it held. The Proof Gate is satisfied at level 4 and convincing at level 5. Levels 2–3 are supplements that make 4–5 faster to reach, never replacements for them.
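The five-level ranking above maps naturally onto an ordered enum. In this sketch the `EvidenceLevel` names and the `proof_status` function are made up for illustration; the thresholds follow the text (satisfied at level 4, convincing at level 5).

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    CLAIM = 1                # "it should work"
    STATIC_CHECK = 2         # type-check / lint: shape, not behavior
    MOCKED_TEST = 3          # logic under assumptions you wrote
    BOUNDARY_HAPPY_PATH = 4  # ran it once at the real boundary
    BOUNDARY_HARDEST = 5     # ran the nastiest case at the real boundary


def proof_status(collected: set) -> str:
    """Only the strongest item matters; piling up supplements never opens the gate."""
    strongest = max(collected, default=EvidenceLevel.CLAIM)
    if strongest >= EvidenceLevel.BOUNDARY_HARDEST:
        return "PROVEN (convincing)"
    if strongest >= EvidenceLevel.BOUNDARY_HAPPY_PATH:
        return "PROVEN"
    return "BLOCKED"


print(proof_status({EvidenceLevel.STATIC_CHECK, EvidenceLevel.MOCKED_TEST}))  # BLOCKED
```

Taking the maximum rather than a sum is the point: a type-check plus a mocked test is still level 3, not level 5.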
Good proof is reproducible: the same command at the same boundary yields the same observation. That is why "I clicked around and it seemed fine" is weak — nobody can replay it. A captured command + its output is proof anyone can re-run, which is also what makes the re-run-after-change rule (later in this lesson) cheap to obey.
Start at a bare claim. Add each kind of evidence and watch the meter rise. The gate stays BLOCKED until the real boundary itself has been hit — type-checks and mocks alone can never push it to PROVEN.
Three views of the same gate: the check that hits the boundary and refuses to lie, the policy you configure, and how the loop uses it before closing a turn.
    def proof_gate(change, boundary):
        # supplements first: cheap, fast, but NOT the gate
        if not typecheck(change):
            return blocked("types fail")
        if not unit_tests(change):
            return blocked("unit tests fail")

        # the gate itself: hit the REAL boundary
        if boundary.unreachable:
            return halt("cannot reach boundary: ask, do not claim")
        observed = boundary.run(change, inputs=hardest_cases())
        if observed == expected:
            return proven(evidence=observed)  # real output captured
        return blocked("boundary disagreed", evidence=observed)
    # what the Proof Gate accepts as real-boundary evidence
    proof:
      require_real_boundary: true          # mocks never satisfy on their own
      supplements:
        - typecheck                        # run, but not sufficient
        - unit_tests_with_mocks            # run, but not sufficient
      on_unreachable: "halt_and_handoff"   # stop before claiming
      rerun_after_change: true             # any edit re-proves
      prefer_inputs: hardest_first         # nastiest case, not happy path
    def verify(unit):
        if not scope_gate(unit):
            return Verdict.LOOP
        if not proof_gate(unit, real_boundary):
            return Verdict.LOOP  # or HANDOFF if halted
        if not course_gate(unit):
            return Verdict.LOOP
        return Verdict.CLOSE  # all three green

    # the human never runs this; they READ the verdict + evidence
Find it yourself: grep -rn "def proof_gate" loop/
WebSearch/WebFetch: if a claim depends on a current fact, you fetch that fact from the real source as your evidence.
The fastest way to feel the difference between a claim and proof is to put them side by side. Below is a turn's worth of statements. Each one is written as a claim by default — the kind of confident sentence that sounds done. Flip a line to proof and watch it rewrite into the real-boundary evidence that would actually satisfy the gate. The verdict at the bottom only flips to PROVEN when every line is backed by real evidence.
Read the two forms again. The claim form uses state-of-being verbs — "returns", "is rejected", "is written" — describing how the world supposedly is. The proof form uses past-tense actions with captured output — "ran curl … → HTTP/1.1 200", "wrote then read back row id 4821". The grammar itself tells you whether a boundary was touched: an assertion about the present is a claim; a recorded observation of an action is proof.
One proven line does not rescue three claims — a turn is only as verified as its weakest statement. The aggregate verdict mirrors the Scope Gate: it is the AND of every row, so a single unbacked claim keeps the whole turn at NOT PROVEN. That is deliberate; partial proof is exactly how a false "done" sneaks through.
Everything to the left of the line below is a model of reality — your code, your types, your mocks, your reasoning. They are useful and you should use them. But they are still maps. The Proof Gate lives exactly on the line where the map ends and the territory begins. Proof is what you can only get by stepping over that line and observing the real thing.
The three gates form a decision you can walk like a flowchart. Each gate is a yes/no question, and the first "no" peels off — not to a hard failure, but back into the loop to fix that gap, or to a handoff if it is a user-only fork. Pick a turn below and press Next to trace it gate by gate. Watch how a clean run reaches Close the turn and a tripped gate routes back to Loop again.
The flowchart is a chain of early returns, cheapest meaningful check first, and the first failure short-circuits the rest. A failure does not throw — it returns a verdict the DECIDE step routes on. The "unreachable boundary" case is special: it does not return LOOP (there is nothing the agent can re-run to fix it) — it halts and hands off.
    function verifyTurn(unit) {
      if (!scopeMet(unit)) return loop('scope: a done-when is unmet');
      if (boundaryUnreachable()) return handoff('cannot prove: ask, do not claim');
      if (!provenAtBoundary(unit)) return loop('proof: boundary disagreed');
      if (!courseBuilt(unit)) return loop('course: not built yet');
      return close(unit); // all three gates green
    }
When a turn finishes, the loop emits a status — the at-a-glance answer to "can this turn close?" It is the same shape as a service health dashboard: a few headline numbers up top (gates passed, evidence count, re-runs), then a per-gate table with a colored badge for each. Remember the AFK rule from this course: the human reads this status — they do not run anything to produce it. Hit Refresh for a fresh reading, or turn on Live to watch a turn that is still settling.
Verify status — turn #18 · "rate-limit the login endpoint"
loop · bounded unit · gates checked at the real boundary
| Gate | Status | Evidence | Last check |
|---|---|---|---|
A single array of gate objects drives both the table and the rollup pill. The overall banner is derived from the worst gate, exactly like a service rollup: any blocked → red "cannot close"; any in-progress → amber; an awaiting-handoff → blue; only all-pass → green "may close." The rollup can never be greener than its worst gate, which is the whole point — you cannot paint a turn "done" while a gate is red.
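The worst-gate rollup described above can be sketched in a few lines. The status strings, the `BANNER` table, and the `rollup` function are assumptions made for this example; in particular the amber and blue labels are invented, since only the red and green wording is given in the text.

```python
# Worst first: the first status found in this order decides the banner.
PRECEDENCE = ["blocked", "in-progress", "awaiting-handoff", "pass"]

BANNER = {
    "blocked": ("red", "cannot close"),
    "in-progress": ("amber", "in progress"),            # label is an assumption
    "awaiting-handoff": ("blue", "awaiting handoff"),   # label is an assumption
    "pass": ("green", "may close"),
}


def rollup(gates: list) -> tuple:
    """Derive the overall banner from the single worst gate status."""
    statuses = {g["status"] for g in gates}
    unknown = statuses - set(PRECEDENCE)
    if not statuses or unknown:
        # Refuse to report green for an empty or unrecognised reading.
        raise ValueError(f"cannot roll up: empty or unknown statuses {sorted(unknown)}")
    worst = next(s for s in PRECEDENCE if s in statuses)
    return BANNER[worst]


gates = [
    {"gate": "scope", "status": "pass"},
    {"gate": "proof", "status": "blocked"},
    {"gate": "course", "status": "pass"},
]
print(rollup(gates))  # ('red', 'cannot close')
```

Raising on an empty list is deliberate in this sketch: "no gates checked" should never be mistaken for "all gates passed."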
This is an observability surface, not a control panel. In the AFK model (lesson 7) the loop produces this status into LOOP-LOG.md; the human reads it to know where things stand and only intervenes on a genuine user-only fork. The "last checked" stamp is an aria-live region; every badge pairs its color with a text label so status is never color-only.
Now the cautionary tale — the exact failure the Proof Gate exists to catch. A turn was marked done on a claim, the gate was skipped, and the bug shipped. Here is the whole thing as an incident report: a timeline from the false "done" to the recovery, the root cause dug out with five whys, the blast radius, and a checklist of fixes you can tick off. Read it top to bottom — the moment the proof was skipped is lit in red.
Olive dots are routine, clay is a warning sign, red is the moment proof was skipped, green is recovery.
Unit built — the rate limiter
EXECUTE adds the 5-per-minute cap to POST /auth/login. The diff looks correct and reads cleanly.
Type-check + unit tests pass
All supplements are green. The mocked limiter returns 429 on the 6th call in the test. Looks finished.
Marked "done" — boundary never hit
The log records done: limiter returns 429. But nobody ever called the live endpoint. "Done" was a claim resting on the mock.
Next turn depends on it
Turn #15 starts work that assumes the limiter is live. It re-LEARNs the real state first — and something does not add up.
Downstream · Proof Gate catches it
A real curl of the live endpoint returns 200 on the 6th attempt — the limiter is wired to the wrong middleware order and never runs. The "done" was false.
Fixed, then re-proven at the boundary
Middleware order corrected. A fresh curl now returns 429 on the 6th call. Evidence captured; the gate truly passes this time.
curl at 11:48.
Keep asking "but why did that happen?" The first answer is a symptom; the fifth is the one worth fixing.
Why did the bug ship?
The turn was marked done while the live endpoint still returned 200 on the 6th attempt.
Why was it marked done?
"Done" was written from the passing unit tests — a claim — without anyone hitting the real boundary.
Why was the boundary not hit?
The mock was treated as a substitute for the real endpoint, not as a supplement to it.
Why was the mock trusted that far?
No step forced an actual real-boundary observation before "done" could be written. The Proof Gate was implicit, not enforced.
Root cause · why proof was optional
The verify step let a claim close a turn. There was no hard rule that real-boundary evidence — captured, re-runnable — is required before "done", and no rule to re-prove after a change. The fix is to make the Proof Gate mandatory and its evidence explicit.
Side roads shortened the fuse: the unit tests mocked the limiter at a layer above the middleware ordering bug, so they could never have caught it; the happy-path was the only case anyone pictured (no hardest-input attempt with 6 rapid calls against the live route); and the log template accepted free-text "done" with no evidence field, so a claim and a proof looked identical in the record.
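A toy model can show how a limiter passes in isolation and still never runs once it is wired in. Everything here is hypothetical and written only for illustration (the `serve` chain, `make_limiter`, and the `login` handler are not the course's real code), and it ignores the time window to keep the ordering problem in focus.

```python
from typing import Callable, Optional

Handler = Callable[[dict], Optional[int]]  # returns a status, or None to pass on


def make_limiter(cap: int) -> Handler:
    seen: dict = {}

    def limiter(req: dict) -> Optional[int]:
        seen[req["ip"]] = seen.get(req["ip"], 0) + 1
        return 429 if seen[req["ip"]] > cap else None

    return limiter


def login(req: dict) -> Optional[int]:
    return 200  # always answers, so nothing after it in the chain ever runs


def serve(chain: list, req: dict) -> int:
    """Walk the chain in order; the first handler that answers wins."""
    for handler in chain:
        status = handler(req)
        if status is not None:
            return status
    return 404


def six_calls(chain: list) -> list:
    return [serve(chain, {"ip": "10.0.0.1"}) for _ in range(6)]


# Supplement: the limiter tested in isolation looks finished.
isolated = make_limiter(cap=5)
print([isolated({"ip": "10.0.0.1"}) for _ in range(6)][-1])  # 429

# Boundary: the same limiter wired AFTER the handler never runs.
print(six_calls([login, make_limiter(cap=5)]))  # [200, 200, 200, 200, 200, 200]
print(six_calls([make_limiter(cap=5), login]))  # [200, 200, 200, 200, 200, 429]
```

The isolated check and the wrongly ordered chain use the same limiter code, which is why no test written at the limiter's own layer could have seen the difference.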
Whoever wrote "done" acted reasonably given the tools — green tests felt like proof. A blameless fix changes the system: require captured real-boundary evidence before a turn can close, and re-run it after every change. Then "done" can only mean "proven," for everyone, every time.
Blast radius
"Done" was false for: 46 min
Downstream turn misled: 1
Brute-force tries still open: ∞
Shipped to users (caught first): 0
Each fix turns this exact failure into one that cannot recur. Check items off; the bar tracks progress.
VERIFY does not stop at "the gates passed." A turn also gets an autoreview — an independent read of the change itself, line by line, the way a careful colleague reviews a pull request. This matters for one reason from this course: the reviewer is never the builder. The same agent that wrote the code does not get to bless it; a separate pass looks for what the builder, too close to it, would miss.
Below is that review of the very change from the incident above. Green lines are added, red removed; the pills along the top are the reviewer's one-glance read of risk; the clay dots are notes pinned to a line. Click any dotted line to read the note — and find the one marked blocking, because a blocking note keeps the turn open no matter how green the gates looked.
Caps password attempts at 5 / minute per IP on POST /auth/login. 1 file changed · +9 −2
Each clay-dotted line carries a note. 1 is blocking — it is the very bug the Proof Gate later caught. See if you can find it before reading on.
The Proof Gate answers "does the boundary do the right thing right now?" The autoreview answers a different question: "is this change good — safe, clear, free of latent traps — even where the gate happened to pass?" A change can pass every gate on the happy path and still carry a blocking flaw (here, the middleware order) that only an independent reader spots. That is why both run, and why a blocking review note holds the turn open.
This is a hard rule in the AFK model (lesson 9): the Validator is a different agent from the Executor. Self-review is structurally weak — you cannot see the blind spot you built from. Severity is explicit: blocking must be resolved before the turn closes; nit is optional polish; praise reinforces a good pattern so it is repeated.
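The severity rule can be sketched as a small predicate: green gates are necessary, but an unresolved blocking note still holds the turn open. The `Note` record, the `may_close` function, and the sample notes are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    line: int
    severity: str  # "blocking", "nit", or "praise"
    text: str
    resolved: bool = False


def may_close(gates_green: bool, notes: list) -> bool:
    """A turn closes only with green gates and no unresolved blocking note."""
    open_blockers = [n for n in notes if n.severity == "blocking" and not n.resolved]
    return gates_green and not open_blockers


notes = [
    Note(12, "blocking", "limiter is registered after the login handler"),
    Note(20, "nit", "name the magic number 5"),
    Note(31, "praise", "clear error body"),
]
print(may_close(True, notes))  # False
```

Nits and praise never affect the result here, which matches their role as optional polish and reinforcement.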
The third gate is specific to how this method delivers: every turn that produces real work must also ship its full visual course — the multi-lesson, self-contained explanation of what was done — before the turn ends. Not sketched, not "to be written up later." Built, complete, in the same turn.
Why a gate and not a nice-to-have? Because "I'll document it later" is the same species of lie as "it should work." Later rarely comes, and when it does the context is gone. Forcing the course to exist now means the explanation is captured while the work is fresh — and it gives the human something real to read in the AFK model. The page you are reading is itself this gate being satisfied.
Think of it like… a chef who must plate and write the recipe card before the dish leaves the pass. A plate with no card means the next cook re-invents it from scratch. The card, written while the pan is still hot, is part of "done" — not a chore for some quieter day that never arrives.
It is checked like any other deliverable: the lessons exist as self-contained files, they open with no external requests, both language builds are present (EN + PT-BR in this suite), and the shared shell is intact. A half-built course fails the gate exactly as a half-proven change fails Proof. You will meet the engine that produces it in lesson 13.
In the AFK model the human's only job is observability — and the course is part of what they observe, alongside LOOP-LOG.md and review.md. Gating it per-turn keeps the explanation in lock-step with the work, so the record never drifts behind reality.
One rule deserves its own section because it is the one most often forgotten: after any change, re-run the proof. A proof is a photograph of one moment. The instant you edit anything — even a "trivial" one-line fix to satisfy a review note — that photograph is out of date. The change you just made to fix one thing can break the thing you already proved.
So verification is a loop, not a checkpoint. You prove, you adjust, you prove again — and you keep going around until a full pass requires no change at all. The cycle only ends when you run the proof and nothing needs touching afterward.
The re-run rule only stays cheap if your proof is a captured, replayable command rather than a one-off manual poke. "I ran curl -s -o /dev/null -w '%{http_code}\n' … and saw 200" can be re-run in one keystroke after the next edit; "I clicked around and it seemed fine" cannot. Investing in re-runnable evidence at first proof pays back every time you must re-prove.
The loop is guaranteed to end as long as each change strictly reduces the set of failing checks (no oscillation). If a change makes things worse or trades one failure for another indefinitely, that is itself a signal — re-LEARN, re-ANALYZE, possibly re-scope. The exit condition is sharp: a proof pass that triggered no follow-up change.
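The exit condition and the no-oscillation guard can be sketched as a loop that re-runs the proof after every change and stops as soon as a pass needs no fix. The function names, the callback shapes, and the `max_passes` cap are assumptions made for this illustration; "strictly reduces" is modelled here as the new failing set being a strict subset of the previous one.

```python
from typing import Callable, Optional


def verify_until_stable(
    run_proof: Callable[[], set],     # returns the set of failing checks
    fix_one: Callable[[set], None],   # makes one change
    max_passes: int = 10,
) -> str:
    previous: Optional[set] = None
    for _ in range(max_passes):
        failing = run_proof()         # always re-run after the last change
        if not failing:
            return "close"            # a proof pass that needed no change
        if previous is not None and not failing < previous:
            return "re-learn"         # no strict progress: stop and rethink
        previous = failing
        fix_one(failing)
    return "re-learn"


# Hypothetical turn: two failing checks, each fix clears one of them.
state = {"failing": {"proof", "course"}}


def run_proof() -> set:
    return set(state["failing"])


def fix_one(failing: set) -> None:
    state["failing"].discard(sorted(failing)[0])


print(verify_until_stable(run_proof, fix_one))  # close
```

A fix that swaps one failure for another is not a strict subset of the previous set, so the sketch returns "re-learn" instead of circling.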
Here is the whole verify step as one readable function. Read it top to bottom: the three gates run in series, the real boundary is non-negotiable, an unreachable boundary halts instead of claiming, and the verdict — not a side effect — is what comes out. This is the shape every turn passes through before it is allowed to close.
    def verify(unit, scope, boundary, course):
        evidence = []

        # ── Gate 1 · Scope: every measurable done-when, with evidence ──
        for done_when in scope.conditions:
            result = check(done_when)  # observe, don't assume
            evidence.append(result)
            if not result.met:
                return Verdict("LOOP", why=f"scope: {done_when} unmet", evidence=evidence)

        # ── Gate 2 · Proof: the REAL boundary, or stop and ask ──
        typecheck(unit)   # supplement: run, never sufficient
        unit_tests(unit)  # supplement: run, never sufficient
        if boundary.unreachable:
            return Verdict("HANDOFF", why="cannot reach boundary: ask, do not claim")
        observed = boundary.run(unit, inputs=hardest_cases())  # run it / hit it
        evidence.append(observed)
        if observed != expected:
            return Verdict("LOOP", why="proof: boundary disagreed", evidence=evidence)

        # ── Gate 3 · Course: the full visual course exists NOW ──
        if not course.built:
            return Verdict("LOOP", why="course: not built yet", evidence=evidence)

        return Verdict("CLOSE", evidence=evidence)  # all three gates green → DECIDE closes
The verify step and its gates live under the loop's gates/ directory. To read the real thing rather than this teaching sketch:
    # the verify entry point and each gate
    grep -rn "def verify" loop/
    grep -rn "def proof_gate" loop/gates/

    # the policy that says mocks never satisfy the Proof Gate
    sed -n '/^proof:/,/^[a-z]/p' loop/gates.yaml
If you remember a single line, make it if boundary.unreachable: return HANDOFF. That is the method refusing to lie: when it cannot prove, it asks instead of claiming. Every other line is bookkeeping around that one commitment.
Let's run one concrete turn all the way through, so the three gates stop being abstract. The unit: "add a 5-per-minute rate limit to the login endpoint." The scope contract from lesson 2 had three done-when lines.
Gate 1 · Scope. Three conditions: (a) the 6th attempt in a minute is refused; (b) a refusal returns 429 with too_many_attempts; (c) a fresh minute lets attempts through again. We go line by line: (a), (b), and (c) can each be tested against the endpoint. None is vague; each is checkable. Scope is satisfiable, so we move to Proof.
Gate 2 · Proof. The supplements pass — types are fine, the mocked unit test returns 429. Tempting to write "done." We do not. We hit the real endpoint with the hardest input: six rapid curl calls in one minute. The 6th returns… 200. The limiter never ran — wrong middleware order. The boundary disagreed with the mock. Proof is BLOCKED; verdict is LOOP.
The loop turns. We re-LEARN (the middleware order is the bug), make the one-line fix, and — because proof is perishable — re-run the six-call test. Now the 6th returns 429. We also re-check (c): wait out the window, the 7th call after the reset returns 200. Real evidence, captured, for every done-when. Proof is now PASS.
Gate 3 · Course. We build the lesson explaining the change — self-contained, both languages — before the turn ends. It exists now. Course is PASS.
Three greens. Only now does the turn close, and the log records "done" with the real curl output attached — a proof anyone can re-run, not a claim anyone has to trust.
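For readers who want to replay the three done-when lines locally, here is a toy in-memory limiter with an injectable clock. It is a model, so by this lesson's own rule it is a supplement and not the Proof Gate. The fixed-window algorithm, the `FixedWindowLimiter` class, and returning 200 for every allowed attempt are all assumptions of this sketch; the walkthrough does not say how the real limiter is implemented.

```python
class FixedWindowLimiter:
    """Toy limiter: at most `cap` attempts per `window` seconds per key."""

    def __init__(self, cap: int = 5, window: float = 60.0):
        self.cap, self.window = cap, window
        self._starts: dict = {}
        self._counts: dict = {}

    def attempt(self, key: str, now: float) -> tuple:
        start = self._starts.get(key)
        if start is None or now - start >= self.window:
            self._starts[key], self._counts[key] = now, 0  # fresh window
        self._counts[key] += 1
        if self._counts[key] > self.cap:
            return 429, {"error": "too_many_attempts"}
        return 200, {}


limiter = FixedWindowLimiter()
burst = [limiter.attempt("10.0.0.1", now=float(t)) for t in range(6)]
after_reset = limiter.attempt("10.0.0.1", now=61.0)

print([status for status, _ in burst])  # [200, 200, 200, 200, 200, 429]
print(burst[-1][1])                     # {'error': 'too_many_attempts'}
print(after_reset[0])                   # 200
```

Passing `now` in explicitly keeps the window-reset check instant and repeatable, instead of sleeping for a minute inside a test.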
The difference the whole lesson is about, made concrete — this is the log line for the closed turn:
    # NOT this (a claim):
    done: limiter returns 429 on the 6th attempt

    # THIS (a proof: re-runnable evidence):
    done: limiter caps the 6th attempt
    scope:
      - 6th-in-window refused ✓ evidence: curl#6 → HTTP/1.1 429
      - refusal body          ✓ evidence: {"error":"too_many_attempts"}
      - window resets         ✓ evidence: curl after 60s → HTTP/1.1 200
    proof:
      boundary: POST /auth/login (the LIVE route, not a mock)
      command: for i in $(seq 1 6); do curl -s -o /dev/null -w "%{http_code}\n" ...; done
      observed: 200 200 200 200 200 429 # hardest input: 6 rapid calls
      reruns: 2 (after the middleware-order fix)
    course: lessons/0006-verify-gates.html built (EN + PT-BR)
Recall beats re-reading. Without scrolling up, answer these four. Each has exactly one best answer; pick it and the card tells you immediately whether it holds at a real boundary.
1. Your unit tests pass against a mock of the payment service. Under the Proof Gate, this counts as:
2. You cannot reach the live endpoint (no credentials). The correct move is to:
3. A one-line fix satisfies a review note after you already proved the change. You should:
4. Which log line would actually satisfy verification?