LEARN looked. ANALYZE picked. Now you build — but exactly one thing, the single most valuable bounded unit, and nothing more. EXECUTE is not "type code until it seems done." It is a contract: read fully, find the root cause, do the simplest thing that meets the scope, add the check that proves it, run that proof, and log it. The discipline is in the bounding.
Every turn of the loop has five steps: LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE. This lesson covers the third step, the one where work actually happens. Everything before it was preparation; everything after it is judgment. EXECUTE is where you make a change.
The trap is obvious once named: when you finally get to build, it is tempting to fix five things while you are in there. You came to repair a leaking tap and you end up re-plumbing the bathroom. That feels productive. It is the most common way a loop turn goes wrong — because now nothing is small enough to prove, and if anything breaks you cannot tell which of the five changes did it.
So EXECUTE has one rule above all others: do the single most valuable bounded unit, and stop. "Bounded" means the change has an edge you can point at — these lines, this file, this one behavior. "Most valuable" means ANALYZE already ranked it; you do not re-litigate that here. You read the whole thing first, you find the real cause (not the symptom), you make the simplest change that satisfies the scope, you add a check that proves it, you run the proof, and you write down what you did. That sequence is the Unit Contract, and the rest of this lesson teaches it from six angles.
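A minimal sketch of what "an edge you can point at" can look like when it is written down as data. It assumes a hypothetical `UnitEdge` record (the files a unit may touch, plus its one behavior) and a diff summarized as a list of changed paths. Neither is a real tool's format, and the test path and `src/lib/session.ts` are invented for illustration.

```typescript
// Hypothetical record of a unit's edge: the files it may touch and the one behavior it changes.
interface UnitEdge {
  id: string;
  files: string[];
  behavior: string;
}

// Changed paths that fall outside the edge; an empty result means the diff stayed in bounds.
function outOfBounds(changedFiles: string[], edge: UnitEdge): string[] {
  const allowed = new Set(edge.files);
  return changedFiles.filter((path) => !allowed.has(path));
}

// In bounds means something changed and every change is inside the edge.
function isInBounds(changedFiles: string[], edge: UnitEdge): boolean {
  return changedFiles.length > 0 && outOfBounds(changedFiles, edge).length === 0;
}

const fixLogin500: UnitEdge = {
  id: "fix-login-500",
  files: ["src/routes/auth.ts", "test/auth.test.ts"], // test path is hypothetical
  behavior: "unknown email on POST /auth/login returns 401, not 500",
};

console.log(isInBounds(["src/routes/auth.ts", "test/auth.test.ts"], fixLogin500)); // true
console.log(outOfBounds(["src/routes/auth.ts", "src/lib/session.ts"], fixLogin500)); // [ 'src/lib/session.ts' ]
```

A file list is only a proxy for "one behavior." A diff can stay inside one file and still change two behaviors, which is why the proving check is part of the contract too.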
Think of it like a surgeon with one item on the list. They do not open you up for an appendix and decide to also "tidy up" a knee while they are at it. They scope the cut to exactly what was agreed, they confirm the count of instruments before they close, and they write the operative note. Bounded, proven, logged. The skill is not cutting more; it is cutting only what was scoped, and proving you left the rest intact.
In an autonomous (AFK) run, the loop executes without a human approving each diff. The only thing that keeps that safe is that every EXECUTE produces a change small enough for the VERIFY step to prove or reject at a real boundary. A diff that touches one behavior maps cleanly onto one Proof Gate. A diff that touches five behaviors needs five proofs and carries a much larger blast radius if it must be reverted. Bounding is what makes the next step, VERIFY, tractable.
EXECUTE does not get to say "done." It produces the change and the check that will judge the change, then runs that check. The verdict belongs to VERIFY (the next lesson), and in a crew it belongs to an independent Validator — never the builder. So the EXECUTE step's job is to leave behind something that is cheaply and honestly verifiable: a failing test that now passes, a command whose exit code flips, a boundary observation that changes. "I added the proving check and ran it" is the deliverable; "it works" is not a thing EXECUTE is allowed to assert.
If, mid-build, you discover a second worthwhile change, the correct move is to log it as a new unit and return it to ANALYZE for ranking — not to fold it into the current diff. The scope was set in lesson 2; ANALYZE chose one unit from it in lesson 4. EXECUTE honors that choice. Quietly widening scope mid-turn breaks the contract that lets the loop run unattended.
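The "log it, don't fold it in" rule can be stated as a state transition. In this sketch the `LoopState` shape and the `awaitingAnalyze` queue are hypothetical. The only point is that discovering new work appends to ANALYZE's queue and returns the current unit untouched.

```typescript
// Hypothetical unit: an id plus the files its edge covers.
interface Unit {
  readonly id: string;
  readonly files: readonly string[];
}

// Hypothetical loop state: the unit being executed and the ideas waiting for ANALYZE to rank.
interface LoopState {
  readonly current: Unit;
  readonly awaitingAnalyze: readonly Unit[];
}

// A second worthwhile fix found mid-build becomes a new unit for ANALYZE.
// The current unit is passed through unchanged, so its edge cannot grow.
function deferToAnalyze(state: LoopState, discovered: Unit): LoopState {
  return {
    current: state.current,
    awaitingAnalyze: [...state.awaitingAnalyze, discovered],
  };
}

const beforeDefer: LoopState = {
  current: { id: "fix-login-500", files: ["src/routes/auth.ts"] },
  awaitingAnalyze: [],
};

// Spotted mid-build: the shared-helper idea. Its id and file list are illustrative only.
const afterDefer = deferToAnalyze(beforeDefer, {
  id: "extract-require-user",
  files: ["src/routes/auth.ts", "src/lib/require-user.ts"],
});

console.log(afterDefer.current.files.length); // 1
console.log(afterDefer.awaitingAnalyze.map((u) => u.id)); // [ 'extract-require-user' ]
```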
EXECUTE is not freeform. It is a fixed sequence of six moves, and skipping one is how turns go bad. Read it as a checklist you run every single time, no matter how small the change feels.
Move 1, read fully. Take in the entire relevant surface before editing a character: the function and its callers, the test that covers it, the issue/scope, and the trusted sources from LEARN. Most bad fixes are bad because the builder edited the first plausible line without reading the second one that explained why it was written that way.
Move 2, find the root cause and draw the bound. Name the actual cause, not the symptom. A 500 error is a symptom; "the handler dereferences user before the null check" is a cause. Then write the smallest plan that addresses that cause and draw its edge: which lines, which file, which one behavior changes. The edge is the bound.
Move 3, do the simplest thing that meets scope. Among the plans that satisfy the scope, take the one with the fewest moving parts. Not the cleverest, not the most general, not the one that "also sets us up for" a future feature. Simplicity here is what keeps the diff inside its edge and keeps the proof cheap.
Move 4, add the proving check. Write the check before you believe the change: a regression test that fails on the old code and passes on the new, an assertion, or a command whose exit status flips. The check is the contract's spine. It is what turns "I think it works" into something VERIFY can confirm at a real boundary.
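The rule "fails on the old code, passes on the new" can be written as a small predicate over two implementations. This sketch makes simplifying assumptions: handlers are modeled as hypothetical functions from an email to a status code, with no password, no HTTP, and no async.

```typescript
// A check judges one implementation: true means the expected behavior holds.
type Check<Impl> = (impl: Impl) => boolean;

type ProofVerdict =
  | "real proof"
  | "proves nothing: passes on old code"
  | "not proven: fails on new code";

// A check only counts as proof if it is red on the old code and green on the new.
function judgeCheck<Impl>(check: Check<Impl>, oldImpl: Impl, newImpl: Impl): ProofVerdict {
  if (check(oldImpl)) return "proves nothing: passes on old code";
  if (!check(newImpl)) return "not proven: fails on new code";
  return "real proof";
}

// Hypothetical stand-ins for the pre-fix and post-fix login handlers.
type StatusFor = (email: string) => number;
const registered = new Set(["ada@example.com"]);
const oldHandler: StatusFor = (email) => (registered.has(email) ? 200 : 500);
const newHandler: StatusFor = (email) => (registered.has(email) ? 200 : 401);

const unknownEmailGets401: Check<StatusFor> = (h) => h("nobody@example.com") === 401;
const knownEmailGets200: Check<StatusFor> = (h) => h("ada@example.com") === 200;

console.log(judgeCheck(unknownEmailGets401, oldHandler, newHandler)); // real proof
console.log(judgeCheck(knownEmailGets200, oldHandler, newHandler)); // proves nothing: passes on old code
```

The second check is a perfectly good test. It exercises a path the bug never touched, though, so it cannot prove this unit.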
Move 5, run the proof. Actually run it at the real boundary: not in your head, not as a mock, not as a claim. If it passes, hand off to VERIFY. If it fails, loop back to move 2 and re-plan the same unit; do not absorb a new fix to make the failure go away.
Move 6, log it. Write one line in LOOP-LOG.md: what unit, what root cause, what proof, pass/fail. This is what makes the run observable to a human who never touches the build. It is the durable record that the contract was honored.
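One way to keep the entry to one line, with the same four fields every time, is to build it from a typed record. The field names and the `|`-separated layout here are assumptions for illustration, not a prescribed LOOP-LOG.md format.

```typescript
// The four things move 6 records: what unit, what root cause, what proof, pass/fail.
interface LogEntry {
  unit: string;
  cause: string;
  proof: string;
  passed: boolean;
}

// Collapse any line breaks so an entry can never spill onto a second line.
const oneLine = (text: string): string => text.replace(/\s*[\r\n]+\s*/g, " ").trim();

function formatLogLine(entry: LogEntry): string {
  const result = entry.passed ? "PASS" : "FAIL";
  return [
    `unit: ${oneLine(entry.unit)}`,
    `cause: ${oneLine(entry.cause)}`,
    `proof: ${oneLine(entry.proof)}`,
    result,
  ].join(" | ");
}

const entryLine = formatLogLine({
  unit: "fix-login-500",
  cause: "null deref on missing user",
  proof: 'npm test -- -t "unknown email"',
  passed: true,
});
console.log(entryLine);
// unit: fix-login-500 | cause: null deref on missing user | proof: npm test -- -t "unknown email" | PASS
```

In Node, the line could then be appended with `appendFileSync("LOOP-LOG.md", entryLine + "\n")` from the built-in `node:fs` module.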
EXECUTE is the middle of the cycle — fed by ANALYZE's chosen unit, handing a proven change to VERIFY. It is the only step that writes to the artifact, which is exactly why it must be the most bounded.
Here is the Unit Contract for one real change, laid out as a stepped plan. The strip is the contract; each step zooms into one move and states its goal.
The running example for this lesson: a login endpoint that throws a 500 when the email is unknown. ANALYZE already ranked it the most valuable unit. We are now executing it.
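A minimal, self-contained model of this bug makes the cause easy to see. It is a sketch, not the real route. The in-memory `users` map, `findByEmail`, the toy `verify`, and the `serve` wrapper (which imitates a framework turning an uncaught exception into a 500) are all hypothetical. Everything is synchronous so it runs on its own.

```typescript
interface UserRecord {
  email: string;
  hash: string;
}

interface LoginReply {
  status: number;
  body: Record<string, string>;
}

// Hypothetical in-memory stand-in for db.users.
const users = new Map<string, UserRecord>([
  ["ada@example.com", { email: "ada@example.com", hash: "hashed:secret" }],
]);

function findByEmail(email: string): UserRecord | undefined {
  return users.get(email);
}

// Toy comparison for illustration only; real code uses a password-hashing library.
function verify(password: string, hash: string): boolean {
  return hash === `hashed:${password}`;
}

// The buggy shape: user.hash is read before anything checks that user exists.
// The non-null assertion (!) mirrors code that simply forgot the check.
function loginBuggy(email: string, password: string): LoginReply {
  const user = findByEmail(email);
  const ok = verify(password, user!.hash); // TypeError when the email is unknown
  return ok
    ? { status: 200, body: { result: "ok" } }
    : { status: 401, body: { error: "bad_credentials" } };
}

// Imitates a web framework: an uncaught exception becomes a 500.
function serve(handler: () => LoginReply): LoginReply {
  try {
    return handler();
  } catch {
    return { status: 500, body: { error: "internal" } };
  }
}

console.log(serve(() => loginBuggy("ada@example.com", "secret")).status); // 200
console.log(serve(() => loginBuggy("nobody@example.com", "x")).status); // 500
```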
Move 1, read fully. Goal: understand the whole surface before touching it. Read the handler, its callers, and the test that covers it, so the fix addresses the real shape of the code, not a guess.
Move 2, root cause and bound. Goal: name the actual cause and draw the edge of the fix. Symptom: 500 on unknown email. Cause: the handler reads user.hash before checking that user exists. Planned outcome: a generic 401.
Move 3, simplest fix. Goal: make the smallest change that fixes the named cause. That is a single guard (if (!user) return 401) that returns a generic 401 when the user is missing, before any property is read. No refactor, no new abstraction.
Move 4, add the check. Goal: write a regression test that fails on the old code and passes on the new. POST an unknown email; assert the status is 401, not 500. The check is the spine of the whole contract.
Moves 5 and 6, run the proof and log it. Goal: run the check at the real boundary and write one line of record in LOOP-LOG.md. EXECUTE does not declare victory. It produces a proof and a log entry, then hands the verdict to VERIFY.
Each move advances only when its exit bar clears, and the bars are written as things you can check, not vibes. "The diff is a handful of lines, one file" is checkable; "the code feels clean" is not. This is the same gate discipline VERIFY uses, applied inside a single EXECUTE so the unit stays honest under time pressure.
The strip looks linear, but Move 5 has a back-edge: a failing proof returns you to Move 2 with the same scope. The temptation when a test won't pass is to "just also change" something adjacent. That widens the edge and breaks attribution. The contract says re-plan within the bound, or split off a new unit — never silently grow this one.
Each move carries a done / active / todo state. In a live run these states come from the tracker, so the strip reflects reality, not the plan as written.
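The strip, its back-edge, and the done / active / todo states fit in a few lines of code. This sketch uses hypothetical move names. `exitBarCleared` stands in for whatever checkable bar a given move has; for move 5, that bar is the proof passing.

```typescript
// The six moves of the Unit Contract, in order.
type Move = "read" | "rootCause" | "simplestFix" | "addCheck" | "runProof" | "log";
type Next = Move | "handToVerify";
type MoveStatus = "done" | "active" | "todo";

const order: readonly Move[] = ["read", "rootCause", "simplestFix", "addCheck", "runProof", "log"];

const forward: Record<Move, Next> = {
  read: "rootCause",
  rootCause: "simplestFix",
  simplestFix: "addCheck",
  addCheck: "runProof",
  runProof: "log",
  log: "handToVerify",
};

// Advance only when the exit bar clears. A failing proof is the one back-edge:
// it returns to root cause for the same unit instead of absorbing a new fix.
function nextMove(move: Move, exitBarCleared: boolean): Next {
  if (exitBarCleared) return forward[move];
  return move === "runProof" ? "rootCause" : move; // otherwise stay and finish this move
}

// Moves before the active one are done, the active one is active, the rest are todo.
function statusOf(move: Move, active: Move): MoveStatus {
  const i = order.indexOf(move);
  const a = order.indexOf(active);
  return i < a ? "done" : i === a ? "active" : "todo";
}

console.log(nextMove("runProof", false)); // rootCause
console.log(order.map((m) => statusOf(m, "addCheck")).join(" ")); // done done done active todo todo
```

The back-edge moves the same unit backwards and never creates a new one. That is what keeps a failing proof from widening the edge.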
Move 3 says "the simplest thing that meets scope." But there is usually more than one way to fix the same cause. Before you commit, it is worth holding the candidates side by side and weighing their trade-offs against the bound.
All three candidates fix the 500. They differ in how much they touch, how much risk they add, and how well they fit the one-unit scope. The contract chooses the one that fixes the cause with the fewest moving parts.
Fix A, the guard: add a single line that returns a generic 401 when the user is missing, before any property is read. It fixes the named cause and nothing else.
Fix B, the shared helper: extract a shared requireUser() utility and route this handler, plus two others, through it. Tidy in the abstract, but wider than the unit.
Fix C, the rewrite: restructure the whole login flow "while we're in here" (sessions, error paths, logging). The classic scope-creep that looks like diligence.
Fix B and Fix C might be "better engineering" in a vacuum. But the Unit Contract's move 3 is "simplest thing that meets scope" — and scope is one handler, one behavior. Fix A is the only candidate whose edge equals the unit's edge, so it is the only one a single Proof Gate can fully cover and a single revert can cleanly undo. The helper and the rewrite are real ideas — they just belong in their own units, logged and ranked by ANALYZE.
Holding three candidates side by side is good practice — it is how you confirm the simplest one actually fixes the cause. The discipline is that exploration ends in a choice, and the choice respects the bound. You compare in order to narrow, never to justify doing all three.
"Simplest thing that meets scope" has a quiet second half: it must also stay inside the project's constraints. Scope says what to change; constraints say how any change must behave — the security, style, and safety rules that hold across the whole codebase. A unit that fixes the bug but violates a constraint is not done.
Below are the constraints this lesson's project carries, shown the way a design system shows its tokens: a named set you can scan, a table that says exactly where each applies, and do / don't pairs for the one we are about to touch.
| Constraint | Rule | What it forces in this fix |
|---|---|---|
| no-enumeration | Generic auth replies | The 401 message must not reveal whether the email exists. |
| status-codes | Auth never returns 500 | The whole point: a missing user is a 401, not a crash. |
| validate-first | Check before you read | Guard user before touching user.hash. |
| one-behavior | One behavior, one file | Only the unknown-user path changes; everything else is frozen. |
| proof-required | Ship a check | A test that fails on 500 and passes on 401 must accompany the diff. |
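Do (no-enumeration): send one generic reply for both failure modes. An attacker can't tell "no such account" from "wrong password", so they can't enumerate valid emails, and the status is the contract's 401, never a 500.

```typescript
const ok = user ? await verify(password, user.hash) : false; // validate-first: never read user.hash on a missing user
if (!user || !ok) return res.status(401).json({ error: 'bad_credentials' });
```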
Don't: return a distinct 404 no_such_user. It stops the 500, but it now leaks which emails are registered, violating no-enumeration. The bug is gone; the unit is still not done.

```typescript
if (!user) return res.status(404).json({ error: 'no_such_user' });
```
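The no-enumeration constraint can itself be written as a checkable property: the reply for an unknown email must be indistinguishable from the reply for a wrong password. This sketch makes three assumptions. It uses hypothetical in-memory stand-ins and a toy `verify`. It compares only status and body; response timing, another possible leak, is outside its scope. And it runs the property against both the do and the don't version.

```typescript
interface UserRecord {
  email: string;
  hash: string;
}

interface LoginReply {
  status: number;
  body: Record<string, string>;
}

type LoginHandler = (email: string, password: string) => LoginReply;

// Hypothetical in-memory user store and toy password check.
const users = new Map<string, UserRecord>([
  ["ada@example.com", { email: "ada@example.com", hash: "hashed:secret" }],
]);
const verify = (password: string, hash: string): boolean => hash === `hashed:${password}`;

const badCredentials = (): LoginReply => ({ status: 401, body: { error: "bad_credentials" } });

// Do: validate first, then one generic 401 for both failure modes.
const login: LoginHandler = (email, password) => {
  const user = users.get(email);
  const ok = user !== undefined && verify(password, user.hash);
  if (!user || !ok) return badCredentials();
  return { status: 200, body: { result: "ok" } };
};

// Don't: a distinct 404 tells the caller which emails are registered.
const loginLeaky: LoginHandler = (email, password) => {
  const user = users.get(email);
  if (!user) return { status: 404, body: { error: "no_such_user" } };
  return verify(password, user.hash) ? { status: 200, body: { result: "ok" } } : badCredentials();
};

// True when the two failure modes can be told apart from status and body alone.
function leaksEnumeration(handler: LoginHandler): boolean {
  const unknownEmail = handler("nobody@example.com", "x");
  const wrongPassword = handler("ada@example.com", "wrong");
  return JSON.stringify(unknownEmail) !== JSON.stringify(wrongPassword);
}

console.log(leaksEnumeration(login)); // false
console.log(leaksEnumeration(loginLeaky)); // true
```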
Scope is the edge of this unit — which lines you may touch. Constraints are global invariants every unit must respect no matter what it touches. They are independent fences: you can satisfy scope (one tiny diff) and still fail a constraint (a 404 that enumerates accounts), or honor every constraint while blowing scope (a constraint-clean full rewrite). The contract requires both: inside the edge and inside the rules.
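Because scope and constraints are independent fences, a verdict has to evaluate both and report which one failed. In this sketch the `Candidate` shape is hypothetical. The list of violated constraints is taken as given; in practice it comes from review, tests, and linters. File paths other than `src/routes/auth.ts` are invented.

```typescript
interface Candidate {
  label: string;
  changedFiles: string[];
  violatedConstraints: string[]; // ids from the project's constraint list
}

type UnitVerdict =
  | "done"
  | "violates a constraint"
  | "out of scope"
  | "out of scope and violates a constraint";

// Two independent checks: inside the edge, and inside the rules.
function judgeUnit(candidate: Candidate, edgeFiles: readonly string[]): UnitVerdict {
  const inScope = candidate.changedFiles.every((file) => edgeFiles.includes(file));
  const withinRules = candidate.violatedConstraints.length === 0;
  if (inScope && withinRules) return "done";
  if (inScope) return "violates a constraint";
  if (withinRules) return "out of scope";
  return "out of scope and violates a constraint";
}

const edgeFiles = ["src/routes/auth.ts"];
const guard401: Candidate = { label: "guard, generic 401", changedFiles: ["src/routes/auth.ts"], violatedConstraints: [] };
const guard404: Candidate = { label: "guard, distinct 404", changedFiles: ["src/routes/auth.ts"], violatedConstraints: ["no-enumeration"] };
const fullRewrite: Candidate = {
  label: "constraint-clean full rewrite",
  changedFiles: ["src/routes/auth.ts", "src/lib/session.ts", "src/lib/logging.ts"],
  violatedConstraints: [],
};

for (const c of [guard401, guard404, fullRewrite]) console.log(`${c.label}: ${judgeUnit(c, edgeFiles)}`);
// guard, generic 401: done
// guard, distinct 404: violates a constraint
// constraint-clean full rewrite: out of scope
```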
In a real loop these come from the project's durable record — the GOAL.md constraints block, a CONTEXT/ADR doc, the linter config. EXECUTE reads them as part of move 1 ("read fully") so the simplest fix is chosen from the set that already satisfies them, not retrofitted after a reviewer catches a violation.
The single most important habit in EXECUTE is keeping the edge still. From the same starting point, one unit stays inside the bound while another quietly grows until nothing in it is provable.
Not every small change is a good bounded unit. There is a difference between a quick patch that hides the symptom and a clean bounded fix that resolves the cause — both can be tiny. The matrix lays the same unit out three ways so you can see which "small" actually satisfies the contract.
Read it as a grid: each row is a property the contract cares about, each column is an approach. Then the cards say when each is the right call.
| property \ approach | symptom patch | clean bounded fix | big rewrite |
|---|---|---|---|
| addresses cause? | no — hides the 500 | yes — guards the deref | yes (and much more) |
| diff size | tiny | small | large |
| stays in scope edge? | yes | yes | no |
| provable by one check? | only the symptom | yes — 500 → 401 | no — too broad |
| contract verdict | fails (no root cause) | passes | fails (scope-creep) |
symptom patch
Hides the symptom by wrapping the crash. The 500 may stop showing, but the null deref is still there waiting.
Contract verdict: fails — no root cause.
clean bounded fix
Resolves the cause with the smallest change in scope, honoring every constraint, with a check that proves it.
Contract verdict: passes — this is the unit.
big rewrite
Fixes everything and then some — but the edge is gone and one proof can't cover it. The good parts belong in their own units.
Contract verdict: fails — scope-creep.
The symptom patch is the smallest diff of the three — and it fails the contract hardest, because it does not address the named cause (move 2). "Small" is not the goal; "the simplest thing that fixes the cause within scope" is. A try/catch that turns a 500 into a different 500 is motion without progress: the next unknown-email request still hits the same broken path.
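For contrast, here is roughly what the symptom patch looks like. It is a sketch over hypothetical in-memory stand-ins, not the real route. The wrapper swallows the crash and answers with its own 500, and a counter shows that every unknown-email request still runs straight into the null dereference.

```typescript
interface UserRecord {
  email: string;
  hash: string;
}

interface LoginReply {
  status: number;
  body: Record<string, string>;
}

// Hypothetical in-memory user store and toy password check.
const users = new Map<string, UserRecord>([
  ["ada@example.com", { email: "ada@example.com", hash: "hashed:secret" }],
]);
const verify = (password: string, hash: string): boolean => hash === `hashed:${password}`;

// Unchanged buggy handler: user.hash is read before user is checked.
function loginBuggy(email: string, password: string): LoginReply {
  const user = users.get(email);
  const ok = verify(password, user!.hash);
  return ok ? { status: 200, body: { result: "ok" } } : { status: 401, body: { error: "bad_credentials" } };
}

// Counts how often the broken path is hit and then hidden.
let swallowedCrashes = 0;

// Symptom patch: wrap the crash instead of fixing its cause.
function loginPatched(email: string, password: string): LoginReply {
  try {
    return loginBuggy(email, password);
  } catch {
    swallowedCrashes += 1;
    return { status: 500, body: { error: "login_failed" } }; // a different 500, same broken path
  }
}

loginPatched("nobody@example.com", "x");
loginPatched("nobody@example.com", "y");
console.log(swallowedCrashes); // 2
```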
Only the middle column has a check that distinguishes fixed from broken at a real boundary (POST unknown email → expect 401). The patch can only "prove" that an error was swallowed; the rewrite is too broad for any single check to cover. Provability-by-one-check is a sharp test for whether a change is genuinely one bounded unit.
Of the six moves, the one most often dropped is move 4 — adding the check that proves the change. It feels like overhead when the fix "obviously works." But a fix with no proof is just a claim, and the loop runs on proofs, not claims. The trick that makes the check trustworthy: it must fail on the old code first.
```typescript
import request from 'supertest';
import { app } from '../src/app'; // assumed path to the app under test; adjust to your project

// fails on the old handler (500), passes after the guard (401)
test('unknown email returns 401, not 500', async () => {
  const res = await request(app)
    .post('/auth/login')
    .send({ email: 'nobody@example.com', password: 'x' });
  expect(res.status).toBe(401); // not 500
  expect(res.body.error).toBe('bad_credentials'); // generic: no enumeration
});
```
The check hits the actual route through the app, not a stubbed function — that is what "real boundary" means. Run only this test while iterating:
```bash
# run just the new regression test
npm test -- -t "unknown email returns 401"
# expected: red on the pre-fix commit, green after the guard
```
Adding and running this check is part of EXECUTE. The independent judgment — "yes, this genuinely meets the scope's done-when" — is the next step, VERIFY, and in a crew it is done by a Validator who did not write the fix. EXECUTE's job is to hand over a change that is cheap and honest to verify: here, a single command whose exit code tells the truth.
The last thing EXECUTE does before VERIFY takes over is read its own diff with a reviewer's eye. That means the actual change for the unit (lines added and removed), the risk badges a careful builder would attach, and reviewer notes pinned to specific lines.
This is a self-review: catching the obvious problems before an independent Validator (or a human reading the log) ever sees them. A note marked blocking must be resolved before the unit is handed off.
Diff summary: turns a 500 on unknown-email login into a generic 401 on POST /auth/login (1 file changed, +3 −2).
Reading your own diff with risk badges and line notes is part of move 3–4 hygiene: did the change stay in scope, honor the constraints, and come with a proof? It is the cheapest place to catch an enumeration leak or a stray reformatted line. But it is not the verdict.
In the loop, the judgment that the unit truly meets done-when is VERIFY's — and in a crew it belongs to an independent Validator who did not write the code. Self-review makes the handoff clean; it does not replace the independent gate. That separation is exactly what keeps an AFK run honest: the one who built it is never the one who certifies it.
Putting the whole contract together on our running unit, move by move — exactly what an Executor produces in one EXECUTE step.
Read fully: the handler reads user.hash on the line after findByEmail, with no null check. Cause: null deref when the email is unknown. Simplest fix in scope: one guard returning a generic 401 before the deref. No helper, no rewrite.
```diff
  const user = await db.users.findByEmail(email);
+ if (!user) return res.status(401)
+   .json({ error: 'bad_credentials' });
  const ok = await verify(password, user.hash);
```
Add the check: POST unknown email, expect 401 (fails on old code, passes on new). Run the proof: the suite goes green at the real boundary. Log it — one line, then hand to VERIFY.
LOOP-LOG.md, one entry:

```markdown
## turn 7: EXECUTE
unit: fix-login-500
cause: null deref on missing user
change: +1 guard, src/routes/auth.ts
proof: npm test -- -t "unknown email" → PASS
scope: 1 file · 1 behavior · in-bounds
next: → VERIFY (Validator, not builder)
```
Five questions on EXECUTE, each followed by its answer and the reasoning. Retrieval beats re-reading: commit to an answer before you read the explanation.
Q1. What does "one bounded unit" mean in EXECUTE?
Answer: B. Bounded means the change has a visible edge (one behavior, one file's worth of intent), so a single proof can cover it and a single revert can undo it. The size of your day and "what's in the file" are not the bound.
Q2. Mid-build you spot a second worthwhile fix. What does the contract say?
Answer: C. A new fix is a new unit. Log it and let ANALYZE rank it; do not widen the current edge or abandon the chosen unit. Quietly absorbing it breaks attribution and the contract that lets the loop run unattended.
Q3. Why must the proving check fail on the old code first?
Answer: A. A check that passes on both old and new code proves nothing, because it isn't exercising the bug. Seeing it fail on the broken code, then pass on the fix, is what makes it a real proof rather than decoration.
Q4. The fix stops the 500 by returning a distinct 404 "no_such_user". Verdict?
Answer: B. Scope and constraints are two fences. The diff fits scope, but a distinct 404 tells an attacker which emails exist, violating no-enumeration. Inside the edge and inside the rules, or it isn't done.
Q5. EXECUTE ran the proof and it passed. What does EXECUTE get to claim?
Answer: C. EXECUTE produces a change and a run proof, then logs it. The judgment that it truly meets done-when belongs to VERIFY and, in a crew, to an independent Validator who did not build it. EXECUTE never signs off on its own work.