The loop is only as good as its done-when. Before the agent reads a single file, you write a tiny contract that says what "finished" means — measurably. Get this right and the loop knows exactly when to stop. Get it vague and it runs forever, or stops at the wrong place and calls it done.
In the last lesson you saw the loop: learn → analyze → execute one bounded unit → verify at the real boundary → decide. That last word, decide, is where the loop either spins again or stops. But stop when? Something has to tell the loop, in plain measurable terms, that the work is actually finished. That something is the scope contract.
You write it once, up front, in a small file called SCOPE.md. It is not a plan and not a spec — it is shorter than both. It answers six questions: what's the goal, what's the context, what may not change, how do we know it's done, what is the agent allowed to touch, and who is doing the work. The heart of it is one line called done-when — the exit condition.
The whole trick is this: a done-when has to be something a machine can check, not something a human has to feel. "Make the login better" can never be true or false — so a loop chasing it can never stop. "Login p95 under 400 ms and zero 5xx over a 10-minute load test" is either true or it isn't — so the loop knows, on its own, the exact moment to stop.
One sentence to remember: the loop doesn't decide when it's done — your done-when does. The quality of that one line sets the ceiling on everything the loop can do.
Think of it like… a contractor renovating your kitchen. If you say "make it nice," they can paint one wall and declare victory, or keep "improving" it until you run out of money — and you have no grounds to argue either way. If you say "new cabinets installed, sink runs hot and cold, inspection passed," everyone knows the exact instant the job is finished. Where it breaks: unlike a contractor, the loop has no common sense to fall back on — it will take "make it nice" completely literally, so the contract has to carry all the meaning.
One more thing to place before we open it up: SCOPE.md is not the only contract you'll hear about in this course. There's a zoom scale of three. A PRD is the big picture (why, for whom, the whole product). SCOPE.md is the contract for one loop — the one-pager you're learning now. And GOAL.md is the machine-checkable run-contract the Forge compiles for a long autonomous run (lessons 8 and 10). Same idea at three zoom levels.
A loop is a control structure: while (!done) { learn; analyze; execute; verify; }. The done predicate is evaluated every cycle. If that predicate is not decidable from observable state, the loop has no termination guarantee — it is, formally, a program with no defined halting condition. "Better", "cleaner", "production-ready" are not predicates over state; they are subjective judgments, so the loop can never evaluate them and can never legitimately exit.
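To make the termination point concrete, here is a minimal sketch of such a loop in JavaScript. The names `runLoop`, `step`, and `maxCycles` are illustrative assumptions, not part of any real runner, and the cycle cap is an added safety net rather than something this lesson prescribes. The numbers are made up.

```javascript
// Minimal sketch: a loop whose exit is a predicate over observable state.
// `runLoop`, `step`, and `maxCycles` are hypothetical names for illustration.
function runLoop(state, done, step, maxCycles = 100) {
  let cycles = 0;
  // The predicate is re-evaluated at the top of every cycle.
  while (!done(state)) {
    if (cycles >= maxCycles) {
      return { state, cycles, exited: "cap" }; // never satisfied within the cap
    }
    state = step(state); // one bounded unit of work
    cycles += 1;
  }
  return { state, cycles, exited: "done" };
}

// Decidable: compares a measured number to a threshold.
const measurable = (s) => s.p95Ms < 400;
// "Better" names no metric, so no state can ever make it true.
// Modeled here as a predicate that always returns false.
const vague = () => false;

// Each cycle is assumed to shave 50 ms off p95 (made-up numbers).
const improve = (s) => ({ p95Ms: s.p95Ms - 50 });

const a = runLoop({ p95Ms: 520 }, measurable, improve);
const b = runLoop({ p95Ms: 520 }, vague, improve, 10);
console.log(a.exited, a.cycles, a.state.p95Ms); // done 3 370
console.log(b.exited, b.cycles);                // cap 10
```

The measurable predicate stops the loop on its own after three cycles. The vague one only stops because an outside cap cuts it off, which is the "runs forever, or stops at the wrong place" failure in miniature.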
SCOPE.md is the lightweight, human-written contract you start any non-trivial loop with — six fields, often under a page. It is the input to the loop, the yardstick the decide step measures against. It is not the same as GOAL.md: GOAL.md is the compiled, autonomous-run contract that the Forge /goal step emits (XML-blocked: goal / context / constraints / verification / done-when) for a long unattended run — you'll meet it in lessons 8 and 10. And a PRD is broader still: market, persona, rollout. Think of it as a zoom scale: PRD (the why & what, many pages) → SCOPE.md (the contract for one loop, one page) → GOAL.md (the machine-checkable run contract the loop executes against).
A done-when clause is measurable when it names (1) a metric or observable artifact, (2) a comparison or threshold, and (3) the boundary where it is checked. "Tests pass" is weak (which tests? where?); "npm test exits 0 with 0 failures in CI" is strong. The verify step (lesson 6) runs that check at the real boundary — never a claim, never a mock — and the decide step compares the result to this contract.
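One way to picture the three parts is as fields on a small record. The sketch below is an assumption about how a clause could be modeled, not the format any real runner uses; the field names (`metric`, `comparator`, `threshold`, `boundary`) and the `reading` shape are hypothetical.

```javascript
// Sketch: a done-when clause as data. All field names are hypothetical.
const COMPARATORS = {
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  "==": (a, b) => a === b,
  ">=": (a, b) => a >= b,
  ">": (a, b) => a > b,
};

// A clause is measurable only if all three parts are present.
function isMeasurable(clause) {
  return Boolean(clause.metric) &&
    Object.prototype.hasOwnProperty.call(COMPARATORS, clause.comparator) &&
    typeof clause.threshold === "number" &&
    Boolean(clause.boundary);
}

// Evaluate a clause against a reading taken at its boundary.
// `reading` is assumed to look like { metric, boundary, value }.
function evaluate(clause, reading) {
  if (!isMeasurable(clause)) return "vague";
  if (reading.metric !== clause.metric || reading.boundary !== clause.boundary) {
    return "wrong-boundary"; // a reading from somewhere else proves nothing
  }
  const holds = COMPARATORS[clause.comparator](reading.value, clause.threshold);
  return holds ? "pass" : "fail";
}

const strong = { metric: "npm test failures", comparator: "==", threshold: 0, boundary: "CI" };
const weak = { metric: "tests" }; // no threshold, no boundary

console.log(evaluate(strong, { metric: "npm test failures", boundary: "CI", value: 0 }));     // pass
console.log(evaluate(strong, { metric: "npm test failures", boundary: "laptop", value: 0 })); // wrong-boundary
console.log(evaluate(weak, { metric: "tests", boundary: "CI", value: 0 }));                   // vague
```

The middle case is the interesting one: the number is right, but it was read in the wrong place, so it does not count as proof.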
Here is where the contract sits in the loop. Every cycle runs learn → analyze → execute → verify, and then hits a gate. The gate asks one question, taken straight from your done-when: is it true yet? No → loop again. Yes → stop. The contract is the gate.
A vague done-when makes the gate un-evaluable: the loop can't ever legitimately exit.
And here is the cost of getting it wrong, side by side. A vague contract leaves the gate stuck open (the loop can't tell true from false, so it drifts); a measurable contract gives the gate a clean yes/no.
SCOPE.md
A scope contract has exactly six fields. None is optional: each closes off a specific way the loop can go wrong. Here they are one field at a time (what each is for and what a good entry looks like), followed by the complete file.
Goal. What it's for: the single outcome this loop exists to produce, in one sentence a stranger could read. Not a task list, but the result.

```markdown
# Goal
Cut RHG checkout p95 latency so it stops timing out at peak.
```

Context. What it's for: the handful of facts the agent must know before touching anything: where the code lives, what the current behavior is, the one gotcha that isn't obvious. This is what the LEARN step (lesson 3) confirms against reality.

```markdown
# Context
Checkout lives in services/checkout. Hot path is the cart
re-price call on every keystroke. Prod is us-east, behind a CDN.
```

Constraints. What it's for: the lines the agent must not cross while chasing the goal: the public API stays stable, no new dependency, the design tokens are fixed. The next section turns these into something you can actually see.

```markdown
# Constraints
- Do NOT change the public /checkout request/response shape.
- No new runtime dependencies.
- Stay on the existing design tokens (no raw hex).
```

Done-when. What it's for: the measurable exit condition. This is the line the gate reads every cycle. Each clause must name a metric, a threshold, and the boundary where it's checked, so it can be true or false, never "sort of."

```markdown
# Done-when
- checkout p95 < 400ms over a 10-minute load test at peak RPS
- 0 responses with status >= 500 during that test
- `npm test` exits 0 with 0 failures in CI
- the /checkout API contract test still passes unchanged
```

Editable surface. What it's for: the exact set of files or directories the agent is allowed to change. Everything else is read-only. This keeps a "small fix" from quietly rewriting half the repo.

```markdown
# Editable surface
- services/checkout/**              # implementation
- services/checkout/__tests__/**    # its tests
# everything else: READ-ONLY
```

Agents. What it's for: who does what. At minimum: who builds (the Executor) and who proves it against this contract (the Validator). The rule that makes the proof trustworthy: the Validator is never the agent that built it (lesson 9).

```markdown
# Agents
- Executor: agent-A (tier: execute)
- Validator: agent-B (tier: analyze)   # NEVER the builder
```

Put together, the complete SCOPE.md:

```markdown
# Goal
Cut RHG checkout p95 latency so it stops timing out at peak.

# Context
Checkout lives in services/checkout. Hot path is the cart
re-price call on every keystroke. Prod is us-east, behind a CDN.

# Constraints
- Do NOT change the public /checkout request/response shape.
- No new runtime dependencies.
- Stay on the existing design tokens (no raw hex).

# Done-when
- checkout p95 < 400ms over a 10-minute load test at peak RPS
- 0 responses with status >= 500 during that test
- `npm test` exits 0 with 0 failures in CI
- the /checkout API contract test still passes unchanged

# Editable surface
- services/checkout/**
- services/checkout/__tests__/**

# Agents
- Executor: agent-A (tier: execute)
- Validator: agent-B (tier: analyze)   # never the builder
```
It is just a Markdown file you author by hand before the run. Create or open it from the repo root:
```shell
# create it next to the work, then open in your editor
$ touch SCOPE.md && $EDITOR SCOPE.md

# confirm a loop run is pointed at it
$ grep -n "Done-when" SCOPE.md
```
In the Forge front-end (lesson 8) this same six-field thinking is what the /goal step compiles into the autonomous GOAL.md; for a quick interactive loop you often hand-write SCOPE.md and skip straight to running.
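Because the file is plain Markdown with one heading per field, reading it mechanically is straightforward. The sketch below shows one possible reader that splits the text into the six fields and reports any that are absent or empty. `parseScopeText` is a hypothetical helper written for this example, and a real runner may parse the file quite differently.

```javascript
// Sketch of a SCOPE.md reader. `parseScopeText` is a hypothetical helper.
const REQUIRED_FIELDS = [
  "Goal", "Context", "Constraints", "Done-when", "Editable surface", "Agents",
];

function parseScopeText(markdown) {
  const fields = {};
  let current = null;
  for (const line of markdown.split("\n")) {
    const heading = line.match(/^# (.+?)\s*$/);
    if (heading && REQUIRED_FIELDS.includes(heading[1])) {
      current = heading[1];   // start collecting a new field
      fields[current] = [];
    } else if (current && line.trim() !== "") {
      // Any other non-blank line (including "# ..." comment lines that are
      // not field names) is kept as content of the current field.
      fields[current].push(line.trim());
    }
  }
  // A field counts as missing if its heading is absent or it has no content.
  const missing = REQUIRED_FIELDS.filter(
    (f) => !fields[f] || fields[f].length === 0
  );
  return { fields, missing };
}

const sample = [
  "# Goal",
  "Cut RHG checkout p95 latency so it stops timing out at peak.",
  "# Context",
  "Checkout lives in services/checkout.",
  "# Constraints",
  "- No new runtime dependencies.",
  "# Editable surface",
  "- services/checkout/**",
  "# everything else: READ-ONLY",
  "# Agents",
  "- Executor: agent-A (tier: execute)",
].join("\n");

const parsed = parseScopeText(sample);
console.log(parsed.missing); // the Done-when section is absent, so it is listed
```

Refusing to start a run while `missing` is non-empty is one simple way to enforce "none is optional."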
done-when: watch the exit condition flip
Now the core skill, hands-on. Type a done-when below, or load one of the example chips. The panel on the right grades it live against the three things a measurable exit condition needs: a metric, a threshold, and a boundary. Watch the verdict flip between "vague → the loop can't stop" and "measurable → the loop knows when to exit."
The grader is a tiny pure function over the text. It scans for three signals: a metric (a known measurable noun or a number-with-unit), a threshold (a comparison word/operator or a target), and a boundary (a "where" phrase — in CI, in prod, over a test). A clause is measurable only when all three are present (plus the no-feeling-words check in strict mode). Anything less is vague, and the readout spells out the missing piece — exactly what the loop's decide step would be unable to evaluate.
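```javascript
// Grade a done-when clause. It is measurable only when it names a metric,
// a threshold, and a boundary (and, in strict mode, uses no feeling words).
function grade(text, strict) {
  const t = text.toLowerCase();
  // Metric: a known measurable noun, or a number with a unit. The unit branch
  // sits outside the trailing \b because "%" is not a word character, so
  // "22% " would never match with a \b after it. Failure counts are included
  // so command-style clauses ("0 failures in CI") register a metric.
  const hasMetric =
    /\b(p9\d|latency|error rate|uptime|conversion|failures?)\b|\d+\s?(ms\b|s\b|%)/.test(t);
  // Threshold: a comparison word or operator, or any numeric target.
  const hasThreshold =
    /(under|below|over|above|<=|>=|<|>|at least|exits 0|\d)/.test(t);
  // Boundary: a "where" phrase naming the place the check is read.
  const hasBoundary =
    /(in ci|in prod|production|staging|load test|over a)/.test(t);
  // Feeling words (prefix match, so "improv" covers improve/improving).
  const feelings = /\b(better|faster|fast|good|nice|clean|improv)/.test(t);
  const ok = hasMetric && hasThreshold && hasBoundary && (!strict || !feelings);
  return { ok, hasMetric, hasThreshold, hasBoundary, feelings };
}
```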
The point is not the regex — it's the shape: metric + threshold + boundary is the minimum a verify step can act on. Real loops let an LLM judge this in plain language, but the test is identical: could a machine, looking only at observable state, return true or false?
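The "readout spells out the missing piece" part can be sketched separately. The function below assumes only the result shape the grader returns (`hasMetric`, `hasThreshold`, `hasBoundary`, `feelings`); the name `readout` and the exact wording of the messages are made up for illustration.

```javascript
// Sketch: turn a grade result into a readout that names the missing piece.
// Assumes the result shape { hasMetric, hasThreshold, hasBoundary, feelings }.
function readout(result, strict = false) {
  const gaps = [];
  if (!result.hasMetric) gaps.push("metric: what is measured?");
  if (!result.hasThreshold) gaps.push("threshold: compared to what?");
  if (!result.hasBoundary) gaps.push("boundary: checked where?");
  if (strict && result.feelings) gaps.push("feeling word: replace it with a number");
  return gaps.length === 0
    ? "measurable: the loop knows when to exit"
    : "vague: the loop can't stop (" + gaps.join("; ") + ")";
}

// "tests pass": has neither a number nor a place.
console.log(readout({ hasMetric: false, hasThreshold: false, hasBoundary: false, feelings: false }));
// "checkout p95 < 400ms over a 10-minute load test": all three present.
console.log(readout({ hasMetric: true, hasThreshold: true, hasBoundary: true, feelings: false }, true));
```

Each gap maps to a question the author can answer directly, which is usually faster than being told only that the clause is vague.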
The single most common scope mistake is a done-when that sounds like a goal but can't be checked. Here are five vague lines and the measurable rewrite of each — the same intent, made into something the gate can evaluate. Read across: notice that every rewrite adds a metric, a threshold, and a boundary.
| Vague done-when | Why the gate can't evaluate it | Measurable rewrite | Why it works |
|---|---|---|---|
| "checkout feels fast" / "the page is snappy" | No number, no place to check. The loop can run forever and never be "wrong." | "checkout p95 < 400ms over a 10-minute load test at peak RPS" | Metric (p95), threshold (< 400ms), boundary (10-min test). True or false. |
| "fewer errors" / "more reliable" | "Fewer" compared to what, measured where? Nothing to evaluate. | "5xx error rate below 0.1% in production over 24h" | A rate, a ceiling, and the boundary it's read at. |
| "tests pass" / "no bugs" | "Pass" is good instinct but under-specified: which tests, run where? | "`npm test` exits 0 with 0 failures in CI" | An exact command, an exit code, a place. The verify step can run it. |
| "improve signups" / "grow conversion" | "Improve" is open-ended by definition; there's always more. | "signup conversion >= 22% over a 7-day A/B test" | A target rate over a defined window, reachable and checkable. |
| "make it production-ready" / "ship-quality" | "Production-ready" hides a dozen unstated checks. Whose definition? | "p95 < 400ms, 5xx < 0.1%, tests green, a11y scan 0 errors" | Spell the dozen out as concrete clauses. Now "ready" is decidable. |
The tell: if you can't picture the exact check that would prove it — a command, a metric read at a place — the clause is still vague. Keep rewriting until you can.
Done-when says when to stop. Constraints say what the agent may not disturb on the way. The cleanest way to pin a constraint is to point at something already named and fixed — a design system of tokens is the classic example: instead of "keep it on-brand" (vague), you say "use only these tokens" (checkable). Here is the kind of fixed surface a scope pins, and the right vs wrong way to reference it.
| Constraint references | Means | How the loop checks it |
|---|---|---|
| design tokens only (surface · fixed set) | No raw hex; every color from a named token. | grep the diff for #-hex literals → must be none. |
| public API frozen (contract · stable) | Request/response shape of /checkout unchanged. | The API contract test still passes unchanged. |
| no new dependencies (surface · closed) | Nothing added to the lockfile. | git diff on the lockfile is empty. |
| editable surface (paths · allow-list) | Only the named directories may change. | Every changed path is under the allow-list. |
The right vs wrong way to write one of these:
Right: points at a named, already-fixed set. A machine can verify it.

```markdown
# Constraints
- colors: use design tokens only
#   (--clay, --olive, …), no raw hex
```

Wrong: "on-brand" is a feeling. Nothing to grep, nothing to prove.

```markdown
# Constraints
- keep it on-brand and tasteful
#   …says who? checked how?
```
Notice every good constraint has the same shape as a good done-when: a metric/observable, a threshold (often "unchanged" or "zero"), and a boundary. "No raw hex in the diff" is metric (count of hex literals) + threshold (= 0) + boundary (the diff). That is why the same grader logic works for both — a constraint is an invariant the loop must hold at every step, while a done-when is the condition that lets it stop. Pinning to a named, fixed artifact (the token set, the contract test, the lockfile) is what makes the check mechanical instead of subjective.
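The three mechanical checks named above (no raw hex, lockfile untouched, changes inside the allow-list) can be sketched as one function over a diff. Everything here is an assumption made for illustration: the `diff` and `scope` shapes, the name `checkConstraints`, and the file paths. The hex pattern is deliberately simple and would also flag things like an issue reference `#1234`.

```javascript
// Sketch: constraints as invariants over a diff. All names are hypothetical.
// `diff` is assumed to be { changedPaths: string[], addedLines: string[] }.
const HEX_COLOR = /#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b/;

function checkConstraints(diff, scope) {
  const violations = [];
  // Design tokens only: the count of raw hex literals in added lines must be 0.
  for (const line of diff.addedLines) {
    if (HEX_COLOR.test(line)) violations.push("raw hex: " + line.trim());
  }
  // No new dependencies: the lockfile must not appear among changed paths.
  if (diff.changedPaths.includes(scope.lockfile)) {
    violations.push("lockfile changed: " + scope.lockfile);
  }
  // Editable surface: every changed path must sit under an allowed prefix.
  for (const path of diff.changedPaths) {
    if (!scope.editablePrefixes.some((prefix) => path.startsWith(prefix))) {
      violations.push("outside editable surface: " + path);
    }
  }
  return violations; // an empty list means every invariant held this cycle
}

const scopeRules = {
  lockfile: "package-lock.json",
  editablePrefixes: ["services/checkout/"],
};

const cleanDiff = {
  changedPaths: ["services/checkout/price.js"],
  addedLines: ["color: var(--clay);"],
};
const dirtyDiff = {
  changedPaths: ["services/checkout/price.js", "package-lock.json"],
  addedLines: ["color: #ff0000;"],
};

console.log(checkConstraints(cleanDiff, scopeRules).length); // 0
console.log(checkConstraints(dirtyDiff, scopeRules));
```

In the second case the lockfile trips two guards at once: it changed, and it lies outside the editable surface. Overlapping guards are harmless, since any single violation is enough to reject the cycle.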
The Agents field doesn't just name who's involved — it sets each agent's authorization tier: how far it's allowed to go without a human. There are three, and they stack like permissions:
analyze = read and reason only (look, never touch). execute = make the bounded change (edit files, run the build). destructive = irreversible actions (deploy to prod, drop data, force-push). Each higher tier requires the one below it — you can't grant execute without analyze, or destructive without execute.
Flip the switches below to grant tiers. Turn on a tier whose prerequisite is off and the panel warns you the grant won't hold — the same way a scope contract refuses an inconsistent authorization. The summary line always reads back what the agent is actually cleared to do.
Think of it like… keys on a ring. The read-the-building key lets you walk the halls. The workshop key only works if you already hold the building key. The master key that can knock down a wall is useless without the workshop key first. You hand out the smallest ring that gets the job done — and a loop that runs AFK is usually handed analyze + execute, with destructive held back behind a human.
analyze ⊂ execute ⊂ destructive. You can't hold an outer ring without the inner one, and the dashed line is where a hands-off loop stops and a human takes the irreversible step.
analyze: Read files, run read-only checks, reason about the gap. Looks, never changes. The LEARN and ANALYZE steps live here.
execute: Edit files inside the editable surface, run the build and tests. The EXECUTE step. Reversible by a revert.
destructive: Irreversible actions: deploy to prod, drop a table, force-push, send real emails. Usually held behind a human handoff.
Each tier is a boolean the contract grants. A second map records what each tier requires below it; a granted tier whose prerequisite is missing is unsatisfied — it can't take effect, so the editor flags it, exactly as a real runner would refuse to act on an inconsistent grant. The effective authorization the summary reads is the longest satisfied prefix of the chain.
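```javascript
const grant = { analyze: false, execute: false, destructive: false };

const requires = {
  analyze: [],               // base tier
  execute: ['analyze'],      // can't change what you can't read
  destructive: ['execute']   // irreversible needs reversible first
};

// Prerequisites of `tier` that are not themselves in effect.
function missing(tier) {
  return requires[tier].filter(t => !satisfied(t));
}

// A tier is in effect only if it is granted AND every prerequisite is in
// effect too, so a broken link lower in the chain disables everything above.
function satisfied(tier) {
  return grant[tier] && missing(tier).length === 0;
}

function effective() {       // what the agent may actually do
  return Object.keys(grant).filter(t => satisfied(t));
}
```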
An autonomous run (lessons 7 and 9) is normally granted analyze + execute and not destructive. The loop can build and prove a change all day; the one irreversible step, shipping to prod, is the genuine user-only fork that triggers a handoff. That is also why the Validator is granted only analyze: it proves, it never changes. computer-use is AX-only and non-blocking, so it never even reaches the execute tier on the real system.
Pulling it together: the contract isn't paperwork, it's the wiring of the loop's decision. Done-when is the gate that ends the run. Constraints are guards checked every cycle. The editable surface and tiers fence what each cycle may do. Here is the whole thing as a state machine.
Everything in this diagram runs AFK. The human's only role is observability — reading LOOP-LOG.md / status / review.md as the machine works. The human executes nothing, not even the QA. The single exception is the dashed destructive branch: when the only remaining step is irreversible and genuinely user-only, the loop emits a decision-ready handoff and blocks there — never anywhere else. A well-written SCOPE.md is what makes this safe: the constraints keep cycles in-bounds without supervision, and the measurable done-when lets the loop terminate itself instead of waiting to be told.
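The state machine described here can be written down as a small transition function. This is a sketch under stated assumptions: the state names follow the lesson's steps, but the `facts` shape, the `next` and `trace` helpers, and in particular the choice to send a constraint violation back to LEARN are guesses made for illustration, since the lesson does not say where a violated guard routes the cycle.

```javascript
// Sketch of the loop as a state machine. The `facts` shape is hypothetical.
function next(state, facts) {
  switch (state) {
    case "LEARN": return "ANALYZE";
    case "ANALYZE": return "EXECUTE";
    case "EXECUTE":
      // ASSUMPTION: a violated constraint discards the unit and restarts.
      return facts.constraintsHold ? "VERIFY" : "LEARN";
    case "VERIFY": return "DECIDE";
    case "DECIDE":
      if (facts.doneWhenTrue) return "DONE";                // gate is true: stop
      if (facts.onlyDestructiveStepLeft) return "HANDOFF";  // block for the human
      return "LEARN";                                       // not done: loop again
    default: return state; // DONE and HANDOFF are terminal
  }
}

// Walk the machine, consuming one `facts` object per cycle.
function trace(factsPerCycle) {
  const path = ["LEARN"];
  let cycle = 0;
  while (cycle < factsPerCycle.length) {
    const state = path[path.length - 1];
    if (state === "DONE" || state === "HANDOFF") break;
    const to = next(state, factsPerCycle[cycle]);
    path.push(to);
    if (to === "LEARN") cycle += 1; // returning to LEARN starts a new cycle
  }
  return path;
}

const twoCycles = trace([
  { constraintsHold: true, doneWhenTrue: false },
  { constraintsHold: true, doneWhenTrue: true },
]);
console.log(twoCycles.join(" > "));
// LEARN > ANALYZE > EXECUTE > VERIFY > DECIDE > LEARN > ANALYZE > EXECUTE > VERIFY > DECIDE > DONE
```

Only two states are terminal. DONE is reached through the done-when gate, and HANDOFF is the single place the loop waits on a human.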
done-when: a live status report
Because every clause is measurable, the loop's progress against the contract reads like a dashboard. This is exactly what the human observes while the loop runs AFK: never touching it, just reading. Each done-when clause is a row with a live pass / pending / fail badge; the banner up top is green only when every clause passes. Hit Re-check to pull a fresh reading, or turn on Live to watch the loop converge.
[Interactive status panel: "Done-when status, RHG checkout loop" (reading SCOPE.md · verify @ real boundary · the human only observes). Table columns: Done-when clause, Status, Reading, Boundary.]
One array of clause objects drives both the table and the banner. Each tick re-reads each metric at its boundary and recomputes the clause's status. The overall gate is a logical AND: the run is done only when every clause passes — any fail keeps the banner red and the loop running. This is the decide step made visible. Notice the human does nothing here but read; the loop re-checks itself and converges.
A clause can be pending — its check hasn't completed this cycle (e.g. the 10-minute load test is still running). Pending is not pass; the gate treats only an affirmative pass as satisfied, so a half-finished check can never trip an early exit.
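The AND gate and the pending rule fit in a few lines. In this sketch the clause text is taken from the lesson's example contract, while the statuses are made-up sample data and the names `gate` and `shouldStop` are hypothetical.

```javascript
// Sketch: the decide step as a logical AND over clause statuses.
// Each clause is assumed to carry a status of "pass", "pending", or "fail".
function gate(clauses) {
  if (clauses.some((c) => c.status === "fail")) return "fail";
  // An empty list must not count as done: `every` is true for empty arrays.
  if (clauses.length > 0 && clauses.every((c) => c.status === "pass")) {
    return "pass";
  }
  return "pending"; // something has not finished; never treated as pass
}

function shouldStop(clauses) {
  return gate(clauses) === "pass"; // only an affirmative pass exits the loop
}

const midRun = [
  { text: "checkout p95 < 400ms over a 10-minute load test", status: "pending" },
  { text: "0 responses with status >= 500 during that test", status: "pending" },
  { text: "`npm test` exits 0 with 0 failures in CI", status: "pass" },
];
const finished = midRun.map((c) => ({ ...c, status: "pass" }));

console.log(gate(midRun), shouldStop(midRun));     // pending false
console.log(gate(finished), shouldStop(finished)); // pass true
```

The empty-list guard matters for the same reason pending does: a contract with nothing checked yet should never read as satisfied.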
One question, no notes. Pick the done-when that a loop could actually evaluate and stop on. Click an option to see if it holds up.
Which done-when lets the loop decide, on its own, exactly when to stop?
Why C: it names a metric (p95, 5xx count), thresholds (< 400 ms, zero), and a boundary (a 10-minute test). A, B, and D each lean on a feeling — "faster", "happy", "clean / production-ready" — that no machine can return true or false for, so the loop chasing them can never legitimately exit.
Here's the shape of how a runner actually consumes SCOPE.md: it parses the contract, runs cycles inside the constraints and editable surface, and evaluates the gate against done-when every time. The human reads the log; they don't drive the loop.
```javascript
const scope = parseScope("SCOPE.md");            // the 6 fields

while (!gate(scope.doneWhen)) {                  // the exit gate
  const state = learn(scope.context);            // see real state
  const unit = analyze(state, scope.goal);       // pick ONE unit
  execute(unit, {
    surface: scope.editableSurface,              // fence the change
    constraints: scope.constraints,              // guard, every cycle
    tier: scope.agents.executor.tier             // analyze | execute | …
  });
  const proof = verify(unit, scope.doneWhen);    // at the REAL boundary
  log("LOOP-LOG.md", proof);                     // the human only reads this
}
// loop exits the instant every done-when clause is true
```
```shell
# the contract you hand-write before a loop
$ $EDITOR SCOPE.md

# the observability artifacts the human reads (never executes)
$ tail -f LOOP-LOG.md    # the loop's running record
$ less review.md         # the AFK QA's observability report

# confirm the gate is wired to your done-when
$ grep -n "Done-when" SCOPE.md
```
Where this goes next: lesson 3 (LEARN) is how the agent confirms the Context against reality; lesson 6 (VERIFY) is the real-boundary proof gate that evaluates Done-when; lesson 8 (the Forge) shows the /goal step compiling this same thinking into an autonomous GOAL.md; lesson 9 explains why the Validator is never the builder.
Write a done-when for something you're actually working on, paste it into the builder in section 4, and see if it turns green. If it stays vague, ask me "what metric, threshold, and boundary would make this checkable?" and we'll sharpen it together. The next lesson, LEARN: see the real state, is how the loop confirms the Context field against reality before it changes anything.