Module 1 · Foundations · Lesson 2

The scope contract: defining "done"

The loop is only as good as its done-when. Before the agent reads a single file, you write a tiny contract that says what "finished" means — measurably. Get this right and the loop knows exactly when to stop. Get it vague and it runs forever, or stops at the wrong place and calls it done.

Read the plain version, or open the technical layer on any section.

The big idea — a contract for "done"

In the last lesson you saw the loop: learn → analyze → execute one bounded unit → verify at the real boundary → decide. That last word, decide, is where the loop either spins again or stops. But stop when? Something has to tell the loop, in plain measurable terms, that the work is actually finished. That something is the scope contract.

You write it once, up front, in a small file called SCOPE.md. It is not a plan and not a spec — it is shorter than both. It answers six questions: what's the goal, what's the context, what may not change, how do we know it's done, what is the agent allowed to touch, and who is doing the work. The heart of it is one line called done-when — the exit condition.

The whole trick is this: a done-when has to be something a machine can check, not something a human has to feel. "Make the login better" can never be true or false — so a loop chasing it can never stop. "Login p95 under 400 ms and zero 5xx over a 10-minute load test" is either true or it isn't — so the loop knows, on its own, the exact moment to stop.

One sentence to remember: the loop doesn't decide when it's done — your done-when does. The quality of that one line sets the ceiling on everything the loop can do.

Think of it like… a contractor renovating your kitchen. If you say "make it nice," they can paint one wall and declare victory, or keep "improving" it until you run out of money — and you have no grounds to argue either way. If you say "new cabinets installed, sink runs hot and cold, inspection passed," everyone knows the exact instant the job is finished. Where it breaks: unlike a contractor, the loop has no common sense to fall back on — it will take "make it nice" completely literally, so the contract has to carry all the meaning.

One more thing to place before we open it up: SCOPE.md is not the only contract you'll hear about in this course. There's a zoom scale of three. A PRD is the big picture (why, for whom, the whole product). SCOPE.md is the contract for one loop — the one-pager you're learning now. And GOAL.md is the machine-checkable run-contract the Forge compiles for a long autonomous run (lessons 8 and 10). Same idea at three zoom levels.

One idea, three zoom levels: the PRD frames the product, SCOPE.md contracts one loop, GOAL.md is what the loop mechanically checks. This lesson lives at the middle level.

Why the exit condition is the load-bearing part

A loop is a control structure: while (!done) { learn; analyze; execute; verify; }. The done predicate is evaluated every cycle. If that predicate is not decidable from observable state, the loop has no termination guarantee — it is, formally, a program with no defined halting condition. "Better", "cleaner", "production-ready" are not predicates over state; they are subjective judgments, so the loop can never evaluate them and can never legitimately exit.

SCOPE.md vs. GOAL.md vs. a PRD

SCOPE.md is the lightweight, human-written contract you start any non-trivial loop with — six fields, often under a page. It is the input to the loop, the yardstick the decide step measures against. It is not the same as GOAL.md: GOAL.md is the compiled, autonomous-run contract that the Forge /goal step emits (XML-blocked: goal / context / constraints / verification / done-when) for a long unattended run — you'll meet it in lessons 8 and 10. And a PRD is broader still: market, persona, rollout. Think of it as a zoom scale: PRD (the why & what, many pages) → SCOPE.md (the contract for one loop, one page) → GOAL.md (the machine-checkable run contract the loop executes against).

What "measurable" means precisely

A done-when clause is measurable when it names (1) a metric or observable artifact, (2) a comparison or threshold, and (3) the boundary where it is checked. "Tests pass" is weak (which tests? where?); "npm test exits 0 with 0 failures in CI" is strong. The verify step (lesson 6) runs that check at the real boundary — never a claim, never a mock — and the decide step compares the result to this contract.
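The three-part shape can be sketched as data plus a small evaluator. This is only an illustration: the field names (`metric`, `op`, `threshold`, `boundary`) and the `readings` object are assumptions made for the example, not a format this course defines for SCOPE.md.

```javascript
// Hypothetical clause shape: metric + comparison/threshold + boundary.
const OPS = {
  "<":  (a, b) => a < b,
  "<=": (a, b) => a <= b,
  "==": (a, b) => a === b,
  ">=": (a, b) => a >= b,
  ">":  (a, b) => a > b,
};

// A clause is measurable only if all three parts are present and usable.
function isMeasurable(clause) {
  return typeof clause.metric === "string" && clause.metric.length > 0
      && Object.prototype.hasOwnProperty.call(OPS, clause.op)
      && typeof clause.threshold === "number"
      && typeof clause.boundary === "string" && clause.boundary.length > 0;
}

// Evaluate a clause against readings taken at its boundary.
// `readings` maps boundary -> metric -> observed number (assumed shape).
function evaluate(clause, readings) {
  if (!isMeasurable(clause)) throw new Error("clause is not measurable");
  const observed = readings[clause.boundary]?.[clause.metric];
  if (typeof observed !== "number") return undefined; // no reading yet
  return OPS[clause.op](observed, clause.threshold);
}

const strong = { metric: "test_failures", op: "==", threshold: 0, boundary: "ci" };
const weak   = { metric: "tests pass" }; // no threshold, no boundary

console.log(isMeasurable(strong)); // true
console.log(isMeasurable(weak));   // false
console.log(evaluate(strong, { ci: { test_failures: 0 } })); // true
```

A clause with no reading yet evaluates to `undefined` rather than `false`, which keeps "not checked" distinct from "checked and failed".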

In one picture — scope is the loop's exit gate

Here is where the contract sits in the loop. Every cycle runs learn → analyze → execute → verify, and then hits a gate. The gate asks one question, taken straight from your done-when: is it true yet? No → loop again. Yes → stop. The contract is the gate.

The contract (dashed) feeds the gate. The gate is the only thing that turns "keep looping" into "stop." A vague done-when makes the gate un-evaluable — the loop can't ever legitimately exit.

And here is the cost of getting it wrong, side by side. A vague contract leaves the gate stuck open (the loop can't tell true from false, so it drifts); a measurable contract gives the gate a clean yes/no.

Same loop, two contracts. The metric, the threshold, and the boundary are what turn the gate from a coin-flip into a decision.

The six fields of `SCOPE.md`

A scope contract has exactly six fields. None is optional: each closes off a specific way the loop can go wrong. Each field below lists what it's for, what good looks like, and what breaks without it.

Six fields. Goal & Context aim the work; Constraints & Editable surface fence it; Agents staff it; Done-when is the measurable line that lets the loop stop.

File: SCOPE.md · Length: usually under a page · Written by: you, once, up front


Field 1

Goal

aims the work

What it's for: the single outcome this loop exists to produce, in one sentence a stranger could read. Not a task list — the result.

Good

"Cut RHG checkout's p95 latency so it stops timing out at peak."
Names one outcome, not five
A reader knows what success looks like

Breaks without it

The agent optimizes the wrong thing
Three half-features instead of one finished one
No anchor for "is this in scope?"

In the file

# Goal
Cut RHG checkout p95 latency so it stops timing out at peak.

Field 2

Context

aims the work

What it's for: the handful of facts the agent must know before touching anything — where the code lives, what the current behavior is, the one gotcha that isn't obvious. This is what the LEARN step (lesson 3) confirms against reality.

Good

"Checkout = services/checkout; hot path is the cart re-price call."
Points at the real artifact and boundary
Names the non-obvious trap up front

Breaks without it

The agent guesses the layout and edits the wrong module
Re-discovers known facts slowly, burning cycles
Trips the gotcha you already knew about

In the file

# Context
Checkout lives in services/checkout. Hot path is the cart
re-price call on every keystroke. Prod is us-east, behind a CDN.

Field 3

Constraints

fences the work

What it's for: the lines the agent must not cross while chasing the goal — the public API stays stable, no new dependency, the design tokens are fixed. The next section turns these into something you can actually see.

Good

"No change to the public /checkout API shape."
"Stay on the existing design tokens."
Each one is a concrete, checkable boundary

Breaks without it

The agent "fixes" latency by breaking the API contract
Pulls in a heavy dependency you didn't want
Ships a faster page that no longer matches the brand

In the file

# Constraints
- Do NOT change the public /checkout request/response shape.
- No new runtime dependencies.
- Stay on the existing design tokens (no raw hex).

Field 4 · the heart

Done-when

lets the loop stop

What it's for: the measurable exit condition. This is the line the gate reads every cycle. Each clause must name a metric, a threshold, and the boundary where it's checked — so it can be true or false, never "sort of."

Measurable

"p95 < 400 ms over a 10-min load test"
"0 responses with status ≥ 500 during the test"
"npm test exits 0 in CI"

Vague (un-checkable)

"checkout feels fast"
"latency improved"
"tests look good"

In the file

# Done-when
- checkout p95 < 400ms over a 10-minute load test at peak RPS
- 0 responses with status >= 500 during that test
- `npm test` exits 0 with 0 failures in CI
- the /checkout API contract test still passes unchanged
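A minimal sketch of how the first two clauses could be computed from raw load-test samples. The sample shape (`{ ms, status }`) and the nearest-rank percentile method are assumptions made for this example; a real load-testing tool reports these figures itself and may define percentiles differently.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples <= it.
function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Check the two load-test clauses against hypothetical samples.
function checkLoadTest(samples, { p95LimitMs = 400 } = {}) {
  const p95 = percentile(samples.map(s => s.ms), 95);
  const serverErrors = samples.filter(s => s.status >= 500).length;
  return {
    p95,
    serverErrors,
    p95Ok: p95 < p95LimitMs,      // "p95 < 400ms"
    errorsOk: serverErrors === 0, // "0 responses with status >= 500"
  };
}

// 20 made-up samples: 19 fast responses and one slow outlier.
const samples = Array.from({ length: 20 }, (_, i) => ({
  ms: i === 19 ? 900 : 100 + i * 10,
  status: 200,
}));

const report = checkLoadTest(samples);
console.log(report.p95, report.p95Ok, report.errorsOk); // 280 true true
```

The single 900 ms outlier does not move p95 here, which is the point of using a percentile rather than a maximum.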

Field 5

Editable surface

fences the work

What it's for: the exact set of files or directories the agent is allowed to change. Everything else is read-only. This keeps a "small fix" from quietly rewriting half the repo.

Good

"Only services/checkout/** and its tests."
An explicit allow-list, not "wherever needed"
Makes an out-of-bounds edit obvious in review

Breaks without it

A scoped fix sprawls across unrelated modules
Blast radius nobody signed up for
The diff is too big to review honestly

In the file

# Editable surface
- services/checkout/**          # implementation
- services/checkout/__tests__/** # its tests
# everything else: READ-ONLY
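One way the allow-list could be enforced mechanically is to compare every changed path against directory prefixes. This sketch handles only patterns of the form `dir/**` as shown above, not general glob syntax, and assumes `changedPaths` comes from something like `git diff --name-only`. The function names are hypothetical.

```javascript
// Turn "services/checkout/**" into the prefix "services/checkout/".
// Only the trailing "/**" form is supported in this sketch.
function toPrefix(pattern) {
  if (!pattern.endsWith("/**")) throw new Error(`unsupported pattern: ${pattern}`);
  return pattern.slice(0, -2); // drop "**", keep the trailing slash
}

// Return every changed path that falls outside the editable surface.
function outOfBounds(changedPaths, allowList) {
  const prefixes = allowList.map(toPrefix);
  return changedPaths.filter(p => !prefixes.some(prefix => p.startsWith(prefix)));
}

const allowList = ["services/checkout/**"];
const changed = [
  "services/checkout/reprice.js",
  "services/checkout/__tests__/reprice.test.js",
  "services/cart/index.js", // hypothetical stray edit
];

console.log(outOfBounds(changed, allowList)); // [ 'services/cart/index.js' ]
```

Keeping the trailing slash in the prefix matters: without it, a sibling directory such as `services/checkout-v2/` would wrongly count as in bounds.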

Field 6

Agents

staffs the work

What it's for: who does what. At minimum: who builds (the Executor) and who proves it against this contract (the Validator). The rule that makes the proof trustworthy: the Validator is never the agent that built it (lesson 9).

Good

"Executor: agent A. Validator: agent B."
Builder and prover are different agents
Authorization tier named per agent (next section)

Breaks without it

The builder grades its own homework
"It works on my machine" becomes the proof
No one owns the real-boundary check

In the file

# Agents
- Executor:  agent-A   (tier: execute)
- Validator: agent-B   (tier: analyze)   # NEVER the builder

The whole contract in one file

SCOPE.md (repo root, or alongside the work)

# Goal
Cut RHG checkout p95 latency so it stops timing out at peak.

# Context
Checkout lives in services/checkout. Hot path is the cart re-price
call on every keystroke. Prod is us-east, behind a CDN.

# Constraints
- Do NOT change the public /checkout request/response shape.
- No new runtime dependencies.
- Stay on the existing design tokens (no raw hex).

# Done-when
- checkout p95 < 400ms over a 10-minute load test at peak RPS
- 0 responses with status >= 500 during that test
- `npm test` exits 0 with 0 failures in CI
- the /checkout API contract test still passes unchanged

# Editable surface
- services/checkout/**
- services/checkout/__tests__/**

# Agents
- Executor:  agent-A   (tier: execute)
- Validator: agent-B   (tier: analyze)   # never the builder
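The file above is regular enough to split mechanically. Here is a minimal sketch of a parser, assuming exactly the layout shown: top-level `# Heading` lines, `- ` bullets, and free text otherwise. This `parseScope` takes the file's text rather than a path and is illustrative only, not the course's actual runner code.

```javascript
const REQUIRED = ["Goal", "Context", "Constraints", "Done-when", "Editable surface", "Agents"];

// Split SCOPE.md text into { heading: [lines] }, stripping "- " bullets,
// and report which of the six required fields are absent or empty.
function parseScope(text) {
  const fields = {};
  let current = null;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    const heading = line.match(/^#\s+(.+)$/);
    if (heading) {
      current = heading[1];
      fields[current] = [];
    } else if (current !== null) {
      fields[current].push(line.replace(/^-\s+/, ""));
    }
  }
  const missing = REQUIRED.filter(name => !(name in fields) || fields[name].length === 0);
  return { fields, missing };
}

// A deliberately incomplete contract, to show the missing-field report.
const example = [
  "# Goal",
  "Cut RHG checkout p95 latency so it stops timing out at peak.",
  "# Done-when",
  "- `npm test` exits 0 with 0 failures in CI",
].join("\n");

const { fields, missing } = parseScope(example);
console.log(fields["Done-when"]); // [ '`npm test` exits 0 with 0 failures in CI' ]
console.log(missing); // [ 'Context', 'Constraints', 'Editable surface', 'Agents' ]
```

Because any line starting with `# ` is treated as a heading, a line-leading comment such as `# everything else: READ-ONLY` would be read as a new field; a stricter parser would only accept the six known names.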

How to find / open this

It is just a Markdown file you author by hand before the run. Create or open it from the repo root:

# create it next to the work, then open in your editor
$ touch SCOPE.md && $EDITOR SCOPE.md

# confirm the file has a Done-when section (this checks the file, not the runner)
$ grep -n "Done-when" SCOPE.md

In the Forge front-end (lesson 8) this same six-field thinking is what the /goal step compiles into the autonomous GOAL.md; for a quick interactive loop you often hand-write SCOPE.md and skip straight to running.

Build a `done-when` — watch the exit condition flip

Now the core skill, hands-on. Type a done-when below — or load one of the example chips. The panel on the right grades it live against the three things a measurable exit condition needs: a metric, a threshold, and a boundary. Watch the verdict flip between vague → the loop can't stop and measurable → the loop knows when to exit.

A metric is a number the system can report.
A threshold is the line that says pass vs fail.
The boundary is where it's measured: the real place the verify step checks.
Strict mode also requires the clause to avoid feeling-words like "better", "fast", "good", "nice".

The grader's checklist (a clause that misses any of these is vague, and the loop cannot stop on it):

names a metric (a number the system reports)
sets a threshold (the pass/fail line)
names the boundary (where it's checked)
avoids feeling-words (only graded in strict mode)

How the verdict is computed

The grader is a tiny pure function over the text. It scans for three signals: a metric (a known measurable noun or a number-with-unit), a threshold (a comparison word/operator or a target), and a boundary (a "where" phrase — in CI, in prod, over a test). A clause is measurable only when all three are present (plus the no-feeling-words check in strict mode). Anything less is vague, and the readout spells out the missing piece — exactly what the loop's decide step would be unable to evaluate.

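function grade(text, strict) {
  const t = text.toLowerCase();
  // metric: a known measurable noun, or a number with a unit (ms, s, %).
  // "%" is matched without a trailing \b, which would fail before a space or end of text.
  const hasMetric    = /\b(p9\d|latency|error rate|uptime|conversion)\b|\d+\s?(ms|s)\b|\d+\s?%/.test(t);
  // threshold: a comparison word/operator, an exit code, or any bare number as a target
  const hasThreshold = /(under|below|over|above|<|>|>=|<=|at least|exits 0|\d)/.test(t);
  // boundary: a "where" phrase
  const hasBoundary  = /(in ci|in prod|production|staging|load test|over a)/.test(t);
  // feeling-words only count against the clause in strict mode
  const feelings     = /\b(better|faster|fast|good|nice|clean|improv)/.test(t);
  const ok = hasMetric && hasThreshold && hasBoundary && (!strict || !feelings);
  return { ok, hasMetric, hasThreshold, hasBoundary, feelings };
}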

The point is not the regex — it's the shape: metric + threshold + boundary is the minimum a verify step can act on. Real loops let an LLM judge this in plain language, but the test is identical: could a machine, looking only at observable state, return true or false?

Vague vs. measurable — five rewrites

The single most common scope mistake is a done-when that sounds like a goal but can't be checked. Here are five vague lines and the measurable rewrite of each — the same intent, made into something the gate can evaluate. Read across: notice that every rewrite adds a metric, a threshold, and a boundary.

Vague — can't stop

No number, no place to check. The loop can run forever and never be "wrong."

done-when: "checkout feels fast"
done-when: "the page is snappy"

Measurable — clean exit

Metric (p95), threshold (< 400ms), boundary (10-min test). True or false.

done-when:
  "checkout p95 < 400ms over a"
  "10-minute load test at peak RPS"

Vague — can't stop

"Fewer" compared to what, measured where? Nothing to evaluate.

done-when: "fewer errors"
done-when: "more reliable"

Measurable — clean exit

A rate, a ceiling, and the boundary it's read at.

done-when:
  "5xx error rate below 0.1%"
  "in production over 24h"

Vague — can't stop

"Pass" is good instinct but under-specified: which tests, run where?

done-when: "tests pass"
done-when: "no bugs"

Measurable — clean exit

An exact command, an exit code, a place. The verify step can run it.

done-when:
  "`npm test` exits 0 with"
  "0 failures in CI"

Vague — can't stop

"Improve" is open-ended by definition — there's always more.

done-when: "improve signups"
done-when: "grow conversion"

Measurable — clean exit

A target rate over a defined window — reachable and checkable.

done-when:
  "signup conversion >= 22%"
  "over a 7-day A/B test"

Vague — can't stop

"Production-ready" hides a dozen unstated checks. Whose definition?

done-when: "make it production-ready"
done-when: "ship-quality"

Measurable — clean exit

Spell the dozen out as concrete clauses. Now "ready" is decidable.

done-when:
  "p95 < 400ms and 5xx < 0.1% over a 10-min load test,"
  "`npm test` exits 0 in CI, a11y scan 0 errors in CI"

The tell: if you can't picture the exact check that would prove it — a command, a metric read at a place — the clause is still vague. Keep rewriting until you can.

The constraints scope pins down

Done-when says when to stop. Constraints say what the agent may not disturb on the way. The cleanest way to pin a constraint is to point at something already named and fixed — a design system of tokens is the classic example: instead of "keep it on-brand" (vague), you say "use only these tokens" (checkable). Here is the kind of fixed surface a scope pins, and the right vs wrong way to reference it.

Pinned tokens — the fixed palette the scope references

Token	Hex	Role
--ivory	#FAF9F5	page background
--clay	#D97757	primary action
--olive	#788C5D	success / pass
--sky	#5C7CA3	info / analyze
--rust	#B04A3F	danger / fail
--oat	#E3DACC	subtle fill

How a constraint references the fixed thing

Constraint references	Means	How the loop checks it
design tokens only (surface · fixed set)	No raw hex; every color from a named token.	grep the diff for `#`-hex literals → must be none.
public API frozen (contract · stable)	Request/response shape of `/checkout` unchanged.	The API contract test still passes unchanged.
no new dependencies (surface · closed)	Nothing added to the lockfile.	`git diff` on the lockfile is empty.
editable surface (paths · allow-list)	Only the named directories may change.	Every changed path is under the allow-list.

The right vs wrong way to write one of these:

Do — reference the fixed thing

Points at a named, already-fixed set. A machine can verify it.

# Constraints
- colors: use design tokens only
  # (--clay, --olive, …) — no raw hex

Don't — gesture at a vibe

"On-brand" is a feeling. Nothing to grep, nothing to prove.

# Constraints
- keep it on-brand and tasteful
  # …says who? checked how?


A constraint is just a done-when that must stay true

Notice every good constraint has the same shape as a good done-when: a metric/observable, a threshold (often "unchanged" or "zero"), and a boundary. "No raw hex in the diff" is metric (count of hex literals) + threshold (= 0) + boundary (the diff). That is why the same grader logic works for both — a constraint is an invariant the loop must hold at every step, while a done-when is the condition that lets it stop. Pinning to a named, fixed artifact (the token set, the contract test, the lockfile) is what makes the check mechanical instead of subjective.
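The "no raw hex" invariant can be sketched as a small predicate over a diff. The example assumes the diff is available as unified-diff text (the format `git diff` prints) and inspects only added lines. The regex and function names are illustrative assumptions, and the pattern is a heuristic that can also flag non-color text such as an issue reference like `#123`.

```javascript
// Lines added by a unified diff start with "+"; "+++" is a file header, not content.
function addedLines(diffText) {
  return diffText
    .split("\n")
    .filter(l => l.startsWith("+") && !l.startsWith("+++"))
    .map(l => l.slice(1));
}

// Matches #RGB, #RRGGBB and #RRGGBBAA color literals (heuristic).
const HEX = /#(?:[0-9a-f]{8}|[0-9a-f]{6}|[0-9a-f]{3})\b/gi;

// Observable: the raw hex literals introduced by the diff.
function rawHexLiterals(diffText) {
  return addedLines(diffText).flatMap(l => l.match(HEX) ?? []);
}

// Invariant: observable (count) + threshold (= 0) + boundary (the diff).
function noRawHex(diffText) {
  return rawHexLiterals(diffText).length === 0;
}

const diff = [
  "+++ b/services/checkout/button.css",
  "-  color: var(--sky);",
  "+  color: #D97757;",
  "+  background: var(--ivory);",
].join("\n");

console.log(rawHexLiterals(diff)); // [ '#D97757' ]
console.log(noRawHex(diff));       // false
```

Removed lines are ignored on purpose: deleting an old hex literal moves the code toward the constraint, so only additions can violate it.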

Authorization tiers — what each agent is allowed to do

The Agents field doesn't just name who's involved — it sets each agent's authorization tier: how far it's allowed to go without a human. There are three, and they stack like permissions:

analyze = read and reason only (look, never touch). execute = make the bounded change (edit files, run the build). destructive = irreversible actions (deploy to prod, drop data, force-push). Each higher tier requires the one below it — you can't grant execute without analyze, or destructive without execute.

Grant a tier whose prerequisite is off and the grant won't hold, the same way a scope contract refuses an inconsistent authorization. What counts is the effective authorization: what the agent is actually cleared to do once prerequisites are accounted for.

Think of it like… keys on a ring. The read-the-building key lets you walk the halls. The workshop key only works if you already hold the building key. The master key that can knock down a wall is useless without the workshop key first. You hand out the smallest ring that gets the job done, and a loop that runs AFK (away from keyboard, with no human at the controls) is usually handed analyze + execute, with destructive held back behind a human.

The tiers nest: analyze ⊂ execute ⊂ destructive. You can't hold an outer ring without the inner one — and the dashed line is where a hands-off loop stops and a human takes the irreversible step.

analyze · tier 1

Read files, run read-only checks, reason about the gap. Looks, never changes. The LEARN and ANALYZE steps live here.

requires: nothing — base tier

execute · tier 2

Edit files inside the editable surface, run the build and tests. The EXECUTE step. Reversible by a revert.

requires: analyze

destructive · tier 3

Irreversible actions: deploy to prod, drop a table, force-push, send real emails. Usually held behind a human handoff.

requires: execute


Tiers are a dependency chain, recorded in the contract

Each tier is a boolean the contract grants. A second map records what each tier requires below it; a granted tier whose prerequisite is missing is unsatisfied — it can't take effect, so the editor flags it, exactly as a real runner would refuse to act on an inconsistent grant. The effective authorization the summary reads is the longest satisfied prefix of the chain.

const grant = { analyze:false, execute:false, destructive:false };

const requires = {
  analyze:     [],                 // base tier
  execute:     ['analyze'],        // can't change what you can't read
  destructive: ['execute']         // irreversible needs reversible first
};

// a tier is satisfied when it is granted AND every prerequisite is satisfied too,
// so a break lower in the chain also invalidates the tiers above it
function satisfied(tier) { return grant[tier] && requires[tier].every(satisfied); }
function missing(tier)   { return requires[tier].filter(t => !satisfied(t)); }
function effective() {        // what the agent may actually do
  return Object.keys(grant).filter(satisfied);
}

Why AFK loops stop at execute

An autonomous run (lessons 7 and 9) is normally granted analyze + execute and not destructive. The loop can build and prove a change all day; the one irreversible step, shipping it to prod, is the genuine user-only fork that triggers a handoff. That is also why the Validator is granted only analyze: it proves, it never changes. computer-use is AX-only and non-blocking, so it never even reaches the execute tier on the real system.
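A sketch of how a runner might apply this rule: each action is mapped to the tier it needs, and anything needing `destructive` becomes a handoff rather than an execution. The action names, the `TIER_FOR` table, and the `route` function are all hypothetical, chosen only to illustrate the fork.

```javascript
// Hypothetical mapping from an action to the tier it requires.
const TIER_FOR = {
  "read-file": "analyze",
  "edit-file": "execute",
  "run-tests": "execute",
  "deploy-prod": "destructive",
};

// Route an action given the agent's effective tiers:
//   "run"     -> the agent may do it unattended
//   "handoff" -> irreversible and not granted: emit a handoff and block
//   "deny"    -> the agent lacks the tier (e.g. a Validator trying to edit)
function route(action, effectiveTiers) {
  const needed = TIER_FOR[action];
  if (typeof needed !== "string") return "deny"; // unknown actions are refused
  if (effectiveTiers.includes(needed)) return "run";
  return needed === "destructive" ? "handoff" : "deny";
}

const executor  = ["analyze", "execute"]; // typical AFK grant
const validator = ["analyze"];            // proves, never changes

console.log(route("edit-file", executor));   // run
console.log(route("deploy-prod", executor)); // handoff
console.log(route("edit-file", validator));  // deny
```

Refusing unknown actions by default keeps the sketch on the safe side: an action only runs if it is both listed and covered by the grant.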

Scope as the loop's exit gate — the state machine

Pulling it together: the contract isn't paperwork, it's the wiring of the loop's decision. Done-when is the gate that ends the run. Constraints are guards checked every cycle. The editable surface and tiers fence what each cycle may do. Here is the whole thing as a state machine.

The contract is the control flow: constraints are the in-bounds guard each cycle, done-when is the terminal gate, and any destructive step forks to the only place a human is required.

Observability, not operation

Everything in this diagram runs AFK. The human's only role is observability — reading LOOP-LOG.md / status / review.md as the machine works. The human executes nothing, not even the QA. The single exception is the dashed destructive branch: when the only remaining step is irreversible and genuinely user-only, the loop emits a decision-ready handoff and blocks there — never anywhere else. A well-written SCOPE.md is what makes this safe: the constraints keep cycles in-bounds without supervision, and the measurable done-when lets the loop terminate itself instead of waiting to be told.

Reading the `done-when` — a live status report

Because every clause is measurable, the loop's progress against the contract reads like a dashboard. This is exactly what the human observes while the loop runs AFK: never touching it, just reading. Each done-when clause is a row with a live pass / pending / fail badge, and the banner at the top is green only when every clause passes.

Done-when status: RHG checkout loop
reading SCOPE.md · verify @ real boundary · the human only observes

Done-when clause	Reading	Target
checkout p95	512ms	400ms (over target)
5xx error rate	0.30%	0.1% (over target)
test failures	(reading not shown)	0

The banner is an AND over the clauses

One array of clause objects drives both the table and the banner. Each tick re-reads each metric at its boundary and recomputes the clause's status. The overall gate is a logical AND: the run is done only when every clause passes — any fail keeps the banner red and the loop running. This is the decide step made visible. Notice the human does nothing here but read; the loop re-checks itself and converges.

Why "pending" exists

A clause can be pending — its check hasn't completed this cycle (e.g. the 10-minute load test is still running). Pending is not pass; the gate treats only an affirmative pass as satisfied, so a half-finished check can never trip an early exit.
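The AND gate and the pending rule can be sketched in a few lines. The clause objects and the `summarize` helper are assumptions for illustration, and treating an empty clause list as "not done" is a choice made in this sketch rather than something the lesson specifies.

```javascript
// Each clause reports "pass", "pending" or "fail" for the current cycle.
// Only an affirmative pass counts; pending is treated the same as not done.
function gate(clauses) {
  // Assumption: an empty contract is never "done".
  return clauses.length > 0 && clauses.every(c => c.status === "pass");
}

// List what is still blocking the exit, for the human reading the log.
function summarize(clauses) {
  const blocking = clauses
    .filter(c => c.status !== "pass")
    .map(c => `${c.name}: ${c.status}`);
  return { done: gate(clauses), blocking };
}

const midRun = [
  { name: "checkout p95", status: "pending" }, // load test still running
  { name: "5xx error rate", status: "pending" },
  { name: "test failures", status: "pass" },
];

console.log(summarize(midRun));
// { done: false, blocking: [ 'checkout p95: pending', '5xx error rate: pending' ] }
```

Checking for `=== "pass"` rather than `!== "fail"` is what prevents a half-finished check from tripping an early exit.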

Quick check — spot the stoppable contract

One question, no notes. Pick the done-when that a loop could actually evaluate and stop on. Click an option to see if it holds up.

Which done-when lets the loop decide, on its own, exactly when to stop?

Why C: it names a metric (p95, 5xx count), thresholds (< 400 ms, zero), and a boundary (a 10-minute test). A, B, and D each lean on a feeling — "faster", "happy", "clean / production-ready" — that no machine can return true or false for, so the loop chasing them can never legitimately exit.

In the code — wiring the contract to the loop

Here's the shape of how a runner actually consumes SCOPE.md: it parses the contract, runs cycles inside the constraints and editable surface, and evaluates the gate against done-when every time. The human reads the log; they don't drive the loop.

loop runner (pseudocode) — reads SCOPE.md, never asks a human to operate

const scope = parseScope("SCOPE.md");   // the 6 fields

while (!gate(scope.doneWhen)) {        // the exit gate
  const state = learn(scope.context);          // see real state
  const unit  = analyze(state, scope.goal);     // pick ONE unit
  execute(unit, {
    surface:     scope.editableSurface,          // fence the change
    constraints: scope.constraints,              // guard, every cycle
    tier:        scope.agents.executor.tier      // analyze | execute | …
  });
  const proof = verify(unit, scope.doneWhen);   // at the REAL boundary
  log("LOOP-LOG.md", proof);                  // the human only reads this
}
// loop exits the instant every done-when clause is true
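To see the control flow run end to end, here is a toy, self-contained version of the loop above. Everything in it is simulated: the `system` object, the 60 ms improvement per cycle, and the `maxCycles` safety cap are assumptions added for the sketch, not features of the runner described in this course.

```javascript
// Simulated system under work: p95 starts above the 400 ms target.
const system = { p95: 512, failures: 0 };

// done-when as data: every clause is a predicate over observed state.
const doneWhen = [
  { name: "p95 < 400ms", check: s => s.p95 < 400 },
  { name: "0 test failures", check: s => s.failures === 0 },
];

// The exit gate: a logical AND over every clause.
const gate = state => doneWhen.every(c => c.check(state));

function runLoop(state, maxCycles = 10) {
  const log = [];
  let cycles = 0;
  while (!gate(state) && cycles < maxCycles) {
    state.p95 -= 60; // stand-in for execute: one bounded unit of work
    cycles += 1;
    // stand-in for verify: re-read every clause after the change
    const proof = doneWhen.map(c => `${c.name}: ${c.check(state)}`);
    log.push({ cycle: cycles, p95: state.p95, proof });
  }
  return { done: gate(state), cycles, log };
}

const result = runLoop(system);
console.log(result.done, result.cycles); // true 2
```

The cap on cycles is a guard for the sketch only: with a measurable done-when the loop stops on its own at 392 ms, while a target that is never reached ends the run with `done: false` instead of spinning forever.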

Locate the contract and the log

# the contract you hand-write before a loop
$ $EDITOR SCOPE.md

# the observability artifacts the human reads (never executes)
$ tail -f LOOP-LOG.md          # the loop's running record
$ less review.md               # the AFK QA's observability report

# confirm the contract has a Done-when section for the gate to read
$ grep -n "Done-when" SCOPE.md

Where this goes next: lesson 3 (LEARN) is how the agent confirms the Context against reality; lesson 6 (VERIFY) is the real-boundary proof gate that evaluates Done-when; lesson 8 (the Forge) shows the /goal step compiling this same thinking into an autonomous GOAL.md; lesson 9 explains why the Validator is never the builder.

Your turn — and I'm your teacher here. Try writing a done-when for something you're actually working on, paste it into the builder in section 4, and see if it turns green. If it stays vague, ask me "what metric, threshold, and boundary would make this checkable?" and we'll sharpen it together. The next lesson, LEARN: see the real state, is how the loop confirms the Context field against reality before it changes anything.
