Module 3 · The Forge front-end · Lesson 8

The Forge front-end: 7 steps from rough idea to shipped

The core loop needs a measurable done-when before it can run. But most real work starts as a vague sentence — "let me add saved-search alerts." Forge is the optional front-end that hardens that sentence into a measurable scope and then drives the whole build, AFK. Seven steps: grill → research → prototype → PRD → issues → implement → review. Then it converges into the course you are reading right now.

Read the plain layer top to bottom; open a panel when you want the exact files, commands, and edge cases.

The big idea

You learned the loop already: learn → analyze → execute one bounded unit → verify at the real boundary → decide. The loop is a brilliant engine. But an engine needs fuel of the right grade. The loop's fuel is a measurable done-when — a finish line a machine can check. Hand it "make alerts" and it has no finish line, so it can never stop. Hand it "a user can save a search, and within 60 seconds of a new match an email arrives, proven against a live inbox" and the loop knows the exact moment it is finished.

Most tasks don't arrive with that finish line attached. They arrive as a rough idea. Forge is the front-end that turns the rough idea into the finish line — and then keeps going, all the way to shipped, without you touching the keyboard. It is optional: if your scope is already crisp, you skip Forge and go straight to the loop. If it's fuzzy — which is most of the time — you run Forge first.

Forge is seven steps, and we will give each one its own beat in this lesson:

1. Grill: argue the idea with itself until the scope is sharp.
2. Research (optional): cache hard exploration into a durable note.
3. Prototype (optional): throwaway code for real evidence.
4. PRD: write the destination down.
5. Issues: break it into dependency-ordered tickets and compile GOAL.md, the run contract.
6. Implement: an Executor builds each ticket, an independent Validator proves it.
7. Review: AFK QA writes an observability report you read.

Two ideas hold the whole thing together, and they are worth saying once, plainly, before we go further. First: everything runs AFK — away from keyboard. Your only job is observability: you read what happened, you don't execute anything. Second: the step that turns the grill's output into a contract is GOAL.md, and that contract's measurable done-when is the loop's exit condition. Forge refuses to even compile GOAL.md until the done-when is something a machine can check.

Think of it like… a blacksmith's forge. You bring a lump of raw iron — your rough idea. You don't hammer it cold; you heat it, fold it, test its edge, reheat it, fold again. Each fold is a step. What comes out the other side is a finished blade with a tested edge — your shipped feature with a proven done-when. Where the analogy breaks: a real smith does the hammering by hand; here the smith is a crew of agents and you only watch the sparks (read the logs).

What Forge is, precisely

Forge is an additive front-end and harness bolted onto the loop-engineering loop. It does not replace anything in the core skill — it feeds it. Its component methods are bundled self-contained under forge/ so the loop never depends on an external skill being installed; each method file is named METHOD.md (not SKILL.md) so no skill-loader registers them as duplicate top-level skills.

The seven steps map onto the loop's five verbs: grill = LEARN + ANALYZE; research = LEARN; prototype = the Proof Gate applied early; PRD = ANALYZE; issues = ANALYZE (decompose) + compile the contract; implement = EXECUTE + VERIFY + DECIDE, in a loop; review = VERIFY, AFK. The eighth beat, converge, is the unchanged Course Gate from the core skill: deliver the result plus a full visual-teach course.

The running example for this lesson

We thread one concrete idea through every step: the Atlas app's "saved search alerts" feature. The rough one-liner — "let users get notified about new matches" — is exactly the kind of vague ask Forge exists to harden. By the end you will have watched it become a measurable done-when, a dependency-ordered ticket board, a proven build, and a review report.

The boundary rule

Forge only ADDS a path into the existing loop. It changes nothing in the core loop, the gates, or the grill protocol — it reuses them. The destructive / outward-facing authorization tier stays user-gated. The deliverable is still result + course.

Why a front-end + a measurable contract beats diving in

The obvious objection: why not just point the agent at the idea and let it go? Because an agent with no finish line behaves exactly like a person with no finish line — it wanders, it gold-plates, it declares victory on the wrong thing, or it loops forever. The cost of a vague start is not paid up front; it is paid later, multiplied, in rework. A front-end pays a small, bounded cost now to avoid a large, unbounded cost later.

Below is a deep-dive on the idea, in the shape of a real feature explainer: a contents list, an interactive simulation you can run, the contract in three views (the rough idea, the hardened scope, the compiled GOAL.md), and an FAQ of the questions people actually ask. Run the simulation — send "dive straight in" runs against "front-end first" runs and watch the wasted-work counter.

Feature deep-dive · Why harden the scope before you build

The claim, in one line

A bounded amount of thinking up front (a grill and a contract) converts an open-ended build into a closed-form one. Closed-form means the loop has an exit; open-ended means it doesn't. The front-end is cheap; an exit-less loop is not.

Diving straight in feels faster for the first ten minutes and is slower for the next ten hours. The agent re-discovers the same ambiguities again and again, builds things nobody asked for, and — worst — can't tell when it's done, because nothing told it what done means.

Run it: dive-in vs. forge

Each click is one unit of work. In dive-in mode there is no contract, so a third of the work lands off-scope and has to be redone. In forge mode the contract catches drift before it costs anything. Watch the wasted-work counter.

Same raw effort, two outcomes: without a contract a third of the work is scrap; with a done-when gate almost all of it ships.


The contract, in three views

The same feature at three zoom levels: the rough idea Forge starts from, the hardened scope the grill converges on, and the compiled GOAL.md — the machine-checkable run contract whose done-when is the loop's exit.

the prompt that started it

# what you typed
"Let users get notified about new matches for a saved search."

# un-answerable as written:
#   notified how? email? push? in-app?
#   how fresh is 'new'? minutes? hours?
#   what proves it works — and where?

forge/grill-with-docs → converged scope

Goal    a user saves a search; new matches trigger an email.
Channel email only for v1 (push deferred — see research.md).
Fresh   "new" = matched since the last alert; delivered < 60s.
Scope   save/list/delete a search; one matcher; one mailer.
Out     digests, push, per-result dedup beyond 24h.
Proof   live SMTP sandbox receives the mail within 60s.

GOAL.md — the autonomous-run contract

<goal>   Saved-search alerts ship for Atlas, email channel.
<context> repo=atlas; matcher reuses search/index service.
<constraints> no schema break; reuse the existing mailer.
<verification>
  run the e2e: save a search, insert a matching row,
  assert an email lands in the SMTP sandbox.
<done-when>
  e2e exits 0; mail received < 60s; 0 schema diffs.

Find it yourself: grep -rn "done-when" forge/ GOAL.md
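
As a rough sketch of how the done-when in the GOAL.md view could be checked mechanically, the function below evaluates its three thresholds against one measured run. The record shape (exitCode, mailLatencyMs, schemaDiffs) and the function name are assumptions made for illustration; the lesson does not show Forge's own format.

```javascript
// Hypothetical run record: the field names are assumptions, not Forge's format.
// Thresholds mirror the done-when: e2e exits 0; mail received < 60s; 0 schema diffs.
function checkDoneWhen(run) {
  const checks = [
    { name: "e2e exits 0", pass: run.exitCode === 0 },
    { name: "mail received < 60s",
      pass: typeof run.mailLatencyMs === "number" && run.mailLatencyMs < 60 * 1000 },
    { name: "0 schema diffs", pass: run.schemaDiffs === 0 },
  ];
  return {
    done: checks.every(c => c.pass),                      // the loop's exit condition
    failed: checks.filter(c => !c.pass).map(c => c.name), // what still blocks the exit
  };
}

const verdict = checkDoneWhen({ exitCode: 0, mailLatencyMs: 41500, schemaDiffs: 0 });
console.log(verdict.done); // true
```

Returning the list of failed checks, not only a boolean, gives the next pass something concrete to fix.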

FAQ

Isn't this just extra ceremony?
No. It is the cheapest insurance you can buy. The grill is AFK and finishes in minutes; the contract is a few XML blocks. What it buys is an exit condition: without one, the loop literally cannot decide it is finished, so it either stops at the wrong place or runs forever. Ceremony has no payoff; a done-when has a very large one.

When can I skip Forge?
When the scope is already crisp: you already have a measurable done-when and a bounded change. Then you skip straight to the core loop in the main skill. Forge is for the common case where the prompt is a rough idea, not yet a finish line. Skipping the optional steps (research, prototype) is also fine when the scope is familiar.

Where do current facts come from?
Always the brightdata CLI: search, scrape, or a live browser session. Never WebSearch/WebFetch and never the Bright Data MCP. When the grill or a prototype needs a current fact (an API's rate limit, a library's current API), it pulls real evidence rather than guessing from memory.

Can the agent that built a ticket also sign it off?
Never. The done-when is checked at the real boundary by an independent Validator, a different agent than the one that built the ticket, so it cannot grade its own homework. That separation is the whole point of the Proof Gate, and it carries through the Implement and Review steps.

The economics, stated plainly

Let p be the probability that a unit of work is off-scope. If every off-scope unit is built, discovered, and rebuilt, and each rebuild is off-scope with the same probability p, diving in pays roughly 1/(1−p) units of effort per shipped unit (1.5 units when p = 1/3). The contract converts most of that p into a cheap gate check before the build, so the multiplier collapses toward 1. The simulation above uses p ≈ 1/3, which this lesson treats as a conservative estimate for genuinely vague asks.
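
To make the multiplier concrete, here is a small sketch of the 1/(1−p) formula. It assumes a simple retry model in which every attempt, including each rebuild, is off-scope with the same probability p; the function name is made up for this example.

```javascript
// Expected build attempts per shipped unit when each attempt is off-scope
// with probability p (a geometric retry model, an assumption of this sketch).
function effortPerShippedUnit(p) {
  if (!(p >= 0 && p < 1)) throw new RangeError("p must be in [0, 1)");
  return 1 / (1 - p);
}

console.log(effortPerShippedUnit(1 / 3).toFixed(2)); // prints 1.50 (the simulation's p)
console.log(effortPerShippedUnit(0.05).toFixed(2));  // prints 1.05 (most drift caught at the gate)
```

The second call shows the point of the contract: shrinking p moves the cost per shipped unit toward 1.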

Why "measurable" is non-negotiable

The Forge /goal step refuses to compile GOAL.md until the done-when names (1) a metric or observable artifact, (2) a comparison or threshold, and (3) the boundary where it is checked. "Alerts work" fails all three; "the e2e exits 0 and mail lands in the SMTP sandbox within 60s, with 0 schema diffs" passes all three. This refusal is the front-end's most important safety property.
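
The three-part rule can be sketched as a small gate. This is only an illustration: the clause shape ({ metric, comparator, threshold, boundary }) and the function name are assumptions, and the lesson does not show how Forge actually inspects a done-when.

```javascript
// Hypothetical clause shape: { metric, comparator, threshold, boundary }.
// This only mirrors the three rules above; it is not Forge's real check.
const COMPARATORS = new Set(["==", "<", "<=", ">", ">="]);

function measurabilityGaps(clause) {
  const gaps = [];
  if (!clause.metric) gaps.push("metric or observable artifact");
  if (!COMPARATORS.has(clause.comparator) || clause.threshold === undefined) {
    gaps.push("comparison or threshold");
  }
  if (!clause.boundary) gaps.push("boundary where it is checked");
  return gaps; // an empty array means the clause is measurable
}

console.log(measurabilityGaps({ text: "Alerts work" }).length); // 3
console.log(measurabilityGaps({
  metric: "mail latency (s)", comparator: "<", threshold: 60, boundary: "SMTP sandbox",
}).length); // 0
```

A compile step built on this would refuse to emit GOAL.md while any clause still reports gaps.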

The 7 steps, in one picture

Before we walk each step, here is the whole pipeline at a glance. Read it left to right: a rough idea enters at Setup, runs the grill, optionally detours through research and prototype, becomes a PRD, then a board of issues plus a GOAL.md, then the AFK build loop, then the AFK review — and finally converges into the course. The two boxes drawn with dashed borders are the optional steps; the curved arrow underneath shows that you can reach back for grill, research, or prototype at any pass, not only at the start.

The whole Forge, one glance. Dashed boxes are optional steps. The clay-colored feedback arrow shows that grill, research, and prototype are reachable at any pass, even mid-Implement.

The shape to keep: a default order, not rigid phases. Research and Prototype are optional. A crisp, familiar scope skips them and goes straight from grill to PRD. And the moment any step hits an unknown that needs real evidence, you can reach back for the grill, research, or prototype — that is what the feedback loop means.

Walk the 7 steps, one beat at a time

Now the same seven steps as a deck — one big idea per slide, so you absorb the shape before the detail. Click Next, tap a dot, or use the left / right arrow keys. We come back to each step in depth right after.

Step 0 · Setup

The methods come bundled. Nothing to install.

Forge's seven methods live self-contained under forge/. Setup just reads your rough prompt and any docs or repo it names. The starting line is "here is an idea."

Step 1 · Grill

Argue the idea with itself until the scope is sharp.

Instead of interviewing you, the Orchestrator spawns perspectives — user-value, feasibility, risk, simplest-thing, hardest-input — that ask and answer each other, grounded in the docs, until the scope converges.

Step 2 · Research (optional)

Cache the hard exploring once, into a durable note.

If the build would hit difficult "explore" phases — an unfamiliar API, a big codebase — cache that exploration once into research.md. Web facts come from the brightdata CLI, never a guess.

Step 3 · Prototype (optional)

Throwaway code for real evidence.

When an answer needs a running artifact, not a paragraph, hash it out in code. The code is throwaway — but the evidence feeds back into the grill, and the reusable assets carry into the build.

Step 4 · PRD

Write the destination down.

The hardened scope becomes a PRD: the destination, the user stories, the implementation notes. It is the "what and why" the rest of the run is measured against.

Step 5 · Issues + GOAL.md

Tickets with blocking arrows, plus the run contract.

The PRD becomes individual tickets with blocking relationships — a dependency-ordered kanban. Alongside it, GOAL.md is compiled: the contract whose measurable done-when is the loop's exit.

Step 6 · Implement (AFK loop)

Executor builds, Validator proves. In blocking order.

A coding agent works the board in dependency order. An Executor builds one ticket; an independent Validator proves it against GOAL.md at the real boundary. The Validator is never the builder.

Step 7 · Review → Converge

AFK QA writes a report you read. Then the course.

An independent QA pass proves every story at the real boundary and emits review.md as observability — for you to read, never to perform. Then Forge converges into a full visual-teach course.


Trace one idea through the pipeline

The deck gave you the shape; now feel the motion. Pick a starting situation, then press Next to walk the saved-search-alerts idea through the Forge one step at a time. Each path shows a real choice: a fresh idea runs the full pipeline; an already-crisp scope skips the optional steps; and a mid-build unknown reaches back for the grill. Watch which boxes light up — and which get skipped.

Each chip traces a different real path. Dashed boxes are optional; the clay dashed arrow is the re-grill loop reachable mid-build.

The fresh idea run walks the full pipeline. Switch the chip above to see how a crisp scope skips the optional steps, or how a mid-build unknown reaches back for the grill.

The pipeline as a workflow

The flowchart is a literal sketch of the orchestration. The grill is a parallel() of perspective agent() calls; research and prototype are conditional folds back into the scope document; PRD → GOAL → issues is a short pipeline(); implement is a pipeline(ordered(issues), build, validate); review is a final agent() that emits review.md. The "scope crisp?" and "unknown hit?" diamonds are real branches in that script, not decoration.
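
The snippets in this lesson call agent(), parallel(), and pipeline() without defining them. The stand-ins below are assumptions for illustration only, written so the snippets can be exercised locally; the real harness is not shown in the lesson and may behave differently.

```javascript
// Stand-in helpers (assumptions, not the real harness).

// agent(): a stub that echoes its prompt instead of calling a model.
async function agent(prompt, opts = {}) {
  return { prompt, schema: opts.schema || null };
}

// parallel(): run an array of thunks concurrently, results in input order.
function parallel(thunks) {
  return Promise.all(thunks.map(run => run()));
}

// pipeline(): push each item through the stages in sequence, one item at a time.
async function pipeline(items, ...stages) {
  const results = [];
  for (const item of items) {
    let value = item;
    for (const stage of stages) value = await stage(value);
    results.push(value);
  }
  return results;
}

// Usage: a grill-like fan-out over two perspectives.
parallel(["risk", "feasibility"].map(p => () => agent(`grill as ${p}`)))
  .then(answers => console.log(answers.map(a => a.prompt)));
```

Processing pipeline items one at a time is a deliberate choice here: it preserves the blocking order that the Implement step depends on.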

Why the loop arrow matters

The clay dashed arrow — from a mid-build unknown back up to the grill — is the difference between a brittle, one-shot plan and a robust one. When Implement hits something the plan didn't anticipate (an API behaves differently than assumed), the right move is not to guess; it is to re-grill, optionally research or prototype for real evidence, fold that back, and continue. The optional steps are tools reachable at any pass, not gates you pass once.

The optional branches, drawn

Two of the seven steps are optional: Research and Prototype. They are not lesser steps — they are powerful tools you reach for when the situation calls for them, and skip when it doesn't. The deciding question is simple: does this unknown need a paragraph of facts, or a running artifact of evidence?

If the unknown is "what does this library's API actually look like, what is that service's real rate limit" — that is research: cache the facts once into research.md (web facts via the brightdata CLI) so you never re-explore. If the unknown is "will this approach even work, how does this feel" — that needs prototype: throwaway code that produces real evidence, leaving reusable assets behind. And if the scope is already familiar, you skip both.

The optional steps are chosen by the kind of unknown: facts → Research; a running artifact → Prototype; nothing unknown → straight to PRD. Both fold their evidence back.
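
The deciding question can be written as a tiny routing function. The kind labels ("facts", "artifact", "none") are assumptions made for this sketch; the lesson names the branches but not a data format.

```javascript
// Hypothetical classifier: route an unknown to the step that can settle it.
function nextStepFor(unknown) {
  switch (unknown ? unknown.kind : "none") {
    case "facts": return "research";     // settle with facts cached in research.md
    case "artifact": return "prototype"; // settle with throwaway code and real evidence
    case "none": return "prd";           // nothing unknown: straight to the PRD
    default: throw new Error(`unclassified unknown: ${unknown.kind}`);
  }
}

console.log(nextStepFor({ kind: "facts", question: "what is the real rate limit?" })); // research
console.log(nextStepFor({ kind: "artifact", question: "will this approach work?" }));   // prototype
console.log(nextStepFor(null)); // prd
```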

Research, precisely

forge/research/METHOD.md exists to cache a difficult "explore" phase once into a durable research.md, so the same unfamiliar API or domain is never re-explored on a later pass. Its facts must be real and current — pulled with the brightdata CLI (search / scrape / live browser), never invented from parametric memory and never via WebSearch/WebFetch or the Bright Data MCP. Skip it when the scope is familiar, or when the unknown needs a running artifact instead (→ Prototype).

Prototype, precisely

forge/prototype/METHOD.md is the Proof Gate applied early: hash out an idea in throwaway code to get real evidence for an unknown before committing to it. The code is disposable but it leaves two durable things — the evidence, which folds back into the grill, and reusable assets, which can carry into the implementation. It is a tool, not a phase: reach for it at any pass.

Reachable at any pass

This is the rule worth memorizing: grill, research, and prototype can be reached for at any pass — including mid-Implement — whenever the loop hits an unknown that needs real evidence, not only up front. The default ordering is a convenience, not a cage.

Step 1 · Grill — the idea argues with itself

The grill is where a rough idea becomes a sharp scope. The trick is that it does this without asking you a single question. Instead of interviewing the human, the Orchestrator spawns several perspective sub-agents — one that pushes for user value, one that checks feasibility, one that hunts for risk, one that argues for the simplest thing that could work, and one that probes the hardest input. They ask and answer each other, grounded in the docs you named, and they keep going until they converge on something decision-ready.

For our running example, the rough idea is "let users get notified about new matches." The grill turns that into: email only for v1; "new" means matched since the last alert; delivered in under 60 seconds; save / list / delete one saved search; one matcher; one mailer; digests and push are explicitly out. That is a scope a machine can build against.

Think of it like… a newsroom editorial meeting before a story runs. Nobody publishes the first pitch. The reporter, the skeptic, the lawyer, and the editor all push on it from their angle until what's left is sharp and defensible. Where it breaks: the meeting has people; the grill has agents debating among themselves, and only a genuine user-only fork ever escalates to you.

The grill's job is convergence, not exhaustion. It stops when the scope is decision-ready — sharp enough to build against — not when every possible question has been asked. Grilling too hard is its own anti-pattern.

Auto-grill-with-docs

The method lives in forge/grill-with-docs/METHOD.md (with ADR-FORMAT.md and CONTEXT-FORMAT.md alongside). It runs the grill interview AFK: rather than interviewing the user, the Orchestrator spawns N perspective sub-agents that ASK and ANSWER the grill among themselves, grounded in the named docs, until they converge. The classic perspectives are user-value, feasibility, risk, simplest-thing, and hardest-input.

When the human is touched

Only a genuine user-only fork escalates — an irreversible, outward-facing, or business-intent decision the machine has no authority to make. It is escalated via forge/handoff as a decision-ready package with a recommendation, never as an open-ended question. Everything else the grill resolves on its own.
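
One way to picture the escalation rule is as a filter plus a package builder. The decision fields and the returned shape below are assumptions for illustration; the lesson does not show the forge/handoff format.

```javascript
// Hypothetical decision record; field names are assumptions, not forge/handoff's format.
function needsHuman(decision) {
  return Boolean(decision.irreversible || decision.outwardFacing || decision.businessIntent);
}

function toHandoff(decision) {
  if (!needsHuman(decision)) return null; // the grill resolves this one on its own
  if (!decision.recommendation) {
    throw new Error("a handoff must carry a recommendation, not an open question");
  }
  return {
    question: decision.question,
    options: decision.options,
    recommendation: decision.recommendation,
  };
}

console.log(toHandoff({ question: "Use tabs or spaces?", options: ["tabs", "spaces"] })); // null
```

Throwing on a missing recommendation enforces the "decision-ready package" rule: an escalation without a recommendation is just an open-ended question.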

Anti-patterns to avoid

Too passive — the Orchestrator drives; it does not wait.
Not grilling in parallel — always spawn multiple perspectives at once.
Not prototyping — when an answer needs evidence, prototype; don't guess.
Grilling too hard — stop when decision-ready, not when exhausted.
Clearing context too soon — keep the grill's context until the PRD and GOAL are written.

forge/grill-with-docs/METHOD.md — the grill, self-contained

// spawn perspectives that ask + answer among themselves
const answers = await parallel(PERSPECTIVES.map(p =>
  () => agent(grillPrompt(p, scope), { schema: QA })))
let scopeDoc = synthesizeScope(answers)   // converge → sharp scope

Step 2 · Research (optional)

Sometimes the build will run head-first into territory nobody on the crew knows well: an unfamiliar API, a library you've never used, a domain with its own rules, a large codebase, or prior art worth studying. Exploring that during the build is expensive — you'd re-explore it every time you came back. Research caches that exploration once, into a durable note called research.md, so the knowledge is written down and reused.

For saved-search alerts, the research question might be: does our existing mailer support the headers we need, and what is the search service's real query budget? Those are facts — and facts come from the brightdata CLI (web search, scraping a doc page, or a live browser session), never from a guess. The answers land in research.md and the grill folds them in.

Think of it like… reading the trail map and checking the weather before a hike, then writing the key facts on a card you keep in your pocket. You don't re-check the forecast at every fork — you cached it once. Where it breaks: a trail map is static; research.md is living, and you add to it whenever a new unknown turns out to be fact-shaped.

research.md as a durable cache

forge/research/METHOD.md turns a difficult explore phase into a durable artifact so it is paid for once. The rule is strict on provenance: every external fact is pulled with the brightdata CLI — brightdata search "<query>", brightdata scrape <url>, or a live browser session — and never via WebSearch/WebFetch or the Bright Data MCP. A claim without a real source is a guess, and guesses don't go in research.md.

When to skip it

Skip Research when the scope is familiar (you already know the APIs and the domain), or when the unknown can't be settled by facts and needs a running artifact instead — that is Prototype's job. Research answers "what is true"; Prototype answers "will it work."

how research folds into the scope

if (needsResearch(scopeDoc))
  scopeDoc = fold(scopeDoc,
    await agent(researchPrompt(scopeDoc)))  // web facts via brightdata → research.md

Step 3 · Prototype (optional)

Some questions can't be answered by reading — only by building. Will this matcher actually fire on the row I expect? Does an email really arrive, end to end? For those, you write a tiny throwaway program that produces real evidence. The code itself is disposable; what you keep is the evidence (which folds back into the grill) and any reusable assets (which carry into the real build).

Prototype is the Proof Gate applied early: instead of assuming an approach works and discovering otherwise after a full build, you prove the risky bit in an afternoon. For saved-search alerts, a prototype might wire a fake search result straight into the mailer and confirm a message lands in a local SMTP sandbox — proof, before a single production line is written.

Think of it like… a chef tasting the sauce from a spoon before plating the whole dish. The spoonful is thrown away, but it told you the seasoning is right. Where it breaks: the spoonful vanishes; a prototype can leave reusable bits — a config, a fixture, a helper — that survive into the kitchen.

Prototype is a tool, not a phase. You can reach for it at any pass — including mid-Implement — the moment an unknown needs a running artifact rather than a paragraph. Feed the evidence back into the grill and continue.

Throwaway code, durable evidence

forge/prototype/METHOD.md (with LOGIC.md and UI.md) hashes out an idea in code for early feedback and to get REAL evidence for an unknown. Two outputs survive the throwaway code: the evidence (folded back into the grill / scope) and reusable assets that can carry into the implementation. Applying the Proof Gate here, early and at the riskiest unknown, is far cheaper than discovering the approach was wrong after a full build.

prototype each unknown, fold the evidence back

for (const u of unknowns(scopeDoc))
  scopeDoc = fold(scopeDoc,
    await agent(prototypePrompt(u)))   // real evidence, not a guess

Step 4 · PRD — write the destination down

By now the scope is sharp and any unknowns are settled. The PRD writes it all down as the destination: what the feature is, who it's for, the user stories, and the implementation notes. It is the "what and why" that the rest of the run is measured against — broader than a single loop's scope, but not a novel; a tight document, not a wall.

One useful way to read a PRD is as a phased plan you can click through. Below, the saved-search-alerts work is laid out as four phases — each with its goal, its tasks, its risks, and the exit bar that lets it advance. Click a milestone across the top (or focus the bar and use the arrow keys) to open its phase card. Notice that each phase only advances when it clears a measurable exit bar — the same discipline as the loop's done-when, applied at the plan level.

Project: Atlas · saved-search alerts (email v1)
Window: one Forge run
Owner: the AFK crew (you observe)

Progress: 1 of 4 phases complete

Phase 1 · Done

Persist a saved search

tickets A–B

Goal: a user can save, list, and delete a search. Storage and the API exist before anything tries to match against them.

User stories & tasks

As a user, I can save a search with a name
As a user, I can list and delete my saved searches
Add a saved_search table (additive migration, no break)

Exit bar

CRUD endpoints return 2xx and persist across a restart
Migration applies and rolls back cleanly; 0 schema diffs elsewhere

Risks & mitigations

Low · Naming collisions per user: two searches with the same name. Mitigation: unique index on (user, name).

Phase 2 · In progress

Match new rows to saved searches

tickets C–D

Goal: when a new row appears that matches a saved search, produce a "new match" event — reusing the existing search/index service, not a second matcher.

User stories & tasks

As a user, a new match since my last alert is detected
Reuse the search/index service to evaluate the query
Track a last_alerted_at watermark per saved search
Emit a match.new event for the mailer to consume

Exit bar

Inserting a matching row emits exactly one event
An already-alerted match does not re-fire (watermark holds)

Risks & mitigations

High · Duplicate alerts on retries: a retry re-emits the event. Mitigation: watermark + idempotency key on (search, row).

Med · Matcher drifts from live search: a second matcher would diverge. Mitigation: reuse the search service; no parallel logic.
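
The Phase 2 exit bar (exactly one event per match, no re-fire on a retry) rests on the watermark and the idempotency key working together. Here is a minimal in-memory sketch of that idea. The names (makeAlerter, matchedAt, seenKeys) are assumptions for illustration, and a real build would persist both structures.

```javascript
// In-memory sketch; names and shapes are assumptions, not Atlas code.
function makeAlerter() {
  const lastAlertedAt = new Map(); // searchId -> newest matchedAt already alerted (the watermark)
  const seenKeys = new Set();      // idempotency keys of the form "searchId:rowId"

  return function onMatch(searchId, row) {
    const key = `${searchId}:${row.id}`;
    const watermark = lastAlertedAt.has(searchId) ? lastAlertedAt.get(searchId) : -Infinity;
    // Drop retries (key already seen) and matches older than the last alert.
    if (seenKeys.has(key) || row.matchedAt < watermark) return null;
    seenKeys.add(key);
    lastAlertedAt.set(searchId, Math.max(watermark, row.matchedAt));
    return { type: "match.new", searchId, rowId: row.id };
  };
}

const onMatch = makeAlerter();
console.log(onMatch("s1", { id: "r1", matchedAt: 100 })); // a match.new event
console.log(onMatch("s1", { id: "r1", matchedAt: 100 })); // null (the retry does not re-fire)
```

The watermark comparison is strict so that two different rows sharing a timestamp both alert; the idempotency key is what stops the same row from firing twice.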

Phase 3 · Planned

Send the email

ticket E

Goal: a match.new event turns into an email through the existing mailer — no new mail stack.

User stories & tasks

As a user, I receive an email for a new match
Render a template with the match and a link back
Send via the existing mailer (reuse, do not rebuild)

Exit bar

A match.new event produces one well-formed message
Delivery is attempted within the freshness budget (< 60s)

Risks & mitigations

Med · Mail queued, not delivered: "sent" ≠ "received". Mitigation: the real proof is Phase 4's live-inbox check, not a send log.

What the PRD is, and isn't

forge/to-prd/METHOD.md emits the hardened scope as a PRD describing the destination, with user stories and implementation notes. In the zoom scale you met in lesson 2 — PRD → SCOPE.md → GOAL.md — the PRD is the widest: the why and what across the whole feature. It frames the work; it does not by itself give the loop an exit. The exit comes from GOAL.md (next step), whose done-when each phase's exit bar foreshadows.

Exit bars are not feelings

Every phase card carries an exit bar — a measurable gate, not a vibe. "Phase 2 went well" is not a gate; "inserting a matching row emits exactly one event and an already-alerted match does not re-fire" is. The plan stays honest under pressure because a phase cannot advance until its boxes are checked — the same property the loop's done-when gives a single cycle.

Step 5 · Issues — tickets with blocking relationships

A PRD describes the destination, but an agent doesn't build "a destination" — it builds one bounded ticket at a time. So Forge turns the PRD into individual tickets, each a small, well-specified unit. The crucial part: the tickets carry blocking relationships. You can't send the email before you can match a row; you can't match a row before a search is stored. Those dependencies make the board a dependency-ordered kanban — work flows in the only order that can actually succeed.

Alongside the tickets, Forge compiles GOAL.md — the autonomous-run contract. Its measurable done-when is the loop's exit condition, and it doubles as each ticket's acceptance bar. And here is the guardrail that makes the whole thing trustworthy: Forge refuses to compile GOAL.md until the done-when is measurable. No measurable finish line, no run.

Below is that board, live. A card with unfinished dependencies is blocked — hatched and locked; you literally cannot advance it until what it depends on is done. Clear its blockers and it unlocks. Try to move F (the end-to-end proof) before its dependencies are done and the board won't let you — that is the blocking relationship enforcing the only safe order. Move cards with the arrow, or drag them, and watch the counts and the locks update.

Board columns: Blocked · Ready · In progress · Done

The blocking graph the board enforces. F (the proof) can only run after E, which needs D, which needs B and C — the dependency-ordered kanban.

Tickets as Unit Contracts

forge/to-issues/METHOD.md turns the PRD and GOAL into individual tickets with blocking relationships — a dependency-ordered kanban — where each ticket is a bounded Unit Contract (the same unit the core loop's EXECUTE step builds). The blocking edges aren't cosmetic: they are the topological order the Implement loop must respect, so a ticket is never attempted before the work it depends on exists.
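
The topological order can be sketched with a Kahn-style pass over the blocking edges. The chain F after E, E after D, and D after B and C comes from the board caption; "B needs A" and the tickets with no blockers are assumptions added so the example is complete.

```javascript
// Blocking edges: ticket -> tickets it depends on.
// F <- E <- D <- {B, C} is from the lesson; "B needs A" and the empty lists are assumptions.
const blockedBy = { A: [], B: ["A"], C: [], D: ["B", "C"], E: ["D"], F: ["E"] };

// Kahn-style ordering: repeatedly take every ticket whose blockers are all done.
function buildOrder(deps) {
  const done = new Set();
  const order = [];
  const tickets = Object.keys(deps);
  while (order.length < tickets.length) {
    const ready = tickets.filter(t => !done.has(t) && deps[t].every(d => done.has(d)));
    if (ready.length === 0) throw new Error("cycle or unknown ticket in blocking relationships");
    for (const t of ready) {
      done.add(t);
      order.push(t);
    }
  }
  return order;
}

console.log(buildOrder(blockedBy).join(" ")); // A C B D E F
```

Each round's ready list is the board's Ready column at that moment; anything left out of it is still Blocked.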

GOAL.md — the run contract

forge/forge-goal/METHOD.md compiles GOAL.md with five XML blocks: <goal>, <context>, <constraints>, <verification>, and <done-when>. It follows the ultragoal discipline — goal fit → loop with a Proof-Gate verifier → anti-cheating / approval gates → red-team → activate — which you'll meet in full in lesson 10. The hard rule: it refuses to compile until the done-when is measurable. That single refusal is what guarantees the downstream loop always has an exit.

PRD → GOAL.md → issues

const prd    = await agent(toPrdPrompt(scopeDoc),     { schema: PRD })
const goal   = await agent(goalPrompt(prd),           { schema: GOAL })   // refuse unless done-when measurable
const issues = await agent(toIssuesPrompt(prd, goal), { schema: ISSUES }) // tickets + blocking edges

Step 6 · Implement — the AFK build loop

This is where Forge hands the wheel to the core loop you already know — and never takes it back until the work is done. In a loop, a coding agent works the kanban in blocking order. For each ticket, an Executor builds it, and then an independent Validator proves the result against GOAL.md at the real boundary — the Proof Gate, never a claim, never a mock. The one rule that makes this honest: the Validator is never the builder. A different agent grades the work, so nobody marks their own homework.

If a ticket keeps failing the same way, the crew fixes the root cause — and if the plan or the prompt is the root cause, it improves that and re-runs. Every pass is logged to LOOP-LOG.md, which is one of the files you read to observe. The loop continues until every ticket is done and proven.

This is the bridge between this module and the core loop you studied. The diagram below shows exactly where each Forge step lands inside learn → analyze → execute → verify → decide.

Forge doesn't replace the loop — it feeds it. Each step lands on a verb; Implement runs the full cycle, Executor on EXECUTE and Validator on VERIFY.

Cross-agent and agnostic: the Executor and Validator can be any agent — Claude Code, Grok, Kimi, pi, Council, Codex — dispatched headless via that agent's cli -p. The only invariant is that the Validator is a different agent than the builder.

The build loop, precisely

Implement is EXECUTE + VERIFY + DECIDE, run as a loop over the kanban in blocking order. An Executor builds one ticket at a time; an independent Validator proves each result against GOAL.md at the real boundary (the Proof Gate — never a claim/mock; the Validator is never the builder). Analyze each result; if a ticket keeps failing the same way, fix the root cause — and if the prompt/plan is the root cause, improve it and re-run. Log passes to LOOP-LOG.md. Loop until every ticket is done and proven.
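"Blocking order" amounts to a topological ordering of the tickets. The sketch below shows one way to compute it with a depth-first walk. The ticket shape ({ id, blockedBy }) and the function name inBlockingOrder are assumptions for illustration, not the format that forge/to-issues/METHOD.md produces.

```javascript
// Hypothetical sketch: order tickets so every blocker comes before what it blocks.
// The ticket shape ({ id, blockedBy }) is an assumption for this example.
function inBlockingOrder(tickets) {
  const byId = new Map(tickets.map((t) => [t.id, t]));
  const state = new Map(); // id -> "visiting" | "done"
  const ordered = [];

  function visit(id) {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") throw new Error(`blocking cycle at ${id}`);
    const ticket = byId.get(id);
    if (!ticket) throw new Error(`unknown ticket ${id}`);
    state.set(id, "visiting");
    for (const dep of ticket.blockedBy ?? []) visit(dep); // blockers first
    state.set(id, "done");
    ordered.push(ticket);
  }

  for (const t of tickets) visit(t.id);
  return ordered;
}
```

A cycle in the blocking edges throws instead of looping forever, which matches the spirit of the lesson: a board that cannot be ordered has no valid run.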

Cross-agent delegation

The loop is agent-agnostic but runs as a tier: a top model orchestrates and delegates bounded units down to mid-tier executors via each agent's headless cli -p (one-shot, non-interactive) — the same way the Council adapters spawn each CLI. The Validator is always a different agent than the builder; authorization tiers still apply per agent (a delegated executor gets execute permission, not destructive / outward-facing).
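The two delegation rules can be written down as small checks. This is a sketch under assumptions: the function names, the agent identifiers, and the tier labels ("execute", "destructive", "outward-facing") are illustrative and are not taken from the bundled methods.

```javascript
// Hypothetical sketch of two delegation rules. Agent names and tier labels
// are illustrative assumptions, not values from the Forge methods.
function pickValidator(builder, agents) {
  // The only invariant: the Validator must not be the agent that built the work.
  const candidate = agents.find((a) => a !== builder);
  if (!candidate) throw new Error("no independent Validator available");
  return candidate;
}

function authorize(action, grantedTiers) {
  // A delegated executor is granted "execute" only; anything else is refused.
  if (!grantedTiers.includes(action.tier)) {
    throw new Error(`"${action.name}" needs tier "${action.tier}", which was not granted`);
  }
  return true;
}
```

For example, with agents ["codex", "claude"] and "codex" as the builder, pickValidator returns "claude"; with a single agent available it throws rather than letting the builder grade its own work.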

the implement loop

const results = await pipeline(ordered(issues),
  i => agent(executePrompt(i)),                              // Executor: one bounded unit (any agent via cli -p)
  r => agent(validatePrompt(r, goal), { schema: VERDICT }))  // Validator: real-boundary proof; never the builder
// loop on any unmet done-when (improve the artifact OR the prompt)

Step 7 · Review — AFK QA as observability

The build is done and every ticket was proven as it landed. Review is the final, independent sweep: an AFK QA pass that checks every user story and done-when against the real boundary one more time, and writes the result to a file called review.md. Crucially, this is QA you read, not QA you perform. The machine runs the checks; review.md is an observability report for you.

Per criterion, the report says: what was built, the verification result, and a pointer to the evidence. It also lists residual gaps and a "flagged for awareness" set — the taste, UX, and product judgments a machine can't make and wants a human to be aware of. You observe; the loop never blocks on you. (The one exception, as always, is a genuine user-only fork, which is escalated as a decision-ready handoff — never as routine review.)

Think of it like… a flight's post-landing report waiting in your inbox. The aircraft already landed safely; the report tells you how the flight went, what to watch next time, and anything that needs a human's call. You read it — you weren't asked to fly the plane. Where it breaks: a flight report is static; review.md points at live evidence you can re-run.

The human's only role across all seven steps is observability. You read LOOP-LOG.md, review.md, and the run status. You never execute anything — not even the QA. The loop blocks on you only for a true user-only fork, via a decision-ready handoff.

review.md as an observability report

forge/review/METHOD.md runs the QA AFK: the independent Validator (plus a QA agent if useful) checks every user story / done-when against the real boundary (Proof Gate) and emits review.md — per criterion: what was built, the verification result, the evidence pointer; plus residual gaps and a flagged-for-awareness list (taste / UX / product the machine can't judge). The human only observes; the loop never blocks on them.
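To make the report's shape concrete, here is a sketch that renders it from per-criterion results. The three sections follow the description above; the field names, the function name renderReview, and the exact Markdown layout are assumptions for illustration.

```javascript
// Hypothetical sketch: render review.md from per-criterion results.
// Field names and layout are assumptions; the sections follow the lesson.
function renderReview({ criteria, residualGaps, flagged }) {
  const lines = ["# review.md", "", "## Criteria"];
  for (const c of criteria) {
    lines.push(`- ${c.criterion}`);
    lines.push(`  - built: ${c.built}`);
    lines.push(`  - result: ${c.passed ? "PASS" : "FAIL"}`);
    lines.push(`  - evidence: ${c.evidence}`); // a pointer, not the evidence itself
  }
  lines.push("", "## Residual gaps", ...residualGaps.map((g) => `- ${g}`));
  lines.push("", "## Flagged for awareness", ...flagged.map((f) => `- ${f}`));
  return lines.join("\n");
}
```

The evidence field holds a pointer rather than a copy, so the reader can open or re-run the proof instead of trusting a summary.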

The handoff exception

forge/handoff/METHOD.md packages a decision-ready handoff when — and only when — a genuine user-only fork appears (irreversible, outward-facing, or business intent). It carries a recommendation, not an open question. This is the single circumstance in which the AFK loop pauses for a human, and it is never triggered by routine review or QA.
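The pause rule can be sketched as a single gate. The three fork kinds come from the paragraph above; the function name maybeHandoff, the object shapes, and the kind labels as strings are assumptions made for this example.

```javascript
// Hypothetical sketch: pause for a human only on a genuine user-only fork.
// The three fork kinds come from the lesson; the object shapes are assumptions.
const USER_ONLY = new Set(["irreversible", "outward-facing", "business-intent"]);

function maybeHandoff(decision) {
  if (!USER_ONLY.has(decision.kind)) return null; // routine work never pauses the loop
  if (!decision.recommendation) {
    throw new Error("a handoff must carry a recommendation, not an open question");
  }
  return {
    question: decision.question,
    options: decision.options,
    recommendation: decision.recommendation,
  };
}
```

Returning null for everything outside the three kinds is the important part: routine review and QA can never reach the human through this path.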

review → converge

const review = await agent(reviewPrompt(prd, goal, results)) // → review.md, an observability report the human READS
// then CONVERGE → deliver result + a full visual-teach course

Per-step status — what observability looks like

Since your only job is to observe, it helps to see exactly what that looks like. This is a status board for one Forge run: the saved-search-alerts build. Four headline numbers sit up top (steps done, tickets proven, wasted work, and end-to-end freshness against the 60s budget), followed by a row per step with its state and the evidence pointer you'd open to verify. It is the kind of board you'd glance at while the run proceeds AFK.

Forge run: Atlas saved-search alerts (AFK · you observe · contract = GOAL.md done-when)

Run healthy · on contract

Steps done: 5/7 (+1 step)
Tickets proven: 4/6 (+1 proven)
Wasted work: 0 units (contract held)
Freshness (e2e): 41s (under 60s)

Per-step state (table columns): Forge step · State · Pass · Evidence you can open

What you read, and what you never do

Observability is not a metaphor here — it is the human's entire interface to an AFK run. The artifacts are real files: LOOP-LOG.md (every pass and its result), review.md (the final QA report), and the run status (the kind of board above). You read these. You do not run the build, you do not run the validator, and you do not perform the QA. The "Skipped (optional)" rows are Research and Prototype when the scope was familiar enough not to need them.
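As a rough sketch, the headline numbers on a board like that could be derived from run data as follows. The input shape, the function name headline, and the rule for "on contract" (zero wasted units and freshness within budget) are assumptions for illustration; only the 60s budget comes from the lesson's running example.

```javascript
// Hypothetical sketch: derive the four headline tiles from run data.
// The input shape and the onContract rule are assumptions for this example.
function headline(run, budgetSeconds = 60) {
  const stepsDone = run.steps.filter((s) => s.state === "done").length;
  const proven = run.tickets.filter((t) => t.proven).length;
  return {
    steps: `${stepsDone}/${run.steps.length}`,
    tickets: `${proven}/${run.tickets.length}`,
    wastedUnits: run.wastedUnits,
    freshness: `${run.e2eSeconds}s`,
    onContract: run.wastedUnits === 0 && run.e2eSeconds <= budgetSeconds,
  };
}
```

Everything here is read-only: the function summarizes what the run already recorded, which is all an observer's interface needs.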

Why "wasted work = 0" is the headline

The wasted-work tile is the front-end's whole thesis made into a number. Because the grill produced a measurable scope and the Validator gates every ticket at the real boundary, off-scope work is caught before it accumulates. A run that holds at zero wasted units is a run whose contract is doing its job.

Converge → the course

Forge doesn't end at "it works." The last beat is converge: deliver the shipped result and a full visual-teach course — like the one you're reading. This is the unchanged Course Gate from the core skill. The point is that knowledge built during a run shouldn't evaporate when the run ends; it gets distilled into something a person can learn from later.

So the full arc, end to end, is: Setup → Grill → (Research) → (Prototype) → PRD → Issues + GOAL.md → Implement → Review → Converge. A rough sentence went in; a proven feature and a teachable course came out — and you only ever observed.

SETUP → … → CONVERGE. A rough idea enters; a proven feature and a teachable course leave. The dashed boxes are skippable.

In the code — where Forge lives

Everything in this lesson is real and bundled. The Forge methods live self-contained under forge/, each as a METHOD.md in our own words so the loop never depends on an external skill. Here is the map, and how to open it yourself.

~/.claude/skills/loop-engineering/forge-flow.md — the front-end, and forge/ — the methods

# the seven methods, self-contained under forge/
forge/grill-with-docs/METHOD.md   # 1 · grill (+ ADR-FORMAT.md, CONTEXT-FORMAT.md)
forge/research/METHOD.md          # 2 · research (optional) → research.md, brightdata facts
forge/prototype/METHOD.md         # 3 · prototype (optional) (+ LOGIC.md, UI.md)
forge/to-prd/METHOD.md            # 4 · PRD — destination + user stories
forge/forge-goal/METHOD.md        # 5 · GOAL.md — refuse unless done-when measurable
forge/to-issues/METHOD.md         # 5 · tickets + blocking relationships (kanban)
forge/review/METHOD.md            # 7 · AFK QA → review.md (observability)
forge/handoff/METHOD.md           # the user-only-fork escape hatch (decision-ready)

Open it on your machine

# read the front-end overview
cat ~/.claude/skills/loop-engineering/forge-flow.md

# list the bundled methods
ls ~/.claude/skills/loop-engineering/forge/*/METHOD.md

# find the measurable-done-when guardrail
grep -rn "done-when" ~/.claude/skills/loop-engineering/forge/

How it runs as a workflow

The crew maps directly onto the Workflow tool: the Orchestrator is the script; the self-debate grill and the Executor / Validator passes are parallel() / pipeline() of agent() calls. Forge only invokes the Workflow tool when the user has opted into multi-agent orchestration — invoking this flow is that opt-in.

One distinction worth keeping straight for lesson 10: ultragoal is the durable-goal discipline behind /goal, and it is agnostic to agent, CLI, and model. Universal activation is simply a durable GOAL.md run under the loop; a tool's native "create goal" command is one optional example of activation, never a requirement.

Quick check

Recall beats re-reading. Answer from memory before you peek — each question grades on click, and the feedback names the idea so you can shore up anything fuzzy.

1. What is the single thing Forge's /goal step refuses to compile without?

2. Research and Prototype are best described as:

3. In the Implement loop, who proves a ticket against GOAL.md?

4. What makes the issues board a dependency-ordered kanban?

5. During a Forge run, the human's role is to:

6. Where do web facts come from during grill, research, or verify?

Answer the six to score yourself.

I'm your teacher for this — ask me anything. Want to run Forge on your own rough idea, see a real GOAL.md compiled from a one-liner, or trace how a mid-build unknown re-enters the grill? Ask, and we'll do it on your example. Next up, lesson 9 zooms into the crew that makes Implement work: the Orchestrator, Executor, and Validator — and exactly how observability keeps you out of the execution path.
