Step 1 · Foundations · Loop Engineering

Module 1 · Foundations · Lesson 1

What loop engineering is (and why one-shot fails)

Most AI work is a single throw: you ask, it answers, you hope. Loop engineering trades hope for a method — say exactly what "done" means, then loop (do a little, prove it, fix, repeat) until the result truly meets the ask, and finish by teaching it back. This first lesson lays out the idea several ways, then previews the whole machine the rest of the course builds.

The big idea

Picture the usual way people use an AI assistant. You type a request, it produces an answer, and then you hope the answer was right. There is no step where anything actually checks the result against what you wanted. One throw, one catch, fingers crossed. That is one-shot, and for anything bigger than a quick question it fails quietly, and more often than it appears to.

It fails because the first attempt at almost anything real is incomplete. A function compiles but mishandles an empty list. A summary reads well but invents a fact. A landing page looks finished but the signup button does nothing. One-shot has no way to notice any of that, because it stops at "answer" and never reaches "is the answer true."

Loop engineering is the opposite habit. Before you start, you write down what "done" actually means in a way you can measure. Then you run a tight cycle: learn the real state of things, work out the single biggest gap, do one bounded piece of work, and then go to the real place where it either works or doesn't and prove it. If it's done, you stop and teach what you built. If not, the proof tells you exactly what to fix, and you go round again.

The difference is the proof step. One-shot says "here you go." Loop engineering says "here you go — and here is the receipt that it actually works." Everything else in this course is built on that one move: prove it at the real boundary, never just claim it.

Think of it like the difference between firing an arrow at a target in the dark and firing with the lights on. One-shot is the dark: you loose the arrow and hope. The loop turns the lights on: you shoot, you walk up and look at where it hit, you adjust, you shoot again. Same archer, same bow; the only new thing is going to check the target between shots. That check is the whole method.

The shape, precisely

Loop engineering is: SCOPE (a measurable done-when, written before any work) → a repeated cycle of LEARN → ANALYZE → EXECUTE one bounded unit → VERIFY at the real boundary → DECIDE → on convergence, deliver the result plus a full taught course. The verify step is a Proof Gate: the check runs against the real artifact or the real boundary (the actual file, the actual command, the running app), and a mock or a from-memory claim never counts as a pass.

Why "one bounded unit"

Each pass executes exactly one change, not a batch. Bounding the unit keeps the verify cheap and unambiguous: when something breaks, exactly one thing changed, so the cause is obvious. Batches blur attribution and make the proof gate useless. The loop trades raw throughput per pass for a clean signal every pass — which converges faster on anything non-trivial.
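The attribution argument can be made concrete with a minimal sketch. Everything here is hypothetical: the function name `applyOnePerPass`, the change names, and the in-memory `verify`, which only stands in for a real check at the real boundary.

```javascript
// Hypothetical sketch: apply changes one at a time and verify after each,
// so a failing verify points at exactly one change.
function applyOnePerPass(changes, verify) {
  const applied = [];
  for (const change of changes) {
    applied.push(change.name);
    if (!verify(applied)) {
      // Only one thing changed since the last passing verify.
      return { ok: false, culprit: change.name, applied };
    }
  }
  return { ok: true, culprit: null, applied };
}

// Stand-in check (an assumption for illustration): the build breaks
// as soon as the change named "rename-field" is present.
const verify = (applied) => !applied.includes("rename-field");

const changes = [
  { name: "add-form" },
  { name: "rename-field" },
  { name: "add-handler" },
];

const result = applyOnePerPass(changes, verify);
console.log(result.culprit); // "rename-field"
```

Had the three changes landed as one batch, the same failing check would only say that something among the three is wrong.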

Model-agnostic, no safety gating

The method is a control structure, not a product. It runs on any capable model or CLI, with cross-agent work delegated headlessly via cli -p. It is derived from two field-tested patterns: the steipete maintainer-orchestrator (a top model orchestrates, delegating bounded units to other agents) and a github-project-triage control-plane (issues with blocking relationships as the unit ledger). The rest of this course turns that structure into a usable harness.

The two patterns this method distills are public field reports on long-horizon agent orchestration. When in doubt about any external fact in this course, the rule is to ground it in live web evidence via the Bright Data CLI (Lesson 11) rather than rely on stale memory.

Source pattern: maintainer-orchestrator + project-triage control-plane

One-shot vs the loop, at a glance

Here are the two shapes drawn next to each other. One-shot is a straight line with no way back: ask, answer, done, hope. The loop bends the line into a circle, and the bend is the verify step — when the proof fails, the arrow feeds back to the start instead of off the edge.

Left: a dashed line that ends in hope. Right: the same work bent into a cycle whose VERIFY either feeds back (fail) or releases (pass).

The cycle, drawn and labelled

Now the full annotated cycle on one sheet. Five named parts in a fixed arrangement: the scope you set once, then the four repeating steps.

Read it clockwise from the top: SCOPE feeds LEARN → ANALYZE → EXECUTE → VERIFY; VERIFY loops back on fail, releases on pass.

DECIDE is the hinge

VERIFY produces evidence; DECIDE reads it and routes. Three outcomes: converged (every done-when met → leave the ring to ship + teach), continue (progress but not done → back to LEARN with the new state), or blocked on a user-only fork (a genuine human decision → hand off, decision-ready, and only then does a human get involved). The human's standing role is observability, not driving.
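The three routes can be sketched as a small function. This is an illustration under assumptions, not the course's harness: the field names `results`, `met`, and `needsHuman` are hypothetical.

```javascript
// Hypothetical sketch of DECIDE: read the evidence from VERIFY and route.
// The field names (`results`, `met`, `needsHuman`) are assumptions.
function decide(proof) {
  const unmet = proof.results.filter((r) => !r.met);
  // Every done-when line is met: leave the ring.
  if (unmet.length === 0) return { route: "converged", next: "ship + teach" };
  // An unmet line that only a human can resolve: hand off, decision-ready.
  const fork = unmet.find((r) => r.needsHuman);
  if (fork) return { route: "blocked", next: "hand off", question: fork.line };
  // Progress but not done: go round again with the new state.
  return { route: "continue", next: "LEARN", unmet: unmet.map((r) => r.line) };
}

const proof = {
  results: [
    { line: "build passes", met: true },
    { line: "signup returns 200", met: false },
  ],
};
console.log(decide(proof).route); // "continue"
```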

Each step has its own lesson

LEARN (Lesson 3), ANALYZE (Lesson 4), EXECUTE (Lesson 5), VERIFY + the gates (Lesson 6), and the whole thing running AFK (Lesson 7). This sheet is the map; the rest of Module 2 walks each node in depth.

Why loops beat one-shot

It is tempting to think a strong enough model makes one-shot fine. It doesn't — not because the model is weak, but because the structure of one-shot has no place to catch a mistake. Three views of why the loop wins: the mechanism that makes it work, the contract that makes "done" objective, and a worked trace of a real ask.

the loop, in pseudo-code

function loop_engineer(ask) {
  const scope = define_done_when(ask);    // measurable, written first
  let state = learn();                    // read the REAL artifact / boundary
  while (true) {
    const unit = analyze(state, scope);   // classify the gap → pick ONE bounded unit
    execute(unit);                        // make exactly one change
    const proof = verify(scope);          // PROOF GATE: run it at the real boundary
    if (proof.converged) return ship_and_teach(proof);
    state = learn();                      // fold the proof back in, go again
  }
}

The loop has a place a mistake gets caught — verify(). One-shot is the same code with the whole while block deleted.

GOAL.md · the done-when block

# the scope contract — what makes "done" objective
done_when:
  - "build passes: `npm run build` exits 0"
  - "the signup button POSTs and returns 200 at /api/signup"
  - "no console errors on load (checked in the running app)"
verification:
  boundary: "the running app + the real endpoint"   # NOT a mock
  proof:    "a transcript / screenshot, not a claim"

Each line is a thing a machine can check. "Looks done" is not on the list — that's the point.
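As a hedged illustration of what "a machine can check" might look like, the sketch below turns one done-when line into a command plus an expected exit code and then runs it. The helper name `checkExitCode` is hypothetical, and a child Node process stands in for `npm run build` so the sketch runs anywhere Node is installed.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical sketch: a done-when line becomes a command plus an
// expected exit code, and the check is to actually run it.
function checkExitCode(command, args, expected = 0) {
  const run = spawnSync(command, args, { encoding: "utf8" });
  return {
    met: run.status === expected,
    evidence: { command: [command, ...args].join(" "), status: run.status },
  };
}

// Stand-in for `npm run build` (an assumption): a child Node process
// that exits with a chosen code.
const passing = checkExitCode(process.execPath, ["-e", "process.exit(0)"]);
const failing = checkExitCode(process.execPath, ["-e", "process.exit(1)"]);

console.log(passing.met, failing.met); // true false
```

In a real run the command would be the one named in GOAL.md, and the returned `evidence` is the kind of record that belongs in the log.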

LOOP-LOG.md · three passes on one ask

# ask: "add a working signup form to the landing page"
pass 1  execute: "render the form markup"
        verify:  open app → form shows, button does nothing  → FAIL
pass 2  execute: "wire button → POST /api/signup"
        verify:  curl /api/signup → 500, handler missing  → FAIL
pass 3  execute: "add the /api/signup handler"
        verify:  curl + open app → 200, row written  → PASS ✓ converged

A one-shot would have stopped after pass 1 ("here's your form") with two defects shipped: a dead button and a missing handler. The loop caught both.
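The trace above can be replayed as a toy simulation. This is only a sketch: the in-memory `app` object and the unit list are assumptions for illustration, ANALYZE is elided (the units are listed in the order the trace picked them), and an in-memory stand-in like this would not count as a proof gate in a real run.

```javascript
// Toy simulation of the three-pass trace. In a real loop VERIFY would open
// the running app and curl the real endpoint instead of reading `app`.
const app = { form: false, buttonWired: false, handler: false };

// Units in the order the trace executed them (names are hypothetical).
const units = [
  { name: "render the form markup", apply: () => { app.form = true; } },
  { name: "wire button -> POST /api/signup", apply: () => { app.buttonWired = true; } },
  { name: "add the /api/signup handler", apply: () => { app.handler = true; } },
];

function verify() {
  if (!app.form) return { converged: false, why: "form missing" };
  if (!app.buttonWired) return { converged: false, why: "button does nothing" };
  if (!app.handler) return { converged: false, why: "500, handler missing" };
  return { converged: true, why: "200, row written" };
}

const log = [];
for (const [i, unit] of units.entries()) {
  unit.apply();               // EXECUTE exactly one unit
  const proof = verify();     // VERIFY after every pass
  log.push(`pass ${i + 1}  ${unit.name} -> ${proof.converged ? "PASS" : "FAIL"} (${proof.why})`);
  if (proof.converged) break; // DECIDE: converged, leave the loop
}
console.log(log.join("\n"));
```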

Find it yourself: grep -rn "done_when" GOAL.md · cat LOOP-LOG.md

A better model raises the odds any single pass is good, but it can't tell you whether this pass was. The loop's value is the proof step, which is independent of model strength: it runs the real check at the real boundary. Strong model plus loop converges fastest; strong model alone still ships unverified guesses.

The loop does cost more per pass, because you pay for the verify. Across the whole task it costs less: a one-shot that's subtly wrong costs a human debugging session later, which dwarfs a few cheap proof gates now. Bounding each unit keeps the verify fast, so the loop's overhead stays small while its certainty stays high.

Proof at the real boundary means running the actual thing and observing the actual result: executing the command and reading its exit code, hitting the real endpoint, opening the running app, querying the real database row. What does not count: "this should work", a mocked response, or re-reading the code and asserting it's fine. That distinction, boundary over belief, is Lesson 6.
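One way to picture the difference between an observation and a claim is a gate that refuses anything it did not run itself. The sketch is hypothetical throughout: `proofGate` is an invented name, and `callSignupEndpoint` returns a fixed value where a real run would issue the HTTP request and return the actual status code.

```javascript
// Hypothetical sketch: a proof gate accepts only an observation produced by
// running a check, never a bare claim. `observe` is any function that touches
// the real boundary (run a command, call the endpoint, query the row).
function proofGate(expected, observe) {
  if (typeof observe !== "function") {
    return { pass: false, why: "a claim is not a proof: nothing was run" };
  }
  const observed = observe();
  return observed === expected
    ? { pass: true, observed }
    : { pass: false, observed, why: `expected ${expected}, observed ${observed}` };
}

// Stand-in boundary (an assumption): a real run would make the request.
const callSignupEndpoint = () => 500;

console.log(proofGate(200, "it should return 200").pass); // false
console.log(proofGate(200, callSignupEndpoint).why);      // "expected 200, observed 500"
```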

One-shot is fine for a throwaway answer with no real boundary: a quick definition, a brainstorm, a first-draft sketch you'll judge yourself anyway. The moment the output has to work against something real (code, data, a live system, a measurable claim), one-shot's missing check becomes the bug, and the loop earns its keep.

The five steps, slide by slide

Same cycle, one idea per card so it lands cleanly.

Step 0 · before the loop

SCOPE — write down what "done" means.

Before any work, state the result as something you can measure. No measurable done-when, no loop — you'd never know when to stop.

Step 1 · learn

LEARN — see the real state, don't assume.

Read the actual artifact and boundary as they are right now. Decisions built on a guess about the state are guesses themselves.

Step 2 · analyze

ANALYZE — classify the gap, pick ONE unit.

Compare state to scope, name the biggest gap, and turn it into one bounded, ranked next move. Not a batch — one.

Step 3 · execute

EXECUTE — make exactly one change.

Do that single unit and nothing else. One change per pass keeps the next step's proof unambiguous when something moves.

Step 4 · verify

VERIFY — prove it at the real boundary.

Run the real check on the real thing. A Proof Gate, never a claim or a mock. This is the step one-shot doesn't have.

The hinge · decide

DECIDE — loop, ship, or hand off.

Read the proof. Not done? Go round again. Done? Ship and teach. A human-only fork? Hand off, decision-ready.

Takeaway

Set "done", loop with proof, then teach.

Scope it, run learn → analyze → execute → verify until the proof says converged, then deliver the result and a course like this one.

Two approaches, side by side

The same ask, handled two ways. Each card shows the shape of the approach, what it's good at, where it bites, and a plain "pick this when".

One-shot

Ask once, take the first answer, hand it on. There is no step that checks the answer against the ask.

const result = ask(prompt);
return result;          // done? unknown — we just hope

Pros

+ Fastest possible turnaround: one call.
+ Fine for throwaway, no-boundary answers.

Cons

– No check: subtle defects ship silently.
– "Done" is undefined, so you can't trust it.

Pick this when: the output is disposable and you'll judge it yourself anyway, such as a definition, a brainstorm, or a rough sketch.

Loop engineering

Set a measurable "done", then do one bounded unit, prove it at the real boundary, and repeat until the proof says converged.

const scope = define_done_when(ask);
let proof;
do { execute_one(); proof = verify(scope); } while (!proof.converged);
return ship_and_teach(proof);   // done? proven ✓

Pros

+ Every result carries a proof that it works.
+ Converges on real tasks; runs AFK.

Cons

– Per-pass overhead: you pay for verify.
– Needs a measurable done-when up front.

Pick this when: the output has to actually work against something real, such as code, data, a live system, or a measurable claim. This is the default for real work.

Where one-shot breaks

Take one real ask, "add a working signup form", and walk it through the loop one decision at a time. Each pass executes one unit, then asks a single yes/no question at the proof gate: did it pass? A "no" loops back; a clean run ships.

The one-shot scenario is the cautionary tale: it stops at the very first "looks done" and never reaches a proof gate at all.

The loop returns to EXECUTE on every "no" until the proof passes. One-shot (dashed) runs once and skips the gate entirely.

The gate is a guard, not a vibe

"Proof passes?" is not the model judging its own work — it's the real check run at the real boundary, returning a hard signal. In the worked trace, the three gates were open app (button dead → no), curl /api/signup (500 → no), and curl + open app (200, row written → yes). A "no" carries why, which is exactly the input ANALYZE needs for the next unit.

Why one-shot's path is dashed

It never touches the diamond. It runs EXECUTE once and jumps to "shipped on hope," so any defect rides along undetected. That missing edge — execute straight to ship with no gate between — is the entire failure mode this course exists to fix.

Convergence over passes

Here's what the two shapes look like over time. The y-axis is "how much of the ask is actually met and proven." One-shot makes one jump and then flatlines — it can't climb, because nothing checks it. The loop steps up at each pass as the proof gate closes one gap after another, until it reaches done.

Read left → right: one-shot plateaus below "done"; the loop climbs one proven step per pass until it crosses the line.

Monotonic by construction

The loop's curve only ever goes up or stays flat, never down, because a done-when line counts as met only while its proof gate holds, and a change that regresses a proven line is discarded rather than shipped. Each accepted step is a done-when line turned green. "Converged" is simply the pass where the last red line goes green; there is no pass beyond it.
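A minimal sketch of that acceptance rule, under stated assumptions: the function name `acceptPass` and the sample done-when lines are hypothetical, and each `Set` stands for the lines that verified green after a candidate change.

```javascript
// Hypothetical sketch: accept a pass only if no previously proven
// done-when line regresses, so the count of green lines never drops.
function acceptPass(proven, candidate) {
  // `proven` and `candidate` are Sets of done-when lines that verified green.
  const regressed = [...proven].filter((line) => !candidate.has(line));
  return regressed.length === 0
    ? { accepted: true, proven: candidate }
    : { accepted: false, proven, regressed }; // discard the change
}

let proven = new Set(["build passes"]);
const history = [proven.size];

const passes = [
  new Set(["build passes", "signup returns 200"]),                      // progress
  new Set(["signup returns 200", "no console errors"]),                 // breaks the build
  new Set(["build passes", "signup returns 200", "no console errors"]), // all green
];

for (const candidate of passes) {
  proven = acceptPass(proven, candidate).proven;
  history.push(proven.size);
}
console.log(history); // [ 1, 2, 2, 3 ]
```

The second candidate is rejected because it turns "build passes" red, which is why the curve holds at 2 instead of dipping.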

The flat one-shot line

One-shot's height is wherever its single guess happened to land — sometimes high, sometimes low, but always uncertain, because no gate ever measured it. Drawn honestly, it's a flat line at an unknown height below "done." The loop trades that one uncertain jump for a staircase of certain ones.

The whole suite (a teaser)

The loop is the engine. The rest of this course adds the machine built around it — and here's the map so you know where we're headed. At the center is the loop you just met. Wrapped around it are four pieces: an optional Forge front-end that takes a rough idea all the way to shipped, an AFK crew that runs everything hands-off while you only watch, the ultragoal discipline that keeps a long run honest, and a toolbelt of companion skills that ground and deliver the work.

One line to hold onto: everything runs AFK, and the human's only job is observability. You read the logs, the review, the status. You don't execute anything — not even the QA. You step in only when there's a genuine human-only decision to make.

The loop at the center; Forge feeds it, the AFK crew runs it, ultragoal anchors it, the toolbelt serves it — and the human only reads the receipts.

Forge — 7 steps (Lesson 8)

For a raw, vague prompt, the Forge front-end runs before the loop: grill (self-answered interrogation of the idea) → research (optional, via the Bright Data CLI) → prototype (optional, for evidence) → PRD → issues (tickets with BLOCKING relationships, a kanban) → implement (the AFK loop: an Executor builds each ticket, an independent Validator proves it vs GOAL.md — the Validator is never the builder) → review (AFK QA emitting review.md as an observability report).
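The "issues with BLOCKING relationships" step implies a ledger from which the next workable ticket can be read off. The sketch below is one possible shape, not the Forge format: the ticket ids and the fields `id`, `blockedBy`, and `done` are all hypothetical.

```javascript
// Hypothetical ticket ledger: each ticket lists the ids that block it.
// Field names (`id`, `blockedBy`, `done`) are assumptions for illustration.
const tickets = [
  { id: "form-markup", blockedBy: [], done: true },
  { id: "wire-button", blockedBy: ["form-markup"], done: false },
  { id: "signup-handler", blockedBy: [], done: false },
  { id: "e2e-signup", blockedBy: ["wire-button", "signup-handler"], done: false },
];

// A ticket is ready when it is not done and every blocker is done.
function readyTickets(all) {
  const doneIds = new Set(all.filter((t) => t.done).map((t) => t.id));
  return all
    .filter((t) => !t.done && t.blockedBy.every((id) => doneIds.has(id)))
    .map((t) => t.id);
}

console.log(readyTickets(tickets)); // [ 'wire-button', 'signup-handler' ]
```

Each ready ticket is a candidate bounded unit; a ticket such as `e2e-signup` stays out of reach until both of its blockers are proven done.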

AFK + observability (Lesson 9)

Everything above runs AFK. The Orchestrator (a top model) delegates bounded units to other agents via headless cli -p; the roster is set at the start and is model-agnostic. The human reads LOOP-LOG.md / review.md / status and executes nothing — not even the QA — blocking only on a genuine user-only fork via a decision-ready handoff.

ultragoal + the toolbelt (Lessons 10–13)

ultragoal is the durable-goal discipline behind /goal — agent/CLI/model-agnostic; universal activation is a durable GOAL.md run under the loop (Codex's create_goal is one optional example, never required). The toolbelt: Bright Data CLI for real web evidence (always — never WebSearch/WebFetch, never the Bright Data MCP), Computer Use CLI for non-blocking, accessibility-tree-only macOS automation, and visual-teach, the engine that generated this very course.

Glossary

The handful of terms used above, in plain words. These are the words the whole course leans on — once a term is here, every later lesson uses it the same way.

One-shot: Asking once and taking the first answer, with no step that checks it against the request. e.g. "write the form" → here's a form → ship, defects and all.
Loop engineering: Set a measurable "done", then cycle (do one unit → prove it → fix) until the result truly meets the ask, then teach it. e.g. the three-pass signup trace that caught a dead button and a 500.
Scope / done-when: The contract written before any work: what "done" means, expressed as things you can measure. e.g. "npm run build exits 0" — a line a machine can check.
Bounded unit: Exactly one change executed per pass — not a batch — so the next proof is unambiguous. e.g. "wire the button", on its own, before adding the handler.
Proof Gate: The verify step run at the real boundary — the actual command, endpoint, or app — never a claim or a mock. e.g. curl /api/signup returning 200, not "it should return 200".
Converged: The pass where every done-when line is proven met. The loop stops; the result ships with its receipt. e.g. pass 3 above — 200 plus a written row — done.
AFK / observability: The whole run happens hands-off; the human only reads the logs and status, executing nothing, blocking only on a human-only fork. e.g. you read LOOP-LOG.md while the crew works.

Quick check

Three quick questions. Answer from memory before you peek back; recalling it now is what makes it stick.

1 · What is the one step the loop has that one-shot does not?

The proof gate is the difference, and it's independent of model strength. One-shot stops at "answer"; the loop reaches "proven."

2 · Why does each pass execute only ONE bounded unit?

One change per pass means the verify is unambiguous: if it breaks, exactly one thing changed, so the cause is obvious.

3 · In the full suite, what is the human's standing role?

Everything runs AFK. The human reads LOOP-LOG.md / review.md / status and executes nothing, not even the QA, blocking only on a human-only decision.

Your turn next. You've now seen the loop five ways — drawn, deck, deep-dive, side-by-side, and a flowchart you stepped through. Next up is Lesson 2 · the scope contract, where you write the measurable "done" the whole loop depends on. Remember: this course is your teacher — ask it to re-explain any part, in plain words or in depth, whenever you want.