Step 10 · The Forge front-end · Loop Engineering
Module 3 · The Forge front-end · The goal discipline

ultragoal: durable, verifiable goals (any CLI, any model)

The Forge starts with a goal — but a wish is not a goal. ultragoal is the discipline behind the /goal step: it turns a fuzzy intention into a contract the loop can actually pursue and that the world can actually check. A good goal has three parts and only three: an observable finish line, a verifier that can FAIL at the real boundary (the Proof Gate), and enough durable context to recover after an interruption. This lesson teaches you to design that contract, to red-team it before you switch it on, and to keep it agent-agnostic — the universal activation is a durable GOAL.md run under the loop on any CLI and any model. A native goal API is optional sugar, never a requirement.

1

The big idea: a goal is a contract, not a wish


"Make the nightly report better" is a wish. Nobody can tell when it is finished, and nobody can prove it ever was. A goal is the opposite: it names exactly what "done" looks like, it carries a test that can come back red, and it writes down enough that if the run is interrupted at 3 a.m. it can pick itself back up in the morning.

Our running example for the whole lesson is one real objective: the RHG nightly digest must land in every subscriber's inbox by 07:00 with zero broken links. Watch how that wish becomes a contract you could hand to any agent and walk away from.

Why does this matter so much? Because the loop you learned in Module 2 — LEARN → ANALYZE → EXECUTE one unit → VERIFY at the real boundary → DECIDE — only converges if there is a fixed thing to converge on. The goal is that fixed thing. If the finish line can drift, or the test can be quietly weakened, the loop will happily run forever and declare victory over nothing. ultragoal is how you make the finish line stay still and the test stay honest.

Think of it like… a building contract instead of "make me a nice house." The contract states the address, the number of rooms, the date, and — crucially — the inspection the builder must pass before they get paid. A wish lets the builder declare themselves done. A contract lets an inspector say "not yet." ultragoal writes the contract and hires the inspector. Where the analogy breaks: the inspector here must be someone other than the builder, and the inspection runs at the real boundary — the actual inbox at 07:00, not the builder's word that it works.

Goal vs task — the formal difference

A task is one-shot: do it, it's done, you move on. A goal is durable: it persists across attempts, waits, recoveries, and long feedback cycles, and its success is established by an external signal rather than the agent's self-report. ultragoal is the goal-setting front-end to loop-engineering: ultragoal defines and activates; the loop pursues. Same Proof-Gate spine, never a claim or a mock.

The one-line definition to memorise

A good goal = one observable finish line + a verifier that can fail + durable context to recover. Everything else in this lesson is an elaboration of those three clauses. If a draft is missing any one of them, it is not yet a goal — it is a wish wearing a goal's clothes.

Where it came from

The discipline is adapted, with permission, from jxnl/dots → agents/skills/ultragoal and rewritten for this harness: agent-agnostic activation, integrated with the loop-engineering loop (Proof Gate, the Forge GOAL.md, the Orchestrator/Executor/Validator crew from lesson 9).

2

Anatomy of a good goal


Before any tooling, hold the shape in your head. A goal is built from a small number of named parts that fit together. The three load-bearing ones are circled below; the rest support them.

[Diagram: The GOAL (the contract) sits inside a durable-context wrapper that survives an interruption (GOAL.md / WORKLOG.md / RESULT.md). Its parts: Outcome (one observable result); Baseline (where it starts today); Primary verifier (the Proof Gate, which can FAIL at the real boundary) plus supporting checks. Guards along the bottom: Approval gates (irreversible / public / shared / costly → ask the human first); Blocker standard (external blocker + smallest next action; doubt alone ≠ blocked); Completion proof (the exact commands, outputs, paths, or screenshots required).]
The three load-bearing parts (thick strokes): observable Outcome, a Proof-Gate verifier that can fail, and durable context wrapping it all. The guards along the bottom keep the run safe and honest.
If you can only remember three boxes, remember the thick-stroked ones: outcome, a verifier that can fail, and the durable wrapper. A goal missing any of the three is not yet a goal.
3

Goal fit: when is it a goal, and when is it just a task?


Not every piece of work deserves the full goal machinery. The first decision ultragoal makes is fit: should this be a durable goal, or an ordinary task? Reach for goal-mode only when most of these hold — progress needs repeated attempts, waiting, or recovery; success is measurable at a real boundary; the agent can react to the next failure without asking you; and the completion evidence is stronger than the agent saying "done". If the work is one-shot, taste-dependent, blocked on repeated human choices, or has no credible verifier — it is a task, not a goal.

Below, each candidate from the RHG world is judged on the same axes. Compare the rows, then read the per-candidate cards for the verdict and the reason.

RHG candidates — judged on the four fit axes
candidate | repeated / waits? | verifiable at a boundary? | recovers without you?
nightly digest by 07:00 | Yes: runs every night, can fail and retry | Yes: fetch the inbox, scan the links | Yes: a failed send just re-runs
pick the new brand colour | No: decided once | No: it is taste, not a test | No: needs your judgement
migrate the link checker to v2 | Yes: iterate until tests pass | Yes: the test suite is the gate | Yes: failures point to the next fix
approve the legal disclaimer | No: a single sign-off | No: only a human can say "approved" | No: blocked on a person

candidate · nightly digest

✓ make it a goal

Digest in every inbox by 07:00, zero broken links. Repeats nightly, fails loudly, and a different agent can fetch the inbox to prove it.

Fits all four axes → durable goal.

candidate · brand colour

→ keep it a task

Choose the new accent colour. Decided once, by taste. There is no boundary that can say "this colour is correct."

No verifier, taste-dependent → ordinary task.

candidate · link-checker v2

✓ make it a goal

Migrate to link-checker v2 with the suite green. Many iterations, each failure shows the next fix, the test suite is the boundary.

Repeated + verifiable + self-recovering → goal.

candidate · legal sign-off

→ keep it a task

Approve the new disclaimer. One human decision. The agent cannot make it, and nothing it produces can prove approval.

Blocked on a person → ordinary task.

Recommend goal-mode only when most hold

Progress needs repeated attempts, waiting, recovery, or long feedback; success is measurable by a test, benchmark, workflow, artifact inspection, screenshot, or readback at a real boundary; the agent can respond to the next failure without another preference decision; and completion evidence is stronger than the agent saying "done."

Prefer an ordinary task/plan when…

The work is one-shot, taste-dependent, blocked on repeated human choices, lacks a credible verifier, or risks unbounded external action. The brand colour and the legal sign-off fail on the verifier and the human-dependence axes — wrapping them in goal machinery would only add ceremony, not safety. Fit is the gate before everything else: get it wrong and a beautifully-specified "goal" still cannot be pursued autonomously.
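
The fit decision above can be sketched as a small scoring function. This is a minimal illustration, not part of ultragoal itself: the `Candidate` type, the threshold of three out of four for "most", and the values given for the fourth axis (which the table does not show) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    repeats_or_waits: bool        # progress needs repeated attempts, waiting, or recovery
    verifiable_at_boundary: bool  # success is measurable at a real boundary
    recovers_without_human: bool  # the agent can react to the next failure alone
    evidence_beats_claim: bool    # completion evidence is stronger than "done"


def goal_fit(c: Candidate, threshold: int = 3) -> str:
    """Return "goal" or "task". The 3-of-4 threshold is an assumed reading of "most"."""
    # No credible verifier means no goal, whatever else holds.
    if not c.verifiable_at_boundary:
        return "task"
    score = sum([c.repeats_or_waits, c.verifiable_at_boundary,
                 c.recovers_without_human, c.evidence_beats_claim])
    return "goal" if score >= threshold else "task"


# The four RHG candidates; the last column is an assumption, not from the table.
CANDIDATES = [
    Candidate("nightly digest by 07:00", True, True, True, True),
    Candidate("pick the new brand colour", False, False, False, False),
    Candidate("migrate the link checker to v2", True, True, True, True),
    Candidate("approve the legal disclaimer", False, False, False, False),
]

for candidate in CANDIDATES:
    print(f"{candidate.name}: {goal_fit(candidate)}")
```

Treating a missing verifier as an outright disqualifier, rather than just one lost point, mirrors the text: work with no credible verifier is a task however often it repeats.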

4

Three modes: design, critique, activate


ultragoal runs in one of three modes, and the difference is mostly about how far it goes. Design researches and hands back a goal packet — and stops; nothing is switched on. Critique takes a goal someone already drafted and sharpens it — tightening the verifier, closing the cheat paths. Activate does design + critique and then, as the final step, writes the contract and starts the loop. The golden rule: you only activate when the human explicitly asks to start, set, or run a goal — never from vague planning talk.

[Diagram: three modes set against the activation line, which only Activate crosses. DESIGN: research + draft, return a packet, stops here. CRITIQUE: read a draft, sharpen the verifier, close cheat paths, stops here. ACTIVATE: design + critique, red-team, then write GOAL.md, and the loop runs.]
Design and Critique stop short of the dashed activation line. Only Activate crosses it — and only when the human explicitly asked to start the goal.

Default Activation Rule

When the user explicitly invokes ultragoal for a concrete work objective and asks to build, complete, run, pursue, or "do it", treat it as Activate by default — do not stop after writing durable state or reporting a packet. After grounding and (when useful) writing GOAL.md, activate, then continue under Active Goal Discipline. Stay in Design only when the user asks to draft, design, critique, or discuss a goal without starting it.

A fourth mode: goal tree

Only when the user explicitly authorises goal-backed subagents. Each child gets one bounded objective and one verifier — the same Orchestrator/Executor/Validator discipline from lesson 9, where the Validator is never the agent that built the thing. Never spawn child goals on your own initiative.

5

Define the loop: the eight parts of the contract


Designing a goal means filling in eight specific blanks. Skip one and the loop has a hole to fall through. Here they are, each in one plain line, then drawn as the cycle they form.

  1. Outcome — the one observable result. ("Digest in every inbox by 07:00, zero broken links.")
  2. Baseline — where it starts today. ("Last night it went out at 07:41 with two dead links.")
  3. Primary verifier (Proof Gate) — the strongest independent success check, run at the real boundary. ("Fetch a subscriber inbox at 07:00 and crawl every link.")
  4. Supporting checks — regression, quality, safety, durability. ("Don't break unsubscribe; image alt-text intact.")
  5. Iteration loop — inspect → change one meaningful thing → run the verifier → record evidence → choose next.
  6. Anti-cheating — never weaken tests, narrow scope, hide failures, swap in mocks, or move the benchmark without approval.
  7. Approval gates — irreversible, public, shared, or costly actions need separate human approval first.
  8. Blocker standard + completion proof — an external blocker plus the smallest next action; and the exact evidence required before "done."
[Diagram: the iteration cycle. inspect state → change ONE thing → run verifier at the real boundary → record evidence → decide next → back to inspect. One meaningful change per turn.]
The iteration loop is the LEARN→ANALYZE→EXECUTE→VERIFY→DECIDE spine, scoped to this goal. The verifier node is the only one that touches the real boundary.
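
The cycle above can be sketched in a few lines. This is an illustrative skeleton under stated assumptions: `run_goal_loop`, the `state` dictionary, and the `max_turns` guard are hypothetical names, and the "real boundary" is simulated by a plain function so the example stays runnable.

```python
from typing import Callable, Dict, List, Tuple

Verifier = Callable[[Dict[str, str]], Tuple[bool, str]]
Change = Callable[[Dict[str, str]], None]


def run_goal_loop(changes: List[Change], verify: Verifier,
                  state: Dict[str, str], max_turns: int = 10) -> List[str]:
    """Verify, record evidence, then apply at most one change per turn."""
    log: List[str] = []
    pending = list(changes)
    for turn in range(1, max_turns + 1):
        ok, evidence = verify(state)                      # inspect + run the verifier
        log.append(f"turn {turn}: {'GREEN' if ok else 'RED'} {evidence}")  # record evidence
        if ok:
            break                                         # decide: finish line reached
        if not pending:
            log.append(f"turn {turn}: no safe change left")
            break
        pending.pop(0)(state)                             # change ONE meaningful thing
    return log


def verify_send_time(state: Dict[str, str]) -> Tuple[bool, str]:
    # Zero-padded HH:MM strings compare correctly as text.
    return state["sent_at"] <= "07:00", f"sent_at={state['sent_at']}"


def move_render_earlier(state: Dict[str, str]) -> None:
    state["sent_at"] = "06:57"  # simulated effect of the change


log = run_goal_loop([move_render_earlier], verify_send_time, {"sent_at": "07:02"})
print("\n".join(log))
```

The loop never edits `verify_send_time`; the only thing it is allowed to change is the state behind the verifier.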
6

The goal brief: the eight parts, laid out as gated phases


A goal that runs over many nights is really a sequence of phases, each with a goal, the tasks inside it, the risks, and (the part that matters most) an exit bar it must clear before the next phase begins. That exit bar is just the verifier applied at that step. This is the RHG nightly-digest goal, broken into the phases ultragoal would actually write; the card below is phase 2.

Goal: RHG nightly digest, in every inbox by 07:00, zero broken links
Owner: the loop (agent-agnostic)
Verifier: independent inbox + link crawl
Progress: 1 of 4 phases complete

Phase 2 · In progress

Build the verifier first

Goal: write the Proof Gate before the fix, so success can never be self-declared. The verifier must be able to fail on tonight's real digest.

Tasks
  • Script: fetch a live subscriber inbox via IMAP at 07:00
  • Assert the digest arrived and its timestamp is ≤ 07:00
  • Crawl every href in the body; assert 0 non-200s
  • Prove the verifier FAILS on a known-bad fixture
Exit bar
  • Verifier returns red on the known-bad digest
  • Verifier runs unattended and writes evidence to a file
  • Validator is a different agent than whoever fixes the pipeline
Risks & mitigations
  • High · A verifier that can't actually fail. If it only checks "did the send function get called", it proves nothing. Mitigation: require a red run on a bad fixture before trusting any green.
  • Med · Inbox fetch is flaky. IMAP can be slow at 07:00. Mitigation: retry window + require clean-state reproduction.

Verifier-first is not optional

Phase 2 builds the Proof Gate before phase 3 touches the pipeline. This ordering is deliberate: if you fix first and verify later, you will be tempted to write a verifier that happens to pass the fix you already made — a verifier that has never seen red. Demanding a red run on a known-bad fixture is the cheapest insurance against a rubber-stamp check. An exit criterion is a measurable gate, not a feeling: "the digest looks fine" is not a gate; "an independent IMAP fetch at 07:00 found the digest with zero non-200 links" is.
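
A minimal sketch of the verifier-first idea, assuming the inbox fetch and the link crawl have already produced an observation. `InboxObservation`, `verify_digest`, and the example URLs are hypothetical; in a real run the observation would come from the live inbox, not from a literal in the file.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InboxObservation:
    arrived_at: Optional[time]     # None means no digest was found
    link_statuses: Dict[str, int]  # href -> HTTP status seen by the crawl


def verify_digest(obs: InboxObservation, deadline: time = time(7, 0)) -> List[str]:
    """Return every failure found; an empty list means green."""
    failures: List[str] = []
    if obs.arrived_at is None:
        failures.append("digest missing from inbox")
    elif obs.arrived_at > deadline:
        failures.append(f"late: arrived {obs.arrived_at:%H:%M}, deadline {deadline:%H:%M}")
    for href, status in obs.link_statuses.items():
        if status != 200:
            failures.append(f"broken link {href} -> {status}")
    return failures


# Known-bad fixture modelled on the baseline: sent at 07:41 with two dead links.
KNOWN_BAD = InboxObservation(
    arrived_at=time(7, 41),
    link_statuses={
        "https://example.com/story": 200,
        "https://example.com/gone": 404,
        "https://example.com/broken": 500,
    },
)
KNOWN_GOOD = InboxObservation(time(6, 57), {"https://example.com/story": 200})

# The verifier earns trust only after it has been seen to go red.
print(verify_digest(KNOWN_BAD))
print(verify_digest(KNOWN_GOOD))
```

Returning the list of failures, rather than a bare boolean, gives the loop the evidence it needs to pick the next single change.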

7

A verifier that can FAIL — the whole point


Here is the single idea the rest of the lesson protects. A verifier is only worth anything if there is a real input that makes it come back red. A check that returns green no matter what is not a verifier — it is a decoration. The trick most fakes share is that they never touch the real boundary: they ask the agent "did you send it?" instead of asking the inbox "did it arrive?"

Compare the two below. On the left, a hollow check that can only say "yes". On the right, a Proof Gate that reaches the real inbox and can say "no".

[Diagram: two checks side by side. ✕ Hollow check (a fake): asks the agent "did you send it?"; always ✓ green; no input can make it red, so it is worthless. ✓ Proof Gate (real boundary): fetches the live inbox and crawls the links; ✓ green when the digest is there and clean, ✕ red if it is missing or has a dead link.]
The only difference that matters: the Proof Gate has an input — a missing digest or a dead link — that turns it red. A check with no red path proves nothing.
Test your own verifier with one question: "What input makes this come back red?" If you can't name one, you don't have a verifier yet — you have a decoration.
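
The "what input makes this come back red?" question can itself be written as a check. The sketch below is illustrative: `has_red_path`, the two example checks, and the `world` dictionaries are hypothetical stand-ins for a real inbox and a real agent report.

```python
from typing import Callable, Dict, Iterable

World = Dict[str, object]


def has_red_path(check: Callable[[World], bool], bad_worlds: Iterable[World]) -> bool:
    """True if at least one known-bad world makes the check return False (red)."""
    return any(not check(world) for world in bad_worlds)


def hollow_check(world: World) -> bool:
    # Asks the agent, not the inbox.
    return bool(world["agent_says_sent"])


def proof_gate(world: World) -> bool:
    # Asks the boundary: is the digest there, and are the links alive?
    return bool(world["digest_in_inbox"]) and world["dead_links"] == 0


# In both worlds the agent claims success, but the real outcome is bad.
BAD_WORLDS = [
    {"agent_says_sent": True, "digest_in_inbox": False, "dead_links": 0},
    {"agent_says_sent": True, "digest_in_inbox": True, "dead_links": 2},
]

print("hollow check can fail:", has_red_path(hollow_check, BAD_WORLDS))
print("proof gate can fail:  ", has_red_path(proof_gate, BAD_WORLDS))
```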
8

Red-team the verifier: can "done" be faked?


Time to drive it yourself. The panel on the left lets you build a verifier for the RHG digest goal — choose where it checks, whether it can fail on a bad fixture, and whether the agent grades its own work. The panel on the right assembles that verifier and then renders the verdict that matters: could a lazy or adversarial run fake "done" past this check? Try to build a green-looking verifier that is actually fakeable, then fix it.

Build the verifier

"agent's claim" and "a mock" never touch the real boundary.


The four cheat paths this tuner models

A verifier becomes fakeable the moment any of these is true: (1) it checks a claim or a mock instead of the real boundary, so the agent's word is the evidence; (2) it has never seen red, so a green result is meaningless; (3) the builder grades itself, so bias toward "done" is unchecked; or (4) the run may narrow scope or move the benchmark without approval, so "done" can be redefined downward until it's trivially true. The right pane goes red the instant any one of these holds — because in a real run, any one of them is enough to fake completion.

The anti-cheating clause in one sentence

A passing verifier must mean the real outcome: do not weaken tests, narrow scope, hide failures, swap in mocks, or move the benchmark without approval. The tuner is a toy, but the verdict logic is exactly the red-team you run on every real goal before activating it.
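
The verdict logic described above reduces to four independent tests, any one of which makes the verifier fakeable. A minimal sketch, assuming a hypothetical `VerifierDesign` record whose fields mirror the tuner's choices:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VerifierDesign:
    checks: str                  # "real boundary", "agent's claim", or "a mock"
    seen_red: bool               # has failed at least once on a known-bad fixture
    graded_by_builder: bool      # the agent that built the fix also grades it
    changes_need_approval: bool  # scope and benchmark move only with approval


def cheat_paths(v: VerifierDesign) -> List[str]:
    """List every open cheat path; an empty list means the design is defensible."""
    paths: List[str] = []
    if v.checks != "real boundary":
        paths.append("checks a claim or a mock, not the real boundary")
    if not v.seen_red:
        paths.append("has never seen red")
    if v.graded_by_builder:
        paths.append("the builder grades itself")
    if not v.changes_need_approval:
        paths.append("scope or benchmark can move without approval")
    return paths


green_looking = VerifierDesign("a mock", seen_red=True,
                               graded_by_builder=False, changes_need_approval=True)
defensible = VerifierDesign("real boundary", seen_red=True,
                            graded_by_builder=False, changes_need_approval=True)

print(cheat_paths(green_looking))
print(cheat_paths(defensible))
```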

9

The anti-cheating rule: five ways "done" gets faked


Under schedule pressure, an agent (or a person) will reach for the easy win: make the test pass without making the thing true. ultragoal names the five moves that do this and forbids all of them without explicit human approval. Memorise the list — it is the same list whether you are red-teaming your own goal or reviewing someone else's.

[Diagram: "A passing verifier must mean the real outcome", surrounded by five forbidden moves: weaken the tests (relax assertions); narrow the scope ("only these users"); hide failures (swallow the error); swap in a mock (fake the boundary); move the benchmark (07:00 → 07:05). The only legal move: change the benchmark WITH human approval.]
All five are forbidden by default. The only legal way to change what "done" means is an explicit, approved change to the contract — never a quiet edit mid-run.

A concrete RHG cheat

Say the send keeps landing at 07:02. The lazy fix is to edit the verifier's assertion from ts <= 07:00 to ts <= 07:05. The digest now "passes" — but the real outcome (in every inbox by 07:00) is no longer what's being proven. The anti-cheating rule catches this because moving the benchmark requires approval: the loop must surface the proposed change to the human, not slip it in. The same logic forbids "narrow scope" (proving it only for a test account) and "swap in a mock" (asserting against a fake inbox that always has the digest).
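
One possible way to catch a quiet edit like this is to fingerprint the verifier text when the human approves it and compare before every run. This is an assumed mechanism for illustration, not something the ultragoal skill is stated to do; `fingerprint`, `require_approved`, and `ContractChanged` are hypothetical names.

```python
import hashlib


class ContractChanged(Exception):
    """Raised when the verifier text differs from the approved version."""


def fingerprint(text: str) -> str:
    """SHA-256 hex digest of the text, recorded at approval time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def require_approved(verifier_text: str, approved_fingerprint: str) -> None:
    """Refuse to run a verifier that no longer matches what the human approved."""
    if fingerprint(verifier_text) != approved_fingerprint:
        raise ContractChanged("verifier differs from the approved contract; ask the human")


APPROVED_ASSERTION = 'assert ts <= "07:00"'
approved = fingerprint(APPROVED_ASSERTION)

require_approved(APPROVED_ASSERTION, approved)  # unchanged: passes silently

try:
    require_approved('assert ts <= "07:05"', approved)  # the lazy fix
    drift_detected = False
except ContractChanged:
    drift_detected = True

print("benchmark drift detected:", drift_detected)
```

The fingerprint does not decide whether 07:05 is acceptable; it only forces the change in front of a human instead of letting it slip in mid-run.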

10

Keep state durable: GOAL.md / WORKLOG.md / RESULT.md


A goal that lives only in an agent's head dies the moment that turn ends. The third load-bearing part of a good goal is durable context: enough written down that a fresh run — tomorrow morning, a different model, after a crash — can read the files and resume cold. The loop-engineering convention uses three files, each with a clear job.

  • GOAL.md: outcome, baseline, constraints, success + blocker criteria. The contract; stays fixed.
  • WORKLOG.md: attempts, evidence, current state, next action. Grows every turn.
  • RESULT.md: final change, boundary evidence, remaining risks. Written at the end.
A fresh agent reads all three and resumes cold.
Keep the active objective compact in GOAL.md; let WORKLOG.md carry the running detail; seal the proof in RESULT.md. Any future run reads all three and continues without you re-explaining a thing.

Keep the contract compact; push detail down

Keep the active objective in GOAL.md short; when supporting context grows, put it in the nearest durable file. Prefer the project's existing conventions over inventing new files. Do not create files in Design mode unless asked, and always read existing goal files before editing — preserve dirty work, never clobber a half-finished WORKLOG. A partial or interim state recorded honestly in WORKLOG.md is valid carry-over; it is never a finalized "done."
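
A minimal sketch of the two habits above: append to WORKLOG.md without rewriting it, and read all three files before doing anything else. The helper names are hypothetical, and a temporary directory stands in for the project folder.

```python
import tempfile
from pathlib import Path
from typing import Dict

STATE_FILES = ("GOAL.md", "WORKLOG.md", "RESULT.md")


def append_worklog(workdir: Path, entry: str) -> None:
    """Append one line; earlier entries are never truncated or rewritten."""
    with (workdir / "WORKLOG.md").open("a", encoding="utf-8") as fh:
        fh.write(entry.rstrip("\n") + "\n")


def resume_cold(workdir: Path) -> Dict[str, str]:
    """Read whatever durable state exists so a fresh run can continue."""
    state: Dict[str, str] = {}
    for name in STATE_FILES:
        path = workdir / name
        state[name] = path.read_text(encoding="utf-8") if path.exists() else ""
    return state


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "GOAL.md").write_text("outcome: digest in every inbox by 07:00\n",
                                     encoding="utf-8")
    append_worklog(workdir, "n1 07:00 verify RED ts=07:02")
    append_worklog(workdir, "n1 07:05 change move render step earlier")
    resumed = resume_cold(workdir)  # what a fresh agent would see

print(resumed["WORKLOG.md"], end="")
```

An empty RESULT.md in the resumed state is itself information: the goal has not been sealed, so the run is still active or blocked.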

11

Any CLI, any model: GOAL.md is the core, a native API is optional


This is the part people most often get wrong, so it is worth saying plainly. ultragoal is not tied to one agent. The universal way to activate a goal is to write a durable GOAL.md and run it under the loop — and that works on any CLI (claude, codex, kimi, grok, gemini, opencode, …) and any model, with no special tools required. Some agents also expose a native goal API — Codex's create_goal / update_goal is one example — which you may call as the activation step when it exists. It is optional sugar over the same GOAL.md discipline, never a requirement. "Activate" always means: write/commit GOAL.md and enter the loop, plus call the native API only if the agent happens to have one.

[Diagram: GOAL.md, the universal contract, at the centre. Every CLI (claude, codex, kimi, grok, gemini, opencode, …) runs it under the loop on the Proof Gate spine. A native goal API (e.g. Codex create_goal) hangs off the side on a dashed line: optional sugar.]
Every CLI activates the same way: a durable GOAL.md run under the loop. The native goal API (dashed) is one optional path some agents add on top — never the thing that makes it work.
If a course or a colleague tells you "you need Codex's create_goal to set a goal" — that's the misconception this section exists to kill. The contract is GOAL.md + the loop. The API is sugar.
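
The activation rule can be sketched as one function with an optional hook. Everything here is an assumption for illustration: `activate` is a hypothetical helper, and `native_goal_api` is a generic callable, not the real signature of Codex's create_goal or of any other agent's API.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

GOAL = "outcome: digest in every inbox by 07:00, zero broken links\n"


def activate(goal_text: str, workdir: Path,
             native_goal_api: Optional[Callable[[str], None]] = None) -> Path:
    """Write GOAL.md (the universal step); call a native API only if one is given."""
    workdir.mkdir(parents=True, exist_ok=True)
    goal_path = workdir / "GOAL.md"
    # Never clobber an existing, different contract: read it first.
    if goal_path.exists() and goal_path.read_text(encoding="utf-8") != goal_text:
        raise FileExistsError(f"{goal_path} already holds a different contract")
    goal_path.write_text(goal_text, encoding="utf-8")
    if native_goal_api is not None:
        native_goal_api(goal_text)  # optional sugar on top of the same contract
    return goal_path


native_calls: List[str] = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    plain = activate(GOAL, root / "any-cli")
    sugared = activate(GOAL, root / "with-api", native_goal_api=native_calls.append)
    same_contract = plain.read_text(encoding="utf-8") == sugared.read_text(encoding="utf-8")

print("same contract either way:", same_contract)
```

Both paths end with the same file on disk; the hook adds a notification, not a different kind of goal.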
12

Walk the gate: design → red-team → activate


Activation is the final action, and it has a gate in front of it: the red-team. Before any goal goes live you ask a short series of hard questions, and the very first failed answer sends you back to redesign; you do not activate a goal you could not defend. The flow below walks a draft through the gate one question at a time: a weak draft peels off to "back to design", and a defensible one reaches "activate".

[Flowchart: a goal draft enters → Can success be faked? (no is good and continues; yes is bad and goes back to design) → Outcome observable? (yes continues; no goes back to design) → Checkable outside? (yes → ▶ Activate, write GOAL.md; no → ↺ Back to design).]
Read top → bottom. The first failed red-team question routes the draft back to design; only a draft that survives all three is allowed to activate.

Red-team the draft before activating

The full checklist ultragoal runs before it crosses the activation line: (1) Can success be faked by weakening the verifier? (2) Could the words be satisfied while missing the user's real outcome? (3) Are the approval gates explicit? (4) Does the loop say what to do after a failed attempt or a wait? And the one this flow ends on: is completion observable outside the running agent? A "no" on observability is fatal — if only the agent can see that it's done, "done" is just a claim. Survive all of them and only then write/commit GOAL.md and enter the loop (plus the native API, if any).
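
The checklist above behaves like a short-circuiting gate: questions are asked in order and the first wrong answer ends the walk. A minimal sketch, assuming hypothetical question keys and a plain dictionary of answers:

```python
from typing import Dict, List, Tuple

# (key, question, the answer a defensible draft must give)
RED_TEAM: List[Tuple[str, str, bool]] = [
    ("fakeable", "Can success be faked by weakening the verifier?", False),
    ("misses_real_outcome",
     "Could the words be satisfied while missing the user's real outcome?", False),
    ("approval_gates_explicit", "Are the approval gates explicit?", True),
    ("failure_path_defined",
     "Does the loop say what to do after a failed attempt or a wait?", True),
    ("observable_outside_agent",
     "Is completion observable outside the running agent?", True),
]


def red_team(answers: Dict[str, bool]) -> Tuple[str, str]:
    """Return ("activate", "") or ("back to design", first failing question)."""
    for key, question, required in RED_TEAM:
        # A missing answer fails too: an unexamined question is not a pass.
        if answers.get(key) != required:
            return "back to design", question
    return "activate", ""


defensible = {
    "fakeable": False,
    "misses_real_outcome": False,
    "approval_gates_explicit": True,
    "failure_path_defined": True,
    "observable_outside_agent": True,
}
weak = dict(defensible, observable_outside_agent=False)

print(red_team(defensible))
print(red_team(weak))
```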

13

The live goal readout: what you watch while it runs


Once the goal is active, you don't drive — you observe (exactly the human's role from lesson 9). The readout below is what the goal's durable state looks like rendered as a dashboard. Flip between the three honest states a goal can be in — active, blocked, and done — and watch the numbers and the status pill change. Notice that "blocked" is a real, named state, not a failure to hide.

RHG nightly digest — goal state

source · WORKLOG.md (last verifier run)

Status: ACTIVE
Send time: 07:02 (target ≤ 07:00)
Broken links: 0 (target 0)
Clean nights: 1 / 3 (need 3 consecutive)
Last verifier: RED (timing miss)

Active: the verifier ran its 07:00 check, found the digest with a timestamp two minutes late (07:02), and reported red. The loop will change one thing tonight and re-run. You did nothing, and nothing needed you.

Three honest states, no fourth

Active means a safe, relevant next step remains — the loop keeps going. Blocked is only legitimate after the repeated external-blocker threshold is met and no meaningful progress remains, with the smallest next action recorded; difficulty or uncertainty alone is never "blocked." Done is marked only after the objective and the completion proof are satisfied at the real boundary — here, three consecutive clean nights with archived evidence. There is no "probably done" — a partial state stays Active with its next action written down.
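
The three states can be derived from a handful of facts recorded in the worklog. The sketch below is one assumed encoding: the field names are hypothetical, and `blocker_threshold` is passed in because the lesson does not state a number for the repeated-blocker threshold.

```python
from enum import Enum


class GoalState(Enum):
    ACTIVE = "active"
    BLOCKED = "blocked"
    DONE = "done"


def goal_state(*, clean_nights: int, required_nights: int, proof_archived: bool,
               blocker_hits: int, blocker_threshold: int,
               safe_next_step: bool) -> GoalState:
    """Classify a goal; anything short of full proof or a true blocker stays ACTIVE."""
    if clean_nights >= required_nights and proof_archived:
        return GoalState.DONE
    if blocker_hits >= blocker_threshold and not safe_next_step:
        return GoalState.BLOCKED
    return GoalState.ACTIVE


# The dashboard above: 1 of 3 clean nights, last run red, a next change available.
dashboard = goal_state(clean_nights=1, required_nights=3, proof_archived=False,
                       blocker_hits=0, blocker_threshold=2, safe_next_step=True)
# "Probably done": three clean nights but no archived evidence yet.
probably_done = goal_state(clean_nights=3, required_nights=3, proof_archived=False,
                           blocker_hits=0, blocker_threshold=2, safe_next_step=True)

print(dashboard.name, probably_done.name)
```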

14

A goal honestly blocked — with the exact evidence


The most honest thing a goal can do, short of finishing, is stop and say exactly why — at the real boundary, with evidence, and with the smallest next action. Below is the RHG digest goal written up the way a blocked run should report: a header, a timeline of what the verifier actually saw, a five-whys to the real cause, the proof that it is a genuine external blocker, and a checklist of what would unblock it. This is what "blocked" looks like when it is a state, not an excuse.

BLOCKED

The verifier can't reach the inbox to prove the send

Goal: RHG digest by 07:00
Detected: 07:00 · verifier run
Boundary: live IMAP inbox
State: blocked (external)
Verdict by: Validator agent

What the verifier saw, in order. Each entry ends with its tag: routine, warning, blocked (the wall it hit), or next action (the one move it could still make).

  1. 06:58

    Verifier starts on schedule

    The Proof Gate wakes up to fetch a real subscriber inbox and crawl the digest's links.

    routine
  2. 07:00

    IMAP login refused

    The mailbox password the verifier uses returns AUTHENTICATIONFAILED. It was rotated by IT last night.

    warning
  3. 07:01

    Cannot prove the outcome at the boundary

    With no inbox access, the verifier cannot confirm the digest arrived. It refuses to report green on a check it could not run.

    blocked
  4. 07:01

    Smallest next action recorded

    WORKLOG: "External blocker — rotated IMAP credential. Next action: a human re-grants the mailbox secret; then re-run the verifier." No scope was narrowed, no green faked.

    next action

Five whys, to the real cause — and why this is a true blocker, not just difficulty.

  1. Why is the goal not done?

    The verifier could not confirm the digest arrived.

  2. Why couldn't it confirm?

    It could not log into the inbox to look.

  3. Why couldn't it log in?

    The mailbox credential returns AUTHENTICATIONFAILED.

  4. Why does the credential fail?

    IT rotated the secret overnight and the new value was never shared with the run.

  5. Root · why this is a real blocker

    Only a human with vault access can re-grant the rotated secret — the agent cannot self-serve it, and there is no safe next step that makes progress without it. That is the bar for "blocked": an external dependency plus no meaningful work left. The loop did the honest thing — it did not weaken the check to a fakeable one just to show green.

What unblocks it. Check items off as they happen — the bar tracks progress. P1 is the human-only step that the loop is correctly waiting on.

0 of 3 done
  • P1
  • P2
  • P3

Blocked > a faked green

The cheap, dangerous alternative was for the verifier to "pass" because the send function ran — never checking the inbox at all. That would have marked the goal done while subscribers got nothing. Blocking with evidence is strictly safer: it surfaces a precise, human-actionable next step (re-grant the secret) and keeps the contract intact. This is also where an approval gate and the human's observability role meet — the loop pauses on a genuine user-only fork (vault access) and waits, exactly as lesson 9 described, rather than guessing or faking.
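
In code, the difference comes down to a verifier with three verdicts instead of two. The sketch below is illustrative: `BoundaryUnreachable` and `run_verifier` are hypothetical, and the inbox fetch is a stub that raises the way a refused login would.

```python
from typing import Callable, Dict


class BoundaryUnreachable(Exception):
    """The verifier could not reach the real boundary to run its check."""

    def __init__(self, evidence: str, next_action: str) -> None:
        super().__init__(evidence)
        self.evidence = evidence
        self.next_action = next_action


def run_verifier(fetch_inbox: Callable[[], Dict[str, object]]) -> Dict[str, str]:
    """Return GREEN, RED, or BLOCKED; never GREEN for a check that did not run."""
    try:
        inbox = fetch_inbox()
    except BoundaryUnreachable as exc:
        return {"verdict": "BLOCKED", "evidence": exc.evidence,
                "next_action": exc.next_action}
    ok = bool(inbox["digest_present"]) and inbox["dead_links"] == 0
    evidence = f"digest_present={inbox['digest_present']}, dead_links={inbox['dead_links']}"
    return {"verdict": "GREEN" if ok else "RED", "evidence": evidence, "next_action": ""}


def rotated_credential() -> Dict[str, object]:
    # Stub for the night the mailbox secret was rotated.
    raise BoundaryUnreachable(
        "IMAP login refused: AUTHENTICATIONFAILED",
        "a human re-grants the mailbox secret; then re-run the verifier",
    )


report = run_verifier(rotated_credential)
print(report["verdict"], "|", report["next_action"])
```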

15

In the code: the GOAL.md contract


None of this needs special tooling. The whole goal is a plain text file that any agent can read. Here is the RHG digest goal written as the durable contract — the eight parts from section 5, in the order ultragoal writes them. This is the file you'd commit and then walk away from.

GOAL.md — the durable, agent-agnostic contract
# GOAL — RHG nightly digest

outcome:   "The RHG digest lands in every subscriber inbox by 07:00, zero broken links."
baseline:  "Last week: median send 07:38; 2.1 dead links/issue."

verifier:   # the Proof Gate — runs at the REAL boundary, can FAIL
  - fetch a live subscriber inbox via IMAP at 07:00
  - assert digest present AND timestamp <= 07:00
  - crawl every href in the body; assert 0 non-200 responses
  - MUST have failed once on a known-bad fixture (seen red)
  - run by a DIFFERENT agent than the one that edits the pipeline

supporting: [ unsubscribe link works, image alt-text intact ]
anti_cheating: >-
  No weakening tests, narrowing scope, hiding failures,
  mocking the inbox, or moving 07:00 unless a human approves.
approval_gates: [ rotating prod secrets, emailing all subscribers a test ]
blocker_standard: "external dependency + smallest next action; doubt != blocked"
completion_proof: "3 consecutive clean nights, evidence archived in WORKLOG.md"

Read it for yourself

The full ultragoal discipline lives in the skill. To read the workflow, the goal-fit rule, the anti-cheating clause, and the red-team checklist:

# the ultragoal workflow, modes, and red-team checklist
cat ~/.claude/skills/ultragoal/SKILL.md
# where it ties into the loop + the Forge GOAL.md contract
grep -rn "Proof Gate\|GOAL.md\|anti-cheating\|red-team\|Validator" \
     ~/.claude/skills/ultragoal/ ~/.claude/skills/loop-engineering/

The three invariants the contract enforces: (1) the verifier runs at the real boundary and has been seen to fail; (2) "done" requires the completion proof, not a claim; (3) the activation path is GOAL.md + the loop on any CLI — a native goal API is optional. Note there is no field that names a specific model or agent: the contract is deliberately portable.
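
Because the contract is plain text, a draft can be checked for missing parts with nothing more than a regular expression. A minimal sketch, assuming the eight top-level keys used in the GOAL.md above; `missing_parts` is a hypothetical helper, and it checks only that each key is present, not that its value is any good.

```python
import re
from typing import List

REQUIRED_KEYS = (
    "outcome", "baseline", "verifier", "supporting",
    "anti_cheating", "approval_gates", "blocker_standard", "completion_proof",
)


def missing_parts(goal_md: str) -> List[str]:
    """Return the required top-level keys that the draft does not define."""
    present = set(re.findall(r"^([a-z_]+):", goal_md, flags=re.MULTILINE))
    return [key for key in REQUIRED_KEYS if key not in present]


DRAFT = """\
# GOAL: RHG nightly digest
outcome:   "The RHG digest lands in every subscriber inbox by 07:00."
baseline:  "Last week: median send 07:38; 2.1 dead links/issue."
supporting: [ unsubscribe link works, image alt-text intact ]
approval_gates: [ rotating prod secrets ]
"""

print(missing_parts(DRAFT))
```

A draft that comes back with `verifier` in the missing list is the wish from section 1 again: it names an outcome but gives nobody a way to check it.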

16

Worked example: one goal, design to done


Tie it together with the RHG digest goal, start to finish. Watch where the discipline acts and where you do not.

  1. Fit: the digest repeats nightly, fails loudly, and is verifiable at a real inbox → it is a goal, not a task. (The brand-colour request beside it stays a task.)
  2. Design (the loop): ultragoal fills the eight blanks — outcome ("inbox by 07:00, zero broken links"), baseline ("07:38, 2.1 dead links"), the Proof Gate (live IMAP fetch + link crawl), supporting checks, anti-cheating, approval gates, blocker standard, completion proof.
  3. Verifier first: before touching the pipeline, it writes the verifier and proves it goes red on a known-bad fixture. A check that has never failed is not trusted.
  4. Red-team (the gate): can success be faked? Is the outcome observable? Is it checkable outside the agent? All three survive — the inbox is an external boundary, the timing and links are observable, and the verifier can fail. The draft is allowed across the activation line.
  5. Activate: only now does it write/commit GOAL.md and enter the loop — on whatever CLI is running, no native API required. The Validator is a different agent than whoever edits the pipeline.
  6. Run (you observe): night one, the verifier reports red — digest arrived but at 07:02. The loop changes one thing (moves the slow render earlier), re-runs, and gets green. It does not edit the assertion to 07:05 — that would need your approval.
  7. Block, honestly: night two, the IMAP credential was rotated; the verifier can't reach the inbox, so it marks blocked with the exact evidence and the smallest next action (a human re-grants the secret). It does not fake a green.
  8. Done: after the secret is restored, three consecutive clean nights with archived evidence satisfy the completion proof. Only then is the goal marked done, and RESULT.md records the change, the boundary evidence, and the residual risks.

Count your actions: one — re-granting a rotated secret behind an approval gate. Everything else the goal did on its own, and every "done" was proven at the real inbox, never claimed.

The durable log of the run

# WORKLOG.md — RHG digest
n1 07:00 verify  RED  digest present, ts=07:02 (> 07:00) · evidence: imap+crawl ok
n1 07:05 change  move render step earlier (one change) · re-run queued
n1 07:55 verify  GREEN ts=06:57, 0 broken links · 1/3 clean nights
n2 07:00 verify  BLOCKED IMAP AUTHENTICATIONFAILED (rotated) · next: human re-grants secret
n2 09:30 human   approval-gate: mailbox secret re-granted
n3 07:00 verify  GREEN ts=06:58, 0 broken · 2/3
n4 07:00 verify  GREEN ts=06:59, 0 broken · 3/3 → completion proof met
n4 07:01 done    RESULT.md written · verifier left running as nightly guard

Note that every verify line is a real boundary result (an actual inbox fetch), the one human line sits behind an approval gate, and "done" appears only after 3/3 — never on a single lucky night. The agent's name is irrelevant to the trace; the contract is what's fixed.
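
The 3/3 count in the log can be recomputed from the file alone, which is what lets a fresh agent confirm "done" without trusting anyone's summary. The sketch below makes two assumptions inferred from the trace, not stated in the lesson: the last verify line of a night is that night's verdict, and a BLOCKED night neither adds to the streak nor resets it.

```python
import re

WORKLOG = """\
n1 07:00 verify  RED  digest present, ts=07:02 (> 07:00) · evidence: imap+crawl ok
n1 07:05 change  move render step earlier (one change) · re-run queued
n1 07:55 verify  GREEN ts=06:57, 0 broken links · 1/3 clean nights
n2 07:00 verify  BLOCKED IMAP AUTHENTICATIONFAILED (rotated) · next: human re-grants secret
n2 09:30 human   approval-gate: mailbox secret re-granted
n3 07:00 verify  GREEN ts=06:58, 0 broken · 2/3
n4 07:00 verify  GREEN ts=06:59, 0 broken · 3/3 → completion proof met
n4 07:01 done    RESULT.md written · verifier left running as nightly guard
"""

VERIFY_LINE = re.compile(r"^n(\d+)\s+\d{2}:\d{2}\s+verify\s+(RED|GREEN|BLOCKED)\b")


def clean_streak(worklog: str) -> int:
    """Count clean nights; RED resets the count, BLOCKED leaves it unchanged."""
    last_verdict = {}  # night number -> verdict of that night's last verify line
    for line in worklog.splitlines():
        match = VERIFY_LINE.match(line)
        if match:
            last_verdict[int(match.group(1))] = match.group(2)
    streak = 0
    for night in sorted(last_verdict):
        verdict = last_verdict[night]
        if verdict == "GREEN":
            streak += 1
        elif verdict == "RED":
            streak = 0
    return streak


REQUIRED_NIGHTS = 3
streak = clean_streak(WORKLOG)
print(f"{streak}/{REQUIRED_NIGHTS} clean nights, completion proof met: {streak >= REQUIRED_NIGHTS}")
```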

17

Quick check: did the model land?


Four quick questions.

What are the three load-bearing parts of a good goal?

How can you tell a real verifier from a decoration?

On any CLI, what is the universal way to activate a goal?

The send keeps landing at 07:02. Which move is allowed?

You are not done learning here — I am your teacher for this. Ask me to red-team a goal you actually want to run, to turn one of your own wishes into a GOAL.md contract with a verifier that can fail, or to show how this activates on a CLI you use (Codex, Kimi, Grok — the contract is the same). Next up: the Bright Data CLI — how the loop gets real web evidence instead of guessing, every single time.