The Forge starts with a goal — but a wish is not a goal. ultragoal is the discipline behind the /goal step: it turns a fuzzy intention into a contract the loop can actually pursue and that the world can actually check. A good goal has three parts and only three: an observable finish line, a verifier that can FAIL at the real boundary (the Proof Gate), and enough durable context to recover after an interruption. This lesson teaches you to design that contract, to red-team it before you switch it on, and to keep it agent-agnostic — the universal activation is a durable GOAL.md run under the loop on any CLI and any model. A native goal API is optional sugar, never a requirement.
"Make the nightly report better" is a wish. Nobody can tell when it is finished, and nobody can prove it ever was. A goal is the opposite: it names exactly what "done" looks like, it carries a test that can come back red, and it writes down enough that if the run is interrupted at 3 a.m. it can pick itself back up in the morning.
Our running example for the whole lesson is one real objective: the RHG nightly digest must land in every subscriber's inbox by 07:00 with zero broken links. Watch how that wish becomes a contract you could hand to any agent and walk away from.
Why does this matter so much? Because the loop you learned in Module 2 — LEARN → ANALYZE → EXECUTE one unit → VERIFY at the real boundary → DECIDE — only converges if there is a fixed thing to converge on. The goal is that fixed thing. If the finish line can drift, or the test can be quietly weakened, the loop will happily run forever and declare victory over nothing. ultragoal is how you make the finish line stay still and the test stay honest.
Think of it like… a building contract instead of "make me a nice house." The contract states the address, the number of rooms, the date, and — crucially — the inspection the builder must pass before they get paid. A wish lets the builder declare themselves done. A contract lets an inspector say "not yet." ultragoal writes the contract and hires the inspector. Where the analogy breaks: the inspector here must be someone other than the builder, and the inspection runs at the real boundary — the actual inbox at 07:00, not the builder's word that it works.
A task is one-shot: do it, it's done, you move on. A goal is durable: it persists across attempts, waits, recoveries, and long feedback cycles, and its success is established by an external signal rather than the agent's self-report. ultragoal is the goal-setting front-end to loop-engineering: ultragoal defines and activates; the loop pursues. Same Proof-Gate spine, never a claim or a mock.
A good goal = one observable finish line + a verifier that can fail + durable context to recover. Everything else in this lesson is an elaboration of those three clauses. If a draft is missing any one of them, it is not yet a goal — it is a wish wearing a goal's clothes.
The discipline is adapted, with permission, from jxnl/dots → agents/skills/ultragoal and rewritten for this harness: agent-agnostic activation, integrated with the loop-engineering loop (Proof Gate, the Forge GOAL.md, the Orchestrator/Executor/Validator crew from lesson 9).
Before any tooling, hold the shape in your head. A goal is built from a small number of named parts that fit together. The three load-bearing ones are circled below; the rest support them.
Not every piece of work deserves the full goal machinery. The first decision ultragoal makes is fit: should this be a durable goal, or an ordinary task? Reach for goal-mode only when most of these hold — progress needs repeated attempts, waiting, or recovery; success is measurable at a real boundary; the agent can react to the next failure without asking you; and the completion evidence is stronger than the agent saying "done". If the work is one-shot, taste-dependent, blocked on repeated human choices, or has no credible verifier — it is a task, not a goal.
Below: each candidate from the RHG world is judged on the same axes. Switch the lens to compare the rows, then read the per-candidate cards for the verdict and the reason.
| candidate \ axis | repeated / waits? | verifiable at a boundary? | recovers without you? |
|---|---|---|---|
| nightly digest by 07:00 | Yes — runs every night, can fail and retry | Yes — fetch the inbox, scan the links | Yes — a failed send just re-runs |
| pick the new brand colour | No — decided once | No — it is taste, not a test | No — needs your judgement |
| migrate the link checker to v2 | Yes — iterate until tests pass | Yes — the test suite is the gate | Yes — failures point to the next fix |
| approve the legal disclaimer | No — a single sign-off | No — only a human can say "approved" | No — blocked on a person |
candidate · nightly digest
✓ make it a goal
Digest in every inbox by 07:00, zero broken links. Repeats nightly, fails loudly, and a different agent can fetch the inbox to prove it.
Fits all four axes → durable goal.
candidate · brand colour
→ keep it a task
Choose the new accent colour. Decided once, by taste. There is no boundary that can say "this colour is correct."
No verifier, taste-dependent → ordinary task.
candidate · link-checker v2
✓ make it a goal
Migrate to link-checker v2 with the suite green. Many iterations, each failure shows the next fix, the test suite is the boundary.
Repeated + verifiable + self-recovering → goal.
candidate · legal sign-off
→ keep it a task
Approve the new disclaimer. One human decision. The agent cannot make it, and nothing it produces can prove approval.
Blocked on a person → ordinary task.
Make it a goal when progress needs repeated attempts, waiting, recovery, or long feedback; success is measurable by a test, benchmark, workflow, artifact inspection, screenshot, or readback at a real boundary; the agent can respond to the next failure without another preference decision; and completion evidence is stronger than the agent saying "done."
Keep it a task when the work is one-shot, taste-dependent, blocked on repeated human choices, lacks a credible verifier, or risks unbounded external action. The brand colour and the legal sign-off fail on the verifier and the human-dependence axes, so wrapping them in goal machinery would only add ceremony, not safety. Fit is the gate before everything else: get it wrong and a beautifully-specified "goal" still cannot be pursued autonomously.
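The fit decision can be sketched as a small check. This is a minimal illustration, not part of ultragoal itself: the `Candidate` class, the `fits_goal_mode` function, the reading of "most of these hold" as at least three of four, and the fourth-axis values for each row are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    needs_repeats_or_waits: bool      # progress needs repeated attempts, waiting, or recovery
    verifiable_at_boundary: bool      # success is measurable at a real boundary
    recovers_without_human: bool      # the agent can react to the next failure without asking
    evidence_beats_self_report: bool  # completion evidence is stronger than "done"


def fits_goal_mode(c: Candidate) -> bool:
    """Return True when the candidate should be a durable goal.

    Assumption: "most of these hold" means at least 3 of 4 axes, and a
    missing verifier disqualifies the candidate on its own.
    """
    if not c.verifiable_at_boundary:
        return False
    axes = [
        c.needs_repeats_or_waits,
        c.verifiable_at_boundary,
        c.recovers_without_human,
        c.evidence_beats_self_report,
    ]
    return sum(axes) >= 3


# The four RHG candidates; the last column is inferred for illustration.
candidates = [
    Candidate("nightly digest by 07:00", True, True, True, True),
    Candidate("pick the new brand colour", False, False, False, False),
    Candidate("migrate the link checker to v2", True, True, True, True),
    Candidate("approve the legal disclaimer", False, False, False, False),
]
verdicts = {c.name: ("goal" if fits_goal_mode(c) else "task") for c in candidates}
```

Treating the verifier axis as a hard requirement mirrors the text: without a credible verifier, the other axes cannot make the work a goal.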
ultragoal runs in one of three modes, and the difference is mostly about how far it goes. Design researches and hands back a goal packet — and stops; nothing is switched on. Critique takes a goal someone already drafted and sharpens it — tightening the verifier, closing the cheat paths. Activate does design + critique and then, as the final step, writes the contract and starts the loop. The golden rule: you only activate when the human explicitly asks to start, set, or run a goal — never from vague planning talk.
When the user explicitly invokes ultragoal for a concrete work objective and asks to build, complete, run, pursue, or "do it", treat it as Activate by default — do not stop after writing durable state or reporting a packet. After grounding and (when useful) writing GOAL.md, activate, then continue under Active Goal Discipline. Stay in Design only when the user asks to draft, design, critique, or discuss a goal without starting it.
Spawn child goals only when the user explicitly authorises goal-backed subagents. Each child gets one bounded objective and one verifier: the same Orchestrator/Executor/Validator discipline from lesson 9, where the Validator is never the agent that built the thing. Never spawn child goals on your own initiative.
Designing a goal means filling in eight specific blanks. Skip one and the loop has a hole to fall through. Here they are, each in one plain line, then drawn as the cycle they form.
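One way to keep the eight blanks honest is to model them as a record and refuse any draft with an empty field. The sketch below is an assumption-laden illustration: the field names are taken from the GOAL.md contract shown later in this lesson, while the `GoalContract` class and `missing_parts` helper are hypothetical names invented here.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class GoalContract:
    outcome: str
    baseline: str
    verifier: List[str]
    supporting: List[str]
    anti_cheating: str
    approval_gates: List[str]
    blocker_standard: str
    completion_proof: str


def missing_parts(goal: GoalContract) -> List[str]:
    """Return the names of any blanks that are still empty."""
    return [f.name for f in fields(goal) if not getattr(goal, f.name)]


# A draft that forgot its verifier: the loop would have a hole to fall through.
draft = GoalContract(
    outcome="Digest in every subscriber inbox by 07:00, zero broken links.",
    baseline="Last week: median send 07:38; 2.1 dead links/issue.",
    verifier=[],
    supporting=["unsubscribe link works"],
    anti_cheating="No weakening tests without human approval.",
    approval_gates=["rotating prod secrets"],
    blocker_standard="external dependency + smallest next action",
    completion_proof="3 consecutive clean nights",
)
```

Calling `missing_parts(draft)` names the empty `verifier` field, which is the signal to go back to design rather than activate.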
A goal that runs over many nights is really a sequence of phases, each with a goal, the tasks inside it, the risks, and — the part that matters most — an exit bar it must clear before the next phase begins. That exit bar is just the verifier applied at that step. Click a phase below to open its card; the bar across the top is the map. This is the RHG nightly-digest goal, broken into the phases ultragoal would actually write.
Phase 1. Goal: measure the real starting point before changing anything, so "better" has a number to beat.
Phase 2. Goal: write the Proof Gate before the fix, so success can never be self-declared. The verifier must be able to fail on tonight's real digest.
… href in the body; assert 0 non-200s
Phase 3. Goal: change the pipeline until the verifier turns green, with one meaningful change per loop, each measured.
Phase 4. Goal: demand the completion proof (enough consecutive clean nights to rule out luck) and write the durable record so a future run can recover.
RESULT.md: the change, the boundary evidence, residual risks
Phase 2 builds the Proof Gate before phase 3 touches the pipeline. This ordering is deliberate: if you fix first and verify later, you will be tempted to write a verifier that happens to pass the fix you already made, a verifier that has never seen red. Demanding a red run on a known-bad fixture is the cheapest insurance against a rubber-stamp check. An exit criterion is a measurable gate, not a feeling: "the digest looks fine" is not a gate; "an independent IMAP fetch at 07:00 found the digest with zero non-200 links" is.
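The "seen red" requirement can be made concrete with a small link check that is first run against a known-bad fixture. This is a sketch under stated assumptions: `HrefCollector`, `broken_links`, `fake_fetch`, and the example.com URLs are hypothetical, and the status lookup is injected so that the fixture run stays offline. In a real run `fetch_status` would issue actual HTTP requests at the boundary.

```python
from html.parser import HTMLParser
from typing import Callable, List


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML body."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_links(body_html: str, fetch_status: Callable[[str], int]) -> List[str]:
    """Return every link whose status is not 200."""
    collector = HrefCollector()
    collector.feed(body_html)
    return [url for url in collector.hrefs if fetch_status(url) != 200]


# Fixture-only status table (hypothetical URLs), used to prove the check can fail.
FIXTURE_STATUSES = {
    "https://example.com/ok": 200,
    "https://example.com/gone": 404,
}


def fake_fetch(url: str) -> int:
    return FIXTURE_STATUSES.get(url, 404)


GOOD_FIXTURE = '<p><a href="https://example.com/ok">ok</a></p>'
BAD_FIXTURE = (
    '<p><a href="https://example.com/ok">ok</a> '
    '<a href="https://example.com/gone">gone</a></p>'
)

seen_red = broken_links(BAD_FIXTURE, fake_fetch) != []    # the gate must go red here
seen_green = broken_links(GOOD_FIXTURE, fake_fetch) == []
```

The fixture exists only to show the check can come back red. It does not replace the nightly run against the real digest and real URLs.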
Here is the single idea the rest of the lesson protects. A verifier is only worth anything if there is a real input that makes it come back red. A check that returns green no matter what is not a verifier — it is a decoration. The trick most fakes share is that they never touch the real boundary: they ask the agent "did you send it?" instead of asking the inbox "did it arrive?"
Compare the two below. On the left, a hollow check that can only say "yes". On the right, a Proof Gate that reaches the real inbox and can say "no".
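The contrast can be sketched in a few lines. The names here (`InboxMessage`, `hollow_check`, `proof_gate`) are hypothetical, and the inbox is passed in as a plain list. In practice it would be the result of an independent fetch from the real mailbox.

```python
from dataclasses import dataclass
from datetime import time
from typing import List


@dataclass
class InboxMessage:
    subject: str
    received_at: time


def hollow_check(agent_says_sent: bool) -> bool:
    """Hollow: the only evidence is the agent's own claim."""
    return agent_says_sent


def proof_gate(inbox: List[InboxMessage], subject: str,
               deadline: time = time(7, 0)) -> bool:
    """Proof Gate: the digest must be in the inbox, on or before the deadline."""
    return any(
        m.subject == subject and m.received_at <= deadline
        for m in inbox
    )


empty_inbox: List[InboxMessage] = []
late_inbox = [InboxMessage("RHG digest", time(7, 2))]
on_time_inbox = [InboxMessage("RHG digest", time(6, 57))]
```

With an empty inbox, `hollow_check(True)` still reports success while `proof_gate(empty_inbox, "RHG digest")` reports failure, which is exactly the difference between asking the agent and asking the inbox.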
Time to drive it yourself. The panel on the left lets you build a verifier for the RHG digest goal — choose where it checks, whether it can fail on a bad fixture, and whether the agent grades its own work. The panel on the right assembles that verifier and then renders the verdict that matters: could a lazy or adversarial run fake "done" past this check? Try to build a green-looking verifier that is actually fakeable, then fix it.
A verifier becomes fakeable the moment any of these is true: (1) it checks a claim or a mock instead of the real boundary, so the agent's word is the evidence; (2) it has never seen red, so a green result is meaningless; (3) the builder grades itself, so bias toward "done" is unchecked; or (4) the run may narrow scope or move the benchmark without approval, so "done" can be redefined downward until it's trivially true. The right pane goes red the instant any one of these holds — because in a real run, any one of them is enough to fake completion.
A passing verifier must mean the real outcome: do not weaken tests, narrow scope, hide failures, swap in mocks, or move the benchmark without approval. The tuner is a toy, but the verdict logic is exactly the red-team you run on every real goal before activating it.
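The verdict logic described above can be written down directly. This is a minimal sketch: `VerifierDesign` and `fakeable_reasons` are hypothetical names, and the four flags simply encode the four conditions listed in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VerifierDesign:
    checks_real_boundary: bool   # False if it checks a claim or a mock
    has_seen_red: bool           # False if it has never failed on a bad fixture
    graded_by_other_agent: bool  # False if the builder grades itself
    scope_locked: bool           # False if scope or benchmark can move without approval


def fakeable_reasons(v: VerifierDesign) -> List[str]:
    """Return every reason a lazy or adversarial run could fake "done"."""
    reasons = []
    if not v.checks_real_boundary:
        reasons.append("checks a claim or a mock, not the real boundary")
    if not v.has_seen_red:
        reasons.append("has never seen red")
    if not v.graded_by_other_agent:
        reasons.append("the builder grades itself")
    if not v.scope_locked:
        reasons.append("scope or benchmark can move without approval")
    return reasons


def is_fakeable(v: VerifierDesign) -> bool:
    # Any single reason is enough to fake completion.
    return bool(fakeable_reasons(v))


solid = VerifierDesign(True, True, True, True)
self_graded = VerifierDesign(True, True, False, True)
```

Returning the reasons rather than a bare boolean keeps the red-team actionable: each entry names the specific hole to close before activation.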
Under schedule pressure, an agent (or a person) will reach for the easy win: make the test pass without making the thing true. ultragoal names the five moves that do this and forbids all of them without explicit human approval. Memorise the list — it is the same list whether you are red-teaming your own goal or reviewing someone else's.
Say the send keeps landing at 07:02. The lazy fix is to edit the verifier's assertion from ts <= 07:00 to ts <= 07:05. The digest now "passes" — but the real outcome (in every inbox by 07:00) is no longer what's being proven. The anti-cheating rule catches this because moving the benchmark requires approval: the loop must surface the proposed change to the human, not slip it in. The same logic forbids "narrow scope" (proving it only for a test account) and "swap in a mock" (asserting against a fake inbox that always has the digest).
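A hedged sketch of how a loop might refuse to move the benchmark silently. The names `APPROVED_DEADLINE`, `ApprovalRequired`, `set_deadline`, and `on_time` are hypothetical. The point is only that a changed deadline without a recorded human approval raises instead of passing.

```python
from datetime import time

APPROVED_DEADLINE = time(7, 0)  # the deadline written in the contract


class ApprovalRequired(Exception):
    """Raised when a run tries to move the benchmark on its own."""


def set_deadline(current: time, proposed: time, human_approved: bool = False) -> time:
    """Return the deadline to use; any unapproved change is refused."""
    if proposed != current and not human_approved:
        raise ApprovalRequired(
            f"moving the deadline from {current} to {proposed} needs human approval"
        )
    return proposed


def on_time(ts: time, deadline: time = APPROVED_DEADLINE) -> bool:
    """The assertion from the text: ts <= 07:00."""
    return ts <= deadline
```

Under this sketch a 07:02 send stays red, and the proposal to accept 07:05 surfaces as an exception the human must answer rather than a quiet edit to the assertion.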
A goal that lives only in an agent's head dies the moment that turn ends. The third load-bearing part of a good goal is durable context: enough written down that a fresh run — tomorrow morning, a different model, after a crash — can read the files and resume cold. The loop-engineering convention uses three files, each with a clear job.
Keep the active objective in GOAL.md short; when supporting context grows, put it in the nearest durable file. Prefer the project's existing conventions over inventing new files. Do not create files in Design mode unless asked, and always read existing goal files before editing — preserve dirty work, never clobber a half-finished WORKLOG. A partial or interim state recorded honestly in WORKLOG.md is valid carry-over; it is never a finalized "done."
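A minimal sketch of the "read before editing, never clobber" rule. It assumes the three files are GOAL.md, WORKLOG.md, and RESULT.md, the names used elsewhere in this lesson. The helpers `resume_context` and `append_worklog` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

GOAL_FILES = ("GOAL.md", "WORKLOG.md", "RESULT.md")  # assumed file set


def resume_context(goal_dir: Path) -> Dict[str, Optional[str]]:
    """Read whatever durable state exists so a fresh run can resume cold."""
    context: Dict[str, Optional[str]] = {}
    for name in GOAL_FILES:
        path = goal_dir / name
        context[name] = path.read_text(encoding="utf-8") if path.exists() else None
    return context


def append_worklog(path: Path, entry: str) -> None:
    """Append one entry; existing lines are never rewritten or truncated."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(entry.rstrip("\n") + "\n")
```

Opening the worklog in append mode is the design choice that matters here: a half-finished WORKLOG survives any number of later runs, and an interim state recorded in it stays visible as carry-over.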
This is the part people most often get wrong, so it is worth saying plainly. ultragoal is not tied to one agent. The universal way to activate a goal is to write a durable GOAL.md and run it under the loop — and that works on any CLI (claude, codex, kimi, grok, gemini, opencode, …) and any model, with no special tools required. Some agents also expose a native goal API — Codex's create_goal / update_goal is one example — which you may call as the activation step when it exists. It is optional sugar over the same GOAL.md discipline, never a requirement. "Activate" always means: write/commit GOAL.md and enter the loop, plus call the native API only if the agent happens to have one.
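The activation rule can be sketched as follows. Everything here is an assumption for illustration: `activate` is a hypothetical helper, and `native_api` stands in for whatever goal-setting call an agent may expose. No real signature is implied. Committing the file and entering the loop are left to the surrounding harness.

```python
from pathlib import Path
from typing import Callable, Optional


def activate(goal_text: str, goal_dir: Path,
             native_api: Optional[Callable[[str], None]] = None) -> Path:
    """Write GOAL.md, then call a native goal API only if one was supplied."""
    goal_path = goal_dir / "GOAL.md"
    if goal_path.exists():
        # Existing goal files must be read before editing, never overwritten blindly.
        raise FileExistsError(f"{goal_path} already exists; read it before editing")
    goal_path.write_text(goal_text, encoding="utf-8")
    if native_api is not None:
        native_api(goal_text)  # optional sugar on top of the same contract
    return goal_path
```

The file write happens on every path and the API call on at most one, which is the ordering the text describes: GOAL.md is the contract, the API is optional.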
Figure: … GOAL.md run under the loop. The native goal API (dashed) is one optional path some agents add on top, never the thing that makes it work.
… create_goal to set a goal": that's the misconception this section exists to kill. The contract is GOAL.md + the loop. The API is sugar.
Activation is the final action, and it has a gate in front of it: the red-team. Before any goal goes live you ask a short series of hard questions, and the very first "no" sends you back to redesign. You do not activate a goal you could not defend. Pick a draft below, then press Next to walk it through the gate one question at a time. Watch where a weak draft peels off to "back to design" and a defensible one reaches "activate".
The full checklist ultragoal runs before it crosses the activation line: (1) Can success be faked by weakening the verifier? (2) Could the words be satisfied while missing the user's real outcome? (3) Are the approval gates explicit? (4) Does the loop say what to do after a failed attempt or a wait? And the one this flow ends on: is completion observable outside the running agent? A "no" on observability is fatal — if only the agent can see that it's done, "done" is just a claim. Survive all of them and only then write/commit GOAL.md and enter the loop (plus the native API, if any).
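The gate can be sketched as an ordered list of checks where the first failure ends the walk. To keep the polarity uniform, each question is restated here as a property the draft must have. The key names and the helpers `first_failure` and `gate` are hypothetical.

```python
from typing import Dict, Optional

# The checklist, phrased so that True always means "defensible".
RED_TEAM_QUESTIONS = [
    "verifier_cannot_be_weakened",          # (1) success cannot be faked by weakening the verifier
    "wording_matches_real_outcome",         # (2) the words cannot be met while missing the outcome
    "approval_gates_explicit",              # (3) approval gates are explicit
    "failure_and_wait_handling_defined",    # (4) the loop says what to do after a failure or wait
    "completion_observable_outside_agent",  # completion is observable outside the running agent
]


def first_failure(answers: Dict[str, bool]) -> Optional[str]:
    """Return the first question the draft fails, or None if it survives all."""
    for question in RED_TEAM_QUESTIONS:
        if not answers.get(question, False):  # an unanswered question counts as a failure
            return question
    return None


def gate(answers: Dict[str, bool]) -> str:
    return "activate" if first_failure(answers) is None else "back to design"


defensible = {q: True for q in RED_TEAM_QUESTIONS}
weak = dict(defensible, completion_observable_outside_agent=False)
```

Treating a missing answer as a failure is a deliberate choice in this sketch: a question nobody asked is not a question the draft passed.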
Once the goal is active, you don't drive — you observe (exactly the human's role from lesson 9). The readout below is what the goal's durable state looks like rendered as a dashboard. Flip between the three honest states a goal can be in — active, blocked, and done — and watch the numbers and the status pill change. Notice that "blocked" is a real, named state, not a failure to hide.
RHG nightly digest — goal state
source · WORKLOG.md (last verifier run)
Active: the verifier ran at 07:00, found the digest but a 2-minute-late timestamp, and reported red. The loop will change one thing tonight and re-run. You did nothing — and nothing needed you.
Active means a safe, relevant next step remains — the loop keeps going. Blocked is only legitimate after the repeated external-blocker threshold is met and no meaningful progress remains, with the smallest next action recorded; difficulty or uncertainty alone is never "blocked." Done is marked only after the objective and the completion proof are satisfied at the real boundary — here, three consecutive clean nights with archived evidence. There is no "probably done" — a partial state stays Active with its next action written down.
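The three states can be expressed as a small decision function. This is a sketch: `GoalState` and `goal_state` are hypothetical names, and the blocker threshold is left as a required parameter because the text does not fix a number.

```python
from enum import Enum


class GoalState(Enum):
    ACTIVE = "active"
    BLOCKED = "blocked"
    DONE = "done"


def goal_state(*, completion_proof_met: bool, safe_next_step_remains: bool,
               external_blocker_count: int, blocker_threshold: int) -> GoalState:
    """Map the durable facts of a run onto one of the three honest states."""
    if completion_proof_met:
        return GoalState.DONE
    if not safe_next_step_remains and external_blocker_count >= blocker_threshold:
        return GoalState.BLOCKED
    # Difficulty, uncertainty, or a partial result all stay Active.
    return GoalState.ACTIVE
```

There is deliberately no fourth branch: anything short of the completion proof that still has a safe next step, or has not met the blocker threshold, falls through to Active.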
The most honest thing a goal can do, short of finishing, is stop and say exactly why — at the real boundary, with evidence, and with the smallest next action. Below is the RHG digest goal written up the way a blocked run should report: a header, a timeline of what the verifier actually saw, a five-whys to the real cause, the proof that it is a genuine external blocker, and a checklist of what would unblock it. This is what "blocked" looks like when it is a state, not an excuse.
What the verifier saw, in order. Olive is routine, clay is a warning, red is the wall it hit, green is the one fix it could still make.
routine · Verifier starts on schedule
The Proof Gate wakes up to fetch a real subscriber inbox and crawl the digest's links.
warning · IMAP login refused
The mailbox password the verifier uses returns AUTHENTICATIONFAILED. It was rotated by IT last night.
blocked · Cannot prove the outcome at the boundary
With no inbox access, the verifier cannot confirm the digest arrived. It refuses to report green on a check it could not run.
next action · Smallest next action recorded
WORKLOG: "External blocker: rotated IMAP credential. Next action: a human re-grants the mailbox secret; then re-run the verifier." No scope was narrowed, no green faked.
Five whys, to the real cause, and why this is a true blocker, not just difficulty.
Why is the goal not done?
The verifier could not confirm the digest arrived.
Why couldn't it confirm?
It could not log into the inbox to look.
Why couldn't it log in?
The mailbox credential returns AUTHENTICATIONFAILED.
Why does the credential fail?
IT rotated the secret overnight and the new value was never shared with the run.
Root · why this is a real blocker
Only a human with vault access can re-grant the rotated secret — the agent cannot self-serve it, and there is no safe next step that makes progress without it. That is the bar for "blocked": an external dependency plus no meaningful work left. The loop did the honest thing — it did not weaken the check to a fakeable one just to show green.
What unblocks it. Check items off as they happen — the bar tracks progress. P1 is the human-only step that the loop is correctly waiting on.
The cheap, dangerous alternative was for the verifier to "pass" because the send function ran — never checking the inbox at all. That would have marked the goal done while subscribers got nothing. Blocking with evidence is strictly safer: it surfaces a precise, human-actionable next step (re-grant the secret) and keeps the contract intact. This is also where an approval gate and the human's observability role meet — the loop pauses on a genuine user-only fork (vault access) and waits, exactly as lesson 9 described, rather than guessing or faking.
None of this needs special tooling. The whole goal is a plain text file that any agent can read. Here is the RHG digest goal written as the durable contract — the eight parts from section 5, in the order ultragoal writes them. This is the file you'd commit and then walk away from.
# GOAL - RHG nightly digest
outcome: "The RHG digest lands in every subscriber inbox by 07:00, zero broken links."
baseline: "Last week: median send 07:38; 2.1 dead links/issue."
verifier:   # the Proof Gate: runs at the REAL boundary, can FAIL
  - fetch a live subscriber inbox via IMAP at 07:00
  - assert digest present AND timestamp <= 07:00
  - crawl every href in the body; assert 0 non-200 responses
  - MUST have failed once on a known-bad fixture (seen red)
  - run by a DIFFERENT agent than the one that edits the pipeline
supporting: [ unsubscribe link works, image alt-text intact ]
anti_cheating: "No weakening tests, narrowing scope, hiding failures,"
               "mocking the inbox, or moving 07:00 - without human approval."
approval_gates: [ rotating prod secrets, emailing all subscribers a test ]
blocker_standard: "external dependency + smallest next action; doubt != blocked"
completion_proof: "3 consecutive clean nights, evidence archived in WORKLOG.md"
The full ultragoal discipline lives in the skill. To read the workflow, the goal-fit rule, the anti-cheating clause, and the red-team checklist:
# the ultragoal workflow, modes, and red-team checklist
cat ~/.claude/skills/ultragoal/SKILL.md
# where it ties into the loop + the Forge GOAL.md contract
grep -rn "Proof Gate\|GOAL.md\|anti-cheating\|red-team\|Validator" \
  ~/.claude/skills/ultragoal/ ~/.claude/skills/loop-engineering/
The three invariants the contract enforces: (1) the verifier runs at the real boundary and has been seen to fail; (2) "done" requires the completion proof, not a claim; (3) the activation path is GOAL.md + the loop on any CLI — a native goal API is optional. Note there is no field that names a specific model or agent: the contract is deliberately portable.
Tie it together with the RHG digest goal, start to finish. Watch where the discipline acts and where you do not.
… GOAL.md and enter the loop, on whatever CLI is running, no native API required. The Validator is a different agent than whoever edits the pipeline.
RESULT.md records the change, the boundary evidence, and the residual risks.
Count your actions: one, re-granting a rotated secret behind an approval gate. Everything else the goal did on its own, and every "done" was proven at the real inbox, never claimed.
# WORKLOG.md - RHG digest
n1 07:00 verify  RED      digest present, ts=07:02 (> 07:00) · evidence: imap+crawl ok
n1 07:05 change  move render step earlier (one change) · re-run queued
n1 07:55 verify  GREEN    ts=06:57, 0 broken links · 1/3 clean nights
n2 07:00 verify  BLOCKED  IMAP AUTHENTICATIONFAILED (rotated) · next: human re-grants secret
n2 09:30 human   approval-gate: mailbox secret re-granted
n3 07:00 verify  GREEN    ts=06:58, 0 broken · 2/3
n4 07:00 verify  GREEN    ts=06:59, 0 broken · 3/3 → completion proof met
n4 07:01 done    RESULT.md written · verifier left running as nightly guard
Note that every verify line is a real boundary result (an actual inbox fetch), the one human line sits behind an approval gate, and "done" appears only after 3/3 — never on a single lucky night. The agent's name is irrelevant to the trace; the contract is what's fixed.
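The "3/3" tally in the trace can be recomputed from the verify lines alone. The sketch below makes one assumption explicit: a BLOCKED night neither counts toward the streak nor resets it, which is what reproduces the 1/3, 2/3, 3/3 progression in the worklog. The helper names and the shortened sample log are hypothetical.

```python
from typing import List

SAMPLE_WORKLOG = """\
n1 07:00 verify RED digest present, ts=07:02
n1 07:05 change move render step earlier
n1 07:55 verify GREEN ts=06:57, 0 broken links
n2 07:00 verify BLOCKED IMAP AUTHENTICATIONFAILED
n2 09:30 human approval-gate: mailbox secret re-granted
n3 07:00 verify GREEN ts=06:58, 0 broken
n4 07:00 verify GREEN ts=06:59, 0 broken
"""


def verify_results(worklog: str) -> List[str]:
    """Extract the result word from every 'verify' line."""
    results = []
    for line in worklog.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[2] == "verify":
            results.append(parts[3])
    return results


def clean_streak(results: List[str]) -> int:
    """Count the current run of GREEN results; RED resets it."""
    streak = 0
    for result in results:
        if result == "GREEN":
            streak += 1
        elif result == "RED":
            streak = 0
        # BLOCKED: no evidence either way, so it is skipped (assumption).
    return streak


def completion_proof_met(worklog: str, required: int = 3) -> bool:
    return clean_streak(verify_results(worklog)) >= required
```

If a stricter reading of "consecutive" is wanted, resetting the streak on BLOCKED is a one-line change, and the sample trace would then need one more clean night.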
Four quick questions. Pick one answer in each — it grades on click, and tells you why.
What are the three load-bearing parts of a good goal?
How can you tell a real verifier from a decoration?
On any CLI, what is the universal way to activate a goal?
The send keeps landing at 07:02. Which move is allowed?
GOAL.md contract with a verifier that can fail, or to show how this activates on a CLI you use (Codex, Kimi, Grok — the contract is the same). Next up: the Bright Data CLI — how the loop gets real web evidence instead of guessing, every single time.