Chapter 4 · Measuring & Evolving the Harness · Lesson 4.4

The Self-Improving Loop

The win: the ratchet you turn by hand can be automated - the agent edits its own harness from what it learns each run.

The ratchet, by hand

You already know the ratchet: the agent fails at something, you add a durable fix - a rule, a hook, a reviewer - and that fix pays off on every run after it, not just the one that broke. Because each fix stays in place, the wins stack up instead of resetting: that stacking is the compound engineering loop - "the model does not get smarter, the harness does." So far every turn of that loop has needed a human hand on the lever. This lesson is about taking your hand off it.

Now let the agent turn it

Agentic Harness Engineering (AHE), also called the self-improving loop, is exactly that: a loop where the agent edits its own scaffolding - the system prompt, the tools, the memory, the middleware - straight from what it sees in its own runs. No human has to hand-write the fix each time. The point is to keep the harness in step with each new model release automatically, instead of someone re-editing the rules file every time a better model ships. Addy Osmani frames this as observability-driven evolution: the harness watches its real runs and tightens itself from the failures it sees (Agent Harness Engineering), and My Experiments With AI makes the same case - the ratchet, but automated.
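To make "the agent edits its own scaffolding" concrete, here is a minimal sketch of a harness treated as plain data that an edit can change. The `Harness` class, its four fields, and the `add_rule` helper are all hypothetical names invented for illustration; a real harness will store these parts in whatever form its runtime expects.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Harness:
    """Hypothetical container for the four editable parts named above."""
    system_prompt: str
    tools: tuple[str, ...] = ()
    memory: tuple[str, ...] = ()
    middleware: tuple[str, ...] = ()


def add_rule(harness: Harness, rule: str) -> Harness:
    """Return a new harness with one rule appended to the system prompt."""
    if rule in harness.system_prompt:
        return harness  # rule already present, nothing to change
    return replace(harness, system_prompt=harness.system_prompt + "\n" + rule)


base = Harness(system_prompt="You are a coding agent.", tools=("shell", "editor"))
edited = add_rule(base, "Run the test suite before declaring a task done.")
```

Keeping the harness immutable is a design choice of this sketch, not a requirement: every edit produces a new version, so the old one is still there to compare against or roll back to.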

Proof it works

This is not just a nice idea. Stanford's IRIS Lab paired a model with an automated harness-evolution system they call a "Meta-Harness" - one that rewrites its own scaffolding between runs - and measured it on a public coding benchmark:

"A model plus an automated harness-evolution system ('Meta-Harness') hit 76.4% on Terminal-Bench 2.0 - beating every hand-designed harness on the board." Stanford IRIS Lab, via My Experiments With AI

A harness that tuned itself outscored every harness people had designed by hand on that leaderboard. That is the loop paying off at the frontier, not just in theory.

Why the labs care: "the harness is the dataset"

There's a deeper reason this matters beyond your own project. Every run your harness records is a trajectory - a play-by-play of how a real task went. Whoever captures the best trajectories has the best raw material to train the next model on. Philipp Schmid puts it bluntly: the harness is the dataset, so the team that runs the strongest harness builds the strongest data flywheel (The Agent Harness). The self-improving loop is that flywheel turning on its own. Its productized form is Harness-as-a-Service - you build on someone's ready-made runtime, and they get to learn from every run that flows through it.
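If "trajectory" feels abstract, one way to picture it is as a small record per run, written out as one JSON line. This is only a sketch: the `Step` and `Trajectory` classes and their field names are assumptions made up for illustration, not a format that any harness or lab is known to use.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    action: str       # what the agent did, e.g. a tool call
    observation: str  # what came back


@dataclass
class Trajectory:
    task_id: str
    steps: list[Step] = field(default_factory=list)
    succeeded: bool = False


def to_jsonl_line(trajectory: Trajectory) -> str:
    """Serialize one run as a single JSON line, ready to append to a log."""
    return json.dumps(asdict(trajectory), sort_keys=True)


traj = Trajectory(task_id="fix-login-bug")
traj.steps.append(Step(action="run pytest", observation="2 failed"))
traj.steps.append(Step(action="edit auth.py", observation="2 passed"))
traj.succeeded = True
line = to_jsonl_line(traj)
```

A log of lines like this is the raw material the paragraph above describes: the play-by-play of each task plus whether it ended well.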

What it doesn't do

Be honest about the limits. Automation still needs a yardstick - your eval set from Lesson 4.2 - or it has no way to tell a good edit from a bad one. And it needs a guardrail, because a loop that changes its own harness can just as easily make things worse: an edit that fixes one task might quietly break three others. That guardrail is exactly what Lesson 4.5 is about. So the self-improving loop does not replace your judgement - it removes the repetitive part of turning the crank, and leaves the judgement calls to you.
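The "fixes one task, quietly breaks three others" case is easy to check for once you have per-task results from your eval set. Below is a minimal sketch of such a check. The function names, the task names, and the acceptance rule (at least one fix, zero regressions) are assumptions chosen for illustration; your own guardrail may reasonably be looser or stricter.

```python
from __future__ import annotations


def regressions(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Tasks that passed before the edit and fail (or are missing) after it."""
    return sorted(t for t, ok in before.items() if ok and not after.get(t, False))


def fixes(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Tasks that pass after the edit and did not pass before it."""
    return sorted(t for t, ok in after.items() if ok and not before.get(t, False))


def accept_edit(before: dict[str, bool], after: dict[str, bool]) -> bool:
    """Accept only if something got fixed and nothing that worked broke."""
    return bool(fixes(before, after)) and not regressions(before, after)


# Hypothetical eval results: the edit fixes one task but breaks three others.
before = {"parse-dates": False, "login": True, "export-csv": True, "search": True}
after = {"parse-dates": True, "login": False, "export-csv": False, "search": False}

broken = regressions(before, after)
verdict = accept_edit(before, after)
```

Here `broken` lists the three tasks that stopped passing and `verdict` is `False`, so the edit is rejected even though it did fix the task it was aimed at.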

The loop, one turn

run → observe failure → propose a harness edit → test against the eval set → keep it if it helped.
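The five steps above can be sketched as a single function. Treat this as a toy under stated assumptions: the harness is just a dict of rules, `run_task` and `propose_edit` are hypothetical stand-ins for a real agent run and a real model-written edit, and "helped" is read in the simplest way, as more eval tasks passing than before.

```python
from __future__ import annotations

from typing import Callable, Sequence

Harness = dict[str, list[str]]  # toy stand-in, e.g. {"rules": [...]}


def evaluate(harness: Harness, eval_set: Sequence[str],
             run_task: Callable[[Harness, str], bool]) -> dict[str, bool]:
    """Run every eval task against the harness and record pass/fail."""
    return {task: run_task(harness, task) for task in eval_set}


def one_turn(harness: Harness, eval_set: Sequence[str],
             run_task: Callable[[Harness, str], bool],
             propose_edit: Callable[[Harness, str], Harness]) -> Harness:
    baseline = evaluate(harness, eval_set, run_task)         # run
    failures = [t for t, ok in baseline.items() if not ok]   # observe failure
    if not failures:
        return harness                                       # nothing to fix
    candidate = propose_edit(harness, failures[0])           # propose an edit
    trial = evaluate(candidate, eval_set, run_task)          # test on eval set
    helped = sum(trial.values()) > sum(baseline.values())
    return candidate if helped else harness                  # keep if it helped


# Toy stand-ins: a task passes only if the harness has a rule for it.
def run_task(harness: Harness, task: str) -> bool:
    return f"handle {task}" in harness["rules"]


def propose_edit(harness: Harness, failed_task: str) -> Harness:
    return {"rules": harness["rules"] + [f"handle {failed_task}"]}


eval_set = ["timeouts", "flaky-tests"]
start = {"rules": ["handle timeouts"]}
improved = one_turn(start, eval_set, run_task, propose_edit)
```

In this toy run the loop sees that `flaky-tests` fails, proposes a rule for it, confirms the score went up, and keeps the edit. Notice that the original `start` harness is left untouched, so a rejected proposal costs nothing.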

Check yourself

The self-improving loop automates -

The agent edits its own scaffolding from execution feedback - the manual ratchet, run automatically. It does not retrain the model's weights, and it does not remove your judgement.

"The harness is the dataset" means -

Whoever captures the best execution trajectories builds the strongest data flywheel (Philipp Schmid). The self-improving loop is that flywheel, and Harness-as-a-Service is its productized form.

Automated harness evolution still needs -

It needs a yardstick (your eval set) and a guardrail so the loop can't make things worse - which is why it pairs with Lesson 4.5. It removes the repetitive part, not your judgement.

Do this now (5 min)

Name one fix you keep making by hand - the same correction you type over and over into your agent. Then sketch, in a couple of lines, how a hook plus your eval set could catch that failure and close it automatically the next time it happens - so you never type that correction again.

Go deeper

Primary source (read this): Addy Osmani - Agent Harness Engineering, on treating the harness as something that evolves from its own observed runs.

Wisdom (test it on people): the HumanLayer community - a good place to sanity-check whether an automated loop is really earning its keep, or just adding churn.
