← → or Space · F fullscreen
A 20-minute tour

Agentic Harness
Engineering

The model is the commodity. The harness is your edge.

From "prompt and hope" to a measured, ratcheted practice
The shift

The models have converged

0.2 pt
open DeepSeek V4-Pro vs closed Opus 4.6 on SWE-bench
~3 mo
open-weight now lags the frontier by months, not generations
11.9 → 5.4%
gap between the #1 and #10 coding model, in one year

So "which model?" stopped being the interesting question.

The core idea

Agent = Model + Harness

The harness is everything that isn't the model: the prompts, tools, memory, checks, and the loop that runs them. "If you're not the model, you're the harness."

Why it matters

Same model. Different harness. Wildly different results.

52.8 → 66.5%
rebuild only the harness, same model (LangChain)
77 vs 93%
one Opus, two harnesses, identical tasks
32×
cost swing for near-identical code

You can't buy good agentic coding with a bigger model. You engineer the harness.

Chapter 1

The Ratchet
& the practice loop

The core habit, and the moves that compound it.

Definition
ratch·et
noun · as used in this course

A harness discipline that only ever tightens. Each agent failure earns one permanent fix - a rule, a hook, or a reviewer - so the same mistake can never recur. Nothing is added speculatively; nothing is removed except when a better model makes it redundant.

Origin: the mechanical ratchet - a gear that turns one way and never slips back. Applied here to the harness, which compounds every lesson and never unlearns one.

The one habit

When the agent gets something wrong, don't re-prompt it - change the harness so it can never get that wrong again. The ratchet only ever tightens. "The model doesn't get smarter. The harness does."

Where a fix lives

Three homes for a fix

Rule

  • the agent just needs to know it
  • a line in CLAUDE.md

Hook

  • must be enforced every time
  • a script that runs automatically

Reviewer

  • needs judgement a script can't make
  • a second agent checks it
Lesson 1.2

CLAUDE.md is a pilot's checklist

Lesson 1.3

Hooks, not instructions

The prompt is where you steer. The harness is where you enforce. Prompt compliance ~70-90%. A hook fires 100%. Success is silent; failures are verbose.

Lesson 1.4

Spec before code

Spec frameworks

Three ways to bring SDD to your agent

Spec Kit

  • GitHub toolkit
  • heavier, phase-gated
  • specify CLI, 30+ agents

OpenSpec

  • lightweight framework
  • "the change" delta, brownfield
  • no Python, ~5-min setup

agent-skills

  • SDD as a drop-in skill
  • no separate CLI
  • loads into your harness

All three: the spec is the source of truth. (Spec Kit deep-dive is now Chapter 2.)

Lessons 1.5 - 1.6

Keep it sharp, then check it

Chapter 2

Spec-driven
development in depth

The spec frameworks, with Spec Kit as the main event.

Chapter 2 · spec-driven development in depth

Spec Kit: the deep dive

Chapter 3

Scaling & trusting
the harness

Running agents for real work, safely.

Chapter 3 in one slide

Five moves for real work

Chapter 4

Measuring & evolving

You can't improve what you don't measure.

Chapter 4 in one slide

Make the ratchet honest

Chapter 5

Multi-agent
& team harnesses

Scaling across agents and across people.

Chapter 5 in one slide

Across agents, across people

Chapter 6

Capstone:
build your harness

Assemble it, run it end to end, make it yours.

The throughline

Build → Spec-drive → Scale → Measure → Share → Make it yours

Keep the harness lean, earned, and disposable - and keep ratcheting. The model is the commodity; the harness is the edge.

Start Monday

Your 5-piece starter harness

Day 1: commit a near-empty rules file. Then ratchet from real failures.

Keep ratcheting

Thank you

1 / 1