Chapter 1 · The Ratchet & the Practice Loop · Lesson 1.5
Context Firewalls & Subagents
The win: keep the agent's context window clean so it stays sharp - and use subagents to do the dirty work without polluting the main thread.
The window, and why it rots
A model's context window is the limited amount of text it can "see" at once - your prompt, the files it has read, the output of every tool it ran, and the whole back-and-forth so far. Everything the agent reasons about has to fit inside that window. It is finite, and it fills up fast on a long session.
Here's the trap most newcomers miss: a fuller window does not mean a smarter agent. It means the opposite. As the window fills, the model reasons worse - a failure mode called context rot. The clearest symptom is the "lost in the middle" effect: information the model picks up reliably near the start or end of the window gets overlooked once it's buried in the middle of a crowded one.
So the target is not "cram in as much as possible". It's the reverse: keep window utilisation in the 40-60% range, not brimming - per My Experiments With AI. More context is not better. A lean window is a sharp agent.
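To make the 40-60% target concrete, here is a rough sketch of a utilisation check. Treat every number in it as an assumption: the 4-characters-per-token ratio is a crude heuristic rather than a real tokenizer, and the 200,000-token window is a placeholder you would swap for your model's actual limit. The function names are made up for illustration.

```python
# Rough sketch: estimate how full a context window is.
# ASSUMPTIONS (not from the lesson): ~4 characters per token is a crude
# heuristic for English text, and WINDOW_TOKENS is a placeholder limit.

CHARS_PER_TOKEN = 4        # crude heuristic, not a real tokenizer
WINDOW_TOKENS = 200_000    # hypothetical window size


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return len(text) // CHARS_PER_TOKEN


def utilisation(messages: list[str], window_tokens: int = WINDOW_TOKENS) -> float:
    """Return the estimated fraction of the window in use."""
    used = sum(estimate_tokens(m) for m in messages)
    return used / window_tokens


def status(fraction: float, low: float = 0.40, high: float = 0.60) -> str:
    """Classify utilisation against the 40-60% target band."""
    if fraction < low:
        return "headroom"
    if fraction <= high:
        return "in target band"
    return "compact now"
```

If your harness reports real token counts, use those instead of the character heuristic; the point is only to have a number to watch so you act before the window is brimming.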
Keeping the window clean
Three habits fight context rot. All three are about spending window space only where it earns its place:
- Compaction - summarise or offload older context before the window fills, not as a panic move at the end. Squash a long conversation down to its decisions and drop the raw transcript.
- Tool-call offloading - when a tool spits out a huge blob (a long log, a big file), keep just the head and tail in the window and write the rest to disk. The agent can re-read the file if it ever needs the middle.
- Progressive disclosure - load instructions and tools only when the task actually needs them, instead of dumping everything up front and letting it rot unread.
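The second habit, tool-call offloading, is the easiest of the three to picture in code. The sketch below is one possible shape, not how any particular harness does it: the function name, the 20-line default, and the wording of the marker line are all assumptions made for illustration.

```python
import tempfile


def offload_tool_output(output: str, keep_lines: int = 20) -> str:
    """Save the full output to disk; return only head, tail, and a pointer.

    Hypothetical helper: the name and the marker format are illustrative.
    """
    lines = output.splitlines()
    if len(lines) <= 2 * keep_lines:
        return output  # small enough to keep whole in the window

    # Write the complete blob to a temp file the agent can re-read later.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".log", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(output)
        path = handle.name

    omitted = len(lines) - 2 * keep_lines
    marker = f"[... {omitted} lines omitted; full output saved to {path} ...]"
    return "\n".join(lines[:keep_lines] + [marker] + lines[-keep_lines:])
```

A 1,000-line log passed through this with `keep_lines=5` comes back as 11 lines: the first five, one marker, and the last five. The marker carries the file path, so the middle is still reachable if the agent ever needs it.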
Subagents as context firewalls
The fourth move is the big one. A subagent is a child agent with its own fresh context window. You hand it a messy job - search a huge codebase, read a 5,000-line log, sift a pile of files - and it does all that reading in its own window. Then it returns just a short, condensed answer to the parent. The parent's window never sees the mess. That's why it's called a context firewall: the noise is contained on the far side of it.
The pattern is straightforward:
# WITHOUT a firewall: the parent reads everything itself
parent reads 40 files → window 80% full of raw code → rot sets in
# WITH a firewall: a subagent absorbs the mess
parent → subagent: "which files handle auth?"
subagent reads 40 files in ITS window
subagent returns: "auth lives in 3 files: a, b, c"
parent window grows by one short sentence, stays clean
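The same pattern as a toy Python sketch. No real model is involved: the `matches` predicate stands in for whatever reasoning the subagent would do, and `run_subagent`, the file names, and the summary format are all hypothetical. What it shows is the shape of the firewall: the raw file contents only ever live in the subagent's private list.

```python
from typing import Callable


def run_subagent(
    task: str, files: dict[str, str], matches: Callable[[str], bool]
) -> str:
    """Do the messy reading in a private context; return one short answer."""
    private_context: list[str] = [task]   # fresh window, discarded on return
    hits: list[str] = []
    for name, text in files.items():
        private_context.append(text)      # raw content stays on this side
        if matches(text):                 # stand-in for model reasoning
            hits.append(name)
    return f"{len(hits)} matching files: {', '.join(sorted(hits))}"


# Hypothetical toy codebase.
files = {
    "a.py": "def login(user): check_auth(user)",
    "b.py": "def render(page): ...",
    "c.py": "auth_token = issue_token()",
}

parent_context: list[str] = ["user: which files handle auth?"]
summary = run_subagent("which files handle auth?", files, lambda t: "auth" in t)
parent_context.append(summary)            # the parent gains one short line
```

After this runs, `parent_context` holds two entries: the question and the one-line summary. None of the file bodies ever reached it.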
But do not oversell this. More agents is not automatically better. Spinning up a swarm of subagents can actively hurt you: adding agents has been shown to degrade performance by 39-70% on sequential reasoning tasks - work that has to happen in order, where one step depends on the last - per My Experiments With AI, citing Google DeepMind. Reach for a subagent for two specific reasons: isolation (keep a messy job off the main window) and parallelism (run independent jobs at once). Not as a reflex, and not for a chain of steps that must stay in one head.
Simon Willison's subagents guide makes the same point: the value is a clean parent thread and parallel fan-out, not agent count for its own sake.
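A minimal sketch of the split between the two cases, assuming each subagent call can be treated as an ordinary function. `summarise_job` is a placeholder for a real subagent call, and the job names are invented. Independent jobs fan out in parallel; a chain where each step needs the previous result stays in one plain loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def summarise_job(job: str) -> str:
    """Placeholder for a subagent call: returns a one-line result."""
    return f"{job}: done"


def fan_out(jobs: list[str], max_workers: int = 4) -> list[str]:
    """Run independent jobs in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarise_job, jobs))


def run_chain(steps: list[Callable[[int], int]], value: int) -> int:
    """Sequential work: each step depends on the last, so keep it in one place."""
    for step in steps:
        value = step(value)
    return value


results = fan_out(["search logs", "scan tests", "list configs"])
```

Threads are used here only because waiting on a subagent is mostly waiting on I/O; the design point is the split itself, not the executor you pick.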
Why this sits under the ratchet
Everything from the last three lessons only works if the agent is actually paying attention to it. Your earned rules (1.2), your enforced hooks (1.3), your spec (1.4) - all of it lives in the context window. Let the window rot and those instructions get buried in the middle, ignored exactly when they matter. Clean context is what lets the rest of your harness be heard. It's the ground the whole ratchet stands on.
Check yourself
Context rot means models tend to -
As the window fills, reasoning degrades - with the "lost in the middle" effect, accuracy dropped by 30%+ once key info was buried in a crowded window. That's why the target is 40-60% utilisation, not full.
A subagent keeps the parent clean by -
A subagent does the messy reading in its own fresh window and passes back a short summary. The parent's window never absorbs the raw noise - that's the "context firewall".
Adding more subagents is -
More agents can degrade sequential reasoning by 39-70%. Use subagents for isolation and parallelism - not as a reflex, and not for step-by-step work that must stay in one head.
Do this now (3 min)
On your next long agent session, watch for the moment it goes vague - it starts repeating itself, contradicts an earlier answer, or forgets an instruction you gave near the start. That is context rot.
When you spot it, try one of these and notice the agent sharpen back up:
- Compact the conversation (many harnesses have a command for this - it summarises and drops the old transcript), or
- Start a fresh session with a short handoff summary: three or four lines saying what you're doing, what's decided, and what's next.
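If you want the handoff to be a habit rather than something you improvise, a tiny template helps. This is only a sketch: the function name and the three labels are assumptions, chosen to mirror "what you're doing, what's decided, and what's next".

```python
def handoff_summary(doing: str, decided: list[str], next_step: str) -> str:
    """Build a three-line handoff to paste at the top of a fresh session."""
    lines = [
        f"Doing: {doing}",
        f"Decided: {'; '.join(decided)}",
        f"Next: {next_step}",
    ]
    return "\n".join(lines)


# Hypothetical example values.
note = handoff_summary(
    doing="migrating the auth module to the new session API",
    decided=["keep the old token format", "no schema changes"],
    next_step="update the login tests",
)
```

Paste `note` as the first message of the new session. Three lines of decisions cost far less window space than the transcript that produced them.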
Go deeper
Primary source (read this): Simon Willison - Subagents. The clearest walkthrough of using child agents as context firewalls - and when not to.
Secondary: Chroma - Context Rot. The research behind "a full window reasons worse", with the lost-in-the-middle numbers.
Wisdom (test it on people): r/ClaudeAI - practitioners trading real tactics for keeping long sessions sharp.