Chapter 3 · Scaling & Trusting the Harness · Lesson 3.3

Filesystem, Git & Sandboxes

The win: three pieces of plumbing give the agent durable memory, a safety net to undo bad work, and a safe place to run code.

The filesystem: durable memory

The first piece of plumbing is the plainest one: the filesystem - the ordinary folders and files on disk that the agent can read and write. This is the agent's workspace and its long-term memory. It doesn't just edit your source files; it can also park information there. When a tool spits out a huge blob - a 5,000-line log, a giant search result - the agent keeps only the useful head and tail in its context window and writes the rest to a file, then re-reads that file later if it needs the middle. That move is tool-call offloading (from Lesson 1.5), and it is the reason a full session doesn't choke on its own output. Per Addy Osmani, treating the filesystem as memory - not just as the thing being edited - is a core part of the harness.
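The offloading move above can be sketched in a few lines. This is a minimal illustration, not how any particular harness does it: the function names, the `keep` parameter, and the `tool-output-NNNN.txt` file naming are all hypothetical, chosen only to show the shape of the idea (keep head and tail in context, write the full text to disk, re-read a slice on demand).

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def offload_tool_output(
    text: str, spill_dir: Path, keep: int = 20
) -> Tuple[str, Optional[Path]]:
    """Return (view, path). Short text comes back unchanged with path=None.
    Long text is saved in full to disk and the view keeps only head and tail."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    lines = text.splitlines()
    if len(lines) <= 2 * keep:
        return text, None  # small enough to stay in the context window
    spill_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: number files by how many already exist.
    index = len(list(spill_dir.glob("tool-output-*.txt")))
    path = spill_dir / f"tool-output-{index:04d}.txt"
    path.write_text(text, encoding="utf-8")
    omitted = len(lines) - 2 * keep
    marker = f"[... {omitted} lines omitted; full output saved to {path} ...]"
    return "\n".join(lines[:keep] + [marker] + lines[-keep:]), path


def read_offloaded(path: Path, start: int, end: int) -> str:
    """Re-read lines start..end (1-indexed, inclusive) from an offloaded file."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])


if __name__ == "__main__":
    log = "\n".join(f"line {i}" for i in range(1, 5001))
    view, saved = offload_tool_output(log, Path(tempfile.mkdtemp()), keep=3)
    print(view)                                # head, marker, tail
    print(read_offloaded(saved, 2500, 2502))   # fetch the middle only when needed
```

The marker line matters as much as the file: it tells the agent that something was cut and where to find it, so the middle of the log is one read away rather than lost.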

Git: your undo button

Git is a version-control tool: it records the history of your files so you can go back to any earlier point. For an agent, that history is the cheapest safety net you will ever install. The agent can commit checkpoints as it works (saving a labelled snapshot); you can read the diff (exactly what changed) to review the run; and if the run went sideways, one git reset --hard back to the last good checkpoint throws the later work away and puts tracked files back the way they were (untracked files, the ones git was never told about, need a git clean as well). Branches - parallel lines of history - let you try a risky idea off to the side without touching your main work.

Git is your undo button for an autonomous agent. (After Simon Willison, Using Git with coding agents.)

The point is that autonomy is only safe when it is reversible. An agent left to run on its own will eventually make a mess; git means that mess costs you one command to undo, not an afternoon of hand-repair.
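As a rough sketch of the checkpoint, diff, and undo moves in this section, here is how a harness might wrap the git command line. The helper names (`git`, `checkpoint`, `review`, `undo_to`) and the placeholder commit identity are assumptions made for illustration; the git subcommands themselves (`add -A`, `commit`, `rev-parse`, `diff`, `reset --hard`, `clean -fd`) are standard. It assumes git is installed and on the PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo`; raise if git reports an error."""
    result = subprocess.run(
        # Placeholder identity so commits work on a machine with no git config.
        ["git", "-c", "user.name=agent", "-c", "user.email=agent@example.invalid", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def checkpoint(repo: Path, label: str) -> str:
    """Stage everything, commit it as a labelled snapshot, return the commit hash."""
    git(repo, "add", "-A")
    git(repo, "commit", "--allow-empty", "-m", f"checkpoint: {label}")
    return git(repo, "rev-parse", "HEAD")


def review(repo: Path, since: str) -> str:
    """Diff of tracked files in the working tree against the commit `since`."""
    return git(repo, "diff", since)


def undo_to(repo: Path, commit: str) -> None:
    """Throw away everything after `commit`, including untracked files."""
    git(repo, "reset", "--hard", commit)  # restore tracked files
    git(repo, "clean", "-fd")             # delete untracked files and directories


if __name__ == "__main__":
    repo = Path(tempfile.mkdtemp())
    git(repo, "init")
    (repo / "app.txt").write_text("good\n", encoding="utf-8")
    good = checkpoint(repo, "known good")
    (repo / "app.txt").write_text("broken\n", encoding="utf-8")  # the agent's mess
    print(review(repo, good))
    undo_to(repo, good)
    print((repo / "app.txt").read_text(encoding="utf-8"))
```

Note that `undo_to` is deliberately destructive: it is the "one command" of this section, so it belongs on a branch or in a sandbox copy, never pointed at work you have not committed.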

The sandbox: a safe place to run

A sandbox is an isolated, throwaway execution environment - a walled-off copy of a working machine with good defaults already in place: language runtimes to run code, test tools to check it, often a headless browser (a browser with no visible window) to exercise a web page. Inside it, the agent can run commands, execute code, and run your tests without any risk to your real machine or files. If the agent runs something destructive, it wrecks the sandbox, not your laptop. And because a sandbox is cheap to spin up, you can run many at once. This is the containment side of the safety story from Lesson 3.2: don't just restrict what a tool can do - also control where it runs. Per Osmani, a good sandbox with sensible defaults is what lets you hand an agent real execution power without holding your breath.
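The lesson does not name a sandbox technology, so the following is only one possible shape, assuming Docker is the isolation layer. The function names, the resource limits, and the `python:3.12-slim` image are illustrative choices, not recommendations from the source; the `docker run` flags used (`--rm`, `--network`, `--memory`, `--pids-limit`, `-v`, `-w`) are standard ones.

```python
import subprocess
from pathlib import Path
from typing import List


def sandbox_command(
    workdir: Path, command: List[str], image: str = "python:3.12-slim"
) -> List[str]:
    """Build a `docker run` argv that executes `command` in a throwaway container."""
    return [
        "docker", "run",
        "--rm",                               # remove the container when it exits
        "--network", "none",                  # no network access from inside
        "--memory", "512m",                   # cap RAM (illustrative limit)
        "--pids-limit", "256",                # cap process count (illustrative limit)
        "-v", f"{workdir.resolve()}:/work",   # only this directory is visible inside
        "-w", "/work",                        # run the command from the mounted dir
        image,
        *command,
    ]


def run_in_sandbox(
    workdir: Path, command: List[str], timeout_s: int = 300
) -> subprocess.CompletedProcess:
    """Run `command` in the container and capture its output.
    The timeout stops waiting on the docker client; a stuck container
    may still need to be cleaned up separately."""
    return subprocess.run(
        sandbox_command(workdir, command),
        capture_output=True, text=True, timeout=timeout_s,
    )


if __name__ == "__main__":
    # Print the command rather than running it, so this works without Docker.
    print(" ".join(sandbox_command(Path("."), ["python", "-m", "unittest"])))
```

The mounted directory is writable, so `workdir` should itself be disposable: a fresh copy or a git checkout of the project, not your only copy. That keeps the "it wrecks the sandbox, not your laptop" guarantee honest.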

Why the three go together

None of these is exciting on its own. Together they are the substrate the ratchet loop runs on. The filesystem holds the state the agent works over. Git lets you undo any run you don't like. The sandbox lets the agent actually verify its work by running the tests - and lets subagents (Lesson 1.5) run in parallel, each in its own safe space. Take any one away and the loop gets fragile: no filesystem and it has no memory, no git and a bad run is permanent, no sandbox and every test run gambles your real machine.

The three-piece substrate

Filesystem = memory - the agent reads, writes, and offloads big context to disk so its window stays clean.
Git = undo - checkpoints, diffs, and git reset make every run reversible.
Sandbox = safe run - an isolated environment where code and tests run without risking your machine, many at once.

Check yourself

The filesystem gives an agent -

The filesystem is the agent's workspace and long-term memory: it reads and writes files, and offloads big tool output to disk so the window doesn't fill up. It is not a model, a reviewer, or a sandbox.

Git works as the agent's -

Git records history, so the agent can commit checkpoints and you can review the diff and git reset a bad run. It makes autonomy reversible - it does not upgrade the model or run tests.

A sandbox lets the agent -

A sandbox is an isolated environment with runtimes, test tools, and often a headless browser, where the agent runs code and tests without touching your real machine - and you can run many in parallel.

Do this now (2 min)

Before your next agent task, create a fresh git branch so a bad run is one command away from being undone:

git switch -c agent/try-thing

Now let the agent work on that branch and commit there. If the run goes well, keep it; if it doesn't, run git switch - to return to the branch you were on, then delete the agent's branch with git branch -D agent/try-thing - your main work never saw it. That single habit is the highest-leverage safety net in this whole chapter.

Go deeper

Primary source (read this): Simon Willison - Using Git with coding agents. The clearest guide to branches, checkpoints, and treating git reset as your undo button for autonomous runs.

Secondary: Addy Osmani - Agent Harness Engineering. Why the filesystem and a good sandbox are core parts of the harness, not afterthoughts.

Wisdom (test it on people): r/ClaudeAI - practitioners trading real sandbox and git-safety setups for day-to-day agent work.
