Chapter 3 · Scaling & Trusting the Harness · Lesson 3.1

Tools: Fewer and Sharper

A small set of focused tools beats a big overlapping pile - because the model has to hold the whole menu in its head on every turn.

What a "tool" is here

A tool is a capability you hand the agent so it can act on the world instead of only talking about it: read a file, run a command, search the web, call an API. On each turn the model looks at the list of tools available to it and picks which one to call. So the tool list is not free - it is a menu the model has to read and choose from, over and over, for the whole task. Addy Osmani's essay on agent harness engineering treats that menu as something you design, not something you pile up.

The core claim

The instinct is to give the agent every tool you can think of - more capability must mean a more capable agent. Osmani's claim is the opposite:

"Ten focused tools outperform fifty overlapping ones." - Addy Osmani, Agent Harness Engineering

Here is why. The model has to choose from the whole tool menu every single turn. When two or three tools do almost the same thing, it burns reasoning deciding between them and often picks the wrong one. This is the same problem as the bloated rules file from Lesson 1.2: every extra option competes for the model's limited attention. A padded tool list dilutes the tools that matter, just as a padded rules file dilutes the rules that matter.
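A back-of-the-envelope sketch of why the menu is not free. It assumes the full tool list is re-read on every turn and uses word count as a crude stand-in for the model's reading cost. The function name `menu_overhead` and the numbers (20-word descriptions, 30 turns) are illustrative assumptions, not figures from Osmani's essay.

```python
def menu_overhead(tool_descriptions, turns):
    """Crude estimate of words re-read across a whole task.

    Assumes the entire tool menu is read once per turn; word count is
    only a rough proxy for what the menu really costs the model.
    """
    words_per_turn = sum(len(d.split()) for d in tool_descriptions)
    return words_per_turn * turns


# Hypothetical menus: every description is padded to exactly 20 words.
description = " ".join(["word"] * 20)
focused_menu = [description] * 10
overlapping_menu = [description] * 50

print(menu_overhead(focused_menu, turns=30))      # 6000
print(menu_overhead(overlapping_menu, turns=30))  # 30000
```

This only counts the reading cost. It says nothing about the confusion caused by near-duplicate tools, which is the larger part of the argument.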

Work backwards from behaviour

The test for keeping a tool is the same one you used for rules: work backwards from behaviour. Each tool should exist to deliver a specific thing the model cannot do on its own - "search our internal docs", "run the test suite", "open a pull request". If you can name that behaviour, the tool has earned its slot. If you cannot, the tool is pure cost on every turn, and it should be cut.

The general-tool trick

There is a shortcut that avoids the pile-up entirely. Instead of pre-building fifty narrow tools for every little job, give the agent one general capability - bash, or code execution - and let it build the tool it needs on the fly. Need to count matching lines in a log? It writes the one-line command itself. Osmani's point is that one sharp general tool often beats many narrow ones: it keeps the menu tiny while covering far more ground.
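A minimal sketch of what one general tool might look like, assuming a Python harness and a POSIX shell. The name `run_shell` and the shape of its return value are hypothetical, not taken from Osmani's essay or from any particular agent framework.

```python
import subprocess


def run_shell(command, timeout=10.0):
    """Run one shell command and return its exit code, stdout and stderr."""
    try:
        completed = subprocess.run(
            command,
            shell=True,           # the agent supplies a full shell command line
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # Report the timeout to the agent instead of crashing the harness.
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


# The kind of one-liner the agent might write for itself: count the lines
# that contain ERROR. Assumes printf and grep are available.
result = run_shell("printf 'ok\\nERROR a\\nok\\nERROR b\\n' | grep -c ERROR")
print(result["stdout"].strip())  # 2
```

No `count_matching_lines` tool was ever added to the menu; the agent composed it from the general one. The trade-off is that `shell=True` runs whatever it is given, so a real harness would likely want a sandbox or an allowlist around a tool like this.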

Check yourself

A big overlapping tool set -

The model chooses from the whole menu every turn, so near-duplicate tools force wrong picks and burn reasoning - the same attention-budget problem as a bloated rules file.

A tool earns its place by -

Work backwards from behaviour: each tool should deliver something the model can't do alone. If you can't name that behaviour, the tool is pure cost - cut it.

Giving the agent bash lets it -

One sharp general tool lets the agent assemble the capability it needs on the fly, so you avoid pre-building fifty narrow tools that clutter the menu and slow every pick.

Do this now (5 min)

Open the tool or MCP list for the agent you use most, then:

  1. Count them. Note the number - a big list is a warning sign, not a badge.
  2. Find two to remove - each one either overlaps with another tool or is one you've never actually seen the agent use.
  3. Name the behaviour for each remaining tool - what does it let the agent do that it otherwise couldn't? If you can't say, it's a cut.

The goal isn't a tidy list for its own sake - it's a menu the model can read and choose from without wasting a turn.
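The three steps above can be roughed out as a small script, if you would rather audit a long list mechanically. Everything here is a hypothetical sketch: the `Tool` fields, the example tools, and the word-overlap heuristic (Jaccard similarity on description words, with an arbitrary 0.5 threshold) are assumptions for illustration, not a method from the source.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Tool:
    name: str
    description: str
    behaviour: Optional[str] = None  # what it lets the agent do that it otherwise couldn't
    times_used: int = 0              # how often you have seen the agent call it


def word_overlap(a, b):
    """Jaccard similarity between the word sets of two descriptions."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def audit(tools, overlap_threshold=0.5):
    """Step 1: count. Step 2: flag overlap and disuse. Step 3: flag unnamed behaviour."""
    overlapping = [
        (x.name, y.name)
        for x, y in combinations(tools, 2)
        if word_overlap(x.description, y.description) >= overlap_threshold
    ]
    return {
        "count": len(tools),
        "overlapping": overlapping,
        "no_named_behaviour": [t.name for t in tools if not t.behaviour],
        "never_used": [t.name for t in tools if t.times_used == 0],
    }


# Hypothetical tool list for illustration only.
EXAMPLE_TOOLS = [
    Tool("read_file", "read a file from disk", "read project files", 40),
    Tool("cat_file", "read a file from disk and print it", None, 0),
    Tool("run_tests", "run the test suite", "run the test suite", 12),
]

report = audit(EXAMPLE_TOOLS)
print(report["count"])               # 3
print(report["overlapping"])         # [('read_file', 'cat_file')]
print(report["no_named_behaviour"])  # ['cat_file']
print(report["never_used"])          # ['cat_file']
```

Word overlap is a blunt signal: it will miss tools that do the same job under different wording, so treat the output as a shortlist to review by hand rather than a verdict.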

I'm your teacher - ask freely. Paste your tool or MCP list and I'll go through it with you - which tools overlap, which you can't tie to a behaviour, and where a single general tool would replace a handful of narrow ones.

Go deeper

Primary source (read this): Addy Osmani - Agent Harness Engineering. The source of the "ten focused tools" claim and the build-a-tool-on-the-fly idea.

Wisdom (test it on people): the HumanLayer community is a good place to have your tool list pressure-tested for overlap and bloat.