# Agentic Harness Engineering Resources

## Knowledge

### Primary (the course spine)
- [Addy Osmani — "Agent Harness Engineering"](https://addyosmani.com/blog/agent-harness-engineering/)
  The clearest single essay on treating the harness as a first-class engineering artifact. Use for: the ratchet, work-backwards-from-behaviour, AGENTS.md discipline, hooks, HaaS.
- [Simon Willison — "Agentic Engineering Patterns" (guide)](https://simonwillison.net/guides/agentic-engineering-patterns/)
  Practical multi-chapter guide for Claude Code / Codex. Use for: how coding agents work, git workflows, subagents, TDD, agentic manual testing, annotated real prompts.
- [My Experiments With AI — "Agentic Engineering: Why the Harness"](https://myexperimentswithai.substack.com/p/agentic-engineering-why-the-harness)
  Six-principle framework (P1-P6) backed by benchmark numbers and failure taxonomies. Use for: spec-before-code, hooks-not-instructions, cross-agent review, the compound loop.

### Deep dives (surfaced by the primaries — reach for when a lesson needs the source)
- [Simon Willison — "Designing agentic loops"](https://simonwillison.net/2025/Sep/30/designing-agentic-loops/) — the loop + sandbox mental model.
- [Anthropic Engineering — "Harness design for long-running apps"](https://www.anthropic.com/engineering/harness-design-long-running-apps) — first-party harness design (every component encodes an assumption about what the model can't do).
- [LangChain — "The Anatomy of an Agent Harness"](https://blog.langchain.com/the-anatomy-of-an-agent-harness/) — Viv Trivedy; origin of "Agent = Model + Harness"; the 52.8%→66.5% Terminal-Bench jump.
- [HumanLayer — "Skill issue: harness engineering for coding agents"](https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents) — Dex Horthy; the "skill issue" reframe; <60-line AGENTS.md.
- [Birgitta Böckeler (Martin Fowler) — "Harness engineering" (user side)](https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html) — the user-facing discipline.
- [UC Berkeley MAST — Multi-Agent System failure Taxonomy](https://arxiv.org/pdf/2503.13657v2) — 1,600+ traces; where agents actually fail.
- [Chroma — "Context Rot" research](https://trychroma.com/research/context-rot) — empirical basis for context management.
- Addy Osmani — [Beyond Vibe Coding](https://beyond.addy.ie) (O'Reilly book) and [self-improving agents](https://addyosmani.com/blog/self-improving-agents/).

### Sprint Zero — model landscape (0.1)
- [MindStudio — "Open-Source Models Closing the Frontier Gap"](https://www.mindstudio.ai/blog/kimmy-k2-6-qwen-3-6-open-source-frontier-models) — plain-English tour of the 2026 open-vs-closed convergence with the key numbers.
- [MindStudio — "Best Open-Source LLMs for Agentic Coding 2026"](https://www.mindstudio.ai/blog/best-open-source-llms-agentic-coding-2026) — Epoch ~3-month gap; per-workload breakdown.
- [Can Demir — "Three Weeks, Four Chinese Coding Models"](https://medium.com/@candemir13/three-weeks-four-chinese-coding-models-whats-actually-real-and-what-s-overstated-4cb58199e83d) — the "narrowed, not closed" caveat.

### Sprint Zero — the toolkit trio (0.3)
- [GitHub Spec Kit](https://github.com/github/spec-kit) + [docs](https://github.github.com/spec-kit/) — the SDD **method**: constitution → specify → plan → tasks → implement. Agent-neutral, MIT.
- [OpenSpec](https://github.com/Fission-AI/OpenSpec/) — a lightweight, agent-neutral SDD **framework** (Fission AI). Unit of work = "the change" (a delta); propose → apply → archive; brownfield-first; `/opsx:*` slash commands, no Python, ~5-min setup.
- agent-skills — `spec-driven-development` **skill** (a gated SPECIFY→PLAN→TASKS→IMPLEMENT workflow) from the [Addy Osmani agent-skills plugin](https://addyosmani.com/blog/agent-harness-engineering/); layers onto a harness like Claude Code.

## Wisdom (Communities)
- [r/ChatGPTCoding](https://reddit.com/r/ChatGPTCoding) — cross-tool coding-agent technique; harness/scaffold debates. Use for: workflow critique.
- [r/ClaudeAI](https://reddit.com/r/ClaudeAI) — Claude Code-specific hooks, CLAUDE.md, subagent patterns. Use for: tool-specific troubleshooting.
- [HumanLayer community](https://www.humanlayer.dev/) — the crowd that coined much of this vocabulary; deepest harness-engineering discussion.
- Simon Willison's blog comments + [Latent Space / AI Engineer Discord](https://www.latent.space/) — practitioner debate at the frontier.

## Gaps
- No .NET-specific harness resource yet (all primaries are JS/Python-flavoured). Adapt examples to your stack; flag if a .NET-native source turns up.
- No source yet on measuring harness ROI rigorously (regression rate, cost-per-task) - needed for the "fewer regressions" success criterion.
