I Stopped Coding and Started Architecting Agents (And Why You Should Too)

by | Jun 25, 2026 | Augmented Coding

This post is available as a video on the Modern Software Engineering Channel

The focus of technical coaching is the minute-by-minute habits that software developers need to create good quality working software that lasts. These days all the teams I meet are asking for advice and help with AI tools. I’m teaching and coaching all the same code quality skills as before, and how to achieve them with agentic AI. It helps to think in terms of not only augmented coding, but also Harness Engineering. I’m going to explain what that is and how it helps ensure you get good results with your AI tools.

I usually talk about programming being made up of many ‘microskills’ – each one in itself isn’t a big deal, but if you gain several in the same area they start to support one another and build up to better overall outcomes. (I talked about this in a previous video about how to learn TDD). We’re still identifying all the detailed skills and behaviours that lead to effective use of Agentic AI, but I have some good leads. The overall macro behaviour that I want to see is working in small steps with plenty of opportunities to steer development, which leads to high quality code being committed frequently. An important part of achieving that is having a ‘harness’. Birgitta Böckeler has a great article about “Harness Engineering” and explains in a podcast that a good harness contains both Guides and Sensors. The harness’s job is to constrain the AI to come up with good quality code and tests.

You can go on the internet and grab someone else’s harness – a pile of skills, hooks and scripts – basically a load of markdown files and command line tools. It seems to me a much better idea to learn to make your own, particularly when you’re just starting out, and also so you can continually adapt it over time as your needs and your codebase evolve. Today I’m going to explain the ‘harness engineering’ that I’ve started to teach.

Agentic AI

All the developers I’ve been coaching over the past few years have AI tools in their IDEs. Most people are still using them as fancy autocomplete. Agentic AI is fundamentally different. Actually I personally turn off AI generated line completion in my IDE now. It’s a huge distraction. If I’m writing something by hand, I want to write it by hand, not be bombarded by random suggestions that flash into my field of view demanding attention, many of them entirely spurious. I’m not surprised a lot of developers think AI is a waste of space if that’s the only part of it they have tried. An agentic AI tool uses additional tools in a loop, so it can iterate towards something actually useful. The suggestions it comes up with will at least compile. Often it’s reasonably good code.

As a technical coach I have a focus on teamwork, good engineering, and code quality. I have high standards. Models these days might come up with reasonably good code, but what I find exciting is that you can set things up so code quality increases over time. You can set up a Harness Improvement Flywheel.

Harness Improvement Flywheel

A Harness can help constrain an AI agent to produce good quality code. It’s particularly useful in the typical situation of the teams I coach – they have legacy code that is important to the business but difficult to work with.

In the past I would come in and teach a team how to write good unit tests and improve design in small steps, and I still do that. In addition I’ve started to help teams set up an AI Harness as we go, so improvements compound. If you start with a lot of poorly designed code and tests, the agent will tend to copy the patterns it sees and create more poorly designed code and tests.

Instead, I’m aiming for a flywheel effect where a better harness leads to better code, which in turn leads to a better harness, so that both improve over time.

Getting started

A good starting point is unit tests. We pick a test that could be better designed, and improve it. It will probably take several rounds of prompting the AI before the design looks any good. We also do learning hours and general training in good test design, because usually the team doesn’t recognize good or bad unit test design, and can’t usefully instruct the AI. At some point we should have a design that I’m happy with, and the team understands why it’s better.

Once we have an example of the good design style we are looking for, we can improve the Harness. Add a unit test design Guide. This is a kind of knowledge document the agent uses. It could include things like ‘follow the Arrange – Act – Assert structure’.
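To make the Arrange – Act – Assert advice concrete, here is a minimal sketch of the kind of test a Guide might point to as a good example. It uses Python’s standard unittest module. The ShoppingCart class and its methods are hypothetical, invented purely for this illustration, and your own Guide would of course use an example from your own codebase.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        self._prices = []

    def add_item(self, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


class ShoppingCartTest(unittest.TestCase):
    def test_total_is_sum_of_item_prices(self):
        # Arrange: build the object and the data the test needs
        cart = ShoppingCart()
        cart.add_item(10)
        cart.add_item(5)

        # Act: perform the single behaviour being tested
        total = cart.total()

        # Assert: check the observable outcome
        self.assertEqual(15, total)
```

Each section does one job, so a reader (or an agent) can see at a glance what is being set up, what is being exercised and what is being checked.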

When you’re using the agent to design tests, you tell it to take this Guide into account. “Guide” is Birgitta Böckeler’s term for a feed-forward part of the agentic harness. Most agents these days support ‘skills’ which is a mechanism for the agent to include relevant guides into the LLM’s context when you do particular tasks. That’s useful, but if your agent is missing that feature you can achieve the same effect by including the relevant guide in your prompt. You actually get more control that way too.
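If your agent has no skills mechanism, including the guide in the prompt can be as simple as the sketch below. This is only an illustration: the function names, the prompt wording and the file path in the comment are my assumptions, not part of any particular agent’s API.

```python
from pathlib import Path


def load_guide(path: str) -> str:
    """Read a guide file; the location is whatever your team chooses."""
    return Path(path).read_text(encoding="utf-8")


def build_prompt(task: str, guide_text: str) -> str:
    """Put the guide ahead of the task so the model reads the advice first."""
    return (
        "Follow this guide when you write or change unit tests:\n\n"
        f"{guide_text.strip()}\n\n"
        f"Task: {task.strip()}\n"
    )


# Example use (hypothetical file name):
# prompt = build_prompt(
#     "Improve the design of the invoice tests",
#     load_guide("guides/unit-test-design.md"),
# )
```

Because you choose which guide goes into which prompt, nothing ends up in the context that you didn’t put there yourself.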

Guides give the LLM advice before it starts writing code, and during the agentic loop it can refer back to them as it revises and improves the structure. In some cases it’s better to instead use a Sensor, which is a feed-back part of the Harness in Birgitta Böckeler’s language.

A Sensor gives feedback after the agent writes some code. It will specify what needs to be improved. Often sensors are deterministic scripts that will, for example, check that the number of lines in a file doesn’t exceed a fixed limit, or find other code smells.
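A line-count Sensor of that kind could look something like this. It is a minimal sketch: the limit of 300 lines and the wording of the feedback message are assumptions you would tune for your own codebase.

```python
MAX_LINES = 300  # assumed limit; pick whatever suits your codebase


def find_long_files(paths, max_lines=MAX_LINES):
    """Return one feedback message for each file that exceeds max_lines."""
    messages = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            line_count = sum(1 for _ in f)
        if line_count > max_lines:
            messages.append(
                f"{path}: {line_count} lines, limit is {max_lines}. "
                "Split this file into smaller units."
            )
    return messages


def main(argv):
    """Check the files named on the command line and report any that are too long."""
    messages = find_long_files(argv[1:])
    for message in messages:
        print(message)
    # A non-zero result tells the caller (agent hook or build script) that the check failed
    return 1 if messages else 0
```

To run it as a command line tool, call sys.exit(main(sys.argv)) from the script’s entry point. The message says what to fix as well as what is wrong, because the agent reads it as its next instruction.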

Some agents have a ‘hooks’ feature that enables you to include Sensors at the right point in the agentic loop, but you can produce a similar effect by including them in your build script or linter.

Harness Flywheel

Those are two parts of the Harness I advise teams to build up for their codebase. Every time we succeed with a design task, we update our Guides and Sensors to encourage the agent to continue to follow our preferences. Over time, it gets easier to prompt the agent and have it do good design because of the harness, plus there will be more examples of good design in the codebase that it can copy. This is the ‘harness flywheel’ effect that we’re looking for – design getting better over time.

That is huge. In legacy code if the design is getting better over time then you’re winning. You can put off that big risky rewrite and start to enjoy working with the code again. It’s so much better when you can put more of your focus on understanding the users’ needs and the exciting new features you’re going to add, compared with spending all your time just trying to understand the existing design and fixing bugs.

Harness Engineering

Harness Engineering involves adding, updating and removing Guides and Sensors. What tends to happen is the harness only grows over time, as we add more and more advice. At the same time, the codebase improves so less advice is needed, and the models improve so they don’t need as strong a harness. We need a way to balance the process so we also remove stuff from the harness that’s not helping.

This recent article from Anthropic “A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all” outlines a process for evaluating a harness. They take previously completed tasks and do them again from scratch, with and without a harness. They compare the outcomes against the code that was originally accepted, to see what difference the harness made. This kind of evaluation gives useful advice but is hard to achieve in practice since it needs a pretty large sample to work with and quite a lot of time and resources.
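To give a feel for the shape of such an evaluation, here is a rough sketch. It is not the article’s method: I am assuming, purely for illustration, that plain textual similarity to the accepted code (via Python’s difflib) stands in for a real quality measure, and the dictionary keys are names I made up.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(candidate: str, accepted: str) -> float:
    """Crude textual similarity in [0, 1]; a stand-in for a real quality measure."""
    return SequenceMatcher(None, candidate, accepted).ratio()


def compare_runs(tasks):
    """Compare re-done tasks against the code that was originally accepted.

    tasks is a non-empty list of dicts holding three code strings each, under
    the (hypothetical) keys 'accepted', 'with_harness' and 'without_harness'.
    Returns the mean similarity to the accepted code for each variant.
    """
    with_scores = [similarity(t["with_harness"], t["accepted"]) for t in tasks]
    without_scores = [similarity(t["without_harness"], t["accepted"]) for t in tasks]
    return {
        "with_harness": mean(with_scores),
        "without_harness": mean(without_scores),
    }
```

With only a handful of tasks the two averages will be noisy, which is exactly why this approach needs a large sample before it tells you anything.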

For a team starting out with a new harness I think the most practical approach is to make it part of every task to also update the harness. Encourage people not to be afraid to remove stuff – big long guides can be expensive in tokens. 
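One way to see which guides are candidates for trimming is to list them by approximate size. The sketch below assumes your guides are markdown files in a single directory, and it uses a rough rule of thumb of four characters per token. That figure is an assumption, and real tokenizers vary by model and by content.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, an assumption; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN


def guide_sizes(guide_dir: str):
    """Return (filename, estimated_tokens) pairs for markdown guides, largest first."""
    sizes = [
        (path.name, estimate_tokens(path.read_text(encoding="utf-8")))
        for path in Path(guide_dir).glob("*.md")
    ]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)
```

The biggest guides at the top of the list are a sensible place to start when you ask what is still earning its place.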

You can also do ‘spot checks’ where you take a branch and try the same prompt twice, once with and once without a proposed harness update. I don’t think it’s practical to do this on every change though.

Fundamentally you need to own your harness and treat it as an important part of your codebase, like you would the shared build scripts or tests. This is why I don’t want teams to just download someone else’s harness and start using it. They won’t know what’s in it and will be afraid to change it. You can set up something for yourselves that matches your team’s preferences, that tailors the agent’s general idea of ‘good code’ to your situation.

Conclusions

In my work as a technical coach using the Samman method I meet a lot of teams with difficult code and poor or missing unit tests. Agentic AI gives me real hope that ordinary developers can create a harness engineering flywheel where the design of the code gets better over time, not worse.  It’s the same engineering principles for better code quality that I’ve been teaching and coaching these many years, but now with the help of AI tools a whole team can adopt this approach consistently on all changes.

Happy Coding!

Hi – I’m Emily!

I am a consultant with Bache Consulting and chair of the Samman Technical Coaching Society. As a technical coach I work with software development organizations who want to get better at technical practices like Test-Driven Development, Refactoring and Incremental Design. I also write books and publish videos. I live in Gothenburg, Sweden, although I am originally from the UK.

