This post is available as a video on the Modern Software Engineering Channel
The focus of technical coaching is the minute-by-minute habits that software developers need to create good quality working software that lasts. These days all the teams I meet are asking for advice and help with AI tools. I'm teaching and coaching all the same code quality skills as before, and how to achieve them with agentic AI. It helps to think in terms of not only augmented coding, but also Harness Engineering. I'm going to explain what that is and how it helps ensure you get good results with your AI tools.
I usually talk about programming being made up of many "microskills" – each one in itself isn't a big deal, but if you gain several in the same area they start to support one another and build up to better overall outcomes. (I talked about this in a previous video about how to learn TDD). We're still identifying all the detailed skills and behaviours that lead to effective use of Agentic AI, but I have some good leads. The overall macro behaviour that I want to see is working in small steps with plenty of opportunities to steer development, which leads to high quality code being committed frequently. An important part of achieving that is having a "harness". Birgitta Böckeler has a great article about "Harness Engineering" and explains in a podcast that a good harness contains both Guides and Sensors. The harness's job is to constrain the AI to come up with good quality code and tests.
You can go on the internet and grab someone else's harness – a pile of skills, hooks and scripts – basically a load of markdown files and command line tools. It seems to me a much better idea to learn to make your own, particularly when you're just starting out, and also so you can continually adapt it over time as your needs and your codebase evolve. Today I'm going to explain "harness engineering", which I've started to teach.
Agentic AI
All the developers I've been coaching over the past few years have AI tools in their IDEs. Most people are still using them as fancy autocomplete. Agentic AI is fundamentally different. Actually I personally turn off AI generated line completion in my IDE now. It's a huge distraction. If I'm writing something by hand, I want to write it by hand, not be bombarded by random suggestions that flash into my field of view demanding attention, many of them entirely spurious. I'm not surprised a lot of developers think AI is a waste of space if that's the only part of it they tried. Agentic AI uses additional tools in a loop so it can iterate towards something actually useful. The suggestions it comes up with will at least compile. Often it's reasonably good code.
As a technical coach I have a focus on teamwork, good engineering, and code quality. I have high standards. Models these days might come up with reasonably good code, but what I find exciting is that you can set things up so code quality increases over time. You can set up a Harness Improvement Flywheel.
Harness Improvement Flywheel
A Harness can help constrain an AI agent to produce good quality code. It's particularly useful in the typical situation of the teams I coach – they have legacy code that is important to the business but difficult to work with.
In the past I would come in and teach a team how to write good unit tests and improve design in small steps, and I still do that. In addition I've started to help teams set up an AI harness as we go, so improvements compound. If you start with a lot of poorly designed code and tests, the agent will tend to copy the patterns it sees and create more poorly designed code and tests.
Instead, I'm aiming for a flywheel effect where a better harness leads to better code, which in turn leads to a better harness, so both improve over time.
Getting started
A good starting point is unit tests. We pick a test that could be better designed, and improve it. It will probably take several rounds of prompting the AI before the design looks any good. We also do learning hours and general training in good test design, because usually the team doesn't recognize good or bad unit test design, and can't usefully instruct the AI. At some point we should have a design that I'm happy with, and the team understands why it's better.
Once we have an example of the good design style we are looking for, we can improve the Harness. Add a unit test design Guide. This is a kind of knowledge document the agent uses. It could include things like "follow the Arrange – Act – Assert structure".
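To make that concrete, here is a minimal sketch of a test that follows the Arrange – Act – Assert structure. The `ShoppingCart` class and its methods are invented for this illustration and don't come from any real codebase; the point is only the three clearly separated phases that such a Guide would ask the agent to reproduce.

```python
import unittest


class ShoppingCart:
    """Hypothetical class under test, invented for this illustration."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


class ShoppingCartTest(unittest.TestCase):
    def test_total_is_sum_of_added_prices(self):
        # Arrange: set up the object and the data the test needs
        cart = ShoppingCart()
        cart.add(5)
        cart.add(7)

        # Act: perform the single behaviour under test
        total = cart.total()

        # Assert: check the observable outcome
        self.assertEqual(12, total)
```

A test like this can also be pasted into the Guide itself as the reference example of the style the team agreed on.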
When you're using the agent to design tests, you tell it to take this Guide into account. "Guide" is Birgitta Böckeler's term for a feed-forward part of the agentic harness. Most agents these days support "skills", which is a mechanism for the agent to include relevant guides into the LLM's context when you do particular tasks. That's useful, but if your agent is missing that feature you can achieve the same effect by including the relevant guide in your prompt. You actually get more control that way too.
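Including a guide in the prompt by hand can be as simple as concatenating text. The sketch below assumes the guides have already been read from their markdown files into a dictionary; `build_prompt`, the section headings and the guide name are all hypothetical choices for illustration, not part of any agent's API.

```python
def build_prompt(task: str, guides: dict[str, str]) -> str:
    """Put the text of each selected guide ahead of the task description."""
    # Each guide becomes its own labelled section so the model can tell them apart.
    sections = [f"## Guide: {name}\n{text.strip()}" for name, text in guides.items()]
    # The task goes last, after all the advice it should take into account.
    sections.append(f"## Task\n{task.strip()}")
    return "\n\n".join(sections)


prompt = build_prompt(
    "Improve the design of the tests in the order module.",
    {"unit-test-design": "Follow the Arrange - Act - Assert structure."},
)
print(prompt)
```

Because you choose which guides go into the dictionary for each task, you decide exactly what advice the model sees.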
Guides give the LLM advice before it starts writing code, and during the agentic loop it can refer back to them as it revises and improves the structure. In some cases it's better to instead use a Sensor, which is a feed-back part of the Harness in Birgitta Böckeler's language.
A Sensor gives feedback after the agent writes some code, specifying what needs to be improved. Sensors are often deterministic scripts that, for example, check that the number of lines in a file doesn't exceed a fixed limit, or find other code smells.
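As an illustration, a file-length Sensor might look something like the sketch below. The limit of 300 lines, the function names and the wording of the feedback message are all assumptions made for this example; pick whatever suits your codebase.

```python
from pathlib import Path
from typing import Optional

MAX_LINES = 300  # assumed limit, chosen only for illustration


def check_length(name: str, source: str, max_lines: int = MAX_LINES) -> Optional[str]:
    """Return a feedback message if the source is too long, otherwise None."""
    line_count = len(source.splitlines())
    if line_count > max_lines:
        return (
            f"{name}: {line_count} lines exceeds the limit of {max_lines}; "
            "consider splitting it up."
        )
    return None


def check_files(paths: list[Path], max_lines: int = MAX_LINES) -> list[str]:
    """Run the length check over several files and collect the findings."""
    findings = []
    for path in paths:
        source = path.read_text(encoding="utf-8", errors="replace")
        message = check_length(str(path), source, max_lines)
        if message is not None:
            findings.append(message)
    return findings
```

The same input always gives the same findings, and each message tells the agent which file to fix and why. A small command line wrapper could print the findings and exit with a non-zero status when there are any.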
Some agents have a "hooks" feature that enables you to include Sensors at the right point in the agentic loop, but you can produce a similar effect by including them in your build script or linter.
Harness Flywheel
Those are two parts of the Harness I advise teams to build up for their codebase. Every time we succeed with a design task, we update our Guides and Sensors to encourage the agent to continue to follow our preferences. Over time, it gets easier to prompt the agent and have it do good design because of the harness, plus there will be more examples of good design in the codebase that it can copy. This is the "harness flywheel" effect that we're looking for – design getting better over time.
That is huge. In legacy code if the design is getting better over time then you're winning. You can put off that big risky rewrite and start to enjoy working with the code again. It's so much better when you can put more of your focus on understanding the users' needs and the exciting new features you're going to add, compared with spending all your time just trying to understand the existing design and fixing bugs.
Harness Engineering
Harness Engineering involves adding, updating and removing Guides and Sensors. What tends to happen is the harness only grows over time, as we add more and more advice. At the same time, the codebase improves so less advice is needed, and the models improve so they don't need as strong a harness. We need a way to balance the process so we also remove stuff from the harness that's not helping.
This recent article from Anthropic, "A good AGENTS.md is a model upgrade. A bad one is worse than no docs at all", outlines a process for evaluating a harness. They take previously completed tasks and do them again from scratch, with and without a harness. They compare the outcomes against the code that was originally accepted, to see what difference the harness made. This kind of evaluation gives useful advice but is hard to achieve in practice since it needs a pretty large sample to work with and quite a lot of time and resources.
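The comparison step could be sketched roughly as below. This is not the metric used in the article, which isn't described here; plain text similarity from Python's `difflib` is a crude stand-in, and the function names are hypothetical. It only shows the shape of the comparison: two agent outputs scored against the code that was originally accepted.

```python
from difflib import SequenceMatcher


def similarity(candidate: str, accepted: str) -> float:
    """Crude text similarity between 0.0 and 1.0, where 1.0 means identical."""
    return SequenceMatcher(None, candidate, accepted).ratio()


def compare_runs(accepted: str, with_harness: str, without_harness: str) -> dict[str, float]:
    """Score both agent outputs against the originally accepted code."""
    with_score = similarity(with_harness, accepted)
    without_score = similarity(without_harness, accepted)
    return {
        "with_harness": with_score,
        "without_harness": without_score,
        # Positive means the harness moved the output closer to the accepted code.
        "difference": with_score - without_score,
    }
```

A single pair of runs says very little, because agent output varies from run to run, which is part of why the full evaluation needs a large sample.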
For a team starting out with a new harness I think the most practical approach is to make it part of every task to also update the harness. Encourage people not to be afraid to remove stuff – big long guides can be expensive in tokens.
You can also do "spot checks" where you take a branch and try the same prompt twice, once with and once without a proposed harness update. I don't think it's practical to do this on every change though.
Fundamentally you need to own your harness and treat it as an important part of your codebase, like you would the shared build scripts or tests. This is why I don't want teams to just download someone else's harness and start using it. They won't know what's in it and will be afraid to change it. You can set up something for yourselves that matches your team's preferences, that tailors the agent's general idea of "good code" to your situation.
Conclusions
In my work as a technical coach using the Samman method I meet a lot of teams with difficult code and poor or missing unit tests. Agentic AI gives me real hope that ordinary developers can create a harness engineering flywheel where the design of the code gets better over time, not worse. It's the same engineering principles for better code quality that I've been teaching and coaching these many years, but now with the help of AI tools a whole team can adopt this approach consistently on all changes.
Happy Coding!



