Originally published on ProAgile’s blog

That was the question I posed on Twitter yesterday. I got a huge pile of answers, and some of them surprised me! This article is my way of thanking everyone for sharing their ideas.

You can see the whole picture in this pdf which you are free to download.

Of course I asked this question mostly because I was curious whether anyone else would say the same things as me. I primed the discussion with my personal list, asked people to ‘like’ any responses they agreed with, and invited them to reply with their own additions.

I was really pleased with how many people took the trouble to write something. Some of the suggestions were things I hadn’t thought of at all.

I went through all the replies and summarized the main ideas on post-it notes (I didn’t try to transcribe every one). Then I grouped them by area and added some colours and headlines. The picture above is what I came up with in the end. Eight areas emerged:

  • Human needs
  • Teamwork
  • Learning and professional development
  • Making progress and achieving a Flow state
  • Meaningful relationship with end users and customers
  • Building something important
  • Good code and coding environment
  • Shared coding rules and architecture

Some ideas would fit in more than one area of course, in which case I tried to put them near the boundary.

The main point that surprised me was the number of people who wanted peace and quiet – long stretches of uninterrupted time. I interpret that as a desire to reach the ’flow’ state that you get when you become so absorbed in what you’re doing that you are no longer conscious of time passing. It’s a wonderfully rewarding state to be in. I guess I never found silence to be a prerequisite for that; I get it in a pair or ensemble too.

The most popular ideas, the ones that got the most likes, were mostly in the ’human needs’ category. We’re all people, including software developers 🙂

I’m interested in this question not just out of idle curiosity. I regularly give talks at conferences and company events attended by lots of software developers. I want to understand the kinds of challenges developers have in their daily work and connect that to the things I can help them with. I don’t want to just cause additional boring meetings that take people away from fun coding!

Thanks to everyone who contributed to the discussion on Twitter. I think I learnt something from it, and I hope you like my summary.

By Emily Bache

There’s a frank discussion going on in the software industry at the moment about the words we use and the history behind them. Perhaps now is a good time to reconsider some of our terminology. For example, I’ve noticed we have several terms that describe essentially the same kind of testing:

  • Golden Master
  • Snapshot
  • Characterization
  • Approval

I think it’s time to completely drop the first one of these. In addition, if we could all agree on just one term it could make communication easier. My preferred choice is ‘Approval Testing’. As an industry, as a community of software professionals, can we agree to change the words we use?

What kind of testing are we referring to?

The common mechanism for ‘Golden Master’, ‘Snapshot’, ‘Characterization’ and ‘Approval’ testing is that you run the software, gather the output and store it. The combination of (a) exactly how you set up and ran the software and (b) the stored output forms the basis of a test case.

When you subsequently run the software with the same set up, you again gather the output. You then compare it against the version you previously stored in the test case. Any difference fails the test.
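
To make the mechanism concrete, here is a minimal sketch in Python of what such a ‘verify’ step could do. It isn’t the API of any particular framework; the file naming convention and the helper function are purely illustrative.

```python
# A minimal sketch of the store-and-compare mechanism, independent of any
# particular framework. The *.approved.txt / *.received.txt naming is only
# illustrative.
from pathlib import Path


def verify(test_name: str, received: str) -> None:
    approved_file = Path(f"{test_name}.approved.txt")
    received_file = Path(f"{test_name}.received.txt")

    if approved_file.exists() and approved_file.read_text() == received:
        # The output matches the previously approved version: the test passes.
        received_file.unlink(missing_ok=True)
        return

    # Otherwise, keep what we received so a human can inspect the difference
    # and, if the new behaviour is correct, approve it by replacing the
    # approved file with the received one.
    received_file.write_text(received)
    raise AssertionError(f"{received_file} differs from {approved_file}")
```

Approving a new version of the output then just means replacing the approved file with the received one; the real frameworks make that step convenient.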

There are a number of testing frameworks that support this style of testing. Some open source examples:

  • Approvals
  • TextTest

 Full disclosure: I am a contributor to both Approvals and TextTest.

Reasons for choosing the term ‘Approval Testing’

Test cases are designed by people. You decide how to run the software and what output is good enough to store and compare against later. That step where you ‘approve’ the output is crucial to the success of the test case later on. If you make a poor judgement the test might not contain all the essential aspects you want to check for, or it might contain irrelevant details. In the former situation, it might continue to pass even when the software is broken. In the latter situation, the test might fail frequently for no good reason, causing you to mistrust or even ignore it. 
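
One practical consequence of that judgement, sketched below with an example of my own: if the output contains non-deterministic details like timestamps, you can scrub them before approving or comparing, so the test only fails for differences that actually matter.

```python
# A tiny, illustrative scrubber: replace ISO-style timestamps with a fixed
# placeholder before storing or comparing the output, so runs on different
# days still match the approved version.
import re


def scrub_timestamps(output: str) -> str:
    return re.sub(
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}",  # e.g. 2020-07-01T09:30:00
        "<timestamp>",
        output,
    )
```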

I like to describe this style of testing with a term that puts human design decisions front and center.

Comments on the alternative terms

Snapshot

This term draws your attention to the fact that the output you have gathered and stored for later comparison in the test is transient. It’s correct today, but it may not be correct tomorrow. That’s pretty agile – we expect the behaviour of our system to change and we want our tests to be able to keep up. 

The problem with this term is that it doesn’t imply any duty of care towards the contents of the snapshot. If a test fails unexpectedly I might just assume nothing is wrong – my snapshot is simply out of date. I can replace it with the newer one. After all, I expect a snapshot to change frequently. Did I just miss finding a bug though?

I prefer to use a word that emphasizes the human judgement involved in deciding what to keep in that snapshot.

Characterization

This is a better term because it draws your attention to the content of the output you store: that it should characterize the program behaviour. You want to ensure that all the essential aspects are included, so your test will check for them. This is clearly an important part of designing the test case. 

On the other hand, this term primarily describes tests written after the system is already working and finished. It doesn’t invite you to consider what the system should do or what you or others would like it to do. Approval testing is a much more iterative process where you approve what’s good enough today and expect to approve something better in the future.

Golden Master

This term comes from the record industry where the original audio for a song or album was stored on a golden disk in a special archive. All the copies in the shops were derived from it. The term implies that once you’ve decided on the correct output, and stored it in a test, it should never change. It’s so precious we should store it in a special ‘golden’ archive. It has been likened to ‘pouring concrete on your software’. That is the complete opposite of agile!

In my experience, what is correct program behaviour today will not necessarily be correct program behaviour tomorrow, and we need to update our understanding and our tests. We need to be able to ‘approve’ a new version of the output and see that as a normal part of our work.

This seems to me to be a strong enough argument for dropping the term ‘Golden Master’. If you’ve been following the recent announcement from GitHub around renaming the default branch to ‘main’, you’ll also be aware there are further objections to the term ‘master’. I would like to be able to communicate with all kinds of people in a respectful and friendly manner. If a particular word is problematic and a good alternative exists, I think it’s a good idea to switch.

In conclusion

Our job is literally about writing words in code and imbuing them with meaning. Using the same words to describe the same thing helps everyone to communicate better. Will you please join me in using the words ‘Approval Testing’ as an umbrella term referring to a particular style of testing? Words matter. We should choose them carefully. 

A book for managers, about architecture

Ruth Malan re-stated Conway’s law like this: “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins”. As a technical leader, that really caught my attention. The sphere of things I should be able to influence in order to do my job well just grew! 

Team Topologies book on my desk with some post-its around it

I read that quote in the book “Team Topologies” by Matthew Skelton and Manuel Pais. I think it’s a really interesting book, and it introduces several ideas that were new to me, at least. They deserve a wide audience within the software industry. In this article I’ll give you some highlights and hopefully tempt you to read the book for yourself.

This first idea perhaps isn’t so novel, but it is the basis for the rest, so worth mentioning first. You should use the team as the fundamental building block for your organization. Rather than thinking about hierarchies or departments that get re-organized from time to time, think about building close-knit, long-lived teams. Adopt a team structure that is aligned to the business and adapts and evolves over time. 

Conway’s law

When designing a team structure for your organization you should take into account Conway’s law. Your team structure and your software architecture are strongly linked. You should design team responsibilities and communication pathways. Skelton and Pais suggest there are four fundamental team topologies, i.e. kinds of team. The main one is ‘business stream-aligned’; the other three have supporting roles. The way they describe a stream-aligned team seems familiar to me. It’s like a DevOps team or a feature team. I think the name is better though. It’s a team that’s aligned to a single, valuable stream of work. They do everything needed to deliver valuable software to customers, gather feedback from how it’s used, and improve the way they work.

The other three kinds of team exist to reduce the burden on those stream-aligned teams. Skelton and Pais introduce the idea that good organization design requires restricting the amount of ‘cognitive load’ each team is expected to handle. Software systems can become large and complex and can require an overwhelming amount of detailed knowledge to work on them. This really resonated with me. I’ve worked on teams where the amount of software we had to look after was just too much to cope with. Important work was delayed or missed because we were overburdened.

The other three types of team are: 

  • Platform 
  • Complicated Subsystem
  • Enabling

There is a fuller description of each in the book of course, but to summarize – platform teams provide a self-service tool, API or software service. Enabling teams support other teams with specialized knowledge in particular techniques and business domains. Complicated subsystem teams look after a particular component of the software that needs specialized knowledge, like an optimizer or a machine learning algorithm. 

Following on from Conway’s law in particular is the idea that you should have only three ‘interaction modes’ between teams. Restrict the communication pathways to get the architecture you want, and to avoid unnecessary team cognitive load. Skelton and Pais suggest teams should work together in one of three ways:

  • Collaborate
  • Facilitate
  • Provide X-as-a-Service

Collaboration means two teams work closely together, have a common goal, and only need to work this closely for a limited time. (Otherwise the teams would merge!) Facilitation is about one team clearing impediments that are holding back the other team. X-as-a-Service is a much looser interaction, where one team provides something like a software library or API to another team. The way teams interact with one another will evolve over time, and consequently your organization will also evolve.

I thought it was a good sign that I could imagine how my own work would fit into an organization designed this way. I think I would fit well into an Enabling team that would support stream-aligned teams through facilitation. We would work with several teams over time. My particular role is to facilitate a team in clearing impediments around technical debt and code quality, and in learning skills like Test-Driven Development.

Team Topologies really does make organizational design feel like you’re doing architecture. Skelton and Pais have pretty diagrams throughout their book with colours and patterns and lines and boxes describing various organizational designs. It’s all very attractive to a software developer like me. I think the intended audience is managers though. People who are designing organizations today. I really hope some of them read this book and are inspired to involve technical people in important organizational decisions.

By Emily Bache

Or: is Given-When-Then Compulsory?

In BDD you discover what software you should build through a collaborative process involving both software developers and business people. BDD also involves a lot of test automation and tools like Cucumber and SpecFlow. But what would happen if you used an Approval testing tool instead? Would that still be BDD?

Double-loop TDD diagram. Failing scenario -> passing scenario -> refactor and inner loop red->green->refactor

Figure 4 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

I’m a big fan of Behaviour Driven Development. I think it’s an excellent way for teams to gain a good understanding of what the end-user wants and how they will use the software. I like the emphasis on whole team collaboration and building shared understanding through examples. These examples can be turned into executable scenarios, also known as acceptance tests. They then become ‘living documentation’ that stays in sync with the system and helps everyone to collaborate over the lifetime of the software. 

I wrote an article about Double-Loop TDD a while back, and I was thinking about BDD again recently in the context of Approval testing. Are they compatible? The usual tools for automating scenarios as tests are SpecFlow and Cucumber, which both use the Gherkin syntax. Test cases comprise ‘Given-When-Then’ steps written in natural language and backed up by automation code. My question is – could you use an Approval testing tool instead?

I recently read a couple of books by Nagy and Rose. They are about BDD and specifically how to discover good examples and then formulate them into test cases. I thought the books did a good job of clearly explaining these aspects in a way that made them accessible to everyone, not just programmers. 

Nagy and Rose are planning a third book in the series which will be more technical and go into more detail on how to implement the automation. They say that you can use other test frameworks, but in their books they deal exclusively with the Gherkin format and Cucumber family of tools. What would happen if you used an Approval testing tool? Would it still be BDD or would we be doing something else? Let’s go into a little more detail about the key aspects of BDD: discovery, formulation, and automation.

Discovery

The discovery part of BDD is all about developers talking with business stakeholders about what software to build. Through a structured conversation you identify rules, examples, and unanswered questions. You can use an ‘example mapping’ workshop for that discussion, as outlined in this blog post by Cucumber co-founder Matt Wynne.

Formulation

The formulation part of BDD is about turning those rules and examples of system behaviour into descriptive scenarios. Each scenario is made as intelligible as possible for business people, consistent with the other scenarios, and unambiguous about system behaviour. There’s a lot of skill involved in doing this!

Automation

The automation part of BDD is where you turn formulated scenarios into executable test cases. Even though the automation is done in a programming language, the focus is still on collaboration with the business stakeholders. Everyone is expected to be able to read and understand these executable scenarios even if they can’t read a programming language.  
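
As a small illustration of what that glue can look like (assuming the Python ‘behave’ framework and a hypothetical Lift class; neither is prescribed by BDD itself), each Given-When-Then step is bound to ordinary code:

```python
# Step definitions backing a Gherkin scenario such as:
#
#   Scenario: Lift answers a request
#     Given a lift on floor 0
#     When a passenger requests floor 2
#     Then the lift is on floor 2
#
# Assumes the 'behave' framework; the Lift class is a hypothetical example.
from behave import given, when, then

from lift import Lift  # hypothetical module containing the system under test


@given("a lift on floor {floor:d}")
def step_lift_on_floor(context, floor):
    context.lift = Lift(floor=floor)


@when("a passenger requests floor {floor:d}")
def step_request_floor(context, floor):
    context.lift.request(floor)
    context.lift.move()


@then("the lift is on floor {floor:d}")
def step_lift_is_on_floor(context, floor):
    assert context.lift.floor == floor
```

The point is that the scenario text stays readable for business stakeholders, while the glue code stays thin and simply delegates to the application.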

Double-Loop TDD

The picture shown at the start of the article from Nagy and Rose’s Discovery BDD book emphasizes the double loop nature of the BDD automation cycle. The outer loop is about building the supporting code needed to make a formulated scenario executable. Test-Driven Development fits within it as the inner loop for implementing the system that fulfills the scenarios. In my experience the inner loop of unit tests goes round within minutes, whereas the outer loop can take hours or even days.  

Later in the book they have a more detailed diagram showing an example BDD process:


Figure 16 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

This diagram is more complex, so I’m not going to explain it in depth here (for a deep dive take a look at this blog post by Seb Rose, or of course read the book itself!). What I want to point out is that the ‘Develop’ and ‘Implement’ parts of this diagram are showing double-loop TDD again, with slightly more detail than before. For the purpose of comparing a BDD process, with and without Approval testing, I’ve redrawn the diagram to emphasize those parts:

How you formulate, automate, and implement with TDD will all be affected by an approval testing approach. I recently wrote an article “How to develop new features with Approval Testing, Illustrated with the Lift Kata”. That article goes through a couple of scenarios, how I formulate them as sketches, then automate them with an approval testing tool. Based on the process described in that article I could draw it like this:

What’s different?

  • “Formulate” is called “Sketch” since the method of formulation is visual rather than ‘Given-When-Then’. The purpose is the same though.
  • “Automate” includes writing a Printer as well as the usual kind of ‘glue’ code to access functionality in your application. A Printer can print the state of the software system in a format that matches the Sketch. The printer code will also evolve as you work on the implementation.
  • “Implement” is a slightly modified TDD cycle. With approval tests you still work test-driven and you still refactor frequently, but other aspects may differ. You may improve the Printer and approve the output many times before being ready to show the golden master to others for review.
  • “Review” – this activity is supposed to ensure the executable scenario is suitable to use as living documentation, and that business people can read it. The difference here is that the artifact being reviewed is the Approved Golden Master output, not the sketch you made in the “Formulate” activity. It’s particularly important to make sure business people are involved here because the living documentation that will be kept is a different artifact from the scenario they co-created in the ‘discover’ activities.

But is this still BDD?

I’m happy to report that, yes, this is still BDD! I hope you can see the activities are not that different. Just as importantly, the BDD community is open and welcoming of diversity of practice. This article describes BDD practitioners as forming a ‘centered’ community rather than a bounded community. That means people are open to you varying the exact practices and processes of BDD so long as you uphold some common values. The really central part of BDD is the collaborative discovery process.

In this article I hope I’ve shown that using an approval testing approach upholds that collaborative discovery process. It modifies the way you do formulation, automation, and development, but in a way that retains the iterative, collaborative heart of BDD. For some kinds of system, sketches and golden masters might prove easier for business people to understand than the more mainstream ‘Given-When-Then’ Gherkin format. In that case an approval testing tool might enable a better collaborative discovery process and propel you closer to the centre of BDD.

Conclusions

BDD is about a lot more than test automation, and Gherkin is not the only syntax you can use for that part. Approval testing is perfectly compatible with BDD. I’m happy I can both claim to be a member of the BDD community and continue to choose a testing tool that fits the context I’m working in. 🙂 
If you’d like to learn more about Approval testing, check out this video of me pair programming with Adrian Bolboaca.

It’s Test-Driven Development with a twist! Developing new functionality with approval tests requires some slightly different steps, but if you’re a visual thinker like me you might just prefer it. In this blog post I’ll explain how it works.

Photo by Jason Dent on Unsplash

You may be familiar with the Gilded Rose Kata. It’s the most popular exercise I have on my GitHub page. About a year ago I posted some videos demonstrating a way to solve it. I used several techniques, including ‘Approval’ testing, which is also known as ‘Golden Master’ testing. It’s an approach that’s often used to get legacy code under control. What’s perhaps less known is that you can use the same tools for new development. I’ve put together a new exercise – the ‘Lift’ kata – to help people understand how this works.

If you’ve never done the Lift Kata, now might be a good time to try it out. I originally worked from this description of it, and I now have my own description and GitHub repo for those who want to try it out “approval testing style”. The first step towards solving it is to spend some time understanding the problem. I’m going to assume that most of you have been in a lift at some point, so take a few minutes to note down your understanding of how they work, and the rules that govern them. Perhaps even formulate some test cases.

I did this by sketching out some scenarios. I say ‘sketch’ and not ‘formulate’ quite deliberately! The way my mind works is quite visual, so for me it made sense to represent each floor vertically on the page, and write the name of the lift next to the floor it was on. This shows a lift system with four floors named 0, 1, 2, 3, and one lift named ‘A’, on floor 0:

sketch of one lift on floor 0

This is just a snapshot of a moment in time. I then started to think about how a lift responds to people pressing the floor buttons inside. I figured that this is an important aspect to test and proceeded to sketch it out. It occurred to me that I could write a list of requested floor numbers next to the lift name, but then I noticed it was even clearer if I put a mark next to each requested floor. For example, if passengers request floors 2 and 3 I can sketch it like this:

sketch of lift on floor 0 with requests to go to floor 2 and 3

The next move for this lift would be to go to floor 2 since it’s the closest requested floor. That example could be formulated as a test case sketch like this:

sketch of lift on floor 0 moving to floor 2

I can use this sketch as the first test case for TDD. I’ll need to write code for a lift with floors and requests. I’ll also need to write a ‘Printer’ that can turn a lift object into some output that looks like my sketch. I write some code for this and use the printer output in the ‘verify’ step of the test. After some work the output looks like this:

ascii printout from my test showing lift moving from floor 0 to floor 2

This ascii-art looks much the same as my sketch. One difference is that I wrote the floor numbers at both ends of each line. This is a trick to stop my editor from deleting what it thinks is irrelevant trailing whitespace at the ends of lines! I think it looks enough like my sketch to approve the output and store it as a ‘golden master’ for this scenario. Actually, I’ve already approved it several times as it started to look more and more like my sketch. And every time I did that I could refactor a little before adding more functionality and updating the approved file again.
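
For the curious, here is a rough sketch of what such a lift and printer could look like. The names and details are my own illustration, not the code from the kata repository:

```python
# A rough sketch of a Lift and a Printer producing the ascii format of the
# sketches. Class and method names here are illustrative only.
class Lift:
    def __init__(self, name, floor=0, requests=None):
        self.name = name
        self.floor = floor
        self.requests = set(requests or [])


class LiftSystemPrinter:
    def __init__(self, floors):
        self.floors = floors

    def print(self, lift):
        lines = []
        for floor in reversed(self.floors):
            request_marker = "*" if floor in lift.requests else " "
            lift_marker = lift.name if lift.floor == floor else " "
            # Floor numbers at both ends protect the line from editors that
            # strip trailing whitespace.
            lines.append(f"{floor} {request_marker} {lift_marker} {floor}")
        return "\n".join(lines)


# For example, a lift 'A' on floor 0 with requests for floors 2 and 3:
printer = LiftSystemPrinter(floors=[0, 1, 2, 3])
print(printer.print(Lift("A", floor=0, requests={2, 3})))
# 3 *   3
# 2 *   2
# 1     1
# 0   A 0
```

A real test would print the state after each step of the simulation, join the snapshots together, and pass the result to the verify step of whichever approval testing tool you are using.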

I’m looking at the requirements again and realize that I haven’t modelled the lift’s doors. You can’t fulfill a request until you’ve opened the doors, and that only happens after you’ve moved to the right floor. I drew a new sketch including them, shown below. I’ve written [A] for a lift called ‘A’ with closed doors, and ]A[ for when it has open doors. I also show an intermediate step when the lift is on the correct floor, but since the doors are closed the request is still active: 

sketch of lift moving from floor 0 to floor 2 and opening the doors

To get this to pass I’ll need to update all three: my lift class, my printer, and my test case. After a little coding, and a few iterations of improving both the code and the printer, the test produces output that looks like this and I approve it:

ascii printout of lift moving from floor 0 to 2 and opening the doors

Now that the test is passing, I’m fairly happy that my lift can answer requests. The next feature I was thinking about was being able to call the lift from another floor. For this I think I’ll need a new test case. Let’s say I’m standing on floor 3 and the lift is on floor 1, and I press the button to go down. I can include that in my sketch by putting a ‘v’ next to the floor I’m on. The whole scenario might play out like this:

sketch of lift on floor 1 being called to floor 3, moving there and opening the doors

As before, I spend time improving both the lift code and the printer. I approve intermediate results several times and do several refactorings. At some point the output from my program looks like my sketch and I approve it:

Great stuff! My lift can now fulfill requests from passengers and answer calls from another floor. Time for a celebratory cup of tea!

I’ve shown you the first couple of test cases, but there are of course plenty more features I could implement. A system with more than one lift for a start. Plus, the lift should alert the person waiting when it arrives by making a ‘ding’ when it opens the doors. I feel my lifts would be vastly improved if they said ding! I’ll have to come up with a new sketch that includes this feature. For the moment, let’s pause and reflect on the development process I’ve used so far.

Comparing Approval Testing with ordinary TDD

If I’d been doing ordinary Test-Driven Development with unit tests I might have created a dozen tests in the same time period for the same functionality. With Approval Testing I’ve still been working incrementally and iteratively and refactoring just as frequently. I only have two test cases though. The size of the unit being tested is a little larger than with ordinary TDD, but the feedback cycle is similarly short. 

Having a slightly larger unit for testing can be an advantage or a disadvantage, depending on how you view it. When the chunk of code being tested is larger, and the test uses a fairly narrow interface to access that code, it constrains the design less than it would if you instead had many finer grained tests for lower level interfaces. That means the tests don’t influence the design as strongly, and don’t need to be changed as often when you refactor. 

Another difference is that I’ve invested some effort in building code that can print a lift system as an ASCII artwork, which is reused in all my tests. In classic TDD I’d have had to write assertion code that would have been different in every test. 

Try it for yourself


What I’ve done isn’t exactly the same as ordinary TDD, but I think it’s a useful approach with many of the same benefits. I’ve put this exercise up on GitHub, so you can try it out for yourself. I’ve included the code for my printer so you don’t have to spend a lot of time setting that up, and can get on with developing your lift functionality. I’ve also recorded a video together with Adrian Bolboaca where I explain how the exercise works. So far I’ve translated the starting code into Java, C# and Python, and some friends have done a C++ version. (Do send me a pull request if you translate it to your favourite language.) And that’s it! You’ve seen how easy it is, so why don’t you have a try at Approval testing-style TDD for yourself?