By Emily Bache

There’s a frank discussion going on in the software industry at the moment about the words we use and the history behind them. Perhaps now is a good time to reconsider some of our terminology. For example, I’ve noticed we have several terms that describe essentially the same kind of testing:

  • Golden Master
  • Snapshot
  • Characterization
  • Approval

I think it’s time to completely drop the first one of these. In addition, if we could all agree on just one term it could make communication easier. My preferred choice is ‘Approval Testing’. As an industry, as a community of software professionals, can we agree to change the words we use?

What kind of testing are we referring to?

The common mechanism for ‘Golden Master’, ‘Snapshot’, ‘Characterization’ and ‘Approval’ testing is that you run the software, gather the output and store it. The combination of (a) exactly how you set up and ran the software and (b) the stored output, forms the basis of a test case. 

When you subsequently run the software with the same set up, you again gather the output. You then compare it against the version you previously stored in the test case. Any difference fails the test.

There are a number of testing frameworks that support this style of testing. Some open source examples:

 Full disclosure: I am a contributor to both Approvals and TextTest.

Reasons for choosing the term ‘Approval Testing’

Test cases are designed by people. You decide how to run the software and what output is good enough to store and compare against later. That step where you ‘approve’ the output is crucial to the success of the test case later on. If you make a poor judgement the test might not contain all the essential aspects you want to check for, or it might contain irrelevant details. In the former situation, it might continue to pass even when the software is broken. In the latter situation, the test might fail frequently for no good reason, causing you to mistrust or even ignore it. 

I like to describe this style of testing with a term that puts human design decisions front and center.

Comments on the alternative terms

Snapshot

This term draws your attention to the fact that the output you have gathered and stored for later comparison in the test is transient. It’s correct today, but it may not be correct tomorrow. That’s pretty agile – we expect the behaviour of our system to change and we want our tests to be able to keep up. 

The problem with this term is that it doesn’t imply any duty of care towards the contents of the snapshot. If a test fails unexpectedly I might just assume nothing is wrong – my snapshot is simply out of date. I can replace it with the newer one. After all, I expect a snapshot to change frequently. Did I just miss finding a bug though?

I prefer to use a word that emphasizes the human judgement involved in deciding what to keep in that snapshot.

Characterization

This is a better term because it draws your attention to the content of the output you store: that it should characterize the program behaviour. You want to ensure that all the essential aspects are included, so your test will check for them. This is clearly an important part of designing the test case. 

On the other hand, this term primarily describes tests written after the system is already working and finished. It doesn’t invite you to consider what the system should do or what you or others would like it to do. Approval testing is a much more iterative process where you approve what’s good enough today and expect to approve something better in the future.

Golden Master

This term comes from the record industry where the original audio for a song or album was stored on a golden disk in a special archive. All the copies in the shops were derived from it. The term implies  that once you’ve decided on the correct output, and stored it in a test, it should never change. It’s so precious we should store it in a special ‘golden’ archive. It has been likened to ‘pouring concrete on your software’. That is the complete opposite of agile! 

In my experience, what is correct program behaviour today will not necessarily be correct program behaviour tomorrow, and we need to update our understanding and our tests. We need to be able to ‘approve’ a new version of the output and see that as a normal part of our work.

This seems to me to be a strong enough argument for dropping the term ‘Golden Master’. If you’ve been following the recent announcement from Github around renaming the default branch to ‘main’, you’ll also be aware there are further objections to the term ‘master’. I would like to be able to communicate with all kinds of people in a respectful and friendly manner. If a particular word is problematic and a good alternative exists, I think it’s a good idea to switch.

In conclusion

Our job is literally about writing words in code and imbuing them with meaning. Using the same words to describe the same thing helps everyone to communicate better. Will you please join me in using the words ‘Approval Testing’ as an umbrella term referring to a particular style of testing? Words matter. We should choose them carefully.