
By Emily Bache

There’s a frank discussion going on in the software industry at the moment about the words we use and the history behind them. Perhaps now is a good time to reconsider some of our terminology. For example, I’ve noticed we have several terms that describe essentially the same kind of testing:

  • Golden Master
  • Snapshot
  • Characterization
  • Approval

I think it’s time to completely drop the first one of these. In addition, if we could all agree on just one term it could make communication easier. My preferred choice is ‘Approval Testing’. As an industry, as a community of software professionals, can we agree to change the words we use?

What kind of testing are we referring to?

The common mechanism for ‘Golden Master’, ‘Snapshot’, ‘Characterization’ and ‘Approval’ testing is that you run the software, gather the output and store it. The combination of (a) exactly how you set up and ran the software and (b) the stored output, forms the basis of a test case. 

When you subsequently run the software with the same set up, you again gather the output. You then compare it against the version you previously stored in the test case. Any difference fails the test.
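The mechanism above can be sketched in a few lines of Python. This is a minimal illustration only, not how Approvals or TextTest are implemented: the `format_receipt` function stands in for "the software", and the `verify` and `approve` helpers and the `.approved.txt` / `.received.txt` file names are assumptions made up for this example.

```python
import tempfile
from pathlib import Path


def format_receipt(items):
    """Hypothetical system under test: renders a receipt as text."""
    lines = [f"{name:<10}{price:>8.2f}" for name, price in items]
    total = sum(price for _, price in items)
    lines.append(f"{'TOTAL':<10}{total:>8.2f}")
    return "\n".join(lines) + "\n"


def verify(received, directory, name):
    """Compare fresh output with the stored output. Any difference fails.

    When the test fails (or nothing has been stored yet), the fresh output
    is saved alongside so that a person can inspect it.
    """
    approved_file = directory / f"{name}.approved.txt"
    received_file = directory / f"{name}.received.txt"
    if approved_file.exists() and approved_file.read_text() == received:
        received_file.unlink(missing_ok=True)  # clean up after a pass
        return True
    received_file.write_text(received)
    return False


def approve(directory, name):
    """The human decision: accept the received output as the stored version."""
    received_file = directory / f"{name}.received.txt"
    received_file.replace(directory / f"{name}.approved.txt")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        items = [("tea", 2.5), ("scone", 3.0)]
        # First run: nothing stored yet, so the test fails.
        print(verify(format_receipt(items), folder, "receipt"))
        # A person reads the received file and decides it is good enough.
        approve(folder, "receipt")
        # Same set up, same output: the test passes.
        print(verify(format_receipt(items), folder, "receipt"))
        # Behaviour changed: the test fails until someone approves again.
        print(verify(format_receipt(items + [("jam", 1.0)]), folder, "receipt"))
```

Note that `approve` is a separate, deliberate step. Nothing in the comparison itself decides whether the output is correct; a person does.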

There are a number of testing frameworks that support this style of testing. Open source examples include Approvals and TextTest.

Full disclosure: I am a contributor to both Approvals and TextTest.

Reasons for choosing the term ‘Approval Testing’

Test cases are designed by people. You decide how to run the software and what output is good enough to store and compare against later. That step where you ‘approve’ the output is crucial to the success of the test case later on. If you make a poor judgement the test might not contain all the essential aspects you want to check for, or it might contain irrelevant details. In the former situation, it might continue to pass even when the software is broken. In the latter situation, the test might fail frequently for no good reason, causing you to mistrust or even ignore it. 

I like to describe this style of testing with a term that puts human design decisions front and center.

Comments on the alternative terms

Snapshot

This term draws your attention to the fact that the output you have gathered and stored for later comparison in the test is transient. It’s correct today, but it may not be correct tomorrow. That’s pretty agile – we expect the behaviour of our system to change and we want our tests to be able to keep up. 

The problem with this term is that it doesn’t imply any duty of care towards the contents of the snapshot. If a test fails unexpectedly I might just assume nothing is wrong – my snapshot is simply out of date. I can replace it with the newer one. After all, I expect a snapshot to change frequently. Did I just miss finding a bug though?

I prefer to use a word that emphasizes the human judgement involved in deciding what to keep in that snapshot.

Characterization

This is a better term because it draws your attention to the content of the output you store: that it should characterize the program behaviour. You want to ensure that all the essential aspects are included, so your test will check for them. This is clearly an important part of designing the test case. 

On the other hand, this term primarily describes tests written after the system is already working and finished. It doesn’t invite you to consider what the system should do or what you or others would like it to do. Approval testing is a much more iterative process where you approve what’s good enough today and expect to approve something better in the future.

Golden Master

This term comes from the record industry where the original audio for a song or album was stored on a golden disk in a special archive. All the copies in the shops were derived from it. The term implies that once you’ve decided on the correct output, and stored it in a test, it should never change. It’s so precious we should store it in a special ‘golden’ archive. It has been likened to ‘pouring concrete on your software’. That is the complete opposite of agile!

In my experience, what is correct program behaviour today will not necessarily be correct program behaviour tomorrow, and we need to update our understanding and our tests. We need to be able to ‘approve’ a new version of the output and see that as a normal part of our work.

This seems to me to be a strong enough argument for dropping the term ‘Golden Master’. If you’ve been following the recent announcement from GitHub around renaming the default branch to ‘main’, you’ll also be aware there are further objections to the term ‘master’. I would like to be able to communicate with all kinds of people in a respectful and friendly manner. If a particular word is problematic and a good alternative exists, I think it’s a good idea to switch.

In conclusion

Our job is literally about writing words in code and imbuing them with meaning. Using the same words to describe the same thing helps everyone to communicate better. Will you please join me in using the words ‘Approval Testing’ as an umbrella term referring to a particular style of testing? Words matter. We should choose them carefully. 

By Emily Bache

Or: is Given-When-Then Compulsory?

In BDD you discover what software you should build through a collaborative process involving both software developers and business people. BDD also involves a lot of test automation and tools like Cucumber and SpecFlow. But what would happen if you used an Approval testing tool instead? Would that still be BDD?

Double-loop TDD diagram. Failing scenario -> passing scenario -> refactor and inner loop red->green->refactor

Figure 4 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

I’m a big fan of Behaviour Driven Development. I think it’s an excellent way for teams to gain a good understanding of what the end-user wants and how they will use the software. I like the emphasis on whole team collaboration and building shared understanding through examples. These examples can be turned into executable scenarios, also known as acceptance tests. They then become ‘living documentation’ that stays in sync with the system and helps everyone to collaborate over the lifetime of the software. 

I wrote an article about Double-Loop TDD a while back, and I was thinking about BDD again recently in the context of Approval testing. Are they compatible? The usual tools for automating scenarios as tests are SpecFlow and Cucumber which both use the Gherkin syntax. Test cases comprise ‘Given-When-Then’ steps written in natural language and backed up by automation code. My question is – could you use an Approval testing tool instead? 

I recently read a couple of books by Nagy and Rose. They are about BDD and specifically how to discover good examples and then formulate them into test cases. I thought the books did a good job of clearly explaining these aspects in a way that made them accessible to everyone, not just programmers. 

Nagy and Rose are planning a third book in the series which will be more technical and go into more detail on how to implement the automation. They say that you can use other test frameworks, but in their books they deal exclusively with the Gherkin format and Cucumber family of tools. What would happen if you used an Approval testing tool? Would it still be BDD or would we be doing something else? Let’s go into a little more detail about the key aspects of BDD: discovery, formulation, and automation.

Discovery

The discovery part of BDD is all about developers talking with business stakeholders about what software to build. Through a structured conversation you identify rules, examples, and unanswered questions. For that discussion you can use an ‘example mapping’ workshop, as outlined in this blog post by Cucumber co-founder Matt Wynne.

Formulation

The formulation part of BDD is about turning those rules and examples of system behaviour into descriptive scenarios. Each scenario is made as intelligible as possible for business people, consistent with the other scenarios, and unambiguous about system behaviour. There’s a lot of skill involved in doing this!

Automation

The automation part of BDD is where you turn formulated scenarios into executable test cases. Even though the automation is done in a programming language, the focus is still on collaboration with the business stakeholders. Everyone is expected to be able to read and understand these executable scenarios even if they can’t read a programming language.  

Double-Loop TDD

The picture shown at the start of the article from Nagy and Rose’s Discovery BDD book emphasizes the double loop nature of the BDD automation cycle. The outer loop is about building the supporting code needed to make a formulated scenario executable. Test-Driven Development fits within it as the inner loop for implementing the system that fulfills the scenarios. In my experience the inner loop of unit tests goes round within minutes, whereas the outer loop can take hours or even days.  

Later in the book they have a more detailed diagram showing an example BDD process:


Figure 16 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

This diagram is more complex, so I’m not going to explain it in depth here (for a deep dive take a look at this blog post by Seb Rose, or of course read the book itself!). What I want to point out is that the ‘Develop’ and ‘Implement’ parts of this diagram are showing double-loop TDD again, with slightly more detail than before. For the purpose of comparing a BDD process, with and without Approval testing, I’ve redrawn the diagram to emphasize those parts:

How you formulate, automate, and implement with TDD will all be affected by an approval testing approach. I recently wrote an article “How to develop new features with Approval Testing, Illustrated with the Lift Kata”. That article goes through a couple of scenarios, how I formulate them as sketches, then automate them with an approval testing tool. Based on the process described in that article I could draw it like this:

What’s different?

  • “Formulate” is called “Sketch” since the method of formulation is visual rather than ‘Given-When-Then’. The purpose is the same though.
  • “Automate” includes writing a Printer as well as the usual kind of ‘glue’ code to access functionality in your application. A Printer can print the state of the software system in a format that matches the Sketch. The printer code will also evolve as you work on the implementation.
  • “Implement” is a slightly modified TDD cycle. With approval tests you still work test-driven and you still refactor frequently, but other aspects may differ. You may improve the Printer and approve the output many times before being ready to show the golden master to others for review.
  • “Review” – this activity is supposed to ensure the executable scenario is suitable to use as living documentation, and that business people can read it. The difference here is that the artifact being reviewed is the Approved Golden Master output, not the sketch you made in the “Formulate” activity. It’s particularly important to make sure business people are involved here because the living documentation that will be kept is a different artifact from the scenario they co-created in the ‘discover’ activities.
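To make the idea of a Printer more concrete, here is a small Python sketch. Everything in it is hypothetical: the `Lift` class, the `print_lift_system` function and the text layout are invented for this illustration and are not taken from the Lift Kata article. The point is only that a Printer turns the state of the system into text that resembles the Sketch, so that the text can be approved and later reviewed.

```python
from dataclasses import dataclass, field


@dataclass
class Lift:
    """Hypothetical domain object, invented for this illustration."""
    name: str
    floor: int
    doors_open: bool = False
    requests: set = field(default_factory=set)


def print_lift_system(lifts, floors):
    """Printer: render the state of every lift as text, top floor first.

    Each row is one floor and each column is one lift. A lift is drawn as
    [A] with its doors open or ]A[ with its doors closed, and '*' marks a
    floor that the lift in that column has been asked to visit.
    """
    rows = []
    for floor in sorted(floors, reverse=True):
        cells = []
        for lift in lifts:
            if lift.floor == floor:
                cell = f"[{lift.name}]" if lift.doors_open else f"]{lift.name}["
            elif floor in lift.requests:
                cell = " * "
            else:
                cell = "   "
            cells.append(cell)
        # Strip trailing spaces so the stored text diffs cleanly.
        rows.append((f"{floor:>2} " + " ".join(cells)).rstrip())
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    lifts = [Lift("A", 0, doors_open=True, requests={2}), Lift("B", 3)]
    print(print_lift_system(lifts, range(4)))
```

The text returned by the Printer is what gets approved and stored. As the implementation grows, the Printer usually grows with it, for example by adding a column or a new symbol, and the output is approved again.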

But is this still BDD?

I’m happy to report that, yes, this is still BDD! I hope you can see the activities are not that different. Just as importantly, the BDD community is open and welcoming of diversity of practice. This article describes BDD practitioners as forming a ‘centered’ community rather than a bounded community. That means people are open to you varying the exact practices and processes of BDD so long as you uphold some common values. The really central part of BDD is the collaborative discovery process.

In this article I hope I’ve shown that using an approval testing approach upholds that collaborative discovery process. It modifies the way you do formulation, automation, and development, but in a way that retains the iterative, collaborative heart of BDD. For some kinds of system, sketches and golden masters might prove to be easier for business people to understand than the more mainstream ‘Given-When-Then’ Gherkin format. In that case an approval testing tool might enable a better collaborative discovery process and propel you closer to the centre of BDD.

Conclusions

BDD is about a lot more than test automation, and Gherkin is not the only syntax you can use for that part. Approval testing is perfectly compatible with BDD. I’m happy I can both claim to be a member of the BDD community and continue to choose a testing tool that fits the context I’m working in. 🙂

If you’d like to learn more about Approval testing check out this video of me pair programming with Adrian Bolboaca.

Clinical Trials and Software Process

Note: this article first appeared here

In the Accelerate book, researchers explain several metrics which they have shown will measure the performance of a DevOps organization, and crucially, drive performance of the organization as a whole. I will explain why this is important, using an analogy with your risk of a heart attack.

In 2018 Nicole Forsgren, Jez Humble and Gene Kim released Accelerate: The Science of Lean Software and DevOps to detail their research into DevOps. They identified a causal link between DevOps organizations which score well on a number of metrics and success in the marketplace. Good scores on those metrics are in turn driven by successful implementation of DevOps practices. The book explains which metrics their research has found to be particularly significant.

If you’ve been following the DevOps movement you won’t be surprised to learn that these practices include Continuous Integration and Deployment Automation. As the research proceeds they continue to refine our understanding of which practices are most significant and which metrics are most useful to measure. I think it is really encouraging that researchers are applying proper science to this problem. This relationship they’ve discovered, between DevOps practices, metrics and organizational performance, is really useful for helping leaders in all kinds of organizations to make more informed decisions about how to do their work.

Avoiding a heart attack

I like to compare it with the relationship between your risk of a heart attack, your cholesterol levels, and your lifestyle. A low risk of heart attack is what you’re trying to optimize for, just as an organization might try to optimize for creating profit, increasing shareholder value, or reducing suffering in the world. Each organization will have some kind of ultimate aim, but as an employee, or even a leader, it is probably not easy to influence directly.

Measuring your cholesterol levels is a way to assess your risk of a heart attack. In much the same way, measuring the DevOps metrics detailed in Accelerate is a way to assess your organization’s chances of successfully meeting its goals. The lifestyle choices you make will, of course, affect your cholesterol levels, and consequently your heart attack risk. Similarly, the practices you use for development and operations will affect the values you score on Accelerate’s DevOps metrics, and hence your chances of achieving your goals. Crucially, the feedback loop is much faster for the metric than for the ultimate goal.

Lifestyle matters

I watched a documentary recently where they had four couples each take up a new diet for one month. Before and after they measured all sorts of things: weight, fat percentage, and a host of biomarkers, including cholesterol. One of the couples changed to quite an extreme Low-Carb diet, where they ate lots of meat and dairy, no fruit, and only a restricted selection of vegetables. After only one month it was noticeable how much higher their cholesterol levels had become. They were still quite young, so their risk of heart attack was still relatively low, but you could see if they persisted with this diet for the long term, the risk was going to become significant.

I think you can use the Accelerate DevOps metrics in a similar way. You can measure them relatively easily for your organization, and get an idea of where you stand. If you have poor values it doesn’t necessarily mean you are in imminent danger of company failure, but it doesn’t look good for the long term. If you make changes to the way you build software, for example by introducing Continuous Integration or investing in deployment automation, you should be able to see changes in your metrics relatively quickly. Hopefully this will encourage you to press on with more productive practices and become more likely to achieve your organization’s aims.

As a woman programmer, I have noticed there is something of a gender imbalance in my profession. It’s an issue that’s interested me for a while, not least because people often ask me about what we can do to improve the situation. For myself, I enjoy writing code and I think it’s a great career. The sexism I’ve been aware of has not made a big impact on my life, although I know not everyone has been so fortunate. Susan Fowler’s blog really shocked me earlier this year. I have had some bad experiences, but not like that.

I recently read this article about the history of women in programming. The author shows this graph comparing the percentage of women in different university studies in the US. It’s quite stark:

The percentage of women studying Computer Science suffers a trend reversal in the mid 80’s, while the other subjects don’t. The explanation given is that this was about when home computers began to appear on the market, sold as a toy for boys. I lived through that time, and yes, my family bought a ZX Spectrum in the mid 80’s when I was about 10 years old, and yes, my younger brother learnt to program it and I didn’t. Fortunately I managed to learn to program later on anyway.

All this got me thinking about my current situation. I live in Sweden, and it’s a very different culture from the US. For example, I was reading about the concept of ‘male privilege’. One of the examples given is that men have the privilege of keeping their name when they marry, while women are questioned if they keep theirs. The thing is, in Sweden, this is not true. Either partner may change their name and it’s not remarkable for them both to keep their original names, or both swap to something entirely different. That’s something of a trivial example, but I do think it is a sign of a wider cultural difference. Privilege is experienced in a social context, and Sweden has a much more feminist society in many ways. (See for example this page about gender equality in Sweden.)

So I became curious to see whether the same thing happened in Sweden – did the proportion of women computer scientists also drop in the 80’s? I discovered that the Swedish statistical authority collects and publishes data on this kind of thing, and you can search it via a web GUI. I started poking around on it and soon I was hooked. Loads of really interesting data lying around waiting to be analysed!

Here is the plot I came up with, showing roughly equivalent data to the US graph I showed earlier:

(If you want to check my data, I got it from statistikdatabasen.scb.se, from the table “Antal examina i högskoleutbildning på grundnivå och avancerad nivå efter universitet/högskola, examen, utbildningslängd, kön och ålder. Läsår 1977/78 – 2015/16”)

Although the proportion of women in engineering is pretty low compared to the other subjects, it’s encouraging to note that the proportion has increased more than in the other subjects. It’s now at a similar level to where doctors, lawyers and architects were thirty-five years ago. (I was disappointed not to find any data for Natural Sciences. I’m not sure why that’s excluded from the source database.) Anyway, I’m not seeing this trend change in the 80’s; the curve rises fairly smoothly. I suspect the subject breakdown isn’t detailed enough to pick out Computer Science from the wider Engineering discipline, and that could explain it.

So I’ve done some more digging into the data to try to find if there was a turning point in the mid 80’s for aspiring women programmers. I think something did happen in Sweden, actually. This is the graph that I think shows it:

(I’m using the data sources “Anställda 16-64 år i riket efter yrke (3-siffrig SSYK 96), utbildningsinriktning (SUN 2000), ålder och kön. År 2001 – 2013” and “Antal examina i högskoleutbildning på grundnivå och avancerad nivå efter universitet/högskola, examen, utbildningslängd, kön och ålder. Läsår 1977/78 – 2015/16”, the SSYK codes I used are shown in the title of the graph)

If you look at the blue curve for 2001, you can see it peaks at age 35-39 years – that is, there was a higher proportion of women programmers at that age than at other ages. If you were 35-39 in 2001, you were probably doing your studies in the mid to late 80’s. Notice that the proportion of women at younger ages is lower. The green and yellow curves for 2005 and 2010 continue to show the same peak, just moved five years to the right. The proportion of women coming in at the younger age groups remains lower. The orange curve for 2015 is a little more encouraging – at least the proportion of women in the youngest two age groups has levelled off and is no longer sinking!

So it looks to me like there was a trend change in the mid to late 80’s in Sweden too – the proportion of women entering the profession seems to drop from then on, based on this secondary evidence. I imagine that computers were also marketed here as a boy’s toy. I really hope that things are changing today in Sweden, and that more women are studying computer science than before.

For reference, I did similar curves for several other professions, using the same dataset.

So there are a lot of women lawyers out there, and the proportion looks to be continuing to increase.

Male nurses seem to have things worse than female programmers, unfortunately. Plus I can’t see any real trend in this graph – the situation is bad and fairly stable.

The proportion of women police officers levelled off for a while but they’ve managed to turn things around, and it is now increasing again.

So programming is the only profession I discovered that has this decreasing trend of women’s participation, even if it has now levelled off. Let’s hope that changes to an upward trend soon – my daughters will be applying to university in about ten years’ time…

 

 

I forget exactly when, but I think it was 2008 or 2009. Anyway, I was at a software conference, and I was chatting with a developer after one of the sessions about cool new technologies and stuff. I don’t remember what hot new thing it was we talked about; all I remember is the shoes he was wearing!

credit: flickr, Steve Hodgson

This is rather an unusual style of running shoe. At the time, I’d never seen any like this before, and I was intrigued. It turns out that this developer I was talking to was, like me, something of a serial early adopter; it’s just that he didn’t only pick up shiny new programming tools and technologies.

At the time, I was running in a pair of shoes with thick heel padding the shop assistant had assured me would correct my bad posture and foot “pronation”. This guy’s “five finger” shoes had none of that, in fact quite the opposite. I was looking at disruptive running technology.

The conversation quickly switched from the latest programming tools and frameworks, as this guy explained the essential benefits of his shoes:

  • Lightweight
  • Your toes can spread out, giving better push-off from the ground
  • Thin sole – you adapt your stride to the surface because you can feel it
  • No heel padding means you strike the ground with the whole foot simultaneously

And of course, most importantly for a technology enthusiast:

  • People stare at your feet!

Following this conversation, a little googling about and watching the odd video by a “running style expert”, I became convinced. To be honest, it wasn’t much contest – shiny new technology, being in with the cool kids – I bought some new shoes, with toes and everything!

It took me several weeks to get used to them. You have to start with short distances, build up some new muscles in your foot, and learn to strike the ground with the whole foot at once. In my old shoes, I had a tendency to strike heel-first, because of the huge wad of padding on the sole, but in these minimal shoes, that just hurt. It was useful feedback: the whole-foot-at-once gait is supposed to be better for your knees.

After a few weeks of running shorter distances, slower than before, I gradually found my stride, and really started to enjoy running my usual 7km circuit of the local forest, in my eye-catching five-fingered shoes.

Unfortunately it didn’t last. Maybe two or three months later something happened. I think the technical term for it is Swedish Autumn. It turns out that forest tracks gain a surprising number of cold, muddy puddles at that time of year! Shoes with a very thin sole that isolate and surround each individual toe in waterlogged fabric mean absolutely freezing feet 🙁

So I’m back on the internet, looking for new, shiny technology to fix this problem, and of course, I buy some new shoes. This time I got a pair of minimalist shoes in waterproof Gore-Tex, with basically all the features of my old shoes, minus the individual toes.


I was back out on the forest track, faster than ever, with dry, comfortable feet – win! The only problem was, people were no longer staring at my eye-catching toes. So you can’t have it all!

So this is normally a blog about programming. What’s going on?

Test Driven Development as a Disruptive Technology

I’ve been thinking about this, and it seems to me that as with running shoes that have toes, TDD is something of a disruptive technology. Just as I haven’t seen the majority of runners switch to shoes with toes, I also haven’t seen the majority of developers using TDD yet. Neither seems to have crossed Geoffrey Moore’s “chasm”.

Geoffrey Moore's technology adoption distribution showing the chasm

Lots of developers write unit tests, but I think that’s slightly different. I’m talking about a TDD where developers primarily use tests to inform and direct design decisions, and rely on them for minute-by-minute feedback as they work. In 2009, Kent Beck made an observation in his blog that “the data suggests that there are at most a few thousand Java programmers actively practicing TDD”. I don’t think the situation is radically different today.

So can we learn anything about TDD from the story about running shoes? A couple of points I find relevant:

  • Early adopters will try a new technology based on really very flimsy evidence, and will persevere with it, even if it slows them down in the short term.
  • Early adopters like to look cool and stick out.

You may think that last point is just vanity, but actually, being a talking point helps drive adoption, though primarily amongst other, similarly minded technology geeks.

I remember a while back I was at work, writing some code, when a guy from another team came over to ask me something. He was about to leave, when he did a double-take and stared at my screen for a moment. “Are you doing TDD? I’ve never seen anyone actually do that in production code. Do you mind if I watch?”. So, you see, eye-catching shiny new technology, and I’m one of the cool kids, about to be emulated by the guy in the next team. 🙂

The other part of this story is, of course, the compromise I made when the cool technology met the reality of a muddy Swedish forest track. The toes went, but the shoe I ended up with is still radically different from the one I had before. I think that for TDD to reach the mainstream, it may need to become a little less extreme, a little more practical – but without losing the essential benefits.

What are the essential benefits of TDD? Well, I would say something like this:

  • Design: useful feedback, pushing you away from long methods and tightly-coupled classes, because they’re hard to test.
  • Refactoring: quickly detecting regression when you make a mistake
  • Productivity: helping you to manage complexity and work incrementally

So is it possible to get these things in another way? Without driving development minute-by-minute with tests? Well, that’s probably the subject of another blog post…

You might be interested to watch a video of my recent keynote speech at Europython, where I told this story.