Archive for the ‘Opinion’ Category

By Emily Bache

Or: is Given-When-Then Compulsory?

In BDD you discover what software you should build through a collaborative process involving both software developers and business people. BDD also involves a lot of test automation and tools like Cucumber and SpecFlow. But what would happen if you used an Approval testing tool instead? Would that still be BDD?

Double-loop TDD diagram. Failing scenario -> passing scenario -> refactor and inner loop red->green->refactor

Figure 4 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

I’m a big fan of Behaviour Driven Development. I think it’s an excellent way for teams to gain a good understanding of what the end-user wants and how they will use the software. I like the emphasis on whole team collaboration and building shared understanding through examples. These examples can be turned into executable scenarios, also known as acceptance tests. They then become ‘living documentation’ that stays in sync with the system and helps everyone to collaborate over the lifetime of the software. 

I wrote an article about Double-Loop TDD a while back, and I was thinking about BDD again recently in the context of Approval testing. Are they compatible? The usual tools for automating scenarios as tests are SpecFlow and Cucumber which both use the Gherkin syntax. Test cases comprise ‘Given-When-Then’ steps written in natural language and backed up by automation code. My question is – could you use an Approval testing tool instead? 

I recently read a couple of books by Nagy and Rose. They are about BDD and specifically how to discover good examples and then formulate them into test cases. I thought the books did a good job of clearly explaining these aspects in a way that made them accessible to everyone, not just programmers. 

Nagy and Rose are planning a third book in the series which will be more technical and go into more detail on how to implement the automation. They say that you can use other test frameworks, but in their books they deal exclusively with the Gherkin format and Cucumber family of tools. What would happen if you used an Approval testing tool? Would it still be BDD or would we be doing something else? Let’s go into a little more detail about the key aspects of BDD: discovery, formulation, and automation.

Discovery

The discovery part of BDD is all about developers talking with business stakeholders about what software to build. Through a structured conversation you identify rules and examples and unanswered questions. You can use an ‘example mapping’ workshop for that discussion outlined in this blog post by Cucumber Co-founder, Matt Wynne.

Formulation

The formulation part of BDD is about turning those rules and examples of system behaviour into descriptive scenarios. Each scenario is made as intelligible as possible for business people, consistent with the other scenarios, and unambiguous about system behaviour. There’s a lot of skill involved in doing this!

Automation

The automation part of BDD is where you turn formulated scenarios into executable test cases. Even though the automation is done in a programming language, the focus is still on collaboration with the business stakeholders. Everyone is expected to be able to read and understand these executable scenarios even if they can’t read a programming language.  

Double-Loop TDD

The picture shown at the start of the article from Nagy and Rose’s Discovery BDD book emphasizes the double loop nature of the BDD automation cycle. The outer loop is about building the supporting code needed to make a formulated scenario executable. Test-Driven Development fits within it as the inner loop for implementing the system that fulfills the scenarios. In my experience the inner loop of unit tests goes round within minutes, whereas the outer loop can take hours or even days.  

Later in the book they have a more detailed diagram showing an example BDD process:


Figure 16  from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

This diagram is more complex, so I’m not going to explain it in depth here (for a deep dive take a look at this blog post by Seb Rose, or of course read the book itself!). What I want to point out is that the ‘Develop’ and ‘Implement’ parts of this diagram are showing double-loop TDD again, with slightly more detail than before. For the purpose of comparing a BDD process, with and without Approval testing, I’ve redrawn the diagram to emphasize those parts:

How you formulate, automate, and implement with TDD will all be affected by an approval testing approach. I recently wrote an article ”How to develop new features with Approval Testing, Illustrated with the Lift Kata”. That article goes through a couple of scenarios, how I formulate them as sketches, then automate them with an approval testing tool. Based on the process described in that article I could draw it like this:

What’s different?

  • “Formulate” is called “Sketch” since the method of formulation is visual rather than ‘Given-When-Then’. The purpose is the same though.
  • “Automate” includes writing a Printer as well as the usual kind of ‘glue’ code to access functionality in your application. A Printer can print the state of the software system in a format that matches the Sketch. The printer code will also evolve as you work on the implementation.
  • “Implement” is a slightly modified TDD cycle. With approval tests you still work test-driven and you still refactor frequently, but other aspects may differ. You may improve the Printer and approve the output many times before being ready to show the golden master to others for review.
  • “Review” – this activity is supposed to ensure the executable scenario is suitable to use as living documentation, and that business people can read it. The difference here is that the artifact being reviewed is the Approved Golden Master output, not the sketch you made in the “Formulate” activity. It’s particularly important to make sure business people are involved here because the living documentation that will be kept is a different artifact from the scenario they co-created in the ‘discover’ activities.

But is this still BDD?

I’m happy to report that, yes, this is still BDD! I hope you can see the activities are not that different. Just as importantly, the BDD community is open and welcoming of diversity of practice. This article describes BDD practitioners as forming a ‘centered’ community rather than a bounded community. That means people are open to you varying the exact practices and processes of BDD so long as you uphold some common values. The really central part of BDD is the collaborative discovery process.

In this article I hope I’ve shown that using an approval testing approach upholds that collaborative discovery process. It modifies the way you do formulation, automation, and development, but in a way that retains the iterative, collaborative heart of BDD. For some kinds of system sketches and golden masters might prove to be easier for business people to understand than the more mainstream ‘Given-When-Then’ Gherkin format. In that case an approval testing tool might enable a better collaborative discovery process and propel you closer to the centre of BDD. 

Conclusions

BDD is about a lot more than test automation, and Gherkin is not the only syntax you can use for that part. Approval testing is perfectly compatible with BDD. I’m happy I can both claim to be a member of the BDD community and continue to choose a testing tool that fits the context I’m working in. 🙂 
If you’d like to learn more about Approval testing check out this video of me pair programming with Adrian Bolboaca.

Clinical Trials and Software Process

Note: this article first appeared here

In the Accelerate book, researchers explain several metrics which they have shown will measure the performance of a DevOps organization, and crucially, drive performance of the organization as a whole. I will explain why this is important, using an analogy with your risk of a heart attack.

In 2018 Nicole Forsgren, Jez Humble and Gene Kim released Accelerate: The Science of Lean Software and DevOps to detail their research into DevOps. They identified a causal link between DevOps organizations which score well on a number of metrics and success in the marketplace. Good scores on those metrics are in turn driven by successful implementation of DevOps practices. The book explains which metrics their research has found to be particularly significant.

If you’ve been following the DevOps movement you won’t be surprised to learn that these practices include Continuous Integration and Deployment Automation. As the research proceeds they continue to refine our understanding of which practices are most significant and which metrics are most useful to measure. I think it is really encouraging that researchers are applying proper science to this problem. This relationship they’ve discovered, between DevOps practices, metrics and organizational performance, is really useful for helping leaders in all kinds of organizations to make more informed decisions about how to do their work.

Avoiding a heart attack

I like to compare it with the relationship between your risk of a heart attack, your cholesterol levels, and your lifestyle. A low risk of heart attack is what you’re trying to optimize for, just as an organization might try to optimize for creating profit, increasing shareholder value, or reducing suffering in the world. Each organization will have some kind of ultimate aim, but as an employee, or even a leader, it is probably not easy to influence directly.

Measuring your cholesterol levels is a way to assess your risk of a heart attack. In much the same way, measuring the DevOps metrics detailed in Accelerate is a way to assess your organization’s chances of successfully meeting its goals. The lifestyle choices you make will, of course, affect your cholesterol levels, and consequently your heart attack risk. Similarly, the practices you use for development and operations will affect the values you score on Accelerate’s DevOps metrics, and hence your chances of achieving your goals. Crucially, the feedback loop is much faster for the metric than for the ultimate goal.

Lifestyle matters

I watched a documentary recently where they had four couples each take up a new diet for one month. Before and after they measured all sorts of things: weight, fat percentage, and a host of biomarkers, including cholesterol. One of the couples changed to quite an extreme Low-Carb diet, where they ate lots of meat and dairy, no fruit, and only a restricted selection of vegetables. After only one month it was noticeable how much higher their cholesterol levels had become. They were still quite young, so their risk of heart attack was still relatively low, but you could see if they persisted with this diet for the long term, the risk was going to become significant.

I think you can use the Accelerate DevOps metrics in a similar way. You can measure them relatively easily for your organization, and get an idea of where you stand. If you have poor values it doesn’t necessarily mean you are in imminent danger of company failure, but it doesn’t look good for the long term. If you make changes to the way you build software, for example by introducing Continuous Integration or investing in deployment automation, you should be able to see changes in your metrics relatively quickly. Hopefully this will encourage you to press on with more productive practices and become more likely to achieve your organization’s aims.

As a woman programmer, I have noticed there is something of a gender imbalance in my profession. It’s an issue that’s interested me for a while, not least because people often ask me about what we can do to improve the situation. For myself, I enjoy writing code and I think it’s a great career. The sexism I’ve been aware of has not made a big impact on my life, although I know not everyone has been so fortunate. Susan Fowler’s blog really shocked me earlier this year. I have had some bad experiences, but not like that.

I recently read this article about the history of women in programming, by . She shows this graph comparing percentage women in different university studies in the US. It’s quite stark:

The percentage of women studying Computer Science suffers a trend reversal in the mid 80’s, while the other subjects don’t. The explanation given, is that it’s about then that home computers began to appear on the market, sold as a toy for boys. I lived through that time, and yes, my family bought a ZX Spectrum in the mid 80’s when I was about 10 years old, and yes, my younger brother learnt to program it and I didn’t. Fortunately I managed to learn to program later on anyway.

All this got me thinking about my current situation. I live in Sweden, and it’s a very different culture than the US. For example, I was reading about the concept of ‘male privilege‘. One of the examples given is that men have the privilege of keeping their name when they marry, while women are questioned if they keep theirs. The thing is, in Sweden, this is not true. Either partner may change their name and it’s not remarkable for them both to keep their original names, or both swap to something entirely different. That’s something of a trivial example, but I do think it is a sign of a wider cultural difference. Privilege is experienced in a social context, and Sweden has a much more feminist society in many ways. (See for example this page about gender equality in Sweden)

So I became curious to see whether the same thing happened in Sweden – did the proportion of women computer scientists also drop in the 80’s? I discovered that the Swedish statistical authority collects and publishes data on this kind of thing, and you can search it via a web gui. I started poking around on it and soon I was hooked. Loads of really interesting data lying around waiting to be analysed!

Here is the plot I came up with, that is showing somewhat equivalent data to the graph on the US that I showed earlier:

(If you want to check my data, I got it from statistikdatabasen.scb.se, from the table “Antal examina i högskoleutbildning på grundnivå och avancerad nivå efter universitet/högskola, examen, utbildningslängd, kön och ålder. Läsår 1977/78 – 2015/16”)

Although the proportion of women in engineering is pretty low compared to the other subjects, it’s encouraging to note that the proportion has increased more than the other subjects. It’s now at a similar level to where doctors, lawyers and architects were thirty-five years ago. (I was disappointed not to find any data for Natural Sciences. I’m not sure why that’s excluded from the source database). Anyway, I’m not seeing this trend change in the 80’s, the curve is fairly smoothly upwards. I suspect the subject breakdown isn’t detailed enough to pick out Computer Science from the wider Engineering discipline, and that could explain it.

So I’ve done some more digging into the data to try to find if there was a turning point in the mid 80’s for aspiring women programmers. I think something did happen in Sweden, actually. This is the graph that I think shows it:

(I’m using the data sources “Anställda 16-64 år i riket efter yrke (3-siffrig SSYK 96), utbildningsinriktning (SUN 2000), ålder och kön. År 2001 – 2013” and “Antal examina i högskoleutbildning på grundnivå och avancerad nivå efter universitet/högskola, examen, utbildningslängd, kön och ålder. Läsår 1977/78 – 2015/16”, the SSYK codes I used are shown in the title of the graph)

If you look at the blue curve for 2001, you can see it peaks at age 35-39 years – that is, there were a higher proportion of women programmers at that age than other ages. If you were 35-39 in 2001, you were probably doing your studies in the mid to late 80’s. Notice that the proportion of women at younger ages is lower. The green and yellow curves for 2005 and 2010 continue to show the same peak, just moved five years to the right. The proportion of women coming in at the younger agegroups remains lower. The orange curve for 2015 is a little more encouraging – at least the proportion of women in the youngest two age-groups has levelled off and is no longer sinking!

So it looks to me like there was a trend change in the mid to late 80’s in Sweden too – the proportion of women entering the profession seems to drop from then on, based on this secondary evidence. I imagine that computers were also marketed here as a boy’s toy. I really hope that things are changing today in Sweden, and that more women are studying computer science than before.

For reference, I did similar curves for several other professions, using the same dataset.

So there are a lot of women lawyers out there, and the proportion looks to be continuing to increase.

Male nurses seem to have things worse than female programmers, unfortunately. Plus I can’t see any real trend in this graph – the situation is bad and fairly stable.

The proportion of women police officers levelled off for a while but they’ve managed to turn things around, and it is now increasing again.

So programming is the only profession I discovered that has this decreasing trend of women participation, even if it has now levelled off. Let’s hope that changes to an upward trend soon – my daughters will by applying to university in about ten year’s time…

 

 

I forget exactly when, but I think it was 2008 or 2009. Anyway, I was at a software conference, and I was chatting with a developer after one of the sessions about cool new technologies and stuff. I don’t remember what hot new thing it was we talked about, all I remember, is the shoes he was wearing!

credit: flickr, Steve Hodgson

This is rather an unusual style of running shoe. At the time, I’d never seen any like this before, and I was intrigued. It turns out that this developer I was talking to was, like me, also something of a serial early adopter, it’s just that he not only picked up shiny new programming tools and technologies.

At the time, I was running in a pair of shoes with thick heel padding the shop assistant had assured me would correct my bad posture and foot “pronation”. This guy’s “five finger” shoes had none of that, in fact quite the opposite. I was looking at disruptive running technology.

The conversation quickly switched from the latest programming tools and frameworks, as this guy explained the essential benefits of his shoes:

  • Lightweight
  • Your toes can spread out, giving better push-off from the ground
  • Thin sole – you adapt your stride to the surface because you can feel it
  • No heel padding, means you strike the ground with the whole foot simultaneously.

And of course, most importantly for a technology enthusiast:

  • People stare at you feet!

Following this conversation, a little googling about and watching the odd video by a “running style expert”, I became convinced. To be honest, it wasn’t much contest – shiny new technology, being in with the cool kids – I bought some new shoes, with toes and everything!

It took me several weeks to get used to them. You have to start with short distances, build up some new muscles in your foot, and learn to strike the ground with the whole foot at once. In my old shoes, I had a tendency to strike heel-first, because of the huge wad of padding on the sole, but in these minimal shoes, that just hurt. It was useful feedback, the whole-foot-at-once gait is supposed to be better for your knees.

After a few weeks of running shorter distances, slower than before, I gradually found my stride, and really started to enjoy running my usual 7km circuit of the local forest, in my eye-catching five-fingered shoes.
Unfortunately it didn’t last. Maybe two or three months later something happened. I think the technical term for it is Swedish Autumn. It turns out that forest tracks gain a surprising number of cold, muddy puddles at that time of year! Shoes with a very thin sole that isolate and surround each individual toe in waterlogged fabric, mean absolutely freezing feet 🙁

So I’m back on the internet, looking for new, shiny technology to fix this problem, and of course, I buy some new shoes. This time I got a pair of minimalist shoes in waterproof goretex, with basically all the features of my old shoes, minus the individual toes.


I was back out on the forest track, faster than ever, with dry, comfortable feet – win! The only problem was, people were no longer staring at my eye-catching toes. So you can’t have it all!

So this is normally a blog about programming. What’s going on?

Test Driven Development as a Disruptive Technology

I’ve been thinking about this, and it seems to me that as with running shoes that have toes, TDD is something of a disruptive technology. Just as I haven’t seen the majority of runners switch to shoes with toes, I also havn’t seen the majority of developers using TDD yet. Neither seem to have crossed Geoffrey Moore’s “chasm”.

Geoffrey Moore's technology adoption distribution showing the chasm

Lots of developers write unit tests, but I think that’s slightly different. I’m talking about a TDD where developers primarily use tests to inform and direct design decisions, and rely on them for minute-by-minute feedback as they work. In 2009, Kent Beck made an observation in his blog that “the data suggests that there are at most a few thousand Java programmers actively practicing TDD”. I don’t think the situation is radically different today.

So can we learn anything about TDD from the story about running shoes? A couple of points I find relevant:

  • Early adopters will try a new technology based on really very flimsy evidence, and will persevere with it, even if it slows them down in the short term.
  • Early adopters like to look cool and stick out.

You may think that last point is just vanity, but actually, being a talking point helps drive adoption, but primarily amongst other, similarly minded technology geeks.

I remember a while back I was at work, writing some code, when a guy from another team came over to ask me something. He was about to leave, when he did a double-take and stared at my screen for a moment. “Are you doing TDD? I’ve never seen anyone actually do that in production code. Do you mind if I watch?”. So, you see, eye-catching shiny new technology, and I’m one of the cool kids, about to be emulated by the guy in the next team. 🙂

The other part of this story, is of course the compromise I made when the cool technology met the reality of a muddy Swedish forest track. The toes went, but the shoe I ended up with is still radically different from the one I had before. I think that for TDD to reach the mainstream, it may need to become a little less extreme, a little more practical – but without losing the essential benefits.

What are the essential benefits of TDD? Well, I would say something like this:

  • Design: useful feedback, pushing you away from long methods and tightly-coupled classes, because they’re hard to test.
  • Refactoring: quickly detecting regression when you make a mistake
  • Productivity: helping you to manage complexity and work incrementally

So is it possible to get these things in another way? Without driving development minute-by-minute with tests? Well, that’s probably the subject of another blog post…

You might be interested to watch a video of my recent keynote speech at Europython, where I told this story.

Last week I created a little quiz  and put a link to it on Twitter. I was interested to see whether the terminology around Test Doubles has standardized on Gerard Meszaro’s definitions, from his book “xUnit Test Patterns“, and I thought my twitter followers (I have over 1000 now!) might be able to tell me.

The quiz was taken by nearly 150 people, and overall, my conclusion is that Meszaro’s definitions are in general use today, (at least amongst people who follow me on Twitter). You can see a summary of the responses here. (I’ve closed the survey now, btw).

The first question was “Which of these is not a kind of Test Double”, and anyone who thought a “Suite” was a kind of test double clearly hasn’t got a clue, so I excluded their answers from my analysis. That was only a handful of responses though.

Looking at the remaining answers, between 70 – 85% share my understanding of what each kind of test double is. The scores are noticably a little lower for ‘Spy’, and actually I think my question on Spies was not very good. Jason Gorman kindly sent me a few tweets pointing out what he thought was unclear about it. The answer I was looking for to distinguish a Spy was “A Spy may raise an exception in the ‘Assert’ part of the test.”.  I was trying to articulate the difference between a “Spy” and a “Mock”, but I’ve personally only used Test Spy frameworks like Mockito and unittest.mock. Clearly I have more to learn.

I got the Mock question right, “A mock will fail the test if its methods are not called as expected”, but I thought a Spy was basically the same, except it fails the test in the “Assert” part instead of the “Act” part. Jason kindly pointed out that a Spy could be a Stub, a Fake, or even a decorated ‘real’ object. You don’t necessarily need to use a framework. The distinguishing feature of a Spy is that it records information about interactions, that your test can query later. So that was a good result from my quiz – I learnt something about Spies!

I also asked whether people had read Gerard Meszaro’s book, or my work-in-progress book, because I was interested to see if people would give better answers in this case. When I excluded all the people who said they hadn’t read either book, (about a third of responses), the scores improved significantly – over 80% agreed with me about the definitions of Mock, Stub and Fake. For Spies, on the other hand, the score was lower than for the group as a whole! That was clearly my fault…

Some people tried to argue that the distinction between the various kinds of test double is not interesting, that it doesn’t matter. I disagree. I think each kind has a different purpose and different situations when it is appropriate to use. If you can’t articulate what kind of test double you’re using, then there’s a very real danger you’re using it wrongly, or that better alternatives exist. Like – using several Mocks in the same test case, and finding it very fragile, when Stubs would be sufficient. Or – creating a new Stub for every test case, when it would be less work, and make for clearer tests, if you had a Fake that you could re-use in several.

So my conclusion from this quiz, and my other research on the subject, is that it’s worth using Meszaro’s definitions when discussing Test Doubles. A lot of programmers have a good working understanding of them, and do know the difference between a Stub and a Mock. So that’s very encouraging! One of the reasons Meszaros invented the “Test Double” terminology was to clear up the confusion that reigned at the time. It seems to me we’re in a much better position today because of his work. I do think there is more to do, though, I still see people using Test Doubles badly. Which is partly the motivation for my new book, which is about various techniques in Test Driven Development, not just Test Doubles actually. I’d better go and do some work on the section on Spies…