Posts tagged ‘BDD’

By Emily Bache

Or: is Given-When-Then Compulsory?

In BDD you discover what software you should build through a collaborative process involving both software developers and business people. BDD also involves a lot of test automation and tools like Cucumber and SpecFlow. But what would happen if you used an Approval testing tool instead? Would that still be BDD?

Double-loop TDD diagram. Failing scenario -> passing scenario -> refactor and inner loop red->green->refactor

Figure 4 from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

I’m a big fan of Behaviour Driven Development. I think it’s an excellent way for teams to gain a good understanding of what the end-user wants and how they will use the software. I like the emphasis on whole team collaboration and building shared understanding through examples. These examples can be turned into executable scenarios, also known as acceptance tests. They then become ‘living documentation’ that stays in sync with the system and helps everyone to collaborate over the lifetime of the software. 

I wrote an article about Double-Loop TDD a while back, and I was thinking about BDD again recently in the context of Approval testing. Are they compatible? The usual tools for automating scenarios as tests are SpecFlow and Cucumber which both use the Gherkin syntax. Test cases comprise ‘Given-When-Then’ steps written in natural language and backed up by automation code. My question is – could you use an Approval testing tool instead? 

I recently read a couple of books by Nagy and Rose. They are about BDD and specifically how to discover good examples and then formulate them into test cases. I thought the books did a good job of clearly explaining these aspects in a way that made them accessible to everyone, not just programmers. 

Nagy and Rose are planning a third book in the series which will be more technical and go into more detail on how to implement the automation. They say that you can use other test frameworks, but in their books they deal exclusively with the Gherkin format and Cucumber family of tools. What would happen if you used an Approval testing tool? Would it still be BDD or would we be doing something else? Let’s go into a little more detail about the key aspects of BDD: discovery, formulation, and automation.

Discovery

The discovery part of BDD is all about developers talking with business stakeholders about what software to build. Through a structured conversation you identify rules, examples, and unanswered questions. You can use an ‘example mapping’ workshop for that discussion, a technique outlined in this blog post by Cucumber co-founder Matt Wynne.

Formulation

The formulation part of BDD is about turning those rules and examples of system behaviour into descriptive scenarios. Each scenario is made as intelligible as possible for business people, consistent with the other scenarios, and unambiguous about system behaviour. There’s a lot of skill involved in doing this!

Automation

The automation part of BDD is where you turn formulated scenarios into executable test cases. Even though the automation is done in a programming language, the focus is still on collaboration with the business stakeholders. Everyone is expected to be able to read and understand these executable scenarios even if they can’t read a programming language.  
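For illustration, here is roughly what that automation layer looks like with a Python Gherkin tool such as behave (Cucumber and SpecFlow work the same way in their own languages). Each ‘Given-When-Then’ line in a scenario is matched to a step definition; the scenario wording and the Basket class below are invented for this sketch, not taken from any real project:

```python
# A minimal sketch of Gherkin automation using the Python 'behave' library.
# It backs a hypothetical scenario such as:
#   Given a basket containing 3 items
#   When the discount code "SUMMER" is applied
#   Then the total should be 27
from behave import given, when, then


class Basket:
    """Stand-in for real application code, invented for this example."""

    def __init__(self, item_count, price_per_item=10):
        self.item_count = item_count
        self.price_per_item = price_per_item
        self.discount = 0

    def apply_discount(self, code):
        if code == "SUMMER":
            self.discount = self.item_count  # say, 1 off per item

    def total(self):
        return self.item_count * self.price_per_item - self.discount


@given('a basket containing {count:d} items')
def step_create_basket(context, count):
    # 'context' carries state between the steps of one scenario
    context.basket = Basket(item_count=count)


@when('the discount code "{code}" is applied')
def step_apply_discount(context, code):
    context.basket.apply_discount(code)


@then('the total should be {expected:d}')
def step_check_total(context, expected):
    assert context.basket.total() == expected
```

The business-readable part is the Gherkin text; the step definitions are the ‘glue’ that developers maintain.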

Double-Loop TDD

The picture shown at the start of the article from Nagy and Rose’s Discovery BDD book emphasizes the double loop nature of the BDD automation cycle. The outer loop is about building the supporting code needed to make a formulated scenario executable. Test-Driven Development fits within it as the inner loop for implementing the system that fulfills the scenarios. In my experience the inner loop of unit tests goes round within minutes, whereas the outer loop can take hours or even days.  

Later in the book they have a more detailed diagram showing an example BDD process:


Figure 16  from “Discovery – Explore behaviour using examples” by Gaspar Nagy and Seb Rose

This diagram is more complex, so I’m not going to explain it in depth here (for a deep dive take a look at this blog post by Seb Rose, or of course read the book itself!). What I want to point out is that the ‘Develop’ and ‘Implement’ parts of this diagram are showing double-loop TDD again, with slightly more detail than before. For the purpose of comparing a BDD process with and without Approval testing, I’ve redrawn the diagram to emphasize those parts:

How you formulate, automate, and implement with TDD will all be affected by an approval testing approach. I recently wrote an article “How to develop new features with Approval Testing, Illustrated with the Lift Kata”. That article goes through a couple of scenarios, showing how I formulate them as sketches and then automate them with an approval testing tool. Based on the process described in that article, I could draw it like this:

What’s different?

  • “Formulate” is called “Sketch” since the method of formulation is visual rather than ‘Given-When-Then’. The purpose is the same though.
  • “Automate” includes writing a Printer as well as the usual kind of ‘glue’ code to access functionality in your application. A Printer can print the state of the software system in a format that matches the Sketch (see the code sketch after this list). The printer code will also evolve as you work on the implementation.
  • “Implement” is a slightly modified TDD cycle. With approval tests you still work test-driven and you still refactor frequently, but other aspects may differ. You may improve the Printer and approve the output many times before being ready to show the golden master to others for review.
  • “Review” – this activity is supposed to ensure the executable scenario is suitable to use as living documentation, and that business people can read it. The difference here is that the artifact being reviewed is the Approved Golden Master output, not the sketch you made in the “Formulate” activity. It’s particularly important to make sure business people are involved here because the living documentation that will be kept is a different artifact from the scenario they co-created in the ‘discover’ activities.
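For comparison, here is a minimal sketch of what the “Automate” step could look like with an approval testing tool. It assumes the Python approvaltests library; the Application class, the glue calls, and the print_state function are invented stand-ins for your own code:

```python
# A sketch of 'Automate' with an approval testing tool, assuming the Python
# 'approvaltests' library. Application and print_state are invented names
# standing in for your own application code and Printer.
from approvaltests import verify


class Application:
    """Stand-in for the system under test."""

    def __init__(self):
        self.events = []

    def handle(self, event):
        self.events.append(event)


def print_state(app):
    """The Printer: renders the system state in the same format as the Sketch."""
    return "\n".join(f"handled: {event}" for event in app.events)


def test_scenario_from_the_sketch():
    # 'glue' code drives the application into the situation the Sketch describes
    app = Application()
    app.handle("button pressed")
    app.handle("door opened")

    # verify() compares the printed state with the approved 'golden master'
    # file and fails the test if they differ
    verify(print_state(app))
```

The approved file produced by that verify() call is the artifact that then gets reviewed with business people.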

But is this still BDD?

I’m happy to report that, yes, this is still BDD! I hope you can see the activities are not that different. Just as importantly, the BDD community is open and welcoming of diversity of practice. This article describes BDD practitioners as forming a ‘centered’ community rather than a bounded community. That means people are open to you varying the exact practices and processes of BDD so long as you uphold some common values. The really central part of BDD is the collaborative discovery process.

In this article I hope I’ve shown that using an approval testing approach upholds that collaborative discovery process. It modifies the way you do formulation, automation, and development, but in a way that retains the iterative, collaborative heart of BDD. For some kinds of system, sketches and golden masters might prove easier for business people to understand than the more mainstream ‘Given-When-Then’ Gherkin format. In that case an approval testing tool might enable a better collaborative discovery process and propel you closer to the centre of BDD.

Conclusions

BDD is about a lot more than test automation, and Gherkin is not the only syntax you can use for that part. Approval testing is perfectly compatible with BDD. I’m happy I can both claim to be a member of the BDD community and continue to choose a testing tool that fits the context I’m working in. 🙂 
If you’d like to learn more about Approval testing check out this video of me pair programming with Adrian Bolboaca.

It’s Test-Driven Development with a twist! Developing new functionality with approval tests requires some slightly different steps, but if you’re a visual thinker like me you might just prefer it. In this blog post I’ll explain how it works.

Photo by Jason Dent on Unsplash

You may be familiar with the Gilded Rose Kata. It’s the most popular exercise I have on my GitHub page. About a year ago I posted some videos demonstrating a way to solve it. I used several techniques, including ‘Approval’ testing, which is also known as ‘Golden Master’ testing. It’s an approach that’s often used to get legacy code under control. What’s perhaps less known is that you can use the same tools for new development. I’ve put together a new exercise – the ‘Lift’ kata – to help people understand how this works.

If you’ve never done the Lift Kata, now might be a good time to try it out. I originally worked from this description of it, and I now have my own description and GitHub repo for those who want to try it out “approval testing style”. The first step towards solving it is to spend some time understanding the problem. I’m going to assume that most of you have been in a lift at some point, so take a few minutes to note down your understanding of how they work, and the rules that govern them. Perhaps even formulate some test cases.

I did this by sketching out some scenarios. I say ‘sketch’ and not ‘formulate’ quite deliberately! The way my mind works is quite visual, so for me it made sense to represent each floor vertically on the page, and write the name of the lift next to the floor it was on. This shows a lift system with four floors named 0, 1, 2, 3, and one lift named ‘A’, on floor 0:

sketch of one lift on floor 0

This is just a snapshot of a moment in time. I then started to think about how a lift responds to people pressing the floor buttons inside. I figured that this is an important aspect to test and proceeded to sketch it out. It occurred to me that I could write a list of requested floor numbers next to the lift name, but then I noticed it was even clearer if I put a mark next to each requested floor. For example, if passengers request floors 2 and 3 I can sketch it like this:

sketch of lift on floor 0 with requests to go to floor 2 and 3

The next move for this lift would be to go to floor 2 since it’s the closest requested floor. That example could be formulated as a test case sketch like this:

sketch of lift on floor 0 moving to floor 2

I can use this sketch as the first test case for TDD. I’ll need to write code for a lift with floors and requests. I’ll also need to write a ‘Printer’ that can turn a lift object into some output that looks like my sketch. I write some code for this and use the printer output in the ‘verify’ step of the test. After some work the output looks like this:

ascii printout from my test showing lift moving from floor 0 to floor 2

This ascii-art looks much the same as my sketch. One difference is that I wrote the floor numbers at both ends of each line. This is a trick to stop my editor from deleting what it thinks is irrelevant trailing whitespace at the ends of lines! I think it looks enough like my sketch to approve the output and store it as a ‘golden master’ for this scenario. Actually, I’ve already approved it several times as it started to look more and more like my sketch. And every time I did that I could refactor a little before adding more functionality and updating the approved file again.
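The real printer is in the GitHub repo; here is a simplified sketch of the idea for a single snapshot (the real output also joins successive snapshots so you can see the lift move). The class and function names are my own inventions for this illustration:

```python
# A simplified, illustrative Printer: not the exercise's actual code.
class Lift:
    def __init__(self, name, floor, requests=None):
        self.name = name
        self.floor = floor
        self.requests = set(requests or [])


def print_lift(lift, floors):
    """Render one snapshot as ASCII art, one line per floor, top floor first."""
    lines = []
    for floor in sorted(floors, reverse=True):
        mark = "*" if floor in lift.requests else " "
        occupant = lift.name if floor == lift.floor else " "
        # floor numbers at both ends stop editors trimming trailing whitespace
        lines.append(f"{floor} {mark}{occupant} {floor}")
    return "\n".join(lines)


print(print_lift(Lift("A", floor=0, requests={2, 3}), floors=[0, 1, 2, 3]))
# 3 *  3
# 2 *  2
# 1    1
# 0  A 0
```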

I’m looking at the requirements again and realize that I haven’t modelled the lift’s doors. You can’t fulfill a request until you’ve opened the doors, and that only happens after you’ve moved to the right floor. I drew a new sketch including them, shown below. I’ve written [A] for a lift called ‘A’ with closed doors, and ]A[ for when it has open doors. I also show an intermediate step when the lift is on the correct floor, but since the doors are closed the request is still active: 

sketch of lift moving from floor 0 to floor 2 and opening the doors

To get this to pass I’ll need to update all of my lift class, my printer, and my test case. After a little coding, and a few iterations of improving both the code and the printer, the test produces output that looks like this and I approve it:

ascii printout of lift moving from floor 0 to 2 and opening the doors

Now that the test is passing, I’m fairly happy that my lift can answer requests. The next feature I was thinking about was being able to call the lift from another floor. For this I think I’ll need a new test case. Let’s say I’m standing on the third floor and the lift is on floor 1, and I press the button to go down. I can include that in my sketch by putting a “v” next to the floor I’m on. The whole scenario might play out like this:

sketch of lift on floor 1 being called to floor 3, moving there and opening the doors

As before, I spend time improving both the lift code and the printer. I approve intermediate results several times and do several refactorings. At some point the output from my program looks like my sketch and I approve it:

Great stuff! My lift can now fulfill requests from passengers and answer calls from another floor. Time for a celebratory cup of tea!

I’ve shown you the first couple of test cases, but there are of course plenty more features I could implement. A system with more than one lift for a start. Plus, the lift should alert the person waiting when it arrives by making a ‘ding’ when it opens the doors. I feel my lifts would be vastly improved if they said ding! I’ll have to come up with a new sketch that includes this feature. For the moment, let’s pause and reflect on the development process I’ve used so far.

Comparing Approval Testing with ordinary TDD

If I’d been doing ordinary Test-Driven Development with unit tests I might have created a dozen tests in the same time period for the same functionality. With Approval Testing I’ve still been working incrementally and iteratively and refactoring just as frequently. I only have two test cases though. The size of the unit being tested is a little larger than with ordinary TDD, but the feedback cycle is similarly short. 

Having a slightly larger unit for testing can be an advantage or a disadvantage, depending on how you view it. When the chunk of code being tested is larger, and the test uses a fairly narrow interface to access that code, it constrains the design less than it would if you instead had many finer grained tests for lower level interfaces. That means the tests don’t influence the design as strongly, and don’t need to be changed as often when you refactor. 

Another difference is that I’ve invested some effort in building code that can print a lift system as an ASCII artwork, which is reused in all my tests. In classic TDD I’d have had to write assertion code that would have been different in every test. 
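As a deliberately trivial, invented illustration of that difference: the classic-style test below asserts on details with bespoke code, while the approval-style test funnels everything through one shared printer and a single verify() call against the golden master:

```python
# Invented example contrasting bespoke assertions with a reusable printer.
# Assumes the Python 'approvaltests' library for verify().
from approvaltests import verify


def format_receipt(items):
    """A shared Printer: renders the whole result as text."""
    lines = [f"{name:<10}{price:>7.2f}" for name, price in items]
    lines.append(f"{'TOTAL':<10}{sum(p for _, p in items):>7.2f}")
    return "\n".join(lines)


def test_receipt_classic_style():
    receipt = format_receipt([("tea", 2.50), ("cake", 3.25)])
    # bespoke assertion code, written differently in every test
    assert "tea" in receipt
    assert receipt.splitlines()[-1].endswith("5.75")


def test_receipt_approval_style():
    receipt = format_receipt([("tea", 2.50), ("cake", 3.25)])
    # one comparison against the approved file, reusing the shared printer
    verify(receipt)
```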

Try it for yourself


What I’ve done isn’t exactly the same as ordinary TDD, but I think it’s a useful approach with many of the same benefits. I’ve put this exercise up on GitHub, so you can try it out for yourself. I’ve included the code for my printer so you don’t have to spend a lot of time setting that up, and can get on with developing your lift functionality. I’ve also recorded a video together with Adrian Bolboaca where I explain how the exercise works. So far I’ve translated the starting code into Java, C# and Python, and some friends have done a C++ version. (Do send me a pull request if you translate it to your favourite language.) And that’s it! You’ve seen how easy it is, so why don’t you have a try at Approval testing-style TDD for yourself?

I’ve previously written about Agile test automation principles, and since then I’ve had some interesting discussions with people that have led me to revise them in this article. In particular, Seb Rose wrote about his 6 principles of unit testing and pointed out some issues with mine. So this article is an update on the previous one, and I’m hoping this will spark further interesting discussions!

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality, has given me confidence time and again to release updated versions to end-users.

I was not the first to discover that developers design automated functional tests for two main purposes. Initially we design them to help clarify our understanding of what to build. In fact, at that point they’re not really tests; we usually call them scenarios, or examples. Later, the main purpose of the tests becomes to detect regression errors, although we continue to use them to document what the system does.

When you’re designing a functional test suite, you’re trying to support both aims, and sometimes you have to make tradeoffs between them. You’re also trying to keep the cost of writing and maintaining the tests as low as possible, and as with most software, it’s the maintenance cost that dominates. Over the years I’ve begun to think in terms of four principles that help me to design functional test suites that make good tradeoffs and identify when a particular test case is fit for purpose.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful, either for understanding what the system does or for identifying regression errors. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong, and you will rewrite the test to check for something else, or simply delete it.
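As an invented example, a readable test states the rule in its name and shows one concrete instance of it in its body, so a failure points straight at the behaviour that broke:

```python
# An invented example of a readable test. The Order/Customer classes are
# defined inline only to keep the snippet self-contained.
from dataclasses import dataclass


@dataclass
class Customer:
    status: str


@dataclass
class Order:
    customer: Customer
    total_eur: float

    def shipping_cost_eur(self) -> float:
        if self.customer.status == "gold" and self.total_eur >= 50:
            return 0.0
        return 4.95


def test_gold_customer_gets_free_shipping_over_50_euros():
    order = Order(customer=Customer(status="gold"), total_eur=60)
    assert order.shipping_cost_eur() == 0
```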

Robustness

When a test fails, it means there is a regression error (functionality is broken), or the system has changed and the tests no longer document the correct behaviour. You need to take action to correct the system or update the test, and this is as it should be. If, however, the test has failed for no good reason, you have a problem: a fragile test.

There are many causes of fragile tests: for example, tests that are not isolated from one another, duplication between test cases, and dependencies on random or threaded code. If you run a test by itself and it passes, but it fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.
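As an invented illustration of the isolation problem: both tests below mutate the same module-level list, so the second one passes on its own but fails when it runs after the first:

```python
# Invented example of a test isolation problem: shared mutable state makes
# the outcome depend on execution order.
REGISTERED_USERS = []          # module-level, shared between tests


def register(name):
    REGISTERED_USERS.append(name)


def test_first_registered_user_is_alice():
    register("alice")
    assert REGISTERED_USERS[0] == "alice"       # passes alone


def test_exactly_one_user_is_registered():
    register("bob")
    # fails when run after the test above, because "alice" is still in the
    # shared list; a per-test fixture that resets the state restores isolation
    assert len(REGISTERED_USERS) == 1
```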

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

Speed

As an agile developer you run your test suite frequently: (a) every time you build the system, (b) before you check in changes, and (c) after check-in, in an automated Continuous Integration environment. I recommend time limits of 2 minutes for (a), 10 minutes for (b), and 60 minutes for (c). This fast feedback gives you the best chance of actually being willing to run the tests, and of finding defects when they’re cheapest to fix – soon after they’re introduced.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime you’ll continue developing and making other changes. The diagnose-and-fix loop gets longer and the tests become less likely to ever all pass at the same time. This can become pretty demoralizing.

Updatability

When the needs of the users change, and the system is updated, your tests also need to be updated in tandem. It should be straightforward to identify which tests are affected by a given change, and quick to update them all.

If your tests are not easy to update, they will likely get left behind as the system moves on. Faced with a small change that causes thousands of failures and hours of work to update them all, you’ll likely delete most of the tests.
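One invented example of what makes a suite easy to update: build test data through a single helper, so that when the system changes (say, Order grows a new field) only the helper changes, not every test:

```python
# Invented example: a shared builder keeps tests easy to update in bulk.
from dataclasses import dataclass


@dataclass
class Order:
    customer: str
    total_eur: float
    currency: str = "EUR"   # imagine this field was added later


def make_order(**overrides):
    """The one place that knows how to build a valid default Order."""
    defaults = dict(customer="alice", total_eur=100.0)
    defaults.update(overrides)
    return Order(**defaults)


def needs_review(order):
    """Trivial stand-in for real business logic."""
    return order.total_eur > 5_000


def test_large_orders_need_review():
    assert needs_review(make_order(total_eur=10_000))


def test_ordinary_orders_do_not_need_review():
    assert not needs_review(make_order())
```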

Following these four principles implies Maintainability

Taken all together, I think how well your tests adhere to these principles will determine how maintainable they are, or in other words, how much they will cost. That cost needs to be in proportion to the benefits you get: helping you understand what the system does, and regression protection.

As your test suite grows, it becomes ever more challenging to adhere to all the principles. Readability suffers when there are so many test cases you can’t see the forest for the trees. The more details of your system that you cover with tests, the more likely you are to have Robustness problems – tests that fail when these details change.  Speed obviously also suffers – the time to run the test suite usually scales linearly with the number of test cases. Updatability doesn’t necessarily get worse as the number of test cases increases, but it will if you don’t adhere to good design principles in your test code, or lack tools for bulk update of test data for example.

I think the principles are largely the same whether you’re writing skinny little unit tests or fatter functional tests that touch more of the codebase. My experience tells me that it’s a lot easier to be successful with unit tests. As the testing thickness increases, the feedback cycle gets slower, and your mistakes are amplified. That’s why I concentrate on teaching these principles through unit testing exercises. Once you understand what you’re aiming for, you can transfer your skills to functional tests.

How can you use these principles?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop. If I’m reviewing someone else’s test cases, I can point to code and say which principles it’s not following, and give them concrete advice about how to make improvements. We can have a discussion for example about whether to add more test cases in order to improve regression protection, and how to do that without reducing overall readability.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Replace some of the thicker, slower end-to-end tests with lots of skinny, fast unit tests (may reduce regression protection)
  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed, just as you would production code (may affect Readability)
  • Use more fakes to replace slow parts of the system, as in the sketch after this list (may reduce regression protection)
  • Identify key test cases for essential functionality and remove the other test cases (sacrifices regression protection to get Speed)
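To illustrate the fakes option mentioned above (with invented names): the slow, real payment gateway is replaced by an in-memory fake so the functional test runs in milliseconds, at the price that the real integration is no longer exercised:

```python
# Invented example of trading regression protection for Speed with a fake.
class FakePaymentGateway:
    """In-memory stand-in for a gateway that would normally go over the network."""

    def __init__(self):
        self.charges = []

    def charge(self, amount, card_number):
        self.charges.append((amount, card_number))
        return "ok"


class Checkout:
    """Minimal stand-in for the code under test."""

    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, amount, card_number):
        return self.gateway.charge(amount, card_number)


def test_paying_for_an_order_charges_the_card():
    gateway = FakePaymentGateway()
    checkout = Checkout(gateway)
    assert checkout.pay(100, "4111-0000") == "ok"
    assert gateway.charges == [(100, "4111-0000")]
```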

Strategic Decisions

The principles also help me when I’m discussing automated testing strategy and choosing testing tools. Some tools have better support for updating test cases and test data. Some allow very Readable test cases. It’s worth noting that automated tests in agile are quite different from those in a traditional process, since they are run continually throughout development, not just at the end. I’ve found many traditional automation tools don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles help you to reason about your strategy and tools for functional automated testing, and to design more maintainable, useful test cases.

Images Attribution: DaPino Webdesign, Lebreton, Asher Abbasi, Woothemes, Iconshock, Andy Gongea, FatCow from www.iconspedia.com

One of the great privileges of being the programme chair for Scandinavian Developer Conference is getting to choose the keynote speakers. This year, I’m delighted to present Dan North and Janice Fraser, both thought leaders in the field of software development. Although from different backgrounds and perspectives, they’re both accomplished at building software that delivers great business outcomes. I’d like to tell you a little about each person, and why I’ve invited them to Göteborg for SDC2013.

Dan North – a man full of intriguing ideas

I first came across Dan North at a conference in 2007, talking about a topic I was very familiar with – unit testing – but using a whole new set of words. Behaviour Driven Development (BDD) intrigued me then, and still does now. How can switching the word “Test” for “Behaviour” and “AssertEqual” for “Should be” make such a difference to the way you end up designing your code?

In his famous article from 2006, “Introducing BDD”, Dan explains that when he stopped talking about “Testing” and instead started using the word “Behaviour”, “… a whole category of coaching problems disappeared”. People understand more easily that defining the behaviour of the software is an important activity for the whole team, not just testers. It also changes the way you as a programmer think about your code, and helps you focus on what’s important.

BDD as an approach to software development is still being actively developed and written about, although Dan himself has largely stepped aside in favour of other thought leaders like Elizabeth Keogh, Chris Matts, Olav Maasen, and of course Gojko Adzic. Gojko wrote the hugely influential book “Specification by Example” which is all about having useful conversations about software behaviour, and expressing that in terms of executable examples – ie a lot like BDD. At about the time that book was being written, Dan himself chose a different road. He actually stepped out of the consultant life entirely for about two years, taking up a full time position developing software at a financial trading firm.

His latest ideas around “Accelerated Agile” to a large extent come from his experiences working in that high-powered trading environment. He wrote in his blog: “This team was the most insanely effective delivery machine I’ve ever been a part of”, and I find that particularly intriguing. He says that standard agile practices like Continuous Integration and maintaining a Product Backlog weren’t being used! So what exactly did they do in order to be so effective?

Dan has also famously opined that “Programming is not a Craft”, and argues that the “Software Craftsmanship Manifesto … [is] a spectacularly easy bandwagon to jump on”. He says he’d rather see, “…a call to arms to stop navel-gazing and treat programming as the skilled trade that it is.” So there. Dan certainly has some strong opinions, and when I’ve met him, he always seems to express himself with wit and intelligence.

I’m really looking forward to Dan’s keynote address on Tuesday 5th March. I’m intrigued to find out what “Patterns of Effective Delivery” is all about, and to hear his latest opinions on the practice of software development.

Janice Fraser – a pioneer of Lean User Experience

To a large extent, Silicon Valley is the epicenter of our whole industry. Many of the biggest and most influential software companies in the world are based there, and as an incubator with a friendly climate for startup companies, it is unparalleled, despite the efforts of many other regions around the globe to imitate their success.

Janice Fraser has been working in Silicon Valley for over 15 years, and she says in her CV: “I’ve seen a lot — bubbles, bursts, and fantastic acts of collaboration that have transformed literally billions of lives.” Yes, that’s right, billions of lives.

The latest trend coming out of the Valley is “Lean Startup”, a term coined by Eric Ries, and documented in his bestselling book “The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Business”.  I first heard about it in 2011, when Joshua Kerievsky, an early adopter of eXtreme Programming and successful entrepreneur, published an article “Agile vs. Lean Startup”. He says, “[Lean Startup] rocks. It rocks far more than Agile.” If it rocks far more than agile, then I find that pretty intriguing!

Janice Fraser is of course also an early adopter of “Lean Startup”, and has pointed out that the ideas in it are not all new. She says: “The Lean Startup is a rediscovery of user centered design… [it] gives UX teams an unqualified mandate to make products customers love.”

Janice herself is a serial entrepreneur, having led several startup companies. She says in her CV, “My proudest success is Adaptive Path, a leading product design firm. I was a founder and served as the company’s first CEO”. Adaptive Path is still successfully in business.

Janice is not shy about recounting her failures either, she’s written a candid report of how she started “Emmet Labs” in 2007 intending to change the world, right through to when she laid off all the staff in 2009. I think her article “7 things I did right with Emmet Labs” shows how much courage and determination it takes to build a company, and how resilient and clear-thinking Janice herself can be in a crisis.

Janice’s business these days is helping other startup companies to succeed: “Before you build anything, find customers, learn their needs & goals, and measure your progress towards your vision” – an extract from the marketing materials for her company, Luxr. She’s also just about to publish a book “The Lean Product Book: How Smart Teams Work Better”, which I guess will document the kind of advice she gives to her clients – all about Lean User Experience.

All this talk of startups and product development in Silicon Valley might seem a long way from chilly Göteborg and our IT industry, dominated by a few huge corporations. I think it’s just the kind of thing we need to hear about, though. Companies of any size need to renew themselves and develop great new products in order to flourish, and this is clearly an area where Janice is innovating and leading the world. I’m really looking forward to hearing what she’s going to say in her keynote “Lean Startup Product Teams: Principles of Success”, on Monday 4th March.

Please note – As of March 2013, I have rewritten this post in the light of further experience and discussions. The updated post is available here.

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality, has given me confidence time and again to release updated versions to end-users.

Often, I’ve written functional tests as regression tests, after the functionality is supposed to work. In other situations, I’ve been able to write these kinds of tests in advance, as part of an ATDD, or BDD process. In either case, I’ve found the regression tests you end up with need to have certain properties if they’re going to be useful in an agile environment moving forward. I think the same properties are needed for good agile functional tests as for good unit tests, but it’s much harder. Your mistakes are amplified as the scope of the test increases.

I’d like to outline four principles of agile test automation that I’ve derived from my experience.

Coverage

If you have a test for a feature, and there is a bug in that feature, the test should fail. Note I’m talking about coverage of functionality, not code coverage, although these concepts are related. If your code coverage is poor, your functionality coverage is likely also to be poor.

If your tests have poor coverage, they will continue to pass even when your system is broken and functionality unusable. This can happen if you have missed out needed test cases, or when your test cases don’t check properly what the system actually did. The consequence of poor coverage is that you can’t refactor with confidence, and you need to do additional (manual) testing before release.
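As an invented example of the second failure mode: a test that exercises the code but never checks the result that matters will stay green even when the feature is completely broken:

```python
# Invented example of weak functionality coverage.
def apply_discount(price, code):
    """Price in whole euros; 10% off with the right code."""
    if code == "SUMMER10":
        return price * 90 // 100
    return price


def test_discount_weak_coverage():
    result = apply_discount(100, "SUMMER10")
    assert result is not None        # still passes if the calculation is wrong


def test_discount_checks_the_behaviour():
    assert apply_discount(100, "SUMMER10") == 90
```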

The aim for automated regression tests is good Coverage: If you break something important and no tests fail, your test coverage is not good enough. All the other principles are in tension with this one – improving Coverage will often impair the others.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong, and you will rewrite the test to check for something else, or simply delete it.

As you improve Coverage, you will likely add more and more test cases. Each one may be fairly readable on its own, but taken all together it can become hard to navigate and get an overview.

Robustness

When a test fails, it means the functionality it tests is broken, or at least is behaving significantly differently from before. You need to take action to correct the system or update the test to account for the new behaviour. Fragile tests are the opposite of Robust: they fail often for no good reason.

Robustness problems you often run into include tests that are not isolated from one another, duplication between test cases, and flickering tests. If you run a test by itself and it passes, but it fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

As you improve Coverage you’ll want to add more checks for details of your system. This will give your tests more and more reasons to fail.

Speed

As an agile developer you run the tests frequently. Both (a) every time you build the system, and (b) before you check in changes. I recommend time limits of 2 minutes for (a) and 10 minutes for (b). This fast feedback gives you the best chance of actually being willing to run the tests, and to find defects when they’re cheapest to fix.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime someone else has checked in other changes, and the new run is not green either. You’re developing all the while the tests are running, and they never quite catch up. This can become pretty demoralizing.

As you improve Coverage, you add more test cases, and this will naturally increase the execution time for the whole test suite.

How are these principles useful?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed the same as you would production code (may affect Readability)
  • Push down tests to a lower level of granularity where they can execute faster (may reduce Coverage and/or increase Readability)
  • Identify key test cases for essential functionality and remove the other test cases (sacrifices Coverage to get Speed)

Explaining these principles can promote useful discussions with people new to agile, particularly testers. The test suite is a resource used by many agile team members – developers, analysts, managers etc. – in its role as “Living Documentation” for the system (see Gojko Adzic’s writings on this). This emphasizes the need for both Readability and Coverage. Automated tests in agile are quite different from those in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation approaches don’t lead to enough Speed and Robustness to support agile development.

I hope you will find that these principles help you to reason about the automated tests in your suite.