Posts tagged ‘TDD’

This is the third post in a series about London School TDD. The first one is here, introducing the topic. The second post discusses “Outside-In Development with Double-Loop TDD”. In this post I’d like to talk about the second difference I see between Classic and London School TDD, which is to do with your style of Object Oriented Design.

“Different design styles have different techniques that are most applicable for test-driving code written in those styles, and there are different tools that help you with those techniques…

That’s what we  … designed JMock to do …
“Tell, Don’t Ask” object-oriented design.”

— Nat Pryce, in an email to a discussion forum.

That quote explains the objective Nat et al had when designing JMock, and I think it shows that London School TDD is actually a school of design as much as a testing technique. Let’s take a closer look at this way of designing objects.

Tell, Don’t Ask

“Tell, Don’t Ask” Object Oriented Design is about having Cohesive objects that hide their internal workings. If your objects obey the Law of Demeter, that’s a good start: it means they hide their inner workings and don’t talk to objects far away on the object graph. It reduces Coupling in your system, which should make for better maintainability.
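As a rough illustration of the difference, here is a minimal Python sketch. The Wallet, Customer and Paperboy classes are hypothetical names chosen for this example; they do not come from the book or from JMock.

```python
class Wallet:
    """Hypothetical example class: looks after its own balance."""

    def __init__(self, balance):
        self._balance = balance

    def debit(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    def balance(self):
        return self._balance


class Customer:
    """Hypothetical example class: hides the fact that it owns a wallet."""

    def __init__(self, wallet):
        self._wallet = wallet

    def pay(self, amount):
        # "Tell": the customer decides how to pay.
        self._wallet.debit(amount)


class Paperboy:
    def collect(self, customer, amount):
        # An "ask" style would reach through the customer instead, e.g.
        #   customer.wallet.balance -= amount
        # which couples the paperboy to an object two steps away.
        customer.pay(amount)
        return amount
```

The paperboy only talks to its immediate neighbour, so the customer is free to change how it stores or spends money without the paperboy noticing.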

In their book “Growing Object Oriented Software, Guided by Tests”, Freeman & Pryce actually define “Tell, Don’t Ask” as the same as following the Law of Demeter (p17). Then they go on with several chapters about their design style, expanding far beyond simply “following the Law of Demeter”. It’s well worth a read. Here’s a sample:

“… we focus our design effort on how the objects collaborate … obviously, we want to achieve a well-designed class structure, but we think the communication patterns between objects are more important.”

— Freeman & Pryce, GOOS, p58

Message passing vs Types with Data

So it’s basically about how you view your objects. Do you see them primarily in terms of sending and receiving messages to other objects in order to get stuff done? Or are you more focussed on the data your objects look after and the class of objects they are part of?

In the diagram below you can see an object is defined in terms of which messages it sends and receives:

london_school_008

This diagram shows the same object, but with a focus on data rather than messages:

london_school_007

If you have a “message” focus, you’ll be concerned with defining protocols and interfaces. You’ll worry about which collaborators will be needed to process a particular message. If you have a “data” focus, you’ll be interested in checking your object goes through particular state transitions. You’ll check it makes correct calculations based on its data, and hides whether the result is cached or calculated.

In my first post I talked about the three ways to verify object behaviour. In Classic TDD, the most popular way to write your assert is to check the state of the object you’re testing, or a collaborator, using a public API. This naturally leads you to design objects that are more type-oriented, with the emphasis on class relationships.

In London School TDD, the favoured way to write your assert is to use a mock and check a particular interaction happened, or in other words, a particular message was passed. This is because you favour a design where objects don’t reveal much at all about their data – your system is all about the interactions.

Dependencies and Collaborators

In a message-oriented design, it’s natural to want to specify which collaborators a particular object needs in order to get something done, and what messages it will send them. It’s part of the public specification of an object, and natural to pin down in a test case using mocks. If you instead check your object via a method that lets you query its state, it could expose details that might stop you refactoring the internals later. This leads you to prefer to check your messages, rather than state and data.

If you have a more type-oriented design, you may want to hide the fact you’re storing data across several objects, or delegating certain calculations to other objects. Those dependencies aren’t part of the public specification; what matters is the end result. If you start exposing these interactions in your test via mocks, you’ll end up with brittle tests that hinder a subsequent redistribution of responsibilities between an object and its dependencies. This leads you to prefer to check state and data, rather than interactions.
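To make the contrast concrete, here is a small Python sketch using the standard library’s unittest.mock. The Basket class, its methods and the gateway collaborator are all hypothetical, invented for this illustration. The first test has a data focus and checks a query result; the second has a message focus and checks that a collaborator was told to do something.

```python
from unittest.mock import Mock


class Basket:
    """Hypothetical example class, invented for this sketch."""

    def __init__(self, prices):
        self._prices = prices  # internal data, never exposed directly
        self._items = []

    def add(self, item):
        self._items.append(item)

    def total(self):
        # Query method: supports a state/data-focused test.
        return sum(self._prices[item] for item in self._items)

    def checkout(self, gateway):
        # Command method: tells a collaborator what to do.
        gateway.charge(self.total())


def test_total_data_focus():
    basket = Basket({"apple": 3, "pear": 4})
    basket.add("apple")
    basket.add("pear")
    assert basket.total() == 7


def test_checkout_message_focus():
    gateway = Mock()  # stands in for a payment collaborator
    basket = Basket({"apple": 3, "pear": 4})
    basket.add("apple")
    basket.add("pear")
    basket.checkout(gateway)
    gateway.charge.assert_called_once_with(7)
```

Which test is the brittle one depends on the design: if the gateway is part of the basket’s public protocol, the second test documents it; if it is an internal detail, the first test leaves more room to refactor.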

Comparing the two styles

In these articles I’ve tried to draw each style of TDD to an extreme in order to emphasize the differences. Of course, in practice, a competent developer will use the style most appropriate to the situation she finds herself in. She may use both styles while developing different pieces of the same system. In my next post, I’d like to illustrate this with a small example.

In my last post, I started talking about London School TDD, and the two features of it that I think distinguish it from Classic TDD. The first was Outside-In development with Double Loop TDD, which I’d like to talk more about in this post. The second was “Tell, Don’t Ask” Object Oriented Design. I’ll take that topic up in my next post.

Double Loop TDD

london_school_001

When you’re doing double loop TDD, you go around the inner loop on the timescale of minutes, and the outer loop on the timescale of hours to days. The outer loop tests are written from the perspective of the user of the system. They generally cover thick slices of functionality, deployed in a realistic environment, or something close to it. In my book I’ve called this kind of test a “Guiding Test”, but Freeman & Pryce call them “Acceptance Tests”. These tests should fail if something the customer cares about stops working – in other words they provide good regression protection. They also help document what the system does. (See also my article “Principles for Agile Automated Test Design”).

I don’t think Double Loop TDD is unique to the London School of TDD; I think Classic TDDers do it too. The idea is right there in Kent Beck’s first book about eXtreme Programming. What I think is different in London School is designing Outside-In, and the use of mocks to enable this.

Designing Outside-In

If you’re doing double loop TDD, you’ll begin with a Guiding Test that expresses something about how a user wants to interact with your system. That test helps you identify the top level function or class that is the entry point to the desired functionality, that will be called first. Often it’s a widget in a GUI, a link on a webpage, or a command line flag.

With London School TDD, you’ll often start your inner loop TDD by designing the class or method that gets called by that widget in the GUI, that link on the webpage, or that command line flag. You should quickly discover that this new piece of code can’t implement the whole function by itself, but will need collaborating classes to get stuff done.

london_school_003

The user looks at the system, and wants some functionality. This implies a new class is needed at the boundary of the system. This class in turn needs collaborating classes that don’t yet exist.

The collaborating classes don’t exist yet, or at least don’t provide all the functionality you need. Instead of going away and developing these collaborating classes straight away, you can just replace them with mocks in your test. It’s very cheap to change mocks and experiment until you get the interface and the protocol just the way you want it. While you’re designing a test case, you’re also designing the production code.

london_school_004

You replace collaborating objects with mocks so you can design the interface and protocol between them.
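A minimal sketch of this step in Python, using unittest.mock from the standard library. RegistrationHandler and its two collaborators (accounts and mailer) are hypothetical names for this illustration; the point is that the collaborators have no real implementation yet, and the test is where their interface gets decided.

```python
from unittest.mock import Mock


class RegistrationHandler:
    """Hypothetical class at the boundary of the system."""

    def __init__(self, accounts, mailer):
        self._accounts = accounts  # collaborator, not implemented yet
        self._mailer = mailer      # collaborator, not implemented yet

    def register(self, email):
        account_id = self._accounts.create(email)
        self._mailer.send_welcome(email, account_id)
        return account_id


def test_registering_creates_account_and_sends_welcome():
    # The mocks let us try out method names and arguments cheaply.
    accounts = Mock()
    accounts.create.return_value = 42
    mailer = Mock()

    handler = RegistrationHandler(accounts, mailer)

    assert handler.register("user@example.com") == 42
    accounts.create.assert_called_once_with("user@example.com")
    mailer.send_welcome.assert_called_once_with("user@example.com", 42)
```

If send_welcome turns out to need different arguments, only the test and the handler change at this point, because nothing real sits behind the mocks yet.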

When you’re happy with your design, and your test passes, you can move down the stack and start working on developing the implementation of one of the collaborating classes. Of course, if this class in turn has other collaborators, you can replace them with mocks and design these interactions too. This approach continues all the way through the system, moving through architectural layers and levels of abstraction.

london_school_005

You’ve designed the class at the boundary of the system, and now you design one of the collaborating classes, replacing its collaborators with mocks.

This way of working lets you break a problem down into manageable pieces, and get each part specified and tested before you move onto the next part. You start with a focus on what the user needs, and build the system from the “outside-in”, following the user interaction through all the parts of the system until the guiding test passes. The Guiding Test will not usually replace parts of the system with mocks, so when it passes you should be confident you’ve remembered to actually implement all the needed collaborating classes.

Outside-In with Classic TDD

A Classic TDD approach may work outside-in too, but using an approach largely without mocks. There are various strategies to cope with the fact that collaborators don’t exist yet. One is to start with the degenerate case, where nothing much actually happens from the user’s point of view. It’s some kind of edge case where the output is much simpler than in the normal or happy-path case. It lets you build up the structure of the classes and methods needed for a simple version of the functionality, but with basically empty implementations, or simple faked return values. Once the whole structure is there, you flesh it out, perhaps working inside-out.
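One way to picture the degenerate-case strategy is the Python sketch below. SalesReport and SalesRepository are hypothetical names for this illustration. The class structure is real production code, but the collaborator only has a faked return value, and only the “nothing happened” case is handled so far.

```python
class SalesRepository:
    """Hypothetical collaborator with a faked implementation in production code."""

    def sales_for(self, month):
        return []  # fake: to be replaced by a real query later


class SalesReport:
    """Hypothetical class nearer the user, built first."""

    def __init__(self, repository):
        self._repository = repository

    def render(self, month):
        sales = self._repository.sales_for(month)
        if not sales:
            return "No sales recorded"
        # The happy path is fleshed out by later tests.
        raise NotImplementedError("non-empty case not implemented yet")


def test_report_for_month_without_sales():
    report = SalesReport(SalesRepository())
    assert report.render("2013-01") == "No sales recorded"
```

No mocks are involved: the test runs against the real classes, which simply don’t do much yet.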

Another way to do this in Classic TDD is to start writing the tests from the outside-in, but when you discover you need a collaborating class to be implemented before the test will pass, comment out that test and move down to work on the collaborator instead. Eventually you find something you can implement with collaborators that already exist, then work your way up again.

A Classic TDD approach will often just not work outside-in at all. You start with one of the classes nearer the heart of the system. You’ll pick something that can be fully implemented and tested using collaborating classes that already exist. Often it’s a class in the central domain model of the application. When that is done, you continue to develop the system from the heart towards the outside, adding new classes that build on one another. Because you’re using classes that already exist, there is little need for using mocks. Eventually you find you’ve built all the functionality needed to get the Guiding Test to pass.

Pros and Cons

I think there’s a definite advantage to working outside-in: it keeps your focus on what the user really needs, and helps you to build something useful, without too much gold-plating. I think it takes skill and discipline to work this way with either Classic or London School. It’s not easy to see how to break down a piece of functionality into incremental pieces that you can develop and design step-by-step. If you work from the heart outwards, there is a danger you’ll build more than you need for what the user wants, or that you’ll get to the outside, discover it doesn’t “fit”, and have to refactor your work.

Assuming you are working outside-in, though, one difference seems to me to be in whether you write faked implementations in the actual production code, or in mocks. If you start with fakes in the production code, you’ll gradually replace them with real functionality. If you put all the faked functionality into mocks, they’ll live with the test code, and remain there when the real functionality is implemented. This could be useful for documentation, and will make your tests continue to execute really fast.

Having said that, there is some debate about the maintainability of tests that use a lot of mocks. When the design changes, it can be prohibitive to update all the mocks as well as the production code. Once the real implementations are done, maybe the inner-loop tests should just be deleted? The Guiding Test could provide all the regression protection you need, and maybe the tests that helped you with your original design aren’t useful to keep? I don’t think it’s clear-cut actually. From talking to London School proponents, they don’t seem to delete all the tests that use mocks. They do delete some though.

I’m still trying to understand these issues and work out in what contexts London School TDD brings the most advantage. I hope I’ve outlined what I see as the differences in way of working with outside-in development. In my next post I look at how London School TDD promotes “Tell, Don’t Ask” Object Oriented Design.

Recently I’ve become quite interested in the London School of TDD. I blogged before about my experiences doing Luca Minudel’s exercises, in my post “SOLID Principles and TDD”. Since I wrote that, I’ve read Steve Freeman and Nat Pryce’s book “Growing Object Oriented Software, Guided by Tests” and practiced doing some code katas in this style. In my experience there is a lot of confusion around how to use Mocks, and I found it enlightening to see how the people who invented the technique actually use them.

My current thinking is that there are at least these two areas where London School TDD differs from Classic TDD:

  • Outside-In development with Double-Loop TDD
  • “Tell, Don’t Ask” Object Oriented Design

London School practitioners use Mock Objects as a tool for achieving both. Let’s look a little more at what Mocks are for.

Verifying behaviour

A test case often has three parts: “Arrange – Act – Assert”. In the second edition of his book The Art of Unit Testing, Roy Osherove points out that in the “Assert” part of a test there are three ways to ensure the class you’re testing is behaving correctly. After “Arrange” and “Act”, you can:

  • Check the return value, or an exception.
  • Check the state of the object, or the state of a collaborator.
  • Check the object correctly interacts with a collaborator.

This last form of assertion is generally done using a Mock Object. With an ordinary Mock you set it up in advance to check for a particular interaction; with a Spy, you check after the fact. In either case, you’re asserting an interaction happens correctly. You’re checking a particular object received a particular method call, and you can be more or less strict about the precise details of arguments and numbers of invocations.
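The three kinds of assertion can be shown side by side in a short Python sketch. The Alarm class, its threshold and the siren collaborator are hypothetical, made up for this example. Note that Python’s unittest.mock works in the Spy style: you exercise the code first and assert on the recorded calls afterwards.

```python
from unittest.mock import Mock


class Alarm:
    """Hypothetical class under test; the threshold is an arbitrary assumption."""

    LOW_PRESSURE = 17

    def __init__(self, siren):
        self._siren = siren
        self.is_on = False

    def check(self, pressure):
        if pressure < self.LOW_PRESSURE:
            self.is_on = True
            self._siren.sound()
        return self.is_on


def test_low_pressure_triggers_alarm():
    # Arrange
    siren = Mock()
    alarm = Alarm(siren)

    # Act
    result = alarm.check(pressure=10)

    # Assert, in three different ways
    assert result is True                  # 1. check the return value
    assert alarm.is_on                     # 2. check the state of the object
    siren.sound.assert_called_once_with()  # 3. check the interaction
```

In a real test you would normally pick one of these rather than all three; they are combined here only to compare them.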

In Classic TDD, whenever possible you check a return value or exception. If you’re testing a void method, then you usually take the second option and check state. Only if the other options are really unattractive do you ever turn to using a mock. It’s the last choice.

With London School TDD, the option of using a mock is chosen much more often. You’ll still check return values or object states where that makes sense, but using a mock is often an attractive option. This is because using a mock helps you both to develop your system Outside-In, and to design your objects in a “Tell, Don’t Ask” manner.

Mocks used badly

I think one reason that using a mock object is often the last choice in classic TDD is that it’s so easy to get into trouble when using them. You should be using mocks to help you improve your design, but all too often the design is bad, and the mocks are either hiding that, or getting in the way. I sometimes see tests with an enormous “Arrange” part, specifying half a dozen different mocks before they’ve even started calling any functionality. Such a test is bound to be brittle, and could hinder your refactoring to a better design.

The article “Eliminate most Mocks from Unit Tests” by Arlo Belshee gives an example of using a mock to compensate for bad design, and he has several other articles in the series. I think Arlo is largely criticising poor use of mocks actually, rather than London School TDD itself.

It seems to me that you can abuse any technique, and Object Oriented Design is actually very difficult. Steve Freeman has said “No tool nor technique can survive inadequately trained developers”. London School TDD is a design technique that is not easy to master. I talk more about this in my next posts “Outside-In Development with Double Loop TDD”, and “Tell, Don’t Ask” Object Oriented Design.

I’ve been working on this Kata “Gilded Rose” at a few different coding dojos lately. There is even a video of a session I did at the “Tampere Goes Agile” conference recently. In the video, you can see me talking about my Principles of Agile Test Automation, which I have just written about, and updated in my last blog post.

I think these test automation principles are useful to think about when you’re doing the Gilded Rose kata. The basic plot of the Kata is that you’ve just been hired to look after an existing system, and the customer wants a new feature. Having a look at the code, you can see you’re going to want to refactor it a little before adding the new feature, and before you do that, you’re going to want some automated tests.

So the first part of the Kata is to add automated tests to the existing code. You’ve got a requirements document the customer has given you, and you can use it to identify test cases. You’ve also got the code which you can read and execute and work out what it does. The customer is happily using the code in production right now, so you can assume that the behaviour it has is the behaviour they want to keep, whatever it says in the requirements document. (hint!)

Warning – spoilers lie ahead! You should probably try the Gilded Rose kata for yourself before reading on!

When I’ve done this exercise with various groups, I’ve spent a lot of time discussing with people how to make their test cases really readable, and express the requirements clearly, and at the same time useful as regression protection when refactoring the code later.

When you design a test suite you have two main aims – to help you understand what the code should do (and what it does now), and to protect you from regression failures when you update it. It can be a bit tricky to do both with the same test suite. If you focus solely on describing the requirements in an executable way, you tend to miss edge cases and there are gaps in the regression protection. If you focus only on regression protection, you’ll spend time analysing the edge cases, and measuring code coverage to see how well you’re doing, but the test cases can become quite hard to read and understand.

You can see for yourself by comparing this test case by Bobby Johnson with this text-based approval test. (It was written by several people at a GothPy meeting). Bobby’s test case is extremely readable and expresses the requirements clearly. He’s done pretty well on the edge cases, but I think he’s missing one or two*. With the text-based approval tests, it’s not so easy to understand what the underlying business rules are, although the regression protection is very good.
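For readers who haven’t seen one, the shape of a text-based approval test is roughly as in the Python sketch below. This is not the GothPy test, and update_quality here is a deliberately simplified stand-in that does not implement the real Gilded Rose rules; the item names, cases and helper functions are assumptions made for the illustration.

```python
def update_quality(name, sell_in, quality):
    """Simplified stand-in for the legacy code; NOT the real kata rules."""
    if name == "Aged Brie":
        quality = min(50, quality + 1)
    else:
        quality = max(0, quality - 1)
    return sell_in - 1, quality


def describe(cases):
    """Run every case through the code and render the results as plain text."""
    lines = []
    for name, sell_in, quality in cases:
        new_sell_in, new_quality = update_quality(name, sell_in, quality)
        lines.append(f"{name}, {sell_in}, {quality} -> {new_sell_in}, {new_quality}")
    return "\n".join(lines)


# Many inputs, including edge cases, are cheap to add.
CASES = [
    ("Aged Brie", 2, 49),
    ("Aged Brie", 2, 50),
    ("Elixir", 5, 0),
    ("Elixir", 5, 7),
]

# Captured once from the existing code and "approved" by a human.
APPROVED = """\
Aged Brie, 2, 49 -> 1, 50
Aged Brie, 2, 50 -> 1, 50
Elixir, 5, 0 -> 4, 0
Elixir, 5, 7 -> 4, 6"""


def test_output_matches_approved_text():
    assert describe(CASES) == APPROVED
```

Any behaviour change shows up as a text difference, which is why the regression protection is strong. The business rules, though, are only implicit in the approved output, which is the readability cost described above.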

When I do this kata with a group, we spend some time discussing the various test cases we’ve come up with, and showing them on the projector. When we did this last week at the Booster Conference, people commented that showing these different test cases had given them a better understanding of “readability” and “regression protection”, and many went on to improve their test suites.

Once you’re reasonably happy with your test suite, the next task is to do the refactoring and add the new feature. How useful are your test cases for regression protection? It’s very easy to make refactoring mistakes in this kata, and you will be testing your tests! You may discover while refactoring that there are more test cases that you want to add. Version control can be pretty useful, so you can run the new test cases against the original code.

There’s also an interesting restriction on your refactoring options – the “Item” class is owned by a nasty-sounding goblin and he doesn’t want you to change his code, so if you do, you have to be prepared for some serious consequences! When comparing refactored solutions at the end of the dojo, this is often an interesting discussion point – did you change the Item class? Is your new design so great that you’re prepared to argue with the goblin for it?!

I haven’t tried this, but I would actually like to try running the text-based approval test against all the refactored solutions at the end of the coding dojo, as input to the retrospective. I think this test covers all the edge cases very well, and would reveal any refactoring mistakes that were not caught by the tests people had developed themselves. That would be interesting feedback to have!

If you haven’t tried the Gilded Rose kata yourself, I do recommend it for practicing writing good test cases. I’d be happy to get a pull request from you if you want to translate the exercise into your favourite programming language, or you can do it in the original C#, as Bobby suggests.

If you’re interested in taking part in a coding dojo with me, I’ll be at several conferences later this year: ACCU in Bristol, XP2013 in Vienna and Test Automation Day in the Netherlands.

* I believe he’s missing a check that the quality of backstage passes doesn’t increase past 50

I’ve previously written about Agile test automation principles, and since then I’ve had some interesting discussions with people that have led me to revise them in this article. In particular, Seb Rose wrote about his 6 principles of unit testing and pointed out some issues with mine. So this article is an update on the previous one, and I’m hoping this will spark further interesting discussions!

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

I was not the first to discover that developers design automated functional tests for two main purposes. Initially we design them to help clarify our understanding of what to build. In fact, at that point they’re not really tests, we usually call them scenarios, or examples. Later, the main purpose of the tests becomes to detect regression errors, although we continue to use them to document what the system does.

When you’re designing a functional test suite, you’re trying to support both aims, and sometimes you have to make tradeoffs between them. You’re also trying to keep the cost of writing and maintaining the tests as low as possible, and as with most software, it’s the maintenance cost that dominates. Over the years I’ve begun to think in terms of four principles that help me to design functional test suites that make good tradeoffs and identify when a particular test case is fit for purpose.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful, neither for understanding what the system does, nor for identifying regression errors. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong and you will rewrite the test to check for something else, or simply delete it.

Robustness

When a test fails, it means there is a regression error, (functionality is broken), or the system has changed and the tests no longer document the correct behaviour. You need to take action to correct the system or update the test, and this is as it should be. If however, the test has failed for no good reason, you have a problem: a fragile test.

There are many causes of fragile tests. For example tests that are not isolated from one another, duplication between test cases, and dependencies on random or threaded code. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.
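As one concrete case, an isolation problem often comes from state shared between tests. The Python sketch below uses the standard unittest module; REGISTRY and register are hypothetical names for this illustration. Without the setUp method, whichever of the two tests runs second would fail, even though each passes when run by itself.

```python
import unittest

# Module-level state shared by every test: the source of the isolation problem.
REGISTRY = []


def register(name):
    """Hypothetical function under test: returns the new member's number."""
    REGISTRY.append(name)
    return len(REGISTRY)


class RegistrationTests(unittest.TestCase):
    def setUp(self):
        # Reset the shared state before each test so that the tests
        # do not depend on the order in which they are run.
        REGISTRY.clear()

    def test_first_member_gets_number_one(self):
        self.assertEqual(register("alice"), 1)

    def test_another_first_member_also_gets_number_one(self):
        self.assertEqual(register("bob"), 1)
```

Resetting in setUp is only one option; avoiding the shared state altogether, for example by passing the registry in, usually makes for a more robust design.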

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

Speed

As an agile developer you run your test suite frequently: (a) every time you build the system, (b) before you check in changes, and (c) after check-in in an automated Continuous Integration environment. I recommend time limits of 2 minutes for (a), 10 minutes for (b), and 60 minutes for (c). This fast feedback gives you the best chance of actually being willing to run the tests, and to find defects when they’re cheapest to fix, soon after insertion.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime you’ll continue developing and making other changes. The diagnose-and-fix loop gets longer and the tests become less likely to ever all pass at the same time. This can become pretty demoralizing.

Updatability

When the needs of the users change, and the system is updated, your tests also need to be updated in tandem. It should be straightforward to identify which tests are affected by a given change, and quick to update them all.

If your tests are not easy to update, they will likely get left behind as the system moves on. Faced with a small change that causes thousands of failures and hours of work to update them all, you’ll likely delete most of the tests.

Following these four principles implies Maintainability

Taken all together, I think how well your tests adhere to these principles will determine how maintainable they are, or in other words, how much they will cost. That cost needs to be in proportion to the benefits you get: helping you understand what the system does, and regression protection.

As your test suite grows, it becomes ever more challenging to adhere to all the principles. Readability suffers when there are so many test cases you can’t see the forest for the trees. The more details of your system that you cover with tests, the more likely you are to have Robustness problems – tests that fail when these details change. Speed obviously also suffers – the time to run the test suite usually scales linearly with the number of test cases. Updatability doesn’t necessarily get worse as the number of test cases increases, but it will if you don’t adhere to good design principles in your test code, or lack tools for bulk update of test data for example.

I think the principles are largely the same whether you’re writing skinny little unit tests or fatter functional tests that touch more of the codebase. My experience tells me that it’s a lot easier to be successful with unit tests. As the testing thickness increases, the feedback cycle gets slower, and your mistakes are amplified. That’s why I concentrate on teaching these principles through unit testing exercises. Once you understand what you’re aiming for, you can transfer your skills to functional tests.

How can you use these principles?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop. If I’m reviewing someone else’s test cases, I can point to code and say which principles it’s not following, and give them concrete advice about how to make improvements. We can have a discussion for example about whether to add more test cases in order to improve regression protection, and how to do that without reducing overall readability.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Replace some of the thicker, slower end-to-end tests with lots of skinny fast unit tests, (may reduce regression protection)
  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed the same as you would production code (may affect Readability)
  • Use more fakes to replace slow parts of the system (may reduce regression protection)
  • Identify key test cases for essential functionality and remove the other test cases. (sacrifice regression protection to get Speed)

Strategic Decisions

The principles also help me when I’m discussing automated testing strategy, and choosing testing tools. Some tools have better support for updating test cases and test data. Some allow very Readable test cases. It’s worth noting that automated tests in agile are quite different from in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation tools don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles help you to reason about your strategy and tools for functional automated testing, and to design more maintainable, useful test cases.

Images Attribution: DaPino Webdesign, Lebreton, Asher Abbasi, Woothemes, Iconshock, Andy Gongea, FatCow from www.iconspedia.com