Posts tagged ‘TDD’

For a little while now I’ve been collecting Refactoring Kata exercises in a github repo, (you’re welcome to clone it and try them out). I’ve recently facilitated working on some of these katas at various coding dojo meetings, and participants seem to have enjoyed doing them. I usually give a short introduction about the aims of the dojo and the refactoring skills we’re focusing on, then we split into pairs and work on one of these Refactoring Katas for a fixed timebox. Afterwards we compare designs and discuss what we’ve learnt in a short retrospective. It’s satisfying to take a piece of ugly code and after only an hour or so make it into something much more readable, flexible, and significantly smaller.

Test Driven Development is a multifaceted skill, and one aspect is the ability to improve a design incrementally, applying simple refactorings one at a time until they add up to a significant design improvement. I’ve noticed in these dojo meetings that some pairs do better than others at refactoring in small steps. I can of course stand behind them and coach to some extent, but I was wondering if we could use a tool that would watch more consistently, and help pairs notice when they are refactoring poorly.

I spent an hour or so doing the GildedRose refactoring kata myself in Java, and while I was doing it I had two different monitoring tools running in the background, the “Codersdojo Client” from http://content.codersdojo.org/codersdojo_client/, and “Sessions Recorder” from Industrial Logic. (This second tool is commercial and licenses cost money, but I actually got given a free license so I could try it out and review it). I wanted to see if these tools could help me to improve my refactoring skills, and whether I could use them in a coding dojo setting to help a group.

Setting it up and recording your Kata
The Codersdojo Client is a ruby gem that you download and install. When you want to work on a kata, you have to fiddle about a bit on the command line getting some files in the right places, (like the junit jar), then modify and run a couple of scripts. It’s not difficult if you follow the instructions and know basically how to use the command line. You have a script running all the time you are coding, and it runs the tests every time you save a source file.

The Sessions Recorder is an Eclipse plugin that you download and install in the same way as other Eclipse plugins. It puts a “record” button on your toolbar. You press “record” before you start working on the kata.

Uploading the Kata for analysis
When you’ve finished the kata, you need to upload your recording for analysis. With the Codersdojo Client, when you stop the script, it gives you the option of doing the upload. When that’s completed it gives you a link to a webpage, where you can fill in some metadata about the Kata and who you are. Then it takes you to a page with the full final code listing and analysis.

The Sessions Recorder is similar. You press the button on the Eclipse toolbar to stop recording, and save the session in a file. Then you go to the Industrial Logic webpage, log into your account, and go to the page where you upload your recorded Session file. You don’t have to enter any metadata, since you have an account and it remembers who you are, (you did pay for this service after all!) It then takes you to a page of analysis.

Codersdojo Client Analysis
The codersdojo client creates a page that you can make public if you want – mine is available here. It gives you a graph like this:

[Screenshot: Codersdojo Client graph showing the length of each “move” and whether the tests were red or green]

It shows how long you spent between saving the file, (a “move”), and whether the tests were red or green when you saved. There are also some general statistics about how long you spent in each move, and how many modifications there were on average. It points out your three longest moves, and has links to them so you can see what you were doing in the code at those points.

I think this analysis is quite helpful. I can see that I’m going no more than two or three minutes between saving the file, and usually if the tests go red I fix them quickly. Since it’s a refactoring kata I spend quite a lot of moves at the start where it’s all green, as I build up tests to cover the functionality. In the middle there is a red patch, and this is a clear sign to me that I could have done that part of the kata better. Looking over my code I was doing a major redesign and I should have done it in a better way that would have kept the tests running in the meantime.

Towards the end of the kata I have another flurry of red moves, as I start adding new functionality for “Conjured” items. I tried to move into a more normal TDD red-green-refactor cycle at that point, but it actually doesn’t look like I succeeded very well from this graph. I think I rushed past the “green” step without running the tests, then did a big refactoring. It worked in the end but I think I could have done that better too.

Sessions Recorder Analysis
The Sessions Recorder produces a page which is personal to me, and I don’t think it allows me to share it publicly on the web. On the page is a graph that looks like this:

[Screenshot: Sessions Recorder graph showing passing and failing tests over time, with a score line]

As you can see it also shows how long I spend with passing and failing tests, in a slightly different way from the Codersdojo Client’s graph. It also distinguishes compiler errors from failing tests, (pink vs red).

This graph also clearly shows the areas I need to improve – the long pink patch in the middle where I do a major redesign in too large a step, and the red bit at the end where I’m not doing TDD all that well.

The line on the graph is a “score”, which awards me points when I successfully perform the moves of TDD. Further down the page it gives me a list of the “events” this score is based on:

[Screenshot: list of scored events from the Sessions Recorder]

(This is just some of the events, to show you the kinds of things this picks up on.) “New Green Test” seems to score zero points, which is a bit disappointing, but adding a failing test gets a point, and so does making it pass. “Went green, but broke other tests” gets zero points. It’s clearly designed to help me successfully complete red-green-refactor cycles, not reward me for adding test coverage to existing code, then refactoring it.

There is another graph, more focused on the tests:

[Screenshot: Sessions Recorder graph focused on the tests, with red dots for compilation errors and green dots for passing tests]

This graph has mouseover texts, so when you hover over a red dot it shows all the compilation errors you had at that point, and if you hover over a green dot it tells you which tests were passing. It also distinguishes “compiler errors” from a “compiler rash”. The difference is that a “compiler rash” is a more serious compilation problem that affects several files.

You can clearly see from this graph that in the first part of the kata I was building up the test coverage, then just leaning on those tests and refactoring for the rest. It hasn’t noticed that I had two @Ignore’d tests until the last few minutes though. (I added failing tests for Conjured Items near the start, then left them until I had the design suitably refactored near the end).

I actually found this graph quite hard to use to work out what I need to improve on. There seem to be three long gaps in the middle, full of compilation errors where I wasn’t running the tests. Unlike with the Codersdojo Client, there isn’t a link to the actual code changes I was making at those points. I’m having trouble working out just from the compiler errors what I should have been doing differently. I think one of these gaps is the same major redesign I could see clearly in the Codersdojo Client graph as a too big step, but I’m not so sure what the other two are.

There are further statistics and analysis as well. There is a section for “code smells”, but it claims not to have found any. The code I started with should surely qualify as smelly? I’m not sure why it hasn’t picked up on that.

Conclusions
I think both tools could help me to become better at Test Driven Development, and could be useful in a dojo setting. I can imagine pairs comparing their graphs after completing the kata, discussing how they handled the refactoring steps, and where the design ended up. Pairs could swap computers and look through someone else’s statistics to get some comparison with their own.

The Codersdojo Client is free to use, works with a large number of programming languages, and supports any editor. You do have to be comfortable with the command line though. The Sessions Recorder tool only supports Java and C# via Eclipse. It has more detailed analysis, but for this Refactoring Kata I don’t think it was as helpful as it could have been.

The other big difference between the tools is about openness. The Sessions Recorder keeps your analysis private to you, and if you want to discuss your performance, it lets you do so with the designers of the tool via a “comment on this page” function. I haven’t tried that out yet, so I’m not sure how it works – that is, whether you get feedback from a real person as well as the tool.

The Codersdojo Client also lets you keep your analysis private if you want, but in addition lets you publish your Kata performance for general review, as I have done. You can share your desire for feedback on twitter, g+ or facebook. People can go in and comment on specific lines of code and make suggestions. That wouldn’t be so necessary during a dojo meeting, but might be useful if you were working alone.

Further comparison needed
On another occasion I tried out the Sessions Recorder on a normal TDD kata, and found the analysis much better. For example this graph of me doing the Tennis kata from scratch:

[Screenshot: Sessions Recorder graph for a Tennis kata session, showing small red-green steps and a steadily rising score]

This shows a clear red-green pattern of small steps, and steadily increasing score rewarding me for doing TDD correctly. Unfortunately I didn’t do a Codersdojo Client session at the same time as this one, for comparison. A further blog post is clearly needed for this case… 🙂

Please note – As of March 2013, I have rewritten this post in the light of further experience and discussions. The updated post is available here.

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

Often, I’ve written functional tests as regression tests, after the functionality is supposed to work. In other situations, I’ve been able to write these kinds of tests in advance, as part of an ATDD, or BDD process. In either case, I’ve found the regression tests you end up with need to have certain properties if they’re going to be useful in an agile environment moving forward. I think the same properties are needed for good agile functional tests as for good unit tests, but it’s much harder. Your mistakes are amplified as the scope of the test increases.

I’d like to outline four principles of agile test automation that I’ve derived from my experience.

Coverage

If you have a test for a feature, and there is a bug in that feature, the test should fail. Note I’m talking about coverage of functionality, not code coverage, although these concepts are related. If your code coverage is poor, your functionality coverage is likely also to be poor.

If your tests have poor coverage, they will continue to pass even when your system is broken and its functionality unusable. This can happen if you have missed out needed test cases, or when your test cases don’t properly check what the system actually did. The consequence of poor coverage is that you can’t refactor with confidence, and need to do additional (manual) testing before release.

The aim for automated regression tests is good Coverage: If you break something important and no tests fail, your test coverage is not good enough. All the other principles are in tension with this one – improving Coverage will often impair the others.
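To make the difference concrete, here is a small hypothetical JUnit sketch (the Invoice class and its 25% tax rule are invented for this example, not taken from any real project). The first test exercises the code but would stay green even if the tax calculation were completely wrong; only the second pins down the expected behaviour.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Hypothetical example: an Invoice with a 25% tax rule, used to contrast
// weak and strong functionality coverage.
public class CoverageExampleTest {

    static class Invoice {
        private double net = 0;
        void addLine(String description, double amount) { net += amount; }
        double total() { return net * 1.25; }
    }

    // Weak test: it exercises the code but checks almost nothing,
    // so a bug in the tax calculation would never make it fail.
    @Test
    public void totalDoesNotCrash() {
        Invoice invoice = new Invoice();
        invoice.addLine("Book", 100.0);
        invoice.total();
    }

    // Better functionality coverage: the expected result is asserted.
    @Test
    public void totalIncludesTax() {
        Invoice invoice = new Invoice();
        invoice.addLine("Book", 100.0);
        assertEquals(125.0, invoice.total(), 0.001);
    }
}
```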

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong, and you will rewrite the test to check for something else, or simply delete it.

As you improve Coverage, you will likely add more and more test cases. Each one may be fairly readable on its own, but taken all together it can become hard to navigate and get an overview.

Robustness

When a test fails, it means the functionality it tests is broken, or at least is behaving significantly differently from before. You need to take action to correct the system or update the test to account for the new behaviour. Fragile tests are the opposite of Robust: they fail often for no good reason.

Aspects of Robustness you often run into are tests that are not isolated from one another, duplication between test cases, and flickering tests. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.
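As a minimal sketch of the isolation problem (the class and names are invented for illustration): two JUnit tests that share mutable static state will pass or fail depending on which order they happen to run in, which is exactly the kind of fragility that erodes trust in a suite.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of a test isolation problem: both tests share
// mutable static state, so the second test passes on its own but can fail
// when run in a suite, depending on execution order.
public class IsolationProblemTest {

    static List<String> registeredUsers = new ArrayList<>();

    @Test
    public void registeringAUserAddsThemToTheList() {
        registeredUsers.add("alice");
        assertEquals(1, registeredUsers.size());
    }

    @Test
    public void listIsEmptyToStartWith() {
        // Passes when run alone, fails if the test above has already run.
        assertEquals(0, registeredUsers.size());
    }
}
```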

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

As you improve Coverage you’ll want to add more checks for details of your system. This will give your tests more and more reasons to fail.

Speed

As an agile developer you run the tests frequently. Both (a) every time you build the system, and (b) before you check in changes. I recommend time limits of 2 minutes for (a) and 10 minutes for (b). This fast feedback gives you the best chance of actually being willing to run the tests, and to find defects when they’re cheapest to fix.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime someone else has checked in other changes, and the new run is not green either. You’re developing all the while the tests are running, and they never quite catch up. This can become pretty demoralizing.

As you improve Coverage, you add more test cases, and this will naturally increase the execution time for the whole test suite.

How are these principles useful?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed, just as you would production code (may affect Readability)
  • Push tests down to a lower level of granularity where they can execute faster (may reduce Coverage and/or increase Readability)
  • Identify key test cases for essential functionality and remove the other test cases (sacrifices Coverage to gain Speed)

Explaining these principles can promote useful discussions with people new to agile, particularly testers. The test suite is a resource used by many agile team members – developers, analysts, managers etc – in its role as “Living Documentation” for the system (see Gojko Adzic‘s writings on this). This emphasizes the need for both Readability and Coverage. Automated tests in agile are quite different from those in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation approaches don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles will help you to reason about the automated tests in your suite.

Programmers have a vested interest in making sure the software they create does what they think it does. When I’m coding I prefer to work in the context of feedback from automated tests, that help me to keep track of what works and how far I’ve got. I’ve written before about Test Driven Development, (TDD). In this article I’d like to explain some of the main features of Text-Based Testing. It’s a variant on TDD, perhaps more suited to the functional level than unit tests, and which I’ve found powerful and productive to use.


The basic idea
You get your program to produce a plain text file that documents all the important things that it does. A log, if you will. You run the program and store this text as a “golden copy” of the output. You create from this a Text-Based Test with a descriptive name, any inputs you gave to the program, and the golden copy of the textual output.

You make some changes to your program, and you run it again, gathering the new text produced. You compare the text with the golden copy, and if they are identical, the test passes. If there is a difference, the test fails. If you look at the diff and you like the new text better than the old text, you update your golden copy, and the test is passing once again.
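Just to make the idea concrete, here is a minimal sketch of the comparison step. The file names and the manual “copy over the golden file” step are my own invention for illustration, not how any particular tool does it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Minimal sketch of the golden copy idea, assuming the program under test
// writes its log to "output.log" and the approved version lives in
// "golden/output.log". Illustrative only.
public class GoldenCopyCheck {

    public static void main(String[] args) throws IOException {
        Path golden = Paths.get("golden", "output.log");
        Path actual = Paths.get("output.log");

        String expected = new String(Files.readAllBytes(golden));
        String produced = new String(Files.readAllBytes(actual));

        if (expected.equals(produced)) {
            System.out.println("Test passed: output matches the golden copy.");
        } else {
            System.out.println("Test failed: output differs from the golden copy.");
            System.out.println("Inspect the diff; if the new text is better, overwrite the golden copy:");
            System.out.println("  cp output.log golden/output.log");
        }
    }
}
```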

Tool Support
Text-Based Testing is a simple idea, and in fact many people do it already in their unit tests. AssertEquals(String expected, String actual) is actually a form of it. You often create the “expected” string based on the actual output of the program, (although purists will write the whole assert before they execute the test).

Most unit test tools these days give you a nice diff even on multi-line strings. For example:

[Screenshots: a failing multi-line string comparison and the diff shown by the IDE]

Which is a failing text-based test using JUnit.
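For readers without the screenshots, a comparable (invented) example looks something like this. When the assertion fails, JUnit reports a ComparisonFailure, and most IDEs will show the two strings side by side as a diff.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// A small illustration of a text-based check expressed as an assertEquals
// on a multi-line string. The class and the "receipt log" are invented.
public class ReceiptLogTest {

    @Test
    public void receiptLogMatchesGoldenCopy() {
        String expected = "Order received: 2 x Book\n"
                        + "Applying discount: 10%\n"
                        + "Total charged: 18.00\n";

        String actual = produceReceiptLog();

        assertEquals(expected, actual);
    }

    // Stand-in for the real program output.
    private String produceReceiptLog() {
        return "Order received: 2 x Book\n"
             + "Applying discount: 10%\n"
             + "Total charged: 20.00\n";  // deliberately different, so the test fails with a diff
    }
}
```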

Once your strings get very long, to the scale of whole log files, even multi-line diffs aren’t really enough. You get datestamps, process ids and other stuff that changes every run, hashmaps with indeterminate order, etc. It gets tedious to deal with all this on a test-by-test basis.

My husband, Geoff Bache, has created a tool called “TextTest” to support Text-Based testing. Amongst other things, it helps you organize and run your text-based tests, and filter the text before you compare it. It’s free, open source, and of course used to test itself. (Eats own dog food!) TextTest is used extensively within Jeppesen Systems, (Geoff works for them, and they support development), and I’ve used it too on various projects in other organizations.

In the rest of this article I’ll look at some of the main implications of using a Text-Based Testing approach, and some of my experiences.

Little code per test
The biggest advantage of the approach, is that you tend to write very little unique code for each test. You generally access the application through a public interface as a user would, often a command line interface or (web)service call. You then create many tests by for example varying the command line options or request contents. This reduces test maintenance work, since you have less test code to worry about, and the public API of your program should change relatively infrequently.
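To make that concrete, here is a small invented sketch: three test cases that differ only in the command line options passed to a single public entry point. The Greeter class is a stand-in for a real application, not anything from the article.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;
import java.util.Arrays;

// Sketch of the "little code per test" idea: every test case goes through
// the same public entry point, and only the command line options vary.
public class CommandLineVariationsTest {

    static class Greeter {
        static String run(String... args) {
            boolean formal = Arrays.asList(args).contains("--formal");
            String name = "world";
            for (int i = 0; i < args.length - 1; i++) {
                if (args[i].equals("--name")) name = args[i + 1];
            }
            return (formal ? "Good day" : "Hello") + ", " + name + "!";
        }
    }

    // One tiny helper, reused by every test case.
    private void check(String expected, String... args) {
        assertEquals(expected, Greeter.run(args));
    }

    @Test public void defaultGreeting() { check("Hello, world!"); }
    @Test public void namedGreeting()   { check("Hello, Anna!", "--name", "Anna"); }
    @Test public void formalGreeting()  { check("Good day, Anna!", "--name", "Anna", "--formal"); }
}
```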

Legacy code
Text-Based Testing is obviously a regression testing technique. You’re checking the code still does what it did before, by checking the log is the same. So these tests are perfect for refactoring. As you move around the code, the log statements move too, and your tests stay green, (so long as you don’t make any mistakes!) In most systems, it’s cheap and risk-free to add log statements, no matter how horribly gnarly the design is. So text-based testing is an easy way to get some initial tests in place to lean on while refactoring. I’ve used it this way fairly successfully to get legacy code under control, particularly if the code already produces a meaningful log or textual output.


No help with your design
I just told you how good Text-Based Testing is with Legacy code. But actually these tests give you very little help with the internal design of your program. With normal TDD, the activity of creating unit tests at least forces you to decompose your design into units, and if you do it well, you’ll find these tests giving you all sorts of feedback about your design. Text-Based tests don’t. Log statements don’t care if they’re in the middle of a long horrible method or if they’re spread around several smaller ones. So you have to get feedback on your design some other way.

I usually work with TDD at the unit level in combination with Text-Based tests at the functional level. I think it gives me the best of both worlds.

Log statements and readability
Some people complain that log statements reduce the readability of their code and don’t like to add any at all. They seem to be out of fashion, just like comments. The idea is that all the important ideas should be expressed in the class and method names, and logs and comments just clutter up the important stuff. I agree to an extent, you can definitely over-use logs and comments. I think a few well placed ones can make all the difference though. For Text-Based Testing purposes, you don’t want a log that is megabytes and megabytes of junk, listing every time you enter and leave every method, and the values of every variable. That’s going to seriously hinder your refactoring, apart from being a nightmare to store and update.

What we’re talking about here is targeted log statements at the points when something important happens, that we want to make sure should continue happening. You can think about it like the asserts you make in unit tests. You don’t assert everything, just what’s important. In my experience less than two percent of the lines of code end up being log statements, and if anything, they increase readability.
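For example, a targeted log statement might look like the single line below: one concise, business-level record at the point where something important happens. This sketch assumes slf4j is on the classpath, and the PaymentService class is invented for illustration.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch of a "targeted" log statement: one line where something important
// happens, rather than tracing every method entry and exit.
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    public void charge(String customerId, double amount) {
        // ... validation and gateway call would go here ...

        // This is the kind of line a text-based test would lock down:
        // a concise record of the business-level event, not implementation detail.
        log.info("Charged customer {} amount {}", customerId, String.format("%.2f", amount));
    }
}
```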

Text-Based tests are completed after the code
In normal TDD you write the test first, and thereby set up a mini pull system for the functionality you need. It’s lean, it forces you to focus on the problem you’re trying to solve before you solve it, and it starts giving you feedback before you commit to an implementation. With Text-Based Testing, you often find it’s too much work to specify the log up front. It’s much easier to wait until you’ve implemented the feature, run the test, and save the log afterwards.

So your tests usually aren’t completed until after the code they test, unlike in normal TDD. Having said that, I would argue that you can still do a form of TDD with Text-Based Tests. I’d normally create half the test before the code: I name the test, and find suitable inputs that should provoke the behaviour I need to implement in the system. The test will fail the first time I run it. In this way I think I get many of the benefits of TDD, but only actually pin down the exact assertion once the functionality is working.

“Expert Reads Output” Antipattern
If you’re relying on a diff in the logs to tell you when your program is broken, you had better have good logs! But who decides what to log? Who checks the “golden copy”? Usually it is the person creating the test, who should look through the log and check everything is in order the first time. Of course, after a test is created, every time it fails you have to make a decision whether to update the golden copy of the log. You might make a mistake. There’s a well known antipattern called “Expert Reads Output” which basically says that you shouldn’t rely on having someone check the results of your tests by eye.

This is actually a problem with any automated testing approach – someone has to make a judgement about what to do when a test fails: whether the test is wrong or there’s a bug in the application. With Text-Based Testing you might have a larger quantity of text to read through compared with other approaches, or maybe not. If you have human-readable, concise, targeted log statements and good tools for working with them, it goes a long way. You need a good diff tool, version control, and some way of grouping similar changes. It’s also useful to have some sanity checks. For example, TextTest can easily search for regular expressions in the log and warn you if you try to save a golden copy containing a stack trace.
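The kind of sanity check I mean could be sketched like this – my own illustration of the idea, not TextTest’s implementation: warn before accepting a new golden copy that contains something that looks like a Java stack trace.

```java
import java.util.regex.Pattern;

// Illustrative sketch of a golden-copy sanity check (not how TextTest does it):
// warn if the text about to be saved contains what looks like a stack trace.
public class GoldenCopySanityCheck {

    private static final Pattern STACK_TRACE_LINE =
            Pattern.compile("^\\s+at \\S+\\(\\S+\\.java:\\d+\\)", Pattern.MULTILINE);

    public static void warnIfSuspicious(String newGoldenCopy) {
        if (STACK_TRACE_LINE.matcher(newGoldenCopy).find()) {
            System.out.println("Warning: the text you are about to save contains a stack trace."
                    + " Are you sure you want this in the golden copy?");
        }
    }
}
```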

In my experience, you do need to update the golden copy quite often. I think this is one of the key skills with a Text-Based Testing approach. You have to learn to write good logs, and to be disciplined about either doing refactoring or adding functionality, not both at the same time. If you’re refactoring and the logs change, you need to be able to quickly recognize if it’s ok, or if you made a mistake. Similarly, if you add new functionality and no logs change, that could be a problem.

Agile Tests Manage Behaviour
When you create a unit test, you end with an Assert statement. This is supposed to be some kind of universal truth that should always be valid, or else there is a big problem. Particularly for functional level tests, it can be hard to find these kinds of invariants. What is correct today might be updated next week when the market moves or the product owner changes their mind. With Text-Based Testing you have an opportunity to quickly and easily update the golden copy every time the test “fails”. This makes your tests much more about keeping control of what your app does over time, and less about rewriting assert statements.

Text-Based Testing grew up in the domain of optimizing logistics planning. In this domain there is no “correct” answer you can predict in advance and assert. Planning problems that are interesting to solve are far too complex for a complete mathematical analysis, and the code relies on heuristics and fancy algorithms to come up with better and better solutions. So Text-Based Testing makes it easy to spot when the test produces a different plan from before, and use it as the new baseline if it’s an improvement.

I think generally it leads to more “agile” tests. They can easily respond to changes in the business requirements.

Conclusions
There is undoubtedly a lot more to be said about Text-Based Testing. I haven’t mentioned text-based mocking, data-driven vs workflow testing, or how to handle databases and GUIs – all relevant topics. I hope this article has given you a flavour of how it’s different from ordinary TDD, though. I’ve found that good tool support is pretty essential to making Text-Based Testing work well, and that it’s a particularly good technique for handling legacy code, although not exclusively. I like the approach because it minimizes the amount of code per test, and makes it easy to keep the tests in sync with the current behaviour of the system.

 

This weekend I was in Stockholm to facilitate a Code Retreat, organized by Peter Lind and sponsored by Valtech. We were about 40 coders gathered in the warm autumn sunshine early on a Saturday morning at Valtech’s offices. (Do take a look at Peter’s blog post about it, he has a photo too).

It’s actually the first time I’ve even attended a code retreat, let alone facilitated one, but I think it went pretty well. Corey Haines has written extensively about what should happen, and what the facilitator should do. I think he’s given a great gift to the community, not just by inventing the format, but also by documenting it thoroughly. I’ve previously led various coding dojos and “clean code day” events, but code retreat is somewhat different in format, if not in aim.

The reason for going to a code retreat is to practice your coding skills. By repeating the same exercise over and over, with different pairing partners, you have a chance to work on your coding habits. Do you pay attention to what your tests are telling you about your design? Do you remember to refactor regularly? Can you take really small steps when you need to?

For the day in Stockholm, we followed the tried and tested formula for a code retreat that Corey has laid out. I spent about 20 minutes introducing the day, the aims and the coding problem (Conway’s Game of Life). Then we did 6 coding sessions, each with a short retrospective, and a longer retrospective at the end of the day. Each session comprised 45 minutes coding in pairs, 10 minutes retrospective in groups of 6-8, and 5 minutes to swap partners. I also began each coding session by reminding everyone of what we were supposed to be practicing, and highlighted a different “challenge” to add some variety. The challenges were things like:

– Concentrate on writing really beautiful code, so the language looks like it was made for the problem. *
– Partition code at different levels of abstraction. **
– Think about TDD in terms of states and moves.
– Do TDD “as if you meant it”.
– Concentrate on refactoring in very small steps.

Each pairing session is just 45 minutes, and in that time you don’t really have time to solve the whole kata, which is actually quite difficult to cope with. Most coders are very motivated by writing code that does something useful, and like to show off their finished designs at the end. To try to prevent that, Corey emphasizes that you should keep in mind the end result isn’t important, and be sure to delete the code at the end of the session. I found that even with that rule, there was quite a lot of discussion of how the designs ended up, and some people even saved their code.

One of the things I encouraged people to try was working in an unfamiliar programming language, and although I specified “for 1 or 2 sessions”, I was surprised to find how popular it was to do this. After the first session when most people used Java, C#, Ruby or Python, there were more and more people coding in Clojure, Javascript, Erlang and even Vim script. I think it got a bit out of hand actually. It’s hard to practice your coding habits and TDD skills when you’re struggling with the language syntax and how to get the tests to run. Next time I facilitate I’ll try to be clearer about using a familiar environment for most of the sessions.

One of the things I offered in the last session was using the cyberdojo, and three pairs agreed to try it. I had them working in Java and Ruby, switching pairs every 5 minutes, coding in a browser window. They complained about the browser experience compared with their IDEs, but they liked the feedback cyberdojo gives you. It shows how long you spend between running the tests, and whether the tests pass, fail or give a compiler error.

I’m not sure if it was a good idea to bring in the cyberdojo at the code retreat. One of the main things we discussed in the retrospective for that session was the resistance they all felt to changing the first test that was written at one of the three pairing stations. This test was too big and focussed on a boring part of the problem. Yet each person who “inherited” the code tried their best to make it pass, no-one started over with a better test. It’s that kind of collaboration problem that the cyberdojo is good at highlighting. It’s not so much a tool for improving your coding skills as improving your collaboration skills. This is good, but not really the purpose of the code retreat.

Thinking back over the day, I’ve also become a little uncertain about the “delete your code” rule. I understand why it’s there, but it didn’t seem to prevent people from trying to solve the whole problem in 45 minutes. By deleting the code, you also lose the opportunity to use analysis tools like those in the cyberdojo to give you some more feedback on how you’re doing.

Outside of this code retreat, I’ve been trying out the codersdojo client quite a bit recently, to see if it gives a useful analysis of a coding session. Unlike cyberdojo, it lets you use your normal coding tools/IDE. So far it’s still in beta testing and seems too buggy for me to recommend, but if you’re lucky enough to successfully upload your coding session, you do get quite a good visualization of some of your coding habits. It will clearly show if you spend a long time between test runs, or if you spend a lot of time with failing tests.

So after my first code retreat, I’m feeling very encouraged that this is a good format for becoming a better coder, and I’d be happy to run one again. I’d like to try using coding visualization tools as part of the retrospective for each session. I’d also like to try setting the challenges before people have chosen a pairing partner, so they can find someone who wants to work on the same challenge rather than just try a new language. Or maybe I just need to emphasize more that trying a new language isn’t the focus of the day.

In any case, I hope this blog post shows that I learnt a lot from facilitating this code retreat, even if I didn’t write a single line of code myself 🙂

* “You can call it beautiful code when the code also makes it look like the language was made for the problem” — Ward Cunningham quoted in “Clean Code” by Bob Martin.
** G6: Code at Wrong Level of Abstraction – advice from “Clean Code” by Bob Martin.

I’ve been working on a kata called “Tennis”*, which I find interesting, because it is quite quick to code, yet is a big enough problem to be worth doing. It’s also possible to enumerate pretty much all the allowed scores, and get very comprehensive test coverage.

What I’ve found is that when I’m using TDD to solve the Kata, I actually tend to enumerate only a very small number of the test cases. I generally end up with something like this list (sketched as JUnit tests below):

Love-All
Fifteen-All
Fifteen-Love
Thirty-Forty
Deuce
Advantage Player1
Win for Player1
Advantage Player2
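
As a sketch, those cases might be written as JUnit tests along these lines. The TennisGame class and its wonPoint/getScore methods are assumed names for the code under test, not necessarily those used in my kata repository.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

// Sketch only: assumes a TennisGame class under test with
// wonPoint(String playerName) and getScore() methods.
public class TennisScoreTest {

    private String scoreAfter(int player1Points, int player2Points) {
        TennisGame game = new TennisGame();
        for (int i = 0; i < player1Points; i++) game.wonPoint("player1");
        for (int i = 0; i < player2Points; i++) game.wonPoint("player2");
        return game.getScore();
    }

    @Test public void loveAll()          { assertEquals("Love-All", scoreAfter(0, 0)); }
    @Test public void fifteenAll()       { assertEquals("Fifteen-All", scoreAfter(1, 1)); }
    @Test public void fifteenLove()      { assertEquals("Fifteen-Love", scoreAfter(1, 0)); }
    @Test public void thirtyForty()      { assertEquals("Thirty-Forty", scoreAfter(2, 3)); }
    @Test public void deuce()            { assertEquals("Deuce", scoreAfter(3, 3)); }
    @Test public void advantagePlayer1() { assertEquals("Advantage Player1", scoreAfter(4, 3)); }
    @Test public void winForPlayer1()    { assertEquals("Win for Player1", scoreAfter(4, 2)); }
    @Test public void advantagePlayer2() { assertEquals("Advantage Player2", scoreAfter(3, 4)); }
}
```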

I think that’s enough to test drive a complete implementation, built up in stages. I thought it would also be enough tests to support refactoring the code, but I actually found it wasn’t. After I’d finished my implementation and mercilessly refactored it for total readability, I went back and implemented exhaustive tests. To my horror I found three (of 33) that failed! I’d made a mistake in one of my refactorings, and none of my original tests found it. The bug only showed up with scores like Fifteen-Forty, Love-Thirty and Love-Forty, where my code instead reported a win for Player 2. (I leave it as an exercise for the reader to identify my logic error 🙂)

So what’s the point of TDD? Is it to help you make your design good, or to protect you from introducing bugs when refactoring? Of course it should help with both, but I think doing this practice exercise showed me (again!) that it really is worth being disciplined and careful about refactorings. I also think I need to develop a better sense for which refactorings might not be well covered by the tests I have, and when I should add more.

This is something that my friend Andrew Dalke brings up when he criticises TDD. The red-green-refactor iterative, incremental rhythm can lull you into a false sense of security, and make you forget to stop, look at the big picture, and analyze whether the tests you have are sufficient. You don’t get reminded to add tests that should pass straight away, but might be needed if you refactor the code.

So in any case, I figured I needed to practice my refactoring skills. I’ve created comprehensive tests and three different “defactored” solutions to this kata, in Java and Python. You can get the starting code here. You can use this to practice refactoring with a full safety net, or, if you’re feeling brave, without one. Try commenting out a good percentage of the tests, and do some major refactoring. When you bring all the tests back, will they still all pass?

I’m planning to try this exercise with my local python user group, GothPy, in a few weeks time. I think it’s going to be fun!

* Tennis Kata: write a program that, given how many points each player has won in a single game of tennis, tells you the score.