Archive for the ‘Coding Skills’ Category

I’ve previously written about Agile test automation principles, and since then I’ve had some interesting discussions that have led me to revise them in this article. In particular, Seb Rose wrote about his 6 principles of unit testing and pointed out some issues with mine. So this article is an update on the previous one, and I’m hoping it will spark further interesting discussions!

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

I was not the first to discover that developers design automated functional tests for two main purposes. Initially we design them to help clarify our understanding of what to build. In fact, at that point they’re not really tests; we usually call them scenarios, or examples. Later, the main purpose of the tests becomes to detect regression errors, although we continue to use them to document what the system does.

When you’re designing a functional test suite, you’re trying to support both aims, and sometimes you have to make tradeoffs between them. You’re also trying to keep the cost of writing and maintaining the tests as low as possible, and as with most software, it’s the maintenance cost that dominates. Over the years I’ve begun to think in terms of four principles that help me to design functional test suites that make good tradeoffs and identify when a particular test case is fit for purpose.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful, neither for understanding what the system does, nor for identifying regression errors. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong and you will rewrite the test to check for something else, or simply delete it.
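
As a tiny illustration, here’s the kind of readable test I mean – a standalone Python sketch with an invented leap-year function so it runs on its own. The test name states the behaviour, and each assertion shows a concrete expectation:

# is_leap_year is a stand-in function, defined here so the example runs
def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def test_century_years_are_not_leap_years_unless_divisible_by_400():
    assert not is_leap_year(1900)  # a century year, not divisible by 400
    assert is_leap_year(2000)      # a century year, divisible by 400

When this test fails, the name and the comments tell you immediately which expected behaviour is broken.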

Robustness

When a test fails, it means either there is a regression error (functionality is broken), or the system has changed and the tests no longer document the correct behaviour. You need to take action to correct the system or update the test, and this is as it should be. If, however, the test has failed for no good reason, you have a problem: a fragile test.

There are many causes of fragile tests, for example tests that are not isolated from one another, duplication between test cases, and dependencies on random or threaded code. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.
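
To make the isolation problem concrete, here’s a minimal sketch, invented for illustration. Each test passes when run alone, but shared mutable state makes the second one fail whenever the first has already run in the same suite:

registry = []  # module-level state shared by every test in this file

def test_registering_a_user_adds_one_entry():
    registry.append("alice")
    assert len(registry) == 1

def test_registry_starts_empty():
    assert registry == []  # fails if the test above has already run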

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

Speed

As an agile developer you run your test suite frequently: (a) every time you build the system, (b) before you check in changes, and (c) after check-in, in an automated Continuous Integration environment. I recommend time limits of 2 minutes for (a), 10 minutes for (b), and 60 minutes for (c). This fast feedback gives you the best chance of actually being willing to run the tests, and of finding defects when they’re cheapest to fix – soon after insertion.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime you’ll continue developing and making other changes. The diagnose-and-fix loop gets longer and the tests become less likely to ever all pass at the same time. This can become pretty demoralizing.

Updatability

When the needs of the users change, and the system is updated, your tests also need to be updated in tandem. It should be straightforward to identify which tests are affected by a given change, and quick to update them all.

If your tests are not easy to update, they will likely get left behind as the system moves on. Faced with a small change that causes thousands of failures and hours of work to update them all, you’ll likely delete most of the tests.
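
One technique that helps here is to funnel test-object construction through a single builder function, so a constructor change means one edit rather than hundreds. A minimal sketch, with an invented Account class so it runs standalone:

class Account(object):  # stand-in production class, defined here so the example runs
    def __init__(self, owner, balance, currency="EUR"):
        self.owner = owner
        self.balance = balance
        self.currency = currency

def make_account(owner="alice", balance=100, **kwargs):
    # every test builds accounts through here; if Account gains a new
    # required parameter, only this function needs updating
    return Account(owner, balance, **kwargs)

def test_new_account_keeps_its_balance():
    assert make_account(balance=250).balance == 250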

Following these four principles implies Maintainability

Taken all together, I think how well your tests adhere to these principles will determine how maintainable they are, or in other words, how much they will cost. That cost needs to be in proportion to the benefits you get: helping you understand what the system does, and regression protection.

As your test suite grows, it becomes ever more challenging to adhere to all the principles. Readability suffers when there are so many test cases you can’t see the forest for the trees. The more details of your system that you cover with tests, the more likely you are to have Robustness problems – tests that fail when these details change. Speed obviously also suffers – the time to run the test suite usually scales linearly with the number of test cases. Updatability doesn’t necessarily get worse as the number of test cases increases, but it will if you don’t adhere to good design principles in your test code, or if you lack tools for bulk updates of test data, for example.

I think the principles are largely the same whether you’re writing skinny little unit tests or fatter functional tests that touch more of the codebase. My experience tells me that it’s a lot easier to be successful with unit tests. As the testing thickness increases, the feedback cycle gets slower, and your mistakes are amplified. That’s why I concentrate on teaching these principles through unit testing exercises. Once you understand what you’re aiming for, you can transfer your skills to functional tests.

How can you use these principles?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop. If I’m reviewing someone else’s test cases, I can point to code and say which principles it’s not following, and give them concrete advice about how to make improvements. We can have a discussion for example about whether to add more test cases in order to improve regression protection, and how to do that without reducing overall readability.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Replace some of the thicker, slower end-to-end tests with lots of skinny, fast unit tests (may reduce regression protection)
  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed the same as you would production code (may affect Readability)
  • Use more fakes to replace slow parts of the system (may reduce regression protection)
  • Identify key test cases for essential functionality and remove the other test cases (sacrificing regression protection to get Speed)

Strategic Decisions

The principles also help me when I’m discussing automated testing strategy, and choosing testing tools. Some tools have better support for updating test cases and test data. Some allow very Readable test cases. It’s worth noting that automated tests in agile are quite different from those in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation tools don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles help you to reason about your strategy and tools for functional automated testing, and to design more maintainable, useful test cases.

Images Attribution: DaPino Webdesign, Lebreton, Asher Abbasi, Woothemes, Iconshock, Andy Gongea, FatCow from www.iconspedia.com

This Code Kata is included in my new book “The Coding Dojo Handbook”, currently published as a work-in-progress on LeanPub.com. You can also download starting code and these instructions from my github page.

As a Health Insurer,
I want to be able to search for patients who have a medicine clash,
So that I can alert their doctors and get their prescriptions changed.

Health Insurance companies don’t always get such good press, but in this case, they actually do have your best interests at heart. Some medicines interact in unfortunate ways when they get into your body at the same time, and your doctor isn’t always alert enough to spot the clash when writing your prescriptions. Sometimes, medicine interactions are only identified years after the medicines become widely used, and your doctor might not be completely up to date. Your Health Insurer certainly wants you to stay healthy, so discovering a customer has a medicine clash and getting it corrected is good for business, and good for you!

For this Kata, you have a recently discovered medicine clash, and you want to look through a database of patient medicine and prescription records to find out if any patients need to be alerted to the problem. Create a “Patient” class with a “Clash” method that takes as arguments a list of medicines and how many days before today to consider (defaulting to the last 90 days). It should return a collection of days on which all the medicines were being taken during this time.

If you like, you can also create a visualization of the clash, something like this:

[Image: medicine clash visualization]

Data Format

You can assume the data is in a database, which is accessed in the code via an object oriented domain model. The domain model is large and complex, but for this problem you can ignore all but the following entities and attributes:

[Diagram: the Patient, Medicine and Prescription entities and their attributes]

In words, this shows that each Patient has a list of Medicines. Each Medicine has a list of Prescriptions. Each Prescription has a dispense date and a number of days supply.

You can assume:

  • Patients start taking the medicine on the dispense date.
  • The “days supply” tells you how many days they continue to take the medicine after the dispense date.
  • If they have two overlapping prescriptions for the same medicine, they stop taking the earlier one. Imagine they have mislaid the medicine they got from the first prescription when they start on the second prescription.
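
Here’s a minimal sketch of one possible solution, under my own assumptions: prescriptions are simple (dispense_date, days_supply) tuples grouped by medicine name rather than the full domain model, and the names and “days supply” interpretation are my reading of the spec, not the sample solution’s:

import datetime

class Patient(object):

    def __init__(self, prescriptions_by_medicine):
        # e.g. {"codeine": [(datetime.date(2013, 1, 1), 30), ...], ...}
        self.prescriptions_by_medicine = prescriptions_by_medicine

    def days_taking(self, medicine):
        # Reading "days supply" as the number of days covered, starting on
        # the dispense date. A later overlapping prescription cuts the
        # earlier one short (the mislaid-medicine rule above).
        days = set()
        prescriptions = sorted(self.prescriptions_by_medicine.get(medicine, []))
        for i, (dispensed, supply) in enumerate(prescriptions):
            end = dispensed + datetime.timedelta(days=supply)
            if i + 1 < len(prescriptions):
                end = min(end, prescriptions[i + 1][0])
            day = dispensed
            while day < end:
                days.add(day)
                day += datetime.timedelta(days=1)
        return days

    def clash(self, medicines, days_back=90):
        # the clash is the intersection of the days the patient was taking
        # each medicine, restricted to the recent window
        today = datetime.date.today()
        window = set(today - datetime.timedelta(days=n) for n in range(days_back))
        taking_each = [self.days_taking(medicine) for medicine in medicines]
        return window.intersection(*taking_each)

So patient.clash(["fluoxetine", "codeine"]) would return the recent days on which the patient was taking both.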

When you’ve tried the Kata for yourself

Then you might be interested in reviewing the sample solution I’ve put up on my github page. I find this code interesting because it is seemingly well written. The methods are short with thought-through names, and there are lots of unit tests. I also find it very difficult to follow. What do you think?

The biology of medicine clashes*

When you take a pill of medicine, the active substance will be absorbed through the lining of the gut, and enter your bloodstream. That means it will be taken all over your body, and can do its work. For example, if you take a headache pill, the active substance in the drug will be taken by your blood to where it can block your pain receptors. At the same time, there are enzymes at work in your liver, which break down medicinal substances they find in your bloodstream. Eventually all the medicine will be removed, so you have to take another pill if you want the effects to continue.

In the liver, there are several different enzymes working, and they are specialized in breaking down different substances. For example, the “CYP 2C9” enzyme will break down ibuprofen, the active ingredient in many headache pills. The trouble is, there are other medicines which will stop particular enzymes from doing their work, which can lead to an overdose or other ill effects.

One example is the clash between fluoxetine and codeine. Fluoxetine is known by its trade name “Prozac”, and is often taken for depression. Codeine is another ingredient used in headache pills, and is actually a “pro-drug”, so it works slightly differently. Codeine needs to be broken down in the liver by the enzyme “CYP 2D6” into the active substance, morphine, before it will do anything. Fluoxetine has the effect of blocking “CYP 2D6”, so if you take the two medicines together, you won’t get much painkilling effect from the codeine. That could be depressing!

The solution to the problem is to take a different painkiller – one that’s not affected by that liver enzyme. Simply switch codeine for ibuprofen, and you should be a little happier.

* With thanks to Sara Sjöberg for helping me with this section

In my last post I discussed some exercises put together by Luca Minudel. He was using them as part of a study into how developers’ skill at removing SOLID violations was related to their skill at TDD.

I initially did the exercises in Java, then translated them into Python so we could look at them in our local Python User Group meeting last week. What follows are my own opinions, but I must thank all the pythonistas who were at the meeting, and Andrew Dalke who couldn’t be there but was kind enough to share his opinions and code anyway. What I write below owes a lot to their input. I’m so lucky to be in this community!

We found that in Python, some violations of the Open-Closed Principle are much easier to handle than in Java or C#. You can monkeypatch, and exploit the fact that data is only private by convention. The advantage of being able to get code under test without changing it is of course that you reduce the need for risky refactorings where you unintentionally break the code and get no failing tests to alert you to it.

So the TirePressure example has a violation of the Open-Closed principle where it’s hard to change the specific Sensor used without opening up the existing code. In Python it was dead easy to get under test without modifying the code, because although the sensor is marked as private in the Alarm class, nothing in the language stops you from assigning to it. (See this explanation of how Python considers private data.)

Here’s what (some of) the test code looks like, using the py.test testing framework, without making any modifications to the production code:

from tire_pressure_monitoring import Alarm

class StubSensor(object):

    def __init__(self, pressures):
        self.pressures = pressures

    def pop_next_pressure_psi_value(self):
        return self.pressures.pop()


def test_pressure_in_expected_range_doesnt_trigger_alarm():
    alarm = Alarm()
    alarm._sensor = StubSensor([18])
    alarm.check()
    assert not alarm.is_alarm_on()


def test_pressure_below_expected_range_triggers_alarm():
    alarm = Alarm()
    alarm._sensor = StubSensor([15])
    alarm.check()
    assert alarm.is_alarm_on()

This means you can quickly get some tests in place. You probably shouldn’t leave the tests like that, though, since they rely on implementation details of the class. This could make them fragile in the face of refactoring – we’d rather the tests relied only on the public interface. These tests also use hard-coded numbers where it’s not obvious why those values are chosen – another sign that the production code could be improved.

Similarly, in the HTMLConverter example there is a violation of the Open-Closed principle that makes it awkward to have the code read from a string instead of a file. One way to get it under test initially is to use monkeypatching. You just pass in a different implementation of the “open” function that doesn’t in fact open a file, but rather provides a file-like object constructed from a string. Since we have duck typing, the production code doesn’t notice the substitution. You do have to be careful to put “open” back to normal at the end of the test though, or the test will have side effects.

So for example:

from cStringIO import StringIO

import unicode_to_html_converter
from unicode_to_html_converter import UnicodeFileToHtmlTextConverter

class StubOpen(object):

    def __init__(self, text):
        self.text = text

    def __call__(self, *args, **kwargs):
        return StringIO(self.text)


def test_convert_to_html():
    try:
        stub_open = StubOpen("text to convert <>&\"\n")
        unicode_to_html_converter.open = stub_open
        converter = UnicodeFileToHtmlTextConverter("a filename that will be ignored by StubOpen")
        # the expected string below is reconstructed - the original line was
        # mangled by the blog's syntax highlighter; it assumes the converter
        # HTML-escapes <, >, & and quotes
        assert "text to convert &lt;&gt;&amp;&quot;" == converter.convert_to_html()
    finally:
        unicode_to_html_converter.open = open

Again you can quickly get some tests in place, at the cost of a test that is somewhat awkward to read. The other way to get the code under test without changing it is the same as for C# or Java – put the text in a temporary file that is deleted afterwards. This may be simpler to understand, but will be slower to execute. It’s a tradeoff.
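
For comparison, here’s a sketch of the temporary-file version, under the same assumptions about the exercise code as the snippet above:

import os
import tempfile

from unicode_to_html_converter import UnicodeFileToHtmlTextConverter

def test_convert_to_html_via_temporary_file():
    f = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
    try:
        f.write("text to convert <>&\"\n")
        f.close()
        converter = UnicodeFileToHtmlTextConverter(f.name)
        # checking just the escaped core avoids guessing exactly how the
        # converter handles the trailing newline
        assert "text to convert &lt;&gt;&amp;" in converter.convert_to_html()
    finally:
        os.remove(f.name)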

I guess the thing with these exercises is that you can get them under test in various ways without correcting the SOLID violations, or the other problems in the code. The idea is that skilled developers will not leave it at that. They will listen to the feedback the tests give them about the code being unnecessarily hard to test, and respond by improving the design.

Do you automatically get better design with TDD? Does an otherwise average software developer produce superior designs if they write the tests first rather than afterwards? Does it make a difference what style of TDD you use?

incident #1

I was at a session at XP2012 with J.B. Rainsberger called “Architecture without Trying”. He demonstrated how he could develop a software system for Point-of-Sale terminals using TDD, and how the design naturally tended towards an MVC pattern as he did it. He claimed that purely by doing TDD and focusing on two things (removing duplication and improving names), a good design would naturally emerge.

incident #2

I heard a talk by Luca Minudel at Agile Testing Days 2011 called “TDD with Mock Objects: Design Principles and Emergent Properties”. He was talking about a study he had done where he got people with varying levels of experience at TDD to do four short exercises. He also got them to answer a questionnaire about their knowledge of SOLID principles, and TDD. He then evaluated how well the designs they came up with in the exercises adhered to SOLID principles, and tried to correlate that with their TDD skill. He found that the people skilled in TDD did better in the exercises than those who only knew the theory of SOLID principles. The practice of TDD seems to help people with design. Luca also found that those more experienced with the London School of TDD did even better than other TDDers.

incident #3

I was working at a client recently when I met a developer from a different department. He came to see me several times over a period of a couple of weeks, and asked for advice about TDD. On about his fourth visit he told me he had written some code, and now that it was basically working, he wanted to write tests for it. He said he was having difficulty since he’d written a lot of static “helper” methods. I advised him that static methods make code quite hard to test, and can often be a sign of a not very good object oriented design.

He suggested we should invest in a fancy mocking tool that would enable him to easily replace these static methods in the tests. I told him a better investment would be for him to learn to write the tests first, get better at OO design, and not use static methods in the first place. I was probably a bit blunt, and he was quite polite, all things considered. He protested that he shouldn’t have to change the production code in order to get it under test, then left. That was the last time he came to me for advice.

Discussion

So does doing TDD guarantee better design? Well it should certainly help. I’ve presented before about the way TDD gives you early feedback on your design and plenty of opportunities to refactor. It’s less help though if you don’t know what a good design looks like in the first place. I think J.B. goes too far in his claims – if you don’t know MVC or SOLID principles then I’d be surprised if they started turning up in your code with any consistency.

“No tool nor technique can survive inadequately trained developers”

(A quote attributed to Steve Freeman). I think you do need to invest in learning good design techniques independently of TDD. If you lack basic OO design skills you probably won’t be able to do TDD in the first place, London School or otherwise.

I’ve been learning and improving my practice of TDD, including the London School, for many years now, and I was intrigued by Luca’s claims that it led to better adherence to SOLID principles than classic TDD. The London School involves an outside-in approach to design that makes heavy use of mocks to check interactions between objects. This is in contrast to a more classic TDD style that prefers to verify the code works by checking the state of an object after an interaction. I wouldn’t claim to be an expert in the London School of TDD, but I think I understand the basics and can adopt this style when I feel the problem is appropriate for it.
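
To illustrate the contrast, here’s a minimal, self-contained sketch using the standalone mock library (later folded into the standard library as unittest.mock). The Alarm below is a stand-in I’ve invented so both styles run side by side; the real exercise code isn’t structured quite like this:

from mock import Mock

class Alarm(object):  # stand-in class, invented for illustration
    def __init__(self, sensor, notifier):
        self.sensor = sensor
        self.notifier = notifier
        self.alarm_on = False

    def check(self):
        if self.sensor.pop_next_pressure_psi_value() < 17:
            self.alarm_on = True
            self.notifier.alarm_raised()

def test_low_pressure_turns_alarm_on():
    # classic style: exercise the object, then assert on its state
    sensor = Mock()
    sensor.pop_next_pressure_psi_value.return_value = 15
    alarm = Alarm(sensor, Mock())
    alarm.check()
    assert alarm.alarm_on

def test_low_pressure_notifies_the_collaborator():
    # London style: assert on the interaction with a collaborator
    sensor = Mock()
    sensor.pop_next_pressure_psi_value.return_value = 15
    notifier = Mock()
    Alarm(sensor, notifier).check()
    notifier.alarm_raised.assert_called_once_with()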

I tried out Luca’s four problems (here on github) to see how I did. Luca very kindly gave me some feedback on my code, and I found I hadn’t done as well as I had hoped in adhering to SOLID principles. I’d got the code under test, but in a few places I could have improved the design more. I also slightly misunderstood the requirements for two of the problems, which led me to fork the repo and improve the instructions 🙂

I think in the cases where I could have done better with the design, it’s possible that using the London School of TDD would have led to the improvements. I’m feeling there might be something in Luca’s conjecture. On the other hand, these problems might be so small and abstract that I didn’t behave the same as I would in a real codebase. Certainly in one case I felt it wasn’t worth extracting an interface when there was only one implementation for it. In a real system maybe it would be more obvious that more implementations were likely, and that adding the interface would lead to a more decoupled design. Or then again, maybe I’m just too used to Python, where explicit interface classes don’t tend to be used. Or maybe I’m just making excuses! In any case, doing these exercises has made me more interested in improving my knowledge and practice of the London School TDD style.

I think these exercises are interesting little code katas in their own right, quite apart from Luca’s study on TDD. I think you can use them to learn about the SOLID principles, and practice some of the refactorings you often have to do to get badly designed code under test.

I’m working on a python translation of the exercises so we can try them out at the Gothenburg Python User Group meeting next week. Feel free to fork the repo and have a go at them yourself.

Please note – As of March 2013, I have rewritten this post in the light of further experience and discussions. The updated post is available here.

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

Often, I’ve written functional tests as regression tests, after the functionality is supposed to work. In other situations, I’ve been able to write these kinds of tests in advance, as part of an ATDD, or BDD process. In either case, I’ve found the regression tests you end up with need to have certain properties if they’re going to be useful in an agile environment moving forward. I think the same properties are needed for good agile functional tests as for good unit tests, but it’s much harder. Your mistakes are amplified as the scope of the test increases.

I’d like to outline four principles of agile test automation that I’ve derived from my experience.

Coverage

If you have a test for a feature, and there is a bug in that feature, the test should fail. Note I’m talking about coverage of functionality, not code coverage, although these concepts are related. If your code coverage is poor, your functionality coverage is likely also to be poor.

If your tests have poor coverage, they will continue to pass even when your system is broken and functionality unusable. This can happen if you have missed out needed test cases, or when your test cases don’t properly check what the system actually did. The consequence of poor coverage is that you can’t refactor with confidence, and need to do additional (manual) testing before release.
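
As a tiny illustration of a test that doesn’t check properly, here’s a standalone sketch with an invented discount function:

def apply_discount(price, percent):  # stand-in production code, for illustration
    return price - price * percent / 100.0

def test_discount_with_weak_assertion():
    # passes even if apply_discount regresses to just returning price
    assert apply_discount(100, 10) <= 100

def test_discount_checked_properly():
    # fails as soon as the discount calculation breaks
    assert apply_discount(100, 10) == 90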

The aim for automated regression tests is good Coverage: If you break something important and no tests fail, your test coverage is not good enough. All the other principles are in tension with this one – improving Coverage will often impair the others.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong and you will rewrite the test to check for something else, or simply delete it.

As you improve Coverage, you will likely add more and more test cases. Each one may be fairly readable on its own, but taken all together it can become hard to navigate and get an overview.

Robustness

When a test fails, it means the functionality it tests is broken, or at least is behaving significantly differently from before. You need to take action to correct the system or update the test to account for the new behaviour. Fragile tests are the opposite of Robust: they fail often for no good reason.

Robustness problems you often run into include tests that are not isolated from one another, duplication between test cases, and flickering tests. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

As you improve Coverage you’ll want to add more checks for details of your system. This will give your tests more and more reasons to fail.

Speed

As an agile developer you run the tests frequently. Both (a) every time you build the system, and (b) before you check in changes. I recommend time limits of 2 minutes for (a) and 10 minutes for (b). This fast feedback gives you the best chance of actually being willing to run the tests, and to find defects when they’re cheapest to fix.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime someone else has checked in other changes, and the new run is not green either. You’re developing all the while the tests are running, and they never quite catch up. This can become pretty demoralizing.

As you improve Coverage, you add more test cases, and this will naturally increase the execution time for the whole test suite.

How are these principles useful?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Invest in hardware and run tests in parallel (costs $)
  • Use a profiler to optimize the tests for speed the same as you would production code (may affect Readability)
  • Push down tests to a lower level of granularity where they can execute faster (may reduce Coverage and/or increase Readability)
  • Identify key test cases for essential functionality and remove the other test cases (sacrificing Coverage to get Speed)

Explaining these principles can promote useful discussions with people new to agile, particularly testers. The test suite is a resource used by many agile team members – developers, analysts, managers etc. – in its role as “Living Documentation” for the system (see Gojko Adzic’s writings on this). This emphasizes the need for both Readability and Coverage. Automated tests in agile are quite different from those in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation approaches don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles help you to reason about the automated tests in your suite.