As I mentioned in my last post, I chaired a fishbowl discussion at SDC2010 titled “Should a professional developer always use Test Driven Development?” I was delighted that the invited panelists, Michael Feathers, Geoff Bache and Andrew Dalke, all turned up, along with a few dozen other conference participants. As I predicted, we had a lively and interesting debate.

Michael half-jokingly complained that Bob Martin goes around making these controversial statements all the time, which Michael then gets to go around defending. Michael has a much more conciliatory attitude than Bob, and his take was that every truly professional developer must at least have given TDD a good try and learnt the technique, even if they then decide not to use it.

Geoff’s main point was that we need to widen the definition of TDD to include any process that involves checking in tests at the same time as the code, and not restrict it to just the classic Red-Green-Refactor style with tests in the same language as the code.

Michael was largely receptive to this view, or at least agreed that the soundbite “never write any code until you have a failing test” is probably too brief to encompass the whole of TDD. He did argue, though, that the classic TDD style leads to code with good design characteristics, such as high cohesion, loose coupling, and small classes and methods, and that he had not found other design techniques that led to better code than TDD. He was not keen to move to a TDD approach without unit tests and lose these benefits, even if such an approach still produced good tests.

Andrew argued that TDD is not sufficient by itself to produce a good suite of tests, and that there are other, better ways to produce them. He pointed out that he had examined FitNesse, a codebase that Bob Martin (and some others) created using TDD, and found several bugs in it, including security holes. Michael’s counterargument was that with TDD your tests are as good as you are capable of making them: if you are not skilled in, or aware of, security issues, then you won’t test for security holes, whatever process you use to create tests.

Another argument of Andrew’s was that he often likes to write tests that he expects to pass, to verify that his code works as expected, for example that he has implemented an algorithm correctly. In the narrow definition of TDD, you are only allowed to write tests you expect to fail. Michael’s take was that this was indeed too narrow a definition of TDD. He said that he frequently writes tests as a way of asking questions of his code, and this often leads to tests that pass straight away.
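
Here is a minimal sketch of the kind of test Andrew described: one written after the code to check that an algorithm was implemented correctly, and expected to pass at once. The insertion_sort function and the test class are hypothetical illustrations, not Andrew's code; the built-in sorted() is used as an oracle to ask the question “does my sort agree with a known-good one?”

```python
import unittest


def insertion_sort(items):
    """Hypothetical algorithm under test: return a new, sorted list."""
    result = []
    for item in items:
        # Walk back from the end past every element greater than item.
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result


class InsertionSortQuestions(unittest.TestCase):
    # Each test asks the code a question; we expect the answer "yes" straight away.
    def test_agrees_with_builtin_sorted(self):
        for data in ([], [1], [3, 1, 2], [5, 5, 1], [2, -1, 0, 2]):
            self.assertEqual(insertion_sort(data), sorted(data))

    def test_leaves_input_unchanged(self):
        data = [3, 1, 2]
        insertion_sort(data)
        self.assertEqual(data, [3, 1, 2])


if __name__ == "__main__":
    unittest.main()
```

If one of these tests did fail, it would have revealed a bug, even though it was never meant to drive the design.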

Some of the “audience” also stepped up to the microphones and joined in. Brian Marick pointed out that forcing yourself to write the test first is a very good way of ensuring you actually write the test, instead of being lazy and just writing more code. The counterargument was roughly that there are other processes for arriving at a good test suite, which take different kinds of discipline. Andrew cited the SQLite project, which boasts 100% branch coverage of its code by its test suite. Publishing your coverage figures and refusing to let them slip is another way of preventing developer laziness.

Brian Marick wrote an article about coverage and tests over a decade ago, and he summarized it for us. It was interesting, but I think slightly beside the point. I think he was arguing that measuring coverage alone is not enough to guarantee a good test suite, but I don’t think Andrew was claiming otherwise. Simply doing TDD is not a guarantee that you will end up with a good test suite either.

For me, the interesting outcome of the discussion was the point that the alternatives to TDD are not only “cowboy coding”, “test later, i.e. never”, or “bad tests”: there are other legitimate ways to arrive at a good test suite, and professional developers may choose to use them instead of classic TDD. That said, TDD is a discipline that all professional developers should perhaps have in their repertoire. I think we also agreed that it is a teaching aid for learning to write good tests.

Happily, we all definitely agreed that creating a good automated test suite alongside the code is important. We did not agree, though, on the precise method a professional developer should always use to produce it.

I’m really looking forward to Scandinavian Developer Conference, and in particular the fishbowl discussion I’ll be moderating on the Tuesday at 10:30am. Presenting their views will be Michael Feathers, Andrew Dalke, and Geoff Bache, and the topic under discussion is the same as the title of this post: Should a professional developer always use TDD?

I’ve been enthusiastic about writing automated tests for my code since 2000, when I discovered eXtreme Programming and started using JUnit. Writing tests before code has become a habit for me. Occasionally I decide not to, perhaps because I am feeling lazy, or because I think a test would be too difficult to write. I usually regret it and end up writing a test afterwards anyway.

One of the things Bob Martin (a colleague of Michael Feathers) says about TDD in his book “Clean Code” is that it is a matter of professionalism. Developers should be like doctors. Would you trust a doctor who didn’t wash her hands because she didn’t believe in it? In the same way, you shouldn’t trust a developer who doesn’t use TDD because she doesn’t believe in it.

I’ve known Andrew Dalke since 2002, and we’ve worked together on and off since then. Recently he wrote this article criticising TDD. Andrew does not believe TDD is necessary for good development work to happen. Is he unprofessional? Far from it.

My experience of working with Andrew tells me that he is an excellent programmer, who produces high quality code and automated tests. However, the process by which he arrives at this code and tests is not TDD. Tests get written during development, but not in advance of the code they test. The tests do not in any way drive the design, in fact, he uses knowledge of the design of the code to inform what tests he writes.

Andrew says in his article: “Once I have a good sketch of how the code is going to be, I often continue by filling in the details. At this point unit tests starts to be useful”. He likens what he does to an XP spike solution, except that he does not throw away the spike code and start over when he starts adding tests.

The other person I know who has a complex relationship with TDD is my husband Geoff. Several years ago he was labelled a heretic and almost thrown out when he admitted to a room full of XP enthusiasts that he didn’t write unit tests at all. Geoff does write tests – a lot of tests in fact – but they are not xUnit tests, and they don’t drive the design of his code.

Geoff uses an approach he calls “text-based testing”, which involves driving the program from the command line (or some kind of script) and having his code write a plain text log file of what it is doing. A tool called TextTest picks up the log output and compares it with the saved version from a previous run. Any differences are flagged as test failures.
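
To show the shape of the idea, here is a minimal sketch of text-based testing in plain Python. It is not how TextTest is implemented and uses none of its API; run_program and text_based_test are hypothetical names. The program writes a plain-text log, the first run saves that log as the baseline, and later runs are diffed against it.

```python
import difflib
import tempfile
from pathlib import Path


def run_program(args, log):
    # Hypothetical program under test: it reports what it does as plain text.
    total = 0
    for arg in args:
        total += int(arg)
        log.append(f"added {arg}, running total {total}")
    log.append(f"final total {total}")


def text_based_test(name, args, baseline_dir):
    """Run the program and diff its log against the saved baseline.

    Returns the diff lines; an empty list means the test passed. On the first
    run there is no baseline yet, so the current output is saved as the baseline.
    """
    log = []
    run_program(args, log)
    actual = "\n".join(log) + "\n"
    baseline = Path(baseline_dir) / f"{name}.log"
    if not baseline.exists():
        baseline.write_text(actual)
        return []
    expected = baseline.read_text()
    return list(difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="baseline", tofile="actual"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(text_based_test("sum", ["1", "2"], d))  # records the baseline: []
        print(text_based_test("sum", ["1", "2"], d))  # same output: []
        print("".join(text_based_test("sum", ["1", "5"], d)))  # behaviour changed: a diff
```

Note that no test-specific assertion code is written: the only per-test input is which arguments to run with, and the expected output is whatever was recorded and approved.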

It’s a simple idea, but it is actually very effective and easy to use when you get the hang of it. The main advantage over ordinary TDD is that there is little or no code written per test, meaning less code to maintain overall. The fact that the tests are independent of the design of the code makes refactoring easier, and writing tests for legacy code relatively risk-free.

TDD is a bit different with the text-based approach though. Geoff thinks of what he does as TDD, but actually, only half of the test is nailed down in advance of the code – only the part that tells the program which features to exercise. The part that asserts that it did the right thing is simply recorded after the code is written.

So I expect a fascinating and lively discussion to ensue when I get these guys together! Perhaps you’ll join us?

(Note: I wrote up the discussion in my next post)

In my current assignment, I’m taking the role of “developer-in-test”. I’m working in a large distributed development project, which is building new functionality on a large existing codebase. In practice, I work closely with the developers in the project and build automated tests for subsystems that previously had only manual tests. The developers can use these tests to support their work, and add new tests as they build new features.

My background is basically as a developer, so I have been reading up on testing. I found “Lessons Learned in Software Testing” by Kaner, Pettichord and Bach very helpful, and “Agile Testing” by Lisa Crispin and Janet Gregory helpful and also very thorough. I find it interesting that the authors of the first book started out as developers and now classify themselves as testers, while Lisa and Janet have apparently always called themselves testers, although they clearly write a fair amount of code as part of their work.

Dave Nicolette recently wrote a blog post, “Merging the developer and tester roles”, where he argues that in the agile world Tester is just a specialization of Developer, in the same way that DBA (database administrator) is a specialization of Developer. He argues that agile teams need to be staffed with generalizing specialists. That means anyone can turn their hand to whatever task is currently needed, while still having some tasks they perform with more skill than others.

I like Dave’s viewpoint; it fits my experience. I can only write effective 2nd Quadrant tests (business facing, supporting the team) if I understand what the developers need, and I do that best if I have done some development on that part of the system myself. To put it another way, I need to be just as competent at writing code as the other developers in the project I’m working in, but I also need additional testing skills.

I like the term “developer-in-test” to describe a role writing and enabling 2nd Quadrant tests.

Having said all that, I’m not sure I agree with Dave that the Developer and Tester roles should always be merged. In my current assignment I’m also helping a group of testers, usability experts, technical writers and product owners to get going with exploratory testing. This testing falls into Q3 of the agile testing quadrants, and is quite different. You still need testing skills, but developer skills are mostly irrelevant. It’s much more about understanding what the user is trying to achieve with the system, and how they view it.

I think there is a role for non-coding testers in Q3 testing. However, I don’t think you’ll get far with Q3 unless you have the other quadrants well covered with automated tests. So I think the majority of work for a tester in an agile environment is still going to involve test automation. Only the biggest projects will be able to afford to have non-coding testers.

I’ve just heard that two of my proposals for XP2010 have been accepted, which means I will definitely be off to Trondheim in early June. I’ve heard Trondheim is very beautiful, and the XP conference is usually excellent, so I’m really looking forward to it. It will actually be my 8th XP conference!

I’m going to be running a half day workshop “Test Driven Development: Performing Art”, which will be similar to the one I ran at XP2009, (which I blogged about here). I’ve put up a call for proposals on the codingdojo wiki, so do write to me if you’re interested in taking part.

The other thing I’ll be doing is a lightning talk “Making GUI testing productive and agile”. This will basically be a brief introduction to PyUseCase with a little demo. Hopefully it will raise interest in this kind of approach.

Perhaps I’ll see you there?

At GothPy yesterday, Geoff talked about code coverage and tests. Geoff has spent a lot of his evenings lately working on PyUseCase, and getting the test coverage up to 100%, (statement coverage), a feat which he achieved last week. The evidence for this is available for all to see on the texttest site, (which is updated daily, btw, so if it is not green and 100% the day you read this post, then clearly Geoff had a bad day yesterday).
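
Statement coverage only checks that every line has run at least once, which is weaker than branch coverage (coverage.py can measure the latter too, with “coverage run --branch”). Here is a minimal sketch of the difference, using a hypothetical shipping_cost function that has nothing to do with PyUseCase:

```python
def shipping_cost(is_member):
    # Hypothetical function, used only to illustrate kinds of coverage.
    cost = 10
    if is_member:
        cost = 0
    return cost


# With this one test, every statement above runs: 100% statement coverage...
assert shipping_cost(True) == 0

# ...but the path where the "if" body is skipped never ran. Branch coverage
# reports that gap, and closing it takes a second test:
assert shipping_cost(False) == 10
```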

I have limited experience of using coverage statistics to evaluate my tests, so it was interesting to hear Geoff summarize his findings. He thought it had been well worth the effort to get coverage to 100%: he’d found some bugs and some dead code, and improved his design along the way. Actually, saying he has 100% coverage is a statement that needs qualification. The tool he’s been using, coverage.py, has a feature whereby you can mark lines of code with “# pragma: no cover”, i.e. “I don’t want this line counted for coverage purposes”. He has marked 37 of 3242 lines like this.

The reason for excluding these lines is mostly practical. Because of the nature of the tool, the code for its “interactive” mode can only be exercised by physically pressing the buttons yourself, so automated tests for that part are impossible. Other excluded lines handle error cases that should never occur, but for which it would be useful to have a good error message if they ever did.
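
A sketch of how those two kinds of exclusion might look in code, assuming coverage.py’s default “# pragma: no cover” marker. The functions describe_event and replay are hypothetical, not taken from PyUseCase. When the marker sits on a line that opens a block, such as an if statement, coverage.py excludes the whole block from its report.

```python
def describe_event(event_type):
    # Hypothetical function; not taken from PyUseCase.
    if event_type == "click":
        return "user clicked"
    if event_type == "keypress":
        return "user typed"
    # Should never happen, but if it does we want a clear message.
    raise ValueError(f"unknown event type: {event_type!r}")  # pragma: no cover


def replay(script, interactive=False):
    if interactive:  # pragma: no cover
        # Needs a person pressing the buttons, so it can't run in an automated
        # test. The pragma is on the "if" line, so the whole block is excluded.
        input("Perform the actions, then press Enter: ")
        return "recorded by hand"
    return [describe_event(event) for event in script]
```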

Overall, Geoff thinks coverage is very useful to help you to identify

  • poorly tested areas of your code
  • mistakes in your tests
  • dead code
  • refactoring opportunities

The first one is obvious, but the others might need more explanation. Generally, each test is for a specific feature. If you think you have a test for a feature, but the code coverage shows the implementation of that feature not to be covered, then there is probably a mistake in your test.
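
To make the “mistake in your test” idea concrete, here is a toy line tracer built on Python’s sys.settrace. It is only a stand-in for what a real tool like coverage.py does, and format_name and lines_executed are hypothetical names. The test is meant to check the upper-case feature but forgets to switch it on; its weak assertion still passes, while the tracer shows that the feature’s line never ran.

```python
import sys


def format_name(first, last, upper=False):
    name = f"{first} {last}"
    if upper:
        name = name.upper()
    return name


def lines_executed(func, *args, **kwargs):
    """Run func and return the line offsets (counted from its 'def' line) that ran."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # only trace func itself, not anything it calls
        if event == "line":
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return hit


UPPER_LINE = 3  # offset of "name = name.upper()" from the def line

# A test meant to check upper-casing, but it forgot upper=True. It passes...
assert format_name("ada", "lovelace").lower() == "ada lovelace"
# ...yet coverage gives the mistake away: the upper-casing line never ran.
assert UPPER_LINE not in lines_executed(format_name, "ada", "lovelace")
assert UPPER_LINE in lines_executed(format_name, "ada", "lovelace", upper=True)
```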

Similarly, if your tests cover all your features and some code is still not covered, maybe that code isn’t important at all and could be safely removed. Geoff’s tests are not unit tests; they test the whole of PyUseCase, and that may make a difference for this particular point. If I had only unit tests and a piece of code wasn’t covered, I’m not sure I could as easily infer that it wasn’t needed as part of a larger feature.

Refactoring opportunities can be identified from gaps in coverage too. The idea is that poorly tested code is a clue that it has other problems too. Perhaps you find two pieces of code are similar, and one copy has a gap in coverage. This could indicate they originate from copy-paste programming, and could be combined into one routine, with full test coverage.
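
A hypothetical before-and-after sketch of that kind of refactoring: read_width and read_height have been copy-pasted, and only one copy’s error branch is exercised by the tests. Merging them into a single read_dimension routine means one error-path test covers every use. None of these names come from PyUseCase.

```python
# Before (hypothetical): two copy-pasted routines.
def read_width(text):
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"bad width: {text!r}") from None


def read_height(text):
    try:
        return int(text)
    except ValueError:
        # Imagine no test ever reaches this line: the coverage gap is the clue.
        raise ValueError(f"bad height: {text!r}") from None


# After: one routine, so a single error-path test covers both uses.
def read_dimension(name, text):
    try:
        return int(text)
    except ValueError:
        raise ValueError(f"bad {name}: {text!r}") from None
```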

Geoff had some tips for people who wanted to use coverage statistics to improve their tests.

  • Don’t design your tests around coverage. Write appropriate tests, and then measure coverage.
  • This applies even when working with coverage results. See the coverage report as containing clues for new tests, not commands.
  • Use “# pragma: no cover” in your code to be explicit about code that you decide not to try and cover. Review these periodically.
  • Don’t be fanatical about absolute numbers. Commands like “Aim for at least 85% coverage” are counterproductive. (You get what you measure).
  • It’s always good to increase feasible coverage. It’s sometimes better to spend your limited time on other things. But if you don’t measure, you can’t make that decision effectively.
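
To make “review these periodically” easy, a small script can list every excluded line in a source tree. This is a sketch: the regular expression is a simple approximation of the “no cover” marker, not necessarily the exact pattern your version of coverage.py uses by default, and scan_source and find_excluded_lines are hypothetical names.

```python
import re
import sys
from pathlib import Path

# A simple approximation of the "no cover" marker; coverage.py's own default
# pattern may differ slightly, so check its documentation for your version.
PRAGMA = re.compile(r"#\s*pragma:?\s*no\s*cover", re.IGNORECASE)


def scan_source(text):
    """Return (line_number, stripped_line) for every excluded line in text."""
    return [(number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if PRAGMA.search(line)]


def find_excluded_lines(root):
    """Yield (path, line_number, line) for each marked line in *.py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        for number, line in scan_source(path.read_text(encoding="utf-8")):
            yield path, number, line


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    hits = list(find_excluded_lines(root))
    for path, number, line in hits:
        print(f"{path}:{number}: {line}")
    print(f"{len(hits)} excluded line(s) to review")
```

Running it over the source directory every so often gives a short checklist, much like Geoff’s 37 lines, to decide whether each exclusion is still justified.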

Most of these points are also made in a rather old (1997) article by Brian Marick. Geoff found the article while researching his GothPy talk, thought it was very good, and found that it fits his experience.

Inspired by Geoff’s talk, I spent some time today trying to get some coverage numbers for the code and tests I’m working on at present. Unfortunately it turned out to be a bit tricky to get the coverage tool to work. The code isn’t Python, of course, and that may have something to do with it. Hopefully I’ll sort it out and be able to write a new blog post about my own experiences with coverage statistics sometime soon.