Posts tagged ‘TDD’

Last week I created a little quiz and put a link to it on Twitter. I was interested to see whether the terminology around Test Doubles has standardized on Gerard Meszaros’s definitions, from his book “xUnit Test Patterns”, and I thought my twitter followers (I have over 1000 now!) might be able to tell me.

The quiz was taken by nearly 150 people, and overall, my conclusion is that Meszaros’s definitions are in general use today (at least amongst people who follow me on Twitter). You can see a summary of the responses here. (I’ve closed the survey now, btw).

The first question was “Which of these is not a kind of Test Double”, and anyone who thought a “Suite” was a kind of test double clearly hasn’t got a clue, so I excluded their answers from my analysis. That was only a handful of responses though.

Looking at the remaining answers, between 70% and 85% share my understanding of what each kind of test double is. The scores are noticeably lower for ‘Spy’, and actually I think my question on Spies was not very good. Jason Gorman kindly sent me a few tweets pointing out what he thought was unclear about it. The answer I was looking for to distinguish a Spy was “A Spy may raise an exception in the ‘Assert’ part of the test.” I was trying to articulate the difference between a “Spy” and a “Mock”, but I’ve personally only used Test Spy frameworks like Mockito and unittest.mock. Clearly I have more to learn.

I got the Mock question right, “A mock will fail the test if its methods are not called as expected”, but I thought a Spy was basically the same, except it fails the test in the “Assert” part instead of the “Act” part. Jason kindly pointed out that a Spy could be a Stub, a Fake, or even a decorated ‘real’ object. You don’t necessarily need to use a framework. The distinguishing feature of a Spy is that it records information about interactions that your test can query later. So that was a good result from my quiz – I learnt something about Spies!
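To make that concrete, here’s a minimal sketch in Python of a hand-rolled Spy. The names are all my own invention, purely for illustration – the point is that the spy just records the interactions, and the test queries that record in the ‘Assert’ part:

    import unittest

    class EmailServiceSpy:
        """A hand-rolled Test Spy: it records interactions for the test
        to query later, rather than failing while the code runs."""
        def __init__(self):
            self.sent_messages = []

        def send(self, recipient, body):
            # Record the interaction instead of really sending anything.
            self.sent_messages.append((recipient, body))

    def notify_user(email_service, user):
        email_service.send(user, "Welcome!")

    class NotifyUserTest(unittest.TestCase):
        def test_sends_welcome_email(self):
            spy = EmailServiceSpy()                # Arrange
            notify_user(spy, "anna@example.com")   # Act
            # Assert: query what the spy recorded. Any failure is
            # raised here, in the 'Assert' part of the test.
            self.assertEqual([("anna@example.com", "Welcome!")],
                             spy.sent_messages)

    if __name__ == "__main__":
        unittest.main()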

I also asked whether people had read Gerard Meszaros’s book, or my work-in-progress book, because I was interested to see if people would give better answers in that case. When I excluded all the people who said they hadn’t read either book (about a third of responses), the scores improved significantly – over 80% agreed with me about the definitions of Mock, Stub and Fake. For Spies, on the other hand, the score was lower than for the group as a whole! That was clearly my fault…

Some people tried to argue that the distinction between the various kinds of test double is not interesting, that it doesn’t matter. I disagree. I think each kind has a different purpose and different situations when it is appropriate to use. If you can’t articulate what kind of test double you’re using, then there’s a very real danger you’re using it wrongly, or that better alternatives exist. For example, using several Mocks in the same test case and finding it very fragile, when Stubs would be sufficient. Or creating a new Stub for every test case, when it would be less work, and make for clearer tests, if you had a Fake that you could re-use in several.

So my conclusion from this quiz, and my other research on the subject, is that it’s worth using Meszaros’s definitions when discussing Test Doubles. A lot of programmers have a good working understanding of them, and do know the difference between a Stub and a Mock. So that’s very encouraging! One of the reasons Meszaros invented the “Test Double” terminology was to clear up the confusion that reigned at the time. It seems to me we’re in a much better position today because of his work. I do think there is more to do, though: I still see people using Test Doubles badly. Which is partly the motivation for my new book, which is actually about various techniques in Test-Driven Development, not just Test Doubles. I’d better go and do some work on the section on Spies…

I blogged a while back about “Text-Based testing”, which is a variant of Test-Driven Development that I’ve used quite a bit. My husband, Geoff Bache, is developing several tools to support this style of development.

Recently, we met Llewellyn Falco and discovered the work he’s been doing with Approval Tests. We were all really excited to realize we’ve independently been working on something very similar. Llewellyn’s Approval Tests library is in some ways a unit-test version of Geoff’s tool TextTest, which is probably more suited to integration or functional tests. What we’ve been calling “Text-Based Testing” is, I think, better described as “Text-Based Approval Testing”. I think it’s a particularly powerful technique for characterization tests of legacy code, and regression testing in general. Geoff’s latest tools also make it a viable approach for GUI testing, traditionally an area where people have difficulty doing pure TDD. (We’ll be talking about this at Eurostar in November.)

I’ve written a fuller description of Approval Testing in a chapter of my work-in-progress book “Mocks, Fakes and Stubs”, but I’ll summarize here. In classic Test-Driven Development, you begin by defining a test case comprising three parts – “Arrange”, “Act”, “Assert”. The assertion generally takes the form assertEqual(expected, actual), and you calculate the expected value when you define the test case. Then you go away and implement the functionality, until the “actual” value matches “expected”, and the test passes.
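In Python, that classic structure might look like this – a toy example of my own, just to pin down the terminology:

    import unittest

    class PriceCalculator:
        def __init__(self, tax_rate):
            self.tax_rate = tax_rate

        def total(self, net_price):
            return net_price * (1 + self.tax_rate)

    class PriceCalculatorTest(unittest.TestCase):
        def test_total_includes_tax(self):
            calculator = PriceCalculator(tax_rate=0.25)   # Arrange
            actual = calculator.total(net_price=100)      # Act
            # Assert: the expected value was worked out by hand
            # when the test case was written.
            self.assertEqual(125, actual)

    if __name__ == "__main__":
        unittest.main()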

With Approval Testing, you design the test case to the point of having “Arrange” and “Act”, but defer defining the “expected” value for the “Assert”. You take the approach of “I’ll know it when I see it”, and get on with implementing the code. When the actual value the code produces looks right, you “Approve” it – store the actual value in the test case. So the assert statement becomes “assertEqual(approved, actual)”.
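Here’s a minimal sketch of that mechanism in Python. This is not Llewellyn’s or Geoff’s actual API, just my own illustration of the idea; the file naming convention is an assumption too:

    import os

    def verify(actual, approved_file):
        """Compare actual output against the stored, approved version.
        On the first run, or when the output changes, the received
        output is written to disk so a human can inspect it and,
        if it looks right, rename it to approve it."""
        approved = None
        if os.path.exists(approved_file):
            with open(approved_file) as f:
                approved = f.read()
        if actual != approved:
            received_file = approved_file.replace(".approved.", ".received.")
            with open(received_file, "w") as f:
                f.write(actual)
            raise AssertionError(
                "Output differs from %s. Inspect %s and rename it "
                "to approve." % (approved_file, received_file))

    def generate_report():
        return "Sales report\n============\ntotal: 42\n"

    # First run: no approved file exists yet, so this fails and writes
    # report.received.txt for a human to inspect and approve.
    try:
        verify(generate_report(), "report.approved.txt")
    except AssertionError as e:
        print(e)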

Text-Based Approval Testing
The value you approve could be anything you can automatically diff the actual program output against – a file, a string, a screenshot, some json, contents of a database table… you name it. The thing is, plain text is wonderfully simple to diff, version control, merge, store, manipulate… and there’s a wealth of existing, well-understood tools to do that. I guess that’s why Geoff’s tools only support plain text so far. His approach has always been that if your program produces output in a different format, you write a test fixture to convert it to plain text before you diff. Llewellyn’s tools have branched out more into diffing images and suchlike.
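For example, a fixture along these lines (my own sketch, with invented names) could render structured output as stable plain text before diffing:

    import json

    def as_plain_text(record):
        """Render structured output as deterministic, diffable plain text.
        Sorting the keys keeps the text stable, so any change in the
        diff reflects a real change in behaviour."""
        return "\n".join("%s: %s" % (key, value)
                         for key, value in sorted(record.items()))

    # e.g. approval-test the JSON your program emits:
    output = json.loads('{"name": "widget", "price": 125, "in_stock": true}')
    print(as_plain_text(output))
    # in_stock: True
    # name: widget
    # price: 125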

I think “Approval Testing” is a good name for the style of testing that both Geoff’s and Llewellyn’s tools support. I like the implication that you explicitly approve the output from your program as correct, and use that as the basis for your test.

Other Approval Testers

Geoff and Llewellyn aren’t the only people using an Approval testing approach, either. Recently I led a workshop where we compared writing tests for the Gilded Rose Kata using both Cucumber and Approval Tests. Nat Pryce was there, and he later blogged about it. He speculates that Approval testing might solve some problems he’s seen with other kinds of testing. Nat has subsequently started developing a new approval testing tool, Pearlfish, so he can test his ideas.

There is also a recent screencast by Brett Slatkin from Google, who explains how he’s using an approval testing technique with image diffs to regression test his webapp. He says he finds this technique essential in a continuous delivery environment – these tests find bugs his other tests (both manual and automated) miss entirely.

I have also found Approval testing to be a really useful technique, and I hope that simply having a good name for it will help people understand what it is. Perhaps you’ll realize it’s an approach you’ve already used, just without having a name for it. Or maybe you’ll be inspired to try out one of the tools I’ve mentioned.

Last week I was in Oxford at “Iverson College”, which is a conference on the topic of Array Language Programming. There were about 25 programmers there, most of whom are expert in one or more of APL, J, K, or Q. It’s not my usual comfort zone, put it that way! I’m fairly competent with a number of programming languages, notably Python and Java, but nothing I know is really much like these array languages. It’s been a huge culture shock, but in a good way, I think.

My main discoveries are that Array Programming is different again from Object-Oriented Programming and Functional Programming (although it has a lot in common with functional programming), and that this community contains some exceptional programmers. The total number of array language programmers is, however, extremely small, and their work seems to be pretty much unknown to the wider programming community.

Array Programming Languages
I mentioned four languages above: APL, J, K and Q. They are similar to each other, kind of like Ruby and Python are similar to each other. I’ve gone through an introductory training in each language this week, largely given by the language designers themselves. I’d like to relate a little of what I’ve discovered about them.

APL
This is the oldest of the array languages, invented by Ken Iverson in the 1960s. It’s notorious for using an alphabet of funny-looking symbols to represent the built-in functions. You can try it out at http://tryapl.org – an interactive REPL (Read-Evaluate-Print-Loop) where you can put in snippets of code and see what the symbols do.

I thought at first that APL looked really intimidating and unnecessarily weird. Now, having got to know it a little, I can see the benefits of the little symbols. They make the code really concise and unambiguous, and it doesn’t take long to learn their names. Once you can pronounce each symbol in your head as you read the code, it’s not much different from writing out the names in full in the editor.

The variant of APL that most of the conference attendees use is produced by the company Dyalog. I first met the CTO, Morten Kromberg, at an XP conference in 2006. He’s shown me some APL before, but this time I really got a chance to sit down with him and look at how he writes code. Dyalog APL has a powerful IDE including a REPL, where Morten showed me how he plays around with data and code in order to come up with some useful APL expressions. When he’s got something working, he transfers the code from the REPL into a file, to make it re-usable and shareable. It’s a familiar way of working to me – many Python programmers code this way, flipping between the REPL and a script file. It was a real pleasure to code with Morten – he is an extremely skilled programmer. Dyalog APL looks nice too: it has a fully-fledged IDE, and interfaces with .Net, Excel spreadsheets, ASP.net and more. Basically, it would fit nicely into the technology stack of many IT departments.

J
This was Ken Iverson’s next language, created together with Roger Hui, who now continues development of it. J is similar to APL in many ways, but is open source, and uses only ASCII characters. They’ve made an effort to make it open and less intimidating to newcomers, and probably for that reason, it’s the one I chose to download and try to learn before the conference.

I met Roger at breakfast on the first day of the conference, knowing nothing about who he was; he just said he was a programmer. I confessed that I’d downloaded J and made some joke about hoping I’d get on better with it than Ron Jeffries. (Ron wrote articles on his blog about his efforts to learn J, and later gave up, finding it too hard!) Roger genuinely didn’t know who Ron Jeffries is, although he did know of the agile manifesto. He was very kind and concerned to help me understand J, though (and Ron too, if he wants!).

Despite my head start with J, by the end of the conference I found APL code easier to grasp – J seems more extreme to me. Roger calls J “executable mathematical notation”, and I’ve always been a bit more of an engineer than a mathematician.

K
K was invented by Arthur Whitney, who was also at the conference. I didn’t really get a feel for how the language works, more than that it’s extremely terse.

Arthur gave a talk at the conference about his new project, KOS. He and two other guys are writing an operating system pretty much from scratch, using K, C, and bits of the linux kernel (although they’d like to remove those). He showed us how you write applications for this new OS in K, by demonstrating building a text editor. He began from the alpha version of the OS with just a window manager, and a plain new window canvas that didn’t respond to any keyboard or mouse events.

Arthur added a line of K code to let you enter text into the window – a listener for key presses. Then a line of code to move the caret around with the arrow keys. Then a line of code for changing the font size. Then scrolling. Code to handle Ctrl-C and Ctrl-V to copy and paste text… In the space of less than half an hour, he had an equivalent to notepad working. No compilation, no reloading. And the code was… phenomenal. You can see a version of it here. I can’t really read it; it looks mostly like line noise to me. All his variable and function names are one or two characters, and K just seems pathologically terse.

I raised my hand and asked Arthur if he thought his code would be more readable if he used longer variable names. He thought for a moment, looking surprised and a little bewildered by the question. Then he shook his head and said slowly “No. no. I don’t think I need that. I want to see all my code on the screen at once”. Needless to say, that was a big culture shock moment for me!

The size of the codebase is something all the array language programmers seem really concerned about, even if Arthur’s code is considered extreme even in that community. One thing I did later that week was to take a piece of code from Robert C. Martin’s book “Clean Code” (Args.java), as an example of clean Java code, and show it to the group. There were general exclamations of “aargghh! that hurts my eyes!”, but after a little while, as I explained the structure of the code, they seemed to appreciate it a little better. What they did say that intrigued me, though, was that they automatically scanned the page looking for the symbols – the >, !, = signs – the parts that do something, as they put it. The other text, they said, obscures the structure; it distracts the eye. Yes, that’s right. Having names for the functions and variables makes the code less readable.

KDB+ and Q
KDB+ is a very small and fast commercial database largely used by financial institutions, also originally created by Arthur Whitney. Q is a kind of domain specific language built on top of K, that you use to query the data in a KDB+ instance.

I sat down with Attila Vrabecz, an experienced Q and KDB+ programmer, and we coded together for a couple of hours. We tackled a problem I’d previously coded with Morten in APL, to help me see what was different. There were many similarities – the workflow was the same for example – experimenting in the REPL before transferring the code into a more permanent, reusable form. I noticed Q has many more English words in it, fewer strange symbols, and Attila made more use of library functions than Morten did. It seems Q is designed to be approachable for a former SQL programmer, although once you scratch the surface, it’s much more like APL than SQL.

Test-Driven Development
I gave a talk at the conference about TDD. My aim was to provoke discussion, and I argued that writing automated tests using TDD is the best approach. I was definitely successful at sparking a discussion! Actually, the idea that programmers should write automated tests for their code didn’t seem all that controversial, especially amongst the more seasoned developers present. We got way more hung up on how large a chunk of code counts as a “unit” for your unit test, and what clean code looks like in an array language. To my eye, their units are large and their clean code is terse.

A challenge for the future
Dave Thomas, former lead developer for the Eclipse project, and general software visionary, is also an APL and K programmer. He flew in for just one day of the conference, and his talk functioned as the keynote address for the week – it was a clear challenge to the Array Languages community.

Dave painted a vision for the future where people will be living in a sea of big data they don’t understand, and lack adequate tools to query. He sees a great opportunity for array languages, which are generally very good at handling large amounts of data.

He ended his talk with an ambitious challenge to this community to get its act together, start being seen as a credible alternative, and grow. I could only applaud and agree – I found his advice insightful, and I hope the array languages community will do as he suggests.

I spent one very pleasant evening chatting with a woman who is about to embark on her PhD in atmospheric science. She’s hoping to use array languages to help her create software models that will execute quickly on huge arrays of multi-dimensional climate data. Her work sounds fascinating, and I hope it’s a sign of array languages starting to be used beyond their traditional niche in finance.

So I’m leaving the conference carrying a huge tome entitled “A complete introduction to Dyalog APL”, some pieces of code I’ve written, and good intentions to study further. I do find it fascinating that even with the little I know of it, APL allows me to think about and solve a problem differently than I do in Python. I anticipate I’ll find plenty of people in the Array Languages community willing to help me if I do continue to try to learn it. They’re an opinionated, quirky, mature, gentle, yet small bunch of extremely skilled programmers, and I’m glad to have met and coded with them.

This is the third post in a series about London School TDD. The first one is here, introducing the topic. The second post discusses “Outside-In Development with Double-Loop TDD”. In this post I’d like to talk about the second difference I see between Classic and London School TDD, which is to do with your style of Object Oriented Design.

“Different design styles have different techniques that are most applicable for test-driving code written in those styles, and there are different tools that help you with those techniques…

That’s what we … designed JMock to do …
“Tell, Don’t Ask” object-oriented design.”

— Nat Pryce, in an email to a discussion forum.

That quote explains the objective Nat et al. had when designing JMock, and I think it shows that London School TDD is actually a school of design as much as a testing technique. Let’s take a closer look at this way of designing objects.

Tell, Don’t Ask

“Tell, Don’t Ask” Object Oriented Design is about having cohesive objects that hide their internal workings. If your objects obey the Law of Demeter, that’s a good start: it means they hide their inner workings and don’t talk to objects far away on the object graph. That reduces coupling in your system, which should make for better maintainability.
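A small Python example (my own, invented for illustration) shows the difference between telling and asking:

    class Account:
        """'Tell, Don't Ask': the object hides its balance and
        enforces its own rules."""
        def __init__(self, balance):
            self._balance = balance

        def withdraw(self, amount):
            if amount > self._balance:
                raise ValueError("insufficient funds")
            self._balance -= amount

    account = Account(balance=100)

    # Tell style: one message, and the decision lives inside the object.
    account.withdraw(30)

    # Ask style (what we're avoiding) would pull the data out and
    # spread the rule across the callers:
    #     if account.get_balance() >= 30:
    #         account.set_balance(account.get_balance() - 30)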

In their book “Growing Object Oriented Software, Guided by Tests”, Freeman & Pryce actually define “Tell, Don’t Ask” as the same as following the Law of Demeter (p17). Then they go on with several chapters about their design style, expanding far beyond simply “following the Law of Demeter”. It’s well worth a read, here’s a sample:

“… we focus our design effort on how the objects collaborate … obviously, we want to achieve a well-designed class structure, but we think the communication patterns between objects are more important.”

— Freeman & Pryce, GOOS, p58

Message passing vs Types with Data

So it’s basically about how you view your objects. Do you see them primarily in terms of sending and receiving messages to other objects in order to get stuff done? Or are you more focussed on the data your objects look after and the class of objects they are part of?

In the diagram below you can see an object is defined in terms of which messages it sends and receives:

london_school_008

This diagram shows the same object, but with a focus on data rather than messages:

london_school_007

If you have a “message” focus, you’ll be concerned with defining protocols and interfaces. You’ll worry about which collaborators will be needed to process a particular message. If you have a “data” focus, you’ll be interested in checking your object goes through particular state transitions. You’ll check it makes correct calculations based on its data, and hides whether the result is cached or calculated.

In my first post I talked about the three ways to verify object behaviour. In Classic TDD, the most popular way to write your assert is to check the state of the object you’re testing, or a collaborator, using a public API. This naturally leads you to design objects that are more type-oriented, with the emphasis on class relationships.

In London School TDD, the favoured way to write your assert is to use a mock and check a particular interaction happened, or in other words, a particular message was passed. This is because you favour a design where objects don’t reveal much at all about their data – your system is all about the interactions.
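Here’s how the two assert styles can look in Python, using unittest.mock for the interaction check. The Order example is my own invention:

    from unittest.mock import Mock

    class Order:
        def __init__(self):
            self.status = "new"

        def place(self, payment_gateway):
            payment_gateway.charge(100)
            self.status = "placed"

    # Classic style: assert on the resulting state, via the public API.
    order = Order()
    order.place(Mock())
    assert order.status == "placed"

    # London School style: assert that a particular message was sent.
    gateway = Mock()
    Order().place(gateway)
    gateway.charge.assert_called_once_with(100)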

Dependencies and Collaborators

In a message-oriented design, it’s natural to want to specify which collaborators a particular object needs in order to get something done, and what messages it will send them. It’s part of the public specification of an object, and natural to pin down in a test case using mocks. If you instead check your object via a method that lets you query its state, it could expose details that might stop you refactoring the internals later. This leads you to prefer to check your messages, rather than state and data.

If you have a more type-oriented design, you may want to hide the fact you’re storing data across several objects, or delegating certain calculations to other objects. Those dependencies aren’t part of the public specification; what matters is the end result. If you start exposing these interactions in your test via mocks, you’ll end up with brittle tests that hinder a subsequent redistribution of responsibilities between an object and its dependents. This leads you to prefer to check state and data, rather than interactions.

Comparing the two styles

In these articles I’ve tried to draw each style of TDD to an extreme in order to emphasize the differences. Of course, in practice, a competent developer will use the style most appropriate to the situation she finds herself in. She may use both styles while developing different pieces of the same system. In my next post, I’d like to illustrate this with a small example.

In my last post, I started talking about London School TDD, and the two features of it that I think distinguish it from Classic TDD. The first was Outside-In development with Double Loop TDD, which I’d like to talk more about in this post. The second was “Tell, Don’t Ask” Object Oriented Design. I’ll take that topic up in my next post.

Double Loop TDD

london_school_001

When you’re doing double loop TDD, you go around the inner loop on the timescale of minutes, and the outer loop on the timescale of hours to days. The outer loop tests are written from the perspective of the user of the system. They generally cover thick slices of functionality, deployed in a realistic environment, or something close to it. In my book I’ve called this kind of test a “Guiding Test”, but Freeman & Pryce call them “Acceptance Tests”. These tests should fail if something the customer cares about stops working – in other words they provide good regression protection. They also help document what the system does. (See also my article “Principles for Agile Automated Test Design“).

I don’t think Double Loop TDD is unique to the London School of TDD, I think Classic TDDers do it too. The idea is right there in Kent Beck’s first book about eXtreme Programming. What I think is different in London School, is designing Outside-In, and the use of mocks to enable this.

Designing Outside-In

If you’re doing double loop TDD, you’ll begin with a Guiding Test that expresses something about how a user wants to interact with your system. That test helps you identify the top level function or class that is the entry point to the desired functionality, that will be called first. Often it’s a widget in a GUI, a link on a webpage, or a command line flag.

With London School TDD, you’ll often start your inner loop TDD by designing the class or method that gets called by that widget in the GUI, that link on the webpage, or that command line flag. You should quickly discover that this new piece of code can’t implement the whole function by itself, but will need collaborating classes to get stuff done.

london_school_003

The user looks at the system, and wants some functionality. This implies a new class is needed at the boundary of the system. This class in turn needs collaborating classes that don’t yet exist.

The collaborating classes don’t exist yet, or at least don’t provide all the functionality you need. Instead of going away and developing these collaborating classes straight away, you can just replace them with mocks in your test. It’s very cheap to change mocks and experiment until you get the interface and the protocol just the way you want them. While you’re designing a test case, you’re also designing the production code.

london_school_004

You replace collaborating objects with mocks so you can design the interface and protocol between them.
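In Python with unittest.mock, that design step might look something like this. The domain and all the names are invented, just to show the shape of it:

    import unittest
    from unittest.mock import Mock

    class CheckoutHandler:
        """The boundary class the GUI calls. Its collaborators
        don't exist yet; the test below pins down the protocol
        we want them to have."""
        def __init__(self, pricer, receipts):
            self.pricer = pricer
            self.receipts = receipts

        def checkout(self, basket):
            total = self.pricer.total(basket)
            self.receipts.issue(basket, total)

    class CheckoutHandlerTest(unittest.TestCase):
        def test_prices_basket_and_issues_receipt(self):
            pricer = Mock()     # collaborator not yet implemented
            receipts = Mock()   # nor is this one
            pricer.total.return_value = 125

            CheckoutHandler(pricer, receipts).checkout(["apple", "book"])

            # Designing the protocol: these expectations *are* the
            # interface the real Pricer and Receipts will provide.
            pricer.total.assert_called_once_with(["apple", "book"])
            receipts.issue.assert_called_once_with(["apple", "book"], 125)

    if __name__ == "__main__":
        unittest.main()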

When you’re happy with your design, and your test passes, you can move down the stack and start working on developing the implementation of one of the collaborating classes. Of course, if this class in turn has other collaborators, you can replace them with mocks and design these interactions too. This approach continues all the way through the system, moving through architectural layers and levels of abstraction.

london_school_005

You’ve designed the class at the boundary of the system, and now you design one of the collaborating classes, replacing its collaborators with mocks.

This way of working lets you break a problem down into manageable pieces, and get each part specified and tested before you move onto the next part. You start with a focus on what the user needs, and build the system from the “outside-in”, following the user interaction through all the parts of the system until the guiding test passes. The Guiding Test will not usually replace parts of the system with mocks, so when it passes you should be confident you’ve remembered to actually implement all the needed collaborating classes.

Outside-In with Classic TDD

A Classic TDD approach may work outside-in too, but using an approach largely without mocks. There are various strategies to cope with the fact that collaborators don’t exist yet. One is to start with the degenerate case, where nothing much actually happens from the user’s point of view. It’s some kind of edge case where the output is much simpler than in the normal or happy-path case. It lets you build up the structure of the classes and methods needed for a simple version of the functionality, but with basically empty implementations, or simple faked return values. Once the whole structure is there, you flesh it out, perhaps working inside-out.
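As a toy illustration (my own example), the degenerate case of a report generator might be the empty input, which you can make pass with an essentially empty implementation:

    class ReportGenerator:
        def generate(self, transactions):
            # Degenerate case first: no transactions, trivial output.
            # The class and method structure is now in place; the
            # happy path gets fleshed out later, working inside-out.
            if not transactions:
                return "No transactions.\n"
            raise NotImplementedError("happy path comes later")

    assert ReportGenerator().generate([]) == "No transactions.\n"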

Another way to do this in Classic TDD is to start writing the tests from the outside-in, but when you discover you need a collaborating class to be implemented before the test will pass, comment out that test and move down to work on the collaborator instead. Eventually you find something you can implement with collaborators that already exist, then work your way up again.

A Classic TDD approach will often just not work outside-in at all. You start with one of the classes nearer the heart of the system. You’ll pick something that can be fully implemented and tested using collaborating classes that already exist. Often it’s a class in the central domain model of the application. When that is done, you continue to develop the system from the heart towards the outside, adding new classes that build on one another. Because you’re using classes that already exist, there is little need for using mocks. Eventually you find you’ve built all the functionality needed to get the Guiding Test to pass.

Pros and Cons

I think there’s a definite advantage to working outside-in: it keeps your focus on what the user really needs, and helps you to build something useful without too much gold-plating. I think it takes skill and discipline to work this way with either Classic or London School. It’s not easy to see how to break down a piece of functionality into incremental pieces that you can develop and design step-by-step. If you work from the heart outwards, there is a danger you’ll build more than you need for what the user wants, or that you’ll get to the outside, discover it doesn’t “fit”, and have to refactor your work.

Assuming you are working outside-in, though, one difference seems to me to be in whether you write faked implementations in the actual production code, or in mocks. If you start with fakes in the production code, you’ll gradually replace them with real functionality. If you put all the faked functionality into mocks, they’ll live with the test code, and remain there when the real functionality is implemented. This could be useful for documentation, and will make your tests continue to execute really fast.

Having said that, there is some debate about the maintainability of tests that use a lot of mocks. When the design changes, it can be prohibitively expensive to update all the mocks as well as the production code. Once the real implementations are done, maybe the inner-loop tests should just be deleted? The Guiding Test could provide all the regression protection you need, and maybe the tests that helped you with your original design aren’t useful to keep? I don’t think it’s clear-cut, actually. From talking to London School proponents, they don’t seem to delete all the tests that use mocks. They do delete some, though.

I’m still trying to understand these issues and work out in what contexts London School TDD brings the most advantage. I hope I’ve outlined what I see as the differences in way of working with outside-in development. In my next post I look at how London School TDD promotes “Tell, Don’t Ask” Object Oriented Design.