Archive for the ‘Opinion’ Category

Last week I created a little quiz and put a link to it on Twitter. I was interested to see whether the terminology around Test Doubles has standardized on Gerard Meszaros's definitions, from his book "xUnit Test Patterns", and I thought my Twitter followers (I have over 1000 now!) might be able to tell me.

The quiz was taken by nearly 150 people, and overall my conclusion is that Meszaros's definitions are in general use today (at least amongst people who follow me on Twitter). You can see a summary of the responses here. (I've closed the survey now, by the way.)

The first question was "Which of these is not a kind of Test Double?", and anyone who thought a "Suite" was a kind of test double clearly hasn't got a clue, so I excluded their answers from my analysis. That was only a handful of responses, though.

Looking at the remaining answers, between 70% and 85% of respondents share my understanding of what each kind of test double is. The scores are noticeably a little lower for 'Spy', and actually I think my question on Spies was not very good. Jason Gorman kindly sent me a few tweets pointing out what he thought was unclear about it. The answer I was looking for to distinguish a Spy was "A Spy may raise an exception in the 'Assert' part of the test". I was trying to articulate the difference between a "Spy" and a "Mock", but I've personally only used Test Spy frameworks like Mockito and unittest.mock. Clearly I have more to learn.

I got the Mock question right, “A mock will fail the test if its methods are not called as expected”, but I thought a Spy was basically the same, except it fails the test in the “Assert” part instead of the “Act” part. Jason kindly pointed out that a Spy could be a Stub, a Fake, or even a decorated ‘real’ object. You don’t necessarily need to use a framework. The distinguishing feature of a Spy is that it records information about interactions, that your test can query later. So that was a good result from my quiz – I learnt something about Spies!
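To make that distinction concrete, here is a minimal sketch in Python using unittest.mock. The UserService and mailer names are hypothetical, not taken from the quiz; the point is just that the spy silently records the interaction during the 'Act' part, and the test queries it afterwards in the 'Assert' part.

    # Minimal sketch of spy-style verification with unittest.mock.
    # UserService and send_welcome are illustrative names.
    from unittest.mock import Mock

    class UserService:
        def __init__(self, mailer):
            self.mailer = mailer

        def register(self, email):
            # ... store the user somewhere, then notify them
            self.mailer.send_welcome(email)

    def test_registering_a_user_sends_a_welcome_mail():
        mailer_spy = Mock()                      # records every call made to it
        service = UserService(mailer_spy)        # Arrange
        service.register("alice@example.com")    # Act
        # Assert: query what the spy recorded about the interaction
        mailer_spy.send_welcome.assert_called_once_with("alice@example.com")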

I also asked whether people had read Gerard Meszaros's book, or my work-in-progress book, because I was interested to see whether people who had would give better answers. When I excluded all the people who said they hadn't read either book (about a third of responses), the scores improved significantly – over 80% agreed with me about the definitions of Mock, Stub and Fake. For Spies, on the other hand, the score was lower than for the group as a whole! That was clearly my fault…

Some people tried to argue that the distinction between the various kinds of test double is not interesting, that it doesn't matter. I disagree. I think each kind has a different purpose and different situations where it is appropriate. If you can't articulate what kind of test double you're using, then there's a very real danger you're using it wrongly, or that better alternatives exist. For example, using several Mocks in the same test case and finding it very fragile, when Stubs would be sufficient. Or creating a new Stub for every test case, when it would be less work, and make for clearer tests, to have a Fake that you could re-use in several of them.
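To illustrate that last point, here is a sketch of the kind of Fake I mean: a simple working in-memory implementation that many test cases can share, instead of each one setting up its own stub. The repository name is illustrative, not from the quiz.

    # A Fake: a lightweight working implementation, reusable across many tests.
    class InMemoryUserRepository:
        def __init__(self):
            self._users = {}

        def save(self, user):
            self._users[user.name] = user

        def find_by_name(self, name):
            return self._users.get(name)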

So my conclusion from this quiz, and my other research on the subject, is that it's worth using Meszaros's definitions when discussing Test Doubles. A lot of programmers have a good working understanding of them, and do know the difference between a Stub and a Mock, which is very encouraging! One of the reasons Meszaros invented the "Test Double" terminology was to clear up the confusion that reigned at the time, and it seems to me we're in a much better position today because of his work. I do think there is more to do, though: I still see people using Test Doubles badly. That is partly the motivation for my new book, which is about various techniques in Test Driven Development, not just Test Doubles. I'd better go and do some work on the section on Spies…

A while back, Gojko Adzic published this article, "Redefining Software Quality", and I think it's pretty insightful, pointing out that we often expend a lot of effort ensuring quality at the lower levels of his quality pyramid, when we should perhaps be investing higher up.

I wanted to work out what testing activities you'd do to ensure quality at the different levels of Gojko's quality hierarchy, so I began by thinking about the software testing quadrants. The testing quadrants were originally documented by Brian Marick on his blog, and later developed by Lisa Crispin and Janet Gregory in their book "Agile Testing". (Here is a slide deck by Janet Gregory that gives a summary of the quadrants).

[Image: agile testing quadrants]

I like the agile testing quadrants because they help you to reason about different testing activities and why you're doing them. They make it clear that in agile, testing has a big role in supporting the team – spreading knowledge about what's being built and enabling the team to be agile about feature changes. In more traditional projects, testing focuses almost exclusively on the right side of the quadrants, missing this role entirely.

Anyway, if you put the agile testing quadrants alongside Gojko’s quality hierarchy, I think you get something like this:

[Image: quality hierarchy and testing quadrants]

Clearly, Q1 and Q2 are all about ensuring the functionality basically works, and usually Q2 tests will run against deployed software (deployed in a test environment). Q4 tests cover things like performance and security, and usability falls under Q3. I think things get a little more tricky for the higher levels though. There's a distinct danger we've just run out of quadrants!

Beyond the testing quadrants

To test whether a piece of software is useful or successful, you'll need to look at ideas that are relatively new in Agile. In his article, Gojko of course points to his own book on Impact Mapping, and mentions Feature Injection and Lean Startup. Lisa and Janet published their book in 2009, "Lean Startup" by Eric Ries came out in 2011, Gojko's "Impact Mapping" book is from 2012, and correct me if I'm wrong, but I don't think there is a book on Feature Injection yet.

Agile ideas are moving on, especially compared with where they were when methodologies like Scrum and XP were originally documented a decade or more ago. I think it’s clear testers need to embrace new ideas and practices too.

If you look at Lean Startup, one of the concrete ideas is to do A/B testing of new features – that is, you divide your users into an "A" group, who see the new feature, and a "B" group, who don't. Then you measure how each group behaves, and from that draw conclusions about whether the feature was any good. If the "A" group buys more stuff, spends more time on the site, and generally acts happier than the "B" group, you keep the feature; otherwise it gets dropped. It's a bit like a trial of a new medicine – you compare the patients who get it with a control group who don't, before you decide whether to approve it.
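As a rough illustration of the mechanics, the split and the measurement might look something like the toy sketch below. It is only a sketch: real experiments need proper sample sizes and significance testing, and the metric here (purchases per visit) is just an example.

    # Toy sketch of an A/B split and a simple metric comparison.
    import hashlib

    def ab_group(user_id):
        # Deterministic assignment: the same user always lands in the same group.
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    def conversion_rate(visits):
        # visits is a list of dicts like {"group": "A", "purchased": True}
        if not visits:
            return 0.0
        return sum(1 for v in visits if v["purchased"]) / float(len(visits))

    def compare(visits):
        a = conversion_rate([v for v in visits if v["group"] == "A"])
        b = conversion_rate([v for v in visits if v["group"] == "B"])
        return a, b   # keep the feature if the "A" group does better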

It seems to me that testers should be involved in designing and performing A/B tests – they're well positioned, with critical thinking skills, knowledge of the user, and technical automation skills. The results of such tests should tell us whether users find a new feature useful. So that should get us up another level on Gojko's pyramid:

[Image: quality hierarchy and testing quadrants - with A/B]

The last level is fairly dependent on how you define success, but for a lot of software products, success means lots of users. Lean Startup picks up another interesting idea here – the "Net Promoter Score". Basically you ask a small group of initial users whether they'd recommend your product, and make a simple calculation to predict whether your user base is going to grow when you release it more widely. It's an idea for what to put at the top level:

[Image: quality hierarchy and testing quadrants - with net promoter score]
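The calculation itself is simple enough to sketch: respondents who score you 9 or 10 count as promoters, 0 to 6 as detractors, and the Net Promoter Score is the percentage of promoters minus the percentage of detractors.

    # Sketch of the standard Net Promoter Score calculation.
    def net_promoter_score(scores):
        promoters = sum(1 for s in scores if s >= 9)
        detractors = sum(1 for s in scores if s <= 6)
        return 100.0 * (promoters - detractors) / len(scores)

    # e.g. net_promoter_score([10, 9, 8, 7, 6, 10]) is about 33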

Conclusions

Of course, for many teams it's enough of a struggle to test for quality at the lower levels of the pyramid, without worrying about A/B tests or the Net Promoter Score! James Shore and Diana Larsen have come up with a model of "Agile Fluency" which I think is relevant here. Basically it outlines how, as teams get better at agile, their practices change. The three-star level of fluency seems to contain a lot of ideas from Lean Startup, optimizing for quality at the top two levels of Gojko's pyramid. At one- and two-star fluency, where the focus is on delivering business value at the market cadence, the testing quadrants alone get you a long way.

Changing testing activities also implies changes for testers. The role of tester has already changed with the advent of agile methods, and I predict it's going to continue changing. I see a technical tester role appearing that is pretty close to a business analyst's, doing things like supporting the team with test automation and data analysis of A/B tests. So testers: get a head start and find out about Lean Startup!

The world needs more and better programmers. Jason Gorman recently wrote this post encouraging people to start offering software apprenticeships, as an alternative to computer science degrees.

He writes:

“our computing education in [the UK] is preparing students for a career in a version of computing most of us don’t recognise. Students devote the majority of their time learning theory and skills that they almost certainly won’t be applying when they get their first proper job. Computing schools are hopelessly out of touch with the reality of computing in the real world. While employers clamour for TDD or refactoring skills, academics turn their noses up at them and focus on things like formal specification and executable UML and compiler design, along with outdated and thoroughly discredited “software engineering” processes.” — Jason Gorman

Jason ends his post with a call to arms – if you’re a good software developer, get yourself an apprentice, and start training them. It’s the same message I heard from Dave Hoover when he visited Göteborg recently. I think he also sees a multi-year apprenticeship as a better alternative for training programmers than a computer science degree.

I also recently came across this article, written by a computer science teacher in the US, with the following paragraph:

“I no longer teach programming by teaching the features of the language and asking the students for original compositions in the language. Instead I give them programs that work and ask them to change their behavior. I give them programs that do not work and ask them to repair them. I give them programs and ask them to decompose them. I give them executables and ask them for source, un-commented source and ask for the comments, description, or specification. I let them learn the language the same way that they learned their first language. All tools, tactics and strategies are legitimate. ” — William Hugh Murray

So clearly some academics are teaching in creative ways. Rather than abandoning computer science degrees, might it not be better to improve their content?

One of the things about the XP conference is that it brings together industry and academia, and lets them hear from one another. How to teach programming is a very important topic that is often discussed there. XP2005, for example, was held at Sheffield University, where I remember chatting to one of the professors and being impressed by the way they used eXtreme Programming as part of their undergraduate course.

Another thing that happened at XP2005 was the first coding dojo I attended, and I believe the first one ever held outside of France. It was presented by Laurent Bossavit and Emmanuel Gaillot, founders of the Paris dojo. I was excited to discover a context in which I could improve my practical programming skills, in regular short bursts, alongside a continuing paid job.

So one of the things I do in my new life as an agile testing consultant is to use the coding dojo format to teach people how to program better. We'll do code kata exercises, practice Test Driven Development and Refactoring, and discuss what Clean Code looks like. So far the reaction from professionals I've done this with has been very positive. Lots of people who have been coding for years appreciate the chance to learn new practical skills.

I'm also getting involved in more formal education: this spring I'm teaching a three-week course in automated testing, as part of a "Kvalificerad Yrkesutbildning" in software testing. This is a one-year full-time course for students wanting to learn a practical skill, as an alternative to going to university and studying a more academic subject. In Sweden you can get a student loan while you're studying on this course, and part of the time is spent working in a company gaining on-the-job experience.

I'm starting to plan how I'm going to teach TDD and BDD, and how to use tools like Selenium, FitNesse, TextTest and Cucumber. I think it's going to be very hands-on and practical, but also go into the general principles behind tool choice and writing maintainable automated tests. I'm also helping to write a formal syllabus and exam, with criteria for the grades awarded.

I guess what I’m trying to say is that I don’t like this strand of thought in the Software Craftsmanship movement that wants to abandon formal education. There are lots of ways to train software developers, and apprenticeship isn’t without its problems.

I think this is just the sort of thing we’ll be discussing at XP2011, where there will be a host of academics and experts from industry. Won’t you join us?

Last night at GothPy we had a play with Django, a web application framework for Python. I’m fairly familiar with Rails, so it was interesting to see a different take on solving the same kinds of problems.

I downloaded Django for the first time when preparing for the meeting, and spent about a day going through the tutorial and trying stuff out. At the meeting were a few people who have used Django before, notably Mikko Hellsing, who has worked with it daily for several years.

It was so much easier learning the tool from people who knew it than by reading the documentation. Constant code review and real time answering of questions is just brilliant for learning.

At the meeting we decided to implement a very, very simple web application, similar to the one that Johannes Brodwall performed as a Kata at XP2010 together with Ivar Nilsen. He calls it the "Java EE Spike Kata" since he does it in Java with no particular web frameworks, just what comes with Java. (There is a video of him doing it on his blog, and a sample solution here on GitHub).

I thought we should be able to implement the same little application in any web framework, and that it might be interesting to see the differences, so we decided to try it in Django. It involves just two pages: an "Add User" page, which lets you create a new user and save it to the database, and a "Search User" page, which lets you search for users and presents the results. So the scenario is to create a user, search for them, and see that they are returned.

When I work on a problem in Rails I usually start with a Cucumber scenario for the feature, and I discovered there is a Python version of Cucumber called Lettuce. We could of course have just used Cucumber with Python, but given the big "WARNING – Experimental" notice Aslak wrote on this page, I thought we could give Lettuce a try.

So at the meeting we all worked together with one laptop and a projector (me at the keyboard), and we started with a Lettuce scenario. We implemented step definitions using Django's test Client, which is a kind of headless browser that understands how to instrument a Django application. Then we spent a couple of hours writing Django code together until the scenario was all green.
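To give a flavour, the step definitions looked roughly like the sketch below. The URLs, form field names and step wording here are reconstructed for illustration rather than copied from the session; the real code is in the github repository linked below.

    # Rough sketch of Lettuce step definitions driving Django's test Client.
    from lettuce import step, world
    from django.test.client import Client

    @step(u'I add a user named "(.*)"')
    def add_user(step, name):
        world.client = Client()
        world.response = world.client.post('/users/add/', {'name': name})

    @step(u'I search for "(.*)"')
    def search_for(step, name):
        world.response = world.client.get('/users/search/', {'q': name})

    @step(u'I should see "(.*)"')
    def should_see(step, text):
        assert text in world.response.content, \
            'Expected to find "%s" in the response' % text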

The code we ended up with isn’t much to write home about, but I’ve put it up on github here.

What we learned from this exercise
Of course since I know Rails much better, I found it interesting to compare the two web frameworks. Django seems similar in many ways. It was dead easy to get up and running, it gives you a basic structure for your code, and separates the model from the presentation layer.

The O-R mapping looks quite similar to ActiveRecord in the way you declare model classes backed by tables in the database. The presentation layer seems to have a different philosophy from Rails though. The HTML view part is rather loosely coupled to your application, and doesn't allow you to embed real Python code in it, just basic control structures.
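For example, a model declaration in Django looks roughly like this – the field names are illustrative rather than the actual code from the session:

    # A Django model: a Python class backed by a database table,
    # much like an ActiveRecord model in Rails.
    from django.db import models

    class User(models.Model):
        name = models.CharField(max_length=100)
        email = models.EmailField(blank=True)

        def __unicode__(self):
            return self.name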

You hook up the HTML to the controller code using URL regular expression matching. I was a little confused about exactly how this was supposed to work, since what I considered controller code is put in a file called "views.py". Most of the code we wrote ended up in there, and I feel a bit unhappy with it; it seems to be a mixture of stuff at all levels of abstraction. The Django Form objects we used seemed quite powerful though, and reduced the amount of code we had to write.
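Roughly, the wiring looks like the sketch below. The module names, urls and fields are illustrative, and the url configuration style is the one from the Django version of that era, not necessarily what you would write today.

    # forms.py - a Form does the validation work
    from django import forms

    class UserForm(forms.Form):
        name = forms.CharField(max_length=100)

    # views.py - what Django calls a "view" is roughly a Rails controller action
    from django.shortcuts import render_to_response
    from users.forms import UserForm
    from users.models import User

    def add_user(request):
        form = UserForm(request.POST or None)
        if request.method == 'POST' and form.is_valid():
            User.objects.create(name=form.cleaned_data['name'])
        return render_to_response('add_user.html', {'form': form})

    # urls.py - a regular expression maps each url to a view function
    from django.conf.urls.defaults import patterns, url
    urlpatterns = patterns('',
        url(r'^users/add/$', 'users.views.add_user'),
    )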

The biggest difference I noticed compared with Rails was how explicit Python is about where stuff comes from. You always have to declaratively import or pass stuff before you can use it, so I found it straightforward to follow the connections and work out which code was executed and where it came from. I think that is because of Python’s philosophy of strict namespaces, which Ruby doesn’t have so much of.

I also liked the way Django encourages you to structure a larger web application into a lot of independent smaller ones, and lets you include and re-use other people’s small apps. Rails has plugins too of course, but I thought Django’s way made it seem very natural to contribute and re-use code.

Comparing Lettuce to Cucumber, they look almost the same (by design). One small difference I found was that Cucumber step definitions don't care whether you write "Given", "When" or "Then" in front of them, whereas Lettuce does. So I had steps like this:

Then I should see "results found"
And I should see "Name"

where the first step passed and the second step was reported as unimplemented by Lettuce. So Lettuce needs a little more work to be really usable.

I was also pretty disappointed by the Django test client for implementing test steps. It seemed to interact with pages at a lower level of abstraction than I am used to: at the level of making POST and GET requests and parsing the HTML as a DOM. I missed Capybara and its DSL for interacting with web pages, and I couldn't find any equivalent. I probably should have turned to Selenium directly, but since we had no client-side JavaScript it seemed overkill. (Francisco Souza has written about using Lettuce with Selenium compared with the Django test client here).

When it comes to unit-level testing, Django provides an extension to the normal unittest tool that comes with Python (unittest was originally based on JUnit). We didn't do much with it at the session, and it seemed to work fine. It's nothing like RSpec though, and I miss that expressiveness and structure for my tests whenever I work in Python.
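A unit-level test with it looks something like this sketch, reusing the illustrative User model from above:

    # Django's TestCase extends unittest: each test runs in its own transaction
    # against a test database, and self.client gives you a test Client for free.
    from django.test import TestCase
    from users.models import User

    class SearchUserTest(TestCase):
        def test_finds_saved_user_by_partial_name(self):
            User.objects.create(name="Alice")
            matches = User.objects.filter(name__contains="Ali")
            self.assertEqual(1, matches.count())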

Overall
All in all it was fun to look at Django with a group and to get some really helpful comments and insights from people who know it far better than I do. The Kata we chose seemed to work ok, but I should really do it again in Rails since I spent the whole time comparing Django with Rails in my head, not Django with Johannes’ Java code 🙂

My conclusions are basically that Django looks like a good alternative to Rails. It would take time to learn, and it surely has strengths and weaknesses I can't really evaluate from one short session looking at it. However, I'm fairly sure I'd have to do some work improving the testing tools if I were going to be happy working with it for real.

Yesterday, Geoff gave his talk about agile GUI testing. For anyone who missed it, here is a video of him giving roughly the same talk earlier this year at EuroPython. Gojko Adzic has also blogged about what he said, which is exactly the kind of feedback we came to this conference for. In his post Gojko explains Geoff's testing approach, and seems quietly positive about it, at least compared to the "sine of death" you get with other UI test automation tools. His conclusion that it looked more suitable for legacy code than greenfield development is a little uncomfortable though. We think it works there too 🙂

Today I’m giving my talk about teaching Test Driven Development via Coding Dojos. I’m looking forward to some feedback from the community about that. In the meantime I’ve written a bit about the three keynote talks we listened to yesterday.

Lisa Crispin

The day started with a keynote from Lisa Crispin titled "Agile Defect Management". The overall message was to "lower the bar!" and aim to reduce the defect count as far as possible, even to the point where defect tracking software becomes superfluous. There was a lot of talk about whether such a tool is needed, and in what situations, and she gave a good overview of the current state of thinking in this area.

This was a good talk, with audience involvement, by someone who knows what they are talking about. The thing is, I have higher expectations of a keynote. I expect to come away inspired and challenged, with some new insight to take back to my daily life. For me, this talk didn't really deliver that. Lisa concentrated too much on the specific question of whether to use an electronic defect tracking tool or not, and didn't sufficiently put that question into the wider context given in the title of the talk, "agile defect management". I was disappointed to find nothing really original or surprising in what Lisa said.

Linda Rising

This was a good talk to hold straight after lunch, when everyone is a bit sleepy. Linda spoke very amusingly on the subject "Deception and Estimation: How we Fool Ourselves". She began by inviting us to see this as "the weird talk" of the conference, saying she was going to go through some of the latest scientific research in the area of cognition and psychology and hoped to relate it to why we have such trouble making good estimates in the context of a software project.

A self-confessed scientific amateur, she stated up front that she wouldn't provide references to the research she mentioned, although she could give them to you if you emailed her and asked. Linda then proceeded to relate a series of amusing anecdotes designed to illustrate how irrational and over-optimistic people can be. (Markus Gärtner has blogged about what they all were). Towards the end of the talk, Linda began to relate all this to the subject of estimation, and told a story about a conference where she met people who were trying to apply scientific methods, statistics and data mining to the problem of improving estimates in software projects. To my mind, a seemingly rational response to the problem of irrational, over-optimistic people.

Linda then did what I saw as a complete about-turn in her argument. She quoted one proponent of this "scientific" approach as saying, "well we can't just make up a number, can we?". Well no, we can't: Linda had just spent the last half hour convincing us that humans are over-optimistic and irrational, and that you can't trust them to make up numbers. Yet that seemed to be exactly what Linda then proposed we do in agile. The points about how agile overcomes this natural human over-optimism, for example by breaking problems down into thin slices, using the wisdom of the crowd and enforcing tight feedback cycles, all got rather crammed in at the end with little or no explanation.

Quite apart from those specific criticisms of her argument, as someone with a scientific training I didn't like the way Linda portrayed science and scientists. They initially appeared in Linda's talk as white-coated oracles who make pronouncements of the truth. "80% of drivers think they are above average". "You eat more at an all-you-can-eat buffet". "Online daters lie about their age and weight". She then attempted to shatter this illusion of scientific infallibility by quoting Planck, who pointed out that the scientific process doesn't always proceed in an orderly manner, and that sometimes new and better theories only really catch on when the older generation who invented the previous ones actually dies off.

Yes, scientists are human too, and you do them a disservice when you only present their results as received truths, without references, and without explaining either the methods they used to reach the conclusions, or what they themselves think of the wider applicability of the results.

This could have been a far more interesting talk about actual recent research studies: what's coming out of the latest brain imaging techniques, for example. Linda could also have spent more time explaining how agile works with human nature to provide better estimates and plans. For all that it made me laugh, all this talk left me with was a bunch of amusing anecdotes and an uneasy feeling that agile was anti-science.

Elisabeth Hendrickson

The last session of the day saw Elisabeth Hendrickson presenting “Lessons Learned from 100+ Simulated Agile Transitions”. With a huge amount of energy and panache, Elisabeth strode around the stage, explaining what happens in an exercise which she usually does with a group of 8-20 people over the course of a whole day. Within the framework of this simulation, she drew out stories and anecdotes to illustrate such diverse subjects as the Satir change model, how physical layout affects communication, the difference between status meetings and communication meetings, and tests as alignment tools.

This talk was definitely the highlight of my day. Elisabeth took some things I kind of knew about and made me think about them in a different light, from a different angle, and in a new context. I was challenged to go back to my standup meetings and make sure they really are about communication, and that my task board really does make status visible. I now have an additional way of explaining ATDD to people: in terms of aligning developers and other stakeholders and getting them moving in the same direction.

Having said all that, I do have a criticism (are you surprised?!). Elisabeth released her slides under a Creative Commons license, but she has not released the details of her simulation, which effectively prevents anyone else from running it. I think this is rather like a tool vendor presenting a new testing approach which, by the way, you can't use without either buying their tool or spending a large amount of time and money developing your own version. Those kinds of talks don't tend to get accepted at this kind of conference.

I was disappointed that Elisabeth didn’t release her simulation materials, and I’m not sure why she doesn’t want to. She is obviously a fantastic agile coach and facilitator, and has more invitations to speak than she has time or inclination to accept. It would surely only enhance her reputation to make the agile transition simulation game materials available.

Update: I talked to Elisabeth afterwards about her materials, and she related a story about another independent consultant she knows, who arrived at a client site ready to do a simulation exercise that he had designed, only to discover the participants had done the exact same exercise the week before with a different consultant! The other consultant had just taken the material without permission or acknowledgement. Elisabeth doesn’t want to end up in the same situation, and I can understand that. She did say she could release more information about the simulation though, enough that you could understand how it is designed, and perhaps build your own similar one. I think that would be a reasonable compromise. I just felt slightly cheated after her keynote – I wanted to look at this simulation, poke at it, see how it works and understand why she could use it to generate so many great insights into agile transitions.