Archive for the ‘Opinion’ Category

I forget exactly when, but I think it was 2008 or 2009. Anyway, I was at a software conference, chatting with a developer after one of the sessions about cool new technologies and stuff. I don’t remember what hot new thing we talked about; all I remember is the shoes he was wearing!

credit: flickr, Steve Hodgson

This was a rather unusual style of running shoe. At the time, I’d never seen anything like it before, and I was intrigued. It turns out that the developer I was talking to was, like me, something of a serial early adopter – it’s just that shiny new programming tools and technologies weren’t the only things he picked up.

At the time, I was running in a pair of shoes with thick heel padding that the shop assistant had assured me would correct my bad posture and foot “pronation”. This guy’s “five finger” shoes had none of that – in fact, quite the opposite. I was looking at disruptive running technology.

The conversation quickly switched from the latest programming tools and frameworks, as this guy explained the essential benefits of his shoes:

  • Lightweight
  • Your toes can spread out, giving better push-off from the ground
  • Thin sole – you adapt your stride to the surface because you can feel it
  • No heel padding – you strike the ground with the whole foot simultaneously

And of course, most importantly for a technology enthusiast:

  • People stare at your feet!

Following this conversation, a little googling about, and watching the odd video by a “running style expert”, I became convinced. To be honest, it wasn’t much of a contest – shiny new technology, being in with the cool kids – I bought some new shoes, with toes and everything!

It took me several weeks to get used to them. You have to start with short distances, build up some new muscles in your foot, and learn to strike the ground with the whole foot at once. In my old shoes I had a tendency to strike heel-first, because of the huge wad of padding on the sole, but in these minimal shoes that just hurt. It was useful feedback: the whole-foot-at-once gait is supposed to be better for your knees.

After a few weeks of running shorter distances, slower than before, I gradually found my stride, and really started to enjoy running my usual 7km circuit of the local forest in my eye-catching five-fingered shoes.

Unfortunately it didn’t last. Maybe two or three months later something happened. I think the technical term for it is Swedish Autumn. It turns out that forest tracks gain a surprising number of cold, muddy puddles at that time of year! Shoes with a very thin sole that surround each individual toe in waterlogged fabric mean absolutely freezing feet 🙁

So I’m back on the internet, looking for new, shiny technology to fix this problem, and of course, I buy some new shoes. This time I got a pair of minimalist shoes in waterproof Gore-Tex, with basically all the features of my old shoes, minus the individual toes.

I was back out on the forest track, faster than ever, with dry, comfortable feet – win! The only problem was, people were no longer staring at my eye-catching toes. So you can’t have it all!

So this is normally a blog about programming. What’s going on?

Test Driven Development as a Disruptive Technology

I’ve been thinking about this, and it seems to me that, as with running shoes that have toes, TDD is something of a disruptive technology. Just as I haven’t seen the majority of runners switch to shoes with toes, I also haven’t seen the majority of developers using TDD yet. Neither seems to have crossed Geoffrey Moore’s “chasm”.

Geoffrey Moore's technology adoption distribution showing the chasm

Lots of developers write unit tests, but I think that’s slightly different. I’m talking about TDD where developers primarily use tests to inform and direct design decisions, and rely on them for minute-by-minute feedback as they work. In 2009, Kent Beck observed on his blog that “the data suggests that there are at most a few thousand Java programmers actively practicing TDD”. I don’t think the situation is radically different today.
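To make that concrete, here’s a minimal sketch of one red-green cycle in Python’s unittest (the pricing example is entirely invented for illustration):

    import unittest

    # Red: state the next design decision as a small failing test.
    class PriceTest(unittest.TestCase):
        def test_bulk_orders_get_a_ten_percent_discount(self):
            self.assertEqual(total_price(unit_price=10, quantity=100), 900)

    # Green: write just enough code to make the test pass, then refactor.
    def total_price(unit_price, quantity):
        total = unit_price * quantity
        if quantity >= 100:
            total *= 0.9
        return total

    if __name__ == "__main__":
        unittest.main()

The point is the rhythm rather than this toy example: the failing test comes first, and the growing suite gives you feedback every few minutes as the design evolves.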

So can we learn anything about TDD from the story about running shoes? A couple of points I find relevant:

  • Early adopters will try a new technology based on really very flimsy evidence, and will persevere with it, even if it slows them down in the short term.
  • Early adopters like to look cool and stick out.

You may think that last point is just vanity, but being a talking point actually helps drive adoption – primarily amongst other, similarly-minded technology geeks.

I remember a while back I was at work, writing some code, when a guy from another team came over to ask me something. He was about to leave, when he did a double-take and stared at my screen for a moment. “Are you doing TDD? I’ve never seen anyone actually do that in production code. Do you mind if I watch?”. So, you see, eye-catching shiny new technology, and I’m one of the cool kids, about to be emulated by the guy in the next team. 🙂

The other part of this story is, of course, the compromise I made when the cool technology met the reality of a muddy Swedish forest track. The toes went, but the shoe I ended up with is still radically different from the one I had before. I think that for TDD to reach the mainstream, it may need to become a little less extreme, a little more practical – but without losing the essential benefits.

What are the essential benefits of TDD? Well, I would say something like this:

  • Design: useful feedback, pushing you away from long methods and tightly-coupled classes, because they’re hard to test
  • Refactoring: quickly detecting regression when you make a mistake
  • Productivity: helping you to manage complexity and work incrementally

So is it possible to get these things in another way? Without driving development minute-by-minute with tests? Well, that’s probably the subject of another blog post…

You might be interested to watch a video of my recent keynote speech at EuroPython, where I told this story.

Last week I created a little quiz and put a link to it on Twitter. I was interested to see whether the terminology around Test Doubles has standardized on Gerard Meszaros’ definitions, from his book “xUnit Test Patterns”, and I thought my twitter followers (I have over 1000 now!) might be able to tell me.

The quiz was taken by nearly 150 people, and overall my conclusion is that Meszaros’ definitions are in general use today (at least amongst people who follow me on Twitter). You can see a summary of the responses here. (I’ve closed the survey now, btw.)

The first question was “Which of these is not a kind of Test Double”, and anyone who thought a “Suite” was a kind of test double clearly hasn’t got a clue, so I excluded their answers from my analysis. That was only a handful of responses though.

Looking at the remaining answers, between 70% and 85% share my understanding of what each kind of test double is. The scores are noticeably lower for ‘Spy’, and actually I think my question on Spies was not very good. Jason Gorman kindly sent me a few tweets pointing out what he thought was unclear about it. The answer I was looking for to distinguish a Spy was “A Spy may raise an exception in the ‘Assert’ part of the test”. I was trying to articulate the difference between a “Spy” and a “Mock”, but I’ve personally only used Test Spy frameworks like Mockito and unittest.mock. Clearly I have more to learn.

I got the Mock question right, “A mock will fail the test if its methods are not called as expected”, but I thought a Spy was basically the same, except it fails the test in the “Assert” part instead of the “Act” part. Jason kindly pointed out that a Spy could be a Stub, a Fake, or even a decorated ‘real’ object. You don’t necessarily need to use a framework. The distinguishing feature of a Spy is that it records information about interactions, that your test can query later. So that was a good result from my quiz – I learnt something about Spies!
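To illustrate the distinction as I now understand it, here’s a minimal sketch using Python’s unittest.mock – the mailer example is invented. The spy silently records what happens during the “Act” part, and the test interrogates it in the “Assert” part, which is where any failure is raised:

    from unittest.mock import Mock

    # Code under test (invented for illustration):
    def send_goodbye_email(user, mailer):
        mailer.send(user.email, "Goodbye!")

    mailer = Mock()                          # the spy: records all interactions
    user = Mock(email="alice@example.com")
    send_goodbye_email(user, mailer)         # Act - nothing can fail here
    mailer.send.assert_called_once_with(     # Assert - query the recording
        "alice@example.com", "Goodbye!")

A strict mock, by contrast, is told up front which calls to expect, and complains as soon as an unexpected call happens.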

I also asked whether people had read Gerard Meszaros’ book, or my work-in-progress book, because I was interested to see whether readers would give better answers. When I excluded all the people who said they hadn’t read either book (about a third of responses), the scores improved significantly – over 80% agreed with me about the definitions of Mock, Stub and Fake. For Spies, on the other hand, the score was lower than for the group as a whole! That was clearly my fault…

Some people tried to argue that the distinction between the various kinds of test double is not interesting, that it doesn’t matter. I disagree. I think each kind has a different purpose and different situations when it is appropriate to use. If you can’t articulate what kind of test double you’re using, then there’s a very real danger you’re using it wrongly, or that better alternatives exist. Like – using several Mocks in the same test case, and finding it very fragile, when Stubs would be sufficient. Or – creating a new Stub for every test case, when it would be less work, and make for clearer tests, if you had a Fake that you could re-use in several.
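To sketch that last distinction in Python (an invented exchange-rate example):

    # A Stub just returns canned answers - enough for one test case.
    class StubExchangeRates:
        def lookup(self, currency):
            return 1.25

    # A Fake is a genuinely working implementation that takes a shortcut
    # (in-memory rather than a real service), so one Fake can be
    # configured and re-used across many test cases.
    class FakeExchangeRates:
        def __init__(self):
            self._rates = {}

        def set_rate(self, currency, rate):
            self._rates[currency] = rate

        def lookup(self, currency):
            return self._rates[currency]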

So my conclusion from this quiz, and my other research on the subject, is that it’s worth using Meszaros’ definitions when discussing Test Doubles. A lot of programmers have a good working understanding of them, and do know the difference between a Stub and a Mock. That’s very encouraging! One of the reasons Meszaros invented the “Test Double” terminology was to clear up the confusion that reigned at the time. It seems to me we’re in a much better position today because of his work. I do think there is more to do, though – I still see people using Test Doubles badly. That’s partly the motivation for my new book, which is about various techniques in Test Driven Development, not just Test Doubles. I’d better go and do some work on the section on Spies…

A while back, Gojko Adzic published this article “Redefining Software Quality” and I think it’s pretty insightful, pointing out that we often expend a lot of effort ensuring quality at lower levels of the pyramid, when we should perhaps be investing higher up.

I wanted to work out what testing activities you’d do to ensure quality at the different levels of Gojko’s quality hierarchy, so I began by thinking about the software testing quadrants. The testing quadrants were originally documented by Brian Marick in his blog, and later developed by Lisa Crispin and Janet Gregory in their book “Agile Testing”. (Here is a slide deck by Janet Gregory that gives a summary of the quadrants.)

agile testing quadrants

I like the agile testing quadrants because they help you to reason about different testing activities and why you’re doing them. They make it clear that in agile, testing has a big role in supporting the team – spreading knowledge about what’s being built and enabling the team to be agile about feature changes. In more traditional projects, testing focuses almost exclusively on the right side of the quadrants, missing this role entirely.

Anyway, if you put the agile testing quadrants alongside Gojko’s quality hierarchy, I think you get something like this:

quality hierarchy and testing quadrants

Clearly, Q1 and Q2 are all about ensuring the functionality basically works, and usually Q2 tests will run against deployed software (deployed in a test environment). Q4 tests cover things like performance and security, and usability falls under Q3. I think things get a little more tricky for the higher levels, though. There’s a distinct danger we’ve just run out of quadrants!

Beyond the testing quadrants

To test whether a piece of software is useful or successful, you’ll need to look at ideas that are relatively new in Agile. In his article, Gojko of course points out his book on Impact Mapping, and mentions Feature Injection and Lean Startup. Lisa and Janet published their book in 2009, “Lean Startup” by Eric Ries came out in 2011, Gojko’s “Impact Mapping” book is from 2012, and correct me if I’m wrong, but I don’t think there is a book on Feature Injection yet.

Agile ideas are moving on, especially compared with where they were when methodologies like Scrum and XP were originally documented a decade or more ago. I think it’s clear testers need to embrace new ideas and practices too.

If you look at Lean Startup, one of the concrete ideas is to do A/B testing of new features – that is, you divide your users into an “A” group, who see the new feature, and a “B” group, who don’t. Then you measure how each group behaves, and from that draw conclusions about whether the feature was any good. If the “A” group buys more stuff, spends more time on the site, and generally acts happier than the “B” group, you keep the feature; otherwise it gets dropped. It’s a bit like a trial of a new medicine – you compare the patients who get it with a control group who don’t, before you decide whether to approve it.
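The mechanics can be as simple as this sketch (assuming you already have user ids and track some conversion metric – all the names here are invented):

    import hashlib

    def ab_group(user_id, experiment="new-checkout"):
        """Deterministically assign a user to group 'A' or 'B'."""
        digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    def conversion_rate(conversions, visitors):
        """The metric you compare between the two groups afterwards."""
        return conversions / visitors if visitors else 0.0

The deterministic hash matters: a user who comes back tomorrow should land in the same group they saw yesterday.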

It seems to me that testers should be involved in designing and performing A/B tests – they’re well positioned, with critical thinking skills, knowledge of the user, and technical automation skills. The results of such tests should tell us whether users find a new feature useful. So that should get us up another level on Gojko’s pyramid:

quality hierarchy and testing quadrants - with A/B

The last level is fairly dependent on how you define success, but for a lot of software products, success means lots of users. Lean Startup has another interesting idea here – the “Net Promoter Score”. Basically you ask a small group of initial users if they’d recommend your product, and make a simple calculation to predict whether your user base is going to grow when you release it more widely. It’s an idea for what to put at the top level:

quality hierarchy and testing quadrants - with net promoter score
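That “simple calculation” is worth spelling out. Here’s a sketch in Python using the standard scoring bands (promoters answer 9–10 to “how likely are you to recommend us?”, detractors answer 0–6, and the passives in between are ignored):

    def net_promoter_score(ratings):
        """ratings: 0-10 answers to 'How likely are you to recommend us?'"""
        promoters = sum(1 for r in ratings if r >= 9)
        detractors = sum(1 for r in ratings if r <= 6)
        return 100 * (promoters - detractors) / len(ratings)

    # e.g. net_promoter_score([10, 9, 8, 6, 10, 3]) is about +17;
    # a clearly positive score suggests the user base will grow.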

Conclusions

Of course, for many teams it’s enough of a struggle to test for quality at the lower levels of the pyramid, without worrying about A/B tests or Net Promoter Scores! James Shore and Diana Larsen have come up with a model of “Agile Fluency” which I think is relevant here. Basically it outlines how teams’ practices change as they get better at agile. The three-star level of fluency seems to contain a lot of ideas from Lean Startup, and optimizing for quality at the top two levels of Gojko’s pyramid. At one- and two-star fluency – delivering business value on the market cadence – the testing quadrants alone get you a long way.

Changing testing activities also implies changes for testers. The role of tester has already changed with the advent of agile methods, and I predict it’s going to continue changing. I see a technical tester role appearing that is pretty close to business analyst, doing things like supporting the team with test automation and data analysis of A/B tests. So testers: get a head start and find out about Lean Startup!

The world needs more and better programmers. Jason Gorman recently wrote this post encouraging people to start offering software apprenticeships, as an alternative to computer science degrees.

He writes:

“our computing education in [the UK] is preparing students for a career in a version of computing most of us don’t recognise. Students devote the majority of their time learning theory and skills that they almost certainly won’t be applying when they get their first proper job. Computing schools are hopelessly out of touch with the reality of computing in the real world. While employers clamour for TDD or refactoring skills, academics turn their noses up at them and focus on things like formal specification and executable UML and compiler design, along with outdated and thoroughly discredited “software engineering” processes.” — Jason Gorman

Jason ends his post with a call to arms – if you’re a good software developer, get yourself an apprentice, and start training them. It’s the same message I heard from Dave Hoover when he visited Göteborg recently. I think he also sees a multi-year apprenticeship as a better alternative for training programmers than a computer science degree.

I also recently came across this article, written by a computer science teacher in the US, with the following paragraph:

“I no longer teach programming by teaching the features of the language and asking the students for original compositions in the language. Instead I give them programs that work and ask them to change their behavior. I give them programs that do not work and ask them to repair them. I give them programs and ask them to decompose them. I give them executables and ask them for source, un-commented source and ask for the comments, description, or specification. I let them learn the language the same way that they learned their first language. All tools, tactics and strategies are legitimate. ” — William Hugh Murray

So clearly some academics are teaching in creative ways. Rather than abandoning computer science degrees, might it not be better to improve their content?

One of the things about the XP conference is that it brings together industry and academia, and lets them hear from one another. How to teach programming is a very important topic that is often discussed there. XP2005, for example, was held at Sheffield University, where I remember chatting to one of the professors and being impressed by the way they used eXtreme Programming as part of their undergraduate course.

Another thing that happened at XP2005 was the first coding dojo I attended, and I believe the first one ever held outside of France. It was presented by Laurent Bossavit and Emmanuel Gaillot, founders of the Paris dojo. I was excited to discover a context in which I could improve my practical programming skills, in regular short bursts, alongside a continuing paid job.

So one of the things I do in my new life as an agile testing consultant is to use the coding dojo format to teach people how to program better. We do code kata exercises, practice Test Driven Development and Refactoring, and discuss what Clean Code looks like. So far the reaction from the professionals I’ve done this with has been very positive. Lots of people who have been coding for years appreciate the chance to learn new practical skills.

I’m also getting involved in more formal education: this spring I’m teaching a three-week course in automated testing, as part of a “Kvalificerad Yrkesutbildning” (qualified vocational education) in software testing. This is a one-year, full-time course for students wanting to learn a practical skill, as an alternative to going to university and studying a more academic subject. In Sweden you can get a student loan while you’re studying this course, and part of the time is spent working in a company gaining on-the-job experience.

I’m starting to plan how I’m going to teach TDD and BDD, and how to use tools like Selenium, FitNesse, TextTest and Cucumber. I think it’s going to be very hands-on and practical, but it will also go into the general principles behind tool choice and writing maintainable automated tests. I’m helping to write a formal syllabus and exam, with criteria for the grades awarded.

I guess what I’m trying to say is that I don’t like this strand of thought in the Software Craftsmanship movement that wants to abandon formal education. There are lots of ways to train software developers, and apprenticeship isn’t without its problems.

I think this is just the sort of thing we’ll be discussing at XP2011, where there will be a host of academics and experts from industry. Won’t you join us?

Last night at GothPy we had a play with Django, a web application framework for Python. I’m fairly familiar with Rails, so it was interesting to see a different take on solving the same kinds of problems.

I downloaded Django for the first time when preparing for the meeting, and spent about a day going through the tutorial and trying stuff out. At the meeting were a few people who have used Django before, notably Mikko Hellsing, who has worked with it daily for several years.

It was so much easier learning the tool from people who knew it than by reading the documentation. Constant code review and real time answering of questions is just brilliant for learning.

At the meeting we decided to implement a very, very simple web application, similar to the one Johannes Brodwall performed as a Kata at XP2010 together with Ivar Nilsen. He calls it the “Java EE Spike Kata”, since he does it in Java with no particular web frameworks, just what comes with Java. (There is a video of him doing it on his blog, and a sample solution here on github.)

I thought we should be able to implement the same little application in any web framework, and that it might be interesting to see the differences, so I suggested we try doing it in Django. It involves just two pages: an “Add User” page, which lets you create a new user and save it to the database, and a “Search User” page, which lets you search for users and presents the results. So the scenario is to create a user, search for them, and see that they are returned.

When I work on a problem in Rails I usually start with a Cucumber scenario for the feature, and I discovered there is a Python version of Cucumber called Lettuce. We could of course have just used Cucumber with Python, but given the big “WARNING – Experimental” notice Aslak wrote on this page, I thought we could give Lettuce a try.

So at the meeting we all worked together with one laptop and a projector, (me at the keyboard), and we started with a Lettuce scenario. We implemented step definitions using Django’s test Client, which is a kind of headless browser that understands how to instrument a Django application. Then we spent a couple of hours writing Django code together until the scenario was all green.
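Roughly, hooking the Lettuce sentences up to Django’s test Client looked something like this – a reconstructed sketch rather than the exact code from the meeting, and the URLs are invented:

    # features/steps.py
    from lettuce import step, world
    from django.test.client import Client

    @step(r'I add a user named "(.*)"')
    def add_user(step, name):
        world.client = Client()
        world.response = world.client.post('/users/add/', {'name': name})

    @step(r'I search for "(.*)"')
    def search_for(step, query):
        world.response = world.client.get('/users/search/', {'q': query})

    @step(r'I should see "(.*)"')
    def should_see(step, text):
        assert text in world.response.content.decode()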

The code we ended up with isn’t much to write home about, but I’ve put it up on github here.

What we learned from this exercise

Of course, since I know Rails much better, I found it interesting to compare the two web frameworks. Django seems similar in many ways. It was dead easy to get up and running, it gives you a basic structure for your code, and it separates the model from the presentation layer.

The O-R mapping looks quite similar to ActiveRecord in the way you declare model classes backed by tables in the db. The presentation layer seems to have a different philosophy from Rails though. The html view part is rather loosely coupled to your application, and doesn’t allow you to embed real python code in it, just basic control structures.
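The model declarations, for instance, look roughly like this (a sketch, not our exact code):

    from django.db import models

    # A class backed by a database table, much as in ActiveRecord:
    class User(models.Model):
        name = models.CharField(max_length=100)
        email = models.EmailField()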

You hook the html up to the controller code using url regular expression matching. I was a little confused about exactly how this was supposed to work, since what I considered controller code was put in a file called “views.py”. Most of the code we wrote ended up in there, and I feel a bit unhappy with it. It seems to be a mixture of stuff at all levels of abstraction. The Django Form objects we used seemed quite powerful, though, and reduced the amount of code we had to write.
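The hook-up itself looks something like this sketch – the app name and URLs are placeholders, and note that in the Django version we used the patterns were regular expressions; newer versions also offer a simpler path() syntax:

    # urls.py - map url regular expressions to functions in views.py
    from django.conf.urls import url
    from users import views

    urlpatterns = [
        url(r'^users/add/$', views.add_user),
        url(r'^users/search/$', views.search_user),
    ]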

The biggest difference I noticed compared with Rails was how explicit Python is about where stuff comes from. You always have to declaratively import or pass stuff before you can use it, so I found it straightforward to follow the connections and work out which code was executed and where it came from. I think that is because of Python’s philosophy of strict namespaces, which Ruby doesn’t have so much of.

I also liked the way Django encourages you to structure a larger web application into a lot of independent smaller ones, and lets you include and re-use other people’s small apps. Rails has plugins too of course, but I thought Django’s way made it seem very natural to contribute and re-use code.

Comparing Lettuce to Cucumber, they look almost the same (by design). One small difference I found was that Cucumber step definitions don’t care whether you write “Given”, “When” or “Then” in front of them, whereas Lettuce did. So I had steps like this:

Then I should see "results found"
And I should see "Name"

where the first step passed and the second step was reported as unimplemented by Lettuce. So Lettuce needs a little more work to be really usable.

I was also pretty disappointed by the Django test client for implementing test steps. It seemed to interact with pages at a lower level of abstraction than I am used to – at the level of making post and get requests and parsing the html as a dom. I missed Capybara and its DSL for interacting with web pages; I couldn’t find any equivalent. I probably should have turned to Selenium directly, but since we had no client-side javascript it seemed overkill. (Francisco Souza has written about using Lettuce with Selenium compared with the Django test client here.)

When it comes to unit-level testing, Django provides an extension to the normal unittest tool that comes with Python (unittest was originally based on JUnit). We didn’t do much with it at the session, and it seemed to work fine. It’s nothing like RSpec though, and I miss that expressiveness and structure for my tests whenever I work in Python.
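For completeness, a test using Django’s extension looks something like this sketch (same placeholder names as in the earlier snippets):

    from django.test import TestCase  # adds a test database and a test client
    from users.models import User

    class SearchUserTest(TestCase):
        def test_finds_user_by_name(self):
            User.objects.create(name="Alice", email="alice@example.com")
            response = self.client.get('/users/search/', {'q': 'Alice'})
            self.assertContains(response, "Alice")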

Overall
All in all it was fun to look at Django with a group and to get some really helpful comments and insights from people who know it far better than I do. The Kata we chose seemed to work OK, but I should really do it again in Rails, since I spent the whole time comparing Django with Rails in my head, not Django with Johannes’ Java code 🙂

My conclusion is basically that Django looks like a good alternative to Rails. It would take time to learn, and it surely has strengths and weaknesses I can’t really evaluate from one short session looking at it. However, I’m fairly sure I’d have to do some work improving the testing tools if I was going to be happy working with it for real.