Archive for the ‘Experience Report’ Category

On Saturday I was up in Stockholm facilitating my fourth code retreat for Valtech, and my second Global Day of Code Retreat. It seemed to go very well. I tried out a few new elements, which seemed to make it go even better than the previous ones, so I thought I’d talk about them here in my blog.

The first thing we changed was that we had quotas for men and women, and we were aiming for a 50/50 gender balance. About a month before the code retreat, we were fully booked, with 20 places each for men and women taken. In the end several people dropped out at the last minute, and there were slightly more men. Still, it was a much better balance than last year. I think it made for a healthier atmosphere during the day, and we encouraged more women to be active in the community generally.

The second thing we changed was I didn’t require people to do Game of Life, I suggested some other code katas as alternatives. Game of Life is an excellent kata for practicing skills like writing clean code, simple design, and TDD generally. It’s also really fun to be doing an excercise that thousands of other programmers are also working on. However, I realized that around 1/3 of the people who’d signed up for the event had already been to a previous code retreat, and might be getting bored with Game of Life. We’re there to have fun, after all!

As I’ve been working on my book, and running various coding dojos recently, I’ve been thinking hard about TDD and how to learn it. It’s a multifaceted skill, and some katas allow you to focus on particular aspects of it, which can make practice more rewarding. For a complete beginner to TDD, Game of Life is actually quite hard to get started with, I’ve seen a lot of people struggling with it.  So in my introduction I highlighted four other katas people could choose from, as well as Game of Life:

  • String Calculator – good for the TDD newbie, since it really leads you by the hand
  • Tennis – good for practicing refactoring
  • GildedRose – good for practicing writing really good tests (and refactoring)
  • TyrePressure – good for understanding SOLID principles

People seemed to generally appreciate having a choice. Some pairs chose a String Calculator for their first couple of sessions, to get used to TDD, then went on to Game of Life for the others. Some pairs tried out Tennis, Gilded Rose and Tyre Pressure in the afternoon, instead of one of the other challenges. Some pairs who were already good at TDD tried out String Calculator in Clojure, and found it let them concentrate on the learning the language not solving the problem. One person, (who has been to 3 code retreats previously), didn’t do Game of Life all day, and said he really enjoyed himself and learnt loads!

The third thing we changed was that I suggested people try cyber-dojo as a coding environment. This is a tool designed by Jon Jagger as an aid to practicing your coding skills. It provides a basic editor/testing environment, and records the state of the code every time you run the tests. You can review all your changes in the session retrospective, and get a picture of how well your TDD was going.

Before the code retreat, I had set up cyber dojos for Game of Life, and Tennis. I put up the cyber-dojo ids on the whiteboard, and through the day most of the pairs tried it out for one or more sessions.

As a facilitator, I found it really helpful when going up to a pair, to be able to see the little row of traffic light symbols at the top of their screen. I could quickly get an overview of how their session was going. For example, if I see a few red-green-yellow sequences then I can infer they are probably doing ok at TDD. If I can see nothing bug a long string of yellow, then I know they’re in trouble!

For the pairs, some liked cyber-dojo because it meant they didn’t waste any time setting up their coding environment at the start of the session. For those doing Tennis, if they made a refactoring mistake, they could very easily just click on the most recent green traffic light to revert back to a known good state, and practice doing the refactoring again. Some pairs found it helpful in the retrospective to review their session in the tool.

Other pairs didn’t get on so well with cyber-dojo. Some couldn’t live without their usual editor commands. Some got confused by the lack of syntax highlighting. A couple found a bug in cyber-dojo that it lost touch with the server and stopped updating their test results, whatever they did to the code. (The workaround is to open the same url in a new window, btw).

I know Jon is aware of this bug, and I’m sure he’ll track it down, but actually the lack of syntax highlighting and editor commands is deliberate. It’s the same idea as the other challenges, like mute pairing etc – to throw you out of your comfort zone and make you really concentrate on the way you code.

So that was the three things we changed – gender balance, other katas, cyber-dojo. What we didn’t change was the overall format or aims of the day. We still paired for 5 sessions of 45 minutes, and threw away the code afterwards. We still had coding challenges like “Mute pairing with find the loophole” and “Maximum 4 line methods”, and held retrospectives after every session. I think deliberate practice in a group is at the core of what makes Code Retreat (and coding dojos!) fun and valuable. The changes we introduced were all designed to enhance the learning experience, and were based on experience and reflection after previous events.

Overall I was very pleased with all the changes we introduced, and I’d like to do it like that again. If you’re facilitating or organizing a code retreat, perhaps you’d like to introduce some of these elements too.

I’ve heard Joseph Wilk speak before, so I was delighted when he accepted our invitation to come to Scandinavian Developer Conference and share some ideas around “Acceptance testing in the land of the Startup”. In this post I aim to summarize my understanding of what Joseph said. I think what he is doing at Songkick is really interesting and innovative, and is applicable beyond that particular environment.


Joseph began by introducing what he meant by “Startup” using Eric Ries‘ definition:

“a human institution designed to create new products and services under conditions of extreme uncertainty”

“Acceptance testing” is usually about checking the product meets the previously agreed requirements – but if everything is changing all the time, that is not as simple as it sounds. Joseph’s acceptance tests reflect what “done” looks like in conditions of extreme uncertainty.

Joseph talked about the Cynefin model (by Dave Snowden) and how working in a startup feels like you’re in chaos a lot of the time, although parts of the business may be merely complicated or even simple. An effective strategy in the “chaos” part of the Cynefin model is “Act, Sense, Respond”, which he related to the lean startup cycle of “Build, Measure, Learn”.

The “Build” part of that cycle for their software product often takes more time than the other parts, and work should be coordinated between several people. During the “Build” phase you can use automated tests to help with communication and feedback. In the “Measure” phase, you’re primarily looking for feedback from the actual end users, rather than automated tests.

The rest of the talk was largely about how Songkick use automated tests during the “Build” phase, and how they do acceptance testing with end users in the “Measure” phase.

The startup

The software Joseph is working on is for the startup Songkick which is devoted to live music. They are about 20-30 people, and the majority are programmers. They have two dedicated UX specialists and two QA specialists. Most people are multi-skilled, care deeply about the quality of the product, and are not afraid to learn how to help outside of their speciality.

Songkick holds three principles dear:

  • Embrace Change
  • Usability
  • Testing

The rest of his talk was about the concrete practices they use around this last principle – Testing. Joseph stressed that he was reporting what they had found to work, and not holding up any “best practice” or industry gold standard.


At Songkick, they use Behaviour Driven Development. They are continually building and improving a “ubiquitous language” to describe their system, by writing Features and Scenarios in a semi-formal syntax. The tool Cucumber is used to turn these features and scenarios into Executable Specifications (aka Automated Tests).

Joseph gave a short intro to how Cucumber works – take a look at this site if you’re not familiar with it. In short, Cucumber lets you define many “Features”, each of which define numerous “Scenarios” which exemplify the desired feature behaviour. It requires you use a particular syntax called “Gherkin”. The gist of it is that you use the following formats when writing features and scenarios:

“In order to… I want… So that…”

“Given… When… Then…”

If you put in some extra work and also write “Step Definitions”, your scenarios become executable, and can automatically check the application behaves as expected when run.

The primary purpose of the Cucumber specifications is to promote communication within the team: the words are carefully chosen to have clear meanings related to the domain of the system you’re building.


In Lean Startup you don’t talk so much about building a “Feature”, more about having a “Hypothesis”. As in “if we add a pink button here, do we get more traffic to the site?”

Joseph showed the following example of a Feature with a Hypothesis:

Facebook signup
In order to increase signup
I want visitors to sign up via Facebook
So that we see a 10% increase in signups

That last line is the measurable part of the hypothesis, that makes it more than just a normal Cucumber Feature. When you’ve delivered a system that has this feature, it should lead to 10% more signups. If it doesn’t, the system might be modified to remove this feature. Features written in this form (In order to… I want… So that…) are prioritized in the product backlog.

When a developer is ready to start working on a feature, she calls a meeting to discuss it. At the meeting there should be representatives from four different job roles: a developer, tester, business analyst and usability expert. Together they discuss what the feature is, the details of how it should work, and perhaps create some wireframes or UI mockups. They will also brainstorm perhaps 7-8 scenarios, such as:

  • successful signup via facebook
  • failed signup via facebook
  • facebook changes their API and no one can sign in any more

After the meeting, the developer starts working on the code to implement the feature, bearing in mind the discussion and the scenarios. The people in the other job roles will also get involved in development, and assessing whether the feature is ready for deployment. Some or all of the scenarios they have come up with will be automated as executable with Cucumber.

There is an overhead to automating scenarios with Cucumber: implementing “Step Definitions” takes time, and afterwards the scenarios are relatively slow to execute. Joseph said that they were more and more only automating a few scenarios for each feature, often just a happy and a sad path. The other scenarios would perhaps be used in unit tests, or exploratory testing. The most important aspect of writing many scenarios was to discover issues with a feature before implementation.

All their Cucumber features and scenarios are published in a searchable web interface called “Relish” which he said encouraged developers to use more business-friendly language, since they knew non-programmers would look there. Joseph said features containing incomprehensible technobabble like “Given the asynchronous message buffer queue has been emptied” (!) do appear occasionally, but don’t last long. Their ubiquitous language is highly visible and in daily use.

Actual Acceptance Tests

Songkick uses a Continuous Delivery approach, where anyone in the company can deploy the latest code, at the push of a button. They have invested a lot of effort to make it all automated, so that when anyone checks in code changes, Jenkins builds it, runs all the tests, (including Cucumber specs), and allows anyone to deploy any successful build. He said deployment was so easy even a dog could do it, (and had on one occasion!).

Joseph explained that many features would be built in such a way that they were “turned off” in production for a few days while they were being built. Only once the whole feature was implemented and present would they activate the code, and perhaps only then for a subset of all the visitors to the site. In this way you can build a feature that takes a few days to implement, and continue to release the new code every few hours. The half built feature is being deployed, it’s just switched off.

The reason for all this effort to get code out of the door is that a feature can’t be considered Done until its hypothesis has been validated. As Joshua Kerievsky said:

“A story isn’t done until it’s being used by real users in production and has been validated to be a useful part of a product”

By deploying often, you get that confirmation or rejection as soon as possible, so you can evaluate whether the feature was worth building. “Build, Measure, Learn”. The “Measure” part happens after an acual customer has got hold of the code and started using it. That is the real acceptance testing that’s going on, not the Cucumber scenarios.

So that’s the way they work today, but it hasn’t always been quite like that. Joseph also talked about why they have arrived at this process.

The build time issue

The Cucumber scenarios are run at every checkin, and Joseph said that at one point it took 4 hours to run them all. This was a serious problem, delaying feedback and slowing the continuous deployment process to a crawl.

Their first approach was to use their technical skills to divide up the tests and run them in parallel on the cloud. The build time went down from 4 hours to just 16 minutes, but at a cost of about $7000 per month in cloud servers. That proved too expensive to be sustainable, so they reconsidered.

The automated tests are there to give confidence that the system works before you deploy. They realized that some of the tests gave more confidence than others, so they removed something like 60% of them. These less valuable tests were then run much less frequently. This took them off the critical path to deployment, and enabled more frequent releases. So far this strategy has led to only insignificant problems in production.

They also re-evaluated their policy for which scenarios to automate in Cucumber, and started putting much more emphasis on unit tests. When you have comprehensive Cucumber tests, it’s easy to have lots of confidence that the system works, and not bother with unit tests. Joseph said they were missing the design benefits of unit testing though – unit tests force your code to be decomposed into units! These days they write fewer Cucumber scenarios, and more unit tests.

Duplicating effort with QA

In addition to automated Cucumber scenarios, the QA people do manual testing before deployment. At one point the developers were surprised to learn than maybe 60-70% of these manual tests were covering functionality that was already well covered by Cucumber specs. They realized that the QA people had little confidence in the automated tests.

Once they had noticed the problem, they began to include the QA people more in the scenario writing process, so they could learn about what the tests did, and how they work. This helped give QA confidence to remove some of the manual tests, so now they only test the most crucial functionality manually before deployment.

Metrics as feedback

Joseph invented a tool called “Limited Red” that records metrics from every test run. It shows failure rates for each scenario, and correlates that with changes in the corresponding feature file, using the Git log. Using this data, he can plot graphs that show each feature, how many times it has failed, and how often it has been edited.

For example, he might find one of the features has failed 16 times, while the code in the feature was only changed 4 times. This could indicate the feature is testing an area of the code that is poorly implemented – the code is broken more often than the test is invalid. This gives developers feedback that they can act on to improve the quality of both the code and the Cucumber scenarios.

Joseph has a blog post with more detail about this tool and metrics approach.

My Concluding Thoughts

Most startups fail, and I have little insight into whether Songkick will be successful in the long run. I am fascinated though, by the way they are working. It seems to me they have a great process that promotes communication and feedback, coupled with enough introspection to adapt it as they learn more.

I’m interested to note that Songkick has extended the “Rule of Three” introduced by Lisa Crispin and Janet Gregory. In their book on “Agile Testing” they encourage you to include a tester in all discussions between a developer and customer. At Songkick they appear to have business analysts in the customer role, and additionaly include a UX (usability) expert in the discussion.

Joseph didn’t mention it, but I suspect that his experiences automating fewer scenarios with Cucumber may be one of the reasons Liz Keogh wrote her blog post “Step away from the tools“. Or maybe the blog came first, I don’t know. The advice is the same, in any case – BDD is about communication first, test automation second.

The “Limited Red” tool is quite new, and I think Joseph is still learning how to act on the data he’s gathering. Having said that, I have high hopes that eventually this kind of measurement and feedback will find its way into more automated testing tools and Jenkins builds. It seems to me that finding out which of your tests are giving the best value for money and which are costing the most to maintain will be generally useful.

My next challenge is to work out how to apply these kinds of ideas to the situation I find myself in right now. It’s a large company not a startup, the customer seems totally inaccessible, the release cycle is years not hours, the build is slow (when it works at all), and UX experts are thin on the ground. Hmm. As Joseph said, his story is about what they’ve found to work, not some universal “best practices”. It’s given me some new ideas and encouragement – now I just have to work out what principles and practices I can reasonably introduce where I am.

Notes 2013-5-13: one of Joseph’s colleagues wrote a post “From 15 hours to 15 seconds” explaining how they got the build time so low, a few months after I wrote this post. I also got a note from Liz Keogh explaining that her blog post “Step away from the tools” was written well before the events at Songkick, in March 2011, and that she consulted with them briefly after that, discussing the build time problem amongst other issues.

A little while back I was back in Stockholm facilitating another Code Retreat, (see previous post), this time as part of Global Day of Code Retreat, (GDCR). (Take a look at Corey Haines’ site for loads of information about this global event that comprised meetings in over 90 cities worldwide, with over 2000 developers attending.)

I think the coding community could really do with more people who know how to do TDD and think about good design, so I’m generally pretty encouraged by the success of this event. I do feel a bit disappointed about some aspects of the day though, and this post is my attempt to outline what I think could be improved.

I spent most of the Global Day of Code Retreat walking around looking over people’s shoulders and giving them feedback on the state of their code and tests. I noticed that very few pairs got anywhere close to solving the problem before they deleted the code at the end of each 45 minute session. Corey says that this is very much by design. You know you don’t have time to solve the problem, so you can de-stress: no-one will demand delivery of anything. You can concentrate on just writing good tests and code.

In between coding sessions, I spent quite a lot of effort reminding people about simple design, SOLID principles, and how to do TDD (in terms of states and moves). Unfortunately I found people rarely wrote enough code for any of these ideas to really be applicable, and TDD wan’t always helping people.

I saw a lot of solutions that started with the “Cell” class, that had an x and y coordinate, and a boolean to say whether it was alive or not. Then people tended to add a “void tick(int liveNeighbourCount)” method, and start to implement code that would follow the four rules of Conway’s Game of Life to work out if the cell should flip the state of its boolean or not. At some point they would create a “World” class that would hold all the cells, and “tick()” them. Or some variant of that. Then, (generally when the 45 minutes were running out), people started trying to find a data structure to hold the cells in, and some way to find out which were neighbours with each other. Not many questioned whether they actually needed a Cell class in the first place.

Everyone at the code retreat deleted their code afterwards, of course, but you can see an example of a variant on this kind of design by Ryan Bigg here (a work in progress, he has a screencast too).

Of course, as facilitator I spent a fair amount of time trying to ask the kind of questions that would push people to reevaluate their approach. I had partial success, I guess. Overall though, I came away feeling a bit disillusioned, and wanting to improve my facilitation so that people would learn more about using TDD to actually solve problems.

At the final retrospective of the day, everyone seemed to be very positive about the event, and most people said they learnt more about pair programming, TDD, and the language and tools they were working with. We all had fun. (If you read Swedish, Peter Lind wrote up the retrospective in Valtech’s blog) This is great, but could we tweak the format to encourage even more learning?

I think to solve Conway’s Game of Life adequately, you need to find a good datastructure to represent the cells, and an efficient algorithm to “tick” to the next generation. Just having well named classes and methods, although a good idea, probably won’t be enough.

For me, TDD is a good method for designing code when you already mostly know what you’re doing. I don’t think it’s a good method for discovering algorithms. I’m reminded of the debate a few years back when Peter Norvig posted an excellent Sudoku solver (here) and people thought it was much better than Ron Jeffries’ TDD solution to the same problem (here). Peter was later interviewed about whether this proved that TDD was not useful. Dr Norvig said he didn’t think that it proved much about TDD at all, since

“you can test all you want and if you don’t know how to approach the problem, you’re not going to get a solution”
(from Peter Siebel’s book “Coders At Work”, an extract of which is available in his blog)

I felt that most of the coders at the code retreat were messing around without actually knowing how to solve the problem. They played with syntax and names and styles of tests without ever writing enough code to tell whether their tests were driving a design that would ultimately solve the problem.

Following the GDCR, I held a coding dojo where I experimented with the format a little. I only had time to get the participants do two 45 minute coding sessions on the Game of Life problem. At the start, as a group we discussed the Game of Life problem, much as Corey recommends for a Code Retreat introduction. However, in addition, I explained my favoured approach to a solution – the data structure and algorithm I like to use. I immediately saw that they started their TDD sessions with tests and classes that could lead somewhere. I feel that if they had continued coding for longer, they should have ended up with a decent solution. Hopefully it would also have been possible to refactor it from one datastructure to another, and to switch elements of the algorithm without breaking most of the tests.

I think this is one way to improve the code retreat. Just give people more clues at the outset as to how to tackle the problem. In real life when you sit down to do TDD you’ll often already know how to solve similar problems. This would be a way for the facilitator to give everyone a leg-up if they havn’t worked on any rule-based games involving a 2D infinite grid before.

Rather than just explaining a good solution up front, we could spend the first session doing a “Spike Solution”. This is one of the practices of XP:

“the idea is just to drive through the entire problem in one blow, not to craft the perfect solution first time out.”
From “Extreme Programming Installed” by Jeffries, Anderson, Hendrickson, p41

Basically, you hack around without writing any tests or polishing your design, until you understand the problem well enough to know how you plan to solve it. Then you throw your spike code away, and start over with TDD.

Spending the first 45 minute session spiking would enable people to learn more about the problem space in a shorter amount of time than you generally do with TDD. By dropping the requirement to write tests or good names or avoid code smells, you could hopefully hack together more alternatives, and maybe find out for yourself what would make a good data structure for easily enumerating neighbouring cells.

So the next time I run a code retreat, I think I’ll start with a “spiking” session and encourage people to just optimize for learning about the problem, not beautiful code. Then after that maybe I’ll sketch out my favourite algorithm, in case they didn’t come up with anything by themselves that they want to pursue. Then in the second session we could start with TDD in earnest.

I always say that the code you end up with from doing a Kata is not half as interesting as the route you took to get there. To this end I’ve published a screencast of myself doing the Game of Life Kata in Python. (The code I end up with is also published, here). I’m hoping that people might want to prepare for a Code Retreat by watching it. It could be an approach they try to emulate, improve on or criticise. On the other hand, showing my best solution like this is probably breaking the neutrality of how a code retreat facilitator is supposed to behave. I don’t think Corey has ever published a solution to this kata, and there’s probably a reason for that.

I have to confess, I already broke all the rules of being a facilitator, when I spent the last session of the Global Day of Code Retreat actually coding. There was someone there who really wanted to do Python, and no-one else seemed to want to pair with him. So I succombed. In 45 minutes we very nearly had a working solution, mostly because I had hacked around and practiced how to do it several times before. It felt good. I think more people should experience the satisfaction of actually completing a useful piece of code using TDD at a Code Retreat.

Scandinavian Developer Conference will be held in Göteborg in April, and last week we launched the detailed programme on the website. I’ve been involved with the conference ever since the first event in 2009, but this year I’ve taken on increased responsibilities, acting as Programme Chair.

When P-A Freiholtz, the Managing Director at Apper Systems*, (the company behind the conference), approached me about taking this role, I jumped at the chance. My business is all about professional development for software developers, and a conference in Göteborg is a really good opportunity for a lot of local people to hear about what’s going on in the world outside. Many of the developers I meet in my work don’t often get away from their desks, spending the majority of their time caught up in the intrigues and deadlines of a single project and company environment. SDC2012 will be a chance for a lot of local people to broaden their horizons without it costing their boss a fortune, or disrupting their usual schedule too much.

I wrote an editorial on the front page of the conference website which explains more about what’s on the programme and who should come. Everyone involved in software development, basically 🙂

These days, we compete in a global marketplace for software development. I’m hopeful about the future, I’m enthusiastic about Swedish working culture, and I think Scandinavian Developer Conference is helping Göteborg grow as a center of competence. Please do join us at the conference.

* formerly named Iptor – now under new ownership.

I’ve never been to Öredev before, and it really is a very impressive conference. Gathered in Malmö right now, is over a thousand developers, with a collection of speakers representing the elite of the global software development industry. The sessions are overflowing with a plethora of great advice, news and inspiration. That’s primarily why I’m here, although I could also praise the excellent conference organization, food, live music, sponsor giveaways and other entertainments.

I’d like to talk about my first impressions of the conference. As with other software conferences I’ve been too, women are rather unusual here, both among delegates and speakers. I’ve blogged about this before, and it’s not untypical. The sad fact is that the proportion of women is low, and, unlike in other industries, actually falling in software development.

I’d just like to relate two experiences I had yesterday.

Women speakers
I’m here as an ordinary delegate at this conference, but at most conferences I go to, I’m a speaker. I was really happy to be greeted yesterday by Dan North, Gojko Adzic, Pat Kua, Corey Haines and others, since they’re people who I really respect. I’ve mostly got to know them when we’ve spoken at the same conferences in the past. Of course they all asked me which day I was speaking on at Öredev. Well, the easy answer is that I was not asked to speak here. I looked on the Öredev conference website and there was no call for submissions, unlike other conferences I have spoken at like XP, Agile Testing Days, ScanDev, Agile, JFokus etc. I assumed that it was invitation only, and the track chairs would mail me if they were interested in having me speak. Now I realize that I should have mailed them to point out I wanted to speak here.

I think I just fell into this trap that women apparently often fall into, described in this article I read this week: “Four Ways Women Stunt Their Careers Unintentionally“. Apparently we tend to be “overly modest” and “women fail to get promoted because they fail to step up and apply”. So I didn’t apply, and I didn’t get a job to speak at Öredev. #kickingmyself

Neal Ford’s keynote
I listened to a keynote by Neal Ford yesterday. I’ve heard him speak before, and he is always very entertaining and yet makes some interesting points about software. It’s just a small thing he said that really bothered me. A keynote speech is supposed to set the tone for a whole conference – you have the everybody gathered, and you’re supposed to say something inspirational and thought provoking.

One of Neal’s jokes was about some obscure Star Trek reference that I didn’t get (although from the audience reaction I’m guessing most others did), that he followed up with a slide showing the top Google image results when you searched for this thing. He made some comment about google knowing that when you search for this, you’re really asking for porn. He had helpfully airbrushed out the image results so you could only vaguely see the outline of naked women.

Neal, you really didn’t need to do that. Your talk had enough fun stuff in it without alienating me with science fiction references and disguised porn.

Looking forward
There are two more days of the main conference, and plenty more good stuff on the programme. I’ve spotted a session called “geek feminism” in the “extra” track. I’ve never thought of myself as a geek feminist, but maybe I am. Having had the experiences described above, I think I’ll go along and find out.