I was in Finland recently, at the European Testing Conference. I both attended the conference and presented a workshop about “Approval testing with TextTest”. I won’t say any more about that, since Ben Linders has already done a brilliant write-up that was published on InfoQ. There were several other highlights, and I want to share a paragraph or so about each.

Mob Testing is what happens when your development team decides to work together on testing tasks as a Mob. I took part in a workshop where Maaret Pyhäjärvi facilitated two different mobbing exercises, one where we automated some UI tests using Selenium, and one where we practiced Test-Driven Development on the FizzBuzz kata. I have already done some Mob Programming and this felt very similar, except the focus was on developing tests rather than production code. It seems to have similar benefits – you have access to all the knowledge of everyone in the team, and you can learn things you didn’t even know to ask about. It makes pairing seem like a slow way to share good working practices.

JUnit 5 is on the horizon, and has several useful improvements over the previous version. Generally the syntax clutter is reduced, and the way you create parameterized tests has been overhauled. The most significant change though (especially for people like me who work on developing other testing tools) seems to be that they’re separating out the test-running engine so you can re-use it to run other kinds of tests. Any infrastructure that works with JUnit will then be able to run these other tests as well. In principle it opens up JUnit’s success as a platform to other test frameworks. Thanks to Nicolai Parlog for this useful summary of the next generation of one of the most widely-used tools in the Java world.

Joel Hynoski has worked at many of the tech giants in our industry, including Google, Twitter, Apple, and now Lyft. He spoke about some of the engineering challenges they had overcome, specifically in the area of testing. One thing I liked was their tool that detects flaky tests, and puts them in ‘jail’. (A flaky test is one that sometimes passes and sometimes fails, when run against the same code. They are a pain and can be a huge waste of time.) When a test is in ‘jail’, it’s no longer run in the build pipeline, so it doesn’t block new releases. It instead gets flagged as needing maintenance. They then have an SLA that says how long a test is allowed to remain in jail before an engineer needs to look at it and fix the flakiness – a day or two I think.
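As a rough illustration of the idea, and not a description of the actual tool (whose implementation wasn't covered in any detail), here is a minimal Python sketch. The names `classify`, `run_pipeline` and `jail` are invented for this example, and the SLA part is left out entirely. It assumes a test is just a callable returning True or False, and that re-running it a few times against unchanged code is enough to spot flakiness.

```python
import itertools


def classify(test, runs=10):
    """Run the same test repeatedly against unchanged code and classify it."""
    outcomes = {bool(test()) for _ in range(runs)}
    if outcomes == {True}:
        return "pass"
    if outcomes == {False}:
        return "fail"
    return "flaky"  # both outcomes seen without any code change


def run_pipeline(tests, jail):
    """Run every test that is not in jail; newly detected flaky tests are jailed.

    Returns True when the build may proceed, i.e. no test failed consistently.
    """
    build_ok = True
    for name, test in tests.items():
        if name in jail:
            continue  # jailed tests no longer block the release
        verdict = classify(test)
        if verdict == "flaky":
            jail.add(name)  # flag for maintenance instead of failing the build
        elif verdict == "fail":
            build_ok = False
    return build_ok


# A deterministic stand-in for a flaky test: it alternates pass and fail.
_alternating = itertools.cycle([True, False])

tests = {
    "test_stable": lambda: True,
    "test_flaky": lambda: next(_alternating),
}
jail = set()
first_build = run_pipeline(tests, jail)   # detects and jails test_flaky
second_build = run_pipeline(tests, jail)  # skips the jailed test
```

A real tool would presumably work from build history rather than re-running tests on the spot, and would record when each test entered jail so the SLA could be enforced.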

I can feel a little in awe of someone who has worked in those kinds of famous engineering organizations, working at web-scale with some of the best developers in our industry. What I found most encouraging about talking to Joel was that he was very down to earth about the problems these organizations face. They still battle with legacy code, despite it often only being a few years old. They have trouble creating reliable automated tests. The developers don’t always trust the test automation. They still have production bugs and hotfixes…

Alex Schladebeck spent the first ten minutes of her presentation giving a splendid rant about the bad reputation of UI testing. To summarize: (criticisms she hears about UI tests -> her responses)

UI tests give slow feedback -> and valuable feedback, doesn’t have to be after every build
need more infrastructure/machines -> yes, deal with it
they’re the top of the test pyramid -> they are in the pyramid! you can’t ignore them. They find different stuff than unit tests. Consider your context.
they’re flaky -> they’re not as bad as they used to be! Could be your app isn’t designed for testability? Could be your test design is poor?
they cause lots of work when there are small changes in your app -> that happens in development work too! Also, it happens more if you design them badly.

She then went on to give some excellent advice about how to design your UI tests. It was mostly about layering your test code in different levels of abstraction, and getting a good collaboration going between developers and testing specialists.

Conferences are about meeting people and the organizers of this conference had very deliberately scheduled sessions to encourage this. We had a ‘speed dating’ session where you talk to about 8 random people for five minutes each. We had a ‘lean coffee’ session, where all the speakers were each asked to facilitate a discussion table. I thought this worked particularly well as a way to find people with similar interests, and get them to talk about their experiences. The hands-on workshops were all at the same time, so you had to go to one and not just attend talks all the time. There was also an open space scheduled when it would not clash with any other kinds of sessions. I thought all this together made for a pretty welcoming conference where you were bound to get to know new people.

Overall I had a really good time at this conference and I’d recommend it to both testers and developers with a strong quality focus.

I’ve been favouring an Approval Testing approach for many years now, since I find it pretty useful in many situations, particularly for acceptance tests. Not many people I meet know the term though, and even fewer know how to use the technique. Recently I’ve put together some small exercises – code katas – to help people to learn about it. I’ll be going through them at a couple of upcoming conference workshops, but for all you people who won’t be there in person, I’m publishing them on GitHub as well.

I’ve got three katas set up now, Minesweeper, Yatzy and GildedRose. If you’ve done any of these katas before, you’ll probably have been using ordinary unit testing techniques. Hopefully by doing them again, with Approval Testing, you’ll learn a little about what’s different about this technique, and how it could be useful.

Before you can do the katas, you’ll need to install an approval testing tool. I’m one of the developers of TextTest, so that’s the tool I’ve set up right now. Below are some useful commands for installing it on a Debian/Ubuntu machine.

I’m still developing these exercises, and would like feedback about what you think of them. For example I have Python versions for all three, but only one has a Java version as yet. Do people want more translations? Do let me know how you get on, and what you think!

Installation instructions

You will need to have Python 2 and TextTest. (Unfortunately, TextTest uses a GUI library that doesn’t support Python 3.) For example:

$ sudo apt-get install python-pip
$ sudo pip install texttest

For more detailed instructions, and for other platforms, see the TextTest installation docs. For more general documentation, see the TextTest website.

You need to have an editor and a diff tool configured for TextTest to use. I recommend Sublime Text and Meld. Install them like this:

$ sudo add-apt-repository ppa:webupd8team/sublime-text-3
$ sudo apt-get update
$ sudo apt-get install sublime-text-installer
$ sudo apt-get install meld

Then you need to configure TextTest to use them:

$ cd
$ mkdir -p .texttest
$ touch .texttest/config
$ subl .texttest/config

Enter the following in that file, and save:


For convenience, I also like to create an alias ‘tt’ for starting TextTest for these exercises. Change directory to one of the exercise repositories, then a ‘tt’ command should start the TextTest GUI and show the tests for that exercise. Define such an alias like this:

alias tt='texttest -d python -c .'

Two of the exercises start with a small test suite for you to build on. There should be instructions in the README file of each exercise to help you get going. If you really can’t work out what to do, have a look at the sample solutions and see if that helps. These are also on GitHub: Minesweeper-sample-solution, Yatzy-sample-solution, GildedRose-sample-solution

I blogged a while back about “Text-Based testing”, which is a variant of Test-Driven Development that I’ve used quite a bit. My husband, Geoff Bache, is developing several tools to support this style of development.

Recently, we met Llewellyn Falco and discovered the work he’s been doing with Approval Tests. We were all really excited to realize we’ve independently been working on something very similar. Llewellyn’s Approval Tests library is in some ways a unit-test version of Geoff’s tool TextTest, which is probably more suited to integration or functional tests. What we’ve been calling “Text-Based Testing” I think is better described as “Text-Based Approval Testing”. I think it’s a particularly powerful technique for characterization tests of legacy code, and regression testing in general. Geoff’s latest tools also make it a viable approach for GUI testing, traditionally an area where people have difficulty doing pure TDD. (We’ll be talking about this at Eurostar in November.)

I’ve written a fuller description of Approval Testing in a chapter of my work-in-progress book “Mocks, Fakes and Stubs”, but I’ll summarize here. In classic Test-Driven Development, you begin by defining a test case comprising three parts – “Arrange”, “Act”, “Assert”. The assertion generally takes the form assertEqual(expected, actual), and you calculate the expected value when you define the test case. Then you go away and implement the functionality, until the “actual” value matches “expected”, and the test passes.

With Approval Testing, you design the test case to the point of having “Arrange” and “Act”, but defer defining the “expected” value for the “Assert”. You take the approach of “I’ll know it when I see it”, and get on with implementing the code. When the actual value the code produces looks right, you “Approve” it: you store the actual value in the test case. So the assert statement becomes assertEqual(approved, actual).
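The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how TextTest or the Approval Tests library work internally: the helpers `verify` and `approve`, the `.approved` and `.received` file names, and the `yatzy_chance_score` function are all made up for this example.

```python
import tempfile
from pathlib import Path


def yatzy_chance_score(dice):
    """Hypothetical code under test: the 'chance' category sums the dice."""
    return sum(dice)


def verify(actual, approved_path):
    """Compare actual output with the approved text stored on disk.

    If they differ (or nothing is approved yet), write the actual output to a
    '.received' file next to the approved one, for a human to inspect.
    """
    approved_path = Path(approved_path)
    approved = approved_path.read_text() if approved_path.exists() else None
    if actual == approved:
        return True
    approved_path.with_suffix(".received").write_text(actual)
    return False


def approve(approved_path):
    """Approve the latest received output by promoting it to the approved file."""
    approved_path = Path(approved_path)
    approved_path.with_suffix(".received").replace(approved_path)


# Classic style: the expected value is worked out up front.
assert yatzy_chance_score([1, 2, 3, 4, 5]) == 15

# Approval style: Arrange and Act only; the "expected" value is deferred.
workdir = Path(tempfile.mkdtemp())
approved_file = workdir / "chance.approved"
actual = "chance score: %d\n" % yatzy_chance_score([1, 2, 3, 4, 5])

first_run = verify(actual, approved_file)   # nothing approved yet, so this fails
approve(approved_file)                      # a human looked at the output and liked it
second_run = verify(actual, approved_file)  # now the test passes
```

The important step is the one the code cannot show: between the first and second run, a person reads the received output and decides it is correct.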

Text-Based Approval Testing

The value you approve could be anything you can automatically diff the actual program output against – a file, a string, a screenshot, some JSON, contents of a database table… you name it. The thing is, plain text is wonderfully simple to diff, version control, merge, store, manipulate… and there’s a wealth of existing, well understood tools to do that. I guess that’s why Geoff’s tools only support plain text so far. His approach has always been that if your program produces output in a different format, you write a test fixture to convert it to plain text before you diff. Llewellyn’s tools have branched out more into diffing images and suchlike.
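Here is a small sketch of that “convert to plain text, then diff” idea, using only the Python standard library. The fixture `to_plain_text`, the helper `diff_against_approved` and the sample data are assumptions made for this example; TextTest itself handles the comparison differently and does far more.

```python
import difflib
import json


def to_plain_text(order):
    """Test fixture: render a structured result as stable, line-oriented text."""
    return json.dumps(order, indent=2, sort_keys=True) + "\n"


def diff_against_approved(approved_text, actual_text):
    """Return a unified diff as a list of lines; an empty list means no change."""
    return list(difflib.unified_diff(
        approved_text.splitlines(keepends=True),
        actual_text.splitlines(keepends=True),
        fromfile="approved",
        tofile="actual",
    ))


approved = to_plain_text({"item": "Aged Brie", "quality": 10, "sell_in": 5})
unchanged = to_plain_text({"sell_in": 5, "quality": 10, "item": "Aged Brie"})
changed = to_plain_text({"item": "Aged Brie", "quality": 11, "sell_in": 4})

no_diff = diff_against_approved(approved, unchanged)  # key order doesn't matter
diff = diff_against_approved(approved, changed)       # shows the changed lines
```

Sorting the keys and putting one value per line keeps the text stable between runs, so the diff only shows real changes in behaviour.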

I think “Approval Testing” is a good name for the style of testing both Geoff and Llewellyn’s tools support. I like the implication that you explicitly approve the output from your program as correct, and use that as the basis for your test.

Other Approval Testers

Geoff and Llewellyn aren’t the only people using an Approval testing approach, either. Recently I led a workshop where we compared writing tests for the Gilded Rose Kata using both Cucumber and Approval Tests. Nat Pryce was there, and he later blogged about it. He speculates that Approval testing might solve some problems he’s seen with other kinds of testing. Nat has subsequently started developing a new approval testing tool, Pearlfish, so he can test his ideas.

There is also this recent screencast by Brett Slatkin from Google, who explains how he’s using an approval testing technique with image diffs to regression test his webapp. He says he finds this technique essential in a continuous delivery environment – these tests find bugs his other tests (both manual and automatic) miss entirely.

I have also found Approval testing to be a really useful technique, and I hope that simply having a good name for it will help people understand what it is. Perhaps you’ll realize it’s an approach you’ve already used, just without having a name for it. Or maybe you’ll be inspired to try out one of the tools I’ve mentioned.