Archive for the ‘Review’ Category

I was in Finland recently, at the European Testing Conference. I both attended the conference and presented a workshop about “Approval testing with TextTest“. I won’t say any more about that, since Ben Linders did a brilliant write-up already that was published on InfoQ. There were several other highlights, and I wanted to just share a paragraph or so about each.

Mob Testing is what happens when your development team decides to work together on testing tasks as a Mob. I took part in a workshop where Maaret Pyhäjärvi facilitated two different mobbing exercises, one where we automated some UI tests using Selenium, and one where we practiced Test-Driven Development on the FizzBuzz kata. I have already done some Mob Programming and this felt very similar, except the focus was on developing tests rather than production code. It seems to have similar benefits – you have access to all the knowledge of everyone in the team, and you can learn things you didn’t even know to ask about. It makes pairing seem like a slow way to share good working practices.

JUnit 5 is on the horizon, and has several useful improvements over the previous version. Generally the syntax clutter is reduced, and the way you create parameterized tests has been overhauled. The most significant change though, (especially for people like me who work on developing other testing tools), seems to be that they’re designing the test-running engine to be separated so you can re-use it to run other kinds of tests. Any infrastructure that works with JUnit will then be able to run these other tests as well. In principle it opens up JUnit’s success as a platform, to be re-used by other test frameworks. Thanks Nicolai Parlog for this useful summary of the next generation of one of the most widely-used tools in the Java world.

Joel Hynoski has worked at many of the tech giants in our industry, including Google, Twitter, Apple, and now Lyft. He spoke about some of the engineering challenges they had overcome, specifically in the area of testing. One thing I liked was their tool that detects flaky tests, and puts them in ‘jail’. (A flaky test is one that sometimes passes and sometimes fails, when run against the same code. They are a pain and can be a huge waste of time.) When a test is in ‘jail’, that means it’s no longer run in the build pipeline, so it doesn’t block new releases. It instead gets flagged as needing maintenance. They then have a SLA that says how long a test is allowed to remain in jail before an engineer needs to look at it and fix the flakyness – a day or two I think.

I can feel a little in awe of someone who has worked in those kinds of famous engineering organizations, working at web-scale with some of the best developers in our industry. What I found most encouraging about talking to Joel, was that he was very down to earth about the problems these organizations face. They still battle with legacy code, despite it often only being a few years old. They have trouble creating reliable automated tests. The developers don’t always trust the test automation. They still have production bugs and hotfixes…

Alex Schladebeck spent the first ten minutes of her presentation giving a splendid rant about the bad reputation of UI testing. To summarize: (criticisms she hears about UI tests -> her responses)

UI tests give slow feedback -> and valuable feedback, doesn’t have to be after every build
need more infrastructure/machines -> yes, deal with it
they’re the top of the test pyramid -> they are in the pyramid! you can’t ignore them. They find different stuff than unit tests. Consider your context.
they’re flaky -> they’re not as bad as they used to be! Could be your app isn’t designed for testabilty? Could be your test design is poor?
they cause lots of work when small changes in your app -> that happens in development work too! Also, happens more if you design them badly.

She then went on to give some excellent advice about how to design your UI tests. It was mostly about layering your test code in different levels of abstraction, and getting a good collaboration going between developers and testing specialists.

Conferences are about meeting people and the organizers of this conference had very deliberately scheduled sessions to encourage this. We had a ‘speed dating’ session where you talk to about 8 random people for five minutes each. We had a ‘lean coffee’ session, where all the speakers were each asked to facilitate a discussion table. I thought this worked particularly well as a way to find people with similar interests, and get them to talk about their experiences. The hands-on workshops were all at the same time, so you had to go to one and not just attend talks all the time. There was also an open space scheduled when it would not clash with any other kinds of sessions. I thought all this together made for a pretty welcoming conference where you were bound to get to know new people.

Overall I had a really good time at this conference and I’d recommend it to both testers and developers with a strong quality focus.

I blogged a while back about “Text-Based testing”, which is a variant of Test-Driven Development that I’ve used quite a bit. My husband, Geoff Bache, is developing several tools to support this style of development.

Recently, we met Llewellyn Falco and discovered the work he’s been doing with Approval Tests. We were all really excited to realize we’ve independently been working on something very similar. Llewellyn’s Approval Tests library is in some ways a unit-test version of Geoff’s tool TextTest which is probably more suited to integration or functional tests. What we’ve been calling “Text-Based Testing” I think is better described as “Text-Based Approval Testing”. I think it’s a particularly powerful technique for characterization tests of legacy code, and regression testing in general. Geoff’s latest tools also make it a viable approach for GUI testing, traditionally an area where people have difficulty doing pure TDD. (We’ll be talking about this at Eurostar in November.

I’ve written a fuller description of Approval Testing in a chapter of my work-in-progress book “Mocks, Fakes and Stubs”, but I’ll summarize here. In classic Test-Driven Development, you begin by defining a test case comprising three parts – “Arrange”, “Act”, “Assert”. The assertion generally takes the form assertEqual(expected, actual), and you calculate the expected value when you define the test case. Then you go away and implement the functionality, until the “actual” value matches “expected”, and the test passes.

With Approval Testing, you design the test case to the point of having “Arrange” and “Act”, but defer defining the “expected” value for the “Assert”. You take the approach of “I’ll know it when I see it”, and get on with implementing the code. When the actual value the code produces looks right, you “Approve” it – store the actual value in the test case.  So the assert statement becomes “assertEqual(approved, actual)”.

Text-Based Approval Testing
The value you approve on could be anything you can automatically diff the actual program output against – a file, a string, a screenshot, some json, contents of a database table… you name it. The thing is, plain text is wonderfully simple to diff, version control, merge, store, manipulate… and there’s a wealth of existing, well understood tools to do that. I guess that’s why Geoff’s tools only support plain text so far. His approach has always been that if your program produces output in a different format, you write a test fixture to convert it to plain text before you diff. Llewellyn’s tools have branched out more into diffing images and suchlike.

I think “Approval Testing” is a good name for the style of testing both Geoff and Llewellyn’s tools support. I like the implication that you explicitly approve the output from your program as correct, and use that as the basis for your test.

Other Approval Testers

Geoff and Llewellyn aren’t the only people using an Approval testing approach, either. Recently I led a workshop where we compared writing tests for the Gilded Rose Kata using both Cucumber and Approval Tests. Nat Pryce was there, and he later blogged about it. He speculates that Approval testing might solve some problems he’s seen with other kinds of testing. Nat has subsequently started developing a new approval testing tool, Pearlfish, so he can test his ideas.

There is also this recent screencast by Brett Slatkin from Google  who explains how he’s using an approval testing technique with image diffs to regression test his webapp. He says he finds this technique essential in a continuous delivery environment – these tests find bugs his other tests (both manual and automatic) miss entirely.

I have also found Approval testing to be a really useful technique, and I hope that simply having a good name for it will help people understand what it is. Perhaps you’ll realize it’s an approach you’ve already used, just without having a name for it. Or maybe you’ll be inspired to try out one of the tools I’ve mentioned.

For a little while now I’ve been collecting Refactoring Kata exercises in a github repo, (you’re welcome to clone it and try them out). I’ve recently facilitated working on some of these katas at various coding dojo meetings, and participants seem to have enjoyed doing them. I usually give a short introduction about the aims of the dojo and the refactoring skills we’re focusing on, then we split into pairs and work on one of these Refactoring Katas for a fixed timebox. Afterwards we compare designs and discuss what we’ve learnt in a short retrospective. It’s satisfying to take a piece of ugly code and after only an hour or so make it into something much more readable, flexible, and significantly smaller.

Test Driven Development is a multifaceted skill, and one aspect is the ability to improve a design incrementally, applying simple refactorings one at a time until they add up to a significant design improvement. I’ve noticed in these dojo meetings that some pairs do better than others at refactoring in small steps. I can of course stand behind them and coach to some extent, but I was wondering if we could use a tool that would watch more consistently, and help pairs notice when they are refactoring poorly.

I spent an hour or so doing the GildedRose refactoring kata myself in Java, and while I was doing it I had two different monitoring tools running in the background, the “Codersdojo Client” from http://content.codersdojo.org/codersdojo_client/, and “Sessions Recorder” from Industrial Logic. (This second tool is commercial and licenses cost money, but I actually got given a free license so I could try it out and review it). I wanted to see if these tools could help me to improve my refactoring skills, and whether I could use them in a coding dojo setting to help a group.

Setting it up and recording your Kata
The Codersdojo Client is a ruby gem that you download and install. When you want to work on a kata, you have to fiddle about a bit on the command line getting some files in the right places, (like the junit jar), then modify and run a couple of scripts. It’s not difficult if you follow the instructions and know basicly how to use the command line. You have a script running all the time you are coding, and it runs the tests every time you save a source file.

The Sessions Recorder is an Eclipse plugin that you download and install in the same way as other Eclipse plugins. It puts a “record” button on your toolbar. You press “record” before you start working on the kata.

Uploading the Kata for analysis
When you’ve finished the kata, you need to upload your recording for analysis. With the Codersdojo Client, when you stop the script, it gives you the option of doing the upload. When that’s completed it gives you a link to a webpage, where you can fill in some metadata about the Kata and who you are. Then it takes you to a page with the full final code listing and analysis.

The Sessions Recorder is similar. You press the button on the Eclipse toolbar to stop recording, and save the session in a file. Then you go to the Indutrial Logic webpage, log into your account, and go to the page where you upload your recorded Session file. You don’t have to enter any metadata, since you have an account and it remembers who you are, (you did pay for this service after all!) It then takes you to a page of analysis.

Codersdojo Client Analysis
The codersdojo client creates a page that you can make public if you want – mine is available here. It gives you a graph like this:

Screen Shot 2012-08-16 at 09.08.08

It’s showing how long you spent between saving the file, (a “move”), and whether the tests were red or green when you saved. There is also some general statistics about how long you spent in each move, and how many modifications there were on average. It points out your three longest moves, and has links to them so you can see what you were doing in the code at those points.

I think this analysis is quite helpful. I can see that I’m going no more than two or three minutes between saving the file, and usually if the tests go red I fix them quickly. Since it’s a refactoring kata I spend quite a lot of moves at the start where it’s all green, as I build up tests to cover the functionality. In the middle there is a red patch, and this is a clear sign to me that I could have done that part of the kata better. Looking over my code I was doing a major redesign and I should have done it in a better way that would have kept the tests running in the meantime.

Towards the end of the kata I have another flurry of red moves, as I start adding new functionality for “Conjured” items. I tried to move into a more normal TDD red-green-refactor cycle at that point, but it actually doesn’t look like I succeeded very well from this graph. I think I rushed past the “green” step without running the tests, then did a big refactoring. It worked in the end but I think I could have done that better too.

Sessions Recorder Analysis
The Sessions Recorder produces a page which is personal to me, and I don’t think it allows me to share it publicly on the web. On the page is a graph that looks like this:

anychart

As you can see it also shows how long I spend with passing and failing tests, in a slightly different way from the Codersdojo Client’s graph. It also distinguishes compiler errors from failing tests, (pink vs red).

This graph also clearly shows the areas I need to improve – the long pink patch in the middle where I do a major redesign in too large a step, and at the red bit at the end when I’m not doing TDD all that well.

The line on the graph is a “score” where it is awarding me points when I successfully perform the moves of TDD. Further down the page it gives me a list of the “events” this score is based on:

Screen Shot 2012-08-16 at 09.39.32

(This is just some of the events, to show you the kinds of things this picks up on.) “New Green Test” seems to score zero points, which is a bit disappointing, but adding a failing test gets a point, and so does making it pass. “Went green, but broke other tests” gets zero points. It’s clearly designed to help me successfully complete red-green-refactor cycles, not reward me for adding test coverage to existing code, then refactoring it.

There is another graph, more focused on the tests:

Screen Shot 2012-08-16 at 09.44.53

This graph has mouseover texts so when you hover over a red dot, it shows all the compilation errors you had at that point, and if you hover over a green dot it tells you which tests were passing. It also distinguishes “compler errors” from a “compiler rash”. The difference is that a “compiler rash” is a more serious compilation problem, that affects several files.

You can clearly see from this graph that the first part of the kata I was building up the test coverage, then just leaning on these tests and refactoring for the rest. It hasn’t noticed that I had two @Ignore ‘d tests until the last few minutes though. (I added failing tests for Conjured Items near the start then left them until I had the design suitably refactored near the end).

I actually found this graph quite hard to use to work out what I need to improve on. There seem to be three long gaps in the middle, full of compilation errors where I wasn’t running the tests. Unlike with the Codersdojo Client, there isn’t a link to the actual code changes I was making at those points. I’m having trouble working out just from the compiler errors what I should have been doing differently. I think one of these gaps is the same major redesign I could see clearly in the Codersdojo Client graph as a too big step, but I’m not so sure what the other two are.

There are further statistics and analysis as well. There is a section for “code smells” but it claims not to have found any. The code I started with should qualify as smelly, surely? Not sure why it hasn’t picked up on that.

Conclusions
I think both tools could help me to become better at Test Driven Development, and could be useful in a dojo setting. I can imagine pairs comparing their graphs after completing the kata, discussing how they handled the refactoring steps, and where the design ended up. Pairs could swap computers and look through someone else’s statistics to get some comparison with their own.

The Codersdojo Client is free to use, and works with a large number of programming languages, and any editor. You do have to be comfortable with the command line though. The Sessions Recorder tool only supports Java and C# via Eclipse. It has more detailed analysis, but for this Refactoring Kata I don’t think it was as helpful as it could have been.

The other big difference between the tools is about openness. The Sessions Recorder keeps your analysis private to you, and if you want to discuss your performance, it lets you do so with the designers of the tool via a “comment on this page” function. I havn’t tried that out yet so I’m not sure how it works, that is, whether you get feedback from a real person as well as the tool.

The Codersdojo Client also lets you keep your analysis private if you want, but in addition lets you publish your Kata performance for general review, as I have done. You can share your desire for feedback on twitter, g+ or facebook. People can go in and comment on specific lines of code and make suggestions. That wouldn’t be so needed during a dojo meeting, but might be useful if you were working alone.

Further comparison needed
On another occasion I tried out the Sessions Recorder on a normal TDD kata, and found the analysis much better. For example this graph of me doing the Tennis kata from scratch:

anychart (1)

This shows a clear red-green pattern of small steps, and steadily increasing score rewarding me for doing TDD correctly. Unfortunately I didn’t do a Codersdojo Client session at the same time as this one, for comparison. A further blog post is clearly needed for this case… 🙂

I’m speaking next week at ScanDev on Tour in Stockholm on the subject of “Software Development Craftsmanship”, and as part of my research I read both “The Clean Coder” by Robert C. Martin and “Apprenticeship Patterns” by Dave Hoover & Adewale Oshineye. These are very different books, but both aimed at less experienced software developers who want to learn about what it means to be a professional in the field. In this article I’d like to review them side by side. First some text from each preface on what the authors think the books are about:

Apprenticeship Patterns

“This book should help you through the tough decisions you face as a newcomer to the field of professional software development. “ (preface xi)

The Clean Coder

“This book is about software professionalism. It contains a lot of pragmatic advice” (preface xxii)

The Content
Both books contain a lot of personal stories and anecdotes from the authors’ careers, and begin with a short autobiography. Some of the advice is also similar. Both advise you to practice with “Kata” exercises, to read widely and to find suitable mentors. I think that’s mostly where the similarities end though.

Dave and Ade don’t say much about how to handle unreasonable managers imposing impossible deadlines. Bob Martin devotes a several chapters to this kind of issue, handling pressure, time management, estimation, making committments etc.

Dave and Ade talk more about how to get yourself into situations optimized for learning and progress in your career. They advise you to “Be the worst”, “Find mentors”, seek “Kindred Spirits”. In other words, join a team where you’re the least skilled but you’ll be taught, look for mentors in many places, and get involved in the community.

Bob talks about a lot of specific practices and has detailed advice. He mentions “… pairing is the most efficient way to solve a problem” (p164) Later in the chapter he suggests the optimal composition of job roles in a gelled team. (p169) He also has some advice about how to successfully argue with your boss and go over their head when necessary (p35).

The Advice
Those few example perhaps illustrate that these two books are miles apart when it comes to writing style, approach and world view. Dave&Ade have clearly spent a lot of time talking with other professionals about their material, acting on feedback and testing their ideas for validity in real situations. The book is highly collaborative and while full of advice, is not prescriptive.

Bob Martin on the other hand loves to be specific, provocative and extreme in his advice. “QA should find nothing.”(p114) “You should plan on working 60 hours per week.” (p16) “Avoid the Zone.” (p62) “The jury is in! … TDD works” (p79) These are some of his more suprising pieces of advice, which I think are actually fairly doubtful propositions when taken to extremes like this. Mixed in are more reasonable statements. “You do not have to attend every meeting to which you are invited” (p123) “The professional developer is calm and decisive under pressure”. (p150)

The way everything is presented as black-and-white, do-or-do-not-there-is-no-try is actually pretty wearing after a while. He does it to try to make you think, as a rhetorical device, to promote healthy discussion. I think it all too easily leads the reader to throw the baby out with the bathwater. I can’t accept one of his recommendations, so I throw them all out.

Some of Dave&Ade’s advice is actually just as hard to put into practice. Each of their patterns is followed by a call to action. Things like re-implementing a program you’ve written in an imperative language in a functional language (p21). Join or start a user group (p65). Solve the same coding exercise once a week for the next four weeks (p79). None of these things is particularly easy to do, but they seem to me to be interesting and useful challenges.


Collaboration
Bob has also clearly not collaborated very widely when preparing his material. One part that particularly sticks out for me is a footnote on page 75:

“I had a wonderful conversation with @desi (Desi McAdam, founder of DevChix) about what motivates women programmers. I told her that when I got a program working, it was like slaying the great beast. She told me that for her and other women she had spoken to, the act of writing code was an act of nurturing creation.” (footnote, p75)

Has he ever actually run his “programming is slaying a great beast” thing past any other male programmers? Let me qualify that – non-fantasy-role-playing male programmers? Thought not. This is in enormous contrast to Dave&Ade, whose book is full of stories from other people backing up their claims.

Stories
Bob’s book is full of stories from his own career, and he is very honest and open about his failures. This is a very brave thing to do, and I have a great deal of respect for him for doing so. It’s also really interesting to hear about the history of what life was like when computers filled a room and people used punch cards to program them. Dave&Ades stories are less compelling and not always as well written.

Bob’s book is not just about his professional life, he shares his likes and dislikes. He reccommends cycling or walking to recharge your energy, or “focus-manna” as he calls it, (p127). Reading science fiction as a cure for writer’s block. (p66) Listening to “The Wall” while coding could be bad for your design. (p63) When describing “Master” programmers he likens them to Scotty from Star Trek. (p182)

All this is very cute and gives you a more rounded picture of what software professionalism is about. Maybe. Actually it really puts me off the idea. I know a lot of software developers like science fiction and fantasy role playing, but it really isn’t mandatory. He usually says that you may have other preferences, and you don’t have to do like he does, but I just don’t think it helps all that much. The rest of the book is highly dogmatic about what you should and shouldn’t do, and it kind of rubs off.

Conclusions
The bottom line is, I wouldn’t reccommend “The Clean Coder” to any young inexperienced software developer, particularly not if she were a woman. There is too much of it written from a foreign culture, in a demanding tone, propounding overly extreme behaviour. The interesting stories and good pieces of advice are drowned out.

On the other hand, I would recommend “Apprenticeship Patterns”. I think it is humbly written and anchored in real experience from a range of people. I agree with them when they say you need to read it twice to understand it. The first time to get an overview, the second time to understand how the patterns connect. It’s not as easy to read as it might be. But still, I think the content is interesting, and it gives a good introduction to what being a professional software craftsman is about, and how to get there.

As I mentioned in my last post I’ve recently taught a course in automated testing to a bunch of students at KYH. Before the course I spent some time looking for good course books for them. I looked at a few options and eventually decided on “The art of Unit Testing with examples in .Net” by Roy Osherove, and “The RSpec Book” by Chelimsky et al.

I chose the unit testing book because Roy does a good job of describing the basics of test driven development, including simple mocks and stubs. The book is very practical and is full of insight from experience and code examples.

I also looked at “Pragmatic Unit Testing in C# with NUnit” by Andrew Hunt and David Thomas. I’m a big fan of their book “The Pragmatic Programmer”, so I had high hopes for this one. Unfortunately I was rather disappointed with it. It talks about what good unit tests should look like, but not much about how you use Test Driven Development to create them.

I chose the RSpec Book because it has quite a bit of material about Cucumber and how it fits in to a Behaviour Driven Development process. I think the published literature on automated testing focuses too much on unit level tools, and there is not enough written about feature level tests and how to use them as part of the whole agile process.

I also looked at “Bridging the Communication Gap” by Gojko Adzic, which I think is an excellent introduction to how to use feature level tests as part of the agile process, but it is largely tool agnostic. There is a short chapter introducing some tools, including JBehave, a forerunner to Cucumber, Selenium and TextTest too. It’s a little out of date now though, and for this course I wanted something with more detail.

I hope these short book reviews are useful.