I’ve never been to Öredev before, and it really is a very impressive conference. Gathered in Malmö right now are over a thousand developers, with a collection of speakers representing the elite of the global software development industry. The sessions are full of great advice, news and inspiration. That’s primarily why I’m here, although I could also praise the excellent conference organization, food, live music, sponsor giveaways and other entertainments.

I’d like to talk about my first impressions of the conference. As with other software conferences I’ve been to, women are rather unusual here, both among delegates and speakers. I’ve blogged about this before, and it’s not untypical. The sad fact is that the proportion of women is low, and, unlike in other industries, actually falling in software development.

I’d just like to relate two experiences I had yesterday.

Women speakers
I’m here as an ordinary delegate at this conference, but at most conferences I go to, I’m a speaker. I was really happy to be greeted yesterday by Dan North, Gojko Adzic, Pat Kua, Corey Haines and others, since they’re people I really respect. I’ve mostly got to know them when we’ve spoken at the same conferences in the past. Of course they all asked me which day I was speaking at Öredev. Well, the easy answer is that I was not asked to speak here. I looked on the Öredev conference website and there was no call for submissions, unlike other conferences I have spoken at, like XP, Agile Testing Days, ScanDev, Agile, JFokus etc. I assumed that it was invitation only, and that the track chairs would mail me if they were interested in having me speak. Now I realize that I should have mailed them to point out that I wanted to speak here.

I think I just fell into a trap that women apparently often fall into, described in this article I read this week: “Four Ways Women Stunt Their Careers Unintentionally”. Apparently we tend to be “overly modest”, and “women fail to get promoted because they fail to step up and apply”. So I didn’t apply, and I didn’t get to speak at Öredev. #kickingmyself

Neal Ford’s keynote
I listened to a keynote by Neal Ford yesterday. I’ve heard him speak before, and he is always very entertaining, and yet makes some interesting points about software. It’s just a small thing he said that really bothered me. A keynote speech is supposed to set the tone for a whole conference – you have everybody gathered, and you’re supposed to say something inspirational and thought-provoking.

One of Neal’s jokes was about some obscure Star Trek reference that I didn’t get (although from the audience reaction I’m guessing most others did), which he followed up with a slide showing the top Google image results for this thing. He made some comment about Google knowing that when you search for this, you’re really asking for porn. He had helpfully airbrushed out the image results so you could only vaguely see the outlines of naked women.

Neal, you really didn’t need to do that. Your talk had enough fun stuff in it without alienating me with science fiction references and disguised porn.

Looking forward
There are two more days of the main conference, and plenty more good stuff on the programme. I’ve spotted a session called “geek feminism” in the “extra” track. I’ve never thought of myself as a geek feminist, but maybe I am. Having had the experiences described above, I think I’ll go along and find out.

Programmers have a vested interest in making sure the software they create does what they think it does. When I’m coding, I prefer to work with feedback from automated tests that help me keep track of what works and how far I’ve got. I’ve written before about Test-Driven Development (TDD). In this article I’d like to explain some of the main features of Text-Based Testing. It’s a variant on TDD, perhaps better suited to the functional level than the unit level, and one I’ve found powerful and productive to use.


The basic idea
You get your program to produce a plain text file that documents all the important things that it does. A log, if you will. You run the program and store this text as a “golden copy” of the output. You create from this a Text-Based Test with a descriptive name, any inputs you gave to the program, and the golden copy of the textual output.

You make some changes to your program, and you run it again, gathering the new text produced. You compare this text with the golden copy: if they are identical, the test passes; if there is a difference, the test fails. If you look at the diff and you like the new text better than the old, you update your golden copy, and the test is passing once again.
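Here’s a minimal sketch of the idea in Python, in pytest style. The program under test, myprog, its options, and the file names are all hypothetical:

    import subprocess
    from pathlib import Path

    GOLDEN = Path("golden_output.txt")

    def run_program(args):
        # Run the (hypothetical) program under test and capture its text output.
        result = subprocess.run(["myprog"] + args, capture_output=True, text=True)
        return result.stdout

    def test_against_golden_copy():
        actual = run_program(["--input", "testdata.txt"])
        if not GOLDEN.exists():
            # First run: inspect the output by eye, then store it as the golden copy.
            GOLDEN.write_text(actual)
        assert actual == GOLDEN.read_text()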

Tool Support
Text-Based Testing is a simple idea, and in fact many people do it already in their unit tests. AssertEquals(String expected, String actual) is actually a form of it. You often create the “expected” string based on the actual output of the program (although purists will write the whole assert before they execute the test).

Most unit test tools these days give you a nice diff even on multi-line strings. For example:

[Screenshot: a failing text-based test using JUnit, showing the diff between the expected and actual multi-line strings]
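Python’s unittest does the same thing: assertEqual on multi-line strings prints a line-by-line diff when the test fails. A small illustration (the strings are invented, and the test fails by design):

    import unittest

    class GreetingTest(unittest.TestCase):
        def test_greeting_output(self):
            expected = "Hello Anna\nHello Bert\nGoodbye\n"
            actual = "Hello Anna\nHello Pat\nGoodbye\n"
            # On failure, unittest shows a unified diff of the two strings.
            self.assertEqual(expected, actual)

    if __name__ == "__main__":
        unittest.main()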

Once your strings get very long, to the scale of whole log files, even multi-line diffs aren’t really enough. You get datestamps, process ids and other stuff that changes every run, hashmaps with indeterminate order, etc. It gets tedious to deal with all this on a test-by-test basis.
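TextTest does this filtering for you, driven by configuration. To illustrate the principle, here’s a rough Python sketch; the patterns are just examples, and real ones would depend on your logs:

    import re

    # Example patterns for run-dependent text that changes every run.
    FILTERS = [
        (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "<timestamp>"),
        (re.compile(r"pid \d+"), "pid <pid>"),
    ]

    def filter_run_dependent_text(text):
        # Mask the parts that legitimately vary, before diffing against the golden copy.
        for pattern, replacement in FILTERS:
            text = pattern.sub(replacement, text)
        return text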

My husband, Geoff Bache, has created a tool called “TextTest” to support Text-Based Testing. Amongst other things, it helps you organize and run your text-based tests, and filter the text before you compare it. It’s free, open source, and of course used to test itself. (Eats its own dog food!) TextTest is used extensively within Jeppesen Systems (Geoff works for them, and they support its development), and I’ve used it too on various projects in other organizations.

In the rest of this article I’ll look at some of the main implications of using a Text-Based Testing approach, and some of my experiences.

Little code per test
The biggest advantage of the approach is that you tend to write very little unique code for each test. You generally access the application through a public interface, as a user would: often a command-line interface or a (web) service call. You then create many tests by, for example, varying the command-line options or request contents. This reduces test maintenance work, since you have less test code to worry about, and the public API of your program should change relatively infrequently.
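For instance, building on the earlier sketch, a whole suite can be little more than a table of names and command-line options. Again, myprog and its options are hypothetical:

    import subprocess
    from pathlib import Path
    import pytest

    # One entry per test: a descriptive name and the command-line options.
    CASES = {
        "default_run": [],
        "verbose_run": ["--verbose"],
        "csv_export": ["--format", "csv"],
    }

    @pytest.mark.parametrize("name", CASES)
    def test_matches_golden_copy(name):
        actual = subprocess.run(["myprog"] + CASES[name],
                                capture_output=True, text=True).stdout
        assert actual == Path("golden", name + ".txt").read_text()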

Legacy code
Text-Based Testing is obviously a regression testing technique. You’re checking that the code still does what it did before, by checking that the log stays the same. So these tests are perfect for refactoring. As you move around the code, the log statements move too, and your tests stay green (so long as you don’t make any mistakes!). In most systems it’s cheap and risk-free to add log statements, no matter how horribly gnarly the design is. So Text-Based Testing is an easy way to get some initial tests in place to lean on while refactoring. I’ve used it this way fairly successfully to get legacy code under control, particularly when the code already produces a meaningful log or textual output.


No help with your design
I just told you how good Text-Based Testing is with legacy code. But actually these tests give you very little help with the internal design of your program. With normal TDD, the activity of creating unit tests at least forces you to decompose your design into units, and if you do it well, you’ll find these tests giving you all sorts of feedback about your design. Text-Based Tests don’t. Log statements don’t care whether they’re in the middle of a long horrible method or spread around several smaller ones. So you have to get feedback on your design some other way.

I usually work with TDD at the unit level in combination with Text-Based tests at the functional level. I think it gives me the best of both worlds.

Log statements and readability
Some people complain that log statements reduce the readability of their code and don’t like to add any at all. They seem to be out of fashion, just like comments. The idea is that all the important ideas should be expressed in the class and method names, and logs and comments just clutter up the important stuff. I agree to an extent: you can definitely over-use logs and comments. I think a few well-placed ones can make all the difference, though. For Text-Based Testing purposes, you don’t want a log that is megabytes and megabytes of junk, listing every time you enter and leave every method, and the values of every variable. That’s going to seriously hinder your refactoring, apart from being a nightmare to store and update.

What we’re talking about here is targeted log statements at the points where something important happens, something we want to make sure continues happening. You can think of them like the asserts you make in unit tests. You don’t assert everything, just what’s important. In my experience, less than two percent of the lines of code end up being log statements, and if anything, they increase readability.
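For example, something like this, in an invented order-handling domain, where only the business-significant events are logged:

    import logging

    log = logging.getLogger("orders")

    def price_order(items):
        # Stand-in pricing logic, just for the sake of the example.
        return sum(price for _, price in items)

    def place_order(customer_id, items):
        # Targeted logging: record the important events, not every method call.
        log.info("Order received: %d items for customer %s", len(items), customer_id)
        total = price_order(items)
        log.info("Order priced: total %.2f", total)
        return total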

Text-Based tests are completed after the code
In normal TDD you write the test first, and thereby set up a mini pull system for the functionality you need. It’s lean, it forces you to focus on the problem you’re trying to solve before you solve it, and it starts giving you feedback before you commit to an implementation. With Text-Based Testing, you often find it’s too much work to specify the log up front. It’s much easier to wait until you’ve implemented the feature, run the test, and save the log afterwards.

So your tests usually aren’t completed until after the code they test, unlike in normal TDD. Having said that, I would argue that you can still do a form of TDD with Text-Based Tests. I’d normally create half the test before the code: I name the test, and find suitable inputs that should provoke the behaviour I need to implement in the system. The test will fail the first time I run it. In this way I think I get many of the benefits of TDD, but only actually pin down the exact assertion once the functionality is working.

“Expert Reads Output” Antipattern
If you’re relying on a diff in the logs to tell you when your program is broken, you had better have good logs! But who decides what to log? Who checks the “golden copy”? Usually it is the person creating the test, who should look through the log and check everything is in order the first time. Of course, after a test is created, every time it fails you have to make a decision whether to update the golden copy of the log. You might make a mistake. There’s a well known antipattern called “Expert Reads Output” which basically says that you shouldn’t rely on having someone check the results of your tests by eye.

This is actually a problem with any automated testing approach: someone has to make a judgement about what to do when a test fails, whether the test is wrong or there’s a bug in the application. With Text-Based Testing you might have a larger quantity of text to read through compared with other approaches, or maybe not. If you have human-readable, concise, targeted log statements and good tools for working with them, it goes a long way. You need a good diff tool, version control, and some way of grouping similar changes. It’s also useful to have some sanity checks. For example, TextTest can search for regular expressions in the log and warn you if you try to save a golden copy containing a stack trace.
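The principle is easy to sketch in Python; the “suspicious” patterns here are only examples:

    import re

    # Patterns that should never appear in a blessed golden copy.
    SUSPICIOUS = [r"Traceback \(most recent call last\)", r"\bERROR\b"]

    def safe_to_save_as_golden(new_text):
        # Refuse to bless output that looks broken, even if a human approved the diff.
        return not any(re.search(p, new_text) for p in SUSPICIOUS)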

In my experience, you do need to update the golden copy quite often. I think this is one of the key skills with a Text-Based Testing approach. You have to learn to write good logs, and to be disciplined about either doing refactoring or adding functionality, not both at the same time. If you’re refactoring and the logs change, you need to be able to quickly recognize if it’s ok, or if you made a mistake. Similarly, if you add new functionality and no logs change, that could be a problem.

Agile Tests Manage Behaviour
When you create a unit test, you end with an Assert statement. This is supposed to be some kind of universal truth that should always be valid, or else there is a big problem. Particularly for functional level tests, it can be hard to find these kinds of invariants. What is correct today might be updated next week when the market moves or the product owner changes their mind. With Text-Based Testing you have an opportunity to quickly and easily update the golden copy every time the test “fails”. This makes your tests much more about keeping control of what your app does over time, and less about rewriting assert statements.

Text-Based Testing grew up in the domain of optimizing logistics planning. In this domain there is no “correct” answer you can predict in advance and assert. Planning problems that are interesting to solve are far too complex for a complete mathematical analysis, and the code relies on heuristics and fancy algorithms to come up with better and better solutions. So Text-Based Testing makes it easy to spot when the test produces a different plan from before, and use it as the new baseline if it’s an improvement.

I think generally it leads to more “agile” tests. They can easily respond to changes in the business requirements.

Conclusions
There is undoubtedly a lot more to be said about Text-Based Testing. I haven’t mentioned text-based mocking, data-driven vs workflow testing, or how to handle databases and GUIs – all relevant topics. I hope this article has given you a flavour of how it’s different from ordinary TDD, though. I’ve found that good tool support is pretty essential to making Text-Based Testing work well, and that it’s a particularly good technique for handling legacy code, although not exclusively. I like the approach because it minimizes the amount of code per test, and makes it easy to keep the tests in sync with the current behaviour of the system.

 

I’m speaking next week at ScanDev on Tour in Stockholm on the subject of “Software Development Craftsmanship”, and as part of my research I read both “The Clean Coder” by Robert C. Martin and “Apprenticeship Patterns” by Dave Hoover & Adewale Oshineye. These are very different books, but both aimed at less experienced software developers who want to learn about what it means to be a professional in the field. In this article I’d like to review them side by side. First some text from each preface on what the authors think the books are about:

Apprenticeship Patterns

“This book should help you through the tough decisions you face as a newcomer to the field of professional software development. “ (preface xi)

The Clean Coder

“This book is about software professionalism. It contains a lot of pragmatic advice” (preface xxii)

The Content
Both books contain a lot of personal stories and anecdotes from the authors’ careers, and begin with a short autobiography. Some of the advice is also similar. Both advise you to practice with “Kata” exercises, to read widely and to find suitable mentors. I think that’s mostly where the similarities end though.

Dave and Ade don’t say much about how to handle unreasonable managers imposing impossible deadlines. Bob Martin devotes several chapters to this kind of issue: handling pressure, time management, estimation, making commitments etc.

Dave and Ade talk more about how to get yourself into situations optimized for learning and progress in your career. They advise you to “Be the worst”, “Find mentors”, seek “Kindred Spirits”. In other words, join a team where you’re the least skilled but you’ll be taught, look for mentors in many places, and get involved in the community.

Bob talks about a lot of specific practices and has detailed advice. He mentions “… pairing is the most efficient way to solve a problem” (p164). Later in the chapter he suggests the optimal composition of job roles in a gelled team (p169). He also has some advice about how to successfully argue with your boss and go over their head when necessary (p35).

The Advice
Those few examples perhaps illustrate that these two books are miles apart when it comes to writing style, approach and world view. Dave&Ade have clearly spent a lot of time talking with other professionals about their material, acting on feedback and testing their ideas for validity in real situations. The book is highly collaborative and, while full of advice, is not prescriptive.

Bob Martin, on the other hand, loves to be specific, provocative and extreme in his advice. “QA should find nothing.” (p114) “You should plan on working 60 hours per week.” (p16) “Avoid the Zone.” (p62) “The jury is in! … TDD works.” (p79) These are some of his more surprising pieces of advice, which I think are actually fairly doubtful propositions when taken to extremes like this. Mixed in are more reasonable statements: “You do not have to attend every meeting to which you are invited.” (p123) “The professional developer is calm and decisive under pressure.” (p150)

The way everything is presented as black-and-white, do-or-do-not-there-is-no-try, is actually pretty wearing after a while. He does it as a rhetorical device, to make you think and to promote healthy discussion. I think it all too easily leads the reader to throw the baby out with the bathwater: I can’t accept one of his recommendations, so I throw them all out.

Some of Dave&Ade’s advice is actually just as hard to put into practice. Each of their patterns is followed by a call to action. Things like re-implementing a program you’ve written in an imperative language in a functional language (p21). Join or start a user group (p65). Solve the same coding exercise once a week for the next four weeks (p79). None of these things is particularly easy to do, but they seem to me to be interesting and useful challenges.


Collaboration
Bob has also clearly not collaborated very widely when preparing his material. One part that particularly sticks out for me is a footnote on page 75:

“I had a wonderful conversation with @desi (Desi McAdam, founder of DevChix) about what motivates women programmers. I told her that when I got a program working, it was like slaying the great beast. She told me that for her and other women she had spoken to, the act of writing code was an act of nurturing creation.” (footnote, p75)

Has he ever actually run his “programming is slaying a great beast” thing past any other male programmers? Let me qualify that – non-fantasy-role-playing male programmers? Thought not. This is in enormous contrast to Dave&Ade, whose book is full of stories from other people backing up their claims.

Stories
Bob’s book is full of stories from his own career, and he is very honest and open about his failures. This is a very brave thing to do, and I have a great deal of respect for him for doing so. It’s also really interesting to hear about the history of what life was like when computers filled a room and people used punch cards to program them. Dave&Ade’s stories are less compelling and not always as well written.

Bob’s book is not just about his professional life; he shares his likes and dislikes. He recommends cycling or walking to recharge your energy, or “focus-manna” as he calls it (p127). Reading science fiction as a cure for writer’s block (p66). Listening to “The Wall” while coding could be bad for your design (p63). When describing “Master” programmers he likens them to Scotty from Star Trek (p182).

All this is very cute and gives you a more rounded picture of what software professionalism is about. Maybe. Actually it really puts me off the idea. I know a lot of software developers like science fiction and fantasy role playing, but it really isn’t mandatory. He usually says that you may have other preferences, and that you don’t have to do as he does, but I just don’t think it helps all that much. The rest of the book is highly dogmatic about what you should and shouldn’t do, and it kind of rubs off.

Conclusions
The bottom line is, I wouldn’t recommend “The Clean Coder” to any young, inexperienced software developer, particularly not if she were a woman. Too much of it is written from a foreign culture, in a demanding tone, propounding overly extreme behaviour. The interesting stories and good pieces of advice are drowned out.

On the other hand, I would recommend “Apprenticeship Patterns”. I think it is humbly written and anchored in real experience from a range of people. I agree with them when they say you need to read it twice to understand it. The first time to get an overview, the second time to understand how the patterns connect. It’s not as easy to read as it might be. But still, I think the content is interesting, and it gives a good introduction to what being a professional software craftsman is about, and how to get there.

This weekend I was in Stockholm to facilitate a Code Retreat, organized by Peter Lind and sponsored by Valtech. We were about 40 coders gathered in the warm autumn sunshine early on a Saturday morning at Valtech’s offices. (Do take a look at Peter’s blog post about it, he has a photo too).

It’s actually the first time I’ve even attended a code retreat, let alone facilitated one, but I think it went pretty well. Corey Haines has written extensively about what should happen, and what the facilitator should do. I think he’s given a great gift to the community, not just by inventing the format, but also by documenting it thoroughly. I’ve previously led various coding dojos and “clean code day” events, but a code retreat is somewhat different in format, if not in aim.

The reason for going to a code retreat is to practice your coding skills. By repeating the same exercise over and over, with different pairing partners, you have a chance to work on your coding habits. Do you pay attention to what your tests are telling you about your design? Do you remember to refactor regularly? Can you take really small steps when you need to?

For the day in Stockholm, we followed the tried and tested formula for a code retreat that Corey has laid out. I spent about 20 minutes introducing the day, the aims and the coding problem (Conway’s Game of Life). Then we did 6 coding sessions, each with a short retrospective, and a longer retrospective at the end of the day. Each session comprised 45 minutes coding in pairs, 10 minutes retrospective in groups of 6-8, and 5 minutes to swap partners. I also began each coding session by reminding everyone of what we were supposed to be practicing, and highlighted a different “challenge” to add some variety. The challenges were things like:

– concentrate on writing really beautiful code, so the language looks like it was made for the problem *
– partition code at different levels of abstraction **
– think about TDD in terms of states and moves
– do TDD as if you meant it
– concentrate on refactoring in very small steps

Each pairing session is just 45 minutes, and in that time you don’t have time to really solve the whole kata, which is actually quite difficult to cope with. Most coders are very motivated by writing code that does something useful, and like to show off their finished designs at the end. To try to prevent that, Corey emphasizes that you should keep in mind that the end result isn’t important, and be sure to delete the code at the end of the session. I found that even with that rule, there was quite a lot of discussion of how the designs ended up, and some people even saved their code.

One of the things I encouraged people to try was working in an unfamiliar programming language, and although I specified “for 1 or 2 sessions”, I was surprised to find how popular it was to do this. After the first session when most people used Java, C#, Ruby or Python, there were more and more people coding in Clojure, Javascript, Erlang and even Vim script. I think it got a bit out of hand actually. It’s hard to practice your coding habits and TDD skills when you’re struggling with the language syntax and how to get the tests to run. Next time I facilitate I’ll try to be clearer about using a familiar environment for most of the sessions.

One of the things I offered in the last session was using the cyberdojo, and three pairs agreed to try it. I had them working in Java and Ruby, switching pairs every 5 minutes, coding in a browser window. They complained about the browser experience compared with their IDEs, but they liked the feedback cyberdojo gives you. It shows how long you spend between running the tests, and whether the tests pass, fail or give a compiler error.

I’m not sure if it was a good idea to bring in the cyberdojo at the code retreat. One of the main things we discussed in the retrospective for that session was the resistance they all felt to changing the first test that was written at one of the three pairing stations. This test was too big and focussed on a boring part of the problem. Yet each person who “inherited” the code tried their best to make it pass, no-one started over with a better test. It’s that kind of collaboration problem that the cyberdojo is good at highlighting. It’s not so much a tool for improving your coding skills as improving your collaboration skills. This is good, but not really the purpose of the code retreat.

Thinking back over the day, I’ve also become a little uncertain about the “delete your code” rule. I understand why it’s there, but it didn’t seem to prevent people from trying to solve the whole problem in 45 minutes. By deleting the code, you also lose the opportunity to use analysis tools like those in the cyberdojo to give you some more feedback on how you’re doing.

Outside of this code retreat, I’ve been trying out the codersdojo client quite a bit recently, to see if it gives a useful analysis of a coding session. Unlike cyberdojo, it lets you use your normal coding tools/IDE. So far it’s still in beta testing and seems too buggy for me to recommend, but if you’re lucky enough to successfully upload your coding session, you do get quite a good visualization of some of your coding habits. It will clearly show if you spend a long time between test runs, or if you spend a lot of time with failing tests.

So after my first code retreat, I’m feeling very encouraged that this is a good format for becoming a better coder, and I’d be happy to run one again. I’d like to try using coding visualization tools as part of the retrospective for each session. I’d also like to try setting the challenges before people have chosen a pairing partner, so they can find someone who also wants to work on the same challenge, rather than just try a new language. Or maybe I just need to emphasize more that trying a new language isn’t the focus of the day.

In any case, I hope this blog post shows that I learnt a lot from facilitating this code retreat, even if I didn’t write a single line of code myself 🙂

* “You can call it beautiful code when the code also makes it look like the language was made for the problem” — Ward Cunningham quoted in “Clean Code” by Bob Martin.
** G6: Code at Wrong Level of Abstraction – advice from “Clean Code” by Bob Martin.

I’ve been working on a kata called “Tennis”*, which I find interesting, because it is quite quick to code, yet is a big enough problem to be worth doing. It’s also possible to enumerate pretty much all the allowed scores, and get very comprehensive test coverage.

What I’ve found is that when I’m using TDD to solve the kata, I actually tend to enumerate only a very small number of the test cases. I generally end up with something like:

Love-All
Fifteen-All
Fifteen-Love
Thirty-Forty
Deuce
Advantage Player1
Win for Player1
Advantage Player2
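
As a Python sketch (pytest style), those cases might look like the following. Here tennis_score(player1_points, player2_points) is my hypothetical interface, and the point counts are one possible mapping to the score names above:

    import pytest
    from tennis import tennis_score  # hypothetical module under test

    @pytest.mark.parametrize("p1, p2, expected", [
        (0, 0, "Love-All"),
        (1, 1, "Fifteen-All"),
        (1, 0, "Fifteen-Love"),
        (2, 3, "Thirty-Forty"),
        (3, 3, "Deuce"),
        (4, 3, "Advantage Player1"),
        (4, 2, "Win for Player1"),
        (3, 4, "Advantage Player2"),
    ])
    def test_score(p1, p2, expected):
        assert tennis_score(p1, p2) == expected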

I think that’s enough to test-drive a complete implementation, built up in stages. I thought it would also be enough tests to support refactoring the code, but I actually found it wasn’t. After I’d finished my implementation and mercilessly refactored it for total readability, I went back and implemented exhaustive tests. To my horror, I found three (of 33) that failed! I’d made a mistake in one of my refactorings, and none of my original tests found it. The bug only showed up with scores like Fifteen-Forty, Love-Thirty and Love-Forty, where my code instead reported a win for Player 2. (I leave it as an exercise for the reader to identify my logic error 🙂)

So what’s the point of TDD? Is it to help you make your design good, or to protect you from introducing bugs when refactoring? Of course it should help with both, but I think doing this practice exercise showed me (again!) that it really is worth being disciplined and careful about refactorings. I also think I need to develop a better sense for which refactorings might not be well covered by the tests I have, and when I should add more.

This is something that my friend Andrew Dalke brings up when he criticises TDD. The iterative, incremental red-green-refactor rhythm can lull you into a false sense of security, and make you forget to stop, look at the big picture, and analyze whether the tests you have are sufficient. You don’t get reminded to add tests that should pass straight away, but that might be needed if you refactor the code.

So, in any case, I figured I needed to practice my refactoring skills. I’ve created comprehensive tests and three different “defactored” solutions to this kata, in Java and Python. You can get the starting code here. You can use this to practice refactoring with a full safety net, or, if you’re feeling brave, without one. Try commenting out a good percentage of the tests, and do some major refactoring. When you bring all the tests back, will they still all pass?

I’m planning to try this exercise with my local python user group, GothPy, in a few weeks time. I think it’s going to be fun!

* Tennis Kata: write a program that, if you tell it how many points each player has won in a single game of tennis, will tell you the score.