Posts tagged ‘clean code’

Fictitious people that might get real – and sue you!

Finding realistic data for testing is often a headache, and a good strategy is often to fabricate it. But what if your randomly generated data turns out to belong to a real person? What if they complain and you get fined 4% of global turnover?!

Please note: This story was originally posted on Praqma’s site in October 2017.

GDPR

I am of course referring to the new GDPR regulations, which come into force next year. Their aim is to prevent corporations from misusing our personal data, which generally seems like a good thing. Many companies use copies of production databases for testing new versions of their software. Sometimes that means testers and developers get access to a lot of data that they probably shouldn't be able to see. Sometimes it means you're testing with old, out-of-date versions of people's information, or even testing with customers who cancelled their contracts with you long ago.

Generally I’m in favour of tightening up the rules around this, since I think it will better protect people’s privacy. I think it will also force companies to improve their testing practices. In my experience it’s easier to get reliable, repeatable automated tests if each test case is responsible for creating all the test data it uses.

If you instead write a lot of tests assuming what data is in the database from the start, maintaining that data can get really expensive. I’ve seen this happen first hand – at one place I worked, maintaining test data became such a big job that a whole test suite had to be thrown away along with the data!

Data-Driven Testing

At my last job, part of the test strategy involved data-driven testing. In my test code I'd access an internal API to create fictitious companies in the system. Then, simply by varying the exact configuration of these companies, I could exercise many different features of the system under test. Every test case created different, unique companies. This meant you could run each case as many times as you liked. You could also run lots of tests together in parallel, since they all worked with different data. Each time you ran the tests, you had the opportunity to start over with a fresh, almost empty database – a small memory footprint that meant a fast system and speedy test results.
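To illustrate the idea, here's a sketch in Python, (the names here are hypothetical – the real internal API was of course specific to that system). The point is simply that each test case manufactures its own unique data:

    import uuid

    def create_test_company(api, **config):
        """Create a uniquely-named fictitious company for one test case."""
        name = f"TestCo-{uuid.uuid4()}"  # unique every run, so parallel tests never collide
        return api.create_company(name=name, **config)  # 'create_company' is hypothetical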

Fictitious People

When I came to my current client I was planning to use a similar approach. The difference here is that instead of fictitious companies, I needed to create fictitious people. So, I got out my random number generator and started creating fictitious names, addresses, and, of course, Swedish Personal Numbers. That’s when things started getting tricky… Let me introduce those of you who aren’t Swedish to the Personal Number.

It’s a bit like a social security number, unique for each person, and the government and other agencies use it as a kind of primary key when storing your data. If you know someone’s number, (and you know where to look), you can find out their official residence, how much tax they paid, credit rating, that sort of thing.

You’ll understand why, then, we started getting worried about GDPR at this point. A Swedish personal number is definitely a ‘personally identifiable’ piece of data, and GDPR has a lot to say about those.

A primary key containing your age

The other thing about the personal number is that it's not just a number. When they first came up with the idea, in the 1940s, they thought it would be a great idea to encode some information in this number. (I'm not sure database indexing theory and primary keys were very well understood then!) So, they decided that the first part should be your birthdate, followed by three digits that originally specified where in Sweden you were born, (the last of them also indicating your gender), and a final figure that is a checksum, so numbers can be validated.

These days, lots of Swedish people are born outside Sweden, and not everyone has a gender that matches the one they had at birth, so they had to relax some of these rules. Sometimes too many children are born on one day and they run out of numbers for it, in which case you might get assigned the day before or after. However, the number still always includes your birth year, and hence your age.
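To make the age encoding concrete, here's a rough sketch, (my own illustration, ignoring complications like the '+' separator used for people over 100), of reading the birth year out of a ten-digit number on the form YYMMDDNNNC:

    from datetime import date

    def birth_year(personal_number: str) -> int:
        """Infer the birth year from a ten-digit personal number (YYMMDDNNNC)."""
        yy = int(personal_number[:2])
        this_yy = date.today().year % 100
        # Assume nobody in the data is born in the future.
        return (1900 if yy > this_yy else 2000) + yy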

Never ask a woman her age, (just get her personal number)

These days it’s quite common for retail chains to ask you to join a ‘customer club’ so you can accrue points on your purchases. When you buy something the assistant will often ask for your club membership number. Almost always the membership number is the same as your personal number, so you end up telling them (and everyone in the queue behind you) exactly how old you are. It’s disconcerting, if not rude!

Fictitious Test People get real and sue you

Back to my test data problem. I'm generating fictitious people, but the system under test both validates the checksum and uses the personal number to determine their age, so I have to generate realistic numbers from the last hundred years or so. This is where things start to get tricky. Someone in the legal department objected: he told me that if I were to generate a number belonging to a real person, we could really get into trouble because of GDPR. If that person discovered we were using their personal number, and had changed their name to "Test Testsson" living at "Test gata 1, 234 56 Storstad", that might not be too bad. Unfortunately, if we were using them for a tricky scenario, they might also discover our test database linked their personal number to a dreadful credit rating and a history of insolvency. At that point they might start to get insulted. In the worst case, they could sue us, and GDPR would mean we'd have to pay them an awful lot of money!

Keeping fictitious test people imaginary

However unlikely that scenario might be, my client is understandably unwilling to take the risk, and I really need to restrict the test system to genuinely fictitious personal numbers. The authorities do provide a list of these exactly for testing purposes, but the trouble is there are only about 1000 numbers on the official list. With the kind of automated testing I'm planning to do, that is not enough. Not nearly enough! Assuming my system has about 500 features, I want to write 5 scenarios per feature, and I have 25 developers running all the tests 25 times a day… I need something like half a million unique numbers. Fewer if I can wipe the db more often than once a day, but still. A thousand doesn't go very far.

So, here’s a startup idea for you – I’m pretty sure my client isn’t the only company out there who would pay good money for a list of a million personal numbers, for people aged less than 100, that are guaranteed not to exist in reality. If you have such a list, and would be willing to maintain it for me such that it remains guaranteed fictitious, please get in touch!

Playing with time

Right, so I’m back on generating personal numbers again, not having found that list yet. I have a couple of ideas about how to keep them fictitious though. The first idea is to use really old numbers. While I’m testing the system, I can set the clock back to 1900, and only use generated numbers from 1800-1890. Since everyone over 125 can be assumed to have passed on, they are unable to sue us. For some scenarios I need under 18 year olds, and by changing the date that the system clock thinks is ‘today’, I can generate children born in the 1870s. Fictitious children complete with bank accounts, mobile phone contracts, student debt, acne… and everything will work just fine!
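A sketch of how that generation might look, (the names and helpers are mine, for illustration):

    import random
    from datetime import date, timedelta

    def fictitious_birthdate(start=date(1800, 1, 1), end=date(1890, 12, 31)) -> date:
        """Pick a random birthdate from a range safely beyond any living person's."""
        return start + timedelta(days=random.randint(0, (end - start).days))

    def age_on(birthdate: date, pretend_today: date) -> int:
        """With the system clock set back to 'pretend_today', ages come out plausible."""
        return (pretend_today - birthdate).days // 365

With pretend_today set to 1900, a 'child' born in 1885 is 15 years old as far as the system is concerned.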

Hack the checksum

The other idea I had was to use personal numbers with invalid checksums. After the date of birth and the three random numbers, the last figure in the personal number is calculated from the others, using the Luhn algorithm. While I’m testing the system, I can modify the ‘PersonalNumber.validateChecksum()’ function to instead accept all invalid numbers, and reject all valid ones. Then all my tests can generate perfectly invalid personal numbers, and the test database will get filled with people that definitely can’t exist in reality.
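For illustration, here's a sketch, (mine, not the client's code), of the Luhn checksum and a generator for deliberately invalid numbers:

    import random

    def luhn_check_digit(nine_digits: str) -> int:
        """Luhn check digit for the first nine digits (YYMMDDNNN)."""
        total = 0
        for i, ch in enumerate(nine_digits):
            d = int(ch) * (2 if i % 2 == 0 else 1)  # double every other digit, from the left
            total += d - 9 if d > 9 else d          # i.e. sum the digits of each product
        return (10 - total % 10) % 10

    def invalid_personal_number() -> str:
        """A well-formed personal number whose final figure is guaranteed wrong."""
        birth = f"{random.randint(0, 99):02d}{random.randint(1, 12):02d}{random.randint(1, 28):02d}"
        serial = f"{random.randint(0, 999):03d}"
        wrong = (luhn_check_digit(birth + serial) + random.randint(1, 9)) % 10
        return birth + serial + str(wrong)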

As with any well-designed system, there is only one place in the code where I calculate the checksum, so it's actually a really small change just for testing. It also comes with insurance – if I ever accidentally deploy this code change in production, absolutely everything will stop working, since all those valid numbers already in the production database will suddenly get rejected. I think we'd notice pretty quickly if that happened!

Is there a better way?

I haven’t actually implemented any of these strategies yet – we’re still at the planning stage. I’d be very interested to hear if anyone else has solved this problem in a better way. In the meantime, I’ve written some code that can generate personal numbers with and without valid checksums, and also lets you specify an age range – it’s available on my github page.

We can’t be the only people worried about GDPR and personal numbers. That threat of a fine of 4% of global turnover is a pretty mighty incentive to focus some effort on cleaning up test data.

I've been working on this Kata "Gilded Rose" at a few different coding dojos lately. There is even a video of a session I did at the "Tampere Goes Agile" conference recently. In the video, you can see me talking about my Principles of Agile Test Automation, which I recently wrote about and updated in my last blog post.

I think these test automation principles are useful to think about when you’re doing the Gilded Rose kata. The basic plot of the Kata is that you’ve just been hired to look after an existing system, and the customer wants a new feature. Having a look at the code, you can see you’re going to want to refactor it a little before adding the new feature, and before you do that, you’re going to want some automated tests.

So the first part of the Kata is to add automated tests to the existing code. You’ve got a requirements document the customer has given you, and you can use it to identify test cases. You’ve also got the code which you can read and execute and work out what it does. The customer is happily using the code in production right now, so you can assume that the behaviour it has is the behaviour they want to keep, whatever it says in the requirements document. (hint!)

Warning – spoilers lie ahead! You should probably try the Gilded Rose kata for yourself before reading on!

When I've done this exercise with various groups, I've spent a lot of time discussing with people how to make their test cases really readable and express the requirements clearly, while at the same time keeping them useful as regression protection when refactoring the code later.

When you design a test suite you have two main aims – to help you understand what the code should do, (and what it does now), and to protect you from regression failures when you update it. It can be a bit tricky to do both with the same test suite. If you focus solely on describing the requirements in an executable way, you tend to miss edge cases, and there are gaps in the regression protection. If you focus only on regression protection, you'll spend time analysing the edge cases and measuring code coverage to see how well you're doing, but the test cases can become quite hard to read and understand.

You can see for yourself by comparing this test case by Bobby Johnson with this text-based approval test. (It was written by several people at a GothPy meeting). Bobby’s test case is extremely readable and expresses the requirements clearly. He’s done pretty well on the edge cases, but I think he’s missing one or two*. With the text-based approval tests, it’s not so easy to understand what the underlying business rules are, although the regression protection is very good.
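To give a flavour of the difference, a requirement-style test might look something like this, (a sketch in Python – Bobby's originals are in C#, and I'm assuming the kata's usual Item and GildedRose classes), whereas the approval test simply diffs the printed state of every item over many days against a golden copy:

    def test_aged_brie_increases_in_quality_as_it_ages():
        items = [Item("Aged Brie", sell_in=2, quality=0)]
        GildedRose(items).update_quality()  # the kata's standard entry point
        assert items[0].quality == 1
        assert items[0].sell_in == 1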

When I do this kata with a group, we spend some time discussing the various test cases we’ve come up with, and showing them on the projector. When we did this last week at the Booster Conference, people commented that showing these different test cases had given them a better understanding of “readability” and “regression protection”, and many went on to improve their test suites.

Once you’re reasonably happy with your test suite, the next task is to do the refactoring and add the new feature. How useful are your test cases for regression protection? It’s very easy to make refactoring mistakes in this kata, and you will be testing your tests! You may discover while refactoring that there are more test cases that you want to add. Version control can be pretty useful, so you can run the new test cases against the original code.

There’s also an interesting restriction on your refactoring options – the “Item” class is owned by a nasty-sounding goblin and he doesn’t want you to change his code, so if you do, you have to be prepared for some serious consequences! When comparing refactored solutions at the end of the dojo, this is often an interesting discussion point – did you change the Item class? Is your new design so great that you’re prepared to argue with the goblin for it?!

I haven't tried this, but I would actually like to try running the text-based approval test against all the refactored solutions at the end of the coding dojo, as input to the retrospective. I think this test covers all the edge cases very well, and would reveal any refactoring mistakes that were not caught by the tests people had developed themselves. That would be interesting feedback to have!

If you haven't tried the Gilded Rose kata yourself, I do recommend it for practising writing good test cases. I'd be happy to get a pull request from you if you want to translate the exercise into your favourite programming language, or you can do it in the original C#, as Bobby suggests.

If you’re interested in taking part in a coding dojo with me, I’ll be at several conferences later this year: ACCU in Bristol, XP2013 in Vienna and Test Automation Day in the Netherlands.

* I believe he’s missing a check that the quality of backstage passes doesn’t increase past 50

This Code Kata is included in my new book “The Coding Dojo Handbook”, currently published as a work-in-progress on LeanPub.com. You can also download starting code and these instructions from my github page.

As a Health Insurer,
I want to be able to search for patients who have a medicine clash,
So that I can alert their doctors and get their prescriptions changed.

Health Insurance companies don't always get such good press, but in this case, they actually do have your best interests at heart. Some medicines interact in unfortunate ways when they get into your body at the same time, and your doctor isn't always alert enough to spot the clash when writing your prescriptions. Sometimes, medicine interactions are only identified years after the medicines become widely used, and your doctor might not be completely up to date. Your Health Insurer certainly wants you to stay healthy, so discovering a customer has a medicine clash and getting it corrected is good for business, and good for you!

For this Kata, you have a recently discovered medicine clash, and you want to look through a database of patient medicine and prescription records, to find if any need to be alerted to the problem. Create a “Patient” class, with a method “Clash” that takes as arguments a list of medicines, and how many days before today to consider, (defaults to the last 90 days). It should return a collection of days on which all the medicines were being taken during this time.

If you like, you can also create a visualization of the clash, something like this:

[Image: a visualization of the days on which the clashing medicines overlap]

Data Format

You can assume the data is in a database, which is accessed in the code via an object oriented domain model. The domain model is large and complex, but for this problem you can ignore all but the following entities and attributes:

[Image: domain model diagram showing Patient, Medicine and Prescription]

In words, this shows that each Patient has a list of Medicines. Each Medicine has a list of Prescriptions. Each Prescription has a dispense date and a number of days supply.

You can assume:

  • Patients start taking the medicine on the dispense date.
  • The “days supply” tells you how many days they continue to take the medicine after the dispense date.
  • If they have two overlapping prescriptions for the same medicine, they stop taking the earlier one. Imagine they have mislaid the medicine they got from the first prescription when they start on the second prescription.
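For a concrete starting point, here's a minimal sketch of that domain model and the clash calculation in Python, under the assumptions above, (deliberately simple, and not the sample solution mentioned below):

    from dataclasses import dataclass, field
    from datetime import date, timedelta

    @dataclass
    class Prescription:
        dispense_date: date
        days_supply: int

    @dataclass
    class Medicine:
        name: str
        prescriptions: list = field(default_factory=list)

        def days_taken(self) -> set:
            """All dates this medicine was taken, truncating overlapped prescriptions."""
            days = set()
            ordered = sorted(self.prescriptions, key=lambda p: p.dispense_date)
            for rx, following in zip(ordered, ordered[1:] + [None]):
                # Taken on the dispense date and for days_supply days after it...
                end = rx.dispense_date + timedelta(days=rx.days_supply + 1)
                # ...unless a later prescription starts first (the earlier one is mislaid).
                if following and following.dispense_date < end:
                    end = following.dispense_date
                day = rx.dispense_date
                while day < end:
                    days.add(day)
                    day += timedelta(days=1)
            return days

    @dataclass
    class Patient:
        medicines: list = field(default_factory=list)

        def clash(self, medicine_names, days_back=90) -> set:
            """Days in the window on which all the named medicines were taken."""
            window = {date.today() - timedelta(days=d) for d in range(days_back)}
            by_name = {m.name: m for m in self.medicines}
            if not all(name in by_name for name in medicine_names):
                return set()  # a medicine the patient never took cannot clash
            days = window
            for name in medicine_names:
                days &= by_name[name].days_taken()
            return days

Called as patient.clash(["fluoxetine", "codeine"]), it returns the set of clash days, which you could then feed into a visualization.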

When you’ve tried the Kata for yourself

Then you might be interested in reviewing the sample solution I’ve put up on my github page. I find this code interesting because it is seemingly well written. The methods are short with thought-through names, and there are lots of unit tests. I also find it very difficult to follow. What do you think?

The biology of medicine clashes*

When you take a pill of medicine, the active substance will be absorbed through the lining of the gut, and enter your bloodstream. That means it will be taken all over your body, and can do its work. For example, if you take a headache pill, the active substance in the drug will be taken by your blood to where it can block your pain receptors. At the same time, there are enzymes at work in your liver, which break down medicinal substances they find in your bloodstream. Eventually all the medicine will be removed, so you have to take another pill if you want the effects to continue.

In the liver, there are several different enzymes working, and they are specialized in breaking down different substances. For example, the “CYP 2C9” enzyme will break down ibuprofen, the active ingredient in many headache pills. The trouble is, there are other medicines which will stop particular enzymes from doing their work, which can lead to an overdose or other ill effects.

One example is the clash between fluoxetine and codeine. Fluoxetine is known by its trade name “Prozac”, and is often taken for depression. Codeine is another ingredient used in headache pills, and is actually a “pro-drug”, so it works slightly differently. Codeine needs to be broken down in the liver by the enzyme “CYP 2D6” into the active substance, morphine, before it will do anything. Fluoxetine has the effect of blocking “CYP 2D6”, so if you take the two medicines together, you won’t get much painkilling effect from the codeine. That could be depressing!

The solution to the problem is to take a different painkiller – one that's not affected by that liver enzyme. Simply switch codeine for ibuprofen, and you should be a little happier.

* With thanks to Sara Sjöberg for helping me with this section

A little while back I was back in Stockholm facilitating another Code Retreat, (see previous post), this time as part of Global Day of Code Retreat, (GDCR). (Take a look at Corey Haines’ site for loads of information about this global event that comprised meetings in over 90 cities worldwide, with over 2000 developers attending.)

I think the coding community could really do with more people who know how to do TDD and think about good design, so I’m generally pretty encouraged by the success of this event. I do feel a bit disappointed about some aspects of the day though, and this post is my attempt to outline what I think could be improved.

I spent most of the Global Day of Code Retreat walking around looking over people’s shoulders and giving them feedback on the state of their code and tests. I noticed that very few pairs got anywhere close to solving the problem before they deleted the code at the end of each 45 minute session. Corey says that this is very much by design. You know you don’t have time to solve the problem, so you can de-stress: no-one will demand delivery of anything. You can concentrate on just writing good tests and code.

In between coding sessions, I spent quite a lot of effort reminding people about simple design, SOLID principles, and how to do TDD (in terms of states and moves). Unfortunately I found people rarely wrote enough code for any of these ideas to really be applicable, and TDD wasn't always helping people.

I saw a lot of solutions that started with the “Cell” class, that had an x and y coordinate, and a boolean to say whether it was alive or not. Then people tended to add a “void tick(int liveNeighbourCount)” method, and start to implement code that would follow the four rules of Conway’s Game of Life to work out if the cell should flip the state of its boolean or not. At some point they would create a “World” class that would hold all the cells, and “tick()” them. Or some variant of that. Then, (generally when the 45 minutes were running out), people started trying to find a data structure to hold the cells in, and some way to find out which were neighbours with each other. Not many questioned whether they actually needed a Cell class in the first place.

Everyone at the code retreat deleted their code afterwards, of course, but you can see an example of a variant on this kind of design by Ryan Bigg here (a work in progress, he has a screencast too).

Of course, as facilitator I spent a fair amount of time trying to ask the kind of questions that would push people to reevaluate their approach. I had partial success, I guess. Overall though, I came away feeling a bit disillusioned, and wanting to improve my facilitation so that people would learn more about using TDD to actually solve problems.

At the final retrospective of the day, everyone seemed to be very positive about the event, and most people said they learnt more about pair programming, TDD, and the language and tools they were working with. We all had fun. (If you read Swedish, Peter Lind wrote up the retrospective on Valtech's blog.) This is great, but could we tweak the format to encourage even more learning?

I think to solve Conway’s Game of Life adequately, you need to find a good datastructure to represent the cells, and an efficient algorithm to “tick” to the next generation. Just having well named classes and methods, although a good idea, probably won’t be enough.
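One well-known combination, (a sketch – not necessarily the solution I favour, but enough to show the shape of a real answer), is to store only the live cells as a set of coordinates, which handles the infinite grid for free:

    from collections import Counter

    def tick(live_cells: set) -> set:
        """Advance Conway's Game of Life by one generation."""
        # Count the live neighbours of every cell adjacent to a live cell.
        neighbour_counts = Counter(
            (x + dx, y + dy)
            for (x, y) in live_cells
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        # A cell lives next generation with exactly 3 neighbours,
        # or with 2 neighbours if it was already alive.
        return {cell for cell, n in neighbour_counts.items()
                if n == 3 or (n == 2 and cell in live_cells)}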

For me, TDD is a good method for designing code when you already mostly know what you’re doing. I don’t think it’s a good method for discovering algorithms. I’m reminded of the debate a few years back when Peter Norvig posted an excellent Sudoku solver (here) and people thought it was much better than Ron Jeffries’ TDD solution to the same problem (here). Peter was later interviewed about whether this proved that TDD was not useful. Dr Norvig said he didn’t think that it proved much about TDD at all, since

“you can test all you want and if you don’t know how to approach the problem, you’re not going to get a solution”
(from Peter Siebel’s book “Coders At Work”, an extract of which is available in his blog)

I felt that most of the coders at the code retreat were messing around without actually knowing how to solve the problem. They played with syntax and names and styles of tests without ever writing enough code to tell whether their tests were driving a design that would ultimately solve the problem.

Following the GDCR, I held a coding dojo where I experimented with the format a little. I only had time to get the participants to do two 45-minute coding sessions on the Game of Life problem. At the start, as a group we discussed the Game of Life problem, much as Corey recommends for a Code Retreat introduction. However, in addition, I explained my favoured approach to a solution – the data structure and algorithm I like to use. I immediately saw that they started their TDD sessions with tests and classes that could lead somewhere. I feel that if they had continued coding for longer, they would have ended up with a decent solution. Hopefully it would also have been possible to refactor it from one datastructure to another, and to switch elements of the algorithm without breaking most of the tests.

I think this is one way to improve the code retreat: just give people more clues at the outset as to how to tackle the problem. In real life when you sit down to do TDD you'll often already know how to solve similar problems. This would be a way for the facilitator to give everyone a leg-up if they haven't worked on any rule-based games involving a 2D infinite grid before.

Rather than just explaining a good solution up front, we could spend the first session doing a “Spike Solution”. This is one of the practices of XP:

“the idea is just to drive through the entire problem in one blow, not to craft the perfect solution first time out.”
From “Extreme Programming Installed” by Jeffries, Anderson, Hendrickson, p41

Basically, you hack around without writing any tests or polishing your design, until you understand the problem well enough to know how you plan to solve it. Then you throw your spike code away, and start over with TDD.

Spending the first 45 minute session spiking would enable people to learn more about the problem space in a shorter amount of time than you generally do with TDD. By dropping the requirement to write tests or good names or avoid code smells, you could hopefully hack together more alternatives, and maybe find out for yourself what would make a good data structure for easily enumerating neighbouring cells.

So the next time I run a code retreat, I think I’ll start with a “spiking” session and encourage people to just optimize for learning about the problem, not beautiful code. Then after that maybe I’ll sketch out my favourite algorithm, in case they didn’t come up with anything by themselves that they want to pursue. Then in the second session we could start with TDD in earnest.

I always say that the code you end up with from doing a Kata is not half as interesting as the route you took to get there. To this end I’ve published a screencast of myself doing the Game of Life Kata in Python. (The code I end up with is also published, here). I’m hoping that people might want to prepare for a Code Retreat by watching it. It could be an approach they try to emulate, improve on or criticise. On the other hand, showing my best solution like this is probably breaking the neutrality of how a code retreat facilitator is supposed to behave. I don’t think Corey has ever published a solution to this kata, and there’s probably a reason for that.

I have to confess, I already broke all the rules of being a facilitator, when I spent the last session of the Global Day of Code Retreat actually coding. There was someone there who really wanted to do Python, and no-one else seemed to want to pair with him. So I succumbed. In 45 minutes we very nearly had a working solution, mostly because I had hacked around and practised how to do it several times before. It felt good. I think more people should experience the satisfaction of actually completing a useful piece of code using TDD at a Code Retreat.

Programmers have a vested interest in making sure the software they create does what they think it does. When I'm coding I prefer to work in the context of feedback from automated tests that help me to keep track of what works and how far I've got. I've written before about Test Driven Development, (TDD). In this article I'd like to explain some of the main features of Text-Based Testing. It's a variant on TDD, perhaps more suited to the functional level than unit tests, and one I've found powerful and productive to use.


The basic idea
You get your program to produce a plain text file that documents all the important things that it does. A log, if you will. You run the program and store this text as a “golden copy” of the output. You create from this a Text-Based Test with a descriptive name, any inputs you gave to the program, and the golden copy of the textual output.

You make some changes to your program, and you run it again, gathering the new text produced. You compare the text with the golden copy, and if they are identical, the test passes. If there is a difference, the test fails. If you look at the diff and you like the new text better than the old text, you update your golden copy, and the test is passing once again.
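The whole mechanism fits in a few lines of Python, (a bare-bones sketch of the idea, not TextTest's actual interface):

    import subprocess
    from pathlib import Path

    def run_text_based_test(name: str, command: list) -> None:
        """Run the program and compare its output with the stored golden copy."""
        golden = Path("golden") / f"{name}.txt"
        actual = subprocess.run(command, capture_output=True, text=True).stdout
        if not golden.exists():
            golden.parent.mkdir(exist_ok=True)
            golden.write_text(actual)  # first run: record the golden copy
            print(f"{name}: golden copy saved")
        elif actual == golden.read_text():
            print(f"{name}: PASS")
        else:
            Path(f"{name}.failed.txt").write_text(actual)
            print(f"{name}: FAIL - diff against the golden copy; save if the new text is better")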

Tool Support
Text-Based Testing is a simple idea, and in fact many people do it already in their unit tests. AssertEquals(String expected, String actual) is actually a form of it. You often create the “expected” string based on the actual output of the program, (although purists will write the whole assert before they execute the test).

Most unit test tools these days give you a nice diff even on multi-line strings. For example:

[Images: a failing text-based test in JUnit, showing the multi-line string diff]

Once your strings get very long, to the scale of whole log files, even multi-line diffs aren’t really enough. You get datestamps, process ids and other stuff that changes every run, hashmaps with indeterminate order, etc. It gets tedious to deal with all this on a test-by-test basis.
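The usual remedy is to filter the volatile parts out of both texts before comparing. Something along these lines, (the patterns here are only examples):

    import re

    # Example patterns for run-dependent text, neutralized before the diff.
    FILTERS = [
        (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "<TIMESTAMP>"),
        (re.compile(r"process id \d+"), "process id <PID>"),
    ]

    def normalize(log_text: str) -> str:
        for pattern, replacement in FILTERS:
            log_text = pattern.sub(replacement, log_text)
        return log_text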

My husband, Geoff Bache, has created a tool called “TextTest” to support Text-Based testing. Amongst other things, it helps you organize and run your text-based tests, and filter the text before you compare it. It’s free, open source, and of course used to test itself. (Eats own dog food!) TextTest is used extensively within Jeppesen Systems, (Geoff works for them, and they support development), and I’ve used it too on various projects in other organizations.

In the rest of this article I’ll look at some of the main implications of using a Text-Based Testing approach, and some of my experiences.

Little code per test
The biggest advantage of the approach is that you tend to write very little unique code for each test. You generally access the application through a public interface as a user would, often a command-line interface or (web) service call. You then create many tests by, for example, varying the command-line options or request contents. This reduces test maintenance work, since you have less test code to worry about, and the public API of your program should change relatively infrequently.

Legacy code
Text-Based Testing is obviously a regression testing technique. You’re checking the code still does what it did before, by checking the log is the same. So these tests are perfect for refactoring. As you move around the code, the log statements move too, and your tests stay green, (so long as you don’t make any mistakes!) In most systems, it’s cheap and risk-free to add log statements, no matter how horribly gnarly the design is. So text-based testing is an easy way to get some initial tests in place to lean on while refactoring. I’ve used it this way fairly successfully to get legacy code under control, particularly if the code already produces a meaningful log or textual output.


No help with your design
I just told you how good Text-Based Testing is with Legacy code. But actually these tests give you very little help with the internal design of your program. With normal TDD, the activity of creating unit tests at least forces you to decompose your design into units, and if you do it well, you’ll find these tests giving you all sorts of feedback about your design. Text-Based tests don’t. Log statements don’t care if they’re in the middle of a long horrible method or if they’re spread around several smaller ones. So you have to get feedback on your design some other way.

I usually work with TDD at the unit level in combination with Text-Based tests at the functional level. I think it gives me the best of both worlds.

Log statements and readability
Some people complain that log statements reduce the readability of their code and don’t like to add any at all. They seem to be out of fashion, just like comments. The idea is that all the important ideas should be expressed in the class and method names, and logs and comments just clutter up the important stuff. I agree to an extent, you can definitely over-use logs and comments. I think a few well placed ones can make all the difference though. For Text-Based Testing purposes, you don’t want a log that is megabytes and megabytes of junk, listing every time you enter and leave every method, and the values of every variable. That’s going to seriously hinder your refactoring, apart from being a nightmare to store and update.

What we’re talking about here is targeted log statements at the points when something important happens, that we want to make sure should continue happening. You can think about it like the asserts you make in unit tests. You don’t assert everything, just what’s important. In my experience less than two percent of the lines of code end up being log statements, and if anything, they increase readability.
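For example, (an invented domain, just to show the granularity), one stable line per significant event is plenty:

    import logging

    log = logging.getLogger("planner")

    def assign_vehicle(plan, route, vehicle):
        plan.assign(route, vehicle)  # the actual work (hypothetical domain call)
        # One targeted, deterministic line - the event the tests should pin down.
        # Like an assert: no timestamps, memory addresses or variable dumps.
        log.info("assigned vehicle %s to route %s", vehicle.id, route.name)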

Text-Based tests are completed after the code
In normal TDD you write the test first, and thereby set up a mini pull system for the functionality you need. It's lean, it forces you to focus on the problem you're trying to solve before you solve it, and it starts giving you feedback before you commit to an implementation. With Text-Based Testing, you often find it's too much work to specify the log up front. It's much easier to wait until you've implemented the feature, run the test, and save the log afterwards.

So your tests usually aren't completed until after the code they test, unlike in normal TDD. Having said that, I would argue that you can still do a form of TDD with Text-Based Tests. I'd normally create half the test before the code: I name the test, and find suitable inputs that should provoke the behaviour I need to implement in the system. The test will fail the first time I run it. In this way I think I get many of the benefits of TDD, but only actually pin down the exact assertion once the functionality is working.

“Expert Reads Output” Antipattern
If you’re relying on a diff in the logs to tell you when your program is broken, you had better have good logs! But who decides what to log? Who checks the “golden copy”? Usually it is the person creating the test, who should look through the log and check everything is in order the first time. Of course, after a test is created, every time it fails you have to make a decision whether to update the golden copy of the log. You might make a mistake. There’s a well known antipattern called “Expert Reads Output” which basically says that you shouldn’t rely on having someone check the results of your tests by eye.

This is actually a problem with any automated testing approach – someone has to make a judgement about what to do when a test fails – whether the test is wrong or there’s a bug in the application. With Text-Based Testing you might have a larger quantity of text to read through compared with other approaches, or maybe not. If you have human-readable, concise, targeted log statements and good tools for working with them, it goes a long way. You need a good diff tool, version control, and some way of grouping similar changes. It’s also useful to have some sanity checks. For example TextTest can easily search for regular expressions in the log and warn you if you try to save a golden copy containing a stack trace for example.

In my experience, you do need to update the golden copy quite often. I think this is one of the key skills with a Text-Based Testing approach. You have to learn to write good logs, and to be disciplined about either doing refactoring or adding functionality, not both at the same time. If you’re refactoring and the logs change, you need to be able to quickly recognize if it’s ok, or if you made a mistake. Similarly, if you add new functionality and no logs change, that could be a problem.

Agile Tests Manage Behaviour
When you create a unit test, you end with an Assert statement. This is supposed to be some kind of universal truth that should always be valid, or else there is a big problem. Particularly for functional level tests, it can be hard to find these kinds of invariants. What is correct today might be updated next week when the market moves or the product owner changes their mind. With Text-Based Testing you have an opportunity to quickly and easily update the golden copy every time the test “fails”. This makes your tests much more about keeping control of what your app does over time, and less about rewriting assert statements.

Text-Based Testing grew up in the domain of optimizing logistics planning. In this domain there is no “correct” answer you can predict in advance and assert. Planning problems that are interesting to solve are far too complex for a complete mathematical analysis, and the code relies on heuristics and fancy algorithms to come up with better and better solutions. So Text-Based Testing makes it easy to spot when the test produces a different plan from before, and use it as the new baseline if it’s an improvement.

I think generally it leads to more “agile” tests. They can easily respond to changes in the business requirements.

Conclusions
There is undoubtedly a lot more to be said about Text-Based Testing. I havn’t mentioned text-based mocking, data-driven vs workflow testing, or how to handle databases and GUIs – all relevant topics. I hope this article has given you a flavour of how it’s different from ordinary TDD, though. I’ve found that good tool support is pretty essential to making Text-Based Testing work well, and that it’s a particularly good technique for handling legacy code, although not exclusively. I like the approach because it minimizes the amount of code per test, and makes it easy to keep the tests in sync with the current behaviour of the system.