Posts tagged ‘agile’

Note: this article was first published on Praqma’s website

Experiences Pairing with Llewellyn Falco

How does a Technical Agile Coach improve work in a development team? When Llewellyn Falco asked me to pair with him at a client I jumped at the chance to see how effective mob programming is for introducing technical agile practices.

Day one: head first into the mob

My first day as a visiting Technical Agile Coach begins with coding, in a mob. All the developers in the team and I enter our names into a mob timer. It will prompt us to switch roles every 5 minutes, so we will all take a turn at the keyboard. Llewellyn takes the facilitator role, sitting at the back. I find mob programming is a great way to get to know a team and their codebase. After only a few minutes, I take the Driver role, which forces me to pick up very quickly what’s going on. I need to understand the current task, the IDE, and the particular code we’re working on. The Navigator prompts me, and the whole team helps me to find where to click and what to type.

Llewellyn has worked with this client on and off for half a year, and he was here all the previous week. This team has quite a bit of experience mob programming with him and a couple of other visiting coaches. Llewellyn gets involved facilitating the mob from time to time, usually to draw our attention to some improvement we could make in the code, or in the way we’re working together. When the team is stuck, or going in the wrong direction, he will step in and take the Navigator role for a while. That can happen when we’re doing a tricky refactoring, or if he spots we aren’t taking full advantage of our tools.

After 90 minutes the team has to go to their stand-up meeting. Usually we’d mob for about 2 hours with a team, but today we have a longer breathing space before the Learning Hour. Llewellyn and I take the chance to discuss the challenges this team is facing and how we can coach them more effectively when we meet them again tomorrow. Later this week he’s going to get me to take the Facilitator role and he’ll be in the mob instead. Next week he’ll fly home and leave me here for a week coaching by myself, so we’re already preparing for that handover.


The Learning Hour

The Learning Hour is a fixture in the calendar of everyone in the department. It’s one hour devoted to learning new techniques in software development, every day, led by Llewellyn or a visiting coach like me. Not everyone can make it every day, so the planned topics are circulated in advance with an indication of whether it’s a coding session or not. When it’s a coding session, fewer managers, scrum masters and product owners turn up, although some of them do enjoy attending the more technical sessions to get a better insight into the challenges their developers are facing.

The lesson I’ve decided to begin with uses the Tennis Refactoring Kata, an exercise I’ve done many times with teams. I am hoping it will be fun and not too difficult, especially compared with the production code they are used to. The exercise has comprehensive unit tests that quickly fail if you make a mistake. The developers aren’t used to having that, and they soon find they like having fast feedback on the accuracy of their work.

Lunch dating

Lunch is next up, and it turns out I’m not eating with Llewellyn. He’s set me up to go out with one of the developers from a team I won’t otherwise be working with. It’s a very deliberate policy of Llewellyn’s, to help me get to know the wider organization outside of the teams I mob with. He’s found that daily one-on-one chats with key people, preferably in a social setting, are an effective way of becoming well-connected in an organization. Today it also gives me an opportunity to offer some career advice to an ambitious developer in a similar position to where I was ten or so years ago.

After lunch Llewellyn introduces me to a second team that I’ll be mob programming with, then leaves me to it. He’s confident this team is working well together and it will be straightforward for me to facilitate without him. We get stuck straight into some front-end development work, improving a new account creation form. My JavaScript is a little rusty, but I find that’s not really a problem – they know their tools. I just need to keep an eye on how the mob is running, and think a little outside the box. I spot that what they’re doing will likely break some of the automated GUI tests, so we have a chat about how to handle that.

In the meantime, Llewellyn is working with a different team who are new to him. He begins teaching them the basics of mob programming, and getting to know their specific challenges. A future visiting coach might get to work with them once they’re up to speed.

Managing up and down

At the end of the session, we take a short break together and discuss how things are going. We have a little slack in our schedule, and Llewellyn spots one of the senior managers having a coffee. He takes the chance to greet him and book a short meeting the following day. Llewellyn’s heard there are plans afoot to break apart some of the teams, and he wants to ask the boss to protect a particular team from that reorganization because they are really starting to gel and mob well together.

For the third mobbing session of the day Llewellyn and I are pair-coaching again. It’s similar to the first session. All the teams we’re working with are really struggling with code quality and a lack of automated tests. Even when we’re ostensibly working on adding a feature, most of the time goes on addressing code smells and adding unit tests.

The last thing Llewellyn and I do before we leave for the day is send a very short email to the department managers – the people authorizing our invoices. We write one sentence about each mobbing session and the learning hour, summarizing what we’ve done. It makes our work more visible to the decision makers.

Moving in the right direction

I’m impressed with how much Llewellyn and the other visiting coaches have already achieved with these developers. Most people have a positive, curious outlook, and consider learning new skills to be a normal part of work. Llewellyn will often pause the mob timer, pull out his laptop, and spend five minutes showing some relevant presentation slides. Some of the developers here know certain refactorings so well that I find myself learning techniques from the teams, not just from Llewellyn.


Optimising for when we’re not there

That’s not to say there aren’t problems. The codebase is still large, badly structured, slow to build, and lacking automated tests in many areas. Many developers in the mobbing teams are relatively new to the company, and most have little previous experience of Test-Driven Development or refactoring techniques. However, their starting point isn’t what’s important; the crucial thing is that they are continuing to improve and learn. What we’re doing here is creating momentum in the right direction, teaching skills and strategies so people can continue to make things better when we’re no longer present.


The wider organisation is also just beginning to adopt Agile practices and DevOps, and it’s not working smoothly yet. Many teams are still sitting in cubicles. The test, build and deployment infrastructure relies too much on manually executed steps.

There are several other agile coaches here at the same time as Llewellyn and me who are more focussed on improving process and product management. Later in the week I attend a sprint demo where there is much talk of A/B testing and hypothesis-driven development. People seem really keen to understand their customer needs and to verify they are building the right thing. The thing is, the starting point is less important than the direction of travel.

After a week of pair-coaching I feel confident I can pick up and continue Llewellyn’s work with the development teams at this client. I’ve got to know the people, the particular challenges they face, and have a structure in place that will let me continue the changes Llewellyn is initiating. I’m actually quite surprised how smoothly the handover has gone.

Visiting Technical Agile Coach

Not many coaches are generous enough to invite visitors to pair with them. Llewellyn has a whole list of people he’s inviting to visit in 2018, and I feel lucky to be one of them. It seems to me that everyone’s a winner. Of course Llewellyn wants to show off and spread his coaching methods to the visitors, but also to learn from them and get their feedback on his work. At the same time the client gets the benefit of teaching and advice from many different visitors. They do have to pay two coaches rather than one during the week we overlap, but such a smooth handover lets them get more weeks of coaching than Llewellyn could provide by himself. So far the client seems to feel it’s worth the additional expense.

Taking what I’ve learnt home with me

I plan to start recommending this style of Technical Agile Coaching to my clients back home in Sweden. For a start I really enjoy coding, mobbing and teaching all day, but more importantly it seems to be more effective than what I’ve been doing until now. For years I’ve used Coding Dojos to teach the theory and practice of TDD, but all too often the initiative fizzles out and the dojos stop happening when I’m no longer there.

I think this combination of daily mob programming sessions with a learning hour is particularly effective at teaching both theory and practice. The lunches, the presence of other agile coaches, and the daily summary emails connect me better with the wider organization. This has been a really valuable two weeks for me. I feel pair-coaching with Llewellyn has taught me an effective way to introduce technical agile practices and change developer behaviour for the better.


This post was originally published on Praqma’s blog

A short story about Pre-tested Integration


Continuous Integration and Code Review are strongly correlated with success. Many use Pull Requests for code review, but for co-located teams this can be an obstacle for CI. Is there a better way?

There are three developers on the (fictitious) team: Annika, Boris and Carol. Annika is a recent hire, fresh from university, Boris is the team lead, and Carol has been around the longest. Each of them is working on a different task. They all synchronized their work with their shared master branch when they arrived at the office today, and now it’s approaching morning coffee time. They have all made some changes in the code which they’d like to share with the rest of the team.

Annika is working on a local branch called ‘red’. She checks it’s up-to-date with master and pushes it to a remote branch named ‘ready/red’. It’s similar for Boris and Carol. They are on blue and orange branches respectively and push their changes to ready/blue and ready/orange.
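In git terms, Annika’s steps might look something like this (a sketch; the remote name ‘origin’ is an assumption):

    # on Annika's machine, on the local branch 'red'
    git fetch origin
    git merge origin/master          # check 'red' is up to date with master
    git push origin red:ready/red    # publish it under the 'ready/' naming convention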


The Build Server is set up so that it detects new branches on the Version Control Server that follow a naming convention. Any branch beginning with ‘ready/’ is scheduled for integration, and only one of these integration builds runs at a time. The Build Server delegates builds to one or more agents, and since the ready-job agent is idle, it picks up the ‘ready/red’ change straight away and leaves the other two ready-branch builds in the queue.


The build job has several steps. First, the agent merges the ready-branch into a local copy of the master branch. Annika’s changes get a simple fast-forward. The agent performs a full build, static analysis, code style check, and unit test. Everything goes well, so the agent pushes the merge result up to the Version Control Server and posts a message on the team message board.
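Sketched as a shell script, the agent’s job might look roughly like this (the repository URL and the build and check commands are placeholders, not the actual setup from the story):

    # on the build agent: start from a fresh copy of master
    git clone --branch master "$REPO_URL" job && cd job
    git merge origin/ready/red       # abort the job if this reports conflicts
    ./build.sh && ./static_analysis.sh && ./style_check.sh && ./unit_tests.sh
    git push origin master           # all checks passed: publish the merge result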

Things start out similarly for Boris’ ready/blue branch. The build agent takes a copy of master from the git server and merges in the ready-branch. This isn’t a fast-forward merge, since there is a new commit in master for the ‘red’ changes, but it’s still ok. So long as the agent can do the merge without finding any conflicts the build can continue.

The agent then proceeds to the next build steps. Unfortunately, Boris hadn’t noticed that one of his changes caused a test failure. The team has previously agreed that the code in master should always pass the tests, so this means Boris’ changes shouldn’t be shared. The build agent sends a message to Boris telling him about the failed tests, discards its merged branch, and moves on. Carol’s ready/orange branch is up next. The build agent starts again with a fresh copy of the latest master taken from the git server. Carol’s changes also merge without difficulty and this time both build and tests pass. The build agent pushes the merge commit to the server and notifies the team.


Boris and Carol are having a cup of coffee while they wait for the build server to integrate their changes. Annika is chatting with the Product Owner about the new feature she plans to work on next, ‘cyan’. When they get back to their desks they see the messages from the build server.

Annika is happy to see her changes integrated successfully. She fetches the latest master from the remote git server. She’s now completed the work on the ‘red’ task, and her changes should undergo a code review. She marks the ‘red’ task as finished in the issue tracker and adds an agenda item to the team’s next scheduled code review meeting, which is later that week. Annika selects a new task to work on and checks out a local branch from master called ‘cyan’.

Boris sees the message about his failed tests and realizes immediately what he missed. He’s a little embarrassed about his mistake, but happy his teammates are not affected. They may not even notice what’s happened. Boris takes the opportunity to merge the latest changes from master into his ‘blue’ branch. He is quickly able to address the problem with the tests and pushes an update to ready/blue. The build agent gets to work straight away.

Carol is not finished with the ‘orange’ task, but is happy to see her initial changes integrated successfully. She fetches master and merges it into ‘orange’ before continuing work there. She’s noticed a design change that would make her task easier. She plans the refactoring in steps so she can push small changes frequently as she completes the re-design. Sharing her changes with the team often will make it easier for everyone to avoid costly merges.

Later that week, in the code review meeting, the team looks at Annika’s changes for the ‘red’ task. They represent a couple of days’ work. The code review tool presents a summary of all the commits involved, and the team discusses all the changes made in the development of the ‘red’ feature.

Unfortunately, Boris and Carol are not happy with part of the design Annika has made, and the code formatting needs improving in places. The outcome of the meeting is that they agree to pair program with Annika on a refactoring of the design, and encourage her to initiate informal design discussions more often during development. The idea is that the more experienced developers, Boris and Carol, should help Annika to learn better design skills. The team finds the code-formatting issues a bit annoying, since this kind of detail shouldn’t be in focus for a code review meeting. They create a task to improve the code-style checker in the pre-tested integration build so that it catches similar code formatting problems in future.

Commentary

This development process is working really well for Annika, Boris and Carol, and pre-tested integration is a small but important piece of it. They are not using pull requests, but they have checks on what code is allowed into the master branch, and they have a good code-review culture. Integration to master happens at a faster cadence than the flow of work-items. That’s important. Integration is less painful the more often you do it, and you might not want to break your work items down to the same small granularity that would be best for code changes. You also don’t necessarily want to delay your integration by waiting for a teammate to review your pull request.

Strictly speaking, this process is not Trunk-Based Development, since there are more branches involved than just trunk, but as long as integration is frequent it is indistinguishable in practice. The benefit of this over Trunk-Based Development is, of course, that Boris or any other developer can’t unwittingly break master for the rest of the team.

If you’re using Jenkins you can easily automate the integration process with our Pretested Integration plugin. It’s not difficult to implement this functionality yourself for other build servers. Whichever approach your team chooses, I recommend you settle on a process that results in frequent integration together with collaborative and constructive code-reviews.

Note: This post first appeared on Pagero’s blog

One of the questions that Kent Beck asked when he was developing the eXtreme Programming methodology was: what happens if we turn the dials up all the way to 10? Take a practice we know is good, and do more of it? Practices like Test-Driven Development and Pair Programming are what he came up with, starting from manual testing and code review.

In the same way, Continuous Delivery is what you get if you turn the dials to 10 on your annual release cycle. You get to the point that you are pushing out new code to users, many times a day.

“Shortening the release cycle like this has a lot of advantages, especially around risk and quality.”

LOWER RISK AND HIGHER QUALITY WITH SHORTER RELEASE CYCLES

Shortening the release cycle like this has a lot of advantages, especially around risk and quality. Basically, you’re decreasing the batch size, a well-known tenet of lean manufacturing. If each new release contains fewer changes, then you have fewer places to look when things go wrong, so finding bugs is easier. You also lower the risk that any individual batch has a defect in the first place. By having an engineering setup that allows you to make code changes at the drop of a hat and push them out to production easily, you facilitate getting fixes out quickly.

So the upshot is quality problems surface sporadically instead of all at once, and are more easily dealt with. It’s an attractive prospect for us, especially with the growth in traffic we’re experiencing. Every time we have a defect in production, it affects a proportion of our customers, and the number of customers is increasing all the time. If we had a small bug a year ago that affected one or two customers, today the same bug might affect tens or even hundreds.

FROM MONOLITH TO MICROSERVICES FOR GREATER FLEXIBILITY

At Pagero, we have historically pushed out a new version of our product “Pagero Online” about once a month, a pace we’ve been able to sustain since about 2007. So when we began looking at Continuous Delivery, about three years ago, we were starting from a fairly good position. We’ve experienced steady growth in transactions through our cloud platform since the start, and in early 2014 we started switching our architecture from a clustered monolithic JEE instance to distributed microservices (see my previous article).

We needed to do this in order to scale our system horizontally and handle the increasing traffic. One of the other benefits of microservices, though, is that you can deploy services independently of one another, and if you do it right, you can deploy new code without stopping traffic to the site.

“One of the other benefits of microservices, is you can deploy services independently of one another.”

FROM MONTHLY SERVICE WINDOWS ON SUNDAYS…

Our old monthly release cycle was based on having a ‘service window’, usually on a Sunday morning, where we could stop all the traffic, take a backup of the database, roll out the new version of the monolith, then bring everything back up again. You’ve got the database backup to fall back on if something goes wrong with the update, so you can easily roll everything back to the state it had before the service window.

…TO SEVERAL ROLLOUTS A WEEK

So of course, initially the microservices we had were fairly peripheral to the main function of our platform, and it wasn’t a huge risk to roll out new code without the safety of a service window. So we built deployment tools that allowed us to do that. All our microservices run with at least two instances, so an update consists of taking each instance down in turn and replacing it with the new version. If something goes wrong, it’s not hard to roll back to a previous version. It’s a little more problematic to restore previous state, but generally we have good mechanisms to re-submit failed transactions once the service is working again.
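The rolling-update idea can be sketched like this (the helper commands and instance names here are hypothetical; our actual tooling was built in-house):

    # replace each instance of a service in turn, never taking both down at once
    for instance in payment-service-1 payment-service-2; do
        stop_instance "$instance"                   # take the old version out of service
        deploy_version "$instance" "$NEW_VERSION"   # roll out the new version
        wait_until_healthy "$instance"              # continue only once it serves traffic again
    done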

So these days we roll out new versions of our microservices several times a week, when new features are ready, and rarely have any difficulties with this. The need to roll back does occur occasionally, but more often we can ‘roll-forward’ and deploy a newer version with a fix.

“These days we roll out new versions of our microservices several times a week, when new features are ready.”

MANY REASONS TO CONTINUE ON THIS PATH

With our former monolith, the situation is a little different though. Any changes that touch the database are deemed too risky to deploy without first taking a backup, and that currently requires a service window. We’ve got so used to frequently pushing out new versions of the microservices, and seen the benefits of that, that we’d like to do the same with the former monolith.

We also have good business reasons for wanting to release without a service window – for a start, our traffic is growing at such a rate that we can ill afford any downtime. Perhaps more importantly, as we get customers in more parts of the world, a Sunday morning is no longer a ‘quiet’ time of the week when it’s relatively OK to suspend our service. In some Arab countries where we do business, Sunday is the first day of the working week.

THE SHIFT TO CONTINUOUS DELIVERY HAS STARTED

Now we’ve gained some experience with Continuous Delivery of our microservices, it’s time to do the same with the whole Pagero Online platform, including our old monolith. So I look forward to soon being able to report that we’ve got the dials going all the way up to 10 and we are deploying any part of our system at any time.

 

Please note – As of March 2013, I have rewritten this post in the light of further experience and discussions. The updated post is available here.

I feel like I’ve spent most of my career learning how to write good automated tests in an agile environment. When I downloaded JUnit in the year 2000 it didn’t take long before I was hooked – unit tests for everything in sight. That gratifying green bar is near-instant feedback that everything is as expected, my code does what I intended, and I can continue developing from a firm foundation.

Later, starting in about 2002, I began writing larger granularity tests, for whole subsystems; functional tests if you like. The feedback that my code does what I intended, and that it has working functionality has given me confidence time and again to release updated versions to end-users.

Often, I’ve written functional tests as regression tests, after the functionality is supposed to work. In other situations, I’ve been able to write these kinds of tests in advance, as part of an ATDD, or BDD process. In either case, I’ve found the regression tests you end up with need to have certain properties if they’re going to be useful in an agile environment moving forward. I think the same properties are needed for good agile functional tests as for good unit tests, but it’s much harder. Your mistakes are amplified as the scope of the test increases.

I’d like to outline four principles of agile test automation that I’ve derived from my experience.

Coverage

If you have a test for a feature, and there is a bug in that feature, the test should fail. Note I’m talking about coverage of functionality, not code coverage, although these concepts are related. If your code coverage is poor, your functionality coverage is likely also to be poor.

If your tests have poor coverage, they will continue to pass even when your system is broken and functionality unusable. This can happen if you have left out needed test cases, or when your test cases don’t properly check what the system actually did. The consequence of poor coverage is that you can’t refactor with confidence, and need to do additional (manual) testing before release.

The aim for automated regression tests is good Coverage: If you break something important and no tests fail, your test coverage is not good enough. All the other principles are in tension with this one – improving Coverage will often impair the others.

Readability

When you look at the test case, you can read it through and understand what the test is for. You can see what the expected behaviour is, and what aspects of it are covered by the test. When the test fails, you can quickly see what is broken.

If your test case is not readable, it will not be useful. When it fails you will have to dig through other sources outside of the test case to find out what is wrong. Quite likely you will not understand what is wrong, and you will rewrite the test to check for something else, or simply delete it.

As you improve Coverage, you will likely add more and more test cases. Each one may be fairly readable on its own, but taken all together it can become hard to navigate and get an overview.

Robustness

When a test fails, it means the functionality it tests is broken, or at least is behaving significantly differently from before. You need to take action to correct the system or update the test to account for the new behaviour. Fragile tests are the opposite of Robust: they fail often for no good reason.

Aspects of Robustness you often run into are tests that are not isolated from one another, duplication between test cases, and flickering tests. If you run a test by itself and it passes, but fails in a suite together with other tests, then you have an isolation problem. If you have one broken feature and it causes a large number of test failures, you have duplication between test cases. If you have a test that fails in one test run, then passes in the next when nothing changed, you have a flickering test.
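To make the flickering case concrete, here is a minimal sketch in Java (the Worker class is invented for the example): a test that races against a background thread, and a more Robust version of the same check:

    import static org.junit.Assert.assertTrue;
    import org.junit.Test;

    public class RobustnessExample {
        // Stand-in for code under test that finishes its work on a background thread.
        static class Worker {
            volatile boolean done = false;
            void start() {
                new Thread(() -> {
                    try { Thread.sleep(30); } catch (InterruptedException ignored) {}
                    done = true;
                }).start();
            }
        }

        // Flickering: assumes the background work always finishes within 50 ms.
        @Test
        public void flickering() throws Exception {
            Worker worker = new Worker();
            worker.start();
            Thread.sleep(50);           // a race: sometimes 50 ms isn't enough
            assertTrue(worker.done);
        }

        // Robust: wait for the actual condition, with a generous upper bound.
        @Test
        public void robust() throws Exception {
            Worker worker = new Worker();
            worker.start();
            long deadline = System.currentTimeMillis() + 5000;
            while (!worker.done && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);       // poll instead of guessing a fixed delay
            }
            assertTrue(worker.done);
        }
    }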

If your tests often fail for no good reason, you will start to ignore them. Quite likely there will be real failures hiding amongst all the false ones, and the danger is you will not see them.

As you improve Coverage you’ll want to add more checks for details of your system. This will give your tests more and more reasons to fail.

Speed

As an agile developer you run the tests frequently: (a) every time you build the system, and (b) before you check in changes. I recommend time limits of 2 minutes for (a) and 10 minutes for (b). This fast feedback gives you the best chance of actually being willing to run the tests, and of finding defects when they’re cheapest to fix.

If your test suite is slow, it will not be used. When you’re feeling stressed, you’ll skip running them, and problem code will enter the system. In the worst case the test suite will never become green. You’ll fix the one or two problems in a given run and kick off a new test run, but in the meantime someone else has checked in other changes, and the new run is not green either. You’re developing all the while the tests are running, and they never quite catch up. This can become pretty demoralizing.

As you improve Coverage, you add more test cases, and this will naturally increase the execution time for the whole test suite.

How are these principles useful?

I find it useful to remember these principles when designing test cases. I may need to make tradeoffs between them, and it helps just to step back and assess how I’m doing on each principle from time to time as I develop.

I also find these principles useful when I’m trying to diagnose why a test suite is not being useful to a development team, especially if things have got so bad they have stopped maintaining it. I can often identify which principle(s) the team has missed, and advise how to refactor the test suite to compensate.

For example, if the problem is lack of Speed you have some options and tradeoffs to make:

  • Invest in hardware and run tests in parallel (costs $) – see the sketch after this list
  • Use a profiler to optimize the tests for speed, the same as you would production code (may affect Readability)
  • Push down tests to a lower level of granularity where they can execute faster (may reduce Coverage and/or increase Readability)
  • Identify key test cases for essential functionality and remove the other test cases (sacrifices Coverage to get Speed)
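As an illustration of the first option, a modern JUnit (version 5) can be told to run tests in parallel with a couple of configuration properties – a sketch of today’s tooling, not what was available when this post was written:

    # src/test/resources/junit-platform.properties
    junit.jupiter.execution.parallel.enabled = true
    junit.jupiter.execution.parallel.mode.default = concurrent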

Explaining these principles can promote useful discussions with people new to agile, particularly testers. The test suite is a resource used by many agile team members – developers, analysts, managers etc. – in its role as “Living Documentation” for the system (see Gojko Adzic’s writings on this). This emphasizes the need for both Readability and Coverage. Automated tests in agile are quite different from those in a traditional process, since they are run continually throughout the process, not just at the end. I’ve found many traditional automation approaches don’t lead to enough Speed and Robustness to support agile development.

I hope you will find these principles will help you to reason about the automated tests in your suite.

Programmers have a vested interest in making sure the software they create does what they think it does. When I’m coding I prefer to work in the context of feedback from automated tests, which help me to keep track of what works and how far I’ve got. I’ve written before about Test Driven Development (TDD). In this article I’d like to explain some of the main features of Text-Based Testing, a variant on TDD, perhaps more suited to the functional level than to unit tests, which I’ve found powerful and productive to use.


The basic idea
You get your program to produce a plain text file that documents all the important things that it does. A log, if you will. You run the program and store this text as a “golden copy” of the output. You create from this a Text-Based Test with a descriptive name, any inputs you gave to the program, and the golden copy of the textual output.

You make some changes to your program, and you run it again, gathering the new text produced. You compare the text with the golden copy, and if they are identical, the test passes. If there is a difference, the test fails. If you look at the diff and you like the new text better than the old text, you update your golden copy, and the test is passing once again.
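A minimal sketch of the mechanism in Java (the program under test and the file locations are invented for illustration):

    import java.io.IOException;
    import java.nio.file.*;

    public class GoldenCopyTest {
        // Stand-in for the real program under test, which logs the important things it does.
        static String runProgramAndCaptureLog() {
            return "opened connection\nprocessed 3 items\nclosed connection\n";
        }

        public static void main(String[] args) throws IOException {
            String actual = runProgramAndCaptureLog();
            Path golden = Paths.get("golden/output.txt");
            if (!Files.exists(golden)) {
                Files.createDirectories(golden.getParent());
                Files.writeString(golden, actual);   // first run: save the golden copy
                System.out.println("Golden copy created - inspect it by eye!");
            } else if (Files.readString(golden).equals(actual)) {
                System.out.println("PASS: output matches the golden copy");
            } else {
                System.out.println("FAIL: output differs - diff it, and update the golden copy if the new text is better");
            }
        }
    }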

Tool Support
Text-Based Testing is a simple idea, and in fact many people do it already in their unit tests. AssertEquals(String expected, String actual) is actually a form of it. You often create the “expected” string based on the actual output of the program, (although purists will write the whole assert before they execute the test).

Most unit test tools these days give you a nice diff even on multi-line strings. For example:

(The original post showed screenshots here of a failing text-based test using JUnit, with the diff view.)
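A minimal sketch of what such a test can look like (the log content is invented):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class MultiLineDiffTest {
        @Test
        public void logMatchesGoldenCopy() {
            String expected = "opened connection\n"
                            + "processed 3 items\n"
                            + "closed connection\n";
            String actual   = "opened connection\n"
                            + "processed 2 items\n"   // the diff will point at this line
                            + "closed connection\n";
            assertEquals(expected, actual);           // fails, showing a multi-line diff
        }
    }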

Once your strings get very long, to the scale of whole log files, even multi-line diffs aren’t really enough. You get datestamps, process ids and other stuff that changes every run, hashmaps with indeterminate order, etc. It gets tedious to deal with all this on a test-by-test basis.
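The kind of filtering involved can be sketched like this (the patterns are examples, not TextTest’s actual configuration):

    public class LogFilter {
        // Replace run-specific details with stable placeholders before comparing.
        static String filter(String log) {
            return log
                .replaceAll("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", "<timestamp>")
                .replaceAll("pid \\d+", "pid <pid>");
        }
    }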

My husband, Geoff Bache, has created a tool called “TextTest” to support Text-Based testing. Amongst other things, it helps you organize and run your text-based tests, and filter the text before you compare it. It’s free, open source, and of course used to test itself. (Eats own dog food!) TextTest is used extensively within Jeppesen Systems, (Geoff works for them, and they support development), and I’ve used it too on various projects in other organizations.

In the rest of this article I’ll look at some of the main implications of using a Text-Based Testing approach, and some of my experiences.

Little code per test
The biggest advantage of the approach is that you tend to write very little unique code for each test. You generally access the application through a public interface, as a user would – often a command line interface or (web)service call. You then create many tests by, for example, varying the command line options or request contents. This reduces test maintenance work, since you have less test code to worry about, and the public API of your program should change relatively infrequently.

Legacy code
Text-Based Testing is obviously a regression testing technique. You’re checking the code still does what it did before, by checking the log is the same. So these tests are perfect for refactoring. As you move around the code, the log statements move too, and your tests stay green, (so long as you don’t make any mistakes!) In most systems, it’s cheap and risk-free to add log statements, no matter how horribly gnarly the design is. So text-based testing is an easy way to get some initial tests in place to lean on while refactoring. I’ve used it this way fairly successfully to get legacy code under control, particularly if the code already produces a meaningful log or textual output.


No help with your design
I just told you how good Text-Based Testing is with Legacy code. But actually these tests give you very little help with the internal design of your program. With normal TDD, the activity of creating unit tests at least forces you to decompose your design into units, and if you do it well, you’ll find these tests giving you all sorts of feedback about your design. Text-Based tests don’t. Log statements don’t care if they’re in the middle of a long horrible method or if they’re spread around several smaller ones. So you have to get feedback on your design some other way.

I usually work with TDD at the unit level in combination with Text-Based tests at the functional level. I think it gives me the best of both worlds.

Log statements and readability
Some people complain that log statements reduce the readability of their code and don’t like to add any at all. They seem to be out of fashion, just like comments. The idea is that all the important ideas should be expressed in the class and method names, and logs and comments just clutter up the important stuff. I agree to an extent, you can definitely over-use logs and comments. I think a few well placed ones can make all the difference though. For Text-Based Testing purposes, you don’t want a log that is megabytes and megabytes of junk, listing every time you enter and leave every method, and the values of every variable. That’s going to seriously hinder your refactoring, apart from being a nightmare to store and update.

What we’re talking about here is targeted log statements at the points when something important happens, that we want to make sure should continue happening. You can think about it like the asserts you make in unit tests. You don’t assert everything, just what’s important. In my experience less than two percent of the lines of code end up being log statements, and if anything, they increase readability.

Text-Based tests are completed after the code
In normal TDD you write the test first, and thereby set up a mini pull system for the functionality you need. It’s lean, it forces you to focus on the problem you’re trying to solve before you solve it, and starts giving you feedback before you commit to an implementation. With Text-Based Testing, you often find it’s too much work to specify the log up front. It’s much easier to wait until you’ve implemented the feature, run the test, and save the log afterwards.

So your tests usually aren’t completed until after the code they test, unlike in normal TDD. Having said that, I would argue that you can still do a form of TDD with Text-Based Tests. I’d normally create half the test before the code: I name the test, and find suitable inputs that should provoke the behaviour I need to implement in the system. The test will fail the first time I run it. In this way I think I get many of the benefits of TDD, but only actually pin down the exact assertion once the functionality is working.

“Expert Reads Output” Antipattern
If you’re relying on a diff in the logs to tell you when your program is broken, you had better have good logs! But who decides what to log? Who checks the “golden copy”? Usually it is the person creating the test, who should look through the log and check everything is in order the first time. Of course, after a test is created, every time it fails you have to make a decision whether to update the golden copy of the log. You might make a mistake. There’s a well known antipattern called “Expert Reads Output” which basically says that you shouldn’t rely on having someone check the results of your tests by eye.

This is actually a problem with any automated testing approach – someone has to make a judgement about what to do when a test fails: whether the test is wrong or there’s a bug in the application. With Text-Based Testing you might have a larger quantity of text to read through compared with other approaches, or maybe not. If you have human-readable, concise, targeted log statements and good tools for working with them, it goes a long way. You need a good diff tool, version control, and some way of grouping similar changes. It’s also useful to have some sanity checks. For example, TextTest can easily search for regular expressions in the log and warn you if you try to save a golden copy containing, say, a stack trace.

In my experience, you do need to update the golden copy quite often. I think this is one of the key skills with a Text-Based Testing approach. You have to learn to write good logs, and to be disciplined about either doing refactoring or adding functionality, not both at the same time. If you’re refactoring and the logs change, you need to be able to quickly recognize if it’s ok, or if you made a mistake. Similarly, if you add new functionality and no logs change, that could be a problem.

Agile Tests Manage Behaviour
When you create a unit test, you end up with an Assert statement. This is supposed to be some kind of universal truth that should always be valid, or else there is a big problem. Particularly for functional level tests, it can be hard to find these kinds of invariants. What is correct today might be updated next week when the market moves or the product owner changes their mind. With Text-Based Testing you have an opportunity to quickly and easily update the golden copy every time the test “fails”. This makes your tests much more about keeping control of what your app does over time, and less about rewriting assert statements.

Text-Based Testing grew up in the domain of optimizing logistics planning. In this domain there is no “correct” answer you can predict in advance and assert. Planning problems that are interesting to solve are far too complex for a complete mathematical analysis, and the code relies on heuristics and fancy algorithms to come up with better and better solutions. So Text-Based Testing makes it easy to spot when the test produces a different plan from before, and use it as the new baseline if it’s an improvement.

I think generally it leads to more “agile” tests. They can easily respond to changes in the business requirements.

Conclusions
There is undoubtedly a lot more to be said about Text-Based Testing. I haven’t mentioned text-based mocking, data-driven vs workflow testing, or how to handle databases and GUIs – all relevant topics. I hope this article has given you a flavour of how it’s different from ordinary TDD, though. I’ve found that good tool support is pretty essential to making Text-Based Testing work well, and that it’s a particularly good technique for handling legacy code, although not exclusively. I like the approach because it minimizes the amount of code per test, and makes it easy to keep the tests in sync with the current behaviour of the system.