Bob Martin has just written a post in his blog where he tells the story of a test manager who has 80 000 manual tests, and wishes they were automated instead. Bob writes:

”One common strategy to get your tests automated is to outsource the problem. You hire some team of test writers to transform your manual tests into automated tests using some automation tool. These folks execute the manual test plan while setting up the automation tool to record their actions. Then the tool can simply play the actions back for each new release of the system; and make sure the screens don’t change.”

Bob then goes on to explain why this is such a terrible idea – and blames it all on coupling. That the tests and the GUI are coupled to the extent that when you change the GUI, loads of tests break. Wheras humans can handle a fair amount of GUI changes and still correctly determine whether a manual test should pass or fail, machines fall over all too easily and just fail as soon as something unexpected happens. So you end up re-recording them, which can cost as much as just doing the tests manually in the first place.

These problems are of course bigger or smaller depending on the GUI automation tool you choose. Anything that records pixel positions will fall over when you simply change the screen resolution, let alone when you add new buttons and features in your GUI. More modern tools record the names or ids of the widgets, so they don’t break if the widget simply moves to another part of the screen. In other words, you reduce your coupling.

Geoff has been working on PyUseCase which takes this to another level. Instead of coupling the tests to widget names, you couple them to ”domain actions”. This makes your tests even more robust in the face of gui changes. A drop down list can turn into a set of radio buttons and your tests won’t mind, since they just say something like ”select airport SFO”. This doesn’t isolate you from the big changes, like moving the order of the screens in a wizard around, but since the tests are written in plain text, in a language any domain expert can read, they are relatively cheap to update.

There is another respect in which machines under-perform compared to manual testers. An intelligent human will usually do a certain amount of exploration beyond the scripted test steps they have infront of them. They try to understand the purpose of the test, click around a bit and ask questions when parts of the system peripheral to the test in hand start to look odd. Machines don’t do any exploration, and in fact often don’t even notice errors on parts of the screen they havn’t been told to look at.

Geoff’s PyUseCase can partly address this kind of a problem. Used together with TextTest, it will continually scan the log the System Under Test produces, and fail the test for example if any stack traces appear. PyUseCase also automatically produces a low fidelity ascii-art-esque log of how the current screen looks, and can compare it against what it looked like last time the test ran. Changes are flagged as test failures, which will bring to your attention the change in an unrelated corner of the screen which says ”32nd December” instead of ”1st January”.

I know that sounds like we just introduced a huge amount of coupling between the tests and the way the GUI looks, and yes, we have. The difference is that this coupling is very easy to manage. If 1000 tests all fail saying ”expected: 1st January, found: January 1st”, TextTest handily groups all the test failures and lets you accept or reject the change en-masse. So it is very little work to update a lot of tests when the GUI just looks different, but you don’t care.

There is still a problem though, that the machine will not explore outside of the scripted steps you tell it to perform. So you will have to do some manual exploratory testing too, not everything can be automated.

So a simplistic lets-just-automate-our-manual-tests is a bad idea because machines can’t handle GUI changes as well as humans can, and because machines don’t look around and explore. Potentially your automated tests will cost more than your manual tests, and find fewer bugs.

So should we stick with our manual test suite then? No, of course not. The value of automated tests is not simply that you can run them more cheaply than manual tests, it is that you can run them more often – at every build, constantly supplying developers with valuable feedback rather than just at the end of the release cycle. It is this kind of feedback that enables refactoring, and lets developers build quality code from the start. That is their real gain over manual tests.

Bob Martin’s suggestion is that you shouldn’t rely on expensive GUI tests for this kind of feedback – only perhaps 15% of your tests should be GUI reliant. The rest run against some kind of api, which is less volatile and hence cheaper to maintain. With the kinds of tools Bob I suspect has been using for GUI testing I’m not surprised he says this. I just think that with tools like PyUseCase and TextTest the costs are much reduced, and call for reconsideration of this ratio. Looking at Geoff’s self tests for TextTest (a GUI intensive tool), around half are testing through the GUI, using pyUseCase. Basically I don’t think GUI tests have to be as bad and expensive as Bob makes out.