Posts tagged ‘TDD’

I gave this talk at JFokus this week in Stockholm. I thought people might be interested in a summary of my main points.

I’ve been using Selenium for web application testing for over a year now, on a couple of different projects. This talk is based on my experiences.

What is Selenium?

Selenium is actually an umbrella term for a suite of tools. The two tools in the suite I have been using are Selenium Remote Control (RC) ,which allows you to test your webapp as it runs in a broswer, from Java code., and Selenium IDE, which is a Firefox plugin that allows you to record tests.

Selenium’s killer feature is that it allows you to test your app while it is running in a browser, with a real browser javascript engine. This is particularly important for web applications with a lot of javascript and ajax. Selenium is also actively maintained, with version 2.0 having recently arrived at alpha release status.

I showed a demo of how to write a test using Selenium IDE to record browser interaction, then pasting that into Java code. The test doesn’t pass the first time you execute it, you have to tweak the code to replace xpaths with ids, and add a “wait” for the page to load at one point in the test. I find this is typical – Selenium IDE helps you to get started with your test, but there is still a lot of manual work to get it to pass.

The code you end up with is rather widget-oriented and low level, this is what I end up with in my example:


@Test
public void createNewPublisher() {
selenium.click("menuItemPNewPublisher");
selenium.waitForPageToLoad("30000");
selenium.type("name", "Disney");
selenium.type("address", "24 Stars Ave");
selenium.type("zipCode", "91210");
selenium.type("city", "Hollywood");
selenium.type("country", "USA");
selenium.type("contactPersonName", "Emily");
selenium.type("contactPersonEmail", "emily@test.com");
selenium.type("contactPersonPassword", "123456");
selenium.type("contactPersonRetypePassword", "123456");
selenium.click("save");
selenium.waitForPageToLoad("30000");

verifyEquals(selenium.getTable("Users.1.1"), "Emily");
}

This is all very well, you can basically read what it is doing, (open the “New publisher” page, fill in a form, then check the user table displays correctly), but I think we can do better.

Agile Testing Theory

Brian Marick came up with this idea of “Testing Quadrants”, which helps us to reason about what kinds of tests we need in an agile project. Selenium tests generally fall into the second quadrant – business facing tests that support the team. This implies they should be written in the language of the business, to help developers to understand what they are building. This is in contrast to the low-level, widget oriented script that selenium IDE produces.

It is better to discover a bug as soon after insertion as possible, and the faster the tests run, the more often you will run them. The usual rule of thumb says your tests must take less than a minute for you to run them as part of a TDD cycle, and less than 10 minutes for you to run them in the CI server. For a suite of selenium tests, even 10 minutes can be ambitious.

Selenium tests are fundamentally slow. They are slow because they run inside the browser using its javascript motor.

WebDriver

WebDriver (new in Selenium 2.0) uses a different model of interacting with your application under test. It uses the browser’s api instead of its javascript motor. This means tests execute faster and more reliably, at the cost of supporting fewer browsers. In addition, WebDriver has a well thought out, clean api, and built in support for the PageObject pattern. Having said that, WebDriver is a very recent addition to selenium, and currently has poor or no integration with Selenium IDE and Selenium Grid.

the Page Object pattern

The PageObject pattern is a way of organizing your test code. It tells you to model your test code after the page structure of your application. The test I listed earlier could be rewritten using this pattern, and might look something like this:


@Test
public void createNewPublisher() {
PublisherData newPublisher = new PublisherData("Disney")
.withAddress(
new AddressData("24 Stars Ave",
"91210", "Hollywood", "USA"))
.withContactPerson(
new UserData("Emily", "emily.bache@test.com",
"123456"));

NewPublisherPage page = new NewPublisherPage(selenium);
page.enterPublisherDetails(newPublisher);
ViewPublisherPage viewPage = (ViewPublisherPage)page.save();
viewPage.verifyPublisherDetails(newPublisher);
}

Here we are using a data struct to hold the form data, which tells you quite obviously that a PublisherData has a name, address and contact person. Then we have the Page Object for the NewPublisherPage, which uses selenium to navigate to the relevant page in the constructor, then has methods which allow us to interact with that page. When we as it to “save”, we navigate to a new page, in this case the ViewPublisherPage, which we can ask to validate the new publisher is displayed correctly.

This is an improvement over the original test in several ways. I think it is much more readable, we have a certain amount of type safety from the data struct, and if the page flow, or form data changes, there are obvious central places in the code to update all the tests. We can easily vary the form data in different tests in order to do data driven testing.

Scenario classes

However, in the application I’ve been working on, we havn’t used this pattern in quite this way. The application uses a lot of frames and ajax, and a page based model didn’t really fit. So we’ve been using the idea of “Scenario” classes instead of Page Objects. They fulfill the same purpose, but instead of modelling pages in your app, these classes group together domain actions, or services that your application provides. The Scenario classes hide all the details of how to use selenium to interact with the GUI and achieve an outcome a user is interested in. In this case, creating a publisher.


@Test
public void createNewPublisher() {
PublisherData newPublisher = new PublisherData("Disney")
.withAddress(new AddressData("24 Stars Ave",
"91210", "Hollywood", "USA"))
.withContactPerson(
new UserData("Emily", "emily@test.com",
"123456"));

PublisherScenarios scenarios =
new PublisherScenarios(selenium);
scenarios.createPublisher(newPublisher);

scenarios.verifyPublisherDetails(newPublisher);
}

Domain Specific Languages

So with the PageObject or Scenario patterns, you could say that my tests are written in a kind of Domain Specific Language, in that they are written in terms close to the domain of the application they are testing. The trouble is, this is still Java code, and business people generally can’t read Java. A more interesting DSL would enable communication between developers and business people.

This is where I turned to Fitnesse to help me. I found it pretty straightforward to put some Fitnesse tables over the top of this kind of Java code. Since the Java code is already organized in terms of the domain, it becomes natural to offer the same functionality in tables with meaningful names.

I chose Fitnesse because I wanted to try it out, but I think it would be equally easy to put Cucumber or Robot Framework or something like that on top, too. The whole point of these tools is that they allow you to easily write executable tests in a language anyone with knowledge of the domain can read. This is a huge benefit to developers when talking to business people.

Closing advice

Having said all this about how to make your Selenium tests more readable, robust and faster, I still find my Selenium test suite frustratingly slow and fragile (I havn’t used WebDriver enough yet to know if that is the case there too). Compared to PyUseCase, it is in the dark ages. (Unfortunately PyUseCase has no support for web applications, so there’s probably no point in mentioning it, actually. But it is a great tool!).

The difference between PageObjects and Scenario classes is not huge, but I like the latter, because they allow me to easily exchange Selenium for a WebService. My tests are written in terms of the services the application offers, and if I indeed implement those services as WebServices, I can call them directly from the tests. Using a WebService instead of selenium immediately makes my tests much faster and more robust.

The problem with testing against a WebService is the reason we chose Selenium in the first place – ajax and browsers are not covered.

So for web 2.0 application testing, I recommend both API tests and Selenium tests. Use Selenium to test the crucial parts of your GUI, but write most of your tests against some kind of WebService or API. Write your tests in terms a business user would understand, even in the Java code, and then it is easy to create a Domain Specific Language on top using another tool.

One of the things I like about GothPy is that lots of the people in the group enjoy writing code in their spare time, and like to share with us what they’re up to.

A little while back, Olof came up with this little tool, PyTDDmon, which is to help you when you’re doing TDD. Instead of your running your tests by hand, PyTDDmon sits in the background and runs them for you. When they are green, it has a little green blob, when they are red, it tells you how many are failing, and clicking on it gives you the stacktrace. Olof put together a screencast to show how it works.

Following Geoff’s presentation of PyUseCase at GothPy in December, he and Olof started to discuss whether it could be used to test PyTDDmon. Since it’s using the GUI library Tkinter instead of pyGTK, it didn’t look all that straightforward.

Of course, Olof developed the app using TDD in the first place, (definite own dog food eating going on!), and his unit tests have about 48% statement coverage. The parts with least coverage are GUI-related, so there seemed to be scope to improve matters by adding tests via the GUI.

Geoff sat down a few days ago with a branch of the PyTDDmon code, started trying to generalize PyUseCase to work with Tkinter, and to write some tests for PyTDDmon. (Geoff also likes writing code in his spare time 🙂

As I write, he’s just checking in his changes and a new suite of tests for PyTDDmon that use the new, improved PyUseCase, with (basic) Tkinter support! He says the hardest part was understanding Tkinter, and after a spot of refactoring, only an extra couple of hundred lines of code were needed to support it.

Subsequently adding tests for PyTDDmon was very easy (Geoff wrote most of them while I made biscuits with the children this afternoon 🙂 All he had to do to Olof’s code was assign names to some of the widgets, and sort out a problem with closing a window. (There was a listener for a mouse click that would destroy the window, before other listeners got a chance to do anything, such as record the click in a use case log…)

So now the statement coverage of the PyTDDmon tests is at 98%. I look forward to future, equally productive GothPy meetings and discussions!

I recently read this post in Brian Marick’s blog, and it set me thinking. He’s talking about a test whose intention in some way survived three major GUI revisions. The test code had to be rewritten each time, but the essence of it was retained. He says:

I changed the UI of an app and so
 I had to rewrite the UI code and the tests/checks of the UI code. It might seem the checks were worthless during the rewrite (and in 1997, I would have thought so). But it turns out that all (or almost all) of those checks contained an idea that needed to be preserved in the new interface. They were reminders of important things to think about, and that (it seemed to me) repaid the cost of rewriting them.

That was a big surprise to me.

I’m not sure why Brian is so surprised about this. If the user intentions and business rules are the same, then some aspects of the tests should also be preserved. A change in UI layout or technology should mean superficial changes only. In fact, one of the main claims for PyUseCase is that by having the tests written in a domain language decoupled from the specifics of the UI, it enables you to write tests that survive major UI changes. In practice this means when you rewrite the UI, you are saved the trouble of also rewriting the tests. So Geoff and I decided to write some code and see if this was true for the example Brian outlines.

In the blog post, there is only one small screenshot and some vague descriptions of the GUIs these tests are for, so we did some interpolation. I hope we have written an application that gets to the gist of the problem, although it is undoubtedly less beautiful and sophisticated than the one Brian was working on. All the code and tests is on launchpad here.

We started by writing an application which I hope is like his current GUI. You select animals in a list, click “book” and they appear in a new list below. You select procedures from another list, and unsuitable animals disappear.

In my app, I had to make up some procedures, in this case “milking”, which is unsuitable for Guicho (no udders on a gelding!), and “abdominocentesis” which is suitable for all animals, (no idea what that is, but it was in Brian’s example :-). Brian describes a test where an animal that is booked should not stay booked if you choose a procedure that is unsuitable for it, then change your mind and instead choose a procedure that it is suitable for.

select animals Guicho
book selected animals
choose procedure milking
choose procedure abdominocentesis
quit

This is a list of the actions the user must take in the GUI. So Guicho should disappear when you select “milking”, and reappear as available, but not as booked, when you select “abdominocentesis”. This information is not in the use case file, since it only documents user actions.

The other part of the test is the UI log, which documents what the application actually does in response to the user actions. This log is auto generated by pyUseCase. For this test, I won’t repeat the whole file, (you can view it here), but I will go through the important parts:

‘select animals’ event created with arguments ‘Guicho’

‘book selected animals’ event created with arguments ”

Updated : booked animals with columns: booked animals ,
-> Guicho | gelding

This part of the log shows that Guido is listed as booked.

‘choose procedure’ event created with arguments ‘milking’

Updated : available animals with columns: available animals , animal type
-> Good Morning Sunshine | mare
-> Goat 3 | goat
-> Goat 4 | goat
-> Misty | mare

Updated : booked animals with columns: booked animals ,

So you see that after we select “milking” the lists of available and booked animals are updated, Guicho disappears, and the “booked animals” section is now blank. The log goes on to show what happens when we select “abdominocentesis”:

‘choose procedure’ event created with arguments ‘abdominocentesis’

Updated : available animals with columns: available animals , animal type
-> Good Morning Sunshine | mare
-> Goat 3 | goat
-> Goat 4 | goat
-> Guicho | gelding
-> Misty | mare

‘quit’ event created with arguments ”

ie the “available animals” list is updated and Guicho reappears, but the booked animals list is not updated. This means we know the application behaves as desired – booked animals that are not suitable for a procedure do not reappear as booked if another procedure is selected.

Ok, so far so good. What happens to the test when we compeletely re-jig the UI and it instead looks like this?

Now there is no book button, and you book animals by ticking a checkbox. Selecting a procedure will remove unsuitable animals from the list in the same way as before. So now if you change your mind about the procedure, animals that reappear on the list should not be marked as booked, even if they were before they disappeared. There is no separate list of booked animals.

What we did was take a copy of the tests and the code, updated the code, and see what we needed to do to the tests to make them work again. In the end it was reasonably straightforward. We didn’t re-record or rewrite any tests. We just had to modify the use cases to remove the reference to the book button, and save new versions of the UI log to reflect the new UI layout. The use case part of the test looks like this now:

book animal Guicho
choose procedure milking
choose procedure abdominocentesis
quit

which is one line shorter than before, since we no longer have separate user actions for selecting and booking an animal.

So updating the tests to work with the changed UI consisted of:

  1. remove reference to “book” button in UI map file, since button no longer exists
  2. in use case files for all tests, replace “select animals x, y” with a line for each animal, “book animal x” and “book animal y”.
  3. Run the tests. All fail in identical manner. Check the changes in the UI log file using a graphical diff tool, once. (no need to look at every test since they are grouped together as identical by TextTest)
  4. Save the updated use cases and UI logs. (the spurious line “book selected animals” is removed from the use case files since the button no longer exists)
  5. Run the tests again. All pass.

The new UI log file looks like this:

‘book animal’ event created with arguments ‘Guicho’

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box (checked) | Guicho | gelding
-> Check box | Misty | mare

‘choose procedure’ event created with arguments ‘milking’

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box | Misty | mare

‘choose procedure’ event created with arguments ‘abdominocentesis’

Updated : available animals with columns: is booked , available animals , animal type
-> Check box | Good Morning Sunshine | mare
-> Check box | Goat 3 | goat
-> Check box | Goat 4 | goat
-> Check box | Guicho | gelding
-> Check box | Misty | mare

‘quit’ event created with arguments ”

It is quite explicit that Guicho is marked as booked before he disappears, and not checked when he comes back. Updating the UI map file was very easy – we viewed it in a graphical diff tool, noted the new column for the checkbox and the lack of the list of booked animals were as expected, and clicked “save” in TextTest.

I only actually had like 5 tests, but updating them to cope with the changed UI was relatively straightforward, and would still have been straightforward even if I had had 600 of them.

I’m quite pleased the way PyUseCase coped in this case. I really believe that with this tool you will be able to write your tests once, and they will be able to survive many generations of your UI. I think this toy example goes some way to showing how.

Geoff has been working really hard for the past few months, writing pyUseCase 3.0. It has some very substantial improvements over previous versions, and I am very excited about it. He’s written about how it works here.

It’s a tool for testing GUIs with a record-replay paradigm, that actually works. Seriously, you can do agile development with these tests, they don’t break the minute you change your GUI. The reason for this is that the tests are written in a high level domain language, decoupled from the actual current layout of your GUI. The tool lets you create and maintain a mapping file from the current widgets to the domain language, and helps you to keep it up to date.

In a way it’s a bit like Robot, or Twist, or Cucumber, that your tests end up being very human readable. The main difference is the record-replay capability. Anyone who can use the application GUI can create a test, which they can run straight away. With these other tools, a programmer typically has to go away and map the user domain language of the test into something that actually executes.

The other main way in which pyUseCase is different from other tools, is the way it checks your application did the right thing. Instead of the test writer having to choose some aspects of the GUI and make assertions about what they should look like, pyUseCase just records what the whole GUI looks like, in a plain text log. Then you can use TextTest to compare the log you get today with the one you originally recorded when you created the test. The test writer can concentrate on just normal interaction with the GUI, and still have very comprehensive assertions built into the tests they create.

pyUseCase, together with TextTest, makes it really easy to create automated tests, without writing code, that are straightforward to maintain, and readable by application users. Geoff has been developing his approach to testing for nearly a decade, and I think it is mature enough now, and sufficiently far ahead of the competition, that it is going to transform the way we do agile testing.

😀

Today I listened to a presentation about “Scrum for Managers” from Jens Östergaard. He’s a big, friendly Dane who grew up in Sweden, and now lives in the UK. I first met Jens at XP2003 in Genoa, when he had just run his first successful Scrum project. These days he spends his time flying around the world, teaching Scrum courses and coaching Scrum Masters. (He’ll be doing 2 more CSM courses in Göteborg in the next 6 months, and speaking at Scandinavian Developer Conference).

One thing I noticed about his talk was that most things about Scrum hardly seem to have changed at all. Jens was largely using the same language and examples that are in the original books. The other thing that struck me was that Jens said nothing about the technical practices that are needed to make agile development work. In my world, you can’t possibly hope to reliably deliver working software every sprint/iteration if you havn’t got basic practices like continuous integration and automated testing in place. I asked Jens about this afterwards, and he said it was deliberate. Scrum is a project management framework that can be applied to virtually any field, not just software development. Therefore he didn’t want to talk about software specific practices.

When I first heard Ken Schwaber talk about Scrum (keynote at XP2002) I’m farily sure he included the XP developer practices. I can’t find my notes from that speech, but I remember him being very firey and enthusiastic and encouraging us to go out and convert the world to Scrum and XP (the word agile wasn’t invented then).

Scrum has been hugely successful since then. Today we had a room full of project managers and line managers who all knew something about Scrum, many of whom are using it daily in their organizations. Scrum is relatively easy to understand and get going with at the project level, and has this CSM training course that thousands of people have been on. These are not bad things.

I do think that dropping the XP development practices entirely from the description of Scrum is unhelpful. I chatted with several people who are having difficulty getting Scrum to work in their organizations, and I think lack of developer practices, particularly automated testing, is compounding their problems. I think a talk given to software managers needs to say something about how developers might need coaching and training in new practices if they are going to succeed with Scrum.