Archive for the ‘Coding Skills’ Category

Last week at Scottish Ruby Conference I chatted with Brian Marick about software design. The week before that, he had been in Göteborg for Scandinavian Developer Conference, and had spent a morning pair programming with Geoff on TextTest. I took the chance to ask Brian what he thought about text-based testing now that he had seen it in action.

Brian’s view seems to be that text-based testing may be effective as a testing technique, but it just doesn’t offer the design benefits you get with standard TDD. It doesn’t give you guidance about small-scale design decisions, or intice you to structure your code into really small methods and classes. He thought the TextTest codebase wasn’t bad, but that the methods were larger than he would prefer, some classes were doing too much, and some ideas were not expressed clearly.

I’ve previously read about this design ethic of having really really small classes and methods in Bob Martin’s book “Clean Code”. Bob recommends one or two line methods. In contrast, I believe Steve McConnell’s “Code Complete” advocates methods small enough to fit comfortably on one screen. Geoff’s design for TextTest seems to land at about 5 or 6 lines for a typical method, which is somewhere in between.

Brian said the main drawback of code structured largely into small classes and one and two line methods is that it is harder for people who are unfamiliar with the codebase to get to grips with it. You get lost in the trees, and can’t easily get an overview of the forest. The big benefit is that people who are familiar with the code can potentially make sweeping improvements through very small, localized changes.

I’ve had the opportunity to work on this kind of codebase recently, and my experience hasn’t been entirely positive. Just as Brian predicted, I’ve found it hard to get into the code and grasp what it is doing. All the methods are one or two lines, and call each other. My pair and I ended up creating a temporary file where we pasted a whole call chain of about 6 methods so we could see them all at once, and read them in the order they called each other. There were 3 defects hidden in that 15 or so lines of code, and it took us a day or so to identify and fix them all. Yet, the code had been TDD’d, and the methods were well named. On the surface it looked very good. It actually took me some time to convince myself that we genuinely had found a defect, and that we hadn’t just misunderstood what the code was supposed to do.

The trouble seemed to stem from the fact that almost all the tests were for the “happy path” and there were several edge cases they never considered. Also, in one case the test had stubbed out the answer that one of the lower-level methods would return, and provided an answer the real code never gave. It took a long time for us to add missing tests for edge cases, and localize the defects to particular methods in the long call chain.

I’m very interested in whether we would have found it easier to find the defects if the code had been structured as two or three 5 line methods instead of 6 one and two line methods. I’m also interested if we would have found the issues more easily if the code had been built with text-based testing, with fewer, more coarse grained tests, and a few log statements printing key intermediate values. I’m considering getting the original versions of the files out of git and refactoring them to see how else they could have looked.

I don’t want you to conclude that I am against building designs with really small methods, or TDD, or using stubs or anything like that. I think there is value in all these techniques, each can be done well or badly, and you make tradeoffs when you choose your approach. You still need design skills and testing skills, whether you’re doing TDD or text-based testing. If Brian Marick had built TextTest the design might well have turned out differently. I don’t know how much of that would have been because of his use of TDD, and how much because of his skill, and views on design.

I’m actually relishing the prospect of working on more code with very small methods, and using TDD to build on it. I’ve got loads to learn about software design and testing 🙂

In my current assignment, I’m taking the role of “developer-in-test”. I’m working in a large distributed development project, which is building new functionality on a large existing codebase. In practice, I work closely with the developers in the project and build automated tests for subsystems that previously had only manual tests. The developers can use these tests to support their work, and add new tests as they build new features.

My background is basically as a developer, so I have been reading up on testing. I found “Lessons Learned in Software Testing” by Kaner, Pettichord and Bach very helpful, and “Agile Testing” by Lisa Crispin and Janet Gregory helpful and also very thorough. I find it interesting that the authors of the first book started out as developers and now classify themselves as testers, while Lisa and Janet apparently always have called themselves testers, although they clearly write a fair amount of code as part of their work.

Dave Nicolette recently made a blog post “Merging the developer and tester roles” where he argues that Tester is just a specialization of Developer in the agile world, like DBA (DataBaseAdministrator) is a specialization of Developer. He argues that agile teams need to be staffed with generalizing specialists. That means anyone can turn their hand to any task that is currently needed, while still having some tasks they perform with more skill than others.

I like Dave’s viewpoint, it fits my experience. I can only write effective 2nd Quadrant tests, (business facing, support the team), if I understand what the developers need, and I do that best if I have done some development on that part of the system myself. To put it another way, I need to be just as competent at writing code as the other developers in the project I’m working in, but I also need additional skills to do with testing.

I like the term “developer-in-test” to describe a role writing and enabling 2nd Qudrant tests.

Having said all that, I’m not sure I agree with Dave that the Developer and Tester roles should always be merged. In my current assignment I’m also helping a group of testers, usability experts, technical writers and product owners to get going with exploratory testing. This testing falls into Q3 of the agile testing quadrants, and is quite different. You still need testing skills, but developer skills are mostly irrelevant. It’s much more about understanding what the user is trying to achieve with the system, and how they view it.

I think there is a role for non-coding testers in Q3 testing. However, I don’t think you’ll get far with Q3 unless you have the other quadrants well covered with automated tests. So I think the majority of work for a tester in an agile environment is still going to involve test automation. Only the biggest projects will be able to afford to have non-coding testers.

At GothPy yesterday, Geoff talked about code coverage and tests. Geoff has spent a lot of his evenings lately working on PyUseCase, and getting the test coverage up to 100%, (statement coverage), a feat which he achieved last week. The evidence for this is available for all to see on the texttest site, (which is updated daily, btw, so if it is not green and 100% the day you read this post, then clearly Geoff had a bad day yesterday).

I have limited experience of using coverage statistics to evaluate my tests, so it was interesting to hear Geoff summarize his findings. He thought it had been well worth the effort to get coverage to 100%, he’d found some bugs, some dead code, and improved his design along the way. Actually, saying he has 100% coverage is a statement that needs qualification. The tool he’s been using – coverage.py – has a feature whereby you can mark lines of code as # pragma: no cover, ie I don’t want this line counted for coverage purposes. So he’s marked 37 of 3242 lines like this.

The reason for excluding these lines is mostly practical – due to the nature of the tool you can’t test it automatically when it is in “interactive” mode without physically pressing the buttons yourself – so automated tests for that part are impossible. Some excluded lines are for error cases which should never occur, but for which it would be useful to have a good error message if they ever did.

Overall, Geoff thinks coverage is very useful to help you to identify

  • poorly tested areas of your code
  • mistakes in your tests
  • dead code
  • refactoring opportunities

The first one is obvious, but the others might take more explaination. Generally, each test is for a specific feature. If you think you have a test for a feature, but the code coverage shows the implementation of that feature not to be covered, then there is probably a mistake in your test.

Similarly, if your tests cover all your features and some code is not covered, maybe it’s not that important code at all, and could be safely removed. Geoff’s tests are not unit tests, they are testing the whole of PyUseCase, and that maybe makes a difference with this particular point. If I just had unit tests, and a piece of code wasn’t covered, I’m not sure I could as easily infer that it wasn’t needed as a part of a larger feature.

Refactoring opportunities can be identified from gaps in coverage too. The idea is that poorly tested code is a clue that it has other problems too. Perhaps you find two pieces of code are similar, and one copy has a gap in coverage. This could indicate they originate from copy-paste programming, and could be combined into one routine, with full test coverage.

Geoff had some tips for people who wanted to use coverage statistics to improve their tests.

  • Don’t design your tests around coverage. Write appropriate tests, and then measure coverage.
  • This applies even when working with coverage results. See the coverage report as containing clues for new tests, not commands.
  • Use “#pragma : no cover” in your code to be explicit about code that you decide not to try and cover. Review these periodically.
  • Don’t be fanatical about absolute numbers. Commands like “Aim for at least 85% coverage” are counterproductive. (You get what you measure).
  • It’s always good to increase feasible coverage. It’s sometimes better to spend your limited time on other things. But if you don’t measure, you can’t make that decision effectively.

These last points are mostly also made in an article by Brian Marick which is quite old (1997). Geoff found the article when he was researching the talk for GothPy, and thought it was very good, and fits his experience.

Inspired by Geoff’s talk, I spent some time today trying to get some coverage numbers for the code and tests I’m working on at present. Unfortunately it seemed to be a bit tricky to get the coverage tool to work. It’s not python, of course, and that may have something to do with it. Hopefully I’ll sort it out and be able to write a new blog post about my own experiences with coverage statistics sometime soon.

I gave this talk at JFokus this week in Stockholm. I thought people might be interested in a summary of my main points.

I’ve been using Selenium for web application testing for over a year now, on a couple of different projects. This talk is based on my experiences.

What is Selenium?

Selenium is actually an umbrella term for a suite of tools. The two tools in the suite I have been using are Selenium Remote Control (RC) ,which allows you to test your webapp as it runs in a broswer, from Java code., and Selenium IDE, which is a Firefox plugin that allows you to record tests.

Selenium’s killer feature is that it allows you to test your app while it is running in a browser, with a real browser javascript engine. This is particularly important for web applications with a lot of javascript and ajax. Selenium is also actively maintained, with version 2.0 having recently arrived at alpha release status.

I showed a demo of how to write a test using Selenium IDE to record browser interaction, then pasting that into Java code. The test doesn’t pass the first time you execute it, you have to tweak the code to replace xpaths with ids, and add a “wait” for the page to load at one point in the test. I find this is typical – Selenium IDE helps you to get started with your test, but there is still a lot of manual work to get it to pass.

The code you end up with is rather widget-oriented and low level, this is what I end up with in my example:


@Test
public void createNewPublisher() {
selenium.click("menuItemPNewPublisher");
selenium.waitForPageToLoad("30000");
selenium.type("name", "Disney");
selenium.type("address", "24 Stars Ave");
selenium.type("zipCode", "91210");
selenium.type("city", "Hollywood");
selenium.type("country", "USA");
selenium.type("contactPersonName", "Emily");
selenium.type("contactPersonEmail", "emily@test.com");
selenium.type("contactPersonPassword", "123456");
selenium.type("contactPersonRetypePassword", "123456");
selenium.click("save");
selenium.waitForPageToLoad("30000");

verifyEquals(selenium.getTable("Users.1.1"), "Emily");
}

This is all very well, you can basically read what it is doing, (open the “New publisher” page, fill in a form, then check the user table displays correctly), but I think we can do better.

Agile Testing Theory

Brian Marick came up with this idea of “Testing Quadrants”, which helps us to reason about what kinds of tests we need in an agile project. Selenium tests generally fall into the second quadrant – business facing tests that support the team. This implies they should be written in the language of the business, to help developers to understand what they are building. This is in contrast to the low-level, widget oriented script that selenium IDE produces.

It is better to discover a bug as soon after insertion as possible, and the faster the tests run, the more often you will run them. The usual rule of thumb says your tests must take less than a minute for you to run them as part of a TDD cycle, and less than 10 minutes for you to run them in the CI server. For a suite of selenium tests, even 10 minutes can be ambitious.

Selenium tests are fundamentally slow. They are slow because they run inside the browser using its javascript motor.

WebDriver

WebDriver (new in Selenium 2.0) uses a different model of interacting with your application under test. It uses the browser’s api instead of its javascript motor. This means tests execute faster and more reliably, at the cost of supporting fewer browsers. In addition, WebDriver has a well thought out, clean api, and built in support for the PageObject pattern. Having said that, WebDriver is a very recent addition to selenium, and currently has poor or no integration with Selenium IDE and Selenium Grid.

the Page Object pattern

The PageObject pattern is a way of organizing your test code. It tells you to model your test code after the page structure of your application. The test I listed earlier could be rewritten using this pattern, and might look something like this:


@Test
public void createNewPublisher() {
PublisherData newPublisher = new PublisherData("Disney")
.withAddress(
new AddressData("24 Stars Ave",
"91210", "Hollywood", "USA"))
.withContactPerson(
new UserData("Emily", "emily.bache@test.com",
"123456"));

NewPublisherPage page = new NewPublisherPage(selenium);
page.enterPublisherDetails(newPublisher);
ViewPublisherPage viewPage = (ViewPublisherPage)page.save();
viewPage.verifyPublisherDetails(newPublisher);
}

Here we are using a data struct to hold the form data, which tells you quite obviously that a PublisherData has a name, address and contact person. Then we have the Page Object for the NewPublisherPage, which uses selenium to navigate to the relevant page in the constructor, then has methods which allow us to interact with that page. When we as it to “save”, we navigate to a new page, in this case the ViewPublisherPage, which we can ask to validate the new publisher is displayed correctly.

This is an improvement over the original test in several ways. I think it is much more readable, we have a certain amount of type safety from the data struct, and if the page flow, or form data changes, there are obvious central places in the code to update all the tests. We can easily vary the form data in different tests in order to do data driven testing.

Scenario classes

However, in the application I’ve been working on, we havn’t used this pattern in quite this way. The application uses a lot of frames and ajax, and a page based model didn’t really fit. So we’ve been using the idea of “Scenario” classes instead of Page Objects. They fulfill the same purpose, but instead of modelling pages in your app, these classes group together domain actions, or services that your application provides. The Scenario classes hide all the details of how to use selenium to interact with the GUI and achieve an outcome a user is interested in. In this case, creating a publisher.


@Test
public void createNewPublisher() {
PublisherData newPublisher = new PublisherData("Disney")
.withAddress(new AddressData("24 Stars Ave",
"91210", "Hollywood", "USA"))
.withContactPerson(
new UserData("Emily", "emily@test.com",
"123456"));

PublisherScenarios scenarios =
new PublisherScenarios(selenium);
scenarios.createPublisher(newPublisher);

scenarios.verifyPublisherDetails(newPublisher);
}

Domain Specific Languages

So with the PageObject or Scenario patterns, you could say that my tests are written in a kind of Domain Specific Language, in that they are written in terms close to the domain of the application they are testing. The trouble is, this is still Java code, and business people generally can’t read Java. A more interesting DSL would enable communication between developers and business people.

This is where I turned to Fitnesse to help me. I found it pretty straightforward to put some Fitnesse tables over the top of this kind of Java code. Since the Java code is already organized in terms of the domain, it becomes natural to offer the same functionality in tables with meaningful names.

I chose Fitnesse because I wanted to try it out, but I think it would be equally easy to put Cucumber or Robot Framework or something like that on top, too. The whole point of these tools is that they allow you to easily write executable tests in a language anyone with knowledge of the domain can read. This is a huge benefit to developers when talking to business people.

Closing advice

Having said all this about how to make your Selenium tests more readable, robust and faster, I still find my Selenium test suite frustratingly slow and fragile (I havn’t used WebDriver enough yet to know if that is the case there too). Compared to PyUseCase, it is in the dark ages. (Unfortunately PyUseCase has no support for web applications, so there’s probably no point in mentioning it, actually. But it is a great tool!).

The difference between PageObjects and Scenario classes is not huge, but I like the latter, because they allow me to easily exchange Selenium for a WebService. My tests are written in terms of the services the application offers, and if I indeed implement those services as WebServices, I can call them directly from the tests. Using a WebService instead of selenium immediately makes my tests much faster and more robust.

The problem with testing against a WebService is the reason we chose Selenium in the first place – ajax and browsers are not covered.

So for web 2.0 application testing, I recommend both API tests and Selenium tests. Use Selenium to test the crucial parts of your GUI, but write most of your tests against some kind of WebService or API. Write your tests in terms a business user would understand, even in the Java code, and then it is easy to create a Domain Specific Language on top using another tool.

One of the things I like about GothPy is that lots of the people in the group enjoy writing code in their spare time, and like to share with us what they’re up to.

A little while back, Olof came up with this little tool, PyTDDmon, which is to help you when you’re doing TDD. Instead of your running your tests by hand, PyTDDmon sits in the background and runs them for you. When they are green, it has a little green blob, when they are red, it tells you how many are failing, and clicking on it gives you the stacktrace. Olof put together a screencast to show how it works.

Following Geoff’s presentation of PyUseCase at GothPy in December, he and Olof started to discuss whether it could be used to test PyTDDmon. Since it’s using the GUI library Tkinter instead of pyGTK, it didn’t look all that straightforward.

Of course, Olof developed the app using TDD in the first place, (definite own dog food eating going on!), and his unit tests have about 48% statement coverage. The parts with least coverage are GUI-related, so there seemed to be scope to improve matters by adding tests via the GUI.

Geoff sat down a few days ago with a branch of the PyTDDmon code, started trying to generalize PyUseCase to work with Tkinter, and to write some tests for PyTDDmon. (Geoff also likes writing code in his spare time 🙂

As I write, he’s just checking in his changes and a new suite of tests for PyTDDmon that use the new, improved PyUseCase, with (basic) Tkinter support! He says the hardest part was understanding Tkinter, and after a spot of refactoring, only an extra couple of hundred lines of code were needed to support it.

Subsequently adding tests for PyTDDmon was very easy (Geoff wrote most of them while I made biscuits with the children this afternoon 🙂 All he had to do to Olof’s code was assign names to some of the widgets, and sort out a problem with closing a window. (There was a listener for a mouse click that would destroy the window, before other listeners got a chance to do anything, such as record the click in a use case log…)

So now the statement coverage of the PyTDDmon tests is at 98%. I look forward to future, equally productive GothPy meetings and discussions!