Archive for April, 2010

I’ve been working full time in Ruby now for about a month, and I think I’m beginning to get the hang of it. In many ways it’s not so different from Python.

I took some of the code I talked about struggling with in my last post and set about translating it into Python. I wanted to see whether the bugs would be more obvious in a language I was more familiar with. Unfortunately, before I had finished the translation, my machine died and refused to restart, complaining about a disk error… so I may well have lost all my work. (ARRRGH!!) Anyway, before that happened, I was getting the feeling that the bugs weren’t very obvious, even in a language I knew better.

What I did find, though, was that it was quite hard to make the methods as short in Python as they were in Ruby. I could make them that short, since the language has the necessary features, but when I did, they just stopped looking like good Python to me.

The other thing I found was that RSpec lets you write really readable, well-organized tests. Translated into unittest, they became a travesty of their former selves.

I’ve just listened to this talk by Gary Bernhardt, who is experienced in both Python and Ruby, knows both languages very well, and has clearly thought quite hard about these kinds of issues. He has even tried to write an RSpec clone for Python, called Mote. He says himself that it isn’t as good as RSpec, because Python lacks blocks and won’t let him monkeypatch core classes.

Anyway, about halfway through the talk, Gary shows this code example, first in Python:

    (obj.name
        for obj in (repository.retrieve(id)
                    for id in ids)
        if obj)

Then the equivalent code in Ruby:

    ids.map do |id|
      repository.retrieve(id)
    end.compact.map do |obj|
      obj.name
    end

Gary makes the point that the Ruby code is easier to read – you can follow it from top to bottom and see what it does. The thing is, I don’t think many people would write code like that in Python. I might write it more like this:

    objs = [repository.retrieve(id) for id in ids]
    objs = filter(lambda x: x, objs)
    names = [obj.name for obj in objs]

This code is three statements long, rather than one, and has two local variables (“objs” and “names”) that the other two code snippets lack. In real code, you would probably be able to come up with rather more descriptive names, drawn from the problem domain. When I compare this code with the Ruby, I don’t think it is any less readable. The call filter(lambda x: x, objs) is not as nice as Ruby’s “compact”, but on the other hand, I think the two additional local variables make it clearer what is going on.
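To check that the one-expression version and the step-by-step version really do the same thing, here is a small runnable sketch. The `Record` and `Repository` classes are hypothetical stand-ins: the talk only assumes a `repository.retrieve(id)` that can come back empty for an unknown id. `filter(None, objs)` is Python’s built-in shorthand for `filter(lambda x: x, objs)`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical domain object: the snippets only need a .name attribute.
    name: str


class Repository:
    # Hypothetical in-memory repository: retrieve() returns None for unknown ids.
    def __init__(self, records: Dict[int, Record]) -> None:
        self._records = records

    def retrieve(self, key: int) -> Optional[Record]:
        return self._records.get(key)


def names_in_one_expression(repository: Repository, ids: Iterable[int]) -> List[str]:
    # One nested generator expression, dropping objects that weren't found.
    return list(obj.name
                for obj in (repository.retrieve(id) for id in ids)
                if obj)


def names_step_by_step(repository: Repository, ids: Iterable[int]) -> List[str]:
    # The same logic with named intermediate results.
    objs = [repository.retrieve(id) for id in ids]
    objs = filter(None, objs)  # equivalent to filter(lambda x: x, objs)
    return [obj.name for obj in objs]


repository = Repository({1: Record("alice"), 2: Record("bob")})
print(names_in_one_expression(repository, [1, 3, 2]))  # ['alice', 'bob']
print(names_step_by_step(repository, [1, 3, 2]))       # ['alice', 'bob']
```

Both return the same list; the only difference is how much is packed into a single statement.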

I’m wondering whether the trouble I was having locating bugs in these small methods was because they were cramming so much into one statement, and almost completely lacking in local variables. That seems to be the Ruby way of doing things – maybe I will just get used to it and learn to read it just as well eventually? I guess I am going to find out!

Anyway, I’m really hoping the friendly support technicians manage to save the contents of my hard disk, because I want to use the code in an exercise at the upcoming Gothenburg Python Conference – GothPyCon. I am hoping to run a workshop where half the room gets the buggy code with small methods, and the other half gets the same code refactored into longer methods. You get half an hour to find the bugs, then swap to the other codebase and find them again. Afterwards I’d like to discuss which codebase was easier to debug, and/or split into pairs, re-implement the code from scratch, and see whether we can come up with other designs that solve the problem more elegantly and transparently.

If issues like this (design, refactoring and testing) interest you, I really hope you will come along!

Last week at Scottish Ruby Conference I chatted with Brian Marick about software design. The week before that, he had been in Göteborg for Scandinavian Developer Conference, and had spent a morning pair programming with Geoff on TextTest. I took the chance to ask Brian what he thought about text-based testing now that he had seen it in action.

Brian’s view seems to be that text-based testing may be effective as a testing technique, but it just doesn’t offer the design benefits you get with standard TDD. It doesn’t give you guidance about small-scale design decisions, or entice you to structure your code into really small methods and classes. He thought the TextTest codebase wasn’t bad, but that the methods were larger than he would prefer, some classes were doing too much, and some ideas were not expressed clearly.

I’ve previously read about this design ethic of having really, really small classes and methods in Bob Martin’s book “Clean Code”. Bob recommends one- or two-line methods. In contrast, I believe Steve McConnell’s “Code Complete” advocates methods small enough to fit comfortably on one screen. Geoff’s design for TextTest seems to land at about 5 or 6 lines for a typical method, which is somewhere in between.
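To make the difference in granularity concrete, here is a hypothetical invoice calculation (not taken from TextTest or from either book) written twice: once as a chain of one-line methods, and once as a single function of a few lines. Prices are assumed to be whole currency units, and the discount rule (10% off orders of 100 or more, rounded down) is invented for the example.

```python
from typing import List, Tuple

Line = Tuple[int, int]  # (quantity, unit price in whole currency units)


class InvoiceWithTinyMethods:
    # Very small methods: every step of the rule gets its own name.
    def __init__(self, lines: List[Line]) -> None:
        self.lines = lines

    def total(self) -> int:
        return self.subtotal() - self.discount(self.subtotal())

    def subtotal(self) -> int:
        return sum(self.line_total(line) for line in self.lines)

    def line_total(self, line: Line) -> int:
        return line[0] * line[1]

    def discount(self, amount: int) -> int:
        return self.ten_percent(amount) if self.qualifies(amount) else 0

    def ten_percent(self, amount: int) -> int:
        return amount // 10  # rounded down

    def qualifies(self, amount: int) -> bool:
        return amount >= 100


def invoice_total(lines: List[Line]) -> int:
    # The same rule in one medium-sized function, readable top to bottom.
    subtotal = sum(quantity * unit_price for quantity, unit_price in lines)
    discount = subtotal // 10 if subtotal >= 100 else 0  # 10% off orders of 100+
    return subtotal - discount


order = [(2, 30), (1, 50)]
print(InvoiceWithTinyMethods(order).total())  # 99
print(invoice_total(order))                   # 99
```

To follow `total()` in the first version you have to read several methods in call order; the second shows the whole rule at once, at the cost of fewer named concepts.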

Brian said the main drawback of code structured largely into small classes and one- and two-line methods is that it is harder for people who are unfamiliar with the codebase to get to grips with it. You get lost in the trees, and can’t easily get an overview of the forest. The big benefit is that people who are familiar with the code can potentially make sweeping improvements through very small, localized changes.

I’ve had the opportunity to work on this kind of codebase recently, and my experience hasn’t been entirely positive. Just as Brian predicted, I’ve found it hard to get into the code and grasp what it is doing. All the methods are one or two lines, and call each other. My pair and I ended up creating a temporary file where we pasted a whole call chain of about 6 methods so we could see them all at once, and read them in the order they called each other. There were 3 defects hidden in those 15 or so lines of code, and it took us a day or so to identify and fix them all. Yet the code had been TDD’d, and the methods were well named. On the surface it looked very good. It actually took me some time to convince myself that we genuinely had found a defect, and that we hadn’t just misunderstood what the code was supposed to do.

The trouble seemed to stem from the fact that almost all the tests were for the “happy path” and there were several edge cases they never considered. Also, in one case the test had stubbed out the answer that one of the lower-level methods would return, and provided an answer the real code never gave. It took a long time for us to add missing tests for edge cases, and localize the defects to particular methods in the long call chain.
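Here is a minimal sketch of how a stub can hide a defect in this way, using Python’s `unittest.mock`. `PriceList` and `Basket` are hypothetical classes invented for illustration: the real `PriceList.price_of` returns `None` for an unknown item, but the stub in the first test answers `0`, so that test passes even though the real code breaks.

```python
import unittest
from typing import Iterable, Optional
from unittest import mock


class PriceList:
    # Hypothetical lower-level collaborator.
    def price_of(self, item: str) -> Optional[int]:
        # Real behaviour: unknown items give None, never 0.
        return {"apple": 3, "pear": 4}.get(item)


class Basket:
    def __init__(self, prices: PriceList) -> None:
        self.prices = prices

    def total(self, items: Iterable[str]) -> int:
        # Defect: assumes price_of() always returns a number.
        return sum(self.prices.price_of(item) for item in items)


class BasketTest(unittest.TestCase):
    def test_unknown_item_with_stub(self) -> None:
        # The stub answers 0 for unknown items, something the real
        # PriceList never does, so this test passes and hides the defect.
        prices = mock.Mock(spec=PriceList)
        prices.price_of.side_effect = lambda item: {"apple": 3}.get(item, 0)
        self.assertEqual(Basket(prices).total(["apple", "kiwi"]), 3)

    def test_stub_disagrees_with_real_collaborator(self) -> None:
        # With the real PriceList, price_of returns None, and sum()
        # raises TypeError when it tries to add it to the running total.
        self.assertIsNone(PriceList().price_of("kiwi"))
        with self.assertRaises(TypeError):
            Basket(PriceList()).total(["apple", "kiwi"])
```

Both tests pass, which is the point: the stubbed test looks green while the second one shows what the real collaborator actually does.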

I’m very interested in whether we would have found it easier to find the defects if the code had been structured as two or three 5-line methods instead of six 1- or 2-line methods. I’m also interested in whether we would have found the issues more easily if the code had been built with text-based testing, with fewer, more coarse-grained tests, and a few log statements printing key intermediate values. I’m considering getting the original versions of the files out of git and refactoring them to see how else they could have looked.
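For comparison, here is a rough sketch of the text-based style, again using a hypothetical invoice calculation: the code logs a few key intermediate values, and one coarse-grained check compares the whole log with an approved text. TextTest works by comparing a program’s output files against approved versions; this in-process imitation uses only the standard `logging` and `difflib` modules and is not TextTest’s API.

```python
import difflib
import io
import logging
from typing import List, Tuple

log = logging.getLogger("invoice")


def invoice_total(lines: List[Tuple[int, int]]) -> int:
    # Hypothetical code under test, with log statements at key points.
    subtotal = sum(quantity * unit_price for quantity, unit_price in lines)
    log.info("subtotal=%d", subtotal)
    discount = subtotal // 10 if subtotal >= 100 else 0
    log.info("discount=%d", discount)
    return subtotal - discount


def run_and_capture(lines: List[Tuple[int, int]]) -> str:
    # Run the code and collect everything it logs as plain text.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    try:
        log.info("total=%d", invoice_total(lines))
    finally:
        log.removeHandler(handler)
    return stream.getvalue()


def differences(approved: str, actual: str) -> List[str]:
    # An empty list means the run matched the approved text exactly.
    return list(difflib.unified_diff(
        approved.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile="approved", tofile="actual"))


# In a real text-based test the approved text would live in a file that a
# person reviews and re-approves whenever the output legitimately changes.
APPROVED = "invoice subtotal=110\ninvoice discount=11\ninvoice total=99\n"

print("".join(differences(APPROVED, run_and_capture([(2, 30), (1, 50)]))) or "OK")
```

If a change alters an intermediate value, the diff shows the first log line that differs, which may help narrow a defect down to one step without a separate fine-grained test for each method.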

I don’t want you to conclude that I am against building designs with really small methods, or TDD, or using stubs or anything like that. I think there is value in all these techniques, each can be done well or badly, and you make tradeoffs when you choose your approach. You still need design skills and testing skills, whether you’re doing TDD or text-based testing. If Brian Marick had built TextTest, the design might well have turned out differently. I don’t know how much of that would have been because of his use of TDD, and how much because of his skill and his views on design.

I’m actually relishing the prospect of working on more code with very small methods, and using TDD to build on it. I’ve got loads to learn about software design and testing 🙂