Posts tagged ‘python’

In my last post I discussed some exercises put together by Luca Minudel. He was using them as part of a study into how developer’s skill at removing SOLID violations was related to their skill at TDD.

I initially did the exercises in Java, then translated them into Python so we could look at them in our local Python User Group meeting last week. What follows are my own opinions, but I must thank all the pythonistas who were at the meeting, and Andrew Dalke who couldn’t be there but was kind enough to share his opinions and code anyway. What I write below owes a lot to their input. I’m so lucky to be in this community!

We found that in Python, some violations of the Open-Closed Principle are much easier to handle than in Java or C#. You can monkeypatch, and exploit the fact that data is only private by convention. The advantage of being able to get code under test without changing it is of course that you reduce the need for risky refactorings where you unintentionally break the code and get no failing tests to alert you to it.

So the TirePressure example has a violation of the Open-Closed principle where it’s hard to change the specific Sensor used without opening up the existing code. In Python it was dead easy to get under test without modifying the code, because although the sensor is marked as private in the Alarm class, nothing in the language stops you from assigning to it. (See this explanation of how Python considers private data.)

Here’s what (some of) the test code looks like, without making any modifications to the production code: (using the testing framework py.test)

from tire_pressure_monitoring import Alarm

class StubSensor(object):

def __init__(self, pressures):
self.pressures = pressures

def pop_next_pressure_psi_value(self):
return self.pressures.pop()

def test_pressure_in_expected_range_doesnt_trigger_alarm():
alarm = Alarm()
alarm._sensor = StubSensor([18])
alarm.check()
assert not alarm.is_alarm_on()

def test_pressure_below_expected_range_triggers_alarm():
alarm = Alarm()
alarm._sensor = StubSensor([15])
alarm.check()
assert alarm.is_alarm_on()

This means you can quickly get some tests in place. You probably shouldn’t leave the tests like that though, since they are relying on implementation details of the class. This could make them fragile in
the face of refactoring –  we’d rather the test only relied on the public interface. These tests also use  hard coded numbers where it’s not obvious why those values are chosen – another sign the production code could be improved.

Similarly in the HTMLConverter example, there is a violation of the Open-Closed principle that makes it awkward to have the code read from a string instead of a file. One way to get it under test initially is to use monkeypatching. You just pass in a different implementation of the “open” method that doesn’t in fact open a file, but rather provides a file-like object constructed from a string. Since we have duck typing, the production code doesn’t notice the substitution. You do have to be careful to put the “open” method back to normal at the end of the test though, or the test will have side effects.

So for example:

from cStringIO import StringIO

import unicode_to_html_converter
from unicode_to_html_converter import UnicodeFileToHtmlTextConverter

class StubOpen(object):
def __init__(self, text):
self.text = text

def __call__(self, *args, **kwargs):
return StringIO(self.text)

def test_convert_to_html():
try:
stub_open = StubOpen("text to convert <>&\"\n")
unicode_to_html_converter.open = stub_open
converter = UnicodeFileToHtmlTextConverter("a filename that will be ignored by StubOpen")
# unfortunately this next line is being mangled by my syntax highlighter.
# I think you know what I mean though
assert "text to convert <>&"
" == converter.convert_to_html()
finally:
unicode_to_html_converter.open = open

Again you can quickly get some tests in place, at the cost of a test that is somewhat awkward to read. The other way to get the code under test without changing it is the same as for C# or Java – put the text in a temporary file that is deleted afterwards. This may be simpler to understand, but will be slower to execute. It’s a tradeoff.

I guess the thing with these exercises is that you can get them under test in various ways without correcting the SOLID violations, or the other problems in the code. The idea is that skilled developers will not leave it at that. They will listen to the feedback the tests give them about the code being unecessarily hard to test, and respond by improving the design.

Last night at GothPy we had a play with Django, a web application framework for Python. I’m fairly familiar with Rails, so it was interesting to see a different take on solving the same kinds of problems.

I downloaded Django for the first time when preparing for the meeting, and spent about a day going through the tutorial and trying stuff out. At the meeting were a few people who have used Django before, notably Mikko Hellsing, who has worked with it daily for several years.

It was so much easier learning the tool from people who knew it than by reading the documentation. Constant code review and real time answering of questions is just brilliant for learning.

At the meeting we decided to implement a very very simple web application, similar to the one that Johannes Brodwall performed as a Kata at XP2010 together with Ivar Nilsen. He calls it “Java EE Spike Kata” since he does it in Java with no particular web frameworks, just what comes with Java. (There is a video of him doing it on his blog, and sample solution here on github).

I thought we should be able to implement the same little application in any web framework, and it might be interesting to see differences, so I though we should try doing it in Django. It just involves two pages. One page “Add User” which lets you create a new user and save it to the database, and another page “Search User” which lets you search for users, and presents results. So the scenario is to create a user, search for them, and see they are returned.

When I work on a problem in Rails I usually start with a Cucumber scenario for the feature, and I discovered there is a python version of Cucumber called Lettuce. We could have course have just used Cucumber with python, but given the big “WARNING – Experimental” notice Aslak wrote on this page, I thought we could give Lettuce a try.

So at the meeting we all worked together with one laptop and a projector, (me at the keyboard), and we started with a Lettuce scenario. We implemented step definitions using Django’s test Client, which is a kind of headless browser that understands how to instrument a Django application. Then we spent a couple of hours writing Django code together until the scenario was all green.

The code we ended up with isn’t much to write home about, but I’ve put it up on github here.

What we learned from this exercise
Of course since I know Rails much better, I found it interesting to compare the two web frameworks. Django seems similar in many ways. It was dead easy to get up and running, it gives you a basic structure for your code, and separates the model from the presentation layer.

The O-R mapping looks quite similar to ActiveRecord in the way you declare model classes backed by tables in the db. The presentation layer seems to have a different philosophy from Rails though. The html view part is rather loosely coupled to your application, and doesn’t allow you to embed real python code in it, just basic control structures.

You hook up the html to the controller code using url regular expression matching. I was a little confused about exactly how was supposed to work, since what I considered controller code was put in a file called “views.py”. Most of the code we wrote ended up in here, and I feel a bit unhappy with it. It seems to be a mixture of stuff at all levels of abstraction. The Django Form objects we used seemed quite powerful though, and reduced the amount of code we had to write.

The biggest difference I noticed compared with Rails was how explicit Python is about where stuff comes from. You always have to declaratively import or pass stuff before you can use it, so I found it straightforward to follow the connections and work out which code was executed and where it came from. I think that is because of Python’s philosophy of strict namespaces, which Ruby doesn’t have so much of.

I also liked the way Django encourages you to structure a larger web application into a lot of independent smaller ones, and lets you include and re-use other people’s small apps. Rails has plugins too of course, but I thought Django’s way made it seem very natural to contribute and re-use code.

Comparing Lettuce to Cucumber, they look almost the same, (by design). One small difference I found was that Cucumber step definitions don’t care if you write “Given”, “When” or “Then” in front of them, whereas Lettuce did. So I had steps like this:

Then I should see "results found"
And I should see "Name"

where the first step passed and the second step was reported as unimplemented by Lettuce. So Lettuce need a little more work to be really usable.

I was also pretty disappointed by the Django test client for implementing test steps. It seemed to interact with pages at an abstraction layer lower than I am used to – at the level of making post and get requests and parsing html as a dom. I missed Capybara and its DSL for interacting with web pages. I couldn’t find any equivalent. I probably should have turned to Selenium directly, but since we had no client side javascript it seemed overkill. (Francisco Souza has written about using Lettuce with Selenium compared with the Django test client here).

When it comes to unit-level testing, Django provides an extension to the normal unittest tool that comes with python (unittest was originally based on JUnit). We didn’t do much with it at the session, and it seemed to work fine. It’s nothing like RSpec though, and I miss that expressiveness and structure for my tests whenever I work in Python.

Overall
All in all it was fun to look at Django with a group and to get some really helpful comments and insights from people who know it far better than I do. The Kata we chose seemed to work ok, but I should really do it again in Rails since I spent the whole time comparing Django with Rails in my head, not Django with Johannes’ Java code 🙂

My conclusions are basically that Django looks like a good alternative to Rails. It would take time to learn, and surely has strengths and weaknesses I can’t really evaluate from one short session looking at it. However, I’d fairly sure I’d have to do some work improving the testing tools if I was going to be happy working with it for real.

At GothPy yesterday, Geoff talked about code coverage and tests. Geoff has spent a lot of his evenings lately working on PyUseCase, and getting the test coverage up to 100%, (statement coverage), a feat which he achieved last week. The evidence for this is available for all to see on the texttest site, (which is updated daily, btw, so if it is not green and 100% the day you read this post, then clearly Geoff had a bad day yesterday).

I have limited experience of using coverage statistics to evaluate my tests, so it was interesting to hear Geoff summarize his findings. He thought it had been well worth the effort to get coverage to 100%, he’d found some bugs, some dead code, and improved his design along the way. Actually, saying he has 100% coverage is a statement that needs qualification. The tool he’s been using – coverage.py – has a feature whereby you can mark lines of code as # pragma: no cover, ie I don’t want this line counted for coverage purposes. So he’s marked 37 of 3242 lines like this.

The reason for excluding these lines is mostly practical – due to the nature of the tool you can’t test it automatically when it is in “interactive” mode without physically pressing the buttons yourself – so automated tests for that part are impossible. Some excluded lines are for error cases which should never occur, but for which it would be useful to have a good error message if they ever did.

Overall, Geoff thinks coverage is very useful to help you to identify

  • poorly tested areas of your code
  • mistakes in your tests
  • dead code
  • refactoring opportunities

The first one is obvious, but the others might take more explaination. Generally, each test is for a specific feature. If you think you have a test for a feature, but the code coverage shows the implementation of that feature not to be covered, then there is probably a mistake in your test.

Similarly, if your tests cover all your features and some code is not covered, maybe it’s not that important code at all, and could be safely removed. Geoff’s tests are not unit tests, they are testing the whole of PyUseCase, and that maybe makes a difference with this particular point. If I just had unit tests, and a piece of code wasn’t covered, I’m not sure I could as easily infer that it wasn’t needed as a part of a larger feature.

Refactoring opportunities can be identified from gaps in coverage too. The idea is that poorly tested code is a clue that it has other problems too. Perhaps you find two pieces of code are similar, and one copy has a gap in coverage. This could indicate they originate from copy-paste programming, and could be combined into one routine, with full test coverage.

Geoff had some tips for people who wanted to use coverage statistics to improve their tests.

  • Don’t design your tests around coverage. Write appropriate tests, and then measure coverage.
  • This applies even when working with coverage results. See the coverage report as containing clues for new tests, not commands.
  • Use “#pragma : no cover” in your code to be explicit about code that you decide not to try and cover. Review these periodically.
  • Don’t be fanatical about absolute numbers. Commands like “Aim for at least 85% coverage” are counterproductive. (You get what you measure).
  • It’s always good to increase feasible coverage. It’s sometimes better to spend your limited time on other things. But if you don’t measure, you can’t make that decision effectively.

These last points are mostly also made in an article by Brian Marick which is quite old (1997). Geoff found the article when he was researching the talk for GothPy, and thought it was very good, and fits his experience.

Inspired by Geoff’s talk, I spent some time today trying to get some coverage numbers for the code and tests I’m working on at present. Unfortunately it seemed to be a bit tricky to get the coverage tool to work. It’s not python, of course, and that may have something to do with it. Hopefully I’ll sort it out and be able to write a new blog post about my own experiences with coverage statistics sometime soon.

One of the things I like about GothPy is that lots of the people in the group enjoy writing code in their spare time, and like to share with us what they’re up to.

A little while back, Olof came up with this little tool, PyTDDmon, which is to help you when you’re doing TDD. Instead of your running your tests by hand, PyTDDmon sits in the background and runs them for you. When they are green, it has a little green blob, when they are red, it tells you how many are failing, and clicking on it gives you the stacktrace. Olof put together a screencast to show how it works.

Following Geoff’s presentation of PyUseCase at GothPy in December, he and Olof started to discuss whether it could be used to test PyTDDmon. Since it’s using the GUI library Tkinter instead of pyGTK, it didn’t look all that straightforward.

Of course, Olof developed the app using TDD in the first place, (definite own dog food eating going on!), and his unit tests have about 48% statement coverage. The parts with least coverage are GUI-related, so there seemed to be scope to improve matters by adding tests via the GUI.

Geoff sat down a few days ago with a branch of the PyTDDmon code, started trying to generalize PyUseCase to work with Tkinter, and to write some tests for PyTDDmon. (Geoff also likes writing code in his spare time 🙂

As I write, he’s just checking in his changes and a new suite of tests for PyTDDmon that use the new, improved PyUseCase, with (basic) Tkinter support! He says the hardest part was understanding Tkinter, and after a spot of refactoring, only an extra couple of hundred lines of code were needed to support it.

Subsequently adding tests for PyTDDmon was very easy (Geoff wrote most of them while I made biscuits with the children this afternoon 🙂 All he had to do to Olof’s code was assign names to some of the widgets, and sort out a problem with closing a window. (There was a listener for a mouse click that would destroy the window, before other listeners got a chance to do anything, such as record the click in a use case log…)

So now the statement coverage of the PyTDDmon tests is at 98%. I look forward to future, equally productive GothPy meetings and discussions!

Geoff has been working really hard for the past few months, writing pyUseCase 3.0. It has some very substantial improvements over previous versions, and I am very excited about it. He’s written about how it works here.

It’s a tool for testing GUIs with a record-replay paradigm, that actually works. Seriously, you can do agile development with these tests, they don’t break the minute you change your GUI. The reason for this is that the tests are written in a high level domain language, decoupled from the actual current layout of your GUI. The tool lets you create and maintain a mapping file from the current widgets to the domain language, and helps you to keep it up to date.

In a way it’s a bit like Robot, or Twist, or Cucumber, that your tests end up being very human readable. The main difference is the record-replay capability. Anyone who can use the application GUI can create a test, which they can run straight away. With these other tools, a programmer typically has to go away and map the user domain language of the test into something that actually executes.

The other main way in which pyUseCase is different from other tools, is the way it checks your application did the right thing. Instead of the test writer having to choose some aspects of the GUI and make assertions about what they should look like, pyUseCase just records what the whole GUI looks like, in a plain text log. Then you can use TextTest to compare the log you get today with the one you originally recorded when you created the test. The test writer can concentrate on just normal interaction with the GUI, and still have very comprehensive assertions built into the tests they create.

pyUseCase, together with TextTest, makes it really easy to create automated tests, without writing code, that are straightforward to maintain, and readable by application users. Geoff has been developing his approach to testing for nearly a decade, and I think it is mature enough now, and sufficiently far ahead of the competition, that it is going to transform the way we do agile testing.

😀