Archive for the ‘Code Kata’ Category

Note: this article was first published on Praqma’s blog

Writing tests for ‘Theatrical Players’

When I read Fowler’s new ‘Refactoring’ book I felt sure the example from the first chapter would make a good Code Kata. However, he didn’t include the code for the test cases. I can fix that!

Martin Fowler recently published a new edition of his classic book ‘Refactoring’. The first chapter is a worked example of how he would go about refactoring a small piece of code that he needs to update with some new functionality.

This is a really common scenario faced by software developers daily. It’s impossible to anticipate every future change and enhancement, so the design of the code will often not be extensible in the direction you need to take it. Refactoring the design in order to make a proposed update easier is an essential skill.

I like using small exercises, called ‘Code Katas’, to practice these kinds of skills. I’ve got a whole collection of them in my book ‘The Coding Dojo Handbook’, and more on my GitHub page. When I saw Fowler’s example I felt it would make a really good kata. The particular refactoring techniques he demonstrates are techniques that many developers would benefit from learning and practicing.

I asked Fowler for permission to use his code, which he kindly granted, then I set about transforming this example into a kata.

Creating a new Code Kata

This is Fowler’s example JavaScript code, which produces a statement for a customer. The new functionality that’s needed is to produce the same information but formatted as HTML.

statement.js

function statement (invoice, plays) {
    let totalAmount = 0;
    let volumeCredits = 0;
    let result = `Statement for ${invoice.customer}\n`;
    const format = new Intl.NumberFormat("en-US",
        { style: "currency", currency: "USD",
            minimumFractionDigits: 2 }).format;

    for (let perf of invoice.performances) {
        const play = plays[perf.playID];
        let thisAmount = 0;
        switch (play.type) {
            case "tragedy":
                thisAmount = 40000;
                if (perf.audience > 30) {
                    thisAmount += 1000 * (perf.audience - 30);
                }
                break;
            case "comedy":
                thisAmount = 30000;
                if (perf.audience > 20) {
                    thisAmount += 10000 + 500 * (perf.audience - 20);
                }
                thisAmount += 300 * perf.audience;
                break;
            default:
                throw new Error(`unknown type: ${play.type}`);
        }
        // add volume credits
        volumeCredits += Math.max(perf.audience - 30, 0);
        // add extra credit for every five comedy attendees
        if ("comedy" === play.type) volumeCredits += Math.floor(perf.audience / 5);
        // print line for this order
        result += ` ${play.name}: ${format(thisAmount/100)} (${perf.audience} seats)\n`;
        totalAmount += thisAmount;
    }
    result += `Amount owed is ${format(totalAmount/100)}\n`;
    result += `You earned ${volumeCredits} credits\n`;
    return result;
}

As you can see, the logic for formatting the statement is all mixed up with the calculation logic, so as it stands it’s not straightforward to add the HTML formatting feature. What’s needed is to change the design and detangle the logic, via refactoring.
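One possible direction, sketched under assumptions rather than taken from Fowler’s worked solution: pull the pricing rules into a function that knows nothing about text, so that a plain-text renderer and an HTML renderer can share it. The names amountFor, renderPlainTextLine and renderHtmlLine are hypothetical, and the HTML markup is only illustrative.

```javascript
// Hypothetical sketch: calculation and presentation kept apart.
// The pricing rules are copied from statement(); the function names are invented.
function amountFor(playType, audience) {
    switch (playType) {
        case "tragedy":
            return 40000 + (audience > 30 ? 1000 * (audience - 30) : 0);
        case "comedy":
            return 30000
                + (audience > 20 ? 10000 + 500 * (audience - 20) : 0)
                + 300 * audience;
        default:
            throw new Error(`unknown type: ${playType}`);
    }
}

const usd = new Intl.NumberFormat("en-US",
    { style: "currency", currency: "USD", minimumFractionDigits: 2 }).format;

// Two renderers can now share the same calculation.
function renderPlainTextLine(name, amount, audience) {
    return ` ${name}: ${usd(amount / 100)} (${audience} seats)\n`;
}

function renderHtmlLine(name, amount, audience) {
    return `<tr><td>${name}</td><td>${usd(amount / 100)}</td><td>${audience}</td></tr>\n`;
}

console.log(renderPlainTextLine("Hamlet", amountFor("tragedy", 55), 55));
console.log(renderHtmlLine("Hamlet", amountFor("tragedy", 55), 55));
```

Getting from the original function to something like this in small, safe steps is the point of the kata, which is why the tests come first.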

Fowler mentions that the first step in refactoring is always the same – to ensure you have a solid set of tests for that section of code.

However, he did not include the test code he used in the book. (It’s a book primarily about refactoring, and I guess including test code would have taken attention away from that topic). To make this example into a Kata, I wanted to include some tests.

The testing approach I favour in this kind of situation is approval testing. The code exists already and it works, so the easiest way to add a regression test is to find some test data, exercise the code, and approve the result.

This is different from a test-driven development approach where you define the test case before the code exists. (You can use approval testing in that kind of scenario too, but that’s a topic for another article). I usually find it much quicker and easier to add an approval test to existing code than a classic assertion-based test case.
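To make the idea concrete, here is a minimal hand-rolled sketch of the approve-then-compare cycle, using only Node’s standard library. The verify helper and the ‘.approved.txt’ / ‘.received.txt’ file naming are assumptions made for illustration; real approval testing tools do considerably more (diff tools, reporters and so on).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: returns true when `received` equals the approved text.
// Otherwise it writes a .received.txt file for a human to inspect.
function verify(dir, testName, received) {
    const approvedPath = path.join(dir, `${testName}.approved.txt`);
    const receivedPath = path.join(dir, `${testName}.received.txt`);
    const approved = fs.existsSync(approvedPath)
        ? fs.readFileSync(approvedPath, "utf8")
        : null;
    if (approved === received) {
        fs.rmSync(receivedPath, { force: true }); // clean up any stale received file
        return true;
    }
    fs.writeFileSync(receivedPath, received);
    return false;
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "approval-"));
const output = "Statement for BigCo\n";

// First run: nothing has been approved yet, so the check fails.
const firstRun = verify(dir, "example", output);

// A human inspects the received file and approves it (here: by renaming it).
fs.renameSync(path.join(dir, "example.received.txt"),
              path.join(dir, "example.approved.txt"));

// Later runs pass as long as the output stays the same...
const secondRun = verify(dir, "example", output);
// ...and fail when behaviour changes.
const thirdRun = verify(dir, "example", "Statement for SmallCo\n");

console.log(firstRun, secondRun, thirdRun);
```

The important step is the middle one: a person looks at the received output and decides it is good enough to test against.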

Creating a first approval test

Fowler supplies us with some test data – a sample invoice:

invoice.json
{
  "customer": "BigCo",
  "performances": [
    {
      "playID": "hamlet",
      "audience": 55
    },
    {
      "playID": "as-like",
      "audience": 35
    },
    {
      "playID": "othello",
      "audience": 40
    }
  ]
}

And a list of plays:

plays.json
{
  "hamlet": {"name": "Hamlet", "type": "tragedy"},
  "as-like": {"name": "As You Like It", "type": "comedy"},
  "othello": {"name": "Othello", "type": "tragedy"}
}

The function that we want to test, statement, returns the information to send to the customer, formatted as a plain text string. This statement string is perfect to use as the approved value in the approval test.

This is the test case I designed. It uses the Jest testing framework:

statement.test.js
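const fs = require('fs');
// Assumption: statement.js exports the function and sits in ../src; adjust the path to your layout.
const statement = require('../src/statement');

test('example statement', () => {
   const invoice = JSON.parse(fs.readFileSync('test/invoice.json', 'utf8'));
   const plays = JSON.parse(fs.readFileSync('test/plays.json', 'utf8'));
   const statement_string = statement(invoice, plays);
   expect(statement_string).toMatchSnapshot();
});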

First, I read both data files and parse them into JavaScript objects. Then I call the function we want to test, ‘statement’, and store the result in the ‘statement_string’ constant. The last line of the test checks that this result matches the approved value: it compares the actual value against a stored value which I checked and approved earlier.

This is kept in another file:

__snapshots__/statement.test.js.snap
exports[`example statement 1`] = `
"Statement for BigCo
 Hamlet: $650.00 (55 seats)
 As You Like It: $580.00 (35 seats)
 Othello: $500.00 (40 seats)
Amount owed is $1,730.00
You earned 47 credits
"
`;

Is it called Snapshot or Approval testing?

The Jest testing framework calls the approved value a ‘snapshot’. I prefer to talk about ‘approval’ testing and comparing with an ‘approved’ value. I think these words get to the heart of what you’re doing – deciding what is good enough to test against.

‘Snapshot’ as a word just implies it will be updated again soon. I like to emphasize the human agency involved in inspecting and deciding whether to approve a new value or not.

Is this test good enough to support refactoring?

This test goes a long way to giving us the confidence we need to start refactoring the ‘statement’ code. It generates a full statement with test data that seems realistic enough for this context.

I think it’s a good idea to do a little more analysis of whether this test is sufficient to support the refactoring we want to do. One way to do that is to run the test and measure code coverage. Lines of code that are not covered could be refactored away, or have bugs inserted into them, without the tests failing.

The coverage data shows only one line of code which is not covered by this test:

Line 28 in statement.js:
throw new Error(`unknown type: ${play.type}`);

Our test case is pretty good, to only have one line uncovered. The test data supplied by Fowler includes different kinds of play that already exercise several paths through the code.
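For reference, Jest can collect this coverage data itself. A minimal configuration sketch, assuming the code under test lives in a ‘src’ directory (that path is my assumption here, so adjust it to your own layout):

```javascript
// jest.config.js
const config = {
    // Collect coverage on every test run.
    collectCoverage: true,
    // Assumption: source files live under src/.
    collectCoverageFrom: ["src/**/*.js"],
    // Print a summary table in the terminal and write an lcov report as well.
    coverageReporters: ["text", "lcov"],
};

module.exports = config;
```

Running Jest with the --coverage flag should give the same result without touching the configuration file.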

This code path is different, however. If you end up on this line then the function aborts, and doesn’t produce a statement. We can’t tweak the data in our existing test to make it also execute this line and still produce a statement.

What we need is an additional test, one that uses an unsupported play type. I invented some new data that continues the Theatrical theme:

new_plays.json
{
 "henry-v": {"name": "Henry V", "type": "history"},
 "as-like": {"name": "As You Like It", "type": "pastoral"}
}

invoice_new_plays.json
{
  "customer": "BigCoII",
  "performances": [
    {
      "playID": "henry-v",
      "audience": 53
    },
    {
      "playID": "as-like",
      "audience": 55
    }
  ]
}

This is the test case that uses them:

statement.test.js
test('statement with new play types', () => {
    const invoice = JSON.parse(fs.readFileSync('test/invoice_new_plays.json', 'utf8'));
    const plays = JSON.parse(fs.readFileSync('test/new_plays.json', 'utf8'));
    expect(() => {statement(invoice, plays)}).toThrow(/unknown type/);
});

This test is not an approval test; it’s a normal assertion-based test. It checks that the error is thrown and the message contains the string ‘unknown type’.

I could have instead used an approval test to check the error message. In this case I chose not to do that. The string to be checked is quite short, and there is little advantage in storing it in a separate file away from the test code.

Confidence in the tests

With that test added, we’re more confident we can use this test suite to support refactoring.

We have 100% coverage now, after all. Unfortunately, that is no guarantee that our tests are perfect, or even reasonably good. One thing I like to do is a spot of (manual) mutation testing to assess how good my tests are.

I edit the code to insert a bug, then run my tests. If they fail, then that’s a good sign. I might try adding bugs in several different places to increase my confidence.

When I try it on this code, I find any change that affects the way the statement is presented gives me a failing test. I can also easily get a nice diff showing exactly what I broke.

I’m now feeling pretty confident these tests will be good enough to support refactoring.

This is a good starting point for the kata. From here you can practice all the refactorings demonstrated by Fowler in his book, with a safety net to alert you if you make a mistake.

When you come to implement the HTML rendering feature, it should be straightforward to add a new test for it too.

Adding more languages to your tests

In the book, the example language is JavaScript, which is widely used and understood. I often work with teams who also use languages like Java, C# and Python, and the refactoring techniques are just as relevant in those languages. Once I had this JavaScript version working, I added a translation or two. One of the reasons I keep my exercises publicly available on GitHub is so that people can send me pull requests.

If you’d like to do this exercise in a programming language that’s not supported yet, you can fix that for everyone 🙂 I particularly like getting pull requests with translations of my exercises.

The published code kata

The starting point for this new exercise ‘Theatrical-Players-Refactoring-Kata’ is now published and available for download.

The first chapter of ‘Refactoring’ containing a worked solution to this problem is available as a free sample. You have every opportunity to practice and learn these testing and refactoring techniques now.

Happy coding!

A new card game to design Continuous Delivery pipelines

Note: This article was originally published on Praqma’s blog

What testing steps should you include in your Continuous Delivery pipeline? Don’t just string together existing manual processes – use simple, collaborative tools to design something better!

Creating a Continuous Delivery (CD) pipeline is a key development in an organization’s transformation to DevOps. A CD Pipeline covers all the activities needed to transform a code change made by a developer into updated software bringing value to users. Steps in the pipeline include building a new version of the software as well as testing and deploying it. Exactly what kinds of build, test and deployment steps will depend on many factors and there is no ‘one perfect pipeline’ which will suit all situations.

I created the card game ‘Pipeline’ as a quick and fun way to explore alternatives for a CD pipeline without actually building anything. The goal of the game is to design a pipeline for a given scenario and optimize the deployment lead time. You work in a small group and get to discuss what steps are needed and which order you want to do them in. You will run into design tradeoff decisions and may discover people have different risk tolerances. If you play the game a second time with a different scenario, or compare notes with another group, you can learn more about how different scenarios drive different decisions.

Playing the game should help you to build a real pipeline for the real software system you are working on. Building a CD pipeline needs specialist knowledge of particular tools and could be many weeks of work. Playing this game should help you to avoid some costly misunderstandings. It’s also a fun way to engage a small group for an hour or two while you think through the issues you’re facing.

Scenario cards

To play the game each group begins with a sketch of a guiding scenario. For example, imagine you work for a startup aiming to displace an established business, like a social media website or an online bank. The reason for the scenario is that it makes the discussions more concrete and productive. You can discuss some specific risks you need to mitigate through testing. With that in mind it’s easier to reason about how long to spend on each testing step, and also assess the potential impact of excluding a step entirely.

Knowing the deployment lead time of the competitor in the scenario helps guide the group to discussions about tradeoffs and where quality comes from. Can we leave out this slow testing step? Can we mitigate that risk another way?

Pipeline step cards

The ‘pipeline step’ cards form the majority of the cards in the deck. These are the ones you lay out on a table to define the structure of your pipeline. You can choose which steps are important and which can be optional or omitted entirely. You can put several steps in parallel to reduce overall lead time. You can choose to perform a step twice, both before and after building the deployable component for example, or to do some steps before the beginning of the pipeline. The deck includes a few blank cards so that you can invent completely new steps. Just put a new sticky note on one with a text explaining what the new step involves.

Game rule cards

You have a lot of freedom about how to design your pipeline, but in reality there are a few rules about how you must order the steps. For example, you can’t deploy to production before you’ve built a release candidate. The “Game Rule” cards help you to keep track of these rules. One of the cards reminds you to add sticky notes to each step in the pipeline with an expected execution time. You should estimate this – either how long you think it will take, or a timebox for how long it is allowed to take. If it’s a manual step this estimate should include the time spent waiting for the person to be available.
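The arithmetic behind the lead time can be sketched in a few lines. This is only an illustration, under the assumption that steps laid out in parallel take as long as the slowest of them and that stages run one after another; the step names and minute values below are invented for the example.

```javascript
// A pipeline is a list of stages that run in order.
// Each stage is a list of steps that run in parallel.
// `minutes` plays the role of the sticky-note estimate on each card.
function leadTimeMinutes(pipeline) {
    return pipeline.reduce(
        // A parallel stage takes as long as its slowest step.
        (total, stage) => total + Math.max(0, ...stage.map((step) => step.minutes)),
        0
    );
}

// Hypothetical example pipeline.
const pipeline = [
    [{ name: "build release candidate", minutes: 10 }],
    [
        { name: "automated test: unit test", minutes: 5 },
        { name: "static analysis", minutes: 8 },
    ],
    // For a manual step the estimate includes waiting for a person to be available.
    [{ name: "manual test: exploratory test", minutes: 120 }],
    [{ name: "deploy to production", minutes: 15 }],
];

console.log(leadTimeMinutes(pipeline)); // 10 + 8 + 120 + 15 = 153
```

Moving a step into a parallel stage, or dropping it, changes the total immediately, which is the kind of tradeoff the cards are meant to provoke discussion about.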

Review

It might take a group of 4-6 people an hour or so to design a pipeline for their scenario and estimate the overall deployment lead time. When this is done (or the time for this part of the exercise is used up!) you should bring out the ‘review’ card. Each group can discuss what the implications of having this pipeline would be for their startup business. How does your lead time compare with your competitor’s? You might discuss how deployment lead time relates to time to market, or whether you would be able to do Lean Product Development, or A/B testing.

Some of the fictitious competitors have rather short deployment lead times. That’s quite deliberate on my part. I’d like people to think about how the competitor is achieving that. What must their pipeline look like? What testing must they be doing differently? You may think they are taking too many risks, and they will be easy to out-compete. Or perhaps they have found a better way, and you’d better imitate them.

Some of the fictitious competitors have much longer deployment lead times. I’d like people to think about whether that is really necessary. Perhaps they have included heavyweight change processes that take time without providing any benefits.

In any case, it’s interesting to think about, and might help you make more informed choices when you come to design a pipeline for your production system. Playing this game could alert you to some of the real life pitfalls that often arise.

Notes for a facilitator

If you want to play this game with your team I hope it will be straightforward for you to organize. The instructions are on the box. Allocate between one and two hours for the exercise, and split people into groups of about 3-6 people, with one deck of cards per group. You will also need a block of small sticky notes and pens for each group.

Your job as facilitator is to make sure everyone understands the point of the exercise, follows the pipeline game rules, and correctly calculates their deployment lead time. You should keep time and answer questions as they come up.

One thing a group might do is to decide on a microservices architecture, or front and back ends that are deployed separately. They might want to design a different pipeline for each service or each deployable component. That’s harder to do with only one deck of cards. For the purposes of this exercise, ask them to pick only one pipeline to design to start with. If they complete it quickly, you could give them another deck of cards to design another one later on.

As facilitator you may want to be quite active in the ‘review’ part of the game. Ask leading questions about the pipeline design to get people to explain their reasoning. If there are several groups you’ll want to have them look at one another’s work. This will help people to understand that different scenarios might call for different pipeline design choices. Everyone should come away with a better understanding of how to design a CD Pipeline and the hard choices that sometimes have to be made. Remind them that what they are learning in this game could help you all to avoid costly mistakes when designing a CD pipeline for real.

Throughout, try to keep these goals in mind:

  • People should have fun
  • Each group should come up with a feasible Continuous Delivery pipeline for their scenario, and they should be able to explain why they designed it that way
  • People understand what deployment lead time is, and that it could be significant when it comes to business competitiveness

Acknowledgements and Thanks

The first idea for this game came from a fun exercise at a workshop at the European Testing Conference in February 2018, led by Lisa Crispin and Abby Bangser. They divided us into groups, then handed each a pile of cards showing names of steps you could include in a pipeline. For example, “automated test: unit test”, “manual test: exploratory test”, or “deploy step: deploy to performance testing environment”. We had to collaborate to lay out the cards on a table in the form of a pipeline, and estimate the overall lead time.

I really enjoyed the exercise. There is something about having cards to move about and wave at people that encourages discussion and group engagement. Some cards provoked lively controversy – I remember discussing whether to include a manual code review step before merging to master, with someone who was quite convinced it was essential.

Iterative improvements to the exercise

I liked the exercise so much I decided to adapt it for a workshop I lead about Testing in Continuous Delivery. As I did this exercise with different groups I tweaked it a little each time. I also did it once with Llewellyn Falco and incorporated some of his suggestions too.

As the game gradually grew more sophisticated, I realized it would be worth getting some custom cards printed, so I started working with Phable on a layout and design. My colleagues at Praqma did some play-testing of an early version, and contributed some useful ideas. I’d like to thank everyone who played earlier versions of this game with me along the way. It’s a better game because of your feedback!

The second version of the cards came about six months after the first print run, following many successful CD workshops. Some of the ‘scenario’ cards have a lead time measured in weeks, and I wanted to make it possible to model those pipelines too. I added a few cards with more traditional release activities like a ‘Go/No Go meeting’ and ‘Decide Release Version number’. I also took on board some feedback from Steve Smith that ‘Constraint’ cards were poorly named – the name is too reminiscent of the ‘Theory of Constraints’, which has nothing to do with this game. I now call them ‘Game Rule’ cards, which I hope you’ll agree is more accurate.

Get your own card deck

These card decks are available for sale from Praqma. Head over to the original version of this article – there is a “Buy Now” link at the end of it: https://www.praqma.com/stories/pipeline-card-game/

When you inherit difficult code it can take weeks to become productive. Having the right tools for the job and knowing how to use them makes a huge difference. These videos show you how.

Note: this post originally appeared here https://www.praqma.com/stories/advanced-testing-refactoring-techniques/

Sometimes you don’t know what a piece of code is supposed to do, but you do know that people are using it in production, and that it in some sense ‘works’. One approach I often use in this situation is Approval testing. It can get you test coverage quickly without having to understand the code.

Since you don’t know what the code is supposed to do, you can’t define in advance what results you expect. But, what you can do is run the code, accept whatever it does as ‘correct’, then invent scenarios that will exercise all the code branches. I’ve made a video of me doing just this on some rather hairy legacy code – The Gilded Rose Refactoring Kata. With the right tools the tests fall into place relatively easily.
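Here is a rough sketch of that idea. The legacyUpdate function is a made-up stand-in for code you don’t understand (it is not the Gilded Rose code), and describeAllCombinations is a hypothetical helper: it runs the code for every combination of the inputs you invent and records whatever comes out as text, ready to be approved.

```javascript
// Stand-in for legacy code we don't understand (hypothetical, NOT the Gilded Rose code).
function legacyUpdate(name, daysLeft, quality) {
    if (name !== "Vintage") {
        if (quality > 0) quality -= 1;
    } else if (quality < 50) {
        quality += 1;
    }
    daysLeft -= 1;
    if (daysLeft < 0 && name !== "Vintage" && quality > 0) quality -= 1;
    return `${name}, ${daysLeft}, ${quality}`;
}

// Hypothetical helper: call fn with every combination of the given argument
// lists and record each input together with whatever the code returned.
function describeAllCombinations(fn, ...argLists) {
    let combinations = [[]];
    for (const list of argLists) {
        combinations = combinations.flatMap((combo) => list.map((value) => [...combo, value]));
    }
    return combinations
        .map((args) => `[${args.join(", ")}] => ${fn(...args)}`)
        .join("\n") + "\n";
}

// Invented scenarios, chosen to reach each branch of the conditionals.
const received = describeAllCombinations(
    legacyUpdate,
    ["Plain", "Vintage"],   // names
    [0, 5],                 // days left
    [0, 1, 50]              // quality values
);

console.log(received);
```

The recorded text is accepted as ‘correct’ once, and any later change in behaviour shows up as a difference against it. Coverage tells you when you need to invent more input values.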

I’d like to credit Llewellyn Falco who showed me this way to solve this exercise.

I recorded a screencast in three parts. This is the first part.

Part 1: Introducing the Gilded Rose kata and writing test cases using Approval Tests

About the Gilded Rose code

One of the exercises I’ve used for years to help programmers improve their skills is the Gilded Rose Kata. It’s a refactoring kata – the code needs cleaning up and tests adding so you can build a new feature. That is a realistic scenario that programmers often face in everyday work, but this exercise adds a fantasy twist. The code you have to work with keeps track of various magical items stocked at the Gilded Rose establishment. The new feature concerns support for “Conjured” items that have slightly different magical properties from the other items. The scenario is just weird enough to be fun and just realistic enough to be a useful exercise.

I didn’t design the kata originally; that was Terry Hughes. I spruced up the code a little to make it a better exercise and added some extra instructions to get you going. I also translated the starting code into a few different programming languages and put it up on GitHub. In the 5 years since then I have been delighted to see how popular it’s become. I’ve had over 50 contributors chip in with various translations and improvements, and at least 800 people have forked the project and presumably had a go at the refactoring.

I think the appeal of the exercise is partly the wacky scenario it throws you into, and partly how utterly terrible the code is at the start. If you do the refactoring well it actually looks really neat at the end, which is very satisfying.

Lift-Up Conditional

When you inherit difficult code it can take weeks to become productive. I’d like to show you the difference it can make when you have the right tools for the job and know how to use them.

Once you’ve got good tests in place you can refactor much more confidently. In my previous post I showed how to get good tests using Approval Testing. I’m pretty confident in these tests, so I’ve made a second video showing some initial refactorings I’d do to get this code cleaned up a little.

One of the techniques I’m using is called ‘Lift-Up Conditional’. It’s a manipulation of a long complex conditional statement that will let you group together all the statements related to one particular conditional. I haven’t seen this particular refactoring described in the literature before – it was Llewellyn Falco who showed it to me originally. It’s perfect for the Gilded Rose code which basically comprises one big complex conditional.
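A small before-and-after sketch of the idea, as I’d summarise it in JavaScript. The shipping example is invented for illustration and is not taken from the video or from the Gilded Rose code; the video shows the actual steps using IntelliJ’s automated refactorings.

```javascript
// Before: the check on `express` is scattered through the logic.
function shippingBefore(order) {
    let cost = 0;
    if (order.weight > 10) {
        if (order.express) cost += 30; else cost += 12;
    } else {
        if (order.express) cost += 20; else cost += 5;
    }
    if (order.express) cost += 2;
    return cost;
}

// After lifting `express` up to the top level: everything that concerns
// express orders now sits together in one branch.
function shippingAfter(order) {
    let cost = 0;
    if (order.express) {
        if (order.weight > 10) cost += 30; else cost += 20;
        cost += 2;
    } else {
        if (order.weight > 10) cost += 12; else cost += 5;
    }
    return cost;
}

// Behaviour is unchanged for every combination of the two conditions.
for (const express of [true, false]) {
    for (const weight of [5, 15]) {
        const order = { express, weight };
        console.log(express, weight, shippingBefore(order), shippingAfter(order));
    }
}
```

With good tests in place, each step of a transformation like this can be checked by re-running them.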

The other star of this show is IntelliJ. It has a lot of automated refactorings that come together to make ‘Lift-Up Conditional’ easy and it makes really short work of cleaning up this code.

This is the second screencast in the series. My aim is to show that with the right tools and refactoring know-how you can quickly become productive with the code, even without fully understanding the byzantine business rules.

Part 2: Refactoring item logic using ‘lift up conditional’

Replace Conditional with Polymorphism

When you inherit difficult code it can take weeks to become productive. I’d like to show you the difference it can make when you have the right tools for the job and know how to use them.

Once you’ve got the code cleaned up to the point where you can see the parts of the logic that belong together, you can start to create a better class structure. A classic refactoring for this is ‘Replace Conditional with Polymorphism’ which was first described in Martin Fowler’s book ‘Refactoring’.

The basic idea is that you create subclasses to encapsulate the logic concerned with each logical case. Your design becomes much more flexible if you need to add new types that are variations on types that are already there – as in this case.
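A minimal sketch of the shape this refactoring produces, using invented item kinds and rules (these are not the actual Gilded Rose rules, and the class names are hypothetical):

```javascript
// Before: one conditional decides the behaviour for every kind of item.
function nextQualityBefore(item) {
    switch (item.kind) {
        case "aging":
            return Math.min(item.quality + 1, 50);
        case "fixed":
            return item.quality;
        default:
            return Math.max(item.quality - 1, 0);
    }
}

// After: each case lives in its own subclass.
class Item {
    constructor(quality) {
        this.quality = quality;
    }
    nextQuality() {
        return Math.max(this.quality - 1, 0);
    }
}

class AgingItem extends Item {
    nextQuality() {
        return Math.min(this.quality + 1, 50);
    }
}

class FixedItem extends Item {
    nextQuality() {
        return this.quality;
    }
}

// A new variation is now one new subclass; existing classes stay untouched.
class FragileItem extends Item {
    nextQuality() {
        return Math.max(this.quality - 2, 0);
    }
}

const items = [new Item(10), new AgingItem(10), new FixedItem(10), new FragileItem(10)];
console.log(items.map((item) => item.nextQuality()));
```

The calling code no longer needs to know which kind of item it has; it just asks each one for its next quality.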

This is the third screencast in the series. My aim is to show that with the right tools and refactoring know-how you can quickly become productive with this code, even without fully understanding the byzantine business rules.

Part 3: Replace Conditional with Polymorphism


I’ve been favouring an Approval Testing approach for many years now, since I find it pretty useful in many situations, particularly for acceptance tests. Not many people I meet know the term though, and even fewer know how to use the technique. Recently I’ve put together some small exercises – code katas – to help people to learn about it. I’ll be going through them at a couple of upcoming conference workshops, but for all you people who won’t be there in person, I’m publishing them on github as well.

I’ve got three katas set up now, Minesweeper, Yatzy and GildedRose. If you’ve done any of these katas before, you’ll probably have been using ordinary unit testing techniques. Hopefully by doing them again, with Approval Testing, you’ll learn a little about what’s different about this technique, and how it could be useful.

Before you can do the katas, you’ll need to install an approval testing tool. I’m one of the developers of TextTest, so that’s the tool I’ve set up right now. Below are some useful commands for a debian/ubuntu machine for installing it.

I’m still developing these exercises, and would like feedback about what you think of them. For example I have Python versions for all three, but only one has a Java version as yet. Do people want more translations? Do let me know how you get on, and what you think!

Installation instructions

You will need to have Python 2, and TextTest. (Unfortunately TextTest uses a GUI library that doesn’t support Python 3). For example:

$ sudo apt-get install python-pip
$ sudo pip install texttest

For more detailed instructions, and for other platforms see the texttest installation docs. For more general documentation, see the texttest website.

You need to have an editor and a diff tool configured for texttest to use. I recommend sublime text and meld. Install them like this:

$ sudo add-apt-repository ppa:webupd8team/sublime-text-3
$ sudo apt-get update
$ sudo apt-get install sublime-text-installer
$ sudo apt-get install meld

Then you need to configure texttest to use them:

$ cd
$ mkdir .texttest
$ touch .texttest/config
$ subl .texttest/config

Enter the following in that file, and save:

[view_program]
default:subl
[end]
[diff_program]
default:meld
[end]

For convenience, I also like to create an alias ‘tt’ for starting TextTest for these exercises. Change directory to one of the exercise repositories, then a ‘tt’ command should start the TextTest GUI and show the tests for that exercise. Define such an alias like this:

alias tt='texttest -d python -c .'

Two of the exercises start with a small test suite for you to build on. There should be instructions in the README file of each respective exercise, to help you to get going. If you really can’t work out what to do, have a look at the sample solutions and see if that helps. These are also on github: Minesweeper-sample-solution, Yatzy-sample-solution, GildedRose-sample-solution

I’ve been interested for a while in the relationship between TDD and good design, and the SOLID principles of Object Oriented Design in particular. I’ve got this set of 4 “Racing Car” exercises that I originally got from Luca Minudel, that I’ve done in coding dojos with lots of different groups. If you’ve never done them, I do recommend getting your editor out and having a go, at least at the first one. I think you get a much better understanding of the SOLID principles when you both know the theory, and have experienced them in actual code.

I find it interesting that in the starting code for each of the four Katas there are design flaws that make it awkward to write unit tests for the code. You can directly point to violations of one or more of the SOLID principles. In particular for the Dependency Inversion Principle, it seems to me there is a very direct link with testability. If you have a fixed dependency to a concrete class, that is always going to be harder to isolate for a unit test, and the Tyre Pressure exercise shows this quite clearly.
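To illustrate the link between a fixed dependency and testability, here is a sketch in JavaScript. It is written in the spirit of the Tyre Pressure exercise but is not its actual code: the class names, the readPressure method and the threshold values 17 and 21 are placeholders of mine.

```javascript
// A concrete sensor whose readings we cannot control from a test.
class RandomSensor {
    readPressure() {
        return 16 + Math.random() * 6;
    }
}

// Hard to unit test: the alarm creates its own concrete sensor.
class HardWiredAlarm {
    constructor() {
        this.sensor = new RandomSensor();
    }
    isOn() {
        const pressure = this.sensor.readPressure();
        return pressure < 17 || pressure > 21;
    }
}

// Easier to unit test: the alarm depends on "anything with readPressure()",
// and the caller decides which implementation to pass in.
class Alarm {
    constructor(sensor, low = 17, high = 21) {
        this.sensor = sensor;
        this.low = low;
        this.high = high;
    }
    isOn() {
        const pressure = this.sensor.readPressure();
        return pressure < this.low || pressure > this.high;
    }
}

// In a test we can pass a stub that always reports a known value.
const fixedSensor = (value) => ({ readPressure: () => value });

console.log(new Alarm(fixedSensor(10)).isOn()); // true: pressure too low
console.log(new Alarm(fixedSensor(19)).isOn()); // false: pressure in range
console.log(new Alarm(fixedSensor(25)).isOn()); // true: pressure too high
```

With HardWiredAlarm there is no way to make a test deterministic without reaching inside the class; with Alarm the test simply supplies the reading it wants.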

What bothers me about the 4 original exercises is that there are actually 5 SOLID principles, and none of the exercises really has a problem with the Liskov Substitution Principle. So I have designed a new exercise! It’s called “Leaderboard” and I’ve put it in the same git repository as the other four.

I tried it out last week in a coding dojo with my colleagues at Pagero, and it seemed to work pretty well. The idea is that the Liskov principle violation means you can’t properly test the Leaderboard class with test data that only uses the base class “Driver”; you have to add tests using a “SelfDrivingCar”. (Ok, I confess, I’ve taken some liberties with what’s likely in Formula 1 racing!) Liskov says that your client code (i.e. Leaderboard) shouldn’t need to know if it has been given a base class or a subclass; they should be totally substitutable. So again, I’m finding a link between testability and good design.
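A sketch of what such a violation can look like, in JavaScript. The class names Driver and SelfDrivingCar come from the exercise, but everything else here (the fields, the rankingLabel function and the suggested repair) is a hypothetical illustration, not the exercise’s code.

```javascript
class Driver {
    constructor(name, country) {
        this.name = name;
        this.country = country;
    }
}

class SelfDrivingCar extends Driver {
    constructor(algorithmVersion, company) {
        super(null, company); // a self-driving car has no driver name
        this.algorithmVersion = algorithmVersion;
    }
}

// Violation: the client has to know which subclass it was given.
// Tests that only use plain Driver objects never reach the first branch.
function rankingLabel(driver) {
    if (driver instanceof SelfDrivingCar) {
        return `Self Driving Car - ${driver.country} (${driver.algorithmVersion})`;
    }
    return driver.name;
}

// One possible repair: each class answers for itself, so the client
// treats every competitor the same way.
class Competitor {
    constructor(name) {
        this.name = name;
    }
    label() {
        return this.name;
    }
}

class RobotCompetitor extends Competitor {
    constructor(algorithmVersion, company) {
        super(`Self Driving Car - ${company} (${algorithmVersion})`);
    }
}

console.log(rankingLabel(new Driver("Alice", "SE")));
console.log(rankingLabel(new SelfDrivingCar("1.2", "Acme")));
console.log(new RobotCompetitor("1.2", "Acme").label());
```

In the repaired version, a test written against the base class exercises the same client code path that a subclass would.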

Currently the exercise is only available in Scala, Python and Java, so I’m very open to pull requests for translations into other programming languages. Do add a comment here or on github if you try my new Kata.