Archive for the ‘Review’ Category

I’ve known the author for many years and actually did some remote pair programming with him in early 2020 which we published as a video. I’m confident Adrian is very skilled at both pair programming and working with distributed development teams. I was pleased when he asked me to review this book and, (full disclosure), the publisher sent me a free copy.

Adrian’s advice is detailed, specific, and as the title implies, intensely practical. I won’t relate all of his advice here, of course, but a couple of things stuck out to me. 

Many of us these days work in cross functional teams and any given feature might need both some Java code and some Javascript code to be written. If you have a Java programmer and a Javascript programmer, they can pair together on a Javascript task, and after an initial slowdown while the Java programmer learns some Javascript, you can expect the overall team productivity on Javascript tasks to increase. He estimates it could take 2-3 weeks of daily pair programming sessions for that to happen. Over time you can build a truly cross-functional, flexible team.

Another situation where pairing can give superior results is when you have a rather complex programming task which two senior developers work on together. If they are skilled at pair programming they will likely be able to complete this complex task faster and with higher quality than either would alone.

As a technical coach, I was particularly interested that Adrian recommends any team starting out with pair programming to have a trainer/facilitator or coach to help them. He is clearly experienced in taking that role himself. He mentions that some programmers like working alone and don’t want to pair program at all. In that case you need a good coach who will create a safe space to try it and with tact and patience make it possible for all kinds of people to learn the necessary skills and techniques. He does note that some people never take to it though. I would have liked a few more words there about what (if anything) you can do about that.

The book is structured into three parts, the first part is about pair programming generally and the second part is about remote pair programming. The third part is a kind of appendix with lots of detail about tools. When I read the first chapter in part 2 I suddenly felt like it should have been the first chapter of the whole book. It gives a good summary of almost all the topics that appear elsewhere. This was where he was persuasive about all the good reasons to do pair programming, along with strategies for when to use specific styles and techniques. I found this was the most useful part. It gave me a kind of index for looking up interesting information in the rest of the book.

Some of the detailed advice in subsequent chapters didn’t interest me very much. There was really a huge amount of advice about setting up lighting and sound and remote collaboration tools. I would have preferred Adrian to spend more time illustrating the various pairing styles and techniques. I’m still a little hazy about the difference between the Pairing-Trainee technique and the Beginner-Advanced technique. That seems to me to be advice that will still be useful and relevant when the differences between Zoom and Google Meet in early 2021 are long forgotten.

My personal advice – read chapter 4 to find out Adrian’s overall approach. Read chapter 6 to find out more about remote styles and techniques. Refer to chapter 3 for more detail on any techniques you’re less familiar with. Chapters 1 and 2 have relevant background information about pair programming in general. Read the last section before the ‘summary’ in chapter 5 and marvel at the sheer amount of technology and hardware that Adrian prefers for his personal remote pair programming setup, then read any more of chapter 5 and 7 that you need to achieve the equivalent for yourself. 

A book for managers, about architecture

Ruth Malan re-stated Conway’s law like this: “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins”. As a technical leader, that really caught my attention. The sphere of things I should be able to influence in order to do my job well just grew! 

Team Topologies book on my desk with some post-its around it

I read that quote in the book “Team Topologies” by Matthew Skelton and Manuel Pais. I think it’s a really interesting book and introduces several ideas that were new to me at least. They deserve a wide audience within the software industry. In this article I’ll give you some highlights, and hopefully tempt you to read the book for yourself.

This first idea perhaps isn’t so novel, but it is the basis for the rest, so worth mentioning first. You should use the team as the fundamental building block for your organization. Rather than thinking about hierarchies or departments that get re-organized from time to time, think about building close-knit, long-lived teams. Adopt a team structure that is aligned to the business and adapts and evolves over time. 

Conway’s law

When designing a team structure for your organization you should take into account Conway’s law. Your team structure and your software architecture are strongly linked. You should design team responsibilities and communication pathways. Skelton and Pais suggest there are four fundamental team topologies, ie kinds of team. The main one is ‘business stream-aligned’, the other three have supporting roles. The way they describe a stream-aligned team seems familiar to me. It’s like a DevOps team or a feature team. I think the name is better though. It’s a team that’s aligned to a single, valuable stream of work. They do everything needed to deliver valuable software to customers, gather feedback from how it’s used, and improve the way they work.

The other three kinds of team exist to reduce the burden on those stream-aligned teams. Skelton and Pais introduce the idea that good organization design requires restricting the amount of ‘cognitive load’ each team is expected to handle. Software systems can become large and complex and can require an overwhelming amount of detailed knowledge to work on them. This really resonated with me. I’ve worked on teams where the amount of software we had to look after was just too much to cope with. Important work was delayed or missed because we were overburdened.

The other three types of team are: 

  • Platform 
  • Complicated Subsystem
  • Enabling

There is a fuller description of each in the book of course, but to summarize – platform teams provide a self-service tool, API or software service. Enabling teams support other teams with specialized knowledge in particular techniques and business domains. Complicated subsystem teams look after a particular component of the software that needs specialized knowledge, like an optimizer or a machine learning algorithm. 

Following from the idea of Conway’s law in particular, is the idea that you should have only three ‘interaction modes’ between teams. Restrict the communication pathways to get the architecture you want, and to avoid unnecessary team cognitive load. Skelton and Pais suggest teams should work together in one of three ways:

  • Collaborate
  • Facilitate
  • Provide X-as-a-Service

Collaboration means two teams work closely together, have a common goal, and only need to work this closely for a limited time. (Otherwise the teams would merge!) Facilitation is about one team clearing impediments that are holding back the other team. X-as-a-service is a much looser collaboration, where one team provides something like a software library or api to another team. The way teams interact with one another will evolve over time, and consequently your organization will also evolve.

I thought it was a good sign that I could imagine how my own work would fit into an organization designed this way. I think I would fit well into an Enabling team that would support stream-aligned teams through facilitation. We would work with several teams over time. My particular role is to facilitate a team to clear impediments around technical debt and code quality, and learn skills like Test-Driven Development. 

Team Topologies really does make organizational design feel like you’re doing architecture. Skelton and Pais have pretty diagrams throughout their book with colours and patterns and lines and boxes describing various organizational designs. It’s all very attractive to a software developer like me. I think the intended audience is managers though. People who are designing organizations today. I really hope some of them read this book and are inspired to involve technical people in important organizational decisions.

I was in Finland recently, at the European Testing Conference. I both attended the conference and presented a workshop about “Approval testing with TextTest“. I won’t say any more about that, since Ben Linders did a brilliant write-up already that was published on InfoQ. There were several other highlights, and I wanted to just share a paragraph or so about each.

Mob Testing is what happens when your development team decides to work together on testing tasks as a Mob. I took part in a workshop where Maaret Pyhäjärvi facilitated two different mobbing exercises, one where we automated some UI tests using Selenium, and one where we practiced Test-Driven Development on the FizzBuzz kata. I have already done some Mob Programming and this felt very similar, except the focus was on developing tests rather than production code. It seems to have similar benefits – you have access to all the knowledge of everyone in the team, and you can learn things you didn’t even know to ask about. It makes pairing seem like a slow way to share good working practices.

JUnit 5 is on the horizon, and has several useful improvements over the previous version. Generally the syntax clutter is reduced, and the way you create parameterized tests has been overhauled. The most significant change though, (especially for people like me who work on developing other testing tools), seems to be that they’re designing the test-running engine to be separated so you can re-use it to run other kinds of tests. Any infrastructure that works with JUnit will then be able to run these other tests as well. In principle it opens up JUnit’s success as a platform, to be re-used by other test frameworks. Thanks Nicolai Parlog for this useful summary of the next generation of one of the most widely-used tools in the Java world.

Joel Hynoski has worked at many of the tech giants in our industry, including Google, Twitter, Apple, and now Lyft. He spoke about some of the engineering challenges they had overcome, specifically in the area of testing. One thing I liked was their tool that detects flaky tests, and puts them in ‘jail’. (A flaky test is one that sometimes passes and sometimes fails, when run against the same code. They are a pain and can be a huge waste of time.) When a test is in ‘jail’, that means it’s no longer run in the build pipeline, so it doesn’t block new releases. It instead gets flagged as needing maintenance. They then have a SLA that says how long a test is allowed to remain in jail before an engineer needs to look at it and fix the flakyness – a day or two I think.

I can feel a little in awe of someone who has worked in those kinds of famous engineering organizations, working at web-scale with some of the best developers in our industry. What I found most encouraging about talking to Joel, was that he was very down to earth about the problems these organizations face. They still battle with legacy code, despite it often only being a few years old. They have trouble creating reliable automated tests. The developers don’t always trust the test automation. They still have production bugs and hotfixes…

Alex Schladebeck spent the first ten minutes of her presentation giving a splendid rant about the bad reputation of UI testing. To summarize: (criticisms she hears about UI tests -> her responses)

UI tests give slow feedback -> and valuable feedback, doesn’t have to be after every build
need more infrastructure/machines -> yes, deal with it
they’re the top of the test pyramid -> they are in the pyramid! you can’t ignore them. They find different stuff than unit tests. Consider your context.
they’re flaky -> they’re not as bad as they used to be! Could be your app isn’t designed for testabilty? Could be your test design is poor?
they cause lots of work when small changes in your app -> that happens in development work too! Also, happens more if you design them badly.

She then went on to give some excellent advice about how to design your UI tests. It was mostly about layering your test code in different levels of abstraction, and getting a good collaboration going between developers and testing specialists.

Conferences are about meeting people and the organizers of this conference had very deliberately scheduled sessions to encourage this. We had a ‘speed dating’ session where you talk to about 8 random people for five minutes each. We had a ‘lean coffee’ session, where all the speakers were each asked to facilitate a discussion table. I thought this worked particularly well as a way to find people with similar interests, and get them to talk about their experiences. The hands-on workshops were all at the same time, so you had to go to one and not just attend talks all the time. There was also an open space scheduled when it would not clash with any other kinds of sessions. I thought all this together made for a pretty welcoming conference where you were bound to get to know new people.

Overall I had a really good time at this conference and I’d recommend it to both testers and developers with a strong quality focus.

I blogged a while back about “Text-Based testing”, which is a variant of Test-Driven Development that I’ve used quite a bit. My husband, Geoff Bache, is developing several tools to support this style of development.

Recently, we met Llewellyn Falco and discovered the work he’s been doing with Approval Tests. We were all really excited to realize we’ve independently been working on something very similar. Llewellyn’s Approval Tests library is in some ways a unit-test version of Geoff’s tool TextTest which is probably more suited to integration or functional tests. What we’ve been calling “Text-Based Testing” I think is better described as “Text-Based Approval Testing”. I think it’s a particularly powerful technique for characterization tests of legacy code, and regression testing in general. Geoff’s latest tools also make it a viable approach for GUI testing, traditionally an area where people have difficulty doing pure TDD. (We’ll be talking about this at Eurostar in November.

I’ve written a fuller description of Approval Testing in a chapter of my work-in-progress book “Mocks, Fakes and Stubs”, but I’ll summarize here. In classic Test-Driven Development, you begin by defining a test case comprising three parts – “Arrange”, “Act”, “Assert”. The assertion generally takes the form assertEqual(expected, actual), and you calculate the expected value when you define the test case. Then you go away and implement the functionality, until the “actual” value matches “expected”, and the test passes.

With Approval Testing, you design the test case to the point of having “Arrange” and “Act”, but defer defining the “expected” value for the “Assert”. You take the approach of “I’ll know it when I see it”, and get on with implementing the code. When the actual value the code produces looks right, you “Approve” it – store the actual value in the test case.  So the assert statement becomes “assertEqual(approved, actual)”.

Text-Based Approval Testing
The value you approve on could be anything you can automatically diff the actual program output against – a file, a string, a screenshot, some json, contents of a database table… you name it. The thing is, plain text is wonderfully simple to diff, version control, merge, store, manipulate… and there’s a wealth of existing, well understood tools to do that. I guess that’s why Geoff’s tools only support plain text so far. His approach has always been that if your program produces output in a different format, you write a test fixture to convert it to plain text before you diff. Llewellyn’s tools have branched out more into diffing images and suchlike.

I think “Approval Testing” is a good name for the style of testing both Geoff and Llewellyn’s tools support. I like the implication that you explicitly approve the output from your program as correct, and use that as the basis for your test.

Other Approval Testers

Geoff and Llewellyn aren’t the only people using an Approval testing approach, either. Recently I led a workshop where we compared writing tests for the Gilded Rose Kata using both Cucumber and Approval Tests. Nat Pryce was there, and he later blogged about it. He speculates that Approval testing might solve some problems he’s seen with other kinds of testing. Nat has subsequently started developing a new approval testing tool, Pearlfish, so he can test his ideas.

There is also this recent screencast by Brett Slatkin from Google  who explains how he’s using an approval testing technique with image diffs to regression test his webapp. He says he finds this technique essential in a continuous delivery environment – these tests find bugs his other tests (both manual and automatic) miss entirely.

I have also found Approval testing to be a really useful technique, and I hope that simply having a good name for it will help people understand what it is. Perhaps you’ll realize it’s an approach you’ve already used, just without having a name for it. Or maybe you’ll be inspired to try out one of the tools I’ve mentioned.

For a little while now I’ve been collecting Refactoring Kata exercises in a github repo, (you’re welcome to clone it and try them out). I’ve recently facilitated working on some of these katas at various coding dojo meetings, and participants seem to have enjoyed doing them. I usually give a short introduction about the aims of the dojo and the refactoring skills we’re focusing on, then we split into pairs and work on one of these Refactoring Katas for a fixed timebox. Afterwards we compare designs and discuss what we’ve learnt in a short retrospective. It’s satisfying to take a piece of ugly code and after only an hour or so make it into something much more readable, flexible, and significantly smaller.

Test Driven Development is a multifaceted skill, and one aspect is the ability to improve a design incrementally, applying simple refactorings one at a time until they add up to a significant design improvement. I’ve noticed in these dojo meetings that some pairs do better than others at refactoring in small steps. I can of course stand behind them and coach to some extent, but I was wondering if we could use a tool that would watch more consistently, and help pairs notice when they are refactoring poorly.

I spent an hour or so doing the GildedRose refactoring kata myself in Java, and while I was doing it I had two different monitoring tools running in the background, the “Codersdojo Client” from http://content.codersdojo.org/codersdojo_client/, and “Sessions Recorder” from Industrial Logic. (This second tool is commercial and licenses cost money, but I actually got given a free license so I could try it out and review it). I wanted to see if these tools could help me to improve my refactoring skills, and whether I could use them in a coding dojo setting to help a group.

Setting it up and recording your Kata
The Codersdojo Client is a ruby gem that you download and install. When you want to work on a kata, you have to fiddle about a bit on the command line getting some files in the right places, (like the junit jar), then modify and run a couple of scripts. It’s not difficult if you follow the instructions and know basicly how to use the command line. You have a script running all the time you are coding, and it runs the tests every time you save a source file.

The Sessions Recorder is an Eclipse plugin that you download and install in the same way as other Eclipse plugins. It puts a “record” button on your toolbar. You press “record” before you start working on the kata.

Uploading the Kata for analysis
When you’ve finished the kata, you need to upload your recording for analysis. With the Codersdojo Client, when you stop the script, it gives you the option of doing the upload. When that’s completed it gives you a link to a webpage, where you can fill in some metadata about the Kata and who you are. Then it takes you to a page with the full final code listing and analysis.

The Sessions Recorder is similar. You press the button on the Eclipse toolbar to stop recording, and save the session in a file. Then you go to the Indutrial Logic webpage, log into your account, and go to the page where you upload your recorded Session file. You don’t have to enter any metadata, since you have an account and it remembers who you are, (you did pay for this service after all!) It then takes you to a page of analysis.

Codersdojo Client Analysis
The codersdojo client creates a page that you can make public if you want – mine is available here. It gives you a graph like this:

Screen Shot 2012-08-16 at 09.08.08

It’s showing how long you spent between saving the file, (a “move”), and whether the tests were red or green when you saved. There is also some general statistics about how long you spent in each move, and how many modifications there were on average. It points out your three longest moves, and has links to them so you can see what you were doing in the code at those points.

I think this analysis is quite helpful. I can see that I’m going no more than two or three minutes between saving the file, and usually if the tests go red I fix them quickly. Since it’s a refactoring kata I spend quite a lot of moves at the start where it’s all green, as I build up tests to cover the functionality. In the middle there is a red patch, and this is a clear sign to me that I could have done that part of the kata better. Looking over my code I was doing a major redesign and I should have done it in a better way that would have kept the tests running in the meantime.

Towards the end of the kata I have another flurry of red moves, as I start adding new functionality for “Conjured” items. I tried to move into a more normal TDD red-green-refactor cycle at that point, but it actually doesn’t look like I succeeded very well from this graph. I think I rushed past the “green” step without running the tests, then did a big refactoring. It worked in the end but I think I could have done that better too.

Sessions Recorder Analysis
The Sessions Recorder produces a page which is personal to me, and I don’t think it allows me to share it publicly on the web. On the page is a graph that looks like this:

anychart

As you can see it also shows how long I spend with passing and failing tests, in a slightly different way from the Codersdojo Client’s graph. It also distinguishes compiler errors from failing tests, (pink vs red).

This graph also clearly shows the areas I need to improve – the long pink patch in the middle where I do a major redesign in too large a step, and at the red bit at the end when I’m not doing TDD all that well.

The line on the graph is a “score” where it is awarding me points when I successfully perform the moves of TDD. Further down the page it gives me a list of the “events” this score is based on:

Screen Shot 2012-08-16 at 09.39.32

(This is just some of the events, to show you the kinds of things this picks up on.) “New Green Test” seems to score zero points, which is a bit disappointing, but adding a failing test gets a point, and so does making it pass. “Went green, but broke other tests” gets zero points. It’s clearly designed to help me successfully complete red-green-refactor cycles, not reward me for adding test coverage to existing code, then refactoring it.

There is another graph, more focused on the tests:

Screen Shot 2012-08-16 at 09.44.53

This graph has mouseover texts so when you hover over a red dot, it shows all the compilation errors you had at that point, and if you hover over a green dot it tells you which tests were passing. It also distinguishes “compler errors” from a “compiler rash”. The difference is that a “compiler rash” is a more serious compilation problem, that affects several files.

You can clearly see from this graph that the first part of the kata I was building up the test coverage, then just leaning on these tests and refactoring for the rest. It hasn’t noticed that I had two @Ignore ‘d tests until the last few minutes though. (I added failing tests for Conjured Items near the start then left them until I had the design suitably refactored near the end).

I actually found this graph quite hard to use to work out what I need to improve on. There seem to be three long gaps in the middle, full of compilation errors where I wasn’t running the tests. Unlike with the Codersdojo Client, there isn’t a link to the actual code changes I was making at those points. I’m having trouble working out just from the compiler errors what I should have been doing differently. I think one of these gaps is the same major redesign I could see clearly in the Codersdojo Client graph as a too big step, but I’m not so sure what the other two are.

There are further statistics and analysis as well. There is a section for “code smells” but it claims not to have found any. The code I started with should qualify as smelly, surely? Not sure why it hasn’t picked up on that.

Conclusions
I think both tools could help me to become better at Test Driven Development, and could be useful in a dojo setting. I can imagine pairs comparing their graphs after completing the kata, discussing how they handled the refactoring steps, and where the design ended up. Pairs could swap computers and look through someone else’s statistics to get some comparison with their own.

The Codersdojo Client is free to use, and works with a large number of programming languages, and any editor. You do have to be comfortable with the command line though. The Sessions Recorder tool only supports Java and C# via Eclipse. It has more detailed analysis, but for this Refactoring Kata I don’t think it was as helpful as it could have been.

The other big difference between the tools is about openness. The Sessions Recorder keeps your analysis private to you, and if you want to discuss your performance, it lets you do so with the designers of the tool via a “comment on this page” function. I havn’t tried that out yet so I’m not sure how it works, that is, whether you get feedback from a real person as well as the tool.

The Codersdojo Client also lets you keep your analysis private if you want, but in addition lets you publish your Kata performance for general review, as I have done. You can share your desire for feedback on twitter, g+ or facebook. People can go in and comment on specific lines of code and make suggestions. That wouldn’t be so needed during a dojo meeting, but might be useful if you were working alone.

Further comparison needed
On another occasion I tried out the Sessions Recorder on a normal TDD kata, and found the analysis much better. For example this graph of me doing the Tennis kata from scratch:

anychart (1)

This shows a clear red-green pattern of small steps, and steadily increasing score rewarding me for doing TDD correctly. Unfortunately I didn’t do a Codersdojo Client session at the same time as this one, for comparison. A further blog post is clearly needed for this case… 🙂