I was in Finland recently, at the European Testing Conference. I both attended the conference and presented a workshop about “Approval testing with TextTest“. I won’t say any more about that, since Ben Linders did a brilliant write-up already that was published on InfoQ. There were several other highlights, and I wanted to just share a paragraph or so about each.

Mob Testing is what happens when your development team decides to work together on testing tasks as a Mob. I took part in a workshop where Maaret Pyhäjärvi facilitated two different mobbing exercises, one where we automated some UI tests using Selenium, and one where we practiced Test-Driven Development on the FizzBuzz kata. I have already done some Mob Programming and this felt very similar, except the focus was on developing tests rather than production code. It seems to have similar benefits – you have access to all the knowledge of everyone in the team, and you can learn things you didn’t even know to ask about. It makes pairing seem like a slow way to share good working practices.

JUnit 5 is on the horizon, and has several useful improvements over the previous version. Generally the syntax clutter is reduced, and the way you create parameterized tests has been overhauled. The most significant change though, (especially for people like me who work on developing other testing tools), seems to be that they’re designing the test-running engine to be separated so you can re-use it to run other kinds of tests. Any infrastructure that works with JUnit will then be able to run these other tests as well. In principle it opens up JUnit’s success as a platform, to be re-used by other test frameworks. Thanks Nicolai Parlog for this useful summary of the next generation of one of the most widely-used tools in the Java world.

Joel Hynoski has worked at many of the tech giants in our industry, including Google, Twitter, Apple, and now Lyft. He spoke about some of the engineering challenges they had overcome, specifically in the area of testing. One thing I liked was their tool that detects flaky tests, and puts them in ‘jail’. (A flaky test is one that sometimes passes and sometimes fails, when run against the same code. They are a pain and can be a huge waste of time.) When a test is in ‘jail’, that means it’s no longer run in the build pipeline, so it doesn’t block new releases. It instead gets flagged as needing maintenance. They then have a SLA that says how long a test is allowed to remain in jail before an engineer needs to look at it and fix the flakyness – a day or two I think.

I can feel a little in awe of someone who has worked in those kinds of famous engineering organizations, working at web-scale with some of the best developers in our industry. What I found most encouraging about talking to Joel, was that he was very down to earth about the problems these organizations face. They still battle with legacy code, despite it often only being a few years old. They have trouble creating reliable automated tests. The developers don’t always trust the test automation. They still have production bugs and hotfixes…

Alex Schladebeck spent the first ten minutes of her presentation giving a splendid rant about the bad reputation of UI testing. To summarize: (criticisms she hears about UI tests -> her responses)

UI tests give slow feedback -> and valuable feedback, doesn’t have to be after every build
need more infrastructure/machines -> yes, deal with it
they’re the top of the test pyramid -> they are in the pyramid! you can’t ignore them. They find different stuff than unit tests. Consider your context.
they’re flaky -> they’re not as bad as they used to be! Could be your app isn’t designed for testabilty? Could be your test design is poor?
they cause lots of work when small changes in your app -> that happens in development work too! Also, happens more if you design them badly.

She then went on to give some excellent advice about how to design your UI tests. It was mostly about layering your test code in different levels of abstraction, and getting a good collaboration going between developers and testing specialists.

Conferences are about meeting people and the organizers of this conference had very deliberately scheduled sessions to encourage this. We had a ‘speed dating’ session where you talk to about 8 random people for five minutes each. We had a ‘lean coffee’ session, where all the speakers were each asked to facilitate a discussion table. I thought this worked particularly well as a way to find people with similar interests, and get them to talk about their experiences. The hands-on workshops were all at the same time, so you had to go to one and not just attend talks all the time. There was also an open space scheduled when it would not clash with any other kinds of sessions. I thought all this together made for a pretty welcoming conference where you were bound to get to know new people.

Overall I had a really good time at this conference and I’d recommend it to both testers and developers with a strong quality focus.

Note: This post first appeared on Pagero’s blog

One of the questions that Kent Beck asked when he was developing the eXtreme Programming development methodology, was what happens if we turn the dials up all the way to 10? Take a practice we know is good, and do more of it? Practices like Test-Driven Development and Pair Programming are what he came up with, starting from manual testing and code review.

In the same way, Continuous Delivery is what you get if you turn the dials to 10 on your annual release cycle. You get to the point that you are pushing out new code to users, many times a day.

“Shortening the release cycle like this has a lot of advantages, especially around risk and quality.”

LOWER RISK AND HIGHER QUALITY WITH SHORTER RELEASE CYCLES

Shortening the release cycle like this has a lot of advantages, especially around risk and quality. Basically, you’re decreasing the batch size, a well-known tenet of lean manufacturing. If each new release contains fewer changes, then you have fewer places to look when things go wrong, so finding bugs is easier. You also lower the risk that any individual batch has a defect in the first place. By having an engineering setup that allows you to make code changes at the drop of a hat and push them out to production easily, you facilitate getting fixes out quickly.

So the upshot is quality problems surface sporadically instead of all at once, and are more easily dealt with. It’s an attractive prospect for us, especially with the growth in traffic we’re experiencing. Every time we have a defect in production, it affects a proportion of our customers, and the number of customers is increasing all the time. If we had a small bug a year ago that affected one or two customers, today the same bug might affect tens or even hundreds.

FROM MONOLITH TO MICROSERVICES FOR GREATER FLEXIBILITY

At Pagero, historically we’ve been pushing out a new version of our product “Pagero Online”, about once a month. We’ve been able to sustain that since about 2007. So when we began looking at Continuous Delivery, about three years ago, we were starting from a fairly good position. We’ve experienced steady growth in transactions through our cloud platform since the start, and it was in early 2014 we started switching over our architecture from a clustered monolithic JEE instance, to distributed microservices (see my previous article).

We needed to do this, in order to scale out our system horizontally, and handle the increasing traffic. One of the other benefits of microservices though, is you can deploy services independently of one another, and if you do it right, you can deploy new code without stopping traffic to the site.

“One of the other benefits of microservices, is you can deploy services independently of one another.”

FROM MONTHLY SERVICE WINDOWS ON SUNDAYS…

Our old monthly release cycle was based on having a ‘service window’, usually on a Sunday morning, where we could stop all the traffic, take a backup of the database, roll out the new version of the monolith, then bring everything back up again. You’ve got the database backup to fall back on, if something goes wrong with the update. You can easily roll everything back to the state it had before the service window.

…TO SEVERAL ROLLOUTS A WEEK

So of course, initially the microservices we had were fairly peripheral to the main function of our platform, and it wasn’t a huge risk to roll out new code without the safety of a service window. So we built deployment tools that allowed us to do that. All our microservices run with at least two instances, so an update consisted of taking each instance down in turn, replacing it with the new version. If something goes wrong, it’s not hard to roll back to a previous version. It’s a little more problematic to restore previous state, but generally we have good mechanisms to re-submit failed transactions once the service is working again.

So these days we roll out new versions of our microservices several times a week, when new features are ready, and rarely have any difficulties with this. The need to roll back does occur occasionally, but more often we can ‘roll-forward’ and deploy a newer version with a fix.

“These days we roll out new versions of our microservices several times a week, when new features are ready.”

MANY REASONS TO CONTINUE ON THIS PATH

With our former monolith, the situation is a little different though. Any changes that touch the database are deemed too risky to deploy without first taking a backup, and that currently requires a service window. We’ve got so used to frequently pushing out new versions of the microservices, and seen the benefits of that, that we’d like to do the same with the former monolith.

We also have good business reasons for wanting to release without having a service window – for a start our traffic is growing at such a rate, we can ill afford any downtime. Perhaps more importantly, as we get customers in more parts of the world, a Sunday morning is no longer a ‘quiet’ time of the week when it’s relatively ok to suspend our service. In some Arab countries where we do business, Sunday is the first day of the working week.

THE SHIFT TO CONTINUOUS DELIVERY HAS STARTED

Now we’ve gained some experience with Continuous Delivery of our microservices, it’s time to do the same with the whole Pagero Online platform, including our old monolith. So I look forward to being able to soon report that we’ve got the dials going all the way up to 10 and we are deploying any part of our system at any time.

 

Please note: this article was originally published on Pagero’s site.

At Pagero we are very proud of the technical architecture of our flagship product, Pagero Online. We’re successfully handling more document transactions than ever, as we see an ever increasing demand for e-document services. In this article I’d like to tell you a little about the journey we’ve taken, from humble beginnings almost ten years ago, to the present day and beyond. I’ll be talking a little about the technology stack we’ve chosen, including the business and technical reasoning behind our choices. If you’ve ever worked on a high availability, cloud based platform handling millions of events, or aspire to do so, you could be interested in our story.

This spring I was at the Craft conference in Budapest, which I thoroughly recommend by the way. There was a full program, with lot of great sessions, and interesting speakers. I did notice, browsing the program beforehand, that there were a lot of talks about Microservices, and Docker. Everyone seemed to have an opinion on the best deployment options, how to manage distributed data, building, testing, logging… This is clearly the hip and trendy way to build systems these days. I found this quite gratifying, since at Pagero we’ve been using a Microservices architecture for some time now, and have been using Docker in production since early 2014. It’s become our everyday life, not some hyped trend that we just heard about. Our reasons for going with Docker and Microservices are firmly rooted in the needs of our business.

Let me explain. Pagero Online is a cloud-based platform for exchange of electronic documents between businesses, for example invoices and orders. The point is, our customers can send their documents to us in whatever format their internal system produces, and we will deliver it in the format the recipient finds easiest to process in their internal systems. It’s clearly a valuable service, since we have an impressive year on year growth in document transactions.

The growth illustrates the challenge we’ve been meeting successfully for several years now – to scale our cloud system to handle ever increasing traffic. It’s of course a great problem to have, and we in the R&D department have worked hard to keep everything running smoothly throughout. The architecture we had when we started, is not the architecture we have now.

IN THE BEGINNING

Back in 2007, Enterprise Java Beans were the thing to do, and we felt confident we were building a future-proof, scalable system, using a JBoss container talking to a PostgreSQL database. Moore’s law meant that we could initially scale just by buying a bigger machine now and then. As time went by, we needed more, and started using the clustering capabilities built into the J2EE platform – i.e. several instances of the same code, receiving requests via a load balancer. At some point in about 2012 we realized this approach could no longer handle the increase in traffic that we were experiencing. We could no longer just add new instances of the same code, the slowdown from the communication overhead between them would be greater than the speedup from the increased CPU power. We needed to give more CPU power to just a few parts of the code that were doing the most intensive processing, without also hitting the communication bottlenecks.

ENTER MICROSERVICES AND DOCKER

Everything was pointing to a need to break apart our monolith into more manageable pieces. Microservices and Docker seemed the perfect match to our problems, so we spent the next year or so building the infrastructure needed. In February 2014 we deployed our monolith, packaged in a Docker container, together with some essential services for monitoring, service discovery, and message passing, (with protobuf over Rabbit MQ). Over the following months, the whole of our R&D department completed a course in the Scala programming language, and we built and deployed several more services for new features in the system. It worked! Since 2014 we have been able to quickly grow to about twenty services a year later, and sixty today.

We’ve realized the Microservices architecture enabled organizational streamlining too. Over the years our development team has grown from a handful of developers in the same room, to about 30 people split across three time zones. By breaking up the codebase, we can also divide up the development work more efficiently. We now have half a dozen ‘devops’ teams each responsible for a handful of Microservices. Both new and seasoned developers are more productive when working in these smaller codebases.

SCALING THE DATABASE

It was in around mid-2015, however, we started to see where the bottleneck had moved to, now the application code was performing better. Our trusty PostgreSQL database was handling a good many more gigabytes of data than ever before, and some transactions were getting a little slow. We concocted a plan to split it up too, just like we were doing with the monolith of code. We settled upon Cassandra and worked out how we were going to safely migrate all the document data out of Postgres, and into this distributed data store. The rest of the data will remain where it is, but just taking out the documents should free up a good deal of space, and release the main bottleneck. We of course need to do this without disrupting our service in any way, so one way to reduce the risk is to run the new Cassandra database in parallel with the existing Postgres, duplicating all the data. Only once we’ve done extensive testing, and we can see it’s working ok, will we remove the redundant copy

That’s kind of where we are now, we have just started this parallel running, and initial results are looking good.

THE BIG BREAK-UP

The next challenge is to continue to break apart our monolith of code, and create new services out of the pieces. Although all our new features are being built in Microservices, we still have the heart of the system in the former monolith. We’ve seen so many benefits to having Microservices, we’d like all our code to look like that. In some ways it’s a more daunting prospect than breaking up the database. This is a large quantity of tried and tested code that has been running in production for many years – breaking it up is not something you can do over a weekend!

We have to make this big change without any interruption to our production service, and we’ve thought carefully about what our strategy should be. One way to do a big risky change is to split it into a series of less-risky, smaller changes. The idea is that after every step in the break up, to run a battery of automated regression tests. The shorter the time the tests take to run, the smaller increments we can work with, and less risk of breaking anything. I’m personally pretty excited by this prospect. We’ve spent several years now building and improving our automated tests for Pagero Online, to the point where we feel pretty confident in taking on this challenge.

The other part of the strategy is to do the same as we have with the database migration. We’ll run both the old and new versions of the service in production for a while before we cut over to the new one. This should find any issues missed by the automated tests, without affecting any of our production traffic.

It’s going to be a real proof of how good our testing and deployment routines are. What kind of tests and deployment tools we’ve built, now that’s a topic for another blog post. If I’m lucky, I might even be telling you about the hip and trendy hot technologies that will be all over the agenda of the next Craft conference :-).

I’ve been favouring an Approval Testing approach for many years now, since I find it pretty useful in many situations, particularly for acceptance tests. Not many people I meet know the term though, and even fewer know how to use the technique. Recently I’ve put together some small exercises – code katas – to help people to learn about it. I’ll be going through them at a couple of upcoming conference workshops, but for all you people who won’t be there in person, I’m publishing them on github as well.

I’ve got three katas set up now, Minesweeper, Yatzy and GildedRose. If you’ve done any of these katas before, you’ll probably have been using ordinary unit testing techniques. Hopefully by doing them again, with Approval Testing, you’ll learn a little about what’s different about this technique, and how it could be useful.

Before you can do the katas, you’ll need to install an approval testing tool. I’m one of the developers of TextTest, so that’s the tool I’ve set up right now. Below are some useful commands for a debian/ubuntu machine for installing it.

I’m still developing these exercises, and would like feedback about what you think of them. For example I have Python versions for all three, but only one has a Java version as yet. Do people want more translations? Do let me know how you get on, and what you think!

Installation instructions

You will need to have Python 2, and TextTest. (Unfortunately TextTest uses a GUI library that doesn’t support Python 3). For example:

$ sudo apt-get install python-pip
$ sudo pip install texttest

For more detailed instructions, and for other platforms see the texttest installation docs. For more general documentation, see the texttest website.

You need to have an editor and a diff tool configured for texttest to use. I recommend sublime text and meld. Install them like this:

$ sudo add-apt-repository ppa:webupd8team/sublime-text-3
$ sudo apt-get update
$ sudo apt-get install sublime-text-installer
$ sudo apt-get install meld

Then you need to configure texttest to use them:

$ cd
$ mkdir .texttest
$ touch .texttest/config
$ subl .texttest/config

Enter the following in that file, and save:

[view_program]
default:subl
[end]
[diff_program]
default:meld
[end]

For convenience, I also like to create an alias ‘tt’ for starting TextTest for these exercises. Change directory to one of the exercise repositories, then a ‘tt’ command should start the TextTest GUI and show the tests for that exercise. Define such an alias like this:

alias tt='texttest -d python -c .'

Two of the exercises start with a small test suite for you to build on. There should be instructions in the README file of each respective exercise, to help you to get going. If you really can’t work out what to do, have a look at the sample solutions and see if that helps. These are also on github: Minesweeper-sample-solution, Yatzy-sample-solution, GildedRose-sample-solution

Last week I met Woody Zuill when he came to Göteborg to give a workshop about Mob Programming.  At first glance mobbing seems really innefficient. You have a whole team of maybe 6-7 people sitting together all day, every day, programming at one computer. How could that possibly be a productive way to work?

I’m pretty intrigued by the idea. It reminds me of the reaction people had to eXtreme Programming when they first heard about it back in like 2000. Is it just an off-putting name for something that could actually be quite brilliant? There are certainly some interesting people who I respect, talking warmly about it. The thing is, when it comes to working together with others, programming at one computer, I’ve had some mixed results. Sometimes good, sometimes less good.

I’ve done some pair programming, and found it worked well with some people and not others. It’s generally worked much better when I’ve paired with someone who has a lot of useful knowledge that I’ve lacked. Either about the language and frameworks we’re using, or about how the software will be used – ie the problem domain. I’ve found it’s worked a lot less well in other situations, with other people. I find it all too easy to hog the keyboard, basically. So I do pair, but not that often.

With ‘Randori’ style coding dojos, the idea is that you have a pair at the front who code, and switch one person every 5-7 minutes, or every test case. I’ve facilitated a lot of these sessions, and I find them especially useful for quickly getting a group of people new to TDD all up and running and pointed in the same direction. Recently I’ve been doing it only for the first session or two, instead having everyone working in pairs most of the time. As a facilitator, this is far easier to handle – much less stressful. Managing the interactions in a bigger group is difficult, both to keep the discussions on track, answer questions about the exercise, and to maintain the pair switching. I also find the person at the keyboard easily gets stressed and intimidated by having everyone watching them, and often writes worse code than they are capable of. So I do facilitate whole-group Randori sessions, but not that often.

So I wanted to find out if mob programming had similar strengths and weaknesses. In what situations does it excel, and when are you better off pairing or working alone? Would I find it stressful, like a Randori? Would I want to drive most of the time, as in pair programming?

Woody turns out to be a really gentle person, about as far away from a ‘hard sell’ as you can get. He facilitaed the session masterfully, mixing theory and practice, telling us stories about what he’s found to work and why. I am confident he knows a lot about software development in general, mob programming in particular, and he is very humble about it.

The most important insight I gained from the session, was that I need to get good at ‘strong-style pairing’. That seems to me to be at the heart of what makes Mob programming work, and not be stressful like the Randori sessions I’ve been doing. I think it will also help me to get pair programming to work well in a wider variety of situations.

I have heard about ‘strong-style pairing’ before, from Llewellyn Falco, who invented it, but I havn’t really experienced it very much before, or understood how important it is. Do go read his blog post about it, for a fuller explanation of what I’m talking about.

The basic idea is “For an idea to go from your head into the computer it MUST go through someone else’s hands”. That forces you to express your ideas really clearly, in words, first. That is actually pretty difficult when you havn’t done it much before. The thing is, that if you do that, then you open up your programming ideas for discussion, critique, and improvement, in a way that doesn’t happen if they go straight from your head through your own hands into the computer. I think if I get better at ‘strong-style pairing’ it will help me not only with Mob programming, but also with pairing and facilitating Randori dojo sessions. Probably also with programming generally.

Pairing has worked best for me when I’ve been driving, and my navigator is good at expressing ideas for me to understand and then type. I think I need to get good at that navigator role for the times when I’m the one with more ideas. I need to learn that when I think ‘I have an idea about to solve this problem!’ I should hand over the keyboard, not grab it. I need to learn to express my coding ideas verbally. Then I will be able to pair productively with a wider range of people.

Randori sessions are much less stressful if the driver has less to do. If the responsibility is shared more evenly with the Navigator, then I think everyone will write better code. As a facilitator, I have less group dynamics to worry about if the designated navigator is in control, and everyone else talks less. (Woody advised that, at least at first, you should ban anyone else in the mob from giving the Driver ideas about what to type, so the Navigator learns the role.)

So thanks, Woody, for taking the time to come to Göteborg, sharing your experiences and facilitating a great workshop. I learnt a lot, and I think Mob Programming and Strong-style pairing could quite possibly be some of those brilliant ideas that change the way I write code, for the better.