A write-up on learning away from work

Note: this post originally appeared on Praqma’s blog

Do you work on any hobby coding projects in your free time? Practice code katas? We all wish we could, but making time for learning away from work isn’t possible for everyone. So, who should pay for learning time?

Recently, I took part in a panel discussion at the Software Craftsmanship Conference in London. One of the questions from the audience was about finding time to learn new skills. Bob Martin, sitting to my right in the picture below, talked about the importance of spending your free time on reading, practicing code katas, and generally sharpening your skills. He said that we need to be able to call upon those skills when there’s a deadline and production software needs to be built. He also said that it’s unrealistic to think these skills can be acquired on the job. That practice needs to be done beforehand, at home, on your own time.

Picture from@cyriux

I agree strongly that software developers need to learn throughout their career. As professional knowledge workers it’s crucial for us to keep up to date and continually expand our toolboxes. Skills like Test-Driven Development (TDD) take time and practice before you become effective with them. Our industry moves very fast and new tools and techniques come along frequently.

I also agree that you need to practice in safe situations before applying a technique in production code. When I see badly designed code it’s often due to developers not fully understanding the libraries and frameworks they use. This causes them to skip testing or do it badly. What’s more, they often also lack techniques for refactoring and can’t improve the design once they’ve learnt more.

The problem is only getting bigger

Software is taking an ever larger role in our society. Bob Martin pointed out in his keynote speech that there is very little you can do in this life without interacting with computers and the software they run. You can’t buy anything, watch tv, do your washing, or probably arrange to meet your friends and go out for a drink without some form of interaction with information technology. Software is everywhere and skilled software developers are needed now in unprecedented numbers.

Not everyone has free time for this

I agree with Bob Martin about the problem: developers need to improve their skills. However, I just don’t think it’s realistic to expect people to spend their free time on this. At least, not in any sustained way and over a whole career. I actually think it’s pretty unhelpful to suggest this should be the main way skill acquisition happens. Several people at the conference agreed with me – not people in the panel, I should say – but regular conference attendees. At least half a dozen people approached me the following day and thanked me for saying this.

At times in my career I have had the luxury of enough free time to spend some of it on learning, reading, practicing and improving my software development skills. But at other times I’ve had family commitments, caring responsibilities, and other priorities that have completely prevented this. Expecting people to learn skills like TDD solely in their free time will cut off a large section of software developers from progressing and becoming more productive. We need those developers too!

Let’s look at better solutions.

One consequence of the current shortage of skilled software developers is our ability to vote with our feet. If your employer does not provide paid on-the-job time for learning, you can probably find another one that will. I am very happy to work for Praqma where our policy is that everyone should spend a good proportion of their time studying, going to conferences, reading, giving internal seminars and generally keeping themselves updated on what’s going on.

Giving developers time to learn makes good business sense in a competitive world where attracting and retaining skilled people is an advantage in the marketplace. You have the power to choose your employer. I don’t think Praqma is the only company that has realized providing time for learning is attractive to smart people and actually a win/win proposition. There may be a similar company close to you too. Have a look around!

What if I can’t change jobs?

Just as having free time for learning can be an unattainable luxury, there may be times in a career where changing jobs is not an option either. In this case you may need to talk persuasively with your manager and affect change from within. I really believe that a developer who improves their skills will work more effectively and be a bigger asset to their team than one who works only on production code and never does any practice. I think you should be able to persuade your boss that a small investment in training time now will reap dividends later.

I’ve also seen people spending a lot of time at work on code that never makes it to production. Have you seen that? Can you bring your boss examples of time you’ve wasted on proof-of-concept projects that never got to production, or long-lived feature branches that were impossible to merge and had to be thrown away? Perhaps you could spend some of that time practicing instead, and avoid some of those costly mistakes.

We need industry-wide skill acquisition

I think Bob Martin is right about the problem but wrong about the solution. When asked how an individual can improve their skills the answer can’t rely on them using their free time. It’s elitist, only those with that luxury of having free time will be able to do it. We need training and skills acquisition on a broad front that will work for the majority of developers. I think employers must put up some time for this as part of the working week. As individual developers we can argue for this, and vote with our feet if necessary. The learning of new skills shouldn’t be a luxury that’s only available to those with free time on their hands. Software, and its role in our world, is too important to rely on such a tenuous and poorly distributed resource.

Note: this article was first published on Praqma’s website

Experiences Pairing with Llewellyn Falco

How does a Technical Agile Coach improve work in a development team? When Llewellyn Falco asked me to pair with him at a client I jumped at the chance to see how effective mob programming is for introducing technical agile practices.

Day one: head first into the mob

My first day as a visiting Technical Agile Coach begins with coding, in a mob. All the developers in the team and I enter our names into a mob timer. It will prompt us to switch roles every 5 minutes, so we will all take a turn at the keyboard. Llewellyn takes the facilitator role, sitting at the back. I find mob programming is great way to get to know a team and their codebase. After only a few minutes, I take the Driver role, which forces to very quickly pick up what’s going on. I need to understand both the current task, the IDE and the particular code we’re working on. The Navigator prompts me and the whole team helps me to find where to click and what to type.

Llewellyn has worked with this client on and off for half a year, and he was here all the previous week. This team has quite a bit of experience mob programming with him and a couple of other visiting coaches. Llewellyn gets involved facilitating the mob from time to time, usually to draw our attention to some improvement we could make in the code, or in the way we’re working together. When the team is stuck, or going in the wrong direction, he will step in and take the Navigator role for a while. That can happen when we’re doing a tricky refactoring, or if he spots we aren’t taking full advantage of our tools.

After 90 minutes the team has to go to their stand-up meeting. Usually we’d mob for about 2 hours with a team, but today we have a longer breathing space before the Learning Hour. Llewellyn and I take the chance to discuss the challenges this team is facing and how we can coach them more effectively when we meet them again tomorrow. Later this week he’s going to get me to take the Facilitator role and he’ll be in the mob instead. Next week he’ll fly home and leave me here for a week coaching by myself, so we’re already preparing for that handover.

2

The Learning Hour

The Learning Hour is a fixture in the calendar of everyone in the department. It’s one hour devoted to learning new techniques in software development, every day, led by Llewellyn or a visiting coach like me. Not everyone can make it every day, so the planned topics are circulated in advance with an indication of whether it’s a coding session or not. When it’s coding not as many managers/scrum masters/product owners turn up. Actually some of them do enjoy attending these more technical sessions to get a better insight into what challenges their developers are facing.

The lesson I’ve decided to begin with uses the Tennis Refactoring Kata, an exercise I’ve done many times with teams. I am hoping it will be fun and not too difficult, especially compared with the production code they are used to. The exercise has comprehensive unit tests that quickly fail if you make a mistake. The developers aren’t used to having that, and they soon find they like having fast feedback on the accuracy of their work.

Lunch dating

Lunch is next up, and it turns out I’m not eating with Llewellyn. He’s set me up to go out with one of the developers from a team I won’t otherwise be working with. It’s a very deliberate policy Llewellyn has to help me to get to know the wider organization outside of the teams I mob with. He’s found that daily one-on-one chats with key people preferably in a social setting, is an effective way of becoming well-connected in an organization. Today it also gives me an opportunity to offer some career advice to an ambitious developer in a similar position to where I was ten or so years ago.

After lunch Llewellyn introduces me to a second team that I’ll be mob programming with, then leaves me to it. He’s confident this team is working well together and it will be straightforward for me to facilitate without him. We get stuck straight in to some front-end development work, improving a new account creation form. My javascript is a little rusty, but I find that’s not really a problem – they know their tools. I just need to keep an eye on how the mob is running, and think a little outside the box. I spot that what they’re doing will likely break some of the automated GUI tests, so we have a chat about how to handle that.

In the meantime, Llewellyn is working with a different team who are new to him. He begins teaching them the basics of mob programming, and getting to know their specific challenges. A future visiting coach might get to work with them once they’re up to speed.

Managing up and down

At the end of the session, we take a short break together and discuss how things are going. We have a little slack in our schedule, and Llewellyn spots one of the senior managers having a coffee. He takes the chance to greet him and book a short meeting the following day. Llewellyn’s heard there are plans afoot to break apart some of the teams, and he wants to ask the boss to protect a particular team from that reorganization because they are really starting to gel and mob well together.

For the third mobbing session of the day Llewellyn and I are pair-coaching again. It’s similar to the first session. All the teams we’re working with are really struggling with code quality and lack of automated tests. Even if ostensibly we’re working on adding a feature, most of the time we’re addressing code smells and adding unit tests.

The last thing Llewellyn and I do before we leave for the day is send a very short email to the department managers – the people authorizing our invoices. We write one sentence about each mobbing session and the learning hour, summarizing what we’ve done. It makes our work more visible to the decision makers.

Moving in the right direction

I’m impressed with how much Llewellyn and the other visiting coaches have already achieved with these developers. Most people have a positive, curious outlook, and consider learning new skills to be a normal part of work. Llewellyn will often pause the mob timer, pull out his laptop, and spend five minutes showing some relevant presentation slides. Some of the developers here know certain refactorings so well I find myself learning techniques from the teams not just from Llewellyn.

3

Optimising for when we’re not there

That’s not to say there aren’t problems. The codebase is still large, badly structured, slow to build, and lacking automated tests in many areas. Many developers in the mobbing teams are relatively new to the company, and most have little previous experience of Test-Driven Development or refactoring techniques. However, their starting point isn’t what’s important, the crucial thing is that they are continuing to improve and learn. What we’re doing here is creating momentum in the right direction, teaching skills and strategies so people can continue to make things better when we’re no longer present.

4

The wider organisation is also just beginning to adopt Agile practices and DevOps, and it’s not working smoothly yet. Many teams are still sitting in cubicles. The test, build and deployment infrastructure relies too much on manually executed steps.

There are several other agile coaches here at the same time as Llewellyn and me who are more focussed on improving process and product management. Later in the week I attend a sprint demo where there is much talk of A/B testing and hypothesis-driven development. People seem really keen to understand their customer needs and to verify they are building the right thing. The thing is, the starting point is less important than the direction of travel.

After a week of pair-coaching I feel confident I can pick up and continue Llewellyn’s work with the development teams at this client. I’ve got to know the people, the particular challenges they face, and have a structure in place that will let me continue the changes Llewellyn is initiating. I’m actually quite surprised how smoothly the handover has gone.

Visiting Technical Agile Coach

Not many coaches are generous enough to invite visitors to pair with them. Llewellyn has a whole list of people he’s inviting to visit in 2018, and I feel lucky to be one of them. It seems to me that everyone’s a winner. Of course Llewellyn wants to show off and spread his coaching methods to the visitors, but also to learn from them and get their feedback on his work. At the same time the client gets the benefit of teaching & advice from many different visitors. They do have to pay two coaches rather than one during the week we overlap, but having such a smooth handover gives them the ability to get more weeks of coaching than Llewellyn could provide by himself. So far the client seems to feel it’s worth the additional expense.

Taking what I’ve learnt home with me

I plan to start recommending this style of Technical Agile Coaching to my clients back home in Sweden. For a start I really enjoy coding, mobbing and teaching all day, but more importantly it seems to be more effective than what I’ve been doing until now. For years I’ve used Coding Dojos to teach the theory and practice of TDD, but all too often the initiative fizzles out and the dojos stop happening when I’m no longer there.

I think this combination of daily mob programming sessions with a learning hour is particularly effective at teaching both theory and practice. The lunches, the presence of other agile coaches, and sending daily summary mails connects me better with the wider organization. This has been a really valuable two weeks for me. I feel pair-coaching with Llewellyn has taught me an effective way to introduce technical agile practices and change developer behaviour for the better.

5

This post was originally published on Praqma’s blog

A short story about Pre-tested Integration

the three

Continuous Integration and Code Review are strongly correlated with success. Many use Pull Requests for code review, but for co-located teams this can be an obstacle for CI. Is there a better way?

There are three developers on the (fictitious) team: Annika, Boris and Carol. Annika is a recent hire, fresh from university, Boris is the team lead, and Carol has been around the longest. Each of them is working on a different task. They all synchronized their work with their shared master branch when they arrived at the office today, and now it’s approaching morning coffee time. They have all made some changes in the code which they’d like to share with the rest of the team.

Annika is working on a local branch called ‘red’. She checks it’s up-to-date with master and pushes it to a remote branch named ‘ready/red’. It’s similar for Boris and Carol. They are on blue and orange branches respectively and push their changes to ready/blue and ready/orange.

the three

The Build Server is set up so that it detects new branches on the Version Control Server that follow a naming convention. Any branch beginning with ‘ready/’ is scheduled for integration, and only one of these integration builds runs at a time. The Build Server delegates builds to one or more agents,and since the ready-job agent is idle, it picks up the ‘ready/red’ change straight away and leaves the other two ready-branch builds in the queue.

build servers

The build job has several steps. First, the agent merges the ready-branch into a local copy of the master branch. Annika’s changes get a simple fast-forward. The agent performs a full build, static analysis, code style check, and unit test. Everything goes well, so the agent pushes the merge result up to the Version Control Server and posts a message on the team message board.

Things start out similarly for Boris’ ready/blue branch. The build agent takes a copy of master from the git server and merges in the ready-branch. This isn’t a fast-forward merge, since there is a new commit in master for the ‘red’ changes, but it’s still ok. So long as the agent can do the merge without finding any conflicts the build can continue.

The agent then proceeds to the next build steps. Unfortunately, Boris hadn’t noticed that one of his changes caused a test failure. The team has previously agreed that the code in master should always pass the tests, so this means Boris’ changes shouldn’t be shared. The build agent sends a message to Boris telling him about the failed tests, discards its merged branch, and moves on. Carol’s ready/orange branch is up next. The build agent starts again with a fresh copy of the latest master taken from the git server. Carol’s changes also merge without difficulty and this time both build and tests pass. The build agent pushes the merge commit to the server and notifies the team.

build servers

Boris and Carol are having a cup of coffee while they wait for the build server to integrate their changes. Annika is chatting with the Product Owner about the new feature she plans to work on next, ‘cyan’. When they get back to their desks they see the messages from the build server.

Annika is happy to see her changes integrated successfully. She fetches the latest master from the remote git server. She’s now completed the work on the ‘red’ task, and her changes should undergo a code review. She marks the ‘red’ task as finished in the issue tracker and adds an agenda item to the team’s next scheduled code review meeting, which is later that week. Annika selects a new task to work on and checks out a local branch from master called ‘cyan’. Boris sees the message about his failed tests and realizes immediately what he missed. He’s a little embarrassed about his mistake, but happy his teammates are not affected. They may not even notice what’s happened. Boris takes the opportunity to merge the latest changes from master into his ‘blue’ branch. He is quickly able to address the problem with the tests and pushes an update to ready/blue. The build agent gets to work straight away.

Carol is not finished with the ‘orange’ task, but is happy to see her initial changes integrated successfully. She fetches master and merges it into ‘orange’ before continuing work there. She’s noticed a design change that would make her task easier. She plans the refactoring in steps so she can push small changes frequently as she completes the re-design. Sharing her changes with the team often will make it easier for everyone to avoid costly merges. Later that week, in the code review meeting, the team looks at Annika’s changes for the ‘red’ task. It represents a couple of days’ work. The code review tool presents a summary of all the commits involved and they discuss all the changes in the development of the ‘red’ feature.

Unfortunately, Boris and Carol are not happy with a part of the design Annika has made and the code formatting needs improving in places. The outcome of the meeting is that they agree to pair program with Annika on a refactoring of the design, and encourage her to initiate informal design discussions more often during development. The idea is that the more experienced developers, Boris and Carol, should help Annika to learn better design skills. The team finds the code-formatting issues a bit annoying since this kind of detail shouldn’t be in focus for a code review meeting. They create a task to improve the code-style checker in the pre-tested integration build to catch any similar code formatting problem in future.

Commentary

This development process is working really well for Annika, Boris and Carol, and pre-tested integration is a small but important piece. They are not using pull requests, but they have checks on what code is allowed into the master branch, and they have good code-review culture. Integration to master happens at a faster cadence than the flow of work-items. That’s important. Integration is less painful the more often you do it and you might not want to break your work items down to the same small granularity that would be best for code changes. You also don’t necessarily want to delay your integration by waiting for a teammate to review your pull request.

Strictly speaking, this process is not Trunk-Based development since there are more branches involved than just trunk, but so long as the integration is frequent in practice it’s indistinguishable. The benefit of this over Trunk-Based development is, of course, that Boris or any other developer can’t unwittingly break master for the rest of the team.

If you’re using Jenkins you can easily automate the integration process with our Pretested Integration plugin. It’s not difficult to implement this functionality yourself for other build servers. Whichever approach your team chooses, I recommend you settle on a process that results in frequent integration together with collaborative and constructive code-reviews.

This post was originally published on Praqma’s blog

Continuous Integration is now synonymous with having a server set up to build and test any change submitted to a central repository. But this isn’t the only way, or even how CI used to work. What did we do before DVCSs and Jenkins?

What is the connection between cricket and pre-tested integration? Pre-tested integration is about the way you set up your Continuous Integration server to metaphorically catch the balls you drop and the tests that fail, just like in cricket.

When a team first sets up a build server, this is the typical way to do it – you have it build any change in the mainline branch, and notify you if it fails. Developers are supposed to run the build and test before they push the code; the build server is supposed to be a kind of back-stop, catching any balls you drop. Any developer who manages to break the build is singled out by having their name displayed on the ‘information radiator’ screen in the corner. Just a little ridicule, some good-natured teasing, enough so that they are shamed into remembering to run the tests next time.

The thing is, it’s quite easy to make a mistake that causes the build to fail. People can get upset with you for breaking the build, and it can be embarrassing to have your name in lights. You can get into a bad cycle of stress, making mistakes more likely, causing you to break the build more often, making you more stressed…

Even a short length of time with a broken master can disrupt anyone else who is trying to integrate their changes, and discourage them from doing so. You can get into a bad cycle of people mistrusting the master and hoarding their changes. This causes bigger commits, more likely failed builds, and less trust… if it gets really bad you can find yourself spending several miserable days resolving three-way merges. Believe me, you don’t want to go there!

I think we can learn something from how they used to do Continuous Integration in the days before CI servers became widespread. I think they had a more humane process that was less stressful, and led to more virtuous cycles than downward spirals.

The whole point of Continuous Integration is to ensure that all the developers are working on essentially the same version of the source code. If you compared my working copy with those of my colleagues you should not be able to see any more differences than have accumulated through work either I or they have done today. Ideally, only changes we’ve made in the last couple of hours. The point of this is to make integrating our work a trivial task that takes seconds and is hard to mess up.

If you’re interested, you can read Jame Shore’s description of CI without a build server, but I will summarize it below:

When a developer was ready to integrate their changes they would physically walk to a designated ‘integration machine’, load their changes there, and perform a full build and test. No-one else was allowed to begin their integration until they were done. If the integration build passed they would simply say in a loud voice to the other developers to pull their changes from that machine. If there was a problem with the integration, and they couldn’t fix it quickly, they would revert their changes on the integration machine, and return to their workstation to fix the problem.

It’s a process that clearly only works if everyone is sitting in the same room. On the other hand, it works fine even when there are no atomic commits, no build servers, and no automated merges. Let me highlight some aspects of this process:

  • Integration to mainline is serialized – one set of changes is integrated and verified at a time.
  • Developers are notified when there are changes in mainline, and are expected to integrate them locally soon after.
  • If the integration fails the rest of the team is not affected, mainline is not broken, no-one else need know.
  • Developers can’t start working on the next task until the integration is completed because they sit at the integration machine, not their workstation. Developers can supervise the integration, fixing small problems as they arise.

I think those points have been lost in the way I see most teams set up their Continuous Integration build server. Basically – too much shaming, and not enough collaboration!

What we at Praqma have done with several of our customers is use a technique we’re calling ‘pre-tested integration’. I think it’s closer to the original CI process. You might have heard it called ‘validated merges’ or ‘pre-verified commits’.

When a developer has local changes that are ready to be integrated to the mainline they first push them to a remote branch named ‘ready/xxx’ (replace xxx with whatever you like, a Jira task number, a feature name, random string…) The build server is triggered whenever it detects a new remote branch that obeys this naming convention. It puts them in a queue and works on integrating them one at a time.

The build server takes a copy of the latest mainline from the central version control repository, and updates it with the changes from the ready-branch. For this to succeed any merges must be straightforward – i.e. possible for it to be done automatically. The build server then runs an automated build and test. If everything succeeds the server pushes the integrated changes up to the central code repository. The build server also updates a webpage or slack channel, notifying the team that new code is available in mainline. The ready-branch is now integrated, finished with, and can be deleted.

On the other hand, if the merge or the build fails, the developer has more work to do – their work wasn’t ready to integrate after all. The build server notifies them so they can go back and fix the problem and submit a new ready-branch. Crucially, the rest of the team is completely unaffected. Someone else is free to integrate their changes instead.

This pre-tested integration process is very similar to the CI process without a build server described earlier. We’ve added a little automation, we’re making use of some features of modern version control systems, but on the whole I think it’s closer to the old manual process than the usual way CI servers are set up.

  • We ensure integration is serialized through build server configuration rather than a physical token.
  • The build server notifies other developers when new code is successfully integrated – I think a post in a slack channel works much like talking in a loud voice in the team space.
  • If an integration fails the shared mainline is not broken, only the developer who submitted the change is affected.

The other two points do differ though:

  • Nothing forces developers to wait for the integration step before starting on a new task, but it is usually more convenient to do so. You wait for the build server chat room message, then pull your integrated change from the remote master before continuing work.
  • Developers can’t fix problems in the integration step while it is ongoing.

During the integration step the build server is busy, but your developer machine is not. You could be tempted to move on to your next task before the previous one is finished and integrated. If the integration fails you’ll have to context-switch back, which may take more time than the idling you avoided.

There are upsides to having an integration process that actually can’t be supervised though. You get into a virtuous cycle where you are more successful if you submit small changes that can be integrated automatically, and you are more successful when you integrate more often. In short, you have put some automation in place that will draw you in the right direction. Smaller, more frequent commits, is exactly the behaviour we’d like to encourage.

If you’d like to find out more please take a look at Praqma’s pretested integration plugin for Jenkins. It’s one way to implement the ideas described in this post. It’s a free and open-source project, and we’re currently working on something of an overhaul. This development is funded through our CoDe Alliance, “where ambitious customers meet around continuous delivery”.

The latest version has support for Jenkins pipeline. This will bring the plugin up to date with modern usage – many people use Pipeline and freestyle jobs these days. The new version will also allow for much more flexibility in job design, with matrix and job combinations. It’s exactly the kind of tool I need for the team I’m working with right now.

Fundamentally though, succeeding with Continuous Integration is not really about tools, it’s about collaboration. You should choose tools that let developers feel safe, tools which encourage them to synchronize their work often, integrating small pieces continually in their work together.

Fictitious people that might get real – and sue you!

Finding realistic data for testing is often a headache, and a good strategy is often to fabricate it. But what if your randomly generated data turns out to belong to a real person? What if they complain and you get fined 4% of global turnover?!

Please note: This story was originally posted on Praqma’s site in October 2017.

GDPR

I am of course referring to the new GDPR regulations which will come into force next year. The aim of them is to prevent corporations from misusing our personal data, which generally seems like a good thing. Many companies use copies of production databases for testing new versions of their software. Sometimes that means testers and developers get access to a lot of data that they probably shouldn’t be able to see. Sometimes it means you’re testing with old and out of date versions of people’s information, or even testing with customers who cancelled their contracts with you long ago.

Generally I’m in favour of tightening up the rules around this, since I think it will better protect people’s privacy. I think it will also force companies to improve their testing practices. In my experience it’s easier to get reliable, repeatable automated tests if each test case is responsible for creating all the test data it uses.

If you instead write a lot of tests assuming what data is in the database from the start, maintaining that data can get really expensive. I’ve seen this happen first hand – at one place I worked, maintaining test data became such a big job that a whole test suite had to be thrown away along with the data!

Data-Driven Testing

At my last job, part of the test strategy involved data-driven testing. In my test code I’d access an internal API to create fictitious companies in the system. Then, simply by varying the exact configuration of these companies, I could exercise many different features of the system under test. Every case test created different, unique companies. This meant you could run each case as many times as you liked. You could also run lots of tests together in parallel, since they all worked with different data. Each time you ran the tests, you had the opportunity to start over with a fresh, almost empty database – a small memory footprint that meant a fast system and speedy test results.

Fictitious People

When I came to my current client I was planning to use a similar approach. The difference here is that instead of fictitious companies, I needed to create fictitious people. So, I got out my random number generator and started creating fictitious names, addresses, and, of course, Swedish Personal Numbers. That’s when things started getting tricky… Let me introduce those of you who aren’t Swedish to the Personal Number.

It’s a bit like a social security number, unique for each person, and the government and other agencies use it as a kind of primary key when storing your data. If you know someone’s number, (and you know where to look), you can find out their official residence, how much tax they paid, credit rating, that sort of thing.

You’ll understand why, then, we started getting worried about GDPR at this point. A Swedish personal number is definitely a ‘personally identifiable’ piece of data, and GDPR has a lot to say about those.

A primary key containing your age

The other thing about the personal number is that it’s not just a number. When they first came up with the idea, in the 1940s, they thought it would be a great idea to encode some information in this number. (I’m not sure database indexing theory and primary keys were very well understood then!) So, they decided that the first part should be your birthdate, then three digits specifying where in Sweden you were born, and the last figure is both a checksum, so numbers can be validated, and also specifies your gender.

These days, lots of Swedish people are born outside Sweden, and not everyone has a gender that matches the one they had at birth, so they had to relax some of these rules. Sometimes too many children are born on one day and they run out of numbers for it, in which case you might get assigned the day before or after. However, the number still always includes your birth year, and hence your age.

Never ask a woman her age, (just get her personal number)

These days it’s quite common for retail chains to ask you to join a ‘customer club’ so you can accrue points on your purchases. When you buy something the assistant will often ask for your club membership number. Almost always the membership number is the same as your personal number, so you end up telling them (and everyone in the queue behind you) exactly how old you are. It’s disconcerting, if not rude!

Fictitious Test People get real and sue you

Back to my test data problem. I’m generating fictitious people, but the system under test both validates the checksum, and uses the personal number to determine their age. I have to generate realistic numbers from the last hundred years or so. This is where things start to get tricky. Someone in the legal department objected to this. He told me that if I was to generate a number belonging to a real person we could really get into trouble because of GDPR. If that person discovered we were using their personal number, and had changed their name to “Test Testsson” living at “Test gata 1, 234 56 Storstad”, that might not be too bad. Unfortunately, if we were using them for a tricky scenario, they might also discover our test database linked their personal number to a dreadful credit rating and a history of insolvency. At that point they might start to get insulted. In the worst case, they could sue us and GDPR would mean we’d have to pay them an awful lot of money!

Keeping fictitious test people imaginary

However unlikely that scenario might be my client is understandably unwilling to take the risk, and I really need to restrict the test system to genuinely fictitious personal numbers.The authorities do provide a list of these exactly for testing purposes, but the trouble is there are only about 1000 numbers on the official list. With the kind of automated testing I’m planning to do that is not enough. Not nearly enough! Assuming my system has about 500 features, I want to write 5 scenarios per feature, I have 25 developers running all the tests 25 times a day… I need something like half a million unique numbers. Less if I can wipe the db more often than once a day, but still. A thousand doesn’t go very far.

So, here’s a startup idea for you – I’m pretty sure my client isn’t the only company out there who would pay good money for a list of a million personal numbers, for people aged less than 100, that are guaranteed not to exist in reality. If you have such a list, and would be willing to maintain it for me such that it remains guaranteed fictitious, please get in touch!

Playing with time

Right, so I’m back on generating personal numbers again, not having found that list yet. I have a couple of ideas about how to keep them fictitious though. The first idea is to use really old numbers. While I’m testing the system, I can set the clock back to 1900, and only use generated numbers from 1800-1890. Since everyone over 125 can be assumed to have passed on, they are unable to sue us. For some scenarios I need under 18 year olds, and by changing the date that the system clock thinks is ‘today’, I can generate children born in the 1870s. Fictitious children complete with bank accounts, mobile phone contracts, student debt, acne… and everything will work just fine!

Hack the checksum

The other idea I had was to use personal numbers with invalid checksums. After the date of birth and the three random numbers, the last figure in the personal number is calculated from the others, using the Luhn algorithm. While I’m testing the system, I can modify the ‘PersonalNumber.validateChecksum()’ function to instead accept all invalid numbers, and reject all valid ones. Then all my tests can generate perfectly invalid personal numbers, and the test database will get filled with people that definitely can’t exist in reality.

As with any well-designed system, there is only one place in the code where I calculate the checksum, so it’s actually a really small change just for testing. It also comes with insurance – if I ever accidently deploy this code change in production, absolutely everything will stop working, since all those valid numbers already in the production database will suddenly get rejected. I think we’d notice pretty quickly if that happened!

Is there a better way?

I haven’t actually implemented any of these strategies yet – we’re still at the planning stage. I’d be very interested to hear if anyone else has solved this problem in a better way. In the meantime, I’ve written some code that can generate personal numbers with and without valid checksums, and also lets you specify an age range – it’s available on my github page.

We can’t be the only people worried about GDPR and personal numbers. That threat of a fine of 4% of global turnover is a pretty mighty incentive to focus some effort on cleaning up test data.