Leaving the Platform – Branching and Releasing for Independent Subsystems

For several years, much of the code for the systems at thetrainline.com has been versioned and deployed together as a single ‘platform’. Recently, we have begun to divide up the platform into smaller chunks, to enable us to deliver some parts more frequently and rapidly, leaving other parts to evolve more slowly (as needed). Moving from a single version number for all subsystems to multiple version numbers for independent subsystems has implications for how code is built and released; this blog post outlines some of the work we have done so far in this area.

My colleague Owain Perry and I recently presented on this topic at the London Continuous Delivery meetup group (http://londoncd.org.uk/), and the slides we showed cover the details in this post.

Why Release All Systems as a Platform?

The codebase of the systems which power thetrainline.com began life around 1999, when the first public-facing booking system was launched, in partnership with Virgin Trains. We had a substantial code re-write around 2006, and today, we have about 4 million lines of application code (mostly in C# on the .NET platform).

TheTrainLine in 1999 – from the Wayback Machine

During the code re-write, we needed to be sure that we had consistency across all parts of the code when testing new features, and so it made sense to apply the same version number to all subsystems and components, and then deploy all parts of the system together as a ‘platform’. The rate of code change at the time was very high, and almost every part of the code was undergoing rapid changes, so it was also necessary to deploy almost every subsystem on a regular basis.

TW Cruise – NewPipelineActivity – from http://www.thoughtworks.com/products/docs/cruise/1.2.1/help

The subsystems were built, tested, and deployed with Continuous Integration (CI) techniques using Cruise from ThoughtWorks (we have since moved to ThoughtWorks GO for CI and deployment orchestration). Delivery of features which spanned multiple subsystems required any given team to work on any of the subsystems. Deployment to Production happened out-of-hours (overnight) and required us to take down thetrainline.com systems for many hours during the deployment activity.

As of October 2013, we deploy a major new release every six weeks, and need around five weeks to fully test that new set of changes. The production deployment itself is now fairly rapid: down from 6 hours in 2010 to 17 minutes now, thanks to some nifty Blue-Green deployment techniques put in place by our Deployments and Environments team.

The Platform release heartbeat

With the platform components, we have not to date made much use of feature toggles, which means that we need to use release branching in order to manage the work required for a particular release.

Release Branching

Branching for platform releases

At any one time, we have three active release branches for the subsystems in the platform. Bugfixes from older branches (either in Production or on the way there) are merged into newer branches, and once a newer release is stable in Production, we delete (in fact, deactivate) the previous branch, leading to a ‘staircase’ branching scheme without a mainline. This is not perfect, but it fits the platform release model.
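The ‘staircase’ scheme can be modelled as a small simulation. This is an illustrative sketch, not our actual tooling; the branch names and class are invented for the example, but the rules match the description above: at most three branches are active, a fix on an older branch is merged into every newer branch, and the oldest branch is deactivated once a newer release is stable.

```python
class Staircase:
    """Illustrative model of a 'staircase' release branching scheme."""

    def __init__(self):
        self.active = []        # active release branches, oldest first
        self.deactivated = []   # branches retired after a newer release stabilised

    def cut_branch(self, name):
        """Cut a new release branch, keeping at most three active."""
        self.active.append(name)
        if len(self.active) > 3:
            self.deactivated.append(self.active.pop(0))

    def bugfix(self, branch, fix):
        """A fix on an older branch must be merged into every newer active branch."""
        idx = self.active.index(branch)
        return [(fix, newer) for newer in self.active[idx + 1:]]

    def release_stable(self):
        """Once the newest release is stable in Production, deactivate the oldest branch."""
        if len(self.active) > 1:
            self.deactivated.append(self.active.pop(0))


s = Staircase()
for name in ["release-24", "release-25", "release-26"]:
    s.cut_branch(name)

# A fix made on the oldest branch propagates 'up the staircase':
merges = s.bugfix("release-24", "fix-123")
```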

What Are the Limitations of a Platform Release?

Building systems as a single platform does have some advantages in terms of simplifying interdependencies and whole-system testing, and this was useful when the systems at thetrainline.com were all evolving at the same rapid rate. However, our subsystems now need to change at different rates, and the release branching scheme became increasingly difficult to manage as more (and smaller) subsystems were added to the platform. Our cycle times were also quite long (around 12 weeks from the start of development to a feature being available in Production), and some systems needed to change more rapidly than this. We also identified that Conway’s Law was having a negative effect on the design of our systems: because any team could change any code anywhere, we started to see a blurring of domain boundaries between subsystems which ought to be separate – more on Conway’s Law below.

Continuous Delivery by Jez Humble and David Farley

All of this led us to look at practices and techniques from Continuous Delivery, which advocates (among other things) avoiding builds and deployments if the component has not changed. We have started to extract some subsystems from the platform, making them what we’re calling ‘independent subsystems’. Most of these systems still call into parts of the platform, but we’re treating these independent systems somewhat differently, building and deploying them when they change, and pushing those changes more frequently than every 6 weeks.

We expect this to allow us to see a return on investment (ROI) for new features sooner, and to make changes to our website more quickly, responding to changing market conditions (such as weather, industry news, Government policy, etc.). More simply, a software developer is far more likely to remember the details of code they wrote a few days ago than code they wrote 12 weeks ago!

Supporting Independent Subsystems which Depend on a Platform

Taking Advantage of Conway’s Law

We decided to take the implications of Conway’s Law and turn them around: if we set up our teams – and the work we assign to those teams – to reflect the ideal communication between software subsystems, then we have a good chance of building systems which work well together and avoid ‘bleed’ or incorrect coupling between domains. Following some of the great work done at Spotify on team structure, we have started to align our software development teams with groups of products and related subsystems.

Moving to product-aligned teams

We are also looking at how to achieve effective cross-team collaboration for concerns such as security, deployments, and performance (shown by the horizontal oval in the diagram, contrasting with the vertical product-focused team groupings). A team will need to honour a ‘social contract’ with other teams for the subsystems and components it provides, cleaning up any ‘mess’ it introduced or inherited, but in return gaining the authority to make appropriate changes to its systems as it sees fit from an engineering viewpoint.

Semantic Versioning

We have also identified the need for widespread use of Semantic Versioning (SemVer) for communicating meaning between teams and components, to help reduce some of the complexity introduced by multiple version numbers across independent subsystems. By identifying in the version number when a change in a component will break consuming clients, teams can make effective decisions about when to upgrade to a new version of a dependency, which helps to avoid tightly-coupled changes across different teams.
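The upgrade decision can be sketched in a few lines. This is an illustrative example (the function names are ours, not part of any tooling mentioned above), assuming the standard SemVer MAJOR.MINOR.PATCH scheme, where a MAJOR bump signals a change that breaks consuming clients:

```python
def parse(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def safe_to_upgrade(current, candidate):
    """An upgrade is non-breaking for consumers when the MAJOR version is
    unchanged and the candidate is not older than the current version."""
    cur, cand = parse(current), parse(candidate)
    return cand[0] == cur[0] and cand >= cur


# MINOR and PATCH bumps are backwards-compatible; a MAJOR bump is a
# deliberate, coordinated decision for the consuming team:
safe_to_upgrade("2.3.1", "2.4.0")   # True  - new backwards-compatible features
safe_to_upgrade("2.3.1", "3.0.0")   # False - breaking change for consumers
```

Because tuples of ints compare element-by-element in Python, `cand >= cur` gives the usual version ordering without any extra code.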

More Frequent Deployments

Independent subsystems use a semantic versioning scheme separate from the version number of the platform, which allows us to communicate the nature of each change to any ‘consumers’. Even for these systems, which we want to deploy more frequently (‘interim deployments’), we still need to synchronise with the six-weekly ‘heartbeat’ of the platform release, which we treat as a breaking change, because all platform components will have been rebuilt and deployed. This means that as the platform release approaches, the rate of interim deployments decreases, giving us time to test each independent subsystem against the Release Candidate of the platform.

Platform Heartbeat with Interim Deployments
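One plausible version-bumping rule for this rhythm can be sketched as follows. This is an assumption for illustration, not our actual build scripts: interim deployments bump MINOR or PATCH, and the six-weekly platform release, being treated as a breaking change, is signalled by a MAJOR bump:

```python
def next_version(version, change):
    """Bump a 'MAJOR.MINOR.PATCH' version according to the kind of change.
    Hypothetical rule: a platform release counts as breaking (MAJOR bump)."""
    major, minor, patch = (int(p) for p in version.split("."))
    if change == "platform-release":
        # All platform components are rebuilt and deployed: treat as breaking.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Interim deployment adding backwards-compatible functionality.
        return f"{major}.{minor + 1}.0"
    # Interim bugfix deployment.
    return f"{major}.{minor}.{patch + 1}"


next_version("4.2.1", "feature")            # interim deployment -> "4.3.0"
next_version("4.3.0", "platform-release")   # six-weekly heartbeat -> "5.0.0"
```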

For independent subsystems we can develop almost entirely on the mainline (‘master’ in Git, or ‘trunk’ in Subversion – we use Git). Only when we need to test against the (potentially ‘breaking’) platform release do we create a temporary release branch; once the platform release has gone live and is stable, we can merge the release branch back to the mainline (with some tags).

Branching for independent subsystems

In fact, in some cases we avoid the need to create a release branch by running nightly CI builds against the release candidate, and only branching if the build fails. The two branching schemes can be compared visually like this:

Comparison of platform release branching scheme and independent branching scheme
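The branch-avoidance optimisation described above amounts to a simple decision rule, sketched here for illustration (the function and message strings are invented for this example): stay on mainline while the nightly build against the platform Release Candidate is green, and only cut a temporary release branch when it breaks.

```python
def plan_for_platform_release(nightly_builds_against_rc):
    """Given nightly CI results (True = pass) for mainline built against the
    platform Release Candidate, return the branching actions to take."""
    actions = []
    for night, passed in enumerate(nightly_builds_against_rc, start=1):
        if passed:
            actions.append(f"night {night}: green against RC -> stay on mainline")
        else:
            actions.append(f"night {night}: broke against RC -> cut temporary release branch")
            actions.append("stabilise on release branch; merge back to mainline after go-live")
            break  # development continues on the release branch until go-live
    return actions


# While every nightly build passes, no release branch is ever created:
plan_for_platform_release([True, True, True])

# A failure triggers the temporary branch, which is merged back after go-live:
plan_for_platform_release([True, True, False])
```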

Eventually, we expect to be able to release changes to some systems daily; for the time being, we will retain the six-week platform ‘heartbeat’, but we expect the strength of this rhythm to diminish as more systems are made independent over time.

Thanks to the members of the #londoncd meetup group for comments and questions, and to Matt R for the mainline branching optimisation details.
