Moving to Multiple Deployments Per Week at thetrainline.com


Here at thetrainline.com we have several useful online tools for helping our customers plan and manage their train travel, including Train Times and Live Departure Boards. We recently changed the way we build, test, and deploy these kinds of applications to enable us to release new features much more frequently and easily; in fact, we shortened the deployment cycle from one deployment every few months to multiple deployments per week.  These changes have produced a sea change in team culture, with a marked increase in product ownership by the team. This post describes what we’ve done so far, and where we want to go over the coming months.

Until recently, we configured and deployed our train tools applications manually (unlike our main platform releases): zip files for binary packages, xcopy for deployments onto target servers, and so on. This led to unnecessary errors with configuration settings and file copy operations, and made deployments, well, scary.

We introduced an improved process for building and deploying the train tools applications in order to:

  • Reduce errors and outages
  • Reduce on-going operational effort
  • Improve speed of delivery of new features and changes
  • Improve the auditability of the actions taken during deployments
  • Improve the visibility and traceability of the progress of changes towards production for all stakeholders

We implemented a single, gated deployment pipeline per product using ThoughtWorks GO, with role-based permissions to control the flow of changes to production. The new pipelines for the train tools applications have fewer components and ‘moving parts’ than the main platform-based pipelines, leading to a lower maintenance overhead.

Early last week, at our weekly tech shindig, Burrito Club, we showcased the progress made so far:

[Presentation: Moving to multiple deployments per week]

Deployment Pipeline

We implemented a single pipeline to production, flowing from the initial code commit (to Git) through to production deployment with smoke tests. This single pipeline allows us to see at a glance which stage a particular change has reached at any point.

[Pipeline diagram: Train-Tools end-to-end]

  1. A build is triggered automatically by a code commit, building and executing the unit tests, and outputting a single package containing the binaries.
  2. The “auto-validation” stage is triggered automatically, and runs self-contained, in-memory tests (using stubs and technologies such as CassiniDev and SQLite).
  3. The “deployment-map” stage is triggered automatically and combines the binaries package with the configuration for a given environment.
  4. When ready, a developer or QA can trigger the deployment to the test environment, which is automatically followed by the smoke tests of the test environment (see below).
  5. At that point, role-based security prevents anyone not in the QA role from triggering the “manual-test” stage. This stage is a manual sign-off checkpoint.
  6. Finally, Production Support is the only role with rights to deploy the code to production, which is followed automatically by (read-only) smoke tests against production. (A rough configuration sketch of these stages follows below.)

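For illustration, the sketch below shows roughly how stages along these lines can be expressed in Go's XML configuration. It is a simplified, hypothetical fragment rather than our actual configuration: the pipeline, stage, job, repository and script names are invented, and the “deployment-map”, “manual-test” and smoke-test stages are elided.

```xml
<pipeline name="train-tools-example">
  <materials>
    <!-- Hypothetical Git repository; a commit here kicks the pipeline off -->
    <git url="https://git.example.com/train-tools.git" />
  </materials>

  <!-- 1. Build and unit test, publishing a single binaries package -->
  <stage name="build">
    <jobs>
      <job name="compile-and-unit-test">
        <tasks>
          <exec command="build.bat" />
        </tasks>
        <artifacts>
          <artifact src="output/package.zip" dest="binaries" />
        </artifacts>
      </job>
    </jobs>
  </stage>

  <!-- 2. Self-contained, in-memory tests, triggered automatically on success -->
  <stage name="auto-validation">
    <jobs>
      <job name="in-memory-tests">
        <tasks>
          <exec command="run-auto-validation.bat" />
        </tasks>
      </job>
    </jobs>
  </stage>

  <!-- 3. The "deployment-map" stage is elided here for brevity -->

  <!-- 4. Deployment to the test environment waits for a manual trigger -->
  <stage name="deploy-to-test">
    <approval type="manual" />
    <jobs>
      <job name="deploy">
        <tasks>
          <!-- Pull the binaries package published by the build stage -->
          <fetchartifact stage="build" job="compile-and-unit-test"
                         srcdir="binaries" dest="deploy" />
          <exec command="deploy.bat" />
        </tasks>
      </job>
    </jobs>
  </stage>

  <!-- 5./6. The manual-test, deploy-to-production and smoke-test stages follow;
       the role-restricted production approval is shown separately below -->
</pipeline>
```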
Initially, we implemented role-based permissions to ensure that only authorised people could push to production:

[Pipeline diagram: Train-Tools end-to-end with audit]

In the end, however, the only stage with role-based security was the “deploy-to-production” stage; we removed the manual gates before “deploy-to-test” and “manual-test” as the culture of trust increased.
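In Go's XML configuration, restricting a stage to a role looks roughly like the hypothetical fragment below (the role name is invented, and roles themselves are defined separately in the server's security settings):

```xml
<!-- Only members of a (hypothetical) "production-support" Go role may trigger this stage -->
<stage name="deploy-to-production">
  <approval type="manual">
    <authorization>
      <role>production-support</role>
    </authorization>
  </approval>
  <jobs>
    <job name="deploy">
      <tasks>
        <exec command="deploy.bat" />
      </tasks>
    </job>
  </jobs>
</stage>
```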

At each manual checkpoint, any user can easily see who triggered the stage:

[Screenshot: approval for production deployment]

Hurdles

One limitation of ThoughtWorks Go that we had to work around was the assignment of build agents to a pipeline. Because agents can only be assigned at the pipeline level, there was a possibility of a normal build running on the deployment agents (in the production environment!). In the end, we created a small utility that triggered a separate pipeline and returned the results.
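For context, one way Go ties agents to pipelines is through environments in its XML configuration, sketched below with hypothetical names and UUIDs; because the association covers the whole pipeline rather than individual stages, a compile or test job could in principle land on an agent that exists only to deploy into production.

```xml
<environments>
  <environment name="production">
    <agents>
      <!-- Hypothetical UUID of a deployment agent in the restricted zone -->
      <physical uuid="aaaaaaaa-1111-2222-3333-444444444444" />
    </agents>
    <pipelines>
      <!-- The whole pipeline, not individual stages, is tied to these agents -->
      <pipeline name="train-tools-example" />
    </pipelines>
  </environment>
</environments>
```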

Rollbacks

Our (internal) packaging and deployment technology detects a previous installation, uninstalls the old package and installs the new one. This enables us to easily roll back to an earlier version simply by running a previous pipeline.

Network Security

In order to isolate the deployment agent servers from as much of the production infrastructure as possible, they are placed in a limited-access, DMZ-style restricted zone. They can reach target servers in production only on specific ports, and can reach their controlling GO server on the internal network, but little else.

Deployment pipeline network diagram

On the left-hand side of the diagram are the build agents responsible for compilation and unit testing. On the right is one of several deployment environments (e.g. test, pre-prod or production).

Culture

The most difficult part of this implementation was the cultural change required – from developers who were used to doing deployments the manual way, to the much greater trust that IS Operations had to place in the developers, to the people who did the actual deployments.

In the past, the Production Support team received a change request with a one- or two-sentence description of the change. This didn’t really make them happy, but as they were involved in the deployment, it sort of balanced out. As we changed to one-click deployments, that sense of involvement was lost, leaving the team feeling completely out of the loop. We learnt the hard way that the development team had to share much more information about code and configuration changes with the people charged with first-line support. After a slightly rocky start, we’ve all ended up in a much better place where communication and involvement are key.

From a development team perspective, we’ve found a much greater sense of ownership, increased confidence and happiness. The team is now much more production-focused and cares much more about getting changes into the hands of users. Probably the best indicator of this is that the “ready for deploy” column on our kanban board is now usually empty!

One of the most surprising challenges we faced was that we started delivering much faster than the business was expecting. We found that we hadn’t taken the business with us on the ride, and they were not used to things being delivered this fast. However, increasing communication helped us work through this – something that can only help.

Future

Future improvements will likely include:

  • Better visualisation – we found that the single pipeline was good for developers, but didn’t really show other teams what they were after. It also meant that in the event of a rollback, there was no obvious way of showing that this had occurred. We are planning a dashboard that offers slightly different views depending on the audience. For example, the business wants to know when deployments happen and when they can expect their change to go live, whereas Production Support wants to know what is in production right now, and what changed in the most recent release.
  • Automated maintenance mode – we inadvertently caused several P2 alerts by deploying without setting our monitoring tool (SCOM) into maintenance mode. We intend to update our installation tool to put the server into and out of maintenance mode automatically, and to raise alerts if an installation fails.
  • A fully trusted automated test suite – we are aiming to get to the point where our automation is trusted enough that we can remove the manual test stage (bring on Continuous Deployment!).
  • Automated raising of CRs – we are contemplating generating a change request (CR) automatically from Git commit logs.
  • Blue/green deployments – now that we can deploy much more easily, and once we have migrated all the train tools apps to this new process, we will implement blue/green deployments with automated switching at the load balancer level.

Summary

In short, we have found that the new deployment pipelines have had a surprising effect on the culture of our teams, encouraging shared ownership and a delivery focus. With a shorter cycle time, the business has been able to see changes delivered to production much sooner.

Acknowledgements

Thanks to Pete Belcher and Matthew Skelton.

Comments

  1. For other applications, we are using environments in this way; however, it prevents the single-pipeline visualisation. The Value Stream Map feature in Go 13.2 goes some way towards solving this, but we were aiming for the single end-to-end pipeline view. As we put multiple parallel test environments into this pipeline, we will run into issues again, which might force us down the path of the Value Stream Map, or even away from the Go view and towards the per-team dashboard concept.

  2. Nice feedback. Your story is pretty much our story.
    When I saw your pipeline, I was wondering how you could handle many environments in one pipeline.
    Then I got the answer when I read the “Hurdles” paragraph.
    I can suggest a way to make it work.
    Instead of having a single pipeline with X stages, you could have X pipelines with one stage each. Then, each pipeline can have a different environment.
    Of course, you will have to pass artifacts between your pipelines. I guess you had to use them anyway between your stages.

    The easiest way to transform your pipelines like this is to change the XML configuration directly on the server (back it up first, and note that this configuration is under Git source control).

    • Thanks for your comment.

      The multiple-pipelines approach seems to be the way that is encouraged with Go these days, but unfortunately it seems to break the main benefit of using Go – the end-to-end pipeline visualisation. It’s a shame, really.
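For readers unfamiliar with Go, the artifact hand-off between pipelines that the commenter describes uses a pipeline dependency material plus a fetchartifact task; a rough, hypothetical fragment (all names invented) might look like this:

```xml
<!-- A downstream deployment pipeline that runs whenever the upstream "build"
     pipeline's "package" stage passes, and fetches the package it published -->
<pipeline name="deploy-to-test-example">
  <materials>
    <pipeline pipelineName="build" stageName="package" materialName="upstream-build" />
  </materials>
  <stage name="deploy">
    <jobs>
      <job name="deploy">
        <tasks>
          <fetchartifact pipeline="build" stage="package" job="create-package"
                         srcdir="binaries" dest="deploy" />
          <exec command="deploy.bat" />
        </tasks>
      </job>
    </jobs>
  </stage>
</pipeline>
```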
