Our team supports nine applications from the same code base (achieved through a combination of configuration, feature toggles and CSS magic). This code base has evolved continuously over the last five years, and we release at least once a week, often more. Given this, you can imagine how vital a role unit tests play.
We depend heavily on our unit tests (among other things, of course) to ensure that releases go smoothly and that, when we add that shiny new feature that lets the customer change her seat, it does not break the feature that lets her get the ticket on her mobile! To achieve this, our team adheres strictly to TDD. We have over 10,000 unit tests that run every time a commit is pushed to GitHub, and that number keeps growing with every new feature we develop.
How could we run our unit tests faster?
OK, so we have great unit-test coverage. The side effect, however, is that running the unit tests usually took more than 5 minutes. That is not a big number by itself, but it becomes an irritant when we run the tests on our developer boxes several times a day before pushing our commits to git. On a given day, a developer could spend 15 to 30 minutes waiting for the tests to run and the build to finish. So how could we speed up this process?
It turns out that the NUnit 3 test engine can run tests in parallel, which we hoped would reduce our test execution time. We also looked at how Rake's multitask could help us reduce our overall build times. Read on to see what happened…
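Rake's multitask tells Rake that a task's prerequisites are independent, so it may run them concurrently instead of one after another. The sketch below is not Rake; it is a minimal Python illustration of the same idea, assuming the build can be split into independent steps. The step names and durations are hypothetical, and `time.sleep` stands in for real work such as compiling a project or running one test assembly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(name: str, seconds: float) -> str:
    """Stand-in for a real build step; it just sleeps for `seconds`."""
    time.sleep(seconds)
    return name


def run_prerequisites_in_parallel(steps: dict[str, float]) -> list[str]:
    """Start every prerequisite at once, wait for all of them to finish,
    and return their names in declaration order."""
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = [pool.submit(run_step, name, secs) for name, secs in steps.items()]
        # result() blocks until each step completes and re-raises any failure,
        # so the caller never reaches the dependent step if a prerequisite fails.
        return [future.result() for future in futures]


if __name__ == "__main__":
    # Hypothetical independent prerequisites of a "package" task.
    prerequisites = {"compile_web": 0.2, "compile_api": 0.2, "compile_tools": 0.2}
    start = time.perf_counter()
    finished = run_prerequisites_in_parallel(prerequisites)
    elapsed = time.perf_counter() - start
    print(f"finished {finished} in {elapsed:.2f}s")
    # Only at this point would the dependent "package" step run.
```

With three 0.2-second steps, the parallel run should take roughly 0.2 seconds rather than 0.6; in a real build the gain depends on how independent the steps really are and how many cores the build machine has.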
Recently, we performed a mass migration of our git repositories from Gitolite to GitHub Enterprise.
We had found that Gitolite required a high level of maintenance, and its configuration complexity had quite an impact on the team looking after it. We were running a rather old version with some serious security flaws, on out-of-date snowflake servers. One of the biggest issues, though, was that developers had to ask another team to create repositories, change permissions and so on, which added unnecessary delay and caused blockages.
After reviewing multiple options, we decided to migrate to GitHub Enterprise (GitHub), which runs as an on-premises VMware appliance. We chose it because most developers are familiar with github.com, and because of GitHub’s superior support among third-party tools. This allowed developers to create repositories and perform most common tasks as self-service, rather than relying on another team.
As this migration does not appear to be very common, this post shares some detail about the steps that were required.
Frequently, when compiling applications, we need to update the version number inside some source files, so that the binary ends up with the correct version metadata. This usually means there is a build task that modifies a source file, based on the information passed from the controlling CI system.
This works well when it all happens on the CI server, and any modifications to those files are thrown away at the end of the build. However, it can be a pain when you run the build locally and end up with modified files in your working copy. You are able to run the same build that happens on the CI server locally, aren’t you?
This can be avoided by skipping this task in your build script if it’s not run under the CI server (for example, if certain environment variables are not present). The downside of this is that the process you test locally is different to the one that runs on the CI server.
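A minimal sketch of the skip-when-not-on-CI approach described above might look like the following. It assumes a .NET project whose version lives in an `AssemblyInfo.cs` attribute such as `[assembly: AssemblyVersion("1.0.0.0")]`, and a CI server that exposes the version in an environment variable; `BUILD_VERSION` is a hypothetical name (each CI system uses its own), and the function names are illustrative only.

```python
import os
import re
from pathlib import Path
from typing import Mapping

# Hypothetical variable name: set this to whatever your CI server exports.
CI_VERSION_VAR = "BUILD_VERSION"

# Matches e.g. AssemblyVersion("1.0.0.0") inside an AssemblyInfo.cs file.
ASSEMBLY_VERSION = re.compile(r'AssemblyVersion\("[^"]*"\)')


def stamp_version(source: str, version: str) -> str:
    """Return `source` with every AssemblyVersion attribute set to `version`."""
    # A function replacement stops re.sub interpreting backslashes in `version`.
    return ASSEMBLY_VERSION.sub(lambda _match: f'AssemblyVersion("{version}")', source)


def maybe_stamp_version(path: Path, env: Mapping[str, str] = os.environ) -> bool:
    """Rewrite the version in `path` only when running under CI.

    Returns True if the file was modified, False if the step was skipped.
    """
    version = env.get(CI_VERSION_VAR)
    if not version:
        # Not on the CI server: leave the developer's working copy untouched.
        return False
    path.write_text(stamp_version(path.read_text(), version))
    return True
```

Because the step does nothing locally, developers never see a modified `AssemblyInfo.cs`; the trade-off is exactly the downside noted above, since the local build skips a step that the CI build performs.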
Here at thetrainline.com we have several useful online tools for helping our customers plan and manage their train travel, including Train Times and Live Departure Boards. We recently changed the way we build, test, and deploy these kinds of applications to enable us to release new features much more frequently and easily; in fact, we shortened the deployment cycle from one deployment every few months to multiple deployments per week. These changes have produced a sea change in team culture, with a marked increase in product ownership by the team. This post describes what we’ve done so far, and where we want to go over the coming months.
For several years, much of the code for the systems at thetrainline.com has been versioned and deployed together as a single ‘platform’. Recently, we have begun to divide up the platform into smaller chunks, to enable us to deliver some parts more frequently and rapidly, leaving other parts to evolve more slowly (as needed). Moving from a single version number for all subsystems to multiple version numbers for independent subsystems has implications for how code is built and released; this blog post outlines some of the work we have done so far in this area.
My colleague Owain Perry and I recently presented on this topic at the London Continuous Delivery meetup group (http://londoncd.org.uk/), and the slides we showed relate to the details in this post.
At thetrainline.com we use Opscode Chef for managing our build infrastructure. Like many other tools running on Windows, chef-client’s Ohai framework relies on WMI to extract information about the machine on which scripts are being run. We found that Windows WMI repository corruption can cause chef-client runs to fail due to missing WMI classes, which leaves the node out of policy. The WMI repository can be repaired using winmgmt /salvagerepository, and WMI errors can be monitored using the WMIDiag script, so that we are alerted to WMI repository corruption before future chef-client runs. This post details how we detected and fixed the problem, and how to monitor for WMI repository corruption.
In common with other big systems, thetrainline’s systems use a variety of technologies under the hood. Most of our code is written for the .NET framework, although there are bits of other technology stacks in there as well.
Recently, working with a project targeting version 3.5 of the .NET framework using Visual Studio, I came across a rather subtle gotcha.
Visual Studio 2010 was released in April 2010 and by default targets version 4 of the .NET framework. Version 4 of .NET, together with C# 4, brought the following features, amongst others.
- The Parallel extensions library.
- Dynamic dispatch.
- Named parameters.
- Optional parameters.
It was this last feature – optional parameters – that was the original source of this gotcha, leading to ‘error CS0241: Default parameter specifiers are not permitted’.
A commonly overlooked area of many systems is the non-functional requirements and the design needed to meet them. Patterns for Performance and Operability by Ford, Gileadi, Purba and Moerman gives everyone involved in the software life-cycle, from development to support, a good foundation in understanding why non-functional requirements are important, along with real examples of how to capture, develop, test and operate with these requirements. Systems fail when non-functional requirements have not been considered, and it is everyone’s role in the SDLC to consider them.
At thetrainline.com we recently transformed our software release process by rebuilding our problematic test and integration environments on a private (on-premises) PaaS cloud platform. The outcome of the 8-month project was a fully automated and repeatable infrastructure and software build process that reduced the environment build time from 12 weeks to 4.5 hours, achieving ROI within 8 months. In this post we’ll explain why we chose a private cloud over the readily available public cloud offerings, describe the components, share what we’ve learnt, and show how we were able to use the experience to improve our other environments and processes.
We’re excited to be launching this blog covering the work done by the engineering team at thetrainline.com, the UK’s foremost retailer of train tickets online, which runs one of the busiest web infrastructures in the UK.
Expect details of how we build and deploy our software, how we record and display metrics, how we design, configure and operate the system, and how we diagnose and fix errors in the software, networks, hardware, infrastructure and system architecture.
Follow us on Twitter: @ttl_engineering
thetrainline.com engineering team