Trainline is Europe’s leading independent rail ticket retailer, selling £2.3bn of tickets per year and enabling our customers to make more than 100,000 smarter journeys every day. We have 150 development staff who are constantly improving our user experiences, and our need to innovate means that we cannot allow the underlying infrastructure to be a constraint on time to market.
This desire for infrastructure agility recently led us to migrate 100% of our Development, Staging, UAT and Production environments from a legacy private data centre to Amazon’s public Cloud. Simply lifting and shifting components into the Cloud would have improved agility somewhat, but for us this was just the starting point.
In our legacy environment, Trainline developers spent up to 30% of their time troubleshooting environment issues – mismatched or missing versions, differences to other environments, problems with deployments etc. Consequently, we recognised that we also needed a standardised way for our developers to quickly and safely deploy and manage individual applications and entire environments. After much searching, and several false starts with existing tools (see below), we ended up developing our own platform. This platform is known as Environment Manager.
As announced at the recent AWS Summit, we’re delighted to be open sourcing Environment Manager and hope it proves useful to a wider audience. The code is available as of today, and we would love to hear your feedback!
What is Environment Manager?
Environment Manager is a minimally opinionated platform that enables continuous delivery of software components into Windows and Linux AWS environments. It features blue/green, canary and overwrite deployments; multi-tenancy support; an API that enables deployments to be scripted into build pipelines; as well as audit capabilities suitable for a PCI Level 1 organisation. Environment Manager is best suited to companies with between 100 and 5,000 servers running a mixture of legacy and modern applications.
Key Features and Benefits
- Developers save time by deploying new services flexibly and reliably using an innovative approach based around Consul
- Create and refresh entire environments, trigger deployments across one or more servers with full visibility into deployment progress and issues
- Reduce downtime with built-in health checks that alert and automatically remove misbehaving applications from the service pool
- Manage environments – who owns them, what they are for, what applications are deployed where
- Minimise the likelihood and impact of errors using blue/green and canary deployments
- Improved visibility and control, including change audit history and rollback, lead to faster diagnosis of issues when they do occur
- In-built capabilities to determine the age of AMIs help improve security compliance
- Reduced AWS costs by scheduling servers and/or environments to be turned off when not in use
- Reduced AWS costs through efficient multi-tenancy support
- Support staff save time by scaling and patching servers without downtime
- Compare and synchronise environments e.g. view app versions deployed across Test, Staging and Production
- Improved manageability through the automatic application of infrastructure standards e.g. for security, naming, tagging etc. as part of deployment process
- Fine-grained permissions model based on resources, actions, environments and ownership
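As a rough illustration of the last point, a permission check over resources, actions, environments and ownership could look something like the sketch below. This is not Environment Manager's actual code or schema: the `Permission` fields, the glob-style resource patterns and the `is_allowed` helper are all assumptions made up for the example.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Permission:
    """A single grant. All field names here are hypothetical."""
    resource: str            # glob pattern, e.g. "deployments/*"
    action: str              # e.g. "read", "write", or "*" for any action
    environment: str = "*"   # environment name, or "*" for any environment
    owner_team: str = "*"    # owning team, or "*" for any owner


def is_allowed(permissions, resource, action, environment, owner_team):
    """Return True if at least one grant covers every part of the request."""
    return any(
        fnmatchcase(resource, p.resource)
        and p.action in ("*", action)
        and p.environment in ("*", environment)
        and p.owner_team in ("*", owner_team)
        for p in permissions
    )


# Example grants: write access to deployments in staging only,
# plus read access to everything.
grants = [
    Permission("deployments/*", "write", environment="staging"),
    Permission("*", "read"),
]
```

With these grants, a write to `deployments/ticket-api` is allowed in `staging` but refused in `production`, while reads are allowed everywhere.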
Why Yet Another Tool?
When we looked at existing tools, we found that they tended to be opinionated about the approach (e.g. Spinnaker assuming baked images), about technology (e.g. Kubernetes assuming everything is a container) or about single-tenancy (e.g. AWS CodeDeploy), or that they lacked first-class Windows support (e.g. basically everything).
In the real world, most businesses have a range of technologies including many that are not ideally suited to Cloud or that cannot yet be containerised. For example, Trainline still have one old Visual Basic application that requires a Windows server joined to Active Directory with a static IP, yet we also have cutting edge NodeJS applications deployed to immutable Linux servers.
Environment Manager has been designed to be as un-opinionated as possible. It supports both Windows and Linux, along with a range of deployment methods including basic overwrite, blue/green, canaries, and immutable servers. Support for containers and Lambda functions is also on the roadmap.
Taking a more pragmatic approach means a single tool can support our whole estate as it is now, and as we continue to evolve systems to newer technologies in future.
Another key difference in Environment Manager is the level of abstraction it deals with. Rather than expecting people to log into the AWS Console and manage individual instances, volumes etc., the tool provides a higher-level abstraction over the top of AWS that better fits the mental model of developers e.g. ‘I want to deploy this component to that environment’. It does not attempt to duplicate the AWS Console, merely to provide an easier and more natural way of performing environment-level operations and the most common instance tasks.
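To make the ‘deploy this component to that environment’ idea concrete, here is a minimal sketch of what a request at that level of abstraction might carry, and how the deployment methods mentioned in this section differ in their steps. It is purely illustrative: `DeployRequest`, `Mode`, `plan` and the step wording are hypothetical names invented for this example, not the Environment Manager API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    OVERWRITE = "overwrite"
    BLUE_GREEN = "blue/green"
    CANARY = "canary"


@dataclass(frozen=True)
class DeployRequest:
    """What the developer states: no instances, volumes or AMIs in sight."""
    component: str
    version: str
    environment: str
    mode: Mode


def plan(request, canary_percent=10):
    """Expand a high-level request into an ordered list of steps."""
    target = f"{request.component} {request.version}"
    env = request.environment
    if request.mode is Mode.OVERWRITE:
        return [f"install {target} over the running version in {env}"]
    if request.mode is Mode.BLUE_GREEN:
        return [
            f"install {target} on the idle slice in {env}",
            "wait for health checks on the idle slice",
            "switch all traffic to the new slice",
        ]
    if request.mode is Mode.CANARY:
        return [
            f"install {target} on {canary_percent}% of servers in {env}",
            "wait for health checks on the canary servers",
            f"install {target} on the remaining servers in {env}",
        ]
    raise ValueError(f"unsupported mode: {request.mode}")


request = DeployRequest("ticket-api", "1.4.2", "staging", Mode.BLUE_GREEN)
for step in plan(request):
    print(step)
```

The point of the sketch is only that the caller names a component, a version, an environment and a method; everything below that level is worked out by the tool.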
Limitations and Assumptions
Environment Manager is best suited to companies with between 100 and 5,000 servers running a mixture of legacy and more modern applications.
Environment Manager currently makes the following assumptions:
- All infrastructure hosted in AWS
- For EC2, all instances are deployed into Auto-Scaling Groups as the unit of management
- NGINX is used for load balancing
External dependencies have been designed to be pluggable, so if you prefer an alternative technology it should be supportable with relatively minor code changes. Similarly, Trainline specific standards (e.g. for how instances are named and tagged) can be adjusted to suit your own requirements with minimal effort.
What has not been accounted for is the use of a Cloud platform other than AWS, or any kind of hybrid infrastructure. This is not on our roadmap as we intend to stay with AWS for the foreseeable future.
Under the hood, Environment Manager is built with the following technologies:
- The self-service console is an AngularJS web application backed by a set of NodeJS APIs
- Lambda functions help with housekeeping and ancillary tasks such as instance scheduling
- Consul is used for service discovery, health checks and deployment
- Configuration is held in DynamoDB
- Active Directory is used to authenticate users within the tool and detect group membership
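As an illustration of the Consul health check idea, the sketch below filters a list of service entries down to the instances whose checks are all passing. The entry shape (`Node`, `Service`, `Checks`, and a `Status` of `passing`, `warning` or `critical`) follows the JSON returned by Consul's `/v1/health/service/<name>` HTTP endpoint; the `healthy_instances` helper and the sample data are assumptions for the example, and this is not a description of how Environment Manager itself consumes Consul.

```python
def healthy_instances(entries):
    """Return (address, port) pairs for entries whose checks all pass.

    `entries` is a list shaped like the JSON body of Consul's
    /v1/health/service/<name> endpoint.
    """
    healthy = []
    for entry in entries:
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            # Fall back to the node address when the service has none of its own.
            address = entry["Service"]["Address"] or entry["Node"]["Address"]
            healthy.append((address, entry["Service"]["Port"]))
    return healthy


# Hand-written sample data, not real output from any cluster.
sample = [
    {
        "Node": {"Node": "web-01", "Address": "10.0.0.11"},
        "Service": {"Service": "ticket-api", "Address": "", "Port": 8080},
        "Checks": [{"Name": "http", "Status": "passing"}],
    },
    {
        "Node": {"Node": "web-02", "Address": "10.0.0.12"},
        "Service": {"Service": "ticket-api", "Address": "", "Port": 8080},
        "Checks": [{"Name": "http", "Status": "critical"}],
    },
]

print(healthy_instances(sample))
```

In this sample only `web-01` would stay in the service pool; `web-02` drops out because its check is critical.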
The diagram below shows a logical view of the application architecture.
The code is available now in the Trainline public GitHub repository. All code is released under the Apache 2.0 license.
There is also an embryonic Environment Manager website with further information and installation instructions. More content will appear here over the next few weeks.
We wanted to open source this tool as early as possible and then continue to improve it. This means it still has some rough edges and Trainline specific assumptions. In particular, it should be noted that setup is currently a pretty lengthy process.
However, we are now using this open source version ourselves internally, not a separate fork. This means we will continue to add features and improve the code base – including improvements for the wider community such as reducing dependencies, simplifying setup and removing Trainline specific assumptions.
We genuinely believe this tool has great potential to help in the wider community. We would love to hear what you think!
For feedback, help or suggestions, please contact: firstname.lastname@example.org