Is DevOps the answer? Or just a key part of the Journey? Part 2

This post is part 2 in a series. Read part 1.

thetrainline’s Journey in Improving Throughput

From a very early point in thetrainline’s journey it was clear that the web site was only the tip of the iceberg, and that a continued programme of development would be needed to improve customer experience, to adapt as web technology evolved, and to implement more automation in back office processing, initially for fulfilment and most recently for refunds. In the past 14 years rail travel has doubled in size and customers’ expectations have also risen. Although journey planning and advance purchase ticketing are well planned and carried out in advance, the immediate future will see more innovation in ticketing, from smart cards supporting multiple train operators to NFC payment and, potentially, NFC ticketing. In order to provide the required levels of service at the relevant price for the product, across all channels and devices, thetrainline will need to continue to improve the throughput of ideas through to production implementation.

1999 to 2003 – The Waterfall Age

The first four years of our journey were formed in the early dot com bubble, with much hype over how the internet would revolutionise the modern world. With hindsight we are not quite there, but without doubt the internet has dominated, and will continue to dominate, all markets and businesses for the foreseeable future. The demand for speed of change will continue to grow if an enterprise is to continue to grow. thetrainline was first launched in 1999, and in its first 12 months as a start-up it was able to benefit from the development, support and infrastructure teams and specialists all working closely together. Despite the typical waterfall approach, the proximity of the teams to each other and a single management structure allowed operations staff and developers to communicate and rotate roles, and helped ensure that feedback from the production system was used in the next phase of development.

However, as the system grew and matured, as experienced people left and new members joined, and as SLAs were defined, the lines between development and operations teams were drawn. The push to reduce costs resulted in infrastructure being run by mutualised teams in remote locations. Initially, during a period of relatively little development, this was acceptable, but it would ultimately lead to the system being redeveloped by a development team with little knowledge or experience of the operational requirements and of how the system behaved in live. During this period releases were delivered every quarter, but the time from idea to live would be at least six months, with significant management overheads.

2003 to 2007 – The Dawn of Agile (Almost)

The next chapter of growth would see the implementation of more agile methods of development, as the need to re-platform from a Visual Basic and Windows 2000/NT infrastructure was clear. Adopting a waterfall approach to the re-platform would have resulted in a multi-million pound failure and potentially the end for thetrainline.

A major platform refresh was undertaken, including both the application software and the infrastructure. The initial approach was a traditional waterfall one, but using the Rational Unified Process (RUP). During the development phase key developers would play central roles in providing and maintaining the build and test environments, automating the deployments where possible using Microsoft tools. But when the system needed to be deployed to production, significant delays and pains were felt due to the change in people involved and the diverse responsibilities of the development and operations teams. Migration from the old platform to the new platform had also been underestimated. The lack of process and organisation structure, married with the poor tooling and automation, led to severe delays.

To regain control, Agile Development approaches were implemented, including extreme programming, test driven development and continuous integration. However, there still remained a separation between Development and Operations teams. This provided a sense of control over changes being implemented in production, but ultimately still left the operations teams without the required functional knowledge and the development teams without production feedback. The net effect was that a significant number of product improvements still remained on the shelf for months, and by the time they were live the developers had already moved on to something else. It also required knowledge transfer to the Operations teams to be managed, with the inevitable loss of knowledge along the way.

2007 to 2011 – The Teen Years of Agile

To address the loss of control over production, and to increase the knowledge and feedback into development of operational requirements, 2nd and 3rd line application support teams were formalised and close working between the teams was made a primary goal. Initially this proved successful, as it provided much needed stability to production systems, but again we would ultimately hit the limits of continuous improvement.

During the same period server virtualisation was also implemented in the test and production environments. Due to physical separation as well as commercial relationships, the resources, tools and management of the two sets of environments remained separate. Initially this did not block improvements in throughput and stability, but in the long run it led to inconsistency in automation approaches and a lack of feedback to the development teams.

2011 to present – Improving on Agile

The most recent past has been focused on three key initiatives:

  1. Formalising and enhancing operational processes such as Run Books, Service Monitoring and Change Control to work with Agile delivery. Without these being managed and stable, other initiatives would still be blocked.
  2. Removal of snowflakes within the test and development (build) environments. Due to the number of environments, variations in their purpose and users, plus variations in operational requirements, the need for closer collaboration and communication between development and operations staff has never been greater. Development and Test environments are production systems: if they stop working it is very costly in terms of lost productivity and velocity.
  3. Most recently, and most importantly, the implementation of Product teams responsible for their products from idea to operation, from development through to live.

To be continued…

Part 3 will share some of our key learnings.