DevOps is part of the natural evolution of Agile Development and Continuous Delivery, where quick feedback from users of a system in the production environment helps to drive the next phase of the product and to maintain or improve the rate of change and the total cost of ownership. But the fundamental principles of DevOps are not new. Developers have been seeking an understanding of how their code behaves since the first line of code was written. This post summarises the DevOps journey at thetrainline.com and how Operations have embraced the principles of DevOps with the goal of achieving Continuous Delivery. DevOps is a key part of the answer to improving product throughput, but it is only a small part. This post details my 12-year journey with thetrainline.com but, more importantly, the wider need for Developers to learn from others involved in a product’s life cycle, both from a historical viewpoint and in terms of the capabilities available today. For a business to continue to grow, the tools and processes that reduce the time taken to get feedback from live use of a product are essential, whether it is a start-up or an enterprise.
Without going back too far into history, when a programmer handed over his punched cards to an operator he would eagerly await the results of the next batch run, expecting failures and waiting to learn from the mistakes made. Unfortunately, the more bugs you had, the less likely the operator would be to load your cards in the future. From these early days the battle lines were drawn between the developer and the operator: the developer becoming ever more frustrated that the operator would not allow them near production, and the operator keen to keep things working without the ever-increasing number of ad hoc changes threatening stability.
As systems have become increasingly complex, and with the lines between hardware and software continuing to shift, it has become harder to learn from mistakes. The conveyor belt of software delivery has resulted in multiple teams and layers of management trying to remain in control. Environments have been spawned to manage the quality and predictability of the next delivery, increasing the distance between the developer’s machine and the production environment. Now not only are developers kept away from production, but they have another layer of environments and users to work with (or ignore) and potentially another layer of operations staff to interact with.
Developers understand the need for separation of concerns when designing their applications and the software behind them. The same applies when a development team is constructed; UI developers are a different beast from those who specialise in systems integration. So developers do understand the need to have someone who manages the WAN or the SAN. But they do not understand why the person supporting the systems overnight is someone they have never seen. Why are teams organised in a way that makes it harder for us to learn from mistakes? DevOps is happening in many business scenarios, not just in start-ups.
For some it has always been here. Its name may change over time, but the underlying principles, chief among them communication and trust, must be embraced by operations for an enterprise to continue to grow and succeed. Operations are the key stakeholder and must help shape and develop DevOps rather than feel they are being led down a path of no return. Similarly, the rise of Infrastructure as Software is not new. As the complexity of infrastructure increases, the need to adopt software development patterns and practices within infrastructure deployment and management is also key, from basic versioning to test-driven development, followed by continuous integration and, potentially, continuous delivery. All this points towards the importance of learning from mistakes and applying the lessons as early as possible in the product life cycle in order to reduce the total cost of ownership. But a careful balance needs to be struck to ensure that both the automation and the people performing the development and operations activity can be sustained.
Within a traditional IT/IS operation the service delivery manager is continually seeking ways to improve the level of service provided. This is typically driven by reviewing the incidents, problems and changes over a period of weeks or months. For systems that have a low rate of change, are stable and have change delivered in one large batch, this approach to seeking improvements works well. However, as more and more systems rely on frequent change to enhance and improve the functionality provided, and are therefore delivered through agile rather than waterfall development methodologies, the service delivery manager’s ability to manage continuous improvement is also challenged.
eCommerce is well suited to Continuous Delivery and Service Improvement. Not only are the users and channels of use continually changing, but so are the underlying hardware platforms supporting the systems. The typical life cycle for an eCommerce platform is measured in years, not decades. Some would argue that the user interface should be replaced within months. This level of change needs an operations methodology that can provide continuous improvement.
Developers need the skills of Operations staff to ensure that the systems being built can scale, operate 24×7 and meet performance expectations. Operations need the skills of Developers to automate the operation of the systems. This covers aspects such as deployment, resilience, scalability, monitoring and configuration of the systems through all environments. The historical approach of moving a system through development, QA, NFT, staging and live environments is based on a waterfall approach. The tools and techniques for managing change, incidents and problems across the environments are inconsistent and typically an afterthought. The staff members involved in the end-to-end process vary, and knowledge must be shared at each stage for them to succeed.
Over time, as development teams move on to new challenges, the knowledge held in the operations team becomes insufficient to resolve complex issues, and the cost of resolving these issues increases. Many enterprises also make use of third parties and off-shore locations to operate their systems, again adding to the complexity of operations. As organisations embrace the move to the cloud, we are entering another phase of evolution for IT and IS functions within an enterprise. DevOps provides an approach to delivering Continuous Improvement in this next phase of IT and IS. DevOps includes the relevant automation to achieve Continuous Deployment, but it is more than that: it must include interaction between application support teams and development teams, as well as between development teams and infrastructure teams.
To be continued…
Part 2 will cover thetrainline’s journey in improving throughput. Part 3 will share our key learnings.