Is DevOps the answer? Or just a key part of the Journey? Part 3


This post is part 3 in a series. Read part 1 and part 2.

Key Learnings

Part 2 finished detailing our relatively recent move to Product teams, a change that has had a big impact on our delivery process.

While this is definitely an exciting change, with product teams taking on far more responsibility from development through to live, it highlights that Development and Test environments share some needs with the Live environment but also have differences that must be clearly understood and supported, potentially in a different way from Live:

  • Provisioning both Automated and Manual
    Using the same tools, processes and resources from the build farm through to live deployment is key to reducing the time taken to operate the pipeline. The market has no clear leader: Chef and Puppet are popular in this space but still lack many capabilities.
  • Configuration Management
    Drive as much of the infrastructure and application configuration as possible, from development through to live, using SDLC processes.
  • Change Control
    The number of gates and the level of approval are historically driven by the risk of the change failing and the time at which the change will be implemented. With increased automated testing, and Canary Releases or Blue / Green deployments backed by adequate real-time monitoring, changes can be made without the need for a formal manual change review board. The quality of auditing becomes more important: when did change x actually take place, and when did the system observe a change in reliability? The tooling here remains inadequate and bespoke, in particular for systems that have a large fulfillment window; for example, a UI change may not result in customer issues for many days if the fulfillment is delayed.
  • Incident and Problem Management
    Whom to communicate issues to will vary, but as mentioned previously an outage to a test environment can be as important as a production incident. The tooling and general processes for managing problems and incidents should be the same, but the communication plans and business impact do differ, e.g. internal communications vs external communications and contractual liabilities. JIRA is more than adequate at managing the tickets, but an organisation's maturity in prioritising non-production incidents and problems over production ones is a benchmark of how well continuous delivery is understood.
  • Availability and Performance Management
    Application Performance Management (APM) tools, particularly those with “Real User Monitoring”, are essential and must be accessible to everyone. Products such as New Relic and AppDynamics lead the field. Ensure the APM tools are also used on the build and deployment infrastructure, as well as across test and live environments.
  • Capacity Management and Scalability
    Cloud, whether public, private or hybrid, has allowed for a step change in auto-provisioning of servers. However, scaling storage, although cheap, still requires effort to implement, and there are differences between Test and Live environments that need to be handled.
  • Security including Anti Virus
    What are the actual threats that need to be protected against? Security should be baked into applications and development tools and processes but there still needs to be effective dynamic monitoring of threats. The live systems will need extra protection and monitoring.
  • Patching (not the application code)
    In an ideal world all servers will be baked and rebuilt frequently with the latest patches applied, but there will still be servers that cannot be rebuilt, and hence effective processes still need to be implemented to allow for patching of the operating systems, RDBMS and messaging frameworks (middleware).
  • Backups
    Leave this to someone else; do not burden your Product teams with it, BUT ensure that there is clarity between systems of record and systems of engagement. Infrastructure has the capability to back up both machines and data.
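
The auditing question raised under Change Control (when did change x take place, and when did reliability change?) can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Change` record, the `candidate_changes` helper and the sample data are assumptions for illustration, not the tooling we use. The idea is simply to list the changes deployed within a lookback window before an observed degradation, with a wider window for systems whose fulfillment is delayed.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Change(NamedTuple):
    """Hypothetical audit record: what was changed and when it went live."""
    change_id: str
    deployed_at: datetime


def candidate_changes(changes, degraded_at, lookback):
    """Return changes deployed within `lookback` before `degraded_at`, newest first."""
    window_start = degraded_at - lookback
    hits = [c for c in changes if window_start <= c.deployed_at <= degraded_at]
    return sorted(hits, key=lambda c: c.deployed_at, reverse=True)


# Illustrative data only.
changes = [
    Change("ui-tweak", datetime(2015, 3, 2, 10, 0)),
    Change("db-patch", datetime(2015, 3, 5, 14, 30)),
    Change("api-release", datetime(2015, 3, 6, 9, 15)),
]
degraded_at = datetime(2015, 3, 6, 11, 0)

# A short window suits changes whose effects show up immediately.
short_window = candidate_changes(changes, degraded_at, timedelta(hours=6))

# A long window suits delayed fulfillment, where a UI change may only
# cause customer issues days later.
long_window = candidate_changes(changes, degraded_at, timedelta(days=5))

print([c.change_id for c in short_window])
print([c.change_id for c in long_window])
```

With the short window only the most recent release is a suspect; widening the window brings the earlier UI change back into scope, which is the case the delayed-fulfillment example describes.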

Summary

For any eCommerce business in 2015, Continuous Delivery is mandatory, not an option. Automation plays a key role in this. To ensure that Operations requirements are built into the system being developed, as well as into how it is deployed, monitored and managed, operations and development staff MUST work together. The culture and management of the organisation must embrace this. This does not mean that everyone involved in development and operations is a full-time member of every team, as this would be impractical in any medium to large organisation, BUT they must all follow the principles of Continuous Delivery and be encouraged to do so.

Shipping early (MVP), even with minimal features, allows for quicker feedback, which in turn drives product optimisation. Discovering early in the process that a product is not fit for purpose is significantly cheaper than uncovering the failure at the end, when full design, development and integration costs have been incurred for the entire product. This is one of the great benefits of the agile journey.

This is important for us at thetrainline, as the time taken to release new products and resolve issues after go-live affects the next wave of development. If it takes multiple weeks for a new feature to move from the test environments to live use, this causes a drag on product improvements. It is not possible to maintain an effective backlog that balances improvements against new features if the feedback from production takes many weeks. Development resources are redeployed to new features, which are then delayed, increasing the total cost but, more importantly, delaying value to the business.
