Clusters, IT Operations and why Availability is the wrong word


It has been a very busy time at thetrainline. Transformational projects have been implemented to improve our development capacity and IT Operations, including automation of our build agent environment using Chef and more direct control over our hardware across the entire Development to Production pipeline. Consequently the Engineering Blog has been somewhat overlooked, for which I apologise.

Clusters

Even though the above is an excellent list of cool stuff that needed doing, for me the most important change (aside from the arrival of a new CTO, Mark Holt) has been the shift from a project centric development approach to ‘Clusters’. Clusters have been on the (Kanban) Cards for a while at thetrainline but have taken time to make it through the Pipeline. The brainchild of Duncan Freke, our Development Director, Clusters is about aligning the Dev and Commercial teams on a Product basis and, by doing so, empowering the Product Owners to take ownership and derive true value for the customer. Sounds like Marketing hype, right? Yes, but that does not mean it’s not true.

Where does IT Operations fit in?

Given that Clusters is primarily an evolution of the Development and Commercial departments, you would be forgiven for wondering why, as Head of IS Operations, I am writing about this change. The answer is simple: I believe IT Operations is not just about Availability but about the Service we provide to our customers.

For example, when I tell someone who I work for and the first thing they mention (after asking if I get cheap train tickets) is our booking fee, I take it personally. Not because I think they are being cheap, but because we have failed them. If the experience they received – whether it is the performance of the site, the features they used or the information we gave them throughout the booking flow – did not give them added value equal to the booking fee, then we have done something wrong.

This is why Clusters is important to me. We now have true ownership of the products we offer, and as such my teams have a consistent place to go and explain the customer experience that we see – something that does not exist with a project centric approach.

Why is Availability the wrong word?

So, after I have apologised to someone I have just met for not getting the value they expected out of our site, I explain what I do, but I actively avoid using the word Availability. It’s not that I don’t like the word or what it stands for, but for me it’s the wrong focus.

To explain why, I am going to digress for a minute and draw an analogy with mobile phones. You may remember some of these beauties (I even had a pencil case that looked very much like the second from the left 🙂 ):

[Image: mobile-evolution-1]

Source: Kyle Bean (Mobile Evolution). Work licensed under a Creative Commons BY‑NC‑ND 3.0 License.

Once mobile phones entered the mainstream they rapidly got smaller and smaller. However, as I was reliably informed some years ago by an Industrial Designer friend of mine, miniaturisation (in general) was not driven by consumer demand but was a natural result of improvements in technology. Availability of IT systems is no different. As the technology improves from dedicated hardware to virtualisation to Cloud computing, availability should increase as a by-product. To focus on availability alone is therefore like making a phone from the 1980s smaller and smaller: where is the ease of use? Where is the bigger screen? Where are the Apps?

So, if miniaturisation was not the primary consumer demand, and therefore not the right focus, what is?

Time

It’s Time. The thing that everyone wants more of, and that we are willing to pay for. Time is the great demand of consumers. Don’t believe me? Take a look at the retail cost of Laptops and see how much more you pay for a faster Laptop:

[Image: ProcessorCost]

Prices courtesy of Insight UK 03/06/14

I find it hard to believe that the cost to design and make an i7 processor is ~£600 more per unit than an i3, even when you factor in the difference in motherboard, number of cores etc. These are commodities, most likely made in the same factory and at similar cost. You are paying a premium for the convenience of time – saving yourself many seconds on each operation, which in total means you can get more done.

If Miniaturisation is to Availability, Time is to…

Service! When you remove the focus on availability and understand that Time is the new currency, you realise that uptime is not the be-all and end-all of IT Operations – it’s just the beginning. Service is the key.

That means we need to be measuring real customer performance, we need to be tracking errors that slow customers down, and we need to build robust, compartmentalised systems so that a problem in one area for one customer does not impede another customer.

It also means that as an IT Operations team we need to be able to respond to issues that come from the other departments (Contact Centre, Commercial teams etc.) more quickly and, crucially, first time, without the waste associated with ticket queues and handoffs.
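To make the difference between the two measures concrete, here is a minimal sketch in Python. It is purely illustrative and not something we run at thetrainline: the `Request` record, the function names and the 2 second threshold are all assumptions made for the example. It compares a classic availability figure (did the request succeed at all?) with a service figure (did it succeed quickly enough that the customer's time was respected?).

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One customer request (hypothetical record, for illustration only)."""
    duration_ms: float  # how long the customer waited
    ok: bool            # True if the request completed without an error


def availability(requests):
    """Fraction of requests that completed without an error."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if r.ok) / len(requests)


def service_level(requests, threshold_ms=2000.0):
    """Fraction of requests that succeeded AND finished within threshold_ms.

    The 2000 ms default is an arbitrary, assumed target.
    """
    if not requests:
        return 0.0
    good = sum(1 for r in requests if r.ok and r.duration_ms <= threshold_ms)
    return good / len(requests)


# Four made-up requests: three succeed, but one of those is painfully slow.
sample = [
    Request(duration_ms=300, ok=True),
    Request(duration_ms=4500, ok=True),
    Request(duration_ms=800, ok=True),
    Request(duration_ms=1200, ok=False),
]

print(f"availability:  {availability(sample):.0%}")   # 75%
print(f"service level: {service_level(sample):.0%}")  # 50%
```

In this made-up sample the slow but successful request counts towards availability yet still cost the customer time, which is exactly the gap the service figure is meant to expose.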

Finally, we should see Clusters as our opportunity to be working closely with Product Owners, feeding back these experiences so that they can make Brilliant Products that constantly strive to meet customer demand and expectations.

This is why the change to Clusters is important: it should be a catalyst for the IT Operations teams to move from the traditional ‘Availability is King’ model to one where Customer experience is King. That means being more proactive, being more accepting of Agile, Dev-driven processes like Continuous Deployment, and always asking ourselves: are we providing the best experience possible?

In short, the new word is Service, not Availability.
