Continuous Delivery with Blue/Green Deployment

At thetrainline.com we are always striving to deliver the best user experience for our customers. We want to get great ideas from conception to the customer as quickly as possible, to enhance our offerings and streamline our processes. This post talks about how we helped achieve this by harnessing Continuous Delivery.

Following on from the post (Moving to Multiple Deployments Per Week at thetrainline.com) that Matt Richardson and I published in December 2013, we have evolved our deployments to achieve automated continuous delivery of our components from development through to production – with zero downtime.

A walkthru of July 2014 ThoughtWorks Technology Radar

James Lewis, who is one of the authors of the ThoughtWorks Technology Radar, gave us a personal walkthru of the latest version.

It is best read at the link above, as it allows for a wealth of interaction. The only comment I would add is that I think that, when creating a team, consideration should be given to protection via the Corbomite Maneuver rather than focusing on the Inverse Conway Maneuver as the radar suggests.

Joel Spolsky on managing large sites + dealing with Idiots + non-citable dogs

We at thetrainline were lucky enough to have a visit from Joel Spolsky @spolsky. This was a good thing and a bad one. For instance, in ensuring that I got the spelling for his website http://www.joelonsoftware.com/ correct, I have just spent 30 mins reading the damn thing.

…and it’s still more interesting than writing up his talk….damn, another 15 minutes lost.

Focus Mr Shoop.

Anyway, back to Mr Spolsky. He talked to us about several things.

The below is my take on what he was saying and what I found interesting, rather than just a precis of his talk. Therefore it contains my analysis of what I think he was meaning ..which of course may be wrong. Joel, if you are reading this and disagree – then please comment. Also – Joel, if you are reading this….really?

A group is its own worst enemy

You start small with an interest group and then, after you grow to around 10,000 users, the chances are pretty good that the group will have attracted someone who is more interested in playing with the group than in what the group was set up for.

This individual will then start behaving like a teenager, pushing the limits of what is permissible, and will keep going until they provoke a reaction. Then they cry hard and loud that they are being oppressed. The freedom loving members of the actual community by default come down on the side of the individual, and the scene is set for a time-wasting, meaningless discussion on what should and shouldn’t be allowed.

Evidently Clay Shirky has researched the issues relating to the growth of online communities. (The article is a somewhat academic phrasing of a problem that Joel has addressed … in a way that Clay indicated that such problems need to be addressed.) A convergence of ease of action, instant gratification, an audience and anonymity unleashes the worst of behaviours in some people.

Thus something that was nice is now broken. So rules/processes have to be created in order to have nice things.  These are not optional if you still want the nice things.

 Diversion – Wikipedia – some things you probably didn’t know

Joel then meandered away from the subject to talk about Wikipedia. Thinking about this as I now write up the talk, this was more of a storyteller’s approach (a Ronnie Corbett type story, for those old enough to have watched the Two Ronnies in their youth): meander off into a seemingly unrelated but entertaining topic which then, surprisingly, arcs back to the topic in hand.

So..off on an entertaining meander…..

The Wikipedia rule

Q:  What does Wikipedia publish?

A: It seems that the answer isn’t The Truth. Or even those aspects of The Truth that can be verified. Actually it’s what other sources have reported about the truth, ie it’s a tertiary source. This is why Philip Roth couldn’t directly change one of the articles about himself. His direct change to Wikipedia couldn’t be referenced to a source outside Wikipedia. I guess if he had the change written up in the Guardian, then Wikipedia could have made the change and referenced the Guardian article.

For the same reason Joel couldn’t get an article about his dog Taco. There are no external articles about his dog that Wikipedia could refer to and so there can be no entry in Wikipedia – in their language Taco the dog fails WP:N, ie is not notable.

Note that this blog has no such restriction

Taco

As an aside, which again was another spiral that we would come back to, Wikipedia is running short of admins (those who regularly edit). This could be due to the conflicts around notability – eg “Wikipedia doesn’t want things that are correct or provable, only what notable people say.” This is a misunderstanding of their rules and is causing issues, as people assume they know what notable means without actually knowing how Wikipedia uses the term (ie notable does not equal famous).

Anyway, back to the main story – why did Wikipedia use this rule? Because if it didn’t then facts couldn’t be checked and so it could be filled with incorrect info.

But this links back to the main topic: if you create something large then bad things will happen. There will be people with a vested interest in having a page in Wikipedia read a certain way – and there will be others who will just get a kick out of having their edit in a page even if, or especially if, their contribution is wrong. By insisting that whatever appears in Wikipedia must have appeared elsewhere, the worst of this is mitigated. And by using citations, at least everyone will know where Wikipedia is getting the information it publishes and can form an opinion on its likely veracity accordingly.

As an example of what happens when a reasonableness test is removed – look at the comments on YouTube … they are just drivel.

But this rule for Wikipedia contribution has some bad effects eg Philip Roth couldn’t correct a statement about himself that he felt was wrong.

But remember the rule of idiots – Wikipedia has now scaled so that it is no longer dealing with just reasonable people. The rule is there to lock out the idiots, including the idiots who will try to use its unreasonableness as a reason to let them ruin the site. So I guess Wikipedia would rather have a restricted goal that can be defended than let the nice thing break. The notability rule wasn’t there at the start of Wikipedia; it was only needed when it got large. Timing is everything, Mr Roth.

I guess this is an example of trying for the best result possible rather than the best possible result.

But but but, in enforcing a rule you are going to lose friends (remember those declining Wikipedia admins). So rather than not enforcing it, and letting in the idiots, it’s a good idea to see if you can work on the presentation of the restriction. For example, Stack Overflow (of which Joel was a co-founder) introduced a rule that said that questions should be closed if they were a matter of opinion.

This was important as he realised that the main beneficiaries of a question and answer session weren’t the original questioner and those that answered (typically 2 to 4) but those who came next. For every question and answer session there are hundreds or thousands of readers who come afterwards looking for an answer to the same thing. Thus if Stack Overflow allowed itself to be clogged with religious wars or questions that could only be answered by opinion, then it wasted the time not only of those engaged in the futile discussion but of everyone who wanted a genuine answer to the issue under discussion. This would only get worse, and if it got so bad that most sessions were unproductive then people would stop using it as a resource and a nice thing would be broken.

So hence the not constructive rule (too opinion based).

They also stopped shopping basket questions (what are all the things that can do this? – the info gets out of date too quickly)

Also too localised – too specific and so not of interest to others (even tho it might be of interest to the questioner)

..and it wasn’t for community/jokes

So to change the perception of their rules they changed the terminology to make it more aligned to the actual principle behind the exclusion:

  • Closed -> On hold (it can be re-opened)
  • Too localised -> Off topic
  • Not constructive -> Primarily opinion based
  • Not a real question -> Unclear what you are asking

As a result of these changes, the share of questions that were edited after ‘closing’ to try to get them reconsidered rose from 5% to 10%.

And the level of engagement with Stack Overflow increased: the number of users who post at least 5 times has risen over time. In other words, Stack Overflow has introduced rules to protect itself which by definition will stop some contributions and will almost inevitably have unintended or unfortunate consequences but, by putting effort into explaining its position, it has managed to take its core audience with it.

Conclusion

The conclusion seems to be – if you have something nice and enough people find it, then you will attract those who want to break it, or who end up breaking it.

In order to retain the benefit you have to remove things. Either content or purpose. In order to remove things you must have rules.

Those rules are likely to be misunderstood by those who you value.  This can ruin your site as much as the barbarians you were trying to protect yourself from.

So have the rules and work on communications with those you care about to minimise the damage to/from them.

Clusters, IT Operations and why Availability is the wrong word

It has been a very busy time at thetrainline: transformational projects have been implemented to improve our development capacity and IT Operations, including automation of our build agent environment using Chef, and more direct control over our hardware across the entire Development to Production pipeline. Consequently the Engineering Blog has been somewhat overlooked; my apologies.

Clusters

Even though the above is an excellent list of cool stuff that needed doing, for me the most important change (aside from the arrival of a new CTO, Mark Holt) has been the shift from a project centric development approach to ‘Clusters’. Clusters have been on the (Kanban) Cards for a while at thetrainline but have taken time to make it through the Pipeline. The brain child of Duncan Freke, our Development Director, it is about aligning the Dev and Commercial teams on a Product basis and, by doing so, empowering the Product Owners to take ownership and derive true value for the customer. Sounds like Marketing hype, right? Yes, but that does not mean it’s not true.

Where do IT Operations fit in?

Given Clusters is primarily an evolution of the Development and Commercial departments, you would be forgiven for wondering why, as Head of IS Operations, I am writing about this change. The answer is simple: I believe IT Operations is not just about Availability but about the Service we provide to our customers.

For example, when I tell someone who I work for and the first thing they mention (after asking if I get cheap train tickets) is our booking fee, I take it personally. Not because I think they are being cheap but because we have failed them. If the experience they have received – whether it is the performance of the site, the features they used or the information we gave them throughout the booking flow – did not give them added value equal to the booking fee, then we have done something wrong.

This is why Clusters is important to me. We now have true ownership for the products we offer and as such my teams have a consistent place to go and explain the customer experience that we see – this does not exist with a project centric approach.

Why is Availability the wrong word?

So, after I have apologised to someone I just met for them not getting the value out of our site, I explain what I do, but I actively avoid using the word Availability. It’s not that I don’t like the word or what it stands for, but for me it’s the wrong focus.

To explain why, I am going to digress for a minute and draw an analogy with mobile phones. You may remember some of these beauties (I even had a pencil case that looked very much like the second from the left :) ):

[Image: Mobile Evolution]

Source: Kyle Bean (Mobile Evolution). Works licensed under a Creative Commons BY‑NC‑ND 3.0 License

Once mobile phones entered the mainstream they rapidly got smaller and smaller. However, as I was reliably informed some years ago by an Industrial Designer friend of mine, miniaturisation (in general) was not driven by consumer demand but was a natural result of improvements in technology. Availability of IT systems is no different. As the technology improves from dedicated hardware to virtualisation to Cloud computing, availability should increase as a by-product. Therefore to just focus on availability is like making a phone from the 80’s smaller and smaller. Where is the ease of use? Where is the bigger screen? Where are the Apps!!??

So, if miniaturisation was not the primary consumer demand, and therefore the right focus, what is?

Time

It’s Time. The thing that everyone wants more of, and that we are willing to pay for. Time is the great demand of consumers. Don’t believe me? Take a look at the retail cost of Laptops and see how much more you pay for a faster Laptop:

[Image: laptop prices by processor]

Prices courtesy of Insight UK 03/06/14

I find it hard to believe that the cost to design and make an i7 processor is ~£600 more per unit than an i3, even when you factor in the difference in motherboard, number of cores etc. No, these are commodities, most likely made in the same factory and at similar cost. You are paying a premium for the convenience of time – saving yourself many seconds for each operation, which in total means you can get more done.

If Miniaturisation is to Availability, Time is to…

Service! When you remove the focus on availability and understand Time is the new currency you realise that uptime is not the be all and end all of IT Operations – it’s just the beginning. Service is the key.

That means we need to be measuring real customer performance, we need to be tracking errors that slow customers down, and we need to build robust compartmentalised systems so that a problem in one area for one customer does not impede another.

It also means that as an IT Operations team we need to be able to respond to issues that come from the other departments (Contact Centre, Commercial teams etc.) more quickly and, crucially, first time, without the waste associated with ticket queues and handoffs.

Finally, we should see Clusters as our opportunity to be working closely with Product Owners, feeding back these experiences so that they can make Brilliant Products that constantly strive to meet customer demand and expectations.

This is why the change to Clusters is important: it should be a catalyst for the IT Operations teams to move from the traditional Availability is King model to one where Customer experience is King. That means being more proactive, being more accepting of Agile, Dev driven processes like Continuous Deployment, and always asking ourselves: are we providing the best experience possible?

In short the new word is Service, not Availability.
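As a rough illustration of the point about measuring real customer performance rather than just uptime, here is a minimal Python sketch. The `Request` record, the 2000 ms threshold and the sample data are all assumptions invented for the example; they are not real measurements or an actual monitoring setup.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float  # how long the customer waited for a response (hypothetical field)
    ok: bool           # False if the request ended in an error (hypothetical field)


def availability(requests):
    """Share of requests that succeeded at all: the traditional view.

    Assumes a non-empty list of requests.
    """
    return sum(r.ok for r in requests) / len(requests)


def good_experience(requests, threshold_ms=2000.0):
    """Share of requests that succeeded AND came back within the threshold.

    The threshold is an assumed value; assumes a non-empty list of requests.
    """
    good = sum(r.ok and r.latency_ms <= threshold_ms for r in requests)
    return good / len(requests)


# Made-up sample data for illustration only.
sample = [
    Request(300, True),
    Request(450, True),
    Request(5200, True),   # counted as "available", but the customer waited over five seconds
    Request(800, False),   # an error that slowed the customer down
]

print(f"availability:    {availability(sample):.0%}")     # prints "availability:    75%"
print(f"good experience: {good_experience(sample):.0%}")  # prints "good experience: 50%"
```

The two numbers diverge as soon as slow responses are counted against us, which is the gap between Availability and Service described above.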

How to use nUnit TestCase to simplify near-identical test cases

A situation often faced by coders, especially when following test-driven development, is the writing of very similar test cases that differ only in, for example, the expected and actual values, along with some set up parameters. We often end up writing dozens, nay hundreds, of near identical test cases, and end up with a test class that looks like it has suffered from a terminal case of copy-paste. This blog post shows a little-known technique for making this sort of test class a little more readable using the nUnit TestCase attribute.

Moving to Multiple Deployments Per Week at thetrainline.com

Here at thetrainline.com we have several useful online tools for helping our customers plan and manage their train travel, including Train Times and Live Departure Boards. We recently changed the way we build, test, and deploy these kinds of applications to enable us to release new features much more frequently and easily; in fact, we shortened the deployment cycle from one deployment every few months to multiple deployments per week.  These changes have produced a sea change in team culture, with a marked increase in product ownership by the team. This post describes what we’ve done so far, and where we want to go over the coming months.

thetrainline.com at Silicon Milkroundabout 6.0 – November 17th 2013

We (the tech team at thetrainline.com) will be at the Silicon MilkRoundabout recruitment fair on 17th November 2013, between 12 noon and 5pm. The event is at the Old Truman Brewery, Brick Lane, London.

Drop by and visit us on stand 17, and have a chat about what we’re up to!

We were at Silicon MilkRoundabout 5.0, so look out for our dark blue stand.