Objectivity and prioritisation in conversion optimisation

Iqbal Ali
Published in Trainline’s Blog · Aug 30, 2017


In my last post I discussed the merits of using a scientific approach in optimisation testing. I used a real experiment as an example and demonstrated how a good hypothesis helped us with our development process.

During the writing of that post, there were certain aspects that I wanted to cover but didn’t get the chance to. One of those was the process of prioritising tests for development — i.e. what do we spend our time and effort on?

We, as a team, decided early on that we wanted to be as scientific and objective as possible in our approach. We also expected/hoped for a test win-rate between 20–30% (this is what I’ve heard mentioned as an industry average). Since we were such a small team, it was important for us to launch as many tests as possible in order to get a high number of wins.

In this post, I’ll be covering the process we used, how it worked for us, and also how we optimised to further enhance productivity.

Prioritisation matrices

A prioritisation matrix is basically a list of criteria to score “something” (e.g. test ideas) against. Scoring test ideas helps us prioritise based on the highest return on investment of time and effort. This is absolutely vital to the success of our workstream.

Here’s an example of what our matrix looks like; it’s populated with dummy data, in case you’re wondering.

The idea is that the experiments with the greatest potential return on investment will naturally float to the surface, thus helping us deliver tests at a high velocity, with a decent win-rate.

For us, our main goal was increasing overall conversion, so all our criteria were set up to reflect this. The four criteria we used were Potential, Importance, Ease and Reusability (PIER).

Now, imagine we’re scoring a test idea and I’ll take you through these criteria in more detail…

Potential

This is made up of two parts:

1) How badly is a particular page or surface area doing, and what is the potential for improvement?

We can determine how badly a page is performing based on the drop-off rate of that page. The higher the drop-off rate, the “worse” that page is, and the more scope there is to optimise. We could also use customer feedback to validate our findings.

It’s worth pointing out that we had previously gone through an exercise in which we graded every page in our main funnel based on its drop-off rate. The page with the highest drop-off rate got a score of 10, and the rest of the pages were marked relative to this performance.
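To make that concrete, here’s a minimal sketch of how such a grading could be computed. The page names and drop-off rates are invented for illustration:

```python
# Grade pages by drop-off rate: the worst-performing page is anchored at 10,
# and every other page is scored relative to it.
# The page names and rates below are invented for illustration.

drop_off_rates = {
    "search_results": 0.42,
    "ticket_selection": 0.28,
    "payment": 0.14,
}

worst = max(drop_off_rates.values())

page_scores = {
    page: round(10 * rate / worst, 1)
    for page, rate in drop_off_rates.items()
}

print(page_scores)
# {'search_results': 10.0, 'ticket_selection': 6.7, 'payment': 3.3}
```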

In our weekly meeting, where we score tests, we would identify the page the test would run on and give it the relevant score. Easy enough. It’s nice and objective, too. But this is only half of the score; the other half is…

2) How good is the test idea, and what is the potential for this idea to improve conversion?

This involved discussing the test idea as a team and agreeing on a score for the potential of this specific idea. To keep things from becoming too subjective, we ensured that all discussions were backed up by analytics data, customer feedback, and/or past test results (heuristics). One or more of us would facilitate to ensure we stuck to this rule.

The overall Potential score is a combination of the two parts above.

Example: for an experiment idea where the page drop-off score is an 8, we would halve that score, and then increase it again based on the perceived strength of the idea. The highest this test could score is an 8.
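Here’s a rough sketch of that calculation. The exact scale we used for the “strength of the idea” half is simplified here to a 0–1 fraction, purely for illustration:

```python
def potential_score(page_dropoff_score: float, idea_strength: float) -> float:
    """Combine the page drop-off score with the perceived strength of the idea.

    `idea_strength` is assumed here to be a 0-1 fraction agreed by the team;
    treat it as an illustrative simplification.
    """
    base = page_dropoff_score / 2          # start from half the page score
    return base + base * idea_strength     # scale back up towards the full page score

# A fairly strong idea on a page with a drop-off score of 8:
print(potential_score(8, 0.75))  # 7.0 -- the cap is 8, reached when idea_strength is 1
```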

Importance

This is simpler than “Potential” and much more objective, because Importance is all about traffic volume. Again, it’s worth mentioning that in an earlier exercise we graded every page in our funnel based on the amount of traffic it received. The page with the most traffic got a score of 10 and the other pages were scored relative to this.

It’s worth noting that a test group is not necessarily all traffic to a specific page. A test group is defined as “users exposed to the test”.

For example, at Trainline we have a dynamic search results page. So if our test idea meant that only 50% of that page’s traffic were exposed to it, then our Importance score would reflect that — i.e. it would be 50% of the overall page score.
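In code, that scaling might look something like this; the traffic score of 9 is just an illustrative number:

```python
def importance_score(page_traffic_score: float, exposure_fraction: float) -> float:
    """Scale the page's traffic score by the share of traffic the test actually reaches."""
    return page_traffic_score * exposure_fraction

# A test reaching only half of the search results page traffic,
# where that page scores 9 for traffic (illustrative number):
print(importance_score(9, 0.5))  # 4.5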

Ease

Ease is about how easy and quick it is to make the test a reality. This score is mainly based on the ease of development, but we also factor in stakeholder sign-off, legal implications, etc.

Each test gets a score out of 10 (10 being the easiest).

Reusability

We started off using Reusability as a means of considering the reuse of developed assets from an engineering perspective. So if the code developed for a particular experiment could be reused, we’d score the test favourably against this criterion.

However, we soon broadened the scope of the Reusability criteria to consider the wider picture of our overall business goals, i.e. what would we learn by running the experiment? Running experiments favouring learning can help us develop new test ideas as well as potentially allowing us to evolve the product with bigger step changes. So, while the other criteria are very focussed on conversion, Reusability gets us thinking about other ways that the test could benefit us.

As with the other criteria, the score is out of 10, with 10 giving us maximum learning and technical reusability.
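Putting the four criteria together, a scored matrix might look something like the sketch below. The combined total here is a plain sum, purely for illustration, and all the numbers are dummy data:

```python
# A sketch of how the matrix might rank test ideas once each PIER criterion
# has been scored. The combined total is a simple sum of the four scores
# (an illustrative choice), and all ideas and numbers are dummy data.

ideas = [
    {"idea": "Simplify payment form", "P": 7.0, "I": 9.0, "E": 6.0, "R": 4.0},
    {"idea": "Reword CTA on results", "P": 5.0, "I": 4.5, "E": 9.0, "R": 2.0},
    {"idea": "Add trust badges",      "P": 4.0, "I": 6.0, "E": 8.0, "R": 5.0},
]

for idea in ideas:
    idea["total"] = idea["P"] + idea["I"] + idea["E"] + idea["R"]

# Highest-scoring ideas float to the top of the backlog.
for idea in sorted(ideas, key=lambda x: x["total"], reverse=True):
    print(f'{idea["idea"]}: {idea["total"]}')
```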

How did we find the process?

Our goal was to become more objective in our prioritisation, and we definitely saw that happen. Test ideas that one or more of us personally didn’t like still scored well and turned out to be wins. This is a definite benefit of letting objectivity dictate our priorities.

In terms of productivity, despite being a small team, we were able to reach a good velocity in terms of test development and deployment. This was due to our ability to identify the tests which were quick and “cheap” to run, yet gave us good returns.

We still thought there was room for improvement, though…

How we optimised our process

There were three specific problem areas that we identified:

1) We thought there was still too much subjectivity in our meetings (especially around the second part of the Potential score).

2) The Reusability score seemed to sway the final result too much and we felt we weren’t getting the benefit from it as we once did.

3) We were taking too much time scoring tests, as some of our discussions were lengthy. That meant that fewer tests were getting scored in our sessions.

We decided to cut Reusability from our process altogether, which saved a lot of time in our scoring sessions.

We also changed the way we scored Potential. The first part (the one based on drop-off rates) was fine, but the second part (the potential of a test idea) we thought could do with being more objective.

What was our new process for scoring potential?

Imagine we had a test idea for a page, where the drop-off score was an 8.

When scoring the potential for the test idea, by default we’d assume a 50/50 chance of this test being a win. That means the test would score a 4 for overall potential.

We’d then use analytics data, customer feedback, and heuristic analysis to add or subtract percentage points from that score.

For example, if there were data — either analytics data or heuristic data — to support or counter a hypothesis, then we’d add or subtract 20 percentage points, respectively. Where customer feedback existed, we’d add or subtract 10 points.

The maximum a test can score here is still 8 — for this example, at least.
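Put another way, the revised calculation looks roughly like the sketch below. How different evidence types stack when several apply is a simplification here:

```python
# A rough sketch of the revised Potential scoring: start from a 50% chance of
# a win, move that probability up or down by 20 percentage points where
# analytics or heuristic data supports or counters the hypothesis, and by 10
# points where customer feedback does, then apply it to the page's drop-off
# score. How multiple evidence types stack is simplified for illustration.

def revised_potential_score(page_dropoff_score: float,
                            data_evidence: int = 0,       # +1 supports, -1 counters, 0 none
                            feedback_evidence: int = 0) -> float:  # same convention
    win_probability = 0.5 + 0.2 * data_evidence + 0.1 * feedback_evidence
    win_probability = min(max(win_probability, 0.0), 1.0)  # keep it between 0% and 100%
    return round(page_dropoff_score * win_probability, 1)

# Page drop-off score of 8, with supporting analytics and supporting feedback:
print(revised_potential_score(8, data_evidence=+1, feedback_evidence=+1))  # 6.4
```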

How was our new process?

The first thing we noticed is that we managed to score noticeably more tests in our meetings! We were even able to reduce the length of our meetings. A great bonus, giving us more time to build tests.

We also found that the meetings became much more objective, as having such a strict framework for scoring forced the conversation to be fact- and data-based.

It also seemed possible that anyone could run this meeting provided they had access to the relevant data-sources and adhered strictly to the meeting format.

Conclusion

Our goal was to best utilise the efforts of a small team in order for us to be as productive as possible. It was also to run a testing program that was objective and data-driven. I believe using a prioritisation matrix helped us achieve that, and by optimising our process, we were able to tweak the way we work to get the best performance we could.

I’ll be the first to admit that prioritisation matrices don’t always work for every situation, but where you can measure outcomes and goals, and where you can run a workstream with objectivity, I think using one is key to success.

Thanks to Tim Stewart for recommending PIER to us.

About the author

My name’s Iqbal Ali and I’m a conversion optimisation specialist at Trainline. I’m interested in all aspects of the optimisation process: ideation, building and analysing of tests. I’m also passionate about learning new stuff.
