About our App
The Trainline app is a ticket-reseller market leader with more than a million active users, so you can imagine that the quality of our app is one of the main things we care about. The ability to spot issues quickly and capture crashes is critical for apps of this scale and complexity. And as we are always developing something new and updating the application continuously, we must have a fast and robust way of making sure that we do not break anything along the way.
As with any responsible team, of course, we write unit tests for business logic, and we also have integration tests for some big system components. But we started to get a sense that we were missing something, and that something is quite a big piece of the puzzle when it comes to having the right level of confidence in the quality and stability of the application as the team makes code changes from release to release. So we started thinking about this. What is missing? The logic is tested and covered; even complex class interactions are covered. So what is missing?
Of course, one of the most crucial parts of the application to test is the UI: the most prominent part of the application, and the one users are guaranteed to come into contact with in their daily use of the app. If something breaks in the UI, or in screen interactions, it will inevitably lead to a very obvious bad user experience, and we definitely don’t want to upset the users of our app!
To address this issue correctly, we first identified any obstacles that we might face while working with a UI test automation suite:
Mobile UI testing challenges
In any mobile application, testing is a real challenge, and even more so if we are talking about UI testing. There are many hurdles and hidden “gotchas” that we face in this area. Here is a list of the three biggest ones:
- UI tests are brittle by nature. Sometimes they will fail just because of timing, sometimes because of the data coming from a service. And it is often hard to pinpoint where a failure occurred and what the reason behind it was. So it is very hard to make UI tests stable and reliable.
- UI tests are slow. No matter what tool you use, as your test suite grows, the tests will become slower. Yes, some tools are faster than others, but nevertheless, you will see some slowdown eventually.
- Maintainability and housekeeping. Automation should not be viewed as an afterthought, tacked on in a slapdash fashion. A team will stick with its tests for a long time; the number of tests will grow and they will become more and more complex. If we don’t pay attention to maintainability, it will come back and bite us! Failing to maintain tests properly results in a pile of code that is hard to change, which eventually means throwing it away and starting again. So we should treat tests as first-class citizens, just like application code: maintain them, refactor them and love them.
I hardly need to mention that UI testing is a team effort! It can’t be the responsibility of a single person. Without the whole team being involved, the tests will start to degrade and become an overhead rather than a benefit.
OK, we have a lot to think about and solve here. Each of the problems listed above is a complex area in its own right, so in this post we are going to focus on the first of them.
Part 1: Making UI tests stable and reliable
First we need to identify what causes instability.
Testing the whole application completely “externally” basically means trying to imitate every possible user action. This is huge: we have to face the fact that it will exercise every single piece of code in every component! But out of all of these, one area really stands out. From the perspective of the mobile team, the server-side components are something we generally don’t control and mainly just consume, and it is therefore this area, being out of our control, that carries the highest risk of instability as far as we are concerned.
To minimise the impact of this, the best way to go is to replace the real service output with stubbed data, which is reliable because it always returns the same result when called. If the results of the service call are always predictable, then the UI should always behave in the same way. And this is exactly what we want to achieve during testing: repetition of the same scenarios with the same output each and every time, so that if the UI changes and something stops working within it, we know for sure that a failed test indicates a problem with the UI itself.
You may well be asking, of course: “but what if the server-side service does actually change, and the issue is the app not handling the new response?” This is a perfectly valid question. To make sure that this is not happening, we should also have a set of tests running against the real service periodically: some end-to-end integration tests on key areas of interaction within the application, so we can make sure all systems are working together.
Selecting an approach
Having defined what we needed, we had to ask ourselves: “How are we actually going to do it?” What are the possible ways to implement stubbed responses that replace real services with minimal, or ideally no, impact on production code? Ideally, we want to avoid additions to the code that we ship, to minimise the risk of, for example, shipping test code to production. If this is not possible, we should at least ensure that any such additions are kept to an absolute minimum so that we can easily control them. We also really want to be testing the exact codebase that we will actually release, not something that is specifically made for tests.
There are a few ways to implement stubbing responses in place of a real service and these can be divided into two categories: those which are internal to the main code and those which are external.
The first, internal stubbing, involves intercepting requests from within the application itself and replacing responses with stubbed data, which means all the stub data has to live somewhere close to the application codebase.
There are many ways this could be set up, and multiple frameworks exist to help us: for example, OHHTTPStubs, or simply using NSURLProtocol to intercept requests. But not all requests can be intercepted with such approaches, as some frameworks, like AFNetworking, require a custom NSURLProtocol to be injected manually. This is also slightly too much of a code intrusion for our liking.
The second potential way of stubbing internally is to run an in-process web server with a framework like GCDWebServer and redirect requests to it. However, XCUI tests are sandboxed by nature, so an internal server solution like this doesn’t work in our case: the application is not able to access a service that is set up from the automation bundle.
We concluded that adding to the codebase of the main application is very intrusive and that there is a danger that this could result in the code becoming unmaintainable in the future.
We wanted to minimise any intrusion into the main code of the application while achieving the same effect, and to exclude any test code from main compilations. So we looked more closely at external stubbing, where the stubbing data is stored outside of the application and an external process imitates the real service.
Using this approach it is possible to minimise the negative impact of any code modifications to the stubbed application and to avoid storing any test data inside the main codebase as all of it will be stored in the external stubbing service. Also, the test suite will be able to control stubs in such a way as to achieve scenarios we are actually testing with 100% predictability.
One further benefit of using this approach is that test runs are faster: the stubbed service does no heavy data processing and responds to the application almost immediately.
To build an external service for stubbing data, we considered Node.js and Ruby on Rails, as both have the benefit of being very easy to work with and flexible enough for such a task. The main points of comparison were the speed of development, support for templates and fast responses. With this in mind, Node.js was a perfect fit.
First we chose Express – a nice framework based on Node.js – for routing, and we started building a service around it that was able to serve static files as responses in place of the real ones. But we soon came to realise that we were going to need to change responses based on test needs. To do that, we created a simple system of scenarios: all response files were split into folders categorised by scenario, and tests could send a little control call to the service to switch the scenario currently being played, thereby changing the responses that the service served to the main app.
To complete the picture, I need to explain how we magically redirect all requests to the stubbing service. Simple: NSURLProtocol. We added a custom URLProtocol which we can switch on and off from the test bundle, and inside this protocol we intercept all requests and redirect them to the fake service. Also, to work with the remote service securely and to match each test run with its app instance, we introduced the concept of a session GUID and added it to the request headers. This is what the integration looks like:
We ran with this setup for a few iterations, but the number of response files started growing exponentially with every new automation scenario we needed to introduce, and we began to worry about maintainability if we allowed things to continue like this.
We decided to replace this with something more flexible that we could control and maintain, while still allowing tests to specify what data they expected to be served to the main application. At this point we realised that we needed templating, and we picked Mustache.js for the purpose: with templating we can generate the responses we need from the data we put in, and the test can actually control this data by sending it to the stubbing service. We then looked further and settled on Handlebars.js, an extension of the Mustache templating engine. So now the interaction was more like this:
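The templated-response flow can be illustrated with a toy renderer. Note that `renderTemplate` below is a hand-rolled stand-in for the Handlebars compile/render step, handling only simple `{{key}}` substitution; the point is the data flow, in which the test supplies overrides that are merged over the scenario’s default data before the response is rendered.

```javascript
// Stand-in for Handlebars rendering: substitute {{key}} placeholders.
// (The real service compiles templates with Handlebars.js instead.)
function renderTemplate(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    key in data ? String(data[key]) : match);
}

// The test pushes only the fields it cares about; everything else
// falls back to the scenario's default data.
function buildResponse(template, defaults, overrides) {
  return renderTemplate(template, { ...defaults, ...overrides });
}
```

With this in place, one template per endpoint replaces a whole folder of near-identical static response files.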
After all of this we made one more improvement: we combined all the data spaces into one model, so that we hold one model per session GUID and can do slightly smarter things with it, such as reacting to certain calls by modifying the stubbed data. A smarter stubbing service just makes life easier for an automation engineer, and in some scenarios it reduces the manual effort needed to control stubbing from within the test.
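Holding one mutable model per session GUID can be sketched as follows. The default state and the basket example are invented for illustration; the idea is simply that the service remembers what the app did earlier in a session and reflects it in later responses.

```javascript
// One mutable data model per session GUID, keyed by the GUID that the
// custom URLProtocol adds to every request header.
class SessionModels {
  constructor(defaultState) {
    this.defaultState = defaultState;
    this.models = new Map();
  }
  modelFor(guid) {
    if (!this.models.has(guid)) {
      // Deep-copy the defaults so sessions never share state.
      this.models.set(guid, JSON.parse(JSON.stringify(this.defaultState)));
    }
    return this.models.get(guid);
  }
  // Example "smart" reaction: when the app adds a ticket to the basket,
  // record it so later basket responses for this session include it.
  addToBasket(guid, ticket) {
    const model = this.modelFor(guid);
    model.basket.push(ticket);
    return model.basket;
  }
}
```

Because each GUID gets its own copy of the state, tests running in parallel against the same stub service cannot interfere with each other.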
Output and Results
We have carried out a huge amount of work to define and stabilise a stubbing solution for our automation tests, and we have gone through a few iterations to improve our technique and approach, encountering a few bumps in the road because of the complexity of the task and the amount of work it required. But having overcome these difficulties and seen how much the team has benefitted, we are in no doubt that we did the right thing. In summary, we now have a reliable source of data for our tests, with the ability to control it in a way that lets us test really tricky scenarios easily. In addition, the stubbing solution has ended up becoming “universal”: we are able to use the same service for analytics testing and for automation tests on other platforms.