Using mutation testing to ensure the quality of tests

At cVation, we work actively with automated quality assurance. When the tests pass, we can relax, pat ourselves on the back and be sure that the code is of high quality. Right? Or not? How can you trust your tests with peace of mind? How do we test the quality of our quality assurance?

Thomas Lindegaard
Software Engineer

At cVation, we mostly ensure our quality with automated tests, which both help us verify the business logic right away and protect us against regressions in the future. When we develop new features, they are accompanied by a series of tests that ensure that the new code works as it should in different situations. When the tests pass, we can relax, pat ourselves on the back and be sure that the code is of high quality. Right? Or not?


How can you trust your tests with peace of mind? Our tests are also code, and that code could - like the code whose quality we try to ensure - contain errors. How do we know that the automated tests are testing what we expect? Can our tests detect defects at all that we have not been able to predict ourselves? How do we ensure the quality of our quality assurance?

Code coverage

We could write new tests of our tests, but to avoid this endless recursion we must come up with something better. Many use code coverage for this. Code coverage measures how much of the code is exercised by the existing tests. Let us take a concrete example: a simple method that accepts a person and returns the person's title. The method is intended to return the person's name with "Mr. " in front if the person is 21 years or older.
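The original listing is not reproduced here. As a minimal sketch of such a method, assuming Python and the hypothetical names `Person` and `get_title`, it could look like this:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def get_title(person: Person) -> str:
    """Return the person's name, prefixed with "Mr. " from age 21 and up."""
    title = person.name
    if person.age >= 21:
        title = "Mr. " + title
    return title
```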

With code coverage, we can investigate whether all branches of the code are exercised. In our example, code coverage can tell us whether the tests cover the scenario where we enter the if statement. A single test that passes in a person aged 21 or older achieves 100% code coverage for the method.
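A sketch of such a test, again assuming Python and the hypothetical names `Person` and `get_title`, could look like the following. Every line of the method is executed, so a line coverage tool would report 100%, even though the case where no title is added is never checked:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def get_title(person: Person) -> str:
    title = person.name
    if person.age >= 21:
        title = "Mr. " + title
    return title


def test_adult_gets_title():
    # Enters the if statement, so every line of get_title runs.
    assert get_title(Person("Smith", 30)) == "Mr. Smith"
```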

Here we can point out two things that illustrate the shortcomings of code coverage. First, we only cover one of two outcomes, i.e. the one with the prefixed title "Mr. ". Second, we can achieve the same code coverage without actually testing anything, for example with a test that calls the method but never checks the result.
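As an illustration, here is a sketch of a test that executes the method but asserts nothing. The names are hypothetical, and `buggy_get_title` is a deliberately broken variant added only to show that the test cannot fail:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def get_title(person: Person) -> str:
    title = person.name
    if person.age >= 21:
        title = "Mr. " + title
    return title


def buggy_get_title(person: Person) -> str:
    # Deliberate bug: wrong prefix.
    title = person.name
    if person.age >= 21:
        title = "Dr. " + title
    return title


def test_without_assertions(func=get_title):
    # Runs every line of the method, but the result is thrown away.
    func(Person("Smith", 30))
```

Calling `test_without_assertions(buggy_get_title)` passes just as happily as the call with the correct method.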

Such a test achieves the same 100% code coverage, but even if the code has an error, the test will never fail. Even with a code coverage of 100%, we cannot declare that our tests are comprehensive. Code coverage can therefore point out the areas we have certainly not tested, but the tool is not strong enough to confirm that our tests work as they should.

Mutation testing

As an alternative to code coverage, we use a technique called mutation testing. With mutation testing, you measure how well you have tested by mutating the production code while leaving the test code unchanged. In other words, we change the code under test to see whether the tests fail as a result. A simple mutation could be to change ">" to ">=". If the change causes one or more tests to fail, the mutation is called "dead", which is what we want. If, on the other hand, no test fails, the mutation has "survived". Each surviving mutation marks a place in the code that can be changed without being detected. If you have many surviving mutations, you have not tested your code well enough.

Here are two examples of mutations of our code:
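The original listings are not reproduced here. As a sketch in Python with hypothetical names, two mutants of the title method could be a changed comparison operator and a string literal replaced with an empty string. The two small test functions show which mutant each test is able to kill:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


def original(person: Person) -> str:
    title = person.name
    if person.age >= 21:
        title = "Mr. " + title
    return title


def mutant_boundary(person: Person) -> str:
    title = person.name
    if person.age > 21:  # mutation: ">=" changed to ">"
        title = "Mr. " + title
    return title


def mutant_prefix(person: Person) -> str:
    title = person.name
    if person.age >= 21:
        title = "" + title  # mutation: "Mr. " changed to ""
    return title


def adult_test_passes(get_title) -> bool:
    return get_title(Person("Smith", 30)) == "Mr. Smith"


def boundary_test_passes(get_title) -> bool:
    return get_title(Person("Smith", 21)) == "Mr. Smith"
```

With only the adult test, `mutant_prefix` is killed but `mutant_boundary` survives. Adding the boundary test at exactly 21 years kills it as well.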

Tools

Mutation testing requires you to mutate the code and run your tests against every mutation. It goes without saying that doing this by hand would be both unreliable and inefficient. Fortunately, there are tools that perform a wide range of mutations and automatically run the tests against them. These tools create reports describing the surviving mutations.

Our results with mutation testing

At cVation, we used a hackathon to experiment with mutation testing. We used Stryker.NET on three ongoing projects. How well we scored varied from project to project and from class to class.

In one partial result, 76% of the mutations died.

One method in a class gives a weird result. The method returns void. All mutations of the code inside the method were killed by tests, but the mutant that removes all the code survived. So there are no tests that detect the side effects of this method. That, of course, is fixed now.
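The general pattern can be sketched as follows. This is not the actual code from our project: it is a hypothetical Python class whose method returns nothing and only has a side effect, together with a mutant whose method body has been removed:

```python
class Registry:
    def __init__(self):
        self.names = []

    def register(self, name: str) -> None:
        # Returns nothing; the only observable effect is the stored name.
        self.names.append(name.strip())


class BodyRemovedMutant(Registry):
    def register(self, name: str) -> None:
        pass  # mutation: the whole method body is removed


def weak_test_passes(registry_class) -> bool:
    # Only checks that the call does not raise.
    try:
        registry_class().register(" Smith ")
    except Exception:
        return False
    return True


def side_effect_test_passes(registry_class) -> bool:
    # Checks the state the method is supposed to change.
    registry = registry_class()
    registry.register(" Smith ")
    return registry.names == ["Smith"]
```

The weak test lets the body-removed mutant survive. The test that inspects the side effect kills it.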

Reflection

In the current version of the tool, we found a few barriers. Among other things, it takes a long time to run in a Continuous Integration pipeline, which is why it sometimes timed out. This of course means that we cannot use it as a gate in our pipelines. Instead, one can run mutation tests daily and continuously analyze the results. Run time grows quickly with the amount and complexity of code, because the tests have to run against every mutant, so the smaller codebases of a microservice architecture will reduce processing time. Likewise, you can use it as part of the development process when writing new code. This means that you can choose to mutate only those areas of the code that relate to your work.

Mutation testing will help us upgrade our quality assurance of the software, ensuring that the tests we write and rely on are actually effective and provide value. When you start with mutation testing, you could prioritize the work by correlating mutation scores with a hot spot analysis, for example. That way, you can start with the areas of the code that both change frequently and have many surviving mutants.
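One simple way to combine the two signals is sketched below in Python. The `FileStats` type, the file names, the numbers and the choice of multiplying change frequency by surviving mutants are all assumptions made for the illustration, not a prescribed formula:

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    path: str
    commits: int            # how often the file changed (hot spot signal)
    surviving_mutants: int  # taken from a mutation testing report


def prioritize(stats):
    """Rank files so frequently changed, poorly tested code comes first."""
    return sorted(stats,
                  key=lambda s: s.commits * s.surviving_mutants,
                  reverse=True)


# Hypothetical example data.
EXAMPLE = [
    FileStats("Billing.cs", commits=40, surviving_mutants=12),
    FileStats("Legacy.cs", commits=2, surviving_mutants=30),
    FileStats("Utils.cs", commits=25, surviving_mutants=0),
]
```

Here `prioritize(EXAMPLE)` puts `Billing.cs` first: `Legacy.cs` has more surviving mutants but rarely changes, and `Utils.cs` changes often but has no survivors.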

We are now in the process of implementing mutation testing on a project to gain deeper experience than we could build in a one-day hackathon.

Read more about hot spot analysis in our other blog post:
How to use hot spots to prioritize your technical debt
