5 October 2016

One of the most difficult problems when writing programs is making sure that they do what you want them to do. Programs have a way of getting complicated, for reasons like performance, additional features, and so on. In spite of that complexity, we want to have confidence that the program is working as intended. Fast-slow testing is one technique that is common in competitive programming, but I haven't seen in other contexts.

The Scenario

The primary contests that I participated in were the USA Computing Olympiad and the International Olympiad in Informatics, which have similar styles. One contest day consists of three or four problems and three to five hours to work on them. The problems generally involve reading data from an input file, performing some calculation, and writing it to an output file. They are algorithmic contests, meaning a big part is in figuring out an efficient method to do the calculation.

For most of the time that I competed, very little feedback was given to the competitors during the contest itself. Each problem would include a more-or-less trivial sample test case, and the grading system would ensure that the correct answer was given for the sample case, but not give any feedback beyond that. Since the sample case would both be very simple and very small, this amount of feedback is close to useless. But because a logical error causing the program to fail on more complex or larger test cases would be detrimental to your score, it's important to be able to ensure correctness in spite of the lack of feedback.

Since IOI 2010, things have changed a little, and contestants now get some more feedback on their solutions, usually in the form of their score on the problem. In particular, they now know whether their solution is completely correct (since they'd get a perfect score), and so this sort of testing would only be done to find errors, rather than ensuring correctness. Note that the feedback does not include any sort of test case where your program fails. It is part of the contest to figure out your errors with your own work.

Because these are algorithmic problems, a lot of work is on finding algorithms with better asymptotic complexities. A problem might have an obvious \( O(n^2) \) solution, but if \( n \) can go up to 100,000, that has no chance of finishing within the one second time limit on the larger cases. Sometimes the obvious solution is even worse, maybe even exponential. But usually there is an obvious solution.

Fast-Slow Testing

Fast-slow testing consists of three parts: the program you want to test (the "fast"), a program that you are confident is correct (the "slow"), and a test case generator. Then you wire them together so that the test case generator makes a test case, you run both programs on that case, and make sure that the output is the same. Because the test cases are automatically generated, it is low cost (in terms of programmer time) to run the testing on a lot of test cases. Generally I'd try to run the solutions against each other on at least 10,000 cases. This kind of testing works because of a few observations.

The problem usually has an easy, but slow solution

As a very simple example, let's imagine that you're writing a program to find shortest paths. You aren't completely confident in your Dijkstra implementation (perhaps the parameters of the input are such that you implemented a Fibonacci heap to get \( O(E + V\log V) \) runtime). But you know for sure that you can implement Floyd-Warshall because it's five lines that are easily memorized.

The difference between \( O(E + V \log V) \) and \( O(V^3\) ) is huge, but implementing the \( O(V^3) \) solution takes only a little bit of your time budget, and gives you something to compare against. Furthermore, in a contest setting it is useful to write these slow solutions in case you do not finish debugging the fast solution before the contest ends. In that case, you can submit the slow solution and still get some amount of partial credit. In fact, it's often worth it to write intermediate programs to increase that partial credit. So before going all out to get \( O(E + V \log V) \), you might write a \( O(V^2) \) or \( O(E\log V) \) solution.

Complexities in input appear at small sizes

The main potential worry about fast-slow testing is that the slow solution can only handle small test cases. However, in practice this is almost never a concern. If your slow solution is \( O(n^3) \), it will still work plenty fast for \( n = 100 \). So for example in the case of shortest paths, if your fast solution gives the wrong answer in some cases, what's the chance that it works perfectly for all graphs with less than 100 vertices? Even an \( O(2^n) \) solution will generally be fine for \( n = 14 \), which is plenty of room for the bulk of the problem complexity to show up.

This observation also means that fast-slow testing is a great way to find an error in your program. Since any error is likely to appear in small cases as well as large cases, you can generate small test cases until one appears where you get different results from the fast solution and the slow solution. Working through that test case manually shows you if the problem is a logic error, a weird edge case, or something else.

Test cases are easy to randomly generate

Generating an input is just making a file that adheres to the specified input format. Fast-slow testing is effective even if you make no effort into trying to hit the edge cases. Especially at small input sizes, edge cases are plentiful, so generating test cases randomly will hit them if you try enough cases. So generating a random graph can be as simple as putting down random edges. If it's a weighted graph, make the weights random too.

Enabling this sort of testing is a big benefit of "purity" in programming. The idea that you can write down the entire input, and understand the output as a function of that input that doesn't depend on anything else, is essential. If the output depends on something you can't write down, then how are you going to really understand that your program is correct?

Variations

Some problems are not well-suited for fast-slow testing as I've written it above. For example, a problem might ask for any one of a number of valid solutions. In that case, it'd be no surprise when your fast solution gives a slightly different output from the slow solution. In cases like that, rather than directly comparing the output, it is useful to write an output verifier. In some cases, you don't need a slow solution at all. For example, if you were writing a sort algorithm, it is relatively easy to check that the output is a sorted permutation of the input.

However, sometimes the slow solution enables easier output verification. To go back to the shortest path example, there might be multiple shortest paths, so just comparing the two outputs will give some false alarms. But it's also not easy to verify that a particular path is the shortest. That's where the slow solution comes in. The slow solution doesn't include all of the shortest paths, but it does tell you how long the shortest path should be, so then you can check that the fast solution is a valid path and has the same length as the path from the slow solution.

This variation bears a striking similarity to property-based testing, as popularized by QuickCheck. You can even view the basic fast-slow idea as a special case of property-based testing, where the property is that it agrees with this other program.

Another variation involes changing up the test case generation step. Sometimes, a problem is particularly well suited to generating a test case alongside its answer, where it might be harder to write a slow solution. You need to be careful in that case, because it is surprisingly easy to write bugs into your test and answer generation.

I'm not sure why I've heard so little about this testing style since leaving competitive programming. It is one of the only ways that I've really been able to be confident about the correctness of a program. Even though I haven't explicitly compared two programs over a large number of inputs since then, there's a very noticeable impact on how I work. I place a huge amount of emphasis on being explicit about the input (especially the format of the input), and a precise description of the output, which are integral parts of fast-slow testing and a pillar of a good competitive programming problem.