A Philosophy of Testing 6: The Quest for Determinism
Having discussed what sorts of tests to write for Scala services, and how extensively to test, let’s change gears and talk about a more basic principle: reliability.
How many times has this happened to you? You commit your code, open a PR, your integration pipeline runs — and the unit tests fail. This is mysterious, since they worked for you locally. So you kick the CI server, and this time they pass. You have no idea why it failed the first time, or why it passed the second; you’re just left with a lingering sense of nervousness.
Flaky tests are the bane of every software organization. Over the past 20 years we’ve come to rely on test automation to keep things moving along quickly — a test suite that can prove out your changes in a few minutes gives you a lot of confidence that you haven’t broken the world. But tests that fail sporadically break that confidence. Is it the test code that’s flaky, or is the software itself subtly broken? It’s easy to lose days figuring that out.
Determinism is Your Friend
The way you avoid this is by making your tests, and the code itself, as deterministic as possible. That flakiness is usually the result of some sort of non-determinism — or to put it more plainly, randomness.
Avoid randomness and non-determinism: strive to make your system as deterministic as possible at test time.
What sort of randomness? Most often, the problem is timing. Modern applications (remember, we’re talking mainly about Scala services here, but this applies more broadly) tend to be multi-threaded, often massively so. It’s not unusual to have dozens or sometimes hundreds of threads running. Worse, in modern Scala code it’s common to have thousands of logically parallel pathways running in fibers.
This provides a host of ways in which things can get slightly off. You can have a “slow” pathway that always finishes after the “fast” pathway — except for the one time in a thousand when a garbage collection happens at the wrong moment and breaks your assumptions. You can have a test that always runs in less than a second, so your 10-second timeout seems totally reasonable — except for the occasional hiccup where it isn’t.
Timing isn’t the only problem: if you have anything in the system that is random (intentionally or otherwise), you need to be very careful to control for that in your tests. Scenario tests shouldn’t be random — if you need to test a bunch of edge cases, do so explicitly, rather than leaving it to the whims of fate. In other words, if you have something generating randomness, make sure that at scenario-test time you can substitute something that is not at all random, and that you explicitly test each category of possible result.
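One way to do that substitution is to hide the randomness behind a small trait. The following is an illustrative sketch (the names `RandomSource`, `SystemRandom`, and `ScriptedRandom` are mine, not from any particular library): production code gets real randomness, while scenario tests replay a fixed script so that each edge case is chosen deliberately.

```scala
// A swappable source of randomness; the names here are illustrative.
trait RandomSource {
  def nextInt(bound: Int): Int
}

// Production implementation delegates to the real PRNG.
final class SystemRandom extends RandomSource {
  private val rnd = new scala.util.Random()
  def nextInt(bound: Int): Int = rnd.nextInt(bound)
}

// Test fake replays a fixed script, so the test explicitly chooses
// which "random" branch each step takes.
final class ScriptedRandom(script: Seq[Int]) extends RandomSource {
  private val it = script.iterator
  def nextInt(bound: Int): Int = it.next() % bound
}
```

In a scenario test you would construct the system under test with `new ScriptedRandom(Seq(0))` (or whatever values force the branch you want), and write one test per category of possible result.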
One of the easiest things to control is also one of the most often overlooked: your clock.
This problem is common enough to be worth a simple rule:
Never, ever use Instant.now for anything other than logging.
Yes, that seems a bit extreme, but it’s worth following. If you use the actual system clock for any sort of business logic, you are opening yourself up to non-determinism. A time span can, during your test, take ten milliseconds or ten seconds — you really don’t know, and that variation can break your tests.
And it’s easy to avoid — just make sure that you are always using a dependency-injected clock mechanism that you can control. It really doesn’t matter how you do it, whether you’re using Guice, Macwire, a Reader Monad like ZIO, or whatever — just make sure that your application is using a clock that your tests can control precisely.
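The mechanics can be as simple as a tiny trait. This is a minimal hand-rolled sketch, not the API of any of the frameworks named above, and the names (`AppClock`, `FakeClock`) are illustrative: production code reads the system clock, while the test clock moves only when the test says so.

```scala
import java.time.{Duration, Instant}

// An injectable clock; the trait and class names are illustrative.
trait AppClock {
  def now(): Instant
}

// Production implementation: the real system clock.
object SystemAppClock extends AppClock {
  def now(): Instant = Instant.now()
}

// Test implementation: time advances only when the test advances it.
final class FakeClock(start: Instant) extends AppClock {
  private var current = start
  def now(): Instant = current
  def advance(d: Duration): Unit = current = current.plus(d)
}
```

Anything in your business logic that would have called `Instant.now` calls `clock.now()` instead, and your tests construct the system with a `FakeClock`.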
Make sure that your controllable clock is laced through the time-based parts of your application, and provide fakes for any dependencies or subsystems that are time-based. Do you have a scheduler? Make sure that advancing the clock triggers the scheduled jobs. Build the infrastructure correctly upfront, and your system can literally run like clockwork.
Once you have that, make sure that any tests that are time-based take that into account. Don’t send a message and wait ten realtime seconds for something to happen — send the message and step the clock forward by ten seconds! It’s not only more deterministic, it will make your tests run much faster. (And the faster your tests run, the more often you’re likely to run them.)
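To make the scheduler point concrete, here is a deliberately simplified sketch (the `FakeScheduler` name and shape are mine, for illustration): jobs are queued with a due time, and advancing the clock immediately fires everything that has come due, so a “wait ten seconds” test completes in microseconds.

```scala
import java.time.{Duration, Instant}
import scala.collection.mutable

// Illustrative test scheduler: advancing the clock fires due jobs
// immediately, instead of sleeping in real time.
final class FakeScheduler(startAt: Instant) {
  private var now = startAt
  // Priority queue ordered so the earliest due time is at the head.
  private val jobs = mutable.PriorityQueue.empty[(Instant, () => Unit)](
    Ordering.by[(Instant, () => Unit), Long](_._1.toEpochMilli).reverse)

  def schedule(delay: Duration)(job: () => Unit): Unit =
    jobs.enqueue((now.plus(delay), job))

  // Step time forward and run every job whose due time has passed.
  def advance(d: Duration): Unit = {
    now = now.plus(d)
    while (jobs.nonEmpty && !jobs.head._1.isAfter(now)) jobs.dequeue()._2()
  }
}
```

A test can then assert both sides: advance nine seconds and check that the job has not fired, advance one more and check that it has.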
Don’t Wait for Non-Events
One more aspect of this: sometimes, you need to be testing for nothingness. That is, you send an event to your controller entry point, and you specifically want to check that, in this use case, nothing observable happens. For example, you might be receiving an incoming stream that calls an external API every tenth message, when a buffer fills up. Testing that it calls after ten messages is easy; testing that it does not call on the ninth is trickier.
This is a special but common situation, and folks often just ignore it because it’s a nuisance. When they do deal with it, it’s often the source of the very worst non-determinism: they will send the event, wait until some sort of timeout, and then throw up their hands and say, “Okay, nothing happened, so I guess we’re good.” Which might be correct, or you might just not be waiting long enough and you’re actually wrong.
Don’t ignore it, and don’t wait: add code to your system that is solely for the benefit of the test harness, to say “I decided not to do anything”, and listen for that in your test. We’ll talk about how to do this more in the next section.
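Using the every-tenth-message buffer from above, the idea looks roughly like this. The names (`BatchingSender`, `Outcome`, `Buffered`, `Flushed`) are hypothetical, invented for this sketch: the component reports “I buffered and did nothing observable” as a real, positive signal that the test harness can listen for.

```scala
import scala.collection.mutable

// The explicit outcome signal: Buffered is the "decided not to act" case.
sealed trait Outcome
case object Flushed extends Outcome
case object Buffered extends Outcome

// A sender that flushes every batchSize messages, and reports each
// decision to an observer hook (a no-op by default, wired up in tests).
final class BatchingSender(batchSize: Int,
                           send: Seq[String] => Unit,
                           onOutcome: Outcome => Unit = _ => ()) {
  private val buffer = mutable.Buffer.empty[String]
  def offer(msg: String): Unit = {
    buffer += msg
    if (buffer.size >= batchSize) {
      send(buffer.toSeq); buffer.clear(); onOutcome(Flushed)
    } else onOutcome(Buffered)
  }
}
```

Now the ninth-message test doesn’t wait for a timeout: it sends nine messages, sees nine `Buffered` signals, and asserts that `send` was never invoked.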
Waiting for a non-event is always non-deterministic; you always want to be listening for a real signal that you expect will happen. Highlighting that:
Timeouts during unit tests are almost always a bad smell.
Occasionally you have no choice, but usually you can do better.
Test Your Race Conditions
Finally, one of the most pernicious sources of bugs is race conditions in your code. Obviously, you should avoid those being possible whenever you can. But what if you can’t?
For this sort of situation, I again recommend adding hooks specifically for the test code. Regardless of whether you are based on IO or whatever, you can inject steps into your async logic that are pure no-ops during runtime, but which deliberately block during testing.
When you suspect a possible race condition, use these to manage your threads during the test. Block thread A until B gets to the critical point, then release it and see what happens. Delay B until A times out. If you think a race condition is possible, test it and make sure that it isn’t doing anything dangerous.
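One plain-threads version of such a hook might look like this; the `Gate` name and shape are illustrative, and in effect-based code you would build the equivalent out of your effect system's concurrency primitives. Production code gets a no-op; the test gets a latch it can hold closed until the other thread reaches the critical point.

```scala
import java.util.concurrent.CountDownLatch

// An injectable synchronization point; names are illustrative.
trait Gate { def pass(): Unit }

// Production implementation: passing the gate costs nothing.
object NoopGate extends Gate { def pass(): Unit = () }

// Test implementation: passing blocks until the test opens the gate,
// letting the test hold thread A at a critical point while B proceeds.
final class LatchGate extends Gate {
  private val latch = new CountDownLatch(1)
  def pass(): Unit = latch.await()
  def open(): Unit = latch.countDown()
}
```

Code under test calls `gate.pass()` just before the suspected critical section; the test holds it there, drives the other thread wherever it likes, then calls `open()` and asserts that nothing dangerous happened.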
This is persnickety stuff, and I will admit that I only do it occasionally — it’s far better to avoid possible race conditions at the design level. But when you can’t avoid them, make sure that you are testing them.
Note the implication of much of the above: you need to write your code specifically to make it testable. I’ll have a lot more to say about that in the next installment, including some recommendations about a useful TestHooks approach that I often use.