A Philosophy of Testing 7: Code for Testing

Mark "Justin" Waks
8 min read · Nov 23, 2021

Back to Part 6.

In the previous post, I made a number of suggestions about how to build your system in order to make it reliably testable. There’s a bit of controversy nestled in there, so let’s tackle it head-on.

It’s not unusual for folks to take the viewpoint that testing should be utterly, completely external to the code under test. There are some valid arguments for that — in particular, at least in theory you want to maintain separation of concerns between the application and its tests.

More importantly, it is easy to naively intermingle your test code into your application in ways that compromise security: there are too many horror stories about people who put backdoors into their system for testing, didn’t lock them down properly, and got exploited.

Those are valid concerns, and should be kept in mind. But I’m going to argue for a middle ground:

Your code must be reliably testable. It should provide whatever hooks are needed in order to test it effectively.

As always, there’s a lot to unpack there, so let’s dive into it.

Testing is a Critical Requirement

All too often, folks think of testing as something separate from the code — you write the code, and only afterwards do you test it. Test automation is often treated as menial work, to be written by less-skilled engineers after the fact. That is pernicious nonsense, and I’d like to encourage you to stay far away from that sort of thinking. (In particular, if you think it’s a menial task, you probably aren’t testing mindfully.)

Quite the contrary: test automation should be treated as a requirement for the application, and insofar as is reasonable, a requirement for every story. In a healthy development environment, you should be required to demonstrate that each feature story works, with an automated test illustrating that.

Now, that’s not an unusual viewpoint. But there is a corollary that I’d encourage you to draw from it:

If your code doesn’t provide a good means to test it, the code is incomplete.

That really is a corollary: if testing is a requirement, then you should be taking that seriously. That ties back to the last post, about determinism — if the code isn’t reliably testable, then it’s broken.

And the thing is, testing well often requires complicity from the code under test. Drawing a hard line of “the application should know nothing about its test harness” is just plain unhelpful — it’s a dogmatic position that doesn’t take reality into account.

So the middle ground runs like this — the application should:

  • Be structured in such a way as to allow the test harness to fake and instrument the various subsystems as necessary.
  • Provide “hooks” that allow the tests to deeply understand what is going on in the running code, again as necessary.

We’ll tackle those below.

Structuring for Tests: zio-test

The ZIO framework has a tendency to be rather all-or-nothing, and isn’t everyone’s cup of tea. But it and its bespoke test framework provide several good, simple examples of how to design a system for testability.

ZIO programs generally handle dependency injection via the R type parameter on each effect — that defines the “environment” that the code is running within. (In other words, its dependencies.) Normal ZIO apps come with several such dependencies built into the default environment, including:

  • Clock, which lets you fetch the current time and schedule events.
  • Console, which handles console I/O.
  • Random, which lets you obtain random numbers and strings.
  • System, which provides access to environment variables and properties.

These come for free via the default ZEnv environment, and each is basically an interface that is dependency injected.
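
To make that concrete, here is a minimal sketch (ZIO 1.x syntax; the value names are mine) of an effect that declares its dependencies through the R parameter, using the standard accessors:

import zio._
import zio.clock.Clock
import zio.random.Random

// This effect states, in its type, that it needs Clock and Random from the environment.
val pickDeadline: ZIO[Clock with Random, Nothing, Long] =
  for {
    now    <- zio.clock.nanoTime   // the current time, from whichever Clock is injected
    jitter <- zio.random.nextLong  // a random offset, from whichever Random is injected
  } yield now + jitter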

In the zio-test harness, though, you get the TestEnvironment, which looks the same but provides test-controllable versions of each of these interfaces:

  • TestClock, which lets you jump the time forward or pin it to a specific time.
  • TestConsole, which lets you specify what console input will be provided, and gives you access to any output the program sends to the console.
  • TestRandom, which lets you specify the exact random values that will be generated.
  • TestSystem, which lets you provide test values for the environment variables.

These are specific examples, but they nicely illustrate the underlying principle. Any time your code might depend on an externality — especially anywhere it might depend on something non-deterministic like time or randomness — you should be using an interface that the test harness can override and control. That gives you the levers you will need in order to write great tests.
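
For instance, here is a small zio-test sketch (again ZIO 1.x syntax; the spec and values are mine, following the pattern in the TestClock documentation) showing how a time-dependent behavior becomes a deterministic test — the test advances the clock itself instead of actually waiting:

import zio._
import zio.duration._
import zio.test._
import zio.test.Assertion._
import zio.test.environment.TestClock

object SleepSpec extends DefaultRunnableSpec {
  def spec = suite("SleepSpec")(
    testM("completes once the clock is advanced") {
      for {
        fiber  <- ZIO.sleep(1.minute).as("done").fork  // would take a real minute on the live Clock
        _      <- TestClock.adjust(1.minute)           // jump the TestClock forward instantly
        result <- fiber.join
      } yield assert(result)(equalTo("done"))
    }
  )
}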

Summarizing the idea:

Everything external to the code itself should be designed to be test-controllable.

Structuring for Tests: RqClient

Building your code for testability doesn’t have to be limited to simple things like random numbers — pretty much every aspect of your interactions with the external world can and should be designed to be testable.

For a more serious example, I’ll go back to my last employer, Rally Health. (Now part of Optum Digital.) Rally is microservice-oriented, so there are a lot of RESTful HTTP calls involved — most services have lots of calls to other services.

Some years back, folks at Rally (I’ll particularly call out Doug Roper, who spearheaded this excellent project) built an abstraction around those REST calls, to future-proof them. Over time, that has evolved into a really lovely library, which allows you to describe your call in terms of the “what” — GET vs POST, the URL, the parameters, how to handle the results — via a typeclass-centric interface.

The primary goal there was to be able to write business logic without being bound to a specific HTTP library — a fine thing in and of itself — but it came with an additional benefit: that makes it easy to slap in a fake implementation at test time. By substituting a FakeRqClient in place of the real one, you can avoid doing HTTP calls at all (since even local loopbacks take some time), and just replace them with an emulator for the external service you are calling.

As mentioned in previous sections, this sort of emulator can be quite simple: it’s usually enough to just implement the entry points that you will be using, with a few in-memory maps instead of a real database. When you think in terms of “what does the service under test actually need?” instead of “what does the emulated service really do?”, you will often find that you can get away with a pretty slim piece of code. It just has to look sufficiently real to your calls; it doesn’t have to be at all real.

While this emulator wants to look real when called by the business logic, you almost always want to make it very transparent to your test harness. Build it so that you can inject specific test data before the test begins, and let the test examine the state of the emulator at any point, and you will make it much easier for your tests to see exactly what is going on.

A fake HTTP client like this also has other benefits: for example, specific tests can tell the emulator to return HTTP error codes like 404 or 500 for specific calls, to let you easily test what happens when you get downstream errors. I generally build my emulators so that I can substitute specific return codes for specific URLs in a particular test — that is usually sufficient to be able to test a complex scenario that does a lot before hitting the error.
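
RqClient itself is Rally-internal, so here is a hypothetical, stripped-down sketch of the same idea — the trait and names are mine, not the actual library — showing a fake client that tests can pre-load with canned responses, force into returning specific error codes, and inspect afterwards:

// Hypothetical interface standing in for the real abstraction; Left carries an HTTP error code.
trait HttpCaller {
  def get(url: String): Either[Int, String]
}

class FakeHttpCaller extends HttpCaller {
  // Test-injected canned responses, keyed by URL
  var cannedResponses: Map[String, String] = Map.empty
  // Test-injected error codes for specific URLs (e.g. force a 404 or 500)
  var errorCodes: Map[String, Int] = Map.empty
  // A record of every call made, so the test can see exactly what happened
  var callLog: List[String] = Nil

  def get(url: String): Either[Int, String] = synchronized {
    callLog = url :: callLog
    errorCodes.get(url) match {
      case Some(code) => Left(code)
      case None       => Right(cannedResponses.getOrElse(url, "{}"))
    }
  }
}

A test can then set cannedResponses before exercising the business logic, or drop a 500 into errorCodes for one URL to exercise the downstream-failure path, and check callLog afterwards to see exactly which calls were made.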

Done properly, this gives you highly reusable machinery that can apply to dozens or hundreds of tests unchanged, while still allowing you to specify mocks to test weird edge cases.

TestHooks

Less obviously, there are times when you simply need your code to be more transparent to the test harness — where the simple “send some input, check the output” isn’t good enough.

Why? There are a variety of reasons, but most of them boil down to situations where we don’t have enough visible evidence that things have worked correctly. This breaks down into two major categories:

  • What happens is a side-effect. For example, a call to the service results in a change to an in-memory value.
  • Nothing happens, and that’s good. For example, this call is redundant with the current state, and we intentionally don’t want to change anything.

The latter case is often the killer. Folks often approach it by using timeouts — they make the call, wait ten seconds, and if nothing changes, then they figure that it did the right thing. But that violates our determinism goal: if you are dependent on timeouts, your tests will occasionally fail spuriously because of a garbage-collect or network delay, causing a lot of head-scratching and fire drills. It’s not a good approach.

Instead, I’ve found it best to provide a way for the code to tell the tests exactly what has happened. To that end, I will often provide a service that looks something like this:

trait TestHooks {
  def event(evt: => Any): Unit
}

class TestHooksImpl() extends TestHooks {
  def event(evt: => Any): Unit = {}
}

Obviously, adjust this to suit your world — for example, in an FP environment, event() would return IO[Unit]. But that is the core idea: a trivial little module that allows the code to say "something happened" in a lightweight way.
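
As a sketch of what that might look like in effectful style (using ZIO’s UIO here purely as an illustration; the names are mine):

trait TestHooksFP {
  def event(evt: => Any): zio.UIO[Unit]
}

// The production implementation: by design, it does nothing at all.
object NoOpTestHooksFP extends TestHooksFP {
  def event(evt: => Any): zio.UIO[Unit] = zio.UIO.unit
}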

Note that TestHooksImpl is, in a sense, the fake: in the real running system, you want this hook, by design, to do absolutely nothing. (That is how you avoid security problems.) The version that actually records anything exists only at test time, and looks something like:

class TestTimeTestHooks() extends TestHooks {
  var eventsRaised: List[Any] = Nil

  def event(evt: => Any): Unit = synchronized {
    eventsRaised = evt :: eventsRaised
  }
}

That’s a bit crude and simple — adjust it to suit your coding style — but it should give you the idea: when you are running your scenario tests, you inject this useful “test” version instead of your empty “real” one, allowing your test code to examine the events that have happened, and confirm that things are doing what you expect.

Given this model, it is cheap and easy to lace “this just happened” events around your code as needed:

case class InputDiscarded(name: String)
...
if (needToDoSomethingTo(name)) {
  goDoTheThingTo(name)
} else {
  testHooks.event(InputDiscarded(name))
}

Note that the evt parameter to TestHooks.event() is call-by-name, so in the empty “real” case, it won’t even spend the time building the case class instance — it just won’t do anything. So it is both safe and inexpensive to add these calls to your code wherever they are helpful. And then, in your tests, you can do something like:

callThatShouldNotDoAnythingWith(name)
eventually {
  assert(testHooks.eventsRaised.exists(_ == InputDiscarded(name)))
}

The benefit, again, is determinism. Instead of setting a timeout and going, “welp, nothing happened so I guess we are okay”, you wait for the event to actually be raised. You’re not looking for side-effects: you’re asking the code to say, “I decided not to do anything here”. That allows you to build much cleaner, more reliable, faster tests.

I encourage you to play with this idea, and see how it might work in your world. Once you accept this notion that the application code can and should be in conversation with the test harness, there is a lot you can do with it. For example, I sometimes add a TestHooks method that blocks the current processing until the test releases it — that allows me to write deterministic, reliable tests for some thorny, non-deterministic race conditions. (And as always, that method is empty in the normal running system.)
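
As a hypothetical sketch of that blocking variant — in practice the pause method would be declared on TestHooks and left empty in TestHooksImpl; the latch-based approach and the names here are mine:

import java.util.concurrent.{CountDownLatch, TimeUnit}

class PausingTestHooks() extends TestHooks {
  private val latch = new CountDownLatch(1)

  def event(evt: => Any): Unit = {}

  // Application code calls this at the point where the test wants to pause it;
  // the timeout keeps a broken test from hanging forever.
  def pauseHere(): Unit = latch.await(30, TimeUnit.SECONDS)

  // The test calls this once it has set up the interleaving it wants to exercise.
  def release(): Unit = latch.countDown()
}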

Turning that into a bullet point:

Provide a way for your code to tell the tests about side-effects and non-events.

Summary

When you think of testability as a requirement of your application, not just an after-the-fact checkbox, it changes the way you think about structuring your application. You want to code so that everything outside the application proper (including basic environmental factors like the time) can be easily controlled by the test harness. And you want to connect the application and test harness via lightweight mechanisms such as TestHooks, so that the code can provide the tests with the insight that they need.

Next time: Conclusions, at least for now


Mark "Justin" Waks

Lifelong programmer and software architect, specializing in online social tools and (nowadays) Scala. Architect of Querki (“leading the small data revolution”).