A Philosophy of Testing 4: Scenario Tests

Nov 15, 2021

In the last article, I argued passionately against writing too many unit tests, which raises the question: what should you be doing instead? In this part, I’m going to lay out the Scenario Test approach.

Stepping back a bit, it’s common for folks to focus on two different kinds of tests:

On the one hand, there are unit tests — atomized little tests, typically focusing on a single class or just a few.
On the other hand, there are integration tests — “opaque” tests that run the application for real, often in the context of other running services, only looking at it from the outside with an external test driver.

There’s a pretty huge gap between those extremes, and that gap is where we are going to focus. For Scala services (remember, not libraries), the most valuable tests are often smack in the middle:

Scenario tests test the application as a whole, transparently, in-process, with heavy instrumentation.

I should note that the term “scenario test” is my own, reflecting the fact that you tend to focus on realistic scenarios of usage. The concept isn’t a rare one, but I haven’t found a consensus term for it, so that’s what we’ll call it through this series.

While they are much broader than unit tests, they are usually run in the same way as unit tests, using a framework like ScalaTest in conventional ways. That’s part of the difference between scenario tests and integration tests — scenario tests should still be pretty lightweight and fast, suites that you can run dozens of times a day while you are developing.

Structure of a Scenario Test

The key idea for scenario tests is that you are mostly running the entire service normally. You generally don’t mock any of the internal components, you use the real ones.

But — that’s only for the internal components. All of the external bits — the data stores, APIs, and so on — typically get replaced by fakes, emulators that are good enough for your needs.

(Note that I don’t recommend mocks, even here: as discussed below, a proper emulator is somewhat more work upfront, but it usually pays off nicely.)

The idea is that you are testing the whole service, but wrapping it so that you have complete control over, and insight into, everything it is communicating with.
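
Here is a minimal sketch of that shape in plain Scala. Every name in it (PaymentGateway, PriceCalculator, OrderService, FakePaymentGateway) is hypothetical, invented for illustration. In a real suite the assertions at the bottom would live inside a ScalaTest spec rather than bare assert calls.

```scala
// External dependency: in production this would wrap a remote payment API.
trait PaymentGateway {
  // Returns Right(receiptId) on success, Left(error) on failure.
  def charge(customerId: String, cents: Long): Either[String, String]
}

// Internal component: used for real in scenario tests, never mocked.
class PriceCalculator {
  def totalCents(unitCents: Long, quantity: Int): Long = unitCents * quantity
}

// The service under test, wired from real internals plus whichever gateway it is given.
class OrderService(calculator: PriceCalculator, gateway: PaymentGateway) {
  def placeOrder(customerId: String, unitCents: Long, quantity: Int): Either[String, String] =
    if (quantity <= 0) Left("quantity must be positive")
    else gateway.charge(customerId, calculator.totalCents(unitCents, quantity))
}

// Fake for the external side only: in-memory, and it records every charge it sees.
class FakePaymentGateway extends PaymentGateway {
  private var recorded = Vector.empty[(String, Long)]
  def charges: Vector[(String, Long)] = recorded
  def charge(customerId: String, cents: Long): Either[String, String] = {
    recorded = recorded :+ (customerId -> cents)
    Right(s"receipt-${recorded.size}")
  }
}

// Scenario: call the real service API, then inspect what reached the outside world.
val gateway = new FakePaymentGateway
val service = new OrderService(new PriceCalculator, gateway)
assert(service.placeOrder("alice", 250L, 3) == Right("receipt-1"))
assert(gateway.charges == Vector("alice" -> 750L))
```

The test never reaches into PriceCalculator. It only sees what the service returns and what the fake gateway was asked to do.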

Note that it is okay (and common) to add subclasses and hooks that only exist during testing. I’ll talk more about the appropriate use of these in a later article, but the high concept is that you often need more transparency in your code, in order to precisely set up a scenario, or to check that side-effects are happening as expected. The key, though, is that these generally aren’t mocks — they are tweaked subclasses that are largely the real code with some slight adjustments.
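
As a rough sketch of such a hook, assume a hypothetical SessionManager whose only concession to testing is an overridable clock. The class names and the TTL logic are invented for this example.

```scala
import java.time.Instant

// Real production class; the only concession to testing is that `now` can be overridden.
class SessionManager(ttlSeconds: Long) {
  protected def now(): Instant = Instant.now()

  private var sessions = Map.empty[String, Instant] // token -> expiry time

  def open(token: String): Unit = {
    sessions = sessions.updated(token, now().plusSeconds(ttlSeconds))
  }

  def isValid(token: String): Boolean =
    sessions.get(token).exists(expiry => now().isBefore(expiry))
}

// Test-only subclass: all of the real logic, plus a clock the test can move.
class TestSessionManager(ttlSeconds: Long) extends SessionManager(ttlSeconds) {
  var currentTime: Instant = Instant.parse("2021-11-15T00:00:00Z")
  override protected def now(): Instant = currentTime
}

// Scenario: a session is valid when opened and invalid once its TTL has passed.
val manager = new TestSessionManager(ttlSeconds = 60)
manager.open("abc")
assert(manager.isValid("abc"))
manager.currentTime = manager.currentTime.plusSeconds(61)
assert(!manager.isValid("abc"))
```

Everything except the clock is the production code path, which is what separates this from a mock.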

As for the tests themselves, they’re somewhat like small integration tests: you are calling the real APIs of your service, controlling all of the external systems that are involved, and confirming that the results and side-effects match what you expected. It’s kind of like unit testing, but the “unit” is your application as a whole.

It is often convenient for these “scenarios” to get fairly involved, sometimes dozens of API calls long. That’s fine; indeed, when creating a new subsystem I will often create a large “happy path” spec to go with it, one that exercises the core functionality in a variety of ways in one go. That makes a great smoke test that you can run extremely frequently, with all the edge cases off in their own separate tests.

Why?

To understand why scenario tests are usually better than unit tests for services, let’s turn it around and ask: where are the bugs?

Consider: classic unit tests are all about testing classes in isolation. They are checking whether this class does exactly what you expect it to do.

The problem is, that’s not where the bugs mostly are. In plumbing-centric applications (that is, most of the code in most services), the bugs are often in the interactions between the components. In particular, bugs sneak in when you tweak the semantics of class A in a way that class B doesn’t expect.

(This is where some of the purists chime in and point out that, if this can happen, your types aren’t strong enough — you should use stronger types that represent the contracts between components more precisely. That’s often true, but in practice not a lot of application code is that strongly typed. Making contractual errors impossible is a great ideal, but not often fully realized.)

Testing isn’t something you do because it’s fun to write tests — your tests should be all about uncovering bugs, as consistently as you can reasonably manage. And if the bugs are mostly in the interactions, then you need to be testing those interactions, in realistic scenarios. And the easiest way to test the interactions is to use the entire application at once.
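
To make the “class A versus class B” problem concrete, here is a small invented example. All names are hypothetical. Registration was tweaked to normalize email addresses, while Login still assumes they are stored exactly as typed. Each class can pass its own isolated tests, but a scenario that runs both against the real shared component exposes the mismatch.

```scala
// Shared internal component, used for real.
class UserDirectory {
  private var byEmail = Map.empty[String, String] // email -> display name
  def save(email: String, name: String): Unit = { byEmail = byEmail.updated(email, name) }
  def find(email: String): Option[String] = byEmail.get(email)
}

// Class A: recently tweaked to normalize emails before saving.
class Registration(directory: UserDirectory) {
  def register(email: String, name: String): Unit =
    directory.save(email.trim.toLowerCase, name)
}

// Class B: still assumes emails are stored exactly as the user typed them.
class Login(directory: UserDirectory) {
  def greet(email: String): Option[String] =
    directory.find(email).map(name => s"Welcome back, $name")
}

// Scenario: register, then log in with the same address, through the real components.
val directory = new UserDirectory
new Registration(directory).register("Alice@Example.com", "Alice")
val greeting = new Login(directory).greet("Alice@Example.com")

// A scenario test expecting Some("Welcome back, Alice") would fail here,
// which is exactly the interaction bug we want the suite to catch.
assert(greeting.isEmpty)
```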

Emulating External Services

Unlike the mocks in a typical unit test, you can build most of the infrastructure once, since it is generally just “doing the right thing”. The goal is to create simple but realistic emulators for all of the external services, that generally work as expected.

At this point in the discussion, folks tend to pipe up with the question, “Doesn’t that mean we’re completely reimplementing those services?” The answer is no, for several reasons.

First, you don’t need to implement everything, just the bits you actually care about. It’s routine to find that the external service has two dozen entry points (or two hundred), but the service under test only calls three of them. So you only implement those three.

Second, it is usually way, way easier to build an in-memory emulator than a full service. You don’t need to deal with persistence, or scaling, or initialization, or any of the standard infrastructure of a real service. It’s common to find that a good-enough emulator is less than 1% of the size of the thing it is emulating, sometimes much less.

And while, yes, building such an emulator takes effort, it amortizes wonderfully, because you are building it once. Mocks are simple, but you generally need a different mock for each test. Emulators are a bit more complex, but you can then usually just plug them into your test harness and use them for hundreds of tests. The overall effort is far lower, and it’s usually easier to understand how it works because it is consistent across your tests.

You’ll often find that you want the emulator to expose lots of hooks to the test harness, allowing the tests to inject specific data, override the default behavior so you can test specific error conditions, and examine the state after the test. A few getters and setters will go a long way there.
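
A sketch of what such an emulator might look like, assuming a hypothetical external blob store of which the service only calls three entry points. The BlobStore trait and the hook names (seed, failNextCallWith, snapshot) are all made up for this example.

```scala
// The three entry points the service under test actually calls.
trait BlobStore {
  def put(key: String, value: String): Either[String, Unit]
  def get(key: String): Either[String, Option[String]]
  def delete(key: String): Either[String, Unit]
}

// In-memory emulator: no persistence, no network, no scaling concerns.
class InMemoryBlobStore extends BlobStore {
  private var data = Map.empty[String, String]
  private var pendingFailure = Option.empty[String]

  // Hooks for the test harness: inject data, force an error, examine state.
  def seed(entries: (String, String)*): Unit = { data = data ++ entries }
  def failNextCallWith(error: String): Unit = { pendingFailure = Some(error) }
  def snapshot: Map[String, String] = data

  // Returns the injected failure once, if there is one; otherwise runs the normal behavior.
  private def guarded[A](normal: => A): Either[String, A] =
    pendingFailure match {
      case Some(error) =>
        pendingFailure = None
        Left(error)
      case None =>
        Right(normal)
    }

  def put(key: String, value: String): Either[String, Unit] =
    guarded { data = data.updated(key, value) }
  def get(key: String): Either[String, Option[String]] =
    guarded(data.get(key))
  def delete(key: String): Either[String, Unit] =
    guarded { data = data - key }
}

// Usage: seed some data, force one failure, then check the final state.
val store = new InMemoryBlobStore
store.seed("config" -> "v1")
assert(store.get("config") == Right(Some("v1")))
store.failNextCallWith("503 Service Unavailable")
assert(store.put("report", "draft") == Left("503 Service Unavailable"))
assert(store.put("report", "draft") == Right(()))
assert(store.snapshot == Map("config" -> "v1", "report" -> "draft"))
```

The whole thing is a few dozen lines, and the same instance can be plugged into every test in the suite.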

Sometimes you don’t even have to build it yourself. For example, the easiest way to emulate Postgres or MySQL, at least for simpler database applications, is often to drop in the H2 in-memory database. It isn’t an absolutely perfect emulator, and lacks some of the more complex functionality, but it is often plenty good enough, and it is quite fast.

Using the Real Thing

There’s a reasonable question to ask here: if you’re trying to faithfully emulate an external component, why not just use that external component? You can’t get much more faithful than that.

That’s entirely reasonable, and I’ve sometimes done that. But I’ve often wound up regretting it, mostly because of speed. Remember, scenario tests are something you want to be able to run frequently during development, and that means they need to be fast — a good scenario test should take tens to maybe hundreds of milliseconds to run, so that the whole suite doesn’t take long.

Even the best external service is still external. That usually means network latency, and even locally that can often add a respectable fraction of a second to each interaction. And most external services aren’t lightning-fast.

That can quickly add up to individual tests that take several seconds each, or more — not too bad in isolation, but when you have hundreds of tests in your suite (as you typically should), it piles up fast. If the suite takes five minutes to run, or fifty, you’re not going to run it often, and that kind of defeats the purpose.
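
To put rough numbers on that, here is the arithmetic as a tiny Scala snippet. The figures (300 tests, two seconds per test against real services, 50 milliseconds per test against emulators) are illustrative assumptions, not measurements.

```scala
import scala.concurrent.duration._

// Hypothetical suite size and per-test costs.
val testCount = 300L
val withRealServices = 2.seconds * testCount // total time when each test takes 2 s
val withEmulators = 50.millis * testCount    // total time when each test takes 50 ms

assert(withRealServices.toMinutes == 10) // ten minutes per run
assert(withEmulators.toSeconds == 15)    // fifteen seconds per run
```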

So I usually recommend sticking to emulators as much as possible, and only pulling in the real thing if you are confident that it is responsive enough to not get in the way.

When to Use Unit Tests

As mentioned in the last article, there are exceptions to the “don’t bother with unit tests” rule. Some particular examples:

Data structures that are genuinely complicated — in particular, while you shouldn’t usually be using mutable structures, even a complex immutable structure needs careful testing. (Use ScalaCheck here.)
Complex algorithms. In particular, if you have functions that are highly sensitive to their inputs, then it is typically worth writing some focused tests for those.
Hard-to-hit code. This tends to be obscure error pathways that are difficult to exercise except under extreme circumstances. Unit tests for those are sometimes worthwhile, although I would always ask whether there is a feasible way to hit them with a scenario test instead.
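
For the data-structure case, the idea behind property testing can be sketched without any dependencies. The example below is a hand-rolled stand-in, not ScalaCheck itself: the tiny immutable search tree and the TreeProperties helper are hypothetical. With ScalaCheck you would state the same property using forAll and let the library handle input generation and shrinking.

```scala
import scala.util.Random

// A small immutable binary search tree, standing in for a more complicated structure.
sealed trait Tree {
  def insert(x: Int): Tree = this match {
    case Leaf => Node(Leaf, x, Leaf)
    case Node(l, v, r) =>
      if (x < v) Node(l.insert(x), v, r)
      else if (x > v) Node(l, v, r.insert(x))
      else this // duplicates are ignored
  }
  // In-order traversal, so the result is in ascending order.
  def toList: List[Int] = this match {
    case Leaf          => Nil
    case Node(l, v, r) => l.toList ::: (v :: r.toList)
  }
}
case object Leaf extends Tree
case class Node(left: Tree, value: Int, right: Tree) extends Tree

object TreeProperties {
  // Property: inserting any sequence of ints yields its sorted, de-duplicated contents.
  def holdsFor(inputs: Seq[Int]): Boolean =
    inputs.foldLeft(Leaf: Tree)(_.insert(_)).toList == inputs.distinct.sorted.toList

  // Checks the property against `runs` random inputs, using a fixed seed for repeatability.
  def check(runs: Int, seed: Long): Boolean = {
    val rng = new Random(seed)
    (1 to runs).forall { _ =>
      holdsFor(Seq.fill(rng.nextInt(50))(rng.nextInt(100)))
    }
  }
}

assert(TreeProperties.check(runs = 200, seed = 42L))
```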

There are plenty of other cases; don’t be afraid to write a mix of unit and scenario tests. The message here isn’t that unit testing is evil, just that an excessive focus on it is counter-productive. As always, test mindfully, and keep thinking about how to most effectively test your code in a maintainable way.

Conclusion

Putting this together:

Services tend to be mostly plumbing.
Bugs arise less often in plumbing classes, and more between them.
You should focus on testing the interactions between the components.
The best way to do that is by testing the entire application on realistic inputs.
Emulate the external services used by the components, with little simulators that are fast and under precise control.

Next time: The Case for 100% Test Coverage.