Monday, December 26, 2016

On testing

I've started writing another post (hope to complete it somewhere soon), and as one of the offshoots it brought my thoughts to the automated testing. The automated testing had been a great new thing, then not so new and kind of a humbug thing, then "instead of testing we'll just run a canary in production", then maybe it's doing a bit of a come-back. I think the fundamental reason for automated testing dropping from the great new thing is (besides the natural cycle of newer greater things) is that there are multiple ways to do it and a lot of these ways being wrong. Let me elaborate.

I think that the most important and most often missed point about the automated testing is this: Tests are a development tool. The right tests do increase the quality of the product but first and foremost they increase the productivity of the developers while doing so.

A popular analogy of the programs is that a program is like a house, and writing a program is like building a house. I'd say not, this analogy is wrong and misleading. A program is not like a house, it's like a drawing or a blueprint of a house. A house is an analogy of what a program produces.

The engineering drawings are not done much with the pencil and paper nowadays, they're usually done in a CAD system. A CAD system not only records the drawing but also allows to model the items on the drawing and test that they will perform adequately in reality before they're brought to reality. The automated tests are the CAD for programs.

The CADs for drawings require entering the extra information to be able to do their modeling. So do the automated tests. It's an overhead but if done right this overhead is an acceptable one, bringing more benefits than overhead.

The first obvious benefit of the automated tests is that they make the programs robust. To bring the house analogy again, like a house of bricks, not a house of cards. You might remember the saying about "If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization". Not with the automated tests, not any more. With the automated tests you don't get into the situation "I've just changed one little thing and it all suddenly fell apart" any more.

But that's again just the reliability. Where is the productivity coming from? The major benefit for the productivity is that you don't need to do so much analysis any more. You can always change things quickly and see what happens in a test environment. This is also very useful for tracing the dependencies: if you change something and the product breaks, you see where it breaks. This is a major, major item that makes the programming much more tractable and turns the devilishly complicated modifications into the reasonably straightforward ones.

The programming tasks tend to come in two varieties: The first variety is the straightforward one: you just sit and write what the program needs to be doing, or even better copy a chunk of the existing code and modify it to do the new thing you need. The second variety happens when the thing you need to do hits the limitations of the infrastructure you have. It usually has some kind of a paradox in it, when you need to do this and that but if you do this then that falls apart and vice versa.

When you have this second kind of a task on your hands, you need to stretch your infrastructure to support both this and that at the same time. And this is the situation when things tend to get broken. I really like to solve such problems in two steps:

1. Stretch the infrastructure to support the new functionality but don't add the new functionality.
2. Add the new functionality.

At the end of step 1 the program works differently inside but in exactly the same way on the outside. Having the automated tests, they all still pass in exactly the same way as before. And only after the second step there will be the new functionality that will require the new tests and/or modification of the old tests.

But as I've said before, not all the tests are useful. I have formulated the following principles of the useful tests:

  • Easily discoverable.
  • Fast running. 
  • Fast starting.
  • Easy to run both by the developer and in an automated scheduler.
  • As much end-to-end as possible.
  • Test support built into the product and visible as an external API.

What do these things mean? Let's look in detail.

Easily discoverable means that the tests must live somewhere next to the code. If I change a line of code, I must be able to find the most relevant tests easily and run them. It's an easy mantra: changed something, run the tests on it. And doing so must be easy, even if the code is not familiar to the developer.

Fast running means that the tests should try not to take too much time. This comes partially to trying to make the tests fast and partially to the ability to run only the relevant subsets of the tests. The way of work here is: changed a few lines, run the basic tests, change a few more lines, run the tests again. The basic subset for some lines should be easy to specify and take seconds, or at worst a few minutes to run. This is sometimes quite complicated when testing the handling of the time-based events. For time-based events it's very good to have a concept of the application time that can be consistently accelerated at will. Another typical pitfall is the situation "I've requested to do this, how do I find out if it's done yet, before running the next step of the test?". This is something that is easy to resolve by adding the proper asynchronous notifications to the code, and making everyone's life easier along the way.  The fast running concept is also useful for debugging of the failed tests: how long does it take to re-run the test and reproduce the problem or verify the fix?

Fast starting is related to the fast running but is a bit different. It has to do with the initialization. The initialization should be fast. If you're running a large test suite of ten thousand tests, you can afford to have a lengthy initialization up front. If you run one small test, you can't have this lengthy initialization every time. If you really need this initialization, there must be a way to do it once and then repeatedly re-run the individual test.

The other aspect of the fast starting is that the programmer must be able to run the tests directly from the development environment. The repeating cycle is write-compile-test. The tests must directly and automatically consume the newly compiled code. There must not be any need to commit the code nor to run it through some kind of an official build.

This partially overlaps with the next requirement: the tests being easy to run both directly by the programmer and in an automated system that goes together with the automated official build. The need to run the tests directly comes from the need for the efficiency of development and from the use of the tests as a CAD. There shouldn't be any special hoops to jump through for including the tests into the official builds either, it should just work once checked in. This is another place where keeping the tests next to the code comes in: both the tests and the code must be a part of the same versioning system and must come in as a single commit. Well, not necessary literally as a single commit, personally I like to do a lot of small partial commits on my personal branch as I write the code and tests, but when it comes up into the official branch, the code of the program and of the tests must come together.

The end-to-end part is very important from both the standpoint of the cost of the tests and of their usefulness. It needs a good amount of elaboration. There had been much widespread love professed for the unit tests, and then also as much disillusionment. I think the reason for this is that the unit tests as they're often understood are basically crap. People try to test each function, and this tends to both require a lot of code and be fairly useless, as many functions just don't have enough substance to test. Mind you, there are exceptions: if a function does something complicated or is a part of a public API, it really should have some tests. But if a function is straightforward, there is no point in looking at it in isolation. Any issues will come up anyway as a part of a bigger test of the code that uses this function.

The worst possible thing is the use of the mocks: those result in just the tautologies when the same thing is written twice and compared. These "tests" test only that the function goes through the supposedly-right motions, not that these motions are actually right and produce the proper result. This produces many horrible failures on deployment to production. A real test must check that the result is correct. If some API you use changes under you and breaks the result, a proper test will detect it. If you really have to use mocks, you must use them at least one level down. I.e. if you're testing a function in API1 that calls API2 that calls API3, you might sometimes get something useful by mocking the functions in API3 but never mock the functions in API2.

Another benefit of the integrated testing is that it often turns up bugs where you didn't expect to. The real programs are full of complex interactions, and the more you exercise these interactions, the better are your tests.

So my concept of an unit test is an "integrated unit test": a test for one feature of an externally visible API. Note that there might be some complex internal components that should be seen as a layer of API and tested accordingly, especially if they are reused in multiple places. But the general rule is the same. This both increases the quality and cuts down on the amount of the tautology code.And it also facilitates the stretch-then-extend model I've described above: if a test tests what the API call does, not how it does that then you can change the underlying implementation without any change to the tests, and verify that all the tests still pass, your external API hasn't changed. This is a very important and fundamental ability of tests-as-CAD, if your tests don't have it then they are outright crap.

Some readers might now say: but doesn't that end-to-end approach contradict the requirement of the tests being fast? Doesn't this integration include the lengthy initialization? When done right, no, it doesn't. The answer comes two-fold: First, when the tests are more integrated, there are fewer of them, and due to the integration they cover more of the internal interactions. Second, this forces you to make the initialization not lengthy. It might require  a little extra work but it's worth it, and your customers will thank you for that.

This brings us to the last point: when you've built the support of the testing into your product, keep it there for the public releases and make it publicly available. This will make the life of your customers a lot easier, allowing them to easily build their own tests on top of your infrastructure. Instead of writing the stupid mocks, it's much better to have a side-door in the underlying APIs that would make them return the values you need to test your corner cases, and do it internally consistent with the rest of their state. I.e. if you mock some kind of an underlying error, how do you know that you mock it right, in the same way as the underlying API would manifest it? You don't. But if an underlying API has a way to ask it to simulate this error, it would simulate properly, as it does in the "real life".

Some people might now ask: But what about security? But what about performance?

As far as the security goes, security-through-obscurity is a bad idea anyway. Obviously, don't give the test access to the random entities, have a separate "access panel" in your API for the tests that would only allow the authorized connections. And if your product is a library then the security-through-obscurity is a REAL bad idea, and there are no random entities to start with. Make things accessible to your user. There is no good reason for a class to have any private members other than for the non-existing components. The most protection a class might ever need is protected. The same, there is no good reason for a class to have any final members. And if you're worried that someone will use the testing part of the API to do something that the main API doesn't allow then the solution is simple: make the main API allow it, otherwise you're crippling it intentionally.

As far as the performance goes, the overhead is usually pretty low. Note that you don't need to embed the test support all over the place, only at the substantial API layers. These API layers usually deal with the larger-scale concepts, not the ones you'd find at the bottom of a triple-nested loop. Sometimes there are exceptions to this rule but nothing that can't be resolved in some way, and the result after resolution is always better than before.

Hope that I've convinced you that the right testing infrastructure makes a world of difference in the software development, not only improves the quality but also makes it faster and cheaper. With the right tests you'll never get stuck with "it's working somehow, don't touch it" or "we will need to study for a year before making that change".  Or to use again the building analogy (yeah, I've just decried it as wrong but I'll use it differently), the tests are not like the scaffolding on a building that you discard after the construction is completed, they are like the maintenance passages and hatches that keep the building in an inhabitable condition throughout its life.