Wednesday, March 28, 2012

Monday, March 19, 2012

When automated tests make sense, and when they don't

Some years ago, I stumbled upon an article about automated testing, "When should a test be automated?" (I'm linking to Google since my previous link is now broken). I found it quite interesting because it, unaffected by all the hype surrounding automated tests, provided some rational guidelines for when to write automated tests and when not to.

A main principle was that an automated test is a waste of time if it doesn't find a real bug upon being run again.

This is a good thing to keep in mind, but since we generally don't know where the bugs are or will be in the future, it's not a very practical rule. So on and off, while working on various web applications, I've been thinking about this.

Why automated tests are good
Praises of automated tests are not hard to find. But briefly: the main practical advantage is that if you change the software, you can easily rerun the tests to see if you have broken something, compared to having to retest everything manually (or just crossing your fingers). If you test the same things over and over again, having it automated can be a real time saver.

There is also a psychological advantage: writing and running the tests makes you feel better. Plus you're in, which is always a good thing if you're a little low on the hype factor.

Why automated tests are bad
However, there are also some downsides.

First of all, automating tests takes time.

And unfortunately, it's usually a pretty boring task. It doesn't help that you often have to test things manually too while experimenting or as a sanity check, so you pay the cost of testing twice.

And the tests, while radiating nice comforting feelings, also have a tendency to give a false sense of security. The tests are running, so it works, right?

So you always have to keep in mind that automated tests only test part of what makes the code correct. You could argue that this is a sign of a too-lazy test writer (I think that's the main driver in test-driven development), but really, you can't foresee everything (pesky user clicks the wrong button), and if you try to, you end up with tests so detailed that you can't change the program without having to rewrite the tests completely. And then they have lost their main advantage as automated tests.

Besides, an automated test only confirms what you already suspect; it doesn't tell you whether the software is creating real value for its users. So you still need human testing.

Furthermore, even if you don't test everything, in my experience you still end up with a large amount of test code, as much as or more than the actual program. There's a hidden cost in this: many changes to the program will also require adapting the tests. In other words, code maintenance takes longer. People don't think about this, but imagine you showed me 100 lines of your code, and I told you I could reduce it to 40 lines with no ill effects on readability or performance. Would you take that? What if those 60 lines I take away are your tests?

The trade-off
With the understanding that automated tests have good and bad sides, if you value your time, you should evaluate the situation on a scale from should-automate to should-not-waste-time-automating.

Circumstances pro automation
Hard-to-set-up tests. If you have to go through ten steps to be able to test a small change, testing quickly becomes a tedious chore. Usual issues in web applications are getting test data prepared, being able to intercept email sent by the system, and testing error scenarios.
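To make the email example concrete, here is a minimal sketch of intercepting mail in a test. Everything in it is hypothetical: FakeMailer, sign_up and the message fields are made-up names for illustration, not part of any real framework. The idea is only that the code under test is handed an object that records messages instead of sending them.

```python
from dataclasses import dataclass, field


@dataclass
class FakeMailer:
    """Hypothetical stand-in for a real mail sender; it only records messages."""
    sent: list = field(default_factory=list)

    def send(self, to, subject, body):
        # Record the message instead of delivering it.
        self.sent.append({"to": to, "subject": subject, "body": body})


def sign_up(email, mailer):
    """Hypothetical sign-up step; a real one would also store the user."""
    mailer.send(to=email, subject="Welcome", body="Thanks for signing up.")
    return {"email": email}


# The test inspects what would have been sent, with no mail server involved.
mailer = FakeMailer()
user = sign_up("alice@example.com", mailer)
assert user == {"email": "alice@example.com"}
assert len(mailer.sent) == 1
assert mailer.sent[0]["to"] == "alice@example.com"
```

Once a helper like this exists, the ten manual steps shrink to one function call, which is where the automation starts paying for itself.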

Important corner cases. Testing manually that a new button does the right thing is not tedious. But it is tedious if you have to try 20 different combinations of inputs to see they all work.
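As a rough illustration of the many-combinations case, the sketch below keeps the inputs and expected results in one table and checks them all in a loop. The function parse_quantity and its rule (a whole number from 1 to 99) are assumptions invented for this example.

```python
def parse_quantity(text):
    """Hypothetical form-field parser: a whole number from 1 to 99, else None."""
    text = text.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    return value if 1 <= value <= 99 else None


# Each pair is (raw input, expected result); adding a corner case is one line.
CASES = [
    ("1", 1),
    (" 7 ", 7),
    ("99", 99),
    ("0", None),
    ("100", None),
    ("-3", None),
    ("", None),
    ("2.5", None),
    ("abc", None),
]


def run_cases():
    """Return the cases whose actual result differs from the expected one."""
    failures = []
    for text, expected in CASES:
        actual = parse_quantity(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


assert run_cases() == []
```

Typing those nine inputs into a form by hand after every change is exactly the kind of chore that stops getting done.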

Team of disparate people touching the code. Most code has some implicit assumptions that cause bugs if violated. When multiple people are modifying it, there's a greater chance of bugs, and a smaller chance that each person remembers all the corner cases that need to work. Note that you don't necessarily need more than one person on the development team to fall into this category. Add a gap of a year or two in the development, and most people will have happily forgotten the peculiarities of the project in the meantime.

Complex code. If the code is complex, it's likely to need bug fixes that don't change the expected output, which is good because then we don't have to change the tests. It's also likely to be harder to see what the expected output is supposed to be, which can be conveniently solved by writing it down in a test.

The code is seeing changing input data. Modifications to complex code are one source of bugs, but another is changes in the input data. You test it today, and it works fine, but tomorrow your data source decides life is too quiet and changes something that violates your assumptions about the input. Again, this requires bug fixes that usually don't change the expected output.
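A small sketch of that situation, under invented assumptions: a feed that used to deliver dates as 2012-03-19 starts sending 19/03/2012. The names parse_feed_date and FORMATS are hypothetical. The fix adds a format, the old assertion stays untouched, and the new input gets an assertion with the same expected output.

```python
from datetime import date, datetime

# The second format is the bug fix, added after the data source changed.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_feed_date(text):
    """Hypothetical parser for a date field in an external feed."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date: {text!r}")


# Written before the feed changed; still valid after the fix.
assert parse_feed_date("2012-03-19") == date(2012, 3, 19)
# Added together with the fix; same expected output, new input shape.
assert parse_feed_date("19/03/2012") == date(2012, 3, 19)
```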

The code is an API with no UI. If there's no UI, you need to write code to test it, and then you might as well automate it anyway. With Python or any other language with a REPL, this is not entirely true, though. I often test small snippets I'm not yet confident are bug-free in the interpreter.

Circumstances tipping the scales toward manual tests
Simple, easy-to-follow code. If it's simple, there are fewer opportunities for bugs and thus a greater chance automation will be a waste of time. For instance, code using templates to spit out HTML is in most cases trivial, a div here, a heading there. You could add a test to find out whether the div and the heading are there, but it's trivial to see that by mere inspection, and if you've checked the way they look in the browser, there's little chance they're magically going to disappear later.

Localized changes. If changes only affect things in their near vicinity and have no impact on the rest of the software, it's easy to test them manually. For example, consider a web application made up of several independent pages. When you're working on one page, you can just reload it to check it's fine and ignore the rest of the program.

Manual tests are simple to set up. If all it takes to run the test is a reload in the browser, the relative cost of automation is high.

Hard-to-quantify issues are important. If look and feel are important, there's currently no substitute for a human. Imagine setting up a test for an animation - how do you test that it looks good?

The functionality sees lots of changes and experiments. If the functionality of the software keeps changing, maintaining a set of automated tests is a burden and is more likely to turn you into the grumpy nay-saying change-resistant bastard it's important not to be.

The software sees no changes. If nothing changes, there's little opportunity for bugs and little opportunity for rerunning a set of automated tests. This situation sounds strange, but actually in my experience there's a certain polarity in software maintenance; some things change all the time, others are truly write-once. Of course, this observation doesn't help a lot when you're starting on a project and don't know where it's going to end.

Only one person. Again, this is an important special case. If there's only one person working on a given piece of software (or perhaps module), that person is likely to know exactly what needs to be tested, diminishing the value of a broad test suite.

Minor errors are not critical. This sounds sloppy, but in reality, with many web sites, it creates more value for the users and customers if you focus on having the major paths hit the target rather than on being 100% correct, because errors outside the major paths can be fixed quickly when discovered.

Of course, most of the above points aren't independent. For example, say you're writing a little script to convert your music collection into a standard format. In that case, you're the only developer, you're going to experiment and change the script until you're happy, an error is not critical since you're around to supervise the process, and you don't expect the script to run more than once when it's finished. Would you write automated tests for that script?

Unit testing versus acceptance testing
I have an extra remark regarding the level at which the tests are performed. There's been a lot of talk about unit testing; in fact, most automated test frameworks are called unit testing frameworks. But the reality is that unit tests represent little, if any, end-user value in themselves, because they just provide evidence that some component that nobody will ever run by itself works when it is run by itself in the lab.

A more appropriate level for end users is acceptance testing, where you are testing the requirements of the system itself. For instance, "the site should enable authorized users to add new products, which should then show on this page" or "when a customer pays for some goods, an order should show up in the system". These are the kind of things that, when failing, will be sure to cause trouble for the users and thus for you.
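To show what tests at that level might look like, here is a sketch of the two requirements above written as checks. The Shop class is a hypothetical in-memory stand-in invented for this example; a real acceptance test would drive the actual site, for instance over HTTP or through a browser, but the shape of the assertions would be the same.

```python
class Shop:
    """Hypothetical in-memory stand-in for the whole site."""

    def __init__(self, admins):
        self.admins = set(admins)
        self.products = []
        self.orders = []

    def add_product(self, user, name):
        if user not in self.admins:
            raise PermissionError(f"{user} may not add products")
        self.products.append(name)

    def product_page(self):
        # A real site would render HTML; one product per line is enough here.
        return "\n".join(self.products)

    def pay(self, customer, product):
        if product not in self.products:
            raise LookupError(product)
        self.orders.append((customer, product))


def test_authorized_user_can_add_product():
    shop = Shop(admins=["alice"])
    shop.add_product("alice", "Teapot")
    assert "Teapot" in shop.product_page()


def test_unauthorized_user_cannot_add_product():
    shop = Shop(admins=["alice"])
    try:
        shop.add_product("mallory", "Teapot")
    except PermissionError:
        pass
    else:
        raise AssertionError("expected PermissionError")
    assert shop.product_page() == ""


def test_payment_creates_order():
    shop = Shop(admins=["alice"])
    shop.add_product("alice", "Teapot")
    shop.pay("bob", "Teapot")
    assert shop.orders == [("bob", "Teapot")]


test_authorized_user_can_add_product()
test_unauthorized_user_cannot_add_product()
test_payment_creates_order()
```

Note that each test is phrased in terms a user or customer would recognize, not in terms of any internal component.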

Of course, unit tests can add value indirectly by highlighting bugs early in the process where they are easier and thus cheaper to debug.

So where are we?
In my experience, when it comes to web development, there are some instances where unit tests make sense for a complicated component, and also some examples of vital processes where acceptance tests make sense because testing those over and over gets annoying. One example could be a sign-up feature. Nobody tests that on their own initiative, because you only ever sign up once.

But otherwise a lot of web development fits the above criteria against automation quite well. Hence, contrary to what the hype says, I do believe automation in many cases would be a waste of effort.

Note that this is not an argument against testing. It's an argument against automating the testing.

It's also not a universal observation. For instance, if you're writing library code all day long, your world likely looks different. Or if you're Google or another big web site, then the operating conditions are different - as part of a rollout it's important to have easy reassurance that everything is behaving as intended, because even small mistakes can be painful if you have a million users. But most web sites don't need to scale.

Also, don't forget that the purpose of testing, whether manual or automated, is to find and fix bugs. Thus preventing the bugs from happening in the first place is perhaps the best place to focus your attention. Usually the key is easy-to-understand code with few implicit assumptions and attention to detail, making sure corner cases are handled well.