Taking Automated Tests Off The Pedestal

I think that by now we've all learned that automated tests are good. I look at this as one of the big successes of the past ten years, the fact that software teams these days understand the value of tests and often feel downright embarrassed if they don't have them. Yes, there are some holdouts, but there really does seem to be a consensus about this, so it's probably a good time to look at the status quo and examine it a little bit.

Many projects have a very large number of automated tests. And, that's good. It's better than not having them. On the other hand, many teams feel like these tests are a yoke around their necks. Their build time keeps increasing. They spend more and more time dealing with test management, and at the end of the day, they know that things are getting worse. Yes, the tests enable growth in the code base, but it's that kind of reluctant growth that costs a bit more each time it happens.

The typical response when people notice this problem is to optimize. We can make builds and tests faster in a number of ways, from parallelization, build pipelining, and hardware upgrades to the more mundane yet more scalable strategy of rewriting tests so that they are more focused and therefore much, much faster.

Again, this is all optimization. It never questions the core assumption, which is that more automated tests are better.

A while back, I used to spend a lot of time helping organizations move to agile development. They all started from different places, but one very common situation was that the developers worked for a period of time, finalized their work, and then handed it off to a QA team. The QA team typically did manual testing. There were often more manual tests than the team could run in the amount of time that they had, and the whole organization sat on pins and needles when QA finally blessed the build. I used to call it the "trailer-hitched QA" anti-pattern.

The remedy is easy, but nerve-wracking. You ramp up automated testing and pull QA people into the iteration, having them perform whatever testing they need to do as each story completes. Most organizations didn't want to do this in one fell swoop, so you just slowly reduce the amount of time that the QA team has the build after each iteration. The nerve-wracking bit comes from the fact that you know you are going to end up with more bugs in production in the short term, but eventually things get better.

Why do they get better? Well, it's easy to point to the increased emphasis on automated testing. Surely that makes things better, but the reality is much more subtle. Most of the quality benefit that teams get comes from the development team being more careful. And, when you don't have people running around after you spending weeks checking your work, you do end up being more careful. If you aren't, bug reports from production are going to start to spike.

I'm sure that at this point, you might think that I'm making an argument against automated testing. I'm not. I'm merely pointing out that at the end of the day, quality is a development responsibility and it has a lot to do with diligence. To me, the primary benefit of TDD is that we spend time thinking through cases we might not otherwise when we write our code. Quality can't help but go up. It's nice that the tests we write give us a nice regression suite, but is it possible to get too attached to it? I think so.

Automated tests suffer from a political problem. When you have a lot of them, the cost of running them may be high, but the cost of not running them is perceived to be higher. After all, the test that you don't run could be the one that catches some disastrous error. Who wants to take that chance? Frankly, I think that way of thinking is a bit of a trap. It's a trap in the same way that believing you had to run all of your manual tests was a trap.

At the end of the day there are all sorts of things which increase quality. It takes a bit of courage to weed through automated tests and say "Yes, I'm sure this one is valid, it would catch a problem, but it's too slow or the benefit just isn't high enough relative to its cost for us to run it anymore." Few teams really do this because they are trapped by the political calculus: we could be faulted for not running the test.

Ultimately, tests are a feedback mechanism, and we should make active decisions about what feedback we need and when. My current thought is that it is perfectly reasonable for teams to set a build budget, a maximum amount of time that a build should take in order to provide feedback. If you can make slow tests faster to fit in that budget, fine. If you can't, toss them or rotate them through periodically. Make the value judgement about what tests are most important. We can do a lot of things to avoid that responsibility, but we shouldn't.
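
To make the build budget concrete, here is one rough sketch of how it could be enforced, assuming JUnit 5 and its Platform Launcher; the ten-minute budget, the BuildBudgetCheck class, and the "com.example" package are invented for illustration, not a prescription.

```java
// Hypothetical sketch: run the suite through the JUnit Platform Launcher and
// fail the build when the whole run exceeds the agreed feedback window.
import static org.junit.platform.engine.discovery.DiscoverySelectors.selectPackage;
import static org.junit.platform.launcher.core.LauncherDiscoveryRequestBuilder.request;

import java.time.Duration;

import org.junit.platform.launcher.Launcher;
import org.junit.platform.launcher.LauncherDiscoveryRequest;
import org.junit.platform.launcher.core.LauncherFactory;
import org.junit.platform.launcher.listeners.SummaryGeneratingListener;
import org.junit.platform.launcher.listeners.TestExecutionSummary;

public class BuildBudgetCheck {

    // The team's agreed feedback window; ten minutes is an arbitrary example.
    private static final Duration BUDGET = Duration.ofMinutes(10);

    public static void main(String[] args) {
        LauncherDiscoveryRequest discovery = request()
                .selectors(selectPackage("com.example"))   // assumed root test package
                .build();

        SummaryGeneratingListener listener = new SummaryGeneratingListener();
        Launcher launcher = LauncherFactory.create();
        launcher.execute(discovery, listener);

        TestExecutionSummary summary = listener.getSummary();
        long elapsedMillis = summary.getTimeFinished() - summary.getTimeStarted();

        if (elapsedMillis > BUDGET.toMillis()) {
            System.err.printf("Suite took %d ms against a %d ms budget: prune, speed up, or rotate.%n",
                    elapsedMillis, BUDGET.toMillis());
            System.exit(1);   // fail the build so the budget stays a visible, enforced decision
        }
    }
}
```

In a real build the same budget could just as easily be a CI job timeout; the point is that the number is chosen deliberately and somebody has to act when it is exceeded.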
 
Been there, done that.

I've long been preaching the concept of continual evaluation of what tests you choose to write, maintain, and keep. If they're not continuing to provide great value, then delete them, or at least get them off your regular cycle of tests.

That's a tough thing to do when the organization/product team is relying on those automated tests to give them a false sense of security for their failure to address fundamental quality issues in how they're building the actual system.
 
We've had some trial and error over the years finding the balance of how many automated regression test cases is enough, but truly it hasn't been that hard. As you say, we've optimized; our 6000+ JUnits run in under 8 minutes. All our API level and GUI level tests run within 45 min. of check-in; we've had to disperse suites among build slaves to do that, but it's fine. (We'd like them faster and we'll get to that some day).

But I think the most important benefit we've achieved through automating tests is the required communication and collaboration among all team members to get it done. We practice spec by example (ATDD), so testers, customers, and programmers have to talk to get a shared understanding of each user story via specifying the tests. The regression tests that come out of it are just a bonus. In fact, we don't keep all the tests we create during development as regression tests. As you say, that would be way too many tests.

Even the GUI tests improve our website because we need to have valid HTML and JS, and smart UI design to help ease automation. We do minimal GUI automation, but for the past 8 years it has been enough. I can think of two occasions where we purposely chose to not automate a test for a particular edge case, and the edge case happened in prod (and was bad when it happened, but not disastrous, so I think that was a fair trade-off).

I think more important than "let's automate all our tests" is "let's get the whole team working together on useful test automation". I don't want anyone to use this as an excuse to put off necessary automation, but if every team member takes responsibility for improving quality, the right things will happen automation-wise.

(And if anyone says "Yes, Lisa, that's fine for you but your team is special", I will scream. We didn't start out "special". This has been hard work and learning and time.)
 
So this is probably the third time recently someone has suggested that unit testing can be dangerous when overemphasized. The first was +Scott Bellware at Agile Vancouver (I'm assuming you saw that talk; I'm wondering if this is partly inspired by it?). Then there was this post (http://www.makinggoodsoftware.com/2012/01/27/the-evil-unit-test/) the other day.

I think this is really good for people to start thinking about and discussing, but I also hope this discussion doesn't discourage developers from learning and applying TDD and BDD. I think that developers still need to practice TDD and by extension learn how to write/design testable code/systems. It isn't until after someone has struggled through practicing and learning TDD that they can be fully equipped to make these kinds of judgement calls.

After that, it probably makes sense to go back and delete, or at least rewrite, the earlier, probably poorly written tests that came from that learning.
 
Chris, yeah, it's something I've been thinking about for a long while. My position is that we should write tests but we should also be aware of our tendency to venerate them or feel obligated to run all of them all of the time. In the end, they are sensing tools that are useful to the degree that they give us timely feedback.

About 10 yrs ago, I was at a summit of testing gurus who were learning about agile from me and a couple of others. I was enthusiastic about automation and I remember one of the gurus saying "So, what are you going to do when you visit a 15 yr old project with hundreds of thousands of automated tests that take days to run?" I've never forgotten that.

Missed Scott's talk at AV, but we are like-minded :-)
 
I just had an interesting talk about this with +Joseph Wilk yesterday. Since there are different scales of feedback, you should analyze which tests give you value, based on both how quickly they give feedback and how much maintenance they require.
In order to do this, you need to do the analysis, like Joe does. And I don't think that this is so prevalent.
All this aside, I don't think that automated testing is that common either. True, people are more open to trying it, but if they feel the yoke is heavy, they'll drop it very quickly. The ones with enough discipline and energy are the ones that carry on to the next challenges.
 
Michael, on your latest comment about "15 yr old project with hundreds of thousands of automated tests that take days to run", what I am trying to tell you is, we are an 8+ yr old project with tens of thousands of tests and they run really fast. Maybe that means we have applied what you recommend. I think we have balanced test coverage and short feedback.
 
Lisa,

Yeah, I think what I'm getting toward is something more controversial. The idea is that it can be reasonable to value bounded feedback time for test execution over coverage.

When I've floated this idea before, people have pointed out that if you are diligent, you can continually optimize your build to keep tests fast, and it's true. But frankly, if I had two days' worth of tests, I'd choose not to run many of them just to get that quick feedback loop running, and I'd want to put an upper bound on execution of the full suite to motivate higher-value tests, faster tests, and back-pressure for higher quality.
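
For what it's worth, the "rotate them through periodically" idea doesn't have to be exotic. Here is a minimal, hypothetical sketch using a JUnit 5 ExecutionCondition; the SLICES value and the "build.number" system property are assumptions about how a CI server might be set up.

```java
// Hypothetical sketch: split the slow suite into slices and run one slice per build,
// so every slow test still runs periodically without every build paying for all of them.
import org.junit.jupiter.api.extension.ConditionEvaluationResult;
import org.junit.jupiter.api.extension.ExecutionCondition;
import org.junit.jupiter.api.extension.ExtensionContext;

public class RotatingSliceCondition implements ExecutionCondition {

    private static final int SLICES = 5;   // arbitrary: each slow test runs every fifth build

    @Override
    public ConditionEvaluationResult evaluateExecutionCondition(ExtensionContext context) {
        // "build.number" is an assumed system property supplied by the CI server.
        int buildNumber = Integer.getInteger("build.number", 0);
        int slice = Math.floorMod(context.getUniqueId().hashCode(), SLICES);

        return slice == Math.floorMod(buildNumber, SLICES)
                ? ConditionEvaluationResult.enabled("in this build's slice of the rotation")
                : ConditionEvaluationResult.disabled("deferred to a later build in the rotation");
    }
}
```

Annotating the slow test classes with @ExtendWith(RotatingSliceCondition.class) would be enough to put them on rotation, while the fast suite keeps running on every commit.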
 
Isn't much of the issue non-unit tests? If a test does not hit databases, services, the file system, or other dependencies, it (usually) will run very quickly. With tens of thousands of tests, that can add up, but careful library/package design can reduce the need to build/test everything on every change. Then the issue is reduced to a question of how often to run integration, acceptance, UI, etc. tests.
 
"So, what are you going to do when you visit a 15 yr old project with hundreds of thousands of automated tests that take days to run?" -- A project with this description sounds like its suffered lots of churn and/or become severely bloated. In this case I'd argue that the project has reached the limits of maintainability. Slow feedback probably isn't their number 1 problem. The need for fast, automated tests should/could be used as leverage to evolve the project into separate, maintainable parts.
 
I've worked at places where, over time, little compromises in commitment to developer testing and the quick feedback loop have added up to this type of huge testing and design debt.

The first time you hit the test run time limit, you can stop and think "how can I design my production and test code better so that I can get the test results faster?" Yes, deleting automated tests is a possibility.

Or you can think "hmm. redesign is hard. I guess I'll leave things the way they are, and accept the longer test run time." And you lose the quick feedback loop. Or you think "I'll move some of the longer running tests into a run-nightly suite", and you lose the feedback loop.

Having a limit on test run time hopefully encourages more and better thinking about the design of the whole system, not just the tests.

As an aside, I have this idea about three ways to improve the value of feedback: 1 - start assembling the feedback sooner, 2 - decrease the time it takes to get the feedback (once started), and 3 - provide additional valuable feedback. I think this translates here to 1 - run the tests often, not just once per night/week; 2 - make the tests run faster; and 3 - write more and better tests.
 
Michael, I've used that full set / partial set approach to get short feedback loops while keeping better coverage overnight or on the weekend. It works very well, but of course one must still be disciplined and continually prune both sets of suites.

I like your idea of being explicit with a boundary for execution time.
 
Doesn't, y'know, paying attention take care of this problem? It's the continual oscillation between overconfidence and hypervigilance, which converges over time to a person's "testing risk profile". I theorise, and therefore teach, that step 1 involves "write more tests!" because the standard context involves people who just don't write (automated) tests. I might ask, though, at my next workshop, "How many people here think that they already have too many automated tests?"
 
Hundreds of thousands of tests that take days to run. It's legacy code. And....?
 
But Michael, we aren't running 2 days of tests. We're running 8 minutes max of JUnit, and 45 min. max of all other tests. I don't see anything bad with what we do. Good regression coverage, quick feedback. Of course we always look for ways to improve.
 
Lisa, thanks. I wasn't talking about your specific case, but rather what it can be like in situations where people haven't done all of the work that you've done.
 
Hi Michael,

Nice post :)

As the project grows, so does the feature set, and with it the number of test cases. In that scenario, automation acts as a good, active catalyst in the testing cycle: it cuts down the manual QA effort, not completely, but enough to give testers a healthy window to think beyond running the same repeated test cycle again and again.

I am not advocating full or partial automation, but as testers/QA we should look for the best optimization of our testing efforts :)

Thanks,
Kapil (http://testing-mines.blogspot.com/)
 
Hello Michael,
I totally agree with you. Automated tests certainly help in the quest for quality software, but in my opinion they are overrated. In my years of experience as a coder, I have noted time and time again that nasty bugs are discovered as soon as someone other than the original programmer starts using a software module that was thought to be stable, simply because as individuals we unconsciously tend to repeat certain usage patterns over and over again.
What could be a more static usage pattern than a predefined, scripted test?
The problem I see is that many coders advocating TDD begin to develop a false sense of security when all tests go green, as if they think, "this is bug free", when most probably it is not. I'm not against it, but in my eyes it is definitely no silver bullet.
 
One doesn't need to couple production code to tests. If large parts of the system have to change, start by finding the layer where change starts, then delete the tests that no longer match what you need. Now you have all the freedom you need to change whatever you need.

I think people get into trouble trying to maintain mutually contradictory or incompatible tests. I recognise this situation in my own practice much better than I used to. It takes confidence to delete tests when things change.
 
One way to put it is that, over time, automated test suites can become overgrown. What would be useful would be:

1) A guide to pruning your automated test suite: how do you identify tests, types of tests, or entire classes of tests that are not worth the time it takes to run them? I think a lot of people would be more confident pruning their tests if they had a handbook telling them what's a sucker and what's a trunk.

2) Advice on how to garden better incrementally: how do you know when your automated test suite's growing out of control? Are there particular smells to be vigilant for (e.g. the one Dave brought up about only running the entire suite at night)? When the tests bolt and suddenly reach the x minute mark, what's the real underlying problem likely to be?

I'm a little leery of arguments that say "thus-and-so will force developers to be more careful." Throwing me out a window would, by that token, force me to learn to fly; as an educational technique, I don't give it much chance of success. If you want developers to be more careful, be direct about it, and offer us specific practices (or at least principles) that will help.
 
Nice idea, +George Paci -- when do you start writing your book "Bonsai Tests: prune like a master"?

I have one pattern springing to mind: quarantine sick tests. You have 900 rotting tests. 73 fail regularly. Another 52 fail intermittently. Take all the failing tests and move them to quarantine. As you work on the parts of the system that those tests touch, review the relevant ones, and at that time decide once and for all either to fix 'em or chuck 'em. Whatever tests you leave behind, they must pass. As quarantined tests are rehabilitated, move them back into the active test suite. I've done this a few times working with groups that have had uneven TDD/TFP discipline for a period of months to years.
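
For anyone who wants to try this, a minimal sketch of the mechanics, assuming JUnit 5 tags; the class and test names below are invented for illustration.

```java
// Hypothetical example of quarantining a flaky test: tag it, exclude the tag from the
// commit build, and revisit it the next time you work on the code it touches.
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class CurrencyConversionTest {

    @Test
    void convertsAtTodaysRate() {
        // healthy test: stays in the active suite and must pass
        assertTrue(true); // real assertions elided
    }

    @Test
    @Tag("quarantine")   // fails intermittently: pulled out of the main run for now
    void convertsAcrossDaylightSavingSwitch() {
        assertTrue(true); // fix it or chuck it the next time this area changes
    }
}
```

The commit build would then exclude the tag (for example with Gradle's excludeTags or Maven Surefire's excludedGroups), and rehabilitated tests simply lose the tag to rejoin the active suite.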
 
Automated testing is hard, but in the same manner that software development in general is hard. If you approach either mindlessly & with a lack of dedication or commitment then you will inevitably get into sticky situations. Even if you are diligent you'll still get into a mess sometimes just because it's all really bloody hard.

We don't suffer from test suites that take 3 days to run, but at the same time are frustrated when we have acceptance tests which take 20 minutes! We also have only 2 testers in a team of 30 developers and a manageable rate of bugs on a 7 year old ecosystem.

Recent analysis shows we're more productive than ever, and I attribute a lot of this to the efforts we've made to improve our automated testing, which means we can get changes into production faster.

If you're getting into a situation where you're spending proportionately more time maintaining your tests, it means you probably weren't spending enough time on them in the first place.

I'd encourage people not to throw the baby out with the bathwater if they're having problems with their automated tests. They need to be treated with the same amount of care & attention as production code.
 
We had a packed park-bench discussion about this post in the pub for eXtreme Tuesday Club last night. Here are some notes I can remember:

Steve Freeman (@sf105) said he thinks that nearly always when he hears people blaming tests, the fault is elsewhere. For example, slow tests probably mean a team are over-relying on end-to-end tests, and don't have enough coverage from faster unit tests. This in turn probably indicates that they have badly-designed code.

Rob Bowley (@robbowley) fretted that any talk about not keeping tests is irresponsible, because it will cause noobs who haven't even tried TDD yet to think that it isn't worth it.

Liz Keogh (@lunivore) tried to persuade us that we should delete tests. For example, for behaviour that is well understood, such as a login screen, having tests for such obvious functionality is meaningless and clutters up the test suite.

Bob Marshall (@flowchainsensei) pointed out that, ideally, all testing is waste. If only software developers spent more time learning how to do it well, he said, rather than catching the mistakes they assumed they would make, maybe we wouldn't need to worry so much about testing at all.

Some other people, who work in contexts where production bugs are both tolerable and can be fixed quickly by continuous deployment mechanisms, said this context reduced the value of automated tests to the point where they only wanted a few of them.

Rachel Davies (@racheldavies) pointed out that you need to go through some consolidation work after driving out a feature, moving checks down from acceptance tests to unit tests, for example, so that they're more focussed and faster.

Nat Pryce (@natpryce) didn't understand what we were all worrying about. He said that well designed code should be closed to modification, so that if you need to change the behaviour of the code you'll be adding new code rather than modifying existing code, so why would you need to re-run the tests all the time?

@SleepyFox made, for me, the point of the night. His point is that value isn't static over time: a test has a value curve that spikes high early on when you're driving out the code, then decays over time. You might get a bump in the value curve if you need to do some heavy refactoring and need the test again, but generally it decays away towards zero as the system stabilises.

It seemed to me that the theme of the night was the attention we pay to tests once they've started to pass. We generally don't pay them any attention until they fail again for some reason, or the build starts to feel too slow. Paying more attention to our tests - prioritising them so we run the most useful ones first, moving checks down from acceptance tests to unit tests, or even deleting them altogether - is something we should be doing continually, rather than reactively when the tests scream for our attention.
 
Tests form a pool of change detectors, among their many functions, and for that reason alone, I'd keep them even after they've outlived their use for understanding a feature. If they execute quickly and keep passing, they mostly stay out of the way.

Tests also provide long term feedback on the design, detecting dependency problems even before they get out of control, which encourages me even more to keep them. Of course, here I mean micro tests.

I found that my practice as a programmer progressed rapidly once I started blaming the production code for the problems I had with tests.