A few weeks ago, I got into a semi-heated debate with an ex-coworker about re-running automated tests. Specifically, some of his tests failed and he just blindly re-ran them. I find this is common practice in the testing community: something fails, so we just re-execute our tests, and if those failed tests then pass, we move on, ignoring the failures.
On some of the web forums and groups I’m a part of, I see people ask about the easiest way to re-run just the failed tests, and some frameworks have the capability readily built in. TestNG, for example, produces a testng-failed.xml file upon test completion that makes it simple to re-run only the failures. All of this, I feel, does a vast disservice to the software and testing community, makes us worse testers, and devalues our credibility as defect locators.
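For illustration, this is roughly what that rerun convenience looks like in TestNG (a minimal sketch that assumes the default test-output directory and a plain programmatic TestNG runner):

```java
// Minimal sketch: after a run with failures, TestNG writes testng-failed.xml
// into its output directory (test-output by default). Pointing a new run at
// that file re-executes only the tests that failed.
import org.testng.TestNG;
import java.util.Collections;

public class RerunFailedTests {
    public static void main(String[] args) {
        TestNG testng = new TestNG();
        testng.setTestSuites(Collections.singletonList("test-output/testng-failed.xml"));
        testng.run();
    }
}
```

It’s exactly this convenience that makes blind re-running so tempting.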
The Problem
In my opinion, this is one of the worst habits we have as testers, and even worse, we’ve trained others outside of our field to do the same thing. Something fails? No worries, just re-run it until you get a success. This is especially true when testing at the UI level, where we’re used to our tests just being ‘flakey’. The problem is that we’re sending the message that our software (yes, automated tests are software) sucks, and that if it doesn’t work you should maybe just try again. What would you do if a developer said, “Oh, don’t worry about that error, just try again”? I, for one, would open a ticket and hound the developer to fix the issue. So why aren’t we as testers doing the same thing?
The problem is, we’re ignoring a very important part of testing: triage. When you encounter a failed test: if it’s your fault, fix the test; if it’s the developer’s fault, fix the app; if you don’t know, figure it out, or at least open a ticket to track it and remove the test from your suite in the meantime. When we don’t do this, we end up with tests that we consider ‘flakey’ – they ‘work’, but sometimes we need to run them two or three times before they pass. This is unfortunately common in many organizations, and it causes problems.
What we’re doing is training developers (and others who run and see our tests) to believe that a failure within our test suite isn’t actually a failure – it’s an expected error. (Side note: I can’t even keep count of the number of developers who, when I point out errors (sometimes even blatant failure messages or log statements), tell me that’s what’s supposed to happen.) If a test fails, is it really not a problem? Is that really expected behavior? Because what you end up with is a series of tests that no one will trust when there actually is an application failure that a failing test caught. At that point, we’re just running our tests blindly, running them until they pass. And if that’s the case, why bother running those tests at all? If we’re going to blindly pass the application, why not do it without the tests and save ourselves some time?
What We’re Hiding
Over the years, I have uncovered hundreds of hidden timing, threading, and drawing issues with automated tests that could not be found just by clicking through a browser. Just because someone can’t replicate an issue by hand doesn’t mean it’s a bad test; maybe it means it’s a really good test, because there is an underlying problem that is just very hard to exercise. At a previous client, the first time we implemented our automated tests, we had intermittent failures registering a user. Instead of just re-running the tests, we bugged the developers.
They showed us time and again that they could register a user, but we continued to insist there was nothing wrong with our tests. Eventually, a database locking issue was discovered that prevented two users from being registered within milliseconds of each other. The likelihood of this issue being discovered outside of our automated tests was minuscule, but the implications and potential impact of the bug were huge. The company was expecting over a million users to register on launch day, which, even spread across the day, created a high likelihood of hitting this issue in production.
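To make the scenario concrete, here’s a hypothetical sketch of the kind of test that exercises this sort of overlap; the register() call here is a stand-in for the client’s actual registration endpoint, not their real code:

```java
// Hypothetical sketch: fire two registrations within milliseconds of each
// other, the kind of overlap a person clicking through a browser almost
// never produces. register() is a stand-in for the real registration call.
import java.util.concurrent.*;

public class ConcurrentRegistrationSketch {

    // Stand-in for the application's registration call (e.g. an HTTP request).
    static boolean register(String username) throws InterruptedException {
        Thread.sleep(5); // placeholder for the real work
        return true;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);

        Callable<Boolean> task = () -> {
            start.await();            // release both registrations together
            return register("user-" + Thread.currentThread().getId());
        };

        Future<Boolean> first = pool.submit(task);
        Future<Boolean> second = pool.submit(task);
        start.countDown();            // fire both at (nearly) the same instant

        // Under a locking bug like the one described above, one of these
        // would intermittently fail; that intermittent failure is the signal.
        System.out.println("first: " + first.get() + ", second: " + second.get());
        pool.shutdown();
    }
}
```

A person clicking through the registration form simply can’t produce two submissions milliseconds apart, which is why the failure only ever showed up in the automated runs.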
At this particular client, we uncovered multiple low-level architectural problems purely from our automated tests, and every one of them first showed up as an intermittently failing test: issues that nothing outside of our test suite could replicate. Had we not been vigilant and pushed back on re-running tests, these bugs could have had major quality impacts. Mind you, we did sometimes write bad tests too, tests that failed intermittently simply because we wrote them poorly.
The Solution
At this particular client, we spent a decent amount of time on triage efforts to determine the true cause of each test failure. Some things were straightforward, like when we tried to interact with an element that wasn’t there yet – we needed to make our tests more robust by adding explicit waits. Other failures were less straightforward, like stale element exceptions.
Often this was us not properly accessing an element, but twice we found drawing issues a developer unfamiliar with React had created, where elements would blink out for a millisecond and then redraw themselves. To ensure we weren’t missing things, we created a rule that failed tests could not be re-run unless something was changed, either in the app or in the test itself. This held whether we were running locally, debugging, or running in CI. Otherwise, we feared missing an actual issue. This wasn’t the first time, nor the last, that I’ve run into this situation, but it is definitely one where I believe we had a lot of success finding issues and forcing robust test creation.
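As a rough example of the “make the test more robust” fix mentioned above, here’s a sketch assuming Selenium WebDriver in Java; the locator and timeout are made-up values:

```java
// Sketch of the "add a wait instead of re-running" fix, assuming Selenium
// WebDriver in Java. The locator and timeout are illustrative only.
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import java.time.Duration;

public class RobustLookup {
    // Wait for the element to actually be present and clickable instead of
    // failing (and re-running) when the page hasn't finished rendering yet.
    static WebElement waitForRegisterButton(WebDriver driver) {
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
        return wait.until(ExpectedConditions.elementToBeClickable(By.id("register")));
    }
}
```

A similar approach, re-locating the element just before using it, is one way to deal with the stale element cases that really are the test’s own fault.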
What this all means is that I would rather delete a test and have no automated coverage in that area than keep a test that sometimes passes and sometimes fails and that I just rerun until it passes. At least without any automated test, I know I have to go back and test that part of the system myself, because there’s a gap there. A failing ‘good’ test that gets rerun means you’ve trained your entire organization to ignore that test when it fails. So if you ever do find an issue in the application with that test, everyone will assume it’s not actually a bug. That’s another reason to just delete the test: it’s worthless.
I can’t stress this enough: never rerun a failed test without changing anything. You may as well just declare that the application works and save yourself the time of running a test again and again and again until it passes. Because that’s all you’re doing, getting to a pass however you can. And there is no point in that; it cheapens testing and provides no value. Next time a test fails, take some time and figure out why. Remember, a failed test isn’t a bad thing, it’s a learning experience; don’t waste it.