A quite typical picture: at software development company X, the first delivery of project Y to the testing team after a few months of coding has just failed, because the software is buggy and crashes often. The Project Manager decides to invest some money in an automated testing tool that promises to solve all stability problems with a click-and-replay interface. The demos are very impressive. The tool is integrated and a bunch of tests are "clicked".
After a month, 10% of the tests are failing. 10% is not a big deal; we can live with that. After another month, 30% of the tests fail because an important screen design was changed and some tests no longer recognize it for some reason. Pressure for the next delivery increases, and the chance of delegating some testers to fix the failing test cases gets smaller every week.
What are the final results of introducing such a tool?
- the unmaintained set of tests (and the tool itself) is abandoned
- man-days lost setting up the test cases
- $$ lost on the tool and training
Was the tool introduced too late? Was the wrong tool selected?
In my opinion, automation and integration tests don't play well together. Let's review the main enemies of automation in integration tests:
I track the current project state using an automated test suite that runs 24/7. It provides information about stability by randomly exploring the project's state space. Test runs are based on fresh builds from the auto-build system connected to the project's master branch.
Recently I hit a typical stability regression scenario a few times: the (N+1)th commit, which should not have had unexpected side effects, caused a crash in the auto-testing suite after just 2 minutes of random testing, blocking the rest of the test suite from being executed. OK, we eventually got feedback (from the auto-test), but let's compute the bug-detection delay here:
- auto-build phase: 30 minutes to 1 hour (we build a few branches in a cycle; sometimes a build can wait a long time to be performed, especially if the build cache has to be refilled for some reason)
- test queue wait phase: 30 minutes (pending tests must finish before the new version is loaded)
- review of testing results (currently manual): ~1 hour (we send automatic reports, but only twice a day)
In total, that is roughly 2 to 2.5 hours between the offending commit and a developer seeing the failure.
Everyone agrees that internal state checking using assert(), Q_ASSERT() and similar macros is a good thing. The programmer can declare expected input (asserting parameters), internal state (invariants) and expected return values (postconditions), and the runtime will verify those expectations. Some languages support assertions in all three variants directly (Eiffel, with its Design by Contract philosophy).
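As a minimal C++ sketch (the function and its checks are illustrative, not taken from any particular project), all three kinds of expectations can be expressed with plain assert():

```cpp
#include <cassert>
#include <cstddef>

// Average of an int buffer, with preconditions on the parameters
// and a postcondition on the return value, all via assert().
double average(const int* values, std::size_t count) {
    assert(values != nullptr && "precondition: buffer must exist");
    assert(count > 0 && "precondition: need at least one value");

    long long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }

    double result = static_cast<double>(sum) / static_cast<double>(count);
    // Postcondition: an average of ints must stay within the int range.
    assert(result >= -2147483648.0 && result <= 2147483647.0);
    return result;
}
```

If any condition is false, the standard runtime prints the failed expression with file and line, then aborts.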
Such assertions typically print filename/line number/message information and abort the program or raise an exception if the condition is not met. The runtime environment can then collect the current stack trace to give the developer more information on the failed expectation.
One can disable assertions entirely (C++ by defining NDEBUG, Java) or select a subset of assertions (Eiffel) for production releases. The resulting code will be faster, but failed expectations will no longer be detected by the software – problem reports may be harder to interpret.
On the other hand, if an assertion is too strict (the assumption may be invalid), it may abort the program, giving the user a negative impression of the software's stability.
What to do, then? How can we keep the problem-diagnosing power of enabled assertions and prevent minor or invalid assertion failures from aborting the whole program?
The answer is: weak assertions (assertions that report the failure without aborting).
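A minimal sketch of a weak assertion in C++ (the macro name and the failure counter are my own invention, not the project's actual implementation): on failure it logs the condition with file and line, bumps a counter that could feed a crash-report channel, and lets the program continue:

```cpp
#include <cstdio>

// Counts failed weak assertions; could be sent to a central report server.
static int g_weak_assert_failures = 0;

// Like assert(), but reports instead of aborting.
#define WEAK_ASSERT(cond)                                            \
    do {                                                             \
        if (!(cond)) {                                               \
            ++g_weak_assert_failures;                                \
            std::fprintf(stderr, "WEAK_ASSERT failed: %s (%s:%d)\n", \
                         #cond, __FILE__, __LINE__);                 \
        }                                                            \
    } while (0)

int divide_or_zero(int a, int b) {
    WEAK_ASSERT(b != 0);   // report the suspicious call...
    if (b == 0) return 0;  // ...but degrade gracefully instead of crashing
    return a / b;
}
```

The caller still has to handle the invalid case, but the report keeps the diagnostic value of the assertion without the abort.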
The current project I'm working on is based on embedded systems and the Qt platform. Of course, the very first task in the project was to implement some kind of testing method to get feedback on software quality. The test system is composed of a few components:
- Automatic crash reports collected on a central server
- Automatic random test runners connected to always-running (24/7) devices to catch crashes
The first channel collects all crashes (from both manual and automated tests); the second channel runs fully automatically. The second channel makes it possible to measure MTBF (mean time between failures) and analyse how it changes over time, which helps estimate the current state of software quality.
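To make the MTBF arithmetic concrete, here is a trivial illustrative helper (my own, not part of the described system): total observed runner uptime divided by the number of crashes seen in that window:

```cpp
#include <cstddef>

// MTBF estimate in hours: uptime of the 24/7 test runners divided by
// the number of crashes observed during that uptime.
double mtbf_hours(double total_uptime_hours, std::size_t failures) {
    if (failures == 0) {
        // No crash observed yet: the uptime itself is a lower bound.
        return total_uptime_hours;
    }
    return total_uptime_hours / static_cast<double>(failures);
}
```

For example, 4 crashes over 48 hours of device uptime give an MTBF of 12 hours; watching that number per build reveals stability regressions.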
The second testing channel requires an automatic test driver to inject random UI events (key presses from a remote control, in my case). I used the Qt message queue for that purpose:
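The original snippet is not reproduced here, so the following is only a sketch of what such a driver can look like. The random-key selection is plain C++; the comments show how a picked key would be posted through Qt's message queue (QCoreApplication::postEvent) in a real driver. The numeric table stands in for Qt key codes such as Qt::Key_Up and Qt::Key_Select:

```cpp
#include <random>

// Key codes the random driver may inject; stand-ins for
// Qt::Key_Left, Qt::Key_Up, Qt::Key_Right, Qt::Key_Down, Qt::Key_Select.
const int kRemoteKeys[] = {0x01000012, 0x01000013, 0x01000014,
                           0x01000015, 0x01010000};
const int kRemoteKeyCount = 5;

// Pick one remote-control key uniformly at random.
int pickRandomKey(std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(0, kRemoteKeyCount - 1);
    return kRemoteKeys[dist(rng)];
}

// In the Qt event loop the driver would then do, roughly:
//   int key = pickRandomKey(rng);
//   QKeyEvent* ev = new QKeyEvent(QEvent::KeyPress,
//                                 key, Qt::NoModifier);
//   QCoreApplication::postEvent(QApplication::focusWidget(), ev);
// postEvent() queues the event, so the application handles it exactly
// as if the user had pressed the key on the remote.
```

A timer or a dedicated thread calls this in a loop, driving the device through random parts of its state space until something crashes.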
Collecting runtime errors (crashes, failed assertions, …) is a very important part of software quality efforts. If you know the crash details from your testing team, you can handle them even before a tester writes the first line of an error report (!). That improves development speed.
Probably the fastest way to create a KISS (Keep It Simple, Stupid) central crash-report repository is to use:
- netcat – as a command-line UDP server
- crontab – for daily log rotation