Every manager or project manager wants to deliver features as fast as possible. Testing and fixing bugs take a lot of time in the software development life cycle, and people have tried to reduce this time in many ways. One of them was to switch from Waterfall to Agile. Instead of sequential steps with testing as the last one (often after months of development), teams tried to deliver smaller parts of features in shorter sprints. The other idea was to invest in test automation. Tests executed by computers are faster than those performed by humans. Computers repeat the same steps the same way every time and don’t suffer from fatigue after hours of work. We can use them to test on demand, any time we want. That’s why many teams started automating their manual test cases - using tools to click through the UI on their behalf.
After the initial excitement about watching the tests executed by a machine, the teams realized that this approach also had some drawbacks. The more tests were automated, the longer their execution took. Running them in parallel didn’t help, as it made some of the tests flaky - when run a few times in a row, their results differed without any apparent reason. Some of the tests were unstable even when run sequentially. Finally, the results were ignored as untrustworthy and the applications were tested manually again. It looked like test automation didn't work. This model, with most of the automated tests at the UI level, is called the ice cream cone and it’s a well-known anti-pattern.
What can be improved?
Fortunately, there are some ways to make test automation your friend rather than your enemy. One of them is the test pyramid. According to the ISTQB glossary, the testing pyramid is “A graphical model representing the relationship of the amount of testing per level, with more at the bottom than at the top.” Why does it matter? How does having more tests at the bottom help? First of all, we need to define the levels of the pyramid, starting from the bottom.
Unit tests check the smallest parts of the application code - in most cases, individual methods. They are isolated, which means they do not depend on other parts of the application or on external dependencies like databases. They are easy to write: you pass some parameters to the tested method and verify that the returned data is as expected. When they are written properly, their execution takes very little time - milliseconds - because they can be run on a local machine right after the code has been compiled. That’s why they are used heavily in test-driven development (TDD), which consists of 3 steps repeated in a loop:
- A developer writes a test first. It should fail, or not even compile.
- The tested method is implemented. The test should pass.
- The tested method is refactored. The test should still pass.
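As a minimal sketch of what such a test can look like, here is an example using Python's built-in `unittest` module. The `apply_discount` function is hypothetical and exists only for illustration. In a TDD loop, the test class would be written first and would fail until the function is implemented.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce a price by a percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    # Each test passes parameters to the method and checks the returned data.
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    # Corner case: invalid input should be rejected, not silently accepted.
    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 120)
```

Saved to a file, the tests can be run locally with `python -m unittest <file>`. Nothing has to be deployed and no database or network is involved, which is what keeps the feedback so fast.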
Unit tests are also useful if you want to measure the quality of your code. You can calculate and track the test coverage - how many lines of code were exercised by the tests. It's hard to reach 100% and it often does not make sense to try. However, as a rule, the more unit tests and the higher the coverage, the better. And that's why they are at the bottom of the pyramid.
As the name implies, integration tests check whether classes, methods, or modules work correctly when combined. They might also verify that an application module can communicate successfully with external parts of a system (like a database, an HTTP client, etc.). The application still doesn’t need to be deployed anywhere - integration tests can be run on a local machine or a CI system - but some of the application modules need to be set up together, which might take more time than in unit tests.
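A minimal sketch of an integration test, again in Python, could look like the one below. The `UserRepository` class is hypothetical, and the in-memory SQLite database is an assumption chosen to keep the example self-contained. A real project might start its actual database engine locally or in CI instead.

```python
import sqlite3
import unittest


class UserRepository:
    """Hypothetical data-access class that talks to a real SQL database."""

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name: str) -> int:
        cursor = self.connection.execute(
            "INSERT INTO users (name) VALUES (?)", (name,)
        )
        self.connection.commit()
        return cursor.lastrowid

    def find_name(self, user_id: int):
        row = self.connection.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class UserRepositoryIntegrationTest(unittest.TestCase):
    def setUp(self):
        # The module and the database are set up together for every test.
        self.connection = sqlite3.connect(":memory:")
        self.repository = UserRepository(self.connection)

    def tearDown(self):
        self.connection.close()

    def test_saved_user_can_be_read_back(self):
        user_id = self.repository.add("Alice")
        self.assertEqual(self.repository.find_name(user_id), "Alice")

    def test_unknown_user_returns_none(self):
        self.assertIsNone(self.repository.find_name(12345))
```

The test only verifies that the code and the database work together (the SQL runs and the data comes back). Detailed business rules are left to unit tests.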
Finally, we want to move from modules and parts of an application to a combination of many applications. End-to-end tests, also called E2E tests, verify the whole working system, composed of many applications: backend, frontend, databases, message buses, etc. They check the system from the user's perspective and verify complex workflows. On the other hand, their stability depends on all the services in the test flow, which means they are flakier and their execution takes more time. Also, because of test environment limitations, there might be queues of features waiting to be tested, which increases the time of E2E testing even more. Due to these performance issues and the costly maintenance of these tests, their number should be limited. They should cover only the most critical business paths and that's why they are at the top of the pyramid.
Back to the pyramid
Based on the testing pyramid, we should invest the most in unit and integration tests and reduce the number of E2E tests. But why does it matter? The testing pyramid is about reducing time and cost, which are closely connected. It is all about shortening the feedback loop.
At the beginning, I mentioned that test automation has failed in many projects. Why did it go so badly? Let’s skip the bottom layers of the pyramid for a while and test everything on the E2E level. How long does it take to see whether a change you have made in an application broke anything? What needs to be done to test the application end-to-end?
- Code review - in some projects the code change can’t be deployed to the testing environment before the change is approved (~hours).
- Application build (~minutes).
- Application deployment to the testing environment (~minutes to hours, as other people may want to deploy their changes too, which creates a queue of changes to be tested).
- Automated test execution (~minutes to hours).
- Results analysis - there are probably a few failed tests, not necessarily caused by your change, but by the test environment itself. Also, other teams might waste their time here if your change has broken their testing (~minutes to hours).
- Fixing a bug, if any, and going back to the first step.
How many code changes can you test during a day?
At the same time, if you invest the most in the tests at the bottom, verifying the change is quite straightforward:
- Code compilation (~seconds).
- Unit tests execution (~seconds).
- Result analysis (~minutes if any failures, but in most cases, you can be sure that your change introduced the failures, not anyone else).
- Fixing the bug, if any, and going back to the first step.
You magically saved a few hours or even days, which means you can introduce many more changes and deliver more features.
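To put rough numbers on the two loops above, here is a small back-of-the-envelope calculation in Python. All durations are assumptions picked from the approximate ranges in the two lists, not measurements from a real project, so treat the result as an illustration only.

```python
# Illustrative step durations in minutes (assumed values, see the note above).
E2E_LOOP = {
    "code review": 120,
    "build": 10,
    "deploy to test environment": 30,
    "automated E2E tests": 60,
    "results analysis": 30,
}
UNIT_LOOP = {
    "compilation": 0.5,
    "unit tests": 0.5,
    "result analysis": 5,
}
WORKING_DAY_MINUTES = 8 * 60  # assumed 8-hour working day


def loop_minutes(steps: dict) -> float:
    """Total time of one pass through the feedback loop."""
    return sum(steps.values())


def loops_per_day(steps: dict) -> int:
    """How many full passes fit into one working day."""
    return int(WORKING_DAY_MINUTES // loop_minutes(steps))


for name, steps in (("E2E only", E2E_LOOP), ("unit tests", UNIT_LOOP)):
    print(f"{name}: {loop_minutes(steps):g} min per change, "
          f"{loops_per_day(steps)} change(s) verified per day")
```

With these assumed numbers, one E2E loop takes 250 minutes, so only one change can be fully verified in a day, while the unit test loop takes 6 minutes and fits 80 times. The exact figures will differ in every project, but the gap between the two is the point.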
Does it mean you should write only unit tests?
You have probably seen a lot of pictures or GIFs like the one above. Two components can work perfectly fine in isolation, but when integrated with other parts of a system, they can cause problems. Testing on higher levels of the pyramid helps to avoid these integration issues. These tests are essential for your CI/CD. However, when your components are properly unit tested, you can reduce the number of higher-level tests and decrease the overall testing time.
What is a rule of thumb?
You might be wondering how to decide on which pyramid level your test should be. My suggestion is to ask a few simple questions.
- Can the feature be unit tested? A unit test is a perfect solution. Test everything possible, including corner cases.
- How will it be combined with some other parts of the system? Check it in an integration test (but only verify the integration, no need to check the details as they were unit tested).
- Will it be used in an end-to-end flow? Check the happy paths to see if the system can be used properly by the users (corner cases of the functionality and the integrations have already been covered).
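The three questions above can be written down as a tiny helper. This is only a sketch of the rule of thumb: the function name and the wording of the returned hints are made up for illustration.

```python
def suggest_test_levels(unit_testable: bool,
                        integrates_with_other_parts: bool,
                        used_in_e2e_flow: bool) -> list:
    """Return the pyramid levels (and their scope) suggested for a feature."""
    levels = []
    if unit_testable:
        # Bottom of the pyramid: test everything possible here.
        levels.append("unit: all logic, including corner cases")
    if integrates_with_other_parts:
        # Middle: verify only the integration, details are unit tested.
        levels.append("integration: only the connection between parts")
    if used_in_e2e_flow:
        # Top: a few happy paths from the user's perspective.
        levels.append("e2e: happy paths only")
    return levels


# Example: a feature with its own logic that stores data in a database
# but is not part of any critical user flow.
for hint in suggest_test_levels(True, True, False):
    print(hint)
```

Note that the answers are not exclusive. One feature often ends up with many unit tests, a few integration tests, and at most one or two E2E checks, which is exactly the shape of the pyramid.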
You can stay with the ice cream cone model and keep complaining about the automated tests, or you can improve them with the pyramid model. Starting from the bottom of it will make your test automation successful. Stable and trustworthy tests will allow you to deliver faster and more often, keeping everyone happy. The end users will receive high-quality features more frequently. The team members will be able to work on new functionality instead of maintaining unstable tests. The ice cream cone or the pyramid? The choice is yours.