Unit Tests

Our profession has come a long way in the last ten years. In 1997, no one had heard of Test-Driven Development. For the vast majority of us, unit tests were short bits of throw-away code that we wrote to make sure our programs “worked.” We would painstakingly write our classes and methods, and then make some ad hoc code to test them. Typically, this would involve some kind of simple driver program that would allow us to interact manually with the program we had written.

A piece of code written to prototype an idea should be discarded once the concept has been proven, and then rewritten properly. Such code goes by several names: throw-away code, a quick hack, kleenex code, or disposable code.

The Agile and TDD movements have encouraged many programmers to write automated unit tests. But in the mad rush to add testing to our discipline, many programmers have missed some of the more subtle and important points of writing good tests.

Automated unit tests are a powerful way to ensure code reliability and catch bugs early. They test individual units of a software application in isolation to verify their functionality. Commonly used tools include:

JUnit: Popular for Java applications.
TestNG: Offers advanced features like parallel testing.
Katalon: Known for its user-friendly interface.
Diffblue: Uses AI to automate unit test creation.

The Three Laws of TDD

By now, everyone knows that TDD asks us to write unit tests first, before we write production code. But that rule is just the tip of the iceberg. Consider the following three laws:

First Law: You may not write production code until you have written a failing unit test.
Second Law: You may not write more of a unit test than is sufficient to fail, and not compiling is failing.
Third Law: You may not write more production code than is sufficient to pass the currently failing test.

These three laws lock you into a cycle that is perhaps thirty seconds long. The tests and the production code are written together, with the tests just a few seconds ahead of the production code.

If we work this way, we will write dozens of tests every day, hundreds of tests every month, and thousands of tests every year. If we work this way, those tests will cover all of our production code.
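One pass through the cycle might look like the following minimal sketch. It is written in plain Java, without a test framework, so that it runs on its own; in JUnit each test method would carry @Test instead of being called from main. The names LeapYear, isLeap, and LeapYearTest are hypothetical and chosen only for illustration.

```java
// Hypothetical production class. Under the Third Law, each clause here was
// added only to make the most recent failing test pass.
final class LeapYear {
    static boolean isLeap(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }
}

// Hypothetical tests, listed in the order they would have been written.
final class LeapYearTest {
    // Written first: it failed to compile until LeapYear.isLeap existed,
    // and not compiling counts as failing (Second Law).
    static void yearDivisibleByFourIsLeap() {
        check(LeapYear.isLeap(2024), "2024 should be a leap year");
    }

    // Written next: it failed until the century rule was added.
    static void centuryYearIsNotLeap() {
        check(!LeapYear.isLeap(1900), "1900 should not be a leap year");
    }

    // Written last: it failed until the 400-year exception was added.
    static void yearDivisibleByFourHundredIsLeap() {
        check(LeapYear.isLeap(2000), "2000 should be a leap year");
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        yearDivisibleByFourIsLeap();
        centuryYearIsNotLeap();
        yearDivisibleByFourHundredIsLeap();
        System.out.println("all tests passed");
    }
}
```

Each test was red for only a few seconds before a small change to isLeap turned it green, which is the rhythm the three laws are meant to enforce.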

Keeping Tests Clean

Consider a team that had explicitly decided that its test code did not need to be held to the same standard of quality as its production code. “Quick and dirty” was the watchword. Their variables did not have to be well named, and their test functions did not need to be short and descriptive. Their test code did not need to be well designed and thoughtfully partitioned. So long as the test code worked, and so long as it covered the production code, it was good enough.

What this team did not realize was that having dirty tests is equivalent to, if not worse than, having no tests. The problem is that tests must change as the production code evolves. The dirtier the tests, the harder they are to change. The more tangled the test code, the more likely it is that you will spend more time cramming new tests into the suite than it takes to write the new production code. As you modify the production code, old tests start to fail, and the mess in the test code makes it hard to get those tests to pass again. So the tests become viewed as an ever-increasing liability.

From release to release, the cost of maintaining this team’s test suite rose. Eventually, it became the single biggest complaint among the developers. When managers asked why their estimates were getting so large, the developers blamed the tests. In the end, they were forced to discard the test suite entirely. But, without a test suite, they lost the ability to make sure that changes to their code base worked as expected.

Without a test suite, they could not ensure that changes to one part of their system would not break other parts of their system. So their defect rate began to rise. As the number of unintended defects rose, they started to fear making changes. They stopped cleaning their production code because they feared the changes would do more harm than good. Their production code began to rot. In the end, they were left with no tests, tangled and bug-riddled production code, frustrated customers, and the feeling that their testing effort had failed them.

In a way, they were right. Their testing effort had failed them. But it was their decision to allow the tests to be messy that was the seed of that failure. Had they kept their tests clean, their testing effort would not have failed. I can say this with some certainty because I have participated in and coached many teams that have been successful with clean unit tests.

The moral of the story is simple: Test code is just as important as production code. It is not a second-class citizen. It requires thought, design, and care. It must be kept as clean as production code.

Tests enable the -ilities

If you don’t keep your tests clean, you will lose them. And without them, you lose the very thing that keeps your production code flexible. Yes, you read that correctly. It is unit tests that keep our code flexible, maintainable, and reusable. The reason is simple.

If you have tests, you do not fear making changes to the code! Without tests, every change is a possible bug. No matter how flexible your architecture is, no matter how nicely partitioned your design, without tests, you will be reluctant to make changes because of the fear that you will introduce undetected bugs.

But with tests, that fear virtually disappears. The higher your test coverage, the less your fear. Indeed, you can improve that architecture and design without fear! So, having an automated suite of unit tests that covers the production code is the key to keeping your design and architecture as clean as possible. Tests enable all the -ilities, because tests enable change.

So if your tests are dirty, then your ability to change your code is hampered, and you begin to lose the ability to improve the structure of that code. The dirtier your tests, the dirtier your code becomes. Eventually, you lose the tests, and your code rots.

Clean Tests

What makes a clean test? Three things. Readability, readability, and readability. Readability is perhaps even more important in unit tests than it is in production code. What makes tests readable? The same thing that makes all code readable: clarity, simplicity, and density of expression. In a test you want to say a lot with as few expressions as possible.

After Refactor

The BUILD-OPERATE-CHECK pattern is made obvious by the structure of well-refactored tests. Each test is clearly split into three parts. The first part builds up the test data, the second part operates on that test data, and the third part checks that the operation yielded the expected results.

In such tests, the vast majority of annoying details have been eliminated. The tests get right to the point and use only the data types and functions that they truly need. Anyone who reads these tests should be able to work out what they do very quickly, without being misled or overwhelmed by details.
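The three parts can be seen in a minimal sketch like the one below. Cart and CartTest are hypothetical names invented for this illustration, and the test is plain Java so that it runs without a framework; in JUnit the method would carry @Test and use an assertion method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical production class, shown only so the sketch is self-contained.
final class Cart {
    private final List<Integer> pricesInCents = new ArrayList<>();

    void add(int priceInCents) {
        pricesInCents.add(priceInCents);
    }

    int total() {
        int sum = 0;
        for (int price : pricesInCents) {
            sum += price;
        }
        return sum;
    }
}

final class CartTest {
    static void totalIsSumOfItemPrices() {
        // Build: set up the test data.
        Cart cart = new Cart();
        cart.add(250);
        cart.add(175);

        // Operate: perform the single action under test.
        int total = cart.total();

        // Check: verify that the action yielded the expected result.
        if (total != 425) {
            throw new AssertionError("expected 425 but was " + total);
        }
    }

    public static void main(String[] args) {
        totalIsSumOfItemPrices();
        System.out.println("all tests passed");
    }
}
```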

Domain-Specific Testing Language

Tests written this way demonstrate the technique of building a domain-specific language for your tests. Rather than using the APIs that programmers use to manipulate the system, we build up a set of functions and utilities that make use of those APIs and that make the tests more convenient to write and easier to read.

These functions and utilities become a specialized API used by the tests. They are a testing language that programmers use to help themselves write their tests and to help those who must read those tests later on.

This testing API is not designed up front; rather, it evolves from the continued refactoring of test code that has gotten too tainted by obfuscating detail. Just as bad tests can be refactored into clean tests, so too will disciplined developers refactor their test code into more succinct and expressive forms.

A Domain-Specific Testing Language (DSTL) is a specialized language designed to make writing and understanding tests easier for a specific domain or application. Instead of relying on general-purpose programming languages, DSTLs provide tailored abstractions and utilities that align closely with the problem domain.

The term "domain" refers to the specific area or subject matter that the testing language is designed to address. For example:

Application Domain: The kind of software being tested, such as e-commerce platforms, mobile apps, or financial systems.
Problem Domain: The challenges or requirements within a specific area, such as user authentication, payment processing, or data visualization.
Business Domain: The industry or business processes involved, such as healthcare, logistics, or education.
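A small testing language of this kind might look like the following sketch. Everything in it is an assumption made for illustration: Bank is a hypothetical production API, and givenAccount, whenTransferring, and thenBalanceIs are hypothetical helpers that a team could arrive at by refactoring repeated setup and checking code out of its tests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical production API, shown only so the sketch is self-contained.
final class Bank {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String owner) {
        balances.put(owner, 0L);
    }

    void deposit(String owner, long cents) {
        balances.merge(owner, cents, Long::sum);
    }

    void transfer(String from, String to, long cents) {
        if (balances.get(from) < cents) {
            throw new IllegalStateException("insufficient funds");
        }
        balances.merge(from, -cents, Long::sum);
        balances.merge(to, cents, Long::sum);
    }

    long balanceOf(String owner) {
        return balances.get(owner);
    }
}

final class BankTest {
    private final Bank bank = new Bank();

    // The test reads in the vocabulary of the domain, not of the Bank API.
    void transferMovesMoneyBetweenAccounts() {
        givenAccount("alice", 1000);
        givenAccount("bob", 0);
        whenTransferring(300, "alice", "bob");
        thenBalanceIs("alice", 700);
        thenBalanceIs("bob", 300);
    }

    // The testing language: thin helpers layered over the production API.
    private void givenAccount(String owner, long cents) {
        bank.open(owner);
        bank.deposit(owner, cents);
    }

    private void whenTransferring(long cents, String from, String to) {
        bank.transfer(from, to, cents);
    }

    private void thenBalanceIs(String owner, long expectedCents) {
        long actual = bank.balanceOf(owner);
        if (actual != expectedCents) {
            throw new AssertionError(owner + ": expected " + expectedCents + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        new BankTest().transferMovesMoneyBetweenAccounts();
        System.out.println("all tests passed");
    }
}
```

The helpers hide the two-step account setup and the comparison logic, so the test body states only what matters to the scenario.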

A Dual Standard

In one sense, the team I mentioned earlier had things right. The code within the testing API does have a different set of engineering standards than production code. It must still be simple, succinct, and expressive, but it need not be as efficient as production code. After all, it runs in a test environment, not a production environment, and those two environments have very different needs.

That is the nature of the dual standard. There are things that you might never do in a production environment that are perfectly fine in a test environment. Usually, they involve issues of memory or CPU efficiency. But they never involve issues of cleanliness.
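As a hedged illustration of the dual standard, the sketch below builds a state string with repeated string concatenation. In a hot production path a StringBuilder might be preferred, but in a test helper the simpler form is easier to read and the cost is irrelevant. Sensors, SensorsTestSupport, and stateOf are hypothetical names invented for this example.

```java
// Hypothetical production class, shown only so the sketch is self-contained.
final class Sensors {
    boolean heaterOn;
    boolean fanOn;
    boolean alarmOn;
}

// Hypothetical test-only helper.
final class SensorsTestSupport {
    // Encodes the state as three letters: upper case means on, lower case means off.
    // Plain concatenation is less efficient than a StringBuilder, but it is
    // clean and readable, which is what matters in test code.
    static String stateOf(Sensors sensors) {
        String state = "";
        state += sensors.heaterOn ? "H" : "h";
        state += sensors.fanOn ? "F" : "f";
        state += sensors.alarmOn ? "A" : "a";
        return state;
    }
}

final class SensorsTest {
    static void turningOnHeaterLeavesFanAndAlarmOff() {
        Sensors sensors = new Sensors();
        sensors.heaterOn = true;

        String state = SensorsTestSupport.stateOf(sensors);

        if (!state.equals("Hfa")) {
            throw new AssertionError("expected Hfa but was " + state);
        }
    }

    public static void main(String[] args) {
        turningOnHeaterLeavesFanAndAlarmOff();
        System.out.println("all tests passed");
    }
}
```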

One Assert per Test

One school of thought says that every test function in a JUnit test should have one and only one assert statement. This rule may seem draconian, but it has an advantage: each test comes to a single conclusion that is quick and easy to understand.

Consider a test that asserts both that the output is XML and that it contains certain substrings. It seems unreasonable that we could somehow easily merge those two assertions into one. However, we can break the test into two separate tests, each with its particular assertion.

Naming the resulting functions with the common given-when-then convention makes the tests even easier to read. Unfortunately, splitting tests in this way results in a lot of duplicate code.
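A split of this kind might look like the sketch below. PageSerializer and its toXml method are hypothetical, and the "is XML" check is a deliberately crude stand-in for real XML validation. The tests are plain Java so that they run without a framework; in JUnit each would carry @Test.

```java
// Hypothetical production class, shown only so the sketch is self-contained.
final class PageSerializer {
    static String toXml(String title) {
        return "<page><title>" + title + "</title></page>";
    }
}

final class PageSerializerTest {
    // One assertion: the output has the expected XML wrapper.
    static void givenPage_whenSerialized_thenOutputIsXml() {
        String output = PageSerializer.toXml("Home"); // given and when
        check(output.startsWith("<page>") && output.endsWith("</page>"),
              "output should be wrapped in a page element");
    }

    // One assertion: the output contains the title.
    static void givenPage_whenSerialized_thenOutputContainsTitle() {
        String output = PageSerializer.toXml("Home"); // given and when, duplicated
        check(output.contains("<title>Home</title>"),
              "output should contain the title element");
    }

    private static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        givenPage_whenSerialized_thenOutputIsXml();
        givenPage_whenSerialized_thenOutputContainsTitle();
        System.out.println("all tests passed");
    }
}
```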

We can eliminate the duplication by using the TEMPLATE METHOD pattern and putting the given/when parts in the base class, and the then parts in different derivatives. Or we could create a completely separate test class and put the given and when parts in the @Before function, and the then parts in each @Test function. But this seems like too much mechanism for such a minor issue.

The Template Method Pattern is a behavioral design pattern used in object-oriented programming. It defines the skeleton of an algorithm in a base class (often abstract) and allows subclasses to provide specific implementations for certain steps of the algorithm.
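Applied to tests, the pattern might be sketched as follows. The base class fixes the given and when steps in a final run method, and each derivative supplies only its own then step. All names here (PageSerializer, SerializedPageTest, and the two subclasses) are hypothetical, and the code is plain Java rather than JUnit so that it runs on its own.

```java
// Hypothetical production class, shown only so the sketch is self-contained.
final class PageSerializer {
    static String toXml(String title) {
        return "<page><title>" + title + "</title></page>";
    }
}

// Template Method: run() is the fixed skeleton; then() is the varying step.
abstract class SerializedPageTest {
    final void run() {
        String title = "Home";                       // given
        String output = PageSerializer.toXml(title); // when
        then(output);                                // then, deferred to the subclass
    }

    abstract void then(String output);

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}

final class OutputIsXmlTest extends SerializedPageTest {
    @Override
    void then(String output) {
        check(output.startsWith("<page>") && output.endsWith("</page>"),
              "output should be wrapped in a page element");
    }
}

final class OutputContainsTitleTest extends SerializedPageTest {
    @Override
    void then(String output) {
        check(output.contains("<title>Home</title>"),
              "output should contain the title element");
    }
}

final class TemplateMethodDemo {
    public static void main(String[] args) {
        new OutputIsXmlTest().run();
        new OutputContainsTitleTest().run();
        System.out.println("all tests passed");
    }
}
```

The duplication is gone, but two extra classes and an inheritance hierarchy now exist to support two short assertions, which illustrates why this can feel like too much mechanism.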

I think the single assert rule is a good guideline. I usually try to create a domain-specific testing language that supports it. But I am not afraid to put more than one assertion in a test. I think the best thing we can say is that the number of asserts in a test ought to be minimized.

Single Concept per Test

Perhaps a better rule is that we want to test a single concept in each test function. We don’t want long test functions that go testing one miscellaneous thing after another. Consider, for example, a single test function that checks three unrelated behaviors of adding months to a date. Such a test should be split up into three independent tests because it tests three independent things. Merging them all together into the same function forces the reader to figure out why each section is there and what is being tested by that section.

The three test functions probably ought to be like this:

Given the last day of a month with 31 days (like May):

When you add one month, such that the last day of that month is the 30th (like June), then the date should be the 30th of that month, not the 31st.
When you add two months to that date, such that the final month has 31 days, then the date should be the 31st.

Given the last day of a month with 30 days in it (like June):

When you add one month, such that the resulting month has 31 days, then the date should be the 30th, not the 31st.

Stated like this, you can see that there is a general rule hiding amidst the miscellaneous tests. When you increment the month, the date can be no greater than the last day of the month. This implies that incrementing the month on February 28th should yield March 28th. That test is missing and would be a useful test to write.
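The rules above, including the missing February test, can be sketched as one test per concept. The date class under discussion is not shown here, so this sketch assumes the standard java.time.LocalDate and its plusMonths method as a stand-in; the class name AddMonthsTest and the year 2023 are arbitrary choices for illustration.

```java
import java.time.LocalDate;
import java.time.Month;

final class AddMonthsTest {
    static void lastDayOf31DayMonthPlusOneMonthIsThe30th() {
        LocalDate may31 = LocalDate.of(2023, Month.MAY, 31);
        expect(LocalDate.of(2023, Month.JUNE, 30), may31.plusMonths(1));
    }

    static void lastDayOf31DayMonthPlusTwoMonthsIsThe31st() {
        LocalDate may31 = LocalDate.of(2023, Month.MAY, 31);
        expect(LocalDate.of(2023, Month.JULY, 31), may31.plusMonths(2));
    }

    static void lastDayOf30DayMonthPlusOneMonthIsThe30th() {
        LocalDate june30 = LocalDate.of(2023, Month.JUNE, 30);
        expect(LocalDate.of(2023, Month.JULY, 30), june30.plusMonths(1));
    }

    // The test the general rule says is missing.
    static void february28PlusOneMonthIsMarch28() {
        LocalDate feb28 = LocalDate.of(2023, Month.FEBRUARY, 28);
        expect(LocalDate.of(2023, Month.MARCH, 28), feb28.plusMonths(1));
    }

    private static void expect(LocalDate expected, LocalDate actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }

    public static void main(String[] args) {
        lastDayOf31DayMonthPlusOneMonthIsThe30th();
        lastDayOf31DayMonthPlusTwoMonthsIsThe31st();
        lastDayOf30DayMonthPlusOneMonthIsThe30th();
        february28PlusOneMonthIsMarch28();
        System.out.println("all tests passed");
    }
}
```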

So it’s not the multiple asserts in each section that cause the problem. Rather, it is the fact that there is more than one concept being tested. So, probably the best rule is that you should minimize the number of asserts per concept and test just one concept per test function.

F.I.R.S.T.

Clean tests follow five other rules that form the above acronym:

Fast: Tests should be fast. They should run quickly. When tests run slowly, you won’t want to run them frequently. If you don’t run them frequently, you won’t find problems early enough to fix them easily. You won’t feel as free to clean up the code. Eventually, the code will begin to rot.
Independent: Tests should not depend on each other. One test should not set up the conditions for the next test. You should be able to run each test independently and run the tests in any order you like. When tests depend on each other, then the first one to fail causes a cascade of downstream failures, making diagnosis difficult and hiding downstream defects.
Repeatable: Tests should be repeatable in any environment. You should be able to run the tests in the production environment, in the QA environment, and on your laptop while riding home on the train without a network. If your tests aren’t repeatable in any environment, then you’ll always have an excuse for why they fail. You’ll also find yourself unable to run the tests when the environment isn’t available.
Self-Validating: The tests should have a Boolean output. Either they pass or fail. You should not have to read through a log file to tell whether the tests pass. You should not have to manually compare two different text files to see whether the tests pass. If the tests aren’t self-validating, then failure can become subjective, and running the tests can require a long manual evaluation.
Timely: The tests need to be written in a timely fashion. Unit tests should be written just before the production code that makes them pass. If you write tests after the production code, then you may find the production code to be hard to test. You may decide that some production code is too hard to test. You may not design the production code to be testable.
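Several of these rules can be seen together in a minimal sketch. The code under test depends on the current date, which would normally make its tests unrepeatable; injecting a java.time.Clock and fixing it in the tests removes that dependency. OfficeHours and isWeekend are hypothetical names invented for this illustration, and the tests are plain Java rather than JUnit so that they run on their own.

```java
import java.time.Clock;
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical production class. It takes a Clock instead of reading the
// system time directly, so tests can control what "today" means.
final class OfficeHours {
    private final Clock clock;

    OfficeHours(Clock clock) {
        this.clock = clock;
    }

    boolean isWeekend() {
        DayOfWeek day = LocalDate.now(clock).getDayOfWeek();
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }
}

final class OfficeHoursTest {
    // Repeatable: the fixed clock gives the same answer on any machine, any day.
    // Independent: each test builds its own fixture and shares no state.
    // Self-validating: each test either returns normally or throws.
    static void saturdayIsWeekend() {
        Clock saturday = Clock.fixed(Instant.parse("2024-01-06T12:00:00Z"), ZoneOffset.UTC);
        if (!new OfficeHours(saturday).isWeekend()) {
            throw new AssertionError("2024-01-06 is a Saturday and should be a weekend");
        }
    }

    static void mondayIsNotWeekend() {
        Clock monday = Clock.fixed(Instant.parse("2024-01-08T12:00:00Z"), ZoneOffset.UTC);
        if (new OfficeHours(monday).isWeekend()) {
            throw new AssertionError("2024-01-08 is a Monday and should not be a weekend");
        }
    }

    public static void main(String[] args) {
        // The order does not matter because the tests are independent.
        mondayIsNotWeekend();
        saturdayIsWeekend();
        System.out.println("all tests passed");
    }
}
```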

Conclusion

We have barely scratched the surface of this topic. Indeed, I think an entire book could be written about clean tests. Tests are as important to the health of a project as the production code is. Perhaps they are even more important because tests preserve and enhance the flexibility, maintainability, and reusability of the production code.

So keep your tests constantly clean. Work to make them expressive and succinct. Invent testing APIs that act as a domain-specific language that helps you write the tests.

If you let the tests rot, then your code will rot too. Keep your tests clean.

Ahmed Hosny

Session Lead @Udacity | Education Manager @Cairo Coding School | Mobile Software Engineer @The Chance | Mobile Engineer (Android & Flutter) | Project manager & Scrum Master
