Unit Testing For Embedded Software Development

Unit testing uses small, fine-grained automated tests of specific behavior that can be run off-target to drive development of embedded system code, even when the hardware isn’t available.

Introduction

Embedded systems have different reliability requirements than other types of software. They may be installed in remote places, have long expected lifetimes, and may be difficult to service. It may also be difficult to get meaningful information about what is happening with products in the field. Additionally, some systems, such as medical or industrial equipment, can cause harm or damage if a failure occurs. All of this points to the need for a higher focus on quality up front.

Unit testing is a vital part of a comprehensive testing strategy. It should be used in conjunction with other types of testing, such as integration, system, or HIL (Hardware in the Loop) testing. Testing does not add quality. Instead, it verifies quality and detects deficiencies so that you can correct them before they affect your customers. No single testing regimen handles all quality attributes.

Unit testing means testing small portions of software in isolation from the larger system, using automated tests. For embedded systems in particular, it typically means testing off-target, on a development host or build server rather than on the target hardware. Unit tests should be fast and isolated, and are typically used to verify complex logic. When scoping them, imagine being able to run thousands of them per second.

There are multiple benefits to this.

  • Individual code units can be tested separately from the rest of the system. This allows development to progress when other parts of the system, or the hardware, aren't ready yet.
  • It allows a much more granular level of verification than is achievable at the integration and system levels. This leads to higher levels of reliability and quality assurance.
  • It serves as an early warning system, allowing you to detect and address problems sooner in the development process. The later in the process a defect is found, the more expensive it is to fix.
  • It serves as executable documentation. The tests ensure that the code under test satisfies the requirements of the interface. This does not replace documentation such as Doxygen comments, but should go hand-in-hand with it.
  • Since unit tests test only one thing, they are very good at localizing problems. When they fail, you generally know exactly where the problem is. Compare that to a system-level test, which may fail without giving any indication of where the problem lies until you investigate further.

Concepts and Terminology

What is a unit, and what do we mean by “small portions of software”? In general, a unit is a single implementation unit that can be taken in isolation, such as a module or class. It will often consist of a single implementation file. The goal is to take this module or class and mount it in a test harness that can test it in isolation.

Before discussing further, let’s cover some terminology.

  • Test assertion: an individual statement that can pass or fail, that asserts an expected condition resulting from a test.
  • Test fixture/harness: the environment in which the test runs, that needs to be set up to establish the initial conditions for the test.
  • Test case: an individual test that can pass or fail, comprised of test setup and test assertions.
  • Test suite: executable test program comprised of a set of test cases.
  • Code Under Test (CUT): the software being tested by a test case and test suite. Also known as Software Under Test (SUT) or Unit Under Test (UUT).
  • Unit: classical unit testing uses the term “unit” to refer to an implementation unit, such as a module or a class, with test cases written for each function it contains.
  • Depended on Component (DOC): a direct dependency of the CUT. For example, a motor controller might talk to a motor driver. The driver is a depended on component of the controller.
  • Transitively Depended on Component (TDOC): a dependency at least one step removed from the CUT.

Mechanics of Unit Testing

Imagine a dependency graph with higher-level modules depending on lower-level modules. Now select one of those modules as the unit to be tested, and we can represent it and the surrounding units as follows.

The goal of the test harness is to isolate the CUT, fake out any dependencies, and then exercise a test suite. The test cases exercise the CUT directly, operating as a client of the CUT. The depended on components (DOCs) are replaced by test doubles. They may be mocks, fakes, or stubs; more on this later. The point is that a test double stands in for the DOC it replaces and may be controlled directly by the unit tests. This allows the tests to verify the CUT's interactions with its DOCs and simulate results back up to the CUT. Since none of this relies on hardware, everything can run off-target on the development host. The test framework creates one or more standalone executables that, when run, report test results to standard output. They may create xUnit or JUnit reports as well.

Here is what a dependency graph isolating the CUT may look like in a unit test framework.

Mocks, Fakes, and Stubs

What do we use for the test doubles? There are a few options, and the right choice depends on context. The developer must decide what's right for each situation.

  • Fake: A partial implementation that stands in for the real thing. An example would be a RAM-based implementation of a flash driver: the CUT can still perform writes and reads and can read back what was written, but no actual flash operations happen. A fake must be written by the developer (a sketch of such a fake follows this list).
  • Mock: Verifies functions called, their call order, the parameters passed to them, and can return specific values to the CUT. All of these details are specified by the test case using ancillary functions. Mocks can be generated automatically by the test framework based on a header file or a class definition.
  • Stub: A simplified version of a mock. It may do nothing or just return a single value. This is useful where some implementation needs to exist, but the exact details do not matter.
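
As an illustration of a fake, here is a minimal sketch of the RAM-backed flash driver described above. The flash_write/flash_read names and signatures are hypothetical; a real fake would match the project's own driver header.

// flash_driver_fake.cpp -- hypothetical fake that satisfies the flash driver interface
#include <cstdint>
#include <cstddef>
#include <cstring>

static uint8_t fake_flash[4096];   // RAM buffer standing in for the flash part

bool flash_write(uint32_t address, const uint8_t* data, std::size_t length)
{
    if (address + length > sizeof(fake_flash)) {
        return false;                               // reject out-of-range writes
    }
    std::memcpy(&fake_flash[address], data, length);
    return true;
}

bool flash_read(uint32_t address, uint8_t* data, std::size_t length)
{
    if (address + length > sizeof(fake_flash)) {
        return false;
    }
    std::memcpy(data, &fake_flash[address], length);
    return true;
}

The CUT writes and reads through the same interface it uses in production, but everything happens in RAM, so tests run instantly and can inspect the buffer directly.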

Replacing the Dependencies

How do we replace the dependencies? The easiest and cleanest answer is dependency injection. If the code under test is passed references to its dependencies, then the unit tests can easily provide test doubles; they only need to ensure that the doubles conform to the expected interface. For dependency injection to work in this scenario, the dependencies must be represented as abstract interfaces. In C, this could be an array of function pointers. In C++, it could be an actual abstract class.
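
Here is a minimal sketch of what that might look like in C++, using the motor controller and driver from the terminology section; the class and method names are illustrative, not from any real codebase.

// Hypothetical abstract interface for the depended on component.
class IMotorDriver {
public:
    virtual ~IMotorDriver() = default;
    virtual void set_speed(int rpm) = 0;
};

// The CUT receives its dependency by reference, so a test can inject a double.
class MotorController {
public:
    explicit MotorController(IMotorDriver& driver) : driver_(driver) {}
    void stop() { driver_.set_speed(0); }
private:
    IMotorDriver& driver_;
};

// A hand-written test double that conforms to the same interface.
class SpyMotorDriver : public IMotorDriver {
public:
    void set_speed(int rpm) override { last_rpm = rpm; }
    int last_rpm = -1;
};

// In a test case:
//   SpyMotorDriver spy;
//   MotorController controller(spy);
//   controller.stop();
//   EXPECT_EQ(0, spy.last_rpm);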

What happens if the CUT relies directly on its DOCs with no interface abstraction? Here we can rely on the concept of seams. Michael Feathers describes a seam as

a place where you can alter behavior in your program without editing in that place.

There are two types of seams we will focus on here: link seams and object seams.

Link Seams

The diagram below shows a simple example of a module calling into an LED driver. This example is kept simple for demonstration purposes. In real life the driver may automatically handle active-high vs. active-low LEDs or support more sophisticated LED states such as PWM, blinking, or “breathing”. The point here is that even though the dependency on led_driver has been directly declared, we can still swap out the code it connects to at link time.

The code under test still includes the same led_driver.h and this allows it to compile. However, led_driver.c is gone (along with its transitive dependency on the HAL) and in its place is led_driver_mock.c. The mock is something that some unit test frameworks, such as Ceedling, can generate directly from the header file. The test case will then interact directly with the mock, telling it what it should expect before invoking the function under test.
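
To show the idea independently of any framework, here is a hand-written stand-in that could be linked in place of led_driver.c. The led_driver_set signature is hypothetical; with Ceedling, the equivalent mock (with expectation functions) is generated for you from led_driver.h.

// led_driver_mock.cpp -- built into the test executable instead of led_driver.c,
// so the CUT links against these definitions of the same symbols.
#include <cstddef>
#include <utility>
#include <vector>

static std::vector<std::pair<int, bool>> calls;   // record of every call the CUT makes

extern "C" void led_driver_set(int led, bool on)  // same symbol the CUT was compiled against
{
    calls.emplace_back(led, on);
}

// Helpers the test cases use to verify the interaction.
void led_driver_mock_reset() { calls.clear(); }
std::size_t led_driver_mock_call_count() { return calls.size(); }
std::pair<int, bool> led_driver_mock_call(std::size_t i) { return calls.at(i); }

A test case resets the record, exercises the CUT, and then asserts on the recorded calls, which is the same role a generated CMock mock plays, just with far less manual work.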

Object Seams

In object-oriented designs, the interactions between objects represent object seams. The dependency objects of a CUT can be replaced without the CUT knowing about it. In practice this requires inheritance, so the operation is not quite as transparent a seam as the link seam. For a mocking framework like gMock (which is part of GoogleTest), the minimum requirement is that the class methods to be replaced must be declared virtual. This allows gMock to inherit from the dependency class and replace its method implementations.
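
As a sketch (the LedDriver and StatusLight classes here are hypothetical), an object-seam test with gMock might look like this:

#include <gmock/gmock.h>
#include <gtest/gtest.h>

// The dependency: its methods are virtual so a mock can override them.
class LedDriver {
public:
    virtual ~LedDriver() = default;
    virtual void set(int led, bool on) { /* real implementation talks to the HAL */ }
};

class MockLedDriver : public LedDriver {
public:
    MOCK_METHOD(void, set, (int led, bool on), (override));
};

// The CUT takes its dependency by reference, so the test can hand it the mock.
class StatusLight {
public:
    explicit StatusLight(LedDriver& driver) : driver_(driver) {}
    void turn_on() { driver_.set(0, true); }
private:
    LedDriver& driver_;
};

TEST(StatusLightTest, TurnOn_ShouldSetLedZero)
{
    MockLedDriver mock;
    EXPECT_CALL(mock, set(0, true));   // the expected interaction, stated up front
    StatusLight light(mock);
    light.turn_on();                   // gMock verifies the expectation when mock is destroyed
}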

Comparison of Unit Testing Frameworks

There is a variety of unit testing frameworks to choose from. Here we will focus on two frameworks that are preferred by Dojo Five: Ceedling and GoogleTest. Every testing framework varies in what it offers: the automation it provides, the format for writing unit tests, how it deals with dependencies, and so forth.

The main difference between these two frameworks is the focus on C vs C++. Ceedling is designed to generate CMock style mocks for you based on header files and use those in place of depended on components (i.e. link seams). GoogleTest uses gMock, which relies on object seams.

Best for
  • Ceedling: C
  • GoogleTest: C++. Can be used for C with the Fake Function Framework (FFF).

Mocking seam
  • Ceedling: Linker. Replace the real implementation .c file with a mock .c file. The mock declares the same symbols and the CUT links against those.
  • GoogleTest: Object. Inherit from the depended on object and override its methods. Both the real implementation and the mock exist side-by-side.

Build framework
  • Ceedling: Custom, based on Ruby; Ceedling is provided as a gem. The entire process of generating mocks, building test executables, and running tests is automated based on the directory structure and a special YAML file.
  • GoogleTest: Supports CMake and Bazel. Tutorials are provided for both. See this list for the currently supported OSes and compilers.

Generated output
  • Ceedling: A separate executable for each test_xxx.c file.
  • GoogleTest: A single executable that contains all the test suites.

IDE integration
  • Ceedling: Ceedling Test Explorer
  • GoogleTest: GoogleTest Adapter, C++ TestMate

One advantage of Ceedling when it comes to IDE integration: because its build process rebuilds the tests and runs them as part of the same step, rerunning test suites from the Ceedling Test Explorer takes a single click. In contrast, GoogleTest separates the build and run stages more explicitly, and the IDE extensions do not combine them into a single click.

One note about using GoogleTest together with FFF: since GoogleTest combines all test suites into a single executable, you cannot have mocked functions and the real functions side-by-side in the same executable. That is, the same module cannot be the CUT in one test suite and then be replaced with a mock when it is a DOC in another test suite. This is a limitation of link seams, and it is why Ceedling generates multiple executables. A deeper discussion of the problem and potential workarounds can be found on Stack Overflow here.
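
For context, here is roughly what an FFF fake looks like in a GoogleTest suite; the led_driver_set and status_light_turn_on functions are hypothetical. The FAKE_VOID_FUNC macro defines the fake function in the test executable, which is exactly why the real led_driver.c cannot also be linked in.

#include <gtest/gtest.h>
#include "fff.h"

DEFINE_FFF_GLOBALS;

// Defines a fake led_driver_set() plus bookkeeping (call count, captured arguments).
// If the CUT is compiled as C, wrap this in extern "C" so the linker sees the same symbol.
FAKE_VOID_FUNC(led_driver_set, int, bool);

#include "status_light.h"   // hypothetical CUT header declaring status_light_turn_on()

TEST(StatusLightTest, TurnOn_ShouldSetLedZero)
{
    RESET_FAKE(led_driver_set);                      // clear call history between tests
    status_light_turn_on();                          // exercise the CUT
    ASSERT_EQ(1u, led_driver_set_fake.call_count);   // verify the interaction
    EXPECT_EQ(0, led_driver_set_fake.arg0_val);      // arguments of the most recent call
    EXPECT_TRUE(led_driver_set_fake.arg1_val);
}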

Designing Test Cases

There are several guidelines that make a good test. Not following them risks creating tests that are difficult to maintain. A particular risk is brittle tests, i.e. tests that break whenever you change the code, becoming a significant maintenance burden in their own right.

Test Behavior, Not Implementation

This is also stated as “test interface, not implementation”. Test only the behavioral aspects that are presented via the interface of the CUT, not the details of the underlying implementation. The interface will tend to be stable long-term once you’ve settled on it, but the implementation could change.

This practice also better supports regression tests. Regression tests are important because most products have a long life and evolve over time. They may need maintenance to fix bugs or improve other quality attributes and may gain new features. A good set of unit tests gives you the confidence to verify that these activities don’t break previously-working functionality. Then using TDD on the new work leaves you with additional new tests.

These are really just the same principles of loose coupling and information hiding that apply to any other client of the code.

Test One Thing at a Time

Each test case should test only one specific behavior. Each one should be very focused on establishing initial conditions, triggering the CUT, and checking the results. This is the Four-Phase Test Pattern: setup, exercise, verify, and cleanup. Cleanup is important to ensure there is no leftover state or unfreed resources.
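
In GoogleTest, a test fixture maps directly onto this pattern. Here is a sketch using the same hypothetical IntFifo API that appears later in this article; IntFifo_deinit is assumed as the cleanup counterpart to IntFifo_init.

#include <gtest/gtest.h>
#include "int_fifo.h"   // hypothetical header for the IntFifo API

class IntFifoTest : public ::testing::Test {
protected:
    void SetUp() override    { IntFifo_init(&fifo); }    // setup: establish initial conditions
    void TearDown() override { IntFifo_deinit(&fifo); }  // cleanup: no leftover state or resources
    IntFifo_t fifo;
};

TEST_F(IntFifoTest, ShouldBeNonEmpty_WhenItemAdded)
{
    bool was_added = IntFifo_add(&fifo, 1);   // exercise: one call to the CUT
    EXPECT_TRUE(was_added);                   // verify: assert the expected outcomes
    EXPECT_FALSE(IntFifo_is_empty(&fifo));
}

GoogleTest runs SetUp before and TearDown after every TEST_F, so each case starts from a clean fixture and leaves nothing behind.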

This makes each case standalone, without order dependencies, so that behaviors can be tested in isolation at fine granularity, and test failures will point to very specific parts of the CUT.

Test cases should not sneak in additional checks beyond the behavior they target. In particular, avoid “run-on” tests, the “one test that does it all”. These can produce a cascade of failures that are difficult to analyze for root cause.

Make Your Tests Fast

Each test case should be very fast, and the entire suite should complete in seconds, no more than a minute. This enables a fast edit-build-run cycle for TDD.

Use fakes to achieve this, because one of the main consumers of run time is slow or long-running operations performed by dependencies: database or file operations, network communications, and physical device interactions, for example. Timed delays are another; make these instantaneous by faking the passage of time.
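
One way to do that is to hide time behind an interface and fake it in the tests; the names below are illustrative.

#include <cstdint>

// Hypothetical time source; the real implementation reads a hardware tick counter.
class ITimeSource {
public:
    virtual ~ITimeSource() = default;
    virtual uint32_t milliseconds() = 0;
};

// Fake used by the tests: "waiting" becomes incrementing a number,
// so a 500 ms timeout test still completes in microseconds.
class FakeTimeSource : public ITimeSource {
public:
    uint32_t milliseconds() override { return now_ms; }
    void advance(uint32_t ms) { now_ms += ms; }
private:
    uint32_t now_ms = 0;
};

// In a test case (Debouncer is a hypothetical CUT that reads the time source):
//   FakeTimeSource time;
//   Debouncer debouncer(time);
//   debouncer.on_edge();
//   time.advance(50);                  // simulate 50 ms passing instantly
//   EXPECT_TRUE(debouncer.is_stable());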

Follow the BDD Pattern

Behavior-Driven Development (BDD) is a way of structuring test cases that embodies all the guidelines above.

BDD is a form of “specification by example,” using a simple pattern: given a set of initial conditions, when an event occurs, then there should be expected outcomes.

For example: given an empty FIFO, when an item is added to it, then it should be non-empty. Another way of organizing a BDD test is given/should/when: given an empty FIFO, it should be non-empty when an item is added to it.

You can name test cases directly according to this pattern, so that the names read as a set of specifications for what should happen. For example, “GivenEmptyFIFO_WhenItemAdded_ThenShouldBeNonEmpty” or “EmptyFIFO_ShouldBeNonEmpty_WhenItemAdded”.

Then write the test case code to follow the pattern, setting up the initial conditions as the given, calling the CUT as the when, and asserting expectations as the then. For example:

// Given 
IntFifo_t fifo; 
IntFifo_init(&fifo); 

// When 
bool was_added = IntFifo_add(&fifo, 1); 

// Then 
EXPECT_TRUE(was_added); 
EXPECT_FALSE(IntFifo_is_empty(&fifo));

There may be multiple steps to setting up the initial conditions, and multiple expectations asserted to verify the results, but the event being tested, the when, should be a single call to the CUT.

The tests should form an interlocking set, so that once you’ve tested a particular behavior, you don’t have to verify it in each test case. For example, the above assumes that an initialized FIFO is empty, so there should be a case for that, “FIFO_ShouldBeEmpty_WhenInitialized”.
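
That case, written in the same style as the example above, might be as simple as:

TEST(IntFifo, ShouldBeEmpty_WhenInitialized)
{
    // Given / When
    IntFifo_t fifo;
    IntFifo_init(&fifo);

    // Then
    EXPECT_TRUE(IntFifo_is_empty(&fifo));
}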

When one of these fine-grained tests fails, the intended behavior and specific failing CUT are clear.

Test-Driven Development

The modern practice of unit testing is Test-Driven Development (TDD). This is actually a development methodology, driven by tests. That subtle shift changes your perspective and guides the overall process.

As the developer of the code, you write the tests as well, simultaneously, so that you can use them to verify your code in real-time. You get both working code and unit tests as outputs from the process.

The TDD workflow follows a simple cycle:

  1. Write a small test to test a behavior.
  2. Build and run the test suite to see the new test fail (it may not even compile yet).
  3. Make the CUT changes needed to pass the test.
  4. Build and run the test suite to see the new test pass.
  5. Refactor to remove any duplication or clean up the test or CUT.

Repeat the cycle until you have fully implemented a satisfactory version of the CUT. “Satisfactory” means all quality criteria that you are trying to achieve: proper functioning, acceptable performance, scalability and resource utilization, and good design.

In order to keep this cycle fast (a few seconds to a few minutes), every iteration should be the minimum amount of code possible. When a test fails unexpectedly (or passes when you expected it to fail!), you know clearly that the problem is bounded by that small amount of code.

It’s perfectly fine to start with a trivial implementation to make a test pass, such as returning a hard-coded value from a function. As you build out the tests and CUT, that will eventually cease to be sufficient, and you’ll replace it with real code. That allows you to make rapid progress with small temporary steps as scaffolding that turns into the real code.
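
For instance, a first test for a hypothetical checksum function can be satisfied with a hard-coded value; later tests then force the real implementation.

#include <cstddef>
#include <cstdint>

// First TDD iteration: the only test so far is
// "Checksum_ShouldBeZero_ForEmptyBuffer", so a hard-coded value passes it.
uint8_t checksum(const uint8_t* /*data*/, std::size_t /*length*/)
{
    return 0;   // deliberately trivial scaffolding
}

// A later iteration adds "Checksum_ShouldSumBytes_ForNonEmptyBuffer", which fails
// against the version above and drives it to be replaced with the real loop:
//
//     uint8_t sum = 0;
//     for (std::size_t i = 0; i < length; ++i) { sum += data[i]; }
//     return sum;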

The first few iterations of this cycle will seem silly, but with further iterations, it blossoms into a fully-fledged, high-quality implementation. This grows the implementation from the blank page, always with confirmation that the code you have so far works.

It’s important to see the test fail when you expect it to, and see it pass when you expect it to. That confirms that the test works properly, avoiding false positives and false negatives. Tests are code too, subject to all the failures that can occur in the CUT.

The alternating pattern of running a test to see a failure and running it again to see a success, followed by cleanup, is known as red/green/refactor, since many tools will highlight failure in red and success in green. Refactoring is also a green phase, since the intent is to clean up working code and keep it working.

TDD also allows you to experiment with alternate implementations of all or part of the design. Once you have green status, you can try out different things, keeping it green.

The refactoring step is very important. That's where you clean up the code after having worked through whatever it took to get the test to pass. The idea is to keep cleaning up every cycle: do some work, then clean it up, so that the code is always in a clean state.

Dealing with Legacy Code

You may have an existing codebase with no unit tests, or with unit tests created in a classical, non-TDD manner (sometimes referred to as Test-Later Development (TLD), where the tests are written after the code). When adding to these codebases, for instance to fix bugs or add features, you can begin using TDD.

If you have existing off-target unit tests, TDD will be easier to adopt, because for the most part it just means moving the test development up into the code development. This is referred to as “shift left,” because it shifts getting feedback to the left on the development timeline.

Your existing unit testing infrastructure should still apply. The biggest change will be that the unit tests are written by the same developers who write the CUT, not by separate test developers.

Codebases that don’t have any off-target unit tests are more difficult. The existing software architecture may not have been built with testability in mind and may not lend itself well to testing smaller parts.

In addition to investing time creating your unit test infrastructure, you may need to make changes to your codebase to facilitate test isolation. The benefit of this effort is that you bring the code under test.

Resources

For TDD on embedded systems, see Test Driven Development for Embedded C, by James Grenning.

For more on TDD that’s not specifically for embedded systems, see Modern C++ Programming with Test-Driven Development, by Jeff Langr.

For the original blog post on BDD, see Dan North's post Introducing BDD.

For working on existing codebases, see the book Working Effectively with Legacy Code, by Michael Feathers, where Michael defines “legacy code” as any code that doesn’t have unit tests.

Conclusion

Unit testing uses small, fine-grained automated tests of specific behavior that can be run off-target to drive development of embedded system code, even when the hardware isn’t available. It provides fast feedback and confidence in the code so that you are always building from a working base.

Dojo Five can help you with unit testing. We bring modern tools, techniques, and best practices from the web and mobile development environments, paired with leading-edge innovations in firmware to our customers to help them build successful products and successful clients. Our talented engineers are on hand ready to help you with all aspects of your EmbedOps journey. Bring your interesting problems that need solving – we are always happy to help out. You can reach out at any time on LinkedIn or through email!