By: Steve Branam
Principal Firmware Engineer, Dojo Five
Unit testing uses small, fine-grained automated tests of specific behavior that can be run off-target to drive development of embedded system code, even when the hardware isn’t available.
What Is Unit Testing For Embedded Systems?
Unit testing is a vital part of a comprehensive testing strategy. Testing does not add quality. Instead, it verifies quality and detects deficiencies so that you can correct them before they affect your customers. No single testing regimen handles all quality attributes.
The meaning of unit testing has evolved over time. This post focuses on the most modern version.
Unit testing means testing small portions of software in isolation from the larger system, using automated tests. That allows them to be tested separately from other parts of the system that may not be ready yet.
In particular for embedded systems, it means testing off-target, on a development host or build server rather than on the target hardware.
This allows testing in small discrete increments so that you can focus on a small part of the system at a time.
Contrast this with integration and system testing, which test larger combined portions of the system that have been integrated together, up to the complete system on target hardware. These are also vital parts of a comprehensive testing strategy, as is HIL (Hardware In the Loop) testing.
Unit tests serve as an early warning system, allowing you to detect and address problems sooner in the development process.
The modern form of unit testing is Test-Driven Development (TDD). TDD is actually a development methodology in which tests drive the development. Once development is complete, the tests remain a valuable asset to run as regression tests.
Regression tests are important because most products have a long life and evolve over time. They may need maintenance to fix bugs or improve other quality attributes and may gain new features. A good set of unit tests gives you the confidence to verify that these activities don’t break previously-working functionality. Then using TDD on the new work leaves you with additional new tests.
Unit tests also serve as executable documentation. Want to know how to use the software? Look at the tests and see how they call the software.
The following terms are used throughout this post:
- Test assertion: an individual statement that can pass or fail, that asserts an expected condition resulting from a test.
- Test fixture: the environment in which the test runs, that needs to be set up to establish the initial conditions for the test. Also known as a test harness.
- Test case: an individual test that can pass or fail, consisting of test setup and test assertions.
- Test suite: an executable test program consisting of a set of test cases.
- Code Under Test (CUT): the software being tested by a test case and test suite. Also known as Software Under Test (SUT) or Unit Under Test (UUT).
- Unit: classical unit testing uses the term “unit” to refer to an implementation unit, such as a module or a class, specifically writing test cases for each function they contain. In modern parlance, “unit” is more subtle: it refers to a unit of functionality or behavior. Test suites may still be organized around implementation units, but test cases are focused on testing behaviors.
Isolating The CUT
An important part of unit testing is isolating the CUT from its dependencies, the other components it relies on.
You do this by providing some kind of fake implementation for them. There are various strategies, tools, and frameworks for creating these “fakes”. They range from trivial stubs that satisfy the build dependencies to more complex mocks that track calls and simulate output and return values. Dependency injection is a way of managing how the dependencies (either real or fake) are provided to the CUT.
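As a minimal sketch of dependency injection in C, the CUT below receives its dependency as a function pointer, so a test can pass in a trivial stub instead of a real sensor driver. All of the names here (`read_temp_fn`, `is_overheated`, the stubs) and the 85-degree limit are hypothetical, chosen only for illustration; plain `assert` stands in for a framework's assertion macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dependency: returns a temperature in degrees Celsius. */
typedef int16_t (*read_temp_fn)(void);

/* CUT: the dependency is injected as a parameter rather than called
 * directly, so the caller decides whether it is real or fake. */
bool is_overheated(read_temp_fn read_temp)
{
    return read_temp() > 85;   /* limit is an illustrative assumption */
}

/* Trivial stubs standing in for the real sensor driver. */
static int16_t stub_temp_hot(void)  { return 100; }
static int16_t stub_temp_cool(void) { return 25; }

/* Test cases: each injects the stub that sets up its initial condition. */
void ShouldReportOverheated_WhenTempAboveLimit(void)
{
    assert(is_overheated(stub_temp_hot));
}

void ShouldNotReportOverheated_WhenTempBelowLimit(void)
{
    assert(!is_overheated(stub_temp_cool));
}
```

In production code, the same function would be called with the real driver's read function.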
In embedded systems, software also depends on hardware elements. A typical strategy for isolating the CUT is to implement a Hardware Abstraction Layer (HAL) that provides access functions for direct hardware interaction. At the very lowest level, the HAL typically contains functions to get and set register values. It may also offer higher-level behaviors, for instance, the logical operations supported by a peripheral and its associated registers.
For testing, you replace the HAL with a version that uses the various faking strategies to allow running off-target, simulating the hardware interactions.
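One way this replacement can look is sketched below: the CUT calls only the HAL's register access functions, and the test build supplies a fake HAL that simulates registers with an array. The function names, the register offset, and the array size are all assumptions made for this example, not a real device or a real HAL. In a real project the fake would normally live in its own source file and be linked in place of the real HAL; it is shown in one listing here to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* hal.h (hypothetical): the only way the CUT touches hardware. */
uint32_t hal_reg_read(uint32_t addr);
void     hal_reg_write(uint32_t addr, uint32_t value);

/* CUT: sets one bit in a hypothetical GPIO output register. */
#define GPIO_OUT_REG 0x0004u   /* illustrative offset, not a real device */

void led_on(unsigned pin)
{
    hal_reg_write(GPIO_OUT_REG, hal_reg_read(GPIO_OUT_REG) | (1u << pin));
}

/* Fake HAL for off-target tests: registers are simulated by a small
 * array indexed by word offset. */
static uint32_t fake_regs[16];

uint32_t hal_reg_read(uint32_t addr)
{
    return fake_regs[(addr / 4u) % 16u];
}

void hal_reg_write(uint32_t addr, uint32_t value)
{
    fake_regs[(addr / 4u) % 16u] = value;
}

/* Test case: turning on one LED must not disturb the other pins. */
void LedOn_ShouldSetOnlyItsPinBit(void)
{
    hal_reg_write(GPIO_OUT_REG, 0x1u);            /* Given: pin 0 already on */
    led_on(3);                                    /* When */
    assert(hal_reg_read(GPIO_OUT_REG) == 0x9u);   /* Then: pins 0 and 3 on */
}
```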
You can follow the same strategy with dependencies on RTOS elements. An Operating System Abstraction Layer (OSAL) or Kernel Abstraction Layer (KAL) allows isolating the CUT from the real system for off-target testing.
A layered, componentized system architecture allows you to test each part independently as the CUT.
Eventually, you need to test the CUT with its real dependencies. That’s the role of integration testing. Knowing that each part works properly in isolation helps with integration.
As mentioned above, TDD is development, driven by tests. That subtlety changes your perspective and guides the overall process.
As the developer of the code, you write the tests as well, simultaneously, so that you can use them to verify your code in real time. You get both working code and unit tests as outputs from the process.
The TDD workflow follows a simple cycle:
- Write a small test to test a behavior.
- Build and run the test suite to see the new test fail, or possibly not even compile yet.
- Make the CUT changes needed to pass the test.
- Build and run the test suite to see the new test pass.
- Refactor to remove any duplication or clean up the test or CUT.
Repeat the cycle until you have fully implemented a satisfactory version of the CUT. “Satisfactory” means all quality criteria that you are trying to achieve: proper functioning, acceptable performance, scalability and resource utilization, and good design.
To keep this cycle fast (a few seconds to a few minutes), each iteration should add the minimum amount of code possible. When a test fails unexpectedly (or passes when you expected it to fail!), you know clearly that the problem is bounded by that small amount of code.
It’s perfectly fine to start with a trivial implementation to make a test pass, such as returning a hard-coded value from a function. As you build out the tests and CUT, that will eventually cease to be sufficient, and you’ll replace it with real code. That allows you to make rapid progress with small temporary steps as scaffolding that turns into the real code.
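A first iteration of this kind might look like the sketch below. The function `battery_percent`, the test name, and the 3000 mV "empty" voltage are hypothetical, invented for this illustration; plain `assert` stands in for a test framework's assertion macros.

```c
#include <assert.h>
#include <stdint.h>

/* Iteration 1: the only test so far says an empty battery reads 0 percent,
 * so a hard-coded return value is enough to make it pass. */
uint8_t battery_percent(uint16_t millivolts)
{
    (void)millivolts;   /* input not needed yet */
    return 0;
}

/* The first (and so far only) test case. */
void BatteryPercent_ShouldBeZero_WhenAtEmptyVoltage(void)
{
    assert(battery_percent(3000) == 0);   /* 3000 mV assumed to mean empty */
}
```

A second test case, for instance one expecting 100 percent at a full-charge voltage, would fail against this version. That failure is what forces the hard-coded value to be replaced with real logic.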
The first few iterations of this cycle will seem silly, but with further iterations, it blossoms into a fully-fledged, high-quality implementation. This grows the implementation from the blank page, always with confirmation that the code you have so far works.
It’s important to see the test fail when you expect it to, and see it pass when you expect it to. That confirms that the test works properly, avoiding false positives and false negatives. Tests are code too, subject to all the failures that can occur in the CUT.
The alternating pattern of running a test to see a failure and running to see a success, followed by cleanup, is known as red/green/refactor, since many tools will highlight failure in red and success in green. Refactoring is also a green phase, since the intent is to clean up working code and keep it working.
TDD also allows you to experiment with alternate implementations of all or part of the design. Once you have green status, you can try out different things, keeping it green.
The refactoring step is very important. That’s where you clean up the code after having worked through whatever it took to get the test to pass. The idea is to constantly keep cleaning up every cycle; do some work, then clean it up, so that the code is always in a clean state.
Schools Of Thought for TDD
There are two schools of thought when applying TDD, which have come to be known as the Chicago and London schools.
The Chicago school focuses on “emergent design,” meaning you don’t have a lot of up-front design in mind before starting, and the design emerges as you work. The London school focuses on having a particular design in mind up-front.
In both cases, what you do have in mind is the behavior you want the CUT to provide. And regardless of how much design you have in mind up-front, what remains are the exact low-level details of implementing that design.
For example, you might need a data structure that provides FIFO storage behavior. With the Chicago school, you might end up with either a linked-list or a circular buffer. With the London school, you might decide up front that you want a circular buffer.
Both schools of thought are useful. There will be times when you have only a general idea of what you will implement, and times when you have a very clear idea going in. Much of the system may go one way, while the rest goes the other.
Either way, you want to be sure that the code you write works and satisfies the design goals. As it progresses, the instantaneous feedback of the TDD cycle drives that design and implementation.
You may have an existing codebase with no unit tests, or with unit tests created in a classical, non-TDD manner (sometimes referred to as Test-Later Development (TLD), where the tests are written after the code). When adding to these codebases, for instance for bugfixes or to add features, you can begin using TDD.
If you have existing off-target unit tests, TDD will be easier to adopt, because for the most part it just means moving the test development up into the code development. This is referred to as “shift left,” because it shifts getting feedback to the left on the development timeline.
Your existing unit testing infrastructure should still apply. The biggest change will be that the unit tests are written by the same developers who write the CUT, not by separate test developers.
Codebases that don’t have any off-target unit tests are more difficult. The existing software architecture may not have been built with testability in mind and may not lend itself well to testing smaller parts.
In addition to investing time creating your unit test infrastructure, you may need to make changes to your codebase to facilitate test isolation. The benefit of this effort is that you bring the code under test.
Contrasting With Classical Unit Testing
Classical unit testing has been referenced a couple of times above: in the definition of a “unit”, and as TLD. If you’re used to that style of unit testing, TDD may seem strange.
One of the main issues with TLD is that the feedback loop from test results back to development is long. It can be days or weeks before you as the developer are notified of test failures, by which time you’ve moved on to other work; you may have to spend time reacquainting yourself with the code. Chasing down problems may require a large amount of debugging time, further delaying fixes.
TDD eliminates that delay. You have immediate feedback while things are fresh in your mind. You know immediately if something is working or not. That frees you to experiment with different approaches and quickly discard the ones that aren’t satisfactory. By working in small test/code steps, it’s much more apparent where the failure is in the CUT, so debugging may be eliminated.
Integration will be faster and smoother because you’re integrating known good components, not untested ones that may have lurking bugs.
While placing the burden of writing unit tests on developers may appear on paper to slow their progress down, the fast feedback cycle during development, the reduction or complete elimination of debugging, and faster integration all combine to result in overall faster delivery of completed functionality and systems.
One philosophy of testing holds that developers shouldn’t test their own code, because they won’t be thorough enough in testing all the code they have written, resulting in poor test coverage. TDD takes a different view: the TDD cycle and its emphasis on behavior mean that no code is written unless there’s a test that covers it. That results in high test coverage, potentially all the way up to 100%.
What Makes A Good Test?
There are several guidelines that make a good test. Not following them risks creating tests that are difficult to maintain. A particular risk is brittle tests, i.e. tests that break whenever you change the code, becoming a significant maintenance burden in their own right.
Test Behavior, Not Implementation
This is also stated as “test interface, not implementation”. Test only the behavioral aspects that are presented via the interface of the CUT, not the details of the underlying implementation.
Continuing the example above of a FIFO data structure, that implies specific behavior that will be apparent in the interface as you work it out. For instance, you know that adding items to the structure will change its state from empty to non-empty, retrieving them should remove them in the order in which they were added, and retrieving all of them will change it from non-empty to empty. You might also be able to query the number of items it contains.
This interface will tend to be stable long-term once you’ve settled on it, but the implementation could change. You might track internal state differently, or you might switch from a linked-list implementation to a circular buffer implementation. Even if you make those changes correctly, any tests that checked details of the old implementation would fail.
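To make the distinction concrete, here is a minimal sketch of the FIFO as a circular buffer, with a test case that uses only the interface. `IntFifo_t`, `IntFifo_init`, `IntFifo_add`, and `IntFifo_is_empty` follow the names used in this post's test example; `IntFifo_remove`, `IntFifo_count`, the capacity, and the struct fields are assumptions added for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INT_FIFO_CAPACITY 4u   /* illustrative size */

typedef struct {
    int    items[INT_FIFO_CAPACITY];
    size_t head;    /* index of the oldest item */
    size_t count;   /* number of stored items */
} IntFifo_t;

void   IntFifo_init(IntFifo_t *fifo)           { fifo->head = 0; fifo->count = 0; }
bool   IntFifo_is_empty(const IntFifo_t *fifo) { return fifo->count == 0; }
size_t IntFifo_count(const IntFifo_t *fifo)    { return fifo->count; }

bool IntFifo_add(IntFifo_t *fifo, int item)
{
    if (fifo->count == INT_FIFO_CAPACITY) {
        return false;                              /* full */
    }
    fifo->items[(fifo->head + fifo->count) % INT_FIFO_CAPACITY] = item;
    fifo->count++;
    return true;
}

bool IntFifo_remove(IntFifo_t *fifo, int *item)
{
    if (fifo->count == 0) {
        return false;                              /* empty */
    }
    *item = fifo->items[fifo->head];
    fifo->head = (fifo->head + 1) % INT_FIFO_CAPACITY;
    fifo->count--;
    return true;
}

/* Behavior test: it touches only the interface, so it would still pass
 * after a switch to a linked list. An assertion such as
 * "fifo.head == 1" would instead tie the test to this implementation. */
void FIFO_ShouldReturnOldestItem_WhenItemRemoved(void)
{
    IntFifo_t fifo;
    int item = 0;
    IntFifo_init(&fifo);                           /* Given: 10 then 20 added */
    IntFifo_add(&fifo, 10);
    IntFifo_add(&fifo, 20);

    bool was_removed = IntFifo_remove(&fifo, &item);   /* When */

    assert(was_removed);                           /* Then */
    assert(item == 10);
    assert(IntFifo_count(&fifo) == 1);
}
```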
If the interface doesn’t provide a way to check for evidence of behavior, you can use mocks to verify that the CUT called underlying dependencies as expected. For example, a function that sends data out a UART may return only a success or failure status. But it needs to interact with hardware, so using a mock version of the HAL allows you to check what was actually sent to the HAL interface.
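The UART case could be sketched as follows, using a hand-written mock rather than one generated by a mocking framework. The HAL function `hal_uart_write`, the CUT `send_frame`, the start byte, and the payload limit are all hypothetical names and values assumed for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical HAL function the CUT depends on. */
bool hal_uart_write(const uint8_t *data, size_t len);

#define FRAME_START 0x7Eu   /* illustrative framing byte */
#define MAX_PAYLOAD 16u

/* CUT: prefixes the payload with a start byte and sends it. Callers see
 * only a success or failure status. */
bool send_frame(const uint8_t *payload, size_t len)
{
    uint8_t frame[1 + MAX_PAYLOAD];
    if (len > MAX_PAYLOAD) {
        return false;
    }
    frame[0] = FRAME_START;
    memcpy(&frame[1], payload, len);
    return hal_uart_write(frame, len + 1);
}

/* Hand-written mock HAL: records each call so the test can inspect it. */
static uint8_t  mock_sent[1 + MAX_PAYLOAD];
static size_t   mock_sent_len;
static unsigned mock_call_count;

bool hal_uart_write(const uint8_t *data, size_t len)
{
    if (len > sizeof mock_sent) {
        return false;
    }
    mock_call_count++;
    mock_sent_len = len;
    memcpy(mock_sent, data, len);
    return true;
}

void SendFrame_ShouldWriteStartByteThenPayload(void)
{
    const uint8_t payload[]  = { 0x01, 0x02 };
    const uint8_t expected[] = { FRAME_START, 0x01, 0x02 };
    mock_call_count = 0;                              /* Given: a fresh mock */

    bool ok = send_frame(payload, sizeof payload);    /* When */

    assert(ok);                                       /* Then */
    assert(mock_call_count == 1);
    assert(mock_sent_len == sizeof expected);
    assert(memcmp(mock_sent, expected, sizeof expected) == 0);
}
```

The assertions on the mock check what crossed the HAL interface, which is still observable behavior of the CUT rather than its internal details.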
If you find that you have to fix tests when you make changes to the implementation, that’s a signal that you’re testing implementation. The only time you should have to fix tests is when you change the interface, and that should be limited in scope to the specific tests that use the part of the interface that has changed.
Test One Thing At A Time
Each test case should test only one specific behavior. Each one should be very focused on establishing initial conditions, triggering the CUT, and checking the results.
This makes each case standalone, without order dependencies, so that behaviors can be tested in isolation at fine granularity, and test failures will point to very specific parts of the CUT.
Test cases should not perform additional tests. In particular, avoid “run-on” tests, or the “one test that does it all”. This can produce a cascade of failures that are difficult to analyze for root cause.
Make Your Test Fast
Each test case should be very fast, and the entire suite should complete in seconds, or a minute at most. This enables a fast edit-build-run cycle for TDD.
Use fakes to achieve this, because one of the main consumers of run time is slow or long-running operations performed by dependencies, for example database or file operations, network communications, and physical device interactions. Timed delays are another; make these instantaneous by faking the passage of time.
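Faking the passage of time can be sketched like this: the CUT asks a time function for the current tick instead of sleeping, and the test supplies a fake clock that it advances instantly. The names `time_now_ms`, `Timeout_t`, `Timeout_start`, `Timeout_expired`, and `fake_time_advance` are assumptions for this example, not part of any particular RTOS or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical time dependency: milliseconds since boot. */
uint32_t time_now_ms(void);

/* CUT: a timeout that never sleeps; it only compares timestamps. */
typedef struct {
    uint32_t start_ms;
    uint32_t duration_ms;
} Timeout_t;

void Timeout_start(Timeout_t *t, uint32_t duration_ms)
{
    t->start_ms = time_now_ms();
    t->duration_ms = duration_ms;
}

bool Timeout_expired(const Timeout_t *t)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    return (uint32_t)(time_now_ms() - t->start_ms) >= t->duration_ms;
}

/* Fake clock: the test advances time instantly instead of waiting. */
static uint32_t fake_now_ms;

uint32_t time_now_ms(void) { return fake_now_ms; }

static void fake_time_advance(uint32_t ms) { fake_now_ms += ms; }

void Timeout_ShouldExpire_WhenDurationElapsed(void)
{
    Timeout_t t;
    Timeout_start(&t, 5000);        /* Given: a 5-second timeout */
    fake_time_advance(5000);        /* When: 5 "seconds" pass instantly */
    assert(Timeout_expired(&t));    /* Then */
}
```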
Follow The BDD Pattern
Behavior-Driven Development (BDD) is a way of structuring TDD test cases that embodies all the guidelines above.
BDD is a form of “specification by example,” using a simple pattern: given a set of initial conditions, when an event occurs, then there should be expected outcomes. Another way of stating this is given/should/when.
For example, given an empty FIFO, when an item is added to it, then it should be non-empty. Alternatively, given an empty FIFO, it should be non-empty when an item is added to it.
You can name test cases directly according to this pattern, so that the names read as a set of specifications for what should happen. For example, “GivenEmptyFIFO_WhenItemAdded_ThenShouldBeNonEmpty” or “EmptyFIFO_ShouldBeNonEmpty_WhenItemAdded”.
Then write the test case code to follow the pattern, setting up the initial conditions as the given, calling the CUT as the when, and asserting expectations as the then. For example:
    // Given
    IntFifo_t fifo;
    IntFifo_init(&fifo);

    // When
    bool was_added = IntFifo_add(&fifo, 1);

    // Then
    EXPECT_TRUE(was_added);
    EXPECT_FALSE(IntFifo_is_empty(&fifo));
There may be multiple steps to setting up the initial conditions, and multiple expectations asserted to verify the results, but the event being tested, the when, should be a single call to the CUT.
The tests should form an interlocking set, so that once you’ve tested a particular behavior, you don’t have to verify it in each test case. For example, the above assumes that an initialized FIFO is empty, so there should be a case for that, “FIFO_ShouldBeEmpty_WhenInitialized”.
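A sketch of two such interlocking cases follows. To keep it self-contained, it uses a stand-in FIFO that only counts items (an assumption made for this example, not a full implementation), and plain `assert` in place of a framework's expectation macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in FIFO with just enough behavior for these two cases; storage
 * of the items themselves is omitted. */
typedef struct { size_t count; } IntFifo_t;

void IntFifo_init(IntFifo_t *fifo)           { fifo->count = 0; }
bool IntFifo_is_empty(const IntFifo_t *fifo) { return fifo->count == 0; }
bool IntFifo_add(IntFifo_t *fifo, int item)
{
    (void)item;
    fifo->count++;
    return true;
}

/* Establishes the assumption that the next case relies on. */
void FIFO_ShouldBeEmpty_WhenInitialized(void)
{
    IntFifo_t fifo;                      /* Given */
    IntFifo_init(&fifo);                 /* When */
    assert(IntFifo_is_empty(&fifo));     /* Then */
}

/* Builds on the case above without re-checking initial emptiness. */
void EmptyFIFO_ShouldBeNonEmpty_WhenItemAdded(void)
{
    IntFifo_t fifo;
    IntFifo_init(&fifo);                       /* Given */
    bool was_added = IntFifo_add(&fifo, 1);    /* When */
    assert(was_added);                         /* Then */
    assert(!IntFifo_is_empty(&fifo));
}
```

Each case sets up its own fixture, so the two can run in any order.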
When one of these fine-grained tests fails, the intended behavior and specific failing CUT are clear.
For TDD on embedded systems, see the book Test Driven Development for Embedded C, by James Grenning.
For more on TDD that’s not specifically for embedded systems, see the book Modern C++ Programming with Test-Driven Development, by Jeff Langr.
For the original blog post on BDD, see https://dannorth.net/introducing-bdd/, by Dan North.
For working on existing codebases, see the book Working Effectively with Legacy Code, by Michael Feathers. His definition of “legacy code” is any code that doesn’t have unit tests.
Unit Testing Frameworks
There are several unit testing frameworks suitable for embedded systems:
- Unity (http://www.throwtheswitch.org/unity): For C only. Pairs with CMock, a companion tool that automatically generates mocks from header files.
- Google Test (https://google.github.io/googletest/): For C and C++. Includes Google Mock, which creates mocks with minimal manual effort.
- CppUTest (http://cpputest.github.io/): For C and C++. Includes CppUMock, which performs automated mocking.
You can use these on your development host for TDD, and in your Continuous Integration (CI) build server process. Builds should run all unit test suites; failing tests should fail the build.
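The mechanism that lets a failing test fail the build is usually the test program's exit status: the CI job treats a nonzero status as a failed step. The sketch below shows the idea with a hand-rolled runner; the `TestCase` type and the function names are hypothetical, and a real project would rely on the runner supplied by its chosen framework instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef bool (*test_fn)(void);

typedef struct {
    const char *name;
    test_fn     run;    /* returns true on pass */
} TestCase;

/* Placeholder cases; real ones would exercise the CUT. */
bool passing_case(void) { return true; }
bool failing_case(void) { return false; }

/* Runs every case, reports each result, and returns the failure count. */
int run_suite(const TestCase *cases, size_t n)
{
    int failures = 0;
    assert(cases != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool ok = cases[i].run();
        printf("%s: %s\n", ok ? "PASS" : "FAIL", cases[i].name);
        if (!ok) {
            failures++;
        }
    }
    return failures;
}

/* The value a test program's main() would return: 0 only if all passed. */
int suite_exit_code(const TestCase *cases, size_t n)
{
    return run_suite(cases, n) == 0 ? 0 : 1;
}
```

A test program's `main` would return `suite_exit_code(...)` for its table of cases, and the build script would stop when that program exits with a nonzero status.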
Unit testing uses small, fine-grained automated tests of specific behavior that can be run off-target to drive development of embedded system code, even when the hardware isn’t available. It provides fast feedback and confidence in the code so that you are always building from a working base.
Dojo Five can help you with unit testing. We bring our customers modern tools, techniques, and best practices from the web and mobile development environments, paired with leading-edge innovations in firmware, to help them build successful products and successful clients. Our talented engineers are on hand, ready to help you with all aspects of your EmbedOps journey. Bring your interesting problems that need solving; we are always happy to help out. You can reach out at any time on LinkedIn or through email!