Direct Memory Access (DMA) controllers can vastly improve performance on resource-limited systems, a description that fits most embedded systems.

As the demands on embedded systems keep growing, developers are constantly looking for ways to get more performance out of the system. The good news is that as embedded processors become more complex, DMA controllers are showing up on more and more of them, and those DMA controllers are becoming more powerful. If there’s a DMA controller in the system, consider where it can be used to off-load tasks from the processor. That controller is probably just sitting there idle because, well, everyone else probably overlooked it. There are substantial performance improvements to be had, right there in the open, and there’s no additional cost beyond some development time to explore the possibilities.


Direct Memory Access (DMA) controllers are often overlooked in embedded development, but when used correctly, they can improve the overall performance of the system. A DMA controller is a peripheral whose purpose is to move data from point A to point B. For many developers, DMA controllers bring to mind desktop computers or large computer systems, where they are best known for transferring data between hard disks and system memory. But they’re so much more useful than that.

In a nutshell, instead of having the processor move data from one place to another, the DMA controller is programmed to do the moving. It contains registers that tell it where the data is coming from, where the data is going, how much data to move, and when to move it. To move a couple of bytes, it’s quicker to have the processor just do the work. But to move blocks of data, the processor spends a few instructions configuring the DMA controller, and the controller then steps in and does all the work. Meanwhile, the processor is free to work on other tasks in parallel.
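To make the register description above concrete, here is a minimal sketch that runs on a host machine. The `dma_channel_t` layout, the `DMA_CTRL_START` and `DMA_STATUS_DONE` bits, and the `dma_sim_run` stand-in for the hardware are all hypothetical; real controllers differ in names, widths, and layout, so treat this as an illustration of the idea rather than any particular part's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block for one DMA channel (not a real part). */
typedef struct {
    const uint8_t *src;    /* where the data is coming from */
    uint8_t       *dst;    /* where the data is going */
    size_t         count;  /* how many bytes to move */
    uint32_t       ctrl;   /* when to move it: bit 0 starts the transfer */
    uint32_t       status; /* bit 0 is set when the transfer is complete */
} dma_channel_t;

#define DMA_CTRL_START  (1u << 0)
#define DMA_STATUS_DONE (1u << 0)

/* The few instructions the processor spends to set up a block transfer. */
static void dma_start_copy(dma_channel_t *ch, void *dst, const void *src,
                           size_t n)
{
    assert(dst != NULL && src != NULL);
    ch->src    = src;
    ch->dst    = dst;
    ch->count  = n;
    ch->status = 0;
    ch->ctrl   = DMA_CTRL_START;
}

/* Host-side stand-in for the hardware. On a real part this copy happens
 * in silicon while the processor is busy with other work. */
static void dma_sim_run(dma_channel_t *ch)
{
    if (ch->ctrl & DMA_CTRL_START) {
        memcpy(ch->dst, ch->src, ch->count);
        ch->ctrl   &= ~DMA_CTRL_START;
        ch->status |= DMA_STATUS_DONE;
    }
}
```

On real hardware the processor would call something like `dma_start_copy`, carry on with other tasks, and later either poll the done bit or take a completion interrupt.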

My first use of a DMA controller as an embedded developer was in a clone of an IBM Personal Computer XT. It was connected to a data-hungry printer, and if the data stream couldn’t keep up, the printed images would be corrupted. The original PC used a clunky, pieced-together DMA implementation that allowed the processor to prep the new data while the printer buzzed away. However, the DMA controller could only handle the lower 16 bits of the address; the upper bits were held in a separate static register. In addition, the addressing was page oriented, so page alignment, address wrapping, and overflow all had to be handled for transfers to work correctly.
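The page-wrapping problem comes down to a little arithmetic. The sketch below assumes a 20-bit physical address space, a 16-bit address counter, and therefore 64 KiB pages. The names `dma_chunk_t`, `dma_chunk_len`, and `dma_split` are made up for illustration, and the code only shows how a buffer might be split at page boundaries, not how the original controller was actually programmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_PAGE_SIZE  0x10000u   /* 16-bit address counter: 64 KiB pages */
#define DMA_ADDR_LIMIT 0x100000u  /* assumed 20-bit physical address space */

/* One page-safe piece of a larger transfer (hypothetical type). */
typedef struct {
    uint8_t  page;    /* upper address bits, held in the separate register */
    uint16_t offset;  /* lower 16 bits, loaded into the controller */
    uint32_t len;     /* bytes in this piece */
} dma_chunk_t;

/* Bytes that can move from addr before the 16-bit counter would wrap,
 * capped at len. */
static uint32_t dma_chunk_len(uint32_t addr, uint32_t len)
{
    uint32_t room = DMA_PAGE_SIZE - (addr & (DMA_PAGE_SIZE - 1u));
    return (len < room) ? len : room;
}

/* Split [addr, addr + len) into pieces that never cross a page boundary.
 * Returns the number of pieces, or -1 if out[] is too small. */
static int dma_split(uint32_t addr, uint32_t len, dma_chunk_t *out,
                     int max_chunks)
{
    assert(out != NULL);
    assert(addr < DMA_ADDR_LIMIT && len <= DMA_ADDR_LIMIT - addr);
    int n = 0;
    while (len > 0) {
        if (n == max_chunks) {
            return -1;
        }
        uint32_t step = dma_chunk_len(addr, len);
        out[n].page   = (uint8_t)(addr >> 16);
        out[n].offset = (uint16_t)(addr & 0xFFFFu);
        out[n].len    = step;
        addr += step;
        len  -= step;
        n++;
    }
    return n;
}
```

For example, a 0x200-byte buffer starting at 0x1FF00 has to be sent as two transfers, one finishing page 1 and one starting page 2. The alternative is to allocate buffers so they never straddle a boundary in the first place.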

Modern Implementations

DMA controllers have come a long way. A great example is the DMA controller in Texas Instruments’ TMS320 family. Many aspects of the transfers are programmable, which provides a tremendous amount of control. Rather than just defining a transfer size and an address increment or decrement after each transfer, the source and destination addresses can be modified independently. And multiple, different requests can be chained together. Programmed correctly, it can not only move data from a peripheral or memory to another memory location, but can also perform a BitBlt* or rotate asymmetric data matrices.

[*BitBlt is a technique used to copy a rectangle’s worth of graphical 2D image data from a source buffer to a destination buffer. It is typically used to copy a rectangle of graphical data from one part of the display to another. The rectangle typically has a different height and width than the source buffer or the destination buffer, which makes for some simple but interesting math.]
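The "simple but interesting math" in a BitBlt is mostly stride arithmetic: after each row, the source and destination pointers advance by the width of their own buffers, not by the width of the rectangle. The sketch below does this in software, assuming one byte per pixel; `blit_rect` is a hypothetical helper, not a TI API. A DMA controller with independently programmable source and destination address steps can perform the same walk without the processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height rectangle, assuming one byte per pixel.
 * src and dst point at the top-left pixel of the rectangle in each buffer;
 * the strides are the full row widths of the source and destination. */
static void blit_rect(uint8_t *dst, size_t dst_stride,
                      const uint8_t *src, size_t src_stride,
                      size_t width, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(width <= src_stride && width <= dst_stride);
    for (size_t row = 0; row < height; row++) {
        memcpy(dst, src, width);  /* one row of the rectangle */
        src += src_stride;        /* step to the next source row */
        dst += dst_stride;        /* step to the next destination row */
    }
}
```

To copy a rectangle whose top-left corner is at (x, y) in a buffer with row width `stride`, pass `buffer + y * stride + x` as the starting pointer.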

Nordic Semiconductor has another great DMA implementation. One of the agonizing issues with low-level communications like SPI buses is deciding what to do between the bytes of a multibyte transfer. You can tie up the processor by polling for those few microseconds per byte, or you can create an interrupt-driven design that typically incurs as much overhead, if not more, handling the interrupts to keep the data moving. Or you can call a sleep function to transfer control to a separate thread, which will significantly slow down the throughput. Large transfers to storage devices or displays really exacerbate the problem.

Nordic solved the problem by routing the standard SPI input and output through the DMA controller on their SPIM device. Setting up the DMA transfer is very simple, much simpler than on most DMA controllers. Put the whole message in a buffer, and the DMA controller takes care of the rest. Once the whole message has been sent, an interrupt occurs. In fact, I don’t know if there’s direct program access to the SPI transmit and receive buffers; they’re not mentioned in the datasheet. Nordic offers a similar implementation for their UART (UARTE) and I2C (TWIM) interfaces.
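The buffer-handoff style described above might look roughly like the following. This is a host-side sketch only: `spi_dma_t`, `spi_dma_begin`, and `spi_dma_sim_loopback` are hypothetical names, not Nordic's registers or driver API, and the simulated peripheral simply loops transmitted bytes back as if MOSI were wired to MISO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DMA-fed SPI master (not a real register map). */
typedef struct {
    const uint8_t *tx_ptr;   /* whole outgoing message */
    size_t         tx_len;
    uint8_t       *rx_ptr;   /* where incoming bytes are stored */
    size_t         rx_len;
    bool           start;    /* written by software to kick off the transfer */
    bool           end_event;/* set by the peripheral when the message is done */
} spi_dma_t;

/* Hand the whole message to the peripheral in one step. */
static void spi_dma_begin(spi_dma_t *spi, const uint8_t *tx, size_t tx_len,
                          uint8_t *rx, size_t rx_len)
{
    assert(tx != NULL || tx_len == 0);
    assert(rx != NULL || rx_len == 0);
    spi->tx_ptr    = tx;
    spi->tx_len    = tx_len;
    spi->rx_ptr    = rx;
    spi->rx_len    = rx_len;
    spi->end_event = false;
    spi->start     = true;
}

/* Host-side stand-in for the hardware: clocks out every byte and loops it
 * back into the receive buffer, then raises the end event once. */
static void spi_dma_sim_loopback(spi_dma_t *spi)
{
    if (!spi->start) {
        return;
    }
    size_t n = (spi->tx_len > spi->rx_len) ? spi->tx_len : spi->rx_len;
    for (size_t i = 0; i < n; i++) {
        /* 0xFF is an assumed filler once the transmit buffer runs out. */
        uint8_t out = (i < spi->tx_len) ? spi->tx_ptr[i] : 0xFFu;
        if (i < spi->rx_len) {
            spi->rx_ptr[i] = out;
        }
    }
    spi->start     = false;
    spi->end_event = true;
}
```

The point of the design is that software touches the peripheral once per message rather than once per byte, so there is nothing to poll and only one interrupt to service.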

Even less-fancy DMA controllers can be used to perform amazing feats. Using a basic DMA controller and a SPI controller, I was able to generate large digital waveforms that met some tight timing requirements. In another project, a necessary peripheral wasn’t available for communications. Software bit banging was replaced with a DMA controller and a GPIO interface, creating waveforms worthy of the nonexistent peripheral, and meeting all the necessary hard real-time timing requirements, but without tying up the processor.
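One way to picture the DMA-plus-GPIO approach is as a table of port values that the DMA controller copies to the GPIO output register, one entry per timer tick. The sketch below builds such a table for a single 8N1 UART frame. The choice of UART, the function name `uart_frame_to_samples`, and the one-sample-per-bit layout are assumptions made for illustration; the projects described above may have worked quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_FRAME_BITS 10u  /* 1 start bit + 8 data bits + 1 stop bit */

/* Fill out[] with one GPIO port value per bit time for an 8N1 frame on the
 * pin selected by pin_mask. Returns the number of samples written, or 0 if
 * the buffer is too small. */
static size_t uart_frame_to_samples(uint8_t byte, uint32_t pin_mask,
                                    uint32_t *out, size_t out_len)
{
    assert(out != NULL);
    if (out_len < UART_FRAME_BITS) {
        return 0;
    }
    size_t n = 0;
    out[n++] = 0;                                       /* start bit: low */
    for (unsigned bit = 0; bit < 8; bit++) {
        out[n++] = ((byte >> bit) & 1u) ? pin_mask : 0; /* data, LSB first */
    }
    out[n++] = pin_mask;                                /* stop bit: high */
    return n;
}
```

With a timer set to the bit period triggering one DMA transfer per tick, the processor only has to fill the table and start the transfer. Note that writing whole port values would also drive every other pin on the port; on parts that have separate set and clear registers, those are usually a better target.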

If you’d like to learn more about the potential for DMA controllers in your next project or have an exciting implementation to share, we are always happy to hear about cool projects or interesting problems to solve, so don’t hesitate to reach out and chat with us on LinkedIn or through email!
