Introduction to Serial Framing Formats

Introduction

Serial communication (UART, RS-232, SPI, I2C, etc.) provides the ability to transfer raw data between devices in embedded systems. There are a number of challenges to consider when developing or integrating a serial communication solution in your system, and they impact both performance and reliability. Does your device know when a frame of data starts and ends? Can your device handle data synchronization in a noisy environment? Serial framing formats help solve some of these challenges.

What are Serial Framing Formats?

Serial framing formats are methods to structure raw data into recognizable messages, ensuring that devices can interpret, synchronize and recover from errors efficiently. Without proper framing, serial communication can become ambiguous, unreliable and prone to data corruption. The following are common Serial Framing formats and specific examples for each with their tradeoffs and applications.

Common Serial Framing Formats

High Bit Set Algorithms (9-bit Serial Framing, MIDI)

High Bit Set algorithms use the Most Significant Bit (MSB) to indicate special control or address bytes. When the MSB is equal to 1, the byte is a control or frame start byte, while regular data bytes have the MSB set to 0. This method is effective in low-bandwidth communications because the receiver can classify each byte by testing a single bit, which leaves more processing time for the rest of the application.

Musical Instrument Digital Interface (MIDI) is an example of High Bit Set implementation where the 8th bit is used to denote control bytes or data bytes while the remaining 7 bits are data. In contrast, 9-bit serial framing (used on multidrop networks) uses the 9th bit to differentiate between addresses and data. In this scheme, multiple slaves are constantly listening for an address byte from a master. An address byte, denoted by the 9th bit set to 1, specifies the device that should receive subsequent data bytes. Data bytes are denoted by the 9th bit set to 0. All devices keep track of the current address and ignore data not addressed to them.
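The MSB rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a MIDI parser: the function name `split_messages` and the choice to drop data bytes that arrive before the first control byte are assumptions made for the example.

```python
def split_messages(stream: bytes) -> list[tuple[int, bytes]]:
    """Group a byte stream into (control_byte, data_bytes) messages.

    Hypothetical helper: a byte with the MSB set starts a new message,
    and bytes with the MSB clear are data for the current message.
    """
    messages = []
    control = None
    data = bytearray()
    for byte in stream:
        if byte & 0x80:
            # MSB set: close the previous message and start a new one.
            if control is not None:
                messages.append((control, bytes(data)))
            control, data = byte, bytearray()
        elif control is not None:
            # MSB clear: 7 bits of data for the current message.
            data.append(byte)
        # Data seen before any control byte is dropped (not yet synchronized).
    if control is not None:
        messages.append((control, bytes(data)))
    return messages


if __name__ == "__main__":
    raw = bytes([0x40, 0x90, 0x3C, 0x7F, 0x80, 0x3C, 0x00])
    for control, data in split_messages(raw):
        print(hex(control), data.hex())
```

Because every byte announces its own role, a receiver that joins mid-stream resynchronizes at the next byte with the MSB set.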

Advantages:

  • Simple and easy to parse as every frame is self-contained
  • Works well for addressing in multi-drop networks

Disadvantages:

  • Not efficient for binary data as it reduces available data bits per byte (e.g. 7 bits of data per byte in the 8-bit version)
  • Hardware support for 9-bit modes is limited (not all UARTs support it)
  • No built-in error detection, susceptible to frame misinterpretation
  • Does not support variable-length frames

Time-Based Framing (Modbus RTU, other UART-based protocols)

In Time-Based Framing, frames are separated by a predefined idle time between the end of one transmission and the start of the next. In other words, a frame is considered complete if no new data arrives within a given time window. This method is particularly useful if explicit frame delimiters add too much overhead for the system. It is especially effective when the timing gaps between messages naturally separate them. Time-Based Framing was particularly prevalent in low-cost and legacy systems that lacked the processing power to handle more complex framing formats.

Modbus RTU (Remote Terminal Unit) is a good example of Time-Based Framing implementation. Modbus RTU data frames are separated by an idle period. This period must be at least the transmission time of 3.5 characters. This avoids the need for additional control characters which translates to higher data efficiency. Modbus is used in systems using RS-485 serial buses for communication in industrial control and automation.
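As a rough sketch of the idle-gap rule, the Python below computes the 3.5-character time for a given baud rate and splits a list of timestamped bytes into frames. The names `idle_gap_seconds` and `split_on_idle`, the 11-bit character length, and the idea of feeding in (timestamp, byte) pairs are assumptions for illustration; a real driver would typically use a hardware timer or UART idle interrupt.

```python
def idle_gap_seconds(baud: int, bits_per_char: int = 11, chars: float = 3.5) -> float:
    """Idle time that separates frames, in seconds.

    Assumes 11 bits on the wire per character (start, 8 data, parity, stop).
    """
    return chars * bits_per_char / baud


def split_on_idle(events, gap: float) -> list[bytes]:
    """Split (timestamp_seconds, byte_value) pairs into frames.

    Simplification: the difference between consecutive timestamps is
    treated as the idle time on the line.
    """
    frames = []
    current = bytearray()
    last_time = None
    for timestamp, value in events:
        if current and last_time is not None and timestamp - last_time >= gap:
            # The line was quiet long enough: the previous frame is complete.
            frames.append(bytes(current))
            current = bytearray()
        current.append(value)
        last_time = timestamp
    if current:
        frames.append(bytes(current))
    return frames


if __name__ == "__main__":
    gap = idle_gap_seconds(9600)
    events = [(0.000, 1), (0.001, 2), (0.002, 3), (0.010, 4), (0.011, 5)]
    print(f"gap = {gap * 1000:.2f} ms", split_on_idle(events, gap))
```

The sketch also shows why timing jitter is a problem: if software latency delays a timestamp by more than the gap, one frame is silently split in two.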

Advantages:

  • Simple (no extra control characters) meaning no data overhead

Disadvantages:

  • Requires precise timing, so it is unreliable in systems with poor timing (such as systems with multiple layers of software running)
  • The mandatory idle gaps reduce throughput, which can matter in more complex applications
  • No built-in error detection, susceptible to frame misinterpretation

Character-Oriented Framing (BISYNC)

Character-Oriented (or Byte-Oriented) Framing is a method where data is encapsulated by special and predefined control characters (or byte sequences). These control characters provide simple control over the establishment of a connection and the transmission of data. This format was particularly prevalent in early serial communications and legacy systems that benefitted from human-readable characters. Control characters could also be used to provide synchronization methods for protocols.

Clear examples of Character-Oriented Framing are Modbus ASCII and Binary Synchronous Communication (BISYNC). Both have a predefined list of characters used to denote the beginning and end of a frame. In Modbus ASCII, a leading colon “:” denotes the start and a trailing carriage return/line feed (CR/LF) pair denotes the end. BISYNC has a longer list of control characters, such as SYN (alerts the receiver to an incoming frame), ACK0 or ACK1 (acknowledge good frames), EOT (sender termination), etc.
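The colon and CR/LF delimiters make frame extraction straightforward. The following is a minimal sketch that pulls Modbus ASCII style frames out of a buffer and converts the hex text to bytes; the function name `extract_ascii_frames` is hypothetical, and the sketch does not verify the checksum byte or handle timeouts.

```python
def extract_ascii_frames(stream: bytes) -> list[bytes]:
    """Return the decoded payload of every complete ':' ... CR/LF frame.

    Hypothetical helper: malformed frames are skipped, and an incomplete
    trailing frame is left for the caller to retry once more data arrives.
    """
    frames = []
    position = 0
    while True:
        start = stream.find(b":", position)
        if start == -1:
            break  # no further start delimiter
        end = stream.find(b"\r\n", start)
        if end == -1:
            break  # frame not finished yet
        hex_text = stream[start + 1:end]
        try:
            # Each payload byte travels as two ASCII hex characters.
            frames.append(bytes.fromhex(hex_text.decode("ascii")))
        except ValueError:
            pass  # not valid hex text: discard this frame
        position = end + 2
    return frames


if __name__ == "__main__":
    buffer = b"noise:010300000002FA\r\n:0103zz\r\n:0A0B"
    print([frame.hex() for frame in extract_ascii_frames(buffer)])
```

Note the cost of readability: each payload byte occupies two characters on the wire, in addition to the three delimiter characters per frame.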

Modern binary framing techniques such as byte-stuffing, bit-stuffing and length-prefix (covered later on in the article) proved much faster and more efficient than character-oriented formats.

Advantages:

  • Easier for humans to read and debug
  • Synchronization via control characters

Disadvantages:

  • Higher overhead due to the additional control characters
  • Not efficient in high-speed or real-time communications
  • Primarily legacy applications as it has been replaced by newer methods

Byte Stuffing Algorithms 

Another possible solution to breaking data up into discrete frames is to reserve special characters that signify the start/end of a frame and to byte-stuff the payload. When encoding data to be sent using this format, the data is scanned for any special characters, and if encountered, those special characters are replaced (or “stuffed”) with some new value. For example, a common delimiter in the Consistent Overhead Byte Stuffing (COBS) format is zero, so during encoding the algorithm replaces each zero with the offset to the next occurrence of zero in the data. The data is then free of zeros during transmission and is easily restored during decoding. For example, the data [0x00,0x00,0x00,(…repeating)] would be encoded as [0x01,0x01,0x01,0x01,(…repeating),0x00] using the COBS framing method. COBS is common on embedded platforms that require reliable data transmission with low overhead.

It should be noted that while average data may be encoded/decoded with relatively low overhead, the worst case depends on the scheme. In escape-based schemes, every instance of a special character in the original data requires an extra escape byte, so there can be up to 100% overhead when every byte is a special character. COBS, by contrast, bounds its overhead at roughly one extra byte per 254 bytes of data regardless of content.
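To make the COBS idea concrete, here is a minimal Python sketch of an encoder and decoder using zero as the delimiter. The function names `cobs_encode` and `cobs_decode` are chosen for this example, and the trailing 0x00 frame delimiter is left for the caller to append and strip.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so the result contains no zero bytes (delimiter not included)."""
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            # Code byte = distance to the zero that was just removed.
            out.append(len(block) + 1)
            out += block
            block = bytearray()
        else:
            block.append(byte)
            if len(block) == 254:
                # Full block: code 0xFF means "254 data bytes, no implied zero".
                out.append(0xFF)
                out += block
                block = bytearray()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Reverse cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    index = 0
    while index < len(encoded):
        code = encoded[index]
        block = encoded[index + 1:index + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS data")
        out += block
        index += code
        # Every block except a full one (0xFF) or the last one ends in a zero.
        if code < 0xFF and index < len(encoded):
            out.append(0)
    return bytes(out)


if __name__ == "__main__":
    frame = cobs_encode(bytes([0x11, 0x22, 0x00, 0x33])) + b"\x00"
    print(frame.hex(), cobs_decode(frame[:-1]).hex())
```

Since the encoded payload never contains zero, a receiver can resynchronize after noise simply by waiting for the next 0x00 on the line.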

Byte stuffing protocols can be found in various applications and tech stacks. For example, Point-to-Point Protocol (PPP), which is used in DSL internet connections, implements byte stuffing. You may recognize the acronym PPPoE or PPPoA if you have ever peeked at your modem’s network settings. Unlike COBS, the PPP format has a more robust data scheme that allows it to function effectively in a network environment.
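For comparison with COBS, the sketch below shows escape-based stuffing in the style PPP uses on asynchronous links: 0x7E marks frame boundaries, 0x7D is the escape byte, and an escaped byte is XORed with 0x20. The function names are hypothetical, and this simplified version escapes only the flag and escape bytes, not the additional control characters a full PPP implementation may escape.

```python
FLAG = 0x7E      # frame delimiter
ESC = 0x7D       # escape byte
XOR_MASK = 0x20  # applied to an escaped byte


def escape_frame(payload: bytes) -> bytes:
    """Wrap payload in FLAG bytes, escaping FLAG and ESC inside it."""
    out = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESC):
            out += bytes([ESC, byte ^ XOR_MASK])
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


def unescape_frame(frame: bytes) -> bytes:
    """Recover the payload from one complete FLAG ... FLAG frame."""
    if len(frame) < 2 or frame[0] != FLAG or frame[-1] != FLAG:
        raise ValueError("frame must start and end with FLAG")
    out = bytearray()
    escaped = False
    for byte in frame[1:-1]:
        if escaped:
            out.append(byte ^ XOR_MASK)
            escaped = False
        elif byte == ESC:
            escaped = True
        else:
            out.append(byte)
    if escaped:
        raise ValueError("frame ends in the middle of an escape sequence")
    return bytes(out)


if __name__ == "__main__":
    print(escape_frame(bytes([0x01, 0x7E, 0x7D])).hex())
    print(len(escape_frame(bytes([0x7E] * 10))))  # payload doubles in the worst case
```

A payload made entirely of flag bytes doubles in size, which is the 100% worst case that this style of stuffing can hit.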

Advantages:

  • Relatively simple to implement
  • Flexible allowing for async communications, various data types, and variable transmission speeds
  • Software based solutions are simple to port to new platforms

Disadvantages:

  • Some versions have a worst case of 100% data processing overhead as it is possible every byte could be a special character. 

Bit-Stuffing Algorithms (HDLC)

This approach tends to be more of a data line synchronization method than a full-on framing algorithm. Unlike byte stuffing, Bit-Stuffing formats work at the bit level and aim to eliminate sync loss when the data line does not have many transitions between one and zero. Some devices are especially susceptible to losing sync when too many ones in a row appear in a binary string, so High-Level Data Link Control (HDLC) solves this problem by inserting an artificial zero after every five consecutive ones. The receiver discards the stuffed zero, so synchronization is maintained in a fairly simple manner. Additionally, HDLC uses a special flag sequence (01111110) to mark the start and end of a data block. Because stuffing prevents six consecutive ones from appearing in the payload, the flag can never be mistaken for data, which enables both synchronization and data framing.
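The "zero after five ones" rule can be sketched on a list of bits. The function names `stuff_bits` and `unstuff_bits` are illustrative, and representing bits as Python integers is purely for readability; real implementations do this in hardware or with shift registers.

```python
def stuff_bits(bits: list[int]) -> list[int]:
    """Insert a 0 after every run of five consecutive 1 bits."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            out.append(0)  # stuffed bit, removed again by the receiver
            run = 0
    return out


def unstuff_bits(bits: list[int]) -> list[int]:
    """Remove the 0 that follows every run of five consecutive 1 bits."""
    out = []
    run = 0
    skip_next = False
    for bit in bits:
        if skip_next:
            if bit != 0:
                # Six ones in a row can only be a flag or an error, not data.
                raise ValueError("expected a stuffed 0 after five 1 bits")
            skip_next = False
            run = 0
            continue
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 5:
            skip_next = True
    return out


if __name__ == "__main__":
    payload = [0] + [1] * 10 + [0]
    stuffed = stuff_bits(payload)
    print(stuffed)
    print(unstuff_bits(stuffed) == payload)
```

The per-bit loop is also a fair picture of why software bit stuffing is expensive compared with byte-oriented schemes.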

The HDLC format tends to be used in a network setting where multiple nodes may need to communicate with each other. HDLC is also designed to work in both synchronous and asynchronous networks making it a great choice when more flexibility is required on a serial network connection. It should be noted, however, unless bit stuffing is implemented at the hardware level, it can be quite expensive to add in software.

Advantages:

  • Very low overhead

Disadvantages:

  • Data must be processed one bit at a time, which is expensive without hardware support
  • Lacks data transfer confirmation

Length-Prefixed Formats 

When data is in a more complicated form (e.g. large data structures, dictionaries, JSON-like content), it can become quite cumbersome to attempt to encode this data for serial transmission. In cases like these, it may be beneficial to consider using Length-Prefixed formats as they are designed around data with variable sizes. Length-Prefixed formats simplify framing by including the length of the data at the beginning of the message, allowing the receiver to simply count bytes as it receives them.

Protocol Buffers (Protobuf), Google's popular serialization library, is often used to serialize/deserialize data into discrete chunks for TCP/IP sockets as well as serial communication. Protocol buffers provide a standard data interface, which makes them extremely useful when communicating between devices based on different operating systems. While Protobuf can support prefixed lengths in some instances, this isn’t always the case. When prefixed lengths are required all the time, Protobuf can be paired with gRPC, which prepends a length to the front of every Protobuf frame. It should be noted that Protobuf and gRPC are designed around operating-system-level use, so if a bare metal solution is needed, it may be better to go with a simple format like Type-Length-Value (TLV). While TLV is generally used in network-level protocols, it can easily be adapted to a bare metal embedded system.
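A TLV record is simple enough to sketch directly. The layout below (1-byte type, 2-byte big-endian length, then the value) and the names `tlv_encode` and `tlv_decode_stream` are assumptions for this example, since TLV field widths vary from protocol to protocol.

```python
import struct

HEADER = struct.Struct(">BH")  # assumed layout: 1-byte type, 2-byte big-endian length


def tlv_encode(tag: int, value: bytes) -> bytes:
    """Build one record: type, length, then the value bytes."""
    return HEADER.pack(tag, len(value)) + value


def tlv_decode_stream(buffer: bytes) -> tuple[list[tuple[int, bytes]], bytes]:
    """Decode every complete record; return (records, leftover_bytes).

    The leftover bytes are an incomplete record that the caller should keep
    and retry once more data has arrived.
    """
    records = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        tag, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # the value has not fully arrived yet
        records.append((tag, buffer[start:start + length]))
        offset = start + length
    return records, buffer[offset:]


if __name__ == "__main__":
    stream = tlv_encode(1, b"hi") + tlv_encode(2, b"") + tlv_encode(3, b"abcdef")[:5]
    records, leftover = tlv_decode_stream(stream)
    print(records, leftover)
```

The receiver never inspects the value bytes to find the frame boundary, so no escaping is needed. The tradeoff is that a corrupted length field desynchronizes everything that follows, which is one reason length-prefixed frames are usually paired with a checksum.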

Advantages:

  • Greater flexibility for complex data transfers
  • No need for special characters
  • Supports multiple data types

Disadvantages:

  • Requires receiving device to keep a count of data as it is clocked in

Error Detection

It is important to note that most serial framing formats do not inherently provide error detection. This is why it is essential to consider protocols or implementations which leverage both the serial framing formats and error detection techniques such as Cyclic Redundancy Check (CRC) or Checksums. Protocols and implementations such as HDLC, PPP, Modbus, etc., support CRC-based solutions for error detection. In cases such as COBS, which lacks an inherent error detection method, CRC can be added manually.
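As an illustration of adding a CRC manually, the sketch below computes a CRC-16 with the parameters used by Modbus (initial value 0xFFFF, reflected polynomial 0xA001) and appends it to a payload low byte first. The helper names `append_crc` and `check_crc` are hypothetical; the same pattern applies to a payload before it is COBS-encoded or length-prefixed.

```python
def crc16_modbus(data: bytes) -> int:
    """Bitwise CRC-16 with the Modbus parameters (init 0xFFFF, poly 0xA001 reflected)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def append_crc(payload: bytes) -> bytes:
    """Return payload followed by its CRC, low byte first."""
    return payload + crc16_modbus(payload).to_bytes(2, "little")


def check_crc(frame: bytes) -> bool:
    """True if the last two bytes match the CRC of everything before them."""
    if len(frame) < 3:
        return False
    received = int.from_bytes(frame[-2:], "little")
    return crc16_modbus(frame[:-2]) == received


if __name__ == "__main__":
    frame = append_crc(bytes([0x01, 0x03, 0x00, 0x00, 0x00, 0x0A]))
    print(frame.hex(), check_crc(frame))
    corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(check_crc(corrupted))
```

A bitwise loop keeps the example short; table-driven or hardware CRC units are the usual choice when throughput matters.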

Summary Table

| Format | Data overhead | Processing overhead | Complexity | Synchronization | Special considerations |
| --- | --- | --- | --- | --- | --- |
| High Bit Set | Negligible | Negligible | Simple | No | 9-bit mode not supported by all hardware |
| Time Based | Negligible | Negligible | Simple | Yes | Precise time between devices required |
| Character Oriented | High | Medium | Depends on specific implementation | Depends on specific implementation | Both devices must use same char encoding |
| Byte Stuffing | Implementation dependent (COBS = Low, PPP = High) | Average = Low, Worst = High | Medium | Yes | Devices must agree on special char |
| Bit Stuffing | Variable with max 20% overhead | Average = Low, Worst = High | Medium | Yes | Devices must process stuffed bits identically |
| Length Prefixed | Low | Low | Simple | No | Receiving device must handle dynamic data sizes |

Conclusion

As you can see, there are many different ways to break up data being sent over a Serial connection. Serial Framing Formats are able to provide a much needed solution to the question “How do I know when my Serial data should start/stop?”. This leads to much more reliable data transfers by providing predictable boundaries around data. The framing method chosen should take into account the pros and cons of what is available as well as what makes the most sense for a given project.

Modernize Your Serial Communication with Dojo Five

If you are in need of additional assistance with Serial protocols, framing formats or have something else you would like to discuss, please feel free to reach out! Dojo Five is dedicated to helping you or your team learn, and would love to ensure your current and future projects are a success. You can book a call with us to get the conversation started. Or if you’re into DevOps, you can sign up for our EmbedOps platform. We look forward to hearing from you!
