The Curious Case Where the Compiler Was Wrong

This is the story of an interesting project issue involving compiler optimizations, inline assembly, bootloaders, and all manner of other mayhem. Understanding how optimizations affect your code, especially low-level code, is critical for improving your embedded firmware game.

Consider the following code from the Nordic bootloader:

__STATIC_INLINE void jump_to_addr(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
{
    __ASM volatile ("MSR MSP, %[arg]" : : [arg] "r" (new_msp));
    __ASM volatile ("MOV LR,  %[arg]" : : [arg] "r" (new_lr) : "lr");
    __ASM volatile ("BX       %[arg]" : : [arg] "r" (addr));
}

This code implements the very last steps of the bootloader before it jumps to the application.

Step 1: Set up the MSP register (“Main Stack Pointer”) with the application's initial stack pointer, which is the top of its stack (the Cortex-M stack grows downward).

Step 2: Set up the LR register (“Link Register”) with the return address (in this case a dummy value, since we never expect the main app to return to the bootloader).

Step 3: JUMP! Branch to the address of our application code and be on our way!
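Where do new_msp and addr come from? On Cortex-M, the first word of an image's vector table holds its initial stack pointer and the second holds its reset handler address, so a bootloader typically reads both from the start of the application image. The sketch below assumes that layout. read_app_entry, app_entry_looks_valid, and the alignment check are hypothetical illustrations, not Nordic SDK functions, and a plain array stands in for flash so the code also runs on a desktop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type: the two values the jump needs. */
typedef struct {
    uint32_t initial_msp;   /* vector table word 0: value for MSP (step 1) */
    uint32_t reset_handler; /* vector table word 1: branch target (step 3) */
} app_entry_t;

/* Hypothetical helper: read the entry values from an image's vector table. */
static app_entry_t read_app_entry(const uint32_t *vector_table)
{
    app_entry_t entry = {
        .initial_msp = vector_table[0],
        .reset_handler = vector_table[1],
    };
    return entry;
}

/* Hypothetical sanity check before jumping:
 * - Cortex-M executes Thumb code only, so a BX target must have bit 0 set;
 *   branching to an even address faults instead of running the app.
 * - The AAPCS expects an 8-byte aligned stack at function entry. */
static bool app_entry_looks_valid(app_entry_t entry)
{
    bool thumb_bit_set = (entry.reset_handler & 1u) != 0u;
    bool stack_aligned = (entry.initial_msp & 7u) == 0u;
    return thumb_bit_set && stack_aligned;
}
```

On target, the pointer passed to read_app_entry would be the application's start address in flash cast to const uint32_t *, and the two fields would become new_msp and addr.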

(Image: On your way to bootloader/application jump happiness)

We had compiled this code into a new project and were experiencing random reboots and flaky behavior.

Here is the assembly generated for this function when it works correctly, with compiler optimizations enabled:

    B500        push {lr}
    F3808808    msr msp, r0
    468E        mov lr, r1
    4710        bx r2
    F85DFB04    pop.w {pc}

According to the ARM ABI's procedure call standard (AAPCS, see section 6.1.1), the first three arguments are passed in r0, r1, and r2. From there, the sequence is straightforward:

Step 1: Move r0 into msp.

Step 2: Move r1 into lr.

Step 3: JUMP! Branch to r2.

Everything looks great, no issues. This code works fine.

Now for the surprising part. Say you want to debug the bootloader. You turn compiler optimizations off and enable debug symbols (with GCC, for example, -O0 -g) so you can follow the code. Now take a look at the assembly:

    B500        push {lr}
    B085        sub sp, sp, #20
    9003        str r0, [sp, #12]
    9102        str r1, [sp, #8]
    9201        str r2, [sp, #4]
    9B03        ldr r3, [sp, #12]
    F3838808    msr msp, r3
    9B02        ldr r3, [sp, #8]
    469E        mov lr, r3
    9B01        ldr r3, [sp, #4]
    4718        bx r3
    BF00        nop
    B005        add sp, sp, #20
    F85DFB04    pop.w {pc}

See the problem yet? With optimizations off, the compiler no longer works with the argument registers directly. Instead, it spills the parameters to the stack and reloads each one into r3 just before it is used.

The three str instructions save the parameters to the stack at offsets relative to sp. After that, the intended sequence is:

Step 1: Load the new stack pointer into MSP.

Step 2: Load the new link register into LR.

Step 3: Load the jump address and execute the jump with r3.

Except that’s not really what happens…

Here’s what really happens:

Step 1: Load the new stack pointer into MSP.

Step 2: Load garbage into the LR.

Step 3: Load garbage and execute a jump to a garbage address.

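In step 1, the stack pointer itself is replaced. The bootloader runs on the main stack, so sp is MSP, and writing MSP moves sp away from the frame where the remaining parameters were saved. The following ldr r3, [sp, #8] and ldr r3, [sp, #4] therefore read memory relative to the application's initial stack pointer instead of the bootloader's frame. Ugh…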
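To make the failure concrete, here is a toy host-side model (not real hardware): RAM is a small array of words, sp is a byte offset into it, and RAM starts out filled with a stale 0xA5A5A5A5 pattern. The hypothetical functions jump_optimized and jump_unoptimized replay the two listings above instruction by instruction. The toy puts the application's stack top at byte 128 so every read stays in bounds; on real hardware those reads would land just above the top of the application's stack.

```c
#include <stdint.h>
#include <string.h>

/* Toy model only: addresses are byte offsets into ram[], and sp stands in
 * for SP, which is MSP while the bootloader runs. */
#define TOY_RAM_WORDS 64u
#define TOY_STALE_WORD 0xA5A5A5A5u

typedef struct {
    uint32_t ram[TOY_RAM_WORDS];
    uint32_t sp;
    uint32_t lr;
    uint32_t pc;
} toy_cpu_t;

static void toy_cpu_init(toy_cpu_t *cpu)
{
    memset(cpu->ram, 0xA5, sizeof cpu->ram); /* every word reads as TOY_STALE_WORD */
    cpu->sp = TOY_RAM_WORDS * 4u;            /* descending stack starts at the top */
    cpu->lr = 0u;
    cpu->pc = 0u;
}

/* Out-of-range accesses read as 0 (loads) or are ignored (stores) in this toy. */
static uint32_t toy_load(const toy_cpu_t *cpu, uint32_t addr)
{
    return (addr / 4u < TOY_RAM_WORDS) ? cpu->ram[addr / 4u] : 0u;
}

static void toy_store(toy_cpu_t *cpu, uint32_t addr, uint32_t value)
{
    if (addr / 4u < TOY_RAM_WORDS) {
        cpu->ram[addr / 4u] = value;
    }
}

/* Optimized listing: the arguments never leave r0-r2.
 * (The leading push {lr} is skipped; MSP is overwritten right after it.) */
static void jump_optimized(toy_cpu_t *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    cpu->sp = r0; /* msr msp, r0 */
    cpu->lr = r1; /* mov lr, r1 */
    cpu->pc = r2; /* bx r2 */
}

/* Unoptimized listing: spill r0-r2 to the stack, then reload each through sp. */
static void jump_unoptimized(toy_cpu_t *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    cpu->sp -= 4u;                          /* push {lr} */
    toy_store(cpu, cpu->sp, cpu->lr);
    cpu->sp -= 20u;                         /* sub sp, sp, #20 */
    toy_store(cpu, cpu->sp + 12u, r0);      /* str r0, [sp, #12] */
    toy_store(cpu, cpu->sp + 8u, r1);       /* str r1, [sp, #8] */
    toy_store(cpu, cpu->sp + 4u, r2);       /* str r2, [sp, #4] */
    cpu->sp = toy_load(cpu, cpu->sp + 12u); /* ldr r3, [sp, #12]; msr msp, r3 */
    cpu->lr = toy_load(cpu, cpu->sp + 8u);  /* ldr r3, [sp, #8]: sp has already moved */
    cpu->pc = toy_load(cpu, cpu->sp + 4u);  /* ldr r3, [sp, #4]; bx r3 */
}
```

Called with a stack top of 128, a dummy LR of 0xFFFFFFFF, and a hypothetical entry address of 0x00027001, jump_optimized ends with exactly those values in lr and pc. jump_unoptimized ends with 0xA5A5A5A5 in both, because its last two loads read stale RAM next to the new stack pointer.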

(Image: Yucky invalid stack pointers make for a rotten day)

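There are likely ways to get GCC to avoid this issue, for example by performing all three steps in a single asm statement so that every operand is already in a register before MSP changes. Our approach for now is simply to always build the Nordic bootloader with optimizations turned on.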
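One way the problem could be avoided, sketched here assuming GCC-style inline assembly on an M-profile Arm core, is to issue all three instructions from one asm statement. The compiler must then load every input operand into a register before the statement begins, so nothing is reloaded through sp after MSP changes, regardless of optimization level. jump_to_addr_single_asm is a hypothetical name, and the host stub in the #else branch (also hypothetical) exists only so the file builds off-target. Treat this as a starting point to verify on your own toolchain, not as Nordic's implementation.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* One asm statement: all operands are in registers before MSR runs, and
 * "lr" is clobbered, so no input can be allocated to LR. */
__attribute__((noreturn)) static inline void
jump_to_addr_single_asm(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
{
    __asm volatile(
        "MSR MSP, %[msp]\n\t" /* step 1: switch to the application's stack */
        "MOV LR, %[lr]\n\t"   /* step 2: dummy return address */
        "BX %[addr]"          /* step 3: jump (addr must have bit 0 set) */
        :
        : [msp] "r"(new_msp), [lr] "r"(new_lr), [addr] "r"(addr)
        : "lr", "memory");
    __builtin_unreachable();
}
#else
/* Host stub: records what the real function would load instead of jumping. */
static uint32_t stub_msp, stub_lr, stub_target;

static void jump_to_addr_single_asm(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
{
    stub_msp = new_msp;
    stub_lr = new_lr;
    stub_target = addr;
}
#endif
```

Another commonly used approach is GCC's naked function attribute, which suppresses the prologue and epilogue entirely. The single-statement version keeps the function an ordinary C function and relies only on the inline-asm operand rules.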

If you have questions about an embedded project you’re working on, Dojo Five can help you with all aspects of your EmbedOps journey! We are always happy to hear about cool projects or interesting problems to solve, so don’t hesitate to reach out and chat with us on LinkedIn or by email!
