The Curious Case Where the Compiler Was Wrong

This is the story of an interesting project issue involving compiler optimizations, inline assembly, bootloaders, and all manner of other mayhem. Learning how optimizations affect code, especially low-level code, is critical for improving your embedded firmware game.
Consider the following code from the Nordic bootloader:
__STATIC_INLINE void jump_to_addr(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
{
    __ASM volatile ("MSR MSP, %[arg]" : : [arg] "r" (new_msp));       /* Main Stack Pointer = new_msp */
    __ASM volatile ("MOV LR, %[arg]" : : [arg] "r" (new_lr) : "lr"); /* Link Register = new_lr       */
    __ASM volatile ("BX %[arg]" : : [arg] "r" (addr));                /* branch to the application    */
}
This code implements the very last steps of the bootloader before it jumps to the application.
Step 1: Set up the MSP (Main Stack Pointer) register with the application's initial stack pointer, i.e. the top of its stack.
Step 2: Set up the LR (Link Register) with the return address (here a dummy value, since we never expect the main app to return to the bootloader).
Step 3: JUMP! Branch to the address of our application code and be on our way!
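Where do new_msp and addr usually come from? On Cortex-M parts, word 0 of an application's vector table holds its initial stack pointer and word 1 holds the address of its reset handler, with bit 0 set because BX uses that bit to select Thumb state (the only state Cortex-M executes). The sketch below models that lookup on a host, with a plain array standing in for flash. The app_entry type, the function names, and the example addresses are hypothetical illustrations, not Nordic SDK APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: the two values a Cortex-M bootloader hands to jump_to_addr(). */
typedef struct {
    uint32_t initial_msp;   /* vector table word 0: initial Main Stack Pointer */
    uint32_t reset_handler; /* vector table word 1: application entry point    */
} app_entry;

/* On a target, vector_table would point at the application's base address in
 * flash; in this host-side sketch it is any array of at least two words. */
static app_entry read_app_entry(const uint32_t *vector_table)
{
    assert(vector_table != NULL);
    app_entry entry = { vector_table[0], vector_table[1] };
    return entry;
}

/* BX reads bit 0 of the target address to pick the instruction set. Cortex-M
 * only executes Thumb code, so a usable reset handler address has bit 0 set. */
static bool is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0u;
}
```

On a real target the call would then look roughly like jump_to_addr(entry.initial_msp, 0xFFFFFFFF, entry.reset_handler). Using 0xFFFFFFFF as the dummy LR is an assumption here; it simply mirrors the value LR holds after a Cortex-M reset.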
We had compiled this code into a new project and were experiencing random reboots and flaky behavior.
Check out the assembly generated for this function when it works correctly:
B500 push {lr}
F3808808 msr msp, r0
468E mov lr, r1
4710 bx r2
F85DFB04 pop.w {pc}
According to the ARM ABI (see section 6.1.1), the three parameters arrive in r0, r1, and r2, and the operation is straightforward:
Step 1: Move r0 into msp.
Step 2: Move r1 into lr.
Step 3: JUMP! Branch to r2.
Everything looks great: no issues, and the code works fine.
Now for the surprising part. Let’s say you want to debug the bootloader. You turn compiler optimizations OFF and add debug symbols so you can follow the code. Now take a look at the assembly:
B500 push {lr}
B085 sub sp, sp, #20
9003 str r0, [sp, #12]
9102 str r1, [sp, #8]
9201 str r2, [sp, #4]
9B03 ldr r3, [sp, #12]
F3838808 msr msp, r3
9B02 ldr r3, [sp, #8]
469E mov lr, r3
9B01 ldr r3, [sp, #4]
4718 bx r3
BF00 nop
B005 add sp, sp, #20
F85DFB04 pop.w {pc}
See the problem yet? With optimizations turned off, the compiler no longer works from the argument registers directly. Instead, it spills the parameters to the stack and reloads each one, relative to sp, just before it is used.
The first chunk of str instructions places the parameters on the stack. Then the following happens:
Step 1: Load the new stack pointer into MSP.
Step 2: Load the new link register into LR.
Step 3: Load the jump address and execute the jump with r3.
Except that’s not really what happens…
Here’s what really happens:
Step 1: Load the new stack pointer into MSP.
Step 2: Load garbage into the LR.
Step 3: Load garbage and execute a jump to a garbage address.
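In step 1, the stack pointer, which is the base address for every remaining parameter load, is blown away. The bootloader is running on the main stack, so writing MSP changes sp itself. The following ldr instructions still read [sp, #8] and [sp, #4], but those offsets are now relative to the application's new stack pointer, so they return whatever happens to be stored there instead of the saved parameters. Ugh…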
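To make the failure concrete, here is a host-runnable toy model in C that replays the two listings instruction by instruction. Everything in it is hypothetical: "RAM" is just an array of 32-bit words, SP/MSP is a word index into it, and TOY_FILL stands in for whatever garbage was already in memory. It is not an emulator, only a sketch of why reloading arguments sp-relative after MSP changes goes wrong.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: RAM is an array of words, SP is a word index. */
#define TOY_RAM_WORDS 64u
#define TOY_FILL      0xDEADBEEFu /* stands in for "whatever was in RAM" */

typedef struct {
    uint32_t ram[TOY_RAM_WORDS];
    uint32_t sp; /* word index; writing MSP writes this */
    uint32_t lr;
    uint32_t pc; /* target of the final BX */
} toy_cpu;

static void toy_reset(toy_cpu *cpu, uint32_t sp)
{
    for (uint32_t i = 0; i < TOY_RAM_WORDS; i++) {
        cpu->ram[i] = TOY_FILL;
    }
    cpu->sp = sp;
    cpu->lr = 0;
    cpu->pc = 0;
}

/* Optimized listing: arguments stay in r0-r2, so changing MSP is harmless. */
static void toy_jump_optimized(toy_cpu *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    cpu->sp = r0; /* msr msp, r0 */
    cpu->lr = r1; /* mov lr, r1  */
    cpu->pc = r2; /* bx r2       */
}

/* -O0 listing: spill r0-r2 to the stack, then reload each one sp-relative
 * just before use. The push {lr} is omitted for brevity. */
static void toy_jump_unoptimized(toy_cpu *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    assert(cpu->sp >= 5u && cpu->sp <= TOY_RAM_WORDS);
    cpu->sp -= 5u;                   /* sub sp, sp, #20 (5 words)       */
    cpu->ram[cpu->sp + 3] = r0;      /* str r0, [sp, #12]               */
    cpu->ram[cpu->sp + 2] = r1;      /* str r1, [sp, #8]                */
    cpu->ram[cpu->sp + 1] = r2;      /* str r2, [sp, #4]                */
    cpu->sp = cpu->ram[cpu->sp + 3]; /* ldr r3, [sp, #12]; msr msp, r3  */
    assert(cpu->sp < TOY_RAM_WORDS - 2u);
    cpu->lr = cpu->ram[cpu->sp + 2]; /* ldr r3, [sp, #8]: uses the NEW sp */
    cpu->pc = cpu->ram[cpu->sp + 1]; /* ldr r3, [sp, #4]; bx r3           */
}
```

Starting from an SP of 32 and passing a new MSP of 8, the optimized version ends with lr and pc equal to the arguments, while the unoptimized version ends with both equal to TOY_FILL: the values it reloads come from the new stack region, not from the spilled parameters.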

Yucky invalid stack pointers make for a rotten day.
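There are likely ways to tell gcc what this code really requires, for example by putting all three instructions in a single asm statement so that every value is already in a register before MSP changes, or by writing the function in pure assembly. Our approach for now is to simply always build the Nordic bootloader with optimizations turned on.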
And if you have questions about an embedded project you’re working on, Dojo Five can help you with all aspects of your EmbedOps journey! We are always happy to hear about cool projects or interesting problems to solve, so don’t hesitate to reach out and chat with us on LinkedIn or through email!
Are you leaving out the k constraint when modifying msp on purpose? I’m curious if adding an output constraint might help here.
That may require using a local register variable and a temp variable located in msp to work.
Also, could you assign the lr register first and then update the msp? That seems like it would ensure lr is loaded from the incoming stacked parameters before the stack pointer moves.
Agree this is quirky, but not as bad as compiler bugs I’ve seen where the code produced is patently wrong.
FWIW, this function is so small, why bother writing it in C and then using asm? Why not just extern a pure asm version of it so it’s callable from C?
I totally agree with you Cullen! There are any number of ways this could be fixed to give the compiler clues as to what is really intended and not do the wrong thing. Ultimately you could make the case the blame lies somewhere between the original code author and the compiler. It was just so weird that turning optimizations OFF caused the issue. Without integration tests, this issue was/would be difficult to discover.