This is the story of an interesting project issue involving compiler optimizations, inline assembly, bootloaders, and all other manner of mayhem. Learning about optimizations and their impact on code, especially low-level code, is critical for improving your embedded firmware game.

Consider the following code from the Nordic bootloader:

__STATIC_INLINE void jump_to_addr(uint32_t new_msp, uint32_t new_lr, uint32_t addr)
{
    __ASM volatile ("MSR MSP, %[arg]" : : [arg] "r" (new_msp));
    __ASM volatile ("MOV LR,  %[arg]" : : [arg] "r" (new_lr) : "lr");
    __ASM volatile ("BX       %[arg]" : : [arg] "r" (addr));
}

This code implements the very last steps of the bootloader before it jumps to the application.

Step 1: Set up the MSP register (“Main Stack Pointer”) with the application’s initial stack pointer value.

Step 2: Set up the LR register (“Link Register”) with the return address (in this case, a dummy value since we never expect the main app to return to the bootloader).

Step 3: JUMP! Branch to the address of our application code and be on our way!
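
The values passed as new_msp and addr typically come from the application's vector table: on Cortex-M parts, the first word of the table holds the initial stack pointer and the second holds the reset handler address. Here is a minimal sketch of that lookup. The app_entry type and the read_app_entry function are hypothetical names made up for illustration, not part of the Nordic SDK, and the snippet makes no claim about how the Nordic bootloader itself does this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of the Nordic SDK. */
typedef struct {
    uint32_t initial_msp;   /* word 0 of a Cortex-M vector table */
    uint32_t reset_handler; /* word 1 of a Cortex-M vector table */
} app_entry;

/* Read the two values a bootloader needs from the start of an app image. */
static app_entry read_app_entry(const uint32_t *vector_table)
{
    app_entry entry;

    assert(vector_table != NULL);
    entry.initial_msp   = vector_table[0];
    entry.reset_handler = vector_table[1];
    return entry;
}
```

On a target, vector_table would point at the start of the application image in flash, and the caller would pass entry.initial_msp and entry.reset_handler as the first and third arguments to jump_to_addr.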


We had compiled this code into a new project and were experiencing random reboots and flaky behavior.

Here is the assembly the compiler generates for this function when it works correctly, with optimizations on:

    B500        push {lr}
    F3808808    msr msp, r0
    468E        mov lr, r1
    4710        bx r2
    F85DFB04    pop.w {pc}

According to the ARM ABI (see section 6.1.1), the parameters are provided in r0, r1, and r2. The operation proceeds in a fairly straightforward fashion:

Step 1: Move r0 into msp.

Step 2: Move r1 into lr.

Step 3: JUMP! Branch to r2.

Everything looks great, no issues. This code works fine.

Now – the surprising part. Let’s say you want to debug the bootloader. You turn the compiler optimizations OFF and add debug symbols so you can follow the code. Now take a look at the assembly:

    B500        push {lr}
    B085        sub sp, sp, #20
    9003        str r0, [sp, #12]
    9102        str r1, [sp, #8]
    9201        str r2, [sp, #4]
    9B03        ldr r3, [sp, #12]
    F3838808    msr msp, r3
    9B02        ldr r3, [sp, #8]
    469E        mov lr, r3
    9B01        ldr r3, [sp, #4]
    4718        bx r3
    BF00        nop
    B005        add sp, sp, #20
    F85DFB04    pop.w {pc}

See the problem yet? With optimizations turned off, the compiler no longer keeps the parameters in registers. Instead, it spills them to the stack and reloads each one just before it is used.

These parameters are placed on the stack in the first chunk of str instructions. Then the following happens:

Step 1: Load the new stack pointer into MSP.

Step 2: Load the new link register into LR.

Step 3: Load the jump address and execute the jump with r3.

Except that’s not really what happens…

Here’s what really happens:

Step 1: Load the new stack pointer into MSP.

Step 2: Load garbage into the LR.

Step 3: Load garbage and execute a jump to a garbage address.

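Step 1 overwrites the stack pointer, but the remaining parameters are still sitting on the old stack. Every ldr after the msr is sp-relative, so [sp, #8] and [sp, #4] now read from the new stack, which holds whatever happened to be in that memory. Ugh…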
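
To make the failure concrete, here is a toy model of both instruction sequences in plain C that runs on a desktop machine. Everything in it is an assumption for illustration: the toy_cpu struct, the 256 bytes of simulated RAM, the addresses, and the fill pattern standing in for whatever happens to be on the new stack. It models only the instructions up to the bx, not real Cortex-M behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model for illustration only; every name and number here is made up. */
#define MEM_WORDS 64u /* 256 bytes of simulated RAM */

typedef struct {
    uint32_t mem[MEM_WORDS]; /* RAM as 32-bit words; byte address = index * 4 */
    uint32_t sp;             /* active stack pointer (MSP), a byte address */
    uint32_t lr;
    uint32_t pc;
} toy_cpu;

/* Fill RAM with a recognizable "garbage" pattern and set the starting sp. */
static void toy_cpu_init(toy_cpu *cpu, uint32_t sp, uint32_t fill)
{
    for (uint32_t i = 0; i < MEM_WORDS; i++) {
        cpu->mem[i] = fill;
    }
    cpu->sp = sp;
    cpu->lr = 0u;
    cpu->pc = 0u;
}

static uint32_t load_word(const toy_cpu *cpu, uint32_t addr)
{
    assert(addr % 4u == 0u && addr / 4u < MEM_WORDS);
    return cpu->mem[addr / 4u];
}

static void store_word(toy_cpu *cpu, uint32_t addr, uint32_t value)
{
    assert(addr % 4u == 0u && addr / 4u < MEM_WORDS);
    cpu->mem[addr / 4u] = value;
}

/* Mirrors the -O0 listing: spill r0-r2, then reload each one relative to sp. */
static void jump_unoptimized(toy_cpu *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    uint32_t r3;

    cpu->sp -= 4u;                       /* push {lr} */
    store_word(cpu, cpu->sp, cpu->lr);
    cpu->sp -= 20u;                      /* sub sp, sp, #20 */
    store_word(cpu, cpu->sp + 12u, r0);  /* str r0, [sp, #12] */
    store_word(cpu, cpu->sp + 8u, r1);   /* str r1, [sp, #8] */
    store_word(cpu, cpu->sp + 4u, r2);   /* str r2, [sp, #4] */

    r3 = load_word(cpu, cpu->sp + 12u);  /* ldr r3, [sp, #12] */
    cpu->sp = r3;                        /* msr msp, r3 */
    r3 = load_word(cpu, cpu->sp + 8u);   /* ldr r3, [sp, #8]: reads the NEW stack */
    cpu->lr = r3;                        /* mov lr, r3 */
    r3 = load_word(cpu, cpu->sp + 4u);   /* ldr r3, [sp, #4]: reads the NEW stack */
    cpu->pc = r3;                        /* bx r3 */
}

/* Mirrors the optimized listing: the arguments never leave their registers. */
static void jump_optimized(toy_cpu *cpu, uint32_t r0, uint32_t r1, uint32_t r2)
{
    cpu->sp -= 4u;                       /* push {lr} */
    store_word(cpu, cpu->sp, cpu->lr);
    cpu->sp = r0;                        /* msr msp, r0 */
    cpu->lr = r1;                        /* mov lr, r1 */
    cpu->pc = r2;                        /* bx r2 */
}
```

Starting with sp at 0x80 and RAM filled with a marker value, calling jump_unoptimized(&cpu, 0x40, 0xFFFFFFFF, 0x1001) leaves lr and pc holding the marker rather than the values passed in. jump_optimized with the same arguments gets all three right.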

There are likely some directives you could give gcc to help it avoid this issue. Our approach for now is to simply always build the Nordic bootloader with optimizations turned on.
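
One option, which we have not verified on hardware, is to fold the three instructions into a single asm statement. With GCC's extended asm, every "r" input operand has to be in a register before the first instruction of the statement runs, so even with optimizations off all the stack loads happen before MSP changes. Treat this as a sketch under those assumptions: jump_to_addr_single_asm is a hypothetical name, it uses plain GCC syntax rather than the CMSIS __STATIC_INLINE and __ASM macros, and the #if guard is only there so the snippet still compiles on a non-ARM host.

```c
#include <stdint.h>

/* Only built for GCC-compatible compilers targeting an M-profile ARM core. */
#if defined(__GNUC__) && defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')

/* Hypothetical replacement for jump_to_addr; not taken from the Nordic SDK. */
static inline void jump_to_addr_single_asm(uint32_t new_msp,
                                           uint32_t new_lr,
                                           uint32_t addr)
{
    /* All three inputs are placed in registers before the template starts,
       so nothing is read from the stack after MSP has been replaced. */
    __asm volatile (
        "msr msp, %[msp]  \n"
        "mov lr,  %[lr]   \n"
        "bx  %[addr]      \n"
        : /* no outputs */
        : [msp] "r" (new_msp), [lr] "r" (new_lr), [addr] "r" (addr)
        : "lr", "memory");
}

#endif
```

The call site stays the same as before, apart from the function name. It is still worth checking the generated assembly at every optimization level you plan to ship or debug with.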

  1. Are you leaving out the k constraint when modifying msp on purpose? I’m curious if adding an output constraint might help here.

    This may require that you use a local register variable, and a temp variable located in msp, to work.

    Also, might you assign the lr register first, then update the msp… this seems like it would ensure that lr is updated from incoming stacked parameters, prior to moving the stack pointer.

    Agree this is quirky, but, not as bad as compiler bugs I’ve seen, where the code produced is patently wrong.

    FWIW, this function is so small why bother writing it in C and then using asm? Why not just extern a pure asm version of it so it’s callable from C?

    1. I totally agree with you Cullen! There are any number of ways this could be fixed to give the compiler clues as to what is really intended and not do the wrong thing. Ultimately you could make the case the blame lies somewhere between the original code author and the compiler. It was just so weird that turning optimizations OFF caused the issue. Without integration tests, this issue was/would be difficult to discover.
