How to gracefully return from ARM Cortex M3 Hard Faults

[Figure: Cortex M3 MPU fault dump]

In a system that uses the Cortex M3's MPU, it is a good idea to have automated tests for the MPU configuration, just as you would have them for other properties of the hardware or software.

Creating a test case that checks whether faults are generated when illegal memory operations are performed is generally straightforward. In most cases you would already have some kind of framework in place that can execute the necessary instructions and monitor the system's response, e.g. via a TTY connected to the controller's UART.

All that is left to do is to set up a fault handler that prints a stack trace to the TTY, have the test case cause an MPU fault, and check the trace.

While this works just fine, it might take quite a while to thoroughly test a system that has a complex MPU configuration: every time the fault handler is triggered, the system will usually get stuck in the handler's idle loop and needs to be reset before the next test case can run.

One way to speed up the testing process is to instrument the fault handler so that it can be set to a testing mode.
When the testing mode is activated, the handler just prints the stack trace and exits gracefully, returning control to the main program.

While this sounds simple, there is one minor catch:
When a fault is triggered, the PC saved on the stack points to the instruction that caused the fault. Entering and leaving the handler does not alter this saved PC. After returning from the handler, the very instruction that caused the fault is therefore executed again, which triggers the handler again, which returns to the same instruction again, and so on. We would be stuck in an infinite loop.
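To make the catch more concrete, here is a minimal C sketch of the basic stack frame a Cortex M3 pushes on exception entry. The type name exc_frame_t and the helper stacked_pc are assumptions made for this illustration; only the register order and the resulting offset of the saved PC (24 bytes) matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception stack frame as pushed by the processor on exception
 * entry, lowest address first. The type name is hypothetical. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address; for an MPU fault, the faulting instruction */
    uint32_t xpsr;
} exc_frame_t;

/* The saved PC sits 24 bytes above the stack pointer at handler entry. */
_Static_assert(offsetof(exc_frame_t, pc) == 24, "saved PC must be at offset 24");

/* Hypothetical helper: address the processor will resume at on return. */
static uint32_t stacked_pc(const exc_frame_t *frame)
{
    return frame->pc;
}
```

Unless the handler changes the pc field of this frame, the exception return resumes at exactly the address that faulted.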

One solution would be to have the fault handler modify the MPU configuration to allow the particular type of access so that the instruction can complete successfully.

This approach can sometimes be appropriate, not just for testing but even in production systems. For example, if a certain memory area contains configuration data that should usually only be readable, it can be locked in the MPU. Under certain circumstances a privileged process might need to change the configuration. The fault handler can then check that the fault was indeed caused by such a process and temporarily allow write access to the memory region, or otherwise halt the system.

For system testing, though, this approach is much too complicated: it requires a lot of additional code (after temporarily allowing access, the original MPU configuration has to be restored before testing can continue) and therefore has the potential to introduce new bugs.

There is another, much simpler solution:
Instead of returning control to the instruction that caused the fault, the fault handler can increment the PC and thereby discard the instruction that caused the fault and resume execution at the next instruction. (Obviously just skipping an instruction is a bad idea in a production system, but since we are running a test and the particular instruction's only purpose is to cause the fault, skipping it is perfectly fine.)

The problem with this approach when running in Thumb-2 mode is that the instruction length is variable, so we need to know whether we are dealing with a 16-bit or a 32-bit instruction in order to increment the PC accordingly.

Fortunately, 32-bit instructions are easily identified by bits [15:11] of their first halfword.
If those bits have one of the following values, the instruction is 32 bits wide; otherwise it is a 16-bit instruction:

0b11101
0b11110
0b11111
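The length check can be sketched in C as follows. The function name thumb_is_32bit is an assumption made for this illustration; the test itself only relies on the rule that bits [15:11] of 0b11101, 0b11110 or 0b11111 mark the first halfword of a 32-bit instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the given halfword is the first half of a 32-bit
 * Thumb-2 instruction, false if it is a complete 16-bit instruction.
 * The function name is hypothetical. */
static bool thumb_is_32bit(uint16_t first_halfword)
{
    uint16_t top5 = first_halfword >> 11;   /* bits [15:11] */
    return top5 >= 0x1D;                    /* 0b11101, 0b11110, 0b11111 */
}
```

For example, 0xF8D0 (top bits 0b11111) is classified as the start of a 32-bit instruction, while 0xBF00 (top bits 0b10111) is a 16-bit instruction.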

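Putting everything together, we get the following solution:

1. The test case sets a flag before it performs the illegal accesses and clears it again afterwards.
2. The fault handler prints the register dump and checks the flag. If the flag is not set, the fault is a real one and the handler waits in an infinite loop.
3. Otherwise the handler increments an exception counter, loads the saved PC from the stack and inspects bits [15:11] of the halfword it points to.
4. The handler adds 2 or 4 to the saved PC, writes it back to the stack and returns from the exception.
5. The test case compares the exception counter with the expected number of faults.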
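The PC adjustment at the heart of this solution can also be sketched in C, which makes it easy to try out on a host machine. The names STACKED_PC_INDEX, next_instruction and skip_faulting_instruction are assumptions made for this illustration. To keep the sketch runnable off target, the first halfword of the faulting instruction is passed in by the caller instead of being read through the saved PC.

```c
#include <stdint.h>

/* Word index of the saved PC in the exception frame (byte offset 24). */
#define STACKED_PC_INDEX 6

/* Address of the instruction that follows the one at 'pc', given the
 * first halfword of the instruction at 'pc'. Hypothetical helper. */
static uint32_t next_instruction(uint32_t pc, uint16_t first_halfword)
{
    /* Bits [15:11] >= 0b11101 mark a 32-bit instruction. */
    return pc + (((first_halfword >> 11) >= 0x1D) ? 4u : 2u);
}

/* Rewrites the saved PC in 'frame' so that the exception return resumes
 * after the faulting instruction. 'frame' points at the eight stacked
 * words r0, r1, r2, r3, r12, lr, pc, xpsr. Hypothetical helper. */
static void skip_faulting_instruction(uint32_t *frame, uint16_t first_halfword)
{
    frame[STACKED_PC_INDEX] =
        next_instruction(frame[STACKED_PC_INDEX], first_halfword);
}
```

With a saved PC of 0x08000100, a first halfword of 0x7800 moves the saved PC to 0x08000102, and a following first halfword of 0xF8D0 moves it on to 0x08000106.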

Finally, here is some example code that implements this functionality:

testcase.c

volatile unsigned int _flag_exc_test = 0;
volatile unsigned int _exc_count = 0;
volatile char buf;

/* The accesses go through volatile pointers so that the compiler cannot drop or reorder them. */
_flag_exc_test = 1;
_exc_count = 0;
buf = *(volatile char *)(0x00000000); // read byte from 0x0
*(volatile char *)(0x0000abcd) = 0;   // write byte to 0xABCD
*(volatile char *)(0x00000000) = 0;   // write byte to 0x0
_flag_exc_test = 0;
if(_exc_count == 3) {
    /* If we got the expected number of exceptions, the test was successful */
    ...
}

fault_handler.s

    .global   Fault_Handler
    .type     Fault_Handler, %function
Fault_Handler:
    bl      print_registers /* Clobbers LR, so EXC_RETURN is rebuilt below. */
    ldr     r0, F_EXC      /* Check if this is an exception test. */
    ldr     r0, [r0]
    cmp     r0, #1
    blt     EXC_OUT
    ldr     r0, C_EXC      /* This is an exception test; increment exception counter. */
    ldr     r1, [r0]
    add     r1, r1, #1
    str     r1, [r0]
    ldr     r0, [sp,#24]  /* load saved PC (assumes the frame is on the main stack). */
    ldrh    r0, [r0]      /* fetch first halfword of the faulting instruction. */
    lsr     r0, r0, #11   /* test if PC points to 2 or 4 byte instruction. */
    mov     r1, #0x1f
    and     r0, r1
    cmp     r0, #28       /* bits [15:11] <= 0b11100 means 2 bytes. */
    ble     INS_2b       /* increment saved PC (skip instruction that caused the fault)... */
INS_4b:
    ldr     r0, [sp,#24] /* ... by 4 bytes. */
    add     r0, #4
    b       SET_EXC_RET  /* do not fall through into the 2 byte case. */
INS_2b:
    ldr     r0, [sp,#24] /* ... by 2 bytes. */
    add     r0, #2
SET_EXC_RET:
    str     r0, [sp,#24]  /* save modified PC. */
    mov     r0, #0xe      /* set PC to EXC_RETURN magic value 0xFFFFFFF1 (return to Handler mode). */
    mvn     r0, r0        /* Thread mode needs 0xFFFFFFF9 (main stack) or 0xFFFFFFFD (process stack). */
    bx      r0            /* continue execution */
EXC_OUT:                  /* Real exception, not a test. */
0:  b 0b                  /* Wait in infinite loop. */

.align 4
    .extern _flag_exc_test
    .extern _exc_count
F_EXC:  .word   _flag_exc_test
C_EXC:  .word   _exc_count