malloc behaviour on an embedded system

It does not look like malloc is doing any checks at all. The fault that you get comes from hardware detecting a write to an invalid address, which is probably coming from malloc itself.

When malloc allocates memory, it takes a chunk from its internal pool, and returns it to you. However, it needs to store some information for the free function to be able to complete deallocation. Usually, that's the actual length of the chunk. In order to save that information, malloc takes a few bytes from the beginning of the chunk itself, writes the info there, and returns you the address past the spot where it has written its own information.

For example, let's say you asked for a 10-byte chunk. malloc would grab an available 16-byte chunk, say, at addresses 0x3200..0x320F, write the length (i.e. 16) into bytes 1 and 2, and return 0x3202 back to you. Now your program can use ten bytes from 0x3202 to 0x320B. The other four bytes are available, too - if you call realloc and ask for 14 bytes, there would be no reallocation.

The crucial point comes when malloc writes the length into the chunk of memory that it is about to return to you: the address to which it writes needs to be valid. It appears that after the 18-th iteration the address of the next chunk is negative (which translates to a very large positive) so CPU traps the write, and triggers the hard fault.

In situations when the heap and the stack grow toward each other there is no reliable way to detect an out of memory while letting you use every last byte of memory, which is often a very desirable thing. malloc cannot predict how much stack you are going to use after the allocation, so it does not even try. That is why the byte counting in most cases is on you.

In general, on embedded hardware when the space is limited to a few dozen kilobytes, you avoid malloc calls in "arbitrary" places. Instead, you pre-allocate all your memory upfront using some pre-calculated limits, and parcel it out to structures that need it, and never call malloc again.


Your program most likely crashes because of an illegal memory access, which is almost always an indirect (subsequent) result of a legal memory access, but one that you did not intend to perform.

For example (which is also my guess as to what's happening on your system):

Your heap most likely begins right after the stack. Now, suppose you have a stack-overflow in main. Then one of the operations that you perform in main, which is naturally a legal operation as far as you're concerned, overrides the beginning of the heap with some "junk" data.

As a subsequent result, the next time that you attempt to allocate memory from the heap, the pointer to the next available chunk of memory is no longer valid, eventually leading to a memory access violation.

So to begin with, I strongly recommend that you increase the stack size from 0x200 bytes to 0x400 bytes. This is typically defined within the linker-command file, or through the IDE, in the project's linker settings.

If your project is on IAR, then you can change it in the icf file:

define symbol __ICFEDIT_size_cstack__ = 0x400

Other than that, I suggest that you add code in your HardFault_Handler, in order to reconstruct the call-stack and register values prior to the crash. This might allow you to trace the runtime error and find out exactly where it happened.

In file 'startup_stm32f03xx.s', make sure that you have the following piece of code:

EXTERN  HardFault_Handler_C        ; this declaration is probably missing

__tx_vectors                       ; this declaration is probably there
    DCD     HardFault_Handler

Then, in the same file, add the following interrupt handler (where all other handlers are located):

    PUBWEAK HardFault_Handler
    SECTION .text:CODE:REORDER(1)
HardFault_Handler
    TST LR, #4
    ITE EQ
    MRSEQ R0, MSP
    MRSNE R0, PSP
    B HardFault_Handler_C

Then, in file 'stm32f03xx.c', add the following ISR:

void HardFault_Handler_C(unsigned int* hardfault_args)
{
    printf("R0    = 0x%.8X\r\n",hardfault_args[0]);         
    printf("R1    = 0x%.8X\r\n",hardfault_args[1]);         
    printf("R2    = 0x%.8X\r\n",hardfault_args[2]);         
    printf("R3    = 0x%.8X\r\n",hardfault_args[3]);         
    printf("R12   = 0x%.8X\r\n",hardfault_args[4]);         
    printf("LR    = 0x%.8X\r\n",hardfault_args[5]);         
    printf("PC    = 0x%.8X\r\n",hardfault_args[6]);         
    printf("PSR   = 0x%.8X\r\n",hardfault_args[7]);         
    printf("BFAR  = 0x%.8X\r\n",*(unsigned int*)0xE000ED38);
    printf("CFSR  = 0x%.8X\r\n",*(unsigned int*)0xE000ED28);
    printf("HFSR  = 0x%.8X\r\n",*(unsigned int*)0xE000ED2C);
    printf("DFSR  = 0x%.8X\r\n",*(unsigned int*)0xE000ED30);
    printf("AFSR  = 0x%.8X\r\n",*(unsigned int*)0xE000ED3C);
    printf("SHCSR = 0x%.8X\r\n",SCB->SHCSR);                
    while (1);
}

If you can't use printf at the point in the execution when this specific Hard-Fault interrupt occurs, then save all the above data in a global buffer instead, so you can view it after reaching the while (1).

Then, refer to the 'Cortex-M Fault Exceptions and Registers' section at http://www.keil.com/appnotes/files/apnt209.pdf in order to understand the problem, or publish the output here if you want further assistance.

UPDATE:

In addition to all of the above, make sure that the base address of the heap is defined correctly. It is possibly hard-coded within the project settings (typically right after the data-section and the stack). But it can also be determined during runtime, at the initialization phase of your program. In general, you need to check the base addresses of the data-section and the stack of your program (in the map file created after building the project), and make sure that the heap does not overlap either one of them.

I once had a case where the base address of the heap was set to a constant address, which was fine to begin with. But then I gradually increased the size of the data-section, by adding global variables to the program. The stack was located right after the data-section, and it "moved forward" as the data-section grew larger, so there were no problems with either one of them. But eventually, the heap was allocated "on top of" part of the stack. So at some point, heap-operations began to override variables on the stack, and stack-operations began to override the contents of the heap.