ARM: link register and frame pointer
Some register calling conventions depend on the ABI (Application Binary Interface). The FP is required in the APCS standard but not in the newer AAPCS (2003). With the AAPCS (GCC 5.0+) the FP does not have to be used but certainly can be; debug info is annotated with stack and frame pointer use, for stack tracing and unwinding code with the AAPCS. If a function is static, a compiler really doesn't have to adhere to any conventions.
Generally all ARM registers are general purpose. The lr (link register, also R14) and pc (program counter, also R15) are special and enshrined in the instruction set. You are correct that the lr would point to A. The pc and lr are related. One is "where you are" and the other is "where you were". They are the code aspect of a function.
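To make the pc/lr relationship concrete, here is a toy Python model of what bl and bx lr do to the two registers. It is only a sketch: the Cpu class, the function nested_call_demo and all addresses are made up for illustration, it assumes ARM (not Thumb) state with 4-byte instructions, and it treats pc as the address of the current instruction.

```python
class Cpu:
    """Toy model of the two 'code' registers; not a real emulator."""

    def __init__(self, pc):
        self.pc = pc  # "where you are" (pipeline offset on reads is ignored)
        self.lr = 0   # "where you were"

    def bl(self, target):
        # Branch with link: remember the instruction after the bl, then jump.
        self.lr = self.pc + 4
        self.pc = target

    def bx_lr(self):
        # Return: jump back to whatever lr currently holds.
        self.pc = self.lr


def nested_call_demo():
    cpu = Cpu(pc=0x1000)       # hypothetical address of a bl inside main()
    cpu.bl(0x2000)             # main() calls foo()
    return_to_main = cpu.lr
    cpu.pc = 0x2010            # foo() runs up to its own bl
    cpu.bl(0x3000)             # foo() calls bar(): lr is overwritten
    return_to_foo = cpu.lr
    cpu.bx_lr()                # bar() returns into foo()
    return return_to_main, return_to_foo, cpu.pc
```

After the second bl the return address into main() is gone from lr, which is why a function that calls another function has to save lr somewhere first.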
Typically, we have the sp (stack pointer, R13) and the fp (frame pointer, R11). These two are also related. This Microsoft layout does a good job describing things. The stack is used to store temporary data or locals in your function. Any variables in foo() and bar() are stored here, on the stack or in available registers. The fp keeps track of the variables from function to function. It is a frame or picture window on the stack for that function. The ABI defines a layout of this frame. Typically the lr and other registers are saved here behind the scenes by the compiler, as well as the previous value of fp. This makes a linked list of stack frames, and if you want you can trace it all the way back to main(). The root is fp, which points to one stack frame (like a struct) with one variable in the struct being the previous fp. You can go along the list until the final fp, which is normally NULL.
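The linked list of frames can be sketched in a few lines of Python. This is a toy model, not a real unwinder: memory is a dict from byte addresses to 32-bit words, the helper names write_frame, backtrace and demo_backtrace are invented, the addresses are arbitrary, and the slot positions assume the typical frame layout listed at the end of this answer (saved lr one word below fp, previous fp three words below).

```python
WORD = 4  # bytes per 32-bit register


def write_frame(mem, sp, fp, lr, pc):
    """Lay out one frame just below sp and return the new (sp, fp)."""
    mem[sp - 4 * WORD] = fp   # previous fp, the link to the older frame
    mem[sp - 3 * WORD] = sp   # previous sp
    mem[sp - 2 * WORD] = lr   # return address for this function
    mem[sp - 1 * WORD] = pc   # saved pc; the new fp points at this word
    return sp - 4 * WORD, sp - WORD


def backtrace(mem, fp):
    """Follow the chain of frames until the NULL (0) frame pointer."""
    return_addresses = []
    while fp != 0:
        return_addresses.append(mem[fp - 1 * WORD])  # saved lr
        fp = mem[fp - 3 * WORD]                       # step to previous fp
    return return_addresses


def demo_backtrace():
    mem = {}
    sp, fp = 0x8000, 0                                       # fp starts out NULL
    sp, fp = write_frame(mem, sp, fp, lr=0x0100, pc=0x1000)  # main()
    sp, fp = write_frame(mem, sp, fp, lr=0x1004, pc=0x2000)  # foo()
    sp, fp = write_frame(mem, sp, fp, lr=0x2014, pc=0x3000)  # bar()
    return backtrace(mem, fp)
```

Calling demo_backtrace() returns the saved return addresses from the innermost frame outwards, which is essentially what a debugger prints as a stack trace.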
So the sp is where the stack is and the fp is where the stack was, a lot like the pc and lr. Each old lr (link register) is stored in the stack frame that the old fp (frame pointer) points to. The sp and fp are the data aspect of functions.
Your point B is the active pc and sp. Point A is actually the fp and lr, unless you call yet another function, in which case the compiler might get ready to set up the fp to point to the data in B.
Following is some ARM assembler that might demonstrate how this all works. This will be different depending on how the compiler optimizes, but it should give an idea:
; Prologue - setup
mov ip, sp ; get a copy of sp.
stmdb sp!, {fp, ip, lr, pc} ; Save the frame on the stack. See Addendum
sub fp, ip, #4 ; Set the new frame pointer.
...
; Maybe other functions called here.
; Older caller return lr stored in stack frame.
bl baz
...
; Epilogue - return
ldm sp, {fp, sp, lr} ; restore stack, frame pointer and old link.
... ; maybe more stuff here.
bx lr ; return.
This is what foo() would look like. If you don't call bar(), then the compiler does a leaf optimization and doesn't need to save the frame; only the bx lr is needed. Most likely this is why you are confused by web examples. It is not always the same.
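If it helps to see the register traffic, the prologue and epilogue above can be stepped through with a small Python model. Treat it as a sketch under assumptions: the helpers stmdb_writeback, ldm and run_foo are invented for this example, memory is a plain dict of 32-bit words, the starting register values are arbitrary, and the model stores pc as-is rather than mimicking exactly what real hardware writes for pc.

```python
# Register numbers decide the memory order used by stm/ldm.
REG_NUM = {"fp": 11, "ip": 12, "sp": 13, "lr": 14, "pc": 15}


def stmdb_writeback(regs, mem, base, names):
    """Model 'stmdb base!, {names}': store just below base, then update base."""
    values = [regs[n] for n in sorted(names, key=REG_NUM.get)]
    addr = regs[base] - 4 * len(values)
    regs[base] = addr                      # the '!' write-back
    for offset, value in enumerate(values):
        mem[addr + 4 * offset] = value     # lowest register, lowest address


def ldm(regs, mem, base, names):
    """Model 'ldm base, {names}': load upwards from base, no write-back."""
    addr = regs[base]
    for offset, name in enumerate(sorted(names, key=REG_NUM.get)):
        regs[name] = mem[addr + 4 * offset]


def run_foo(regs, mem):
    """Run the prologue and epilogue from the listing; return fp as seen inside foo()."""
    regs["ip"] = regs["sp"]                                     # mov ip, sp
    stmdb_writeback(regs, mem, "sp", ["fp", "ip", "lr", "pc"])  # stmdb sp!, {...}
    regs["fp"] = regs["ip"] - 4                                 # sub fp, ip, #4
    fp_inside = regs["fp"]
    regs["lr"] = 0xDEAD                                         # bl baz clobbers lr
    ldm(regs, mem, "sp", ["fp", "sp", "lr"])                    # ldm sp, {fp, sp, lr}
    return fp_inside
```

Starting from, say, regs = {"fp": 0x7FFC, "ip": 0, "sp": 0x7FF0, "lr": 0x1004, "pc": 0x2000} and an empty mem, run_foo leaves fp, sp and lr exactly as the caller had them, even though lr was overwritten in the middle. Note that this epilogue only works because sp is still at the bottom of the saved block; a function that had also allocated locals would need to get sp back there first.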
The take-away should be:
- pc and lr are related code registers. One is "Where you are", the other is "Where you were".
- sp and fp are related local data registers. One is "Where local data is", the other is "Where the last local data is".
- They work together along with parameter passing to create function machinery.
- It is hard to describe a general case because we want compilers to be as fast as possible, so they use every trick they can.
These concepts are generic to all CPUs and compiled languages, although the details can vary. The use of the link register and frame pointer is part of the function prologue and epilogue, and if you understood everything, you know how a stack overflow works on an ARM.
See also:
- ARM calling convention
- MSDN ARM stack article
- University of Cambridge APCS overview
- ARM stack trace blog
- Apple ABI link
The basic frame layout is:
- fp[-0] saved pc, where we stored this frame.
- fp[-1] saved lr, the return address for this function.
- fp[-2] previous sp, before this function eats stack.
- fp[-3] previous fp, the last stack frame.
- many optional registers...
An ABI may use other values, but the above are typical for most setups. The indexes above are for 32 bit values as all ARM registers are 32 bits. If you are byte-centric, multiply by four. The frame is also aligned to at least four bytes.
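For the byte-centric view, here is a minimal sketch that decodes one such frame out of a raw stack dump. The function name read_frame and the sample values are invented, and it assumes a little-endian dump and the four-word layout listed above; a different ABI would need different offsets.

```python
import struct


def read_frame(stack_bytes, fp_offset):
    """Decode the four frame words that end at fp_offset (a byte offset into the dump)."""
    # fp[-3] is three words (12 bytes) below fp, so start reading there.
    prev_fp, prev_sp, saved_lr, saved_pc = struct.unpack_from(
        "<4I", stack_bytes, fp_offset - 3 * 4
    )
    return {
        "previous fp": prev_fp,   # fp[-3]
        "previous sp": prev_sp,   # fp[-2]
        "saved lr": saved_lr,     # fp[-1]
        "saved pc": saved_pc,     # fp[-0]
    }
```

For a dump built with struct.pack("<4I", 0x7FFC, 0x7FF0, 0x1004, 0x2000), fp sits at byte offset 12 and read_frame(dump, 12)["saved lr"] gives back 0x1004.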
Addendum: Storing pc in the stmdb register list is not an error in the assembler; it is normal. An explanation is in the ARM generated prologs question.