Why are rbp and rsp called general purpose registers?
If a register can be an operand for add, or used in an addressing mode, it's "general purpose", as opposed to registers like the FS segment register, or RIP. The GP registers are also called "integer registers", even though other kinds of registers can hold integers, too.
In computer architecture, it's common for CPUs to internally handle integer registers / instructions separately from FP/SIMD registers / instructions. e.g. Intel Sandybridge-family CPUs have separate physical register files for renaming GP integer vs. FP/vector registers. These are simply called the integer vs. FP register files. (Where FP is short-hand for everything that a kernel doesn't need to save/restore to use the GP registers while leaving user-space's FPU/SIMD state untouched.) Each entry in the FP register file is 256 bits wide (to hold an AVX ymm vector), but integer register file entries only have to be 64 bits wide.
On CPUs that rename segment registers (Skylake does not), I guess that would be part of the integer state, and so would RFLAGS + RIP. But when we say "integer register", we normally mean specifically a general-purpose register.
"General purpose" in this usage means "data or address", as opposed to an ISA like m68k where you had d0..7 data regs and a0..7 address regs, all 16 of which are integer regs. Regardless of how the register is normally used, general-purpose is about how it can be used.
Every register has some special-ness for some instructions, except some of the completely new registers added with x86-64: R8-R15. These special cases don't disqualify the registers as general purpose. The (low 16 bits of the) original 8 date back to 8086, and there were implicit uses of each of them even in the original 8086.
For RSP, it's special for push/pop/call/ret, so most code never uses it for anything else. (And in kernel mode, used asynchronously for interrupts, so you really can't stash it somewhere to get an extra GP register the way you can in user-space code: Is ESP as general-purpose as EAX?)
But under controlled conditions (like no signal handlers) you don't have to use RSP as a stack pointer. e.g. you can use it to read an array in a loop with pop, like in this code-golf answer. (I actually used esp in 32-bit code, but same difference: pop is faster than lodsd on Skylake, while both are 1 byte.)
Implicit uses and special-ness for each register:
See also x86 Assembly - Why is [e]bx preserved in calling conventions? for a partial list.
I'm mostly limiting this to user-space instructions, especially ones a modern compiler might actually emit from C or C++ code. I'm not trying to be exhaustive for regs that have a lot of implicit uses.
rax: one-operand [i]mul / [i]div / cdq / cdqe, string instructions (stos), cmpxchg, etc. etc. As well as special shorter encodings for many immediate instructions like 2-byte cmp al, 1 or 5-byte add eax, 12345 (no ModRM byte). See also codegolf.SE Tips for golfing in x86/x64 machine code. There's also xchg-with-eax which is where 0x90 nop came from (before nop became a separately-documented instruction in x86-64, because xchg eax,eax zero-extends eax into RAX and thus can't use the 0x90 encoding. But xchg rax,rax can still assemble to REX.W=1 0x90.)
rcx: shift counts, rep-string counts, the slow loop instruction.
rdx: rdx:rax is used by divide and widening-multiply (the one-operand forms), and cwd / cdq / cqo to set up for idiv. Also rdtsc and BMI2 mulx.
rbx: 8086 xlatb. cpuid uses all four of EAX..EDX. Pentium cmpxchg8b, x86-64 cmpxchg16b. Most 32-bit compilers will emit cmpxchg8b for std::atomic<long long>::compare_exchange_weak. (Pure load / pure store can use SSE MOVQ or x87 fild/fistp, though, if targeting Pentium or later.) 64-bit compilers will use 64-bit lock cmpxchg, not cmpxchg8b. Some 64-bit compilers will emit cmpxchg16b for atomic<struct_16_bytes>. RBX has the fewest implicit uses of the original 8, but lock cmpxchg16b is one of the few that compilers will actually use.
rsi / rdi: string ops, including rep movsb which some compilers sometimes inline. (gcc also inlines rep cmpsb for string literals in some cases, but that's probably not optimal.)
rbp: leave (only 1 uop slower than mov rsp, rbp / pop rbp. gcc actually uses it in functions with a frame pointer, when it can't just pop rbp). Also the horribly-slow enter which nobody ever uses.
rsp: stack operations: push/pop/call/ret, and leave. (And enter.) And in kernel mode (not user space) asynchronous use by hardware to save interrupt context. This is why kernel code can't have a red-zone.
r11: syscall / sysret use it to save/restore user-space's RFLAGS. (Along with RCX to save/restore user-space's RIP.)
Addressing-mode encoding special cases:
(See also rbp not allowed as SIB base? which is just about addressing modes, where I copied this part of this answer.)
rbp / r13 can't be a base register with no displacement: that encoding instead means (in ModRM) rel32 (RIP-relative), or (in SIB) disp32 with no base register. (r13 uses the same 3 bits in ModRM/SIB, so this choice simplifies decoding by not making the instruction-length decoder look at the REX.B bit to get the 4th base-register bit.) [r13] assembles to [r13 + disp8=0]. [r13+rdx] assembles to [rdx+r13] (avoiding the problem by swapping base/index when that's an option).
rsp / r12 as a base register always needs a SIB byte. (The ModR/M encoding of base=RSP is an escape code to signal a SIB byte, and again, more of the decoder would have to care about the REX prefix if r12 was handled differently.)
rsp can't be an index register. This makes it possible to encode [rsp], which is more useful than [rsp + rsp]. (Intel could have designed the ModRM/SIB encodings for 32-bit addressing modes (new in 386) so SIB-with-no-index was only possible with base=ESP. That would make [eax + esp*4] possible and only exclude [esp + esp*1/2/4/8]. But that's not useful, so they simplified the hardware by making index=ESP the code for no index regardless of the base. This allows two redundant ways to encode any base or base+disp addressing mode: with or without a SIB.)
r12 can be an index register. Unlike the other cases, this doesn't affect instruction-length decoding. Also, it can't be worked around with a longer encoding like the other cases. AMD wanted AMD64's register set to be as orthogonal as possible, so it makes sense they'd spend a few extra transistors to check REX.X as part of the index / no-index decoding. For example, [rsp + r12*4] requires index=r12, so having r12 not be fully general purpose would make AMD64 a worse compiler target.
   0:   41 8b 03        mov    eax,DWORD PTR [r11]
   3:   41 8b 04 24     mov    eax,DWORD PTR [r12]          # needs a SIB like RSP
   7:   41 8b 45 00     mov    eax,DWORD PTR [r13+0x0]      # needs a disp8 like RBP
   b:   41 8b 06        mov    eax,DWORD PTR [r14]
   e:   41 8b 07        mov    eax,DWORD PTR [r15]
  11:   43 8b 04 e3     mov    eax,DWORD PTR [r11+r12*8]    # *can* be an index
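The special cases above can be reproduced with a small hand-rolled encoder. This is only a sketch under narrow assumptions: it handles mov eax, DWORD PTR [base] and [base + index*scale] with 64-bit address registers and no requested displacement, and the function name encode_mov_eax_mem and its force_sib parameter are hypothetical, not part of any real assembler.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]
SCALE_BITS = {1: 0, 2: 1, 4: 2, 8: 3}

def encode_mov_eax_mem(base, index=None, scale=1, force_sib=False):
    """Encode `mov eax, [base + index*scale]` (opcode 8B /r, reg field = eax = 0)."""
    b = REGS.index(base)
    rex = 0x40
    if b >= 8:
        rex |= 0x01                       # REX.B: 4th bit of the base register
    # Low 3 bits 100 (rsp/r12) in ModRM.rm is the escape code for "SIB follows".
    need_sib = index is not None or (b & 7) == 4 or force_sib
    # Low 3 bits 101 (rbp/r13) with mod=00 means "no base", so use mod=01 + disp8=0.
    need_disp8 = (b & 7) == 5
    mod = 0b01 if need_disp8 else 0b00
    if need_sib:
        if index is None:
            x = 4                         # index=100 with REX.X=0 means "no index"
        else:
            x = REGS.index(index)
            if x == 4:
                raise ValueError("rsp can't be an index register")
            if x >= 8:
                rex |= 0x02               # REX.X: this is what makes r12 usable as index
        sib = (SCALE_BITS[scale] << 6) | ((x & 7) << 3) | (b & 7)
        body = [0x8B, (mod << 6) | 0b100, sib]
    else:
        body = [0x8B, (mod << 6) | (b & 7)]
    if need_disp8:
        body.append(0x00)
    prefix = [rex] if rex != 0x40 else []  # only emit REX when a bit is set
    return bytes(prefix + body)

for args in [("r11",), ("r12",), ("r13",), ("r14",), ("r15",), ("r11", "r12", 8)]:
    print(args, encode_mov_eax_mem(*args).hex(" "))
```

Calling encode_mov_eax_mem("rax") and encode_mov_eax_mem("rax", force_sib=True) gives the two redundant encodings of the same addressing mode, without and with a SIB byte. The sketch does not swap base and index the way an assembler does for [r13+rdx], so passing base="r13" there yields the longer disp8 form.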
Compilers like it when all registers can be used for anything, only constraining register allocation for a few special-case operations. This is what's meant by register orthogonality.
General purpose means any of these registers can be used with any instruction that computes on general-purpose registers, whereas you cannot, for example, do whatever you want with the instruction pointer (RIP) or the flags register (RFLAGS).
Some of these registers were designed for a specific use, and are commonly used that way. The most critical ones are RSP and RBP.
Should you need to use them for your own purpose, you should save their contents before storing something else inside, and restore them to their original value when done.
Dereferencing rbp might result in a #SS (stack segment) fault.
Recently, I hit a Linux kernel crash with a 'stack segment fault'.
crash> dmesg
[...]
stack segment: 0000 [#1] SMP
[...]
RIP: 0010:[<ffffffff8125fa8b>] lock_get_status+0x9b/0x3b0
RSP: 0018:ffff89954a317d90 EFLAGS: 00010282
[...]
RBP: 800000fa8c251867 R08: 0000000000001000 R09: 000000000000ffff
[...]
crash> dis lock_get_status+0x9b
0xffffffff8125fa8b <lock_get_status+0x9b>: mov 0x28(%rbp),%rax
The memory address in rbp is a non-canonical address. That's the reason for this crash. What I learned from this crash is that a memory access using rbp as the base register implicitly goes through the ss segment register, even though rbp is not being used as a stack frame base pointer.
According to Intel SDMv1 3.4.1 General-Purpose Registers:
EBP — Pointer to data on the stack (in the SS segment)
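To see why the address in the crash log is rejected, the canonical-address check can be written out. This is a rough sketch assuming 48-bit virtual addresses (4-level paging) and no segment-override prefix; is_canonical and expected_fault are names made up here. It models only the rule that a non-canonical access through the SS segment (the default for an rsp or rbp base) raises #SS, while other non-canonical accesses raise #GP.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    top = addr >> (va_bits - 1)                 # bits 63..47 for 48-bit addressing
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones

def expected_fault(base_reg, addr, va_bits=48):
    """Which exception a non-canonical access raises, by default segment of the base."""
    if is_canonical(addr, va_bits):
        return None
    # rsp / rbp as base default to the SS segment; everything else defaults to DS.
    return "#SS" if base_reg in ("rsp", "rbp") else "#GP"

rbp = 0x800000FA8C251867        # RBP value from the crash log
rsp = 0xFFFF89954A317D90        # RSP value from the crash log
effective = rbp + 0x28          # mov 0x28(%rbp),%rax

print(hex(effective), is_canonical(effective), expected_fault("rbp", effective))
print(hex(rsp), is_canonical(rsp))
```

With these inputs the RSP value passes the check and the rbp-based address does not, which matches the 'stack segment' line in the dmesg output rather than a general protection fault.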