Is reserving stack space necessary for functions with fewer than four arguments?
After playing with this more and reading the docs, I found that the 32 bytes need to be reserved for any function that you call. If your function is as simple as the example and you don't call other functions, you don't have to reserve this space. Any function you call, however, may use these 32 bytes, so if you do not reserve them the function may overwrite data on your stack.
Your function may also rely on there being 32 bytes available on the stack from the function that called yours, provided that caller follows the ABI. Commonly this 32-byte area is used to save registers that will be changed in your function so you can restore their values before returning. I think this is for performance purposes: 32 bytes was chosen as enough that most leaf functions (functions that don't call others) don't need to reserve any stack space, yet still have temporary room on the stack to save registers and restore them before returning. Take this example:
Calling Function:
CallingFunction:
push rbp
mov rbp, rsp
sub rsp, 40 // $20 bytes we want to use at [rbp-20],
// plus $20 bytes for calling other functions
// according to windows ABI spec
mov rcx, [rsi+10] // parameter 1 (xmm0 if non-int)
mov rdx, 10 // parameter 2 (xmm1 if non-int)
movss xmm2, [rsi+28] // parameter 3 (r8 if int)
mov r9, [rsi+64] // parameter 4 (xmm3 if non-int)
call CalledFunction
// ... do other stuff
add rsp, 40 // free space we reserved
pop rbp
xor rax,rax
ret
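The size passed to `sub rsp` in the caller can be sanity-checked with a little arithmetic. The sketch below is only a model, not part of any toolchain: `frame_allocation` and its parameters are hypothetical names, and it assumes the stack was 16-byte aligned just before the `call` that entered the function (so the return address leaves it 8 bytes off), that each push is 8 bytes, and that all sizes are in bytes.

```python
SHADOW_SPACE = 0x20  # four 8-byte slots the caller reserves for every call


def frame_allocation(local_bytes, pushes=1, max_call_args=4):
    """Return the value to use in 'sub rsp, N' (hypothetical helper).

    local_bytes   -- scratch space the function wants for itself
    pushes        -- registers pushed in the prologue (e.g. 'push rbp')
    max_call_args -- largest argument count of any function it calls
    """
    # Arguments beyond the fourth are passed on the stack, above the shadow space.
    stack_args = max(0, max_call_args - 4) * 8
    needed = SHADOW_SPACE + stack_args + local_bytes
    # Bytes already consumed below the last 16-byte boundary:
    # the return address plus the pushed registers.
    used = 8 + 8 * pushes
    # Pad so rsp is 16-byte aligned again when the next call is made.
    padding = (-(used + needed)) % 16
    return needed + padding


print(hex(frame_allocation(0x20)))        # 0x40, as in the caller above
print(hex(frame_allocation(0)))           # 0x20, shadow space only
print(hex(frame_allocation(0, pushes=0))) # 0x28, no registers pushed
```

With one push and 0x20 bytes of locals this gives 0x40, which matches the listing above. With no pushes at all, the extra 8 bytes of padding are what restore the alignment.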
Called Function:
CalledFunction:
push rbp // standard
mov rbp, rsp // standard
// should do 'sub rsp, 20' here if calling any functions
// to give them a free scratch area
// [rbp] is saved rbp
// [rbp+8] is return address
// [rbp+10] to [rbp+2f] are the 0x20 bytes we can
// safely modify in this function, this could
// be pushed higher if the function had more than 4
// parameters and some had to be passed on the stack
// or if returning a structure or something that needs
// more space. In these cases the CALLER would have
// allocated more space for us
// the main reason for the 0x20 is so that we can save
// registers we want to modify without having to allocate
// stack space ourselves
mov [rbp+10], rsi // save rsi in space allocated by caller
mov [rbp+18], rdi // save rdi in space allocated by caller
mov rsi, [rcx+20]
mov rdi, [rsi+48]
add rdi, [rsi+28]
mov rax, rdi
mov rdi, [rbp+18] // restore changed register
mov rsi, [rbp+10] // restore changed register
pop rbp
ret
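To make the offsets in the comments above easier to check, here is a small sketch that lists what the called function sees relative to rbp after `push rbp` / `mov rbp, rsp`. The function name `callee_frame_layout` is made up for illustration, and the model assumes only integer-sized arguments of 8 bytes each.

```python
def callee_frame_layout(num_args=4):
    """Map rbp-relative offsets to their contents (hypothetical helper)."""
    layout = {
        0x00: "saved rbp",
        0x08: "return address",
    }
    # The four slots reserved by the caller, one per register parameter.
    for i, reg in enumerate(["rcx", "rdx", "r8", "r9"]):
        layout[0x10 + 8 * i] = f"shadow slot for {reg}"
    # Any further arguments sit directly above the shadow space.
    for i in range(4, num_args):
        layout[0x10 + 8 * i] = f"stack argument {i + 1}"
    return layout


for offset, contents in sorted(callee_frame_layout(5).items()):
    print(f"[rbp+{offset:02x}] {contents}")
```

The shadow slots cover [rbp+10] through [rbp+2f], and a fifth argument, if there is one, starts at [rbp+30].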
Original answer
I just ran into this not knowing and it seems to be the case. The first two instructions in GetAsyncKeyState for instance overwrite the stack above the return address in the 0x20 byte area you're supposed to reserve for the callee to use for parameters:
user32.GetAsyncKeyState - mov [rsp+08],rbx
user32.GetAsyncKeyState+5- mov [rsp+10],rsi
user32.GetAsyncKeyState+A- push rdi
user32.GetAsyncKeyState+B- sub rsp,20
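A toy model shows why those two stores matter to the caller. This is a sketch under loud assumptions: memory is a plain dictionary, the starting address 0x1000 is arbitrary, alignment padding is ignored, and `simulate_call` is a hypothetical name. Only the two prologue stores are taken from the disassembly above.

```python
def simulate_call(reserve_shadow):
    """Return what the caller finds in its local after the call (toy model)."""
    memory = {}
    rsp = 0x1000  # arbitrary starting stack pointer

    # The caller keeps one 8-byte local at the bottom of its frame.
    rsp -= 8
    local_addr = rsp
    memory[local_addr] = "caller local"

    if reserve_shadow:
        rsp -= 0x20  # the 0x20 bytes the ABI asks the caller to reserve

    # 'call' pushes the return address.
    rsp -= 8
    memory[rsp] = "return address"

    # Callee prologue, as in GetAsyncKeyState:
    #   mov [rsp+08],rbx
    #   mov [rsp+10],rsi
    memory[rsp + 0x08] = "saved rbx"
    memory[rsp + 0x10] = "saved rsi"

    return memory[local_addr]


print(simulate_call(reserve_shadow=True))   # caller local
print(simulate_call(reserve_shadow=False))  # saved rbx
```

Without the reservation, the callee's first store lands on the caller's local, which is the kind of corruption the reserved area is meant to prevent.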
Your quote is from the "calling convention" part of the documentation. At the very least, you do not have to worry about this if you do not call other functions from your assembly code. If you do, then you must respect, among other things, the "red zone" and stack alignment considerations that the recommendation you quote is intended to ensure.
EDIT: this post clarifies the difference between "red zone" and "shadow space".