Why does the Mac ABI require 16-byte stack alignment for x86-32?

First, note that the 16 bytes alignment is an exception introduced by Apple to the System V IA-32 ABI.

The stack alignment is only needed when calling system functions, because many system libraries are using SSE or Altivec extensions which require the 16 bytes alignment. I found an explicit reference in the libgmalloc MAN page.

You can perfectly handle your stack frame the way you want, but if you try to call a system function with a misaligned stack, you will end up with a misaligned_stack_error message.

Edit: For the record, you can get rid of alignment problems when compiling with GCC by using the mstack-realign option.


I believe it's to keep it inline with the x86-64 ABI.


I am not sure as I don't have first hand proof, but I believe the reason is SSE. SSE is much faster if your buffers are already aligned on a 16 bytes boundary (movps vs movups), and any x86 has at least sse2 for mac os x. It can be taken care of by the application user, but the cost is pretty significant. If the overall cost for making it mandatory in the ABI is not too significant, it may worth it. SSE is used quite pervasively in mac os X: accelerate framework, etc...


From "Intel®64 and IA-32 Architectures Optimization Reference Manual", section 4.4.2:

"For best performance, the Streaming SIMD Extensions and Streaming SIMD Extensions 2 require their memory operands to be aligned to 16-byte boundaries. Unaligned data can cause significant performance penalties compared to aligned data."

From Appendix D:

"It is important to ensure that the stack frame is aligned to a 16-byte boundary upon function entry to keep local __m128 data, parameters, and XMM register spill locations aligned throughout a function invocation."

http://www.intel.com/Assets/PDF/manual/248966.pdf