x86 MUL Instruction from VS 2008/2010
There are three different types of multiply instructions on x86. The first is MUL reg, which does an unsigned multiply of EAX by reg and puts the (64-bit) result into EDX:EAX. The second is IMUL reg, which does the same with a signed multiply. The third type is either IMUL reg1, reg2 (multiplies reg1 by reg2 and stores the 32-bit result into reg1) or IMUL reg1, reg2, imm (multiplies reg2 by imm and stores the 32-bit result into reg1).
Since in C, multiplies of two 32-bit values produce 32-bit results, compilers normally use the third type (signedness doesn't matter; the low 32 bits agree between signed and unsigned 32x32 multiplies). VC++ will generate the "long multiply" versions of MUL/IMUL if you actually use the full 64-bit result, e.g. here:
unsigned long long prod(unsigned int a, unsigned int b)
{
    return (unsigned long long) a * b;
}
The 2-operand (and 3-operand) versions of IMUL are faster than the one-operand versions simply because they don't produce a full 64-bit result. Wide multipliers are large and slow; it's much easier to build a smaller multiplier and synthesize long multiplies using microcode if necessary. Also, one-operand MUL/IMUL writes two registers, which again is usually resolved by breaking it into multiple instructions internally - it's much easier for the instruction reordering hardware to keep track of two dependent instructions that each write one register (most x86 instructions look like that internally) than it is to keep track of one instruction that writes two.
imul (signed) and mul (unsigned) both have a one-operand form that does edx:eax = eax * src, i.e. a 32x32b => 64b full multiply (or 64x64b => 128b).
186 added an imul dest(reg), src(reg/mem), immediate form, and 386 added an imul r32, r/m32 form, both of which only compute the lower half of the result. (According to NASM's appendix B; see also the x86 tag wiki.)
When multiplying two 32-bit values, the least significant 32 bits of the result are the same, whether you consider the values to be signed or unsigned. In other words, the difference between a signed and an unsigned multiply becomes apparent only if you look at the "upper" half of the result, which one-operand imul/mul puts in edx and two- or three-operand imul puts nowhere. Thus, the multi-operand forms of imul can be used on signed and unsigned values, and there was no need for Intel to add new forms of mul as well. (They could have made multi-operand mul a synonym for imul, but that would make disassembly output not match the source.)
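The claim about the low and high halves can be checked in plain C. This is only a sketch: the function names are made up for illustration, and the 64-bit intermediate stands in for the edx:eax pair that the one-operand instructions produce.

```c
#include <stdint.h>

/* Low half of the unsigned 32x32 product (what ends up in eax). */
uint32_t mul_lo_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Low half of the signed 32x32 product, viewed as raw bits. */
uint32_t mul_lo_signed(int32_t a, int32_t b)
{
    return (uint32_t)(uint64_t)((int64_t)a * b);
}

/* High half of the unsigned product (what one-operand mul puts in edx). */
uint32_t mul_hi_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High half of the signed product (what one-operand imul puts in edx). */
uint32_t mul_hi_signed(int32_t a, int32_t b)
{
    return (uint32_t)((uint64_t)((int64_t)a * b) >> 32);
}
```

With the bit pattern 0xFFFFFFFF (4294967295 unsigned, -1 signed) multiplied by 2, both low halves are 0xFFFFFFFE, while the high halves differ: 1 for the unsigned product and 0xFFFFFFFF for the signed one.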
In C, results of arithmetic operations have the same type as the operands (after integer promotion for narrow integer types). If you multiply two int values together, you get an int, not a long long: the "upper half" is not retained. Hence, the C compiler only needs what imul provides, and since imul is easier to use than mul, the C compiler uses imul to avoid needing mov instructions to get data into / out of eax.
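To make the typing rule concrete, here is a small sketch contrasting the two cases. The function names are hypothetical, and it assumes unsigned int is 32 bits wide, as it is with VC++ on x86. Unsigned operands are used so that the wraparound is well defined.

```c
/* The multiply happens in unsigned int, so the upper half is already
 * gone before the result is widened. A multi-operand imul is enough. */
unsigned long long mul_truncated(unsigned int a, unsigned int b)
{
    unsigned int p = a * b;   /* wraps modulo 2^32 (assuming 32-bit unsigned int) */
    return p;
}

/* One operand is widened first, so the full 64-bit product is needed.
 * This is the case where a compiler may pick the one-operand mul. */
unsigned long long mul_widened(unsigned int a, unsigned int b)
{
    return (unsigned long long)a * b;
}
```

For a = b = 65536, mul_truncated returns 0 and mul_widened returns 4294967296.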
As a second step, since C compilers use the multiple-operand form of imul a lot, Intel and AMD invest effort into making it as fast as possible. It only writes one output register, not e/rdx:e/rax, so it was possible for CPUs to optimize it more easily than the one-operand form. This makes imul even more attractive.
The one-operand form of mul/imul is useful when implementing big number arithmetic. In C, in 32-bit mode, you should get some mul instructions by multiplying unsigned long long values together. But, depending on the compiler and OS, those mul opcodes may be hidden in some dedicated function, so you will not necessarily see them. In 64-bit mode, long long has only 64 bits, not 128, and the compiler will simply use imul.
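As an illustration of the big number case, the sketch below multiplies a multi-limb number by a single 32-bit value. The function name and the little-endian limb layout are assumptions made for this example. Each step needs the full 32x32 => 64 product, which is the pattern where a 32-bit compiler may emit a one-operand mul; the exact code generated depends on the compiler and its settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply the n-limb number x (least significant limb first) by m in place.
 * Returns the limb carried out of the top. */
uint32_t bignum_mul_small(uint32_t *x, size_t n, uint32_t m)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Full 64-bit product plus carry; this cannot overflow 64 bits,
         * since (2^32 - 1)^2 + (2^32 - 1) < 2^64. */
        uint64_t t = (uint64_t)x[i] * m + carry;
        x[i] = (uint32_t)t;            /* low half stays in this limb */
        carry = (uint32_t)(t >> 32);   /* high half moves to the next limb */
    }
    return carry;
}
```

Doubling the two-limb value 0xFFFFFFFF'FFFFFFFF this way leaves the limbs as 0xFFFFFFFE and 0xFFFFFFFF and returns a carry of 1.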