Double cast to unsigned int on Win32 is truncating to 2,147,483,648
A compiler bug...
From the assembly provided by @anastaciu, the direct cast code calls __ftol2_sse, which seems to convert the number to a signed long. The routine is named ftol2_sse because this is an SSE-enabled machine, but the float is in an x87 floating-point register.
; Line 17
call _getDouble
call __ftol2_sse
push eax
push OFFSET ??_C@_0BH@GDLBDFEH@Direct?5cast?5value?3?5?$CFu?6@
call _printf
add esp, 8
The indirect cast on the other hand does
; Line 18
call _getDouble
fstp QWORD PTR _d$[ebp]
; Line 19
movsd xmm0, QWORD PTR _d$[ebp]
call __dtoui3
push eax
push OFFSET ??_C@_0BJ@HCKMOBHF@Indirect?5cast?5value?3?5?$CFu?6@
call _printf
add esp, 8
which pops and stores the double value to the local variable, then loads it into an SSE register and calls __dtoui3, which is a double to unsigned int conversion routine...
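For orientation, here is a minimal sketch of the kind of program the two listings correspond to. The original source is not shown here, so this is a reconstruction under assumptions: the value 2147483649.0 is inferred from the constant 80000001H in the optimized listing, and the helper names directCast, indirectCast and printBoth are hypothetical.

```c
#include <stdio.h>

/* Assumed test value: 2147483649.0 == 0x80000001, inferred from the
   constant in the optimized listing; the original source is not shown. */
double getDouble(void)
{
    return 2147483649.0;
}

/* Direct cast: convert the function result without naming it. */
unsigned int directCast(void)
{
    return (unsigned int)getDouble();
}

/* Indirect cast: store to a local double first, then convert. */
unsigned int indirectCast(void)
{
    double d = getDouble();
    return (unsigned int)d;
}

/* Print both results with the format strings seen in the listings. */
void printBoth(void)
{
    printf("Direct cast value: %u\n", directCast());
    printf("Indirect cast value: %u\n", indirectCast());
}
```

With a 32-bit unsigned int, a conforming implementation returns 2147483649 from both functions, since the value is within the range of unsigned int.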
The behaviour of the direct cast does not conform to C89; nor does it conform to any later revision - even C89 explicitly says that:
The remaindering operation done when a value of integral type is converted to unsigned type need not be done when a value of floating type is converted to unsigned type. Thus the range of portable values is [0, Utype_MAX + 1).
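To make the quoted range concrete, here is a minimal sketch of a range-checked conversion. The helper name doubleToUintChecked is hypothetical and not part of any library; it simply refuses values outside [0, UINT_MAX + 1), so the cast it performs is always well-defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts d to unsigned int only when d lies in
   the portable range [0, UINT_MAX + 1).
   Returns true and stores the result on success; returns false (and
   leaves *out untouched) otherwise, including for NaN. */
bool doubleToUintChecked(double d, unsigned int *out)
{
    /* UINT_MAX + 1 is computed in double to avoid unsigned wraparound. */
    const double limit = (double)UINT_MAX + 1.0;

    if (d >= 0.0 && d < limit) {   /* any comparison with NaN is false */
        *out = (unsigned int)d;    /* well-defined: value is in range */
        return true;
    }
    return false;
}
```

A value such as 2147483649.0 passes this check when unsigned int is 32 bits wide, which is why the direct cast has no excuse for producing anything other than 2147483649.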
I believe the problem might be a continuation of this from 2005: there used to be a conversion function called __ftol2 which probably would have worked for this code, i.e. it would have converted the value to a signed number -2147483647, which would have produced the correct result when interpreted as an unsigned number.
Unfortunately __ftol2_sse is not a drop-in replacement for __ftol2. Instead of just taking the least-significant value bits as-is, it signals the out-of-range error by returning LONG_MIN / 0x80000000, which, interpreted as unsigned long here, is not at all what was expected. The behaviour of __ftol2_sse would be valid for signed long, as conversion of a double value > LONG_MAX to signed long would have undefined behaviour.
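The arithmetic behind the two outcomes can be sketched in C. The macro and function names below are hypothetical; the two signed values are the ones named in the text (the wrapped result attributed to __ftol2 and the LONG_MIN sentinel attributed to __ftol2_sse). Signed-to-unsigned integer conversion is defined as reduction modulo 2^32, so the reinterpretation itself is portable.

```c
#include <stdint.h>

/* Signed result that keeps the low 32 bits of 2147483649 (0x80000001). */
#define WRAPPED_RESULT  ((int32_t)-2147483647)

/* Out-of-range sentinel, LONG_MIN on Win32 (0x80000000). */
#define SENTINEL_RESULT INT32_MIN

/* Reinterpret a signed 32-bit result as unsigned. The conversion is
   defined by the C standard as reduction modulo 2^32. */
uint32_t asUnsigned(int32_t v)
{
    return (uint32_t)v;
}
```

Here asUnsigned(WRAPPED_RESULT) gives 2147483649, the expected answer, while asUnsigned(SENTINEL_RESULT) gives 2147483648, the value actually printed.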
Following @AnttiHaapala's answer, I tested the code using optimization /Ox and found that this removes the bug, as __ftol2_sse is no longer used:
//; 17 : printf("Direct cast value: %u\n", (unsigned int)getDouble());
push -2147483647 //; 80000001H
push OFFSET $SG10116
call _printf
//; 18 : double d = getDouble();
//; 19 : printf("Indirect cast value: %u\n", (unsigned int)d);
push -2147483647 //; 80000001H
push OFFSET $SG10117
call _printf
add esp, 28 //; 0000001cH
The optimizations inlined getDouble() and added constant expression evaluation, removing the need for a conversion at runtime and making the bug go away.
Just out of curiosity, I ran some more tests, changing the code to force the float-to-int conversion at runtime. In this case the result is still correct: with optimization, the compiler uses __dtoui3 in both conversions:
//; 19 : printf("Direct cast value: %u\n", (unsigned int)getDouble(d));
movsd xmm0, QWORD PTR _d$[esp+24]
add esp, 12 //; 0000000cH
call __dtoui3
push eax
push OFFSET $SG9261
call _printf
//; 20 : double db = getDouble(d);
//; 21 : printf("Indirect cast value: %u\n", (unsigned int)db);
movsd xmm0, QWORD PTR _d$[esp+20]
add esp, 8
call __dtoui3
push eax
push OFFSET $SG9262
call _printf
However, preventing inlining with __declspec(noinline) double getDouble(){...} brings the bug back:
//; 17 : printf("Direct cast value: %u\n", (unsigned int)getDouble(d));
movsd xmm0, QWORD PTR _d$[esp+76]
add esp, 4
movsd QWORD PTR [esp], xmm0
call _getDouble
call __ftol2_sse
push eax
push OFFSET $SG9261
call _printf
//; 18 : double db = getDouble(d);
movsd xmm0, QWORD PTR _d$[esp+80]
add esp, 8
movsd QWORD PTR [esp], xmm0
call _getDouble
//; 19 : printf("Indirect cast value: %u\n", (unsigned int)db);
call __ftol2_sse
push eax
push OFFSET $SG9262
call _printf
__ftol2_sse is called in both conversions, making the output 2147483648 in both situations. @zwol's suspicions were correct.
Compilation details:
- Using the command line:
cl /permissive- /GS /analyze- /W3 /Gm- /Ox /sdl /D "WIN32" program.c
- In Visual Studio:
  - Disabling RTC in Project -> Properties -> Code Generation and setting Basic Runtime Checks to default.
  - Enabling optimization in Project -> Properties -> Optimization and setting Optimization to /Ox.
  - With the debugger in x86 mode.
Nobody has looked at the asm for MS's __ftol2_sse.
From the result, we can infer that it probably converted from x87 to signed int / long (both 32-bit types on Windows), instead of safely to uint32_t.
x86 FP -> integer instructions that overflow the integer result don't just wrap / truncate: they produce what Intel calls the "integer indefinite" when the exact value is not representable in the destination: high bit set, other bits clear, i.e. 0x80000000.
(Or if the FP invalid exception isn't masked, it fires and no value is stored. But in the default FP environment, all FP exceptions are masked. That's why for FP calculations you can get a NaN instead of a fault.)
That includes both x87 instructions like fistp (using the current rounding mode) and SSE2 instructions like cvttsd2si eax, xmm0 (using truncation toward 0; that's what the extra t means).
So it's a bug to compile double -> unsigned conversion into a call to __ftol2_sse.
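The effect can be illustrated with a small software model in C. This is only a sketch of the masked-exception behaviour described above for a 32-bit truncating conversion, not the real instruction and not the actual __ftol2_sse code (which nobody here has inspected); both function names are hypothetical.

```c
#include <stdint.h>

/* Software model (not the real instruction) of a 32-bit truncating
   conversion with FP exceptions masked: in-range values truncate toward
   zero, everything else yields the "integer indefinite" 0x80000000. */
int32_t modelCvttsd2si32(double d)
{
    /* After truncation the value must lie in [INT32_MIN, INT32_MAX]. */
    if (d > -2147483649.0 && d < 2147483648.0) {
        return (int32_t)d;
    }
    return INT32_MIN;   /* also reached for NaN: comparisons are false */
}

/* What a double -> unsigned conversion yields if it is routed through
   the signed model above. */
uint32_t unsignedViaSignedModel(double d)
{
    return (uint32_t)modelCvttsd2si32(d);
}
```

Under this model, unsignedViaSignedModel(2147483649.0) returns 2147483648 even though 2147483649 fits in a uint32_t, which matches the output reported in the question.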
Side-note / tangent:
On x86-64, FP -> uint32_t can be compiled to cvttsd2si rax, xmm0, converting to a 64-bit signed destination, producing the uint32_t you want in the low half (EAX) of the integer destination.
It's C and C++ UB if the result is outside the 0..2^32-1 range, so it's OK that huge positive or negative values will leave the low half of RAX (EAX) zero from the integer indefinite bit-pattern. (Unlike integer->integer conversions, modulo reduction of the value is not guaranteed. See "Is the behaviour of casting a negative double to unsigned int defined in the C standard? Different behaviour on ARM vs. x86.") To be clear, nothing in the question is undefined or even implementation-defined behaviour. I'm just pointing out that if you have FP->int64_t, you can use it to efficiently implement FP->uint32_t. That includes x87 fistp, which can write a 64-bit integer destination even in 32-bit and 16-bit mode, unlike SSE2 instructions, which can only directly handle 64-bit integers in 64-bit mode.
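At the C level, the same idea can be sketched as a conversion through int64_t followed by a narrowing to uint32_t. The function name doubleToU32ViaI64 is hypothetical; how a given compiler lowers it (cvttsd2si with a 64-bit destination, fistp, or a helper call) is up to the compiler and not guaranteed by this code.

```c
#include <stdint.h>

/* Convert through a 64-bit signed integer, then keep the low 32 bits.
   The first cast is well-defined whenever the truncated value fits in
   int64_t; the second is an integer conversion, defined modulo 2^32. */
uint32_t doubleToU32ViaI64(double d)
{
    int64_t wide = (int64_t)d;
    return (uint32_t)wide;
}
```

For every double whose truncated value lies in 0..2^32-1, this returns the same result a correct direct cast would, so doubleToU32ViaI64(2147483649.0) is 2147483649.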