Float operation difference in C vs C++
Introduction: Given that the question is not detailed enough, I am left to speculate that this is the infamous gcc bug 323. As the low bug ID suggests, this bug has been around forever. The bug report has existed since June 2000, currently has 94 (!) duplicates, and the most recent one was reported only half a year ago (on 2018-08-28). The bug affects only 32-bit executables on Intel hardware (for example, under Cygwin). I assume that the OP's code uses x87 floating-point instructions, which are the default for 32-bit executables, while SSE instructions are only optional. Since 64-bit executables are more prevalent than 32-bit ones and no longer depend on x87 instructions, this bug has zero chance of ever being fixed.
Bug description: The x87 architecture has 80-bit floating-point registers. A float requires only 32 bits. The bug is that x87 floating-point operations are always performed with 80-bit accuracy (subject to a hardware configuration flag). This extra accuracy makes precision very flaky, because the result depends on when the registers are spilled (written) to memory.
If an 80-bit register is spilled into a 32-bit variable in memory, the extra precision is lost. This would be the correct behavior if it happened after every floating-point operation (since a float is supposed to be 32 bits). However, spilling to memory slows things down, and no compiler writer wants the executable to run slowly. So by default the values are not spilled to memory.
Now, sometimes the value is spilled to memory and sometimes it is not. It depends on the optimization level, on compiler heuristics, and on other seemingly random factors. Even with -O0 there can be slightly different strategies for dealing with spilling the x87 registers to memory, resulting in slightly different results. The difference in spilling strategy is probably the difference you are experiencing between your C and C++ compilers.
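To see what a spill does to a result, here is a minimal sketch (the values are contrived purely to trigger double rounding, and long double stands in for the 80-bit x87 register; on an SSE-based x86-64 build it prints two different sums):
#include <stdio.h>

int main(void)
{
    float a = 16777216.0f;  /* 2^24: representable floats above it are 2 apart */
    float b = 0.75f;

    /* Wide path: the intermediate stays in a register wider than float
       (simulated here with long double, like an x87 80-bit register). */
    long double reg = (long double)a + b;   /* 16777216.75, held exactly */
    float wide = (float)(reg + b);          /* 16777217.5 -> rounds up to 16777218 */

    /* Spilled path: the intermediate is rounded to 32 bits after each step. */
    float spilled = a + b;                  /* 16777216.75 -> rounds down to 16777216 */
    spilled = spilled + b;                  /* 16777216.75 again -> 16777216 */

    printf("wide    : %.1f\n", (double)wide);     /* 16777218.0 */
    printf("spilled : %.1f\n", (double)spilled);  /* 16777216.0 */
    return 0;
}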
Workaround: For ways to handle this, please read c handling of excess precision. Try running your compiler with -fexcess-precision=standard and compare it with -fexcess-precision=fast. You can also try playing with -mfpmath=sse.
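For example, with gcc on a 32-bit target (the file name test.c is a placeholder; on 32-bit targets -mfpmath=sse also requires SSE to be enabled, e.g. via -msse2):
gcc -m32 -fexcess-precision=standard test.c -o test_standard
gcc -m32 -fexcess-precision=fast test.c -o test_fast
gcc -m32 -msse2 -mfpmath=sse test.c -o test_sse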
NOTE: According to the C++ standard this is not really a bug. However, it is a bug according to the documentation of GCC, which claims to follow the IEEE-754 floating-point standard on Intel architectures (as it does on many other architectures). Obviously bug 323 violates the IEEE-754 standard.
NOTE 2: Some optimization levels (e.g. -Ofast) invoke -ffast-math, and then all bets are off regarding extra precision and evaluation order.
EDIT: I have simulated the described behavior on an Intel 64-bit system and got the same results as the OP. Here is the code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

float hex2float(uint32_t num);
void print(const char* label, float val);
float flush(float x);

int main()
{
    float a = hex2float(0x1D9969BB);
    float b = hex2float(0x6CEDC83E);
    float c = hex2float(0xAC89452F);
    float d = hex2float(0xD2DC92B3);
    float e = hex2float(0x4FE9F23C);
    float result = (float)((double)a+b-c+d+e);
    print("result", result);
    result = flush(flush(flush(flush(a+b)-c)+d)+e);
    print("result2", result);
}
The implementations of the support functions:
/* Reverses the byte order of the pattern (the constants above are given
   byte-reversed) and reinterprets the bits as a float. */
float hex2float(uint32_t num)
{
    uint32_t rev = (num >> 24) | ((num >> 8) & 0xff00) | ((num << 8) & 0xff0000) | (num << 24);
    float f;
    memcpy(&f, &rev, 4);
    return f;
}
/* Prints the value and its raw bytes in memory (little-endian) order. */
void print(const char* label, float val)
{
    printf("%10s (%13.10f) : 0x%02X%02X%02X%02X\n", label, val,
           ((unsigned char*)&val)[0], ((unsigned char*)&val)[1],
           ((unsigned char*)&val)[2], ((unsigned char*)&val)[3]);
}
/* Forces the value through a 32-bit memory location, discarding any
   excess precision (the volatile store cannot be optimized away). */
float flush(float x)
{
    volatile float buf = x;
    return buf;
}
After running this I got exactly the same difference between the results:
result ( 0.4185241461) : 0xCC48D63E
result2 ( 0.4185241759) : 0xCD48D63E
For some reason this is different from the "pure" version described in the question. At one point I was also getting the same results as the "pure" version, but the question has since changed. The original values in the original question were different. They were:
float a = hex2float(0x1D9969BB);
float b = hex2float(0x6CEDC83E);
float c = hex2float(0xD2DC92B3);
float d = hex2float(0xA61FD930);
float e = hex2float(0x4FE9F23C);
and with these values the resulting output is:
result ( 0.4185242951) : 0xD148D63E
result2 ( 0.4185242951) : 0xD148D63E
The C and C++ standards both permit floating-point expressions to be evaluated with more precision than the nominal type. Thus, a+b-c+d+e may be evaluated using double even though the types are float, and the compiler may optimize the expression in other ways. In particular, exact mathematics effectively has infinite precision, so the compiler is free to optimize or otherwise rearrange the expression based on mathematical properties rather than floating-point arithmetic properties.
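One way to observe which evaluation width an implementation chooses is the standard FLT_EVAL_METHOD macro from <float.h>; a minimal sketch:
#include <float.h>
#include <stdio.h>

int main(void)
{
    /*  0: each operation is evaluated at the precision of its type.
        1: float and double operations are evaluated as double.
        2: all operations are evaluated as long double (typical for x87).
       -1: indeterminate. */
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    return 0;
}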
It appears, for whatever reason, your compiler is choosing to use this liberty to evaluate the expression differently in different circumstances (which may be related to the language being compiled or due to other variations between your C and C++ code). One may be evaluating (((a+b)-c)+d)+e while the other does (((a+b)+d)+e)-c, or other variations.
In both languages, the compiler is required to “discard” the excess precision when a cast or assignment is performed. So you can compel a certain evaluation by inserting casts or assignments. Casts would make a mess of the expression, so assignments may be easier to read:
float t0 = a+b;
float t1 = t0-c;
float t2 = t1+d;
float result = t2+e;
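For comparison, the cast-based form of the same computation, which forces a rounding to float at each step but is harder to read:
/* Each cast (and the final assignment) discards the excess precision. */
float result = (float)((float)((float)(a+b) - c) + d) + e;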