Why does MSVS not optimize away +0?
It is not the zero constant `0.0f` that is denormalized; it is the values that approach zero on each iteration of the loop. As they get closer and closer to zero, they fall below the smallest normal float and can only be represented as denormals (subnormals). In the original question, these are the `y[i]` values.
The crucial difference between the slow and fast versions of the code is the statement `y[i] = y[i] + 0.1f;`. As soon as this line is executed, the tiny denormal value is absorbed into `0.1f` and lost, and `y[i]` is back in the normal range. Subsequent floating-point operations on `y[i]` remain fast because they no longer operate on denormals.
Why is the denormal part lost when you add `0.1f`? Because floating-point numbers only have so many significant digits. Say you have enough storage for three significant digits; then `0.00001 = 1e-5`, and `0.00001 + 0.1 = 0.1`, at least for this example float format, because there is no room to store the least significant digit of `0.10001`.
The compiler cannot eliminate the addition of a floating-point positive zero because it is not an identity operation. By IEEE 754 rules (in the default round-to-nearest mode), the result of adding +0. to −0. is +0., not −0. The compiler may eliminate the subtraction of +0. or the addition of −0., because those are identity operations.
For example, when I compile this:
double foo(double x) { return x + 0.; }
with Apple GNU C 4.2.1 using `-O3` on an Intel Mac, the resulting assembly code contains `addsd LC0(%rip), %xmm0`. When I compile this:
double foo(double x) { return x - 0.; }
there is no add instruction; the assembly merely returns its input.
So, it is likely the code in the original question contained an add instruction for this statement:
y[i] = y[i] + 0;
but contained no instruction for this statement:
y[i] = y[i] - 0;
However, the first statement involved arithmetic with subnormal values in `y[i]`, so it was sufficient to slow down the program.