Is there any difference between using floating point casts vs floating point suffixes in C and C++?

The default is double. Assuming IEEE754 floating point, double is a strict superset of float, and thus you will never lose precision by not specifying f. EDIT: this is only true when specifying values that can be represented by float. If rounding occurs this might not be strictly true due to having rounding twice, see Eric Postpischil's answer. So you should also use the f suffix for floats.

This example is also problematic:

long double MY_LONG_DOUBLE = (long double)3.14159265358979323846264338328;

This first gives a double constant which is then converted to long double. But because you started with a double you have already lost precision that will never come back. Therefore, if you want to use full precision in long double constants you must use the L suffix:

long double MY_LONG_DOUBLE = 3.14159265358979323846264338328L; // L suffix

There is a difference between using a suffix and a cast; 8388608.5000000009f and (float) 8388608.5000000009 have different values in common C implementations. This code:

#include <stdio.h>

int main(void)
{
    float x =         8388608.5000000009f;
    float y = (float) 8388608.5000000009;
    printf("%.9g - %.9g = %.9g.\n", x, y, x-y);
}

prints “8388609 - 8388608 = 1.” in Apple Clang 11.0 and other implementations that use correct rounding with IEEE-754 binary32 for float and binary64 for double. (The C standard permits implementations to use methods other than IEEE-754 correct rounding, so other C implementations may have different results.)

The reason is that (float) 8388608.5000000009 contains two rounding operations. With the suffix, 8388608.5000000009f is converted directly to float, so the portion that must be discarded in order to fit in a float, .5000000009, is directly examined in order to see whether it is greater than .5 or not. It is, so the result is rounded up to the next representable value, 8388609.

Without the suffix, 8388608.5000000009 is first converted to double. When the portion that must be discarded, .0000000009, is considered, it is found to be less than ½ the low bit at the point of truncation. (The value of the low bit there is .00000000186264514923095703125, and half of it is .000000000931322574615478515625.) So the result is rounded down, and we have 8388608.5 as a double. When the cast rounds this to float, the portion that must be discarded is .5, which is exactly halfway between the representable numbers 8388608 and 8388609. The rule for breaking ties rounds it to the value with the even low bit, 8388608.

(Another example is “7.038531e-26”; (float) 7.038531e-26 is not equal to 7.038531e-26f. This is the such numeral with fewer than eight significant digits when float is binary32 and double is binary64, except of course “-7.038531e-26”.)

Is there any difference between using floating point casts vs floating point suffixes in C and C++?

Tags:

Floating Point

C++

C

Casting

Related

Recent Posts