Is there any difference between using floating point casts vs floating point suffixes in C and C++?
An unsuffixed floating-point constant has type double. Assuming IEEE754 floating point, double is a strict superset of float, and thus you will never lose precision by not specifying f. EDIT: this is only true when specifying values that can be represented by float. If rounding occurs, this might not be strictly true because the value is rounded twice; see Eric Postpischil's answer. So you should also use the f suffix for floats.
This example is also problematic:
long double MY_LONG_DOUBLE = (long double)3.14159265358979323846264338328;
This first gives a double constant, which is then converted to long double. But because you started with a double, you have already lost precision that will never come back. Therefore, if you want to use full precision in long double constants, you must use the L suffix:
long double MY_LONG_DOUBLE = 3.14159265358979323846264338328L; // L suffix
There is a difference between using a suffix and a cast; 8388608.5000000009f and (float) 8388608.5000000009 have different values in common C implementations. This code:
#include <stdio.h>

int main(void)
{
    float x = 8388608.5000000009f;
    float y = (float) 8388608.5000000009;
    printf("%.9g - %.9g = %.9g.\n", x, y, x-y);
}
prints “8388609 - 8388608 = 1.” in Apple Clang 11.0 and other implementations that use correct rounding with IEEE-754 binary32 for float and binary64 for double. (The C standard permits implementations to use methods other than IEEE-754 correct rounding, so other C implementations may have different results.)
The reason is that (float) 8388608.5000000009 contains two rounding operations. With the suffix, 8388608.5000000009f is converted directly to float, so the portion that must be discarded in order to fit in a float, .5000000009, is directly examined in order to see whether it is greater than .5 or not. It is, so the result is rounded up to the next representable value, 8388609.
Without the suffix, 8388608.5000000009 is first converted to double. When the portion that must be discarded, .0000000009, is considered, it is found to be less than ½ the low bit at the point of truncation. (The value of the low bit there is .00000000186264514923095703125, and half of it is .000000000931322574615478515625.) So the result is rounded down, and we have 8388608.5 as a double. When the cast rounds this to float, the portion that must be discarded is .5, which is exactly halfway between the representable numbers 8388608 and 8388609. The rule for breaking ties rounds it to the value with the even low bit, 8388608.
(Another example is “7.038531e-26”; (float) 7.038531e-26 is not equal to 7.038531e-26f. This is the only such numeral with fewer than eight significant digits when float is binary32 and double is binary64, except of course “-7.038531e-26”.)