How to properly compare an integer and a floating-point value?
(Restricting this answer to positive numbers; generalisation is trivial.)
1. Get the number of bits in the significand (mantissa) of the float on your platform, along with the radix (std::numeric_limits<float>::digits and std::numeric_limits<float>::radix give you both). If you have an IEEE754 32-bit float, this is a trivial step.
2. Use (1) to compute the largest non-integer value that can be stored in your float. std::numeric_limits doesn't specify this value, annoyingly, so you need to do this yourself. For 32-bit IEEE754 you could take the easy option: 8388607.5 is the largest non-integral float.
3. If your float is less than or equal to (2), then check whether it's an integer. If it's not an integer, you can round it appropriately so as not to invalidate the <.
4. At this point, the float is an integer. Check whether it's within the range of your long long. If it's out of range, the result of < is known.
5. If you get this far, you can safely cast your float to a long long and make the comparison.
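A minimal C++ sketch of these five steps follows. It assumes a radix-2 float that is non-negative and not NaN, and long long as the integer type. The helper name compare_positive and its -1/0/+1 return convention are hypothetical, and step 3 uses std::modf as one possible way to do the "appropriate rounding".

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not from the answer): returns -1 if i < f,
// 0 if i == f and +1 if i > f. Assumes f is non-negative and not NaN,
// and that float uses radix 2.
int compare_positive(long long i, float f)
{
    static_assert(std::numeric_limits<float>::radix == 2, "binary float assumed");

    // Steps 1-2: with `digits` significand bits (24 for IEEE754 binary32),
    // every float >= 2^(digits - 1) is an integer, so the largest
    // non-integer float is 2^(digits - 1) - 0.5, i.e. 8388607.5 for binary32.
    const float largest_non_int =
        std::ldexp(1.0f, std::numeric_limits<float>::digits - 1) - 0.5f;

    // Step 3: at or below that bound, f may have a fractional part.
    if (f <= largest_non_int)
    {
        float ipart;
        const float frac = std::modf(f, &ipart);            // f == ipart + frac
        const long long n = static_cast<long long>(ipart);  // exact: small value
        if (i != n)
            return i < n ? -1 : 1;
        return frac > 0 ? -1 : 0;  // i == trunc(f): i < f iff f has a fraction
    }

    // Step 4: f is now an integer. 2^digits of long long (2^63 for 64 bits)
    // is exactly representable as float, unlike LLONG_MAX, and any
    // f >= that limit exceeds every long long.
    const float ll_limit = std::ldexp(1.0f, std::numeric_limits<long long>::digits);
    if (f >= ll_limit)
        return -1;

    // Step 5: f is an integer within range, so the conversion is exact.
    const long long n = static_cast<long long>(f);
    return i < n ? -1 : (i > n ? 1 : 0);
}
```

For example, compare_positive(16777217, 16777216.0f) returns 1, whereas converting the integer to float first would make the two values compare equal.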
Here's what I ended up with.
Credit for the algorithm goes to @chux; his approach appears to outperform the other suggestions. You can find some alternative implementations in the edit history.
If you can think of any improvements, suggestions are welcome.
#include <cmath>
#include <limits>
#include <type_traits>

enum partial_ordering {less, equal, greater, unordered};

template <typename I, typename F>
partial_ordering compare_int_float(I i, F f)
{
    if constexpr (std::is_integral_v<F> && std::is_floating_point_v<I>)
    {
        // Called as (float, int): compare the other way round and flip the
        // result, so it still describes `first argument <=> second argument`.
        partial_ordering r = compare_int_float(f, i);
        return r == less ? greater : r == greater ? less : r;
    }
    else
    {
        static_assert(std::is_integral_v<I> && std::is_floating_point_v<F>);
        static_assert(std::numeric_limits<F>::radix == 2);

        // This should be exactly representable as F due to being a power of two.
        constexpr F I_min_as_F = std::numeric_limits<I>::min();

        // The `numeric_limits<I>::max()` itself might not be representable as F, so we use this instead.
        constexpr F I_max_as_F_plus_1 = F(std::numeric_limits<I>::max()/2+1) * 2;

        // Check if the constants above overflowed to infinity. Normally this shouldn't happen.
        // (`min()` is 0 for unsigned types and 0 * 2 == 0, so zero is excluded explicitly.)
        constexpr bool limits_overflow = (I_min_as_F != 0 && I_min_as_F * 2 == I_min_as_F) || I_max_as_F_plus_1 * 2 == I_max_as_F_plus_1;
        if constexpr (limits_overflow)
        {
            // Manually check for special floating-point values.
            if (std::isinf(f))
                return f > 0 ? less : greater;
            if (std::isnan(f))
                return unordered;
        }

        if (limits_overflow || f >= I_min_as_F)
        {
            // `f <= I_max_as_F_plus_1 - 1` would be problematic due to rounding, so we use this instead.
            if (limits_overflow || f - I_max_as_F_plus_1 <= -1)
            {
                I f_trunc = f;
                if (f_trunc < i)
                    return greater;
                if (f_trunc > i)
                    return less;

                F f_frac = f - f_trunc;
                if (f_frac < 0)
                    return greater;
                if (f_frac > 0)
                    return less;

                return equal;
            }

            return less;
        }

        if (f < 0)
            return greater;

        return unordered;
    }
}
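To see why the code avoids `f <= I_max_as_F_plus_1 - 1`, here is a small standalone check, assuming an IEEE754 binary32 float and a 64-bit long long. The names max_plus_1, naive_limit, in_range_naive and in_range_exact are made up for this illustration; max_plus_1 mirrors I_max_as_F_plus_1 for I = long long, F = float.

```cpp
#include <limits>

// Assumes IEEE754 binary32 float and a 64-bit long long.
// Same construction as I_max_as_F_plus_1: exactly 2^63.
constexpr float max_plus_1 =
    float(std::numeric_limits<long long>::max() / 2 + 1) * 2;

// The "obvious" bound 2^63 - 1 is not representable as float;
// stored as float, it rounds straight back to 2^63.
constexpr float naive_limit = max_plus_1 - 1;

// Naive check: wrongly accepts f == 2^63, which does not fit in long long.
bool in_range_naive(float f) { return f <= naive_limit; }

// Check used in the answer: near 2^63 the difference f - 2^63 is exact,
// so only floats at least 1 below 2^63 pass.
bool in_range_exact(float f) { return f - max_plus_1 <= -1; }
```

Under these assumptions, in_range_naive(9223372036854775808.0f) returns true even though 2^63 does not fit in a long long. in_range_exact rejects it and still accepts 9223371487098961920.0f, the largest float below 2^63.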
If you want to experiment with it, here are a few test cases:
#include <cmath>
#include <iomanip>
#include <iostream>

// Uses compare_int_float() from above.
// With n > 0, also compares `a` against the n floats on either side of `b`.
void compare_print(long long a, float b, int n = 0)
{
    if (n == 0)
    {
        auto result = compare_int_float(a,b);
        std::cout << a << ' ' << "<=>?"[int(result)] << ' ' << b << '\n';
    }
    else
    {
        for (int i = 0; i < n; i++)
            b = std::nextafter(b, -INFINITY);

        for (int i = 0; i <= n*2; i++)
        {
            compare_print(a, b);
            b = std::nextafter(b, INFINITY);
        }

        std::cout << '\n';
    }
}

int main()
{
    std::cout << std::setprecision(1000);

    compare_print(999999984306749440,
                  999999984306749440.f, 2);
    compare_print(999999984306749439,
                  999999984306749440.f, 2);
    compare_print(100,
                  100.f, 2);
    compare_print(-100,
                  -100.f, 2);
    compare_print(0,
                  0.f, 2);
    compare_print((long long)0x8000'0000'0000'0000,
                  (long long)0x8000'0000'0000'0000, 2);
    compare_print(42, INFINITY);
    compare_print(42, -INFINITY);
    compare_print(42, NAN);
    std::cout << '\n';

    compare_print(1388608,
                  1388608.f, 2);
    compare_print(12388608,
                  12388608.f, 2);
}
To compare a FP f and integer i for equality:

(Code is representative and uses comparison of float and long long as an example.)

1. If f is a NaN, infinity, or has a fractional part (perhaps use modf()), f is not equal to i.

       float ipart;
       // C++ (modf returns NaN for a NaN; infinities are caught by step 3)
       if (std::modf(f, &ipart) != 0) return not_equal;
       // C
       if (modff(f, &ipart) != 0) return not_equal;

2. Convert the numeric limits of i into exactly representable FP values (powers of 2) near those limits.** This is easy to do if we assume FP is not a rare base 10 encoding and the range of double exceeds the range of i. Take advantage of the fact that the magnitudes of the integer limits are Mersenne numbers or close to them. (Sorry, example code is C-ish.)

       #define FP_INT_MAX_PLUS1 ((LLONG_MAX/2 + 1)*2.0)
       #define FP_INT_MIN (LLONG_MIN*1.0)

3. Compare f to its limits.

       if (f >= FP_INT_MAX_PLUS1) return not_equal;
       if (f < FP_INT_MIN) return not_equal;

4. Convert f to integer and compare.

       return (long long) f == i;
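Put together, the four equality steps could be written as one C++ function roughly like this. It is a sketch assuming a binary FP type and a two's complement long long; the name int_float_equal is hypothetical, and the two macros are written as constexpr double constants with the same values.

```cpp
#include <climits>
#include <cmath>

// Step 2: exactly representable limits (powers of two).
// Assumes binary FP and a two's complement long long.
constexpr double FP_INT_MAX_PLUS1 = (LLONG_MAX / 2 + 1) * 2.0;  // 2^63
constexpr double FP_INT_MIN = LLONG_MIN * 1.0;                  // -2^63

// Hypothetical helper: true iff f and i have the same mathematical value.
bool int_float_equal(long long i, float f)
{
    // Step 1: a NaN or a fractional part means "not equal".
    // (modf returns NaN for a NaN and 0 for an infinity; infinities
    // are rejected by the limit checks below.)
    float ipart;
    if (std::modf(f, &ipart) != 0) return false;

    // Step 3: outside [-2^63, 2^63) f cannot equal any long long.
    if (f >= FP_INT_MAX_PLUS1) return false;
    if (f < FP_INT_MIN) return false;

    // Step 4: f is an in-range integer, so the conversion is exact.
    return static_cast<long long>(f) == i;
}
```

With this, int_float_equal(16777217, 16777216.0f) is false and int_float_equal(16777216, 16777216.0f) is true.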
To compare a FP f and integer i for <, >, ==, or not comparable:

(Using the limits above.)

1. Test f >= lower limit.

       if (f >= FP_INT_MIN) {

2. Test f <= upper limit.

         // reform below to cope with effects of rounding
         // if (f <= FP_INT_MAX_PLUS1 - 1)
         if (f - FP_INT_MAX_PLUS1 <= -1.0) {

3. Convert f to integer/fraction and compare.

           // at this point `f` is in the range of `i`
           long long ipart = (long long) f;
           if (ipart < i) return f_less_than_i;
           if (ipart > i) return f_more_than_i;
           float frac = f - ipart;
           if (frac < 0) return f_less_than_i;
           if (frac > 0) return f_more_than_i;
           return equal;
         }

4. Handle edge cases.

         else return f_more_than_i;    // f above the range, including +infinity
       }
       if (f < 0.0) return f_less_than_i;  // f below the range, including -infinity
       return not_comparable;              // f is a NaN
Simplifications are possible, but I wanted to convey the algorithm.
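Assembling the four ordering fragments into a single function gives something like the sketch below, again for float and a two's complement long long. The result names come from the fragments; the enum type name order and the function name compare_fp_int are hypothetical.

```cpp
#include <climits>

// Result names taken from the fragments; the type name is hypothetical.
enum order { f_less_than_i, equal, f_more_than_i, not_comparable };

// Same power-of-two limits as in the equality steps.
// Assumes binary FP and a two's complement long long.
constexpr double FP_INT_MAX_PLUS1 = (LLONG_MAX / 2 + 1) * 2.0;  // 2^63
constexpr double FP_INT_MIN = LLONG_MIN * 1.0;                  // -2^63

order compare_fp_int(float f, long long i)
{
    if (f >= FP_INT_MIN) {                                   // step 1
        if (f - FP_INT_MAX_PLUS1 <= -1.0) {                  // step 2
            // step 3: f is within the range of long long, cast is exact
            long long ipart = static_cast<long long>(f);
            if (ipart < i) return f_less_than_i;
            if (ipart > i) return f_more_than_i;
            float frac = f - ipart;                          // sign of the fraction
            if (frac < 0) return f_less_than_i;
            if (frac > 0) return f_more_than_i;
            return equal;
        }
        return f_more_than_i;      // step 4: f above the range, including +inf
    }
    if (f < 0.0) return f_less_than_i;  // f below the range, including -inf
    return not_comparable;              // only a NaN reaches this point
}
```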
** Additional conditional code is needed to cope with non-2's-complement integer encodings. It is quite similar to the MAX code.
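As a sketch of what that conditional code might look like (an assumption, not code from the answer), the lower-limit test can mirror the MAX trick when LLONG_MIN is -(2^63 - 1) instead of -2^63. The macro names FP_INT_MIN_MINUS1 and FP_AT_LEAST_INT_MIN are hypothetical; using FP_AT_LEAST_INT_MIN(f) in place of `f >= FP_INT_MIN` leaves the rest of the algorithm unchanged. On common two's complement platforms only the #else branch is compiled.

```cpp
#include <climits>

#if LLONG_MIN == -LLONG_MAX
// Not two's complement: LLONG_MIN is -(2^63 - 1), which is not a power of
// two. Use the power of two just below it, mirroring FP_INT_MAX_PLUS1.
#define FP_INT_MIN_MINUS1 ((LLONG_MIN/2 - 1)*2.0)              /* -2^63 */
#define FP_AT_LEAST_INT_MIN(f) ((f) - FP_INT_MIN_MINUS1 >= 1.0)
#else
// Two's complement: LLONG_MIN is -2^63, itself exactly representable.
#define FP_INT_MIN (LLONG_MIN*1.0)
#define FP_AT_LEAST_INT_MIN(f) ((f) >= FP_INT_MIN)
#endif
```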