Calculate the unit in the last place (ULP) for doubles
It seems the function is pretty trivial; this is based on the pseudocode in the accepted answer to the question linked by vulkanino:
double value = whatever;
long bits = BitConverter.DoubleToInt64Bits(value);
double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
double result = nextValue - value;
For floats, you'd need to provide your own implementation of SingleToInt32Bits
and Int32BitsToSingle
, since BitConverter doesn't have those functions.
This page shows the special cases in the java implementation of the function; handling those should be fairly trivial, too.
phoog answer is good but has weaknesses with negative numbers, max_double, infinity and NaN.
phoog_ULP(positive x)
--> a positive number. Good.phoog_ULP(negative x)
--> a negative number. I would expect positive number.
To fix this I recommend instead:
long bits = BitConverter.DoubleToInt64Bits(value) & 0x7FFFFFFFFFFFFFFFL;
Below are fringe cases that need resolution should you care...
phoog_ULP(x = +/- Max_double 1.797...e+308)
returns an infinite result. (+1.996...e+292) expected.phoog_ULP(x = +/- Infinity)
results in a NaN. +Infinity expected.phoog_ULP(x = +/- NaN)
may unexpectedly change from a sNan to a qNaN. No change expected. One could argue either way on if the sign should become + in this case.
To solve these, I only see a short series of brutish if()
tests to accommodate these, possible on the "bits" value for expediency. Example:
double ulpc(double value) {
long long bits = BitConverter::DoubleToInt64Bits(value);
if ((bits & 0x7FF0000000000000L) == 0x7FF0000000000000L) { // if x is not finite
if (bits & 0x000FFFFFFFFFFFFFL) { // if x is a NaN
return value; // I did not force the sign bit here with NaNs.
}
return BitConverter.Int64BitsToDouble(0x7FF0000000000000L); // Positive Infinity;
}
bits &= 0x7FFFFFFFFFFFFFFFL; // make positive
if (bits == 0x7FEFFFFFFFFFFFFFL) { // if x == max_double (notice the _E_)
return BitConverter.Int64BitsToDouble(bits) - BitConverter.Int64BitsToDouble(bits-1);
}
double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
double result = nextValue - fabs(value);
return result;
}