Calculate the unit in the last place (ULP) for doubles

It seems the function is pretty trivial; this is based on the pseudocode in the accepted answer to the question linked by vulkanino:

double value = whatever;
long bits = BitConverter.DoubleToInt64Bits(value);
double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
double result = nextValue - value;

For floats, you'd need to provide your own implementation of SingleToInt32Bits and Int32BitsToSingle, since BitConverter doesn't have those functions.

This page shows the special cases in the java implementation of the function; handling those should be fairly trivial, too.

phoog answer is good but has weaknesses with negative numbers, max_double, infinity and NaN.

phoog_ULP(positive x) --> a positive number. Good.
phoog_ULP(negative x) --> a negative number. I would expect positive number.
To fix this I recommend instead:

long bits = BitConverter.DoubleToInt64Bits(value) & 0x7FFFFFFFFFFFFFFFL;

Below are fringe cases that need resolution should you care...

phoog_ULP(x = +/- Max_double 1.797...e+308) returns an infinite result. (+1.996...e+292) expected.
phoog_ULP(x = +/- Infinity) results in a NaN. +Infinity expected.
phoog_ULP(x = +/- NaN) may unexpectedly change from a sNan to a qNaN. No change expected. One could argue either way on if the sign should become + in this case.

To solve these, I only see a short series of brutish if() tests to accommodate these, possible on the "bits" value for expediency. Example:

double ulpc(double value) {
  long long bits = BitConverter::DoubleToInt64Bits(value);
  if ((bits & 0x7FF0000000000000L) == 0x7FF0000000000000L) { // if x is not finite
    if (bits & 0x000FFFFFFFFFFFFFL) { // if x is a NaN
      return value;  // I did not force the sign bit here with NaNs.
      } 
    return BitConverter.Int64BitsToDouble(0x7FF0000000000000L); // Positive Infinity;
    }
  bits &= 0x7FFFFFFFFFFFFFFFL; // make positive
  if (bits == 0x7FEFFFFFFFFFFFFFL) { // if x == max_double (notice the _E_)
    return BitConverter.Int64BitsToDouble(bits) - BitConverter.Int64BitsToDouble(bits-1);
  }
  double nextValue = BitConverter.Int64BitsToDouble(bits + 1);
  double result = nextValue - fabs(value);
  return result;
}

Calculate the unit in the last place (ULP) for doubles

Tags:

C#

Floating Point

Double

.Net

Related

Recent Posts