How to simulate Single precision rounding with Doubles?

Use code like this (C):

double x, y;
/* ... y gets a double value somewhere ... */
x = (double)(float)y;

After that, x (still a double) holds the value of y rounded to single precision (float).

You can get most of the effect of rounding to single precision using:

y = x + x * 0x1p29 - x * 0x1p29;

In most cases, this produces the same result in y as if x had been rounded to float (32-bit binary IEEE 754) and then converted back to double (64-bit). It works by adding a value (x * 0x1p29) that “pushes” some bits of x out of the significand, causing rounding at bit 23, and then subtracting the value that was added. (0x1p29 is hexadecimal floating-point for 2²⁹, 536870912.)
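As a rough illustration, the expression can be wrapped in a small C helper and compared against the cast-based round trip. The name round_via_add_sub is made up for this sketch, and it assumes IEEE 754 doubles, round-to-nearest mode, and a compiler that does not reassociate floating-point arithmetic (no -ffast-math or similar).

```c
/* Hypothetical helper: approximate float rounding using only double arithmetic.
   Assumes IEEE 754 binary64, round-to-nearest, and no fast-math reassociation. */
double round_via_add_sub(double x)
{
    double big = x * 0x1p29;        /* scaling by a power of two (exact unless it overflows) */
    volatile double sum = x + big;  /* the low bits of x are rounded away in this addition */
    return sum - big;               /* remove the term that was added */
}
```

For an ordinary input such as 0.1, round_via_add_sub(0.1) and (double)(float)0.1 should compare equal.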

In rare cases, it produces a slightly different result. If you merely want to reduce noise in a model, these rare cases might be negligible. If you want to eliminate them, then, instead of adding and subtracting 2²⁹x, you could find the largest power of 2 not greater than x and add and subtract 2²⁹ times that instead of 2²⁹x. (To find the power of 2, you can take the base-two logarithm and take the floor of that. However, there are still rounding issues that might require compensation. Additionally, if the input might be zero or negative, you must avoid the error that occurs when taking its logarithm.)
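A minimal sketch of the power-of-2 variant follows. Instead of the logarithm, it uses frexp and ldexp from <math.h> to get the power of 2 exactly, which sidesteps the rounding and zero/negative issues of the logarithm; that substitution, and the name round_via_pow2, are assumptions of this sketch rather than part of the description above. It assumes IEEE 754 doubles and round-to-nearest mode.

```c
#include <math.h>

/* Hypothetical helper: round x to 24 significant bits by adding and
   subtracting 2^29 times the largest power of two not exceeding |x|.
   Assumes IEEE 754 binary64 and round-to-nearest. It does not model
   float subnormals or float overflow. */
double round_via_pow2(double x)
{
    if (x == 0.0 || !isfinite(x))
        return x;                          /* nothing to round */

    int e;
    (void)frexp(x, &e);                    /* |x| lies in [2^(e-1), 2^e) */
    double p = ldexp(1.0, e - 1);          /* largest power of two <= |x| */
    double big = copysign(p, x) * 0x1p29;  /* give the added term the sign of x */

    volatile double sum = x + big;         /* rounds x at its 24th significant bit */
    return sum - big;
}
```

One input where the two approaches differ is nextafter(2.0, 0.0), the largest double below 2. The cast to float and round_via_pow2 both give 2.0, whereas x + x * 0x1p29 - x * 0x1p29 gives 2 - 2⁻²³, because the intermediate sum crosses into the next binade and is rounded one bit too coarsely.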

Additionally, this does not reproduce the behavior for numbers that are subnormal in single-precision or that overflow in single-precision.
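To make the subnormal case concrete, here is a small sketch that puts the add-and-subtract expression next to a real round trip through float. Both function names are invented for the example, and IEEE 754 arithmetic with round-to-nearest is assumed.

```c
/* Hypothetical helper: the add-and-subtract approximation. */
double round_via_add_sub(double x)
{
    double big = x * 0x1p29;
    volatile double sum = x + big;
    return sum - big;
}

/* Reference: an actual round trip through float, which also applies
   float's limited exponent range. */
double round_via_cast(double x)
{
    return (double)(float)x;
}
```

Take x = 0x1p-140 + 0x1p-160, which is below the smallest normal float (2⁻¹²⁶). Floats in that range are spaced 2⁻¹⁴⁹ apart, so round_via_cast(x) gives 0x1p-140. The value fits in 24 significant bits, though, so round_via_add_sub(x) returns x unchanged.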

Finally, there are rare cases where computing a double-precision result and then rounding to single precision produces a result slightly different from computing a single-precision result originally, and no method of rounding the double-precision result will fix this.

You can create a custom function using a module in Excel VBA:

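Function SINGLEFLO(Eingang As Double) As Double
    Dim MySingle As Single
    ' Converting to Single rounds the value to single precision
    MySingle = CSng(Eingang)
    ' Convert back to Double for use in the worksheet
    SINGLEFLO = CDbl(MySingle)
End Function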

Then you can use SINGLEFLO() in any Excel cell where you need to limit precision to single-precision float.

You want to use the library functions frexp and ldexp, which are standard C99 functions, and are available in Lua.

frexp takes a floating-point number and separates the mantissa from the exponent. The resulting mantissa is either 0 or in one of the ranges [0.5, 1.0) or (-1.0, -0.5]. You can then remove any extra bits in the obvious way (floor(mantissa * 2^k)/2^k for non-negative values, for example). (Edited to add:) It would be better to subtract k from the exponent in the call to ldexp than to do the divide as shown, because I'm pretty sure that Lua doesn't guarantee that 2^k is precise.

ldexp is the inverse of frexp; you can use that to put the truncated number back together again.

I have no idea how to do this in Excel. Check the manual :) (Edited to add:) I suppose you could get roughly the same effect by dividing the number by 2 to the power of the ceiling of the log 2 of the number, and then doing the binary round as indicated above, and then reversing the process to recreate the original exponent. But I suspect the results would occasionally run into peculiarities with Excel's peculiar ideas about arithmetic.
