What range of numbers can be represented in a 16-, 32- and 64-bit IEEE-754 systems?
For a given IEEE-754 floating point number X, if
2^E <= abs(X) < 2^(E+1)
then the distance from X to the next largest representable floating point number (epsilon) is:
epsilon = 2^(E-52) % For a 64-bit float (double precision)
epsilon = 2^(E-23) % For a 32-bit float (single precision)
epsilon = 2^(E-10) % For a 16-bit float (half precision)
The above equations allow us to compute the following:
For half precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^10. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 1. Any larger than this and the distance between floating point numbers is greater than 0.0005.
For single precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^23. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^13. Any larger than this and the distance between floating point numbers is greater than 0.0005.
For double precision...
If you want an accuracy of +/-0.5 (or 2^-1), the maximum size that the number can be is 2^52. Any larger than this and the distance between floating point numbers is greater than 0.5.
If you want an accuracy of +/-0.0005 (about 2^-11), the maximum size that the number can be is 2^42. Any larger than this and the distance between floating point numbers is greater than 0.0005.
For floating-point integers (I'll give my answer in terms of IEEE double-precision), every integer between 1 and 2^53 is exactly representable. Beyond 2^53, integers that are exactly representable are spaced apart by increasing powers of two. For example:
- Every 2nd integer between 2^53 + 2 and 2^54 can be represented exactly.
- Every 4th integer between 2^54 + 4 and 2^55 can be represented exactly.
- Every 8th integer between 2^55 + 8 and 2^56 can be represented exactly.
- Every 16th integer between 2^56 + 16 and 2^57 can be represented exactly.
- Every 32nd integer between 2^57 + 32 and 2^58 can be represented exactly.
- Every 64th integer between 2^58 + 64 and 2^59 can be represented exactly.
- Every 128th integer between 2^59 + 128 and 2^60 can be represented exactly.
- Every 256th integer between 2^60 + 256 and 2^61 can be represented exactly.
- Every 512th integer between 2^61 + 512 and 2^62 can be represented exactly. . . .
Integers that are not exactly representable are rounded to the nearest representable integer, so the worst case rounding is 1/2 the spacing between representable integers.
The precision quoted form Peter R's link to the MSDN ref is probably a good rule of thumb, but of course reality is more complicated.
The fact that the "point" in "floating point" is a binary point and not decimal point has a way of defeating our intuitions. The classic example is 0.1, which needs a precision of only one digit in decimal but isn't representable exactly in binary at all.
If you have a weekend to kill, have a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic. You'll probably be particularly interested in the sections on Precision and Binary to Decimal Conversion.