Understanding simple numerical calculation

Introduction

This seems to be a question more about IEEE 754 binary64 format than about Mathematica per se.

The significand is 53 bits (52 are stored, since the leading bit of a normal number is always 1 and is left implicit). When the input "0.1" is converted to a number, presumably at some point 1. is divided by 10. The OP used the term "truncate," but more precisely, the result is rounded to the nearest 53-bit floating-point number.
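The rounding claim can be checked with exact arithmetic. This is a small sketch, assuming the machine reals are IEEE 754 binary64 with round-to-nearest (as on all common platforms); `m` is a hypothetical local name, and SetPrecision is the tool introduced just below. Since 1/10 lies in [2^-4, 2^-3), the last of its 53 significand bits is the 2^-56 place, so rounding 1/10 to 53 bits amounts to rounding 2^56/10 to the nearest integer.

```mathematica
(* 1/10 lies in [2^-4, 2^-3), so the last of its 53 significand bits is the 2^-56 place *)
m = Round[2^56/10]             (* nearest integer to 7205759403792793.6 *)
(* 7205759403792794 *)

2^52 <= m < 2^53               (* m really is a 53-bit integer *)
(* True *)

(* the exact value of the machine number 0.1 is m × 2^-56 *)
SetPrecision[0.1, Infinity] == m/2^56
(* True *)
```

Because the discarded fraction .6 is more than one half, the last bit is rounded up; this is the carry that shows up in the bit table further down.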

Some tools one can use to explore the binary representation of machine-precision numbers are:

SetPrecision[x, Infinity]
RealDigits[x, 2, 53]

SetPrecision[x, Infinity] converts the machine real to the exact fraction represented by the floating-point number x. RealDigits[x, 2, 53] returns the 53 bits of the significand together with the binary exponent. If you replace the 53 with 54 or higher, the extra "bits" are returned as Indeterminate. Below, I plot the bits so that a 1 is red, a 0 is white, and Indeterminate is gray; the 53-bit limit is indicated by a vertical grid line.
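For readers who prefer numbers to plots, here is a sketch of what RealDigits returns for the machine real 0.1. `bits` and `exp` are hypothetical names for the two parts of the result; the Indeterminate padding follows the behavior described above.

```mathematica
(* ask for two more bits than a machine real carries *)
{bits, exp} = RealDigits[0.1, 2, 55];

exp                  (* 0.1 = 0.110011001100..._2 × 2^-3 *)
(* -3 *)

bits[[49 ;; 55]]     (* last five significand bits, then the padding *)
(* {1, 1, 0, 1, 0, Indeterminate, Indeterminate} *)

(* the 53 valid bits reproduce the exact value returned by SetPrecision *)
FromDigits[{Take[bits, 53], exp}, 2] == SetPrecision[0.1, Infinity]
(* True *)
```

The exact expansion would end in ...1,1,0,0,1 at bit 53; the ...1,1,0,1,0 seen here is that pattern after rounding up.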

OP's examples

In the OP's Table[] command, the computed fraction is truncated too soon. Here is a table showing what is going on. The first three rows are the OP's table carried out for 12 iterations (as in the OP), 13, and 14 iterations. One has to go up to 14 to get all the bits needed to approximate 0.1 to machine precision. The last three rows show the bits of the machine real 0.1, of the result of SetPrecision[0.1, Infinity], and of the exact fraction 1/10. One can see that rounding 1/10 results in a carry at the last bit.

[Image: table of significand bit plots produced by the code below]

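ClearAll[tenth, mantissaplot, frexp];
(* partial sums of the binary expansion 1/10 = 0.000110011..._2 *)
tenth[n_] := Table[1/2^(4*i) + 1/2^(4*i + 1), {i, 1, n}] // Total;
(* plot significand bits: 1 = red, 0 = white, Indeterminate = gray;
   first grid line: offset of ref's leading bit within x's significand;
   second grid line: the 53-bit limit *)
mantissaplot[{digits_, exp_}, ref_: 0.1] := {ArrayPlot[
    {digits}, ColorRules -> {1 -> Red, Indeterminate -> Gray},
    Mesh -> True, MeshStyle -> Directive[Thin, Black], 
    ImageSize -> 450, Axes -> {True, False}, 
    GridLines -> {{exp - Ceiling@Log2[ref], 53}, None}],
   exp};
(* plot the first `bits` binary digits of x, returned with its exponent *)
frexp[x_, bits_: 54] := mantissaplot@RealDigits[x, 2, bits];
(* rows: partial sums for n = 12, 13, 14, then machine 0.1, its exact value, and 1/10 *)
TableForm[
 Join[
  Table[frexp[tenth[n], 57], {n, 12, 14}],
  {frexp[0.1, 57],
   frexp[SetPrecision[0.1, Infinity], 57],
   frexp[1/10, 57]}
  ],
 TableHeadings -> {{12, 13, 14, 0.1, Subscript[0.1, Infinity], 
    1/10}, {"significand", "exp"}}]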
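The carry can also be confirmed without plots. This sketch repeats the tenth[n] definition from above so that it runs on its own; `exact` and `truncated` are hypothetical names. Summing the tail of the series gives 1/10 = tenth[13] + 1/(5·2^53), so chopping at the 2^-56 place keeps only the first half of the 14th term, and round-to-nearest then bumps the last bit up.

```mathematica
(* same partial sums as tenth[n] above *)
tenth[n_] := Total@Table[1/2^(4 i) + 1/2^(4 i + 1), {i, 1, n}];

exact = SetPrecision[0.1, Infinity];   (* machine 0.1 as an exact fraction *)
truncated = Floor[2^56/10]/2^56;        (* 1/10 chopped to 53 significant bits *)

truncated == tenth[13] + 2^-56   (* 13 full terms plus the first half of term 14 *)
(* True *)
exact == truncated + 2^-56       (* rounding to nearest adds one unit in the last place *)
(* True *)
exact > 1/10                     (* so the machine number 0.1 is slightly too large *)
(* True *)
```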

Now let's take up the OP's example 10.1 - 10. In the table below, we can see that to store 10.1, the part representing 0.1 is shifted over 7 bits, so the last 7 bits of 0.1 shown above are dropped. When 10. is subtracted, the result is shifted back, but the 7 bits are already lost. (This is what is meant by precision loss due to subtractive cancellation.)

[Image: bit plots of 10.1, 10.1 - 10., and 0.1 produced by the code below]

TableForm[{
  frexp[10.1],
  frexp[10.1 - 10.],
  frexp[0.1]
  },
 TableHeadings -> {{10.1, HoldForm[10.1 - 10.], 0.1}, {"significand", "exp"}}]
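The 7-bit loss can be confirmed with exact values. Equal (==) on machine reals deliberately tolerates differences in the last few binary digits, so this sketch compares exact rationals obtained with SetPrecision[..., Infinity] instead; `d`, `a`, `b`, `ulp101`, and `ulp01` are hypothetical names.

```mathematica
d = 10.1 - 10.;
{a, b} = SetPrecision[{10.1, d}, Infinity];  (* exact values of the machine numbers *)

b == a - 10            (* the subtraction itself introduces no new error *)
(* True *)

(* spacing of machine numbers: 10.1 is in [8, 16), 0.1 is in [1/16, 1/8) *)
ulp101 = 2^(3 - 52);   (* 2^-49 *)
ulp01 = 2^(-4 - 52);   (* 2^-56 *)
Log2[ulp101/ulp01]     (* bit positions of 0.1 that 10.1 has no room for *)
(* 7 *)

Mod[b 2^56, 2^7]       (* the last 7 of the 53 bit positions of d are all zero *)
(* 0 *)

(b - SetPrecision[0.1, Infinity]) 2^56   (* d vs. machine 0.1, in units of 2^-56 *)
(* -26 *)
```

Since 10.1 and 10. are within a factor of 2 of each other, the subtraction is exact (Sterbenz's lemma); the damage was done when 10.1 was rounded to 53 bits.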

Update:

My original answer was based on the wrong assumption that the numbers are represented as $s\times 10^e$ with significand $s$ and exponent $e$. The correct binary representation is $s\times 2^e$; for 64-bit floating-point numbers, $s$ is stored in 52 bits (53 significant bits, counting the implicit leading 1), there is one sign bit, and $e$ is 11 bits long. Therefore, 0.1 is not exactly represented in this format: $1/10$ has a factor of 5 in its denominator, so its binary expansion $0.0\overline{0011}_2$ never terminates.
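A sketch of the representability argument: an exact rational has a finite binary expansion exactly when its denominator (in lowest terms) is a power of 2. `dyadicQ` is a hypothetical helper, not a built-in. The last check relates the 52 stored fraction bits to $MachineEpsilon, which is 2^-52 on IEEE 754 binary64 platforms.

```mathematica
(* hypothetical helper: True when Denominator[q] is a power of 2 *)
dyadicQ[q_] := With[{den = Denominator[q]}, den == 2^IntegerExponent[den, 2]]

dyadicQ /@ {1/10, 1/5, 1/2, 3/8, 10}
(* {False, False, True, True, True} *)

(* 52 stored fraction bits: the gap between 1. and the next machine number is 2^-52 *)
SetPrecision[$MachineEpsilon, Infinity] == 2^-52
(* True *)
```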

Original answer:

All your numbers are exactly represented as floating point numbers. There are certain operations that preserve this precision. It seems that operations that do not change the exponent are one of them:

N[0.2 - 0.1] // FullForm
(* 0.1` *)

and operations that change only the exponent are another:

N[10/0.1] // FullForm
(* 100.` *)

However, changing both in one operation may lead to numerical errors:

N[1.1-1.0] // FullForm
(* 0.10000000000000009` *)
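In light of the update, the three results above can be explained with the exact values of the machine numbers involved. A sketch; `ex` is a hypothetical shorthand, and the comparisons use exact rationals because Equal on machine reals has a built-in tolerance.

```mathematica
ex[x_] := SetPrecision[x, Infinity]   (* hypothetical shorthand: exact value of a machine real *)

ex[0.2] == 2 ex[0.1]            (* 0.2 has the same significand as 0.1, exponent + 1 *)
(* True *)
ex[0.2 - 0.1] == ex[0.1]        (* so the difference is exactly the machine 0.1 *)
(* True *)

10/ex[0.1] == 100               (* the exact quotient is not 100 ... *)
(* False *)
ex[10/0.1] == 100               (* ... but it rounds to 100 *)
(* True *)

(ex[1.1 - 1.0] - ex[0.1]) 2^56  (* 1.1 - 1.0 misses 0.1 by 6 units of 2^-56 *)
(* 6 *)
```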
