Understanding simple numerical calculation
Introduction
This seems to be a question more about IEEE 754 binary64 format than about Mathematica per se.
The significand is 53 bits (52 stored, since the leading bit is assumed to be 1 in a normal number). When the input "0.1" is converted to a number, presumably at some point 1. is divided by 10.
The OP used the term "truncate," but more precisely, the result should be rounded to the nearest 53-bit floating-point number.
Some tools one can use to explore the binary representation of machine precision numbers are
SetPrecision[x, Infinity]
RealDigits[x, 2, 53]
SetPrecision[x, Infinity] converts the machine real x to the exact fraction that the floating-point number represents. RealDigits[x, 2, 53] shows the values of the bits. If you replace the 53 with 54 or higher, the last "bits" will be returned as Indeterminate. Below I plot the bits so that a 1 is red, a 0 is white, and Indeterminate is gray. The 53-bit limit is indicated with a vertical grid line.
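Before the plots, here is a minimal sketch of the two commands applied to 0.1. The names x, exact, bits and expo are only illustrative, and the output comments assume machine reals are IEEE 754 binary64.

```mathematica
x = 0.1;

(* Exact rational value stored in the machine real; the denominator is 2^55 *)
exact = SetPrecision[x, Infinity]
(* 3602879701896397/36028797018963968 *)

(* The stored value is slightly larger than 1/10, because 1/10 was rounded up *)
exact - 1/10
(* 1/180143985094819840 *)

(* The 53 significant bits and the binary exponent *)
{bits, expo} = RealDigits[x, 2, 53];
Length[bits]   (* 53 *)
expo           (* -3 *)

(* Requesting more than 53 bits: the trailing entries are Indeterminate *)
Last@First@RealDigits[x, 2, 57]
```

Note that RealDigits places the leading bit just after the binary point, so the exponent -3 here corresponds to the IEEE exponent -4 for a significand written as 1.xxx.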
OP's examples
In the OP's example Table[] command, the computed fraction is truncated too soon. Here is a table showing what is going on. The first three rows are the OP's table carried out from 12 iterations (as in the OP) up to 14. One has to go up to 14 to get all the bits needed to approximate 0.1 to machine precision. The last three rows show the bits of the machine real 0.1, the result of SetPrecision[0.1, Infinity], and the exact fraction 1/10. One can see that rounding 1/10 results in a carry at the last bit.
ClearAll[tenth, mantissaplot, frexp];
tenth[n_] := Table[1/2^(4*i) + 1/2^(4*i + 1), {i, 1, n}] // Total;
mantissaplot[{digits_, exp_}, ref_: 0.1] := {
   ArrayPlot[{digits},
    ColorRules -> {1 -> Red, Indeterminate -> Gray},
    Mesh -> True, MeshStyle -> Directive[Thin, Black],
    ImageSize -> 450, Axes -> {True, False},
    GridLines -> {{exp - Ceiling@Log2[ref], 53}, None}],
   exp};
frexp[x_, bits_: 54] := mantissaplot@RealDigits[x, 2, bits];
TableForm[
 Join[
  Table[frexp[tenth[n], 57], {n, 12, 14}],
  {frexp[0.1, 57], frexp[SetPrecision[0.1, Infinity], 57], frexp[1/10, 57]}],
 TableHeadings -> {{12, 13, 14, 0.1, Subscript[0.1, Infinity], 1/10},
   {"significand", "exp"}}]
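The need for 14 terms can also be checked by hand. As a sketch, each term of tenth[n] equals $\tfrac{3}{2}\,16^{-i}$, so the partial sum is geometric:

$$\mathrm{tenth}(n)=\sum_{i=1}^{n}\left(2^{-4i}+2^{-4i-1}\right)=\frac{3}{2}\sum_{i=1}^{n}16^{-i}=\frac{1}{10}\left(1-2^{-4n}\right).$$

The absolute error is therefore $2^{-4n}/10$. Machine numbers near 0.1 are spaced $2^{-56}$ apart, so the partial sum is off by 1.6 spacings for $n=13$ but only 0.1 spacings for $n=14$.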
Now let's take up the OP example 10.1 - 10. In the table below, we can see that to store 10.1 the part representing 0.1 is shifted over 7 bits, so that the last 7 bits in 0.1 above are dropped. When 10. is subtracted, the result is shifted back, but the 7 bits are already lost. (This is what is meant by precision loss due to subtractive cancellation.)
TableForm[
 {frexp[10.1], frexp[10.1 - 10.], frexp[0.1]},
 TableHeadings -> {{10.1, HoldForm[10.1 - 10.], 0.1}, {"significand", "exp"}}]
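The same loss can be seen in exact arithmetic. The following is a sketch only: the names a, b and c are illustrative, and the output comments assume IEEE 754 binary64 machine reals.

```mathematica
a = SetPrecision[10.1, Infinity];        (* exact value of the machine real 10.1 *)
b = SetPrecision[10.1 - 10., Infinity];  (* exact value of the computed difference *)
c = SetPrecision[0.1, Infinity];         (* exact value of the machine real 0.1 *)

(* The subtraction itself is exact; no new rounding error is introduced *)
b == a - 10
(* True *)

(* The discrepancy was already present in the stored 10.1 *)
b - c
(* -13/36028797018963968 *)
```

The difference is $13/2^{55} = 26/2^{56}$, and 26 is 0011010 in binary, which should be exactly the last 7 bits of the significand of 0.1 that were dropped when 10.1 was stored.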
Update:
My original answer is based on the apparently wrong assumption that the numbers are represented by $s\times 10^e$ with significand $s$ and exponent $e$. The correct binary representation is $s\times 2^e$ with $s$ having 52 bits plus one bit for the sign and $e$ being 11 bits long for 64 bit floating point numbers. Therefore, 0.1 is not exactly represented by this format.
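A short sketch of the $s\times 2^e$ form for 0.1, using the built-in MantissaExponent with base 2. The names m and e are only illustrative, and MantissaExponent normalizes the mantissa to lie in $[1/2, 1)$, so its exponent is one larger than the IEEE exponent.

```mathematica
(* Mantissa in [1/2, 1) and binary exponent, so that 0.1 == m*2^e *)
{m, e} = MantissaExponent[0.1, 2]
(* {0.8, -3} *)

(* Scaling by a power of 2 only changes the exponent, so this is exact *)
m*2^e == 0.1
(* True *)

(* The stored binary value is not the rational number 1/10 *)
SetPrecision[0.1, Infinity] == 1/10
(* False *)
```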
Original answer:
All your numbers are exactly represented as floating point numbers. There are certain operations that preserve this precision. It seems not changing the exponent is one of them:
N[0.2 - 0.1] // FullForm
(* 0.1` *)
and changing only the exponent is another:
N[10/0.1] // FullForm
(* 100.` *)
However, changing both in one operation may lead to numerical errors:
N[1.1-1.0] // FullForm
(* 0.10000000000000009` *)
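In light of the update, the first and last of these outputs can be re-examined with exact arithmetic. This is a sketch assuming IEEE 754 binary64 machine reals; it is not part of the original answer.

```mathematica
(* 0.2 stores the same significand as 0.1 with the exponent raised by one,
   which is why 0.2 - 0.1 reproduces the machine real 0.1 *)
SetPrecision[0.2, Infinity] == 2 SetPrecision[0.1, Infinity]
(* True *)

(* 1.1 - 1.0 is computed without any rounding ... *)
SetPrecision[1.1 - 1.0, Infinity] == SetPrecision[1.1, Infinity] - 1
(* True *)

(* ... but the result is not the machine real closest to 1/10 *)
SetPrecision[1.1 - 1.0, Infinity] == SetPrecision[0.1, Infinity]
(* False *)
```

So the error in the last example comes from the rounding of 1.1 on input, not from the subtraction.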