Local vs global truncation error
The truncation error does not satisfy that equation; the equation is simply its definition.
Consider the following two problems:
- The first is an ODE. $$ y'(t) = f(t, y(t))\\ y(0) = a. $$ Its solution is some smooth function $y(t)$.
- The second is a difference equation $$ \frac{z_{i+1} - z_i}{h} = f(t_i, z_i)\\ z_0 = a. $$ Its solution is some discrete function $z_i$.
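The difference equation can be stepped forward directly, since it rearranges to $z_{i+1} = z_i + h f(t_i, z_i)$. Below is a minimal sketch of that recursion; the function name `euler` and the test problem $y' = -y$, $y(0) = 1$ are illustrative assumptions, not part of the original problem statement.

```python
def euler(f, a, h, n):
    """Return [z_0, ..., z_n] from z_{i+1} = z_i + h * f(t_i, z_i), z_0 = a."""
    z = [a]
    for i in range(n):
        t_i = i * h  # uniform grid t_i = i * h
        z.append(z[-1] + h * f(t_i, z[-1]))
    return z

# Hypothetical test problem: y' = -y, y(0) = 1, ten steps of size h = 0.1.
z = euler(lambda t, y: -y, 1.0, 0.1, 10)
print(z[:3])
```

For this particular $f$ each step multiplies the previous value by $1 - h$, so $z_i = (1 - h)^i$.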
I've intentionally used different letters to denote the two solutions. They are quite different objects: the former is a smooth function, while the latter is a discrete one. One needs to be careful even when comparing the two. Usually a third function is introduced, defined as the restriction of the smooth $y(t)$ to the grid $t_i$ on which the discrete function $z_i$ is defined. Let's denote this restriction by $w_i$: $$ w_i \equiv y(t_i). $$
The function $w_i$ is discrete, just like $z_i$, and it coincides with $y(t)$ at the grid points. Since $w_i$ and $z_i$ are now functions of the same class, we can easily compare them: $$ e_i = w_i - z_i \equiv y(t_i) - z_i. $$ This $e_i$ is the global error. Roughly speaking, it shows how close $y(t)$ and $z_i$ are (after restricting the former to the grid). When someone solves a problem numerically, the global error is what they are interested in. However, computing the global error directly is almost impossible, since we usually do not have the exact values $w_i = y(t_i)$ (in contrast to $z_i$, which we can compute easily).
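For a test problem whose exact solution happens to be known, the global error can be computed directly, which is a handy sanity check. The sketch below assumes the illustrative problem $y' = -y$, $y(0) = 1$ with $y(t) = e^{-t}$; the function name `global_errors` is hypothetical.

```python
import math

def global_errors(h, n):
    """Return [e_0, ..., e_n] with e_i = y(t_i) - z_i for y' = -y, y(0) = 1."""
    z = 1.0
    errors = [0.0]  # e_0 = y(0) - z_0 = 0
    for i in range(n):
        z = z + h * (-z)                 # explicit Euler step
        w = math.exp(-(i + 1) * h)       # w_{i+1} = y(t_{i+1})
        errors.append(w - z)
    return errors

errs = global_errors(0.1, 10)
max_err = max(abs(e) for e in errs)
print(max_err)
```

In a real computation $y(t_i)$ is not available, so this kind of check only works on model problems.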
This is where the concept of local truncation error comes to the rescue. Note that previously we compared the solutions. Now we're going to compare the problems. Take $z_i$. It is the solution to the second problem, so plugging $z_i$ into it gives a valid identity $$ \frac{z_{i+1} - z_i}{h} = f(t_i, z_i)\\ z_0 = a. $$ But if we now take $w_i$ and try to plug it into the difference scheme, we won't get an identity. Instead we'll get a residual: $$ \frac{w_{i+1} - w_i}{h} = f(t_i, w_i) \color{red}{{}+ d_i}\\ w_0 = a \color{red}{{} + d_0}. $$ The residual $d_i$ is the local truncation error. If we are lucky, some residuals may vanish, like $d_0$ in the initial condition (here $w_0 = y(0) = a$), but in general they do not.
So why is $d_i$ interesting if it is also defined in terms of $w_i$ (the unknown solution to the original problem)? It turns out that we can estimate $d_i$ without knowing the exact values of $w_i$, just from the original problem: $$ d_i = \frac{w_{i+1} - w_i}{h} - f(t_i, w_i) \equiv \frac{y(t_{i+1}) - y(t_i)}{h} - f(t_i, y(t_i)) = \\ = y'(t_i) + h \frac{y''(t_i)}{2} + O(h^2) - f(t_i, y(t_i)) = \\ = \color{blue}{\left[y'(t_i) - f(t_i, y(t_i))\right]} + \color{red}{h \frac{y''(t_i)}{2} + O(h^2)} $$ The blue term in brackets is exactly the residual of the original ODE, and $y(t)$ is its solution, so that term is equal to zero: $$ d_i = h \frac{y''(t_i)}{2} + O(h^2). $$ A similar result may be obtained using the Lagrange form of the remainder in Taylor's formula: $$ d_i = h \frac{y''(\xi_i)}{2}, \qquad \xi_i \in [t_{i}, t_{i+1}]. $$
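The estimate $d_i = h\,y''(t_i)/2 + O(h^2)$ can be checked numerically on a model problem. The sketch below assumes the illustrative problem $y' = -y$, $y(0) = 1$, for which $y(t) = e^{-t}$ and $y''(t) = e^{-t}$; the helper name `residual` is hypothetical.

```python
import math

h = 0.01
n = 100

def y(t):
    return math.exp(-t)          # exact solution of y' = -y, y(0) = 1

def residual(i):
    """d_i = (y(t_{i+1}) - y(t_i)) / h - f(t_i, y(t_i)) with f(t, y) = -y."""
    t = i * h
    return (y(t + h) - y(t)) / h - (-y(t))

d = [residual(i) for i in range(n)]
leading = [h * math.exp(-i * h) / 2 for i in range(n)]   # h * y''(t_i) / 2

# The gap between d_i and the leading term should be of order h^2.
gap = max(abs(di - li) for di, li in zip(d, leading))
print(d[0], leading[0], gap)
```

Here the gap stays well below $h^2$, in line with the $O(h^2)$ remainder.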
So now we can estimate the local truncation error, but we're interested in the global error.
To relate them we need to introduce another concept: stability. Consider the two discrete problems $$ \begin{aligned} &\frac{z_{i+1} - z_i}{h} = f(t_i, z_i)\\ &z_0 = a \end{aligned} \qquad\text{and}\qquad \begin{aligned} &\frac{w_{i+1} - w_i}{h} = f(t_i, w_i) \color{green}{{} + d_i}\\ &w_0 = a \color{green}{{} + d_0} \end{aligned}. $$ Pretend that we know $d_i$. Let's view the second problem as a perturbation of the first one. That's reasonable, since $d_i$ is small, of magnitude $O(h)$. A difference problem is called stable if such small perturbations result in small changes of the solution. In this case it means that the difference $z_i - w_i$ will also be small. More precisely, $$ \max_i |z_i - w_i| \leq C \max_i |d_i| $$ where $C$ is called the stability constant of the method and does not depend on $h$. For the explicit Euler method it can be shown that for Lipschitz-continuous $f$ $$ C \leq (1 + T)\, e^{LT} $$ where $L$ is the Lipschitz constant of $f$ and $T = \max_i t_i$ is the total integration time. The factor $e^{LT}$ bounds the growth of a single perturbation, and the factor $1 + T$ accounts for the initial perturbation plus the perturbations accumulated over all steps.
Finally, we can relate the global error and the local truncation error by $$ |e_i| \leq C \max_j |d_j|. $$ Since $d_j = O(h)$ for the explicit Euler method, the global error is $O(h)$ as well.
If the local truncation error tends to zero as the mesh is refined, the numerical method is called consistent. The Lax theorem states that a stable, consistent method converges, in the sense that $e_i \to 0$ as the mesh is refined.
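The stability inequality can be observed on a model problem where both $z_i$ and $w_i$ are computable. The sketch below assumes the illustrative problem $y' = -y$, $y(0) = 1$ on $[0, 1]$, so $L = 1$ and $T = 1$. The constant used in the check, $(1 + T)e^{LT}$, is an assumption of this sketch: it follows from the one-step recursion $|e_{i+1}| \leq (1 + hL)|e_i| + h|d_i|$ and is not claimed to be sharp.

```python
import math

T, n = 1.0, 10
h = T / n
L = 1.0                      # Lipschitz constant of f(t, y) = -y with respect to y

def f(t, y):
    return -y

# z_i: solution of the unperturbed difference problem (explicit Euler).
z = [1.0]
for i in range(n):
    z.append(z[-1] + h * f(i * h, z[-1]))

# w_i: exact solution y(t) = exp(-t) restricted to the grid.
w = [math.exp(-i * h) for i in range(n + 1)]

# Perturbations: residuals of w_i in the scheme and in the initial condition.
d = [(w[i + 1] - w[i]) / h - f(i * h, w[i]) for i in range(n)]
d_init = w[0] - 1.0          # zero here, since w_0 = y(0) = a

max_diff = max(abs(zi - wi) for zi, wi in zip(z, w))
max_d = max(abs(d_init), max(abs(di) for di in d))
C = (1.0 + T) * math.exp(L * T)
print(max_diff, C * max_d)
```

On this problem the left-hand side is much smaller than the bound, which is typical: the estimate is a worst-case guarantee rather than a prediction of the actual error.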
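Convergence can be observed numerically by refining the mesh and watching the global error shrink. The sketch below again assumes the illustrative problem $y' = -y$, $y(0) = 1$ on $[0, 1]$; the function name `max_global_error` is hypothetical. For a first-order method, halving $h$ should roughly halve the maximum error.

```python
import math

def max_global_error(n, T=1.0):
    """Max over the grid of |y(t_i) - z_i| for explicit Euler with n steps."""
    h = T / n
    z = 1.0
    worst = 0.0
    for i in range(n):
        z = z + h * (-z)                                   # Euler step for y' = -y
        worst = max(worst, abs(math.exp(-(i + 1) * h) - z))
    return worst

errors = [max_global_error(n) for n in (10, 20, 40)]
ratios = [errors[k] / errors[k + 1] for k in range(2)]
print(errors, ratios)
```

Ratios close to 2 are consistent with a global error of order $O(h)$.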