What does it mean for a polynomial to be the 'best' approximation of a function around a point?
Given a function $f$, polynomials $p_1$ and $p_2$, and some $x_0$, we can define "better" as meaning that there is a punctured neighborhood of $x_0$ in which $p_1$ is the closer approximation. That is, if there exists $\epsilon > 0$ such that $(0 < |x-x_0|<\epsilon) \rightarrow (|p_1(x)-f(x)|<|p_2(x)-f(x)|)$, then near $x_0$, $p_1$ is a better approximation to $f$ than $p_2$ is.
With this definition, Taylor polynomials can be said to be better than any other polynomial of the same order. That is, if $f$ is analytic and $T_n$ is the $n$th order Taylor polynomial of $f$ at $x_0$, then for every $n$th order polynomial $g \neq T_n$, there exists a neighborhood of $x_0$ in which $T_n$ is better than $g$.
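If you want to see this definition in action rather than just on paper, here is a minimal numerical sketch (the rival quadratic $g$ and the test points are arbitrary choices of mine, not part of the argument above): it checks that, close enough to $x_0=0$, the second-order Taylor polynomial of $e^x$ beats a nearby quadratic.

```python
import numpy as np

# f, its 2nd-order Taylor polynomial at x0 = 0, and an arbitrary rival quadratic
f  = np.exp
T2 = lambda x: 1 + x + x**2 / 2        # Taylor polynomial of e^x at 0
g  = lambda x: 1 + 1.05 * x + 0.4 * x**2

for x in [1e-1, 1e-2, 1e-3]:           # points approaching x0 = 0
    print(x, abs(T2(x) - f(x)) < abs(g(x) - f(x)))
# Every comparison prints True: inside a small enough punctured
# neighborhood of 0, T2 is the better approximation.
```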
I think that, in the area of function approximation, we have to distinguish two cases:
- around a point
- over a range
In the case of $\sin(x)$, if we take into account that it is an odd function, then around $x=0$ only odd powers will appear, so the best quadratic approximation will be of the form $kx$; and if we want to match the slope at $x=0$, we must have $k=1$.
Now, forget about the properties of $\sin(x)$ and say that you want the best quadratic approximation between $x=0$ and $x=\frac \pi 6$. So, consider the norm $$\Phi=\int_0^{\frac \pi 6} \big[a+b x+c x^2-\sin(x)\big]^2\,dx$$ I shall skip the intermediate calculations; the minimum of $\Phi$ is obtained for $$a=-\frac{9 \left(1440-720 \sqrt{3}-48 \pi -6 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^3}\approx -0.00116136$$ $$b=\frac{432 \left(1080-540 \sqrt{3}-42 \pi -3 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^4}\approx 1.02675$$ $$c=-\frac{3240 \left(864-432 \sqrt{3}-36 \pi -2 \pi ^2+\sqrt{3} \pi ^2\right)}{\pi ^5}\approx -0.128777$$ This gives $\Phi=9.91 \times 10^{-8}$, while setting $b=1$ and $a=c=0$ over that range gives $\Phi=4.19 \times 10^{-5}$.
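For anyone who prefers to check the minimization numerically rather than redo the algebra, here is a small Python sketch (my own, not part of the original computation) that solves the normal equations for the continuous least-squares problem and evaluates $\Phi$ for both choices of coefficients:

```python
import numpy as np
from scipy.integrate import quad

# Least-squares fit of sin(x) by a + b*x + c*x^2 on [0, pi/6]:
# solve G @ (a, b, c) = m, where G[i, j] = integral of x**(i+j)
# and m[i] = integral of x**i * sin(x) over [0, pi/6].
L = np.pi / 6
G = np.array([[quad(lambda x, p=i + j: x**p, 0, L)[0] for j in range(3)]
              for i in range(3)])
m = np.array([quad(lambda x, p=i: x**p * np.sin(x), 0, L)[0] for i in range(3)])
a, b, c = np.linalg.solve(G, m)
print(a, b, c)                       # approx -0.00116136, 1.02675, -0.128777

Phi = lambda a, b, c: quad(lambda x: (a + b*x + c*x**2 - np.sin(x))**2, 0, L)[0]
print(Phi(a, b, c), Phi(0, 1, 0))    # approx 9.91e-8 vs approx 4.19e-5
```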
Try this. Graph $y = e^x - (x + 1).$ You'll get what appears to be a parabola near $x=0.$ Of course, it's not really a parabola. Indeed, it would be absolutely amazing if the transcendental function $e^x - x - 1$ had the exact geometrical focus and directrix property that a true parabola has. But it seems fairly clear (this is not a proof, of course, since we're just looking at a picture) that the tangent at $x=0$ of this graph is horizontal. Assuming this, the tangent to the graph of $y = e^x - (x + 1) + 0.01x$ will be $y = 0.01x.$ Why? Very close to $x=0,$ the graph of $y = e^x - (x+1)$ is essentially the $x$-axis, so adding $0.01x$ essentially gives the graph of $y = 0.01x.$ And sure enough, if you look at a graph of $y = e^x - (x + 1) + 0.01x$, then you'll see that near $x=0$ the graph is linear and not horizontal (this much you can tell without trying to determine whether it's actually $y = 0.01x$ rather than some other non-horizontal line), which means that changes in $y$ are proportional (by a nonzero constant) to changes in $x,$ something that was NOT true for the graph of $y = e^x - (x+1).$
It will help to investigate, for yourself, the graphs of $y = e^x - (x + 1) + ax$ for various values of $a \neq 0.$ In all such cases you should find that the graph crosses the $x$-axis at a nonzero angle, although when $a$ is close to $0$ you might have to zoom in a bit to see this.
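If you'd rather run this experiment in code than in a graphing calculator, here is one way to do it (the window width and the sample values of $a$ are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-0.1, 0.1, 401)
for a in [0, 0.01, 0.05, -0.05]:   # a = 0 is the original "parabola-like" curve
    plt.plot(x, np.exp(x) - (x + 1) + a * x, label=f"a = {a}")
plt.axhline(0, color="gray", linewidth=0.5)
plt.legend()
plt.show()
# For a != 0 the curve crosses the x-axis at a nonzero angle near x = 0;
# only a = 0 stays tangent to the axis.  Shrink the window to see this
# when a is small.
```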
This investigation suggests that, among all possible linear functions (by "linear", I mean "algebraic of degree at most $1$"), the one that BEST approximates $e^x$ in the vicinity of $x=0$ is $Ax + B$ with $A = 1$ and $B = 1$ (that is, with $a = 0$ in the experiments above). [We actually haven't looked at what happens if $B \neq 1.$ It should be easy to see what happens if $B \neq 1,$ regardless of how we might try to vary the coefficient of $x$ to fix things.]
Usually the next step, when students are presented with an investigation such as this, is to consider what quadratic term we might add to get a better approximation. But before doing that, let's look at an intermediate adjustment to the approximation $1 + x,$ one of the form $1 + x + a|x|^{1.3},$ for various values of $a.$ The reason I'm using $|x|$ is to avoid issues with computer algebra systems trying to interpret everything for complex numbers. You'll find that near $x=0$ it doesn't matter what the value of $a$ is. Consider, for instance, the graph of (1) $y = e^x - (1 + x) + 2|x|^{1.3}$ and the graph of (2) $y = e^x - (1 + x) + 42|x|^{1.3}$. There seems to be no qualitative distinction between (1) the differences of the values of $e^x$ and the values of $1 + x + 2|x|^{1.3}$ and (2) the differences of the values of $e^x$ and the values of $1 + x + 42|x|^{1.3}.$ Of course, to be more convincing (still not a proof, however), you'll want to zoom in closer to $x=0$ to see whether this apparent similarity between (1) and (2) continues to hold. Also, if you try negative values of $a,$ then you'll find that the graph is below the $x$-axis, but the qualitative features are the same. Being below the $x$-axis for negative values of $a$ just means that when $a < 0$ and we're close to $x=0,$ the values of $e^x$ are less than the values of $1 + x + a|x|^{1.3}.$
Let's review things a bit. First, $1 + x$ is the best linear approximation to $e^x$ in the sense that, as $x$ approaches $0,$ the errors are smaller than $ax$ for any $a \neq 0.$ The previous paragraph appears to show that we don't get any substantial benefit by considering adjustments of the form $a|x|^{1.3},$ since the effect of adjusting $1 + x$ by adding $a|x|^{1.3}$ appears to produce graphs that look like $a|x|^{1.3}.$
If you repeat the above investigation for other possibilities of the form $1 + x + a|x|^{b},$ where $1 < b < 2,$ then you'll find that essentially the same thing happens --- there is no unique BEST approximation for these exponents, in the sense that there is NOT a unique value of $a$ (for any previously specified $b)$ that gives a qualitatively better approximation than all other values of $a.$
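A quick numerical version of this observation (the exponent $1.3$ comes from above; the values of $a$ and the test points are arbitrary) is to track the ratio of the error to $|x|^{1.3}$ as $x \to 0$:

```python
import numpy as np

b = 1.3
for a in [2, 42, -5]:
    for x in [1e-1, 1e-2, 1e-3, 1e-4]:
        err = np.exp(x) - (1 + x + a * abs(x)**b)
        print(a, x, err / abs(x)**b)
# For every a the ratio approaches -a: the adjustment a*|x|**1.3 swamps the
# true remainder (which behaves like x**2/2), so no value of a stands out
# as qualitatively better than any other.
```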
The situation changes abruptly if we use $b=2.$ Consider the graphs of $y = e^x - (1 + x - 2x^2)$ and $y = e^x - (1 + x + 5x^2)$. In each case the graphs appear quadratic near the origin, which suggests that the errors are proportional to $x^2,$ which is not qualitatively different than simply using the approximation $1 + x.$ If you experiment by changing the coefficient of the quadratic, you'll find the same thing until by chance you happen to try the magic value $1/2.$ The graph of $y = e^x - (1 + x + \frac{1}{2}x^2)$ appears to be cubic near $x=0.$
To bring this to a close, because this is getting much longer than I really had time for (it started out as a comment), $1 + x + \frac{1}{2}x^2$ is the best quadratic approximation to $e^x$ in the sense that, for functions of the form $ax^2 + bx + c,$ you won't get the errors to be smaller than quadratic (in $x)$ as $x$ approaches $0$ unless you choose $a = \frac{1}{2}$ and $b = 1$ and $c = 1.$ If you don't have $c=1,$ then there will be a "zeroth order" error (i.e. the errors will be proportional to $x^0$ in the limit as $x \rightarrow 0).$ And if $c=1,$ but you don't have $b = 1,$ then there will be a "first order" error (i.e. the errors will be proportional to $x^1$ in the limit as $x \rightarrow 0).$ And finally, if $c=1$ and $b=1,$ but you don't have $a = \frac{1}{2},$ then there will be a "second order" error (i.e. the errors will be proportional to $x^2$ in the limit as $x \rightarrow 0).$ However, if $c=1$ and $b=1$ and $a=\frac{1}{2},$ then the error will be proportional to $x^3$ (and not just to some intermediate order of smallness, like $x^{2.3}$ or $x^{2.87})$ in the limit as $x \rightarrow 0.$
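One way to see these orders of error numerically (the particular "wrong" coefficients below are arbitrary) is to watch how the error shrinks when $x$ is halved: an error proportional to $x^k$ shrinks by a factor of roughly $2^k$.

```python
import numpy as np

def error(x, a, b, c):
    # error of the quadratic approximation c + b*x + a*x^2 to e^x
    return np.exp(x) - (c + b * x + a * x**2)

cases = [(0.5, 1, 0.9, "wrong c"), (0.5, 0.9, 1, "wrong b"),
         (0.4, 1, 1, "wrong a"), (0.5, 1, 1, "Taylor")]
for a, b, c, label in cases:
    ratios = [error(x, a, b, c) / error(x / 2, a, b, c) for x in [1e-1, 1e-2, 1e-3]]
    print(label, [round(r, 2) for r in ratios])
# wrong c -> ratios near 1 (zeroth order error), wrong b -> near 2 (first order),
# wrong a -> near 4 (second order), Taylor -> near 8 (third order error).
```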
As for the situation with $\sin x,$ what happens is that not only is $x$ the best linear approximation, but in fact $x$ is also the best quadratic approximation. That is, the best quadratic approximation is $x + 0x^2.$ And if you look at the graph of $\sin x - x$, you'll see that it resembles $x^3$ near $x=0.$
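Numerically, the cubic behaviour of $\sin x - x$ near $0$ shows up as the ratio $(\sin x - x)/x^3$ settling down to a nonzero constant (it approaches $-1/6$, the next Taylor coefficient):

```python
import numpy as np

for x in [1e-1, 1e-2, 1e-3]:
    print(x, (np.sin(x) - x) / x**3)   # approaches -1/6, about -0.1667
```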
I'll end with this question. How does it happen that, once we stumble upon the best quadratic approximation, the only way to get a better approximation is to consider cubic adjustments? That is, why don't we have best $x^b$ approximations for non-integer values of $b$? Or to put it another way, could it be that the error between a function and its best quadratic approximation is NOT proportional to $x^3$ as $x \rightarrow 0,$ but instead decays a bit more slowly by, for example, being proportional to $x^{2.71}$ as $x \rightarrow 0$? In short, what is behind these exponent jumps, which one might see as analogous to quantum jumps in electron energy in atoms? (Answer: It has to do with the $C^n$ smoothness assumptions in Taylor's theorem.)