Understanding the Fréchet derivative

The only difference from the familiar single-variable definition is that everything has been moved to one side of the equation.

$$ f'(a) = \lim\limits_{h \to 0} \frac{f(a+h)-f(a)}{h}$$

becomes

$$ 0 = \lim\limits_{h \to 0} \frac{f(a+h)-f(a)}{h} - \frac{hf'(a)}{h} $$

so that

$$ 0 = \lim\limits_{h \to 0} \frac{f(a+h)-f(a)-f'(a)h}{h}$$

Chucking in some absolute values doesn't change anything, since a quantity tends to $0$ exactly when its absolute value does:

$$ 0 = \lim\limits_{h \to 0} \frac{|f(a+h)-f(a)-f'(a)h|}{|h|}$$
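This rearranged definition is easy to check numerically. The sketch below uses $f(x) = x^2$ (an illustrative choice, not from the text) and watches the quotient shrink as $h \to 0$:

```python
# Numerically check that |f(a+h) - f(a) - f'(a)h| / |h| -> 0 as h -> 0,
# using f(x) = x**2 (so f'(a) = 2a) as an illustrative example.
def quotient(f, fprime, a, h):
    return abs(f(a + h) - f(a) - fprime(a) * h) / abs(h)

f = lambda x: x**2
fprime = lambda x: 2 * x
a = 1.0
for h in [0.1, 0.01, 0.001]:
    print(h, quotient(f, fprime, a, h))
# For this particular f the quotient works out to |h| (up to rounding),
# so it visibly goes to 0 with h.
```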

The utility of defining derivatives this way is that it extends to situations other than functions of one variable. Let's recast this once more: instead of sending $h$ to $0$, we can equivalently send $x$ to $a$ (substituting $h = x - a$). Then we get

$$ 0 = \lim\limits_{x \to a} \frac{|f(x)-f(a)-f'(a)(x-a)|}{|x-a|}$$

Now if you replace $x$ and $a$ with vectors, $f$ with a function from vectors to vectors, and think of the absolute value as the length (norm) of a vector, we have a perfectly reasonable definition for a derivative. Well...except for this "$f'(a)(x-a)$" business.

We need to replace the "number" $f'(a)$ with a linear operator (in coordinates, a matrix: the Jacobian), and then everything makes sense.
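Here is a minimal sketch of that vector version, using a made-up map $f:\mathbb{R}^2 \to \mathbb{R}^2$ (the function and direction below are illustrative assumptions): the quotient $\|f(x)-f(a)-J(a)(x-a)\| / \|x-a\|$ shrinks as $x \to a$ when $J(a)$ is the Jacobian.

```python
import numpy as np

# An illustrative map f: R^2 -> R^2 and its Jacobian, computed by hand.
def f(v):
    x, y = v
    return np.array([x * y, x + np.sin(y)])

def jacobian(v):
    x, y = v
    # Rows are the gradients of each component:
    # d(xy) = (y, x),  d(x + sin y) = (1, cos y)
    return np.array([[y, x], [1.0, np.cos(y)]])

a = np.array([1.0, 2.0])
direction = np.array([1.0, 1.0]) / np.sqrt(2)
for t in [0.1, 0.01, 0.001]:
    x = a + t * direction  # approach a
    ratio = (np.linalg.norm(f(x) - f(a) - jacobian(a) @ (x - a))
             / np.linalg.norm(x - a))
    print(t, ratio)  # the Frechet quotient, shrinking with t
```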

By the way, this is my preferred way of presenting derivatives in multivariable calculus. We see the derivative as a linearization which well approximates our function: $f(x) \approx f(a)+f'(a)(x-a)$ (the tangent). When $f$ is a scalar-valued function, $f'(a)(x-a)$ is just the dot product of the gradient $\nabla f(a)$ with $x-a$. Also, approaching this multivariate limit along the coordinate axes recovers the partial derivatives. This then explains why partials (and in fact all directional derivatives) can exist at a point even when a function is not differentiable: a limit can exist along every line yet still fail to exist.
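The last point can be made concrete with the standard counterexample $f(x,y) = x^2 y/(x^4+y^2)$, $f(0,0)=0$ (my choice of example, not from the text above): every directional derivative at the origin exists, yet $f$ is not even continuous there, so it certainly is not differentiable.

```python
import math

# Classic counterexample: all directional derivatives exist at the origin,
# but f is discontinuous there along the parabola y = x**2.
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**4 + y**2)

# Along any line through the origin the difference quotient settles down...
c, s = math.cos(1.0), math.sin(1.0)  # an arbitrary direction (cos t, sin t)
for t in [0.1, 0.01, 0.001]:
    print(f(t * c, t * s) / t)  # tends to c**2 / s as t -> 0

# ...but along the parabola y = x**2 the function is constantly 1/2,
# which is not f(0,0) = 0, so f is not continuous at the origin.
for t in [0.1, 0.01, 0.001]:
    print(f(t, t**2))  # equals 1/2 (up to rounding)
```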