Proof of this fairly obscure differentiation trick?

Your observation is true and follows from the multivariable chain rule. To see why, let $f \colon \mathbb{R}^2 \rightarrow \mathbb{R}$ be differentiable and let $\gamma \colon \mathbb{R} \rightarrow \mathbb{R}^2$ be a differentiable curve. Set $\gamma(t) = (\gamma_1(t),\gamma_2(t))$ and consider the composition $h(t) = f(\gamma(t))$, which is a differentiable function from $\mathbb{R}$ to $\mathbb{R}$. The chain rule implies that

$$ h'(t) = \frac{d}{dt} f(\gamma_1(t),\gamma_2(t)) = \frac{\partial f}{\partial x}(\gamma(t)) \cdot \gamma_1'(t) + \frac{\partial f}{\partial y}(\gamma(t)) \cdot \gamma_2'(t). $$

If we take $\gamma(t) = (t,t)$, we get exactly your observation, and the same argument generalizes to arbitrary $N$: taking $\gamma(t) = (t,\dots,t)$ shows that the derivative of $t \mapsto f(t,\dots,t)$ is the sum of all $N$ partial derivatives evaluated at $(t,\dots,t)$.
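If you want to sanity-check the identity symbolically, here is a small Python sketch using sympy (the test function $f(x,y) = \sin(xy) + x^2$ is an arbitrary choice, not anything from the question):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

# Arbitrary differentiable test function f(x, y).
f = sp.sin(x * y) + x**2

# Left-hand side: differentiate f(t, t) directly as a function of t.
lhs = sp.diff(f.subs({x: t, y: t}), t)

# Right-hand side: sum of the partial derivatives, evaluated on the diagonal.
rhs = (sp.diff(f, x) + sp.diff(f, y)).subs({x: t, y: t})

assert sp.simplify(lhs - rhs) == 0  # the identity holds for this f
```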

A direct proof is also possible using the definition of differentiability. Write $$f(x,y) = f(t_0,t_0) + \frac{\partial f}{\partial x}(t_0,t_0)(x - t_0) + \frac{\partial f}{\partial y}(t_0,t_0)(y - t_0) + r(x,y)$$

where

$$ \lim_{(x,y) \to (t_0,t_0)} \frac{r(x,y)}{\sqrt{(x - t_0)^2 + (y - t_0)^2}} = 0 $$

and then

$$ \frac{f(t,t) - f(t_0,t_0)}{t - t_0} = \frac{\partial f}{\partial x}(t_0,t_0) + \frac{\partial f}{\partial y}(t_0,t_0) + \frac{r(t,t)}{t - t_0} \xrightarrow[t \to t_0]{} \frac{\partial f}{\partial x}(t_0,t_0) + \frac{\partial f}{\partial y}(t_0,t_0), $$

where the remainder term vanishes because $\sqrt{(t - t_0)^2 + (t - t_0)^2} = \sqrt{2}\,|t - t_0|$, so $\left|\frac{r(t,t)}{t - t_0}\right| = \sqrt{2} \cdot \frac{|r(t,t)|}{\sqrt{2}\,|t - t_0|} \to 0$.
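The convergence is also easy to watch numerically. Here is a short Python sketch with the arbitrary choices $f(x,y) = x e^y$ and $t_0 = 1$, where the difference quotient along the diagonal should approach $\frac{\partial f}{\partial x}(1,1) + \frac{\partial f}{\partial y}(1,1) = 2e$:

```python
import math

def f(x, y):
    return x * math.exp(y)  # arbitrary test function

t0 = 1.0
# For this f: f_x(t0, t0) + f_y(t0, t0) = e**t0 + t0 * e**t0 = 2e.
target = math.exp(t0) + t0 * math.exp(t0)

for k in range(1, 6):
    t = t0 + 10.0 ** (-k)
    quotient = (f(t, t) - f(t0, t0)) / (t - t0)
    print(f"t - t0 = 1e-{k}:  quotient = {quotient:.6f}  (target {target:.6f})")
```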


BTW, I agree with calling your observation "a trick", but I wouldn't call it obscure; it comes up in various contexts. For example, in differential geometry it is used to prove that the Lie bracket of two vector fields measures how an infinitesimal parallelogram built from the flows fails to close, and that curvature measures the contribution of parallel transport along a closed loop. In both cases, one defines a function $f \colon (-\varepsilon, \varepsilon)^4 \rightarrow V$ into a vector space $V$ which depends on four parameters (so $f = f(t_1,t_2,t_3,t_4)$) and one wants to compute the second derivative of $h(t) = f(t,t,t,t)$ at $t = 0$. Applying the chain rule twice, we have

$$ h''(0) = \sum_{i,j} \frac{\partial^2 f}{\partial t_i \partial t_j}(0,0,0,0) $$

and then one uses various symmetries to compute the partial derivatives. For more details, see here.
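To make the formula concrete, here is a sympy sketch verifying it for an arbitrary smooth test function of four parameters (the particular $f$ below is just an example, with $V = \mathbb{R}$ for simplicity):

```python
import sympy as sp

t = sp.Symbol('t')
ts = sp.symbols('t1 t2 t3 t4')
t1, t2, t3, t4 = ts

# Arbitrary smooth test function of four parameters.
f = t1 * t2 + sp.cos(t3) * t4**2 + t1 * t3 * t4

# h(t) = f(t, t, t, t); compute h''(0) directly.
h = f.subs({ti: t for ti in ts})
lhs = sp.diff(h, t, 2).subs(t, 0)

# Sum over all pairs (i, j) of second partials of f at the origin.
rhs = sum(sp.diff(f, ti, tj).subs({v: 0 for v in ts})
          for ti in ts for tj in ts)

assert sp.simplify(lhs - rhs) == 0
```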


Let's write $f(x, y)=x^y$. You want to find the derivative of the single-variable function $g(x)=f(x, x)$.

$$f(x+h, x+h)-f(x, x)=\big(f(x+h, x+h)-f(x+h,x)\big)+\big(f(x+h, x) - f(x, x)\big)$$

In other words, we're moving diagonally from $(x,x)$ to $(x+h,x+h)$ by first moving right (changing the first variable) and then moving up (changing the second). When you divide this equation by $h$ and let $h\to 0$, you get

$$g'(x)=\lim_{h\to0}\frac{f(x+h, x+h)-f(x+h,x)}{h} + \partial_1f(x, x)$$

where by $\partial_1$ I mean the partial derivative with respect to the first variable. So far we've just used the definition of the derivative.

But what about that limit? It sure looks a lot like $\partial_2 f(x, x)$; the problem is that the first variable is also changing as $h$ tends to $0$. Still, as $h$ tends to zero the first variable tends to $x$, so we can basically just replace it with $x$, and then we'll have the definition of $\partial_2 f(x, x)$... right?

There are probably a few ways to do this, but this is one of the things the mean value theorem exists for: justifying intuitions about this sort of thing. By the mean value theorem (applied to $y \mapsto f(x+h, y)$), that ratio is equal to $\partial_2 f(x+h, \xi_h)$ for some $\xi_h$ between $x$ and $x+h$. As $h$ tends to zero, that tends to $\partial_2 f(x, x)$... because $\partial_2 f(x, y)$ is a continuous (multi-variable) function (which is something that needs to be proven separately).
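To watch the squeeze happen numerically, here is a sketch for the concrete $f(x,y) = x^y$ (the evaluation point $x = 2$ is an arbitrary choice). Since $\partial_2 f(a,b) = a^b \ln a$, we can solve for $\xi_h$ explicitly and see it settle between $x$ and $x+h$:

```python
import math

x = 2.0  # arbitrary evaluation point

def f(a, b):
    return a ** b

for k in range(1, 5):
    h = 10.0 ** (-k)
    ratio = (f(x + h, x + h) - f(x + h, x)) / h
    # d2 f(a, b) = a**b * log(a); invert that to recover xi_h from the ratio.
    xi = math.log(ratio / math.log(x + h)) / math.log(x + h)
    print(f"h = 1e-{k}:  xi_h = {xi:.8f}  (should lie in ({x}, {x + h}))")
```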

This is typical reasoning in basic multi-variable calc: you go from $A$ to $B$ one coordinate at a time, apply single-variable calc along each axis, and then use something like the mean value theorem to prove that what feels like it should work actually does work.
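And putting it all together for $f(x,y) = x^y$, a quick sympy sketch confirms that the trick recovers the familiar derivative of $x^x$:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x**y

# The trick: g'(x) = d1 f(x, x) + d2 f(x, x).
trick = (sp.diff(f, x) + sp.diff(f, y)).subs(y, x)

# Direct differentiation of g(x) = x**x for comparison.
direct = sp.diff(x**x, x)

assert sp.simplify(trick - direct) == 0
print(sp.simplify(trick))  # x**x * (log(x) + 1)
```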