Intuition behind the chain rule
The best way to think about the derivative is: if $f$ is differentiable at $x$, then \begin{equation*} f(x + \Delta x) \approx f(x) + f'(x) \Delta x. \end{equation*} The approximation is good when $\Delta x$ is small. This is practically the definition of $f'(x)$.
Now suppose $f(x) = g(h(x))$, and $h$ is differentiable at $x$, and $g$ is differentiable at $h(x)$. Then \begin{align*} f(x + \Delta x) & = g(h(x+\Delta x)) \\ &\approx g(h(x) + h'(x) \Delta x) \\ &\approx g(h(x)) + g'(h(x)) h'(x) \Delta x. \end{align*} Comparing this with the equation above suggests that \begin{align*} f'(x) = g'(h(x)) h'(x). \end{align*}
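The approximation can be checked numerically. The following is only a sketch: the choices $h(x) = x^2$ and $g(u) = \sin u$, the point $x = 1.3$, and the helper names are illustrative assumptions, not part of the argument above. It compares the difference quotient $(f(x + \Delta x) - f(x))/\Delta x$ with $g'(h(x))\,h'(x)$ as $\Delta x$ shrinks.

```python
import math

# Illustrative (assumed) choice of functions: h(x) = x^2, g(u) = sin(u).
def h(x):
    return x ** 2

def g(u):
    return math.sin(u)

def f(x):
    # The composition f(x) = g(h(x)).
    return g(h(x))

def h_prime(x):
    return 2 * x

def g_prime(u):
    return math.cos(u)

def chain_rule_derivative(x):
    # f'(x) = g'(h(x)) * h'(x), as suggested by the linear approximations.
    return g_prime(h(x)) * h_prime(x)

def difference_quotient(func, x, dx):
    # (func(x + dx) - func(x)) / dx, which should approach func'(x) for small dx.
    return (func(x + dx) - func(x)) / dx

x0 = 1.3
exact = chain_rule_derivative(x0)
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(f, x0, dx)
    print(f"dx={dx:g}  quotient={approx:.6f}  chain rule={exact:.6f}  "
          f"error={abs(approx - exact):.2e}")
```

The printed error should shrink roughly in proportion to $\Delta x$, which is what "the approximation is good when $\Delta x$ is small" means here.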
Many other rules about derivatives can be derived easily in this way.
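As one illustration (a sketch in the same heuristic spirit, not a rigorous proof), here is the product rule. Suppose $f(x) = g(x) h(x)$ with $g$ and $h$ both differentiable at $x$. Then \begin{align*} f(x + \Delta x) & = g(x+\Delta x)\, h(x+\Delta x) \\ &\approx \big(g(x) + g'(x) \Delta x\big)\big(h(x) + h'(x) \Delta x\big) \\ &= g(x) h(x) + \big(g'(x) h(x) + g(x) h'(x)\big) \Delta x + g'(x) h'(x) (\Delta x)^2. \end{align*} When $\Delta x$ is small, the $(\Delta x)^2$ term is negligible compared with the $\Delta x$ term, so comparing with $f(x + \Delta x) \approx f(x) + f'(x) \Delta x$ suggests that \begin{equation*} f'(x) = g'(x) h(x) + g(x) h'(x). \end{equation*}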
For a function $g(x)$, imagine walking at constant unit speed along one number line while a red dot on a second number line marks the function value at your current position. That is, when you are at $x$, the red dot is at $g(x)$, and $g'(x)$ is the speed of the red dot. Now chain the red dot to a blue dot on a third number line, which represents $f$: if you yourself were to walk at unit speed along the $g$ line, then at position $x$ the blue dot on the $f$ line would be at $f(x)$ and move with speed $f'(x)$.
As you move along your original number line, the red dot appears at $g(x)$, so the blue dot appears at $f(g(x))$. The blue dot therefore moves with speed $[f(g(x))]'$, which is the derivative we want.
The red dot on the $g$ line moves with speed $g'(x)$. The blue dot responds to the red dot's position, which is $g(x)$, so at that moment the blue dot moves $f'(g(x))$ times as fast as the red dot. Thus the resulting speed of the blue dot must be $f'(g(x))\cdot g'(x)$.
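The same is true in several variables. A differentiable map is locally approximately linear (more precisely, affine). A composition of functions is locally $\approx$ the composition of their linear approximations. A composition of linear maps is a matrix product, so the derivative (Jacobian matrix) of a composition is the product of the Jacobian matrices.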
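The several-variable statement can also be checked numerically. This is a minimal sketch under stated assumptions: the maps $h(x, y) = (xy,\ x + y)$ and $g(u, v) = (\sin u,\ uv)$, the test point, and all helper names are made up for illustration. It compares a finite-difference Jacobian of $f = g \circ h$ with the matrix product $J_g(h(p))\, J_h(p)$.

```python
import math

# Illustrative (assumed) maps from R^2 to R^2.
def h(p):
    x, y = p
    return [x * y, x + y]

def g(q):
    u, v = q
    return [math.sin(u), u * v]

def f(p):
    # The composition f = g o h.
    return g(h(p))

def jac_h(p):
    # Jacobian of h: rows are outputs, columns are inputs (x, y).
    x, y = p
    return [[y, x], [1.0, 1.0]]

def jac_g(q):
    # Jacobian of g: rows are outputs, columns are inputs (u, v).
    u, v = q
    return [[math.cos(u), 0.0], [v, u]]

def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def numerical_jacobian(func, p, eps=1e-6):
    # Central differences: column j holds the partial derivatives
    # of every output with respect to input j.
    n_out = len(func(p))
    jac = [[0.0] * len(p) for _ in range(n_out)]
    for j in range(len(p)):
        plus = list(p)
        minus = list(p)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(plus), func(minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

p0 = [0.7, -1.2]
product = matmul(jac_g(h(p0)), jac_h(p0))  # chain rule: J_g(h(p)) J_h(p)
numeric = numerical_jacobian(f, p0)        # direct finite-difference estimate

for row_product, row_numeric in zip(product, numeric):
    print(row_product, row_numeric)
```

The two matrices should agree up to finite-difference error. Note the order of the product: the Jacobian of the outer map $g$, evaluated at $h(p)$, goes on the left.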