Why isn't the directional derivative generally scaled down to the unit vector?

The intuition I think of for a directional derivative in the direction of $\overrightarrow{v}$ is that it measures how fast the function changes if the input moves with velocity $\overrightarrow{v}$. So if you move the input across the domain twice as fast, the function's value changes twice as fast.

More precisely, this corresponds to the following process that relates calculus in multiple variables to calculus in a single variable. We can define a line based at a point $\overrightarrow{p}$ with velocity $\overrightarrow{v}$ parametrically as a curve: $$\gamma(t)=\overrightarrow{p}+t\overrightarrow{v}.$$ This is a map from $\mathbb R$ to $\mathbb R^n$. However, if $f:\mathbb R^n\rightarrow \mathbb R$ is another map, we can define the composite $$(f\circ \gamma)(t)=f(\gamma(t))$$ and observe that this is a map $\mathbb R\rightarrow\mathbb R$, so we can study its derivative! We then define the directional derivative of $f$ at $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$ to be the derivative of $f\circ\gamma$ at $0$.
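
For a concrete instance of this process (a small example of my own, not from the question): take $f(x,y)=x^2+y^2$, $\overrightarrow{p}=(1,0)$, and $\overrightarrow{v}=(2,3)$. Then $\gamma(t)=(1+2t,\,3t)$, so $$(f\circ\gamma)(t)=(1+2t)^2+(3t)^2=1+4t+13t^2,$$ and the directional derivative is $(f\circ\gamma)'(0)=4$. If we double the velocity to $2\overrightarrow{v}=(4,6)$, the same computation gives $(1+4t)^2+(6t)^2=1+8t+52t^2$, whose derivative at $0$ is $8$ - twice as large, matching the intuition above.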

However, when we do this, we only see a "slice" of the domain of $f$ - namely, the line passing through $\overrightarrow{p}$ in the direction of $\overrightarrow{v}$. This corresponds to the notion of slicing you bring up in your question. We do not see any values of $f$ outside of the image of $\gamma$, so we are only studying $f$ on a restricted set.


Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and let $$D_v f(x) = \lim_{h \to 0} \frac{f(x+hv)-f(x)}{h}$$ (when the limit exists) be the directional derivative in the direction $v$. With this convention, if the function is differentiable, $$ D_{au+bv} f(x) = a\, D_{u} f(x)+b\, D_{v} f(x) \qquad \text{for all } (a,b) \in \mathbb{R}^2,$$ i.e. the directional derivative is linear in the direction. Indeed, $$D_{v} f(x) = J_x v$$ where $J_x$ is the Jacobian matrix of $f$ at $x$.
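
Here is a minimal numerical sketch of both claims (the function `f` and the helper `dir_deriv` are my own illustrative choices, not standard names), approximating the limit by a finite difference:

```python
import numpy as np

def f(x):
    # An arbitrary differentiable map f : R^2 -> R^2, chosen for illustration
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def dir_deriv(f, x, v, h=1e-6):
    # Finite-difference approximation of D_v f(x) = lim_{h->0} (f(x+hv)-f(x))/h
    return (f(x + h * v) - f(x)) / h

x = np.array([1.0, 2.0])
u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
a, b = 3.0, -2.0

# Linearity in the direction: D_{au+bv} f(x) = a D_u f(x) + b D_v f(x)
print(dir_deriv(f, x, a * u + b * v))                   # ~ [ 4., -5.]
print(a * dir_deriv(f, x, u) + b * dir_deriv(f, x, v))  # ~ [ 4., -5.]

# Agreement with the Jacobian: D_v f(x) = J_x v, with J_x computed by hand
J = np.array([[x[1], x[0]],
              [1.0, 2.0 * x[1]]])
print(dir_deriv(f, x, v), J @ v)                        # both ~ [1., 4.]
```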

You would have trouble even stating, let alone understanding, this linearity if you restricted to $\|v\|=1$, and worse still if you normalized $D_vf(x)$ itself.
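
To see the problem concretely (a minimal illustration of my own): take $u = e_1$ and $v = e_2$, so that $u+v$ has norm $\sqrt{2}$. Under a unit-vectors-only convention the clean identity $D_{u+v}f(x) = D_u f(x) + D_v f(x)$ cannot even be written down, and one is forced into the clumsier $$D_{(u+v)/\sqrt{2}}\, f(x) = \frac{1}{\sqrt{2}}\left( D_u f(x) + D_v f(x) \right).$$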


Unit vectors are vastly overrated — the notion of vector is far more computationally convenient when treated as a whole rather than decomposed into separate notions of direction and magnitude.

I claim it leads to better understanding as well.

Thus, one should not introduce unit vectors by habit — such a manipulation should be reserved for those circumstances when it does something useful.

Similarly, a good definition or computational tool shouldn't force unit vectors upon the user, unless there is a very good reason for doing so.


Algebraically, the directional derivative is not the main idea; the main idea is the differential of a function. In the usual notation (written here for three variables), $\nabla f$ is the row vector given by

$$ \nabla f(\vec{x}) = \begin{pmatrix} f_1(\vec{x}) & f_2(\vec{x}) & f_3(\vec{x}) \end{pmatrix} $$

where by $f_k$ I mean the partial derivative of $f$ with respect to its $k$-th argument. The directional derivative is then simply the product of the differential with the given (column) vector:

$$ \nabla_\vec{v} f = (\nabla f) \vec{v} $$
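
For instance (an example of my own), if $f(x,y,z) = xy + z^2$, then $\nabla f(\vec{x}) = \begin{pmatrix} y & x & 2z \end{pmatrix}$, and for any $\vec{v} = (v_1, v_2, v_3)^{\mathsf{T}}$ we get $$\nabla_\vec{v} f = y\,v_1 + x\,v_2 + 2z\,v_3,$$ which visibly scales linearly with $\vec{v}$: no unit normalization is involved anywhere.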

As such, restricting to unit vectors is an unnatural thing to do. Rescaling the input vector to be a unit vector is extremely unnatural.

Note that some people use $\nabla f$ to refer to a column vector, or even treat row and column vectors as the same thing. This is unfortunate, because it is computationally awkward when you change variables, and gets in the way of understanding the difference between vectors and covectors, and the close relationship between the inner product and the transpose operation.
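
One way to see the relationship just mentioned (stated here in my own notation): if one insists on a column-vector gradient $(\nabla f)^{\mathsf{T}}$, the directional derivative is recovered through the inner product, $$\nabla_\vec{v} f = (\nabla f)\,\vec{v} = \left\langle (\nabla f)^{\mathsf{T}},\, \vec{v} \right\rangle,$$ so the row-vector convention and the transpose bookkeeping are carrying real information rather than being a notational quirk.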


Finally, it's worth noting that derivatives, even directional derivatives, make sense in contexts where there is no notion of length (for example, on a vector space with no chosen inner product), so there is no notion of a "unit" vector to appeal to in the first place.