some confusion about the concept of gradient
You are probably asking yourself this question because calculus courses focus on computing rather than on proving. Reading $D_f = \nabla f \cdot v$ as a scalar product is not really the right way to understand why this is the logical definition. The better way to think of it is to see $\nabla f$ as a linear transformation, and to see $\nabla f \cdot v$ as the evaluation of this linear transformation at $v$. I'll try to make myself clearer.
In one dimension, when one computes the derivative, one obtains a real number. This real number is in one-to-one correspondence with a linear transformation, $D_f(x)$, which associates to every number $v$ the new number $D_f(x) \cdot v$. In other words, this could be considered as a one-dimensional directional derivative. As in the $2$-dimensional case, we can restrict ourselves to vectors of norm $1$, so $\pm 1$ are the only interesting directions. It then makes sense that in the direction $-1$ we obtain $-D_f$: moving against the direction of the slope, the variation is minus the variation we would have in the positive direction.
Over $\mathbb R^n$, a function $f : \mathbb R^n \to \mathbb R$ is defined to be differentiable at a point $x$ when there exists a linear transformation $L(x) : \mathbb R^n \to \mathbb R$ (or a $1 \times n$ matrix, if you are not familiar with such concepts) such that $$ \lim_{v \to 0} \frac{ f(x + v) - f(x) - L(x)v }{\| v \|} = 0. $$ In this case $L(x)$ is said to be the derivative of $f$ at $x$.
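Taking $v = t e_i$ in this limit, where $e_i$ is the $i$-th standard basis vector and $t \to 0$, one can deduce that the $i$-th entry of $L(x)$ is the partial derivative $\frac{\partial f}{\partial x_i}(x)$. Hence $L(x) = \nabla f(x)$ when $f$ is differentiable.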
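To get a feel for the limit in this definition, here is a minimal numerical sketch in two variables. The function `f`, its hand-computed gradient `grad_f`, the base point and the direction of `v` are all assumptions chosen only for illustration; nothing here comes from the original question. It evaluates the quotient from the definition with $L(x) = \nabla f(x)$ for shrinking $v$.

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of the example f, computed by hand.
    return (2 * x * y, x**2 + math.cos(y))

def remainder_ratio(x, y, v):
    """Return |f(p + v) - f(p) - grad f(p) . v| / ||v|| at p = (x, y)."""
    gx, gy = grad_f(x, y)
    linear_part = gx * v[0] + gy * v[1]
    norm_v = math.hypot(v[0], v[1])
    return abs(f(x + v[0], y + v[1]) - f(x, y) - linear_part) / norm_v

# Shrink v along a fixed direction (0.6, -0.8), so ||v|| = 10**-k.
ratios = [remainder_ratio(1.0, 2.0, (0.6 * 10**-k, -0.8 * 10**-k))
          for k in range(1, 6)]

for k, r in zip(range(1, 6), ratios):
    print(f"||v|| = 1e-{k}: ratio = {r:.3e}")
```

The printed ratios should shrink roughly in proportion to $\|v\|$, which is what the limit being $0$ looks like numerically. This only probes one direction, so it illustrates the definition rather than proving differentiability.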
Note that you can also feel why this definition should hold, and see why $D_f = \nabla f \cdot v$ in this manner: if you define $g(h) = f(x+hv)$, you realize that $g'(0) = \nabla f \cdot v$ by the chain rule. It's another way to think about it. See? You have, by writing $v = (\|v\| \cos \theta, \|v\| \sin \theta)$: \begin{align} g'(0) & = \frac{\partial f}{\partial x} \frac{\partial (x+hv)_x}{\partial h} + \frac{\partial f}{\partial y} \frac{\partial(x+hv)_y}{\partial h} \\ & = \frac{\partial f}{\partial x} \|v\| \cos \theta + \frac{\partial f}{\partial y} \|v\| \sin \theta = \nabla f \cdot (\|v\| \cos \theta, \|v\| \sin \theta) = \nabla f \cdot v. \end{align}
Hope that helps,
Let me just focus on functions of two variables, i.e. $f:\mathbb{R}^2\rightarrow\mathbb{R}$. (You can generalize easily to $n$ variables by replacing $2$ by $n$ in the following.) Then the gradient of $f$ at $(x_0,y_0)$, $\nabla f(x_0,y_0)$, is the $2$-dimensional vector given by $$\nabla f(x_0,y_0)=\left(\frac{\partial f}{\partial x}(x_0,y_0),\frac{\partial f}{\partial y}(x_0,y_0)\right).$$ Given any unit vector $v=(v_1,v_2)$, the directional derivative $D_vf$ of $f$ at the point $(x_0,y_0)$ in the direction $v$ is defined as $$D_vf(x_0,y_0)=\lim_{t\rightarrow 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}.$$
Therefore, the directional derivative $D_{(1,0)}f$ is nothing but the partial derivative of $f$ with respect to $x$, i.e. $D_{(1,0)}f(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)$. Similarly, the directional derivative $D_{(0,1)}f$ is nothing but the partial derivative of $f$ with respect to $y$, i.e. $D_{(0,1)}f(x_0,y_0)=\frac{\partial f}{\partial y}(x_0,y_0)$.
Then, for differentiable $f$, the formula you have given follows from the chain rule: $$D_vf(x_0,y_0)=\lim_{t\rightarrow 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}$$ $$=\frac{d}{dt}\left(f(x_0+tv_1,y_0+tv_2)\right)\Big|_{t=0}=\frac{\partial f}{\partial x}(x_0,y_0)v_1+\frac{\partial f}{\partial y}(x_0,y_0)v_2=\nabla f(x_0,y_0)\cdot v.$$
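The identity above can also be checked numerically. The following is a small sketch, not part of the original argument: the function `f`, its gradient `grad_f`, the point and the angle are hypothetical choices, and the limit is approximated with a symmetric difference quotient (an assumption made for better accuracy; the definition itself uses the one-sided quotient).

```python
import math

def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of the example f, computed by hand.
    return (2 * x * y, x**2 + math.cos(y))

def directional_derivative(func, x0, y0, v, t=1e-6):
    # Symmetric difference quotient approximating the limit as t -> 0.
    forward = func(x0 + t * v[0], y0 + t * v[1])
    backward = func(x0 - t * v[0], y0 - t * v[1])
    return (forward - backward) / (2 * t)

x0, y0 = 1.0, 2.0
theta = math.pi / 3
v = (math.cos(theta), math.sin(theta))   # unit vector

numeric = directional_derivative(f, x0, y0, v)
gx, gy = grad_f(x0, y0)
dot_product = gx * v[0] + gy * v[1]

print("difference quotient:", numeric)
print("grad f . v         :", dot_product)

# The directions (1, 0) and (0, 1) recover the partial derivatives.
partial_x = directional_derivative(f, x0, y0, (1.0, 0.0))
partial_y = directional_derivative(f, x0, y0, (0.0, 1.0))
```

The two printed numbers should agree to many decimal places, and `partial_x`, `partial_y` should match the two components of `grad_f(x0, y0)`.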
Hope that this helps.
So, I think you are confused about the concept of a total differential, which then trickles down to a misunderstanding of the gradient and directional derivatives. So, for a function, say $f(x,y,z)$ we define $df$ (not $\Delta f$) as follows, $$df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz$$
The motivation for this definition becomes clearer once you look at what it lets you do. Here are three uses I can think of:
- Encode how changes in $x,y,z$ affect $f$
- Serve as a placeholder for small variations $\Delta x, \Delta y, \Delta z$ to get the approximation formula $\Delta f \approx f_{x} \Delta x + f_{y} \Delta y + f_{z} \Delta z$ (where $f_{i}$ is the partial derivative of $f$ with respect to $i$)
- Divide by something like $dt$ to get a rate of change. When $x=x(t), y=y(t), z=z(t)$, then $$\frac{df}{dt} =\frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt} + \frac{\partial f}{\partial z} \cdot \frac{dz}{dt}$$ by the chain rule.
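The last two properties in the list above can be tried out numerically. This is only a sketch under stated assumptions: the function `f`, the curve `curve(t)`, the point `t0` and the small variations in `delta` are all hypothetical examples, not anything from the original question.

```python
import math

def f(x, y, z):
    # Hypothetical example function, chosen only for illustration.
    return x * y + z**2

def partials(x, y, z):
    # (f_x, f_y, f_z) for the example f, computed by hand.
    return (y, x, 2 * z)

def curve(t):
    # Hypothetical path (x(t), y(t), z(t)).
    return (math.cos(t), math.sin(t), t)

def curve_velocity(t):
    # (dx/dt, dy/dt, dz/dt) for the path above.
    return (-math.sin(t), math.cos(t), 1.0)

t0 = 0.5
p = curve(t0)
fx, fy, fz = partials(*p)

# Rate of change: df/dt = f_x dx/dt + f_y dy/dt + f_z dz/dt.
dx, dy, dz = curve_velocity(t0)
chain_rule = fx * dx + fy * dy + fz * dz

# Compare with a symmetric difference quotient of t -> f(x(t), y(t), z(t)).
h = 1e-6
numeric = (f(*curve(t0 + h)) - f(*curve(t0 - h))) / (2 * h)

# Approximation formula: Delta f ~ f_x Dx + f_y Dy + f_z Dz for small variations.
delta = (1e-3, -2e-3, 5e-4)
actual_change = f(p[0] + delta[0], p[1] + delta[1], p[2] + delta[2]) - f(*p)
linear_estimate = fx * delta[0] + fy * delta[1] + fz * delta[2]

print("df/dt via chain rule:", chain_rule, " numeric:", numeric)
print("Delta f:", actual_change, " linear estimate:", linear_estimate)
```

For this particular `f` the gap between `actual_change` and `linear_estimate` is exactly the second-order term $\Delta x \, \Delta y + (\Delta z)^2$, which is why the approximation improves as the variations shrink.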
So, in general you can think of the total differential $df$ of a function as the object that encodes how $f$ responds to changes in each of its variables.
The gradient vector is defined as the following vector, $$\nabla w = \left(\frac{\partial w}{\partial x}, \frac{\partial w}{\partial y}, \frac{\partial w}{\partial z}\right)$$ So, as I posted in my previous response to your question here, we can derive properties of the gradient vector, such as the fact that it is perpendicular to the level surface. It seems that you are a little bit stuck in the single-variable mode of thinking. The definition of the derivative as the limit you suggested describes a different concept than the gradient vector. The gradient can be thought of as a vector field attached to the scalar function $w$: it assigns a vector to every point, and so it carries more information than a single limit taken along one variable.
As for understanding the concept of a directional derivative $$\frac{dw}{ds} \Big|_{\hat{u}}$$ I consider it geometrically as the slope of the slice of the graph cut by a vertical plane parallel to $\hat{u}$. Refer to the other responses for additional explanation of some of the math behind directional derivatives.