How to visualize the gradient as a one-form?

Given a smooth manifold $M$, let $(U,\phi)$ be any chart on it, and let $X$ be a vector field. Then $X$ can be interpreted as a derivation if you think of each vector $X_p$ of the field as the velocity vector at $p$ of some path through $p$. With the coordinates given by the chosen chart, the vector field on $U$ can be written as $$X = \sum_{k=1}^n X^k\frac{\partial}{\partial x^k},$$ where the $X^k$ are the contravariant components of $X$ w.r.t. the chosen chart. The derivations $\frac{\partial}{\partial x^k}$ form, at any point $p$ of $U$, a basis of the tangent space $T_pM$. Geometrically, they are the tangent vectors at $p$ to the coordinate curves of the chart under consideration, i.e. the curves $\gamma_k(t) = \phi^{-1}(\phi(p) + te_k)$, with $e_k$ the $k$-th canonical basis vector of $\mathbb R^n$.

This notation comes from the fact that, if $f$ is any smooth function from $M$ to $\mathbb R$, $X$ operates on $f$ as a directional derivative $X(f)$, and in components on $U$ this is precisely $$X(f) = \sum_{k=1}^n X^k\frac{\partial f}{\partial x^k}.$$

Now this expression can be dualised, in the sense that you can think of $f$ as acting on the vector field through its transpose $f^*$, defined by $$f^*(X) = X(f).$$ If you denote by $\text d x^k$ the dual basis of $\frac{\partial}{\partial x^k}$, which gives a local frame of the cotangent bundle $T^*M$, it follows by a direct computation that $$f^* = \sum_{k=1}^n\frac{\partial f}{\partial x^k}\text d x^k,$$ which suggests the more evocative notation $\text df = f^*$. The components of $\text d f$, which is clearly a 1-form, are precisely those of the gradient, i.e. the partial derivatives of $f$.
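As a quick sanity check of these formulas, here is a small worked example. The function and the vector field are chosen purely for illustration and are not part of the original argument. Take $M = \mathbb R^2$ with the standard chart $(x, y)$, and let $$f = x^2 y, \qquad X = y\frac{\partial}{\partial x} - x\frac{\partial}{\partial y}.$$ Then $$\text d f = 2xy\,\text d x + x^2\,\text d y,$$ and pairing $\text d f$ with $X$ gives $$\text d f(X) = 2xy\cdot y + x^2\cdot(-x) = 2xy^2 - x^3,$$ which is exactly what you get by letting $X$ act on $f$ as a derivation: $$X(f) = y\frac{\partial f}{\partial x} - x\frac{\partial f}{\partial y} = 2xy^2 - x^3.$$ No metric was needed anywhere in this computation.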

As a further remark, observe that the gradient is usually defined as a vector field through the musical isomorphisms induced by a metric tensor $g$, which requires $M$ to be a (pseudo-)Riemannian manifold. More precisely, the gradient vector field $\nabla f$ of a smooth function $f$ is defined by $$g(\nabla f, Y) = \text d f(Y) = Y(f)$$ for any vector field $Y$ on $M$.


If you're going to take that path, then maybe you should be thinking more of a level set density, i.e. how closely spaced the level sets in question are. I'm not familiar with Sean Carroll's book; if you can get a copy of Misner, Thorne and Wheeler, the first few pages do a good job of conveying this idea with their quaint "bong" machine, which sounds a bell each time a vector pierces a level set. If you can't get this readily, then the early part of Kip Thorne's lectures here is also good.

Anyhow, suppose we are given a scalar field $\phi(\vec{x})$ on some manifold $\mathcal{M}$ and the tangent space $T_x\mathcal{M}$ at a point $x\in\mathcal{M}$, and we imagine riding along a vector $X\in T_x\mathcal{M}$ in the tangent space: how often would we pierce level sets of $\phi$? In MTW's quaint and unforgettable words (and wonderful sketches), how often would our bell sound as we rode along the vector? It would be $\nabla\phi\,\cdot\,X$ (i.e. the directional derivative). Thus $\nabla\phi$ is a dual vector, i.e. an element of the dual of the tangent space. It is a linear functional $T_x\mathcal{M}\to\mathbb{R}$: it takes a vector $X\in T_x\mathcal{M}$ as its input and spits out the directional derivative $\nabla\phi\,\cdot\,X$.
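To make the bell-counting picture concrete, here is a minimal numerical sketch. It is only an illustration under stated assumptions: the field `phi`, the starting point, the vector, and the helper names `count_bongs` and `dphi` are all made up for this example, and the "level sets" are taken to be the integer-valued ones. The field is linear, so the level sets are evenly spaced and the count along a vector should match the directional derivative.

```python
import math

def phi(x, y):
    # Hypothetical scalar field chosen for illustration; it is linear,
    # so its level sets are evenly spaced straight lines.
    return 3.0 * x + 4.0 * y

def count_bongs(field, start, vec, steps=10_000):
    """Count how many integer-valued level sets of `field` are crossed
    while moving in a straight line from `start` to `start + vec`.
    The sign of each crossing is dropped, so this is an unsigned count."""
    bongs = 0
    prev = math.floor(field(*start))
    for k in range(1, steps + 1):
        t = k / steps
        cur = math.floor(field(start[0] + t * vec[0], start[1] + t * vec[1]))
        bongs += abs(cur - prev)  # one bong per integer level crossed
        prev = cur
    return bongs

def dphi(vec):
    # d(phi) applied to vec: the partial derivatives (3, 4) of phi
    # paired with the components of vec.
    return 3.0 * vec[0] + 4.0 * vec[1]

start = (0.1, 0.05)  # assumed starting point, phi(start) is about 0.5
X = (2.0, 1.0)       # assumed vector to ride along

print(count_bongs(phi, start, X), dphi(X))
```

For this linear field the two printed numbers agree (ten crossings, directional derivative 10). For a non-linear field the agreement only holds in the limit of a short vector, or equivalently of very finely spaced level sets, which is why the covector lives on the tangent space at a point.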


In the context of general relativity it is also notable that the manifold is equipped with a metric tensor. This tensor provides a canonical way (an isomorphism, see also the answer by Phoenix87) to map covectors to vectors. The components are computed by index raising/lowering, e.g. for a given covector $w$ with components $w_i$ the corresponding vector reads $v^i=g^{ij}w_j$, where $g^{ij}$ are the components of the inverse metric tensor and I use the summation convention over repeated indices.

Now the question is what this really looks like in the given case. The natural derivative of a function $f$ on a manifold $\mathcal{M}$ is $df$, with components $df_i = \partial_i f$, since it can be defined without a metric. As already mentioned, $df$ is visualized by level sets of $f$. The corresponding contravariant vector is computed by $X^i := g^{ij}\partial_j f\equiv (\nabla f)^i$, which we identify with the gradient of $f$. This is the vector that is everywhere perpendicular to the level sets of $f$. Note that the notion "perpendicular" can only be defined with a metric.
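The difference between the components of $df$ and those of $\nabla f$ is easy to check numerically. The following is a minimal sketch under assumptions of my own choosing: the Euclidean plane in polar coordinates $(r,\theta)$, where $g = \mathrm{diag}(1, r^2)$, and a hypothetical example function $f = r^2\sin\theta$. All function names are illustrative. It raises the index with the inverse metric and checks perpendicularity against a vector tangent to the level set.

```python
import math

def f_partials(r, theta):
    # Components of df for the hypothetical example f(r, theta) = r**2 * sin(theta):
    # (df/dr, df/dtheta).
    return (2.0 * r * math.sin(theta), r**2 * math.cos(theta))

def metric(r):
    # Euclidean metric in polar coordinates: g_ij = diag(1, r^2).
    return [[1.0, 0.0], [0.0, r**2]]

def inverse_metric(r):
    # Inverse metric: g^ij = diag(1, 1/r^2).
    return [[1.0, 0.0], [0.0, 1.0 / r**2]]

def raise_index(g_inv, w):
    # v^i = g^{ij} w_j
    return [sum(g_inv[i][j] * w[j] for j in range(2)) for i in range(2)]

def inner(g, u, v):
    # g(u, v) = g_ij u^i v^j
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

r, theta = 2.0, 0.7                        # arbitrary sample point
df = f_partials(r, theta)                  # covector components
grad = raise_index(inverse_metric(r), df)  # vector components of the gradient
tangent = [df[1], -df[0]]                  # df(tangent) = 0: tangent to the level set

print(inner(metric(r), grad, tangent))      # vanishes up to rounding
print(inner(metric(r), list(df), tangent))  # does not vanish
```

The second number shows what goes wrong if the components $\partial_i f$ are simply reused as vector components without raising the index: the resulting vector is not perpendicular to the level set.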

Many textbooks in physics also visualize the basis covector $dx^i$ by its gradient vector, i.e. the vector that is everywhere perpendicular to the level sets of $x^i$. Strictly speaking, this is only possible when a metric is present, but it simplifies the comparison with $\frac{\partial}{\partial x^i}$, which in general does not point in the same direction.
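A small example makes the last point explicit (the coordinates here are my own choice for illustration). On the Euclidean plane with Cartesian coordinates $(x, y)$, introduce the oblique coordinates $$u = x, \qquad v = x + y,$$ so that $x = u$ and $y = v - u$. The coordinate basis vectors are $$\frac{\partial}{\partial u} = \frac{\partial}{\partial x} - \frac{\partial}{\partial y}, \qquad \frac{\partial}{\partial v} = \frac{\partial}{\partial y},$$ whereas the gradient vectors of the coordinate functions are $$\nabla u = \frac{\partial}{\partial x}, \qquad \nabla v = \frac{\partial}{\partial x} + \frac{\partial}{\partial y}.$$ The duality relations $du(\partial_u) = 1$ and $dv(\partial_u) = 0$ hold as they must, but $\nabla u$ and $\frac{\partial}{\partial u}$ clearly point in different directions: $\nabla u$ is perpendicular to the lines $u = \text{const}$, while $\frac{\partial}{\partial u}$ is tangent to the lines $v = \text{const}$. The two coincide only when the coordinate lines are orthogonal and suitably normalized.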