Equivalence between Hamiltonian and Lagrangian Mechanics
Ok, let us start from scratch. A function $f: \mathbb R^n \to \mathbb R$ with $f \in C^2(\mathbb R^n)$ is said to be convex if its Hessian matrix (i.e. the one with coefficients $\partial^2 f/\partial x_i \partial x_j$) is everywhere (strictly) positive definite.
Let $\Omega \subset \mathbb R \times \mathbb R^n$ be an open set and focus on a jointly $C^2$ Lagrangian function $\Omega \times \mathbb R^n \ni (t,q,\dot{q}) \mapsto L(t, q, \dot{q}) \in \mathbb R$.
For fixed $(t,q) \in \Omega$, $L$ is assumed to be convex as a function of $\dot{q}$. In other words $\mathbb R^n \ni \dot{q} \mapsto L(t, q, \dot{q}) \in \mathbb R$ is supposed to be convex.
Referring to systems made either of point particles or of solid bodies, convexity arises from the structure of the kinetic energy part of Lagrangians, which are always of the form $T(t, q, \dot{q}) - V(t, q)$, even considering generalized potentials $V(t,q, \dot{q})$ with linear dependence on $\dot{q}$, as is the case for inertial or electromagnetic forces, also in the presence of ideal holonomic constraints.
The associated Hamiltonian function is defined as the Legendre transformation of $L$ with respect to the variables $\dot{q}$. In other words:
$$H(t,q,p) := \max_{\dot{q} \in \mathbb R^n}\left[p\cdot \dot{q} - L(t, q, \dot{q})\right]\qquad (1)$$
Within our hypotheses on $L$, from the general theory of Legendre transformation, it arises that, for fixed $(t,q) \in \Omega$, a given $p \in \mathbb R^n$ is associated with exactly one $\dot{q}(p)_{t,q} \in \mathbb R^n$ where the maximum of the RHS in (1) is attained (for $n=1$ the proof is quite evident, it is not for $n>1$).
Since $\dot{q}(p)_{t,q} $ trivially belongs to the interior of the domain of the function $\mathbb R^n \ni \dot{q} \mapsto p\cdot \dot{q} - L(t, q, \dot{q})$, it must be:
$$\left.\nabla_{\dot{q}} \right|_{\dot{q}= \dot{q}(p)_{t,q}} \left( p\cdot \dot{q} - L(t, q, \dot{q})\right) =0\:.$$ In other words (always for fixed $t,q$): $$p = \left.\nabla_{\dot{q}} \right|_{\dot{q}(p)_{t,q}} L(t, q, \dot{q})\:, \quad \forall p \in \mathbb R^n\qquad (2)$$
As a consequence (always for fixed $(t,q)\in \Omega$), the map $\mathbb R^n \ni p \mapsto \dot{q}(p)_{t,q} \in \mathbb R^n$ is injective, because it admits a left inverse given by the map $\mathbb R^n \ni \dot{q} \mapsto \nabla_{\dot{q}} L(t, q, \dot{q})$ which, in turn, is surjective. However the latter map is also injective, as one easily proves using the convexity condition and the fact that the domain $\mathbb R^n$ is trivially convex too. The fact that the $\dot{q}$-Hessian matrix of $L$ is non-singular also implies that the map (2) is $C^1$ with its inverse.
Summing up, the map (2) is a $C^1$ diffeomorphism from $\mathbb R^n$ onto $\mathbb R^n$ and, from (1), we have the popular identity describing the interplay of the Hamiltonian and Lagrangian functions as:
$$H(t,q,p) = p\cdot \dot{q} - L(t, q, \dot{q})\qquad (3)$$
which holds true when $p \in \mathbb R^n$ and $\dot{q} \in \mathbb R^n$ are related by means of the $C^1$ diffeomorphism from $\mathbb R^n$ onto $\mathbb R^n$ (for fixed $(t,q)\in \Omega$): $$p = \nabla_{\dot{q}} L(t, q, \dot{q})\:, \quad \forall \dot{q} \in \mathbb R^n\qquad (4)\:.$$
By construction, $H= H(t,q,p)$ is a jointly $C^1$ function defined on $\Gamma := \Omega \times \mathbb R^n$. I stress that $L$ is defined on the same domain $\Gamma$ in $\mathbb R^{2n+1}$. The open set $\Gamma$ is equipped with the diffeomorphism: $$\psi: \Gamma \ni (t,q, \dot{q}) \mapsto (t,q, p) \in \Gamma \qquad (4)'$$ where (4) holds.
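As a concrete illustration (my own example, not needed for the argument), take a particle of mass $m>0$ and charge $e$ in $\mathbb R^3$ with given smooth potentials $\varphi(t,q)$ and $A(t,q)$, so that $\Omega = \mathbb R \times \mathbb R^3$. This is the standard case of a generalized potential depending linearly on $\dot{q}$: $$L(t,q,\dot{q}) = \frac{m}{2}\,\dot{q}\cdot\dot{q} + e A(t,q)\cdot \dot{q} - e\varphi(t,q)\:.$$ The $\dot{q}$-Hessian is $m I$, hence positive definite, and the map (4) and its inverse read $$p = m\dot{q} + e A(t,q)\:, \qquad \dot{q} = \frac{p - eA(t,q)}{m}\:.$$ Inserting in (3): $$H(t,q,p) = p\cdot \dot{q} - L(t,q,\dot{q}) = \frac{m}{2}\,\dot{q}\cdot\dot{q} + e\varphi(t,q) = \frac{|p - eA(t,q)|^2}{2m} + e\varphi(t,q)\:.$$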
Let us study the relationship between the various derivatives of $H$ and $L$.
I remark that I will not make use of Euler-Lagrange or Hamilton equations anywhere in the following.
Consider a $C^1$ curve $\gamma: (a,b) \ni t \mapsto (t, q(t), \dot{q}(t)) \in \Gamma$, where $t$ has no particular meaning and $\dot{q}(t)\neq \frac{dq}{dt}$ generally. The diffeomorphism $\psi$ transforms that curve into a similar $C^1$ curve $t \mapsto \psi(\gamma(t)) = \gamma'(t)$, which I will also indicate by $\gamma': (a,b) \ni t \mapsto (t, q(t), p(t)) \in \Gamma$.
We can now evaluate $H$ over $\gamma'$ and $L$ over $\gamma$ and compute the total temporal derivative taking (3) and (4) into account, i.e. we compute:
$$\frac{d}{dt} H(t, q(t),p(t)) = \frac{d}{dt}\left(p(t)\cdot \dot{q}(t) - L(t,q(t),\dot{q}(t)) \right)\:.$$
The computation leads almost immediately to the following identity, where both sides are evaluated on the respective curve:
$$\frac{\partial H}{\partial t} + \frac{dq}{dt}\cdot \nabla_q H + \frac{dp}{dt}\cdot \nabla_p H = \frac{dp}{dt}\dot{q} + p \frac{d\dot{q}}{dt} -\frac{\partial L}{\partial t} - \frac{dq}{dt}\cdot \nabla_q L - \frac{d\dot{q}}{dt}\cdot \nabla_{\dot{q}} L \:.$$ In the RHS, the second and the last term cancel each other in view of (4), so that: $$\frac{\partial H}{\partial t} + \frac{dq}{dt}\cdot \nabla_q H + \frac{dp}{dt}\cdot \nabla_p H = \frac{dp}{dt}\dot{q} -\frac{\partial L}{\partial t} - \frac{dq}{dt}\cdot \nabla_q L \:.$$ Rearranging the various terms into a more useful structure: $$\left(\frac{\partial H}{\partial t}|_{\gamma'(t)} + \frac{\partial L}{\partial t}|_{\gamma(t)}\right) + \frac{dq}{dt}\cdot \left( \nabla_q H|_{\gamma'(t)} + \nabla_q L|_{\gamma(t)}\right) + \frac{dp}{dt}\cdot \left(\nabla_p H|_{\gamma'(t)} - \dot{q}|_{\gamma(t)}\right) =0\:.\qquad (5)$$
Now observe that actually, since $\gamma$ is generic, $\gamma(t)$ and $\gamma'(t)= \psi(\gamma(t))$ are generic points in $\Gamma$ (however connected by the transformation (4)). Moreover, given the point $(t,q, \dot{q}) = \gamma(t) \in \Gamma$, we are free to choose the derivatives $\frac{dq}{dt}$ and (using the diffeomorphism) $\frac{dp}{dt}$ as we want, fixing $\gamma$ suitably. If we fix to zero all these derivatives, (5) proves that, if $(t,q, \dot{q})$ and $(t,q,p)$ are related by means of (4):
$$\left(\frac{\partial H}{\partial t}|_{(t,q,p)} + \frac{\partial L}{\partial t}|_{(t,q, \dot{q})}\right) =0\:.$$
This result does not depend on derivatives $dq/dt$ and $dp/dt$ since they do not appear as arguments of the involved functions. So this result holds everywhere in $\Gamma$ because $(t,q, \dot{q})$ is a generic point therein. We conclude that (5) can be re-written as:
$$\frac{dq}{dt}\cdot \left( \nabla_q H|_{\gamma'(t)} + \nabla_q L|_{\gamma(t)}\right) + \frac{dp}{dt}\cdot \left(\nabla_p H|_{\gamma'(t)} - \dot{q}|_{\gamma(t)}\right) =0\:.\qquad (5)'$$
where, again, we are considering a generic curve $\gamma$ as before. Choosing the curve so that all components of $\frac{dq}{dt}$ and $\frac{dp}{dt}$ vanish except for one of them, for instance $\frac{dq^1}{dt}$, we find:
$$\left(\frac{\partial H}{\partial q^1}|_{(t,q,p)} + \frac{\partial L}{\partial q^1}|_{(t,q, \dot{q})}\right) =0\:,$$
if $(t,q, \dot{q})$ and $(t,q,p)$ are related by means of (4), and so on.
Eventually we end up with the following identities, valid when $(t,q, \dot{q})$ and $(t,q,p)$ are related by means of (4)
$$\frac{\partial H}{\partial t}|_{(t,q,p)} =- \frac{\partial L}{\partial t}|_{(t,q, \dot{q})}\:, \quad \frac{\partial H}{\partial q^k}|_{(t,q,p)} =- \frac{\partial L}{\partial q^k}|_{(t,q, \dot{q})}\:, \quad \frac{\partial H}{\partial p_k}|_{(t,q,p)} = \dot{q}^k\:. \quad (6)$$ The last identity is the one you asked for. As you see, the identities found rely upon the Legendre transformation only and they do not make use of Euler-Lagrange equations or Hamilton ones.
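The identities (6) can also be checked numerically. What follows is only a sketch under assumptions of mine: the one-dimensional Lagrangian $L(t,q,\dot{q}) = \frac{1}{2}(1+q^2)\dot{q}^2 + \frac{1}{4}\dot{q}^4 - \frac{1}{2}(1+0.1\,t)q^2$ is a hypothetical example (convex in $\dot{q}$, with explicit $t$ and $q$ dependence), the relation (4) is inverted by Newton's method, and all partial derivatives are approximated by central differences. All function names are illustrative.

```python
# Hypothetical 1D Lagrangian, convex in v: d^2L/dv^2 = (1 + q^2) + 3 v^2 > 0.
def lagrangian(t, q, v):
    return 0.5 * (1.0 + q * q) * v * v + 0.25 * v**4 - 0.5 * (1.0 + 0.1 * t) * q * q

def dL_dv(t, q, v):
    # Right-hand side of (4): p = dL/dv.
    return (1.0 + q * q) * v + v**3

def velocity_from_momentum(t, q, p, iterations=60):
    # Invert p = dL/dv at fixed (t, q) with Newton's method.
    a = 1.0 + q * q
    v = p / a  # initial guess: ignore the cubic term
    for _ in range(iterations):
        v -= (dL_dv(t, q, v) - p) / (a + 3.0 * v * v)
    return v

def hamiltonian(t, q, p):
    # Identity (3), with v related to p through (4).
    v = velocity_from_momentum(t, q, p)
    return p * v - lagrangian(t, q, v)

def central_diff(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def check_identities(t, q, v):
    # Returns the three residuals of (6); each should be close to zero.
    p = dL_dv(t, q, v)
    dH_dt = central_diff(lambda s: hamiltonian(s, q, p), t)
    dH_dq = central_diff(lambda x: hamiltonian(t, x, p), q)
    dH_dp = central_diff(lambda y: hamiltonian(t, q, y), p)
    dL_dt = central_diff(lambda s: lagrangian(s, q, v), t)
    dL_dq = central_diff(lambda x: lagrangian(t, x, v), q)
    return (dH_dt + dL_dt, dH_dq + dL_dq, dH_dp - v)

if __name__ == "__main__":
    print(check_identities(0.3, 0.7, -1.2))
```

Note that the partial derivatives of $H$ are taken at fixed $p$, while those of $L$ are taken at fixed $\dot{q}$, exactly as in (6); no equation of motion enters the check.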
However, exploiting these identities, it immediately follows that $\gamma$ satisfies EL equations: $$\frac{d}{dt} \frac{\partial L}{\partial \dot{q}^k} - \frac{\partial L}{\partial q^k}=0\:,\quad \frac{dq^k}{dt} = \dot{q}^k\quad k=1,\ldots, n$$ if and only if the transformed curve $\gamma'(t) := \psi(\gamma(t))$ satisfies Hamilton equations: $$\frac{d p_k}{dt} = -\frac{\partial H}{\partial q^k} \:, \quad \frac{dq^k}{dt} = \frac{\partial H}{\partial p_k}\quad k=1,\ldots, n\:.$$
Indeed, starting from a curve $\gamma(t) = (t, q(t), \dot{q}(t))$, the first EL equation, exploiting (4) (which is part of the definition of $\psi$) and the second identity in (6), becomes the first Hamilton equation for the transformed curve $\psi (\gamma(t))$. Moreover, the second EL equation, making use of the last identity in (6), becomes the second Hamilton equation for the transformed curve. This procedure is trivially reversible, so that, starting from Hamilton equations, you can go back to EL equations.
The first identity in (6) is not used here. However, it implies that the system is or is not invariant under time translations simultaneously in the Lagrangian and Hamiltonian formulations (in both cases, that invariance property implies the existence of a constant of motion which is nothing but $H$ represented with the corresponding variables, either Lagrangian or Hamiltonian).
As a final comment notice that (3) and the last identity in (6) (which is nothing but the inverse function of (2) at fixed $(t,q)$) imply $$L(t, q, \dot{q}) = \nabla_p H(t,q,p) \cdot p - H(t,q,p)\:,$$ where (2) is assumed to connect Lagrangian and Hamiltonian variables.
In this answer we would like to show via the chain rule and brute force alone how Hamilton's eqs. follow from Lagrange eqs. and from the explicit definition (9) of the Hamiltonian. While there exist more elegant approaches, this method is in some sense the most natural and basic.
I) Lagrangian formalism. Let us assume that the Lagrangian $$\tag{1} L(q,v,t)$$ is a smooth function of its arguments $q^i$, $v^i$, and $t$. Let us suppress position dependence $q^i$ and explicit time dependence $t$ in the following. Define for later convenience functions
$$\tag{2} g_i(v)~:=~\frac{\partial L(v)}{\partial v^i}, \qquad i~\in~\{1, \ldots, n\}; $$
and
$$\tag{3} h(v,p)~:=~p_j v^j -L(v).$$
In eq. (3), the velocities $v^i$ and the momenta $p_i$ are independent variables.
II) Lagrangian eqs. of motion. The Lagrange eqs. read
$$\tag{4} \frac{\partial L(v)}{\partial q^i} ~\stackrel{\text{EL eq.}}{\approx}~ \frac{dg_i(v)}{dt} ~\stackrel{\text{Chain rule}}{=}~\frac{\partial g_i(v)}{\partial t}+ \dot{q}^j\frac{\partial g_i(v)}{\partial q^j}+ \dot{v}^j\frac{\partial g_i(v)}{\partial v^j}, $$
where we have identified
$$ \tag{5} v^i~\approx~\dot{q}^i, \qquad i~\in~\{1, \ldots, n\}.$$
[The $\approx$ symbol means equality modulo equations of motion.]
III) Dual Legendre variables. Within the Lagrangian framework, the momenta are defined as
$$\tag{6} p_i~=~g_i(v), \qquad i~\in~\{1, \ldots, n\}. $$
Here we will only discuss regular$^1$ Legendre transformations, i.e. we will assume that it is possible to invert the relations (6) as
$$\tag{7} v^i~=~f^i(p), \qquad i~\in~\{1, \ldots, n\}, $$
where
$$\tag{8} \text{The functions $f$ and $g$ are each other's inverse functions}. $$
IV) Hamiltonian. Next define the Hamiltonian as the Legendre transform$^2$ of the Lagrangian:
$$\tag{9} H(p)~:=~ h(f(p),p)~\stackrel{(3)}{=}~p_j f^j(p)-(L\circ f)(p).$$
V) Hamilton's eqs. of motion. Then
$$\frac{\partial H(p)}{\partial p_i} ~\stackrel{(9)}{=}~ f^i(p) + p_j \frac{\partial f^j(p)}{\partial p_i} - \frac{\partial (L\circ f)(p)}{\partial p_i}$$ $$~\stackrel{\text{Chain rule}}{=}~ f^i(p) + \left\{p_j -\left( \frac{\partial L}{\partial v^j} \circ f \right)(p)\right\}\frac{\partial f^j(p)}{\partial p_i} $$ $$\tag{10}~\stackrel{(2)}{=}~ f^i(p) + \left\{p_j -(g_j\circ f)(p)\right\}\frac{\partial f^j(p)}{\partial p_i} ~\stackrel{(8)}{=}~f^i(p) ~\stackrel{(7)}{=}~v^i~\stackrel{(5)}{\approx}~\dot{q}^i, $$
and
$$-\frac{\partial H(p)}{\partial q^i} ~\stackrel{(9)}{=}~ \frac{\partial (L\circ f)(p)}{\partial q^i} - p_j \frac{\partial f^j(p)}{\partial q^i} $$ $$~\stackrel{\text{Chain rule}}{=}~ \left(\frac{\partial L}{\partial q^i}\circ f\right)(p) +\left\{\left( \frac{\partial L}{\partial v^j} \circ f \right)(p)-p_j \right\}\frac{\partial f^j(p)}{\partial q^i} $$ $$~\stackrel{(2)}{=}~ \left(\frac{\partial L}{\partial q^i}\circ f\right)(p) +\left\{(g_j\circ f)(p)-p_j \right\}\frac{\partial f^j(p)}{\partial q^i} $$ $$~\stackrel{(8)}{=}~\left(\frac{\partial L}{\partial q^i}\circ f\right)(p) ~\stackrel{(4)}{\approx}~ \left(\frac{dg_i}{dt}\right)\circ f(p) $$ $$~\stackrel{(4)}{\approx}~\left(\frac{\partial g_i}{\partial t}\right)\circ f(p) + \dot{q}^j\left(\frac{\partial g_i}{\partial q^j}\circ f\right)(p) + \frac{df^j(p)}{dt} \left(\frac{\partial g_i}{\partial v^j}\circ f\right)(p)$$ $$ \tag{11}~\stackrel{\text{Chain rule}}{=}~ \frac{d(g_i\circ f)(p)}{dt} ~\stackrel{(8)}{=}~\dot{p}_i. $$
Equations (10) and (11) are Hamilton's eqs.
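As a small worked example of the definitions (2), (7) and (9) (our own illustration, with an assumed Lagrangian not discussed above), consider for $n=1$ a free relativistic particle of mass $m>0$ in units with $c=1$, with velocities restricted to $|v|<1$:
$$ L(v)~=~-m\sqrt{1-v^2}, \qquad g(v)~=~\frac{\partial L(v)}{\partial v}~=~\frac{mv}{\sqrt{1-v^2}}. $$
Inverting $p=g(v)$ yields
$$ f(p)~=~\frac{p}{\sqrt{p^2+m^2}}, \qquad (L\circ f)(p)~=~-\frac{m^2}{\sqrt{p^2+m^2}}, $$
so that definition (9) gives
$$ H(p)~=~p f(p)-(L\circ f)(p)~=~\frac{p^2+m^2}{\sqrt{p^2+m^2}}~=~\sqrt{p^2+m^2}, \qquad \frac{\partial H(p)}{\partial p}~=~\frac{p}{\sqrt{p^2+m^2}}~=~f(p), $$
in agreement with eq. (10).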
--
$^1$ A singular Legendre transformation leads to primary constraints.
$^2$ Formula (9) is the definition of Legendre transform usually given in the physics literature. In the smooth setting it is slightly more general than the alternative definition
$$\tag{12} H(p)~\stackrel{(3)}{:=}~ \sup_v h(v,p)$$
for convex Lagrangians given e.g. on Wikipedia. See also e.g. this related Phys.SE post. The stationary point of $h(v,p)$ wrt. $v^i$ reads
$$ \tag{13} \frac{\partial h(v,p)}{\partial v^i}~=~0 \qquad \stackrel{(2)+(3)}{\Leftrightarrow} \qquad p_i~=~g_i(v) \qquad \stackrel{(8)}{\Leftrightarrow} \qquad v^i~=~f^i(p).$$
This shows that definition (12) in the pertinent setting leads to definition (9).
Alternatively, there exists an extended approach to the Legendre transformation between the Lagrangian and Hamiltonian formalism using $3n$ variables $(q^i,v^i,p_i)$, cf. e.g. Ref. 1. Let us suppress explicit time dependence $t$ from the notation in the following. Consider the extended Lagrangian$^1$
$$ L_E(q,\dot{q},v,p)~:=~ p_i(\dot{q}^i-v^i)+L(q,v)~\stackrel{(2)}{=}~p_i\dot{q}^i-H_E(q,v,p), \tag{1}$$
where the extended Hamiltonian is defined as
$$ H_E(q,v,p)~:=~ p_i v^i-L(q,v).\tag{2} $$
The Hamiltonian is defined as the Legendre transform $$ H(q,p)~:=~ \sup_v H_E(q,v,p)\tag{3}$$ of the Lagrangian.
Here it is important that positions $q^i$, velocities $v^i$, and momenta $p_i$ are treated as independent variables in the corresponding extended stationary action principle.
The Euler-Lagrange (EL) eqs. for the extended Lagrangian (1) read
$$ \begin{align} \dot{p}_i~\approx~& \frac{\partial L(q,v)}{\partial q^i}~=~- \frac{\partial H_E(q,v,p)}{\partial q^i}, \tag{4q}\cr 0~\approx~& p_i-\frac{\partial L(q,v)}{\partial v^i}~=~\frac{\partial H_E(q,v,p)}{\partial v^i},\tag{4v}\cr \dot{q}^i~\approx~&v^i~=~\frac{\partial H_E(q,v,p)}{\partial p_i}.\tag{4p}\end{align}$$
On one hand, by integrating out the $v^i$ variables [i.e. using the eq. (4v)], the extended Lagrangian (1) becomes the so-called Hamiltonian Lagrangian $$ L_H(q,\dot{q},p)~:=~ p_i\dot{q}^i-H(q,p). \tag{5}$$ The EL eqs. for the Hamiltonian Lagrangian (5) are Hamilton's eqs. of motion. This is how we recover the Hamiltonian formalism.
On the other hand, by integrating out the $p_i$ variables [i.e. using the eq. (4p)], we get $v^i \approx\dot{q}^i$. Eliminating the $v^i$ variables as well, the extended Lagrangian becomes the usual Lagrangian $$ L(q,\dot{q}), \tag{6}$$ which leads to the usual Lagrange eqs. of motion. This is how we recover the Lagrangian formalism.
Since the Hamiltonian and Lagrangian approaches (5) and (6) belong to the same extended formalism (1), the two approaches are equivalent. Also note that the complications with implicit dependencies in the standard treatment of the Legendre transformation simplify considerably in the extended formalism (1).
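To see the mechanism in the simplest case (an illustrative example of ours, assuming $n=1$ and a smooth potential $V$), take $L(q,v)=\frac{m}{2}v^2-V(q)$ with $m>0$. Then
$$ L_E(q,\dot{q},v,p)~=~p(\dot{q}-v)+\frac{m}{2}v^2-V(q), \qquad H_E(q,v,p)~=~pv-\frac{m}{2}v^2+V(q), $$
and the EL eqs. (4q), (4v) and (4p) become
$$ \dot{p}~\approx~-V^{\prime}(q), \qquad 0~\approx~p-mv, \qquad \dot{q}~\approx~v. $$
Integrating out $v$ via $v\approx p/m$ gives
$$ L_H(q,\dot{q},p)~=~p\dot{q}-\frac{p^2}{2m}-V(q), $$
whose EL eqs. are Hamilton's eqs. $\dot{q}\approx p/m$ and $\dot{p}\approx -V^{\prime}(q)$. Integrating out $p$ and then $v$ via $v\approx\dot{q}$ gives instead $L(q,\dot{q})=\frac{m}{2}\dot{q}^2-V(q)$, whose EL eq. is $m\ddot{q}\approx -V^{\prime}(q)$. Both routes describe the same motion.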
References:
- D.M. Gitman and I.V. Tyutin, Quantization of fields with constraints, (1990), Section 2.1.
--
$^1$ As usual in order for the extended variational principle to be well-defined, the boundary conditions (BCs) should ensure that the boundary term $\left[p_i\delta q^i \right]^{t=t_f}_{t=t_i}$ vanishes under infinitesimal variations $\delta q^i$.