Computing geodesics (or shortest paths)
Ok, let me see if I get it right. You are interested in a computational method that finds the geodesic connecting two points. More precisely, you are looking for a reasonable discretization method, you are not necessarily looking for the differential-geometric explanation.
There is several different but equivalent (i.e. leading to the same result) characterizations of geodesics on a two dimensional surface embedded (immersed) in three space. So what I'm gonna say is by no means exhaustive of the subject.
In what follows I will use $ u = (u_1,u_2)$ instead of $(u,v)$ and a point in three space will be $x = (x_1,x_2,x_3)$. So you have your surface $S : (u_1,u_2) \mapsto (x_1,x_2,x_3)$. We mostly work in $(u_1,u_2)$ coordinates and then, when we have some solution, we map it in 3D via $S$.
First, compute the metric tensor $g_{ij}(u)$ which is a 2 by 2 matrix of function $G(u) = \big( g_{ij}(u)\big)$ with components: $$g_{ij}(u) = \left(\frac{\partial S}{\partial u_i} \cdot \frac{\partial S}{\partial u_j}\right)$$ where $( \, \cdot \, )$ is the dot product in three space. On a surface in three space, as well as on any smooth manifold, the length of a smooth curve $u : [a,b] \to \mathbb{R}^2$ (curve given with respect to $u = (u_1,u_2)$ coordinates and parametrized as $u=u(t)$) with tangent vector (first derivatives with respect to parameter $t$) $\dot{u} = (\dot{u}_1, \dot{u}_2)^T$ is
$$\mathcal{L}[u] = \int_{a}^{b} \sqrt{\dot{u}(t)^TG\big( u(t)\big)\dot{u}(t)} \,\, dt = \int_{a}^{b} \sqrt{\dot{u}^TG\big( u\big)\dot{u}} \,\, dt$$
The distance between the points $u(a)$ and $u(b)$ is the minimum of $L[u]$ over all smooth curves connecting $u(a)$ to $u(b)$. Geodesics are curves that (locally) minimize $\mathcal{L}[u]$. Now, alongside the length functional, define the (kinetic) energy functional
$$E[u] =\frac{1}{2} \int_{a}^{b} \dot{u}(t)^TG\big( u(t)\big)\dot{u}(t) \,\, dt = \frac{1}{2}\int_{a}^{b} \dot{u}^TG\big( u\big)\dot{u} \,\, dt$$
Due to the Cauchy-Schwarz inequality, $(\mathcal{L}[u])^2 \leq 2(b-a) E[u]$ for any curve $u(t)$ with equality if and only if $\dot{u}^TG\big( u\big)\dot{u} \equiv const$ which happens when both functionals are optimized, which means that $\mathcal{L}[u]$ is optimized exactly when $E[u]$ is optimized and both are optimized by the same curve (geodesic). Bottom line, the optimizers of $E[u]$ are the geodesics so we can simply replace $\mathcal{L}[u]$ by $E[u]$.
If one denotes by $L(u,\dot{u}) =\frac{1}{2} \dot{u}^TG\big( u\big)\dot{u}$, aka the Lagriangian or the Lagrange function, then the energy functional becomes $$E[u] = \frac{1}{2} \int_{a}^{b} \dot{u}^TG\big( u\big)\dot{u} \,\, dt = \int_{a}^{b} L(u, \dot{u}) \, dt = \int_{a}^{b} L\Big(u(t), \dot{u}(t)\Big) \, dt .$$
Then, according to the calculus of variations, the optimizing curves of the functional $E$ ( which are the geodesics ) are the solutions of the Euler-Lagrange equations $$\frac{d}{dt}\Big(\, \frac{\partial L}{\partial \dot{u}}(u,\dot{u}) \, \Big) = \frac{\partial L}{\partial u}(u,\dot{u}).$$ After rearranging the terms of the latter equations, one gets the equations
\begin{align} &\ddot{u}_1 + \sum_{ij} \Gamma^1_{ij}(u) \, \dot{u}_i\dot{u}_j =0\\ &\ddot{u}_2 + \sum_{ij} \Gamma^2_{ij}(u) \, \dot{u}_i\dot{u}_j =0 \end{align}
which are exactly the geodesic equation written in terms of the Christoffel symbols $\Gamma^k_{ij}(u)$ of the Levi-Civita connection of the Riemannian metric $G(u)$. Geometrically, they express the fact, written in $(u_1,u_2)$-coordinates, that the curvature vector of the curve $S\big(u(t)\big)$ in three space has zero projection on the tangent plane to the surface $S$ at every point of the curve $S\big(u(t)\big)$ (i.e. the normal vector of the geodesic is aligned with the normal vector of the surface). The latter projection, of the normal vector of a curve onto the tangent plane, is in fact the geodesic curvature vector, so these equations say the geodesic curvature is zero.
Now, let's go back to the functional $E[u]$ because I suggest a straight-forward discretization of the method I outlined above as follows: replace the derivative $\dot{u}$ by a difference $(u^1 - u)/\varepsilon$ (or something even better if you wish) and consider the discrete Lagrangian $$L_{\varepsilon}(u,u^1) := L\left(\, u, \, \frac{u^1-u}{\varepsilon} \, \right) = \frac{1}{2\varepsilon^2} (u^1 - u)^T G\big( u\big)(u^1 - u).$$ By analogy with the continuous case, where the action is $E[u] = \int_{a}^{b} L(u,\dot{u})dt$, in the discrete case the energy is $$E_{\varepsilon}[u] = \sum_{k=0}^{N} L_{\varepsilon}\big(u^k,u^{k+1} \big)$$ where $u^0$ is your initial point and $u^{N+1}$ is your final point. Then, the critical discrete geodesic should simply be the solution to the algebraic equations $\nabla E_{\varepsilon}[u] = 0$ which componentwise leads basically to the discrete version of the Euler-Lagrange equations $$\frac{\partial L_{\varepsilon}}{\partial u^k}\big(u^{k-1}, u^k\big) + \frac{\partial L_{\varepsilon}}{\partial u^k} \big(u^{k}, u^{k+1}\big) = 0 \,\,\, \text{ for } \,\,\, k=1,...,N$$
$$u^0 = \text{ initial point, } \,\, u^{N+1} = \text{ finial point. }$$
The latter is a system of algebraic equations, something like $2 N$ equations and variables. The solution is a sequence of points $\,\, u^0, \, u^1, \, u^2, \, ... \, , \, u^{N+1} \,$ which should approximate the geodesic between $u^0$ and $u^{N+1}$. When you map them onto the surface in 3D via the map $S$ you get a sequence of points $\,\, S\big(u^0\big), \, S\big(u^1\big), \, S\big(u^2\big), \, ... \, , \, S\big(u^{N+1}\big) \,$
which lie on your surface and should be a good approximation to the smooth geodesic. You can interpolate between consecutive points (either the $u$'s or the $S(u)$'s) if you wish to get an actual curve.
Now, you have two parameters you need to adjust. These are $\varepsilon$ and $N$. Maybe a link between them makes sense, something like $\varepsilon = \frac{1}{N}$ or something even more sophisticated. I haven't thought about it too much. The smaller $\varepsilon$ and the larger the $N$, the better the approximation (hopefully :) ).
Now, to find discrete geodesics with a stating point and a direction vector, one can look locally at the discrete Euler-Lagrange equations and write them down using a simplified superscript notation $$\frac{\partial L_{\varepsilon}}{\partial u}\big(u^{(-1)}, u\big) + \frac{\partial L_{\varepsilon}}{\partial u}\big(u, u^{1}\big) = 0$$ Introduce the variable $p = \frac{\partial L_{\varepsilon}}{\partial u}\big(u, u^{1}\big)$. Then the discrete Euler-Lagrange equations become $$\frac{\partial L_{\varepsilon}}{\partial u}\big(u^{(-1)}, u\big) +p= 0$$ which shifted by one subscript turn into $\frac{\partial L_{\varepsilon}}{\partial u^1}\big(u, u^1\big) + p^1= 0$. Thus we obtain the equations $$p = \frac{\partial L_{\varepsilon}}{\partial u}\big(u, u^{1}\big)$$ $$p^1 = -\frac{\partial L_{\varepsilon}}{\partial u^1}\big(u, u^1\big).$$ If one can express $u^1$ as a function of $(u,p)$ from the first equation, then the second equation also gives us $p^1$ as a function of $(u,p)$. Thus we have obtained a map $\Phi : (u,p) \mapsto (u^1,p^1)$. Observe that this is a map $\Phi : T^*\mathbb{R}^2 \to T^*\mathbb{R}^2$, which turns out to be symplectic, because the Lagrangian $L_{\varepsilon}$ is in fact a generating function of the symplectomorphism $\Phi$. Then, given an initial point $u^0$ and a direction vector $v_0$, take let's say $u^1 =u_0 + \varepsilon \, v^0 $ and obtain $p^0$. Then starting with $u = u^0$ and $p=p^0$, start iterating the map $\Phi$ getting $\big(u^{k+1}, p^{k+1}\big) = \Phi\big(u^{k}, p^{k}\big)$ obtaining again a sequence $\,\, u^0, \, u^1, \, u^2, \, ... \, , \, u^{k}, \,\, .... \,$ which approximates your geodesic (here you have another sequence $\,\, p^0, \, p^1, \, p^2, \, ... \, , \, p^{k}, \,\, .... \,$ which is a sequence somehow related to the dual vectors to the tangent vectors to your geodesic, but maybe that's not important for you).
Comment: As explained above, the discrete Euler-Lagrange equations are $\nabla E[u] = \nabla E( u^1, u^2, ..., u^k, ... u^n) = 0$, i.e. the multi-point $( u^1, u^2, ..., u^k, ... u^n)$ for which the gradient of $E( u^1, u^2, ..., u^k, ... u^n)$ is zero. Therefore, in the case of given initial and final point for the geodesic, a first line computational approach could be a gradient descent method (or a version of Newton's method). In the case of an initial point and a direction vector, simply iterate the map $\Phi$. Even if the map is not explicit, the problem is still slightly simpler because one goes step by step, every time solving only the system involving the variables $u, u^1, p, p^1$.
Comment: To make the gradient descent method a bit more efficient, one can choose a smart initial guess for the discrete geodesic $u^0,\, u^1, \, ..., \, u^{N+1}$. We have $u^0$ and $u^{N+1}$ fixed. These are the initial point and the end point. Denote by $\bar{u}(m) = \big(u^0, \, u^1(m),\, u^2(m), \, ...,\, u^{N}(m), \, u^{N+1} \big)$ a sequence of $N+2$ points at iteration $m$. Then say we use some sort of gradient descent $$\bar{u}(m+1) = \bar{u}(m) + \alpha_m \, \nabla E \big[\bar{u}(m)\big]$$ with starting point $\bar{u}(0) = \big(u^0, \, u^1(0),\, u^2(0), \, ...,\, u^{N}(0), \, u^{N+1}\big)$. If $\bar{u}(0)$ is picked carefully, then the gradient descent could make less iterations before arriving at a very good approximate solution to the problem $\nabla \, E[u] = 0$.
Suggestion 1: For instance (especially if your metric is not too curved or your points are not too far apart) take simply the segment between $u^0$ and $u^{N+1}$ and choose $N$ equally spaced points in between as a starting point $\bar{u}(0)$.
Suggestion 2: If you have some extra information, you can smartly use $u^0$ as a starting point and a second point $u^1$, then run the initial value iteration to create a discrete geodesic you can use as an initial guess for your gradient descent algorithm (or Newton's method or whatever numerical scheme you use to solve the discrete variation problem). That's why I have also discussed the initial value problem because one can use it as an auxiliary tool.
For instance, take $S(u^0)$ and $S(u^{N+1})$, the images of $u^0$ and $u^{N+1}$ on the surface $S$ in $\mathbb{R}^3$, and form the vector $\overrightarrow{u^{0}u^{N+1}}$, then project it onto the tangent plane at to the surface $S$ at the point $S(u^0)$. Write it as $$\text{proj}\Big( \overrightarrow{u^{0}u^{N+1}} \Big) = v_1 \, \frac{\partial S}{\partial u_1}(u^0) + v_2 \, \frac{\partial S}{\partial u_2}(u^0).$$ Form the vector $v=(v_1, v_2)^T$ (a tangent vector with respect to $(u_1, u_2)$-coordinates), rescale it to become small enough, and then use it as an intial value problem to generate a discrete geodesic that comes close to $u^{N+1}$. You can even get to a reasonable choice for $N$. Then use this geodesic as a starting point for your gradient descent method in order to get a much more accurate approximation of the geodesic connecting $u^0$ to $u^{N+1}$.