Smoothness of distance function in Riemannian Manifolds
As others mentioned, you have to remove the diagonal of $M\times M$ or square the distance function. Then, for a complete $M$, the answer is the following.
The distance function is differentiable at $(p,q)\in M\times M$ if and only if there is a unique length-minimizing geodesic from $p$ to $q$. Furthermore, the distance function is $C^\infty$ in a neighborhood of $(p,q)$ if and only if $p$ and $q$ are not conjugate points along this minimizing geodesic.
Thus, the function is smooth everywhere if and only if $M$ is simply connected and the geodesics have no conjugate points. This property has numerous equivalent reformulations, including the following
for every pair of points, there is a unique minimizing geodesic between them;
for every pair of points, there is a unique geodesic between them;
every geodesic is minimizing;
the exponential map at every point $p\in M$ is a diffeomorphism from $T_pM$ to $M$.
In general, the distance function has one-sided directional derivatives everywhere. This derivative has a nice description in the case when you fix $p\in M$ and study the function $f=d(p,\cdot)$. Namely let $q\in M$, $q\ne p$, and denote by $\vec{qp}$ the set of initial velocity vectors (in $T_qM$) of unit-speed minimizing geodesics from $q$ to $p$. Then, for a vector $v\in T_qM$, the one-sided derivative $f'_v$ of $f$ in the direction of $v$ is $$ f'_v=\min\{-\langle v,\xi\rangle:\xi\in \vec{qp}\} . $$ This follows from the first variation formula and holds not only in Riemannian manifolds but also in Alexandrov spaces. It is not hard to derive the above differentiablity properties from this.
I don't have a textbook reference for this precise formulation in the Riemannian case, but any book that covers Berger's lemma about geodesics realizing the diameter probably has directional derivatives as a sublemma. For Alexandrov spaces, the standard reference is Burago-Gromov-Perelman's paper. An intro-level proof (not in a full generality) can be found in (a shameless advertisement follows) "A course in metric geometry" by Burago, Burago and myself, section 4.5.
It is a cute thoerem of Franz-Erick Wolter that a complete $n$-dimensional Riemannian manifold $M$ is necessarily diffeomorphic to $\mathbb R^n$ if there is a point $p\in M$ such that the squared-distance function $d(p,\mathord-)^2:M\to\mathbb R$ has directional derivatives at all points and in all directions. This provides examples.
See [Wolter, Franz-Erich. Distance function and cut loci on a complete Riemannian manifold. Arch. Math. (Basel) 32 (1979), no. 1, 92--96. MR0532854]
The purpose of this answer is to provide a proof for the following result which Sergei mentioned in the accepted answer:
Proposition. Let $M$ be a complete Riemannian manifold and $x,y \in M\times M$, $x\neq y$. Then the following are equivalent:
The Riemannian distance $d:M\times M\rightarrow[0,\infty)$ is smooth in a neighbourhood of $(x,y)$.
There is only one length minimising geodesic connecting the points $x$ and $y$ and they are not conjugate along that geodesic.
Proof of Proposition. The Proposition follows from the three Lemmas below which freely use some properties of the so called segment domains $\Sigma_x=\{w\in T_xM: d(x,\exp_x(w))=\vert w \vert\}$:
- $\exp_x: \mathrm{int} \Sigma_x\rightarrow M$ is a diffeomorphism onto its image [Gallot-Lafontaine, Corollary 3.77 or Petersen, Lemma 5.7.8 and Proposition 5.7.10]
- $M= \exp_x(\mathrm{int}\Sigma_x)\cup\exp_x(\partial \Sigma_x)$ and the union is disjoint [Gallot-Lafontaine, Proposition 2.113].
- Denote $\partial^1\Sigma_x = \{w\in \partial \Sigma_x: \exp_x(w)=\exp_x(w')$ for some $w'\in \partial \Sigma_x\backslash\{w\}\}$. Then:
- If $w\in \partial \Sigma_x\backslash \partial^1\Sigma_x$, then $D\exp_x\vert_w$ is singular. [Gallot-Lafontaine, Scholium 3.78 or Petersen, Lemma 5.78 ]
- $\exp_x(\partial^1\Sigma_x) \subset \exp_x(\partial \Sigma_x)$ is dense [Sakai, Remark 4.9], see also [Klingenberg, Theorem 2.1.12 & 14] as well as here.
Lemma 1. $d^2(x,\cdot):M\rightarrow [0,\infty)$ is smooth in a neighbourhood of $y$ if and only if $y\in \exp_x(\mathrm{int} \Sigma_x).$
Proof. On (the open set) $\exp_x(\mathrm{int}\Sigma_x)$ we have $d(x,y)^2 = \vert \exp_x^{-1}(y)\vert^2$ which is clearly a smooth function in $y$. For the converse assume that $d(x,\cdot)^2$ is smooth on some open set $U\subset M$ and note that it suffices to prove $$U \cap \exp_x(\partial \Sigma_x) = \emptyset \tag{1}.$$ Without loss of generality we may assume that $x\notin U$, then also $d(x,\cdot)$ is smooth in $U$ and has a gradient $G\in C^\infty(U;TM)$. Let $\gamma:[0,l]\rightarrow M$ be a length minimising unit-speed geodesic with $\gamma(0)=x$ and $y=\gamma(l)\in U$. Then $d(x,\gamma(t))=t$ and differentiation yields $$\langle G_y, \dot \gamma(l)\rangle = 1. \tag{*} $$ Since $d(x,\cdot)$ is Lipschitz with constant $\le 1$ (triangle inequality) we have $\vert G_y \vert \le 1$. Further $\vert \dot \gamma(l) \vert =1$ and in light of (*) this is only possible if $ G_y = \dot \gamma(l). $ We conclude: $$\text{Length minimising geodesics which start in $x$ don't intersect in $U$.} \tag{2}$$ Now we can prove (1). Assume to the contrary that $U \cap \exp_x(\partial \Sigma_x) \neq \emptyset$. Then by densitity of $\exp_x(\partial^1 \Sigma_x)$ we also have $U \cap \exp_x(\partial^1 \Sigma_x) \neq \emptyset$ and there are $w,w'\in \partial \Sigma_x$ with $w\neq w'$ and $\exp_x(w) = \exp_x(w')\in U$, which is in contradiction to (2).
Lemma 2. $d^2:M\times M\rightarrow [0,\infty)$ is smooth in a neighbourhood of $(x,y)$ if and only if $y \in \exp_x(\mathrm{int} \Sigma_x)$.
Proof. If $d^2$ is smooth near $(x,y)$, then $d^2(x,\cdot)$ is smooth near $y$ and the previous Lemma implies that $y\in \exp_x(\mathrm{int}\Sigma_x)$. For the converse define $\Sigma = \bigcup_x \Sigma_x \subset TM$ and note that $$ \Sigma \text{ is closed }\quad \text{ and } \quad \mathrm{int} \Sigma \cap T_xM = \mathrm{int} \Sigma_x \tag{3}. $$ Define $$ F: \mathrm{int} \Sigma \rightarrow M\times M, (x,w) \mapsto (x,\exp_x(w)) $$ and note that $$ DF\vert_{(x,w)} = \begin{bmatrix} \mathrm{id} & 0\\ \ast& D \exp_x\vert_w \end{bmatrix} $$ is invertible for all $(x,w)\in \mathrm{int} \Sigma$. Further $F$ is easily seen to be injective and thus it has a smooth inverse $F^{-1}:F(\mathrm{int} \Sigma) \rightarrow \mathrm{int} \Sigma$. Hence $d^2(x,y)= \vert F^{-1}(x,y)\vert ^2$ is smooth in a neighbourhood of every $(x,y) \in F(\mathrm{int} \Sigma)$, which concludes the proof.
Lemma 3. $y\in \exp_x(\mathrm{int} \Sigma_x)$ if and only if there exists a unique distance minimising geodesic between $x$ and $y$ and along this geodesic they are not conjugate.
Proof. Let $y=\exp_x(w)$ with $w \in \mathrm{int}\Sigma_x$. Then $t\mapsto \exp_x(tw)$, $0\le t\le 1$ is length minimising (because $w \in \Sigma_x$) and $x$ and $y$ are not conjugate along this geodesic ($D\exp_x\vert_w$ is invertible because $\exp_x$ is a diffeomorphism on $\mathrm{int}\Sigma_x$). If there was another length minimising geodesic from $x$ to $y$, then $y=\exp_x(w')$ for some $w'\in \Sigma_x \backslash \{w\}$. Since $\exp_x(\mathrm{int}\Sigma_x)\cap\exp_x(\partial \Sigma_x)=\emptyset$ we must have $w'\in \mathrm{int} \Sigma_x$, but this is false (since $\exp_x$ is injective on $\mathrm{int} \Sigma_x$).
Conversely assume that there is a unique distance minimising geodesic from $x$ to $y$ and that they are not conjugate along that geodesic. Then $y=\exp_x(w)$ for some $w\in \Sigma_x$. If we had $w\in \partial \Sigma_x$, then either there would be two length minimising geodesic between $x$ and $y$ (corresponding to $w\in \partial^1 \Sigma_x$) or $x$ and $y$ would be conjugate (corresponding to $D \exp_x\vert_w$ being singular).