Are almost all $k$-tuples of vectors in high dimensional space almost orthogonal?
We will write $P:= \mu$ (to emphasize the probabilistic viewpoint), and we will show that $\lim_{n \to \infty} P(U_{\epsilon}) = 1$.
Lemma 1: For all $\epsilon>0$ there exists some $\delta>0$ such that for every $x=(x^1,\ldots,x^k) \in \prod_1^k S^n$, if $\sum_{i<j} |\langle x^i,x^j \rangle|< \delta$ then $d(x, \mathcal O)<\epsilon$.
Proof: This is a consequence of the Gram-Schmidt orthogonalization process. Specifically, we define $u^1:=x^1$, then $u^2:=x^2-\frac{\langle x^2,u^1\rangle}{|u^1|^2} u^1$, then $u^3:=x^3 - \frac{\langle x^3,u^2\rangle}{|u^2|^2} u^2- \frac{\langle x^3, u^1 \rangle}{|u^1|^2} u^1$, and so on. Then let $e^i:=u^i/|u^i|$, so the $e^i$ form an orthonormal set constructed from the original $x^i$ using formulas which depend only on the values of the inner products $\langle x^i,x^j \rangle$. By carefully keeping track and using induction, one may show that $|x^j - u^j|$ can be made arbitrarily small by making the $|\langle x^i,x^j \rangle|$ very small; since $|x^j|=1$, this also forces $|u^j|$ to be close to $1$, and hence $|x^j-e^j|$ is small as well. $\Box$
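As a quick numerical sanity check of Lemma 1 (not part of the proof), here is a minimal Python sketch: it slightly perturbs an orthonormal set, runs the Gram-Schmidt recursion above, and reports how far each $x^j$ moves; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def gram_schmidt(xs):
    """Classical Gram-Schmidt on the rows of xs, as in the proof above."""
    us = []
    for x in xs:
        # subtract the projections onto the previously built u^i
        u = x - sum((x @ u) / (u @ u) * u for u in us)
        us.append(u)
    return np.array([u / np.linalg.norm(u) for u in us])  # the e^i

# k nearly orthogonal unit vectors in R^n: a perturbed orthonormal set.
n, k, noise = 1000, 5, 1e-3
xs = np.eye(n)[:k] + noise * rng.standard_normal((k, n))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)

es = gram_schmidt(xs)
print("max pairwise |<x^i, x^j>|:", np.abs(xs @ xs.T - np.eye(k)).max())
print("max |x^j - e^j|:          ", np.linalg.norm(xs - es, axis=1).max())
```

Shrinking `noise` shrinks both printed quantities together, which is exactly the continuity statement the lemma encodes.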
Lemma 2: If $U,V$ are independent and uniformly distributed on $S^{n-1}$ (the unit sphere of $\mathbb R^n$; the same bound holds a fortiori on $S^n$), then $P(|\langle U, V \rangle|>\epsilon) \leq \frac{1}{n\epsilon^2}$.
Proof: By conditioning on $V$ and using independence we may write $$P(|\langle U, V \rangle|>\epsilon) = \Bbb E\bigg[ P\big(|\langle U, V \rangle|>\epsilon \;\big| \;V\big)\bigg] = \int_{S^{n-1}} P(|\langle U, v \rangle|>\epsilon)\;\sigma(dv) = P(|\langle U, e_1 \rangle|>\epsilon)$$ where $\sigma$ is the uniform measure on $S^{n-1}$, and $e_1=(1,0,\ldots,0)$. The final equality follows by rotational invariance of $\sigma$. Writing $U=(U_1,\ldots,U_n)$ we see that the $U_j$ are identically distributed and $\sum_j U_j^2=1$. Thus $1 = \Bbb E[\sum_j U_j^2] = n\Bbb E[U_1^2]$, so that $\Bbb E[U_1^2]=1/n$. Consequently, $$P(|\langle U, e_1 \rangle|>\epsilon) = P(|U_1|>\epsilon) \leq \frac{\Bbb E[U_1^2]}{\epsilon^2} = \frac{1}{n\epsilon^2},$$ where at the end we have used Chebyshev's inequality with the second moment. $\Box$
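A small Monte Carlo illustration of Lemma 2 (an informal sketch, with arbitrary parameter choices): sampling uniform points by normalizing standard Gaussian vectors, the empirical probability sits well below the Chebyshev bound $1/(n\epsilon^2)$, which is rather loose.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_sphere(n, size):
    """size points uniform on S^{n-1}, via normalized Gaussian vectors."""
    g = rng.standard_normal((size, n))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

eps, trials = 0.1, 10_000
for n in (10, 100, 1000):
    u, v = uniform_sphere(n, trials), uniform_sphere(n, trials)
    empirical = np.mean(np.abs(np.sum(u * v, axis=1)) > eps)
    print(f"n={n:5d}  empirical={empirical:.4f}  bound={1/(n*eps**2):.4f}")
```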
Proof of the original claim: We combine the two lemmas, using for the second inequality below that if a sum of $\binom k2 \leq k(k-1)$ nonnegative terms is at least $\delta$, then at least one term is at least $\delta/(k(k-1))$. This gives $$P(U_{\epsilon}^c) = P\big( d((X^1,\ldots,X^k), \mathcal O)\geq\epsilon\big) \leq P \bigg( \sum_{i<j} |\langle X^i,X^j \rangle |\geq \delta \bigg) \leq \sum_{i<j} P\bigg(|\langle X^i,X^j \rangle| \geq \frac{\delta}{k(k-1)}\bigg) \leq\frac{k^3(k-1)^3}{n \delta^2} \stackrel{n\to \infty}{\longrightarrow} 0.$$
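A direct simulation of the event controlled in the display above (illustrative choices of $k$, $\delta$, and sample sizes): the empirical probability that $\sum_{i<j}|\langle X^i,X^j\rangle|\geq\delta$ visibly drops toward $0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

k, delta, trials = 4, 0.5, 1000
for n in (10, 100, 1000, 10_000):
    hits = 0
    for _ in range(trials):
        x = rng.standard_normal((k, n))
        x /= np.linalg.norm(x, axis=1, keepdims=True)  # k uniform points
        gram = x @ x.T
        # sum of |<X^i, X^j>| over the pairs i < j
        if np.abs(gram[np.triu_indices(k, 1)]).sum() >= delta:
            hits += 1
    print(f"n={n:6d}  P(sum >= delta) ~ {hits / trials:.3f}")
```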
Remarks: It was essential that $k$ is fixed here, though the final bound shows that $k$ is allowed to depend on $n$ so long as it grows slowly enough: the bound is of order $k^6/(n\delta^2)$, so the conclusion survives whenever $k^6/(n\delta^2) \to 0$ (keeping in mind that the $\delta$ from Lemma 1 may itself depend on $k$). Also note that we only used pairwise independence in this argument; full mutual independence was not needed.
It is true, and it can be seen as a consequence of the concentration of measure on the sphere.
Let $P$ be the uniform probability measure on $\mathbb S^{n}$. Given a subset $A\subset \mathbb S^n$, call $A_t:=\{y\in\mathbb S^n:\operatorname{dist}(y,A)<t\}$ its $t$-neighbourhood. Then:
Concentration of measure: If $P(A)\geq\frac12$, then $P(A_t^c)\leq 2e^{-nt^2/64}$. (The specific constants are not important here.)
Apply this to the two complementary hemispheres $\{x\cdot\nu\leq 0\}$ and $\{x\cdot\nu\geq 0\}$: each has measure $\frac12$, and the set $\{x\cdot\nu\geq t\}$ lies outside the $t$-neighbourhood of the first (if $x\cdot\nu\geq t$ and $y\cdot\nu\leq 0$, then $|x-y|\geq (x-y)\cdot\nu\geq t$). We obtain that for any $\nu\in\mathbb S^n$ we have $P(|x\cdot \nu|\geq t)\leq 4e^{-nt^2/64}$, i.e. the measure concentrates near any equator as the dimension increases.
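Here is a quick empirical look at this equator concentration (a sketch only; with the crude constant $64$ the right-hand side becomes nontrivial only for fairly large $n$, while the true decay is much faster):

```python
import numpy as np

rng = np.random.default_rng(0)

# P(|x . e_1| >= t) for x uniform on the unit sphere of R^n,
# versus the bound 4 exp(-n t^2 / 64) quoted above.
t, trials = 0.2, 5_000
for n in (100, 400, 1600, 6400):
    g = rng.standard_normal((trials, n))
    x1 = g[:, 0] / np.linalg.norm(g, axis=1)  # first coordinate of a uniform point
    print(f"n={n:5d}  empirical={np.mean(np.abs(x1) >= t):.5f}"
          f"  bound={4*np.exp(-n*t*t/64):.5f}")
```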
In particular, it follows immediately that the result you ask for is true for $k=2$: once you have chosen the first vector $x_1$, the probability that the second vector you choose lands near the hyperplane $x_1^\perp$ goes to $1$ as $n\to\infty$ (by the rotational invariance of the measure on the sphere, it does not matter which specific $x_1$ was chosen in the first step).
To prove it for general $k$, use induction; I'll sketch the argument. The case $k=2$ (or $k=1$, if you prefer) is the base case, and it is proved above. Suppose now the statement is true for $k$. With probability close to $1$, the first $k$ vectors are $\epsilon$-orthogonal. Rotate them one by one until each $x_i$ is close to $e_i$, the $i^{th}$ standard basis vector (by rotational invariance we may compute probabilities in this new position). From the concentration of measure and a union bound, $$ P(\{|x\cdot e_1|\geq t\}\cup \ldots\cup\{|x\cdot e_k|\geq t\})\leq 4ke^{-nt^2/64}.$$ Therefore with probability close to $1$ the $(k+1)^{th}$ vector has small scalar products with all the previous $x_i$'s, and this implies that with probability close to $1$ the whole family is $\epsilon$-orthogonal.
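Finally, a numerical version of the inductive step (again with illustrative parameters): by rotational invariance we may pretend the first $k$ vectors are exactly $e_1,\ldots,e_k$, so it suffices to check that a fresh uniform vector has all of its first $k$ coordinates smaller than $t$ in absolute value, with probability tending to $1$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probability that a uniform point on the sphere of R^n has
# |x . e_i| < t for all i <= k (the previous vectors, after rotation).
k, t, trials = 10, 0.1, 10_000
for n in (100, 500, 2500):
    g = rng.standard_normal((trials, n))
    x = g / np.linalg.norm(g, axis=1, keepdims=True)
    all_small = np.all(np.abs(x[:, :k]) < t, axis=1)
    print(f"n={n:5d}  P(all k products < t) ~ {np.mean(all_small):.3f}")
```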