Wasserstein distance in R^d from one dimensional marginals

There is a result which answers your question in a somewhat different form. Instead of the transportation metric it uses another metric which metrizes the weak topology on the space of measures on $\mathbb R^d$: $$ \lambda(\mu,\nu) \le \delta \iff \exists\; T\ge 1/\delta : |\langle \exp(i(t,\cdot)),\mu-\nu\rangle| \le \delta \quad\forall\; |t|\le T \;, $$ which might still be OK for your purposes. This article by Klebanov and Rachev actually contains a stronger result (Theorem 4): it gives an explicit upper estimate for $\lambda(\mu,\nu)$ in terms of the maximal distance $\lambda(\mu_v,\nu_v)$ between the projections of $\mu$ and $\nu$ onto a finite (growing) number of directions $v$ in $\mathbb R^d$.
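
To make the definition of $\lambda$ concrete, here is a minimal Python sketch for two empirical measures on the real line (the situation of the projections $\mu_v,\nu_v$). It reads the definition as $\lambda = \inf_T \max(1/T, \sup_{|t|\le T}|\hat\mu(t)-\hat\nu(t)|)$ and approximates it on a grid. The names `char_fn` and `lambda_grid_estimate`, the cutoff `t_max` and the grid size `steps` are illustrative assumptions, not taken from the article, and the result is only a grid approximation.

```python
import cmath

def char_fn(samples, t):
    """Characteristic function of the empirical measure of `samples` at t."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)

def lambda_grid_estimate(xs, ys, t_max=50.0, steps=500):
    """Grid approximation of inf_T max(1/T, sup_{|t|<=T} |phi_xs(t) - phi_ys(t)|).

    Only t > 0 is scanned: the difference at -t is the complex conjugate
    of the difference at t, so it has the same modulus.
    """
    best = float("inf")
    running_sup = 0.0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        running_sup = max(running_sup, abs(char_fn(xs, t) - char_fn(ys, t)))
        best = min(best, max(1.0 / t, running_sup))
    return best

# Two point masses, at 0 and at 1 (hypothetical toy data).
print(lambda_grid_estimate([0.0], [1.0]))
```

For identical samples the supremum is zero, so the estimate collapses to `1 / t_max`; this shows that the cutoff limits how small a value the sketch can report.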

PS In spite of a sufficiently long tradition of misnaming the transportation metric (some people even go as far as calling it after Hutchinson), I would still insist on using the name of Kantorovich (or Monge-Kantorovich, Kantorovich-Rubinshtein), see, for instance, this historical article.


PS: I posted an answer in 2015. In late 2019, @Neyman identified a problem with my original post.


Here is a non-constructive answer to the question.

I don't know of any reference where you can find the solution. I started thinking about the problem and found a simple solution (using standard results).
I read in another question about Wasserstein distance the suggestion of using a finite cover of the unit ball in $Lip(\mathbb{R}^d) $ to reduce estimates of $ W_{\mathbb{R}^d} $ to a max over a finite set. I see that idea working for Lipschitz functions on a bounded set, but I don't see how to apply that idea when working in the whole $\mathbb{R}^d $.

Here is my solution.

Let
$\quad Lip1(\mathbb{R}^d) = \{f:\mathbb{R}^d \longrightarrow\mathbb{R}: \forall x,y \in \mathbb{R} ^d \text{ with } x \neq y, \frac{|f(x)-f(y)|}{|x-y|} \leq 1 \}$
be the set of 1-Lipschitz functions in $\mathbb{R}^d$.

Let $\mathcal{G}$ be the set of functions on $ \mathbb{R}^d $ of the form:
$(1)\quad f(x) = \sum_{i \in F} a_if_i(x\cdot{v_i}) $
where
$\quad F$ is a finite set,
$\quad a_i \in \mathbb{R}, a_i \geq 0, \sum_{i \in F} a_i = 1,$
$\quad f_i \in Lip1(\mathbb{R}) ,$
$\quad v_i \in \mathbb{R}^d , |v_i| = 1 \text{ (Euclidean norm)}$
i.e.: $\mathcal{G}$ is the convex hull of $Lip1(\mathbb{R})$ functions composed with one dimensional projections. Clearly
$\quad \mathcal{G} \subseteq \overline{\mathcal{G}} \subseteq Lip1(\mathbb{R}^d)$
where $\overline{\mathcal{G}}$ is the closure of $\mathcal{G}$ in any topology in which $Lip1(\mathbb{R}^d)$ is closed. We will work with a weak* topology, namely the one in which the neighborhoods of 0 are generated by the sets
$(2)\quad \mathcal{N} = \{f \in Lip(\mathbb{R}^d): \left| \int_{\mathbb{R}^d} f(x) u(x) dx + \int_{\mathbb{R}^d} \nabla f(x) \cdot{w(x)} dx \right| < \epsilon \} $
for some $ u \in L^1(\mathbb{R}^d, (1+|x|)dx)$ with $\int u(x) dx = 0$, some $ w \in L^1(\mathbb{R}^d)^d $, and some $ \epsilon > 0 $.
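
As a sanity check on the inclusion $\mathcal{G} \subseteq Lip1(\mathbb{R}^d)$, here is a minimal Python sketch that builds one function of the form (1) and samples its difference quotients. The profiles `abs`, `math.sin`, `math.atan`, the weights and the helper names are illustrative choices of mine; any 1-Lipschitz functions on $\mathbb{R}$ would do.

```python
import math
import random

def unit_vector(d, rng):
    """Random direction on the unit sphere of R^d (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]

def make_ridge_combination(weights, profiles, directions):
    """Return f(x) = sum_i a_i * f_i(x . v_i), i.e. a function of the form (1)."""
    assert all(a >= 0 for a in weights) and abs(sum(weights) - 1.0) < 1e-12
    def f(x):
        return sum(a * fi(sum(xc * vc for xc, vc in zip(x, v)))
                   for a, fi, v in zip(weights, profiles, directions))
    return f

def max_difference_quotient(f, d, rng, trials=2000):
    """Largest sampled value of |f(x) - f(y)| / |x - y| over random pairs."""
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(d)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(d)]
        dist = math.dist(x, y)
        if dist > 0:
            worst = max(worst, abs(f(x) - f(y)) / dist)
    return worst

rng = random.Random(0)
d = 3
profiles = [abs, math.sin, math.atan]   # each is 1-Lipschitz on R
weights = [0.5, 0.3, 0.2]               # convex weights a_i
directions = [unit_vector(d, rng) for _ in profiles]
f = make_ridge_combination(weights, profiles, directions)
worst_ratio = max_difference_quotient(f, d, rng)
print(worst_ratio)  # should not exceed 1, up to rounding
```

Random sampling can only fail to find a violation; it does not prove the bound, which follows directly from the triangle inequality and $|v_i| = 1$.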

Here are some known facts:
1) If $f \in Lip(\mathbb{R}^d)$, then $f$ is differentiable a.e. and $ \sup\{ \frac{|f(x)-f(y)|}{|x-y|}: x \neq y\} = \|\nabla f \|_\infty $ is finite; this quantity is denoted $\|f\|_{Lip}$.

2) If $f \in Lip(\mathbb{R}^d)$, then $ |f(x) - f(0)| \leq \|f\|_{Lip} |x| $, so the integral $\int_{\mathbb{R}^d} f(x) u(x) dx $ is well defined for any $ u \in L^1(\mathbb{R}^d ,(1+|x|)dx)$.

3) With this topology $ Lip(\mathbb{R}^d) $ is not separable. We will actually work in $ Lip(\mathbb{R}^d)/constants $, but we will not need to be very explicit about it.

4) With this topology, any linear continuous function is of the form
$\quad \int_{\mathbb{R}^d} f(x) u(x) dx + \int_{\mathbb{R}^d} \nabla f(x) \cdot{w(x)} dx $
for some $ u \in L^1(\mathbb{R}^d, (1+|x|)dx)$ with $\int u(x) dx = 0$, and some $ w \in L^1(\mathbb{R}^d)^d $. The representation is not unique.

Claim 1: Let $\mathcal{S}$ be the space of functions like (1) with arbitrary $a_i$, i.e.: $ \mathcal{S}=\mathbb{R}\mathcal{G} $. The closure of $ \mathcal{S}$ in the topology defined in (2) is $Lip(\mathbb{R}^d)$.

Proof: If $ \overline{\mathcal{S}} \neq Lip(\mathbb{R}^d)$ then, by Hahn-Banach, there would be a non-zero continuous linear functional $ L(f)=\int_{\mathbb{R}^d} f(x) u(x) dx + \int_{\mathbb{R}^d} \nabla f(x) \cdot{w(x)} dx$ such that $L(f) = 0$ for all $f \in \mathcal{S}$.
Let $\rho$ be a smooth function with compact support. Since $\mathcal{S}$ is invariant under translations it follows that
$(3) \quad 0 = \int u\ast{\rho}(x) f(x) + w\ast{\rho}(x)\cdot{\nabla f(x)} dx = $
$\quad \int (u\ast{\rho}(x) - div(w\ast{\rho})(x))f(x) dx $ for all $f \in \mathcal{S}$.

The real and imaginary parts of functions of the form $ f(x)=e^{-2\pi i x\cdot{\xi}}$ are in $\mathcal{S}$, therefore the Fourier transform of $(u\ast{\rho} - div(w\ast{\rho}))$ is zero, so (3) holds for any $f \in Lip(\mathbb{R}^d)$. Letting $\rho \longrightarrow \delta $, we get that $L(f)=0$ for any $f \in Lip(\mathbb{R}^d)$, contradicting the fact that $L$ is not null.
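
To spell out why the trigonometric functions $\cos(2\pi x\cdot\xi)$ and $\sin(2\pi x\cdot\xi)$ belong to $\mathcal{S}$, here is a short check using only the form (1). The symbols $v$, $a$, $g$ are introduced just for this remark, and $\xi \neq 0$ is assumed (for $\xi = 0$ the functions are constant). Put $v = \xi/|\xi|$ and $a = 2\pi|\xi|$; then
$$ \cos(2\pi x\cdot\xi) = a\, g(x\cdot v), \qquad g(s) = \frac{\cos(2\pi|\xi| s)}{2\pi|\xi|}, \qquad |g'(s)| = |\sin(2\pi|\xi| s)| \le 1 , $$
so $g \in Lip1(\mathbb{R})$ and the cosine is a single term of the form (1) with an arbitrary real coefficient $a$. The same computation with $\sin$ in place of $\cos$ handles the sine.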

Claim 2: $ Lip(\mathbb{R}^d) = \bigcup_{n \in \mathbb{N}} n\overline{\mathcal{G}}$.

Proof: Let $f \in Lip(\mathbb{R}^d)$. From Claim 1, there is a net $\{f_\lambda\}_\lambda$ in $\mathcal{S}$ such that $f_\lambda \rightarrow f$. So the functionals defined as $L_\lambda(w)=\int_{\mathbb{R}^d} w(x)\cdot{\nabla f_\lambda(x)} dx $ for $w \in L^1(\mathbb{R}^d)^d$ are pointwise bounded. By the uniform boundedness theorem, they are uniformly bounded. It follows that $ \sup_\lambda \| \nabla f_\lambda \|_\infty $ is finite, since $ \| L_\lambda \| = \| \nabla f_\lambda \|_\infty $. Taking $ n \in \mathbb{N} $ sufficiently large we have $ \sup_\lambda \| \nabla f_\lambda \|_\infty \leq n $, and so $ f \in n\overline{\mathcal{G}} $.

Claim 3: $ \overline{\mathcal{G}} $ is closed in the strong topology (i.e.: the topology defined by the (semi) norm $\|f\|=_{def} \| \nabla f \|_\infty $).

Proof: Since $ L^1(\mathbb{R}^d) $ is separable, the weak* topology of $ Lip(\mathbb{R}^d) $ restricted to $Lip1(\mathbb{R}^d) $ is metrizable; let $d$ be a metric on $Lip1(\mathbb{R}^d) $ that defines the weak* topology. Let $ f $ be in the closure of $ \overline{\mathcal{G}} $ with the strong topology. Let $ \{f_n\}_n$ converge to $f $ in the strong topology, $f_n \in \overline{\mathcal{G}} $, i.e.: $ \|\nabla f_n - \nabla f \|_\infty \rightarrow 0 $; in particular, $ d(f_n,f) \rightarrow 0 $.
For each $n$, since $f_n \in \overline{\mathcal{G}} $, there is $ h_n \in \mathcal{G} $ such that $ d(h_n,f_n) < \frac{1}{n} $. Then $ d(h_n,f) \leq d(h_n, f_n) + d(f_n,f) \rightarrow 0$, so $ f \in \overline{\mathcal{G}} $.

Claim 4: There is a constant $C_d$ such that $ Lip1(\mathbb{R}^d) \subseteq C_d\overline{\mathcal{G}} $.

Proof: From Claim 2, $ Lip(\mathbb{R}^d) = \bigcup_{n \in \mathbb{N}} n\overline{\mathcal{G}}$. From Claim 3, for each $ n \in \mathbb{N}, n\overline{\mathcal{G}}$ is closed in the strong topology. From Baire's theorem, at least one of the sets $n\overline{\mathcal{G}} \text{ } (n \in \mathbb{N}) $ has non-empty interior, i.e.: there is $n \in \mathbb{N}, \epsilon > 0, f \in Lip(\mathbb{R}^d)$ such that $f + \epsilon Lip1(\mathbb{R}^d) \subseteq n\overline{\mathcal{G}}$. From Claim 2, there is $a \in \mathbb{N}$ such that $ f \in a\overline{\mathcal{G}}$. Therefore, $ Lip1(\mathbb{R}^d) \subseteq C_d\overline{\mathcal{G}}$ with $ C_d = \frac{n+a}{\epsilon}$.
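
The last step of this proof uses two properties of $\overline{\mathcal{G}}$ that are not stated explicitly above: it is convex and symmetric ($-\overline{\mathcal{G}} = \overline{\mathcal{G}}$). Both hold for $\mathcal{G}$ (it is a convex hull, and $-f_i \in Lip1(\mathbb{R})$ whenever $f_i \in Lip1(\mathbb{R})$), and they pass to the closure because the topology (2) is a vector space topology. With that reading, for any $g \in Lip1(\mathbb{R}^d)$,
$$ \epsilon g = (f + \epsilon g) - f \in n\overline{\mathcal{G}} + a\overline{\mathcal{G}} = (n+a)\overline{\mathcal{G}} , $$
where the membership uses $-f \in a\overline{\mathcal{G}}$ (symmetry) and the equality uses $np + aq = (n+a)\left(\tfrac{n}{n+a}p + \tfrac{a}{n+a}q\right)$ (convexity). Dividing by $\epsilon$ gives $g \in \frac{n+a}{\epsilon}\overline{\mathcal{G}}$.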

Finally, we can give a (non-constructive) answer to the question in the affirmative. Since the definition of Wasserstein distance requires integration against functions in $Lip1(\mathbb{R}^d)$, we assume the measures involved have finite first moment.

Claim 5: If $\mu, \nu $ are measures in $\mathbb{R}^d$ with finite first moment, then
$ \quad W_{\mathbb{R}^d}(\mu,\nu)\leq C_d\sup_{v\in \mathbb{R}^d, |v|=1}W_{\mathbb{R}}(\mu_v,\nu_v) $
where $C_d$ is the constant in Claim 4.

Proof: Let's assume first that $ \mu, \nu $ have densities $ u, w $ with respect to Lebesgue measure in $\mathbb{R}^d $. Then
$ \quad W_{\mathbb{R}^d}(\mu,\nu) = \sup_{f \in Lip1(\mathbb{R}^d)} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx \leq $
$\quad \sup_{f \in C_d\overline{\mathcal{G}}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx $, by Claim 4.

But $\sup_{f \in C_d\overline{\mathcal{G}}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx = \sup_{f \in \overline{\mathcal{G}}} \int_{\mathbb{R}^d} C_d f(x) (u(x)-w(x)) dx =$
$ C_d \sup_{f \in \overline{\mathcal{G}}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx $.
In turn, $ \sup_{f \in \overline{\mathcal{G}}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx = \sup_{f \in \mathcal{G}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx$, since $L(f)=\int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx$ is continuous in the weak* topology (2).
By definition, $\mathcal{G}$ is the convex hull of functions on $\mathbb{R}^d$ of the form $f(x\cdot{v}) $ with $|v|=1$ and $f \in Lip1(\mathbb{R})$, so
$\quad \sup_{f \in \mathcal{G}} \int_{\mathbb{R}^d} f(x) (u(x)-w(x)) dx = $
$\quad \sup_{f \in Lip1(\mathbb{R}), |v|=1} \int_{\mathbb{R}^d} f(x\cdot{v}) (u(x)-w(x)) dx = \sup_{|v|=1} W_{\mathbb{R}}(\mu_v,\nu_v) $.
By continuity of the Wasserstein distance, we can pass from probability distributions with density to arbitrary distributions with finite first moments.$\square$
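
The right-hand side of Claim 5 is easy to estimate numerically for empirical measures, and a minimal Python sketch may help to see what is being compared. It assumes two point clouds with the same number of equally weighted atoms, so the one-dimensional $W_1$ is the mean absolute difference of the sorted projections. All function names and the toy data are illustrative, and a finite set of directions only gives a lower estimate of the supremum over $|v|=1$. The sketch does not compute $C_d$, which the argument above does not make explicit.

```python
import math
import random

def w1_1d(xs, ys):
    """W_1 between two empirical measures on R with equally many, equally weighted atoms."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def project(points, v):
    """Project each point of R^d onto the direction v."""
    return [sum(pc * vc for pc, vc in zip(p, v)) for p in points]

def max_projected_w1(mu_points, nu_points, directions):
    """Maximum of W_1(mu_v, nu_v) over the given finite list of unit directions."""
    return max(w1_1d(project(mu_points, v), project(nu_points, v))
               for v in directions)

def random_unit_vector(d, rng):
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]

# Toy data: nu is mu translated by a fixed vector, so W_1 in R^d equals |shift|.
rng = random.Random(1)
d, n = 3, 200
mu_points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
shift = [0.3, -0.4, 1.2]
nu_points = [[c + s for c, s in zip(p, shift)] for p in mu_points]
shift_norm = math.sqrt(sum(s * s for s in shift))

directions = [random_unit_vector(d, rng) for _ in range(500)]
directions.append([s / shift_norm for s in shift])  # the direction of the shift itself
estimate = max_projected_w1(mu_points, nu_points, directions)
print(estimate, shift_norm)
```

In this translation example the projected distance in direction $v$ is $|v\cdot h|$ for the shift $h$, so the supremum over directions coincides with $W_{\mathbb{R}^d}(\mu,\nu) = |h|$. For general pairs of measures the two sides can differ, which is what the constant $C_d$ accounts for.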