What is the expected value of an N-dim vector of uniform randoms that sum to 1 which have been sorted into descending order?

You pick $N-1$ points $X_1,\dotsc, X_{N-1}$ independently and uniformly at random in $[0,1]$. They divide the segment $[0,1]$ into $N$ segments of lengths $S_1,\dotsc, S_N$ usually referred to as spacings.

To use your description, rearrange the $X_i$ in increasing order $X_{(1)}\leq \cdots \leq X_{(N-1)}$ and then

$$ S_1= X_{(1)},\;\;S_2= X_{(2)}-X_{(1)},\dotsc , S_N=1-X_{(N-1)}. $$

The random vector $\vec{S}=(S_1,\dotsc, S_n)$ is uniformly distributed on the $(N-1)$-dimensional simplex $\newcommand{\bR}{\mathbb{R}}$

$$ \Delta_{N-1}= \bigl\{\; (s_1,\dotsc, s_N)\in \bR^N_+;\;\;\sum_{j=1}^N s_j=1\;\bigr\} $$

equipped with the normalized surface measure. The "area" $A_{N-1}$ of $\Delta_{N-1}$ is found using the recurrence

$$A_1=\sqrt{2},\;\;A_N= \sqrt{\frac{N+1}{N}}\frac{A_{N-1}}{N}. $$

This yields

$$A_N= \frac{\sqrt{N+1}}{N!}. $$

Rearrange the spacings $S_1,\dotsc,, S_n$ in increasing order

$$S_{(1)}\leq \cdots \leq S_{(N)}. $$

The random vector $(S_{(1)},\dotsc, S_{(N)})$ is uniformly distributed in the $N-1$-dimensional polyhedron

$$ P_{N-1}=\bigl\{\, (s_1,\dotsc, s_N)\in \Delta_{N_1};\;\;s_1\leq \cdots \leq s_N\;\bigr\}. $$

This is another $(N-1)$-dimensional simplex with vertices

$$v_1=(0,0,\dotsc, 0,1),\;\;v_2=\frac{1}{2}(0,\dotsc, 0,1,1),\;v_3=\frac{1}{3}(0,\dotsc, 0,1,1,1),\dotsc, $$

and total "area"

$$ \alpha_{N-1}=\frac{1}{N!} A_{N-1}. $$

The mean you are looking for is the center of mass of the simplex $P_{N-1}$, where the mass is uniformly distributed on this polyhedron.

To compute effectively the coordinates of this center of mass it may be convenient to work with new linear coordinates, $x=(x_1,\dotsc, x_N)$ determined by the basis $v_1,\dotsc, v_N$ of $\bR^N$. They are related to the standard Euclidean coordinates $(s_1,\dotsc, s_N)$ of $\bR^N$ via the equalities

$$ s_j=s_j(x) =\sum_{k=N-j+1}^N\frac{1}{k}x_k,\;\;j=1,\dotsc, N.$$

In the $x$-coordinates the simplex $P_{N-1}$ becomes the simplex

$$ D_{N-1}= \bigl\{\; (x_1,\dotsc, x_N)\in \bR^N_+;\;\;\sum_{j=1}^N x_j=1\;\bigr\} $$

Denote by $dA_x$ the normalized area form on $D_{N-1}$. The coordinates of the center of mass of $P_{N-1}$ are $(\bar{s}_1,\dotsc, \bar{s}_N)$, where

$$\bar{s}_j=\int_{D_{N-1}} s_j(x) dA_x,\;\;j=1,\dotsc, N. $$

Now observe that

$$\int_{D_{N-1}} x_j dA_x= \frac{1}{N},\;\;j=1,\dotsc, N.$$

We deduce

$$ \boxed{E[S_{(j)}]=\bar{s}_j=\frac{1}{N}\sum_{k=N-j+1}^N\frac{1}{k},\;\;j=1,\dotsc, N.} $$

Here is also a plausibility test. The above formula implies

$$\bar{s}_1+\cdots + \bar{s}_N=1. $$

This is in perfect agreement with the equality $S_{(1)}+\cdots +S_{(N)}=1$.

Remark. The problem of the distribution of spacings is rather old. For example there is Witworth formula (1897) describing the expectation of the larges spacing $S_{N}$. More precisely it states

$$ E[S_{(N)}]=\underbrace{\sum_{k=1}^{N}(-1)^{k+1} \frac{1}{k^2} \binom{N-1}{k-1}}_{=:L_N}. $$

It looks rather different from what we proved

$$ E[S_{(N)}]=\bar{s}_N=\frac{1}{N}\left(1+\frac{1}{2}+\cdots +\frac{1}{N}\right). $$

I have not thought how prove directly that

$$ L_N=\bar{s}_N, $$

but simple computer simulations show that indeed the two rather different descriptions are identical. For more details about the spacings problem see this blogpost.


Liviu has given an excellent answer with a nice geometric flavor to it. This answer is meant to serve as a complement with a slightly more probabilistic bent. In the process, some of the computations get buried in favor of bringing out the connections to other areas of elementary probability theory.

A well-known alternative mechanism for generating points uniformly on the simplex is as follows: Let $Z_1,\ldots,Z_n$ be iid exponential random variables. Then $(Z_1/Z,\ldots,Z_n/Z)$ is uniformly distributed on the simplex where $Z = Z_1 + \cdots + Z_n$.

This is a special case of the Dirichlet distribution. In your question, you are effectively interested in computing $\mathbb E \frac{Z_{(k)}}{Z}$ for each $k$ where $Z_{(k)}$ is the $k$th order statistic from your iid exponential sample.

By Rényi representation, we can construct $Z_{(k)}$ directly as follows: Let $Z_{(0)} = 0$, and let $(\widetilde Z_i)_{i \geq 1}$ be a sequence of iid exponentials. Then, form $$ Z_{(k)} = Z_{(k-1)} + \frac{\widetilde Z_k}{n-k+1}\>. $$ Note that $Z = \sum_{k=1}^n Z_k = \sum_{k=1}^n Z_{(k)} = \sum_{k=1}^n \widetilde Z_k$.

Hence $$ \mathbb E \frac{Z_{(k)}}{Z} = \sum_{i=1}^k \frac{1}{n-i+1}\mathbb E \frac{\widetilde Z_i}{Z} = \frac{1}{n} \sum_{i=1}^k \frac{1}{n-i+1} \>, $$ since the $\widetilde Z_i/Z$ have the same $\text{Beta}(1,n-1)$ distribution for each $i$.

References

  1. A. Rényi (1953), On the theory of order statistics, Acta Mathematica Hungarica, vol. 4(3–4), pp. 191–231.
  2. L. Devroye (1986), Non-Uniform Random Variate Generation, Springer-Verlag. (Chapter 5: Uniform and Exponential Spacings).

Postscriptum: Note that the algorithms given in this answer for generating a point on the standard simplex and on its sorted variant are both $O(n)$ compared to the $O(n \log n)$ approach in the question, so if $n$ is large and computing $\log x$ for $0 < x < 1$ is cheap, this may lead to meaningful gains in sampling efficiency.