What is the correct way to think about this yet another balls/boxes problem?

There is some ambiguity about the statement of the problem, since "randomly" can be interpreted in many ways.

We interpret the statement as meaning that the balls are placed in boxes one at a time. Each time, a ball is equally likely to end up in any of the boxes, and the choices are independent.

To visualize, it is useful to consider the balls to be distinct. That does not affect the probability.

For each ball, we have $n$ choices for the box it goes into. That gives a total of $n^n$ equally likely choices.

Now we count the number of ways to have precisely one empty box. That unlucky box can be chosen in $n$ ways. There will then be a lucky box that gets $2$ balls. That lucky box can be chosen in $n-1$ ways.

The balls that end up in the lucky box can be chosen in $\binom{n}{2}$ ways. For each of these ways, the $n-2$ boxes that will receive exactly $1$ ball can be filled from the $n-2$ remaining balls in $(n-2)!$ ways.

That gives a total of $n(n-1)\binom{n}{2}(n-2)!$ "favourables." This simplifies to $\binom{n}{2}n!$. Divide by $n^n$ for the probability.

Remark: By "Stars and Bars" there are $\binom{2n-1}{n-1}$ of distributing $n$ identical balls among $n$ boxes. Let us use probability model that says all these ways are equally likely.

Now we count the favourables. The unlucky box can be chosen in $n$ ways, and for each way the lucky box can be chosen in $n-1$ ways. Now the distribution is determined, so there are $n(n-1)$ favourables.

The model we used for the second calculation is physically very unreasonable. It is very different from the first model. Under the first model, all balls ending up in the first box has probability $\frac{1}{n^n}$. Under the second, the probability is $\frac{1}{\binom{2n-1}{n-1}}$, a much larger number for large $n$.

The second model does not seem suitable for any application I can think of. (The first is important in many places, including the study of hashing.)

I am of the opinion that the first model is the most reasonable interpretation of the problem.

Number the boxes $B_1$, $B_2,\,\dots B_n$, and the balls $b_1$, $b_2,\,\dots b_n$.

Then the probability that there is exactly one empty box is the number of arrangements in which there is exactly one empty box, divided by the total number of possible arrangements.

The number of arrangements in which there is exactly one empty box is the number of ways to put $n$ balls in $n-1$ boxes, multiplied by $n$ (the number of ways to choose which box is empty).

We want to put $n$ balls in $n-1$ boxes, such that there are no empty boxes. There are $n-1$ ways to choose the box in which there are two balls. Then there are $\binom{n}{2}$ ways to choose the two balls that go in this box. Finally, there are $(n-2)!$ ways to distribute the remaining $n-2$ balls in the remaining $n-2$ boxes. Multiplying these terms together, we get:

$$(n-1)\binom{n}{2}(n-2)! = \frac{n! \cdot n \cdot (n-1)}{2}$$

The total number of arrangements is $n^n$ (each ball has $n$ possible boxes into which it can be placed). Thus, our final probability is:

$$\frac{n! \cdot n \cdot (n-1)}{2(n^n)}$$

What is the correct way to think about this yet another balls/boxes problem?

Tags:

Combinatorics

Probability

Related

Recent Posts