Expected number of draws to find a match

Introductory remark. This answer was originally written to treat the case of seeing the first repeated value when drawing without replacement from a deck of cards as proposed at this MSE link. It answers the corresponding case of $n$ pairs of socks as well, however, consult the end of the document for this.

We solve the problem where we have $j$ instances of each of $n$ types of coupons and draw without replacement until we have seen $2$ coupons of some type. For a deck of cards we have $13$ types of coupons and $4$ instances of each type. Using the notation from the following MSE link I and MSE link II we introduce the marked generating function

$$\left(1 + j w z\right)^n.$$

The coefficient on $[z^m]$ here represents distributions of sequences of $m$ draws from the $n$ types according to probability, where the ones that occur one time have been marked. Each of the latter may be augmented to a pair of some color where the weight is $j-1$ because one coupon has already been drawn. As we only need the count we differentiate with respect to $w$ and set $w=1$, getting

$$n\times \left(1 + jz\right)^{n-1} \times j z.$$

With the method from the linked posts we thus obtain for the probability

$$P[T = m] = \frac{1}{m!} {nj\choose m}^{-1} (m-1)! \times (j-1) \times [z^{m-1}] n j z (1 + jz)^{n-1} \\ = \frac{nj}{m} {nj\choose m}^{-1} \times (j-1) \times [z^{m-2}] (1 + jz)^{n-1} \\ = \frac{nj}{m} {nj\choose m}^{-1} \times (j-1) \times {n-1\choose m-2} j^{m-2} \\ = (j-1) \times {nj-1\choose m-1}^{-1} {n-1\choose m-2} j^{m-2}.$$

Next we verify that this is a probability distribution. The process may halt after two steps at the earliest and $n+1$ at the latest and we get

$$\sum_{m=2}^{n+1} P[T=m] = (j-1) \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} {n-1\choose m-2} j^{m-2} \\ = (j-1) \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} \frac{m-1}{n} {n\choose m-1} j^{m-2} \\ = \frac{j-1}{n} \sum_{m=2}^{n+1} {nj-1\choose m-1}^{-1} {n\choose m-1} (m-1) j^{m-2}.$$

We have

$${nj-1\choose m-1}^{-1} {n\choose m-1} = \frac{n!\times (nj-m)!}{(nj-1)!\times (n-(m-1))!} \\ = {nj-1\choose n}^{-1} {nj-m\choose n-(m-1)}.$$

Here we have used the fact that for the scenario to make sense we must have $j\ge 2.$ Continuing we find

$$\frac{j-1}{n} {nj-1\choose n}^{-1} \sum_{m=2}^{n+1} {nj-m\choose n-(m-1)} (m-1) j^{m-2}$$

The sum term yields

$$\sum_{m=1}^{n} {nj-1-m\choose n-m} m j^{m-1} = \sum_{m\ge 1} [w^{n-m}] (1+w)^{nj-1-m} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \sum_{m\ge 1} w^m (1+w)^{-m} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \frac{w}{(1+w)} \sum_{m\ge 1} w^{m-1} (1+w)^{-(m-1)} m j^{m-1} \\ = [w^n] (1+w)^{nj-1} \frac{w}{(1+w)} \frac{1}{(1-wj/(1+w))^2} \\ = [w^{n-1}] (1+w)^{nj} \frac{1}{(1+w-wj)^2} = [w^{n-1}] (1+w)^{nj} \frac{1}{(1-(j-1)w)^2}.$$

Extracting coefficients we find

$$\sum_{q=0}^{n-1} {nj\choose n-1-q} (q+1) (j-1)^q = \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (q+1) (j-1)^q \\ = nj \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^q - n(j-1) \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^q \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^q \\ - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=-1}^{n-2} {nj-1\choose nj-n+q+1} (j-1)^{q+1} \\ - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} \\ = n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q} (j-1)^{q+1} + n \sum_{q=0}^{n-1} {nj-1\choose nj-n+q+1} (j-1)^{q+1} \\ + n {nj-1\choose nj-n} - n \sum_{q=0}^{n-1} {nj\choose nj-n+q+1} (j-1)^{q+1} = n {nj-1\choose nj-n}.$$

Collecting everything we obtain

$$\frac{j-1}{n} {nj-1\choose n}^{-1} \times n \times {nj-1\choose n-1} \\ = \frac{j-1}{n} {nj-1\choose n}^{-1} \times n \times {nj-1\choose n} \frac{n}{nj-n} = 1$$

and we have confirmed that we have a probability distribution.

The next step is to compute the expectation. Recapitulating the earlier computation we find that

$$E[T] = \sum_{m=2}^{n+1} m P[T=m] = \frac{j-1}{n} {nj-1\choose n}^{-1} \sum_{m=1}^{n} {nj-1-m\choose n-m} (m+1) m j^{m-1}$$

or

$$\bbox[5px,border:2px solid #00A000]{ E[T] = \frac{2(j-1)}{n} {nj-1\choose n}^{-1} [w^{n-1}] (1+w)^{nj+1} \frac{1}{(1-(j-1)w)^3}.}$$

Extracting coefficients we obtain the closed form

$$\bbox[5px,border:2px solid #00A000]{ E[T] = \frac{(j-1)}{n} {nj-1\choose n}^{-1} \sum_{q=0}^{n-1} {nj+1\choose n-1-q} (q+2)(q+1) (j-1)^q.}$$

Observe that for a deck of cards we get

$$E[T] = {\frac {226087256246}{39688347475}} \approx 5.696565129.$$

Furthermore this simplifies when $j=2$ (pairs of socks). Instantiating $j$ to $2$ will produce

$$\frac{2}{n} {2n-1\choose n}^{-1} [w^{n-1}] (1+w)^{2n+1} \frac{1}{(1-w)^3}.$$

The coefficient is

$$\mathrm{Res}_{w=0} \frac{1}{w^n} (1+w)^{2n+1} \frac{1}{(1-w)^3}.$$

Note that the residue at infinity is given by

$$- \mathrm{Res}_{w=0} \frac{1}{w^2} w^n \frac{(1+w)^{2n+1}}{w^{2n+1}} \frac{1}{(1-1/w)^3} = - \mathrm{Res}_{w=0} \frac{1}{w^2} \frac{(1+w)^{2n+1}}{w^{n+1}} \frac{w^3}{(w-1)^3} \\ = \mathrm{Res}_{w=0} \frac{(1+w)^{2n+1}}{w^{n}} \frac{1}{(1-w)^3}.$$

Hence the value is minus half the residue at $w=1$. We find with $(1-w)^3 = - (w-1)^3$

$$\frac{1}{2} \times \frac{1}{2} \left.\frac{1}{w^n} (1+w)^{2n+1} \left(\frac{n(n+1)}{w^2} - \frac{2n(2n+1)}{w (1+w)} + \frac{(2n+1)(2n)}{(1+w)^2}\right)\right|_{w=1} \\ = 2^{2n-1} \left(n^2+n - 2n^2-n + n^2 + \frac{1}{2} n\right) = \frac{1}{4} n 4^n.$$

Now observe that

$${2n-1\choose n}^{-1} = {2n\choose n}^{-1} \times 2n \times \frac{1}{n} = 2 {2n\choose n}^{-1} \sim 2 \times \frac{\sqrt{\pi n}}{4^n}$$

We thus have the closed form for $j=2$

$$\bbox[5px,border:2px solid #00A000]{ E[T] = {2n-1\choose n}^{-1} \frac{1}{2} 4^n = {2n\choose n}^{-1} 4^n.}$$

and we get the nice asymptotic

$$\bbox[5px,border:2px solid #00A000]{ E[T] \sim \sqrt{\pi n}.}$$

There is also a very basic C program which confirmed the closed form of the expectations for all combinations of $n$ and $j$ that were examined. For example with $j=5$ we get the expectations

$$2,{\frac {23}{9}},{\frac {272}{91}},{\frac {3253}{969}}, {\frac {6522}{1771}},{\frac {94477}{23751}}, {\frac {714436}{168113}},{\frac {69263329}{15380937}},\ldots$$

with values

$$2, 2.555555556, 2.989010989, 3.357069143, 3.682665161, \\ 3.977811461, 4.249736784, 4.503193076,\ldots $$

Running the program on $10^8$ trials will then match these values to about five digits decimal precision.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
#include <time.h>
#include <string.h>

int main(int argc, char **argv)
{
  int n = 6 , j = 3, trials = 1000; 

  if(argc >= 2){
    n = atoi(argv[1]);
  }

  if(argc >= 3){
    j = atoi(argv[2]);
  }

  if(argc >= 4){
    trials = atoi(argv[3]);
  }

  assert(1 <= n);
  assert(2 <= j);
  assert(1 <= trials);

  srand48(time(NULL));
  long long data = 0;

  for(int tind = 0; tind < trials; tind++){
    int src[n*j];

    for(int cind = 0; cind < n*j; cind++)
      src[cind] = cind/j;

    int done = 0; int steps = 0; 
    int dist[n];

    for(int cind = 0; cind < n; cind++)
      dist[cind] = 0;


    while(!done){
      int cpidx = drand48() * (double)(n*j-steps);
      int coupon = src[cpidx];

      for(int cind=cpidx; cind < n*j-steps-1; cind++)
        src[cind] = src[cind+1];

      steps++;
      dist[coupon]++;

      if(dist[coupon] == 2)
        done = 1;
    }

    data += steps;
  }

  long double expt = (long double)data/(long double)trials;
  printf("[n = %d, j = %d, trials = %d]: %Le\n", 
         n, j, trials, expt);

  exit(0);
}

Suppose you pick socks without replacement $k$ times, $2\leq k\leq n+1$, when you find the first match. When you pick socks the first time there are $N\equiv 2n$ ways of doing it.

If there is to be no match during second pick, then there are $N(N-2)$ ways of picking the second sock, because for each sock picked the first time there are $(N-2)$ socks that we could pick the second time without having a match.

If there is to be no match during third pick, then there are $N(N-2)(N-4)$ ways of picking the third sock. This is because after two picks without a match, there are $(N-2)$ socks left, and for there are to be no match in the third pick there are $(N-2)-2$ choices available.

If there is to be no match during fourth pick, then there are $N(N-2)(N-4)(N-6)$ ways of picking the second sock. This is because after three picks without a match, there are $(N-3)$ socks left, and for there are to be no match in the fourth pick there are $(N-3)-3$ choices available.

Since the first match occurs on $k$-th picking, there has been no match during the previous $(k-1)$ pickings. The number of ways for there to be no match during $(k-1)$ pickings is: \begin{align} &N(N-2)(N-4)...(N-2(k-2))\\ =&2^{k-1}n(n-1)(n-2)...(n-(k-2))\\ =&2^{k-1}\prod_{i=0}^{k-2}(n-i) \end{align} Since you have picked $(k-1)$ distinct socks so far, to get a match on $k$-th picking we must pick one of the $(k-1)$ socks. Therefore the total number of ways to get the first match on $k$-th pick is: $N(N-2)(N-4)...(N-2(k-1))(k-1)$. \begin{align} (k-1)2^{k-1}\prod_{i=0}^{k-2}(n-i) \end{align}

Since there are a total of $N(N-1)(N-2)...(N-(k-1))=\prod_{i=0}^{k-1}(2n-i)$ ways to select socks without replacement in $k$ picks the sought for probability is: \begin{align} \frac{(k-1)2^{k-1}\prod_{i=0}^{k-2}(n-i)}{\prod_{i=0}^{k-1}(2n-i)} \end{align} This formula correctly reproduces your results. Using Mathematica I get the $\lim_{n\to\infty}\frac{E[X_n]}{n}=0$.


Here is an alternative solution.

Let's say the first match occurs on the $X$th draw. We would like to find $\Pr(X>k)$, i.e. the probability that there has been no match by draw $k$, for $k = 0, 1, 2, \dots ,n$. There are $\binom{2n}{k}$ possible selections of $k$ socks, all of which we assume are equally likely. If there is no match, we must have at most one sock of each of $k$ colors. There are $\binom{n}{k}$ ways to select the $k$ colors, and $2^k$ ways to select the socks of those colors, so $\binom{n}{k}\; 2^k$ ways in all. So $$\Pr(X>k) = \frac{\binom{n}{k} \; 2^k}{\binom{2n}{k}}$$

Hence $$E(X) = \sum_{k \ge 0}\Pr(X>k) = \sum_{k=0}^n \frac{\binom{n}{k} \; 2^k}{\binom{2n}{k}}$$