## A Puzzle June 25, 2010

Posted by genericme in Uncategorized.

Consider a set of $n$ objects from which $m$ are drawn randomly at a time, with replacement.  What is $E(n, m)$, the expected number of draws I have to make to have drawn all of the objects?

I have yet to find a satisfactory closed-form expression for even $E(n, 1)$.  I obtain an ugly series in terms of Stirling numbers of the second kind.   However, I suspect that $E(n, 1)$ is asymptotically linear:

Is this a well-known problem?

1. Sune Jakobsen - June 25, 2010

For m=1 it is known as “the coupon collector’s problem”, and the expectation is $\Theta (n \log n)$. I haven’t heard about the problem for m>1 before, but I’m pretty sure that for fixed m, the expectation is still $\Theta (n \log n)$. I don’t known what it is in general.

2. Michael Lugo - June 25, 2010

In the case m=1, this is known as the “coupon collector problem”. In fact E(n,1) = n Hn, where Hn = 1 + 1/2 + … + 1/n is the nth harmonic number. Hn is approximately log n, so E(n,1) is approximately n log n.

I suspect E(n, m) is approximately (n log n)/m for large n and fixed m, but I haven’t seen this problem before.

3. genericme - June 25, 2010

Cool, thanks for the info. I was taking an unnecessarily difficult approach to solving this problem.

4. gaurav - January 4, 2011

found an approximate bound using wald’s equation. http://en.wikipedia.org/wiki/Wald's_equation.

imagine the experiment with m=1.its known that
E(n,1)= n H(n),
where H(n)=1+1/2+1/3+….1/n

now let x1= number of draws to pick m distinct objects. After x1 draws, let x2=number of draws to pick m distinct objects, and so on till u see see all the n objects.

then E[X1]=E[X2]=…=n[1/(n)+1/(n-1)+...1/(n-m+1)]=n/[H(n)-H(n-m)]

If S is total number of draws
i.e. S=X1+X2+…….,then as per wald’s equation,
E[S]=E[X1]*E(n,m).

Now, E[S]-m <E(n,1) E(n,m) >= H(n)/[H(n)-H(n-m)]

where H(r)=1/r+ 1/(r-1)+..1/2+1

5. kodlu - April 3, 2011

What happens in coupon collection is that if you want to collect say 2 of each coupon, the extra samples required after collecting (at least) one of each is smaller [on average]. Thus \$E(n,2) < 2 E(n,1)\$ on average. In fact

\$E(n, m) = \Theta( n[ \log n+ (m-1) \log \log n])\$

which is not surprising. Focusing on E(n,1) vs E(n,2) only, while collecting all coupons a lot of coupons have already repeated.

See "Randomized Algorithms" by Raghavan and Motwani.