##
A Puzzle *June 25, 2010*

*Posted by genericme in Uncategorized.*

trackback

trackback

Consider a set of objects from which are drawn randomly at a time, with replacement. What is , the expected number of draws I have to make to have drawn all of the objects?

I have yet to find a satisfactory closed-form expression for even . I obtain an ugly series in terms of Stirling numbers of the second kind. However, I suspect that is asymptotically linear:

Is this a well-known problem?

For m=1 it is known as “the coupon collector’s problem”, and the expectation is . I haven’t heard about the problem for m>1 before, but I’m pretty sure that for fixed m, the expectation is still . I don’t known what it is in general.

In the case m=1, this is known as the “coupon collector problem”. In fact E(n,1) = n Hn, where Hn = 1 + 1/2 + … + 1/n is the

nth harmonic number. Hn is approximately log n, so E(n,1) is approximately n log n.I suspect E(n, m) is approximately (n log n)/m for large n and fixed m, but I haven’t seen this problem before.

Cool, thanks for the info. I was taking an unnecessarily difficult approach to solving this problem.

found an approximate bound using wald’s equation. http://en.wikipedia.org/wiki/Wald's_equation.

imagine the experiment with m=1.its known that

E(n,1)= n H(n),

where H(n)=1+1/2+1/3+….1/n

now let x1= number of draws to pick m distinct objects. After x1 draws, let x2=number of draws to pick m distinct objects, and so on till u see see all the n objects.

then E[X1]=E[X2]=…=n[1/(n)+1/(n-1)+...1/(n-m+1)]=n/[H(n)-H(n-m)]

If S is total number of draws

i.e. S=X1+X2+…….,then as per wald’s equation,

E[S]=E[X1]*E(n,m).

Now, E[S]-m <E(n,1) E(n,m) >= H(n)/[H(n)-H(n-m)]

where H(r)=1/r+ 1/(r-1)+..1/2+1

What happens in coupon collection is that if you want to collect say 2 of each coupon, the extra samples required after collecting (at least) one of each is smaller [on average]. Thus $E(n,2) < 2 E(n,1)$ on average. In fact

$E(n, m) = \Theta( n[ \log n+ (m-1) \log \log n])$

which is not surprising. Focusing on E(n,1) vs E(n,2) only, while collecting all coupons a lot of coupons have already repeated.

See "Randomized Algorithms" by Raghavan and Motwani.