Notes on the Hypergeometric distribution
15.02.2018
In contrast to binomial distribution where the probability of a success is the same for each experiment, the hypergeometric distribution describes situation when the probability of success decreases after each success (and increases after each failure). Each experiment is modelled by an urn containing \(N\) balls, \(K\) of them being ``lucky’’ (and \(N-K\) unlucky). \(n\) draws are performed from an urn and the hypergeometric random variable \(X\) is the numebr of lucky balls drawn. So, each lcuky ball decreases the probability to pick a lucky ball in the next draw. One more point of view is that the hypergeometric distribution models draws without replacement, while binomial models draws with replacement (to keep the probability of picking a lucky ball constant we return the drawn ball back to the urn).
Parameters: \(N\) – number of balls in the urn, \(K\) – number of lucky balls in the urn, \(n\) – number of draws, \(0\leq n\leq N,\) \(0\leq K\leq N.\)
Values: \(\{0,1,2,\ldots,n\}\) (in fact values with nonzero probabilities are \(k\in[\max(0,n-N+K),\min(n,K)]\))
Probability mass function: \[ P(X=k)=\frac{{K\choose k}{N-K\choose n-k}}{{N\choose n}}, \ k=0,1,2,\ldots,n. \] (we agree that \({K\choose k}=0\) for \(k>K\)).
{} There are \({N\choose n}\) possibilities to draw \(n\) balls from an urn containing \(N\) balls. We are interested in draws that contain exactly \(k\) lucky balls. So, satisfactory outcomes are draws that contain \(k\) lucky and \(n-k\) unlucky balls. There are \({K\choose k}{N-K\choose n-k}\) possibilities to perform such draw.
Moment generating function does not have a simple representation: \[ M(t)=\sum^{n}_{k=0}e^{tk}\frac{{K\choose k}{N-K\choose n-k}}{{N\choose n}} \]
Expectation: \(EX=n\frac{K}{N}\)
Variance: \(V(X)=\frac{nK(N-k)(N-n)}{N^2(N-1)}\)
Derivation
We find expectation as a sum over all possible values of \(X:\) \[ EX=\sum^n_{k=0}k \frac{{K\choose k}{N-K\choose n-k}}{{N\choose n}}= \] the summand with \(k=0\) is zero \[ =\frac{1}{{N\choose n}}\sum^n_{k=1}k {K\choose k}{N-K\choose n-k}= \] use the identity \(k{K\choose k}=K{K-1 \choose k-1}\) \[ =\frac{K}{{N\choose n}}\sum^n_{k=1} {K-1 \choose k-1 }{(N-1)-(K-1)\choose (n-1)-(k-1)}= \] change summation variable to \(k-1\) \[ =\frac{K}{{N\choose n}}\sum^{n-1}_{k=0} {K-1 \choose k }{(N-1)-(K-1)\choose (n-1)-k}= \] represent the sum as a sum of all probabilities for hypergeometric distribution with parameters \(N-1,K-1,n-1\) (which is 1) \[ =\frac{K{N-1\choose n-1}}{{N\choose n}}\sum^{n-1}_{k=0} \frac{{K-1 \choose k }{(N-1)-(K-1)\choose (n-1)-k}}{{N-1\choose n-1}}=\frac{K{N-1\choose n-1}}{{N\choose n}}=n\frac{K}{N} \]
Similarly we find \(EX(X-1):\) \[ EX(X-1)=\sum^n_{k=0} k(k-1)\frac{{K\choose k}{N-K\choose n-k}}{{N\choose n}}= \] \[ =\frac{1}{{N\choose n}}\sum^n_{k=2}k(k-1) {K\choose k}{N-K\choose n-k}= \] use the identity \(k(k-1){K\choose k}=K(K-1){K-2 \choose k-2}\) \[ =\frac{K(K-1)}{{N\choose n}}\sum^{n-2}_{k=0} {K-2 \choose k }{(N-2)-(K-2)\choose (n-2)-k}= \] \[ =\frac{K(K-1){N-2\choose n-2}}{{N\choose n}}=n(n-1)\frac{K(K-1)}{N(N-1)} \]
Variance is \[ V(X)=EX^2-(EX)^2=EX(X-1)+EX-(EX)^2=\] \[ =n(n-1)\frac{K(K-1)}{N(N-1)}+n\frac{K}{N}-n^2\frac{K^2}{N^2}= \] \[ =\frac{nKN^2-n^2KN-nK^2N+n^2K^2}{N^2(N-1)}=\frac{nK(N-k)(N-n)}{N^2(N-1)} \]