Binomial Approximation to the Hypergeometric Distribution
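A random variable X that has a hypergeometric distribution with parameters N, n and k has the following probability mass function:
$$P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \max(0,\, n-(N-k)) \le x \le \min(n,\, k).$$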
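The values of E(X) and Var(X) are
$$E(X) = \frac{nk}{N}, \qquad Var(X) = n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N-n}{N-1}.$$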
The hypergeometric distribution with parameters N, n and k is the probability distribution of the random variable X, whose value is the number of successes in a sample of n items drawn without replacement from a population of size N that has k 'success' items and N - k 'failure' items. Like a binomial random variable, X is the sum of n Bernoulli variables, with the ith Bernoulli variable having the value 1 if the ith object is a success, 0 otherwise. However, the n Bernoulli variables are no longer independent of each other: the probability p_i of getting a success on the ith draw depends on the number of successes already drawn in the previous (i-1) objects, so these probabilities may differ from one draw to the next.
If the sample size n is small relative to N, then the probability that the ith object is a success varies only slightly for different values of i. In this case, X behaves like the sum of n (almost) independent Bernoulli variables with parameter p = k / N. Thus, the hypergeometric distribution with parameters N, n and k can be approximated by the binomial distribution with parameters n and p = k / N.
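The quality of this approximation can be checked numerically. The following is a minimal sketch, assuming the standard hypergeometric and binomial probability mass functions; the function names and the population sizes used are illustrative choices, not part of the original text. It computes the largest pointwise difference between the two mass functions for a fixed sample size n = 10 and a fixed ratio k / N = 0.4, once with a small population and once with a large one.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items containing k successes."""
    # Outside the support the probability is zero.
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_diff(N, n, k):
    """Largest pointwise gap between the hypergeometric pmf and the
    binomial pmf with p = k / N, over x = 0, ..., n."""
    p = k / N
    return max(
        abs(hypergeom_pmf(x, N, n, k) - binom_pmf(x, n, p))
        for x in range(n + 1)
    )


# Same sample size and same success ratio, different population sizes
# (illustrative values only).
small_population = max_abs_diff(N=50, n=10, k=20)
large_population = max_abs_diff(N=5000, n=10, k=2000)

print(f"N = 50:   max |hypergeometric - binomial| = {small_population:.6f}")
print(f"N = 5000: max |hypergeometric - binomial| = {large_population:.6f}")
```

With n fixed, the gap for N = 5000 should come out smaller than the gap for N = 50, which is consistent with the claim that the approximation improves as n becomes small relative to N.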
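The mean and variance of a random variable X having the binomial distribution above are
$$E(X) = np = \frac{nk}{N}, \qquad Var(X) = np(1-p) = n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$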
These values are the same as the mean and variance of the hypergeometric distribution above, except that the variances differ by the factor
$$\frac{N-n}{N-1}.$$
This factor is close to 1 when n is small relative to N.
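As a small numerical illustration of the factor (N - n) / (N - 1) relating the two variances, the sketch below evaluates it for a fixed sample size and two population sizes. The function names and the particular values of N, n and k are assumptions made for the example only.

```python
def binom_variance(n, p):
    """Variance n p (1 - p) of a binomial distribution."""
    return n * p * (1 - p)


def correction_factor(N, n):
    """Ratio of the hypergeometric variance to the binomial variance."""
    return (N - n) / (N - 1)


def hypergeom_variance(N, n, k):
    """Variance of the hypergeometric distribution with parameters N, n, k,
    written as the binomial variance with p = k / N times the factor."""
    return binom_variance(n, k / N) * correction_factor(N, n)


# Sample size n = 10 with success ratio k / N = 0.4 (illustrative values).
for N, k in [(50, 20), (5000, 2000)]:
    n = 10
    print(
        f"N = {N}: factor = {correction_factor(N, n):.4f}, "
        f"binomial variance = {binom_variance(n, k / N):.4f}, "
        f"hypergeometric variance = {hypergeom_variance(N, n, k):.4f}"
    )
```

For N = 50 the factor is 40/49, about 0.816, so the binomial variance overstates the hypergeometric variance noticeably; for N = 5000 the factor is 4990/4999, about 0.998, and the two variances nearly coincide.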