Goodness-of-Fit Tests

Let Y_{1}, Y_{2}, ..., Y_{n} be a set of independent and identically distributed random variables. Assume that the probability distribution of the Y_{i}'s has the density function f_{o}(y). We can divide the set of all possible values of Y_{i}, i ∈ {1, 2, ..., *n*}, into *m* non-overlapping intervals D_{1}, D_{2}, ..., D_{m}. Define the probability values *p*_{1}, *p*_{2}, ..., *p*_{m} as:

*p*_{1} = P(Y_{i} ∈ D_{1})

*p*_{2} = P(Y_{i} ∈ D_{2})

⋮

*p*_{m} = P(Y_{i} ∈ D_{m})
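As an illustration, when f_{o}(y) is fully specified, each probability can be obtained from its cumulative distribution function F_{o} as *p*_{j} = F_{o}(b_{j}) - F_{o}(a_{j}) for an interval D_{j} = [a_{j}, b_{j}). The sketch below assumes an exponential density with rate 1 and four hand-picked intervals; the function names, the density, and the interval edges are hypothetical choices made for the example, not part of the original text.

```python
import math

def exponential_cdf(y, rate=1.0):
    """CDF of an exponential distribution (an assumed stand-in for f_o)."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0

def interval_probabilities(cdf, edges):
    """Return p_j = F(b_j) - F(a_j) for consecutive edges a_j < b_j."""
    return [cdf(b) - cdf(a) for a, b in zip(edges[:-1], edges[1:])]

# Hypothetical intervals D_1..D_4 covering all possible values [0, infinity).
edges = [0.0, 0.5, 1.0, 2.0, math.inf]
probs = interval_probabilities(exponential_cdf, edges)

for j, p in enumerate(probs, start=1):
    print(f"p_{j} = {p:.4f}")
print("sum =", sum(probs))
```

Because the intervals cover every possible value without overlapping, the printed probabilities add up to 1 (up to floating-point rounding).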

Since the union of the mutually exclusive intervals D_{1}, D_{2}, ..., D_{m} is the set of all possible values for the Y_{i}'s, *p*_{1} + *p*_{2} + ... + *p*_{m} = 1. Define the set of discrete random variables X_{1}, X_{2}, ..., X_{m}, where

X_{1} = number of Y_{i}'s whose value ∈ D_{1}

X_{2} = number of Y_{i}'s whose value ∈ D_{2}

⋮

X_{m} = number of Y_{i}'s whose value ∈ D_{m}

and X_{1} + X_{2} + ... + X_{m} = *n*. Then the set of discrete random variables X_{1}, X_{2}, ..., X_{m} will have a multinomial probability distribution with parameters *n* and the set of probabilities {*p*_{1}, *p*_{2}, ..., *p*_{m}}. If the intervals D_{1}, D_{2}, ..., D_{m} are chosen such that *np*_{i} ≥ 5 for i = 1, 2, ..., m, then the random variable

C = Σ_{i=1}^{m} (X_{i} - *np*_{i})^{2} / (*np*_{i})

has approximately a chi-square distribution with m - 1 degrees of freedom.

For the goodness-of-fit test, we formulate the null and alternative hypotheses as

H_{o}: f_{Y}(y) = f_{o}(y)

H_{1}: f_{Y}(y) ≠ f_{o}(y)

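At the α level of significance, H_{o} will be rejected in favor of H_{1} if the observed value *c* of the random variable C satisfies *c* ≥ χ^{2}_{α}(m - 1), where χ^{2}_{α}(m - 1) denotes the value that a chi-square random variable with m - 1 degrees of freedom exceeds with probability α.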
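The test with a fully specified f_{o}(y) can be sketched in a few lines: count the observations in each interval, form the chi-square statistic from the counts X_{i} and the expected values *np*_{i}, and reject when the statistic falls in the upper α tail of the chi-square distribution with m - 1 degrees of freedom. The example below is a minimal sketch under stated assumptions: f_{o} is taken to be the uniform density on [0, 1), the data are simulated, and the helper names (`chi2_sf`, `bin_counts`, `chi_square_statistic`) are hypothetical. The tail probability is computed with a simple series for the incomplete gamma function, which is adequate for moderate values of the statistic.

```python
import math
import random
from bisect import bisect_right

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower incomplete gamma function gamma(a, z).
    term = total = 1.0 / a
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def bin_counts(sample, edges):
    """X_j = number of observations falling in [edges[j], edges[j+1])."""
    counts = [0] * (len(edges) - 1)
    for y in sample:
        counts[bisect_right(edges, y) - 1] += 1
    return counts

def chi_square_statistic(counts, probs):
    """Sum over intervals of (X_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((x - n * p) ** 2 / (n * p) for x, p in zip(counts, probs))

# Assumed setup: H_o says the data are uniform on [0, 1); m = 4 intervals.
random.seed(0)
sample = [random.random() for _ in range(100)]
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
probs = [0.25, 0.25, 0.25, 0.25]          # n * p_i = 25 >= 5 for every i

counts = bin_counts(sample, edges)
c = chi_square_statistic(counts, probs)
p_value = chi2_sf(c, df=len(counts) - 1)  # m - 1 degrees of freedom

alpha = 0.05
reject = p_value <= alpha                 # same decision as c >= critical value
print(counts, round(c, 3), round(p_value, 3), reject)
```

Comparing the tail probability with α gives the same decision as comparing the statistic with the chi-square critical value, and avoids the need for a table of critical values.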

However, it is possible that in a goodness-of-fit test, one or more of the parameters of f_{o}(y) are unknown. Then the probability values *p*_{1}, *p*_{2}, ..., *p*_{m} will have to be estimated by assuming that H_{o} is true and calculating their estimated values from the sample data. That is, another set of probability values *p*'_{1}, *p*'_{2}, ..., *p*'_{m} will need to be computed so that the values (*np*'_{1}, *np*'_{2}, ..., *np*'_{m}) are the estimated expected values of the multinomial random variable (X_{1}, X_{2}, ..., X_{m}). In this case, the random variable C, computed with *np*'_{i} in place of *np*_{i}, will still have approximately a chi-square distribution, but its degrees of freedom will be reduced. In particular, if the density function f_{o}(y) has *r* unknown parameters, then C has approximately a chi-square distribution with m - 1 - r degrees of freedom.

For this goodness-of-fit test, we formulate the null and alternative hypotheses as

H_{o}: f_{Y}(y) = f_{o}(y)

H_{1}: f_{Y}(y) ≠ f_{o}(y)

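At the α level of significance, H_{o} will be rejected in favor of H_{1} if the observed value *c* of C, computed with the estimated expected values *np*'_{i}, satisfies *c* ≥ χ^{2}_{α}(m - 1 - r), where χ^{2}_{α}(m - 1 - r) denotes the value that a chi-square random variable with m - 1 - r degrees of freedom exceeds with probability α.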
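When parameters are estimated, only two things change in the calculation: the interval probabilities are computed from the fitted density, and the degrees of freedom drop by the number *r* of estimated parameters. The sketch below assumes, purely for illustration, that f_{o} is an exponential density whose single rate parameter is unknown (r = 1) and is estimated as the reciprocal of the sample mean; the data are simulated and the function names are hypothetical. With m = 4 intervals the degrees of freedom are 4 - 1 - 1 = 2, a case in which the chi-square critical value has the closed form -2 ln(α), so no table is needed.

```python
import math
import random
from bisect import bisect_right

def exponential_cdf(y, rate):
    """CDF of an exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0

def chi_square_with_estimated_rate(sample, edges):
    """Return (statistic, degrees of freedom, estimated expected counts)."""
    n = len(sample)
    rate_hat = n / sum(sample)            # estimate of the one unknown parameter
    probs = [exponential_cdf(b, rate_hat) - exponential_cdf(a, rate_hat)
             for a, b in zip(edges[:-1], edges[1:])]
    counts = [0] * (len(edges) - 1)
    for y in sample:
        counts[bisect_right(edges, y) - 1] += 1
    expected = [n * p for p in probs]     # the values n * p'_i
    statistic = sum((x - e) ** 2 / e for x, e in zip(counts, expected))
    r = 1                                 # number of estimated parameters
    df = len(counts) - 1 - r              # m - 1 - r
    return statistic, df, expected

# Simulated data and hypothetical intervals D_1..D_4 covering [0, infinity).
random.seed(1)
sample = [random.expovariate(1.0) for _ in range(200)]
edges = [0.0, 0.5, 1.0, 2.0, math.inf]

statistic, df, expected = chi_square_with_estimated_rate(sample, edges)

alpha = 0.05
critical = -2.0 * math.log(alpha)         # exact chi-square critical value for df == 2 only
reject = statistic >= critical
print(round(statistic, 3), df, round(critical, 3), reject)
```

For other degrees of freedom the critical value would have to come from a chi-square table or a statistics library; the closed form used here holds only when m - 1 - r = 2.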