Percentiles
Percentiles are values in a given set of observations that divide the data into 100 equal parts. These values can be denoted by P1, P2, ..., P99, where
1 % of the data falls below (is less than or equal to) P1
2 % of the data falls below P2
:
:
99 % of the data falls below P99
Percentiles can be calculated from a sorted list of observations or from the cumulative frequency distribution table corresponding to the observations. The latter method assumes that the values in a class interval are uniformly distributed within it; linear interpolation is then used to calculate the percentiles. Because this assumption often does not hold, percentile values can differ depending on whether raw data or a frequency distribution was used in the computation. Percentiles are therefore often treated as estimates of the value below which a given percentage of the observations fall.
Example. Given the following sorted list of 40 observations:
0.7 | 0.8 | 0.9 | 1.1 | 1.2 | 1.4 | 1.9 | 2.2 | 2.2 | 2.3 |
2.5 | 3.1 | 3.2 | 3.3 | 3.4 | 3.8 | 3.9 | 4.0 | 4.1 | 4.2 |
4.3 | 4.6 | 4.7 | 5.0 | 5.2 | 5.5 | 5.6 | 5.8 | 5.9 | 6.1 |
6.4 | 6.6 | 6.8 | 7.0 | 7.7 | 8.2 | 8.9 | 9.2 | 9.5 | 9.9 |
P75 = 6.1, since 40 x 75 % = 30 and 6.1 is the 30th ranked value.
P45 = 4.0, since 40 x 45 % = 18 and 4.0 is the 18th ranked value.
P62 = 5.2, since 40 x 62 % = 24.8, which is rounded up to 25, and 5.2 is the 25th ranked value.
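The rank rule used in these three calculations can be sketched in Python. This is a minimal illustration, assuming whole-number percentages and that a fractional rank is rounded up, as in the P62 case. The function name percentile_by_rank is made up for this example, and other textbooks and software packages use different percentile conventions.

```python
# Sorted observations from the example above.
data = [
    0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.9, 2.2, 2.2, 2.3,
    2.5, 3.1, 3.2, 3.3, 3.4, 3.8, 3.9, 4.0, 4.1, 4.2,
    4.3, 4.6, 4.7, 5.0, 5.2, 5.5, 5.6, 5.8, 5.9, 6.1,
    6.4, 6.6, 6.8, 7.0, 7.7, 8.2, 8.9, 9.2, 9.5, 9.9,
]


def percentile_by_rank(sorted_values, p):
    """Return the value whose 1-based rank is n * p / 100, rounded up.

    Assumes sorted_values is sorted ascending and p is an integer in 1..99.
    """
    n = len(sorted_values)
    # Integer ceiling of n * p / 100, which avoids floating-point rounding.
    rank = (n * p + 99) // 100
    return sorted_values[rank - 1]


print(percentile_by_rank(data, 75))  # 6.1 (rank 30)
print(percentile_by_rank(data, 45))  # 4.0 (rank 18)
print(percentile_by_rank(data, 62))  # 5.2 (rank 24.8 rounded up to 25)
```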
This set of observations has the following cumulative frequency distribution, in which each class includes its upper limit (for example, 4.0 is counted in the class 3.0 - 4.0):
Measurements | Cumulative Frequency | Relative Cumulative Frequency |
0.0 - 1.0 | 3 | 0.075 |
1.0 - 2.0 | 7 | 0.175 |
2.0 - 3.0 | 11 | 0.275 |
3.0 - 4.0 | 18 | 0.450 |
4.0 - 5.0 | 24 | 0.600 |
5.0 - 6.0 | 29 | 0.725 |
6.0 - 7.0 | 34 | 0.850 |
7.0 - 8.0 | 35 | 0.875 |
8.0 - 9.0 | 37 | 0.925 |
9.0 - 10.0 | 40 | 1.000 |
Totals | 40 | 1.000 |
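The cumulative frequencies in this table can be reproduced from the sorted observations. The sketch below assumes that each class includes its upper limit, which is the convention that matches the counts shown; the variable names are illustrative only.

```python
from bisect import bisect_right

# Sorted observations from the example above.
data = [
    0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.9, 2.2, 2.2, 2.3,
    2.5, 3.1, 3.2, 3.3, 3.4, 3.8, 3.9, 4.0, 4.1, 4.2,
    4.3, 4.6, 4.7, 5.0, 5.2, 5.5, 5.6, 5.8, 5.9, 6.1,
    6.4, 6.6, 6.8, 7.0, 7.7, 8.2, 8.9, 9.2, 9.5, 9.9,
]

# Upper class limits 1.0, 2.0, ..., 10.0 (class width 1.0).
upper_limits = [float(u) for u in range(1, 11)]
n = len(data)

# bisect_right counts the observations that are <= the upper limit.
cumulative = [bisect_right(data, u) for u in upper_limits]
relative = [c / n for c in cumulative]

for u, c, r in zip(upper_limits, cumulative, relative):
    print(f"{u - 1.0:.1f} - {u:.1f} | {c} | {r:.3f}")
```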
The percentiles can also be calculated from the cumulative frequency distribution table, using linear interpolation within a class interval to arrive at estimates:
P75 = 6.0 + 1.0 x ((0.750 - 0.725) / (0.850 - 0.725)) = 6.0 + 0.025 / 0.125 = 6.2
where 6.0 is the upper class limit of the interval 5.0 - 6.0, which has relative cumulative frequency 0.725, and 0.850 is the relative cumulative frequency of the next interval, 6.0 - 7.0, which has class width 1.0.
P45 = 4.0, since the interval 3.0 - 4.0 has a relative cumulative frequency of exactly 0.450, so P45 is the upper class limit itself.
P62 = 5.0 + 1.0 x ((0.620 - 0.600) / (0.725 - 0.600)) = 5.0 + 0.020 / 0.125 = 5.16
where 5.0 is the upper class limit of the interval 4.0 - 5.0, which has relative cumulative frequency 0.600, and 0.725 is the relative cumulative frequency of the next interval, 5.0 - 6.0, which has class width 1.0.
The values of P75 and P62 differ between the two methods of calculation, while the values of P45 for both methods are the same.
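The table-based calculation can be sketched as follows. The sketch works with cumulative counts rather than relative cumulative frequencies, which gives the same result because both numerator and denominator are scaled by the same total; the function name percentile_from_table and the list names are assumptions made for this illustration.

```python
# Class limits and cumulative frequencies from the table above.
lower_limits = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
upper_limits = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cumulative = [3, 7, 11, 18, 24, 29, 34, 35, 37, 40]


def percentile_from_table(p):
    """Estimate the p-th percentile by linear interpolation within a class.

    Assumes the values in each class are uniformly distributed within it.
    """
    n = cumulative[-1]
    target = n * p / 100  # cumulative count that the percentile must reach
    previous = 0          # cumulative count at the lower limit of the class
    for lower, upper, cum in zip(lower_limits, upper_limits, cumulative):
        if cum >= target:
            fraction = (target - previous) / (cum - previous)
            return lower + (upper - lower) * fraction
        previous = cum
    raise ValueError("p must be between 0 and 100")


print(round(percentile_from_table(75), 2))  # 6.2
print(round(percentile_from_table(45), 2))  # 4.0
print(round(percentile_from_table(62), 2))  # 5.16
```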
Deciles are values in a given set of observations that divide the data into 10 equal parts. These values can be denoted by D1, D2, ..., D9, where
10 % of the data falls below D1
20 % of the data falls below D2
:
:
90 % of the data falls below D9
It follows from these definitions that each decile is also a percentile:
D1 = P10 | D4 = P40 | D7 = P70 |
D2 = P20 | D5 = P50 | D8 = P80 |
D3 = P30 | D6 = P60 | D9 = P90 |
Quartiles are values in a given set of observations that divide the data into 4 equal parts. These values can be denoted by Q1, Q2 and Q3, where
25 % of the data falls below Q1
50 % of the data falls below Q2
75 % of the data falls below Q3
Likewise, Q1 = P25, Q2 = P50 (the median), and Q3 = P75.
Deciles and quartiles are calculated in the same manner as percentiles.
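As an illustration, the deciles and quartiles of the 40 observations in the percentile example can be obtained by reusing a percentile calculation. The sketch below assumes the sorted-list rank rule with fractional ranks rounded up; percentile_by_rank is a hypothetical helper name, not a library function.

```python
# Sorted observations from the percentile example.
data = [
    0.7, 0.8, 0.9, 1.1, 1.2, 1.4, 1.9, 2.2, 2.2, 2.3,
    2.5, 3.1, 3.2, 3.3, 3.4, 3.8, 3.9, 4.0, 4.1, 4.2,
    4.3, 4.6, 4.7, 5.0, 5.2, 5.5, 5.6, 5.8, 5.9, 6.1,
    6.4, 6.6, 6.8, 7.0, 7.7, 8.2, 8.9, 9.2, 9.5, 9.9,
]


def percentile_by_rank(sorted_values, p):
    """Return the value whose 1-based rank is n * p / 100, rounded up."""
    n = len(sorted_values)
    rank = (n * p + 99) // 100  # integer ceiling of n * p / 100
    return sorted_values[rank - 1]


# D1..D9 are P10, P20, ..., P90; Q1..Q3 are P25, P50, P75.
deciles = {f"D{k}": percentile_by_rank(data, 10 * k) for k in range(1, 10)}
quartiles = {f"Q{k}": percentile_by_rank(data, 25 * k) for k in range(1, 4)}

print(deciles)
print(quartiles)  # {'Q1': 2.3, 'Q2': 4.2, 'Q3': 6.1}
```

Under this rank rule Q2 is the 20th ranked value, 4.2. Conventions that average the two middle values of an even-sized sample would give 4.25 instead, which is another reason to treat such values as estimates.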