Cumulative Frequency

Constructing the cumulative frequency table.

Cumulative frequency, as the name suggests, is the accumulated total of the frequencies up to a particular point. Let us look at an example with discrete data.

Students in a music class were asked how many musical instruments each could play. The results are:
3 4 5 3 3
1 1 2 2 1
3 3 4 4 5
3 1 2 3 5
1 1 6 1 7
| Class | Frequency | Cumulative frequency |
|---|---|---|
| 0-1 | 7 | 7 |
| 2-3 | 10 | 17 |
| 4-5 | 6 | 23 |
| 6-7 | 2 | 25 |

The cumulative frequency of the last class equals the total number of observations (25 here). The cumulative frequency for class 2-3, for example, is the total frequency accumulated up to 3 instruments: 7 + 10 = 17 (the frequency for 0-1 plus the frequency for 2-3).
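
The table above can be reproduced with a short Python sketch. The variable names (`instruments`, `classes`, and so on) are chosen here for illustration only; the data and the class limits are the ones given in the example.

```python
from itertools import accumulate

# Number of instruments each of the 25 students can play (data from the example).
instruments = [
    3, 4, 5, 3, 3,
    1, 1, 2, 2, 1,
    3, 3, 4, 4, 5,
    3, 1, 2, 3, 5,
    1, 1, 6, 1, 7,
]

# Inclusive class limits, as in the table: 0-1, 2-3, 4-5, 6-7.
classes = [(0, 1), (2, 3), (4, 5), (6, 7)]

# Frequency: how many observations fall inside each class.
frequencies = [sum(low <= x <= high for x in instruments) for low, high in classes]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for (low, high), f, cf in zip(classes, frequencies, cumulative):
    print(f"{low}-{high}: frequency {f}, cumulative frequency {cf}")
```

The last entry of `cumulative` equals `len(instruments)`, which is a quick check that no observation was missed or counted twice.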

Let us now look at an example with continuous data. A group of students took part in an experiment to measure their reaction time in seconds. The results are shown below; the class [0,2[ contains times from 0 up to, but not including, 2 seconds.

| Class | Frequency | Cumulative frequency |
|---|---|---|
| [0,2[ | 1 | 1 |
| [2,4[ | 2 | 3 |
| [4,6[ | 4 | 7 |
| [6,8[ | 4 | 11 |
| [8,10[ | 1 | 12 |

These cumulative frequencies can be used to produce a cumulative frequency curve, or ogive.

  1. We start the curve at the lower class boundary of the first class. In our case this is 0, and the cumulative frequency at this point is set to 0, giving the point (0,0).
  2. The curve ends at the upper class boundary of the last class. In our case this is 10.
  3. After the starting point, the next point is (2,1); then we plot (4,3), (6,7), (8,11) and (10,12).
  4. Thus, each point is (upper class boundary, cumulative frequency of that class); the only extra point is the starting point (lower boundary of the first class, 0).
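
The plotting steps above can be sketched in Python as follows. The function name `ogive_points` is hypothetical, introduced only for this illustration; it pairs each class boundary with the cumulative frequency reached at that boundary, starting from 0.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Return the (boundary, cumulative frequency) points of an ogive.

    boundaries  -- class boundaries in increasing order (one more than the
                   number of classes), e.g. [0, 2, 4] for [0,2[ and [2,4[
    frequencies -- frequency of each class, in the same order
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than classes")
    # The curve starts at cumulative frequency 0 at the lowest boundary,
    # then rises to the running total at each upper class boundary.
    cumulative = [0] + list(accumulate(frequencies))
    return list(zip(boundaries, cumulative))

# Reaction-time data from the table above.
boundaries = [0, 2, 4, 6, 8, 10]
frequencies = [1, 2, 4, 4, 1]

points = ogive_points(boundaries, frequencies)
print(points)  # [(0, 0), (2, 1), (4, 3), (6, 7), (8, 11), (10, 12)]
```

These are the points to plot and join; the same function can be used to check a hand-completed cumulative frequency table.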

Exploration

Complete the cumulative frequency table below:
| Class | Frequency | Cumulative frequency |
|---|---|---|
| [3,5[ | 2 | 2 |
| [5,7[ | 3 | |
| [7,9[ | 5 | |
| [9,11[ | 4 | |
| [11,13[ | 1 | |

Draw a cumulative frequency curve using the above information. Hint: the first point in your curve is (3,0) and the end point is (13,15).

Percentiles

  1. The 60th percentile is a value such that 60% of the distribution lies below it.
    For example, if your score on the SAT is at the 95th percentile, then your score is better than that of 95% of all students who took the same test. This does not necessarily mean that you scored 95% on the test.
  2. The 25th percentile is also called the lower or first quartile (Q1). The 50th and 75th percentiles are the second quartile (Q2) and third quartile (Q3), respectively. The third quartile is also known as the upper quartile.
  3. The second quartile is also known as the median.
  4. The difference between the third and first quartiles is known as the interquartile range. The interquartile range is a measure of the spread of the data. By the definition of percentile above, half of the observations lie between Q1 and Q3.
  5. Example 1. Below are the monthly rents for houses of a particular size in town X.
    5600 6800 4500 5900 7000 4900 5200 6100 9000
    1. Let us order these data: 4500 4900 5200 5600 5900 6100 6800 7000 9000
    2. The median is in the "middle" of the ordered data: the 5th of the 9 values, which is 5900.
    3. The first quartile is the median of the four values below the median: (4900+5200)/2 = 5050.
    4. The third quartile is the median of the four values above the median: (6800+7000)/2 = 6900.
    5. Interquartile range = 6900-5050 = 1850.
    6. Note that an extreme minimum such as 200 or an extreme maximum such as 9000000 would not affect the median or the interquartile range for most data sets like the one above.

  6. Example 2: 30 30 50 75 85 90 110 150 190 205 255
  7. This information about the minimum, maximum and quartiles is commonly represented as a "box-and-whisker plot". A box-and-whisker plot draws a box from Q1 to Q3 with a line at the median, and whiskers extending from the box to the minimum and maximum values.
    A box-and-whisker plot helps us to visualize where the middle half of the data (between Q1 and Q3) is located.
  8. Use of GDC.
    [STAT] Select 1:EDIT [ENTER]
    {Enter your data into the column under L1. For illustration I will use data from example 2 above.}

    Once finished, press [2nd][MODE] to quit. Press [STAT] again and select CALC 1:1-Var Stats [ENTER]. You will then see this on your screen:
    1-Var Stats {blinking cursor}
    Enter [2nd][1] for L1 and press [ENTER]

    The mean here is 115 (3 s.f.). The second screen on the right helps us to produce a box-and-whisker diagram. Alternatively, you can set your GDC to the screen on the left below. Set the window to Xmin=15, Xmax=260, Xscl=10 to match your data set. The Y values in the window will not affect your box-and-whisker plot. Press [ENTER] to obtain the screen on the right. Press [TRACE] and use the arrow keys to obtain the values of Q1, Q2 and so on.

    From the plot, we see that the median is not in the middle of the distribution (halfway between the minimum and maximum values) but nearer to the lower end of the distribution.

  9. So far, we have been dealing with raw, ungrouped data. We can also obtain the quartiles of grouped data by using a cumulative frequency curve. Let us look at the cumulative frequency curve for the reaction times above. The 50th percentile corresponds to 50% of the total cumulative frequency. Since 100% is 12, 50% of the cumulative frequency is 6. Similarly, the first and third quartiles correspond to 25% and 75% of the total cumulative frequency, that is, 3 and 9. For each of these cumulative frequencies, the value of the first quartile, median or third quartile is read from the horizontal axis. Reading from the curve, the first quartile is 4 seconds, the median is about 5.65 seconds, and the third quartile is about 6.9 seconds.
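
Both ways of finding quartiles described in this section can be sketched in Python. The function names (`median`, `quartiles`, `value_at`) are hypothetical, chosen for this illustration. For raw data the sketch uses the convention of Example 1 (the median is left out of both halves when the number of values is odd); other textbooks and software use slightly different conventions. For grouped data it assumes the ogive points are joined by straight lines, which is an assumption of this sketch, so readings from a smooth hand-drawn curve may differ slightly.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) for raw data, excluding the median from both halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]  # skip the median itself when n is odd
    return median(lower), median(ordered), median(upper)

def value_at(points, target):
    """Read the horizontal value at cumulative frequency `target` from an ogive,
    assuming straight-line segments between consecutive points."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target is outside the range of the curve")

# Example 1: monthly rents.
rents = [5600, 6800, 4500, 5900, 7000, 4900, 5200, 6100, 9000]
q1, q2, q3 = quartiles(rents)
print(q1, q2, q3, q3 - q1)  # 5050.0 5900 6900.0 1850.0

# Grouped reaction times: ogive points (class boundary, cumulative frequency).
ogive = [(0, 0), (2, 1), (4, 3), (6, 7), (8, 11), (10, 12)]
total = ogive[-1][1]
for label, fraction in [("Q1", 0.25), ("median", 0.50), ("Q3", 0.75)]:
    print(label, value_at(ogive, fraction * total))
```

With straight-line segments the grouped-data sketch gives 4.0, 5.5 and 7.0 seconds, close to the values read from the curve above; small differences of this kind are expected when reading from a drawn curve.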

Exercise.