Grouped Data

Aims: By the end of this chapter, you will be able to
(i) represent grouped data using a frequency table,
(ii) represent grouped discrete data using a bar chart,
(iii) represent grouped continuous data using a histogram, and
(iv) define a modal group and a frequency polygon.

Grouped Data

  1. Grouping means putting the data into a number of classes.
  2. The number of items of data falling into any class is called the frequency of the class.

Grouped discrete data

Let us look at this through an example. Here is the tally of scores for 23 students who had just finished a test of 7 true-false questions.
Score 0 1 2 3 4 5 6 7
Tally   // // / //// / ////
We could also group the above data into predefined classes.
Group/Class   Tally         Frequency, f
0-1           //            2
2-3           ///           3
4-5           ///// ///     8
6-7           ///// /////   10
Note that the frequencies should sum to 23 because we have 23 students. In notation, this is Σ f = 23, where "Σ" means summation.

We can represent these grouped data using a bar chart. The bars can be drawn touching each other for aesthetic appeal.
The group with the highest frequency is called the modal group. In our case, the modal group is 6-7.
A distribution can have more than one modal group.
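As a quick check of the ideas above, here is a minimal Python sketch that stores the grouped frequency table, confirms that Σ f = 23, and finds the modal group. The function names total_frequency and modal_groups are illustrative choices, not part of the original notes. The second function returns a list so that a distribution with more than one modal group is handled too.

```python
# Grouped frequency table from the example above: class label -> frequency.
frequencies = {"0-1": 2, "2-3": 3, "4-5": 8, "6-7": 10}


def total_frequency(table):
    """Return the sum of all class frequencies (sigma f)."""
    return sum(table.values())


def modal_groups(table):
    """Return every class whose frequency equals the highest frequency."""
    highest = max(table.values())
    return [label for label, f in table.items() if f == highest]


print(total_frequency(frequencies))  # 23
print(modal_groups(frequencies))     # ['6-7']
```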
We could also group categorical data in a similar fashion. For example, 22 travellers are asked which country they last visited before arriving in Hong Kong. Since it is not convenient to deal with more than 150 countries, we could group the data by continent: Asia, Europe, Africa, America, and Oceania.
Continent Asia Europe Africa America Oceania
Frequency, f   8 5 4 3 2

These data could be represented as a bar chart. Note that we have two representations here, and both of them are correct. The emphasis, however, is that the presentation should be easy to follow and not misleading. The one on the left is probably more pleasing than the one on the right.
Categorical data can also be represented efficiently using pie charts.
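To draw a pie chart by hand, each category needs a sector angle proportional to its frequency, that is, (f / Σ f) × 360 degrees. The sketch below works this out for the travellers' data; the function name sector_angles is an illustrative choice, not something from the original notes.

```python
# Continent frequencies from the travellers' table above.
continents = {"Asia": 8, "Europe": 5, "Africa": 4, "America": 3, "Oceania": 2}


def sector_angles(table):
    """Return the pie-chart sector angle in degrees for each category."""
    total = sum(table.values())
    return {label: 360 * f / total for label, f in table.items()}


angles = sector_angles(continents)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles should add up to 360 degrees, which is a useful check before drawing.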

Does it make sense to talk about the shape of the distribution for grouped categorical data? [hint: look at the bar charts above]


Stem and Leaf Diagram

The weights (in kg) of 30 students are as follows:
45   45  50  60  62  63   47  69  51  61
58   59  61  63  69  63   75  78  70  51
50   79  71  52  59  55   79  88  75  55
One way to represent these data that gives some idea of the distribution without losing the individual values (data) is to construct a stem and leaf diagram, as below.

Stem | Leaf
4 | 5 5 7
5 | 0 0 1 1 2 5 5 8 9 9
6 | 0 1 1 2 3 3 3 9 9
7 | 0 1 5 5 8 9 9
8 | 8

Key: 4 | 5 = 45 kg.

You should always provide a key to your stem and leaf diagram. This is an ordered stem and leaf diagram because the values are in ascending order.
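The construction above can be sketched in Python. This is only an illustration, and it assumes whole-number data where the stem is the tens digit and the leaf is the units digit; the function name stem_and_leaf is a hypothetical choice.

```python
from collections import defaultdict

# Weights (kg) of the 30 students listed above.
weights = [45, 45, 50, 60, 62, 63, 47, 69, 51, 61,
           58, 59, 61, 63, 69, 63, 75, 78, 70, 51,
           50, 79, 71, 52, 59, 55, 79, 88, 75, 55]


def stem_and_leaf(values):
    """Map each stem (tens) to its leaves (units) in ascending order."""
    diagram = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        diagram[stem].append(leaf)
    return dict(diagram)


diagram = stem_and_leaf(weights)
# Print every stem from the smallest to the largest, even if it has no leaves.
for stem in range(min(diagram), max(diagram) + 1):
    leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem} | {leaves}")
print("Key: 4 | 5 = 45 kg")
```

Sorting the data first is what makes the result an ordered stem and leaf diagram.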

A stem and leaf diagram can be used for a simple comparison of two sets of data.
The heights (in cm) of a group of male and female students are:
Male: 168   170  171   175  180  163  188   176  176
Female: 180  155  160  165  168 165  159  157  158  155  154


     Female | Stem | Male
9 8 7 5 5 4 | 15   |
    8 5 5 0 | 16   | 3 8
            | 17   | 0 1 5 6 6
          0 | 18   | 0 8

Key: 0 | 16 | 3 means 160 cm for a female student and 163 cm for a male student.

This simple comparison allows us to see that most female students have a height less than 170 cm and most male students have a height at or above 170 cm. Also note that the leaves of a stem are left blank when there are no data in that group; for instance, no male student has a height between 150 cm and 159 cm.
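A back-to-back diagram like the one above can also be sketched in Python. The helper names leaves_by_stem and back_to_back are hypothetical, and the sketch assumes whole-number heights with the stem taken as the value divided by 10. The left-hand leaves are reversed so that they read outwards from the stem.

```python
# Heights (cm) from the example above.
male = [168, 170, 171, 175, 180, 163, 188, 176, 176]
female = [180, 155, 160, 165, 168, 165, 159, 157, 158, 155, 154]


def leaves_by_stem(values):
    """Map each stem (value // 10) to its leaves in ascending order."""
    table = {}
    for value in sorted(values):
        table.setdefault(value // 10, []).append(value % 10)
    return table


def back_to_back(left, right):
    """Return one (left leaves, stem, right leaves) row per stem."""
    left_table, right_table = leaves_by_stem(left), leaves_by_stem(right)
    lowest = min(min(left_table), min(right_table))
    highest = max(max(left_table), max(right_table))
    rows = []
    for stem in range(lowest, highest + 1):
        # Reverse the left leaves so the smallest leaf sits next to the stem.
        left_leaves = left_table.get(stem, [])[::-1]
        right_leaves = right_table.get(stem, [])
        rows.append((left_leaves, stem, right_leaves))
    return rows


rows = back_to_back(female, male)
for left_leaves, stem, right_leaves in rows:
    left_text = " ".join(map(str, left_leaves))
    right_text = " ".join(map(str, right_leaves))
    print(f"{left_text:>11} | {stem} | {right_text}")
```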

Exercise.
1. Form a stem and leaf diagram for the percentage scores of a group of fifteen students in a Math Studies test.
89   88  90  91  78  70  71  78  74  75  74  84  85  69  70

2. Form a stem and leaf diagram for the reading speeds in words per minute of two groups of students.
Group A: 23   25  45  44  50  23  20  28  29  31
Group B: 33   40  55  42  40  43  36  30  39  40

Remember to provide a key for each stem and leaf diagram.


Grouped Continuous Data.

  1. Continuous data cannot be measured exactly; values can only be given to within a specific degree of accuracy. Examples:
    1. Tom's height (h) is 178 cm (given to the nearest cm). Tom's actual height could be any value in the interval
      177.5 ≤ h < 178.5 or "h is in the interval [177.5,178.5)."
    2. The temperature of a cup of coffee (T) is 41.8 °C (measured to the nearest tenth of a degree).
      In reality, 41.75 °C ≤ T < 41.85 °C.
    3. The depth of an ocean (d) is given as 9200 m, measured to the nearest 100 m.
      In reality, 9150 m ≤ d < 9250 m.
    4. The waiting time (t) is given as 11 minutes, measured to the nearest minute.
      In reality, 10.5 minutes ≤ t < 11.5 minutes.
  2. Let us look at this through an example. We will use the concepts of interval width and class boundary in this example. Let us reuse the weight-gain example from above.
    Here is the gain in weight w (in kg) of 18 students after being fed a certain diet.
    0.1   0.2  0.7  0.5  0.8  1.3
    1.2  1.5  0.8  1.4  1.3  1.3
    1.5  1.6  1.5  1.8  1.9  2.0
  3. Let us first organize this into a frequency distribution table. We are dealing with continuous data so a recorded value of w = 1.5 kg actually means 1.45 ≤ w < 1.55. Thus, we need to group these observations into some sensible class intervals.
    class (kg) [0.05,0.55) [0.55,1.05) [1.05,1.55) [1.55,2.05) [2.05,2.55)
    Frequency, f   2 4 5 6 1
    The class boundaries are 0.05, 0.55, 1.05, 1.55, 2.05, and 2.55.
    All class widths in this case are 0.5 kg.
  4. We can represent the frequency distribution above using a special bar chart called a histogram.
  5. Note that the boundaries of these bars ARE the above class boundaries.
  6. All classes have the same class width.
  7. There are NO gaps between the rectangles because these data are continuous.
  8. The modal group/class is the class with the highest frequency, that is, the tallest bar. In this case, the modal class is [1.55,2.05).

  9. A frequency polygon is formed by joining the tops of the bars at the mid-interval values, together with the extrema, in a bar chart, frequency histogram or frequency density histogram. The mid-interval values are found by averaging the class boundaries. For example, the mid-interval value of the class [0.05,0.55) is (0.05+0.55)/2 = 0.30. The red lines superimposed on the frequency histogram form the frequency polygon. As you can see, a frequency polygon conveys the same information as a histogram. The difference is in shape.

    Use of GDC.
    [STAT] Select 1:EDIT [ENTER]
    {Enter the mid-interval value of each class into the column under L1 and the frequencies into the column under L2}

    Once finished, press [2nd][MODE] to quit. Before you plot your histogram, make sure there is no function in [Y=] because it will interfere with your plot later.
    [2nd][Y=] select 1:Plot1 and activate the plot by selecting ON [ENTER]. Then move your cursor to Type, select the icon with bars and press [ENTER]. Make sure that Xlist and Freq (frequency) correspond to your respective lists. If not, make the appropriate changes.

    Before you plot, make sure that you set your [WINDOW] to match your data. In the example above, we could have Xmin=-1, Xmax=3, Xscl=0.5, Ymin=-0.5, Ymax=7, Yscl=1, Xres=1.
    This setting will leave some space to the left and right of the histogram.
    Press [GRAPH] to plot the histogram. Press [TRACE] and the appropriate arrow buttons to read these bars.

    The GDC histogram can help you to draw an accurate histogram for grouped data or to sketch a quick histogram as part of your working.
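The calculations in this section can also be checked with a short Python sketch. It starts from the class boundaries and frequencies in the table above (not from the raw data) and computes the interval implied by a rounded measurement, the mid-interval values that go under L1, and the modal class. The names measurement_interval, mid_values and modal_class are illustrative choices. The decimal module is used here so that values such as 0.05 and 0.55 are handled exactly rather than as binary floating-point approximations.

```python
from decimal import Decimal


def measurement_interval(recorded, precision):
    """Return (lower, upper) with lower <= true value < upper.

    Both arguments are strings, e.g. "1.5" recorded to the nearest "0.1".
    """
    half = Decimal(precision) / 2
    return Decimal(recorded) - half, Decimal(recorded) + half


# Class boundaries and frequencies from the frequency table above.
boundaries = [Decimal(b) for b in ("0.05", "0.55", "1.05", "1.55", "2.05", "2.55")]
frequencies = [2, 4, 5, 6, 1]

# Each class [lower, upper) is a pair of consecutive boundaries.
classes = list(zip(boundaries, boundaries[1:]))

# Mid-interval value of a class = average of its two boundaries.
mid_values = [(lower + upper) / 2 for lower, upper in classes]

# Modal class: the class with the highest frequency
# (index() picks the first one if there is a tie).
modal_class = classes[frequencies.index(max(frequencies))]

print(measurement_interval("1.5", "0.1"))
for (lower, upper), mid, f in zip(classes, mid_values, frequencies):
    print(f"[{lower},{upper})  mid-interval value = {mid}  f = {f}")
print("Modal class:", modal_class)
```

The pairs (mid-interval value, frequency) printed here are also the vertices that a frequency polygon joins.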

Exploration

  1. Here are the recorded times (in seconds) for a group of students to complete a 100 metre sprint.
    100m time (t) in seconds [8,9.5) [9.5,11) [11,12.5) [12.5,14) [14,15.5)
    Frequency, f   1 3 9 10 9
    1. Draw a frequency histogram with the above information.
    2. How many students took part in the 100 metre sprint?
    3. What is the modal class?
    4. Confirm your histogram with your GDC.