Measures of Central Tendency

Aim: By the end of this chapter, you will be able to calculate the mode, median and mean from different types of raw data.

  1. Here we look at some statistics that summarize certain features of the data: the mode, the median and the mean. We have already encountered the mode and the median. The mean is the measure commonly called the "average." It is the sum of all the data values divided by the total frequency (the number of observations).

  2. For discrete data x, the MEAN = Σ x / n
    where n is the total number of observations.
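
    The formula above can be sketched in Python as follows. This is only an illustration: the function name `mean` and the three data values are assumptions made for the example, not part of the original notes.

```python
def mean(observations):
    """Return the arithmetic mean: the sum of the data divided by n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean is undefined for an empty data set")
    return sum(observations) / n


# Hypothetical data, chosen only to illustrate the formula.
data = [2, 4, 9]
print(mean(data))  # (2 + 4 + 9) / 3 = 5.0
```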

  3. If we are dealing with a set of data x, then x̄ (read "x bar") is used as the symbol for the mean of x. In fact, x̄ represents the sample mean.

  4. The population mean is represented by "μ". Recall that a sample is PART of a population. If our sample size is very large, then the sample mean gives a reasonably good estimate of the population mean.
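
    A small simulation can make this idea concrete. The sketch below assumes a made-up population (the whole numbers 1 to 100) and draws random samples of increasing size; the names `population`, `sample_mean` and `x_bar` are illustrative only. The sample mean of a larger sample will usually, though not always, land closer to μ.

```python
import random

random.seed(0)  # fixed seed so that a run can be repeated

# Hypothetical population: the whole numbers 1 to 100, so mu = 50.5.
population = list(range(1, 101))
mu = sum(population) / len(population)


def sample_mean(size):
    """Draw `size` observations at random (with replacement) and average them."""
    sample = random.choices(population, k=size)
    return sum(sample) / size


for size in (5, 50, 5000):
    x_bar = sample_mean(size)
    print(f"n = {size:5d}   sample mean = {x_bar:6.2f}   population mean = {mu}")
```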

  5. Examples.
    Find the mean, median, and mode of the following data representations:
    1. Discrete data. Mary carries out a survey about the number of horses in 7 farms. Here is her raw data.
      3  12  4  6  8  5  6

    2. A group of students were asked how many books they had read in the previous week. The results are shown in the frequency table below.
      Number of books, x       0   1   2   3   4
      Number of students, f    5  11  20   4   5

    3. The bar chart below shows the results for a group of students who were asked how many movies they had watched in the previous week.
    Solutions
    1. We first need to rearrange the data in increasing order: 3 4 5 6 6 8 12.
      The mode is 6 because it has the highest observed frequency.
      The median is 6, the value in the "middle" of the ordered data.
      The mean = (3+4+5+6+6+8+12) / 7 = 44/7 ≈ 6.29 (3 s.f.)
    2. Number of books, x       0   1   2   3   4   sum
       Number of students, f    5  11  20   4   5    45
       (x)(f)                   0  11  40  12  20    83

      The mode is 2 because it has the highest frequency.
      There are 45 students. Half of 45 is 22.5, so the value (observation) of the 23rd student is the median. To find the 23rd student we add up the frequencies: 5+11=16 and 5+11+20=36. That is, the students up to the 16th each read at most 1 book, and the 17th to 36th students each read 2 books. Thus the 23rd student read 2 books, so the median is 2.
      Mean = Σ(xf) / (Σf) = sum of products / sum of frequencies.
      The sums are in the last column of the table above.
      Mean = 83/45 ≈ 1.84 (3 s.f.)

    3. The mode is the value with the tallest bar, which is 1.
      There are 8+10+6+8 = 32 students. Half of 32 is 16, so the median lies between the 16th and 17th observations. Again we need to add up the frequencies. 8 students watched no movies. The next 10 students, up to the 18th observation, watched 1 movie. Thus both the 16th and 17th observations are 1 movie. The median is 1.
      Mean = Σ(xf) / (Σf) = sum of products / sum of frequencies.
      Mean = [ (0×8)+(1×10)+(2×6)+(3×8) ] / 32 = 46/32
      ≈ 1.44 (3 s.f.)
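
      The frequency-table method used in solutions 2 and 3 can be checked with a short Python sketch. The function name `frequency_table_stats` and its helper `value_at` are hypothetical names chosen for this illustration, and the movie frequencies (8, 10, 6, 8) are the bar heights quoted in solution 3. For an even number of observations the sketch averages the two middle values, which is the usual convention.

```python
def frequency_table_stats(values, freqs):
    """Return (mode, median, mean) for values listed in increasing order."""
    n = sum(freqs)

    # Mode: the value with the highest frequency (the first one if there is a tie).
    mode = values[freqs.index(max(freqs))]

    # Mean: sum of the products x*f divided by the sum of the frequencies.
    mean = sum(x * f for x, f in zip(values, freqs)) / n

    def value_at(position):
        """Value of the observation at a 1-based position, found by running totals."""
        running_total = 0
        for x, f in zip(values, freqs):
            running_total += f
            if position <= running_total:
                return x
        raise ValueError("position is beyond the last observation")

    # Median: the middle observation if n is odd,
    # otherwise the average of the two middle observations.
    if n % 2 == 1:
        median = value_at((n + 1) // 2)
    else:
        median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2

    return mode, median, mean


books = frequency_table_stats([0, 1, 2, 3, 4], [5, 11, 20, 4, 5])
movies = frequency_table_stats([0, 1, 2, 3], [8, 10, 6, 8])
print(books[0], books[1], round(books[2], 2))     # 2 2 1.84
print(movies[0], movies[1], round(movies[2], 2))  # 1 1.0 1.44
```

      The running total inside `value_at` mirrors the "add up the frequencies" step in the written solutions, so the data never has to be expanded into one long list.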

      Use of GDC.
      [STAT] Select 1:EDIT [ENTER]
      {enter your data x into the column under L1 and frequencies into another column under L2}

      When you have finished, press [2nd][MODE] to quit. Press [STAT] again, select CALC, then 1:1-Var Stats, and press [ENTER]. You will then see this on your screen:
      1-Var Stats {blinking cursor}
      Enter [2nd][1] [,] [2nd][2] for "L1,L2" and press [ENTER].

      Recall that the median is Q2.

      Note that the order here is important. You should always enter the list that contains x first, followed by a comma and then your frequency list. The comma separates the two lists, and the routine treats the second list as the frequency of each value in the first.

  6. Summary:
    1. If the total number of observations is n, then the value of the ((n/2)+(1/2))th observation is the median.
    2. If n is 15, then (15/2)+(1/2) = 8, so the value of the 8th observation is the median.
    3. If n is 6, then (6/2)+(1/2) = 3.5, so the median is the value that lies between the 3rd and 4th observations.
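
    The position rule in this summary can be sketched in Python. The function name `median_by_position` is a hypothetical choice, the second data set is made up for the example, and "the value that lies between" two observations is taken here to mean their average, which is the usual convention.

```python
def median_by_position(data):
    """Median found from the (n/2 + 1/2)th position of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")

    position = n / 2 + 1 / 2  # 1-based position; ends in .5 when n is even

    if position == int(position):
        # Whole-number position: the median is that single observation.
        return ordered[int(position) - 1]

    # Position ends in .5: average the observations on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_by_position([3, 12, 4, 6, 8, 5, 6]))  # n = 7, 4th observation: 6
print(median_by_position([1, 2, 3, 4, 5, 6]))      # n = 6, between 3rd and 4th: 3.5
```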
    Advantages and disadvantages
    Mean
      Advantages
        1. Easy to calculate.
        2. It makes use of the complete data.
        3. It can be used in further statistics such as the standard deviation.
      Disadvantages
        1. It can be misleading if the data contains abnormally high or low values.
    Median
      Advantages
        1. It has an easy interpretation.
        2. It is unaffected by very high or low values.
      Disadvantages
        1. It has little use in other statistics.
        2. In grouped data, it has to be estimated from a cumulative frequency curve.
        3. It is not a good summary of the data if the sample size is small or the distribution pattern is odd.
    Mode
      Advantages
        1. It has an easy interpretation.
        2. It is unaffected by very high or low values.
        3. It is a useful statistic for manufacturers and sellers of goods: it tells them the best-selling item(s).
      Disadvantages
        1. It has little use in other statistics.
        2. In grouped data, the mode cannot be determined with accuracy (refer to the examples above).
        3. It is not a good summary of the data if the sample size is small or the distribution pattern is odd.
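
    The effect of an abnormally high value on each measure can be seen in a short sketch using Python's standard `statistics` module. The first list is Mary's horse data from the examples; the second list is a hypothetical variation in which the 12 is replaced by 120.

```python
from statistics import mean, median, mode

horses = [3, 12, 4, 6, 8, 5, 6]         # Mary's survey data from the examples
with_outlier = [3, 120, 4, 6, 8, 5, 6]  # hypothetical: 12 replaced by 120

for label, data in (("original", horses), ("with outlier", with_outlier)):
    print(f"{label:13s} mean = {mean(data):6.2f}   "
          f"median = {median(data)}   mode = {mode(data)}")
```

    The mean moves from about 6.29 to about 21.71, while the median and the mode both stay at 6.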

Exercises.