Measures of Dispersion

AIMS:

  1. Be exposed to different measures of dispersion.
  2. Understand the concepts of variance and standard deviation.
  3. Be able to calculate variance and standard deviation.

  1. The mean is a good summary of our data, but it does not tell us how the data are grouped around the mean. Here we will learn how to measure dispersion, that is, how spread out or concentrated the distribution of the data is.
  2. Let us look at an example.
    Height (cm)   Team A   Team B   Team C
    190                2        5        0
    180                2        0        4
    170                2        0        2
    160                2        0        4
    150                2        5        0
    (Each entry is the number of team members with that height; each team has 10 members.)
    Each team has a mean height of 170 cm, but we will agree that these teams have very different distributions of height.
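To check that all three teams really share the same mean, here is a minimal Python sketch that computes the mean of a frequency table as Σfx/Σf. The names `heights`, `teams` and `grouped_mean` are illustrative only; the frequencies are copied from the table above.

```python
# Frequency table from item 2: heights (cm) and how many team members
# have each height.
heights = [190, 180, 170, 160, 150]
teams = {
    "A": [2, 2, 2, 2, 2],
    "B": [5, 0, 0, 0, 5],
    "C": [0, 4, 2, 4, 0],
}


def grouped_mean(values, freqs):
    """Mean of a frequency table: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


for name, freqs in teams.items():
    print(f"Team {name}: mean = {grouped_mean(heights, freqs)} cm")
# Team A: mean = 170.0 cm
# Team B: mean = 170.0 cm
# Team C: mean = 170.0 cm
```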

  3. One simple way to take the "concentration of data" into account is to use the range, defined as the difference between the greatest and the smallest observations in the distribution. In the example above, Team A and Team B both have a range of 190 - 150 = 40 cm, although we can see that their distributions are different.
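The range can be read off the same frequency table. This sketch uses a hypothetical helper, `data_range`, that only counts heights whose frequency is nonzero, since a height with frequency 0 is not part of the data.

```python
heights = [190, 180, 170, 160, 150]
teams = {
    "A": [2, 2, 2, 2, 2],
    "B": [5, 0, 0, 0, 5],
    "C": [0, 4, 2, 4, 0],
}


def data_range(values, freqs):
    """Greatest minus smallest observation, ignoring values with frequency 0."""
    present = [x for x, f in zip(values, freqs) if f > 0]
    return max(present) - min(present)


for name, freqs in teams.items():
    print(f"Team {name}: range = {data_range(heights, freqs)} cm")
# Team A: range = 40 cm
# Team B: range = 40 cm
# Team C: range = 20 cm
```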

  4. Another method is to consider the interquartile range.
    Team A: first quartile = 160 cm, third quartile = 180 cm, so the interquartile range is 20 cm.
    Team B: first quartile = 150 cm, third quartile = 190 cm, so the interquartile range is 40 cm.
    Team C: first quartile = 160 cm, third quartile = 180 cm, so the interquartile range is 20 cm.
    These interquartile ranges describe the data reasonably well, but quartiles can be difficult to estimate. Estimating them from a cumulative frequency curve requires us first to construct the curve, and the resulting estimate is not very accurate. A further problem is that the interquartile range ignores the 50% of the data lying outside the middle half. This explains why Team A and Team C have the same "dispersion measure" even though we know that their distributions are different.
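A sketch of the quartile calculation, assuming Python's `statistics.quantiles` with `method="inclusive"` as the quartile convention. This is not the cumulative-frequency-curve method described above, and other conventions (for example `method="exclusive"`, the default) can give slightly different quartiles; for these three teams the inclusive convention happens to reproduce the values quoted above. The helper `expand` is a hypothetical name.

```python
import statistics

heights = [190, 180, 170, 160, 150]
teams = {
    "A": [2, 2, 2, 2, 2],
    "B": [5, 0, 0, 0, 5],
    "C": [0, 4, 2, 4, 0],
}


def expand(values, freqs):
    """Turn a frequency table back into the full list of observations."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]


for name, freqs in teams.items():
    data = expand(heights, freqs)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    print(f"Team {name}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
# Team A: Q1 = 160.0, Q3 = 180.0, IQR = 20.0
# Team B: Q1 = 150.0, Q3 = 190.0, IQR = 40.0
# Team C: Q1 = 160.0, Q3 = 180.0, IQR = 20.0
```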

  5. A standard way to take all the data into account is to use the standard deviation as a measure of dispersion. Aims:
    • You should understand this concept.
    • Be able to calculate the standard deviation and variance by hand.
    • Be able to use your calculator to obtain both the standard deviation and the variance.

  6. We will learn the relevant concepts using an example. Eight hamsters were fed a certain diet, and at the end of the month they were weighed. Their weight changes, in grammes, are recorded below; a negative sign represents a weight loss.
    -1, 5, 20, 15, 17, 11, 9, 36
    We shall first put the data in order:
    -1, 5, 9, 11, 15, 17, 20, 36.
    The mean is easily calculated: (-1 + 5 + 9 + 11 + 15 + 17 + 20 + 36)/8 = 112/8 = 14.

  7. One measure of dispersion is called the mean deviation. As its name suggests, the mean deviation is the mean of the deviations of the observations from the mean. Let $d = x - \bar{x}$ be the deviation of an observation $x$ from the mean $\bar{x}$, and let $N$ be the total number of observations. Then
    $$\text{Mean deviation} = \frac{\sum d}{N} = \frac{d_1 + d_2 + d_3 + \cdots + d_N}{N},$$
    where $d_1$ is the deviation of the first data point from the mean and $d_N$ is the deviation of the last data point from the mean.
    Weight change, x               -1    5    9   11   15   17   20   36
    Deviation from mean, d        -15   -9   -5   -3    1    3    6   22   Σd = 0
    $$\text{Mean deviation} = \frac{\sum d}{N} = \frac{0}{8} = 0$$
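The ordering, the mean and the deviations in the table can be reproduced with a few lines of Python. This is a minimal sketch; the variable names are illustrative.

```python
weights = [-1, 5, 20, 15, 17, 11, 9, 36]   # weight changes in grammes

data = sorted(weights)                      # [-1, 5, 9, 11, 15, 17, 20, 36]
N = len(data)
mean = sum(data) / N                        # 112 / 8 = 14.0

deviations = [x - mean for x in data]       # d = x - mean for each observation
mean_deviation = sum(deviations) / N

print("deviations:", deviations)
print("sum of deviations:", sum(deviations))   # 0.0
print("mean deviation:", mean_deviation)       # 0.0
```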

  8. Here is a plot of the above deviations from the mean.

  9. The mean deviation above is 0. In fact, the mean deviation defined this way is always 0, for every set of data, because the positive deviations of the observations above the mean exactly cancel the negative deviations of those below it: $\sum d = \sum (x - \bar{x}) = \sum x - N\bar{x} = 0$, since $\bar{x} = \frac{\sum x}{N}$.

  10. The conclusion here is clear: the mean of the deviations is not a useful measure of dispersion, because it is 0 whatever the data look like.
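Whatever the data, the signed deviations from the mean cancel exactly. This sketch checks that on the hamster data and on two made-up data sets, using `fractions.Fraction` so that the arithmetic is exact and no rounding error can hide a nonzero result. The function name `mean_deviation` is illustrative.

```python
from fractions import Fraction


def mean_deviation(data):
    """Mean of the signed deviations (x - mean), computed exactly."""
    values = [Fraction(x) for x in data]
    N = len(values)
    mean = sum(values) / N
    return sum(x - mean for x in values) / N


examples = [
    [-1, 5, 20, 15, 17, 11, 9, 36],   # hamster data
    [1, 1, 1, 1, 100],                # made-up, very lopsided data
    [3, 7, 7, 19, 0.5],               # made-up, includes a decimal
]
for data in examples:
    print(data, "->", mean_deviation(data))   # prints 0 for every data set
```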

  11. The problem with the mean deviation is that the observations below the mean have negative deviations, which cancel the positive deviations of the observations above it (see the above diagram). To avoid this cancellation, we could square the deviations, since a squared deviation is never negative.

    The mean of these squared deviations is a new measure of dispersion called the variance.
    The square root of the variance is called the standard deviation.
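    $$\text{Variance, } s^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N}$$
    $$\text{Standard deviation, } s = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N}}$$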
    Note: The sample variance is usually denoted by s² and the population variance by σ²; the sample variance divides by N - 1 instead of N. In this course we will usually deal only with populations, so we divide by N. Thus, when using a calculator, always report the population variance and standard deviation for this course.
    If you are using a TI calculator, for example,
    • the standard deviation for IB purposes is σx, and
    • the variance is σx².

    If you are using a CASIO calculator,
    • the standard deviation for IB purposes is σn, and
    • the variance is σn².
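Python's standard `statistics` module makes the same population/sample distinction as the calculators: `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by N - 1. The sketch below applies both to the hamster data. The link to calculator keys is only an analogy (σx on a TI for the population version, Sx for the sample version), and exact labels vary between calculator models.

```python
import statistics

weights = [-1, 5, 20, 15, 17, 11, 9, 36]   # hamster data from item 6

# Population versions: divide by N (what this course asks you to report).
pop_var = statistics.pvariance(weights)
pop_sd = statistics.pstdev(weights)

# Sample versions: divide by N - 1 (e.g. Sx on a TI); not used in this course.
samp_var = statistics.variance(weights)
samp_sd = statistics.stdev(weights)

print("population:", pop_var, round(pop_sd, 2))       # population: 108.75 10.43
print("sample:    ", round(samp_var, 2), round(samp_sd, 2))
```

Because the sample versions divide by a smaller number, they always come out slightly larger, which is why it matters which key you read.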

  12. Thus the variance is the mean of the squared deviations from the mean. The standard deviation can informally be thought of as a typical distance of the observations from the mean.
  13. Let us calculate the variance and standard deviation for the hamster example above.
    Weight change, x               -1    5    9   11   15   17   20   36
    Deviation from mean, d        -15   -9   -5   -3    1    3    6   22   Σd = 0
    Squared deviation, d²         225   81   25    9    1    9   36  484   Σd² = 870
    $$\text{Variance} = \frac{\sum (x - \bar{x})^2}{N} = \frac{\sum d^2}{N} = \frac{870}{8} = 108.75$$
    $$\text{Standard deviation} = \sqrt{\frac{870}{8}} \approx 10.4 \text{ (3 s.f.)}$$
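The table above can be reproduced step by step in a short Python sketch that follows the hand method (mean, deviations, squared deviations, then divide by N). Formatting with `:.3g` rounds to 3 significant figures.

```python
import math

x = [-1, 5, 9, 11, 15, 17, 20, 36]         # ordered weight changes
N = len(x)
mean = sum(x) / N                          # 14.0

d = [xi - mean for xi in x]                # deviations from the mean
d2 = [di ** 2 for di in d]                 # squared deviations

variance = sum(d2) / N                     # sum of d^2 divided by N
sd = math.sqrt(variance)

print("d   :", d)
print("d^2 :", d2)
print("sum d^2 =", sum(d2))                # 870.0
print("variance =", variance)              # 108.75
print(f"standard deviation = {sd:.3g}")    # 10.4 (3 s.f.)
```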

  14. Exploration 1.
    Calculate the variance and standard deviation for the following test scores in a Math class.
    86, 73, 74, 66, 60, 72, 62, 75, 81.
    You may want to start by constructing a table like the one above. First calculate the mean, then the deviation of each score from the mean, and then the square of each deviation. Finally, divide the sum of the squared deviations by N.

    variance ≈ 63.4 (3 s.f.) & standard deviation ≈ 7.96 (3 s.f.)
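Once you have worked Exploration 1 by hand, you can check your answers with this sketch, which applies the same steps to the test scores. It is only a check; the names are illustrative.

```python
import math

scores = [86, 73, 74, 66, 60, 72, 62, 75, 81]
N = len(scores)
mean = sum(scores) / N

squared_devs = [(x - mean) ** 2 for x in scores]   # (x - mean)^2 for each score
variance = sum(squared_devs) / N
sd = math.sqrt(variance)

print(f"mean = {mean:.4f}")                           # mean = 72.1111
print(f"variance = {variance:.3g}, sd = {sd:.3g}")    # variance = 63.4, sd = 7.96
```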


  15. Let us now look at another example. The table below shows the number of matches found in each of a sample of 60 boxes of matches.
    Number of matches, x       47   48   49   50   51   52
    Frequency, f                3    6   11   19   12    9
    1. We can easily calculate the mean by using
      $$\bar{x} = \frac{\sum fx}{\sum f}$$
    2. $$\bar{x} = \frac{3(47) + 6(48) + 11(49) + 19(50) + 12(51) + 9(52)}{60} = \frac{2998}{60} \approx 49.9667$$
    3. x                              47        48        49        50        51        52
       f                               3         6        11        19        12         9   Σf = 60
       deviation, d             -2.96667  -1.96667  -0.96667   0.03333   1.03333   2.03333
       squared deviation, d²     8.80113   3.86779   0.93445   0.00111   1.06777   4.13443
       f(d²)                    26.40339  23.20674  10.27895   0.02109  12.81324  37.20987   Σf(d²) = 109.93328
       Variance = 109.93328/60
       Variance ≈ 1.83 (3 s.f.)
       Standard deviation ≈ 1.35 (3 s.f.)
       To obtain answers accurate to 3 significant figures, we have kept five decimal places throughout this calculation.
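Keeping five decimal places is one way to control rounding error. Another is to do the arithmetic exactly with Python's `fractions.Fraction`, as in this sketch; it confirms that the rounded answers above are correct to 3 significant figures.

```python
import math
from fractions import Fraction

x = [47, 48, 49, 50, 51, 52]     # number of matches
f = [3, 6, 11, 19, 12, 9]        # frequency

N = sum(f)                                                  # 60
mean = Fraction(sum(fi * xi for xi, fi in zip(x, f)), N)    # exactly 2998/60
sum_f_d2 = sum(fi * (xi - mean) ** 2 for xi, fi in zip(x, f))
variance = sum_f_d2 / N
sd = math.sqrt(variance)          # math.sqrt converts the Fraction to float

print("mean =", mean, "≈", float(mean))
print("sum of f(d^2) =", sum_f_d2, "≈", float(sum_f_d2))
print(f"variance ≈ {float(variance):.3g}, sd ≈ {sd:.3g}")   # variance ≈ 1.83, sd ≈ 1.35
```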

  16. Exploration 2.
    These are the heights of a group of students in a grade school.
    Height (cm), x 146 148 150 152 156 158
    Frequency, f 2 1 3 1 5 3
    1. Find the mean height.

    2. Calculate the standard deviation of the students' heights.

    mean ≈ 153 cm (3 s.f.) & standard deviation ≈ 4.25 cm (3 s.f.)
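A sketch for checking Exploration 2 once you have done it by hand, using the frequency-table method from item 15 (mean Σfx/Σf, then Σf(x - mean)²/N). The names are illustrative.

```python
import math

heights = [146, 148, 150, 152, 156, 158]   # height (cm), x
freqs = [2, 1, 3, 1, 5, 3]                 # frequency, f

N = sum(freqs)                                                       # 15 students
mean = sum(f * x for x, f in zip(heights, freqs)) / N
variance = sum(f * (x - mean) ** 2 for x, f in zip(heights, freqs)) / N
sd = math.sqrt(variance)

print(f"mean = {mean:.3g} cm, standard deviation = {sd:.3g} cm")
# mean = 153 cm, standard deviation = 4.25 cm
```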


  17. Summary

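    Given a set of data $x_1, x_2, x_3, \ldots, x_N$, where $N$ is the total number of observations, the variance and standard deviation are

    $$\text{Variance, } s^2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N}$$

    $$\text{Standard deviation, } s = \sqrt{\text{variance}} = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2}{N}}$$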

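    Given a set of data with frequencies

    data, x          x1   x2   x3   ...   xn
    frequency, f     f1   f2   f3   ...   fn

    where $N = f_1 + f_2 + f_3 + \cdots + f_n$ is the total number of observations, the variance and standard deviation are

    $$\text{Variance, } s^2 = \frac{\sum f(x - \bar{x})^2}{N} = \frac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + f_3(x_3 - \bar{x})^2 + \cdots + f_n(x_n - \bar{x})^2}{N}$$

    $$\text{Standard deviation, } s = \sqrt{\text{variance}} = \sqrt{\frac{\sum f(x - \bar{x})^2}{N}} = \sqrt{\frac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + f_3(x_3 - \bar{x})^2 + \cdots + f_n(x_n - \bar{x})^2}{N}}$$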
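The frequency-table formulas can be written as one reusable function. In this sketch `grouped_variance` and `grouped_sd` are hypothetical helper names. The cross-check expands the matchbox table into the full list of observations and compares the result with `statistics.pvariance`; the two should agree, because the frequency formula is just a shorthand for listing each value f times. Setting every frequency to 1 recovers the ungrouped formulas.

```python
import math
import statistics


def grouped_variance(values, freqs):
    """Sum of f*(x - mean)^2 divided by N, where N is the sum of the frequencies."""
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / N
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / N


def grouped_sd(values, freqs):
    """Standard deviation = square root of the variance."""
    return math.sqrt(grouped_variance(values, freqs))


# Cross-check on the matchbox table from item 15: expanding the table into
# the full list of 60 observations should give the same population variance.
values = [47, 48, 49, 50, 51, 52]
freqs = [3, 6, 11, 19, 12, 9]
raw = [x for x, f in zip(values, freqs) for _ in range(f)]

print(math.isclose(grouped_variance(values, freqs), statistics.pvariance(raw)))  # True
print(f"{grouped_sd(values, freqs):.3g}")                                      # 1.35

# With every frequency equal to 1 the formula reduces to the ungrouped one.
hamsters = [-1, 5, 9, 11, 15, 17, 20, 36]
print(grouped_variance(hamsters, [1] * len(hamsters)))                         # 108.75
```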

     

     

Exercises.