Aim: By the end of this chapter, you will
be able to read a scatter diagram.
- A scatter diagram is usually the first visualization aid used for two sets of
paired data. Ideally, both sets should be continuous data. The purpose of the
diagram is to find out whether or not the two variables are related. If they
are, we would then like to know the direction, form and strength of the
relationship.
- A scatter diagram is often followed by fitting a line of best fit to the
data. In this course, we will fit the line by inspection (by eye) and will not
go through the mechanics of calculating THE line of best fit.
- The qualitative descriptors of a relationship are its direction, its form
and its strength.
- Let us look at an example.
We have two sets of continuous data, height and weight. We could plot these
data with height on either the horizontal or the vertical axis. Both
possibilities are shown above. Usually, we put the independent variable on the
horizontal axis and the dependent variable on the vertical axis. For example,
in a chemistry experiment, time is the independent variable and goes on the
horizontal axis, while the rate of reaction is the dependent variable and goes
on the vertical axis.
- No matter which way we draw it, our example shows that as height increases,
weight also tends to increase. Thus, height and weight exhibit a positive
association (or relationship). Note that a scatter diagram cannot show that
one variable causes a change in the other: it cannot show causality. It only
shows how two continuous variables are related to each other. The more data
points the diagram contains, the more reliable the relationship it suggests,
and the more confidence we can have in predictions based on it.
- This scatter diagram, however, exhibits a negative association between price
and quantity purchased. Two variables are negatively associated when an
increase in one tends to go with a decrease in the other, and vice versa.
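This chapter judges direction by eye. As a hedged numerical cross-check, the sketch below looks at the sign of the sum of (x - mean_x)(y - mean_y) over all points: points above-right or below-left of the mean point push the sum up, while points above-left or below-right push it down. The function name direction_of_association and all data values are hypothetical, chosen only for illustration; they are not the data plotted in the diagrams.

```python
from statistics import mean


def direction_of_association(xs, ys):
    """Return 'positive', 'negative' or 'none' from the sign of the covariance sum.

    Each point contributes (x - mean_x) * (y - mean_y). Points to the upper
    right or lower left of the mean point add a positive amount; points to
    the upper left or lower right add a negative amount.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(xs), mean(ys)
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"


# Hypothetical illustrative data, NOT the data shown in this chapter.
heights = [160, 165, 170, 175, 180]   # cm (independent, horizontal axis)
weights = [50, 53, 55, 60, 62]        # kg (dependent, vertical axis)
prices = [1, 2, 3, 4, 5]              # price per item
quantities = [100, 90, 75, 60, 50]    # items purchased

print(direction_of_association(heights, weights))    # prints: positive
print(direction_of_association(prices, quantities))  # prints: negative
```

A positive sum matches the "as one increases, the other tends to increase" pattern; a negative sum matches the price and quantity pattern. The check says nothing about form or strength and, like the diagram itself, nothing about causality.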
- Both examples above appear to have a linear form. We say that height and
weight exhibit a positive linear association, whereas price and quantity
purchased appear to have a negative linear association in this scatter
diagram.
- Beyond linear association, data can also exhibit non-linear association. Two
examples are given here. In the first diagram, the radioactivity of an element
and time exhibit an exponential (decay) association. The other diagram shows
that the level of pollution and economic development have a quadratic
association.
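One way to tell these two non-linear forms apart numerically, assuming the values on the horizontal axis are equally spaced: for exponential data, the ratio between successive vertical values stays (roughly) constant, while for quadratic data the second differences stay (roughly) constant. The sketch below uses made-up values, and the helper names successive_ratios and second_differences are hypothetical, not part of any library.

```python
def successive_ratios(ys):
    """Ratios y[i+1] / y[i]; roughly constant for exponential data at equally spaced x."""
    return [b / a for a, b in zip(ys, ys[1:])]


def second_differences(ys):
    """Differences of the differences; roughly constant for quadratic data at equally spaced x."""
    first = [b - a for a, b in zip(ys, ys[1:])]
    return [b - a for a, b in zip(first, first[1:])]


# Hypothetical activity readings at equal time steps t = 0, 1, 2, 3, 4.
# The activity halves each step, i.e. exponential decay.
activity = [800, 400, 200, 100, 50]
print(successive_ratios(activity))    # prints: [0.5, 0.5, 0.5, 0.5]

# Hypothetical pollution levels at equally spaced development levels
# x = 0, 1, 2, 3, 4, generated from y = -x**2 + 4*x + 1 (one possible quadratic).
pollution = [1, 4, 5, 4, 1]
print(second_differences(pollution))  # prints: [-2, -2, -2]
```

Real measurements scatter around the curve, so in practice these values would only be approximately constant.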
- The strength of the relationship indicates how closely the points in the
scatter diagram fit a straight line or a relevant curve. The strength of a
linear relationship is measured by the correlation coefficient, r, which takes
values between -1 and 1; the closer r is to -1 or 1, the stronger the linear
relationship.
The examples above are given for positive association. You should be able to
sketch the corresponding diagrams for negative association.
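The chapter names r without giving its formula. Below is a minimal sketch of the usual (Pearson) definition, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)), using only the Python standard library. pearson_r is a hypothetical helper name and the data are illustrative only.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data; always between -1 and 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                    # spread of x
    syy = sum((y - my) ** 2 for y in ys)                    # spread of y
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: points lying exactly on a rising or falling straight line.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # prints: 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 3))  # prints: -1.0
```

On Python 3.10 or later, statistics.correlation(xs, ys) returns the same Pearson coefficient. Keep in mind that r measures only linear strength: a value near 0 does not rule out a strong non-linear relationship such as the quadratic example above.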
- Let us return to our initial example with height and weight. The scatter
diagram exhibits a strong positive linear association. We could draw a
straight line of best fit through this set of points. This line must pass
through the mean point, that is, the point whose coordinates are the means of
the two variables.
In our example, the mean height is 175.1 cm and the mean weight is 57.2 kg.
Thus, the line of best fit must pass through (175.1, 57.2). The scatter
diagram here shows the line of best fit in black and the grey point is