Scatter Diagram

Aim: By the end of this chapter, you will be able to read a scatter diagram.

  1. A scatter diagram is usually the first visual aid used to examine two sets of data. Ideally, both sets should be continuous. The purpose of the diagram is to find out whether the two sets of data are related and, if they are, the form and strength of the relationship.

  2. A scatter diagram is often followed by fitting a best fit line to the data. In this course, we will fit the line by inspection (by eye) and will not go through the mechanics of calculating the exact best fit line.

  3. A relationship is described qualitatively by its direction, its form and its strength.

  4. Let us look at an example.
    We have two sets of continuous data, height and weight. We could plot these data with height on either the horizontal or the vertical axis. Both possibilities are shown above. Usually, the independent variable goes on the horizontal axis and the dependent variable on the vertical axis. In a chemical experiment, for example, time is the independent variable and is plotted on the horizontal axis, while the rate of reaction is the dependent variable and is plotted on the vertical axis.

  5. No matter how we draw it, our example shows that as height increases, weight also tends to increase. Thus, height and weight exhibit a positive association (or relationship). Note that a scatter diagram cannot show causality: it cannot tell us that one variable causes the other to change. It only shows how two continuous variables are related to each other. The relationship is more reliable for prediction when the diagram contains more data points.

  6. This scatter diagram, however, exhibits a negative association between price and quantity purchased. Two variables are negatively associated when an increase in one tends to go with a decrease in the other, and vice versa.
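
The direction of an association can also be checked numerically. The sketch below is only an illustration and not part of the course method: it sums the products of the deviations from the two means, which is positive when the variables tend to move together and negative when they tend to move in opposite directions. The function name `direction` and both small data sets are invented for this example.

```python
from statistics import mean

def direction(xs, ys):
    """Classify the direction of association between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations from the means: its sign gives the direction.
    s = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "none"

# Hypothetical data, invented for illustration only.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 54, 55, 60, 63]
prices = [1.0, 1.5, 2.0, 2.5, 3.0]
quantities = [90, 80, 65, 50, 40]

print(direction(heights_cm, weights_kg))  # positive
print(direction(prices, quantities))      # negative
```

The sign only describes direction. It says nothing about the form or strength of the relationship, which still have to be judged from the diagram.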

  7. Both examples above appear to have a linear form. We say that height and weight exhibit a positive linear association, whereas price and quantity purchased appear to have a negative linear association in this scatter diagram.

  8. Beyond linear association, data can also exhibit non-linear association. Two examples are given here. In one diagram, the radioactivity of an element and time exhibit an exponential (decay) association. The other diagram shows that level of pollution and economic development have a quadratic association.
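
As a rough guide, the three forms mentioned so far correspond to the typical equations below. The symbols a, b, c, A and k are generic constants introduced here for illustration; they are not taken from the diagrams.

```latex
\begin{align*}
\text{linear:} \quad & y = a + bx \\
\text{exponential decay:} \quad & y = A e^{-kt}, \quad A > 0,\ k > 0 \\
\text{quadratic:} \quad & y = ax^{2} + bx + c, \quad a \neq 0
\end{align*}
```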

  9. The strength of the relationship indicates how closely the points in the scatter diagram fit a straight line or a relevant curve. The measure of the strength of a linear relationship is called the correlation coefficient, r.
    The above examples are given for positive association. You should be able to work out the corresponding diagrams for negative association.
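
For readers who want to see how r behaves, here is a minimal sketch that computes the usual (Pearson) correlation coefficient directly from its definition. The course does not require this calculation; the function name `pearson_r` and the data are invented for illustration. Values of r close to +1 or -1 correspond to points lying close to a straight line, and values near 0 to little or no linear relationship.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sums of squared deviations and of cross products of deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
strong = [2.1, 3.9, 6.2, 7.8, 10.1]  # points close to a straight line
weak = [5, 1, 6, 2, 7]               # points widely scattered

print(round(pearson_r(x, strong), 3))  # close to +1
print(round(pearson_r(x, weak), 3))    # much closer to 0
```

Reversing the sign of one variable reverses the sign of r but leaves its size unchanged, which matches the idea that strength and direction are separate descriptors.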

  10. Let us return to our initial example with height and weight. The scatter diagram exhibits a strong positive linear association, so we can draw a (linear) best fit line through this set of points. The best fit line must pass through the point whose coordinates are the two means. In our example, the mean height is 175.1 cm and the mean weight is 57.2 kg, so the best fit line must pass through (175.1, 57.2). The scatter diagram here shows the best fit line in black and the point (175.1, 57.2) in grey.
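
The course fits the line by inspection, but the property that the line passes through the point of means can be checked numerically. The sketch below assumes that the best fit line is the ordinary least-squares line (the usual meaning, although the text does not name the method) and uses invented data, not the data behind the diagram. The function name `least_squares_line` is also hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The intercept is chosen so that the line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data, invented for illustration only.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 54, 55, 60, 63]

slope, intercept = least_squares_line(heights_cm, weights_kg)
x_bar, y_bar = mean(heights_cm), mean(weights_kg)

# Evaluate the fitted line at the mean height and compare with the mean weight.
predicted_at_mean = intercept + slope * x_bar
print(abs(predicted_at_mean - y_bar) < 1e-9)  # True
```

When drawing a line by eye, the same idea gives a practical check: mark the point of means first, then pivot the ruler about it until the points are balanced on either side.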