Statistics: Introducing DATA

Aim: To enable candidates to use mathematics to analyse random events, to introduce concepts that will prove useful in further studies of probability and inferential statistics, and to develop techniques to describe and analyse sets of data. There is an emphasis on understanding and interpreting results.

## Introduction.

1. In everyday language, statistics are simply numbers.
2. A statistic summarizes the results of a study (a collection of data).
3. Statistics is also a scientific discipline: a way people learn as they make observations.

## Definitions.

1. A sample is a collection of observations.
2. A population is the collection of all potential observations of which the sample is a part. It is the set of all possible outcomes of a chance/random experiment.
For example, the student population of LPCUWC is the collection of every student who attends LPCUWC. If you conducted a survey by selecting 50 students at random, those 50 students would form a sample.
3. Statistical inferences extrapolate from a sample to the population. That is, you use what you learn from your sample to say something about the population.
4. A prediction is an inference about the next sample observation that has not been collected yet.
5. An object of interest that is capable of generating chance/random outcomes is called a random variable. A random variable thus describes the possible outcomes of some random experiment. It is usually denoted by a capital letter such as X.
Example 1: Suppose we are interested in the sum of the numbers showing on two dice. We call this quantity X. X can be 2, 3, 4, ..., 12. Thus, the random variable X is the sum of the numbers showing on two dice.
6. Chance/random outcomes are commonly called data.
Each outcome in Example 1 above represents an observed data value and is usually denoted by a lowercase x. So x1 = 2, x2 = 3, ..., x11 = 12.
Question: Why is 1 not a possible outcome for the random variable X above?
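To make these definitions concrete, here is a minimal Python sketch built on Example 1, assuming two fair six-sided dice. The function names `roll_two_dice` and `draw_sample`, the seed, and the sample size of 1000 are illustrative choices, not part of these notes. One call to `roll_two_dice` produces one outcome of the random variable X; a sample is a list of observed values; and the proportion of 7s in the sample is used as an inference about the population probability P(X = 7), which is exactly 6/36 = 1/6.

```python
import random
from fractions import Fraction


def roll_two_dice(rng: random.Random) -> int:
    """One outcome of the random variable X: the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def draw_sample(n: int, seed: int = 0) -> list[int]:
    """A sample: n observed values of X, drawn with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [roll_two_dice(rng) for _ in range(n)]


# All 36 equally likely (first die, second die) pairs.
pairs = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# The possible values of X.
possible_values = sorted({a + b for a, b in pairs})

# Exact population probability that X = 7: the pairs summing to 7, out of 36.
p_seven = Fraction(sum(1 for a, b in pairs if a + b == 7), len(pairs))

# Inference: estimate P(X = 7) from the proportion of 7s in a sample of 1000.
sample = draw_sample(1000)
estimate = sample.count(7) / len(sample)

print(possible_values)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(p_seven)          # 1/6
print(f"sample estimate of P(X = 7): {estimate:.3f}")
```

Because the sample is random, the estimate will usually differ slightly from 1/6; larger samples tend to give estimates closer to it.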

## Classification of data.

1. The sum of the numbers showing on two dice (X) is limited to the values 2, 3, 4, ..., 12.
2. We call X a discrete random variable, and its data are called discrete data, because it is impossible, for example, to observe an outcome of 2.5 or 11.99.
Examples of discrete random variables:
1. The number of chickens on Mr. MacDonald's farm.
2. The shoe sizes of students in this class.
3. The number of books in your library.

3. On the other hand, the height of students in a class is a continuous random variable because the outcome can take any value in a range, say from 10 cm to 250 cm; the values we actually record are limited only by the precision of measurement.
Examples of continuous random variables:
1. The speed of motorbikes passing a checkpoint.
2. The weight of chickens on Mr. MacDonald's farm.
3. The time taken for a librarian to check out a book.

4. Some data are called categorical data: each outcome is one of a set of predefined categories. For example, data collected for a survey question that requires only a YES or a NO answer are categorical. Other examples:
1. The country in which a UWC college is located. The data could come from {UK, Italy, Singapore, Canada, US, Hong Kong SAR (China), Venezuela, Norway, Swaziland, India}
2. The hair color of students in this class. The data could come from {black, brown, red, blonde, white}
3. The level of education of people in this institution. The data could come from {high school, college, university, graduate school, postgraduate school}
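The three kinds of data are usually summarized in different ways. The Python sketch below illustrates one common approach, using made-up observations (not real class data): discrete and categorical outcomes are counted directly in a frequency table (`collections.Counter`), while continuous measurements, which rarely repeat exactly, are first grouped into intervals. The 10 cm interval width and the helper name `height_interval` are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical observations, invented purely for illustration.
shoe_sizes = [38, 40, 40, 41, 42, 42, 42, 44]                        # discrete
heights_cm = [158.2, 163.75, 170.0, 171.4, 176.9, 181.35]            # continuous
hair_colors = ["black", "brown", "black", "blonde", "red", "black"]   # categorical

# Predefined categories for hair color, as listed in these notes.
HAIR_CATEGORIES = {"black", "brown", "red", "blonde", "white"}

# Discrete data: each distinct value can be counted directly.
shoe_counts = Counter(shoe_sizes)


def height_interval(h: float, width: int = 10) -> str:
    """Label the interval [lower, lower + width) cm that contains height h."""
    lower = int(h // width) * width
    return f"{lower}-{lower + width} cm"


# Continuous data: exact values rarely repeat, so count them by interval instead.
height_counts = Counter(height_interval(h) for h in heights_cm)

# Categorical data: every outcome must be one of the predefined categories.
assert all(color in HAIR_CATEGORIES for color in hair_colors)
hair_counts = Counter(hair_colors)

print(sorted(shoe_counts.items()))    # [(38, 1), (40, 2), (41, 1), (42, 3), (44, 1)]
print(sorted(height_counts.items()))  # [('150-160 cm', 1), ('160-170 cm', 1), ('170-180 cm', 3), ('180-190 cm', 1)]
print(hair_counts.most_common(1))     # [('black', 3)]
```

Each interval includes its lower bound and excludes its upper bound, so a height of exactly 170.0 cm is counted in "170-180 cm".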