Module 2 (M2) - Handling data - Scatter graphs

Scatter graphs

Scatter graphs are a good way of displaying two sets of data to see if there is a correlation, or connection, between them.

Example

The number of umbrellas sold and the amount of rainfall on 9 days is shown on the scatter graph and in the table.

Scatter graph of umbrellas sold vs rainfall
Umbrellas sold11025013247815
Rainfall (mm): 3, 2, 4, 0, 0, 5, 6, 1, 1

The graph shows that there is a positive correlation between the number of umbrellas sold and the amount of rainfall. On days with higher rainfall, more umbrellas were sold.

Types of correlation

A scatter graph can show positive correlation, negative correlation or no correlation.

Positive correlation

Positive correlation means as one variable increases, so does the other variable. They have a positive connection.

Temperature vs ice creams sold graph

Negative correlation

Negative correlation means as one variable increases, the other variable decreases. They have a negative connection.

Scatter graph of temperature vs number of coats sold

No correlation

No correlation means there is no connection between the two variables.

House number vs IQ graph

Outliers on scatter graphs

Scatter graphs often show a pattern. We call a data point an outlier if it doesn't fit the pattern.

The scatter graph below shows data for students on a hiking trip.

Each student is carrying a backpack and each point on the graph represents a student.

Scatter graph of Student weight (kg) vs Backpack weight (kg)

Two of the points don't fit the pattern very well.

These points have been labelled Brad and Sharon, the names of the students they represent.

Sharon could be considered an outlier because she is carrying a much heavier backpack than the pattern predicts.

Brad could be considered an outlier because he is carrying a much lighter backpack than the pattern predicts.

Scatter graphs - Lines of best fit

A line of best fit is a sensible line that passes as centrally as possible through the plotted points. It should also follow the direction and steepness of the pattern made by the crosses.

Positive and negative correlation lines of best fit on two graphs

The line of best fit for the scatter graph would look like this:

Umbrellas vs rainfall graph
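On this page the line of best fit is drawn by eye. A computer usually needs a rule instead, and one common rule (swapped in here, not the method used on this page) is the least squares line. The sketch below is only an illustration under that assumption. It borrows the ten temperature and ice cream points listed in the first question further down this page, and the function name `least_squares_line` is made up for the example.

```python
# Ten (temperature in °C, ice creams sold) points from the ice cream question.
points = [(22, 6), (22, 12), (24, 20), (25, 34), (26, 26),
          (28, 48), (28, 62), (30, 64), (31, 66), (32, 74)]


def least_squares_line(data):
    """Return (gradient, intercept) of the least squares line through data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    gradient = sxy / sxx
    # The least squares line always passes through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


gradient, intercept = least_squares_line(points)
print(f"gradient = {gradient:.2f}, intercept = {intercept:.2f}")
```

A positive gradient matches a positive correlation. A line drawn by eye will not match the computed one exactly, but it should have a similar steepness and pass close to the mean point.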

Interpolation and extrapolation

From the diagram above, we can estimate how many umbrellas would be sold for different amounts of rainfall. For example, how many umbrellas would be sold if there was 3 mm of rainfall? What if there was 10 mm of rainfall?

To estimate the number sold for 3 mm of rainfall, we use a process called interpolation. The value of 3 mm is within the range of data values that were used to draw the scatter graph.

Find 3 mm on the rainfall axis. Go across from 3 mm to the line of best fit, and then down to the other axis to read off the number of umbrellas sold.

Umbrellas vs rainfall graph

An estimated 20 umbrellas would be sold if there was 3 mm of rainfall.

If there was 10 mm of rainfall, we could extend the graph and the line of best fit to read off the number of umbrellas sold. This gives a value of approximately 80 umbrellas sold.

This process is called extrapolation, because the value we are using is outside the range of data used to draw the scatter graph. Since 10 mm is much higher than the highest rainfall recorded, we cannot assume that the line of best fit would still follow the pattern when the rainfall is 10 mm, so the value of 80 umbrellas is not a reliable estimate.
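The difference between the two kinds of estimate comes down to one check: is the value inside the range of the recorded data? The sketch below shows that check, assuming the nine rainfall readings in the table are 3, 2, 4, 0, 0, 5, 6, 1 and 1 mm. The function name `estimate_type` is invented for this illustration.

```python
# Rainfall readings (mm) for the nine days, as read from the table (an assumption).
rainfall_mm = [3, 2, 4, 0, 0, 5, 6, 1, 1]


def estimate_type(value, observed):
    """Say whether estimating at `value` is interpolation or extrapolation."""
    if min(observed) <= value <= max(observed):
        return "interpolation"   # inside the recorded range
    return "extrapolation"       # outside the recorded range, less reliable


print(estimate_type(3, rainfall_mm))    # interpolation
print(estimate_type(10, rainfall_mm))   # extrapolation
```

With these readings the recorded rainfall runs from 0 mm to 6 mm, so 3 mm falls inside the range and 10 mm falls well outside it.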

Question

The scatter graph shows the relationship between the temperature on a given day and the number of ice creams sold in a café.

A line of best fit has been drawn.

Use the line of best fit to predict how many ice creams will be sold on a day when the temperature is 29°C.

An image of a scatter diagram. A vertical axis has been drawn to the left. The axis has been labelled with numbers. The values are increasing in units of twenty from zero to eighty. It is subdivided into intervals of two. The axis has also been labelled, ice cream sales. A false origin has been used on the horizontal axis. The horizontal axis has been labelled with numbers. The values are increasing in units of two from twenty two to thirty two. It is subdivided into intervals of zero point two. The axis has also been labelled, temperature, measured in degrees Celsius. Ten data points have been plotted on the axes with crosses. They have co-ordinates; twenty two, comma, six. Twenty two, comma, twelve. Twenty four, comma, twenty. Twenty five, comma, thirty four. Twenty six, comma, twenty six. Twenty eight, comma, forty eight. Twenty eight, comma, sixty two. Thirty, comma, sixty four. Thirty one, comma, sixty six, and thirty two, comma, seventy four. Written above: temperature and ice cream sales. A line of best fit has been drawn passing through co-ordinates; twenty two, comma, ten, and thirty two, comma, seventy nine. The line of best fit is coloured orange.
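One way to check a reading from the graph is to work it out from the two points the line of best fit is described as passing through, (22, 10) and (32, 79). The sketch below does that. It is an illustration rather than an official answer, and the name `predict_sales` is made up for the example.

```python
# Two points on the drawn line of best fit (from the graph description).
x1, y1 = 22, 10   # temperature (°C), ice cream sales
x2, y2 = 32, 79

# Extra ice creams sold for each extra degree of temperature.
gradient = (y2 - y1) / (x2 - x1)


def predict_sales(temperature):
    """Read a value off the straight line through (x1, y1) and (x2, y2)."""
    return y1 + gradient * (temperature - x1)


estimate = predict_sales(29)
print(f"Estimated sales at 29°C: about {estimate:.0f} ice creams")
```

The line rises by 6.9 ice creams per degree, so at 29°C it gives 10 + 6.9 × 7 = 58.3, or about 58 ice creams. Because 29°C lies between the lowest and highest recorded temperatures, this is interpolation. A value read from the graph by eye may differ slightly.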

Question

The scatter diagram shows the relationship between the number of months each competitor trained for and the time it took them to complete a marathon.

  1. What type of correlation does the graph show?
  2. Using the line of best fit, predict the time for someone who has been training for 16 months.
An image of a scatter diagram. A vertical axis has been drawn to the left. A false origin has been used on the vertical axis. The axis has been labelled with numbers. The values are increasing in units of fifty from one hundred and fifty to three hundred. It is subdivided into intervals of ten. The axis has also been labelled, time taken, measured in minutes. The horizontal axis has been labelled with numbers. The values are increasing in units of four from zero to twenty. It is subdivided into intervals of zero point eight. The axis has also been labelled, months of training. Ten data points have been plotted on the axes with crosses. They have co-ordinates; two, comma, three hundred. Four, comma, two hundred and sixty five. Four, comma, two hundred and eighty five. Eight, comma, two hundred and sixty. Ten, comma, two hundred and forty five. Twelve, comma, one hundred and ninety. Twelve, comma, two hundred and twenty. Sixteen, comma, two hundred and ninety five. Eighteen, comma, one hundred and sixty five, and twenty, comma, one hundred and sixty. Written above: marathon training and completion times. A line of best fit has been drawn passing through co-ordinates; two point four, comma, three hundred, and twenty, comma, one hundred and forty five. The line of best fit is coloured orange. The value one hundred and eighty has been marked on the vertical axis. A horizontal dashed line has been drawn from this value to the line of best fit. At the point where this line intersects the line of best fit, a vertical dashed line has been drawn. It has been extended to the horizontal axis. At the point where it meets the axis, the value is labelled, sixteen. The horizontal line and the number, one hundred and eighty, are coloured blue. The vertical line and the number, sixteen, are coloured pink.
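Both parts of this question can be checked with a short calculation based on the two points the line of best fit is described as passing through, (2.4, 300) and (20, 145). The sketch below is an illustration under that assumption. It also measures how far each plotted point sits above or below the line, which is one simple way to spot a point that does not fit the pattern. The names `predict_time` and `distance_from_line` are invented for the example.

```python
# Ten (months of training, time taken in minutes) points from the description.
points = [(2, 300), (4, 265), (4, 285), (8, 260), (10, 245),
          (12, 190), (12, 220), (16, 295), (18, 165), (20, 160)]

# Two points on the drawn line of best fit.
x1, y1 = 2.4, 300
x2, y2 = 20, 145

# Change in finishing time (minutes) for each extra month of training.
gradient = (y2 - y1) / (x2 - x1)

# A line that slopes downwards goes with a negative correlation.
correlation = "negative" if gradient < 0 else "positive"


def predict_time(months):
    """Read a time off the straight line through (x1, y1) and (x2, y2)."""
    return y1 + gradient * (months - x1)


def distance_from_line(point):
    """Vertical distance between a plotted point and the line of best fit."""
    months, minutes = point
    return abs(minutes - predict_time(months))


farthest = max(points, key=distance_from_line)

print(f"Correlation: {correlation}")
print(f"Predicted time after 16 months: about {predict_time(16):.0f} minutes")
print(f"Point farthest from the line: {farthest}")
```

The line falls as training time increases, so the correlation is negative, and the prediction for 16 months agrees with the value of 180 minutes marked on the graph. The point (16, 295) lies far above the line, so it could be considered an outlier.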
