GCE A Level Maths 9709
SMIYL
April 2023
5.1 Representation of Data
In this topic we will learn how to:
• draw and interpret histograms
Histogram
A histogram is used to represent grouped continuous data. However,
it does not show all the data points. It consists of bars of different
widths joined together. There are no gaps between the bars. This
is the difference between a bar chart and a histogram. Data for
a histogram is usually displayed in the form of a class (a range of
values), with its respective frequency. We can use that information
to help us calculate the information we need to draw a histogram:
• class width
• frequency density
To calculate the class width, we use the formula,
class width = upper bound − lower bound
To calculate the frequency density, we use the formula,
$$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$
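The two formulas above can be sketched in Python as follows. This is a minimal illustration: the function name `frequency_densities` and the example data are assumptions made for this sketch and do not come from the notes.

```python
def frequency_densities(bounds, frequencies):
    """Return (class widths, frequency densities) for grouped data.

    bounds: class boundaries in increasing order (one more than the number of classes).
    frequencies: the frequency of each class.
    """
    if len(bounds) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than classes")
    # class width = upper bound - lower bound
    widths = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
    # frequency density = frequency / class width
    densities = [f / w for f, w in zip(frequencies, widths)]
    return widths, densities


# Hypothetical data (not from the notes): classes 0-5, 5-15, 15-20
widths, densities = frequency_densities([0, 5, 15, 20], [10, 30, 5])
print(widths)     # [5, 10, 5]
print(densities)  # [2.0, 3.0, 1.0]
```

Passing the boundaries as one list, rather than as separate lower and upper bounds, keeps neighbouring classes joined, which matches the requirement that the bars have no gaps.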
Measures of Central Tendencies for a Histogram
Mode
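The modal class is the class with the highest frequency density, i.e. the
tallest bar on the histogram. When all the classes have the same width, this
is also the class with the highest frequency.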
Median
To calculate the median, we use the formula,
$$q_2 = \frac{1}{2}n$$
The formula above gives us the position of the median, which is
denoted by q2, where n represents the sample size. We can use that
number to find the class which contains the median.
Note: Since the data is grouped, we cannot find the exact value of
the median, but we can find the median class.
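Finding the median class amounts to adding up the frequencies until the running total reaches the median position. A minimal sketch of that idea is given here; the helper name `class_containing` and the data are hypothetical, and a position that lands exactly on a running total is assumed to belong to the lower of the two classes.

```python
from itertools import accumulate


def class_containing(position, frequencies):
    """Return the index of the first class whose cumulative frequency reaches `position`."""
    for index, cumulative in enumerate(accumulate(frequencies)):
        if position <= cumulative:
            return index
    raise ValueError("position exceeds the total frequency")


# Hypothetical frequencies (not from the notes) for classes 0-10, 10-20, 20-30
frequencies = [4, 10, 6]
n = sum(frequencies)        # sample size: 20
median_position = n / 2     # q2 = n/2 = 10
median_class = class_containing(median_position, frequencies)
print(median_class)  # 1, i.e. the second class, 10-20
```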
Lower quartile
To calculate the lower quartile, we use the formula,
$$q_1 = \frac{1}{4}n$$
Where q1 represents the lower quartile. This gives us the position
of the lower quartile. Again, since the data is grouped, we can
only find the class containing the lower quartile. However, we can use that
class to find the maximum and minimum possible values of the lower quartile,
given by the upper and lower bounds of the class, respectively.
Upper quartile
To calculate the upper quartile, we use the formula,
$$q_3 = \frac{3}{4}n$$
Where q3 represents the upper quartile. This gives us the position
of the upper quartile. Again, since the data is grouped, we can
only find the class containing the upper quartile. However, we can use that
class to find the maximum and minimum possible values of the upper quartile,
given by the upper and lower bounds of the class, respectively.
Interquartile Range
To calculate the interquartile range, we use the formula,
IQR = q3 − q1
Where IQR represents the interquartile range, q3 represents the upper
quartile, q1 represents the lower quartile.
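Because each quartile can only be located to within a class, the interquartile range is largest when q3 takes the upper bound of its class and q1 takes the lower bound of its class. The sketch below illustrates this under stated assumptions: the function name `greatest_iqr` and the data are hypothetical, and `bisect_left` on the cumulative frequencies is one convenient way to find the class that contains a given position.

```python
from bisect import bisect_left
from itertools import accumulate


def greatest_iqr(bounds, frequencies):
    """Upper bound of the class containing q3 minus lower bound of the class containing q1."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Index of the first class whose cumulative frequency reaches the position
    q1_class = bisect_left(cumulative, n / 4)
    q3_class = bisect_left(cumulative, 3 * n / 4)
    return bounds[q3_class + 1] - bounds[q1_class]


# Hypothetical data (not from the notes): classes 0-10, 10-20, 20-30, 30-40
print(greatest_iqr([0, 10, 20, 30, 40], [5, 15, 12, 8]))  # 20
```

Here n = 40, so q1 is at position 10 (class 10-20) and q3 is at position 30 (class 20-30), giving 30 - 10 = 20.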
Mean
To calculate the mean when data is displayed in the form of a histogram,
we need to first find the mid-interval. This is the middle
value of each class, i.e. the midpoint. We then use the formula,
$$\bar{x} = \frac{\Sigma xf}{\Sigma f}$$
Where $\bar{x}$ represents the mean, $x$ represents the mid-interval, and $f$ represents
the frequency.
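As a rough illustration of the mean formula, the sketch below takes the midpoint of each class and weights it by the class frequency. The function name `grouped_mean` and the data are assumptions made for this example only.

```python
def grouped_mean(bounds, frequencies):
    """Estimate the mean of grouped data as sum(x * f) / sum(f), with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in zip(bounds, bounds[1:])]
    total_xf = sum(x * f for x, f in zip(midpoints, frequencies))
    return total_xf / sum(frequencies)


# Hypothetical data (not from the notes): classes 0-10, 10-20, 20-40
# Midpoints are 5, 15, 30, so sum(xf) = 10 + 75 + 90 = 175 and sum(f) = 10
mean = grouped_mean([0, 10, 20, 40], [2, 5, 3])
print(mean)  # 17.5
```

The result is only an estimate, because every value in a class is treated as if it sat at the midpoint.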
Variance
To calculate the variance, we use the formula,
$$\sigma^2 = \frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2$$
Where $\sigma^2$ represents the variance, $x$ represents the mid-interval, $f$ represents
the frequency, and $\bar{x}$ represents the mean.
Standard Deviation
Standard deviation is the square root of variance. Therefore, the
formula for standard deviation is,
$$\sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}$$
Where $\sigma$ represents the standard deviation, $x$ represents the mid-interval,
$f$ represents the frequency, and $\bar{x}$ represents the mean.
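The variance and standard deviation formulas can be sketched in the same way. The function name `grouped_variance` and the data below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def grouped_variance(midpoints, frequencies):
    """Estimate the variance of grouped data as sum(x^2 * f) / sum(f) - mean^2."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    mean_of_squares = sum(x ** 2 * f for x, f in zip(midpoints, frequencies)) / n
    return mean_of_squares - mean ** 2


# Hypothetical data (not from the notes): midpoints 5, 15, 30 with frequencies 2, 5, 3
# sum(x^2 f) / sum(f) = 3875 / 10 = 387.5 and mean^2 = 17.5^2 = 306.25
variance = grouped_variance([5, 15, 30], [2, 5, 3])
standard_deviation = sqrt(variance)
print(variance)                      # 81.25
print(round(standard_deviation, 3))  # 9.014
```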
Let’s look at some past paper questions.
1. The times taken by 200 players to solve a computer puzzle are summarised
in the following table. (9709/51/M/J/21 number 5)
Time (t seconds)  | 0 ≤ t < 10 | 10 ≤ t < 20 | 20 ≤ t < 40 | 40 ≤ t < 60 | 60 ≤ t < 100
Number of players | 16         | 54          | 78          | 32          | 20
(a) Draw a histogram to represent this information.
To be able to draw a histogram, we need to first find the
class width and the frequency density,
class width = upper bound − lower bound
$$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$
Class Width       | 10  | 10  | 20  | 20  | 40
Frequency Density | 1.6 | 5.4 | 3.9 | 1.6 | 0.5
Plot the classes on the x-axis, ensuring that each bar has the
corresponding class width. Then plot the frequency density
on the y-axis. Label the x-axis with the class name ’Time (t
seconds)’. Label the y-axis with ’frequency density’.
[Figure: histogram with frequency density on the y-axis and Time (t seconds) on the x-axis; x-axis ticks at 10, 20, 40, 60 and 100.]
(b) Calculate an estimate for the mean time taken by these 200 players.
Time (t seconds)  | 0 ≤ t < 10 | 10 ≤ t < 20 | 20 ≤ t < 40 | 40 ≤ t < 60 | 60 ≤ t < 100
Number of players | 16         | 54          | 78          | 32          | 20
To find the mean, we need to first find the mid intervals,
Mid Interval      | 5  | 15 | 30 | 50 | 80
Number of players | 16 | 54 | 78 | 32 | 20
The formula for calculating mean is,
$$\bar{x} = \frac{\Sigma xf}{\Sigma f}$$
Substitute into the formula,
$$\bar{x} = \frac{5(16) + 15(54) + 30(78) + 50(32) + 80(20)}{200}$$
$$\bar{x} = \frac{6\,430}{200}$$
$$\bar{x} = 32.15$$
Therefore, the final answer is,
$$\bar{x} = 32.15$$
(c) Find the greatest possible value of the interquartile range of these
times.
Time (t seconds)  | 0 ≤ t < 10 | 10 ≤ t < 20 | 20 ≤ t < 40 | 40 ≤ t < 60 | 60 ≤ t < 100
Number of players | 16         | 54          | 78          | 32          | 20
The formula for interquartile range is,
IQR = q3 − q1
To find the greatest possible value of the interquartile range,
we need to find the maximum value of the upper quartile
and the minimum value of the lower quartile,
$$q_3 = \frac{3}{4}n$$
$$q_3 = \frac{3}{4}(200)$$
$$q_3 = 150$$
When we add up the frequencies, we notice that 150 lies in
the class,
40 ≤ t < 60
Therefore, the maximum value in that class is 60, so the
maximum value of the upper quartile,
q3 = 60
Let’s find the minimum value of the lower quartile,
$$q_1 = \frac{1}{4}n$$
$$q_1 = \frac{1}{4}(200)$$
$$q_1 = 50$$
When we add up the frequencies, we notice that 50 lies in
the class,
10 ≤ t < 20
Therefore, the minimum value in that class is 10, so the
minimum value of the lower quartile,
q1 = 10
Therefore, the greatest possible value of the interquartile
range is,
IQR = 60 − 10
IQR = 50
Therefore, the final answer is,
IQR = 50
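The answers to parts (a) to (c) can be checked with a short script. This is only a verification sketch using the table from the question; the variable names are illustrative, and `bisect_left` on the cumulative frequencies is one possible way to locate the quartile classes.

```python
from bisect import bisect_left
from itertools import accumulate

# Data from the question: class boundaries and number of players
bounds = [0, 10, 20, 40, 60, 100]
frequencies = [16, 54, 78, 32, 20]
n = sum(frequencies)  # 200

# (a) class widths and frequency densities
widths = [upper - lower for lower, upper in zip(bounds, bounds[1:])]
densities = [f / w for f, w in zip(frequencies, widths)]

# (b) estimate of the mean from the mid-intervals
midpoints = [(lower + upper) / 2 for lower, upper in zip(bounds, bounds[1:])]
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# (c) greatest possible IQR: upper bound of the q3 class minus lower bound of the q1 class
cumulative = list(accumulate(frequencies))     # [16, 70, 148, 180, 200]
q1_class = bisect_left(cumulative, n / 4)      # position 50 lies in 10 <= t < 20
q3_class = bisect_left(cumulative, 3 * n / 4)  # position 150 lies in 40 <= t < 60
greatest_iqr = bounds[q3_class + 1] - bounds[q1_class]

print(densities)     # [1.6, 5.4, 3.9, 1.6, 0.5]
print(mean)          # 32.15
print(greatest_iqr)  # 50
```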
2. The numbers of chocolate bars sold per day in a cinema over a period of
100 days are summarised in the following table. (9709/51/M/J/20 number
7)
No. of chocolate bars sold | 1 − 10 | 11 − 15 | 16 − 30 | 31 − 50 | 51 − 60
No. of days                | 18     | 24      | 30      | 20      | 8
(a) Draw a histogram to represent this information.
You’ll notice that there are gaps between our classes. If we
were to draw a histogram with these classes we would have
gaps between our bars, and this would cease to be a histogram.
To fix this we have to apply a continuity correction. For
example, if we treat the data as continuous, the number 10 represents
any number that lies between 9.5 and 10.5. To apply this to
our classes, subtract 0.5 from the lower bounds and add 0.5
to the upper bounds, so that the classes cover the whole
range of values,
No. of chocolate bars sold | 0.5 − 10.5 | 10.5 − 15.5 | 15.5 − 30.5 | 30.5 − 50.5 | 50.5 − 60.5
No. of days                | 18         | 24          | 30          | 20          | 8
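The continuity correction described above can be sketched as a small helper. The function name `apply_continuity_correction` and its `half_unit` parameter are assumptions made for this illustration; the classes are the ones given in the question.

```python
def apply_continuity_correction(classes, half_unit=0.5):
    """Widen each (lower, upper) class by half a unit on both sides so the classes join up."""
    return [(lower - half_unit, upper + half_unit) for lower, upper in classes]


# Classes from the question, as (lower, upper) pairs of whole numbers
classes = [(1, 10), (11, 15), (16, 30), (31, 50), (51, 60)]
corrected = apply_continuity_correction(classes)
widths = [upper - lower for lower, upper in corrected]

print(corrected)  # [(0.5, 10.5), (10.5, 15.5), (15.5, 30.5), (30.5, 50.5), (50.5, 60.5)]
print(widths)     # [10.0, 5.0, 15.0, 20.0, 10.0]
```

After the correction, each upper bound equals the next lower bound, so the bars of the histogram touch.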
Now let’s use the classes after continuity correction to find
the class width and frequency density,
class width = upper bound − lower bound
$$\text{frequency density} = \frac{\text{frequency}}{\text{class width}}$$
Class Width       | 10  | 5   | 15  | 20  | 10
Frequency Density | 1.8 | 4.8 | 2.0 | 1.0 | 0.8
Plot the classes on the x-axis, ensuring that each bar has the
corresponding class width. Then plot the frequency density
on the y-axis. Label the x-axis with the class name ’Number
of chocolate bars sold’. Label the y-axis with ’frequency
density’.
[Figure: histogram with frequency density on the y-axis and Number of chocolate bars sold on the x-axis; x-axis ticks at 0.5, 10.5, 15.5, 30.5, 50.5 and 60.5.]
(b) What is the greatest possible value of the interquartile range of the data?
No. of chocolate bars sold | 1 − 10 | 11 − 15 | 16 − 30 | 31 − 50 | 51 − 60
No. of days                | 18     | 24      | 30      | 20      | 8
The formula for interquartile range is,
IQR = q3 − q1
To find the greatest possible value of the interquartile range,
we need to find the maximum value of the upper quartile
and the minimum value of the lower quartile,
$$q_3 = \frac{3}{4}n$$
$$q_3 = \frac{3}{4}(100)$$
$$q_3 = 75$$
When we add up the frequencies, we notice that 75 lies in
the class,
31 − 50
Therefore, the maximum value in that class is 50, so the
maximum value of the upper quartile,
q3 = 50
Let’s find the minimum value of the lower quartile,
$$q_1 = \frac{1}{4}n$$
$$q_1 = \frac{1}{4}(100)$$
$$q_1 = 25$$
When we add up the frequencies, we notice that 25 lies in
the class,
11 − 15
Therefore, the minimum value in that class is 11, so the
minimum value of the lower quartile,
q1 = 11
Therefore, the greatest possible value of the interquartile
range is,
IQR = 50 − 11
IQR = 39
(c) Calculate estimates of the mean and standard deviation of the number
of chocolate bars sold.
No. of chocolate bars sold | 1 − 10 | 11 − 15 | 16 − 30 | 31 − 50 | 51 − 60
No. of days                | 18     | 24      | 30      | 20      | 8
To find the mean, we need to first find the mid interval,
Mid Interval | 5.5 | 13 | 23 | 40.5 | 55.5
No. of days  | 18  | 24 | 30 | 20   | 8
The formula for calculating mean is,
$$\bar{x} = \frac{\Sigma xf}{\Sigma f}$$
Substitute into the formula,
$$\bar{x} = \frac{5.5(18) + 13(24) + 23(30) + 40.5(20) + 55.5(8)}{100}$$
$$\bar{x} = \frac{2\,355}{100}$$
$$\bar{x} = 23.55$$
Therefore, the mean is,
$$\bar{x} = 23.55$$
The formula for standard deviation is,
$$\sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}$$
Let’s start by finding $\frac{\Sigma x^2 f}{\Sigma f}$,
$$\frac{\Sigma x^2 f}{\Sigma f} = \frac{5.5^2(18) + 13^2(24) + 23^2(30) + 40.5^2(20) + 55.5^2(8)}{100}$$
$$\frac{\Sigma x^2 f}{\Sigma f} = \frac{77\,917.5}{100}$$
Let’s substitute into the formula,
$$\sigma = \sqrt{\frac{\Sigma x^2 f}{\Sigma f} - \bar{x}^2}$$
$$\sigma = \sqrt{\frac{77\,917.5}{100} - (23.55)^2}$$
$$\sigma = 14.98574322$$
$$\sigma \approx 15.0$$
Therefore, the final answer is,
$$\bar{x} = 23.55, \quad \sigma = 15.0$$
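The estimates in part (c) can be checked numerically. The sketch below uses the mid-intervals and frequencies from the working above; the variable names are illustrative only.

```python
from math import sqrt

# Mid-intervals and frequencies (number of days) from the working above
midpoints = [5.5, 13, 23, 40.5, 55.5]
frequencies = [18, 24, 30, 20, 8]
n = sum(frequencies)  # 100

# mean = sum(x * f) / sum(f)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n

# standard deviation = sqrt(sum(x^2 * f) / sum(f) - mean^2)
sum_x2f = sum(x ** 2 * f for x, f in zip(midpoints, frequencies))
standard_deviation = sqrt(sum_x2f / n - mean ** 2)

print(mean)                          # 23.55
print(sum_x2f)                       # 77917.5
print(round(standard_deviation, 1))  # 15.0
```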