Unit 1 STUDENTS Introduction, Graphs and Descriptive
PowerPoint slides from Lombard C, van der Merwe L, Kele T & Mouton A also used
• Statistics is the science of collecting, organising and
interpreting numerical facts.
• Gaining information from numerical data or
making sense of data.
• Descriptive Statistics
– Organising and summarising data – condense large
volumes of data into a few summary measures.
• Statistical inference
– Generalises subset data findings to the broader
universe.
• Approach for the statistical process
The research process becomes a cycle:
PLANNING → DATA COLLECTION (primary and secondary sources) → EDITING and CODING → ANALYSIS (descriptive statistics, statistical inference) → CONCLUSIONS → DECISION-MAKING → back to PLANNING
• Basic concepts of Statistics
Population: ______________________________________________
Sample : __________________________________
Parameter : __________________________________
Statistic : ________________________________________
Variable : __________________________________
Data : _________________________________________
INTRO. TO STATISTICS
Basic concepts of Statistics
Direct observation/experimentation –
laboratories/cars passing at a traffic robot
• Sampling methods
– Probability sampling
• Each element has a known, non-zero probability of being
selected (equal probabilities in simple random sampling).
• Unbiased inference about the population.
– Non-probability sampling
• Elements from the population are not selected at
random.
• The elements are selected without knowing the
probability of being selected as part of the sample.
• We cannot use the results of these samples to draw
conclusions about the population.
• Sampling methods – Probability sampling
– Simple random sampling
• Number the elements of the population from
1 to N.
• Select a random starting point in the random table.
• From the starting point read systematically in any
direction.
• Divide the digits in the random table into groups
with the same number of digits as the number of
digits in the population size (N).
• Find n random numbers from 1 to N – no
duplicates.
• Identify each of the chosen random numbers in the
population.
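The steps above can be sketched in code. This is only an illustration: it swaps the printed random number table for Python's pseudo-random generator, and the function name, the seed, and the choice of N = 210 and n = 5 (borrowed from the retirement-village example) are assumptions made for the sketch.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct element numbers drawn from 1..population_size."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    # sample() draws without replacement, so duplicates cannot occur
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)

# Elements are numbered 1 to N = 210; draw n = 5 of them
chosen = simple_random_sample(population_size=210, sample_size=5, seed=1)
print(chosen)
```

Each printed number identifies one element of the numbered population, exactly as a number read from the table would.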
Sampling Methods
(i) Simple random sampling
Example
Suppose there are 210 people on the electoral roll in a retirement village. Select a
random sample of 5 people from the village using the table of random numbers
given below. Start at the top left corner and read the random numbers vertically
from top to bottom. Circle (on the table) the random numbers.
Row 1: 58 22 37 15 22 41 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 45 16 80 64 71 01 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 14 68 57 27 40 08 16 41 89 75 46 83 12 20 50 03 73 77 24
Row 4: 02 49 74 33 14 12 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 17 87 57 75 01 11 81 00 86 28 26 23 68 11 76 65 64 19 47 83

Row 1: 61 31 37 15 22 21 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 44 06 80 64 71 41 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 54 68 57 72 10 08 16 41 89 75 46 84 12 27 50 03 73 77 24
Row 4: 32 48 74 33 24 22 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 27 87 57 75 35 11 86 00 86 28 26 23 68 11 76 65 64 19 47 83
Stratum        Population size    Sample size
Drivers        N2 = 90            n2 =
Technicians    N3 = 60            n3 =
Cleaners       N4 = 40            n4 =
Total          N = 380            n = 10
Sampling Methods
(iii) Systematic sampling
N = 56; n = 6
Interval: k = N/n = 56/6 ≈ 9
N = 85; n = 5
Interval: k = N/n = 85/5 = 17
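A minimal sketch of systematic sampling with these figures follows. The slide only gives N and n, so the details are assumptions: the interval is taken as N/n rounded down, the start is a random number between 1 and k, and every k-th element is selected after that. The function name and seed are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start in 1..k, then take every k-th element."""
    k = population_size // sample_size  # interval k = N/n, rounded down (assumption)
    rng = random.Random(seed)
    start = rng.randint(1, k)           # random starting element within the first interval
    return [start + i * k for i in range(sample_size)]

sample_85 = systematic_sample(population_size=85, sample_size=5, seed=0)
sample_56 = systematic_sample(population_size=56, sample_size=6, seed=0)
print(sample_85)
print(sample_56)
```

With N = 85 and n = 5 the selected element numbers are always 17 apart; with N = 56 and n = 6 they are 9 apart.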
• Sampling methods – Non-probability sampling
– Quota sampling
• Population divided into sub-classes according to a
certain characteristic.
• A non-random method is used to select a sample
from each sub-class.
• It is a technique of convenience.
• Researcher attempts to fill the quota quickly.
• Sample is not representative of the population.
• Sampling methods – Non-probability sampling
– Judgement sampling
• Elements from the population are chosen by the
judgement of the researcher.
• The probability that an element will be chosen cannot
be calculated.
• Sample is biased.
• Sampling methods – Non-probability sampling
– Snowball sampling
• Is used where sampling units are difficult to locate
and identify.
• Find a person who fits the profile of characteristics
of the study.
• From this person obtain names and locations of
others who will fit the profile.
DIFFERENT TYPES OF DATA
• QUANTITATIVE (numerical scale)
• QUALITATIVE (categorical)
Exercises
UNIT 1 PART B
• Need to gain information from data.
• Data must be organised and reduced.
• Descriptive statistics
– Organising data into tables, charts and graphs.
– Numerical calculations.
• Single variable data
• Raw data
– Collected data before it is grouped or ranked.
Organising and graphing qualitative data in a
frequency distribution table.
Example:
The data below show the gender of 50 employees and the
department in which they work at ABC Ltd.
Emp. no.   Gender   Dept.    Emp. no.   Gender   Dept.    …..
1          M        HR       6          M        Fin.     …..
2          F        Mark.    7          M        Mark.    …..
3          M        Fin.     8          M        Fin.     …..
4          F        HR       9          F        HR       …..
5          F        Fin.     10         F        Fin.     …..
Key: HR – Human resources; Mark. – Marketing; Fin. – Finance; M – Male; F – Female
HR Marketing Finance
M │ │ │││
F ││ │ ││
[Pie chart of employees by department: Human resources 28%, Marketing 52%, Finance 20%]
Bar graph of employees by department
[Bar graph "Employees at ABC"; y-axis: Number of workers; Human resources 14, Marketing 26, Finance 10]
Organising and graphing qualitative data in a frequency
distribution table.
        HR   Marketing   Finance   Total
M        4      10          5        19
F       10      16          5        31
Total   14      26         10        50
Bar graph of employees by gender and department
[Grouped bar graph "Employees at ABC"; x-axis: Male, Female; y-axis: Number of workers; one bar per department: Human resources, Marketing, Finance]
Bar graph of employees by department and gender
[Grouped bar graph "Employees at ABC"; x-axis: Human resources, Marketing, Finance; y-axis: Number of workers; one bar per gender: Male, Female]
Stacked bar graphs
HR Marketing Finance Total
M 4 10 5 19
F 10 16 5 31
Total 14 26 10 50
[Two stacked bar graphs; y-axis: Number of workers. First graph: one bar each for Male and Female, stacked by department (Human resources, Marketing, Finance). Second graph: one bar per department, stacked by gender (Male, Female).]
Organising and graphing quantitative data in a frequency
distribution table.
• Frequency table consists of a number of classes and each
observation is counted and recorded as the frequency of
the class.
• If n observations need to be classified into a frequency
table, determine:
8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9 19
14 17 9 3 3 16 8 2
Step 1 : Sort observations in ascending order
2 3 3 5 6 6 7 8 8 8
9 9 9 9 9 10 10 10 11 11
11 11 11 11 11 11 12 12 12 13
13 14 14 14 14 15 15 16 16 16
17 17 18 18 18 19 20 22
Frequency distribution table
Time (x) f %f F
[ ; ) 6.3%
[ ; ) 8.3%
[ ; ) 22.9%
[ ; ) 27.1%
[ ; ) 18.8%
[ ; ) 12.5%
[ ; ) 4.2%
Total n = 48
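The table can be checked with a short sketch. The class limits are left blank on the slide, so the classes used here ([2; 5), [5; 8), ... with width 3) are an assumption taken from the call-centre example later in the unit; the function name is hypothetical. Percentages are rounded half up to one decimal, which is how the slide's figures appear to be rounded.

```python
data = [8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
        5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
        6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
        11, 13, 22, 11, 11, 14, 11, 10, 9, 19,
        14, 17, 9, 3, 3, 16, 8, 2]

def frequency_table(values, lower, width, n_classes):
    """Return rows of (lower limit, upper limit, f, %f, cumulative F)."""
    n = len(values)
    rows, cumulative = [], 0
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width
        f = sum(1 for v in values if lo <= v < hi)  # class is [lo; hi): hi is excluded
        cumulative += f
        pct = int(1000 * f / n + 0.5) / 10          # percentage, rounded half up to 1 decimal
        rows.append((lo, hi, f, pct, cumulative))
    return rows

table = frequency_table(data, lower=2, width=3, n_classes=7)
for lo, hi, f, pct, F in table:
    print(f"[{lo}; {hi})  f = {f:2d}  %f = {pct:4.1f}  F = {F:2d}")
```

The last cumulative frequency should equal n, which is a quick check that no observation fell outside the classes.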
Histograms
[Figure: histogram of the frequency distribution]
Frequency polygons
[Figure: frequency polygon; x-axis: Time]
Ogives
[Figure: ogive of the number of calls received at a call centre per hour; x-axis: Number of calls (2 to 23); y-axis: % cumulative number of hours (0 to 100)]
Readings from the ogive:
• 50% of the hours had less than 12 calls per hour.
• 80% of the hours had less than 17 calls per hour.
• 20% of the hours had more than 17 calls per hour.
Unit 1 Part C
• Properties to describe numerical data:
– Central tendency
– Dispersion
– Shape
• Measures calculated for:
– Sample data
• Statistics
– Entire population
• Parameters
Measures of location
• Arithmetic mean
• Median
• Mode
ARITHMETIC MEAN
- This is the most commonly used measure
and is also called the mean.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
• MEDIAN
– Half the values in the data set are smaller than the median.
– Half the values in the data set are larger than the median.
– Order the data from small to large.
• Position of median
– If n is odd:
• The median is the (n+1)/2 th observation.
– If n is even:
• Calculate (n+1)/2.
• The median is the average of the two observations on
either side of position (n+1)/2.
• MODE
– It is the observation in the data set that occurs
the most frequently.
– If no observation(s) repeat(s) then there is no
mode.
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
The mean of the sample of nine measurements is given by:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = ______ / 9 = _______
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the median of the sample of nine measurements.
• Order the measurements (odd number, n = 9)
Value:     −4  −3   2   2   5   5   5   6   8
Position:   1   2   3   4   5   6   7   8   9
• The median is the (9+1)/2 = 5th observation.
Median = 5
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 3
Determine the median of the sample of ten measurements.
• Order the measurements (even number, n = 10)
Value:     −4  −3   2   2   3   5   5   5   6   8
Position:   1   2   3   4   5   6   7   8   9  10
• (10+1)/2 = 5.5, so the median is the average of the 5th and 6th observations.
Median = (3+5)/2 = 4
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the mode of the sample of nine measurements.
•Order the measurements
−4 −3 2 2 5 5 5 6 8
Mode = ______
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 2
Determine the mode of the sample of ten measurements.
•Order the measurements
−4 −3 2 2 2 5 5 5 6 8
Mode = 2 and 5
•Multimodal
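The three measures of location in the examples above can be checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are hypothetical, and `multimode` requires Python 3.8 or later.

```python
import statistics

nine = [2, 5, 8, -3, 5, 2, 6, 5, -4]   # the sample of nine measurements
ten_for_median = nine + [3]            # ten measurements used for the median
ten_for_mode = nine + [2]              # ten measurements used for the mode

mean_nine = statistics.mean(nine)                 # sum of the values divided by n
median_nine = statistics.median(nine)             # middle value when n is odd
median_ten = statistics.median(ten_for_median)    # average of the two middle values when n is even
modes_nine = statistics.multimode(nine)           # every value with the highest frequency
modes_ten = statistics.multimode(ten_for_mode)

print(round(mean_nine, 2), median_nine, median_ten, modes_nine, modes_ten)
```

`multimode` returns a list, so a data set with two equally frequent values reports both of them rather than only the first.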
Measures of dispersion/spread
• Range
• Variance
• Standard deviation
• Coefficient of variation
• Range
– The range of a set of measurements is the
difference between the largest and smallest
values in the data set.
– Its major advantage is the ease with which it
can be computed.
– Its major shortcoming is its failure to provide
information on the dispersion of the values
between the two end points.
• Variance and standard deviation
Determine how far the observations are from their mean.
sample variance: $s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$
sample standard deviation: $s = \sqrt{\frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}}$
• Coefficient of variation
– Measures the standard deviation relative to the
mean.
– It is expressed as a percentage.
– Used to compare samples that are measured in
different units.
$CV = \frac{s}{\bar{x}} \times 100$
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The means are the same, but the dispersion of data set 1 is much larger than the dispersion of data set 2.
1st: $\bar{x} = \frac{26}{9} \approx 2,9$
2nd: $\bar{x} = \frac{23}{8} \approx 2,9$
[Figure: both data sets plotted on a number line from −5 to 9]
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
Range = Largest value – smallest value = 8 – (−4) = 12
With $\sum x = 26$, $\sum x^2 = 208$ and $n = 9$:
$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1} = \frac{208 - \frac{1}{9}(26)^2}{9-1} = 16.61$
$s = \sqrt{s^2} = \sqrt{16.61} = 4.08$
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd : 0 1 2 3 3 4 5 5
The coefficient of variation of the first set of measurements is
given by:
$CV = \frac{s}{\bar{x}} \times 100\% = \frac{4.08}{2.9} \times 100 =$ _______
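The measures of dispersion for the first data set can be reproduced with a short sketch that follows the computational formula for the sample variance. The function and variable names are hypothetical. Because the code works with unrounded values, its coefficient of variation can differ slightly from one computed by hand with s = 4.08 and a mean of 2.9.

```python
import math

def sample_variance(values):
    """Sample variance: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

first = [-4, -3, 2, 2, 5, 5, 5, 6, 8]

data_range = max(first) - min(first)   # largest value minus smallest value
s2 = sample_variance(first)            # sample variance
s = math.sqrt(s2)                      # sample standard deviation
mean = sum(first) / len(first)
cv = s / mean * 100                    # standard deviation as a percentage of the mean

print(data_range, round(s2, 2), round(s, 2), round(cv, 1))
```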
• Quartiles
• Percentiles
• Interquartile range
• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.
Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q1 for the sample of nine measurements:
• Order the measurements
Value:     −7  −5   3   4   6   8   8  11  15
Position:   1   2   3   4   5   6   7   8   9
Position of Q1: (n+1)/4 = (9+1)/4 = 2.5th value
0.5(3rd − 2nd) = 0.5[3 − (−5)] = 4
Therefore Q1 = −5 + (4) = −1
Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q3 for the sample of nine measurements:
• Order the measurements
Value:     −7  −5   3   4   6   8   8  11  15
Position:   1   2   3   4   5   6   7   8   9
Position of Q3: 3(n+1)/4 = 3(9+1)/4 = 7.5th value
0.5(8th − 7th) = 0.5[11 − 8] = 1.5
Therefore Q3 = 8 + 1.5 = 9.5
Interquartile range = Q3 – Q1
= 9.5 – (−1)
= 10.5
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.
10% 90%
Min P10 Max
80% 20%
Min P80 Max
50% 50%
Min P50 = Q2 Max
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:
Value:     −4  −3   2   2   5   5   5   6   8
Position:   1   2   3   4   5   6   7   8   9
Position of P20: p(n+1)/100 = 20(9+1)/100 = 2nd value
Therefore P20 = −3
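The position rule used for the quartiles and percentiles above, p(n+1)/100 with interpolation between neighbouring observations, can be written as one small function. This is a sketch: the function name is hypothetical, and clamping to the smallest or largest observation when the position falls outside 1..n is an assumption the slides do not cover.

```python
import math

def percentile(values, p):
    """p-th percentile using position p(n+1)/100 and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100
    k = math.floor(position)      # whole part: observation just below the position
    fraction = position - k       # fractional part: how far towards the next observation
    if k < 1:                     # position before the first observation (assumption: clamp)
        return ordered[0]
    if k >= n:                    # position at or beyond the last observation (assumption: clamp)
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

quartile_data = [4, 6, 11, -5, 8, 3, 15, 8, -7]
q1 = percentile(quartile_data, 25)   # position 2.5
q3 = percentile(quartile_data, 75)   # position 7.5
iqr = q3 - q1

p20 = percentile([2, 5, 8, -3, 5, 2, 6, 5, -4], 20)   # position 2

print(q1, q3, iqr, p20)
```

Quartiles are simply the 25th, 50th and 75th percentiles, so the same function covers both.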
Unit 1 Part C
• ARITHMETIC MEAN
– Data have been organised in a frequency
distribution table
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = n$
• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the
median class interval.
– Median can also be determined from the ogive.
$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$
• MODE
– Class interval that has the largest frequency
value will contain the mode.
– Mode can be determined from the histogram or
by using the Mode formula
Measures of location for grouped data
Mode formula:
$M_o = l_i + \frac{(u_i - l_i)(f_i - f_{i-1})}{2f_i - f_{i-1} - f_{i+1}}$
$l_i$ = lower limit of the modal class interval
$u_i$ = upper limit of the modal class interval
$f_i$ = frequency of the modal class
$f_{i-1}$ = frequency of the class preceding the modal interval
$f_{i+1}$ = frequency of the class following the modal interval
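As an illustration, the mode formula can be applied to the call-centre frequency table used in the examples that follow (classes of width 3 starting at 2). The function name is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption for the case where the modal class is the first or last class.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode of grouped data: l + (u - l)(f - f_prev) / (2f - f_prev - f_next)."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])      # index of the modal class
    f_prev = freqs[i - 1] if i > 0 else 0                   # class before (0 if none: assumption)
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0      # class after (0 if none: assumption)
    f_mode = freqs[i]
    return lower_limits[i] + width * (f_mode - f_prev) / (2 * f_mode - f_prev - f_next)

lower_limits = [2, 5, 8, 11, 14, 17, 20]
freqs = [3, 4, 11, 13, 9, 6, 2]

mode = grouped_mode(lower_limits, width=3, freqs=freqs)
print(mode)
```

Here the modal class is [11–under 14), with 13 hours, preceded by a class of 11 hours and followed by a class of 9 hours.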
Example – The following data represent the number of
telephone calls received for two days at a municipal call centre.
The data were measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints:
Number of calls    Number of hours (fi)    Midpoint (xi)
[2–under 5)         3                       3,5
[5–under 8)         4                       6,5
[8–under 11)       11                       9,5
[11–under 14)      13                      12,5
[14–under 17)       9                      15,5
[17–under 20)       6                      18,5
[20–under 23)       2                      21,5
Total              n = 48
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} = 12,44$
The average number of calls per hour is 12,44.
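The grouped mean can be verified with a few lines of code. This is a sketch with hypothetical variable names; the midpoints and frequencies are those of the call-centre table.

```python
midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]   # class midpoints x_i
freqs = [3, 4, 11, 13, 9, 6, 2]                        # number of hours f_i

n = sum(freqs)                                          # sum of f_i
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))   # sum of f_i * x_i
grouped_mean = sum_fx / n

print(n, sum_fx, round(grouped_mean, 2))
```

Each midpoint stands in for all observations in its class, so the result is an approximation of the mean of the raw data.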
Example – The following data represent the number of
telephone calls received for two days at a municipal call centre.
The data were measured per hour.
To calculate the median for the sample of the 48 hours, determine the cumulative frequencies:
Number of calls    Number of hours (fi)    Cumulative frequency (F)
[2–under 5)         3                       3
[5–under 8)         4                       7
[8–under 11)       11                      18
[11–under 14)      13                      31
[14–under 17)       9                      40
[17–under 20)       6                      46
[20–under 23)       2                      48
Total              n = 48
n/2 = 48/2 = 24
The first cumulative frequency ≥ 24 is 31, so the median class is [11–under 14).
$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i} = 11 + \frac{(14 - 11)(24 - 18)}{13} = 12,38$
50% of the time there were fewer than 12,38 calls per hour, and 50% of the time
more than 12,38 calls per hour.
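The same steps (find the first cumulative frequency ≥ n/2, then interpolate inside that class) can be sketched as a function. The function name and the representation of classes as (lower, upper) pairs are assumptions made for the illustration.

```python
def grouped_median(limits, freqs):
    """Median of grouped data: l + (u - l)(n/2 - F_prev) / f for the median class."""
    n = sum(freqs)
    target = n / 2
    cumulative = 0                       # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(limits, freqs):
        if cumulative + f >= target:     # first cumulative frequency >= n/2: the median class
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("frequencies must not be empty")

limits = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)]
freqs = [3, 4, 11, 13, 9, 6, 2]

median = grouped_median(limits, freqs)
print(round(median, 2))
```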
The median can also be determined from the ogive.
[Figure: ogive "Number of calls at a call centre"; x-axis: Number of calls (2 to 23); y-axis: Number of hours (0 to 48)]
Find n/2 = 48/2 = 24 on the vertical axis, move across to the ogive and read the number of calls at A.
Median = 12,4
To calculate the mode for the sample of the 48 hours, draw the histogram.
[Figure: histogram of the call-centre data; x-axis: Number of calls (2 to 23); the mode is read at A, inside the modal class [11–under 14)]
Relationship between mean, median, and mode
• If a distribution is symmetrical:
– the mean, median and mode are the same
and lie at the centre of the distribution
• If a distribution is non-symmetrical:
– skewed to the left or to the right
– three measures differ
[Figures: a positively skewed distribution (skewed to the right) and a negatively skewed distribution (skewed to the left), each marking the mode, median and mean]
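• Variance and standard deviation
Sample variance: $s^2 = \frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n-1}$
Sample standard deviation: $s = \sqrt{s^2}$
Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– n = sample size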
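• Variance and standard deviation
Population variance: $\sigma^2 = \frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}}$
Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– N = population size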
Example – The following data represent the number of
telephone calls received for two days at a municipal call centre.
The data were measured per hour.
Number of calls    Number of hours (fi)    Midpoint (xi)
[2–under 5)         3                       3,5
[5–under 8)         4                       6,5
[8–under 11)       11                       9,5
[11–under 14)      13                      12,5
[14–under 17)       9                      15,5
[17–under 20)       6                      18,5
[20–under 23)       2                      21,5
Total              n = 48
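A minimal sketch of the variance and standard deviation for this table follows. It assumes the 48 hours are treated as a sample, so the divisor is n − 1; the variable names are hypothetical and the results are computed here, not taken from the slides.

```python
import math

midpoints = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5]   # class midpoints x
freqs = [3, 4, 11, 13, 9, 6, 2]                        # class frequencies f

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f * x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f * x^2

# Sample variance for grouped data (divisor n - 1 is the sample assumption)
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(sum_fx, sum_fx2, round(s2, 2), round(s, 2))
```

For a population, the same numerator would be divided by N instead of n − 1.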
• Quartiles
• Percentiles
• Interquartile range
Example – The following data represent the number of
telephone calls received for two days at a municipal call centre.
The data were measured per hour.
To calculate Q1 for the sample of the 48 hours:
Number of calls    Number of hours (fi)    Cumulative frequency (F)
[2–under 5)         3                       3
[5–under 8)         4                       7
[8–under 11)       11                      18
[11–under 14)      13                      31
[14–under 17)       9                      40
[17–under 20)       6                      46
[20–under 23)       2                      48
Total              n = 48
n/4 = 48/4 = 12
The first cumulative frequency ≥ 12 is 18, so Q1 lies in the class [8–under 11).
$Q_1 = l_{Q_1} + \frac{(u_{Q_1} - l_{Q_1})\left(\frac{n}{4} - F_{Q_1 - 1}\right)}{f_{Q_1}} = 8 + \frac{(11 - 8)(12 - 7)}{11} = 9,36$
25% of the time there were fewer than 9,36 calls per hour, and 75% of the time
more than 9,36 calls per hour.
To calculate Q3 for the sample of the 48 hours:
3n/4 = 3(48)/4 = 36
The first cumulative frequency ≥ 36 is 40, so Q3 lies in the class [14–under 17).
$Q_3 = l_{Q_3} + \frac{(u_{Q_3} - l_{Q_3})\left(\frac{3n}{4} - F_{Q_3 - 1}\right)}{f_{Q_3}} = 14 + \frac{(17 - 14)(36 - 31)}{9} = 15,67$
75% of the time there were fewer than 15,67 calls per hour, and 25% of the time
more than 15,67 calls per hour.
Interquartile range for the sample of the 48 hours:
Q3 = 15,67
Q1 = 9,36
IQR = Q3 – Q1
= 15,67 – 9,36
= 6,31
To calculate P60 for the sample of the 48 hours:
np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).
$P_{60} = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p} = 11 + \frac{(14 - 11)(28,8 - 18)}{13} = 13,49$
60% of the time there were fewer than 13,49 calls per hour, and 40% of the time
more than 13,49 calls per hour.
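The quartile and percentile calculations for grouped data all follow one pattern: find the first class whose cumulative frequency reaches np/100, then interpolate inside it. A sketch of that pattern follows; the function name and the (lower, upper) representation of classes are assumptions. The code keeps full precision, so its IQR can differ in the second decimal from one computed with the rounded quartiles.

```python
def grouped_percentile(limits, freqs, p):
    """p-th percentile of grouped data: l + (u - l)(np/100 - F_prev) / f."""
    n = sum(freqs)
    target = n * p / 100
    cumulative = 0                       # cumulative frequency before the current class
    for (lower, upper), f in zip(limits, freqs):
        if cumulative + f >= target:     # first cumulative frequency >= np/100
            return lower + (upper - lower) * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must be between 0 and 100 and freqs must not be empty")

limits = [(2, 5), (5, 8), (8, 11), (11, 14), (14, 17), (17, 20), (20, 23)]
freqs = [3, 4, 11, 13, 9, 6, 2]

q1 = grouped_percentile(limits, freqs, 25)    # n/4 = 12
q3 = grouped_percentile(limits, freqs, 75)    # 3n/4 = 36
p60 = grouped_percentile(limits, freqs, 60)   # np/100 = 28.8
iqr = q3 - q1

print(round(q1, 2), round(q3, 2), round(p60, 2), round(iqr, 2))
```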
BOX-AND-WHISKER PLOT
Me = 12,38
Q3 = 15,67
Q1 = 9,36
IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14
[Figure: box-and-whisker plot on an axis from 0 to 28; the box spans Q1 to Q3 (the IQR) with the median marked, and the limits LL and UL lie 1,5(IQR) below Q1 and above Q3]
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
The box-and-whisker plots
Symmetric
Left skewed, mean < median Right skewed, mean > median
Outliers
Observation values that are larger than the UL are the outliers.
Observation values that are lower than the LL are the outliers.
The box-and-whisker plots
46 49 49 50 51 53 54 54 55 55 59
Position of Q1: (n+1)/4 = (11+1)/4 = 3rd observation.
Therefore, Q1 = 49
Position of Q2 = Median: (n+1)/2 = (11+1)/2 = 6th observation.
Therefore, Q2 = 53
Position of Q3: 3(n+1)/4 = 3(11+1)/4 = 9th observation.
Therefore, Q3 = 55
Example: The data below are the distances travelled by employees of The Tile Shop
from home to work. Construct the box-plot of these data.
data = [1 3 6 12 13 15 19 20 23 26 26 26 29 29 29 30 32 55]
34 was changed to 55. Let’s see if we’ll have an outlier.
q1 = 13; q2 = 24,5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16
LL = 13 – 1,5(16) = –11; UL = 29 + 1,5(16) = 53
Since 55 > UL = 53, the value 55 is an outlier.
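The outlier check can be sketched in code. The quartiles q1 = 13 and q3 = 29 are taken as given in the example rather than recomputed, because different quartile conventions can give slightly different values; the function and variable names are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (LL, UL) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

distances = [1, 3, 6, 12, 13, 15, 19, 20, 23, 26, 26, 26, 29, 29, 29, 30, 32, 55]

# Quartiles as stated in the example (assumed, not recomputed here)
lower_limit, upper_limit = outlier_limits(q1=13, q3=29)

# Observations below LL or above UL are flagged as outliers
outliers = [x for x in distances if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit, outliers)
```

In a box-and-whisker plot the whiskers would then stop at the most extreme observations inside the limits, with any flagged value drawn as a separate point.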
References
Lombard C, van der Merwe L, Kele T and
Mouton S. 2012. Elementary Statistics for
Business and Economics.