
Unit 1 STUDENTS Introduction, Graphs and Descriptive

This document discusses introductory concepts in statistics including: - Descriptive statistics involves organizing and summarizing data to condense large volumes into summary measures. Statistical inference generalizes findings from a sample to a broader population. - The statistical process involves planning, data collection, analysis, conclusions, and decision making in a cyclical process. - Probability and non-probability sampling methods are discussed. Probability methods include simple random sampling, stratified random sampling, and systematic sampling which allow statistical inference about populations. Non-probability methods do not support statistical inference.

Uploaded by

Johannah Manoko
Copyright
© © All Rights Reserved

1

PowerPoint slides from Lombard C, van der Merwe L, Kele T & Mouton A also used
• Statistics is the science of collecting, organising and
interpreting numerical facts.
• Gaining information from numerical data, or
making sense of data.
• Descriptive Statistics
– Organising and summarising data – condense large
volumes of data into a few summary measures.
• Statistical inference
– Generalises subset data findings to the broader
universe.

2
• Approach for the statistical process
The research process becomes a cycle:

PLANNING → DATA COLLECTION (primary and secondary sources) → EDITING and CODING → ANALYSIS → CONCLUSIONS → DECISION-MAKING → back to PLANNING

Tools used within the cycle: descriptive statistics and statistical inference.

3
• Basic concepts of Statistics
Population: ______________________________________________

Sample : __________________________________

Parameter : __________________________________

Statistic : ________________________________________

Variable : __________________________________

Data : _________________________________________
INTRO. TO STATISTICS
Basic concepts of Statistics

Data collection methods

Personal interviews – census, research, surveys

Telephone interviews – call centres, insurance companies

Self-administered questionnaires – application forms

Direct observation/experimentation – laboratories, cars passing at a traffic robot
• Sampling methods
– Probability sampling
• Elements have the same probability of being
selected.
• Unbiased inference about the population.
– Non-probability sampling
• Elements from the population are not selected at
random.
• The elements are selected without knowing the
probability of being selected as part of the sample.
• We cannot use the results of these samples to draw
conclusions about the population.
6
• Sampling methods – Probability sampling
– Simple random sampling
• Number the elements of the population from
1 to N.
• Select a random starting point in the random table.
• From the starting point read systematically in any
direction.
• Divide the digits in the random table into groups
with the same number of digits as the number of
digits in the population size (N).
• Find n random numbers from 1 to N – no
duplicates.
• Identify each of the chosen random numbers in the
population.
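The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `sample_from_digits` is hypothetical, and the digits are taken from row 1 of the first random number table and read along the row, which is not the reading direction asked for in the exercise that follows.

```python
def sample_from_digits(digits: str, population_size: int, sample_size: int) -> list[int]:
    """Pick distinct numbers in 1..N from a string of random digits."""
    width = len(str(population_size))  # digits per group, e.g. 3 for N = 210
    chosen: list[int] = []
    # Walk through the digits in consecutive groups of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Keep numbers that identify a population element; skip duplicates.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of random digits before the sample was complete")


# Row 1 of the first table, read left to right (an assumption for illustration).
row1 = "58 22 37 15 22 41 78 47 14 51 94 38 69 47 58 04 29 85 48 58".replace(" ", "")
first_two = sample_from_digits(row1, population_size=210, sample_size=2)
print(first_two)
```

Most three-digit groups are discarded because they exceed N = 210, which is why a longer stretch of the table is usually needed than the sample size suggests.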
INTRO. TO STATISTICS
Sampling Methods
(i) Simple random sampling
Example
Suppose there are 210 people on the electoral roll in a retirement village. Select a
random sample of 5 people from the village using the table of random numbers
given below. Start at the top left corner and read the random numbers vertically
from top to bottom. Circle (on the table) the random numbers.
Row 1: 58 22 37 15 22 41 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 45 16 80 64 71 01 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 14 68 57 27 40 08 16 41 89 75 46 83 12 20 50 03 73 77 24
Row 4: 02 49 74 33 14 12 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 17 87 57 75 01 11 81 00 86 28 26 23 68 11 76 65 64 19 47 83

_______; _______; ________; ________; _________;


INTRO. TO STATISTICS
Sampling Methods
(i) Simple random sampling
Example
Suppose there are 3 500 people on the electoral roll in a retirement village. Select a
random sample of 5 people from the village using the table of random numbers
given below. Start at row 4, column 5 and read the random numbers
horizontally from left to right. Circle (on the table) the random numbers.
Row 1: 61 31 37 15 22 21 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 44 06 80 64 71 41 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 54 68 57 72 10 08 16 41 89 75 46 84 12 27 50 03 73 77 24
Row 4: 32 48 74 33 24 22 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 27 87 57 75 35 11 86 00 86 28 26 23 68 11 76 65 64 19 47 83

_________; _________; ___________; _________; _________;


• Sampling methods – Probability sampling
– Stratified random sampling
• Population heterogeneous with respect to the variable
under study.
• Population divided into homogeneous sub-populations
called strata: $N = N_1 + N_2 + \dots + N_k$ (k = number of strata).
• Sample size from each stratum proportional to
stratum size: $n = n_1 + n_2 + \dots + n_k$ (k = number of strata).
• Draw a simple random sample from each stratum:
$n_i = \frac{N_i}{N}\, n$, $i = 1, \dots, k$
INTRO. TO STATISTICS
Sampling Methods
(ii) Stratified random sampling
Observations are not homogeneous
Example: The nationwide staff of a certain
insurance company is distributed as in the table below.
Sales Persons    190
Drivers           90
Technicians       60
Cleaners          40

• How many of each category of staff should be
included in a stratified random sample of 10?
INTRO. TO STATISTICS
Sampling Methods
(ii) Stratified random sampling
Example:
Stratum          Population size of each stratum    Sample size
Sales Persons    190 (N1)                           n1 =
Drivers           90 (N2)                           n2 =
Technicians       60 (N3)                           n3 =
Cleaners          40 (N4)                           n4 =
Total            N = 380                            n = 10
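A minimal sketch of proportional allocation, applying $n_i = \frac{N_i}{N}\, n$ to the staff table above. Rounding each $n_i$ to the nearest whole person is an assumption; in general the rounded values need not add up to exactly n and may need a manual adjustment.

```python
# Stratum sizes from the insurance company example.
strata = {"Sales Persons": 190, "Drivers": 90, "Technicians": 60, "Cleaners": 40}
n = 10                       # total sample size
N = sum(strata.values())     # population size, 380

# n_i = (N_i / N) * n, rounded to a whole number of people (assumed rounding rule).
allocation = {name: round(size / N * n) for name, size in strata.items()}

for name, n_i in allocation.items():
    print(f"{name}: {n_i}")
print("total:", sum(allocation.values()))
```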
INTRO. TO STATISTICS
Sampling Methods
(iii) Systematic sampling

English definition – acting according to a fixed plan or system

Example: Draw a sample of 6 from a population of 56.

N = 56; n = 6
Interval: $k = \frac{N}{n}$ = ____ = ____

Select a starting point; let’s choose 5.

5; 15; _____; _______; ______; _______
INTRO. TO STATISTICS
Sampling Methods
(iii) Systematic sampling

English definition – acting according to a fixed plan or system

Example: Draw a sample of 5 from a population of 85.

N = 85; n = 5
Interval: $k = \frac{N}{n}$ = ____ = ____

Select a starting point; let’s choose 11.

11; 28; _____; _______; ______
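The two systematic sampling examples can be sketched as follows. The function name is hypothetical, and rounding the interval k = N/n up to the next whole number is an assumption chosen because it reproduces the first two values shown on each slide (5; 15 and 11; 28).

```python
import math


def systematic_sample(N: int, n: int, start: int) -> list[int]:
    """Return every k-th element number, beginning at `start`."""
    k = math.ceil(N / n)  # sampling interval, rounded up (assumed rule)
    # Keep only element numbers that exist in the population (1..N).
    return [start + i * k for i in range(n) if start + i * k <= N]


sample_a = systematic_sample(N=56, n=6, start=5)
sample_b = systematic_sample(N=85, n=5, start=11)
print(sample_a)
print(sample_b)
```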
• Sampling methods – Non-probability sampling
– Convenience sampling
• Not representative of the target population.
• Items are selected because they are easy to find,
inexpensive to reach, or self-selected.

15
• Sampling methods – Non-probability sampling
– Quota sampling
• Population divided into sub-classes according to a
certain characteristic.
• A non-random method is used to select a sample
from each sub-class.
• It is a technique of convenience.
• Researcher attempts to fill the quota quickly.
• Sample is not representative of the population.

16
• Sampling methods – Non-probability sampling
– Judgement sampling
• Elements from the population are chosen by the
judgement of the researcher.
• The probability that an element will be chosen cannot
be calculated.
• Sample is biased.

17
• Sampling methods – Non-probability sampling
– Snowball sampling
• Is used where sampling units are difficult to locate
and identify.
• Find a person who fits the profile of characteristics
of the study.
• From this person obtain names and locations of
others who will fit the profile.

18
DIFFERENT TYPES OF DATA
– QUANTITATIVE (numerical scale)
– QUALITATIVE (categorical)

19
INTRO. TO STATISTICS
Exercises
Exercises:
UNIT 1 PART B

22
• Need to gain information from data.
• Data must be organised and reduced.
• Descriptive statistics
– Organising data into tables, charts and graphs.
– Numerical calculations.
• Single variable data
• Raw data
– Collected data before it is grouped or ranked.
23
Organising and graphing qualitative data in a
frequency distribution table.
Example:
The data below show the gender of 50 employees and the
department in which they work at ABC Ltd.
Key: HR – Human resources; Mark. – Marketing; Fin. – Finance; M – Male; F – Female

Emp. no.  Gender  Dept.    Emp. no.  Gender  Dept.   …..
1         M       HR       6         M       Fin.    …..
2         F       Mark.    7         M       Mark.   …..
3         M       Fin.     8         M       Fin.    …..
4         F       HR       9         F       HR      …..
5         F       Fin.     10        F       Fin.    …..
Tally of the first ten employees:

     HR    Marketing    Finance
M    │     │            │││
F    ││    │            ││

Organising and graphing qualitative data in a frequency
distribution table.
Department    f     F     f/n    Angle size
HR            14    14
Marketing     26    40
Finance       10    50
n = 50

[Pie chart: Employees at ABC. Human resources 28%, Marketing 52%, Finance 20%]
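The empty columns of the department table can be worked out as in the sketch below. The relative frequency is f/n; taking the pie-chart angle as (f/n) × 360° is the usual convention and is assumed here.

```python
# Department frequencies from the table above.
freq = {"HR": 14, "Marketing": 26, "Finance": 10}
n = sum(freq.values())  # 50 employees

rows = []
cumulative = 0
for dept, f in freq.items():
    cumulative += f                  # cumulative frequency F
    relative = f / n                 # relative frequency f/n
    angle = relative * 360           # pie-chart angle in degrees (assumed convention)
    rows.append((dept, f, cumulative, relative, angle))

for dept, f, F, rel, angle in rows:
    print(f"{dept:<10} f={f:>2} F={F:>2} f/n={rel:.2f} angle={angle:.1f}")
```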
Bar graph of employees by department

[Bar graph: Employees at ABC. y-axis: Number of workers. Human resources 14, Marketing 26, Finance 10]
27
Organising and graphing qualitative data in a frequency
distribution table.

HR Marketing Finance Total

M 4 10 5 19
F 10 16 5 31
Total 14 26 10 50

28
Bar graph of employees by gender and department

[Clustered bar graph: Employees at ABC. x-axis: Male, Female; y-axis: Number of workers; one bar per department (Human resources, Marketing, Finance)]
29
Bar graph of employees by department and gender

[Clustered bar graph: Employees at ABC. x-axis: Human resources, Marketing, Finance; y-axis: Number of workers; one bar per gender (Male, Female)]
30
Stacked bar graphs
HR Marketing Finance Total
M 4 10 5 19
F 10 16 5 31
Total 14 26 10 50

[Stacked bar graph: Employees at ABC. x-axis: Male, Female; y-axis: Number of workers; bars stacked by department (Human resources, Marketing, Finance)]
[Stacked bar graph: Employees at ABC. x-axis: Human resources, Marketing, Finance; y-axis: Number of workers; bars stacked by gender (Male, Female)]
Organising and graphing quantitative data in a frequency
distribution table.
• Frequency table consists of a number of classes and each
observation is counted and recorded as the frequency of
the class.
• If n observations need to be classified into a frequency
table, determine:

Number of classes: $c = 1 + 3.3\log n$

Class width $= \frac{\text{largest} - \text{smallest}}{c}$
32
Organising and graphing quantitative data in a frequency
distribution table.
Example:
The following data represent the length of time (in minutes) of
a number of telephone calls received at a municipal call centre.

8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9 19
14 17 9 3 3 16 8 2
Step 1 : Sort observations in ascending order

2 3 3 5 6 6 7 8 8 8
9 9 9 9 9 10 10 10 11 11
11 11 11 11 11 11 12 12 12 13
13 14 14 14 14 15 15 16 16 16
17 17 18 18 18 19 20 22

Step 2: Determine the number of classes required


c = 1 + 3.3log(n) =
34
Step 3: Determine the class width
Class width $= \frac{x_{\max} - x_{\min}}{c} = \frac{22 - 2}{7} = 2.86 \approx 3$

35
Frequency distribution table
Time (x) f %f F
[ ; ) 6.3%
[ ; ) 8.3%
[ ; ) 22.9%
[ ; ) 27.1%
[ ; ) 18.8%
[ ; ) 12.5%
[ ; ) 4.2%
Total n = 48
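Steps 1 to 3 and the frequency count can be sketched as below. Rounding both the number of classes and the class width up to the next whole number is an assumption that matches the values c = 7 and width 3 used on the slides; starting the first class at the smallest observation is also assumed.

```python
import math

# The 48 call lengths, already sorted (Step 1).
calls = [2, 3, 3, 5, 6, 6, 7, 8, 8, 8,
         9, 9, 9, 9, 9, 10, 10, 10, 11, 11,
         11, 11, 11, 11, 11, 11, 12, 12, 12, 13,
         13, 14, 14, 14, 14, 15, 15, 16, 16, 16,
         17, 17, 18, 18, 18, 19, 20, 22]
n = len(calls)

# Step 2: number of classes, c = 1 + 3.3 log10(n), rounded up (assumed rule).
c = math.ceil(1 + 3.3 * math.log10(n))

# Step 3: class width = (largest - smallest) / c, rounded up (assumed rule).
width = math.ceil((max(calls) - min(calls)) / c)

# Classes are half-open intervals [lower, lower + width).
lower_limits = [min(calls) + i * width for i in range(c)]
freqs = [sum(lo <= x < lo + width for x in calls) for lo in lower_limits]
cumulative = [sum(freqs[:i + 1]) for i in range(c)]

for lo, f, F in zip(lower_limits, freqs, cumulative):
    print(f"[{lo}; {lo + width})  f={f:>2}  %f={100 * f / n:.2f}  F={F}")
```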
36
Histograms

37
Histograms

38
Frequency polygon

39
Frequency polygons

[Figure: Frequency distribution polygon of the telephone calls received. x-axis: Time; y-axis: Number of calls]
40
Ogive

41
Ogives

42
Ogives

[Figure: Ogive of number of calls received at a call centre per hour. x-axis: Number of calls; y-axis: % cumulative number of hours]
• 20% of the hours had more than 17 calls per hour.
• 80% of the hours had fewer than 17 calls per hour.
• 50% of the hours had fewer than 12 calls per hour.

43
Unit 1 Part C

44
• Properties to describe numerical data:
– Central tendency
– Dispersion
– Shape
• Measures calculated for:
– Sample data
• Statistics
– Entire population
• Parameters

45
Measures of location
• Arithmetic mean
• Median
• Mode

46
ARITHMETIC MEAN
- This is the most commonly used measure
and is also called the mean.

Sample mean $= \frac{\text{sum of sample observations}}{\text{number of sample observations}}$

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
47
• MEDIAN
– Half the values in the data set are smaller than the median.
– Half the values in the data set are larger than the median.
– Order the data from small to large.
• Position of median
– If n is odd:
• The median is the (n+1)/2 th observation.
– If n is even:
• Calculate (n+1)/2, which falls halfway between two observations.
• The median is the average of the values before and
after position (n+1)/2.
48
• MODE
– It is the observation in the data set that occurs
the most frequently.
– If no observation(s) repeat(s) then there is no
mode.

49
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
The mean of the sample of nine measurements is given by:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = ____ / 9 = _______

50
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the median of the sample of nine measurements.
• Order the measurements (odd number of observations)

−4 −3 2 2 5 5 5 6 8
1 2 3 4 5 6 7 8 9

(n+1)/2 = (9+1)/2 = 5th observation

Median = 5
52
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 3
Determine the median of the sample of ten measurements.
• Order the measurements (even number of observations)

−4 −3 2 2 3 5 5 5 6 8
1 2 3 4 5 6 7 8 9 10

(n+1)/2 = (10+1)/2 = 5,5, so the median lies halfway between the 5th and 6th observations.

Median = (3+5)/2 = 4
54
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the mode of the sample of nine measurements.
•Order the measurements

−4 −3 2 2 5 5 5 6 8
Mode = ______

55
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 2
Determine the mode of the sample of ten measurements.
•Order the measurements

−4 −3 2 2 2 5 5 5 6 8
Mode = 2 and 5
•Multimodal
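The three measures of location from the examples above can be checked with Python's standard `statistics` module. This is a quick sketch; the variable names are illustrative only.

```python
import statistics

nine = [2, 5, 8, -3, 5, 2, 6, 5, -4]

mean_nine = statistics.mean(nine)        # sum of observations / n = 26 / 9
median_nine = statistics.median(nine)    # middle value of the ordered data

ten = nine + [3]                         # the ten-measurement median example
median_ten = statistics.median(ten)      # average of the 5th and 6th ordered values

ten_modes = nine + [2]                   # the ten-measurement mode example
modes = statistics.multimode(ten_modes)  # every value with the highest count

print(mean_nine, median_nine, median_ten, modes)
```

`statistics.multimode` returns all values that share the highest frequency, which suits the case where a data set has more than one mode.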

56
Measures of dispersion/spread
• Range
• Variance
• Standard deviation
• Coefficient of variation

57
• Range
– The range of a set of measurements is the
difference between the largest and smallest
values in the data set.
– Its major advantage is the ease with which it
can be computed.
– Its major shortcoming is its failure to provide
information on the dispersion of the values
between the two end points.

58
• Variance and standard deviation
Determine how far the observations are from their mean.

Sample variance: $s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$

Sample standard deviation: $s = \sqrt{\frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}}$

59
• Coefficient of variation
– Measures the standard deviation relative to the
mean.
– It is expressed as a percentage.
– Used to compare samples that are measured in
different units.
$CV = \frac{s}{\bar{x}} \times 100$

60
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The means are the same, but the dispersion of data set 1 is much larger than the dispersion of data set 2.
1st: $\bar{x} = \frac{26}{9} \approx 2{,}9$
2nd: $\bar{x} = \frac{23}{8} \approx 2{,}9$
[Figure: both data sets plotted on a number line from −5 to 9]
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
Range = Largest value – smallest value = 8 – (−4) = 12

Calculation formula for sample variance: $s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$

62
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8

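Calculation formula for sample variance:

$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1} = \frac{208 - \frac{1}{9}(26)^2}{9-1} = 16.61$

Calculation formula for sample standard deviation:

$s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}} = \sqrt{\frac{208 - \frac{1}{9}(26)^2}{9-1}} = \sqrt{16.61} = 4.08$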

63
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd : 0 1 2 3 3 4 5 5
The coefficient of variation of the measurements of the 1st data set is
given by:
$CV = \frac{s}{\bar{x}} \times 100\% = \frac{4.08}{2.9} \times 100 =$ ____
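The calculation formula can be verified numerically for the first data set. The sketch below computes $\sum x$ and $\sum x^2$ directly and cross-checks the result against `statistics.variance`, which uses the same n − 1 divisor. Note that the coefficient of variation computed here uses unrounded values, so it differs slightly from the slide's figure based on 4.08 and 2.9.

```python
import math
import statistics

x = [-4, -3, 2, 2, 5, 5, 5, 6, 8]
n = len(x)

sum_x = sum(x)                      # sum of the observations
sum_x2 = sum(v * v for v in x)      # sum of the squared observations

# Calculation formula: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation relative to the mean, in percent.
mean = sum_x / n
cv = std_dev / mean * 100

print(round(variance, 2), round(std_dev, 2), round(cv, 1))
```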

64
• Quartiles
• Percentiles
• Interquartile range

65
• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.

25% 25% 25% 25%


Min Q1 Q2 Q3 Max

66
Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q1 for the sample of nine measurements:
•Order the measurements
−7 −5 3 4 6 8 8 11 15
1 2 3 4 5 6 7 8 9

Position of Q1: $\frac{n+1}{4} = \frac{9+1}{4} = 2.5$th value
$0.5\,(3\text{rd} - 2\text{nd}) = 0.5\,[3 - (-5)] = 4$

Therefore $Q_1 = -5 + 4 = -1$
Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q3 for the sample of nine measurements:
•Order the measurements
−7 −5 3 4 6 8 8 11 15
1 2 3 4 5 6 7 8 9

Position of Q3: $\frac{3(n+1)}{4} = \frac{3(9+1)}{4} = 7.5$th value
$0.5\,(8\text{th} - 7\text{th}) = 0.5\,[11 - 8] = 1.5$

Therefore $Q_3 = 8 + 1.5 = 9.5$


Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Ordered: −7 −5 3 4 6 8 8 11 15

Interquartile range = Q3 – Q1
Q3 = 9.5
Q1 = −1
Interquartile range = 9.5 – (−1) = 10.5
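The quartile positions (n+1)/4 and 3(n+1)/4 used above correspond to the "exclusive" method of `statistics.quantiles` in the Python standard library, so the worked example can be checked as in this short sketch.

```python
import statistics

data = [4, 6, 11, -5, 8, 3, 15, 8, -7]

# method="exclusive" places the i-th quartile at position i(n+1)/4 in the
# ordered data and interpolates linearly between neighbouring values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The default method of `statistics.quantiles` is also "exclusive", but naming it explicitly makes the link to the (n+1) position rule visible.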
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.

10% 90%
Min P10 Max

80% 20%
Min P80 Max

50% 50%
Min P50 = Q2 Max 70
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:

−4 −3 2 2 5 5 5 6 8
1 2 3 4 5 6 7 8 9

Position of P20: $\frac{p(n+1)}{100} = \frac{20(9+1)}{100} = 2$nd value
Therefore P20 = −3
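The position rule p(n+1)/100 can be written as a small function. This is a sketch under stated assumptions: the function name `percentile` is hypothetical, linear interpolation between neighbouring values follows the quartile examples earlier, and clamping positions that fall outside the data to the smallest or largest value is an added assumption not covered by the slides.

```python
def percentile(values: list[float], p: float) -> float:
    """Value at position p(n+1)/100 in the ordered data."""
    ordered = sorted(values)
    position = p * (len(ordered) + 1) / 100
    whole = int(position)            # observation just below the position
    fraction = position - whole      # how far towards the next observation
    if whole < 1:                    # position before the first value (assumed clamp)
        return ordered[0]
    if whole >= len(ordered):        # position at or after the last value (assumed clamp)
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


data = [2, 5, 8, -3, 5, 2, 6, 5, -4]
p20 = percentile(data, 20)
p50 = percentile(data, 50)
print(p20, p50)
```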
72
Unit 1 Part C

73
• ARITHMETIC MEAN
– Data have been organised in a frequency
distribution table

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = n$

74
• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the
median class interval.
– Median can also be determined from the ogive.

$M_e = l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$

75
• MODE
– Class interval that has the largest frequency
value will contain the mode.
– Mode can be determined from the histogram or
by using the Mode formula

76
Measures of location for grouped data

Mode formula:

$M_o = l_i + \frac{(u_i - l_i)(f_i - f_{i-1})}{2f_i - f_{i-1} - f_{i+1}}$
$l_i$ = lower limit of the modal class interval
$u_i$ = upper limit of the modal class interval
$f_i$ = frequency of the modal class
$f_{i-1}$ = frequency of the class preceding the modal interval
$f_{i+1}$ = frequency of the class following the modal interval
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints.

Number of calls    Number of hours fi    xi
[2–under 5)        3                     3,5
[5–under 8)        4                     6,5
[8–under 11)       11                    9,5
[11–under 14)      13                    12,5
[14–under 17)      9                     15,5
[17–under 20)      6                     18,5
[20–under 23)      2                     21,5
n = 48
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
Using the frequencies fi and class midpoints xi of the seven classes:

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{597}{48} = 12{,}44$

Average number of calls per hour is 12,44.
n = 48
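The grouped mean can be reproduced from the class limits and frequencies, as in this sketch. Each class is stored as a (lower limit, upper limit, frequency) tuple, which is just one convenient representation.

```python
# (lower limit, upper limit, frequency) for each class of the call-centre data.
classes = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
           (14, 17, 9), (17, 20, 6), (20, 23, 2)]

n = sum(f for _, _, f in classes)

# Class midpoint x_i = (lower + upper) / 2; every observation in a class
# is treated as if it sat at the midpoint.
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
mean = sum_fx / n

print(n, sum_fx, round(mean, 2))
```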
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
To calculate the median for the sample of the 48 hours, determine the cumulative frequencies.

Number of calls    Number of hours fi    F
[2–under 5)        3                     3
[5–under 8)        4                     7
[8–under 11)       11                    18
[11–under 14)      13                    31
[14–under 17)      9                     40
[17–under 20)      6                     46
[20–under 23)      2                     48
n = 48

n/2 = 48/2 = 24
The first cumulative frequency ≥ 24 identifies the median class: [11–under 14).
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

Median $= l_i + \frac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i} = 11 + \frac{(14 - 11)(24 - 18)}{13} = 12{,}38$

50% of the time there were fewer than 12,38 calls per hour, and 50% of the time more than 12,38 calls per hour.
n = 48
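The median calculation can be sketched as a loop over the classes that stops at the first cumulative frequency ≥ n/2 and then interpolates inside that class. The tuple layout is the same illustrative choice of (lower limit, upper limit, frequency).

```python
classes = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
           (14, 17, 9), (17, 20, 6), (20, 23, 2)]

n = sum(f for _, _, f in classes)
half = n / 2

cum_before = 0          # cumulative frequency of the classes before the current one
median = None
for lower, upper, f in classes:
    if cum_before + f >= half:
        # Median class found: interpolate linearly within it.
        median = lower + (upper - lower) * (half - cum_before) / f
        break
    cum_before += f

print(round(median, 2))
```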
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
The median can be determined from the ogive.

[Figure: Ogive, number of calls at a call centre. x-axis: Number of calls; y-axis: Number of hours (cumulative)]
Locate n/2 = 48/2 = 24 on the vertical axis and read the median off the horizontal axis at A.
Median = 12,4
82
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
To calculate the mode for the sample of the 48 hours, draw the histogram of the frequencies fi.
n = 48
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

[Figure: Histogram, number of calls at a call centre. x-axis: Number of calls; y-axis: Number of hours]
The mode is read off the horizontal axis at A.
Mode = 12,3
84
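As an alternative to reading the graph, the mode formula for grouped data can be applied directly, as sketched below. Treating a missing neighbouring class as having frequency 0 is an assumption. With these frequencies the formula gives 12.0, a little below the value of 12,3 read off the histogram on the slide.

```python
lower_limits = [2, 5, 8, 11, 14, 17, 20]
freqs = [3, 4, 11, 13, 9, 6, 2]
width = 3   # every class has the same width, u_i - l_i

i = freqs.index(max(freqs))                          # modal class: largest frequency
f_i = freqs[i]
f_prev = freqs[i - 1] if i > 0 else 0                # class before (0 if none, assumed)
f_next = freqs[i + 1] if i < len(freqs) - 1 else 0   # class after (0 if none, assumed)

# Mo = l_i + (u_i - l_i)(f_i - f_{i-1}) / (2 f_i - f_{i-1} - f_{i+1})
mode = lower_limits[i] + width * (f_i - f_prev) / (2 * f_i - f_prev - f_next)

print(lower_limits[i], mode)
```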
Relationship between mean, median, and mode
• If a distribution is symmetrical:
– the mean, median and mode are the same
and lie at the centre of the distribution
• If a distribution is non-symmetrical:
– skewed to the left or to the right
– three measures differ
A positively skewed distribution (skewed to the right): Mode < Median < Mean
A negatively skewed distribution (skewed to the left): Mean < Median < Mode
Measures of dispersion
• Range
• Variance
• Standard deviation
• Coefficient of variation

86
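• Variance and standard deviation

Sample variance: $s^2 = \frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n-1}$

Sample standard deviation: $s = \sqrt{\frac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n-1}}$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– n = sample size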
• Variance and standard deviation

Population variance: $\sigma^2 = \frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}$

Population standard deviation: $\sigma = \sqrt{\frac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}}$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– N = population size

88
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

Number of calls    Number of hours fi    xi
[2–under 5)        3                     3,5
[5–under 8)        4                     6,5
[8–under 11)       11                    9,5
[11–under 14)      13                    12,5
[14–under 17)      9                     15,5
[17–under 20)      6                     18,5
[20–under 23)      2                     21,5
n = 48
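The slides do not show the worked variance for this table, so the following is only a sketch of how the grouped calculation formula with the n − 1 divisor would be applied to the 48 hours, treating them as a sample.

```python
import math

# (class midpoint x, frequency f) for the call-centre data.
table = [(3.5, 3), (6.5, 4), (9.5, 11), (12.5, 13),
         (15.5, 9), (18.5, 6), (21.5, 2)]

n = sum(f for _, f in table)
sum_fx = sum(f * x for x, f in table)
sum_fx2 = sum(f * x * x for x, f in table)

# Sample variance for grouped data: (sum f x^2 - (sum f x)^2 / n) / (n - 1)
sample_variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(sum_fx, sum_fx2, round(sample_variance, 2), round(sample_sd, 2))
```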
• Quartiles
• Percentiles
• Interquartile range

91
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
To calculate Q1 for the sample of the 48 hours:

Number of calls    Number of hours fi    F
[2–under 5)        3                     3
[5–under 8)        4                     7
[8–under 11)       11                    18
[11–under 14)      13                    31
[14–under 17)      9                     40
[17–under 20)      6                     46
[20–under 23)      2                     48
n = 48

n/4 = 48/4 = 12
The first cumulative frequency ≥ 12 identifies the Q1 class: [8–under 11).
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

$Q_1 = l_{Q_1} + \frac{(u_{Q_1} - l_{Q_1})\left(\frac{n}{4} - F_{Q_1 - 1}\right)}{f_{Q_1}} = 8 + \frac{(11 - 8)(12 - 7)}{11} = 9{,}36$

25% of the time there were fewer than 9,36 calls per hour, and 75% of the time more than 9,36 calls per hour.
n = 48
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
Position of Q3 = 3n/4 = 3(48)/4 = 36
The first cumulative frequency ≥ 36 identifies the Q3 class: [14–under 17).
n = 48
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

$Q_3 = l_{Q_3} + \frac{(u_{Q_3} - l_{Q_3})\left(\frac{3n}{4} - F_{Q_3 - 1}\right)}{f_{Q_3}} = 14 + \frac{(17 - 14)(36 - 31)}{9} = 15{,}67$

75% of the time there were fewer than 15,67 calls per hour, and 25% of the time more than 15,67 calls per hour.
n = 48
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
Q3 = 15,67
Q1 = 9,36
IQR = 15,67 – 9,36 = 6,31
n = 48
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.

10% 90%
Min P10 Max

80% 20%
Min P80 Max

50% 50%
Min P50 = Q2 Max 97
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.
Position of P60 = np/100 = 48(60)/100 = 28,8

Number of calls    Number of hours fi    F
[2–under 5)        3                     3
[5–under 8)        4                     7
[8–under 11)       11                    18
[11–under 14)      13                    31
[14–under 17)      9                     40
[17–under 20)      6                     46
[20–under 23)      2                     48
n = 48

The first cumulative frequency ≥ 28,8 identifies the P60 class: [11–under 14).
Example – The following data represents the number of
telephone calls received for two days at a municipal call centre.
The data was measured per hour.

$P_{60} = l_p + \frac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p} = 11 + \frac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$

60% of the time there were fewer than 13,49 calls per hour, and 40% of the time more than 13,49 calls per hour.
n = 48
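Q1, Q3 and P60 all follow the same pattern, so one function covers them. This is a sketch with a hypothetical function name, `grouped_percentile`. The interquartile range computed from unrounded quartiles comes out as about 6,30, whereas the slide's 6,31 is the difference of the rounded values 15,67 and 9,36.

```python
def grouped_percentile(classes: list[tuple[float, float, int]], p: float) -> float:
    """Interpolated p-th percentile from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = n * p / 100                 # position np/100
    cum_before = 0                       # cumulative frequency before the class
    for lower, upper, f in classes:
        if cum_before + f >= target:     # first cumulative frequency >= np/100
            return lower + (upper - lower) * (target - cum_before) / f
        cum_before += f
    raise ValueError("p must be between 0 and 100")


classes = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
           (14, 17, 9), (17, 20, 6), (20, 23, 2)]

q1 = grouped_percentile(classes, 25)
q3 = grouped_percentile(classes, 75)
p60 = grouped_percentile(classes, 60)
iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(p60, 2), round(iqr, 2))
```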
BOX-AND-WHISKER PLOT
Me = 12,38; Q1 = 9,36; Q3 = 15,67; IQR = 6,31
LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14

[Figure: box-and-whisker plot on an axis from 0 to 28; the box spans the IQR, with a distance of 1,5(IQR) marked on either side]
• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
The box-and-whisker plots

It is a graphical method that can indicate whether the distribution of
a data set is symmetric, positively skewed or negatively skewed.
It helps us picture a set of data.
Positively skewed – the distance from the Median to Q3 is larger
than the distance from the Median to Q1; or
the distance from Q3 to UL is larger than the
distance from Q1 to LL; or
the longer part of the box is to the right-hand
side of the median.
The box-and-whisker plots

Negatively skewed – the distance from the Median to Q1 is larger
than the distance from the Median to Q3; or
the distance from Q1 to LL is larger than the
distance from Q3 to UL; or
the longer part of the box is to the left-hand
side of the median.
Normality assessment methods
The Box-and-Whisker plot

Symmetric

Left skewed, mean < median Right skewed, mean > median
Outliers

An outlier is an observation in the dataset that differs significantly
from other observations.

An outlier can be extremely high or extremely low.

An outlier can be detected by using the Box-plot, standardized
residuals plot (Q-Q plot) or the outlier test.

At this level we’ll stick to the Box-plot method of detecting outliers.

Observation values that are larger than the UL are outliers.
Observation values that are lower than the LL are outliers.
The box-and-whisker plots

Example: Construct the box-plot of the data below.


46 49 53 51 49 59 54 55 55 54 50

46 49 49 50 51 53 54 54 55 55 59

Position of Q1: $\frac{n+1}{4} = \frac{11+1}{4} = 3$rd observation. Therefore, Q1 = 49
Position of Q2 = Median: $\frac{n+1}{2} = \frac{11+1}{2} = 6$th observation. Therefore, Q2 = 53
Position of Q3: $\frac{3(n+1)}{4} = \frac{3(11+1)}{4} = 9$th observation. Therefore, Q3 = 55

IQR = Q3 – Q1 = 55 – 49 = 6
UL = Q3 + 1.5IQR = 55 + 1.5 × 6 = 64
LL = Q1 – 1.5IQR = 49 – 1.5 × 6 = 40
The box-and-whisker plots
[Figure: Box-plot of the data]


The box-and-whisker plots

Example: The data below are the distances travelled by
employees of The Tile Shop from home to work.
Construct the box-plot of these data.
data = [1 3 6 12 13 15 19 20 23 26 26 26 29 29 29 30 32 34]

q1 = 13; q2 = 24.5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16


The box-and-whisker plots
Example: q1 = 13; q2 = 24.5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16

LL = Q1-1.5IQR = 13-1.5(16) = -11; UL = Q3 + 1.5IQR = 29 + 1.5(16) = 53


There are no outliers.
The box-and-whisker plots

Example: The data below are the distances travelled by employees of The Tile Shop
from home to work. Construct the box-plot of these data.
data = [1 3 6 12 13 15 19 20 23 26 26 26 29 29 29 30 32 55]
34 was changed to 55. Let’s see if we’ll have an outlier.
q1 = 13; q2 = 24.5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16
LL = 13 – 1.5(16) = -11; UL = 29 + 1.5(16) = 53
Since 55 > UL = 53, the value 55 is an outlier.
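The box-plot rule for outliers can be sketched as below. The quartiles q1 = 13 and q3 = 29 are taken as given from the slide rather than recomputed, and the function names are hypothetical.

```python
def fences(q1: float, q3: float) -> tuple[float, float]:
    """Lower and upper limits: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data: list[float], q1: float, q3: float) -> list[float]:
    """Values below the lower limit or above the upper limit."""
    ll, ul = fences(q1, q3)
    return [x for x in data if x < ll or x > ul]


original = [1, 3, 6, 12, 13, 15, 19, 20, 23, 26, 26, 26, 29, 29, 29, 30, 32, 34]
changed = original[:-1] + [55]   # 34 replaced by 55

# Quartiles as stated on the slide.
q1, q3 = 13, 29
print(fences(q1, q3))
print(find_outliers(original, q1, q3))
print(find_outliers(changed, q1, q3))
```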
References
Lombard C, van der Merwe L, Kele T and
Mouton S. 2012. Elementary Statistics for
Business and Economics.
