Unit 1 STUDENTS Introduction, Graphs and Descriptive

This document discusses introductory concepts in statistics including: - Descriptive statistics involves organizing and summarizing data to condense large volumes into summary measures. Statistical inference generalizes findings from a sample to a broader population. - The statistical process involves planning, data collection, analysis, conclusions, and decision making in a cyclical process. - Probability and non-probability sampling methods are discussed. Probability methods include simple random sampling, stratified random sampling, and systematic sampling which allow statistical inference about populations. Non-probability methods do not support statistical inference.

Uploaded by

Johannah Manoko

PowerPoint slides from Lombard C, van der Merwe L, Kele T & Mouton A also used
• Statistics is the science of collecting, organising and interpreting numerical facts.
• It is about gaining information from numerical data, or making sense of data.
• Descriptive Statistics
– Organising and summarising data – condense large
volumes of data into a few summary measures.
• Statistical inference
– Generalises subset data findings to the broader
universe.

• Approach for the statistical process
The research process becomes a cycle:
PLANNING → DATA COLLECTION (primary and secondary sources) → EDITING and CODING → ANALYSIS (descriptive statistics, statistical inference) → CONCLUSIONS → DECISION-MAKING → back to PLANNING

• Basic concepts of Statistics
Population: ______________________________________________

Sample : __________________________________

Parameter : __________________________________

Statistic : ________________________________________

Variable : __________________________________

Data : _________________________________________

INTRO. TO STATISTICS
Basic concepts of Statistics

Data collection methods

Personal interviews – census, research, surveys
Telephone interviews – call centres, insurance companies
Self-administered questionnaires – application forms
Direct observation/experimentation – laboratories, cars passing at a traffic robot
• Sampling methods
– Probability sampling
• Every element has a known, non-zero probability of being selected (the same probability for all elements in simple random sampling).
• Supports unbiased inference about the population.
– Non-probability sampling
• Elements from the population are not selected at random.
• The elements are selected without knowing the probability of being selected as part of the sample.
• We cannot use the results of these samples to draw statistical conclusions about the population.
• Sampling methods – Probability sampling
– Simple random sampling
• Number the elements of the population from 1 to N.
• Divide the digits in the random number table into groups with the same number of digits as the population size (N).
• Select a random starting point in the random number table.
• From the starting point, read systematically in any direction.
• Find n random numbers from 1 to N, skipping duplicates.
• Identify each of the chosen random numbers in the population.
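In software, the random number table is usually replaced by a pseudo-random number generator. The sketch below is a minimal illustration, assuming the population has already been numbered 1 to N; the function name `simple_random_sample` and the seed value are illustrative and do not come from the slides.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Draw n distinct element numbers from 1..N, each equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    # rng.sample draws without replacement, so no duplicates can occur
    return sorted(rng.sample(range(1, N + 1), n))

# Illustration: a population of N = 210 elements and a sample of n = 5
chosen = simple_random_sample(210, 5, seed=1)
print(chosen)
```

Each run with a different seed gives a different, equally valid sample; a table-based draw by hand will generally not match the generator's output.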
INTRO. TO STATISTICS
Sampling Methods
(i) Simple random sampling
Example
Suppose there are 210 people on the electoral roll in a retirement village. Select a
random sample of 5 people from the village using the table of random numbers
given below. Start at the top left corner and read the random numbers vertically
from top to bottom. Circle (on the table) the random numbers.
Row 1: 58 22 37 15 22 41 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 45 16 80 64 71 01 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 14 68 57 27 40 08 16 41 89 75 46 83 12 20 50 03 73 77 24
Row 4: 02 49 74 33 14 12 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 17 87 57 75 01 11 81 00 86 28 26 23 68 11 76 65 64 19 47 83

_____; _____; _____; _____; _____

Sampling Methods
(i) Simple random sampling
Example
Suppose there are 3 500 people on the electoral roll in a retirement village. Select a random sample of 5 people from the village using the table of random numbers given below. Start at row 4, column 5 and read the random numbers horizontally from left to right. Circle (on the table) the random numbers.

Row 1: 61 31 37 15 22 21 78 47 14 51 94 38 69 47 58 04 29 85 48 58
Row 2: 44 06 80 64 71 41 32 97 73 05 04 33 96 16 42 50 91 30 86 66
Row 3: 19 54 68 57 72 10 08 16 41 89 75 46 84 12 27 50 03 73 77 24
Row 4: 32 48 74 33 24 22 79 32 55 23 63 66 78 83 60 61 73 84 85 52
Row 5: 27 87 57 75 35 11 86 00 86 28 26 23 68 11 76 65 64 19 47 83

_____; _____; _____; _____; _____

• Sampling methods – Probability sampling
– Stratified random sampling
• Population is heterogeneous with respect to the variable under study.
• Population is divided into homogeneous sub-populations called strata: $N = N_1 + N_2 + \dots + N_k$ (k = number of strata).
• Sample size from each stratum is proportional to stratum size: $n = n_1 + n_2 + \dots + n_k$ (k = number of strata).
• Draw a simple random sample from each stratum: $n_i = \dfrac{N_i}{N}\, n, \quad i = 1, \dots, k$
Sampling Methods
(ii) Stratified random sampling
Observations are not homogeneous.
Example: The nationwide number of staff of a certain insurance company is distributed as in the table below.

Stratum          Stratum size    Sample size
Sales persons    190 (N1)        n1 =
Drivers          90 (N2)         n2 =
Technicians      60 (N3)         n3 =
Cleaners         40 (N4)         n4 =
Total            N = 380         n = 10
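The proportional allocation rule $n_i = (N_i / N)\, n$ can be sketched in a few lines. This is only an illustration: the function name `proportional_allocation` is made up here, and the total sample size of 38 is a hypothetical value chosen so that every stratum gets a whole number; it is not the value used on the slide.

```python
def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())  # population size N = N1 + ... + Nk
    return {name: n * N_i / N for name, N_i in strata_sizes.items()}

# Stratum sizes from the insurance company example
staff = {"Sales persons": 190, "Drivers": 90, "Technicians": 60, "Cleaners": 40}

# n = 38 is a hypothetical total sample size (assumption for illustration)
allocation = proportional_allocation(staff, 38)
for stratum, n_i in allocation.items():
    print(stratum, n_i)
```

When the allocations are not whole numbers they have to be rounded, and the rounded values should be checked so that they still add up to n.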
Sampling Methods
(iii) Systematic sampling

English definition – acting according to a fixed plan or system

Example: Draw a sample of 6 from a population of 56.

N = 56; n = 6
Interval: k = _____ = _____ = _____

Select a starting point; let's choose 5.

5; 15; _____; _____; _____; _____

Example: Draw a sample of 5 from a population of 85.

N = 85; n = 5
Interval: k = _____ = _____ = _____

Select a starting point; let's choose 11.

11; 28; _____; _____; _____
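A systematic sample can be generated mechanically once the interval and starting point are fixed. The sketch below assumes the interval is k = N/n rounded up, which is an inference from the partial answer "5; 15" in the first example rather than something the slides state; the function name `systematic_sample` is illustrative.

```python
import math

def systematic_sample(N, n, start):
    """Select every k-th element, beginning at position `start`."""
    k = math.ceil(N / n)  # assumption: the interval is rounded up
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    picks = [start + i * k for i in range(n)]
    if picks[-1] > N:
        raise ValueError("this start runs past the end of the population")
    return picks

first = systematic_sample(56, 6, start=5)    # first example on the slides
second = systematic_sample(85, 5, start=11)  # second example on the slides
print(first)
print(second)
```

Raising an error when the last pick exceeds N is one design choice; wrapping around to the start of the list is another common convention.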
• Sampling methods – Non-probability sampling
– Convenience sampling
• Not representative of the target population.
• Items are selected because they are easy to find, inexpensive to reach, or self-selected.
– Quota sampling
• Population is divided into sub-classes according to a certain characteristic.
• A non-random method is used to select a sample from each sub-class.
• It is a technique of convenience.
• Researcher attempts to fill the quota quickly.
• Sample is not representative of the population.
– Judgement sampling
• Elements from the population are chosen by the judgement of the researcher.
• The probability that an element will be chosen cannot be calculated.
• Sample is biased.
– Snowball sampling
• Is used where sampling units are difficult to locate and identify.
• Find a person who fits the profile of characteristics of the study.
• From this person, obtain names and locations of others who will fit the profile.

DIFFERENT TYPES OF DATA
• QUANTITATIVE (numerical scale)
• QUALITATIVE (categorical)

UNIT 1 PART B

• Need to gain information from data.
• Data must be organised and reduced.
• Descriptive statistics
– Organising data into tables, charts and graphs.
– Numerical calculations.
• Single variable data
• Raw data
– Collected data before it is grouped or ranked.
Organising and graphing qualitative data in a frequency distribution table.
Example:
The data below show the gender of 50 employees and the department in which they work at ABC Ltd.
Key: HR – Human resources; Mark. – Marketing; Fin. – Finance; M – Male; F – Female

Emp. no.   Gender   Dept.    Emp. no.   Gender   Dept.    …..
1          M        HR       6          M        Fin.     …..
2          F        Mark.    7          M        Mark.    …..
3          M        Fin.     8          M        Fin.     …..
4          F        HR       9          F        HR       …..
5          F        Fin.     10         F        Fin.     …..

Tally of the ten employees shown:

     HR     Marketing   Finance
M    │      │           │││
F    ││     │           ││
Organising and graphing qualitative data in a frequency distribution table.

Department   f    F    f/n   Angle size
HR           14   14
Marketing    26   40
Finance      10   50
n = 50

[Pie chart: Employees at ABC. Human resources 28%, Marketing 52%, Finance 20%]
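The remaining columns of the table follow directly from the frequencies: the relative frequency is f/n and the pie-chart angle is that share of 360 degrees. A minimal sketch, with the function name `frequency_table` and the dictionary keys chosen only for illustration:

```python
def frequency_table(counts):
    """Build rows with frequency, cumulative frequency, f/n and pie angle."""
    n = sum(counts.values())
    rows, cumulative = [], 0
    for category, f in counts.items():
        cumulative += f
        rows.append({
            "category": category,
            "f": f,                  # frequency
            "F": cumulative,         # cumulative frequency
            "rel": f / n,            # relative frequency f/n
            "angle": 360 * f / n,    # pie-chart angle in degrees
        })
    return rows

table = frequency_table({"HR": 14, "Marketing": 26, "Finance": 10})
for row in table:
    print(row["category"], row["f"], row["F"],
          f"{row['rel']:.0%}", f"{row['angle']:.1f} degrees")
```

The relative frequencies agree with the percentages shown on the pie chart (28%, 52% and 20%).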
Bar graph of employees by department

[Bar graph: Employees at ABC. y-axis: Number of workers. Human resources 14, Marketing 26, Finance 10]

Organising and graphing qualitative data in a frequency
distribution table.

HR Marketing Finance Total

M 4 10 5 19
F 10 16 5 31
Total 14 26 10 50

Bar graph of employees by gender and department

[Bar graph: Employees at ABC. y-axis: Number of workers. Bars grouped by gender (Male, Female), one bar per department (Human resources, Marketing, Finance)]

Bar graph of employees by department and gender

[Bar graph: Employees at ABC. y-axis: Number of workers. Bars grouped by department (Human resources, Marketing, Finance), one bar per gender (Male, Female)]

Stacked bar graphs

         HR   Marketing   Finance   Total
M        4    10          5         19
F        10   16          5         31
Total    14   26          10        50

[Stacked bar graphs: Employees at ABC. y-axis: Number of workers. Left: one bar per gender (Male, Female), stacked by department (Human resources, Marketing, Finance). Right: one bar per department, stacked by gender (Male, Female)]
Organising and graphing quantitative data in a frequency
distribution table.
• Frequency table consists of a number of classes and each
observation is counted and recorded as the frequency of
the class.
• If n observations need to be classified into a frequency
table, determine:

Number of classes: $c = 1 + 3.3\log_{10} n$

Class width $= \dfrac{\text{largest} - \text{smallest}}{c}$
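Both formulas are easy to check numerically. The sketch below assumes that the number of classes and the class width are each rounded up to the next whole number; the slides do not state the rounding rule, so this is an assumption, as are the function names.

```python
import math

def number_of_classes(n):
    """c = 1 + 3.3 log10(n), rounded up (rounding rule is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(smallest, largest, c):
    """(largest - smallest) / c, rounded up to a convenient whole number."""
    return math.ceil((largest - smallest) / c)

# Illustration: n = 48 observations ranging from 2 to 22
c = number_of_classes(48)
width = class_width(2, 22, c)
print(c, width)
```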
Organising and graphing quantitative data in a frequency
distribution table.
Example:
The following data represent the length of time (in minutes) of a number of telephone calls received at a municipal call centre.

8 11 12 20 18 10 14 18 16 9
5 7 11 12 15 14 16 9 17 11
6 18 9 15 13 12 11 6 10 8
11 13 22 11 11 14 11 10 9 19
14 17 9 3 3 16 8 2
Step 1 : Sort observations in ascending order

2 3 3 5 6 6 7 8 8 8
9 9 9 9 9 10 10 10 11 11
11 11 11 11 11 11 12 12 12 13
13 14 14 14 14 15 15 16 16 16
17 17 18 18 18 19 20 22

Step 2: Determine the number of classes required

$c = 1 + 3.3\log(n) =$ _____

Step 3: Determine the class width

Class width $= \dfrac{x_{\max} - x_{\min}}{c} = \dfrac{22 - 2}{7} = 2.86 \approx 3$

Frequency distribution table
Time (x) f %f F
[ ; ) 6.3%
[ ; ) 8.3%
[ ; ) 22.9%
[ ; ) 27.1%
[ ; ) 18.8%
[ ; ) 12.5%
[ ; ) 4.2%
Total n = 48
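Counting the observations per class is mechanical once the first lower limit, the class width and the number of classes are fixed. The sketch below uses the 48 telephone-call observations from the example, with classes of the form [lower; upper) starting at 2, a width of 3 and 7 classes; the function name and the tuple layout are illustrative.

```python
calls = [8, 11, 12, 20, 18, 10, 14, 18, 16, 9,
         5, 7, 11, 12, 15, 14, 16, 9, 17, 11,
         6, 18, 9, 15, 13, 12, 11, 6, 10, 8,
         11, 13, 22, 11, 11, 14, 11, 10, 9, 19,
         14, 17, 9, 3, 3, 16, 8, 2]

def frequency_distribution(data, lower, width, c):
    """Return (lower, upper, f, %f, F) for c classes of the form [lower; upper)."""
    rows, cumulative = [], 0
    for i in range(c):
        lo = lower + i * width
        hi = lo + width
        f = sum(lo <= x < hi for x in data)  # upper limit is excluded
        cumulative += f
        rows.append((lo, hi, f, 100 * f / len(data), cumulative))
    return rows

rows = frequency_distribution(calls, lower=2, width=3, c=7)
for lo, hi, f, pct, F in rows:
    print(f"[{lo}; {hi})  f={f}  %f={pct:.2f}  F={F}")
```

The percentages agree with the %f column of the table once rounded to one decimal place.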
Histograms

[Figure: histogram]

Frequency polygon

[Figure: Frequency distribution polygon of the telephone calls received. x-axis: Time (0 to 30); y-axis: Number of calls]

Ogive

[Figure: Ogive of number of calls received at a call centre per hour. x-axis: Number of calls (2 to 23); y-axis: % cumulative number of hours (0 to 100)]

Readings from the ogive:
• 80% of the hours had fewer than 17 calls per hour.
• 20% of the hours had more than 17 calls per hour.
• 50% of the hours had fewer than 12 calls per hour.

Unit 1 Part C

• Properties to describe numerical data:
– Central tendency
– Dispersion
– Shape
• Measures calculated for:
– Sample data
• Statistics
– Entire population
• Parameters

Measures of location
• Arithmetic mean
• Median
• Mode

ARITHMETIC MEAN
– This is the most commonly used measure and is also called the mean.

Sample mean $= \dfrac{\text{sum of sample observations}}{\text{number of sample observations}}$

$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.
• MEDIAN
– Order the data from small to large.
– Half the values in the data set are smaller than the median.
– Half the values in the data set are larger than the median.
• Position of median
– If n is odd:
• The median is the (n+1)/2 th observation.
– If n is even:
• Calculate (n+1)/2.
• The median is the average of the two values on either side of position (n+1)/2.
• MODE
– It is the observation in the data set that occurs
the most frequently.
– If no observation(s) repeat(s) then there is no
mode.

Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
The mean of the sample of nine measurements is given by:

$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{\_\_\_\_}{9} =$ _______

Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the median of the sample of nine measurements.
• Order the measurements (odd number of observations)

Value:      −4  −3   2   2   5   5   5   6   8
Position:    1   2   3   4   5   6   7   8   9

(n+1)/2 = (9+1)/2 = 5th observation

Median = 5
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 3
Determine the median of the sample of ten measurements.
• Order the measurements (even number of observations)

Value:      −4  −3   2   2   3   5   5   5   6   8
Position:    1   2   3   4   5   6   7   8   9  10

(n+1)/2 = (10+1)/2 = 5,5th observation

Median = (3+5)/2 = 4
Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine the mode of the sample of nine measurements.
•Order the measurements

−4 −3 2 2 5 5 5 6 8
Mode = ______

Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4 2
Determine the mode of the sample of ten measurements.
•Order the measurements

−4 −3 2 2 2 5 5 5 6 8
Mode = 2 and 5
•Multimodal
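The three measures of location can be computed directly from their definitions. The sketch below follows the position rule (n+1)/2 for the median and returns every most frequent value for the mode (an empty list when nothing repeats); the function names are illustrative.

```python
from collections import Counter

def mean(xs):
    return sum(xs) / len(xs)

def median(xs):
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]           # the (n+1)/2-th observation
    # even n: average the two values on either side of position (n+1)/2
    return (s[n // 2 - 1] + s[n // 2]) / 2

def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []                            # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == top)

nine = [2, 5, 8, -3, 5, 2, 6, 5, -4]
print(mean(nine), median(nine), modes(nine))
print(median(nine + [3]))                    # ten measurements
print(modes(nine + [2]))                     # two modes
```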

Measures of dispersion/spread
• Range
• Variance
• Standard deviation
• Coefficient of variation

• Range
– The range of a set of measurements is the
difference between the largest and smallest
values in the data set.
– Its major advantage is the ease with which it
can be computed.
– Its major shortcoming is its failure to provide
information on the dispersion of the values
between the two end points.

• Variance and standard deviation
Determine how far the observations are from their mean.

Sample variance $= s^2 = \dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$

Sample standard deviation $= s = \sqrt{\dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}}$

• Coefficient of variation
– Measures the standard deviation relative to the
mean.
– It is expressed as a percentage.
– Used to compare samples that are measured in
different units.
$CV = \dfrac{s}{\bar{x}} \times 100$

Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The means are the same but the dispersion of data set 1 is much larger than the dispersion of data set 2.

1st: $\bar{x} = \dfrac{26}{9} \approx 2{,}9$

2nd: $\bar{x} = \dfrac{23}{8} \approx 2{,}9$

[Figure: both data sets plotted on a number line from −5 to 9]
Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
Range = Largest value – smallest value = 8 – (−4) = 12

Calculation formula for sample variance: $s^2 = \dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$

Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8

Calculation formula for sample variance:

$s^2 = \dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1} = \dfrac{208 - \frac{1}{9}(26)^2}{9-1} = 16.61$

Calculation formula for sample standard deviation:

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}} = \sqrt{\dfrac{208 - \frac{1}{9}(26)^2}{9-1}} = \sqrt{16.61} = 4.08$

Example – Given the following data sets:
1st: −4 −3 2 2 5 5 5 6 8
2nd: 0 1 2 3 3 4 5 5
The coefficient of variation of the measurements in the 1st data set is given by:

$CV = \dfrac{s}{\bar{x}} \times 100\% = \dfrac{4.08}{2.9} \times 100 =$ _____

64
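The calculation formulas for the variance, standard deviation and coefficient of variation translate directly into code. A minimal sketch using the two data sets from the examples; the function names are illustrative, and the code works with unrounded intermediate values, so its coefficient of variation differs slightly from a hand calculation based on 4.08 and 2.9.

```python
import math

def sample_variance(xs):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def sample_std(xs):
    return math.sqrt(sample_variance(xs))

def coefficient_of_variation(xs):
    """Standard deviation as a percentage of the mean."""
    return sample_std(xs) / (sum(xs) / len(xs)) * 100

first = [-4, -3, 2, 2, 5, 5, 5, 6, 8]
second = [0, 1, 2, 3, 3, 4, 5, 5]

print(round(sample_variance(first), 2), round(sample_std(first), 2))
print(round(sample_variance(second), 2), round(sample_std(second), 2))
print(round(coefficient_of_variation(first), 1))
```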
• Quartiles
• Percentiles
• Interquartile range

• QUARTILES
– Order data in ascending order.
– Divide data set into four quarters.

25% 25% 25% 25%

Min Q1 Q2 Q3 Max

Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q1 for the sample of nine measurements:
• Order the measurements

Value:      −7  −5   3   4   6   8   8  11  15
Position:    1   2   3   4   5   6   7   8   9

Position of Q1: $\dfrac{n+1}{4} = \dfrac{9+1}{4} = 2.5$th value

$0.5\,(3\text{rd} - 2\text{nd}) = 0.5\,[3 - (-5)] = 4$

Therefore Q1 = −5 + 4 = −1
Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Determine Q3 for the sample of nine measurements:
• Order the measurements

Value:      −7  −5   3   4   6   8   8  11  15
Position:    1   2   3   4   5   6   7   8   9

Position of Q3: $\dfrac{3(n+1)}{4} = \dfrac{3(9+1)}{4} = 7.5$th value

$0.5\,(8\text{th} - 7\text{th}) = 0.5\,(11 - 8) = 1.5$

Therefore Q3 = 8 + 1.5 = 9.5

Example – Given the following data set:
4 6 11 −5 8 3 15 8 −7
Ordered: −7 −5 3 4 6 8 8 11 15

Interquartile range = Q3 – Q1
Q3 = 9.5
Q1 = −1
Interquartile range = Q3 – Q1
= 9.5 – (−1)
= 10.5
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.

10% 90%
Min P10 Max

80% 20%
Min P80 Max

50% 50%
Min P50 = Q2 Max

Example – Given the following data set:
2 5 8 −3 5 2 6 5 −4
Determine P20 for the sample of nine measurements:

Value:      −4  −3   2   2   5   5   5   6   8
Position:    1   2   3   4   5   6   7   8   9

Position of P20: $\dfrac{p(n+1)}{100} = \dfrac{20(9+1)}{100} = 2$nd value

Therefore P20 = −3
72
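Quartiles and percentiles of raw data share one rule: find the position p(n+1)/100 in the ordered data and interpolate between the neighbouring values when the position is not a whole number. The sketch below implements that rule as used in the examples above; statistical software often uses other position rules, so its results can differ. The function name `percentile` is illustrative.

```python
def percentile(xs, p):
    """p-th percentile using the position rule p(n+1)/100 with interpolation."""
    s = sorted(xs)
    n = len(s)
    pos = p * (n + 1) / 100
    k = int(pos)          # whole part of the position
    frac = pos - k        # fractional part of the position
    if k < 1:
        return s[0]       # position before the first value
    if k >= n:
        return s[-1]      # position at or beyond the last value
    return s[k - 1] + frac * (s[k] - s[k - 1])

a = [4, 6, 11, -5, 8, 3, 15, 8, -7]
q1, q3 = percentile(a, 25), percentile(a, 75)
print(q1, q3, q3 - q1)

b = [2, 5, 8, -3, 5, 2, 6, 5, -4]
print(percentile(b, 20))
```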
Unit 1 Part C

• ARITHMETIC MEAN
– Data have been organised in a frequency distribution table

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$, where $\sum f_i = n$

• MEDIAN
– Data is given in a frequency table.
– First cumulative frequency ≥ n/2 will indicate the median class interval.
– Median can also be determined from the ogive.

$M_e = l_i + \dfrac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i}$

• MODE
– Class interval that has the largest frequency value will contain the mode.
– Mode can be determined from the histogram or by using the Mode formula.

Measures of location for grouped data

Mode formula:

$M_o = l_i + \dfrac{(u_i - l_i)(f_i - f_{i-1})}{2f_i - f_{i-1} - f_{i+1}}$

$l_i$ = lower limit of the modal class interval
$u_i$ = upper limit of the modal class interval
$f_i$ = frequency of the modal class
$f_{i-1}$ = frequency of the class preceding the modal interval
$f_{i+1}$ = frequency of the class following the modal interval
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the mean for the sample of the 48 hours, determine the class midpoints $x_i$:

Number of calls    Number of hours (f_i)    Midpoint (x_i)
[2–under 5)        3                        3,5
[5–under 8)        4                        6,5
[8–under 11)       11                       9,5
[11–under 14)      13                       12,5
[14–under 17)      9                        15,5
[17–under 20)      6                        18,5
[20–under 23)      2                        21,5
n = 48

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{597}{48} = 12{,}44$

The average number of calls per hour is 12,44.
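The grouped mean weights each class midpoint by its frequency. A minimal sketch, assuming each class is given as a (lower limit, upper limit, frequency) triple; this representation and the function name are illustrative choices.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * midpoint) / sum(f)."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total / n

# (lower limit, upper limit, number of hours) for the call-centre table
calls = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
         (14, 17, 9), (17, 20, 6), (20, 23, 2)]

print(grouped_mean(calls))
```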
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the median for the sample of the 48 hours, determine the cumulative frequencies F:

Number of calls    Number of hours (f_i)    F
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

n/2 = 48/2 = 24
The first cumulative frequency ≥ 24 is 31, so the median class is [11–under 14).

Median $= l_i + \dfrac{(u_i - l_i)\left(\frac{n}{2} - F_{i-1}\right)}{f_i} = 11 + \dfrac{(14 - 11)(24 - 18)}{13} = 12{,}38$

For 50% of the hours there were fewer than 12,38 calls per hour, and for 50% of the hours more than 12,38 calls per hour.
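The median formula for grouped data can be sketched as a single pass over the classes, accumulating frequencies until n/2 is reached. The (lower, upper, frequency) triples and the function name are illustrative choices, not notation from the slides.

```python
def grouped_median(classes):
    """Median of grouped data by interpolating inside the median class."""
    n = sum(f for _, _, f in classes)
    cumulative = 0                      # F_{i-1}: cumulative frequency so far
    for lo, hi, f in classes:
        if cumulative + f >= n / 2:     # first cumulative frequency >= n/2
            return lo + (hi - lo) * (n / 2 - cumulative) / f
        cumulative += f

calls = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
         (14, 17, 9), (17, 20, 6), (20, 23, 2)]

print(round(grouped_median(calls), 2))
```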
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
The median can be determined from the ogive.

[Figure: ogive "Number of calls at a call centre". x-axis: Number of calls (2 to 23); y-axis: Number of hours (0 to 48). The horizontal line at n/2 = 48/2 = 24 meets the ogive above point A on the x-axis]

Read at A: Median = 12,4
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate the mode for the sample of the 48 hours, draw the histogram:

Number of calls    Number of hours (f_i)    Midpoint (x_i)
[2–under 5)        3                        3,5
[5–under 8)        4                        6,5
[8–under 11)       11                       9,5
[11–under 14)      13                       12,5
[14–under 17)      9                        15,5
[17–under 20)      6                        18,5
[20–under 23)      2                        21,5
n = 48

[Figure: histogram "Number of calls at a call centre". x-axis: Number of calls (2 to 23); y-axis: Number of hours (0 to 14). Point A is marked on the x-axis inside the modal class]

Read at A: Mode = 12,3
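The mode formula can also be evaluated directly from the frequency table. The sketch below assumes a single modal class and treats a missing neighbouring class as having frequency 0; both are assumptions made here, and the function name is illustrative. A value read from a hand-drawn histogram is an estimate and may differ slightly from the value the formula gives.

```python
def grouped_mode(classes):
    """Mode of grouped data using the modal class and its two neighbours."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                  # modal class (first if tied)
    lo, hi, f_i = classes[i]
    f_prev = freqs[i - 1] if i > 0 else 0        # class before the modal class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0  # class after it
    return lo + (hi - lo) * (f_i - f_prev) / (2 * f_i - f_prev - f_next)

calls = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
         (14, 17, 9), (17, 20, 6), (20, 23, 2)]

print(grouped_mode(calls))
```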
Relationship between mean, median, and mode
• If a distribution is symmetrical:
– the mean, median and mode are the same and lie at the centre of the distribution
• If a distribution is non-symmetrical:
– skewed to the left or to the right
– three measures differ

[Figures: a symmetrical distribution (Mean = Median = Mode); a positively skewed distribution (skewed to the right), with Mode, Median, Mean from left to right; a negatively skewed distribution (skewed to the left), with Mean, Median, Mode from left to right]

Measures of dispersion
• Range
• Variance
• Standard deviation
• Coefficient of variation

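• Variance and standard deviation

Sample variance $= s^2 = \dfrac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n-1}$

Sample standard deviation $= s = \sqrt{\dfrac{\sum f x^2 - \frac{1}{n}\left(\sum f x\right)^2}{n-1}}$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– n = sample size

• Variance and standard deviation

Population variance $= \sigma^2 = \dfrac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}$

Population standard deviation $= \sigma = \sqrt{\dfrac{\sum f x^2 - \frac{1}{N}\left(\sum f x\right)^2}{N}}$

Where:
– f = frequencies of class intervals
– x = class midpoints of class intervals
– N = population size
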
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Number of calls    Number of hours (f_i)    Midpoint (x_i)
[2–under 5)        3                        3,5
[5–under 8)        4                        6,5
[8–under 11)       11                       9,5
[11–under 14)      13                       12,5
[14–under 17)      9                        15,5
[17–under 20)      6                        18,5
[20–under 23)      2                        21,5
n = 48
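For grouped data the variance is computed from the class midpoints weighted by the frequencies. The sketch below assumes the sample version, with divisor n − 1, mirroring the ungrouped calculation formula; the class representation and function names are illustrative.

```python
import math

def grouped_sample_variance(classes):
    """s^2 = (sum f x^2 - (sum f x)^2 / n) / (n - 1), x = class midpoint."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

calls = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
         (14, 17, 9), (17, 20, 6), (20, 23, 2)]

variance = grouped_sample_variance(calls)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

For a population, the same sums are used but the final division is by N instead of n − 1.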
• Quartiles
• Percentiles
• Interquartile range

Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate Q1 for the sample of the 48 hours:

Number of calls    Number of hours (f_i)    F
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

n/4 = 48/4 = 12
The first cumulative frequency ≥ 12 is 18, so Q1 lies in the class [8–under 11).

$Q_1 = l_{Q_1} + \dfrac{(u_{Q_1} - l_{Q_1})\left(\frac{n}{4} - F_{Q_1 - 1}\right)}{f_{Q_1}} = 8 + \dfrac{(11 - 8)(12 - 7)}{11} = 9{,}36$

For 25% of the hours there were fewer than 9,36 calls per hour, and for 75% of the hours more than 9,36 calls per hour.
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate Q3 for the sample of the 48 hours:

Number of calls    Number of hours (f_i)    F
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

3n/4 = 3(48)/4 = 36
The first cumulative frequency ≥ 36 is 40, so Q3 lies in the class [14–under 17).

$Q_3 = l_{Q_3} + \dfrac{(u_{Q_3} - l_{Q_3})\left(\frac{3n}{4} - F_{Q_3 - 1}\right)}{f_{Q_3}} = 14 + \dfrac{(17 - 14)(36 - 31)}{9} = 15{,}67$

For 75% of the hours there were fewer than 15,67 calls per hour, and for 25% of the hours more than 15,67 calls per hour.
Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.

Q3 = 15,67
Q1 = 9,36

IQR = Q3 – Q1
= 15,67 – 9,36
= 6,31
• PERCENTILES
– Order data in ascending order.
– Divide data set into hundred parts.

10% 90%
Min P10 Max

80% 20%
Min P80 Max

50% 50%
Min P50 = Q2 Max

Example – The following data represent the number of telephone calls received for two days at a municipal call centre. The data was measured per hour.
To calculate P60 for the sample of the 48 hours:

Number of calls    Number of hours (f_i)    F
[2–under 5)        3                        3
[5–under 8)        4                        7
[8–under 11)       11                       18
[11–under 14)      13                       31
[14–under 17)      9                        40
[17–under 20)      6                        46
[20–under 23)      2                        48
n = 48

np/100 = 48(60)/100 = 28,8
The first cumulative frequency ≥ 28,8 is 31, so P60 lies in the class [11–under 14).

$P_{60} = l_p + \dfrac{(u_p - l_p)\left(\frac{np}{100} - F_{p-1}\right)}{f_p} = 11 + \dfrac{(14 - 11)(28{,}8 - 18)}{13} = 13{,}49$

For 60% of the hours there were fewer than 13,49 calls per hour, and for 40% of the hours more than 13,49 calls per hour.
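Q1, Q3 and any percentile of grouped data use the same interpolation formula with a different target count np/100. A minimal sketch under that reading, with classes given as (lower, upper, frequency) triples and an illustrative function name:

```python
def grouped_percentile(classes, p):
    """p-th percentile of grouped data (0 < p <= 100) by interpolation."""
    n = sum(f for _, _, f in classes)
    target = n * p / 100                 # np/100
    cumulative = 0                       # cumulative frequency before the class
    for lo, hi, f in classes:
        if cumulative + f >= target:     # first cumulative frequency >= np/100
            return lo + (hi - lo) * (target - cumulative) / f
        cumulative += f

calls = [(2, 5, 3), (5, 8, 4), (8, 11, 11), (11, 14, 13),
         (14, 17, 9), (17, 20, 6), (20, 23, 2)]

q1 = grouped_percentile(calls, 25)
q3 = grouped_percentile(calls, 75)
p60 = grouped_percentile(calls, 60)
print(round(q1, 2), round(q3, 2), round(p60, 2))
```

Because the code keeps unrounded quartiles, an interquartile range computed from it can differ in the second decimal from one obtained by subtracting the rounded values.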
BOX-AND-WHISKER PLOT
Me = 12,38
Q3 = 15,67
Q1 = 9,36
IQR = 6,31

LL = Q1 – 1,5(IQR) = 9,36 – 1,5(6,31) = –0,11
UL = Q3 + 1,5(IQR) = 15,67 + 1,5(6,31) = 25,14

[Figure: box-and-whisker plot on an axis from 0 to 28, showing the box from Q1 to Q3 (width IQR) with the median inside, and a distance of 1,5(IQR) marked on each side of the box]

• Any value smaller than −0,11 will be an outlier.
• Any value larger than 25,14 will be an outlier.
The box-and-whisker plots

It is a graphical method that can indicate whether the distribution of a data set is symmetric, positively skewed or negatively skewed. It helps us picture a set of data.

Positively skewed – the distance from the Median to Q3 is larger than the distance from the Median to Q1; or the distance from Q3 to UL is larger than the distance from Q1 to LL; or the longer part of the box is to the right-hand side of the median.

Negatively skewed – the distance from the Median to Q1 is larger than the distance from the Median to Q3; or the distance from Q1 to LL is larger than the distance from Q3 to UL; or the longer part of the box is to the left-hand side of the median.
Normality assessment methods
The Box-and-Whisker plot

[Figure: three box-and-whisker plots: symmetric; left skewed (mean < median); right skewed (mean > median)]
Outliers

An outlier is an observation in the dataset that differs significantly from other observations.

An outlier can be extremely high or extremely low.

An outlier can be detected by using the Box-plot, standardized residuals plot (Q-Q plot) or the outlier test.

At this level we'll stick to the Box-plot method of detecting outliers.

Observation values that are larger than the UL are outliers.
Observation values that are lower than the LL are outliers.
The box-and-whisker plots

Example: Construct the box-plot of the data below.

46 49 53 51 49 59 54 55 55 54 50

Ordered: 46 49 49 50 51 53 54 54 55 55 59

Position of Q1: $\dfrac{n+1}{4} = \dfrac{11+1}{4} = 3$rd observation. Therefore, Q1 = 49

Position of Q2 = Median: $\dfrac{n+1}{2} = \dfrac{11+1}{2} = 6$th observation. Therefore, Q2 = 53

Position of Q3: $\dfrac{3(n+1)}{4} = \dfrac{3(11+1)}{4} = 9$th observation. Therefore, Q3 = 55

IQR = Q3 – Q1 = 55 – 49 = 6
UL = Q3 + 1.5IQR = 55 + 1.5 x 6 = 64
LL = Q1 – 1.5IQR = 49 – 1.5 x 6 = 40

[Figure: Box-plot of the data]

The box-and-whisker plots

Example: The data below are the distances travelled by employees of The Tile Shop from home to work. Construct the box-plot of these data.

data = [1 3 6 12 13 15 19 20 23 26 26 26 29 29 29 30 32 34]

q1 = 13; q2 = 24.5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16

LL = Q1 – 1.5IQR = 13 – 1.5(16) = −11; UL = Q3 + 1.5IQR = 29 + 1.5(16) = 53

There are no outliers.
The box-and-whisker plots

Example: The data below are the distances travelled by employees of The Tile Shop from home to work. Construct the box-plot of these data.
data = [1 3 6 12 13 15 19 20 23 26 26 26 29 29 29 30 32 55]
34 was changed to 55. Let's see if we'll have an outlier.
q1 = 13; q2 = 24.5; q3 = 29; IQR = q3 – q1 = 29 – 13 = 16
LL = 13 – 1.5(16) = −11; UL = 29 + 1.5(16) = 53
Since 55 is larger than UL = 53, the value 55 is an outlier.
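The box-plot rule for outliers is simple to automate once the quartiles are known. The sketch below takes q1 and q3 as inputs, using the values quoted in the example, rather than recomputing them, because different quartile rules can give slightly different quartiles; the function names are illustrative.

```python
def fences(q1, q3):
    """Lower and upper limits: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(data, q1, q3):
    """Values lying below the lower limit or above the upper limit."""
    ll, ul = fences(q1, q3)
    return [x for x in data if x < ll or x > ul]

original = [1, 3, 6, 12, 13, 15, 19, 20, 23, 26, 26, 26, 29, 29, 29, 30, 32, 34]
changed = original[:-1] + [55]           # 34 replaced by 55

print(fences(13, 29))
print(outliers(original, 13, 29))
print(outliers(changed, 13, 29))
```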
References
Lombard C, van der Merwe L, Kele T and
Mouton S. 2012. Elementary Statistics for
Business and Economics.
