QAB II Lecture Notes 2021
Sampling Techniques
Systematic Sampling
Description: The first item is selected at random, and thereafter every kth item is selected
Method: Calculation
Formula
Business Examples
Advantages
Disadvantages
Uses or applications of the technique
Stratified Sampling
Description: Homogeneous strata, random sampling within each stratum
Cluster Sampling
Description: Heterogeneous clusters
Overview of Statistics
Before examining the broad areas of statistics, it is necessary to become familiar with certain
terms and concepts used extensively in the subject.
1. Random Variable - A characteristic being measured or observed is called a variable, for
example weight or height. Since a variable can take on different values at each
measurement or observation, it is termed a random variable, for example the distance
travelled per day by a delivery truck.
2. Sampling Unit - A sampling unit is the item or individual being measured or counted
with respect to the random variable under study. For example, if the random variable is
distance, the sampling unit is each delivery truck.
3. Population - The collection of all observations of a random variable under study, and
the one about which the researcher is trying to draw conclusions. A population must be
defined in very specific terms to include only those sampling units with characteristics
that are relevant to the problem, for example all the delivery vehicles in Zimbabwe.
4. Sample - Not every member of the population is observable or measurable, mainly for
reasons of cost and time. A subset of the population on which observations are made
or measurements taken is referred to as a sample. For example, a random sample of two
hundred delivery vehicles is selected and their daily distances travelled are recorded.
There are two major components in the discipline of statistics:
a) Descriptive Statistics - It aims to identify the essential characteristics of a random
variable and produce a profile of its behaviour. This is achieved through summary
measures.
b) Inferential Statistics - This generalizes sample findings to the broader population. It is
that area of statistics which extends the information extracted from a sample to the
actual environment.
1. Qualitative and Quantitative Data
2. Scales of Measurement
a) Nominal-scaled data: categorizes data, for example gender and profession.
b) Ordinal-scaled data: ranks data, for example Likert-type scales.
Statement: NUST is the best university in Zimbabwe. Options: strongly disagree,
neutral, agree, and strongly agree.
c) Interval-scaled data
d) Ratio-scaled data
DISCRETE DATA
A random variable whose observations can take only specific values which are integers
(whole numbers) is referred to as a discrete random variable. In such instances certain
values are valid whilst others are invalid, e.g. the number of cars in a parking lot at a
given time, the number of students in a class or the number of employees in an
organization.
CONTINUOUS DATA
A random variable whose observations can take on any value in an interval is said to
generate continuous data for example the mass of a person, distance travelled, and time
taken to travel to work daily.
4) Forecasting
a) Regression and Correlation
Section 1
PRESENTATION OF DATA
UNGROUPED DATA
There are a number of ways in which ungrouped data can be presented, such as frequency
distribution tables and stem-and-leaf plots.
A frequency distribution is a table which summarizes data with their corresponding frequencies.
The following data correspond to the performance of students in a test. Construct a
frequency distribution table to illustrate the information. The random variable is marks,
represented by X.
X1=10 X6=20 X11=25 X16=21
X2=20 X7=27 X12=15 X17=80
X3=25 X8=80 X13=17 X18=10
X4=15 X9=15 X14=25 X19=27
X5=10 X10=20 X15=30 X20=21
Mark    Frequency
10      3
15      3
17      1
20      3
21      2
25      3
27      2
30      1
80      2
∑f = 20
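The tallying step can be sketched in Python. This is a minimal illustration, not part of the original notes: it uses the twenty marks X1 to X20 exactly as listed above, and the helper name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Test marks X1..X20 exactly as listed in the notes.
marks = [10, 20, 25, 15, 10, 20, 27, 80, 15, 20,
         25, 15, 17, 25, 30, 21, 80, 10, 27, 21]

def frequency_distribution(values):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    return sorted(Counter(values).items())

table = frequency_distribution(marks)
for mark, freq in table:
    print(f"{mark:<8}{freq}")
print("Total frequency =", sum(freq for _, freq in table))
```

Summing the frequency column is a quick check that no observation has been missed: the total must equal the number of observations.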
55 15 25 50 28 66 73 25 24 47 10 45 54
55 55 43 57 53 65 38 30 29 64 12 70
16 24 25 40 15 36 53 57 24 27
Rough Draft
Stem  Leaf
1     0 6 5 2 5
2     9 4 5 5 5 4 4 8 7
3     0 6 8
4     3 5 0 7
5     5 7 3 4 3 7 0 5 5
6     6 4 5
7     3 0
Final Draft (leaves in ascending order)
Stem  Leaf
1     0 2 5 5 6
2     4 4 4 5 5 5 7 8 9
3     0 6 8
4     0 3 5 7
5     0 3 3 4 5 5 5 7 7
6     4 5 6
7     0 3
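The ordered stem-and-leaf plot can be reproduced with a short Python sketch. It assumes two-digit data, so that the stem is the tens digit and the leaf is the units digit; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# The 35 observations listed above the rough draft.
data = [55, 15, 25, 50, 28, 66, 73, 25, 24, 47, 10, 45, 54,
        55, 55, 43, 57, 53, 65, 38, 30, 29, 64, 12, 70,
        16, 24, 25, 40, 15, 36, 53, 57, 24, 27]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    plot = defaultdict(list)
    for v in sorted(values):        # sorting first keeps the leaves ordered
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, " ", " ".join(str(leaf) for leaf in leaves))
```

Counting the leaves is a useful check: there must be as many leaves as observations (35 here).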
Marks in Economics
16 29 45 58 64 78
42 54 66 72 34 35
54 91 74 24 84 92
70 78 54 52 18 41 65
Exercise
The data below show the number of villagers interviewed in different villages of the country
1) Construct a frequency distribution to illustrate the data
2) Construct a stem and leaf plot of the data
Grouped Data
Type A: Continuous
Data are grouped into classes, for example 20≤30, 30≤40.
If the random variable is x with 20 ≤ x < 30, then 20≤30 is a class where 20 is the lower limit and
30 is the upper limit.
We use the abbreviations LCL (lower class limit) and UCL (upper class limit) to denote these.
The difference between the UCL and the LCL is called the class interval, class length or class width.
Class width = UCL - LCL
= 30 - 20
= 10
The sum of the LCL and UCL divided by two gives the midpoint of a class, usually denoted
as x.
Midpoint (x) = (LCL + UCL) / 2
Class Limits X
20≤30 25
30≤40 35
40≤50 45
50≤60 55
60≤70 65
70≤80 75
80≤90 85
NB. The UCL of the first class is the LCL of the succeeding class.
Type B: Discrete
There are a number of ways of presenting grouped data such as frequency distribution,
histogram, frequency polygon and cumulative frequency distributions.
Example
The owner of a small business wants to analyse profits over the past 25-day period. Using a class
interval of 5 beginning at 20, construct:
a) Frequency distribution
b) Histogram
c) Frequency polygon
N.B. Because the data are discrete, the original classes are written with a gap of four between their
limits (20-24, 25-29, ...) so that the adjusted classes (19.5≤24.5, 24.5≤29.5, ...) have an
interval of 5.
21 27 35 41 23
32 30 35 28 38
36 32 33 32 34
42 29 43 37 20
32 30 20 34 35
c. Frequency Polygon
Before constructing a frequency polygon, find the midpoint of each class and then plot the midpoint
against the corresponding frequency.
Adjusted Classes   f    F
19.5≤24.5          4    4
24.5≤29.5          3    7
29.5≤34.5          9    16
34.5≤39.5          6    22
39.5≤44.5          3    25
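The grouping of the 25 profit figures can be sketched as follows. This is a minimal illustration under the assumption that the values are whole numbers grouped into classes 20-24, 25-29 and so on; `grouped_frequencies` is a hypothetical helper name.

```python
# The 25 daily profit figures from the example.
profits = [21, 27, 35, 41, 23, 32, 30, 35, 28, 38, 36, 32, 33,
           32, 34, 42, 29, 43, 37, 20, 32, 30, 20, 34, 35]

def grouped_frequencies(values, start, width):
    """Return rows (lower, upper, f, F) for classes start..start+width-1, and so on."""
    n_classes = (max(values) - start) // width + 1
    freqs = [0] * n_classes
    for v in values:
        freqs[(v - start) // width] += 1      # index of the class that v falls into
    rows, running = [], 0
    for i, f in enumerate(freqs):
        running += f                          # cumulative frequency F
        lower = start + i * width
        rows.append((lower, lower + width - 1, f, running))
    return rows

rows = grouped_frequencies(profits, start=20, width=5)
for lower, upper, f, F in rows:
    # Adjusted class boundaries for discrete data, e.g. 19.5 to 24.5
    print(f"{lower - 0.5} to {upper + 0.5}   f = {f}   F = {F}")
```

The last cumulative frequency must equal the number of observations, which gives a quick check on the tally.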
Cumulate comes from the word accumulate. The graph of a less-than cumulative frequency distribution is
called an ogive. To construct an ogive, plot the UCL of each class against the cumulative
frequency and join the points freehand (it has to be a smooth curve).
Example
The following data show the number of days on which patients visited a clinic for
counselling. Using a class interval of 15 starting at 125, construct:
a) Frequency distribution
b) Cumulative frequency distribution
c) Ogive
d) Relative frequency distribution
The relative frequency is calculated as follows:
Relative frequency (R.F) = Absolute Frequency / Total Frequency
Example
The following data are the marks obtained by a group of students in a statistics exam
68 49 69 41 79
42 60 87 65 68
50 61 85 66 63
52 56 74 59 81
57 88 47 55 65
78 90 65 72 95
a) Group the data into classes with an interval of 10 starting at 40 until all the values have
been accounted for.
The behaviour of any random variable can be described by a measure of central tendency and
a measure of dispersion about a central value. Observations of a random variable tend to
group about some central value. The statistical measures that quantify where the majority of
observations are concentrated are referred to as measures of central tendency. There are 3
main measures of central tendency, namely the mean, mode and median. Each measure will be
considered for both grouped and ungrouped data.
18 15 7 24 10
23 28 10 16 12
5 23 24 16 19
26 17 27 17 17
29 18 23 9 26
12 22 14 26 22
a) Find the mean number of days between orders.
Mean = 555 / 30 = 18.5
Grouped data are represented by a frequency distribution. To calculate the mean for grouped
data, find the midpoint for each class and multiply it by the corresponding frequency. The
formula for calculating the mean is therefore given as follows:
x̅ = ∑f(x) / ∑f, or equivalently x̅ = ∑f(x) / n
where x is the midpoint for each class and f is the absolute frequency for each class.
Find the mean number of days between orders using the data from the previous example
assuming that the data are grouped as shown in the following table
The average time between orders is 565 / 30 = 18.83
The mean for grouped data is not exactly equal to the mean for ungrouped data. The more
reliable mean is the one for ungrouped data because it uses the actual observed values, unlike grouped
data, which puts the values into classes and represents each one by its class midpoint.
N.B. The mean uses every value of the data set in its computation; as a result it possesses
certain useful properties, which make it the most widely used measure of central
tendency.
Mode is the most frequently occurring value in a dataset. If the number of observations is not
too large, the mode can be found by arranging the data in ascending order and by inspection
that is identifying the value that occurs the most.
Example
74, 48, 36,74, 70, 67, 48, 74, 70, 36, 36, 40, 50, 74
36, 36, 36, 40, 48, 48, 50, 67, 70, 70, 74, 74, 74, 74
Therefore, Mo=74
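Calculating the mode for grouped data is based on a frequency distribution table. The first
step is to identify the modal interval and then determine the modal value within the modal
interval. The formula used to accomplish this is given as follows:
Mo = Lm + [Cm(fm - fm-1)] / [2fm - fm-1 - fm+1]
where Lm is the lower limit of the modal interval, Cm is its class width, fm is its frequency, and
fm-1 and fm+1 are the frequencies of the intervals immediately before and after it.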
Classes    f
125≤140    4
140≤155    11
155≤170    9
170≤185    9
185≤200    10
200≤215    2
Mode = 140 + [15(11 - 4)] / [2(11) - 4 - 9]
= 151.67
a)
Classes    f
5≤10       3
10≤15      5
15≤20      9
20≤25      7
25≤30      6
Mo = 15 + [5(9 - 5)] / [2(9) - 5 - 7]
= 18.33
b)
Classes    f
50≤90      2
90≤130     9
130≤170    26
170≤210    27
210≤250    6
Mo = 170 + [40(27 - 26)] / [2(27) - 26 - 6]
= 171.82
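The grouped-mode calculations above can be checked with a small Python sketch. It is an illustration only: it assumes equal class widths and takes the first class with the highest frequency as the modal interval; `grouped_mode` is a hypothetical name.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mo = Lm + Cm(fm - fm-1) / (2fm - fm-1 - fm+1) for the modal interval."""
    m = freqs.index(max(freqs))                           # position of the modal interval
    f_prev = freqs[m - 1] if m > 0 else 0                 # fm-1 (0 if no earlier class)
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0    # fm+1 (0 if no later class)
    f_m = freqs[m]
    return lower_limits[m] + width * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

mode_clinic = grouped_mode([125, 140, 155, 170, 185, 200], [4, 11, 9, 9, 10, 2], 15)
mode_a = grouped_mode([5, 10, 15, 20, 25], [3, 5, 9, 7, 6], 5)
mode_b = grouped_mode([50, 90, 130, 170, 210], [2, 9, 26, 27, 6], 40)
print(round(mode_clinic, 2), round(mode_a, 2), round(mode_b, 2))
```

If two classes tie for the highest frequency the distribution is bimodal, and this sketch would report only the first of the two modes.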
N.B. A major disadvantage of using the mode as a measure of central tendency is that there
can be more than one mode, making it difficult to decide which one to use. A
distribution with one mode is said to be unimodal and one with two modes is said to be bimodal.
The median is that value of a random variable, which divides an ordered data set into two
equal parts. Half of the observations will fall below this median value and the other half
above it.
When finding the median for ungrouped data, the first step is to arrange the observations in
ascending order. If n is odd, identify the median position as the ((n+1)/2)th position.
If n is even, identify the value in the (n/2)th position and average this value and the adjacent value
(the value in the next position).
Example
27 38 12 34 42 40 24 40 23
Step: arrange in ascending order
12 23 24 27 34 38 40 40 42
n = 9, n is odd
((n+1)/2)th position = ((9+1)/2)th position = 5th position. Median = 34
Example
27 38 12 42 40 24 40 23 18 34
Ascending order
12 18 23 24 27 34 38 40 40 42
n = 10, n is even, so average the values in the 5th and 6th positions
Median = (27 + 34) / 2 = 30.5
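The two median rules can be written as a short Python function. This is a sketch of the position rule described above; the name `median_ungrouped` is illustrative.

```python
def median_ungrouped(values):
    """Median by the position rule: (n+1)/2 if n is odd, average of n/2 and n/2+1 if even."""
    data = sorted(values)                    # step 1: ascending order
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]        # positions are 1-based, list indices 0-based
    return (data[n // 2 - 1] + data[n // 2]) / 2

odd_example = median_ungrouped([27, 38, 12, 34, 42, 40, 24, 40, 23])
even_example = median_ungrouped([27, 38, 12, 42, 40, 24, 40, 23, 18, 34])
print(odd_example, even_example)
```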
The formula for finding the median for grouped data using the arithmetic method is as
follows:
Me = Lm + [Cm(n/2 - Fm-1)] / fm
where Lm = lower limit of the median interval, Cm = class width of the median interval,
fm = frequency of the median interval, and
Fm-1 = cumulative frequency of the class interval before the median interval
Basically there are two methods of finding the median for grouped data:
1. Arithmetic Method - Both the frequency distribution and the cumulative frequency
distribution are required. Using the cumulative frequency distribution values,
the median interval is that class interval into which the (n/2)th observation falls.
Examples
Find the median for the following set of grouped data using the arithmetic method.
Classes f F
125≤140 4 4
140≤155 11 15
155≤170 9 24
170≤185 9 33
185≤200 10 43
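200≤215 2 45
∑f = 45
Median class = (n/2)th position = (45/2)th = 22.5th position
NB: You identify the median class using the cumulative frequency. The 22.5th position falls in the
class 155≤170, because that is the first class whose cumulative frequency (24) reaches 22.5.
Median = 155 + [15(45/2 - 15)] / 9
= 167.5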
2. Graphical Method: The median is found by reading off the value of random variable
associated with the fifty percent cumulative frequency on the vertical axis
SKEWNESS
After calculating the mean, mode and median the decision has to be made as to which
one should be preferred as a measure of central tendency for a data set. The following
comparisons might help in this endeavour.
Symmetrical Distribution
If mean = mode = median, then a symmetrical distribution has been identified. For
a symmetrical distribution the best measure of central tendency is the mean because
it uses all the values of a given data set.
Positively Skewed Distribution
This is also known as a right-skewed distribution. If mean > median > mode, then
the data are not evenly distributed: more data values are
concentrated to the left and few data values lie to the right, resulting in a long tail to the
right.
Measures of Position
There are two types of measures of position that is quartiles and percentiles
QUARTILES
These values divide a data set that is ordered in ascending order into four equal parts.
There are three quartiles, which are
Q1 = Lower quartile
Q2 = Middle quartile
Q3 = Upper Quartile
Ungrouped Data
In ungrouped data the observations are arranged in ascending order before the
required quartile positions are determined. To get these positions the following
formulae are used:
Q1 = (n/4)th position
Q2 = (n/2)th position
Q3 = (3n/4)th position
If n is odd and the value obtained after making the calculation is a whole number,
the required quartile is the observation in that position. If after making the
calculation the answer is not a whole number, take the next whole position.
Example
Find Q1, Q2 and Q3 for the data below
18 9 11 30 15 22 19 20 35 40 43
Ascending order
9 11 15 18 19 20 22 30 35 40 43
Q1 = (n/4)th position = 11/4 = 2.75th position, so take the 3rd position
Q1 = 15
Q2 = 11/2 = 5.5th position, so take the 6th position
Q2 = 20
Q3 = 3(11)/4 = 8.25th position, so take the 9th position
Q3 = 35
If n is even and the result you get after calculation is a whole number, average this
value and the value to its right; if it is not a whole number, take the next whole
position.
Example
18 9 11 30 15 22 19 20 35 40 43 24
Ascending order
9 11 15 18 19 20 22 24 30 35 40 43
Q1 = (n/4)th = (12/4) = 3rd position = (15 + 18)/2 = 16.5
Q2 = (n/2)th = (12/2) = 6th position = (20 + 22)/2 = 21
Q3 = (3n/4)th = (3(12)/4) = 9th position = (30 + 35)/2 = 32.5
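The position rule used in the two examples can be sketched in Python. Note that this follows the convention of these notes, which may differ from the interpolation rules built into statistical software; the function name `quartile` is illustrative.

```python
import math

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the position rule described in the notes."""
    data = sorted(values)
    n = len(data)
    pos = k * n / 4                           # (n/4)th, (n/2)th or (3n/4)th position
    if pos != int(pos):
        return data[math.ceil(pos) - 1]       # not whole: take the next whole position
    pos = int(pos)
    if n % 2 == 0:
        return (data[pos - 1] + data[pos]) / 2    # n even: average with the value to the right
    return data[pos - 1]                      # n odd and whole position: take that value

odd_data = [18, 9, 11, 30, 15, 22, 19, 20, 35, 40, 43]
even_data = odd_data + [24]
odd_quartiles = [quartile(odd_data, k) for k in (1, 2, 3)]
even_quartiles = [quartile(even_data, k) for k in (1, 2, 3)]
print(odd_quartiles, even_quartiles)
```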
Grouped data
There are two methods that can be used to compute quartiles for grouped data and
these are the graphical method and arithmetic method. The cumulative frequency
distribution is required for both methods.
Graphical Method
An ogive is used to find or estimate quartiles. To find the Q1 position, determine 25%
of the total frequency and find the value that corresponds to it on the x-axis. To find Q2
and Q3, determine 50% and 75% respectively of the total frequency.
Example
Q1 = 45/4 = 11.25th position
Q2 = 45/2 = 22.5th position
Q3 = 3(45)/4 = 33.75th position
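Arithmetic Method
The following formulae are used to find the lower quartile Q1 and the upper quartile
Q3. Q2 is found using the median formula.
Q1 = Lq1 + [Cq(n/4 - Fm-1)] / fq
Q3 = Lq3 + [Cq(3n/4 - Fm-1)] / fq
where Lq1 and Lq3 are the lower limits of the quartile classes, Cq is the class width, fq is the
frequency of the quartile class and Fm-1 is the cumulative frequency of the class before it.
Example
Q1 = Lq1 + [Cq(n/4 - Fm-1)] / fq
= 140 + [15(45/4 - 4)] / 11
= 149.89
Q3 = Lq3 + [Cq(3n/4 - Fm-1)] / fq
= 185 + [15(3(45)/4 - 33)] / 10
= 186.13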
In general any percentile value can be found by adjusting the median formula to find
the required percentile position and from this establish the percentile, for example:
90th percentile position = (9n/10)th position
40th percentile position = (4n/10)th position
35th percentile position = (35n/100)th position
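Because the median, quartile and percentile formulae differ only in the position used, they can be sketched as one Python function. This is an illustration assuming equal class widths and the clinic frequency table used above; `grouped_quantile` is a hypothetical name, and the 90th percentile is computed only to show the adjustment described in the text.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """Value at position fraction*n: L + C(position - F_before) / f for the class containing it."""
    n = sum(freqs)
    position = fraction * n
    cumulative = 0                            # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= position:        # first class whose F reaches the position
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

lower_limits = [125, 140, 155, 170, 185, 200]
freqs = [4, 11, 9, 9, 10, 2]
q1 = grouped_quantile(lower_limits, freqs, 15, 0.25)
q2 = grouped_quantile(lower_limits, freqs, 15, 0.50)
q3 = grouped_quantile(lower_limits, freqs, 15, 0.75)
p90 = grouped_quantile(lower_limits, freqs, 15, 0.90)
print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}, P90 = {p90:.3f}")
```

The results for Q1, Q2 and Q3 agree with the worked values above once rounded to two decimal places by hand.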
Measures of Position
Symbolic Notation for Sample
A measure found from analysing sample data is called a statistic while a measure
describing a population attribute is called a parameter. Various symbols are used for
each of these measures.
Measures of dispersion are used to describe the extent to which the values of a random variable are scattered
about a central value. The central value can be described as more reliable if there is a high
concentration of the values of observations about it. On the other hand, widely spread
observations show low reliability of the central value.
Range
The range is the difference between the biggest and the smallest observation in a given data set.
Example
The following data show the amount in millions of dollars paid to employees in different
companies. Find the data range and interpret your solution.
16 2 38 9 20 80 3 10 50
Range = 80-2
= $78 million
Since the range of 78 is very close to the highest observation and very far from the lowest observation, this
suggests a wide dispersion; hence the mean as a measure of central tendency will be strongly
unrepresentative.
Interquartile Range
It is simply the difference between the upper quartile and the lower quartile
Variance
It is a measure of spread or dispersion that includes all the observations of a data set in its
computation. It can be computed for both ungrouped and grouped data.
S² = [∑(x²) - n(x̅)²] / (n - 1)
Where
n = sample size
x̅ = sample mean
The standard deviation is found by computing the square root of the variance
S = √(S²)
S = √( [∑(x²) - n(x̅)²] / (n - 1) )
The following data show the weights in kg of 8 patients who visited a clinic one afternoon.
Compute the variance and standard deviation of the weights.
80 70 60 50 40 35 65 45    Mean = 55.63
x      x²
80     6400
70     4900
60     3600
50     2500
40     1600
35     1225
65     4225
45     2025
∑x² = 26475
S² = [26475 - 8(55.63)²] / (8 - 1)
= 245.35
S = √245.35
= 15.66 kg
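The computational formula can be checked with a short Python sketch. One caveat, stated as an observation rather than a correction: the hand calculation above rounds the mean to 55.63 before squaring, whereas the code keeps the unrounded mean (55.625), so it gives about 245.98 and 15.68 instead of 245.35 and 15.66. The function name `sample_variance` is illustrative.

```python
import math
import statistics

weights = [80, 70, 60, 50, 40, 35, 65, 45]

def sample_variance(values):
    """S^2 = (sum of x^2 - n * mean^2) / (n - 1), using the unrounded mean."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum(x * x for x in values)
    return (sum_sq - n * mean ** 2) / (n - 1)

variance = sample_variance(weights)
std_dev = math.sqrt(variance)
print(f"S^2 = {variance:.2f}, S = {std_dev:.2f} kg")

# Cross-check against the standard library's sample variance.
library_variance = statistics.variance(weights)
```

The small gap between the two sets of answers shows why it is safer to carry extra decimal places for the mean until the final step.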
Example
The following data give the time in minutes spent by a sample of 20 students to complete a
given task. Showing all workings, calculate the standard deviation of the data.
16 29 58 66 78 42 54 72 54 72 54 91 44 84 92 70 78
52 28 41
x̅ = 58.75
∑(x²) = 77671
S² = [77671 - 20(58.75)²] / 19
= 454.72
S = √454.72
= 21.32
For grouped data the variance and standard deviation are given by:
S² = [∑f(x²) - n(x̅)²] / (n - 1)
S = √(S²)
S = √( [∑f(x²) - n(x̅)²] / (n - 1) )
Where f is the frequency for each class and x is the midpoint of each class.
Example
Find the variance and standard deviation for the grouped data below
Classes    f     x       fx       fx²
125≤140    4     132.5   530      70225
140≤155    11    147.5   1622.5   239318.75
155≤170    9     162.5   1462.5   237656.25
170≤185    9     177.5   1597.5   283556.25
185≤200    10    192.5   1925     370562.5
200≤215    2     207.5   415      86112.5
Total      45            7552.5   1287431.25
x̅ = ∑fx / n = 7552.5 / 45 = 167.83
S² = [∑f(x²) - n(x̅)²] / (n - 1)
= [1287431.25 - 45(167.83)²] / (45 - 1)
= 452.74
S = √452.74 = 21.28
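The grouped calculation can be sketched in Python using class midpoints. As an observation on rounding: the hand calculation above squares the rounded mean 167.83, while the code keeps the unrounded mean, so it gives a variance of about 451.59 and a standard deviation of about 21.25 rather than 452.74 and 21.28. The helper name `grouped_stats` is illustrative.

```python
import math

lower_limits = [125, 140, 155, 170, 185, 200]
freqs = [4, 11, 9, 9, 10, 2]
width = 15

def grouped_stats(lower_limits, freqs, width):
    """Return (mean, variance, standard deviation) from class midpoints and frequencies."""
    mids = [lower + width / 2 for lower in lower_limits]        # x = (LCL + UCL) / 2
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, mids))            # sum of f*x
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))       # sum of f*x^2
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)
    return mean, variance, math.sqrt(variance)

mean, variance, std_dev = grouped_stats(lower_limits, freqs, width)
print(f"mean = {mean:.2f}, S^2 = {variance:.2f}, S = {std_dev:.2f}")
```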
Coefficient of Variation
Sample C.O.V = (s / x̅) × 100
Population C.O.V = (σ / μ) × 100
The following data show the mean and standard deviation of sales per month and the
experience of employees in years. Calculate and compare the coefficients of variation of the two
random variables. Which random variable is more consistent?
COV for experience = (s / x̅) × 100
= (4 / 20) × 100
= 20%
COV for sales per month = (s / x̅) × 100
= (80 / 500) × 100
= 16%
NB: The one with a higher percentage has greater variability, which means that it is less
consistent. It follows therefore that the one with smaller variability is more consistent.
Interpretation
Sales per month are more consistent than experience, as shown by the variability of 16%
compared to 20%.
Coefficient of Skewness
The coefficient of skewness values should lie between negative 3 and positive 3 inclusive.
A value less than zero indicates negative skewness. A value equal to zero represents a
symmetrical distribution. A value greater than zero indicates positive skewness.
The common coefficient of skewness that is used is called Pearson’s coefficient of skewness.
The first coefficient of skewness, denoted Sk1, is calculated as
Sk1 = (mean - mode) / standard deviation
= (x̅ - Mode) / s
The second coefficient of skewness, denoted Sk2, is calculated as
Sk2 = 3(mean - median) / standard deviation
= 3(x̅ - Median) / s
When Sk1 and Sk2 are both less than zero, we have a negatively skewed
distribution.
If Sk1 and Sk2 are both equal to zero, we have a symmetrical distribution.
If Sk1 and Sk2 are both greater than zero, we have a positively skewed distribution.
Example
Compute Pearson’s first and second coefficients of skewness, and interpret your results.
Mode = Lm + [Cm(fm - fm-1)] / [2fm - fm-1 - fm+1]
= 140 + [15(11 - 4)] / [2(11) - 4 - 9]
= 151.67
Mean = 167.83
Mode = 151.67
Median = 167.5
Sk1 = (mean - mode) / standard deviation
= (167.83 - 151.67) / 21.28
= 0.76
Sk2 = 3(mean - median) / standard deviation
= 3(x̅ - Median) / s
= 3(167.83 - 167.5) / 21.28
= 0.05
Both coefficients are greater than zero, so the distribution is positively skewed.
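The two coefficients can be computed with a minimal Python sketch, using the summary values already obtained above (mean 167.83, mode 151.67, median 167.5, standard deviation 21.28). The function name `pearson_skewness` is illustrative.

```python
def pearson_skewness(mean, mode, median, std_dev):
    """Return Pearson's first (mode-based) and second (median-based) coefficients."""
    sk1 = (mean - mode) / std_dev
    sk2 = 3 * (mean - median) / std_dev
    return sk1, sk2

sk1, sk2 = pearson_skewness(mean=167.83, mode=151.67, median=167.5, std_dev=21.28)
print(f"Sk1 = {sk1:.2f}, Sk2 = {sk2:.2f}")

# Interpretation rule from the notes: both > 0 means positive skewness.
positively_skewed = sk1 > 0 and sk2 > 0
```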
PROBABILITY THEORY
Subjective Probability
Objective Probability
Example
a) A girl
b) A boy
a) P(G) = 3/7    b) P(B) = 4/7
1. A probability value lies only between zero and one inclusive, that is 0≤P(A)≤1
2. If an event A cannot occur, that is it is an impossible event, then P(A)=0
3. If an event A is certain to occur, that is it is definite, then P(A)=1
4. The sum of the probabilities of all possible outcomes of a random experiment equals
one (the outcomes are exhaustive), for example the sum of the probability of a girl and the
probability of a boy equals one, that is P(G)+P(B)=1
5. If P(A) is the probability of event A occurring, then the probability of event A not
occurring is defined as P(A′) = 1 - P(A)
Example
Consider a random process of drawing cards from a pack of playing cards find the probability
of selecting
a) A red card
b) A spade
c) An ace
d) Not an ace
a) P(red card) = 26/52 = 1/2
b) P(spade) = 13/52 = 1/4
c) P(ace) = 4/52 = 1/13
d) P(not an ace) = 1 - 1/13 = 12/13
Consider a random experiment of selecting companies from the Zimbabwe stock exchange
(ZSE).Values for the random variables, which are company size and industry type, are
measured or summarized as shown in the following table
Marginal Probability
A marginal probability is the probability of only a single event, e.g. the probability of event A
occurring; it is written as P(A). A single event is an event that describes outcomes of one random
variable only. If A represents the event of a small company, find P(A).
P(A) = 29/150
Joint Probability
A joint probability is the probability of both events A and B occurring simultaneously on a given
random experiment. It is denoted as P(A n B).
Let A be the event of a small company and B the event of a Finance company. Therefore,
P(A n B) = 9/150.
CONDITIONAL PROBABILITY
A conditional probability is the probability of an event A occurring given information about the
occurrence of another event B. A conditional event describes the behaviour of a random variable in
light of additional information about a second random variable. A conditional probability is defined
as follows;
P(A n B)
P(A|B) =
P(B)
This is the probability of event A occurring given that event B has already occurred.
The essential feature here is that the sample space is reduced to the outcomes describing
event B only, and not all possible outcomes as for marginal and joint probabilities.
Let A be the event of a large company and B the event of a retail company. Find P(A|B) using
INTERSECTION OF EVENTS
The intersection of events A and B is the set of outcomes that belong to both A and B
simultaneously. It is written as A n B [A and B].
Let A be the event of a small company and B the event of a service company. A n B is the set of all
companies that are both small and service companies. n(A n B) = 6.
UNION OF EVENTS
The Union of events A and B is the set of outcomes that belong to A or B or Both. It is written as A u
B [A or B].
Let A be the event of a small company and B be the event of a service company. Then A u B is the set
of all companies that are small or service or both. n(A u B) = 29 + 10 - 6 = 33.
MUTUALLY EXCLUSIVE EVENTS
Events are mutually exclusive if they cannot occur together on a single trial of a random experiment.
For example, let A be event of a small company and B be the event of a medium company. Events A
and B are mutually exclusive because a randomly selected company from ZSE cannot be both small
and medium at the same time.
Events are said to be statistically independent if the occurrence of one event A has no effect on
the outcome of event B occurring, or vice versa. For example, let L be the event of an accident
occurring in London and H be the event of an industrial strike occurring in Harare. These scenarios
have no effect on each other even if they may occur at the same time.
The terms statistically independent events and mutually exclusive events must not be confused.
When events are mutually exclusive they are not statistically independent. They are dependent in
the sense that when one event occurs then the other will not occur. In probability terms the
probability of an intersection between two mutually exclusive events is zero.
PROBABILITY RULES
1. Addition Rule (u ; “or”): for both mutually exclusive and non-mutually exclusive events.
2. Multiplication Rule (n ; “and”): for both statistically independent and non-statistically independent
events.
The probability of either event A or B occurring in a single trial of a random experiment is defined as:
P(A u B) = P(A) + P(B)
For mutually exclusive events there is no intersection event, therefore P(A n B) = 0.
Let A be the event of a small company and B the event of a large company. Since these two events are
mutually exclusive, P(A u B) = 29/150 + 84/150 = 113/150.
This is the probability that a randomly selected company from ZSE will either be a small company or
a large company.
For events that are not mutually exclusive, the probability of either event A or B occurring in a single
trial of a random experiment is given by:
P(A u B) = P(A) + P(B) - P(A n B)
For example, with A the event of a small company and B the event of a service company,
P(A u B) = 29/150 + 10/150 - 6/150 = 33/150.
This is the probability that a randomly selected company from ZSE will either be a small company or a
service company or both.
If two events are statistically independent, then the multiplication rule reduces to
P(A n B) = P(A) × P(B)
NB* Two events A and B are statistically independent if the following test is satisfied:
P(A|B) = P(A)
This means that if the marginal probability of event A equals the conditional probability of A given
that event B has occurred, then these two events are statistically independent. This means that the
prior occurrence of event B does not influence the outcome of event A.
Let A be the event of a medium company and B the event of a Finance company. Determine if the two
events are statistically independent or not.
Test: P(A n B) / P(B) = P(A)
P(A|B) = (21/150) / (72/150) = 21/72 = 7/24
P(A) = 37/150
7/24 ≠ 37/150
Since P(A|B) is not equal to P(A), these events are not statistically independent.
If two events are not statistically independent, we apply the following rule. The multiplication rule
may be used to find the joint probability of events A and B occurring on a single trial of a random
experiment, that is the intersection of the two events. By rearranging the conditional probability
formula P(A|B) = P(A n B) / P(B), the multiplication rule is defined as:
P(A n B) = P(A|B) × P(B)
P(A|B) = conditional probability of event A occurring given that B has already occurred.
The personnel department of an insurance company analysed the qualification profile of their 129
managers. The qualifications attained by each manager are shown below.
                       MANAGEMENT LEVELS
Qualification Level    Section Head    Department Head    Division Head    Total
O’Level                28              14                 8                50
Diploma                20              24                 6                50
Degree                 5               10                 14               29
Total                  53              48                 28               129
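Answer:
i) P(O’Level) = 50/129
ii) P(Section Head n Degree) = P(A n B) = 5/129, read directly from the table.
If the two events were statistically independent, P(A n B) would equal P(A) × P(B):
P(A) × P(B) = 53/129 × 29/129 ≈ 0.092, whereas P(A n B) = 5/129 ≈ 0.039.
Therefore, being a section head and having a degree are not statistically independent events.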
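Marginal, joint and conditional probabilities can all be read off a cross-tabulation, as the following Python sketch shows for the managers table. It is an illustration only; the helper names (`marginal_qualification`, `marginal_level`, `joint`, `conditional`) are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

levels = ["Section Head", "Department Head", "Division Head"]
table = {
    "O'Level": [28, 14, 8],
    "Diploma": [20, 24, 6],
    "Degree":  [5, 10, 14],
}
total = sum(sum(row) for row in table.values())       # 129 managers

def marginal_qualification(q):
    """P(qualification): row total over grand total."""
    return Fraction(sum(table[q]), total)

def marginal_level(level):
    """P(management level): column total over grand total."""
    col = levels.index(level)
    return Fraction(sum(row[col] for row in table.values()), total)

def joint(q, level):
    """P(qualification n level): cell count over grand total."""
    return Fraction(table[q][levels.index(level)], total)

def conditional(q, level):
    """P(qualification | level) = P(qualification n level) / P(level)."""
    return joint(q, level) / marginal_level(level)

p_olevel = marginal_qualification("O'Level")
p_joint = joint("Degree", "Section Head")
p_cond = conditional("Degree", "Section Head")
# Independence test from the notes: P(A|B) must equal P(A).
independent = p_cond == marginal_qualification("Degree")
print(p_olevel, p_joint, p_cond, independent)
```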
A probability tree is a diagram that helps in decision making. The diagram has the shape of a tree and each branch
on the tree represents a logical outcome.
Example: A farmer has 15 cows in which 7 are black and 8 are white, it has been a tradition that he
sells a cow each month, if two cows are sold and then replaced find the probability that
Example: If three playing cards are selected from a pack of playing cards with replacement, what is
the probability of getting at least two diamonds. How would this probability be affected if these
cards were not replaced?
Example: Suppose that we rolled an unbiased dice three times, Find the probability that the
outcome is:
If the event E can be split into K-sub-events i.e …….. such that there are …………, ways of performing
each sub event then the entire set E can be performed as E=
Suppose a man walks from point A to point C via point B as illustrated in diagram below.
A to B = 2 ways
B to C = 2 ways
Example: A restaurant menu has a choice of four starters, ten main courses and six desserts. Find the
total number of possible meals that can be ordered.
Example: How many different 7 place number plates are possible if the first 3 places are to be
occupied by alphabetic letters and the final four by numbers?
A permutation is the number of distinct ways in which a group of objects can be arranged; each
possible arrangement is called a permutation. Consider the number of ways of placing 3 of the
letters ABCDEFG in three empty spaces. The first space can be filled in 7 ways, the second space can
be filled in 6 ways, and the third space can be filled in 5 ways. Therefore there are 7*6*5=210 ways of
arranging three letters taken from seven letters. This is the number of permutations of three objects
taken from seven objects and it is written as:
7P3 = 7! / (7 - 3)! = 210
NB* With a permutation the order in which letters or numbers are arranged is very important.
Example: ABC is a different permutation from ACB and so are CAB, CBA, BAC, BCA.
A combination is the number of different ways of selecting a subset of objects from a group of objects
where the order is not important. Each possible selection is called a combination. ABC gives rise
to six permutations: ABC, ACB, CBA, CAB, BAC, BCA. However, it is one combination. Therefore
the number of combinations of three letters from the seven letters ABCDEFG is denoted by
7C3 = 7! / [3!(7 - 3)!] = 35
Example: From a group of five women and seven men. How many different committees consisting of
two women and three men can be formed?
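The counting examples above can be checked with Python's standard library. This is a sketch under stated assumptions: for the number plates it assumes 26 letters and 10 digits with repetition allowed, which the example does not spell out.

```python
import math

# Multiplication (counting) rule: starters x main courses x desserts.
meals = 4 * 10 * 6

# Number plates: 3 letters then 4 digits (assumes 26 letters, 10 digits, repeats allowed).
plates = 26 ** 3 * 10 ** 4

# Permutations and combinations of 3 letters taken from the 7 letters ABCDEFG.
permutations = math.perm(7, 3)      # order matters
combinations = math.comb(7, 3)      # order does not matter

# Committees of 2 women from 5 and 3 men from 7: choose each group, then multiply.
committees = math.comb(5, 2) * math.comb(7, 3)

print(meals, plates, permutations, combinations, committees)
```

Each group of three letters can be arranged in 3! = 6 ways, which is why the number of permutations is six times the number of combinations.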
PROBABILITY DISTRIBUTION
A probability distribution is a list of all possible outcomes of a random variable and their associated
probabilities of occurrence. The expected value of a random experiment, which is the mean, is given
by the following formula:
E(X) = ∑ x P(x)
The Variance
Var(X) = ∑ x² P(x) - [E(X)]²
Example: An unbiased dice is thrown once, construct a probability distribution that shows the
possible outcomes and use it to find the expectation and standard deviation. Let X be the possible
outcomes.
Example: A coin is tossed three times, construct a probability distribution for the number of tails that
can be obtained, and use it to calculate the expectation and standard deviation for the number of
tails that occur.
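The two examples above can be sketched in Python. The sketch assumes the standard definitions E(X) = ∑ x P(x) and Var(X) = ∑ x² P(x) - [E(X)]², and builds the coin distribution by listing all eight equally likely outcomes; the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product

def expectation(dist):
    """E(X) = sum of x * P(x) over the distribution {x: P(x)}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of x^2 * P(x) minus E(X)^2."""
    return sum(x * x * p for x, p in dist.items()) - expectation(dist) ** 2

# Unbiased die: each face 1..6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Number of tails in three tosses of a fair coin: enumerate all 8 outcomes.
tails = {}
for outcome in product("HT", repeat=3):
    k = outcome.count("T")
    tails[k] = tails.get(k, Fraction(0)) + Fraction(1, 8)

for name, dist in (("die", die), ("tails", tails)):
    mean = expectation(dist)
    sd = math.sqrt(variance(dist))
    print(f"{name}: E(X) = {float(mean):.2f}, SD = {sd:.4f}")
```

In both cases the probabilities sum to one, which is a necessary check before using a table as a probability distribution.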
Example: Each customer at a supermarket pays using one of three methods: cash, cheque or
credit card. The probability of a randomly selected customer paying by cash is 0.54 and by cheque is 0.12.
Determine the probability of a randomly selected customer paying by credit card. If three customers
are selected at random, find the probability of:
i) All three paying by cash.
ii) Exactly one paying by cheque.
iii) One paying by cash, one by cheque and one by credit card.
The choice of a particular probability distribution function depends primarily on the nature of the
random variable under study.
Discrete or Continuous
These probability distribution functions assume that the outcomes of a random variable under study
can take only specific values, usually integers, e.g. a car can only have 0, 1, 2, 3, 4, 5, 6… tyres at any time.
The two common types of discrete probability distributions are the Binomial and Poisson
distributions. For a random variable to follow either the Poisson or the Binomial distribution, the
following conditions have to be met.
A discrete random variable can be said to follow a Binomial distribution if the following are satisfied:
i) There are two mutually exclusive outcomes of the random variable generally referred to
as success or failure.
ii) The probability of the success outcome is denoted as p, whereas for the failure is q.
P + q =1
iii) The random variable is observed n times and each observation is called a trial.
iv) The trials are assumed to be independent of each other i.e each trial does not influence
the outcomes of another trial.
If a random variable satisfies all the above conditions, it is said to follow a binomial process
Example: Ten students sit for an exam. The probability of each student passing the exam is 0.2.
What is the probability that three of them will pass the paper?
i) Exactly or Equals
This can be written as P(X=3).
ii) More than or greater than.
iii) Not more than or at most
iv) Less Than
v) Between and inclusive
vi) Between
vii) Or
viii) And
ix) Not less than or at least
Example: Refer to the example above and answer the following questions.
a) What is the probability that more than two students will pass?
b) Less than two students will fail.
c) Between 2 and 4 students will pass the exam.
d) Between 1 and 3 inclusive will fail
e) Two or three students will pass.
f) Calculate the mean and standard deviation for the number of students who will pass.
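A few of these questions can be sketched in Python. The sketch assumes the standard binomial formula P(X = x) = nCx p^x q^(n-x), with mean np and standard deviation √(npq); it covers the exactly-three case, question a) and question f), and the function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.2      # ten students, each passes with probability 0.2

# Exactly three pass: P(X = 3)
p_exactly_3 = binomial_pmf(3, n, p)

# a) More than two pass: P(X > 2) = 1 - [P(0) + P(1) + P(2)]
p_more_than_2 = 1 - sum(binomial_pmf(x, n, p) for x in range(3))

# f) Mean and standard deviation of the number of passes
mean = n * p
std_dev = math.sqrt(n * p * (1 - p))

print(f"P(X=3) = {p_exactly_3:.4f}, P(X>2) = {p_more_than_2:.4f}")
print(f"mean = {mean:.1f}, standard deviation = {std_dev:.4f}")
```

Questions phrased in terms of students failing can be handled the same way by treating a fail as the success outcome, with p = 0.8.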
POISSON DISTRIBUTION
The Poisson process measures the number of occurrences of a particular outcome of a discrete
random variable in a pre-determined time, space or volume interval for which an average number of
occurrences of the outcome can be determined, e.g. the number of cars arriving at a parking lot in an
hourly interval or the number of telephone calls received in a ten-minute interval. If a distribution
follows a Poisson process, then
X ~ Poi(λ), with P(X = x) = e^(−λ) λ^x / x!
where λ is the mean number of occurrences and x is the number of occurrences whose probability
is being calculated.
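A minimal sketch of how Poisson probabilities can be computed, assuming the standard formula P(X = x) = e^(−λ) λ^x / x!. The mean λ = 6 is taken from the typist example; the particular x values are hypothetical and chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 6                                # mean number of errors per page
p_exactly_4 = poisson_pmf(4, lam)      # hypothetical question: exactly 4 errors
p_at_most_2 = poisson_cdf(2, lam)      # hypothetical question: at most 2 errors
p_more_than_2 = 1 - p_at_most_2        # complement rule

print(f"P(X = 4)  = {p_exactly_4:.4f}")
print(f"P(X <= 2) = {p_at_most_2:.4f}")
print(f"P(X > 2)  = {p_more_than_2:.4f}")
```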
Example: The average number of errors a junior typist can make in a page is 6. What is the
probability that she makes:
Answer:
Example: A textile producer has established that a spinning machine stops randomly due to thread
breakages at an average rate of 5 stoppages per hour. What is the probability that in a given hour:
Answer:
CONTINUOUS PROBABILITY DISTRIBUTIONS
A continuous random variable can take any value in an interval. Continuous probability functions are
used for probabilities associated with intervals of X values. You will encounter many business
situations in which the random variable of interest can be treated as a continuous variable. There
are several continuous distributions that are frequently used to describe a physical situation. The most
common and useful continuous distribution is the normal distribution, the reason being
that the output of many processes is normally distributed.
A normal probability distribution function gives probabilities for a continuous random variable. It
has the following characteristics:
i) It is bell shaped.
ii) It is symmetrical about a central value (The Mean)
iii) The tails of the distribution never touch the X- axis.
iv) A normally distributed random variable is described by two parameters, namely the
mean and the standard deviation.
v) The area under the curve of the PDF of a normal distribution is equal to 1.
i) P(Z<2.31)
ii) P(Z<-1.49)
iii) P(Z>2.1)
iv) P(-2.5<Z)
v) P(0<Z<2.05)
vi) P(-1.52<Z<0.69)
NB* Always sketch a normal probability distribution curve and indicate the area whose probability is
to be found.
Answer:
v) P(0 < Z < 2.05) = P(Z < 2.05) − P(Z < 0)
= 0,9798 − 0,5000
= 0,4798
vi) P(−1.52 < Z < 0.69) = P(Z < 0.69) − P(Z < −1.52)
= 0,7549 − 0,0643
= 0,6906
a. P(0<Z<1,46)
P(Z<1,46) – P (Z <0)
0,9278 – 0,5
=0,4278
0,5-0,0107
=0,4893
0,9066 – 0,0179
=0,8887
0,9812 – 0,8925
=0,0887
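The six standard normal probabilities i) to vi) listed earlier can be checked without tables. The sketch below assumes the usual identity P(Z < z) = 0.5(1 + erf(z/√2)); the helper name phi is illustrative. Results may differ from table values in the fourth decimal place because tables round z to two decimals.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative probability P(Z < z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

results = {
    "P(Z < 2.31)": phi(2.31),
    "P(Z < -1.49)": phi(-1.49),
    "P(Z > 2.1)": 1 - phi(2.1),                     # upper tail = 1 - lower tail
    "P(-2.5 < Z)": 1 - phi(-2.5),
    "P(0 < Z < 2.05)": phi(2.05) - phi(0),          # difference of two lower tails
    "P(-1.52 < Z < 0.69)": phi(0.69) - phi(-1.52),
}

for label, prob in results.items():
    print(f"{label} = {prob:.4f}")
```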
The trick to finding probabilities for a normal distribution is to convert the normal distribution to a
standard normal distribution. Values of x associated with any normally distributed random variable
can be converted into corresponding Z values by using the conversion formula
Z = (x − µ) / σ
The time taken to install a new telephone is found to be normally distributed with a mean time of
45 minutes and a standard deviation of 8 minutes. For a new installation, what is the probability that
P(X < x) = P(Z < (x − µ)/σ)
P(X < 40) = P(Z < (41 − 45)/8)
b) P(44 < X < 49) = P((44 − 45)/8 < Z < (49 − 45)/8)
= P(−0,13 < Z < 0,5)
= 0,6915 − 0,4483
= 0,2432
= 0,0987
d) P(45 < X < 51) = P((45 − 45)/8 < Z < (51 − 45)/8)
= P(0 < Z < 0,75)
= 0,7734 − 0,5
= 0,2734
The number of customers who enter a certain supermarket in a day is normally distributed with a
mean of 400 customers and a standard deviation of 80 customers.
a) What is the probability that on a given day the number of customers is less than 250?
b) Greater than 400
c) Between 300 and 400
d) Between 200 and 500
Solution
µ = 400, σ = 80
a) P(X < 250) = P(Z < (X − µ)/σ)
= P(Z < (250 − 400)/80)
= P(Z < −1,875)
= 0,0301
b) P(X > 400) = P(Z > (400 − 400)/80)
= P(Z > 0)
= 0,5
c) P(300 < X < 400) = P((300 − 400)/80 < Z < (400 − 400)/80)
= P(−1,25 < Z < 0)
= 0,5 − 0,1056
= 0,3944
d) P(200 < X < 500) = P((200 − 400)/80 < Z < (500 − 400)/80)
= P(−2,5 < Z < 1,25)
= 0,8944 − 0,0062
= 0,8882
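The supermarket answers can be cross-checked with Python's standard library, which standardises internally. This is a sketch, not part of the original solution; NormalDist requires Python 3.8 or later, and the variable names are illustrative.

```python
from statistics import NormalDist

# Daily customers: mean 400, standard deviation 80
customers = NormalDist(mu=400, sigma=80)

p_a = customers.cdf(250)                        # a) P(X < 250)
p_b = 1 - customers.cdf(400)                    # b) P(X > 400)
p_c = customers.cdf(400) - customers.cdf(300)   # c) P(300 < X < 400)
p_d = customers.cdf(500) - customers.cdf(200)   # d) P(200 < X < 500)

for label, prob in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}) {prob:.4f}")
```

Small differences from the table-based answers (for example in part a) come from rounding Z to two decimals before reading the table.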
There are many situations in business where the population is not normally distributed. For a simple
random sample of n observations taken from a population with mean µ and standard deviation σ, the
sum of the random variables will have an approximately normal distribution when n is large. More specifically, if
x1, x2, …, xn is a random sample of size n taken from a population with mean µ and standard
deviation σ, the mean of the sample x̄ approximately follows a normal distribution with the following parameters:
x̄ ~ N(µ, σ²/n), such that the probability P(x̄ < x) = P(Z < (x − µ) / (σ/√n))
1. Formulate the hypotheses, that is, the null hypothesis, denoted H0, and the alternative
hypothesis, denoted H1.
2. Determine the type of distribution.
3. Determine the areas of acceptance and rejection.
4. Compute the test statistic.
5. Compare the test statistic with the critical value and draw a conclusion from the result obtained.
The null hypothesis is a claim made about the true value of a population parameter.
The alternative hypothesis is a statement that contradicts or opposes the claim made
about the true value of a population parameter.
𝐻0 : µ= 1000
𝐻1 : µ≠ 1000
𝐻0 : µ= a
𝐻1 : µ≠ a
In a lower tail test the null hypothesis states that a population parameter is greater than or equal to a specified value.
To identify this type of hypothesis, look for words such as smaller than, less than, below, etc.
The hypothesis for a lower tail test is stated as
H0: µ = a
H1: µ < a
In an upper tail test the alternative hypothesis claims that a population parameter is greater than a specified value. It is
identified by taking note of words like greater than, above, beyond, etc.
H0: µ = a
H1: µ > a
There are two types of errors that can be made when carrying out a hypothesis test.
A Type I error is rejecting a null hypothesis when it is true. Its probability is denoted α (alpha), which is the
level of significance.
A Type II error is accepting a null hypothesis when it is false. Its probability is denoted β (beta).
Step 2
Determining the type of distribution. There are two common types of distribution used in hypothesis
testing, i.e. the Z-distribution and the t-distribution. The Z-distribution is used for large samples,
n ≥ 30, and the t-distribution for small samples, n < 30.
Step 3
The acceptance area is the region such that if the calculated test statistic falls in it, then H0 is not
rejected. The rejection area is the region such that if the calculated test statistic falls in it, then
H0 is rejected.
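Critical Values
To arrive at this value, the level of significance α is used and it is always given as a percentage.
Two-tailed tests:
α = 10%: Z(α/2) = Z(0,1/2) = Z(0,05) = ±1,64
α = 5%: Z(α/2) = Z(0,05/2) = Z(0,025) = ±1,96
One-tailed tests (negative sign for a lower tail test, positive sign for an upper tail test):
α = 1%: Z(α) = Z(0,01) = 2,33
α = 5%: Z(α) = Z(0,05) = 1,65
α = 10%: Z(α) = Z(0,10) = 1,28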
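The Z critical values can also be obtained from the inverse of the standard normal cumulative distribution rather than from a table. A minimal sketch using the standard library (Python 3.8 or later); the function name z_critical is illustrative.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=False):
    """Positive Z critical value for a given level of significance.

    For a two-tailed test alpha is split equally between the two tails.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha = {alpha:.2f}: one-tailed {one:.3f}, two-tailed +/-{two:.3f}")
```

For a lower tail test the value is used with a negative sign.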
Step 4
Compute the test statistic. For a small sample the test statistic is calculated as tcalc = (x̄ − µ) / (s/√n)
Step 5
Drawing a conclusion.
The conclusion depends on the result obtained in the step above. If the calculated test statistic falls
within the rejection region, then H0 is rejected. If it falls within the acceptance
region, we fail to reject H0.
Large Sample
Zcalc = (x̄ − µ) / (σ/√n)
where x̄ = sample mean, µ = population mean, σ = standard deviation and n = sample size
Example
A firm suspects that the average life of 28000 km claimed for certain tires is too high. To check this
claim, the firm puts 40 of these tires on its trucks and gets a mean lifetime of 27563 km
and a standard deviation of 1348 km. Is this evidence that the mean lifetime of these tires is in fact less
than 28000 km? Carry out an appropriate test using α = 0,01.
H0: µ = 28000 km
H1: µ < 28000 km
n = 40 ⇒ Z-distribution
Critical Value
α = 0,01
Zα = Z(0,01)
= −2,33
Test Statistic
Zcalc = (27563 − 28000) / (1348/√40)
= −2,05
Since Zcalc = −2,05 is greater than −2,33 we fail to reject H0 and conclude at the 1% level of
significance that there is not enough evidence that the mean lifetime of these tires is less than 28000 km.
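The five steps can be sketched in code for the large-sample case. This is an illustration only: the function names are hypothetical, and the critical value is passed in as a positive table value rather than computed.

```python
from math import sqrt

def one_sample_z(sample_mean, mu0, sd, n):
    """Test statistic Zcalc = (sample mean - hypothesised mean) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))

def decide(z_calc, z_crit, tail):
    """Compare Zcalc with a positive critical value; tail is 'lower', 'upper' or 'two'."""
    if tail == "lower":
        reject = z_calc < -z_crit
    elif tail == "upper":
        reject = z_calc > z_crit
    elif tail == "two":
        reject = abs(z_calc) > z_crit
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return "reject H0" if reject else "fail to reject H0"

# Tire example: H0: mu = 28000, H1: mu < 28000, alpha = 0.01 (table value 2.33)
z_tires = one_sample_z(27563, 28000, 1348, 40)
print(f"Zcalc = {z_tires:.2f}: {decide(z_tires, 2.33, 'lower')}")
```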
A manufacturer claims that its light bulbs have an average life of 1600 hrs. A sample of 100 light bulbs
tested gave an average life of 1570 hrs and a standard deviation of 120 hrs. Test at the 5% level of significance
whether this claim is true.
H0: µ = 1600
H1: µ ≠ 1600
n = 100 ⇒ Z-distribution
Critical Value
Zα/2 = Z(0,05/2) = Z(0,025) = ±1,96
Test Statistic
Zcalc = (1570 − 1600) / (120/√100)
= −2,50
Since Zcalc = −2,50 is less than −1,96 we reject H0 and conclude at the 5% level of significance that the
average life of these light bulbs is not equal to 1600 hrs.
1. The average monthly salary paid to an employee at a certain company is $340. A study
carried out amongst a sample of 300 employees produced an average monthly salary of $350
with a standard deviation of $60. Test the hypothesis at the 5% level of significance that the
average monthly salary of an employee is $340.
2. The average speed of cars along a highway is 135 km/h. A sample study of 200 cars along the
highway showed an average speed of 130 km/h with a variance of 900. Test the hypothesis at
the 10% level of significance to determine if the speed of cars along the highway is below
135 km/h.
1. H0: µ = 340
H1: µ ≠ 340
n = 300 ⇒ Z-distribution
Critical Value
Zα/2 = Z(0,05/2) = Z(0,025) = ±1,96
Reject H0 if Zcalc < −1,96 or Zcalc > 1,96
Test Statistic: Zcalc = (x̄ − µ) / (σ/√n)
= (350 − 340) / (60/√300)
= 2,89
Since Zcalc = 2,89 is greater than 1,96 we reject H0 and conclude at the 5% level of significance
that the average monthly salary of the employees is not equal to $340.
2. H0: µ = 135 km/h
H1: µ < 135 km/h
n = 200 ⇒ Z-distribution
Critical Value
Zα = Z(0,10) = −1,28
Reject H0 if Zcalc < −1,28
Test Statistic: Zcalc = (x̄ − µ) / (σ/√n), with σ = √900 = 30
= (130 − 135) / (30/√200)
= −2,36
Since Zcalc = −2,36 is less than −1,28 we reject H0 and conclude at the 10% level of
significance that the average speed of cars along the highway is less than 135 km/h.
Example
α = 0,025, n = 24: t(α; n − 1) = t(0,025; 23) = 2,07
α = 0,005, n = 5: t(α; n − 1) = t(0,005; 4) = 4,60
H0: µ = 20
H1: µ > 20
n = 25 ⇒ t-distribution
Critical Value
t(0,01; 24) = 2,49
Test Statistic
tcalc = (22 − 20) / (5/√25) = 2
Since tcalc = 2 is less than 2,49 we fail to reject H0: µ = 20 and conclude at the 1% level of
significance that the average price of a pair of shoes is $20.
The mean weight of a certain product is assumed to be 85 kg. To test this claim, a random
sample of 16 such products was studied and it was found that the average weight was 83 kg with a
standard deviation of 5 kg. Test whether the claim is true or not using α = 0,05.
𝐻0 : µ = 85kg
𝐻1 : µ ≠ 85kg
Large Samples
The sum of the two sample sizes should be greater than 30, where n1 = size of sample 1 and n2 =
size of sample 2. The test statistic is calculated as
Zcalc = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)
Small Samples
n1 + n2 < 30
The test statistic is calculated as
tcalc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
Critical Values.
H0: µ1 = µ2
H1: µ1 < µ2
n1 = 15, n2 = 12 ⇒ t-distribution
Critical Value
α = 0,05
−t(0,05; n1 + n2 − 2) = −t(0,05; 25) = −1,71
Rejection Criterion
Reject H0 if tcalc < −1,71
Test Statistic
tcalc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
= (76,2 − 78,5) / √(7,4²/15 + 6,7²/12)
= −0,85
Since tcalc = −0,85 is greater than −1,71 we fail to reject H0 and conclude at the 5% level of
significance that the mean score of all male students is not lower than that of female students.
A transport company wants to compare the performance of two cars, a Nissan and a Toyota. The Nissan
was used 75 times and its average number of breakdowns was recorded as 5 with a variance of 4. The
Toyota was used 63 times and its average number of breakdowns was recorded as 4 with a variance
of 3. Test the hypothesis that the performance of the two cars is the same. Use α = 0,05.
H0: µ1 = µ2
H1: µ1 ≠ µ2
n1 = 75, n2 = 63 ⇒ Z-distribution
Critical Value
α = 0,05
Zα/2 = Z(0,05/2) = Z(0,025)
= ±1,96
Test Statistic
Zcalc = (5 − 4) / √(4/75 + 3/63)
= 3,15
Since Zcalc = 3,15 is greater than 1,96 we reject H0 and conclude at the 5% level of significance that
the level of performance of these two cars is not the same.
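The two-sample test statistic has the same form for large and small samples; only the table used for the critical value changes. A minimal sketch, with an illustrative function name, that takes variances as inputs (so a standard deviation must be squared first):

```python
from math import sqrt

def two_sample_statistic(mean1, mean2, var1, var2, n1, n2):
    """(mean1 - mean2) / sqrt(var1/n1 + var2/n2), using sample variances."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)

# Nissan vs Toyota: variances 4 and 3 are given directly
z_cars = two_sample_statistic(5, 4, 4, 3, 75, 63)

# Male vs female scores: 7.4 and 6.7 are treated as standard deviations, hence squared
t_students = two_sample_statistic(76.2, 78.5, 7.4**2, 6.7**2, 15, 12)

print(f"Zcalc (cars)     = {z_cars:.2f}")
print(f"tcalc (students) = {t_students:.2f}")
```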
The principal of a college wants to compare the performance of two teachers, X and Y. X was assessed
8 times with a mean score of 6,2 and a variance of 2,15. Y was assessed 6 times with a mean
score of 5,8 and a variance of 1,2. Test the hypothesis at the 1% level of significance that there
is no difference between the mean scores obtained by the two teachers.
Let X be population 1
Let Y be population 2
Critical Value
α/2 = 0,01/2 = 0,005
t(α/2; n1 + n2 − 2) = t(0,005; 12) = ±3,05
In some cases, it is possible to pair the measurements from one population or sample, for example
before and after measurements on the same units. The hypothesis test examines the differences between
the two measurements. We will always have small samples for this type of hypothesis test.
Lower tail test:
H0: µd = 0
H1: µd < 0
Critical Value
−t(α; n − 1)
Upper tail test:
H0: µd = 0
H1: µd > 0
Critical Value
t(α; n − 1)
Two tail test:
H0: µd = 0
H1: µd ≠ 0
Critical Value
±t(α/2; n − 1)
Test Statistic
tcalc = d̄ / (sd/√n)
where d represents the difference between the before measurement and the after measurement, that is
d = B − A
d̄ = Σd / n, n = sample size
sd = standard deviation of the differences, calculated as sd = √((Σd² − n(d̄)²) / (n − 1))
Example
The following table shows the heart rates of a particular group of people before and after the use of tobacco.
Does tobacco use increase the heart rates of these people? Test using α = 5%.
Since d = B − A, an increase in heart rate gives negative differences, so this is a lower tail test.
H0: µd = 0
H1: µd < 0
Critical Value
−t(0,05; 9) = −1,83
Rejection Criterion
Reject H0 if tcalc < −1,83
Test Statistic
d̄ = Σd / n
= −192/10
= −19,2
sd = √((Σd² − n(d̄)²) / (n − 1))
= √((4376 − 10(−19,2)²) / 9)
= 8,75
tcalc = −19,2 / (8,75/√10)
= −6,94
Since tcalc = −6,94 is less than −1,83 we reject H0 at the 5% level of significance and conclude that the use
of tobacco does cause an increase in heart rates.
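The paired calculation above can be reproduced from the two sums alone. The sketch below assumes only the summary values Σd = −192, Σd² = 4376 and n = 10 from the worked example; the function name is illustrative.

```python
from math import sqrt

def paired_t_from_sums(sum_d, sum_d2, n):
    """Return (mean difference, sd of differences, tcalc) from sum(d) and sum(d^2)."""
    d_bar = sum_d / n
    s_d = sqrt((sum_d2 - n * d_bar**2) / (n - 1))
    t_calc = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t_calc

d_bar, s_d, t_calc = paired_t_from_sums(-192, 4376, 10)
print(f"mean difference = {d_bar:.1f}, sd = {s_d:.2f}, tcalc = {t_calc:.2f}")

# Lower tail test at alpha = 0.05 with 9 degrees of freedom (table value 1.83)
print("reject H0" if t_calc < -1.83 else "fail to reject H0")
```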
You have been trying to control the weight of a chocolate candy bar by intervening in the production
process. The following table shows the weights before and after the intervention. Has the intervention
managed to reduce the weight of the chocolate candy bar? Test at the 1% level of significance (upper
tailed).
H0: µd = 0
H1: µd > 0
Critical Value
t(0,01; 9) = 2,82
Test Statistic
d̄ = 0,45/10
= 0,045
sd = √((0,0539 − 10(0,045)²) / 9)
= 0,0611
tcalc = 0,045 / (0,0611/√10)
= 2,33
Since tcalc = 2,33 is less than 2,82 we fail to reject H0 at the 1% level of significance and conclude that
there is not enough evidence that the intervention has reduced the weight of the chocolate candy bars.
Common statements like 'the average price of petrol per liter is between $1,40 and $1,50' are
examples of interval estimates. In statistics it is customary to give not only the interval estimate for a
parameter but also the probability that the interval contains the parameter. This probability
is the level of confidence, for example 90%, 95%, 97%.
Small Sample
n < 30
The confidence interval estimate for a small sample is given by the following formula
x̄ − t(α/2; n − 1)(s/√n) ≤ µ ≤ x̄ + t(α/2; n − 1)(s/√n)
or
x̄ ± t(α/2; n − 1)(s/√n)
Example
From a sample of 64 car commuters, the sample mean time taken to commute to work daily was
found to be 26,5 minutes. The standard deviation is known to be 15 minutes. Find the 95% confidence
interval estimate of the actual mean time µ taken by all car commuters.
n = 64 ⇒ Z-distribution
α = 100% − 95%
= 5%
= 0,05
x̄ − Z(α/2)(σ/√n) ≤ µ ≤ x̄ + Z(α/2)(σ/√n)
26,5 − Z(0,05/2)(15/√64) ≤ µ ≤ 26,5 + Z(0,05/2)(15/√64)
26,5 − Z(0,025)(15/√64) ≤ µ ≤ 26,5 + Z(0,025)(15/√64)
26,5 − 1,96(1,875) ≤ µ ≤ 26,5 + 1,96(1,875)
22,83 ≤ µ ≤ 30,18
We are 95% confident that the mean time taken to commute to work daily lies between
22,83 minutes and 30,18 minutes.
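A minimal sketch of the large-sample confidence interval for the commuter example, assuming the Z critical value is taken from the inverse normal distribution in the standard library (Python 3.8 or later). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence):
    """Interval mean +/- Z(alpha/2) * sd / sqrt(n) for a large sample."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = z_confidence_interval(26.5, 15, 64, 0.95)
print(f"{low:.2f} <= mu <= {high:.2f}")
```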
If the sample size in the example above was 25, with the mean and standard deviation remaining the
same, compute a 99% confidence interval estimate of the population mean µ.
n = 25 ⇒ t-distribution
α = 100% − 99%
= 1%
= 0,01
26,5 − t(0,01/2; 24)(15/√25) ≤ µ ≤ 26,5 + t(0,01/2; 24)(15/√25)
26,5 − 2,80(3) ≤ µ ≤ 26,5 + 2,80(3)
18,1 ≤ µ ≤ 34,9
The mean time taken to commute to work daily lies between 18,1 minutes and 34,9 minutes with a
probability of 0,99.
Large Samples
(X̄1 − X̄2) − Z(α/2)√(σ1²/n1 + σ2²/n2) ≤ (µ1 − µ2) ≤ (X̄1 − X̄2) + Z(α/2)√(σ1²/n1 + σ2²/n2)
Or
(X̄1 − X̄2) ± Z(α/2)√(σ1²/n1 + σ2²/n2)
A company has two shops, A and B, and wants to compare the efficiency of the employees of these two shops. 30
employees were sampled from shop A and 20 from shop B and their performance was observed. Shop A
employees completed a given task in 30 minutes on average with a sample standard deviation of
6 minutes. Shop B employees took 25 minutes on average to complete the same task with a
sample variance of 25. Construct a 95% confidence interval estimate for the difference in the
mean number of minutes taken to complete the task by the employees from the two shops.
n1 + n2 > 30 ⇒ Z-distribution
(X̄1 − X̄2) − Z(α/2)√(σ1²/n1 + σ2²/n2) ≤ (µ1 − µ2) ≤ (X̄1 − X̄2) + Z(α/2)√(σ1²/n1 + σ2²/n2)
(30 − 25) − Z(0,05/2)√(6²/30 + 25/20) ≤ (µ1 − µ2) ≤ (30 − 25) + Z(0,05/2)√(6²/30 + 25/20)
5 − Z(0,025)√(36/30 + 25/20) ≤ (µ1 − µ2) ≤ 5 + Z(0,025)√(36/30 + 25/20)
5 − 1,96(1,565) ≤ (µ1 − µ2) ≤ 5 + 1,96(1,565)
1,93 ≤ (µ1 − µ2) ≤ 8,07
We are 95% confident that the difference between the mean number of minutes taken to complete a
given task by the employees from the two shops is between 1,93 minutes and 8,07 minutes.
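The interval for the difference between two means can be sketched in the same way. The example assumes that shop A's figure of 6 is a standard deviation (so its variance is 36) and that shop B's figure of 25 is already a variance; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(mean1, mean2, var1, var2, n1, n2, confidence):
    """(mean1 - mean2) +/- Z(alpha/2) * sqrt(var1/n1 + var2/n2) for large samples."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

# Shop A: mean 30, sd 6 (variance 36), n = 30; shop B: mean 25, variance 25, n = 20
low, high = diff_confidence_interval(30, 25, 6**2, 25, 30, 20, 0.95)
print(f"{low:.2f} <= mu1 - mu2 <= {high:.2f}")
```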
Small Samples
n1 + n2 < 30 ⇒ t-distribution
(X̄1 − X̄2) − t(α/2; n1 + n2 − 2)√(s1²/n1 + s2²/n2) ≤ (µ1 − µ2) ≤ (X̄1 − X̄2) + t(α/2; n1 + n2 − 2)√(s1²/n1 + s2²/n2)
Or
(X̄1 − X̄2) ± t(α/2; n1 + n2 − 2)√(s1²/n1 + s2²/n2)
If the sample sizes in the example above were 15 employees from shop A and 10 employees from shop
B, with the standard deviations remaining the same, compute a 90% confidence interval estimate for the
difference in the mean number of minutes taken to complete the given task in the two shops.
(30 − 25) − t(0,05; 23)√(6²/15 + 25/10) ≤ (µ1 − µ2) ≤ (30 − 25) + t(0,05; 23)√(6²/15 + 25/10)
A sample drawn from a given population must be representative for a fair conclusion to be made
about the population. It follows that, for a given confidence level, standard deviation
and margin of error within which the true average is expected to fall, the sample size can
be calculated using the formula
n = (Z(α/2) · σ / e)²
σ = standard deviation, which shows how much variation one expects in the responses
e = margin of error
A recent study of a private company's employee salaries showed a standard deviation of $251,35. The
study would like to estimate the mean salary to within ±$80 of the true mean with a 90%
confidence level. What sample size of employees must be used for the study?
e = 80
σ = 251,35
α = 100% − 90% = 10% = 0,1
n = (Z(0,1/2) · 251,35 / 80)²
= (Z(0,05) · 251,35 / 80)²
= (1,64 · 251,35 / 80)²
= 26,55
≈ 27 employees
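The sample size calculation can be sketched as a small function that always rounds up, since a fraction of an employee cannot be sampled. The function name is illustrative and the Z value comes from the standard library rather than a table.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sd, margin, confidence):
    """Smallest whole n satisfying n >= (Z(alpha/2) * sd / margin)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)

n_required = required_sample_size(251.35, 80, 0.90)
print(f"sample size = {n_required} employees")
```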
THE CHI-SQUARE DISTRIBUTION OR TEST
The chi-square distribution is a distribution obtained by multiplying the ratio of sample variance to
population variance by the degrees of freedom when random samples are selected. Expected
frequencies, denoted E, are frequencies obtained by calculation, whereas observed frequencies, denoted O,
are obtained by observation. The chi-square distribution is denoted χ² and it is used to test for
independence.
In this test the claim is that the row and column variables are independent of each other. The
hypotheses for this test are stated as follows.
Or
Or
Critical Value
χ²(α; (r − 1)(c − 1))
r = number of rows
c = number of columns
In order to determine whether or not a relationship exists between blood type and the severity of
winter flu, a survey was conducted and yielded the following results.
Critical Value
α = 0,05
r = 3, c = 4
d.f. = (3 − 1)(4 − 1) = 6
χ²(0,05; 6) = 12,60
Similarly, for α = 0,05, r = 7 and c = 5:
d.f. = (7 − 1)(5 − 1) = 24
χ²(0,05; 24) = 36,4
Rejection Criterion
Reject H0 if χ²calc > χ²(α; (r − 1)(c − 1))
Test Statistic
χ²calc = Σ (O − E)² / E
NB. Each observed frequency must have its own expected frequency
Since χ²calc = 56,73988 is greater than 12,60 we reject H0 at the 5% level of significance and
conclude that there is a relationship between blood type and the severity of winter flu.
A survey of 382 respondents produced the following results.
        1     2     3     total
1       45    87    52    184
2       33    65    100   198
total   78    152   152   382
Test the hypothesis that the row into which a response falls is independent of the column into which it falls. Use
α = 1%.
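A sketch of how the test statistic for the survey table could be computed. It assumes the usual expected frequency E = (row total × column total) / grand total and uses only the six cell counts, so the totals are recomputed from the cells. The critical value 9,21 is the table value for χ²(0,01; 2) and is entered by hand.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e

    df = (len(observed) - 1) * (len(observed[0]) - 1)        # (r - 1)(c - 1)
    return chi2, df

observed = [[45, 87, 52],
            [33, 65, 100]]
chi2_calc, df = chi_square_independence(observed)
critical = 9.21  # table value for alpha = 0.01 with 2 degrees of freedom

print(f"chi-square = {chi2_calc:.2f}, d.f. = {df}")
print("reject H0" if chi2_calc > critical else "fail to reject H0")
```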
Regression analysis is a statistical method that establishes a linear relationship between two variables.
Correlation analysis looks closely at the strength of this linear relationship between the variables.
The purpose of simple linear regression analysis is to examine some form of linear relationship
between two random variables. These variables are denoted x and y.
X values are always known or can easily be found, whereas Y values are
estimated using x values.
Scatter Plot
This is a plot of x values against y values. The x values make up the horizontal axis of the graph and the y
values make up the vertical axis. The graph is drawn by plotting dots where the values of
x and y intersect. If the dots seem to lie in a linear form, then a linear relationship exists between the
two variables.
This suggests that x values can be confidently used in predicting the y values.
[Figure: Scatter Plot, Y-Values against x]
This indicates that there is a linear relationship between x and y and it is positive:
as x increases, y increases.
[Figures: two scatter plots of y against x]
This indicates a linear relationship between x and y and it is negative.
For a negative linear relationship, as x increases y decreases. If the
dots are scattered all over the space, this suggests no linear relationship between x and y.
If a linear relationship exists between two variables, then the x values can be relied upon in predicting the
y values. If a linear relationship does not exist, then the x values cannot be relied
upon in predicting the y values.
The following data shows the number of garments and the amount of cloth in meters.
number of garments   cloth in meters
45 25
28 16
34 20
42 28
34 19
30 17
42 22
39 20
24 14
32 17
20 6
The dependent variable is the number of garments and the independent variable is the cloth in meters.
[Figure: scatter plot with cloth in meters on the vertical axis (0 to 30) and number of garments on the horizontal axis (0 to 50)]
From the scatter plot, a positive linear relationship exists between cloth in meters and number
of garments.
The following data gives the profits for a particular type of machine and the
number of units sold in different shops.
a) Determine the independent and dependent variables.
b) Construct a scatter plot of the data and comment.
Solution
Dependent variable = profit
Independent variable = number of units
profit   number of units
550 42
600 38
650 35
600 40
500 44
650 38
450 45
500 42
[Figure: scatter plot with number of units on the vertical axis (0 to 50) and profit on the horizontal axis (0 to 700)]
The estimated value of the dependent variable y is given by a linear function β̂0 + β1x of the
explanatory variable x.
The parameter β̂0 is known as the intercept parameter and the parameter β1 is known as the slope
parameter. The slope parameter β1 is of particular interest since it indicates how the expected value of
y depends on x. If β1 > 0, then a positive linear relationship exists between x and y.
[Figures: scatter plots of y against x]
NB: The two unknown parameters β̂0 and β1 are estimated from a data set.
β1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
β̂0 is calculated from β1 as follows
β̂0 = Σy/n − β1(Σx/n) = ȳ − β1x̄
NB: for a specific value of the explanatory variable x the equation provides an estimated value
of y
Example
The following is sample data obtained in a study of the relationship between the number of years that
applicants for a certain job have studied English language in high school or college and the grades
which they received in a proficiency test in that language.
d) Superimpose the regression line on the scatter graph.
e) Predict the grade in the test for someone who has studied English language for 8 years.
[Figure: scatter plot of grade in test (vertical axis, 0 to 90) against number of years (horizontal axis, 0 to 6)]
ŷ = β̂0 + β1x
β1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
= (10(2404) − (35)(667)) / (10(133) − 1225)
= 6,62
β̂0 = ȳ − β1x̄
= 66,7 − (6,62)(3,5)
= 43,53
ŷ = 43,53 + 6,62x
d) Since β1 > 0, there is a positive linear relationship between x and y: as x increases by one unit,
y increases by about 6,62 units. Two points on the line are:
x = 2: ŷ = 43,53 + 6,62(2) = 56,77
x = 4: ŷ = 43,53 + 6,62(4) = 70,01
e) ŷ = 43,53 + 6,62x
= 43,53 + 6,62(8)
= 96,49
≈ 96%
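The slope and intercept can be computed directly from the summary sums. This sketch uses the sums that appear in the worked solution (n = 10, Σx = 35, Σy = 667, Σxy = 2404, Σx² = 133); the function name is illustrative.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares intercept and slope from the usual summary sums."""
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n   # y-bar minus slope times x-bar
    return intercept, slope

b0, b1 = fit_line_from_sums(10, 35, 667, 2404, 133)
predicted = b0 + b1 * 8   # predicted grade after 8 years of study

print(f"y-hat = {b0:.2f} + {b1:.2f}x")
print(f"prediction at x = 8: {predicted:.2f}")
```

A prediction at x = 8 lies outside the plotted range of x (0 to 6), so it is an extrapolation and should be read with caution.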
Correlation Analysis
Correlation analysis tests the strength of the relationship between two variables. It measures the strength of the
linear relationship between the independent variable x and the dependent variable y.
The correlation coefficient is denoted r and takes values in the range −1 ≤ r ≤ 1.
r = −1: perfect negative correlation
−1 < r < 0: negative correlation
r = 0: no correlation
0 < r < 1: positive correlation
r = 1: perfect positive correlation
The common correlation coefficient used in statistics is the Pearson correlation coefficient.
It is calculated by
r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²))
r = correlation coefficient
x = independent variable
y = dependent variable
Suppose an experiment involving 5 subjects is conducted to determine the relationship between the
percentage of a certain drug in the blood stream and the length of time it takes to react to a stimulus.
β1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
= (5(37) − (15)(10)) / (5(55) − (15)²)
= 0,7
β̂0 = ȳ − β1x̄
= 2 − (0,7)3
= −0,1
ŷ = −0,1 + 0,7x
r = 0,93
Coefficient of determination
COD = r² × 100
This measure helps to determine the strength of association between the two variables and is
expressed as a percentage: it is the share of the variation in y explained by x. It also helps in
estimating the reliability of the x values in predicting the y values.
COD = 0,93 × 0,93 × 100 = 86,49%
The x values are 86,49% reliable in predicting the y values according to this model.
A lady operates a hot dog stand in the park. She suspects that there is a relationship between the
temperature on a given day and the number of hot dogs she sells on that day. She begins to keep
track of the data and obtains the following results.
d) interpret
e) estimate the increase in the number of hot dogs sold when the temperature increases from 27
degrees to 30degrees.
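ŷ = β̂0 + β1x
β1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
n = 10, Σx = 265, Σx² = 7207, Σy = 644, Σy² = 42186, Σxy = 17413
β1 = (10(17413) − (265)(644)) / (10(7207) − 265²)
= 1,88
β̂0 = ȳ − β1x̄
= 64,4 − 1,88(26,5)
= 14,58
ŷ = 14,58 + 1,88x
r = (nΣxy − ΣxΣy) / √((nΣx² − (Σx)²)(nΣy² − (Σy)²))
= (10(17413) − (265)(644)) / √((10(7207) − 265²)(10(42186) − 644²))
= 0,96
COD = 0,96 × 0,96 × 100 = 92,16%
The x values are 92,16% reliable in predicting the y values according to this model.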
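The correlation coefficient and the coefficient of determination for the hot dog data can be checked from the summary sums used in the solution. The function name is illustrative. Because r is not rounded to two decimals before squaring, the percentage comes out slightly lower than the hand calculation.

```python
from math import sqrt

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson correlation coefficient from the usual summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

n, sum_x, sum_y = 10, 265, 644
sum_xy, sum_x2, sum_y2 = 17413, 7207, 42186

r = pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
cod = r ** 2 * 100                                            # coefficient of determination, %
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
extra_hot_dogs = slope * (30 - 27)                            # e) change from 27 to 30 degrees

print(f"r = {r:.2f}, COD = {cod:.1f}%")
print(f"expected increase in hot dogs sold: {extra_hot_dogs:.2f}")
```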
CHI- SQUARE DISTRIBUTION OR TEST
The chi-square distribution is a distribution obtained by multiplying the ratio of sample variance to
population variance by the degrees of freedom when random samples are selected. Expected
frequencies denoted as E are frequencies obtained by calculation whereas observed frequencies,
denoted by O, are obtained by taking observations. The chi-square distribution is denoted by χ² and
it is used to test for independence.
In this test the claim is that the row and column variables are independent of each other. The
hypotheses for this test are stated as follows:
Or
Or
r= number of rows.
c= number of columns.
Example: In order to establish whether or not a relationship exists between blood type and the
severity of winter flu, a survey was conducted and it gave the following results:
Test at the 5% level of significance whether there is a relationship.
H0 : There is no relationship between blood type and the severity of winter flu.
H1 : There is a relationship between blood type and the severity of winter flu.
Critical Value
χ²(α; (r − 1)(c − 1))
Rejection Criterion
Reject H0 if χ²calc > χ²(α; (r − 1)(c − 1))
Test statistic: χ²calc = Σ (O − E)² / E
NB* Each observed frequency must have its own corresponding expected frequency.