QAB II Lecture Notes2021

The document outlines various sampling techniques including simple random, systematic, stratified, and cluster sampling, along with their descriptions, methods, advantages, and applications. It provides an overview of statistics, distinguishing between descriptive and inferential statistics, and discusses data types, collection methods, and presentation techniques such as frequency distribution and stem-and-leaf plots. Additionally, it covers grouped data analysis, cumulative frequency distributions, and the construction of histograms and ogives.

Uploaded by

musvavajobe

QUANTITATIVE ANALYSIS FOR BUSINESS TWO

Prepared by Mr S. Munyala (WhatsApp 0782022800 / Call 0771524834)

Sampling Techniques

Simple Random Sampling

Description: - Equal chance, unbiased

Methods
Business Examples
Advantages
Disadvantages
Uses or applications of the technique

Systematic Sampling
Description: The first item is selected at random, and every kth item after it is then selected
Method: Calculation
Formula
Business Examples
Advantages
Disadvantages
Uses or applications of the technique

Stratified Sampling
Description: Homogeneous strata, random sampling within each stratum

Cluster Sampling
Description: Heterogeneous clusters

Overview of Statistics
Before examining the broad areas of statistics, it is necessary to become familiar with certain
terms and concepts used extensively in the subject.
1. Random Variable- A characteristic being measured or observed is called a variable, for
example weight or height. Because a variable can take on a different value at each
measurement or observation, it is termed a random variable, for example the distance
travelled per day by a delivery truck.
2. Sampling Unit- A sampling unit is the item or individual being measured or counted
with respect to the random variable under study for example the random variable is
distance and the sampling unit is each delivery truck.
3. Population – The collection of all observations of the random variable under study,
about which the researcher is trying to draw conclusions. A population must be
defined in very specific terms to include only those sampling units with characteristics
that are relevant to the problem, for example all the delivery vehicles in Zimbabwe.
4. Sample- Not every member of the population is observable or measurable for reasons
mainly of cost and time. A subset of the population on which observations are made
or measurements taken is referred to as a sample for example a random sample of two
hundred delivery vehicles is selected and their daily distances travelled are recorded.
There are two major components in the discipline of statistics
a) Descriptive Statistics- It aims to identify the essential characteristics of a random
variable and produce a profile of its behaviour. This is achieved through summary
measures.
b) Inferential Statistics- This generalizes sample findings to the broader population. It is
that area of statistics which extends the information extracted from a sample to the
actual environment.
1. Qualitative and Quantitative Data
2. Scales of Measurement
a) Nominal-scaled data categorize data, for example gender and profession.
b) Ordinal-scaled data rank data, for example Likert-type scales. Statement: NUST is the
best university in Zimbabwe. Options: strongly disagree, neutral, agree, strongly agree.
c) Interval-scaled data
d) Ratio-scaled data

3. Data Collection Methods

4. Data sources
Data can be classified into two classes that is discrete or continuous
DISCRETE DATA

A random variable whose observations can take only specific values, usually integers
(whole numbers), is referred to as a discrete random variable. In such instances certain
values are valid whilst others are invalid, e.g. the number of cars in a parking lot at a
given time, the number of students in a class or the number of employees in an
organization.

CONTINUOUS DATA
A random variable whose observations can take on any value in an interval is said to
generate continuous data for example the mass of a person, distance travelled, and time
taken to travel to work daily.

STRUCTURE OF THE COURSE

The course is organized under the following general headings

1. Methods to describe the characteristics of a random variable.
a) Presentation of data
b) Measures of central tendency
c) Measures of dispersion
d) Skewness
2) Quantifying Uncertainty
a) Basic Probability Concepts
b) Probability distributions- The Binomial distribution
- The Poisson distribution
- The Normal distribution
3) Inferential Statistics
a) The basics of sampling
b) Confidence Intervals
c) Hypothesis Testing
d) The chi-squared distribution

4) Forecasting
a) Regression and Correlation
Section 1

PRESENTATION OF DATA
UNGROUPED DATA
There are a number of ways in which ungrouped data can be presented, such as frequency
distribution tables and stem and leaf plots.
A frequency distribution is a table which lists each value with its corresponding frequency.
The following data correspond to the performance of students in a test. Construct the
frequency distribution table to illustrate the information. The random variable is marks,
represented by X.
X1=10    X6=20     X11=25    X16=21
X2=20    X7=27     X12=15    X17=80
X3=25    X8=80     X13=17    X18=10
X4=15    X9=15     X14=25    X19=27
X5=10    X10=20    X15=30    X20=21

Mark    Frequency
10      3
15      3
17      1
20      3
21      2
25      3
27      2
30      1
80      2

∑f = 20
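The tallying step can also be checked by machine. The following is a minimal Python sketch, not part of the original notes; the variable names `marks` and `frequency` are chosen here for illustration. It counts how often each mark occurs in the 20 observations listed above and confirms that the frequencies sum to n.

```python
from collections import Counter

# Marks of the 20 students, in the order X1 to X20 as listed above.
marks = [10, 20, 25, 15, 10, 20, 27, 80, 15, 20,
         25, 15, 17, 25, 30, 21, 80, 10, 27, 21]

# Counter tallies how many times each distinct mark occurs.
frequency = Counter(marks)

print("Mark  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>4}  {frequency[mark]:>9}")

# The frequencies must add up to the number of observations.
print("Sum of f =", sum(frequency.values()))
```

Sorting the keys before printing gives the table in ascending order of marks, which is how a frequency distribution is normally laid out.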

Stem and Leaf Plots

A stem and leaf plot is a way of summarizing data. It can be constructed in two phases: a rough
draft and a final draft. Each number is divided into two parts, one called the stem and the other
called the leaf.
NB: In the final draft the data values are arranged in ascending order, and it is important to have
a key.

Example: The following data show the ages in years of people who were shopping at a
supermarket one afternoon. Construct a stem and leaf plot to present the data.

55 15 25 50 28 66 73 25 24 47 10 45 54
55 55 43 57 53 65 38 30 29 64 12 70
16 24 25 40 15 36 53 57 24 27

Rough Draft

Stem | Leaf
1    | 0 6 5 2 5
2    | 9 4 5 5 5 4 4 8 7
3    | 0 6 8
4    | 3 5 0 7
5    | 5 7 3 4 3 7 0 5 5
6    | 6 4 5
7    | 3 0

Final Draft

Stem | Leaf
1    | 0 2 5 5 6
2    | 4 4 4 5 5 5 7 8 9
3    | 0 6 8
4    | 0 3 5 7
5    | 0 3 3 4 5 5 5 7 7
6    | 4 5 6
7    | 0 3

Key: 1 | 0 = 10
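As a cross-check on the final draft, here is a short Python sketch (an illustration added here, with the hypothetical helper name `stem_and_leaf`). It assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Ages of the 35 shoppers from the example above.
ages = [55, 15, 25, 50, 28, 66, 73, 25, 24, 47, 10, 45, 54,
        55, 55, 43, 57, 53, 65, 38, 30, 29, 64, 12, 70,
        16, 24, 25, 40, 15, 36, 53, 57, 24, 27]

def stem_and_leaf(values):
    """Return {stem: leaves in ascending order}; stem = tens digit, leaf = units digit."""
    plot = defaultdict(list)
    for value in sorted(values):          # sorting first gives the final draft directly
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
print("Key: 1 | 0 = 10")
```

Removing the `sorted` call inside the function would give the rough draft instead, with leaves in the order in which the data were recorded.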

Example
Twenty-five students obtained the following marks in a statistics test and economics test.
Prepare a stem and leaf plot for the data and comment on the overall performance of the
students in both subjects.
Marks in Stats
75 34 29 91 81 47
30 20 21 32 32 68
58 36 18 15 23 30
21 23 28 22 34 45

Marks in Economics
16 29 45 58 64 78
42 54 66 72 34 35
54 91 74 24 84 92
70 78 54 52 18 41 65

Rough Draft

Marks in Statistics | Stem | Marks in Economics
                5 8 |  1   | 6 8
    3 2 8 1 9 3 0 1 |  2   | 9 4
      0 4 2 2 6 4 0 |  3   | 4 5
                5 7 |  4   | 2 5 1
                  8 |  5   | 4 4 4 8 2
                  8 |  6   | 5 6 4
                  5 |  7   | 0 8 4 2 8
                  1 |  8   | 4
                  1 |  9   | 1 2

Final Draft Back to Back Stem and Leaf Plot

Marks in Statistics | Stem | Marks in Economics
                8 5 |  1   | 6 8
    9 8 3 3 2 1 1 0 |  2   | 4 9
      6 4 4 2 2 0 0 |  3   | 4 5
                7 5 |  4   | 1 2 5
                  8 |  5   | 2 4 4 4 8
                  8 |  6   | 4 5 6
                  5 |  7   | 0 2 4 8 8
                  1 |  8   | 4
                  1 |  9   | 1 2

Key (Statistics): 5 | 1 = 15        Key (Economics): 2 | 4 = 24

The stem and leaf plot shows that students performed better in economics than in statistics

Exercise
The data below show the number of villagers interviewed in different villages of the country
1) Construct a frequency distribution to illustrate the data
2) Construct a stem and leaf plot of the data

176 168 168 147 156

152 180 153 165 160
140 134 168 143 171
158 136 162 166 155
170 153 174 170 169

Grouped Data
Type A: Continuous
Data are grouped into classes, for example 20≤30 and 30≤40.
If the random variable is x, then 20≤30 denotes the class 20 ≤ x < 30, where 20 is the lower class limit and
30 is the upper class limit.

We use the abbreviations LCL and UCL to denote these. The difference between the LCL
and UCL is called the class interval or class length or class width.
Class width= UCL – LCL
= 30 - 20
=10
The sum of the LCL and UCL divided by two gives the midpoint of a class, usually denoted
as x:

$$x = \frac{\text{LCL} + \text{UCL}}{2}$$

Class Limits X
20≤30 25
30≤40 35
40≤50 45
50≤60 55
60≤70 65
70≤80 75
80≤90 85

NB. The UCL of the first class is the LCL of the succeeding class.

Type B: Discrete

Classes Adjusted Classes

5-9 4.5≤9.5
10-14 9.5≤14.5
15-19 14.5≤19.5
20-24 19.5≤24.5
25-29 24.5≤29.5
In type B classifications we need to adjust class limits before any statistical procedures can be
carried out. We adjust the classes by adding one half /0.5 to the original UCL and subtracting
one half/0.5 from the original LCL for example 5-9 becomes (5- 0.5) ≤ (9 + 0.5)= 4.5≤ 9.5

If the classes are already given in the "less than" form (for example 20≤30), there is no need to adjust them.

There are a number of ways of presenting grouped data such as frequency distribution,
histogram, frequency polygon and cumulative frequency distributions.

Example

The owner of a small business wants to analyse profits over the past 25-day period. Using a class
interval of 5 beginning at 20, construct a:

a) Frequency distribution

b) Histogram

c) Frequency polygon

N.B. In the original classes use an interval of four (for example 20-24) so that the adjusted classes have an
interval of 5.

21 27 35 41 23

32 30 35 28 38

36 32 33 32 34

42 29 43 37 20

32 30 20 34 35

Classes Adjusted Classes Distribution Frequency

20-24 19.5≤24.5 4
25-29 24.5≤29.5 3
30-34 29.5≤34.5 9
35-39 34.5≤39.5 6
40-44 39.5≤44.5 3
A histogram is a form of a bar-chart or graph where the areas or lengths of each bar
are proportional to the frequencies of the classes

c. Frequency Polygon

Before constructing a frequency polygon, find the midpoint of each class and then plot the midpoint
against the corresponding frequency

Adjusted Classes Distribution Frequency Midpoint x

19.5≤24.5 4 22
24.5≤29.5 3 27
29.5≤34.5 9 32
34.5≤39.5 6 37
39.5≤44.5 3 42

Adjusted Classes Distribution Frequency Cumulative Frequency

19.5≤24.5 4 4

24.5≤29.5 3 7

29.5≤34.5 9 16

34.5≤39.5 6 22

39.5≤44.5 3 25
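The three tables above (frequency, midpoint and cumulative frequency) can be produced in one pass. The sketch below is an illustration only; the function name `grouped_table` and its parameters are assumptions made here, and it assumes whole-number data with no value below the starting point.

```python
# Daily profits from the 25-day example above.
profits = [21, 27, 35, 41, 23, 32, 30, 35, 28, 38, 36, 32, 33,
           32, 34, 42, 29, 43, 37, 20, 32, 30, 20, 34, 35]

def grouped_table(values, start, width):
    """Return rows of (adjusted LCL, adjusted UCL, f, midpoint, cumulative f).

    Original classes run from start to start + width - 1 (for example 20-24)
    and are adjusted by 0.5 on each side, as for Type B classes.
    """
    if min(values) < start:
        raise ValueError("start must not exceed the smallest observation")
    rows, cumulative, lower = [], 0, start
    while cumulative < len(values):
        upper = lower + width - 1                      # original class limits
        f = sum(lower <= v <= upper for v in values)   # absolute frequency
        cumulative += f
        lcl, ucl = lower - 0.5, upper + 0.5            # adjusted class limits
        rows.append((lcl, ucl, f, (lcl + ucl) / 2, cumulative))
        lower += width
    return rows

rows = grouped_table(profits, start=20, width=5)
for lcl, ucl, f, midpoint, cum in rows:
    print(f"{lcl}≤{ucl}  f={f}  x={midpoint}  F={cum}")
```

The midpoints are what the frequency polygon is plotted against, while the adjusted upper limits and cumulative frequencies are the points needed for an ogive.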

CUMULATIVE FREQUENCY DISTRIBUTION

A cumulative frequency distribution accumulates the frequencies class by class. The graph of a less than
cumulative frequency distribution is called an ogive. To construct an ogive, plot the UCL of each class
against the cumulative frequency and join the points freehand (it has to be a smooth curve).

Example

The following data show the number of days on which patients visited a clinic for
counselling. Using a class interval of 15 starting at 125, construct a:
a) Frequency distribution
b) Cumulative frequency distribution
c) Ogive
d) Relative frequency distribution

185 165 180 174 125

185 168 188 182 181

154 175 175 160 192

168 134 188 142 175

156 150 214 160 145

172 188 192 185 172

Original Classes   Adjusted Classes   Frequency   Cumulative Frequency   Relative Frequency (R.F)
125-139            124.5≤139.5        2           2                      0.067
140-154            139.5≤154.5        4           6                      0.133
155-169            154.5≤169.5        6           12                     0.2
170-184            169.5≤184.5        9           21                     0.3
185-199            184.5≤199.5        8           29                     0.267
200-214            199.5≤214.5        1           30                     0.033

N.B. The last figure in the cumulative frequency column should be equal to ∑f.

The relative frequency is calculated as follows:

$$\text{R.F} = \frac{\text{Absolute Frequency}}{\text{Total Frequency}}$$

N.B. The sum of all relative frequencies should be one.
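Both N.B. checks are easy to verify numerically. This is a small Python sketch added for illustration, using the absolute frequencies from the clinic table above; the variable names are chosen here and are not part of the notes.

```python
from itertools import accumulate

# Absolute frequencies of the six classes in the clinic example.
frequencies = [2, 4, 6, 9, 8, 1]
total = sum(frequencies)                      # total frequency, sum of f

cumulative = list(accumulate(frequencies))    # running totals, class by class
relative = [f / total for f in frequencies]   # R.F = absolute / total

for f, cum, rf in zip(frequencies, cumulative, relative):
    print(f"f={f}  F={cum}  R.F={rf:.3f}")

# Check 1: the last cumulative frequency equals the total frequency.
print(cumulative[-1] == total)
# Check 2: the relative frequencies sum to one (up to rounding error).
print(abs(sum(relative) - 1) < 1e-9)
```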

Example
The following data are the marks obtained by a group of students in a statistics exam

68 49 69 41 79

42 60 87 65 68

50 61 85 66 63

52 56 74 59 81

57 88 47 55 65

78 90 65 72 95

a) Group the data into classes with an interval of 10 starting at 40 until all the values have
been accounted for.

b) Construct an ogive for this data

Original Classes   Adjusted Classes   Frequency   Cumulative Frequency
40-49              39.5≤49.5          4           4
50-59              49.5≤59.5          6           10
60-69              59.5≤69.5          10          20
70-79              69.5≤79.5          4           24
80-89              79.5≤89.5          4           28
90-99              89.5≤99.5          2           30

Measures of Central tendency

The behaviour of any random variable can be described by a measure of central tendency and
a measure of dispersion about a central value. Observations of a random variable tend to
group about some central value. The statistical measures that quantify where the majority of
observations are concentrated are referred to as measures of central tendency. There are 3
main measures of central tendency, namely the mean, the mode and the median. Each measure will be
computed for both grouped and ungrouped data.

Mean for Ungrouped Data

In general, the mean, which is denoted as x̅, is defined as the sum of all observations divided by
the number of observations:

$$\bar{x} = \frac{\text{sum of all observations}}{\text{total number of observations}} = \frac{\sum x}{n}$$

A merchandising manager of a retail clothing chain has recorded 30 observations on the
interval between orders for a particular range of women's clothing. The order intervals in days
are

18 15 7 24 10

23 28 10 16 12

5 23 24 16 19

26 17 27 17 17

29 18 23 9 26

12 22 14 26 22

a) Find the mean number of days between orders.

$$\bar{x} = \frac{\sum x}{n} = \frac{555}{30} = 18.5$$

Therefore, the average time between orders is 18.5 days

Mean for Grouped Data

Grouped data are represented by a frequency distribution. To calculate the mean for grouped
data, find the midpoint of each class, multiply it by the corresponding frequency, add up these
products and divide by the total frequency. The formula for calculating the mean is therefore

$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{n}$$

where x is the midpoint of each class and f is the absolute frequency of each class.

Find the mean number of days between orders using the data from the previous example
assuming that the data are grouped as shown in the following table

Order Period Frequency X f(x)

5≤10 3 7.5 22.5
10≤15 5 12.5 62.5
15≤20 9 17.5 157.5
20≤25 7 22.5 157.5
25≤30 6 27.5 165
∑ 𝑓(𝑥) = 565

The average time between orders is

$$\bar{x} = \frac{\sum f x}{n} = \frac{565}{30} = 18.83 \text{ days}$$

The mean for grouped data is not exactly equal to the mean for ungrouped data. The more
reliable mean is the one for ungrouped data because it uses the actual values, unlike the grouped
mean, which replaces every value in a class by the class midpoint.
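The difference between the two means can be seen side by side in code. The following Python sketch is an illustration added here (the name `grouped_mean` is an assumption); it uses the 30 order intervals and the five classes from the examples above.

```python
# The 30 order intervals (days) from the ungrouped example.
intervals = [18, 15, 7, 24, 10, 23, 28, 10, 16, 12,
             5, 23, 24, 16, 19, 26, 17, 27, 17, 17,
             29, 18, 23, 9, 26, 12, 22, 14, 26, 22]

# The same data grouped as (LCL, UCL, frequency).
classes = [(5, 10, 3), (10, 15, 5), (15, 20, 9), (20, 25, 7), (25, 30, 6)]

def grouped_mean(classes):
    """Mean of grouped data: sum of f * midpoint divided by sum of f."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lcl + ucl) / 2 for lcl, ucl, f in classes)
    return sum_fx / n

ungrouped = sum(intervals) / len(intervals)   # uses every actual value
grouped = grouped_mean(classes)               # uses class midpoints only

print(f"Ungrouped mean: {ungrouped:.2f}")
print(f"Grouped mean:   {grouped:.2f}")
```

The gap between the two results comes entirely from replacing each observation by its class midpoint.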

N.B. The mean uses every value of the data set in its computation; as a result it possesses
certain useful properties, which make it the most widely used measure of central
tendency.

Mode for Ungrouped data

Mode is the most frequently occurring value in a dataset. If the number of observations is not
too large, the mode can be found by arranging the data in ascending order and by inspection
that is identifying the value that occurs the most.

The mode is denoted as Mo

Example

Identify the mode for the following data set

74, 48, 36,74, 70, 67, 48, 74, 70, 36, 36, 40, 50, 74

Arranging in ascending order

36, 36, 36, 40, 48, 48, 50, 67, 70, 70, 74, 74, 74, 74

Therefore, Mo=74

Mode for Grouped Data

Calculating the mode for grouped data is based on a frequency distribution table. The first
step is to identify the modal interval and then determine the modal value within the modal
interval. The formula used to accomplish this is given as follows

$$Mo = L_m + \frac{C_m (f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}}$$

Where $L_m$ = lower limit of the modal class

$C_m$ = class width of the modal class

$f_m$ = the absolute frequency of the modal class

$f_{m-1}$ = the frequency of the class preceding the modal class

$f_{m+1}$ = the frequency of the class succeeding the modal class

Using the following data, find the mode

Classes    f
125≤140    4
140≤155    11
155≤170    9
170≤185    9
185≤200    10
200≤215    2

The modal class is 140≤155, so

$$Mo = 140 + \frac{15(11-4)}{2(11)-4-9} = 140 + \frac{105}{9} = 151.67$$

Find the mode for the following sets of grouped data

a)

Classes    f
5≤10       3
10≤15      5
15≤20      9
20≤25      7
25≤30      6

b)

Classes    f
50≤90      2
90≤130     9
130≤170    26
170≤210    27
210≤250    6

a) $$Mo = 15 + \frac{5(9-5)}{2(9)-5-7} = 15 + \frac{20}{6} = 18.33$$

b) $$Mo = 170 + \frac{40(27-26)}{2(27)-26-6} = 170 + \frac{40}{22} = 171.82$$
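The grouped mode formula can be sketched in Python as follows. This is an illustration, not the notes' own procedure in code: the name `grouped_mode` is hypothetical, a missing neighbouring class is treated as having frequency zero (an assumption made here), and if two classes tie for the highest frequency only the first is used.

```python
def grouped_mode(classes):
    """Mode of grouped data; classes = [(LCL, UCL, f), ...] in ascending order.

    Applies Mo = Lm + Cm(fm - f_prev) / (2fm - f_prev - f_next).
    The formula is undefined when the denominator is zero.
    """
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                      # modal class (first one if tied)
    lm, ucl, fm = classes[m]
    cm = ucl - lm                                    # class width of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0            # assumed 0 if there is no class before
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lm + cm * (fm - f_prev) / (2 * fm - f_prev - f_next)

exercise_a = [(5, 10, 3), (10, 15, 5), (15, 20, 9), (20, 25, 7), (25, 30, 6)]
exercise_b = [(50, 90, 2), (90, 130, 9), (130, 170, 26), (170, 210, 27), (210, 250, 6)]

print(f"a) Mo = {grouped_mode(exercise_a):.2f}")
print(f"b) Mo = {grouped_mode(exercise_b):.2f}")
```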

N.B. A major disadvantage of using the mode as a measure of central tendency is that there
can be more than one mode, making it difficult to decide which one to use. A
distribution with one mode is said to be unimodal, and one with two modes is said to be bimodal.

Median for Ungrouped Data

The median is that value of a random variable, which divides an ordered data set into two
equal parts. Half of the observations will fall below this median value and the other half
above it.

When finding the median for ungrouped data, the first step is to arrange the observations in
ascending order. If n is odd, the median is the value in the $\left(\frac{n+1}{2}\right)$th position.

If n is even, identify the value in the $\left(\frac{n}{2}\right)$th position and average this value with the adjacent value
to its right to find the median value.

Example

27 38 12 34 42 40 24 40 23

Step 1: Arrange the data in ascending order

12 23 24 27 34 38 40 40 42

Step 2: n = 9, so n is odd

$$\text{Median position} = \left(\frac{n+1}{2}\right)\text{th} = \left(\frac{9+1}{2}\right)\text{th} = 5\text{th position}$$

Median = 34

Identify the median for the following data sets

27 38 12 42 40 24 40 23 18 34

Ascending order

12 18 23 24 27 34 38 40 40 42

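n = 10, so n is even. The value in the (10/2)th = 5th position is 27 and the adjacent value to its right is 34.

$$\text{Median} = \frac{27+34}{2} = 30.5$$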
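The odd and even cases of the position rule can be written as a short Python function. This is a sketch for illustration; the function name `median` is chosen here and simply follows the rule stated above.

```python
def median(values):
    """Median of ungrouped data using the position rule in these notes."""
    data = sorted(values)                 # step 1: ascending order
    n = len(data)
    if n % 2 == 1:
        # n odd: value in the ((n + 1) / 2)th position (lists are 0-indexed)
        return data[(n + 1) // 2 - 1]
    # n even: average the (n / 2)th value and the value to its right
    return (data[n // 2 - 1] + data[n // 2]) / 2

odd_set = [27, 38, 12, 34, 42, 40, 24, 40, 23]
even_set = [27, 38, 12, 42, 40, 24, 40, 23, 18, 34]

print(median(odd_set))
print(median(even_set))
```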

Median for Grouped Data

The formula for finding the median for grouped data using the arithmetic method is as
follows

$$Me = L_m + \frac{C_m\left(\frac{n}{2} - F_{m-1}\right)}{f_m}$$

Where $L_m$ = the lower class limit of the median class

$C_m$ = the class width of the median class

n = total number of observations

$F_{m-1}$ = cumulative frequency of the class interval before the median interval

$f_m$ = the absolute frequency of the median class

Basically there are two methods of finding the median for grouped data

1. Arithmetic Method- Both the frequency distribution and the cumulative frequency
distribution values are required. Using the cumulative frequency distribution values,
the median interval is identified as the class interval into which the $\left(\frac{n}{2}\right)$th observation falls.

Examples
Find the median for the following set of grouped data using the arithmetic method.

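Classes    f     F
125≤140    4     4
140≤155    11    15
155≤170    9     24
170≤185    9     33
185≤200    10    43
200≤215    2     45

∑f = 45

$$\text{Median position} = \left(\frac{n}{2}\right)\text{th} = \left(\frac{45}{2}\right)\text{th} = 22.5\text{th position}$$

The 22.5th observation falls in the class 155≤170, which is therefore the median class.

NB: You identify the median class using the cumulative frequency

$$Me = 155 + \frac{15\left(\frac{45}{2} - 15\right)}{9} = 155 + \frac{15(7.5)}{9} = 167.5$$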
2. Graphical Method: The median is found by reading off the value of random variable
associated with the fifty percent cumulative frequency on the vertical axis

The cumulative frequency distribution is required in both methods

SKEWNESS
After calculating the mean, mode and median the decision has to be made as to which
one should be preferred as a measure of central tendency for a data set. The following
comparisons might help in this endeavour.

Symmetrical Distribution

If mean = mode = median, then a symmetrical distribution has been identified. For
a symmetrical distribution the best measure of central tendency is the mean because
it uses every value of the data set.

Negatively Skewed Distribution

This is also known as a left-skewed distribution and occurs if mean < median < mode. This
situation indicates that most data values are concentrated to the right, with a few data
values to the left, so that a long tail results to the left. This yields a negatively skewed
distribution.

Therefore, if a distribution is negatively skewed, the median is preferred as the best
measure of central tendency.

Positively Skewed Distribution

This is also known as a right-skewed distribution and occurs if mean > median > mode. It
means that the data are not evenly distributed: most data values are
concentrated to the left, with a few data values to the right, resulting in a long tail to the
right.

The best measure of central tendency in this case is the median

Measures of Position

There are two types of measures of position that is quartiles and percentiles

QUARTILES
These values divide a data set that is ordered in ascending order into four equal parts.
There are three quartiles, which are
Q1 = Lower quartile
Q2 = Middle quartile
Q3 = Upper Quartile

Ungrouped Data
In ungrouped data the observations are arranged in ascending order before the
required quartile positions are determined. To get these positions the following
formulae are used

$$Q_1 = \left(\frac{n}{4}\right)\text{th position}$$

$$Q_2 = \left(\frac{n}{2}\right)\text{th position}$$

$$Q_3 = \left(\frac{3n}{4}\right)\text{th position}$$

If n is odd and the value obtained after making the calculation is a whole number,
the required quartile is the observation in that position. If after making the
calculation the answer is not a whole number, consider the next whole position.

Example
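Find Q1, Q2 and Q3 for the data below

18 9 11 30 15 22 19 20 35 40 43

Arranged in ascending order:

9 11 15 18 19 20 22 30 35 40 43

$$Q_1 = \left(\frac{n}{4}\right)\text{th position} = \frac{11}{4} = 2.75\text{th position}$$

The next whole position is the 3rd, so Q1 = 15

$$Q_2 = \left(\frac{n}{2}\right)\text{th position} = \frac{11}{2} = 5.5\text{th position}$$

The next whole position is the 6th, so Q2 = 20

$$Q_3 = \left(\frac{3n}{4}\right)\text{th position} = \frac{3(11)}{4} = 8.25\text{th position}$$

The next whole position is the 9th, so Q3 = 35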

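If n is even and the result you get after the calculation is a whole number, average the
value in that position and the value to its right. If it is not a whole number, consider the
next whole position.

18 9 11 30 15 22 19 20 35 40 43 24

Arranged in ascending order:

9 11 15 18 19 20 22 24 30 35 40 43

$$Q_1 = \left(\frac{n}{4}\right)\text{th} = \left(\frac{12}{4}\right)\text{th} = 3\text{rd position}, \quad Q_1 = \frac{15+18}{2} = 16.5$$

$$Q_2 = \left(\frac{n}{2}\right)\text{th} = \left(\frac{12}{2}\right)\text{th} = 6\text{th position}, \quad Q_2 = \frac{20+22}{2} = 21$$

$$Q_3 = \left(\frac{3n}{4}\right)\text{th} = \left(\frac{3(12)}{4}\right)\text{th} = 9\text{th position}, \quad Q_3 = \frac{30+35}{2} = 32.5$$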
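The position rule for quartiles of ungrouped data can be sketched in Python as follows. The function name `quartile` is chosen here for illustration. Note that this is the rule used in these notes; statistical software often interpolates between positions and may give slightly different values.

```python
import math

def quartile(values, k):
    """k-th quartile (k = 1, 2 or 3) using the position rule in these notes."""
    data = sorted(values)
    n = len(data)
    position = k * n / 4
    if position != int(position):
        # Not a whole number: take the observation in the next whole position.
        return data[math.ceil(position) - 1]
    position = int(position)
    if n % 2 == 1:
        # n odd and a whole position: take the observation in that position.
        return data[position - 1]
    # n even and a whole position: average it with the value to its right.
    return (data[position - 1] + data[position]) / 2

odd_data = [18, 9, 11, 30, 15, 22, 19, 20, 35, 40, 43]
even_data = odd_data + [24]

print([quartile(odd_data, k) for k in (1, 2, 3)])
print([quartile(even_data, k) for k in (1, 2, 3)])
```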

Grouped data
There are two methods that can be used to compute quartiles for grouped data and
these are the graphical method and arithmetic method. The cumulative frequency
distribution is required for both methods.

Graphical Method
An ogive is used to find or estimate quartiles. To find Q1, determine 25%
of the total frequency and find the value that corresponds to it on the x-axis. To find Q2
and Q3, determine 50% and 75% respectively of the total frequency.

Example

Find Q1, Q2 and Q3 for the grouped data below

Classes    Frequency    Cumulative frequency

125≤140    4            4
140≤155    11           15
155≤170    9            24
170≤185    9            33
185≤200    10           43
200≤215    2            45

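$$Q_1 \text{ position} = \frac{45}{4} = 11.25\text{th}$$

$$Q_2 \text{ position} = \frac{45}{2} = 22.5\text{th}$$

$$Q_3 \text{ position} = \frac{3(45)}{4} = 33.75\text{th}$$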

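The following formulae are used to find the lower quartile Q1 and the upper quartile
Q3. Q2 is found using the median formula.

$$Q_1 = L_{q1} + \frac{C_q\left(\frac{n}{4} - F_{m-1}\right)}{f_q}$$

$$Q_3 = L_{q3} + \frac{C_q\left(\frac{3n}{4} - F_{m-1}\right)}{f_q}$$

Where $L_q$ is the LCL of the required quartile interval and

$C_q$ = the class width
n = the total number of observations
$F_{m-1}$ = cumulative frequency of the interval preceding the required quartile interval
$f_q$ = the frequency of the required quartile interval

The 11.25th observation falls in the class 140≤155, so

$$Q_1 = L_{q1} + \frac{C_q\left(\frac{n}{4} - F_{m-1}\right)}{f_q} = 140 + \frac{15\left(\frac{45}{4} - 4\right)}{11} = 149.89$$

The 33.75th observation falls in the class 185≤200, so

$$Q_3 = L_{q3} + \frac{C_q\left(\frac{3n}{4} - F_{m-1}\right)}{f_q} = 185 + \frac{15\left(\frac{3(45)}{4} - 33\right)}{10} = 186.13$$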

In general, any percentile value can be found by adjusting the median formula: first find
the required percentile position, and from this establish the percentile. For example:

$$\text{90th percentile position} = \left(\frac{9n}{10}\right)\text{th position}$$

$$\text{40th percentile position} = \left(\frac{4n}{10}\right)\text{th position}$$

$$\text{35th percentile position} = \left(\frac{35n}{100}\right)\text{th position}$$
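Because the median, quartile and percentile formulae differ only in the position used, one function can cover all of them. The Python sketch below is an illustration under that reading; the name `grouped_quantile` and the `fraction` parameter are assumptions made here. A fraction of 0.25 gives Q1, 0.5 the median, 0.75 gives Q3 and 0.9 the 90th percentile.

```python
def grouped_quantile(classes, fraction):
    """Quantile of grouped data; classes = [(LCL, UCL, f), ...] in ascending order.

    Applies L + C * (fraction * n - F_prev) / f to the class into which
    the (fraction * n)th observation falls.
    """
    n = sum(f for _, _, f in classes)
    position = fraction * n
    cumulative = 0                                   # F of the preceding classes
    for lcl, ucl, f in classes:
        if f > 0 and cumulative + f >= position:
            return lcl + (ucl - lcl) * (position - cumulative) / f
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

classes = [(125, 140, 4), (140, 155, 11), (155, 170, 9),
           (170, 185, 9), (185, 200, 10), (200, 215, 2)]

q1 = grouped_quantile(classes, 0.25)
q2 = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)
p90 = grouped_quantile(classes, 0.90)
print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}, P90 = {p90:.3f}")
```

The results are printed to three decimal places so that they can be compared with the hand calculations before rounding.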
Symbolic Notation for Sample and Population Measures

A measure found from analysing sample data is called a statistic while a measure
describing a population attribute is called a parameter. Various symbols are used for
each of these measures.

Statistical Measure    Sample Statistic    Population Parameter

Mean                   x̅                   μ
Variance               s²                  σ²
Standard Deviation     s                   σ
Size                   n                   N
Measures of Dispersion
There are several types of measures of dispersion and these include
(i) Range
(ii) Interquartile Range
(iii) Variance
(iv) Standard Deviation
(v) Coefficient of Variation

They are used to describe the extent to which the values of a random variable are scattered
about a central value. The central value can be described as more reliable if there is a high
concentration of the values of observations about it. On the other hand, widely spread
observations show low reliability of the central value.

Range

The range is the difference between the largest and the smallest observation in a given data set.

Example

The following data show the amount in millions of dollars paid to employees in different
companies. Find the data range and interpret your solution.

16 2 38 9 20 80 3 10 50

Range = 80-2
= $78 million

Since the range of 78 is very close to the highest observation (80) and very far from the lowest
observation (2), this suggests a wide dispersion; hence the mean as a measure of central tendency
will be strongly unrepresentative.

Interquartile Range

It is simply the difference between the upper quartile and the lower quartile

IQR =Q3 –Q1

Variance

It is a measure of spread or dispersion that includes all the observations of a data set in its
computation. It can be computed for both ungrouped and grouped data.

Variance for Ungrouped data

The formula used to compute this is

$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n-1}$$

Where

x = is the value of each observation

n= sample size

x̅ = sample mean

Standard Deviation for ungrouped data

The standard deviation is found by computing the square root of the variance

$$s = \sqrt{s^2} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}}$$
The following data show the weights in kg of 8 patients who visited a clinic one afternoon.
Compute the variance and standard deviation of the weights.

80 70 60 50 40 35 65 45

Mean = 55.63

x     x²
80    6400
70    4900
60    3600
50    2500
40    1600
35    1225
65    4225
45    2025

∑x = 445        ∑x² = 26475

$$s^2 = \frac{26475 - 8(55.63)^2}{8-1} = 245.35$$

$$s = \sqrt{245.35} = 15.66 \text{ kg}$$
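The same computation can be sketched in Python. This is an illustration added here, with the hypothetical function name `sample_variance`. Because the code keeps the unrounded mean (55.625) instead of the rounded 55.63 used in the hand calculation, it gives a variance of about 245.98 and a standard deviation of about 15.68, slightly different from the figures above. The gap is a rounding effect, not a different method.

```python
import math

def sample_variance(values):
    """Sample variance: (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return (sum(x * x for x in values) - n * mean ** 2) / (n - 1)

weights = [80, 70, 60, 50, 40, 35, 65, 45]   # weights in kg

s2 = sample_variance(weights)
s = math.sqrt(s2)                            # standard deviation

print(f"sum x = {sum(weights)}, sum x^2 = {sum(x * x for x in weights)}")
print(f"variance = {s2:.2f}, standard deviation = {s:.2f} kg")
```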

Example

The following data give the time in minutes spent by a sample of 20 students to complete a
given task. Showing all workings, calculate the standard deviation of the data.

16 29 58 66 78 42 54 72 54 72 54 91 44 84 92 70 78
52 28 41

x̅ = 58.75

∑x² = 77671

$$s^2 = \frac{77671 - 20(58.75)^2}{19} = 454.72$$

$$s = \sqrt{454.72} = 21.32$$

Variance for grouped data

$$s^2 = \frac{\sum f x^2 - n\bar{x}^2}{n-1}$$

$$s = \sqrt{s^2} = \sqrt{\frac{\sum f x^2 - n\bar{x}^2}{n-1}}$$

Where f is the frequency of each class and x is the midpoint of each class.

Example

Find the variance and standard deviation for the grouped data below

Classes    f     x        fx        fx²
125≤140    4     132.5    530       70225
140≤155    11    147.5    1622.5    239318.75
155≤170    9     162.5    1462.5    237656.25
170≤185    9     177.5    1597.5    283556.25
185≤200    10    192.5    1925      370562.5
200≤215    2     207.5    415       86112.5

∑f = 45        ∑fx = 7552.5        ∑fx² = 1287431.25

$$\bar{x} = \frac{\sum f x}{n} = \frac{7552.5}{45} = 167.83$$

$$s^2 = \frac{\sum f x^2 - n\bar{x}^2}{n-1} = \frac{1287431.25 - 45(167.83)^2}{45-1} = 452.74$$

$$s = \sqrt{452.74} = 21.28$$
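The column totals and the grouped variance can be reproduced with a few lines of Python. This is a sketch for illustration, with variable names chosen here. It keeps the unrounded mean, so it gives a variance of about 451.59 and a standard deviation of about 21.25, whereas the hand calculation above uses the mean rounded to 167.83.

```python
import math

# (LCL, UCL, frequency) for each class in the table above.
classes = [(125, 140, 4), (140, 155, 11), (155, 170, 9),
           (170, 185, 9), (185, 200, 10), (200, 215, 2)]

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lcl + ucl) / 2 for lcl, ucl, f in classes)          # sum of f * midpoint
sum_fx2 = sum(f * ((lcl + ucl) / 2) ** 2 for lcl, ucl, f in classes)  # sum of f * midpoint^2

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(f"n = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```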
Coefficient of Variation

This is a measure of dispersion that we use to compare consistency in performance between
different random variables with different units of measurement.

The sample coefficient of variation is calculated as

$$\text{Sample C.O.V} = \frac{s}{\bar{x}} \times 100$$

$$\text{Population C.O.V} = \frac{\sigma}{\mu} \times 100$$

It is always expressed as a percentage.

The following data show the mean and standard deviation of sales per month and the
experience of employees in years. Calculate and compare the coefficients of variation of the two
random variables. Which random variable is more consistent?

Experience of Employees (yrs)    Sales Per Month

x̅ = 20                           x̅ = 500
s = 4                            s = 80

$$\text{COV for experience} = \frac{s}{\bar{x}} \times 100 = \frac{4}{20} \times 100 = 20\%$$

$$\text{COV for sales per month} = \frac{s}{\bar{x}} \times 100 = \frac{80}{500} \times 100 = 16\%$$

NB: The one with a higher percentage has greater variability, which means that it is less
consistent. It follows therefore that the one with smaller variability is more consistent.

Interpretation
Sales per month are more consistent than experience as shown by the variability of 16%
compared to 20%.
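The comparison can be sketched in a few lines of Python. The function name `coefficient_of_variation` is chosen here for illustration; the inputs are the means and standard deviations given in the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: 100 * s / mean."""
    return 100 * std_dev / mean

cov_experience = coefficient_of_variation(std_dev=4, mean=20)
cov_sales = coefficient_of_variation(std_dev=80, mean=500)

print(f"Experience: {cov_experience}%")
print(f"Sales per month: {cov_sales}%")

# The variable with the smaller coefficient of variation is the more consistent one.
more_consistent = "sales per month" if cov_sales < cov_experience else "experience"
print("More consistent:", more_consistent)
```

Dividing by the mean removes the units, which is why years of experience and monthly sales can be compared directly.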

Coefficient of Skewness

The coefficient of skewness should lie between negative 3 and positive 3 inclusive

−3 ≤ coefficient of skewness ≤ 3

A value less than zero indicates negative skewness. A value equal to zero represents a
symmetrical distribution. A value greater than zero indicates positive skewness.

The common coefficient of skewness that is used is called Pearson's coefficient of skewness.
The first coefficient of skewness, denoted Sk1, is calculated as

$$Sk_1 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{x} - Mo}{s}$$

The second coefficient of skewness, denoted Sk2, is calculated as

$$Sk_2 = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{x} - Me)}{s}$$

• When Sk1 and Sk2 are both less than zero, we have a negatively skewed
distribution.
• If Sk1 and Sk2 are both equal to zero, we have a symmetrical distribution.
• If Sk1 and Sk2 are both greater than zero, we have a positively skewed distribution.

Example

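Compute Pearson's first and second coefficients of skewness for the grouped data used in the
variance example above, and interpret your results.

$$Mo = L_m + \frac{C_m (f_m - f_{m-1})}{2 f_m - f_{m-1} - f_{m+1}} = 140 + \frac{15(11-4)}{2(11)-4-9} = 151.67$$

Mean = 167.83

Mode = 151.67

Median = 167.5

Standard Deviation = 21.28

$$Sk_1 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{167.83 - 151.67}{21.28} = 0.76$$

$$Sk_2 = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(167.83 - 167.5)}{21.28} = 0.05$$

Both coefficients are greater than zero, so this distribution is skewed to the right; that is, it is positively skewed.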
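The two coefficients are simple enough to sketch in Python. The function names below are chosen here for illustration, and the inputs are the rounded summary measures quoted in the example.

```python
def pearson_sk1(mean, mode, std_dev):
    """Pearson's first coefficient of skewness: (mean - mode) / s."""
    return (mean - mode) / std_dev

def pearson_sk2(mean, median, std_dev):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

# Summary measures from the worked example.
mean, mode, median, std_dev = 167.83, 151.67, 167.5, 21.28

sk1 = pearson_sk1(mean, mode, std_dev)
sk2 = pearson_sk2(mean, median, std_dev)
print(f"Sk1 = {sk1:.2f}, Sk2 = {sk2:.2f}")

if sk1 > 0 and sk2 > 0:
    print("Positively skewed")
elif sk1 < 0 and sk2 < 0:
    print("Negatively skewed")
```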

PROBABILITY THEORY

A probability can be defined as the chance or likelihood of a particular outcome occurring, out of a
number of possible outcomes of a random experiment.

Subjective Probability

Where the probability of an event is based on an educated guess or expert opinion, it is
referred to as a subjective probability. Subjective probabilities cannot be statistically verified.

Objective Probability

When the probability of an event can be verified statistically, it is referred to as an objective
probability. It is this type of probability that is used extensively in statistical analysis.
Mathematically, a probability is defined as the ratio of two numbers; that is, the probability of
an event A occurring is

$$P(A) = \frac{r}{n}$$

where A is an event of a specific type,

r = the number of outcomes of event A

n = the total number of possible outcomes, or the sample space

P(A) = the probability of event A occurring

Example

Out of a class of 3 girls and 4 boys, what is the probability of selecting

a) A girl
b) A boy

a) $P(G) = \frac{3}{7}$        b) $P(B) = \frac{4}{7}$

Basic Properties of Probability

1. A probability value lies only between zero and 1 inclusive that is 0≤P(A)≤1
2. If an event A cannot occur that is it is an impossible event P(A)=0
3. If an event A is certain to occur that is it is definite then P(A) =1
4. The sum of the probabilities of all possible outcomes of a random experiment equals
one, that is, the outcomes are exhaustive. For example, the sum of the probability of a girl and the
probability of a boy equals one, that is P(G) + P(B) = 1
5. If P(A) is the probability of event A occurring, then the probability of event A not
occurring is defined as P(A′) = 1 − P(A)

Example

Consider a random process of drawing cards from a pack of playing cards. Find the probability
of selecting

a) A red card
b) A spade
c) An ace
d) Not an ace

a) $P(\text{red card}) = \frac{26}{52} = \frac{1}{2}$
b) $P(\text{spade}) = \frac{13}{52} = \frac{1}{4}$
c) $P(\text{ace}) = \frac{4}{52} = \frac{1}{13}$
d) $P(\text{ace}') = 1 - \frac{1}{13} = \frac{12}{13}$

Basic Probability Concepts

The concepts will be illustrated using the following example

Consider a random experiment of selecting companies from the Zimbabwe stock exchange
(ZSE).Values for the random variables, which are company size and industry type, are
measured or summarized as shown in the following table

Industry Type Small Medium Large Row Total

Mining 0 0 35 35
Finance 9 21 42 72
Service 6 3 1 10
Retail 14 13 6 33
Column Total 29 37 84 150

Computation of Objective Probability

Objective probability can be classified into three categories: marginal probability, joint
probability and conditional probability.

Marginal Probability

A marginal probability is the probability of only a single event, e.g. the probability of event A
occurring; it is written as P(A). A single event is an event that describes outcomes of one random
variable only. If A represents the event of a small company, find P(A).

$$P(A) = \frac{29}{150}$$

Let B be the event of a finance company.

$$P(B) = \frac{72}{150}$$

Marginal Probability is the probability of an event occurring at any time.

JOINT PROBABILITY

A joint probability is the probability of both events A and B occurring simultaneously on a given
random experiment. It is denoted as P(A ∩ B).

Let A be the event of a small company and B the event of a finance company. Therefore,
P(A ∩ B) = 9/150.

CONDITIONAL PROBABILITY

A conditional probability is the probability of an event A occurring given information about the
occurrence of another event B. A conditional event describes the behaviour of a random variable in
light of additional information about a second random variable. A conditional probability is defined
as follows:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This is the probability of event A occurring given that event B has already occurred.

The essential feature here is that the sample space is reduced to the outcomes describing
event B only, and not all possible outcomes as for marginal and joint probabilities.

Let A be the event of a large company and B the event of a retail company. Find P(A|B) using

i) Intuition ii) The formula
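
The exercise above can be checked with a short Python sketch. It stores the ZSE table as a dictionary and computes P(A|B) both ways; the helper names (`p_industry`, `p_joint`, and so on) are illustrative and not part of the notes.

```python
from fractions import Fraction

# Counts from the ZSE table: rows are industry types, columns are company sizes.
counts = {
    "Mining":  {"Small": 0,  "Medium": 0,  "Large": 35},
    "Finance": {"Small": 9,  "Medium": 21, "Large": 42},
    "Service": {"Small": 6,  "Medium": 3,  "Large": 1},
    "Retail":  {"Small": 14, "Medium": 13, "Large": 6},
}
total = sum(sum(row.values()) for row in counts.values())  # 150 companies


def p_size(size):
    """Marginal probability of a company size."""
    return Fraction(sum(row[size] for row in counts.values()), total)


def p_industry(industry):
    """Marginal probability of an industry type."""
    return Fraction(sum(counts[industry].values()), total)


def p_joint(size, industry):
    """Joint probability of a size and an industry type."""
    return Fraction(counts[industry][size], total)


# i) Intuition: restrict the sample space to the 33 retail companies.
intuition = Fraction(counts["Retail"]["Large"], sum(counts["Retail"].values()))

# ii) Formula: P(A|B) = P(A n B) / P(B)
formula = p_joint("Large", "Retail") / p_industry("Retail")

print(intuition, formula)  # both reduce to 2/11
```

Both routes give 6/33 = 2/11, which illustrates that conditioning on B simply shrinks the sample space to the retail row.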

INTERSECTION OF EVENTS

The intersection of events A and B is the set of outcomes that belong to both A and B
simultaneously. It is written as A n B [A and B].

Let A be the event of a small company and B the event of a service company. A n B is the set of all
companies that are both small and service companies. There are 6 such companies.

UNION OF EVENTS

The Union of events A and B is the set of outcomes that belong to A or B or Both. It is written as A u
B [A or B].

Let A be the event of a small company and B be the event of a service company. Then A u B is the set
of all companies that are small or service or both. The number of such companies is 29 + 10 - 6 = 33.
MUTUALLY EXCLUSIVE EVENTS

Events are mutually exclusive if they cannot occur together on a single trial of a random experiment.
For example, let A be event of a small company and B be the event of a medium company. Events A
and B are mutually exclusive because a randomly selected company from ZSE cannot be both small
and medium at the same time.

STATISTICALLY INDEPENDENT EVENTS

Events are said to be statistically independent if the occurrence of one event A has no effect on
the outcome of event B occurring or vice versa. For example, let L be the event of an accident
occurring in London and H be the event of an industrial strike occurring in Harare. These events
have no effect on each other, even if they may occur at the same time.

The terms statistically independent events and mutually exclusive events must not be confused.
When events are mutually exclusive they are not statistically independent. They are dependent in
the sense that when one event occurs the other cannot occur. In probability terms, the
probability of the intersection of two mutually exclusive events is zero.

PROBABILITY RULES

There are two basic probability rules:

1. Addition Rule (u; “or”): for both mutually exclusive and non-mutually exclusive events.
2. Multiplication Rule (n; “and”): for both statistically independent and non-statistically
independent events.

ADDITION RULE FOR MUTUALLY EXCLUSIVE EVENTS

The probability of either event A or B occurring in a single trial of a random experiment is defined as:

P(A u B) = P(A) + P(B)

For mutually exclusive events there is no intersection event, therefore P(A n B)= 0.

Let A be the event of small company and B the event of a large company. Since these two events are
mutually exclusive therefore P(A u B) = 29/150 + 84/150 = 113/150.

This is the probability that a randomly selected company from ZSE will either be a small company or
a large company.

ADDITION RULE FOR NON-MUTUALLY EXCLUSIVE EVENTS

Probability of either event A or B occurring in a single trial of a random experiment is given by;

P(A u B) = P(A) + P(B) – P(A n B)

For example: Let A be the event of a small company and B the event of a service company. These two
events are not mutually exclusive as they can occur at the same time. Therefore P(A u B) = 29/150 +
10/150 – 6/150 = 33/150 = 11/50

This is the probability that a randomly selected company from ZSE will either be a small company or a
service company or both.

MULTIPLICATION RULE FOR STATISTICALLY INDEPENDENT EVENTS

If two events are statistically independent, then the multiplication rule reduces to the product of
the two marginal probabilities:

P(A n B) = P(A) * P(B)

NB* Two events A and B are statistically independent if the following test is satisfied:

P(A|B) = P(A)

This means that if the marginal probability of event A equals the conditional probability of A given
that event B has occurred, then these two events are statistically independent. This means that the
prior occurrence of event B does not influence the outcome of event A.

Let A be the event of a medium company and B the event of a Finance company. Determine if the two
events are statistically independent or not.

Test: P(A n B) / P(B) = P(A)

P(A|B) = (21/150) / (72/150) = 21/72 = 7/24

P(A) = 37/150

7/24 ≠ 37/150

Since P(A|B) is not equal to P(A), these events are not statistically independent.
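
The independence test can be sketched in Python with exact fractions. The figures are those used in the check above (37 medium companies, 72 finance companies, 21 in both, out of 150); the variable names are illustrative only.

```python
from fractions import Fraction

p_a = Fraction(37, 150)        # P(A): marginal probability of the first event
p_b = Fraction(72, 150)        # P(B): finance company
p_a_and_b = Fraction(21, 150)  # P(A n B): joint probability

# Conditional probability from the definition P(A|B) = P(A n B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Two equivalent tests for statistical independence
independent_by_conditional = (p_a_given_b == p_a)
independent_by_product = (p_a_and_b == p_a * p_b)

print(p_a_given_b, independent_by_conditional, independent_by_product)
```

Both tests agree here: 7/24 differs from 37/150, so the events are not independent.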

MULTIPLICATION RULE FOR NON-STATISTICALLY INDEPENDENT EVENTS

If two events are not statistically independent, we apply the following rule. The multiplication rule
may be used to find the joint probability of events A and B occurring on a single trial of a random
experiment, i.e. the intersection of the two events. By rearranging the conditional probability
formula, the multiplication rule is defined as:

P(A|B) = P(A n B) / P(B)

P(A|B) * P(B) = P(A n B)

Where P(A n B) = the joint probability of A and B.

P(A|B) = the conditional probability of event A occurring given that B has already occurred.

P(B) = the marginal probability of event B occurring.

The personnel department of an insurance company analysed the qualification profile of their 129
managers. The qualifications attained by each manager are shown below.

                        MANAGEMENT LEVELS
Qualification Level   Section Head   Department Head   Division Head   Total
O’Level               28             14                8               50
Diploma               20             24                6               50
Degree                5              10                14              29
Total                 53             48                28              129

What is the probability of a person selected at random:

i) Having only O’ Level

ii) Being section head and having a degree.
iii) Being a department head given that they have a diploma.
iv) Being a division head.
v) Being a division head or a section head.
vi) Having an O’ Level given that the person is a section head.

Answer:

i) P(O’Level) = 50/129
ii) P(Section Head n Degree) = P(A n B), where A is the event of a section head and B the event of a degree.

If A and B are statistically independent, P(A n B) = P(A) * P(B); otherwise P(A n B) = P(A|B) * P(B).

P(A|B) = 5/29 and P(A) = 53/129

Since P(A|B) ≠ P(A), being a section head and having a degree are non-statistically independent events.

We use P(A n B) = P(A|B) * P(B)

= 5/29 * 29/129

= 5/129

iii) P(Department Head | Diploma)

P(A|B) = P(A n B) / P(B) = (24/129) / (50/129) = 24/50

iv) P(Division Head) = 28/129
v) Being a division head and being a section head are mutually exclusive events:
P(A u B) = P(A) + P(B)
= 28/129 + 53/129
= 81/129
vi) P(O’Level | Section Head)

P(A|B) = P(A n B) / P(B) = (28/129) / (53/129) = 28/53

PROBABILITY TREE DIAGRAMS

This is a diagram that helps in decision making. The diagram has the shape of a tree and each branch
on the tree represents a logical outcome.

Example: A farmer has 15 cows, of which 7 are black and 8 are white. It has been a tradition that he
sells a cow each month. If two cows are sold and each is replaced, find the probability that

i) Both are black

ii) One is white and one is black.
iii) How will these probabilities be affected if these cows were not replaced?
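
The branches of the tree for the cow example can be multiplied out in a few lines of Python. This is a sketch only; it assumes "replaced" means the herd is restored to 7 black and 8 white cows before the second sale, and the variable names are illustrative.

```python
from fractions import Fraction

black, white = 7, 8
n = black + white  # 15 cows

# With replacement: the second draw has the same probabilities as the first.
both_black_wr = Fraction(black, n) ** 2
mixed_wr = 2 * Fraction(black, n) * Fraction(white, n)  # BW or WB

# Without replacement: the second draw is from the remaining 14 cows.
both_black_wo = Fraction(black, n) * Fraction(black - 1, n - 1)
mixed_wo = (Fraction(black, n) * Fraction(white, n - 1)
            + Fraction(white, n) * Fraction(black, n - 1))

print(both_black_wr, mixed_wr)  # 49/225 112/225
print(both_black_wo, mixed_wo)  # 1/5 8/15
```

Each product follows one path along the tree, and paths leading to the same outcome (black then white, white then black) are added.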

Example: If three playing cards are selected from a pack of playing cards with replacement, what is
the probability of getting at least two diamonds. How would this probability be affected if these
cards were not replaced?

Example: Suppose that we rolled an unbiased dice three times, Find the probability that the
outcome is:

i) Three even numbers

ii) At least two even numbers
iii) At least one even number
iv) No even number at all.

The Fundamental Principles of Counting.

If the event E can be split into k sub-events E1, E2, …, Ek such that there are n1, n2, …, nk ways of
performing each sub-event, then the entire event E can be performed in n1 × n2 × … × nk ways.
Suppose a man walks from point A to point C via point B as illustrated in diagram below.

In how many ways can the man move from point A to C

A to B = 2 ways

B to C = 2 ways

Therefore A to C = 2*2 = 4 ways

Example: A restaurant menu has a choice of four starters, ten main courses and six desserts. Find the
total number of possible meals that can be ordered.

Total number of possible meals = 4 × 10 × 6 = 240 meals

Example: How many different 7 place number plates are possible if the first 3 places are to be
occupied by alphabetic letters and the final four by numbers?
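
A minimal Python sketch of the counting principle for the two examples above. For the number plates it assumes a 26-letter alphabet, the digits 0 to 9, and that letters and digits may repeat; the notes do not state these conditions explicitly.

```python
from math import prod

# Restaurant: 4 starters, 10 main courses, 6 desserts
meals = prod([4, 10, 6])

# Number plates: 3 letter positions followed by 4 digit positions,
# assuming repetition is allowed in every position.
plates = prod([26, 26, 26, 10, 10, 10, 10])

print(meals)   # 240
print(plates)  # 175760000
```

If repetition were not allowed, each successive position would have one fewer choice, which leads to the permutations discussed next.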

PERMUTATIONS OF R OBJECTS FROM N OBJECTS

A permutation is an ordered arrangement of a group of objects; the number of permutations is the
number of distinct ways in which the objects can be arranged. Consider the number of ways of placing 3 of the
letters ABCDEFG in three empty spaces. The first space can be filled in 7 ways, the second space can
be filled in 6 ways and the third space can be filled in 5 ways. Therefore there are 7*6*5 = 210 ways of
arranging three letters taken from seven letters. This is the number of permutations of three objects
taken from seven objects and it is written as 7P3 = 7!/(7 − 3)! = 210.

NB* With a permutation the order in which letter or numbers are arranged is very important.

Example: ABC is a different permutation from ACB and so are CAB, CBA, BAC, BCA.

COMBINATIONS OF R OBJECTS FROM N OBJECTS

A combination is a selection of a subset of objects from a group of objects where the order is not
important. Each possible selection is called a combination. ABC gives rise
to many permutations (ABC, ACB, CBA, CAB, BAC, BCA), but it is only one combination. The number of
combinations of three letters from the seven letters ABCDEFG is denoted by 7C3 = 7!/(3! × 4!) = 35.

Example: From a group of five women and seven men. How many different committees consisting of
two women and three men can be formed?
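
Permutations and combinations can be checked with Python's `math.perm` and `math.comb`. The sketch below reproduces the seven-letter figures and works the committee example, on the assumption that the women and the men are chosen independently so the two counts multiply.

```python
from math import comb, perm

# Ordered arrangements of 3 letters from 7
arrangements = perm(7, 3)

# Unordered selections of 3 letters from 7
selections = comb(7, 3)

# Committee: 2 of 5 women and 3 of 7 men, order irrelevant
committees = comb(5, 2) * comb(7, 3)

print(arrangements, selections, committees)  # 210 35 350
```

Note that 210 / 35 = 6 = 3!, the number of orderings of each three-letter selection.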

PROBABILITY DISTRIBUTION

A probability distribution is a list of all possible outcomes of a random variable and their associated
probabilities of occurrence. The expected value of a random variable, which is its mean, is given
by the following formula:

E(X) = Σ x P(X = x)

The Variance

Denoted as σ², it is calculated as follows:

Var(X) = Σ x² P(X = x) − [E(X)]²

Example: A fair coin is tossed twice.

i) Construct a probability distribution of the number of heads that can occur.

ii) Find the value of the expectation and standard deviation of the number of heads that
can occur.

Let X be the number of heads that can occur.

NB* In your probability distribution the sum of P(X=x)=1 (exhaustive probability).
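
The two-coin example can be worked through in Python by listing the sample space. This is a sketch using the usual definitions E(X) = Σ x P(X = x) and Var(X) = Σ x² P(X = x) − [E(X)]²; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Sample space for two tosses of a fair coin: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Probability distribution of X = number of heads
dist = {}
for outcome in outcomes:
    x = outcome.count("H")
    dist[x] = dist.get(x, 0) + Fraction(1, len(outcomes))

mean = sum(x * p for x, p in dist.items())
variance = sum(x * x * p for x, p in dist.items()) - mean ** 2
sd = sqrt(variance)

print(sorted(dist.items()))  # P(X=0)=1/4, P(X=1)=1/2, P(X=2)=1/4
print(mean, variance, round(sd, 4))
```

The probabilities sum to 1, the expectation is 1 head, and the standard deviation is about 0.7071.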

Example: An unbiased dice is thrown once, construct a probability distribution that shows the
possible outcomes and use it to find the expectation and standard deviation. Let X be the possible
outcomes.

Example: A coin is tossed three times, construct a probability distribution for the number of tails that
can be obtained, and use it to calculate the expectation and standard deviation for the number of
tails that occur.

Example: Each customer at a supermarket pays using one of three methods: cash, cheque or credit
card. The probability of a randomly selected customer paying by cash is 0.54 and by cheque is 0.12.

i) Determine the probability of a randomly selected customer paying by credit card.
ii) If three customers are selected at random, find the probability of
   a) all three paying by cash,
   b) exactly one paying by cheque,
   c) one paying by cash, one by cheque and one by credit card.

TYPES OF PROBABILITY DISTRIBUTIONS

The choice of a particular probability distribution function depends primarily on the nature of the
random variable under study.

Probability distribution functions can be classified as discrete or continuous.

Discrete Probability Distribution

These probability distribution functions assume that the outcomes of the random variable under study
can take only specific values, usually integers, e.g. a car can only have 0, 1, 2, 3, 4, 5, 6, … tyres at any time.
The two common types of discrete probability distributions are the Binomial and Poisson
distributions. For a random variable to follow either the Poisson or the Binomial distribution,
certain conditions have to be met.

A discrete random variable can be said to follow a Binomial distribution if the following are satisfied:

i) There are two mutually exclusive outcomes of the random variable generally referred to
as success or failure.
ii) The probability of the success outcome is denoted as p, whereas that of the failure outcome is q.

p + q = 1

iii) The random variable is observed n times and each observation is called a trial.
iv) The trials are assumed to be independent of each other, i.e. each trial does not influence
the outcome of another trial.

If a random variable satisfies all the above conditions, it is said to follow a binomial process

i.e. X ~ Bin(n; p)

The PDF of a binomial distribution is given as:

P(X = x) = nCx p^x q^(n − x)

Where n = number of trials

x = number of success outcomes.

p = probability of a success outcome.

q = probability of a failure outcome.

Example: Ten students sit for an exam. The probability of each student passing the exam is 0.2.
What is the probability that exactly three of them will pass the paper?
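
A short Python sketch of this calculation, using the standard binomial formula P(X = x) = nCx p^x q^(n − x). The function name `binomial_pmf` is illustrative and not taken from the notes.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.2

# Probability that exactly three of the ten students pass
p_three = binomial_pmf(3, n, p)

# Mean and standard deviation of the number of passes
mean = n * p
sd = sqrt(n * p * (1 - p))

print(round(p_three, 4))  # about 0.2013
print(mean, round(sd, 4))
```

Cumulative questions such as "more than two" can be answered by summing `binomial_pmf` over the relevant values of x, or by subtracting the complement from 1.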

The formula for calculating the mean of a binomial distribution is mean = np, and the standard
deviation is √(npq).

Important words or terms used in probability:

i) Exactly or Equals
This can be written as P(X=3).
ii) More than or greater than.
iii) Not more than or at most
iv) Less Than
v) Between and inclusive
vi) Between
vii) Or
viii) And
ix) Not less than or at least

Example: Refer to the example above and answer the following questions.

a) What is the probability that more than two students will pass?
b) Less than two students will fail.
c) Between 2 and 4 students will pass the exam.
d) Between 1 and 3 inclusive will fail
e) Two or three students will pass.
f) Calculate the mean and standard deviation for the number of students who will pass.

POISSON DISTRIBUTION

The Poisson process measures the number of occurrences of a particular outcome of a discrete
random variable in a pre-determined time, space or volume interval for which an average number of
occurrences of the outcome can be determined, e.g. the number of cars arriving at a parking lot in an
hourly interval or the number of telephone calls received in a ten-minute interval. If a distribution
follows a Poisson process, then

X ~ Poi(λ)

The PDF is defined as follows:

P(X = x) = e^(−λ) λ^x / x!

Where λ is the mean number of occurrences and x is the number of occurrences whose probability
is being calculated.

Example: The average number of errors a junior typist can make in a page is 6. What is the
probability that she makes:

a) Two errors per page.

b) At least two errors per page.
c) Two errors in two pages.
d) Between 1 and 3 errors in three pages.
e) 1 or 2 errors in a single page.
f) Find the mean and standard deviation of errors per page.

Answer:
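
The notes leave this answer blank. As a sketch only, parts (a) to (c) can be computed in Python with the standard Poisson formula P(X = x) = e^(−λ) λ^x / x!. It assumes the mean scales with the interval, so two pages have λ = 12; `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poi(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 6  # average errors per page

# a) exactly two errors on a page
p_a = poisson_pmf(2, lam)

# b) at least two errors on a page: 1 - P(X = 0) - P(X = 1)
p_b = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

# c) two errors in two pages: the mean doubles to 12
p_c = poisson_pmf(2, 2 * lam)

print(round(p_a, 4), round(p_b, 4), round(p_c, 6))
```

For a Poisson variable the mean and the variance are both λ, so part (f) gives a mean of 6 and a standard deviation of √6 errors per page.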

Example: A textile producer has established that a spinning machine stops randomly due to thread
breakages at an average rate of 5 stoppages per hour. What is the probability that in a given hour:

a) 3 stoppages will occur

b) At most two stoppages will occur,
c) More than four stoppages will occur.
d) Not more than two stoppages will occur in a two hour interval.

Answer:
CONTINUOUS PROBABILITY DISTRIBUTIONS

A continuous random variable can take any value in an interval. Continuous probability functions are
used for probabilities associated with intervals of X values. You will encounter many business
situations in which the random variable of interest can be treated as a continuous variable. There
are several continuous distributions that are frequently used to describe a physical situation. The most
common and useful continuous distribution function is the normal distribution, the reason being
that the outputs of many processes are normally distributed.

THE NORMAL DISTRIBUTION

A normal probability distribution function finds the probability for a continuous random variable. It
has the following characteristics:

i) It is bell shaped.
ii) It is symmetrical about a central value (The Mean)
iii) The tails of the distribution never touch the X- axis.
iv) A normally distributed random variable is described by two parameters, namely the
mean and the standard deviation.
v) The area under the curve of the PDF of a normal distribution is equal to 1.

How to read off Normal Distribution tables.

1) Identify a given Z value to one decimal place on the left column.

2) The remaining Z values are given on the top row.
3) The required area of probability is found where the Z values to one decimal place on the left
column intersects the remaining Z values on the top row.

Find the following probabilities where Z is the standard normal variable.

i) P(Z<2.31)
ii) P(Z<-1.49)
iii) P(Z>2.1)
iv) P(-2.5<Z)
v) P(0<Z<2.05)
vi) P(-1.52<Z<0.69)

NB* Always sketch a normal probability distribution curve and indicate the area whose probability is
to be found.

Answer:

v) P(0 < Z < 2,05)

P(Z < 2,05) – P(Z < 0)

0,9798 – 0,5000

= 0,4798

vi) P(-1,52 < Z < 0,69)

P(Z < 0,69) – P(Z < -1,52)

0,7549 – 0,0643

= 0,6906

Find the following probabilities from the Z tables.

a) P(0 < Z < 1,46)

P(Z < 1,46) – P(Z < 0)

0,9278 – 0,5

= 0,4278

b) P(-2,3 < Z < 0)

P(Z < 0) – P(Z < -2,3)

0,5 – 0,0107

= 0,4893

c) P(-2,1 < Z < 1,32)

P(Z < 1,32) – P(Z < -2,1)

0,9066 – 0,0179

= 0,8887

d) P(1,24 < Z < 2,08)

P(Z < 2,08) – P(Z < 1,24)

0,9812 – 0,8925

= 0,0887
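
Table look-ups like these can be cross-checked with `statistics.NormalDist` from the Python standard library. The helper `between` is an illustrative name; small differences in the fourth decimal place are expected because the tables are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def between(a, b):
    """P(a < Z < b) as a difference of two cumulative probabilities."""
    return Z.cdf(b) - Z.cdf(a)


print(round(between(0, 1.46), 4))      # compare with 0,4278
print(round(between(-2.3, 0), 4))      # compare with 0,4893
print(round(between(-2.1, 1.32), 4))   # compare with 0,8887
print(round(between(1.24, 2.08), 4))   # compare with 0,0887
```

The same subtraction pattern, P(Z < b) − P(Z < a), is what the sketch of the curve is meant to make visible.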

STANDARD NORMAL DISTRIBUTION

The trick in finding probabilities for a normal distribution is to convert the normal distribution to a
standard normal distribution. Values of x associated with any normally distributed random variable
can be converted into corresponding Z values by using the conversion formula

Z = (x − µ) / σ

Where µ = mean of the specific normal distribution and,

σ = standard deviation of the specific normal distribution.

NB The process of converting X to Z is called standardizing.

The time taken to install a new telephone is found to be normally distributed with the mean time of
45minutes and a standard deviation of 8minutes. For a new installation what is the probability that

a) It will take less than 40minutes.

b) It will take between 44 and 49 minutes.
c) It will take between 43 and 45 minutes.
d) It will take between 45 and 51 minutes.

X = time taken to install a new telephone

µ = 45 minutes
σ = 8 minutes

a) P(X < 40) = P(Z < (x − µ)/σ)

= P(Z < (40 − 45)/8)

= P(Z < -0,625)

≈ P(Z < -0,63) = 0,2643

b) P(44 < X < 49) = P((44 − 45)/8 < Z < (49 − 45)/8)

= P(-0,125 < Z < 0,5)

= 0,6915 – 0,4483 (reading -0,125 from the tables as -0,13)

= 0,2432

c) P(43 < X < 45) = P((43 − 45)/8 < Z < (45 − 45)/8)

= P(-0,25 < Z < 0)

= 0,5 – 0,4013

= 0,0987

d) P(45 < X < 51) = P((45 − 45)/8 < Z < (51 − 45)/8)

= P(0 < Z < 0,75)

= 0,7734 – 0,5

= 0,2734
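
The telephone example can be verified without tables by giving `statistics.NormalDist` the mean and standard deviation directly. This is a sketch; answers may differ from the table-based ones in the third decimal place because the tables require z to be rounded to two decimals.

```python
from statistics import NormalDist

X = NormalDist(mu=45, sigma=8)  # installation time in minutes

p_a = X.cdf(40)              # a) less than 40 minutes
p_b = X.cdf(49) - X.cdf(44)  # b) between 44 and 49 minutes
p_c = X.cdf(45) - X.cdf(43)  # c) between 43 and 45 minutes
p_d = X.cdf(51) - X.cdf(45)  # d) between 45 and 51 minutes

# Standardising by hand gives the same value as working with X directly
z = (40 - 45) / 8
assert abs(NormalDist().cdf(z) - p_a) < 1e-12

for label, value in zip("abcd", (p_a, p_b, p_c, p_d)):
    print(label, round(value, 4))
```

The in-code check confirms that converting x to Z = (x − µ)/σ does not change the probability.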

The number of customers who enter a certain supermarket in a day is normally distributed with a
mean of 400 customers and a standard deviation of 80 customers.

a) What is the probability that on a given day the number of customers is less than 250?
b) Greater than 400
c) Between 300 and 400
d) Between 200 and 500

Solution

µ = 400, σ = 80

a) P(X < 250) = P(Z < (x − µ)/σ)
= P(Z < (250 − 400)/80)
= P(Z < -1,875)
= 0,0301

b) P(X > 400) = P(Z > (400 − 400)/80)
= P(Z > 0)
= 0,5

c) P(300 < X < 400) = P((300 − 400)/80 < Z < (400 − 400)/80)
= P(-1,25 < Z < 0)
= 0,5 – 0,1056
= 0,3944

d) P(200 < X < 500) = P((200 − 400)/80 < Z < (500 − 400)/80)
= P(-2,5 < Z < 1,25)
= 0,8944 – 0,0062
= 0,8882

The Central Limit Theorem

There are many situations in business where the population is not normally distributed. For a simple
random sample of n observations taken from a population with mean µ and standard deviation σ, the
sum of the random variables will have an approximately normal distribution when n is large. More
specifically, if x1, x2, …, xn is a random sample of size n taken from a population with mean µ and
standard deviation σ, the mean of the sample x̄ approximately follows a normal distribution with the
following parameters:

x̄ ~ N(µ, σ²/n), such that the probability P(x̄ < x) = P(Z < (x − µ) / (σ/√n))

NB Hypothesis testing and confidence interval estimation are based on this result.

Hypothesis Testing

In business it is common practice to make blanket statements about a population of interest. An
example could be "workers earn $1000". Such statements can be shown to be true or false using
statistical methods. A hypothesis can be defined as a claim, assumption or assertion about the true value
of a population parameter that can be tested. Hypothesis testing can be defined as the
process of testing or verifying the validity of a claim about the true value of a population parameter.
Hypothesis tests can be carried out on the following population parameters:

• Hypothesis testing for a single population mean
   o Large samples
   o Small samples

• Hypothesis testing for the difference between 2 population means
   o Large samples
   o Small samples

• Hypothesis testing for paired differences
   o Small samples

Basic steps for hypothesis test

1. Formulate the hypotheses, that is the null hypothesis, denoted as H0, and the alternative
hypothesis, denoted as H1.
2. Determine the type of distribution.
3. Determine the areas of acceptance and rejection.
4. Compute the test statistic.
5. Compare the test statistic with the critical value and draw a conclusion from the result obtained.

Formulating the Hypothesis

The null hypothesis is a claim made about a true value of a population parameter.

The alternative hypothesis is a statement that opposes the claim made about the true value of a
population parameter.

𝐻0 : µ= 1000

𝐻1 : µ≠ 1000

The following are different ways in which a hypothesis can be stated.

a) Two sided hypothesis test

• It tests a claim that equates a population parameter to a stated value.
• It can easily be identified by taking note of words like is exactly, indeed, equal to, same as,
etc.
• The hypothesis for a two sided test is stated as:

H0: µ = a
H1: µ ≠ a

It has two rejection areas, one in each tail.

b) One sided lower tailed test

• It tests whether a population parameter is less than a specified value.
• To identify this type of hypothesis, look for words such as smaller than, less than, below, etc.
• The hypothesis for a lower tailed test is stated as:

H0: µ = a
H1: µ < a

It has one rejection area on the left hand side.

c) One sided upper tailed test

• It tests whether a population parameter is greater than a specified value.
• It is identified by taking note of words like greater than, above, beyond, etc.
• The hypothesis for a one sided upper tailed test is stated as:

H0: µ = a
H1: µ > a

It has one rejection area on the right hand side.

Errors associated with hypothesis testing.

There are two types of errors that can be made when carrying out a hypothesis test.

Type One Error

It is the chance of rejecting a null hypothesis when it is true. It is denoted as α(alpha) which is the
level of significance or the probability of committing a type one error.

Type Two Error

This is the chance of accepting a null hypothesis when it is false, it is denoted as β(beta) which is the
probability of committing a type 2 error.

Step 2

Determining the type of distribution. There are two common types of distribution used in hypothesis
testing, i.e. the Z-distribution and the t-distribution.

The Z-distribution is used if the sample size is large, i.e. n ≥ 30.

The t-distribution is used if the sample size is small, i.e. n < 30.

Step 3

Determining the areas of acceptance and rejection

The acceptance area is the region such that, when the calculated test statistic falls in it, H0 is not
rejected. The rejection area is the region such that, when the calculated test statistic falls in it,
H0 is rejected.

Critical Values

This is a value that separates the acceptance and rejection regions.

To arrive at this value, the level of significance α is used, and it is always given as a percentage.

Two sided test (critical value ±Z(α/2))

α = 10%: α/2 = 0,05, Z = ±1,645
α = 5%: α/2 = 0,025, Z = ±1,96
α = 1%: α/2 = 0,005, Z = ±2,57

One sided lower tailed test (critical value −Zα)

α = 1%: −Z(0,01) = -2,33
α = 5%: −Z(0,05) = -1,645
α = 10%: −Z(0,10) = -1,28

One sided upper tailed test (critical value Zα)

α = 1%: 100% − 1% = 99%, Z(0,99) = 2,33
α = 5%: 100% − 5% = 95%, Z(0,95) = 1,645
α = 10%: 100% − 10% = 90%, Z(0,90) = 1,28

Step 4

Computing the test statistic

For a large sample the test statistic is calculated as Zcalc = (x̄ − µ) / (σ/√n)

For a small sample the test statistic is calculated as tcalc = (x̄ − µ) / (s/√n)

Step 5

Drawing a conclusion.

The conclusion depends on the results obtained in the step above. If the calculated test statistic falls
within the rejection region then H0 is rejected. If it falls within the acceptance
region we fail to reject H0.

Hypothesis Testing for Single Population Mean.

Large Sample

Zcalc = (x̄ − µ) / (σ/√n)

Where x̄ = sample mean

µ = population mean

σ = standard deviation

n = sample size.

Example

A firm suspects that the average life of 28000km claimed for certain tires is too high. To check this
claim, the firm puts 40 of these tires on its trucks and gets a mean life time of 27563km
and a standard deviation of 1348km. Is this evidence that the mean life time of these tires is in fact less
than 28000km? Carry out an appropriate test using α = 0,01.

H0: µ = 28000km

H1: µ < 28000km

n = 40 ≥ 30, so we use the Z-distribution.

Critical Value

α = 0,01

−Zα = −Z(0,01)

= -2,33

Reject H0 if Zcalc < -2,33

Test Statistic: Zcalc = (x̄ − µ) / (σ/√n)

= (27563 − 28000) / (1348/√40)

= -2,05

Since Zcalc = -2,05 is greater than -2,33 we fail to reject H0 and conclude at the 1% level of
significance that there is insufficient evidence that the mean life time of these tires is less than 28000km.
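
The five steps for the tire example can be sketched in Python. The critical value is obtained from `statistics.NormalDist().inv_cdf` instead of the tables, so it is −2,326 and not the rounded −2,33; the function name `z_statistic` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sd, n):
    """Large-sample test statistic (x_bar - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))


alpha = 0.01
z_calc = z_statistic(sample_mean=27563, mu0=28000, sd=1348, n=40)

# Lower tailed test: the rejection region lies to the left of the critical value
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z_calc < z_crit

print(round(z_calc, 2), round(z_crit, 2), reject_h0)
```

Because the test statistic of about −2,05 does not fall below the critical value, H0 is not rejected at the 1% level.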

A manufacturer claims that its light bulbs have an average life of 1600hrs. A sample of 100 light bulbs
tested gave an average life of 1570hrs and a standard deviation of 120hrs. Test at the 5% level of
significance if this claim is true.

H0: µ = 1600

H1: µ ≠ 1600

n = 100 ≥ 30, so we use the Z-distribution.

Critical Value

±Z(α/2) = ±Z(0,05/2) = ±Z(0,025) = ±1,96

Reject H0 if Zcalc < -1,96 or Zcalc > 1,96

Test Statistic: Zcalc = (x̄ − µ) / (σ/√n)

= (1570 − 1600) / (120/√100)

= -2,50

Since Zcalc = -2,5 is less than -1,96 we reject H0 and conclude at the 5% level of significance that the
average life of these light bulbs is not equal to 1600hrs.

1. The average monthly salary paid to an employee at a certain company is $340. A study
carried out amongst a sample of 300 employees produced an average monthly salary of $350
with a standard deviation of $60. Test the hypothesis at the 5% level of significance that the
average monthly salary of an employee is $340.

2. The average speed of cars along a highway is 135km/h. A sample study of 200 cars along the
highway showed an average speed of 130km/h with a variance of 900. Test the hypothesis at
the 10% level of significance to determine if the speed of cars along the highway is below
135km/h.

Solution 1

H0: µ = 340

H1: µ ≠ 340

n = 300 ≥ 30, so we use the Z-distribution.

Critical Value

±Z(α/2) = ±Z(0,05/2) = ±Z(0,025) = ±1,96

Reject H0 if Zcalc < -1,96 or Zcalc > 1,96

Test Statistic: Zcalc = (x̄ − µ) / (σ/√n)

= (350 − 340) / (60/√300)

= 2,89

Since Zcalc = 2,89 is greater than 1,96 we reject H0 and conclude at the 5% level of significance
that the average monthly salary of the employees is not equal to $340.

Solution 2

H0: µ = 135km/h

H1: µ < 135km/h

n = 200 ≥ 30, so we use the Z-distribution.

Critical Value

−Zα = −Z(0,10) = -1,28

Reject H0 if Zcalc < -1,28

Test Statistic: Zcalc = (x̄ − µ) / (σ/√n), where σ = √900 = 30

= (130 − 135) / (30/√200)

= -2,36

Since Zcalc = -2,36 is less than -1,28 we reject H0 and conclude at the 10% level of
significance that the average speed of cars along the highway is less than 135km/h.

How to use the t tables

• If it is a two tailed test, divide α by 2.
• Look up the result obtained on the top row of the t tables; where it intersects the
degrees of freedom is the critical value.
• For a single population mean the degrees of freedom are calculated as n − 1, where n is the sample size.
• The critical value is therefore denoted as:

One sided lower tail test

−t(α; n − 1)

One sided upper tail test

t(α; n − 1)

Two sided test

±t(α/2; n − 1)

Example

α = 0,025, n = 24

t(α; n − 1) = t(0,025; 23) = 2,07

α = 0,005, n = 5

t(α; n − 1) = t(0,005; 4) = 4,60

Rejection Criterion for a t-test.

One sided lower

Reject H0 if tcalc < −t(α; n − 1)

One sided upper

Reject H0 if tcalc > t(α; n − 1)

Two sided test

Reject H0 if tcalc < −t(α/2; n − 1) or if tcalc > t(α/2; n − 1).

On average the price of a pair of shoes is taken to be $20. From a sample of 25 pairs it was
found that the mean price was $22 with a standard deviation of $5. Test the hypothesis at the
1% level of significance that the average price of a pair of shoes is more than $20.

H0: µ = 20

H1: µ > 20

n = 25 < 30, so we use the t-distribution.

Critical Value

t(0,01; 24) = 2,49

Reject H0 if tcalc > 2,49

Test Statistic: tcalc = (x̄ − µ) / (s/√n)

= (22 − 20) / (5/√25)

= 2

Since tcalc = 2 is less than 2,49 we fail to reject H0: µ = 20 and conclude at the 1% level of
significance that there is insufficient evidence that the average price of a pair of shoes is more than $20.

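The mean weight of a certain product is assumed to be 85kg. To test this claim, a random
sample of 16 such products was studied and it was found that the average weight was 83kg with a
standard deviation of 5kg. Test whether the claim is true or not using α = 0,05.

H0: µ = 85kg
H1: µ ≠ 85kg

n = 16 < 30, so we use the t-distribution.

Critical Value

±t(α/2; n − 1) = ±t(0,025; 15) = ±2,13

Reject H0 if tcalc > 2,13 or tcalc < -2,13

Test Statistic: tcalc = (x̄ − µ) / (s/√n)

= (83 − 85) / (5/√16)

= -1,6

Since tcalc = -1,6 lies between -2,13 and 2,13 we fail to reject H0: µ = 85kg and conclude at the 5%
level of significance that the mean weight of the product is 85kg.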
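
A small-sample test follows the same pattern in code. The Python standard library has no t-distribution, so this sketch assumes the critical value 2,13 is read from the t tables (α/2 = 0,025 with 15 degrees of freedom) and typed in by hand; `t_statistic` is an illustrative name.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """Small-sample test statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Product weight example: n = 16, sample mean 83 kg, s = 5 kg, H0: mu = 85 kg
t_calc = t_statistic(sample_mean=83, mu0=85, s=5, n=16)

# Assumed table value for a two sided test at alpha = 0.05 with 15 d.f.
t_crit = 2.13

# Two sided test: reject when the statistic falls in either tail
reject_h0 = abs(t_calc) > t_crit

print(round(t_calc, 2), reject_h0)  # -1.6 False
```

Only the source of the critical value changes between the Z test and the t test; the test statistic has the same form.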

Hypothesis testing for the difference between 2 means

Large Samples
The sum of the two sample sizes should be at least 30 (n1 + n2 ≥ 30), where n1 = size of sample 1
and n2 = size of sample 2. The test statistic is calculated as

Zcalc = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2), where

x̄1 = the sample mean for sample 1

x̄2 = the sample mean for sample 2

σ1² = the variance for sample 1

σ2² = the variance for sample 2

Small Samples
n1 + n2 < 30

The test statistic is calculated as

tcalc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

Critical Values

One sided lower tail test

Large sample: −Zα
Small sample: −t(α; n1 + n2 − 2)

Two sided test

Large sample: ±Z(α/2)
Small sample: ±t(α/2; n1 + n2 − 2)

Example
A professor took two samples, one of 15 males and another of 12 females, from students at a
college who were enrolled for a statistics course. The professor found that the mean score of
male students in an exam was 76,2 with a standard deviation of 7,4 and the mean score of the
female students was 78,5 with a standard deviation of 6,7. Test at the 5% level of significance
if the mean score of all male students is lower than that of the female students.

Let males be population 1

Let females be population 2

H0: µ1 = µ2

H1: µ1 < µ2

n1 = 15, n2 = 12

n1 + n2 = 27 < 30, so we use the t-distribution.

Critical Value

α = 0,05

−t(α; n1 + n2 − 2) = −t(0,05; 25) = -1,71

Rejection Criterion
Reject H0 if tcalc < -1,71

Test Statistic: tcalc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

= (76,2 − 78,5) / √(7,4²/15 + 6,7²/12)

= -0,85

Since tcalc = -0,85 is greater than -1,71 we fail to reject H0 and conclude at the 5% level of
significance that the mean score of all male students is not lower than that of female students.

A transport company wants to compare the performance of 2 cars, a Nissan and a Toyota. The Nissan
was used 75 times and its average number of breakdowns was recorded to be 5 with a variance of 4. The
Toyota was used 63 times and its average number of breakdowns was recorded to be 4 with a variance
of 3. Test the hypothesis that the performance of the two cars is the same. Use α = 0,05.

Let Nissan be population 1

Let Toyota be population 2

H0: µ1 = µ2

H1: µ1 ≠ µ2

n1 = 75, n2 = 63

n1 + n2 = 138 > 30, so we use the Z-distribution.

Critical Value

α = 0,05

±Z(α/2) = ±Z(0,05/2) = ±Z(0,025)

= ±1,96

Reject H0 if Zcalc < -1,96 or if Zcalc > 1,96

Test Statistic: Zcalc = (x̄1 − x̄2) / √(σ1²/n1 + σ2²/n2)

= (5 − 4) / √(4/75 + 3/63)

= 3,15

Since Zcalc = 3,15 is greater than 1,96 we reject H0 and conclude at the 5% level of significance that
the level of performance of these two cars is not the same.
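
The two-sample test statistic used in these examples can be written as one Python function that takes the sample variances. This is a sketch of the unpooled form used in the notes; `two_sample_statistic` is an illustrative name.

```python
from math import sqrt


def two_sample_statistic(mean1, var1, n1, mean2, var2, n2):
    """(x1_bar - x2_bar) / sqrt(var1/n1 + var2/n2)."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)


# Nissan vs Toyota: the variances are given directly
z_calc = two_sample_statistic(5, 4, 75, 4, 3, 63)

# Male vs female scores: standard deviations must be squared first
t_calc = two_sample_statistic(76.2, 7.4 ** 2, 15, 78.5, 6.7 ** 2, 12)

print(round(z_calc, 2))  # 3.15
print(round(t_calc, 2))  # -0.85
```

A common slip is to pass standard deviations where variances are expected, which changes the second result noticeably.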

The principal of a college wants to compare the performance of two teachers, X and Y. X was assessed
8 times with a mean score of 6,2 and a variance of 2,15. Y was assessed 6 times with a mean
score of 5,8 and a variance of 1,2. Test the hypothesis at the 1% level of significance that
there is no difference between the mean scores obtained by the two teachers.

Let X be population 1

Let Y be population 2

H0: µ1 = µ2

H1: µ1 ≠ µ2

n1 = 8, n2 = 6

n1 + n2 = 14 < 30, so we use the t-distribution.

Critical Value

α/2 = 0,01/2 = 0,005

±t(α/2; n1 + n2 − 2) = ±t(0,005; 12) = ±3,05

Reject H0 if tcalc < -3,05 or if tcalc > 3,05

Test Statistic: tcalc = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

= (6,2 − 5,8) / √(2,15/8 + 1,2/6)

= 0,58

Since tcalc = 0,58 is less than 3,05 we fail to reject H0 and conclude at the 1% level of significance
that there is no difference between the mean scores of the two teachers.

Hypothesis Testing for Paired Differences.

In some cases, it is possible to pair the measurements from one population or sample, for example
before and after measurements on the same subjects. The hypothesis test examines whether the mean of
the differences between the two measurements is zero. We will always have
small samples for this type of hypothesis test.

One sided lower tailed test

H0: µd = 0

H1: µd < 0

Critical Value

−t(α; n − 1)

One sided upper tailed test

H0: µd = 0

H1: µd > 0

Critical Value

t(α; n − 1)

Two sided test

H0: µd = 0

H1: µd ≠ 0

Critical Value

±t(α/2; n − 1)

Test Statistic

tcalc = d̄ / (sd/√n)

where d represents the difference between the before measurement and the after measurement, that is
d = B − A

d̄ = Σd / n, n = sample size

sd = standard deviation of the differences, calculated as sd = √((Σd² − n(d̄)²) / (n − 1))

Example

The following table shows the heart beat of a particular group of people before and after the use of
tobacco, with d = B − A.

Subject   Heart beat before   Heart beat after   d      d^2

1         81                  105                -24    576
2         81                  91                 -10    100
3         68                  87                 -19    361
4         61                  86                 -25    625
5         67                  82                 -15    225
6         74                  78                 -4     16
7         75                  87                 -12    144
8         64                  94                 -30    900
9         70                  93                 -23    529
10        60                  90                 -30    900
Sum       701                 893                -192   4376

Does tobacco use increase the heart rates of these people? Test using α = 5%.

$H_0: \mu_d = 0$ (tobacco does not cause an increase)

$H_1: \mu_d < 0$ (since d = B − A, an increase after tobacco use gives negative differences)

Critical Value

$-t_{\alpha;\, n-1} = -t_{0,05;\, 9} = -1,83$

Rejection Criterion

Reject $H_0$ if $t_{calc} < -1,83$

Test Statistic

$\bar{d} = \frac{\sum d}{n} = \frac{-192}{10} = -19,2$

$s_d = \sqrt{\frac{\sum d^2 - n(\bar{d})^2}{n - 1}} = \sqrt{\frac{4376 - 10(-19,2)^2}{9}} = 8,75$

$t_{calc} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-19,2}{8,75 / \sqrt{10}} = -6,94$

Since $t_{calc}$ = −6,94 is less than −1,83, we reject $H_0$ at the 5% level of significance and conclude that the use
of tobacco does cause an increase in the heart rates.
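
The paired-difference calculation can be reproduced from the raw before and after readings. The sketch below assumes the ten heart-beat pairs from the table and the table value −1,83 for the critical point; the helper name `paired_t` is hypothetical.

```python
from math import sqrt

before = [81, 81, 68, 61, 67, 74, 75, 64, 70, 60]
after = [105, 91, 87, 86, 82, 78, 87, 94, 93, 90]


def paired_t(before, after):
    """Return (d_bar, s_d, t_calc) for differences d = before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample standard deviation of the differences (computational formula).
    s_d = sqrt((sum(x * x for x in d) - n * d_bar ** 2) / (n - 1))
    t_calc = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t_calc


d_bar, s_d, t_calc = paired_t(before, after)
T_CRIT = -1.83  # lower-tail critical value for alpha = 0.05, 9 d.f. (from tables)
reject = t_calc < T_CRIT
print(round(d_bar, 2), round(s_d, 2), round(t_calc, 2), reject)
```

Working with unrounded intermediate values gives a statistic that agrees with the hand calculation to two decimal places.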

You have been trying to control the weight of a chocolate candy bar by intervening in the production
process. The following table shows the weights before and after the intervention. Has the intervention
managed to reduce the weight of the chocolate candy bar? Test at the 1% level of significance
(upper-tailed test).

Before   After   d = B − A   d^2

1.62     1.55     0.07       0.0049
1.71     1.71     0          0
1.60     1.65    -0.05       0.0025
1.61     1.64    -0.03       0.0009
1.62     1.62     0          0
1.55     1.63    -0.08       0.0064
1.64     1.72    -0.08       0.0064
1.61     1.77    -0.16       0.0256
1.57     1.63    -0.06       0.0036
1.67     1.73    -0.06       0.0036
Sum:
16.20    16.65   -0.45       0.0539

$H_0: \mu_d = 0$

$H_1: \mu_d > 0$ (since d = B − A, a reduction in weight gives positive differences)

Critical Value

$t_{\alpha;\, n-1} = t_{0,01;\, 9} = 2,82$

Reject $H_0$ if $t_{calc} > 2,82$

Test Statistic

$\bar{d} = \frac{\sum d}{n} = \frac{-0,45}{10} = -0,045$

$s_d = \sqrt{\frac{0,0539 - 10(-0,045)^2}{9}} = 0,0611$

$t_{calc} = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-0,045}{0,0611 / \sqrt{10}} = -2,33$

Since $t_{calc}$ = −2,33 is less than 2,82, we fail to reject $H_0$ at the 1% significance level and conclude that
there is not enough evidence that the intervention has reduced the weight of the chocolate candy bars.

Confidence Interval Estimation.

Common statements like 'the average price of petrol per litre is between $1,40 and $1,50' are
examples of interval estimates. In statistics it is customary to give not only the interval estimate for a
parameter but also the probability that the procedure leads to an interval which contains the parameter.
This probability is the level of confidence, for example 90%, 95% or 99%.

Small Sample

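n < 30

The confidence interval estimate for a small sample is given by the following formula:

$\bar{x} - t_{\alpha/2;\, n-1}\left(\frac{s}{\sqrt{n}}\right) \le \mu \le \bar{x} + t_{\alpha/2;\, n-1}\left(\frac{s}{\sqrt{n}}\right)$

or, more compactly,

$\bar{x} \pm t_{\alpha/2;\, n-1}\left(\frac{s}{\sqrt{n}}\right)$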

Example

From a sample of 64 car commuters, the sample mean time taken to commute to work daily was
found to be 26,5 minutes. The standard deviation is known to be 15 minutes. Find the 95% confidence
interval estimate of the actual mean time µ taken by all car commuters.

n = 64 > 30, so we use the z-distribution

α = 100% − 95% = 5% = 0,05

$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$26,5 - Z_{0,05/2}\frac{15}{\sqrt{64}} \le \mu \le 26,5 + Z_{0,05/2}\frac{15}{\sqrt{64}}$

$26,5 - Z_{0,025}\frac{15}{\sqrt{64}} \le \mu \le 26,5 + Z_{0,025}\frac{15}{\sqrt{64}}$

$26,5 - 1,96\frac{15}{\sqrt{64}} \le \mu \le 26,5 + 1,96\frac{15}{\sqrt{64}}$

22,83 ≤ µ ≤ 30,18

We are 95% confident that the mean time taken to commute to work daily lies between
22,83 minutes and 30,18 minutes.

If the sample size in the example above were 25, with the mean and standard deviation remaining the
same, compute a 99% confidence interval estimate of the population mean µ.

n = 25 < 30, so we use the t-distribution

α = 100% − 99% = 1% = 0,01

$26,5 - t_{0,01/2;\, 24}\left(\frac{15}{\sqrt{25}}\right) \le \mu \le 26,5 + t_{0,01/2;\, 24}\left(\frac{15}{\sqrt{25}}\right)$

$26,5 - t_{0,005;\, 24}\left(\frac{15}{\sqrt{25}}\right) \le \mu \le 26,5 + t_{0,005;\, 24}\left(\frac{15}{\sqrt{25}}\right)$

$26,5 - 2,80\left(\frac{15}{\sqrt{25}}\right) \le \mu \le 26,5 + 2,80\left(\frac{15}{\sqrt{25}}\right)$

18,1 ≤ µ ≤ 34,9

The mean time taken to commute to work daily lies between 18,10 minutes and 34,90 minutes with a
probability of 0,99.
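
Both commuter intervals follow the same pattern: the sample mean plus or minus a critical value times the standard error. The sketch below illustrates this with a hypothetical helper `mean_ci`. The z value comes from Python's standard library, while the t value 2,80 is simply the table value quoted in the example, because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(x_bar, sd, n, crit):
    """Confidence interval x_bar +/- crit * sd / sqrt(n)."""
    margin = crit * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Large sample (n = 64): 95% interval using Z_(0.025).
z_crit = NormalDist().inv_cdf(0.975)
low_z, high_z = mean_ci(26.5, 15, 64, z_crit)

# Small sample (n = 25): 99% interval using t_(0.005; 24) from tables.
T_0005_24 = 2.80
low_t, high_t = mean_ci(26.5, 15, 25, T_0005_24)

print(round(low_z, 3), round(high_z, 3))
print(round(low_t, 2), round(high_t, 2))
```

The small-sample interval is noticeably wider, both because the confidence level is higher and because fewer observations give a larger standard error.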

Confidence Interval Estimate For the difference between 2 population means

Large Samples

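The appropriate formula to be used for large samples is given as

$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

or, more compactly,

$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
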
A company has two shops, A and B, and wants to compare the efficiency of the employees of these two
shops. 30 employees were sampled from shop A and 20 from shop B, and their performance was observed.
Shop A employees completed a given task in 30 minutes on average, with a sample standard deviation of
6 minutes. Shop B employees took 25 minutes on average to complete the same task, with a
sample variance of 25. Construct a 95% confidence interval estimate for the difference in the
mean number of minutes taken to complete the task by the employees from the two shops.

Let shop A be population 1

Let shop B be population 2

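n1 + n2 = 50 > 30, so we use the z-distribution

$(\bar{X}_1 - \bar{X}_2) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

$(30 - 25) - Z_{0,05/2}\sqrt{\frac{6^2}{30} + \frac{25}{20}} \le (\mu_1 - \mu_2) \le (30 - 25) + Z_{0,05/2}\sqrt{\frac{6^2}{30} + \frac{25}{20}}$

$5 - 1,96\sqrt{\frac{36}{30} + \frac{25}{20}} \le (\mu_1 - \mu_2) \le 5 + 1,96\sqrt{\frac{36}{30} + \frac{25}{20}}$

$1,93 \le (\mu_1 - \mu_2) \le 8,07$

We are 95% confident that the difference between the mean number of minutes taken to complete a
given task by the employees from the two shops is between 1,93 minutes and 8,07 minutes.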
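
As a rough check on the two-shop interval, the calculation can be sketched in code. It assumes shop A's variance is 6 squared (its standard deviation is 6 minutes) and shop B's variance is 25, as given in the problem; `diff_ci` is an illustrative name, not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(mean1, var1, n1, mean2, var2, n2, crit):
    """Confidence interval for mu1 - mu2 from summary statistics."""
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of the difference
    diff = mean1 - mean2
    return diff - crit * se, diff + crit * se


z_crit = NormalDist().inv_cdf(0.975)  # Z_(0.025) for a 95% interval
low, high = diff_ci(30, 6 ** 2, 30, 25, 25, 20, z_crit)
print(round(low, 2), round(high, 2))
```

Because the whole interval lies above zero, the data suggest that shop A employees take longer on average than shop B employees.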

Small Samples

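The appropriate formula to be used is given as follows.

n1 + n2 < 30, so we use the t-distribution

$(\bar{X}_1 - \bar{X}_2) - t_{\alpha/2;\, n_1 + n_2 - 2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le (\mu_1 - \mu_2) \le (\bar{X}_1 - \bar{X}_2) + t_{\alpha/2;\, n_1 + n_2 - 2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

or, more compactly,

$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2;\, n_1 + n_2 - 2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$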

If the sample sizes in the example above were 15 employees from shop A and 10 employees from shop
B, with the means and standard deviations remaining the same, compute a 90% confidence interval
estimate for the difference in the mean number of minutes taken to complete the given task in the two shops.

Let shop A be population 1

Let shop B be population 2

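n1 + n2 = 25 < 30, so we use the t-distribution

α = 100% − 90% = 0,10, so α/2 = 0,05 and d.f = 15 + 10 − 2 = 23

$(30 - 25) - t_{0,05;\, 23}\sqrt{\frac{6^2}{15} + \frac{25}{10}} \le (\mu_1 - \mu_2) \le (30 - 25) + t_{0,05;\, 23}\sqrt{\frac{6^2}{15} + \frac{25}{10}}$

$5 - 1,71(2,21) \le (\mu_1 - \mu_2) \le 5 + 1,71(2,21)$

$1,22 \le (\mu_1 - \mu_2) \le 8,78$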

Determining the sample size

A sample drawn from a given population must be representative for a fair conclusion to be made
about that population. It follows that, for a given confidence level, standard deviation and margin of
error within which the true average is expected to fall, the sample size can
be calculated using the formula

$n = \left(\frac{Z_{\alpha/2}\, \sigma}{e}\right)^2$

where $Z_{\alpha/2}$ = the value associated with a given confidence level

σ = standard deviation, which shows how much variation one expects in the responses

e = margin of error.

A recent study of a private company's employees' salaries showed a standard deviation of $251,35. The
study would like to estimate the mean salary to within ±$80 of the true mean with a 90%
confidence level. What sample size of employees is needed for the study?

e = 80

σ = 251,35

α = 100% − 90% = 10% = 0,1

$n = \left(\frac{Z_{0,1/2} \times 251,35}{80}\right)^2$

$= \left(\frac{Z_{0,05} \times 251,35}{80}\right)^2$

$= \left(\frac{1,64 \times 251,35}{80}\right)^2$

= 26,55, which is rounded up to 27 employees
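
The sample-size formula is easy to script. The sketch below uses a hypothetical function `sample_size` and takes the z value from Python's standard library (about 1,645 rather than the rounded table value 1,64). The result is rounded up, since a fraction of an employee cannot be sampled.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin at the given confidence."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / margin) ** 2)


n_required = sample_size(sigma=251.35, margin=80, confidence=0.90)
print(n_required)
```

Halving the margin of error roughly quadruples the required sample size, because the margin enters the formula squared.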
THE CHI-SQUARE DISTRIBUTION OR TEST

The chi-square distribution is a distribution obtained by multiplying the ratio of sample variance to
population variance by the degrees of freedom when random samples are selected. Expected
frequencies, denoted E, are frequencies obtained by calculation, whereas observed frequencies, denoted O,
are obtained by observation. The chi-square distribution is denoted $\chi^2$ and it is used to test for
independence.

Test for Independency or Association

In this test the claim is that the row and column variable are independent of each other. The
hypothesis for this test is stated as follows.

𝐻0 : row and column variables are independent of each other

𝐻1 : row and column variables are dependent on each other

𝐻0 : there is no association between the row and column variable.

𝐻1 : there is an association between the row and column variable.

𝐻0 : there is no relationship between the row and column variable.

𝐻1 : there is a relationship between the row and column variable.

Critical Value

The critical value of a Chi-square test is given as

$\chi^2_{\alpha;\, (r-1)(c-1)}$

where d.f = (r − 1)(c − 1)

r = number of rows

c = number of columns

In order to determine whether or not a relationship exists between blood type and the severity of
winter flu, a survey was conducted and yielded the following results. Test at the 5% level of
significance whether there is a relationship.

Severity of flu   A     B     AB    O     Total

Severe            34    57    82    55    228
Mild              53    45    137   57    292
No                213   218   211   173   815
Total             300   320   430   285   1335

For example, the expected frequencies for the Severe row are

228×300/1335, 228×320/1335, 228×430/1335 and 228×285/1335.

Critical Value

α = 0,05

r = 3, c = 4

d.f = (r − 1)(c − 1) = 2 × 3 = 6

$\chi^2_{0,05;\, 6} = 12,60$

Further examples of critical values:

r = 4, c = 3, α = 0,01

$\chi^2_{0,01;\, (4-1)(3-1)} = \chi^2_{0,01;\, 6} = 16,8$

r = 7, c = 5, α = 0,05

d.f = (7 − 1)(5 − 1) = 24

$\chi^2_{0,05;\, 24} = 36,4$

Rejection Criterion

Reject $H_0$ if $\chi^2_{calc} > \chi^2_{\alpha;\, (r-1)(c-1)}$

For the blood type example: reject $H_0$ if $\chi^2_{calc} > 12,60$

Test Statistic

$\chi^2_{calc} = \sum \frac{(O - E)^2}{E}$

$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$

NB. Each observed frequency must have its own expected frequency.

O     E        O-E      (O-E)^2     (O-E)^2/E

34    51.24    -17.24   297.2176    5.8005
57    54.65    2.35     5.5225      0.101052
82    73.44    8.56     73.2736     0.997734
55    48.67    6.33     40.0689     0.823277
53    65.62    -12.62   159.2644    2.427071
45    69.99    -24.99   624.5001    8.922705
137   94.05    42.95    1844.703    19.61406
57    62.34    -5.34    28.5156     0.457421
213   183.15   29.85    891.0225    4.864988
218   195.36   22.64    512.5696    2.623718
211   262.51   -51.51   2653.28     10.10735
173   173.9    -0.9     0.81        0.004658
Total                               56.74453

Since $\chi^2_{calc}$ = 56,74 is greater than 12,60, we reject $H_0$ at the 5% level of significance and
conclude that there is a relationship between blood type and the severity of winter flu.
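
The expected frequencies and the chi-square statistic for the blood type table can be computed directly from the observed counts. The following is a sketch only: `chi_square_independence` is a hypothetical helper, and the critical value 12,60 is the table value used in the example. Because the code does not round the expected frequencies to two decimals, its total can differ from the hand-worked table in the third decimal place.

```python
observed = [
    [34, 57, 82, 55],      # Severe
    [53, 45, 137, 57],     # Mild
    [213, 218, 211, 173],  # No flu
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected frequency: row total * column total / grand total.
            e = row_totals[i] * col_totals[j] / grand_total
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


chi2_calc, df = chi_square_independence(observed)
CHI2_CRIT = 12.60  # chi-square table value for alpha = 0.05, 6 d.f.
print(round(chi2_calc, 2), df, chi2_calc > CHI2_CRIT)
```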
A survey of 382 respondents produced the following results.

        1     2     3     Total
1       45    87    52    184
2       33    65    100   198
Total   78    152   152   382

Test the hypothesis that the row in which a response falls is independent of the column in which it
falls. Use α = 1%.

Simple Linear Regression and Correlation.

Regression analysis is a statistical method that establishes a linear relationship between two variables.
Correlation analysis looks closely at the strength of this linear relationship between the variables.

Regression: establishes whether there is a relationship

Correlation: tests the strength of the relationship

The purpose of simple linear regression analysis is to examine some form of linear relationship
between two random variables. These variables are denoted x and y.

The x values are always known or can easily be found, whereas the y values are
estimated using the x values.

Scatter Plot

This is a plot of x values against y values. The x values make up the horizontal axis of the graph and
the y values make up the vertical axis. The graph is drawn by plotting dots at the points where the
values of x and y intersect. If the dots seem to lie in a linear form, then a linear relationship exists
between the two variables.

This suggests that the x values can be confidently used in predicting the y values.

[Figure: scatter plot of Y-Values against x]

This indicates that there is a linear relationship between x and y and that it is positive:

as x increases, y increases.

[Figures: scatter plots of y against x]
This indicates a linear relationship between x and y that is negative.

For a negative linear relationship, as x increases y decreases. If the
dots are scattered all over the space, this suggests no linear relationship between x and y.

If a linear relationship exists between two variables, then the x values can be relied upon in predicting
the y values. If a linear relationship does not exist, then the x values cannot be relied
upon in predicting the y values.

The following data shows the number of garments and the size of cloth in meters.

a) Identify the independent and dependent variable.

b) Produce a scatter plot of the data and comment on the relationship.

Number of garments   Cloth in meters

45                   25
28                   16
34                   20
42                   28
34                   19
30                   17
42                   22
39                   20
24                   14
32                   17
20                   6

The dependent variable is the number of garments; the independent variable is the cloth in meters.

[Figure: scatter plot of cloth in meters against number of garments]

From the scatter plot, a positive linear relationship exists between cloth in meters and number
of garments.

The following data gives the profits for a particular type of machine sold and the
number of units sold in different shops.
a) Determine the independent and dependent variables.
b) Construct a scatter plot of the data and comment.

Solution
Dependent variable = profits
Independent variable = number of units

Profit   Number of units

550      42
600      38
650      35
600      40
500      44
650      38
450      45
500      42

[Figure: scatter plot of number of units against profit]

Linear Regression Function.

The simple linear regression function is given as

$\hat{y} = \hat{\beta}_0 + \beta_1 x$ where $\hat{\beta}_0$ and $\beta_1$ are unknowns.

The estimated value of the dependent variable y is composed of a linear function $\hat{\beta}_0 + \beta_1 x$ of the
explanatory variable x.

The parameter $\hat{\beta}_0$ is known as the intercept parameter and the parameter $\beta_1$ is known as the slope
parameter. The slope parameter $\beta_1$ is of particular interest since it indicates how the expected value of
y depends on x.

If $\beta_1 > 0$ then a positive linear relationship exists between x and y: as x increases, y also increases.

[Figure: line $\hat{y} = \hat{\beta}_0 + \beta_1 x$ with positive slope]

If $\beta_1 < 0$ then a negative linear relationship exists between x and y: as x increases, y decreases.

[Figure: line $\hat{y} = \hat{\beta}_0 + \beta_1 x$ with negative slope]

If $\beta_1 = 0$ the line is horizontal and there is no linear relationship between x and y.

[Figure: horizontal line]

NB: The two unknown parameters $\hat{\beta}_0$ and $\beta_1$ are estimated from a data set.

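$\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$

$\hat{\beta}_0$ is calculated from $\beta_1$ as follows:

$\hat{\beta}_0 = \frac{\sum y}{n} - \beta_1 \left(\frac{\sum x}{n}\right) = \bar{y} - \beta_1 \bar{x}$

NB: for a specific value of the explanatory variable x, the equation provides an estimated value
of y.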

Example

The following is sample data obtained in a study of the relationship between the number of years that
applicants for a certain job have studied English language in high school or college and the grades
which they received in a proficiency test in that language.

number of years grade in test

3 57
4 78
4 72
2 58
5 59
3 63
4 73
5 84
3 75
2 48

a) Plot the scatter graph and comment on the relationship.

b) Compute the appropriate regression equation.

c) Comment on the regression coefficient.

d) Superimpose the regression line on the scatter graph.

e) Predict the grade in the test for someone with 8 years in school studying English language.

[Figure: scatter plot of grade in test against number of years]

𝑦̂ = 𝛽̂0 + 𝛽1 𝑥
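b) With n = 10, ∑x = 35, ∑y = 667, ∑xy = 2404 and ∑x² = 133:

$\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(2404) - (35)(667)}{10(133) - 1225}$

$\beta_1 = 6,62$

$\hat{\beta}_0 = \bar{y} - \beta_1 \bar{x} = 66,7 - (6,62)(3,5) = 43,53$

$\hat{y} = 43,53 + 6,62x$

c) Since $\beta_1 > 0$, there is a positive linear relationship between x and y: as x increases by one unit,
y increases by about 6,62.

d) To draw the line, compute two points on it:

At x = 2: $\hat{y}$ = 43,53 + 6,62(2) = 56,77

At x = 4: $\hat{y}$ = 43,53 + 6,62(4) = 70,01

e) $\hat{y}$ = 43,53 + 6,62x

= 43,53 + 6,62(8)

= 96,49

≈ 96%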
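
The slope and intercept formulas translate directly into code. The sketch below fits the line to the years and grades data from the example and then predicts the grade for 8 years of study; `fit_line` is a hypothetical helper written only for illustration.

```python
years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
grades = [57, 78, 72, 58, 59, 63, 73, 84, 75, 48]


def fit_line(x, y):
    """Least-squares intercept and slope using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * sum_x / n  # y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(years, grades)
predicted = b0 + b1 * 8  # grade predicted for 8 years of study
print(round(b0, 2), round(b1, 2), round(predicted, 2))
```

Note that x = 8 lies outside the observed range of 2 to 5 years, so the prediction is an extrapolation and should be treated with caution.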

Correlation Analysis

Correlation analysis tests the strength of the relationship between two variables. It measures the
strength of a linear relationship between the independent variable x and the dependent variable y.

The correlation coefficient is denoted r and takes values in the range −1 ≤ r ≤ 1.

r = −1: perfect negative correlation
−1 < r < 0: negative correlation
r = 0: no correlation
0 < r < 1: positive correlation
r = 1: perfect positive correlation

The common correlation coefficient used in statistics is the Pearson correlation coefficient.

It is calculated by

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum y^2 - (\sum y)^2\right)\left(n \sum x^2 - (\sum x)^2\right)}}$

r = correlation coefficient

n = number of data pairs

x = independent variable

y = dependent variable

Suppose an experiment involving 5 subjects is conducted to determine the relationship between the
percentage of a certain drug in the blood stream and the length of time it takes to react to a stimulus.

a) Estimate the regression line equation.

b) Interpret the slope of the regression line.

c) Predict the reaction time of a subject with 1,05% of the drug in their blood stream.

Subject   Amount of drug   Reaction time (s)

1         1                1
2         2                1
3         3                2
4         4                2
5         5                4

With n = 5, ∑x = 15, ∑y = 10, ∑xy = 37 and ∑x² = 55:

$\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$

$= \frac{5(37) - (15)(10)}{5(55) - (15)^2}$

= 0,7

$\hat{\beta}_0 = \bar{y} - \beta_1 \bar{x}$

= 2 − (0,7)(3)

= −0,1

$\hat{y} = -0,1 + 0,7x$

Given n = 11, ∑xy = 7289, ∑x = 204, ∑y = 370, ∑x² = 4120 and ∑y² = 13070,

compute and interpret the correlation coefficient.

$r = \frac{11 \times 7289 - 204 \times 370}{\sqrt{\left(11 \times 13070 - 370^2\right)\left(11 \times 4120 - 204^2\right)}}$

= 0,93

This is a strong positive correlation.

Coefficient of determination

It is the square of Pearson's correlation coefficient, expressed as a percentage, that is

$COD = r^2 \times 100$

This measurement helps to determine the relationship or association between the two variables and is
measured as a percentage. It also helps in estimating the reliability of the x values in predicting the y
values.

For r = 0,93: 0,93 × 0,93 × 100 = 86,49%

The x values are 86,49% reliable in predicting the y values according to this model.
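
When only the sums are available, as in the n = 11 exercise, the correlation coefficient and the coefficient of determination can be computed as sketched below. The function name `pearson_r_from_sums` is made up for this illustration.

```python
from math import sqrt


def pearson_r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Pearson correlation coefficient from summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_y2 - sum_y ** 2) * (n * sum_x2 - sum_x ** 2))
    return numerator / denominator


r = pearson_r_from_sums(11, 7289, 204, 370, 4120, 13070)
cod = r ** 2 * 100  # coefficient of determination as a percentage
print(round(r, 2), round(cod, 1))
```

Squaring the unrounded r gives a percentage slightly higher than squaring the rounded value 0,93, so small differences from a hand calculation are to be expected.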

A lady operates a hot dog stand in the park. She suspects that there is a relationship between the
temperature on a given day and the number of hot dogs she sells on that day. She begins to keep
track of the data and obtains the following results.

a) Plot a scatter diagram of the data and comment.

b) Is it reasonable to fit a regression line? Explain.

c) Estimate the regression line equation.

d) Interpret.

e) Estimate the increase in the number of hot dogs sold when the temperature increases from 27
degrees to 30 degrees.

f) Compute the coefficient of determination and interpret it.

Day Temperature number of hotdogs sold

1 25 67
2 23 61
3 20 49
4 21 54
5 28 65
6 32 75
7 31 72
8 33 77
9 27 64
10 25 60

[Figure: chart of temperature and number of hotdogs sold]

a) The plot shows a positive linear relationship.

b) Yes, it is reasonable to fit a regression line because the scatter plot exhibits some form of
positive linear relationship between the temperature on a given day and the number of hot dogs
sold on that day.

𝑦̂ = 𝛽̂0 + 𝛽1 𝑥

c) $\beta_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$

∑x = 265, ∑x² = 7207, ∑xy = 17413, ∑y = 644, ∑y² = 42186, $\bar{x}$ = 26,5, $\bar{y}$ = 64,4

$\beta_1 = \frac{10 \times 17413 - 265 \times 644}{10 \times 7207 - 265^2}$
= 1,88

$\hat{\beta}_0 = \bar{y} - \beta_1 \bar{x}$
= 64,4 − 1,88 × 26,5
= 14,58

$\hat{y} = 14,58 + 1,88x$

d) There is a positive linear relationship.

e) At 30 degrees: $\hat{y}$ = 14,58 + 1,88 × 30
= 70,98
At 27 degrees: $\hat{y}$ = 14,58 + 1,88 × 27
= 65,34

Increase in the number of hot dogs sold

70,98 − 65,34
= 5,64
≈ 6 hot dogs

f) $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum y^2 - (\sum y)^2\right)\left(n \sum x^2 - (\sum x)^2\right)}}$

$= \frac{10 \times 17413 - 265 \times 644}{\sqrt{\left(10 \times 42186 - 644^2\right)\left(10 \times 7207 - 265^2\right)}}$

= 0,96

COD = r × r × 100

= 0,96 × 0,96 × 100
= 92,16%

The x values are 92,16% reliable in predicting the y values according to this model.
