1
Chapter 1
DESCRIPTIVE
STATISTICS
2
1.1 Introduction to Probability and
Statistics
DESCRIPTIVE AND INDUCTIVE STATISTICS
The science of Statistics deals with the methods
of collecting, presenting, analyzing and
interpreting data so that valid conclusions can be
drawn from them. These methods are categorized
as belonging to two major areas called descriptive
statistics and inductive or inferential statistics.
3
DESCRIPTIVE STATISTICS
Are those methods concerned with the collection and
presentation of data and the description of some of its
features to yield meaningful information without
attempting to draw any inferences from it.
Descriptive statistics can be divided into two subject
areas. The first area consists of data presentation
using visual techniques in the form of tables and
graphs. The other area consists of numerical
summary measures for a data set.
4
INDUCTIVE OR INFERENTIAL
STATISTICS
Are those methods concerned with developing
and using mathematical tools to make forecasts
and inferences. The concept of probability is
basic to the development and understanding of
inductive statistics.
5
POPULATION, SAMPLE AND
VARIABLES
One of the goals of statistical investigation is to acquire
information or draw some conclusions about a large group
of items on the basis of a few. When it is impossible or
impractical to observe the entire set of observations, we
must depend on a subset of the observations.
All of the conceivable members of a group under study
constitute a population. A subcollection of items drawn
from a population under study is called a sample.
6
POPULATION, SAMPLE AND
VARIABLES
The characteristic that is being studied is a variable.
A variable may be qualitative or quantitative.
The uncertainty associated with taking a sample from a
given population should be understood before we
can understand what a particular sample can tell us
about the population. That is why the concept of
probability is studied before statistics.
7
COLLECTION OF DATA
Statistics also deals with the development of techniques
for collecting data. Data should be properly collected so
that an investigator may be able to answer the question
under consideration with a reasonable degree of
confidence.
The simplest method for ensuring representative selection
of samples is to take a simple random sample. In this
method of sampling, any particular subset of the specified
size has the same chance of being selected.
8
COLLECTION OF DATA
Another method is stratified sampling. This entails
separating the population units into non-overlapping groups and
taking a sample from each one.
We focused our discussion on a selected few techniques
of data presentation that are most useful and relevant
to probability and inferential statistics.
9
1.2 Tabular and Graphical Methods in Descriptive
Statistics
FREQUENCY DISTRIBUTIONS
The organization of data in tabular form gives the frequency
distribution. Data in frequency distribution may be grouped
or ungrouped.
Raw data are collected data that have not been organized
numerically. An arrangement of raw data in ascending or
descending order of magnitude is an array. In an array, any
value may appear several times. The number of times a
value appears in the listing is its frequency. The relative
frequency of any observation is obtained by dividing the
actual frequency of the observation by the total frequency.
10
UNGROUPED DATA
When the data set is small (n < 30) or when there are few
distinct values, the data are organized without grouping.
Ex. 1.1 A certain machine is to dispense 1.6 kilos of alum. To
determine whether it is properly adjusted to dispense 1.6 kilos, the
quality control engineer weighed thirty bags of 1.6-kilo alum after
the machine was adjusted. The data given below refer to the net
weight (in kilos) of each bag. Arrange the data in a frequency table.
1.56 1.59 1.62 1.60 1.56 1.62
1.62 1.60 1.59 1.60 1.56 1.56
1.60 1.62 1.59 1.62 1.56 1.59
1.62 1.56 1.62 1.58 1.62 1.60
1.60 1.58 1.59 1.62 1.58 1.58
11
Table 1.1 Frequency Distribution of the Weight of Alum
Weight (kilos)   Tally   No. of bags (frequency, f)   Relative Frequency
1.56 IIII–I 6 0.20
1.58 IIII 4 0.13
1.59 IIII 5 0.17
1.60 IIII–I 6 0.20
1.62 IIII–IIII 9 0.30
Σ 30 1.00
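The tally in Table 1.1 can be checked programmatically. The following is a minimal sketch, assuming the thirty weights of Ex. 1.1 are typed in as a Python list; the function name `frequency_table` is illustrative and not part of the original example.

```python
from collections import Counter

# Net weights (kilos) of the thirty bags from Ex. 1.1.
weights = [
    1.56, 1.59, 1.62, 1.60, 1.56, 1.62,
    1.62, 1.60, 1.59, 1.60, 1.56, 1.56,
    1.60, 1.62, 1.59, 1.62, 1.56, 1.59,
    1.62, 1.56, 1.62, 1.58, 1.62, 1.60,
    1.60, 1.58, 1.59, 1.62, 1.58, 1.58,
]

def frequency_table(data):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    counts = Counter(data)
    n = len(data)
    return [(value, f, f / n) for value, f in sorted(counts.items())]

table = frequency_table(weights)
for value, f, rel in table:
    print(f"{value:.2f}  {f:2d}  {rel:.2f}")
```

Each printed row corresponds to one row of the frequency table: the distinct weight, its frequency, and its relative frequency.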
12
GROUPED DATA
Statistical data generated in large masses can be assessed by
grouping the data into different classes.
The following are suggested steps in forming a frequency
distribution from raw data.
1. Find the range (R). The range is the difference between the largest and smallest values.
2. Decide on a suitable number of classes depending upon what information the
table is supposed to present. Sturges suggested the number of classes (m) as
m = 1 + 3.3 log n, where n = number of cases.
3. Determine the class size (c): c = R/m.
4. Find the number of observations in each class. This is the class frequency (f).
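Steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and rounding the number of classes up to the next whole number is an assumption that matches the "say 7 classes" choice made in the worked examples.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, m = 1 + 3.3 log10(n), rounded up (assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_size(data_range, m):
    """Raw class size c = R / m; in practice c is rounded to a convenient value."""
    return data_range / m

m = sturges_classes(40)          # 40 cases, as in the gasoline example
c = class_size(25.7 - 22.8, m)   # range R = largest value - smallest value
print(m, round(c, 2))
```

The raw class size is usually replaced by a nearby convenient number, such as 0.5 or 10, before the classes are laid out.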
13
CLASS INTERVALS, CLASS MARKS
AND CLASS BOUNDARIES
Classes represent the grouping or classification. The range of
values in a class is the class interval, consisting of a lower limit
and an upper limit. Whenever possible, we must make the
class intervals of equal width and make the ranges multiples of
numbers which are easy to work with, such as 5, 10, or 100.
The midpoint of the class interval is the class mark. It is
half the sum of the lower and upper limits of a class. A point
that lies halfway between successive classes, dividing them,
is a class boundary.
14
Ex. 1.2 The following are the observed gasoline
consumption in miles per gallon of 40 cars.
Arrange the data in a frequency distribution.
24.5 23.6 24.1 25.0 22.9 24.7 23.8 25.2 23.7 24.4
24.7 23.9 25.1 24.6 23.3 24.3 24.6 23.9 24.1 24.4
24.5 25.7 23.6 24.0 23.9 24.2 24.7 24.9 25.0 24.8
24.5 23.4 24.9 24.8 24.7 24.1 22.8 23.1 25.3 24.6
Solution: R = 25.7 – 22.8 = 2.9
m = 1 + 3.3 log 40 = 6.3, say 7 classes
c = 2.9/7 = 0.41
use c = 0.5
The lowest value is 22.8; therefore, 22.5 may be the
lower limit of the 1st class. 22.5 + 0.5 = 23.0 is the lower limit of
the second class.
15
Table 1.2 Frequency Distribution of Gasoline
Consumption
Gasoline Consumption (miles/gallon)   Tally   No. of cars (frequency, f)   Relative Frequency
22.5 - 22.9 II 2 0.050
23.0 – 23.4 III 3 0.075
23.5 – 23.9 IIII–II 7 0.175
24.0 – 24.4 IIII–III 8 0.200
24.5 – 24.9 IIII–IIII–IIII 14 0.350
25.0 – 25.4 IIII 5 0.125
25.5 – 25.9 I 1 0.025
sum 40 1.000
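The class frequencies in Table 1.2 can be reproduced with a short sketch. It assumes the data are recorded to one decimal place, so the counting is done in integer tenths to avoid floating-point round-off at the class limits; the function name `grouped_frequencies` is hypothetical.

```python
# Gasoline consumption (miles per gallon) of the 40 cars in Ex. 1.2.
mpg = [
    24.5, 23.6, 24.1, 25.0, 22.9, 24.7, 23.8, 25.2, 23.7, 24.4,
    24.7, 23.9, 25.1, 24.6, 23.3, 24.3, 24.6, 23.9, 24.1, 24.4,
    24.5, 25.7, 23.6, 24.0, 23.9, 24.2, 24.7, 24.9, 25.0, 24.8,
    24.5, 23.4, 24.9, 24.8, 24.7, 24.1, 22.8, 23.1, 25.3, 24.6,
]

def grouped_frequencies(data, first_lower_limit, width, n_classes):
    """Count the observations in each class of equal width.

    Values are converted to integer tenths (assumes one-decimal data).
    """
    start = round(first_lower_limit * 10)
    step = round(width * 10)
    counts = [0] * n_classes
    for x in data:
        index = (round(x * 10) - start) // step
        counts[index] += 1
    return counts

counts = grouped_frequencies(mpg, 22.5, 0.5, 7)
relative = [f / len(mpg) for f in counts]
print(counts)
print(relative)
```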
16
Ex. 1.3 The following are the observed
compressive strength in kg/cm² of 50 samples of
fiber roof tiles. Prepare the frequency distribution
table.
136 92 115 118 121 137 132 120 104 125
119 115 101 129 87 108 110 133 135 126
127 103 110 126 118 82 104 134 120 95
146 126 119 119 105 132 126 118 100 113
106 125 117 102 146 129 124 113 95 148
Solution: R = 148 – 82 = 66
m = 1 + 3.3 log 50 = 6.6, say 7 classes
c = 66/7 = 9.4
use c = 10
17
Table 1.3 Frequency Distribution of
Compressive Strength of Fiber Roof Tiles
Compressive Strength (kg/cm²)   Tally   No. of roof tiles (frequency, f)   Relative Frequency
80 – 89 II 2 0.04
90 – 99 III 3 0.06
100 – 109 IIII–IIII 9 0.18
110 – 119 IIII–IIII–III 13 0.26
120 – 129 IIII–IIII–III 13 0.26
130 – 139 IIII–II 7 0.14
140 - 149 III 3 0.06
Σ 50 1.00
18
Class limits, class boundaries
and class marks for frequency
distribution
19
Table 1.4 Class limits, class boundaries and
class marks for frequency distribution
presented in Table 1.2
Classes Class Boundaries Class Marks
22.5 – 22.9 22.45 – 22.95 22.7
23.0 – 23.4 22.95 – 23.45 23.2
23.5 – 23.9 23.45 – 23.95 23.7
24.0 – 24.4 23.95 – 24.45 24.2
24.5 – 24.9 24.45 – 24.95 24.7
25.0 – 25.4 24.95 – 25.45 25.2
25.5 – 25.9 25.45 – 25.95 25.7
20
Table 1.5 Class limits, class boundaries and
class marks for frequency distribution
presented in Table 1.3
Classes Class Boundaries Class Marks
80 – 89 79.5 – 89.5 84.5
90 – 99 89.5 – 99.5 94.5
100 – 109 99.5 – 109.5 104.5
110 – 119 109.5 – 119.5 114.5
120 – 129 119.5 – 129.5 124.5
130 – 139 129.5 – 139.5 134.5
140 - 149 139.5 – 149.5 144.5
21
CUMULATIVE FREQUENCY
DISTRIBUTIONS
The cumulative frequency is the total
frequency of all values either “less than” or
“more than” any class boundary.
22
Table 1.6 Cumulative Frequency Distribution
of Gasoline Consumption
Gasoline “less than” Gasoline “more than”
Consumption Cumulative Consumption Cumulative
mi/gal Frequency mi/gal Frequency
Less than 22.45 0 More than 22.45 40
Less than 22.95 2 More than 22.95 38
Less than 23.45 5 More than 23.45 35
Less than 23.95 12 More than 23.95 28
Less than 24.45 20 More than 24.45 20
Less than 24.95 34 More than 24.95 6
Less than 25.45 39 More than 25.45 1
Less than 25.95 40 More than 25.95 0
23
Table 1.7 Cumulative Frequency Distribution
of Compressive Strength of Fiber Roof Tiles
Compressive “less than” Compressive “more than”
Strength Cumulative Strength Cumulative
Kg/cm2 Frequency Kg/cm2 Frequency
Less than 79.5 0 More than 79.5 50
Less than 89.5 2 More than 89.5 48
Less than 99.5 5 More than 99.5 45
Less than 109.5 14 More than 109.5 36
Less than 119.5 27 More than 119.5 23
Less than 129.5 40 More than 129.5 10
Less than 139.5 47 More than 139.5 3
Less than 149.5 50 More than 149.5 0
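Both cumulative columns follow mechanically from the class frequencies. A minimal sketch for the gasoline data of Table 1.6, with the frequencies and class boundaries copied from Tables 1.2 and 1.4 (the variable names are illustrative):

```python
from itertools import accumulate

freqs = [2, 3, 7, 8, 14, 5, 1]  # class frequencies from Table 1.2
boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45, 25.95]

# "Less than" column: running total of the frequencies below each boundary.
less_than = [0] + list(accumulate(freqs))
# "More than" column: whatever remains above each boundary.
more_than = [sum(freqs) - below for below in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"Less than {b:.2f}: {lt:2d}   More than {b:.2f}: {mt:2d}")
```

The two columns always add up to the total frequency at every boundary, which is a quick consistency check on a hand-built table.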
24
25
HISTOGRAM, FREQUENCY POLYGON AND
OGIVE
A histogram, or frequency histogram, consists of a set of
rectangles having a.) bases equal to the class interval sizes with
centers at the class marks and b.) heights equal to the
corresponding class frequencies. The areas of the rectangles,
therefore, are proportional to the class frequencies.
A frequency polygon is a line graph of class frequency plotted
against the class mark. It can also be obtained by connecting the
midpoints of the tops of the rectangles in the histogram.
A line graph showing the cumulative frequency plotted against any
class boundary is called a cumulative frequency polygon or
ogive.
26
27
STEM AND LEAF DISPLAY
Stem and leaf display is a combined tabular and graphic
display. The following are suggested steps in constructing a
stem and leaf display.
1. Select one or more leading digits for the stem values. The remaining
digits become the leaves.
2. List all the possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem
value.
4. Indicate the units for stems and leaves somewhere in the display.
A display having between 5 and 20 stems is recommended.
28
Ex. 1.4 Construct the stem and leaf
display of the data in Ex. 1.3
STEMS LEAVES
8 2 7
9 2 5 5
10 0 1 2 3 4 4 5 6 8
11 0 0 3 3 5 5 7 8 8 8 9 9 9
12 0 0 1 4 5 5 6 6 6 6 7 9 9
13 2 2 3 4 5 6 7
14 6 6 8
Stem: hundreds and tens digits Leaf: ones digit
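The construction steps can be sketched in Python for the compressive strength data of Ex. 1.3. This is an illustrative sketch for whole-number data only, taking the ones digit as the leaf and the remaining leading digits as the stem; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Compressive strengths (kg/cm²) of the 50 roof tiles in Ex. 1.3.
strengths = [
    136, 92, 115, 118, 121, 137, 132, 120, 104, 125,
    119, 115, 101, 129, 87, 108, 110, 133, 135, 126,
    127, 103, 110, 126, 118, 82, 104, 134, 120, 95,
    146, 126, 119, 119, 105, 132, 126, 118, 100, 113,
    106, 125, 117, 102, 146, 129, 124, 113, 95, 148,
]

def stem_and_leaf(data):
    """Map each stem (leading digits) to its sorted list of leaves (ones digit)."""
    display = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        display[stem].append(leaf)
    return dict(display)

display = stem_and_leaf(strengths)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:2d} | {leaves}")
```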
29
1.3 Numerical Summary
Measures
30
Parameters and Statistics
We will focus on numerical summary measures for
quantitative data to calculate a set of numbers that will
characterize the data set and convey some of its salient
features. The important characteristics of a set of numbers
are its location, particularly the center, and its variability.
Numerical summary measures can be calculated from
either a sample or a population. Any quantitative measure
that describes a characteristic of a population is a
parameter; it is a statistic when it describes a
characteristic of a sample.
31
The following summation notation and rules
will be useful in dealing with summary
measures.
32
Ex. 1.5 The following data represent the time in
seconds for 9 samples to react to a stimulant: 2.5,
3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 3.4. Calculate the
mean.
Solution:
n = 9
$\bar{x} = \frac{\sum x_i}{n} = \frac{28.8}{9} = 3.2$ seconds
33
Ex. 1.6 Find the mean weight of alum in Example 1.1
Solution:
Weight, x_i (kilos)   No. of bags (frequency, f_i)   f_i x_i
1.56   6   9.36
1.58   4   6.32
1.59   5   7.95
1.60   6   9.60
1.62   9   14.58
Σ   30   47.81
$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{47.81}{30} = 1.59$ kilos
34
MEASURES OF CENTRAL
TENDENCY:
MEAN, MEDIAN, MODE
Mean
The arithmetic mean, or
briefly the mean, is the overall
average.
If the data represent the
entire population, the mean of
the values is referred to as the
population mean, μ. If the
data constitute a sample drawn
from a population, the mean is
referred to as the sample mean, x̄.
35
Mean of Grouped Data
Long Method, Short Method, Coding Method
36
Ex. 1.7 Find the mean of gasoline
consumption in Ex. 1.2
Table 1.8
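As a rough check on Ex. 1.7, the grouped mean can be computed in two ways. The sketch below assumes the usual textbook forms, which are not shown in the text above: the long method, mean = Σ(f·x)/n with x the class marks, and the coding method, mean = A + c·Σ(f·u)/n with u = (x − A)/c. The assumed mean A = 24.2 and the variable names are choices made for this illustration only.

```python
marks = [22.7, 23.2, 23.7, 24.2, 24.7, 25.2, 25.7]  # class marks (Table 1.4)
freqs = [2, 3, 7, 8, 14, 5, 1]                       # frequencies (Table 1.2)
n = sum(freqs)

# Long method: weight each class mark by its class frequency.
mean_long = sum(f * x for f, x in zip(freqs, marks)) / n

# Coding method: A (assumed mean) and c (class size) are illustrative choices.
A, c = 24.2, 0.5
codes = [round((x - A) / c) for x in marks]          # u = -3, -2, ..., 3
mean_coded = A + c * sum(f * u for f, u in zip(freqs, codes)) / n

print(round(mean_long, 2), round(mean_coded, 2))
```

Both routes give the same value; the coding method only replaces the class marks by small integers to simplify hand computation.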
37
38
MEDIAN, x̃:
The median of a set of numbers in an array is either the
middle value or the arithmetic mean of the two middle
values.
If n is odd: $\tilde{x} = x_{(n+1)/2}$
If n is even: $\tilde{x} = \frac{x_{n/2} + x_{(n/2)+1}}{2}$
39
Finding the Median (Example 1.8)
Find the median of
Solution: The given set of numbers is odd (9 digits),
thus
Median,
40
Finding the Median (Example 1.9)
For the set of numbers , find the median,
Solution: The given set of numbers is even (8 digits),
thus
41
Finding the Median (Example 1.10)
Find the median of the data in Ex. 1.5. The given data
are the following:
2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 3.4
Solution: Arrange the data in ascending magnitude.
2.3, 2.5, 2.6, 2.9, 3.1, 3.4, 3.6, 4.1, 4.3
Since the number of observations is odd (n = 9), the
median is the fifth value, or $\tilde{x} = x_5 = 3.1$ seconds.
42
MEDIAN OF GROUPED DATA
$\tilde{x} = L_m + \left[\frac{n/2 - (\sum f)_L}{f_m}\right] c$
Where:
Lm = lower class boundary of the median class
(Σf)L = sum of frequencies of all classes lower than the
median class
fm = frequency of the median class
c = size of the median class
43
Finding the Median of Grouped Data
Ex. 1.11 Determine the median of the data in Ex. 1.2 Refer to Table
1.8 (Slide 36)
Solution:
Formula:
44
MODE,
Mode is the value which occurs with greatest frequency in
a dataset. Along with mean and median, mode is a
statistical measure of central tendency in a dataset.
Unlike the other measures of central tendency that are
unique to a particular dataset, there may be several
modes in a dataset.
45
MODE,
No calculations are necessary to find the
mode. Simply do the following:
1. Collect and organize the data from a dataset.
2. Determine all the distinct values in a dataset.
3. Count the frequency of occurrence for each distinct
value.
4. The most frequent value(s) is the mode.
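The four steps above translate directly into a short counting routine. This is a minimal sketch; the function name `modes` is hypothetical, and returning an empty list when every distinct value occurs equally often is a convention chosen here to represent the "no mode" case.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s) of a non-empty dataset.

    An empty list means there is no mode: no value appears more often
    than any other.
    """
    counts = Counter(data)                 # steps 2 and 3: distinct values and frequencies
    highest = max(counts.values())
    if len(counts) > 1 and all(f == highest for f in counts.values()):
        return []
    return sorted(v for v, f in counts.items() if f == highest)  # step 4

values = [2.2, 3.1, 4.1, 4.1, 5.4, 5.4, 5.4,
          6.2, 7.7, 7.7, 8.5, 8.5, 8.5, 9.3]
print(modes(values))
```

The sample list is the set of values used in Ex. 1.14 later in this section, where two values tie for the highest frequency.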
46
Finding the Mode in a given dataset
(Unimodal case)
Ex 1.12. For the set of numbers given, find the mode
Solution:
9 is the most frequently appearing value in the
given dataset. The dataset is unimodal.
47
Finding the Mode in a given dataset.
(No Mode case).
Ex. 1.13. For the given dataset, find the mode;
Solution:
In the given dataset there is no mode because no
value appears more than any other. In other words, each
value in the dataset occurs only once.
48
Finding the Mode in a given dataset.
(A Bimodal case)
Ex. 1.14 For the given set of values, find the mode.
2.2 3.1 4.1 4.1 5.4 5.4 5.4
6.2 7.7 7.7 8.5 8.5 8.5 9.3
This is a case where the dataset has two modes, 5.4 and 8.5,
each occurring three times; it is therefore called bimodal.
49
MODE OF GROUPED DATA
Mode = $L_{mo} + \left[\frac{d_1}{d_1 + d_2}\right] c$
Where:
Lmo = lower class boundary of the modal class
d1 = excess of the modal frequency over the frequency of the next lower
class
d2 = excess of the modal frequency over the frequency of the next higher class
c = size of the modal class interval
50
Finding the Mode of Grouped Data
Ex. 1.15 Find the mode of the data in Ex. 1.2.
Refer to Table 1.8
Solution:
d1 = 14 – 8 = 6; d2 = 14 – 5 = 9
Mode = 24.45 + [6/(6 + 9)](0.5) = 24.65 miles per gallon
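The same computation can be sketched as a small function. It assumes the grouped-mode formula Mode = Lmo + [d1/(d1 + d2)]·c, with d1 and d2 the excess of the modal frequency over the neighbouring class frequencies; treating a missing neighbour as frequency 0 is an extra assumption of this sketch, and the function name is hypothetical.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of grouped data: L_mo + d1 / (d1 + d2) * c."""
    i = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    below = freqs[i - 1] if i > 0 else 0                # next lower class
    above = freqs[i + 1] if i + 1 < len(freqs) else 0   # next higher class
    d1 = freqs[i] - below
    d2 = freqs[i] - above
    return lower_boundaries[i] + d1 / (d1 + d2) * width

lower_boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]
mode = grouped_mode(lower_boundaries, freqs, 0.5)
print(round(mode, 2))
```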
51
OTHER MEASURES OF LOCATION:
DECILES, QUARTILES, PERCENTILES
DECILES
Are values that divide the data into ten equal parts. D1, D2, …, D9 are
values such that one tenth or 10% of the data falls below D1, 20%
falls below D2, …, 90% falls below D9.
If D1 is the first decile, then $D_1 = L_1 + \left[\frac{n/10 - (\sum f)_L}{f_1}\right] c$
Where:
L1 = lower class boundary of the first decile class
(Σf)L = sum of frequencies of all classes lower than the first decile class
f1 = frequency of the first decile class
c = size of the first decile class interval
52
QUARTILES
Are values that divide the data into 4 equal parts. Q1,
Q2 and Q3 are values such that one fourth or 25% of
the data falls below Q1, 50% falls below Q2 and 75%
falls below Q3.
Are found in the same way as deciles. If Q1 is the first
quartile, then $Q_1 = L_1 + \left[\frac{n/4 - (\sum f)_L}{f_1}\right] c$,
where L1, (Σf)L, f1 and c now refer to the first quartile class.
53
PERCENTILES
Arevalues that decide the data into 100 equal parts. P1,
P2 and P99 are values such that 1% of the data falls
below P1, 2% falls below P2 and 99% falls below Q99.
Are found in the same way as deciles and quartiles.
54
Ex. 1.16 Use the frequency distribution of Table 1.2
to find the D1, D3, Q1, Q3, P1, P30 and P90 for the
distribution of gasoline consumption of 40 cars.
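Deciles, quartiles, percentiles and the median of grouped data all follow one pattern: locate the class in which the cumulative frequency first reaches the required fraction of n, then interpolate inside that class. The sketch below is one possible way to organize Ex. 1.16; the function name `grouped_quantile` and its parameter p (the fraction of the data below the value) are assumptions of this illustration.

```python
def grouped_quantile(lower_boundaries, freqs, width, p):
    """Value below which a fraction p of grouped data falls.

    Uses L + ((p*n - cumulative frequency below the class) / f) * c,
    where the class is the first one whose cumulative frequency reaches p*n.
    """
    n = sum(freqs)
    target = p * n
    below = 0
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("p must be between 0 and 1")

lower_boundaries = [22.45, 22.95, 23.45, 23.95, 24.45, 24.95, 25.45]
freqs = [2, 3, 7, 8, 14, 5, 1]
wanted = [("D1", 0.10), ("D3", 0.30), ("Q1", 0.25), ("Median", 0.50),
          ("Q3", 0.75), ("P1", 0.01), ("P30", 0.30), ("P90", 0.90)]
results = {name: grouped_quantile(lower_boundaries, freqs, 0.5, p)
           for name, p in wanted}
for name, value in results.items():
    print(f"{name} = {value:.2f} miles per gallon")
```

D3 and P30 coincide because both mark the point below which 30% of the data fall.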
55
FREQUENCY CURVES AND SMOOTHED
OGIVES
In the population, so many observations are available
that it is theoretically possible (for continuous data) to
choose very small class intervals. Thus, the frequency
polygon for a large population has so many small broken
line segments that it approximates a curve, which we
call a frequency curve.
By smoothing the frequency polygon of the sample,
theoretical curves of the population can be
approximated.
56
FREQUENCY CURVES AND SMOOTHED
OGIVES
In similar manner, smoothed ogives are obtained by
smoothing the cumulative frequency polygon.
57
TYPES OF FREQUENCY CURVES
1. Symmetrical or bell-shaped – when observations equidistant
from the central maximum have the same frequency.
2. Moderately asymmetrical or skewed – when the tail
of the curve to one side of the central maximum is
longer than the other.
a. Skewed to the right – or to have positive skewness, if
the longer tail occurs to the right of the central
maximum.
b. Skewed to the left – or to have negative skewness, if
the longer tail occurs to the left of the central
maximum.
58
TYPES OF FREQUENCY CURVES
3. J-shaped or reverse J-shaped – the maximum occurs at
one end.
4. U-shaped – has maxima at both ends
5. Bimodal – has two maxima
6. Multimodal – has more than two maxima
59
Measures of Variability
• Spread
• Scatter
• Dispersion
60
IMPORTANCE OF MEASURING VARIABILITY
The importance of variability of values in data collected should
be acknowledged. For instance, a company manufacturing car
batteries will be interested not only in the average life of the
batteries but also in how consistent the performance of the
batteries is. A manufacturer interested in marketing batteries
with a mean life of 3 years will not be satisfied, even if this goal
is realized, if there is a very high percentage of batteries that
last only up to 1 year. That is, the success of the batteries
will depend on the variability of life: the smaller the variability,
the better.
61
IMPORTANCE OF MEASURING
VARIABILITY
Another instance is evaluating the performance of two students, A
and B. Let us consider the scores of these two students in a
mathematics subject.
Student A: 56, 58, 60, 60, 60, 62, 64
For Student A, x̄A = 60 (mean); x̃A = 60 (median); mode = 60
Student B: 30, 35, 60, 60, 60, 85, 90
For Student B, x̄B = 60 (mean); x̃B = 60 (median); mode = 60
On the basis of these measures alone, it is incorrect to infer that the
performances of the two students are identical. If we consider
the variability of scores, Student A is more consistent in his
performance.
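A quick numerical check makes the contrast concrete. The sketch below uses Python's standard `statistics` module on the two score lists; the range and the sample standard deviation (both discussed later in this section) are computed only to illustrate the point and are not part of the original slide.

```python
import statistics

scores = {
    "A": [56, 58, 60, 60, 60, 62, 64],
    "B": [30, 35, 60, 60, 60, 85, 90],
}

summary = {}
for student, xs in scores.items():
    summary[student] = {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "range": max(xs) - min(xs),         # highest score minus lowest score
        "stdev": statistics.stdev(xs),      # sample standard deviation
    }

for student, s in summary.items():
    print(student, s)
```

The three measures of central tendency agree for both students, while the range and the standard deviation are much larger for Student B.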
62
Three Measures of
Variability
• RANGE
• VARIANCE
• STANDARD DEVIATION
63
RANGE
Is the simplest measure of variability. It is the least
satisfactory because it provides no information at all
about the data between the highest and the lowest
values.
Illustration of Range (Source: https://www.cuemath.com/data/range-in-statistics/)
64
Variance
To overcome the disadvantage of the range, the
variability of the data is measured by the extent to
which individual observations are spread about the
mean, since the mean is by far the most important measure of
central tendency. The deviations of individual
observations are taken as deviations from the mean.
65
Variance
Variance is a measure that considers the position
of each observation relative to the mean. It is
defined as the average of the squares of all the deviations from the mean.
66
Population Variance
Versus
Sample Variance
67
Population Variance, σ²
Population variance (σ²) tells how data points in a
specific population are spread out.
Symbol used is σ² (sigma squared)
The formula is $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Where μ (mu) is the population mean
N is the population size
68
Sample Variance, s²
The sample variance (s²) is used to calculate how
varied a sample is. A sample is taken from a
population.
Symbol used is s²
The formula is $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$; where x̄ is the sample mean
With samples, n − 1 is used because using n (n is the sample size) would give a biased
estimate that consistently underestimates variability. The sample variance would
tend to be lower than the real variance of the population.
69
Sample Variance
The sample variance is a statistic. A statistic that
estimates the true parameter on the average is said
to be unbiased. As mentioned earlier, dividing by n
will underestimate the population variance on the
average. So, to compensate for the bias in estimating σ²,
we use n − 1 in the divisor. Another formula for the sample
variance (which is often called the short cut formula)
is $s^2 = \frac{n\sum x_i^2 - (\sum x_i)^2}{n(n-1)}$
70
Derivation of the sample variance
short cut formula
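From the formula $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$;
expanding the numerator yields $(x_i - \bar{x})^2 = x_i^2 - 2\bar{x}x_i + \bar{x}^2$
Thus, $s^2 = \frac{\sum (x_i^2 - 2\bar{x}x_i + \bar{x}^2)}{n-1}$
Distributing the summation symbol then yields $s^2 = \frac{\sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2}{n-1}$; finally,
since $\sum x_i = n\bar{x}$ and $\bar{x} = \frac{\sum x_i}{n}$,
then $s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{n\sum x_i^2 - (\sum x_i)^2}{n(n-1)}$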
71
STANDARD DEVIATION
Standard deviation is the positive square root of the
variance.
The population standard deviation, $\sigma = \sqrt{\sigma^2}$
The sample standard deviation, $s = \sqrt{s^2}$
72
VARIANCE OF GROUPED DATA
Where,
73
Example 1.17
The following readings were obtained for the tensile
strength (in kg/cm) of six specimens of an alloy:
2.58, 2.65, 2.40, 2.46, 2.44, 2.41
Find the mean tensile strength and the standard
deviation of the tensile strengths.
74
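Solution to Example 1.17
x_i   x_i²
2.58   6.6564
2.65   7.0225
2.40   5.7600
2.46   6.0516
2.44   5.9536
2.41   5.8081
Σ   14.94   37.2522
Solving for x̄: $\bar{x} = \frac{\sum x_i}{n} = \frac{14.94}{6} = 2.49$
Solving for s²: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{0.0516}{5} = 0.01032$
The sample standard deviation
is $s = \sqrt{0.01032} \approx 0.1016$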
75
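Solution to Example 1.17 (Short Cut
Formula)
x_i   x_i²
2.58   6.6564
2.65   7.0225
2.40   5.7600
2.46   6.0516
2.44   5.9536
2.41   5.8081
Σ   14.94   37.2522
Therefore,
$s^2 = \frac{n\sum x_i^2 - (\sum x_i)^2}{n(n-1)} = \frac{6(37.2522) - (14.94)^2}{6(5)} = \frac{0.3096}{30} = 0.01032$
$s = \sqrt{0.01032} \approx 0.1016$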
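The two routes to the sample variance can be compared numerically. This is a minimal sketch using the six readings tabulated above; the shortcut form used, s² = [nΣx² − (Σx)²]/[n(n − 1)], is the standard one and is assumed to be the form intended on the slide.

```python
import math
import statistics

x = [2.58, 2.65, 2.40, 2.46, 2.44, 2.41]   # tensile strength readings
n = len(x)
mean = sum(x) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((xi - mean) ** 2 for xi in x) / (n - 1)

# Shortcut formula: needs only the sum of x and the sum of x squared.
var_shortcut = (n * sum(xi ** 2 for xi in x) - sum(x) ** 2) / (n * (n - 1))

s = math.sqrt(var_definition)
print(round(mean, 2), round(var_definition, 5), round(var_shortcut, 5), round(s, 4))
```

The standard library function `statistics.stdev(x)` also divides by n − 1 and can serve as an independent check on s.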
76
Ex. 1.18 Determine the standard deviation
of gasoline consumption in Ex. 1.2
77
78
[Tables: Frequency Distribution Table; Cumulative Frequency Distribution Table, with “less than” cumulative frequency columns]
79
Chapter 2
PROBABILITY
80
PROBABILITY
Refers to the study of randomness and
uncertainty. The theory of probability provides
methods for quantifying the chances, or likelihood,
associated with various outcomes.
81
2.1 Sample Spaces and Events
An experiment is any action or process that generates data. The set
of all possible outcomes of an experiment is the sample space, S. Each
outcome in a sample space is called an element or a sample point.
82
Method of Describing a Sample Space.
(The Roster Method)
1. If the sample space has a finite number of sample points,
we may list the elements separated by commas and enclosed
in brackets. Example,
S = {A, B}
A list (or roster method) is defined
as a way to show the elements of a
set by listing the elements inside of
brackets.
A and B are the elements or the sample
points.
83
Methods of Describing a Sample
Space. (The Statement or Rule
Method)
2. If the sample space has a large or infinite number of sample points,
describe the set by a statement or rule. Example,
Reading: “the set of all such that is a natural number and is
less than ”
84
Example 2.1 (Roster Method of
Describing a Sample Space)
1. An experiment consists of examining a bulb to determine
whether it is defective. Using D for defective and N for not
defective, the sample space for this experiment is S = {D, N}.
2. Another such experiment is tossing a single coin. The set of
all possible outcomes is S = {H, T}.
3. If we are interested in the number that shows up when a die
is tossed, the sample space would be S = {1, 2, 3, 4, 5, 6}.
85
Example 2.2 (The Rule Method of
Describing a Sample Space)
1. If the possible outcomes of an experiment are the set of
universities in the Philippines with a population over 5,000,
the sample space is written
S = {x | x is a university in the Philippines with a population over 5,000}.
2. Similarly, if S is the set of all points (x, y) in the first quadrant
inside a circle of radius 3 centered at the origin, the sample
space is
S = {(x, y) | x² + y² < 9, x > 0, y > 0}.
86
TREE DIAGRAM
In some experiments, it will
be helpful to list the elements of the sample space
systematically by means of a tree diagram.
A tree diagram is a tool used in
probability and statistics to
calculate the number of possible
outcomes of an event, as well as
list those possible outcomes in an
organized manner.
[Figure: tree diagram; the branches, labeled with the probabilities x and y, lead to the probable outcomes.]
87
Example of a probability problem
involving flipping a coin
Consider the experiment
of flipping a coin three
times. Construct the tree
diagram.
[Figure: The two faces of a coin (head and tail). Source:
https://mathalino.com/reviewer/algebra/probability-problems-involving-coins]
88
Solution to the example of a probability
problem involving flipping a coin
The branches of the tree (Figure 2.1)
give the distinct sample points.
Starting at the top and following
the paths of the branches,
the sample space S has the outcomes
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Figure 2.1 Tree Diagram for Tossing a Coin 3 Times
Source:
https://www.onlinemathlearning.com/tree-diagram.html
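Following every path of the tree is the same as forming all ordered triples of H and T. A minimal sketch using `itertools.product` from the Python standard library (the variable name `sample_space` is illustrative):

```python
from itertools import product

# Each toss has two branches, H or T; three tosses give 2 * 2 * 2 paths.
sample_space = ["".join(path) for path in product("HT", repeat=3)]

print(sample_space)
print(len(sample_space))
```

The outcomes come out in the same top-to-bottom order in which the branches of a tree diagram are usually read.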
89
EVENTS
- In the study of probability, we will be interested in any
collection of outcomes in S rather than the individual
outcomes of S. Any collection (subset) of outcomes in
the sample space S is called an event.
90
EVENT
For instance, in tossing a
die, A = {1, 3, 5} is the
event that an odd number
shows up. B = {3, 4, 5, 6} is
the event that a number
greater than two shows
up.
[Figure: Six faces of a die. Source:
https://blogs.adelaide.edu.au/maths-learning/2016/05/24/spotless-dice/]
91
Some Relations from
Set Theory
92
A ∪ B
is the event “A or B will happen”. This connotes the occurrence of
A, of B, or the simultaneous occurrence of A and B. For
instance, if
and
then,
93
A ∩ B
is the event “A and B will happen”. This connotes the simultaneous occurrence of
A and B.
If A ∩ B = ∅, A and B are mutually exclusive events.
Mutually exclusive events are events that cannot occur
simultaneously.
94
𝑨 ∩ 𝑩
Examples are
nd
Then,
Also, nd
Then,
95
A′
A′, the complement of A, is the event “A will not happen”. A′ is the set
of all outcomes in S that are not in A. Some illustrations of
complement are as follows:
If, then,
Also, if then,
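Python's built-in `set` type mirrors these relations directly. The sketch below reuses the die-tossing events given earlier in this chapter, A = {1, 3, 5} and B = {3, 4, 5, 6}; these are not necessarily the sets used in the illustrations on these slides.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for tossing a die
A = {1, 3, 5}            # an odd number shows up
B = {3, 4, 5, 6}         # a number greater than two shows up

union = A | B                         # A ∪ B: A or B happens
intersection = A & B                  # A ∩ B: A and B both happen
complement_A = S - A                  # A′: outcomes in S that are not in A
mutually_exclusive = A.isdisjoint(B)  # True only when A ∩ B is empty

print(union, intersection, complement_A, mutually_exclusive)
```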
96
2.2 Counting Techniques
If the number of possible outcomes in an
experiment is quite large, the effort of constructing
the list of outcomes is prohibitive. By exploiting
some counting rules, it is possible to determine the
number of outcomes without listing.
97
Fundamental Principle: (Multiplication
Rule)
If an operation can be performed in n1 ways, and
if for each of these a second operation can be performed
in n2 ways, and for each of the first two a third operation can
be performed in n3 ways, and so forth, then the sequence of
operations can be performed in n1·n2·n3··· ways.
98
Permutation
99
Permutation
A permutation is an arrangement of all or part of a group of
objects or elements. Order is an important aspect of permutation.
This means that objects are arranged in a definite order.
For example, the permutation of the set two numbers is , such as
,
100
Representation and Types of
Permutation
Permutation can be
represented in many
ways; these are as
follows:
nPk
101
Representation and Types of
Permutation
1. The number of permutations of n distinct objects
taken n at a time is nPn = n!
102
Representation and Types of
Permutation
Example: In how many ways can five different door
prizes be distributed among five people?
P = n! = 5·4·3·2·1 = 120 ways
103
Representation and Types of
Permutation
The following are the standard truths about
only exists when and does not exist for .
104
Representation and Types of
Permutation
2. The number of permutations of n distinct objects
taken r at a time is nPr = n!/(n − r)!
105
Representation and Types of
Permutation
Example: 8 students competed in a quiz show. In how
many different ways can the 1st, 2nd, and 3rd prizes be
awarded?
P = 8P3 = 8!/(8 − 3)! = 8·7·6 = 336 ways
106
Representation and Types of
Permutation
3. The number of permutations of n objects of which n1
are identical, n2 are identical, …, nk are identical is
n!/(n1!·n2!···nk!)
107
Representation and Types of
Permutation
Example: How many different linear arrangements
of the letters of the word PHILIPPINES are there?
P = 3, H = 1, I = 3, L = 1, N = 1, E = 1, S = 1; n = 11
P = 11!/(3!·1!·3!·1!·1!·1!·1!) = 1,108,800 ways
108
Representation and Types of
Permutation
4. The number of permutations of n distinct objects
arranged in a circle is (n − 1)!
109
Representation and Types of
Permutation
Example: In how many ways can 6 people be seated
at a round table?
P = (6 − 1)! = 5! = 5·4·3·2·1 = 120 ways
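The four types of permutation can be checked with Python's `math` module (`math.perm` requires Python 3.8 or later). The helper names `multiset_permutations` and `circular_permutations` are hypothetical and exist only in this sketch.

```python
import math
from collections import Counter

def multiset_permutations(word):
    """n! / (n1! * n2! * ... * nk!) for a word with repeated letters."""
    total = math.factorial(len(word))
    for count in Counter(word).values():
        total //= math.factorial(count)
    return total

def circular_permutations(n):
    """(n - 1)! arrangements of n distinct objects in a circle."""
    return math.factorial(n - 1)

door_prizes = math.factorial(5)                    # n objects taken n at a time
quiz_prizes = math.perm(8, 3)                      # 8 students, 3 ranked prizes
philippines = multiset_permutations("PHILIPPINES")
round_table = circular_permutations(6)

print(door_prizes, quiz_prizes, philippines, round_table)
```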
110
Combinations
111
Combination
A combination is a selection of r
objects from n without regard to order.
The number of combinations of n objects taken r at a
time has the formula
nCr = n!/(r!(n − r)!)
112
Example 2.3
A man wishes to travel from Town A to Town D. There are
two roads connecting towns A and B, three roads
connecting towns B and C and four roads connecting town
C to town D. In how many ways can the man travel from
town A to D?
Solution:
N = n1·n2·n3 = 2(3)(4) = 24 ways
113
Example 2.4
From the digits 1, 2, 5, 6, and 9
a. How many distinct three digit numbers can be formed?
b. How many of these are even?
Solution
a) n1= number of choices for the ones place value = 5 digits as choices
n2 = number of choices for the tens place value = 4 digits
n3 = number of choices for the hundreds place value = 3 digits
n = 5(4)(3) = 60 distinct three digit numbers.
114
Example 2.4
Solution
b) n1 = number of choices for the ones place value = 2 digits (2 or 6)
n2 = number of choices for the tens place value = 4 digits
n3 = number of choices for the hundreds place value = 3
digits
n = 2(4)(3) = 24 even three-digit numbers.
115
Example 2.4
Alternative Solution for question a.)
Since the digits should be distinct, there are 5 digits to be
arranged 3 at a time; the number of such arrangements is
n = 5P3 = 5!/(5 − 3)! = 5!/2! = 60
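Because the numbers involved are small, both counts in Example 2.4 can be confirmed by listing every arrangement. A brute-force sketch using `itertools.permutations`, assuming that no digit may be repeated within a number:

```python
from itertools import permutations

digits = [1, 2, 5, 6, 9]

# Every ordered choice of 3 different digits forms one three-digit number.
numbers = [100 * h + 10 * t + o for h, t, o in permutations(digits, 3)]
even_numbers = [x for x in numbers if x % 2 == 0]   # ones digit is 2 or 6

print(len(numbers), len(even_numbers))
```

Enumeration like this is only practical for small cases; the counting rules give the same totals without listing anything.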
116
Example 2.5
How many numbers can be formed using all the digits 1, 2, 3, and 4?
Solution:
To form different numbers, arrange all 4 digits; the
number of arrangements is the number of numbers formed.
P = 4! = 24 numbers
117
Example 2.6
How many distinct permutations are there of the letters in the word
MILLENNIUM?
Solution:
There are 2 M’s, 2 L’s, 2 I’s, 2 N’s
P = 10!/(2!·2!·2!·2!) = 226,800
118
Example 2.7
a) In how many ways can 4 letters a, b, c, and d be
arranged in a circle?
b) How many arrangements are there if a and b must
always be together?
119
Example 2.7
Solution
a. P = (4 − 1)! = 3! = 6 ways
b. So that a and b are always together, treat them as one unit and arrange only three
units in a circle; thus
n1 = (3 − 1)! = 2! = 2
n2 = 2! = 2 = number of ways the two letters a and b can
be arranged
n = n1·n2 = 2(2) = 4 arrangements
120
Example 2.8
From a group of 4 men and 5 women, how many
committees of size 3 are possible,
a) with no restrictions
b) with 1 man and 2 women
c) with 2 men and 1 woman if a certain man must
be on
the committee
121
Example 2.8
Solution
a) n = number of committees = number of ways of selecting 3 from 9
n = 9C3 = 9!/(3!(9 − 3)!) = 9!/(3!6!) = 84 committees
b) n = number of committees with 1 man & 2 women
n1 = no. of ways of selecting 1 man from 4 men
n2 = no. of ways of selecting 2 women from 5 women
n = 4C1 · 5C2 = 4!/(1!3!) · 5!/(2!3!) = 4(10) = 40 committees
c) n = no. of committees with 2 men and 1 woman with a certain man on the
committee
n1 = no. of ways of selecting 1 man from the remaining 3 men
n2 = no. of ways of selecting 1 woman from 5 women
n = 3C1 · 5C1 = 3!/(1!2!) · 5!/(1!4!) = 3(5) = 15 committees
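The three committee counts can be reproduced with `math.comb` (available in Python 3.8 and later), which evaluates nCr directly. The variable names below are illustrative.

```python
import math

# a) any 3 people out of the 9 available
no_restriction = math.comb(9, 3)

# b) 1 of the 4 men together with 2 of the 5 women (multiplication rule)
one_man_two_women = math.comb(4, 1) * math.comb(5, 2)

# c) a certain man is already seated, so pick 1 more man from the
#    remaining 3 and 1 woman from the 5
with_certain_man = math.comb(3, 1) * math.comb(5, 1)

print(no_restriction, one_man_two_women, with_certain_man)
```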
122
2.3 Probability of an Event
The objective of probability is to assign to each event A a
number P(A), called the probability of the event A, which will give a
precise measure of the chance that A will happen.
When all outcomes are equally likely, the probability of an event A is the ratio of the number of outcomes
favorable to A to the total number of outcomes. Thus, P(A) = n(A)/n(S)
where n(A) = number of outcomes favorable to A
n(S) = total number of outcomes
= number of outcomes in the sample space
123
Properties of Probability
1. Positiveness
0 ≤ P(A) ≤ 1
2. Certainty
P(S) = 1, the probability of a sure event
124
Example 2.9
A coin is tossed 3 times. Find the probability of the
following events:
a. Exactly two heads appear.
b. At least two heads appear.
125
Example 2.9
Solution
The sample space for this experiment is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
a. A= The event that exactly two heads appear
A = {HHT, HTH, THH} n(A) = 3 n(S) = 8
P(A) = 3/8
b. B = The event that at least two heads appear
B = {HHH, HHT, HTH, THH} n(B) =4
P(B) = 4/8 = 1/2
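The ratio P(A) = n(A)/n(S) can be evaluated by counting outcomes directly. The sketch below rebuilds the sample space and uses `fractions.Fraction` so that the results stay exact; the helper name `probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a coin (8 equally likely outcomes).
S = ["".join(t) for t in product("HT", repeat=3)]

def probability(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = [s for s in S if s.count("H") == 2]   # exactly two heads
B = [s for s in S if s.count("H") >= 2]   # at least two heads

print(probability(A), probability(B))
```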
126
Example 2.10
In a poker hand consisting of 5 cards, find the
probability of holding
a. 3 aces
b. 4 hearts and 1 diamond
127
Example 2.10
Solution
a. A = the event of having 3 aces and two of any kind other than aces
4C3 = number of ways of having 3 aces
48C2 = number of ways of having 2 of any other kind
n(A) = 4C3 · 48C2 = 4(1128) = 4512
n(S) = 52C5 = 2598960
P(A) = 4512/2598960 = 94/54,145
b. B = the event of holding 4 hearts and 1 diamond
n(B) = 13C4 · 13C1 = 715(13) = 9295
P(B) = 9295/2598960 = 143/39,984
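The counts and reduced fractions in Example 2.10 can be verified with `math.comb` and `fractions.Fraction`, which reduces each ratio to lowest terms automatically. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n_S = comb(52, 5)                    # all possible 5-card hands

# a) 3 of the 4 aces and 2 of the 48 non-ace cards
n_A = comb(4, 3) * comb(48, 2)
p_A = Fraction(n_A, n_S)

# b) 4 of the 13 hearts and 1 of the 13 diamonds
n_B = comb(13, 4) * comb(13, 1)
p_B = Fraction(n_B, n_S)

print(n_A, n_B, n_S)
print(p_A, p_B)
```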
128
Additive Rules
1. If A & B are any two events, then
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
2. If A & B are mutually exclusive events, then
P(A ∪ B) = P(A) + P(B)
3. If A and A’ are complementary events, then
P(A) + P(A’) = 1
Also, P(A ∪ B) – P(A) = P(B) – P(A ∩ B) = P(B ∩ A’)
Example 2.11
In a class of 100 students, 69 are taking algebra, 54 are
taking trigonometry and 35 are taking both algebra and
trigonometry. If one of these students is selected at random,
find the probability that
a. The student is taking algebra or trigonometry.
b. The student did not take either of these subjects.
c. The student is taking algebra but not trigonometry
Solution
Let A = the event that the student takes algebra
B = the event that the student takes trigonometry
A ∩ B = the event that the student takes both algebra &
trigonometry
P(A) = 69/100 P(B) = 54/100
P(A ∩ B ) = 35/100
a. P(student takes algebra or trigonometry) = P(A or B) = P(A ∪ B)
P(A ∪ B) = P(A) + P(B) – P(A ∩ B) = 69/100 + 54/100 - 35/100
= 88/100
P(student takes algebra or trigonometry) = 22/25
b. P(student did not take either subject) = P(not “algebra or
trigonometry”) = P((A ∪ B)’)
P((A ∪ B)’) = 1 – P(A ∪ B) = 1 – 22/25
P(student did not take either subject) = 3/25
c. P(student takes algebra but not trigonometry) = P(A and B’) = P(A ∩
B’) = P(A) – P(A ∩ B) = 69/100 – 35/100 = 34/100
Conditional Probability
The conditional probability of A, given that B
has occurred, is defined by
P(A/B) = P(A ∩ B)/P(B), provided P(B) > 0
Example 2.12
The probability that an automobile being filled with
gasoline will also need an oil change is 0.25; the probability that
it needs new oil filter is 0.40 and the probability that both oil
and filter need changing is 0.14.
a. If the oil had to be changed, what is the probability that a
new oil filter is needed?
b. If a new oil filter is needed, what is the probability that the oil
has to be changed?
Solution
Let A = need oil change
B = need new oil filter
P (A) = 0.25 P(B) = 0.40
P(A ∩ B) = 0.14
a. P(need new oil filter/ need oil change) = P(B/A)
P(B/A) = P(A ∩ B)/P(A) = 0.14/0.25 = 0.56
b. P(need oil change/ need oil filter) = P(A/B)
P(A/B) = P(A ∩ B)/P(B) = 0.14/0.40 = 0.35
Multiplicative Rule
If the events A and B can both occur, then
P ( A ∩ B ) = P(A/B) · P(B)
Since P(A ∩ B) = P(B ∩ A), then
P(A ∩ B ) = P(B/A) · P(A)
Independent Events
Two events A & B are independent if and only if
P(A/B) = P(A) and P(B/A) = P(B)
So that
P(A∩B) = P(A) · P(B)
Example 2.13
A petroleum company exploring for oil has decided to
drill two wells, one after the other. The probability of
striking oil in the first well is 0.2. Given that the first
attempt is successful, the probability of striking oil in
the second attempt is 0.8. What is the probability of
striking oil in both wells?
Solution
Let W1 = the event of striking oil in the 1st well
W2 = the event of striking oil in the 2nd well
P(striking oil in both wells) = P(W1 and W2)
P(W1 ∩ W2) =
P(W1) · P(W2/W1) = 0.2(0.8)
P(striking oil in both wells) = 0.16
Example 2.14
A town has two fire engines operating independently.
The probability that a specific engine is available
when needed is 0.96.
a. What is the probability that neither is available
when needed?
b. What is the probability that a fire engine is
available when needed?
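Example 2.14 follows from the product rule for independent events and the complement rule. The sketch below assumes that part (b) asks for the probability that at least one of the two engines is available; the variable names are illustrative.

```python
# Each engine is available with probability 0.96, independently of the other.
p_available = 0.96
p_unavailable = 1 - p_available

# a. Neither engine is available: product rule for independent events.
p_neither = p_unavailable ** 2

# b. At least one engine is available: complement of "neither is available".
p_at_least_one = 1 - p_neither

print(round(p_neither, 4), round(p_at_least_one, 4))
```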
Example 2.15
Suppose current flows through
switches A and B to a radio and back to the
battery as shown in Fig. 2.2. The probability that
switch A is closed is 0.8, and given that switch A
is closed, the probability that switch B is closed is
0.7. Find the probability that the radio is playing.
Example 2.16
Suppose current flows through switches A
and B to a radio and back to the battery as shown in
fig. 2.3. Suppose the probability that switch A is closed
is 0.8, the probability that switch B is closed is 0.6, and
given that switch A is closed, the probability that switch
B is closed is 0.7. Find the probability that the radio is
playing.
Chapter 3
PROBABILITY
DISTRIBUTIONS
3.1 Random Variables
A random variable is a function that
assigns numerical values to the outcomes of a
sample space.
We shall use a capital letter to denote a
random variable and the corresponding lowercase
letter for one of its values.
Classification of Random Variables
Random Variables are classified as discrete or
continuous, depending upon their range of values.
1. A Discrete Random Variable is one whose set of
possible values is finite or countably infinite.
2. A Continuous Random Variable is one that can
assume values on a continuous scale.
3.2 Probability Distribution of
Discrete Random Variables
A probability distribution is a formula or a table listing
all possible values that a random variable can take on,
together with their probabilities.
The probability distribution or probability mass
function (pmf) of a discrete random variable X is defined
for every number x by f(x) = P(X=x)
Properties of a discrete probability function
1. f(x) ≥0
2. Σ f(x) = 1
Example 3.1
Find the probability distribution for the number of heads that
appear when a coin is tossed 3 times.
Solution
Let X = number of heads that appear
x = 0, 1, 2, 3
f(x) = P(X=x)
Pmf of the no. of heads
x     f(x)
0     1/8
1     3/8
2     3/8
3     1/8
Σ     1
Example 3.2
Find the probability mass function of the random
variable M which represents the number of red
balls out of three balls drawn at random from an
urn containing 5 red balls and 6 black balls.
Solution
M = number of red balls
m = 0, 1, 2, 3
f(0) = P(M=0) = (5C0 · 6C3)/11C3 = 20/165 = 4/33
f(1) = P(M=1) = (5C1 · 6C2)/11C3 = 75/165 = 5/11
f(2) = P(M=2) = (5C2 · 6C1)/11C3 = 60/165 = 4/11
f(3) = P(M=3) = (5C3 · 6C0)/11C3 = 10/165 = 2/33
Pmf of the no. of red balls
m     f(m)
0     4/33
1     5/11
2     4/11
3     2/33
Σ     1
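The pmf in Example 3.2 can also be obtained by brute force, listing all 165 possible draws and counting the red balls in each. This is a sketch under the assumption that every set of three balls is equally likely; the names `balls`, `draws`, and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

balls = ["R"] * 5 + ["B"] * 6                        # 5 red and 6 black balls
draws = list(combinations(range(len(balls)), 3))     # 11C3 = 165 equally likely draws

# Count how many draws contain m red balls, for m = 0, 1, 2, 3.
counts = Counter(sum(balls[i] == "R" for i in draw) for draw in draws)
pmf = {m: Fraction(counts[m], len(draws)) for m in range(4)}

for m, prob in pmf.items():
    print(m, prob)
```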
Alternative Solution
f(0) = P(BBB) = (6/11)(5/10)(4/9) = 4/33
f(1) = P(RBB or BRB or BBR) = (5/11)(6/10)(5/9) + (6/11)(5/10)(5/9) + (6/11)(5/10)(5/9) =
5/11
f(2) = P(RRB or RBR or BRR) = (5/11)(4/10)(6/9) + (5/11)(6/10)(4/9) + (6/11)(5/10)(4/9) =
4/11
f(3) = P(RRR) = (5/11)(4/10)(3/9) = 2/33
Distribution Functions for Discrete
Random Variables
The cumulative distribution
function (cdf) F(x) for a discrete random
variable X is defined by
F(x) = P(X ≤ x) = Σ f(u), summed over all u ≤ x
Example 3.3
Find the cdf for the number of heads that appear when a
coin is tossed 3 times.
Solution
F(0) = P(X ≤ 0) = f(0) = 1/8
F(1) = P(X ≤ 1) = f(0)+f(1) = 1/8 + 3/8= 4/8 = ½
F(2) = P(X ≤ 2) = f(0)+f(1)+f(2) = F(1)+f(2) = 7/8
F(3) = P(X ≤ 3) = f(0)+f(1)+f(2)+f(3) = F(2)+f(3) = 1
The Mean and Variance of a Discrete
Random Variable
For a discrete random variable X, the mean or expected
value of X is given as
µ = E(X) = Σ x f(x)
The variance of a discrete random variable X with mean µ is
σ² = E[(X – µ)²] = Σ (x – µ)² f(x)
or σ² = Σ x² f(x) – µ²
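The cdf, mean, and variance formulas can be applied to the coin-toss pmf of Example 3.1. This is a minimal sketch that uses exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import accumulate

# pmf of X = number of heads in 3 tosses (Example 3.1)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
xs = sorted(pmf)

# F(x) = running sum of f(u) over u <= x
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))

mean = sum(x * pmf[x] for x in xs)                         # mu = sum of x f(x)
variance = sum((x - mean) ** 2 * pmf[x] for x in xs)       # definition
variance_shortcut = sum(x**2 * pmf[x] for x in xs) - mean**2

print(cdf, mean, variance)
```

Both variance formulas give the same value, which is a quick consistency check on the shortcut form.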
3.3 Probability Distribution of
Continuous Random Variables
The Mean and Variance of a
Continuous Random Variable
For a continuous random variable X, the mean or
expected value of X is given as
µ = E(X) = ∫ x f(x) dx
The variance of a continuous random variable X
with mean µ is given by
σ² = ∫ (x – µ)² f(x) dx
where both integrals are taken over the whole range of X.
Chapter 4
SOME DISCRETE
PROBABILITY
DISTRIBUTIONS
4.1 The Binomial Distribution
If an experiment consists of n repeated trials, each trial has
two possible outcomes which may be labeled as success or
failure, the repeated trials are independent, and the probability
of a success remains constant from trial to trial, the experiment is
called a binomial experiment.
The number of successes in n independent trials is called a
binomial random variable. The probability distribution of this
discrete random variable is called the binomial distribution.
If a binomial trial can result in a success with probability p
and a failure with probability q = 1 – p, then the probability
distribution of the binomial random variable X, the number of
successes in n independent trials, is
f(x) = b(x; n, p) = nCx p^x q^(n–x),  x = 0, 1, 2, ..., n
The mean and variance of the binomial distribution are
µ = np and σ² = npq
Example 4.1
A safety engineer claims that only 40% of all
workers wear safety helmets when they eat lunch at
the workplace. Assuming that his claim is right, find the
probability that 4 of 6 workers randomly chosen will be
wearing their helmets while having lunch at the
workplace.
Solution
Let X = number of workers wearing helmets
p = 0.40 q = 0.60
n=6
P(4 workers wear helmets) = P(X=4) = f(4) = 6C4 p^4 q^2
= [6!/(4!2!)] (0.4)^4 (0.6)^2
P(4 workers wear helmets) = 0.1382
Example 4.2
If the probability that a fluorescent light has a useful life
of at least 800 hours is 0.9, find the probabilities that
among 20 such lights:
a. Exactly 18 will have a useful life of at least 800 hours
b. At least 15 will have a useful life of at least 800 hours
c. At least 2 will not have a useful life of at least 800
hours
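Example 4.2 can be worked with the binomial pmf b(x; n, p) = nCx p^x q^(n–x) by summing over the relevant values of x. The following is a sketch, with X taken to be the number of lights lasting at least 800 hours; the function name `binomial_pmf` is an illustrative assumption.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * q^(n-x), with q = 1 - p (hypothetical helper)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.9   # X = number of lights that last at least 800 hours

# a. Exactly 18 last at least 800 hours.
p_exactly_18 = binomial_pmf(18, n, p)

# b. At least 15 last at least 800 hours.
p_at_least_15 = sum(binomial_pmf(x, n, p) for x in range(15, n + 1))

# c. At least 2 do NOT last, i.e. at most 18 last.
p_at_least_2_fail = sum(binomial_pmf(x, n, p) for x in range(0, 19))

# The same helper reproduces Example 4.1 (n = 6, p = 0.4, x = 4).
example_4_1 = binomial_pmf(4, 6, 0.4)

print(round(p_exactly_18, 4), round(p_at_least_15, 4),
      round(p_at_least_2_fail, 4), round(example_4_1, 4))
```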
4.2 The Poisson Distribution
An experiment that yields the number of
outcomes occurring during a given time interval or a
specified region is a Poisson Experiment. The number
of outcomes occurring in a Poisson experiment is called
a Poisson random variable and its probability
distribution is called a Poisson Distribution.
8
Example 4.3
On the average a certain intersection results in 3
traffic accidents per month. What is the probability
that in any given month at this intersection
a. Exactly 5 accidents will occur?
b. Fewer than 3 accidents will occur?
c. At least two accidents will occur?
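Example 4.3 can be evaluated numerically. The sketch below assumes the standard Poisson pmf p(x; µ) = e^(–µ) µ^x / x! with µ = 3 accidents per month; the function name `poisson_pmf` is an illustrative assumption.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) * mu^x / x! (standard Poisson pmf, assumed here)."""
    return exp(-mu) * mu**x / factorial(x)

mu = 3   # average number of accidents per month

p_exactly_5 = poisson_pmf(5, mu)                                # a. P(X = 5)
p_fewer_than_3 = sum(poisson_pmf(x, mu) for x in range(3))      # b. P(X <= 2)
p_at_least_2 = 1 - sum(poisson_pmf(x, mu) for x in range(2))    # c. 1 - P(X <= 1)

print(round(p_exactly_5, 4), round(p_fewer_than_3, 4), round(p_at_least_2, 4))
```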
The Poisson Distribution as a
Limiting Form of the Binomial
Distribution
If in a binomial experiment, n is large (n → ∞) and
p is small (p→ 0) and µ = np remains constant,
the Poisson distribution can be used to
approximate the binomial distribution.
Example 4.4
In proof testing of circuit boards, the probability that any particular diode
will fail is 0.01. Suppose a circuit board contains 200 diodes. What is the
probability that at least four diodes will fail on a randomly selected board?
Solution :
Let X = number of diodes that fail
p = 0.01 n = 200
µ = np = 200 (0.01) = 2
P(at least 4 diodes will fail) = P(X ≥ 4) = 1 – P(X ≤ 3)
Using the Poisson probability sum table with µ = 2
P(at least 4 diodes will fail) = 1 – 0.8571 = 0.1429
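The quality of the approximation in Example 4.4 can be checked by computing the exact binomial probability alongside the Poisson value. This is a sketch; the Poisson pmf e^(–µ) µ^x / x! is assumed, and the variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 200, 0.01
mu = n * p   # = 2

# Exact binomial: P(X >= 4) = 1 - P(X <= 3)
exact = 1 - sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(4))

# Poisson approximation with the same mean
approx = 1 - sum(exp(-mu) * mu**x / factorial(x) for x in range(4))

print(round(exact, 4), round(approx, 4))
```

With n this large and p this small, the two values differ by less than 0.005.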
4.3 The Hypergeometric Distribution
The probability distribution of the hypergeometric random
variable X, the number of successes in a random sample of
size n selected from N items of which k are labeled success
and N–k labeled failure, is
h(x; N, n, k) = kCx · (N–k)C(n–x) / NCn,  x = 0, 1, 2, ..., n
Example 4.5
To avoid detection at customs, a traveler
has placed 6 narcotic tablets in a bottle containing 9
vitamin pills that are similar in appearance. If the
customs official selects 3 of the tablets at random for
analysis, what is the probability that the traveler will
be arrested for illegal possession of narcotics?
Solution:
X = number of narcotic tablets
N = 15 n=3 k=6
P(traveler will be arrested) = P(X≥1) = f(1)+f(2)+f(3)
= 6C1 · 9C2/15C3 + 6C2 · 9C1/15C3 + 6C3 · 9C0/15C3
= 216/455 + 135/455 + 20/455 = 371/455
P(traveler will be arrested) = 53/65
Example 4.6
A manufacturing company uses an acceptance scheme on
production items before they are shipped. The plan is a two-
stage one. Boxes of 25 are readied for shipment and a sample
of 3 are tested for defectives. If any defectives are found, the
entire box is sent back for 100% screening. If no defectives are
found, the box is shipped.
a. What is the probability that a box containing 3 defectives
will be shipped?
b. What is the probability that a box containing only one
defective will be sent back for screening?
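Example 4.6 is a hypergeometric problem with N = 25 items and a sample of n = 3, where "success" means drawing a defective. The sketch below assumes the usual hypergeometric form kCx · (N–k)C(n–x) / NCn; the function name `hypergeom_pmf` is an illustrative assumption.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(X = x) when n items are drawn from N, of which k are 'successes'."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

# a. Box with 3 defectives is shipped: no defective appears in the sample.
p_shipped_3_def = hypergeom_pmf(0, N=25, n=3, k=3)

# b. Box with 1 defective is sent back: at least one defective in the sample.
p_sent_back_1_def = 1 - hypergeom_pmf(0, N=25, n=3, k=1)

# The same helper reproduces Example 4.5 via the complement rule.
p_arrested = 1 - hypergeom_pmf(0, N=15, n=3, k=6)

print(p_shipped_3_def, p_sent_back_1_def, p_arrested)
```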
4.4 The Negative Binomial
Distribution
If repeated trials can result in a success with probability
p and a failure with probability q = 1 – p, then the
probability distribution of the random variable X, the
number of the trial on which the kth success
occurs, is given by
f(x) = b*(x; k, p) = (x–1)C(k–1) p^k q^(x–k),  x = k, k+1, k+2, ...
Example 4.7
Find the probability that a person flipping a coin
gets
a. The third head on the seventh flip.
b. The first head on the fourth flip.
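Example 4.7 applies the negative binomial formula b*(x; k, p) = (x–1)C(k–1) p^k q^(x–k) with p = 1/2, assuming a fair coin. This is a minimal sketch; the function name `neg_binomial_pmf` is an illustrative assumption.

```python
from fractions import Fraction
from math import comb

def neg_binomial_pmf(x, k, p):
    """Probability that the k-th success occurs on trial x."""
    q = 1 - p
    return comb(x - 1, k - 1) * p**k * q ** (x - k)

half = Fraction(1, 2)   # fair coin (assumed)

p_third_head_on_7th = neg_binomial_pmf(7, 3, half)    # a. x = 7, k = 3
p_first_head_on_4th = neg_binomial_pmf(4, 1, half)    # b. x = 4, k = 1

print(p_third_head_on_7th, p_first_head_on_4th)
```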
Chapter 5
SOME CONTINUOUS
PROBABILITY
DISTRIBUTIONS
CONTINUOUS PROBABILITY DISTRIBUTIONS
5.1 NORMAL DISTRIBUTION
It is the most important probability distribution in the
entire field of statistics. The graph of a normal distribution,
called the normal curve, is bell-shaped and
approximately describes many phenomena that occur in
nature, industry and research. Examples include physical
measurements in meteorological experiments, rainfall
studies, manufactured parts, and errors in scientific
measurements. The normal distribution is often referred to
as the Gaussian distribution, in honor of Karl Friedrich
Gauss (1777-1855).
A continuous random variable X having the bell-shaped
distribution is called a normal random variable.
The density function of the normal random variable X,
with mean μ and variance σ², is
n(x; μ, σ) = [1/(σ√(2π))] e^(–(x–μ)²/(2σ²)),  –∞ < x < ∞
THE NORMAL CURVE
1. The mode occurs at x = μ.
2. It is symmetric about a vertical axis through μ.
3. The points of inflection are at x = μ ± σ.
4. It approaches the horizontal axis asymptotically in either
direction away from μ.
5. The total area under the curve and above the horizontal
axis is equal to 1.
Fig. 5.1 The Normal Curve
AREAS UNDER THE NORMAL CURVE
The probability that a continuous random variable X assumes a
numerical value between x = x1 and x = x2 is equal to the area
under the normal curve bounded by x = x1 and x = x2. Thus,
P(x1<X<x2) is represented by the shaded region of the normal
curve in Fig. 5.2.
Fig. 5.2 P(x1<X<x2)
The difficulty encountered in integrating the density function requires
the tabulation of normal curve areas for quick reference.
1.5 STANDARD NORMAL
DISTRIBUTION
The problem of providing a table for normal curve areas
with different values of μ and σ has been solved by
transforming all observations of any normal random
variable X into a new set of observations of a normal
random variable Z with μ = 0 and σ = 1.
The normal random variable X is transformed
into a standard normal variable Z by
Z = (X – μ)/σ
If X falls between x = x1 and x = x2, the random
variable Z will fall between z = z1 and z = z2.
Hence,
P(x1 < X < x2) = P(z1 < Z < z2), where z1 = (x1 – μ)/σ and z2 = (x2 – μ)/σ
The distribution of a normal random variable with mean zero
and variance 1 is called a standard normal distribution.
The original and transformed
distributions are shown in Fig. 5.3. The shaded area
under the X-curve equals the shaded area under the Z-
curve.
Fig. 5.3 (a) the normal distribution and (b) the standard normal distribution
Table A3 gives the area under the standard normal curve
corresponding to P(Z<z), the area to the left of z. To illustrate
the use of the table, let us find P(Z<1.96). In the left column
we locate 1.9, then move along the row to the column 0.06,
where we read 0.9750. The process is reversed when we find the
z-value corresponding to a given probability.
The following illustrates the use of the table corresponding
to the required probability.
P(Z>z1) = the area to the right of z1 = 1 – P(Z<z1)
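The table lookups described above can be reproduced with Python's standard `statistics.NormalDist`. This is a sketch rather than a replacement for Table A3; the helper `normal_prob_between` is an illustrative assumption.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

p_left = Z.cdf(1.96)          # P(Z < 1.96), area to the left
p_right = 1 - Z.cdf(1.96)     # P(Z > 1.96), area to the right
z_for_975 = Z.inv_cdf(0.975)  # reverse lookup: z with P(Z < z) = 0.975

def normal_prob_between(x1, x2, mu, sigma):
    """P(x1 < X < x2) for normal X, via the transformation Z = (X - mu)/sigma."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return Z.cdf(z2) - Z.cdf(z1)

print(round(p_left, 4), round(p_right, 4), round(z_for_975, 2))
```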
Chapter 6
ESTIMATION
6.1 STATISTICAL INFERENCE
Statistical inference is specifically decision making and prediction; it is centuries
old and plays a very important role in most people's lives.
Some of the various applications include the following:
The government needs to predict short and long-term interest
rates.
A broker wants to forecast the behavior of the stock market.
A metallurgist wants to decide whether a new type of steel is
more resistant to high temperatures than the old type was.
A consumer wants to estimate the selling price of her house
before putting it on the market.
There are many ways to make these decisions or predictions,
some subjective and some more objective in nature. How good
will your predictions or decisions be?
Although you may feel that your own built-in decision-making
ability is quite good, experience suggests that this may not be
the case. It is the job of the mathematical statistician to
provide methods of statistical inference making that are better
and more reliable than just subjective guesses.
Statistical inference is concerned with making decisions or
predictions about parameters, the numerical descriptive
measures that characterize a population. Three parameters
you encountered in earlier chapters are the population mean μ,
the population standard deviation σ and the binomial
proportion p. In statistical inference, a practical problem is
restated in the framework of a population with a specific
METHODS FOR MAKING INFERENCE
ABOUT POPULATION PARAMETER
Estimation: Estimating or predicting the value of the
parameter.
Hypothesis Testing: Making a decision about the
value of a parameter based on some preconceived
idea about what its value might be.
Example 6.1
The circuits in computers and other electronics
equipment consist of one or more printed circuit boards
(PCB) and computers are often repaired by simply
replacing one or more defective PCB’s. In an attempt to
find the proper setting of a plating process applied to one
side of a PCB, a production supervisor might estimate the
average thickness of copper plating on PCB’s using
samples from several days of operation. Since he has no
knowledge of the average thickness μ before observing
the production process, his is an estimation problem.
Example 6.2
The supervisor in Example 6.1 is told by the plant owner that
the thickness of the copper plating must not be less than 0.001
inch in order for the process to be in control. To decide whether
or not the process is in control, the supervisor might formulate
a test. He could hypothesize that the process is in control, i.e.,
assume that the average thickness of the copper plating is
0.001 or greater and use samples from several days of
operation to decide whether or not his hypothesis is correct.
The supervisor’s decision-making approach is called a test of
hypothesis.
Hence, both estimation and test of hypotheses are frequently
IMPORTANCE OF STATISTICAL
PROCEDURES
A statistical problem, which involves planning, analysis,
and inference making, is incomplete if it does not
provide a measure of goodness of the inference. Statistical
procedures are important because they provide:
Methods for making the inference
A numerical measure of the goodness or reliability of
the inference.
6.2 ESTIMATORS
An estimator is a rule, usually expressed as a formula, that tells us how to
calculate an estimate based on information in the sample.
TYPES OF ESTIMATORS
Point Estimation: Based on sample data, a single
number is calculated to estimate the population
parameter. The rule or formula that describes this
calculation is called the point estimator, and the
resulting number is called the point estimate.
Interval Estimation: Based on sample data, two
numbers are calculated to form an interval within which
the parameter is expected to lie. The rule or formula that
describes the calculation is called the interval
estimator, and the resulting pair of numbers is called
an interval estimate or confidence interval.
An estimator of a parameter is said to be unbiased if the
mean of its distribution is equal to the true value of the
parameter, otherwise, the estimator is said to be biased.
Fig. 6.1 Distribution of Unbiased Estimator
Example 6.3 Suppose the management of the Philippine
Daily Inquirer wishes to estimate the average age of the
population of its readers. A random sample of 57 readers is
taken, with the following frequency distribution of ages
resulting:
Age Group     No. of Readers
15-19         16
20-24         8
25-29         8
30-34         6
35-39         7
40-44         4
45-49         4
50-54         3
55-59         0
60-64         1
Total         57
The sample mean x̄ can be computed from the formula
x̄ = Σfx/n = 29.63 years and can be used as an estimate of the
population mean age of the paper’s readership. Such an
estimate is called a point estimate because it consists of a
single measure.
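The point estimate in Example 6.3 can be reproduced from the frequency table. The sketch below assumes that x in Σfx/n is the midpoint of each age group, which is the usual convention for grouped data; the variable names are illustrative.

```python
# (lower limit, upper limit, number of readers) for each age group
age_groups = [
    (15, 19, 16), (20, 24, 8), (25, 29, 8), (30, 34, 6), (35, 39, 7),
    (40, 44, 4), (45, 49, 4), (50, 54, 3), (55, 59, 0), (60, 64, 1),
]

n = sum(f for _, _, f in age_groups)

# Grouped-data mean: each class is represented by its midpoint (assumption).
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in age_groups)
mean_age = sum_fx / n

print(n, round(mean_age, 2))
```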
INTERVAL ESTIMATION
An interval estimate of a population parameter is an
interval that is expected to contain the parameter.
The sampling distribution of a statistic determines
the endpoints of the interval so that it contains the population
parameter with probability 1–α. The
interval is then called the (1–α)100% confidence
interval; the fraction 1–α is the confidence coefficient or
the degree of confidence, and α is the level of significance.
The wider the confidence interval, the more confident
we can be that the given interval contains the unknown
parameter.
The three most commonly used levels of significance are
0.01, 0.05 and 0.10, corresponding to 99%, 95% and 90%
confidence intervals, respectively.
SINGLE SAMPLE: ESTIMATING THE
MEAN
Large Sample Confidence Interval for a
Population Mean, μ
x̄ ± zα/2 (σ/√n)
This formula is used when n is large (n ≥ 30) and
σ is known. If σ is not known and n is large, σ ≈ s.
zα/2 is the z value taken from the table of Areas Under the
Normal Curve, such that the area to the left of –zα/2 is α/2;
likewise, the area to the right of +zα/2 is α/2.
Fig. 6.2 Confidence Interval
Small Sample Confidence Interval for a
Population Mean, μ
x̄ ± tα/2 (s/√n)
This formula is used when n is small (n<30) and σ is
unknown. tα/2 is similarly obtained from Appendix 4, Critical
Values of the t distribution.
When s is used in place of σ, the resulting quantity
(x̄ – μ)/(s/√n) is called the t statistic. Its sampling distribution is
called the t-distribution.
Like the standard normal distribution, the t-distribution is
bell shaped and symmetrical about zero. It is however more
dispersed and its shape depends on the degrees of
freedom, df=n-1.
Fig. 6.3 below shows the t-distribution for df=5 and
df=25, relative to the Z distribution. Note that as df
increases, the t-distribution approaches the Z distribution.
In fact, when df is 29 the values of t and Z are
approximately equal.
Fig. 6.3 t-Distribution Relative to Z
Example 6.4 Estimate the true average age
of readers with a 95% confidence interval.
Solution:
s = 11.85, α = 0.05
x̄ ± zα/2 (s/√n)
29.63 ± 1.96(11.85/√57)
29.63 ± 3.076 or 26.55 to 32.71
Thus we say with 95% confidence, or degree of
certainty, that the true population mean age μ of the
readers is between 26.55 and 32.71 years. This is called an
interval estimate rather than a point estimate. Thus,
26.55 < μ < 32.71.
Example 6.5
From a population of MEP student examinees, a
sample of 28 scores in MEng 504 yielded an average
score of 44.07 and a standard deviation of 8.966. Find the
99% confidence interval to estimate the true mean score
in MEng 504.
Solution: From Table A.4, t0.005 = 2.771 with df = 27
α = 0.01 (n<30)
x̄ ± t0.005 (s/√n)
44.07 ± 2.771 (8.966/√28)
44.07 ± 4.695
We can assert with 99% confidence
that 39.375 < μ < 48.765
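Examples 6.4 and 6.5 use the same template: point estimate ± critical value × s/√n. The sketch below takes the critical values as given in the examples (1.96 from the normal table, 2.771 from the t table) instead of computing them; the function name `mean_confidence_interval` is an illustrative assumption.

```python
from math import sqrt

def mean_confidence_interval(xbar, s, n, critical_value):
    """Two-sided interval xbar +/- critical_value * s / sqrt(n)."""
    margin = critical_value * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 6.4: large sample, z(0.025) = 1.96
ci_6_4 = mean_confidence_interval(29.63, 11.85, 57, 1.96)

# Example 6.5: small sample, t(0.005) with df = 27 is 2.771 (table value)
ci_6_5 = mean_confidence_interval(44.07, 8.966, 28, 2.771)

print([round(v, 2) for v in ci_6_4], [round(v, 3) for v in ci_6_5])
```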
SINGLE SAMPLE: ESTIMATING A
PROPORTION
The true proportion can likewise be estimated using the
point estimator p̂ = x/n as a point estimate for the
population proportion p.
LARGE SAMPLE CONFIDENCE INTERVAL for p
If an interval estimate is preferred, the confidence
interval for the true proportion p is given as:
p̂ ± zα/2 √(p̂q̂/n), where q̂ = 1 – p̂
Example 6.6
A marketing analyst surveys a random sample of 90
cigarette smokers and finds that 36 prefer the leading
brand. Using a 90% confidence interval, estimate the true
proportion of smokers that prefer the leading brand.
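Example 6.6 can be worked with the large-sample interval p̂ ± zα/2 √(p̂q̂/n), which is assumed here as the intended formula. The sketch obtains zα/2 from `statistics.NormalDist` rather than from the table; the function name is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence):
    """Large-sample interval for a proportion: p_hat +/- z * sqrt(p_hat*q_hat/n)."""
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z(alpha/2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(36, 90, 0.90)
print(round(low, 3), round(high, 3))
```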
ONE SIDED CONFIDENCE
INTERVALS
For example, a pollster for a candidate may be
concerned only about the minimum percentage of
voters preferring his candidate. He may wish to
estimate, with a specified degree of confidence (1–α), a
percentage value below which his candidate's support
will not fall.
In the standardized normal curve, we would be
looking for an upper confidence interval
corresponding to the shaded area in Fig. 6.4 below.
Fig. 6.4 The Upper Confidence Interval
Likewise, the lower 1–α confidence interval is the
shaded area in Fig. 6.5.
Fig. 6.5 The Lower Confidence Interval
Since the interval is now one-sided, the entire α is confined to one tail,
thus the notation –zα for the minimum z value or zα for the maximum z
Example 6.7
In the pollster’s survey of 1000 voters, 320 prefer his
candidate over the others. The pollster can estimate with
99% confidence that the minimum proportion of voters who
prefer his candidate is:
p̂ – zα √(p̂q̂/n)
0.320 – 2.33 √((0.32)(0.68)/1000) = 0.286
Thus, we can say with 99% confidence that at least 28.6%
of the voters will vote for his candidate.
Example 6.8
A private bank has ATM’s located in Magic Star Mall. Suppose
that the ATM cash inventory officer is concerned about the
maximum total withdrawal during weekends. To estimate the
average total withdrawal he takes a sample of 36 weekends
and obtains an average total withdrawal of P400,000 per
weekend with a standard deviation of P60,000. The cash
inventory officer can estimate, with 95% confidence, the
maximum value of the true average withdrawal per weekend
as:
x̄ + zα (s/√n)
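The one-sided bounds in Examples 6.7 and 6.8 put the whole of α in a single tail. The sketch below computes zα with `statistics.NormalDist` instead of reading it from the table, so the results may differ slightly from table-based arithmetic; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Example 6.7: lower 99% bound for a proportion, p_hat - z(alpha) * sqrt(p_hat*q_hat/n)
p_hat, n_voters = 0.32, 1000
z_99 = std_normal.inv_cdf(0.99)
lower_bound_p = p_hat - z_99 * sqrt(p_hat * (1 - p_hat) / n_voters)

# Example 6.8: upper 95% bound for a mean, xbar + z(alpha) * s / sqrt(n)
xbar, s, n_weekends = 400_000, 60_000, 36
z_95 = std_normal.inv_cdf(0.95)
upper_bound_mean = xbar + z_95 * s / sqrt(n_weekends)

print(round(lower_bound_p, 4), round(upper_bound_mean))
```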
Determination of the Sample Size n
for Estimating the Mean
In previous discussions and examples the sample size n was
given or arbitrarily assigned a value. In statistical research, one
of the first questions that must be answered by the researcher is
how large the sample size n must be for the study. If an
estimate for the mean is the object of the study, n is determined
based on the confidence level (1–α) and the allowable error, e,
specified by the researcher’s design of the study, using the
formula:
n = (zα/2 · σ/e)²
Example 6.9
An agriculturist wishes to estimate the mean yield in tons per
hectare of palay in a Central Luzon region. He wants to
determine the sample size to collect such that the sample
mean, x, obtained is correct within 0.25 tons of the true mean
yield per hectare for the region at a 99% confidence level. In
other words, the probability is 0.99 that the estimate will not be
in error by more than 0.25 tons. Previous studies have
determined σ to be 1.5 tons. The sample size required is:
n = (z0.01/2 · σ/e)²
n = [(2.58)(1.5)/(0.25)]² = 239.63, or 240 after rounding up
Admittedly, a difficulty with formula (6) above is that the
researcher must rely on previous studies of the same
population for the value of σ or an estimate thereof.
Furthermore two necessary conditions for the use of
formula (6) above are that σ is known and the population is
normally distributed. If the population cannot be assumed
to be normally distributed and/or σ is unknown (s is used in
its place), and formula (6) yields n < 30, the researcher
must discard use of the formula and use a sample size n ≥
Determination of the Sample Size n
for Estimating the Proportion
If an estimate for the true proportion is the object of the
statistical study, n is determined using:
n = (zα/2)² p q / e²
If an estimate for p is not available, it is assumed to be
equal to 0.50; thus, the formula reduces to:
n = [zα/2 / (2e)]²
Example 6.10
Campaign managers of a senatorial candidate wish to
estimate the true proportion of all voters who prefer their
candidate over two others in the field. They want this
estimate of the proportion to be correct within 2% of the
true proportion with 95% confidence. The required
sample size n is:
n = [z0.05/2 / (2e)]²
n = [1.96/(2(0.02))]²
n = 2401 respondents
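The two sample-size formulas can be wrapped in small helpers that round up to the next whole observation. This is a sketch using the z values quoted in Examples 6.9 and 6.10; the function names and the small tolerance used to guard against floating-point noise are assumptions.

```python
from math import ceil

def sample_size_for_mean(z, sigma, e):
    """n = (z * sigma / e)^2, rounded up to a whole observation."""
    return ceil((z * sigma / e) ** 2 - 1e-9)   # tolerance guards float noise

def sample_size_for_proportion(z, e, p=0.5):
    """n = z^2 * p * q / e^2; with p = 0.5 this equals (z / (2e))^2."""
    return ceil(z**2 * p * (1 - p) / e**2 - 1e-9)

n_6_9 = sample_size_for_mean(z=2.58, sigma=1.5, e=0.25)     # Example 6.9
n_6_10 = sample_size_for_proportion(z=1.96, e=0.02)         # Example 6.10

print(n_6_9, n_6_10)
```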
Two Samples: Estimating the
Difference Between Two Population
Means (μ1 – μ2)
In the previous sections, we learned that the estimation of
unknown parameters can be done using point estimates (such as x̄ for
μ, s for σ and p̂ for p), or using interval estimates with a specified
level of confidence (the confidence interval). It is often the case that
statistical studies of this nature proceed by comparing two (or
more) populations using their parameters or estimates thereof.
Such comparison will take the form of estimating the difference
between the parameters of two populations. It is often of interest
whether the difference is greater than zero or some other
hypothesized value, or that there is no difference between the two
population parameters.
Confidence Interval for μ1 – μ2, σ’s
known
(x̄1 – x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2)
When the σ’s are unknown and the n’s are large, σ1 ≈ s1 and
σ2 ≈ s2.
Example 6.11
From two groups of college applicants for an
entrance exam, 28 applicants from High School A obtain
a mean score of 44.07 in the Math portion of the exam
while 28 from High School B obtain a mean score of
29.02. Previous studies of the two sources of college
applicants by the Admission Office estimate σA to be 9
and σB to be 10 points. Find a 90% confidence interval for
the difference between the true mean Math scores of the
two groups.
Thus, we conclude with 90% confidence that 10.87 < μA – μB
< 19.23, meaning the true mean score of High School A
applicants is greater than the true mean score of High
School B applicants by an amount between 10.87 and
19.23 points.
Example 6.12
A sample of 57 readers of Philippine Daily Inquirer had an
average age of 29.63 years with a standard deviation of
11.85 years. A sample of 90 readers of Manila Bulletin
yields an average of 36.17 with a standard deviation of
16.26. Estimate the difference between the mean ages of
readers of the two prints using a 95% confidence interval.
Solution:
Let x1 = mean age of Manila Bulletin Readers
x2 = mean age of Philippine Daily
Inquirer Readers
With 95% confidence, the mean age of readers of
Manila Bulletin exceeds the mean age of Philippine
Daily Inquirer readers by between 1.985 and 11.095 years.
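Examples 6.11 and 6.12 both use the interval (x̄1 – x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2), with s substituted for σ when the samples are large. The sketch below takes z from the table values 1.645 and 1.96; the function name `two_mean_z_interval` is an illustrative assumption.

```python
from math import sqrt

def two_mean_z_interval(x1, x2, sd1, sd2, n1, n2, z):
    """(x1 - x2) +/- z * sqrt(sd1^2/n1 + sd2^2/n2)."""
    diff = x1 - x2
    margin = z * sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff - margin, diff + margin

# Example 6.11: High School A vs. High School B, 90% confidence
ci_6_11 = two_mean_z_interval(44.07, 29.02, 9, 10, 28, 28, z=1.645)

# Example 6.12: Manila Bulletin vs. Philippine Daily Inquirer, 95% confidence
ci_6_12 = two_mean_z_interval(36.17, 29.63, 16.26, 11.85, 90, 57, z=1.96)

print([round(v, 2) for v in ci_6_11], [round(v, 3) for v in ci_6_12])
```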
Confidence Interval for μ1 – μ2, σ’s
unknown and small sample sizes
(x̄1 – x̄2) ± tα/2 √(σp²/n1 + σp²/n2)
Where: σp² is a pooled variance computed as
σp² = [(n1 – 1)s1² + (n2 – 1)s2²] / (n1 + n2 – 2)
df = n1 + n2 – 2
Example 6.13
In 1992, fifteen random readings of bank
lending rates in Manila averaged 23% per annum with
a standard deviation of 1%. Fifteen random readings
in Bangkok averaged 13% with a standard deviation
of 0.5%. Find the 90% confidence interval for the
difference between the true mean lending rates of
Manila and Bangkok in 1992.
We are 90% certain that the true mean lending rate in
Manila in 1992 exceeded that of Bangkok by an amount
between 9.51% and 10.49%.
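The interval in Example 6.13 can be reproduced from the pooled-variance formula. The sketch assumes the table value t0.05 = 1.701 for df = 28, since Python's standard library has no t-distribution quantile; the function name `pooled_t_interval` is an illustrative assumption.

```python
from math import sqrt

def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """(x1 - x2) +/- t * sqrt(sp2/n1 + sp2/n2), with pooled variance sp2."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    diff = x1 - x2
    margin = t_crit * sqrt(pooled_var / n1 + pooled_var / n2)
    return diff - margin, diff + margin

# Manila: mean 23%, sd 1%; Bangkok: mean 13%, sd 0.5%; 15 readings each.
# t_crit = 1.701 is the t(0.05) table value for df = 15 + 15 - 2 = 28 (assumed).
ci_6_13 = pooled_t_interval(23, 13, 1, 0.5, 15, 15, t_crit=1.701)

print([round(v, 2) for v in ci_6_13])
```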
Two Samples: Estimating the
Difference between Population
Proportions
Confidence Interval for p1 – p2.
(p̂1 – p̂2) ± zα/2 √(p̂1q̂1/n1 + p̂2q̂2/n2)
Example 6.14
Let us assume that national Literacy Rates (the
proportion literate in a population) are estimated using p̂
from random samples of 1000 persons over 10 years old. In
the Philippines, in a random sample of 1000, 935 are
considered literate. In Singapore, a random sample of 1000
yields 907 literate.
Let the point estimates for the true proportion
literate in the Philippines be p̂1 = 93.5% and in Singapore be
p̂2 = 90.7%
We can estimate the difference in literacy rates
using a 90% confidence interval as follows:
(0.935 – 0.907) ± z0.05 √[(0.935)(0.065)/1000 + (0.907)(0.093)/1000]
0.028 ± (1.645)(0.012) or 0.00826 to 0.04774
We can say with 90% confidence that the difference
in the proportions literate is between 0.83% and 4.77%.
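The interval in Example 6.14 can be recomputed without rounding the standard error to 0.012, which changes the endpoints slightly in the fourth decimal place. This is a sketch; the function name `two_proportion_interval` is an illustrative assumption.

```python
from math import sqrt

def two_proportion_interval(p1, p2, n1, n2, z):
    """(p1 - p2) +/- z * sqrt(p1*q1/n1 + p2*q2/n2)."""
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error

# Philippines: 935 of 1000 literate; Singapore: 907 of 1000; z(0.05) = 1.645
ci_6_14 = two_proportion_interval(0.935, 0.907, 1000, 1000, z=1.645)

print([round(v, 4) for v in ci_6_14])
```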
SINGLE SAMPLE: ESTIMATING THE
VARIANCE
The sample variance s² is used as a point
estimate for the population variance σ². The confidence
interval can likewise be constructed as:
When n is small, an interval estimate of σ² can
be established using the chi-squared statistic, χ², the
distribution of which is shown in Fig. 6.6.
Fig. 6.6 Chi-Squared Distribution
Example 6.15
From Example 6.3, a sample of 57 readers of the
Philippine Daily Inquirer had an average age of 29.63 years
with a standard deviation of 11.85 years. Find a 95 %
confidence interval to estimate the standard deviation of
the population of readers.
Example 6.16
From Example 6.5, a sample of 28 scores in MEng 504
yielded an average of 44.07 with a standard deviation of
8.966 points. Find the 99% confidence interval to
estimate the standard deviation of the population of
examinees being sampled. Assume that the population is
nearly normally distributed.
Chapter 7
HYPOTHESIS TESTING:
DESCRIBING A SINGLE
POPULATION
7.1 INTRODUCTION
In the previous chapter we’ve dealt with the
estimation of various parameters. Now, we’re going
to investigate the second type of statistical inference
– hypothesis testing. The purpose of this type of
inference is to determine whether enough statistical
evidence exists for us to conclude that a belief or
hypothesis about a parameter is reasonable.
Examples of this type of
inference include the following:
1. Companies often use surveys to help them make decisions
concerning the effectiveness of their advertising. For
example, suppose that a company has a 10% market share.
To improve its position, it launches a new advertising
campaign. At the campaign’s completion, the company
wants to know if the campaign is likely to be successful in
raising its market share. A random sample of purchasers of
the product will provide an answer. Since the survey
produces qualitative results (“Yes, I will buy the product” or
“No, I won’t”), the parameter to be tested is the proportion p.
2. The research and development departments of
companies frequently develop new products. In order to
ensure that a new product works at least as well as the
earlier versions, scientists may use statistical analysis. For
instance, suppose that an agricultural products firm has
produced a new fertilizer. To determine if it improves crop
yields, researchers will use it to fertilize a random sample
of farms. The resultant crop yields can be measured. The
parameter of interest here is the mean crop yield μ.
3. The quality control engineer’s responsibility is to ensure that
only a very small proportion of a firm’s product is defective. Since
a complete inspection is impractical, the engineer will base his or
her decision on a sample of units. The data type in this
experiment is qualitative (defective or non-defective), and hence
the parameter of interest is the proportion p. The quality control
engineer may test to determine if there is sufficient evidence to justify
concluding that the proportion of defective units is less than some
critical amount.
In order to answer these and other questions, we must first
develop the structure of hypothesis testing.
7.2 HYPOTHESIS TESTING
The test of hypothesis consists of four components:
1. Null Hypothesis, H0
2. Alternative Hypothesis, HA
3. Test Statistic
4. Decision Rule
THE NULL HYPOTHESIS
Refers to any hypothesis we wish to test. This will
always be stated using the equality sign so as to specify a
single value of the parameter. For example, if we wish to
test whether the mean weight loss of people
who participate in a new weight reduction program is 10
pounds, we would test H0: μ = 10.
To test whether the proportion of defective stereos
coming off a production line is equal to 3%, we would test
H0: p = 0.03.
THE ALTERNATIVE HYPOTHESIS
This hypothesis is really the more important one,
because it is the hypothesis that answers our questions.
The rejection of the null hypothesis leads to the
acceptance of the alternative hypothesis. If the company
wants to know whether its 10% market share has
increased as a result of a new advertising campaign, it
would specify the alternative hypothesis as HA: p > 0.10.
If it wanted to know whether the campaign
decreased its market share, it would test HA: p < 0.10.
And if it wished to determine whether its market share
had changed at all, the alternative hypothesis would be
HA: p ≠ 0.10.
There are two crucial things to remember about the
two hypotheses. First, the null hypothesis must specify one
single value for the parameter. Second, the alternative
hypothesis must answer the researcher’s question by
specifying that the parameter is greater than, less than, or
different from the value shown in the null hypothesis.
TEST STATISTIC
In any test we decide to reject or not to
reject the null hypothesis. The criterion upon which
we base our decision to reject or not to reject the
null hypothesis is called the test statistic. The test
statistic is based on the point estimator of the
parameter to be tested.
DECISION RULE
The decision rule defines the range of values for
the test statistic that leads to the acceptance of the null
hypothesis, called the acceptance region. The range of
values that leads to the rejection of the null hypothesis is
called the critical region.
To understand how the decision rule is determined, it is
important to realize that, since our conclusion is based on
sample data, the possibility of making an error always exists.
As indicated in Table 3.1, the null hypothesis is either true or
false, and we must decide either to reject or not to reject the
null hypothesis.
Therefore, two correct decisions are possible:
accepting the null hypothesis when it is true, and
rejecting the null hypothesis when it is false.
Conversely, two incorrect decisions are possible:
rejecting H0 when it is true (this is called a Type I
error), and accepting H0 when it is false (this is called a
Type II error). We define the probability of a Type I
error as α and the probability of a Type II error as β.
The decision rule is based on specifying the
value of α, which is also called the significance level.
We usually select a small value of α such as 0.01, 0.05,
or 0.10.
             H0 is True         H0 is False
Accept H0    Correct Decision   Type II error
Reject H0    Type I error       Correct Decision
Table 3.1 Possible Situations in Testing a Statistical
Hypothesis
ONE AND TWO - TAILED TEST
We use a one-tail test if the alternative hypothesis states
that the parameter is greater than or less than the value
in the null hypothesis. The entire area of the rejection
region is located in one tail of the sampling distribution.
We use a two-tail test if the alternative hypothesis states
that the parameter is different from the value in the null
hypothesis, whether larger or smaller. The critical
region is split into two parts, having equal probabilities,
placed in each tail of the sampling distribution.
ONE AND TWO - TAILED TEST
To illustrate, suppose that we want to test
H0: μ = 50
HA: μ > 50
If we assume that the population variance is known, the
test statistic is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
The decision rule is to reject H0 if $z > z_\alpha$. The entire area of
the rejection region is located in one tail of the sampling
distribution, as shown in Fig. 7.1.
Fig. 7.1 Rejection Region for One Tail Test
If we test
H0: μ = 50
HA: μ < 50
the decision rule is to reject H0 if $z < -z_\alpha$ (see Fig. 7.2).
Fig. 7.2 Rejection Region for One Tail Test
If we test
H0: μ = 50
HA: μ ≠ 50
the decision rule is to reject H0 if $z > z_{\alpha/2}$ or if $z < -z_{\alpha/2}$ (see Fig. 7.3).
Fig. 7.3 Rejection Region for Two Tail Test
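The three decision rules can be written as one small function. This is a sketch only: the function name `rejects_h0` and the labels "greater", "less" and "two-sided" are illustrative choices, and the critical values come from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist

def rejects_h0(z, alpha, alternative):
    """Apply the one-tail or two-tail decision rule to a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "greater":      # HA: parameter > null value
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":         # HA: parameter < null value
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":    # HA: parameter != null value
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

# With alpha = 0.05 the critical values are about 1.645 (one tail) and 1.96 (two tails)
upper = rejects_h0(1.8, 0.05, "greater")
lower = rejects_h0(-1.8, 0.05, "less")
both = rejects_h0(1.8, 0.05, "two-sided")
print(upper, lower, both)
```

The same z value can be significant in a one-tail test but not in a two-tail test, because the two-tail test splits α between both tails.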
SIX STEP PROCESS FOR TESTING
HYPOTHESIS
1. State the null hypothesis.
2. Choose an appropriate alternative hypothesis.
3. Specify the significance level.
4. Define the decision rule or establish the critical region.
5. Compute the value of the test statistic.
6. Make a decision and answer the question.
There are certain guidelines for determining
which hypothesis should be stated as H0 and which should
be stated as HA. If the claim you want to test suggests a
direction such as more than, less than, superior to,
inferior to, and so on, then HA will be stated using the
inequality symbol (< or >) corresponding to the
suggested direction. If the claim suggests a compound
direction (equality as well as direction) such as at least,
equal to or greater, at most, no more than, and so on,
then this entire compound direction (≤ or ≥) is expressed
as H0, but using only the equality sign, and HA is given by
the opposite direction. Finally, if no direction whatsoever
is suggested by the claim, then HA is stated using the
not equal symbol (≠).
TESTING THE POPULATION MEAN
WHEN THE VARIANCE IS KNOWN
We are now going to use the six-step
method to test hypotheses about a
population mean when the variance is
known.
EXAMPLE 7.1
A manufacturer of a new, cheaper type of light bulb
claims that his product is better than the higher-priced
competitive light bulb. The average life of the other light
bulb is known to be 5,000 hours. In a test to examine the
manufacturer’s claim, 100 of his bulbs are left on until
they burn out. The average length of life in the sample is
5,100 hours. With a significance level α = 0.05, is there
enough evidence to support the manufacturer’s claim?
Assume that σ = 500 hours.
Solution:
1. Since the claim is μ > 5,000, then
H0: μ = 5,000
HA: μ > 5,000
2. In the test of the population mean μ (σ is known), the test
statistic is
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
We assume that either the population is normal or n is
sufficiently large.
3. The significance level is α = 0.05.
4. Because the alternative hypothesis is one-sided, the critical
value is $z_\alpha = z_{0.05} = 1.645$; our decision rule is to
reject H0 if z > 1.645.
5. The value of the test statistic is
$$z = \frac{5{,}100 - 5{,}000}{500/\sqrt{100}} = 2.0$$
6. Decision: Since the value of the test statistic is 2.0 > 1.645,
we reject H0. We therefore conclude that sufficient
evidence exists to support the manufacturer’s claim that his light
bulb lasts longer on the average than his competitor’s product.
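The six steps of Example 7.1 can be reproduced in a few lines. This is a minimal sketch with illustrative names; the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), for known sigma."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

alpha = 0.05
z = z_statistic(sample_mean=5100, mu0=5000, sigma=500, n=100)
z_critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value, about 1.645
reject_h0 = z > z_critical                    # one-tail rule for HA: mu > 5,000

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```

Changing `sample_mean` to 5050 gives z = 1.0, which does not exceed the critical value.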
INTERPRETING THE RESULTS OF A
TEST
In Example 7.1, we rejected the null hypothesis, but
this does not prove that the alternative hypothesis is true.
Since our conclusion is based on sample data (and not on the
whole population), we can never prove anything by using
statistical inference. Instead, we say that enough statistical
evidence exists to allow us to conclude that the alternative
hypothesis is true.
In Example 7.1, suppose that x̄ = 5,050 and that, as a
consequence, the value of the test statistic is z = 1.0. Now the
value of the test statistic does not fall into the rejection region.
This result does not allow us to conclude that sufficient evidence
exists to show that the null hypothesis is true.
Unfortunately, we can never have enough statistical
evidence to establish that a population parameter equals the
value specified in the null hypothesis (unless we sample the
whole population). Thus, if the value of the test statistic does
not fall into the rejection region, then, rather than say that we
accept H0 (which implies that we are saying that the null
hypothesis is true), we state that we do not reject H0, and we
conclude that not enough evidence exists to show that the
alternative hypothesis is true. Notice that, no matter the result
of the test, the conclusion is based on the alternative
hypothesis.
REJECTING AND NOT REJECTING H0
If we reject H0, we state that there is
enough evidence to show that HA is true.
If we do not reject H0, we state that there
is not enough evidence to show that HA is true.
Example 7.2
In the midst of labor-management negotiations, the president of
the union claims that her blue-collar workers (whose annual
income is P20,000) are underpaid, since the average annual
Filipino blue-collar income exceeds P20,000. Management claims
that the workers are well paid since the average annual Filipino
blue-collar income is less than P20,000. To help resolve the
impasse, an arbitrator decides to do a survey of 400 Filipino
blue-collar workers to determine if their mean income is different from
P20,000. Assuming that σ = P8,000, can the arbitrator
conclude at the 5% significance level that μ is different from
P20,000 if the sample mean is x̄ = 20,500?
Solution: In this example we want to know if the
mean annual income differs from P20,000.
The complete test is as follows.
1. H0: μ = 20,000
HA: μ ≠ 20,000
2. Test statistic:
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
3. Significance level: α = 0.05.
4. The critical values are $z_{\alpha/2} = z_{0.025} = 1.96$ and
$-z_{\alpha/2} = -z_{0.025} = -1.96$. The decision rule is to
reject H0 if z > 1.96 or z < -1.96.
5. Value of the test statistic:
$$z = \frac{20{,}500 - 20{,}000}{8{,}000/\sqrt{400}} = 1.25$$
6. Conclusion: Do not reject H0. Since there is
not enough evidence to show that the
mean annual Filipino blue-collar income
is different from P20,000, the arbitrator
may conclude that the union members’
incomes are close to the Filipino
average.
Fig. 7.4 Sampling Distribution for Example 7.2
PROBABILITY VALUE OF A TEST
One of the drawbacks of the testing procedure described
above is that the significance level selected can change the test’s
conclusion. For instance, in Example 7.1 the value of the test statistic z
= 2.0 fell into the rejection region when we used α = 0.05 (rejection
region z > 1.645); but if we had set α = 0.01 (rejection region z >
2.33), we would not reject the null hypothesis. One way of avoiding
this problem is by reporting the prob-value of the test.
The prob-value of a test is the smallest value of α that would result
in rejection of the null hypothesis. A small prob-value indicates that
the test statistic is either very large positive or very large negative. A
small prob-value thus results in the rejection of the null hypothesis.
The reader of the statistical result decides what is small enough.
If in Example 7.1 the decision rule was to reject H0 if z ≥
1.645, then the value of the test statistic z = 2.0 would fall into the
rejection region. The prob-value is simply the probability P(z ≥ 2.0).
Using Table 5 in Appendix B, we find P(z ≥ 2.0) = 0.0228. If we
decided that α = 0.05 is small enough, then since 0.0228 is less than
0.05, we reject the null hypothesis. If, however, we set α = 0.01, then
since 0.0228 is greater than 0.01, we do not reject the null hypothesis.
(See Fig. 7.5)
For a two-tail test, we would multiply the tail-area
probability by 2 to get the prob-value. In Example 7.2, the value of the
test statistic is z = 1.25. The probability P(z ≥ 1.25) = 0.1056 would
be doubled to give a prob-value of 0.2112.
The smaller the probability value, the greater is the impetus
to reject the null hypothesis.
Fig. 7.5 Calculation of the Prob-Value for Example 7.1
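The tail areas quoted from the normal table can also be computed directly. This is a minimal sketch; the helper name `prob_value` is illustrative, and the one-tail branch assumes an upper-tail alternative, as in Example 7.1.

```python
from statistics import NormalDist

def prob_value(z, two_tailed=False):
    """Prob-value of a z test: P(Z >= z), doubled (using |z|) for a two-tail test."""
    if two_tailed:
        return 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - NormalDist().cdf(z)

p_example_7_1 = prob_value(2.0)                    # one-tail test, z = 2.0
p_example_7_2 = prob_value(1.25, two_tailed=True)  # two-tail test, z = 1.25

print(f"Example 7.1: {p_example_7_1:.4f}")
print(f"Example 7.2: {p_example_7_2:.4f}")
```

Comparing each prob-value with the chosen α gives the same decisions as the critical-region approach: 0.0228 is below 0.05 but above 0.01.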
TESTING THE POPULATION MEAN
WHEN THE VARIANCE IS UNKNOWN
In this section we progress to the more
realistic case in which the population variance is unknown.
The test statistic in this case is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
which is Student t distributed with n - 1 degrees of
freedom, as long as the population is normally
distributed.
Example 7.3
A manufacturer of television picture tubes has a production
line that used to produce an average of 100 tubes per day.
Because of new government regulations, a new safety device is
installed, which the manufacturer believes will reduce average
daily output. After installation of the safety device, a random
sample of 15 days’ production was recorded as follows:
93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101,
92, 95
Assuming that the daily output is normally distributed, is
there sufficient evidence to allow the manufacturer to conclude
that the average daily output has decreased following installation
of the safety device?
Solution: Since we want to know whether
the mean production is less than 100, we
test as follows:
H0: μ = 100
HA: μ < 100
Test Statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
Significance Level: α = 0.05
Critical Region: $-t_{0.05} = -1.761$ with df = n - 1 = 14.
The decision rule is to reject H0 if t < -1.761.
Computation: x̄ = 96.47, s = 4.85
$$t = \frac{96.47 - 100}{4.85/\sqrt{15}} = -2.82$$
Decision: Reject H0. There is
enough evidence to indicate that
the mean daily production has
decreased after the installation
of the safety device.
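The sample mean, standard deviation and t statistic for Example 7.3 can be recomputed from the 15 observations. This is a sketch using the standard library; the critical value -1.761 is taken from the text (t table, df = 14) rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

daily_output = [93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95]
mu0 = 100                      # mean daily output before the safety device

n = len(daily_output)
x_bar = mean(daily_output)
s = stdev(daily_output)        # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(n))

t_critical = -1.761            # -t(0.05) with df = 14, as quoted in the text
reject_h0 = t < t_critical     # lower-tail rule for HA: mu < 100

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

Because `stdev` uses the n - 1 divisor, it matches the s used in the t statistic.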
TESTING THE POPULATION
PROPORTION
When the data type is qualitative, the parameter of
interest is the population proportion p. The point estimator
of this parameter is the sample proportion, $\hat{p}$.
The test statistic used to test a population proportion
is
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
where q = 1 - p and p is the value specified in the null hypothesis.
In order for the test statistic to be valid, the sample
size must be sufficiently large.
Example 7.4
After careful analysis, a company
contemplating the introduction of a new product has
determined that it must capture a market share of 10%
to break even. Anything greater than 10% will result in
a profit for the company. In a survey, 100 potential
customers are asked whether or not they would
purchase the product. If 14 people respond
affirmatively, is this enough evidence to enable the
company to conclude that the product will produce a
profit? (Use α = 0.05.)
Solution: The data type is qualitative, since the
possible responses are “Yes, I would purchase this
product” and “No, I would not purchase this product.”
The parameter of interest is the population proportion p.
The complete test is as follows:
H0: p = 0.10
HA: p > 0.10
Test Statistic:
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$
Significance Level: α = 0.05
Critical Region: $z_{0.05} = 1.645$. The decision rule is to reject H0 if z > 1.645.
Computation:
$$z = \frac{0.14 - 0.10}{\sqrt{(0.10)(0.90)/100}} = 1.33$$
Decision: Do not reject H0. There is not enough evidence to
conclude that the product will contribute a profit to the
company. Observe that we find some evidence to support a
conclusion that the population proportion is greater than 10% (the
sample proportion was 14%), but the evidence was not
strong enough (at the 5% significance level) to allow us to say
that the population proportion exceeds 10%.
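The computation for Example 7.4 can be sketched as follows. The function name `proportion_z` is illustrative; note that the standard error uses the null value p = 0.10, not the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(successes, n, p0):
    """z statistic for H0: p = p0, using p0 in the standard error."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

alpha = 0.05
z = proportion_z(successes=14, n=100, p0=0.10)
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
reject_h0 = z > z_critical                    # upper-tail rule for HA: p > 0.10

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```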
Chapter 8
HYPOTHESIS TESTING:
COMPARING TWO
POPULATIONS
8.1 INTRODUCTION
In this chapter, we will extend the range of
techniques by discussing how to perform tests of
hypotheses when the problem objective involves
comparing two populations. Examples of the use of
these methods include the following:
1. Marketing managers and advertisers are eager to know
which segments of the population are buying their
products. If they can determine these groups, they can
target their advertising messages and tailor their products
to these customers.
For example, if advertisers determine that the decision to
purchase a particular household product is made more
frequently by men than by women, the interest and
concerns of men will be the focus of most commercial
messages. The choice of advertising media also depends
on whether the product is of greater interest to men or to
women. The most common way of measuring this factor is
to find the difference in the proportions of men and women
buying the product. In these situations the parameter to be
tested is p1 –p2.
2. Customers who purchase televisions, major household
appliances, and automobiles consider reliability a major
factor in their brand choice. Reliability can be measured by the
amount of time the product lasts. To compare two brands of
televisions, we would test μ1 - μ2, the difference between the
mean lifetimes of the two brands.
3. Production supervisors and quality-control engineers are
responsible for measuring, controlling and minimizing the
number of defective units that are produced at a plant.
Frequently more than one method or machine can be used to
perform the manufacturing function. The decision about which
one of two machines to acquire and use often depends on
which machine produces a smaller proportion of defective
units or in other words, on the parameter p1 – p2, the difference
in the proportions of defective units from each machine.
8.2 TESTING THE DIFFERENCE
BETWEEN TWO POPULATION
MEANS
We will go through the same steps to test hypotheses
in this chapter. All hypotheses will feature μ1 - μ2. The null
hypothesis will again specify that μ1 - μ2 is equal to some
value μ0 (usually zero), while the alternative hypothesis
takes on one of the following formats, depending on what the
question asks:
1. HA: μ1 - μ2 ≠ μ0
2. HA: μ1 - μ2 > μ0
3. HA: μ1 - μ2 < μ0
Example 8.1
The selection of a new store location depends on many factors, one of
which is the level of household income in areas around the proposed site.
Suppose that a large department-store chain is trying to decide
whether to build a new store in Tarlac or in the nearby city of Cabanatuan.
Building costs are lower in Cabanatuan, and the company decides it will
build there unless the average household income is higher in Tarlac than
in Cabanatuan. A survey of 100 residents in each of the cities found that
the mean annual household income was P29,980 in Tarlac and P28,650 in
Cabanatuan. From other sources, it is known that the population standard
deviations of annual household incomes are P4,740 in Tarlac and P5,365 in
Cabanatuan. At the 5% level, can it be concluded that the mean
household income in Tarlac exceeds that of Cabanatuan? Assume that the
Solution: The parameter to be tested is μ1 - μ2 (where
μ1 = the mean annual household income in Tarlac, and
μ2 = the mean annual household income in
Cabanatuan).
The complete test is as follows:
H0: μ1 - μ2 = μ0
HA: μ1 - μ2 > μ0
with μ0 = 0.
Test Statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Significance Level: α = 0.05
Critical Region: $z_{0.05} = 1.645$
The decision rule is to reject H0 if z > 1.645.
Computation:
$$z = \frac{(29{,}980 - 28{,}650) - 0}{\sqrt{4{,}740^2/100 + 5{,}365^2/100}} = 1.86$$
Decision: Reject H0. There is enough
evidence to infer that the mean
household income in Tarlac exceeds
that of Cabanatuan. Hence, despite
lower building costs in Cabanatuan,
we recommend locating the new
store in Tarlac.
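The test statistic for Example 8.1 can be verified with a short sketch. The function name `two_mean_z` is illustrative, and the population standard deviations are treated as known, as the example states.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z(x1_bar, x2_bar, sigma1, sigma2, n1, n2, mu0=0.0):
    """z statistic for H0: mu1 - mu2 = mu0 with known population standard deviations."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((x1_bar - x2_bar) - mu0) / standard_error

alpha = 0.05
# Sample 1 = Tarlac, sample 2 = Cabanatuan
z = two_mean_z(29980, 28650, sigma1=4740, sigma2=5365, n1=100, n2=100)
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
reject_h0 = z > z_critical                    # upper-tail rule for HA: mu1 - mu2 > 0

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```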
8.3 TESTING THE DIFFERENCE
BETWEEN TWO POPULATION
PROPORTIONS
There are two different test statistics for this
parameter; the choice between them depends on the null
hypothesis.
Case 1. If the null hypothesis specifies that the difference
between the two population proportions is zero, the test
statistic is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\hat{q}\,(1/n_1 + 1/n_2)}}$$
The statistics are:
$\hat{p}_1 = x_1/n_1$ (the proportion of successes in sample 1)
$\hat{p}_2 = x_2/n_2$ (the proportion of successes in sample 2)
$\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ (the proportion of successes in both samples
combined)
$\hat{q} = 1 - \hat{p}$
Case 2. If the null hypothesis states that the difference
between the two population proportions is a nonzero
value (H0: p1 - p2 = p0, where p0 ≠ 0), the test statistic is
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}}$$
In both cases, the sample sizes must be sufficiently large.
EXAMPLE 8.2
An insurance company is thinking about offering
discounts on its life-insurance policies to nonsmokers. As
part of its analysis, it randomly selects 200 men who are
50 years old and asks them if they smoke at least one
pack of cigarettes per day and if they have ever suffered
from heart disease. The results indicate that 20 out of 80
smokers and 15 out of 120 nonsmokers suffer from heart
disease. Can we conclude at the 5% level of significance
that smokers have a higher incidence of heart disease
than nonsmokers?
Solution: The data type is qualitative, since the
responses are “suffer from heart disease” and “don’t
suffer from heart disease”. The parameter of interest is
therefore p1 - p2 (where p1 = proportion of smokers who
suffer from heart disease, and p2 = proportion of
nonsmokers who suffer from heart disease). Since we want
to know whether p1 is greater than p2, we test
H0: p1 - p2 = 0
HA: p1 - p2 > 0
Test Statistic:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\hat{q}\,(1/n_1 + 1/n_2)}}$$
Significance level: α = 0.05
Critical region: $z_{0.05} = 1.645$. The decision rule is to reject H0
if z > 1.645.
Computation: $\hat{p}_1 = 20/80 = 0.25$, $\hat{p}_2 = 15/120 = 0.125$
$\hat{p} = (20 + 15)/(80 + 120) = 35/200 = 0.175$
$$z = \frac{(0.25 - 0.125) - 0}{\sqrt{(0.175)(0.825)(1/80 + 1/120)}} = 2.28$$
Conclusion: Reject H0. There is sufficient evidence to enable
us to conclude that the proportion of smokers who suffer
from heart disease is greater than the proportion of
nonsmokers who suffer from heart disease.
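A sketch of the Case 1 (pooled) statistic applied to Example 8.2 follows. The function name `pooled_proportion_z` is illustrative; the pooled estimate combines both samples because the null hypothesis says the two proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

def pooled_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    q_pooled = 1 - p_pooled
    standard_error = sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / standard_error

alpha = 0.05
# Sample 1 = smokers (20 of 80), sample 2 = nonsmokers (15 of 120)
z = pooled_proportion_z(20, 80, 15, 120)
z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645
reject_h0 = z > z_critical                    # upper-tail rule for HA: p1 - p2 > 0

print(f"z = {z:.2f}, reject H0: {reject_h0}")
```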
Chapter 9
ANALYSIS OF VARIANCE
9.1 INTRODUCTION
Analysis of variance (ANOVA) is a technique in inferential
statistics designed to test whether or not more than two samples (or
groups) are significantly different from each other. In simple hypothesis
testing, the t-test, together with the z-test, is used to test
non-significance of difference between a single pair of samples. While both the
t-test and ANOVA are used to test non-significance of difference, ANOVA
has an advantage over the t-test because it minimizes the time and effort
expended when computing and testing more than two samples. The
t-test is used to test non-significance of difference between samples,
taking them one pair at a time. ANOVA is a simultaneous test, taking the
samples all at a single time. The t-test formula is applied as many times
as there are pairs among the samples. The ANOVA test is applied only
once.
Suppose samples A, B, C and D have to be tested for
non-significance of difference. If the researcher uses the t-test, he
would have to test separately for the following pairs: A and B, A and
C, A and D, B and C, B and D, then C and D (computing first their
respective means and standard deviations). There is a possibility
that none of the pairs are significantly different, and so much time
and effort would have been spent uselessly by using the t-test.
Now, if the researcher uses ANOVA, taking all the four
samples simultaneously and without pairing, the search stops
without further ado when the conclusion arrived at is that of
non-significance of difference.
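The number of pairwise t-tests grows quickly with the number of samples. The short sketch below lists the pairs for the four samples A, B, C and D and counts them; it only illustrates the counting argument and performs no tests.

```python
from itertools import combinations
from math import comb

samples = ["A", "B", "C", "D"]

# Every unordered pair of samples would need its own t-test
pairs = list(combinations(samples, 2))
print(pairs)

# The count is "k choose 2" = k(k - 1)/2
number_of_tests = comb(len(samples), 2)
print(number_of_tests)
```

With k samples, the pairwise approach needs k(k - 1)/2 tests, while ANOVA needs one.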
9.2 ANOVA: A ONE-WAY
CLASSIFICATION
The technique in performing simple analysis of
variance using data in one-way classification model is
summarized in table 9.1. ANOVA data must be arranged
into rows and columns where the rows represent the
items in a sample and columns represent the k
treatments for the sample classification as shown in table
9.2.
Table 9.1 ANOVA Table for k Independent Random Samples.
One-Way Classification
Source of Variation   df      Sum of Squares   Mean Squares   F
Treatment             k - 1   SST              MST            MST / MSE
Error                 n - k   SSE              MSE
Total                 n - 1   Total SS
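Where:
$$\text{Total SS} = \sum x_{ij}^2 - \frac{(\sum x_{ij})^2}{n}$$
$$\text{SST} = \sum \frac{T_i^2}{n_i} - \frac{(\sum x_{ij})^2}{n}$$
$$\text{SSE} = \text{Total SS} - \text{SST}$$
$$\text{MST} = \frac{\text{SST}}{k - 1}$$
$$\text{MSE} = \frac{\text{SSE}}{n - k}$$
$T_i$ = total of all observations in sample i
$n_i$ = number of observations in sample i
$n = n_1 + n_2 + \dots + n_k$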
The ANOVA procedure begins by considering the total
variation in the experiment, which is measured by the
quantity called the total sum of squares (Total SS). This Total SS
is partitioned into two components. The first component,
called the sum of squares for treatments (SST), measures
the variation among the k sample means. The second
component, called the sum of squares for error (SSE), is
used to measure the pooled variance within the k samples. In
the analysis of variance, Total SS = SST + SSE. Therefore, you
need to calculate only two of the three sums of squares (Total
SS, SST and SSE); the third can be found by subtraction.
Each of the sources of variation, when divided by the
appropriate df, provides an estimate of the variation in
the experiment. Since Total SS involves n squared
observations, its df = n - 1. Similarly, the sum of squares
for treatments involves k squared observations, and its df
= k - 1. Finally, the sum of squares for error, a direct
extension of the pooled estimate, has df = n - k. Notice
that the degrees of freedom for treatments and error are
additive, i.e., df(total) = df(treatments) + df(error).
These two sources of variation and their respective
degrees of freedom are combined to form the mean
squares as MS = SS/df. The total variation in the
experiment is then displayed in the ANOVA table.
TESTING THE EQUALITY OF THE
TREATMENT MEANS
1. H0: μ1 = μ2 = … = μk (there is no significant difference among
the treatment means)
HA: At least one of the means is different from the others.
2. Level of significance: α
3. Test Statistic: F = MST / MSE
4. Decision Rule: Reject H0 if F > Fα with df1 = k - 1 and df2 = n - k
When the null hypothesis is rejected, the researcher
may further expand the test to find out which particular pairs
of the samples are significantly different.
Example 9.1
Three groups of six students each were subjected to
one of three types of teaching methods. The grades
of the students were taken at the end of the semester and
are listed by group in Table 9.2. Test
the null hypothesis that the three teaching methods are
not significantly different from each other at the 95% level of
confidence.
Solution:
Table 9.2 Scores of Students under Three
Teaching Methods
Student   Method A   Method B   Method C
1         84         70         90
2         90         75         95
3         92         90         100
4         96         80         98
5         84         75         88
6         88         75         90
Total     T1 = 534   T2 = 465   T3 = 561
k = 3 treatments with n1 = n2 = n3 = 6, n = 18
$$\sum x_{ij} = 534 + 465 + 561 = 1560$$
$$\sum x_{ij}^2 = 84^2 + 90^2 + \dots + 88^2 + 90^2 = 136484$$
$$\text{Total SS} = 136484 - \frac{(1560)^2}{18} = 1284, \quad df = n - 1 = 18 - 1 = 17$$
$$\text{SST} = \frac{534^2 + 465^2 + 561^2}{6} - \frac{(1560)^2}{18} = 817, \quad df_1 = k - 1 = 3 - 1 = 2$$
$$\text{SSE} = \text{Total SS} - \text{SST} = 1284 - 817 = 467, \quad df_2 = n - k = 18 - 3 = 15$$
$$\text{MST} = \frac{\text{SST}}{k - 1} = \frac{817}{2} = 408.5$$
$$\text{MSE} = \frac{\text{SSE}}{n - k} = \frac{467}{15} = 31.13$$
Table 9.3 ANOVA for Example 9.1
Source of Variation   df   Sum of Squares   Mean Square   F
Methods               2    817              408.5         13.12
Error                 15   467              31.13
Total                 17   1284
The complete test is as follows:
1. H0: There is no significant difference among the three
teaching methods.
2. HA: At least one teaching method differs from the others.
3. Level of Significance: α = 0.05
4. Test Statistic: F = MST / MSE = 13.12
5. Critical Region: F0.05 = 3.68 with df1 = 2 and df2 = 15. The
decision rule is to reject H0 when F > 3.68.
6. Conclusion: Reject H0 and conclude that there is a
significant difference among the three teaching methods.
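The sums of squares for Example 9.1 can be recomputed with the one-way ANOVA formulas. This is a minimal sketch; the function name `one_way_anova` and the dictionary keys are illustrative, and the critical value 3.68 is taken from the text rather than computed.

```python
def one_way_anova(groups):
    """One-way ANOVA sums of squares, mean squares and F for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / n                      # (sum of all x)^2 / n
    total_ss = sum(x * x for g in groups for x in g) - correction
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = total_ss - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"Total SS": total_ss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}

method_a = [84, 90, 92, 96, 84, 88]
method_b = [70, 75, 90, 80, 75, 75]
method_c = [90, 95, 100, 98, 88, 90]

anova = one_way_anova([method_a, method_b, method_c])
f_critical = 3.68                          # F(0.05) with df1 = 2, df2 = 15, from the text
reject_h0 = anova["F"] > f_critical

for name, value in anova.items():
    print(f"{name}: {value:.2f}")
print("Reject H0:", reject_h0)
```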
9.3 THE RANDOMIZED BLOCK
DESIGN: A TWO-WAY
CLASSIFICATION
The one-way classification for a completely
randomized design is meant to be used when the
experimental units are quite similar or homogeneous in
their makeup and when there is only one factor – the
treatment – that might influence the response. Sometimes
it is clear to the researcher that the experimental units
are not homogeneous. Experimental subjects or animals,
agricultural fields, days of the week, and other
experimental units often add their own variability to the
response.
Although the researcher is not really interested in
this source of variation, but rather in some treatment he
chooses to apply, he may be able to increase the
information by isolating this source of variation using
the randomized block design. This identifies two
factors: treatments and blocks – both of which affect the
response.
In a randomized block design, the experimenter is
interested in comparing k treatment means. The design
uses blocks of k experimental units that are relatively
similar or homogeneous, with one unit within each block
randomly assigned to each treatment.
For instance, a production supervisor wants to
compare the mean times for assembly-line operators
to assemble an item using one of three methods: A,
B, or C. Expecting variation in the assembly times from
operator to operator, the supervisor uses the randomized
block design to compare the three methods. Five
assembly-line operators are selected to serve as
blocks, and each is assigned to assemble the item three
times, once for each of the three methods. Since the
sequence in which the operator uses the three methods
may be important (fatigue or increasing dexterity may
be factors affecting the response), each operator should
be assigned a random sequencing of the three methods.
For example, operator 1 might be assigned to
perform method C first, followed by A and B. Operator 2
might perform method A first, then C and B.
To compare four different teaching methods, a group
of students might be divided into blocks of size 4, so that
the groups are most nearly matched according to
academic achievement. To compare the average costs of
cellular phone companies, costs might be compared at each
of three usage levels: low, medium and high.
ANOVA FOR A RANDOMIZED BLOCK
DESIGN
The total variation in the n = bk observations is
$$\text{Total SS} = \sum x_{ij}^2 - \frac{(\sum x_{ij})^2}{n}$$
This is then partitioned into three parts in such a way that
Total SS = SSB + SST + SSE
Where:
SSB (sum of squares for blocks) measures the variation
among the block means.
SST (sum of squares for treatments) measures the variation
among the treatment means.
SSE (sum of squares for error) measures the variation of the
differences among the treatment observations within blocks,
which measures the experimental error.
$$\text{SST} = \frac{\sum T_i^2}{b} - \frac{(\sum x_{ij})^2}{n}$$
$$\text{SSB} = \frac{\sum B_j^2}{k} - \frac{(\sum x_{ij})^2}{n}$$
$$\text{SSE} = \text{Total SS} - \text{SST} - \text{SSB}$$
with $T_i$ = total of all observations receiving treatment i,
i = 1, 2, …, k, and $B_j$ = total of all observations in block j,
j = 1, 2, …, b.
Each of the three sources of variation, when divided
by the appropriate degrees of freedom, provides an 30
5
estimate of the variation in the experiment. Since the total SS
involves n = bk squared observations, its degrees of freedom
are df = n – 1. Similarly, SST involves k squared totals, and its
degrees of freedom are df = k – 1, while SSB involves b
squared totals and has df = b – 1. Finally, since the degrees of
freedom are additive, the remaining degrees of freedom
associated with SSE can be shown algebraically to be
df = (b – 1)(k – 1).
These three sources of variation and their respective
degrees of freedom are combined to form the mean squares
as MS = SS/df, and the total variation in the experiment is
summarized in an ANOVA table such as Table 9.4.
Table 9.4 ANOVA Table for a Randomized Block Design,
k Treatments and b Blocks
SOURCE       df                      SS    MS              F
Treatments   df1 = k – 1             SST   MST = SST/df1   MST/MSE
Blocks       df2 = b – 1             SSB   MSB = SSB/df2   MSB/MSE
Error        df3 = (b – 1)(k – 1)    SSE   MSE = SSE/df3
Total        n – 1 = bk – 1
TESTING THE EQUALITY OF THE
TREATMENT MEANS AND BLOCK
MEANS
For comparing treatment means:
1. H0 : The treatment means are equal.
2. HA : At least two of the treatment means differ.
3. Level of significance α.
4. Test Statistic : F = MST/MSE, where F is based on df1 and df3.
5. Rejection Region : Reject H0 if F > Fα.
For comparing block means:
1. H0 : The block means are equal.
2. HA : At least two of the block means differ.
3. Level of significance α.
4. Test Statistic : F = MSB/MSE, where F is based on df2 and df3.
5. Rejection Region : Reject H0 if F > Fα.
Example 9.2
The cellular phone industry is involved in a fierce battle for
customers, with each company devising its own complex pricing plan
to lure customers. Since the cost of a cell phone minute varies
drastically depending on the number of minutes per month used by
the customer, a consumer watchdog group decided to compare the
average costs for four cellular phone companies using three different
usage levels as blocks. The monthly cost (in dollars) computed by the
cell phone companies for peak-time callers at low (20 minutes per
month), middle (150 minutes per month), and high (1000 minutes per
month) usage levels are given in Table 9.5. Do the data provide
sufficient evidence to indicate a difference in the average monthly
cellphone cost depending on the company the customer uses?
Table 9.5 Monthly phone costs of four companies at three usage
levels.
                          COMPANY
Usage Level   A          B          C          D          Totals
Low           27         24         31         23         B1 = 105
Middle        68         76         65         67         B2 = 276
High          308        326        312        300        B3 = 1246
Totals        T1 = 403   T2 = 426   T3 = 408   T4 = 390   1627
Solution: The experiment is designed as a randomized block
design with b = 3 usage levels (blocks) and k = 4
companies (treatments), so there are n = bk = 12
observations and ∑ x_ij = 1627.
$$\text{Total SS} = (27^2 + 24^2 + \cdots + 312^2 + 300^2) - \frac{(1627)^2}{12} = 410{,}393 - 220{,}594.0833 = 189{,}798.92$$
$$\text{SST} = \frac{403^2 + 426^2 + 408^2 + 390^2}{3} - 220{,}594.0833 = 222.25$$
$$\text{SSB} = \frac{105^2 + 276^2 + 1246^2}{4} - 220{,}594.0833 = 189{,}335.17$$
$$\text{SSE} = \text{Total SS} - \text{SST} - \text{SSB} = 241.5$$
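The hand computation can be checked with a short Python sketch that applies the Total SS, SST, SSB and SSE formulas to the monthly costs of Example 9.2. The function name `rbd_sums_of_squares` and the layout of `costs` (rows as blocks, columns as treatments) are choices made for this illustration only.

```python
# Monthly costs from Example 9.2: rows are usage levels (blocks),
# columns are companies A, B, C, D (treatments).
costs = [
    [27, 24, 31, 23],       # low usage
    [68, 76, 65, 67],       # middle usage
    [308, 326, 312, 300],   # high usage
]

def rbd_sums_of_squares(table):
    """Partition the total variation of a randomized block design."""
    b = len(table)        # number of blocks
    k = len(table[0])     # number of treatments
    n = b * k
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n          # (sum of x_ij)^2 / n

    total_ss = sum(x * x for row in table for x in row) - correction
    treatment_totals = [sum(row[i] for row in table) for i in range(k)]  # T_i
    block_totals = [sum(row) for row in table]                           # B_j
    sst = sum(t * t for t in treatment_totals) / b - correction
    ssb = sum(bt * bt for bt in block_totals) / k - correction
    sse = total_ss - sst - ssb
    return {"Total SS": total_ss, "SST": sst, "SSB": ssb, "SSE": sse}

ss = rbd_sums_of_squares(costs)
for name, value in ss.items():
    print(f"{name}: {value:.2f}")
```

The printed values should agree with the hand computation up to rounding.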
Table 9.6 Two-Way Analysis of Variance output for Example 9.2
Source    df   SS           MS         F
Company   3    222.25       74.1       1.84
Usage     2    189,335.17   94,667.6   2351.99
Error     6    241.5        40.25
Total     11
The complete test is as follows:
1. H0 : No difference in the average cost among the
companies.
2. HA : The average cost is different for at least one of the four
companies.
3. Level of significance α = 0.05
4. Test Statistic : F = MST/MSE = 1.84
5. Critical Region : With 3 and 6 degrees of freedom, F0.05 = 4.76.
The decision rule is to reject H0 if F > 4.76.
6. Conclusion : Since F = 1.84 < 4.76, do not reject H0.
There is insufficient evidence to indicate a difference
in the average monthly costs for the four companies.
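As a rough check on the test for Example 9.2, the mean squares and F ratios can be recomputed from the sums of squares quoted in the solution. The helper name `rbd_f_tests` is hypothetical, and the critical value 4.76 is simply the tabulated figure quoted in the test; it is not computed here.

```python
def rbd_f_tests(sst, ssb, sse, k, b):
    """Mean squares and F ratios for a randomized block design."""
    df_treat = k - 1
    df_block = b - 1
    df_error = (k - 1) * (b - 1)
    mst = sst / df_treat
    msb = ssb / df_block
    mse = sse / df_error
    return {
        "MST": mst,
        "MSB": msb,
        "MSE": mse,
        "F_treatments": mst / mse,
        "F_blocks": msb / mse,
    }

# Sums of squares as reported in the solution of Example 9.2
result = rbd_f_tests(sst=222.25, ssb=189335.17, sse=241.5, k=4, b=3)

F_CRIT = 4.76  # tabulated F at alpha = 0.05 with 3 and 6 df, as quoted in the text
reject_h0 = result["F_treatments"] > F_CRIT

print(f"F (companies) = {result['F_treatments']:.2f}")
print(f"F (usage)     = {result['F_blocks']:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The very large F for usage levels suggests that blocking on usage removed most of the variation that would otherwise have inflated the error term.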
Chapter 10
ANALYSIS OF
ENUMERATION DATA
10.1 INTRODUCTION
Enumeration data are expressed in the form of frequencies, which
represent the number of items within specified qualitative descriptions or
categories. Enumeration data may be classified according to the number
of variables described as either one-way or two-way classification. Each
variable is further subdivided into more specific categories.
One-way classification has only one variable described by at least
two categories. Two-way classification enumeration data have two
variables described by their respective categories. Frequencies given are
applicable to both variables. Data with two-way classification are best
summarized and presented in a contingency table which is made up of
several rows and columns, the rows representing the categories of one
variable and the columns representing the categories of the other
variable.
10.2 ANALYSIS OF ENUMERATION
DATA
The chi-square is a versatile statistical test named after
the chi-square distribution which is derived under the
assumption of normality of the population.
Among the uses of the chi-square are the following:
1. Goodness-of-Fit Test. To find out whether or not the sample
distribution conforms with a hypothetical or theoretical
distribution.
2. Test for Homogeneity. To find out whether or not an observed
proportion is equal to some given ideal or expected proportion.
3. Test for Independence. To test the independence of one
variable from another.
Just like the format used in the test of hypothesis
involving z and t-tests, the following steps will serve as a
guide in the analysis of enumeration data.
1. State the null and alternative hypothesis. The null
hypothesis may be stated in any of these ways:
a. Goodness-of-Fit Test: H0 : The sample distribution
conforms with the hypothetical or theoretical
distribution.
b. Test for Homogeneity: H0 : The actual observed
proportion is not significantly different from the ideal or
expected proportion.
c. Test for Independence: H0 : The variables are independent.
The first two types of null hypothesis are applicable
to data with one-way classification. The third type of null
hypothesis is applicable to data with two-way
classification.
2. Choose an alternative hypothesis HA.
3. Set the level of significance α.
4. Determine the critical region χ²_α with df = k – 1 for one-way
classification and df = (r – 1)(c – 1) for two-way
classification, where: k = no. of categories
r = no. of rows     c = no. of columns
The decision rule is to reject H0 if χ² > χ²_α, where
$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e}$$
where: f_0 = actual observed frequency
       f_e = expected frequency
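In one-way classification, f_e = np, where n is the total frequency
and p is the expected proportion for the category.
In two-way classification,
$$f_e = \frac{(\text{column total}) \times (\text{row total})}{\text{grand total}}$$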
In a 2x2 contingency table, where df = 1, Yates' correction for
continuity is applied. The corrected formula then
becomes
$$\chi^2 = \sum \frac{\left(|f_0 - f_e| - 0.5\right)^2}{f_e}$$
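To make the two-way formulas concrete, here is a minimal Python sketch that computes the expected frequencies from the row and column totals and then the chi-square statistic, with and without Yates' correction. The 2x2 counts in `table` are made up purely for illustration, and the function name `chi_square_two_way` is an assumption.

```python
def chi_square_two_way(observed, yates=False):
    """Chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            # expected frequency = (row total x column total) / grand total
            f_exp = row_totals[i] * col_totals[j] / grand_total
            diff = abs(f_obs - f_exp)
            if yates:
                diff -= 0.5          # continuity correction for 2x2 tables
            chi2 += diff ** 2 / f_exp

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

table = [[10, 20], [30, 40]]   # hypothetical counts, not taken from the text

plain, df = chi_square_two_way(table)
corrected, _ = chi_square_two_way(table, yates=True)
print(f"df = {df}, chi-square = {plain:.4f}, with Yates' correction = {corrected:.4f}")
```

The correction always pulls the statistic toward zero, which makes the test slightly more conservative.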
Example 10.1
Based on the data in Table 10.1, is the
actual observed proportion significantly different
from the expected proportion, if the expected
proportion is 50% married, 30% single, 10%
widowed and 10% legally separated?
Table 10.1 One-Way Classification of Civil Status of 50 Employees
STATUS              FREQUENCY   EXPECTED PROPORTION
Single              18          30%
Married             24          50%
Widowed             5           10%
Legally Separated   3           10%
Total               50          100%
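One way to work Example 10.1 is sketched below in Python, using f_e = np for each category and the chi-square statistic for one-way classification. The example does not state a level of significance, so α = 0.05 is assumed here; 7.815 is the tabulated chi-square value for that α with df = 3.

```python
observed = {"Single": 18, "Married": 24, "Widowed": 5, "Legally Separated": 3}
expected_proportion = {
    "Single": 0.30,
    "Married": 0.50,
    "Widowed": 0.10,
    "Legally Separated": 0.10,
}

n = sum(observed.values())          # 50 employees in all
chi2 = 0.0
for status, f_obs in observed.items():
    f_exp = n * expected_proportion[status]     # f_e = np
    chi2 += (f_obs - f_exp) ** 2 / f_exp

df = len(observed) - 1              # k - 1 categories
CHI2_CRIT = 7.815                   # tabulated value for alpha = 0.05 (assumed), df = 3
reject_h0 = chi2 > CHI2_CRIT

print(f"chi-square = {chi2:.2f} with df = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under that assumed α, the computed statistic falls well below the critical value, so the observed proportions would not be judged significantly different from the expected ones.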
Chapter 11
LINEAR REGRESSION
AND CORRELATION
11.1 INTRODUCTION
The technique presented in this chapter addresses the
problems whose objective is to analyze the relationship
between two variables whose data types are quantitative. The
statistical technique we will discuss is called regression
analysis.
One of the reasons for the importance of regression,
particularly in business and economics applications, is that it
can be used to forecast variables. As you can easily
appreciate, almost all companies and governmental
institutions frequently forecast variables such as product
While several different forecasting techniques can be
used, regression analysis is one of the most popular.
The technique involves developing a mathematical
equation that analyzes the relationship between the variable
to be forecasted and the variable that the statistician
believes is related to the forecast variable. The variable to be
forecasted is called the dependent variable and is denoted
“Y”, while the related variable is called the independent
variable and is denoted “X”.
The following examples illustrate uses of regression
analysis:
1. A real-estate agent wants to develop a more accurate method
of predicting the selling price of houses. She believes that the
most important factor is the size of the house. The house price is
labeled Y, and the house size is labeled X. She will gather data about X
and Y to develop the model.
2. A government economist is assigned the task of analyzing the
relationship between interest rates and unemployment. Because
high interest rates generally cause higher unemployment, he
denotes the unemployment rate as Y and the interest rate as X.
Observing monthly values of X and Y will enable the economist to
produce a model to assist in the analysis.
3. The president of a national chain of print shops is re-
evaluating the company’s prices. The president needs to know
how prices affect gross sales. Consequently, he labels gross
sales Y and the price per copy X. Gathering annual data for X
and Y for several different outlets across the country is the
next step in the procedure.
11.2 MODEL
The first step is to develop a mathematical
model that represents the relationship between the two
variables. A mathematical model is an equation that
represents an actual relationship. For example, an
appliance repair company charges its customers (for
labor only) according to the following schedule:
First hour                      Php 70.00
Each additional quarter hour    Php 10.00
We can represent this schedule by the equation:
Y = 70 + 10X
where:
Y = total charge
X = number of quarter hours after the first hour
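The schedule can be written as a one-line function, shown here as a small Python sketch. The name `total_charge` is chosen for illustration; the function simply evaluates Y = 70 + 10X.

```python
def total_charge(quarter_hours):
    """Labor charge in Php: 70 for the first hour plus 10 per additional quarter hour."""
    return 70 + 10 * quarter_hours

# Charges for a job lasting one hour, then 1 to 4 extra quarter hours
for x in range(5):
    print(f"X = {x}: Y = Php {total_charge(x):.2f}")
```

For instance, a job lasting one and a half hours has X = 2 and costs Php 90.00.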
The simplest presentation of a relationship is a two-variable
model. These two variables are usually expressed in the
form of an equation. For example:
- The quantity demanded is related to price.
- The quantity produced in a factory is related to the
production costs.
- The number of students for a given semester may be related
to tuition fee
- The expenditures of a certain household are related to income.
The estimation of these relationships is given by the linear equation:
Y = a + bX
where: a is the Y-intercept
       b is the slope of the line
The Y-intercept refers to the value of Y when X is equal to zero. The
slope is the ratio of the change in Y to the corresponding change in X. If the
value of b is positive, that is b > 0, then the graph of a linear
regression is said to be increasing (see fig. 1-A). If the value of b is
negative or b < 0, then the graph is decreasing (see fig. 1- B). When
the value of b = 0, then the value of Y is constant (see fig. 1- C).
11.3 SCATTER DIAGRAM
The relationship between the variables X and Y
can be visualized using the scatter diagram. In this
method, the values of the variables X and Y are plotted in
quadrant 1 of a two dimensional coordinate system. In
plotting the points associated with the ordered
observation (X,Y), we first subdivide the horizontal X-axis
and the vertical Y-axis in terms of subintervals. It should
be noted that the subdivisions in X might not be the
same as in Y.
Consider the following paired observations of X, the
independent variable, and Y, the dependent variable.
X Y
1 1
3 2
4 4
6 4
8 5
9 7
11 8
14 9
[Figure: scatter diagram for the data X and Y]
11.4 METHOD OF LEAST SQUARES
Consider a set of observations of two variables X and Y.
We want to estimate the relationship between X and Y such
that Y = a + bX, where a and b are estimates of parameters.
The deviations between the observed and estimated values
of Y are called residuals, or simply errors, denoted by the letter e.
Our trend line is the regression line. “Least Squares”
means that the most accurate trend line that may be
drawn is the one for which the sum of the squares of the vertical
distances of the points from the line is the least, that is, a minimum.
All other lines will yield a higher result. One
consequence is that the sum of the vertical
distances (from the points to the line) of the points
above the line should be equal to the sum of the vertical
distances of the points below the line. When these sums
(above and below) are not equal, then the sum of the
squares of the vertical distances of all points from the
line is not minimum.
The formulas for a and b are derived from what
are referred to as “normal” equations. These normal
equations are in turn derived through a combination of
graphical techniques and the use of calculus. The normal
equations are:
1. ∑Y = aN + b∑X
2. ∑XY = a∑X + b∑X²
Where:
∑Y= sum of the values of Y, dependent variable
N = the number of pairs of X and Y
∑X= sum of the values of X, independent variable
∑XY = the sum of the column XY which is derived by multiplying
paired values of X and Y.
∑X² = the sum of the column X², which is derived by squaring each
value of X.
Based on the given data of X and Y, we can
determine all of the above which means that the two
normal equations now consist of a system of two linear
equations with two unknowns – a and b.
Solving this system gives:
$$a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N(\sum X^2) - (\sum X)^2}$$
$$b = \frac{N(\sum XY) - (\sum X)(\sum Y)}{N(\sum X^2) - (\sum X)^2}$$
Let us illustrate the use of these formulas by
using the values of X and Y given earlier.
X Y XY X2
1 1 1 1
3 2 6 9
4 4 16 16
6 4 24 36
8 5 40 64
9 7 63 81
11 8 88 121
14 9 126 196
∑ = 56 40 364 524
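Therefore:
$$a = \frac{(40)(524) - (56)(364)}{8(524) - (56)^2} = \frac{576}{1056} \approx 0.545$$
$$b = \frac{8(364) - (56)(40)}{8(524) - (56)^2} = \frac{672}{1056} \approx 0.64$$
In the computations that follow, a is taken as 0.54.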
We can now substitute these values into the equation of a
straight line, Y = a + bX:
Y = 0.54 + 0.64X
This final equation gives us the equation of the Least
Squares Regression Line (LSRL) or the equation of the
most accurate trend line, also known as the best fitting
line. This equation will enable us to solve for Y for any
given value of X. In our example, when X is 16, Y is:
Y = 0.54 + 0.64 (16)
= 0.54 + 10.24
= 10.78
This result is not very far from the graphical result of
10.8
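The least-squares computation can be reproduced with a short Python sketch that applies the formulas for a and b to the eight (X, Y) pairs. The function name `least_squares` is an assumption. Because the code keeps full precision instead of two-decimal coefficients, its prediction at X = 16 comes out near 10.73 rather than 10.78.

```python
xs = [1, 3, 4, 6, 8, 9, 11, 14]   # independent variable X
ys = [1, 2, 4, 4, 5, 7, 8, 9]     # dependent variable Y

def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

a, b = least_squares(xs, ys)
y_at_16 = a + b * 16

print(f"a = {a:.4f}, b = {b:.4f}")
print(f"Predicted Y at X = 16: {y_at_16:.2f}")
```

The small gap between the two predictions comes only from rounding a and b before substituting.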
The method employing the LSRL is very useful in providing a
fairly accurate estimate when the values of any two variables are
given. This method assumes that one variable depends (though not
necessarily entirely) on the other variable. It also assumes that the
trend approximates a straight line.
Testing the Significance of b
The usual approach in testing the significance of the regression
coefficient is to assume that b = 0. This assumption implies that in
the equation Y = a + bX, if b = 0, then we can say that the two
variables are not linearly related. This further implies that the
independent variable X cannot be used to predict the value of the
dependent variable Y using the linear equation.
When the relationship of two variables is proven to be linear,
then we can say that the variable X can be used in predicting the
value of the dependent variable Y.
We want to test the hypothesis b = 0 against the
alternative hypothesis b ≠ 0. In symbols, we have:
H0 : b = 0        Ha : b ≠ 0
In testing for the significance of the regression
coefficient b, we use the formula :
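The formula itself is not reproduced here. One commonly used form, assumed for this sketch, is t = b / s_b with n – 2 degrees of freedom, where s_b is the standard error of the slope. The Python code below applies it to the X and Y data of Section 11.4; the variable names and the critical value 2.447 (two-tailed, α = 0.05, 6 df) are choices made for the illustration, not taken from the source.

```python
import math

xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(xs)

# Corrected sums of squares and cross products
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = s_xy / s_xx                           # least-squares slope
sse = s_yy - b * s_xy                     # residual sum of squares
s = math.sqrt(sse / (n - 2))              # standard error of estimate
se_b = s / math.sqrt(s_xx)                # standard error of the slope (assumed form)
t = b / se_b

T_CRIT = 2.447                            # tabulated t, two-tailed alpha = 0.05, 6 df
reject_h0 = abs(t) > T_CRIT

print(f"b = {b:.4f}, s_b = {se_b:.4f}, t = {t:.3f}")
print("Reject H0: b = 0" if reject_h0 else "Do not reject H0: b = 0")
```

Under these assumptions the slope is clearly different from zero, which agrees with the strong upward trend in the data.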
11.5 SIMPLE CORRELATION
In regression analysis, the dependence of the variables
was emphasized. The variable X assumed the role of the
independent variable and variable Y as the dependent
variable.
In this section, we are going to consider the degree of
association between two variables X and Y without giving
emphasis on which variable is the dependent or the
independent variable. The measure of the degree of
association between two variables is known as the coefficient
of correlation. We are only going to consider linear correlation.
In different situations, the value of the coefficient of
correlation varies from -1 to +1. We let r be the coefficient of
correlation, then all the possible values of r can be expressed
in the interval -1 ≤ r ≤ 1.
If the value of r = -1 or 1, we say that the two variables
are perfectly correlated. A negative value of r implies a
negative correlation and a positive value of r implies a positive
correlation. For example, suppose a researcher is trying to
determine the relationship between the IQ and the
achievement test scores of students. If the researcher
finds out that the variables are highly positively correlated, then
he can say that a student with a very high IQ is likely
to get a very high score in an achievement
test.
In other words, there are three degrees of correlation or
relationship between two variables:
1. Perfect Correlation (positive and negative)
2. Some degree of Correlation (positive and negative)
3. No Correlation
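No formula for r is given in this section, so the sketch below assumes the usual Pearson product-moment form and applies it to the X and Y data used for the regression line. The function name `pearson_r` is illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of linear correlation (assumed formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

r = pearson_r(xs, ys)
r_reversed = pearson_r(xs, [-y for y in ys])   # flipping Y flips the sign of r

print(f"r = {r:.3f}")
print(f"r with Y reversed = {r_reversed:.3f}")
```

A value this close to +1 falls under "some degree of correlation" (strong and positive), while an exactly linear data set would give r = 1 or r = -1.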