Chapter 3 Data
Overview/Introduction
The processing of statistical information has a history that extends back to the beginning of
mankind. In early biblical times nations compiled statistical data to provide descriptive
information relative to all sorts of things, such as taxes, wars, agricultural crops and even
athletic events. Today, with the development of probability theory, we are able to use statistical
methods that not only describe important features of data but also allow us to proceed
beyond the collected data into the area of decision making through generalizations and
predictions.
Learning Outcome/Objective
Learning Content/Topic
Examples
4 out of 5 dentists recommend Dentyne.
Almost 85% of lung cancers in men and 45% in women are tobacco-related.
People predict that it is very unlikely there will be another baseball player with a batting
average over .400.
79.48% of all statistics are made up on the spot.
A surprising new study shows that eating egg whites can increase one’s life span.
All these claims are statistical in character. In the study of statistics we are basically concerned
with the presentation and interpretation of chance outcomes that occur in a planned or scientific
investigation. Hence, having a working knowledge of basic statistics is very important.
STATISTICS
Provides tools that you need in order to react intelligently to information you hear or
read.
Can be applied in psychology, health, law, sports, business, etc.
Are often presented in an effort to add credibility to an argument or advice.
Statistical methods are those procedures used in the collection, presentation, analysis and
interpretation of data. We shall categorize these methods as belonging to one of two major
areas: descriptive statistics and statistical inference.
Descriptive Statistics. This comprises those methods concerned with collecting and
describing a set of data so as to yield meaningful information.
Let it be clearly understood that descriptive statistics provides information only about the collected
data and in no way draws inferences or conclusions concerning a larger set of data. The
construction of tables, charts, graphs and other relevant computations in various newspapers
and magazines usually fall in the area categorized as descriptive statistics.
Statistical Inference. This comprises those methods concerned with the analysis of a subset of
data leading to predictions or inferences about the entire set of data.
The generalizations associated with statistical inferences are always subject to uncertainties,
since we are dealing only with partial information obtained from a subset of the data of interest.
To cope with uncertainties, an understanding of probability theory is essential. The ideas
discussed in this lesson can be applied in virtually every area of learning, since the basic
techniques for collecting and analyzing data are the same no matter what the field of
application may be.
For example, the chemist runs an experiment using 3 variables and measures the amount of
desired product. The results are then analyzed by statistical procedures. These same
procedures are used to analyze the results obtained by measuring the yield of grain when 3
fertilizers are tested or to analyze the data representing the number of defectives produced by 3
similar machines. Many statistical methods that were derived primarily from agricultural
applications have proved to be equally valuable for applications in other areas.
Statisticians are employed today by every progressive industry to direct their quality control
process and to assist in the establishment of good advertising and sales programs for their
products. In business the statistician is responsible for decision making, for the analysis of time
series, and for the formation of index numbers. Indeed, statistics is a very powerful tool if
properly used.
The abuse of statistical procedures will frequently lead to erroneous results. One should be
careful to apply the correct and most efficient procedure for the given conditions to obtain
maximum information from the data available.
The procedures used to analyze a set of data depend to a large degree on the method used to
collect the information. For this reason, it is desirable in any investigation to consult with the
statistician from the time the project is planned until the final results are analyzed and
interpreted. Statistics are all around us; sometimes they are used well, sometimes not.
A population consists of the totality of the observations with which we are concerned. The
number of observations in the population is defined to be the SIZE of the population.
Example: A newspaper website contains a poll asking people their opinion on a recent news
article. What is the population?
Answer: While the target (intended) population may have been all people, the real population of
the survey is the readers of the website.
On the other hand, since surveying an entire population is often impractical, we usually select a
sample to study.
Sample is a smaller subset of the entire population, ideally one that is fairly representative of the
whole population.
Example: A researcher wanted to know how citizens of Echague felt about a voter initiative. To
study this, she goes to the Echague SM Store and randomly selects 500 shoppers and asks
them their opinion. Sixty percent (60%) indicate they are supportive of the initiative. What is the
sample and population?
Answer: The sample is the 500 shoppers questioned. The population is less clear. While the
intended population of this survey was Echague citizens, the effective population was mall
shoppers. There is no reason to assume that the 500 shoppers questioned would be
representative of all Echague citizens.
Why? Because some of the shoppers asked may not even be from Echague, so results from that
sample could misrepresent the opinions of Echague citizens.
A parameter is a value used to describe the entire population being studied. For example, we want to
know the average length of a butterfly. This is a parameter because it states something about
the entire population of butterflies.
Parameters are difficult to obtain, but we use the corresponding statistic to estimate its value. A
statistic describes a sample of a population, while a parameter describes the entire population.
Examples:
1) A researcher wants to estimate the average height of women aged 20 years or older. From a
simple random sample of 45 women, the researcher obtains a sample mean height of 63.9
inches.
2) A nutritionist wants to estimate the mean amount of sodium consumed by children under the
age of 10. From a random sample of 75 children under the age of 10, the nutritionist obtains a
sample mean of 2993 milligrams of sodium consumed.
3) Nexium is a drug that can be used to reduce the acid produced by the body and heal damage
to the esophagus. A researcher wants to estimate the proportion of patients taking Nexium that
are healed within 8 weeks. A random sample of 224 patients suffering from acid reflux disease
is obtained, and 213 of those patients were healed after 8 weeks.
4) A researcher wants to estimate the average farm size in Kansas. From a simple random
sample of 40 farms, the researcher obtains a sample mean farm size of 731 acres.
5) An energy official wants to estimate the average oil output per well in the United States. From
a random sample of 50 wells throughout the United States, the official obtains a sample mean of
10.7 barrels per day.
6) An education official wants to estimate the proportion of adults aged 18 or older who had
read at least one book during the previous year. A random sample of 1006 adults aged 18 or
older is obtained, and 835 of those adults had read at least one book during the previous year.
Answers: For each study, identify both the parameter and the statistic in the study:
1) The parameter is the average height of all women aged 20 years or older.
The statistic is the average height of 63.9 inches from the sample of 45 women.
2) The parameter is the mean amount of sodium consumed by children under the age of ten.
The statistic is the mean of 2993 milligrams of sodium obtained from the sample of 75 children.
3) The parameter is the proportion of all patients taking Nexium who are healed within 8 weeks.
The statistic is 213/224 ≈ 0.951, the proportion healed in the sample of 224 patients.
4) The parameter is the average farm size of all farms in Kansas.
The statistic is the sample mean farm size of 731 acres from the sample of 40 farms.
5) The parameter is the average oil output per well in the United States.
The statistic is the mean oil output of 10.7 barrels per day from the sample of 50 wells.
6) The parameter is the proportion of adults 18 or older who read a book in the previous year.
The statistic is 835/1006 = 0.830, the proportion who read a book in the sample.
The most important requirement of probability sampling is that everyone in your population has
a known and an equal chance of getting selected. For example, if you have a population of 100
people every person would have odds of 1 in 100 for getting selected. Probability sampling
gives you the best chance to create a sample that is truly representative of the population.
Probability sampling uses statistical theory to randomly select a small group of people (the sample)
from an existing large population and then predict that their responses together will match those of
the overall population.
Types of Probability Sampling
1. Simple Random Sampling
- is a completely random method of selecting the sample.
- This sampling method is as easy as assigning numbers to the individuals in the population and
then randomly choosing from those numbers through an automated process. Finally, the
numbers that are chosen identify the members that are included in the sample.
Every element has an equal chance of being selected to be part of the sample.
There are two ways in which the samples are chosen in this method of sampling: Lottery system
and using number generating software/ random number table. This sampling technique usually
works around large population and has its fair share of advantages and disadvantages.
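As a quick illustration, a simple random sample can be drawn with a random number generator. The sketch below uses Python's standard library; the population of 100 numbered individuals and the sample size of 10 are made-up values for demonstration.

```python
import random

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))

# Draw a simple random sample of 10 IDs without replacement;
# every ID has the same chance of being chosen.
sample = random.sample(population, k=10)
print(sample)
```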
2. Stratified Random Sampling
- The population is divided into a number of subgroups (strata) before samples are taken randomly.
The number of samples in each group is proportional to the size of the subgroup relative to the
population.
A common method is to arrange or classify by sex, age, ethnicity and similar characteristics:
subjects are split into mutually exclusive groups, and then simple random sampling is used to
choose members from each group.
Members in each of these groups should be distinct so that every member of all groups gets
equal opportunity to be selected using simple probability. This sampling method is also called
“random quota sampling”.
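A minimal sketch of proportional stratified sampling, using hypothetical strata and sizes; the number drawn from each stratum is proportional to that stratum's share of the population.

```python
import random

# Hypothetical strata (e.g., year levels) with made-up member IDs.
strata = {
    "freshman":  [f"F{i}" for i in range(60)],   # 60 members
    "sophomore": [f"S{i}" for i in range(30)],   # 30 members
    "junior":    [f"J{i}" for i in range(10)],   # 10 members
}
total = sum(len(members) for members in strata.values())
sample_size = 20

sample = []
for name, members in strata.items():
    # Allocate the sample proportionally to the stratum size.
    n_from_stratum = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, n_from_stratum))

print(sample)  # 12 freshmen, 6 sophomores, 2 juniors
```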
3. Cluster Random Sampling
- is a way to randomly select participants when they are geographically spread out.
The population is divided into subgroups and a set of subgroups is selected to be in the
sample.
For example, if you wanted to choose 100 participants from the entire population of the U.S., it
is likely impossible to get a complete list of everyone. Instead, the researcher randomly selects
areas (i.e. cities or counties) and randomly selects from within those boundaries.
Cluster sampling usually analyzes a particular population in which the sample consists of more
than a few elements, for example, city, family, university etc. The clusters are then selected by
dividing the greater population into various smaller sections.
4. Systematic Sampling
- is when you choose every “nth” individual to be a part of the sample.
For example, you can choose every 5th person to be in the sample. Systematic sampling is an
extended implementation of the same old probability technique in which each member of the
group is selected at regular periods to form a sample. There’s an equal opportunity for every
member of a population to be selected using this sampling technique.
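A short sketch of systematic selection, assuming a hypothetical frame of 100 people and a desired sample of 20 (so every 5th person is taken after a random start):

```python
import random

# Hypothetical sampling frame of 100 people.
frame = list(range(1, 101))
n = 20                        # desired sample size
k = len(frame) // n           # sampling interval: every 5th person

start = random.randrange(k)   # random starting point within the first interval
sample = frame[start::k]      # then take every k-th element
print(sample)
```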
What are the steps involved in Probability Sampling?
1. Choose your population of interest carefully: Carefully think about and choose from the population
the people whose opinions you think should be collected, and then include them in the sample.
2. Determine a suitable sample frame: Your frame should include a sample from your population
of interest and no one from outside in order to collect accurate data.
3. Select your sample and start your survey: It can sometimes be challenging to find the right
sample and determine a suitable sample frame. Even if all factors are in your favor, there still
might be unforeseen issues like cost factor, quality of respondents and quickness to respond.
Getting a sample to respond to a true probability survey might be difficult but not impossible.
But, in most cases, drawing a probability sample will save you time, money, and a lot of
frustration. You probably can’t send surveys to everyone, but you can always give everyone a
chance to participate; this is what probability sampling is all about.
2. When the population is usually diverse: When your population size is large and diverse this
sampling method is usually used extensively as probability sampling helps researchers create
samples that fully represent the population. Say we want to find out how many people prefer
medical tourism over getting treated in their own country, this sampling method will help pick
samples from various socio-economic strata, background etc to represent the bigger population.
2. It’s simple and easy: Probability sampling is an easy method of sampling as it does not involve a
complicated process. It’s quick and saves time. The time saved can then be used to analyze the
data and draw conclusions.
3. It’s non-technical: This method of sampling doesn’t require any technical knowledge because
of the simplicity with which it can be done. It doesn’t require complex knowledge and it’s not at
all lengthy.
Non-Probability Sampling
- is a sampling technique in which the researcher selects samples based on the subjective
judgment of the researcher rather than random selection.
- Does not rely on randomization; thus, not all elements have a chance of being included in the
sample.
In non-probability sampling, not all members of the population have a chance of participating in
the study unlike probability sampling, where each member of the population has a known
chance of being selected.
Non-probability sampling is most useful for exploratory studies like pilot survey (a survey that is
deployed to a smaller sample compared to pre-determined sample size). Non-probability
sampling is used in studies where it is not possible to draw random probability sampling due to
time or cost considerations.
Non-probability sampling is a less stringent method; this sampling method depends heavily on
the expertise of the researchers. Non-probability sampling is carried out by methods of
observation and is widely used in qualitative research.
1. Convenience Sampling
- These samples are selected only because they are easy to recruit; the researcher does not
consider selecting a sample that represents the entire population.
Ideally, in research, it is good to test sample that represents the population. But, in some
research, the population is too large to test and consider the entire population. This is one of the
reasons, why researchers rely on convenience sampling, which is the most common non-
probability sampling technique, because of its speed, cost-effectiveness, and ease of availability
of the sample.
An example of convenience sampling would be using student volunteers known to the researcher.
The researcher can send the survey to the students, and they would act as the sample in this situation.
2. Consecutive Sampling
- This non-probability sampling technique is very similar to convenience sampling, with a
slight variation. Here, the researcher picks a single person or a group of people as the sample,
conducts research over a period of time, analyzes the results and then moves on to
another subject or group of subjects if needed.
Consecutive sampling gives the researcher a chance to work with many subjects and fine tune
his/her research by collecting results that have vital insights.
3. Quota Sampling
- is a variation on stratified sampling, wherein samples are collected in each subgroup
until the desired quota is met.
Consider, hypothetically, a researcher who wants to study the career goals of male and female
employees in an organization. There are 500 employees in the organization, and these 500
employees make up the population. In order to learn about the population, the researcher
needs only a sample, not the entire population. Further, the researcher is interested in
particular strata within the population. Here is where quota sampling helps in dividing the
population into strata or groups.
For studying the career goals of 500 employees, technically the sample selected should have
proportionate numbers of males and females. This means there should be 250 males and 250
females. Since this is unlikely, the groups or strata are selected using quota sampling.
This is not a scientific method of sampling and the downside to this sampling technique is that
the results can be influenced by the preconceived notions of a researcher. Thus, there is a high
amount of ambiguity involved in this research technique.
For example, this type of sampling method can be used in pilot studies.
5. Snowball Sampling
- helps researchers find samples when subjects are difficult to locate. Researchers use this
technique when the sample size is small and not easily available.
This sampling system works like the referral program. Once the researchers find suitable
subjects, they are asked for assistance to seek similar subjects to form a considerably good size
sample.
For example, this type of sampling can be used to conduct research involving a particular illness
in patients or a rare disease. Researchers can seek help from subjects to refer other subjects
suffering from the same ailment to form a subjective sample to carry out the study.
7. Referral Sampling
- samples are based on referrals.
SOURCES OF BIAS
1. Sampling Bias – when the sample is not representative of the population
2. Voluntary Response Bias – often occurs when the samples are volunteers
3. Self-Interest Study – can occur when the researchers have an interest in the outcome
4. Response Bias – when the responder gives inaccurate responses for any reason
5. Perceived Lack of Anonymity – when the responder fears giving an honest answer
6. Loaded Questions – when the question wording influences the responses
7. Non-Response Bias – when people refuse to participate in the study
Types of Variables
1. Independent Variable/Treatment – the variable that is manipulated by the researcher.
2. Dependent Variable – one that is observed for changes in order to assess the effect of the
treatment.
Types of Data
● Types of Numerical/Quantitative
Interval – numbers/scale w/o true ZERO
Ratio – numbers/scale w/ true ZERO
Discrete – only certain values are possible
Continuous – Any value
● Types of Categorical/Qualitative
Nominal – Unordered categories (eye color, gender)
Ordinal – ordered categories (taxonomic classification, Educational attainment)
SUMMATION NOTATION
In statistics, it is frequently necessary to work with sums of numerical values. For example, we
may wish to compute the average cost of a certain brand of toothpaste sold at different stores.
Consider a controlled experiment in which the decreases in weight over a 6-month period were
15, 10, 18, and 6 kilograms, respectively. If we designate the first recorded value x1 = 15, x2 =
10, x3 = 18 and x4 = 6, then we can use the capital sigma letter Σ to indicate “summation of”,
where we can now write the sum of the 4 weights as
∑_{i=1}^{4} x_i
where we read “summation of x_i, i going from 1 to 4.”
The numbers 1 and 4 are called the lower and upper limits of summation.
Hence,
∑_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 15 + 10 + 18 + 6 = 49
Also,
∑_{i=2}^{3} x_i = x_2 + x_3 = 10 + 18 = 28
In general, the symbol ∑_{i=1}^{n} means that we replace i wherever it appears after the
summation symbol by 1, then by 2, and so on up to n, and then add up the terms.
The subscript may be any letter, although i, j and k seem to be preferred by statisticians.
Obviously,
∑_{i=1}^{n} x_i = ∑_{j=1}^{n} x_j
The lower limit of summation is not necessarily a subscript. For instance, the sum of the natural
numbers from 1 to 9 may be written:
∑_{x=1}^{9} x = 1 + 2 + ⋯ + 9 = 45
When we are summing over all the values xi that are available, the limits of summation are
often omitted and we simply write Σxi. If in the diet experiment only 4 people were involved,
then Σxi = 𝑥1 + 𝑥2 + 𝑥3 + 𝑥4 .
Example 4:
If x1 = 3, x2 = 5 and x3 = 7, find Σx_i.
Solution:
Σx_i = x_1 + x_2 + x_3 = 3 + 5 + 7 = 15
Example 5:
If x1 = 2, x2 = −3, x3 = 1 and y1 = 4, y2 = 2, y3 = 5, find ∑_{i=1}^{3} x_i y_i.
Solution:
∑_{i=1}^{3} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 = (2)(4) + (−3)(2) + (1)(5) = 7
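These sums are easy to check numerically; the sketch below reproduces Examples 4 and 5 with Python's built-in sum().

```python
# Example 4: x1 = 3, x2 = 5, x3 = 7
x = [3, 5, 7]
print(sum(x))                                   # 15

# Example 5: sum of the products x_i * y_i
x = [2, -3, 1]
y = [4, 2, 5]
print(sum(xi * yi for xi, yi in zip(x, y)))     # (2)(4) + (-3)(2) + (1)(5) = 7
```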
Theorem 1:
The summation of the sum of two or more variables is the sum of their summations. Thus
∑_{i=1}^{n} (x_i + y_i + z_i) = ∑_{i=1}^{n} x_i + ∑_{i=1}^{n} y_i + ∑_{i=1}^{n} z_i
Proof:
∑_{i=1}^{n} (x_i + y_i + z_i) = (x_1 + y_1 + z_1) + (x_2 + y_2 + z_2) + … + (x_n + y_n + z_n)
= (x_1 + x_2 + … + x_n) + (y_1 + y_2 + … + y_n) + (z_1 + z_2 + … + z_n)
= ∑_{i=1}^{n} x_i + ∑_{i=1}^{n} y_i + ∑_{i=1}^{n} z_i
Theorem 2:
If c is a constant, then
∑_{i=1}^{n} c·x_i = c ∑_{i=1}^{n} x_i
Proof:
∑_{i=1}^{n} c·x_i = c·x_1 + c·x_2 + … + c·x_n = c(x_1 + x_2 + … + x_n) = c ∑_{i=1}^{n} x_i
Theorem 3:
If c is a constant, then
∑_{i=1}^{n} c = nc
Proof: If in Theorem 2 all the x_i are equal to 1, then
∑_{i=1}^{n} c = c + c + … + c = nc
Example 6:
If x1 = 2, x2 = 4, y1 = 3, y2 = -1, find the value of
∑_{i=1}^{2} (3x_i − y_i + 4) = 3 ∑_{i=1}^{2} x_i − ∑_{i=1}^{2} y_i + ∑_{i=1}^{2} 4
= (3)(2 + 4) − [3 + (−1)] + (2)(4)
= 18 − 2 + 8
= 24
Example 7:
Simplify
∑_{i=1}^{3} (x − i)² = ∑_{i=1}^{3} (x² − 2xi + i²)
= 3x² − 2x(1 + 2 + 3) + (1² + 2² + 3²)
= 3x² − 12x + 14
MEASURES OF CENTRAL LOCATION
An average is a measure of the center of a set of data when the data are arranged in an
increasing or decreasing order of magnitude.
Any measure indicating the center of a set of data, arranged in an increasing or decreasing
order of magnitude is called measure of central location or a measure of central tendency.
The most commonly used measures of central location are the mean, median and mode.
Population Mean. If the set of data x1, x2, …, xN represents a finite population of size N, then the
population mean is
μ = ∑_{i=1}^{N} x_i / N
Example 1 (Ungrouped Data): Treating the numbers 3, 5, 6, 4, and 6 as a finite population, find the mean.
Solution: Since the data are considered to be a finite population,
μ = ∑ x_i / N = (3 + 5 + 6 + 4 + 6)/5 = 4.8
Sample Mean. If the set of data x1, x2 … xn, not necessarily all distinct, represents a finite
sample of size n, then the sample mean is
x̄ = ∑_{i=1}^{n} x_i / n
Example 2 (Ungrouped Data): A food inspector examined a random sample of 7 cans of a
certain brand of tuna to determine the percent of foreign impurities. The following data were
recorded: 1.8, 2.1, 1.7, 1.6, 0.9, 2.7 and 1.8. Compute the sample mean.
Solution: This being a sample, we have
x̄ = ∑ x_i / n = (1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8)/7 = 1.8%
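The same computation can be verified with Python's statistics module:

```python
from statistics import mean

impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]
print(mean(impurities))   # 1.8 (percent foreign impurities)
```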
Mean of Grouped Data (coded-deviation method):
x̄ = x̄′ + (∑fd′/n)·i
where x̄′ is the assumed mean (the class mark of the class where d′ = 0), f is the class frequency,
d′ is the coded deviation of each class, i is the class width, and n is the total frequency.
Example 4:
Class Interval f x (midpoint) d’ fd’
90-94 7 92 2 14
85-89 13 87 1 13
80-84 16 82 0 0
75-79 8 77 -1 -8
70-74 6 72 -2 -12
N= 50 Σfd’ = 7
x̄ = x̄′ + (∑fd′/n)·i = 82 + (7/50)(5) = 82 + 0.7 = 82.7 or 83
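A small sketch of the coded-deviation computation, using the class marks, frequencies and coded deviations from the table above:

```python
# Grouped data from Example 4: (class mark, frequency, coded deviation d')
classes = [(92, 7, 2), (87, 13, 1), (82, 16, 0), (77, 8, -1), (72, 6, -2)]

assumed_mean = 82          # class mark of the class where d' = 0
width = 5                  # class width i
n = sum(f for _, f, _ in classes)               # 50
sum_fd = sum(f * d for _, f, d in classes)      # 7

mean = assumed_mean + (sum_fd / n) * width
print(mean)                # 82.7
```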
The second most useful measure of central location is the median. For a population we
designate the median by μ̃ and for a sample we write x̃.
Example 5 (Ungrouped): On 5 term tests in sociology, a student has made grades of 82, 93,
86, 92, and 79. Find the median for this population of grades.
Solution: Arranging the grades in an increasing order of magnitude, we get 79, 82, 86, 92, 93
and hence, μ̃ = 86
Example 6 (Ungrouped): Find the median of the nicotine contents, in milligrams, of a random
sample of 6 cigarettes: 1.9, 2.3, 2.5, 2.7, 2.9, and 3.1.
Solution: If we arrange these nicotine contents in an increasing order of magnitude, we get 1.9,
2.3, 2.5, 2.7, 2.9, 3.1 and the median is then the mean of 2.5 and 2.7.
Therefore,
x̃ = (2.5 + 2.7)/2 = 2.6 mg
Example 7 (Grouped):
Scores    Frequency (f)    Exact Lower Limit or Lower Boundary (L)    Cumulative Frequency (F)
90-94 7 89.5 50
85-89 13 84.5 43
80-84 16 79.5 30
75-79 8 74.5 14
70-74 6 69.5 6
i=5 N=50
Mode. The mode of a set of data is the value that occurs most often, that is, with the greatest frequency.
Example 8 (Ungrouped): If the donations from the residents of Fairway Forest toward the
Virginia Lung Association are recorded as 9, 10, 5, 9, 9, 8, 6, 10 and 11 dollars, then 9 dollars,
the value that occurs with the greatest frequency, is the mode.
The number of movies attended last month by a random sample of 12 high school students
were recorded as follows: 2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, and 4. In this case, there are two modes,
2 and 4, since both 2 and 4 occur with the greatest frequency. The distribution is said to be
bimodal.
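Python's statistics module reports the median and all modes directly; the sketch below reuses the donation and movie data above.

```python
from statistics import median, multimode

donations = [9, 10, 5, 9, 9, 8, 6, 10, 11]
movies = [2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, 4]

print(multimode(donations))   # [9]      -> single mode
print(multimode(movies))      # [2, 4]   -> bimodal
print(median(movies))         # 2.0
```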
Example 9 (Grouped):
Scores    Frequency (f)    Exact Lower Limit or Lower Boundary (L)    Cumulative Frequency (F)
90-94 7 89.5 50
85-89 13 84.5 43
80-84 16 79.5 30
75-79 8 74.5 14
70-74 6 69.5 6
i=5 N=50
MEASURES OF VARIATION
The three measures of central location do not by themselves give an adequate description of
the data. We need to know how the observations spread out from the average. It is quite
possible to have two sets of observations with the same mean or median that differs
considerably in the variability of their measurements about the average.
Example 10: Consider the following measurements, in liters, for two samples of orange juice
bottled by companies A and B:
Sample A 0.97 1.00 0.94 1.03 1.06
Sample B 1.06 1.01 0.88 0.91 1.14
Both samples have the same mean of 1 liter. It is quite obvious that company A bottles orange
juice with a more uniform content than company B. We say that the variability or the dispersion
of the observations from the average is less for sample A than for sample B. Therefore, in
buying orange juice, we would feel more confident that the bottle we select will be closer to the
advertised average if we buy from company A.
The most important statistics for measuring the variability of a set of data are the range and the
variance. The simplest of these to compute is the range.
Example 11 (Ungrouped): The IQs of 5 members of a family are 108, 112, 127, 118, and 113.
Find the range.
Solution: The range is the difference between the largest and smallest values: 127 − 108 = 19.
Example 12 (Grouped)
Scores    f    Lower Boundary    Upper Boundary
90-94    7    89.5    94.5
85-89    13    84.5    89.5
80-84    16    79.5    84.5
75-79    8    74.5    79.5
70-74    6    69.5    74.5
Solution: For grouped data, the range is the difference between the upper boundary of the highest
class and the lower boundary of the lowest class: 94.5 − 69.5 = 25.
The range is a poor measure of variation, particularly if the size of the sample or population is
large. It considers only the extreme values and tells us nothing about the distribution of numbers
in between. Consider, for example, the following two sets of data, both with range of 12.
Set A 3 4 5 6 8 9 10 12 15
Set B 3 7 7 7 8 8 8 9 15
In set A the mean and median are both 8, but the numbers vary over the entire interval from 3 to
15.
In set B the mean and median are also 8, but most of the values are closer to the center of the
data.
Although the range fails to measure this variation between the upper and lower observations, it
does have some useful applications. In industry the range for measurements on items coming off
an assembly line might be specified in advance. As long as all measurements fall within the
specified range, the process is said to be in control.
Variance
To overcome the disadvantage of the range, we shall consider a measure of variation, namely,
the variance that considers the position of each observation relative to the mean of the set. This
is accomplished by examining the deviations from the mean.
The deviation of an observation from the mean is found by subtracting the mean of set of data
from the given observation.
Note: An observation greater than the mean will yield a positive deviation, whereas an
observation smaller than the mean will produce a negative deviation.
Comparing the deviations for the two sets of data above, we have the following:
Set A values:     3  4  5  6  8  9  10  12  15
Set B values:     3  7  7  7  8  8   8   9  15
Set A deviations: -5 -4 -3 -2  0  1   2   4   7
Set B deviations: -5 -1 -1 -1  0  0   0   1   7
Clearly, most of the deviations of set B are smaller in magnitude than those of set A, indicating
less variation among the observations of set B. Our aim now is to obtain a single numerical
measure of variation that incorporates all the deviations from the mean.
Population Variance. Given the finite population x1, x2, … , xN, the population variance is
σ² = ∑_{i=1}^{N} (x_i − μ)² / N        (for grouped data: σ² = ∑ f(x_i − μ)² / N)
Assuming that the two sets A and B are populations (from our previous example), we now use
the deviations in the table below to calculate their variance.
Set A deviations: -5 -4 -3 -2  0  1  2  4  7
Set B deviations: -5 -1 -1 -1  0  0  0  1  7
Set A: σ² = [(−5)² + (−4)² + ⋯ + 7²]/9 = 124/9 ≈ 13.78
Set B: σ² = [(−5)² + (−1)² + ⋯ + 7²]/9 = 78/9 ≈ 8.67
A comparison of the two variances shows that the data of set A are more variable than the data
of set B.
On the other hand, the variance of a sample, denoted by s², is a statistic. Therefore, different
random samples of size n, selected from the entire population, would generally yield different
values for s². In most statistical applications, the parameter σ² is unknown and is estimated by
the value s². For our estimate to be good, it must be computed from a formula that on the
average produces the true answer σ². That is, if we were to take all the possible random
samples of size n from a population and compute s² for each sample, the average of all the s²
values should be equal to σ². A statistic that estimates the true parameter on the average is
said to be unbiased.
Intuitively, we would expect the formula for s² to be the same summation formula as that used for σ²,
with the summation now extending over the sample observations and with μ replaced by x̄. This
is indeed done in many texts, but the values so computed for the sample variance tend to
underestimate σ² on the average. To compensate for this bias, we replace N by n − 1 in the
divisor.
Sample Variance. Given a random sample x1, x2, … , xn, the sample variance is
s² = ∑_{i=1}^{n} (x_i − x̄)² / (n − 1)        (for grouped data: s² = ∑ f(x_i − x̄)² / (n − 1))
Suppose the observations or data are from a random sample, then the sample variance is
Set A deviations:         -5 -4 -3 -2  0  1  2  4  7
Set B deviations:         -5 -1 -1 -1  0  0  0  1  7
Set A squared deviations: 25 16  9  4  0  1  4 16 49
Set B squared deviations: 25  1  1  1  0  0  0  1 49
Set A: s² = ∑(x_i − x̄)²/(n − 1) = (25 + 16 + ⋯ + 49)/(9 − 1) = 124/8 = 15.5
Set B: s² = ∑(x_i − x̄)²/(n − 1) = (25 + 1 + ⋯ + 49)/(9 − 1) = 78/8 = 9.75
By using the squares of the deviations to compute the variance, we obtain a number in squared
units. That is, if the original measurements were in feet, the variance would be expressed in
square feet. To get a measure of variation expressed in the same units as raw data, as was the
case for the range, we take the square root of the variance. Such a measure is called the
standard deviation.
The standard deviation is the measure of the variation of a set of data in terms of the amounts
by which the individual values differ from their mean. It is considered as the most stable
measure of spread, and is usually preferred in experimental and research studies where in-depth
statistical analysis of data is involved. It is affected by the value of every observation.
To find the standard deviation of the population variance (𝜎 2 ) and sample variance (s2) , just
take the square root.
σ = √σ² = √[ ∑_{i=1}^{N} (x_i − μ)² / N ]        s = √s² = √[ ∑_{i=1}^{n} (x_i − x̄)² / (n − 1) ]
Example 13 (Ungrouped): Calculate the variance and the standard deviation of the given
scores
Scores (x)    Deviation d = (x − x̄)    Squared Deviation (d²)
45    0.2    0.04
42    -2.8    7.84
46    1.2    1.44
43    -1.8    3.24
48    3.2    10.24
x̄ = 44.8        Σd² = 22.80
s² = Σ(x − x̄)²/(n − 1) = 22.80/(5 − 1) = 5.70
s = √s² = √5.70 ≈ 2.39
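These results can be verified with Python's statistics module, which also uses n − 1 in the divisor for the sample variance and standard deviation:

```python
from statistics import variance, stdev

scores = [45, 42, 46, 43, 48]
print(variance(scores))   # 5.7   (sample variance)
print(stdev(scores))      # 2.387... (sample standard deviation)
```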
Example 14 (Grouped):
Scores    Frequency (f)    Midpoint (x)    f(x)    Deviation (d = x − x̄)    d²    f(d²)
90-94 7 92 644 9 81 567
85-89 13 87 1131 4 16 208
80-84 16 82 1312 -1 1 16
75-79 8 77 616 -6 36 288
70-74 6 72 432 -11 121 726
𝑥̅ =83 n= 50 Σf(x)=4135 Σf(d2)= 1805
s = √[ ∑ f(x_i − x̄)² / (n − 1) ] = √(1805/49) = √36.84 = 6.07
CHEBYSHEV’S THEOREM
In sections 2.2 and 2.3 we described a set of observations--a population or a sample—by
means of a center or average and the variability about this average. The two values most often
used by statisticians are the mean and the standard deviation. If a distribution of measurements
has a small standard deviation, we would expect most of the values to be grouped closely
around the mean. However, a large value of the standard deviation indicates a greater
variability, in which case we would expect the observation to be more spread out from the
mean.
The Russian mathematician P.L. Chebyshev (1821-1894) discovered that the fraction of the
measurements falling between any two values symmetric about the mean is related to the
standard deviation. Chebyshev’s Theorem gives a conservative estimate of the fraction of
measurements falling within k standard deviations of the mean for any fixed number k.
Chebyshev’s Theorem. At least the fraction 1 – 1/k2 of the measurements of any set of data
must lie within k standard deviations of the mean.
Example 15: If the IQs of a random sample of 1080 students at a large university have a mean
score of 120 and a standard deviation of 8, use Chebyshev’s Theorem to determine the interval
containing at least 810 of the IQs in the sample.
From this interval draw a statistical inference concerning the IQs of all students at this
university. In what range can we be sure that no more than 120 of the scores fall?
Solution: Since at least 810 of the 1080 IQs must fall in the interval, we set 1 − 1/k² = 810/1080 = 3/4,
which gives k = 2. The required interval is therefore x̄ ± 2s = 120 ± (2)(8), or from 104 to 136.
This means that the interval from 104 to 136 contains at least ¾, or at least 810, of the IQs of
our sample. From this result, one might make the inference that at least ¾ of the IQs for the
entire university fall in the interval from 104 to 136. Similarly, taking k = 3 gives 1 − 1/k² = 8/9,
so at most 1/9 of the 1080 scores, that is, no more than 120 of them, can fall outside the interval
120 ± (3)(8), or from 96 to 144.
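A small sketch of the Chebyshev computation in Example 15: it solves 1 − 1/k² = 810/1080 for k and then forms the interval.

```python
import math

mean, sd, n = 120, 8, 1080
fraction_needed = 810 / n                 # 0.75

# 1 - 1/k**2 = fraction  ->  k = 1 / sqrt(1 - fraction)
k = 1 / math.sqrt(1 - fraction_needed)    # 2.0
low, high = mean - k * sd, mean + k * sd
print(k, low, high)                       # 2.0 104.0 136.0
```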
Z-SCORES
An observation, x, from a population with mean μ and standard deviation 𝜎, has a z-score or z
value defined by
z = (x − μ)/σ
Where: z= z value
x = observation
𝜇 = population mean
𝜎 = population standard deviation
A z score measures how many standard deviations an observation is above or below the mean.
Since 𝜎 is never negative, a positive z score measures the number of standard deviations an
observation is above the mean, and a negative z score gives the number of standard deviations
an observation is below the mean. Note that the units of the denominator and the numerator of
a z score cancel. Hence a z score is unitless, thereby permitting a comparison of two
observations relative to their groups, measured in completely different units.
Example 16: Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm or for a research mathematical group at a major
university. In order to evaluate candidates for these positions, an employment agency
administers three distinct standardized typing samples. A time penalty has been incorporated
into the scoring of each sample based on the number of typing errors. The mean and standard
deviation for each test, together with the score achieved by a recent applicant are given in the
table.
Sample Applicant’s Score Mean Standard Deviation
Law    141 sec    180 sec    30 sec
Accounting    7 min    10 min    2 min
Scientific    33 min    26 min    5 min
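As a quick check, the z-score formula can be applied to each test using the values in the table above. Since each score includes a time penalty, a lower score (and hence a more negative z value) presumably indicates a relatively better performance.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations an observation lies from the mean."""
    return (x - mu) / sigma

# (applicant's score, group mean, group standard deviation)
tests = {"Law": (141, 180, 30), "Accounting": (7, 10, 2), "Scientific": (33, 26, 5)}

for name, (x, mu, sigma) in tests.items():
    print(name, z_score(x, mu, sigma))
# Law -1.3, Accounting -1.5, Scientific 1.4
```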
FREQUENCY DISTRIBUTIONS
Important characteristics of a large mass of data can be readily assessed by grouping the data
into different classes and then determining the number of observations that fall in each of the
classes. Such an arrangement, in tabular form, is called a frequency distribution.
Data that are presented in the form of a frequency distribution are called grouped data. We
often group the data of a sample into intervals to produce a better overall picture of the unknown
population, but in so doing we lose the identity of the individual observations in the sample.
Example 1:
Weight (Kilograms) Number of Pieces
7-9 2
10-12 8
13-15 14
16-18 19
19-21 7
For this data we have used 5 class intervals namely 7-9, 10-12, 13-15, 16-18, and 19-21.
We often group the data of a sample into intervals to produce a better overall picture of the
unknown population, but in so doing we lose the identity of the individual observations in the
sample.
The original data were recorded to the nearest kilogram, so the 8 observations in the interval
10-12 are the weights of all the pieces of luggage weighing more than 9.5 kilograms but less
than 12.5 kilograms. The numbers 9.5 and 12.5 are called the class boundaries for the given
interval.
The smallest and largest values that can fall in a given class interval are referred to as its class
limits.
The smallest value in the class interval is called the lower class limit, while the largest value is
called the upper class limit.
Class boundaries are always carried out to one more decimal place than the recorded
observations. This ensures that no observation can fall precisely on a class boundary and
thereby avoids any confusion as to which class the observation belongs.
Lower Class Boundary – obtained by subtracting 0.5 from the lower class limit of the class interval.
Upper Class Boundary – obtained by adding 0.5 to the upper class limit of the class interval.
Class Frequency- It is the number of observations falling in a particular class and is denoted by
the letter f.
Class Width - It is the numerical difference between the upper and lower class boundaries of a
class interval.
Class Mark or Class Midpoint - It is the midpoint between the upper and lower class
boundaries or class limits of a class interval.
Example 2:
Weight in Kilograms (c)    Number of Pieces (f)    Class Boundaries    Class Mark (x)
7-9 2 6.5-9.5 8
10-12 8 9.5-12.5 11
13-15 14 12.5-15.5 14
16-18 19 15.5-18.5 17
19-21 7 18.5-21.5 20
Example 3: To illustrate the construction of frequency distribution consider the following data
which represents the lives of 40 similar car batteries recorded to the nearest tenth of a year. The
batteries are guaranteed to last 3 years.
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
A variation of this presentation can be obtained by listing the relative frequencies or
percentages for each interval. How?
The relative frequency of each class can be obtained by dividing the class frequency by the total
frequency. A table listing the relative frequencies is called a relative frequency distribution. If
each relative frequency is multiplied by 100%, we have percentage distribution.
In many situations, we are more concerned with the number of observations that fall below a specified
value. For example, instead of identifying the number of batteries that lasted at least 3 years, we may
be more concerned with the 7 batteries that lasted less than 3 years.
The total frequency of all values less than the upper class boundary of a given interval is called
the cumulative frequency up to and including that class. A table showing the cumulative
frequency is called cumulative frequency distribution.
The percentage cumulative distribution enables one to read off the percentage of observations
falling below certain specified values.
Class Interval    Class Boundaries    Class Midpoint    Frequency    Relative Frequency    Percentage    Cumulative Frequency    Cumulative Percent
1.5-1.9    1.45-1.95    1.7    2    0.050    5.0    2    5.0
2.0-2.4    1.95-2.45    2.2    1    0.025    2.5    3    7.5
2.5-2.9    2.45-2.95    2.7    4    0.100    10.0    7    17.5
3.0-3.4    2.95-3.45    3.2    15    0.375    37.5    22    55.0
3.5-3.9    3.45-3.95    3.7    10    0.250    25.0    32    80.0
4.0-4.4    3.95-4.45    4.2    5    0.125    12.5    37    92.5
4.5-4.9    4.45-4.95    4.7    3    0.075    7.5    40    100.0
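The distribution above can be rebuilt programmatically from the 40 recorded battery lives; the sketch below counts the class frequencies and accumulates the relative and cumulative columns.

```python
lives = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6, 3.4, 1.6, 3.1, 3.3, 3.8, 3.1,
         4.7, 3.7, 2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1, 3.3, 3.1, 3.7, 4.4,
         3.2, 4.1, 1.9, 3.4, 4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

# Class intervals (lower limit, upper limit), matching the table above.
limits = [(1.5, 1.9), (2.0, 2.4), (2.5, 2.9), (3.0, 3.4),
          (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]

n = len(lives)
cumulative = 0
for low, high in limits:
    f = sum(1 for x in lives if low <= x <= high)   # class frequency
    cumulative += f
    print(f"{low}-{high}  f={f:2d}  rel={f/n:.3f}  cum={cumulative:2d}  cum%={100*cumulative/n:.1f}")
```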
The examples discussed above involve numerical data, but take note that frequency
tables are also applicable to categorical data.
Graphical Representations
The information provided by a frequency distribution in tabular form is easier to grasp if
presented graphically. Most people find a visual picture beneficial in comprehending the
essential features of a frequency distribution.
Example 4:
Although the bar chart provides immediate information about a set of data in a condensed form,
we are usually more interested in a related pictorial representation called a histogram.
A histogram differs from a bar chart in that the bases of each bar are the class boundaries
rather than the class limits. The use of class boundaries for the bases eliminates the spaces
between the bars to give the solid appearance.
Example 5:
Frequency Polygon. A second useful way of presenting numerical data in graphic form is by
means of frequency polygon. Frequency Polygons are constructed by plotting class
frequencies against class marks and connecting the consecutive points by straight lines.
A polygon is a many sided closed figure. To close the frequency polygon, an additional class
interval is added to both ends of the distribution, each with zero frequency.
Relative Frequency Polygon. If we wish to compare two sets of data with unequal sample
sizes by constructing two frequency polygons on the same graph, we must use relative
frequencies or percentages. This is called a relative frequency polygon or a percentage
polygon.
Cumulative Frequency Polygon or Ogive. A line graph obtained by plotting
the cumulative frequency less than any upper class boundary against the upper class boundary
and joining all the consecutive points by straight lines. If relative cumulative frequencies or
percentages had been used, we would call the graph a relative frequency ogive or a
percentage ogive.
SYMMETRY AND SKEWNESS
The shape or distribution of a set of measurements is best displayed by means of a histogram.
A distribution is said to be symmetric if it can be folded along a vertical axis so that the two
sides coincide. While a distribution that lacks symmetry with respect to a vertical axis is said to
be skewed.
Example 6: Symmetric
Example: Skewed
The first distribution is skewed to the right, or positively skewed, since it has a long right tail
compared to a much shorter left tail.
For a symmetric distribution of measurements, the mean and median are both located at the
same position along the horizontal axis. However, if the data are skewed to the right, the large
values in the right tail are not offset by correspondingly low values in the left tail and
consequently the mean will be greater than the median. Conversely, if the data are skewed to the
left, the small values in the left tail will make the mean less than the median. We shall use this
behaviour of the mean and the median relative to the standard deviation to define a numerical
measure of skewness.
For a perfectly symmetrical distribution, the mean and the median are identical and the value of
SK is zero.
SK is the Pearsonian coefficient of skewness, which is given by:
SK = 3(x̄ − x̃)/s        or        SK = 3(μ − μ̃)/σ
Where: SK= Coefficient of Skewness
𝑥̅ = 𝜇 = mean
𝑥̃ = 𝜇̃ =median
s = standard deviation
When the distribution is skewed to the left, the mean is less than the median and the value of
SK will be negative. However, if the distribution is skewed to the right, the mean is greater than
the median and the value of SK will be positive. In general, the values of SK will fall between -3
and 3.
Example 7: Compute the Pearsonian coefficient of skewness for the distribution of battery lives
in the previous examples:
Solution: Assuming the data to be a sample, we find that the mean is 3.41, the median is 3.4
and the standard deviation is 0.70. Therefore,
SK = 3(x̄ − x̃)/s = 3(3.41 − 3.4)/0.70 = 0.04
indicating only a very slight amount of skewness to the right. With such a small value of SK, we
could essentially say that the distribution is symmetrical.
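A one-line check of the coefficient, using the summary values quoted in the text:

```python
# Summary values for the battery-life sample, as computed in the text.
mean, median, s = 3.41, 3.4, 0.70

sk = 3 * (mean - median) / s
print(round(sk, 2))   # 0.04 -> very slight skewness to the right
```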
Note: Although histograms assume a wide variety of shapes, fortunately most distributions that
we meet in practice can be represented approximately by bell-shaped histograms for which the
SK will be very close to zero. This will be true for any set of data where the frequency of the
observations falling in the various classes decreases at roughly the same rate as we get farther
out in the tails of the distribution. These bell-shaped distributions play a major role in the field of
statistical inference. Some are understandably more variable than others as reflected by a flatter
and wider histogram. Chebyshev’s Theorem tells us that at least ¾ or 8/9 of the observations of
any distribution, bell-shaped or not, will be within 2 or 3 standard deviations of the mean,
respectively. If the distribution happens to be somewhat bell-shaped, we can state a rule that
gives even stronger results.
Empirical Rule:
Given a bell-shaped distribution of measurements, then approximately
68% of the observations lie within 1 standard deviation of the mean
95% of the observations lie within 2 standard deviations of the mean
99.7% of the observations lie within 3 standard deviations of the mean.
Example 8:
From the given example on car batteries, we find that the mean is 3.41 and standard deviation
is 0.70. Now the Empirical Rule states that approximately 68% or 27 of the 40 observations
should be contained in the interval mean(x) ± s = 3.41± 0.70, or from 2.71 to 4.11. An actual
count shows that 28 of the 40 observations fall in the given interval. Similarly, 95% or 38 out of
40 observations should fall in the interval mean ± 2s = 3.41 ± 2(0.70) or from 2.01 to 4.81. The
actual count this time shows that 38 of the 40 observations fall in the specified interval. The
interval mean ± 3s = 3.41 ± 3(0.70) or from 1.31 to 5.51, contains all measurements. By
Chebyshev’s theorem we could only have concluded that at least 30 observations will fall in the
interval from 2.01 to 4.81 and at least 36 observations will fall between 1.31 to 5.51.
PERCENTILES, DECILES AND QUARTILES
We already discussed the measures of central location, but there are several other measures of
location that describe or locate the position of certain non-central pieces of data relative to the
entire set of data. These measures often referred to as fractiles or quantiles.
Fractiles or Quantiles. These are values below which a specific fraction or percentage of the
observations in a given set must fall.
Percentiles. These are values that divide a set of observations into 100 equal parts. These
values, denoted by P1, P2, … P99 , are such that 1% of the data falls below P1, 2% falls below
P2, … , and 99% falls below P99.
Example 9. Find the P85 for the distribution of the battery lives in the previous example.
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
Solution: First, rank the given data in increasing order of magnitude.
1.6 2.6 3.1 3.2 3.4 3.7 3.9 4.3
1.9 2.9 3.1 3.3 3.4 3.7 3.9 4.4
2.2 3.0 3.1 3.3 3.5 3.7 4.1 4.5
2.5 3.0 3.2 3.3 3.5 3.8 4.1 4.7
2.6 3.1 3.2 3.4 3.6 3.8 4.2 4.7
Solution: Since the table contains 40 observations, we seek the value below which (85/100) *
40 = 34 observations fall. Based on the table, P85 could be any value between 4.1 years and
4.2 years. In order to give a unique value, we shall define P85 to be the value midway between
these two observations. Therefore, P85 = 4.15 years.
This procedure works very well whenever the number of observations below the given
percentile is a whole number. However, when the required number of observations is fractional,
it is customary to use the next highest whole number to find the required percentile.
Example 10: Find P48 in the given set of observations of the 40 car batteries.
Solution: We seek the value below which (48/100)(40) = 19.2 observations fall. Since this is not a
whole number, round up to the next integer and use the 20th observation as the location point.
Hence, P48 = 3.4 years.
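The rule used in Examples 9 and 10 (round a fractional rank up to the next whole observation, and average two observations when the rank is a whole number) can be written as a small helper. Note that this is the text's convention, not the interpolation rule used by some software.

```python
import math

def percentile(data, p):
    """p-th percentile following the text's rule for ungrouped data."""
    values = sorted(data)
    rank = p * len(values) / 100
    if rank.is_integer():
        k = int(rank)
        # value midway between the k-th and (k+1)-th ordered observations
        return (values[k - 1] + values[k]) / 2
    return values[math.ceil(rank) - 1]   # round the fractional rank up

lives = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6, 3.4, 1.6, 3.1, 3.3, 3.8, 3.1,
         4.7, 3.7, 2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1, 3.3, 3.1, 3.7, 4.4,
         3.2, 4.1, 1.9, 3.4, 4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

print(percentile(lives, 85))   # 4.15
print(percentile(lives, 48))   # 3.4
```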
Example 11: Find P48 for the distribution of battery lives shown in the frequency table above.
Solution: We are seeking the value below which (48/100) * 40 = 19.2 of the observations fall.
The fact that the observations are assumed uniformly distributed over the class interval permits
us to use fractional observations, as is the case here. There are 7 observations falling below the
class boundary 2.95. We still need 12.2 of the next 15 observations falling between 2.95 and
3.45. Therefore, we must go a distance (12.2/15) * 0.5 = 0.41 beyond 2.95.
Hence, P48 = 2.95 + 0.41
= 3.36 years
compared with 3.4 years obtained above from the ungrouped data. Therefore, we conclude that
48% of all batteries of this type will last less than 3.36 years.
Deciles. These are values that divide a set of observations into 10 equal parts. These values,
denoted by D1, D2, …, D9, are such that 10% of the data falls below D1, 20% falls below D2, …
and 90% falls below D9.
Example 12: Use the frequency distribution of the lives of car batteries to find the D 7.
Solution: We need the value below which (70/100) * 40 = 28 observations fall. There are 22
observations falling below 3.45. We still need 6 of the next 10 observations and therefore, we
must go a distance (6/10) * 0.5 = 0.3 beyond 3.45. Hence,
D7 = 3.45 + 0.3 = 3.75 years.
Deciles are found in exactly the same way that we found percentiles. To find D 7 for the
distribution of battery lives, we need the value below which (70/100) * 40 = 28 of the
observations fall. Since this can be any value between 3.7 years and 3.8 years, we take their
average and hence D7 = 3.75 years. Therefore, we conclude that 70% of all batteries of this
type will last less than 3.75 years.
Quartiles. These are values that divide a set of observations into 4 equal parts. These values,
denoted by Q1, Q2, and Q3 , are such that 25% of the data falls below Q1, 50% falls below Q2
and 75% falls below Q3.
Example 13: To find Q1 for the distribution of the battery lives, we need the value below which
(25/100) * 40 = 10 of the observations fall. Since the 10th and 11th measurements are both equal
to 3.1 years, their average will also be 3.1 years and hence Q 1 = 3.1 years.
Note: The 50th percentile, 5th decile and second quartile (Q2) of a distribution are all equal to the same
value, commonly referred to as the median. All the quartiles and deciles are percentiles. For
example, the seventh decile is the 70 th percentile and the first quartile is the 25th percentile. Any
percentile, decile or quartile can also be estimated from a percentage ogive.
PROBABILITY
Sample Space
In statistics we use the word experiment to describe any process that generates a set of data.
An example of a statistical experiment might be the tossing of a coin. In this experiment there
are only two possible outcomes, heads and tails. We are particularly interested in the
observations obtained by repeating the same experiment several times. In most cases the
outcomes will depend on chance and therefore, cannot be predicted with certainty. Going back
to the coin: even if we toss the coin repeatedly, we cannot be certain that a given toss will
result in a head. However, we do know the entire set of possibilities for each toss.
Sample Space. It is the set of all possible outcomes of a statistical experiment and is
represented by the symbol S.
Each outcome in a sample space is called an element or a member of the sample space or
simply a sample point. If the sample space has a finite number of elements, we may list the
members separated by commas and enclosed in braces. Thus the sample space S of
possible outcomes when a coin is tossed may be written as S = {H, T}.
Example 1: Consider the experiment of tossing a die. If we are interested in the number that
shows on the top face, then the sample space would be: S1 = {1, 2, 3, 4, 5, 6}
On the other hand, if we are interested only in whether the number is even or odd, then the
sample space is simply: S2 = {even, odd}
This example illustrates the fact that more than one sample space can be used to describe
the outcomes of an experiment. In this case, S1 provides more information than
S2. If we know which element in S1 occurs, we can tell which outcome in S2 occurs; however, a
knowledge of what happens in S2, in no way helps us know which element in S1 occurs. In
general, it is desirable to use a sample space that gives the most information concerning the
outcomes of the experiment.
In some experiments it will be helpful to list the elements of the sample space systematically by
means of a tree diagram.
Example 2: Suppose that 3 items are selected at random from a manufacturing process. Each
item is inspected and classified defective, D, or non-defective, N.
To list the elements of the sample space providing the most information, we construct the tree
diagram.
Now, the various paths of the tree give the distinct sample points. Starting at the top with the
first path, we get the sample point DDD, indicating the possibility that all three items inspected
are defective. Proceeding along the other paths, the sample space is
S = {DDD, DDN, DND, DNN, NDD, NDN, NND, NNN}
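The eight sample points can also be generated mechanically; a sketch using itertools.product:

```python
from itertools import product

# Each of the 3 inspected items is either defective (D) or non-defective (N).
sample_space = ["".join(outcome) for outcome in product("DN", repeat=3)]
print(sample_space)
# ['DDD', 'DDN', 'DND', 'DNN', 'NDD', 'NDN', 'NND', 'NNN']
```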
Sample spaces with a large or infinite number of sample points are best described by a
statement or rule. For example, if the possible outcomes of an experiment are the set of cities in
the world with a population over 1 million, our sample space is written
S = {x | x is a city with a population over 1 million},
which reads “S is the set of all x such that x is a city with a population over 1 million.”
The vertical bar is read “such that”. Similarly, if S is the set of all points (x,y) on the boundary of
a circle of radius 2 centered at the origin we write:
Example 3: S = {(x, y) | x² + y² = 4}
Whether we describe the sample space by the rule method or by listing the elements will
depend on the specific problem at hand. The rule method has practical advantages, particularly
in the many experiments where a listing becomes a very tedious chore.
Events
In any given experiment we may be interested in the occurrence of certain events, rather than in
the outcome of a specific element in the sample space. For instance, we might be interested in
the event A that the outcome when a die is tossed is divisible by 3. This will occur if the outcome
is an element of the subset A = {3, 6} of the sample space S1 in our previous example. As an
additional illustration, we might be interested in the event B that the number of defectives is
greater than 1 in Example 2. This will occur if the outcome is an element of the subset B = {DDD,
DDN, DND, NDD} of the sample space S.
To each event we assign a collection of sample points that constitutes a subset of the sample
space. This subset represents all the elements for which the event is true.
Example 4: Given the sample space S = {t | t ≥ 0}, where t is the life in years of a certain
electric component, then the event A that the component fails before the end of the 5th year is
the subset A = {t | 0 ≤ t < 5}.
If an event is a set containing only one element of the sample space, then it is called a simple
event. A compound event is one that can be expressed as the union of simple events.
Example 5: The event of drawing a heart from a deck of 52 playing cards is the subset A =
{heart} of the sample space S = {heart, spade, club, diamond}. Therefore, A is a simple event.
Now the event B of drawing a red card is a compound event, since B = {heart} ∪ {diamond} =
{heart, diamond}
Note that the union of simple events produces a compound event that is still a subset of the
sample space. We should also note that if the 52 cards of the deck were the elements of the
sample space rather than the 4 suits, then the event A of Example 5 would be a compound
event.
Null Space. The null space or empty space is a subset of the sample space that contains no
elements. We denote this event by the symbol Ø.
The relationships between events and the corresponding sample space can be illustrated
pictorially by means of a Venn diagram.
In a Venn diagram we might let the sample space be a rectangle and represent events by
circles drawn inside the rectangle.
Sometimes it is convenient to shade various areas of the Venn diagram to represent particular events.
Intersection of Events. The intersection of two events A and B, denoted by the symbol A ∩ B,
is the event containing all elements that are common to A and B.
The elements in the set A ∩ B represent the simultaneous occurrence of both A and B and
therefore must be those elements, and only those, that belong to both A and B. These elements
may either be listed or defined by the rule A ∩ B = {x | x ∈ A and x ∈ B}, where the symbol ∈
means “is an element of” or “belongs to”. In the Venn diagram, the shaded area corresponds to
the event A ∩ B.
Examples 6:
1. Let A = {1, 2, 3, 4, 5} and B = {2, 4, 6, 8}; then A ∩ B = {2, 4}
2. If R is the set of all taxpayers and S is the set of all people over 65 years of age, then R ∩ S
is the set of all taxpayers who are over 65 years of age.
3. Let P = {a, e, i, o, u} and Q = {r, s, t}; then P ∩ Q = Ø. That is, P and Q have no elements in
common.
In certain statistical experiments it is by no means unusual to define two events A and B that
cannot both occur simultaneously. The events A and B are then said to be mutually exclusive.
Mutually Exclusive Events. Two events A and B are mutually exclusive if A ∩ B = Ø; that is, A
and B have no elements in common.
Example 7:
Suppose that a die is tossed. Let A be the event that an even number turns up and B the
event that an odd number shows. The events A = {2, 4, 6} and B = {1, 3, 5} have no points in
common, since an even and an odd number cannot occur simultaneously on a single toss of a die.
Therefore A ∩ B = Ø, and consequently the events A and B are mutually exclusive.
Often, one is interested in the occurrence of at least one of two events associated with an
experiment. Thus, in a die-tossing experiment, if A = {2, 4, 6} and B = {4, 5, 6}, we might be
interested in either A or B occurring, or both A and B occurring. Such an event, called the union of
A and B, will occur if the outcome is an element of the subset {2, 4, 5, 6}.
Union of Events. The union of two events A and B, denoted by the symbol A U B, is the event
containing all the elements that belong to A or to B or to both.
Examples 8:
1. Let A = {2, 3, 5, 8} and B = {3, 6, 8}; then
A U B = {2, 3, 5, 6,8}
2. If M = {x | 3 < x < 9} and N = {y | 5 < y < 12}; then M U N = {z | 3 < z < 12}
Suppose that we consider the smoking habits of the employees of some manufacturing firm as
our sample space. Let the subset of smokers correspond to some event. Then all the
nonsmokers correspond to another event, also a subset of S, which is called the complement of
the set of smokers.
Complement of an Event. The complement of an event A with respect to S is the set of all
elements of S that are not in A. We denote the complement of A by the symbol A’.
Examples 9:
1. Let R be the event that a red card is selected
from an ordinary deck of 52 playing cards and let S
be the entire deck. Then R’ is the event that the
card selected from the deck is not red but a black
card.
2. Consider the sample S = {book, dog, cigarette, coin, map, war}. Let A = {dog, war, book,
cigarette}. Then A’ = {coin, map}.
Several results that follow from the foregoing definitions, and which may easily be verified by means of Venn diagrams, are listed below (a short check in code follows the list):
1. A ∩ Ø = Ø
2. A U A’ = S
3. (A’)’ = A
4. A U Ø = A
5. S’ = Ø
6. A ∩ A’ = Ø
7. Ø’ = S
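As a quick sanity check, these identities can be verified with ordinary Python sets. The sample space S and event A below are illustrative choices of mine, not taken from the examples above.

# A minimal sketch verifying the seven set identities above with Python sets.
# S and A are illustrative choices, not from the text.
S = {1, 2, 3, 4, 5, 6}          # sample space
A = {2, 4, 6}                   # an event in S
A_complement = S - A            # A'
empty = set()                   # Ø

assert A & empty == empty              # 1. A ∩ Ø = Ø
assert A | A_complement == S           # 2. A U A' = S
assert S - A_complement == A           # 3. (A')' = A
assert A | empty == A                  # 4. A U Ø = A
assert S - S == empty                  # 5. S' = Ø
assert A & A_complement == empty       # 6. A ∩ A' = Ø
assert S - empty == S                  # 7. Ø' = S
print("All seven identities hold for this choice of S and A.")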
Theorem 4.1 (Multiplication Rule). If an operation can be performed in n1 ways, and if for each of these a second operation can be performed in n2 ways, then the two operations can be performed together in n1n2 ways.
Example 10: How many sample points are in the sample space when a pair of dice is thrown
once?
Solution: The first die can land in any of 6 ways. For each of these 6 ways the second die can
also land in 6 ways. Therefore, the pair of dice can land in (6)(6) = 36 ways.
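The same count can be confirmed by enumerating the sample space directly; the short Python sketch below uses itertools.product (the variable names are my own).

from itertools import product

# Enumerate the sample space for tossing a pair of dice.
faces = range(1, 7)
sample_space = list(product(faces, repeat=2))  # all ordered pairs (die 1, die 2)

print(len(sample_space))  # 36, matching (6)(6) from the multiplication rule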
Theorem 4.1 may be extended to cover any number of operations. In Example 2, for instance, the first item can be classified in 2 ways, defective or non-defective, and likewise for the second and third items, resulting in (2)(2)(2) = 8 possibilities displayed in the tree diagram. The general multiplication rule covering k operations is stated in the following theorem.
Theorem 4.2 (Generalized Multiplication Rule). If an operation can be performed in n1 ways, and if for each of these a second operation can be performed in n2 ways, and for each of the first two a third operation can be performed in n3 ways, and so on, then the sequence of k operations can be performed in n1n2 … nk ways.
Example 11: How many lunches are possible consisting of soup, a sandwich, dessert and a drink if one can select from 4 soups, 3 kinds of sandwiches, 5 desserts and 4 drinks?
Solution: By the generalized multiplication rule, the total number of lunches is (4)(3)(5)(4) = 240.
Example 12: How many even three-digit numbers can be formed from the digits 1, 2, 5, 6, and
9 if each digit can be used only once?
Solution: Since the number must be even, we have only 2 choices for the units position. For
each of these we have 4 choices for the hundreds position and 3 choices for the tens position.
Therefore, we can form a total of (2)(4)(3) = 24 even three-digit numbers
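A brute-force enumeration confirms this count; the sketch below, with my own variable names, lists every three-digit arrangement of the given digits and keeps the even ones.

from itertools import permutations

digits = (1, 2, 5, 6, 9)
# All three-digit numbers formed from distinct digits of the set.
numbers = [100 * a + 10 * b + c for a, b, c in permutations(digits, 3)]
even_numbers = [n for n in numbers if n % 2 == 0]

print(len(even_numbers))  # 24, matching (2)(4)(3)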
Frequently, we are interested in a sample space that contains as elements all possible orders or
arrangements of a group of objects. For example, we might want to know how many
arrangements are possible for sitting 6 people around a table, or we might ask how many
different orders are possible to draw 2 lottery tickets from a total of 20. The different
arrangements are called permutations.
Consider the three letters a, b, and c. The possible permutations are abc, acb, bac, cab, cba,
bca. Thus we see that there are 6 distinct arrangements. Using theorem 4.2, we could have
arrived at the answer without listing the different orders. There are 3 positions to be filled from
the letters a, b, c. Therefore, we have 3 choices for the first position, and 2 for the second,
leaving only 1 choice for the last position, giving a total of (3)(2)(1) = 6 permutations. In general,
n distinct objects can be arranged in n(n-1)(n-2) … 3*2*1 ways. We represent this product by
the symbol n!, which is read “n factorial”. Three objects can be arranged in 3! = 3*2*1 = 6 ways. By definition, 1! = 1 and 0! = 1.
The number of permutations of the four letters a, b, c, and d will be 4! = 24. Let us now consider
the number of permutations that are possible for taking the 4 letters 2 at a time. These would be
ab, ac, ad, ba, ca, da, bc, cb, bd, db, cd and dc. Using Theorem 4.1, we have 2 positions to fill
with 4 choices for the first and 3 choices for the second, a total of (4)(3) = 12 permutations. In
general, n distinct objects taken r at a time can be arranged in n(n-1)(n-2) …(n-r +1) ways. We
represent this product by the symbol nPr = n!/ (n-r)!
Theorem 4.4. The number of permutations of n distinct objects taken r at a time is:
nPr = n! / (n − r)!
Example 13: Two lottery tickets are drawn from 20 for first and second prizes. Find the number
of sample points in the space S.
Solution: 20P2 = 20! / (20 − 2)! = 20! / 18! = (20)(19) = 380
Example 14: How many ways can a basketball team schedule 3 exhibition games with 3 teams
if they are all available on any of 5 possible dates?
Solution: 5P3 = 5! / (5 − 3)! = 5! / 2! = (5)(4)(3) = 60
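Both results follow from the permutation formula above; the Python check below uses math.perm (available in Python 3.8 and later) alongside the factorial form.

import math

# nPr = n! / (n - r)!
print(math.perm(20, 2))   # 380 lottery-ticket orderings (Example 13)
print(math.perm(5, 3))    # 60 possible exhibition schedules (Example 14)

# The same values from the factorial form of the formula:
print(math.factorial(20) // math.factorial(18))  # 380
print(math.factorial(5) // math.factorial(2))    # 60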
Permutations that occur by arranging objects in a circle are called circular permutations. Two circular permutations are not considered different unless corresponding objects in the two arrangements are preceded or followed by a different object as we proceed in a clockwise direction.
Example 15: If four people are playing bridge, we do not have a new permutation if they all move one position in a clockwise direction. By considering 1 person in a fixed position and arranging the other 3 in 3! ways, we find that there are 3! = 6 distinct arrangements for the bridge game.
Theorem 4.5. The number of permutations of n distinct objects arranged in a circle is (n-1)!.
So far we have considered permutations of distinct objects. That is, all the objects were
completely different or distinguishable. Obviously, if the letters b and c are both equal to x, then
the 6 permutations of the letters a, b and c become axx, axx, xax, xax, xxa, and xxa, of which
only 3 are distinct. Therefore, with 3 letters, 2 being the same, we have 3!/ 2! = 3 distinct
permutations. With the four letters a, b, c, and d we had 24 distinct permutations. If we let a = b = x and c = d = y, we can list only the following: xxyy, xyxy, yxxy, yyxx, xyyx, yxyx. Thus, we have 4!/(2!2!) = 6 distinct permutations.
Theorem 4.6. The number of permutations of n things of which n1 are of one kind, n2 of a second kind, …, nk of a kth kind is:
n! / (n1! n2! … nk!)
Example 16: How many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged in a
string of Christmas tree lights with 9 sockets?
Solution: 9! / (3! 4! 2!) = 1260
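The count in Example 16 can be verified either from the formula of Theorem 4.6 or by enumerating the distinct arrangements directly; both checks are sketched below (the bulb labels R, Y, B are my own shorthand).

from itertools import permutations
from math import factorial

# Formula of Theorem 4.6: n! / (n1! n2! ... nk!)
print(factorial(9) // (factorial(3) * factorial(4) * factorial(2)))  # 1260

# Brute-force check: count the distinct orderings of the 9 bulbs.
bulbs = "RRRYYYYBB"  # 3 red, 4 yellow, 2 blue
distinct = set(permutations(bulbs))
print(len(distinct))  # 1260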
Often we are concerned with the number of ways of partitioning a set of n objects into r subsets,
called cells. A partition has been achieved if the intersection of every possible pair of the r
subsets is the empty set Ø and if the union of all subsets gives the original set. The order of the
elements within a cell is of no importance. Consider the set {a, e, i, o, u}. The possible partitions into 2 cells are {(a, e, i, o), (u)}, {(a, i, o, u), (e)}, {(a, e, o, u), (i)}, {(a, e, i, u), (o)}, {(e, i, o, u), (a)}. We see that there are 5 such ways to partition a set of 5 elements into 2 subsets, or cells, containing 4 elements in the first cell and 1 element in the second.
The number of partitions for this illustration is denoted by ( 5 over 4, 1 ) = 5! / (4! 1!) = 5,
Where the top number represents the total number of elements and the bottom numbers
represent the number of elements going into each cell.
Theorem 4.7. The number of ways of partitioning a set of n objects into r cells with n1 elements in the first cell, n2 elements in the second, and so on, is:
( n over n1, n2, …, nr ) = n! / (n1! n2! … nr!)
where n1 + n2 + … + nr = n.
Example 17: How many ways can 7 people be assigned to 1 triple and 2 double rooms?
Solution: ( 7 over 3, 2, 2 ) = 7! / (3! 2! 2!) = 210
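The same formula gives both the five-element illustration and Example 17; the small helper below evaluates it (multinomial is my own function name, not a standard library call).

from math import factorial

def multinomial(n, *cells):
    """Number of ways to partition n objects into cells of the given sizes."""
    assert sum(cells) == n
    count = factorial(n)
    for size in cells:
        count //= factorial(size)
    return count

print(multinomial(5, 4, 1))     # 5 partitions of {a, e, i, o, u} into cells of 4 and 1
print(multinomial(7, 3, 2, 2))  # 210 room assignments (Example 17)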
In several problems we are interested in the number of ways of selecting r objects from n
without regard to order. These selections are called combinations.
A combination creates a partition with 2 cells, one cell containing the r objects selected and the
other cell containing the n − r objects that are left. The number of such combinations, denoted by ( n over r, n − r ), is usually shortened to ( n over r ), since the number of elements in the second cell must be n − r. By Theorem 4.7 with two cells, this count (Theorem 4.8) is ( n over r ) = n! / (r! (n − r)!).
Example 18: From 4 Republicans and 3 Democrats, find the number of committees of 3 that
can be formed with 2 Republicans and 1 Democrat.
The number of ways of selecting 2 Republicans from 4 is ( 4 over 2 ) = 6, and the number of ways of selecting 1 Democrat from 3 is ( 3 over 1 ) = 3. Using Theorem 4.1 (multiplication rule), the number of committees that can be formed with 2 Republicans and 1 Democrat is (6)(3) = 18.
It is of interest to note that the number of permutations of the r objects making up each of the ( n over r ) combinations in Theorem 4.8 is r!. Consequently, the number of permutations of n distinct objects taken r at a time is related to the number of combinations by the formula nPr = ( n over r ) r!.
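Python's math.comb evaluates these combination counts directly; the sketch below rechecks the committee count of Example 18 and the relation nPr = ( n over r ) r!.

import math

# Example 18: choose 2 of 4 Republicans and 1 of 3 Democrats.
committees = math.comb(4, 2) * math.comb(3, 1)
print(committees)  # 18

# Relation between permutations and combinations: nPr = C(n, r) * r!
n, r = 20, 2
print(math.perm(n, r) == math.comb(n, r) * math.factorial(r))  # True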
Probability of an Event
The statistician is basically concerned with drawing conclusions or inferences from experiments
involving uncertainties. For these conclusions and inferences to be accurately interpreted, an
understanding of probability theory is essential.
What do we mean when we make the statements “John will probably win the tennis match,” “I have a 50:50 chance of getting an even number when a die is tossed,” “I am not likely to win at bingo tonight,” or “Most of our graduating class will probably be married within 3 years”? In each
case we are expressing an outcome of which we are not certain, but because of past
information or from an understanding of the structure of the experiment, we have some degree
of confidence in the validity of the statement.
The mathematical theory of probability for finite sample spaces provides a set of real numbers
called weights or probabilities, ranging from 0 to 1, which allow us to evaluate the likelihood of
occurrence of events.
To every point in the sample space we assign a probability such that the sum of all probabilities is 1. If we have reason to believe that a certain sample point is quite likely to occur when the experiment is conducted, the probability assigned should be close to 1. On the other hand, a probability closer to zero is assigned to a sample point that is not likely to occur. In many experiments, such as tossing a coin or a die, all the sample points have the same chance of occurring and are assigned equal probabilities. For points outside the sample space, that is, for simple events that cannot possibly occur, we assign a probability of zero.
To find the probability of an event A, we sum all the probabilities assigned to the sample points in A. This sum is called the probability of A and is denoted by P(A). Thus the probability of the set Ø is zero and the probability of S is 1.
The probability of an event A is the sum of the probabilities of all sample points in A. Therefore,
0 ≤ P(A) ≤ 1, P(Ø) = 0, P(S) = 1
Example 19: A coin is tossed twice. What is the probability that at least 1 head occurs?
Solution: The sample space is S = {HH, HT, TH, TT}, and each outcome is assigned probability 1/4. If A is the event that at least 1 head occurs, then A = {HH, HT, TH} and P(A) = 1/4 + 1/4 + 1/4 = 3/4.
Example 20: A die is loaded in such a way that an even number is twice as likely to occur as an odd number. If E is the event that a number less than 4 occurs on a single toss of the die, find P(E).
Solution: Assign probability w to each odd number and 2w to each even number. Since the six probabilities must sum to 1, 9w = 1 and w = 1/9. Because E = {1, 2, 3}, P(E) = 1/9 + 2/9 + 1/9 = 4/9.
If the sample space for an experiment contains N elements all of which are equally likely to
occur, we assign probabilities equal to 1/N to each of the N points. The probability of any event
A containing n of these N sample points is then the ratio of the number of elements in A to the
number of elements in S.
Theorem 4.9. If the experiment can result in any one of N different equally likely outcomes, and
if exactly n of these outcomes correspond to event A, then the probability of event A is
P(A) = n/N
Example 21: If a card is drawn from an ordinary deck, find the probability that it is a heart.
Solution: The number of possible outcomes is 52, of which 13 are hearts. Therefore, the probability of the event A of getting a heart is P(A) = 13/52 = 1/4 = 0.25, or 25%.
Example 22: In a poker hand consisting of 5 cards, find the probability of holding 2 aces and 3
jacks.
The number of ways of being dealt 2 aces from 4 is ( 4 over 2 ) = 6, and the number of ways of being dealt 3 jacks from 4 is ( 4 over 3 ) = 4. By the multiplication rule of Theorem 4.1, there are n = (6)(4) = 24 hands with 2 aces and 3 jacks. The total number of equally likely 5-card poker hands is N = ( 52 over 5 ) = 52! / (5! 47!) = 2,598,960.
Therefore, the probability of the event C of getting 2 aces and 3 jacks in a 5-card poker hand is P(C) = n/N = 24/2,598,960 ≈ 0.000009.
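The same probability can be reproduced with exact rational arithmetic; the sketch below uses math.comb and fractions.Fraction, with variable names of my own.

from fractions import Fraction
from math import comb

n = comb(4, 2) * comb(4, 3)   # hands with exactly 2 aces and 3 jacks: 6 * 4 = 24
N = comb(52, 5)               # all equally likely 5-card hands: 2,598,960

p = Fraction(n, N)
print(p, float(p))            # 1/108290, roughly 0.0000092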
If the probabilities cannot be assumed equal, they must be assigned on the basis of prior knowledge or experimental evidence. For example, if a coin is not balanced, we could estimate the two probabilities by tossing the coin a large number of times and recording the outcomes. The true probabilities would be the fractions of heads and tails that occur in the long run. This method of arriving at probabilities is known as the relative frequency definition of probability.
To find a numerical value that represents adequately the probability of winning at tennis, we
must depend on our past performance at the game as well as that of our opponent and to some
extent in our belief in being able to win. Similarly, to find the probability that a horse will win a
race, we must arrive at a probability based on the previous records of all the horses entered in a
race. Intuition would undoubtedly also play a part in determining the size of the bet that one
might be willing to wager. The use of intuition, personal beliefs and other indirect information in
arriving at probabilities is referred to as the subjective definition of probability.
Additive Rules
Often it is easier to calculate the probability of an event from the known probabilities of other events. This may well be true if the event in question can be represented as the union of two or more other events or as the complement of some event. Several important laws that frequently simplify the computation of probabilities follow. The first, called the additive rule, applies to unions of events.
Theorem 4.10 (Additive Rule). If A and B are any two events, then
P(A U B) = P(A) + P(B) − P(A ∩ B)
Corollary 1
If A and B are mutually exclusive, then
P(A U B) = P(A) + P(B)
The corollary is an immediate result of the additive rule, since A and B are mutually exclusive. In general, if we have more than two mutually exclusive events in a sample space, we can write the sum of all the probabilities of the events using Corollary 2.
Corollary 2
If A1, A2, A3, … An are mutually exclusive, then
P(A1 U A2 U … U An) = P(A1) + P(A2)+ … + P(An)
Note that if A1, A2, A3, …, An form a partition of a sample space S, then P(A1 U A2 U … U An) = P(S) = 1.
Example 24: The probability that a student passes mathematics is 2/3 and the probability that
he passes English is 4/9. If the probability of passing at least one course is 4/5, what is the
probability that he passes both courses?
Solution: If M is the event “passing mathematics” and E is the event “passing English”, then by
transposing the terms in Theorem 4.10, we have
P (M∩E) = P(M) + P(E) – P(M ∪ E)
= 2/3 + 4/9 – 4/5
= 14/45 ≈ 31.11%
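With exact fractions, the transposed additive rule of Example 24 is easy to reproduce; a minimal sketch follows.

from fractions import Fraction

# P(M ∩ E) = P(M) + P(E) - P(M U E)
p_M, p_E, p_M_or_E = Fraction(2, 3), Fraction(4, 9), Fraction(4, 5)
p_both = p_M + p_E - p_M_or_E
print(p_both, float(p_both))  # 14/45, about 0.3111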
Example 25: What is the probability of getting a total of 7 or 11 when a pair of dice is tossed?
Solution: When a pair of dice is tossed there are 6 × 6 = 36 equally likely sample points. Let A be the event that a total of 7 occurs; its 6 sample points are (1, 6), (6, 1), (2, 5), (5, 2), (3, 4), and (4, 3), so P(A) = 6/36 = 1/6. Let B be the event that a total of 11 occurs; its 2 sample points are (5, 6) and (6, 5), so P(B) = 2/36 = 1/18. Since A and B are mutually exclusive, P(A U B) = P(A) + P(B) = 6/36 + 2/36 = 2/9.
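Because the 36 outcomes are easy to enumerate, Example 25 can also be checked by brute force, as in the sketch below.

from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))         # 36 equally likely pairs
favorable = [o for o in outcomes if sum(o) in (7, 11)]  # totals of 7 or 11

print(Fraction(len(favorable), len(outcomes)))  # 2/9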
Often, it is more difficult to calculate the probability that an event occurs than it is to calculate
the probability that the event does not occur. Should this be the case for some event A, we
simply find P(A’) first and then using Theorem 4.11, find P(A) by subtraction.
Theorem 4.11. If A and A’ are complementary events, then P(A) + P(A’) = 1.
Proof. Since A ∪ A’ = S and the events A and A’ are mutually exclusive,
1 = P(S) = P(A ∪ A’) = P(A) + P(A’),
so P(A) = 1 − P(A’).
Example 26: A coin is tossed 6 times in succession. What is the probability that at least 1 head occurs?
Solution: Let E be the event that at least 1 head occurs. The sample space consists of 2^6 = 64 sample points, since each toss can result in 2 outcomes. Now P(E) = 1 − P(E’), where E’ is the event that no head occurs. This can happen in only one way: when all tosses result in tails. Therefore, P(E’) = 1/64 and P(E) = 1 − 1/64 = 63/64.
Conditional Probability
The probability of an event B occurring when it is known that some event A has occurred is called a conditional probability and is denoted by P(B|A), which is usually read as “the probability that B occurs given that A occurs” or “the probability of B, given A”. For two events A and B with P(A) > 0, it is computed as P(B|A) = P(A ∩ B) / P(A).
Example 27: The probability that a regularly scheduled flight departs on time is P(D) = 0.83, the
probability that it arrives on time is P(A) = 0.92, and the probability that it departs and arrives on
time is P(D ∩ A) = 0.78. Find the probability that a plane
a) Arrives on time given that it departed on time;
b) Departed on time given that it has arrived on time.
Solution
a) Arrives on time given that it departed on time;
P(A|D) = P(D ∩ A) / P(D) = 0.78 / 0.83 ≈ 0.94
b) Departed on time given that it has arrived on time.
P(D|A) = P(D ∩ A) / P(A) = 0.78 / 0.92 ≈ 0.85
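Both parts of Example 27 apply the same definition with different denominators; a minimal sketch follows (the function name conditional is my own).

def conditional(p_joint, p_given):
    """P(X | Y) = P(X and Y) / P(Y), assuming P(Y) > 0."""
    return p_joint / p_given

p_D, p_A, p_D_and_A = 0.83, 0.92, 0.78

print(round(conditional(p_D_and_A, p_D), 2))  # P(A | D) ≈ 0.94
print(round(conditional(p_D_and_A, p_A), 2))  # P(D | A) ≈ 0.85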
Multiplicative Rules
If in an experiment the events A and B can both occur, then
P(A ∩ B) = P(A) P(B|A)
Example 28: Suppose that we have a fuse box containing 20 fuses, of which 5 are defective. If
2 fuses are selected at random and removed from the box in succession without replacing the
first, what is the probability that both fuses are defective?
Solution: We shall let A be the event that the first fuse is defective and B the event that the second fuse is defective; we then interpret A ∩ B as the event that A occurs and B occurs after A has occurred. The probability of first removing a defective fuse is 5/20 = 1/4, and the probability of then removing a second defective fuse from the 4 defective fuses among the remaining 19 is 4/19. Hence,
P(A ∩ B) = P(A) P(B|A)
= (1/4) (4/19)
= 1/19
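Exact fractions make the fuse calculation easy to verify, as in the short sketch below.

from fractions import Fraction

# P(A ∩ B) = P(A) * P(B | A) for drawing two fuses without replacement.
p_first_defective = Fraction(5, 20)
p_second_defective_given_first = Fraction(4, 19)

print(p_first_defective * p_second_defective_given_first)  # 1/19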
If two events A and B are independent, then P(B|A) = P(B) and the multiplicative rule reduces to P(A ∩ B) = P(A)P(B). Therefore, to obtain the probability that two independent events will both occur, we simply find the product of their individual probabilities.
Example 29: A small town has one fire engine and one ambulance available for emergencies.
The probability that the fire engine is available when needed is 0.98, and the probability that the
ambulance is available when called is 0.92. In the event of an injury resulting from a burning building, find the probability that both the ambulance and the fire engine will be available.
Solution: Let A and B represent the respective events that the fire engine and the ambulance are available. Assuming the two events are independent,
P(A ∩ B) = P(A) P(B)
= 0.98 (0.92)
= 0.9016
Example 30: Three cards are drawn in succession, without replacement, from an ordinary deck
of playing cards. Find the probability that the first card is a red ace, the second card is a ten or
jack and the third is greater than 3 but less than 7.
Solution: Let A1 be the event that the first card is a red ace, A2 the event that the second card is a ten or a jack, and A3 the event that the third card is greater than 3 but less than 7. Then P(A1) = 2/52, P(A2|A1) = 8/51, and P(A3|A1∩A2) = 12/50.
Hence, P(A1∩A2∩A3) = (2/52)(8/51)(12/50) = 8/5525
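The generalized multiplicative rule used in Example 30 simply chains the three conditional probabilities; the arithmetic is reproduced below with exact fractions.

from fractions import Fraction

# P(A1 ∩ A2 ∩ A3) = P(A1) * P(A2 | A1) * P(A3 | A1 ∩ A2)
p = Fraction(2, 52) * Fraction(8, 51) * Fraction(12, 50)
print(p)  # 8/5525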
Teaching and Learning Activities
Complete the frequency distribution table below.

Class Interval | Frequency (f) | Midpoint (x) | Lower Limit | Upper Limit | Cumulative Frequency (F) | f(x) | Deviation d = (x − x̅) | Squared Deviation d² = (x − x̅)² | f(d²)
91-100 | 8 | | | | | | | |
81-90 | 11 | | | | | | | |
71-80 | 9 | | | | | | | |
61-70 | 13 | | | | | | | |
51-60 | 18 | | | | | | | |
41-50 | 12 | | | | | | | |
31-40 | 10 | | | | | | | |
21-30 | 9 | | | | | | | |
11-20 | 7 | | | | | | | |
01-10 | 3 | | | | | | | |
i =    n =    Σf(x) =    Σd² =    Σf(d²) =
Assessment Task
Problem Set
Reference/s:
Walpole, Ronald E. Introduction to Statistics. International Edition. Third Edition. Prentice Hall
International, Inc. 1997. ISBN 981-4009-51-2.
https://www.questionpro.com/blog/non-probability-sampling/
https://www.questionpro.com/blog/probability-sampling/