
Chapter 3: DATA MANAGEMENT

Overview/Introduction

The processing of statistical information has a history that extends back to the beginning of
mankind. In early biblical times nations compiled statistical data to provide descriptive
information relative to all sorts of things, such as taxes, wars, agricultural crops and even
athletic events. Today, with the development of probability theory, we are able to use statistical
methods that not only describe important features of data but methods that allow us to proceed
beyond the collected data into the area of decision making through generalizations and
predictions.

Learning Outcome/Objective

At the end of this chapter, the students shall be able to:


1. Use a variety of statistical tools to process and manage numerical data;
2. Use the methods of probability; and
3. Advocate the use of statistical data in making important decisions.

Learning Content/Topic

Examples
 4 out of 5 dentists recommend Dentyne.
 Almost 85% of lung cancers in men and 45% in women are tobacco-related.
 People predict that it is very unlikely there will be another baseball player with a batting
average over .400.
 79.48% of all statistics are made up on the spot.
 A surprising new study shows that eating egg whites can increase one’s life span.

All these claims are statistical in character. In the study of statistics we are basically concerned
with the presentation and interpretation of chance outcomes that occur in a planned or scientific
investigation. Hence, a knowledge of basic statistics is very important.

STATISTICS
 Provides tools that you need in order to react intelligently to information you hear or
read.
 Can be applied in psychology, health, law, sports, business, etc.
 Are often presented in an effort to add credibility to an argument or advice.

In statistics, we shall refer to any recording of information, whether it is numerical or categorical,
as an OBSERVATION.

Statistical methods are those procedures used in the collection, presentation, analysis and
interpretation of data. We shall categorize these methods as belonging to one of two major
areas called the descriptive statistics and statistical inference.

Descriptive Statistics. This comprises those methods concerned with collecting and
describing a set of data so as to yield meaningful information.
Let it be clearly understood that descriptive statistics provides information only about the
collected data and in no way draws inferences or conclusions concerning a larger set of data.
The construction of tables, charts, graphs and other relevant computations in various
newspapers and magazines usually falls in the area categorized as descriptive statistics.

Statistical Inference. This comprises those methods concerned with the analysis of a subset of
data leading to predictions or inferences about the entire set of data.

The generalizations associated with statistical inferences are always subject to uncertainties,
since we are dealing only with partial information obtained from a subset of the data of interest.
To cope with uncertainties, an understanding of probability theory is essential. In this lesson, we
will discuss ideas that can be used, to our benefit, in all areas of learning. The basic techniques
for collecting and analyzing data are the same no matter what the field of application may be.

For example, the chemist runs an experiment using 3 variables and measures the amount of
desired product. The results are then analyzed by statistical procedures. These same
procedures are used to analyze the results obtained by measuring the yield of grain when 3
fertilizers are tested or to analyze the data representing the number of defectives produced by 3
similar machines. Many statistical methods that were derived primarily from agricultural
applications have proved to be equally valuable for applications in other areas.

Statisticians are employed today by every progressive industry to direct their quality control
process and to assist in the establishment of good advertising and sales programs for their
products. In business the statistician is responsible for decision making, for the analysis of time
series, and for the formation of index numbers. Indeed, statistics is a very powerful tool if
properly used.

The abuse of statistical procedures will frequently lead to erroneous results. One should be
careful to apply the correct and most efficient procedure for the given conditions to obtain
maximum information from the data available.

The procedures used to analyze a set of data depend to a large degree on the method used to
collect the information. For this reason, it is desirable in any investigation to consult with the
statistician from the time the project is planned until the final results are analyzed and
interpreted. Statistics are all around us; sometimes they are used well, sometimes not.

POPULATION AND SAMPLE


Before we begin gathering and analyzing data we need to characterize the population we are
studying. The population is the group the collected data are intended to describe. Sometimes
the intended population is called the target population, since if we design our study badly, the
collected data might not actually be representative of the intended population. The totality of
observations with which we are concerned, whether their number be finite or infinite, constitutes
what we call a population. In past years, the word population referred to observations obtained
from statistical studies involving people. Today, the statistician uses the term to refer to
observations relevant to anything of interest, whether it be groups of people, animals or objects.

The number of observations in the population is defined to be the SIZE of the population.

Example: A newspaper website contains a poll asking people their opinion on a recent news
article. What is the population?
Answer: While the target (intended) population may have been all people, the real population of
the survey is the readers of the website.

On the other hand, since surveying an entire population is often impractical, we usually select a
sample to study.

A sample is a smaller subset of the entire population, ideally one that is fairly representative of
the whole population.

Example: A researcher wanted to know how citizens of Echague felt about a voter initiative. To
study this, she goes to the Echague SM Store and randomly selects 500 shoppers and asks
them their opinion. Sixty percent (60%) indicate they are supportive of the initiative. What is the
sample and population?

Answer: The sample is the 500 shoppers questioned. The population is less clear. While the
intended population of this survey was Echague citizens, the effective population was mall
shoppers. There is no reason to assume that the 500 shoppers questioned would be
representative of all Echague citizens.

Why? Because there is a chance that a shopper being asked is not from Echague, so drawing
that sample would introduce error into the survey.

PARAMETER AND STATISTIC


A parameter is a useful component of statistical analysis. It refers to the characteristics that are
used to define a given population. It is used to describe a specific characteristic of the entire
population. When making an inference about the population, the parameter is unknown because
it would be impossible to collect information from every member of the population. Rather, we
use a statistic of a sample picked from the population to derive a conclusion about the
parameter.

A parameter is used to describe the entire population being studied. For example, we want to
know the average length of a butterfly. This is a parameter because it states something about
the entire population of butterflies.

Parameters are difficult to obtain, so we use the corresponding statistic to estimate their value.
A statistic describes a sample of a population, while a parameter describes the entire population.

Examples:
1) A researcher wants to estimate the average height of women aged 20 years or older. From a
simple random sample of 45 women, the researcher obtains a sample mean height of 63.9
inches.

2) A nutritionist wants to estimate the mean amount of sodium consumed by children under the
age of 10. From a random sample of 75 children under the age of 10, the nutritionist obtains a
sample mean of 2993 milligrams of sodium consumed.

3) Nexium is a drug that can be used to reduce the acid produced by the body and heal damage
to the esophagus. A researcher wants to estimate the proportion of patients taking Nexium that
are healed within 8 weeks. A random sample of 224 patients suffering from acid reflux disease
is obtained, and 213 of those patients were healed after 8 weeks.

4) A researcher wants to estimate the average farm size in Kansas. From a simple random
sample of 40 farms, the researcher obtains a sample mean farm size of 731 acres.

5) An energy official wants to estimate the average oil output per well in the United States. From
a random sample of 50 wells throughout the United States, the official obtains a sample mean of
10.7 barrels per day.

6) An education official wants to estimate the proportion of adults aged 18 or older who had
read at least one book during the previous year. A random sample of 1006 adults aged 18 or
older is obtained, and 835 of those adults had read at least one book during the previous year.

Answers: For each study, identify both the parameter and the statistic in the study:
1) The parameter is the average height of all women aged 20 years or older.
The statistic is the average height of 63.9 inches from the sample of 45 women.

2) The parameter is the mean amount of sodium consumed by children under the age of ten.
The statistic is the mean of 2993 milligrams of sodium obtained from the sample of 75 children.

3) The parameter is the proportion of patients healed by Nexium in 8 weeks.
The statistic is 213/224 = 0.951, the proportion healed in the sample.

4) The parameter is the average farm size in Kansas.
The statistic is the mean farm size of 731 acres from the sample of 40 farms.

5) The parameter is the average oil output per well in the United States.
The statistic is the mean oil output of 10.7 barrels per day from the sample of 50 wells.

6) The parameter is the proportion of adults 18 or older who read a book in the previous year.
The statistic is 835/1006 = 0.830, the proportion who read a book in the sample.

CATEGORIES OF SAMPLING METHODS


Probability Sampling
Probability sampling is a sampling technique in which samples from a larger population are
chosen using a method based on the theory of probability.
It uses randomization to make sure that every element has a chance to become part of the
sample. For a participant to be considered a probability sample, he or she must be selected at
random.

The most important requirement of probability sampling is that everyone in your population has
a known and equal chance of getting selected. For example, if you have a population of 100
people, every person would have odds of 1 in 100 of getting selected. Probability sampling
gives you the best chance to create a sample that is truly representative of the population.

Probability sampling uses statistical theory to randomly select a small group of people (the
sample) from an existing large population, and then predict that all their responses together will
match those of the overall population.
Types of Probability Sampling
1. Simple Random Sampling
- is a completely random method of selecting the sample.
- This sampling method is as easy as assigning numbers to the individuals and then
randomly choosing from those numbers through an automated process. The numbers
that are chosen are the members included in the sample.

Every element has an equal chance of getting selected to be part of the sample.

There are two ways in which the samples are chosen in this method of sampling: a lottery
system, or number-generating software / a random number table. This sampling technique
usually works with large populations and has its fair share of advantages and disadvantages.
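As a minimal sketch, simple random sampling can be carried out with Python's standard random module; the 100-person numbered population and the sample size of 10 here are hypothetical illustrations.

```python
import random

# Hypothetical population: 100 people identified by the numbers 1-100.
population = list(range(1, 101))

random.seed(42)  # fixed seed so the sketch is reproducible

# Simple random sampling: draw 10 distinct members, each with an
# equal chance of selection, without replacement.
sample = random.sample(population, k=10)

print(sorted(sample))
```

Because `random.sample` draws without replacement, no member can appear in the sample twice, which matches the lottery-system description above.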

2. Stratified Random Sampling
- Involves dividing a larger population into smaller groups that usually don't overlap but
together represent the entire population. These groups can be organized and a sample
then drawn from each group separately.

The population is divided into a number of subgroups (strata) before samples are taken
randomly. The number of samples taken from each group is proportional to the size of the
subgroup relative to the population.

A common method is to classify by sex, age, ethnicity or similar characteristics: subjects are
split into mutually exclusive groups, and simple random sampling is then used to choose
members from each group.

Members in each of these groups should be distinct, so that every member of every group gets
an equal opportunity to be selected using simple probability. This sampling method is also
called “random quota sampling”.
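Proportional allocation across strata can be sketched as below; the 50-employee population split by sex and the `stratified_sample` helper are hypothetical illustrations, not a standard-library API.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 50 employees, stratified by sex
# (odd ids female, even ids male, so the strata are equal-sized).
population = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(1, 51)]

def stratified_sample(pop, key, total_n):
    """Draw from each stratum in proportion to its share of the population."""
    strata = {}
    for person in pop:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        n = round(total_n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, n))      # SRS within the stratum
    return sample

s = stratified_sample(population, "sex", total_n=10)
print(len(s))  # 10 in total: 5 from each stratum, since the strata are equal
```

Note that within each stratum the draw is still simple random sampling; only the allocation of sample sizes is fixed in advance.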
3. Cluster Random Sampling
- is a way to randomly select participants when they are geographically spread out.

The population is divided into subgroups and a set of subgroups is selected to be in the
sample.
For example, if you wanted to choose 100 participants from the entire population of the U.S., it
is likely impossible to get a complete list of everyone. Instead, the researcher randomly selects
areas (i.e. cities or counties) and randomly selects from within those boundaries.

Cluster sampling usually analyzes a particular population in which the sample consists of more
than a few elements, for example, a city, family or university. The clusters are selected by
dividing the greater population into various smaller sections.

4. Systematic Sampling
- is when you choose every “nth” individual to be a part of the sample.

For example, you can choose every 5th person to be in the sample. Systematic sampling is an
extended implementation of the same probability technique, in which each member of the group
is selected at regular intervals to form a sample. There is an equal opportunity for every
member of a population to be selected using this sampling technique.
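The every-nth rule can be sketched in a few lines, assuming a hypothetical ordered frame of 100 people and n = 5; the random starting point gives every member a chance of selection.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: an ordered list of 100 people.
population = list(range(1, 101))
n = 5  # take every 5th person

start = random.randrange(n)    # random starting point within the first interval
sample = population[start::n]  # then every nth member after it

print(len(sample))  # 100 / 5 = 20 members
```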
What are the steps involved in Probability Sampling?
1. Choose your population of interest carefully: Carefully think about and choose from the
population the people whose opinions you think should be collected, and then include them in
the sample.

2. Determine a suitable sample frame: Your frame should include a sample from your population
of interest and no one from outside in order to collect accurate data.

3. Select your sample and start your survey: It can sometimes be challenging to find the right
sample and determine a suitable sample frame. Even if all factors are in your favor, there still
might be unforeseen issues like cost, quality of respondents and quickness to respond. Getting
a sample to respond to a true probability survey might be difficult, but not impossible.

But, in most cases, drawing a probability sample will save you time, money, and a lot of
frustration. You probably can't send surveys to everyone, but you can always give everyone a
chance to participate; this is what a probability sample is all about.

When to Use Probability Sampling?


1. When the sampling bias has to be reduced: This sampling method is used when the bias has
to be kept to a minimum. How researchers select their sample largely determines the quality of
their findings. Probability sampling leads to higher-quality findings because it provides an
unbiased representation of the population.

2. When the population is diverse: When your population size is large and diverse, this
sampling method is used extensively, as probability sampling helps researchers create samples
that fully represent the population. Say we want to find out how many people prefer medical
tourism over getting treated in their own country; this sampling method will help pick samples
from various socio-economic strata, backgrounds, etc. to represent the bigger population.

3. To create an accurate sample: Probability sampling helps researchers create an accurate
sample of their population. Researchers can use proven statistical methods to draw an accurate
sample size and obtain well-defined data.

Advantages of Probability Sampling


1. It's cost-effective: This process is both cost- and time-effective. A larger sample can also be
chosen by assigning numbers to the samples and then choosing random numbers from the
bigger group.

2. It's simple and easy: Probability sampling is an easy way of sampling as it does not involve a
complicated process. It's quick and saves time, and the time saved can be used to analyze the
data and draw conclusions.

3. It's non-technical: This method of sampling doesn't require any technical knowledge because
of the simplicity with which it can be done. It doesn't require complex knowledge and is not at
all lengthy.

Non-Probability Sampling
- is a sampling technique in which the researcher selects samples based on his or her
subjective judgment rather than random selection.
- Does not rely on randomization; thus, not all the elements have a chance to become
part of the sample.
In non-probability sampling, not all members of the population have a chance of participating in
the study, unlike probability sampling, where each member of the population has a known
chance of being selected.

Non-probability sampling is most useful for exploratory studies like a pilot survey (a survey
deployed to a smaller sample than the pre-determined sample size). Non-probability sampling
is used in studies where it is not possible to draw a random probability sample due to time or
cost considerations.

Non-probability sampling is a less stringent method; this sampling method depends heavily on
the expertise of the researchers. Non-probability sampling is carried out by methods of
observation and is widely used in qualitative research.

Types of Non-Probability Sampling


1. Convenience Sampling
- is a non-probability sampling technique where samples are selected from the population
only because they are conveniently available to the researcher.

Samples are chosen by selecting whoever is available.

These samples are selected only because they are easy to recruit; the researcher does not
consider whether the sample represents the entire population.

Ideally, in research, it is good to test a sample that represents the population. But in some
research, the population is too large to test in its entirety. This is one of the reasons why
researchers rely on convenience sampling, the most common non-probability sampling
technique, because of its speed, cost-effectiveness, and the ease of availability of the sample.
An example of convenience sampling would be using student volunteers known to the
researcher. The researcher can send the survey to the students, and they would act as the
sample in this situation.

2. Consecutive Sampling
- This non-probability sampling technique is very similar to convenience sampling, with a
slight variation. Here, the researcher picks a single person or a group of subjects,
conducts research over a period of time, analyzes the results and then moves on to
another subject or group if needed.

Consecutive sampling gives the researcher a chance to work with many subjects and fine-tune
his or her research by collecting results that yield vital insights.

3. Quota Sampling
- is a variation on stratified sampling, wherein samples are collected in each subgroup
until the desired quota is met.

Hypothetically, consider a researcher who wants to study the career goals of male and female
employees in an organization. There are 500 employees in the organization; these 500
employees are the population. In order to understand the population better, the researcher
needs only a sample, not the entire population. Further, the researcher is interested in
particular strata within the population. Here is where quota sampling helps in dividing the
population into strata or groups.
For studying the career goals of 500 employees, technically the sample selected should have
proportionate numbers of males and females, that is, 250 males and 250 females. Since this is
unlikely, the groups or strata are selected using quota sampling.

4. Judgmental or Purposive Sampling
- the samples are selected based purely on the researcher's knowledge and credibility. In
other words, researchers choose only those whom they feel are a right fit (with respect
to attributes and representation of a population) to participate in the research study.

This is not a scientific method of sampling and the downside to this sampling technique is that
the results can be influenced by the preconceived notions of a researcher. Thus, there is a high
amount of ambiguity involved in this research technique.

For example, this type of sampling method can be used in pilot studies.

5. Snowball Sampling
- helps researchers find samples when subjects are difficult to locate. Researchers use
this technique when the sample size is small and not easily available.

This sampling system works like a referral program. Once the researchers find suitable
subjects, those subjects are asked for assistance in seeking similar subjects, to form a
considerably good-sized sample.

For example, this type of sampling can be used to conduct research involving a particular illness
in patients or a rare disease. Researchers can seek help from subjects to refer other subjects
suffering from the same ailment to form a subjective sample to carry out the study.

6. Voluntary Response Sampling
- allows the sample to volunteer.

7. Referral Sampling
- samples are based on referrals.

When to Use Non-Probability Sampling?


1. This type of sampling is used to indicate if a particular trait or characteristic exists in a
population.
2. This sampling technique is widely used when researchers aim at conducting qualitative
research, pilot studies or exploratory research.
3. Non-probability sampling is used when researchers have limited time to conduct research
or have budget constraints.
4. Non-probability sampling is conducted to observe if a particular issue needs in-depth
analysis.

Advantages of Non-Probability Sampling


1. Non-probability sampling is a more conducive and practical method for researchers
deploying surveys in the real world. Statisticians prefer probability sampling because it yields
data in the form of numbers; however, if done correctly, non-probability sampling can yield
similar, if not the same, quality of results.
2. Getting responses using non-probability sampling is faster and more cost-effective than
probability sampling. Because the sample is known to the researcher, respondents are
motivated to respond quickly compared to people who are randomly selected.

Disadvantages of Non-Probability Sampling


1. In non-probability sampling, the researcher needs to think through potential sources of bias.
It is important to have a sample that closely represents the population.

2. While choosing a sample in non-probability sampling, researchers need to be careful about
recruits distorting the data. At the end of the day, research is carried out to obtain meaningful
insights and useful data.

SOURCES OF BIAS
1. Sampling Bias – when the sample is not representative of the population
2. Voluntary Response Bias – often occurs when the samples are volunteers
3. Self-Interest Study – can occur when the researchers have an interest in the outcome
4. Response Bias – when the responder gives inaccurate responses for any reason
5. Perceived Lack of Anonymity – when the responder fears giving an honest answer
6. Loaded Questions – when the question wording influences the responses
7. Non-Response Bias – when people refuse to participate in the study

2 General Types of Studies


1. Observational/Correlational Study – study based on observations and measurements.
2. Experimental Study – study in which the effects of a treatment are measured. Its goal is to
establish a cause-and-effect relationship.

Types of Variables
1. Independent Variable/Treatment – the variable that is manipulated by the researcher.
2. Dependent Variable – one that is observed for changes in order to assess the effect of the
treatment.

Types of Groups in Experimental Samples


1. Control Group – is the group that does not receive the treatment
2. Treatment/Experimental Group – does receive the experimental treatment

Datum, Data, and Data Set


Datum is a single measurement or observation
Data are measurements or observations
Data set is a collection of measurements or observations

Types of Data
● Types of Numerical/Quantitative
Interval – numbers/scale w/o true ZERO
Ratio – numbers/scale w/ true ZERO
Discrete – only certain values are possible
Continuous – Any value

● Types of Categorical/Qualitative
Nominal – Unordered categories (eye color, gender)
Ordinal – ordered categories (taxonomic classification, Educational attainment)
SUMMATION NOTATION
In statistics, it is frequently necessary to work with sums of numerical values. For example, we
may wish to compute the average cost of a certain brand of toothpaste sold at different stores.

Consider a controlled experiment in which the decreases in weight over a 6-month period were
15, 10, 18, and 6 kilograms, respectively. If we designate the recorded values x₁ = 15, x₂ = 10,
x₃ = 18 and x₄ = 6, then we can use the capital Greek letter Σ to indicate "summation of" and
write the sum of the 4 weights as

∑ᵢ₌₁⁴ xᵢ

where we read "summation of xᵢ, i going from 1 to 4."

The numbers 1 and 4 are called the lower and upper limits of summation.

Hence,

∑ᵢ₌₁⁴ xᵢ = x₁ + x₂ + x₃ + x₄ = 15 + 10 + 18 + 6 = 49

Also,

∑ᵢ₌₂³ xᵢ = x₂ + x₃ = 10 + 18 = 28

In general, the symbol

∑ᵢ₌₁ⁿ

means that we replace i wherever it appears after the summation symbol by 1, then by 2, and
so on up to n, and then add up the terms. Therefore, we can write

∑ᵢ₌₁³ xᵢ² = x₁² + x₂² + x₃²   or   ∑ⱼ₌₂⁵ xⱼyⱼ = x₂y₂ + x₃y₃ + x₄y₄ + x₅y₅

The subscript may be any letter, although i, j and k seem to be preferred by statisticians.

Obviously,

∑ᵢ₌₁ⁿ xᵢ = ∑ⱼ₌₁ⁿ xⱼ

The lower limit of summation is not necessarily a subscript. For instance, the sum of the natural
numbers from 1 to 9 may be written

∑ₓ₌₁⁹ x = 1 + 2 + ⋯ + 9 = 45

When we are summing over all the values xᵢ that are available, the limits of summation are
often omitted and we simply write Σxᵢ. If in the diet experiment only 4 people were involved,
then Σxᵢ = x₁ + x₂ + x₃ + x₄.

Example 4:

If x₁ = 3, x₂ = 5 and x₃ = 7, find

a. Σxᵢ   b. ∑ᵢ₌₁³ 2xᵢ²   c. ∑ᵢ₌₂³ (xᵢ − i)

Solution:
a. Σxᵢ = x₁ + x₂ + x₃ = 3 + 5 + 7 = 15

b. ∑ᵢ₌₁³ 2xᵢ² = 2x₁² + 2x₂² + 2x₃² = 18 + 50 + 98 = 166

c. ∑ᵢ₌₂³ (xᵢ − i) = (x₂ − 2) + (x₃ − 3) = 3 + 4 = 7
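The three sums in Example 4 can be checked with a few lines of Python:

```python
x = [3, 5, 7]  # x1, x2, x3 from Example 4

a = sum(x)                                   # part a: sum of all xi
b = sum(2 * xi ** 2 for xi in x)             # part b: sum of 2*xi^2, i = 1..3
c = sum(x[i - 1] - i for i in range(2, 4))   # part c: sum of (xi - i), i = 2..3

print(a, b, c)  # 15 166 7
```

Here `x[i - 1]` converts the 1-based subscript used in the notation to Python's 0-based indexing.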

Example 5:

Given x₁ = 2, x₂ = −3, x₃ = 1, y₁ = 4, y₂ = 2 and y₃ = 5, evaluate

a. ∑ᵢ₌₁³ xᵢyᵢ   b. (∑ᵢ₌₂³ xᵢ)(∑ⱼ₌₁² yⱼ²)

Solution:
a. ∑ᵢ₌₁³ xᵢyᵢ = x₁y₁ + x₂y₂ + x₃y₃ = (2)(4) + (−3)(2) + (1)(5) = 7

b. (∑ᵢ₌₂³ xᵢ)(∑ⱼ₌₁² yⱼ²) = (x₂ + x₃)(y₁² + y₂²) = (−2)(20) = −40

Theorem 1:
The summation of the sum of two or more variables is the sum of their summations. Thus

∑ᵢ₌₁ⁿ (xᵢ + yᵢ + zᵢ) = ∑ᵢ₌₁ⁿ xᵢ + ∑ᵢ₌₁ⁿ yᵢ + ∑ᵢ₌₁ⁿ zᵢ

Proof: Expanding the left side and regrouping, we have

∑ᵢ₌₁ⁿ (xᵢ + yᵢ + zᵢ) = (x₁ + y₁ + z₁) + (x₂ + y₂ + z₂) + ⋯ + (xₙ + yₙ + zₙ)
= (x₁ + x₂ + ⋯ + xₙ) + (y₁ + y₂ + ⋯ + yₙ) + (z₁ + z₂ + ⋯ + zₙ)
= ∑ᵢ₌₁ⁿ xᵢ + ∑ᵢ₌₁ⁿ yᵢ + ∑ᵢ₌₁ⁿ zᵢ


Theorem 2:
If c is a constant, then

∑ᵢ₌₁ⁿ cxᵢ = c ∑ᵢ₌₁ⁿ xᵢ

Proof: Expanding the left side and factoring, we have

∑ᵢ₌₁ⁿ cxᵢ = cx₁ + cx₂ + ⋯ + cxₙ = c(x₁ + x₂ + ⋯ + xₙ) = c ∑ᵢ₌₁ⁿ xᵢ

Theorem 3:
If c is a constant, then

∑ᵢ₌₁ⁿ c = nc

Proof: If in Theorem 2 all the xᵢ are equal to 1, then

∑ᵢ₌₁ⁿ c = c + c + ⋯ + c = nc

Example 6:
If x₁ = 2, x₂ = 4, y₁ = 3, y₂ = −1, find the value of

∑ᵢ₌₁² (3xᵢ − yᵢ + 4)
= ∑ᵢ₌₁² 3xᵢ − ∑ᵢ₌₁² yᵢ + ∑ᵢ₌₁² 4
= 3 ∑ᵢ₌₁² xᵢ − ∑ᵢ₌₁² yᵢ + 2(4)
= (3)(2 + 4) − (3 − 1) + 8
= 18 − 2 + 8
= 24

Example 7:
Simplify

∑ᵢ₌₁³ (x − i)²
= ∑ᵢ₌₁³ (x² − 2xi + i²)
= ∑ᵢ₌₁³ x² − 2x ∑ᵢ₌₁³ i + ∑ᵢ₌₁³ i²
= 3x² − 2x(1 + 2 + 3) + (1 + 4 + 9)
= 3x² − 12x + 14

MEASURES OF CENTRAL LOCATION


To investigate a set of quantitative data, it is useful to define numerical measures that describe
important features of the data. One of the important ways of describing a group of
measurements, whether it be a sample or a population, is by the use of an average.

An average is a measure of the center of a set of data when the data are arranged in an
increasing or decreasing order of magnitude.

Any measure indicating the center of a set of data, arranged in an increasing or decreasing
order of magnitude is called measure of central location or a measure of central tendency.
The most commonly used measures of central location are the mean, median and mode.

Mean of Ungrouped Data


Population Mean. If the set of data x₁, x₂, …, x_N, not necessarily all distinct, represents a finite
population of size N, then the population mean is

μ = (∑ᵢ₌₁ᴺ xᵢ) / N

Example 1:
The numbers of employees at 5 different drugstores are 3, 5, 6, 4, and 6. Treating the data as a
population, find the mean number of employees for the five stores.

Solution:
Since the data are considered to be a finite population,

μ = (∑ xᵢ) / N = (3 + 5 + 6 + 4 + 6) / 5 = 4.8

Sample Mean. If the set of data x1, x2 … xn, not necessarily all distinct, represents a finite
sample of size n, then the sample mean is
∑𝑛𝑖=1 𝑥𝑖
𝑥̅ =
𝑛
Example 2 (Ungrouped Data): A food inspector examined a random sample of 7 cans of a
certain brand of tuna to determine the percent of foreign impurities. The following data were
recorded: 1.8, 2.1, 1.7, 1.6, 0.9, 2.7 and 1.8. Compute the sample mean.

Solution: This being a sample, we have

x̄ = (∑ x_i)/n = (1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8)/7 = 1.8%
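The two formulas above use the same arithmetic; only the interpretation (parameter versus statistic) differs. A minimal Python sketch, with illustrative names not taken from the text, reproduces Examples 1 and 2:

```python
def mean(data):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(data) / len(data)

employees = [3, 5, 6, 4, 6]                        # Example 1 (treated as a population)
impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]   # Example 2 (a sample)

mu = mean(employees)      # population mean, 4.8
x_bar = mean(impurities)  # sample mean, 1.8
```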

Mean of Grouped Data


The elements in a set of data may be arranged and grouped, resulting in a frequency distribution
table. There are two methods of computing the mean of grouped data: the long method and the
coded-deviation method.

For the long method:

μ = (∑fx)/N   or   x̄ = (∑fx)/n   (Long Method Formula)
Example 3 (Grouped Data):
Class Interval f x (midpoint) fx
90-94 7 92 644
85-89 13 87 1131
80-84 16 82 1312
75-79 8 77 616
70-74 6 72 432
N= 50 Σfx = 4135
x̄ = (∑fx)/n = 4135/50 = 82.7 or 83

For coded-deviation method


Step 1: Take the midpoint of one of the intervals, an arbitrary reference point, as assumed mean
𝑥̅ ’. As far as the result is concerned, it makes no difference which interval midpoint is used.
Step 2: After the frequency and midpoint columns, f and x, make another column to represent
unit deviations from the arbitrary reference point. Label it as d’. Since there is no deviation at the
reference point, place zero at that point and complete the column; positive deviations above and
negative deviations below it.
Step 3: Make another column and label it as fd’. Multiply each f by its d’ and enter the product in
this column.
Step 4: Get the sum of the fd’ to obtain Σfd’.
To find the mean of grouped data using the coded-deviation method, use the formula:

x̄ = x̄′ + (∑fd′/n) i

where x̄′ is the assumed mean and i is the class width.
Example 4:
Class Interval f x (midpoint) d’ fd’
90-94 7 92 2 14
85-89 13 87 1 13
80-84 16 82 0 0
75-79 8 77 -1 -8
70-74 6 72 -2 -12
N= 50 Σfd’ = 7

x̄ = x̄′ + (∑fd′/n) i = 82 + (7/50)(5) = 82 + 0.7 = 82.7 or 83
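Both grouped-mean methods can be sketched in Python against the table of Examples 3 and 4. The function names and the interval encoding as (lower limit, upper limit, frequency) tuples are illustrative assumptions of this sketch:

```python
# Rows listed from highest interval to lowest, matching the table
intervals = [(90, 94, 7), (85, 89, 13), (80, 84, 16), (75, 79, 8), (70, 74, 6)]

def grouped_mean_long(intervals):
    """Long method: x̄ = Σfx / n using class midpoints."""
    n = sum(f for _, _, f in intervals)
    return sum(f * (lo + hi) / 2 for lo, hi, f in intervals) / n

def grouped_mean_coded(intervals, assumed_idx, width):
    """Coded-deviation method: x̄ = x̄' + (Σfd'/n)·i, taking the midpoint of
    intervals[assumed_idx] as the assumed mean x̄'."""
    n = sum(f for _, _, f in intervals)
    x_assumed = (intervals[assumed_idx][0] + intervals[assumed_idx][1]) / 2
    # d' counts whole intervals above (+) or below (-) the reference class;
    # rows run from highest to lowest, so d' = assumed_idx - row index
    sum_fd = sum(f * (assumed_idx - k) for k, (_, _, f) in enumerate(intervals))
    return x_assumed + (sum_fd / n) * width
```

Both calls return 82.7 for this table, which also shows that the choice of assumed mean does not affect the coded result.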

The second most useful measure of central location is the median. For a population we
designate the median by μ̃, and for a sample we write x̃.

Median of Ungrouped Data


The median (ungrouped data) of a set of observations arranged in an increasing or decreasing
order of magnitude is the middle value when the number of observations is odd, or the arithmetic
mean of the two middle values when the number of observations is even.

Example 5 (Ungrouped): On 5 term tests in sociology, a student has made grades of 82, 93,
86, 92, and 79. Find the median for this population of grades.

Solution: Arranging the grades in an increasing order of magnitude, we get 79, 82, 86, 92, 93
and hence, μ̃ = 86.

Example 6 (Ungrouped): The nicotine contents for a random sample of 6 cigarettes of a


certain brand are found to be 2.3, 2.7, 2.5, 2.9, 3.1, and 1.9 milligrams. Find the median.

Solution: If we arrange these nicotine contents in an increasing order of magnitude, we get 1.9,
2.3, 2.5, 2.7, 2.9, 3.1 and the median is then the mean of 2.5 and 2.7.

Therefore,

x̃ = (2.5 + 2.7)/2 = 2.6 mg
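The odd/even rule above can be sketched as a short helper (a hypothetical function, not from the text):

```python
def median(data):
    """Middle value (odd n) or mean of the two middle values (even n)."""
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]
    return (s[n // 2 - 1] + s[n // 2]) / 2

grades = [82, 93, 86, 92, 79]              # Example 5 (odd n)
nicotine = [2.3, 2.7, 2.5, 2.9, 3.1, 1.9]  # Example 6 (even n)
```

median(grades) gives 86 and median(nicotine) gives 2.6, matching the two worked examples.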

Median of Grouped Data


For large quantities of data, the median is computed using a frequency distribution with a
cumulative frequency column. The middle item is then determined by n/2, where n is the number
of items (size of sample or population). To find the median of grouped data, use the formula:

x̃ = L + [(n/2 − F)/f] i

where: L = exact lower limit (lower boundary) of the median class; n = total number of items;
F = "less than" cumulative frequency of the class preceding the median class; f = frequency
of the median class; i = size of the class interval

Example 7 (Grouped):

Scores   Frequency (f)   Exact Lower Limit or Lower Boundary (L)   Cumulative Frequency (F)
90-94    7               89.5                                      50
85-89    13              84.5                                      43
80-84    16              79.5                                      30
75-79    8               74.5                                      14
70-74    6               69.5                                      6
i = 5    N = 50

Given: N = 50; N/2 = 25; L = 79.5; F = 14; f = 16; i = 5

Solution: x̃ = L + [(n/2 − F)/f] i = 79.5 + [(25 − 14)/16] (5) = 82.94
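The interpolation formula can be sketched directly from Example 7's table. The encoding of classes as (lower boundary, frequency) pairs, listed from the lowest class upward so the cumulative frequency accumulates naturally, is an assumption of this sketch:

```python
def grouped_median(intervals, width):
    """x̃ = L + [(n/2 - F)/f]·i, where the median class is the first one whose
    cumulative frequency reaches n/2."""
    n = sum(f for _, f in intervals)
    target = n / 2
    cum = 0                                # F: cumulative frequency so far
    for lower, f in intervals:
        if cum + f >= target:              # this is the median class
            return lower + (target - cum) / f * width
        cum += f

# Example 7's table, lowest class first: (lower boundary, frequency)
table = [(69.5, 6), (74.5, 8), (79.5, 16), (84.5, 13), (89.5, 7)]
# 79.5 + (25 - 14)/16 * 5 = 82.9375 ≈ 82.94
```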
Mode of Ungrouped Data
The mode of a set of observations is that value which occurs most often or with the greatest
frequency. The mode does not always exist. This is certainly true when all observations occur
with the same frequency.

Example 8 (Ungrouped): If the donations from the residents of Fairway Forest toward the
Virginia Lung Association are recorded as 9, 10, 5, 9, 9, 8, 6, 10 and 11 dollars, then 9 dollars,
the value that occurs with the greatest frequency, is the mode.
The number of movies attended last month by a random sample of 12 high school students
were recorded as follows: 2, 0, 3, 1, 2, 4, 2, 5, 4, 0, 1, and 4. In this case, there are two modes,
2 and 4, since both 2 and 4 occur with the greatest frequency. The distribution is said to be
bimodal.

Mode of Grouped Data


In a grouped distribution, the class interval with the highest frequency is called the modal
class. The midpoint of this interval may be taken as a rough value of the mode.

To find the mode of grouped data, use the formula:

Mo = Lmo + [d1/(d1 + d2)] i

where: Lmo = exact lower limit of the modal class; d1 = the difference between the frequency of
the modal class and the frequency of the class below it in the table; d2 = the difference between the
frequency of the modal class and the frequency of the class above it in the table; i = the size of the
class interval

Example 9 (Grouped):

Scores   Frequency (f)   Exact Lower Limit or Lower Boundary (L)   Cumulative Frequency (F)
90-94    7               89.5                                      50
85-89    13              84.5                                      43
80-84    16              79.5                                      30
75-79    8               74.5                                      14
70-74    6               69.5                                      6
i = 5    N = 50

Given: Lmo = 79.5; d1 = 16 − 8 = 8; d2 = 16 − 13 = 3; i = 5

Solution: Mo = Lmo + [d1/(d1 + d2)] i = 79.5 + [8/(8 + 3)] (5) = 83.14
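A minimal sketch of the grouped-mode formula follows. It keeps the table's order (highest interval first) and assumes the modal class is not the first or last row, so both d1 and d2 exist; the function name is illustrative:

```python
def grouped_mode(freqs, lower_bounds, width):
    """Mo = Lmo + [d1/(d1 + d2)]·i for an interior modal class."""
    m = freqs.index(max(freqs))    # position of the modal class
    d1 = freqs[m] - freqs[m + 1]   # vs. the class below it in the table
    d2 = freqs[m] - freqs[m - 1]   # vs. the class above it in the table
    return lower_bounds[m] + d1 / (d1 + d2) * width

freqs = [7, 13, 16, 8, 6]
lower_bounds = [89.5, 84.5, 79.5, 74.5, 69.5]
# 79.5 + 8/(8 + 3) * 5 ≈ 83.14
```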

MEASURES OF VARIATION
The three measures of central location do not by themselves give an adequate description of
the data. We need to know how the observations spread out from the average. It is quite
possible to have two sets of observations with the same mean or median that differ
considerably in the variability of their measurements about the average.
Example 10: Consider the following measurements, in liters, for two samples of orange juice
bottled by companies A and B:
Sample A 0.97 1.00 0.94 1.03 1.06
Sample B 1.06 1.01 0.88 0.91 1.14
Both samples have the same mean of 1 liter. It is quite obvious that company A bottles orange
juice with a more uniform content than company B. We say that the variability or the dispersion
of the observations from the average is less for sample A than for sample B. Therefore, in
buying orange juice, we would feel more confident that the bottle we select will be closer to the
advertised average if we buy from company A.

The most important statistics for measuring the variability of a set of data are the range and the
variance. The simplest of these to compute is the range.

Range of Ungrouped Data


The range (ungrouped) of a set of data is the difference between the largest and smallest
number in the set. It is the measure which requires the simplest computation.

Range = highest value – lowest value

Example 11 (Ungrouped): The IQs of 5 members of a family are 108, 112, 127, 118, and 113.
Find the range.

Solution: The range of the 5 IQs is 127 – 108 = 19

Range of Grouped Data


In grouped data, the range is the difference between the exact upper limit (upper boundary) of the
highest class interval and the exact lower limit (lower boundary) of the lowest class interval.

Example 12 (Grouped)
Scores f Lower Boundary Upper Boundary
90-94 7 89.5 94.5
85-89 13 84.5 89.5
80-84 16 79.5 84.5
75-79 8 74.5 79.5
70-74 6 69.5 74.5

The range is 94.5 – 69.5 = 25

The range is a poor measure of variation, particularly if the size of the sample or population is
large. It considers only the extreme values and tells us nothing about the distribution of numbers
in between. Consider, for example, the following two sets of data, both with range of 12.

Set A 3 4 5 6 8 9 10 12 15
Set B 3 7 7 7 8 8 8 9 15
In set A the mean and median are both 8, but the numbers vary over the entire interval from 3 to
15.

In set B the mean and median are also 8, but most of the values are closer to the center of the
data.

Although the range fails to measure the variation between the upper and lower observations, it
does have some useful applications. In industry, the range for measurements on items coming off
an assembly line might be specified in advance. As long as all measurements fall within the
specified range, the process is said to be in control.

Variance
To overcome the disadvantage of the range, we shall consider a measure of variation, namely,
the variance that considers the position of each observation relative to the mean of the set. This
is accomplished by examining the deviations from the mean.

The deviation of an observation from the mean is found by subtracting the mean of set of data
from the given observation.

For the finite population x1 , x2 ,…, xN , the deviations are x1 – μ, x2 – μ, …, xN – μ.


Similarly, if the set of data is the random sample x1, x2, …, xn, the deviations are
x1 − x̄, x2 − x̄, …, xn − x̄.

Note: An observation greater than the mean will yield a positive deviation, whereas an
observation smaller than the mean will produce a negative deviation.

Comparing the deviations for the two sets of data above, we have the following
Set A 3 4 5 6 8 9 10 12 15
Set B 3 7 7 7 8 8 8 9 15

The mean for both sets is 8,


Subtracting 8 from the given observations we will get

Set A -5 -4 -3 -2 0 1 2 4 7
Set B -5 -1 -1 -1 0 0 0 1 7

Clearly, most of the deviations of set B are smaller in magnitude than those of set A, indicating
less variation among the observations of set B. Our aim now is to obtain a single numerical
measure of variation that incorporates all the deviations from the mean.

Population Variance. Given the finite population x1, x2, …, xN, the population variance is

σ² = ∑_{i=1}^{N} (x_i − μ)² / N   or, for grouped data,   σ² = ∑ f(x_i − μ)² / N

Where: σ² = Population Variance
N = Size of Population
μ = Population Mean
x_i = Observations

Assuming that the two sets A and B are populations (from our previous example), we now use
the deviations in the table below to calculate their variance.

Set A -5 -4 -3 -2 0 1 2 4 7
Set B -5 -1 -1 -1 0 0 0 1 7

Computing the squares of the deviations, we get


Set A 25 16 9 4 0 1 4 16 49
Set B 25 1 1 1 0 0 0 1 49

Computing the variances:

Set A: σ² = ∑(x_i − μ)²/N = (25 + 16 + … + 49)/9 = 124/9 = 13.78
Set B: σ² = ∑(x_i − μ)²/N = (25 + 1 + … + 49)/9 = 78/9 = 8.67

A comparison of the two variances shows that the data of set A are more variable than the data
of set B.

On the other hand, the variance of a sample, denoted by s², is a statistic. Therefore, different
random samples of size n, selected from the entire population, would generally yield different
values for s². In most statistical applications, the parameter σ² is unknown and is estimated by
the value s². For our estimate to be good, it must be computed from a formula that on the
average produces the true answer σ². That is, if we were to take all the possible random
samples of size n from a population and compute s² for each sample, the average of all the s²
values should be equal to σ². A statistic that estimates the true parameter on the average is
said to be unbiased.

Intuitively, we would expect the formula for s² to be the same summation formula as that used
for σ², with the summation now extending over the sample observations and with μ replaced by
x̄. This is indeed done in many texts, but the values so computed for the sample variance tend
to underestimate σ² on the average. To compensate for this bias, we use n − 1 rather than n in
the divisor.

Sample Variance. Given a random sample x1, x2, …, xn, the sample variance is

s² = ∑_{i=1}^{n} (x_i − x̄)² / (n − 1)   or, for grouped data,   s² = ∑ f(x_i − x̄)² / (n − 1)

Where: s² = Sample Variance
n = Size of Sample
x̄ = Sample Mean
x_i = Observations

If the observations in sets A and B are instead treated as random samples, the sample
variances are

Set A: s² = ∑(x_i − x̄)²/(n − 1) = (25 + 16 + … + 49)/(9 − 1) = 124/8 = 15.5
Set B: s² = ∑(x_i − x̄)²/(n − 1) = (25 + 1 + … + 49)/(9 − 1) = 78/8 = 9.75
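The population and sample formulas differ only in the divisor; a minimal sketch (function names are illustrative) reproduces the figures for sets A and B:

```python
def pop_variance(data):
    """Population variance: σ² = Σ(x - μ)² / N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sample variance: s² = Σ(x - x̄)² / (n - 1); the n - 1 divisor
    makes s² an unbiased estimator of σ²."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]
```

pop_variance gives 13.78 and 8.67 (rounded) for sets A and B, and sample_variance gives 15.5 and 9.75, matching the worked figures.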

By using the squares of the deviations to compute the variance, we obtain a number in squared
units. That is, if the original measurements were in feet, the variance would be expressed in
square feet. To get a measure of variation expressed in the same units as raw data, as was the
case for the range, we take the square root of the variance. Such a measure is called the
standard deviation.

The standard deviation is the measure of the variation of a set of data in terms of the amounts
by which the individual values differ from their mean. It is considered the most stable
measure of spread, and is usually preferred in experimental and research studies where in-
depth statistical analysis of data is involved. It is affected by the value of every observation.

To find the standard deviation, take the square root of the population variance (σ²) or of the
sample variance (s²):

σ = √( ∑_{i=1}^{N} (x_i − μ)² / N )   and   s = √( ∑_{i=1}^{n} (x_i − x̄)² / (n − 1) )

Example 13 (Ungrouped): Calculate the variance and the standard deviation of the given
scores.

Scores (x)   Deviation d = (x − x̄)   Squared Deviation (d²)
45           0.2                      0.04
42           -2.8                     7.84
46           1.2                      1.44
43           -1.8                     3.24
48           3.2                      10.24
x̄ = 44.8                              Σd² = 22.8

s² = ∑(x_i − x̄)²/(n − 1) = 22.8/(5 − 1) = 22.8/4 = 5.7

s = √s² = √5.7 ≈ 2.39
Example 14 (Grouped):

Scores   Frequency (f)   Midpoint (x)   f(x)    d = (x − x̄)   d²    f(d²)
90-94    7               92             644     9             81    567
85-89    13              87             1131    4             16    208
80-84    16              82             1312    -1            1     16
75-79    8               77             616     -6            36    288
70-74    6               72             432     -11           121   726
x̄ = 83   n = 50                         Σf(x) = 4135                Σf(d²) = 1805

s² = ∑f(x_i − x̄)²/(n − 1) = 1805/(50 − 1) = 1805/49 = 36.84

s = √s² = √36.84 = 6.07
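The grouped computation can be sketched against Example 14's table. Following the text, the deviations are taken from the rounded mean 83; the function name is an illustrative assumption:

```python
import math

# Example 14's table: class midpoints and frequencies
midpoints = [92, 87, 82, 77, 72]
freqs = [7, 13, 16, 8, 6]

def grouped_sample_sd(midpoints, freqs, mean):
    """s = sqrt( Σf(x - x̄)² / (n - 1) ) for grouped data."""
    n = sum(freqs)
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    return math.sqrt(ss / (n - 1))

s = grouped_sample_sd(midpoints, freqs, 83)   # ≈ 6.07
```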

CHEBYSHEV’S THEOREM
In the preceding sections we described a set of observations (a population or a sample) by
means of a center or average and the variability about this average. The two values most often
used by statisticians are the mean and the standard deviation. If a distribution of measurements
has a small standard deviation, we would expect most of the values to be grouped closely
around the mean. However, a large value of the standard deviation indicates greater
variability, in which case we would expect the observations to be more spread out from the
mean.

The Russian mathematician P.L. Chebyshev (1821-1894) discovered that the fraction of the
measurements falling between any two values symmetric about the mean is related to the
standard deviation. Chebyshev's Theorem gives a conservative estimate of the fraction of
measurements falling within k standard deviations of the mean for any fixed number k.

Chebyshev's Theorem. At least the fraction 1 − 1/k² of the measurements of any set of data
must lie within k standard deviations of the mean.

Example 15: If the IQs of a random sample of 1080 students at a large university have a mean
score of 120 and a standard deviation of 8, use Chebyshev’s Theorem to determine the interval
containing at least 810 of the IQs in the sample.

From this interval, draw a statistical inference concerning the IQs of all students at this
university. Also, outside what interval can we be sure that no more than 120 of the scores fall?

Solution: Applying the formula 1 − 1/k²,

1 − 1/k² = 810/1080 = 3/4

we find that k = 2. Then

x̄ ± 2s = 120 ± 2(8) = 120 ± 16,

which means that the interval from 104 to 136 contains at least 3/4, or at least 810, of the IQs in
our sample. From this result, one might make the inference that at least 3/4 of the IQs for the
entire university fall in the interval from 104 to 136. For the second question, at most 120 scores
fall outside an interval when at least 960 = (8/9)(1080) fall within it; 1 − 1/k² = 8/9 gives k = 3,
so we can be sure that no more than 120 of the scores fall outside the interval
x̄ ± 3s = 120 ± 24, that is, outside 96 to 144.
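Chebyshev's bound is simple enough to code directly; a brief sketch (helper names are illustrative) reproduces Example 15:

```python
def chebyshev_fraction(k):
    """At least this fraction of any data set lies within k standard
    deviations of the mean (Chebyshev's Theorem)."""
    return 1 - 1 / k ** 2

def chebyshev_interval(mean, sd, k):
    """The interval mean ± k·sd covered by the bound."""
    return (mean - k * sd, mean + k * sd)

# Example 15: 810/1080 = 3/4 forces k = 2, giving the interval 120 ± 16
```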

Z-SCORES
An observation, x, from a population with mean μ and standard deviation 𝜎, has a z-score or z
value defined by
z = (x − μ)/σ
Where: z= z value
x = observation
𝜇 = population mean
𝜎 = population standard deviation

A z score measures how many standard deviations an observation is above or below the mean.
Since 𝜎 is never negative, a positive z score measures the number of standard deviations an
observation is above the mean, and a negative z score gives the number of standard deviations
an observation is below the mean. Note that the units of the denominator and the numerator of
a z score cancel. Hence a z score is unitless, thereby permitting a comparison of two
observations relative to their groups, measured in completely different units.

Example 16: Different typing skills are required for secretaries depending on whether one is
working in a law office, an accounting firm or for a research mathematical group at a major
university. In order to evaluate candidates for these positions, an employment agency
administers three distinct standardized typing samples. A time penalty has been incorporated
into the scoring of each sample based on the number of typing errors. The mean and standard
deviation for each test, together with the score achieved by a recent applicant are given in the
table.
Sample Applicant’s Score Mean Standard Deviation
Law 141sec 180 sec 30 sec
Accounting 7min 10 min 2 min
Scientific 33 min 26 min 5 min

For what type of position does this applicant seem to be suited?

Law:        z = (x − μ)/σ = (141 − 180)/30 = −1.3
Accounting: z = (x − μ)/σ = (7 − 10)/2 = −1.5
Scientific: z = (x − μ)/σ = (33 − 26)/5 = 1.4
Since speed is of primary importance, we are looking for the z-score that represents the
greatest number of standard deviations to the left of the mean, and that is -1.5. This means the
applicant is best suited to the accounting position.
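Because z-scores are unitless, the three tests can be compared directly in code. A minimal sketch (names are illustrative) reproduces Example 16:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

scores = {
    "Law": z_score(141, 180, 30),        # -1.3
    "Accounting": z_score(7, 10, 2),     # -1.5
    "Scientific": z_score(33, 26, 5),    #  1.4
}
# Fastest relative to its own group: the most negative z-score
best_fit = min(scores, key=scores.get)
```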

FREQUENCY DISTRIBUTIONS
Important characteristics of a large mass of data can be readily assessed by grouping the data
into different classes and then determining the number of observations that fall in each of the
classes. Such an arrangement, in tabular form, is called a frequency distribution.

Data that are presented in the form of a frequency distribution are called grouped data. We
often group the data of a sample into intervals to produce a better overall picture of the unknown
population, but in so doing we lose the identity of the individual observations in the sample.

Example 1:
Weight (Kilograms) Number of Pieces
7-9 2
10-12 8
13-15 14
16-18 19
19-21 7

For this data we have used 5 class intervals namely 7-9, 10-12, 13-15, 16-18, and 19-21.


The original data were recorded to the nearest kilogram, so the 8 observations in the interval
10-12 are the weights of all the pieces of luggage weighing more than 9.5 kilograms but less
than 12.5 kilograms. The numbers 9.5 and 12.5 are called the class boundaries for the given
interval.

The smallest and largest values that can fall in a given class interval are referred to as its class
limits.

The smallest value in the class interval is called the lower class limit, while the largest value is
called the upper class limit.

Class boundaries are always carried out to one more decimal place than the recorded
observations. This ensures that no observation can fall precisely on a class boundary and
thereby avoids any confusion as to which class the observation belongs.

Lower Class Boundary - obtained by subtracting half a unit of measurement (0.5 for data
recorded to the nearest whole number) from the lower class limit of the class interval.

Upper Class Boundary - obtained by adding half a unit of measurement (0.5 for data recorded
to the nearest whole number) to the upper class limit of the class interval.
Class Frequency- It is the number of observations falling in a particular class and is denoted by
the letter f.

Class Width - It is the numerical difference between the upper and lower class boundaries of a
class interval.

Class Mark or Class Midpoint - It is the midpoint between the upper and lower class
boundaries or class limits of a class interval.

Example 2:

Weight (Kilograms)   Number of Pieces (f)   Class Boundaries   Class Mark (x)
7-9                  2                      6.5-9.5            8
10-12                8                      9.5-12.5           11
13-15                14                     12.5-15.5          14
16-18                19                     15.5-18.5          17
19-21                7                      18.5-21.5          20

Steps in Grouping Large Set of Data into Frequency Distribution


1. Decide on the number of class intervals required.
2. Determine the range.
3. Divide the range by the number of classes to estimate the approximate width of the interval.
4. List the lower class limit of the bottom interval and then the lower class boundary. Add the
class width to the lower class boundary to obtain the upper class boundary. Write down the
upper class limit.
5. List all the class limits and class boundaries by adding the class width to the limits and
boundaries of the previous interval.
6. Determine the class marks of each interval by averaging the class limits or the class
boundaries.
7. Tally the frequencies for each class.
8. Sum the frequency column and check against the total number of observations.

Example 3: To illustrate the construction of frequency distribution consider the following data
which represents the lives of 40 similar car batteries recorded to the nearest tenth of a year. The
batteries are guaranteed to last 3 years.
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5

Class Interval   Class Boundaries   Class Midpoint   Frequency
1.5-1.9          1.45-1.95          1.7              2
2.0-2.4          1.95-2.45          2.2              1
2.5-2.9          2.45-2.95          2.7              4
3.0-3.4          2.95-3.45          3.2              15
3.5-3.9          3.45-3.95          3.7              10
4.0-4.4          3.95-4.45          4.2              5
4.5-4.9          4.45-4.95          4.7              3
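The eight steps above can be sketched for the battery-life data. The boundary offset of half the recording unit (0.05 here) follows the text; the function name and the tuple encoding of classes are assumptions of this sketch:

```python
lives = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
         3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
         2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
         3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
         4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

def frequency_table(data, start, width, classes, unit=0.1):
    """Tally observations into classes; boundaries sit half a unit beyond
    the class limits, so no observation can fall exactly on a boundary."""
    table = []
    for k in range(classes):
        lo = start + k * width                       # lower class limit
        hi = lo + width - unit                       # upper class limit
        lo_b, hi_b = lo - unit / 2, hi + unit / 2    # class boundaries
        f = sum(1 for x in data if lo_b < x < hi_b)
        table.append((round(lo, 1), round(hi, 1), f))
    return table

table = frequency_table(lives, start=1.5, width=0.5, classes=7)
# frequencies: [2, 1, 4, 15, 10, 5, 3], summing to 40
```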

Variations of these data can be shown by listing the relative frequencies or percentages for
each interval. How?

The relative frequency of each class can be obtained by dividing the class frequency by the total
frequency. A table listing the relative frequencies is called a relative frequency distribution. If
each relative frequency is multiplied by 100%, we have a percentage distribution.

Class Interval   Class Boundaries   Class Midpoint   Frequency   Relative Frequency   Percentage
1.5-1.9          1.45-1.95          1.7              2           0.05                 5%
2.0-2.4          1.95-2.45          2.2              1           0.025                2.5%
2.5-2.9          2.45-2.95          2.7              4           0.1                  10%
3.0-3.4          2.95-3.45          3.2              15          0.375                37.5%
3.5-3.9          3.45-3.95          3.7              10          0.25                 25%
4.0-4.4          3.95-4.45          4.2              5           0.125                12.5%
4.5-4.9          4.45-4.95          4.7              3           0.075                7.5%

In many situations we are more concerned with the number of observations that fall below a
specified value. For example, instead of identifying the number of batteries that lasted at least
3 years, we may be more concerned with the 7 batteries that lasted less than 3 years.

The total frequency of all values less than the upper class boundary of a given interval is called
the cumulative frequency up to and including that class. A table showing the cumulative
frequency is called cumulative frequency distribution.

The percentage cumulative distribution enables one to read off the percentage of observations
falling below certain specified values.
Class Interval   Class Boundaries   Class Midpoint   Frequency   Relative Frequency   Percentage   Cumulative Frequency   Cumulative Percent
1.5-1.9          1.45-1.95          1.7              2           0.05                 5%           2                      5%
2.0-2.4          1.95-2.45          2.2              1           0.025                2.5%         3                      7.5%
2.5-2.9          2.45-2.95          2.7              4           0.1                  10%          7                      17.5%
3.0-3.4          2.95-3.45          3.2              15          0.375                37.5%        22                     55%
3.5-3.9          3.45-3.95          3.7              10          0.25                 25%          32                     80%
4.0-4.4          3.95-4.45          4.2              5           0.125                12.5%        37                     92.5%
4.5-4.9          4.45-4.95          4.7              3           0.075                7.5%         40                     100%

The examples discussed so far involve numerical data, but note that frequency tables are also
applicable to categorical data.

Frequency Table for Categorical Data


First, decide on the number of categories or classes to use. These categories must be chosen
so as to accommodate all the data and so that no item is placed under more than one category.
The concepts of class limits, class boundaries and class marks are of no concern when
constructing frequency distributions using categorical data.

Graphical Representations
The information provided by a frequency distribution in tabular form is easier to grasp if
presented graphically. Most people find a visual picture beneficial in comprehending the
essential features of a frequency distribution.

Bar Chart. A widely used form of graphic presentation of numerical data.

Example 4:

Although the bar chart provides immediate information about a set of data in a condensed form,
we are usually more interested in a related pictorial representation called a histogram.

A histogram differs from a bar chart in that the base of each bar spans the class boundaries
rather than the class limits. The use of class boundaries for the bases eliminates the spaces
between the bars and gives the histogram its solid appearance.
Example 5:

Frequency Polygon. A second useful way of presenting numerical data in graphic form is by
means of frequency polygon. Frequency Polygons are constructed by plotting class
frequencies against class marks and connecting the consecutive points by straight lines.

A polygon is a many-sided closed figure. To close the frequency polygon, an additional class
interval is added to both ends of the distribution, each with zero frequency.
Relative Frequency Polygon. If we wish to compare two sets of data with unequal sample
sizes by constructing two frequency polygons on the same graph, we must use relative
frequencies or percentages. This is called a relative frequency polygon or a percentage
polygon.

Cumulative Frequency Polygon or Ogive. A line graph obtained by plotting the cumulative
frequency less than any upper class boundary against that upper class boundary and joining
all the consecutive points by straight lines. If relative cumulative frequencies or percentages
are used instead, we call the graph a relative frequency ogive or a percentage ogive.
SYMMETRY AND SKEWNESS
The shape or distribution of a set of measurements is best displayed by means of a histogram.
A distribution is said to be symmetric if it can be folded along a vertical axis so that the two
sides coincide. A distribution that lacks symmetry with respect to a vertical axis is said to
be skewed.

Example 6: Symmetric

Example: Skewed
The first distribution is skewed to the right, or positively skewed, since it has a long right tail
compared to a much shorter left tail.

The second distribution is skewed to the left or negatively skewed.

For a symmetric distribution of measurements, the mean and median are both located at the
same position along the horizontal axis. However, if the data are skewed to the right, the large
values in the right tail are not offset by correspondingly low values in the left tail, and
consequently the mean will be greater than the median. If the data are skewed to the left, the
small values in the left tail will make the mean less than the median. We shall use this behaviour
of the mean and the median relative to the standard deviation to define a numerical measure of
skewness.

For a perfectly symmetrical distribution, the mean and the median are identical and the value of
SK is zero.
SK is the Pearsonian coefficient of skewness, which is given by:

SK = 3(x̄ − x̃)/s   or   SK = 3(μ − μ̃)/σ

Where: SK = Coefficient of Skewness
x̄ (sample) or μ (population) = mean
x̃ (sample) or μ̃ (population) = median
s (sample) or σ (population) = standard deviation

When the distribution is skewed to the left, the mean is less than the median and the value of
SK will be negative. However, if the distribution is skewed to the right, the mean is greater than
the median and the value of SK will be positive. In general, the values of SK will fall between -3
and 3.

Example 7: Compute the Pearsonian coefficient of skewness for the distribution of battery lives
in the previous examples:

Solution: Assuming the data to be a sample, we find that the mean is 3.41, the median is 3.4
and the standard deviation is 0.70. Therefore,
SK = 3(x̄ − x̃)/s = 3(3.41 − 3.4)/0.70 = 0.04
indicating only a very slight amount of skewness to the right. With such a small value of SK, we
could essentially say that the distribution is symmetrical.
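The coefficient is a one-line computation; a brief sketch (the function name is illustrative) reproduces Example 7 from the summary values quoted in its solution:

```python
def pearson_sk(mean, median, sd):
    """Pearsonian coefficient of skewness: SK = 3(mean - median) / sd."""
    return 3 * (mean - median) / sd

# Battery-life summary values from Example 7's solution
sk = pearson_sk(3.41, 3.4, 0.70)   # ≈ 0.04, a very slight positive skew
```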
Note: Although histograms assume a wide variety of shapes, fortunately most distributions that
we meet in practice can be represented approximately by bell-shaped histograms for which the
SK will be very close to zero. This will be true for any set of data where the frequency of the
observations falling in the various classes decreases at roughly the same rate as we get farther
out in the tails of the distribution. These bell-shaped distributions play a major role in the field of
statistical inference. Some are understandably more variable than others, as reflected by a flatter
and wider histogram. Chebyshev's Theorem tells us that at least 3/4 or 8/9 of the observations of
any distribution, bell-shaped or not, will be within 2 or 3 standard deviations of the mean,
respectively. If the distribution happens to be somewhat bell-shaped, we can state a rule that
gives even stronger results.

Empirical Rule:
Given a bell-shaped distribution of measurements, then approximately
68% of the observations lie within 1 standard deviation of the mean
95% of the observations lie within 2 standard deviations of the mean
99.7% of the observations lie within 3 standard deviations of the mean.

Example 8:
From the given example on car batteries, we find that the mean is 3.41 and the standard deviation
is 0.70. The Empirical Rule states that approximately 68%, or 27 of the 40 observations,
should be contained in the interval x̄ ± s = 3.41 ± 0.70, or from 2.71 to 4.11. An actual
count shows that 28 of the 40 observations fall in the given interval. Similarly, 95%, or 38 out of
40 observations, should fall in the interval x̄ ± 2s = 3.41 ± 2(0.70), or from 2.01 to 4.81. The
actual count this time shows that 38 of the 40 observations fall in the specified interval. The
interval x̄ ± 3s = 3.41 ± 3(0.70), or from 1.31 to 5.51, contains all the measurements. By
Chebyshev's Theorem we could only have concluded that at least 30 observations will fall in the
interval from 2.01 to 4.81 and at least 36 observations will fall between 1.31 and 5.51.
PERCENTILES, DECILES AND QUARTILES
We have already discussed the measures of central location, but there are several other measures
of location that describe or locate the position of certain non-central pieces of data relative to the
entire set of data. These measures are often referred to as fractiles or quantiles.

Fractiles or Quantiles. These are values below which a specific fraction or percentage of the
observations in a given set must fall.

The most common fractiles are percentiles, deciles and quartiles.

Percentiles. These are values that divide a set of observations into 100 equal parts. These
values, denoted by P1, P2, … P99 , are such that 1% of the data falls below P1, 2% falls below
P2, … , and 99% falls below P99.

Example 9. Find the P85 for the distribution of the battery lives in the previous example.
2.2 4.1 3.5 4.5 3.2 3.7 3.0 2.6
3.4 1.6 3.1 3.3 3.8 3.1 4.7 3.7
2.5 4.3 3.4 3.6 2.9 3.3 3.9 3.1
3.3 3.1 3.7 4.4 3.2 4.1 1.9 3.4
4.7 3.8 3.2 2.6 3.9 3.0 4.2 3.5
Solution: First, rank the given data in increasing order of magnitude.
1.6 2.6 3.1 3.2 3.4 3.7 3.9 4.3
1.9 2.9 3.1 3.3 3.4 3.7 3.9 4.4
2.2 3.0 3.1 3.3 3.5 3.7 4.1 4.5
2.5 3.0 3.2 3.3 3.5 3.8 4.1 4.7
2.6 3.1 3.2 3.4 3.6 3.8 4.2 4.7

Since the table contains 40 observations, we seek the value below which (85/100)(40) = 34
observations fall. Based on the table, P85 could be any value between 4.1 years and 4.2 years.
In order to give a unique value, we shall define P85 to be the value midway between these two
observations. Therefore, P85 = 4.15 years.

This procedure works very well whenever the number of observations below the given
percentile is a whole number. However, when the required number of observations is fractional,
it is customary to use the next highest whole number to find the required percentile.

Example 10: Find the P48 in the given set of observations of the 40 car batteries.

(48/100)* 40 = 19.2 observations

Round up to the next integer, so, use the 20th observation as the location point. Hence, P48 = 3.4
years
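The textbook rule (average two ranked values when p% of n is a whole number, otherwise round up) can be sketched as follows; the function name is an illustrative assumption:

```python
import math

lives = [2.2, 4.1, 3.5, 4.5, 3.2, 3.7, 3.0, 2.6,
         3.4, 1.6, 3.1, 3.3, 3.8, 3.1, 4.7, 3.7,
         2.5, 4.3, 3.4, 3.6, 2.9, 3.3, 3.9, 3.1,
         3.3, 3.1, 3.7, 4.4, 3.2, 4.1, 1.9, 3.4,
         4.7, 3.8, 3.2, 2.6, 3.9, 3.0, 4.2, 3.5]

def percentile_ungrouped(data, p):
    """If p% of n is a whole number w, average the w-th and (w+1)-th ranked
    values; otherwise round up and take that ranked value."""
    s = sorted(data)
    pos = p * len(s) / 100     # multiply first to keep whole numbers exact
    if pos == int(pos):
        w = int(pos)
        return (s[w - 1] + s[w]) / 2
    return s[math.ceil(pos) - 1]
```

percentile_ungrouped(lives, 85) reproduces P85 = 4.15 and percentile_ungrouped(lives, 48) reproduces P48 = 3.4 from Examples 9 and 10.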

Percentile of Grouped Data


In grouping the data, we have chosen to ignore the identity of the individual observations. The
only information that remains, assuming the original raw data have been discarded is the
number of observations falling in each class interval. To evaluate a percentile from a frequency
distribution, we assume the measurements within a given class interval to be uniformly
distributed between the lower and upper class boundaries. This is equivalent to interpreting a
percentile as a value below which a specific fraction or percentage of the area of a histogram
falls.

Example 11: Find P48 for the frequency distribution of the battery lives.
Solution: We are seeking the value below which (48/100) * 40 = 19.2 of the observations fall.
The fact that the observations are assumed uniformly distributed over each class interval permits
us to use fractional observations, as is the case here. There are 7 observations falling below the
class boundary 2.95. We still need 12.2 of the next 15 observations falling between 2.95 and
3.45. Therefore, we must go a distance (12.2/15) * 0.5 ≈ 0.41 beyond 2.95.
Hence, P48 = 2.95 + 0.41
= 3.36 years
compared with 3.4 years obtained above from the ungrouped data. Therefore, we conclude that
48% of all batteries of this type will last less than 3.36 years.
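The interpolation above can be written as a short Python sketch. The class boundaries and frequencies below are reconstructed from the figures quoted in the worked examples (7 observations below 2.95, 15 between 2.95 and 3.45, and so on); the function and variable names are mine.

```python
def percentile_grouped(classes, p):
    """Interpolate within the class containing the percentile, assuming
    observations are uniform over each class interval. `classes` is a
    list of ((lower_boundary, upper_boundary), frequency) pairs in order."""
    n = sum(f for _, f in classes)
    target = p / 100 * n              # observations that must fall below
    cum = 0
    for (lo, hi), f in classes:
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f

battery_classes = [((1.45, 1.95), 2), ((1.95, 2.45), 1), ((2.45, 2.95), 4),
                   ((2.95, 3.45), 15), ((3.45, 3.95), 10), ((3.95, 4.45), 5),
                   ((4.45, 4.95), 3)]

p48 = percentile_grouped(battery_classes, 48)   # 2.95 + (12.2/15)(0.5) ≈ 3.36
d7  = percentile_grouped(battery_classes, 70)   # 3.45 + (6/10)(0.5) = 3.75
```

The second call computes D7 as the 70th percentile, matching Example 12 below.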

Deciles. These are values that divide a set of observations into 10 equal parts. These values,
denoted by D1, D2, …, D9, are such that 10 % of the data falls below D1, 20% falls below D2, …
and 90% falls below D9.

Example 12: Use the frequency distribution of the lives of car batteries to find D7.

Solution: We need the value below which (70/100) * 40 = 28 observations fall. There are 22
observations falling below 3.45. We still need 6 of the next 10 observations and therefore, we
must go a distance (6/10) * 0.5 = 0.3 beyond 3.45. Hence,
D7 = 3.45 + 0.3 = 3.75 years.

Deciles are found in exactly the same way that we found percentiles. To find D7 for the
ungrouped distribution of battery lives, we need the value below which (70/100) * 40 = 28 of the
observations fall. Since this can be any value between 3.7 years and 3.8 years, we take their
average and hence D7 = 3.75 years. Therefore, we conclude that 70% of all batteries of this
type will last less than 3.75 years.

Quartiles. These are values that divide a set of observations into 4 equal parts. These values,
denoted by Q1, Q2, and Q3 , are such that 25% of the data falls below Q1, 50% falls below Q2
and 75% falls below Q3.

Example 13: To find Q1 for the distribution of the battery lives, we need the value below which
(25/100) * 40 = 10 of the observations fall. Since the 10th and 11th measurements are both equal
to 3.1 years, their average will also be 3.1 years and hence Q1 = 3.1 years.

Note: The 50th percentile, the 5th decile, and the second quartile (Q2) of a distribution are all equal
to the same value, commonly referred to as the median. All the quartiles and deciles are
percentiles. For example, the seventh decile is the 70th percentile and the first quartile is the 25th
percentile. Any percentile, decile or quartile can also be estimated from a percentage ogive.

PROBABILITY
Sample Space
In statistics we use the word experiment to describe any process that generates a set of data.
An example of a statistical experiment might be the tossing of a coin. In this experiment there
are only two possible outcomes, heads and tails. We are particularly interested in the
observations obtained by repeating the same experiment several times. In most cases the
outcomes will depend on chance and therefore cannot be predicted with certainty. Going back
to the coin: even if we toss the coin repeatedly, we cannot be certain that a given toss will
result in a head. However, we do know the entire set of possibilities for each toss.
Sample Space. It is the set of all possible outcomes of a statistical experiment and is
represented by the symbol S.

Each outcome in a sample space is called an element or a member of the sample space or
simply a sample point. If the sample space has a finite number of elements, we may list the
members separated by commas and enclosed in braces. Thus the sample space S, of
possible outcomes when a coin is tossed, may be written as:

S = {Heads, Tails} = {H, T}

Example 1: Consider the experiment of tossing a die. If we are interested in the number that
shows on the top face, then the sample space would be: S1 = {1, 2, 3, 4, 5, 6}

On the other hand, if we are interested only in whether the number is even or odd, then the
sample space is simply: S2 = {even, odd}

This example illustrates the fact that more than one sample space can be used to describe
the outcomes of an experiment. In this case, S1 provides more information than
S2. If we know which element in S1 occurs, we can tell which outcome in S2 occurs; however,
knowledge of what happens in S2 in no way helps us know which element in S1 occurs. In
general, it is desirable to use a sample space that gives the most information concerning the
outcomes of the experiment.

In some experiments it will be helpful to list the elements of the sample space systematically by
means of a tree diagram.

Example 2: Suppose that 3 items are selected at random from a manufacturing process. Each
item is inspected and classified defective, D, or non-defective, N.

To list the elements of the sample space providing the most information, we construct the tree
diagram.

Now, the various paths of the tree give the distinct sample points. Starting at the top with the
first path, we get the sample point DDD, indicating the possibility that all three items inspected
are defective. Proceeding along the other paths, the sample space is
S = {DDD, DDN, DND, DNN, NDD, NDN, NND, NNN}
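The eight paths of the tree diagram are simply the Cartesian product {D, N} × {D, N} × {D, N}, which a few lines of Python can enumerate (a sketch; the variable names are mine):

```python
from itertools import product

# Each of the 3 inspected items is either D (defective) or N (non-defective),
# so the sample points are the Cartesian product {D, N}^3.
S = ["".join(path) for path in product("DN", repeat=3)]
print(S)   # ['DDD', 'DDN', 'DND', 'DNN', 'NDD', 'NDN', 'NND', 'NNN']
```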

Sample spaces with a large or infinite number of sample points are best described by a
statement or rule. For example, if the possible outcomes of an experiment are the set of cities in
the world with a population over 1 million, our sample space is written:


S = {x | x is a city with a population over 1 million}

which reads “S is the set of all x such that x is a city with a population over 1 million.”

The vertical bar is read “such that”. Similarly, if S is the set of all points (x,y) on the boundary of
a circle of radius 2 centered at the origin we write:

Example 3: S = {(x, y) | x² + y² = 4}

Whether we describe the sample space by the rule method or by listing the elements will
depend on the specific problem at hand. The rule method has practical advantages, particularly
in the many experiments where a listing becomes a very tedious chore.

Events
In any given experiment we may be interested in the occurrence of certain events, rather than in
the outcome of a specific element in the sample space. For instance, we might be interested in
the event A that the outcome when a die is tossed is divisible by 3. This will occur if the outcome
is an element of the subset A = {3, 6} of the sample space S1 in our previous example. As an
additional illustration, we might be interested in the event B that the number of defectives is
greater than 1 in Example 2. This will occur if the outcome is an element of the subset B = {DDD,
DDN, DND, NDD} of the sample space S.

To each event we assign a collection of sample points that constitutes a subset of the sample
space. This subset represents all the elements for which the event is true.

Event. An event is a subset of a sample space.

Example 4: Given the sample space S = {t | t ≥ 0}, where t is the life in years of a certain
electric component, then the event A that the component fails before the end of the 5th year is
the subset A = {t | 0 ≤ t < 5}.

There are two types of events: Simple and Compound Events

If an event is a set containing only one element of the sample space, then it is called a simple
event. A compound event is one that can be expressed as the union of simple events.

Example 5: The event of drawing a heart from a deck of 52 playing cards is the subset A =
{heart} of the sample space S = {heart, spade, club, diamond}. Therefore, A is a simple event.
Now the event B of drawing a red card is a compound event, since B = {heart} U {diamond} =
{heart, diamond}
Note that the union of simple events produces a compound event that is still a subset of the
sample space. We should also note that if the 52 cards of the deck were the elements of the
sample space rather than the 4 suits, then the event A of Example 5 would be a compound
event.
Null Space. The null space or empty space is a subset of the sample space that contains no
elements. We denote this event by the symbol Ø.

If we let A be the event of detecting a microscopic organism by the naked eye in a biological
experiment, then A = Ø. Also, if B = {x | x is an even factor of 7}, then B must be the null set,
since the only possible factors of 7 are the odd numbers 1 and 7.

The relationships between events and the corresponding sample space can be illustrated
pictorially by means of a Venn diagram.

In a Venn diagram we might let the sample space be a rectangle and represent events by
circles drawn inside the rectangle.

We see from the figure that events A, B and C are all subsets of the sample space S. It is also
seen that event B is a subset of event A; events B and C have no sample points in common;
and events A and C have at least one sample point in common. This figure may then depict a
situation in which one selects a card at random from an ordinary deck of 52 playing cards and
observes whether the following events occur:
A: the card is red
B: the card is the jack, queen or king of diamonds
C: the card is an ace.
Clearly, the only sample points common to events A and C are the two red aces.

Sometimes, it is convenient to shade various areas of the Venn diagram just like this:

In this case we take all the students in the college to be our sample space. The event
representing the students taking mathematics is the first circle, while the other circle represents
the students taking history; the region where the two circles intersect represents the students
taking both math and history. The unshaded part of the sample space, not included in either
circle, represents the students taking subjects other than math and history.

Operations with Events


We now consider certain operations with events that will result in the formation of new events.
These new events will be subsets of the same sample space as the given events.

Intersection of Events. The intersection of two events A and B, denoted by the symbol A ∩ B,
is the event containing all elements that are common to A and B.
The elements in the set A ∩ B represent the simultaneous occurrence of both A and B and
therefore must be those elements, and only those, that belong to both A and B. These elements
may either be listed or defined by the rule A ∩ B = {x | x ∈ A and x ∈ B}, where the symbol ∈
means “is an element of” or “belongs to”. In the Venn diagram, the shaded area corresponds to
the event A ∩ B.
Examples 6:
1. Let A = {1, 2, 3, 4, 5} and B = {2, 4, 6, 8}; then A ∩ B = {2, 4}
2. If R is the set of all taxpayers and S is the set of all people over 65 years of age, then R ∩ S
is the set of all taxpayers who are over 65 years of age.
3. Let P = {a, e, i, o, u} and Q = {r, s, t}; then P ∩ Q = Ø. That is, P and Q have no elements in
common.

In certain statistical experiments it is by no means unusual to define two events A and B that
cannot both occur simultaneously. The events A and B are then said to be mutually exclusive.

Mutually Exclusive Events. Two events A and B are mutually exclusive if A ∩ B = Ø; that is, A
and B have no elements in common.

Two mutually exclusive events A and B are illustrated in the Venn diagram. By shading the
areas corresponding to the events A and B, we find no overlapping shaded area representing
the event A ∩ B. Hence A ∩ B is empty.

Example 7:
Suppose that the die is tossed. Let A be the event that an even number turns up and B the
event that an odd number shows. The events A = {2, 4, 6} and B = {1, 3, 5} have no points in
common, since an even and an odd number cannot occur simultaneously on a single toss of a die.
Therefore A ∩ B = Ø, and consequently the events A and B are mutually exclusive.

Often, one is interested in the occurrence of at least one of two events associated with an
experiment. Thus, in a die-tossing experiment, if A = {2, 4, 6} and B = {4, 5, 6}, we might be
interested in either A or B occurring, or both A and B occurring. Such an event, called the union
of A and B, will occur if the outcome is an element of the subset {2, 4, 5, 6}.

Union of Events. The union of two events A and B, denoted by the symbol A U B, is the event
containing all the elements that belong to A or to B or to both.

Examples 8:
1. Let A = {2, 3, 5, 8} and B = {3, 6, 8}; then A U B = {2, 3, 5, 6, 8}
2. If M = {x | 3 < x < 9} and N = {y | 5 < y < 12}; then M U N = {z | 3 < z < 12}

Suppose that we consider the smoking habits of the employees of some manufacturing firm as
our sample space. Let the subset of smokers correspond to some event. Then all the
nonsmokers correspond to some event, also a subset of S, which is called the complement of
the set of smokers.

Complement of an Event. The complement of an event A with respect to S is the set of all
elements of S that are not in A. We denote the complement of A by the symbol A’.

The elements of A’ may be listed or defined by the rule A’ = {x | x ∈ S and x ∉ A}. In this figure,
the area representing the elements of the event A’ has been shaded.

Examples 9:
1. Let R be the event that a red card is selected
from an ordinary deck of 52 playing cards and let S
be the entire deck. Then R’ is the event that the
card selected from the deck is not red but a black
card.

2. Consider the sample space S = {book, dog, cigarette, coin, map, war}. Let A = {dog, war, book,
cigarette}. Then A’ = {coin, map}.

Several results follow from the foregoing definitions and may easily be verified by means
of Venn diagrams:
1. A ∩ Ø = Ø
2. A U A’ = S
3. (A’)’ = A
4. A U Ø = A
5. S’ = Ø
6. A ∩ A’ = Ø
7. Ø’ = S
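These seven identities can be checked mechanically with Python's built-in set type; the sample space and event below are my own small example, not from the text:

```python
# S is the card-suit sample space; A is the event "the card is red".
S = {"heart", "spade", "club", "diamond"}
A = {"heart", "diamond"}
A_c = S - A          # the complement A' with respect to S
empty = set()        # the null set Ø

assert A & empty == empty    # 1. A ∩ Ø = Ø
assert A | A_c == S          # 2. A U A' = S
assert S - A_c == A          # 3. (A')' = A
assert A | empty == A        # 4. A U Ø = A
assert S - S == empty        # 5. S' = Ø
assert A & A_c == empty      # 6. A ∩ A' = Ø
assert S - empty == S        # 7. Ø' = S
```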

Counting Sample Points


One of the problems that the statistician must consider and attempt to evaluate is the element of
chance associated with the occurrence of certain events when an experiment is performed.
These problems belong in the field of probability. In many cases we shall be able to solve a
probability problem by counting the number of points in the sample space without actually listing
each element. The fundamental principle of counting, often referred to as the Multiplication
Rule, is stated in the following theorem.

Theorem 4.1 (Multiplication Rule). If an operation can be performed in n1 ways, and if for each
of these a second operation can be performed in n2 ways, then the two operations can be
performed together in n1n2 ways.
Example 10: How many sample points are in the sample space when a pair of dice is thrown
once?

Solution: The first die can land in any of 6 ways. For each of these 6 ways the second die can
also land in 6 ways. Therefore, the pair of dice can land in (6)(6) = 36 ways.

Theorem 4.1 may be extended to cover any number of operations. In Example 2, for instance,
the first item can be classified in 2 ways, defective or non-defective, and likewise for the second
and third items, resulting in (2)(2)(2) = 8 possibilities displayed in the tree diagram. The general
multiplication rule covering k operations is stated in the following theorem.

Theorem 4.2 (Generalized Multiplication Rule). If an operation can be performed in n1 ways,
if for each of these a second operation can be performed in n2 ways, if for each of the first two a
third operation can be performed in n3 ways, and so on, then the sequence of k operations can
be performed in n1n2…nk ways.

Example 11: How many lunches are possible consisting of soup, a sandwich, dessert and a
drink if one can select from 4 soups, 3 kinds of sandwiches, 5 desserts and 4 drinks?

Solution: The total number of lunches would be (4)(3)(5)(4) = 240.

Example 12: How many even three-digit numbers can be formed from the digits 1, 2, 5, 6, and
9 if each digit can be used only once?

Solution: Since the number must be even, we have only 2 choices for the units position. For
each of these we have 4 choices for the hundreds position and 3 choices for the tens position.
Therefore, we can form a total of (2)(4)(3) = 24 even three-digit numbers
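The count in Example 12 can be confirmed by brute force: generate every ordered selection of 3 distinct digits and keep those ending in an even digit (a sketch; the variable names are mine):

```python
from itertools import permutations

digits = (1, 2, 5, 6, 9)
# A number is even exactly when its units digit (the last position) is even.
evens = [p for p in permutations(digits, 3) if p[2] % 2 == 0]
print(len(evens))   # 24, agreeing with (2)(4)(3)
```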

Frequently, we are interested in a sample space that contains as elements all possible orders or
arrangements of a group of objects. For example, we might want to know how many
arrangements are possible for sitting 6 people around a table, or we might ask how many
different orders are possible to draw 2 lottery tickets from a total of 20. The different
arrangements are called permutations.

Permutation. It is an arrangement of all or part of a set of objects.

Consider the three letters a, b, and c. The possible permutations are abc, acb, bac, cab, cba,
bca. Thus we see that there are 6 distinct arrangements. Using theorem 4.2, we could have
arrived at the answer without listing the different orders. There are 3 positions to be filled from
the letters a, b, c. Therefore, we have 3 choices for the first position, and 2 for the second,
leaving only 1 choice for the last position, giving a total of (3)(2)(1) = 6 permutations. In general,
n distinct objects can be arranged in n(n-1)(n-2) … (3)(2)(1) ways. We represent this product by
the symbol n!, which is read “n factorial”. Three objects can thus be arranged in 3! = 3*2*1 = 6
ways. By definition, 1! = 1 and 0! = 1.

Theorem 4.3. The number of permutations of n distinct objects is n!.

The number of permutations of the four letters a, b, c, and d will be 4! = 24. Let us now consider
the number of permutations that are possible for taking the 4 letters 2 at a time. These would be
ab, ac, ad, ba, ca, da, bc, cb, bd, db, cd and dc. Using Theorem 4.1, we have 2 positions to fill
with 4 choices for the first and 3 choices for the second, a total of (4)(3) = 12 permutations. In
general, n distinct objects taken r at a time can be arranged in n(n-1)(n-2) …(n-r +1) ways. We
represent this product by the symbol nPr = n!/ (n-r)!

Theorem 4.4. The number of permutations of n distinct objects taken r at a time is:
nPr = n! / (n-r)!

Example 13: Two lottery tickets are drawn from 20 for first and second prizes. Find the number
of sample points in the space S.
20P2 = 20! / (20-2)!
= 20! / 18!
= (20)(19)
= 380

Example 14: How many ways can a basketball team schedule 3 exhibition games with 3 teams
if they are all available on any of 5 possible dates?
5P3 = 5! / (5-3)! = 5! / 2! = (5)(4)(3) = 60
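Theorem 4.4 translates directly into code; Python 3.8+ also ships math.perm, which computes the same quantity. The helper name nPr is mine:

```python
from math import factorial, perm

def nPr(n, r):
    """Permutations of n distinct objects taken r at a time: n!/(n-r)!."""
    return factorial(n) // factorial(n - r)

print(nPr(20, 2))    # 380 (Example 13)
print(nPr(5, 3))     # 60  (Example 14)
print(perm(5, 3))    # 60, the library equivalent
```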

Permutations that occur by arranging objects in a circle are called circular permutations. Two
circular permutations are not considered different unless corresponding objects in the two
arrangements are preceded or followed by a different object as we proceed in a clockwise
direction.

Example 15: If four people are playing bridge, we do not have a new permutation if they all
move one position in a clockwise direction. By considering 1 person in a fixed position and
arranging the other 3 in 3! ways, we find that there are 3! = 6 distinct arrangements for the
bridge game.

Theorem 4.5. The number of permutations of n distinct objects arranged in a circle is (n-1)!.

So far we have considered permutations of distinct objects. That is, all the objects were
completely different or distinguishable. Obviously, if the letters b and c are both equal to x, then
the 6 permutations of the letters a, b and c become axx, axx, xax, xax, xxa, and xxa, of which
only 3 are distinct. Therefore, with 3 letters, 2 being the same, we have 3!/ 2! = 3 distinct
permutations. With the four letters a, b, c, and d, we had 24 distinct permutations. If we let a = b = x
and c = d = y, we can list only the following: xxyy, xyxy, yxxy, yyxx, xyyx, yxyx. Thus, we have
4!/(2!2!) = 6 distinct permutations.

Theorem 4.6. The number of permutations of n things of which n1 are of one kind, n2 of a
second kind, …, nk of a kth kind is:
n! / (n1! n2! … nk!)

Example 16: How many different ways can 3 red, 4 yellow and 2 blue bulbs be arranged in a
string of Christmas tree lights with 9 sockets?
9! / (3! 4! 2!) = 1260

Often we are concerned with the number of ways of partitioning a set of n objects into r subsets,
called cells. A partition has been achieved if the intersection of every possible pair of the r
subsets is the empty set Ø and if the union of all subsets gives the original set. The order of the
elements within a cell is of no importance. Consider the set {a, e, i, o, u}. The possible partitions
into 2 cells are {(a,e,i,o),(u)}, {(a,i,o,u),(e)}, {(a,e,o,u),(i)}, {(a,e,i,u),(o)}, {(e,i,o,u),(a)}. We see
that there are 5 such ways to partition a set of 5 elements into 2 subsets or cells containing 4
elements in the first cell and 1 element in the second.

The number of partitions for this illustration is denoted by (5 choose 4, 1) = 5! / (4! 1!) = 5,
where the top number represents the total number of elements and the bottom numbers
represent the number of elements going into each cell.

Theorem 4.7. The number of ways of partitioning a set of n objects into r cells with n1 elements
in the first cell, n2 elements in the second, and so on, is:
(n choose n1, n2, …, nr) = n! / (n1! n2! … nr!)
where n1 + n2 + ⋯ + nr = n

Example 17: How many ways can 7 people be assigned to 1 triple and 2 double rooms?
(7 choose 3, 2, 2) = 7! / (3! 2! 2!) = 210
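The formula of Theorems 4.6 and 4.7 is the same multinomial coefficient, so one small helper covers both Example 16 and Example 17 (a sketch; the function name is mine):

```python
from math import factorial

def multinomial(n, counts):
    """n! / (n1! n2! ... nk!), where counts = [n1, n2, ..., nk] sums to n."""
    assert sum(counts) == n
    result = factorial(n)
    for c in counts:
        result //= factorial(c)
    return result

print(multinomial(9, [3, 4, 2]))   # 1260 (Example 16: 3 red, 4 yellow, 2 blue)
print(multinomial(7, [3, 2, 2]))   # 210  (Example 17: 1 triple, 2 double rooms)
```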

In several problems we are interested in the number of ways of selecting r objects from n
without regard to order. These selections are called combinations.

A combination creates a partition with 2 cells, one cell containing the r objects selected and the
other cell containing the n-r objects that are left. The number of such combinations, denoted by
(n choose r, n-r), is usually shortened to (n choose r), since the number of elements in the
second cell must be n-r.

Theorem 4.8. The number of combinations of n distinct objects taken r at a time is:

(n choose r) = n! / (r! (n-r)!)

Example 18: From 4 Republicans and 3 Democrats, find the number of committees of 3 that
can be formed with 2 Republicans and 1 Democrat.

Solution: The number of ways of selecting 2 Republicans from 4 is
(4 choose 2) = 4! / (2! (4-2)!) = 4! / (2! 2!) = 6
The number of ways of selecting 1 Democrat from 3 is
(3 choose 1) = 3! / (1! (3-1)!) = 3! / (1! 2!) = 3

Using Theorem 4.1 (Multiplication Rule), we find the number of committees that can be formed
with 2 Republicans and 1 Democrat to be (3)(6) = 18.
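Python's math.comb evaluates binomial coefficients directly, so Example 18 reduces to two calls and the multiplication rule (variable names are mine):

```python
from math import comb

republicans = comb(4, 2)         # ways to choose 2 Republicans from 4 -> 6
democrats = comb(3, 1)           # ways to choose 1 Democrat from 3 -> 3
print(republicans * democrats)   # 18 committees, by the multiplication rule
```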
It is of interest to note that the number of permutations of the r objects making up each of the
(n choose r) combinations in Theorem 4.8 is r!. Consequently, the number of permutations of n
distinct objects taken r at a time is related to the number of combinations by the formula
nPr = (n choose r) r!.

Probability of an Event
The statistician is basically concerned with drawing conclusions or inferences from experiments
involving uncertainties. For these conclusions and inferences to be accurately interpreted, an
understanding of probability theory is essential.

What do we mean when we make the statements “John will probably win the tennis match,” “I
have a 50:50 chance of getting an even number when a die is tossed,” “I am not likely to win at
bingo tonight,” or “Most of our graduating class will probably be married within 3 years”? In each
case we are expressing an outcome of which we are not certain, but because of past
information or from an understanding of the structure of the experiment, we have some degree
of confidence in the validity of the statement.

The mathematical theory of probability for finite sample spaces provides a set of real numbers
called weights or probabilities, ranging from 0 to 1, which allow us to evaluate the likelihood of
occurrence of events.

To every point in the sample space we assign a probability such that the sum of all probabilities
is 1. If we have reason to believe that a certain sample point is quite likely to occur when the
experiment is conducted, the probability assigned should be close to 1. On the other hand, a
probability closer to zero is assigned to a sample point that is not likely to occur. In many
experiments, such as tossing a coin or die, all the sample points have the same chance of
occurring and are assigned equal probabilities. For points outside the sample space, that is, for
simple events that cannot possibly occur, we assign a probability of zero.

To find the probability of an event A, we sum all the probabilities assigned to the sample points
in A. This sum is called the probability of A and is denoted by P(A). Thus the probability of the set
Ø is zero and the probability of S is 1.
The probability of an event A is the sum of the probabilities of all sample points in A. Therefore,
0 ≤ P(A) ≤ 1, P(Ø) = 0, P(S) = 1

Example 19: A coin is tossed twice. What is the probability that at least 1 head occurs?

Solution: The sample space of the experiment is:


S = {HH, HT, TH, TT}
If the coin is balanced, each of these outcomes would be equally likely to occur. Therefore, we
assign a probability of w to each sample point. Then 4w = 1, or w = 1/4. If A represents the event
of at least 1 head occurring, then A = {HH, HT, TH} and P(A) = 3/4.
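Example 19 can be verified by enumerating the equally likely sample points (a sketch; the variable names are mine):

```python
from itertools import product

outcomes = list(product("HT", repeat=2))          # S = {HH, HT, TH, TT}
favourable = [o for o in outcomes if "H" in o]    # at least 1 head
print(len(favourable) / len(outcomes))            # 3/4 = 0.75
```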

Example 20: A die is loaded in such a way that an even number is twice as likely to occur as an
odd number. If E is the event that a number less than 4 occurs on a single toss of the die, find
P(E).

Solution: The sample space of the experiment is:


S = {1, 2, 3, 4, 5, 6}
We assign a probability of w to each odd number and a probability of 2w to each even number.
Since the sum of the probabilities must be 1, we have 9w = 1 or w = 1/9. Hence, probabilities of
1/9 and 2/9 are assigned to each odd and even number, respectively. Therefore, P(E) = 1/9 +
2/9 + 1/9 = 4/9

If the sample space for an experiment contains N elements all of which are equally likely to
occur, we assign probabilities equal to 1/N to each of the N points. The probability of any event
A containing n of these N sample points is then the ratio of the number of elements in A to the
number of elements in S.

Theorem 4.9. If the experiment can result in any one of N different equally likely outcomes, and
if exactly n of these outcomes correspond to event A, then the probability of event A is
P(A) = n/N

Example 21: If a card is drawn from an ordinary deck, find the probability that it is a heart.

Solution: The number of possible outcomes is 52, of which 13 are hearts. Therefore, the
probability of event A of getting a heart is P(A) = 13/52 = ¼ = 0.25 or 25%

Example 22: In a poker hand consisting of 5 cards, find the probability of holding 2 aces and 3
jacks.

Solution: The number of ways of being dealt 2 aces from 4 cards is


(4 choose 2) = 4! / (2! 2!) = 6
And the number of ways of being dealt 3 jacks from 4 cards is
(4 choose 3) = 4! / (3! 1!) = 4

By the multiplication rule of theorem 4.1, there are n = 4*6 = 24 hands with 2 aces and 3 jacks.

The total number of 5-card poker hands, all of which are equally likely, is
N = (52 choose 5) = 52! / (5! 47!) = 2,598,960
Therefore, the probability of event C of getting 2 aces and 3 jacks in a 5-card poker hand is
P(C) = n/N = 24 / 2,598,960 = 0.000009
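The poker-hand count of Example 22 is quick to reproduce with math.comb (variable names are mine):

```python
from math import comb

n = comb(4, 2) * comb(4, 3)   # hands with 2 of the 4 aces and 3 of the 4 jacks
N = comb(52, 5)               # all equally likely 5-card hands
print(n, N)                   # 24 2598960
print(n / N)                  # about 0.000009
```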

If the probabilities cannot be assumed equal, they must be assigned on the basis of prior
knowledge or experimental evidence. For example, if a coin is not balanced, we could estimate
the two probabilities by tossing the coin a large number of times and recording the outcomes.
The true probabilities would be the fractions of heads and tails that occur in the long run. This
method of arriving at probabilities is known as the relative frequency definition of probability.

To find a numerical value that represents adequately the probability of winning at tennis, we
must depend on our past performance at the game as well as that of our opponent and to some
extent in our belief in being able to win. Similarly, to find the probability that a horse will win a
race, we must arrive at a probability based on the previous records of all the horses entered in a
race. Intuition would undoubtedly also play a part in determining the size of the bet that one
might be willing to wager. The use of intuition, personal beliefs and other indirect information in
arriving at probabilities is referred to as the subjective definition of probability.
Additive Rules
Often, it is easier to calculate the probability of an event from known probabilities of other
events. This may well be true if the event in question can be represented as the union of two or
more other events or as the complement of some event. Several important laws that frequently
simplify the computation of probabilities follow. The first, called the additive rule, applies to
unions of events.

Theorem 4.10. If A and B are any two events, then
P(A U B) = P(A) + P(B) – P(A ∩ B)

Corollary 1
If A and B are mutually exclusive, then
P(A U B) = P(A) + P(B)

The corollary is an immediate result of the additive rule, since A and B are mutually exclusive. In
general, if we have more than two mutually exclusive events in a sample space, we can write
the sum of all the probabilities of the events using Corollary 2.

Corollary 2
If A1, A2, A3, … An are mutually exclusive, then
P(A1 U A2 U … U An) = P(A1) + P(A2)+ … + P(An)

Note that if A1, A2, A3, …, An form a partition of a sample space S, then P(A1 U A2 U … U An) = P(S) = 1.

Example 24: The probability that a student passes mathematics is 2/3 and the probability that
he passes English is 4/9. If the probability of passing at least one course is 4/5, what is the
probability that he passes both courses?

Solution: If M is the event “passing mathematics” and E is the event “passing English”, then by
transposing the terms in Theorem 4.10, we have
P (M∩E) = P(M) + P(E) – P(M ∪ E)
= 2/3 + 4/9 – 4/5
= 14/45 = 31.11%

Example 25: What is the probability of getting a total of 7 or 11 when a pair of dice is tossed?

Solution: P(A ∪ B) = P(A) + P(B)
= 1/6 + 1/18
= 2/9

A pair of dice has 6 * 6 = 36 sample points. A total of 7 occurs for the 6 outcomes 1+6, 2+5,
3+4, 4+3, 5+2, and 6+1, and a total of 11 occurs for the 2 outcomes 5+6 and 6+5. Letting A be
the event of a total of 7 gives P(A) = 6/36 = 1/6, and letting B be the event of a total of 11 gives
P(B) = 2/36 = 1/18. Since A and B are mutually exclusive, adding the two probabilities gives 2/9.
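Enumerating the 36 outcomes confirms Example 25; fractions.Fraction keeps the answer exact (variable names are mine):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))          # 36 equally likely rolls
favourable = [r for r in rolls if sum(r) in (7, 11)]  # 6 sevens + 2 elevens
print(Fraction(len(favourable), len(rolls)))          # 2/9
```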

Often, it is more difficult to calculate the probability that an event occurs than it is to calculate
the probability that the event does not occur. Should this be the case for some event A, we
simply find P(A’) first and then using Theorem 4.11, find P(A) by subtraction.

Theorem 4.11. If A and A’ are complementary events, then
P(A) + P(A’) = 1

Proof. Since A ∪ A’ = S and the events A and A’ are mutually exclusive, then
1 = P(S)
= P(A ∪ A’)
= P(A) + P(A’)

Example 26: A coin is tossed 6 times in succession. What is the probability that at least 1 head
occurs?

Solution: Let E be the event that at least 1 head occurs. The sample space consists of 2^6 = 64
sample points, since each toss can result in 2 outcomes. Now, P(E) = 1 – P(E’), where E’ is the
event that no head occurs. This can happen in only one way – when all tosses result in tails.
Therefore, P(E’) = 1/64 and P(E) = 1 – 1/64 = 63/64.
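The complement argument in Example 26 can be checked by enumerating all 64 outcomes (a sketch; variable names are mine):

```python
from fractions import Fraction
from itertools import product

tosses = list(product("HT", repeat=6))            # 2^6 = 64 sample points
no_head = [t for t in tosses if "H" not in t]     # only TTTTTT
p = 1 - Fraction(len(no_head), len(tosses))
print(p)                                          # 63/64
```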

Conditional Probability
The probability of an event B occurring when it is known that some event A has occurred is
called a conditional probability and is denoted by P(B|A), which is usually read as “the
probability that B occurs given that A occurs” or “the probability of B, given A”.

The conditional probability of B, given A, denoted by P(B|A), is defined by the equation:

P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0

Example 27: The probability that a regularly scheduled flight departs on time is P(D) = 0.83, the
probability that it arrives on time is P(A) = 0.92, and the probability that it departs and arrives on
time is P(D ∩ A) = 0.78. Find the probability that a plane
a) Arrives on time given that it departed on time;
b) Departed on time given that it has arrived on time.

Solution
a) Arrives on time given that it departed on time:
P(A|D) = P(D ∩ A) / P(D) = 0.78 / 0.83 = 0.94
b) Departed on time given that it has arrived on time:
P(D|A) = P(D ∩ A) / P(A) = 0.78 / 0.92 = 0.85

Independent Events. Two events A and B are independent if either
P(B|A) = P(B) or P(A|B) = P(A)

Otherwise, A and B are dependent.

Multiplicative Rules
If in an experiment the events A and B can both occur, then
P(A ∩ B) = P(A) P(B|A)
Example 28: Suppose that we have a fuse box containing 20 fuses, of which 5 are defective. If
2 fuses are selected at random and removed from the box in succession without replacing the
first, what is the probability that both fuses are defective?

Solution: We shall let A be the event that the first fuse is defective and B the event that the
second fuse is defective; then we interpret A ∩ B as the event that A occurs and then B occurs
after A has occurred. The probability of first removing a defective fuse is 5/20 = 1/4, and the
probability of then removing a second defective fuse from the 4 defectives remaining among the
19 fuses is 4/19. Hence,
P(A ∩ B) = P(A) P(B|A)
= (1/4) (4/19)
= 1/19
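The fuse calculation can also be verified with exact fractions; this is an illustrative check, not part of the original text:

```python
from fractions import Fraction

# Example 28: 20 fuses, 5 defective; draw 2 without replacement
p_first_defective = Fraction(5, 20)       # 5 defectives among 20 fuses
p_second_given_first = Fraction(4, 19)    # 4 defectives remain among 19 fuses

# Multiplicative rule: P(A ∩ B) = P(A) P(B|A)
p_both_defective = p_first_defective * p_second_given_first
print(p_both_defective)                   # 1/19
```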

Theorem 4.13 Special Multiplicative Rule


If two events A and B are independent, then
P(A ∩ B) = P(A) P(B)

Therefore, to obtain the probability that two independent events will both occur, we simply find
the product of their individual probabilities.

Example 29: A small town has one fire engine and one ambulance available for emergencies.
The probability that the fire engine is available when needed is 0.98, and the probability that the
ambulance is available when called is 0.92. In the event of an injury resulting from a burning
building, find the probability that both the ambulance and the fire engine will be available.

Solution: Let A and B represent the respective events that the fire engine and the ambulance
are available. Then,
P(A ∩ B) = P(A) P(B)
= 0.98 (0.92)
= 0.9016

Theorem 4.14 Generalized Multiplicative Rule


If in an experiment the events A1, A2, A3, …, Ak can occur, then
P(A1 ∩ A2 ∩ A3 ∩ … ∩ Ak) = P(A1) P(A2|A1) P(A3|A1∩A2) … P(Ak|A1∩A2∩…∩Ak-1)
If the events A1, A2, A3, …, Ak are independent, then
P(A1 ∩ A2 ∩ A3 ∩ … ∩ Ak) = P(A1) P(A2) P(A3) … P(Ak)

Example 30: Three cards are drawn in succession, without replacement, from an ordinary deck
of playing cards. Find the probability that the first card is a red ace, the second card is a ten or
a jack, and the third is greater than 3 but less than 7.

Solution: Let A1 = the event that the first card is a red ace
Let A2 = the event that the second card is a ten or a jack
Let A3 = the event that the third card is greater than 3 but less than 7 (i.e., a 4, 5, or 6)

P(A1) = 2/52
P(A2|A1) = 8/51
P(A3|A1∩A2) = 12/50
Hence, P(A1∩A2∩A3) = (2/52)(8/51)(12/50) = 8/5525
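The generalized multiplicative rule in Example 30 is easy to confirm with exact fractions; the sketch below mirrors the three conditional probabilities in the solution:

```python
from fractions import Fraction

# Example 30: three cards drawn in succession without replacement
p1 = Fraction(2, 52)    # red ace: 2 of the 52 cards
p2 = Fraction(8, 51)    # ten or jack: 8 of the remaining 51 cards
p3 = Fraction(12, 50)   # 4, 5, or 6: 12 of the remaining 50 cards

# Generalized multiplicative rule:
# P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2)
p_all = p1 * p2 * p3
print(p_all)            # 8/5525
```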
Teaching and Learning Activities

Exercise: On a separate sheet of yellow paper, answer the following questions.

| Class Interval | Frequency (f) | Midpoint (x) | f(x) | Cumulative Frequency (F) | Lower Limit | Upper Limit | Deviation (d) (x − x̄) | Squared Deviation (d²) (x − x̄)² | f(d²) |
|----------------|---------------|--------------|------|--------------------------|-------------|-------------|------------------------|----------------------------------|-------|
| 91-100 | 8  |  |  |  |  |  |  |  |  |
| 81-90  | 11 |  |  |  |  |  |  |  |  |
| 71-80  | 9  |  |  |  |  |  |  |  |  |
| 61-70  | 13 |  |  |  |  |  |  |  |  |
| 51-60  | 18 |  |  |  |  |  |  |  |  |
| 41-50  | 12 |  |  |  |  |  |  |  |  |
| 31-40  | 10 |  |  |  |  |  |  |  |  |
| 21-30  | 9  |  |  |  |  |  |  |  |  |
| 11-20  | 7  |  |  |  |  |  |  |  |  |
| 01-10  | 3  |  |  |  |  |  |  |  |  |
| i =    | n = |  | Σf(x) = |  |  |  |  | Σd² = | Σfd² = |

1. What is the sample mean?


2. What is the median?
3. What is the mode?
4. What is the range?
5. What is the standard deviation?
6. What is the sample variance?
7. What is the cumulative frequency of the median class?
8. What is the lower limit and the upper limit of the modal class?
9. What is the size of the class interval?
10. What is the summation of all the product of the frequency and midpoint of the class interval?
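As a sketch of the grouped-data method behind questions 1 and 2 (using the standard grouped-mean and interpolated-median formulas; the frequencies are taken from the table above), the computation could look like this in Python:

```python
# Grouped-data mean and median for the frequency table above.
# Classes run from 01-10 up to 91-100, so the class width is i = 10.
freqs = [3, 7, 9, 10, 12, 18, 13, 9, 11, 8]    # from 01-10 up to 91-100
midpoints = [5.5 + 10 * k for k in range(10)]  # class midpoints

n = sum(freqs)                                 # total frequency
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Median: find the class whose cumulative frequency first reaches n/2,
# then interpolate: median = L + ((n/2 - F) / f_m) * i,
# where L is the lower class boundary and F the cumulative frequency below it.
cum, idx = 0, 0
for idx, f in enumerate(freqs):
    if cum + f >= n / 2:
        break
    cum += f
lower_boundary = 0.5 + 10 * idx                # e.g. 50.5 for class 51-60
median = lower_boundary + ((n / 2 - cum) / freqs[idx]) * 10

print(n, mean, median)
```

The same loop structure extends to the modal class, quartiles, and the grouped standard deviation asked for in the remaining questions.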

Fill in the blanks.


1. ________ is a measure of the center of a set of data when the data is arranged in an
increasing or decreasing order of magnitude.
2. The ______ of a set of observations arranged in an increasing or decreasing order of
magnitude is the middle value when the number of observations is odd or the arithmetic mean
of two middle values when the number of observations is even.
3. The ______ of a set of observations is that value which occurs most often or with the greatest
frequency.
4. It is the measure of the variation of a set of data in terms of the amounts by which the
individual values differ from their mean.
5. Data that are presented in the form of a frequency distribution are called _________.
6. A distribution that lacks symmetry with respect to a vertical axis is said to be _________.
7. These are values that divide a set of observations into 100 equal parts.
8. These are values that divide a set of observations into 4 equal parts.
9. These are values that divide a set of observations into 10 equal parts.
10. The numerical difference between the upper and lower class boundaries of a class interval
is defined to be the ___________.
11. The number of observations falling in a particular class is called ________.
12. The midpoint between the upper and lower class boundaries or class limits of a class
interval is called the __________.

Flexible Teaching Learning Modality (FTLM) adapted

Google Classroom, Module, Exercises

Assessment Task

Problem Set

Reference/s:

Walpole, Ronald E. Introduction to Statistics. International Edition. Third Edition. Prentice Hall
International, Inc. 1997. ISBN 981-4009-51-2.

https://www.questionpro.com/blog/non-probability-sampling/

https://www.questionpro.com/blog/probability-sampling/
