1 - Introduction To Statistics and Data Analysis

The document provides an introduction to statistics and data analysis, emphasizing the importance of understanding statistical methods for conducting research and making informed decisions. It covers key concepts such as types of statistics (inferential and descriptive), sampling methods, types of data, and the distinction between observational and experimental studies. Additionally, it highlights the potential misuse of statistics and the importance of accurate representation in data analysis.
1: Introduction to Statistics and Data Analysis


Engineering Data Analysis
Introduction

Most people become familiar with probability and statistics through radio,
television, newspapers, and magazines.

• Based on the 2000 census, 40.5 million households in the United States
have two vehicles (US Census Bureau).
• The average cost of a wedding is nearly Php 250,000 (Bride Magazine).
• Women who eat fish once a week are 29% less likely to develop heart
disease (Harvard School of Public Health).
Introduction

• Statistics is the science of conducting studies to collect, organize,
summarize, analyze, and draw conclusions from data.
• Statistics is used to analyze the results of surveys and as a tool in
scientific research to make decisions based on controlled experiments.
Other uses of statistics include operations research, quality control,
estimation, and prediction.
Introduction

• A variable is a characteristic or attribute that can assume different
values.
• Data are the values (measurements or observations) that the variables
can assume. Variables whose values are determined by chance are
called random variables.
• A collection of data values forms a data set.
• Each value in the data set is called a data value or datum.
Why study statistics?

• We must be able to read and understand the various statistical studies
performed in the engineering field.
• We may be called to conduct research in our field, since statistical
procedures are basic to research. We must be able to design
experiments; collect, organize, analyze, and summarize data; and
possibly make reliable predictions or forecasts for future use.
• We can also use the knowledge gained from studying statistics to
become better consumers and citizens.
Two fields of statistics

INFERENTIAL STATISTICS is the process of using data analysis to make
predictions (“inference”) from that data.
DESCRIPTIVE STATISTICS are used to describe the basic features of the
data in a study, in the form of charts, graphs, plots, etc.
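As a rough illustration of the two fields, the sketch below first describes a small sample and then uses it to say something about a larger population. The height values are hypothetical, made up for this example, and using the sample mean as a point estimate is only one simple form of inference.

```python
import statistics

# Hypothetical heights (in cm) of 8 sampled students; not real data.
heights_cm = [158.0, 162.5, 171.0, 165.5, 169.0, 160.0, 174.5, 167.5]

# Descriptive statistics: summarize the data we actually collected.
sample_mean = statistics.mean(heights_cm)
sample_min = min(heights_cm)
sample_max = max(heights_cm)
print(f"Sample mean: {sample_mean:.1f} cm, range: {sample_min} to {sample_max} cm")

# Inferential statistics: use the sample to make a statement about the
# whole population. Here the sample mean serves as a point estimate of
# the unknown population mean.
estimated_population_mean = sample_mean
print(f"Estimated population mean height: {estimated_population_mean:.1f} cm")
```

The first print statement only describes the eight students measured. The second goes beyond the data, which is why its reliability depends on how the sample was chosen.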
Exercises
Identify whether each of the following statements calls for inferential
or descriptive statistics.
a. In the year 2021, 22 million Filipinos were enrolled in an HMO.
b. Nine out of 10 on-the-job fatalities in the construction industry are
men.
c. Expenditures for the cement industry were $5.66 billion worldwide in
1995.
d. The median household income for people aged 25-34 is Php 28,755 on
average.
e. “Allergy therapy makes bees go away.”
f. Drinking decaffeinated coffee can raise cholesterol levels by 7%.
g. The national average annual medicine expenditure per person is about
Php 105,000.
h. Experts say that mortgage rates may soon hit bottom before bouncing
back.
Sampling

[Figure: a population and a sample drawn from it]
For example, suppose we wish to study the heights of students at PLV by
measuring a sample of 100 students.
• How should we choose the 100 students to measure?
Sampling
• Think of a lottery with 10,000 tickets from which 5 winners will be
chosen. What is the fairest way to choose the winners?
• A simple random sample of size n is a sample chosen by a method in
which each collection of n population items is equally likely to
comprise the sample, just as in a lottery.
DEFINITION
• A population is the entire collection of objects or outcomes about
which information is sought.
• A sample is a subset of a population, containing the objects or
outcomes that are observed.
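The lottery idea can be sketched in a few lines of Python. This is a minimal illustration, assuming the tickets are simply numbered 1 to 10,000. The helper name `simple_random_sample` and the seed are choices made for this example only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every set of n items is equally likely."""
    rng = random.Random(seed)
    # random.Random.sample draws without replacement.
    return rng.sample(population, n)

# Tickets numbered 1 to 10,000 (an assumption made for this sketch).
tickets = list(range(1, 10_001))

# Choose 5 winning tickets; the seed only makes the run repeatable.
winners = simple_random_sample(tickets, 5, seed=42)
print(winners)
```

Because the draw is made without replacement, no ticket can win twice, and no ticket has a better chance than any other.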
Sampling
EXAMPLE: A utility company wants to conduct a survey to measure the
satisfaction level of its customers in a certain town. There are 10,000
customers in the town, and utility employees want to draw a sample of
size 200 to interview personally. They obtain a list of all 10,000
customers, and number them from 1 to 10,000. They use a computer random
number generator to generate 200 random integers between 1 and 10,000
and then contact the customers who correspond to those numbers. Is this
a simple random sample?
Sampling
EXAMPLE: A quality engineer wants to inspect electronic microcircuits in
order to obtain information on the proportion that are defective. She
decides to draw a sample of 100 circuits from a day’s production. Each
hour for 5 hours, she takes the 20 most recently produced circuits and
tests them. Is this a simple random sample?
Sampling
• EXAMPLE: A construction engineer has just received a shipment of 1000
concrete blocks, each weighing approximately 25 kilograms. The blocks
have been delivered in a large pile. The engineer wishes to investigate
the compressive strength of the blocks by measuring the strengths in a
sample of blocks. What is the more appropriate method of selecting
random samples?
DEFINITION
• A sample of convenience is a sample that is not drawn by a
well-defined random method.
Sampling
• Suppose a quality inspector draws a random sample of 40 bolts from a
large shipment, measures the length of each, and finds that 32 of them
(80%) meet a length specification. By chance, a second inspector gets a
few more good bolts, about 90% in her sample. The proportion of good
bolts in the population is likely to be close to 80% or 90%, but it is
not likely to be exactly equal to either value.
DEFINITION
• Sampling variation is the phenomenon that two or more different
samples from the same population will differ from each other.
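Sampling variation is easy to see in a small simulation. The sketch below assumes a hypothetical shipment of 10,000 bolts in which 85% meet the specification. That figure is invented for illustration and is not stated in the example above.

```python
import random

# Seeded generator so that the run is repeatable.
rng = random.Random(0)

# Hypothetical shipment: True = meets specification, False = does not.
shipment = [True] * 8500 + [False] * 1500

def sample_proportion(population, n):
    """Proportion of good bolts in one simple random sample of size n."""
    sample = rng.sample(population, n)
    return sum(sample) / n

# Five inspectors each draw their own sample of 40 bolts.
proportions = [sample_proportion(shipment, 40) for _ in range(5)]
print(proportions)
```

Each inspector draws from the same shipment, yet the five proportions will generally not all agree with each other or with the true 85%.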
Sampling
DEFINITION
• It is possible to make a population behave as though it were infinitely
large by replacing each item after it is sampled. This method is known
as sampling with replacement.
OTHER SAMPLING METHODS
• Weighted sampling is when some items are given a greater chance of being
selected than others (ex., lottery in which some people have more tickets than
others.)
• Stratified random sampling is when the population is divided into
subpopulations known as strata, and a simple random sample is drawn from
each stratum.
• Cluster sampling is when items are drawn from the population in groups or
clusters.
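The sampling methods on this slide can be sketched with Python's standard `random` module. All names and data below (ticket holders, year levels, boxes of items) are hypothetical and serve only to show the mechanics.

```python
import random

rng = random.Random(1)  # seeded so that the run is repeatable

# Sampling with replacement: the same item may be drawn more than once.
items = ["A", "B", "C", "D"]
with_replacement = rng.choices(items, k=6)

# Weighted sampling: chance of selection is proportional to ticket count.
ticket_counts = {"Ana": 1, "Ben": 3, "Cara": 6}
winner = rng.choices(list(ticket_counts),
                     weights=list(ticket_counts.values()), k=1)[0]

# Stratified random sampling: a simple random sample from each stratum.
strata = {
    "first_year": [f"F{i}" for i in range(100)],
    "second_year": [f"S{i}" for i in range(60)],
}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

# Cluster sampling: pick whole groups at random, keep every item in them.
clusters = {f"box_{b}": [f"box_{b}_item_{i}" for i in range(4)]
            for b in range(10)}
chosen_boxes = rng.sample(list(clusters), 2)
cluster_sample = [item for box in chosen_boxes for item in clusters[box]]

print(with_replacement, winner, stratified, cluster_sample, sep="\n")
```

Note the difference in what gets randomized: stratified sampling draws items at random inside every group, while cluster sampling draws the groups themselves at random.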
Types of Data
DEFINITION When a numerical quantity designating how much or how many is
assigned to each item in a sample, the resulting set of values is called
numerical or quantitative. These variables can be ordered or ranked.

In some cases, if sample items are placed into categories, and category
names are assigned to the sample items, the data are categorical or
qualitative.

Example: In a loading test of column-to-beam welded connections, data
may be collected both on the torque applied at failure and on the
location of the failure (weld or beam).
Quantitative variable: Torque
Qualitative variable: Location (weld or beam)
Exercises
Classify each variable as qualitative or quantitative.
a. Colors of automobiles in the shopping mall parking lot.
b. Number of desks in classrooms.
c. Classification of children (infant, toddler, preschool) in a day care
center.
d. Weights of fish caught in Laguna de Bay.
e. Scores of retakes of Quiz #1 in Engineering Data Analysis.
f. Capacity, in liters, of water in La Mesa Dam.
g. Number of off-road vehicles sold in the United Kingdom in 2020.
Classification of Quantitative Variables

Discrete variables can be assigned values such as 0, 1, 2, 3, and are
said to be countable. Examples of discrete variables are the number of
children in a family, the number of students in a classroom, and the
number of calls received by an executive secretary.

Continuous variables can assume all values in an interval between any
two specific values. They are obtained by measuring.

[Diagram: Data divides into Qualitative and Quantitative; Quantitative
divides into Discrete and Continuous]
Exercises

Classify each variable as discrete or continuous.
a. Number of loaves of bread baked each day at a local bakery.
b. Water temperature of the saunas at a given health spa.
c. Incomes of single parents who attend a community college.
d. Lifetimes of batteries in an Android phone.
e. Weights of newborn infants at a certain hospital.
f. Capacity (in cubic meters) of water in swimming pools in Quezon City.
g. Number of pizzas sold last year in Region 5.
Independence
DEFINITION
• The items in a sample are said to be independent if knowing the values
of some of them does not help to predict the values of the others.
• For example, if we draw a simple random sample of 2 items from the
population {0 0 1 1}, the sampled items are found to be dependent. (Why?)
• However, if we draw 2 items from this population: {One million 0’s, one
million 1’s}, the sampled items are practically independent. (Why?)
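One way to explore the two "(Why?)" questions is to compute how much the first item drawn tells us about the second. The sketch below assumes the 2 items are drawn without replacement and uses exact fractions. The function name is a choice made for this example.

```python
from fractions import Fraction

def prob_second_is_one(zeros, ones, first):
    """P(second item = 1 | first item) when drawing without replacement."""
    remaining_total = zeros + ones - 1
    # If the first item was a 1, one fewer 1 remains in the population.
    remaining_ones = ones - 1 if first == 1 else ones
    return Fraction(remaining_ones, remaining_total)

# Small population {0 0 1 1}: the first draw changes the odds a lot.
small_after_one = prob_second_is_one(2, 2, first=1)
small_after_zero = prob_second_is_one(2, 2, first=0)
print(small_after_one, small_after_zero)

# Large population {one million 0's, one million 1's}: barely any change.
large_after_one = prob_second_is_one(10**6, 10**6, first=1)
large_after_zero = prob_second_is_one(10**6, 10**6, first=0)
print(float(large_after_one), float(large_after_zero))
```

In the small population the chance that the second item is a 1 is 1/3 or 2/3 depending on the first draw. In the large population both chances are within one part in a million of 1/2, so knowing the first item is of almost no help in predicting the second.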
Classification Based on Categories
Nominal level of measurement classifies data into mutually exclusive,
non-overlapping, exhaustive categories in which no order or ranking
can be imposed on the data.

Example:
a. Classifying instructors based on subject taught (Drawing, Statistics,
Algebra, English, etc.)
b. Classifying residents according to zip codes.
c. Classifying according to religions.
Classification Based on Categories
Ordinal level of measurement classifies data into categories that can
be ranked; however, precise differences between the ranks do not
exist.

Example:
a. Ranking in Miss Universe 2021 (Titleholder, 1st runner-up, 2nd
runner-up)
b. Level of service of highways (A, B, C, D, E, F)
c. Ten-point grading system (1.00, 1.25, 1.50, etc.)
Classification Based on Categories
Interval level of measurement ranks data, and precise differences
between units of measure do exist; however, there is no meaningful
zero.

Example:
a. Intelligence quotient (IQ) – no meaningful zero (IQ tests do not
measure people who have no intelligence)
b. Temperature – there is a 0 on the Celsius scale, but it does not mean
the absence of heat.
Classification Based on Categories
Ratio level of measurement possesses all the characteristics of interval
measurement, and there exists a true zero. In addition, true ratios exist
when the same variable is measured on two different members of the
population.

Example:
a. Weight – if one person can lift 100 kg and another can lift 150 kg,
the ratio of the weights lifted is 2:3.
b. Number of phone calls – there is a true zero and can be expressed
in units of measure.
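A quick numerical check shows why ratios are meaningful on a ratio scale but not on an interval scale. The temperatures below are hypothetical readings, and the kilogram-to-pound factor is an approximate conversion used only for illustration.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

KG_TO_LB = 2.20462  # approximate conversion factor

# Interval scale: the "ratio" of two temperatures depends on the unit.
t1_c, t2_c = 10.0, 20.0
ratio_celsius = t2_c / t1_c
ratio_fahrenheit = celsius_to_fahrenheit(t2_c) / celsius_to_fahrenheit(t1_c)
print(ratio_celsius, ratio_fahrenheit)

# Ratio scale: the ratio of two weights is the same in any unit.
w1_kg, w2_kg = 100.0, 150.0
ratio_kg = w1_kg / w2_kg
ratio_lb = (w1_kg * KG_TO_LB) / (w2_kg * KG_TO_LB)
print(ratio_kg, ratio_lb)
```

The temperature ratio is 2 in Celsius but 1.36 in Fahrenheit, so "twice as hot" has no fixed meaning. The weight ratio stays at 2:3 in both kilograms and pounds because both units share the same true zero.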
Observational and Experimental Studies
In an observational study, the researcher merely observes what is
happening or what has happened in the past and tries to draw
conclusions based on these observations.

Example:
“Motorcycle owners are getting older and richer.”
- Based on data on the ages and incomes of motorcycle owners, collected
for the years 2010 and 2020 and then compared.
Observational and Experimental Studies
In an experimental study, the researcher manipulates one of the
variables and tries to determine how the manipulation influences other
variables.

Example:
Two groups of females aged 20-24 were asked to do sit-ups. The first
group was told only to “do your best,” while the second group was told
to increase their sit-ups by 10% each day. The first group averaged 43
sit-ups, while the second group averaged 56 sit-ups.
Independent vs. Dependent Variables
• The independent variable in an experimental study is the one being
manipulated by the researcher. It is also known as the explanatory
variable.
• The resultant variable is called the dependent or outcome variable.

From the sit-up study, what are the independent and dependent
variables?
Independent vs. Dependent Variables
• One can also recognize the independent variable by identifying the
treatment and control groups.
• Treatment groups refer to the individuals or groups in which an
instruction or procedure is modified, while the control group serves
as the reference for the experiment.

Example: In an experimental study of adding fly-ash to concrete, three
groups of samples were made: the first has no additives, the second has
added fly-ash equal to 5% of the cement volume, and the third has 6%
added.
Advantages and Disadvantages of Experimental Study
Advantages
• The researcher can decide how to select subjects and how to assign
them to specific groups.
• The researcher can manipulate or control the independent variable
(example: the amount of additive in a concrete sample).
Disadvantages
• Research may occur in unnatural settings, in environments to which the
variables are not naturally exposed.
• Respondents may experience the Hawthorne effect: subjects who know they
are participating in an experiment may change their behavior in ways
that affect the results of the study (examples: the nature vs. nurture
study of Margaret Mead, and spot speed studies).
• Confounding variables may exist. These influence the dependent or
outcome variable but cannot be separated from the independent variable.
Advantages and Disadvantages of Observational Study
Advantages
• Usually occurs in natural settings.
• Observational studies can be done using variables that cannot be
manipulated by the researcher (example: left-handedness vs.
right-handedness).
Disadvantages
• A definite cause-effect situation cannot be shown since other factors
may have influenced the results.
• Can be expensive and time-consuming.
• Since the researcher may not be using his own measurements, the results
could be subject to inaccuracies of those who collected the data.
Uses and Misuse of Statistics
Suspect Samples
Happens when very small samples are used to obtain information; when
subjects have a built-in bias or do not represent the population at
large; or when volunteers for the study are taken from a convenience
sample.
Ambiguous Averages
Happens when a person, to convince his reader of his point of view,
deliberately misuses the measures of “average” (such as mean, median,
mode, and midrange) to misrepresent data and fit his narrative.
Table 1: Total cost of the past ten (10) bridges constructed in a
region, in pesos
6.75 M    7.35 M
7.90 M    8.35 M
22.5 M    7.10 M
7.65 M    7.95 M
8.40 M    33.75 M

Mean: 11.77 M pesos (the average construction cost reported); implies
that the cost of construction is around this value.
Median: 7.925 M pesos (the actual “average” cost of construction in the
area).
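The two "averages" for Table 1 can be checked with Python's standard `statistics` module. The values are those in the table, in millions of pesos. The variable names are chosen for this example.

```python
import statistics

# Costs of the ten bridges from Table 1, in millions of pesos.
costs_million_pesos = [6.75, 7.35, 7.90, 8.35, 22.5,
                       7.10, 7.65, 7.95, 8.40, 33.75]

mean_cost = statistics.mean(costs_million_pesos)
median_cost = statistics.median(costs_million_pesos)
print(f"Mean:   {mean_cost:.2f} M pesos")
print(f"Median: {median_cost:.3f} M pesos")

# How many bridges actually cost more than the reported mean?
above_mean = [c for c in costs_million_pesos if c > mean_cost]
print(f"Bridges costing more than the mean: {len(above_mean)} of 10")
```

Only the two most expensive bridges (22.5 M and 33.75 M) exceed the mean. Those two values pull the mean well above what eight of the ten bridges cost, which is why the median is the more representative figure here.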
Changing the Subject
Happens when a person distorts the statistics by using different values
to represent the same data.

Incumbent Mayor:
“In my three years in office, my administration's spending rose by only
about 3 percent.”

Opponent of the Incumbent Mayor:
“During the term of our incumbent mayor, his government's spending rose
by 23.5 million pesos.”

These two persons are trying to say the same thing. Ask yourself, as a
voter, which one sounds more believable and convincing?
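A short calculation shows how the two statements can both be true. The sketch assumes that both figures describe the same rise in spending, as the slide suggests. The base spending it derives is implied by that assumption and is not stated in the source.

```python
# Figures quoted by the two speakers.
increase_million_pesos = 23.5   # opponent: absolute rise in spending
increase_fraction = 0.03        # incumbent: relative rise (3 percent)

# If both describe the same change, the spending before the rise was:
implied_base_million_pesos = increase_million_pesos / increase_fraction
print(f"Implied base spending: {implied_base_million_pesos:.1f} M pesos")
```

Under that assumption the base works out to roughly 783.3 M pesos. A 3% rise sounds small and a 23.5-million-peso rise sounds large, yet they are the same change expressed on different scales.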
Detached Statistics
A claim that uses a detached statistic is one in which no comparison is
made.

“Our brand of milk tea has one-third fewer calories.”

“Our calculator is 4 times faster than any other calculator.”

When presented with these statements, always ask yourself, “compared to
what?”
Implied Connections
Happens when someone attempts to imply connections between variables
that may not actually exist.

“Eating fish may help reduce your cholesterol.”

“Studies suggest that using our exercise machine will reduce your
weight.”

“Taking calcium will lower blood pressure in some people.”

“All the youth back then were healthy because of Notri-Ban.”

When presented with these statements, always ask for references. These
become misuses of statistics if references are not given.
Misleading Graphs
Graphs are visual representations of data that enable viewers to analyze
and interpret data more easily than by simply looking at numbers. If
misused, graphs can lead to erroneous conclusions and improper
implications.

Always check the sizes and appearance of these graphs.
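One common way a graph's appearance can mislead is a vertical axis that does not start at zero. The sketch below uses two hypothetical values, 50 and 55, and compares how tall the bars look relative to each other under two axis choices. The function name is invented for this example.

```python
def drawn_height_ratio(value_a, value_b, axis_start):
    """Ratio of bar heights as drawn when the vertical axis begins at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Two hypothetical values that differ by 10%.
value_a, value_b = 50.0, 55.0

# Axis starting at zero: bar B looks 1.1 times as tall as bar A.
honest_ratio = drawn_height_ratio(value_a, value_b, axis_start=0.0)

# Axis starting at 45: bar B now looks twice as tall as bar A.
truncated_ratio = drawn_height_ratio(value_a, value_b, axis_start=45.0)

print(honest_ratio, truncated_ratio)
```

The data are identical in both cases. Only the starting point of the axis changes, which is why the scale of a graph deserves a look before any conclusion is drawn from the bar sizes.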
Faulty Survey Questions
One should be sure that the questions are properly written since the way questions are
phrased can often influence the way people answer them.

• “Do you feel that the government should build a new sports stadium?”
versus “Do you agree with increasing tax spending so that the government
can build a new sports stadium?”
• “How awesome is the product?” versus “How would you rate this
product?”
• “How often do you exercise each day?” (This question implies that the
respondent exercises every day.)
• “Did the speaker at this webinar do well, and were the materials he
used good enough for you?” versus “How would you rate the speaker?” and
“How would you rate the training materials?”
• “I don’t usually buy my stuff online.” versus “How often do you buy
items online?”
