Unit 8. Data Analysis

This document discusses statistics and data analysis. It defines statistics as collections of numerical data, summary measures calculated from data, and the activity of interpreting data. Descriptive statistics summarize and describe data through measures of central tendency like mean, median, and mode, as well as measures of variability. Inferential statistics allow inferences about populations from samples using techniques like hypothesis testing that employ probability. The document provides examples of descriptive statistics like graphs, measures of central tendency and variability, and discusses key concepts in inferential statistics like hypothesis testing and significance.

Uploaded by

tebebe solomon


Data analysis

Feyera Senbeta (PhD)

The Meaning of Statistics
Several Meanings
Collections of numerical data

Summary measures calculated from a collection of data
Activity of using and interpreting a collection of numerical data
A Meaningful Statistic (Significant)?
Statistics, descriptive or inferential are NOT a
substitute for good judgment
Decide what level or value of a statistic is
meaningful
State judgment before gathering and analyzing
data
Examples:
Score on performance test of 80% is passing
Pre/post rules instruction reduces incidents by
50%
Interpretation of Meaning
Population Measure (statistic)
There is no sampling error
The number you have is “real”
Judge against pre-set standard

Inferential Measure (statistic)
Tells you how sure (confident) you can be that the number you have is real
Judge against pre-set standard and state how certain the measure is
Statistics
Descriptive Statistics
Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way
Inferential Statistics
Provides procedures to draw inferences about a population from a sample
Descriptive and Inferential Statistics
Descriptive statistics: Mathematical methods (such as mean, median, standard deviation) that summarize and interpret some of the properties of a set of data (sample) but do not infer the properties of the population from which the sample was drawn.

Inferential statistics: Mathematical methods (such as hypothesis development) that employ probability theory for deducing (inferring) the properties of a population from the analysis of the properties of a set of data (sample) drawn from it.

Did it happen by chance?
How do you know if something caused or
correlates with something else?
The appropriate Statistic will tell you:
If there is a difference from some expected value

If the difference is statistically significant or merely

due to random chance

1/24/2013 7
Descriptive Statistics

Summarize or describe the

important characteristics of a
known set of population data

Descriptive Statistics
Design: descriptive statistics used
Survey studies: percentages, measures of central tendency and variation
Causal comparative studies: measures of central tendency and variation, percentages, standard scores
Experimental: measures of central tendency and variation, percentages, standard scores, effect sizes
Types of descriptive statistics
Statistic is a quantitative index that describes
performance of a sample or samples

Parameter is a quantitative index describing the

performance of a population

Measures of central tendency are used to determine the

typical or average value among a group of values

Measures of variability indicate how spread out the

values are

Descriptive Statistics (Vocabulary)
Central tendency
Mode
Median
Mean
Variation
Range
Standard deviation
Normal distribution
Standard score
Correlation
Regression
Descriptive Measures
Central Tendency measures. They are
computed to give a “center” around which the
measurements in the data are distributed.

Variation or Variability measures. They

describe “data spread” or how far away the
measurements are from the center.

Relative Standing measures. They describe

the relative position of specific measurements in the
data.
Measures of Central Tendency

Mean:
Sum of all measurements divided by the number
of measurements.

Median:
A number such that at most half of the
measurements are below it and at most half of the
measurements are above it.

Mode:
The most frequent measurement in the data.
Example of Mean
Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4 (sum = 40)
Deviations (x − mean): −1, 1, 1, −3, 3, −2, 2, 3, −4, 0 (sum = 0)
MEAN = 40/10 = 4
Notice that the sum of the “deviations” is 0.
Notice that every single observation enters into the computation of the mean.
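The mean example can be checked with a few lines of Python. This is only a sketch using the ten measurements from the slide; the variable names are illustrative and not part of the original material.

```python
from statistics import mean

# The ten measurements from the slide.
measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

# Mean = sum of all measurements divided by the number of measurements.
sample_mean = mean(measurements)

# Deviation of each measurement from the mean.
deviations = [x - sample_mean for x in measurements]

print("mean:", sample_mean)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

Changing any single value in `measurements` changes the mean, which is the sense in which every observation takes part in its computation.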
Example of Median
Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4 (sum = 40)
Measurements ranked: 0, 1, 2, 3, 4, 5, 5, 6, 7, 7 (sum = 40)
Median: (4 + 5)/2 = 4.5
Notice that only the two central values are used in the computation.
The median is not sensitive to extreme values.
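A short Python sketch of the median example follows. The list `with_outlier` is a hypothetical variation (one 7 replaced by 700) added here only to illustrate that the median does not react to an extreme value while the mean does.

```python
from statistics import mean, median

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]

# Rank the measurements and pick the two central values (n is even).
ranked = sorted(measurements)
half = len(ranked) // 2
middle_pair = ranked[half - 1:half + 1]

sample_median = median(measurements)

# Hypothetical data: the same sample with one extreme value.
with_outlier = [3, 5, 5, 1, 700, 2, 6, 7, 0, 4]

print("ranked:", ranked)
print("central values:", middle_pair)
print("median:", sample_median)
print("median with outlier:", median(with_outlier))
print("mean with outlier:", mean(with_outlier))
```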
Example of Mode
Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4
In this case the data have two modes: 5 and 7
Both measurements are repeated twice
Example of Mode
Measurements (x): 3, 5, 1, 1, 4, 7, 3, 8, 3
Mode: 3
Notice that it is possible for a data set not to have any mode.
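Both mode examples can be reproduced with a small helper. The function `modes` below is a hypothetical name introduced for this sketch; it returns every most-frequent value and, as an assumption matching the slide's remark, an empty list when no value repeats.

```python
from collections import Counter

def modes(data):
    """Return the most frequent values, or [] if no value is repeated."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs once, so the data set has no mode.
        return []
    return sorted(value for value, count in counts.items() if count == top)

first_example = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
second_example = [3, 5, 1, 1, 4, 7, 3, 8, 3]
no_repeats = [1, 2, 3, 4]  # hypothetical data with no repeated value

print(modes(first_example))
print(modes(second_example))
print(modes(no_repeats))
```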
Graphing data
Provides a quick view of what your data are telling you.

There are various types of graphs used in statistics, including bar graphs, histograms, scatter plots, pie charts, and frequency polygons.

Example group of test scores
Frequency Polygon and Pie Chart
Common mistakes
Presenting the same dataset as both a graph and a table
Presenting the same dataset as both a frequency histogram and a percentage histogram
Choosing the wrong type of graph (histogram, pie chart, line graph)
Overlooking the importance of making a graph
Sample Bar Graph
Sample Histogram
Sample Scatter Plot
Frequency Distributions
Frequency distributions are like frequency
polygons; however, instead of straight lines,
a frequency distribution uses a smooth
curve to connect the points and, similar to a
graph, is plotted on two axes.

J Shaped Curve
Bimodal Curve with Two Peaks
Positively Skewed Bell Curve
Negatively Skewed Bell Curve
Symmetric Bell Curve/Normal Distribution
What is the Normal Distribution?
• Where did it come from and why is it so special?
• Just about anything you measure turns out to be normally distributed, at least approximately so.
• That is, usually most of the observations cluster around the mean, with progressively fewer observations out towards the extremes

Sample Histogram
Just about any histogram can be converted into a line graph
Which can be used to plot a normal distribution
But how do we get from the normal to the standard normal?
Measures of variability
Range – difference between the highest and lowest values (high value − low value = range)
Variance (s²) and standard deviation (s) – variation of values about the mean
Measures of variation – range
Range= highest value-lowest value

Bank waiting time values:

With values of 4, 7, 7, the range is 7 − 4 = 3

With values of 1, 3, 14, the range is 14 − 1 = 13
Other key measures of variation
s² = variance

s = standard deviation
Measures of variation –
standard deviation

x
6
6
6
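The formulas for the variance and standard deviation did not survive in the slide text, so the sketch below relies on an assumption: the usual sample formulas with n − 1 in the denominator, which is what Python's `statistics.variance` and `statistics.stdev` compute. It uses the two sets of bank waiting times from the range example; both have a mean of 6 but very different spread.

```python
from statistics import mean, stdev, variance

# Bank waiting-time values from the range example.
waiting_times = {
    "first": [4, 7, 7],
    "second": [1, 3, 14],
}

def describe(values):
    """Mean plus three measures of variability (sample formulas, n - 1)."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": variance(values),
        "std_dev": stdev(values),
    }

summaries = {name: describe(values) for name, values in waiting_times.items()}

for name, summary in summaries.items():
    print(name, summary)
```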

The Z statistic will allow you to
standardize a normal
distribution
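The slide does not show the formula, so the following sketch assumes the standard definition z = (x − mean) / standard deviation. The mean of 20, standard deviation of 10, and the list of scores are illustrative values, not data from the slides.

```python
def z_score(x, mean, std_dev):
    """Standardize x: how many standard deviations it lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev

# Illustrative values: a distribution with mean 20 and standard deviation 10.
scores = [0, 10, 20, 30, 40]
z_values = [z_score(x, mean=20, std_dev=10) for x in scores]

print(z_values)
```

After standardizing, the values are expressed on a scale with mean 0 and standard deviation 1, which is what allows a single z-table to serve any normal distribution.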

Inferential Statistics
To generalize or predict how a large
group will behave based upon
information taken from a part of the
group is called INFERENCE
Techniques which tell us how much
confidence we can have when we
GENERALIZE from a sample to a
population
Inferential Statistics (Vocabulary)
Hypothesis
Null hypothesis
Alternative hypothesis

ANOVA
Level of significance
Type I error
Type II error
Collecting a random sample
Goal: to understand characteristics about a population

Examples:
What’s the average household income of 09 Kebele residents?

What proportion of people living in Dire Dawa Town have had malaria?
Estimating the mean
One of the most common goals of statistical
inference is estimating a population mean
with a sample mean
Central Limit Theorem
When we have n independent, identically distributed random variables (X1, ..., Xn), the mean of those random variables approaches a normal distribution with mean = µ and variance = σ²/n, as n gets large.

Independence of random variables means that the value

of one observation has no effect on the value of another
observation.

Identical distribution of random variables means that

each random variable comes from the same population
(e.g., roll of a die, coin flip).
Simple random sampling
Each observation drawn does not depend on others
drawn
Thus observations are independent

Each observation (i.e., each random variable) is

identically distributed
The population has a distribution that doesn’t change (each
observation is randomly drawn from an identical distribution –
the distribution of the population).

So the Central Limit Theorem applies!

(when n is large)
What does this mean?
Suppose we take a sample of n = 50 observations from a population that has this distribution:
[Figure: frequency distribution of the population, horizontal axis from 0 to 30]
Mean (µ) = 20
Variance (σ²) = 100
Std. dev (σ) = 10

We then find the mean of this sample (suppose this mean = 19). Take
another sample of 50 observations and find the mean (suppose it’s 24).
Do this many times, and we’ll come up with a distribution of means. The
Central Limit Theorem tells us this distribution will always look like the
next slide (as long as n is “large”, and 50 is large enough):
The normal curve

[Figure: normal curve for the distribution of sample means x̄, horizontal axis from 16 to 24]
Mean (µ) = 20, sample size (n) = 50, variance of sample mean = σ²/n = 100/50 = 2
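The repeated-sampling idea can be tried out with a small simulation. This is a sketch under stated assumptions: the shape of the population on the slide is not recoverable, so a gamma distribution with shape 4 and scale 5 is used here purely because it has mean 20 and variance 100; the number of replications and the seed are also arbitrary choices.

```python
import random
from statistics import mean, variance

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: gamma(shape=4, scale=5) has mean 4*5 = 20
# and variance 4*5**2 = 100, matching the values on the slide.
SHAPE, SCALE = 4.0, 5.0
N = 50               # observations per sample
REPLICATIONS = 2000  # number of samples drawn

def one_sample_mean(n):
    """Draw n observations from the population and return their mean."""
    draws = [random.gammavariate(SHAPE, SCALE) for _ in range(n)]
    return sum(draws) / n

sample_means = [one_sample_mean(N) for _ in range(REPLICATIONS)]

mean_of_means = mean(sample_means)
variance_of_means = variance(sample_means)

print("mean of sample means:", round(mean_of_means, 2))
print("variance of sample means:", round(variance_of_means, 2))
```

The mean of the sample means should land close to µ = 20 and their variance close to σ²/n = 2, even though the assumed population itself is skewed rather than normal.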
Symbols
Population Parameter: µ

Estimate: x̄

Expected: E
Basic Types of Inference
Point Inference
The value of a population parameter µ is estimated using a single value x̄

Examples: mean, standard deviation, etc.

Interval Inference
Attaching a probability to an estimate (i.e., making a
confidence interval)

Example: we are 95% confident that µ is between 10 and 20

Judging the Quality of the Estimator
Bias – the difference between E(Θ̂) and Θ (i.e., Bias = E(Θ̂) − Θ)

Bias may be positive or negative (e.g., a positively biased estimator would indicate the population parameter is higher than it actually is)

Efficiency – how clustered the distribution of Θ̂ is (i.e., how “peaked” its distribution is)
Point Estimates (inferring population parameters from samples)
Population mean: µ is estimated by x̄

Population proportion: π is estimated by p = x/n

Population variance: σ² is estimated by s²

Population standard deviation: σ is estimated by s

Confidence Intervals
The degree of confidence we have in our estimates, expressed as a percentage

Common examples: 90, 95, or 99% confident

The confidence level is defined with the α symbol

In confidence intervals, alpha (α) is the proportion of the time your confidence interval is wrong

The typical usage is: z_{α/2}

Why do we divide by 2?
Confidence Interval Example
What is the 95% confidence interval for a normally distributed
variable?

α = 1 − desired confidence level

α = 1 − 0.95 = 0.05

Remember that we divide α by 2 since we have uncertainty both above and below the mean (i.e., 2 tails)

Therefore we use z_{0.025} for the 95% confidence interval

From the z-table we find that z_{0.025} = 1.96

What does this mean?
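The z-table lookup can be reproduced with the standard normal distribution in Python's standard library. This is a sketch; the function name `z_critical` is introduced here for illustration and is not part of the slides.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-tailed critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence_level
    # Leave alpha/2 in the upper tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))

# Share of a standard normal distribution lying between -1.96 and +1.96.
standard_normal = NormalDist()
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
print("coverage within +/- 1.96:", round(coverage, 4))
```

One way to read the result: about 95% of a normal distribution lies within 1.96 standard deviations of its mean, with 2.5% left in each tail.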

Interval Estimation (making confidence
intervals for population parameters estimated
from samples)
Case #1 estimating an interval for µ when X is
normally distributed and we know σ

This is the simplest case because normality

allows us to use the z-table

This is also unlikely since it requires knowing the

distribution and the σ (which implies knowing µ
already)
Example #1: Create a confidence
interval for µ
A town is considering building a new bridge over a
river. The primary goal is to reduce workers’
commute times from a particular community. A
random sample of workers in that community are
asked to estimate their reduction in commute time if
the bridge were built.

Our goal is to estimate the mean reduction in

commute time for the whole community if the bridge
were built. Create a 95% confidence interval for this
mean.
Example #1 Data
n = 100 workers are sampled
x̄ = 17 minutes
σ = 30 minutes
What is the 95% confidence interval for
the mean?
Constructing a confidence interval
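Construct a 95% confidence interval around the sample mean

$$P\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

$$P\left(17 - 1.96\,\frac{30}{\sqrt{100}} \le \mu \le 17 + 1.96\,\frac{30}{\sqrt{100}}\right) = 0.95$$

$$P(17 - 1.96 \times 3 \le \mu \le 17 + 1.96 \times 3) = 0.95$$

$$P(17 - 5.88 \le \mu \le 17 + 5.88) = 0.95$$

So we can say that the 95% C.I. is 17 ± 5.88, or (11.12, 22.88)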
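The same calculation can be written as a small function. This is a sketch for the known-σ case only; the function name is hypothetical, and the critical value is computed from the standard normal distribution rather than read from a z-table, so it is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example #1: n = 100 workers, sample mean 17 minutes, sigma = 30 minutes.
lower, upper = z_confidence_interval(sample_mean=17, sigma=30, n=100)

print(round(lower, 2), round(upper, 2))
```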
Example #1 Questions
What would happen to our interval if we
used a 99% confidence interval instead?

What would happen to our confidence

interval if we sampled 200 people instead
of 100 people?
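One way to explore these two questions is to compute the margin of error for each scenario. The sketch below assumes σ stays at 30 minutes in every scenario; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence_level):
    """Half-width of the z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / sqrt(n)

baseline = margin_of_error(sigma=30, n=100, confidence_level=0.95)
higher_confidence = margin_of_error(sigma=30, n=100, confidence_level=0.99)
larger_sample = margin_of_error(sigma=30, n=200, confidence_level=0.95)

print("95%, n=100:", round(baseline, 2))
print("99%, n=100:", round(higher_confidence, 2))
print("95%, n=200:", round(larger_sample, 2))
```

Under these assumptions, asking for more confidence widens the interval, while a larger sample narrows it.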
Interval Estimation (making confidence
intervals for population parameters estimated
from samples)
Case #2 estimating an interval for µ when X is
not normally distributed and we know σ

In this case the n matters a lot, why?

This is also unlikely since it requires knowing the

distribution and the σ (which implies knowing µ
already)
Interval Estimation (making confidence
intervals for population parameters estimated
from samples)
Case #3 estimating an interval for µ when σ and
the distribution are unknown

What should we use instead of σ?

Can we use the z-table in this case?

This case is what we see most commonly

t-distribution vs. z-distribution
When we only have s (and not σ) we use the t-distribution rather than the z-distribution

To do so we use the t-table

How are they different?

The t-distribution changes depending on the degrees of freedom (n − 1)
This is reflected in the table and in the symbol t_{α/2, n−1}
The t-distribution accounts for more uncertainty (i.e., wider confidence intervals) since s is just an estimate for σ
t-distribution vs. z-distribution
As n approaches infinity, t and z become equal

This means that even when we have s instead of σ we can use the z-distribution if n is large
Central Limit Theorem: “…as n gets large.”
What is “large”?
Rule of thumb: 30

For n less than 30, the distribution of x̄ does not follow the normal distribution accurately enough.

But the distribution of x̄ does closely follow a t-distribution for sample sizes of less than 30.

For this class use the t-distribution any time you have s instead of σ
Example #2

n = 16
x̄ = 30
s² = 1600
What is the 95% C.I. for the mean?
Example #2
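s = 40
Degrees of freedom = n − 1 = 15

$$t_{\alpha/2,\,n-1} = t_{0.05/2,\,16-1} = t_{0.025,\,15} = 2.131 \quad \text{(from the t-table)}$$

$$P\left(\bar{X} - 2.131\,\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + 2.131\,\frac{s}{\sqrt{n}}\right) = 0.95$$

$$P\left(30 - 2.131\,\frac{40}{\sqrt{16}} \le \mu \le 30 + 2.131\,\frac{40}{\sqrt{16}}\right) = 0.95$$

$$P(30 - 2.131 \times 10 \le \mu \le 30 + 2.131 \times 10) = 0.95$$

$$P(30 - 21.31 \le \mu \le 30 + 21.31) = 0.95$$

The 95% confidence interval for the mean is (8.69, 51.31)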
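The t-based interval can be sketched the same way. Python's standard library has no t-distribution quantile function, so this sketch takes the critical value 2.131 from the t-table, as the slide does, and passes it in as a parameter; the function and constant names are illustrative. The z-based margin is computed only for comparison.

```python
from math import sqrt

# Critical value t_{0.025, 15} as read from the t-table on the slide.
T_CRITICAL_0025_DF15 = 2.131

def t_confidence_interval(sample_mean, sample_variance, n, t_critical):
    """Confidence interval for the mean when only s (not sigma) is known."""
    s = sqrt(sample_variance)
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Example #2: n = 16, sample mean 30, sample variance 1600.
lower, upper = t_confidence_interval(30, 1600, 16, T_CRITICAL_0025_DF15)
print(round(lower, 2), round(upper, 2))

# For comparison: the margin if the z value 1.96 were used instead.
t_margin = T_CRITICAL_0025_DF15 * sqrt(1600) / sqrt(16)
z_margin = 1.96 * sqrt(1600) / sqrt(16)
print("t margin:", round(t_margin, 2), "z margin:", round(z_margin, 2))
```

The t margin is larger than the z margin, which reflects the extra uncertainty from estimating σ with s in a small sample.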

Interval Estimation (making confidence intervals
for population parameters estimated from
samples)
Case #4 estimating an interval for a proportion π
based on a sample proportion p

Remember that p = x/n

In other words, p = the number of “successes” divided by the sample size n
For example: the proportion of people over 6 ft tall

In this case we don’t need s or σ, but we do need the standard deviation of p:

$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$

Which we estimate as:

$$s_p = \sqrt{\frac{p(1-p)}{n}}$$
Interval Estimation (making confidence intervals
for population parameters estimated from
samples)
Case #4 continued
Equation:

$$p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

We use the z-distribution for estimating an interval for a proportion π based on a sample proportion p

This also limits us to using only large samples (in this case n > 100)

For smaller samples, we calculate the entire distribution using the binomial mass function (i.e., solve for all x values):

$$P(x) = C_x^n\,\pi^x (1-\pi)^{n-x}$$
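"Solving for all x values" can be sketched as follows. The sample size of 10 and the proportion 0.42 are hypothetical values chosen only to show the calculation; the function name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(x) = C(n, x) * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Hypothetical small sample: n = 10 with a true proportion of 0.42.
n, pi = 10, 0.42
distribution = [binomial_pmf(x, n, pi) for x in range(n + 1)]

for x, probability in enumerate(distribution):
    print(x, round(probability, 4))

print("total probability:", round(sum(distribution), 6))
```

Because the probabilities cover every possible number of successes from 0 to n, they sum to 1.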
Example #3
n = 150 people at a convention
63 people sampled were over 6 feet tall
What is the 99% C.I. for the true
proportion of all people ≥6 ft tall at the
convention?
Example #3
p = 63/150 = 0.42
99% C.I. → z_{α/2} = z_{0.005} = 2.58 (from the z-table)

$$p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$

$$0.42 - 2.58\sqrt{\frac{0.42 \times 0.58}{150}} \le \pi \le 0.42 + 2.58\sqrt{\frac{0.42 \times 0.58}{150}}$$

$$0.42 - 2.58 \times 0.0403 \le \pi \le 0.42 + 2.58 \times 0.0403$$

$$0.42 - 0.104 \le \pi \le 0.42 + 0.104$$

The 99% confidence interval for π, given p = 0.42, is (0.316, 0.524)
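Example #3 can be checked with a short function. This is a sketch of the large-sample z interval for a proportion; the function name is hypothetical, and the critical value is computed (about 2.576) rather than taken from the rounded table value 2.58, which does not change the interval at three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence_level):
    """Large-sample z interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Example #3: 63 of 150 people sampled were over 6 feet tall.
lower, upper = proportion_confidence_interval(63, 150, 0.99)

print(round(lower, 3), round(upper, 3))
```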
