Data Analysis 1

Uploaded by

bradleymakaure
Data Analysis

The purpose of data analysis is to:

•Produce descriptive statistics to summarize the data.

•Create graphics which help to visualize data.

•Use inferential statistics to distinguish between significant and
non-significant effects.

•Create predictive models which can be used to predict future results within a
given experimental domain.
Data Analysis
Descriptive Statistics can be categorized in two groups:
1. Measures of centrality.
2. Measures of variation.

Measure of centrality  Advantage                        Disadvantage                    Formula

Arithmetic mean        Can be used for inferential      Sensitive to outliers
                       statistics
Geometric mean         Dampens the effect of outliers;  Cannot be used for inferential
                       useful for changing data         statistics
Harmonic mean          Dampens the effect of outliers;  Cannot be used for inferential
                       useful for rates and ratios      statistics
Median                 Insensitive to outliers          Insensitive to the              Exact center of the
                                                        distribution of data            distribution
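
The four measures of centrality can be compared on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the yield values are hypothetical and include one deliberate outlier to show how strongly each measure reacts to it.

```python
import statistics

# Hypothetical yields in grams; the last value is a deliberate outlier.
yields = [10.0, 11.0, 12.0, 13.0, 100.0]

arithmetic = statistics.mean(yields)            # sum of the values divided by n
geometric = statistics.geometric_mean(yields)   # n-th root of the product of the values
harmonic = statistics.harmonic_mean(yields)     # n divided by the sum of the reciprocals
median = statistics.median(yields)              # middle value of the sorted data

print(f"arithmetic mean: {arithmetic:.2f}")
print(f"geometric mean:  {geometric:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
print(f"median:          {median:.2f}")
```

The arithmetic mean is pulled furthest towards the outlier, the geometric and harmonic means less so, and the median stays at the middle value.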
Data Analysis

Measure of dispersion  Advantage                          Disadvantage

Standard deviation     Very useful parameter; properties  Not additive
                       are well known
Variance               Useful parameter; the variance is  Squares the true dispersion
                       additive
Relative STD           Useful when comparing dissimilar   Cannot be used for statistical
                       data sets                          inference
Standard error         Used when calculating              Not additive
                       uncertainties
Range                  Simple to calculate                Based on only two data points
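
The measures of dispersion can be computed in a few lines. This sketch assumes the sample (n - 1) definitions of variance and standard deviation, which may differ from the definitions on the original slide; the data values are hypothetical.

```python
import math
import statistics

# Hypothetical repeated measurements of the same quantity.
data = [4.8, 5.1, 5.0, 4.9, 5.2]

n = len(data)
mean = statistics.mean(data)
variance = statistics.variance(data)       # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)           # square root of the sample variance
relative_std = 100 * std_dev / mean        # relative STD in percent of the mean
standard_error = std_dev / math.sqrt(n)    # standard error of the mean
data_range = max(data) - min(data)         # uses only the two extreme points

print(f"variance:       {variance:.4f}")
print(f"std deviation:  {std_dev:.4f}")
print(f"relative STD:   {relative_std:.2f} %")
print(f"standard error: {standard_error:.4f}")
print(f"range:          {data_range:.2f}")
```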
Types of Variables

Type of variable  Definition                        Examples

Continuous        Variable which can take on any    Mass, concentration,
                  value between two specified       temperature
                  limits
Nominal           Categorical variable in which     Type of catalyst, method of
                  there is no order                 analysis, binary variable
                                                    (pass/fail)
Ordinal           Ordered categorical variable      Rating scale, diagnosis

For many methods of data analysis, it is important to identify the
independent variables (factors) and the dependent variable (response).
Exploratory Data Analysis (EDA)
EDA is used for the following purposes:
•To help the researcher to formulate relevant hypotheses.
•To suggest the appropriate statistical tools to analyze the data.

Many EDA techniques involve graphical displays of the data such as:

•Histograms,
•Box and whisker plots,
•Pareto charts,
•Stem-and-leaf plots,
•Multi-vari charts.
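
The numbers behind two of these displays can be sketched without a plotting library. The example below assumes hypothetical yield data, computes the five-number summary that a box-and-whisker plot draws, and prints a text histogram; the bin width of 2.0 is an arbitrary choice for illustration.

```python
import statistics
from collections import Counter

# Hypothetical yields in grams.
yields = [3.1, 4.7, 5.2, 5.9, 6.0, 6.3, 6.8, 7.4, 8.1, 9.6]

# Five-number summary: minimum, lower quartile, median, upper quartile, maximum.
q1, q2, q3 = statistics.quantiles(yields, n=4, method="inclusive")
five_number = (min(yields), q1, q2, q3, max(yields))
print("five-number summary:", five_number)

# Text histogram: count the observations falling in each bin.
bin_width = 2.0
counts = Counter(int(y // bin_width) for y in yields)
for b in sorted(counts):
    lo = b * bin_width
    print(f"{lo:4.1f} to {lo + bin_width:4.1f}: {'*' * counts[b]}")
```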
Exploratory Data Analysis (EDA)
Example

[Figure: histogram of Yield (g), with No of obs on the vertical axis, and box plot of Yield (g)]
Exploratory Data Analysis (EDA)
Example 2: Box plots and correlation matrix
of IQ and 4 test marks (2000 students)

[Figure: box-and-whisker plot of IQ, T1, T2, T3 and T4; vertical axis from 0 to 120]

Correlation Matrix
      T1    T2    T3    T4
IQ    0.51  0.82  0.02  0.52
T1          0.42  0.03  0.60
T2                0.04  0.55
T3                      0.02
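
Each entry of such a correlation matrix is a Pearson correlation coefficient between two columns of data. The sketch below shows one way to compute it; the function name `pearson_r` and the scores are hypothetical and are not the data of the 2000 students.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for five students.
scores = {
    "IQ": [95, 100, 105, 110, 120],
    "T1": [50, 55, 65, 60, 80],
    "T2": [40, 48, 52, 60, 70],
}

# Print the upper triangle of the correlation matrix.
names = list(scores)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"r({a}, {b}) = {pearson_r(scores[a], scores[b]):.2f}")
```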
Exploratory Data Analysis (EDA)

Other EDA techniques:

• Cluster Analysis: collects “similar” variables in clusters.

• Principal Component Analysis: reduces the number of independent variables to the
essential variables.

• Factor Analysis: used to detect the relationship between variables.

• Discriminant Analysis: used to detect variables which discriminate between
naturally occurring groups.

• Categorical Data Analysis: studies the relationship between nominal and
ordinal variables.
Exploratory Data Analysis (EDA)
Example: Cluster Analysis

[Figure: cluster diagram of the four tests; branches in the order Test 1, Test 4, Test 2, Test 3; horizontal axis: Linkage Distance, 400 to 1400]
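
How such a cluster diagram is built can be sketched with a small single-linkage procedure. The example below is an illustration under stated assumptions: it uses 1 - correlation as the distance between two tests, taking the correlations between T1 to T4 from the correlation matrix of example 2, whereas the linkage distances in the diagram (400 to 1400) were evidently computed on a different scale. The function names are hypothetical.

```python
# Correlations between the four tests, taken from the correlation matrix.
correlation = {
    ("T1", "T2"): 0.42, ("T1", "T3"): 0.03, ("T1", "T4"): 0.60,
    ("T2", "T3"): 0.04, ("T2", "T4"): 0.55, ("T3", "T4"): 0.02,
}

def distance(a, b):
    """Assumed dissimilarity between two tests: 1 - correlation."""
    r = correlation.get((a, b), correlation.get((b, a)))
    return 1.0 - r

def single_linkage(items):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((clusters[i], clusters[j], round(d, 2)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = single_linkage(["T1", "T2", "T3", "T4"])
for left, right, d in merges:
    print(f"merge {left} with {right} at distance {d}")
```

With these correlations, Test 1 and Test 4 join first, Test 2 joins them next, and Test 3 joins last, which matches the branch order of the cluster diagram.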
Exploratory Data Analysis (EDA)
Example: Categorical Data Analysis
Contingency Tables

                            Diagnosis
Treatment   No Improvement       Little Improvement   Good Improvement
A           12*                  25                   30
B           4                    7                    8
C           34                   35                   36

* The numbers in the cells are patient counts.
From this contingency table, we can determine, by
performing a chi-squared test, whether there is a significant
difference between the treatments.
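
The chi-squared test on this table can be sketched with the standard library alone. The counts are those of the contingency table; the expected counts follow from the row and column totals, and the p-value uses the closed-form tail probability of the chi-squared distribution that holds for 4 degrees of freedom (a general implementation would call a statistics library instead).

```python
import math

# Patient counts: rows are treatments A, B, C; columns are
# no, little and good improvement.
observed = [
    [12, 25, 30],  # Treatment A
    [4, 7, 8],     # Treatment B
    [34, 35, 36],  # Treatment C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only for dof == 4: P(X > x) = exp(-x / 2) * (1 + x / 2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

For these counts the p-value comes out well above 0.05, so the table alone would not indicate a significant difference between the treatments.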
Statistical Inference:
Estimating the parameters of a population from the
statistics of a representative sample.

Examples

Statistic (from sample)    Parameter

Sample mean: X̄             Population mean: μ
Sample STD: S              Population STD: σ
Sample proportion: p       Population proportion: ρ


Statistical Inference
The following statement always applies:

Measurement = Parameter ± Experimental error

• Parameters can only be estimated within a calculated uncertainty.

• Whenever an estimated parameter is given, the uncertainty associated
with it must be given as well.

• The actual calculation of the uncertainty depends on the distribution of
the data.

• The uncertainty can be visualized by using error bars.
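
Reporting an estimate together with its uncertainty can be sketched as follows. The measurements are hypothetical, and the 95 % interval assumes normally distributed data and uses the normal approximation; for a sample this small a t-based interval would be somewhat wider.

```python
import math
import statistics

# Hypothetical repeated measurements of one quantity.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2]

n = len(measurements)
mean = statistics.mean(measurements)                         # estimate of the parameter
std_error = statistics.stdev(measurements) / math.sqrt(n)    # standard error of the mean

# Assumption: normal approximation for a two-sided 95 % interval.
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * std_error

print(f"estimate: {mean:.2f} ± {half_width:.2f} (95 % interval, normal approximation)")
```

The half width is the length an error bar would extend above and below the plotted mean.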


Statistical Inference
Analysis wanted                         Methods available

Compare 2 independent samples           t-test for normal data
                                        Mann-Whitney test for non-normal data
Compare 2 related samples               Paired t-test
Compare n (n > 2) independent samples   ANOVA for normal data
                                        Kruskal-Wallis ANOVA for non-normal data
                                        (Friedman ANOVA for related samples)
Compare trends                          Regression with indicator variables
Detect the effects of factors on a      Multiple regression
response
Find the levels of the factors for      Response Surface Modeling
which maximum and/or minimum
responses are achieved

Definition: Significant effect = an effect that is unlikely to be caused by experimental error alone.

Whether an effect is significant is decided by using p-values.
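
What a p-value measures can be illustrated with an exact permutation test, which is not one of the methods in the table but needs only the standard library. The sketch assumes two small hypothetical samples and asks how often a random relabeling of the pooled data gives a difference in means at least as large as the observed one; the function name is hypothetical.

```python
from itertools import combinations

# Hypothetical responses for two independent samples.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3]
group_b = [11.2, 11.5, 11.0, 11.9, 11.4]

def permutation_p_value(a, b):
    """Two-sided exact permutation test on the difference of means."""
    pooled = a + b
    n_a = len(a)
    n_b = len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)
    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so that rounding does not exclude ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count

p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise from experimental error alone; the effect is then commonly called significant when p falls below a threshold chosen in advance, such as 0.05.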
