Data Analysis 1
•Create predictive models that forecast future results within a given
experimental domain.
Data Analysis
Descriptive statistics fall into two groups:
1. Measures of centrality (e.g. mean, median, mode).
2. Measures of variation (e.g. range, variance, standard deviation).
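As a minimal sketch of the two groups, the snippet below computes a few measures of centrality and variation with Python's standard `statistics` module. The `yields` values are hypothetical and are not taken from these slides.

```python
import statistics

# Hypothetical sample of measurements (illustration only, not from the slides).
yields = [4, 5, 5, 6, 7, 7, 7, 9]

# Measures of centrality: where the data are located.
centrality = {
    "mean": statistics.mean(yields),
    "median": statistics.median(yields),
    "mode": statistics.mode(yields),
}

# Measures of variation: how widely the data are spread.
variation = {
    "range": max(yields) - min(yields),
    "variance": statistics.variance(yields),  # sample variance (n - 1 divisor)
    "stdev": statistics.stdev(yields),        # sample standard deviation
}

print(centrality)
print(variation)
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) divisor; `pvariance` and `pstdev` would be the choice if the data were a whole population.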
Many exploratory data analysis (EDA) techniques involve graphical displays of the data, such as:
•Histograms,
•Box and whisker plots,
•Pareto charts,
•Stem-and-leaf plots,
•Multi-vari charts.
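One of the displays listed above, the stem-and-leaf plot, is simple enough to sketch in plain text. The function name `stem_and_leaf` and the sample values are hypothetical; the sketch assumes non-negative integers and uses the tens digit as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a stem-and-leaf display (tens | units) as a list of text lines.

    Assumes non-negative integers; this is an illustrative sketch only.
    """
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)

    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical data, not taken from the slides.
display = stem_and_leaf([12, 15, 21, 23, 23, 38, 40, 47])
print("\n".join(display))
```

Each row keeps the individual values readable while still showing the overall shape of the distribution, which is the main advantage over a histogram for small samples.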
Exploratory Data Analysis (EDA)
Example
[Figure: plot with axes labelled "Yield (g)" and "No of obs"; horizontal axis running from 1 to 12]
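Example 2: Box plots and correlation matrix of IQ and four test marks (2000 students)
[Figure: box plots for IQ, T1, T2, T3 and T4. Only part of the correlation matrix is recoverable: row T2 shows 0.04 and 0.55; row T3 shows 0.02.]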
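A correlation matrix like the one in this example holds the Pearson correlation coefficient for every pair of variables. The sketch below shows one way to compute it; the helper names `pearson` and `correlation_matrix` and the `marks` data are hypothetical and do not reproduce the 2000-student data set.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map each pair of column names to their correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical marks for five students (illustration only).
marks = {
    "IQ": [90, 100, 110, 120, 130],
    "T1": [20, 25, 30, 35, 40],
    "T2": [30, 28, 33, 25, 31],
}
matrix = correlation_matrix(marks)
for name, row in matrix.items():
    print(name, [round(r, 2) for r in row.values()])
```

The matrix is symmetric with ones on the diagonal, so presentations often show only the triangle above or below the diagonal.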
[Figure: four panels titled Test 1, Test 2, Test 3 and Test 4]
Diagnosis
Treatment   No improvement   Little improvement   Good improvement
A           12*              25                   30
B           4                7                    8
C           34               35                   36
* The numbers in the cells are patient counts.
A chi-squared test of independence on this contingency table determines whether the
improvement outcomes differ significantly between the treatments.
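The test can be sketched in plain Python using the patient counts from the table. The function names are hypothetical, and the p-value helper relies on a closed form that holds only for an even number of degrees of freedom (the case here); a statistics library would normally handle the general case.

```python
import math

def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if treatment and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def p_value_even_dof(statistic, dof):
    """Upper-tail chi-squared probability; closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch supports positive even dof only")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

# Rows: treatments A, B, C. Columns: no, little, good improvement.
counts = [
    [12, 25, 30],
    [4, 7, 8],
    [34, 35, 36],
]
statistic, dof = chi_squared_statistic(counts)
p_value = p_value_even_dof(statistic, dof)
print(f"chi-squared = {statistic:.2f}, dof = {dof}, p = {p_value:.2f}")
```

For these counts the statistic comes out at about 4.9 on 4 degrees of freedom, below the 5% critical value of 9.488, so under this sketch the table alone does not show a significant difference between the treatments.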
Statistical Inference:
Estimating the parameters of a population from the
statistics of a representative sample.
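As a minimal sketch of this idea, the snippet below estimates a population mean from a sample and attaches an approximate 95% confidence interval. The function name `estimate_mean` and the sample values are hypothetical, and the multiplier z = 1.96 assumes a normal approximation; for a small sample a t-distribution multiplier would be more appropriate.

```python
import math
import statistics

def estimate_mean(sample, z=1.96):
    """Estimate the population mean from a sample.

    Returns (point estimate, standard error, approximate confidence interval).
    z = 1.96 assumes a normal approximation (an assumption of this sketch).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    interval = (mean - z * std_error, mean + z * std_error)
    return mean, std_error, interval

# Hypothetical representative sample (illustration only).
sample = [98, 102, 101, 97, 103, 99, 100, 100]
mean, std_error, interval = estimate_mean(sample)
print(f"mean = {mean}, standard error = {std_error:.3f}")
print(f"approximate 95% interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```

The sample mean and standard deviation are the statistics; the population mean is the parameter being estimated, and the interval expresses the uncertainty of that estimate.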