Aksum University
College of Health Science and Referral Hospital
Department of Public Health
Course Name: Biostatistics
Correlation and Regression
Negasi A. (Assistant Prof. in Biostatistics)
CORRELATION ANALYSIS
Correlation is the method of analysis to use when
studying the possible association between two continuous
variables.
Correlation analysis is used to measure the strength of the
linear (straight-line) association between two variables.
The standard method (Pearson correlation) leads to a
quantity called r that can take on any value from -1 to +1.
The correlation between two variables is positive if
higher values of one variable are associated with higher
values of the other and negative if one variable tends to
be lower as the other gets higher.
A correlation of around zero indicates that there is no
linear relation between the values of the two variables
(i.e. they are uncorrelated).
It is important to note that a correlation between two
variables shows that they are associated but does not
necessarily imply a ‘cause and effect’ relationship.
The two variables are treated symmetrically.
In essence, r is a measure of the scatter of the points
around an underlying linear trend: the greater
the spread of the points, the lower
the correlation.
SCATTER PLOT
A scatter plot is a graph of the ordered pairs (x, y) of numbers, with
the independent variable X on the horizontal axis and the
dependent variable Y on the vertical axis.
Scatter plots are especially useful when you are examining the
relationship/association between two or more continuous variables using
statistical techniques such as correlation or regression.
That is, they help you judge how well a linear regression fits your
data and whether the two variables are positively related, negatively
related, or unrelated.
Before calculating r or fitting any model, it is
better to examine the scatter plot of the data.
TYPES OF SCATTER PLOTS
(a) Simple scatter plot
The simple scatter plot graphs the relationship between
two quantitative variables.
(b) Matrix scatter plot
The matrix scatter plot allows you to see the relationship between all
pairs of several variables.
Each variable is plotted against every other
variable, so the relationships among more than two
variables can be viewed at once.
Every combination is plotted twice so that each
variable appears on both the X and Y axes.
Example of a Matrix Scatter Plot
Example: Construct a matrix scatter plot with four
variables: age, weight loss, weight before treatment, and
weight after treatment.
CORRELATION COEFFICIENT
The population correlation coefficient ρ (rho) measures
the strength of the association between the variables
The sample correlation coefficient r is an estimate of ρ
and is used to measure the strength of the linear
relationship in the sample observations
Features of ρ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
Example: Depression and Self-esteem
- Studying and Test errors
- GPA and Average TV watching time
The closer to 1, the stronger the positive linear relationship
Example: GPA and Studying time
- Smoking and Lung Damage
- Performance Evaluation and Sociability
The closer to 0, the weaker the linear relationship
Example: GPA and Shoe Size
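The "unit free" property can be checked numerically. The sketch below is only an illustration: the weights and heights are hypothetical values, not data from this lecture, and `pearson_r` is a helper name introduced here. It computes r once in kg and m, then again after converting to g and cm.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient (helper defined for this sketch)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements, for illustration only
weight_kg = [52.0, 60.5, 68.0, 75.5, 83.0]
height_m = [1.55, 1.62, 1.70, 1.74, 1.83]
r_original = pearson_r(weight_kg, height_m)

# Change the units: kg -> g and m -> cm
weight_g = [w * 1000 for w in weight_kg]
height_cm = [h * 100 for h in height_m]
r_rescaled = pearson_r(weight_g, height_cm)

# The two values agree up to floating-point rounding
print(r_original, r_rescaled)
```

Rescaling a variable by a positive constant multiplies the numerator and the denominator of r by the same factor, so the coefficient is unchanged.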
Examples of Approximate r Values
EXAMPLE: Systolic Blood Pressure against Age
Calculating the Correlation Coefficient
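A common way to compute the sample correlation coefficient is r = Sxy / sqrt(Sxx * Syy), where Sxy is the sum of products of deviations from the means and Sxx, Syy are the sums of squared deviations. The following is a minimal Python sketch of that formula. The function name `pearson_r` and the age and blood pressure values are assumptions made for illustration, not the lecture's data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: age in years and systolic blood pressure in mmHg
age = [30, 40, 50, 60, 70]
sbp = [115, 124, 130, 142, 149]
r = pearson_r(age, sbp)
print(round(r, 3))
```

For these made-up values r is close to +1, which matches the nearly straight, upward-sloping pattern in the numbers.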
Interpretation of correlation
A very small correlation does not necessarily indicate
that two variables are not associated;
it indicates only that there is no linear association.
To be sure, we should study a scatter plot of the
data, because it is possible that the two variables
display a non-linear relationship (for example cyclical
or curved).
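A small sketch of this point, using constructed (hypothetical) values: y is completely determined by x through y = x², yet the Pearson correlation is zero because the relationship is curved rather than linear. The helper `pearson_r` is defined here only for the illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient (helper defined for this sketch)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Constructed example: a perfect curved (quadratic) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # zero: no linear association, although y depends entirely on x
```

A scatter plot of these points would show the U-shaped pattern immediately, which is why the plot should be inspected before r is interpreted.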
EXERCISE-1:
In an experiment designed to determine the relationship between
the risk factor, X, and the outcome, Y, the following sets of
data were obtained.
What type of relationship do you observe between x and y? Is an
increase in x accompanied by an increase in y?
EXERCISE-2:
The following data are records of birth weight in kg (x10), current
average income (x1000), and years of college education completed
by mothers, for a simple random sample of 10 births occurring in a
single hospital in one month.
Produce a scatter plot of birth weight against years of college
education completed by mothers.
INTRODUCTION TO REGRESSION ANALYSIS
What is Regression Analysis?
Regression analysis is a tool/technique in biostatistics for:
- the investigation and modeling of relationships between
variables, one known as the dependent variable and the remaining ones
known as independent variables;
- studying the dependence of one variable on one or more other
variables. In other words, we can use it for examining the
relationship that may exist among certain variables.
Often, the dependent variable is denoted by y and the
independent variables by x1, x2, x3, ..., xk.
CLASSIFICATION OF VARIABLE IN REGRESSION
There are two types of variables in regression analysis.
These variables are often classified as y's and x's. The
y-variable is called the
Dependent Variable,
Outcome Variable,
Output Variable,
Response Variable,
Target Variable,
and the x-variable is known as the
Independent Variable,
Risk Factor,
Explanatory Variable,
Predictor,
Input Variable.
Note that: there is always exactly one dependent variable, but
for this dependent variable we can have one or more
independent variables.
Example: We may employ regression analysis
(1). To study how heart rate is affected by or depends upon:
Emotional stress level
Illness, when the body's immune system becomes compromised,
e.g. injury, anemia
Exercise
In this case,
heart rate is the Dependent Variable
Exercise, Illness, Stress ….. are the Independent Variables
Types of Regression Analysis
The relationship existing among regression variables could be:
(i). Linear or straight-line relationship:
It is one in which the relationship between X and Y can best be
represented by a straight line.
(ii). Curvilinear relationship:
A curvilinear relationship is one in which the relationship between X
and Y can best be represented by a curved line (such as a quadratic,
cubic, or other polynomial curve).
Linear regression analysis is, therefore, a statistical technique
used to examine the linear relationship that can exist between
two groups of variables, one dependent and the other independent.
It can be classified as:
(a) Simple Linear Regression Analysis
It is a regression between two variables, one dependent and the
other independent. The nondeterministic model used in this
regard is given by
$y_i = \beta_0 + \beta_1 x_{1i} + \varepsilon_i$
Simple regression: one dependent variable, one independent variable
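For a simple linear regression, the least-squares estimates of the intercept and slope are usually written b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The sketch below implements these two formulas in plain Python. The function name `fit_simple_linear` and the x, y values are assumptions introduced for illustration, not the lecture's data or its SPSS output.

```python
def fit_simple_linear(x, y):
    """Return least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must take at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical values, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = fit_simple_linear(x, y)

# Residuals: observed y minus fitted y
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1)
```

With an intercept in the model, the residuals of a least-squares fit sum to zero (up to rounding), which gives a quick sanity check on the estimates.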
Example: Simple Linear Regression
A researcher wishes to examine the relationship
between the average daily amount of food consumed by
a sample of 24 children and the weight they gained
in one month (both measured in kg). The
content of the food is the same for all of them.
Dependent variable (y) = weight gained in one month,
measured in kilograms
Independent variable (x) = average weight of food consumed
per day by a child, measured in kilograms
Sample Data for the Child Weight Model
REGRESSION RESULTS USING SPSS
Interpretation of the Intercept, b0
Interpretation of the Slope Coefficient, b1
Least Squares Regression Properties
Explained and Unexplained Variation
Coefficient of Determination, R²
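The coefficient of determination is commonly defined through the split of total variation into explained and unexplained parts: SST = SSR + SSE and R² = SSR / SST. The sketch below illustrates that standard definition. The function name `variation_decomposition` and the data are hypothetical, and the code assumes y is not constant (so SST > 0).

```python
def variation_decomposition(x, y):
    """Return (SST, SSR, SSE) for a simple least-squares line fitted to x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    return sst, ssr, sse

# Hypothetical values, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
sst, ssr, sse = variation_decomposition(x, y)
r_squared = ssr / sst  # proportion of the variation in y explained by x
print(sst, ssr, sse, r_squared)
```

In simple linear regression R² equals the square of the Pearson correlation coefficient r between x and y.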
The Standard Deviation of the Regression Slope
Inference about the Slope: t-Test
Inference about the Slope: t-Test Example
Confidence Interval estimation
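One standard formulation of inference about the slope uses the standard error s_b1 = s_e / sqrt(Sxx), with s_e = sqrt(SSE / (n - 2)), the test statistic t = b1 / s_b1 on n - 2 degrees of freedom for H0: β1 = 0, and the interval b1 ± t(α/2, n-2) · s_b1. The sketch below follows that formulation. The function name `slope_inference` and the data are hypothetical, and the critical value 2.306 (two-sided 5% level, 8 degrees of freedom) is taken from a standard t-table rather than computed.

```python
import math

def slope_inference(x, y, t_crit):
    """Slope estimate, its standard error, t statistic and confidence interval."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_e / math.sqrt(sxx)     # standard error of the slope
    t_stat = b1 / se_b1              # test statistic for H0: beta1 = 0
    margin = t_crit * se_b1
    return {"b1": b1, "se_b1": se_b1, "t": t_stat, "df": n - 2,
            "ci": (b1 - margin, b1 + margin)}

# Hypothetical data with n = 10, so df = n - 2 = 8
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8, 10.1, 10.9]
T_CRIT_DF8 = 2.306  # from a standard t-table (two-sided 5%, 8 df)

result = slope_inference(x, y, T_CRIT_DF8)
reject_h0 = abs(result["t"]) > T_CRIT_DF8  # True means the slope differs from 0
print(result, reject_h0)
```

If the confidence interval for the slope excludes 0, the two-sided t-test at the same level rejects H0, so the two procedures lead to the same conclusion.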