
Correlation


Coefficient of Determination (r²) and Pearson Correlation Coefficient (r)

Coefficient of Determination (r²)
This is the proportion (often reported as a percent) of the variation in the dependent variable
that is explained by the independent variables. r squared tells you how much of the differences
you see in your dependent variable can be explained using your independent variables. This is
very helpful when looking at regression models. A large r squared suggests that the model
explains most of the variation you see in your dependent variable, while a small r squared
suggests that much of what you see remains unexplained.
For example: Consider the variation in BMI you see in a typical grocery store. What explains
that variation? (Perhaps genetics, physical activity, eating habits, or some other unknown
qualities.) If the shoppers were surveyed about their physical activity and eating habits, and
those behaviors were analyzed together with genetics as predictors of BMI, you might get an
r squared of 40%. That would mean that the 3 factors you examined account for 40% of the
differences in BMI, but 60% remains unexplained.

r squared: the Week 9 PowerPoint indicated that the larger the r squared, the better the fit.
r squared is called the coefficient of determination and is the proportion of variability in a data
set that is accounted for by the statistical model. It is the fraction of the variance of the
dependent variable that is explained by the independent variables included in the regression
model. There is no formal hypothesis testing to be conducted regarding r squared. Generally,
the higher the r squared value, the better the model fit.
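
As an illustration of the definition above, here is a minimal Python sketch that computes r squared
for a simple linear regression as the sum of squares due to regression divided by the total sum of
squares about the mean. The five data points and all variable names are made up for this
illustration and are not from the course materials.

```python
# Hypothetical data for illustration only (not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for the line y = intercept + slope * x.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

# Variation explained by the fitted line, and total variation about the mean.
ss_regression = sum((pi - mean_y) ** 2 for pi in predicted)
ss_total = sum((yi - mean_y) ** 2 for yi in y)

r_squared = ss_regression / ss_total
print(f"r squared = {r_squared:.2f}")
```

For this made-up data the fitted line accounts for 60% of the variation in y, so 40% remains
unexplained.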

Pearson's Correlation Coefficient (population) r:


Pearson's product-moment correlation coefficient is a measure of how strongly two variables are
linearly related. r is expressed as a number between -1 and 1 and represents the strength and
direction of the association between these variables. An r of 0 means there is no linear
association, whereas both -1 and 1 represent a perfect linear relationship (very, very rare). The
interpretation of the strength of association can be tricky as it is based on a range of values.
By one common rule of thumb, a small correlation is anything between +/- 0.26 and 0.49, a medium
is between +/- 0.50 and 0.69, and a large is between +/- 0.70 and 0.89.

©2014 Walden University Academic Skills Center

When to use r - Hypothesis testing can be conducted on the regression coefficients.


r is a measure of the strength and direction of the linear relationship between two variables. r is
also called the Pearson Product Moment Correlation Coefficient. It ranges between -1 and +1.
When r is near -1 there is a strong inverse linear association. When r is near +1 there is a
strong positive linear association between the two variables. r values near 0 indicate almost no
linear association between the two variables. r = 0 doesn't prove that there is no association
between the two variables. It shows that there is no linear correlation; the two variables might
be related non-linearly.
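
The points above can be checked with a short Python sketch. The function name pearson_r and the
data are made up for this illustration. The last data set follows a perfect curve (y = x squared),
yet its r is 0 because the relationship is not linear.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_positive = [2 * xi + 1 for xi in x]  # points on a rising straight line
y_inverse = [-3 * xi for xi in x]      # points on a falling straight line
y_curve = [xi ** 2 for xi in x]        # points on a U-shaped curve

r_positive = pearson_r(x, y_positive)
r_inverse = pearson_r(x, y_inverse)
r_curve = pearson_r(x, y_curve)

print(f"rising line:  r = {r_positive:.2f}")
print(f"falling line: r = {r_inverse:.2f}")
print(f"curve:        r = {r_curve:.2f}")
```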

p in correlation
The p-value (probability) comes from the hypothesis test about the population correlation
coefficient. If r = 0.356 and p < 0.01, then you can conclude that the corresponding population
correlation coefficient is significantly different from 0. Here the null hypothesis claims that the
population correlation coefficient is 0; the alternative claims that the population correlation
coefficient is different from 0.
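
One standard way to carry out this test is to convert r into a t statistic with n - 2 degrees of
freedom. The Python sketch below shows that conversion. The sample size n = 60 is an assumption
made for this illustration only; the example above does not state a sample size, and the function
name is made up.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.356
n = 60  # hypothetical sample size, assumed for illustration only
t = correlation_t_statistic(r, n)
print(f"r = {r}, n = {n}, t = {t:.2f} with {n - 2} degrees of freedom")
```

Under this assumed n, t is about 2.90, which is larger than the two-sided 1% critical value of
roughly 2.66 for 58 degrees of freedom, in line with p < 0.01. With a different sample size the
same r could lead to a different conclusion.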

Interpretation: How to interpret the p value


Think of the r values as similar to effect sizes (other examples of an effect size are the actual
difference observed in a t test and the odds ratio) and it is easier to understand the relationship
of r to p. Remember that p is related to the likelihood of a type 1 error. p helps you judge
whether the relationship suggested by the effect size is a reflection of reality or could be the
result of chance error. It is related to the size of the effect observed as well as the sample
size. An r of .20 may have a p of .01. This means the correlation is unlikely to be the result of
chance, but not that it is a strong correlation. It is statistically significant in that it is
reasonably accurate based on the sample, but an r of .20 is a very low correlation. Interpretation
can be tricky. It again speaks to what is statistically significant versus what is clinically
significant, but whereas you can simply state that the results of the t test are significant, you
should not simply state that the results of the correlation are statistically significant without
also including the interpretation of the strength of the association.
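
To see how the same weak correlation can be non-significant in a small sample and significant in a
large one, here is a rough Python sketch. It uses Fisher's z transformation with a normal
approximation, which gives an approximate p-value rather than the exact t-test result. The sample
sizes and the function name are made up for this illustration.

```python
import math


def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for H0: population correlation = 0.

    Uses Fisher's z transformation with a normal approximation, so the
    result is approximate, especially for small n.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.20
sample_sizes = (20, 50, 200, 1000)  # hypothetical sample sizes
p_values = [approx_two_sided_p(r, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    # r and r squared stay the same; only the p-value changes with n.
    print(f"n = {n:4d}  r = {r:.2f}  r squared = {r ** 2:.2f}  approx p = {p:.4f}")
```

In every row r is still .20 and r squared is still .04, so the strength of the association is
unchanged even as the p-value shrinks.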

Coefficient of Determination (r²) (Calculate by dividing the sum of squares due to regression by
the sum of squares about the mean)

Percent of the variation observed in a relationship between dependent and independent
variables

Measure of how much of the variation in Y is accounted for by X

Also the square of the sample Pearson correlation coefficient between Y and X but does not
necessarily measure the strength of the linear association between the two variables

r² tells you how much of the differences you see in the dependent variable can be explained
using the independent variables

Large r² suggests that the model explains most of the variation seen in the dependent
variable

Small r² suggests that much of what is seen remains unexplained

Generally the higher the r² value the better the model fitness

Example: Consider the variation in BMI you see in a typical grocery store. What explains that
variation? (Perhaps genetics, physical activity, eating habits, or some other unknown qualities.)
If the shoppers were surveyed about their physical activity and eating habits, and those
behaviors were analyzed together with genetics as predictors of BMI, you might get an r squared
of 40%. That would mean that the 3 factors you examined account for 40% of the differences in
BMI, but 60% remains unexplained.
