Correlation
Correlation Coefficient (r)
Coefficient of Determination (r^2)
r squared is the proportion, usually stated as a percent, of the variation in the dependent
variable that can be explained using the independent variables. This is very helpful when
evaluating regression models. A large r squared suggests that the model explains most of the
variation you see in your dependent variable, while a small r squared suggests that much of
that variation remains unexplained.
For example: Consider the variation in BMI you see among shoppers in a typical grocery store.
What explains that variation? (Perhaps genetics, physical activity, eating habits, or some
other unknown qualities.) If the shoppers were surveyed and their genetics, physical activity,
and eating habits were compared with their BMI, you might get an r squared of 40%. That would
mean that the 3 factors you examined account for 40% of the differences in BMI, but 60% is
still unexplained.
r squared: the Week 9 PowerPoint indicated that the larger the r squared, the better the fit.
r squared is called the coefficient of determination and is the proportion of variability in a
data set that is accounted for by the statistical model. It is the fraction of the variance of
the dependent variable that is explained by the independent variables included in the
regression model. r squared is usually reported as a descriptive measure rather than as the
subject of its own formal hypothesis test. Generally, the higher the r squared value, the
better the model fits the data.
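The "fraction of the variance explained" definition above can be written as a formula. This is the standard form for a least-squares regression that includes an intercept; the symbols are assumptions introduced here, not taken from the notes: y_i is an observed value, \hat{y}_i is the value the model predicts for it, and \bar{y} is the mean of the observed values.

```latex
\[
r^2
  = \frac{SS_{\text{regression}}}{SS_{\text{total}}}
  = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
  = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]
```

The last form shows why a larger r squared goes with a better fit: the smaller the leftover (residual) sum of squares, the closer r squared gets to 1.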
p in correlation
The p-value is the probability attached to the hypothesis test about the population correlation
coefficient: how likely it would be to see a sample correlation at least this far from 0 if the
null hypothesis were true. If r = 0.356 and p < 0.01, then you can conclude that the
corresponding population correlation coefficient is significantly different from 0. Here the
null hypothesis states that the population correlation coefficient is 0; the alternative
states that the population correlation coefficient is different from 0.
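As a rough illustration of where such a p-value comes from, the usual test converts r into a t statistic, t = r * sqrt((n - 2) / (1 - r^2)), and compares it with a t distribution on n - 2 degrees of freedom. The sketch below does this with the Python standard library only. The notes do not give a sample size, so n = 60 is purely an assumed value for illustration, and the function names are hypothetical. The t distribution tail is obtained by simple numerical integration (Simpson's rule) to avoid extra dependencies; in practice a statistics package would normally be used instead.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 2 * (0.5 - area))

def correlation_p_value(r, n):
    """Test H0: population correlation = 0. Requires n > 2 and |r| < 1."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, two_sided_p_from_t(t, df)

# r = 0.356 comes from the notes; n = 60 is an assumed sample size.
t_stat, p = correlation_p_value(0.356, 60)
print(f"t = {t_stat:.3f}, p = {p:.4f}, significant at 0.01: {p < 0.01}")
```

With the assumed n = 60 the p-value falls below 0.01, matching the example. The same r from a much smaller sample would give a larger p-value, which is why r and p need to be read together.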
Coefficient of Determination (r^2) (Calculated by dividing the sum of squares due to regression
by the sum of squares about the mean)
Percent of the variation in the dependent variable that is explained by its relationship with
the independent variables
Measure of how much of the variation in Y is accounted for by X
In simple linear regression, also the square of the sample Pearson correlation coefficient
between Y and X; because it is never negative, it does not show the direction of the linear
association between the two variables
r squared tells you how much of the differences you see in the dependent variable can be
explained using the independent variables
Large r squared suggests that the model explains most of the variation seen in the dependent
variable
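The calculation named in the heading above (sum of squares due to regression divided by the sum of squares about the mean) can be sketched in a few lines of Python for the one-predictor case. The data values and function names below are made up for illustration only; they are not from the course material. The sketch also computes the Pearson correlation coefficient so the two quantities can be compared.

```python
import math

def r_squared(x, y):
    """r squared for a least-squares line of y on x: SS regression / SS total."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    ss_regression = sum((p - mean_y) ** 2 for p in predicted)  # explained
    ss_total = sum((yi - mean_y) ** 2 for yi in y)             # about the mean
    return ss_regression / ss_total

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: weekly hours of activity (X) and BMI (Y) for 5 people.
hours_active = [1, 2, 3, 4, 5]
bmi = [30, 29, 27, 26, 23]

r2 = r_squared(hours_active, bmi)
r = pearson_r(hours_active, bmi)
print(f"r = {r:.3f}, r squared = {r2:.3f}, r * r = {r * r:.3f}")
```

In this made-up data r is negative (BMI falls as activity rises) while r squared is positive, so r squared alone describes how much variation is explained, not which way the relationship runs.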