Correlation and Simple Regression
Pearson Product Moment Coefficient of Correlation
→ an index of relationship between two variables
→ x = independent variable, y = dependent variable
→ the value of r ranges from -1 to +1
if r = +1 or -1, there is a perfect correlation
if r = 0, there is no linear correlation between x and y
r > 0 (positive, up to +1)
if the trend of the line graph is going upward, the value of r is positive
this indicates that as the value of x increases, the value of y also increases
x and y are positively correlated [direct]
ex. as the waistline gets bigger, weight also tends to get heavier
r < 0 (negative, down to -1)
if the trend of the line graph is going downward, the value of r is negative
this indicates that as the value of x increases, the corresponding value of y decreases
x and y are negatively correlated [inverse]
as x increases, y decreases
r = 0
if the trend of the line graph cannot be established as either upward or downward, then r = 0
this indicates that there is no linear correlation between the x and y variables
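The three cases above can be illustrated with a minimal Python sketch. It uses the deviation form of Pearson's r, which is algebraically equivalent to the sum-based formula given later in these notes. The function name and the three small data sets are made up for illustration only.

```python
import math

def pearson_r_deviation(x, y):
    """Pearson r from deviations about the means (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data: same x, three different shapes of y
x = [1, 2, 3, 4, 5]
upward = [2, 4, 6, 8, 10]    # y rises steadily with x
downward = [10, 8, 6, 4, 2]  # y falls steadily as x rises
no_trend = [2, 1, 4, 1, 2]   # neither upward nor downward overall

r_up = pearson_r_deviation(x, upward)
r_down = pearson_r_deviation(x, downward)
r_flat = pearson_r_deviation(x, no_trend)
print(r_up, r_down, r_flat)  # close to +1, -1, and 0 respectively
```

Note that in the last data set y still follows a pattern (it peaks in the middle), so r = 0 only rules out a straight-line trend.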
Why do we use ‘r’?
→ to analyze if a relationship exists between two variables
→ if a relationship exists between x and y, then we can determine the extent to which x influences y using the coefficient of determination, which is equal to the square of r multiplied by 100%
→ this explains how much the independent variable influences the dependent variable, or how much y depends on x (the percentage of the variation in y accounted for by x)
→ this gives the degree of relationship between x and y, which other statistical tests of relationship do not show
→ a more powerful test of relationship than nonparametric tests, provided its assumptions are met
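As a small sketch of the coefficient of determination, the snippet below squares r and multiplies by 100%. The function name is hypothetical, and r = 0.949 is the value obtained for the grades example worked in these notes.

```python
def coefficient_of_determination(r):
    """Return r squared, expressed as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return (r ** 2) * 100.0

percent = coefficient_of_determination(0.949)
print(f"about {percent:.1f}% of the variation in y is accounted for by x")
```

Because r is squared, a negative r of the same size gives the same percentage.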
When do we use ‘r’?
→ the value of r ranges from +1 through 0 to -1
→ there is a perfect positive correlation if r = +1; likewise, there is a perfect negative correlation if r = -1
→ however, if r = 0, then there is no correlation between the two variables x and y
→ positive correlation: as x increases, y also increases, or vice versa
→ negative correlation: as x decreases, y increases or vice versa
💡 - ‘r’ tells you if there is a relationship between x and y or not
- coefficient of determination tells you how much x depends on y or how much y depends on x (square the r)
Formula for ‘r’
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
r = Pearson product-moment coefficient of correlation
n = sample size
Σxy = sum of the products of x and y
ΣxΣy = product of the sum of x and the sum of y
Σx^2 = sum of the squares of x
Σy^2 = sum of the squares of y
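The formula and its terms can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `pearson_r` and the small data set are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson r computed from the sums that appear in the formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # sum of products
    sum_x2 = sum(xi * xi for xi in x)              # sum of squares of x
    sum_y2 = sum(yi * yi for yi in y)              # sum of squares of y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator

# Made-up data, used only to exercise the function
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)
print(round(r_demo, 3))
```

The check on the denominator guards against a variable that never changes, for which r cannot be computed.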
example—
x - 75 70 65 90 85 85 80 70 65 90
y - 80 75 65 95 90 85 90 75 70 90
solve for ‘r’
substituting into the formula gives:
r = 0.949
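As a check on the arithmetic, the sums for the ten pairs above can be substituted into the formula for r. The intermediate sums below are computed from the listed data; they are not given in the original notes.

$$\sum x = 775,\quad \sum y = 815,\quad \sum xy = 64000,\quad \sum x^2 = 60925,\quad \sum y^2 = 67325$$

$$r = \frac{10(64000) - (775)(815)}{\sqrt{\left[10(60925) - 775^2\right]\left[10(67325) - 815^2\right]}} = \frac{8375}{\sqrt{(8625)(9025)}} \approx \frac{8375}{8822.7} \approx 0.949$$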
Solving by Stepwise Method
step 1—problem
step 2—hypotheses
step 3—level of significance
step 4—test statistics / computation
step 5—decision rule
step 6—conclusion / implication
example—
below are the midterm (x) and final (y) grades
x - 75 70 65 90 85 85 80 70 65 90
y - 80 75 65 95 90 85 90 75 70 90
step 1 - problem
→ is there a significant relationship between the midterm and the final grades of 10 students in Mathematics?
step 2 - hypotheses
→ Ho = there IS NO significant relationship between the midterm and the final grades of 10 students in Mathematics
→ Ha = there IS a significant relationship between the midterm and the final grades of 10 students in Mathematics
step 3 - level of significance
→ n = 10
→ α = 0.05
→ df = n - 2 = 8
→ r0.05 = 0.632 (tabular value)
step 4 - test statistic / computation
→ r = 0.949
step 5 - decision rule
→ if the computed r value is greater than the tabular value, reject Ho
r = 0.949 > 0.632 (tabular value at 0.05 level of significance with 8 degrees of freedom)
the null hypothesis [Ho] is rejected
step 6 - conclusion / implication
→ there is a significant relationship between the midterm and the final grades of 10 students in mathematics
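Steps 3 to 5 can be sketched in Python as below. The function name is hypothetical, and the tabular value 0.632 is taken from the notes rather than computed. Comparing the absolute value of r is an assumption (a two-tailed test), so that a strong negative r is also detected.

```python
def decide_correlation(r_computed, r_tabular):
    """Decision rule: reject Ho when |r| is greater than the tabular value."""
    if abs(r_computed) > r_tabular:
        return "reject Ho"
    return "fail to reject Ho"

n = 10               # number of students
df = n - 2           # degrees of freedom for testing r
r_computed = 0.949   # from step 4
r_tabular = 0.632    # tabular r at alpha = 0.05, df = 8 (from the notes)

decision = decide_correlation(r_computed, r_tabular)
print(f"df = {df}, decision: {decision}")
```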
Simple Linear Regression Analysis
→ predicts the value of y given the value of x
Why do we use simple linear regression?
→ we are interested in predicting the value of y, the dependent variable; this is used for forecasting and prediction
When do we use simple linear regression?
→ when there is a relationship between x and y variables
→ the data should be normally distributed and measured at the interval or ratio level
Formula
$$y = bx + a$$
to get b:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$
to get a:
$$a = \bar{y} - b\bar{x}$$
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
ȳ = average of y
x̄ = average of x
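The slope and intercept formulas can be sketched in Python as follows. The function names `fit_line` and `predict` are hypothetical, and the four data points are made up so that the result is easy to check by hand (they lie exactly on y = 2x + 1).

```python
def fit_line(x, y):
    """Return (a, b) for the line y = b*x + a, using the sum-based formulas."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when x has no variation")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * (sum_x / n)                 # intercept: y-bar minus b times x-bar
    return a, b

def predict(a, b, x_new):
    """Predicted y for a given x on the fitted line."""
    return b * x_new + a

# Made-up points lying exactly on y = 2x + 1
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
y_hat = predict(a, b, 5)
print(a, b, y_hat)
```

The slope is computed first because the intercept formula depends on it.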
example—
below are the midterm (x) and final (y) grades
x - 75 70 65 90 85 85 80 70 65 90
y - 80 75 65 95 90 85 90 75 70 90
suppose the midterm grade is x = 88; what is the predicted final grade?
r = 0.949
x̄ = 77.5
ȳ = 81.5
b = 0.971
a = 6.25
y = 91.7, or about 92, as the predicted final grade
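For reference, the values above can be reproduced by substitution. The sums used here are computed from the listed grades; they are not given in the original notes.

$$b = \frac{10(64000) - (775)(815)}{10(60925) - 775^2} = \frac{8375}{8625} \approx 0.971$$

$$a = \bar{y} - b\bar{x} = 81.5 - 0.971(77.5) \approx 6.25$$

$$y = 0.971(88) + 6.25 \approx 91.7$$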
💡 Ho: μ1 = μ2 = μ3 = μ4 [negative because there is NO DIFFERENCE; all the same]
Ha: at least one mean differs from the others [positive because THERE IS A DIFFERENCE]
Before [if doing it manually, coming from the center]
computed value ≥ tabular value: reject Ho
Now [if with the help of a computer, coming from one side]
p-value ≤ alpha value: reject Ho
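The p-value rule can be sketched as a one-line comparison. The function name and the two p-values below are hypothetical; in practice the p-value would come from statistical software.

```python
def decide_by_p_value(p_value, alpha=0.05):
    """Reject Ho when the p-value is less than or equal to alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"

# Hypothetical p-values, as software might report them
small_p = decide_by_p_value(0.003)
large_p = decide_by_p_value(0.210)
print(small_p, "|", large_p)
```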