CHAPTER 3
Correlation Analysis
3.1 Correlation
Correlation analysis is a statistical technique used to quantify the direction and strength
of association between two variables. An estimate of the correlation between two
variables is called the correlation coefficient, denoted by r. In correlation analysis we
estimate a sample correlation coefficient, which ranges between −1 and +1. When the
correlation coefficient is positive, higher levels of one variable are associated with higher
levels of the other; when it is negative, higher levels of one variable are associated with
lower levels of the other. The sign of the correlation coefficient therefore indicates the
direction of the association, while its magnitude indicates the strength of the association.
3.2 Estimation methods
There are two main approaches commonly used in computing the sample correlation
coefficient. These are:
• The Pearson product-moment correlation coefficient, for estimating linear
correlation among continuous variables; and
• Spearman's rank correlation coefficient, for estimating monotonic (not necessarily
linear) association.
(a) Pearson product-moment
This is used for quantitative data measured on an interval or ratio scale. For n paired
observations (xᵢ, yᵢ), the Pearson product-moment correlation coefficient is defined by:

$$ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\sum_{i=1}^{n}(y_i-\bar{y})^{2}}} $$
(b) Spearman's rank correlation coefficient
This is applied to the ranks of the observations instead of the actual data. It is defined by:

$$ r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)} $$

where
dᵢ = the difference between the pair of ranks for the i-th observation, and
n = the total number of observations (pairs of ranks).
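As a computational check on these two definitions, the Python sketch below implements both formulas directly. The function and variable names are my own illustrative choices, not a library API; ties are ranked by averaging positions, the convention used in Example 3.2 below.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, computed from the definition."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Rank the values of v, averaging the ranks of tied values."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the tied 1-based positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_r(x, y):
    """Spearman coefficient via r = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

For instance, pearson_r applied to the scores in Example 3.1 below returns about 0.2412, and spearman_r applied to the data of Example 3.2 returns about 0.7262.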
Example 3.1
The table below gives the math and English scores of 10 students in a test.

Math score:    5  0  3  1  2  2  5  3  5  4
English score: 1  2  1  3  3  1  3  1  6  2

a) Find the correlation coefficient between the math score and the English score and
interpret it.
b) Test the hypothesis that there is no linear association between the variables at the 5%
level of significance.
Solution
a) With x = math score and y = English score, we have x̄ = 3 and ȳ = 2.3, and

$$ \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = 6, \qquad \sum_{i=1}^{n}(x_i-\bar{x})^{2} = 28, \qquad \sum_{i=1}^{n}(y_i-\bar{y})^{2} = 22.1 $$

Math score (x) | English score (y) | (x−x̄)(y−ȳ) | (x−x̄)² | (y−ȳ)²
5              | 1                 | −2.6        | 4       | 1.69
0              | 2                 | 0.9         | 9       | 0.09
3              | 1                 | 0           | 0       | 1.69
1              | 3                 | −1.4        | 4       | 0.49
2              | 3                 | −0.7        | 1       | 0.49
2              | 1                 | 1.3         | 1       | 1.69
5              | 3                 | 1.4         | 4       | 0.49
3              | 1                 | 0           | 0       | 1.69
5              | 6                 | 7.4         | 4       | 13.69
4              | 2                 | −0.3        | 1       | 0.09
x̄ = 3         | ȳ = 2.3          | 6           | 28      | 22.1
$$ r = \frac{6}{\sqrt{28 \times 22.1}} = 0.2412 $$

The correlation coefficient between the math score and the English score is only about
0.24, which indicates a weak positive linear relationship: higher math scores are only
weakly associated with higher English scores.
b)
1. H₀: ρ = 0 versus H₁: ρ ≠ 0.
2. The population correlation is unknown and the sample is small, so the test statistic
follows the t-distribution with n − 2 degrees of freedom.
3. The two-tailed critical values at the 5% level are $t_{n-2,\,\alpha/2} = -2.306$ and
$t_{n-2,\,1-\alpha/2} = 2.306$.
4. Compute the test statistic using

$$ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{hence} \qquad T = \frac{0.2412\sqrt{10-2}}{\sqrt{1-0.2412^{2}}} = 0.7030 $$
5. Decision: Do not reject H₀ because |T| = 0.7030 < 2.306, and conclude that there is no
significant linear association between the variables.
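If scipy is available, Example 3.1 can be verified in a few lines; scipy.stats.pearsonr returns both the coefficient and the two-sided p-value of the t-test carried out above, so the decision in step 5 can be read off by comparing the p-value with 0.05.

```python
from scipy import stats

math_scores = [5, 0, 3, 1, 2, 2, 5, 3, 5, 4]
english_scores = [1, 2, 1, 3, 3, 1, 3, 1, 6, 2]

r, p = stats.pearsonr(math_scores, english_scores)
print(f"r = {r:.4f}, p-value = {p:.4f}")
# r ≈ 0.2412 and p > 0.05, so H0 (no linear association) is not rejected.
```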
Example 3.2
Find the rank correlation coefficient for the data in the table below:

x: 69  66  68  73  71  74  71  69
y: 163 153 185 186 157 220 190 185
Solution.
Let the rank values of x and y be Rx and Ry respectively.
x   | y   | Rx  | Ry  | d = Rx − Ry | d²
69  | 163 | 3.5 | 3   | 0.5         | 0.25
66  | 153 | 1   | 1   | 0           | 0
68  | 185 | 2   | 4.5 | −2.5        | 6.25
73  | 186 | 7   | 6   | 1           | 1
71  | 157 | 5.5 | 2   | 3.5         | 12.25
74  | 220 | 8   | 8   | 0           | 0
71  | 190 | 5.5 | 7   | −1.5        | 2.25
69  | 185 | 3.5 | 4.5 | −1          | 1
$$ \sum_{i=1}^{n} d_i^{2} = 0.25 + 0 + 6.25 + 1 + 12.25 + 0 + 2.25 + 1 = 23 $$
Hence, the rank correlation coefficient is given by:

$$ r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)} = 1 - \frac{6(23)}{8(64-1)} = 0.7262 $$
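The hand calculation can be checked with scipy's rankdata, which averages tied ranks exactly as in the table above. Note that scipy.stats.spearmanr instead computes a Pearson correlation on the ranks, so with ties it can differ slightly from the d² shortcut used in this chapter.

```python
import numpy as np
from scipy.stats import rankdata

x = [69, 66, 68, 73, 71, 74, 71, 69]
y = [163, 153, 185, 186, 157, 220, 190, 185]

rx, ry = rankdata(x), rankdata(y)      # average ranks for ties
d2 = float(np.sum((rx - ry) ** 2))     # sum of squared rank differences
n = len(x)
r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, r_s)                         # 23.0 and 0.7262, matching the hand calculation
```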
Exercise 3.1
(a) Consider the Table of grades below:
Mathematics grade 70 92 80 74 65 83
English grade 74 84 63 87 78 90
(i) Compute and interpret the correlation coefficient if the grades of the students are
selected at random.
(ii) Test the hypothesis that there is no linear association between the variables at the 5%
level of significance.
(b) The Statistics Consulting Center at Virginia Tech analyzed data for the Department of
Veterinary Medicine. The variables of interest were body weight in kilograms and chest
size in centimetres. It was desired to develop a linear regression equation in order to
determine if there is a significant linear relationship between chest size and total body
weight.

Weight (kg):     2.75  2.15  4.41  5.52  3.21  4.32  2.31  4.30  3.71
Chest size (cm): 29.5  26.3  32.2  36.5  27.2  27.7  28.3  30.3  28.7
(i) Calculate r and interpret it.
(ii) Test the null hypothesis that ρ = 0 against the alternative that ρ > 0. Use the α = 0.01 level.
(iii) What percentage of the variation in chest sizes is explained by differences in
weight? {NB: square r to obtain this percentage.}
(iv) Use the Spearman rank approach to compute r for the data in part (a).
CHAPTER 4
Regression Analysis
4.1 Regression Analysis
Regression analysis is a statistical technique in which we use observed data to relate a
variable of interest (the response or dependent variable) to one or more independent (or
predictor) variables. A regression analysis in which the response variable depends on one
independent or predictor variable is called a simple regression model. The main objective
of regression analysis is to build a regression model or prediction equation that can be
used to describe, predict, and control the dependent variable on the basis of the
independent variables (Bowerman et al., 2001).
4.2 Scatter Diagram
One way to explore the relationship between a dependent variable y and an independent
variable (denoted x ) is to make a scatter diagram, or scatter plot, of y versus x . First,
data concerning the two variables are observed in pairs. To construct the scatter plot, each
value of y is plotted against its corresponding value of x .If y and x are related, the plot
shows us the direction of the relationship. That is, y could be positively related to x ( y
increases as x increases) or y could be negatively related to x ( y decreases as
x increases). (Bowerman et al, 2001). The figures below show some examples of scatter
plots.
[Figure 1(a)–1(d): examples of scatter plots.]
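A scatter plot of this kind is straightforward to produce with matplotlib. The sketch below uses simulated data (an assumed positive linear trend plus random noise) purely to illustrate the idea; none of these values come from the text.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 0.8 * x + rng.normal(0, 1.5, size=x.size)  # assumed positive trend plus noise

plt.scatter(x, y)
plt.xlabel("x (independent variable)")
plt.ylabel("y (dependent variable)")
plt.title("y appears positively related to x")
plt.show()
```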
4.3 Simple linear regression model
The simple linear regression model assumes that the relationship between the dependent
variable, denoted y, and the independent variable, denoted x, can be approximated by a
straight line. We can tentatively decide whether there is an approximate straight-line
relationship between y and x by making a scatter diagram, or scatter plot, of y versus x.
The simple linear regression model is given by:

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i $$

where
β₀ = the intercept of the model on the y-axis;
β₁ = the slope of the linear model;
yᵢ = the value of the response variable for the i-th observation;
xᵢ = the known value of the predictor variable for the i-th observation;
εᵢ = the random error or noise term, which accounts for errors due to chance and
neglected factors assumed not to be important.
4.4 Assumptions Underlying the Simple Linear Model
(i) The values of yᵢ are random and independent of each other.
(ii) The error term is normally distributed: εᵢ ~ N(0, σ²).
(iii) The mean of the error term is zero, i.e. E(εᵢ) = 0.
(iv) The variance of the error term is constant, i.e. Var(εᵢ) = σ² for all xᵢ.
(v) The random errors εᵢ and εⱼ are independent, i.e. cov(εᵢ, εⱼ) = 0 for i ≠ j.
From the above assumptions, we establish the following:

$$ E(y_i) = E(\beta_0 + \beta_1 x_i + \varepsilon_i) = \beta_0 + \beta_1 x_i $$
$$ \mathrm{Var}(y_i) = \mathrm{Var}(\beta_0 + \beta_1 x_i + \varepsilon_i) = \mathrm{Var}(\varepsilon_i) = \sigma^{2} \ \text{for all } x_i $$

(a) yᵢ ~ N(β₀ + β₁xᵢ, σ²)
(b) cov(yᵢ, yⱼ) = 0 for i ≠ j
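These properties are easy to see in a small simulation. The sketch below draws errors from N(0, σ²) with assumed parameter values (β₀ = 2, β₁ = 0.5, σ = 1 are my own choices for illustration) and confirms that the deviations of yᵢ about the true line have mean ≈ 0 and variance ≈ σ².

```python
import numpy as np

# Assumed parameter values for the simulation sketch.
beta0, beta1, sigma = 2.0, 0.5, 1.0
rng = np.random.default_rng(1)

x = np.linspace(0, 10, 100_000)
eps = rng.normal(0, sigma, size=x.size)  # eps_i ~ N(0, sigma^2), independent
y = beta0 + beta1 * x + eps              # y_i = beta0 + beta1 * x_i + eps_i

dev = y - (beta0 + beta1 * x)            # deviations of y_i from E(y_i)
print(dev.mean(), dev.var())             # ≈ 0 and ≈ sigma^2 = 1
```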
4.5 The Least Squares Point Estimates of the Linear Regression Model
We seek the estimates β̂₀ and β̂₁ that minimize the total sum of squared errors (SSE):

$$ SSE = \sum_{i=1}^{n} \varepsilon_i^{2} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^{2} = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^{2} \qquad (1) $$

Differentiating the SSE with respect to β̂₀ and β̂₁, we have

$$ \frac{\partial(SSE)}{\partial\hat{\beta}_0} = -2\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \qquad (2) $$
$$ \frac{\partial(SSE)}{\partial\hat{\beta}_1} = -2\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\,x_i $$
By setting the partial derivatives to zero and rearranging the terms, we obtain the normal
equations:

$$ n\hat{\beta}_0 + \hat{\beta}_1\sum_{i=1}^{n}x_i = \sum_{i=1}^{n}y_i $$
$$ \hat{\beta}_0\sum_{i=1}^{n}x_i + \hat{\beta}_1\sum_{i=1}^{n}x_i^{2} = \sum_{i=1}^{n}x_i y_i \qquad (3) $$

which may be solved simultaneously to yield:

$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}x_i y_i - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)\left(\sum_{i=1}^{n}y_i\right)}{\sum_{i=1}^{n}x_i^{2} - \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^{2}} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}} = \frac{SS_{xy}}{SS_{xx}} \qquad (4) $$

and

$$ \hat{\beta}_0 = \frac{\sum_{i=1}^{n}y_i - \hat{\beta}_1\sum_{i=1}^{n}x_i}{n} = \bar{y} - \hat{\beta}_1\bar{x} \qquad (5) $$
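Equations (4) and (5) translate directly into code. The sketch below computes the least squares point estimates with numpy; the function name is my own illustrative choice.

```python
import numpy as np

def least_squares(x, y):
    """Least squares point estimates from equations (4) and (5)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
    ss_xx = np.sum((x - x.mean()) ** 2)
    b1 = ss_xy / ss_xx               # slope, equation (4)
    b0 = y.mean() - b1 * x.mean()    # intercept, equation (5)
    return b0, b1
```

For example, least_squares([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.0]) returns estimates close to (0.05, 2.0), recovering the roughly y = 2x pattern in that toy data.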
4.6 Analysis of Variance
The application of analysis of variance (ANOVA) in regression analysis is based on
partitioning the total variation, and its degrees of freedom, into components. Partitioning
the total variation,

$$ \sum_{i=1}^{n}(y_i-\bar{y})^{2} = \sum_{i=1}^{n}\left[(y_i-\hat{y}_i) + (\hat{y}_i-\bar{y})\right]^{2} $$
$$ = \sum_{i=1}^{n}(y_i-\hat{y}_i)^{2} + \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^{2} + 2\sum_{i=1}^{n}(y_i-\hat{y}_i)(\hat{y}_i-\bar{y}) $$

The cross-product term vanishes for the least squares fit, so

$$ \sum_{i=1}^{n}(y_i-\bar{y})^{2} = \sum_{i=1}^{n}(y_i-\hat{y}_i)^{2} + \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^{2} $$

i.e. Total variation = Unexplained variation + Explained variation, or

$$ SST = SSE + SSR $$
• The SST measures the total variation in the observed values yᵢ.
• The SSR measures the amount of the total variation in the observed values of yᵢ that is
accounted for by the model.
• The SSE measures the dispersion of the observed values yᵢ about the regression line.
It can be shown that

$$ r^{2} = \frac{SSR}{SST} $$

This r² is called the coefficient of determination, which is the explained variation
expressed as a fraction of the total variation. It gives the proportion of the total variation
in yᵢ accounted for by the model. We can deduce from r² = SSR/SST that
i) SSR = r²·SST
ii) SSE = (1 − r²)·SST
Also 0 ≤ r² ≤ 1, and

$$ r^{2} = \hat{\beta}_1^{2}\,\frac{SS_{xx}}{SS_{yy}} = \hat{\beta}_1\,\frac{SS_{xy}}{SS_{yy}} $$
Table 4.1 The analysis of variance (ANOVA) for the regression model.

Source of Variation | Sum of Squares          | Degrees of Freedom | Mean Square       | F-Ratio
Regression          | SSR = β̂₁·SS_xy         | 1                  | MSR = SSR/1       | F = MSR/MSE
Residual (Error)    | SSE = SS_yy − β̂₁·SS_xy | n − 2              | MSE = SSE/(n − 2) |
Total               | SST = SS_yy             | n − 1              |                   |
From the ANOVA table, the following results can be established:
i) E(SST) = (n − 1)σ² + β₁²·SS_xx
ii) E(MSR) = σ² + β₁²·SS_xx
iii) E(SSE) = (n − 2)σ², so that E(MSE) = σ²
iv) Under the null hypothesis H₀: β₁ = 0: E(MST) = σ², E(MSR) = σ², and E(MSE) = σ².
Note that MSE is an unbiased estimator of σ² whether or not x and y are related (i.e.
whether β₁ = 0 or not). If β₁ ≠ 0, then E(MSR) > σ², since β₁²·SS_xx > 0. Thus, for
testing whether or not β₁ = 0, a comparison of MSR and MSE is made. If MSR and
MSE are of the same order of magnitude, this suggests that β₁ = 0; on the other hand, if
MSR is much larger than MSE, this suggests that β₁ ≠ 0.
These two mean squares (MSR and MSE) form the basic idea underlying the ANOVA
test of the overall regression model.
4.7 The F-Ratio Test
The ANOVA provides a highly useful test for regression models (and other linear
statistical models).
(a) The hypotheses are H₀: β₁ = 0 versus H₁: β₁ ≠ 0 at the α level of significance.
(b) The test statistic is

$$ F = \frac{MSR}{MSE} \sim F(1,\, n-2) $$

which is close to 1 when β₁ = 0 and larger than 1 when β₁ ≠ 0.
(c) Decision rule: Reject H₀: β₁ = 0 if F > F_α(1, n − 2).
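The computations of Table 4.1 and the decision rule above can be wrapped in one small function. This is a sketch under the formulas of this section, using scipy only to look up the F critical value; the function name is my own.

```python
import numpy as np
from scipy import stats

def anova_f_test(x, y, alpha=0.05):
    """ANOVA F-test of H0: beta1 = 0 for simple linear regression."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    ss_xy = np.sum((x - x.mean()) * (y - y.mean()))
    ss_xx = np.sum((x - x.mean()) ** 2)
    ss_yy = np.sum((y - y.mean()) ** 2)        # SST
    ssr = (ss_xy / ss_xx) * ss_xy              # SSR = b1 * SS_xy
    sse = ss_yy - ssr                          # SSE = SS_yy - b1 * SS_xy
    f = (ssr / 1) / (sse / (n - 2))            # F = MSR / MSE
    f_crit = stats.f.ppf(1 - alpha, 1, n - 2)  # F_alpha(1, n - 2)
    return f, f_crit, f > f_crit               # reject H0 if F > F_alpha(1, n - 2)
```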
4.8 Coefficient of determination
The coefficient of determination, denoted R², measures the proportion of the total
variability in the dependent variable (y) that is explained by the independent variable (x):

$$ R^{2} = \frac{SSR}{SSTO} = \frac{SSTO - SSE}{SSTO} = 1 - \frac{SSE}{SSTO} $$

Note the following:
(a) 0 ≤ R² ≤ 1.
(b) If all the data points fall exactly on a regression line having a non-zero slope,
then R² = 1.
(c) If β̂₁ = 0, then R² = 0.
(d) The square root of R² gives the correlation coefficient, where the sign is determined
by the sign of β̂₁:

$$ r = \begin{cases} +\sqrt{R^{2}} & \text{if } \hat{\beta}_1 > 0 \\[4pt] -\sqrt{R^{2}} & \text{if } \hat{\beta}_1 < 0 \end{cases} $$
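Point (d) is a one-liner in code: recover r from R² and attach the sign of the slope. The helper below is purely illustrative.

```python
import math

def r_from_r2(r2, b1):
    """Correlation coefficient from R^2, signed by the slope estimate b1."""
    return math.copysign(math.sqrt(r2), b1)

print(r_from_r2(0.90, 0.003585))   # +0.9487 ≈ +0.95, since the slope is positive
```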
4.9 Pitfalls and limitations associated with regression and correlation analysis
(a) In regression analysis a value of Y cannot be legitimately estimated if the value of
X is outside the range of values that served as the basis for the regression
equation.
(b) If the estimate of Y involves the prediction of a result that has not yet occurred,
the historical data that served as the basis for the regression equation may not be
relevant for future events.
(c) The use of a prediction or a confidence interval is based on the assumption that
the conditional distributions of Y, and thus of the residuals, are normal and have
equal variances.
(d) A significant correlation coefficient does not necessarily indicate causation, but
rather may indicate a common linkage to other events.
(e) A significant correlation is not necessarily an important correlation. Given a large
sample, a correlation of, say, r = +0.10 can be significantly different from 0 at
α = 0.05. Yet the coefficient of determination of r² = 0.01 for this example
indicates that only 1 percent of the variance in Y is statistically explained by
knowing X.
(f) The interpretation of the coefficients of correlation and determination is based on
the assumption of a bivariate normal distribution for the population and, for each
variable, equal conditional variances.
(g) For both regression and correlation analysis, a linear model is assumed. For a
relationship that is curvilinear, a transformation to achieve linearity may be
available. Another possibility is to restrict the analysis to the range of values
within which the relationship is essentially linear.
4.10 Worked Examples
Example 4.1
Use the estimated model below to describe the relationship between x and y:

$$ \hat{y} = 3.829633 - 0.903643x $$

Since the slope is negative, x and y are inversely related: each unit increase in x is
associated with a decrease of about 0.9036 in the predicted value of y.
Example 4.2
Suppose an analyst takes a random sample of 10 recent truck shipments made by a
company and records the distance in miles and the delivery time, to the nearest half-day,
from the time that the shipment was made available for pick-up. Use the shipment data
to answer the following questions.
(a) Construct a scatter plot and use it to determine whether a linear regression model
will be appropriate.
(b) Determine the least-squares regression equation for the data.
(c) What is the nature of the relationship between distance and delivery time?
Motivate your answer.
(d) Interpret the estimated value of β₁.
(e) Determine the number of days it will take a shipment to arrive if the total distance is
1,000 miles.
(f) Determine the coefficient of determination and interpret it.
(g) Construct an ANOVA table to represent the data.
(h) Perform a hypothesis test to determine the significance of the estimated model at
α = 0.05.
(i) Compute the correlation coefficient and use it to test the hypothesis H₀: ρ = 0
versus H₁: ρ ≠ 0 at the 0.05 level of significance.
Solution
(a) Looking at the scatter plot of the data, the points appear to fall along a straight line,
so a linear regression model may be an appropriate model for the data.
(b) We need the following summary statistics:

$$ \bar{x} = 762, \quad \bar{y} = 2.85, \quad \sum_{i=1}^{10}(x_i-\bar{x})^{2} = 1297860, \quad \sum_{i=1}^{10}(y_i-\bar{y})^{2} = 18.525, \quad \sum_{i=1}^{10}(x_i-\bar{x})(y_i-\bar{y}) = 4653 $$
$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}} = \frac{4653}{1297860} = 0.003585 $$
$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 2.85 - 0.003585 \times 762 = 0.118129 $$

Therefore, the estimated regression line is ŷ = 0.118129 + 0.003585x.
(c) There is a direct (positive) relationship between delivery time and distance, because
β̂₁ is positive.
(d) All other things being equal, a one-mile increase in distance results in an increase of
about 0.0036 days in delivery time.
(e) ŷ = 0.118129 + 0.003585(1000) = 3.70 days.
(f) SSE = SS_yy − β̂₁·SS_xy = 18.525 − (0.003585)(4653) = 1.844.

Note from the variance decomposition that SST = Σ(yᵢ − ȳ)² = 18.525, so

$$ r^{2} = 1 - \frac{SSE}{SST} = 1 - \frac{1.844}{18.525} = 0.90 $$

About 90% of the total variation in delivery time is explained by the distance.
(g) ANOVA table

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square              | F-Ratio
Regression          | 16.681         | 1                  | 16.681                   | F = 16.681/0.2305 = 72.396
Residual (Error)    | 1.844          | 8                  | 1.844/(10 − 2) = 0.2305  |
Total               | 18.525         | 9                  |                          |
(h) H₀: β₁ = 0 versus H₁: β₁ ≠ 0 at the 0.05 level of significance.
F = 72.396 and F_α(1, n − 2) = F₀.₀₅(1, 8) = 5.32.
Since F = 72.396 > 5.32, we reject H₀: β₁ = 0 and conclude that the model is significant.
(i) H₀: ρ = 0 versus H₁: ρ ≠ 0.
The two-tailed critical values at the 5% level are $t_{n-2,\,\alpha/2} = -2.306$ and
$t_{n-2,\,1-\alpha/2} = 2.306$. Since β̂₁ is positive, r = +√0.90 = 0.95.

$$ T = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} = \frac{0.95\sqrt{10-2}}{\sqrt{1-0.95^{2}}} = 8.61 $$

We reject H₀ because T = 8.61 > 2.306 and conclude that there is a significant linear
relationship between distance and delivery time.
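All of the numbers in this example can be reproduced from the summary statistics alone. The sketch below hard-codes those summaries (the raw shipment table is not reproduced in these notes) and recomputes each quantity; note that carrying full precision gives T ≈ 8.51, while the 8.61 above comes from rounding r to 0.95 first.

```python
import math
from scipy import stats

# Summary statistics given in Example 4.2.
n, xbar, ybar = 10, 762.0, 2.85
ss_xx, ss_yy, ss_xy = 1297860.0, 18.525, 4653.0

b1 = ss_xy / ss_xx                                # 0.003585
b0 = ybar - b1 * xbar                             # 0.118129
sse = ss_yy - b1 * ss_xy                          # 1.844
r2 = 1 - sse / ss_yy                              # 0.90
f = (ss_yy - sse) / (sse / (n - 2))               # F ≈ 72.4
r = math.copysign(math.sqrt(r2), b1)              # ≈ +0.95
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # ≈ 8.5 (8.61 with r rounded to 0.95)
t_crit = stats.t.ppf(0.975, n - 2)                # 2.306
print(b0, b1, r2, f, t, t_crit)
```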