22-08-2024
TOD 533
Correlation, Introduction to Regression
Amit Das
TODS / AMSOM / AU
amit.das@ahduni.edu.in
Association between interval variables
• Do two interval variables “move together”?
• When one takes on “high” values (relative to its mean),
what does the other do?
• Pearson correlation coefficient
  r = (Σ Z_X Z_Y) / N,  with −1 ≤ r ≤ +1
• When high (low) z-scores of the two variables co-occur, the
correlation coefficient is larger
Computing the correlation coefficient
Student   | Task 1 Raw Score | Task 1 z-score | Task 2 Raw Score | Task 2 z-score | Product of z-scores
        1 |               42 |          +1.78 |               90 |          +1.21 | +2.15
        2 |                9 |          -1.04 |               40 |          -1.65 | +1.72
        3 |               28 |          +0.58 |               92 |          +1.33 | +0.77
        4 |               11 |          -0.87 |               50 |          -1.08 | +0.94
        5 |                8 |          -1.13 |               49 |          -1.13 | +1.28
        6 |               15 |          -0.53 |               63 |          -0.33 | +0.17
        7 |               14 |          -0.62 |               68 |          -0.05 | +0.03
        8 |               25 |          +0.33 |               75 |          +0.35 | +0.12
        9 |               40 |          +1.61 |               89 |          +1.16 | +1.87
       10 |               20 |          -0.10 |               72 |          +0.18 | -0.02
SUM       |              212 |              0 |              688 |              0 | +9.03
MEAN      |             21.2 |              0 |             68.8 |              0 | +0.903 (= r)
STD. DEV. |            11.69 |              1 |            17.47 |              1 |
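The table's computation can be reproduced in a few lines; a minimal Python sketch (function and variable names are mine, the raw scores come from the table above):

```python
from statistics import mean, pstdev

# Raw scores for the 10 students on the two tasks (from the table above)
task1 = [42, 9, 28, 11, 8, 15, 14, 25, 40, 20]
task2 = [90, 40, 92, 50, 49, 63, 68, 75, 89, 72]

def pearson_r(x, y):
    """Pearson r as the mean of the products of z-scores (population std. dev.)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return mean((a - mx) / sx * (b - my) / sy for a, b in zip(x, y))

print(round(pearson_r(task1, task2), 3))  # 0.903, matching the table
```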
Eyeballing correlation

[Figure: scatterplots for eyeballing the strength of correlation]
Statistical significance of r
• Null hypothesis: r = 0
• Compute test statistic t = r √(n − 2) / √(1 − r²)
• Compare against t-distribution with df = n-2
• For r = 0.903 with n = 10,
• test statistic = 5.94, compare against t8 distribution
• p-value (2-tailed) = 0.0003 << 0.05
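The slide's numbers check out; a minimal sketch of the test statistic (the function name is mine):

```python
import math

def t_stat(r, n):
    """Test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) for H0: correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

t = t_stat(0.903, 10)
print(round(t, 2))  # 5.94, compared against the t-distribution with 8 df
```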
Correlation and sample size
• Significance of r depends on sample size
• for larger n, smaller value of r might be significant
Sample size | r required at 10% (two-tailed) | r required at 5% (two-tailed)
         12 |                          0.497 |                         0.576
         22 |                          0.360 |                         0.423
         32 |                          0.296 |                         0.349
         42 |                          0.257 |                         0.304
         52 |                          0.231 |                         0.273
        102 |                          0.164 |                         0.195
• for very large n, a very small r might be significant
• statistical vs. managerial significance
Association between ordinal variables
• The Spearman rank correlation coefficient
  r_s = 1 − 6 Σd² / (n(n² − 1))
• where d is the difference in the ranks of a given individual for the two
variables
• suitable for ordinal data
• less affected than Pearson r by outliers
Rank Correlation example
Student | Task 1 Raw Score | Rank 1 | Task 2 Raw Score | Rank 2 | (Difference in ranks)²
      1 |               42 |      1 |               90 |      2 | 1
      2 |                9 |      9 |               40 |     10 | 1
      3 |               28 |      3 |               92 |      1 | 4
      4 |               11 |      8 |               50 |      8 | 0
      5 |                8 |     10 |               49 |      9 | 1
      6 |               15 |      6 |               63 |      7 | 1
      7 |               14 |      7 |               68 |      6 | 1
      8 |               25 |      4 |               75 |      4 | 0
      9 |               40 |      2 |               89 |      3 | 1
     10 |               20 |      5 |               72 |      5 | 0
• Spearman rank correlation coefficient
  r_s = 1 − (6 × 10) / (10 × (10² − 1)) = 1 − 60/990 ≈ 0.94 (94%)
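The same computation in a short Python sketch (the function name is mine; the ranks come from the table above):

```python
def spearman_rs(rank1, rank2):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference."""
    n = len(rank1)
    d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Ranks of the 10 students on the two tasks (from the table above)
rank1 = [1, 9, 3, 8, 10, 6, 7, 4, 2, 5]
rank2 = [2, 10, 1, 8, 9, 7, 6, 4, 3, 5]
print(round(spearman_rs(rank1, rank2), 3))  # 0.939, i.e. about 94%
```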
Correlation and regression …1
• Earlier, we examined whether two interval-scaled variables are
associated (“move together”) using the correlation coefficient
−1 ≤ r ≤ +1
• linear regression frames the same question in a slightly different form
• by modeling the dependent variable Y as a linear function of the independent
variable X
Y = a + bX
The linear regression model

[Figure: scatterplot of price in dollars (Y-axis) against area in square feet
(X-axis), with a fitted line that rises p units of price for every q units of
area, so its slope is b = p/q; the line meets the Y-axis at the intercept a.
Caption: Relation of apartment prices to floor area (hypothetical)]
The best-fit regression line
• More than one line can be passed through the cloud (“scatterplot”) of
Y on X
• each line denotes a combination of a and b
• For each line
• for each data point compute error = Yobs – Ypred
• square the errors and add them up: Σe²
• The best-fit (least-squares) regression line
Y = A + BX (note A, B in caps) minimizes Σe²
Solution to minimization problem
• For the mathematically inclined, here’s how A and B (optimum values
of a and b) may be calculated:
  B = (N ΣXY − ΣX ΣY) / (N ΣX² − (ΣX)²)
  A = Ȳ − B X̄
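These closed-form expressions translate directly into code; a minimal sketch on made-up apartment data (the function name and numbers are mine, not from the slides):

```python
from statistics import mean

def least_squares(x, y):
    """A and B from the closed-form least-squares solution above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    B = (n * sum(a * b for a, b in zip(x, y)) - sx * sy) / \
        (n * sum(a * a for a in x) - sx * sx)
    A = mean(y) - B * mean(x)
    return A, B

# Hypothetical data for illustration: area in sq. ft., price in dollars
area  = [500, 750, 1000, 1250, 1500]
price = [260_000, 390_000, 500_000, 640_000, 750_000]
A, B = least_squares(area, price)
print(round(A), round(B))  # 16000 492 -> price ≈ 16000 + 492 * area
```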
Interpreting the slope

[Figure: three scatterplots showing point clouds with fitted lines of negative,
positive, and zero slope]
• B > 0: larger values of X are associated with larger values of Y
• B < 0: larger values of X are associated with smaller values of Y
• B = 0: the value of Y does not depend on X; the best estimate of Y is simply its mean
Scale Invariance (or not)
• Let us say that, for area measured in square feet, the slope B of the
best-fit regression line is 500
• If we measure area in square meters, the value of B would work out
to be 5382
• Is that a problem?
• $500 per square foot vs $5382 per square meter?
• we can standardize all X and Y values before we start … then regression
coefficient B is scale-free
Correlation and regression …2
• The correlation coefficient r and the regression slope B
are related as follows:
  r = B S_X / S_Y
• where SX and SY are the standard deviations of X and Y
respectively
• r also has the benefit of being scale-invariant
• it does not matter whether area is measured in square feet or
square meters, or whether price is measured in INR or USD
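The identity r = B S_X / S_Y can be checked numerically; a short sketch on made-up data (not from the slides):

```python
from statistics import mean, pstdev

# Made-up data for a numerical check of r = B * S_X / S_Y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)

# Least-squares slope of Y on X
B = (n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)) / \
    (n * sum(a * a for a in x) - sum(x) ** 2)

# Pearson r as the mean product of z-scores
mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
r = mean((a - mx) / sx * (b - my) / sy for a, b in zip(x, y))

print(abs(r - B * sx / sy) < 1e-9)  # True: the identity holds
```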
Standardized regression coefficients
• Recall that regression coefficients are not scale-invariant
• i.e. they depend on the units of measurement
• To get scale-invariant coefficients
• standardize Y as well as X1, X2, …, Xn, estimate
  z_Y = C + D1 z_X1 + D2 z_X2 + … + Dn z_Xn
• the z-score of Y is modeled as a function of the z-scores of Xi … the
coefficients Di are scale-invariant
• Also used when the relative magnitudes of Xi differ
widely (in their “natural” units)
Generalizing to multiple regression
• How does Y vary with the levels of multiple
“explanatory” variables?
Y = A + B1X1 + B2X2 + … + BnXn
• Bi is the slope of Y on dimension Xi
• B1, B2, …, Bn called “partial” regression coefficients
• the magnitudes (and even signs) of B1, B2, …, Bn depend on which
other variables are included in the multiple regression model
• might not agree in magnitude (or even sign) with the bivariate
correlation coefficient r between Xi and Y
Predictive power
• R = bivariate correlation between Yobserved and Ypredicted
(how well do they agree?)
• Consider the proportionate reduction in prediction
error (PRE) of the model relative to the baseline of
predicting Y using just its mean Ȳ:
  PRE = [Σ(Yobs − Ȳ)² − Σ(Yobs − Ypred)²] / Σ(Yobs − Ȳ)²
• turns out that PRE = R2
• R2 or R-square measures the predictive power of the
multiple regression model
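The PRE definition is easy to compute directly; a minimal sketch (the function name and the toy numbers are mine):

```python
from statistics import mean

def pre(y_obs, y_pred):
    """Proportionate reduction in error vs. predicting the mean of Y."""
    ybar = mean(y_obs)
    sse_mean = sum((y - ybar) ** 2 for y in y_obs)                # baseline error
    sse_model = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))  # model error
    return (sse_mean - sse_model) / sse_mean

# Made-up illustration: perfect predictions remove all baseline error
print(pre([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
```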
Hypothesis-testing in regression
• Consider Y = A + B1X1 + B2X2 + …+ BnXn
• For the null hypothesis H0 that ALL the coefficients Bi
are zero: B1 = B2 = … = Bn = 0
• and the alternate hypothesis Ha that at least one Bi is
NOT zero: Bi ≠ 0
• the test statistic is
  F = (R² / k) / [(1 − R²) / (n − k − 1)]
• k = number of explanatory variables Xi
• n = number of observations (sample size)
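The F statistic is a one-liner; a sketch with made-up inputs (the function name and numbers are mine):

```python
def f_stat(r2, k, n):
    """Overall F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Made-up example: R^2 = 0.6 with k = 3 predictors and n = 30 observations
print(round(f_stat(0.6, 3, 30), 2))  # 13.0, compared against F with df1=3, df2=26
```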
Overall F-test of model
• The test statistic is compared against the
F-distribution with df1 = k and df2 = n-(k+1)
• If the test statistic is large, the area to the right of this value will be
small
• small p-value enables rejection of the null hypothesis (H0: all Bi are zero)
• note that this is more likely if R2 is large
• A model that fails this test is no better than no model (in terms of
prediction error)
Significance of coefficients
• Whether each coefficient Bi differs significantly from
zero is tested using the test statistic t = Bi / s.e.(Bi)
(value of coefficient / its standard error)
• compared against t-distribution with n-(k+1) df
• Each coefficient can be tested in this manner
• H0: coefficient is zero vs. Ha: coefficient is not zero
• When a coefficient Bi fails this test, it is not significantly
different from zero, and the term involving Xi can be
dropped from the model
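The slides do not spell out how the standard error is computed; for the one-predictor case the standard formula is s.e.(B) = √(SSE/(n−2)) / √(Σ(x−x̄)²), which the following sketch uses on made-up data:

```python
import math
from statistics import mean

# Made-up data for a one-predictor regression
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 5.9]
n = len(x)

xbar, ybar = mean(x), mean(y)
sxx = sum((a - xbar) ** 2 for a in x)
B = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
A = ybar - B * xbar

# Standard error of the slope: sqrt(SSE / (n - 2)) / sqrt(sum of squared x-deviations)
sse = sum((b - (A + B * a)) ** 2 for a, b in zip(x, y))
se_B = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

t = B / se_B  # compare against t with n - (k+1) = 4 df
print(t > 2.776)  # True: exceeds the 5% two-tailed critical value for 4 df
```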
Desirable properties of regression model
• High R2
• indicates that a large proportion of the variation in Y is explained by the
independent variables
• Significant F-test
• the null hypothesis that all Bi are zero can be conclusively rejected
• Significant coefficients (t-test)
• change in each explanatory variable significantly affects the level of the
dependent variable
Another example: Boston housing prices
Variables
1. CRIM - per capita crime rate by town
2. ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS - proportion of non-retail business acres per town.
4. CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5. NOX - nitric oxides concentration (parts per 10 million)
6. RM - average number of rooms per dwelling
7. AGE - proportion of owner-occupied units built prior to 1940
8. DIS - weighted distances to five Boston employment centres
9. RAD - index of accessibility to radial highways
10. TAX - full-value property-tax rate per $10,000
11. PTRATIO - pupil-teacher ratio by town
12. B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT - % lower status of the population
14. MEDV - Median value of owner-occupied homes in $1000's
Excerpt of Boston housing data
crim zn indus chas nox ptratio b lstat medv
0.00632 18 2.31 0 0.538 15.3 396.9 4.98 24
0.02731 0 7.07 0 0.469 17.8 396.9 9.14 21.6
0.02729 0 7.07 0 0.469 17.8 392.83 4.03 34.7
0.03237 0 2.18 0 0.458 18.7 394.63 2.94 33.4
0.06905 0 2.18 0 0.458 18.7 396.9 5.33 36.2
0.02985 0 2.18 0 0.458 18.7 394.12 5.21 28.7
0.08829 12.5 7.87 0 0.524 15.2 395.6 12.43 22.9
0.14455 12.5 7.87 0 0.524 15.2 396.9 19.15 27.1
0.21124 12.5 7.87 0 0.524 15.2 386.63 29.93 16.5
Boston housing regression model
Boston housing: Regression model predictions
crim zn indus chas nox ptratio b lstat medv Predicted values Residuals
0.00632 18 2.31 0 0.538 15.3 396.9 4.98 24 30.0 -6.00
0.02731 0 7.07 0 0.469 17.8 396.9 9.14 21.6 25.0 -3.43
0.02729 0 7.07 0 0.469 17.8 392.83 4.03 34.7 30.6 4.13
0.03237 0 2.18 0 0.458 18.7 394.63 2.94 33.4 28.6 4.79
0.06905 0 2.18 0 0.458 18.7 396.9 5.33 36.2 27.9 8.26
0.02985 0 2.18 0 0.458 18.7 394.12 5.21 28.7 25.3 3.44
0.08829 12.5 7.87 0 0.524 15.2 395.6 12.43 22.9 23.0 -0.10
0.14455 12.5 7.87 0 0.524 15.2 396.9 19.15 27.1 19.5 7.56
0.21124 12.5 7.87 0 0.524 15.2 386.63 29.93 16.5 11.5 4.98
• Negative residuals (actual – predicted) -> underpriced? -> good value?
• Positive residuals -> overpriced?
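The residual column above is just actual − predicted; recomputing it for the first few rows (note the predicted values shown are rounded to one decimal, so the recomputed residuals differ slightly from the table's in the second decimal):

```python
# Recompute residuals (actual - predicted) for the first five rows of the table
medv      = [24.0, 21.6, 34.7, 33.4, 36.2]
predicted = [30.0, 25.0, 30.6, 28.6, 27.9]
residuals = [round(m - p, 2) for m, p in zip(medv, predicted)]
print(residuals)  # [-6.0, -3.4, 4.1, 4.8, 8.3]
```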
Getting carried away … the story of Zillow