
Analysis of Factors Affecting Housing Prices in the Chicago Metropolitan Area

ECON 312 (Introduction to Econometrics)

Xinyi Hu

2024.08.08
0.1 Introduction
The objective of this project is to understand and analyze various factors that influence
housing values in the Chicago metropolitan area. These factors include structural
characteristics of the house, geographic location, accessibility, and local government
expenditure policies. By constructing an econometric model, we study how these factors
individually or collectively affect the market price of houses.
This topic is worth researching because housing is the largest asset for most households,
and understanding the determinants of housing prices is crucial for families, investors,
and policymakers. Especially in metropolitan areas, fluctuations in housing prices
can have broad economic impacts.

0.2 Data Description


Our dataset includes the following variables:

• Dependent Variable: SPRICE (Housing Price)

• Independent Variables:

– NROOMS (Number of Rooms)


– LVAREA (Living Area)
– HAGEEFF (Effective Age of House)
– LSIZE (Land Size)
– AIRCON (Air Conditioning)
– NBATH (Number of Bathrooms)
– GARAGE (Type of Garage)
– PTAXES (Property Taxes)
– PCTWHT (Percentage of White Population)
– MEDINC (Median Income)
– DFCL (Distance to City Center)
– DFNI (Distance to Nearest Highway Entrance)
– SSPEND (School Expenditure)
– MSPEND (Municipal Expenditure)
– COOK (Located in Cook County)

– OHARE (Located Near O’Hare Airport)

Note: Some variables are transformed into logarithmic form for regression analysis.

0.3 Basic Assumptions


When constructing and estimating the econometric model, we need to make some
basic assumptions to ensure the validity and reliability of the results.

0.3.1 Linear Relationship Assumption


It is assumed that the model is linear in its parameters: the log of the dependent variable
(housing price) is a linear function of the explanatory variables, several of which are
also in logs:

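\[
\begin{aligned}
\ln(\text{SPRICE}) = {} & \beta_0 + \beta_1 \ln(\text{NROOMS}) + \beta_2 \ln(\text{LVAREA}) + \beta_3 \ln(\text{HAGEEFF}) + \beta_4 \ln(\text{LSIZE}) \\
& + \beta_5 \ln(\text{PTAXES}) + \beta_6 \ln(\text{MEDINC}) + \beta_7 \ln(\text{DFCL}) + \beta_8 \ln(\text{SSPEND}) \\
& + \beta_9 \ln(\text{MSPEND}) + \beta_{10}\,\text{AIRCON} + \beta_{11}\,\text{NBATH} + \beta_{12}\,\text{GARAGE} + \beta_{13}\,\text{PCTWHT} \\
& + \beta_{14}\,\text{DFNI} + \beta_{15}\,\text{COOK} + \beta_{16}\,\text{OHARE} + \epsilon
\end{aligned}
\tag{1}
\]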

0.3.2 Independent and Identically Distributed Error Term Assumption

It is assumed that the error term ϵ is independently and identically distributed, with
a zero mean and constant variance:

\[ \epsilon \sim N(0, \sigma^2) \tag{2} \]

This means that the error term has no autocorrelation or heteroscedasticity.

0.3.3 Multicollinearity Assumption


It is assumed that there is no perfect multicollinearity among the explanatory variables,
meaning there are no perfect linear relationships among the explanatory variables.
This ensures that we can uniquely estimate the coefficients for each explanatory variable.

0.3.4 Exogeneity Assumption
It is assumed that the explanatory variables are exogenous, meaning there is no
correlation between the explanatory variables and the error term:

\[ E(\epsilon \mid X) = 0 \tag{3} \]

where X is the matrix of all explanatory variables.

0.3.5 Normality Assumption


It is assumed that the error term follows a normal distribution, which justifies exact
statistical inference (such as t-tests and F-tests) in finite samples:

\[ \epsilon \sim N(0, \sigma^2) \tag{4} \]

0.4 Data Transformation


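The variables SPRICE, NROOMS, LVAREA, HAGEEFF, LSIZE, PTAXES, MEDINC, DFCL, SSPEND, and
MSPEND are transformed into natural logarithms for the regression analysis. The remaining
variables, some of which take zero values, are left in levels.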

0.5 Initial Regression Analysis


We first estimate the model by ordinary least squares using all explanatory variables.

0.5.1 Regression Equation:

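\[
\begin{aligned}
\ln(\text{SPRICE}) = {} & \underset{(0.5565)}{4.8104} + \underset{(0.0421)}{0.1740} \ln(\text{NROOMS}) + \underset{(0.0384)}{0.2434} \ln(\text{LVAREA}) - \underset{(0.0098)}{0.0844} \ln(\text{HAGEEFF}) \\
& + \underset{(0.0114)}{0.1066} \ln(\text{LSIZE}) - \underset{(0.0591)}{0.4652} \ln(\text{PTAXES}) + \underset{(0.0286)}{0.4240} \ln(\text{MEDINC}) - \underset{(0.0161)}{0.2345} \ln(\text{DFCL}) \\
& + \underset{(0.0519)}{0.1158} \ln(\text{SSPEND}) - \underset{(0.0211)}{0.0538} \ln(\text{MSPEND}) + \underset{(0.0068)}{0.0421}\,\text{AIRCON} + \underset{(0.0138)}{0.0356}\,\text{NBATH} \\
& + \underset{(0.0043)}{0.0137}\,\text{GARAGE} + \underset{(0.0003)}{0.0029}\,\text{PCTWHT} - \underset{(0.0040)}{0.0023}\,\text{DFNI} + \underset{(0.0282)}{0.0492}\,\text{COOK} \\
& + \underset{(0.0317)}{0.1025}\,\text{OHARE}
\end{aligned}
\tag{5}
\]

Note: The numbers in parentheses below each estimate are the standard errors of the
estimated coefficients.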
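
Each coefficient's t-statistic is its estimate divided by its standard error. The following is a minimal Python sketch of that calculation for four of the coefficients, using the unrounded values from the Stata output in the appendix. The critical value of 1.96 is an assumption (the large-sample normal approximation for a two-sided 5% test), and the function and variable names are illustrative only.

```python
# Coefficients and standard errors copied from the full-model Stata output.
estimates = {
    "ln_NROOMS": (0.1740158, 0.0421356),
    "ln_SSPEND": (0.1158414, 0.0518520),
    "DFNI": (-0.0023108, 0.0039563),
    "COOK": (0.0491633, 0.0282415),
}

# Assumption: large-sample two-sided 5% critical value from the normal distribution.
CRITICAL_VALUE = 1.96


def t_statistic(coefficient, standard_error):
    """Return the t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_VALUE for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:6.2f}  significant at 5%: {significant[name]}")
```

Under this approximation, DFNI and COOK fall short of the 5% threshold, while ln_NROOMS and ln_SSPEND exceed it.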
The F-statistic and R² of the estimated model are as follows:

• F-statistic: 141.09

• R²: 0.5324

0.5.2 Interpreting R²
The R² value of 0.5324 indicates that the explanatory variables explain 53.24% of the
variation in the dependent variable, the log of the housing price. The model therefore has
moderate explanatory power: it accounts for about half of the variation in housing prices.
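
The R² can be reproduced from the sums of squares in the Stata ANOVA table shown in the appendix. The sketch below, in Python, assumes the usual definitions (R² as the model sum of squares over the total, and the adjusted R² with a degrees-of-freedom correction); the function names are illustrative.

```python
# Sums of squares and sample size from the full-model ANOVA table in the appendix.
SS_MODEL = 129.192109
SS_RESIDUAL = 113.487647
N_OBS = 2000
N_REGRESSORS = 16  # slope coefficients, excluding the constant


def r_squared(ss_model, ss_residual):
    """Share of the total sum of squares explained by the model."""
    return ss_model / (ss_model + ss_residual)


def adjusted_r_squared(r2, n_obs, n_regressors):
    """R-squared penalized for the number of regressors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


r2 = r_squared(SS_MODEL, SS_RESIDUAL)
adj_r2 = adjusted_r_squared(r2, N_OBS, N_REGRESSORS)
print(f"R-squared = {r2:.4f}, adjusted R-squared = {adj_r2:.4f}")
```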

0.5.3 Overall Significance Test


Conduct an F-test on the regression equation, with the results as follows:

• F( 16, 1983) = 141.09

• Prob > F = 0.0000

Because the p-value is below 0.05, we reject the null hypothesis that all slope coefficients
are jointly zero. The model as a whole is significant at the 5% level.
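
The overall F-statistic is the ratio of the model mean square to the residual mean square, and it can equivalently be written in terms of R². The following Python sketch recomputes it both ways from the ANOVA figures in the appendix; it is a check on the arithmetic rather than a replacement for the Stata output, and the function names are illustrative.

```python
# ANOVA figures from the full-model Stata output in the appendix.
SS_MODEL, SS_RESIDUAL = 129.192109, 113.487647
DF_MODEL, DF_RESIDUAL = 16, 1983


def f_from_mean_squares(ss_model, df_model, ss_residual, df_residual):
    """Overall F-statistic as model mean square over residual mean square."""
    return (ss_model / df_model) / (ss_residual / df_residual)


def f_from_r_squared(r2, df_model, df_residual):
    """The same statistic expressed through R-squared."""
    return (r2 / df_model) / ((1.0 - r2) / df_residual)


f_ms = f_from_mean_squares(SS_MODEL, DF_MODEL, SS_RESIDUAL, DF_RESIDUAL)
r2 = SS_MODEL / (SS_MODEL + SS_RESIDUAL)
f_r2 = f_from_r_squared(r2, DF_MODEL, DF_RESIDUAL)
print(f"F({DF_MODEL}, {DF_RESIDUAL}) = {f_ms:.2f}")
```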

0.5.4 Significance Test for Variable ln_HAGEEFF


Conduct a significance test for the variable ln_HAGEEFF, with the results as follows:

• F( 1, 1983) = 74.14

• Prob > F = 0.0000

The variable ln_HAGEEFF is significant at the 5% level.
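
For a test of a single coefficient, the F-statistic equals the square of that coefficient's t-statistic. The short Python sketch below illustrates this with the unrounded ln_HAGEEFF estimate and standard error from the appendix; the variable names are illustrative.

```python
# Estimate and standard error of ln_HAGEEFF from the full-model Stata output.
COEFFICIENT = -0.0843861
STANDARD_ERROR = 0.0098007

# t-statistic for the null hypothesis that the coefficient is zero.
t_stat = COEFFICIENT / STANDARD_ERROR

# With one restriction, the F-statistic is the squared t-statistic.
f_stat = t_stat ** 2

print(f"t = {t_stat:.2f}, F(1, 1983) = {f_stat:.2f}")
```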

0.6 Variable Selection
We conduct a backward stepwise regression with a removal threshold of p ≥ 0.05. The
procedure first removes DFNI (p = 0.5592) and then COOK (p = 0.0505), both of which are
insignificant at the 5% level. The final regression equation is as follows:

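\[
\begin{aligned}
\ln(\text{SPRICE}) = {} & \underset{(0.5111)}{4.3797} + \underset{(0.0421)}{0.1714} \ln(\text{NROOMS}) + \underset{(0.0384)}{0.2433} \ln(\text{LVAREA}) - \underset{(0.0098)}{0.0842} \ln(\text{HAGEEFF}) \\
& + \underset{(0.0114)}{0.1065} \ln(\text{LSIZE}) - \underset{(0.0410)}{0.3840} \ln(\text{PTAXES}) + \underset{(0.0283)}{0.4173} \ln(\text{MEDINC}) - \underset{(0.0158)}{0.2406} \ln(\text{DFCL}) \\
& + \underset{(0.0439)}{0.1704} \ln(\text{SSPEND}) - \underset{(0.0198)}{0.0685} \ln(\text{MSPEND}) + \underset{(0.0068)}{0.0424}\,\text{AIRCON} + \underset{(0.0138)}{0.0359}\,\text{NBATH} \\
& + \underset{(0.0043)}{0.0135}\,\text{GARAGE} + \underset{(0.0003)}{0.0029}\,\text{PCTWHT} + \underset{(0.0314)}{0.1084}\,\text{OHARE}
\end{aligned}
\tag{6}
\]

Note: The numbers in parentheses below each estimate are the standard errors of the
estimated coefficients.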
The F-statistic and R² of the estimated model are as follows:

• F-statistic: 160.77

• R²: 0.5314

0.7 Subset Test


According to Stata output, the results of the subset test are as follows:

• F(14, 1985) = 1.07

• Prob > F = 0.3741

The p-value of 0.3741 exceeds 0.05, so we cannot reject the null hypothesis that the removed
variables have no joint effect on housing prices. Removing the insignificant variables
therefore does not significantly reduce the fit of the model, and our final model still
explains the variation in the dependent variable well.
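
As a cross-check, the F-statistic for dropping DFNI and COOK can also be computed by hand from the residual sums of squares of the full and final models reported in the appendix. The Python sketch below assumes the standard nested-model formula with two restrictions; the closed-form p-value it uses is valid only when the numerator has exactly two degrees of freedom, and the function names are illustrative.

```python
# Residual sums of squares from the two Stata regressions in the appendix.
SSR_RESTRICTED = 113.726345    # final model (DFNI and COOK dropped)
SSR_UNRESTRICTED = 113.487647  # full model
N_RESTRICTIONS = 2             # assumption: the two dropped variables
DF_RESIDUAL_FULL = 1983


def nested_f(ssr_restricted, ssr_unrestricted, n_restrictions, df_residual):
    """F-statistic for the null that the dropped coefficients are jointly zero."""
    numerator = (ssr_restricted - ssr_unrestricted) / n_restrictions
    denominator = ssr_unrestricted / df_residual
    return numerator / denominator


def f_pvalue_two_restrictions(f_stat, df_denominator):
    """Upper-tail p-value of F(2, df); this closed form holds only for 2 numerator df."""
    return (1.0 + 2.0 * f_stat / df_denominator) ** (-df_denominator / 2.0)


f_stat = nested_f(SSR_RESTRICTED, SSR_UNRESTRICTED, N_RESTRICTIONS, DF_RESIDUAL_FULL)
p_value = f_pvalue_two_restrictions(f_stat, DF_RESIDUAL_FULL)
print(f"F({N_RESTRICTIONS}, {DF_RESIDUAL_FULL}) = {f_stat:.2f}, p = {p_value:.4f}")
```

This hand computation gives a statistic of roughly 2.09 on (2, 1983) degrees of freedom with a p-value of about 0.12. The figures differ from the Stata output quoted above, but the conclusion at the 5% level is the same: the two variables are not jointly significant.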

0.8 Conclusion

0.8.1 Main Conclusions


1. Number of Rooms (ln_NROOMS): The number of rooms has a significant positive impact on
housing prices. Because both variables are in logs, the coefficient is an elasticity: each
1% increase in the number of rooms increases the housing price by approximately 0.17%.
2. Living Area (ln_LVAREA): Living area has a significant positive impact on housing
prices. Each 1% increase in living area increases the housing price by approximately
0.24%.
3. Effective Age of House (ln_HAGEEFF): House age has a significant negative impact on
housing prices. Each 1% increase in effective age decreases the housing price by
approximately 0.08%.
4. Land Size (ln_LSIZE): Land size has a significant positive impact on housing prices.
Each 1% increase in land size increases the housing price by approximately 0.11%.
5. Property Taxes (ln_PTAXES): Property taxes have a significant negative impact on
housing prices. Each 1% increase in property taxes decreases the housing price by
approximately 0.38%.
6. Median Income (ln_MEDINC): Median income in the neighborhood has a significant
positive impact on housing prices. Each 1% increase in median income increases the
housing price by approximately 0.42%.
7. Distance to City Center (ln_DFCL): The farther the house is from the city center,
the lower the housing price. Each 1% increase in distance decreases the housing price by
approximately 0.24%.
8. School Expenditure (ln_SSPEND): School expenditure has a significant positive impact
on housing prices. Each 1% increase in school expenditure increases the housing price by
approximately 0.17%.
9. Municipal Expenditure (ln_MSPEND): Municipal expenditure has a significant negative
impact on housing prices. Each 1% increase in municipal expenditure decreases the
housing price by approximately 0.07%.
10. Air Conditioning (AIRCON): Houses with air conditioning have higher prices, by
approximately 4.24%, holding the other variables constant.
11. Number of Bathrooms (NBATH): The number of bathrooms has a significant positive
impact on housing prices. Each additional bathroom increases the housing price by
approximately 3.59%.
12. Garage (GARAGE): The garage variable has a significant positive impact on housing
prices. A one-unit increase in GARAGE increases the housing price by approximately 1.35%.
13. Percentage of White Population (PCTWHT): The percentage of the white population has
a significant positive impact on housing prices. Each increase of one percentage point
increases the housing price by approximately 0.29%.
14. Located Near O’Hare Airport (OHARE): Houses located near O’Hare Airport have higher
prices, by approximately 10.84%, holding the other variables constant.
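
Because the continuous variables enter the model in logs, their coefficients are elasticities, while the coefficients on the unlogged variables are semi-elasticities. The following is a minimal Python sketch, using coefficients from the final model, of how each type translates into an exact percentage change in price. The helper names are illustrative and are not part of the original Stata analysis.

```python
import math

# Coefficients from the final model in the appendix.
B_LN_LVAREA = 0.2432909  # logged regressor: an elasticity
B_AIRCON = 0.0424246     # unlogged regressor: a semi-elasticity
B_OHARE = 0.108432       # unlogged regressor: a semi-elasticity


def elasticity_effect(elasticity, pct_change_in_x):
    """Exact % change in price for a given % change in a logged regressor."""
    return ((1.0 + pct_change_in_x / 100.0) ** elasticity - 1.0) * 100.0


def level_effect(coefficient, change_in_x=1.0):
    """Exact % change in price for a given change in an unlogged regressor."""
    return (math.exp(coefficient * change_in_x) - 1.0) * 100.0


living_area_1pct = elasticity_effect(B_LN_LVAREA, 1.0)
living_area_10pct = elasticity_effect(B_LN_LVAREA, 10.0)
aircon_effect = level_effect(B_AIRCON)
ohare_effect = level_effect(B_OHARE)

print(f"1% more living area:  {living_area_1pct:.2f}% higher price")
print(f"10% more living area: {living_area_10pct:.2f}% higher price")
print(f"Air conditioning:     {aircon_effect:.2f}% higher price")
print(f"Near O'Hare:          {ohare_effect:.2f}% higher price")
```

For small coefficients the exact effect is close to 100 times the coefficient; for larger ones, such as OHARE, the exact figure is somewhat above that approximation.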

0.8.2 Advantages and Disadvantages of the Model and Results
Advantages

• Moderate Explanatory Power: The model’s R² value is 0.5314, indicating that the
explanatory variables explain 53.14% of the variation in the log of housing prices.

• High Significance: All explanatory variables retained in the final model are
statistically significant at the 5% level, indicating that their impact on housing
prices is statistically meaningful.

• Simplicity: The model is simplified by removing insignificant variables through
stepwise regression analysis, with almost no loss of explanatory power (R² falls only
from 0.5324 to 0.5314).

Disadvantages

• Potential Omitted Variables: There may be factors that affect housing prices
not included in the model. If they are correlated with the included variables, the
estimated coefficients may be biased.

• Variable Correlation: There may be correlations between some explanatory variables
(multicollinearity), which would inflate the standard errors and requires further
analysis and validation.

0.8.3 Non-Technical Findings


• More Rooms and Larger Living Area Increase House Value: We find that houses with
more rooms and a larger living area sell for significantly higher prices. This
suggests that adding rooms or expanding the living area may raise the value of a house.

• Negative Impact of House Age on Housing Prices: Newer houses are more
valuable than older ones. The greater the effective age of a house, the lower its price,
indicating that buying a newer house might be a better investment choice.

• Higher Property Taxes Lower Housing Prices: Higher property taxes are associated
with significantly lower housing prices. This means that houses with high property
taxes are cheaper, which could affect home buying decisions.

• Well-Funded Schools and High-Income Neighborhoods Increase Housing Prices:
We find that school expenditure and median income in the neighborhood significantly
increase housing prices. Choosing to buy a house in these areas not only
provides access to quality education resources but also offers the added value of the
community.

• Higher Prices Near City Center: Houses closer to the city center have higher
prices, indicating that the convenience of the city center makes these houses more
desirable.

• Higher Prices Near O’Hare Airport: Houses near O’Hare Airport have higher
prices, possibly due to the convenience of transportation and concentration of
economic activities.

These findings provide valuable information for home buyers, investors, and
policymakers, helping them make more informed decisions.

0.9 Appendix
Loading the data:

import excel "MP11.xls", sheet("Sheet1") firstrow clear

describe

Since some variables have zero values, only suitable variables are transformed into
logarithmic form (Stata's log() function is the natural logarithm). The Stata commands
are as follows:

gen ln_SPRICE = log(SPRICE)


gen ln_NROOMS = log(NROOMS)
gen ln_LVAREA = log(LVAREA)
gen ln_HAGEEFF = log(HAGEEFF)
gen ln_LSIZE = log(LSIZE)
gen ln_PTAXES = log(PTAXES)
gen ln_MEDINC = log(MEDINC)
gen ln_DFCL = log(DFCL)
gen ln_SSPEND = log(SSPEND)
gen ln_MSPEND = log(MSPEND)


Run the regression model and view the results using the following Stata commands:

reg ln_SPRICE ln_NROOMS ln_LVAREA ln_HAGEEFF ln_LSIZE ln_PTAXES ///
    ln_MEDINC ln_DFCL ln_SSPEND ln_MSPEND AIRCON NBATH GARAGE PCTWHT DFNI COOK OHARE
estat vce

The regression results are as follows:

Source | SS df MS Number of obs = 2,000
-------------+---------------------------------- F(16, 1983) = 141.09
Model | 129.192109 16 8.0745068 Prob > F = 0.0000
Residual | 113.487647 1,983 .057230281 R-squared = 0.5324
-------------+---------------------------------- Adj R-squared = 0.5286
Total | 242.679755 1,999 .121400578 Root MSE = .23923

------------------------------------------------------------------------------
ln_SPRICE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ln_NROOMS | .1740158 .0421356 4.13 0.000 .0913811 .2566506
ln_LVAREA | .2433724 .0383658 6.34 0.000 .1681309 .3186139
ln_HAGEEFF | -.0843861 .0098007 -8.61 0.000 -.1036069 -.0651653
ln_LSIZE | .1066477 .0114179 9.34 0.000 .0842553 .1290401
ln_PTAXES | -.465165 .0591439 -7.86 0.000 -.5811558 -.3491742
ln_MEDINC | .4239539 .0285668 14.84 0.000 .3679299 .479978
ln_DFCL | -.2344655 .0161426 -14.52 0.000 -.2661239 -.2028073
ln_SSPEND | .1158414 .051852 2.23 0.026 .0141512 .2175316
ln_MSPEND | -.0537627 .021136 -2.54 0.011 -.0952138 -.0123115
AIRCON | .0420512 .0067691 6.21 0.000 .0287759 .0553265
NBATH | .0356002 .013793 2.58 0.010 .0085498 .0626506
GARAGE | .0137483 .0043079 3.19 0.001 .0052998 .0221969
PCTWHT | .0029339 .0003053 9.61 0.000 .0023352 .0035326
DFNI | -.0023108 .0039563 -0.58 0.559 -.0100698 .0054482
COOK | .0491633 .0282415 1.74 0.082 -.0062228 .1045494
OHARE | .1025445 .0316506 3.24 0.001 .0404726 .1646164
_cons | 4.810413 .5564745 8.64 0.000 3.719077 5.901749
------------------------------------------------------------------------------

The Stata command and results for the overall significance test are as follows:

test ln_NROOMS ln_LVAREA ln_HAGEEFF ln_LSIZE ln_PTAXES ln_MEDINC ln_DFCL ///
    ln_SSPEND ln_MSPEND AIRCON NBATH GARAGE PCTWHT DFNI COOK OHARE
( 1) ln_NROOMS = 0
( 2) ln_LVAREA = 0
( 3) ln_HAGEEFF = 0
( 4) ln_LSIZE = 0
( 5) ln_PTAXES = 0
( 6) ln_MEDINC = 0
( 7) ln_DFCL = 0

( 8) ln_SSPEND = 0
( 9) ln_MSPEND = 0
(10) AIRCON = 0
(11) NBATH = 0
(12) GARAGE = 0
(13) PCTWHT = 0
(14) DFNI = 0
(15) COOK = 0
(16) OHARE = 0

F( 16, 1983) = 141.09


Prob > F = 0.0000

The Stata command and results for the significance test of ln_HAGEEFF are as follows:

test ln_HAGEEFF
( 1) ln_HAGEEFF = 0

F( 1, 1983) = 74.14
Prob > F = 0.0000

The stepwise regression removes insignificant variables one at a time. The Stata
command and results are as follows:

stepwise, pr(0.05): reg ln_SPRICE ln_NROOMS ln_LVAREA ln_HAGEEFF ln_LSIZE ///
    ln_PTAXES ln_MEDINC ln_DFCL ln_SSPEND ln_MSPEND AIRCON NBATH GARAGE ///
    PCTWHT DFNI COOK OHARE

Wald test, begin with full model:


p = 0.5592 >= 0.0500, removing DFNI
p = 0.0505 >= 0.0500, removing COOK

Source | SS df MS Number of obs = 2,000
-------------+---------------------------------- F(14, 1985) = 160.77
Model | 128.953411 14 9.21095791 Prob > F = 0.0000
Residual | 113.726345 1,985 .057292869 R-squared = 0.5314
-------------+---------------------------------- Adj R-squared = 0.5281
Total | 242.679755 1,999 .121400578 Root MSE = .23936

------------------------------------------------------------------------------
ln_SPRICE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ln_NROOMS | .1713756 .042117 4.07 0.000 .0887774 .2539739
ln_LVAREA | .2432909 .0383839 6.34 0.000 .1680139 .3185679
ln_HAGEEFF | -.0842359 .0097641 -8.63 0.000 -.1033848 -.065087
ln_LSIZE | .1064532 .0113601 9.37 0.000 .0841743 .1287321
ln_PTAXES | -.3839956 .0409537 -9.38 0.000 -.4643124 -.3036788
ln_MEDINC | .4173116 .0283497 14.72 0.000 .3617132 .4729099
ln_DFCL | -.2405759 .0158333 -15.19 0.000 -.2716275 -.2095242
ln_SSPEND | .1704485 .0439271 3.88 0.000 .0843004 .2565966
ln_MSPEND | -.0685232 .0198029 -3.46 0.001 -.1073598 -.0296866
AIRCON | .0424246 .0067674 6.27 0.000 .0291526 .0556966
NBATH | .0358732 .0137957 2.60 0.009 .0088176 .0629288
GARAGE | .0134979 .0043023 3.14 0.002 .0050603 .0219354
PCTWHT | .0028833 .0002941 9.80 0.000 .0023066 .00346
OHARE | .108432 .0313736 3.46 0.001 .0469034 .1699606
_cons | 4.379715 .5111181 8.57 0.000 3.37733 5.382099
------------------------------------------------------------------------------

Save the full model and the final model, and conduct a subset test:

estimates store full_model
estimates store final_model
suest full_model final_model
test [full_model=final_model] (ln_NROOMS ln_LVAREA ln_HAGEEFF ln_LSIZE ln_PTAXES ///
    ln_MEDINC ln_DFCL ln_SSPEND ln_MSPEND AIRCON NBATH GARAGE PCTWHT OHARE)

reg ln_SPRICE ln_NROOMS ln_LVAREA ln_HAGEEFF ln_LSIZE ln_PTAXES ///
    ln_MEDINC ln_DFCL ln_SSPEND ln_MSPEND AIRCON NBATH GARAGE PCTWHT OHARE

Source | SS df MS Number of obs = 2,000
-------------+---------------------------------- F(14, 1985) = 160.77
Model | 128.953411 14 9.21095791 Prob > F = 0.0000
Residual | 113.726345 1,985 .057292869 R-squared = 0.5314
-------------+---------------------------------- Adj R-squared = 0.5281
Total | 242.679755 1,999 .121400578 Root MSE = .23936

------------------------------------------------------------------------------
ln_SPRICE | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ln_NROOMS | .1713756 .042117 4.07 0.000 .0887774 .2539739
ln_LVAREA | .2432909 .0383839 6.34 0.000 .1680139 .3185679
ln_HAGEEFF | -.0842359 .0097641 -8.63 0.000 -.1033848 -.065087
ln_LSIZE | .1064532 .0113601 9.37 0.000 .0841743 .1287321
ln_PTAXES | -.3839956 .0409537 -9.38 0.000 -.4643124 -.3036788
ln_MEDINC | .4173116 .0283497 14.72 0.000 .3617132 .4729099
ln_DFCL | -.2405759 .0158333 -15.19 0.000 -.2716275 -.2095242
ln_SSPEND | .1704485 .0439271 3.88 0.000 .0843004 .2565966
ln_MSPEND | -.0685232 .0198029 -3.46 0.001 -.1073598 -.0296866
AIRCON | .0424246 .0067674 6.27 0.000 .0291526 .0556966
NBATH | .0358732 .0137957 2.60 0.009 .0088176 .0629288
GARAGE | .0134979 .0043023 3.14 0.002 .0050603 .0219354
PCTWHT | .0028833 .0002941 9.80 0.000 .0023066 .00346
OHARE | .108432 .0313736 3.46 0.001 .0469034 .1699606
_cons | 4.379715 .5111181 8.57 0.000 3.37733 5.382099
------------------------------------------------------------------------------

. test [full_model=final_model] (ln_NROOMS ln_LVAREA ln_HAGEEFF
ln_LSIZE ln_PTAXES ln_MEDINC ln_DFCL ln_SSPEND ln_MSPEND
AIRCON NBATH GARAGE PCTWHT OHARE)

(1) full_model: ln_NROOMS = 0
(2) full_model: ln_LVAREA = 0
(3) full_model: ln_HAGEEFF = 0
(4) full_model: ln_LSIZE = 0
(5) full_model: ln_PTAXES = 0
(6) full_model: ln_MEDINC = 0
(7) full_model: ln_DFCL = 0
(8) full_model: ln_SSPEND = 0

(9) full_model: ln_MSPEND = 0
(10) full_model: AIRCON = 0
(11) full_model: NBATH = 0
(12) full_model: GARAGE = 0
(13) full_model: PCTWHT = 0
(14) full_model: OHARE = 0
(15) final_model: ln_NROOMS = 0
(16) final_model: ln_LVAREA = 0
(17) final_model: ln_HAGEEFF = 0
(18) final_model: ln_LSIZE = 0
(19) final_model: ln_PTAXES = 0
(20) final_model: ln_MEDINC = 0
(21) final_model: ln_DFCL = 0
(22) final_model: ln_SSPEND = 0
(23) final_model: ln_MSPEND = 0
(24) final_model: AIRCON = 0
(25) final_model: NBATH = 0
(26) final_model: GARAGE = 0
(27) final_model: PCTWHT = 0
(28) final_model: OHARE = 0

F(14, 1985) = 1.07


Prob > F = 0.3741
