CH - 3 - Simple and Multiple Linear Regressions in Stata

Application to Cross-Sectional Econometrics in Stata

Mengistu Yismaw (MSc.), Department of Economics, Debre Markos University (Burie Campus). Email: menyis.2012@gmail.com

Chapter outline
- Simple linear regression: regression with only qualitative (dummy) regressors (ANOVA)
  - Specification
  - Estimation
  - Interpretation
- Multiple linear regression: regression with qualitative and quantitative regressors (ANCOVA)
  - Specification
  - Estimation
  - Interpretation
  - Tests of the CLRM assumptions
  - Violations of some of the CLRM assumptions
  - Interaction effects
- Qualitative response regression models: dummy as the dependent variable (binary choice model)
  - Linear probability model (LPM): specification, estimation, interpretation

CHAPTER THREE: CROSS SECTIONAL ECONOMETRICS

Methodology of econometric analysis
What steps do econometricians follow when they analyse an economic problem? Broadly speaking, classical econometric methodology proceeds along the following lines:
1. Statement of the theory or hypothesis
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

2.1. Simple Linear Regression
Simple linear regression means a single regressor (independent variable).
If that single regressor is qualitative (a dummy), we have an ANOVA regression.

Step 1: Develop a statement of the theory or hypothesis
Suppose we want to know whether there is a productivity difference between male-headed and female-headed households. Gender is then our independent variable.
Gender is a dummy (binary, nominal scale) variable. A nominal scale variable gives qualitative information only:
$Gender_i = 1$ if the head is male, $Gender_i = 0$ if the head is female
Note: the coding "0" or "1" above is used for identification purposes only.
Values of a nominal scale variable therefore cannot be divided, subtracted or ordered for comparison.
Such a variable is often called a dummy variable.

Step 2: Specification of the mathematical model of the theory
$Yield = \beta_0 + \beta_1 Gender$

Step 3: Specification of the statistical, or econometric, model
The econometric model adds a random disturbance term. The simple linear regression model is:
$Yield_i = \beta_0 + \beta_1 Gender_i + u_i$

Step 4: Obtaining the data
The next step is to go to the field and collect the data. After that, enter the data into the appropriate software and format. Recall the ways of entering data into Stata:
i. Enter the data directly into Stata.
ii. Enter the data in Excel and import them into Stata.
iii. Enter the data in SPSS and save them in Stata format, or convert them with the Stat/Transfer software.
The next step is estimating the model.

Step 5: Estimation of the parameters of the econometric model
Use ordinary least squares (OLS) estimation in Stata.
Menu: Statistics > Linear models and related > Linear regression. Select the dependent and independent variables, then click Submit or OK.
Syntax: reg depvar indepvar
Example: reg Yield Gender_n1
(Figure: Stata output of reg Yield Gender_n1.)

Some statistical manipulations of the output:
- Degrees of freedom of the F statistic are $(k - 1,\ n - k)$, where $n$ is the sample size and $k$ is the number of parameters (number of variables). Here $(2 - 1,\ 30 - 2) = (1,\ 28)$.
- Adjusted $R^2$: $\bar{R}^2 = 0.4610$
- Fitted values: $\widehat{Yield}_i = \hat{\beta}_0 + \hat{\beta}_1 Gender_i$
- Residuals: $\hat{u}_i = Yield_i - \widehat{Yield}_i$
- $se(\hat{\beta}_1) = 1.088$ and $t = \hat{\beta}_1 / se(\hat{\beta}_1) = 5.08$
- 95% confidence interval: $\hat{\beta}_1 \pm t_{\alpha/2}\, se(\hat{\beta}_1)$. Here $t_{\alpha/2}$ is the t value with $30 - 2 = 28$ degrees of freedom at $0.05/2$, which is $2.048$. This gives $5.5278 \pm 2.048 \times 1.088 = (3.2986,\ 7.7568)$. Computing with the rounded values 2.048 and 1.088 gives $(3.2996,\ 7.7560)$; the small difference is due to rounding.

To estimate the RSS (residual sum of squares), follow these steps:
(Figure: Stata steps for computing the RSS.)
To estimate the ESS (explained, or model, sum of squares), follow these steps:
(Figure: Stata steps for computing the ESS.)

Interpretation of coefficients
What does the estimate 5.527 show? It is the coefficient on the male dummy. The average productivity of male-headed households is higher than that of female-headed households by 5.527, and the difference is significant at 1% (compare the t-test result in Chapter 2).
What about the estimate 3.5?
It is the average productivity of the omitted category (female-headed households).
Why did we omit one category (female)? To avoid the dummy variable trap. With an intercept in the model, including a dummy for both categories would make the regressors perfectly collinear.
Average productivity of male-headed households = 3.5 + 5.527 × 1 = 9.027 (compare the t-test result in Chapter 2, or use prediction).
Average productivity of female-headed households = 3.5 + 5.527 × 0 = 3.5 (or use prediction).
The productivity difference between male-headed and female-headed households is 9.027 - 3.5 = 5.527.

Exercise
- Is there a significant difference in average productivity between households with and without access to credit?
- What is the average productivity of households with access to credit?
- How much of the productivity variation is explained by access to credit?

2.2. Multiple Linear Regression
Multiple linear regression means several regressors (independent variables).
If the independent variables include both dummy and continuous variables, we have an ANCOVA regression.
Suppose you are going to analyse the determinants of maize productivity. Based on your literature review, you think that maize productivity can be affected by:
- Age of the household head
- Land fragmentation
- Fertilizer applied per hectare
- Household land size
- Gender of the household head
The multiple linear regression model is then:
$Yield_i = \beta_0 + \beta_1 age_i + \beta_2 fragment_i + \beta_3 fertilizer_i + \beta_4 land_i + \beta_5 Gender_i + u_i$
Note: the estimation technique is the same as for the simple linear regression model.
Syntax: reg depvar indepvars
Example: reg Yield age fragment fertlizer land Gender_n1
(Figure: Stata output of the regression.)
Based on the p-values, only two of the five explanatory variables (fragment and Gender) are significant. In this example, only the significant variables are interpreted.
Estimated equation of the regression model:
$\widehat{Yield} = 5.5617 + 0.055\,age \ldots 0.676\,fragment - 0.00\ldots\,fertilizer \ldots land \ldots Gender$
Before interpreting the coefficients, however, it is important to judge the adequacy of the model using some diagnostic tests.
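The sketch below makes what reg computes less of a black box. It is a minimal pure-Python illustration of the algebra behind OLS, under simplifying assumptions, and it is not Stata's internal implementation. It solves the normal equations (X'X)b = X'y and derives the VIF used in the diagnostics from auxiliary regressions, VIF_j = 1/(1 - R_j^2). The function names (add_constant, ols, r_squared, vif) and the toy data are hypothetical.

With a single 0/1 regressor, the intercept equals the mean of the 0 group and the slope equals the difference in group means. This is why 3.5 and 3.5 + 5.527 above are the female and male averages. Perfectly collinear regressors make X'X singular, so the sketch refuses to estimate them.

```python
# Hypothetical helpers illustrating the algebra of OLS; not Stata's internals.

def add_constant(rows):
    """Prepend a column of ones (the intercept) to each observation."""
    return [[1.0] + list(row) for row in rows]


def ols(X, y, tol=1e-9):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination.

    Raises ValueError when X'X is (numerically) singular, i.e. when some
    regressors are perfectly collinear.
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    scale = max(abs(v) for r in xtx for v in r)
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    for col in range(k):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) <= tol * scale:
            raise ValueError("X'X is singular: perfectly collinear regressors")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def r_squared(X, y):
    """R^2 = 1 - RSS/TSS for an OLS fit of y on X (X must include the constant)."""
    b = ols(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


def vif(rows, j):
    """VIF of regressor j: 1 / (1 - R_j^2) from the auxiliary regression of
    regressor j on a constant and all the other regressors.
    The VIF grows without bound as R_j^2 approaches 1."""
    target = [row[j] for row in rows]
    others = add_constant([[v for i, v in enumerate(row) if i != j] for row in rows])
    return 1.0 / (1.0 - r_squared(others, target))


if __name__ == "__main__":
    # Toy data: 0 = female-headed, 1 = male-headed
    gender = [0, 0, 0, 1, 1, 1]
    yield_ = [3.0, 4.0, 3.5, 9.0, 10.0, 8.0]
    print(ols(add_constant([[g] for g in gender]), yield_))  # [3.5, 5.5]
```

In the toy run, the intercept 3.5 is the mean of the 0 group and the slope 5.5 is the gap between the two group means. This mirrors how the coefficients above are read.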
In particular, inferences based on OLS results are valid only if the classical linear regression model (CLRM) assumptions hold. Let us now test some of the CLRM assumptions; these checks are called diagnostic tests.

i. Multicollinearity test
Multicollinearity means a perfect (exact) or near-perfect linear relationship among some or all of the explanatory variables of the regression model.
Several techniques can detect it: auxiliary regressions, pairwise correlations among the regressors, and the variance inflation factor (VIF) and/or the tolerance (1/VIF).
The VIF is the most commonly used. It measures how much the variance of an estimator is inflated by multicollinearity: $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ is the $R^2$ of the auxiliary regression of regressor $j$ on the other regressors.
Note: multicollinearity is a matter of degree, not of kind. The question is not whether it is present or absent, but how severe it is (high or perfect).
Informal test: a high $R^2$ but few significant t-ratios.
Formal tests:
- Auxiliary regressions.
- Pairwise correlations among the regressors. Decision: best if below 0.50.
- Variance inflation factor and tolerance (in Stata: estat vif after reg). Decision: as a rule of thumb, there is multicollinearity if VIF > 10 or 1/VIF < 0.10 (close to zero).
Our result shows that the VIF of every variable is below 10 and that 1/VIF is above 0.10, so multicollinearity is not a problem in our model.
Note:
- Multicollinearity concerns linear relationships only; nonlinear relationships between variables do not constitute multicollinearity.
- Multicollinearity is essentially a sample (regression) phenomenon, not a population one.
Remedial measures if there is a multicollinearity problem:
- Drop one or more of the (perfectly) collinear variables.
- Take the sample over a wider area (increase the sample size).
- Collect new data.
- Transform the variables (e.g., squares, natural logarithms).
- Combine cross-sectional and time-series data.
- Do nothing. According to Blanchard, "multicollinearity is God's will, not a problem with OLS or statistical technique in general"; it is essentially a data-deficiency problem.

ii. Test of homoscedasticity
This tests whether the variance of the error (disturbance) term is constant. If the error term does not have a constant variance, there is a heteroscedasticity problem.
The Breusch-Pagan test can be used to check this.
Stata command: hettest (estat hettest in recent Stata versions)
(Figure: hettest output.)
Decision: if the p-value is sufficiently small, i.e. below the chosen significance level (10% in this chapter), we reject the null hypothesis (H0) of homoscedasticity (constant variance) and accept the alternative hypothesis (H1).
Our result shows a p-value below 10%, so we reject H0. The error variance is not constant, which means there is a heteroscedasticity problem in our model.
Remedial measures for heteroscedasticity:
- Check for outliers (in the dependent variable).
- Re-estimate the model with heteroscedasticity-robust standard errors. Example: reg Yield age fragment fertlizer land Gender_n1, robust
Note: hettest is not appropriate after a regression with the robust option.

iii. Model specification test
The model specification test deals with:
- Exclusion of relevant explanatory variables
- Inclusion of irrelevant variables
- Functional form errors
Use the Ramsey RESET test. Syntax: ovtest (estat ovtest in recent Stata versions)
Decision: if the p-value is sufficiently small, i.e. below the chosen significance level (usually 10%), we reject the null hypothesis (H0) that the model has no omitted variables and accept the alternative hypothesis (H1). Our result does not reject H0, which implies that there is no model specification problem.

iv. Normality of the disturbance term
There are various ways of testing the normality of $u_i$.
For example:
- Histogram of the residuals with a normal curve
- Normal probability plot, and others

Testing the normality of the disturbance term in Stata
First, generate the residuals:
Syntax: predict ui, residuals
Second, test the normality of the residuals (ui).
a. Draw a histogram of ui with a normal curve.
Syntax: histogram ui, normal
(Figure: histogram of ui with a normal density curve.)
b. Draw the normal probability and quantile plots.
Syntax: pnorm ui or qnorm ui
(Figures: standardized normal probability plot and normal quantile plot of ui.)
Both graphs show that the residuals (ui) are approximately normal.

Violations of some of the CLRM assumptions
The presence of multicollinearity
We said that multicollinearity means a perfect or near-perfect linear relationship among some or all of the explanatory variables of the regression model.
Let us assume that the variable fertilizer is twice that of age. Then let us create a hypothetical variable called age3, which is a function of age:
Syntax: gen age3 = 50 + age
Note: we deliberately set the 10th observation of age3 to 95 instead of 75 (e.g., replace age3 = 95 in 10). Otherwise Stata would drop one of the perfectly correlated variables from the regression.
After running the regression with the new data, we get the following VIF result:
(Figure: VIF output.)

Categorical variables as regressors
Suppose the regressor is educational level (EducLevel).
Syntax: reg depvar i.catvar
Example: reg Yield fertlizer Gender_n1 i.EducLevel_n1
Note: when you put i. in front of a categorical variable, Stata automatically omits one category (by default the lowest), and that category becomes your benchmark (base). If you do not put i. in front of it, Stata treats the variable as continuous and your estimates will be wrong.

Answer the following questions based on the regression result:
(Figure: regression output.)
A. What does 4.612 show?
B. What is the difference in average productivity between male-headed and female-headed households?
C. What is the difference in average productivity between households whose heads are illiterate and those whose heads completed secondary education?
D. What is the difference in average productivity between households whose heads completed secondary education and those whose heads completed post-secondary education?

Discussion question
What is the average productivity of households managed by male heads who completed secondary education?
To find it:
1st: generate an interaction variable of gender and educational level.
2nd: run the regression using the newly generated variable: reg Yield fertlizer i.IntGenEduc
The average productivity of households managed by male heads who completed secondary education is 3.82.

The linear probability model (LPM)
Suppose you intend to investigate the effect of gender and land size on access to credit.
Model: $\text{Credit\_Dummy\_n1}_i = \beta_0 + \beta_1\, land_i + \beta_2\, Gender_i + u_i$
The dependent variable takes only the values 0 and 1, so the model can be interpreted as the probability that it equals 1 given the explanatory variables: $E(Credit_i \mid land_i, Gender_i) = P(Credit_i = 1 \mid land_i, Gender_i)$.
The LPM is not entirely correct: predicted probabilities can fall outside the 0-1 range, and the error term is heteroscedastic. We can still estimate it with OLS.

Interpretation of coefficients
- Interpret the intercept.
- Interpret the coefficient of land.
- Interpret the coefficient of Gender.
Answer
A. The probability of access to credit for female-managed households with no land is 0.168, or about 17%.
Note: if the intercept term were negative, it would be interpreted as zero, because a probability cannot be negative.
B. The coefficient of land shows that a one-hectare increase in a household's land size decreases the probability of access to credit by 0.00069 on average (about 0.07 percentage points). This effect is not statistically significant.
We can still estimate the actual probability of access to credit for a particular household. Example: a male-managed household with 5 hectares of land:
$E(Credit \mid land = 5, Gender = 1) = 0.168 - 0.00067 \times 5 + 0.499 \times 1 \approx 0.664$ (or use prediction)
C. The coefficient of Gender shows that the probability of access to credit for male-managed households is higher than for female-managed households by about 50 percentage points (0.499) on average.

Exporting Stata results to Microsoft Word
1. Using asdoc
Syntax: add asdoc before Stata commands (except graph commands).
Examples:
asdoc sum
asdoc reg Yield age fragment fertlizer land Gender_n1 EducLevel_n1 Credit_Dummy_n1
The software automatically saves your result in the Word file "Myfile.doc" in the current working directory. Click on "Myfile.doc" in the Stata Results window to open the document.
2. Using outreg2
outreg2 is used for regression results.
Syntax: run the regression and the outreg2 command together.
Example:
reg Yield age fragment fertlizer land Gender_n1 EducLevel_n1 Credit_Dummy_n1
outreg2 using Table1.doc, replace
The software automatically saves your result in the Word file "Table1.doc" in the working directory. Click on "Table1.doc" in the Stata Results window to open the document.
Note: asdoc and outreg2 are user-written commands. Install them once with ssc install asdoc and ssc install outreg2.
Note: outreg2 is usually used for publication purposes. For your senior essay, please use asdoc.
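This returns to the linear probability model above. A fitted LPM probability is just the linear index evaluated at chosen regressor values.

The hypothetical helper below reproduces the hand calculation for a male-managed household with 5 hectares. It uses the coefficients quoted in that worked example: 0.168, -0.00067 and 0.499. The truncation to the [0, 1] range extends the note about negative intercepts. It is an assumed reporting convention, not something OLS itself does.

```python
# Hypothetical helper: predicted probability from the LPM credit example.
# Coefficients come from the worked example in the text; the truncation to
# [0, 1] is an assumed reporting convention, not part of OLS estimation.

def lpm_probability(b0, b_land, b_gender, land, gender):
    """Return b0 + b_land*land + b_gender*gender, truncated to [0, 1]."""
    p = b0 + b_land * land + b_gender * gender
    return min(1.0, max(0.0, p))


if __name__ == "__main__":
    p = lpm_probability(0.168, -0.00067, 0.499, land=5, gender=1)
    print(round(p, 3))  # 0.664
```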