Estimation Problem of Two-Variable
Regression Model
The Method of Ordinary Least Squares
• Under certain assumptions, the method of least
squares has some very attractive statistical properties
that have made it one of the most powerful and
popular methods of regression analysis.
• Recall the two-variable PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$
• We estimate it from the SRF: $Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + \hat{u}_i = \hat{Y}_i + \hat{u}_i$
where $\hat{Y}_i$ is the estimated (conditional mean) value of $Y_i$.
• Rearranging gives $\hat{u}_i = Y_i - \hat{Y}_i$, which shows that the $\hat{u}_i$ (the residuals) are simply the differences between the actual and estimated Y values.
• If we adopt the least-squares criterion, which states that the SRF can be chosen in such a way that
$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
is as small as possible, where $\hat{u}_i^2$ are the squared residuals.
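The closed-form estimators that minimize this sum can be sketched in Python; the data below are purely illustrative (not from the text), and the formulas are the standard two-variable least-squares solutions:

```python
# Two-variable OLS via the closed-form least-squares formulas:
#   beta2_hat = sum((X - Xbar)(Y - Ybar)) / sum((X - Xbar)^2)
#   beta1_hat = Ybar - beta2_hat * Xbar
# These minimize the residual sum of squares RSS = sum(u_hat_i^2).

def ols_two_variable(X, Y):
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    beta2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
            sum((x - x_bar) ** 2 for x in X)
    beta1 = y_bar - beta2 * x_bar
    return beta1, beta2

# Illustrative data generated from Y = 2 + 3X exactly (no disturbance),
# so the estimates should recover the true parameters.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0 + 3.0 * x for x in X]
b1, b2 = ols_two_variable(X, Y)
print(b1, b2)  # → 2.0 3.0
```

With no disturbance term, the SRF coincides with the PRF, which is why the true parameters are recovered exactly.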
The Classical Linear Regression Model:
The Assumptions Underlying the Method of Least
Squares
• Assumption 1: Linear regression model. The regression model is linear in the parameters.
• Assumption 2: X values are fixed in repeated
sampling. Values taken by the regressor X are
considered fixed in repeated samples. More
technically, X is assumed to be nonstochastic.
• Assumption 3: Zero mean value of disturbance $u_i$. Given the value of X, the mean, or expected, value of the random disturbance term $u_i$ is zero. Technically, the conditional mean value of $u_i$ is zero. Symbolically, we have:
$E(u_i \mid X_i) = 0$
• Assumption 4: Homoscedasticity or equal variance of $u_i$. Given the value of X, the variance of $u_i$ is the same for all observations. That is, the conditional variances of $u_i$ are identical. Symbolically, we have
$\operatorname{var}(u_i \mid X_i) = E[u_i - E(u_i \mid X_i)]^2 = E(u_i^2 \mid X_i)$, because of Assumption 3, $= \sigma^2$
where var stands for variance.
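Assumptions 3 and 4 concern the unobservable population disturbances $u_i$. The fitted OLS residuals, however, satisfy sample analogues of Assumption 3 by construction: with an intercept in the model, they sum to zero and are uncorrelated with X. A small sketch (illustrative data, reusing the closed-form estimators) demonstrates this:

```python
# With an intercept, OLS residuals satisfy two algebraic identities that
# mirror Assumption 3 in the sample: sum(u_hat) = 0 and sum(u_hat * X) = 0.
# Both follow from the first-order conditions of minimizing RSS.

def ols_two_variable(X, Y):
    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    beta2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / \
            sum((x - x_bar) ** 2 for x in X)
    return y_bar - beta2 * x_bar, beta2

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 4.3, 5.8, 8.2, 9.9]   # illustrative values, not from the text
b1, b2 = ols_two_variable(X, Y)
residuals = [y - (b1 + b2 * x) for x, y in zip(X, Y)]

print(abs(sum(residuals)) < 1e-9)                             # True
print(abs(sum(u * x for u, x in zip(residuals, X))) < 1e-9)   # True
```

Note that these identities hold for any sample; they do not by themselves verify that the *population* disturbances satisfy Assumptions 3 and 4.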
• Assumption 7: The number of observations n must be greater
than the number of parameters to be estimated. Alternatively,
the number of observations n must be greater than the number of
explanatory variables.
• Assumption 8: Variability in X values. The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number.
• Assumption 9: The regression model is correctly specified.
Alternatively, there is no specification bias or error in the model
used in empirical analysis.
• Assumption 10: There is no perfect multicollinearity. That is,
there are no perfect linear relationships among the explanatory
variables.
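Assumptions 8 and 10 both guard against the least-squares problem becoming unsolvable. A NumPy sketch (illustrative data) shows what goes wrong in each case: under perfect multicollinearity the matrix $X'X$ of the normal equations loses rank, and with no variability in X the slope estimator's denominator is zero:

```python
import numpy as np

# Design matrix with an intercept and two regressors where X2 = 2 * X1:
# a perfect linear relationship, violating Assumption 10.
X1 = np.array([1.0, 2.0, 3.0, 4.0])
X2 = 2.0 * X1
X = np.column_stack([np.ones(4), X1, X2])

# X'X is singular, so the normal equations have no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)
print(rank)  # 2, not 3

# If all X values are identical, violating Assumption 8, then var(X) = 0
# and the slope denominator sum((X - Xbar)^2) is zero: division fails.
Xc = np.array([5.0, 5.0, 5.0, 5.0])
denom = float(np.sum((Xc - Xc.mean()) ** 2))
print(denom)  # 0.0
```

In both cases the failure is algebraic, not statistical: the data simply do not contain enough independent variation to pin down the parameters.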