Classical linear regression model assumptions
The classical linear regression model rests on a set of assumptions that underpin the
mathematical theory of linear regression. Violations of these assumptions may lead to biased or
inefficient estimates of the model parameters, inaccurate predictions, or invalid inferences.
The classical linear regression model assumptions are as follows:
1. Linearity: The relationship between the independent variable(s) and the dependent
variable is linear. This means that the effect of a unit change in the independent
variable(s) on the dependent variable is constant across the range of values of the
independent variable(s). If the relationship is not linear, then the model may not
accurately represent the data. Nonlinear relationships can sometimes be transformed
to linear relationships by applying a mathematical function to one or more of the
variables.
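Such a transformation can be checked numerically by fitting a straight line before and after transforming and comparing the quality of the fit. The sketch below is illustrative only (NumPy, synthetic data; the r_squared helper is not from the text above): a log transform linearizes an exponential relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
y = np.exp(0.5 * x) * rng.lognormal(0.0, 0.1, 50)  # multiplicative noise: nonlinear in x

def r_squared(x, y):
    # Fit a straight line by least squares and report the R^2 of that fit.
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

print(r_squared(x, y))          # mediocre linear fit on the raw scale
print(r_squared(x, np.log(y)))  # close to 1 after the log transform
```

Because the noise is multiplicative, taking logs turns the exponential relationship into a straight line with additive errors, which is exactly the situation the linearity assumption describes.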
2. Independence: The observations are independent of each other. This means that the
value of the dependent variable for one observation is not related to the value of the
dependent variable for another observation. Independence matters because when it fails (for
example, with autocorrelated errors in time-series data or with clustered observations), the
usual standard errors tend to be underestimated, invalidating hypothesis tests and confidence
intervals; the coefficient estimates themselves can also be biased in some settings, such as
when a lagged dependent variable is included in the model.
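A common diagnostic for independence of time-ordered residuals is the Durbin–Watson statistic, which is near 2 when there is no first-order autocorrelation. A minimal sketch on synthetic residuals (NumPy only; the data are illustrative):

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no first-order autocorrelation;
    values near 0 indicate positive, near 4 negative autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
independent = rng.normal(size=500)                 # i.i.d. residuals
autocorrelated = np.cumsum(rng.normal(size=500))   # a random walk: strongly dependent

print(durbin_watson(independent))     # close to 2
print(durbin_watson(autocorrelated))  # close to 0: strong positive autocorrelation
```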
3. Homoscedasticity: The variance of the errors (the differences between the predicted and
actual values of the dependent variable) is constant across the range of values of the
independent variable(s). This means that the spread of the errors is the same for all levels
of the independent variable(s). If the errors have different variances for different levels of
the independent variable(s), this is called heteroscedasticity. Heteroscedasticity can lead
to biased and inefficient estimates of the regression coefficients and standard errors.
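Heteroscedasticity can be tested formally. The sketch below implements a simplified Breusch–Pagan-style check, regressing squared residuals on the predictor; the helper and the synthetic data are illustrative, not from the text above. Under homoscedasticity the LM statistic n·R² is approximately chi-square with one degree of freedom, so large values signal variance that depends on the predictor.

```python
import numpy as np

def breusch_pagan(x, resid):
    """Simplified Breusch-Pagan LM statistic: regress squared residuals on x;
    under homoscedasticity, LM = n * R^2 ~ chi-square(1)."""
    e2 = resid ** 2
    slope, intercept = np.polyfit(x, e2, 1)
    fitted = slope * x + intercept
    r2 = 1.0 - np.var(e2 - fitted) / np.var(e2)
    return len(x) * r2

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200)
homo = rng.normal(0.0, 1.0, 200)        # constant error variance
hetero = rng.normal(0.0, 1.0, 200) * x  # spread grows with x

print(breusch_pagan(x, homo))    # small: no evidence of heteroscedasticity
print(breusch_pagan(x, hetero))  # large: variance clearly depends on x
```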
4. Normality: The errors are normally distributed, meaning that their distribution is
symmetric and bell-shaped around zero. Normality is needed chiefly for exact hypothesis
tests and confidence intervals in small samples; in large samples, inference is
approximately valid even when the errors are not normal.
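A rough numerical check of this assumption looks at the standardized skewness and excess kurtosis of the residuals, both of which are near zero for normal errors (a formal test such as Shapiro–Wilk or Jarque–Bera is preferable in practice). A sketch on synthetic residuals:

```python
import numpy as np

def normality_check(resid):
    """Standardized skewness and excess kurtosis of the residuals;
    both are near zero when the residuals are normally distributed."""
    z = (resid - resid.mean()) / resid.std()
    skew = np.mean(z ** 3)
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return skew, excess_kurtosis

rng = np.random.default_rng(3)
normal_resid = rng.normal(size=1000)
skewed_resid = rng.exponential(size=1000)  # clearly non-normal: right-skewed

print(normality_check(normal_resid))  # both values near 0
print(normality_check(skewed_resid))  # large positive skewness
```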
5. No multicollinearity: The independent variables are not highly correlated with each
other. Strictly, the model requires that no independent variable be an exact linear
combination of the others (perfect multicollinearity). High but imperfect correlation does
not violate this requirement, but it inflates the variance of the coefficient estimates,
making them unstable and difficult to interpret.
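Multicollinearity is commonly quantified with variance inflation factors (VIFs): the VIF of a predictor is 1/(1 − R²) from regressing that predictor on the remaining ones, and values above roughly 5–10 are often taken as a warning sign. A NumPy sketch on synthetic predictors (the helper is illustrative):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of the design matrix X:
    1 / (1 - R^2) from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - np.sum((y - A @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)              # independent of x1
x3 = x1 + rng.normal(0.0, 0.05, 200)   # nearly collinear with x1

X = np.column_stack([x1, x2, x3])
print(vif(X, 1))  # near 1: x2 is uncorrelated with the other predictors
print(vif(X, 2))  # very large: x3 is almost an exact copy of x1
```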
6. No influential outliers: The data do not contain any influential outliers that can distort
the results of the regression analysis.
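Influential points are often flagged with Cook's distance, which combines an observation's residual with its leverage; common rules of thumb flag values above 4/n, or above 1, as influential. The sketch below plants one influential outlier in synthetic data (NumPy only; the helper is illustrative):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS fit with intercept:
    D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2)."""
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat (projection) matrix
    resid = y - H @ y
    p = A.shape[1]
    s2 = np.sum(resid ** 2) / (len(y) - p)
    h = np.diag(H)
    return resid ** 2 * h / (p * s2 * (1.0 - h) ** 2)

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(0.0, 0.5, 50)
x[0], y[0] = 10.0, -20.0   # plant one high-leverage, large-residual outlier

d = cooks_distance(x.reshape(-1, 1), y)
print(np.argmax(d))  # index 0: the planted outlier dominates
```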
These assumptions are important because together they ensure that the estimates of the
regression coefficients are unbiased and efficient, and that the predictions and inferences
based on the model are valid. It is therefore good practice to check them before relying on a
fitted linear regression and, where a violation is found, to take appropriate corrective
action, such as transforming variables, adding omitted terms, or using estimation methods
that are robust to the violation.