In statistics, multicollinearity or collinearity is a situation where the predictors in a regression model are linearly dependent.
Perfect multicollinearity refers to a situation where the predictive variables have an exact linear relationship. When there is perfect collinearity, the design matrix has less than full rank, and therefore the moment matrix cannot be inverted. In this situation, the parameter estimates of the regression are not well-defined, as the system of equations has infinitely many solutions.
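A minimal numerical sketch of this situation, using NumPy and synthetic data (the variable names, sample size, and coefficients are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical data: the third predictor is an exact linear combination
# of the first two, so the design matrix is rank-deficient.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2                           # perfect collinearity
X = np.column_stack([np.ones(n), x1, x2, x3])

print(np.linalg.matrix_rank(X))        # 3, less than the 4 columns

# Two different coefficient vectors give identical fitted values,
# so the least-squares problem has infinitely many solutions.
beta_a = np.array([1.0, 2.0, 3.0, 0.0])
beta_b = np.array([1.0, 1.0, 2.0, 1.0])   # shift weight onto x3
print(np.allclose(X @ beta_a, X @ beta_b))  # True
```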
Imperfect multicollinearity refers to a situation where the predictive variables have a nearly exact linear relationship.
Contrary to popular belief, neither the Gauss–Markov theorem nor the more common maximum-likelihood justification for ordinary least squares relies on any assumption about the correlation structure among the predictors[1][2][3] (although perfect collinearity can cause problems with some software).
There is no justification for the practice of removing collinear variables as part of regression analysis,[1][4][5][6][7] and doing so may constitute scientific misconduct. Including collinear variables does not reduce the predictive power or reliability of the model as a whole,[6] and does not reduce the accuracy of coefficient estimates.[1]
High collinearity indicates that it is exceptionally important to include all collinear variables, as excluding any of them will cause worse coefficient estimates, strong confounding (omitted-variable bias), and downward-biased estimates of standard errors.[2]
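The following simulation sketches this point with synthetic data (all names, coefficients, and sample sizes are illustrative): fitting the full model with two highly correlated predictors still recovers the true coefficients, while dropping one of them biases the estimate of the other.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Two strongly correlated predictors, both with a true effect on y.
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # corr(x1, x2) is close to 1
y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Full model: both collinear predictors included.
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
print(beta_full)        # roughly [0, 2, 3] -- the true values are recovered

# Omitting x2: the estimate for x1 absorbs x2's effect (omitted-variable bias).
X_short = np.column_stack([np.ones(n), x1])
beta_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(beta_short)       # slope is roughly 2 + 3*0.95, not 2
```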
To diagnose the degree of collinearity in a dataset, the variance inflation factor (VIF) can be computed for each predictor variable.
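As an illustrative sketch (a manual NumPy computation rather than a specific library routine), the VIF of each predictor can be obtained by regressing it on the remaining predictors and taking 1/(1 − R²):

```python
import numpy as np

def vif(X):
    """Variance inflation factors for the columns of X (no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (plus an intercept).
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic example: x3 is nearly a linear combination of x1 and x2,
# so the VIFs of all three predictors are large.
rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + 0.1 * rng.normal(size=500)
print(vif(np.column_stack([x1, x2, x3])))
```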