Regression dilution

Regression dilution, also known as regression attenuation, is the biasing of the linear regression slope towards zero (the underestimation of its absolute value), caused by errors in the independent variable.

Consider fitting a straight line for the relationship of an outcome variable y to a predictor variable x, and estimating the slope of the line. Statistical variability, measurement error or random noise in the y variable causes uncertainty in the estimated slope, but not bias: on average, the procedure calculates the right slope. However, variability, measurement error or random noise in the x variable causes bias in the estimated slope (as well as imprecision). The greater the variance of the error in the x measurement, the closer the estimated slope is pulled towards zero and the further it falls from the true value.
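The contrast between noise in y and noise in x can be illustrated with a small simulation. The following is a minimal sketch, not a definitive treatment: the true slope of 2, the unit variances, the sample size and the seed are all arbitrary assumptions, and the helper `ols_slope` is a hypothetical name introduced here. It also assumes the classical measurement-error setting, in which the error in x is independent of the true x; under that assumption the expected fitted slope is the true slope multiplied by var(x) / (var(x) + var(error)).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the run is repeatable

n = 100_000            # sample size (assumed, large so sampling noise is small)
true_slope = 2.0       # assumed true slope of y on the error-free x
var_x = 1.0            # assumed variance of the error-free predictor
var_error = 1.0        # assumed variance of the added noise

x_true = rng.normal(0.0, np.sqrt(var_x), size=n)
y_clean = true_slope * x_true


def ols_slope(x, y):
    """Least-squares slope of y on x: sample cov(x, y) / sample var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Case 1: noise added to y only. The slope estimate is noisier but unbiased.
y_noisy = y_clean + rng.normal(0.0, np.sqrt(var_error), size=n)
slope_y_noise = ols_slope(x_true, y_noisy)

# Case 2: noise added to x only. The slope estimate is biased towards zero.
x_noisy = x_true + rng.normal(0.0, np.sqrt(var_error), size=n)
slope_x_noise = ols_slope(x_noisy, y_clean)

# Expected slope in case 2 under the classical measurement-error assumption.
expected_attenuated = true_slope * var_x / (var_x + var_error)

print(f"true slope:               {true_slope:.3f}")
print(f"noise in y only:          {slope_y_noise:.3f}")
print(f"noise in x only:          {slope_x_noise:.3f}")
print(f"expected with noisy x:    {expected_attenuated:.3f}")
```

With these assumed values the error variance equals the variance of x, so the slope fitted to the noisy x should come out at roughly half the true slope, while the slope fitted with noise only in y should stay close to the true value.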

It may seem counter-intuitive that noise in the predictor variable x induces a bias, but noise in the outcome variable y does not. Recall that linear regression is not symmetric: the line of best fit for predicting y from x (the usual linear regression) is not the same as the line of best fit for predicting x from y.[1]
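The asymmetry of the two regression lines can be checked numerically. The sketch below uses simulated data with an assumed slope of 2 and unit-variance noise in y (both arbitrary choices). It fits the line for predicting y from x and the line for predicting x from y, then expresses the second line as a slope in the same (x, y) plane so that the two can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (arbitrary) so the run is repeatable

n = 50_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)  # assumed relationship, for illustration only

cov_xy = np.cov(x, y)[0, 1]

# Usual regression: minimises squared vertical (y) distances.
slope_y_on_x = cov_xy / np.var(x, ddof=1)

# Reverse regression: minimises squared horizontal (x) distances.
slope_x_on_y = cov_xy / np.var(y, ddof=1)

# The reverse line x = a + b*y, drawn in the (x, y) plane, has slope 1/b.
implied_slope_from_reverse = 1.0 / slope_x_on_y

r = np.corrcoef(x, y)[0, 1]

print(f"slope of y on x:                   {slope_y_on_x:.3f}")
print(f"slope implied by x on y:           {implied_slope_from_reverse:.3f}")
print(f"product of the two fitted slopes:  {slope_y_on_x * slope_x_on_y:.3f}")
print(f"squared correlation r^2:           {r**2:.3f}")
```

The product of the two fitted slopes equals the squared correlation coefficient, so the two lines coincide only when the points lie exactly on a straight line (|r| = 1). Otherwise the line for predicting y from x is always the flatter of the two in the (x, y) plane.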

[1] Draper, N.R.; Smith, H. (1998). Applied Regression Analysis (3rd ed.). John Wiley. p. 19. ISBN 0-471-17082-8.