Stepwise regression

In statistics, stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure.^[1]^[2]^[3]^[4] In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. Usually, this takes the form of a forward, backward, or combined sequence of F-tests or t-tests.

The frequent practice of fitting the final selected model followed by reporting estimates and confidence intervals without adjusting them to take the model building process into account has led to calls to stop using stepwise model building altogether^[5]^[6] or to at least make sure model uncertainty is correctly reflected by using prespecified, automatic criteria together with more complex standard error estimates that remain unbiased.^[7]^[8]

In this example from engineering, necessity and sufficiency are usually determined by F-tests. For additional consideration, when planning an experiment, computer simulation, or scientific survey to collect data for this model, one must keep in mind the number of parameters, P, to estimate and adjust the sample size accordingly. For K variables, P = 1_(Start) + K_(Stage I) + (K² − K)/2_(Stage II) + 3K_(Stage III) = 0.5K² + 3.5K + 1. For K < 17, an efficient design of experiments exists for this type of model, a Box–Behnken design,^[9] augmented with positive and negative axial points of length min(2, (int(1.5 + K/4))^1/2), plus point(s) at the origin. There are more efficient designs, requiring fewer runs, even for K > 16.

^ Efroymson, M. A. (1960) "Multiple regression analysis," Mathematical Methods for Digital Computers, Ralston A. and Wilf, H. S., (eds.), Wiley, New York.
^ Hocking, R. R. (1976) "The Analysis and Selection of Variables in Linear Regression," Biometrics, 32.
^ Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2d Edition, New York: John Wiley & Sons, Inc.
^ SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc.
^ Flom, P. L. and Cassell, D. L. (2007) "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use," NESUG 2007.
^ Harrell, F. E. (2001) "Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis," Springer-Verlag, New York.
^ Chatfield, C. (1995) "Model uncertainty, data mining and statistical inference," J. R. Statist. Soc. A 158, Part 3, pp. 419–466.
^ Efron, B. and Tibshirani, R. J. (1998) "An introduction to the bootstrap," Chapman & Hall/CRC
^ Box–Behnken designs from a handbook on engineering statistics at NIST

[1] Efroymson, M. A. (1960) "Multiple regression analysis," Mathematical Methods for Digital Computers, Ralston A. and Wilf, H. S., (eds.), Wiley, New York.

[2] Hocking, R. R. (1976) "The Analysis and Selection of Variables in Linear Regression," Biometrics, 32.

[3] Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2d Edition, New York: John Wiley & Sons, Inc.

[4] SAS Institute Inc. (1989) SAS/STAT User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC: SAS Institute Inc.

[Flom2007-5] Flom, P. L. and Cassell, D. L. (2007) "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use," NESUG 2007.

[6] Harrell, F. E. (2001) "Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis," Springer-Verlag, New York.

[Chatfield1995-7] Chatfield, C. (1995) "Model uncertainty, data mining and statistical inference," J. R. Statist. Soc. A 158, Part 3, pp. 419–466.

[8] Efron, B. and Tibshirani, R. J. (1998) "An introduction to the bootstrap," Chapman & Hall/CRC

[9] Box–Behnken designs from a handbook on engineering statistics at NIST

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]