Model selection

Model selection is the task of selecting a model from among various candidates on the basis of a performance criterion, with the aim of choosing the best one.[1] In the context of machine learning and, more generally, statistical analysis, this may be the selection of a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected are well suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).
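
As an illustration of combining a performance criterion with a preference for simplicity, the following Python sketch fits polynomials of increasing degree to hypothetical data, scores each on held-out points, and then picks the lowest degree whose validation error is within a tolerance of the best. The data, the even/odd train/validation split, the names used, and the 10% tolerance are assumptions chosen for illustration. The tolerance rule is one simple way of encoding Occam's razor, not a procedure prescribed by the sources cited here.

```python
"""Sketch: choose a polynomial degree by held-out error, preferring the
simplest adequate model. Standard library only; the data and the
tolerance are hypothetical choices for illustration."""


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c[0..degree] via the normal equations."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[r][i] = x_r ** i.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row + [b] for row, b in zip(ata, aty)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - rest) / m[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def mse(coeffs, xs, ys):
    """Mean squared prediction error on the given points."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_simplest(errors, tol=0.10):
    """Index of the first (simplest) candidate whose error is at most
    (1 + tol) times the best error. Candidates must be ordered from
    simplest to most complex."""
    best = min(errors)
    return next(i for i, err in enumerate(errors) if err <= (1 + tol) * best)


def run_demo(tol=0.10, max_degree=4):
    """Fit degrees 0..max_degree on training points, score on held-out points."""
    xs = [i * 0.1 for i in range(41)]  # 0.0 .. 4.0
    # Hypothetical data: a quadratic trend plus small deterministic "noise".
    ys = [1 + 2 * x - 0.5 * x ** 2 + 0.01 * (((i * 37) % 11) - 5)
          for i, x in enumerate(xs)]
    train_x, train_y = xs[0::2], ys[0::2]  # even-indexed points
    valid_x, valid_y = xs[1::2], ys[1::2]  # odd-indexed points
    errors = [mse(fit_polynomial(train_x, train_y, d), valid_x, valid_y)
              for d in range(max_degree + 1)]
    return errors, select_simplest(errors, tol)


if __name__ == "__main__":
    errors, chosen = run_demo()
    for d, err in enumerate(errors):
        print(f"degree {d}: validation MSE = {err:.5f}")
    print("selected degree:", chosen)
```

Because the hypothetical data follow a quadratic trend, degrees 0 and 1 underfit badly. The tolerance rule then favors the lowest degree whose error is close to the minimum, rather than whichever higher-degree polynomial happens to score marginally better on the held-out points.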

Konishi & Kitagawa (2008, p. 75) state, "The majority of the problems in statistical inference can be considered to be problems related to statistical modeling". Relatedly, Cox (2006, p. 197) has said, "How [the] translation from subject-matter problem to statistical model is done is often the most critical part of an analysis".

Model selection may also refer to the problem of selecting a few representative models from a large set of computational models for the purpose of decision making or optimization under uncertainty.[2]

In machine learning, algorithmic approaches to model selection include feature selection, hyperparameter optimization, and statistical learning theory.
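
Hyperparameter optimization is often carried out by evaluating each candidate setting with k-fold cross-validation and keeping the one with the lowest estimated error. The Python sketch below applies this to the number of neighbors in a k-nearest-neighbors classifier on hypothetical one-dimensional data. The model, the data, the five interleaved folds, the candidate grid, and the function names are assumptions for illustration, and exhaustive grid search is only one of several search strategies.

```python
"""Sketch: hyperparameter optimization by grid search with k-fold
cross-validation. The k-nearest-neighbors model, the data, and the
candidate grid are hypothetical choices for illustration."""
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x
    (vote ties go to the smallest label)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def k_fold_error(data, k_neighbors, n_folds=5):
    """Mean misclassification rate over n_folds train/validation splits.
    Points are assigned to folds by index modulo n_folds."""
    rates = []
    for fold in range(n_folds):
        valid = [p for i, p in enumerate(data) if i % n_folds == fold]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        wrong = sum(knn_predict(train, x, k_neighbors) != y for x, y in valid)
        rates.append(wrong / len(valid))
    return sum(rates) / n_folds


def grid_search(data, candidates, n_folds=5):
    """Return (best candidate, {candidate: cross-validated error}).
    Ties go to the earliest candidate in the list."""
    scores = {k: k_fold_error(data, k, n_folds) for k in candidates}
    best = min(candidates, key=lambda k: scores[k])
    return best, scores


def make_data():
    """Hypothetical data: label 1 for x >= 2.0, with a few labels flipped."""
    flipped = {5, 13, 26, 34}
    return [(i / 10, int(i >= 20) ^ (i in flipped)) for i in range(40)]


if __name__ == "__main__":
    best, scores = grid_search(make_data(), [1, 3, 5, 7, 9])
    for k, err in scores.items():
        print(f"k = {k}: cross-validated error = {err:.3f}")
    print("selected k:", best)
```

Assigning points to folds by index keeps the example deterministic; in practice the data would usually be shuffled before splitting, and the winning setting would then be refit on all of the data.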

  1. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). The Elements of Statistical Learning. Springer. p. 195.
  2. Shirangi, Mehrdad G.; Durlofsky, Louis J. (2016). "A general method to select representative models for decision making and optimization under uncertainty". Computers & Geosciences. 96: 109–123. Bibcode:2016CG.....96..109S. doi:10.1016/j.cageo.2016.08.002.