Cross-validation (statistics)

Comparison of the cross-validation accuracy and the percentage of false negatives (overestimation) for five classification models. The size of each bubble represents the standard deviation of the tenfold cross-validation accuracy.[1]
Diagram of k-fold cross-validation

Cross-validation,[2][3][4] sometimes called rotation estimation[5][6][7] or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters.

In a prediction problem, a model is usually given a dataset of known data on which training is run (the training dataset) and a dataset of unknown data (or first-seen data) against which the model is tested (called the validation dataset or testing set).[8][9] The goal of cross-validation is to test the model's ability to predict new data that were not used in estimating it, in order to flag problems such as overfitting or selection bias[10] and to give insight into how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem).
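The difference between error on the training dataset and error on unseen data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic data, the 40/20 split, and the choice of a one-nearest-neighbour predictor, which memorizes its training data and therefore looks perfect on it.

```python
import random

def predict_1nn(train_x, train_y, x):
    """Predict by copying the target of the closest training point."""
    j = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[j]

def mean_squared_error(train_x, train_y, xs, ys):
    """Mean squared error of the 1-NN predictor on the points (xs, ys)."""
    return sum((y - predict_1nn(train_x, train_y, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption): a noisy straight line.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# One random split into a training set and a held-out testing set.
idx = list(range(len(xs)))
rng.shuffle(idx)
train_idx, test_idx = idx[:40], idx[40:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Error on data the model has already seen versus data it has not.
train_error = mean_squared_error(train_x, train_y, train_x, train_y)
test_error = mean_squared_error(train_x, train_y, test_x, test_y)
print("training MSE:", train_error)
print("held-out MSE:", test_error)
```

Because every training point is its own nearest neighbour, the training error here is exactly zero, while the held-out error is not. Only the held-out figure says anything about performance on new data.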

One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance.
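The rounds described above can be sketched in Python for the k-fold case. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names (`k_fold_indices`, `fit_line`, `cross_validate`), the synthetic data, the least-squares line used as the model, and mean squared error as the validation measure are all choices made for the example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x with one predictor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean validation MSE over k rounds and the per-round values."""
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Perform the analysis on the training subset only.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Validate on the complementary subset.
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold) / len(fold)
        scores.append(mse)
    # Combine the validation results by averaging over the rounds.
    return sum(scores) / len(scores), scores

# Synthetic data (an assumption): a noisy straight line.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

mean_mse, fold_mse = cross_validate(xs, ys, k=5)
print("per-round MSE:", fold_mse)
print("cross-validation estimate:", mean_mse)
```

Each observation is used for validation in exactly one round and for training in the other k − 1 rounds. Taking every k-th shuffled index is one simple way to obtain folds whose sizes differ by at most one.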

In summary, cross-validation combines (averages) measures of fit in prediction across rounds to derive a more accurate estimate of a model's prediction performance.[11]

  1. Piryonesi, S. Madeh; El-Diraby, Tamer E. (March 2020). "Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index". Journal of Infrastructure Systems. 26 (1). doi:10.1061/(ASCE)IS.1943-555X.0000512.
  2. Allen, David M. (1974). "The Relationship between Variable Selection and Data Agumentation and a Method for Prediction". Technometrics. 16 (1): 125–127. doi:10.2307/1267500. JSTOR 1267500.
  3. Stone, M. (1974). "Cross-Validatory Choice and Assessment of Statistical Predictions". Journal of the Royal Statistical Society, Series B (Methodological). 36 (2): 111–133. doi:10.1111/j.2517-6161.1974.tb00994.x.
  4. Stone, M. (1977). "An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike's Criterion". Journal of the Royal Statistical Society, Series B (Methodological). 39 (1): 44–47. doi:10.1111/j.2517-6161.1977.tb01603.x. JSTOR 2984877.
  5. Geisser, Seymour (1993). Predictive Inference. New York, NY: Chapman and Hall. ISBN 978-0-412-03471-8.
  6. Kohavi, Ron (20 August 1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection" (PDF). Proceedings of the 14th international joint conference on Artificial intelligence. Vol. 2. Morgan Kaufmann Publishers. pp. 1137–1143. ISBN 978-1-55860-363-9.
  7. Devijver, Pierre A.; Kittler, Josef (1982). Pattern Recognition: A Statistical Approach. London, GB: Prentice-Hall. ISBN 978-0-13-654236-0.
  8. Galkin, Alexander (28 November 2011). "What is the difference between test set and validation set?". Cross Validated. Stack Exchange. Retrieved 10 October 2018.
  9. "Newbie question: Confused about train, validation and test data!". Heaton Research. December 2010. Archived from the original on 14 March 2015. Retrieved 14 November 2013.
  10. Cawley, Gavin C.; Talbot, Nicola L. C. (2010). "On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation" (PDF). Journal of Machine Learning Research. 11: 2079–2107.
  11. Seni, Giovanni; Elder, John F. (January 2010). "Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions". Synthesis Lectures on Data Mining and Knowledge Discovery. 2 (1): 1–126. doi:10.2200/S00240ED1V01Y200912DMK002.