Multiple comparisons problem

An example of coincidence produced by data dredging (uncorrected multiple comparisons) showing a correlation between the number of letters in a spelling bee's winning word and the number of people in the United States killed by venomous spiders. Given a large enough pool of variables for the same time period, it is possible to find a pair of graphs that show a spurious correlation.
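The caption's claim can be checked with a small simulation. The sketch below assumes 200 independent series of pure Gaussian noise, each with 10 points (standing in for ten yearly observations). It is not the spelling-bee or spider data. It searches every pair for the strongest correlation. The helper names `pearson` and `dredge` are hypothetical and written only for this illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def dredge(num_series=200, length=10, seed=0):
    """Return (r, i, j) for the most strongly correlated pair of noise series.

    Every series is independent noise, so any correlation found is spurious."""
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(num_series)]
    best = (0.0, -1, -1)
    # Exhaustively compare all num_series * (num_series - 1) / 2 pairs.
    for i in range(num_series):
        for j in range(i + 1, num_series):
            r = pearson(series[i], series[j])
            if abs(r) > abs(best[0]):
                best = (r, i, j)
    return best

r, i, j = dredge()
print(f"series {i} and {j}: r = {r:.2f}, best of {200 * 199 // 2} pairs")
```

With roughly 20,000 pairs and only 10 points per series, the best pair typically shows |r| well above 0.8, even though no real relationship exists.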

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously[1] or estimates a subset of parameters selected based on the observed values.[2]
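The second case, estimating a parameter chosen because of its observed value, can be illustrated with a minimal simulation. It assumes 50 parameters whose true values are all 0, each estimated with independent N(0, 1) noise. Averaged over many repetitions, a parameter fixed in advance is estimated without bias. The parameter picked for having the largest estimate is overestimated systematically. The function name `selection_demo` is hypothetical.

```python
import random
import statistics

def selection_demo(num_params=50, reps=2000, seed=1):
    """Compare a pre-specified estimate with the estimate selected as the largest.

    All true parameter values are 0; each estimate is 0 plus N(0, 1) noise."""
    rng = random.Random(seed)
    prespecified, selected = [], []
    for _ in range(reps):
        estimates = [rng.gauss(0.0, 1.0) for _ in range(num_params)]
        prespecified.append(estimates[0])   # chosen before seeing the data
        selected.append(max(estimates))     # chosen based on the observed values
    return statistics.mean(prespecified), statistics.mean(selected)

pre, sel = selection_demo()
print(f"pre-specified: {pre:.2f}, selected: {sel:.2f} (true value 0)")
```

The pre-specified average stays near 0, while the selected average lands far above it. This gap is the selection effect that selective inference aims to correct.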

The larger the number of inferences made, the more likely erroneous inferences become. Several statistical techniques have been developed to address this problem, for example, by requiring a stricter significance threshold for individual comparisons to compensate for the number of inferences being made. Methods that control the family-wise error rate (FWER) bound the probability of making at least one false positive among all the comparisons in a family.
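As a worked example, suppose m independent tests are run on true null hypotheses, each at level α. The chance of at least one false positive is then FWER = 1 − (1 − α)^m. The sketch below computes this and applies the Bonferroni correction, one of the simplest stricter thresholds. It tests each comparison at α/m, which keeps the FWER at or below α by the union bound, whether or not the tests are independent. The independence assumption applies only to the FWER formula. The function names are hypothetical.

```python
def family_wise_error_rate(alpha, m):
    """P(at least one false positive) for m independent tests of true nulls at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / m

def bonferroni_reject(p_values, alpha=0.05):
    """Return a reject/keep decision for each p-value at family-wise level alpha."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

print(family_wise_error_rate(0.05, 20))
print(family_wise_error_rate(bonferroni_threshold(0.05, 20), 20))
print(bonferroni_reject([0.001, 0.01, 0.04]))
```

With 20 independent tests at the 0.05 level, the FWER is about 0.64. Testing each at 0.05/20 = 0.0025 brings it down to about 0.049. For three p-values at α = 0.05, the per-test threshold is about 0.0167, so 0.04 is no longer declared significant.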

  1. Miller, R. G. (1981). Simultaneous Statistical Inference (2nd ed.). New York: Springer-Verlag. ISBN 978-0-387-90548-8.
  2. Benjamini, Y. (2010). "Simultaneous and selective inference: Current successes and future challenges". Biometrical Journal. 52 (6): 708–721. doi:10.1002/bimj.200900299. PMID 21154895. S2CID 8806192.