In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case.
For example, to estimate the probability of an aircraft crashing, we could refer to the frequency of crashes among various sets of aircraft: all aircraft, this make of aircraft, aircraft flown by this company in the last ten years, and so on. The aircraft in question is a member of many different classes, and the frequency of crashes differs among them. It is not obvious which class we should refer to for this aircraft. In general, any case is a member of very many classes among which the frequency of the attribute of interest differs. The reference class problem is the question of which of these classes is the most appropriate to use.
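The aircraft example can be made concrete with a small sketch. The flight records below are invented purely for illustration, and the names `RECORDS` and `crash_frequency` are hypothetical. The point is only that one and the same aircraft receives a different estimate depending on the class it is counted in.

```python
from fractions import Fraction

# Hypothetical flight records, invented for illustration only:
# (make, operator, crashed)
RECORDS = [
    ("A", "X", False),
    ("A", "X", False),
    ("A", "X", True),
    ("A", "Y", False),
    ("B", "X", False),
    ("B", "X", False),
    ("B", "Y", False),
    ("B", "Y", True),
    ("B", "Y", True),
    ("B", "Y", False),
]


def crash_frequency(records, make=None, operator=None):
    """Relative frequency of crashes within the chosen reference class.

    A filter left as None places no restriction on that attribute.
    """
    members = [
        crashed
        for (m, o, crashed) in records
        if (make is None or m == make) and (operator is None or o == operator)
    ]
    if not members:
        raise ValueError("empty reference class")
    return Fraction(sum(members), len(members))


# The same aircraft (make "A", operator "X") belongs to all four classes.
all_aircraft = crash_frequency(RECORDS)                      # 3/10
same_make = crash_frequency(RECORDS, make="A")               # 1/4
same_operator = crash_frequency(RECORDS, operator="X")       # 1/5
same_both = crash_frequency(RECORDS, make="A", operator="X")  # 1/3
```

Each of the four values is a legitimate observed frequency, yet they disagree. Narrower classes are more specific to the aircraft but rest on fewer records, which is one reason the choice is not obvious.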
More formally, many arguments in statistics take the form of a statistical syllogism:
1. X proportion of F are G
2. I is an F
3. Therefore, the chance that I is a G is X
F is called the "reference class", G is the "attribute class", and I is the individual object. How is one to choose an appropriate class F?
In Bayesian statistics, the problem arises as that of deciding on a prior probability for the outcome in question (or, when considering multiple outcomes, a prior probability distribution).
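As a rough sketch of how the choice shows up in a Bayesian calculation, assume the prior is a Beta distribution whose parameters are pseudo-counts taken from a candidate reference class. That is one common modelling choice, not the only one. The function name `posterior_mean` and all the counts below are hypothetical. With the standard conjugate Beta-binomial update, two different reference classes lead to two different posteriors from identical evidence.

```python
from fractions import Fraction


def posterior_mean(prior_events, prior_non_events, events, non_events):
    """Posterior mean of a Beta(prior_events, prior_non_events) prior
    after observing binomial data (conjugate Beta-binomial update)."""
    a = prior_events + events
    b = prior_non_events + non_events
    return Fraction(a, a + b)


# Hypothetical pseudo-counts (events, non-events) from two reference classes.
prior_all_aircraft = (3, 7)
prior_same_make = (1, 3)

# Hypothetical new evidence about the case: 0 events in 10 observations.
from_all = posterior_mean(*prior_all_aircraft, events=0, non_events=10)  # 3/20
from_make = posterior_mean(*prior_same_make, events=0, non_events=10)    # 1/14
```

The disagreement between the two posteriors shrinks as more evidence accumulates, but with little data the choice of reference class largely determines the result.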