Dirichlet process

In probability theory, Dirichlet processes (named after the distribution associated with Peter Gustav Lejeune Dirichlet) are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe prior knowledge about the distribution of random variables, that is, how likely it is that the random variables are distributed according to one particular distribution or another.

As an example, a bag of 100 real-world dice can be viewed as a random probability mass function (random pmf): to sample this random pmf, you put your hand in the bag and draw out a die, that is, you draw a pmf. A bag of dice manufactured using a crude process 100 years ago will likely have probabilities that deviate wildly from the uniform pmf, whereas a bag of state-of-the-art dice used by Las Vegas casinos may have barely perceptible imperfections. We can model the randomness of pmfs with the Dirichlet distribution.^[1]
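The dice example can be made concrete with a small simulation. The sketch below is only an illustration: it draws each die's pmf from a symmetric six-dimensional Dirichlet distribution by normalizing independent gamma variates (a standard sampling method), and the parameter values 2 and 2000 are hypothetical choices standing in for the "crude" and "casino" bags rather than values taken from the cited source.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one pmf from Dirichlet(alphas) by normalizing Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def max_deviation_from_uniform(pmf):
    """Largest absolute gap between any face probability and 1/len(pmf)."""
    k = len(pmf)
    return max(abs(p - 1.0 / k) for p in pmf)

rng = random.Random(0)

# Hypothetical parameters: small values give dice far from fair,
# large values give dice tightly clustered around the uniform pmf.
crude_bag = [sample_dirichlet([2.0] * 6, rng) for _ in range(100)]
casino_bag = [sample_dirichlet([2000.0] * 6, rng) for _ in range(100)]

# Average worst-case deviation from a fair die in each bag.
crude_spread = sum(max_deviation_from_uniform(p) for p in crude_bag) / 100
casino_spread = sum(max_deviation_from_uniform(p) for p in casino_bag) / 100
print(crude_spread, casino_spread)
```

Both bags have the uniform pmf as their mean; only the size of the Dirichlet parameters changes how far individual dice stray from it.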

The Dirichlet process is specified by a base distribution $H$ and a positive real number $\alpha$ called the concentration parameter (also known as the scaling parameter). The base distribution is the expected value of the process, i.e., the Dirichlet process draws distributions "around" the base distribution the way a normal distribution draws real numbers around its mean. However, even if the base distribution is continuous, the distributions drawn from the Dirichlet process are almost surely discrete. The concentration parameter specifies how strong this discretization is: in the limit of $\alpha \rightarrow 0$, the realizations are all concentrated at a single value, while in the limit of $\alpha \rightarrow \infty$ the realizations approach the base distribution $H$ (and so become continuous if $H$ is). Between the two extremes the realizations are discrete distributions whose mass is spread over more and more values as $\alpha$ increases.
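The effect of $\alpha$ on discreteness can be seen with the stick-breaking construction, a standard way of representing a draw from a Dirichlet process that is not described in the text above. The following is a minimal sketch under stated assumptions: the function name `sample_dp_stick_breaking`, the truncation tolerance `tol`, the cap `max_atoms` and the choice of a standard normal base distribution are all illustrative, and the truncation makes the result an approximation rather than an exact draw.

```python
import random

def sample_dp_stick_breaking(alpha, base_sampler, rng, tol=1e-6, max_atoms=100_000):
    """Approximate one draw from DP(H, alpha) as (atoms, weights).

    Repeatedly breaks off a Beta(1, alpha) fraction of the remaining stick
    and attaches it to a fresh atom drawn from the base distribution H.
    Stops once the leftover stick is below tol (an assumed truncation rule).
    """
    atoms, weights = [], []
    remaining = 1.0
    while remaining > tol and len(atoms) < max_atoms:
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - frac
    # Fold the truncated leftover into the last atom so the weights sum to 1.
    weights[-1] += remaining
    return atoms, weights

rng = random.Random(0)
standard_normal = lambda r: r.gauss(0.0, 1.0)  # assumed base distribution H

for alpha in (0.1, 1.0, 100.0):
    atoms, weights = sample_dp_stick_breaking(alpha, standard_normal, rng)
    # Number of atoms kept and the largest single weight in this realization.
    print(alpha, len(atoms), round(max(weights), 3))
```

For small $\alpha$ the first few atoms typically carry almost all of the mass, while for large $\alpha$ the mass is spread over many atoms drawn from $H$, so a histogram of the realization resembles $H$ even though it remains discrete.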

The Dirichlet process can also be seen as the infinite-dimensional generalization of the Dirichlet distribution. In the same way as the Dirichlet distribution is the conjugate prior for the categorical distribution, the Dirichlet process is the conjugate prior for infinite, nonparametric discrete distributions. A particularly important application of Dirichlet processes is as a prior probability distribution in infinite mixture models.
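Conjugacy here means that after observing $n$ values $x_1, \dots, x_n$ drawn from a realization of the process, the posterior is again a Dirichlet process, with concentration $\alpha + n$ and a base distribution that mixes $H$ with point masses at the observed values. The sketch below illustrates the resulting predictive rule for the next observation; the function name `dp_posterior_predictive` is hypothetical, and the example assumes a continuous $H$, so that a fresh draw from $H$ almost surely differs from every value seen so far.

```python
from collections import Counter

def dp_posterior_predictive(observations, alpha):
    """Predictive probabilities for the next draw under a DP(H, alpha) prior.

    Returns (repeat_probs, new_value_prob):
      repeat_probs[v] = count(v) / (alpha + n)  probability of repeating value v
      new_value_prob  = alpha / (alpha + n)     probability of a fresh draw from H
    """
    n = len(observations)
    counts = Counter(observations)
    repeat_probs = {value: c / (alpha + n) for value, c in counts.items()}
    new_value_prob = alpha / (alpha + n)
    return repeat_probs, new_value_prob

# Four observations, three of which coincide; alpha = 1 is an arbitrary choice.
repeat_probs, new_value_prob = dp_posterior_predictive([2.5, 2.5, 7.1, 2.5], alpha=1.0)
print(repeat_probs, new_value_prob)
```

With these inputs the next observation repeats 2.5 with probability 3/5, repeats 7.1 with probability 1/5, and is a new value from $H$ with probability 1/5. The update mirrors the Dirichlet-categorical case, where observed counts are simply added to the prior parameters.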

The Dirichlet process was formally introduced by Thomas S. Ferguson in 1973.^[2] It has since been applied in data mining and machine learning, including natural language processing, computer vision and bioinformatics.

[1] Frigyik, Bela A.; Kapila, Amol; Gupta, Maya R. "Introduction to the Dirichlet Distribution and Related Processes" (PDF). Retrieved 2 September 2021.
[2] Ferguson, Thomas (1973). "Bayesian analysis of some nonparametric problems". Annals of Statistics. 1 (2): 209–230. doi:10.1214/aos/1176342360. MR 0350949.
