Energy distance

Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in R^d with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the distributions F and G is defined to be the square root of

D^{2}(F,G)=2\operatorname {E} \|X-Y\|-\operatorname {E} \|X-X'\|-\operatorname {E} \|Y-Y'\|\geq 0,

where (X, X', Y, Y') are independent, the cdf of X and X' is F, the cdf of Y and Y' is G, $\operatorname {E}$ is the expected value, and || . || denotes the length of a vector. Energy distance satisfies all axioms of a metric thus energy distance characterizes the equality of distributions: D(F,G) = 0 if and only if F = G. Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables $D^{2}(F,G)$ is exactly twice Harald Cramér's distance:^[1]

\int _{-\infty }^{\infty }(F(x)-G(x))^{2}\,dx.

For a simple proof of this equivalence, see Székely (2002).^[2]

In higher dimensions, however, the two distances are different because the energy distance is rotation invariant while Cramér's distance is not. (Notice that Cramér's distance is not the same as the distribution-free Cramér–von Mises criterion.)

^ Cramér, H. (1928) On the composition of elementary errors, Skandinavisk Aktuarietidskrift, 11, 141–180.
^ E-Statistics: The energy of statistical samples (2002) PDF Archived 2016-04-20 at the Wayback Machine

[1] Cramér, H. (1928) On the composition of elementary errors, Skandinavisk Aktuarietidskrift, 11, 141–180.

[2] E-Statistics: The energy of statistical samples (2002) PDF Archived 2016-04-20 at the Wayback Machine

[1]

[2]