In probability theory and statistics, the mathematical concepts of covariance and correlation are very similar.[1][2] Both describe the degree to which two random variables or sets of random variables tend to deviate from their expected values in similar ways.
If X and Y are two random variables, with means (expected values) μX and μY and standard deviations σX and σY, respectively, then their covariance and correlation are as follows:

$$\operatorname{cov}(X,Y) = \operatorname{E}\left[(X-\mu_X)(Y-\mu_Y)\right]$$

$$\operatorname{corr}(X,Y) = \frac{\operatorname{E}\left[(X-\mu_X)(Y-\mu_Y)\right]}{\sigma_X \sigma_Y},$$

so that

$$\operatorname{corr}(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y},$$

where E is the expected value operator. Notably, correlation is dimensionless while covariance is in units obtained by multiplying the units of the two variables.
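As a quick numerical check of the relationship above, the following minimal NumPy sketch (the data are synthetic and purely illustrative, not from the article) estimates both quantities from a sample and confirms that the correlation is just the covariance rescaled by the two standard deviations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two correlated samples (illustrative data only).
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(scale=0.8, size=10_000)

# Covariance: E[(X - mu_X)(Y - mu_Y)], estimated from the sample.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# Correlation: the covariance divided by the product of standard deviations.
corr_xy = cov_xy / (x.std() * y.std())

print(cov_xy)                   # in units of X times units of Y
print(corr_xy)                  # dimensionless, between -1 and 1
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in estimate agrees
```

Note that the normalization constant cancels in the ratio, which is why the manual estimate matches `np.corrcoef` regardless of whether the sample covariance is computed with 1/n or 1/(n−1).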
If Y always takes on the same values as X, we have the covariance of a variable with itself (i.e. cov(X, X)), which is called the variance and is more commonly denoted as σX², the square of the standard deviation. The correlation of a variable with itself is always 1 (except in the degenerate case where the variance is zero because X always takes on the same single value, in which case the correlation does not exist, since its computation would involve division by 0). More generally, the correlation between two variables is 1 (or −1) if one of them always takes on a value that is given exactly by a linear function of the other with, respectively, a positive (or negative) slope, as illustrated in the sketch below.
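Both facts can be verified numerically. A small sketch with synthetic NumPy data (illustrative only; the slope and intercept values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)

# The covariance of a variable with itself is its variance.
cov_xx = np.mean((x - x.mean()) ** 2)
print(np.isclose(cov_xx, x.var()))  # True

# An exact linear function of x: the correlation is +1 or -1,
# depending only on the sign of the slope.
y_pos = 3.0 * x + 2.0   # positive slope -> correlation +1
y_neg = -3.0 * x + 2.0  # negative slope -> correlation -1
print(np.corrcoef(x, y_pos)[0, 1])  # ~ +1.0
print(np.corrcoef(x, y_neg)[0, 1])  # ~ -1.0
```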
Although the values of the theoretical covariances and correlations are linked in the above way, the probability distributions of the sample estimates of these quantities are not linked in any simple way, and they generally need to be treated separately.
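To see this concretely, one can simulate: draw many small samples from a fixed bivariate normal population and compare the sampling behavior of the sample covariance and the sample correlation. A sketch under assumed, arbitrary parameter choices (sample size, population correlation, and replication count are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Repeatedly draw small samples from the same bivariate normal
# population and record both sample estimates each time.
n, reps, rho = 20, 5_000, 0.6
pop_cov = [[1.0, rho], [rho, 1.0]]
covs, corrs = [], []
for _ in range(reps):
    x, y = rng.multivariate_normal([0.0, 0.0], pop_cov, size=n).T
    covs.append(np.cov(x, y)[0, 1])
    corrs.append(np.corrcoef(x, y)[0, 1])

# The two estimators have differently shaped sampling distributions:
# the sample correlation is bounded in [-1, 1] and skewed near the
# endpoints, while the sample covariance is unbounded.
print(np.mean(covs), np.std(covs))
print(np.mean(corrs), np.std(corrs))
```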