Correspondence analysis

Correspondence analysis (CA) is a multivariate statistical technique proposed[1] by Herman Otto Hartley (Hirschfeld)[2] and later developed by Jean-Paul Benzécri.[3] It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. Like principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form. Its aim is to display in a biplot any structure hidden in the multivariate setting of the data table. As such, it is a technique from the field of multivariate ordination. Since the variant of CA described here can be applied with a focus on either the rows or the columns, it should in fact be called simple (symmetric) correspondence analysis.[4]
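
As an illustration of how a two-dimensional display can be obtained, the following is a minimal sketch of simple CA computed through a singular value decomposition of the standardized residuals, which is one common formulation. The function name simple_ca and the table of counts are hypothetical and chosen only for this example; NumPy is assumed to be available.

```python
import numpy as np

def simple_ca(table):
    """Sketch of simple CA for a two-way table of non-negative counts."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                    # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals from the independence model r c^T
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]      # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # column principal coordinates
    return rows, cols, sv ** 2                 # squared singular values: principal inertias

# Hypothetical 4 x 3 contingency table, for illustration only
counts = [[20, 10, 5],
          [8, 15, 12],
          [3, 9, 25],
          [10, 10, 10]]

rows, cols, inertias = simple_ca(counts)
row_xy, col_xy = rows[:, :2], cols[:, :2]          # coordinates for a 2-D plot
explained = inertias[:2].sum() / inertias.sum()    # share of inertia shown in 2-D
```

Plotting row_xy and col_xy in the same plane gives the usual two-dimensional map. Both sets are in principal coordinates here; other scalings of the scores are possible and change how the display is read.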

It is traditionally applied to the contingency table of a pair of nominal variables, where each cell contains either a count or a zero value. If more than two categorical variables are to be summarized, a variant called multiple correspondence analysis should be chosen instead. CA may also be applied to binary data, provided the presence/absence coding represents simplified count data, i.e. a 1 describes a positive count and a 0 stands for a count of zero. Depending on the scores used, CA preserves the chi-square distance[5][6] between either the rows or the columns of the table. Because CA is a descriptive technique, it can be applied to tables regardless of whether the chi-squared test is significant.[7][8] Although the chi-squared statistic used in inferential statistics and the chi-square distance are computationally related, they should not be confused: the latter works as a multivariate statistical distance measure in CA, while the chi-squared statistic is a scalar, not a metric.[9]
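
The difference between the two quantities can be made concrete with a short sketch. It computes the chi-square distance between row profiles (a matrix of pairwise distances) and the chi-squared statistic of the same table (a single number), and shows one way in which they are related: the statistic divided by the grand total equals the mass-weighted sum of squared chi-square distances from each row profile to the average profile. The table and the helper name chi2_distance are hypothetical; NumPy is assumed.

```python
import numpy as np

# Hypothetical 4 x 3 contingency table, for illustration only
counts = np.array([[20, 10, 5],
                   [8, 15, 12],
                   [3, 9, 25],
                   [10, 10, 10]], dtype=float)

n = counts.sum()
row_mass = counts.sum(axis=1) / n
col_mass = counts.sum(axis=0) / n                        # also the average row profile
profiles = counts / counts.sum(axis=1, keepdims=True)    # row profiles (each sums to 1)

def chi2_distance(a, b, weights):
    """Euclidean distance between profiles, each term divided by the column mass."""
    return np.sqrt(np.sum((a - b) ** 2 / weights))

# Pairwise chi-square distances between rows: a distance matrix
dist = np.array([[chi2_distance(p, q, col_mass) for q in profiles]
                 for p in profiles])

# Pearson chi-squared statistic of the table: a single scalar
expected = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
chi2_stat = np.sum((counts - expected) ** 2 / expected)

# Total inertia: mass-weighted squared distances of row profiles to the centroid
inertia = sum(m * chi2_distance(p, col_mass, col_mass) ** 2
              for m, p in zip(row_mass, profiles))
```

Here inertia agrees with chi2_stat / n up to rounding, whereas dist holds one distance per pair of rows, which is the quantity a CA display aims to preserve.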

  1. ^ Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-850994-4.
  2. ^ Hirschfeld, H.O. (1935). "A connection between correlation and contingency". Proc. Cambridge Philosophical Society, 31, 520–524.
  3. ^ Benzécri, J.-P. (1973). L'Analyse des Données. Volume II. L'Analyse des Correspondances. Paris, France: Dunod.
  4. ^ Beh, Eric; Lombardo, Rosaria (2014). Correspondence Analysis. Theory, Practice and New Strategies. Chichester: Wiley. p. 120. ISBN 978-1-119-95324-1.
  5. ^ Greenacre, Michael (2007). Correspondence Analysis in Practice. Boca Raton: CRC Press. p. 204. ISBN 9781584886167.
  6. ^ Legendre, Pierre; Legendre, Louis (2012). Numerical Ecology. Amsterdam: Elsevier. p. 465. ISBN 978-0-444-53868-0.
  7. ^ Greenacre, Michael (1983). Theory and Applications of Correspondence Analysis. London: Academic Press. ISBN 0-12-299050-1.
  8. ^ Greenacre, Michael (2007). Correspondence Analysis in Practice (2nd ed.). London: Chapman & Hall/CRC.
  9. ^ Greenacre, Michael (2017). Correspondence Analysis in Practice (3rd ed.). Boca Raton: CRC Press. pp. 26–29. ISBN 9781498731775.