Correspondence analysis

Correspondence analysis (CA) is a multivariate statistical technique proposed^[1] by Herman Otto Hartley (Hirschfeld)^[2] and later developed by Jean-Paul Benzécri.^[3] It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. Like principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form. Its aim is to display in a biplot any structure hidden in the multivariate setting of the data table. As such, it is a technique from the field of multivariate ordination. Since the variant of CA described here can be applied with a focus on either the rows or the columns, it should in fact be called simple (symmetric) correspondence analysis.^[4]

It is traditionally applied to the contingency table of a pair of nominal variables, where each cell contains either a count or a zero value. If more than two categorical variables are to be summarised, a variant called multiple correspondence analysis should be chosen instead. CA may also be applied to binary data, provided that the presence/absence coding represents simplified count data, i.e. a 1 denotes a positive count and a 0 a count of zero. Depending on the scores used, CA preserves the chi-square distance^[5]^[6] between either the rows or the columns of the table. Because CA is a descriptive technique, it can be applied to a table regardless of whether its chi-squared test is significant.^[7]^[8] Although the $\chi ^{2}$ statistic used in inferential statistics and the chi-square distance are computationally related, they should not be confused: the latter works as a multivariate statistical distance measure in CA, while the $\chi ^{2}$ statistic is a scalar, not a metric.^[9]
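The distinction between the two quantities can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a made-up 3×3 contingency table; the counts and all variable names are illustrative assumptions and do not come from the cited sources. It computes the chi-square distances between row profiles, which form a matrix of pairwise distances, and the $\chi ^{2}$ statistic, which is a single number. The two are linked through the mass-weighted sum of squared chi-square distances from each row profile to the average profile (often called the total inertia), which equals the statistic divided by the grand total of the table.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts, not from any source).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n = N.sum()                      # grand total of the table
P = N / n                        # table of relative frequencies
r = P.sum(axis=1)                # row masses
c = P.sum(axis=0)                # column masses (also the average row profile)
row_profiles = P / r[:, None]    # each row rescaled to sum to 1

# Chi-square distance between every pair of row profiles:
# a Euclidean distance in which each squared difference is divided
# by the corresponding column mass.
diff = row_profiles[:, None, :] - row_profiles[None, :, :]
D = np.sqrt((diff ** 2 / c).sum(axis=2))

# Pearson chi-squared statistic of the same table: one scalar.
E = n * np.outer(r, c)           # expected counts under independence
chi2_stat = ((N - E) ** 2 / E).sum()

# Link between the two: mass-weighted squared chi-square distances
# from each row profile to the average profile sum to chi2_stat / n.
dist_to_average = np.sqrt(((row_profiles - c) ** 2 / c).sum(axis=1))
total_inertia = (r * dist_to_average ** 2).sum()

print("pairwise chi-square distances between rows:")
print(D)
print("chi-squared statistic:", chi2_stat)
print("total inertia:", total_inertia, "chi2 / n:", chi2_stat / n)
```

In this sketch `D` behaves like a distance matrix (symmetric, with a zero diagonal), whereas `chi2_stat` summarises the whole table in one value.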

[1] Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-850994-4.

[2] Hirschfeld, H.O. (1935). "A connection between correlation and contingency". Proc. Cambridge Philosophical Society, 31, 520–524.

[3] Benzécri, J.-P. (1973). L'Analyse des Données. Volume II. L'Analyse des Correspondances. Paris, France: Dunod.

[4] Beh, Eric; Lombardo, Rosaria (2014). Correspondence Analysis. Theory, Practice and New Strategies. Chichester: Wiley. p. 120. ISBN 978-1-119-95324-1.

[5] Greenacre, Michael (2007). Correspondence Analysis in Practice. Boca Raton: CRC Press. p. 204. ISBN 9781584886167.

[6] Legendre, Pierre; Legendre, Louis (2012). Numerical Ecology. Amsterdam: Elsevier. p. 465. ISBN 978-0-444-53868-0.

[7] Greenacre, Michael (1983). Theory and Applications of Correspondence Analysis. London: Academic Press. ISBN 0-12-299050-1.

[8] Greenacre, Michael (2007). Correspondence Analysis in Practice, Second Edition. London: Chapman & Hall/CRC.

[9] Greenacre, Michael (2017). Correspondence Analysis in Practice (3rd ed.). Boca Raton: CRC Press. pp. 26–29. ISBN 9781498731775.