Co-citation

Documents A and B are cited by Documents C, D and E; hence, Documents A and B exhibit a co-citation strength of three. A more recent refinement of co-citation takes into account the placement of citations within the citing document.
Figure visualizing co-citation on the left and a refinement of co-citation, Co-citation Proximity Analysis (CPA), on the right.

Co-citation is the frequency with which two documents are cited together by other documents.[1] If at least one other document cites two documents in common, these documents are said to be co-cited. The more co-citations two documents receive, the higher their co-citation strength, and the more likely they are semantically related.[1] Like bibliographic coupling, co-citation is a semantic similarity measure for documents that makes use of citation analyses.

The figure illustrates the concept of co-citation and a more recent variation that accounts for the placement of citations in the full text of documents. The figure's left image shows Documents A and B, which are both cited by Documents C, D and E; thus Documents A and B have a co-citation strength, or co-citation index,[2] of three. This score is usually established using citation indexes. Documents with high numbers of co-citations are regarded as more similar.[1]
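
The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `co_citation_counts` and the dictionary layout (citing document mapped to the list of documents it cites) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def co_citation_counts(references):
    """Map each unordered pair of cited documents to its co-citation strength.

    `references` maps a citing document to the documents it cites
    (a hypothetical input layout chosen for this sketch).
    """
    counts = Counter()
    for cited in references.values():
        # set() ignores repeated citations of the same document;
        # sorted() gives every pair a stable (alphabetical) order.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# The situation in the figure's left image: C, D and E each cite A and B.
references = {"C": ["A", "B"], "D": ["A", "B"], "E": ["A", "B"]}
strength = co_citation_counts(references)
print(strength[("A", "B")])  # 3
```

Each citing document contributes at most one count per pair, so the result for A and B matches the co-citation strength of three shown in the figure.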


The figure's right image shows a citing document that cites Documents 1, 2 and 3. The pair of Documents 1 and 2 and the pair of Documents 2 and 3 each have a co-citation strength of one, given that each pair is cited together by exactly one other document. However, Documents 2 and 3 are cited in much closer proximity to each other in the citing document than either is to Document 1. To make co-citation a more meaningful measure in this case, a Co-Citation Proximity Index (CPI) can be introduced to account for the placement of citations relative to each other. Documents co-cited at greater relative distances in the full text receive lower CPI values.[3] Gipp and Beel were the first to propose using modified co-citation weights based on proximity.[4]
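
The idea of a proximity-weighted score can be sketched as follows. This is only an illustration under stated assumptions: the weight values, the position format `(section, paragraph, sentence)`, and the names `WEIGHTS`, `proximity_level` and `cpi_for_document` are hypothetical choices for this example and are not taken from the cited work. The only property carried over from the text is that citations placed farther apart receive a lower value.

```python
from itertools import combinations

# Illustrative weights (assumed for this sketch): the smaller the shared
# structural unit of two citations, the higher the proximity value.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_level(pos_a, pos_b):
    """Return the smallest structural unit shared by two citation positions.

    Each position is a (section, paragraph, sentence) tuple of indexes.
    """
    if pos_a == pos_b:
        return "sentence"
    if pos_a[:2] == pos_b[:2]:
        return "paragraph"
    if pos_a[0] == pos_b[0]:
        return "section"
    return "document"


def cpi_for_document(citations):
    """Score every pair of documents cited by one citing document.

    `citations` maps a cited document to the positions at which it is cited.
    If a document is cited several times, the closest pair of positions counts.
    """
    scores = {}
    for a, b in combinations(sorted(citations), 2):
        scores[(a, b)] = max(
            WEIGHTS[proximity_level(pa, pb)]
            for pa in citations[a]
            for pb in citations[b]
        )
    return scores


# Mirrors the figure's right image: Documents 2 and 3 are cited in the same
# sentence, while Document 1 is cited in a different section.
citations = {
    "Doc1": [(0, 0, 0)],
    "Doc2": [(2, 1, 3)],
    "Doc3": [(2, 1, 3)],
}
scores = cpi_for_document(citations)
print(scores[("Doc2", "Doc3")] > scores[("Doc1", "Doc2")])  # True
```

Under plain co-citation all three pairs would score one; with the assumed weights, the pair cited in the same sentence scores higher than the pairs that involve Document 1.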

Henry Small[1] and Irina Marshakova[5] are credited with introducing co-citation analysis in 1973.[2] The two researchers developed the measure independently, although Marshakova received less credit, likely because her work was published in Russian.[6]

Co-citation analysis provides a forward-looking assessment of document similarity, in contrast to bibliographic coupling, which is retrospective.[7] The citations a paper receives in the future depend on the evolution of an academic field, so co-citation frequencies can still change. In the figure, for example, Documents A and B may still be co-cited by future documents, say Documents F and G. This characteristic of co-citation allows for a more dynamic document classification system than bibliographic coupling does.
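
The contrast can be made concrete with a small sketch. The helper names `bibliographic_coupling` and `co_citation`, and the references X, Y and Z assigned to Documents A and B, are hypothetical and exist only for this example. Bibliographic coupling is taken here in its usual sense, the number of references two documents share.

```python
def bibliographic_coupling(references, a, b):
    """Number of references that documents a and b have in common."""
    return len(set(references.get(a, [])) & set(references.get(b, [])))


def co_citation(references, a, b):
    """Number of documents that cite both a and b."""
    return sum(1 for cited in references.values() if a in cited and b in cited)


# Citing document -> documents it cites. X, Y and Z are made-up references.
references = {
    "A": ["X", "Y"],
    "B": ["Y", "Z"],
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["A", "B"],
}
coupling_before = bibliographic_coupling(references, "A", "B")
cocitation_before = co_citation(references, "A", "B")

# Later publications F and G also cite both A and B.
references["F"] = ["A", "B"]
references["G"] = ["A", "B"]
coupling_after = bibliographic_coupling(references, "A", "B")
cocitation_after = co_citation(references, "A", "B")

print(coupling_before, coupling_after)      # 1 1
print(cocitation_before, cocitation_after)  # 3 5
```

The coupling value depends only on the reference lists of A and B, which are fixed once both are published, whereas the co-citation value grows as new citing documents appear.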

Over the decades, researchers have proposed variants of and enhancements to the original co-citation concept. Howard White and Belver Griffith introduced author co-citation analysis in 1981.[8] In 2009, Gipp and Beel proposed Co-citation Proximity Analysis (CPA) and introduced the CPI as an enhancement to the original concept.[3] CPA considers the proximity of citations within the full text when computing similarity and therefore allows a more fine-grained assessment of semantic document similarity than pure co-citation.[9]

  1. Henry G. Small (July 1973). "Co-citation in the scientific literature: A new measure of the relationship between two documents". Journal of the Association for Information Science and Technology. 24 (4): 265–269. doi:10.1002/ASI.4630240406. ISSN 1532-2882. Wikidata Q56679837.
  2. Jeppe Nicolaisen, 2005. "Co-citation", in Birger Hjørland, ed., Core Concepts in Library and Information Science, The Royal School of Library and Information Science (RSLIS), Copenhagen, Denmark.
  3. Bela Gipp and Joeran Beel, 2009. "Citation Proximity Analysis (CPA) – A new approach for identifying related work based on Co-Citation Analysis", in Birger Larsen and Jacqueline Leta, editors, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), volume 2, pages 571–575, Rio de Janeiro (Brazil), July 2009.
  4. Kevin W. Boyack, Henry Small and Richard Klavans, 2013. "Improving the Accuracy of Co-citation Clustering Using Full Text". Journal of the American Society for Information Science and Technology, volume 64, issue 9, pages 1759–1767, September 2013.
  5. Irena Marshakova Shaikevich, 1973. "System of Document Connections Based on References". Scientific and Technical Information Serial of VINITI, 6(2):3–8.
  6. Frank Havemann, 2009. "Einführung in die Bibliometrie". Humboldt University of Berlin.
  7. Garfield, E., November 27, 2001. "From Bibliographic Coupling to Co-Citation Analysis Via Algorithmic Historio-Bibliography: A Citationist’s Tribute to Belver C. Griffith". Paper presented at Drexel University, Philadelphia, PA.
  8. Howard D. White and Belver C. Griffith, 1981. "Author Cocitation: A Literature Measure of Intellectual Structure". Journal of the American Society for Information Science (JASIS), volume 32(3), pages 163–171, May 1981. doi:10.1002/asi.4630320302. The first ACA paper.
  9. M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp, 2016. "Evaluating Link-based Recommendations for Wikipedia", in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, pages 191–200.