Co-citation

Co-citation is the frequency with which two documents are cited together by other documents.^[1] If at least one other document cites two documents in common, these documents are said to be co-cited. The more co-citations two documents receive, the higher their co-citation strength and the more likely they are to be semantically related.^[1] Like bibliographic coupling, co-citation is a semantic similarity measure for documents that makes use of citation analysis.

The figure to the right illustrates the concept of co-citation and a more recent variation that accounts for the placement of citations in the full text of documents. The figure's left image shows Documents A and B, which are both cited by Documents C, D and E; thus Documents A and B have a co-citation strength, or co-citation index,^[2] of three. This score is usually established using citation indexes. Documents with high numbers of co-citations are regarded as more similar.^[1]
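The counting in the left image can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cocitation_counts` and the dictionary layout (citing document mapped to the documents it cites) are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count, for each pair of cited documents, how many citing documents cite both.

    `citations` maps a citing document to the documents it cites
    (a hypothetical input format chosen for this sketch).
    """
    counts = Counter()
    for cited in citations.values():
        # Deduplicate and sort so that every pair has one canonical key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Documents C, D and E each cite both A and B, as in the figure's left image.
citations = {"C": ["A", "B"], "D": ["A", "B"], "E": ["A", "B"]}
strength = cocitation_counts(citations)
print(strength[("A", "B")])  # 3
```

Keying each pair by its sorted tuple means that (A, B) and (B, A) are counted as the same pair, which matches the symmetric nature of co-citation strength.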

The figure's right image shows a citing document that cites Documents 1, 2 and 3. Documents 1 and 2, as well as Documents 2 and 3, have a co-citation strength of one, given that each pair is cited together by exactly one other document. However, Documents 2 and 3 are cited in much closer proximity to each other in the citing document than Documents 1 and 2 are. To make co-citation a more meaningful measure in this case, a Co-Citation Proximity Index (CPI) can be introduced to account for the placement of citations relative to each other. Documents co-cited at greater relative distances in the full text receive lower CPI values.^[3] Gipp and Beel were the first to propose using modified co-citation weights based on proximity.^[4]
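One way to picture a proximity-based weight is the following Python sketch. It is only an illustration under stated assumptions: the structural levels, the numeric weights in `WEIGHTS`, the `(section, paragraph, sentence)` position format and both function names are hypothetical choices for this example and are not taken from the cited work. The sketch also assumes that each document is cited only once in the citing document.

```python
from itertools import combinations

# Hypothetical weights: the closer two citations are, the higher the value.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_weight(pos_a, pos_b):
    """Weight two citation positions given as (section, paragraph, sentence)."""
    if pos_a == pos_b:
        return WEIGHTS["sentence"]   # cited in the same sentence
    if pos_a[:2] == pos_b[:2]:
        return WEIGHTS["paragraph"]  # same paragraph, different sentence
    if pos_a[0] == pos_b[0]:
        return WEIGHTS["section"]    # same section, different paragraph
    return WEIGHTS["document"]       # only the citing document is shared


def cpi_for_document(positions):
    """Return a proximity weight for every pair cited by one citing document."""
    return {
        (a, b): proximity_weight(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }


# Documents 2 and 3 are cited in the same sentence; Document 1 is cited elsewhere.
positions = {"Doc1": (0, 0, 0), "Doc2": (2, 1, 3), "Doc3": (2, 1, 3)}
cpi = cpi_for_document(positions)
print(cpi[("Doc2", "Doc3")])  # 1.0
print(cpi[("Doc1", "Doc2")])  # 0.125
```

With plain co-citation both pairs would score one; the proximity weight separates them, so the pair cited close together scores higher.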

Henry Small^[1] and Irina Marshakova^[5] are credited with introducing co-citation analysis in 1973.^[2] The two researchers developed the measure independently, although Marshakova gained less credit, likely because her work was published in Russian.^[6]

Co-citation analysis provides a forward-looking assessment of document similarity, in contrast to bibliographic coupling, which is retrospective.^[7] The citations a paper receives in the future depend on the evolution of an academic field, so co-citation frequencies can still change. In the adjacent diagram, for example, Doc A and Doc B may still be co-cited by future documents, say Doc F and Doc G. Unlike bibliographic coupling, co-citation therefore allows for a dynamic document classification system.
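The contrast can be made concrete with a small Python sketch. It assumes the usual definition of bibliographic coupling strength as the number of references two documents share; the reference lists given for Doc A and Doc B (X, Y and Z) and both function names are hypothetical and exist only for this example.

```python
def cocitation_strength(citations, a, b):
    """Number of documents whose reference lists contain both a and b."""
    return sum(1 for refs in citations.values() if a in refs and b in refs)


def coupling_strength(citations, a, b):
    """Number of references that a and b have in common."""
    return len(set(citations.get(a, ())) & set(citations.get(b, ())))


# Hypothetical corpus: each document maps to the documents it cites.
citations = {
    "A": ["X", "Y"],
    "B": ["Y", "Z"],
    "C": ["A", "B"],
    "D": ["A", "B"],
    "E": ["A", "B"],
}
before = (cocitation_strength(citations, "A", "B"),
          coupling_strength(citations, "A", "B"))

# Later documents F and G also cite both A and B.
citations["F"] = ["A", "B"]
citations["G"] = ["A", "B"]
after = (cocitation_strength(citations, "A", "B"),
         coupling_strength(citations, "A", "B"))

print(before)  # (3, 1)
print(after)   # (5, 1)
```

The coupling value stays fixed because the reference lists of A and B do not change after publication, whereas the co-citation value grows as new citing documents appear.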

Over the decades, researchers have proposed variants of and enhancements to the original co-citation concept. Howard White introduced author co-citation analysis in 1981.^[8] Gipp and Beel proposed Co-citation Proximity Analysis (CPA) and introduced the CPI as an enhancement to the original co-citation concept in 2009.^[3] Co-citation Proximity Analysis considers the proximity of citations within the full text when computing similarity and therefore allows for a more fine-grained assessment of semantic document similarity than pure co-citation.^[9]

[1] Henry G. Small (July 1973). "Co-citation in the scientific literature: A new measure of the relationship between two documents". Journal of the Association for Information Science and Technology. 24 (4): 265–269. doi:10.1002/ASI.4630240406. ISSN 1532-2882. Wikidata Q56679837.
[2] Jeppe Nicolaisen, 2005. "Co-citation", in Birger Hjørland, ed., Core Concepts in Library and Information Science, The Royal School of Library and Information Science (RSLIS), Copenhagen, Denmark.
[3] Bela Gipp and Joeran Beel, 2009. "Citation Proximity Analysis (CPA) – A new approach for identifying related work based on Co-Citation Analysis", in Birger Larsen and Jacqueline Leta, editors, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09), volume 2, pages 571–575, Rio de Janeiro (Brazil), July 2009.
[4] Kevin W. Boyack, Henry Small and Richard Klavans, 2013. "Improving the Accuracy of Co-citation Clustering Using Full Text". Journal of the American Society for Information Science and Technology, Volume 64, Issue 9, pages 1759–1767, September 2013.
[5] Irina Marshakova Shaikevich, 1973. "System of Document Connections Based on References". Scientific and Technical Information Serial of VINITI, 6(2):3–8.
[6] Frank Havemann, 2009. "Einführung in die Bibliometrie". Humboldt University of Berlin.
[7] Eugene Garfield, November 27, 2001. "From Bibliographic Coupling to Co-Citation Analysis Via Algorithmic Historio-Bibliography: A Citationist's Tribute to Belver C. Griffith". Paper presented at Drexel University, Philadelphia, PA.
[8] Howard D. White and Belver C. Griffith, 1981. "Author Cocitation: A Literature Measure of Intellectual Structure". Journal of the American Society for Information Science (JASIS), volume 32(3), pages 163–171, May 1981. doi:10.1002/asi.4630320302. The first author co-citation analysis (ACA) paper.
[9] M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp, 2016. "Evaluating Link-based Recommendations for Wikipedia", in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, pages 191–200.
