Part of a series on |
Citation metrics |
---|
Co-citation Proximity Analysis (CPA) is a document similarity measure that uses citation analysis to assess semantic similarity between documents at both the global document level as well as at individual section-level.[1][2] The similarity measure builds on the co-citation analysis approach, but differs in that it exploits the information implied in the placement of citations within the full-texts of documents.
Co-citation Proximity Analysis was conceived by B. Gipp in 2006[3] and the description of the document similarity measure was later published by Gipp and Beel in 2009.[1] The similarity measure rests on the assumption that within a document’s full-text, the documents cited in close proximity to each other tend to be more strongly related than those documents cited farther apart. The figure to the right illustrates the concept. The CPA approach to document similarity assumes the documents B and C to be more strongly related than the documents B and A, because the citations to B and C occur within the same sentence, whereas the citations to B and A are separated by several paragraphs.
The advantage of the CPA approach compared to other citation and co-citation analysis approaches is an improvement in precision. Other widely used citation analysis approaches, such as Bibliographic Coupling, Co-Citation or the Amsler measure, do not take into account the location or proximity of citations within documents. The CPA approach allows a more granular automatic classification of documents and can also be used to identify not only related documents, but the specific sections within texts that are most related.