Rocchio algorithm

The Rocchio algorithm is based on a method of relevance feedback found in information retrieval systems which stemmed from the SMART Information Retrieval System developed between 1960 and 1964. Like many other retrieval systems, the Rocchio algorithm was developed using the vector space model. Its underlying assumption is that most users have a general conception of which documents should be denoted as relevant or irrelevant.[1] Therefore, the user's search query is revised to include an arbitrary percentage of relevant and irrelevant documents as a means of increasing the search engine's recall, and possibly the precision as well. The number of relevant and irrelevant documents allowed to enter a query is dictated by the so called weights, i.e. the variables , and listed below in the Algorithm section.[1]

  1. ^ a b Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze: An Introduction to Information Retrieval, page 163-167. Cambridge University Press, 2009.