Probabilistic relevance model

The probabilistic relevance model^[1]^[2] was devised by Stephen E. Robertson and Karen Spärck Jones as a framework for later probabilistic models. It is an information retrieval formalism used to derive ranking functions, which search engines use to rank matching documents by their relevance to a given search query.

It is a theoretical model that estimates the probability that a document d_j is relevant to a query q. The model assumes that this probability of relevance depends on the query and document representations. It further assumes that there is a subset of all documents that the user prefers as the answer set for query q. This ideal answer set is called R and should maximize the overall probability of relevance to that user. The model predicts that documents in R are relevant to the query, while documents outside R are non-relevant. Documents are ranked by their odds of relevance, that is, the ratio of the probability that the document belongs to R to the probability that it does not:

$sim(d_{j},q)={\frac {P(R|{\vec {d}}_{j})}{P({\bar {R}}|{\vec {d}}_{j})}}$
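As a minimal sketch of how this ratio can be used for ranking, the Python snippet below converts a probability of relevance into odds and sorts documents by it. The function names `relevance_odds` and `rank_by_odds` and the probability values are hypothetical and chosen only for illustration; the model itself does not prescribe how P(R|d_j) is estimated.

```python
def relevance_odds(p_relevant: float) -> float:
    """Return P(R|d) / P(not R|d), assuming P(not R|d) = 1 - P(R|d)."""
    if not 0.0 < p_relevant < 1.0:
        raise ValueError("p_relevant must lie strictly between 0 and 1")
    return p_relevant / (1.0 - p_relevant)


def rank_by_odds(p_by_doc: dict[str, float]) -> list[str]:
    """Return document ids sorted by decreasing odds of relevance."""
    return sorted(
        p_by_doc,
        key=lambda doc_id: relevance_odds(p_by_doc[doc_id]),
        reverse=True,
    )


# Illustrative (made-up) estimates of P(R|d_j) for three documents.
estimates = {"d1": 0.2, "d2": 0.8, "d3": 0.5}
ranking = rank_by_odds(estimates)
print(ranking)
```

Because the odds are a strictly increasing function of the probability, ranking by odds gives the same order as ranking by P(R|d_j) itself; the ratio form is mainly convenient for later algebraic manipulation.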

[1] Robertson, S. E.; Spärck Jones, K. (May 1976). "Relevance weighting of search terms". Journal of the American Society for Information Science. 27 (3): 129–146. doi:10.1002/asi.4630270302.
[2] Robertson, Stephen; Zaragoza, Hugo (2009). "The Probabilistic Relevance Framework: BM25 and Beyond". Foundations and Trends in Information Retrieval. 3 (4): 333–389. CiteSeerX 10.1.1.156.5282. doi:10.1561/1500000019.
