Inverted index

In computer science, an inverted index (also referred to as a postings list, postings file, or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locations in a table, or in a document or a set of documents (named in contrast to a forward index, which maps from documents to content).^[1] The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database.^[2] The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems,^[3] and is used on a large scale, for example in search engines. Additionally, several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.
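As a minimal sketch, and not drawn from the cited sources, the following Python code builds a record-level inverted index by inverting a small forward index (document id to terms) into a term-to-document-ids mapping. It then answers a conjunctive (AND) query by intersecting postings. The tokenizer (lowercase, split on whitespace), the sample documents, and the function names `invert` and `search_all` are hypothetical choices made for illustration.

```python
from collections import defaultdict

def invert(forward: dict[int, list[str]]) -> dict[str, set[int]]:
    """Turn a forward index (doc id -> terms) into an inverted index (term -> doc ids)."""
    inverted: dict[str, set[int]] = defaultdict(set)
    for doc_id, terms in forward.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)

def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Intersect starting from the shortest postings so intermediate sets stay small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])  # copy, so the index itself is never mutated
    for p in postings[1:]:
        result &= p
    return result

# Hypothetical sample documents.
docs = {0: "it is what it is", 1: "what is it", 2: "it is a banana"}
# Hypothetical tokenizer: lowercase and split on whitespace.
forward = {doc_id: text.lower().split() for doc_id, text in docs.items()}
index = invert(forward)

print(sorted(index["banana"]))               # [2]
print(sorted(search_all(index, "what is")))  # [0, 1]
```

Adding a document to such an index means updating the postings of every distinct term the document contains. This is the extra processing at insertion time that buys fast lookups at query time.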

There are two main variants of inverted indexes. A record-level inverted index (also called an inverted file index, or just an inverted file) contains a list of references to documents for each word. A word-level inverted index (also called a full inverted index or inverted list) additionally contains the positions of each word within a document.^[4] The latter form offers more functionality (such as phrase searches) but needs more processing power and space to be created.
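The difference between the two variants can be illustrated with a short Python sketch. It is a hypothetical example, not taken from the cited sources. It builds a word-level index that stores, for each term, the positions at which it occurs in each document. Dropping those positions recovers the record-level index. The positions make a simple phrase search possible. The tokenizer, the sample documents, and the names `build_positional`, `record_level`, and `phrase_search` are illustrative assumptions.

```python
from collections import defaultdict

# Word-level (positional) index: term -> {doc id -> sorted list of positions}.
PositionalIndex = dict[str, dict[int, list[int]]]

def build_positional(docs: dict[int, str]) -> PositionalIndex:
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)  # appended in increasing order
    return {term: dict(postings) for term, postings in index.items()}

def record_level(index: PositionalIndex) -> dict[str, set[int]]:
    """Drop positions to recover the record-level index (term -> doc ids)."""
    return {term: set(postings) for term, postings in index.items()}

def phrase_search(index: PositionalIndex, phrase: str) -> set[int]:
    """Return ids of documents where the phrase's terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    # A candidate document must contain every term of the phrase.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = set()
    for doc_id in candidates:
        # The phrase can start at p only if term i occurs at position p + i.
        starts = set(index[terms[0]][doc_id])
        for offset, term in enumerate(terms[1:], start=1):
            starts &= {p - offset for p in index[term][doc_id]}
        if starts:
            hits.add(doc_id)
    return hits

# Hypothetical sample documents.
docs = {0: "it is what it is", 1: "what is it", 2: "it is a banana"}
idx = build_positional(docs)
rec = record_level(idx)

print(idx["it"])                                      # {0: [0, 3], 1: [2], 2: [0]}
print(sorted(rec["what"] & rec["is"] & rec["it"]))    # [0, 1]
print(sorted(phrase_search(idx, "what is it")))       # [1]
```

The record-level index can only report that documents 0 and 1 contain all three words. The stored positions let the phrase query reject document 0, where the words are not adjacent. Those same position lists are the additional space and build work that the word-level variant costs.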

[1] Knuth, D. E. (1997) [1973]. "6.5. Retrieval on Secondary Keys". The Art of Computer Programming (Third ed.). Reading, Massachusetts: Addison-Wesley. ISBN 0-201-89685-0.
[2] Salton, Gerard; Fox, Edward A.; Wu, Harry (November 1983). "Extended Boolean information retrieval". Communications of the ACM. 26 (11): 1022–1036. doi:10.1145/182.358466. hdl:1813/6351.
[3] Zobel, Justin; Moffat, Alistair; Ramamohanarao, Kotagiri (December 1998). "Inverted files versus signature files for text indexing". ACM Transactions on Database Systems. 23 (4). New York: Association for Computing Machinery: 453–490. doi:10.1145/296854.277632. S2CID 7293918.
[4] Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier (1999). Modern information retrieval. Reading, Massachusetts: Addison-Wesley Longman. p. 192. ISBN 0-201-39829-X.