A supplement on the principle of the inverted index


---------------------------Introduction---------------------------
"Inverted index" is the name that nearly everyone uses for this structure.
Some papers call it an "inverted file" instead; both terms refer to the same thing. The inverted index is defined in contrast to the forward index.

---------------------------Main text---------------------------
A document is composed of many words, and each word can appear many times in the same document. The same word can, of course, also appear in different documents.

Forward index: looks at words from the perspective of the document. For each document (identified by its document ID), it records which words the document contains, how many times each word appears (term frequency), and where each occurrence is located (its offset from the beginning of the document).
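
As a rough illustration, a forward index can be sketched as a nested dictionary. The function name `build_forward_index`, the sample documents, whitespace tokenization, and the use of word offsets (rather than character offsets) are all assumptions made for this example, not part of any particular search engine.

```python
from collections import defaultdict

def build_forward_index(docs):
    """Map doc_id -> {word: [offsets]}; term frequency is len(offsets)."""
    forward = {}
    for doc_id, text in docs.items():
        positions = defaultdict(list)
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for offset, word in enumerate(text.lower().split()):
            positions[word].append(offset)
        forward[doc_id] = dict(positions)
    return forward

# Hypothetical sample documents keyed by document ID.
docs = {1: "the cat sat on the mat", 2: "the dog sat"}
forward = build_forward_index(docs)

for doc_id, words in forward.items():
    for word, offsets in words.items():
        print(doc_id, word, "tf =", len(offsets), "offsets =", offsets)
```

Here the lookup key is the document ID, so answering "which words does document 1 contain?" is a single dictionary access.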

Inverted index (also called an inverted file): looks at documents from the perspective of the word. For each word, it records which documents contain it (by document ID), how many times it appears in each of those documents (term frequency), and where it appears (its offset from the beginning of the document).
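
A minimal sketch of the same idea from the word's side follows. Again, `build_inverted_index`, the sample documents, whitespace tokenization, and word-level offsets are assumptions chosen for illustration; real systems typically use more compact posting-list formats.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map word -> {doc_id: [offsets]}; term frequency is len(offsets)."""
    inverted = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        for offset, word in enumerate(text.lower().split()):
            inverted[word].setdefault(doc_id, []).append(offset)
    return dict(inverted)

# Hypothetical sample documents keyed by document ID.
docs = {1: "the cat sat on the mat", 2: "the dog sat"}
inverted = build_inverted_index(docs)

# Which documents contain "sat", how often, and where?
for doc_id, offsets in inverted["sat"].items():
    print("doc", doc_id, "tf =", len(offsets), "offsets =", offsets)
```

Here the lookup key is the word, so answering "which documents contain this word?" is a single dictionary access, which is what a search query needs.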

Simply remember:
forward index: document ---> word
inverted index: word ---> document
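
Both mappings carry the same information, only keyed differently, so one can be derived from the other. The sketch below flips a small hand-written forward index into an inverted one; the function name `invert` and the sample data are hypothetical.

```python
def invert(forward):
    """Turn doc_id -> {word: [offsets]} into word -> {doc_id: [offsets]}."""
    inverted = {}
    for doc_id, words in forward.items():
        for word, offsets in words.items():
            # Copy the list so the two indexes do not share mutable state.
            inverted.setdefault(word, {})[doc_id] = list(offsets)
    return inverted

# Hypothetical forward index: document ---> word
forward = {
    1: {"the": [0, 4], "cat": [1], "sat": [2], "on": [3], "mat": [5]},
    2: {"the": [0], "dog": [1], "sat": [2]},
}

# Inverted index: word ---> document
inverted = invert(forward)
print(sorted(inverted["the"]))  # document IDs that contain "the"
```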

---------------------------Conclusion---------------------------
The inverted index has a wide range of applications, such as search engines, large-scale database indexing, document retrieval, and multimedia or information retrieval. In short, it is an important indexing mechanism in the search field.


Origin blog.csdn.net/GoSaint/article/details/106829102