Understanding the inverted index in one article

The inverted index is the storage model search engines use most often, and it sits at the core of every search engine. In practice, a search engine often needs to find records by the value of some key, such as a word. An index built from each keyword to the records that contain it is called an inverted index.

First, be clear that an index like this exists to make queries faster. As a simple example, suppose we have five text files and need to find which of them contain a given word. The most intuitive approach is to load each file into memory one by one, split it into words, and loop over those words until the target is found. This is the idea behind a forward index.
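To make the forward approach concrete, here is a minimal Python sketch of that linear scan. The five file names and their contents are hypothetical stand-ins held in a dict rather than read from disk, and splitting on whitespace is an assumed, deliberately simple tokenizer.

```python
# Hypothetical stand-ins for the five text files (normally read from disk).
documents = {
    "a.txt": "search engines build indexes",
    "b.txt": "an index speeds up queries",
    "c.txt": "hash tables map keys to values",
    "d.txt": "queries scan every document",
    "e.txt": "inverted index maps words to documents",
}


def find_by_scan(documents, target):
    """Return the names of documents containing `target`.

    Every word of every document is compared against the target, so the
    work done per query grows with the total size of the collection.
    """
    matches = []
    for name, text in documents.items():
        for word in text.split():  # naive whitespace tokenizer (assumption)
            if word == target:
                matches.append(name)
                break  # one hit is enough for this document
    return matches


print(find_by_scan(documents, "index"))  # ['b.txt', 'e.txt']
```

Note that "indexes" in a.txt does not match "index": exact word comparison is part of this simplified setup.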

The poor query efficiency of this forward approach hardly needs pointing out, since every query has to read every word of every file. The idea behind an inverted index is not difficult either. As another example, take two pieces of text:

D1: Hello, conan!

D2: Hello, hattori!

Step 1: find all the distinct words.

Hello, conan, hattori

Step 2: for each word, record which texts contain it.

Hello (D1, D2)

conan (D1)

hattori (D2)
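The two steps above can be sketched in Python as follows. This is an illustration, not a production indexer: the function name `build_inverted_index` is hypothetical, and the tokenizer (`re.findall(r"\w+", ...)`, case-sensitive, punctuation dropped) is an assumption chosen so that the output matches the lists above.

```python
import re

# The two example texts from the article.
docs = {"D1": "Hello, conan!", "D2": "Hello, hattori!"}


def build_inverted_index(docs):
    """Map each word to the list of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        # Step 1: extract the words (runs of letters/digits/underscore, case kept).
        for word in re.findall(r"\w+", text):
            # Step 2: record the document, skipping repeats within the same text.
            postings = index.setdefault(word, [])
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    return index


print(build_inverted_index(docs))
# {'Hello': ['D1', 'D2'], 'conan': ['D1'], 'hattori': ['D2']}
```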

We use the words as the keys of a hash table and store the locations of the texts that contain each word as the values.

To find where a word occurs, we only need to look it up in this hash table, which returns the target documents quickly.
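A lookup is then a single hash-table access. In this sketch a Python dict plays the role of the hash table, the index is hard-coded from the example above, and `lookup` is a hypothetical helper that returns an empty list for words that appear in no document.

```python
# Inverted index from the example above; a dict is Python's built-in hash table.
index = {"Hello": ["D1", "D2"], "conan": ["D1"], "hattori": ["D2"]}


def lookup(index, word):
    """Return the documents containing `word` via one (average O(1)) hash lookup."""
    return index.get(word, [])


print(lookup(index, "Hello"))    # ['D1', 'D2']
print(lookup(index, "hattori"))  # ['D2']
print(lookup(index, "ran"))      # []
```

Unlike the linear scan, the cost of this lookup does not depend on how many documents were indexed.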

Comparing this with the forward index described earlier, the difference is easy to see: a forward index goes from documents to words, while an inverted index goes from words to documents.
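Because the two structures hold the same information in opposite directions, an inverted index can be derived directly from a forward index. The following sketch assumes the forward index is stored as a dict from document ID to its list of words; `invert` is a hypothetical name.

```python
# Forward index: document ID -> words it contains (from the D1/D2 example).
forward_index = {"D1": ["Hello", "conan"], "D2": ["Hello", "hattori"]}


def invert(forward_index):
    """Turn a document -> words mapping into a word -> documents mapping."""
    inverted = {}
    for doc_id, words in forward_index.items():
        for word in words:
            inverted.setdefault(word, set()).add(doc_id)
    # Sort each posting list so the result is deterministic.
    return {word: sorted(ids) for word, ids in inverted.items()}


print(invert(forward_index))
# {'Hello': ['D1', 'D2'], 'conan': ['D1'], 'hattori': ['D2']}
```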

Another advantage of the inverted index shows up in complex multi-keyword queries. The query's logical operations (AND, OR and so on) can be carried out as intersections and unions over the inverted lists, and the matching records are fetched only after the result is known. This turns a query over documents into set operations over document addresses, which speeds up search.
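As a rough illustration of how an AND query becomes a set operation, the sketch below intersects posting lists with a two-pointer merge. It assumes each posting list is a sorted list of integer document IDs (the words and IDs in the sample index are made up); an OR query could be handled similarly with a sorted union. `intersect` and `and_query` are hypothetical names.

```python
def intersect(a, b):
    """Intersect two sorted posting lists using a two-pointer merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def and_query(index, words):
    """Return the sorted IDs of documents that contain every word in `words`."""
    postings = [index.get(word, []) for word in words]
    if not postings:
        return []
    postings.sort(key=len)  # merge the shortest lists first
    result = postings[0]
    for other in postings[1:]:
        result = intersect(result, other)
        if not result:
            break  # no document can match any more
    return list(result)


# Made-up index with sorted integer document IDs.
index = {"hello": [1, 2, 4], "conan": [1, 3, 4], "hattori": [2]}

print(and_query(index, ["hello", "conan"]))    # [1, 4]
print(and_query(index, ["hello", "hattori"]))  # [2]
print(and_query(index, ["conan", "hattori"]))  # []
```

Processing the shortest list first is a common heuristic: an intersection can never be longer than its shortest input, so later merges stay cheap.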

Source: blog.csdn.net/qq_37174887/article/details/102784608