ElasticSearch (7) --- inverted index


Previous: ElasticSearch (6): Kibana plugin

1. Forward index and reverse index

When it comes to indexes, the first thing to know is that they come in two kinds: forward indexes and reverse indexes (more commonly called inverted indexes).

Forward index:

A forward index maps each document to the words it contains. For example, suppose there are 4 documents:

Doc Words
Doc1 On the road of life
Doc2 never retreat from the whole body
Doc3 enjoy its achievements and get something for nothing
Doc4 If you do not work hard, you are out

Building a forward index first parses out the words that appear in each document, then records the mapping from each document to its words:

Doc Words
Doc1 On, the, road, of, life
Doc2 never, retreat, from, the, whole, body
Doc3 enjoy, its, achievements, and, get, something, for, nothing
Doc4 If, you, do, not, work, hard, you, are, out
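As a minimal sketch of the construction just described, the Python below builds a forward index for the four example documents. It assumes a toy tokenizer (the hypothetical `tokenize` function, not ElasticSearch's analyzer) that lowercases the text and keeps runs of letters, and it writes Doc4 without contractions, as the token table does.

```python
import re

# The four example documents (Doc4 written without contractions, as in the token table).
DOCS = {
    "Doc1": "On the road of life",
    "Doc2": "never retreat from the whole body",
    "Doc3": "enjoy its achievements and get something for nothing",
    "Doc4": "If you do not work hard, you are out",
}

def tokenize(text):
    """Hypothetical toy tokenizer: lowercase the text, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def build_forward_index(docs):
    """Forward index: document id -> list of its words, in order of appearance."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}

forward = build_forward_index(DOCS)
for doc_id, words in forward.items():
    print(doc_id, words)
```

Answering "which documents contain word X?" with this structure means looking inside every document's word list, which is the cost the next sections discuss.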

Reverse index:

A reverse (inverted) index maps each word to the documents that contain it. Taking the same 4 documents as an example, building a reverse index records, for every word, the documents it appears in:

Word Doc
On Doc1
the Doc1, Doc2
road Doc1
… …

Besides which documents a word appears in, a reverse index can also record the positions of the word within each document and the number of times it occurs there. For example, in the tables above the word the appears in both Doc1 and Doc2, and the word you appears twice in Doc4.
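A minimal sketch of such an index, assuming the same hypothetical toy tokenizer as before: each word maps to the documents containing it, and for each document the list of positions where the word occurs. The length of a position list is the word's occurrence count in that document. Real ElasticSearch postings are stored by Lucene in a compact on-disk format; the nested dictionaries here are only for illustration.

```python
import re

DOCS = {
    "Doc1": "On the road of life",
    "Doc2": "never retreat from the whole body",
    "Doc3": "enjoy its achievements and get something for nothing",
    "Doc4": "If you do not work hard, you are out",
}

def tokenize(text):
    """Hypothetical toy tokenizer: lowercase the text, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

def build_inverted_index(docs):
    """Inverted index: word -> {doc_id: [positions of the word in that document]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

index = build_inverted_index(DOCS)
print(index["the"])                # {'Doc1': [1], 'Doc2': [3]}
print(len(index["you"]["Doc4"]))   # 2 occurrences of "you" in Doc4
```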

2. Why doesn't ElasticSearch use a forward index?

With a forward index, every time a user enters a search keyword the engine has to scan all documents to find the ones containing that keyword before it can return any results. ElasticSearch is often used for applications with very large amounts of data (search scenarios like Baidu's, for example), where such a full scan is far too slow to respond in real time. An inverted index avoids the scan by going straight from the keyword to the documents, which is why it is the more reasonable choice here.
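To make the difference concrete, here is a small sketch comparing the two lookup strategies on the four example documents. The function names `search_forward` and `search_inverted` and the toy tokenizer are hypothetical. The forward search must visit every document, so its cost grows with the size of the whole collection; the inverted search does one dictionary lookup and reads back only that word's posting list.

```python
import re

DOCS = {
    "Doc1": "On the road of life",
    "Doc2": "never retreat from the whole body",
    "Doc3": "enjoy its achievements and get something for nothing",
    "Doc4": "If you do not work hard, you are out",
}

def tokenize(text):
    """Hypothetical toy tokenizer: lowercase the text, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())

# Forward index: document -> words.
forward = {doc_id: tokenize(text) for doc_id, text in DOCS.items()}

# Inverted index: word -> set of documents containing it.
inverted = {}
for doc_id, words in forward.items():
    for word in words:
        inverted.setdefault(word, set()).add(doc_id)

def search_forward(word):
    """Scan every document's word list (work proportional to the whole collection)."""
    return [doc_id for doc_id, words in forward.items() if word in words]

def search_inverted(word):
    """One dictionary lookup, then return that word's documents in sorted order."""
    return sorted(inverted.get(word, set()))

print(search_forward("the"))   # ['Doc1', 'Doc2']
print(search_inverted("the"))  # ['Doc1', 'Doc2']
```

Both return the same answer; only the amount of work differs, and that gap widens as the number of documents grows.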

3. An inverted index example

Suppose we need to build an inverted index over the documents shown in the figure below (the picture comes from the internet).

[Figure: the example documents to be indexed]
A tokenizer splits the content of each document into words, and the index records the numbers of the documents in which each word appears.
[Figure: the words produced by the tokenizer, each with the numbers of the documents that contain it]
Now, if a user searches for 谷歌 (Google), the index directly gives the documents that contain it: 1, 2, 3, 4 and 5. Besides which documents a keyword appears in, the inverted index can also record how often it occurs. For example, in the figure below, Wave appears once in document 4, and 拉斯 appears once in document 3 and once in document 5.

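[Figure: the inverted index with the frequency of each word in each document]
When search results are displayed, each document is scored by how well it matches the query; the higher the score, the higher it ranks.
Now search for 谷歌加盟网站, which is split into the words 谷歌 (Google), 加盟 (join) and 网站 (website):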

word      Document 1  Document 2  Document 3  Document 4  Document 5
Google    *           *           *           *           *
Join                  *           *                       *
website                                                   *

Judging by the asterisks in the table above, document 5 contains all three words of the query, so it matches best and gets the highest score.
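A rough sketch of this ranking, using only the posting lists from the table above. It assumes the query has already been segmented into 谷歌, 加盟 and 网站, and it reads the Join row as documents 2, 3 and 5. The score is simply the number of query words a document contains. This is a deliberate simplification and not how ElasticSearch computes relevance: its default similarity is BM25, which also takes term frequency, how rare a word is across documents, and field length into account.

```python
from collections import Counter

# Posting lists from the table: word -> set of document numbers containing it.
# Assumption: 加盟 (join) is in documents 2, 3 and 5; 网站 (website) only in document 5.
POSTINGS = {
    "谷歌": {1, 2, 3, 4, 5},  # Google
    "加盟": {2, 3, 5},        # join
    "网站": {5},              # website
}

def score(query_terms, postings):
    """Toy score: count how many of the query words each document contains."""
    scores = Counter()
    for term in query_terms:
        for doc in postings.get(term, set()):
            scores[doc] += 1
    # Highest score first; ties broken by document number for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ranking = score(["谷歌", "加盟", "网站"], POSTINGS)
print(ranking)  # [(5, 3), (2, 2), (3, 2), (1, 1), (4, 1)]
```

Only the posting lists of the three query words are touched; documents that contain none of them never enter the computation.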

Next: ElasticSearch (8): Word Segmenter

Source: blog.csdn.net/qq_43655835/article/details/104748456