1. Forward index and inverted index
Indexes can be divided into two kinds: forward indexes and inverted indexes (an inverted index is sometimes also called a reverse index).
Forward index:
A forward index can be understood simply as a mapping from documents to words. For example, suppose there are 4 documents:
Doc | Words |
---|---|
Doc1 | On the road of life |
Doc2 | never retreat from the whole body |
Doc3 | enjoy its achievements and get something for nothing |
Doc4 | If you don’t work hard, you’re out |
To build a forward index, the words in each document are parsed out first, and then a mapping from each document to its words is established.
Doc | Words |
---|---|
Doc1 | On, the, road, of, life |
Doc2 | never, retreat, from, the, whole, body |
Doc3 | enjoy, its, achievements, and, get, something, for, nothing |
Doc4 | If, you, do, not, work, hard, you, are, out |
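The mapping from documents to words can be sketched in a few lines of Python. This is only an illustration, not how ElasticSearch stores data internally: `tokenize` and `build_forward_index` are hypothetical helper names, the tokenizer is a simple regular expression, and Doc4 is written with its contractions already expanded ("do not", "you are") to match the token table.

```python
import re

# The four example documents (Doc4 with contractions expanded, an assumption
# made so the tokens match the table).
DOCS = {
    "Doc1": "On the road of life",
    "Doc2": "never retreat from the whole body",
    "Doc3": "enjoy its achievements and get something for nothing",
    "Doc4": "If you do not work hard, you are out",
}


def tokenize(text):
    """Split text into alphabetic word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[A-Za-z]+", text)


def build_forward_index(docs):
    """Map each document ID to the list of words it contains, in order."""
    return {doc_id: tokenize(text) for doc_id, text in docs.items()}


forward_index = build_forward_index(DOCS)
print(forward_index["Doc1"])  # ['On', 'the', 'road', 'of', 'life']
```

Looking up the words of a known document is cheap with this structure; finding which documents contain a given word is not.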
Inverted index:
An inverted index can be understood simply as a mapping from words to documents. Taking the 4 documents above as an example, building an inverted index establishes the mapping from each word to the documents that contain it.
Word | Doc |
---|---|
On | Doc1 |
the | Doc1, Doc2 |
road | Doc1 |
…… | …… |
An inverted index can record not only the position of a word in a document but also the number of times the word occurs in that document. For example, in the documents above, the word On appears once in Doc1, and the word you appears twice in Doc4.
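To make positions and occurrence counts concrete, here is a minimal Python sketch of an inverted index that stores, for each word, the positions at which it occurs in each document. The name `build_inverted_index` and the regular-expression tokenizer are assumptions for illustration, and Doc4 is again written with its contractions expanded.

```python
import re
from collections import defaultdict

DOCS = {
    "Doc1": "On the road of life",
    "Doc2": "never retreat from the whole body",
    "Doc3": "enjoy its achievements and get something for nothing",
    "Doc4": "If you do not work hard, you are out",
}


def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]}.

    The length of a position list is the word's frequency in that document.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"[A-Za-z]+", text)):
            index[word].setdefault(doc_id, []).append(position)
    return index


inverted_index = build_inverted_index(DOCS)
print(sorted(inverted_index["the"]))       # ['Doc1', 'Doc2']
print(inverted_index["you"]["Doc4"])       # [1, 6]
print(len(inverted_index["On"]["Doc1"]))   # 1
```

Positions are counted from 0 here; that is a choice of this sketch, not something the article specifies.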
2. Why doesn't ElasticSearch use a forward index?
With a forward index, when a user enters a search keyword, the engine has to traverse every document to find the ones that contain the keyword before it can return results to the user. ElasticSearch, however, is often used by applications with very large amounts of data (such as Baidu search), where scanning a forward index is far too slow to respond in real time. An inverted index is the more reasonable choice in that situation.
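The difference between the two lookups can be sketched as follows. The functions `search_forward` and `search_inverted` are hypothetical names, and the tiny in-memory dictionaries stand in for real index storage; the point is only that the forward search visits every document, while the inverted search is a single dictionary lookup.

```python
# Forward index: document -> words (first two example documents only).
forward_index = {
    "Doc1": ["On", "the", "road", "of", "life"],
    "Doc2": ["never", "retreat", "from", "the", "whole", "body"],
}

# Inverted index derived from it: word -> set of documents.
inverted_index = {}
for doc_id, words in forward_index.items():
    for word in words:
        inverted_index.setdefault(word, set()).add(doc_id)


def search_forward(index, keyword):
    """Scan every document; the work grows with the number of documents."""
    return [doc_id for doc_id, words in index.items() if keyword in words]


def search_inverted(index, keyword):
    """Look the keyword up directly; non-matching documents are never touched."""
    return sorted(index.get(keyword, set()))


print(search_forward(forward_index, "the"))    # ['Doc1', 'Doc2']
print(search_inverted(inverted_index, "the"))  # ['Doc1', 'Doc2']
```

Both functions return the same answer; only the amount of work per query differs.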
3. An inverted index example
Consider the following example: we need to build an inverted index over the documents shown in the figure below. (The figure comes from the internet.)
First, a tokenizer segments the content of each document into words, and for each word the numbers of the documents in which it appears are recorded.
Now, if a user searches for 谷歌 (Google), the inverted index directly gives the documents in which the word appears: 1, 2, 3, 4, 5. The inverted index can record not only the positions of a keyword but also how often it occurs. For example, in the figure below, Wave appears once in document 4, and 拉斯 (Lars) appears once in document 3 and once in document 5.
When search results are displayed, each document is scored according to how well it matches the query against the index. The higher the score, the higher the ranking.
Now search for 谷歌加盟网站 (Google, join, website):
Word | Document 1 | Document 2 | Document 3 | Document 4 | Document 5 |
---|---|---|---|---|---|
谷歌 (Google) | * | * | * | * | * |
加盟 (join) | | * | * | | * |
网站 (website) | | | | | * |
Judging by the distribution of asterisks in the table above, document 5 matches the most query terms, so it has the highest score and the best match.
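The asterisk counting can be sketched as a toy scoring function: each document gets one point per query term it contains. This is a simplification for illustration only; real relevance scoring in ElasticSearch also weighs factors such as term frequency. The posting lists are hard-coded assumptions chosen to agree with the text (谷歌 in documents 1 to 5, document 5 ranking highest); the list for 加盟 in particular is assumed, and the query is assumed to be already split into three terms by the tokenizer.

```python
from collections import Counter

# Posting lists: term -> set of document numbers (assumed, see lead-in).
POSTINGS = {
    "谷歌": {1, 2, 3, 4, 5},  # Google
    "加盟": {2, 3, 5},        # join (assumed placement)
    "网站": {5},              # website
}


def rank(query_terms, postings):
    """Score each document by the number of query terms it contains."""
    scores = Counter()
    for term in query_terms:
        for doc_id in postings.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document number.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank(["谷歌", "加盟", "网站"], POSTINGS)
print(ranking[0])  # (5, 3)
```

Under these assumptions document 5 scores 3, documents 2 and 3 score 2, and documents 1 and 4 score 1.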