ES---inverted index

[Preface]
Elasticsearch builds on Lucene's inverted index to filter faster than a relational database typically can. It is particularly good at combining several filter conditions, for example a query for records with age between 18 and 30 and gender female. The inverted index is described in many places, but what actually makes it faster than the B-tree index of a relational database?
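
To make the multi-condition case concrete, here is a minimal sketch of how such a filter could be answered from posting lists: one sorted list of document IDs per field value, merged for the age range and then intersected with the list for the gender. The data, `build_postings`, `union` and `intersect` are all hypothetical names made up for illustration; Lucene's real structures for numeric ranges are more elaborate than this.

```python
# Hypothetical toy data: doc ID -> (age, gender).
people = {
    1: (17, "female"),
    2: (25, "female"),
    3: (30, "male"),
    4: (30, "female"),
    5: (42, "female"),
}

def build_postings(docs, field):
    """Map each value of one field to a sorted list of doc IDs."""
    postings = {}
    for doc_id in sorted(docs):
        postings.setdefault(docs[doc_id][field], []).append(doc_id)
    return postings

age_postings = build_postings(people, 0)
gender_postings = build_postings(people, 1)

def union(lists):
    """Merge several posting lists into one sorted, duplicate-free list."""
    return sorted(set().union(*lists))

def intersect(a, b):
    """Intersect two sorted posting lists with a two-pointer walk."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# age between 18 and 30 (inclusive) AND gender == "female"
age_match = union(age_postings.get(age, []) for age in range(18, 31))
result = intersect(age_match, gender_postings.get("female", []))
print(result)  # [2, 4]
```

Because both lists are already sorted, the intersection is a single linear pass with no per-row lookups, which is the kind of saving the rest of this article is about.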

Generally speaking, a B-tree index is a structure designed to handle writes and updates efficiently. When fast updates are not required, techniques such as pre-sorting can buy smaller storage and faster retrieval, at the cost of slower updates. To go deeper, we need to look at how Lucene's inverted index is put together.

[Text]
An "index" in the ES sense, which is roughly equivalent to a table in a traditional database, is not discussed here.

Inverted index: each document is assigned an ID. The inverted index splits each document into terms according to the specified tokenization rules, then maintains a table listing every term that appears in the document collection, the IDs of the documents that contain it, and how often it occurs. It is a concrete storage form of the "word-document matrix". An inverted index consists of two main parts: the word dictionary and the inverted file.
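
The construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample documents are made up, and `tokenize` is a stand-in analyzer (lowercase, split on non-alphanumeric characters), not the analyzer Elasticsearch or Lucene uses.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample documents: doc ID -> text.
docs = {
    1: "Elasticsearch uses Lucene",
    2: "Lucene builds an inverted index",
    3: "An inverted index maps terms to documents",
}

def tokenize(text):
    """Assumed analyzer: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Return term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index

index = build_inverted_index(docs)
print(index["lucene"])    # {1: 1, 2: 1}
print(index["inverted"])  # {2: 1, 3: 1}
```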

Simply put: a forward index finds the value from the key, and an inverted index finds the key from the value.
Diagram of the difference between the two:

[Figure: forward index]
[Figure: inverted index]
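
Since the figures are not available, the two lookup directions can be illustrated with a small sketch instead. The `forward_index` contents and the `invert` helper are hypothetical, chosen only to show the direction of each lookup.

```python
# Hypothetical forward index: doc ID (key) -> terms in that document (value).
forward_index = {
    1: ["red", "apple"],
    2: ["green", "apple"],
    3: ["red", "car"],
}

def invert(forward):
    """Turn doc ID -> terms into term -> sorted doc IDs."""
    inverted = {}
    for doc_id in sorted(forward):
        for term in forward[doc_id]:
            ids = inverted.setdefault(term, [])
            if not ids or ids[-1] != doc_id:  # skip repeats within one document
                ids.append(doc_id)
    return inverted

inverted_index = invert(forward_index)

print(forward_index[2])       # forward lookup, document -> terms: ['green', 'apple']
print(inverted_index["red"])  # inverted lookup, term -> documents: [1, 3]
```
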
Word dictionary (lexicon): the set of all distinct words that have appeared in the document collection, stored as strings. Each entry in the word dictionary records some information about the word itself and a pointer to that word's inverted list.
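
One simple way to picture the dictionary is a sorted list of terms searched by binary search, which is also an example of the pre-sorting trade-off mentioned in the preface. The sketch below assumes that layout; `terms`, `posting_table` and `lookup` are illustrative names, the "pointer" is just a list position, and Lucene's own dictionary format is more compact than this.

```python
from bisect import bisect_left

# Hypothetical postings, keyed by term.
postings = {
    "index": [2, 3],
    "elasticsearch": [1],
    "lucene": [1, 2],
}

# The dictionary: terms kept in sorted order, each paired with a "pointer".
# Here the pointer is simply the term's position in posting_table.
terms = sorted(postings)
posting_table = [postings[t] for t in terms]

def lookup(term):
    """Binary-search the sorted terms, then follow the pointer to the posting list."""
    pos = bisect_left(terms, term)
    if pos < len(terms) and terms[pos] == term:
        return posting_table[pos]
    return []

print(lookup("lucene"))  # [1, 2]
print(lookup("solr"))    # []
```

Adding a new term to such a sorted list means shifting entries, which is why this layout favors lookups over updates.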

Inverted list (posting list): for a given word, the inverted list records every document in which the word has appeared, together with the positions of the word within each document. Each record is called an inverted item (a posting).
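
A sketch of posting lists that carry position information, assuming whitespace-style tokenization and made-up documents. The `phrase_docs` helper is a hypothetical addition showing one thing positions make possible: checking that two words are adjacent.

```python
import re
from collections import defaultdict

# Hypothetical sample documents: doc ID -> text.
docs = {
    1: "to be or not to be",
    2: "to do is to be",
}

def build_positional_index(documents):
    """Return term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term][doc_id].append(position)
    return index

def phrase_docs(index, first, second):
    """Doc IDs where `second` appears immediately after `first`."""
    hits = []
    for doc_id, positions in index.get(first, {}).items():
        following = set(index.get(second, {}).get(doc_id, []))
        if any(p + 1 in following for p in positions):
            hits.append(doc_id)
    return sorted(hits)

index = build_positional_index(docs)
print(dict(index["to"]))                # {1: [0, 4], 2: [0, 3]}
print(phrase_docs(index, "not", "to"))  # [1]
```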

Inverted file: the inverted lists of all words are usually stored one after another in a file on disk. This file is called the inverted file; it is the physical file that holds the inverted index.
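
The relationship between the dictionary and the inverted file can be sketched as follows. The byte layout (each document ID as an unsigned 32-bit little-endian integer, no compression) is an assumption made for this example, not Lucene's on-disk format, and an in-memory `io.BytesIO` buffer stands in for the file on disk.

```python
import io
import struct

# Hypothetical postings: term -> sorted doc IDs.
postings = {
    "elasticsearch": [1],
    "index": [2, 3],
    "lucene": [1, 2],
}

def write_inverted_file(postings, fh):
    """Write posting lists back to back; return term -> (offset, count)."""
    dictionary = {}
    for term in sorted(postings):
        doc_ids = postings[term]
        dictionary[term] = (fh.tell(), len(doc_ids))
        # Assumed layout: each doc ID is an unsigned 32-bit little-endian integer.
        fh.write(struct.pack(f"<{len(doc_ids)}I", *doc_ids))
    return dictionary

def read_postings(fh, dictionary, term):
    """Seek to the term's offset and decode its posting list."""
    if term not in dictionary:
        return []
    offset, count = dictionary[term]
    fh.seek(offset)
    return list(struct.unpack(f"<{count}I", fh.read(4 * count)))

inverted_file = io.BytesIO()  # stands in for a file on disk
dictionary = write_inverted_file(postings, inverted_file)

print(dictionary["lucene"])                                # (12, 2)
print(read_postings(inverted_file, dictionary, "lucene"))  # [1, 2]
```

The dictionary entry plays the role of the pointer: it says where in the file a word's inverted list starts and how many entries to read.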

Origin blog.csdn.net/qq_43288259/article/details/114937841