Elasticsearch Architecture - Inverted Index and Columnar Storage

Inverted index

For details on the inverted index, refer to the official Elasticsearch documentation.

Elasticsearch uses an inverted index, a structure designed for fast full-text search. An inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents in which it appears.

For example, suppose we have two documents whose content field contains the following:

  1. The quick brown fox jumped over the lazy dog
  2. Quick brown foxes leap over lazy dogs in summer

To create an inverted index, we first split the content field of each document into individual terms, then build a sorted list of all unique terms, and finally record in which documents each term appears.

The result is as follows:

Term    | Doc_1 | Doc_2
-------------------------
Quick   |       |  X
The     |   X   |
brown   |   X   |  X
dog     |   X   |
dogs    |       |  X
fox     |   X   |
foxes   |       |  X
in      |       |  X
jumped  |   X   |
lazy    |   X   |  X
leap    |       |  X
over    |   X   |  X
quick   |   X   |
summer  |       |  X
the     |   X   |
------------------------
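The steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming a plain whitespace split with no lowercasing or stemming, which is why Quick and quick appear as separate terms in the table; Elasticsearch's standard analyzer would normally lowercase them. The names docs and build_inverted_index are hypothetical.

```python
from collections import defaultdict

# The two example documents from the text, keyed by a hypothetical doc ID.
docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

def build_inverted_index(documents):
    """Map each unique term to the sorted list of doc IDs containing it.

    Assumption: terms come from a plain whitespace split with no
    lowercasing, matching the table ("Quick" and "quick" stay distinct).
    """
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            postings[term].add(doc_id)
    # Sort the terms (uppercase sorts before lowercase, as in the table)
    # and sort each posting list of doc IDs.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

index = build_inverted_index(docs)
for term, doc_ids in index.items():
    print(f"{term:8} {doc_ids}")
```

Each printed line corresponds to one row of the table: the term followed by the documents that contain it.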

Now, if we want to search for quick brown, we only need to look up the documents that contain each term:

Term    | Doc_1 | Doc_2
-------------------------
brown   |   X   |  X
quick   |   X   |
------------------------
Total   |   2   |  1

The results show that both documents match. With a naive similarity algorithm that simply counts the number of matching terms, the first document scores higher than the second.
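The naive match-count scoring described above can be sketched as follows. It is an illustration only, assuming the same case-sensitive, whitespace-split index as the table; Elasticsearch's actual relevance scoring (BM25 by default) also takes term frequency, document frequency and field length into account. The function name match_count_search is hypothetical.

```python
from collections import Counter, defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

# Rebuild the same case-sensitive, whitespace-split index as the table.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def match_count_search(query):
    """Score each document by how many distinct query terms it contains.

    This is the simple "count matching terms" similarity from the text,
    not Elasticsearch's real scoring.
    """
    scores = Counter()
    for term in set(query.split()):
        # .get() avoids inserting unknown terms into the defaultdict.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by doc ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(match_count_search("quick brown"))  # [(1, 2), (2, 1)]
```

The output lists (doc ID, score) pairs, reproducing the Total row above: document 1 matches both terms, document 2 matches only brown.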

Columnar storage

For details on columnar storage, refer to the official Elasticsearch documentation.


In Elasticsearch, Doc Values are a columnar storage structure. By default, Doc Values are enabled for every field that supports them (text fields are the main exception) and are built at index time. When a field is indexed, Elasticsearch adds its values to the inverted index for fast retrieval, and it also stores the field's Doc Values.
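To contrast the two structures, the sketch below builds, for a single field, both a value-to-documents mapping (the inverted index) and a document-to-value array (a column, in the spirit of Doc Values). It is a simplified model under stated assumptions: the documents and the field names color and price are made up, doc numbers are simply write positions, and real Doc Values are stored on disk in a compressed, per-segment form (string values, for example, are encoded as ordinals).

```python
# Hypothetical documents; the field names "color" and "price" are made up.
docs = [
    {"color": "red", "price": 30},
    {"color": "blue", "price": 10},
    {"color": "red", "price": 20},
]

def build_field_structures(documents, field):
    """Build both per-field structures.

    inverted: value -> doc numbers  (answers "which docs contain X?")
    column:   doc number -> value   (answers "what value does doc N have?")
    Doc numbers are list positions, i.e. the order documents were written.
    """
    inverted = {}
    column = []
    for doc_num, doc in enumerate(documents):
        value = doc.get(field)  # None if the doc lacks the field
        if value is not None:
            inverted.setdefault(value, []).append(doc_num)
        column.append(value)
    return inverted, column

color_inverted, color_column = build_field_structures(docs, "color")
print(color_inverted)  # {'red': [0, 2], 'blue': [1]}
print(color_column)    # ['red', 'blue', 'red']
```

The inverted index is efficient for going from a value to documents; the column is efficient for going from a document to its value, which is what the operations listed below need.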

In Elasticsearch, Doc Values are typically used in the following scenarios:

  • Sorting on a field
  • Aggregating on a field
  • Certain filters, such as geo-location filters
  • Certain script calculations that access field values
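The sorting and aggregation scenarios above both need the value that a given document holds for a field, which is exactly what a column provides. The sketch below assumes hypothetical in-memory columns and shows a set of matching documents (for instance, the result of an inverted-index lookup) being sorted by one field and counted per value on another, similar in spirit to a terms aggregation. It is not how Elasticsearch executes these operations internally; the names sort_docs and terms_agg are hypothetical.

```python
from collections import Counter

# Hypothetical columns: entry i holds the value for doc number i.
color_column = ["red", "blue", "red", "green"]
price_column = [30, 10, 20, 15]

def sort_docs(doc_nums, column, descending=False):
    """Sort matching docs by a field, reading each value from the column."""
    return sorted(doc_nums, key=lambda doc: column[doc], reverse=descending)

def terms_agg(doc_nums, column):
    """Count how often each value occurs among the matching docs."""
    return Counter(column[doc] for doc in doc_nums).most_common()

matching = [0, 1, 2]  # e.g. docs returned by an inverted-index lookup
print(sort_docs(matching, price_column))        # [1, 2, 0]
print(sort_docs(range(4), price_column, True))  # [0, 2, 3, 1]
print(terms_agg(matching, color_column))        # [('red', 2), ('blue', 1)]
```

Neither function touches the inverted index: once the matching documents are known, each one costs a single column lookup.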

Within a segment, Elasticsearch's columnar storage keeps values in the order in which documents were written (that is, by Lucene's internal document number), not in the order of the document ID (doc_id).
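The ordering point can be illustrated with a toy segment writer. This is a hypothetical, heavily simplified model, assuming that doc_id in the sentence above refers to the user-supplied document ID: each new document receives the next internal number, and its value is appended to the column, so column order follows write order regardless of the IDs. The class name ToySegment is hypothetical.

```python
class ToySegment:
    """Hypothetical, simplified segment: internal doc numbers are assigned
    in write order, and each field value is appended to a column."""

    def __init__(self):
        self.ids = []     # user-facing document ID, by internal doc number
        self.column = []  # field values, by internal doc number

    def add(self, doc_id, value):
        doc_num = len(self.ids)  # next internal number = write position
        self.ids.append(doc_id)
        self.column.append(value)
        return doc_num

seg = ToySegment()
for doc_id, price in [("c", 30), ("a", 10), ("b", 20)]:
    seg.add(doc_id, price)

print(seg.ids)     # ['c', 'a', 'b']  (write order, not sorted by ID)
print(seg.column)  # [30, 10, 20]
```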


Source: blog.csdn.net/qq_34561892/article/details/129393021