[Elasticsearch] Forward Index, Inverted Index - Notes

The inverted index is familiar to most of us. But what is the forward index, and why does ES still use it? An inverted index maps terms to documents, which suits search. Aggregation, however, needs the field value of each matching document. With the inverted index alone, that means traversing its term "tables" one by one and then grouping and aggregating, even though the search has already found the documents we want. The inverted index cannot look up values by document, so performance is poor. The forward index exists to handle this case.
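As a rough illustration of the difference, here is a toy in-memory sketch. It is not how Lucene actually stores either structure, and the sample documents and the names `inverted_color` and `forward_brand` are made up for this example. The inverted index answers "which documents contain this term?", while the forward index answers "what is this document's value?":

```python
from collections import Counter, defaultdict

# Hypothetical sample documents; the list position is the doc id.
docs = [
    {"color": "red", "brand": "acme"},
    {"color": "blue", "brand": "acme"},
    {"color": "red", "brand": "zenith"},
    {"color": "red", "brand": "acme"},
]

# Inverted index on "color": term -> list of doc ids containing it.
inverted_color = defaultdict(list)
for doc_id, doc in enumerate(docs):
    inverted_color[doc["color"]].append(doc_id)

# Forward index on "brand": a column indexed by doc id.
forward_brand = [doc["brand"] for doc in docs]

# Search step: a single lookup in the inverted index finds the matching docs.
matching = inverted_color["red"]

# Aggregation step: one direct forward-index lookup per matching doc,
# instead of scanning every term's posting list for those doc ids.
brand_counts = Counter(forward_brand[doc_id] for doc_id in matching)
```

Here the search for `color = red` uses the inverted index, and grouping the hits by `brand` uses the forward index.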

 

  Forward index:

    Doc values are the data structure behind the forward index. The core principle is the same as for the inverted index: the data is written to disk files, and the OS cache is used to cache it (which improves forward index access performance). If the OS cache is not large enough, the data that does not fit is served from the disk files.

    There are many ways to optimize performance; the one covered here is giving the JVM less memory

      ES relies on the OS cache for caching and performance. Using JVM heap memory for caching is not recommended (it causes GC overhead and OOM problems), so give the JVM less memory and leave more for the OS cache, which improves caching and query efficiency for both the forward index and the inverted index.

 

  Column compression

    1. Identical values: merge them and keep a single copy

      If all values are the same, keep just one value. If there are fewer than 256 distinct values, use table encoding. If there are more than 256 values and they share a common divisor, divide each value by the greatest common divisor and store the quotients. If there is no common divisor, use offset encoding.
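The selection rules above can be sketched as a small function. This is a toy illustration that follows the rules as listed in these notes, not Lucene's actual doc values codec; the names `encode_column` and `decode_column` and the returned tuple layout are assumptions made for the example, and the input is assumed to be a non-empty list of integers:

```python
from functools import reduce
from math import gcd


def encode_column(values):
    """Pick an encoding for a column of integers (toy sketch).

    Returns (mode, metadata, encoded_values).
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        # All values identical: keep the single value once.
        return "constant", distinct[0], []
    if len(distinct) < 256:
        # Few distinct values: store a lookup table plus small ordinals.
        ordinal = {value: i for i, value in enumerate(distinct)}
        return "table", distinct, [ordinal[v] for v in values]
    divisor = reduce(gcd, values)
    if divisor > 1:
        # Shared divisor: store the divisor once and the smaller quotients.
        return "gcd", divisor, [v // divisor for v in values]
    # No shared divisor: store the minimum once and each value's offset from it.
    base = min(values)
    return "offset", base, [v - base for v in values]


def decode_column(mode, meta, encoded, length):
    """Rebuild the original column from the output of encode_column."""
    if mode == "constant":
        return [meta] * length
    if mode == "table":
        return [meta[i] for i in encoded]
    if mode == "gcd":
        return [q * meta for q in encoded]
    return [d + meta for d in encoded]
```

For instance, a column of 300 timestamps that are all multiples of 1000 would be stored as the divisor 1000 plus the quotients, which need fewer bits per value than the raw numbers.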

  

  Disable:

    If a field does not need the forward index, disable doc values for it to reduce disk space usage

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type":       "keyword",
          "doc_values": false
        }
      }
    }
  }
}

 To aggregate on an analyzed (tokenized) field, you must set fielddata to true. Otherwise ES returns an error prompting you to enable fielddata, which loads the forward index into memory.

    A non-analyzed field gets a forward index at index time, so aggregations use that forward index directly.

    An analyzed field gets no forward index (doc values) at index time, so aggregating on it directly reports an error. You need to enable fielddata; ES then builds the forward index on the fly when the aggregation runs and loads it into memory as fielddata. The aggregation on the analyzed field is performed against this in-memory forward index. This clearly consumes memory, so why hold it all in memory? An analyzed string has to be aggregated term by term, which involves complex algorithms and operations. Done against disk and the OS cache, performance would be poor. Between performance and memory, the choice here is performance.
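The on-the-fly step can be pictured as "uninverting" the inverted index into an in-memory mapping from document to terms. The following is a toy sketch only, not Elasticsearch's actual fielddata implementation; the sample `inverted` data and the helper names `uninvert` and `terms_aggregation` are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical inverted index of an analyzed text field: term -> doc ids.
inverted = {
    "quick": [0, 2],
    "brown": [0, 1],
    "fox": [0, 1, 2],
}


def uninvert(inverted_index):
    """Build an in-memory doc id -> terms mapping from term -> doc ids."""
    by_doc = defaultdict(list)
    for term, doc_ids in inverted_index.items():
        for doc_id in doc_ids:
            by_doc[doc_id].append(term)
    return dict(by_doc)


def terms_aggregation(doc_terms, matching_doc_ids):
    """Count how many of the matching docs contain each term."""
    counts = Counter()
    for doc_id in matching_doc_ids:
        counts.update(doc_terms.get(doc_id, []))
    return counts


# The whole uninverted structure lives in memory, which is where the cost comes from.
fielddata = uninvert(inverted)
buckets = terms_aggregation(fielddata, [0, 1])
```

Every term of every document ends up in the in-memory structure, which is why an analyzed field with many distinct terms can take up so much memory.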
