Detailed explanation of performance-related Elasticsearch mapping parameters

1. Overview

Let's start with a short introduction to some commonly used parameters to get a general sense of what they mean, and then look at the more important ones in detail.

1.1 Parameters affecting performance

Parameter | Default | Description
index | true | Whether the field value is indexed. A field that is not indexed cannot be searched.
enabled | true | Whether Elasticsearch parses and indexes the field at all (object fields and the top-level mapping only). Disabling it saves CPU, but the content can no longer be searched.
store | false | Whether the field value is stored separately from _source. If you only need to fetch a few small fields from each document, storing them can reduce IO.
doc_values | true | On-disk data structure for sorting, aggregations, and script access; consumes disk space.
fielddata | false | The text-field counterpart of doc_values for sorting, aggregations, and script access. It is expensive, so avoid it where possible.
norms | true (text fields) | Normalization factors used in scoring. If the field does not take part in scoring, set it to false to save disk space.

Norms can be disabled on many fields that are only used in term-level operations such as exact matching and filtering, because those fields do not need relevance scoring. Of course, each case should be analyzed on its own merits.

Structured content such as email addresses, hostnames, status codes, and tags is best mapped as keyword, and such fields do not need norms (keyword fields have norms disabled by default).
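As a sketch of the two parameters above, assume a hypothetical index my_index with made-up field names. The following mapping body (sent with PUT my_index) disables norms on a log message field whose length should not influence scoring. It also turns off indexing for a field that is only ever returned with the document.

```json
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "norms": false
      },
      "internal_note": {
        "type": "keyword",
        "index": false
      }
    }
  }
}
```

message can still be searched, but its length no longer affects the score. internal_note is kept in _source, but no inverted index is built for it.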

1.2 Other

Parameter | Default | Description
boost | 1.0 | Weighting factor applied to the field's score
analyzer | standard | Analyzer used to process the field's text at index time
similarity | BM25 | Algorithm used for scoring
fields | none | Multi-fields: index the same value in several different ways
null_value | none | Value that is indexed in place of an explicit null
search_analyzer | same as analyzer | Analyzer used for the query string at search time
ignore_above | none | Strings longer than this length are not indexed or stored (they remain in _source)
copy_to | none | Copies the field value into another field, so several fields can be searched through one
ignore_malformed | false | If true, values of the wrong type are ignored instead of rejecting the whole document
index_options | positions (text), docs (others) | What the inverted index stores: docs, freqs, positions, or offsets
coerce | true | Whether values may be coerced to the field type, e.g. string to number, floating point to integer
dynamic | true | Whether new fields in documents are added to the mapping automatically: true, false, or strict
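A minimal sketch of copy_to and ignore_above from the table, assuming a hypothetical index my_index and illustrative field names. This is a mapping body for PUT my_index.

```json
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "last_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "full_name": {
        "type": "text"
      },
      "tag": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
```

A query on full_name matches text from both first_name and last_name. A tag value longer than 256 characters is not indexed, but it is still returned in _source.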

2. ignore_malformed

A very useful parameter: as the name suggests, it ignores field values of the wrong type.

For example, suppose a document sets a date field to an email string, or to any other value that cannot be converted to a date. ES throws an exception and rejects the whole document.

If ignore_malformed is set to true, the malformed value is ignored (not indexed) and the remaining fields are processed normally.
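The following mapping body for PUT my_index sketches the difference. The index name and the field names created and updated are hypothetical.

```json
{
  "mappings": {
    "properties": {
      "created": {
        "type": "date",
        "ignore_malformed": true
      },
      "updated": {
        "type": "date"
      }
    }
  }
}
```

Indexing a document with "created": "user@example.com" succeeds, and only that value is skipped. The same string in updated causes the whole document to be rejected.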

3. enabled

Simply put, enabled controls whether Elasticsearch parses and indexes a field at all. It defaults to true, since the point of using ES is usually to be able to search the data.

What if we want to keep some data with the document but do not want it indexed, so that it cannot be searched and does not affect scoring?

Just set enabled to false.

Note: enabled can only be applied to the top-level mapping definition and to fields of type object.


If enabled is set to false, the field is not indexed, but its data can still be retrieved from _source.
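A minimal sketch, assuming a hypothetical index my_index. Here session_data is an illustrative object field that should be kept but never searched. This is a mapping body for PUT my_index.

```json
{
  "mappings": {
    "properties": {
      "user_id": {
        "type": "keyword"
      },
      "session_data": {
        "type": "object",
        "enabled": false
      }
    }
  }
}
```

Whatever is placed under session_data is not parsed or indexed, but it comes back unchanged in _source.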

4. store

store controls whether a field's value is stored separately. When it is set to true, Elasticsearch keeps a separate copy of the field in addition to _source. If a query retrieves only a few small fields from many (possibly large) documents, you can mark those small fields as stored.

Fetching them this way can be more efficient than loading the whole _source and then filtering out the rest, because it reduces IO.
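As an illustration, the mapping below (a body for PUT my_index, where my_index is hypothetical) stores title and date separately. These are the fields requested by the stored_fields query in this section. The large content field is assumed and is left unstored.

```json
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text"
      }
    }
  }
}
```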


Query:

{
  "_source": false,
  "stored_fields": [ "title", "date" ]
}

When stored_fields is specified, _source is not returned by default.

5. dynamic

The dynamic setting controls whether ES detects new fields in incoming documents and automatically adds them to the mapping.

dynamic accepts 3 values:

  1. true: new fields are detected and added to the mapping automatically
  2. false: the mapping is not updated. A document containing fields that are not in the mapping is still indexed successfully, but those fields are not indexed (they remain in _source)
  3. strict: if a document contains fields that are not in the mapping, an exception is thrown and the document is rejected
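A sketch, assuming a hypothetical index my_index with illustrative field names. dynamic can be set on the whole mapping and overridden on an inner object. This is a mapping body for PUT my_index.

```json
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "user": {
        "type": "keyword"
      },
      "extra": {
        "type": "object",
        "dynamic": true
      }
    }
  }
}
```

With this mapping, a document with an unknown top-level field is rejected. New fields inside extra are still added to the mapping automatically.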


6. doc_values

Defaults to true. If you do not need to sort or aggregate on a field, or access it from a script, you can set doc_values to false to save disk space.

The text type does not support doc_values.
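For example, a keyword field used only for exact-match lookups could turn doc_values off. This sketch is a mapping body for PUT my_index; the index name and session_id are hypothetical.

```json
{
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}
```

session_id can still be searched, because it is still indexed, but it can no longer be used for sorting or aggregations.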

7. fielddata

Because doc_values does not support the text type, text fields use fielddata instead. fielddata is essentially the text-field version of doc_values and is likewise used for sorting, aggregations, and script access.

Unlike doc_values, fielddata is held in memory (the JVM heap) rather than on disk.

Loading fielddata is expensive, so it defaults to false.

Using fielddata is strongly discouraged; design your mappings so that you do not need it.
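If fielddata really is unavoidable, it is switched on per text field. This is a sketch with a hypothetical index my_index and field my_text, as a body for PUT my_index.

```json
{
  "mappings": {
    "properties": {
      "my_text": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
```

Keep in mind that sorting or aggregating on such a field works on the individual analyzed terms, not on the original whole string. This is another reason a keyword sub-field is usually the better choice.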


8. index_options

index_options controls which information is stored in the inverted index. There are 4 options:

  1. docs: only document numbers are indexed
  2. freqs: document numbers + term frequencies
  3. positions: document numbers + term frequencies + term positions, needed for phrase and proximity queries
  4. offsets: document numbers + term frequencies + positions + character offsets, typically used for highlighting

Analyzed (text) fields default to positions; other fields default to docs.
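A sketch of both extremes, assuming a hypothetical index my_index with illustrative fields body and tags. This is a mapping body for PUT my_index.

```json
{
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "index_options": "offsets"
      },
      "tags": {
        "type": "text",
        "index_options": "docs"
      }
    }
  }
}
```

body keeps offsets, which the highlighter can use. tags keeps only document numbers, which makes its index smaller but rules out phrase queries on that field.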

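9. format

format sets the date format(s) a date field accepts when parsing values. If it is omitted, the default is strict_date_optional_time||epoch_millis.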

{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}
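Several formats can be combined with ||. Each is tried in turn until one matches, and the first one is used when a date is rendered back to a string. This is a sketch for the same illustrative date field.

```json
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
```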


10. normalizer

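Sets the normalizer for a keyword field. A normalizer works like an analyzer but always produces a single token. It can only use filters that work character by character, such as lowercase, and it is applied both at index time and to query terms.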
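A minimal sketch that defines a custom normalizer and attaches it to a keyword field. The index name my_index, the normalizer name my_normalizer, and the field foo are hypothetical. This is a body for PUT my_index.

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
```

With this mapping, values such as "BÀR" and "bar" are indexed as the same term, so a term query for either one matches both.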


11. null_value

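A null value cannot be indexed or searched. null_value specifies a value that is indexed in place of an explicit null, so such documents can still be found. It only affects indexing; the document's _source still contains null.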

{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        "null_value": "NULL"
      }
    }
  }
}
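With that mapping in place, documents whose status_code was explicitly null can be found by searching for the substitute value. This is a sketch of a body for GET my_index/_search, where my_index is hypothetical.

```json
{
  "query": {
    "term": {
      "status_code": "NULL"
    }
  }
}
```

Documents that omit status_code entirely, or set it to an empty array, are not matched, because null_value only replaces explicit nulls.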

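12. search_analyzer

search_analyzer sets the analyzer applied to the query string at search time. By default, the field's analyzer is used both when indexing and when searching; search_analyzer lets the two differ.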
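A typical use is prefix-style autocomplete: the field is indexed with edge n-grams but searched with the plain standard analyzer, so query terms are not split into n-grams themselves. This is a sketch; the index my_index, the names autocomplete and autocomplete_filter, and the field title are hypothetical. It is a body for PUT my_index.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
```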

13. fields

Use fields (multi-fields) when you want to handle one field in several ways, for example running full-text search on a text field while also sorting and aggregating on it.

{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type":  "keyword"
          }
        }
      }
    }
  }
}

With this mapping, full-text queries run against city, while sorting and aggregations use city.raw.
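A sketch of a search body (for GET my_index/_search, with my_index hypothetical) that uses both views of the same value. It matches on city, sorts on city.raw, and aggregates on city.raw. The aggregation name cities is illustrative.

```json
{
  "query": {
    "match": {
      "city": "york"
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
```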


14. Documentation

Elasticsearch reference documentation: Mapping parameters


Origin blog.csdn.net/trayvontang/article/details/103502794