Detailed explanation of important mapping parameters of Elasticsearch (3)

1. Overview

This part first lists the commonly used mapping parameters with a short description of each, and then explains the more important ones in detail.

1.1 Parameters affecting performance

Parameter    Description
index        Default true. Whether the field is indexed. If false, the field is not added to the inverted index and its terms cannot be searched.
enabled      Default true. Whether an object field is parsed and indexed. Disabling it reduces CPU usage, but the contents cannot be searched.
store        Default false. If queries only need a few small fields from large documents, storing those fields separately reduces IO.
doc_values   Default true. Speeds up sorting, aggregations, and script access, at the cost of disk space.
fielddata    Default false. Enables sorting, aggregations, and script access on text fields. It is expensive, so avoid it where possible.
norms        Default true. If the field does not take part in scoring, set it to false to reduce disk usage. Only relevant for text fields.

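In versions before 5.0, index took the values analyzed, not_analyzed, and no. Since 5.0 it is a boolean: whether a string is analyzed is decided by the field type (text is analyzed, keyword is not), and the old value no corresponds to false.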

A simple example: suppose a text field has the value "es is awesome".

If index is set to true for this field, a search for 'es' finds the document that contains the field.
If index is set to false for this field, a search for 'es' on this field cannot find the document.

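The difference between index and enabled:

  1. With index set to false, the field is not added to the inverted index, so its terms cannot be searched and it plays no part in scoring. The value is still parsed and type-checked, is kept in _source, and can still have doc_values for sorting and aggregations.
  2. With enabled set to false, the contents of the field are not parsed at all. They cannot be searched and do not affect the score; they can only be retrieved from _source.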
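
As a rough illustration of the difference, the sketch below builds an index body as a plain Python dict, with one field using index: false and one object using enabled: false. The field names internal_note and raw_payload are invented for this example, and sending the body to a cluster is left out. Python's True and False serialize to JSON true and false.

```python
import json

# Hypothetical index body; the field names are made up for illustration.
index_vs_enabled = {
    "mappings": {
        "properties": {
            # Parsed and kept in _source, but not added to the inverted index.
            "internal_note": {"type": "keyword", "index": False},
            # Object whose contents are not parsed at all; only _source keeps them.
            "raw_payload": {"type": "object", "enabled": False},
        }
    }
}

# Print the JSON body that would be sent when creating the index.
print(json.dumps(index_vs_enabled, indent=2))
```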

Norms are only needed for relevance scoring, so they can be set to false on fields that are only used for filtering or aggregations. Date fields, for example, are rarely used for full-text search. Whether to disable norms still depends on how the field is actually queried.

Structured values such as email addresses, host names, status codes, and tags are usually mapped as keyword, and such fields do not need norms.
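
A minimal sketch of a mapping that disables norms, again written as a Python dict. The field names log_message and host_name are assumptions made for this example.

```python
import json

# Hypothetical field names; only the "norms" setting matters here.
norms_mapping = {
    "mappings": {
        "properties": {
            # Text field that is filtered on but never ranked by relevance.
            "log_message": {"type": "text", "norms": False},
            # Structured value mapped as keyword.
            "host_name": {"type": "keyword"},
        }
    }
}

print(json.dumps(norms_mapping, indent=2))
```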

1.2 Other parameters

Parameter          Description
boost              Default 1. Weighting factor applied to the score.
analyzer           Analyzer used for the field.
similarity         Scoring algorithm used for the field.
fields             Indexes the same field in several ways (multi-fields).
null_value         Value to index in place of an explicit null.
search_analyzer    Analyzer used at search time.
ignore_above       Maximum string length. Longer strings are not indexed or stored.
copy_to            Copies the field's value into another field, so that several fields can be searched through one.
ignore_malformed   Default false. If true, malformed values for this field are ignored when a document is added.
index_options      What is recorded in the inverted index: docs, freqs, positions, or offsets.
coerce             Default true. Whether values are coerced to the field type, for example string to number or floating point to integer.
dynamic            Default true. Whether new fields found in documents are added to the mapping: true, false, or strict.
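
Three of these parameters are not covered in later sections, so here is a small sketch that combines copy_to, ignore_above, and coerce in one mapping. All field names are hypothetical and the limit of 20 is an arbitrary value chosen for the example.

```python
import json

other_params_mapping = {
    "mappings": {
        "properties": {
            # Both name parts are copied into full_name, which can be searched alone.
            "first_name": {"type": "text", "copy_to": "full_name"},
            "last_name": {"type": "text", "copy_to": "full_name"},
            "full_name": {"type": "text"},
            # Strings longer than 20 characters are not indexed.
            "tag": {"type": "keyword", "ignore_above": 20},
            # With coerce disabled, the string "10" is rejected instead of converted.
            "quantity": {"type": "integer", "coerce": False},
        }
    }
}

print(json.dumps(other_params_mapping, indent=2))
```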

4. ignore_malformed

A very useful parameter. As the name suggests, it ignores values that do not match the field's type.

For example, if a document sets a date field to an email string, or to any other value that cannot be converted to a date, ES throws an exception and rejects the whole document.

If ignore_malformed is set to true, only the malformed field is ignored and the other fields are processed normally.
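
A minimal sketch of the case described above, assuming a hypothetical index with a title field and a created_at date field. The document at the end is the kind of input that would otherwise be rejected.

```python
import json

ignore_malformed_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            # Malformed dates are skipped instead of failing the whole document.
            "created_at": {"type": "date", "ignore_malformed": True},
        }
    }
}

# Hypothetical document: created_at holds an email string instead of a date.
# With the mapping above it is accepted; title is indexed, created_at is not.
bad_date_document = {"title": "hello", "created_at": "someone@example.com"}

print(json.dumps(ignore_malformed_mapping, indent=2))
print(json.dumps(bad_date_document))
```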

5. enabled

Simply put, enabled controls whether a field is indexed. The default is true, because the usual purpose of ES is to index and search the data.

What if we only want to keep some data and retrieve it later, without indexing it or letting it affect the score of the document?

Just set enabled to false.

Note: enabled only takes effect on the top-level mapping definition and on fields of type object.


If enabled is set to false, the field is not indexed, but its data can still be retrieved through _source.
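
The note about the top level can be sketched as follows: setting enabled to false on the mapping definition itself turns off indexing for every field. The sample document is invented for the example.

```python
import json

# Top-level use: nothing in the documents is parsed or indexed.
disabled_mapping = {"mappings": {"enabled": False}}

# Hypothetical document; any JSON shape is accepted and kept in _source.
session_document = {
    "user_id": "kimchy",
    "session_data": {"arbitrary_object": {"some_array": ["foo", "bar"]}},
}

print(json.dumps(disabled_mapping))
print(json.dumps(session_document))
```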

6. store

store controls how the data is stored. If set to true, a separate copy of the field value is stored. If queries only need to retrieve a few small fields from many large documents, those small fields can be stored separately.

This is more efficient than loading _source and filtering out the rest, because it reduces IO.
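
A sketch of a mapping for this situation, assuming a hypothetical document with two small fields (title, date) and one large field (content). Only the small fields are stored separately.

```python
import json

store_mapping = {
    "mappings": {
        "properties": {
            # Small fields that are read often: keep a separate stored copy.
            "title": {"type": "text", "store": True},
            "date": {"type": "date", "store": True},
            # Large field: left at the default and kept only in _source.
            "content": {"type": "text"},
        }
    }
}

print(json.dumps(store_mapping, indent=2))
```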


Query:

{
  "_source": false,
  "stored_fields": [ "title", "date" ]
}

When stored_fields is used, _source is not returned by default.

7. dynamic

dynamic controls whether ES detects new fields in documents and adds them, with a suitable type, to the mapping.

dynamic has 3 values:

  1. true: new fields are detected and added to the mapping
  2. false: the mapping is not changed. A document with fields that are not in the mapping is still added successfully, but those fields are not indexed
  3. strict: a document with fields that are not in the mapping is rejected
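
The three values can be summarized with a toy model. This is not Elasticsearch code, only a simplified sketch of the rules listed above; the function name apply_dynamic and its return shape are made up for this example.

```python
def apply_dynamic(mapped_fields, document, dynamic="true"):
    """Toy model of the dynamic setting; not Elasticsearch code.

    Returns (accepted, indexed_fields, mapped_fields_after).
    """
    unknown = [name for name in document if name not in mapped_fields]
    if dynamic == "strict" and unknown:
        # The whole document is rejected and the mapping is unchanged.
        return False, [], list(mapped_fields)
    if dynamic == "true":
        # Unknown fields are added to the mapping and indexed.
        return True, list(document), list(mapped_fields) + unknown
    # dynamic == "false": unknown fields stay in _source but are not indexed.
    indexed = [name for name in document if name in mapped_fields]
    return True, indexed, list(mapped_fields)


doc = {"title": "a", "tags": "b"}
for setting in ("true", "false", "strict"):
    print(setting, apply_dynamic(["title"], doc, setting))
```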


8. doc_values

The default is true. If a field is never sorted on, aggregated on, or accessed from a script, doc_values can be set to false. This saves disk space and reduces disk IO at index time, because the on-disk doc_values structure does not have to be built.

doc_values is a data structure stored on disk and built when documents are added. It stores values column by column, which makes sorting and aggregations more efficient.

doc_values does not support text and annotated_text fields.
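
A minimal sketch, assuming two hypothetical keyword fields: status_code is aggregated on and keeps the default, while session_id is only looked up and has doc_values turned off.

```python
import json

doc_values_mapping = {
    "mappings": {
        "properties": {
            # Default doc_values: can be sorted and aggregated on.
            "status_code": {"type": "keyword"},
            # Still searchable, but not available for sorting or aggregations.
            "session_id": {"type": "keyword", "doc_values": False},
        }
    }
}

print(json.dumps(doc_values_mapping, indent=2))
```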

9. fielddata

Because doc_values does not support the text type, text fields have fielddata instead. fielddata is the text counterpart of doc_values and serves the same purpose: sorting, aggregations, and script access.

Unlike doc_values, fielddata is held in memory rather than on disk.

Loading fielddata is expensive, so the default is false.

It is strongly recommended to design mappings so that fielddata is not needed.
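
For completeness, this is roughly what turning it on looks like. The field name comment_text is hypothetical, and the sketch is only meant to show the setting, not to recommend it.

```python
import json

fielddata_mapping = {
    "mappings": {
        "properties": {
            # Loads the field's terms into memory on first sort or aggregation.
            "comment_text": {"type": "text", "fielddata": True},
        }
    }
}

print(json.dumps(fielddata_mapping, indent=2))
```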


10. index_options

Controls what information is recorded in the inverted index. There are 4 options:

  1. docs: document number
  2. freqs: document number + term frequency
  3. positions: document number + term frequency + position, usually needed for phrase and proximity queries
  4. offsets: document number + term frequency + position + offsets, usually used for fields that are highlighted

The position is the ordinal of the term in the field after analysis. The offsets are the start and end character offsets of the term in the original text, which makes it easy to locate the term in the document.

Analyzed text fields default to positions; all other fields default to docs.
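
To make the four levels concrete, here is a toy model of what each one records for a single document. It is not how Elasticsearch builds its index: the function build_postings is invented for this example and uses a naive lowercase word split in place of a real analyzer.

```python
import re


def build_postings(doc_id, text, index_options="positions"):
    """Toy model of what each index_options level records; not Elasticsearch code."""
    postings = {}
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        entry = postings.setdefault(match.group(), {"doc": doc_id})
        if index_options in ("freqs", "positions", "offsets"):
            entry["freq"] = entry.get("freq", 0) + 1
        if index_options in ("positions", "offsets"):
            entry.setdefault("positions", []).append(position)
        if index_options == "offsets":
            # Start and end character offsets in the original text.
            entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


sample_text = "es is awesome and es is fast"
for level in ("docs", "freqs", "positions", "offsets"):
    print(level, build_postings(1, sample_text, level)["es"])
```

Each level adds one more piece of information for the term es, which is why the higher levels take more disk space.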

11. format

format sets the date pattern that a date field accepts.

{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd"
      }
    }
  }
}


12. null_value

A null value cannot be indexed or searched. null_value sets a replacement value that is indexed in place of an explicit null, so that such documents can be found.

{
  "mappings": {
    "properties": {
      "status_code": {
        "type":       "keyword",
        "null_value": "NULL"
      }
    }
  }
}
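
A sketch of how the mapping above might be used, with two hypothetical documents and a term query. The replacement only applies to an explicit null, so the document with an empty array is not expected to match.

```python
import json

null_value_documents = [
    {"status_code": None},  # explicit null: indexed as "NULL"
    {"status_code": []},    # empty array: no explicit null, nothing is replaced
]

# Finds documents whose status_code was an explicit null.
null_value_query = {"query": {"term": {"status_code": "NULL"}}}

for document in null_value_documents:
    print(json.dumps(document))
print(json.dumps(null_value_query))
```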

13. fields

fields is used when the same field has to be indexed in more than one way. For example, a text field may be needed for full-text search and also for sorting and aggregations.

{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type":  "keyword"
          }
        }
      }
    }
  }
}

This way, full-text queries run on city, while sort and aggs operations run on city.raw.
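
A search body that uses both sub-fields could look like the sketch below. The search term york and the aggregation name cities are made up for the example.

```python
import json

city_search = {
    # Full-text match on the analyzed text field.
    "query": {"match": {"city": "york"}},
    # Sorting and aggregation on the keyword sub-field.
    "sort": {"city.raw": "asc"},
    "aggs": {"cities": {"terms": {"field": "city.raw"}}},
}

print(json.dumps(city_search, indent=2))
```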


14. search_analyzer

The analyzer used at search time. If it is not set, the analyzer used for indexing is also used for searching.
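
A common reason to set it is autocomplete: the field is indexed with an edge_ngram analyzer but searched with the standard analyzer. The sketch below assumes a custom analyzer named autocomplete and a field named suggest_text; both names are invented for the example.

```python
import json

search_analyzer_index = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "suggest_text": {
                "type": "text",
                # Index time: split each word into prefixes.
                "analyzer": "autocomplete",
                # Search time: keep the query words whole.
                "search_analyzer": "standard",
            }
        }
    },
}

print(json.dumps(search_analyzer_index, indent=2))
```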

15. normalizer

Sets the normalizer for the field. It is mainly used to filter out special characters, convert case, and so on.

It applies to keyword fields and runs on values before they are indexed and on query input before it is searched.
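
A minimal sketch of a custom normalizer that lowercases values and folds accented characters to ASCII. The names my_normalizer and product_code are assumptions made for this example.

```python
import json

normalizer_index = {
    "settings": {
        "analysis": {
            "normalizer": {
                "my_normalizer": {
                    "type": "custom",
                    "char_filter": [],
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            # Values are normalized before indexing and before term-level matching.
            "product_code": {"type": "keyword", "normalizer": "my_normalizer"},
        }
    },
}

print(json.dumps(normalizer_index, indent=2))
```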


16. Documentation

Elasticsearch mapping parameters


Origin blog.csdn.net/trayvontang/article/details/103550942