Mapping in ES (Elasticsearch)

Elasticsearch provides a rich set of mapping parameters for configuring fields, and many common features, such as choosing a field's analyzer, setting date formats, or selecting a scoring model, are achieved by setting these parameters. The usage of each parameter is introduced below.

1. analyzer

The analyzer parameter specifies the analyzer used for a text field, and it applies both at index time and at query time. The analyzer splits the text content into a number of terms; at query time the query string is analyzed in the same way (or by another analyzer, see search_analyzer) so that its terms can be matched against the index. Taking the commonly used IK Chinese analyzer (provided by the IK analysis plugin) as an example, setting the analyzer of the title field to ik_max_word means that ik_max_word is used to analyze the title field's content both when indexing and when querying. The mapping is as follows:

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}

At index time, the analyzer can be specified on the field or at the index level. Elasticsearch looks for it in the following order (earlier entries take precedence):

The analyzer defined on the field in the mapping
The analyzer named default in the index settings
The standard analyzer

At search time, the analyzer is looked up in this order:

The analyzer specified in the full-text query
The search_analyzer defined on the field in the mapping
The analyzer defined on the field in the mapping
The analyzer named default_search in the index settings
The analyzer named default in the index settings
The standard analyzer
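The two lookup orders above can be sketched in Python as follows. This only illustrates the priority rules and is not Elasticsearch code: the helper names resolve_index_analyzer and resolve_search_analyzer are hypothetical, field_mapping is one field's mapping as a dict, and index_analyzers stands for the analyzers defined under the index's settings.analysis.analyzer.

```python
def resolve_index_analyzer(field_mapping, index_analyzers):
    """Hypothetical helper: analyzer used when indexing a text field."""
    # 1. Analyzer defined on the field itself.
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    # 2. Index-level analyzer named "default".
    if "default" in index_analyzers:
        return "default"
    # 3. Built-in fallback.
    return "standard"


def resolve_search_analyzer(field_mapping, index_analyzers, query_analyzer=None):
    """Hypothetical helper: analyzer used when a full-text query hits the field."""
    # 1. Analyzer given explicitly in the full-text query.
    if query_analyzer is not None:
        return query_analyzer
    # 2. search_analyzer, then 3. analyzer, both from the field mapping.
    for key in ("search_analyzer", "analyzer"):
        if key in field_mapping:
            return field_mapping[key]
    # 4. Index-level "default_search", then 5. index-level "default".
    for name in ("default_search", "default"):
        if name in index_analyzers:
            return name
    # 6. Built-in fallback.
    return "standard"


title = {"type": "text", "analyzer": "ik_max_word"}
print(resolve_index_analyzer(title, {}))               # ik_max_word
print(resolve_search_analyzer(title, {}))              # ik_max_word
print(resolve_search_analyzer(title, {}, "ik_smart"))  # ik_smart
```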

2. search_analyzer

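In most cases the same analyzer should be used at index time and at search time, so that the terms produced from a query string match the terms in the index. Sometimes, however, a different analyzer is needed at search time, for example when an edge_ngram token filter is used for autocomplete: if the query were also split into prefixes, a search for "brown" would produce the terms b, br, bro and so on, and would match any document containing a word that starts with "b". By default a query uses the analyzer specified in the field's analyzer attribute, but this can be overridden with search_analyzer.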

 

Examples are as follows:

 

PUT website
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}

The title field is analyzed with the autocomplete analyzer at index time, but searched with the standard analyzer. Index a document:

 

PUT website/_doc/1
{
  "title": "Quick Brown Fox"
}

The terms generated for the title field in the inverted index are:

 

[q, qu, qui, quic, quick, b, br, bro, brow, brown, f, fo, fox]
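To see where these terms come from, the analysis chain can be approximated in Python. This is a simplified sketch, not Elasticsearch's implementation: the standard tokenizer is approximated by splitting on non-word characters (the real tokenizer follows Unicode text segmentation rules), and the function names are hypothetical.

```python
import re


def tokenize(text):
    # Rough stand-in for the standard tokenizer: split on runs of non-word characters.
    return [token for token in re.split(r"\W+", text) if token]


def edge_ngrams(token, min_gram=1, max_gram=20):
    # Emit the prefixes of the token whose length lies in [min_gram, max_gram].
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def autocomplete_analyzer(text):
    # tokenizer -> lowercase filter -> autocomplete_filter, as in the mapping above
    terms = []
    for token in tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms


def standard_search_analyzer(text):
    # search_analyzer "standard": tokenize and lowercase, no n-grams
    return [token.lower() for token in tokenize(text)]


print(autocomplete_analyzer("Quick Brown Fox"))
print(standard_search_analyzer("Brown"))  # ['brown']
```

A query for "Brown" becomes the single term brown, which is in the index, so the document matches; a query for the prefix "bro" also matches, because bro is one of the indexed n-grams.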


3. normalizer

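The normalizer parameter is used with keyword fields. A normalizer is similar to an analyzer, except that it produces a single token: instead of splitting the value, it standardizes it, for example by converting all characters to lowercase. It is applied before the value is indexed, and also to the query value when the field is searched with a query such as term or match. In the following example, the value of the foo field is normalized with the custom normalizer my_normalizer, which converts it to lowercase and folds accented characters to ASCII: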

 

PUT website
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}
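The effect of my_normalizer can be approximated in Python. This sketch rests on a simplifying assumption: ASCII folding is modelled with Unicode NFKD decomposition plus removal of combining marks, which covers accented Latin letters but not every character Lucene's asciifolding filter handles (for example ø or ß). The function name is hypothetical.

```python
import unicodedata


def my_normalizer(value):
    # "lowercase" filter
    lowered = value.lower()
    # Approximation of "asciifolding": decompose accented characters and
    # drop the combining marks, so "é" becomes "e".
    decomposed = unicodedata.normalize("NFKD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The whole value stays one token; it is only normalized, never split.
print(my_normalizer("BÀR Café"))  # bar cafe
```

Because the same normalization is applied to the query value, a term query for "CAFÉ" on foo and a document containing "café" both end up as the term cafe.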

4. boost

The boost parameter sets the weight of a field for relevance scoring. For example, to make a keyword match in the title field count twice as much as a match in the content field (whose boost keeps the default value of 1), the mapping is as follows:

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "boost": 2
      },
      "content": {
        "type": "text"
      }
    }
  }
}

The boost can also be set at query time:

 

POST website/_search
{
  "query": {
    "match": {
      "title": {
        "query": "I am Chinese",
        "boost": 2
      }
    }
  }
}

Specifying boost at query time is recommended. Index-time boosting has been deprecated since Elasticsearch 5.0: a boost set at index time cannot be changed without reindexing all documents, whereas a query-time boost achieves the same effect and can be adjusted freely for each query.
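As a query-time alternative to the mapping boost, the weight can also be expressed per field in a multi_match query with the field^boost syntax. The Python helper below only builds the request body (the name boosted_title_query is hypothetical); the body would be sent with POST website/_search using any HTTP client.

```python
import json


def boosted_title_query(text, title_boost=2):
    # Hypothetical helper: matches in title count title_boost times as much
    # as matches in content (whose boost stays at the default of 1).
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"title^{title_boost}", "content"],
            }
        }
    }


body = boosted_title_query("I am Chinese")
print(json.dumps(body, indent=2))
```

Changing the weight then only means sending a different number in the next request; nothing in the index has to be rebuilt.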

 

5. coerce

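The coerce parameter deals with dirty data and defaults to true. Data is not always clean: depending on how it is produced, the integer 5 might be sent as the string "5" or as the floating-point number 5.0. With coerce enabled, Elasticsearch tries to convert such values to the field's type: numeric strings are converted to numbers, and floating-point values are truncated when indexed into an integer field. With coerce set to false, a value such as the string "5" is rejected instead.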
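The conversion rules for an integer field can be sketched in Python. This approximates the behaviour described above and is not Elasticsearch's parser; the function name is hypothetical, and the exact rules used here for coerce set to false (strings rejected, floats accepted only without a fractional part) are an assumption worth checking against your Elasticsearch version.

```python
def coerce_integer(value, coerce=True):
    """Hypothetical helper: the value an integer field would index."""
    if isinstance(value, bool):
        raise ValueError(f"cannot index boolean {value!r} into an integer field")
    if isinstance(value, int):
        return value  # already clean
    if isinstance(value, str):
        if not coerce:
            raise ValueError(f"string {value!r} rejected because coerce is false")
        value = float(value)  # "5" -> 5.0, "5.0" -> 5.0; raises ValueError for "abc"
    if isinstance(value, float):
        if not coerce and not value.is_integer():
            raise ValueError(f"{value!r} has a decimal part")
        return int(value)  # int() truncates toward zero: 5.7 -> 5
    raise ValueError(f"unsupported value {value!r}")


for dirty in ("5", 5.0, 5.7):
    print(repr(dirty), "->", coerce_integer(dirty))
```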

 

6. copy_to

The copy_to parameter lets you build a custom _all field: it copies the values of several fields into a single group field, which can then be queried as one field. In the following example, the values of the title and content fields are copied into full_content.

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "full_content"
      },
      "content": {
        "type": "text",
        "copy_to": "full_content"
      },
      "full_content": {
        "type": "text"
      }
    }
  }
}
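What copy_to does at index time can be sketched in Python. The field values, not the analyzed terms, are copied, and the original _source is not modified: full_content is indexed and searchable but does not appear in the returned document. The helper name and the in-memory representation are hypothetical.

```python
def apply_copy_to(source, properties):
    """Hypothetical helper: field name -> list of values that get indexed."""
    indexed = {}
    for field, value in source.items():
        indexed.setdefault(field, []).append(value)
        targets = properties.get(field, {}).get("copy_to", [])
        if isinstance(targets, str):  # copy_to accepts one field name or a list
            targets = [targets]
        for target in targets:
            indexed.setdefault(target, []).append(value)
    return indexed


properties = {
    "title": {"type": "text", "copy_to": "full_content"},
    "content": {"type": "text", "copy_to": "full_content"},
    "full_content": {"type": "text"},
}
source = {"title": "Quick Brown Fox", "content": "jumps over the lazy dog"}
indexed = apply_copy_to(source, properties)
print(indexed["full_content"])   # ['Quick Brown Fox', 'jumps over the lazy dog']
print("full_content" in source)  # False: _source is left unchanged
```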

7. doc_values

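The doc_values parameter speeds up sorting and aggregations. When the inverted index is built, an additional column-oriented structure that maps each document to its field values is also written to disk, trading space for time. Doc values are enabled by default on every field type that supports them; for fields that will never be sorted or aggregated on, doc_values can be turned off to save disk space. In the example below, status keeps doc values (the default), while session_id disables them: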

 

PUT website
{
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword"
      },
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}

Note: text fields do not support doc_values.
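The difference between the two structures can be illustrated with a toy Python model. The inverted index answers "which documents contain this term?", while doc values answer "what is this document's value?", which is what sorting and aggregations need. This is a conceptual sketch with made-up data, not how Lucene lays out its files.

```python
from collections import Counter, defaultdict

# status value of each document (doc id -> value); the data is made up
docs = {1: "active", 2: "closed", 3: "active", 4: "pending"}

# Inverted index: term -> doc ids (used to find matching documents).
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[value].append(doc_id)

# Doc values: one column ordered by doc id (used to read a document's value).
doc_ids = sorted(docs)
column = [docs[doc_id] for doc_id in doc_ids]
row_of = {doc_id: row for row, doc_id in enumerate(doc_ids)}


def sort_by_status(matching):
    # Sorting reads each matching document's value straight from the column.
    return sorted(matching, key=lambda doc_id: column[row_of[doc_id]])


def terms_aggregation(matching):
    # A terms aggregation counts the column values of the matching documents.
    return Counter(column[row_of[doc_id]] for doc_id in matching)


print(inverted["active"])         # [1, 3]
print(sort_by_status([4, 1, 2]))  # [1, 2, 4]
print(terms_aggregation([1, 2, 3, 4]))
```

Without the column, finding a document's value would mean scanning every term of the inverted index, which is exactly the cost doc values avoid.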

 

8. dynamic

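The dynamic parameter controls whether new fields can be added to the mapping automatically. It accepts the following values:

true: the default; new fields are added to the mapping automatically
false: new fields are ignored; they are not indexed or searchable, but still appear in _source
strict: strict mode; an exception is thrown when a new field is found, and the document is rejected

Usage: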

 

PUT website
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": {
        "type": "text"
      }
    }
  }
}
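The three settings can be simulated in Python to show what happens to a document that contains a field not yet in the mapping. This is an illustrative sketch: the functions are hypothetical, and the type detection is deliberately simplified (real dynamic mapping also detects dates and, by default, maps strings to text with a keyword sub-field).

```python
def index_with_dynamic(properties, doc, dynamic="true"):
    """Hypothetical helper: returns the fields that get indexed.

    properties is the mapping's "properties" dict; with dynamic="true"
    new fields are added to it in place.
    """
    indexed = {}
    for field, value in doc.items():
        if field not in properties:
            if dynamic == "strict":
                raise ValueError(f"strict mapping: unknown field [{field}], document rejected")
            if dynamic == "false":
                continue  # ignored: stays in _source but is not indexed or searchable
            properties[field] = {"type": guess_type(value)}  # dynamic == "true"
        indexed[field] = value
    return indexed


def guess_type(value):
    # Deliberately simplified type detection for this sketch.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


doc = {"title": "Quick Brown Fox", "views": 3}
for mode in ("true", "false", "strict"):
    properties = {"title": {"type": "text"}}
    try:
        print(mode, index_with_dynamic(properties, doc, mode), properties)
    except ValueError as err:
        print(mode, "->", err)
```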

9. enabled

By default, Elasticsearch indexes every field, but some fields only need to be stored and are never queried or aggregated on. In that case the enabled parameter can be used. When enabled is set to false, Elasticsearch skips parsing the field's contents entirely: the value can still be retrieved from _source, but it cannot be searched. Because the contents are not parsed, they may hold data of any type. enabled can only be applied to the top-level mapping and to object fields. For example:

 

PUT website
{
  "mappings": {
    "properties": {
      "name": {
        "type": "object",
        "enabled": false
      }
    }
  }
}

10. fielddata

Text fields do not support doc values, so to aggregate or sort on a text field you need to enable fielddata. Fielddata is built the first time the field is used for an aggregation, for sorting, or in a script: Elasticsearch reads the field's entire inverted index from disk, inverts the term-to-document relationship into a document-to-terms structure, and keeps the result in the Java heap.

 

fielddata is disabled by default on text fields, because enabling it can consume a lot of heap memory.

 

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}
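Building fielddata essentially "uninverts" the index. The Python sketch below shows the idea on a tiny made-up inverted index; it is conceptual only and the function name is hypothetical. It also shows why sorting or aggregating on a text field is often not what you want: the values are the individual analyzed terms rather than the original strings, which is why a keyword field (or keyword multi-field) is usually the better choice.

```python
from collections import defaultdict


def uninvert(inverted):
    """Hypothetical helper: turn term -> doc ids into doc id -> terms."""
    doc_terms = defaultdict(list)
    for term in sorted(inverted):      # walk every term of the field...
        for doc_id in inverted[term]:  # ...and every document containing it
            doc_terms[doc_id].append(term)
    return dict(doc_terms)             # in Elasticsearch this lives on the JVM heap


# Inverted index of a text field for "Quick Brown Fox" (doc 1) and "Brown dog" (doc 2)
inverted = {"quick": [1], "brown": [1, 2], "fox": [1], "dog": [2]}
print(uninvert(inverted))
# {1: ['brown', 'fox', 'quick'], 2: ['brown', 'dog']}
```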

11. format

The format parameter specifies the date formats accepted by a date field. Several formats can be combined with ||, and they are tried in order:

 

PUT website
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}
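The formats are tried in turn, and the parsed date is stored internally as a number of milliseconds since the epoch in UTC. The Python sketch below mimics that for the three formats in this mapping; the Java-style patterns are translated by hand into strptime patterns, inputs without a time zone are treated as UTC, and the function name is hypothetical.

```python
from datetime import datetime, timezone

# Hand-translated equivalents of the first two formats in the mapping.
PATTERNS = [
    ("yyyy-MM-dd HH:mm:ss", "%Y-%m-%d %H:%M:%S"),
    ("yyyy-MM-dd", "%Y-%m-%d"),
]


def parse_date_millis(value):
    """Hypothetical helper: epoch milliseconds (UTC) for a date string."""
    for _java_pattern, py_pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, py_pattern)
        except ValueError:
            continue  # this format does not match, try the next one
        return int(parsed.replace(tzinfo=timezone.utc).timestamp()) * 1000
    try:
        return int(value)  # last format: epoch_millis
    except ValueError:
        raise ValueError(f"failed to parse date value [{value}]") from None


for raw in ("2020-01-01 12:30:00", "2020-01-01", "1577836800000"):
    print(raw, "->", parse_date_millis(raw))
```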

12. ignore_above

ignore_above sets a maximum string length: strings longer than this are not indexed or stored (they remain in _source), so they cannot be searched or aggregated on. It is used with keyword fields. For example:

 

PUT website
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}
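The rule can be sketched in Python: each value, or each element of an array, that is longer than ignore_above characters is skipped at index time but stays in _source. The function name is hypothetical.

```python
def indexed_keywords(value, ignore_above=20):
    """Hypothetical helper: which values of a keyword field get indexed."""
    values = value if isinstance(value, list) else [value]
    # Length is measured in characters; longer values are skipped, not truncated.
    return [v for v in values if len(v) <= ignore_above]


print(indexed_keywords("Syntax error"))                                    # ['Syntax error']
print(indexed_keywords("Syntax error with some long stacktrace"))          # []
print(indexed_keywords(["short", "a value that is definitely too long"]))  # ['short']
```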

13. ignore_malformed

ignore_malformed makes it possible to ignore malformed data. By default, indexing a value of the wrong data type into a field (for example, a string into an integer field) throws an exception and the whole document is rejected. If ignore_malformed is set to true, the exception is ignored: the malformed field is not indexed, while the other fields of the document are indexed normally.
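The two behaviours can be compared with a small Python simulation for an integer field. It is a sketch, not Elasticsearch code, and the function name is hypothetical; recent Elasticsearch versions also record the names of skipped fields in the _ignored metadata field, which the sketch mirrors with an "_ignored" key.

```python
def index_with_malformed_check(doc, integer_fields, ignore_malformed=False):
    """Hypothetical helper: fields that end up indexed for one document."""
    indexed, ignored = {}, []
    for field, value in doc.items():
        if field in integer_fields:
            try:
                value = int(value)
            except (TypeError, ValueError):
                if not ignore_malformed:
                    # Default: the whole document is rejected.
                    raise ValueError(f"failed to parse field [{field}] of type [integer]") from None
                ignored.append(field)  # skip only this field
                continue
        indexed[field] = value
    if ignored:
        indexed["_ignored"] = ignored  # mirrors the _ignored metadata field
    return indexed


doc = {"number_one": "foo", "text": "Some text value"}
print(index_with_malformed_check(doc, {"number_one"}, ignore_malformed=True))
# {'text': 'Some text value', '_ignored': ['number_one']}
```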

 

14. index

The index parameter controls whether a field is indexed. It accepts true (the default) or false; a field that is not indexed cannot be searched.

 

15. index_options

The index_options parameter controls what information is added to the inverted index. The following mapping makes the title field store document numbers, term frequencies, term positions, and the start and end character offsets of each term:

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "index_options": "offsets"
      }
    }
  }
}

index_options values:

docs: stores only document numbers
freqs: stores document numbers and term frequencies
positions: stores document numbers, term frequencies, and term positions; positions are needed for proximity and phrase queries. This is the default for text fields
offsets: stores document numbers, term frequencies, term positions, and the start and end character offsets of each term; offsets let the highlighter work without reanalyzing the text (older versions used them for the postings highlighter)
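To make the levels concrete, here is a toy inverted index in Python that keeps exactly the information each index_options value stores, plus a phrase check that only works when positions are available. It is a conceptual sketch: tokenization is simplified to lowercased word characters, and the function names are hypothetical.

```python
import re

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Hypothetical helper: term -> {doc id -> stored info} for the chosen level."""
    level = LEVELS.index(index_options)
    postings = {}
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            entry = postings.setdefault(match.group().lower(), {}).setdefault(doc_id, {})
            if level >= 1:  # freqs
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:  # positions
                entry.setdefault("positions", []).append(position)
            if level >= 3:  # offsets: start and end character of the term
                entry.setdefault("offsets", []).append((match.start(), match.end()))
    return postings


def phrase_match(postings, terms):
    # Requires positions: the terms must appear at consecutive positions.
    hits = []
    for doc_id, entry in postings.get(terms[0], {}).items():
        for start in entry["positions"]:
            if all(
                start + i in postings.get(term, {}).get(doc_id, {}).get("positions", [])
                for i, term in enumerate(terms[1:], start=1)
            ):
                hits.append(doc_id)
                break
    return hits


docs = {1: "Quick brown fox", 2: "the quick dog, quick"}
print(build_postings(docs, "docs")["quick"])   # {1: {}, 2: {}}
print(build_postings(docs, "offsets")["quick"][2])
print(phrase_match(build_postings(docs), ["quick", "brown"]))  # [1]
```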

16. fields

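The fields parameter (multi-fields) lets the same field be indexed in several different ways. For example, a text field can be searched both as Chinese text and by pinyin (the ik_* analyzers come from the IK analysis plugin and the pinyin analyzer from the pinyin analysis plugin); the sub-field is then queried as title.pinyin. The mapping is as follows: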

 

PUT website
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin"
          }
        }
      }
    }
  }
}

17. norms

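The norms parameter stores normalization factors (such as the field length) that are used at query time to compute a document's relevance score. Norms are useful for scoring but consume a lot of disk space, so if a field is never used for scoring (for example, it is only used for filtering or aggregations), it is best to disable norms on it. Norms are enabled by default on text fields and disabled on keyword fields.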

 

18. null_value

A null value cannot be indexed or searched. The null_value parameter replaces explicit null values with the specified value, so that they can be indexed and searched. For example:

 

PUT website
{
  "mappings": {
    "properties": {
      "status": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}

PUT website/_doc/1
{
  "status": null
}

PUT website/_doc/2
{
  "status": []
}

GET website/_search
{
  "query": {
    "term": {
      "status": "NULL"
    }
  }
}

Document 1 is found, because its status is null. Document 2 is not found, because its status is an empty array, which does not contain an explicit null and is therefore not replaced with null_value.
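The replacement rule can be sketched in Python. The function name is hypothetical, and the term search is reduced to a membership check on the indexed values.

```python
def values_to_index(value, null_value="NULL"):
    """Hypothetical helper: values a keyword field indexes, given null_value."""
    if value is None:
        return [null_value]  # an explicit null is replaced
    values = value if isinstance(value, list) else [value]
    # An empty array contains no explicit null, so nothing is indexed for it;
    # null elements inside an array are replaced one by one.
    return [null_value if v is None else v for v in values]


docs = {1: {"status": None}, 2: {"status": []}}
index = {doc_id: values_to_index(doc["status"]) for doc_id, doc in docs.items()}
hits = [doc_id for doc_id, values in index.items() if "NULL" in values]
print(hits)  # [1]
```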

 

19. properties

Type mappings, object fields, and nested fields contain sub-fields called properties. These properties can be of any data type, including object and nested. Properties can be added in the following ways:

Explicitly, when creating an index.
Explicitly, when adding or updating a mapping with the PUT mapping API.
Dynamically, when indexing a document that contains new fields.

20. similarity

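The similarity parameter specifies the scoring model (similarity algorithm) used for a field. Three values are available:

BM25: the Okapi BM25 algorithm, the default in Elasticsearch and Lucene.
classic: the TF/IDF model. It was deprecated in 6.x and removed in Elasticsearch 7.0, so the example below only works on older versions.
boolean: a simple boolean model for cases where full-text ranking is not needed; the score depends only on whether the query terms match.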

PUT website
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "similarity": "classic"
      }
    }
  }
}
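To see what BM25 computes, and where the field length kept in norms comes in, here is the textbook Okapi BM25 score of a single query term in Python, using Elasticsearch's default parameters k1 = 1.2 and b = 0.75 and the IDF form used by Lucene. It is a sketch for intuition only: Lucene's implementation differs in details such as constant factors and the lossy encoding of field length in norms, so real _score values will not match exactly.

```python
import math


def bm25_term_score(tf, doc_freq, doc_count, field_len, avg_field_len, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score."""
    # Rarer terms (small doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: field_len is what norms keep per document.
    length_norm = 1 - b + b * field_len / avg_field_len
    # Term frequency saturates instead of growing linearly.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


short_doc = bm25_term_score(tf=1, doc_freq=10, doc_count=1000, field_len=5, avg_field_len=10)
long_doc = bm25_term_score(tf=1, doc_freq=10, doc_count=1000, field_len=50, avg_field_len=10)
print(short_doc > long_doc)  # True: the same match counts more in a shorter field
```

Setting b to 0 removes the length normalization entirely, which is roughly the effect of a field whose length is not taken into account.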

21. store

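By default, field values are indexed so that they can be searched, but they are not stored separately, because the _source field already holds a copy of the original document. In some cases it does make sense to store individual fields, for example when _source is disabled so that the full original document is not kept and only a few fields need to be retrievable. In the following example _source is disabled, the title field and the date field are stored, and content is not; stored fields can be retrieved with the stored_fields search parameter: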

 

PUT website
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text"
      }
    }
  }
}

22. term_vector

Term vectors contain the following information about the terms produced by the analysis process:

A list of terms
The position (order) of each term
The start and end character offsets that map each term to its place in the original string

term_vector values:

no: the default; no term vectors are stored
yes: only the terms in the field are stored
with_positions: terms and positions are stored
with_offsets: terms and character offsets are stored
with_positions_offsets: terms, positions, and character offsets are stored
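Unlike the inverted index, which is organised by term, a term vector is stored per document. The Python sketch below builds the term vector of a single field value for each term_vector setting; tokenization is simplified to lowercased word characters, and the function name is hypothetical. In Elasticsearch, stored term vectors can be read back with the _termvectors API, and the fast vector highlighter requires with_positions_offsets.

```python
import re


def term_vector(text, option="with_positions_offsets"):
    """Hypothetical helper: per-document term vector for one field value."""
    if option == "no":
        return {}  # nothing is stored
    vector = {}
    for position, match in enumerate(re.finditer(r"\w+", text)):
        info = vector.setdefault(match.group().lower(), {})
        if option in ("with_positions", "with_positions_offsets"):
            info.setdefault("positions", []).append(position)
        if option in ("with_offsets", "with_positions_offsets"):
            info.setdefault("offsets", []).append((match.start(), match.end()))
    return vector


print(term_vector("Quick brown fox", "yes"))  # {'quick': {}, 'brown': {}, 'fox': {}}
print(term_vector("Quick brown fox")["fox"])  # {'positions': [2], 'offsets': [(12, 15)]}
```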

Source: https://blog.csdn.net/dwjf321/article/details/104003852

 


Origin blog.csdn.net/ailiandeziwei/article/details/104674654