Getting to Know Elasticsearch: Mapping

This article introduces Mapping and Dynamic Mapping, explains how Elasticsearch automatically determines the type of a field, and covers the main Mapping parameters.

First, let's look at what Mapping is:

What is Mapping?

In the earlier article on Elasticsearch terminology, we said that Mapping is similar to the schema definition of a table in a database. It serves the following purposes:

  • Defines the names of the fields in an index
  • Defines the data type of each field, such as string, number, or Boolean
  • Configures the inverted index for each field, for example whether a field is indexed and whether term positions are recorded

In earlier versions of ES, an index could have multiple Types. Starting from 7.0, an index has only one Type, and each Type has one Mapping definition.

Now that we know what Mapping is, let's look at how to configure it:

Mapping settings

PUT users
{
    "mappings": {
        "dynamic": false
    }
}

When creating an index, you can set dynamic to true, false, or strict.

Dynamic Mappings settings

For example, suppose a new document contains a field that is not yet in the Mapping. When dynamic is true, the document is indexed into ES, the new field is indexed as well (so it can be searched), and the Mapping is updated. When dynamic is false, the document is still indexed, but the new field is ignored: it is not indexed and cannot be searched, although it remains in _source. When dynamic is strict, writing the document fails with an error.
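To make the three behaviours concrete, here is a small Python sketch. It is a toy model and not Elasticsearch code: the function name index_document, the string values "true" / "false" / "strict", and the placeholder type "auto-detected" are all assumptions made for illustration.

```python
def index_document(mapping, dynamic, doc):
    """Toy model of the dynamic setting (not Elasticsearch code).

    mapping: dict of field name -> type, updated in place when dynamic is "true".
    Returns (source, searchable_fields); raises ValueError when dynamic is "strict"
    and the document contains a field that is not in the mapping.
    """
    unknown = [field for field in doc if field not in mapping]
    if unknown and dynamic == "strict":
        raise ValueError(f"strict mapping rejects unknown fields: {unknown}")
    if dynamic == "true":
        for field in unknown:
            mapping[field] = "auto-detected"  # placeholder for the inferred type
    # Only fields present in the mapping end up searchable.
    searchable = [field for field in doc if field in mapping]
    # The original document is kept as _source in every non-failing case.
    return dict(doc), searchable


mapping = {"name": "keyword"}
source, searchable = index_document(mapping, "false", {"name": "wupx", "age": 18})
print(source)      # the new field is still part of _source
print(searchable)  # but only mapped fields can be searched
```

With "false" the age field stays in the returned source but is not searchable, which mirrors the description above.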

Another parameter, index, controls whether the current field is indexed. It defaults to true; if it is set to false, the field cannot be searched.

The index_options parameter controls what is recorded in the inverted index. It has the following four settings:

  • docs: records only the doc id
  • freqs: records the doc id and term frequencies
  • positions: records the doc id, term frequencies, and term positions
  • offsets: records the doc id, term frequencies, term positions, and character offsets

In addition, the text type defaults to positions and other types default to docs. The more content is recorded, the more storage space is used.
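The four levels are easier to compare side by side. The following Python sketch builds a toy inverted index at each level; it assumes a naive whitespace tokenizer and an invented build_postings helper, so it only illustrates what each setting adds and says nothing about how Lucene stores postings internally.

```python
from collections import defaultdict

LEVELS = ["docs", "freqs", "positions", "offsets"]


def build_postings(docs, index_options="positions"):
    """Build a toy inverted index: term -> {doc_id: recorded info}.

    docs maps doc id -> text. Each level records everything the
    previous level records, plus one more piece of information.
    """
    level = LEVELS.index(index_options)
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        cursor = 0
        for position, term in enumerate(text.split()):
            start = text.index(term, cursor)  # character offset of this occurrence
            end = start + len(term)
            cursor = end
            entry = postings[term].setdefault(doc_id, {})  # "docs": doc id only
            if level >= 1:
                entry["freq"] = entry.get("freq", 0) + 1
            if level >= 2:
                entry.setdefault("positions", []).append(position)
            if level >= 3:
                entry.setdefault("offsets", []).append((start, end))
    return {term: dict(by_doc) for term, by_doc in postings.items()}


docs = {1: "to be or not to be"}
for option in LEVELS:
    print(option, build_postings(docs, option)["to"])
```

Each step down the list stores strictly more per term, which is why the higher levels cost more disk space.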

The null_value parameter defines how a field is handled when its value is null. By default it is null, which means the field is treated as having no value and ES ignores it. Setting null_value supplies a replacement value to index instead. Note that of the two string types only keyword supports null_value; text does not.
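As a rough illustration, the Python sketch below builds an index body that sets null_value on a keyword field and models which term would be indexed. The field name mobile, the replacement string "NULL", and the indexed_term helper are assumptions chosen for the example.

```python
import json

# Hypothetical index body: "mobile" is an illustrative field name.
users_mapping = {
    "mappings": {
        "properties": {
            "mobile": {"type": "keyword", "null_value": "NULL"}
        }
    }
}


def indexed_term(value, field_mapping):
    """Toy model: the term that would be indexed for a field value."""
    if value is None:
        # Fall back to null_value; None means nothing is indexed.
        return field_mapping.get("null_value")
    return value


field = users_mapping["mappings"]["properties"]["mobile"]
print(json.dumps(users_mapping, indent=2))  # body for a create-index request
print(indexed_term(None, field))
print(indexed_term("12345", field))
```

null_value only affects what is indexed; the _source of the document still contains the original null.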

The copy_to parameter copies the value of a field into a target field, achieving an effect similar to _all. The target field does not appear in _source; it is used only for searching.
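Here is a minimal Python sketch of that behaviour. The field names first_name, last_name, and full_name and the apply_copy_to helper are made up for illustration; the point is only that the copied values are searchable through the target field while _source stays unchanged.

```python
# Hypothetical properties section: two fields copy into "full_name".
properties = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "last_name": {"type": "text", "copy_to": "full_name"},
    "full_name": {"type": "text"},
}


def apply_copy_to(properties, doc):
    """Toy model of copy_to: returns (source, indexed values per field)."""
    indexed = {field: [value] for field, value in doc.items()}
    for field, value in doc.items():
        target = properties.get(field, {}).get("copy_to")
        if target:
            # The value becomes searchable under the target field too.
            indexed.setdefault(target, []).append(value)
    # _source is the document exactly as it was sent.
    return dict(doc), indexed


source, indexed = apply_copy_to(properties, {"first_name": "wu", "last_name": "px"})
print(source)
print(indexed)
```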

Besides the parameters described above, there are many others. If you are interested, see the official documentation.

Now that we have covered Mapping settings, let's look at the field data types.

Field data types

ES field types are similar to column types in MySQL. They fall into core types, complex types, geographic types, and special types. The specific types are shown below:

[Figure: Field data types]

Core type

As the figure shows, the core types can be divided into string types, numeric types, the date type, the Boolean type, the BASE64-based binary type, and range types.

String type

ES 7.x has two string types: text and keyword. The string type has not been supported since ES 5.x.

The text type is for fields that need full-text search, such as news bodies, message content, and other long text. A text field is split into terms by a Lucene analyzer and stored in Lucene's inverted index. text fields cannot be used for sorting. To use this type, simply set type to text for the corresponding field when defining the mapping.

The keyword type is suitable for short, structured strings, such as host names, people's names, and product names. It can be used for filtering, sorting, and aggregations, and also for exact-match queries.

Numeric types

The numeric types are long, integer, short, byte, double, float, half_float, and scaled_float.

Choose the smallest numeric type that covers the range you need: the shorter the field, the more efficient the search. For floating-point numbers, consider the scaled_float type, which stores a floating-point value as an integer by multiplying it by a scaling factor. For example, with a scaling factor of 100, 12.34 is stored as 1234.
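The arithmetic behind scaled_float can be sketched in a few lines of Python. The helper names to_scaled and from_scaled are invented for this example, and the sketch assumes the stored value is the input multiplied by the scaling factor and rounded to the nearest integer.

```python
def to_scaled(value, scaling_factor):
    """Value as it would be stored: multiplied by the factor and rounded."""
    return round(value * scaling_factor)


def from_scaled(stored, scaling_factor):
    """Recover the (possibly less precise) floating-point value."""
    return stored / scaling_factor


print(to_scaled(12.34, 100))                       # stored integer
print(from_scaled(to_scaled(12.34, 100), 100))     # value read back
print(from_scaled(to_scaled(12.3456, 100), 100))   # extra digits are lost
```

The last line shows the trade-off: digits beyond the precision implied by the scaling factor are rounded away.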

Date Type

In ES a date can take the following forms:

  • A formatted date string, for example 2020-03-17 00:00 or 2020/03/17
  • A timestamp (the difference from 1970-01-01 00:00:00 UTC), in milliseconds or seconds

Even when a formatted date string is supplied, ES still stores a timestamp internally.
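To see how the two forms relate, the following Python sketch converts the example strings into milliseconds since the epoch. The to_epoch_millis helper and the strptime patterns are assumptions for illustration, and the sketch treats a date without a time zone as UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(text, fmt):
    """Parse a date string (treated as UTC) into milliseconds since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


print(to_epoch_millis("2020-03-17 00:00", "%Y-%m-%d %H:%M"))
print(to_epoch_millis("2020/03/17", "%Y/%m/%d"))
```

Both strings describe the same instant, so they map to the same timestamp.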

Boolean

JSON documents also have a Boolean type, but ES can also convert a JSON string to a Boolean for storage, provided the string value is "true" or "false". Boolean fields are commonly used for filtering in searches.

Binary type

The binary type accepts a BASE64-encoded string. Its store attribute defaults to false, and the field cannot be searched.
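Producing a value for a binary field is just a matter of BASE64-encoding the raw bytes. A small Python sketch, with an invented to_binary_field helper and a hypothetical field name blob:

```python
import base64


def to_binary_field(raw: bytes) -> str:
    """Encode raw bytes as the BASE64 string a binary field expects."""
    return base64.b64encode(raw).decode("ascii")


doc = {"blob": to_binary_field(b"hello")}  # "blob" is a made-up field name
print(doc)
print(base64.b64decode(doc["blob"]))  # decoding gives the original bytes back
```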

Range Type

Range types represent an interval of values. There are five kinds: integer_range, float_range, long_range, double_range, and date_range.

Complex type

The main complex types are the object type (object) and the nested type (nested):

Object Types

JSON allows objects to be nested, so a document can contain nested, multi-level objects. Such inner documents can be stored with the object type, but because Lucene has no concept of inner objects, ES flattens the original JSON document. For example, take the document:

{
    "name": {
        "first": "wu",
        "last": "px"
    }
}

ES actually converts it to the following format before storing it through Lucene, even though name is of type object:

{
    "name.first": "wu",
    "name.last": "px"
}
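The flattening step can be sketched as a short recursive function in Python. This is an illustration of the idea rather than the actual implementation, and the name flatten is made up for the example.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted field names, e.g. name.first."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into the inner object, extending the dotted path.
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


print(flatten({"name": {"first": "wu", "last": "px"}}))
```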

Nested types

The nested type can be seen as a special object type that allows the objects in an array to be queried independently of each other. For example, take the document:

{
  "group": "users",
  "username": [
    { "first": "wu", "last": "px"},
    { "first": "hu", "last": "xy"},
    { "first": "wu", "last": "mx"}
  ]
}

The username field is a JSON array, and each element of the array is a JSON object. If username is mapped as the object type, ES converts it to:

{
  "group": "users",
  "username.first": ["wu", "hu", "wu"],
  "username.last": ["px", "xy", "mx"]
}

In the converted document, the association between first and last has been lost. If you search for a document whose first is wu and whose last is xy, this document is returned. However, wu and xy do not belong to the same JSON object in the original document, so it should not match and the search should return no results.

The nested type solves this problem. It stores each JSON object in the array as a separate hidden document, so each nested object can be searched independently. Although on the surface there is only one document, four documents are actually stored.
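The difference between the two types can be reproduced with a toy matcher in Python. The helper names below are invented, and the matching logic is a simplification of what a real query does, but it shows why the flattened form produces a false match.

```python
usernames = [
    {"first": "wu", "last": "px"},
    {"first": "hu", "last": "xy"},
    {"first": "wu", "last": "mx"},
]


def flatten_array(objects):
    """Object-type view: one list of values per field, pairing is lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_as_object(objects, **criteria):
    """Each criterion is checked against the flattened value lists."""
    flat = flatten_array(objects)
    return all(value in flat.get(key, []) for key, value in criteria.items())


def matches_as_nested(objects, **criteria):
    """All criteria must hold within one and the same inner object."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


print(matches_as_object(usernames, first="wu", last="xy"))  # matches by accident
print(matches_as_nested(usernames, first="wu", last="xy"))  # no single object has both
```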

Geography Type

Geographic fields come in two types: the geo-point type and the geo-shape type:

Geo-point type

A geo-point field (geo_point) stores latitude and longitude information. With this type you can find documents within a specified geographic area, sort by distance, adjust scoring rules based on location, and so on.

Geo-shape type

Latitude and longitude can represent a point, whereas the geo_shape type can represent a geographic area. The area can be any polygon, and the type also supports geometries such as point, line, polygon, multi-point, multi-line, and multi-polygon.

Special type

Special types include the IP type, the filter type, the Join type, the alias type, and others. Here we briefly introduce the IP type and the Join type; for the other special types, see the official documentation.

IP type

An IP field can store IPv4 or IPv6 addresses. To store a field as the IP type, you need to define the mapping manually:

{
  "mappings": {
    "properties": {
      "my_ip": {
        "type": "ip"
      }
    }
  }
}

Join type

The Join type was introduced in ES 6.x to replace the obsolete _parent meta field. It implements one-to-one and one-to-many relationships between documents and is mainly used for parent-child queries.

The Mapping for a Join type is as follows:

PUT my_join_index
{
  "mappings": {
    "properties": {
      "my_join_field": { 
        "type": "join",
        "relations": {
          "question": "answer" 
        }
      }
    }
  }
}

Here, my_join_field is the name of the Join field, and relations specifies the relationship: question is the parent of answer.

For example, define a parent document with ID 1:

PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question" 
}

Next, define a child document and specify that its parent document ID is 1:

PUT my_join_index/_doc/2?routing=1&refresh 
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer", 
    "parent": "1" 
  }
}

Now that we understand the field data types, let's look at what Dynamic Mapping is.

What is Dynamic Mapping?

With the Dynamic Mapping mechanism we do not need to define the Mapping manually; ES automatically infers field types from the document. Sometimes the inference is wrong, however: geographic information, for example, may be inferred as text. When a type is set incorrectly, some features do not work properly, such as range queries.

Automatic Identification Type

ES's automatic type detection is based on the JSON types. If the input is a JSON string that matches a date format, ES automatically sets the type to date. If the input is a string containing a number, ES treats it as a string by default, although this can be configured so that it is converted to a suitable numeric type. If the input is detected as a text field, ES automatically adds a keyword subfield. Some of the detection rules are shown below:

[Figure: Automatic Identification Type]

Let's use an example to see how types are detected automatically. Send the following request, which creates the index:

PUT /mapping_test/_doc/1
{
  "uid": "123",
  "username": "wupx",
  "birth": "2020-03-16",
  "married": false,
  "age": 18,
  "heigh": 180,
  "tags": [
    "java",
    "boy"
  ],
  "money": 999.9
}

Then run GET /mapping_test/_mapping to view the result, as shown below:

As the result shows, ES automatically infers an appropriate type from the content of the document.
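For reference, the default detection rules can be approximated with a short Python sketch applied to the document above. This is a simplification and not how ES is implemented: the infer_type helper is invented, the date check is a single regular expression rather than the real list of date formats, and numeric detection for strings is assumed to be off.

```python
import re

# Simplified stand-in for date detection (e.g. 2020-03-16 or 2020-03-16T10:00:00).
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def infer_type(value):
    """Approximate the type dynamic mapping would pick for a JSON value."""
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return "date"
        return "text + keyword subfield"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        first = next((item for item in value if item is not None), None)
        return infer_type(first) if first is not None else None
    if isinstance(value, dict):
        return "object"
    return None  # null: no field is added


doc = {
    "uid": "123",
    "username": "wupx",
    "birth": "2020-03-16",
    "married": False,
    "age": 18,
    "heigh": 180,
    "tags": ["java", "boy"],
    "money": 999.9,
}
for field, value in doc.items():
    print(field, "->", infer_type(value))
```

Note that uid stays a string type even though it looks like a number, which matches the default behaviour described earlier.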

But what if I want to change the type of a field in the Mapping? Can it be changed? Let's look at two cases:

Can a Mapping field type be modified?

If the field is newly added, there are three cases depending on the dynamic setting:

  • When dynamic is true, the Mapping is updated as soon as a document containing the new field is written.
  • When dynamic is false, the Mapping is not updated and the new field's data is not indexed, so it cannot be searched, but the data still appears in _source.
  • When dynamic is strict, writing the document fails.

The other case is a field that already exists. Here ES does not allow the field type to be changed, because ES is built on Lucene's inverted index, which cannot be modified once it has been generated. To change a field type, you must rebuild the index using the Reindex API.

The reason is that if the data type of a field were changed, data that has already been indexed could no longer be searched. Adding a new field has no such effect.
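A rebuild typically means creating a new index with the desired mapping and then copying the data across with the Reindex API. The Python sketch below only assembles the two requests so their shape is visible; the index names users and users_v2, the field age, and the reindex_plan helper are hypothetical, and nothing is actually sent to a cluster.

```python
import json


def reindex_plan(source_index, dest_index, new_properties):
    """Return the (method, path, body) requests for rebuilding an index."""
    create = ("PUT", f"/{dest_index}", {"mappings": {"properties": new_properties}})
    copy = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create, copy]


# Hypothetical example: change "age" to integer in a new index.
plan = reindex_plan("users", "users_v2", {"age": {"type": "integer"}})
for method, path, body in plan:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The order matters: the destination index is created with its mapping first, so the copied documents are indexed with the new field type instead of a dynamically inferred one.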

Summary

This article described Mapping and Dynamic Mapping, introduced the field types in detail, explained how ES infers field types, and covered the main Mapping parameters.


References

Elasticsearch Technical Analysis and Practice

Elastic Stack from Entry to Practice

Elasticsearch Core Technology and Practice

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/mapping.html

Origin www.cnblogs.com/wupeixuan/p/12514843.html