This article introduces Mapping and Dynamic Mapping in Elasticsearch, explains how Elasticsearch automatically determines the type of a field, and covers the most relevant Mapping parameters.
First, let's look at what a Mapping is:
What is Mapping?
In an earlier article on Elasticsearch terminology, we said that a Mapping is similar to a table schema definition in a database. It serves the following purposes:
- Defines the names of the fields in an index
- Defines the data type of each field, such as string, number, or Boolean
- Configures the inverted index for each field, such as whether the field is indexed and whether positions are recorded
In earlier versions of ES, an index could have multiple Types. Starting from 7.0, an index has only one Type, so each Type can be said to have exactly one Mapping definition.
Now that we know what a Mapping is, let's look at the Mapping settings:
Mapping settings
PUT users
{
  "mappings": {
    "dynamic": false
  }
}
When you create an index, you can set dynamic to true, false, or strict.
For example, suppose a new document contains a field that is not yet in the Mapping. When dynamic is set to true, the document is indexed into ES, the new field is indexed as well (that is, it can be searched), and the Mapping is updated. When dynamic is set to false, the document is still indexed, but the new field is ignored: it is not indexed and the Mapping is not updated. When dynamic is set to strict, writing the document fails with an error.
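The three modes can be sketched as a toy model in Python. This is only an illustration of the behavior described above, not Elasticsearch code; the function name index_document and the sample field names are made up for the example.

```python
# Toy model of the three dynamic modes; this is not Elasticsearch code.
def index_document(mapped_fields, doc, dynamic):
    """Return (searchable_fields, mapping_fields_after_write) for one write."""
    unknown = [name for name in doc if name not in mapped_fields]
    if unknown and dynamic == "strict":
        # strict: the whole write is rejected
        raise ValueError(f"unmapped fields not allowed: {unknown}")
    if dynamic == "true":
        # true: new fields are added to the mapping and indexed
        new_mapping = sorted(set(mapped_fields) | set(unknown))
    else:
        # false: the mapping is unchanged; unknown fields are not indexed
        new_mapping = sorted(mapped_fields)
    searchable = [name for name in doc if name in new_mapping]
    return searchable, new_mapping


doc = {"name": "wupx", "nickname": "wu"}  # "nickname" is not mapped yet
print(index_document(["name"], doc, "true"))
print(index_document(["name"], doc, "false"))
```

With strict, the same call raises an error instead of returning, which mirrors the rejected write.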
Another parameter, index, controls whether the current field is indexed. It defaults to true; if it is set to false, the field cannot be searched.
The index_options parameter controls what is recorded in the inverted index. It has the following four settings:
- docs: records only the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id, term frequencies, and term positions
- offsets: records the doc id, term frequencies, term positions, and character offsets
The text type defaults to positions, and other types default to docs. The more content is recorded, the more storage space is used.
The null_value parameter defines how a field is handled when it encounters a null value. It defaults to null, which means ES ignores the value and the field is treated as having no value. By setting null_value you can give the field a default value that is indexed in place of an explicit null. The keyword type supports null_value, while the text type does not.
The copy_to parameter copies the value of a field to a target field, achieving an effect similar to _all. The target field does not appear in _source; it is only used for searching.
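To see the four parameters together, here is a minimal sketch that builds a mapping body as a Python dict and prints it as JSON. The field names (first_name, last_name, full_name, mobile, gender) are hypothetical and chosen only for illustration; the printed JSON is what would be sent as the body of a PUT request that creates an index.

```python
import json

# Hypothetical index fields, chosen only for illustration.
mapping_body = {
    "mappings": {
        "properties": {
            # copy_to: both name parts are copied into full_name for searching
            "first_name": {"type": "text", "copy_to": "full_name"},
            "last_name": {"type": "text", "copy_to": "full_name"},
            # index_options: also record character offsets for this field
            "full_name": {"type": "text", "index_options": "offsets"},
            # index: the value is kept in _source but cannot be searched
            "mobile": {"type": "keyword", "index": False},
            # null_value: an explicit null is indexed as the string "NULL"
            "gender": {"type": "keyword", "null_value": "NULL"},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```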
Besides the parameters described above, there are many others. If you are interested, you can look them up in the official documentation.
Now that we have covered the Mapping settings, let's look at the field data types.
Field data types
ES field types are similar to column types in MySQL. They fall into four groups: core types, complex types, geographic types, and special types. The specific data types are shown in the figure below:
Core types
As the figure shows, the core types can be divided into string types, numeric types, the date type, the Boolean type, the BASE64-based binary type, and range types.
String type
There are two string types in ES 7.x: text and keyword. The string type has not been supported since ES 5.x.
The text type applies to fields that need full-text search, such as news body text, message content, and other long text. A text field is split into terms by a Lucene analyzer and stored in a Lucene inverted index. By default, text fields cannot be used for sorting. To use this type, you only need to set the type of the corresponding JSON field to text when defining the mapping.
The keyword type is suitable for short, structured strings such as host names, person names, and product names. It can be used for filtering, sorting, and aggregations, and also for exact-match queries.
Numeric types
The numeric types are long, integer, short, byte, double, float, half_float, and scaled_float.
For a numeric field, choose the smallest type that covers the required range: the shorter the field, the more efficient the search. For floating-point numbers, consider the scaled_float type, which stores a floating-point value as an integer using a scaling factor. For example, with a scaling factor of 100, 12.34 is stored as 1234.
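The arithmetic behind scaled_float can be sketched in a few lines of Python. The helper names to_scaled and from_scaled are made up for this example, and rounding of exact ties may differ from what ES does, so the sample values avoid ties.

```python
def to_scaled(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer."""
    return round(value * scaling_factor)


def from_scaled(stored, scaling_factor):
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor


print(to_scaled(12.34, 100))                       # stored integer for 12.34
print(from_scaled(to_scaled(12.3456, 100), 100))   # precision beyond 2 decimals is lost
```

The second line shows the trade-off: digits beyond what the scaling factor preserves are rounded away at index time.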
Date type
In ES, a date can take the following forms:
- A formatted date string, for example 2020-03-17 00:00 or 2020/03/17
- A timestamp (the difference from 1970-01-01 00:00:00 UTC), in milliseconds or seconds
Even when a formatted date string is supplied, ES still stores it internally as a timestamp.
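As a rough illustration of that conversion, the sketch below parses the two example strings and turns them into milliseconds since the epoch. It assumes the strings are interpreted as UTC; the helper name to_epoch_millis and the strptime patterns are choices made for this example, not ES format names.

```python
from datetime import datetime, timezone


def to_epoch_millis(text, fmt):
    """Parse a formatted date string as UTC and return epoch milliseconds."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


# Two different string formats end up as the same numeric value.
print(to_epoch_millis("2020-03-17 00:00", "%Y-%m-%d %H:%M"))
print(to_epoch_millis("2020/03/17", "%Y/%m/%d"))
```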
Boolean type
JSON documents also have a Boolean type. In addition, a JSON string can be converted to a Boolean when ES stores it, provided that the string is "true" or "false". The Boolean type is commonly used for filtering in searches.
Binary type
The binary type accepts a BASE64-encoded string. Its store attribute is false by default, and the field cannot be searched.
Range type
Range types represent an interval of values. There are five kinds: integer_range, float_range, long_range, double_range, and date_range.
Complex types
The complex types are mainly the object type (object) and the nested type (nested):
Object type
JSON allows objects to be nested, and a document can contain several levels of nested objects. Such a document can be stored with the object type, but because Lucene has no concept of inner objects, ES flattens the original JSON document. Take this document as an example:
{
  "name": {
    "first": "wu",
    "last": "px"
  }
}
ES actually converts it into the following format and stores it through Lucene, even though name is of the object type:
{
  "name.first": "wu",
  "name.last": "px"
}
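The flattening step can be sketched with a small recursive Python function. This is only an illustration of the idea; the function name flatten is made up and it does not reproduce how ES or Lucene implement it.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted field names, e.g. name.first."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


print(flatten({"name": {"first": "wu", "last": "px"}}))
```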
Nested type
The nested type can be seen as a special object type that allows the objects in an array to be queried independently of each other. Take this document as an example:
{
  "group": "users",
  "username": [
    { "first": "wu", "last": "px" },
    { "first": "hu", "last": "xy" },
    { "first": "wu", "last": "mx" }
  ]
}
The username field is a JSON array, and each element of the array is a JSON object. If username is mapped as the object type, ES converts it to:
{
  "group": "users",
  "username.first": ["wu", "hu", "wu"],
  "username.last": ["px", "xy", "mx"]
}
As you can see, the association between first and last is lost in the converted document. If you search for documents where first is wu and last is xy, this document is returned successfully. However, wu and xy do not belong to the same JSON object in the original document, so it should not match and no result should be returned.
The nested type solves this problem. It stores each JSON object in the array as a separate hidden document, so each nested object can be searched independently. Although there appears to be only one document in this example, four documents are actually stored.
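The difference between the two mappings can be modeled in plain Python. This is a simplified sketch of the matching semantics only, with made-up helper names; it is not how ES executes queries.

```python
doc = {
    "group": "users",
    "username": [
        {"first": "wu", "last": "px"},
        {"first": "hu", "last": "xy"},
        {"first": "wu", "last": "mx"},
    ],
}


def matches_as_object(objects, conditions):
    """Object mapping: each field is a flat bag of values across the array."""
    return all(
        any(obj.get(field) == value for obj in objects)
        for field, value in conditions.items()
    )


def matches_as_nested(objects, conditions):
    """Nested mapping: a single object must satisfy every condition."""
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in objects
    )


query = {"first": "wu", "last": "xy"}
print(matches_as_object(doc["username"], query))  # the unwanted cross-object match
print(matches_as_nested(doc["username"], query))  # no single object has both values
```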
Geographic types
Geographic fields are divided into two types: the latitude/longitude point type and the geographic shape type:
Geo-point type
A geo-point field (geo_point) stores latitude and longitude information. With this type of field you can find documents within a specified geographic area, sort by distance, adjust scoring rules based on location, and so on.
Geo-shape type
A latitude/longitude pair represents a single point, whereas the geo_shape type can represent a geographic area. The area can be a polygon of any shape, or another geometry such as a point, line, polygon, multi-point, multi-line, or multi-polygon.
Special types
Special types include the IP type, percolator type, Join type, alias type, and others. Here we briefly introduce the IP type and the Join type; for the other special types, see the official documentation.
IP type
An IP field can store IPv4 or IPv6 addresses. To store a field as the IP type, you need to define the mapping manually:
{
  "mappings": {
    "properties": {
      "my_ip": {
        "type": "ip"
      }
    }
  }
}
Join type
The Join type was introduced in ES 6.x to replace the deprecated _parent meta field. It implements one-to-one and one-to-many relationships between documents and is mainly used for parent-child queries.
A Join type mapping looks like this:
PUT my_join_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      }
    }
  }
}
Here, my_join_field is the name of the Join field, and relations specifies the relationship: question is the parent of answer.
For example, define a parent document with ID 1:
PUT my_join_index/_doc/1?refresh
{
  "text": "This is a question",
  "my_join_field": "question"
}
Next, define a child document and specify that the ID of its parent document is 1:
PUT my_join_index/_doc/2?routing=1&refresh
{
  "text": "This is an answer",
  "my_join_field": {
    "name": "answer",
    "parent": "1"
  }
}
Now that we understand the field data types, let's look at what Dynamic Mapping is.
What is Dynamic Mapping?
With the Dynamic Mapping mechanism we do not need to define the Mapping manually: ES automatically infers the field types from the document. The inference is sometimes wrong, however. For example, geographic information is likely to be inferred as Text. When a type is set incorrectly, some features do not work properly, such as Range queries.
Automatic type detection
ES detects types automatically based on the JSON format. If the input is a JSON string that matches a date format, ES automatically maps it to the Date type. If the input is a string that contains a number, ES treats it as a string by default, although this can be configured so that it is converted to a suitable numeric type. If the input is mapped as a Text field, ES automatically adds a keyword subfield. Some of the automatic detection rules are shown in the figure below:
Let's use an example to see how types are detected automatically. Send the following request to create the index:
PUT /mapping_test/_doc/1
{
  "uid": "123",
  "username": "wupx",
  "birth": "2020-03-16",
  "married": false,
  "age": 18,
  "heigh": 180,
  "tags": [
    "java",
    "boy"
  ],
  "money": 999.9
}
Then run GET /mapping_test/_mapping to view the result, as shown below:
As the result shows, ES automatically infers the appropriate type from the content of the document.
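The detection rules can be approximated with a short Python function. This is a rough sketch under simplifying assumptions: the date check only recognizes the yyyy-MM-dd form, numeric detection for strings is treated as switched off (the default described above), and the function name infer_type is made up for the example.

```python
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # simplified date check


def infer_type(value):
    """Rough approximation of the default dynamic mapping rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, list):
        # An array takes the type of its first non-null element
        first = next((v for v in value if v is not None), None)
        return infer_type(first) if first is not None else None
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return "date"
        return "text with keyword subfield"
    return None


doc = {
    "uid": "123", "username": "wupx", "birth": "2020-03-16",
    "married": False, "age": 18, "heigh": 180,
    "tags": ["java", "boy"], "money": 999.9,
}
for field, value in doc.items():
    print(field, "->", infer_type(value))
```

Note that uid stays a text field even though it looks numeric, because the value arrives as a JSON string.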
That raises a question: what if I want to change the type of a field in the Mapping? Can it be changed? Let's look at two cases:
Can a Mapping field type be modified?
If the field is newly added, there are three cases depending on the dynamic setting:
- When dynamic is set to true, the Mapping is updated as soon as a document containing the new field is written.
- When dynamic is set to false, the Mapping of the index is not updated and the data of the new field is not indexed, so it cannot be searched, but the information still appears in _source.
- When dynamic is set to strict, writing the document fails.
The other case is a field that already exists. In this case ES does not allow the field type to be modified, because ES is built on Lucene's inverted index, which cannot be changed once it has been generated. If you need to change a field type, you must rebuild the index with the Reindex API.
The reason is that changing the data type of a field would make data that has already been indexed unsearchable. Adding a new field has no such effect.
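A minimal sketch of the body of such a Reindex request, built as a Python dict and printed as JSON, might look like this. The destination index name users_v2 is hypothetical, and the sketch assumes that index was created beforehand with the corrected Mapping.

```python
import json

# users_v2 is a hypothetical index assumed to already exist with the new Mapping.
reindex_body = {
    "source": {"index": "users"},    # index with the old field type
    "dest": {"index": "users_v2"},   # index with the corrected field type
}

print("POST _reindex")
print(json.dumps(reindex_body, indent=2))
```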
Summary
This article described Mapping and Dynamic Mapping, introduced the field types in detail, explained how ES infers the type of a field, and covered the relevant Mapping parameters.
Reply [ es ] to the WeChat official account [ Wupei Xuan ] to get the mind map and the source code.