Elasticsearch Reference [6.1]: Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, use mappings to define:

which string fields should be treated as full text fields.
which fields contain numbers, dates, or geolocations.
whether the values of all fields in the document should be indexed into the catch-all _all field.
the format of date values.
custom rules to control the mapping for dynamically added fields.

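  Mapping is the process of defining how a document and the fields it contains are stored and indexed. For example, mappings are used to define:

    》which string fields should be treated as full-text fields, that is, analyzed into individual terms

    》which fields contain numbers, dates, or geolocations

    》whether the values of all fields in the document should be indexed into the catch-all _all field

    》the format of date values

    》custom rules that control the mapping of dynamically added fields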

Mapping Type

Each index has one mapping type which determines how the document will be indexed.

[6.0.0] Deprecated in 6.0.0. See Removal of mapping types.

A mapping type has:

1、Meta-fields

  Meta-fields are used to customize how a document’s associated metadata is treated. Examples of meta-fields include the document’s _index, _type, _id, and _source fields.

2、Fields or properties

  A mapping type contains a list of fields or properties pertinent to the document.

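 Mapping type

    Each index has one mapping type, which determines how its documents are indexed.

    Deprecated in 6.0.0; see Removal of mapping types.

    A mapping type has:

    1、Meta-fields: these define how a document's metadata is handled. They include the document's _index, _type, _id, and _source fields.

    2、Fields or properties: a mapping type contains a list of the fields or properties relevant to the document.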

Field datatypes
Each field has a data type which can be:

1、a simple type like text, keyword, date, long, double, boolean or ip.
2、a type which supports the hierarchical nature of JSON such as object or nested.
3、or a specialised type like geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For instance, a 
string field could be indexed as a text field for full-text search, and as a keyword field for sorting or 
aggregations. Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
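
As a minimal sketch of multi-fields, the request below maps a city field as text for full-text search and adds a keyword sub-field, city.raw, for sorting and aggregations. The index name my_multi_index and the field names are assumptions made for illustration; the doc type follows the example mapping later in this document.

```
# Hypothetical index and field names, for illustration only
PUT my_multi_index
{
  "mappings": {
    "doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}
```

With a mapping like this, queries would target city for full-text search and city.raw for sorting or aggregations.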

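  Field datatypes

    Each field has a data type, which can be:

    1、a simple type such as text (analyzed by default; dynamically mapped strings also get a non-analyzed .keyword sub-field), keyword, date, long, double, boolean, or ip

    2、a type that supports the hierarchical nature of JSON, such as object or nested (nested objects can be searched independently of one another)

    3、a specialised type such as geo_point, geo_shape, or completion

  It is often useful to index the same field in different ways for different purposes. For example, a string field can be indexed as a text field for full-text search and as a keyword field for sorting or aggregations. Alternatively, a string field can be indexed with the standard analyzer, the english analyzer, and the french analyzer.

  This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.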

Settings to prevent mappings explosion

Defining too many fields in an index is a condition that can lead to a mapping explosion, which can cause out of memory errors and 
difficult situations to recover from. This problem may be more common than expected. As an example, consider a situation in which every new 
document inserted introduces new fields. This is quite common with dynamic mappings. Every time a document contains new fields, those will end
 up in the index’s mappings. This isn’t worrying for a small amount of data, but it can become a problem as the mapping grows. The following settings 
allow you to limit the number of field mappings that can be created manually or dynamically, in order to prevent bad documents from causing a mapping 
explosion:

index.mapping.total_fields.limit
  The maximum number of fields in an index. The default value is 1000.
index.mapping.depth.limit
  The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level,
 then the depth is 1. If there is one object mapping, then the depth is 2, etc. The default is 20.
index.mapping.nested_fields.limit
  The maximum number of nested fields in an index, defaults to 50. Indexing 1 document with 100 nested fields actually indexes 101 documents as each 
nested document is indexed as a separate hidden document.
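
These limits are index settings, so one way to apply them is in the settings block when the index is created. The sketch below assumes a hypothetical index name, and the limit values are arbitrary and chosen only to show the syntax.

```
# Hypothetical index name; the limit values are arbitrary examples
PUT my_limited_index
{
  "settings": {
    "index.mapping.total_fields.limit": 2000,
    "index.mapping.depth.limit": 10,
    "index.mapping.nested_fields.limit": 25
  }
}
```

Once a limit is reached, a document or mapping update that would exceed it should be rejected rather than silently growing the mapping.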

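  Preventing mapping explosion

    Defining too many fields in an index can cause a mapping explosion, which can lead to out-of-memory errors and situations that are hard to recover from. The problem may be more common than expected. Consider, for example, a case in which every newly inserted document introduces new fields. This is very common with dynamic mapping: whenever a document contains new fields, they end up in the index's mappings. With a small amount of data this is not a concern, but it can become a problem as the mapping grows. The following settings limit the number of field mappings that can be created manually or dynamically, so that bad documents cannot cause a mapping explosion:

  index.mapping.total_fields.limit: the maximum number of fields in an index. The default is 1000.

  index.mapping.depth.limit: the maximum depth of a field, measured as the number of inner objects. For example, if all fields are defined at the root object level, the depth is 1. If there is one object mapping, the depth is 2, and so on. The default is 20.

  index.mapping.nested_fields.limit: the maximum number of nested fields in an index. The default is 50. Indexing one document with 100 nested fields actually indexes 101 documents, because each nested document is indexed as a separate hidden document.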

Dynamic mapping
Fields and mapping types do not need to be defined before being used. Thanks to dynamic mapping, new field names will be added automatically, just by indexing a document. New fields can be added both to the top-level mapping type, and to inner object and nested fields.

The dynamic mapping rules can be configured to customise the mapping that is used for new fields.
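
To illustrate, the sketch below indexes a document into an index that has not been created and then retrieves the mapping that Elasticsearch generated. The index name my_dynamic_index and the field names and values are hypothetical.

```
# Hypothetical index and document; no mapping is defined beforehand
PUT my_dynamic_index/doc/1
{
  "username": "kimchy",
  "visits": 3
}

# Inspect the mapping that was created dynamically
GET my_dynamic_index/_mapping
```

With the default dynamic mapping rules, the returned mapping should show username as a text field with a keyword sub-field and visits as a long field.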

Explicit mappings
You know more about your data than Elasticsearch can guess, so while dynamic mapping can be useful to get started, at some point you will want to specify your own explicit mappings.

You can create field mappings when you create an index, and you can add fields to an existing index with the PUT mapping API.
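
As a sketch of the second case, the request below adds a new field to an existing index through the PUT mapping API. It assumes an index named my_index with a doc mapping type, as in the example mapping later in this document; the email field is hypothetical.

```
# Assumes my_index already exists with the "doc" type; "email" is a hypothetical new field
PUT my_index/_mapping/doc
{
  "properties": {
    "email": { "type": "keyword" }
  }
}
```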

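  Dynamic mapping:

    Fields and mapping types do not need to be defined before they are used. Thanks to dynamic mapping, new field names are added automatically just by indexing a document. New fields can be added to the top-level mapping type as well as to inner object and nested fields.

    The dynamic mapping rules can be configured to customise the mapping used for new fields.

  Explicit mappings:

    You know more about your data than Elasticsearch can guess, so although dynamic mapping is useful for getting started, at some point you will want to specify your own explicit mappings.

    You can create field mappings when you create an index, and you can add fields to an existing index with the PUT mapping API.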

Updating existing field mappings

Other than where documented, existing field mappings cannot be updated. Changing the mapping would mean invalidating already indexed documents. Instead, you should create a new index with the correct mappings and reindex your data into that index.
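
The workflow can be sketched as follows, assuming an existing index named my_index whose age field needs a different type. The new index name my_index_v2 and the choice of long for age are hypothetical, and fields not listed here would be mapped dynamically during the reindex.

```
# Create a new index with the corrected mapping (hypothetical name and field type)
PUT my_index_v2
{
  "mappings": {
    "doc": {
      "properties": {
        "age": { "type": "long" }
      }
    }
  }
}

# Copy the documents from the old index into the new one
POST _reindex
{
  "source": { "index": "my_index" },
  "dest":   { "index": "my_index_v2" }
}
```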

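  Updating existing field mappings

    Other than where documented, existing field mappings cannot be updated, because changing the mapping would invalidate documents that are already indexed. Instead, create a new index with the correct mappings and reindex your data into it.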

Example mapping
A mapping can be specified when creating an index, as follows:
PUT my_index 
{
  "mappings": {
    "doc": { 
      "properties": { 
        "title":    { "type": "text"  }, 
        "name":     { "type": "text"  }, 
        "age":      { "type": "integer" },  
        "created":  {
          "type":   "date", 
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
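
As a usage sketch, a document that fits this mapping could then be indexed as shown below. The document ID and all field values are made up for illustration; the created value is written in a form that the strict_date_optional_time format accepts.

```
# Hypothetical document; values are illustrative only
PUT my_index/doc/1
{
  "title":   "Mapping basics",
  "name":    "Jane Doe",
  "age":     30,
  "created": "2018-01-15T09:30:00Z"
}
```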

Reposted from my.oschina.net/u/3655192/blog/1784690