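Elasticsearch 7 Study Notes: Mapping

Background

A Mapping in ES is similar to a schema in a database: it defines the names and data types of the fields in an index, and configures how each field is handled by the inverted index.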

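Inverted Index

Before studying mappings in ES, let's first look at the inverted index.

Definition

An inverted index is the relationship from terms to document IDs. Where a forward index maps each document ID to its content, an inverted index maps each term to the IDs of the documents that contain it.
[Figure: a forward index (left) and the corresponding term-to-document-ID inverted index (right)]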

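Core components

An inverted index consists of two parts:
Term dictionary: stored using a B+ tree or a hash table with chaining
Posting list: made up of postings, each recording the document ID, the term frequency (TF), the position (where the term falls in the document's token sequence), and the offset (the term's start and end character offsets in the document)
[Figure: the posting list for the term "Elastic"]
Each field in an ES JSON document has its own inverted index. You can configure certain fields not to be indexed; this saves disk space, but those fields can then no longer be searched.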
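
The structure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: whitespace tokenization and lowercasing stand in for a real analyzer, a plain dict stands in for the term dictionary, and the function name and sample documents are made up for the example. It is not how Lucene stores its index.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of {doc_id, tf, positions, offsets}."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        occurrences = defaultdict(lambda: {"positions": [], "offsets": []})
        cursor = 0
        # Assumption: split on whitespace and lowercase, instead of a real analyzer.
        for position, token in enumerate(text.split()):
            start = text.index(token, cursor)
            end = start + len(token)
            cursor = end
            term = token.lower()
            occurrences[term]["positions"].append(position)
            occurrences[term]["offsets"].append((start, end))
        for term, occ in occurrences.items():
            index[term].append({"doc_id": doc_id, "tf": len(occ["positions"]), **occ})
    return dict(index)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch Server",
    3: "Elasticsearch Essentials and Elasticsearch basics",
}
index = build_inverted_index(docs)
for posting in index["elasticsearch"]:
    print(posting)
```

Looking up a term is a single dictionary access, and the positions and offsets stored in each posting are what make phrase matching and highlighting possible.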

Data types in ES

Simple types: text/keyword, date, integer/floating point, boolean, IPv4 & IPv6
Complex types: object and nested
Special types: geo_point & geo_shape / percolator

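Dynamic Mapping

When a document is written to an index that does not exist yet, ES creates the index automatically and infers the type of each field, so we do not have to define the Mapping by hand.
[Figure: table of JSON types and the ES types they are converted to]
After inserting a document, check the resulting Mapping:

PUT mapping_test/_doc/1
{
  "first_name": "song",
  "last_name": "ye",
  "login_date": "2020-07-11T16:01:48.182Z"
}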

GET mapping_test/_mapping

The output is:

{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "first_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "last_name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "login_date" : {
          "type" : "date"
        }
      }
    }
  }
}

As shown, the "login_date" field was automatically mapped as date.

A boolean value wrapped in quotes is mapped as text by default; only without quotes is it mapped as boolean:

PUT mapping_test/_doc/1
{
  "first_name": "song",
  "last_name": "ye",
  "login_date": "2020-07-11T16:01:48.182Z",
  "is_vip": "true",
  "is_admin": true
}

GET mapping_test/_mapping

The output is:

{
  "mapping_test": {
    "mappings": {
      "properties": {
        "first_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "is_admin": {
          "type": "boolean"
        },
        "is_vip": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "last_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "login_date": {
          "type": "date"
        }
      }
    }
  }
}
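
The inference seen in the two outputs above can be approximated with a small Python model. It is a rough sketch, not the ES implementation: the function names are hypothetical, and the three `strptime` formats are a simplified stand-in for ES date detection, which uses its own configurable list of date formats.

```python
from datetime import datetime

# Assumption: a few fixed formats stand in for the date formats ES tries.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%d", "%Y/%m/%d")


def looks_like_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_field_mapping(value, date_detection=True, numeric_detection=False):
    """Guess the mapping a dynamically added field would receive."""
    if value is None:
        return None  # a null value adds no field to the mapping
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"type": "object"}
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        first = next((item for item in value if item is not None), None)
        return infer_field_mapping(first, date_detection, numeric_detection)
    # Remaining case: a JSON string.
    if date_detection and looks_like_date(value):
        return {"type": "date"}
    if numeric_detection:
        for caster, es_type in ((int, "long"), (float, "float")):
            try:
                caster(value)
                return {"type": es_type}
            except ValueError:
                continue
    return {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }


print(infer_field_mapping("true"))
print(infer_field_mapping(True))
print(infer_field_mapping("2020-07-11T16:01:48.182Z"))
```

The `date_detection` and `numeric_detection` flags mirror the mapping parameters of the same names.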

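Can a field's Mapping type be changed?

For new fields: if dynamic is true, the Mapping is updated as new fields are added; if dynamic is false, the Mapping is not updated and the new field is not indexed, although its value still appears in _source; if dynamic is strict, writing the document fails.
For existing fields: once data has been written, the field's Mapping type can no longer be changed. The inverted index built by Lucene cannot be modified once it has been generated.
To change a field's type, you must rebuild the index with the Reindex API.

The reason is simple: changing a field's type would make the data already indexed under that field unsearchable, whereas adding a new field has no such effect.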

dynamic set to false

Below is an example with dynamic set to false; the new field f1 is then not indexed and cannot be searched:

PUT mapping_test/_mapping
{
  "dynamic": false
}

PUT mapping_test/_doc/2
{
  "f1": "v1"
}

POST mapping_test/_search
{
  "query": {
    "match": {
      "f1": "v1"
    }
  }
}

The output is:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

dynamic set to strict

Below is an example of adding a new field after dynamic has been set to strict:

PUT mapping_test/_mapping
{
  "dynamic": "strict"
}

PUT mapping_test/_doc/2
{
  "f2": "v2"
}

The output is as follows; adding a new field is rejected:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "strict_dynamic_mapping_exception",
        "reason" : "mapping set to strict, dynamic introduction of [f2] within [_doc] is not allowed"
      }
    ],
    "type" : "strict_dynamic_mapping_exception",
    "reason" : "mapping set to strict, dynamic introduction of [f2] within [_doc] is not allowed"
  },
  "status" : 400
}

Note that the value of dynamic does not affect writing or updating fields that are already in the Mapping; for example, replacing f1 and f2 with is_vip in the examples above works without any problem.
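
The three dynamic settings can be compared side by side with a toy model. The sketch below is an assumption-laden simplification: the mapping is a flat dict of field name to type, every new field is given the placeholder type "text", and `index_document` and `StrictDynamicMappingError` are hypothetical names rather than ES APIs.

```python
class StrictDynamicMappingError(Exception):
    """Stands in for the strict_dynamic_mapping_exception returned by ES."""


def index_document(mapping, doc, dynamic=True):
    """Return (updated_mapping, indexed_fields, source) for one write."""
    properties = dict(mapping)
    indexed = {}
    for field, value in doc.items():
        if field not in properties:
            if dynamic == "strict":
                raise StrictDynamicMappingError(
                    f"dynamic introduction of [{field}] is not allowed"
                )
            if dynamic is False:
                continue  # the value stays in _source but is never indexed
            properties[field] = "text"  # placeholder for the inferred type
        indexed[field] = value
    return properties, indexed, dict(doc)


mapping = {"is_vip": "text"}

# dynamic: false -> mapping unchanged, f1 not indexed, f1 still in _source
print(index_document(mapping, {"f1": "v1"}, dynamic=False))

# dynamic: strict -> new fields are rejected, known fields are accepted
print(index_document(mapping, {"is_vip": "true"}, dynamic="strict"))
try:
    index_document(mapping, {"f2": "v2"}, dynamic="strict")
except StrictDynamicMappingError as err:
    print("rejected:", err)
```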

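Custom Mapping

We can also define a document's Mapping explicitly; see the API reference for the details. Below are some examples of custom Mappings.

Controlling whether a field is searchable

The index parameter specifies whether a field can be searched:

PUT users
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text"
      },
      "second_name": {
        "type": "text"
      },
      "mobile": {
        "type": "text",
        "index": false
      }
    }
  }
}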

Then insert a document and search on the mobile field; the search fails with a 400 error:

PUT users/_doc/1
{
  "first_name": "song",
  "second_name": "ye",
  "mobile": "123444"
}

POST users/_search
{
  "query": {
    "match": {
      "mobile": "123444"
    }
  }
}

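Handling null values

Below, the keyword field mobile is given a null_value, so that an explicit null in the JSON is indexed as the string "NULL". If the users index still exists from the previous example, delete it first with DELETE users.

PUT users
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text"
      },
      "second_name": {
        "type": "text"
      },
      "mobile": {
        "type": "keyword",
        "null_value": "NULL"
      }
    }
  }
}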

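Then insert two documents and search for records whose mobile is null. This returns the song gong document: only an explicit null is replaced by the null_value, so document 1, which omits the field entirely, does not match.

PUT users/_doc/1
{
  "first_name": "song",
  "second_name": "ye"
}

PUT users/_doc/2
{
  "first_name": "song",
  "second_name": "gong",
  "mobile": null
}

POST users/_search
{
  "query": {
    "match": {
      "mobile": "NULL"
    }
  }
}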
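
Why only the second document matches can be illustrated with a tiny Python model of null_value. This is a sketch under stated assumptions: `indexed_terms` and `search` are hypothetical helpers, and an exact string comparison stands in for the real query.

```python
def indexed_terms(doc, field, null_value=None):
    """Return the terms that would be indexed for `field` in `doc`."""
    if field not in doc:
        return []  # a missing field indexes nothing
    value = doc[field]
    if value is None:
        # An explicit null is replaced by null_value, if one is configured.
        return [null_value] if null_value is not None else []
    return [value]


def search(docs, field, term, null_value=None):
    """Return the IDs of documents whose indexed terms contain `term`."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if term in indexed_terms(doc, field, null_value)
    ]


users = {
    1: {"first_name": "song", "second_name": "ye"},
    2: {"first_name": "song", "second_name": "gong", "mobile": None},
}
print(search(users, "mobile", "NULL", null_value="NULL"))
```

The stored documents are left untouched here, which mirrors the fact that null_value changes only what is indexed, not what _source returns.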

copy_to: combining fields

Below is a copy_to example. copy_to copies the values of several fields into a new field; the new field does not appear in _source and can only be used for querying:

PUT users
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      "second_name": {
        "type": "text",
        "copy_to": "full_name"
      }
    }
  }
}

After inserting a document, the query below returns that single document:

PUT users/_doc/1
{
  "first_name": "song",
  "second_name": "ye"
}

GET users/_search?q=full_name:(song ye)

The output is as follows; note that _source contains no full_name field:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "users",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "first_name" : "song",
          "second_name" : "ye"
        }
      }
    ]
  }
}
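
The split between what is stored and what is searchable can be modelled in a few lines of Python. This is a minimal sketch, assuming a flat mapping dict and a hypothetical helper `index_with_copy_to`; it only shows that copied values land in the index and never in _source.

```python
def index_with_copy_to(doc, mapping):
    """Return the stored _source and the per-field indexed values."""
    indexed = {}
    for field, value in doc.items():
        indexed.setdefault(field, []).append(value)
        target = mapping.get(field, {}).get("copy_to")
        if target:
            # The value is copied into the target field at index time only.
            indexed.setdefault(target, []).append(value)
    return {"_source": dict(doc), "indexed": indexed}


mapping = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "second_name": {"type": "text", "copy_to": "full_name"},
}
result = index_with_copy_to({"first_name": "song", "second_name": "ye"}, mapping)
print(result["indexed"]["full_name"])
print("full_name" in result["_source"])
```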

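Index Template

An index template lets us predefine Mappings and settings and have them applied automatically to newly created indices whose names match a pattern.
A template only takes effect when an index is created; modifying a template does not affect existing indices.
Multiple index templates can be defined and their settings are merged. The order parameter controls the merge: templates with a lower order are applied first, and templates with a higher order override them.
Default settings:
The template below applies to all indices and sets both the number of shards and the number of replicas to 1:

PUT _template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}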

Updating templates

For indices whose names start with test, define the following template: 2 replicas, date detection turned off, and numeric detection turned on:

PUT /_template/template_test
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  }
}

Viewing templates

Templates can be viewed with the following APIs:

GET /_template/template_default
GET /_template/template_test

Then insert a document into the testtemplate index and check its Mapping:

PUT testtemplate/_doc/2
{
  "someNumber": "1",
  "someDate": "2020/07/11"
}

GET testtemplate/_mapping/

The output is as follows: the date string was mapped as text, while the numeric string was converted to long:

{
  "testtemplate" : {
    "mappings" : {
      "date_detection" : false,
      "numeric_detection" : true,
      "properties" : {
        "someDate" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "someNumber" : {
          "type" : "long"
        }
      }
    }
  }
}

Now look at the settings:

GET testtemplate/_settings

The output is as follows: the number of replicas is 2 and the number of shards is 1:

{
  "testtemplate" : {
    "settings" : {
      "index" : {
        "creation_date" : "1594468782269",
        "number_of_shards" : "1",
        "number_of_replicas" : "2",
        "uuid" : "uQ7AzTYQRGyrxKueUideOA",
        "version" : {
          "created" : "7060099"
        },
        "provided_name" : "testtemplate"
      }
    }
  }
}

Settings supplied by index templates can be explicitly overridden when the index is created:

PUT testtemplate2/
{
  "settings": {
    "number_of_replicas": 4
  }
}

GET testtemplate2/_settings

The output shows that the number of replicas is 4:

{
  "testtemplate2": {
    "settings": {
      "index": {
        "creation_date": "1594469098371",
        "number_of_shards": "1",
        "number_of_replicas": "4",
        "uuid": "kqjdOulBQs6K6nKzMfyqyA",
        "version": {
          "created": "7060099"
        },
        "provided_name": "testtemplate2"
      }
    }
  }
}
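
The results for testtemplate and testtemplate2 follow from how matching templates are layered. The Python sketch below models that layering under a few assumptions: `wildcard_match`, `deep_merge`, and `resolve_index_settings` are hypothetical helpers, only the `*` wildcard is supported, and templates are applied from lowest to highest order, with settings given at index creation applied last.

```python
import re


def wildcard_match(pattern, name):
    """Match `name` against a pattern in which only `*` is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def deep_merge(base, override):
    """Merge two nested dicts; values in `override` win on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_index_settings(index_name, templates, explicit=None):
    """Layer matching templates by ascending order, then explicit settings."""
    matching = [
        t for t in templates
        if any(wildcard_match(p, index_name) for p in t["index_patterns"])
    ]
    result = {"settings": {}, "mappings": {}}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        result = deep_merge(result, {
            "settings": template.get("settings", {}),
            "mappings": template.get("mappings", {}),
        })
    return deep_merge(result, explicit or {})


templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2},
     "mappings": {"date_detection": False, "numeric_detection": True}},
]
print(resolve_index_settings("testtemplate", templates))
print(resolve_index_settings(
    "testtemplate2", templates, explicit={"settings": {"number_of_replicas": 4}}
))
```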

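Dynamic Template

A dynamic template sets a field's type dynamically, based on the data type detected by ES combined with the field's name.
Dynamic templates are defined inside an index's Mapping. Each template needs a name, the templates form an array whose entries are tried in order until one matches, and each entry specifies the mapping to apply to the fields it matches.

For example, map strings whose field names start with is to boolean, and all other strings to keyword:

PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "string_as_boolean": {
          "match_mapping_type": "string",
          "match": "is*",
          "mapping": {
            "type": "boolean"
          }
        }
      },
      {
        "string_as_keywords": {
          "match_mapping_type": "string",
          "match": "*",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}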

Insert a document to test it:

PUT my_index/_doc/1
{
  "is_vip": "true",
  "name": "szc"
}

GET my_index/_mapping

The output is as follows: is_vip became boolean and name became keyword:

{
  "my_index" : {
    "mappings" : {
      "dynamic_templates" : [
        {
          "string_as_boolean" : {
            "match" : "is*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "boolean"
            }
          }
        },
        {
          "string_as_keywords" : {
            "match" : "*",
            "match_mapping_type" : "string",
            "mapping" : {
              "type" : "keyword"
            }
          }
        }
      ],
      "properties" : {
        "is_vip" : {
          "type" : "boolean"
        },
        "name" : {
          "type" : "keyword"
        }
      }
    }
  }
}

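The next example maps every property under name except middle to text and copies it into the full_name field, which can only be used for searching. If my_index still exists from the previous example, delete it first with DELETE my_index.

PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name": {
          "path_match": "name.*",
          "path_unmatch": "*.middle",
          "mapping": {
            "type": "text",
            "copy_to": "full_name"
          }
        }
      }
    ]
  }
}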

Usage is shown below. The document can be found with full_name:s or full_name:c, but searching full_name for the middle value finds nothing:

PUT my_index/_doc/1
{
  "name": {
    "first": "s",
    "middle": "z",
    "last": "c"
  }
}

GET my_index/_search?q=full_name:s
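
The matching rules used by the two dynamic template examples can be sketched in Python. This is a simplified model, not the ES implementation: `wildcard_match` and `apply_dynamic_templates` are hypothetical helpers, only `*` is treated as a wildcard, match is tested against the field name, path_match and path_unmatch are tested against the full dotted path, and the first template that matches wins.

```python
import re


def wildcard_match(pattern, name):
    """Match `name` against a pattern in which only `*` is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def apply_dynamic_templates(field_name, detected_type, dynamic_templates, full_path=None):
    """Return (template_name, mapping) for the first matching template."""
    path = full_path or field_name
    for entry in dynamic_templates:
        name, rule = next(iter(entry.items()))
        if "match_mapping_type" in rule and rule["match_mapping_type"] != detected_type:
            continue
        if "match" in rule and not wildcard_match(rule["match"], field_name):
            continue
        if "path_match" in rule and not wildcard_match(rule["path_match"], path):
            continue
        if "path_unmatch" in rule and wildcard_match(rule["path_unmatch"], path):
            continue
        return name, rule["mapping"]
    return None, None  # no template matched; default dynamic mapping applies


string_templates = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string", "match": "*",
                            "mapping": {"type": "keyword"}}},
]
path_templates = [
    {"full_name": {"path_match": "name.*", "path_unmatch": "*.middle",
                   "mapping": {"type": "text", "copy_to": "full_name"}}},
]

print(apply_dynamic_templates("is_vip", "string", string_templates))
print(apply_dynamic_templates("name", "string", string_templates))
print(apply_dynamic_templates("first", "string", path_templates, full_path="name.first"))
print(apply_dynamic_templates("middle", "string", path_templates, full_path="name.middle"))
```

Because name.middle is excluded by path_unmatch, it never receives the copy_to mapping, which is why its value cannot be found through full_name.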

Reposted from blog.csdn.net/qq_37475168/article/details/122141604