Elasticsearch Version Control and Creating Mappings

I. Version Control

Elasticsearch uses optimistic locking to ensure data consistency: when a user operates on a document (a document corresponds to a row of data in a relational database table), the document is never locked and unlocked; instead, the request specifies the version it expects to operate on. When the version numbers match, Elasticsearch allows the operation to proceed; when the versions conflict, Elasticsearch reports the conflict and throws a VersionConflictEngineException.

1.1 Internal Version Control

PUT /lib/blog/2?version=6
{
  "id": 4,
  "title": "Regular grammar",
  "content": "static factory, learning record",
  "postdate": "2038-08-11",
  "url": "http://192.168.95.4:5601/app/kibana#/dev_tools/console?_g=()"
}

 

When a document is first created, its _version is automatically set to 1; every subsequent write operation increments it.

Elasticsearch version numbers range from 1 to 2^63 - 1.

    • Internal version control: Elasticsearch maintains the _version field itself and compares it against the version supplied in the request.
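
As a rough illustration of how a conflict surfaces with internal versioning (reusing the lib/blog document from above, and assuming a pre-7.x cluster where ?version is still accepted for internal versioning): if the document's current _version is 7 but the request specifies 6, the write is rejected.

PUT /lib/blog/2?version=6
{
  "title": "Regular grammar (conflicting update)"
}

# Response: 409 Conflict with a version_conflict_engine_exception,
# because the supplied version (6) does not match the current _version (7).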

1.2 External Version Control

External version control: Elasticsearch handles external version numbers differently from internal ones. It no longer checks whether the document's current version equals the version specified in the request; instead, it checks whether the current version is smaller than the specified value. If the request succeeds, the external version number is stored in the document's _version field.

To keep _version consistent with a version maintained in an external data store, use version_type=external to declare that the supplied version number is externally controlled.
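
A minimal sketch of an externally versioned write, reusing the lib/blog document above; the version value 10 is assumed to come from the external system:

PUT /lib/blog/2?version=10&version_type=external
{
  "title": "Regular grammar",
  "content": "static factory, learning record"
}

# Accepted only if 10 is greater than the document's current _version;
# on success the document's _version is set to 10.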

 

 

 

 

II. Creating Mappings

2.1 What is a mapping (similar to a table definition in a database, which specifies the type, length, etc. of each column)

A mapping defines the structure of the data for a given type (type) within an index (index): it specifies each field's data type, how each field is analyzed (tokenized), and other related properties. When an index is created, the fields' types and properties can be defined in advance, so that dates are handled as dates, numbers as numbers, strings as string values, and so on, according to the field data types that Elasticsearch supports.

2.2 Querying the mapping generated automatically by ES

GET /lib/blog/_mapping

As the output shows, when field types were not specified at index creation, ES derives a field's type from the first document that adds that field; for example, a field whose first value is in a date format is mapped as type date.
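
As a hedged sketch, the dynamically generated mapping for the blog documents indexed earlier might come back roughly like this (assuming ES 5.x dynamic-mapping defaults; the exact fields depend on what has actually been indexed):

{
  "lib": {
    "mappings": {
      "blog": {
        "properties": {
          "id":       { "type": "long" },
          "title":    { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "content":  { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
          "postdate": { "type": "date" },
          "url":      { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
        }
      }
    }
  }
}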

2.3 Manually creating a mapping

PUT /my_index1
{
  "settings": {
     "number_of_shards": 3,     # number of primary shards
     "number_of_replicas": 0    # number of replica copies
  },
  "mappings": {
     "books": {
        "properties": {
            "title": {"type": "text"},
            "name": {"type": "text", "analyzer": "standard"},
            "publish_date": {"type": "date", "index": false},
            "price": {"type": "double"},
            "number": {"type": "integer"}
        }
     }
  }
}
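
A quick check of the mapping above by indexing a sample document; the field values here are made up for illustration:

PUT /my_index1/books/1
{
  "title": "A sample book about search",
  "name": "Sample Author",
  "publish_date": "2019-01-01",
  "price": 29.99,
  "number": 5
}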


PUT /pd_template
{
  "mappings": {
    "log": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd H:mm:ss"
        },
        "cost": {
          "type": "double"
        },
        "status": {
          "type": "integer"
        }
      }
    }
  }
}
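
An illustrative document for pd_template; the date value must match the custom "yyyy-MM-dd H:mm:ss" format declared above (the values themselves are made up):

PUT /pd_template/log/1
{
  "date": "2019-12-01 9:30:00",
  "cost": 12.5,
  "status": 200
}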

 

 

 

2.4 Data types supported by Elasticsearch

2.4.1 Core datatypes

String: string, which comprises the text and keyword types.

text: used to index long text. Before indexing, the text is analyzed (split into terms) and the index is built from those terms, which lets ES search on the individual terms; text fields cannot be used for sorting or aggregations.

keyword: not analyzed; it can be used for exact filtering, sorting, and aggregations. A keyword field is only matched by its exact value (no matching on analyzed terms as with text).

Numeric: long, integer, short, byte, double, float

Date: date

Boolean: boolean

Binary: binary

Apart from string types, every other type must be queried with exact values, because non-string types are not analyzed. A small sketch contrasting text and keyword follows.
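
The index name type_demo and its fields below are hypothetical, used only to show the contrast:

PUT /type_demo
{
  "mappings": {
    "doc": {
      "properties": {
        "description": { "type": "text" },
        "tag":         { "type": "keyword" },
        "price":       { "type": "double" }
      }
    }
  }
}

# "description" is analyzed: it can be matched term by term, but not sorted or aggregated on.
# "tag" is not analyzed: it is matched by its exact value and can be used in sorts and aggregations.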

2.4.2 Array datatypes. Arrays do not need a dedicated element type to be specified (see the sketch after this list); for example:

String array: ["one", "two"]

Integer array: [1, 2]

Array of arrays: [1, [2, 3]], equivalent to [1, 2, 3]

Array of objects: [{"name": "Mary", "age": 12}, {"name": "John", "age": 10}]

Object datatype: object, used for a single JSON object
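
A hedged example of array values, reusing the hypothetical type_demo index from 2.4.1; fields not declared there (ratings, author) would be added by dynamic mapping:

PUT /type_demo/doc/1
{
  "tag": ["search", "elasticsearch"],
  "ratings": [4, 5],
  "author": { "name": "Mary", "age": 12 }
}

# No dedicated array type is declared: each element simply uses the field's mapped type.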

2.4.3 Geo datatypes

Geo-point datatype: geo_point, for latitude/longitude coordinates

Geo-shape datatype: geo_shape, for complex shapes such as polygons
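
A minimal geo_point sketch; the index name geo_demo and the coordinates are made up:

PUT /geo_demo
{
  "mappings": {
    "doc": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}

PUT /geo_demo/doc/1
{
  "location": { "lat": 40.12, "lon": -71.34 }
}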

2.4.4 Specialised datatypes

IPv4 datatype: ip, for IPv4 addresses

Completion datatype: completion, provides auto-complete suggestions

Token count datatype: token_count, counts the number of tokens in an analyzed field; the count only ever grows and is not reduced by filter conditions

mapper-murmur3 datatype: with the mapper-murmur3 plugin, a murmur3 hash of the indexed values can be computed

Attachment datatype: with the mapper-attachments plugin, attachments such as Microsoft Office formats, Open Document formats, ePub, HTML, etc. can be indexed
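
A rough sketch combining a few of these specialised types; the index name special_demo and its fields are hypothetical, and token_count requires an analyzer to be specified:

PUT /special_demo
{
  "mappings": {
    "doc": {
      "properties": {
        "client_ip": { "type": "ip" },
        "name": {
          "type": "text",
          "fields": {
            "length": { "type": "token_count", "analyzer": "standard" }
          }
        },
        "suggest": { "type": "completion" }
      }
    }
  }
}

# "name.length" stores the number of tokens produced when "name" is analyzed.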

2.5 Mapping parameters (attributes)

enabled: the field's content is only stored; it is not searched or used in aggregation analysis.

    "enabled": true (default) | false

index: whether an inverted index is built for the field (i.e. whether it is indexed; if set to false, the field cannot be searched).

    "index": true (default) | false

index_options: which information is stored in the inverted index. Four options: docs (document IDs only), freqs (document IDs + term frequencies), positions (document IDs + term frequencies + positions, typically used for proximity queries), offsets (document IDs + term frequencies + positions + offsets, typically used for highlighting). Analyzed fields default to positions, other fields default to docs.

    "index_options": "docs"

norms: whether normalization factors are stored. If a field is only used for filtering and aggregations, it can be disabled. Enabled by default for analyzed fields; non-analyzed fields default to {"enabled": false}. Stores length normalization factors and index-time boost; recommended only for fields that take part in scoring, since it consumes extra memory. "norms": {"enabled": true, "loading": "lazy"}

doc_values: whether doc values are enabled; used for sorting and aggregations on not_analyzed fields. Enabled by default; cannot be used on analyzed (text) fields. Greatly improves sorting and aggregation performance and saves memory. "doc_values": true (default) | false

fielddata: whether fielddata is enabled for text fields, to allow sorting and aggregations on analyzed fields; for non-analyzed fields doc_values is recommended instead. "fielddata": {"format": "disabled"}

store: whether the field value is stored on its own, separately from the _source field; if not stored, the field can be searched but its value cannot be retrieved individually. "store": false (default) | true

coerce: whether automatic data type conversion is enabled, e.g. string to number, floating point to integer. "coerce": true (default) | false

multifields: use multi-fields flexibly to cover varied business requirements.

dynamic: controls automatic updates of the mapping.

"dynamic": true (default) | false | strict

date_detection: whether date types are detected automatically. "date_detection": true (default) | false

For details on dynamic and date_detection, see: Elasticsearch dynamic mapping strategies.

analyzer: specifies the analyzer; the default is the standard analyzer. "analyzer": "ik"

boost: field-level score boosting; the default value is 1.0. "boost": 1.23

fields: provides several index modes for one field, e.g. the same value indexed once analyzed and once not analyzed. "fields": {"raw": {"type": "string", "index": "not_analyzed"}}

ignore_above: text longer than the given number of characters is ignored and not indexed, e.g. anything over 100 characters. "ignore_above": 100

include_in_all: whether this field is included in the _all field; defaults to true unless index is set to no. "include_in_all": true

null_value: sets a substitute value for missing (null) fields; only usable on string fields; the null replacement of an analyzed field is itself analyzed. "null_value": "NULL"

position_increment_gap: affects proximity and phrase queries; can be set on multi-valued or analyzed fields; a slop interval can be specified at query time; the default is 100. "position_increment_gap": 0

search_analyzer: sets the analyzer used at search time; by default it is the same as analyzer. For example, index with standard + ngram and search with standard to implement auto-complete. "search_analyzer": "ik"

similarity: defaults to the TF/IDF algorithm; specifies the scoring strategy for a field; only effective for string/analyzed fields. "similarity": "BM25"

term_vector: term vectors are not stored by default. Supported values: yes (store terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). They can speed up the fast vector highlighter, but they enlarge the index, so they are not suited to very large data volumes. "term_vector": "no"
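
To tie several of these parameters together, here is a hedged sketch of a single mapping; the index name attr_demo and all field names are made up, and the syntax assumes an ES 5.x/6.x style typed mapping:

PUT /attr_demo
{
  "mappings": {
    "doc": {
      "dynamic": "strict",
      "date_detection": false,
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "standard",
          "index_options": "offsets",
          "term_vector": "with_positions_offsets",
          "fields": {
            "raw": { "type": "keyword", "ignore_above": 100 }
          }
        },
        "status": { "type": "keyword", "null_value": "NULL" },
        "price":  { "type": "double", "coerce": true },
        "blob":   { "type": "object", "enabled": false }
      }
    }
  }
}

# title: analyzed, with offsets and term vectors for fast highlighting, plus an
#        un-analyzed "raw" sub-field (ignored above 100 characters) for sorting/aggregations.
# status: missing values are indexed as the literal "NULL".
# blob: kept in _source but not indexed at all.
# dynamic strict: unknown fields cause an error instead of being mapped automatically.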

 

 

 
