Elasticsearch (ES) supports most of the data types found in Java:
(1) Core data types:
(1) string: analyzed (tokenized) by default. A complete example follows:
"status": {
    "type": "string",  // string type
    "index": "analyzed",  // "analyzed": tokenize the value; "not_analyzed": index it as a single term; "no": do not index the field at all
    "analyzer": "ik",  // analyzer to use
    "boost": 1.23,  // field-level score boost
    "doc_values": false,  // enabled by default for not_analyzed fields, not available for analyzed fields; speeds up sorting and aggregations and saves memory
    "fielddata": { "format": "disabled" },  // for analyzed fields, fielddata improves performance when sorting or aggregating; for not_analyzed fields, doc_values is recommended instead
    "fields": { "raw": { "type": "string", "index": "not_analyzed" } },  // multi-fields: index the same value in several ways, e.g. one analyzed and one not analyzed
    "ignore_above": 100,  // values longer than 100 characters are ignored and not indexed
    "include_in_all": true,  // whether the field is included in the _all field; defaults to true unless index is set to "no"
    "index_options": "docs",  // four options: docs (document numbers), freqs (document numbers + term frequencies), positions (+ positions, typically used for proximity queries), offsets (+ offsets, typically used for highlighting); analyzed fields default to positions, all others to docs
    "norms": { "enabled": true, "loading": "lazy" },  // default for analyzed fields; not_analyzed fields default to {"enabled": false}. Norms store the length factor and the index-time boost; enable them only on fields that take part in scoring, since they increase memory use
    "null_value": "NULL",  // value substituted for a missing (null) field; must be a string here, and on an analyzed field it is analyzed as well
    "position_increment_gap": 0,  // affects proximity and phrase queries; can be set on analyzed multi-valued fields, and a slop can be given at query time; the default is 100
    "store": false,  // whether the field is stored separately from _source; the default is false, meaning the field can be searched but its value cannot be fetched on its own
    "search_analyzer": "ik",  // analyzer used at search time; defaults to the same as analyzer. For example, index with standard+ngram and search with standard to implement autocomplete
    "similarity": "BM25",  // per-field scoring algorithm; the default is TF/IDF; only meaningful for analyzed string fields
    "term_vector": "no"  // term vectors are not stored by default; options are yes (terms), with_positions (terms + positions), with_offsets (terms + offsets), with_positions_offsets (terms + positions + offsets). The fast vector highlighter uses them to speed up highlighting, but enabling them increases index size, so they are a poor fit for large data volumes
}
(2) Numeric types. The main ones are:
long: 64-bit storage
integer: 32-bit storage
short: 16-bit storage
byte: 8-bit storage
double: 64-bit double-precision storage
float: 32-bit single-precision storage
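Supported parameters:
- coerce: true/false. If the data is not clean, ES converts it automatically: strings are coerced to numbers, floating-point values are truncated to integers, and latitude/longitude values are converted to the standard format
- boost: index-time boost factor
- doc_values: whether to enable doc_values
- ignore_malformed: false (a malformed number raises an exception) or true (the malformed value is ignored)
- include_in_all: whether the field is included in the _all field
- index: not_analyzed; numeric fields are not analyzed by default
- null_value: numeric value substituted for null
- precision_step: 16. Stores extra terms for each value to speed up range queries on numeric fields, at the cost of a larger index
- store: whether to store the actual value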
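To make the parameter list above concrete, here is a minimal sketch of a numeric mapping written as a Python dict and printed as the JSON body it would become. The field names `price` and `stock` are made up for illustration, and the syntax follows the older mapping style used in this article, so it may not apply unchanged to newer ES releases.

```python
import json

# Hypothetical fields ("price", "stock"); only parameters listed above are used.
numeric_properties = {
    "price": {
        "type": "double",
        "coerce": True,             # accept "9.99" (a string) and convert it to a number
        "ignore_malformed": False,  # reject documents whose value cannot be parsed
        "null_value": 0.0,          # value indexed when the field is null
        "doc_values": True,
    },
    "stock": {
        "type": "integer",
        "ignore_malformed": True,   # skip bad values instead of failing the document
        "store": False,
    },
}

# Serialize to the JSON that would sit under "properties" in a mapping request.
print(json.dumps({"properties": numeric_properties}, indent=2))
```

Keeping `ignore_malformed` off for `price` and on for `stock` is only there to show both settings side by side.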
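(3) Complex types
Array type: there is no dedicated array field type. Any field can hold zero or more values, provided that all of the values have the same type.
Object type: stores hierarchical, JSON-like data
Nested type: supports arrays of objects (Array[Object]), which can be nested to several levels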
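The three complex types above can be sketched together in one mapping. This is an illustrative example only: the document shape (a post with `tags`, one `author`, and several `comments`) and all field names are assumptions, and the mapping uses the older `string` syntax that appears elsewhere in this article.

```python
import json

# Hypothetical mapping: tags is an array, author is an object, comments is nested.
mapping = {
    "properties": {
        # Arrays need no special type: declare the element type only.
        "tags": {"type": "string", "index": "not_analyzed"},
        # Object: hierarchical data, sub-fields declared under "properties".
        "author": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "age": {"type": "integer"},
            },
        },
        # Nested: an array of objects, each one kept as its own unit.
        "comments": {
            "type": "nested",
            "properties": {
                "user": {"type": "string", "index": "not_analyzed"},
                "stars": {"type": "short"},
            },
        },
    }
}

# A document that fits the mapping above.
document = {
    "tags": ["search", "mapping"],
    "author": {"name": "alice", "age": 30},
    "comments": [{"user": "bob", "stars": 5}, {"user": "carol", "stars": 3}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(document, indent=2))
```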
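(4) Geo types
geo_point type: stores latitude/longitude pairs and supports distance-range searches
geo_shape type: supports searches over arbitrary shapes, such as rectangles and planar polygons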
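As a rough illustration of the distance-range search mentioned above, the sketch below builds a `geo_point` mapping, a sample document, and a `geo_distance` query as Python dicts. The field name `location`, the coordinates, and the `5km` radius are invented for the example, and wrapping the filter in a `bool` query is an assumption about the ES version in use (older releases used a different wrapper).

```python
import json

# Hypothetical field "location" mapped as a geo_point.
geo_mapping = {"properties": {"location": {"type": "geo_point"}}}

# A sample document; lat/lon are given as an object here.
document = {"name": "cafe", "location": {"lat": 39.9042, "lon": 116.4074}}

# Find documents within 5km of a point (assumed bool/filter wrapper).
distance_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "5km",
                    "location": {"lat": 39.90, "lon": 116.40},
                }
            }
        }
    }
}

for body in (geo_mapping, document, distance_query):
    print(json.dumps(body, indent=2))
```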
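(5) Specialized types
ipv4 type (declared as ip in a mapping): stores IP addresses; ES converts them to long internally
completion type: uses an FST (finite state transducer) to provide prefix suggestions
token_count type: provides token-level counting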
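mapper-murmur3 type: provided by the mapper-murmur3 plugin; computes a hash of the field value at index time
mapper-size: install the plugin with sudo bin/plugin install mapper-size to get the _size field, which records the size of the _source data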
Attachment type: requires the open-source ES plugin https://github.com/elastic/elasticsearch-mapper-attachments; can store Office documents, HTML, and similar formats
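A small sketch of the built-in specialized types above, again as a Python dict. The field names (`client_ip`, `suggest`, `title.length`) are hypothetical. The helper `ipv4_to_long` is not ES code; it only illustrates the idea that an IPv4 address fits in a long, using Python's standard `ipaddress` module.

```python
import ipaddress
import json

# Hypothetical fields using the ip, completion, and token_count types.
special_properties = {
    "client_ip": {"type": "ip"},
    "suggest": {"type": "completion"},
    "title": {
        "type": "string",
        "fields": {
            # token_count counts the tokens the given analyzer produces.
            "length": {"type": "token_count", "analyzer": "standard"},
        },
    },
}


def ipv4_to_long(address: str) -> int:
    """Return the integer form of a dotted IPv4 address (illustration only)."""
    return int(ipaddress.IPv4Address(address))


print(json.dumps({"properties": special_properties}, indent=2))
print(ipv4_to_long("192.168.1.1"))  # 3232235777
```

Because addresses become plain numbers, range-style questions such as "is this address between two others" reduce to numeric comparisons.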
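(6) Multi-fields:
The value of one field can be indexed with several different analyzers by using the fields parameter; most ES data types support this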
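A minimal sketch of a multi-field, assuming a hypothetical `title` field with two sub-fields, `raw` and `english`. The small helper `field_paths` is my own illustration, not an ES API: it lists the names under which the differently indexed copies can be addressed in queries.

```python
import json

# Hypothetical "title" field indexed three ways via the fields parameter.
title_mapping = {
    "type": "string",
    "analyzer": "standard",
    "fields": {
        "raw": {"type": "string", "index": "not_analyzed"},  # exact value, for sorting/aggregations
        "english": {"type": "string", "analyzer": "english"},  # stemmed copy
    },
}


def field_paths(name: str, mapping: dict) -> list:
    """List the main field plus each sub-field as 'name.sub' (illustration only)."""
    return [name] + [f"{name}.{sub}" for sub in mapping.get("fields", {})]


print(json.dumps({"properties": {"title": title_mapping}}, indent=2))
print(field_paths("title", title_mapping))  # ['title', 'title.raw', 'title.english']
```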
(II) Mapping parameter list (parameters already covered above are not explained again):
No. | Name | Description |
1 | copy_to | Works like copyField in Solr: copies the value of a field into a single combined field |
2 | properties | Mapping types, object fields, and nested fields can contain sub-fields; these sub-fields are declared under properties |
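The two parameters in the table can be sketched in one mapping. The field names `first_name`, `last_name`, and `full_name` are assumptions for the example: both name fields are copied into `full_name`, and all three are declared under `properties`.

```python
import json

# Hypothetical mapping: two source fields copied into one combined field.
person_mapping = {
    "properties": {
        "first_name": {"type": "string", "copy_to": "full_name"},
        "last_name": {"type": "string", "copy_to": "full_name"},
        "full_name": {"type": "string"},  # target of copy_to, searched as one field
    }
}

# Collect which fields feed the combined field.
sources = [
    name
    for name, spec in person_mapping["properties"].items()
    if spec.get("copy_to") == "full_name"
]

print(json.dumps(person_mapping, indent=2))
print(sources)  # ['first_name', 'last_name']
```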
Official website documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html#_multi_fields_2
http://qindongliang.iteye.com/blog/2259541