Reference articles:
https://blog.csdn.net/sfh2018/article/details/118083634
https://blog.csdn.net/w1014074794/article/details/119643883
Introduction to text and keyword types
- Elasticsearch 5.0 and later removed the string type and split it into two types: text and keyword. The difference is that a text field is analyzed (split into tokens), whereas a keyword field is not.
In other words, the value of a text field is first run through an analyzer and the resulting tokens are stored in the inverted index, whereas a keyword value is not analyzed and is indexed as-is as a single term.
The text type is used to index long, full-text content such as the body of an email or a product description. Before the document is indexed, an analyzer splits this content into individual tokens, and Elasticsearch can then match searches against those tokens. However, text fields are not suited to exact-value filtering, sorting, or aggregations; sorting and aggregating on a text field are disabled by default.
The keyword type suits structured values such as email addresses, host names, status codes, postal codes, and tags. It is not analyzed and is typically used for filtering, sorting, and aggregations.
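One way to see the difference is the _analyze API. The sketch below is an illustration, not part of the original walkthrough: it runs the same sample string (an assumed example value) through the built-in standard analyzer, which is the default for text fields, and through the built-in keyword analyzer, used here only as a stand-in for how a keyword field keeps its value whole.

```console
# standard analyzer: splits on word boundaries and lowercases
POST _analyze
{
  "analyzer": "standard",
  "text": "Quick Brown Foxes"
}

# keyword analyzer: emits the entire input as one token
POST _analyze
{
  "analyzer": "keyword",
  "text": "Quick Brown Foxes"
}
```

The first request should return three lowercased tokens (quick, brown, foxes), while the second should return the whole input unchanged as a single token.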
How can a text field be matched exactly in Elasticsearch?
Multiple type configurations for the same field
Create an index and, through the fields parameter in the mapping, add a sub-field named raw to the city field. The sub-field is of type keyword and is addressed as city.raw; it is used for exact matching, sorting, and aggregations.
- Create index
PUT test_index03
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
- Add data
PUT /test_index03/_doc/1
{
  "name": "叶子在这儿",
  "city": "陕西省西安市长安区"
}
PUT /test_index03/_doc/2
{
  "name": "北京的小家",
  "city": "北京市昌平区回龙观街道"
}
- Exact-match query (query, sort, and aggregate on the city.raw sub-field)
GET /test_index03/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "city.raw": {
              "value": "陕西省西安市长安区"
            }
          }
        }
      ]
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
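For contrast, the sketch below sends the same term query to the analyzed city field instead of city.raw. It is an illustration only and assumes the two documents above have been indexed and refreshed. Because the standard analyzer is expected to index Chinese text as single-character tokens, no indexed term equals the full string, so the first request should return no hits. The second request uses the index-level _analyze API to show which tokens the city field actually produces.

```console
# term query on the analyzed text field: looks for ONE term equal to the whole string
GET /test_index03/_search
{
  "query": {
    "term": {
      "city": {
        "value": "陕西省西安市长安区"
      }
    }
  }
}

# inspect how the analyzer of the city field splits the same value
GET /test_index03/_analyze
{
  "field": "city",
  "text": "陕西省西安市长安区"
}
```

This is why the exact-match query targets city.raw: the keyword sub-field keeps the whole value as one term.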
Configuring multiple analyzers for the same field
The city field is of type text and uses the standard analyzer by default;
the city.english sub-field, declared through fields, uses the english analyzer.
PUT test_index04
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
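To check what each analyzer does with the same input, the index-level _analyze API can be pointed at a field of the mapping above. This is a sketch for illustration; the sample text is an assumption chosen to show stemming.

```console
# city uses the standard analyzer: the plural form is kept
GET /test_index04/_analyze
{
  "field": "city",
  "text": "quick brown foxes"
}

# city.english uses the english analyzer: the plural is stemmed
GET /test_index04/_analyze
{
  "field": "city.english",
  "text": "quick brown foxes"
}
```

The first request should return the tokens quick, brown, foxes; the second should return quick, brown, fox, because the english analyzer stems foxes to fox.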
- Add data
PUT test_index04/_doc/1
{
  "city": "quick brown fox"
}
PUT test_index04/_doc/2
{
  "city": "quick brown foxes"
}
- Query
Use a multi_match query of type most_fields to search the same text through both analyzers at once, via city and city.english.
GET /test_index04/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown foxes",
      "fields": [
        "city",
        "city.english"
      ],
      "type": "most_fields"
    }
  }
}
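To see what each field contributes on its own, the two single-field queries below can be compared. This is a hedged sketch: it assumes the two sample documents are stored in test_index04 under the city field of the mapping above and have been refreshed.

```console
# standard-analyzed field: only the document containing the exact token "foxes" matches
GET /test_index04/_search
{
  "query": {
    "match": {
      "city": "foxes"
    }
  }
}

# english-analyzed sub-field: "foxes" is stemmed to "fox", so both documents match
GET /test_index04/_search
{
  "query": {
    "match": {
      "city.english": "foxes"
    }
  }
}
```

With most_fields, the scores from every matching field are combined, so a document that matches both the exact form in city and the stemmed form in city.english is expected to rank above one that matches only through stemming.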