The difference between the keyword and text types in Elasticsearch, and fuzzy queries

Reference articles:
https://blog.csdn.net/sfh2018/article/details/118083634
https://blog.csdn.net/w1014074794/article/details/119643883

Introduction to text and keyword types

  • ES 5.0 and later versions replace the string type with two types: text and keyword. The difference is that a text field is analyzed (split into terms), while a keyword field is not.
    In other words, the value of a text field is analyzed first and the resulting terms are stored in the index, while a keyword value is not analyzed and is stored in the index as a single term.
  • The text type is used to index long text, such as the body of an email or a product description. Before the document is indexed, an analyzer splits the text into individual terms, and ES can then search on any of those terms. However, text fields cannot be used for sorting or aggregations by default, and they are a poor fit for exact-value filtering.
  • The keyword type suits data such as email addresses, host names, status codes, postal codes, and tags. It is not analyzed and is typically used for filtering, sorting, and aggregation.
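
The contrast between the two types can be sketched with a toy inverted index. This is a minimal illustration, not how Elasticsearch is implemented: `analyze_text` is a crude stand-in for an analyzer (lowercase, split on non-alphanumerics), and all function names here are hypothetical.

```python
import re

def analyze_text(value):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def analyze_keyword(value):
    """A keyword field indexes the whole value as one term, unchanged."""
    return [value]

def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_lookup(index, term):
    """Exact term lookup, the toy equivalent of a term query."""
    return sorted(index.get(term, set()))

docs = {1: "Quick brown fox", 2: "Quick brown foxes"}
text_index = build_index(docs, analyze_text)
keyword_index = build_index(docs, analyze_keyword)

print(term_lookup(text_index, "quick"))               # [1, 2]
print(term_lookup(keyword_index, "quick"))            # []
print(term_lookup(keyword_index, "Quick brown fox"))  # [1]
```

In this model the text index can find a document by any single word, while the keyword index only matches the complete original value, which is also what makes it usable as a sort key or an aggregation bucket.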

How does Elasticsearch perform exact matching on text fields?

Multiple type configurations for the same field

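Create an index and use the fields parameter in the mapping to add a sub-field named raw to the city field. The sub-field has type keyword and is addressed as city.raw; it is used for exact matching, sorting, and aggregation, while city itself remains a text field for full-text search.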

  • Create index
PUT test_index03
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "city": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
  • Add data
PUT /test_index03/_doc/1
{
  "name": "叶子在这儿",
  "city": "陕西省西安市长安区"
}
PUT /test_index03/_doc/2
{
  "name": "北京的小家",
  "city": "北京市昌平区回龙观街道"
}
  • Exact-match query (query the city.raw sub-field, and sort and aggregate on it)
GET /test_index03/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "city.raw": {
              "value": "陕西省西安市长安区"
            }
          }
        }
      ]
    }
  },
  "sort": {
    "city.raw": "asc"
  },
  "aggs": {
    "Cities": {
      "terms": {
        "field": "city.raw"
      }
    }
  }
}
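
As a rough mental model of what the request above asks for, the following sketch replays the three parts (term query, sort, terms aggregation) over the two sample documents in plain Python. It is only an approximation under stated assumptions: the keyword sub-field is modeled as the untouched string value, and `term_query` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter

# The two sample documents; "city" here plays the role of city.raw,
# i.e. the whole value kept as a single unanalyzed term.
docs = [
    {"_id": 1, "name": "叶子在这儿", "city": "陕西省西安市长安区"},
    {"_id": 2, "name": "北京的小家", "city": "北京市昌平区回龙观街道"},
]

def term_query(docs, value):
    """Keep only documents whose keyword value equals `value` exactly."""
    return [d for d in docs if d["city"] == value]

# Term query: only the complete value matches.
hits = term_query(docs, "陕西省西安市长安区")
# A fragment of the value does not match a keyword field.
partial_hits = term_query(docs, "西安市")

# Sort ascending on the keyword value (plain string comparison here).
sorted_ids = [d["_id"] for d in sorted(docs, key=lambda d: d["city"])]

# Terms aggregation: one bucket per distinct value among the hits.
buckets = Counter(d["city"] for d in hits)

print([d["_id"] for d in hits])  # [1]
print(partial_hits)              # []
print(dict(buckets))
```

The point of the sketch is that every operation works on the complete value, which is why these requests target city.raw rather than the analyzed city field.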

Configuring multiple analyzers for the same field

The field named text uses the standard analyzer by default;
the sub-field english, declared through fields and addressed as text.english, uses the english analyzer.

PUT test_index04
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "english": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
  • Add data
PUT /test_index04/_doc/1
{
  "text": "quick brown fox"
}

PUT /test_index04/_doc/2
{
  "text": "quick brown foxes"
}
  • Query

Use a multi_match query over both text and text.english so that a single search is matched against the same content under two different analyzers. With type most_fields, the scores from each matching field are combined.

GET /test_index04/_search
{
  "query": {
    "multi_match": {
      "query": "quick brown foxes",
      "fields": [
        "text",
        "text.english"
      ],
      "type": "most_fields"
    }
  }
}
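
Why searching both fields helps can be sketched with a toy scorer. The assumptions are loud ones: `standard_terms` only lowercases and splits on whitespace, `english_terms` uses a crude suffix-stripping rule in place of the real english analyzer's stemmer, and the score is a simple count of matching terms rather than Elasticsearch's actual relevance score. All names are hypothetical.

```python
def standard_terms(text):
    """Stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def english_terms(text):
    """Stand-in for the english analyzer: same split plus a crude stemmer."""
    stemmed = []
    for term in text.lower().split():
        if term.endswith("es"):
            term = term[:-2]   # foxes -> fox
        elif term.endswith("s"):
            term = term[:-1]
        stemmed.append(term)
    return stemmed

# One analyzer per field, mirroring text and text.english.
FIELDS = {"text": standard_terms, "text.english": english_terms}

def field_score(analyzer, doc_text, query):
    """Toy score: number of query terms found in the document's terms."""
    doc_terms = set(analyzer(doc_text))
    return sum(1 for term in analyzer(query) if term in doc_terms)

def most_fields_score(doc_text, query):
    """Sum the per-field scores, in the spirit of type most_fields."""
    return sum(field_score(a, doc_text, query) for a in FIELDS.values())

docs = {1: "quick brown fox", 2: "quick brown foxes"}
query = "quick brown foxes"
scores = {doc_id: most_fields_score(text, query) for doc_id, text in docs.items()}
print(scores)  # {1: 5, 2: 6}
```

In this model document 1 matches "foxes" only through the stemmed field, so it is still found, while document 2 matches in both fields and ranks higher because it also contains the exact word form.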

Origin blog.csdn.net/weixin_43824520/article/details/126860414