match

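The search text is split into terms by the configured analyzer (standard by default), and each term is matched against the field individually. This returns a large number of potentially related documents, but precision is low.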

PUT my_index/_doc/1?refresh
{
  "message": "this is a test"
}
PUT my_index/_doc/2?refresh
{
  "message": "this is a dog"
}
PUT my_index/_doc/3?refresh
{
  "message": "i like the dog"
}

GET my_index/_search
{
  "query": {
    "match": {
      "message": {
        "query": "i hate dog"
      }
    }
  }
}

Because the default standard analyzer is used, i hate dog is split into i, hate and dog. Any document that contains at least one of these three terms is returned:

[
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "3",
    "_score" : 1.8467541,
    "_source" : {
      "message" : "i like the dog"
    }
  },
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "2",
    "_score" : 0.6747451,
    "_source" : {
      "message" : "this is a dog"
    }
  }
]

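The following parameters control how the analyzed terms must match:

Parameter	Description
analyzer	Analyzer used to split the search text; defaults to the analyzer mapped for the field, which is standard unless configured otherwise
operator	How the terms produced from the search text are combined; either or or and, default or
minimum_should_match	Minimum number of terms that must match before a document is returned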
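
To make the effect of operator and minimum_should_match concrete, here is a minimal Python sketch that models the matching rule on the three documents above. It is not how Elasticsearch is implemented: the analyze function (lowercase plus whitespace split) is only an assumed stand-in for the standard analyzer, the match and search helpers are hypothetical names, and scoring is ignored entirely.

```python
def analyze(text):
    """Assumed stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(query, text, operator="or", minimum_should_match=None):
    """Return True if the document text satisfies the simplified match rule."""
    query_terms = set(analyze(query))
    doc_terms = set(analyze(text))
    matched = len(query_terms & doc_terms)
    if operator == "and":
        # every query term must be present
        return matched == len(query_terms)
    # operator "or": at least one term, or at least minimum_should_match terms
    required = 1 if minimum_should_match is None else minimum_should_match
    return matched >= required


docs = {
    "1": "this is a test",
    "2": "this is a dog",
    "3": "i like the dog",
}


def search(query, **params):
    """Ids of matching documents, sorted by id (no relevance scoring)."""
    return sorted(doc_id for doc_id, text in docs.items()
                  if match(query, text, **params))


print(search("i hate dog"))                          # ['2', '3']
print(search("i hate dog", operator="and"))          # []
print(search("i hate dog", minimum_should_match=2))  # ['3']
```

With operator set to and nothing is returned, because no document contains hate. With minimum_should_match set to 2 only document 3 remains, since it contains both i and dog.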

match_phrase

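Phrase matching. The search text is analyzed in the same way, but every resulting term must be present, in the same order, with no other terms in between. Precision is high, but the result set is smaller.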

PUT my_index/_doc/1?refresh
{
  "message": "it is a java book"
}
PUT my_index/_doc/2?refresh
{
  "message": "it is a java reading book"
}

GET my_index/_search
{
  "query": {
    "match_phrase": {
      "message": {
        "query": "java book"
      }
    }
  }
}

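For a document to match java book, both of the following conditions must be true:

java and book must both match
the position of book must be exactly 1 greater than the position of java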

Therefore, the response is:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 0.37872806,
    "_source" : {
      "message" : "it is a java book"
    }
  }
]
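
The two conditions can be checked with a small Python sketch. It is only an illustration under assumptions: analyze is a hypothetical stand-in for the standard analyzer (lowercase plus whitespace split), and a token's list index plays the role of its position.

```python
def analyze(text):
    """Assumed stand-in for the standard analyzer; list index = token position."""
    return text.lower().split()


def phrase_matches(query, text):
    """True if the query tokens occur in the text at consecutive positions, in order."""
    query_tokens = analyze(query)
    doc_tokens = analyze(text)
    n = len(query_tokens)
    # slide a window of n tokens over the document and compare it to the query
    return any(doc_tokens[start:start + n] == query_tokens
               for start in range(len(doc_tokens) - n + 1))


print(phrase_matches("java book", "it is a java book"))          # True
print(phrase_matches("java book", "it is a java reading book"))  # False
```

In the second document reading sits between java and book, so book is 2 positions after java and the phrase does not match.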

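Now reverse the search terms. Searching for book java returns no data at all, because in the query book sits one position before java, whereas in the documents it comes after java.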

GET _analyze
{
  "analyzer": "standard",
  "text": ["book java"]
}

{
  "tokens" : [
    {
      "token" : "book",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "java",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

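As shown, book.position=0 and java.position=1.
To satisfy the conditions above, book has to be shifted by 2 positions (book.position+2) so that it ends up directly after java. This is what the slop parameter is for: the allowed position offset, 0 by default.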

GET my_index/_search
{
  "query": {
    "match_phrase": {
      "message": {
        "query": "book java",
        "slop": 2
      }
    }
  }
}

With slop=2 the query returns the it is a java book document.
With slop=3 the other document, it is a java reading book, is returned as well.
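
The slop values 2 and 3 can be reproduced with a simplified Python model. This is a sketch under assumptions, not Elasticsearch's actual phrase matcher: analyze is a hypothetical whitespace tokenizer, and required_slop is a hypothetical helper that subtracts each term's query position from its document position and takes the smallest spread of those shifted values.

```python
from itertools import product


def analyze(text):
    """Assumed stand-in for the standard analyzer; list index = token position."""
    return text.lower().split()


def required_slop(query, text):
    """Smallest slop at which the phrase matches in this simplified model.

    Returns None if any query term is missing from the text.
    """
    doc_tokens = analyze(text)
    shifted = []
    for query_pos, term in enumerate(analyze(query)):
        # document position minus query position, for every occurrence of the term
        positions = [doc_pos - query_pos
                     for doc_pos, token in enumerate(doc_tokens) if token == term]
        if not positions:
            return None
        shifted.append(positions)
    # an exact phrase gives identical shifted values, i.e. a spread of 0
    return min(max(combo) - min(combo) for combo in product(*shifted))


print(required_slop("book java", "it is a java book"))          # 2
print(required_slop("book java", "it is a java reading book"))  # 3
print(required_slop("java book", "it is a java book"))          # 0
```

In this model a document matches when its required slop is less than or equal to the slop given in the query, which agrees with the behavior described above for slop=2 and slop=3.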


wildcard

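Fuzzy matching with wildcard patterns. It is simple to use, but queries are slow.
Two wildcards are supported:

?, matches any single character
*, matches zero or more characters

Official recommendation:
Avoid starting a pattern with ? or * wherever possible, since a leading wildcard noticeably degrades query performance.

An example: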

PUT my_index/_doc/1?refresh
{
  "message": "天气好"
}
GET my_index/_search
{
  "query": {
    "wildcard": {
      "message.keyword": {
        "value": "天*好"
      }
    }
  }
}

The query result is:

"hits" : [
  {
    "_index" : "my_index",
    "_type" : "_doc",
    "_id" : "1",
    "_score" : 1.0,
    "_source" : {
      "message" : "天气好"
    }
  }
]
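
The meaning of the two wildcards can be sketched in Python by translating a pattern into a regular expression. This is only an illustration of the pattern semantics, not of how Elasticsearch evaluates the query; wildcard_to_regex and wildcard_matches are hypothetical helper names. The pattern is matched against the whole term, which for message.keyword is the entire field value.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: ? is one character, * is zero or more."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            # everything else is matched literally
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern, term):
    """True if the pattern matches the whole term."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_matches("天*好", "天气好"))  # True
print(wildcard_matches("天?好", "天气好"))  # True
print(wildcard_matches("天?", "天气好"))    # False
```

The last call fails because 天? only covers two characters while the term has three.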
