Learning Elasticsearch search syntax (term, filter, bool, terms, range)

Learning ES search syntax


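Contents

  0. Sample data
  1. Using term and filter
  2. Combining multiple filter conditions with bool
  3. Searching multiple values with terms, and refining multi-value results
  4. Range filtering with the range filter
  5. Manually controlling the precision of full-text search results
  6. Multi-field search with dis_max and the best fields strategy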


1. Using term and filter

0. Sample data (used in sections 1 and 2)
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
1. Search for posts by user ID
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "userID": 1
        }
      }
    }
  }
}
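  1. query: the search request body.
  2. constant_score: wraps a filter and gives every matching document the same score (1 by default), so no relevance score is computed.
  3. filter: filters documents without scoring them.
  4. term filter/query: the search text is not analyzed (tokenized); it is looked up in the inverted index exactly as you typed it. By contrast, if the search text were analyzed, "hello world" would be split into "hello" and "world" and each term would be matched against the inverted index separately. With a term query, "hello world" stays "hello world" and is looked up as one single term. The field therefore has to hold the exact value as a single term (a number, a date, or a string that is not analyzed) for a term filter to match.
  5. Equivalent to a single WHERE condition in SQL.
  6. Other fields are searched the same way.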
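
The difference between a term lookup and an analyzed lookup can be sketched in a few lines of Python. This is not Elasticsearch code: `analyze`, `build_index`, `term_lookup`, and `match_lookup` are hypothetical helpers, the two sample titles are invented for illustration, and the regex tokenizer is only a rough stand-in for a real analyzer.

```python
import re

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, analyzed):
    """Map each indexed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, value in docs.items():
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def term_lookup(index, text):
    """term query: look the input up as-is, with no analysis."""
    return sorted(index.get(text, set()))

def match_lookup(index, text):
    """Analyzed query: tokenize the input, then look up each term (OR)."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return sorted(hits)

# Hypothetical sample values, not part of the forum data set.
titles = {1: "hello world", 2: "hello there"}
analyzed_index = build_index(titles, analyzed=True)
exact_index = build_index(titles, analyzed=False)

print(term_lookup(analyzed_index, "hello world"))   # [] : no single term "hello world"
print(match_lookup(analyzed_index, "hello world"))  # [1, 2] : "hello" or "world"
print(term_lookup(exact_index, "hello world"))      # [1] : whole value indexed as one term
```

The first lookup comes back empty because the analyzed index only holds the separate terms "hello", "world", and "there", which is why a term filter fits fields that store their value as one exact term.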

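2. Combining multiple filter conditions with bool

1. Search for posts whose post date is 2017-01-01 or whose article ID is XHDK-A-1293-#fJ3, and whose post date is definitely not 2017-01-02
  1. SQL equivalent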
select *
from forum.article
where (post_date='2017-01-01' or article_id='XHDK-A-1293-#fJ3')
and post_date!='2017-01-02'
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "term": { "postDate": "2017-01-01" }},
            { "term": { "articleID": "XHDK-A-1293-#fJ3" }}
          ],
          "must_not": {
            "term": {
              "postDate": "2017-01-02"
            }
          }
        }
      }
    }
  }
}
2. Search for posts whose article ID is XHDK-A-1293-#fJ3, or whose article ID is JODL-X-1937-#pV7 and whose post date is 2017-01-01
select *
from forum.article
where article_id='XHDK-A-1293-#fJ3'
or (article_id='JODL-X-1937-#pV7' and post_date='2017-01-01')
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "articleID": "XHDK-A-1293-#fJ3"
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "term": {
                      "articleID": "JODL-X-1937-#pV7"
                    }
                  },
                  {
                    "term": {
                      "postDate": "2017-01-01"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
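  1. bool: combines multiple filter conditions
    1. must: every clause must match
    2. must_not: no clause may match
    3. should: matching any one of the clauses is enough
  2. bool clauses can be nested
  3. Equivalent to combining several conditions with AND, OR, and NOT in a SQL WHERE clause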
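
To check which posts the two bool filters above select, here is a minimal Python sketch that evaluates the same conditions against the sample data. It is not Elasticsearch code: `term`, `bool_filter`, and `search` are hypothetical helpers, and the handling of should (required only when there is no must clause) is a simplification of the real rules.

```python
docs = [
    {"_id": 1, "articleID": "XHDK-A-1293-#fJ3", "postDate": "2017-01-01"},
    {"_id": 2, "articleID": "KDKE-B-9947-#kL5", "postDate": "2017-01-02"},
    {"_id": 3, "articleID": "JODL-X-1937-#pV7", "postDate": "2017-01-01"},
    {"_id": 4, "articleID": "QQPX-R-3956-#aD8", "postDate": "2017-01-02"},
]

def term(field, value):
    """Exact-value predicate, standing in for a term filter."""
    return lambda doc: doc.get(field) == value

def bool_filter(must=(), should=(), must_not=()):
    """Combine predicates roughly the way a bool filter does."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Simplification: with no must clause, one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches

def search(predicate):
    """Return the IDs of all sample posts accepted by the predicate."""
    return [doc["_id"] for doc in docs if predicate(doc)]

# (postDate = 2017-01-01 OR articleID = XHDK...) AND postDate != 2017-01-02
query1 = bool_filter(
    should=[term("postDate", "2017-01-01"), term("articleID", "XHDK-A-1293-#fJ3")],
    must_not=[term("postDate", "2017-01-02")],
)

# articleID = XHDK... OR (articleID = JODL... AND postDate = 2017-01-01)
query2 = bool_filter(
    should=[
        term("articleID", "XHDK-A-1293-#fJ3"),
        bool_filter(must=[term("articleID", "JODL-X-1937-#pV7"),
                          term("postDate", "2017-01-01")]),
    ],
)

print(search(query1))  # [1, 3]
print(search(query2))  # [1, 3]
```

Because `bool_filter` returns an ordinary predicate, it can be passed as a clause to another `bool_filter`, which mirrors how bool clauses nest in the request body.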

3. Searching multiple values with terms, and refining multi-value results

0. Sample data (adds a tag field to each post)
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag" : ["java", "hadoop"]} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag" : ["java"]} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag" : ["hadoop"]} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag" : ["java", "elasticsearch"]} }
  1. term: {"field": "value"}
  2. terms: {"field": ["value1", "value2"]}
  3. Equivalent to IN in SQL
select * from tbl where col in ('value1', 'value2')
1. Search for posts whose articleID is KDKE-B-9947-#kL5 or QQPX-R-3956-#aD8
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "articleID": [
            "KDKE-B-9947-#kL5",
            "QQPX-R-3956-#aD8"
          ]
        }
      }
    }
  }
}
2. Search for posts whose tag contains java
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "terms": {
          "tag": ["java"]
        }
      }
    }
  }
}
  1. This returns every post whose tag array contains the string java
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [
            "java"
          ]
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "4",
        "_score": 1,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": [
            "java",
            "elasticsearch"
          ]
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 1,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [
            "java",
            "hadoop"
          ]
        }
      }
    ]
  }
}
3. Refine the results: return only posts whose sole tag is java
  1. Add a field that records the number of tags
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag_cnt" : 2} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag_cnt" : 2} }
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "tag_cnt": 1
              }
            },
            {
              "terms": {
                "tag": ["java"]
              }
            }
          ]
        }
      }
    }
  }
}
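  1. If the terms filter listed ["java", "hadoop", "elasticsearch"] instead, the search would return the posts whose single tag is one of "java", "hadoop", or "elasticsearch".
  2. terms searches for multiple values.
  3. An extra field such as tag_cnt refines the results of a multi-value terms search.
  4. Equivalent to the IN clause in SQL.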
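
The effect of pairing terms with tag_cnt can be reproduced on the sample data with a short Python sketch. This is not Elasticsearch code: `terms_match` and `search_terms` are hypothetical helpers that only imitate the filter logic.

```python
docs = [
    {"_id": 1, "tag": ["java", "hadoop"], "tag_cnt": 2},
    {"_id": 2, "tag": ["java"], "tag_cnt": 1},
    {"_id": 3, "tag": ["hadoop"], "tag_cnt": 1},
    {"_id": 4, "tag": ["java", "elasticsearch"], "tag_cnt": 2},
]

def terms_match(doc, field, values):
    """terms filter: true if the field holds at least one of the values."""
    stored = doc.get(field, [])
    if not isinstance(stored, list):
        stored = [stored]
    return any(value in stored for value in values)

def search_terms(field, values, tag_cnt=None):
    """Apply the terms filter, optionally combined with an exact tag_cnt."""
    hits = []
    for doc in docs:
        if not terms_match(doc, field, values):
            continue
        if tag_cnt is not None and doc["tag_cnt"] != tag_cnt:
            continue
        hits.append(doc["_id"])
    return hits

print(search_terms("tag", ["java"]))             # [1, 2, 4]
print(search_terms("tag", ["java"], tag_cnt=1))  # [2]
print(search_terms("tag", ["java", "hadoop", "elasticsearch"], tag_cnt=1))  # [2, 3]
```

The terms filter alone behaves like "contains any of", so the count field is what narrows the result to posts that carry nothing but the requested tag.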

4. Range filtering with the range filter

0. Add a view count field to the posts

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"view_cnt" : 30} }
{ "update": { "_id": "2"} }
{ "doc" : {"view_cnt" : 50} }
{ "update": { "_id": "3"} }
{ "doc" : {"view_cnt" : 100} }
{ "update": { "_id": "4"} }
{ "doc" : {"view_cnt" : 80} }

1. Search for posts with a view count between 30 and 60 (exclusive, because the query uses gt and lt)
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "view_cnt": {
            "gt": 30,
            "lt": 60
          }
        }
      }
    }
  }
}
  1. range: range search
  2. gt: greater than
  3. gte: greater than or equal to
  4. lt: less than
  5. lte: less than or equal to
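
The four bounds differ only in whether the endpoint itself is included. A minimal Python sketch over the sample view counts shows the difference; `range_filter` is a hypothetical helper, not an Elasticsearch API.

```python
import operator

# view_cnt values from the sample data, keyed by post ID.
view_counts = {1: 30, 2: 50, 3: 100, 4: 80}

OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}

def range_filter(values, **bounds):
    """Return the IDs whose value satisfies every bound, e.g. gt=30, lt=60."""
    return [
        doc_id
        for doc_id, value in values.items()
        if all(OPS[name](value, bound) for name, bound in bounds.items())
    ]

print(range_filter(view_counts, gt=30, lt=60))    # [2]
print(range_filter(view_counts, gte=30, lte=60))  # [1, 2]
```

With gt and lt, post 1 (view_cnt 30) is left out; switching to gte and lte brings it back in.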
2. Search for posts published within the last month
  1. Add a document
POST /forum/article/_bulk
{ "index": { "_id": 5 }}
{ "articleID" : "DHJK-B-1395-#Ky5", "userID" : 3, "hidden": false, "postDate": "2017-03-01", "tag": ["elasticsearch"], "tag_cnt": 1, "view_cnt": 10 }
GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "postDate": {
            "gt": "2017-03-10||-30d"
          }
        }
      }
    }
  }
}

GET /forum/article/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "postDate": {
            "gt": "now-30d"
          }
        }
      }
    }
  }
}
  1. range is the equivalent of BETWEEN, or of >= and <=, in SQL: it filters by range.
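
The date math expression "2017-03-10||-30d" means "the anchor date 2017-03-10, minus 30 days". The Python sketch below resolves it by hand and applies the gt comparison to the sample post dates. `resolve` and `posted_after` are hypothetical helpers, and the sketch assumes whole-day granularity only.

```python
from datetime import date, timedelta

# postDate values from the sample data, keyed by post ID.
post_dates = {
    1: "2017-01-01",
    2: "2017-01-02",
    3: "2017-01-01",
    4: "2017-01-02",
    5: "2017-03-01",
}

def resolve(anchor, days_back):
    """Resolve "<anchor>||-<days_back>d" to a concrete date."""
    return date.fromisoformat(anchor) - timedelta(days=days_back)

def posted_after(threshold):
    """IDs of posts whose postDate is strictly later than threshold (gt)."""
    return [
        doc_id
        for doc_id, value in post_dates.items()
        if date.fromisoformat(value) > threshold
    ]

threshold = resolve("2017-03-10", 30)
print(threshold)                # 2017-02-08
print(posted_after(threshold))  # [5]
```

The "now-30d" form works the same way with the current time as the anchor, so against this 2017 data set it would match nothing when run today.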

5. Manually controlling the precision of full-text search results

0. Add a title field to the posts

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }

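1. Search for blogs whose title contains java or elasticsearch
  1. This differs from the earlier term query: it does not look for an exact value, it runs a full-text search.
  2. The match query performs full-text search. If the field being searched is not_analyzed (the keyword type in Elasticsearch 5.x and later), a match query behaves like a term query.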
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": "java elasticsearch"
    }
  }
}
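2. Search for blogs whose title contains both java and elasticsearch
  1. The first step in controlling result precision is the and operator: when every search keyword must match, set operator to and, which a plain match query cannot achieve on its own.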
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}
3. Search for blogs that contain at least 3 of the 4 keywords java, elasticsearch, spark, and hadoop
GET /forum/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}
4. Combine multiple search conditions with bool to search the title
GET /forum/article/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "java" }},
      "must_not": { "match": { "title": "spark" }},
      "should": [
        { "match": { "title": "hadoop" }},
        { "match": { "title": "elasticsearch" }}
      ]
    }
  }
}
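5. Search for java, hadoop, spark, and elasticsearch, requiring at least 3 of the keywords
  1. By default, none of the should clauses has to match. In the search above, for example, "this is java blog" matches no should clause and is still returned.
  2. There is one exception: when there is no must clause, at least one should clause must match. In the search below, should has 4 clauses, and by default a document that satisfies any one of them is returned.
  3. You can control precisely how many of the 4 should clauses must match for a document to be returned.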
GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch" }},
        { "match": { "title": "hadoop" }},
        { "match": { "title": "spark" }}
      ],
      "minimum_should_match": 3
    }
  }
}
  1. There are two ways to search for multiple values in full-text search: a match query, or should clauses.
  2. Result precision is controlled with the and operator and minimum_should_match.
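
The three precision levels (any keyword, all keywords, at least N keywords) can be compared on the sample titles with a small Python sketch. This is not Elasticsearch code: `tokens`, `match`, and `percent_to_count` are hypothetical helpers, and the regex tokenizer is only an approximation of a real analyzer.

```python
import math
import re

titles = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

def tokens(text):
    """Toy analyzer: lowercase word tokens, punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))

def match(query, operator="or", minimum_should_match=None):
    """Return IDs of titles that contain enough of the query terms."""
    terms = tokens(query)
    if minimum_should_match is not None:
        required = minimum_should_match
    elif operator == "and":
        required = len(terms)
    else:
        required = 1
    return [
        doc_id
        for doc_id, title in titles.items()
        if len(terms & tokens(title)) >= required
    ]

def percent_to_count(percent, clause_count):
    """Turn a percentage into a clause count, rounding down."""
    return math.floor(clause_count * percent / 100)

print(match("java elasticsearch"))                  # [1, 2, 3, 4]
print(match("java elasticsearch", operator="and"))  # [1, 4]
needed = percent_to_count(75, 4)                    # 3
print(match("java elasticsearch spark hadoop", minimum_should_match=needed))  # [4]
```

Only post 4 contains three of the four keywords, so both the "75%" match query and the bool query with minimum_should_match set to 3 narrow the result to that post.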

6. Multi-field search with dis_max and the best fields strategy

0. Add a content field to the posts
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"content" : "i like to write best elasticsearch article"} }
{ "update": { "_id": "2"} }
{ "doc" : {"content" : "i think java is the best programming language"} }
{ "update": { "_id": "3"} }
{ "doc" : {"content" : "i am only an elasticsearch beginner"} }
{ "update": { "_id": "4"} }
{ "doc" : {"content" : "elasticsearch and hadoop are all very good solution, i am a beginner"} }
{ "update": { "_id": "5"} }
{ "doc" : {"content" : "spark is best big data solution based on scala ,an programming language similar to java"} }
1. Search for posts whose title or content contains java or solution
  1. The query below is a multi-field search
GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java solution" }},
        { "match": { "content": "java solution" }}
      ]
    }
  }
}
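  1. The best fields strategy ranks first the documents in which a single field matches as many keywords as possible, rather than the documents in which many fields each match only a few keywords.
  2. dis_max simply takes the score of the highest-scoring query among its sub-queries.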
GET /forum/article/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "java solution" }},
        { "match": { "content": "java solution" }}
      ]
    }
  }
}
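
To see why the two queries rank the posts differently, the Python sketch below scores the sample data both ways. It is only an illustration: the per-field score is a toy count of matched keywords (an assumption, real Elasticsearch relevance scores are computed differently), and `field_score`, `bool_should_score`, `dis_max_score`, and `rank` are hypothetical helpers.

```python
import re

docs = {
    1: {"title": "this is java and elasticsearch blog",
        "content": "i like to write best elasticsearch article"},
    2: {"title": "this is java blog",
        "content": "i think java is the best programming language"},
    3: {"title": "this is elasticsearch blog",
        "content": "i am only an elasticsearch beginner"},
    4: {"title": "this is java, elasticsearch, hadoop blog",
        "content": "elasticsearch and hadoop are all very good solution, i am a beginner"},
    5: {"title": "this is spark blog",
        "content": "spark is best big data solution based on scala ,an programming language similar to java"},
}

FIELDS = ("title", "content")

def field_score(text, query):
    """Toy relevance: number of distinct query terms found in the field."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    field_terms = set(re.findall(r"\w+", text.lower()))
    return len(query_terms & field_terms)

def bool_should_score(doc, query):
    """bool/should style: add up the scores of all fields."""
    return sum(field_score(doc[field], query) for field in FIELDS)

def dis_max_score(doc, query):
    """dis_max style: keep only the best single field's score."""
    return max(field_score(doc[field], query) for field in FIELDS)

def rank(score_fn, query):
    """IDs of matching posts, best score first (ties broken by ID)."""
    scored = {doc_id: score_fn(doc, query) for doc_id, doc in docs.items()}
    matching = [doc_id for doc_id, score in scored.items() if score > 0]
    return sorted(matching, key=lambda doc_id: (-scored[doc_id], doc_id))

print(rank(bool_should_score, "java solution"))  # [2, 4, 5, 1]
print(rank(dis_max_score, "java solution"))      # [5, 1, 2, 4]
```

Under the summed score, posts 2 and 4 tie with post 5 because each matches one keyword in each of two fields. Under the dis_max style score, post 5 comes first because its content field alone contains both java and solution, which is the behavior the best fields strategy asks for.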

Reposted from blog.csdn.net/weixin_41910694/article/details/109407919