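Learning Elasticsearch Search Syntax
Contents
- Sample data
- Using term and filter
- Combining multiple filter conditions with bool
- Searching multiple values with terms and refining multi-value results
- Range filtering with the range filter
- Manually controlling the precision of full-text search results
- Implementing the best fields strategy for multi-field search with dis_max
1. Using term and filter
0. Sample data (used in sections 1 and 2)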
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
1. Search for posts by user ID
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"userID" : 1
}
}
}
}
}
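- query: the search request body
- constant_score: wraps a filter and gives every matching document the same constant score (1 by default), because a filter does not compute relevance
- filter: filters documents without scoring them
- term filter/query: the search text is not analyzed; it is looked up in the inverted index exactly as typed. With analysis, "hello world" would be split into "hello" and "world" and each term looked up separately; with term, "hello world" is looked up as the single term "hello world"
- Because of this, a term query only matches a string field that is indexed as an exact value (not_analyzed, or keyword in newer versions); on an analyzed text field, a value such as XHDK-A-1293-#fJ3 is split into several lowercase tokens and the full ID does not match
- Equivalent to a single WHERE condition in SQL
- Searching on other fields works the same way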
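The difference between an analyzed lookup and a term lookup can be sketched in plain Python. This is only an illustration: `simple_analyze`, `build_index`, and `term_lookup` are hypothetical helpers, and the tokenizer (lowercase, split on non-alphanumeric characters) is a rough stand-in for a real analyzer, not Elasticsearch's actual implementation.

```python
import re

def simple_analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_index(docs, analyzed):
    """Map each indexed term to the set of document IDs that contain it."""
    index = {}
    for doc_id, value in docs.items():
        terms = simple_analyze(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def term_lookup(index, value):
    """term query: look the input up exactly as given, with no analysis."""
    return sorted(index.get(value, set()))

article_ids = {1: "XHDK-A-1293-#fJ3", 2: "KDKE-B-9947-#kL5"}
analyzed_index = build_index(article_ids, analyzed=True)   # like an analyzed text field
exact_index = build_index(article_ids, analyzed=False)     # like an exact-value field

print(term_lookup(analyzed_index, "XHDK-A-1293-#fJ3"))  # full ID is not a term here
print(term_lookup(exact_index, "XHDK-A-1293-#fJ3"))     # full ID is the only term here
```

Under these assumptions the full ID is found only in the exact-value index, while the analyzed index contains fragments such as "xhdk" instead.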
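2. Combining multiple filter conditions with bool
1. Search for posts whose post date is 2017-01-01 or whose article ID is XHDK-A-1293-#fJ3, while requiring that the post date is never 2017-01-02
- SQL equivalent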
select *
from forum.article
where (post_date='2017-01-01' or article_id='XHDK-A-1293-#fJ3')
and post_date!='2017-01-02'
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{ "term": { "postDate": "2017-01-01" }},
{ "term": { "articleID": "XHDK-A-1293-#fJ3" }}
],
"must_not": {
"term": {
"postDate": "2017-01-02"
}
}
}
}
}
}
}
2. Search for posts whose article ID is XHDK-A-1293-#fJ3, or whose article ID is JODL-X-1937-#pV7 and whose post date is 2017-01-01
select *
from forum.article
where article_id='XHDK-A-1293-#fJ3'
or (article_id='JODL-X-1937-#pV7' and post_date='2017-01-01')
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"term": {
"articleID": "XHDK-A-1293-#fJ3"
}
},
{
"bool": {
"must": [
{
"term":{
"articleID": "JODL-X-1937-#pV7"
}
},
{
"term": {
"postDate": "2017-01-01"
}
}
]
}
}
]
}
}
}
}
}
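- bool: combines multiple filter conditions
- must: every clause must match
- must_not: no clause may match
- should: matching any one of the clauses is enough
- bool clauses can be nested
- Equivalent to combining several conditions with AND, OR, and NOT in SQL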
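The two bool filters in this section can be traced by hand against the four sample posts. The sketch below does that in Python; `term`, `bool_filter`, and `search` are hypothetical helpers that imitate the semantics described in these notes (should is only required when there is no must clause), not the Elasticsearch client API.

```python
DOCS = {
    1: {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": False, "postDate": "2017-01-01"},
    2: {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": False, "postDate": "2017-01-02"},
    3: {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": False, "postDate": "2017-01-01"},
    4: {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": True, "postDate": "2017-01-02"},
}

def term(field, value):
    """Exact-value comparison on one field."""
    return lambda doc: doc.get(field) == value

def bool_filter(must=(), should=(), must_not=()):
    """Combine clauses the way the notes describe must / should / must_not."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # Assumption: should is required only when there is no must clause.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches

def search(predicate):
    """Return the IDs of all sample posts accepted by the predicate."""
    return sorted(doc_id for doc_id, doc in DOCS.items() if predicate(doc))

first = bool_filter(
    should=[term("postDate", "2017-01-01"), term("articleID", "XHDK-A-1293-#fJ3")],
    must_not=[term("postDate", "2017-01-02")],
)
second = bool_filter(
    should=[
        term("articleID", "XHDK-A-1293-#fJ3"),
        bool_filter(must=[term("articleID", "JODL-X-1937-#pV7"),
                          term("postDate", "2017-01-01")]),
    ],
)
print(search(first), search(second))
```

Nesting works here because `bool_filter` returns a predicate of the same shape as `term`, which mirrors how a bool clause can contain another bool.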
3. Searching multiple values with terms and refining multi-value results
0. Sample data
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag" : ["java", "hadoop"]} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag" : ["java"]} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag" : ["hadoop"]} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag" : ["java", "elasticsearch"]} }
- term: {"field": "value"}
- terms: {"field": ["value1", "value2"]}
- Equivalent to IN in SQL
select * from tbl where col in ('value1', 'value2')
1. Search for posts whose articleID is KDKE-B-9947-#kL5 or QQPX-R-3956-#aD8
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"terms": {
"articleID": [
"KDKE-B-9947-#kL5",
"QQPX-R-3956-#aD8"
]
}
}
}
}
}
2. Search for posts whose tag contains java
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"terms" : {
"tag" : ["java"]
}
}
}
}
}
- This returns every post whose tag values include java
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "2",
"_score": 1,
"_source": {
"articleID": "KDKE-B-9947-#kL5",
"userID": 1,
"hidden": false,
"postDate": "2017-01-02",
"tag": [
"java"
]
}
},
{
"_index": "forum",
"_type": "article",
"_id": "4",
"_score": 1,
"_source": {
"articleID": "QQPX-R-3956-#aD8",
"userID": 2,
"hidden": true,
"postDate": "2017-01-02",
"tag": [
"java",
"elasticsearch"
]
}
},
{
"_index": "forum",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"articleID": "XHDK-A-1293-#fJ3",
"userID": 1,
"hidden": false,
"postDate": "2017-01-01",
"tag": [
"java",
"hadoop"
]
}
}
]
}
}
3. Refine the results: return only posts whose tag contains java and nothing else
- Add a field that records the number of tags
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"tag_cnt" : 2} }
{ "update": { "_id": "2"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "3"} }
{ "doc" : {"tag_cnt" : 1} }
{ "update": { "_id": "4"} }
{ "doc" : {"tag_cnt" : 2} }
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"tag_cnt": 1
}
},
{
"terms": {
"tag": ["java"]
}
}
]
}
}
}
}
}
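- If the terms list were ["java", "hadoop", "elasticsearch"], the tag_cnt = 1 condition would restrict the results to posts whose only tag is one of those three values
- terms is used to search for multiple values
- Adding a count field such as tag_cnt refines the results of a terms multi-value search
- Equivalent to the IN clause in SQL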
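To see why the extra tag_cnt condition is needed, the terms behaviour can be replayed over the four sample posts in Python. This is a sketch only: `search_terms` is a hypothetical helper, and it derives the tag count with `len` instead of reading a stored tag_cnt field.

```python
TAGS = {
    1: ["java", "hadoop"],
    2: ["java"],
    3: ["hadoop"],
    4: ["java", "elasticsearch"],
}

def search_terms(wanted, tag_cnt=None):
    """Return IDs of posts with at least one wanted tag, optionally
    also requiring an exact number of tags (the tag_cnt condition)."""
    hits = []
    for doc_id, tags in TAGS.items():
        if not any(tag in wanted for tag in tags):
            continue  # terms: none of the wanted values is present
        if tag_cnt is not None and len(tags) != tag_cnt:
            continue  # term on tag_cnt: the post has other tags as well
        hits.append(doc_id)
    return sorted(hits)

print(search_terms(["java"]))             # posts that contain java among their tags
print(search_terms(["java"], tag_cnt=1))  # posts whose only tag is java
```

Without the count, terms behaves like "contains any of"; the count turns it into "contains exactly one tag, and that tag is in the list".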
4. Range filtering with the range filter
0. Add a view count field to the post data
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"view_cnt" : 30} }
{ "update": { "_id": "2"} }
{ "doc" : {"view_cnt" : 50} }
{ "update": { "_id": "3"} }
{ "doc" : {"view_cnt" : 100} }
{ "update": { "_id": "4"} }
{ "doc" : {"view_cnt" : 80} }
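The _bulk requests in these notes depend on newline-delimited JSON: each action line and each document line must be a complete JSON object on a single line with plain double quotes, and the body must end with a newline. A small Python sketch for generating such a body is shown below; `build_bulk_updates` is a hypothetical helper and only builds the text, it does not send anything to Elasticsearch.

```python
import json

def build_bulk_updates(field, values_by_id):
    """Return an NDJSON body of update actions, one JSON object per line."""
    lines = []
    for doc_id, value in values_by_id.items():
        lines.append(json.dumps({"update": {"_id": str(doc_id)}}))  # action line
        lines.append(json.dumps({"doc": {field: value}}))           # partial document
    # The bulk body has to end with a newline character.
    return "\n".join(lines) + "\n"

body = build_bulk_updates("view_cnt", {1: 30, 2: 50, 3: 100, 4: 80})
print(body, end="")
```

Generating the lines with `json.dumps` avoids two problems that are easy to introduce by hand: objects broken across several lines and typographic quotes.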
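1. Search for posts with a view count between 30 and 60 (gt and lt exclude both bounds, so a post with exactly 30 views is not returned)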
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"range": {
"view_cnt": {
"gt": 30,
"lt": 60
}
}
}
}
}
}
- range: range search
- gt: greater than
- gte: greater than or equal to
- lt: less than
- lte: less than or equal to
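The four operators differ only in whether a bound is included. A minimal Python sketch over the sample view counts makes the difference visible; `in_range` and `search_range` are hypothetical helpers, not part of any Elasticsearch client.

```python
VIEW_CNT = {1: 30, 2: 50, 3: 100, 4: 80}

def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Check a value against any combination of the four range operators."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

def search_range(**bounds):
    """Return IDs of sample posts whose view_cnt satisfies the bounds."""
    return sorted(doc_id for doc_id, cnt in VIEW_CNT.items() if in_range(cnt, **bounds))

print(search_range(gt=30, lt=60))    # exclusive bounds, as in the query above
print(search_range(gte=30, lte=60))  # inclusive bounds
```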
2. Search for posts published within the last month
- Add data
POST /forum/article/_bulk
{ "index": { "_id": 5 }}
{ "articleID" : "DHJK-B-1395-#Ky5", "userID" : 3, "hidden": false, "postDate": "2017-03-01", "tag": ["elasticsearch"], "tag_cnt": 1, "view_cnt": 10 }
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"range": {
"postDate": {
"gt": "2017-03-10||-30d"
}
}
}
}
}
}
GET /forum/article/_search
{
"query": {
"constant_score": {
"filter": {
"range": {
"postDate": {
"gt": "now-30d"
}
}
}
}
}
}
- range performs range filtering, comparable to BETWEEN or to the >, >=, <, <= comparisons in SQL
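The date expression "2017-03-10||-30d" means "the anchor date 2017-03-10 minus 30 days". The Python sketch below resolves that one expression by hand and applies it to the sample post dates; `minus_days` and `posted_after` are hypothetical helpers and cover only this single form, not Elasticsearch date math in general.

```python
from datetime import date, timedelta

POST_DATES = {
    1: "2017-01-01",
    2: "2017-01-02",
    3: "2017-01-01",
    4: "2017-01-02",
    5: "2017-03-01",
}

def minus_days(anchor, days):
    """Resolve 'anchor||-<days>d' for an ISO date anchor."""
    return date.fromisoformat(anchor) - timedelta(days=days)

def posted_after(lower_bound):
    """gt semantics: keep posts dated strictly after the lower bound."""
    return sorted(
        doc_id for doc_id, d in POST_DATES.items()
        if date.fromisoformat(d) > lower_bound
    )

lower = minus_days("2017-03-10", 30)
print(lower, posted_after(lower))
```

The "now-30d" form works the same way with the current time as the anchor, so its result depends on when the query runs.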
5. Manually controlling the precision of full-text search results
0. Add a title field to the post data
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"title" : "this is java and elasticsearch blog"} }
{ "update": { "_id": "2"} }
{ "doc" : {"title" : "this is java blog"} }
{ "update": { "_id": "3"} }
{ "doc" : {"title" : "this is elasticsearch blog"} }
{ "update": { "_id": "4"} }
{ "doc" : {"title" : "this is java, elasticsearch, hadoop blog"} }
{ "update": { "_id": "5"} }
{ "doc" : {"title" : "this is spark blog"} }
1. Search for blogs whose title contains java or elasticsearch
- Unlike the term query used earlier, this does not search for an exact value; it performs a full-text search.
- The match query is responsible for full-text search. If the field being searched is not_analyzed, however, a match query behaves like a term query.
GET /forum/article/_search
{
"query": {
"match": {
"title": "java elasticsearch"
}
}
}
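2. Search for blogs whose title contains both java and elasticsearch
- The first step in controlling result precision is the and operator: if every search keyword must match, set operator to and, which a plain match query (whose default operator is or) cannot achieve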
GET /forum/article/_search
{
"query": {
"match": {
"title": {
"query": "java elasticsearch",
"operator": "and"
}
}
}
}
3. Search for blogs that contain at least 3 of the 4 keywords java, elasticsearch, spark, and hadoop
GET /forum/article/_search
{
"query": {
"match": {
"title": {
"query": "java elasticsearch spark hadoop",
"minimum_should_match": "75%"
}
}
}
}
4. Combine multiple search conditions with bool to search title
GET /forum/article/_search
{
"query": {
"bool": {
"must": {
"match": {
"title": "java" }},
"must_not": {
"match": {
"title": "spark" }},
"should": [
{ "match": { "title": "hadoop" }},
{ "match": { "title": "elasticsearch" }}
]
}
}
}
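5. Search for java, hadoop, spark, and elasticsearch, requiring at least 3 of the keywords
- By default, should clauses do not have to match at all; in the search above, "this is java blog" matches neither should clause but is still returned
- There is one exception: when there is no must (or filter) clause, at least one should clause has to match. In the search below, should has 4 clauses, and by default a document is returned as soon as it satisfies any one of them
- You can control precisely how many of the 4 should clauses must match before a document is returned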
GET /forum/article/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "java" }},
{ "match": { "title": "elasticsearch" }},
{ "match": { "title": "hadoop" }},
{ "match": { "title": "spark" }}
],
"minimum_should_match": 3
}
}
}
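- For full-text search over multiple values there are two approaches: a match query, or bool with should clauses
- To control result precision, use the and operator and minimum_should_match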
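The effect of operator and minimum_should_match on the five sample titles can be worked out with a short Python sketch. It is a simplification: `tokens`, `required_matches`, and `match_titles` are hypothetical helpers, tokenization is a plain lowercase word split, and scoring is ignored entirely; the percentage is rounded down to a whole number of terms.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

def tokens(text):
    """Lowercase word split; a rough stand-in for analysis."""
    return re.findall(r"[a-z0-9]+", text.lower())

def required_matches(total_terms, minimum_should_match):
    """Turn '75%' or 3 into a number of terms that must match."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        return (total_terms * percent) // 100  # rounded down
    return int(minimum_should_match)

def match_titles(query, operator="or", minimum_should_match=None):
    """Return IDs of titles that contain enough of the query terms."""
    terms = tokens(query)
    if operator == "and":
        needed = len(terms)
    elif minimum_should_match is not None:
        needed = required_matches(len(terms), minimum_should_match)
    else:
        needed = 1
    hits = []
    for doc_id, title in TITLES.items():
        title_terms = set(tokens(title))
        matched = sum(1 for term in terms if term in title_terms)
        if matched >= needed:
            hits.append(doc_id)
    return sorted(hits)

print(match_titles("java elasticsearch"))
print(match_titles("java elasticsearch", operator="and"))
print(match_titles("java elasticsearch spark hadoop", minimum_should_match="75%"))
```

In this model the bool query with four should clauses and "minimum_should_match": 3 selects the same posts as the match query with "75%", since both require 3 of the 4 keywords.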
6. Implementing the best fields strategy for multi-field search with dis_max
0. Add a content field to the post data
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"content" : "i like to write best elasticsearch article"} }
{ "update": { "_id": "2"} }
{ "doc" : {"content" : "i think java is the best programming language"} }
{ "update": { "_id": "3"} }
{ "doc" : {"content" : "i am only an elasticsearch beginner"} }
{ "update": { "_id": "4"} }
{ "doc" : {"content" : "elasticsearch and hadoop are all very good solution, i am a beginner"} }
{ "update": { "_id": "5"} }
{ "doc" : {"content" : "spark is best big data solution based on scala ,an programming language similar to java"} }
1. Search for posts whose title or content contains java or solution
- The query below is a multi-field search
GET /forum/article/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "java solution" }},
{ "match": { "content": "java solution" }}
]
}
}
}
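- The best fields strategy ranks first the documents in which a single field matches as many of the keywords as possible, rather than documents in which many fields each match only a few keywords
- dis_max takes the score of the highest-scoring query among its queries as the document's score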
GET /forum/article/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "java solution" }},
{ "match": { "content": "java solution" }}
]
}
}
}
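The ranking difference between the bool/should query and dis_max can be illustrated with a few lines of Python. Everything here is an assumption made for illustration: the per-field scores are invented numbers rather than real Elasticsearch scores, bool/should is modelled as a plain sum of the matching clauses, and the optional tie_breaker parameter of dis_max (not covered in these notes) is included only to show how the non-best fields can still contribute.

```python
def bool_should_score(clause_scores):
    """Simplified bool/should: add up the scores of the matching clauses."""
    return sum(score for score in clause_scores if score > 0)

def dis_max_score(clause_scores, tie_breaker=0.0):
    """dis_max: best clause score, plus tie_breaker times the other scores."""
    matched = [score for score in clause_scores if score > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)

# Hypothetical (title, content) scores for the query "java solution".
doc_a = (0.6, 0.7)  # one keyword matched in each field
doc_b = (0.0, 1.1)  # both keywords matched in the content field only

for name, scores in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, bool_should_score(scores), dis_max_score(scores))
```

With these made-up numbers the sum ranks doc_a first, while dis_max ranks doc_b first, which is the best fields behaviour described above.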