Elasticsearch Query and Aggregation Basic Syntax

1. Overview

        Elasticsearch supports two main styles of search: URI search and request-body search. URI search passes the query in the URL and is lighter and quicker to write. Request-body search sends the query as JSON and supports many more options. This article focuses on structured request-body search and covers query, filter, and aggregations. The Elasticsearch version used here is 6.5.4, with the ik Chinese analyzer plugin. For installation and usage, see:

Elasticsearch install and use

Using the ik analyzer in Elasticsearch

Create the following index in ES and import the data:

PUT /news
{
        "aliases": {
            "test.chixiao.news": {}
        },
        "mappings":{
            "news": {
                "dynamic": "false",
                "properties": {
                    "id": {
                        "type": "integer"
                    },
                    "title": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "summary": {
                        "analyzer": "ik_max_word",
                        "type": "text"
                    },
                    "author": {
                        "type": "keyword"
                    },
                    "publishTime": {
                        "type": "date"
                    },
                    "modifiedTime": {
                        "type": "date"
                    },
                    "createTime": {
                        "type": "date"
                    },
                    "docId": {
                        "type": "keyword"
                    },
                    "source": {
                        "type": "keyword"
                    },
                    "voteCount": {
                        "type": "integer"
                    },
                    "replyCount": {
                        "type": "integer"
                    }
                }
            }
        },
        "settings":{
            "index": {
                "refresh_interval": "1s",
                "number_of_shards": 3,
                "max_result_window": "10000000",
                "mapper": {
                    "dynamic": "false"
                },
                "number_of_replicas": 1
            },
            "analysis": {
                "normalizer": {
                    "lowercase": {
                        "type": "custom",
                        "char_filter": [],
                        "filter": [
                            "lowercase",
                            "asciifolding"
                        ]
                    }
                },
                "analyzer": {
                    "1gram": {
                        "type": "custom",
                        "tokenizer": "ngram_tokenizer"
                    }
                },
                "tokenizer": {
                    "ngram_tokenizer": {
                        "type": "nGram",
                        "min_gram": "1",
                        "max_gram": "1",
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
}
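The settings above define a custom 1gram analyzer, but no field in this mapping uses it (title and summary use ik_max_word). Its effect is still easy to picture. The Python sketch below is a rough approximation, not Elasticsearch code. An ngram tokenizer with min_gram = max_gram = 1 and token_chars of letter and digit splits text at every other character and emits each remaining character as its own token. Using Python's str.isalpha and str.isdigit for the letter and digit classes is an assumption.

```python
def one_gram_tokens(text):
    """Rough stand-in for the "1gram" analyzer (ngram tokenizer, min_gram=1,
    max_gram=1, token_chars=[letter, digit]).

    Characters outside the letter/digit classes act as separators and are
    dropped; every remaining character becomes a one-character token.
    Case is left unchanged because the analyzer defines no lowercase filter.
    """
    return [ch for ch in text if ch.isalpha() or ch.isdigit()]


print(one_gram_tokens("体育新闻"))  # ['体', '育', '新', '闻']
print(one_gram_tokens("ES 6.5!"))   # ['E', 'S', '6', '5']
```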

2. Query

2.1 An example query

        A simple query example follows. Searches come in two kinds, query and filter, and both go inside the query structure. The other parameters work as follows: sort sets the sort order, size and from handle pagination, and _source lists which fields of each matching document are returned.

GET /news/_search
{
  "query": {"match_all": {}}, 
  "sort": [
    {
      "publishTime": {
        "order": "desc"
      }
    }
  ],
  "size": 2,
  "from": 0,
  "_source": ["title", "id", "summary"]
}

Response:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "228",
        "_score" : null,
        "_source" : {
          "summary" : "据陕西高院消息,6月11日上午,西安市中级人民法院二审公开开庭宣判了陕西省首例“套路贷”涉黑案件——韩某某等人非法放贷一案,法院驳回上诉,维持原判。西安市中级人",
          "id" : 228,
          "title" : "陕西首例套路贷涉黑案宣判:团伙对借款人喷辣椒水"
        },
        "sort" : [
          1560245097000
        ]
      },
      {
        "_index" : "news",
        "_type" : "news",
        "_id" : "214",
        "_score" : null,
        "_source" : {
          "summary" : "网易娱乐6月11日报道6月11日,有八卦媒体曝光曹云金与妻子唐菀现身天津民政局办理了离婚手续。对此,网易娱乐向曹云金经纪人求证,得到了对方独家回应:“确实是离婚",
          "id" : 214,
          "title" : "曹云金承认已离婚:和平离婚 有人恶意中伤心思歹毒"
        },
        "sort" : [
          1560244657000
        ]
      }
    ]
  }
}

In the response, took is the execution time in milliseconds. _shards describes the shards: this index has three shards, and all three responded successfully. hits holds the matches:
  • total is the total number of matching documents
  • max_score is the highest relevance score; it is null here because the results are sorted by publishTime rather than by score
  • the inner hits array contains the matching documents themselves
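To make sort, from, size, and _source concrete, here is a small in-memory Python sketch of what they do to a list of hits. It illustrates the effect over made-up documents and does not show how Elasticsearch executes a search. Two publishTime values come from the response above; the third document is invented. page_offset is a hypothetical helper that does the usual from = (page - 1) * size arithmetic.

```python
def search_page(docs, sort_field, descending, from_, size, source_fields):
    """Sort, skip `from_` hits, keep `size` hits, and project `source_fields`."""
    # "sort": order all matching documents by one field.
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=descending)
    # "from" / "size": skip the first from_ hits and keep at most size of the rest.
    page = ordered[from_:from_ + size]
    # "_source": return only the requested fields of each hit.
    return [{k: d[k] for k in source_fields if k in d} for d in page]


def page_offset(page_number, size):
    """Hypothetical helper: convert a 1-based page number to a "from" value."""
    return (page_number - 1) * size


docs = [
    {"id": 214, "title": "t214", "summary": "s214", "publishTime": 1560244657000},
    {"id": 228, "title": "t228", "summary": "s228", "publishTime": 1560245097000},
    {"id": 1, "title": "t1", "summary": "s1", "publishTime": 1500000000000},
]
first = search_page(docs, "publishTime", True, page_offset(1, 2), 2, ["title", "id"])
print([d["id"] for d in first])  # [228, 214]
```

Elasticsearch also requires from + size to stay within the index.max_result_window setting, which defaults to 10000. The index settings in section 1 raise it to 10000000.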

Searches fall into two types: exact filtering (filter) and full-text search (query). Exact filters are easy to cache, so they execute very quickly.

2.2 Filter queries

  • term

term finds documents whose field exactly matches a value. FIELD is the name of the field to search and VALUE is the value to look for:

{
  "term": {
    "FIELD": {
      "value": "VALUE"
    }
  }
}

For example, to find news whose source is 中新经纬 (Zhongxin Jingwei), use:

GET /news/_search
{
  "query": {"term": {
    "source": {
      "value": "中新经纬"
    }
  }}
}
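Here is a minimal Python sketch of exact matching. It assumes source is mapped as a keyword field. A keyword value is indexed as one unanalyzed token, so a term query must equal the whole stored value, or one element of an array. term does not analyze its input either, which is why it suits keyword, numeric, and date fields better than analyzed text fields. The documents are made up, and field_values and term_query are hypothetical helper names.

```python
def field_values(doc, field):
    """Return a field's values as a list; a missing or null field has none."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def term_query(docs, field, value):
    """Exact match: the whole stored keyword value (or one array element)
    must equal `value`; no analysis happens on either side."""
    return [d for d in docs if value in field_values(d, field)]


docs = [
    {"id": 75, "source": "中新经纬"},
    {"id": 80, "source": "中国新闻网"},
    {"id": 90, "source": ["新浪", "中新经纬"]},
]
print([d["id"] for d in term_query(docs, "source", "中新经纬")])  # [75, 90]
print(term_query(docs, "source", "中新"))  # [] : a partial value does not match
```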

  • bool

When several queries need to be combined with logical operators, group them with bool. A bool query can contain:

{
   "bool" : {
      "must" :     [],
      "should" :   [],
      "must_not" : []
   }
}

  • must: the clause must match, similar to AND in SQL
  • must_not: the clause must not match, similar to NOT in SQL
  • should: at least one clause should match, similar to OR in SQL

For example, to find news whose source is 中新经纬 and whose id is 4 or 75, use the query below. minimum_should_match sets how many should clauses must match. Without it, should clauses are optional whenever the bool query also has must or filter clauses: the minimum is 0, and the should clauses only affect the score, not which documents are returned.

GET /news/_search
{
  "query": {
    "bool": {
    "must": [
    {"term": {
      "source": {
        "value": "中新经纬"
      }
    }}
  ],
  "should": [
    {"term": {
      "id": {
        "value": "4"
      }
    }},
    {"term": {
      "id": {
        "value": "75"
      }
    }}
  ],
  "minimum_should_match": 1
  }}
}
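The matching rules of bool can be sketched in Python, with each clause written as a plain predicate. This is a simplified model over made-up documents, not Elasticsearch's implementation, and it ignores scoring. The default for minimum_should_match follows the query-context rule described above: 0 when must or filter clauses are present, otherwise 1 if there are any should clauses. term and bool_match are hypothetical helper names.

```python
def term(field, value):
    """Build a predicate that behaves like an exact term clause."""
    return lambda doc: doc.get(field) == value


def bool_match(doc, must=(), filter_=(), must_not=(), should=(),
               minimum_should_match=None):
    """Decide whether `doc` matches a bool query made of predicate clauses."""
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if minimum_should_match is None:
        # Query-context default: should is optional when must/filter exist,
        # otherwise at least one should clause (if any) has to match.
        minimum_should_match = 0 if (must or filter_) else (1 if should else 0)
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


docs = [
    {"id": 4, "source": "中新经纬"},
    {"id": 5, "source": "中新经纬"},
    {"id": 75, "source": "中国新闻网"},
]
must = [term("source", "中新经纬")]
should = [term("id", 4), term("id", 75)]
print([d["id"] for d in docs
       if bool_match(d, must=must, should=should, minimum_should_match=1)])  # [4]
print([d["id"] for d in docs if bool_match(d, must=must, should=should)])  # [4, 5]
```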

  • terms

To match any of several exact values, as in the case above, use terms. For example, to find the articles whose id is 4 or 75:

GET /news/_search
{
  "query": {"terms": {
    "id": [
      "4",
      "75"
    ]
  }}
}
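A terms query returns the same documents as a bool query with one should/term clause per value. The Python sketch below checks this on made-up data. It uses integers for id. The request above passes "4" and "75" as strings, and Elasticsearch coerces those to numbers for an integer field by default. as_list and terms_query are hypothetical helper names.

```python
def as_list(value):
    """A missing/null field has no values; a scalar is a one-element list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def terms_query(docs, field, values):
    """Match documents where any value of `field` equals any of `values`,
    i.e. an OR over one exact term check per value."""
    wanted = set(values)
    return [d for d in docs if any(v in wanted for v in as_list(d.get(field)))]


docs = [{"id": i} for i in range(1, 101)]
by_terms = terms_query(docs, "id", [4, 75])
by_should = [d for d in docs if d["id"] == 4 or d["id"] == 75]
print([d["id"] for d in by_terms])  # [4, 75]
print(by_terms == by_should)        # True
```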

  • range

For range conditions, use range, which goes in the same position as term. For example, to find articles with id from 1 to 10, use these operators:

  • gt: > (greater than)
  • lt: < (less than)
  • gte: >= (greater than or equal to)
  • lte: <= (less than or equal to)

GET /news/_search
{
  "query": {"range": {
    "id": {
      "gte": 1,
      "lte": 10
    }
  }}
}
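Here is a minimal Python sketch of the four range operators over made-up documents, using the operator module for the comparisons. Documents without the field never match. When several bounds are given, such as gte together with lte, a document must satisfy all of them. range_query is a hypothetical helper name.

```python
import operator

# The range operators used in a range query body, mapped to Python comparisons.
RANGE_OPS = {"gt": operator.gt, "gte": operator.ge,
             "lt": operator.lt, "lte": operator.le}


def range_query(docs, field, **bounds):
    """Keep documents whose field satisfies every given bound, e.g. gte=1, lte=10."""
    for op in bounds:
        if op not in RANGE_OPS:
            raise ValueError(f"unsupported range operator: {op}")

    def matches(doc):
        value = doc.get(field)
        if value is None:  # a missing field never matches a range
            return False
        return all(RANGE_OPS[op](value, bound) for op, bound in bounds.items())

    return [d for d in docs if matches(d)]


docs = [{"id": i} for i in range(0, 13)]
print([d["id"] for d in range_query(docs, "id", gte=1, lte=10)])  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print([d["id"] for d in range_query(docs, "id", gt=1, lt=10)])    # [2, 3, 4, 5, 6, 7, 8, 9]
```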

  • exists

The exists query finds documents in which a field has a value. For example, the query below finds documents that have an author field. You can also put exists inside must_not or should in a bool query. That expresses "the field does not exist" or "the field may exist".

GET /news/_search
{
  "query": {
    "exists": {"field": "author"}
  }
}
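The Python sketch below approximates which values the exists query treats as missing, following its documented behavior. Missing means null, an empty array, or an array containing only nulls. Any real value counts as present, including an empty string. Negating the check plays the role of must_not plus exists. The documents and the exists helper are hypothetical.

```python
def exists(doc, field):
    """Approximate the exists query: null, [] and [null] count as missing,
    while any real value (even an empty string) counts as present."""
    value = doc.get(field)
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


docs = [
    {"id": 1, "author": "Alice"},
    {"id": 2, "author": None},
    {"id": 3},
    {"id": 4, "author": []},
    {"id": 5, "author": [None]},
    {"id": 6, "author": ""},
]
print([d["id"] for d in docs if exists(d, "author")])      # [1, 6]
# Equivalent of bool.must_not + exists: documents with no author value.
print([d["id"] for d in docs if not exists(d, "author")])  # [2, 3, 4, 5]
```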

2.3 Full-text queries

        Unlike the exact matching of a filter, a query performs full-text search and scores the matching documents. In ES only fields of type text are analyzed, that is, split into words. A keyword field is also a string, but it is treated as a single enumerated value and is not analyzed. The analyzer for a text field can be specified when the index is created.

  • match

When you want to search within a field, use match. For example, to find news whose summary mentions 体育 (sports):

GET /news/_search
{
  "query": {
    "match": {"summary": "体育"}
  }
}

In a match query you can also specify the analyzer. ik_smart splits the input into tokens as coarse as possible, so a search for 进口红酒 ("imported red wine") recalls documents about imported red wine. ik_max_word splits the text into the finest-grained words, which also yields 口红 (lipstick) and 红酒 (red wine). As a result, documents mentioning lipstick or red wine are recalled as well:

{
  "match": {
    "name": {
      "query": "进口红酒",
      "analyzer": "ik_smart"
    }
  }
}

The query text may be analyzed into several words. The operator parameter controls how they are combined:
  • and: a document is recalled only if it contains all of the words
  • or (the default): a document is recalled if it contains any of the words, and minimum_should_match sets the minimum number of words that must match

For example, the following query searches for 体育新闻 ("sports news"). Any document whose summary contains either 体育 (sports) or 新闻 (news) is recalled:

GET /news/_search
{
  "query": {
    "match": {
      "summary": {
        "query": "体育新闻",
        "operator": "or",
        "minimum_should_match": 1
      }
    }
  }
}
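Below is a Python sketch of how operator and minimum_should_match decide whether a match query recalls a document. Tokenization by ik is not modeled. The token lists are hypothetical, including the assumption that 体育新闻 is analyzed into 体育 and 新闻. Only integer minimum_should_match values are handled, although Elasticsearch also accepts percentages.

```python
def match(doc_tokens, query_tokens, operator="or", minimum_should_match=1):
    """Decide whether an analyzed field matches an analyzed query.

    Both arguments are token lists assumed to be produced by an analyzer
    beforehand; only the combination logic is modeled here.
    """
    doc_set = set(doc_tokens)
    unique_query = set(query_tokens)
    hits = sum(1 for token in unique_query if token in doc_set)
    if operator == "and":
        return hits == len(unique_query)
    return hits >= minimum_should_match


query = ["体育", "新闻"]  # assumed analyzer output for "体育新闻"
docs = {
    "a": ["体育", "比赛"],
    "b": ["体育", "新闻", "报道"],
    "c": ["财经", "市场"],
}
print(sorted(k for k, t in docs.items() if match(t, query)))                          # ['a', 'b']
print(sorted(k for k, t in docs.items() if match(t, query, operator="and")))          # ['b']
print(sorted(k for k, t in docs.items() if match(t, query, minimum_should_match=2)))  # ['b']
```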

  • multi_match

When you need to search more than one field, use multi_match. For example, to find documents whose title or summary contains 新闻 (news):

GET /news/_search
{
  "query": {
    "multi_match": {
      "query": "新闻",
      "fields": ["title", "summary"]
    }
  }
}
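By default, multi_match uses the best_fields type. It runs the match on each listed field and keeps the best single-field score. With the default tie_breaker of 0, the other fields add nothing. The Python sketch below shows that rule using a hypothetical per-field score, the number of distinct query tokens found, in place of the real BM25 relevance score. The token lists and helper names are made up.

```python
def field_score(field_tokens, query_tokens):
    """Hypothetical per-field score: how many distinct query tokens appear.
    Elasticsearch actually uses BM25; this is only a stand-in."""
    present = set(field_tokens)
    return sum(1 for t in set(query_tokens) if t in present)


def multi_match_best_fields(doc, fields, query_tokens):
    """best_fields: score each listed field and keep the maximum.
    A result of 0 means the document does not match at all."""
    return max(field_score(doc.get(f, []), query_tokens) for f in fields)


query = ["体育", "新闻"]
docs = [
    {"id": 1, "title": ["体育", "新闻"], "summary": ["新闻"]},
    {"id": 2, "title": ["财经"], "summary": ["体育"]},
    {"id": 3, "title": ["财经"], "summary": ["市场"]},
]
scores = {d["id"]: multi_match_best_fields(d, ["title", "summary"], query) for d in docs}
print(scores)  # {1: 2, 2: 1, 3: 0}
```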

2.4 Combined queries

        With both full-text search and exact filtering available, bool can combine them into complex queries:

GET /news/_search
{
  "query": {"bool": {
    "must": [
      {"match": {
        "summary": {
          "boost": 1,
          "query": "长安"
        }
      }
      },
      {
        "term": {
          "source": {
            "value": "中新经纬",
            "boost": 2
          }
        }
      }
    ],
    "filter": {"bool": {
      "must":[
        {"term":{
          "id":75
        }}
        ]
    }}
  }}
}

In a bool query, must, must_not, and should can hold term, range, match, and other queries. Clauses in must and should contribute to the relevance score, and boost adjusts each clause's weight in that score. must_not clauses only exclude documents and are not scored. If a condition should not affect scoring, put it in the bool filter clause. Filter clauses are not scored, and their results can be cached.
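The difference between scored clauses and filter clauses can be shown with a toy scoring model in Python. Each matching must clause adds boost × 1.0, where 1.0 is a hypothetical stand-in for the real relevance score. filter and must_not only decide whether a document matches. The predicates loosely mirror the combined query above: a match on 长安 with boost 1, a term on 中新经纬 with boost 2, and a filter on id 75. The document is made up.

```python
def score_bool(doc, must=(), filter_=(), must_not=()):
    """Return a score for `doc`, or None when it does not match.

    `must` holds (predicate, boost) pairs; `filter_` and `must_not` hold plain
    predicates. Each must clause adds boost * 1.0, where 1.0 is a hypothetical
    per-clause score standing in for Elasticsearch's real relevance score.
    """
    if not all(pred(doc) for pred, _ in must):
        return None
    if not all(pred(doc) for pred in filter_):
        return None
    if any(pred(doc) for pred in must_not):
        return None
    # filter and must_not only decide whether the doc matches; they add nothing.
    return sum(boost * 1.0 for _, boost in must)


must = [
    (lambda d: "长安" in d["summary_tokens"], 1),  # match on summary, boost 1
    (lambda d: d["source"] == "中新经纬", 2),       # term on source, boost 2
]
id_is_75 = [lambda d: d["id"] == 75]               # filter: id == 75

doc = {"id": 75, "source": "中新经纬", "summary_tokens": ["长安", "汽车"]}
print(score_bool(doc, must=must))                    # 3.0
print(score_bool(doc, must=must, filter_=id_is_75))  # 3.0: the filter changes nothing
print(score_bool({**doc, "id": 76}, must=must, filter_=id_is_75))  # None
```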

3. Aggregations

The basic format of an aggregation is:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "AGG_TYPE": {}
    }
  }
}

Here NAME is the name of the aggregation and can be any valid string, and AGG_TYPE is the aggregation type. Common aggregations are either single-value or multi-value aggregations.

3.1 An aggregation example

GET /news/_search
{
 "size": 0, 
  "aggs": {
    "sum_all": {
      "sum": {
        "field": "replyCount"
      }
    }
  }
}

The example above computes the sum of replyCount over all documents in the index. Response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_all" : {
      "value" : 390011.0
    }
  }
}

By default the response also contains the matching documents in hits, so size is set to 0 to omit them. The result appears under sum_all, the name given in the request.

Elasticsearch aggregations fall mainly into two types: Metrics and Bucket.

3.2 Metrics

Metrics aggregations mainly compute values from a field, such as avg, max, min, sum, and stats.

  • max

For example, to find the largest replyCount in the index:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "max_replay": {
      "max": {
        "field": "replyCount"
      }
    }
  }
}

  • stats

To get the common statistics of a field (count, min, max, average, and sum), use stats. For example, to get an overview of the reply counts of the news documents:

GET /news/_search
{
 "size": 0, 
  "aggs": {
    "cate": {
      "stats": {
        "field": "replyCount"
      }
    }
  }
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 204,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "cate" : {
      "count" : 202,
      "min" : 0.0,
      "max" : 32534.0,
      "avg" : 1930.7475247524753,
      "sum" : 390011.0
    }
  }
}
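In the response, avg is simply sum / count (390011.0 / 202). count is 202 while hits.total is 204. stats counts field values rather than documents, so this presumably means two documents have no replyCount. The Python sketch below computes the same five numbers over made-up values and skips missing ones (None). The handling of an empty input is this sketch's own choice.

```python
def stats(values):
    """Compute count/min/max/avg/sum like the stats aggregation, skipping
    documents that have no value for the field (represented as None)."""
    present = [v for v in values if v is not None]
    count = len(present)
    if count == 0:
        # This sketch's choice for empty input; not taken from an ES response.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(present))
    return {
        "count": count,
        "min": float(min(present)),
        "max": float(max(present)),
        "avg": total / count,
        "sum": total,
    }


reply_counts = [0, 5, None, 10]  # made-up replyCount values; one doc lacks the field
print(stats(reply_counts))  # {'count': 3, 'min': 0.0, 'max': 10.0, 'avg': 5.0, 'sum': 15.0}
```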

3.3 Bucket

Bucket aggregations are similar to GROUP BY in SQL: they split documents into buckets based on their field values.

  • terms

Bucketing with terms shows how the data is distributed. For example, you can see how many sources the index contains and how many articles each source has. size sets the maximum number of buckets returned:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      }
    }
  }
}
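Here is a single-shard Python sketch of a terms aggregation over made-up sources. It counts documents per exact value, orders buckets by doc_count descending, keeps size buckets, and reports the remaining documents as sum_other_doc_count. Breaking ties by ascending key is an assumption, though it matches the order of the equal-count buckets in the response in section 3.4. On a multi-shard index the real counts can be approximate, and Elasticsearch reports that error through doc_count_error_upper_bound.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Bucket documents by the exact value of `field` (single-shard model).
    Order: doc_count descending, ties by ascending key (assumed)."""
    counts = Counter(d[field] for d in docs if d.get(field) is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]],
        # Documents that fell into buckets beyond `size`.
        "sum_other_doc_count": sum(c for _, c in ordered[size:]),
    }


docs = [{"source": s} for s in ["A", "A", "A", "B", "B", "C", "C", "D"]]
print(terms_agg(docs, "source", size=2))
# {'buckets': [{'key': 'A', 'doc_count': 3}, {'key': 'B', 'doc_count': 2}], 'sum_other_doc_count': 3}
```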

3.4 Nested aggregations

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      },
      "aggs": {
        "replay": {
          "terms": {
            "field": "replyCount",
            "size": 10
          }
        },
        "avg_price": {
          "avg": {
            "field": "voteCount"
          }
        }
      }
    }
  }
}

The request above first buckets the documents by source. Within each source bucket, it then buckets by replyCount and computes the average voteCount.

One of the returned buckets:

{
          "key" : "中国新闻网",
          "doc_count" : 16,
          "avg_price" : {
            "value" : 1195.0
          },
          "replay" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 4,
            "buckets" : [
              {
                "key" : 0,
                "doc_count" : 3
              },
              {
                "key" : 1,
                "doc_count" : 1
              },
              {
                "key" : 5,
                "doc_count" : 1
              },
              {
                "key" : 32,
                "doc_count" : 1
              },
              {
                "key" : 97,
                "doc_count" : 1
              },
              {
                "key" : 106,
                "doc_count" : 1
              },
              {
                "key" : 133,
                "doc_count" : 1
              },
              {
                "key" : 155,
                "doc_count" : 1
              },
              {
                "key" : 156,
                "doc_count" : 1
              },
              {
                "key" : 248,
                "doc_count" : 1
              }
            ]
          }
}
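The sum_other_doc_count of 4 in this bucket follows from the numbers shown. The source has 16 documents, and the 10 returned replyCount buckets cover 3 + 9 × 1 = 12 of them. The Python sketch below reproduces the request's structure over hypothetical documents: an outer terms on source, an inner terms on replyCount, and an avg on voteCount. The helper names are invented, and the output keys reuse the aggregation names from the request.

```python
from collections import Counter, defaultdict


def top_terms(values, size):
    """Return (top `size` (key, count) pairs, count of all remaining values)."""
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:size], sum(c for _, c in ordered[size:])


def source_breakdown(docs, outer_size=100, inner_size=10):
    """Group by source, then inside each group bucket replyCount and average
    voteCount, mirroring the myterms / replay / avg_price request above."""
    groups = defaultdict(list)
    for doc in docs:
        if doc.get("source") is not None:
            groups[doc["source"]].append(doc)

    top_sources, _ = top_terms([d.get("source") for d in docs], outer_size)
    result = []
    for source, doc_count in top_sources:
        members = groups[source]
        inner, other = top_terms([m.get("replyCount") for m in members], inner_size)
        votes = [m["voteCount"] for m in members if m.get("voteCount") is not None]
        result.append({
            "key": source,
            "doc_count": doc_count,
            "avg_price": {"value": sum(votes) / len(votes) if votes else None},
            "replay": {
                "sum_other_doc_count": other,
                "buckets": [{"key": k, "doc_count": c} for k, c in inner],
            },
        })
    return result


# 16 hypothetical docs from one source: replyCount 0 three times, then 1..13.
docs = [{"source": "中国新闻网", "replyCount": r, "voteCount": 10}
        for r in [0, 0, 0] + list(range(1, 14))]
bucket = source_breakdown(docs)[0]
print(bucket["doc_count"], bucket["replay"]["sum_other_doc_count"])  # 16 4
```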

4. Combining queries and aggregations

Combining a query with aggregations lets you aggregate over the query results. For example, to see which sources published the news whose summary mentions 体育 (sports), run the following query:

GET /news/_search
{
 "size": 0, 
 "query": {"bool": {"must": [
   {"match": {
     "summary": "体育"
   }}
 ]}}, 
  "aggs": {
    "cate": {
      "terms": {
        "field": "source"
      }
    }
  }
}
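The key point is that aggregations only see the documents matched by the query. Here is a minimal Python sketch using made-up documents, with a substring test as a crude stand-in for the analyzed match on 体育. The query runs first. Then a terms aggregation, which uses a default size of 10 when size is omitted, buckets the hits by source. search_then_aggregate is a hypothetical helper name.

```python
from collections import Counter


def search_then_aggregate(docs, predicate, agg_field, size=10):
    """Run the query first, then aggregate only over the matching documents."""
    hits = [d for d in docs if predicate(d)]
    counts = Counter(d[agg_field] for d in hits if d.get(agg_field) is not None)
    buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return len(hits), buckets


docs = [
    {"summary": "体育新闻：比赛结果", "source": "中国新闻网"},
    {"summary": "体育赛事报道", "source": "中国新闻网"},
    {"summary": "财经市场动态", "source": "中新经纬"},
    {"summary": "校园体育活动", "source": "新浪"},
]
# Substring test as a crude stand-in for an analyzed match on "体育".
total, buckets = search_then_aggregate(docs, lambda d: "体育" in d["summary"], "source")
print(total, buckets)  # 3 [('中国新闻网', 2), ('新浪', 1)]
```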

5. Summary

        Elasticsearch query syntax is complex and varied, and this article covers only some common queries and aggregations. For more detail, see the official documentation and Elasticsearch: The Definitive Guide. The Definitive Guide is available in Chinese, which makes it easier to read, but it covers version 2.x. The official documentation is published for each version and is more up to date, so it is the recommended reading.

Elasticsearch: The Definitive Guide (Chinese)

Elasticsearch 6.5 official documentation (English)


Reprinted from: https://juejin.im/post/5d0247e76fb9a07f091b9e29


Origin blog.csdn.net/weixin_34224941/article/details/93168485