Elasticsearch query document

1. Kibana operations

1.1 Query all

// Query all documents
GET /indexName/_search
{
  "query": {
    "match_all": {}
  }
}

1.2. Full-text search queries

Common full-text search queries include:

  • match query: single-field query
  • multi_match query: multi-field query; a document matches if any one of the fields meets the query condition

The match query syntax is as follows:

GET /indexName/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

The multi_match syntax is as follows:

GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "TEXT",
      "fields": ["FIELD1", "FIELD2"]
    }
  }
}

1.3. Exact queries

Exact queries generally search keyword, numeric, date, boolean and similar field types, so the search condition is not analyzed (no word segmentation is performed). The common ones are:

  • term: query based on the exact value of the term
  • range: query based on the range of values

1.3.1. term query

Because an exact query targets a field that is not analyzed, the query value must also be a single, unanalyzed term. A document matches only when the value entered by the user is exactly equal to the field value. If the user enters more text than the stored value, no data is found.

Syntax:

// term query
GET /indexName/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

Example:

Searching for an exact term returns the expected results.

However, when the search text is not a single term but a phrase made up of several words, nothing is found.

1.3.2. range query

Range queries are generally used to filter numeric values by range, for example filtering by price range.

Basic syntax:

// range query
GET /indexName/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10, // gte means greater than or equal to; gt means greater than
        "lte": 20  // lte means less than or equal to; lt means less than
      }
    }
  }
}

1.4. Geographic coordinate query

1.4.1. Rectangular range query

A rectangular range query, i.e. the geo_bounding_box query, finds all documents whose coordinates fall within a given rectangle.

To run the query, specify the coordinates of the rectangle's top-left and bottom-right corners; every point that falls inside the resulting rectangle matches.

The syntax is as follows:

// geo_bounding_box query
GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": { // top-left corner
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": { // bottom-right corner
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}

A rectangle does not fit the "people nearby" use case, so no example is given for it here.

1.4.2. Nearby query

A nearby query, also called a distance query (geo_distance), finds all documents whose coordinates lie within a given distance of a specified center point.

In other words, pick a point on the map as the center, draw a circle with the given distance as its radius, and every coordinate that falls inside the circle matches.

Syntax:

// geo_distance query
GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km", // radius
      "FIELD": "31.21,121.5" // center point
    }
  }
}
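
As a rough illustration of what "inside the circle" means, the sketch below computes the great-circle distance between two points with the haversine formula in plain Java and compares it with a radius. It only approximates the idea: the class name GeoDistanceSketch, its method names and the sample hotel coordinates are made up for this example, and the distance calculation that Elasticsearch performs internally may differ in detail.

```java
class GeoDistanceSketch {
    // Mean Earth radius in kilometres (a common approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two latitude/longitude points (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when the point lies within radiusKm of the center.
    static boolean within(double centerLat, double centerLon,
                          double lat, double lon, double radiusKm) {
        return distanceKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // Center taken from the syntax example; the second point is a hypothetical hotel.
        System.out.println(within(31.21, 121.5, 31.25, 121.55, 15.0));
    }
}
```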

Example:

Searching for hotels within 15 km of Lujiazui finds a total of 47 hotels.

1.5. Compound query

A compound query combines other, simpler queries to implement more complex search logic. One common kind is the function score query (function_score), which changes the relevance score of documents and therefore their ranking.

The function score query contains four parts:

  • Original query: the query part. Documents are searched by this condition and scored with the BM25 algorithm, which gives the original score (query score)
  • Filter: the filter part. Only documents that meet this condition have their score recalculated
  • Score function: documents that meet the filter are evaluated with this function to obtain the function score. There are four functions:
    • weight: the function result is a constant
    • field_value_factor: uses the value of a field in the document as the function result
    • random_score: uses a random number as the function result
    • script_score: a custom scoring algorithm
  • Boost mode (boost_mode): how the function score is combined with the relevance score of the original query, including:
    • multiply: multiply the two scores
    • replace: replace the query score with the function score
    • others, such as sum, avg, max, min

The function score query works as follows:

  • 1) Search for documents with the original query and calculate their relevance score, called the original score (query score)
  • 2) Filter the documents with the filter condition
  • 3) For documents that meet the filter condition, calculate the function score with the score function
  • 4) Combine the original score (query score) and the function score according to the boost mode to obtain the final relevance score
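
The last step can be sketched in plain Java as follows. This is only an illustration of the arithmetic, not Elasticsearch code: the class FunctionScoreSketch, the enum BoostMode and the method names are invented here, the function score is assumed to be a constant weight, and the sketch assumes that documents failing the filter simply keep their query score, which is a simplification of the real behaviour.

```java
class FunctionScoreSketch {
    // Ways of combining the query score with the function score.
    enum BoostMode { MULTIPLY, REPLACE, SUM, AVG, MAX, MIN }

    // Step 4: combine the two scores according to the chosen mode.
    static double combine(double queryScore, double functionScore, BoostMode mode) {
        switch (mode) {
            case MULTIPLY: return queryScore * functionScore;
            case REPLACE:  return functionScore;
            case SUM:      return queryScore + functionScore;
            case AVG:      return (queryScore + functionScore) / 2;
            case MAX:      return Math.max(queryScore, functionScore);
            case MIN:      return Math.min(queryScore, functionScore);
            default: throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // Steps 2 to 4 with a constant weight as the score function.
    // Assumption: documents that fail the filter keep their original query score.
    static double finalScore(double queryScore, boolean matchesFilter,
                             double weight, BoostMode mode) {
        if (!matchesFilter) {
            return queryScore;
        }
        return combine(queryScore, weight, mode);
    }

    public static void main(String[] args) {
        // A document with query score 2.0 that matches the filter, weight 10, multiply mode.
        System.out.println(finalScore(2.0, true, 10.0, BoostMode.MULTIPLY));
    }
}
```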

1.6. Boolean queries

A Boolean query is a combination of one or more query clauses, each of which is a subquery. Subqueries can be combined in the following ways:

  • must: every subquery must match, similar to "and"
  • should: subqueries match optionally, similar to "or"
  • must_not: must not match; does not take part in scoring, similar to "not"
  • filter: must match; does not take part in scoring

For example, when searching for hotels, in addition to the keyword search we may also filter on fields such as brand, price and city.

Each field needs its own query condition and query type, so several different queries are involved. To combine them, use a bool query.

Note that the more fields take part in scoring, the worse the query performance. For multi-condition queries it is therefore recommended to:

  • Use a must clause with a full-text search query for the keyword typed in the search box, so that it takes part in scoring
  • Use filter clauses for the other conditions, so that they do not take part in scoring

1) Syntax example:

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "city": "上海" } } // city: Shanghai
      ],
      "should": [
        { "term": { "brand": "皇冠假日" } }, // brand: Crowne Plaza
        { "term": { "brand": "华美达" } } // brand: Ramada
      ],
      "must_not": [
        { "range": { "price": { "lte": 500 } } }
      ],
      "filter": [
        { "range": { "score": { "gte": 45 } } }
      ]
    }
  }
}

2) Example

Requirement: search for hotels whose name contains "Home Inn", whose price is not higher than 400, and that lie within 10 km of the coordinates 31.21, 121.5.

Analysis:

  • The name search is a full-text search query and should take part in scoring, so it goes in must
  • The price limit of 400 is a filter condition and does not take part in scoring. Expressed as a range query for prices above 400, it goes in must_not
  • The 10 km limit is a geo_distance query. It is also a filter condition and does not take part in scoring, so it goes in filter
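
The matching logic of this analysis can be sketched in plain Java with predicates. This is only an in-memory illustration of how must, must_not and filter combine: scoring is left out, a substring check stands in for the full-text match, the distance is assumed to be precomputed, and the class BoolQuerySketch with its Hotel type is invented for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

class BoolQuerySketch {
    // Hypothetical in-memory stand-in for a hotel document.
    static class Hotel {
        final String name;
        final int price;
        final double distanceKm; // distance to 31.21, 121.5, assumed precomputed

        Hotel(String name, int price, double distanceKm) {
            this.name = name;
            this.price = price;
            this.distanceKm = distanceKm;
        }
    }

    // A document matches when all must and filter clauses hold and no must_not clause holds.
    static boolean matches(Hotel hotel,
                           List<Predicate<Hotel>> must,
                           List<Predicate<Hotel>> mustNot,
                           List<Predicate<Hotel>> filter) {
        return must.stream().allMatch(p -> p.test(hotel))
                && mustNot.stream().noneMatch(p -> p.test(hotel))
                && filter.stream().allMatch(p -> p.test(hotel));
    }

    // The three clauses from the analysis above.
    static boolean matchesExample(Hotel hotel) {
        Predicate<Hotel> nameMatches = h -> h.name.contains("Home Inn"); // must
        Predicate<Hotel> tooExpensive = h -> h.price > 400;              // must_not
        Predicate<Hotel> nearby = h -> h.distanceKm <= 10.0;             // filter
        return matches(hotel,
                Collections.singletonList(nameMatches),
                Collections.singletonList(tooExpensive),
                Collections.singletonList(nearby));
    }

    public static void main(String[] args) {
        System.out.println(matchesExample(new Hotel("Home Inn Lujiazui", 300, 5.0)));
    }
}
```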

1.7 Sorting

Elasticsearch sorts by relevance score (_score) by default, but it also supports custom sorting of search results. Field types that can be sorted include keyword, numeric, geographic coordinate and date types.

1.7.1. Ordinary field sorting

The syntax for sorting by keyword, numeric and date fields is basically the same.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": "desc" // sort field and sort order: asc or desc
    }
  ]
}

The sort condition is an array, so several sort conditions can be given. They are applied in the order declared: when documents are equal on the first condition, they are sorted by the second, and so on.

Example:

Requirement: sort the hotel data in descending order of user rating (score); hotels with the same rating are sorted in ascending order of price (price).
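
The tie-breaking behaviour of multiple sort conditions can be illustrated with an ordinary Java comparator. This is an in-memory sketch, not a call to Elasticsearch: the class SortSketch, its Hotel type and the sample values are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortSketch {
    // Hypothetical in-memory stand-in for a hotel document.
    static class Hotel {
        final String name;
        final int score;
        final int price;

        Hotel(String name, int score, int price) {
            this.name = name;
            this.score = score;
            this.price = price;
        }
    }

    // First condition: score descending. Second condition (used on ties): price ascending.
    static List<Hotel> sortByScoreThenPrice(List<Hotel> hotels) {
        List<Hotel> sorted = new ArrayList<>(hotels);
        sorted.sort(Comparator.comparingInt((Hotel h) -> h.score).reversed()
                .thenComparingInt(h -> h.price));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = Arrays.asList(
                new Hotel("A", 45, 300),
                new Hotel("B", 48, 500),
                new Hotel("C", 48, 200));
        for (Hotel h : sortByScoreThenPrice(hotels)) {
            System.out.println(h.name + " score=" + h.score + " price=" + h.price);
        }
    }
}
```

Hotels B and C have the same score, so the price decides their order and C comes before B; A has the lowest score and comes last.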

1.7.2. Geographical coordinate sorting

Sorting by geographic coordinates is slightly different.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "FIELD": "latitude,longitude", // name of the geo_point field in the document, and the target coordinates
        "order": "asc", // sort order
        "unit": "km" // distance unit used for sorting
      }
    }
  ]
}

The meaning of this query is:

  • Specify a coordinate as the target point
  • For each document, calculate the distance from the coordinates in the specified field (which must be of type geo_point) to the target point
  • Sort by that distance

Example:

Requirement: sort the hotel data in ascending order of distance from your own location.

Tip: one way to get the latitude and longitude of your location: https://lbs.amap.com/demo/jsapi-v2/example/map/click-to-get-lnglat/

Suppose my location is 31.034661, 121.612282, and I am looking for the hotels nearest to me.

1.8. Pagination

Elasticsearch returns only the top 10 documents by default. To retrieve more, change the paging parameters from and size, which control the page of results to be returned:

  • from: the offset of the first document to return
  • size: the number of documents to return

This is similar to LIMIT ?, ? in MySQL.

The basic syntax of pagination is as follows:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0, // offset where the page starts, 0 by default
  "size": 10, // number of documents to return
  "sort": [
    { "price": "asc" }
  ]
}
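
Applications usually work with page numbers rather than offsets. A minimal sketch of the conversion, assuming page numbers start at 1 (the class and method names are invented for this example):

```java
class PagingSketch {
    // Converts a 1-based page number and a page size into the "from" offset.
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // Page 3 with 10 documents per page starts at offset 20.
        System.out.println(from(3, 10));
    }
}
```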

1.9. Highlighting

1.9.1. Highlighting principle

What is highlighting?

When we search on Baidu or JD.com, the keywords in the results are shown in red so that they stand out. This is called highlighting.

Highlighting is implemented in two steps:

  • 1) Wrap every keyword in the document in a tag, such as the <em> tag
  • 2) Write CSS styles for the <em> tag in the page
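
Step 1 can be pictured with a plain string replacement. The sketch below is only an illustration: Elasticsearch highlights the analyzed terms of a field rather than raw substrings, and the class HighlightSketch and its method are invented for this example.

```java
class HighlightSketch {
    // Wraps every literal occurrence of the keyword in the given tags.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        return text.replace(keyword, preTag + keyword + postTag);
    }

    public static void main(String[] args) {
        System.out.println(highlight("Home Inn near Lujiazui", "Home Inn", "<em>", "</em>"));
    }
}
```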

1.9.2. Implement highlighting

Highlight syntax:

GET /hotel/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT" // query condition; highlighting must use a full-text search query
    }
  },
  "highlight": {
    "fields": { // the fields to highlight
      "FIELD": {
        "pre_tags": "<em>",  // tag placed before the highlighted text
        "post_tags": "</em>" // tag placed after the highlighted text
      }
    }
  }
}

Note:

  • Highlighting applies to keywords, so the search condition must contain keywords; it cannot be, for example, a range query
  • By default, the highlighted field must be the same as the field named in the search condition; otherwise nothing is highlighted
  • To highlight a field other than the searched one, add the attribute require_field_match: false

2. Java code implementation

2.1 Query all

2.1.1. Initiate a query request

Code interpretation (the complete code is given in 2.1.3):

  • The first step is to create a SearchRequest object and specify the index name

  • The second step is to use request.source() to build the DSL, which can include query, paging, sorting, highlighting, etc.

    • query(): represents the query condition; QueryBuilders.matchAllQuery() builds the DSL of a match_all query
  • The third step is to use client.search() to send the request and get the response

There are two key APIs here. One is request.source(), which covers all functions such as query, sorting, paging and highlighting.

The other is QueryBuilders, which provides the various queries such as match, term, function_score and bool.

2.1.2. Parsing the response

The result returned by Elasticsearch is a JSON string with the following structure:

  • hits: the hit results
    • total: the total number of hits, where value holds the actual count
    • max_score: the relevance score of the highest-scoring document among all results
    • hits: the array of result documents, each of which is a JSON object
      • _source: the original data of the document, also a JSON object

Parsing the response therefore means parsing this JSON layer by layer. The process is as follows:

  • SearchHits: obtained through response.getHits(); this is the outermost hits in the JSON and represents the hit results
    • SearchHits#getTotalHits().value: gets the total number of hits
    • SearchHits#getHits(): gets the SearchHit array, i.e. the document array
      • SearchHit#getSourceAsString(): gets the _source of a document result, i.e. the original JSON document data

2.1.3. Complete code

The complete code is as follows:

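// Imports assumed for these snippets. JSON.parseObject is taken to be fastjson's
// and @Test to be JUnit 5; adjust them if the project uses other libraries.
import java.io.IOException;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.junit.jupiter.api.Test;

// `client` is the Elasticsearch client field of the test class, initialized elsewhere.
@Test
void testMatchAll() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source()
        .query(QueryBuilders.matchAllQuery());
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. Parse the response
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the documents
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize it
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }
}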

2.2. match query

The match and multi_match full-text queries use basically the same API as match_all. The difference is the query condition, i.e. the query part.

In the Java code, therefore, the main difference is the argument passed to request.source().query(), which is again built with the methods provided by QueryBuilders, such as QueryBuilders.matchQuery() and QueryBuilders.multiMatchQuery().

The result parsing code is identical and can be extracted into a shared method.

The complete code is as follows:

@Test
void testMatch() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL ("如家" is the hotel brand Home Inn)
    request.source()
        .query(QueryBuilders.matchQuery("all", "如家"));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

2.3. Precise query

There are two main exact queries:

  • term: exact term match
  • range: range query

Compared with the previous queries, the only difference is again the query condition; everything else is the same.

The query conditions are built with QueryBuilders.termQuery() and QueryBuilders.rangeQuery(), for example QueryBuilders.termQuery("city", "杭州") and QueryBuilders.rangeQuery("price").lte(250).

2.4. Boolean queries

A Boolean query combines other queries with must, must_not, filter and so on. Each clause is added through the corresponding method of the BoolQueryBuilder returned by QueryBuilders.boolQuery().

As with the other queries, the only difference is how the query condition is built with QueryBuilders; result parsing and the rest of the code are unchanged.

The complete code is as follows:

@Test
void testBool() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. Prepare the bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // 2.2. Add the term query ("杭州" is the city Hangzhou)
    boolQuery.must(QueryBuilders.termQuery("city", "杭州"));
    // 2.3. Add the range query
    boolQuery.filter(QueryBuilders.rangeQuery("price").lte(250));

    request.source().query(boolQuery);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

2.5. Sorting, paging

The sorting and paging of search results are parameters at the same level as query, so they are also set through request.source(): sort() sets the sort field and order, and from() and size() set the paging range.

Full code example:

@Test
void testPageAndSort() throws IOException {
    // Page number and page size
    int page = 1, size = 5;

    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchAllQuery());
    // 2.2. Sorting
    request.source().sort("price", SortOrder.ASC);
    // 2.3. Paging: from and size (use the size variable rather than a hard-coded 5)
    request.source().from((page - 1) * size).size(size);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

2.6. Highlight

The highlighting code differs from the previous code in two ways:

  • Query DSL: in addition to the query condition, a highlight condition must be added, at the same level as query
  • Result parsing: in addition to the _source document data, the highlighted result must also be parsed

2.6.1. Building the highlight request

The highlight request is built with request.source().highlighter(), which takes a HighlightBuilder naming the fields to highlight.

Don't forget the query condition: a highlight query must use a full-text search with search keywords, so that there are keywords to highlight.

The complete code is as follows:

@Test
void testHighlight() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchQuery("all", "如家"));
    // 2.2. Highlighting: the searched field is "all", so field matching is not required
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

2.6.2. Parsing the highlighted results

By default, the highlighted results are returned separately from the document source.

Parsing them therefore requires additional processing.

Code interpretation:

  • Step 1: get the source from the result with hit.getSourceAsString(). This part is the non-highlighted result, a JSON string, and it still needs to be deserialized into a HotelDoc object
  • Step 2: get the highlighted result with hit.getHighlightFields(). The return value is a Map whose key is the highlighted field name and whose value is a HighlightField object representing the highlighted value
  • Step 3: get the HighlightField object from the map by the name of the highlighted field
  • Step 4: get the fragments from the HighlightField and convert them to a string. This is the actual highlighted string
  • Step 5: replace the non-highlighted value in the HotelDoc with the highlighted one

The complete code is as follows:

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the documents
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize it
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Get the highlighted results
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // Get the highlighted result by field name
            HighlightField highlightField = highlightFields.get("name");
            if (highlightField != null) {
                // Get the highlighted value
                String name = highlightField.getFragments()[0].string();
                // Overwrite the non-highlighted value
                hotelDoc.setName(name);
            }
        }
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

Origin blog.csdn.net/qq_44954571/article/details/122823297