SpringCloud microservice technology stack. Dark horse follow-up (6)

Today's goal

In yesterday's session we imported a large amount of data into elasticsearch, which covers its data storage function. What elasticsearch does best, however, is search and data analysis.

So today we study the search function of elasticsearch. We will implement search twice: once with the DSL and once with the RestClient.

1. DSL query document

Elasticsearch queries are still implemented based on JSON-style DSL.

1.1. DSL query classification

Elasticsearch provides a JSON-based DSL (Domain Specific Language) to define queries. Common query types include:

  • Query all: returns all data; generally used for testing. For example: match_all
  • Full-text search (full text) query: uses an analyzer to split the user input into terms, then matches those terms against the inverted index. For example:
    • match
    • multi_match
  • Precise query: finds data by exact term value; generally used on keyword, numeric, date, boolean and similar field types. For example:
    • ids
    • range
    • term
  • Geographic (geo) query: queries by latitude and longitude. For example:
    • geo_distance
    • geo_bounding_box
  • Compound query: combines the query conditions above into more complex logic. For example:
    • bool
    • function_score

The query syntax is basically the same:

GET /indexName/_search
{
  "query": {
    "query type": {
      "query condition": "condition value"
    }
  }
}

Take the query-all case as an example, where:

  • The query type is match_all
  • There is no query condition

// Query all
GET /indexName/_search
{
  "query": {
    "match_all": {}
  }
}

Other queries differ only in the query type and the query conditions.

For example:

# Query all
GET /hotel/_search
{
  "query": {
    "match_all": {}
  }
}

Note: although this query matches every document, only 10 hits are returned by default, so the response does not contain all of them.

Summary:
What is the basic syntax of the query DSL?
● GET /indexName/_search
● { "query": { "query type": { "FIELD": "TEXT"}}}

1.2. Full text search query

1.2.1. Usage scenarios

The basic process of a full-text search query is as follows:

  • Analyze the user's search text and split it into terms
  • Match those terms against the inverted index to get the document ids
  • Fetch the documents by id and return them to the user
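The three steps above can be sketched in plain Java. This is only a toy model, not how elasticsearch is implemented: the class name InvertedIndexSketch and the whitespace-splitting analyze() method are assumptions made for illustration (a real analyzer such as IK segments Chinese text), and no relevance scoring is done.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> index = new HashMap<>();
    // id -> original document text
    private final Map<Integer, String> documents = new HashMap<>();

    // Hypothetical stand-in for an analyzer: lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void add(int id, String text) {
        documents.put(id, text);
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 1: analyze the input. Step 2: look up each term. Step 3: fetch documents by id.
    List<String> search(String userInput) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (String term : analyze(userInput)) {
            ids.addAll(index.getOrDefault(term, new TreeSet<>()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(documents.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add(1, "Home Inn Bund");
        idx.add(2, "Hilton Hongqiao");
        idx.add(3, "Home Inn Hongqiao");
        System.out.println(idx.search("bund home"));
    }
}
```

A document is returned when it contains any one of the terms, which is why the fields being searched have to be analyzed into terms in the first place.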

Common scenarios include:

  • The search box of an online mall
  • The Baidu search box

The JD.com search box is one example. Because matching is done on terms, the fields that participate in the search must also be text-type fields that can be analyzed.

1.2.2. Basic syntax

Common full-text search queries include:

  • match query: single-field query
  • multi_match query: multi-field query; a document matches if any one of the fields meets the condition

The match query syntax is as follows (the queried field is generally a text-type field):

GET /indexName/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

The multi_match syntax is as follows:

GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "TEXT",
      "fields": ["FIELD1", "FIELD2"]
    }
  }
}

1.2.3. Examples

Example of a match query:

# match query
GET /hotel/_search
{
  "query": {
    "match": {
      "all": "外滩如家"
    }
  }
}

Example of a multi_match query:

# multi_match query
GET /hotel/_search
{
  "query": {
    "multi_match": {
      "query": "外滩如家",
      "fields": ["brand", "name", "business"]
    }
  }
}

The two queries return the same results. Why?

Because we copied the brand, name, and business values into the all field using copy_to, searching on those three fields has the same effect as searching on the all field.

However, the more fields a search covers, the greater the cost to query performance, so it is recommended to use copy_to and then query the single combined field.

1.2.4. Summary

What is the difference between match and multi_match?

  • match: queries a single field
  • multi_match: queries multiple fields; the more fields involved in the query, the worse the query performance

1.3. Precise query

A precise query generally searches keyword, numeric, date, boolean and similar field types, so the search condition is not analyzed. The query value must match the indexed value exactly. The common ones are:

  • term: query based on the exact value of the term
  • range: query based on the range of values

1.3.1. term query

Because a precise query runs against fields that are not analyzed, the query condition must also be a single unanalyzed term. A document matches only when the user's input is exactly equal to the field value. If the user enters extra content, nothing is found.

Syntax:

// term query
GET /indexName/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

Example:

Searching for an exact term returns results:

# Exact query
GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "上海"
      }
    }
  }
}

However, when the search content is not a single term but a phrase made of several words, nothing is found:

GET /hotel/_search
{
  "query": {
    "term": {
      "city": {
        "value": "杭州上海"
      }
    }
  }
}

1.3.2. range query

A range query is generally used to filter numeric fields by range, for example to filter by price range.

Basic syntax:

// range query
GET /indexName/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10, // gte means greater than or equal to; gt means greater than
        "lte": 20 // lte means less than or equal to; lt means less than
      }
    }
  }
}

Example:

# Precise query: range
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000,
        "lte": 3000
      }
    }
  }
}

gte: greater than or equal to
lte: less than or equal to
gt: greater than
lt: less than

1.3.3. Summary

What are the common types of precise query?

  • term query: exact match on a term; generally used on keyword, numeric, boolean and date fields
  • range query: queries by a range of values, which can be numeric or date ranges

1.4. Geographic coordinate query

A geographic coordinate query is a query based on longitude and latitude; see the official documentation on geo queries.

Common usage scenarios include:

  • Ctrip: Search Hotels Near Me
  • Didi: Find taxis near me
  • WeChat: Search People Near Me

1.4.1. Rectangular range query

A rectangular range query (geo_bounding_box) finds all documents whose coordinates fall within a given rectangle.

To run it, specify the coordinates of the top-left and bottom-right corners of the rectangle; every point that falls inside the resulting rectangle matches.

The syntax is as follows:

// geo_bounding_box query
GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": { // top-left corner
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": { // bottom-right corner
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}

A rectangle does not fit the "people nearby" requirement, so we will not run an example of it.

1.4.2. Nearby query

A nearby query, also called a distance query (geo_distance), finds all documents whose distance from a specified center point is less than a given value.

In other words, pick a point on the map as the center, draw a circle with the specified distance as the radius, and every coordinate that falls inside the circle matches.

Syntax:

// geo_distance query
GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km", // radius
      "FIELD": "31.21,121.5" // center of the circle
    }
  }
}

Example: search for hotels within 15 km of Lujiazui:

# Nearby
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km",
      "location": "31.21,121.5"
    }
  }
}

A total of 47 hotels were found.

Then shorten the radius to 3 km:

# Nearby
GET /hotel/_search
{
  "query": {
    "geo_distance": {
      "distance": "3km",
      "location": "31.21,121.5"
    }
  }
}

The number of hotels found drops to 5.
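The "point inside a circle" test behind a distance query can be sketched with the haversine formula. This is an illustration under assumptions, not the engine's own code: the class GeoDistanceSketch, its method names, and the mean Earth radius of 6371 km are chosen here for the example.

```java
class GeoDistanceSketch {
    // Mean Earth radius in kilometres; an approximation used for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two (lat, lon) points using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // A document matches when its coordinate lies within the radius of the center point.
    static boolean withinRadius(double centerLat, double centerLon,
                                double lat, double lon, double radiusKm) {
        return distanceKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        // A point 0.09 degrees of latitude north of the center used in the examples above.
        double d = distanceKm(31.21, 121.5, 31.30, 121.5);
        System.out.println("distance in km: " + d);
        System.out.println("within 15km: " + withinRadius(31.21, 121.5, 31.30, 121.5, 15));
        System.out.println("within 3km: " + withinRadius(31.21, 121.5, 31.30, 121.5, 3));
    }
}
```

Shrinking the radius only changes the last comparison, which is why the 3 km query returns a subset of the 15 km results.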

1.5. Compound query

Compound queries combine other, simpler queries to implement more complex search logic. There are two common ones:

  • function score: a scoring-function query that controls how document relevance is calculated and therefore how documents are ranked
  • bool query: a Boolean query that combines multiple other queries through logical relationships to achieve complex searches

1.5.1. Relevance score

When we use a match query, each document in the results is scored (_score) by its relevance to the search terms, and the results are returned in descending order of score.

For example, if we search for "虹桥如家" (Hongqiao Home Inn), the results are as follows:

[
  {
    "_score" : 17.850193,
    "_source" : {
      "name" : "虹桥如家酒店真不错"
    }
  },
  {
    "_score" : 12.259849,
    "_source" : {
      "name" : "外滩如家酒店真不错"
    }
  },
  {
    "_score" : 11.91091,
    "_source" : {
      "name" : "迪士尼如家酒店真不错"
    }
  }
]

In elasticsearch, the early scoring algorithm was TF-IDF.

In the later 5.x line (BM25 became the default similarity in version 5.0), elasticsearch switched to the BM25 algorithm.

TF-IDF has a flaw: the higher the term frequency, the higher the document score, so a single term can have an outsized influence on a document. BM25 puts an upper limit on the score a single term can contribute, so the curve is smoother.

Summary: elasticsearch scores documents by the relevance between terms and documents. There are two algorithms:

  • TF-IDF algorithm
  • BM25 algorithm, the default since version 5.0 of elasticsearch
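To see the "upper limit" concretely, here is a small sketch of the textbook BM25 term-frequency factor. It is an illustration only: the class name Bm25SaturationSketch is made up for this example, k1 = 1.2 and b = 0.75 are the commonly cited default parameters, and Lucene's actual implementation differs in constant factors.

```java
class Bm25SaturationSketch {
    // Textbook BM25 term-frequency factor:
    //   tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
    // It grows with tf but can never exceed k1 + 1.
    static double bm25Tf(double tf, double k1, double b, double docLen, double avgDocLen) {
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        return tf * (k1 + 1) / (tf + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double b = 0.75;
        // Compare a weight that grows linearly with tf against the saturating BM25 factor.
        for (int tf : new int[] {1, 2, 5, 10, 100}) {
            System.out.println("tf=" + tf
                    + "  linear=" + tf
                    + "  bm25=" + bm25Tf(tf, k1, b, 100, 100));
        }
    }
}
```

With a document of average length, the factor is 1.0 at tf = 1 and approaches k1 + 1 = 2.2 as tf grows, so repeating a term a hundred times cannot push the score up a hundredfold.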

1.5.2. Score function query

Scoring by relevance is a reasonable default, but what is reasonable is not necessarily what the product manager needs.
Take Baidu as an example: in its search results, ranking is not decided by relevance alone; whoever pays more ranks higher.

To control the relevance score, use the function score query in elasticsearch.

1) Syntax

The function score query contains four parts:

  • Original query condition: the query part. Documents are searched by this condition and scored with the BM25 algorithm, giving the original score (query score)
  • Filter condition: the filter part. Documents that meet this condition have their score recalculated
  • Scoring function: documents that meet the filter condition are passed through this function to obtain the function score. There are four functions:
    • weight: the function result is a constant
    • field_value_factor: uses the value of a field in the document as the function result
    • random_score: uses a random number as the function result
    • script_score: a custom scoring algorithm
  • Calculation mode: how the function score and the query score of the original query are combined, including:
    • multiply: multiply the two scores
    • replace: replace the query score with the function score
    • Others, such as: sum, avg, max, min

The function score query runs as follows:

  • 1) Search documents by the original condition and calculate the relevance score, called the original score (query score)
  • 2) Filter documents by the filter condition
  • 3) For documents that meet the filter condition, calculate the function score with the scoring function
  • 4) Combine the original score (query score) and the function score according to the calculation mode to get the final relevance score

So the key points here are:

  • Filter condition: determines which documents have their scores modified
  • Scoring function: determines how the function score is calculated
  • Calculation mode: determines how the final score is calculated
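The way these three pieces fit together can be sketched as a small scoring helper. It follows the simplified model described above (documents that fail the filter keep their query score) and is not the engine's implementation; the class FunctionScoreSketch and its method names are hypothetical.

```java
class FunctionScoreSketch {
    // The calculation modes listed above.
    enum BoostMode { MULTIPLY, REPLACE, SUM, AVG, MAX, MIN }

    // Step 4: combine the query score and the function score.
    static double combine(double queryScore, double functionScore, BoostMode mode) {
        switch (mode) {
            case MULTIPLY: return queryScore * functionScore;
            case REPLACE:  return functionScore;
            case SUM:      return queryScore + functionScore;
            case AVG:      return (queryScore + functionScore) / 2;
            case MAX:      return Math.max(queryScore, functionScore);
            case MIN:      return Math.min(queryScore, functionScore);
            default:       throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // Steps 2-4 with a constant "weight" function: only documents that pass the
    // filter are re-scored; in this simplified model the others keep the query score.
    static double finalScore(double queryScore, boolean matchesFilter,
                             double weight, BoostMode mode) {
        if (!matchesFilter) {
            return queryScore;
        }
        return combine(queryScore, weight, mode);
    }

    public static void main(String[] args) {
        // A document with query score 12.0 that passes the filter, weight 2, mode sum.
        System.out.println(finalScore(12.0, true, 2.0, BoostMode.SUM));
        // The same document if it does not pass the filter.
        System.out.println(finalScore(12.0, false, 2.0, BoostMode.SUM));
    }
}
```

Changing only the mode changes the ranking effect: multiply scales with the original relevance, while sum adds the same fixed bonus to every filtered document.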

2) Example

Requirement: rank hotels of the brand "如家" (Home Inn) higher.

Translate this requirement into the four points mentioned before:

  • Original condition: not fixed; it can be any query
  • Filter condition: brand = "如家"
  • Scoring function: keep it simple and give a fixed result directly with weight
  • Calculation mode: for example, sum

First, use the plain query to search for hotels near the Bund:

# Rank Home Inn hotels higher
GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "all": "外滩"
        }
      }
    }
  }
}

In the results of this query, the Grand Hyatt Hotel ranks near the top.

Now we add the functions, so the final DSL statement is as follows:

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": { .... }, // original query; can be any condition
      "functions": [ // scoring functions
        {
          "filter": { // condition to meet: the brand must be 如家
            "term": {
              "brand": "如家"
            }
          },
          "weight": 2 // score weight is 2
        }
      ],
      "boost_mode": "sum" // calculation mode: sum
    }
  }
}

In testing, Home Inn's score is higher after the scoring function is added than before.
3) Summary

What are the three elements defined by function score query?

  • Filter condition: which documents get their score adjusted
  • Scoring function: how the function score is calculated
  • Calculation mode: how the function score and the query score are combined

1.5.3. Compound query – Boolean query

A Boolean query is a combination of one or more query clauses, each of which is a subquery. Subqueries can be combined in the following ways:

  • must: every subquery must match, similar to "and"
  • should: subqueries match optionally, similar to "or"
  • must_not: must not match; does not participate in scoring, similar to "not"
  • filter: must match; does not participate in scoring
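The matching side of these four clauses (ignoring scoring) can be sketched with plain predicates. This is a simplified model with a hypothetical class name, BoolQuerySketch. It also encodes one default worth knowing: a should clause is only mandatory when the bool query has no must or filter clause; otherwise it is optional and only affects the score.

```java
import java.util.List;
import java.util.function.Predicate;

class BoolQuerySketch {
    // Returns whether a document matches a bool query made of the four clause lists.
    static <T> boolean matches(T doc,
                               List<Predicate<T>> must,
                               List<Predicate<T>> should,
                               List<Predicate<T>> mustNot,
                               List<Predicate<T>> filter) {
        // must and filter: every clause has to match.
        for (Predicate<T> p : must) if (!p.test(doc)) return false;
        for (Predicate<T> p : filter) if (!p.test(doc)) return false;
        // must_not: no clause may match.
        for (Predicate<T> p : mustNot) if (p.test(doc)) return false;
        // should: required only when there is no must or filter clause.
        boolean shouldRequired = must.isEmpty() && filter.isEmpty() && !should.isEmpty();
        if (shouldRequired) {
            for (Predicate<T> p : should) if (p.test(doc)) return true;
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Predicate<String> inShanghai = s -> s.contains("Shanghai");
        Predicate<String> isHomeInn = s -> s.contains("Home Inn");
        List<Predicate<String>> none = List.of();
        // must: in Shanghai; should: Home Inn (optional because a must clause exists).
        System.out.println(matches("Hilton Shanghai", List.of(inShanghai), List.of(isHomeInn), none, none));
        // Only a should clause: now at least one should clause has to match.
        System.out.println(matches("Hilton Shanghai", none, List.of(isHomeInn), none, none));
    }
}
```

In the model, the first call matches and the second does not, which shows why a should clause next to a must clause does not narrow the result set on its own.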

For example, when searching for hotels, in addition to the keyword search we may also filter on fields such as brand, price, and city.

Each field needs its own query condition and query type, so there are several different queries. To combine them, use a bool query.

Note that the more fields participate in scoring, the worse the query performance. Therefore, for multi-condition queries it is recommended to:

  • Treat the keyword search from the search box as a full-text search query; put it in must so that it participates in scoring
  • Put the other filter conditions in filter, so that they do not participate in scoring

1) Syntax example:

Query:
City: Shanghai
Brand: Crowne Plaza or Ramada
Price: above 500, not involved in scoring
Score: greater than or equal to 45, not involved in scoring

GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"city": "上海"}}
      ],
      "should": [
        {"term": {"brand": "皇冠假日"}},
        {"term": {"brand": "华美达"}}
      ],
      "must_not": [
        {"range": {"price": {"lte": 500}}}
      ],
      "filter": [
        {"range": {"score": {"gte": 45}}}
      ]
    }
  }
}

2) Example

Requirement: search for hotels whose name contains "如家" (Home Inn), whose price is not higher than 400, and which lie within 10 km of the coordinates 31.21, 121.5.

Analysis:

  • The name search is a full-text search query and should participate in scoring. Put it in must
  • "Price not higher than 400" is a filter condition and does not participate in scoring. Express it as a range query for price greater than 400 placed in must_not
  • "Within 10 km" uses a geo_distance query, which is a filter condition and does not participate in scoring. Put it in filter

# Compound search: bool query
# Search for hotels whose name contains "如家", priced no higher than 400, within 10 km of 31.21,121.5
GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "如家"
          }
        }
      ],
      "must_not": [
        {
          "range": {
            "price": {
              "gt": 400
            }
          }
        }
      ],
      "filter": [
        {
          "geo_distance": {
            "distance": "10km",
            "location": {
              "lat": 31.21,
              "lon": 121.5
            }
          }
        }
      ]
    }
  }
}

3) Summary

How many logical relationships does bool query have?

  • must: conditions that must be matched, can be understood as "and"
  • should: The condition for selective matching, which can be understood as "or"
  • must_not: conditions that must not match, do not participate in scoring
  • filter: conditions that must be matched, do not participate in scoring

2. Search result processing

The search results can be processed or displayed in a way specified by the user.

2.1. Sorting

Elasticsearch sorts by relevance score (_score) by default, but it also supports custom sorting of search results. Field types that can be sorted include keyword, numeric, geographic coordinate, and date types.

2.1.1. Ordinary field sorting

The syntax for sorting by keyword, numeric, and date fields is basically the same.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": "desc"  // sort field and sort order: asc or desc
    }
  ]
}

The sort condition is an array, so multiple sort conditions can be written. They apply in declaration order: when documents are equal on the first condition, they are sorted by the second, and so on.

Example:

Requirement: sort the hotel data by user rating (score) in descending order; hotels with the same rating are sorted by price in ascending order.
The code is as follows:

# Sort by user rating (score) descending, then by price ascending for equal ratings
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "score": {
        "order": "desc"
      }
    },
    {
      "price": {
        "order": "asc"
      }
    }
  ]
}

2.1.2. Geographical coordinate sorting

Sorting by geographic coordinates is slightly different.
Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance" : {
          "FIELD" : "latitude,longitude", // name of the geo_point field in the document, and the target coordinate
          "order" : "asc", // sort order
          "unit" : "km" // distance unit used for sorting
      }
    }
  ]
}

The meaning of this query is:

  • Specify a coordinate as the target point
  • Calculate the distance from the coordinates of the specified field (must be geo_point type) to the target point in each document
  • Sort by distance

Example:

Requirement: sort the hotel data in ascending order of distance from your own location.

Tip: you can look up the latitude and longitude of your location with Amap (Gaode Maps).

Suppose my location is 31.034661, 121.612282, and I am looking for the nearest hotels around me.

# Assume my location is 31.034661,121.612282; find the hotels closest to me

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 31.034661,
          "lon": 121.612282
        },
        "order": "asc",
        "unit": "km"
      }
    }
  ]
}

2.2. Pagination

Elasticsearch returns only the top 10 hits by default. To get more data, change the paging parameters. In elasticsearch, the from and size parameters control which page of results is returned:

  • from: the offset of the first document to return
  • size: how many documents to return

This is similar to LIMIT ?, ? in MySQL.
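User interfaces usually work with page numbers rather than offsets, so the conversion is worth spelling out. A minimal sketch, assuming 1-based page numbers; the class and method names (PagingSketch, fromOffset) are made up for this example.

```java
class PagingSketch {
    // Converts a 1-based page number and a page size into the "from" offset.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be at least 1");
        }
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        // Page 1 with 10 hits per page starts at offset 0.
        System.out.println(fromOffset(1, 10));
        // Page 100 with 10 hits per page starts at offset 990.
        System.out.println(fromOffset(100, 10));
    }
}
```

The resulting value goes into from and the page size goes into size, exactly like the offset and row count of a SQL LIMIT clause.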

2.2.1. Basic pagination

The basic syntax of pagination is as follows:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0, // offset where the page starts; defaults to 0
  "size": 10, // number of documents to return
  "sort": [
    {"price": "asc"}
  ]
}

Example: query the first 20 hotels (from 0, size 20), in descending order by price

# Example: query the first 20 hotels, sorted by price descending
GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 20
}

2.2.2. Deep pagination problem

Now suppose I want documents 990~1000. The query is written as follows:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 990, // offset where the page starts; defaults to 0
  "size": 10, // number of documents to return
  "sort": [
    {"price": "asc"}
  ]
}

This starts at offset 990, that is, it returns the 990th to 1000th documents.

Internally, however, elasticsearch must first fetch documents 0~1000 and then cut out the 10 documents from 990 to 1000.

If es runs as a single node, fetching the top 1000 does not cost much.

But in production elasticsearch is almost always a cluster. Say my cluster has 5 nodes and I want the top 1000. Fetching 200 documents from each node is not enough,

because the global ranking is not known in advance: the top 200 of node A may rank beyond 10,000 when compared with the documents on the other nodes.

Therefore, to obtain the top 1000 of the whole cluster, you must first fetch the top 1000 from every node, then merge the results, re-rank them, and cut the top 1000 again.

So what if I want documents 9900~10000? Then every node has to return its top 10,000, and all of them are merged in memory.

When the paging depth is large, the merged data becomes too much and puts heavy pressure on memory and CPU. Therefore, by default elasticsearch rejects requests where from + size exceeds 10,000.
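The merge step described above can be modelled in a few lines. This is a toy simulation, not elasticsearch code: each "node" is just a list of integer sort values (think of prices sorted ascending), and the class DeepPagingSketch with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeepPagingSketch {
    // Each node sorts its own values and returns its top (from + size) candidates,
    // because any of them could belong to the requested global page.
    static List<Integer> nodeTop(List<Integer> nodeValues, int from, int size) {
        List<Integer> sorted = new ArrayList<>(nodeValues);
        Collections.sort(sorted);
        return new ArrayList<>(sorted.subList(0, Math.min(from + size, sorted.size())));
    }

    // The coordinator merges all candidates, re-sorts them and cuts out the page.
    static List<Integer> page(List<List<Integer>> nodes, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> node : nodes) {
            merged.addAll(nodeTop(node, from, size));
        }
        Collections.sort(merged);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        // A skewed distribution: all of the smallest values live on the first node.
        List<List<Integer>> nodes = List.of(
                List.of(1, 2, 3, 4),
                List.of(100, 200));
        System.out.println(page(nodes, 2, 2));
    }
}
```

The number of candidates the coordinator has to hold is roughly (from + size) multiplied by the number of nodes, which is the memory pressure the limit protects against.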

For deep paging, ES provides two solutions (see the official documentation):

  • search after: requires sorting; it fetches the next page starting from the sort values of the last document on the previous page. This is the officially recommended approach.
  • scroll: takes a snapshot of the sorted document ids and keeps it in memory. It is no longer officially recommended for deep paging.
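The idea behind search after can be modelled on an in-memory list. This is a sketch of the principle only, not the Elasticsearch API: the class SearchAfterSketch is hypothetical, the list stands in for documents already sorted ascending, and the sort values are assumed to be unique (in practice a tiebreaker field is added to the sort for that reason).

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterSketch {
    // Returns up to `size` values that come strictly after `lastSortValue`
    // in an ascending list. Pass null to fetch the first page.
    static List<Integer> nextPage(List<Integer> sortedValues, Integer lastSortValue, int size) {
        List<Integer> page = new ArrayList<>();
        if (size <= 0) {
            return page;
        }
        for (int v : sortedValues) {
            if (lastSortValue != null && v <= lastSortValue) {
                continue; // already returned on an earlier page
            }
            page.add(v);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> prices = List.of(100, 200, 300, 400, 500);
        List<Integer> first = nextPage(prices, null, 2);
        // The last sort value of the previous page is the cursor for the next one.
        List<Integer> second = nextPage(prices, first.get(first.size() - 1), 2);
        System.out.println(first + " then " + second);
    }
}
```

Because every request only carries the last sort value, no page has to skip over from documents, but jumping straight to an arbitrary page is not possible.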

2.2.3. Summary

Common pagination approaches, with their advantages and disadvantages:

  • from + size
    • Advantages: supports random page jumps
    • Disadvantages: deep paging problem; the default upper limit (from + size) is 10000
    • Scenario: searches with random page jumps, such as Baidu, JD.com, Google, and Taobao
  • search after
    • Advantages: no upper limit on the total (the size of a single query must not exceed 10000)
    • Disadvantages: can only move forward page by page; does not support random page jumps
    • Scenario: searches without random page jumps, such as scrolling down to load more on a phone
  • scroll
    • Advantages: no upper limit on the total (the size of a single query must not exceed 10000)
    • Disadvantages: extra memory consumption, and the search results are not real-time
    • Scenario: fetching and migrating massive amounts of data. Not recommended from ES 7.1 onward; use search after instead.

2.3. Highlight

2.3.1. Highlighting principle

What is highlighting?

When we search on Baidu or JD.com, the keywords in the results turn red so that they stand out. This is called highlighting.

Highlighting is implemented in two steps:

  • 1) Wrap every keyword in the document in a tag, such as <em>
  • 2) Write CSS styles for the <em> tag on the page
2.3.2. Achieve highlighting

Highlight syntax:

GET /hotel/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT" // query condition; highlighting requires a full-text search query
    }
  },
  "highlight": {
    "fields": { // fields to highlight
      "FIELD": {
        "pre_tags": "<em>",  // tag placed before the highlighted text
        "post_tags": "</em>" // tag placed after the highlighted text
      }
    }
  }
}

Example: highlight "如家" (Home Inn)

# Example: highlight 如家
GET /hotel/_search
{
  "query": {
    "match": {
      "all": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>"
      }
    }
  }
}

Note:

  • Highlighting applies to keywords, so the search condition must contain keywords; it cannot be a range query.
  • By default, the highlighted field must be the same as the field specified in the search; otherwise nothing is highlighted
  • To highlight a field that is not the search field, add the attribute require_field_match=false

Example:

# Example: highlight 如家 in a field other than the search field
GET /hotel/_search
{
  "query": {
    "match": {
      "all": "如家"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "pre_tags": "<em>",
        "post_tags": "</em>",
        "require_field_match": "false"
      }
    }
  }
}

2.4. Summary

The query DSL is a large JSON object with the following properties:

  • query: query condition
  • from and size: paging conditions
  • sort: sorting conditions
  • highlight: highlight condition

3. RestClient query document

Document queries also go through the RestHighLevelClient object we learned about yesterday. The basic steps are:

  • 1) Prepare the Request object
  • 2) Prepare request parameters
  • 3) Initiate a request
  • 4) Parse the response

3.1. Quick Start

Let's take the match_all query as an example

3.1.1. Initiate query request

The request is built in three steps:

  • The first step is to create a SearchRequest object and specify the index name
  • The second step is to build the DSL with request.source(), which can include query, paging, sorting, highlighting, etc.
    • query(): represents the query condition; QueryBuilders.matchAllQuery() builds the DSL for a match_all query
  • The third step is to send the request with client.search() and get the response

There are two key APIs here. One is request.source(), which exposes all functions such as query, sorting, paging, and highlighting.

The other is QueryBuilders, which contains the various queries such as match, term, function_score, and bool.

The code is as follows (HotelSearchTest.java):

    /**
     * match_all query built with the DSL builders
     *
     * @throws IOException if the request to elasticsearch fails
     */
    @Test
    void testMatchAll() throws IOException {
        // Create the request for the "hotel" index
        SearchRequest searchRequest = new SearchRequest("hotel");
        // Build the DSL: a match_all query
        searchRequest.source().query(QueryBuilders.matchAllQuery());
        // Send the request and print the raw response
        SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println("Result: " + response);
    }

3.1.2. Parsing the response

The result returned by elasticsearch is a JSON string with the following structure:

  • hits: the hit results
    • total: the total count, where value is the actual number of hits
    • max_score: the relevance score of the highest-scoring document in the results
    • hits: the array of documents in the search results, each of which is a JSON object
      • _source: the original data of the document, also a JSON object

Parsing the response therefore means parsing the JSON string layer by layer. The process is as follows:

  • SearchHits: obtained through response.getHits(); this is the outermost hits in the JSON and represents the hit results
    • SearchHits#getTotalHits().value: gets the total number of hits
    • SearchHits#getHits(): gets the SearchHit array, which is the document array
      • SearchHit#getSourceAsString(): gets the _source of a document, which is the original JSON document data

3.1.3. Complete code

The complete code is as follows:

@Test
void testMatchAll() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source()
        .query(QueryBuilders.matchAllQuery());
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. Parse the response
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents in total");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the hits
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize into HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

3.1.4. Summary

The basic steps of a query are:

  1. Create a SearchRequest object

  2. Prepare request.source(), which holds the DSL:

    ① Use QueryBuilders to build the query condition

    ② Pass it to the query() method of request.source()

  3. Send the request and get the result

  4. Parse the result (follow the JSON structure from the outside in, layer by layer)

3.2. match query

The match and multi_match full-text queries use basically the same API as match_all. The only difference is the query condition, that is, the query part of the DSL.

In the Java code, the difference is therefore the argument passed to request.source().query(), which is again built with QueryBuilders: matchQuery(field, text) for match and multiMatchQuery(text, fields...) for multi_match.

The result parsing code is identical, so it can be extracted and shared.

The complete code is as follows:

@Test
void testMatch() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source()
        .query(QueryBuilders.matchQuery("all", "如家"));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

Next, the multi_match query, in HotelSearchTest.java:

@Test
public void testMultiMatch() throws IOException {
    // Search for the text in the brand, name and business fields
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().query(QueryBuilders.multiMatchQuery("如家", "brand", "name", "business"));
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    long total = hits.getTotalHits().value;
    System.out.println("Total: " + total + " hits");
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        String sourceAsString = searchHit.getSourceAsString();
        HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
        System.out.println(hotelDoc);
    }
}


3.3. Precise query

There are two main exact-match queries:

  • term: exact match on a term
  • range: range query

As with the previous queries, only the query condition differs; everything else stays the same.

The conditions are built with QueryBuilders.termQuery(field, value) and QueryBuilders.rangeQuery(field), as the two examples below show.

term example, in HotelSearchTest.java:

@Test
public void testTerm() throws IOException {
    // Exact match on the keyword field city
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().query(QueryBuilders.termQuery("city", "深圳"));
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    long total = hits.getTotalHits().value;
    System.out.println("Total: " + total + " hits");
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        String sourceAsString = searchHit.getSourceAsString();
        HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
        System.out.println(hotelDoc);
    }
}

range example, in HotelSearchTest.java:

@Test
public void testRange() throws IOException {
    // Match hotels whose price is greater than or equal to 300
    SearchRequest searchRequest = new SearchRequest("hotel");
    searchRequest.source().query(QueryBuilders.rangeQuery("price").gte(300));
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    long total = hits.getTotalHits().value;
    System.out.println("Total: " + total + " hits");
    SearchHit[] searchHits = hits.getHits();
    for (SearchHit searchHit : searchHits) {
        String sourceAsString = searchHit.getSourceAsString();
        HotelDoc hotelDoc = JSON.parseObject(sourceAsString, HotelDoc.class);
        System.out.println(hotelDoc);
    }
}


3.4. Boolean queries

A Boolean query combines other queries using must, must_not, filter, and so on.

Once again the only difference from the other queries is how the query condition is built with QueryBuilders; the result parsing and the rest of the code are unchanged.

The complete code is as follows:

@Test
void testBool() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. Create the bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // 2.2. Add the term condition (must: participates in scoring)
    boolQuery.must(QueryBuilders.termQuery("city", "上海"));
    // 2.3. Add the range condition (filter: does not participate in scoring)
    boolQuery.filter(QueryBuilders.rangeQuery("price").lte(350));

    request.source().query(boolQuery);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}


3.5. Sorting, paging

Sorting and paging are parameters at the same level as query in the DSL, so they are also set through request.source(): sort(field, order) for sorting, and from(offset) and size(count) for paging.

Full code example:

@Test
void testPageAndSort() throws IOException {
    // Page number (1-based) and page size
    int page = 1, size = 5;

    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchAllQuery());
    // 2.2. sort
    request.source().sort("price", SortOrder.DESC);
    // 2.3. paging: from and size
    request.source().from((page - 1) * size).size(size);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}
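
The from offset used above is plain arithmetic on a 1-based page number. Here is a minimal sketch of that calculation in plain Java, assuming a hypothetical helper class named Paging that is not part of the hotel-demo project. The 10,000 limit is the default value of Elasticsearch's index.max_result_window setting, which your index may override:

```java
// Hypothetical helper, not part of the original project: converts a 1-based
// page number and a page size into the "from" offset used for paging.
final class Paging {
    // Default of index.max_result_window; an assumption about the index settings.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    private Paging() {
    }

    // Offset of the first document on the given page (page numbers start at 1).
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing
        return Math.multiplyExact(page - 1, size);
    }

    // True when from + size stays inside the default result window.
    static boolean withinDefaultWindow(int page, int size) {
        return (long) from(page, size) + size <= DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

With a page size of 5, page 1 starts at offset 0 and page 3 at offset 10. A request whose from + size exceeds the window is rejected by Elasticsearch, so a check like withinDefaultWindow can be made before sending it.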

The results are returned in descending order of price.

3.6. Highlight

The highlight code differs from the previous code in two ways:

  • Query DSL: in addition to the query condition, you need to add a highlight condition, which sits at the same level as query.
  • Result parsing: in addition to parsing the _source document data, you also need to parse the highlight result.

3.6.1. Highlight request build

The highlight request is built by passing a HighlightBuilder to request.source().highlighter().

Don't forget the query condition: highlighting only works with a full-text query that carries search keywords, because those keywords are what gets highlighted.

The complete code is as follows:

@Test
void testHighlight() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchQuery("all", "如家"));
    // 2.2. highlight the name field, even though the query searched the all field
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);
}

3.6.2. Analysis of highlighted results

By default the highlight result is returned separately from the document _source.

So parsing it requires some additional processing.

Code interpretation:

  • Step 1: Get the source from the hit with hit.getSourceAsString(). This is the non-highlighted result, a JSON string, which still needs to be deserialized into a HotelDoc object
  • Step 2: Get the highlight result with hit.getHighlightFields(). The return value is a Map whose key is the highlighted field name and whose value is a HighlightField object holding the highlighted value
  • Step 3: Look up the HighlightField in the map by the highlighted field name
  • Step 4: Get the fragments from the HighlightField and convert them to strings. This is the actual highlighted text
  • Step 5: Replace the non-highlighted value in HotelDoc with the highlighted one
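
The lookup-and-replace logic in steps 2 to 5 can be sketched without the Elasticsearch client. In this plain-Java stand-in, a Map<String, List<String>> plays the role of hit.getHighlightFields(), and HighlightMerge is a hypothetical name. The real client returns HighlightField objects, so treat this only as an illustration of the null and empty checks:

```java
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for steps 2 to 5. The map models the highlight fields of
// one hit: key = field name, value = that field's highlighted fragments.
// Hypothetical helper for illustration; not the Elasticsearch client API.
final class HighlightMerge {
    private HighlightMerge() {
    }

    // Returns the first highlighted fragment for the field, or the original
    // value when the field was not highlighted for this hit.
    static String pick(Map<String, List<String>> highlights, String field, String original) {
        if (highlights == null || highlights.isEmpty()) {
            return original;
        }
        List<String> fragments = highlights.get(field);
        if (fragments == null || fragments.isEmpty()) {
            return original;
        }
        return fragments.get(0);
    }
}
```

Falling back to the original value matters because a hit can match the query without any highlighted fragment in the requested field.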

The complete code is as follows:

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the hits
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize into a HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Get the highlight result
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // Look up the highlight result by field name
            HighlightField highlightField = highlightFields.get("name");
            if (highlightField != null) {
                // Get the highlighted value
                String name = highlightField.getFragments()[0].string();
                // Overwrite the non-highlighted value
                hotelDoc.setName(name);
            }
        }
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

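Note: CollectionUtils.isEmpty must be imported from a package whose CollectionUtils accepts a Map. The screenshot naming the package is not reproduced here; Spring's org.springframework.util.CollectionUtils is one class that provides isEmpty(Map).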

4. Dark horse tourism case

Below, we use the dark horse tourism case to practice what we have learned so far.

We implement four functions:

  • Hotel search and pagination
  • Hotel result filtering
  • Hotels near me
  • Hotel bidding ranking

Start the hotel-demo project we provide. Its default port is given as 8089, although the link in these notes is http://localhost:8090; open whichever port your copy is configured with and you will see the project page.

Open the browser console with F12 and you will see an error, because the backend code for the list request has not been written yet.

4.1. Hotel search and pagination

Case requirements: implement the hotel search function of dark horse tourism, supporting keyword search and pagination

4.1.1. Demand Analysis

On the home page of the project, there is a large search box and a set of pagination buttons.

Click the search button and the browser console shows that a request has been sent. From the request and its parameters, we can see the following:

  • Request method: POST
  • Request path: /hotel/list
  • Request parameter: JSON object, including 4 fields:
    • key: search keyword
    • page: page number
    • size: size of each page
    • sortBy: sorting, currently not implemented
  • Return value: a paged result, PageResult, which contains two attributes:
    • total: total number of hits
    • List<HotelDoc>: data of the current page

Therefore, our implementation steps are as follows:

  • Step 1: Define entity classes to receive the JSON request parameters and to carry the response
  • Step 2: Write a controller to receive the page request
  • Step 3: Write the business implementation, using RestHighLevelClient to search and page

4.1.2. Define entity classes

There are two entity classes: one for the front-end request parameters, and one for the response the server returns.

1) Request parameters

The JSON structure of the front-end request is as follows:

{
    "key": "search keyword",
    "page": 1,
    "size": 3,
    "sortBy": "default"
}

Therefore, we define an entity class under the cn.itcast.hotel.pojo package:

package cn.itcast.hotel.pojo;

import lombok.Data;

@Data
public class RequestParams {
    private String key;
    private Integer page;
    private Integer size;
    private String sortBy;
}

2) Return value

The paged query returns a PageResult, which contains two attributes:

  • total: total number of hits
  • List<HotelDoc>: data of the current page

Therefore, we define the return type in cn.itcast.hotel.pojo:

package cn.itcast.hotel.pojo;

import lombok.Data;

import java.util.List;

@Data
public class PageResult {
    private Long total;
    private List<HotelDoc> hotels;

    public PageResult() {
    }

    public PageResult(Long total, List<HotelDoc> hotels) {
        this.total = total;
        this.hotels = hotels;
    }
}

4.1.3. Define the controller

Define a HotelController and declare the query endpoint, which must meet the following requirements:

  • Request method: POST
  • Request path: /hotel/list
  • Request parameter: an object of type RequestParams
  • Return value: PageResult, which contains two attributes
    • Long total: total number of hits
    • List<HotelDoc> hotels: hotel data

So we define HotelController in cn.itcast.hotel.web:

@RestController
@RequestMapping("/hotel")
public class HotelController {

    @Autowired
    private IHotelService hotelService;

    // Search hotel data
    @PostMapping("/list")
    public PageResult search(@RequestBody RequestParams params){
        return hotelService.search(params);
    }
}

4.1.4. Realize search service

The controller calls IHotelService.search, but that method does not exist yet, so we declare it in IHotelService and then implement the business logic.

1) Declare the method in the IHotelService interface under cn.itcast.hotel.service:

/**
 * Search hotels by keyword
 * @param params request parameter object, containing the keyword entered by the user
 * @return paged result containing the total count and the hotel documents of the current page
 */
PageResult search(RequestParams params);

2) The search business needs a RestHighLevelClient, so we register it in Spring as a bean. Declare this bean in HotelDemoApplication under cn.itcast.hotel:

@Bean
public RestHighLevelClient client(){
    return  new RestHighLevelClient(RestClient.builder(
        HttpHost.create("http://192.168.150.101:9200")
    ));
}

3) Implement the search method in HotelService under cn.itcast.hotel.service.impl:

@Override
public PageResult search(RequestParams params) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. query: match_all when no keyword is given, otherwise match on the all field
        String key = params.getKey();
        if (key == null || "".equals(key)) {
            request.source().query(QueryBuilders.matchAllQuery());
        } else {
            request.source().query(QueryBuilders.matchQuery("all", key));
        }

        // 2.2. paging
        int page = params.getPage();
        int size = params.getSize();
        request.source().from((page - 1) * size).size(size);

        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        return handleResponse(response);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

// Result parsing
private PageResult handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Total number of hits
    long total = searchHits.getTotalHits().value;
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the hits
    List<HotelDoc> hotels = new ArrayList<>();
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize into a HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Add to the list
        hotels.add(hotelDoc);
    }
    // 4.4. Wrap and return
    return new PageResult(total, hotels);
}

Restart the Spring Boot application and search again to check the results.

4.2. Hotel result filtering

Requirements: add filters for brand, city, star rating, price, and so on.

4.2.1. Demand Analysis

Below the search box on the page there are some filter items. When one is selected, the front end sends the following filter parameters with the request:

  • brand: brand value
  • city: city
  • minPrice~maxPrice: price range
  • starName: star rating

We need to do two things:

  • Modify the request parameter object RequestParams to receive the parameters above
  • Modify the business logic to add filter conditions in addition to the search condition

4.2.2. Modify entity class

Modify the entity class RequestParams under the cn.itcast.hotel.pojo package:

@Data
public class RequestParams {
    private String key;
    private Integer page;
    private Integer size;
    private String sortBy;
    // Newly added filter parameters
    private String city;
    private String brand;
    private String starName;
    private Integer minPrice;
    private Integer maxPrice;
}

4.2.3. Modify search service

In the search method of HotelService, only one place needs to change: the query condition in request.source().query( ... ).

Previously there was only a match query on the keyword. Now we need to add filter conditions:

  • Brand filter: keyword type, use a term query
  • Star filter: keyword type, use a term query
  • Price filter: numeric type, use a range query
  • City filter: keyword type, use a term query

Combining multiple query conditions requires a boolean query:

  • Put the keyword search in must, so it participates in scoring
  • Put the other filter conditions in filter, so they do not participate in scoring

Because the condition-building logic is fairly involved, we first extract it into a method, buildBasicQuery(params, request), which search calls in place of the inline query code.

The code for buildBasicQuery is as follows:

private void buildBasicQuery(RequestParams params, SearchRequest request) {
    // 1. Create the bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // 2. Keyword search
    String key = params.getKey();
    if (key == null || "".equals(key)) {
        boolQuery.must(QueryBuilders.matchAllQuery());
    } else {
        boolQuery.must(QueryBuilders.matchQuery("all", key));
    }
    // 3. City condition
    if (params.getCity() != null && !params.getCity().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("city", params.getCity()));
    }
    // 4. Brand condition
    if (params.getBrand() != null && !params.getBrand().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("brand", params.getBrand()));
    }
    // 5. Star rating condition
    if (params.getStarName() != null && !params.getStarName().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("starName", params.getStarName()));
    }
    // 6. Price range (applied only when both bounds are present)
    if (params.getMinPrice() != null && params.getMaxPrice() != null) {
        boolQuery.filter(QueryBuilders
                         .rangeQuery("price")
                         .gte(params.getMinPrice())
                         .lte(params.getMaxPrice())
                        );
    }
    // 7. Put the query into the source
    request.source().query(boolQuery);
}
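
The pattern in buildBasicQuery, adding a filter clause only when its parameter has a value, can be tried out without a running cluster. The following is an in-memory stand-in using java.util.function.Predicate, not the Elasticsearch API. FilterSketch, Hotel and hasText are hypothetical names introduced only for this illustration:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory stand-in for the filter clauses of the bool query.
// All names here are hypothetical and exist only for illustration.
final class FilterSketch {
    static final class Hotel {
        final String city;
        final String brand;
        final int price;

        Hotel(String city, String brand, int price) {
            this.city = city;
            this.brand = brand;
            this.price = price;
        }
    }

    private FilterSketch() {
    }

    // Same check as the repeated "!= null && !equals("")" tests above.
    static boolean hasText(String s) {
        return s != null && !s.isEmpty();
    }

    // Builds one combined predicate, adding a condition only when its parameter
    // has a value. As in the original method, the price range applies only when
    // both bounds are present.
    static Predicate<Hotel> build(String city, String brand, Integer minPrice, Integer maxPrice) {
        Predicate<Hotel> filter = h -> true; // no conditions yet: everything passes
        if (hasText(city)) {
            filter = filter.and(h -> city.equals(h.city));
        }
        if (hasText(brand)) {
            filter = filter.and(h -> brand.equals(h.brand));
        }
        if (minPrice != null && maxPrice != null) {
            filter = filter.and(h -> h.price >= minPrice && h.price <= maxPrice);
        }
        return filter;
    }

    // Applies the combined predicate to a list of hotels.
    static List<Hotel> apply(List<Hotel> hotels, Predicate<Hotel> filter) {
        return hotels.stream().filter(filter).collect(Collectors.toList());
    }
}
```

One consequence worth noticing: when only minPrice (or only maxPrice) is supplied, no price condition is added at all, which mirrors the behavior of the method above.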


4.3. Hotels around me

Requirement: show the hotels near me

4.3.1. Demand analysis

On the right side of the hotel list page there is a small map. Click the location button on the map and it will find your position.

The front end then sends a query request that includes your coordinates to the server.

What we need to do is sort the surrounding hotels by distance from these coordinates. The approach is:

  • Modify RequestParams to receive the location field
  • Modify the search method: if location has a value, add sorting by geo_distance

4.3.2. Modify entity class

Modify the entity class RequestParams under the cn.itcast.hotel.pojo package:

package cn.itcast.hotel.pojo;

import lombok.Data;

@Data
public class RequestParams {
    private String key;
    private Integer page;
    private Integer size;
    private String sortBy;
    private String city;
    private String brand;
    private String starName;
    private Integer minPrice;
    private Integer maxPrice;
    // My current geographic coordinates
    private String location;
}

4.3.3. Distance sorting API

We have previously learned two kinds of sorting:

  • Ordinary field sorting
  • Sorting by geographic coordinates

So far we have only covered the Java code for ordinary field sorting. For geographic coordinate sorting we have only seen the DSL syntax, which is as follows:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": "asc"
    },
    {
      "_geo_distance" : {
          "FIELD" : "latitude,longitude",
          "order" : "asc",
          "unit" : "km"
      }
    }
  ]
}

The corresponding Java code uses SortBuilders.geoDistanceSort(field, point) together with order(...) and unit(...), as the full search method in the next section shows.

4.3.4. Add distance sorting

In the search method of HotelService under cn.itcast.hotel.service.impl, add the sorting logic.

Full code:

@Override
public PageResult search(RequestParams params) {
    try {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. query
        buildBasicQuery(params, request);

        // 2.2. paging
        int page = params.getPage();
        int size = params.getSize();
        request.source().from((page - 1) * size).size(size);

        // 2.3. sort by distance when the client sent its location
        String location = params.getLocation();
        if (location != null && !location.equals("")) {
            request.source().sort(SortBuilders
                                  .geoDistanceSort("location", new GeoPoint(location))
                                  .order(SortOrder.ASC)
                                  .unit(DistanceUnit.KILOMETERS)
                                 );
        }

        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        return handleResponse(response);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

4.3.5. Sorting distance display

After restarting the service, test the hotels-near-me function.

The hotels are indeed sorted by proximity, but the page does not show how far away each hotel is. What should we do?

After sorting, the page also needs the actual distance value for each hotel. In the response this value is returned separately from _source, in each hit's sort array.

Therefore, when parsing the result, in addition to the source we also need to read the sort value, which is the distance used for sorting, and put it into the response.
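
To get a feel for what that sort value represents, here is a minimal sketch that computes the distance in kilometers between two "latitude,longitude" strings with the haversine formula. GeoSketch is a hypothetical helper and the spherical Earth radius is an assumption, so Elasticsearch's own calculation may differ slightly:

```java
// Hypothetical helper for illustration only: approximates the kind of value a
// geo-distance sort reports, using the haversine formula on a spherical Earth.
final class GeoSketch {
    // Mean Earth radius in kilometers (assumption: spherical model).
    static final double EARTH_RADIUS_KM = 6371.0;

    private GeoSketch() {
    }

    // Parses a "latitude,longitude" string such as "31.03, 121.61".
    static double[] parse(String location) {
        String[] parts = location.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"lat,lon\" but got: " + location);
        }
        return new double[]{
            Double.parseDouble(parts[0].trim()),
            Double.parseDouble(parts[1].trim())
        };
    }

    // Great-circle distance in kilometers between two "lat,lon" strings.
    static double distanceKm(String from, String to) {
        double[] a = parse(from);
        double[] b = parse(to);
        double dLat = Math.toRadians(b[0] - a[0]);
        double dLon = Math.toRadians(b[1] - a[1]);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(a[0])) * Math.cos(Math.toRadians(b[0]))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
    }
}
```

The string format matches the location field that HotelDoc builds from latitude and longitude, so the same value the front end sends can be fed to parse.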

We do two things:

  • Modify HotelDoc and add a distance field for display on the page
  • Modify the handleResponse method in the HotelService class to read the sort value

1) Modify the HotelDoc class and add a distance field

package cn.itcast.hotel.pojo;

import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
public class HotelDoc {
    private Long id;
    private String name;
    private String address;
    private Integer price;
    private Integer score;
    private String brand;
    private String city;
    private String starName;
    private String business;
    private String location;
    private String pic;
    // Distance value returned when sorting by distance
    private Object distance;

    public HotelDoc(Hotel hotel) {
        this.id = hotel.getId();
        this.name = hotel.getName();
        this.address = hotel.getAddress();
        this.price = hotel.getPrice();
        this.score = hotel.getScore();
        this.brand = hotel.getBrand();
        this.city = hotel.getCity();
        this.starName = hotel.getStarName();
        this.business = hotel.getBusiness();
        this.location = hotel.getLatitude() + ", " + hotel.getLongitude();
        this.pic = hotel.getPic();
    }
}

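2) Modify the handleResponse method in HotelService: for each hit, call hit.getSortValues(); if the returned array is not empty, take its first element (the distance, since the geo-distance sort is the only sort we added) and store it with hotelDoc.setDistance(...)

After restarting and testing again, the page displays the distance successfully.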

4.4. Hotel bidding ranking

Requirement: make specified hotels rank at the top of the search results

4.4.1. Demand analysis

We want specified hotels to rank at the top of the search results, and the page adds an ad tag to those hotels.

So how can we make the designated hotels rank at the top?

The function_score query we learned earlier can change the relevance score, and the higher the score, the higher the ranking. function_score contains 3 elements:

  • Filter condition: which documents should get extra points
  • Scoring function: how to calculate the function score
  • Weighting mode: how to combine the function score with the query score

The requirement here is to make designated hotels rank high. Therefore, we need to mark these hotels, so that the filter condition can use the mark to decide whether to raise the score.

For example, we add a Boolean field isAD to the hotel:

  • true: is an advertisement
  • false: not an advertisement

With that, the 3 elements of function_score are easy to determine:

  • Filter condition: check whether isAD is true
  • Scoring function: we can use the simplest and bluntest one, weight, which is a fixed weighting value
  • Weighting mode: we can use the default, multiply, which raises the score considerably
Therefore, the implementation steps are:

  1. Add a Boolean isAD field to the HotelDoc class
  2. Pick a few hotels you like and add the isAD field with the value true to their document data
  3. Modify the search method to add the function_score query and give extra weight to hotels whose isAD value is true

4.4.2. Modify the HotelDoc entity

Add a Boolean field named isAD to the HotelDoc class under the cn.itcast.hotel.pojo package.

4.4.3. Adding Ad Marks

Next, we pick a few hotels, add the isAD field, and set it to true:

# Add the ad flag
POST /hotel/_update/36934
{
  "doc": {
    "isAD" : true
  }
}

POST /hotel/_update/38609
{
  "doc": {
    "isAD" : true
  }
}

POST /hotel/_update/38665
{
  "doc": {
    "isAD" : true
  }
}

POST /hotel/_update/47478
{
  "doc": {
    "isAD" : true
  }
}

4.4.4. Add scoring function query

Next, we modify the query condition. Previously we used a boolean query; now it needs to be wrapped in a function_score query.

In the Java API, the function_score query is built with QueryBuilders.functionScoreQuery(...), which takes the original query and an array of FunctionScoreQueryBuilder.FilterFunctionBuilder elements, each pairing a filter condition with a scoring function.

We can pass the previously written boolean query as the original query, and then add the filter condition, scoring function, and weighting mode. So the original code can still be used.

Modify the buildBasicQuery method in the HotelService class under the cn.itcast.hotel.service.impl package and add the function_score query:

private void buildBasicQuery(RequestParams params, SearchRequest request) {
    // 1. Create the bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // Keyword search
    String key = params.getKey();
    if (key == null || "".equals(key)) {
        boolQuery.must(QueryBuilders.matchAllQuery());
    } else {
        boolQuery.must(QueryBuilders.matchQuery("all", key));
    }
    // City condition
    if (params.getCity() != null && !params.getCity().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("city", params.getCity()));
    }
    // Brand condition
    if (params.getBrand() != null && !params.getBrand().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("brand", params.getBrand()));
    }
    // Star rating condition
    if (params.getStarName() != null && !params.getStarName().equals("")) {
        boolQuery.filter(QueryBuilders.termQuery("starName", params.getStarName()));
    }
    // Price range (applied only when both bounds are present)
    if (params.getMinPrice() != null && params.getMaxPrice() != null) {
        boolQuery.filter(QueryBuilders
                         .rangeQuery("price")
                         .gte(params.getMinPrice())
                         .lte(params.getMaxPrice())
                        );
    }

    // 2. Score control
    FunctionScoreQueryBuilder functionScoreQuery =
        QueryBuilders.functionScoreQuery(
        // Original query, which produces the relevance score
        boolQuery,
        // Array of function score elements
        new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{
            // One function score element
            new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                // Filter condition
                QueryBuilders.termQuery("isAD", true),
                // Scoring function
                ScoreFunctionBuilders.weightFactorFunction(10)
            )
        });
    request.source().query(functionScoreQuery);
}


Origin blog.csdn.net/sinat_38316216/article/details/129724704