Elasticsearch: DSL queries, RestClient document queries, and search result processing

1. DSL document queries

Elasticsearch queries are written in a JSON-style DSL.

 

1.1. DSL query classification

Elasticsearch provides a JSON-based DSL (Domain Specific Language) for defining queries. Common query types include:

  • Match-all query: returns all documents, generally used for testing. For example: match_all

  • Full-text query: uses an analyzer to split the user's input into terms, then matches those terms against the inverted index. For example:

    • match

    • multi_match

  • Precise query: finds documents by exact term values, generally on keyword, numeric, date, boolean and similar field types. For example:

    • ids

    • range

    • term

  • Geographic (geo) query: queries by latitude and longitude. For example:

    • geo_distance

    • geo_bounding_box

  • Compound query: combines the query conditions above into more complex logic. For example:

    • bool

    • function_score

The query syntax is basically the same for all of them:

GET /indexName/_search
{
  "query": {
    "QUERY_TYPE": {
      "QUERY_CONDITION": "CONDITION_VALUE"
    }
  }
}

Take the match-all query as an example, where:

  • The query type is match_all

  • There is no query condition

// Query all documents
GET /indexName/_search
{
  "query": {
    "match_all": {
    }
  }
}

Other queries only change the query type and the query conditions.

 

1.2. Full-text search query

Usage scenarios


A full-text query basically proceeds as follows:

  • Analyze the user's search input into terms

  • Match those terms against the inverted index to get document ids

  • Fetch the documents by id and return them to the user
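The three steps above can be sketched with a tiny in-memory inverted index. This is only an illustration in plain Java: the class InvertedIndexSketch, its analyze method (a lowercase split on non-alphanumeric characters standing in for a real analyzer), and the sample hotel names are all assumptions, not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> index = new HashMap<>();
    // id -> original document text
    private final Map<Integer, String> documents = new HashMap<>();

    // Hypothetical analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Index a document: store it and register its id under each of its terms.
    void add(int id, String text) {
        documents.put(id, text);
        for (String term : analyze(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    // Step 1: analyze the input. Step 2: look up ids per term. Step 3: fetch documents by id.
    List<String> search(String userInput) {
        Set<Integer> ids = new TreeSet<>();
        for (String term : analyze(userInput)) {
            ids.addAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(documents.get(id));
        }
        return hits;
    }
}
```

A document is returned here if it contains any one of the input terms, and no relevance scoring is attempted.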

Common scenarios include:

  • The search box of a shopping site

  • The Baidu search box

JD.com's search box is one example.

Because matching is done on terms, the fields being searched must also be text fields that can be analyzed.


Basic syntax


Common full-text queries include:

  • match query: single-field query

  • multi_match query: multi-field query; a document matches if any one of the fields meets the condition

The match query syntax is as follows:

GET /indexName/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  }
}

The multi_match syntax is as follows:

GET /indexName/_search
{
  "query": {
    "multi_match": {
      "query": "TEXT",
      "fields": ["FIELD1", "FIELD2"]
    }
  }
}

Example


Running a match query on the all field and a multi_match query on the brand, name, and business fields returns the same results. Why?

Because the brand, name, and business values were copied into the all field using copy_to. Searching the three fields therefore has the same effect as searching the all field.

However, the more fields a search covers, the greater the impact on query performance, so it is recommended to use copy_to and then query a single field.


Summary

What is the difference between match and multi_match?

  • match: queries a single field

  • multi_match: queries multiple fields; the more fields are involved, the worse the query performance

 

1.3. Precise query

Precise queries generally search keyword, numeric, date, boolean and similar field types, so the search condition is not analyzed. Common ones are:

  • term: query by exact term value

  • range: query by a range of values

term query


Because precise queries search fields that are not analyzed, the query condition must also be a single unanalyzed term. A document matches only when the user's input is exactly equal to the field value. If the user enters more than that, nothing is found.


Syntax:

// term query
GET /indexName/_search
{
  "query": {
    "term": {
      "FIELD": {
        "value": "VALUE"
      }
    }
  }
}

Example:

Searching for an exact term returns the expected results.

However, when the search input is not a single term but a phrase made of several terms, nothing is found.


range query


Range queries are generally used to filter numeric values by range, for example by price.


Basic syntax:

// range query
GET /indexName/_search
{
  "query": {
    "range": {
      "FIELD": {
        "gte": 10, // gte means greater than or equal to; gt means greater than
        "lte": 20 // lte means less than or equal to; lt means less than
      }
    }
  }
}

Example:


Summary


What are the common types of precise query?

  • term query: exact match on a term, generally used on keyword, numeric, boolean, and date fields

  • range query: query by a range of values, which can be numeric or date ranges

 

1.4. Geographical coordinate query

A geographic coordinate query is a query based on latitude and longitude. Official documentation: Geo queries, Elasticsearch Guide [8.7].

Common usage scenarios include:

  • Ctrip: Search Hotels Near Me

  • Didi: Find taxis near me

  • WeChat: Search People Near Me

Nearby hotels:

Nearby cars:

Rectangular range query


A rectangular range query, that is, a geo_bounding_box query, finds all documents whose coordinates fall within a given rectangle.


When querying, you specify the coordinates of the rectangle's top-left and bottom-right corners. Every point that falls inside the resulting rectangle matches.


The syntax is as follows:

// geo_bounding_box query
GET /indexName/_search
{
  "query": {
    "geo_bounding_box": {
      "FIELD": {
        "top_left": { // top-left corner
          "lat": 31.1,
          "lon": 121.5
        },
        "bottom_right": { // bottom-right corner
          "lat": 30.9,
          "lon": 121.7
        }
      }
    }
  }
}
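The matching rule of the rectangle can be sketched in a few lines of plain Java. This is only an illustration of the geometry, assuming a box that does not cross the 180-degree meridian; the class and method names are hypothetical and this is not how Elasticsearch evaluates the query internally.

```java
class BoundingBoxSketch {
    // Returns true when the point (lat, lon) lies inside the rectangle defined by
    // its top-left corner (topLat, leftLon) and bottom-right corner (bottomLat, rightLon).
    // Assumption: the rectangle does not cross the 180-degree meridian.
    static boolean contains(double topLat, double leftLon,
                            double bottomLat, double rightLon,
                            double lat, double lon) {
        boolean latitudeInside = lat <= topLat && lat >= bottomLat;
        boolean longitudeInside = lon >= leftLon && lon <= rightLon;
        return latitudeInside && longitudeInside;
    }
}
```

With the corners from the syntax above, a point at 31.0, 121.6 is inside the box, while a point at 31.2, 121.6 lies north of it.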

Distance query


A distance query (geo_distance), also called a nearby query, finds all documents whose distance from a specified center point is less than a given value.


In other words, pick a point on the map as the center, draw a circle with the specified distance as its radius, and every coordinate that falls inside the circle matches.


Syntax:

// geo_distance query
GET /indexName/_search
{
  "query": {
    "geo_distance": {
      "distance": "15km", // radius
      "FIELD": "31.21,121.5" // center point
    }
  }
}
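The circle test can be sketched in plain Java with the haversine great-circle formula. This is an illustration only: the class name, the method names, and the mean Earth radius of 6371 km are assumptions, and Elasticsearch's own distance calculation may differ in detail.

```java
class GeoDistanceSketch {
    // Mean Earth radius in kilometers (an approximation).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees, using the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // A point matches when it lies within radiusKm of the center.
    static boolean withinDistance(double centerLat, double centerLon,
                                  double lat, double lon, double radiusKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }
}
```

For the center 31.21, 121.5 used above, a point 0.01 degrees of latitude further north is roughly 1.1 km away and falls inside a 15 km circle.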

Example:

Searching for hotels within 15 km of Lujiazui finds 47 hotels in total.


Shortening the radius to 3 km reduces the number of hotels found to 5.

   

1.5. Compound query

Compound queries combine other, simpler queries to implement more complex search logic. Two are common:

  • function_score: score function query, which controls how document relevance is calculated and therefore how documents are ranked

  • bool query: Boolean query, which combines multiple other queries using logical relationships to implement complex searches


1.5.1. Relevance score

When we use match query, the document results will be scored (_score) according to the relevance to the search term, and the returned results will be sorted in descending order of the score.

For example, if we search for "Hongqiao Home Inn", the results are as follows:

[
  {
    "_score" : 17.850193,
    "_source" : {
      "name" : "虹桥如家酒店真不错"
    }
  },
  {
    "_score" : 12.259849,
    "_source" : {
      "name" : "外滩如家酒店真不错"
    }
  },
  {
    "_score" : 11.91091,
    "_source" : {
      "name" : "迪士尼如家酒店真不错"
    }
  }
]

Early versions of Elasticsearch scored documents with the TF-IDF algorithm.

Starting with version 5.0, Elasticsearch switched its default scoring algorithm to BM25.

TF-IDF has a flaw: the higher the term frequency, the higher the document score, so a single term can have an outsized effect on a document. BM25 puts an upper limit on the score contribution of a single term, so its curve flattens out as the term frequency grows.
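The saturation effect can be sketched by comparing only the term-frequency factors of the two algorithms. The snippet below is a simplified illustration, not the complete scoring formula: classicTf is the square-root factor used by Lucene's classic TF-IDF similarity, bm25Tf is the textbook BM25 factor, and the parameter values k1 = 1.2 and b = 0.75 used in the note are the commonly cited defaults. The class and method names are hypothetical.

```java
class TermFrequencySketch {
    // Term-frequency factor of Lucene's classic TF-IDF similarity: grows without bound.
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    // Textbook BM25 term-frequency factor: approaches (k1 + 1) but never reaches it.
    // docLen and avgDocLen are the field length and the average field length.
    static double bm25Tf(double freq, double k1, double b, double docLen, double avgDocLen) {
        double lengthNorm = k1 * (1 - b + b * docLen / avgDocLen);
        return freq * (k1 + 1) / (freq + lengthNorm);
    }
}
```

For a document of average length, raising the term frequency from 1 to 100 raises classicTf from 1 to 10, while bm25Tf with k1 = 1.2 and b = 0.75 only moves from 1.0 to about 2.17 and stays below 2.2.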

Summary: Elasticsearch scores documents by the relevance between terms and documents. There are two algorithms:

  • TF-IDF algorithm

  • BM25 algorithm, the default since Elasticsearch 5.0

   

1.5.2. Calculation function query

Scoring by relevance is a reasonable requirement, but what is reasonable is not always what product managers want.

Take Baidu as an example: in its search results, higher relevance does not necessarily mean a higher ranking. Whoever pays more ranks higher.

To control the relevance score yourself, use the function score query in Elasticsearch.

1) Syntax description


A function score query contains four parts:

  • Original query condition: the query part. Documents are searched by this condition and scored with the BM25 algorithm, giving the original score (query score)

  • Filter condition: the filter part. Only documents that meet this condition have their score recalculated

  • Score function: documents that meet the filter condition are run through this function to get the function score. There are four functions:

    • weight: the function result is a constant

    • field_value_factor: uses the value of a field in the document as the function result

    • random_score: uses a random number as the function result

    • script_score: a custom scoring algorithm

  • Boost mode: how the function score and the original query score are combined, including:

    • multiply: multiply the two

    • replace: replace the query score with the function score

    • Others, such as: sum, avg, max, min

A function score query runs as follows:

  • 1) Search documents by the original condition and calculate the relevance score, called the original score (query score)

  • 2) Filter documents by the filter condition

  • 3) For documents that meet the filter condition, calculate the function score with the score function

  • 4) Combine the original score (query score) and the function score according to the boost mode to get the final relevance score

So the key points are:

  • Filter condition: determines which documents have their scores modified

  • Score function: determines how the function score is calculated

  • Boost mode: determines how the final score is calculated
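The four steps can be sketched for a single document in plain Java. This is a simplified model of the description above, assuming one weight function and leaving documents that fail the filter with their query score; it is not Elasticsearch's internal implementation, and the class and method names are hypothetical.

```java
class FunctionScoreSketch {
    // Step 4: combine the query score and the function score according to the boost mode.
    static double combine(double queryScore, double functionScore, String boostMode) {
        switch (boostMode) {
            case "multiply": return queryScore * functionScore;
            case "replace":  return functionScore;
            case "sum":      return queryScore + functionScore;
            case "avg":      return (queryScore + functionScore) / 2;
            case "max":      return Math.max(queryScore, functionScore);
            case "min":      return Math.min(queryScore, functionScore);
            default: throw new IllegalArgumentException("unknown boost mode: " + boostMode);
        }
    }

    // Steps 2 to 4 for one document whose query score (step 1) is already known.
    // The score function here is a constant weight.
    static double finalScore(double queryScore, boolean matchesFilter, double weight, String boostMode) {
        if (!matchesFilter) {
            // Simplification: documents outside the filter keep their query score.
            return queryScore;
        }
        return combine(queryScore, weight, boostMode);
    }
}
```

For example, a document with a query score of 10 that passes the filter ends up at 12 with a weight of 2 and the sum mode, and at 20 with the multiply mode.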


2) Example


Requirement: rank hotels of the "Home Inn" brand higher

Translate this requirement into the four parts mentioned above:

  • Original condition: not fixed, can be anything

  • Filter condition: brand = "Home Inn"

  • Score function: keep it simple and give a fixed result directly, with weight

  • Boost mode: for example, sum

So the final DSL statement is as follows:

GET /hotel/_search
{
  "query": {
    "function_score": {
      "query": {  .... }, // original query, can be any condition
      "functions": [ // score functions
        {
          "filter": { // condition to meet: the brand must be 如家 (Home Inn)
            "term": {
              "brand": "如家"
            }
          },
          "weight": 2 // score weight of 2
        }
      ],
      "boost_mode": "sum" // boost mode: sum
    }
  }
}

In testing, the scores of Home Inn hotels are higher after the score function is added than before.


3) Summary


What are the three elements of a function score query?

  • Filter condition: which documents get extra points

  • Score function: how the function score is calculated

  • Boost mode: how the function score and the query score are combined

   

1.5.3. Boolean query

A Boolean query combines one or more query clauses, each of which is a subquery. Subqueries can be combined in the following ways:

  • must: every subquery must match, similar to "and"

  • should: subqueries match optionally, similar to "or"

  • must_not: must not match, does not participate in scoring, similar to "not"

  • filter: must match, does not participate in scoring

For example, when searching for hotels, besides the keyword search we may also filter on fields such as brand, price, and city.

Each field has its own query conditions and query type, so several different queries are needed. To combine them, use a bool query.

Note that the more fields participate in scoring, the worse the query performance. For multi-condition queries it is therefore recommended to:

  • Use a full-text query in must for the keyword search from the search box, so that it participates in scoring

  • Use filter for the other filter conditions, so that they do not participate in scoring

Syntax example


GET /hotel/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"city": "上海" }}
      ],
      "should": [
        {"term": {"brand": "皇冠假日" }},
        {"term": {"brand": "华美达" }}
      ],
      "must_not": [
        { "range": { "price": { "lte": 500 } }}
      ],
      "filter": [
        { "range": {"score": { "gte": 45 } }}
      ]
    }
  }
}

Example


Requirement: search for hotels whose name contains "Home Inn", whose price is not higher than 400, and that lie within 10 km of the coordinates 31.21, 121.5.

Analysis:

  • The name search is a full-text query and should participate in scoring, so it goes in must

  • The price limit is a filter condition and does not participate in scoring. Put a range query for prices greater than 400 in must_not, which excludes every hotel above 400

  • The 10 km limit is a filter condition queried with geo_distance and does not participate in scoring, so it goes in filter
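How the three clauses combine can be sketched as a plain Java predicate. This models only the matching logic of the example (not scoring, and not the Elasticsearch API); the class name, method name, and the precomputed distanceKm parameter are assumptions for illustration.

```java
class BoolQuerySketch {
    // Hypothetical in-memory check that mirrors the three clauses of the example.
    static boolean matches(String name, int price, double distanceKm) {
        boolean must = name.contains("Home Inn"); // must: name search, scored in Elasticsearch
        boolean mustNot = price > 400;            // must_not: excludes prices above 400
        boolean filter = distanceKm <= 10.0;      // filter: within 10 km, not scored
        return must && !mustNot && filter;
    }
}
```

A hotel named "Home Inn Bund" at a price of 350 and 5 km away matches. Raising the price to 450 or the distance to 12 km excludes it.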


Summary


What logical relationships does a bool query have?

  • must: conditions that must be matched, can be understood as "and"

  • should: The condition for selective matching, which can be understood as "or"

  • must_not: conditions that must not match, do not participate in scoring

  • filter: conditions that must be matched, do not participate in scoring

 

 

2. Search result processing

The search results can be processed or displayed in a way specified by the user.

 

2.1. Sorting

Elasticsearch sorts by relevance score (_score) by default, but also supports custom sorting of search results. Sortable field types include keyword, numeric, geographic coordinate, and date types.

Ordinary field sorting


The syntax for sorting by keyword, numeric, and date fields is basically the same.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "FIELD": "desc"  // sort field and sort order: asc or desc
    }
  ]
}

The sort condition is an array, so multiple sort conditions can be given. They apply in declaration order: when documents are equal on the first condition, they are sorted by the second, and so on.


Example:

Requirement: sort the hotel data by user rating (score) in descending order, and sort hotels with the same rating by price (price) in ascending order
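The effect of two sort conditions applied in declaration order can be sketched with a chained Comparator in plain Java. This is an in-memory illustration only: the Hotel class and its fields are hypothetical stand-ins for the hotel documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class HotelSortSketch {
    // Hypothetical stand-in for a hotel document.
    static final class Hotel {
        final String name;
        final int score;
        final int price;

        Hotel(String name, int score, int price) {
            this.name = name;
            this.score = score;
            this.price = price;
        }
    }

    // First condition: score descending. Second condition, used only on ties: price ascending.
    static List<Hotel> sortByScoreThenPrice(List<Hotel> hotels) {
        List<Hotel> sorted = new ArrayList<>(hotels);
        sorted.sort(Comparator.comparingInt((Hotel h) -> h.score).reversed()
                .thenComparingInt(h -> h.price));
        return sorted;
    }
}
```

Given hotels A (score 45, price 300), B (score 48, price 500), and C (score 45, price 200), the result order is B, C, A: B wins on score, and C precedes A because it is cheaper at the same score.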


Sort by geographic coordinates


Sorting by geographic coordinates is slightly different.

Syntax:

GET /indexName/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "_geo_distance" : {
          "FIELD" : "LAT,LON", // name of the geo_point field in the document, and the target point as latitude,longitude
          "order" : "asc", // sort order
          "unit" : "km" // distance unit used for sorting
      }
    }
  ]
}

The meaning of this query is:

  • Specify a coordinate as the target point

  • Calculate the distance from the coordinates of the specified field (must be geo_point type) to the target point in each document

  • Sort by distance

 

Example:

Requirement: sort the hotel data in ascending order of distance from your own location


Suppose my location is 31.034661, 121.612282 and I am looking for the nearest hotels around me.

 

2.2. Pagination

Elasticsearch returns only the top 10 documents by default. To fetch more, change the paging parameters. In Elasticsearch, the from and size parameters control which page of results is returned:

  • from: the offset of the first document to return

  • size: how many documents to return

This is similar to LIMIT ?, ? in MySQL.

Basic pagination


The basic syntax of pagination is as follows:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0, // offset where the page starts, default 0
  "size": 10, // number of documents to return
  "sort": [
    {"price": "asc"}
  ]
}

Deep pagination problem


Now suppose I want documents 990 to 1000. The query would be written as follows:

GET /hotel/_search
{
  "query": {
    "match_all": {}
  },
  "from": 990, // offset where the page starts, default 0
  "size": 10, // number of documents to return
  "sort": [
    {"price": "asc"}
  ]
}

This fetches 10 documents starting at offset 990, that is, documents 990 to 1000.

However, to page internally, Elasticsearch must first fetch documents 0 to 1000 and then cut out the 10 documents from 990 to 1000.


If Elasticsearch runs as a single node, fetching the top 1000 has little impact.

In production, however, Elasticsearch runs as a cluster. Suppose my cluster has 5 nodes and I want the top 1000 documents. Fetching 200 documents from each node is not enough.

The top 200 of node A may rank beyond position 10,000 on another node.

Therefore, to get the top 1000 of the whole cluster, you must first fetch the top 1000 from each node, then merge the results, re-rank them, and cut out the top 1000 again.
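The merge step can be sketched in plain Java. This is a simplified model under stated assumptions: each inner list stands for one node's documents (in practice, one shard), represented only by a price that is already sorted ascending; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DeepPagingSketch {
    // Returns the global page [from, from + size) across all nodes.
    // Each inner list holds one node's prices, already sorted ascending.
    static List<Integer> page(List<List<Integer>> nodes, int from, int size) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> node : nodes) {
            // Every node must contribute its own top (from + size) entries,
            // because any of them could belong to the global page.
            int limit = Math.min(from + size, node.size());
            candidates.addAll(node.subList(0, limit));
        }
        // Merge and re-rank all candidates, then cut out the requested page.
        Collections.sort(candidates);
        int start = Math.min(from, candidates.size());
        int end = Math.min(from + size, candidates.size());
        return new ArrayList<>(candidates.subList(start, end));
    }
}
```

With two nodes holding [1, 2, 3, 4] and [10, 20, 30, 40], the page at from = 2 and size = 2 is [3, 4]. If each node only contributed its top 2 entries, the merged candidates would be [1, 2, 10, 20] and the same slice would wrongly yield [10, 20].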

So what if I want documents 9900 to 10000? Then the top 10000 must be fetched first: each node has to return 10,000 documents, and all of them are merged in memory.


When the paging depth is large, too much data has to be merged, which puts heavy pressure on memory and CPU. Therefore, Elasticsearch by default rejects requests where from + size exceeds 10,000.
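The relationship between a page number and the from parameter, together with the 10,000 limit, can be sketched as follows. The helper names are hypothetical; the constant mirrors the default value of the index.max_result_window setting.

```java
class PagingSketch {
    // Default value of index.max_result_window.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Converts a 1-based page number into the from offset.
    static int from(int page, int size) {
        return (page - 1) * size;
    }

    // A request is accepted only while from + size stays within the window.
    static boolean withinWindow(int from, int size) {
        return from + size <= MAX_RESULT_WINDOW;
    }
}
```

Page 100 with a size of 10 starts at offset 990. A request with from = 9990 and size = 10 is still within the window, while from = 9991 and size = 10 is not.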

 

For deep paging, Elasticsearch provides two solutions (see the official documentation):

  • search after: requires sorting when paging. It works by fetching the next page starting from the sort values of the last hit of the previous page. This is the officially recommended approach.

  • scroll: works by taking a snapshot of the sorted document ids and keeping it in memory. It is officially no longer recommended for deep paging.
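The idea behind search after can be sketched over an in-memory sorted list. This is only a model of the principle, assuming unique, ascending sort values; a real request would normally add a tiebreaker field to the sort. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class SearchAfterSketch {
    // sortedValues: ascending, unique sort values standing in for the index.
    // lastSortValue: sort value of the last hit of the previous page, or null for the first page.
    static List<Integer> nextPage(List<Integer> sortedValues, Integer lastSortValue, int size) {
        List<Integer> page = new ArrayList<>();
        for (int value : sortedValues) {
            if (lastSortValue != null && value <= lastSortValue) {
                continue; // skip everything up to and including the previous page's last hit
            }
            page.add(value);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }
}
```

Starting with null yields the first page. Passing the last value of each page into the next call walks forward through the data one page at a time, which is why random page access is not possible with this approach.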


Summary


Common pagination approaches and their trade-offs:

  • from + size

    • Advantages: supports random page access

    • Disadvantages: deep paging problem; the default upper limit (from + size) is 10000

    • Scenario: searches with random page access, such as Baidu, JD.com, Google, and Taobao

  • search after

    • Advantages: no upper limit on the total (the size of a single query must not exceed 10000)

    • Disadvantages: can only page forward one page at a time; no random page access

    • Scenario: searches that do not need random page access, such as scrolling down on a phone

  • scroll

    • Advantages: no upper limit on the total (the size of a single query must not exceed 10000)

    • Disadvantages: extra memory consumption, and the search results are not real-time

    • Scenario: fetching and migrating massive amounts of data. Not recommended since ES 7.1; use search after instead.

 

2.3. Highlight

Highlighting principle


What is highlighting?

When we search on Baidu or JD.com, the keywords turn red so they stand out. This is called highlighting.

Highlighting is implemented in two steps:

  • 1) Wrap every keyword in the document in a tag, such as the <em> tag

  • 2) Write CSS styles for the <em> tag on the page
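Step 1 can be sketched as a simple string replacement in plain Java. This is an illustration only: it matches the keyword literally and case-sensitively, whereas Elasticsearch highlights the analyzed terms of the query. The class and method names are hypothetical.

```java
class HighlightSketch {
    // Wraps every literal occurrence of keyword in text with the given tags.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        if (keyword.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(keyword, preTag + keyword + postTag);
    }
}
```

Calling highlight("Home Inn near the Bund", "Inn", "<em>", "</em>") returns "Home <em>Inn</em> near the Bund", which the page can then style through a CSS rule for the em tag.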


Implementing highlighting


Highlight syntax:

GET /hotel/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT" // query condition; highlighting requires a full-text query
    }
  },
  "highlight": {
    "fields": { // fields to highlight
      "FIELD": {
        "pre_tags": "<em>",  // opening tag placed before each highlighted keyword
        "post_tags": "</em>" // closing tag placed after each highlighted keyword
      }
    }
  }
}

Note:

  • Highlighting applies to keywords, so the search condition must contain keywords; it cannot be a range query or similar.

  • By default, the highlighted field must be the same as the field specified in the search, otherwise it cannot be highlighted

  • To highlight fields other than the searched one, add the attribute require_field_match=false

Example :


Summary


The query DSL is a large JSON object with the following properties:

  • query: query condition

  • from and size: paging conditions

  • sort: sorting conditions

  • highlight: highlight condition

Example:



 

 

3. Querying documents with RestClient

Document queries also use the RestHighLevelClient object introduced earlier. The basic steps are:

  • 1) Prepare the Request object

  • 2) Prepare request parameters

  • 3) Initiate a request

  • 4) Parse the response

 

3.1. Quick start

Let's take the match_all query as an example.

Initiate a query request


Code interpretation:

  • Step 1: create a SearchRequest object and specify the index name

  • Step 2: use request.source() to build the DSL, which can include query, paging, sorting, highlighting, and so on

    • query(): represents the query condition; QueryBuilders.matchAllQuery() builds the DSL of a match_all query

  • Step 3: use client.search() to send the request and get the response

There are two key APIs here. One is request.source(), which covers all features such as query, sorting, paging, and highlighting.


The other is QueryBuilders, which provides the various queries such as match, term, function_score, and bool.

  


Parse the response


The result returned by Elasticsearch is a JSON string with the following structure:

  • hits: the hit results

    • total: the total count, where value is the actual total number of hits

    • max_score: the relevance score of the highest-scoring document among all results

    • hits: an array of the documents in the search result, each of which is a JSON object

      • _source: the original data of the document, also a JSON object

So parsing the response means parsing this JSON string layer by layer. The process is as follows:

  • SearchHits: obtained through response.getHits(); it is the outermost hits in the JSON and represents the hit results

    • SearchHits#getTotalHits().value: gets the total number of hits

    • SearchHits#getHits(): gets the SearchHit array, that is, the document array

      • SearchHit#getSourceAsString(): gets the _source of a document result, that is, the original JSON document data


Full code


The complete code is as follows:

@Test
void testMatchAll() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source()
        .query(QueryBuilders.matchAllQuery());
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);

    // 4. Parse the response
    handleResponse(response);
}

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the hits
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize into a HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

Summary


The basic steps of a query are:

  1. Create a SearchRequest object

  2. Prepare request.source(), that is, the DSL

    ① Use QueryBuilders to build the query condition

    ② Pass it to the query() method of request.source()

  3. Send the request and get the result

  4. Parse the result (following the JSON result, from the outside in, layer by layer)

 

3.2. match query

The match and multi_match full-text queries use basically the same API as match_all. The difference is the query condition, that is, the query part.

So the Java code differs mainly in the argument passed to request.source().query(), which is again built with the methods provided by QueryBuilders.

The result parsing code is exactly the same and can be extracted and shared.

The complete code is as follows:

@Test
void testMatch() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source()
        .query(QueryBuilders.matchQuery("all", "如家"));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);

}

 

3.3. Precise query

There are two main precise queries:

  • term: exact term match

  • range: range query

Compared with the previous queries, only the query condition differs; everything else is the same.

The query conditions are built with QueryBuilders.termQuery() and QueryBuilders.rangeQuery().

 

3.4. Boolean query

A Boolean query combines other queries with must, must_not, filter, and so on.

As with other queries, the only API difference is how the query condition is built; QueryBuilders, result parsing, and the rest of the code stay exactly the same.

The complete code is as follows:

@Test
void testBool() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. Prepare the bool query
    BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
    // 2.2. Add the term query
    boolQuery.must(QueryBuilders.termQuery("city", "杭州"));
    // 2.3. Add the range query
    boolQuery.filter(QueryBuilders.rangeQuery("price").lte(250));

    request.source().query(boolQuery);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);

}

 

3.5. Sorting and paging

Sorting and paging of search results are parameters at the same level as query, so they are also set through request.source().

The corresponding APIs are sort(), from(), and size().

Full code example:

@Test
void testPageAndSort() throws IOException {
    // Page number and page size
    int page = 1, size = 5;

    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchAllQuery());
    // 2.2. Sorting: sort
    request.source().sort("price", SortOrder.ASC);
    // 2.3. Paging: from and size (use the size variable rather than a hard-coded 5)
    request.source().from((page - 1) * size).size(size);
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);

}

 

3.6. Highlight

The highlighting code differs from the previous code in two ways:

  • Query DSL: besides the query condition, you also need to add the highlight condition, which is at the same level as query

  • Result parsing: besides parsing the _source document data, the highlighted result also has to be parsed

Highlight request build


The highlight request is built with request.source().highlighter() and a HighlightBuilder.

Don't forget that a highlight query must use a full-text query with search keywords, so that there are keywords to highlight.

The complete code is as follows:

@Test
void testHighlight() throws IOException {
    // 1. Prepare the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    // 2.1. query
    request.source().query(QueryBuilders.matchQuery("all", "如家"));
    // 2.2. Highlighting
    request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    handleResponse(response);

}

Parsing the highlighted result


By default, the highlighted result is returned separately from the document source rather than merged into it.

So parsing the highlighted result needs extra processing.

Code interpretation:

  • Step 1: get the source from the result with hit.getSourceAsString(). This is the non-highlighted result, a JSON string, and still needs to be deserialized into a HotelDoc object

  • Step 2: get the highlighted result with hit.getHighlightFields(). The return value is a Map whose key is the highlighted field name and whose value is a HighlightField object representing the highlighted value

  • Step 3: get the HighlightField object from the map by the highlighted field name

  • Step 4: get the fragments from the HighlightField and convert them to a string. This is the actual highlighted string

  • Step 5: replace the non-highlighted value in HotelDoc with the highlighted result

The complete code is as follows:

private void handleResponse(SearchResponse response) {
    // 4. Parse the response
    SearchHits searchHits = response.getHits();
    // 4.1. Get the total number of hits
    long total = searchHits.getTotalHits().value;
    System.out.println("Found " + total + " documents");
    // 4.2. Document array
    SearchHit[] hits = searchHits.getHits();
    // 4.3. Iterate over the hits
    for (SearchHit hit : hits) {
        // Get the document source
        String json = hit.getSourceAsString();
        // Deserialize into a HotelDoc
        HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
        // Get the highlighted result
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (!CollectionUtils.isEmpty(highlightFields)) {
            // Look up the highlighted result by field name
            HighlightField highlightField = highlightFields.get("name");
            if (highlightField != null) {
                // Get the highlighted value
                String name = highlightField.getFragments()[0].string();
                // Overwrite the non-highlighted value
                hotelDoc.setName(name);
            }
        }
        System.out.println("hotelDoc = " + hotelDoc);
    }
}

Origin blog.csdn.net/a1404359447/article/details/130433673