2. Search result processing
The search results can be processed or displayed in a way specified by the user.
2.1. Sorting
By default, Elasticsearch sorts results by relevance score (_score), but it also supports custom sorting of search results. Sortable field types include keyword, numeric, geographic coordinate (geo_point), and date types.
2.1.1. Ordinary field sorting
The syntax for sorting by keyword, numeric, and date fields is basically the same.
Syntax:
    GET /indexName/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "FIELD": "desc"  // sort field and sort order: asc or desc
        }
      ]
    }
The sort condition is an array, so multiple sort conditions can be given. They are applied in the order of declaration: when two documents are equal on the first condition, they are ordered by the second condition, and so on.
Example:
Requirement: sort the hotel data by user rating (score) in descending order; hotels with the same rating are sorted by price (price) in ascending order.
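To make the meaning of the two sort conditions concrete, here is a minimal plain-Java sketch. It does not call Elasticsearch at all: the Hotel class and the sample data are hypothetical, and the comparator only mirrors what a sort array with score descending followed by price ascending is expected to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MultiSortSketch {

    // Hypothetical in-memory stand-in for a hotel document.
    static final class Hotel {
        final String name;
        final int score;
        final int price;

        Hotel(String name, int score, int price) {
            this.name = name;
            this.score = score;
            this.price = price;
        }
    }

    // First condition: score descending. Second condition (ties only): price ascending.
    static List<Hotel> sortByScoreThenPrice(List<Hotel> hotels) {
        List<Hotel> sorted = new ArrayList<>(hotels);
        sorted.sort(Comparator.comparingInt((Hotel h) -> h.score).reversed()
                .thenComparingInt(h -> h.price));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = Arrays.asList(
                new Hotel("A", 45, 300),
                new Hotel("B", 48, 500),
                new Hotel("C", 48, 200));
        for (Hotel h : sortByScoreThenPrice(hotels)) {
            System.out.println(h.name + " score=" + h.score + " price=" + h.price);
        }
    }
}
```

Hotels B and C share the same score, so the second condition decides between them and the cheaper C comes first; A has a lower score and comes last regardless of its price.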
2.1.2. Geographical coordinate sorting
Sorting by geographic coordinates is slightly different.
Syntax:
    GET /indexName/_search
    {
      "query": {
        "match_all": {}
      },
      "sort": [
        {
          "_geo_distance": {
            "FIELD": "latitude, longitude",  // name of the geo_point field in the document, and the target coordinate point
            "order": "asc",                  // sort order
            "unit": "km"                     // unit of the distance used for sorting
          }
        }
      ]
    }
The meaning of this query is:
- Specify a coordinate as the target point
- For each document, calculate the distance from the coordinates in the specified field (which must be of type geo_point) to the target point
- Sort by that distance
Example:
Requirement: sort the hotel data in ascending order of distance from your own location.
Tip: you can look up the latitude and longitude of your location with the "get the latitude and longitude of a mouse click" example in the example center of the Gaode Map JS API 2.0.
Suppose my location is 31.034661, 121.612282, and I am looking for the hotels nearest to me.
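The three steps above (pick a target point, compute each document's distance to it, sort by distance) can be sketched in plain Java. This is an illustration only: the hotel names and coordinates are made up, and the haversine formula with a mean Earth radius is an assumption used here for the distance, not necessarily the exact calculation Elasticsearch performs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GeoDistanceSortSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius, an approximation

    // Great-circle distance in km between two (lat, lon) points, using the haversine formula.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns the hotel names ordered by ascending distance to the target point.
    static List<String> sortByDistance(Map<String, double[]> hotels, double lat, double lon) {
        List<String> names = new ArrayList<>(hotels.keySet());
        names.sort(Comparator.comparingDouble(
                (String n) -> distanceKm(lat, lon, hotels.get(n)[0], hotels.get(n)[1])));
        return names;
    }

    public static void main(String[] args) {
        Map<String, double[]> hotels = new LinkedHashMap<>();
        hotels.put("far", new double[]{31.23, 121.47});   // hypothetical coordinates
        hotels.put("near", new double[]{31.04, 121.61});  // hypothetical coordinates
        // Target point taken from the example location in the text.
        System.out.println(sortByDistance(hotels, 31.034661, 121.612282)); // prints [near, far]
    }
}
```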
2.2. Pagination
By default, Elasticsearch returns only the top 10 results. To retrieve more, you need to set the paging parameters. In Elasticsearch, the from and size parameters control which page of results is returned:
- from: the offset of the first document to return (how many documents to skip)
- size: how many documents to return
This is similar to LIMIT ?, ? in MySQL.
2.2.1. Basic pagination
The basic syntax of pagination is as follows:
    GET /hotel/_search
    {
      "query": {
        "match_all": {}
      },
      "from": 0,   // offset of the first result to return, default 0
      "size": 10,  // number of documents to return
      "sort": [
        {"price": "asc"}
      ]
    }
2.2.2. Deep pagination problem
Now suppose we want results 990 to 1000. The query would be written as follows:
    GET /hotel/_search
    {
      "query": {
        "match_all": {}
      },
      "from": 990,  // offset of the first result to return, default 0
      "size": 10,   // number of documents to return
      "sort": [
        {"price": "asc"}
      ]
    }
This skips the first 990 results and returns the next 10, that is, results 991 to 1000 in sorted order.
However, to page like this, Elasticsearch must internally first retrieve the top 1000 entries and then cut out the 10 entries it needs.
If Elasticsearch runs as a single node, querying the top 1000 does not have much impact.
In production, however, Elasticsearch is usually deployed as a cluster. Suppose the cluster has 5 nodes and we want the top 1000 results. Querying 200 entries from each node is not enough,
because the results are not evenly distributed: the top 200 of node A may rank beyond 10,000 on another node, and in the extreme case the entire global top 1000 could sit on a single node.
Therefore, to obtain the top 1000 of the whole cluster, each node must first return its own top 1000. The results are then combined, re-ranked, and truncated to the top 1000.
What if we want results 9900 to 10000? Then the top 10,000 must be fetched first: every node returns 10,000 entries, and all of them are gathered in memory.
When the paging depth is large, the amount of data to combine becomes too large and puts heavy pressure on memory and CPU. Elasticsearch therefore rejects requests where from + size exceeds 10,000 (the default value of the index.max_result_window setting).
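The cost described above can be made visible with a small in-memory model. The sketch below is not Elasticsearch code: each inner list is a hypothetical node holding plain integer sort values (think of prices), and the coordinating step simply gathers every node's top from + size candidates, re-ranks them, and keeps the requested page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DeepPagingSketch {

    // Gathers the candidates every node has to send for the page (from, size).
    static List<Integer> collectCandidates(List<List<Integer>> nodes, int from, int size) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> node : nodes) {
            List<Integer> local = new ArrayList<>(node);
            Collections.sort(local);
            // A node cannot know where its documents rank globally,
            // so it must return its own top (from + size), not just `size`.
            candidates.addAll(local.subList(0, Math.min(from + size, local.size())));
        }
        return candidates;
    }

    // Re-ranks the combined candidates and keeps only the requested page.
    static List<Integer> page(List<List<Integer>> nodes, int from, int size) {
        List<Integer> merged = collectCandidates(nodes, from, size);
        Collections.sort(merged);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        List<List<Integer>> nodes = new ArrayList<>();
        for (int n = 0; n < 5; n++) {
            List<Integer> node = new ArrayList<>();
            for (int i = 0; i < 2000; i++) {
                node.add(random.nextInt(1_000_000));
            }
            nodes.add(node);
        }
        int collected = collectCandidates(nodes, 990, 10).size();
        int returned = page(nodes, 990, 10).size();
        // prints: returned 10 of 5000 collected entries
        System.out.println("returned " + returned + " of " + collected + " collected entries");
    }
}
```

With 5 nodes, a page of 10 results at offset 990 already requires 5000 candidates to be gathered and sorted; the amount grows linearly with from + size, which is why very deep pages become expensive.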
For deep paging, Elasticsearch provides two solutions (see the official documentation):
- search after: requires a sort when paging; it works by fetching the next page of data starting from the sort values of the last result of the previous page. This is the officially recommended approach.
- scroll: works by taking a snapshot of the sorted document ids and keeping it in memory. It is no longer officially recommended for deep paging.
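The idea behind search after, continuing from the sort values of the last result instead of skipping an offset, can be sketched with an in-memory list. This is a simplified model rather than the Elasticsearch API: the Doc class is hypothetical, documents are sorted by price, and the id field is assumed here as a unique tiebreaker so that the cursor position is unambiguous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SearchAfterSketch {

    // Hypothetical document: sorted by price, with id as a unique tiebreaker.
    static final class Doc {
        final long id;
        final int price;

        Doc(long id, int price) {
            this.id = id;
            this.price = price;
        }
    }

    static final Comparator<Doc> ORDER =
            Comparator.comparingInt((Doc d) -> d.price).thenComparingLong(d -> d.id);

    // Returns up to `size` documents that sort strictly after `last` (null means the first page).
    static List<Doc> nextPage(List<Doc> docs, Doc last, int size) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(ORDER);
        List<Doc> page = new ArrayList<>();
        for (Doc d : sorted) {
            if (page.size() == size) {
                break;
            }
            if (last == null || ORDER.compare(d, last) > 0) {
                page.add(d);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc(1, 300), new Doc(2, 100), new Doc(3, 100), new Doc(4, 200));
        Doc last = null;
        List<Doc> page;
        while (!(page = nextPage(docs, last, 2)).isEmpty()) {
            for (Doc d : page) {
                System.out.println("id=" + d.id + " price=" + d.price);
            }
            last = page.get(page.size() - 1); // cursor for the next request
            System.out.println("--");
        }
    }
}
```

Each request only needs the cursor and the page size, so the work per page does not grow with the paging depth. The trade-off is visible in the loop: pages can only be fetched one after another, never by jumping to an arbitrary offset.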
2.2.3. Summary
Common pagination approaches and their advantages and disadvantages:
- from + size
  - Advantages: supports jumping to an arbitrary page
  - Disadvantages: suffers from the deep paging problem; by default from + size may not exceed 10000
  - Scenario: searches that need random page access, such as Baidu, JD.com, Google, and Taobao
- search after
  - Advantages: no upper limit on how deep you can page (the size of a single request still may not exceed 10000)
  - Disadvantages: can only page forward one page at a time; does not support jumping to an arbitrary page
  - Scenario: searches without random page access, such as scrolling down on a mobile phone to load more
- scroll
  - Advantages: no upper limit on how deep you can page (the size of a single request still may not exceed 10000)
  - Disadvantages: consumes additional memory, and the search results are not real-time
  - Scenario: retrieving and migrating massive amounts of data. It is no longer recommended as of ES 7.1; use search after instead.
2.3. Highlight
2.3.1. Highlighting principle
What is highlighting?
When we search on Baidu or JD.com, the keywords in the results are shown in red to make them stand out. This is called highlighting.
Highlighting is implemented in two steps:
- 1) Wrap every keyword in the document in a tag, such as the <em> tag
- 2) Write CSS styles for the <em> tag on the page
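Step 1 can be illustrated with a few lines of plain Java. This is a deliberately naive sketch, assuming a simple case-insensitive substring match; Elasticsearch highlights the terms produced by its analyzers rather than raw substrings, so the real behaviour can differ. The highlight method and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightSketch {

    // Wraps every case-insensitive occurrence of `keyword` in `text` with the given tags.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the tags or the match from being treated specially
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement(preTag + matcher.group() + postTag));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: <em>Home Inn</em> Shanghai Pudong
        System.out.println(highlight("Home Inn Shanghai Pudong", "home inn", "<em>", "</em>"));
    }
}
```

Step 2 then happens entirely on the page: a CSS rule for the em tag (for example a red text colour) makes the wrapped keywords stand out.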
2.3.2. Achieve highlighting
Highlighting syntax:
    GET /hotel/_search
    {
      "query": {
        "match": {
          "FIELD": "TEXT"  // query condition; highlighting requires a full-text search query
        }
      },
      "highlight": {
        "fields": {  // the fields to highlight
          "FIELD": {
            "pre_tags": "<em>",   // tag inserted before each highlighted keyword
            "post_tags": "</em>"  // tag inserted after each highlighted keyword
          }
        }
      }
    }
Note:
- Highlighting applies to keywords, so the search condition must contain keywords; it cannot be, for example, a range query.
- By default, the highlighted field must be the same as the field specified in the search; otherwise it cannot be highlighted.
- To highlight a field other than the one searched, add the attribute require_field_match=false.
Example :
2.4. Summary
The query DSL is one large JSON object with the following properties:
- query: the query condition
- from and size: the paging conditions
- sort: the sorting conditions
- highlight: the highlighting conditions
Example:
3. RestClient query document
Document queries also go through the RestHighLevelClient object introduced earlier. The basic steps are:
- 1) Prepare the Request object
- 2) Prepare the request parameters
- 3) Send the request
- 4) Parse the response
3.1. Quick Start
Let's take the match_all query as an example
3.1.1. Initiate query request
Code interpretation:
- Step 1: create a SearchRequest object and specify the index name.
- Step 2: use request.source() to build the DSL, which can include query, paging, sorting, highlighting, and so on.
  - query(): represents the query condition; QueryBuilders.matchAllQuery() constructs the DSL of a match_all query.
- Step 3: use client.search() to send the request and get the response.
There are two key APIs here. One is request.source(), which exposes all the functions such as query, sorting, paging, and highlighting.
The other is QueryBuilders, which contains the various queries such as match, term, function_score, and bool.
3.1.2. Parsing the response
Analysis of the response result:
The result returned by Elasticsearch is a JSON string with the following structure:
- hits: the hit results
  - total: the total number of hits, where value is the actual count
  - max_score: the relevance score of the highest-scoring document across all results
  - hits: an array of the documents in the search results, each of which is a JSON object
    - _source: the original data of the document, also a JSON object
Parsing the response therefore means parsing this JSON layer by layer. The process is as follows:
- SearchHits: obtained through response.getHits(); this is the outermost hits in the JSON and represents the hit results
  - SearchHits#getTotalHits().value: gets the total number of hits
  - SearchHits#getHits(): gets the SearchHit array, which is the document array
    - SearchHit#getSourceAsString(): gets the _source of a document result, which is the original JSON document data
3.1.3. Complete code
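The complete code is as follows:
    import java.io.IOException;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.junit.jupiter.api.Test;
    // Assumption: JSON is com.alibaba.fastjson.JSON; HotelDoc and the client field come from the surrounding test class.
    import com.alibaba.fastjson.JSON;

    @Test
    void testMatchAll() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source()
                .query(QueryBuilders.matchAllQuery());
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        handleResponse(response);
    }

    private void handleResponse(SearchResponse response) {
        // 4. Parse the response
        SearchHits searchHits = response.getHits();
        // 4.1. Get the total number of hits
        long total = searchHits.getTotalHits().value;
        System.out.println("Found " + total + " documents in total");
        // 4.2. Document array
        SearchHit[] hits = searchHits.getHits();
        // 4.3. Iterate over the hits
        for (SearchHit hit : hits) {
            // Get the document source
            String json = hit.getSourceAsString();
            // Deserialize it
            HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
            System.out.println("hotelDoc = " + hotelDoc);
        }
    }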
3.1.4. Summary
The basic steps of a query are:
- Create a SearchRequest object
- Prepare request.source(), which is the DSL:
  ① use QueryBuilders to build the query condition
  ② pass it to the query() method of request.source()
- Send the request and get the result
- Parse the result (following the JSON structure from the outside in, layer by layer)
3.2. match query
The match and multi_match queries of full-text search use basically the same API as match_all. The difference is the query condition, that is, the query part.
Therefore, the difference in the Java code is mainly the argument passed to request.source().query(), which is again built with the methods provided by QueryBuilders.
The result parsing code is exactly the same and can be extracted and shared.
The complete code is as follows:
    @Test
    void testMatch() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        request.source()
                .query(QueryBuilders.matchQuery("all", "Home Inn"));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        handleResponse(response);
    }
3.3. Precise query
There are mainly two kinds of exact queries:
- term: exact match on a term
- range: range query
Compared with the previous queries, the only difference is again the query condition; everything else is the same.
The API for query condition construction is as follows:
3.4. Boolean queries
A boolean query combines other queries using must, must_not, filter, and so on. The code example is as follows:
As you can see, the only difference from the other queries is how the query condition is constructed with QueryBuilders; result parsing and the rest of the code are completely unchanged.
The complete code is as follows:
    import org.elasticsearch.index.query.BoolQueryBuilder;

    @Test
    void testBool() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. Prepare the boolean query
        BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
        // 2.2. Add a term query
        boolQuery.must(QueryBuilders.termQuery("city", "Hangzhou"));
        // 2.3. Add a range query
        boolQuery.filter(QueryBuilders.rangeQuery("price").lte(250));
        request.source().query(boolQuery);
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        handleResponse(response);
    }
3.5. Sorting, paging
Sorting and paging of search results are parameters at the same level as query, so they are also set through request.source().
The corresponding APIs are as follows:
Full code example:
    import org.elasticsearch.search.sort.SortOrder;

    @Test
    void testPageAndSort() throws IOException {
        // Page number and page size
        int page = 1, size = 5;
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. Query
        request.source().query(QueryBuilders.matchAllQuery());
        // 2.2. Sorting
        request.source().sort("price", SortOrder.ASC);
        // 2.3. Paging: from and size (use the size variable instead of a hard-coded 5)
        request.source().from((page - 1) * size).size(size);
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        handleResponse(response);
    }
3.6. Highlight
The highlighting code differs from the previous code in two ways:
- Query DSL: in addition to the query condition, you also need to add the highlight condition, which is at the same level as query.
- Result parsing: in addition to parsing the _source document data, you also need to parse the highlight result.
3.6.1. Highlight request build
The construction API of the highlight request is as follows:
The above code omits the query condition part, but don't forget: a highlight query must use a full-text search query with search keywords, so that there are keywords to highlight.
The complete code is as follows:
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;

    @Test
    void testHighlight() throws IOException {
        // 1. Prepare the request
        SearchRequest request = new SearchRequest("hotel");
        // 2. Prepare the DSL
        // 2.1. Query
        request.source().query(QueryBuilders.matchQuery("all", "Home Inn"));
        // 2.2. Highlight the name field even though the query searches the all field
        request.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));
        // 3. Send the request
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        // 4. Parse the response
        handleResponse(response);
    }
3.6.2. Analysis of highlighted results
By default, the highlight result and the document result are returned separately rather than merged together.
So parsing the highlight result requires some additional processing:
Code interpretation:
- Step 1: get the source from the result with hit.getSourceAsString(). This is the non-highlighted result as a JSON string, which still needs to be deserialized into a HotelDoc object.
- Step 2: get the highlight result with hit.getHighlightFields(). The return value is a Map whose key is the name of the highlighted field and whose value is a HighlightField object representing the highlighted value.
- Step 3: get the HighlightField object from the map by the name of the highlighted field.
- Step 4: get the fragments from the HighlightField and convert them to a string. This is the actual highlighted string.
- Step 5: replace the non-highlighted value in HotelDoc with the highlighted result.
The complete code is as follows:
    import java.util.Map;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    // Assumption: CollectionUtils is Spring's org.springframework.util.CollectionUtils, which has isEmpty(Map).
    import org.springframework.util.CollectionUtils;

    private void handleResponse(SearchResponse response) {
        // 4. Parse the response
        SearchHits searchHits = response.getHits();
        // 4.1. Get the total number of hits
        long total = searchHits.getTotalHits().value;
        System.out.println("Found " + total + " documents in total");
        // 4.2. Document array
        SearchHit[] hits = searchHits.getHits();
        // 4.3. Iterate over the hits
        for (SearchHit hit : hits) {
            // Get the document source
            String json = hit.getSourceAsString();
            // Deserialize it
            HotelDoc hotelDoc = JSON.parseObject(json, HotelDoc.class);
            // Get the highlight result
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            if (!CollectionUtils.isEmpty(highlightFields)) {
                // Look up the highlight result by field name
                HighlightField highlightField = highlightFields.get("name");
                if (highlightField != null) {
                    // Get the highlighted value
                    String name = highlightField.getFragments()[0].string();
                    // Overwrite the non-highlighted value
                    hotelDoc.setName(name);
                }
            }
            System.out.println("hotelDoc = " + hotelDoc);
        }
    }