Elasticsearch advanced query techniques

1. Introduction

1.1 Introduction to Elasticsearch

Elasticsearch is an open-source, distributed search and analytics engine built on Apache Lucene. It offers near real-time search and analytics over large data sets, spreads data and query load across the nodes of a cluster, and often answers queries within milliseconds. Documents are stored, searched and returned as JSON.

2. Basic query types

2.1 Match Query

Match Query is the most basic and commonly used full-text query. The query text is run through the field's analyzer, and documents whose field contains the resulting terms are returned.

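// Build a match query on the "title" field
MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("title", "Elasticsearch")
                                 .operator(Operator.AND)  // every analyzed term must match
                                 .fuzziness(Fuzziness.AUTO)  // tolerate typos; allowed edits depend on term length
                                 .prefixLength(3)  // the first 3 characters must match exactly
                                 .maxExpansions(10);  // cap on the number of terms each fuzzy term expands to

// Execute the search (transport client API; mapping types are deprecated since Elasticsearch 7.0)
SearchResponse searchResponse = client.prepareSearch("index")
                                     .setTypes("type")
                                     .setQuery(matchQuery)
                                     .get();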

2.2 Bool Query

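Bool Query combines multiple query clauses and supports four clause types: must, must_not, should and filter. must and should clauses contribute to the relevance score; filter and must_not clauses only include or exclude documents and do not affect the score.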

// Build a bool query
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
                               .must(QueryBuilders.matchQuery("title", "Elasticsearch"))  // required, scored
                               .mustNot(QueryBuilders.termQuery("category", "deleted"))  // excludes matching documents
                               .should(QueryBuilders.rangeQuery("price").lte(100))  // optional here, raises the score when it matches
                               .filter(QueryBuilders.termQuery("availability", "instock"));  // required, not scored

// Execute the search
SearchResponse searchResponse = client.prepareSearch("index")
                                     .setTypes("type")
                                     .setQuery(boolQuery)
                                     .get();
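
To make the roles of the four clause types concrete, here is a minimal in-memory sketch in plain Java. It is not Elasticsearch code: the class BoolQuerySketch, its helper doc(...), and the one-point-per-clause toyScore are assumptions made for illustration, and real relevance scoring is far more involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class BoolQuerySketch {

    // The four clauses of the bool query above, expressed as plain predicates.
    static final Predicate<Map<String, Object>> MUST =
            doc -> String.valueOf(doc.get("title")).toLowerCase().contains("elasticsearch");
    static final Predicate<Map<String, Object>> MUST_NOT =
            doc -> "deleted".equals(doc.get("category"));
    static final Predicate<Map<String, Object>> SHOULD =
            doc -> doc.get("price") instanceof Number
                    && ((Number) doc.get("price")).doubleValue() <= 100;
    static final Predicate<Map<String, Object>> FILTER =
            doc -> "instock".equals(doc.get("availability"));

    // A document is a hit only if must and filter hold and must_not does not.
    // The should clause plays no part in this decision.
    static boolean matches(Map<String, Object> doc) {
        return MUST.test(doc) && FILTER.test(doc) && !MUST_NOT.test(doc);
    }

    // Toy relevance (an assumption of this sketch): one point for the must clause,
    // one more if the should clause also matches; 0 for documents that are not hits.
    static int toyScore(Map<String, Object> doc) {
        if (!matches(doc)) {
            return 0;
        }
        return 1 + (SHOULD.test(doc) ? 1 : 0);
    }

    // Hypothetical helper that builds a document as a field-to-value map.
    static Map<String, Object> doc(String title, String category, double price, String availability) {
        Map<String, Object> d = new HashMap<>();
        d.put("title", title);
        d.put("category", category);
        d.put("price", price);
        d.put("availability", availability);
        return d;
    }

    public static void main(String[] args) {
        Map<String, Object> pricey = doc("Elasticsearch in Action", "books", 150, "instock");
        Map<String, Object> cheap = doc("Elasticsearch in Action", "books", 80, "instock");
        System.out.println(matches(pricey) + " " + toyScore(pricey));
        System.out.println(matches(cheap) + " " + toyScore(cheap));
    }
}
```

Both documents are hits, but only the cheaper one also satisfies the should clause, so it ranks higher in this toy model.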

2.3 Range Query

Range Query is used to search for documents where the value of the specified field is within the specified range.

// Build a range query
RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("price")
                               .gte(50)  // greater than or equal to 50
                               .lte(100);  // less than or equal to 100

// Execute the search
SearchResponse searchResponse = client.prepareSearch("index")
                                     .setTypes("type")
                                     .setQuery(rangeQuery)
                                     .get();

2.4 Terms Query

Terms Query returns documents whose field contains at least one of the specified terms. The terms are matched exactly, without analysis.

// Build a terms query: matches documents whose category is any of the listed values
TermsQueryBuilder termsQuery = QueryBuilders.termsQuery("category", "electronics", "books", "clothing");

// Execute the search
SearchResponse searchResponse = client.prepareSearch("index")
                                     .setTypes("type")
                                     .setQuery(termsQuery)
                                     .get();

2.5 Aggregation Query

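Aggregations summarize the documents matched by a search: they can count documents, group them into buckets, and compute metrics. The following example groups documents by category and returns the five largest buckets.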

// Build a search request that carries a terms aggregation
SearchRequestBuilder searchRequest = client.prepareSearch("index")
                                           .setTypes("type");

AggregationBuilder aggregation = AggregationBuilders.terms("by_category")
                                     .field("category.keyword")
                                     .size(5)  // number of buckets to return
                                     .order(BucketOrder.count(false));  // order buckets by document count, descending (Terms.Order before 6.0)

searchRequest.addAggregation(aggregation);

// Execute the search
SearchResponse searchResponse = searchRequest.get();

// Read the aggregation result
Terms byCategoryAggregation = searchResponse.getAggregations().get("by_category");
List<? extends Terms.Bucket> buckets = byCategoryAggregation.getBuckets();
for (Terms.Bucket bucket : buckets) {
    String key = bucket.getKeyAsString();
    long count = bucket.getDocCount();
    System.out.println(key + ": " + count);
}
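
As a rough mental model of what a terms aggregation computes, the following plain-Java sketch groups values by category, counts them, and keeps the largest buckets. TermsAggregationSketch and topBuckets are hypothetical names, and breaking ties by key is an assumption of this sketch; in a real cluster, counts are merged from several shards and can be approximate.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TermsAggregationSketch {

    // Counts values per category and keeps the `size` largest buckets,
    // ordered by count descending, then by key ascending.
    static Map<String, Long> topBuckets(List<String> categories, int size) {
        Map<String, Long> counts = categories.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> categories = Arrays.asList(
                "books", "electronics", "books", "clothing", "books", "electronics");
        // Prints "books: 3" followed by "electronics: 2"
        topBuckets(categories, 2).forEach((key, count) -> System.out.println(key + ": " + count));
    }
}
```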

3. Advanced query

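3.1 Fuzzy Query

// Build a fuzzy query: matches terms within a small edit distance of "searchTerm"
FuzzyQueryBuilder fuzzyQuery = QueryBuilders.fuzzyQuery("fieldName", "searchTerm");
// Maximum edit distance; AUTO picks it from the term length
fuzzyQuery.fuzziness(Fuzziness.AUTO);
// Number of leading characters that must match exactly
fuzzyQuery.prefixLength(2);
// Maximum number of terms the query may expand to
fuzzyQuery.maxExpansions(10);
// Count a swap of two adjacent characters (ab -> ba) as a single edit
fuzzyQuery.transpositions(true);

// Attach the fuzzy query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(fuzzyQuery);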
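
The fuzziness, prefix length and transpositions settings are easier to reason about with a small stand-alone model. The sketch below is plain Java, not the Lucene implementation: it uses the optimal string alignment distance as a stand-in for the edit distance, and the names FuzzinessSketch, autoFuzziness, editDistance and fuzzyMatches are assumptions. The thresholds in autoFuzziness follow the default AUTO behaviour (terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two).

```java
class FuzzinessSketch {

    // Edit budget granted by Fuzziness.AUTO with its default thresholds.
    static int autoFuzziness(int termLength) {
        if (termLength <= 2) return 0;
        if (termLength <= 5) return 1;
        return 2;
    }

    // Optimal string alignment distance: insertions, deletions and substitutions cost 1;
    // when transpositions is true, swapping two adjacent characters also costs 1.
    static int editDistance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // A candidate term matches when its first prefixLength characters are identical
    // to the query term and the remaining difference fits the AUTO edit budget.
    static boolean fuzzyMatches(String term, String candidate, int prefixLength, boolean transpositions) {
        int p = Math.min(prefixLength, Math.min(term.length(), candidate.length()));
        if (!term.regionMatches(0, candidate, 0, p)) {
            return false;
        }
        return editDistance(term, candidate, transpositions) <= autoFuzziness(term.length());
    }

    public static void main(String[] args) {
        System.out.println(editDistance("search", "serach", true));   // one adjacent swap
        System.out.println(editDistance("search", "serach", false));  // two substitutions
        System.out.println(fuzzyMatches("search", "serach", 2, true));
        System.out.println(fuzzyMatches("search", "saerch", 2, true)); // prefix "se" differs
    }
}
```

A longer prefix length shrinks the set of candidate terms that have to be examined, which is why it is a common tuning knob for fuzzy queries.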

3.2 Prefix Query

// Build a prefix query: matches terms in "fieldName" that start with "searchTerm"
PrefixQueryBuilder prefixQuery = QueryBuilders.prefixQuery("fieldName", "searchTerm");

// Attach the prefix query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(prefixQuery);

3.3 Wildcard Query

// Build a wildcard query: * matches any character sequence, ? matches a single character
WildcardQueryBuilder wildcardQuery = QueryBuilders.wildcardQuery("fieldName", "searchTerm*");

// Attach the wildcard query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(wildcardQuery);
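
The wildcard syntax has two special characters: * matches any sequence of characters, including none, and ? matches exactly one character. The plain-Java sketch below mimics that matching against a single term by translating the pattern into a java.util.regex.Pattern. WildcardSketch, toRegex and matches are hypothetical names, and backslash escaping is deliberately left out.

```java
import java.util.regex.Pattern;

class WildcardSketch {

    // Translates a wildcard pattern into a regular expression:
    // '*' becomes ".*", '?' becomes ".", everything else is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The pattern must cover the whole term, not just a part of it.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("elast*", "elasticsearch")); // true
        System.out.println(matches("el?stic", "elastic"));      // true
        System.out.println(matches("el?stic", "elstic"));       // false: ? needs one character
    }
}
```

Because the pattern is compared against whole indexed terms, a pattern that starts with * or ? forces a scan over many terms and tends to be slow.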

3.4 Regexp Query

// Build a regexp query; "regexpPattern" is a placeholder for a Lucene regular expression
RegexpQueryBuilder regexpQuery = QueryBuilders.regexpQuery("fieldName", "regexpPattern");

// Attach the regexp query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(regexpQuery);

3.5 Geo queries

3.5.1 Geo-bounding box query

// Build a geo-bounding box query; setCorners expects (top, left, bottom, right)
GeoBoundingBoxQueryBuilder geoBoundingBoxQuery = QueryBuilders.geoBoundingBoxQuery("fieldName")
    .setCorners(topRightLat, bottomLeftLon, bottomLeftLat, topRightLon);

// Attach the geo-bounding box query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(geoBoundingBoxQuery);

3.5.2 Geo-distance query

// Build a geo-distance query: matches points within `distance` kilometers of the center
GeoDistanceQueryBuilder geoDistanceQuery = QueryBuilders.geoDistanceQuery("fieldName")
    .point(centerLat, centerLon)
    .distance(distance, DistanceUnit.KILOMETERS);

// Attach the geo-distance query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(geoDistanceQuery);
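
Conceptually, a geo-distance query keeps the points whose great-circle distance from the center is at most the given radius. The following plain-Java sketch computes that distance with the haversine formula. GeoDistanceSketch and its methods are hypothetical names, the mean Earth radius is an assumption of the sketch, and Elasticsearch's own distance calculation may differ in detail.

```java
class GeoDistanceSketch {

    // Mean Earth radius in kilometers (an assumption of this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    // Great-circle distance between two points given in decimal degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // True when (lat, lon) lies within maxKm of the center point.
    static boolean withinDistance(double centerLat, double centerLon,
                                  double lat, double lon, double maxKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= maxKm;
    }

    public static void main(String[] args) {
        // Paris to London, roughly 340 km apart
        double km = haversineKm(48.8566, 2.3522, 51.5074, -0.1278);
        System.out.printf("%.1f km%n", km);
        System.out.println(withinDistance(48.8566, 2.3522, 51.5074, -0.1278, 500));
    }
}
```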

3.5.3 Geo-shape query

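// Build a geo-shape query against a shape that is already stored in an index.
// The three-String overload takes (field name, indexed shape ID, indexed shape type),
// not a shape type plus coordinates; the type argument is deprecated in 7.x.
GeoShapeQueryBuilder geoShapeQuery = QueryBuilders.geoShapeQuery("fieldName", "indexedShapeId", "indexedShapeType");

// Attach the geo-shape query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(geoShapeQuery);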

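3.6 Script Query

// Build a script query; the Painless script must return a boolean.
// The script text here is only an illustration. Requires java.util.Collections.
ScriptQueryBuilder scriptQuery = QueryBuilders.scriptQuery(
    new Script(ScriptType.INLINE, "painless", "doc['price'].value > 50", Collections.emptyMap()));

// Attach the script query to the search source
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(scriptQuery);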

4. Complex query application

4.1 Multiple sorts

Elasticsearch sorts by several fields when more than one sort field is specified; later fields break ties left by earlier ones. The following example sorts by two fields with the Java high-level REST client:

// Create the search request
SearchRequest searchRequest = new SearchRequest("index_name");

// Create the search source that holds the query and sort settings
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// Sort by field1 ascending, then by field2 descending
searchSourceBuilder.sort(new FieldSortBuilder("field1").order(SortOrder.ASC));
searchSourceBuilder.sort(new FieldSortBuilder("field2").order(SortOrder.DESC));

// Attach the search source to the request
searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
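
The ordering produced by two sort fields is the same as chaining comparators: the second field only matters when the first one ties. A minimal plain-Java sketch of that behaviour follows; MultiSortSketch and its Doc class are hypothetical, with field1 sorted ascending and field2 descending as in the request above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MultiSortSketch {

    // Hypothetical document carrying the two sort fields.
    static final class Doc {
        final String id;
        final int field1;
        final int field2;

        Doc(String id, int field1, int field2) {
            this.id = id;
            this.field1 = field1;
            this.field2 = field2;
        }
    }

    // Sorts by field1 ascending, then by field2 descending, and returns the ids in order.
    static List<String> sortedIds(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparingInt((Doc d) -> d.field1)
                .thenComparing(Comparator.comparingInt((Doc d) -> d.field2).reversed()));
        List<String> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(new Doc("a", 1, 10), new Doc("b", 1, 20), new Doc("c", 0, 5));
        System.out.println(sortedIds(docs)); // [c, b, a]
    }
}
```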

4.2 Fuzzy matching and error correction

Elasticsearch can tolerate typos in search terms through fuzzy matching, which Fuzzy Query provides. The following example runs a fuzzy query with the Java API:

// Create the search request
SearchRequest searchRequest = new SearchRequest("index_name");

// Create the search source
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// Build the fuzzy query; AUTO derives the allowed edit distance from the term length
FuzzyQueryBuilder fuzzyQuery = QueryBuilders.fuzzyQuery("field", "keyword").fuzziness(Fuzziness.AUTO);

// Set the fuzzy query on the search source
searchSourceBuilder.query(fuzzyQuery);
// Attach the search source to the request
searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

4.3 Multi-field query

To search several fields with the same text, use Multi Match Query. The following example queries two fields with the Java API:

// Create the search request
SearchRequest searchRequest = new SearchRequest("index_name");

// Create the search source
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// Build the multi-match query: the first argument is the query text, the rest are field names
MultiMatchQueryBuilder multiMatchQuery = QueryBuilders.multiMatchQuery("keyword", "field1", "field2");

// Set the multi-match query on the search source
searchSourceBuilder.query(multiMatchQuery);

// Attach the search source to the request
searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

4.4 Filter data by time range

To restrict results to a time range, use Range Query on a date field. The following example filters by time range with the Java API:

// Create the search request
SearchRequest searchRequest = new SearchRequest("index_name");

// Create the search source
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// Build the range query; "start_date" and "end_date" are placeholders for
// date values in the format of date_field. Both bounds are inclusive.
RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("date_field").gte("start_date").lte("end_date");

// Set the range query on the search source
searchSourceBuilder.query(rangeQuery);

// Attach the search source to the request
searchRequest.source(searchSourceBuilder);

// Execute the search
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

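4.5 Scroll and Search After for paging through result sets

Scroll and Search After are two separate ways to page through large result sets. Scroll keeps a search context alive on the server and returns a scroll ID that is used to fetch the next batch. Search After is stateless: each request passes the sort values of the last hit of the previous page. The following example pages through an index with the Java API: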

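// --- Scroll ---
// Create the search request and its search source
SearchRequest searchRequest = new SearchRequest("index_name");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// Number of hits per batch
int pageSize = 10;
searchSourceBuilder.size(pageSize);

// How long the scroll context stays alive between requests
TimeValue scrollTime = TimeValue.timeValueMinutes(1);
searchRequest.scroll(scrollTime);

// Attach the search source to the request
searchRequest.source(searchSourceBuilder);

// First request: returns the first batch and a scroll ID
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
SearchHit[] hits = searchResponse.getHits().getHits();

// Keep fetching batches until one comes back empty
while (hits != null && hits.length > 0) {
    for (SearchHit hit : hits) {
        // process the hit
    }

    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(scrollTime);
    searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
    scrollId = searchResponse.getScrollId();
    hits = searchResponse.getHits().getHits();
}

// Release the scroll context on the server
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);

// --- Search After ---
// Needs a sort that yields a total order; "unique_id" is a hypothetical
// tiebreaker field that holds a unique value per document.
SearchSourceBuilder afterSource = new SearchSourceBuilder()
    .size(pageSize)
    .sort("field1", SortOrder.ASC)
    .sort("unique_id", SortOrder.ASC);

Object[] lastSortValues = null;
while (true) {
    if (lastSortValues != null) {
        // Continue right after the last hit of the previous page
        afterSource.searchAfter(lastSortValues);
    }
    SearchResponse page = client.search(
        new SearchRequest("index_name").source(afterSource), RequestOptions.DEFAULT);
    SearchHit[] pageHits = page.getHits().getHits();
    if (pageHits.length == 0) {
        break;
    }
    for (SearchHit hit : pageHits) {
        // process the hit
    }
    lastSortValues = pageHits[pageHits.length - 1].getSortValues();
}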
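
Search After works as a cursor: the next page starts strictly after the sort values of the last hit already seen, which is why the sort needs a unique tiebreaker. The plain-Java sketch below imitates this over an in-memory list. SearchAfterSketch, Hit, page and collectAllIds are hypothetical names, and the (timestamp, id) sort key is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SearchAfterSketch {

    // Hypothetical hit with a sort field (timestamp) and a unique tiebreaker (id).
    static final class Hit {
        final long timestamp;
        final String id;

        Hit(long timestamp, String id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    // Total order: timestamp ascending, then id ascending.
    static final Comparator<Hit> ORDER =
            Comparator.comparingLong((Hit h) -> h.timestamp).thenComparing(h -> h.id);

    // Returns up to pageSize hits that sort strictly after `after`; null means the first page.
    static List<Hit> page(List<Hit> index, Hit after, int pageSize) {
        List<Hit> sorted = new ArrayList<>(index);
        sorted.sort(ORDER);
        List<Hit> result = new ArrayList<>();
        for (Hit h : sorted) {
            if (after != null && ORDER.compare(h, after) <= 0) {
                continue; // already returned on an earlier page
            }
            result.add(h);
            if (result.size() == pageSize) {
                break;
            }
        }
        return result;
    }

    // Walks all pages, using the last hit of each page as the cursor for the next one.
    static List<String> collectAllIds(List<Hit> index, int pageSize) {
        List<String> ids = new ArrayList<>();
        Hit after = null;
        while (true) {
            List<Hit> hits = page(index, after, pageSize);
            if (hits.isEmpty()) {
                break;
            }
            for (Hit h : hits) {
                ids.add(h.id);
            }
            after = hits.get(hits.size() - 1);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> index = Arrays.asList(new Hit(1, "b"), new Hit(3, "e"), new Hit(1, "a"),
                new Hit(2, "d"), new Hit(1, "c"));
        System.out.println(collectAllIds(index, 2)); // [a, b, c, d, e]
    }
}
```

Three hits share timestamp 1 and the first page boundary falls between them. Without the id tiebreaker, a cursor based on the timestamp alone would either skip or repeat hits at that boundary.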

5. Debugging and performance optimization

5.1 Explain API to view matching details

The Explain API shows whether a specific document matches a query and how its score was computed. The following example uses the Explain API:

// Explain how the document "document_id" scores against a term query.
// ExplainRequest(index, id) is the 7.x constructor; 6.x also takes a type.
ExplainRequest explainRequest = new ExplainRequest("index_name", "document_id");
explainRequest.query(QueryBuilders.termQuery("field_name", "search_term"));

ExplainResponse explainResponse = client.explain(explainRequest, RequestOptions.DEFAULT);

if (explainResponse.isExists() && explainResponse.isMatch() && explainResponse.hasExplanation()) {
    Explanation explanation = explainResponse.getExplanation();
    System.out.println(explanation.toString());
} else {
    System.out.println("Document is missing or does not match the query");
}

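5.2 Profile API to find slow query components

The Profile API shows how a query is executed on each shard and which parts of it take the most time. The following example enables profiling and prints the timing of each query component:

SearchRequest searchRequest = new SearchRequest("index_name");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.termQuery("field_name", "search_term"));
searchSourceBuilder.profile(true);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

// Profile results are keyed by shard. The class is ProfileShardResult in most
// 6.x/7.x clients and was renamed SearchProfileShardResult in later versions.
Map<String, ProfileShardResult> profileResults = searchResponse.getProfileResults();
for (Map.Entry<String, ProfileShardResult> entry : profileResults.entrySet()) {
    for (QueryProfileShardResult queryProfile : entry.getValue().getQueryProfileResults()) {
        for (ProfileResult result : queryProfile.getQueryResults()) {
            System.out.println(entry.getKey() + " " + result.getQueryName() + ": " + result.getTime());
        }
    }
}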

5.3 Use warmers to reduce cold query time (legacy)

The first time a query runs against newly created segments, caches and field data are cold, so the query is slower than later runs. An index warmer registers a query that Elasticsearch runs against new segments before they become searchable, so the cost is paid ahead of the first user request. Warmers were deprecated in Elasticsearch 2.3 and removed in 5.0, so the following applies to legacy clusters only.

// Legacy transport client API (Elasticsearch 1.x/2.x only); there is no
// equivalent call in the high-level REST client.
PutWarmerResponse warmerResponse = client.admin().indices()
    .preparePutWarmer("warmer_name")
    .setSearchRequest(client.prepareSearch("index_name")
        .setTypes("type_name")
        .setQuery(QueryBuilders.matchQuery("field_name", "search_term")))
    .get();
boolean acknowledged = warmerResponse.isAcknowledged();

if (acknowledged) {
    System.out.println("Warmer has been successfully registered!");
}

Origin blog.csdn.net/u010349629/article/details/131697266