From Beginner to Advanced: Elasticsearch Advanced Query Techniques in Detail

Elasticsearch is a powerful full-text search engine that uses the Lucene search library for underlying indexing and searching. Elasticsearch provides many advanced query techniques that can help users query data more accurately and efficiently. This tutorial will introduce Elasticsearch's advanced query techniques and provide some sample code to illustrate their use.

1. Boolean query

Elasticsearch supports Boolean queries through the "bool" query, which combines clauses using "must" (AND), "should" (OR), and "must_not" (NOT). This enables users to limit query results using multiple criteria.

For example, the following query will return all documents matching "foo" and "bar":

GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "foo" }},
        { "match": { "content": "bar" }}
      ]
    }
  }
}

Additionally, "should" clauses can be used to match either condition. When a "bool" query has no "must" or "filter" clauses, at least one "should" clause has to match. The following query will return all documents matching "foo" or "bar":

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": "foo" }},
        { "match": { "content": "bar" }}
      ]
    }
  }
}
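
To make the difference between "must" and "should" concrete, here is a minimal in-memory sketch in plain Java. It does not call Elasticsearch; the class BoolSemantics and its methods are hypothetical names chosen for illustration, and scoring as well as "filter" clauses are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of any Elasticsearch API.
final class BoolSemantics {

    // Stand-in for a "match" clause: true if the text contains the (lowercase) word.
    static Predicate<String> contains(String word) {
        return text -> Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")).contains(word);
    }

    static boolean matches(String text,
                           List<Predicate<String>> must,
                           List<Predicate<String>> should,
                           List<Predicate<String>> mustNot) {
        boolean allMust = must.stream().allMatch(clause -> clause.test(text));
        boolean noMustNot = mustNot.stream().noneMatch(clause -> clause.test(text));
        // "should" clauses are only required when there is no "must" clause
        // (the real query also considers "filter" clauses, which this sketch omits).
        boolean shouldRequired = must.isEmpty() && !should.isEmpty();
        boolean anyShould = should.stream().anyMatch(clause -> clause.test(text));
        return allMust && noMustNot && (!shouldRequired || anyShould);
    }

    public static void main(String[] args) {
        List<Predicate<String>> fooAndBar = List.of(contains("foo"), contains("bar"));
        List<Predicate<String>> none = List.of();
        System.out.println(matches("foo only", fooAndBar, none, none)); // must: both words are needed
        System.out.println(matches("foo only", none, fooAndBar, none)); // should: one word is enough
    }
}
```

In a real query, matching "should" clauses also raise the relevance score of a document, which this sketch does not model.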

2. Range query

Elasticsearch supports range queries, which match documents whose field value lies within a specified range. Range queries are most commonly used with numeric and date fields. The "gte" and "lte" parameters make the bounds inclusive, while "gt" and "lt" make them exclusive.

For example, the following query will return all users whose age is between 18 and 30:

GET /_search
{
  "query": {
    "range": {
      "age": {
        "gte": 18,
        "lte": 30
      }
    }
  }
}

The following query will return all users with registration dates between January 1, 2019 and January 1, 2020:

GET /_search
{
  "query": {
    "range": {
      "registered_at": {
        "gte": "2019-01-01",
        "lte": "2020-01-01"
      }
    }
  }
}

3. Fuzzy query

Elasticsearch supports fuzzy queries, which can be used to query documents that contain misspellings or approximate matches. Fuzzy queries use fuzzy matching algorithms, such as edit distance algorithms, to find documents that closely match.

For example, the following query will return documents containing terms that are within an edit distance of 2 from "fox", such as "fox", "fix", or "box":

GET /_search
{
  "query": {
    "fuzzy": {
      "content": {
        "value": "fox",
        "fuzziness": "2"
      }
    }
  }
}

The "fuzziness" parameter specifies the maximum allowable edit distance. In the example above, "fuzziness" is 2, meaning that the query will match terms that are at most 2 edits away from "fox", including exact matches.
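
To get a feeling for what an edit distance of 1 or 2 means, the following sketch computes the classic Levenshtein distance in plain Java. It is only an illustration and not Elasticsearch's internal implementation; in particular, Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this simplified version does not. The class name EditDistance is an assumption made for this example.

```java
// Illustrative only: plain Levenshtein distance, not Elasticsearch's implementation.
final class EditDistance {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String query = "fox";
        for (String term : new String[] {"fox", "fix", "foxes", "quick"}) {
            int distance = levenshtein(query, term);
            System.out.println(term + ": distance " + distance
                    + ", within fuzziness 2: " + (distance <= 2));
        }
    }
}
```

With "fuzziness" set to 2, terms such as "fix" (one substitution) and "foxes" (two insertions) are close enough to "fox", while "quick" is not.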

4. Regular expression query

Elasticsearch supports regular expression queries, which can be used to query text that matches a specified pattern. Regular expression queries can use the "regexp" query type.

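For example, the following query will return documents whose "content" field contains the term "foo" or the term "bar". Note that the pattern has to match an entire term, not just part of it: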

GET /_search
{
  "query": {
    "regexp": {
      "content": "foo|bar"
    }
  }
}
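
The "regexp" query is applied to the individual terms in the index, and the pattern must match a whole term. The following plain Java sketch imitates only this anchoring behaviour using java.util.regex. Keep in mind that Elasticsearch uses Lucene's regular expression syntax, which differs from Java's in several details, so this is a rough illustration; the class WholeTermRegex is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; it does not call Elasticsearch.
final class WholeTermRegex {

    // Keeps only the terms that the pattern matches in full,
    // mirroring the anchored behaviour of the regexp query.
    static List<String> matchingTerms(String pattern, List<String> terms) {
        Pattern compiled = Pattern.compile(pattern);
        return terms.stream()
                .filter(term -> compiled.matcher(term).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("foo", "bar", "foobar", "food");
        System.out.println(matchingTerms("foo|bar", terms)); // only exact "foo" and "bar"
        System.out.println(matchingTerms("foo.*", terms));   // every term starting with "foo"
    }
}
```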

5. Wildcard query

Elasticsearch supports wildcard queries, which can be used to query text containing wildcard patterns. Wildcard queries can use the "wildcard" query type.

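For example, the following query will return documents that contain a term starting with "foo" or a term starting with "bar". A wildcard pattern cannot contain an OR operator, so the two patterns are combined with a "bool" query:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "wildcard": { "content": "foo*" }},
        { "wildcard": { "content": "bar*" }}
      ]
    }
  }
}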
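
In a wildcard pattern, "*" stands for any number of characters (including none) and "?" stands for exactly one character. Every other character, including spaces and words such as OR, is taken literally. The following plain Java sketch shows this by translating a wildcard pattern into a java.util.regex pattern; the class WildcardMatcher is a hypothetical helper and not part of Elasticsearch.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it does not call Elasticsearch.
final class WildcardMatcher {

    // Translates a wildcard pattern into an equivalent regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");       // zero or more characters
            } else if (c == '?') {
                regex.append('.');        // exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True if the whole term matches the wildcard pattern.
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("foo*", "food"));         // prefix match
        System.out.println(matches("b?r", "bar"));           // single-character wildcard
        System.out.println(matches("foo* OR bar*", "food")); // " OR " is literal text here
    }
}
```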

6. Phrase query

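Elasticsearch supports phrase queries, which can be used to query documents that contain the query terms as a phrase, that is, next to each other and in the same order. Phrase queries can use the "match_phrase" query type.

For example, the following query returns documents containing the phrase "quick brown fox" or the phrase "lazy dog". Each phrase needs its own "match_phrase" clause, because a single "match_phrase" query for "quick brown fox lazy dog" would only match that exact five-word phrase:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match_phrase": { "content": "quick brown fox" }},
        { "match_phrase": { "content": "lazy dog" }}
      ]
    }
  }
}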
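
The idea behind phrase matching can be sketched in plain Java: both the document and the phrase are split into tokens, and the phrase tokens have to appear consecutively and in the same order. This is a simplified model that assumes lowercasing and splitting on non-word characters; real analyzers and the "slop" parameter are not modelled, and the class PhraseMatch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; it does not call Elasticsearch.
final class PhraseMatch {

    // Very rough stand-in for an analyzer: lowercase and split on non-word characters.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
    }

    // True if the phrase tokens occur consecutively and in order within the text.
    static boolean containsPhrase(String text, String phrase) {
        return Collections.indexOfSubList(tokenize(text), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog";
        System.out.println(containsPhrase(text, "quick brown fox"));          // words are adjacent
        System.out.println(containsPhrase(text, "brown quick fox"));          // wrong order
        System.out.println(containsPhrase(text, "quick brown fox lazy dog")); // not one continuous phrase
    }
}
```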

7. Highlight

Elasticsearch supports highlighting keywords in query results, which can be used to make query results easier to understand. Highlighting can be enabled using the "highlight" parameter.

For example, the following query will return documents containing "foo" or "bar" and highlight the keyword in the query results:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "content": "foo" }},
        { "match": { "content": "bar" }}
      ]
    }
  },
  "highlight": {
    "fields": {
      "content": {}
    }
  }
}

8. Pagination and sorting

Elasticsearch supports pagination and sorting of query results. You can use the "from" and "size" parameters to specify the starting position and number of returned results. You can use the "sort" parameter to specify the sorting method.

For example, the following query sorts all documents by the "age" field in ascending order, skips the first 10 results, and returns the next 5:

GET /_search
{
  "from": 10,
  "size": 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    { "age": "asc" }
  ]
}
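
Applications usually think in page numbers rather than offsets. A small sketch of the conversion in plain Java is shown below; the class Pagination and its method names are hypothetical. It also checks the offset against Elasticsearch's default "index.max_result_window" of 10,000, which is an assumption about the index settings: by default, "from" plus "size" must not exceed this value.

```java
// Hypothetical helper for illustration; it does not call Elasticsearch.
final class Pagination {

    // Assumption: the index uses the default index.max_result_window of 10,000.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Converts a 1-based page number into the "from" offset.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return Math.multiplyExact(page - 1, size);
    }

    // True if the requested window is within the default limit.
    static boolean fitsDefaultWindow(int from, int size) {
        return (long) from + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        int size = 5;
        int from = fromOffset(3, size); // page 3 with 5 results per page
        System.out.println("from=" + from + ", size=" + size
                + ", fits default window: " + fitsDefaultWindow(from, size));
    }
}
```

Page 3 with a page size of 5 corresponds to the "from": 10, "size": 5 request shown above. For very deep pagination, the "search_after" parameter is usually a better fit than large "from" values.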

9. Aggregation query

Elasticsearch supports aggregation queries, which can be used to count and group documents. Aggregation queries can be enabled using the "aggs" parameter.

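For example, the following query will return the number of documents that contain each word in the "content" field. Note that a "terms" aggregation on a "text" field only works if "fielddata" is enabled for that field in the mapping; otherwise, aggregate on a "keyword" field instead: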

GET /_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_count": {
      "terms": {
        "field": "content"
      }
    }
  }
}
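
What the "terms" aggregation computes can be imitated in memory with plain Java: for each term, count the number of documents that contain it at least once. This is a simplified sketch that assumes lowercasing and splitting on non-word characters instead of a real analyzer; the class TermDocCounts is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper for illustration; it does not call Elasticsearch.
final class TermDocCounts {

    // Counts, for every term, how many documents contain it at least once.
    static Map<String, Long> docCountPerTerm(List<String> documents) {
        Map<String, Long> counts = new TreeMap<>();
        for (String document : documents) {
            // A set, so that repeated words count only once per document
            Set<String> distinctTerms = new HashSet<>(
                    Arrays.asList(document.toLowerCase(Locale.ROOT).split("\\W+")));
            distinctTerms.remove(""); // split can yield an empty leading token
            for (String term : distinctTerms) {
                counts.merge(term, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {bar=1, baz=1, foo=2}
        System.out.println(docCountPerTerm(List.of("foo bar", "foo foo baz")));
    }
}
```

Unlike this sketch, Elasticsearch returns the buckets ordered by document count and, by default, only the top 10 of them.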

The above are some of Elasticsearch's advanced query techniques. The Java sample code below illustrates how to use them from an application.

10. Java sample code

The sample code is as follows:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder.Field;

public class ElasticsearchDemo {
    
    public static void main(String[] args) throws IOException {
        // Create the client
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")));
        
        // Create the index and its mapping
        createIndexAndMapping(client);
        
        // Insert a document
        insertDocument(client);
        
        // Match query
        MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("content", "elasticsearch");
        SearchRequest searchRequest = new SearchRequest("my_index");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(matchQuery);
        searchRequest.source(searchSourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        printSearchResult(response);
        
        // Query with highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field(new Field("content").preTags("<em>").postTags("</em>"));
        searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(matchQuery);
        searchSourceBuilder.highlighter(highlightBuilder);
        searchRequest = new SearchRequest("my_index");
        searchRequest.source(searchSourceBuilder);
        response = client.search(searchRequest, RequestOptions.DEFAULT);
        printSearchResultWithHighlight(response);
        
        // Range query (both bounds inclusive)
        RangeQueryBuilder rangeQuery = QueryBuilders.rangeQuery("publish_date")
                .gte("2020-01-01")
                .lte("2021-12-31");
        searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(rangeQuery);
        searchRequest = new SearchRequest("my_index");
        searchRequest.source(searchSourceBuilder);
        response = client.search(searchRequest, RequestOptions.DEFAULT);
        printSearchResult(response);
        
        // Sorting (ascending by default)
        searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(matchQuery);
        searchSourceBuilder.sort("publish_date");
        searchRequest = new SearchRequest("my_index");
        searchRequest.source(searchSourceBuilder);
        response = client.search(searchRequest, RequestOptions.DEFAULT);
        printSearchResult(response);
        
        // Delete the index
        deleteIndex(client);
        
        // Close the client
        client.close();
    }
    
    private static void createIndexAndMapping(RestHighLevelClient client) throws IOException {
        // Describe the fields of the index
        Map<String, Object> properties = new HashMap<>();
        properties.put("title", Map.of("type", "text"));
        properties.put("content", Map.of("type", "text"));
        properties.put("publish_date", Map.of("type", "date"));
        Map<String, Object> mapping = new HashMap<>();
        mapping.put("properties", properties);
        // Create the index with its settings and mapping
        CreateIndexRequest request = new CreateIndexRequest("my_index")
                .settings(Settings.builder()
                        .put("index.number_of_shards", 1)
                        .put("index.number_of_replicas", 0))
                .mapping(mapping);
        client.indices().create(request, RequestOptions.DEFAULT);
    }

    private static void insertDocument(RestHighLevelClient client) throws IOException {
        // Insert a document
        Map<String, Object> document = new HashMap<>();
        document.put("title", "Elasticsearch Guide");
        document.put("content", "This is a guide to Elasticsearch.");
        document.put("publish_date", "2021-03-01");
        IndexRequest request = new IndexRequest("my_index")
                .id("1")
                .source(document);
        // Refresh immediately so that the searches that follow can see the document
        request.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        client.index(request, RequestOptions.DEFAULT);
    }

    private static void deleteIndex(RestHighLevelClient client) throws IOException {
        // Delete the index
        DeleteIndexRequest request = new DeleteIndexRequest("my_index");
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

    private static void printSearchResult(SearchResponse response) {
        // Print the search results
        SearchHits hits = response.getHits();
        System.out.println("Total hits: " + hits.getTotalHits().value);
        System.out.println("Hits:");
        for (SearchHit hit : hits) {
            System.out.println("Id: " + hit.getId());
            System.out.println("Score: " + hit.getScore());
            System.out.println("Title: " + hit.getSourceAsMap().get("title"));
            System.out.println("Content: " + hit.getSourceAsMap().get("content"));
            System.out.println("Publish date: " + hit.getSourceAsMap().get("publish_date"));
        }
    }

    private static void printSearchResultWithHighlight(SearchResponse response) {
        // Print the search results with highlighting
        SearchHits hits = response.getHits();
        System.out.println("Total hits: " + hits.getTotalHits().value);
        System.out.println("Hits:");
        for (SearchHit hit : hits) {
            System.out.println("Id: " + hit.getId());
            System.out.println("Score: " + hit.getScore());
            System.out.println("Title: " + hit.getSourceAsMap().get("title"));
            HighlightField highlightField = hit.getHighlightFields().get("content");
            if (highlightField != null) {
                Text[] fragments = highlightField.fragments();
                String content = "";
                for (Text fragment : fragments) {
                    content += fragment;
                }
                System.out.println("Content: " + content);
            } else {
                System.out.println("Content: " + hit.getSourceAsMap().get("content"));
            }
            System.out.println("Publish date: " + hit.getSourceAsMap().get("publish_date"));
        }
    }
}

Here we use the Elasticsearch High Level REST Client to implement the sample code. Compared with the Low Level REST Client, the high-level client is easier to use because requests and responses are represented as Java objects, which is closer to object-oriented programming and improves development efficiency.

11. Using the Spring Boot framework

First, we need to add the relevant dependencies. Add the following dependencies to the pom.xml file:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.15.2</version>
</dependency>

Among them, spring-boot-starter-data-elasticsearch is the starter that Spring Boot provides for integrating with Elasticsearch, and elasticsearch-rest-high-level-client is the Elasticsearch High Level REST Client.

Next, we create a Spring Boot main class and add the following code to it:

import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.common.settings.Settings;
// In client versions before 7.15, TimeValue is in org.elasticsearch.common.unit
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder.Field;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

@SpringBootApplication
public class ElasticsearchDemoApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(ElasticsearchDemoApplication.class, args);
    }

    @Bean
    public RestHighLevelClient client() {
        // Connect to a local node; Spring closes this bean when the application shuts down
        return RestClients.create(ClientConfiguration.create("localhost:9200")).rest();
    }

    @Override
    public void run(String... args) throws Exception {
        RestHighLevelClient client = client();
        try {
            createIndex(client);
            insertDocument(client);
            searchDocument(client);
            deleteIndex(client);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void createIndex(RestHighLevelClient client) throws IOException {
        // Describe the fields of the index
        Map<String, Object> properties = new HashMap<>();
        properties.put("title", Map.of("type", "text"));
        properties.put("content", Map.of("type", "text"));
        properties.put("publish_date", Map.of("type", "date"));
        Map<String, Object> mapping = new HashMap<>();
        mapping.put("properties", properties);
        // Create the index with its settings and mapping
        CreateIndexRequest request = new CreateIndexRequest("my_index")
                .settings(Settings.builder()
                        .put("index.number_of_shards", 1)
                        .put("index.number_of_replicas", 0))
                .mapping(mapping);
        client.indices().create(request, RequestOptions.DEFAULT);
    }

    private static void insertDocument(RestHighLevelClient client) throws IOException {
        // Insert a document (same sample data as in the plain Java example)
        Map<String, Object> document = new HashMap<>();
        document.put("title", "Elasticsearch Guide");
        document.put("content", "This is a guide to Elasticsearch.");
        document.put("publish_date", "2021-03-01");
        IndexRequest request = new IndexRequest("my_index")
                .id("1")
                .source(document);
        // Refresh immediately so that the search that follows can see the document
        request.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        client.index(request, RequestOptions.DEFAULT);
    }

    private static void searchDocument(RestHighLevelClient client) throws IOException {
        // Search for documents
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
                .must(QueryBuilders.matchQuery("title", "Elasticsearch"))
                .should(QueryBuilders.matchQuery("content", "guide"));
        sourceBuilder.query(boolQueryBuilder)
                .sort("publish_date", SortOrder.DESC)
                .from(0)
                .size(10)
                .timeout(TimeValue.timeValueSeconds(1))
                .fetchSource(new String[]{"title", "publish_date"}, new String[]{"content"});
        // Highlight the title, using only the title clause as the highlight query
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field(new Field("title"))
                .highlightQuery(QueryBuilders.matchQuery("title", "Elasticsearch"));
        sourceBuilder.highlighter(highlightBuilder);
        SearchRequest request = new SearchRequest("my_index").source(sourceBuilder);
        SearchResponse response = client.search(request, RequestOptions.DEFAULT);
        System.out.println("Total hits: " + response.getHits().getTotalHits().value);
        for (SearchHit hit : response.getHits().getHits()) {
            System.out.println("Title: " + hit.getSourceAsMap().get("title"));
            System.out.println("Publish date: " + hit.getSourceAsMap().get("publish_date"));
            // A hit may come without highlights, so guard against a missing field
            HighlightField highlight = hit.getHighlightFields().get("title");
            if (highlight != null && highlight.fragments().length > 0) {
                System.out.println("Highlighted title: " + highlight.fragments()[0].string());
            }
            System.out.println("--------------------------");
        }
    }

    private static void deleteIndex(RestHighLevelClient client) throws IOException {
        // Delete the index
        DeleteIndexRequest request = new DeleteIndexRequest("my_index");
        client.indices().delete(request, RequestOptions.DEFAULT);
    }

}

ElasticsearchDemoApplication implements the CommandLineRunner interface so that the relevant methods are executed when the application starts. In the run method, we call the methods for creating an index, inserting a document, searching for documents, and deleting the index. These methods are implemented in essentially the same way as in the Java sample code above.

Next, we can run the application and see the results. Enter the following command in the terminal:

mvn spring-boot:run

Through this Spring Boot implementation, we can interact with Elasticsearch more conveniently without having to manually set up connections and release resources. In addition, Spring Boot also provides many other features, such as automatic configuration and dependency injection. This allows us to focus more on the business logic without having to pay too much attention to the interaction with Elasticsearch.

Summary

Elasticsearch is a powerful search engine with many advanced query techniques. In actual use, you can choose an appropriate query method according to specific needs, and use the advanced functions in the query statement to implement more complex query operations. This tutorial introduces the basic query methods and advanced query techniques of Elasticsearch, and provides corresponding code examples, hoping to help readers better grasp the query functions of Elasticsearch.