Using Elasticsearch in Spring Boot

1. Access via Spring Boot templates (not recommended)

At first I planned to use Spring Boot's template approach directly, since that is the approach most articles online describe. It only needs this dependency:

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

However, the official Java API documentation says:

Deprecated in 7.0.0.

The TransportClient is deprecated in favour of the Java High Level REST Client and will be removed in Elasticsearch 8.0. The migration guide describes all the steps needed to migrate.

Looking at the source code, the template approach is built on top of that Java API:

[Figure: the dependencies introduced by the Spring Boot template approach]

The template approach calls the Java API (the TransportClient) directly. That API will not be supported going forward and is officially discouraged in favour of the Java High Level REST Client, and Elasticsearch 8.0 removes it entirely. Since a migration would be unavoidable later anyway, it is better to switch to the recommended client now, and that is the approach used here.

2. Access via the Java High Level REST Client

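To run Elasticsearch search queries with the Java High Level REST Client, the first step is adding the dependencies. The high-level client depends on the following artifacts and their transitive dependencies: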

  • org.elasticsearch.client:elasticsearch-rest-client
  • org.elasticsearch:elasticsearch

2.1 Adding the dependencies

In a Spring Boot project, add the following to pom.xml:

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>6.3.2</version>
        </dependency>

        <!-- Java High Level REST Client -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>6.3.2</version>
        </dependency>

2.2 Configuring the cluster address

Once the dependencies are added, the client can be initialized:

RestHighLevelClient client = new RestHighLevelClient(
        RestClient.builder(
                new HttpHost("localhost", 9200, "http")));

The client maintains an internal thread pool, so once your work is done you can call client.close() to release its resources. Whether to close it depends on your usage: if you query frequently, make the client a singleton instead, so that repeatedly creating and releasing the thread pool does not hurt application performance. In Spring Boot a singleton bean makes this easy.

Add the cluster address to the application.yml configuration file. I have only one node; if you have several, separate them with commas and parse the list yourself.

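elasticsearch:
  ip: localhost:9200

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticsearchRestClient {

    /**
     * ES address, in ip:port form
     */
    @Value("${elasticsearch.ip}")
    String ipPort;

    @Bean
    public RestClientBuilder restClientBuilder() {
        return RestClient.builder(makeHttpHost(ipPort));
    }

    // Singleton client shared by the whole application; Spring infers and
    // calls its public close() method when the context shuts down.
    @Bean(name = "highLevelClient")
    public RestHighLevelClient highLevelClient(RestClientBuilder restClientBuilder) {
        // Available in the 6.x client used here; removed from RestClientBuilder in 7.0
        restClientBuilder.setMaxRetryTimeoutMillis(60000);
        return new RestHighLevelClient(restClientBuilder);
    }

    // Turns "ip:port" into an HttpHost using plain HTTP
    private HttpHost makeHttpHost(String s) {
        String[] address = s.trim().split(":");
        String ip = address[0];
        int port = Integer.parseInt(address[1]);
        return new HttpHost(ip, port, "http");
    }
}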

Here we have only one address; with multiple addresses you need to split and parse the list yourself.
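For several comma-separated addresses, the parsing step might look like the sketch below. EsHostParser and HostPort are hypothetical names, not part of the original code or of the Elasticsearch client; the helper only splits a string such as "10.0.0.1:9200,10.0.0.2:9200" into host/port pairs and rejects malformed entries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: parses "host:port,host:port" as it might appear in
 * the elasticsearch.ip property when the cluster has several nodes.
 */
final class EsHostParser {

    /** One parsed node address; each can become new HttpHost(host, port, "http"). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    static List<HostPort> parse(String value) {
        List<HostPort> result = new ArrayList<>();
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas such as ",a:9200"
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(new HostPort(host, port));
        }
        return result;
    }
}
```

In ElasticsearchRestClient, restClientBuilder() could then pass every node at once, for example RestClient.builder(EsHostParser.parse(ipPort).stream().map(h -> new HttpHost(h.host, h.port, "http")).toArray(HttpHost[]::new)), since RestClient.builder accepts a varargs list of HttpHost.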

3. Elasticsearch search queries

After the previous step, the client can be used in the project for concrete search and query operations. Before using it, a few concepts need to be clear.

3.1 Elasticsearch data structure

In our use case, Elasticsearch stores the logs sent from each client. Each log entry is a Document. A log carries many pieces of information, such as upload time, browser and IP, and each of these is a Field. Logs can be of different kinds, such as server logs and user-behavior logs; that kind is the Type. Each day's logs are stored in a separate Index. The structure can be compared to a relational database such as MySQL:

Relational database    Elasticsearch
Databases              Indices
Tables                 Types
Rows                   Documents
Columns                Fields

An Elasticsearch cluster contains multiple indices (databases); an index contains types (tables); a type contains multiple documents (rows); and a document contains multiple fields (columns). Note that since Elasticsearch 6.0 a newly created index can hold only a single type, and types are deprecated as of 7.0, which is why the examples below use the type name _doc.

For example, manually add a log entry with index customer, type _doc and document id 1:

PUT localhost:9200/customer/_doc/1?pretty
{
    "city": "北京",
    "useragent": "Mobile Safari",
    "sys_version": "Linux armv8l",
    "province": "北京",
    "event_id": "",
    "log_time": 1559191912,
    "session": "343730"
}

Then query the log entry just added:

GET localhost:9200/customer/_doc/1?pretty
{
    "_index": "customer",
    "_type": "_doc",
    "_id": "1",
    "_version": 3,
    "_seq_no": 2,
    "_primary_term": 1,
    "found": true,
    "_source": {
        "city": "北京",
        "useragent": "Mobile Safari",
        "sys_version": "Linux armv8l",
        "province": "北京",
        "event_id": "",
        "log_time": 1559191912,
        "session": "343730"
    }
}

3.2 Elasticsearch condition query

First, initialize a SearchRequest and set the index and type, using the log added above as the example:

        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("customer");
        searchRequest.types("_doc");

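Next, combine the search conditions. This example mainly uses =, !=, > and < conditions; for more complex requirements, see the official documentation. Note that termQuery matches an exact, unanalyzed term, so on a string field created by dynamic mapping an exact match usually needs the keyword sub-field (for example province.keyword).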

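// "=" conditions: matchQuery is a full-text match, termQuery an exact term match
MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("city", "北京");
TermQueryBuilder termQuery = QueryBuilders.termQuery("province", "福建");

// Range query: 12345 < log_time < 343750 (gt and lt are exclusive bounds)
RangeQueryBuilder timeFilter = QueryBuilders.rangeQuery("log_time").gt(12345).lt(343750);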

To combine these queries, including != conditions, use a BoolQueryBuilder. It provides four clause methods:

  • must: equivalent to AND.
  • mustNot: equivalent to NOT.
  • should: equivalent to OR. If the bool query also has must or filter clauses, should clauses become optional unless minimumShouldMatch is set.
  • filter: like must, but it does not take part in relevance scoring, so it is more efficient when no score is needed.

QueryBuilder totalFilter = QueryBuilders.boolQuery()
                .filter(matchQuery)
                .filter(timeFilter)
                .mustNot(termQuery);
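To make the clause logic of totalFilter concrete, here is a simplified in-memory model of it. This is not Elasticsearch code and not how Elasticsearch executes queries: the real match query analyzes text and scores results, while this sketch approximates it with plain string equality. BoolFilterModel is a hypothetical name; it only shows that a document is kept when every filter clause matches and no mustNot clause matches.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical, simplified model of the bool query's filter/mustNot logic. */
final class BoolFilterModel {

    /** Kept only if all filter clauses match (AND) and no mustNot clause matches (NOT). */
    static boolean matches(Map<String, Object> doc,
                           List<Predicate<Map<String, Object>>> filters,
                           List<Predicate<Map<String, Object>>> mustNots) {
        return filters.stream().allMatch(p -> p.test(doc))
                && mustNots.stream().noneMatch(p -> p.test(doc));
    }

    /** Stands in for matchQuery("city", "北京"), approximated as equality. */
    static boolean cityIsBeijing(Map<String, Object> doc) {
        return "北京".equals(doc.get("city"));
    }

    /** Stands in for rangeQuery("log_time").gt(12345).lt(343750); both bounds exclusive. */
    static boolean logTimeInRange(Map<String, Object> doc) {
        Object value = doc.get("log_time");
        if (!(value instanceof Number)) {
            return false;
        }
        long t = ((Number) value).longValue();
        return t > 12345 && t < 343750;
    }

    /** Stands in for termQuery("province", "福建"), used as the mustNot clause. */
    static boolean provinceIsFujian(Map<String, Object> doc) {
        return "福建".equals(doc.get("province"));
    }

    /** filter(match city) AND filter(range log_time) AND NOT(term province). */
    static boolean totalFilter(Map<String, Object> doc) {
        return matches(doc,
                List.of(BoolFilterModel::cityIsBeijing, BoolFilterModel::logTimeInRange),
                List.of(BoolFilterModel::provinceIsFujian));
    }
}
```

Under this model the sample log added earlier would not match, because its log_time of 1559191912 lies outside the range used in the example.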

3.3 Elasticsearch paging query

You can set the number of documents returned by each query. If it is not set, only 10 hits are returned by default; the number can be set manually:

sourceBuilder.query(totalFilter).size(100);

Setting the number of returned hits alone does not meet our needs, because we cannot know the total in advance. So we implement paging ourselves, with the help of the from() method.

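The complete sample code:

import java.io.IOException;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.RangeQueryBuilder;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class TestService {

    @Autowired
    RestHighLevelClient highLevelClient;

    // IOExceptions propagate to the caller: catching them inside the loop
    // would leave "from" unchanged and retry the same page forever.
    public void search() throws IOException {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("customer");
        searchRequest.types("_doc");

        // "=" conditions
        MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("city", "北京");
        TermQueryBuilder termQuery = QueryBuilders.termQuery("province", "福建");
        // Range query
        RangeQueryBuilder timeFilter = QueryBuilders.rangeQuery("log_time").gt(12345).lt(343750);

        QueryBuilder totalFilter = QueryBuilders.boolQuery()
                .filter(matchQuery)
                .filter(timeFilter)
                .mustNot(termQuery);

        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        int size = 200;
        int from = 0;
        long total = 0;

        do {
            sourceBuilder.query(totalFilter).from(from).size(size);
            searchRequest.source(sourceBuilder);

            // search(SearchRequest, Header...) in the 6.3 client
            SearchResponse response = highLevelClient.search(searchRequest);
            SearchHit[] hits = response.getHits().getHits();
            for (SearchHit hit : hits) {
                System.out.println(hit.getSourceAsString());
            }

            // In the 6.x client getTotalHits() returns a long
            total = response.getHits().getTotalHits();
            System.out.println("test: [" + total + "][" + from + "-" + (from + hits.length) + ")");

            if (hits.length == 0) {
                break; // nothing more to read; also guards against an endless loop
            }
            from += hits.length;

            // from + size must be less than or equal to: [10000]
            if (from >= 10000) {
                System.out.println("test: more than 10000 hits, stopping");
                break;
            }
        } while (from < total);
    }
}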

3.4 Paging query errors

When paging past 10,000 results, the query fails with this error:

from + size must be less than or equal to: [10000]

The quickest fix is to enlarge the index's result window (the index.max_result_window setting):

curl -XPUT http://127.0.0.1:9200/customer/_settings -H 'Content-Type: application/json' -d '{ "index" : { "max_result_window" : 500000 } }'

However, a larger window costs more server memory and CPU. In our use case that is not worth it, because our goal is to search for specific target data rather than traverse large result sets. So we simply give up on results beyond this limit, which is what this part of the code above does:

    // from + size must be less than or equal to: [10000]
    if (from >= 10000) {
        System.out.println("test: more than 10000 hits, stopping");
        break;
    }
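The check above works because the page size of 200 divides 10000 evenly. With a page size such as 300, the loop would still send from = 9900 with size = 300 and hit the error. The sketch below is a hypothetical helper, not part of the original code, that plans the (from, size) pairs up front and shrinks the last page so that from + size never exceeds max_result_window (10000 by default).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: plans from/size pages that stay inside max_result_window. */
final class PagePlanner {

    /** Returns {from, size} pairs covering min(total, maxResultWindow) hits. */
    static List<int[]> plan(long total, int pageSize, int maxResultWindow) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<int[]> pages = new ArrayList<>();
        long limit = Math.min(total, maxResultWindow); // hits reachable with from/size
        for (int from = 0; from < limit; from += pageSize) {
            int size = (int) Math.min(pageSize, limit - from); // shrink the final page
            pages.add(new int[] {from, size});
        }
        return pages;
    }
}
```

Since the total is only known after the first response, the first page would be requested as before, and the remaining pairs could then be fed to sourceBuilder.from(...).size(...) in turn.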

There is still a lot about Elasticsearch that I am not familiar with. Interested readers are welcome to discuss it and point out my mistakes; otherwise I will deepen my understanding through further use.

References:
1. Elasticsearch: The Definitive Guide
2. Elasticsearch: Java API [7.1]
3. Elasticsearch: Java REST Client [7.1]
4. Elasticsearch queries: the bool query
5. Solving the "Result window is too large" problem of Elasticsearch deep paging

Reprinted from: https://www.jianshu.com/p/de838a665eec


Origin: blog.csdn.net/weixin_34037515/article/details/91149809