Spring Boot + Elasticsearch search: you will know how after reading this

In this tutorial, we will use the following key technology stack:

  • Spring Boot
  • Elasticsearch
  • Logstash
  • MySQL

Spring Boot serves as our main external API layer, and Elasticsearch as our search engine. Assuming MySQL is already our database, this tutorial shows how to keep MySQL and Elasticsearch synchronized in near real time using Logstash.

Spring Boot Settings

Add dependencies

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
	<groupId>io.searchbox</groupId>
	<artifactId>jest</artifactId>
	<version>6.3.1</version>
</dependency>

We need Jest (a Java REST client) to send HTTP REST requests to Elasticsearch on port 9200.

Configure application.properties

spring.elasticsearch.jest.uris=http://localhost:9200
spring.elasticsearch.jest.read-timeout=5000

EsStudyCard.java

@Document(indexName = "cards", type = "card")
public class EsStudyCard {
    @Id
    private int id;
    private String title;
    private String description;
    
    // setters and getters
}

Note that we cannot use the @Entity and @Document annotations on the same entity class: @Entity tells JPA to manage the class, while @Document tells Elasticsearch to manage it, and using the two together causes a conflict and throws an exception. So we create a separate entity class dedicated to Elasticsearch.
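As an illustration, the two classes can live side by side. The StudyCard JPA entity below is a hypothetical sketch (its table name and fields are assumptions); only EsStudyCard is used for search. Note that the two @Id annotations also come from different packages:

// JPA entity: managed by JPA/Hibernate, mapped to the MySQL table (hypothetical sketch)
@Entity
@Table(name = "study_cards")
public class StudyCard {
    @javax.persistence.Id
    private int id;
    // ...
}

// Elasticsearch document: a separate class, even though the fields overlap
@Document(indexName = "cards", type = "card")
public class EsStudyCard {
    @org.springframework.data.annotation.Id
    private int id;
    // ...
}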

If you are running Elasticsearch in an external Docker container or on a separate server, replace localhost in the uris property with the corresponding hostname or address.

EsStudyCardService.java

import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.Search;

@Service
public class EsStudyCardService {
    @Autowired
    private JestClient jestClient;

    public List<EsStudyCard> search(String content) {
        // Build a bool query: fuzzy-match the content against title OR description.
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(
            QueryBuilders
                .boolQuery()
                .should(QueryBuilders.matchQuery("title", content).fuzziness("AUTO").operator(Operator.OR))
                .should(QueryBuilders.matchQuery("description", content).fuzziness("AUTO").operator(Operator.OR))
                .minimumShouldMatch(1)
        );
        // Wrap the generated query JSON in a Jest request against index "cards", type "card".
        Search search = new Search.Builder(searchSourceBuilder.toString())
                            .addIndex("cards")
                            .addType("card")
                            .build();
        try {
            JestResult result = jestClient.execute(search);
            return result.getSourceAsObjectList(EsStudyCard.class);
        } catch (IOException e) {
            e.printStackTrace();
            return Collections.emptyList();
        }
    }
}

This implements a fuzzy search. First we create a SearchSourceBuilder instance and call its query() method, passing in a QueryBuilder instance (created through the QueryBuilders factory). This QueryBuilder matches any document that satisfies the fuzzy search criteria. We use the should() method together with minimumShouldMatch(1) to express the logic "match on the title field OR on the description field".

For each QueryBuilder passed to should(), we call fuzziness() to enable fuzzy matching; its parameter is the maximum edit distance between a search term and an indexed term (the documented values are 0, 1, 2, or AUTO, which picks a distance based on term length). We then call operator() with Operator.OR, meaning that if the search content contains several words, each word is fuzzy-matched individually and the matches are combined with logical OR. For example, searching for "red apple" matches any document that fuzzily matches "red" OR fuzzily matches "apple".
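To make this concrete, the query body that searchSourceBuilder.toString() produces looks roughly like the following for a search for "red apple" (simplified; the real output also carries defaults such as boost values):

{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":       { "query": "red apple", "operator": "OR", "fuzziness": "AUTO" } } },
        { "match": { "description": { "query": "red apple", "operator": "OR", "fuzziness": "AUTO" } } }
      ],
      "minimum_should_match": 1
    }
  }
}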

We then specify the index and type through the Search builder and send the request with JestClient. As long as the Jest dependency is on the classpath (as shown earlier), annotating the field with @Autowired is enough: Spring Boot injects a JestClient instance for us.

EsStudyCardController.java

@RestController
public class EsStudyCardController {
    @Autowired
    private EsStudyCardService esStudyCardService;

    @PostMapping(value = "/esstudycard/search")
    public ResponseEntity<Object> search(@RequestBody JSONObject payload) {
        String content = (String) payload.get("content");
        List<EsStudyCard> studyCards = esStudyCardService.search(content);
        JSONObject jsonObject = new JSONObject();
        jsonObject.put("data", studyCards);
        return ResponseEntity.ok().body(jsonObject);
    }
}

The controller simply calls the search method of EsStudyCardService.
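Assuming the application runs on localhost:8080 (an illustrative assumption), the endpoint can be exercised like this:

POST http://localhost:8080/esstudycard/search
Content-Type: application/json

{"content": "red apple"}

The response wraps the matching cards in a data field, for example {"data": [{"id": 1, "title": "...", "description": "..."}]}.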

Logstash configuration

Taking MySQL as an example, we first need to download the matching JDBC driver and place it in the directory Logstash expects.

Then we create the Logstash configuration file shown below. Taking a Docker container as an example, mount the configuration file as a volume into /usr/share/logstash/pipeline (on a CentOS-based image). Logstash automatically picks up configuration files found in the pipeline directory.
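As a sketch, the Docker wiring in docker-compose.yml might look like this (image version, jar file name, and local paths are illustrative assumptions):

services:
  logstash:
    image: docker.elastic.co/logstash/logstash:6.5.0
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
      - ./drivers/mysql-connector-java-8.0.18.jar:/usr/share/logstash/logstash-core/lib/jars/mysql-connector-java-8.0.18.jar

The pipeline configuration file itself: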

input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://localhost:3306/vanpanda"
        jdbc_user => "root"
        jdbc_password => "root"
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        statement => "SELECT * FROM study_cards where modified_timestamp > :sql_last_value order by modified_timestamp;"
        use_column_value => true
        tracking_column_type => "timestamp"
        tracking_column => "modified_timestamp"
        schedule => "* * * * *"
        jdbc_default_timezone => "America/Los_Angeles"
        sequel_opts => {
            fractional_seconds => true
        }
    }
}

output {
    elasticsearch {
        index => "cards"
        document_type => "card"
        document_id => "%{id}"
        hosts => ["localhost:9200"]
    }
}

This configuration file tells Logstash where the data source is; here the source is our MySQL database. With this configuration, Logstash runs the specified query every minute and writes the returned rows into Elasticsearch, achieving near real-time synchronization.

Again, replace localhost with the appropriate hostname or address.

Viewing Elasticsearch status and data

We can inspect the status of Elasticsearch, including its indices, through its HTTP API:

http://localhost:9200/{index_name}
http://localhost:9200/_cat/indices?v
http://localhost:9200/{index_name}/_search?pretty

Copy the URLs above directly into a browser to query Elasticsearch through its HTTP REST API on port 9200.
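For example, the index listing endpoint (_cat/indices?v) returns a small table like this (values are illustrative):

health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   cards fY3kzPhgQ2GdKZg2QW7a1w   5   1         42            0     45.6kb         45.6kb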

Keep Logstash running continuously, and check for data updates

When Logstash finishes its first task, it shuts down automatically. To keep Elasticsearch and MySQL in near real-time sync, the Logstash Docker container needs to stay up and execute the task periodically.

In the Logstash configuration file, set the schedule option inside the jdbc input; otherwise Logstash treats the query as a one-off task and shuts down after the first execution.

input {
    jdbc {
		...
        schedule => "* * * * *" # add this line: run the query every minute
    }
}

The syntax in the example runs the task once per minute. See the schedule syntax documentation to customize the interval.
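For example, to run the query every five minutes instead:

input {
    jdbc {
        ...
        schedule => "*/5 * * * *" # standard cron syntax: at minutes 0, 5, 10, ...
    }
}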

Ensure Logstash does not re-import old data

If we do not tell Logstash how to distinguish old data from new, then on every scheduled run it deletes the old documents and re-inserts them, wasting resources.

Add the following configuration to Logstash:

input {
  jdbc {
    ...
    # use a modified date / timestamp column to tell new data from old
    statement => "SELECT * FROM testtable where Date > :sql_last_value order by Date"
    # tell Logstash to track a column value as :sql_last_value
    use_column_value => true
    # use the Date column as the source of :sql_last_value
    tracking_column => "Date"
    # required when the tracked column is a timestamp; the default type is numeric
    tracking_column_type => "timestamp"
  }
}

Reference: Migrating MySQL Data into Elasticsearch Using Logstash

Error messages and solutions

Unable to find driver class via URLClassLoader in given driver jars: com.mysql.jdbc.Driver and com.mysql.jdbc.Driver

For Logstash 6.2.x and above, do not set jdbc_driver_library in the conf file; instead, place the Connector/J driver jar directly in /usr/share/logstash/logstash-core/lib/jars/.

Reference: a Stack Overflow question

After Logstash runs, Elasticsearch contains only one document, and the rest of the data is not stored

The main cause is duplicate document ids: when ids collide, Elasticsearch deletes the old document and inserts the new one. Checking the index status (http://localhost:9200/_cat/indices?v) shows the deleted documents. In the Logstash conf file, set document_id in the output block and use the %{field_name} placeholder to pick a unique id. For example, if the rows we select have a primary key column named id, we can set document_id => "%{id}"; Logstash then uses that column's value as the Elasticsearch document id.

Reference: a question on the official Elastic forum

Logstash sql_last_value is always 0

If tracking_column refers to a timestamp column, remember to set tracking_column_type => "timestamp" (the default type is numeric, which leaves sql_last_value stuck at 0).

Logstash's sql_last_value timestamp is inconsistent with the database

One possible cause is a timezone mismatch between Logstash's sql_last_value and the database, since Elasticsearch and Logstash both use the UTC timezone by default. Workaround: in the Logstash configuration file, set jdbc_default_timezone to the timezone you want, and Logstash will convert the timestamps accordingly.

Reference: Logstash jdbc input timezone issue

On first startup, the MySQL Docker container reports ERROR 1396 (HY000) at line 1: Operation CREATE USER failed for 'root'@'%' and then shuts down

This probably happens because the parameters passed to the MySQL container in docker-compose.yml try to create a user named root, which already exists. Try a different user name; or, if you want to keep using the root user, just set MYSQL_ROOT_PASSWORD in docker-compose.yml.
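A minimal sketch of the relevant docker-compose.yml fragment (service name, image tag, and password are illustrative):

services:
  mysql:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: root # set only the root password; do not try to CREATE USER 'root'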

Reference: ERROR 1396 (HY000): Operation CREATE USER failed for 'root'@'%' #129

Logstash truncates the fractional seconds of sql_last_value; if the database timestamps keep millisecond precision, the same rows are selected repeatedly.
For example, if a database timestamp is 2019-11-20 09:28:03.042000, Logstash's sql_last_value becomes 2019-11-20 09:28:03. Since our query selects rows with a timestamp greater than :sql_last_value, and the stored value still carries milliseconds, that row always compares as greater and gets selected again on every run.

This problem was fixed in version 4.3.5 of the jdbc input plugin (logstash-input-jdbc). If we use the latest Logstash image, the bundled jdbc input plugin should already include the fix. We must still add the following jdbc configuration in the Logstash config file for the fix to take effect:

sequel_opts => {
	fractional_seconds => true
}

Reference: Force all usage of sql_last_value to be typed according to the settings #260

In Spring Boot, when we use @Entity and @Document on the same entity class, the following error occurs:
The bean 'studyCardPagedJpaRepository', defined in null, could not be registered. A bean with that name has already been defined in null and overriding is disabled.

As noted earlier, @Entity puts JPA in charge of the class while @Document puts Elasticsearch in charge of it; using the two together creates a conflict and throws an exception. Create a separate entity class dedicated to Elasticsearch.

Reference: a CSDN blog post
Reference: Spring Data pitfall notes

The author is still learning; if anything here is wrong, please point it out and bear with me, thank you!
Author: David Chou (computer science student at Simon Fraser University, Vancouver)



Origin: blog.csdn.net/vandavidchou/article/details/103098850