Best practices for synchronizing MySQL data to Elasticsearch

Elasticsearch is a real-time, distributed search and analytics engine built on Lucene. It provides a distributed, multi-user full-text search engine behind a RESTful web interface and can process large-scale data very quickly. Elasticsearch is written in Java and released as open source under the terms of the Apache license. It is a popular enterprise search engine. Designed for use in cloud environments, it offers real-time search and is stable, reliable, fast, and easy to install and use.

Installing Elasticsearch (es) and the head plugin is omitted here.

MySQL alone cannot serve fast, real-time queries over massive data sets, so we use es to provide the search service for large volumes of data. A typical scenario is product search.

The first step is data synchronization. There are many ways to synchronize MySQL data to es. After testing, we found logstash-input-jdbc to be stable and easy to use.

How do you install the logstash-input-jdbc plugin?

Reference: http://blog.csdn.net/yeyuma/article/details/50240595#quote 

Full synchronization and incremental synchronization

Full synchronization means copying all data to es. It is usually done once, when the es index has just been created. Incremental synchronization means copying subsequently inserted and updated records to es. (Deleted records cannot be synchronized this way; you have to run your own delete commands on both sides.)
In our internal practice, the principle behind incremental synchronization with logstash-input-jdbc is very simple. To synchronize incrementally, we need to know which records were inserted or updated. Therefore every table that feeds the es search service (the table to be synchronized) must have an update_time column whose value is refreshed on every insert and update, so that logstash-input-jdbc can tell which rows have changed since the last run.
For details, see: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_predefined_parameters

Key point:
where t.update_time > :sql_last_value
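
The clause above can be wired into a single jdbc input roughly as follows. This is only a minimal sketch, not the exact configuration used in the test: the table name table1, the primary key column id, and the last_run_metadata_path value are assumptions for illustration, and the connection settings are copied from the multi-table example in this article.

```
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"                      # run once per minute
    # Fetch only rows changed since the value remembered from the previous run.
    statement => "select * from table1 t where t.update_time > :sql_last_value order by t.update_time asc"
    use_column_value => true                     # track a column value instead of the last run time
    tracking_column => "update_time"             # column whose last seen value becomes :sql_last_value
    tracking_column_type => "timestamp"
    last_run_metadata_path => "/Users/logstash/.table1_last_run"   # hypothetical path
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testdb"
    document_id => "%{id}"                       # assumes an id column; an updated row overwrites its document
  }
}
```

Setting document_id matters for updates: without it, es generates a new id for each event, so an updated row would be indexed as a second document instead of replacing the old one. Note also that with a strict > comparison, a row whose update_time equals the last tracked value can be skipped on the next run.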

Test results:

First, update one row in MySQL.

Then query es to check whether the change has arrived.

The update was synchronized automatically.

If you need to synchronize multiple tables at the same time, define one jdbc input per table and give each its own type, as in the following configuration:

input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table1"
    type => "table1"
  }
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table2"
    type => "table2"
  }
  # add more jdbc inputs to suit your needs 
}
output {
    elasticsearch {
        index => "testdb"
        document_type => "%{type}"   # <- use the type from each input
        hosts => "localhost:9200"
    }
}

Origin blog.csdn.net/litianquan/article/details/80870046