Elasticsearch is a distributed, near-real-time search and analytics engine built on Lucene. It provides a distributed, multi-user full-text search engine behind a RESTful web interface, and it can process large volumes of data quickly. Elasticsearch is written in Java and released as open source under the terms of the Apache license. It is a popular enterprise search engine that was designed for cloud environments and is stable, reliable, fast, and easy to install and use.
Installing Elasticsearch (es) and the head plugin is omitted here.
MySQL alone clearly cannot serve fast, real-time queries over massive data sets, so we use es to provide the search service for big data. The typical scenario is product or commodity search.
The first step is data synchronization. There are many ways to synchronize MySQL data to es. After testing, logstash-input-jdbc proved to be stable and easy to use.
How to install logstash-input-jdbc plugin?
Reference: http://blog.csdn.net/yeyuma/article/details/50240595#quote
Full synchronization and incremental synchronization
Full synchronization copies all data to es, and is usually used for the first synchronization right after es is set up. Incremental synchronization copies subsequently inserted and updated records to es. (Deleted records cannot be synchronized this way; you have to run the delete on both sides yourself.)
Based on our company's internal practice, the principle behind incremental synchronization with logstash-input-jdbc is very simple. To synchronize incrementally, we need to know which records were inserted or updated. Therefore, every table that feeds the es search service (the synchronization target) must have an update_time column, and this column must be refreshed on every insert and update so that logstash-input-jdbc can detect the change.
For details, see: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html#_predefined_parameters
Key point:
where t.update_time > :sql_last_value
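To make the key point concrete, here is a minimal sketch of what an incremental pipeline could look like. It is not the exact configuration used in the test below. The table name table1, the alias t, the primary key column id, the index name testdb, and the metadata file path are assumptions for illustration; the driver and connection settings are copied from the multi-table configuration later in this post.

```conf
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    # poll once a minute
    schedule => "* * * * *"
    # :sql_last_value is filled in by the plugin with the value remembered from the previous run;
    # rows are ordered so that the last row read carries the newest update_time
    statement => "select * from table1 t where t.update_time > :sql_last_value order by t.update_time asc"
    # remember the update_time column value instead of the time the query ran
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    # hypothetical path; the remembered value is stored in this file between runs
    last_run_metadata_path => "/Users/logstash/.table1_last_run"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "testdb"
    # assumes the table has a primary key column named id, so that an
    # updated row overwrites its existing document instead of creating a new one
    document_id => "%{id}"
  }
}
```

On the very first run there is no remembered value yet, so the plugin starts from its initial value (the start of 1970 for a timestamp column) and the query matches every row. In that sense the same configuration can serve as the full synchronization and then carry on incrementally.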
Test Results:
First, update a row in MySQL.
Then query es to check whether the change has arrived.
It was synchronized automatically!
If you need to synchronize multiple tables at the same time, you need the following configuration:
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table1"
    type => "table1"
  }
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table2"
    type => "table2"
  }
  # add more jdbc inputs to suit your needs
}
output {
  elasticsearch {
    index => "testdb"
    document_type => "%{type}" # <- use the type from each input
    hosts => "localhost:9200"
  }
}
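The multi-table configuration above runs a plain select on every schedule tick. If each table should instead be synchronized incrementally using the update_time condition from the key point earlier, one possible variant is sketched below. Each jdbc input stores its remembered value in the file named by last_run_metadata_path, so giving every input its own file keeps the tables from overwriting each other's position. The metadata file paths, the id column, and the document_id pattern are assumptions for illustration, not part of the tested setup.

```conf
input {
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table1 t where t.update_time > :sql_last_value order by t.update_time asc"
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    # hypothetical path, one file per input
    last_run_metadata_path => "/Users/logstash/.table1_last_run"
    type => "table1"
  }
  jdbc {
    jdbc_driver_library => "/Users/logstash/mysql-connector-java-5.1.39-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/database_name"
    jdbc_user => "root"
    jdbc_password => "password"
    schedule => "* * * * *"
    statement => "select * from table2 t where t.update_time > :sql_last_value order by t.update_time asc"
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
    # hypothetical path, one file per input
    last_run_metadata_path => "/Users/logstash/.table2_last_run"
    type => "table2"
  }
}
output {
  elasticsearch {
    index => "testdb"
    document_type => "%{type}"
    # assumes every table has a primary key column named id; the type prefix
    # keeps ids from different tables from colliding in the shared index
    document_id => "%{type}_%{id}"
    hosts => "localhost:9200"
  }
}
```

Note that document_type relies on Elasticsearch mapping types, which newer Elasticsearch versions no longer support, so on a recent cluster it may be simpler to write each table to its own index instead.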