Synchronizing MySQL data to Elasticsearch

There are several ways to do this:

    1. Use the MySQL binlog; Alibaba's canal can parse the binlog and sync the changes into ES.

    2. Use logstash-input-jdbc, the plugin officially recommended by ES; it is a Logstash plugin (source: logstash-input-jdbc).

First install Logstash; I won't repeat that here. Because logstash-input-jdbc is developed in Ruby, you also need to install Ruby: download and run the installer, then open CMD and run ruby -v to check whether the installation succeeded.

Then modify the gem source. List the current sources with gem sources -l, remove the original source, and add a new one:

gem sources --remove https://rubygems.org/
gem sources -a http://gems.ruby-china.org/
gem sources -l


After the change succeeds, the source address used by the Gemfile also needs to be modified. Install bundler and configure the mirror:

gem install bundler
bundle config mirror.https://rubygems.org https://gems.ruby-china.org
Then install logstash-input-jdbc. In the Logstash bin directory, run the following command:

.\logstash-plugin.bat install logstash-input-jdbc

After a while "Installation successful" is printed, meaning the plugin is installed. Next, how to use it.

Official documentation: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html

Create a new folder in the bin directory, for example mysql.
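
By the end of the setup, this folder will contain the following files (all of them named in the configs below):

mysql\
    jdbc.conf                          (Logstash pipeline configuration)
    jdbc.sql                           (SQL statement run on each schedule tick)
    mysql-connector-java-5.1.30.jar    (MySQL JDBC driver)
    es-template.json                   (ES index template for unanalyzed fields)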

 

In that folder, create a new file jdbc.conf with the following configuration:

input {
    stdin {
    }
    jdbc {
        # MySQL connection settings
        jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/test"
        jdbc_user => "root"
        jdbc_password => "123456"
        # JDBC driver jar and driver class
        jdbc_driver_library => "F:\Program Files\logstash-6.2.4\bin\mysql\mysql-connector-java-5.1.30.jar"
        jdbc_driver_class => "com.mysql.jdbc.Driver"
        # fetch results in pages of 300000 rows
        jdbc_paging_enabled => "true"
        jdbc_page_size => "300000"
        # track the id column; its last value is exposed to the SQL as :sql_last_value
        use_column_value => "true"
        tracking_column => "id"
        # file containing the SQL statement to execute
        statement_filepath => "F:\Program Files\logstash-6.2.4\bin\mysql\jdbc.sql"
        # cron-style schedule: run once every minute
        schedule => "* * * * *"
        type => "jdbc"
        jdbc_default_timezone => "Asia/Shanghai"
    }
}


filter {
    json {
        source => "message"
        remove_field => ["message"]
    }
}


output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test_out"
        template => "F:\Program Files\logstash-6.2.4\bin\mysql\es-template.json"
        template_name => "t-statistic-out-logstash"
        template_overwrite => true
        document_type => "out"
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}
Put the JDBC driver jar used to connect to the database into this folder, then create a new jdbc.sql script file containing the SQL to execute: select * from test where id >= :sql_last_value. The value of :sql_last_value advances based on the tracked id column, so Logstash pulls only updated data into ES at the scheduled times, achieving incremental synchronization.
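
A minimal jdbc.sql along these lines (the table name test comes from the connection string above):

-- jdbc.sql: executed by logstash-input-jdbc on every schedule tick;
-- :sql_last_value is replaced with the last recorded value of tracking_column (id)
select * from test where id >= :sql_last_value

Note that >= re-reads the row with the last recorded id on every run; because document_id => "%{id}" is set in the output, that row simply overwrites the same ES document rather than creating a duplicate.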

Then start Logstash with the command:

.\logstash.bat -f .\mysql\jdbc.conf

If it fails to start, check that jdbc.conf and jdbc.sql are encoded as UTF-8 without BOM.

Startup output similar to the following indicates success; Logstash will automatically execute the SQL statement in jdbc.sql and sync the query results into ES:

[2018-07-04T15:59:46,690][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-07-04T15:59:46,869][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-07-04T15:59:46,915][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-07-04T15:59:46,921][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-07-04T15:59:46,936][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>"F:\\Program Files\\logstash-6.2.4\\bin\\mysql\\es-template.json"}
[2018-07-04T15:59:46,951][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"t-statistis-out-template", "order"=>1, "settings"=>{"index"=>{"refresh_interval"=>"5s"}}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>false}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"not_analyzed"}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"not_analyzed"}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}, "acc_id"=>{"type"=>"keyword"}, "acc_name"=>{"type"=>"keyword"}, "acc_pp"=>{"type"=>"keyword"}, "account_price_type"=>{"type"=>"keyword"}, "cyacc_no"=>{"type"=>"keyword"}, "order_id"=>{"type"=>"keyword"}, "voucher_id"=>{"type"=>"keyword"}}}}, "aliases"=>{}}}
[2018-07-04T15:59:46,986][INFO ][logstash.outputs.elasticsearch] Installing elasticsearch template to _template/t-statistic-out-logstash
[2018-07-04T15:59:47,097][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-07-04T15:59:47,499][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1b1682f0 run>"}
The stdin plugin is now waiting for input:
[2018-07-04T15:59:47,613][INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
[2018-07-04T16:00:01,955][INFO ][logstash.inputs.jdbc ] (0.009963s) SELECT version()
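
To confirm that rows actually arrived in ES, you can query the index, for example with curl or in a browser (test_out is the index name from the output section above):

curl "http://localhost:9200/test_out/_search?pretty"
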
If the synced fields must not be analyzed by ES (otherwise the analyzer splits the field values, queries return incorrect results, and you cannot do exact matching the way LIKE works in a relational database), you need to set template, template_name, and template_overwrite in the output section of jdbc.conf, as below:

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "t_statistic_out"
        template => "F:\Program Files\logstash-6.2.4\bin\mysql\es-template.json"
        template_name => "t-statistic-out-logstash"
        template_overwrite => true
        document_type => "out"
        document_id => "%{id}"
    }
    stdout {
        codec => json_lines
    }
}
The es-template.json configuration file is as follows; fields whose values should not be analyzed are given type keyword:

{
    "template": "t-statistis-out-template",
    "order": 1,
    "settings": {
        "index": {
            "refresh_interval": "5s"
        }
    },
    "mappings": {
        "_default_": {
            "_all": { "enabled": false },
            "dynamic_templates": [
                {
                    "message_field": {
                        "match": "message",
                        "match_mapping_type": "string",
                        "mapping": { "type": "string", "index": "not_analyzed" }
                    }
                },
                {
                    "string_fields": {
                        "match": "*",
                        "match_mapping_type": "string",
                        "mapping": { "type": "string", "index": "not_analyzed" }
                    }
                }
            ],
            "properties": {
                "@timestamp": { "type": "date" },
                "@version": { "type": "keyword" },
                "geoip": {
                    "dynamic": true,
                    "properties": {
                        "ip": { "type": "ip" },
                        "location": { "type": "geo_point" },
                        "latitude": { "type": "half_float" },
                        "longitude": { "type": "half_float" }
                    }
                },
                "acc_id": { "type": "keyword" },
                "acc_name": { "type": "keyword" },
                "acc_pp": { "type": "keyword" },
                "account_price_type": { "type": "keyword" },
                "cyacc_no": { "type": "keyword" },
                "order_id": { "type": "keyword" },
                "voucher_id": { "type": "keyword" }
            }
        }
    },
    "aliases": {}
}
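
You can check whether the template was installed by fetching it under the template_name used in the output config:

curl "http://localhost:9200/_template/t-statistic-out-logstash?pretty"
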
If the template appears to have no effect, set order to a value greater than 0: Logstash installs its own default template at startup, so you also need to delete that default one.

Then delete the corresponding index and restart; you will find that the template takes effect.
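
For example (assuming the default template installed by the Logstash ES output is named logstash, its default template_name, and t_statistic_out is the index from the config above):

curl -XDELETE "http://localhost:9200/_template/logstash"
curl -XDELETE "http://localhost:9200/t_statistic_out"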


---------------------
Author: Dian life is a dream
Source: CSDN
Original: https://blog.csdn.net/ll840768874/article/details/80914833
Copyright: This is an original article by the blogger; when reposting, please include a link to the original post.
