This article describes how to use canal to incrementally synchronize MySQL data to Elasticsearch. (Note: incremental only!)
1. Introduction
1.1 Canal
Canal is a high-performance data synchronization system based on the MySQL binary log. It is widely used at Alibaba (including https://www.taobao.com ) to provide reliable, low-latency incremental data pipelines. GitHub address: https://github.com/alibaba/canal
Canal Server parses the MySQL binlog and subscribes to data changes; Canal Client can then broadcast those changes to any destination, such as databases or Apache Kafka.
It has the following features:
- Supports all platforms.
- Supports fine-grained monitoring powered by Prometheus.
- Supports parsing and subscribing to the MySQL binlog in different ways, for example by GTID.
- Supports high-performance, real-time data synchronization. (See Performance.)
- Canal Server and Canal Client support HA/scalability, backed by Apache ZooKeeper.
- Supports Docker.
Disadvantage:
It does not support full synchronization, only incremental updates.
The full wiki is at: https://github.com/alibaba/canal/wiki
1.2 How it works
The principle is simple:
- Canal emulates the MySQL slave interaction protocol, disguises itself as a MySQL slave, and sends the dump protocol to the MySQL master.
- The MySQL master receives the dump request and begins pushing the binary log to the slave (i.e. canal).
- Canal parses the binary log objects (a raw byte stream) into its own data types.
as the picture shows:
1.3 Syncing to ES
Syncing data to ES requires an adapter: canal-adapter. The latest version is 1.1.3; download: https://github.com/alibaba/canal/releases .
Note that only ES 6.x currently appears to be supported; 7.x is not!
2. Preparation
2.1 ES and JDK
For ES installation, see: https://www.dalaoyang.cn/article/78
For JDK installation, see: https://www.dalaoyang.cn/article/16
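One more prerequisite worth checking: since canal connects to MySQL as a replica, the upstream MySQL must run with row-based binary logging. A minimal my.cnf sketch (the values are common defaults, not taken from this article; the file is written to /tmp only for illustration):

```shell
# Canal subscribes to the binlog as a replica, so MySQL must have
# row-based binary logging enabled before canal can sync anything.
cat > /tmp/canal-binlog.cnf <<'EOF'
[mysqld]
log-bin=mysql-bin
binlog-format=ROW
server_id=1
EOF
cat /tmp/canal-binlog.cnf
# Verify on a running server with:
#   mysql -e "SHOW VARIABLES LIKE 'binlog_format';"   # should print ROW
```

After merging these settings into my.cnf, restart MySQL for them to take effect.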
2.2 Installing canal-server
Download canal.deployer-1.1.3.tar.gz:
wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.deployer-1.1.3.tar.gz
Unpack it:
tar -zxvf canal.deployer-1.1.3.tar.gz
Enter the unpacked folder:
cd canal.deployer-1.1.3
Modify the conf/example/instance.properties file. The main settings to pay attention to are:
- canal.instance.master.address: database address, e.g. 127.0.0.1:3306
- canal.instance.dbUsername: database user
- canal.instance.dbPassword: database password
The complete file reads as follows:
#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0
# enable gtid use true/false
canal.instance.gtidon=false
# position info
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=
#canal.instance.tsdb.dbUsername=
#canal.instance.tsdb.dbPassword=
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=
# username/password
canal.instance.dbUsername=root
canal.instance.dbPassword=12345678
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
canal.instance.filter.black.regex=
# mq config
#canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
#canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################
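The dbUsername/dbPassword configured above must belong to an account that can read the binlog. A sketch of the required grants for a dedicated replication account (the `canal` user name and password are assumptions; the article itself uses root):

```shell
# Write the grant statements for a dedicated canal account to a file;
# the user name and password below are placeholders, not from the article.
cat > /tmp/canal-grants.sql <<'EOF'
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
EOF
cat /tmp/canal-grants.sql
# Apply with: mysql -uroot -p < /tmp/canal-grants.sql
```

REPLICATION SLAVE is what lets canal issue the dump request; SELECT is needed for table metadata.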
Go back to the canal.deployer-1.1.3 directory and start canal:
sh bin/startup.sh
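Before moving on, you can check that the server is actually up; by default it listens on TCP port 11111 (canal.port in conf/canal.properties). A quick probe sketch, assuming a local install:

```shell
# Probe canal's default TCP port with bash's /dev/tcp; prints a hint
# either way instead of failing, so it is safe to run during setup.
port=11111
if (exec 3<>/dev/tcp/127.0.0.1/$port) 2>/dev/null; then
  status="listening"
else
  status="not listening"
fi
echo "canal-server port $port: $status" | tee /tmp/canal-port-check
```

If the port is not listening, check logs/canal/canal.log for startup errors.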
View the log:
vi logs/canal/canal.log
View the log of a specific instance:
vi logs/example/example.log
To stop:
sh bin/stop.sh
2.3 Installing canal-adapter
Download canal.adapter-1.1.3.tar.gz:
wget https://github.com/alibaba/canal/releases/download/canal-1.1.3/canal.adapter-1.1.3.tar.gz
Unpack it:
tar -zxvf canal.adapter-1.1.3.tar.gz
Enter the unpacked folder:
cd canal.adapter-1.1.3
Modify the conf/application.yml file. The main settings to pay attention to are listed below; since this is a YAML file, they are referred to by their full property paths:
- server.port: canal-adapter port
- canal.conf.canalServerHost: canal-server ip and port
- canal.conf.srcDataSources.defaultDS.url: database address
- canal.conf.srcDataSources.defaultDS.username: database username
- canal.conf.srcDataSources.defaultDS.password: database password
- canal.conf.canalAdapters.groups.outerAdapters.hosts: ES host address (tcp port)
The complete file reads as follows:
server:
  port: 8081
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp
  canalServerHost: 127.0.0.1:11111
  batchSize: 500
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  srcDataSources:
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/test?useUnicode=true
      username: root
      password: 12345678
  canalAdapters:
  - instance: example
    groups:
    - groupId: g1
      outerAdapters:
      - name: es
        hosts: 127.0.0.1:9300
        properties:
          cluster.name: elasticsearch
You also need to configure the conf/es/*.yml files; the adapter automatically loads every configuration file ending in .yml under conf/es. Before introducing that configuration, here is the table structure used in this example:
CREATE TABLE `test` (
`id` int(11) NOT NULL,
`name` varchar(200) NOT NULL,
`address` varchar(1000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
The index needs to be created manually in ES; here it was created with es-head, as shown below:
The structure of the index test is as follows:
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text"
        },
        "address": {
          "type": "text"
        }
      }
    }
  }
}
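The same index can also be created without es-head, through ES's REST API; a sketch assuming the HTTP endpoint at 127.0.0.1:9200 (the adapter itself talks to the transport port 9300):

```shell
# Save the mapping shown above to a file, then PUT it to create the
# index. The curl call is left commented out so the sketch runs
# standalone; uncomment it against a live ES 6.x node.
cat > /tmp/test-mapping.json <<'EOF'
{
  "mappings": {
    "_doc": {
      "properties": {
        "name":    { "type": "text" },
        "address": { "type": "text" }
      }
    }
  }
}
EOF
cat /tmp/test-mapping.json
# curl -XPUT 'http://127.0.0.1:9200/test' \
#      -H 'Content-Type: application/json' \
#      -d @/tmp/test-mapping.json
```

Note the mapping type name (_doc) must match the _type used in the adapter mapping file below.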
Next, create test.yml (the file name is arbitrary). The content is easy to understand: _index is the index name, and sql is the corresponding query. It reads as follows:
dataSourceKey: defaultDS
destination: example
groupId:
esMapping:
  _index: test
  _type: _doc
  _id: _id
  upsert: true
  sql: "select a.id as _id,a.name,a.address from test a"
  commitBatch: 3000
Once configured, return to the canal-adapter root directory and execute the start command:
bin/startup.sh
View the log:
vi logs/adapter/adapter.log
To stop canal-adapter:
bin/stop.sh
3. Test
After everything has started successfully, first look at es-head; as shown, there is no data yet.
Next, insert a test record into the database with the following statement:
INSERT INTO `test`.`test`(`id`, `name`, `address`) VALUES (7, '北京', '北京市朝阳区');
Then look at es-head again, as shown below.
Next, look at the log:
2019-06-22 17:54:15.385 [pool-2-thread-1] DEBUG c.a.otter.canal.client.adapter.es.service.ESSyncService - DML: {"data":[{"id":7,"name":"北京","address":"北京市朝阳区"}],"database":"test","destination":"example","es":1561197255000,"groupId":null,"isDdl":false,"old":null,"pkNames":["id"],"sql":"","table":"test","ts":1561197255384,"type":"INSERT"}
Affected indexes: test
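Besides es-head, the synced document can also be fetched over ES's HTTP API; a sketch with the assumed address 127.0.0.1:9200 (the id 7 matches the row inserted above):

```shell
# Query ES for the newly synced document; falls back to a notice when
# ES is unreachable (e.g. in a dry run), so the sketch never aborts.
ES=http://127.0.0.1:9200
{ curl -s "$ES/test/_doc/7?pretty" || echo "ES not reachable at $ES"; } \
  | tee /tmp/es-doc-check
```

With ES running and the sync working, this returns the document with name and address fields populated.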
Tip: the log-viewing commands described above may not be very convenient; the following is recommended instead, for example to follow the last 200 lines of the log:
tail -200f logs/adapter/adapter.log
4. Summary
1. Full synchronization cannot be achieved, but inserts, updates, and deletes are all synchronized.
2. Be sure to create the index in ES in advance.
3. The ES port configured for the adapter is the tcp (transport) port, e.g. the default 9300.