Premise

My recent work has focused on building a small data platform: the business system architecture is largely complete, but the data layer is still relatively weak. A high-priority task is near-real-time synchronization of business data changes (inserts, updates, and deletes, including soft deletes) to another data source, with the data cleaned before persistence, so that we can build reasonably maintainable business statistics, data models, a tagging system, and similar extensions. Given the resources and capabilities of the current team, we first evaluated Alibaba's open-source middleware Canal.

This article briefly explains how to quickly set up Canal and its related components.
About Canal

Brief introduction

The introduction and the working-principle description below are taken from the Canal project README:
Canal [kə'næl], meaning waterway / pipeline / trench, is mainly used for incremental log parsing of MySQL databases, providing incremental data subscription and consumption. In its early days, Alibaba deployed dual data centers in Hangzhou and the United States, and cross-datacenter synchronization was needed; the implementation was mainly based on business triggers to capture incremental changes. Starting in 2010, the business gradually tried parsing database logs to obtain incremental changes for synchronization, which spawned a large number of incremental database subscription and consumption use cases.

Businesses based on log-based incremental subscription and consumption include:

- Database mirroring
- Real-time database backup
- Building and maintaining real-time indexes (heterogeneous indexes, inverted indexes, etc.)
- Business cache refresh
- Business logic driven by incremental data processing
How Canal works

The principle of MySQL master-slave replication:

- The MySQL Master instance writes data changes to the binary log (the records in it are called binary log events, which can be viewed with show binlog events).
- The MySQL Slave instance copies the master's binary log events to its relay log.
- The MySQL Slave instance replays the events in the relay log, applying the data changes to its own data.

Canal works as follows:

- Canal simulates the MySQL Slave interaction protocol, disguises itself as a MySQL Slave, and sends the dump protocol to the MySQL Master.
- The MySQL Master receives the dump request and starts pushing the binary log to the Slave (i.e. Canal).
- Canal parses the binary log objects (originally a byte stream) and can forward them to downstream consumers, for example through message-queue middleware.
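The three steps above can be sketched as a minimal parse-and-forward pipeline. The sketch below is illustrative only: all names are hypothetical, the raw byte stream is replaced with pre-decoded dicts, and the real Canal implementation is in Java.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BinlogEvent:
    """Minimal stand-in for a parsed binary log event."""
    database: str
    table: str
    type: str   # INSERT / UPDATE / DELETE / QUERY ...
    rows: list


def parse_binlog_stream(raw_events: List[dict]) -> List[BinlogEvent]:
    """'Parse' the stream; here pre-decoded dicts stand in for the byte stream."""
    return [
        BinlogEvent(e["database"], e["table"], e["type"], e.get("rows", []))
        for e in raw_events
    ]


def run_pipeline(raw_events: List[dict], publish: Callable[[str], None]) -> None:
    """Parse each event and hand it to a connector, e.g. a message-queue producer."""
    for event in parse_binlog_stream(raw_events):
        publish(json.dumps(event.__dict__))


# Usage: publish into an in-memory "queue" instead of Kafka.
queue: List[str] = []
run_pipeline(
    [{"database": "test", "table": "order", "type": "INSERT",
      "rows": [{"order_id": "10086"}]}],
    queue.append,
)
```

In the real system the `publish` callable would be a Kafka, RocketMQ or RabbitMQ producer; the point is that parsing and forwarding are decoupled, which is what lets Canal swap connectors via configuration.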
Canal versions and components

As of the time of writing (2020-03-05), Canal's latest release is v1.1.5-alpha-1 (released 2019-10-09), and the latest official stable release is v1.1.4 (released 2019-09-02). v1.1.4 mainly adds authentication and monitoring features along with a series of performance optimizations; its built-in connectors are Tcp, Kafka and RocketMQ. v1.1.5-alpha-1 adds a RabbitMQ connector, but in that version the RabbitMQ connector cannot be configured with a custom RabbitMQ port; this problem has been fixed on the master branch (see the commit history of the CanalRabbitMQProducer class). In other words, v1.1.4 can currently only use the three built-in connectors Tcp, Kafka and RocketMQ. If you want to try the RabbitMQ connector, you can use one of the following two approaches:

- Use the v1.1.5-alpha-1 version, accepting that the RabbitMQ port property cannot be modified and defaults to 5672.
- Build Canal yourself from the master branch.

At present the Canal project is quite active, but considering functional stability, I suggest using a stable release in production, i.e. the current v1.1.4 version. The examples in this article use v1.1.4 with the Kafka connector. Canal consists of three core components:

- canal-admin: the management module, providing a WebUI for managing Canal.
- canal-adapter: the adapter, adding client-side data adaptation and landing capabilities, including a REST log adapter, relational database data synchronization (table-to-table sync), HBase data synchronization, ES data synchronization, and so on.
- canal-deployer: the deployer, which provides the core functionality, including binlog parsing, conversion, and sending messages to connectors.

In general, the canal-deployer component is required; the other two components can be used on demand.
Deploying the required middleware

To build a usable setup, four pieces of middleware need to be deployed: MySQL, Zookeeper, Kafka and Canal. The deployment process is briefly walked through below. The virtual machine runs CentOS7.
Installing MySQL

For simplicity, install from the yum repository (official link: https://dev.mysql.com/downloads/repo/yum):

:::info
Although the package name mysql80-community-release-el7-3 contains the keyword mysql80, the repository actually integrates the mainstream MySQL versions 5.6, 5.7 and 8.x with their latest packages.
:::

Choose the latest MySQL 8.x Community Edition and download the rpm package applicable to CentOS7:
cd /data/mysql
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
# after the download completes
sudo rpm -Uvh mysql80-community-release-el7-3.noarch.rpm
Now list the MySQL-related packages available in the yum repository:
[root@localhost mysql]# yum repolist all | grep mysql
mysql-cluster-7.5-community/x86_64 MySQL Cluster 7.5 Community disabled
mysql-cluster-7.5-community-source MySQL Cluster 7.5 Community - disabled
mysql-cluster-7.6-community/x86_64 MySQL Cluster 7.6 Community disabled
mysql-cluster-7.6-community-source MySQL Cluster 7.6 Community - disabled
mysql-cluster-8.0-community/x86_64 MySQL Cluster 8.0 Community disabled
mysql-cluster-8.0-community-source MySQL Cluster 8.0 Community - disabled
mysql-connectors-community/x86_64 MySQL Connectors Community enabled: 141
mysql-connectors-community-source MySQL Connectors Community - disabled
mysql-tools-community/x86_64 MySQL Tools Community enabled: 105
mysql-tools-community-source MySQL Tools Community - Sourc disabled
mysql-tools-preview/x86_64 MySQL Tools Preview disabled
mysql-tools-preview-source MySQL Tools Preview - Source disabled
mysql55-community/x86_64 MySQL 5.5 Community Server disabled
mysql55-community-source MySQL 5.5 Community Server - disabled
mysql56-community/x86_64 MySQL 5.6 Community Server disabled
mysql56-community-source MySQL 5.6 Community Server - disabled
mysql57-community/x86_64 MySQL 5.7 Community Server disabled
mysql57-community-source MySQL 5.7 Community Server - disabled
mysql80-community/x86_64 MySQL 8.0 Community Server enabled: 161
mysql80-community-source MySQL 8.0 Community Server - disabled
Edit the /etc/yum.repos.d/mysql-community.repo file (set enabled=1 in the [mysql80-community] block; this is actually the default, so normally nothing needs to change. If you want a 5.x version, modify the corresponding block instead):
[mysql80-community]
name=MySQL 8.0 Community Server
baseurl=http://repo.mysql.com/yum/mysql-8.0-community/el/7/$basearch/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql
Then install the MySQL service:
sudo yum install mysql-community-server
This step takes a while, because five rpm packages need to be downloaded and installed (or the bundle archive containing all of them, mysql-8.0.18-1.el7.x86_64.rpm-bundle.tar). If the network is poor, you can also download the packages manually from the official website and install them directly:
# download the following 5 rpm packages, in order: common --> libs --> libs-compat --> client --> server
mysql-community-common
mysql-community-libs
mysql-community-libs-compat
mysql-community-client
mysql-community-server
# force install
rpm -ivh mysql-community-common-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-compat-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-client-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-server-8.0.18-1.el7.x86_64.rpm --force --nodeps
After the installation, start the MySQL service, then find the temporary password generated for the root account in the service log and use it for the first login (mysql -u root -p):
# start the service; to stop it: service mysqld stop
service mysqld start
# view the temporary password: cat /var/log/mysqld.log
[root@localhost log]# cat /var/log/mysqld.log
2020-03-02T06:03:53.996423Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.18) initializing of server in progress as process 22780
2020-03-02T06:03:57.321447Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: >kjYaXENK6li
2020-03-02T06:04:00.123845Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18) starting as process 22834
# log in as root with the temporary password
[root@localhost log]# mysql -u root -p
Then perform the following operations:

- Change the root user's password: ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@'; (note that the password policy requires uppercase and lowercase letters, digits and special characters).
- Update the root user's host. Switch to the mysql database with use mysql;, then set host to % so that other servers can connect remotely: UPDATE USER SET HOST = '%' WHERE USER = 'root';
- Grant the 'root'@'%' user all privileges: GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
- Change the 'root'@'%' user's authentication plugin so that tools such as Navicat can connect: ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';

After these operations, the MySQL service on this virtual machine can be accessed remotely as root. Finally, make sure binlog is enabled (note that MySQL 8.x enables binlog by default); check with SHOW VARIABLES LIKE '%bin%';:
Next, in the MySQL shell, create a new user named canal with the password QWqw12!@, and grant it the REPLICATION SLAVE and REPLICATION CLIENT privileges:
CREATE USER canal IDENTIFIED BY 'QWqw12!@';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';
Switch back to the root user and create a database named test:
CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`;
Installing Zookeeper

Canal and Kafka clusters both rely on Zookeeper for service coordination, so for ease of management Zookeeper is usually deployed as a standalone service or cluster. Here I chose the 3.6.0 version released on 2020-03-04:
mkdir /data/zk
# create the data directory
mkdir /data/zk/data
cd /data/zk
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.0/apache-zookeeper-3.6.0-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.0-bin.tar.gz
cd apache-zookeeper-3.6.0-bin/conf
cp zoo_sample.cfg zoo.cfg && vim zoo.cfg
Set dataDir to /data/zk/data in the zoo.cfg file, then start Zookeeper:
[root@localhost conf]# sh /data/zk/apache-zookeeper-3.6.0-bin/bin/zkServer.sh start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /data/zk/apache-zookeeper-3.6.0-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
Note that this version of Zookeeper requires JDK8+ to be installed locally, so plan accordingly. After a successful start, the service listens on port 2181 by default.
Installing Kafka

Kafka is a high-performance distributed message-queue middleware whose deployment depends on Zookeeper. Here I chose the 2.4.0 release built for Scala 2.13:
mkdir /data/kafka
mkdir /data/kafka/data
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.13-2.4.0.tgz
tar -zxvf kafka_2.13-2.4.0.tgz
After decompression, the setting zookeeper.connect=localhost:2181 in /data/kafka/kafka_2.13-2.4.0/config/server.properties already meets our needs and does not have to be modified, but the log directory log.dirs needs to be changed to /data/kafka/data. Then start the Kafka service:
sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-server-start.sh /data/kafka/kafka_2.13-2.4.0/config/server.properties
Started this way, the Kafka process terminates as soon as the console session exits; add the -daemon flag to keep Kafka running as a background process:
sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-server-start.sh -daemon /data/kafka/kafka_2.13-2.4.0/config/server.properties
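Before wiring Canal to Kafka, it can be handy to confirm that the broker (and Zookeeper) are actually listening. A small hedged check in Python; the hosts and ports are assumptions matching the single-machine setup above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage: check the default Kafka and Zookeeper ports on this machine.
print("kafka listening:", is_listening("127.0.0.1", 9092))
print("zookeeper listening:", is_listening("127.0.0.1", 2181))
```

If either check prints False, fix the middleware before moving on; Canal's Kafka connector will otherwise fail to deliver messages.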
Installing and using Canal

Finally the protagonist takes the stage. Here I chose the Canal v1.1.4 stable release; only the deployer module needs to be downloaded:
mkdir /data/canal
cd /data/canal
# note: GitHub downloads are extremely slow from mainland China; you can download the archive with another tool first, then upload it to the server
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar -zxvf canal.deployer-1.1.4.tar.gz
The unpacked directory layout is as follows:
- bin       # operations scripts
- conf      # configuration files
    canal_local.properties    # canal local configuration; usually does not need to change
    canal.properties          # canal service configuration
    logback.xml               # logback logging configuration
    metrics                   # metrics configuration
    spring                    # spring instance configurations, mainly related to binlog position calculation and some strategy settings; any one of them can be selected in canal.properties
    example                   # instance configuration directory; by convention, each database gets its own instance configuration directory
        instance.properties   # instance configuration, usually for a single database
- lib       # service dependencies
- logs      # log output directory
In development and test environments, I recommend changing the log level in logback.xml to DEBUG to make troubleshooting easier. The two configuration files to focus on are canal.properties and instance.properties. In canal.properties, you need to modify:

- Uncomment canal.instance.parser.parallelThreadSize = 16, i.e. enable this configuration item. It controls the number of parser threads per instance; if it is left unconfigured, parsing will block or simply not happen.
- Set canal.serverMode to kafka. The available values are tcp, kafka and rocketmq (on the master branch or the latest v1.1.5-alpha-1 version, rabbitmq is also available).
- Set canal.mq.servers to the address of the Kafka service or the Broker addresses of a Kafka cluster; here it is 127.0.0.1:9092.
Note that canal.mq.servers has different meanings under different canal.serverMode values:

- Under kafka mode, it is the Kafka service or cluster Broker address, i.e. bootstrap.servers.
- Under rocketmq mode, it is the NameServer list.
- Under rabbitmq mode, it is the host and port of the RabbitMQ service.
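Putting the items above together, the relevant fragment of canal.properties for this article's setup would look roughly like this (a sketch based only on the items discussed; every other property keeps its shipped default):

```properties
# enable the parser thread pool (remove the leading comment marker)
canal.instance.parser.parallelThreadSize = 16
# send parsed events to Kafka
canal.serverMode = kafka
# Kafka bootstrap.servers (Broker address list)
canal.mq.servers = 127.0.0.1:9092
```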
Other configuration items can be found in the official Wiki documentation.
instance.properties generally holds the configuration for one database instance; the Canal architecture supports a single Canal service asynchronously parsing the binlog of multiple database instances. The configuration items to modify in instance.properties include:

- canal.instance.mysql.slaveId: must be set to a value different from the server ID of the Master node; here it is 654321.
- The source data configuration of the instance, including the address, user, password and target database:
    - canal.instance.master.address: here 127.0.0.1:3306.
    - canal.instance.dbUsername: here canal.
    - canal.instance.dbPassword: here QWqw12!@.
    - Add canal.instance.defaultDatabaseName, here set to test (the test database must exist in MySQL first; see the earlier steps).
- Kafka-related configuration; for now a static topic and a single partition are used:
    - canal.mq.topic: here test, meaning the structured data parsed from the binlog is sent to the Kafka topic named test.
    - canal.mq.partition: here 0.
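Correspondingly, the modified part of conf/example/instance.properties would look roughly like this (a sketch collecting the values above; the remaining properties keep their defaults):

```properties
# must differ from the MySQL Master's server ID
canal.instance.mysql.slaveId = 654321
# source MySQL instance and credentials
canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = QWqw12!@
canal.instance.defaultDatabaseName = test
# static topic, single partition
canal.mq.topic = test
canal.mq.partition = 0
```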
After the configuration is done, start the Canal service:

sh /data/canal/bin/startup.sh
# view the service log
tail -100f /data/canal/logs/canal/canal.log
# view the instance log -- in most cases the instance log is the one to watch
tail -100f /data/canal/logs/example/example.log
After startup, check the example instance log to confirm that the instance started successfully.
Create an order table in the test database and execute a few simple DML statements:
use `test`;
CREATE TABLE `order`
(
id BIGINT UNIQUE PRIMARY KEY AUTO_INCREMENT COMMENT '主键',
order_id VARCHAR(64) NOT NULL COMMENT '订单ID',
amount DECIMAL(10, 2) NOT NULL DEFAULT 0 COMMENT '订单金额',
create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
UNIQUE uniq_order_id (`order_id`)
) COMMENT '订单表';
INSERT INTO `order`(order_id, amount) VALUES ('10086', 999);
UPDATE `order` SET amount = 10087 WHERE order_id = '10086';
DELETE FROM `order` WHERE order_id = '10086';
At this point, you can use Kafka's kafka-console-consumer or Kafka Tools to view the data in the test topic:
sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test
The captured data looks like this:
// DDL that created the test database
{"data":null,"database":"`test`","es":1583143732000,"id":1,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`","sqlType":null,"table":"","ts":1583143930177,"type":"QUERY"}
// DDL that created the order table
{"data":null,"database":"test","es":1583143957000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"CREATE TABLE `order`\n(\n id BIGINT UNIQUE PRIMARY KEY AUTO_INCREMENT COMMENT '主键',\n order_id VARCHAR(64) NOT NULL COMMENT '订单ID',\n amount DECIMAL(10, 2) NOT NULL DEFAULT 0 COMMENT '订单金额',\n create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',\n UNIQUE uniq_order_id (`order_id`)\n) COMMENT '订单表'","sqlType":null,"table":"order","ts":1583143958045,"type":"CREATE"}
// INSERT
{"data":[{"id":"1","order_id":"10086","amount":"999.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143969000,"id":3,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143969460,"type":"INSERT"}
// UPDATE
{"data":[{"id":"1","order_id":"10086","amount":"10087.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143974000,"id":4,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":[{"amount":"999.0"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143974870,"type":"UPDATE"}
// DELETE
{"data":[{"id":"1","order_id":"10086","amount":"10087.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143980000,"id":5,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143981091,"type":"DELETE"}
As you can see, the Kafka topic named test has received the corresponding structured binlog event data; consumers can subscribe to this topic to obtain the data for further processing.
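As a sketch of such a downstream consumer step, the flat-message JSON above can be parsed and the affected rows extracted. The snippet below is pure Python with no Kafka client involved (the message value is passed in as a string, and the function name is hypothetical):

```python
import json
from typing import Any, Dict, List


def handle_message(value: str) -> List[Dict[str, Any]]:
    """Parse one Canal flat message and return the affected rows,
    annotated with table and event type."""
    msg = json.loads(value)
    if msg.get("isDdl") or not msg.get("data"):
        return []  # skip DDL / QUERY events, which carry no row data
    return [
        {"table": msg["table"], "type": msg["type"], "row": row,
         # for UPDATEs, "old" holds the previous values of the changed columns
         "old": (msg.get("old") or [None] * len(msg["data"]))[i]}
        for i, row in enumerate(msg["data"])
    ]


# Usage with the UPDATE event captured earlier (fields abridged).
update = json.dumps({
    "data": [{"id": "1", "order_id": "10086", "amount": "10087.0"}],
    "database": "test", "isDdl": False, "old": [{"amount": "999.0"}],
    "table": "order", "type": "UPDATE",
})
rows = handle_message(update)
print(rows[0]["old"])  # previous values of the changed columns
```

In a real consumer, the value string would come from the Kafka record for the test topic, and each returned row would be cleaned and persisted to the target data source.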
Summary

Most of this article is spent on deploying the other middleware, which indirectly shows that deploying Canal itself is not complicated. Its configuration files have many properties, but the items that actually need custom values are relatively few, which means its operational and learning costs are not high. Later articles will cover doing ETL and persistence based on the structured binlog events, as well as building an HA Canal cluster suitable for production.
(End of article, c-3-d ea-20200306)