Near Real-Time Synchronization of MySQL Binlog Based on Canal and Kafka

Background

Recently, the business system architecture has been basically completed, but the data layer remains relatively weak, and my current work focus is on building a small data platform. A high-priority task is to synchronize business data (including inserts, updates and deletes, hard or soft) in near real time to another data source, cleaning the data before persistence, so that reasonably friendly business statistics, data models and a label system can be built on top of it. Given the resources and capabilities of the current team, we prioritized investigating Alibaba's open-source middleware Canal.

This article briefly explains how to quickly set up Canal and its related components.

About Canal

Brief introduction

The following introduction and working principles are taken from the README of the Canal project:

Canal [kə'næl], meaning waterway / pipeline / trench, is mainly used to parse the incremental logs of a MySQL database and provide incremental data subscription and consumption. In its early years, Alibaba deployed dual data centers in Hangzhou and the United States, and the need for cross-data-center synchronization was mainly met by business-side triggers that captured incremental changes. Starting in 2010, the business gradually tried obtaining incremental changes by parsing database logs, which spawned a large number of incremental database subscription and consumption businesses.

Log-based incremental subscription and consumption businesses include:

  • Database mirroring
  • Real-time database backup
  • Building and maintaining real-time indexes (split heterogeneous indexes, inverted indexes, etc.)
  • Business cache refresh
  • Business logic driven by incremental data processing

How Canal works

The principle of MySQL master-slave replication:

  • The MySQL Master instance writes data changes to the binary log (the records are called binary log events and can be viewed with show binlog events)
  • The MySQL Slave instance copies the master's binary log events to its own relay log
  • The MySQL Slave instance replays the events in the relay log, applying the data changes to its own data

Canal works as follows:

  • Canal simulates the MySQL Slave interaction protocol, disguising itself as a MySQL Slave and sending the dump protocol to the MySQL Master
  • The MySQL Master receives the dump request and starts pushing the binary log to the Slave (i.e. Canal)
  • Canal parses the binary log objects (raw byte streams) and can forward them to downstream connectors or message queue middleware

Canal versions and components

As of the time I started writing this article (2020-03-05), the latest Canal version is v1.1.5-alpha-1 (released 2019-10-09), and the latest official stable version is v1.1.4 (released 2019-09-02). v1.1.4 mainly added authentication and monitoring functions and made a series of performance optimizations; the connectors integrated in this version are Tcp, Kafka and RocketMQ. The v1.1.5-alpha-1 version adds a RabbitMQ connector, but in that version the RabbitMQ connector cannot yet customize the RabbitMQ port number; this problem has already been fixed on the master branch (see the commit history of the CanalRabbitMQProducer class in the source code). In other words, v1.1.4 can currently only use the three built-in connectors Tcp, Kafka and RocketMQ; if you want to try the RabbitMQ connector, you can use one of the following two approaches:

  • Choose the v1.1.5-alpha-1 version, accepting that the RabbitMQ port property cannot be modified and defaults to 5672.
  • Build Canal yourself from the master branch.

At present, the Canal project is quite active, but considering the stability of its functions, the author suggests using a stable release in production environments; currently that means v1.1.4, which is the version used in the examples in this article, together with the Kafka connector. Canal includes three core components:

  • canal-admin: the admin module, providing WebUI-based management for Canal.
  • canal-adapter: the adapter, adding client-side data adaptation and persistence capabilities, including the REST log adapter, relational database data synchronization (table-to-table sync), HBase data synchronization, ES data synchronization, and so on.
  • canal-deployer: the deployer, which hosts the core functions, including binlog parsing, conversion, and sending messages to connectors.

In general, the canal-deployer component is required, while the other two components can be used on demand.

Deploy the required middleware

To build a usable set of components, four pieces of middleware must be deployed: MySQL, Zookeeper, Kafka and Canal. The following briefly walks through the deployment process of each. The virtual machines used run CentOS7.

Installing MySQL

For simplicity, MySQL is installed from the official yum repository (official link: https://dev.mysql.com/downloads/repo/yum):

::: info
Although the package name mysql80-community-release-el7-3 contains the keyword mysql80, this repository actually integrates the package repositories of the mainstream MySQL versions 5.6, 5.7 and 8.x, up to the latest releases.
:::

Choose the latest MySQL 8.x Community Edition and download the rpm package applicable to CentOS7:

cd /data/mysql
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
# after the download completes
sudo rpm -Uvh mysql80-community-release-el7-3.noarch.rpm

At this point, list the MySQL-related packages available in the yum repositories:

[root@localhost mysql]# yum repolist all | grep mysql
mysql-cluster-7.5-community/x86_64 MySQL Cluster 7.5 Community   disabled
mysql-cluster-7.5-community-source MySQL Cluster 7.5 Community - disabled
mysql-cluster-7.6-community/x86_64 MySQL Cluster 7.6 Community   disabled
mysql-cluster-7.6-community-source MySQL Cluster 7.6 Community - disabled
mysql-cluster-8.0-community/x86_64 MySQL Cluster 8.0 Community   disabled
mysql-cluster-8.0-community-source MySQL Cluster 8.0 Community - disabled
mysql-connectors-community/x86_64  MySQL Connectors Community    enabled:    141
mysql-connectors-community-source  MySQL Connectors Community -  disabled
mysql-tools-community/x86_64       MySQL Tools Community         enabled:    105
mysql-tools-community-source       MySQL Tools Community - Sourc disabled
mysql-tools-preview/x86_64         MySQL Tools Preview           disabled
mysql-tools-preview-source         MySQL Tools Preview - Source  disabled
mysql55-community/x86_64           MySQL 5.5 Community Server    disabled
mysql55-community-source           MySQL 5.5 Community Server -  disabled
mysql56-community/x86_64           MySQL 5.6 Community Server    disabled
mysql56-community-source           MySQL 5.6 Community Server -  disabled
mysql57-community/x86_64           MySQL 5.7 Community Server    disabled
mysql57-community-source           MySQL 5.7 Community Server -  disabled
mysql80-community/x86_64           MySQL 8.0 Community Server    enabled:    161
mysql80-community-source           MySQL 8.0 Community Server -  disabled

Edit the /etc/yum.repos.d/mysql-community.repo file (in the [mysql80-community] block, enabled is set to 1; this is actually the default, so no change is needed. If you want a 5.x version instead, modify the corresponding block):

[mysql80-community]
name=MySQL 8.0 Community Server
baseurl=http://repo.mysql.com/yum/mysql-8.0-community/el/7/$basearch/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql

Then install the MySQL service:

sudo yum install mysql-community-server

This process takes quite a while, because five rpm packages need to be downloaded and installed (or the archive that bundles all of them, mysql-8.0.18-1.el7.x86_64.rpm-bundle.tar). If the network is poor, you can also download the packages manually from the official website and install them directly:

# download the following 5 rpm packages, in this order: common --> libs --> libs-compat --> client --> server
mysql-community-common
mysql-community-libs
mysql-community-libs-compat
mysql-community-client
mysql-community-server

# force install
rpm -ivh mysql-community-common-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-libs-compat-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-client-8.0.18-1.el7.x86_64.rpm --force --nodeps
rpm -ivh mysql-community-server-8.0.18-1.el7.x86_64.rpm --force --nodeps

After installation, start the MySQL service, then look up the temporary password of the root account in the MySQL service log for the first login (mysql -u root -p):

# start the service; to stop it, use service mysqld stop
service mysqld start
# view the temporary password
[root@localhost log]# cat /var/log/mysqld.log 
2020-03-02T06:03:53.996423Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.18) initializing of server in progress as process 22780
2020-03-02T06:03:57.321447Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: >kjYaXENK6li
2020-03-02T06:04:00.123845Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.18) starting as process 22834
# log in as root using the temporary password
[root@localhost log]# mysql -u root -p

Then do the following:

  • Modify the root user's password: ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@'; (note the password rules: it must contain uppercase and lowercase letters, digits and special characters)
  • Update the root user's host: switch to the system database with USE mysql;, then set host to '%' to allow remote access from other servers: UPDATE user SET host = '%' WHERE user = 'root';
  • Grant the 'root'@'%' user all privileges: GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
  • Change the 'root'@'%' user's password validation plugin so that tools such as Navicat can connect: ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@'; (the full sequence is consolidated below)
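Putting these together, a consolidated session might look like the following sketch; the FLUSH PRIVILEGES after the host change is my own addition, since the subsequent GRANT needs the reloaded privilege tables to see the new 'root'@'%' row:

ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@';
USE mysql;
UPDATE user SET host = '%' WHERE user = 'root';
-- reload privileges so the following GRANT sees the new 'root'@'%' row
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';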

After these operations are complete, the MySQL service on this virtual machine can be accessed remotely with the root account. Finally, make sure binlog is enabled (note that MySQL 8.x enables binlog by default), which can be checked with SHOW VARIABLES LIKE '%bin%';
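For reference, a minimal check of the two variables that matter here (the values shown are the MySQL 8.x defaults; Canal needs ROW-format binlog to obtain row-level data):

mysql> SHOW VARIABLES LIKE 'log_bin';
-- expected: log_bin = ON
mysql> SHOW VARIABLES LIKE 'binlog_format';
-- expected: binlog_format = ROW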

Finally, in the MySQL shell, create a new user named canal with the password QWqw12!@ and grant it the SELECT, REPLICATION SLAVE and REPLICATION CLIENT privileges:

CREATE USER canal IDENTIFIED BY 'QWqw12!@';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';
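To verify the new account, you can check its grants:

SHOW GRANTS FOR 'canal'@'%';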

Switch back to the root user and create a database named test:

CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`;

Installing Zookeeper

Both Canal and Kafka clusters rely on Zookeeper for service coordination, so an independent Zookeeper service or cluster is usually deployed for easier management. Here I chose the 3.6.0 version released on 2020-03-04:

mkdir /data/zk
# create the data directory
mkdir /data/zk/data
cd /data/zk
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.0/apache-zookeeper-3.6.0-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.0-bin.tar.gz
cd apache-zookeeper-3.6.0-bin/conf
cp zoo_sample.cfg zoo.cfg && vim zoo.cfg

In the zoo.cfg file, set dataDir to /data/zk/data; this is the only change needed from the sample configuration:
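# zoo.cfg: the single changed line, matching the data directory created above
dataDir=/data/zk/data

Then start Zookeeper: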

[root@localhost conf]# sh /data/zk/apache-zookeeper-3.6.0-bin/bin/zkServer.sh start
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /data/zk/apache-zookeeper-3.6.0-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Note that starting this version of the Zookeeper service requires JDK8+ to be installed locally, so plan accordingly. The default listening port is 2181, and the log output after a successful start is shown above.

Installing Kafka

Kafka is a high-performance distributed message queue middleware whose deployment depends on Zookeeper. Here the author chose the 2.4.0 release built against Scala 2.13:

mkdir /data/kafka
mkdir /data/kafka/data
cd /data/kafka
wget http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.13-2.4.0.tgz
tar -zxvf kafka_2.13-2.4.0.tgz

After decompression, the zookeeper.connect=localhost:2181 setting in /data/kafka/kafka_2.13-2.4.0/config/server.properties already meets our needs and does not have to be modified; what does need to be modified is the log file directory log.dirs, which should point to /data/kafka/data:
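# server.properties: the single changed line, assuming the data directory created above
log.dirs=/data/kafka/data

Then start the Kafka service: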

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-server-start.sh /data/kafka/kafka_2.13-2.4.0/config/server.properties

Note that a Kafka process started this way will terminate when the console session exits; you can add the -daemon parameter so that Kafka runs as a background process that does not exit with the session:

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-server-start.sh -daemon /data/kafka/kafka_2.13-2.4.0/config/server.properties
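With a default Kafka configuration, the test topic used later will be auto-created on first write; if auto topic creation is disabled in your setup, create it up front (single partition, matching the Canal configuration below):

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-topics.sh --create --bootstrap-server 127.0.0.1:9092 --replication-factor 1 --partitions 1 --topic test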

Installing and using Canal

Finally the protagonist takes the stage. Here we choose the v1.1.4 stable release of Canal; only the deployer module needs to be downloaded:

mkdir /data/canal
cd /data/canal
# note: Github can be extremely slow to reach from mainland China; you can download the archive with another tool first and then upload it to the server
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar -zxvf canal.deployer-1.1.4.tar.gz

The unpacked directory is laid out as follows:

- bin   # operation and maintenance scripts
- conf  # configuration files
  canal_local.properties  # canal local configuration, usually no need to change
  canal.properties        # canal service configuration
  logback.xml             # logback logging configuration
  metrics                 # metrics configuration
  spring                  # spring instance configurations, mainly related to binlog position calculation and some policies; any one of them can be selected in canal.properties
  example                 # instance configuration folder; generally a single database corresponds to one independent instance configuration folder
    instance.properties   # instance configuration, generally the configuration for a single database
- lib   # service dependencies
- logs  # log output directory

In development and test environments, it is recommended to change the log level in logback.xml to DEBUG to make troubleshooting easier. The two configuration files to focus on are canal.properties and instance.properties. In the canal.properties file, the following needs to be modified:

  • Uncomment the canal.instance.parser.parallelThreadSize = 16 configuration item, i.e. enable it. It sets the number of threads used for instance parsing; if it is left unconfigured, parsing may block or not proceed at all.
  • Set the canal.serverMode configuration item to kafka. The optional values are tcp, kafka and rocketmq (the master branch or the latest v1.1.5-alpha-1 version also supports rabbitmq); the shipped default is tcp, so it must be changed here.
  • The canal.mq.servers configuration item needs to specify the address of the Kafka service or the cluster's brokers; here it is configured as 127.0.0.1:9092.

Note that canal.mq.servers has a different meaning under each canal.serverMode:

  • under kafka mode, it is the Kafka service or broker cluster address, i.e. bootstrap.servers
  • under rocketmq mode, it is the NameServer list
  • under rabbitmq mode, it is the host and port of the RabbitMQ service
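Putting the above together, a minimal sketch of the changed canal.properties lines (all other lines keep their shipped defaults; the broker address matches the single-node Kafka deployed earlier):

canal.serverMode = kafka
canal.instance.parser.parallelThreadSize = 16
canal.mq.servers = 127.0.0.1:9092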

Other configuration items can be found in the official Canal Wiki.

instance.properties generally holds the configuration of a single database instance; the Canal architecture supports one Canal service handling asynchronous binlog parsing for multiple database instances. The configuration items that need to be modified in instance.properties include:

  • canal.instance.mysql.slaveId needs a value completely different from the server ID of the Master node; here it is configured as 654321.
  • Configure the source data instance, including address, user, password and target database:
    • canal.instance.master.address: specified here as 127.0.0.1:3306.
    • canal.instance.dbUsername: specified here as canal.
    • canal.instance.dbPassword: specified here as QWqw12!@.
    • Add a new item canal.instance.defaultDatabaseName, specified here as test (the test database must already exist in MySQL; see the earlier steps).
  • Kafka-related configuration; here a static topic and a single partition are used for now:
    • canal.mq.topic: specified here as test, i.e. the structured data parsed from the binlog is sent to the Kafka topic named test.
    • canal.mq.partition: specified here as 0.
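As with canal.properties, a minimal sketch of the changed conf/example/instance.properties lines under the values used in this walkthrough:

canal.instance.mysql.slaveId=654321
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=canal
canal.instance.dbPassword=QWqw12!@
canal.instance.defaultDatabaseName=test
canal.mq.topic=test
canal.mq.partition=0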

After the configuration is done, you can start the Canal service:

sh /data/canal/bin/startup.sh
# view the service log
tail -100f /data/canal/logs/canal/canal.log
# view the instance log -- in general, the instance log is the one to watch
tail -100f /data/canal/logs/example/example.log
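If the logs are inconclusive, you can also confirm that the process is up with jps; the deployer's main class is CanalLauncher:

jps | grep CanalLauncher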

After startup, the instance log shows that the example instance has started successfully and is waiting for binlog events.

Create an order table in the test database and execute a few simple DML statements:

use `test`;

CREATE TABLE `order`
(
    id          BIGINT UNIQUE PRIMARY KEY AUTO_INCREMENT COMMENT '主键',
    order_id    VARCHAR(64)    NOT NULL COMMENT '订单ID',
    amount      DECIMAL(10, 2) NOT NULL DEFAULT 0 COMMENT '订单金额',
    create_time DATETIME       NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',
    UNIQUE uniq_order_id (`order_id`)
) COMMENT '订单表';

INSERT INTO `order`(order_id, amount) VALUES ('10086', 999);
UPDATE `order` SET amount = 10087 WHERE order_id = '10086';
DELETE  FROM `order` WHERE order_id = '10086';

At this point, you can use Kafka's kafka-console-consumer or Kafka Tool to view the data in the test topic:

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test

The specific data is as follows:

// DDL statement that created the test database
{"data":null,"database":"`test`","es":1583143732000,"id":1,"isDdl":false,"mysqlType":null,"old":null,"pkNames":null,"sql":"CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`","sqlType":null,"table":"","ts":1583143930177,"type":"QUERY"}

// DDL statement that created the order table
{"data":null,"database":"test","es":1583143957000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"CREATE TABLE `order`\n(\n    id          BIGINT UNIQUE PRIMARY KEY AUTO_INCREMENT COMMENT '主键',\n    order_id    VARCHAR(64)    NOT NULL COMMENT '订单ID',\n    amount      DECIMAL(10, 2) NOT NULL DEFAULT 0 COMMENT '订单金额',\n    create_time DATETIME       NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '创建时间',\n    UNIQUE uniq_order_id (`order_id`)\n) COMMENT '订单表'","sqlType":null,"table":"order","ts":1583143958045,"type":"CREATE"}

// INSERT
{"data":[{"id":"1","order_id":"10086","amount":"999.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143969000,"id":3,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143969460,"type":"INSERT"}

// UPDATE
{"data":[{"id":"1","order_id":"10086","amount":"10087.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143974000,"id":4,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":[{"amount":"999.0"}],"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143974870,"type":"UPDATE"}

// DELETE
{"data":[{"id":"1","order_id":"10086","amount":"10087.0","create_time":"2020-03-02 05:12:49"}],"database":"test","es":1583143980000,"id":5,"isDdl":false,"mysqlType":{"id":"BIGINT","order_id":"VARCHAR(64)","amount":"DECIMAL(10,2)","create_time":"DATETIME"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":-5,"order_id":12,"amount":3,"create_time":93},"table":"order","ts":1583143981091,"type":"DELETE"}

As you can see, the structured data corresponding to the binlog events has been written to the Kafka topic named test; a consumer listening to this topic can obtain the data for subsequent processing.
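For consumers written in code rather than with the console tool, here is a minimal Java sketch based on the kafka-clients library. It is an illustration only, not part of Canal itself; the group id, the earliest offset reset and the choice to simply print the raw JSON are my own assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CanalBinlogEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");
        props.put("group.id", "canal-test-consumer"); // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // read the topic from the beginning
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // each value is one JSON document like the samples above; a real
                    // consumer would parse it and dispatch on its "type" field
                    System.out.println(record.value());
                }
            }
        }
    }
}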

Summary

Most of this article is about how to deploy the other middleware, which indirectly shows that deploying Canal itself is not complicated. Its configuration files contain many properties, but the items that actually need custom changes are relatively few; in other words, its operation, maintenance and learning costs are not high. Later articles will analyze ETL and persistence based on structured binlog events, as well as building a production-grade, highly available (HA) Canal cluster.


(End of article, c-3-d ea-20200306)

Origin: www.cnblogs.com/throwable/p/12483983.html