Data synchronization tool - canal

Foreword

About two years ago, I ran into a data synchronization problem on a project.

At that time, the system was deployed as dozens of instances: one central platform and N sub-center platforms, each backed by its own database instance.

At the database level, the requirements were:

  • The central platform's database must contain the data of all platform systems.
  • Each sub-center's database contains only that platform's own data.
  • Sub-center data can be added or modified on the central platform, and those changes must be synchronized to the corresponding sub-center's database in near real time.

Among these dozens of database instances there is no clear master-slave relationship, and the data to be synchronized can originate from any of them, so plain MySQL master-slave replication would not do the job.

I experimented with several approaches at the time; the one we finally adopted was based on MyBatis's interceptor mechanism plus a message queue.

The principle, roughly: a MyBatis interceptor intercepts write operations such as inserts, updates, and deletes, builds a custom payload keyed by the primary key (carrying source and destination identifiers), and publishes it to the corresponding topic on the message queue. Each system then listens to its own topic, consumes the data, and applies it to its database.
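
As an illustration, here is a minimal sketch of such an interceptor. This is a reconstruction, not the original project code: the BiConsumer publisher callback and the payload shape are assumptions standing in for the real MQ client and message format.

import java.util.Properties;
import java.util.function.BiConsumer;

import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.plugin.Interceptor;
import org.apache.ibatis.plugin.Intercepts;
import org.apache.ibatis.plugin.Invocation;
import org.apache.ibatis.plugin.Plugin;
import org.apache.ibatis.plugin.Signature;

// Intercept Executor.update, which MyBatis uses for INSERT, UPDATE and DELETE.
@Intercepts({@Signature(type = Executor.class, method = "update",
        args = {MappedStatement.class, Object.class})})
public class SyncInterceptor implements Interceptor {

    // (operation type, payload) -> send to MQ; injected so the sketch stays self-contained
    private final BiConsumer<String, Object> publisher;

    public SyncInterceptor(BiConsumer<String, Object> publisher) {
        this.publisher = publisher;
    }

    @Override
    public Object intercept(Invocation invocation) throws Throwable {
        MappedStatement ms = (MappedStatement) invocation.getArgs()[0];
        Object parameter = invocation.getArgs()[1];
        Object result = invocation.proceed(); // run the original statement first
        // Publish the operation type and entity; routing to the topic of the
        // target sub-center would happen inside the publisher.
        publisher.accept(ms.getSqlCommandType().name(), parameter);
        return result;
    }

    @Override
    public Object plugin(Object target) {
        return Plugin.wrap(target, this);
    }

    @Override
    public void setProperties(Properties properties) {
        // no configuration needed for this sketch
    }
}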

Some time later, we learned about the open-source component canal. It takes a more direct route: it parses data changes out of the MySQL binlog and delivers them to a message queue or other destinations.

I. Canal overview

Speaking of canal: Alibaba had its own demand for data synchronization. Starting in 2010, Alibaba gradually moved to parsing database logs to obtain incremental changes for synchronization, which gave rise to its incremental subscription & consumption business.

Businesses built on log-based incremental subscription & consumption include:

  • Database mirroring
  • Real-time database backup
  • Multi-level indexing (separate sharded indexes for sellers and buyers)
  • Search index building
  • Business cache refreshing
  • Important business messages such as price changes

We used canal's mechanism to implement a series of services such as data synchronization and cache refreshing.

II. Starting canal

1. Modify the MySQL configuration

For a self-hosted MySQL server, you need to enable binlog writing and set binlog-format to ROW mode. Configure my.cnf as follows:

[mysqld]
log-bin=mysql-bin # enable binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not duplicate canal's slaveId

Then create an account for canal to connect to MySQL with, and grant it MySQL slave privileges:

CREATE USER canal IDENTIFIED BY 'canal';  
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;

2. Download

Downloading canal is simple: open the releases page, pick the package you need, and extract it to a directory of your choice.

tar -zxvf canal.deployer-1.1.4.tar.gz -C /canal

After extraction, you will see a directory layout with subdirectories such as bin, conf, lib, and logs.

3. Modify the configuration

Before starting, you also need to modify some configuration.

First, go to canal/conf/example and edit the instance.properties configuration file. The key entries are:

canal.instance.mysql.slaveId=1234               # slave ID canal uses when posing as a MySQL replica
canal.instance.master.address=127.0.0.1:3306    # MySQL server address
canal.instance.dbUsername=canal                 # account with slave privileges
canal.instance.dbPassword=canal                 # password for that account
canal.instance.connectionCharset = UTF-8        # database charset, as the corresponding Java encoding
canal.instance.filter.regex=.*\\..*             # table filter regex (.*\\..* matches all tables)
canal.mq.topic=example                          # MQ topic name

Since we want the changes canal captures to be sent to a message queue, we also need to modify the canal.properties file, which holds the main MQ configuration. Here I use the Alibaba Cloud edition of RocketMQ; the parameters are as follows:

# configure the AccessKey/SecretKey
canal.aliyun.accessKey = XXX
canal.aliyun.secretKey = XXX
# configure the MQ access
canal.mq.accessChannel = cloud
canal.mq.servers = (internal network endpoint)
canal.mq.producerGroup = GID_**group (created in the console)
canal.mq.namespace = (RocketMQ instance ID)
canal.mq.topic=(created in the console)

4. Start

Run the startup script: ./canal/bin/startup.sh. Then open the logs/canal/canal.log file to verify that the server started:

2020-02-26 21:12:36.715 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2020-02-26 21:12:36.746 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[192.168.44.128(192.168.44.128):11111]
2020-02-26 21:12:37.406 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......

III. Starting the MQ listener

canal now watches for data changes and delivers them to the message queue; the next step is to write a listener that consumes this data.

For convenience, I use the Alibaba Cloud edition of RocketMQ directly. The test code is as follows:

public static void main(String[] args) {
	Properties properties = new Properties();
	// the Group ID you created in the console
	properties.put(PropertyKeyConst.GROUP_ID, "GID_CANAL");
	// AccessKey: Alibaba Cloud credential, created in the management console
	properties.put(PropertyKeyConst.AccessKey, "accessKey");
	// SecretKey: Alibaba Cloud credential, created in the management console
	properties.put(PropertyKeyConst.SecretKey, "secretKey");
	// TCP endpoint, shown on the instance details page of the console
	properties.put(PropertyKeyConst.NAMESRV_ADDR,"http://MQ_INST_xxx.mq-internet.aliyuncs.com:80");
	// clustering subscription mode (the default)
	// properties.put(PropertyKeyConst.MessageModel, PropertyValueConst.CLUSTERING);
	Consumer consumer = ONSFactory.createConsumer(properties);
	consumer.subscribe("example","*",new CanalListener());
	consumer.start();
	logger.info("Consumer Started");
}
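
The CanalListener referenced above is not shown in the original snippet. Below is a minimal sketch of what it might look like, assuming the Alibaba Cloud ONS SDK and fastjson for JSON parsing; the field names match the messages shown in the test section below.

import java.nio.charset.StandardCharsets;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.aliyun.openservices.ons.api.Action;
import com.aliyun.openservices.ons.api.ConsumeContext;
import com.aliyun.openservices.ons.api.Message;
import com.aliyun.openservices.ons.api.MessageListener;

public class CanalListener implements MessageListener {
    @Override
    public Action consume(Message message, ConsumeContext context) {
        String body = new String(message.getBody(), StandardCharsets.UTF_8);
        JSONObject entry = JSON.parseObject(body);
        String type = entry.getString("type");       // INSERT / UPDATE / DELETE
        String table = entry.getString("table");     // source table name
        JSONArray rows = entry.getJSONArray("data"); // row values after the change
        System.out.printf("%s on %s: %s%n", type, table, rows);
        // TODO: apply the change to the target database / cache here
        return Action.CommitMessage; // acknowledge; return Action.ReconsumeLater on failure
    }
}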

IV. Testing

With the environment deployed, we can move on to testing to see the actual results.

Take a t_account table as an example; each row records an account id, a user id, and an account balance.

First, we insert a record: insert into t_account (id,user_id,amount) values (4,4,200);

The message consumed from MQ looks like this:

{
	"data": [{
		"id": "4",
		"user_id": "4",
		"amount": "200.0"
	}],
	"database": "seata",
	"es": 1582723607000,
	"id": 2,
	"isDdl": false,
	"mysqlType": {
		"id": "int(11)",
		"user_id": "varchar(255)",
		"amount": "double(14,2)"
	},
	"old": null,
	"pkNames": ["id"],
	"sql": "",
	"sqlType": {
		"id": 4,
		"user_id": 12,
		"amount": 8
	},
	"table": "t_account",
	"ts": 1582723607656,
	"type": "INSERT"
}

As the payload shows, the message records the database name, table name, column values, column types, primary key, and other metadata in detail.
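
For stricter handling than ad-hoc JSON lookups, the whole message can be mapped onto a DTO. The sketch below is derived field by field from the payloads shown here; canal's protocol module ships a similar class (FlatMessage), but treat this one as illustrative.

import java.util.List;
import java.util.Map;

// Field-for-field mapping of the flat messages shown above.
public class CanalFlatMessage {
    public long id;                        // batch id assigned by canal
    public String database;                // source database name
    public String table;                   // source table name
    public List<String> pkNames;           // primary key column names
    public boolean isDdl;                  // true for DDL statements
    public String type;                    // INSERT / UPDATE / DELETE
    public long es;                        // statement execution time (ms epoch)
    public long ts;                        // message build time (ms epoch)
    public String sql;                     // DDL text; empty for DML
    public Map<String, Integer> sqlType;   // java.sql.Types code per column
    public Map<String, String> mysqlType;  // MySQL column type per column
    public List<Map<String, String>> data; // row values after the change
    public List<Map<String, String>> old;  // previous values of changed columns (UPDATE only)
}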

Next, we modify the record: update t_account set amount = 150 where id = 4;

The MQ message is now:

{
	"data": [{
		"id": "4",
		"user_id": "4",
		"amount": "150.0"
	}],
	"database": "seata",
	"es": 1582724016000,
	"id": 3,
	"isDdl": false,
	"mysqlType": {
		"id": "int(11)",
		"user_id": "varchar(255)",
		"amount": "double(14,2)"
	},
	"old": [{
		"amount": "200.0"
	}],
	"pkNames": ["id"],
	"sql": "",
	"sqlType": {
		"id": 4,
		"user_id": 12,
		"amount": 8
	},
	"table": "t_account",
	"ts": 1582724016353,
	"type": "UPDATE"
}

Notice that, in addition to the new values, canal's old field records the previous values of the columns that were modified.
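
This makes it straightforward to work out exactly which columns changed: for an UPDATE message, the key set of each old entry is exactly the set of modified columns. A small sketch, reusing the fastjson types from the listener above:

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

// Print "column: before -> after" for each changed column of an UPDATE message.
static void printChangedColumns(JSONObject entry) {
    JSONArray data = entry.getJSONArray("data");
    JSONArray old = entry.getJSONArray("old");
    if (!"UPDATE".equals(entry.getString("type")) || old == null) {
        return;
    }
    for (int i = 0; i < data.size(); i++) {
        JSONObject after = data.getJSONObject(i);
        JSONObject before = old.getJSONObject(i); // only the changed columns appear here
        for (String column : before.keySet()) {
            System.out.printf("%s: %s -> %s%n",
                    column, before.getString(column), after.getString(column));
        }
    }
}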

Finally, we delete the record: delete from t_account where id = 4;

Accordingly, the MQ message is:

{
	"data": [{
		"id": "4",
		"user_id": "4",
		"amount": "150.0"
	}],
	"database": "seata",
	"es": 1582724155000,
	"id": 4,
	"isDdl": false,
	"mysqlType": {
		"id": "int(11)",
		"user_id": "varchar(255)",
		"amount": "double(14,2)"
	},
	"old": null,
	"pkNames": ["id"],
	"sql": "",
	"sqlType": {
		"id": 4,
		"user_id": 12,
		"amount": 8
	},
	"table": "t_account",
	"ts": 1582724155370,
	"type": "DELETE"
}

Once you can observe table changes like this, you can process the data according to your own business scenario, as in the sketch below.
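
For example, a cache-refresh service might evict stale entries whenever a row is updated or deleted. This is one hypothetical way to wire that up; the Consumer<String> eviction callback and the "table:id" key convention are assumptions, not part of canal.

import java.util.function.Consumer;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class CacheInvalidator {

    private final Consumer<String> evict; // e.g. key -> redis.del(key)

    public CacheInvalidator(Consumer<String> evict) {
        this.evict = evict;
    }

    // Evict cached entries for every row touched by an UPDATE or DELETE.
    public void handle(JSONObject entry) {
        String type = entry.getString("type");
        if (!"UPDATE".equals(type) && !"DELETE".equals(type)) {
            return; // an INSERT has no stale cache entry to evict
        }
        String table = entry.getString("table");
        JSONArray rows = entry.getJSONArray("data");
        for (int i = 0; i < rows.size(); i++) {
            JSONObject row = rows.getJSONObject(i);
            evict.accept(table + ":" + row.getString("id")); // assumed key convention
        }
    }
}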

V. Summary

As you can see, the canal component makes it easy to monitor data changes. If you use a message queue for data synchronization, one issue deserves particular attention: message ordering.

The binlog itself is ordered, but preserving that order after the events are written to MQ is the real concern.

For the MQ ordering problem, refer to the canal documentation on ordered consumption; canal's MQ settings (for example canal.mq.partitionsNum and canal.mq.partitionHash) control how events are routed to partitions.


Source: juejin.im/post/5e5534d5f265da574e2296ba