Database series: MySQL incremental data synchronization based on Canal

Environment preparation:

1. Redis (default port 6379)

2. Zookeeper (default port 2181)

3. Kafka (default port 9092)

4. Canal (default port 11111)

5. MySQL (default port 3306)

GitHub code address for this article: https://github.com/cheriduk/spring-boot-integration-template

Canal introduction (quoting the official description):


Canal is an incremental subscription & consumption component for the binlog of Alibaba's MySQL databases.

Name: canal [kə'næl]
Meaning: waterway / pipe / ditch
Language: pure Java
Positioning: based on incremental database log parsing, it provides incremental data subscription & consumption; currently it mainly supports MySQL

In the early days, Alibaba's B2B business needed cross-datacenter synchronization because it was deployed in two datacenters, in Hangzhou and the United States. Early database synchronization was mainly based on triggers to capture incremental changes. Starting in 2010, Alibaba gradually switched to parsing database logs to obtain incremental changes for synchronization, which gave rise to the business of large-scale incremental subscription & consumption and opened a new era. P.S. The version currently used internally already supports log parsing for some MySQL 8.x and Oracle versions.

Services supported by log-based incremental subscription & consumption:

  1. Database mirroring

  2. Real-time database backup

  3. Multi-level indexes (sellers and buyers are indexed separately)

  4. Search index building (Elasticsearch)

  5. Business cache refresh (Redis)

  6. Important business messages such as price changes

How Canal works:

The principle is relatively simple:

  1. Canal simulates the interaction protocol of a MySQL slave: it pretends to be a MySQL slave and sends the dump protocol to the MySQL master

  2. The MySQL master receives the dump request and starts pushing binary logs to the slave (that is, canal)

  3. Canal parses the binary log objects (originally a byte stream)

The above is the official introduction

 

canal link: https://pan.baidu.com/s/1HIT4b30BtXrkHym-w4peww Extraction code: ar6c


 How to use it in project development?

In real projects we usually configure the MQ mode with RocketMQ or Kafka: canal sends the change data to an MQ topic, and a message-queue consumer then consumes it.

This article demonstrates deploying Canal, using Kafka, and synchronizing data to Redis

From the architecture, the components to be used are clear: MySQL, Canal, Kafka, ZooKeeper, and Redis.

Setting up MySQL should be familiar to everyone, and there are plenty of references online for ZooKeeper and Redis as well.

This section mainly covers setting up Kafka.

First download the installation package from the official website.

Unzip it, open the config/server.properties configuration file, and modify the log directory (log.dirs).
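For example (an illustrative snippet; the path here is only an assumption for a local Windows setup):

# server.properties: directory where Kafka stores its log data (adjust to your environment)
log.dirs=D:/kafka/kafka-logs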

First start ZooKeeper; I am using version 3.4.13:
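On Windows, ZooKeeper can typically be started from its bin directory (a minimal sketch, assuming a default zoo.cfg is already in place):

zkServer.cmd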

 
Then start Kafka, open cmd in the bin directory of Kafka, and enter the command:

kafka-server-start.bat ../../config/server.properties

We can see the Kafka-related configuration information registered on ZooKeeper through ZooInspector:

 

Then you need to create a topic to receive the data sent by canal, using the command:

 

kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic canaltopic

The topic created is named canaltopic.
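Optionally, you can verify that the topic exists, using the same ZooKeeper-based syntax as the create command above:

kafka-topics.bat --list --zookeeper localhost:2181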


Configure the Canal server

Download the relevant installation package from the canal official website:

 

Find the canal.properties configuration file in the canal.deployer-1.1.4/conf directory:

# tcp, kafka, RocketMQ -- choose kafka mode here
canal.serverMode = kafka
# Number of parser threads; enable this setting, otherwise parsing may block or not happen at all
canal.instance.parser.parallelThreadSize = 16
# MQ server address, here the address and port of Kafka
canal.mq.servers = 127.0.0.1:9092
# Configure the instance; there must be a directory with the same name (example) under conf, and multiple instances can be configured
canal.destinations = example

Then configure the instance: find the conf/example/instance.properties configuration file:

## mysql serverId, v1.0.26+ will autoGen (generated automatically, no need to configure)
# canal.instance.mysql.slaveId=0

# position info
canal.instance.master.address=127.0.0.1:3306
# Run SHOW MASTER STATUS; in MySQL to check the database's current binlog
canal.instance.master.journal.name=mysql-bin.000006
canal.instance.master.position=4596
# account and password
canal.instance.dbUsername=canal
canal.instance.dbPassword=Canal@****
canal.instance.connectionCharset = UTF-8
# MQ topic name
canal.mq.topic=canaltopic
# partition index for single-queue mode
canal.mq.partition=0
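Canal reads the binlog, so MySQL must have binlog enabled in ROW format. The following queries (a minimal sketch) check this and also give the values used above for canal.instance.master.journal.name and canal.instance.master.position:

-- Check that binlog is enabled and uses the ROW format (required by canal);
-- if not, set log-bin=mysql-bin and binlog-format=ROW in my.cnf / my.ini and restart MySQL.
SHOW VARIABLES LIKE 'log_bin';
SHOW VARIABLES LIKE 'binlog_format';
-- The File and Position columns correspond to journal.name and position.
SHOW MASTER STATUS;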

Database configuration: create an authorized account

Canal works by simulating a MySQL slave, so the account must have the permissions of a MySQL slave:

CREATE USER canal IDENTIFIED BY 'canal';    
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';  
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;  
FLUSH PRIVILEGES; 

For an existing account, you can check its permissions with:

SHOW GRANTS FOR 'canal';

After the configuration is complete, canal can be started.
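On Windows this is typically done with the startup script in canal's bin directory (startup.sh on Linux); a sketch assuming the default canal.deployer-1.1.4 layout:

bin\startup.bat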

Test verification

At this point, you can open a Kafka console consumer window to check whether Kafka receives the messages:

kafka-console-consumer.bat --bootstrap-server 127.0.0.1:9092 --from-beginning --topic canaltopic

If garbled characters appear in the console, you need to temporarily change the encoding:

Switch the cmd window to UTF-8 before running the command, with: chcp 65001

Modify data in the MySQL database and then observe the changes arriving on Kafka.

The data structure corresponding to the returned string is described in the official documentation:

https://github.com/alibaba/canal/wiki/ClientAPI#%E6%95%B0%E6%8D%AE%E5%AF%B9%E8%B1%A1%E6%A0%BC%E5%BC%8F%E7%AE%80%E5%8D%95%E4%BB%8B%E7%BB%8Dentryprotocolproto 

I am using the latest version; the data format given in the official documentation may not be fully up to date, so there are some differences.
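For reference, an INSERT on the student table used later in this article produces a message roughly like the following (an illustrative sketch matching the fields mapped by the CanalBean class below; the exact payload depends on the canal version):

{
  "data": [{"id": "777", "name": "测试", "age": "123"}],
  "database": "test",
  "es": 1607840000000,
  "id": 1,
  "isDdl": false,
  "mysqlType": {"id": "int", "name": "varchar(25)", "age": "int"},
  "old": null,
  "pkNames": ["id"],
  "sql": "",
  "sqlType": {"id": 4, "name": 12, "age": 4},
  "table": "student",
  "ts": 1607840000123,
  "type": "INSERT"
}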

Start Redis and synchronize the data to Redis

After the environment is set up, write the Redis client code below.

First, add the Maven dependencies for Kafka and Redis:

<dependencies>
    <dependency>
        <groupId>org.springframework.kafka</groupId>
        <artifactId>spring-kafka</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-redis</artifactId>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.28</version>
        <scope>compile</scope>
    </dependency>
</dependencies>

Add the Redis configuration to the application.yml file:

spring:  
  redis:
    host: 127.0.0.1
    port: 6379
    database: 0
    password: 123456

Write a utility class for operating Redis:
 

import javax.annotation.Resource;
import java.util.concurrent.TimeUnit;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.stereotype.Component;

@Component
public class RedisClient {

    /**
     * Redis template
     */
    @Resource
    private StringRedisTemplate stringRedisTemplate;

    /**
     * Set a key-value pair in Redis
     */
    public void setString(String key, String value) {
        setString(key, value, null);
    }

    /**
     * Set a key-value pair in Redis with an expiration time (in seconds)
     */
    public void setString(String key, String value, Long timeOut) {
        stringRedisTemplate.opsForValue().set(key, value);
        if (timeOut != null) {
            stringRedisTemplate.expire(key, timeOut, TimeUnit.SECONDS);
        }
    }

    /**
     * Get the value for a key from Redis
     */
    public String getString(String key) {
        return stringRedisTemplate.opsForValue().get(key);
    }

    /**
     * Delete the value for a key from Redis
     */
    public Boolean deleteKey(String key) {
        return stringRedisTemplate.delete(key);
    }
}

Create an MQ consumer for data synchronization

Add the Kafka configuration to the application.yml configuration file:

spring:
  kafka:
    # Kafka server address
    bootstrap-servers: 127.0.0.1:9092
    consumer:
      # default consumer group id
      group-id: consumer-group1
      # key/value deserializers
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      # key/value serializers
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      # batch size in bytes
      batch-size: 65536
      # buffer memory in bytes
      buffer-memory: 524288

You can create a CanalBean object to receive the message:

public class CanalBean {
    // changed rows
    private List<Student> data;
    // database name
    private String database;
    private long es;
    // incremental id, starting from 1
    private int id;
    // whether this is a DDL statement
    private boolean isDdl;
    // column types of the table
    private MysqlType mysqlType;
    // old values for an UPDATE statement
    private String old;
    // primary key names
    private List<String> pkNames;
    // SQL statement
    private String sql;
    private SqlType sqlType;
    // table name
    private String table;
    private long ts;
    // INSERT, UPDATE, DELETE, ERASE (drop table), etc.
    private String type;
    // getters and setters
}
public class MysqlType {
    private String id;
    private String commodity_name;
    private String commodity_price;
    private String number;
    private String description;
    // getters and setters
}
public class SqlType {
    private int id;
    private int commodity_name;
    private int commodity_price;
    private int number;
    private int description;
}

Create the bean corresponding to the business test table, for testing:

 

@Data // requires the Lombok plugin dependency
public class Student implements Serializable {
    private Long id;

    private String name;

    private Integer age;

    private static final long serialVersionUID = 1L;
}

Finally, you can create a consumer, CanalConsumer, to consume the messages:

package com.gary.sync.consumer;

import com.alibaba.fastjson.JSONObject;
import com.gary.sync.model.CanalBean;
import com.gary.sync.model.Student;
import com.gary.sync.redis.RedisClient;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.util.List;

@Component
public class CanalConsumer {
    // logger
    private static Logger log = LoggerFactory.getLogger(CanalConsumer.class);
    // Redis utility class
    @Resource
    private RedisClient redisClient;
    // listen on the topic named canaltopic
    @KafkaListener(topics = "canaltopic")
    public void receive(ConsumerRecord<?, ?> consumer) {
        String value = (String) consumer.value();
        log.info("topic: {}, key: {}, partition: {}, offset: {}, value: {}",
                consumer.topic(), consumer.key(), consumer.partition(), consumer.offset(), value);
        // convert to a Java bean
        CanalBean canalBean = JSONObject.parseObject(value, CanalBean.class);
        // whether it is a DDL statement
        boolean isDdl = canalBean.getIsDdl();
        // operation type
        String type = canalBean.getType();
        // not a DDL statement
        if (!isDdl) {
            List<Student> students = canalBean.getData();
            // expiration time in seconds
            long TIME_OUT = 600L;
            if ("INSERT".equals(type)) {
                // INSERT statement
                for (Student student : students) {
                    Long id = student.getId();
                    // add to Redis with a 10-minute expiration time
                    redisClient.setString(String.valueOf(id), JSONObject.toJSONString(student), TIME_OUT);
                }
            } else if ("UPDATE".equals(type)) {
                // UPDATE statement
                for (Student student : students) {
                    Long id = student.getId();
                    // update in Redis with a 10-minute expiration time
                    redisClient.setString(String.valueOf(id), JSONObject.toJSONString(student), TIME_OUT);
                }
            } else {
                // DELETE statement
                for (Student student : students) {
                    Long id = student.getId();
                    // delete from Redis
                    redisClient.deleteKey(String.valueOf(id));
                }
            }
        }
    }
}

Test MySQL and Redis data synchronization

Start ZooKeeper, Kafka, Canal, and Redis in turn.

Test data preparation:

Create a table in MySQL first

DROP TABLE IF EXISTS `student`;

CREATE TABLE `student` (
  `id` int NOT NULL,
  `name` varchar(25) DEFAULT NULL,
  `age` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Then start the project.

Then insert a row of data:

INSERT INTO `test`.`student` (`id`, `name`, `age`) 
VALUES
  ('777', '测试', '123') ;

The new row appears in the student table:

The corresponding data is also found in Redis, proving that the synchronization succeeded!
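A quick way to check this is with redis-cli (a sketch; the key is the primary key value written by the consumer above, and the exact JSON depends on fastjson's serialization):

redis-cli
127.0.0.1:6379> GET 777
"{\"age\":123,\"id\":777,\"name\":\"测试\"}"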

 

Applicable scenarios:

  1. Canal can only synchronize incremental data.

  2. The synchronization is not real-time, but near real-time (quasi-real-time).

It is therefore suited to incremental synchronization in scenarios that do not require strict real-time behavior.

 

 

Original article: blog.csdn.net/Coder_Boy_/article/details/111055381