One article to get you started with Canal data synchronization~

Video Tutorial Portal:

Canal minimalist introduction: one hour to quickly get started with Canal data synchronization (13 videos, including: 01. Pre-class tutorial and prerequisite knowledge, 02. Understanding the Canal component, 03. MySQL master-slave replication principle, etc.): https://www.bilibili.com/video/BV1Uc411P7XN/?spm_id_from=333.337.search-card.all.click

1. Pre-knowledge points

1.1 Canal pronunciation

Canal is pronounced [kəˈnæl], like the English word "canal".

1.2 Prerequisite knowledge points

  • Basic operation of MySQL

  • Java basics

  • SpringBoot

2. Introduction to Canal

2.1 Historical Background

In the early days, Alibaba had server rooms in both Hangzhou and the United States, and the business needed to synchronize data across them. The initial implementation was based on business triggers, which was inconvenient. Starting in 2010, this was gradually replaced by parsing database logs, which gave rise to a large number of incremental data subscription and consumption use cases. Against this background, Canal was born.

Around 2014, it was first introduced for Tmall's Double Eleven to solve the problem of highly concurrent reads and writes against MySQL during large promotions. It was later widely used and promoted inside Alibaba, and was officially open-sourced in 2017.

GitHub: https://github.com/alibaba/canal

 

2.2 Definition

Canal is a component that parses the MySQL database's incremental logs, provides incremental data subscription and consumption, and supports delivering the incremental data to downstream consumers (such as Kafka, RocketMQ, etc.) or stores (such as Elasticsearch, HBase, etc.).

In plain language: Canal senses data changes in MySQL, parses the changed data, then sends it to an MQ or synchronizes it to other databases, where it awaits further business processing.

3. The working principle of Canal

3.1 MySQL master-slave replication principle

 

  • The MySQL master writes data changes to the binary log (Binlog for short).

  • The MySQL slave copies the master's binary log into its own relay log.

  • The MySQL slave replays the operations in the relay log, bringing its data up to date.
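The three steps above can be sketched as a toy simulation. This is purely illustrative (the class and method names here are invented for the sketch, not MySQL or Canal APIs): the master appends each write to its binlog, the slave copies new binlog events into its relay log, then replays them against its own data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One logged change: operation type, primary key, and value.
record BinlogEvent(String op, long id, String value) {}

class Master {
    final Map<Long, String> data = new HashMap<>();
    final List<BinlogEvent> binlog = new ArrayList<>();

    void insert(long id, String value) {
        data.put(id, value);                              // step 1: apply the write...
        binlog.add(new BinlogEvent("INSERT", id, value)); // ...and record it in the binlog
    }
}

class Slave {
    final Map<Long, String> data = new HashMap<>();
    final List<BinlogEvent> relayLog = new ArrayList<>();

    // step 2: copy the master's new binlog events into the relay log
    void fetchFrom(Master master) {
        relayLog.addAll(master.binlog.subList(relayLog.size(), master.binlog.size()));
    }

    // step 3: replay the relay log so the slave catches up
    void replay() {
        for (BinlogEvent e : relayLog) {
            if (e.op().equals("INSERT")) data.put(e.id(), e.value());
        }
    }
}

public class ReplicationSketch {
    public static void main(String[] args) {
        Master master = new Master();
        Slave slave = new Slave();
        master.insert(1L, "dafei");
        master.insert(2L, "langfei");
        slave.fetchFrom(master);
        slave.replay();
        System.out.println(master.data.equals(slave.data)); // prints true
    }
}
```

Canal's trick, described in section 3.3, is essentially to play the role of `Slave` here: receive the pushed binlog events, but parse them instead of replaying them.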

3.2 MySQL Binlog log

3.2.1 Introduction

MySQL's Binlog can be said to be the most important log of MySQL. It records all DDL and DML statements in the form of events.

Before MySQL 8.0, Binlog is disabled by default (8.0 enables it out of the box), because recording it takes time; official figures put the performance loss at about 1%.

Whether to enable it or not depends on the actual situation during development.

Generally speaking, Binlog logs are enabled in the following two scenarios:

  • In a MySQL master-slave cluster, Binlog must be enabled on the master so that data can be synchronized to the slaves.

  • Data recovery: restoring data with the mysqlbinlog tool.

3.2.2 Binlog classification

MySQL Binlog has three formats: STATEMENT, MIXED, and ROW. The format is selected in the configuration file:

binlog_format=statement|mixed|row

| Format | Description | Advantage | Shortcoming |
| --- | --- | --- | --- |
| STATEMENT | Statement level: records every statement that performs a write. | Saves space compared to ROW. | May cause data inconsistency: e.g. `update tt set create_date=now()` is re-executed on the slave at a later time, so the stored values differ. |
| ROW | Row level: records how each row changed after every operation. If one UPDATE affects 10,000 rows, STATEMENT stores a single statement while ROW stores all 10,000 row images. | Absolute data consistency: whatever SQL or function is used, only the effect after execution is recorded. | Takes up a lot of space. |
| MIXED | An upgrade of STATEMENT: falls back to ROW in cases STATEMENT cannot replay safely, e.g. statements using UUID(), updates to tables with an AUTO_INCREMENT field, INSERT DELAYED statements, or UDFs. | Saves space while keeping a reasonable degree of consistency. | Rare cases can still be inconsistent; in addition, STATEMENT and MIXED are inconvenient for scenarios that monitor the binlog. |

Based on the comparison above, Canal needs to monitor and parse the actual data changes, so the ROW format is the appropriate choice.
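The STATEMENT-vs-ROW trade-off in the table can be made concrete with a tiny sketch. Here `now()` is an invented stand-in for MySQL's NOW(): each call returns a later time, the way a slave replays a statement later than the master ran it. All names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BinlogFormatSketch {

    static final AtomicLong fakeClock = new AtomicLong(1000);

    // Stand-in for a non-deterministic function such as NOW():
    // time advances between calls.
    static long now() { return fakeClock.getAndAdd(50); }

    public static void main(String[] args) {
        // STATEMENT format: master and slave each re-execute the statement,
        // so now() is evaluated twice and the stored values diverge.
        long statementOnMaster = now();
        long statementOnSlave  = now(); // evaluated later -> different value

        // ROW format: the master evaluates now() once and logs the resulting
        // row value; the slave copies that value verbatim.
        long rowValueLogged  = now();
        long rowValueOnSlave = rowValueLogged;

        System.out.println(statementOnMaster == statementOnSlave); // prints false
        System.out.println(rowValueLogged == rowValueOnSlave);     // prints true
    }
}
```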

3.3 Canal working principle

  • Canal disguises itself as a MySQL slave and sends the dump protocol request to the MySQL master

  • The MySQL master receives the dump request and starts pushing the binary log to the slave (that is, Canal)

  • Canal receives and parses the Binlog, obtains the changed data, and executes subsequent logic

 

4. Canal application scenarios

4.1 Data Synchronization

Canal can help users perform various data synchronization operations, such as real-time synchronization of MySQL data to Elasticsearch, Redis and other data storage media.

4.2 Database real-time monitoring

Canal can monitor the update operation of MySQL in real time, and can notify relevant personnel in time of the modification of sensitive data.

4.3 Data Analysis and Mining

Canal can post MySQL incremental data to message queues such as Kafka to provide data sources for data analysis and mining.

 

4.4 Database backup

Canal can copy the data incremental log on the MySQL master database to the standby database to realize database backup.

 

4.5 Data Integration

Canal can integrate data from multiple MySQL databases to provide more efficient and reliable solutions for data processing.

4.6 Database migration

Canal can assist in completing MySQL database version upgrades and data migration tasks.

 

5. MySQL preparation

5.1 Create a database

Create a new database: canal-demo

 

5.2 Create table

user table

CREATE TABLE `user` (
  `id` bigint NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;

5.3 Modify the configuration file to enable Binlog support

Modify the MySQL configuration file, named my.ini (my.cnf on Linux):

server-id=1
log-bin=C:/ProgramData/MySQL/MySQL Server 8.0/binlogs/mysql-bin.log
binlog_format=row
binlog-do-db=canal-demo

server-id: the MySQL instance id, used to distinguish instances in a cluster

log-bin: the Binlog file name (and path)

binlog_format: the Binlog storage format

binlog-do-db: the database for which Binlog is enabled.

Note: specify the database(s) to synchronize as needed. If not configured, Binlog is enabled for all databases.
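As a quick sanity check, the fragment above can be parsed programmatically and the two settings Canal relies on verified. MySQL itself reads the file, of course; this sketch (class and method names are invented for illustration) only validates the edit:

```java
import java.util.HashMap;
import java.util.Map;

public class BinlogConfigCheck {

    // Parse simple key=value lines (comments and sections ignored for brevity).
    static Map<String, String> parse(String ini) {
        Map<String, String> conf = new HashMap<>();
        for (String line : ini.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                conf.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        String ini = """
                server-id=1
                log-bin=C:/ProgramData/MySQL/MySQL Server 8.0/binlogs/mysql-bin.log
                binlog_format=row
                binlog-do-db=canal-demo
                """;
        Map<String, String> conf = parse(ini);
        System.out.println("row".equals(conf.get("binlog_format")));       // prints true
        System.out.println("canal-demo".equals(conf.get("binlog-do-db"))); // prints true
    }
}
```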

5.4 Verify that Binlog takes effect

Restart the MySQL service and view the Binlog log

Method 1:

show  VARIABLES like 'log_bin'

 

Method 2:

Go to the binlog directory configured by log-bin, execute a few write statements, and confirm that the mysql-bin files grow (their contents can be inspected with the mysqlbinlog tool):

insert into user(name, age) values('dafei', 18);
insert into user(name, age) values('dafei', 18);
insert into user(name, age) values('dafei', 18);

 

 

6. Canal installation and configuration

6.1 Download

Address: https://github.com/alibaba/canal/releases

Download the canal.deployer package and simply unzip it.

6.2 Configuration

6.2.1 Modify the configuration of canal.properties

canal.port = 11111
# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ
canal.serverMode = tcp

canal.destinations = example

canal.port: the listening port, 11111 by default

canal.serverMode: the service mode; tcp means clients pull data directly over TCP, while the MQ options deliver the data to the corresponding message middleware

canal.destinations: Canal can collect data from multiple MySQL instances, each controlled by its own configuration file. The rule: each folder under conf/ represents one MySQL instance, and canal.destinations lists the instances to monitor. If there are several, separate them with commas.

6.2.2 Modify the MySQL instance configuration file instance.properties

Located under the conf/example/ directory (one folder per instance):

canal.instance.mysql.slaveId=20

# position info
canal.instance.master.address=127.0.0.1:3306

# username/password
canal.instance.dbUsername=root
canal.instance.dbPassword=admin

canal.instance.mysql.slaveId: the slave id Canal uses when acting as a replica; it must differ from the MySQL server-id

canal.instance.master.address: the database ip and port

canal.instance.dbUsername: the MySQL account to connect with

canal.instance.dbPassword: the password for that account

6.3 Start

On Windows, double-click bin/startup.bat to start; on Linux, run bin/startup.sh.

 

7. Canal programming

7.1 Helloworld

1>Create project: canal-hello

2> Import related dependencies

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.0</version>
</dependency>

3> Write test code

package com.langfeiyes.hello;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.ByteString;
import com.google.protobuf.InvalidProtocolBufferException;

import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CanalDemo {

    public static void main(String[] args) throws InvalidProtocolBufferException {
        // 1. Create the canal connector
        CanalConnector canalConnector = CanalConnectors.newSingleConnector(new InetSocketAddress("localhost", 11111), "example", "", "");
        while (true) {
            // 2. Connect
            canalConnector.connect();
            // 3. Subscribe to the databases/tables to monitor
            canalConnector.subscribe("canal-demo.*");
            // 4. Fetch a batch of messages
            Message message = canalConnector.get(100);
            List<CanalEntry.Entry> entries = message.getEntries();
            if (entries.isEmpty()) {
                System.out.println("No data yet, sleeping for a while");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            } else {
                for (CanalEntry.Entry entry : entries) {
                    // Table name
                    String tableName = entry.getHeader().getTableName();
                    // Entry type
                    CanalEntry.EntryType entryType = entry.getEntryType();
                    // Only process ROWDATA entries (skip transaction begin/end markers)
                    if (CanalEntry.EntryType.ROWDATA.equals(entryType)) {
                        // Serialized payload
                        ByteString storeValue = entry.getStoreValue();
                        // Deserialize into a RowChange
                        CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(storeValue);
                        // Event type (INSERT/UPDATE/DELETE)
                        CanalEntry.EventType eventType = rowChange.getEventType();
                        // The actual row data
                        List<CanalEntry.RowData> rowDatasList = rowChange.getRowDatasList();
                        // Iterate and print each changed row
                        for (CanalEntry.RowData rowData : rowDatasList) {
                            List<CanalEntry.Column> beforeColumnsList = rowData.getBeforeColumnsList();
                            Map<String, Object> bMap = new HashMap<>();
                            for (CanalEntry.Column column : beforeColumnsList) {
                                bMap.put(column.getName(), column.getValue());
                            }
                            Map<String, Object> afMap = new HashMap<>();
                            List<CanalEntry.Column> afterColumnsList = rowData.getAfterColumnsList();
                            for (CanalEntry.Column column : afterColumnsList) {
                                afMap.put(column.getName(), column.getValue());
                            }
                            System.out.println("table: " + tableName + ", event type: " + eventType);
                            System.out.println("before: " + bMap);
                            System.out.println("after: " + afMap);
                        }
                    }
                }
            }
        }
    }
}

4> test

Perform DML operations on the user table in the canal-demo library, and observe the printed values


 

7.2 SpringBoot integration

1>Create project: canal-sb-demo

2> Import related dependencies

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.11</version>
</parent>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
        <groupId>top.javatool</groupId>
        <artifactId>canal-spring-boot-starter</artifactId>
        <version>1.2.6-RELEASE</version>
    </dependency>
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>8.0.12</version>
    </dependency>
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>3.21.4</version>
    </dependency>
</dependencies>

 3>Configuration file

canal:
  server: 127.0.0.1:11111 # canal's default port is 11111
  destination: example
spring:
  application:
    name: canal-sb-demo
  datasource:
    driver-class-name: com.mysql.cj.jdbc.Driver
    url: jdbc:mysql://127.0.0.1:3306/canal-demo?useUnicode=true&characterEncoding=utf-8&serverTimezone=UTC&useSSL=false
    username: root
    password: admin

4> Entity Object

package com.langfeiyes.sb.domain;

public class User {
    private Long id;
    private String name;
    private Integer age;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "User{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", age=" + age +
                '}';
    }

}

5>Monitoring processing class

package com.langfeiyes.sb.handler;

import com.langfeiyes.sb.domain.User;
import org.springframework.stereotype.Component;
import top.javatool.canal.client.annotation.CanalTable;
import top.javatool.canal.client.handler.EntryHandler;

@Component
@CanalTable(value = "user")
public class UserHandler implements EntryHandler<User> {
 
    @Override
    public void insert(User user) {
        System.err.println("insert: " + user);
    }

    @Override
    public void update(User before, User after) {
        System.err.println("before: " + before);
        System.err.println("after: " + after);
    }

    @Override
    public void delete(User user) {
        System.err.println("delete: " + user);
    }
}

6>Start class

package com.langfeiyes.sb;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class App {
    public static void main(String[] args) {
        SpringApplication.run(App.class, args);
    }
}

7> test

  • Start the Canal server first

  • Start the project

  • Modify data in the user table

  • Observe the console output

8. Technologies of the same type

Type 1: Data synchronization component based on log parsing

This type of component obtains the database's insert, update, and delete operations by parsing log files such as MySQL's Binlog or Oracle's Redo Log, and records these operations. The records can then be applied to another database to achieve data synchronization. Representative products include Alibaba's open-source Canal and Tencent Cloud's DBSync.

Type 2: ETL-based data synchronization components

ETL stands for Extract-Transform-Load: extracting data from the source system, transforming it, and finally loading it into the target system. Such components usually require writing complex data transformation rules and data mappings, and suit scenarios with frequent schema changes, large data volumes, and many data sources. Representative products include Alibaba Cloud's DataWorks and Informatica PowerCenter.

Type 3: CDC-based data synchronization components

CDC (Change Data Capture) is a data synchronization technique that captures data changes in a database in real time or near real time and transmits them to another database. CDC is implemented on top of the database's transaction log or redo log, enabling low-latency, high-performance synchronization. Representative products include Oracle GoldenGate and IBM InfoSphere Data Replication.

Type 4: Data synchronization component based on message queue

Such components abstract database change operations into a data structure and publish them through a message queue for other systems to process, achieving asynchronous transmission and decoupling of data. Representative products include Apache Kafka and RabbitMQ.

9. Common Canal interview questions

Q: What is Canal? What are the characteristics?

Answer: Canal is a high-performance, reliable incremental data subscription and consumption component open-sourced by Alibaba, built on top of Netty (it is not itself a message queue). It is widely used in real-time data synchronization and data distribution scenarios. Canal has the following characteristics: it supports parsing and subscribing to database logs, primarily MySQL Binlog; it supports multiple delivery targets such as Kafka, RocketMQ, and RabbitMQ; it supports data filtering and format conversion; and it offers low latency and high reliability.

Q: How does Canal work?

Answer: Canal obtains the database's insert, update, and delete operations by parsing its Binlog and then delivers these change events to downstream consumers. Its core consists of two parts, Server and Client: the Server connects to the database posing as a slave, parses the Binlog, and filters and distributes the data; the Client subscribes to the Server and consumes the parsed change data. Canal also supports multiple data exporters such as Kafka, RocketMQ, and RabbitMQ, which can send the parsed data to different message queues for further processing and analysis.

Q: What are the pros and cons of Canal?

Answer: Canal's main advantages include: high performance, distributed deployment, good reliability, and support for data filtering and conversion. Its disadvantages include: a non-trivial learning curve, some impact from reading the database logs, and no support for backtracking (it cannot obtain historical data from before it started consuming the Binlog).

Q: What application scenarios does Canal have in business?

A: Canal is mainly used in real-time data synchronization and data distribution scenarios. Common application scenarios include: data backup and disaster recovery, incremental data extraction and synchronization, real-time data analysis, online data migration, etc. Especially in the Internet big data scenario, Canal has become one of the important tools for various data processing tasks.

Origin: blog.csdn.net/langfeiyes/article/details/130711899