Introduction to Canal of Big Data Technology


Preface

  • Canal version: Canal-1.1.2

Official website: https://github.com/alibaba/canal/

Official document: https://github.com/alibaba/canal/wiki

Chapter 1 Getting Started with Canal

1.1 What is Canal

Because of the nature of Alibaba's B2B business, sellers are concentrated mainly in China and buyers mainly overseas, which created the need to synchronize data between data centers in Hangzhou and the United States. Starting in 2010, Alibaba gradually moved to parsing database logs to capture incremental changes, and the incremental subscription & consumption business grew out of that work.

Canal is middleware written in Java that provides incremental data subscription & consumption based on parsing a database's incremental logs. At present, Canal mainly supports parsing MySQL Binlog; once parsing is complete, a Canal Client processes the resulting data. (Database synchronization additionally requires Alibaba's Otter middleware, which is built on Canal.)

1.2 MySQL Binlog

1.2.1 What is Binlog

MySQL's binary log is arguably the most important MySQL log. It records all DDL and DML statements (excluding data query statements) as events, and also includes the time each statement took to execute. The binary log is transaction-safe.

Generally speaking, enabling the binary log costs about 1% in performance. The binary log has two main usage scenarios:

  • First: MySQL Replication. Binlog is enabled on the Master, and the Master passes its binary logs to the Slaves so that Master and Slave data stay consistent.

  • Second: data recovery, by replaying the log with the mysqlbinlog tool.

The binary log consists of two kinds of files: the binary log index file (file name suffix .index), which lists all binary log files, and the binary log files themselves (file name suffix .00000*), which record all DDL and DML (excluding data query statements) statement events of the database.
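
As a rough illustration of how the two file types relate, the index file can be read to find every binary log file and report its size. This is a minimal Python sketch, not part of MySQL or Canal: the function name binlog_sizes is hypothetical, and it assumes the index holds one log file path per line, with relative entries resolved against the index file's own directory.

```python
from pathlib import Path


def binlog_sizes(index_path):
    """Return {binlog file name: size in bytes} for every file listed in the index.

    Assumption: the index file contains one path per line, for example
    ./mysql-bin.000001, relative to the directory that holds the index.
    """
    index = Path(index_path)
    sizes = {}
    for line in index.read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        log_file = index.parent / entry  # absolute entries replace the parent
        sizes[log_file.name] = log_file.stat().st_size
    return sizes
```

Called as binlog_sizes("/var/lib/mysql/mysql-bin.index"), it would need read permission on the MySQL data directory, which normally means running as root or as the mysql user.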

1.2.2 Binlog classification

MySQL Binlog has three formats: STATEMENT, MIXED, and ROW. Choose one by setting binlog_format=statement|mixed|row in the configuration file. The three formats differ as follows:

1) statement: Statement level. The binlog records every statement that performs a write. Compared with row mode it saves space, but it may cause inconsistency. For example, with update tt set create_date=now(), data recovered from the binlog may differ because the statement is executed at a different time.

  • Advantages: save space.
  • Disadvantage: It may cause data inconsistency.

2) row: Row level. The binlog records the change to every row affected by each operation.

  • Advantages: Keeps data strictly consistent, because regardless of the SQL or the functions it calls, only the result of execution is recorded.
  • Disadvantages: Takes up a lot of space.

3) mixed: An upgraded version of statement that resolves, to a degree, the inconsistencies statement mode can cause. The default is still statement, but some cases are handled in ROW mode, for example:

  • When the statement contains UUID();
  • When a table containing an AUTO_INCREMENT field is updated;
  • When executing an INSERT DELAYED statement;
  • When using a UDF.

Advantages and disadvantages:

  • Advantages: Saves space while preserving a reasonable degree of consistency.
  • Disadvantages: Some rare cases still cause inconsistencies. In addition, the statement and mixed formats are inconvenient for applications that need to monitor the binlog.

Based on the comparison above, Canal is used for monitoring and analysis, so the row format is the more appropriate choice.
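
The difference between replaying statements and replaying rows can be illustrated with a small simulation. This is only a sketch in Python with in-memory lists standing in for tables. It does not use MySQL or Canal, every name in it is hypothetical, and uuid.uuid4() stands in for a non-deterministic SQL function such as UUID().

```python
import copy
import uuid


def run_update(table):
    """Simulate UPDATE t SET token = UUID() and return the resulting row images."""
    row_events = []
    for row in table:
        row["token"] = str(uuid.uuid4())  # non-deterministic: new value on every call
        row_events.append((row["id"], row["token"]))
    return row_events


def replay_statement(table):
    """STATEMENT style: the replica executes the statement again."""
    run_update(table)


def replay_rows(table, row_events):
    """ROW style: the replica applies the values the master actually wrote."""
    by_id = {row["id"]: row for row in table}
    for row_id, token in row_events:
        by_id[row_id]["token"] = token


master = [{"id": 1, "token": None}, {"id": 2, "token": None}]
replica_stmt = copy.deepcopy(master)
replica_row = copy.deepcopy(master)

events = run_update(master)           # the master executes the write
replay_statement(replica_stmt)        # re-executed: generates different UUIDs
replay_rows(replica_row, events)      # applied: copies the master's values

statement_consistent = replica_stmt == master
row_consistent = replica_row == master
```

After this runs, row_consistent is true while statement_consistent is false, which is the trade-off described above: row events are larger but carry the exact result, so a consumer never has to re-evaluate the SQL.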

1.3 How Canal works

1.3.1 MySQL master-slave replication process

  • The Master writes data changes to its binary log (Binary Log);
  • The Slave sends a dump request to the MySQL Master and copies the Master's binary log events into its relay log (Relay Log);
  • The Slave reads the events in the relay log and replays them, applying the changed data to its own database.
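
The three steps can be sketched as a toy model. This is a simplified Python illustration under stated assumptions, not how MySQL implements replication: the Master and Slave classes, their methods, and the tuple event format are all hypothetical, and positions are simple list indexes rather than real binlog offsets.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered change events

    def write(self, key, value):
        # Step 1: apply the change and record it in the binary log.
        self.data[key] = value
        self.binlog.append(("set", key, value))

    def dump(self, position):
        """Answer a dump request with every event from the given position on."""
        return self.binlog[position:]


class Slave:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.applied = 0  # number of relay log events already replayed

    def fetch(self, master):
        # Step 2: request new events and append them to the relay log.
        self.relay_log.extend(master.dump(len(self.relay_log)))

    def replay(self):
        # Step 3: replay the relay log events that have not been applied yet.
        for op, key, value in self.relay_log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(self.relay_log)


master = Master()
slave = Slave()
master.write("1001", "zhangsan")
master.write("1002", "lisi")
slave.fetch(master)
slave.replay()
in_sync = slave.data == master.data
```

Keeping fetch and replay separate mirrors the role of the relay log: events are stored locally first, so replay can lag behind without losing anything.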

1.3.2 How Canal works

It is very simple: Canal disguises itself as a Slave and pretends to replicate data from the Master.
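
The idea can be sketched in a few lines of Python. This is a conceptual illustration only and does not use the real Canal client API or the MySQL replication protocol: BinlogSource, FakeSlave, and the dictionary event format are hypothetical names chosen for the example. The point is that the fake slave requests events the same way a Slave would, but hands them to a consumer instead of applying them to a database.

```python
class BinlogSource:
    """Stand-in for a MySQL Master that answers dump requests."""

    def __init__(self, events):
        self.events = list(events)

    def dump(self, position):
        return self.events[position:]


class FakeSlave:
    """Pulls binlog events like a Slave, then forwards them to a handler."""

    def __init__(self, source):
        self.source = source
        self.position = 0  # how many events have been consumed so far

    def poll(self, handler):
        batch = self.source.dump(self.position)
        for event in batch:
            handler(event)  # e.g. update a cache or feed real-time statistics
        self.position += len(batch)
        return len(batch)


source = BinlogSource([
    {"type": "INSERT", "table": "user_info", "row": {"id": "1001", "name": "zhangsan"}},
])
received = []
listener = FakeSlave(source)
first = listener.poll(received.append)   # picks up the one existing event
second = listener.poll(received.append)  # nothing new since the last poll
```

Tracking the position on the listener side is what lets it resume where it left off, so each change is delivered once per poll cycle.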

1.4 Usage Scenarios

  • Original scenario: part of Alibaba's Otter middleware

Otter is Alibaba's framework for synchronizing databases across remote sites, and Canal is a part of it.

  • Common Scenario 1: Updating the cache

  • Common Scenario 2: Capture new and changed data from business tables and use it for real-time statistics (this is our scenario)

Chapter 2 MySQL Preparation

2.1 Create a database

(Screenshot: creating the database.)

2.2 Create a data table

CREATE TABLE user_info(
`id` VARCHAR(255),
`name` VARCHAR(255),
`sex` VARCHAR(255)
);

2.3 Modify the configuration file to enable Binlog

[zhangsan@node01 module]$ sudo vim /etc/my.cnf 
server-id=1  # required for MySQL replication; must not be the same as Canal's slaveId
log-bin=mysql-bin 
binlog_format=row 
binlog-do-db=gmall-2021

Note: Set binlog-do-db to suit your own environment. It names the specific database to synchronize; if it is not configured, Binlog is enabled for all databases.

2.4 Restart MySQL to make the configuration take effect

sudo systemctl restart mysqld	

Go to the /var/lib/mysql directory and check the initial size of mysql-bin.000001: 154 bytes

[zhangsan@node01 lib]$ pwd
/var/lib
[zhangsan@node01 lib]$ sudo ls -l mysql
total 474152
-rw-r-----. 1 mysql mysql       56 Aug  7  2020 auto.cnf
drwxr-x---. 2 mysql mysql     4096 Sep 25  2020 azkaban
-rw-------. 1 mysql mysql     1680 Aug  7  2020 ca-key.pem
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 ca.pem
drwxr-x---  2 mysql mysql     4096 Aug 18 16:56 cdc_test
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 client-cert.pem
-rw-------. 1 mysql mysql     1676 Aug  7  2020 client-key.pem
drwxr-x---. 2 mysql mysql     4096 Sep 25  2020 gmall_report
-rw-r-----  1 mysql mysql     1085 Dec  1 09:12 ib_buffer_pool
-rw-r-----. 1 mysql mysql 79691776 Dec 13 08:45 ibdata1
-rw-r-----. 1 mysql mysql 50331648 Dec 13 08:45 ib_logfile0
-rw-r-----. 1 mysql mysql 50331648 Dec 13 08:45 ib_logfile1
-rw-r-----  1 mysql mysql 12582912 Dec 13 08:45 ibtmp1
drwxr-x---  2 mysql mysql     4096 Sep 22 15:30 maxwell
drwxr-x---. 2 mysql mysql     4096 Aug 12  2020 metastore
drwxr-x---. 2 mysql mysql     4096 Sep 22 15:43 mysql
-rw-r-----. 1 mysql mysql      154 Dec 13 08:45 mysql-bin.000001
-rw-r-----  1 mysql mysql       19 Dec 13 08:45 mysql-bin.index
srwxrwxrwx  1 mysql mysql        0 Dec 13 08:45 mysql.sock
-rw-------  1 mysql mysql        5 Dec 13 08:45 mysql.sock.lock
drwxr-x---. 2 mysql mysql     4096 Aug  7  2020 performance_schema
-rw-------. 1 mysql mysql     1680 Aug  7  2020 private_key.pem
-rw-r--r--. 1 mysql mysql      452 Aug  7  2020 public_key.pem
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 server-cert.pem
-rw-------. 1 mysql mysql     1680 Aug  7  2020 server-key.pem
drwxr-x---. 2 mysql mysql    12288 Aug  7  2020 sys
drwxr-x---  2 mysql mysql     4096 Feb  2  2021 test
[zhangsan@node01 lib]$

As you can see, the size of mysql-bin.000001 is 154 bytes.

2.5 Test whether Binlog is enabled

  • Insert a row
INSERT INTO user_info VALUES('1001','zhangsan','male');
  • Go to the /var/lib/mysql directory again and check the size of the binlog file
-rw-------. 1 mysql mysql     1680 Aug  7  2020 ca-key.pem
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 ca.pem
drwxr-x---  2 mysql mysql     4096 Aug 18 16:56 cdc_test
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 client-cert.pem
-rw-------. 1 mysql mysql     1676 Aug  7  2020 client-key.pem
drwxr-x---. 2 mysql mysql     4096 Sep 25  2020 gmall_report
-rw-r-----  1 mysql mysql     1085 Dec  1 09:12 ib_buffer_pool
-rw-r-----. 1 mysql mysql 79691776 Dec 13 08:45 ibdata1
-rw-r-----. 1 mysql mysql 50331648 Dec 13 08:45 ib_logfile0
-rw-r-----. 1 mysql mysql 50331648 Dec 13 08:45 ib_logfile1
-rw-r-----  1 mysql mysql 12582912 Dec 13 08:45 ibtmp1
drwxr-x---  2 mysql mysql     4096 Sep 22 15:30 maxwell
drwxr-x---. 2 mysql mysql     4096 Aug 12  2020 metastore
drwxr-x---. 2 mysql mysql     4096 Sep 22 15:43 mysql
-rw-r-----. 1 mysql mysql      452 Dec 13 08:45 mysql-bin.000001
-rw-r-----  1 mysql mysql       19 Dec 13 08:45 mysql-bin.index
srwxrwxrwx  1 mysql mysql        0 Dec 13 08:45 mysql.sock
-rw-------  1 mysql mysql        5 Dec 13 08:45 mysql.sock.lock
drwxr-x---. 2 mysql mysql     4096 Aug  7  2020 performance_schema
-rw-------. 1 mysql mysql     1680 Aug  7  2020 private_key.pem
-rw-r--r--. 1 mysql mysql      452 Aug  7  2020 public_key.pem
-rw-r--r--. 1 mysql mysql     1112 Aug  7  2020 server-cert.pem
-rw-------. 1 mysql mysql     1680 Aug  7  2020 server-key.pem
drwxr-x---. 2 mysql mysql    12288 Aug  7  2020 sys
drwxr-x---  2 mysql mysql     4096 Feb  2  2021 test
[zhangsan@node01 lib]$

The size of mysql-bin.000001 has grown from 154 to 452 bytes, which shows that Binlog is recording changes.

2.6 Authorization

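Run the following in MySQL: relax the password validation rules (minimum length and policy level), then grant the canal user the SELECT, REPLICATION SLAVE, and REPLICATION CLIENT privileges.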

mysql> set global validate_password_length=4; 
mysql> set global validate_password_policy=0;
mysql> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO
'canal'@'%' IDENTIFIED BY 'canal' ;

Check the user table in the mysql database to confirm that the canal user exists.

Chapter 3 Canal Download and Installation

3.1 Download and extract the package

https://github.com/alibaba/canal/releases

After downloading, copy canal.deployer-1.1.2.tar.gz to the /opt/software directory, then extract it into the /opt/module/canal-1.1.2 directory.

Note: the archive has no top-level directory, so its contents are scattered when extracted. Create the canal directory first and specify it as the extraction target.

3.2 Modify the configuration of canal.properties

[zhangsan@node01 conf]$ pwd
/opt/module/canal/conf
[zhangsan@node01 conf]$ vim canal.properties
#################################################
#########       common argument     #############
#################################################
canal.id = 1
canal.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, RocketMQ
canal.serverMode = tcp
# flush meta cursor/parse position to file

Description: This file holds Canal's general configuration. Canal's default port is 11111. The canal.serverMode item sets Canal's output mode; the default is tcp, and it can be changed to kafka to send output to Kafka.

Multi-instance configuration: in Canal's architecture, one Canal server can host multiple instances. Each directory under conf/ (such as example) is one instance, and each instance has its own configuration file. By default there is only one instance, example. If several instances are needed to process data from different MySQL servers, copy the example directory once per instance and rename each copy. The directory names must match the names listed in the configuration file, so then set canal.destinations=instance1,instance2,instance3 in canal.properties.

#################################################
#########       destinations        #############
#################################################
canal.destinations = example
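
Because each destination name must match a directory under conf/, a quick consistency check can catch a misspelled or missing instance before Canal starts. The following Python sketch is an assumption-laden helper, not something shipped with Canal: the function name missing_instances is hypothetical, and it only checks that conf/<name>/instance.properties exists for every name in the comma-separated list.

```python
from pathlib import Path


def missing_instances(conf_dir, destinations):
    """Return the destination names that have no conf/<name>/instance.properties."""
    names = [name.strip() for name in destinations.split(",") if name.strip()]
    return [
        name
        for name in names
        if not (Path(conf_dir) / name / "instance.properties").is_file()
    ]
```

For the layout used in this article, missing_instances("/opt/module/canal/conf", "example") should return an empty list when the example directory and its instance.properties are in place.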

3.3 Modify instance.properties

We only read from one MySQL server here, so there is only one instance, and its configuration file is in the conf/example directory.

[zhangsan@node01 example]$ pwd
/opt/module/canal/conf/example
[zhangsan@node01 example]$ vim instance.properties

  • Configure the MySQL server address

Note: the value of canal.instance.mysql.slaveId must differ from the server-id value in /etc/my.cnf. Canal acts as a slave node, and in master-slave replication no two nodes may share a server-id.

#################################################
## mysql serverId , v1.0.26+ will autoGen 
canal.instance.mysql.slaveId=20

# enable gtid use true/false 
canal.instance.gtidon=false

# position info 
canal.instance.master.address=node01:3306
  • Configure the username and password used to connect to MySQL; the default is canal, the user we authorized earlier
# username/password	
canal.instance.dbUsername=canal 
canal.instance.dbPassword=canal 

canal.instance.connectionCharset = UTF-8 
canal.instance.defaultDatabaseName =test 
# enable druid Decrypt database password 
canal.instance.enableDruid=false
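
The rule that slaveId must differ from server-id is easy to verify with a short script. This is a minimal Python sketch, assuming both files use plain key=value lines with optional # comments. The helper names parse_properties and ids_conflict are hypothetical, and the naive comment stripping would mangle any value that legitimately contains a # character.

```python
def parse_properties(text):
    """Parse key=value lines, dropping blank lines and anything after a '#'."""
    props = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # simplification: '#' always starts a comment
        if "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def ids_conflict(my_cnf_text, instance_text):
    """Return True when Canal's slaveId equals MySQL's server-id."""
    server_id = int(parse_properties(my_cnf_text)["server-id"])
    slave_id = int(parse_properties(instance_text)["canal.instance.mysql.slaveId"])
    return server_id == slave_id


MY_CNF = """
server-id=1  # must not match Canal's slaveId
log-bin=mysql-bin
binlog_format=row
"""

INSTANCE_PROPERTIES = """
## mysql serverId
canal.instance.mysql.slaveId=20
canal.instance.master.address=node01:3306
"""

conflict = ids_conflict(MY_CNF, INSTANCE_PROPERTIES)
```

With the values used in this article (server-id=1 and slaveId=20), conflict is false, so the two IDs do not collide.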

Errors Encountered During Testing

Error message: check the logs under the Canal installation directory. The server log is logs/canal/canal.log. Each instance also writes a log named after its destination: if canal.destinations = example in canal.properties has not been changed, it is logs/example/example.log, and if the destination is changed to [test_xxx], it is logs/test_xxx/test_xxx.log.

[zhangsan@node01 canal]$ cat canal.log
2023-01-07 15:10:56.713 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
2023-01-07 15:10:56.759 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
2023-01-07 15:10:56.771 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2023-01-07 15:10:56.851 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[192.102.153.10(192.102.153.10):11111]
2023-01-07 15:10:58.627 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......
2023-01-07 15:10:58.822 [canal-instance-scan-0] INFO  com.alibaba.otter.canal.deployer.CanalController - auto notify start doris-load successful.
2023-01-07 15:15:34.251 [New I/O server worker #1-1] ERROR c.a.otter.canal.server.netty.handler.SessionHandler - something goes wrong with channel:[id: 0x71dc2f6d, /192.102.153.1:57500 => /192.102.153.10:11111], exception=java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:322)
        at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201)
        at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

I searched for information about this error, but nothing I found helped:

https://github.com/alibaba/canal/issues/3585

I also ran into the error described here:

https://github.com/alibaba/canal/issues/640

Failed to monitor MySQL data in real time

Cause: when canal was first extracted, no installation directory had been created, so the archive's contents were scattered. The scattered files and directories were then moved into a newly created canal-1.1.5 directory.

Solution:

Delete the canal-1.1.5 directory, then extract and install canal again.

Official Documentation Reference

  • AdminGuide

https://github.com/alibaba/canal/wiki/AdminGuide

  • ClientAPI

https://github.com/alibaba/canal/wiki/ClientAPI

  • ClientExample

https://github.com/alibaba/canal/wiki/ClientExample

Finish!

Origin blog.csdn.net/m0_52735414/article/details/128594570