Big Data Technology Canal Summary and Detailed Cases

0 Canal Introduction

Canal is middleware, developed in Java, that parses database incremental logs and provides incremental data subscription and consumption. At present Canal mainly supports parsing MySQL's binlog; after parsing is complete, a Canal client processes the resulting data. (Full database synchronization requires Alibaba's Otter middleware, which is built on top of Canal.)

1 MySQL Binlog

1.1 What is Binlog

MySQL's binary log records all DDL and DML statements (but not data query statements) in the form of events, together with the time each statement took to execute. The binary log is transaction-safe. It has two main usage scenarios:

① MySQL replication: the master enables the binlog and passes its binary log to the slaves, so that master and slave data stay consistent.

② Data recovery: data can be restored by replaying the binlog with the mysqlbinlog tool. The binary log consists of two kinds of files: the binlog index file (file name suffix .index), which records the names of all binary log files, and the binary log files themselves (file name suffix .00000*), which record the DDL and DML statement events.

1.2 Binlog classification

MySQL's binlog has three formats: STATEMENT, MIXED and ROW. You choose one by setting binlog_format=statement|mixed|row in the configuration file. The differences between the three formats:

1) statement
At the statement level, the binlog records every statement that performs a write operation. Compared with row mode this saves space, but it may cause inconsistency. For example, with "update tt set create_date=now()", restoring from the binlog may produce different data because the statement is re-executed at a different time.

Advantages: saves space.
Disadvantages: may cause data inconsistency.

2) row
At the row level, the binlog records how each row changes after every operation.

Advantages: keeps the data absolutely consistent, because no matter what the SQL statement is or which functions it calls, only the effect after execution is recorded.
Disadvantages: takes up more space.

3) mixed
An upgraded version of statement that, to some extent, fixes the inconsistencies caused by statement mode in certain situations. The default is still statement, but in some cases it falls back to row, for example: when a statement contains UUID(); when a table with an AUTO_INCREMENT column is updated; when INSERT DELAYED is executed; or when a UDF is used.

Advantages: saves space while providing a degree of consistency.
Disadvantages: a few corner cases can still cause inconsistency; in addition, both statement and mixed are inconvenient when the binlog itself needs to be monitored.

Based on the comparison above, since Canal needs to monitor and parse the actual data changes, the row format is the most appropriate choice.
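To confirm which format a running server actually uses, you can query it directly. A minimal JDBC sketch (the connection URL, user and password are placeholders for your own environment, and the MySQL JDBC driver is assumed to be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowBinlogFormat {
    public static void main(String[] args) throws Exception {
        // Placeholder connection settings; point them at your own MySQL server
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/?useSSL=false", "root", "root");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW VARIABLES LIKE 'binlog_format'")) {
            while (rs.next()) {
                // Prints e.g. "binlog_format = ROW" once the configuration in section 1.4 is in place
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}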

1.3 MySQL master-slave replication

1) The master writes data changes to its binary log (binlog);
2) The slave sends a dump request to the master and copies the master's binary log events into its relay log;
3) The slave reads and replays the events in the relay log, applying the changes to its own database.

Canal's principle is very simple: it disguises itself as a MySQL slave and pretends to copy data from the master, so it receives the binlog just like a real slave would.

1.4 Modify the mysql configuration file

Modify the /etc/my.cnf file as follows

# enable binlog
log_bin = mysql-bin
# binlog format
binlog_format = row
# unique id of this MySQL server
server_id = 1
# database whose changes should be synchronized (written to the binlog)
binlog-do-db=canal

Modify binlog-do-db according to your own situation to specify which database should be synchronized. If it is not configured, binlog is enabled for all databases. After the modification is complete, restart MySQL for it to take effect:

service mysql restart

1.5 Test whether the binlog takes effect

Create a table:

create table student (
	id varchar(20)
	,name varchar(20)
	,age int
	,sex varchar(5)
) ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci;

Insert a row of data:

insert into student values('1001','zhangsan',18,'male');

Compare the binlog before and after the insert: you can see that the binlog has changed, which shows that the configuration has taken effect.
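If you prefer not to inspect the binlog files on disk, you can also ask MySQL which events the current binlog contains. A sketch in the same spirit as the one in section 1.2 (same placeholder connection settings; the file name must be replaced with whatever SHOW MASTER STATUS reports on your server):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowBinlogEvents {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/?useSSL=false", "root", "root");
             Statement stmt = conn.createStatement();
             // Replace the file name with the one reported by SHOW MASTER STATUS
             ResultSet rs = stmt.executeQuery("SHOW BINLOG EVENTS IN 'mysql-bin.000001'")) {
            while (rs.next()) {
                // Each write statement executed above shows up here as one or more events
                System.out.println(rs.getString("Event_type") + " : " + rs.getString("Info"));
            }
        }
    }
}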

1.6 Create a new canal user in MySQL

CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;

2 Install Canal

2.1 Download canal

Go to the official website and download the canal installation package; here we take version 1.1.2 as an example:

https://github.com/alibaba/canal/releases

2.2 Create a canal folder and unzip it

The canal file package here contains a lot of files after decompression, so it is recommended to create a separate folder to store the decompressed files.

mkdir canal
tar -zxvf canal.deployer-1.1.2.tar.gz -C ./canal

2.3 Modify the configuration of canal.properties

Description:
① This file holds canal's basic, general configuration. The default canal port is 11111. It is also where canal's output mode is set: the default is tcp, and it can be switched to output to Kafka instead (we do this later, in section 3.4).
② Multi-instance configuration: from canal's architecture we know that one canal server can host multiple instances. Each directory under conf/ is one instance, and each instance has its own configuration file; by default there is only the single instance example. If multiple instances are needed to process data from different MySQL servers, simply copy the example directory several times, rename the copies (the names must match those specified in the configuration file), and then set canal.destinations=instance1,instance2,instance3 in canal.properties.
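For reference, the relevant entries in canal.properties for such a setup might look roughly like this (the instance names are placeholders and must match the directory names under conf/):

canal.port = 11111
canal.serverMode = tcp
canal.destinations = instance1,instance2,instance3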

2.4 Modify instance.properties

We only read data from one MySQL server here, so there is only one instance; its configuration file is in the conf/example directory.

vim instance.properties

1) Configure the MySQL server address
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=10
# enable gtid use true/false
canal.instance.gtidon=false
# position info
canal.instance.master.address=wavehouse-3:3306


Note: canal.instance.mysql.slaveId=10 here; it only needs to be different from the server_id in my.cnf, because canal disguises itself as a MySQL slave, so its id must not be the same as the master's.

2) Configure the username and password used to connect to MySQL; by default they are the canal account we authorized earlier.

canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =canal
# whether to use druid to decrypt the database password
canal.instance.enableDruid=false


2.5 Start canal

./startup.sh


3 Real-time monitoring

3.1 Create a Maven project

3.2 Add dependencies

<dependencies>
    <dependency>
        <groupId>com.alibaba.otter</groupId>
        <artifactId>canal.client</artifactId>
        <version>1.1.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.4.1</version>
    </dependency>
    <!-- fastjson is used by the client code below; add it explicitly
         if it is not already pulled in transitively by canal.client -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.62</version>
    </dependency>
</dependencies>

3.3 Write code according to Canal's architecture


package com.chen.canal;

import com.alibaba.fastjson.JSONObject;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.ByteString;
import com.google.protobuf.InvalidProtocolBufferException;

import java.net.InetSocketAddress;
import java.util.List;

public class CanalClient {

    public static void main(String[] args) throws InvalidProtocolBufferException {
        //1. Create the Canal connector
        CanalConnector canalConnector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("192.168.2.202", 11111), "example", "", "");
        while (true) {
            //2. Connect
            canalConnector.connect();
            //3. Specify the database(s) to monitor
            canalConnector.subscribe("canal.*");
            //4. Get a message
            Message message = canalConnector.get(100);
            //4.1 Get the entries
            List<CanalEntry.Entry> entries = message.getEntries();
            //4.2 Check whether there is any data
            if (entries.size() <= 0) {
                System.out.println("No data at the moment, taking a short break~~~~");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            } else {
                //If there is data, iterate over it
                for (CanalEntry.Entry entry : entries) {
                    //5. Get the table name
                    String tableName = entry.getHeader().getTableName();
                    //6. Get the entry type
                    CanalEntry.EntryType entryType = entry.getEntryType();
                    //7. Check whether the entry type is ROWDATA
                    if (CanalEntry.EntryType.ROWDATA.equals(entryType)) {
                        //7.1 If so, take the serialized row data
                        ByteString storeValue = entry.getStoreValue();
                        //8. Deserialize it
                        CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(storeValue);
                        //9. Get the event type
                        CanalEntry.EventType eventType = rowChange.getEventType();
                        //10. Get the actual row data
                        List<CanalEntry.RowData> rowDatasList = rowChange.getRowDatasList();
                        //11. Iterate over the rows and print them
                        for (CanalEntry.RowData rowData : rowDatasList) {
                            //11.1 Get the "before" columns
                            List<CanalEntry.Column> beforeColumnsList = rowData.getBeforeColumnsList();
                            //11.2 Put the "before" data into a JSON object
                            JSONObject beforeData = new JSONObject();
                            for (CanalEntry.Column column : beforeColumnsList) {
                                beforeData.put(column.getName(), column.getValue());
                            }
                            //11.3 Get the "after" columns
                            List<CanalEntry.Column> afterColumnsList = rowData.getAfterColumnsList();
                            //11.4 Put the "after" data into a JSON object
                            JSONObject afterData = new JSONObject();
                            for (CanalEntry.Column column : afterColumnsList) {
                                afterData.put(column.getName(), column.getValue());
                            }
                            //12. Print
                            System.out.println("TableName: " + tableName + " ,EventType: " + eventType +
                                    " ,Before: " + beforeData + " ,After: " + afterData);
                        }
                    }
                }
            }
        }
    }
}
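One detail worth noting: canalConnector.get(100) acknowledges each batch automatically. If you need at-least-once processing, the client also exposes getWithoutAck/ack/rollback; a minimal sketch of that variant of the loop, using the same connector as above:

// Fetch a batch without acknowledging it automatically
Message message = canalConnector.getWithoutAck(100);
long batchId = message.getId();
try {
    // ... process message.getEntries() exactly as in the loop above ...
    canalConnector.ack(batchId);        // confirm only after processing succeeded
} catch (Exception e) {
    canalConnector.rollback(batchId);   // the batch will be delivered again
}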

3.3.1 Insert data

insert into student values('1002','lisi',28,'fe');

After inserting the data, we can see that only After contains values while Before is empty, because there was no row before the insert.

3.3.2 Insert multiple rows of data

insert into student values('1003','wangwu',29,'fe'),('1004','zhaoliu',38,'male'),('1005','zhuqi',8,'male')


3.3.3 Update data

update student set age=22 where id=1002;


When a row is updated, both before and after contain data: before holds the row before the modification and after holds the row after it.

3.3.4 Delete data

delete from student where id=1003;

Since the deleted row no longer exists afterwards, before contains the row's data while after is empty.
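Summing up the three cases, a small helper like the following (hypothetical, not part of the original client) shows how the before/after lists map to each event type:

// INSERT: before is empty, after holds the new row
// UPDATE: before holds the old row, after holds the new row
// DELETE: before holds the removed row, after is empty
static List<CanalEntry.Column> affectedRow(CanalEntry.EventType eventType,
                                           CanalEntry.RowData rowData) {
    return eventType == CanalEntry.EventType.DELETE
            ? rowData.getBeforeColumnsList()
            : rowData.getAfterColumnsList();
}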

3.4 Kafka mode test

1) In canal.properties, change canal's output mode from the default tcp to kafka.

2) Configure the address of the Kafka cluster.

3) In instance.properties, set the topic and the number of partitions that canal outputs to Kafka.
Note: by default everything is still written to a single partition of the specified Kafka topic, because writing to several partitions in parallel could disturb the order of the binlog events. If you want more parallelism, first set the number of Kafka partitions to more than 1 and then set the canal.mq.partitionHash property, as in the sketch below.
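Putting steps 1) to 3) together, the relevant entries look roughly like this (key names follow the canal 1.1.x MQ documentation; the broker list, partition count and hash rule are placeholders to adapt to your environment):

# canal.properties
canal.serverMode = kafka
canal.mq.servers = 192.168.2.200:9092

# conf/example/instance.properties
canal.mq.topic = canal_test
canal.mq.partitionsNum = 3
# hash rows to partitions by primary key so that ordering is preserved per key
canal.mq.partitionHash = canal.student:id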
4) Before starting Canal, start Kafka first; and before starting Kafka, start ZooKeeper:

bin/zkServer.sh start
bin/kafka-server-start.sh -daemon config/server.properties

5) When the CanalLauncher process appears, the startup has succeeded, and the canal_test topic will be created at the same time.

6) Start a Kafka console consumer to check what is being produced:

bin/kafka-console-consumer.sh --bootstrap-server 192.168.2.200:9092 --topic canal_test
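The console consumer is enough for a quick check; if you want to consume the topic from code, the kafka-clients dependency added in section 3.2 already covers it. A minimal sketch (the broker address and topic are taken from the setup above; the group id is an arbitrary placeholder):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class CanalKafkaConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.2.200:9092");
        props.put("group.id", "canal-test-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("canal_test"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                // Each value is the JSON that canal wrote for one batch of row changes
                System.out.println(record.value());
            }
        }
    }
}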

7) Check the consumer console after inserting data into MySQL

insert into student values('1008','zhuba',29,'fe'),('1009','yangjiu',38,'male');

Kafka receives the multiple inserted rows as a JSON array inside a single message.

8) Update MySQL data and check the consumer console:
update student set name='zhuba2' where id=1008;

After the update, the old field in the Kafka message holds the old values, while the data field holds the new values.
9) Delete MySQL data and check the consumer console:

delete from student where id=1005;

