0 Canal Introduction
Canal is a Java-based middleware, developed by Alibaba, that parses database incremental logs to provide incremental data subscription and consumption. At present, Canal mainly supports parsing MySQL's Binlog; a Canal client then processes the parsed data. (Full database synchronization requires Alibaba's Otter middleware, which is built on Canal.)
1 MySQL Binlog
1.1 What is Binlog
MySQL's binary log (Binlog) records all DDL and DML statements (but not data query statements) in the form of events, along with the time each statement took to execute. The binary log is transaction-safe. It has two major usage scenarios:
① MySQL replication: Binlog is enabled on the Master, which ships its binary log to the Slaves to keep Master and Slave data consistent.
② Data recovery: the MySQL Binlog tool can replay the log to restore data. The binary log consists of two kinds of files: binary log index files (suffix .index), which record the names of all binary log files, and the binary log files themselves (suffix .000001, .000002, ...), which record the DDL and DML statement events.
1.2 Binlog classification
There are three MySQL Binlog formats: STATEMENT, MIXED, and ROW. You choose one by setting binlog_format=statement|mixed|row in the configuration file.
The differences between the three formats:
1) statement: at the statement level, the binlog records every statement that performs a write operation. Compared with row mode this saves space, but it may cause inconsistency. For example, with "update tt set create_date=now()", if the binlog is later used for recovery, the statement executes at a different time and may produce different data.
Pros: saves space.
Cons: may cause data inconsistency.
2) row: at the row level, the binlog records how each row changes after every operation.
Pros: keeps data absolutely consistent, because regardless of what the SQL statement is or which functions it references, only the effect after execution is recorded.
Cons: takes up more space.
3) mixed: an upgraded version of statement that, to some extent, solves the inconsistencies caused by statement mode in certain situations. The default is still statement, but in some cases it switches to ROW: for example, when the statement contains UUID(), when a table with an AUTO_INCREMENT field is updated, when INSERT DELAYED is executed, or when a UDF is used.
Pros: saves space while providing a degree of consistency.
Cons: a few rare cases can still cause inconsistency, and both statement and mixed are inconvenient whenever the binlog needs to be monitored.
Based on the above comparison, since Canal is used for monitoring and analysis, the row format is the most appropriate choice.
1.3 MySQL master-slave replication
1) The Master writes data changes to its binary log (Binary Log);
2) The Slave sends a dump request to the Master and copies the Master's binary log events into its own relay log;
3) The Slave reads and replays the events in the relay log, applying the changes to its own database.
Canal's principle is very simple: it disguises itself as a MySQL slave and pretends to replicate data from the Master.
1.4 Modify the mysql configuration file
Modify the /etc/my.cnf file as follows
# Enable Binlog
log_bin = mysql-bin
# Binlog format
binlog_format = row
# Unique id of this MySQL server
server_id = 1
# Database(s) to record Binlog for
binlog-do-db=canal
Adjust binlog-do-db to your own situation to specify the database to synchronize. If it is not configured, Binlog is recorded for all databases. After the modification, restart MySQL for it to take effect:
service mysql restart
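To confirm the settings took effect after the restart, you can query the relevant server variables (a quick sanity check; these are standard MySQL statements):

```sql
show variables like 'log_bin';        -- should show ON
show variables like 'binlog_format';  -- should show ROW
show binary logs;                     -- lists the current binlog files
```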
1.5 Test whether the binlog takes effect
Create a table:
create table student (
id varchar(20)
,name varchar(20)
,age int
,sex varchar(5)
) ENGINE=InnoDB
DEFAULT CHARACTER SET=utf8 COLLATE=utf8_general_ci;
Insert data:
insert into student values('1001','zhangsan',18,'male');
Compare the binlog files before and after the insert: the binlog has changed, which shows the configuration has taken effect.
1.6 Create a new canal user in Mysql
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
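Granting ALL PRIVILEGES works but is broader than necessary; Canal only needs to read data and the replication stream, so a tighter grant (the one used in Canal's own documentation) would be:

```sql
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
```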
2 Install Canal
2.1 Download canal
Download the Canal installation package from the official releases page; here we use version 1.1.2 as an example:
https://github.com/alibaba/canal/releases
2.2 Create a canal folder and unzip into it
The canal archive expands into many files, so it is recommended to create a separate folder to hold the decompressed contents.
mkdir canal
tar -zxvf canal.deployer-1.1.2.tar.gz -C ./canal
2.3 Modify the configuration of canal.properties
Description:
① This file holds Canal's basic, common configuration. The default Canal port is 11111. It is also where you change Canal's output mode, which defaults to tcp, to output to Kafka instead.
② Multi-instance configuration: from Canal's architecture we know that one Canal server can host multiple instances; each subdirectory under conf/ is an instance, and each instance has its own configuration file. By default there is only one instance, example. If multiple instances are needed to process data from different MySQL sources, simply copy the example directory several times and rename the copies (each name must match the name specified in its configuration file), then set canal.destinations=instance1,instance2,instance3 in canal.properties.
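As a sketch, the relevant canal.properties entries for such a three-instance setup might look like this (the instance names are placeholders; the key names follow Canal 1.1.x):

```properties
# canal server port (default)
canal.port = 11111
# output mode: tcp (default) or kafka
canal.serverMode = tcp
# one entry per instance directory under conf/
canal.destinations = instance1,instance2,instance3
```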
2.4 Modify instance.properties
We only read data from one MySQL server here, so a single instance suffices; its configuration file is in the conf/example directory:
vim instance.properties
1) Configure the MySQL server address
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=10
# enable gtid use true/false
canal.instance.gtidon=false
# position info
canal.instance.master.address=wavehouse-3:3306
Note: canal.instance.mysql.slaveId=10 here only needs to differ from the server_id in my.cnf, because Canal disguises itself as a MySQL slave, so its id must be different from the master's.
2) Configure the username and password used to connect to MySQL; by default these are the canal user we authorized earlier
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
canal.instance.defaultDatabaseName =canal
# enable druid Decrypt database password
canal.instance.enableDruid=false
2.5 Start Canal
./startup.sh
3 Real-time monitoring
3.1 Create a maven project
3.2 Add dependencies
<dependencies>
<dependency>
<groupId>com.alibaba.otter</groupId>
<artifactId>canal.client</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.4.1</version>
</dependency>
</dependencies>
3.3 Write code according to Canal's architecture
package com.chen.canal;

import com.alibaba.fastjson.JSONObject;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.ByteString;
import com.google.protobuf.InvalidProtocolBufferException;

import java.net.InetSocketAddress;
import java.util.List;

public class CanalClient {
    public static void main(String[] args) throws InvalidProtocolBufferException {
        //1. Get a Canal connector
        CanalConnector canalConnector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("192.168.2.202", 11111), "example", "", "");
        while (true) {
            //2. Connect
            canalConnector.connect();
            //3. Subscribe to the databases to monitor
            canalConnector.subscribe("canal.*");
            //4. Get a message
            Message message = canalConnector.get(100);
            //4.1 Get the entries
            List<CanalEntry.Entry> entries = message.getEntries();
            //4.2 Check whether there is any data
            if (entries.size() <= 0) {
                System.out.println("No data at the moment, taking a break~~~~");
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            } else {
                //Iterate over the entries if there is data
                for (CanalEntry.Entry entry : entries) {
                    //5. Get the table name
                    String tableName = entry.getHeader().getTableName();
                    //6. Get the entry type
                    CanalEntry.EntryType entryType = entry.getEntryType();
                    //7. Check whether the entry type is ROWDATA
                    if (CanalEntry.EntryType.ROWDATA.equals(entryType)) {
                        //7.1 If so, get the serialized data
                        ByteString storeValue = entry.getStoreValue();
                        //8. Deserialize it
                        CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(storeValue);
                        //9. Get the event type
                        CanalEntry.EventType eventType = rowChange.getEventType();
                        //10. Get the actual row data
                        List<CanalEntry.RowData> rowDatasList = rowChange.getRowDatasList();
                        //11. Iterate over and print the rows
                        for (CanalEntry.RowData rowData : rowDatasList) {
                            //11.1 Get the before-image columns
                            List<CanalEntry.Column> beforeColumnsList = rowData.getBeforeColumnsList();
                            //11.2 Build a JSON object holding the before data
                            JSONObject beforeData = new JSONObject();
                            for (CanalEntry.Column column : beforeColumnsList) {
                                beforeData.put(column.getName(), column.getValue());
                            }
                            //11.3 Get the after-image columns
                            List<CanalEntry.Column> afterColumnsList = rowData.getAfterColumnsList();
                            //11.4 Build a JSON object holding the after data
                            JSONObject afterData = new JSONObject();
                            for (CanalEntry.Column column : afterColumnsList) {
                                afterData.put(column.getName(), column.getValue());
                            }
                            //12. Print
                            System.out.println("TableName: " + tableName + " ,EventType: " + eventType +
                                    " ,Before: " + beforeData + " ,After: " + afterData);
                        }
                    }
                }
            }
        }
    }
}
3.3.1 Insert data
insert into student values('1002','lisi',28,'fe');
After inserting data, we can see the new row in After while Before is empty, because the row did not exist before the insert.
3.3.2 Inserting multiple rows of data
insert into student values('1003','wangwu',29,'fe'),('1004','zhaoliu',38,'male'),('1005','zhuqi',8,'male');
3.3.3 Update data
update student set age=22 where id=1002;
An update produces data in both Before and After: Before holds the row before the modification, and After holds the row after it.
3.3.4 Delete data
delete from student where id=1003;
The row no longer exists after deletion, so Before contains data but After is empty.
3.4 Kafka mode test
1) In canal.properties, change Canal's output mode from the default tcp to Kafka
2) Modify the address of the Kafka cluster
3) Modify the topic and number of partitions output to Kafka in instance.properties.
Note: by default, output still goes to a single partition of the specified Kafka topic, because writing to multiple partitions in parallel could disrupt the ordering of binlog events. To increase parallelism, first set the topic's partition count to more than 1, and then set the canal.mq.partitionHash property.
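Putting steps 1)–3) together, the changes might look roughly like this (the broker addresses, topic name, and partition counts are examples; the key names follow Canal 1.1.x):

```properties
# canal.properties
canal.serverMode = kafka
canal.mq.servers = 192.168.2.200:9092,192.168.2.201:9092,192.168.2.202:9092

# conf/example/instance.properties
canal.mq.topic = canal_test
canal.mq.partitionsNum = 3
# hash rows to partitions by primary key so per-row order is preserved
canal.mq.partitionHash = canal.student:id
```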
4) Before starting Canal, start Kafka first; and before starting Kafka, start ZooKeeper:
bin/zkServer.sh start
bin/kafka-server-start.sh -daemon config/server.properties
5) When the CanalLauncher process appears, the startup succeeded, and the canal_test topic is created at the same time
6) Start a Kafka console consumer to check consumption
bin/kafka-console-consumer.sh --bootstrap-server 192.168.2.200:9092 --topic canal_test
7) Check the consumer console after inserting data into MySQL
insert into student values('1008','zhuba',29,'fe'),('1009','yangjiu',38,'male');
When multiple rows are inserted at once, Kafka receives them as a JSON array within a single message.
8) Update MySQL data and view the consumer console
update student set name='zhuba2' where id=1008;
After the update, the old field in the Kafka message holds the old values, and the data field holds the new values.
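For reference, a Kafka message for such an update has roughly the following flat-JSON shape (field set per Canal's FlatMessage; the values here are illustrative, not captured output):

```json
{
  "data": [{"id": "1008", "name": "zhuba2", "age": "29", "sex": "fe"}],
  "old": [{"name": "zhuba"}],
  "database": "canal",
  "table": "student",
  "type": "UPDATE",
  "es": 1589373515000,
  "ts": 1589373515477,
  "isDdl": false
}
```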
9) Delete MySQL data and view the consumption console
delete from student where id=1005;