Java Development - Basic Usage of Canal

Foreword

Today's topic is the basic usage of Canal. In Java we often use Canal to synchronize data — though not between MySQL and MySQL, or Redis and Redis. Those cases are easy: master-slave, master-master, cascading, and similar replication setups can be completed directly through configuration. So why use Canal? Mainly to synchronize data from MySQL to Redis, or from MySQL to Elasticsearch. The essential point is to reduce coupling in the code during synchronization; otherwise, we would have to write the same data into several different stores directly in application code.

Meet Canal

What is Canal

canal [kə'næl], meaning waterway/pipe/ditch, mainly provides incremental data subscription and consumption based on parsing MySQL's incremental log (binlog).

The following picture illustrates what Canal is for; let's take a look at it together:

Looking at this picture, we should thank the developers for their efforts in providing such a good tool. At present, many companies use this approach for data synchronization: through Canal, changes can be delivered to MySQL and Elasticsearch respectively.

Services based on incremental log subscription and consumption include:

  • database mirroring
  • Database real-time backup
  • Index construction and real-time maintenance (split heterogeneous index, inverted index, etc.)
  • Business cache refresh
  • Incremental data processing with business logic

Canal currently supports source MySQL versions 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x.

Fundamentals

Canal's implementation relies on the principle of MySQL master-slave replication, which breaks down as follows:

  • The MySQL master writes data changes to the binary log (binlog; the records in it are called binary log events and can be viewed with show binlog events)
  • The MySQL slave copies the master's binary log events into its relay log
  • The MySQL slave replays the events in the relay log, applying the data changes to its own data

In other words, Canal disguises itself as a MySQL slave: like any other slave, it sends a dump request to the master. After receiving the dump request, the MySQL master starts pushing the binary log to the slave (that is, Canal), and Canal parses the binary log (originally a byte stream) into objects.
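The three replication steps above can be sketched as a toy simulation with plain Java collections. This is purely illustrative: real binlog events are a binary protocol, and the string "events" and names below are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of binlog replication: master log -> relay log -> replay.
public class ReplicationSketch {

    // Step 3: replay relay-log events against the slave's own data
    static Map<Integer, String> replay(List<String> relayLog) {
        Map<Integer, String> slaveData = new HashMap<>();
        for (String event : relayLog) {
            // an "event" here looks like "INSERT user id=1 name=Tom"
            String[] parts = event.split(" ");
            int id = Integer.parseInt(parts[2].substring(3)); // after "id="
            String name = parts[3].substring(5);              // after "name="
            slaveData.put(id, name); // INSERT and UPDATE both become a put in this toy model
        }
        return slaveData;
    }

    public static void main(String[] args) {
        // Step 1: the master records changes as binlog events
        List<String> masterBinlog = new ArrayList<>();
        masterBinlog.add("INSERT user id=1 name=Tom");
        masterBinlog.add("UPDATE user id=1 name=Jerry");

        // Step 2: the slave (or Canal) copies the events into its relay log
        List<String> relayLog = new ArrayList<>(masterBinlog);

        // Step 3: after replay, the slave converges to the master's latest state
        System.out.println(replay(relayLog).get(1)); // prints "Jerry"
    }
}
```

Canal stops at step 2: instead of replaying events into a database of its own, it hands the parsed events to your client code.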

Canal preparation

If you are new to Canal, click the link below to download it:

Releases · alibaba/canal · GitHub

Don't use a version that is too new; we will use version 1.1.4:

After the download is complete, put it in a path without non-ASCII characters. Rename the folder to canal; there are four folders under it:

 

MySQL configuration

Here, we don't need to configure MySQL master-slave replication itself. If you want to know more about that, you may wish to read this blog:

Java development - MySQL master-slave replication first experience

It covers the master-slave configuration, along with some hands-on experience with it.

Here, we only need to start a single MySQL service and set up a user and password for the connection. The overall steps are similar to configuring MySQL master-slave replication, because essentially we are configuring Canal as a MySQL slave.

Is the MySQL service running? Then log in to it; let's first create and authorize a user.

Create user:

CREATE USER 'canal'@'%' IDENTIFIED WITH 'mysql_native_password' BY '123456';

One of the changes from MySQL 5.x to 8.0 is the default authentication plugin: 8.0 uses caching_sha2_password, while 5.x uses mysql_native_password. This was mentioned in the master-slave replication blog linked above. For the canal user, we create the password with the mysql_native_password method here.

Remote authorization: 

GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%'  WITH GRANT OPTION;

Refresh permissions:

FLUSH PRIVILEGES;

Next, modify the my.cnf file, which lives under the MySQL installation path. In most cases the file doesn't exist yet, so we can simply create one in the etc directory and use it. If you're unsure, run the following command to see where MySQL looks for my.cnf by default:

 mysql --help | grep 'my.cnf'

 

So the default path is /usr/local/Cellar/mysql/version number/. There is no etc directory there by default; create it manually, don't be afraid. Then enter the etc directory and run:

vim my.cnf

and enter:

[mysqld]
# enable binlog
log-bin=mysql-bin
# use ROW mode
binlog-format=ROW
# must not duplicate Canal's slaveId
server_id=1

Save and exit, then restart MySQL.

Check whether MySQL's binlog is enabled:

show variables like 'log_bin';

 

Turned on.

Check binlog_format:

show variables like "%binlog_format%";

 

ROW is displayed, which means our setting has taken effect.

Check server_id:

show variables like "%server_id%";

 

The value 1 we set has taken effect.

View the binlog file currently being written:

show master status;

 

We mainly care about two values here: File and Position. Remember, stop at this point and don't touch anything in the database; otherwise these two values will change, which would affect our Canal configuration. We will need both of them when configuring Canal later.

Er... actually, these two parameters don't strictly need to be set. If they are not set, synchronization simply starts from the latest position. The blogger has already tried it, and there is no problem.
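As an aside, the File/Position pair identifies one exact point in the binlog stream: binlog files are numbered sequentially (mysql-bin.000001, mysql-bin.000002, ...) and Position is a byte offset within the current file. The little helper class below is hypothetical, not part of Canal's API; it just makes the ordering of two such checkpoints concrete.

```java
// Hypothetical checkpoint helper: orders two binlog positions,
// first by file sequence number, then by byte offset within the file.
public class BinlogPosition implements Comparable<BinlogPosition> {
    final String file;  // e.g. "mysql-bin.000001"
    final long offset;  // e.g. 157

    BinlogPosition(String file, long offset) {
        this.file = file;
        this.offset = offset;
    }

    // Numeric suffix of the binlog file name, e.g. 1 for "mysql-bin.000001"
    static int fileSeq(String file) {
        return Integer.parseInt(file.substring(file.lastIndexOf('.') + 1));
    }

    @Override
    public int compareTo(BinlogPosition other) {
        int byFile = Integer.compare(fileSeq(this.file), fileSeq(other.file));
        return byFile != 0 ? byFile : Long.compare(this.offset, other.offset);
    }

    public static void main(String[] args) {
        BinlogPosition a = new BinlogPosition("mysql-bin.000001", 157);
        BinlogPosition b = new BinlogPosition("mysql-bin.000002", 4);
        // a later file always means a later point, regardless of offset
        System.out.println(a.compareTo(b) < 0); // prints "true"
    }
}
```

This is also why leaving the position unset is safe: Canal then just starts from whatever the newest checkpoint happens to be.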

Canal configuration

Open the canal folder we just downloaded, then open the file conf/example/instance.properties:

#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=mysql-bin.000001
canal.instance.master.position=157
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=123456
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
canal.instance.filter.black.regex=
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################

There are only a few core parameters we need to change, as follows:

canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=mysql-bin.000001
canal.instance.master.position=157

canal.instance.dbUsername=canal
canal.instance.dbPassword=123456

The others don't need to be changed for now; we'll come back to them in a real application later. You can probably tell what these parameters mean without the blogger explaining, right? Save the file.
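One parameter worth a closer look is canal.instance.filter.regex, which is matched against schema.table names using Perl-style regular expressions. The sketch below only mirrors that matching semantics with java.util.regex so you can sanity-check a filter value; Canal's real filter implementation is its own code, and the class name here is made up.

```java
import java.util.regex.Pattern;

// Quick sanity check of what a canal.instance.filter.regex value would match.
public class FilterRegexCheck {
    static boolean matches(String filterRegex, String schemaDotTable) {
        // In the properties file the value reads .*\\..* ; as a Java string
        // literal, ".*\\..*" likewise denotes the regex .*\..*
        return Pattern.matches(filterRegex, schemaDotTable);
    }

    public static void main(String[] args) {
        System.out.println(matches(".*\\..*", "canal.user"));    // true: all tables
        System.out.println(matches("canal\\..*", "canal.user")); // true: only the canal schema
        System.out.println(matches("canal\\..*", "test.order")); // false: other schemas filtered out
    }
}
```

The default .*\\..* therefore subscribes to every table in every database, which is also what the listener code later in this post subscribes to.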

Now let's start Canal. Starting Canal is very simple: open a command-line tool, drag the bin/startup.sh file into it, and press Enter. The exact method doesn't matter:

 

The command line outputs a large amount of content, but we don't yet know whether Canal started successfully. Let's check:

 

You can see the process ID of CanalLauncher via jps, so it seems there should be no problem.

Simple Canal listening test

Let's create the simplest possible Spring Boot project; that process won't be repeated here.

First, we introduce the dependency:

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.4</version>
</dependency>

The version number should match the Canal version we are using.

Add configuration:

canal:
  serverAddress: 127.0.0.1
  serverPort: 11111
  instance:
    - example

We use the Spring bean lifecycle method afterPropertiesSet() in the CanalClient class. Remember, this is just for monitoring, not for a real project, so don't copy it as-is. (Note that the sample below hardcodes the server address and instance name rather than reading the configuration above.) The point here is simply to let you see the effect of Canal's monitoring:

package com.codingfire.canal.Client;

import com.alibaba.fastjson.JSONObject;
import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.ByteString;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.stereotype.Component;

import java.net.InetSocketAddress;
import java.util.List;

@Component
public class CanalClient implements InitializingBean {
    private final static int BATCH_SIZE = 1000;

    @Override
    public void afterPropertiesSet() throws Exception {
        // Create the connector
        CanalConnector connector = CanalConnectors.newSingleConnector(new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            // Open the connection
            connector.connect();
            // Subscribe to all tables in all databases
            connector.subscribe(".*\\..*");
            // Roll back to the last un-acked position, so the next fetch starts from there
            connector.rollback();
            while (true) {
                // Fetch up to BATCH_SIZE entries without acknowledging them
                Message message = connector.getWithoutAck(BATCH_SIZE);
                System.out.println(message.getEntries().size());
                // Batch ID
                long batchId = message.getId();
                // Number of entries in this batch
                int size = message.getEntries().size();
                // If there is no data
                if (batchId == -1 || size == 0) {
                    try {
                        // Sleep for 2 seconds
                        Thread.sleep(2000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                } else {
                    System.out.println("----------------");
                    // There is data: iterate over the entries and parse them one by one
                    for (CanalEntry.Entry entry : message.getEntries()) {
                        // Table name
                        String tableName = entry.getHeader().getTableName();
                        // Entry type
                        CanalEntry.EntryType entryType = entry.getEntryType();
                        // Serialized payload
                        ByteString storeValue = entry.getStoreValue();
                        // Only handle ROWDATA entries
                        if (CanalEntry.EntryType.ROWDATA.equals(entryType)){
                            // Deserialize
                            CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(storeValue);
                            // Operation type of this event (INSERT/UPDATE/DELETE ...)
                            CanalEntry.EventType eventType = rowChange.getEventType();
                            // Row data set
                            List<CanalEntry.RowData> rowDatasList = rowChange.getRowDatasList();
                            // Iterate over the rows
                            for (CanalEntry.RowData rowData : rowDatasList) {
                                // Row image before the change
                                JSONObject jsonObjectBefore = new JSONObject();
                                List<CanalEntry.Column> beforeColumnsList = rowData.getBeforeColumnsList();
                                for (CanalEntry.Column column : beforeColumnsList) {
                                    jsonObjectBefore.put(column.getName(),column.getValue());
                                }
                                // Row image after the change
                                JSONObject jsonObjectAfter = new JSONObject();
                                List<CanalEntry.Column> afterColumnsList = rowData.getAfterColumnsList();
                                for (CanalEntry.Column column : afterColumnsList) {
                                    jsonObjectAfter.put(column.getName(),column.getValue());
                                }
                                System.out.println("Table:"+tableName+",EventType:"+eventType+",Before:"+jsonObjectBefore+",After:"+jsonObjectAfter);
                            }
                        }else {
                            System.out.println("Current entry type: "+entryType);
                        }
                    }
                }
                // Acknowledge the batch; all messages with ID <= batchId are confirmed
                connector.ack(batchId);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            connector.disconnect();
        }
    }
}

Now comes the most exciting moment: run our Spring Boot project:

 

Seeing this means the startup succeeded. Next, we connect to the database:

mysql -uroot -p123456

It doesn't matter which user you connect as. If there is no database yet, create a new one; if you already have one, you can operate on its tables directly. The blogger currently has a canal database, so we'll use it:

use canal;

The blogger has a user table in it; let's operate on that table:

insert into user values(null, '小明', '123456', 20, '13812345678');

Now check whether the console has detected the database change:

  

You can see that the console printed out the change produced by the SQL we just ran; the test is successful.

Note: this is just monitoring, not a real usage scenario. It only lets everyone see intuitively how SQL changes are captured. In practical applications, we would use Canal together with MQ, but that is not covered in this article.
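As a taste of what real processing looks like, a sync job usually forwards only the columns that actually changed (to Redis, ES, or an MQ message). Here is a minimal sketch with plain maps, independent of Canal's types; the method name changedColumns is made up, and the maps stand in for the before/after column images seen in the listener above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch: given the before/after column images of one row, keep only the
// columns whose values changed. (Columns deleted from the row are ignored.)
public class RowDiff {
    static Map<String, String> changedColumns(Map<String, String> before, Map<String, String> after) {
        Map<String, String> changed = new HashMap<>();
        for (Map.Entry<String, String> e : after.entrySet()) {
            if (!Objects.equals(before.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of("id", "1", "name", "小明", "age", "20");
        Map<String, String> after  = Map.of("id", "1", "name", "小明", "age", "21");
        System.out.println(changedColumns(before, after)); // only the age column survives
    }
}
```

For an INSERT event the before image is empty, so every column counts as changed, which is exactly what you would want to write downstream.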

Epilogue

This blog only covers Canal's basic configuration and its monitoring mechanism, aiming to help you understand how Canal works. In the next blog we will use MQ to synchronize data, so don't worry; take it slowly, step by step, and learn the basics solidly first. Canal's configuration is very similar to MySQL master-slave replication and is relatively simple, mostly configuration items, so be extra careful and make no mistakes; otherwise a single wrong parameter will keep the system from running normally. Okay, see you next time.


Origin blog.csdn.net/CodingFire/article/details/131420343