This article gives you a quick introduction to Canal; this one read is all you need!


Preface

When building a real-time data warehouse, the data usually lives in a database such as MySQL. Whenever a row is added or modified, that change needs to be synchronized immediately to Kafka or to other databases. This is where Alibaba's open-source Canal comes in.

1. What is Canal

Let's look at the description on the official website:

canal [kə'næl], translated as waterway/pipe/ditch. Its main purpose is incremental log parsing based on the MySQL database, providing incremental data subscription and consumption.

Based on the official description, Canal mainly does incremental data synchronization from MySQL, for example syncing data in real time to Kafka, HBase, ES, and so on. You can think of it as a data synchronization tool.

2. What Canal Can Do

  • Database mirroring
  • Real-time database backup
  • Index construction and real-time maintenance (split heterogeneous index, inverted index, etc.)
  • Business cache refresh
  • Incremental data processing with business logic

Note: the MySQL versions currently supported by Canal are 5.1.x, 5.5.x, 5.6.x, 5.7.x, and 8.0.x.

3. How Canal Works

How the MySQL slave works

  • The MySQL master writes data changes to its binary log (the records are called binary log events and can be viewed with show binlog events)
  • The MySQL slave copies the master's binary log events to its relay log
  • The MySQL slave replays the events in the relay log, applying the data changes to its own data
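The three replication steps above can be sketched as a toy in-memory model. This is purely illustrative (class and event names are my own invention); real replication streams binary events over a network connection:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class ReplicationSketch {
    // Copies the master's binlog events into a relay log, then replays them
    // into the slave's applied state, mirroring steps 2 and 3 above.
    public static List<String> replicate(List<String> masterBinlog) {
        Queue<String> relayLog = new ArrayDeque<>(masterBinlog); // step 2: copy to relay log
        List<String> slaveData = new ArrayList<>();
        while (!relayLog.isEmpty()) {                            // step 3: replay events in order
            slaveData.add(relayLog.poll());
        }
        return slaveData;
    }

    public static void main(String[] args) {
        // step 1: the master writes changes to its binary log
        List<String> binlog = List.of("INSERT id=1", "UPDATE id=1 SET name='a'");
        System.out.println(replicate(binlog)); // slave state matches the master's event order
    }
}
```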

How canal works

  • canal simulates the MySQL slave interaction protocol: it pretends to be a MySQL slave and sends a dump request to the MySQL master
  • the MySQL master receives the dump request and starts pushing binary logs to the slave (i.e. canal)
  • canal parses the binary log objects (raw byte streams)

4. Deploying Canal

4.1 Install MySQL

I have written about deploying MySQL before, so I won't repeat it here. If MySQL is not installed on your machine, see this article —> https://blog.csdn.net/qq_43791724/article/details/108196454

Enable the MySQL binary log

Once MySQL is installed successfully, there will be a my.cnf file to which the following needs to be added:

[mysqld]
log-bin=/var/lib/mysql/mysql-bin # enable binlog
binlog-format=ROW # use ROW mode
server_id=1 # required for MySQL replication; must not clash with canal's slaveId
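As an aside, the required settings can be sanity-checked programmatically. Below is a minimal, illustrative parser for a my.cnf fragment (the MyCnfCheck class and its line-based parsing are my own sketch, not part of Canal or MySQL):

```java
import java.util.HashMap;
import java.util.Map;

public class MyCnfCheck {
    // Parses "key=value" lines from a my.cnf fragment,
    // ignoring [section] headers and trailing # comments.
    public static Map<String, String> parse(String cnf) {
        Map<String, String> settings = new HashMap<>();
        for (String line : cnf.split("\n")) {
            line = line.split("#")[0].trim();                     // drop trailing comments
            if (line.isEmpty() || line.startsWith("[")) continue; // skip sections/blanks
            int eq = line.indexOf('=');
            if (eq > 0) {
                settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return settings;
    }

    public static void main(String[] args) {
        String cnf = "[mysqld]\n"
                + "log-bin=/var/lib/mysql/mysql-bin # enable binlog\n"
                + "binlog-format=ROW\n"
                + "server_id=1\n";
        Map<String, String> s = parse(cnf);
        // Canal needs ROW format so each event carries the full row data
        System.out.println("ROW".equals(s.get("binlog-format"))); // prints true
    }
}
```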

Note: once the binary log is enabled, mysql-bin.* files will be created in the directory configured by log-bin. Whenever data in our database changes, a record is appended to the mysql-bin.* files.

4.2 Install Canal

Download the required version from the official releases page https://github.com/alibaba/canal/releases. The version used here is 1.0.24.

  1. Upload the downloaded gz package to the target directory
  2. Create a folder
mkdir canal
  3. Unpack the gz package
tar -zxvf canal.deployer-1.0.24.tar.gz  -C ../servers/canal/
  4. Configure canal.properties

The first four configuration items in the common section:

canal.id= 1
canal.ip=
canal.port= 11111
canal.zkServers=
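These four values are ordinary Java properties; the sketch below (the class name and the "<local host>" placeholder are my own illustration) shows how a blank canal.ip and canal.port fall back to their defaults, the local machine and 11111:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CanalCommonProps {
    // Resolves the listen address from the common properties, applying the
    // documented defaults when a value is left blank.
    public static String describe(String propsText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for a StringReader
        }
        String ip = p.getProperty("canal.ip", "").trim();
        String port = p.getProperty("canal.port", "11111").trim();
        if (ip.isEmpty()) ip = "<local host>";   // blank ip -> bind the local machine
        if (port.isEmpty()) port = "11111";      // blank port -> default 11111
        return ip + ":" + port;
    }

    public static void main(String[] args) {
        String conf = "canal.id=1\ncanal.ip=\ncanal.port=11111\ncanal.zkServers=\n";
        System.out.println(describe(conf)); // <local host>:11111
    }
}
```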

canal.id is the number of this canal instance; in a cluster, each canal must have a different id, and note that it must also differ from MySQL's server_id. ip is left unspecified here and defaults to the local machine, e.g. 192.168.100.201 above; the port number is 11111. zkServers is used for canal clusters.

  5. Look at the destinations-related configuration in canal.properties:

#################################################
#########       destinations        ############# 
#################################################
canal.destinations = example
canal.conf.dir = ../conf
canal.auto.scan = true
canal.auto.scan.interval = 5
canal.instance.global.mode = spring 
canal.instance.global.lazy = false
canal.instance.global.spring.xml = classpath:spring/file-instance.xml

canal.destinations = example here can be set to multiple destinations, e.g. example1,example2; in that case you must create the two corresponding folders, each containing an instance.properties file.

  6. Global canal instances are managed with Spring; the file-instance.xml here ultimately instantiates all destination instances:
<!-- properties -->
<bean class="com.alibaba.otter.canal.instance.spring.support.PropertyPlaceholderConfigurer" lazy-init="false">
 <property name="ignoreResourceNotFound" value="true" />
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/><!-- allow system properties to override -->
    <property name="locationNames">
     <list>
         <value>classpath:canal.properties</value>
         <value>classpath:${canal.instance.destination:}/instance.properties</value>
         </list>
    </property>
</bean>

<bean id="socketAddressEditor" class="com.alibaba.otter.canal.instance.spring.support.SocketAddressEditor" />
<bean class="org.springframework.beans.factory.config.CustomEditorConfigurer">
   <property name="propertyEditorRegistrars">
    <list>
      <ref bean="socketAddressEditor" />
       </list>
   </property>
</bean>
<bean id="instance" class="com.alibaba.otter.canal.instance.spring.CanalInstanceWithSpring">
 <property name="destination" value="${canal.instance.destination}" />
    <property name="eventParser">
     <ref local="eventParser" />
    </property>
    <property name="eventSink">
        <ref local="eventSink" />
    </property>
    <property name="eventStore">
        <ref local="eventStore" />
    </property>
    <property name="metaManager">
        <ref local="metaManager" />
    </property>
    <property name="alarmHandler">
        <ref local="alarmHandler" />
    </property>
</bean>
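The folder convention behind canal.destinations can be sketched as follows (the DestinationLayout class is my own hypothetical illustration, not Canal code):

```java
import java.util.ArrayList;
import java.util.List;

public class DestinationLayout {
    // For each destination listed in canal.destinations, canal expects a
    // folder of the same name containing an instance.properties file.
    public static List<String> expectedConfigs(String destinations) {
        List<String> paths = new ArrayList<>();
        for (String dest : destinations.split(",")) {
            dest = dest.trim();
            if (!dest.isEmpty()) {
                paths.add(dest + "/instance.properties");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        System.out.println(expectedConfigs("example1,example2"));
        // [example1/instance.properties, example2/instance.properties]
    }
}
```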

For example, if canal.instance.destination equals example, the example/instance.properties configuration file is loaded.

  7. Modify the instance configuration file

## mysql serverId; this slaveId must not equal any server_id already used in the MySQL cluster
canal.instance.mysql.slaveId = 1234

# change the following to your own database information as needed
#################################################
...
canal.instance.master.address=192.168.1.120:3306
# username/password of the database
...
canal.instance.dbUsername = root
canal.instance.dbPassword = 123456
#################################################
  8. Start
sh bin/startup.sh
  9. Stop
sh bin/stop.sh
  10. Check the service status with jps
[root@node01 ~]# jps
2133 CanalLauncher
4184 Jps

At this point our service is configured. Next we can write a Java client to test it.

5. Writing a Canal Client in Java

5.1 Import dependencies

 <dependencies>
        <dependency>
            <groupId>com.alibaba.otter</groupId>
            <artifactId>canal.client</artifactId>
            <version>1.0.24</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.58</version>
        </dependency>
    </dependencies>

5.2 Write a test class

package com.canal.Test;

/**
 * @author 大数据老哥
 * @version V1.0
 * @Package com.canal.Test
 * @File :CanalTest.java
 * @date 2021/1/11 21:54
 */


import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;
import java.net.InetSocketAddress;
import java.util.List;

/**
 * Tests whether the canal configuration works
 */

public class CanalTest {

    public static void main(String[] args) {
        // 1. Create the connector
        CanalConnector connect = CanalConnectors.newSingleConnector(
                new InetSocketAddress("192.168.100.201", 11111),
                "example", "", "");
        // Number of entries to read in one batch
        int batchSize = 1000;
        // Running flag
        boolean running = true;
        while (running) {
            // 2. Establish the connection
            connect.connect();
            // Roll back any un-acked batch from a previous run to avoid losing data
            connect.rollback();
            // Subscribe to the matching binlogs
            connect.subscribe();
            while (running) {
                Message message = connect.getWithoutAck(batchSize);
                // Get the batchId
                long batchId = message.getId();
                // Number of binlog entries in this batch
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    // No data yet; back off briefly instead of busy-looping
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                } else {
                    printSummary(message);
                }
                // Confirm that this batchId has been consumed successfully
                connect.ack(batchId);
            }
        }
    }

    private static void printSummary(Message message) {
        // Iterate over every binlog entry in the batch
        for (CanalEntry.Entry entry : message.getEntries()) {
            // Skip transaction begin/end markers
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }

            // Binlog file name
            String logfileName = entry.getHeader().getLogfileName();
            // Offset within the logfile
            long logfileOffset = entry.getHeader().getLogfileOffset();
            // Timestamp at which the SQL statement executed
            long executeTime = entry.getHeader().getExecuteTime();
            // Database name
            String schemaName = entry.getHeader().getSchemaName();
            // Table name
            String tableName = entry.getHeader().getTableName();
            // Event type: insert/update/delete
            String eventTypeName = entry.getHeader().getEventType().toString().toLowerCase();

            System.out.println("logfileName" + ":" + logfileName);
            System.out.println("logfileOffset" + ":" + logfileOffset);
            System.out.println("executeTime" + ":" + executeTime);
            System.out.println("schemaName" + ":" + schemaName);
            System.out.println("tableName" + ":" + tableName);
            System.out.println("eventTypeName" + ":" + eventTypeName);

            CanalEntry.RowChange rowChange = null;
            try {
                // Parse the stored binary bytes into a RowChange entity
                rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            } catch (InvalidProtocolBufferException e) {
                e.printStackTrace();
                continue; // skip entries that fail to parse
            }

            // Iterate over each changed row
            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                // Delete event: print the row as it was before deletion
                if (entry.getHeader().getEventType() == CanalEntry.EventType.DELETE) {
                    System.out.println("---delete---");
                    printColumnList(rowData.getBeforeColumnsList());
                    System.out.println("---");
                }
                // Update event: print the before and after images
                else if (entry.getHeader().getEventType() == CanalEntry.EventType.UPDATE) {
                    System.out.println("---update---");
                    printColumnList(rowData.getBeforeColumnsList());
                    System.out.println("---");
                    printColumnList(rowData.getAfterColumnsList());
                }
                // Insert event: print the newly inserted row
                else if (entry.getHeader().getEventType() == CanalEntry.EventType.INSERT) {
                    System.out.println("---insert---");
                    printColumnList(rowData.getAfterColumnsList());
                    System.out.println("---");
                }
            }
        }
    }

    // Print every column name and value
    private static void printColumnList(List<CanalEntry.Column> columnList) {
        for (CanalEntry.Column column : columnList) {
            System.out.println(column.getName() + "\t" + column.getValue());
        }
    }
}

5.3 Run the test

Modify any row in the database and check whether the Canal client consumes the change.

Summary

Today I introduced Canal, whose main function is incremental data synchronization; later articles will use Canal to build a real-time data warehouse. I have also collected some big-data resources; anyone who needs them can download them from the GitHub link below. Believe in yourself: effort and sweat always pay off. I'm 大数据老哥, see you next time~

Resources: Flink, Spark, Hive, Hadoop, and Docker interview questions, essential developer software, résumé templates, and more can be downloaded from GitHub https://github.com/lhh2002/Framework-Of-BigData or Gitee https://gitee.com/li_hey_hey/dashboard/projects



Origin blog.51cto.com/14417862/2589241