Canal monitoring MySQL practice

Canal is middleware developed in Java that provides incremental data subscription & consumption based on parsing a database's incremental logs. At present, Canal mainly supports parsing MySQL binlogs; after parsing completes, a Canal client processes the resulting data. (Full database synchronization requires Alibaba's otter middleware, which is built on Canal.) Usage scenarios include:

1. Cache update

2. Asynchronously updating another database, or acting as an intermediary that synchronizes data into a relational database

Canal introduction and working principle

Services supported by log-based incremental subscription & consumption:

  1. Database mirroring
  2. Real-time database backup
  3. Multi-level indexing (separate sub-database indexes for sellers and buyers)
  4. Search index building
  5. Business cache refreshing
  6. Important business messages such as price changes

This article walks through monitoring important data-change messages such as business cache refreshes and price changes.

The Canal principle is relatively simple:


  1. Canal implements the MySQL slave interaction protocol, pretends to be a MySQL slave, and sends the dump protocol to the MySQL master
  2. The MySQL master receives the dump request and starts pushing binary logs to the slave (that is, Canal)
  3. Canal parses the binary log objects (originally a byte stream)


Canal architecture and working principle

  1. server represents one running Canal instance, corresponding to one JVM
  2. instance corresponds to one data queue (1 canal server corresponds to 1..n instances)
  3. Submodules under an instance:
     • eventParser: data source access; simulates the slave protocol to interact with the master; parses the protocol
     • eventSink: the linker between parser and store; filters, processes, and distributes data
     • eventStore: stores the data
     • metaManager: manages incremental subscription & consumption metadata
  • EventSink acts as a channel-like component that can filter, distribute/route (1:n), merge (n:1), and process data. EventSink is the bridge connecting EventParser and EventStore.
  • EventStore is implemented in memory mode; the in-memory structure is a ring queue, with three pointers (Put, Get, and Ack) marking where data is written and read.
  • MetaManager manages incremental subscription & consumption metadata. The protocol between incremental subscription and consumption consists of get/ack/rollback:
  • Message getWithoutAck(int batchSize): allows a batchSize to be specified so multiple entries can be fetched at once. Each call returns a Message containing a batch id (unique identifier) and entries (the actual data objects).
  • void rollback(long batchId): as the name suggests, rolls back the last get request so the data is fetched again. It is submitted with the batchId obtained from get, to prevent misoperation.
  • void ack(long batchId): confirms that consumption succeeded and notifies the server that the data can be deleted. It is submitted with the batchId obtained from get, to prevent misoperation.
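To make the Put/Get/Ack ring queue easier to picture, here is a deliberately simplified Java sketch (an illustration of the idea only, not Canal's actual EventStore code): Put marks where the parser writes, Get marks where the consumer reads, and Ack marks what has been confirmed; Get never passes Put, and Ack never passes Get.

// Simplified illustration of an EventStore-style ring queue with
// Put/Get/Ack pointers. A sketch of the concept, not Canal's real code.
public class RingQueueSketch {
    private final Object[] slots;
    private long put = 0;  // next write position (producer)
    private long get = 0;  // next read position (consumer)
    private long ack = 0;  // last confirmed position

    public RingQueueSketch(int size) {
        this.slots = new Object[size];
    }

    // Producer side: refuses to overwrite events that are not yet acked.
    public synchronized boolean put(Object event) {
        if (put - ack >= slots.length) {
            return false; // full of unacked data
        }
        slots[(int) (put % slots.length)] = event;
        put++;
        return true;
    }

    // Consumer side: read without confirming, like getWithoutAck.
    public synchronized Object get() {
        if (get >= put) {
            return null; // nothing new to read
        }
        return slots[(int) (get++ % slots.length)];
    }

    // Confirm everything read so far, like ack(batchId).
    public synchronized void ack() {
        ack = get;
    }

    // Rewind to the last confirmed position, like rollback().
    public synchronized void rollback() {
        get = ack;
    }
}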

Setting up Canal with Docker

First, pull the canal-server image from Docker Hub:

docker pull canal/canal-server:latest

Start Canal once, so its properties configuration files can be copied out:

docker run -p 11111:11111 --name canal -d canal/canal-server:latest

After the first start of the Canal image, copy the instance.properties file to the host so it can be mounted later:

docker cp canal:/home/admin/canal-server/conf/example/instance.properties  /mydata/canal/conf/

Edit instance.properties, which mainly configures the MySQL instance to be monitored:

#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false (GTID-based replication is not enabled here)
canal.instance.gtidon=false

# position info: if master/slave databases run on the same host, fill in the master's address
canal.instance.master.address=172.17.0.1:3306
# starting binlog file to read from; if left empty, listening starts from the latest binlog by default
canal.instance.master.journal.name=
# offset within the starting binlog file
canal.instance.master.position=
# timestamp of the starting binlog position
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

# standby database address, used during master/standby switchover
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=.*\\..*
# table black regex: tables that should not be monitored
canal.instance.filter.black.regex=mysql\\..*,sys\\..*,performance_schema\\..*,information_schema\\..*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config: default topic for parsed events
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,topic2:mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.enableDynamicQueuePartition=false
#canal.mq.partitionsNum=3
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#################################################

Canal provides the canal.instance.filter.regex and canal.instance.filter.black.regex parameters to filter which tables get parsed, similar to a whitelist and blacklist. Common examples:

  • All tables: .* or .*\\..*
  • All tables under the canal schema: canal\\..*
  • Tables under canal whose names start with canal: canal\\.canal.*
  • A single table under the canal schema: canal.test1
  • Multiple rules combined: canal\\..*,mysql.test1,mysql.test2 (comma-separated)
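The instance.properties above connects to MySQL as canal/canal. On the MySQL side this account needs replication privileges; a typical grant, following Canal's QuickStart (the user name, password, and host mask here are examples to adapt to your environment), looks like:

-- Create the account Canal connects with and grant it replication privileges.
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;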

Edit canal.properties, which mainly configures the Canal server:

#################################################
#########     destinations      #############
#################################################
## where multiple data instances are configured; for monitoring a single database, example is enough
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ: which delivery mode to use
canal.serverMode = tcp

The message-queue modes follow the same overall flow as the server-client mode; the main differences are:

  • CanalServerWithNetty is no longer needed; instead, CanalMQProducer delivers messages to the message queue
  • CanalClient is not used; instead, an MQ client fetches messages from the queue and consumes them

Compared with the server-client mode, this mode:

  • Decouples downstream consumers: thanks to message-queue features, it can support broadcast consumption by multiple clients, cluster consumption, repeated consumption, and so on

  • Increases system complexity and adds some latency


# host instance.properties : container instance.properties — mount the container's instance.properties onto the host so later changes are easy
docker stop canal; docker rm canal;   # remove the old container, then recreate it
docker run -p 11111:11111 --name canal -v /mydata/canal/conf/instance.properties:/home/admin/canal-server/conf/example/instance.properties -d canal/canal-server:latest
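To confirm the instance started and watch what it is doing, you can tail the example instance's log inside the container (the path below assumes the canal-server image's default layout; adjust it if yours differs):

docker logs canal   # server startup log
docker exec canal tail -f /home/admin/canal-server/logs/example/example.log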


Looking at the log of the consumption instance example, you can see that the binlog position Canal listens from is exactly the binlog position at connection time, provided no starting binlog position was specified. Once a client connects, it can consume the incremental binlog from that position onward. Note that the upstream MySQL must run with binlog-format=ROW (ROW mode).
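As a minimal sketch of the MySQL side (the values are examples; the ROW requirement itself comes from Canal's QuickStart), the server's my.cnf needs row-based binary logging enabled:

[mysqld]
log-bin=mysql-bin    # enable binary logging
binlog-format=ROW    # Canal's parsing is adapted to ROW format
server-id=1          # must be unique among the master, replicas, and Canal's slaveId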

Consuming the instance with a Java client

1. Add the Maven dependencies in pom.xml

            <!--canal-->
            <dependency>
                <groupId>com.alibaba.otter</groupId>
                <artifactId>canal.client</artifactId>
                <version>1.1.5</version>
            </dependency>

            <!-- Message, CanalEntry.Entry, etc. come from this package -->
            <dependency>
                <groupId>com.alibaba.otter</groupId>
                <artifactId>canal.protocol</artifactId>
                <version>1.1.5</version>
            </dependency>

2. Canal configuration in application.yml

canal:
  serverAddress: 42.192.183.193
  serverPort: 11111
  instance: # multiple instances can be listed
    - example

The corresponding properties class:

import java.util.Set;

import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;

@Component
@ConfigurationProperties(prefix = "canal")
@Data
public class CanalInstanceProperties {

    /**
     * canal server address
     */
    private String serverAddress;

    /**
     * canal server port
     */
    private Integer serverPort;

    /**
     * canal instances to listen to
     */
    private Set<String> instance;

}

3. Code that listens for database changes

import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

import javax.annotation.PostConstruct;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.concurrent.CustomizableThreadFactory;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class MysqlDataListening {

    private static final ThreadFactory springThreadFactory = new CustomizableThreadFactory("canal-pool-");

    private static final ExecutorService executors = Executors.newFixedThreadPool(1, springThreadFactory);

    @Autowired
    private CanalInstanceProperties canalInstanceProperties;

    @PostConstruct
    private void startListening() {
        canalInstanceProperties.getInstance().forEach(
            instanceName -> {
                executors.submit(() -> {
                    connector(instanceName);
                });
            }
        );
    }

    /**
     * Consumption loop for one canal instance, run on the thread pool.
     */
    public void connector(String instance) {
        CanalConnector canalConnector = CanalConnectors.newSingleConnector(
                new InetSocketAddress(canalInstanceProperties.getServerAddress(), canalInstanceProperties.getServerPort()),
                instance, "", "");
        canalConnector.connect();
        // subscribe to all schemas and tables
        canalConnector.subscribe(".*\\..*");
        // canalConnector.subscribe("test1.*"); // subscribe only to tables under the test1 database
        // roll back to the last confirmed position
        canalConnector.rollback();

        for (;;) {
            // fetch up to the given number of entries without acking them; the next
            // fetch will return them again. Note: this does not block; if fewer than
            // 100 entries are available, it returns however many there are.
            Message message = canalConnector.getWithoutAck(100);
            // batch id of this message
            long batchId = message.getId();
            int size = message.getEntries().size();
            if (size == 0 || batchId == -1) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {
                }
            }
            if (batchId != -1) {
                log.info("instance -> {}, msgId -> {}", instance, batchId);
                printEnity(message.getEntries());
                // confirm consumption
                canalConnector.ack(batchId);
                // on processing failure, roll the batch back instead:
                // canalConnector.rollback(batchId);
            }
        }
    }

    private void printEnity(List<CanalEntry.Entry> entries) {
        for (CanalEntry.Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN
                    || entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND) {
                continue;
            }

            CanalEntry.RowChange rowChange = null;
            try {
                // deserialize the row-change data
                rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
            } catch (InvalidProtocolBufferException e) {
                e.printStackTrace();
            }
            assert rowChange != null;
            CanalEntry.EventType eventType = rowChange.getEventType();
            log.info(String.format("================> binlog[%s:%s] , name[%s,%s] , eventType : %s",
                    entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                    entry.getHeader().getSchemaName(), entry.getHeader().getTableName(),
                    eventType));

            if (rowChange.getEventType() == CanalEntry.EventType.QUERY || rowChange.getIsDdl()) {
                log.info("sql ------------>{}", rowChange.getSql());
            }

            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                switch (rowChange.getEventType()) {
                    // add more cases here to handle other event types
                    case UPDATE:
                        printColumn(rowData.getAfterColumnsList());
                        printColumn(rowData.getBeforeColumnsList());
                        break;
                    case INSERT:
                        printColumn(rowData.getAfterColumnsList());
                        break;
                    case DELETE:
                        printColumn(rowData.getBeforeColumnsList());
                        break;
                    default:
                }
            }

        }
    }

    private void printColumn(List<CanalEntry.Column> columns) {
        StringBuilder sb = new StringBuilder();
        for (CanalEntry.Column column : columns) {
            sb.append("[");
            sb.append(column.getName()).append(" : ").append(column.getValue()).append("    update=").append(column.getUpdated());
            sb.append("]");
            sb.append("    ");
        }
        log.info(sb.toString());
    }
}

Effect of database changes

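As an illustration only (not captured output), given the log formats in the code above, a hypothetical statement like UPDATE test1.user SET name = 'bob' WHERE id = 1 would produce log lines along these lines:

instance -> example, msgId -> 123
================> binlog[mysql-bin.000001:2345] , name[test1,user] , eventType : UPDATE
[id : 1    update=false]    [name : bob    update=true]
[id : 1    update=false]    [name : alice    update=false]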

Points to note about the canal client: to guarantee ordering, one instance can only be operated on by one canal client at a time for get/ack/rollback; otherwise ordered delivery to clients cannot be guaranteed. An instance on a canal server can be consumed by only one client. The clientId is fixed, and the binlog consumption position is persisted to a file.

Since ordering is guaranteed, production can outpace consumption; how do we deal with the resulting consumption backlog?

Furthermore, when synchronizing with Canal's built-in client, you have to call get() or getWithoutAck() manually to pull the logs, and after pulling they can only be processed one entry at a time, which is fairly inefficient. To address these problems, we plan to introduce an MQ into the log synchronization pipeline as an intermediate stage. Canal supports RocketMQ and Kafka; we ultimately chose Kafka.
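As a sketch of that direction (the key names follow canal 1.1.x defaults; verify them against the canal.properties shipped with your version), switching delivery to Kafka mainly means changing the server mode and pointing Canal at the brokers:

# canal.properties: deliver parsed binlog events to Kafka instead of TCP clients
canal.serverMode = kafka
kafka.bootstrap.servers = 127.0.0.1:9092

# instance.properties: topic/partition routing (canal.mq.topic=example was already set above)
canal.mq.topic=example
canal.mq.partition=0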

Summary

Canal works by using the MySQL master-slave replication protocol to simulate a slave pulling incremental binlog data from the database. Canal treats an instance as a slave-like database instance; after a client connects to the instance, it consumes the incremental binlog in order. A few points deserve special attention. First, Canal's production/consumption model is a ring queue with pointers marking the put, get, and ack positions, which together control the production and consumption queues. Second, the binlog must be configured in ROW format; Canal's parsing is adapted to ROW. Third, Canal ensures that only one client consumes an instance (through client competition) to preserve binlog ordering. Fourth, when the producing side generates a large volume of data, Canal may fall behind, introducing some delay. In performance analysis, fetching business binlog data from the canal client can reach roughly 100k-200k TPS; concrete business processing will certainly be slower, but for ordinary workloads this is sufficient.

References

github.com/luozijing/s… — code repository

blog.csdn.net/gudejundd/a… — cache deletion solution

zhuanlan.zhihu.com/p/345736518… — detailed explanation of Canal

github.com/alibaba/can… — Canal documentation

github.com/alibaba/can… — Canal performance
