Building Local Canal Middleware for Data Migration - Inspiration from Cache Breakdown



Let's start with cache breakdown

Cache breakdown occurs when hot data is missing from the cache, so a large number of user requests hit the database directly. This is a dangerous situation, and we should design to avoid it during development.

There are 3 solutions that we often use at present:

  • Cache expiration strategy: use a read-write separation architecture where reads always come from the cache, and Canal keeps the DB and the cache in sync

  • Hotspot caching strategy: after identifying hotspot data, route requests for it to a dedicated area

  • Mutex lock: shortly before the data expires, place a mutex on the key. When other requests arrive and find the key locked, they know a request has already gone to the backend to fetch the value before expiry, so they can safely keep reading from the cache. The request that fetches from the database refreshes the cache asynchronously and releases the mutex once the refresh completes
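As a rough illustration of the mutex-lock idea, the sketch below (plain Java; `loadFromDb` is a hypothetical stand-in for the real database query, and `nearExpiry` stands in for a TTL check) lets exactly one caller rebuild an expiring key while all other callers keep serving the cached value:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class MutexRefreshCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // one "mutex" per key: putIfAbsent succeeds for exactly one caller
    private final Map<String, Boolean> locks = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        cache.put(key, value);
    }

    // nearExpiry would normally be derived from the key's TTL;
    // it is passed in here only to keep the sketch small
    public String get(String key, boolean nearExpiry) {
        if (nearExpiry && locks.putIfAbsent(key, Boolean.TRUE) == null) {
            // this caller won the mutex: refresh asynchronously, release when done
            CompletableFuture.runAsync(() -> {
                try {
                    cache.put(key, loadFromDb(key));
                } finally {
                    locks.remove(key);
                }
            });
        }
        // everyone else (and this caller too) keeps reading the cached value
        return cache.get(key);
    }

    // hypothetical stand-in for the real database query
    private String loadFromDb(String key) {
        return "fresh-" + key;
    }
}
```

The point of the per-key lock is that at most one request per key ever reaches the database, no matter how many concurrent readers there are.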

Guava's solution:

  • Guava Cache performs concurrency control during load: when multiple threads request a missing or expired cache item, only one thread enters the load method while the others wait until the item is generated. This prevents a flood of threads from passing through the cache straight to the DB.
  • With refreshAfterWrite, once the refresh policy is configured, the corresponding cache item is refreshed at a fixed interval, avoiding thread blocking and keeping the item up to date. The precondition is that the item has already been generated; in production we can warm the cache and generate items in advance to avoid thread pile-ups during traffic peaks.
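To make the refreshAfterWrite semantics concrete, here is a minimal plain-Java sketch of the same idea (Guava's real implementation is considerably more involved; the `loader` function here is a hypothetical stand-in for the load method): a stale read returns the old value immediately while exactly one thread refreshes in the background.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Function;

public class RefreshAfterWriteCache<K, V> {
    // one cache entry: the value, when it was written, and a refresh-in-progress flag
    private static final class Entry<V> {
        final V value;
        final long writeNanos;
        final AtomicBoolean refreshing = new AtomicBoolean(false);
        Entry(V value, long writeNanos) { this.value = value; this.writeNanos = writeNanos; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long refreshNanos;
    private final Function<K, V> loader;

    public RefreshAfterWriteCache(long refreshNanos, Function<K, V> loader) {
        this.refreshNanos = refreshNanos;
        this.loader = loader;
    }

    public V get(K key) {
        // first load is synchronous: only one thread runs the loader per key
        Entry<V> e = map.computeIfAbsent(key,
                k -> new Entry<>(loader.apply(k), System.nanoTime()));
        if (System.nanoTime() - e.writeNanos > refreshNanos
                && e.refreshing.compareAndSet(false, true)) {
            // exactly one thread triggers the refresh; everyone else
            // immediately returns the old value instead of blocking
            CompletableFuture.runAsync(() ->
                    map.put(key, new Entry<>(loader.apply(key), System.nanoTime())));
        }
        return e.value;
    }
}
```

Note the precondition mentioned above: until the first `get` generates the entry, there is nothing to refresh, which is why warming the cache in advance matters.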

Which solution to choose depends on the business scenario. If you go with the read-write separation architecture, you need the Canal middleware to synchronize the data. This article walks you through building a local demo to see how it works~


MySQL data synchronization

The main process of data synchronization is shown in the following figure:

(figure: MySQL master-slave replication flow)

Every time the Master updates data, it writes the change to its local bin log. The Slave's IO thread copies the Master's bin log into a local relay log, and a SQL thread then replays the relay log against the Slave's database.


How Canal Works

Canal implements the MySQL dump protocol and disguises itself as a Slave, so the Master streams bin log data to it just as it would to a real replica.

(figure: Canal posing as a MySQL Slave)


Build local Canal middleware for data migration

Download the MySQL Community edition: dev.mysql.com/downloads/m…

Download Canal github.com/alibaba/can…


On Windows there is a small bug that needs fixing (Mac users can skip this): edit startup.bat under the bin directory.


Then double-click startup.bat to start. If the logs under the logs folder show no errors, the startup succeeded.

Enable MySQL's bin log: find the MySQL configuration file my.cnf and add the following snippet to it:

[mysqld]
log-bin = mysql-bin			# enable bin log
binlog-format = ROW			# use ROW mode
server_id = 1				# required for MySQL replication; must not clash with Canal's slaveId

Create a test database test, and in it a canal_test table with a single id field. Then create a user:

CREATE USER canal IDENTIFIED BY 'canal'; 
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%'; 
FLUSH PRIVILEGES;

Canal connects to the target database with this user and consumes the bin log changes.

pom coordinates

<dependency>
    <groupId>com.alibaba.otter</groupId>
    <artifactId>canal.client</artifactId>
    <version>1.1.0</version>
</dependency>

Java code

import java.net.InetSocketAddress;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;

public class CanalStarter {
    public static void main(String[] args) throws InterruptedException, InvalidProtocolBufferException {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress(AddressUtils.getHostIp(), 11111),
                "example",  // destination
                "",         // username
                ""          // password
        );
        try {
            connector.connect();
            connector.subscribe(".*\\..*"); // subscribe to changes in all databases and tables
            connector.rollback();           // roll back to the last acknowledged bin log position

            while (true) {
                Message msg = connector.getWithoutAck(100); // fetch up to 100 entries

                if (msg == null || msg.getId() < 0 || msg.getEntries().size() == 0) {
                    System.out.println("nothing consumed");
                    Thread.sleep(1000);
                    continue;
                }
                // process the changed data
                printEntry(msg.getEntries());

                connector.ack(msg.getId()); // acknowledge that this batch has been consumed
            }
        } finally {
            connector.disconnect();
        }
    }
}

How to print the contents before and after a data change:

    private static void printEntry(List<CanalEntry.Entry> entries) throws InvalidProtocolBufferException {
        for (CanalEntry.Entry entry : entries) {
            // skip transaction boundary entries
            if (entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONBEGIN ||
                    entry.getEntryType() == CanalEntry.EntryType.TRANSACTIONEND
            ){
                continue;
            }

            CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());

            for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                System.out.println("event type " + rowChange.getEventType());

                System.out.println("********** before change");
                printRowData(rowData.getBeforeColumnsList());

                System.out.println("********** after change");
                printRowData(rowData.getAfterColumnsList());
            }

        }
    }

This skips CanalEntry.EntryType.TRANSACTIONBEGIN and CanalEntry.EntryType.TRANSACTIONEND entries, which mark transaction boundaries rather than actual data changes.

How to print a row of data:

    private static void printRowData(List<CanalEntry.Column> columns){
        if (CollectionUtils.isEmpty(columns)){
            return;
        }
        columns.forEach(e -> {
            System.out.println(e.getName() + " : " + e.getValue());
        });
    }
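Printing the changes is only a demo. In the read-write separation architecture described at the start, the consumer would instead apply each row change to the cache. A simplified sketch of that step, in plain Java (the `EventType` enum and the key/value arguments are stand-ins for Canal's CanalEntry types, and a ConcurrentHashMap stands in for the real cache such as Redis):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheSyncHandler {
    public enum EventType { INSERT, UPDATE, DELETE }  // stand-in for CanalEntry.EventType

    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis

    // key/value would be extracted from the row's after-image
    // (or from the before-image for deletes)
    public void handle(EventType type, String key, String value) {
        switch (type) {
            case INSERT:
            case UPDATE:
                cache.put(key, value);   // keep the cache mirroring the bin log
                break;
            case DELETE:
                cache.remove(key);
                break;
        }
    }

    public String get(String key) {
        return cache.get(key);
    }
}
```

Wired into the loop above, this is what makes "reads always come from the cache" safe: every committed change in the bin log lands in the cache shortly after it lands in the DB.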

After starting the program, modify data in the original table:

INSERT INTO canal_test (id) VALUES (2022);

Since this is newly inserted data, there is no before image; the after section prints the newly inserted row.

UPDATE canal_test SET id = 2023 WHERE id = 2022;



Origin juejin.im/post/7088881644333400078