Using canal: log-based incremental subscription & consumption

Principle

1. Canal simulates the MySQL slave's interaction protocol: it pretends to be a MySQL slave and sends the dump protocol to the MySQL master

2. The MySQL master receives the dump request and starts pushing the binary log to the slave (that is, canal); note that the connection is initiated by the slave's dump request, after which the master streams events over it

3. Canal parses the binary log into objects (the raw data is a byte stream)
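The three steps above can be illustrated with a toy simulation (all names here are illustrative, not the real MySQL wire protocol): a fake master answers a "dump" request by streaming raw bytes from a position, and a canal-like client parses each byte payload into a structured event.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DumpSimulation {
    // Stand-in for the MySQL master's binary log (illustrative only).
    static class FakeMaster {
        private final List<byte[]> binlog = new ArrayList<>();

        void append(String event) {
            binlog.add(event.getBytes(StandardCharsets.UTF_8));
        }

        // Step 2: on a dump request, stream the binary log from the given position.
        List<byte[]> dump(int fromPosition) {
            return binlog.subList(fromPosition, binlog.size());
        }
    }

    // Step 3: parse the raw byte stream into event objects (here, plain strings).
    static List<String> parse(List<byte[]> raw) {
        List<String> events = new ArrayList<>();
        for (byte[] b : raw) events.add(new String(b, StandardCharsets.UTF_8));
        return events;
    }

    public static void main(String[] args) {
        FakeMaster master = new FakeMaster();
        master.append("INSERT t1");
        master.append("UPDATE t1");
        // Step 1: the client, acting as a slave, requests a dump from position 0.
        System.out.println(parse(master.dump(0))); // [INSERT t1, UPDATE t1]
    }
}
```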

Architecture


Component Description:

server represents a running canal instance and corresponds to one JVM

instance corresponds to a data queue (1 server corresponds to 1..n instances)

The instance module is composed of eventParser (data source access: simulates the slave protocol to interact with the master and parses the protocol), eventSink (the connector between Parser and Store: data filtering, processing, and distribution), eventStore (data storage), and metaManager (incremental subscription & consumption information manager).

Before EventParser sends the dump command to MySQL, it first obtains the position of the last successful parse from the Log Position (on first startup, it uses the initially specified position or the current binlog position). After MySQL receives the dump command, EventParser parses the binlog data pulled from MySQL and passes it to EventSink (a blocking call that returns only once the data has been stored successfully), then updates the Log Position after the transfer succeeds.
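The parser loop just described can be sketched as follows. This is not canal's actual code: the class and method names are assumed for illustration, with a plain list standing in for the master's binlog stream and a callback standing in for the blocking EventSink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the EventParser loop: read the last successful log position,
// dump from there, hand each event to the sink (which blocks until the
// store succeeds), then advance the persisted position.
public class ParserLoop {
    private long logPosition;            // last successfully stored position
    private final List<String> binlog;   // stand-in for the master's binlog stream
    private final Consumer<String> sink; // stand-in for the blocking EventSink

    public ParserLoop(long startPosition, List<String> binlog, Consumer<String> sink) {
        this.logPosition = startPosition;
        this.binlog = binlog;
        this.sink = sink;
    }

    public void runOnce() {
        // "Dump" everything after the stored position.
        for (long pos = logPosition; pos < binlog.size(); pos++) {
            sink.accept(binlog.get((int) pos)); // blocks until the store succeeds
            logPosition = pos + 1;              // advance only after a successful sink
        }
    }

    public long position() { return logPosition; }

    public static void main(String[] args) {
        List<String> stored = new ArrayList<>();
        ParserLoop parser = new ParserLoop(0, Arrays.asList("ev1", "ev2", "ev3"), stored::add);
        parser.runOnce();
        System.out.println(parser.position()); // 3
    }
}
```

Because the position is only updated after the sink call returns, a crash mid-batch causes re-parsing from the last stored position rather than data loss.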

EventSink acts as a channel-like function that can filter, distribute/route (1:n), merge (n:1) and process data.

EventSink is the bridge connecting EventParser and EventStore.

The EventStore implementation currently provided is the in-memory mode: the memory structure is a circular (ring) queue, and three pointers (Put, Get, and Ack) identify where data is written and read.
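A minimal sketch of such a ring buffer, assuming three monotonically increasing cursors as described above (this is illustrative, not canal's actual MemoryEventStore): Put marks the next write slot, Get the next read slot, and Ack the boundary below which slots may be reused.

```java
import java.util.ArrayList;
import java.util.List;

public class RingEventStore {
    private final Object[] buffer;
    private long put = 0; // next write slot
    private long get = 0; // next read slot
    private long ack = 0; // everything below this has been consumed and confirmed

    public RingEventStore(int size) { buffer = new Object[size]; }

    // A write is rejected while it would overwrite un-acked data.
    public boolean tryPut(Object event) {
        if (put - ack >= buffer.length) return false; // full until ack advances
        buffer[(int) (put % buffer.length)] = event;
        put++;
        return true;
    }

    // getWithoutAck-style read: advances get but not ack.
    public List<Object> get(int batchSize) {
        List<Object> batch = new ArrayList<>();
        while (batch.size() < batchSize && get < put) {
            batch.add(buffer[(int) (get % buffer.length)]);
            get++;
        }
        return batch;
    }

    // ack frees slots for reuse; rollback rewinds get to the ack point.
    public void ack(long count) { ack = Math.min(ack + count, get); }
    public void rollback() { get = ack; }

    public static void main(String[] args) {
        RingEventStore store = new RingEventStore(4);
        store.tryPut("e1"); store.tryPut("e2"); store.tryPut("e3");
        System.out.println(store.get(2));        // [e1, e2]
        store.rollback();                        // nothing acked yet -> re-read from e1
        System.out.println(store.get(3));        // [e1, e2, e3]
        store.ack(3);                            // slots become reusable
        System.out.println(store.tryPut("e4"));  // true
    }
}
```

The separation of Get and Ack is what makes rollback cheap: un-acked data is still in the buffer, so rewinding is just a pointer move.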

MetaManager is the incremental subscription & consumption information manager. The protocol between incremental subscription and consumption consists of get/ack/rollback. Respectively: Message getWithoutAck(int batchSize) allows a batchSize to be specified, so multiple entries can be fetched at once; each call returns a Message containing a batch id (unique identifier) and entries (the actual data objects).

void rollback(long batchId): as the name suggests, rolls back the last get request so the data will be fetched again. It is submitted with the batchId obtained from get to avoid mis-operation.

void ack(long batchId): as the name suggests, confirms that consumption succeeded and tells the server it may delete the data. It is submitted with the batchId obtained from get to avoid mis-operation.

Canal's get/ack/rollback protocol differs from a conventional JMS protocol in that it allows get and ack to be processed asynchronously: for example, you can call get several times in a row and then submit the ack/rollback calls asynchronously, in order. In the project this is called the streaming API.
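The streaming contract can be sketched with a small in-memory batch tracker (an assumed simplification, not canal's MetaManager): getWithoutAck hands out batch ids without waiting for earlier acks, acks must arrive in fetch order, and rollback discards every outstanding batch so that data is re-delivered.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StreamingMeta {
    private final Deque<Long> pending = new ArrayDeque<>(); // outstanding batchIds, in get order
    private long nextBatchId = 1;

    // getWithoutAck: hand out a new batch id without waiting for the previous ack.
    public long getWithoutAck() {
        long id = nextBatchId++;
        pending.addLast(id);
        return id;
    }

    // Acks must arrive in the order batches were fetched.
    public void ack(long batchId) {
        if (pending.isEmpty() || pending.peekFirst() != batchId)
            throw new IllegalStateException("ack out of order: " + batchId);
        pending.removeFirst();
    }

    // rollback discards every outstanding batch; the next get re-reads that data.
    public void rollback() { pending.clear(); }

    public int outstanding() { return pending.size(); }

    public static void main(String[] args) {
        StreamingMeta meta = new StreamingMeta();
        long b1 = meta.getWithoutAck(); // fetch several batches back-to-back...
        long b2 = meta.getWithoutAck();
        long b3 = meta.getWithoutAck();
        meta.ack(b1);                   // ...then ack them asynchronously, in order
        meta.ack(b2);
        meta.rollback();                // b3 (and anything after) will be re-delivered
        System.out.println(meta.outstanding()); // 0
    }
}
```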

HA mechanism

Canal supports HA; the mechanism is implemented with ZooKeeper, using its watcher and EPHEMERAL nodes (whose lifetime is bound to the session), similar to HDFS HA.

Canal's HA has two parts: the canal server and the canal client each have their own HA implementation.

canal server: to reduce dump requests against MySQL, only one of the instances on different servers (the same instance deployed on different servers) may be running at any time; the others are in standby (standby is a state of the instance).

canal client: to guarantee ordering, only one canal client at a time may perform get/ack/rollback operations on a given instance; otherwise ordering cannot be guaranteed.

The architecture diagram of server ha is as follows:


Rough steps:

When a canal server wants to start a canal instance, it first attempts a startup arbitration with ZooKeeper (implementation: create an EPHEMERAL node; whoever creates it successfully is allowed to start).

After the ZooKeeper node is created successfully, the corresponding canal server starts the canal instance; the servers that failed to create the node keep the instance in standby.

Once ZooKeeper detects that the instance node created by canal server A has disappeared, it immediately notifies the other canal servers to repeat step 1, re-electing a canal server to start the instance.

Each time the canal client connects, it first asks ZooKeeper which server has started the canal instance and then establishes a connection with it; if the connection becomes unavailable, it tries to connect again.
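The election steps above can be sketched with a compare-and-set standing in for ZooKeeper's EPHEMERAL node creation (an assumed simplification; real canal uses the ZooKeeper client): whoever "creates the node" first runs the instance, the rest stay in standby, and when the owner's session expires the others race again.

```java
import java.util.concurrent.atomic.AtomicReference;

public class InstanceElection {
    // Stand-in for the EPHEMERAL node: null means no one holds the instance.
    private final AtomicReference<String> ephemeralNode = new AtomicReference<>(null);

    // Step 1: try to create the ephemeral node; only one server succeeds.
    public boolean tryStart(String serverId) {
        return ephemeralNode.compareAndSet(null, serverId);
    }

    // The owner's session dies -> ZooKeeper removes the node -> re-election.
    public void sessionExpired(String serverId) {
        ephemeralNode.compareAndSet(serverId, null);
    }

    public String runningOn() { return ephemeralNode.get(); }

    public static void main(String[] args) {
        InstanceElection ha = new InstanceElection();
        System.out.println(ha.tryStart("serverA")); // true  -> serverA runs the instance
        System.out.println(ha.tryStart("serverB")); // false -> serverB stays standby
        ha.sessionExpired("serverA");               // serverA crashes
        System.out.println(ha.tryStart("serverB")); // true  -> failover to serverB
    }
}
```

In the real deployment the "notification" in step 3 is a ZooKeeper watch on the node, so standby servers react to the deletion instead of polling.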

Canal deployment and use

MySQL configuration

Canal synchronizes data by reading MySQL's binlog, which is disabled by default and must be enabled. To guarantee consistency of the synchronized data, the log format must be row-based replication (ROW). Enable binlog in my.cnf:

[mysqld]
log-bin=mysql-bin #adding this line enables binlog
binlog-format=ROW #use ROW format
server_id=1 #required for MySQL replication; must not clash with canal's slaveId

After changing my.cnf, you need to restart MySQL; use whichever restart method suits your environment.
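After the restart, you can verify that the settings took effect (this assumes mysql client access with suitable credentials; `log_bin` should be ON and `binlog_format` should be ROW):

```shell
# Check that binlog is enabled and in ROW format.
mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin'; SHOW VARIABLES LIKE 'binlog_format';"
```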
