Foreword
In the previous article, we learned that canal can sense data changes in MySQL. It does this by mimicking the MySQL slave interaction protocol: it disguises itself as a MySQL slave and so takes part in master-slave replication.
Having learned this, two questions kept lingering in my mind:
- How does it simulate the MySQL slave interaction protocol?
- How does it parse the binlog?
Today, with these two questions in mind, I'm going to dig into the canal code and find out.
One, MySQL master-slave replication
Before turning to canal, let's briefly review how MySQL master-slave replication works.
The figure summarizes the process as follows:
- The MySQL master writes data changes to its binary log (the records in it are called binary log events);
- The MySQL slave copies the master's binary log events to its relay log;
- The MySQL slave replays the events in the relay log, applying the data changes to its own database.
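The three steps above can be mimicked with a toy in-memory model. This is only a sketch of the data flow, not how MySQL implements replication: every name in it (ToyReplication, Master, Slave, pull, replay) and the simplified event format are invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of master-slave replication; all names here are hypothetical.
final class ToyReplication {

    // A simplified "binary log event": set key to value.
    static final class Event {
        final String key;
        final String value;

        Event(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static final class Master {
        final Map<String, String> data = new HashMap<>();
        final List<Event> binlog = new ArrayList<>();

        // Step 1: every data change is also appended to the binary log.
        void write(String key, String value) {
            data.put(key, value);
            binlog.add(new Event(key, value));
        }
    }

    static final class Slave {
        final Map<String, String> data = new HashMap<>();
        final List<Event> relayLog = new ArrayList<>();
        private int copied = 0;   // number of master events copied so far
        private int replayed = 0; // number of relay log events replayed so far

        // Step 2: copy new binary log events into the relay log.
        void pull(Master master) {
            while (copied < master.binlog.size()) {
                relayLog.add(master.binlog.get(copied++));
            }
        }

        // Step 3: replay relay log events against the local data.
        void replay() {
            while (replayed < relayLog.size()) {
                Event e = relayLog.get(replayed++);
                data.put(e.key, e.value);
            }
        }
    }

    public static void main(String[] args) {
        Master master = new Master();
        Slave slave = new Slave();
        master.write("user:1", "alice");
        master.write("user:1", "bob");
        slave.pull(master);
        slave.replay();
        System.out.println("slave matches master: " + slave.data.equals(master.data));
    }
}
```

The point of the model is the separation between copying events (pull) and applying them (replay), which is the same separation the relay log gives a real slave.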
Two, canal principles
The figure gives a vivid picture of the role canal plays. Its principle is simple:
- canal simulates the MySQL slave interaction protocol, disguises itself as a MySQL slave, and sends a dump request to the MySQL master;
- the MySQL master receives the dump request and starts pushing the binary log to the slave (i.e. canal);
- canal parses the binary log objects (originally a byte stream);
- canal distributes the parsed objects according to the business scenario, for example to MySQL, RocketMQ, or ES.
Three, starting from the source code
After reading up on MySQL master-slave replication and the canal principles, I forked the source on GitHub and imported it locally to make debugging easier.
There you can find the com.alibaba.otter.canal.deployer.CanalLauncher class, which is the startup entry class of the standalone canal.
Running its main method directly starts canal, with the same effect as /canal/bin/startup.sh.
canal's code is split into many modules by architecture and design, such as the event parser, event consumer, in-memory store, server instances, metadata, and high availability.
This article does not intend to describe the implementation of each one in detail; that would take a whole series of articles on canal. It mainly addresses the two questions raised at the beginning.
Four, how does it simulate a slave?
As already mentioned, CanalLauncher is canal's startup entry class.
After the main method runs, canal does a lot of preparatory work, for example loading the configuration file, initializing the message queue, starting canal admin, loading the Spring configuration, and registering hooks.
canal's simulation of the slave protocol starts in the EventParser module.
In the canal code, the whole process simplifies to the following:
    // Start replication
    // 1. Build the Erosa connection
    ErosaConnection erosaConnection = buildErosaConnection();
    // 2. Start a heartbeat thread
    startHeartBeat(erosaConnection);
    // 3. Preparation before the dump
    preDump(erosaConnection);
    erosaConnection.connect(); // connect
    // Query the master's serverId
    long queryServerId = erosaConnection.queryServerId();
    if (queryServerId != 0) {
        serverId = queryServerId;
    }
    // 4. Find the last binlog position
    EntryPosition position = findStartPosition(erosaConnection);
    final EntryPosition startPosition = position;
    // Load table metadata
    processTableMeta(startPosition);
    // Reconnect: finding the position may leave state on the connection, so drop it and rebuild
    erosaConnection.reconnect();
    // 5. Start dumping data
    erosaConnection.dump(startPosition.getJournalName(), startPosition.getPosition(), sinkHandler);
1, Handshake and authentication
Before it starts, canal must first establish a connection to the MySQL server and complete client authentication.
In MySQL, the connection process follows this protocol: the server sends an initial handshake packet, the client replies with a handshake response containing its credentials, and the server answers with OK or an error.
In the code, let's look at its connect method.
The negotiate method there is the concrete implementation of the client handshake and authentication. It follows the MySQL protocol specification and reads and writes network data through the Socket channel created for the connection.
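Everything exchanged during the handshake travels in MySQL packets, and each packet starts with a 4-byte header: a 3-byte little-endian payload length followed by a 1-byte sequence id. The following is a minimal sketch of reading and writing that header. The class and method names (PacketHeaderSketch, parse, toBytes) are made up for this example and are not canal's own classes.

```java
// Sketch of MySQL packet framing; names are hypothetical, not taken from canal.
final class PacketHeaderSketch {
    final int payloadLength; // 3 bytes on the wire, little-endian
    final int sequenceId;    // 1 byte on the wire

    PacketHeaderSketch(int payloadLength, int sequenceId) {
        this.payloadLength = payloadLength;
        this.sequenceId = sequenceId;
    }

    // Decode the first four bytes of a packet.
    static PacketHeaderSketch parse(byte[] header) {
        if (header.length < 4) {
            throw new IllegalArgumentException("a packet header needs 4 bytes");
        }
        int length = (header[0] & 0xFF)
                | ((header[1] & 0xFF) << 8)
                | ((header[2] & 0xFF) << 16);
        int sequence = header[3] & 0xFF;
        return new PacketHeaderSketch(length, sequence);
    }

    // Encode the header back into its 4-byte wire form.
    byte[] toBytes() {
        return new byte[] {
            (byte) (payloadLength & 0xFF),
            (byte) ((payloadLength >> 8) & 0xFF),
            (byte) ((payloadLength >> 16) & 0xFF),
            (byte) (sequenceId & 0xFF)
        };
    }

    public static void main(String[] args) {
        PacketHeaderSketch header = parse(new byte[] {0x4a, 0x00, 0x00, 0x00});
        System.out.println("payload length = " + header.payloadLength
                + ", sequence id = " + header.sequenceId);
    }
}
```

A client reads these four bytes first, then reads exactly payloadLength more bytes to get the handshake packet or any later response.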
2, Preparation before the dump
Once connected to MySQL, and before issuing the dump command, canal still has to initialize some configuration information.
The idea is to run SQL statements through a MySQL executor and read back the results.
I won't paste the code, but the statements it executes are as follows:
    show variables like 'binlog_format'     # get the binlog format
    show variables like 'binlog_row_image'  # get the binlog row image setting
    show variables like 'server_id'         # get the master's serverId
    show master status                      # get the binlog file name and position
3, Registering the slave
Next, canal calls the erosaConnection.dump(binlogfilename,binlogPosition,func) method to register the slave and send the dump command.
Before requesting binlog events with COM_BINLOG_DUMP, the slave registers itself with the master; the command for this is COM_REGISTER_SLAVE.
After registering, the dump request is sent; its command is COM_BINLOG_DUMP.
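To make the dump request less abstract, here is a rough sketch of how a COM_BINLOG_DUMP payload is laid out according to the MySQL client/server protocol: one command byte (0x12), a 4-byte binlog position, 2 bytes of flags, the slave's 4-byte server id, and then the binlog file name, all little-endian. The class and method names are hypothetical and this is not canal's implementation; the file name, position, and server id in main are example values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch of the COM_BINLOG_DUMP payload layout; names are hypothetical, not canal's.
final class BinlogDumpSketch {
    static final byte COM_BINLOG_DUMP = 0x12;
    static final byte COM_REGISTER_SLAVE = 0x15; // sent first, to register the slave

    static byte[] buildDumpPayload(String binlogFileName, long binlogPosition, long slaveServerId) {
        byte[] name = binlogFileName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 2 + 4 + name.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(COM_BINLOG_DUMP);          // command byte
        buf.putInt((int) binlogPosition);  // 4-byte position to start from (low 32 bits)
        buf.putShort((short) 0);           // flags, none set in this sketch
        buf.putInt((int) slaveServerId);   // server id the slave identifies itself with
        buf.put(name);                     // binlog file name, running to the end of the packet
        return buf.array();
    }

    public static void main(String[] args) {
        // Example values only.
        byte[] payload = buildDumpPayload("mysql-bin.000001", 4L, 1234L);
        System.out.println("payload bytes: " + payload.length);
    }
}
```

The payload would still be framed with the usual MySQL packet header before being written to the socket.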
After this code runs, we can use show processlist; to see the state of the dump thread:
id | user | host | db | command | time | state |
---|---|---|---|---|---|---|
139 | canal | localhost:62901 | null | Binlog Dump | 3 | Master has sent all binlog to slave; waiting for more updates |
Five, how does it parse binlog data?
In the previous section we saw that the MySQL master has accepted canal as a slave. So once canal receives the binlog content, how does it parse it?
First of all, recall that when configuring the MySQL server we set binlog-format to ROW, i.e. row-based replication.
Every data change in the binlog is called an event. In ROW mode, the main event types are:
event | SQL command | rows content |
---|---|---|
TABLE_MAP_EVENT | null | Defines the table about to be changed |
WRITE_ROWS_EVENT | insert | The row data to insert |
DELETE_ROWS_EVENT | delete | The row data to delete |
UPDATE_ROWS_EVENT | update | The original row data + the changed row data |
Every data change triggers two events: the first tells you which table is about to change, and the second tells you the changed row content, for example TABLE_MAP_EVENT + WRITE_ROWS_EVENT.
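Each of these events begins with a fixed 19-byte header in the binlog v4 format: a 4-byte timestamp, a 1-byte type code, a 4-byte server id, a 4-byte event size, a 4-byte position of the next event, and 2 bytes of flags, all little-endian. The sketch below decodes that header and maps the type code to the event names from the table. It assumes the byte array starts exactly at the event header, and the class name and methods are invented for illustration rather than taken from canal's parser.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of decoding a binlog v4 event header; names are hypothetical, not canal's.
final class BinlogEventHeaderSketch {
    final long timestamp;    // seconds since the epoch
    final int typeCode;      // which kind of event follows
    final long serverId;     // server that originally wrote the event
    final long eventSize;    // header plus body, in bytes
    final long nextPosition; // binlog position of the next event
    final int flags;

    private BinlogEventHeaderSketch(long timestamp, int typeCode, long serverId,
                                    long eventSize, long nextPosition, int flags) {
        this.timestamp = timestamp;
        this.typeCode = typeCode;
        this.serverId = serverId;
        this.eventSize = eventSize;
        this.nextPosition = nextPosition;
        this.flags = flags;
    }

    static BinlogEventHeaderSketch parse(byte[] bytes) {
        if (bytes.length < 19) {
            throw new IllegalArgumentException("a v4 event header needs 19 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long timestamp = buf.getInt() & 0xFFFFFFFFL; // fields are unsigned on the wire
        int typeCode = buf.get() & 0xFF;
        long serverId = buf.getInt() & 0xFFFFFFFFL;
        long eventSize = buf.getInt() & 0xFFFFFFFFL;
        long nextPosition = buf.getInt() & 0xFFFFFFFFL;
        int flags = buf.getShort() & 0xFFFF;
        return new BinlogEventHeaderSketch(timestamp, typeCode, serverId, eventSize, nextPosition, flags);
    }

    // Type codes for the row events discussed here (v1 and v2 variants).
    static String typeName(int typeCode) {
        switch (typeCode) {
            case 19: return "TABLE_MAP_EVENT";
            case 23: case 30: return "WRITE_ROWS_EVENT";
            case 24: case 31: return "UPDATE_ROWS_EVENT";
            case 25: case 32: return "DELETE_ROWS_EVENT";
            default: return "OTHER(" + typeCode + ")";
        }
    }

    public static void main(String[] args) {
        // Hand-built header with example values, only to exercise the parser.
        byte[] raw = ByteBuffer.allocate(19).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1600000000).put((byte) 19).putInt(1).putInt(52).putInt(1024)
                .putShort((short) 0).array();
        BinlogEventHeaderSketch header = parse(raw);
        System.out.println(typeName(header.typeCode) + ", size = " + header.eventSize);
    }
}
```

A parser reads the header first, uses eventSize to know how many bytes belong to the body, and then dispatches on the type code.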
After receiving the binlog data, canal does not parse it into the familiar JSON right away; that only happens when the data is about to be sent.
For example, if we choose RocketMQ, the byte arrays in the binlog are converted into objects just before sending.
    // Build concurrently
    EntryRowData[] datas = MQMessageUtils.buildMessageData(message, executor);
    // Partition serially
    List<FlatMessage> flatMessages = MQMessageUtils.messageConverter(datas, message.getId());
These two methods complete the conversion from byte array to object. Once converted into a FlatMessage object, it is the data structure we consume from the message queue.
    public class FlatMessage implements Serializable {
        private long id;
        private String database;
        private String table;
        private List<String> pkNames;
        private Boolean isDdl;
        private String type;
        // binlog executeTime
        private Long es;
        // dml build timeStamp
        private Long ts;
        private String sql;
        private Map<String, Integer> sqlType;
        private Map<String, String> mysqlType;
        private List<Map<String, String>> data;
        private List<Map<String, String>> old;
    }
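On the consuming side, the data and old fields are the interesting ones for an update. The sketch below shows one way a consumer might report what changed. It works on plain maps rather than the real FlatMessage class, and it assumes that data holds the row values after the change, that old holds earlier values for the same rows, and that the two lists line up by index. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Sketch of diffing the "data" and "old" fields of an update message; names are hypothetical.
final class FlatMessageDiffSketch {

    // Returns "column: old -> new" for every column in old whose value differs from data.
    static List<String> describeChanges(List<Map<String, String>> data,
                                        List<Map<String, String>> old) {
        List<String> changes = new ArrayList<>();
        if (data == null || old == null) {
            return changes; // inserts and deletes carry no old values to compare
        }
        int rows = Math.min(data.size(), old.size());
        for (int i = 0; i < rows; i++) {
            Map<String, String> after = data.get(i);
            for (Map.Entry<String, String> before : old.get(i).entrySet()) {
                String newValue = after.get(before.getKey());
                if (!Objects.equals(before.getValue(), newValue)) {
                    changes.add(before.getKey() + ": " + before.getValue() + " -> " + newValue);
                }
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        // Example row: the name column changed from alice to bob.
        Map<String, String> after = new HashMap<>();
        after.put("id", "1");
        after.put("name", "bob");
        List<Map<String, String>> data = Collections.singletonList(after);
        List<Map<String, String>> old =
                Collections.singletonList(Collections.singletonMap("name", "alice"));
        System.out.println(describeChanges(data, old));
    }
}
```

Comparing values rather than trusting the presence of a key keeps the sketch correct whether old carries every column or only some of them.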
Summary
As I said at the beginning, when I first learned how canal works, it felt really magical.
How on earth does it pass itself off as a MySQL slave? I kept feeling there must be some black magic inside...
In fact, that was just my own ignorance of MySQL.
MySQL has long defined all of these interface protocols; how to connect, authenticate, register, and dump is all clearly written down.
As the saying goes: the flowers are in bloom, just waiting for you to come~