Analysis and application practice of MySQL master-slave replication principle

Vivo Internet Server Team - Shang Yongxing

MySQL Replication (master-slave replication) copies data changes from one MySQL Server to one or more other MySQL Servers. On top of a single-instance service, the replication feature provides high availability, scalability, and more.

1. Background

MySQL is widely used in production, and a large number of applications and services depend heavily on it. If a MySQL instance at the data layer fails and there is no reliable degradation strategy, the failure propagates directly to the upper layers. At the same time, the risk of losing data stored in MySQL must be minimized, to avoid the asset loss and customer complaints that data loss during a failure would cause.

Against these requirements for service availability and data reliability, MySQL provides a reliable, log-based replication capability (MySQL Replication) at the Server layer. With this mechanism, one or more slave libraries can easily be built, improving the high availability and scalability of the database while also enabling load balancing:

  • Real-time data change backup

  • Writes to the master library are continuously executed and retained on the redundant slave nodes, reducing the risk of data loss

  • Horizontal node expansion to support read/write splitting

  • When the master library itself is under heavy pressure, read traffic can be distributed to slave nodes, providing read scalability and load balancing

  • High availability guarantee

  • When the master library fails, one of its slave libraries can be quickly promoted to master. Because the data is identical, the switch does not affect the operation of the system

A MySQL cluster with features including but not limited to the above covers most application and failure scenarios, with high availability and data reliability. The production MySQL currently provided by the storage team is, by default, an asynchronous master-slave replication cluster: an online database service that guarantees the business 99.99% availability and 99.9999% data reliability.

This article discusses in depth how MySQL's replication mechanism is implemented, and how the replication capability can be applied concretely to improve database availability and reliability.

2. Replication principles

2.1 Introduction to the Binlog

Viewed broadly, replication between MySQL servers transmits real-time data changes through the binary log. The binary log belongs to the MySQL server layer and records all changes made to MySQL. Replication can be divided into three modes according to how those changes are recorded:

  • Statement: statement-based format

  • In Statement mode, the master library writes the original SQL statements it executes into the Binlog and sends them to the slave library, which re-executes them to reproduce the data changes.

  • Row: row-based format

  • In Row mode, the master library records in the Binlog the concrete row changes caused by each DML operation and copies them to the slave library, which applies those row-change records to its own data; DDL operations, however, are still recorded in Statement format.

  • Mixed: mixed statement and row formats

  • MySQL decides for each specific SQL statement which log form to record, i.e. it chooses between statement and row.

The earliest implementation was statement-based, introduced in MySQL 3.23. From the beginning it has been a capability of the MySQL Server layer, independent of the storage engine used. Row-based replication was added in 5.1, and mixed-format replication has been supported since 5.1.8.

The three modes each have advantages and disadvantages; comparatively, the Row format is the most widely used. Although its resource overhead is larger, its accuracy and reliability for data changes are stronger than Statement. At the same time, Row-format Binlog carries complete data-change information, so its use is not limited to a MySQL cluster: it can feed services such as a BinlogServer or DTS data transmission, providing a flexible cross-system data transfer capability. The current online MySQL clusters for Internet business are all based on Row-format Binlog.

2.2 Key points of the Binlog

2.2.1 Binlog event types

By definition, a Binlog can be regarded as a sequence of individual Events, which fall mainly into the following categories:

The occurrence of the various events follows clear patterns:

  • XID_EVENT marks the end of a transaction

  • A QUERY_EVENT of DDL type also marks the end and commit point of a transaction; in that case no XID_EVENT appears

  • GTID_EVENT appears only when GTID_MODE is enabled (MySQL 5.6 and later)

  • TABLE_MAP_EVENT always appears before the changed data of a table; one TABLE_MAP_EVENT may be followed by multiple ROW_EVENTs

Besides the data-oriented event types above, there are also ROTATE_EVENT (indicating that the Binlog file has been split) and FORMAT_DESCRIPTION_EVENT (defining the metadata format), among others.

2.2.2 Binlog life cycle

The Binlog behaves differently from the InnoDB log (redo log): it does not rotate over and overwrite the same files repeatedly. Instead, the server continuously splits off new Binlog files according to the configured maximum size of a single file, records all Binlog file names currently on disk in an .index file, and deletes expired Binlog files according to the configured expiry time. Our self-built databases are configured with a single file size of 1 GB and a retention of 7 days.

Under this mechanism, therefore, historical data states can only be traced back over a short period; the complete change history of the database cannot be reconstructed unless the server has not yet expired and reclaimed any of its logs.

2.2.3 Binlog event example

The Binlog takes effect at the server layer. Even if no slave library is replicating from the master, as long as log_bin is enabled in the configuration, Binlog files are written to the corresponding local directory. Use mysqlbinlog to open a sample Row-format Binlog file:

As shown in the figure above, three operations are clearly visible: creating database test, creating table test, and the row changes caused by one write. The readable statements (create, alter, drop, begin, commit, ...) can be regarded as QUERY_EVENTs, while Write_rows is one kind of ROW_EVENT.

During the replication process, such Binlog data is sent to the slave library through the established connection, waiting for the slave library to process and apply.

2.2.4 Replication position values

The Binlog is strictly ordered when generated, but it carries only second-granularity physical timestamps, so relying on time for positioning or ordering is unreliable: hundreds or thousands of events may occur within the same second. Replication nodes therefore need effective, reliable coordinate values to locate their position in the Binlog. MySQL supports two forms of replication coordinates: the traditional Binlog File:Position pair, and the global transaction identifier (GTID) available since version 5.6.

  • FILE Position

As long as log_bin is enabled, MySQL keeps File Position coordinates, regardless of whether GTID is in use.

File: binlog.000001
Position: 381808617

This concept is fairly intuitive: the server is currently writing the Binlog file numbered File and has generated Position bytes of data in total. In the example, the instance has generated 381,808,617 bytes of Binlog, and this value matches the file size seen directly on the machine. File Position is thus simply the file sequence number plus the byte offset.

To enable replication based on this mode, you need to explicitly specify the corresponding File and Position in the replication relationship:

CHANGE MASTER TO MASTER_LOG_FILE='binlog.000001', MASTER_LOG_POS=381808617;

These values must be exact: in this mode the data the slave fetches depends entirely on a valid starting point, so any deviation causes data loss or duplicate execution and interrupts replication.

  • GTID

With GTID_MODE=ON, MySQL assigns every transaction a unique global transaction ID in the format server_uuid:id

Executed_Gtid_Set: e2e0a733-3478-11eb-90fe-b4055d009f6c:1-753

Here, e2e0a733-3478-11eb-90fe-b4055d009f6c uniquely identifies the instance that generated the Binlog events, and 1-753 indicates that 753 transactions generated by that instance have been executed or received.

When a slave library fetches Binlog events from the master, its own execution record stays consistent with the master's Binlog GTID record, e.g. e2e0a733-3478-11eb-90fe-b4055d009f6c:1-753. If a slave replicates from e2e0a733-3478-11eb-90fe-b4055d009f6c, then executing show master status on that slave shows the same value.

If a value inconsistent with the replicated master appears on the slave, an errant GTID is present, generally caused by a master-slave switch or by forced writes on the slave. Under normal circumstances the slave's Binlog GTID set should be consistent with the master's.

To start replication in this mode, there is no need to specify concrete values as with File Position; it suffices to set:

CHANGE MASTER TO MASTER_AUTO_POSITION=1;

After reading the Binlog, the slave automatically checks its own Executed_Gtid_Set to determine which Binlog transactions have already been executed and which have not, ignoring the former and executing the latter.
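A minimal sketch of this skip-or-execute check, assuming the simple single-UUID, single-interval GTID set shown above (real Executed_Gtid_Set values can contain several UUIDs with several intervals each; `parseExecutedSet` and `alreadyExecuted` are illustrative names, not MySQL internals):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// executedRange holds one "uuid:start-end" interval from Executed_Gtid_Set.
type executedRange struct {
	uuid       string
	start, end uint64
}

// parseExecutedSet parses e.g. "e2e0a733-...-b4055d009f6c:1-753". The UUID
// contains hyphens but no colon, so the first colon splits uuid from interval.
func parseExecutedSet(s string) (executedRange, error) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) != 2 {
		return executedRange{}, fmt.Errorf("malformed GTID set: %q", s)
	}
	bounds := strings.SplitN(parts[1], "-", 2)
	start, err := strconv.ParseUint(bounds[0], 10, 64)
	if err != nil {
		return executedRange{}, err
	}
	end := start // a bare "uuid:5" is a one-element interval
	if len(bounds) == 2 {
		if end, err = strconv.ParseUint(bounds[1], 10, 64); err != nil {
			return executedRange{}, err
		}
	}
	return executedRange{uuid: parts[0], start: start, end: end}, nil
}

// alreadyExecuted mimics the slave-side decision under MASTER_AUTO_POSITION=1:
// a received transaction whose GTID falls inside the executed set is ignored,
// anything outside it is applied.
func (r executedRange) alreadyExecuted(uuid string, txID uint64) bool {
	return uuid == r.uuid && txID >= r.start && txID <= r.end
}

func main() {
	r, _ := parseExecutedSet("e2e0a733-3478-11eb-90fe-b4055d009f6c:1-753")
	fmt.Println(r.alreadyExecuted(r.uuid, 753), r.alreadyExecuted(r.uuid, 754))
}
```

Transaction 753 would be skipped, while 754 (the next one from the master) would be executed.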

2.3 The replication process in detail

2.3.1 Basic replication process

With the master library's binlog enabled (log_bin = ON) and recording normally, how is replication started?

Here is an introduction to MySQL's default asynchronous replication mode:

  1. The slave library starts its I/O thread and establishes a client connection to the master library.

  2. The master library starts a binlog dump thread, reads binlog events on the master, and sends them to the slave's I/O thread, which writes each event it receives into its own Relay Log.

  3. The slave library starts its SQL thread, which replays the events accumulated in the Relay Log to complete the data update on the slave.

In summary, for this replication relationship the master library runs one thread and the slave library runs two.

  • Timing relationship

Once the cluster is running, the slave continuously receives Binlog events from the master and processes them. The data then flows as follows:

  1. The Master records data changes into the Binlog; the BinlogDump thread reads the corresponding Binlog after a write occurs

  2. The Binlog information is pushed to the I/O Thread of the Slave.

  3. The Slave I/O thread writes the read Binlog information into the local Relay Log.

  4. Slave's SQL thread reads the content in the Relay Log and executes it on the slave library.

All the steps above are asynchronous, so large changes, such as DDL on a column, or writes, updates, or deletes affecting many rows, cause the master-slave delay to spike. For such delay scenarios, later MySQL versions gradually introduced new features to speed up transaction replay on the slave.

  • The role of the Relay Log

The relay log is essentially the same kind of log file as the binlog; opening the two directly on the local machine reveals only minor differences:

Binlog Version 3 (MySQL 4.0.2 - < 5.0.0)

added the relay logs and changed the meaning of the log position

Before MySQL 4.0 there was no Relay Log, and the whole process involved only two threads. But this meant replication had to proceed synchronously, which made it easily disturbed and inefficient: the master had to wait for the slave to finish reading before sending the next binlog event, somewhat like the difference between a blocking and a non-blocking channel.

With the Relay Log added to the process, the previously synchronous event fetch and event replay are decoupled into two steps that can proceed asynchronously, with the Relay Log acting as a buffer. Alongside it, a relay-log.info file records the current replication progress and the position at which the next event will be written; this file is updated by the SQL thread.
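The buffering role of the Relay Log can be sketched with a buffered Go channel standing in for the relay file: one goroutine plays the I/O thread (producer), another the SQL thread (consumer), and neither waits for the other step by step (`replayThrough` is an illustrative helper, not MySQL code):

```go
package main

import "fmt"

// replayThrough pushes events through a buffered channel standing in for the
// relay log. The sender (I/O thread) can keep fetching events while the
// receiver (SQL thread) replays them at its own pace; it returns the events
// in replay order.
func replayThrough(events []string, bufSize int) []string {
	relay := make(chan string, bufSize) // the "relay log" buffer
	out := make(chan []string)

	// SQL thread: drains the relay and "replays" each event.
	go func() {
		var replayed []string
		for ev := range relay {
			replayed = append(replayed, ev)
		}
		out <- replayed
	}()

	// I/O thread: writes fetched binlog events into the relay without
	// waiting for each one to be replayed first.
	for _, ev := range events {
		relay <- ev
	}
	close(relay)
	return <-out
}

func main() {
	got := replayThrough([]string{"TABLE_MAP_EVENT", "WRITE_ROWS_EVENT", "XID_EVENT"}, 8)
	fmt.Println(got)
}
```

Replay order is preserved (the channel is FIFO), which mirrors how the SQL thread applies relay-log events strictly in the order the I/O thread wrote them.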

The special replication modes introduced later differ in some details, but overall they follow this process.

2.3.2 Semi-synchronous replication

With asynchronous replication it is impossible to guarantee that the slave stays consistent with the master in real time. If the master fails while there is delay, the difference between the two cannot be recovered, and reads separated onto the slave during that window are also unreliable. Switching from asynchronous to fully synchronous replication, however, greatly increases the performance overhead and is hard to justify in practice.

Against this background, MySQL introduced a semi-synchronous replication mechanism in version 5.5 to reduce the probability of data loss. In this mode, the Master waits at a certain point for an ACK (Acknowledge) message from one Slave node and commits the transaction only after receiving it, which limits the performance impact while providing stronger data reliability than asynchronous replication.

Before introducing semi-synchronous replication, let's quickly review the full write path of a MySQL transaction under master-slave replication. A transaction write on the master consists of 4 steps:

  1. InnoDB Redo File Write (Prepare Write)

  2. Binlog File Flush & Sync to Binlog File

  3. InnoDB Redo File Commit(Commit Write)

  4. Send Binlog to Slave

  • If the Master does not need to care whether the Slave has received the Binlog event, this is asynchronous master-slave replication

  • If the Master waits for the Slave's ACK after step 3 (Commit Write) and before replying to the client, this is semi-synchronous replication (after-commit)

  • If the Master waits for the Slave's ACK at step 2 (Flush & Sync), i.e. before the Commit, this is enhanced semi-synchronous replication (after-sync)

  • Timing relationship

The timing diagram of semi-synchronous replication shows that the only change is the wait for the slave's ACK in the master's commit phase, and a single ACK from any one slave node is enough to continue the normal processing flow. In this mode, even if the master goes down, at least one up-to-date slave node is guaranteed, and the waiting time during synchronization is kept small.

2.3.3 Summary

Given the database versions currently online in production, the replication methods officially provided by MySQL are mainly those described above. There are of course many derivative or MySQL-compatible database products that improve further on availability and reliability; this article does not expand on them.

2.4 Characteristics of replication

The replication methods discussed so far share a notable trait: data delay cannot be avoided. Asynchronous replication lets the slave's data lag behind, while semi-synchronous replication blocks master writes and affects performance.

In MySQL's early replication mode, the slave's IO thread and SQL thread essentially fetch, read, and replay events serially: only one thread executes the Relay Log. The master, however, accepts requests concurrently, its throughput limited only by machine resources and MySQL's own processing capacity, so the slave's apply rate (the SQL thread executing events) struggles to keep up with the master's execution. Here is a set of test data:

  • Machine: 64-core 256G, MySQL 5.7.29

  • Test scenario: regular INSERT, UPDATE pressure test scenario

  • Results: measured by network traffic, the IO thread of MySQL Server exceeds 100 MB/s, which comfortably covers business use. The SQL thread, however, is estimated at only 21-23 MB/s, and performance drops further in UPDATE scenarios

  • Note that these results were obtained on a higher MySQL version with parallel replication enabled; without that feature, performance is worse.

Expecting the business layer to limit its usage is unrealistic. MySQL began introducing workable parallel replication solutions in version 5.6, generally by increasing the apply speed on the slave side.

2.4.1 Schema-level parallel replication

Parallel replication at the database level rests on a very simple principle: data and data changes in different Databases/Schemas within an instance are unrelated and can be processed in parallel.

In this mode, the MySQL slave starts multiple WorkThreads, and the SQLThread originally responsible for replay becomes a Coordinator that decides whether transactions can be executed in parallel and distributes them to the WorkThreads.

If transactions belong to different schemas, are not DDL statements, and perform no cross-schema operations, they can be replayed in parallel. Otherwise, the Coordinator must wait for all worker threads to finish before applying the current log entry.

MySQL Server
 
MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| aksay_record       |
| mysql              |
| performance_schema |
| proxy_encrypt      |
| sys                |
| test               |
+--------------------+
7 rows in set (0.06 sec)

For the slave library, if it receives data changes in aksay_record and proxy_encrypt from the master, it can process those two schemas at the same time.
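The Coordinator's dispatch rule can be sketched as below, simplified to the three conditions named above (different schemas, no DDL, no cross-schema operations); `txn` and `canDispatchInParallel` are hypothetical names for illustration:

```go
package main

import "fmt"

// txn is a minimal stand-in for a transaction queued at the Coordinator.
type txn struct {
	schemas []string // schemas the transaction touches
	isDDL   bool
}

// canDispatchInParallel mirrors the schema-level rule: two transactions may
// go to different WorkThreads only if neither is DDL and their schema sets
// do not intersect (a cross-schema transaction simply lists both schemas).
func canDispatchInParallel(a, b txn) bool {
	if a.isDDL || b.isDDL {
		return false
	}
	seen := map[string]bool{}
	for _, s := range a.schemas {
		seen[s] = true
	}
	for _, s := range b.schemas {
		if seen[s] {
			return false // shared schema: must serialize
		}
	}
	return true
}

func main() {
	t1 := txn{schemas: []string{"aksay_record"}}
	t2 := txn{schemas: []string{"proxy_encrypt"}}
	fmt.Println(canDispatchInParallel(t1, t2)) // disjoint schemas: parallel OK
}
```

With the two schemas from the example above, the changes would be dispatched to two WorkThreads; a hot single-schema workload would always fail this check and degenerate to serial replay, which is exactly the weakness discussed next.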

This approach, however, has obvious shortcomings. First, performance improves significantly only when traffic is balanced across multiple schemas; if there is a hot table, or only one schema on the instance receives changes, this parallel mode is no different from the early serial replication. Moreover, even though data in different schemas is unrelated, parallel execution alters the order in which transactions are applied, which to some extent breaks the causal consistency of the server as a whole.

2.4.2 Group-commit-based parallel replication (Group Commit)

Schema-based parallel replication is ineffective in most scenarios, such as a single database with many tables, but the idea of moving beyond a single apply thread on the slave continued. Version 5.7 added a parallel replication method based on transaction group commit. Before introducing the group-commit strategy as applied to replication, it is necessary to review how the server's InnoDB engine commits transactions:

Binlog persistence is governed by sync_binlog. Normally sync_binlog=1 is used, i.e. an fsync is issued every time a transaction commits.

When the master executes transactions concurrently at scale, each commit takes a lock to flush to disk, so all Binlog writes hit the disk serially and become a performance bottleneck. To solve this, MySQL introduced group commit for transactions in version 5.6 (this does not yet refer to the logic applied on the slave). The design principle is easy to understand: transactions that hold their resources simultaneously in the Prepare stage can all be committed together.

Given that the master has this capability, it is natural for the slave to apply a similar mechanism to execute transactions in parallel. MySQL's implementation went through two stages:

  • Commit-Parents-Based

Writes in MySQL use lock-based concurrency control, so transactions that are simultaneously in the Prepare phase (uncommitted) on the Master cannot hold conflicting locks, and can therefore be executed in parallel on the Slave.

Thus, when a transaction enters the prepare phase it can be marked with a logical timestamp (in the implementation, the sequence_number of the last committed transaction), and transactions carrying the same timestamp can be executed concurrently on the slave.

This mode, however, depends on the commit of the previous transaction group: a transaction with no resource conflicts still cannot execute if its commit-parent has not yet committed.

  • Lock-Based (Logical Clock)

This lifts the restriction of Commit-Parent-Based: whether transactions can be executed concurrently depends only on whether their locks can be acquired without conflict, i.e. on overlapping lock intervals, rather than on sharing the sequence_number of the same previously committed transaction.
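A rough sketch of the lock-interval rule, assuming each transaction carries the sequence_number and last_committed values the master writes into the Binlog under LOGICAL_CLOCK (`gtidTxn` and `canRunConcurrently` are illustrative names):

```go
package main

import "fmt"

// Under LOGICAL_CLOCK each binlog transaction carries two values assigned by
// the master: sequence_number, and last_committed (the sequence_number of the
// newest transaction that held a conflicting lock before this one prepared).
type gtidTxn struct {
	sequenceNumber uint64
	lastCommitted  uint64
}

// canRunConcurrently applies the lock-interval rule: t2 may start while t1 is
// still executing when t2's lock interval opened before t1 committed, i.e.
// t2.lastCommitted < t1.sequenceNumber. Transactions from the same commit
// group (equal lastCommitted) always pass this check.
func canRunConcurrently(t1, t2 gtidTxn) bool {
	return t2.lastCommitted < t1.sequenceNumber
}

func main() {
	t1 := gtidTxn{sequenceNumber: 10, lastCommitted: 5}
	t2 := gtidTxn{sequenceNumber: 11, lastCommitted: 5}  // overlaps t1's interval
	t3 := gtidTxn{sequenceNumber: 12, lastCommitted: 11} // conflicted with t2 on the master
	fmt.Println(canRunConcurrently(t1, t2), canRunConcurrently(t2, t3))
}
```

In the example, t2 can replay alongside t1, but t3 recorded a conflict with t2 on the master (last_committed = 11 = t2's sequence_number), so the slave must wait for t2 to finish before starting t3.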

3. Application

vivo's current standard online MySQL architecture is a master-slave-offline asynchronous replication cluster: the slave serves separated business read requests, while the offline node serves no reads and is used by the big-data platform for offline and real-time data extraction, DB queries, and the backup system. Against this background, the storage R&D team provides two additional extension services for MySQL scenarios:

3.1 Application of high availability system + middleware

Although master-slave replication improves the system's availability, MySQL 5.6 and 5.7 have no automatic failover capability comparable to Redis. If the master fails and nobody intervenes, the business in fact cannot write normally, and if the failure lasts long, reads separated onto the slave also become unreliable.

3.1.1 VSQL (original high availability 2.0 architecture)

So, on top of the current standard one-master-two-slave architecture, HA components and middleware are added to the system to strengthen the high availability, read scalability, and data reliability of the MySQL service:

  • The HA component manages MySQL's replication topology, monitors cluster health, and handles automatic failover in failure scenarios;

  • The middleware Proxy manages traffic: it addresses the slow propagation of changes and stale caches in the original domain-name scheme, controls read/write splitting, and implements IP and SQL black/white lists, etc.;

3.1.2 Data reliability enhancement

The data itself is still synchronized within the cluster by MySQL's native master-slave replication, so the risks of asynchronous replication remain. For this scenario, we provide three feasible solutions:

  • Remote log replication

After configuring the HA central node and login access to the MySQL machines across the network, the classic MHA log-file replication compensation scheme ensures data is not lost when a failure occurs: in operation, the HA node accesses the failed node's local file directory, reads the Binlog data that the candidate master is missing, and replays it on the candidate master.

Advantages

  • Consistent with the 1.0 MHA scheme; the old mechanism can be reused directly

  • After the mechanism is adapted, it can be folded into the HA capability without password-free mutual trust between machines, reducing permission requirements and security risks

Disadvantages

  • It may be unavailable: the failed node's machine must be reachable with a healthy disk, so it cannot cope with hardware or network failures

  • The network path is long, and the time spent replaying the intermediate logs may be uncontrollable, leaving the service unavailable for a long time

  • Centralized log storage

Relying on the BinlogServer module of the data transmission service, Binlog logs are stored centrally. The HA component manages the MySQL cluster and the BinlogServer together, strengthening the robustness of the MySQL architecture. Real slave libraries establish their replication relationships with the BinlogServer instead of connecting directly to the master.

Advantages

  • The log storage form can be customized: file system or another shared storage mode

  • No machine availability or permission issues are involved

  • Indirectly improves binlog storage safety (a backup)

Disadvantages

  • Additional resource usage; keeping logs for a long time consumes substantial resources

  • Without semi-synchronous replication there is no guarantee all binlogs are collected, even though the collection speed (equivalent to an IO thread) far exceeds the relay speed, peaking around 110 MB/s

  • Increased system complexity; introduces the risk of an additional link

  • Switch to semi-synchronous replication

The MySQL cluster enables semi-synchronous replication and prevents degradation to asynchronous mode through configuration (higher risk). The Agent itself supports monitoring of semi-synchronous clusters, which reduces the amount of log lost during failover (compared with asynchronous replication).

Advantages

  • MySQL's native mechanism; no additional risk is introduced

  • In essence it strengthens the high availability of the MySQL cluster itself

  • HA components connect to semi-synchronous clusters seamlessly, without modification

Disadvantages

  • Some versions are incompatible and may not be able to enable it

  • The business may not accept the resulting performance reduction

  • Semi-synchronization cannot guarantee zero data loss; the Agent's own mechanism actually prefers the slave node that has "executed the most", not the one with the "most logs"

orchestrator will promote the replica which has executed more events rather than the replica which has more data in the relay logs.

At present we use the remote log replication solution, and this year we are planning the centralized BinlogServer storage solution to strengthen data security. It is worth mentioning that semi-synchronization is also effective and feasible: for read-heavy, write-light businesses, upgrading the cluster this way is worth considering, since it essentially guarantees the correctness of the separated read traffic.

3.2 Data transmission service

3.2.1 Cross-system data transfer based on Binlog

Using the Binlog to stream MySQL data in real time to other systems, including MySQL, ElasticSearch, Kafka, and other MQs, is already a classic application scenario. MySQL's native change-data capture makes real-time linkage between systems effective, and the DTS (Data Transfer Service) that collects from MySQL is based on the same replication principles introduced above. Here we introduce how we obtain data using the same mechanism as a MySQL slave node, as an extended companion to the replication discussion:

(1) How to get Binlog

There are two common ways:

  • Monitor the Binlog files directly, like a log-collection system

  • Use the MySQL Slave mechanism, with the collector masquerading as a Slave

This article covers only the second method: implementing a fake slave.

(2) Register Slave identity

Here we take the Go SDK as an example. Go's byte type ranges over 0-255; other languages can convert accordingly.

data := make([]byte, 4+1+4+1+len(hostname)+1+len(b.cfg.User)+1+len(b.cfg.Password)+2+4+4)
  1. Bytes 0-3 are 0, reserved for the packet header

  2. Byte 4 is the MySQL protocol command COM_REGISTER_SLAVE, value 21

  3. Bytes 5-8 are the 4 little-endian bytes of the server_id preset on the current instance (a numeric value, not the uuid)

  4. The following bytes carry the current instance's hostname, user, and password

  5. The next 2 bytes are the little-endian encoded port value

  6. The final 8 bytes are generally set to 0; the last 4 of them are the master_id, which a fake slave can set to 0
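The byte layout above can be assembled as follows; `buildRegisterSlave` is an illustrative helper following the listed layout, not a function from any SDK, and the hostname/user/password fields are written as length-prefixed strings:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildRegisterSlave assembles the COM_REGISTER_SLAVE payload following the
// byte layout listed above. The first 4 bytes are left as 0 for the MySQL
// packet header (length + sequence), which the connection layer fills in.
func buildRegisterSlave(serverID uint32, hostname, user, password string, port uint16) []byte {
	data := make([]byte, 4+1+4+1+len(hostname)+1+len(user)+1+len(password)+2+4+4)
	pos := 4
	data[pos] = 21 // COM_REGISTER_SLAVE
	pos++
	binary.LittleEndian.PutUint32(data[pos:], serverID)
	pos += 4
	// hostname, user, and password are each written as a 1-byte length
	// prefix followed by the raw bytes.
	for _, s := range []string{hostname, user, password} {
		data[pos] = byte(len(s))
		pos++
		pos += copy(data[pos:], s)
	}
	binary.LittleEndian.PutUint16(data[pos:], port)
	pos += 2
	binary.LittleEndian.PutUint32(data[pos:], 0) // replication rank, unused
	pos += 4
	binary.LittleEndian.PutUint32(data[pos:], 0) // master_id: 0 for a fake slave
	return data
}

func main() {
	// "dts-host", "repl", and "secret" are placeholder credentials.
	pkt := buildRegisterSlave(1001, "dts-host", "repl", "secret", 3306)
	fmt.Printf("command=%d payload=%d bytes\n", pkt[4], len(pkt))
}
```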

(3) Initiate a copy command

data := make([]byte, 4+1+4+2+4+len(p.Name))

  1. Bytes 0-3 are likewise set to 0, with no special meaning

  2. Byte 4 is the MySQL protocol command COM_BINLOG_DUMP, value 18

  3. Bytes 5-8 are the 4 little-endian bytes of the Binlog Position value

  4. Bytes 9-10 are the dump flags; the default 0 means Binlog_Dump_Never_Stop and is encoded as two zero bytes

  5. Bytes 11-14 are the 4 little-endian bytes of the instance's server_id (not the uuid)

  6. The remaining bytes are the Binlog file name, appended directly
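Correspondingly, a sketch of the dump command payload (`buildBinlogDump` is again an illustrative name following the listed layout):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// buildBinlogDump assembles the COM_BINLOG_DUMP payload in the byte layout
// listed above; the first 4 bytes are again reserved for the packet header.
func buildBinlogDump(serverID uint32, pos uint32, file string) []byte {
	data := make([]byte, 4+1+4+2+4+len(file))
	data[4] = 18                                 // COM_BINLOG_DUMP
	binary.LittleEndian.PutUint32(data[5:], pos) // start position in the file
	binary.LittleEndian.PutUint16(data[9:], 0)   // flags: 0 keeps the dump streaming
	binary.LittleEndian.PutUint32(data[11:], serverID)
	copy(data[15:], file) // binlog file name, no length prefix
	return data
}

func main() {
	pkt := buildBinlogDump(1001, 381808617, "binlog.000001")
	fmt.Printf("command=%d start_pos=%d\n", pkt[4], binary.LittleEndian.Uint32(pkt[5:9]))
}
```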

After these two commands are sent over the client connection, a valid replication connection can be observed on the master library.

3.2.2 Using Parallel Copy Mode to Improve Performance


According to early performance tests, without any optimization the average transfer speed over the network is only about 7.3 MB/s, far behind MySQL's SQL-thread relay speed and hard to accept in high-pressure scenarios.

The DTS consumption unit reassembles transactions and analyzes transaction concurrency from the events consumed from Kafka, but the actual execution is still a serial, single-threaded replay into MySQL. This makes the serial execution step the sole performance bottleneck.

  1. Prior to MySQL 5.7, parallel replication was based on the Schema attribute of a transaction, so DML operations under different databases could be replayed concurrently on the standby; after optimization, concurrency across different tables can also be achieved. However, if the business writes concurrently to a single database (or, after optimization, a single table) on the master side, there will still be a large delay on the slave side. When the Slave serves reads as a read-only instance (this article assumes the Slave side is read-only whenever Consistency is discussed), schema-based parallel replication guarantees the causal order of transactions under the same schema (Causal Consistency), but not across different schemas. For example, if the business cares about transaction ordering and writes T1 in db1 on the Master, then executes T2 in db2 after T1 returns, on the slave side the data of T2 may become readable before the data of T1.

  2. The LOGICAL CLOCK parallel replication of MySQL 5.7 removes the schema restriction, so transactions executed concurrently on a single database or table on the master can also be executed in parallel on the slave. Logical clock parallel replication was originally Commit-Parent-Based: transactions with the same commit parent could run concurrently. But under that scheme some provably conflict-free transactions could not be parallelized, because a transaction had to wait until all transactions of the previous commit-parent group had been replayed. It was later optimized into a Lock-Based approach: as long as a transaction's lock interval overlaps that of a currently executing transaction, which guarantees there was no lock conflict on the Master side, it can be executed concurrently on the Slave side. LOGICAL CLOCK can guarantee Causal Consistency for transactions that did not run concurrently, i.e. the case where transaction T2 starts to execute only after transaction T1 has finished.

(1) Connection pool transformation

In the old version of DTS, each consumption task maintained only one long-lived MySQL connection, and all transactions on the consumption link were executed serially on it, creating a huge performance bottleneck. Since a single connection cannot be reused concurrently, the original single connection object had to be transformed and upgraded into a mechanism similar to a connection pool to support concurrent transaction execution.

The go-mysql/client package itself does not provide a connection pool mode, so here the number of live connections is expanded at startup according to the concurrency determined by the transaction-concurrency analysis.

// initialize the client connections
se.conn = make([]*Connection, meta.MaxConcurrenceTransaction)

(2) Concurrent selection connection

  • use logic clock

In GTID-based replication mode, the body of the GTID_EVENT in the binlog contains two values:

LastCommitted  int64
SequenceNumber int64

LastCommitted is the basis for our concurrency: in principle, transactions with the same LastCommitted value can be executed concurrently. Combined with the original transaction-concurrency analysis, a transaction set whose size is bounded by the configured concurrency is generated; this list is then analyzed, and its transactions are distributed across the connection pool to achieve an approximate load-balancing mechanism.

  • non-concurrent item mutual exclusion

For transactions that can execute concurrently, a simple load-balancing-like mechanism can pick a MySQL connection from the pool in turn to execute each one. Note, however, that the source transactions themselves are ordered: under the logical clock, transactions whose prepare phases overlapped can run concurrently, but quite a few transactions still cannot, and they are scattered throughout the transaction queue. Each non-concurrent transaction can be seen as a barrier surrounded by groups of concurrent transactions (of at least 2):

Assume a transaction queue with 6 elements, of which only t1, t2, t5, and t6 can be executed concurrently. Then by the time t3 executes, t1 and t2 must have finished; and by the time t5 executes, t3 and t4 have both finished.

(3) Checkpoint update

In a concurrent execution scenario, a transaction with a lower water level may finish after one with a higher water level. Under the original mechanism, the lower water level would overwrite the higher one, which carries certain risks. Two safeguards address this:

  1. The generated SQL for Write_Event is changed to REPLACE INTO, which avoids conflicts from duplicated write events; Update and Delete events rely on the ordering guarantee of the logical clock and will not conflict.

  2. The water level is only allowed to move up, never down.

However it is optimized, concurrent transaction execution inevitably introduces extra risks, such as uncontrollable rollback of concurrent transactions and the loss of causal consistency between the target instance and the source instance. Businesses can weigh these trade-offs against their own needs and decide whether to enable concurrent execution.

After the logical-clock-based concurrent execution transformation, the consumer's execution performance improved from 7.3MB/s to about 13.4MB/s in the same test scenario.

(4) Summary

Based on each consumption task's own library and table filtering, another form of concurrency can be achieved: multiple consumption tasks can be started to handle different libraries and tables respectively. This exploits Kafka's support for multiple consumer groups and can be scaled horizontally to improve concurrent performance; dedicated support can be provided for data migration scenarios.

The logical-clock-based method does not work for the large number of production clusters that do not have GTID enabled. We are therefore still looking for better solutions for this part, such as combining it with higher-version features like Write Sets, and will continue the performance optimization.

4. Summary

Finally, MySQL's replication capability not only greatly improves the availability and reliability of the MySQL database service itself, but also provides Binlog, a very flexible and open data interface that extends the range of applications of the data. Using this "interface", it is easy to achieve real-time data synchronization across multiple different storage structures and environments. Going forward, the storage team will also focus on the BinlogServer extension service to strengthen the MySQL architecture, including but not limited to data security guarantees and opening up downstream data links.

Origin blog.csdn.net/vivo_tech/article/details/130056126