Notes on several methods of MySQL master-slave replication

Asynchronous replication
Multi-threaded replication
Enhanced semi-synchronous replication

Asynchronous replication

MySQL replication is asynchronous by default. Master-slave replication requires at least two MySQL instances, which can run on different servers or on the same server.

Asynchronous master-slave replication is the most common replication scenario. Data integrity depends on the master's BINLOG not being lost: as long as the BINLOG survives, even if the master goes down, the transactions the slave has not yet received can be recovered from the BINLOG and applied to the slave manually.
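As a minimal sketch of wiring a slave to a master with classic file-and-position replication (MySQL 5.7 syntax): the host, account name, password, binlog file name and position below are all placeholder assumptions, not values from this article. It also assumes that binary logging is enabled on the master and that the two instances have different server_id values.

```sql
-- On the master: create a replication account (name and password are placeholders).
CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';

-- On the master: note the current binlog file name and position.
SHOW MASTER STATUS;

-- On the slave: point at the master using the coordinates noted above
-- (the file name and position here are illustrative only).
CHANGE MASTER TO
  MASTER_HOST = '192.0.2.10',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 154;
START SLAVE;

-- On the slave: Slave_IO_Running and Slave_SQL_Running should both be Yes.
SHOW SLAVE STATUS\G
```

Newer MySQL 8.0 releases also accept the equivalent CHANGE REPLICATION SOURCE TO and START REPLICA statements, so check which form your version expects.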

Note: When the master is down, the DBA can read the master's binlog with the mysqlbinlog tool, extract the missing events and apply them to the slave. Alternatively, a high-availability MHA architecture can be configured to extract the missing data and bring the slave up to date automatically, or Global Transaction Identifiers (GTID) can be enabled so that the slave automatically fetches the binlog events it is missing.
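For the GTID option, a rough sketch of switching an existing slave to GTID auto-positioning might look like the following. It assumes gtid_mode and enforce_gtid_consistency are already ON on both servers; turning them on is a separate procedure not covered here.

```sql
-- On both servers: confirm that GTID is enabled.
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';

-- On the slave: let the slave work out which transactions it is missing,
-- instead of supplying a binlog file name and position.
STOP SLAVE;
CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
START SLAVE;

-- On either server: the set of transactions executed so far.
-- Comparing the master's and the slave's values shows what the slave lacks.
SELECT @@GLOBAL.gtid_executed;
```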

MySQL records transactions (or SQL statements) in the BINLOG. For engines that support transactions (such as InnoDB), the BINLOG is written when each transaction commits; for engines that do not support transactions (such as MyISAM), the BINLOG is written after each SQL statement executes. To protect the binlog, MySQL provides the sync_binlog parameter, which controls how often the BINLOG is flushed to disk.

show variables like 'sync_binlog';

sync_binlog=1 (the default since MySQL 5.7.7): MySQL flushes the BINLOG to disk before each transaction is committed. Even if the host operating system crashes or the host suddenly loses power, at most the transactions that were still in the prepared state are lost. Set sync_binlog=1 for the strongest data safety.
sync_binlog=0: MySQL does not control binlog flushing itself; the file system decides when to flush its file cache.
sync_binlog=N (N greater than 1): the behavior is similar to sync_binlog=1, except that the BINLOG is flushed to disk only once every N binlog commit groups.
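A short sketch of inspecting and changing the setting at runtime; the value 100 is only an illustrative choice for N, not a recommendation.

```sql
-- Inspect the current value.
SHOW VARIABLES LIKE 'sync_binlog';

-- Safest setting: flush the binlog to disk for every commit group.
SET GLOBAL sync_binlog = 1;

-- Trade some durability for throughput: flush every 100 commit groups.
-- SET GLOBAL sync_binlog = 100;

-- SET GLOBAL does not survive a restart. To make the value permanent,
-- also put "sync_binlog = 1" under [mysqld] in the option file
-- (or use SET PERSIST on MySQL 8.0).
```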

The above is traditional asynchronous replication. Before parallel replication (also known as multi-threaded replication) arrived in MySQL 5.7, its efficiency was the most criticized aspect, and slave lag was a persistent problem. Schema-level parallel replication had appeared earlier, but it was not very effective in practice.

Multi-threaded replication

MySQL 5.7 introduced a new multi-threaded replication technique. It removes the limitation that changes made within the same schema on the master could not be applied concurrently on the slave, and it takes full advantage of binlog group commit, so that the slave can apply the relay log concurrently.
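A minimal sketch of enabling this mode on a MySQL 5.7 slave. The worker count of 4 is an arbitrary assumption; tune it to the workload.

```sql
-- On the slave: the applier (SQL) thread must be stopped before changing the mode.
STOP SLAVE SQL_THREAD;

-- Parallelize by binlog group commit information rather than by schema.
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';

-- Number of applier worker threads (4 is only an example).
SET GLOBAL slave_parallel_workers = 4;

START SLAVE SQL_THREAD;

-- Verify the settings.
SHOW VARIABLES LIKE 'slave_parallel%';
```

From MySQL 8.0.26 on, the same variables are also available under replica_parallel_* names.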

MySQL 8.0 improved multi-threaded replication further by introducing the concept of the writeset. In earlier versions, if a single session on the master executed several transactions on different objects in sequence, for example first updating table A and then updating table B, these two transactions could not be applied in parallel once the BINLOG had been replicated to the slave. The writeset removes this limitation.
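As a rough sketch, writeset-based dependency tracking is switched on at the master in MySQL 8.0 as shown below. This assumes row-based binary logging; these variables have been deprecated in later releases, so check the manual for your exact version before relying on them.

```sql
-- On the master: writesets are hashed with this algorithm
-- (XXHASH64 is the MySQL 8.0 default).
SHOW VARIABLES LIKE 'transaction_write_set_extraction';

-- On the master: mark transactions as parallelizable based on the rows
-- they touch, not only on which ones shared a commit group.
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

-- Verify the setting.
SHOW VARIABLES LIKE 'binlog_transaction_dependency_tracking';
```

The slave still needs LOGICAL_CLOCK parallel replication with more than one worker thread to benefit from the extra parallelism recorded in the binlog.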

Enhanced semi-synchronous replication

The replication described above is asynchronous, so there is inevitably some delay between the data on the master and on the slave. This carries a hidden risk: if a transaction has been written and committed successfully on the master, but the master goes down unexpectedly (disk damage, memory failure, power loss, etc.) before the slave has received the corresponding BINLOG, the BINLOG of that transaction is lost with the master. The slave then never receives the transaction, and the master and slave become inconsistent.

To solve this problem, MySQL 5.5 introduced semi-synchronous replication; the technique of that era is referred to here as traditional semi-synchronous replication. By MySQL 5.7 it had evolved into enhanced semi-synchronous replication (also called lossless replication). With asynchronous replication, the master returns success to the client as soon as it has executed the Commit and written the BINLOG, without waiting for the BINLOG to be transmitted to the slave, as shown in the figure.

With semi-synchronous replication, to ensure that every transaction in the master's BINLOG is reliably replicated, the master does not respond to the client as soon as a transaction commits. Instead, it waits until at least one slave (see the parameter rpl_semi_sync_master_wait_for_slave_count) has received the transaction's BINLOG events and written them to its relay log, and only then returns the result of the Commit to the client. Traditional and enhanced semi-synchronous replication share this goal; they differ only in the point at which the master waits, which is explained below.
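A minimal sketch of turning on semi-synchronous replication with the plugins shipped in MySQL 5.7. The .so file names assume a Linux build, and asynchronous replication is assumed to be running already between the two servers.

```sql
-- On the master: load and enable the master-side plugin.
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On the master: number of slave acknowledgments to wait for per transaction.
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- On each slave: load and enable the slave-side plugin.
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

-- On each slave: restart the I/O thread so that it reconnects
-- to the master as a semi-synchronous slave.
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
```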

Semi-synchronous replication ensures that once a transaction has been committed successfully, at least two log records of it exist: one in the BINLOG of the master and one in the relay log of at least one slave. This further protects data integrity.

In traditional semi-synchronous replication, the master writes the data to the BINLOG, executes the Commit, and then waits for an ACK from the slave. The slave sends the ACK after it has written the relay log and flushed it to disk, which tells the master that it can report success to the front-end application. This causes a problem: the master has already committed the transaction in the storage engine, so other sessions can already see the changed data while the committing client is still waiting for its response. If the master goes down at this moment, the slave may not yet have written the relay log, and the master and slave become inconsistent.

Enhanced semi-synchronous replication solves this problem by moving the wait point. After the master writes the data to the BINLOG, it waits for the ACK from the slave. Once at least one slave has written the relay log, flushed it to disk and returned the ACK, the master performs the Commit in the storage engine, and only then can applications see the changed data. The general flow of enhanced semi-synchronous replication is shown in the figure below.
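In MySQL 5.7 the two behaviors correspond to the values of the rpl_semi_sync_master_wait_point variable. A brief sketch, assuming the semi-synchronous master plugin is already installed:

```sql
-- On the master: show which wait point is in effect.
SHOW VARIABLES LIKE 'rpl_semi_sync_master_wait_point';

-- Enhanced (lossless) behavior: wait for the slave's ACK after the binlog
-- is synced and before the engine commit. This is the MySQL 5.7 default.
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- Traditional behavior: commit in the engine first, then wait for the ACK.
-- SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_COMMIT';
```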

In semi-synchronous mode, if the slave is down or the network is slow while the BINLOG is being transferred, the BINLOG may not reach the slave in time. The transaction on the master then waits for a period of time determined by the parameter rpl_semi_sync_master_timeout (in milliseconds). If the BINLOG still cannot be delivered to the slave within that period, MySQL automatically falls back to asynchronous replication and the transaction returns its commit result to the client as usual.
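A small sketch of setting the timeout and checking whether the master has fallen back to asynchronous mode. It assumes the semi-synchronous master plugin is installed; 10000 ms is the default value and is used here only as an example.

```sql
-- On the master: wait up to 10 seconds for a slave ACK before falling back.
SET GLOBAL rpl_semi_sync_master_timeout = 10000;

-- ON means semi-synchronous replication is currently active;
-- OFF means the master has fallen back to asynchronous replication.
SHOW STATUS LIKE 'Rpl_semi_sync_master_status';

-- Number of semi-synchronous slaves currently connected.
SHOW STATUS LIKE 'Rpl_semi_sync_master_clients';

-- Transactions acknowledged by a slave versus transactions that were not.
SHOW STATUS LIKE 'Rpl_semi_sync_master_yes_tx';
SHOW STATUS LIKE 'Rpl_semi_sync_master_no_tx';
```

A steadily growing Rpl_semi_sync_master_no_tx counter suggests that the master is often timing out while waiting for the slave.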

Semi-synchronous replication depends heavily on the network between the master and the slave. The smaller the round-trip time (RTT), the closer the slave stays to real time. In plain terms, the faster the network between master and slave, the more up to date the slave.

Note: RTT (Round-Trip Time) is an important performance metric in computer networks. It is the total time from when the sender starts sending data until it receives the acknowledgment from the receiver. If that is confusing, think of it as the first two steps of the TCP three-way handshake.

Source: cnblogs.com/itbsl/p/13507401.html