Didi interview: Talk about the MySQL master-slave data synchronization mechanism

Foreword

In the reader exchange group (50+) of Nien, a 40-year-old architect, some friends have recently landed interviews at first-tier Internet companies such as Didi, Alibaba, Autohome, Jitu, Youzan, Xiyin, Baidu, and NetEase, and ran into several very important interview questions about master-slave synchronization:

  • Let’s talk about the process of MySQL master-slave synchronization
  • Let’s talk about several ways of master-slave synchronization in MySQL
  • Talk about MySQL master-slave data synchronization mechanism

The importance of master-slave synchronization:

  • To solve the problem of data reliability, master-slave synchronization is required;
  • To solve the high availability problem of MySQL service, master-slave synchronization is required;
  • When dealing with high concurrency, master-slave synchronization is still needed.

Therefore, Nien will give you a systematic review here, so that you can fully show off your strong "technical muscles" and leave the interviewer so impressed that he "can't help himself and drools".

This question and the reference answers are also included in the V109 version of our "Nien Java Interview Guide", for later readers to use in improving their skills at three levels: architecture, design, and development.

Note: This article is continuously updated as a PDF. For the latest PDF files of the Nien architecture notes and interview questions, please obtain them from the public account [Technical Freedom Circle].

1. MySQL master-slave synchronization process

Between the moment a client submits a transaction to the MySQL cluster and the moment it receives a success response, the cluster has to perform many operations:

  • The main library needs to:
      • Commit the transaction
      • Update data in the storage engine
      • Write the Binlog to disk
      • Return a response to the client
      • Copy the Binlog to all slave libraries
  • Each slave library needs to:
      • Write the copied Binlog to a temporary log (the relay log)
      • Play back this Binlog
      • Update data in the storage engine
      • Return a successful copy response to the main library

The timing of these operations is very important. Timing here means the order in which the operations happen. The same set of operations, executed in a different order, behaves very differently from the application's point of view.

For example, suppose the Binlog is copied first, and the master node commits the transaction only after the Binlog has reached the slave node. In this case, the Binlog of the slave node is always in sync with the master node, and no data will be lost no matter how the master node goes down.

However, if the timing is reversed and the transaction is committed first and then the Binlog is copied, the performance will be greatly improved, but the risk of data loss will also increase.
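The difference between the two orderings can be sketched with a toy model. This is only an illustration, not how MySQL is implemented: the step names and the functions acknowledged_but_lost and risky_crash_points are made up for this example. It assumes the master crashes after some number of steps and asks whether the client has already been told "success" while the Binlog has not yet reached a slave.

```python
# Toy model: each ordering is a list of steps for ONE transaction on the master.
# The step names are illustrative only, not MySQL internals.
ORDERINGS = {
    "replicate_first": ["write_binlog", "replicate", "commit", "respond"],
    "commit_first": ["write_binlog", "commit", "respond", "replicate"],
}


def acknowledged_but_lost(ordering, steps_before_crash):
    """True if the client saw success but no slave has the Binlog."""
    done = ORDERINGS[ordering][:steps_before_crash]
    return "respond" in done and "replicate" not in done


def risky_crash_points(ordering):
    """List every crash point at which an acknowledged transaction is lost."""
    total_steps = len(ORDERINGS[ordering])
    return [n for n in range(total_steps + 1)
            if acknowledged_but_lost(ordering, n)]


print(risky_crash_points("replicate_first"))  # prints []
print(risky_crash_points("commit_first"))     # prints [3]
```

With "replicate_first" there is no crash point at which an acknowledged transaction is missing from the slave. With "commit_first" there is a window right after the response is returned and before the Binlog is copied.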

MySQL provides several parameters to configure this timing. Let's first take a look at what the default timing looks like.

2. Three ways of master-slave synchronization

1. Asynchronous replication

By default, MySQL uses asynchronous replication, and the thread that performs transaction operations will not wait for the thread that copies the Binlog.

After the client submits a transaction request to the MySQL main library, the main library will first record the transaction to Binlog, then submit the transaction and update the data of the storage engine. After the transaction is successfully submitted, a success response is returned to the client.

At the same time, the slave library will open a dedicated replication thread, receive the Binlog of the master library, write it to the relay log, and then return a successful copy response to the master library.

In addition, the slave library also has a Binlog playback thread, which reads the relay log and plays back the Binlog to update the data of the storage engine. This process has little to do with the master-slave replication timing we are discussing today, so it is not covered in detail here.

The two processes of transaction submission and replication are executed independently in different threads without waiting for each other. This is asynchronous replication.

Once we understand the order of operations in asynchronous replication, it becomes easier to understand the causes of some common problems.

For example, under asynchronous replication, why is there a risk of data loss when the main database goes down? Why does read-write separation lead to reading stale data?

These problems arise because asynchronous replication cannot guarantee that data is copied to the slave database immediately.

The advantage of asynchronous replication is superior performance, but the disadvantage is poor data security. At a certain moment, the data difference between the master and the slave may be large. If the master crashes, some data may be lost when the slave takes over.
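Here is a minimal sketch of that lag, under simplified assumptions: the Primary and Replica classes are hypothetical stand-ins, the "Binlog" is just a Python list, and the slave copies events only when pull is called. It is meant to show why a failover can lose acknowledged transactions, not to mirror the real replication threads.

```python
class Primary:
    """Stand-in for the master: commit returns without waiting for any slave."""

    def __init__(self):
        self.binlog = []

    def commit(self, txn):
        self.binlog.append(txn)
        return "OK"  # the client is answered immediately


class Replica:
    """Stand-in for a slave that copies Binlog events at its own pace."""

    def __init__(self):
        self.relay_log = []

    def pull(self, primary, max_events):
        start = len(self.relay_log)
        self.relay_log.extend(primary.binlog[start:start + max_events])


def lost_on_failover(primary, replica):
    """Transactions acknowledged by the master that the slave never received."""
    return primary.binlog[len(replica.relay_log):]


primary, replica = Primary(), Replica()
for txn in ["t1", "t2", "t3"]:
    primary.commit(txn)
replica.pull(primary, max_events=1)  # the slave has only caught up with t1

print(lost_on_failover(primary, replica))  # prints ['t2', 't3']
```

If the master crashes at this moment and the slave takes over, t2 and t3 are gone even though the client was told they succeeded. Reads sent to the slave before it catches up would also miss them.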

2. Synchronous replication

The difference between fully synchronous replication and semi-synchronous replication is that fully synchronous replication must receive acks from all slave databases before committing the transaction.

Synchronous replication is basically unusable in actual projects for two reasons:

  • First, the performance is very poor, because the Binlog must be copied to all nodes before the response is returned;
  • Second, the availability is also very poor. If any one of the databases, the main database or any slave database, has a problem, the business will be affected.

Fully synchronous replication has the best data consistency, but the performance is also the worst.
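The availability point can be made concrete with a little arithmetic. This is a rough estimate under an assumption that is not from the original text: every node has the same availability and nodes fail independently. The function name is made up for this example.

```python
def write_availability_full_sync(node_availability, slave_count):
    """Writes succeed only when the master AND every slave are up.

    Assumes identical, independent nodes, which real clusters only approximate.
    """
    return node_availability ** (slave_count + 1)


# One master plus three slaves, each assumed to be 99.9% available.
print(round(write_availability_full_sync(0.999, 3), 4))  # prints 0.996
```

Under these assumptions, every slave added to a fully synchronous cluster lowers write availability a little further, which is the opposite of what we usually want from adding replicas.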

3. Semi-synchronous replication

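To solve this problem, MySQL introduced semi-synchronous replication as a plugin in version 5.5, and enhanced it in version 5.7 (for example, by adding the option to wait for the slave ACK before the transaction is committed).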

  • In asynchronous replication, the transaction thread does not need to wait for the replication response at all;
  • In synchronous replication, the transaction thread must wait for all replication responses;
  • Semi-synchronous replication sits between the two. The transaction thread does not need to wait for all replication responses. It only needs some of them to come back before it can respond to the client.

After the master writes the update to the Binlog, it sends the Binlog to the slaves. After receiving the update, a slave writes it to its Relay Log and then sends an ACK. The master only needs to receive at least one ACK before it can commit the transaction.

It can be found that compared with asynchronous replication, semi-synchronous replication requires at least one slave to write Binlog to Relay Log, which reduces performance to a certain extent, but can ensure that at least one slave database is consistent with the master's data, thereby improving data safety.

Semi-synchronous replication combines the advantages of asynchronous replication and synchronous replication. If the main database goes down, at least one slave database will have the latest data, so the risk of data loss is greatly reduced (although, as discussed below, not eliminated in every situation).

Moreover, the performance of semi-synchronous replication is not bad, and it can also provide high availability guarantee. The downtime of the slave database will not affect the services provided by the main database. Therefore, this compromised replication method, semi-synchronous replication, is also a good choice.
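The trade-off among the three modes can be sketched as a question of how long the transaction thread waits. In this simplified model, which is an illustration rather than MySQL's actual behavior, each slave has a fixed ACK latency, and the function commit_wait_ms and its parameters are made-up names.

```python
def commit_wait_ms(ack_latencies_ms, mode, required_acks=1):
    """How long the transaction thread waits for replication in each mode."""
    if mode == "async":
        return 0  # never waits for any slave
    ordered = sorted(ack_latencies_ms)
    if mode == "sync":
        return ordered[-1]  # waits for the slowest slave
    if mode == "semi_sync":
        return ordered[required_acks - 1]  # waits for the N-th fastest ACK
    raise ValueError(f"unknown mode: {mode}")


latencies = [5, 40, 200]  # hypothetical ACK latencies of three slaves, in ms
for mode in ("async", "semi_sync", "sync"):
    print(mode, commit_wait_ms(latencies, mode))
```

In this model, semi-synchronous replication with one required ACK pays only the latency of the fastest slave, while fully synchronous replication always pays for the slowest one. Requiring as many ACKs as there are slaves makes the two identical.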

3. Things to note about semi-synchronous replication

Next, we will introduce to you several issues that require special attention when choosing semi-synchronous replication in practical applications.

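When configuring semi-synchronous replication, there is a key parameter, rpl_semi_sync_master_wait_for_slave_count, which means: "Wait for at least this many slave nodes to acknowledge the replicated data before returning." (Do not confuse it with rpl_semi_sync_master_wait_no_slave, which controls whether the master keeps waiting until the timeout when the number of connected slaves drops below that count.)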

The larger the value set for this parameter, the smaller the risk of data loss, but the performance and availability of the cluster will be reduced accordingly. The maximum value can be set to the same number of slave nodes, which becomes synchronous replication.

Normally, the default value of 1 is sufficient, which can minimize performance loss while ensuring high availability. As long as there is still a slave library running normally, it will not affect the read and write operations of the main library. The risk of data loss is also small. Unless there is a problem with the master database and the slave database with the latest data at the same time, data loss may occur.

Another important parameter is rpl_semi_sync_master_wait_point, which controls whether the thread executing the transaction on the main library waits for replication before committing the transaction (AFTER_SYNC) or after committing it (AFTER_COMMIT). The default value in MySQL 5.7 is AFTER_SYNC, which means waiting for replication before committing the transaction, so a transaction that has been acknowledged to the client is already on a slave. AFTER_COMMIT has better performance and does not hold locks for as long, but there is still a risk of data loss if the host crashes.
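As a rough illustration, the small helper below builds the statements one might run on the master to set these two parameters. Some assumptions: the semi-sync plugin is already installed on the master; the variable names are the MySQL 5.7 ones (the ACK count is rpl_semi_sync_master_wait_for_slave_count, and newer 8.0 releases also offer rpl_semi_sync_source_* equivalents); the function semi_sync_settings itself is made up for this example and only produces strings, it does not connect to any server.

```python
VALID_WAIT_POINTS = ("AFTER_SYNC", "AFTER_COMMIT")


def semi_sync_settings(required_acks=1, wait_point="AFTER_SYNC"):
    """Build SET GLOBAL statements for the master (MySQL 5.7 variable names)."""
    if required_acks < 1:
        raise ValueError("required_acks must be at least 1")
    if wait_point not in VALID_WAIT_POINTS:
        raise ValueError(f"wait_point must be one of {VALID_WAIT_POINTS}")
    return [
        "SET GLOBAL rpl_semi_sync_master_enabled = 1;",
        f"SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = {required_acks};",
        f"SET GLOBAL rpl_semi_sync_master_wait_point = '{wait_point}';",
    ]


# Print the statements for the defaults discussed above: one ACK, AFTER_SYNC.
for statement in semi_sync_settings():
    print(statement)
```

Settings applied with SET GLOBAL do not survive a restart, so in practice the same values would normally also go into the server configuration file.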

In addition, although we set up synchronous or semi-synchronous replication and wait for the replication to succeed before committing the transaction, there is still a situation that is easily overlooked and may lead to the risk of data loss.

If the thread that commits the transaction on the main database waits for replication longer than the configured threshold (rpl_semi_sync_master_timeout), the transaction will still be committed normally. In addition, MySQL will automatically downgrade to asynchronous replication, and it returns to semi-synchronous replication only after enough slave databases (rpl_semi_sync_master_wait_for_slave_count) have caught up with the master database. If the main database crashes during this period, there is still a risk of data loss.
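The downgrade can be sketched as follows. This is a simplified model rather than MySQL's implementation: each slave is represented only by a hypothetical ACK latency, and commit_with_timeout is a made-up function that reports how long the transaction thread waited and which mode the master is in afterwards.

```python
def commit_with_timeout(ack_latencies_ms, required_acks, timeout_ms):
    """Return (wait_ms, mode_after_commit) for one transaction."""
    ordered = sorted(ack_latencies_ms)
    enough_acks_in_time = (
        len(ordered) >= required_acks
        and ordered[required_acks - 1] <= timeout_ms
    )
    if enough_acks_in_time:
        return ordered[required_acks - 1], "semi_sync"
    # Not enough ACKs before the timeout: commit anyway and fall back.
    return timeout_ms, "async"


print(commit_with_timeout([5, 40], required_acks=1, timeout_ms=10000))
print(commit_with_timeout([20000], required_acks=1, timeout_ms=10000))
print(commit_with_timeout([], required_acks=1, timeout_ms=10000))
```

In the second and third calls the transaction still commits, but only after waiting out the full timeout, and from that point on the master behaves like plain asynchronous replication. That is the window in which a crash can lose data.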

Closing remarks

Questions about master-slave synchronization come up very often in interviews.

If everyone can answer the above content fluently and thoroughly, the interviewer will basically be shocked and attracted by you.

Before the interview, it is recommended that you systematically review the 5,000-page " Nien Java Interview Guide PDF ". If you have any questions during the question review process, you can come to talk to Nien, a 40-year-old architect.

In the end, the interviewer will love it so much that he "can't help himself and drools". The offer is coming.

Recommended reading

" Ten billions of visits, how to design a cache architecture "

" Multi-level cache architecture design "

" Message Push Architecture Design "

" Alibaba 2: How many nodes do you deploy? How to deploy 1000W concurrency? "

" Meituan 2 Sides: Five Nines High Availability 99.999%. How to achieve it?"

" NetEase side: Single node 2000Wtps, how does Kafka do it?"

" Byte Side: What is the relationship between transaction compensation and transaction retry?"

" NetEase side: 25Wqps high throughput writing Mysql, 100W data is written in 4 seconds, how to achieve it?"

" How to structure billion-level short videos? "

" Blow up, rely on "bragging" to get through JD.com, monthly salary 40K "

" It's so fierce, I rely on "bragging" to get through SF Express, and my monthly salary is 30K "

" It exploded...Jingdong asked for 40 questions on one side, and after passing it, it was 500,000+ "

" I'm so tired of asking questions... Ali asked 27 questions while asking for his life, and after passing it, it's 600,000+ "

" After 3 hours of crazy asking on Baidu, I got an offer from a big company. This guy is so cruel!"

" Ele.me is too cruel: Face an advanced Java, how hard and cruel work it is "

" After an hour of crazy asking by Byte, the guy got the offer, it's so cruel!"

" Accept Didi Offer: From three experiences as a young man, see what you need to learn?"

"Nien Architecture Notes", "Nien High Concurrency Trilogy", "Nien Java Interview Guide" PDF: please get them from the official account [Technical Freedom Circle].


Origin blog.csdn.net/crazymakercircle/article/details/133046659