A Design Idea of MySQL Backup and Recovery

Background

Let me first explain the background. For various reasons, our company's current backup strategy is a next-day full backup, with incremental backups provided by a binlog server. How to restore data quickly has therefore become a problem we need to think about.

Recovery requirements

In my experience, there are usually the following scenarios in which data needs to be restored from backup:

  1. A database was dropped by mistake

  2. A table was deleted by mistake (TRUNCATE or DROP)

  3. A column was dropped by mistake (ALTER ... DROP COLUMN)

  4. Rows were deleted or changed by mistake (DELETE, UPDATE, or REPLACE)

  5. A tablespace was corrupted or developed bad blocks

These scenarios can be roughly divided into two categories:

  • The first category is irreversible recovery, typically caused by DDL, covering scenarios 1, 2, 3, and 5 above

  • The second category is reversible recovery, where the binlog can usually be used to roll back (this requires binlog_format=ROW and binlog_row_image=FULL); it corresponds to scenario 4 above

Generally speaking, the second category is relatively easy to handle: binlog rollback tools such as binlog2sql and MyFlash are well known in the industry, so I will not repeat them here. We will focus on the first category.

To achieve rapid recovery, DBAs in the industry often deploy a delayed slave, and all of our company's core DBs currently have one. But even with a delayed slave, suppose we miss the delay window, or specify the wrong position when rolling the delayed slave forward, so that the mistaken DDL is applied on the slave as well. At that point, the delayed slave, our life-saving straw, is of no use to us.

Full recovery (restoring on a different machine)

In that case, we can only restore data from backup. First we need to restore the full backup, usually a physical backup taken with xtrabackup. Assuming your backup is on a remote machine, a full recovery may require the following steps:

  1. Copy the backup to the target machine with scp or rsync

  2. If the backup file is compressed, decompress it

  3. After decompression, apply the redo log

  4. Fix file ownership and permissions

  5. If you copied the files directly into the target instance's datadir, you can start mysqld at this step; otherwise you need to move-back or copy-back the data files into the target instance's datadir first

  6. Start the instance
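The steps above can be sketched as a shell script. The hostnames, paths, and archive format are assumptions, and the script only echoes each command (a dry run) so it can be reviewed before being adapted:

```shell
#!/bin/bash
# Dry-run sketch of the full-recovery steps above.
# All hosts and paths are hypothetical; swap run() for real execution.
set -eu
CMDS=""
run() { CMDS+="$* "; echo "+ $*"; }   # record and print each command

BACKUP_HOST=backup01                   # assumed backup machine
BACKUP_FILE=/backups/full.tar.gz       # assumed xtrabackup archive
RESTORE_DIR=/data/restore
DATADIR=/var/lib/mysql                 # target instance datadir

# 1. copy the backup to the target machine
run rsync -av "$BACKUP_HOST:$BACKUP_FILE" /tmp/
# 2. decompress
run mkdir -p "$RESTORE_DIR"
run tar xzf /tmp/full.tar.gz -C "$RESTORE_DIR"
# 3. apply the redo log so the data files are consistent
run xtrabackup --prepare --target-dir="$RESTORE_DIR"
# 5. copy the prepared files into the instance datadir
run xtrabackup --copy-back --target-dir="$RESTORE_DIR" --datadir="$DATADIR"
# 4. fix ownership (done after copy-back so the copied files are covered)
run chown -R mysql:mysql "$DATADIR"
# 6. start the instance
run systemctl start mysqld
```

Replacing run() with direct execution turns the sketch into a real restore script; --prepare corresponds to applying the redo log, and --copy-back to moving the files into the datadir.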

Incremental recovery

At this point the full recovery is complete, and the next thing to do is incremental recovery. Under our backup plan, we use the binlog to restore the incremental data, which usually takes the following steps:

  1. Determine the binlog position corresponding to the full backup; this is the starting point of the recovery

  2. Parse the master's binlog to locate the accidental deletion; this is the end point of the recovery

  3. Use mysqlbinlog --start-position and --stop-position, piped into mysql, to apply the binlog to the target instance

There are several sources for the binlog: you can use the binlog on the original master or the binlog on the binlogserver. All you need to do is find the end point of the binlog replay.
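As a sketch of step 1, the start point can be read from the xtrabackup_binlog_info file that xtrabackup writes into the backup directory. A sample copy of the file is created here so the snippet runs standalone; the file name, positions, and GTID set are made up:

```shell
#!/bin/bash
# Read the recovery start point from xtrabackup_binlog_info, which holds
# "<binlog file> <position> [<gtid set>]" for the moment the backup was taken.
set -eu
cat > /tmp/xtrabackup_binlog_info <<'EOF'
mysql-bin.000123 107380 5a2b7c10-0000-0000-0000-000000000001:1-9000
EOF

START_FILE=$(awk '{print $1}' /tmp/xtrabackup_binlog_info)
START_POS=$(awk '{print $2}' /tmp/xtrabackup_binlog_info)
STOP_POS=4096000   # assumed position just before the bad statement (step 2)

echo "replay $START_FILE from $START_POS to $STOP_POS"
# Step 3, the actual (single-threaded) replay, would then be:
#   mysqlbinlog --start-position="$START_POS" --stop-position="$STOP_POS" \
#       "/data/binlog/$START_FILE" | mysql -uroot -p
```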

Incremental recovery optimization

At this point, you may feel that restoring with the binlog is a bit troublesome, and that is indeed the case. The mysqlbinlog command has no way to specify which GTID to restore up to; you can only parse the binlog and find the position corresponding to the target GTID, which makes automation harder. In addition, mysqlbinlog applies events single-threaded, so if the amount of binlog to replay is large, the time the incremental recovery takes can be imagined.
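Finding the position for a given GTID can at least be scripted: scan mysqlbinlog's text output, remember the last "# at <pos>" offset, and print it when the target GTID_NEXT line appears. A sample of mysqlbinlog output is inlined so the sketch runs standalone; the GTID and offsets are made up:

```shell
#!/bin/bash
# Locate the byte offset of a target GTID in mysqlbinlog text output.
# In practice you would pipe real output:  mysqlbinlog mysql-bin.000123 | awk ...
set -eu
TARGET_GTID='5a2b7c10-0000-0000-0000-000000000001:42'   # assumed GTID to stop at

POS=$(awk -v g="$TARGET_GTID" '
    /^# at /                     { pos = $3 }          # remember last event offset
    /GTID_NEXT/ && index($0, g)  { print pos; exit }   # offset of the target GTID event
' <<'EOF'
# at 194
SET @@SESSION.GTID_NEXT= '5a2b7c10-0000-0000-0000-000000000001:41'/*!*/;
# at 420
SET @@SESSION.GTID_NEXT= '5a2b7c10-0000-0000-0000-000000000001:42'/*!*/;
EOF
)
echo "use --stop-position=$POS to stop just before $TARGET_GTID"
```

Note the substring match is a simplification: a target like uuid:42 would also match uuid:420, so a real script should anchor the match on the closing quote.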

So is there a way to speed up applying the binlog? Here we think of MySQL 5.7's parallel replication: if we can use the SQL thread's parallel apply, will the problem be solved?

binlog recovery on the master

Let's go back to the end of the full recovery. Can we make the new instance a slave of the original master and replicate up to the specified GTID? Yes. This approach is simple and not prone to error, and it can also use parallel replication to speed up applying the binlog. Its one requirement is that the oldest binlog on the original master still contains the starting position we need. This is easy to check, so it becomes our preferred recovery method.
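A minimal sketch of this approach, assuming GTID-based replication is enabled: turn on MySQL 5.7's LOGICAL_CLOCK parallel replication on the restored instance, point it at the original master, and stop just before the bad GTID with START SLAVE UNTIL SQL_AFTER_GTIDS. The host, credentials, worker count, and GTID set are all assumptions, and the SQL is printed rather than executed:

```shell
#!/bin/bash
# Build the SQL that attaches the restored instance to the original master
# and replays in parallel up to (not past) the bad GTID. Dry run: printed only.
set -eu
STOP_GTID_SET='5a2b7c10-0000-0000-0000-000000000001:1-41'  # everything before the bad DDL

SQL=$(cat <<EOF
-- enable MySQL 5.7 logical-clock parallel replication on the restored instance
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_parallel_workers = 16;
-- attach to the original master using GTID auto-positioning
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.1', MASTER_PORT = 3306,
  MASTER_USER = 'repl', MASTER_PASSWORD = 'repl_pass',
  MASTER_AUTO_POSITION = 1;
-- replay and stop once everything up to the bad GTID has been applied
START SLAVE UNTIL SQL_AFTER_GTIDS = '$STOP_GTID_SET';
EOF
)
echo "$SQL"
# to run for real:  mysql -uroot -p -e "$SQL"
```

The same statements also work when recovering via a binlogserver masquerading as the master; only MASTER_HOST and MASTER_PORT change.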

binlog recovery on the binlogserver

Suppose the binlog we need has already been purged on the original master; then we have to recover from the binlogserver's binlog. Some people might think of copying the binlog from the binlogserver to the original master and editing the binlog index file to register it. In fact this is not advisable; for the specific reasons, see "Manually registering binlog files causes master-slave exceptions".

So what can we do? Use the binlogserver to masquerade as the master, and point the slave at it. The idea is to trick the slave: its io_thread pulls the missing binlog, and its sql_thread applies the binlog events in parallel (we will demonstrate this method in the next section).

Optimized recovery process

After optimization, our recovery process becomes: first restore via the binlog on the master; if that binlog has been purged, restore via the binlog on the binlogserver. I think this is a more reasonable recovery process.

Comparison of the timeliness of various recovery methods

Business recovery

At this point, we have finished restoring the full backup plus the incremental data. Now we need R&D to verify the data; once it is confirmed, the relevant tables are restored to the original master. The usual methods are:

  1. mysqldump export, then import into the target instance

  2. Transportable tablespaces
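A dry-run sketch of the mysqldump route; the hosts, database, and table names are assumptions, and the commands are echoed for review rather than executed:

```shell
#!/bin/bash
# Export the verified table from the restored instance, then load it into
# the original master. Dry run: commands are recorded and printed only.
set -eu
CMDS=""
run() { CMDS+="$* "; echo "+ $*"; }

RESTORE_HOST=10.0.0.8      # instance we just recovered (assumed)
MASTER_HOST=10.0.0.1       # original master (assumed)
DB=app                     # assumed database name
TABLE=orders               # assumed table name

run "mysqldump -h $RESTORE_HOST --single-transaction $DB $TABLE > /tmp/$TABLE.sql"
run "mysql -h $MASTER_HOST $DB < /tmp/$TABLE.sql"
```

The transportable-tablespace route instead uses FLUSH TABLES ... FOR EXPORT on the restored instance and ALTER TABLE ... DISCARD/IMPORT TABLESPACE on the master; it avoids the dump/reload cost for large tables.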

Summary

This section mainly introduced the design process of backup and recovery. Since we have no way to optimize the full recovery itself, we optimized the incremental recovery method and process to shorten the recovery time. One point that needs to be made: I have not fully tested everything in this section and do not guarantee that every point is correct; further verification is needed. Once it passes verification, I will let you know, and we will integrate it with our existing database operations platform to achieve automatic recovery.

Finally, a few reminders:

  1. Data is an intangible asset; be sure to back it up and verify the backups

  2. If possible, deploy delayed slaves

  3. Make a recovery plan in advance, so you are not caught scrambling when you need to recover

  4. Choose the appropriate recovery method for the scenario, and shorten the recovery time as much as possible

The full text is over.

Enjoy MySQL :)



Origin blog.csdn.net/n88Lpo/article/details/108722023