Undo Log and Redo Log, Explained Clearly This Time

Transactions and ACID

When we learn about databases, we often see the terms transaction and ACID.

What is a transaction?

In a database system, a transaction refers to a complete logical process consisting of a series of database operations.

For example, a bank transfer:
1. Deduct the amount from the original account;
2. Add the amount to the target account.

The sum of these two database operations constitutes a complete logical process that cannot be split. This process is called a transaction and has ACID properties.
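The transfer above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `accounts` table, the `transfer` and `balances` helpers, and the account names are made up for this example. The `with conn:` block commits if the body finishes and rolls back if an exception escapes it, so the two updates succeed or fail together.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False  # the deduction above was rolled back

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` fails the balance check, and the deduction that already ran inside the transaction is undone along with it.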

So what is ACID?

The definition of ACID on Wikipedia is as follows:

ACID refers to the four properties that a database management system (DBMS) must have to ensure that transactions are correct and reliable when writing or updating data: atomicity (also called indivisibility), consistency, isolation (also called independence), and durability.

  • Atomicity: Within one business process, a transaction guarantees that multiple data modifications either all succeed or are all undone together. For example, a transfer either succeeds or fails; there is no such thing as half a transfer.
  • Isolation: Across different business processes, transactions ensure that the data each one is reading and writing is independent and does not affect the others. Databases generally offer four isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable.
  • Durability: A transaction ensures that all successfully committed modifications are correctly persisted, that is, saved to disk without data loss.
  • Consistency: The data in the system is correct, different pieces of data do not contradict each other, and the results are consistent.

In fact, atomicity, isolation, and durability all ultimately serve data consistency.

How to Achieve Atomicity and Durability

Atomicity ensures that the operations in a transaction either all succeed or all fail, with no half-success. Durability ensures that once a transaction takes effect, its data will not be changed back or lost for any reason.

So how can atomicity and durability be achieved?

The obvious idea is this: wouldn't it be enough for the database to write the data to disk?

Yes, that is the right approach, but the problem is that "write to disk" is not an atomic operation. A write can be in one of several states: not yet written, in progress, succeeded, or even failed.

A transaction also often includes multiple operations. For example, when we place an order online, it generally involves debiting money from our account, adding money to the merchant's account, reducing the product's inventory, and so on. These operations belong to one transaction, which means they must either all succeed or all fail.

Crash Recovery

Suppose 100 yuan is deducted from our account and that operation is successfully written to disk. Then, just as we are adding 100 yuan to the merchant's account, the system crashes (how unlucky?) or the power goes out (surely not?), and the write fails (isn't that actually common?).
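A toy simulation makes the damage concrete. It is a sketch under loud assumptions: a Python dict stands in for the disk, and `Crash`, `apply_without_log`, and the account names are invented for illustration. Writes go straight to the "disk" with no log, and the crash is injected between the two steps.

```python
class Crash(Exception):
    """Stands in for a power failure or process crash (hypothetical)."""

def apply_without_log(disk, writes, crash_after=None):
    """Apply each (key, delta) directly to `disk`, crashing before write
    number `crash_after` if it is given."""
    for i, (key, delta) in enumerate(writes):
        if crash_after is not None and i == crash_after:
            raise Crash(f"crashed before write {i}")
        disk[key] += delta

disk = {"buyer": 500, "merchant": 0}
order = [("buyer", -100), ("merchant", +100)]

try:
    apply_without_log(disk, order, crash_after=1)  # crash between the two writes
except Crash:
    pass

# The deduction reached the disk, the credit did not: 100 yuan has vanished.
total = disk["buyer"] + disk["merchant"]
```

After the crash nothing on the "disk" says which operations were still pending, which is exactly the information the database needs in order to recover.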

To prevent this, the database needs a way to know what the complete operation looked like before the crash. Then, after the server recovers, the database can rewrite the data that did not reach the disk in time and add the 100 yuan to the merchant's account, completing the unfinished business.

So the question is, how does the database know all the information about previous transactions after the system is restored?

As the saying goes, the palest ink is better than the best memory. So why not just write it down first?

Redo Log

This requires the database, before it modifies the real data on disk, to record every operation of the transaction: what data is modified, which memory page and disk block the data physically lives in, what value is changed to what value, and so on. All of this is first written to disk in the form of a log.

Only after all of the log records have safely reached the disk does the database append a "Commit Record" at the end, which means all of the operation records have been written.

At this point, the database modifies the real data according to the information in the log. Once the modification is complete, it appends an "End Record" to the log, indicating that every step in the log has been carried out and that the work of persisting the transaction is done.

This transaction implementation method is called "Commit Logging".

This method achieves durability and atomicity as follows:

First, once the Commit Record has been successfully written to the log, all information about the transaction is in the log. If the system crashes while the data is being modified, then after restarting the database only has to repeat the operations according to the log. This guarantees durability.

Second, if the system crashes before the log is finished, then after restarting the database sees that there is no Commit Record in the log, which means the log is incomplete. It marks that part of the log as rolled back, and the entire transaction is rolled back. This guarantees atomicity.

In other words, I first record the things I want to change in the log, and then write them to disk according to the log. If I faint while writing to disk, the first thing I do when I wake up is check whether the log is complete.

If the log is complete and contains a Commit Record, I do everything again according to the log, and it succeeds in the end. If the log is incomplete and contains no Commit Record, I roll back the entire transaction and do nothing.

This log is called the Redo Log. When a database crashes partway through, the transaction is redone based on this log.
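The Commit Logging procedure and its recovery rule can be sketched in a few lines. This is a simplified model, not how any particular database implements it: Python lists and dicts stand in for the log file and the data files, and the names `run_transaction`, `recover`, `crash_then_recover`, and the record tags `SET`, `COMMIT`, and `END` are all assumptions made for the example.

```python
class Crash(Exception):
    """Stands in for a power failure or process crash (hypothetical)."""

def run_transaction(log, disk, txid, changes, crash_at=None):
    # 1. Record every intended change in the log before touching real data.
    for key, new_value in changes.items():
        log.append(("SET", txid, key, new_value))
    if crash_at == "before_commit":
        raise Crash()
    # 2. The Commit Record says the log for this transaction is complete.
    log.append(("COMMIT", txid))
    if crash_at == "after_commit":
        raise Crash()
    # 3. Only now modify the real data, then append the End Record.
    for key, new_value in changes.items():
        disk[key] = new_value
    log.append(("END", txid))

def recover(log, disk):
    """Redo committed-but-unfinished transactions; ignore uncommitted ones."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    ended = {rec[1] for rec in log if rec[0] == "END"}
    unfinished = committed - ended
    for rec in list(log):
        if rec[0] == "SET" and rec[1] in unfinished:
            _, _, key, new_value = rec
            disk[key] = new_value  # redo; setting the same value twice is harmless
    for txid in sorted(unfinished):
        log.append(("END", txid))
    # Transactions with no Commit Record never touched `disk`,
    # so ignoring their log records is the rollback.

def crash_then_recover(crash_at):
    """Run one order, optionally crash, then recover and return the data."""
    log, disk = [], {"buyer": 500, "merchant": 0}
    try:
        run_transaction(log, disk, 1, {"buyer": 400, "merchant": 100}, crash_at)
    except Crash:
        pass
    recover(log, disk)
    return disk
```

The log stores the new values rather than deltas, so replaying a record more than once leaves the data in the same state. That is what makes "just do it again according to the log" safe.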

Undo Log

However, relying on the Redo Log alone in this way has a problem: it is slow.

The reason is that every real change the database makes to the data must happen after the transaction is committed, that is, only after the Commit Record has been written to the log.

Even if the disk I/O is idle before the transaction commits, and even if a transaction modifies so much data that it occupies a large amount of memory buffer, the database is never allowed to start modifying the data on disk before the transaction is committed, whatever the reason. Otherwise, if the system crashed, who would be responsible for the corrupted data?

But when a transaction involves a particularly large amount of data, waiting for all changes to be written to the Redo Log and only then writing them to disk in one batch performs poorly. It will be very slow, and the boss will be unhappy.

Can we quietly write some data to disk before the transaction is committed?

The answer is yes. This is the STEAL strategy, but it creates a new problem. If data is written early and the transaction then needs to be rolled back, or the system crashes, the data written in advance becomes dirty data, and there must be a way to restore it.

This is where the Undo Log (rollback log) comes in. Before writing data early, the database must first record in the Undo Log which data was modified and from what value to what value. When the transaction is rolled back, the database follows the Undo Log and restores the data piece by piece to its original state, as if it had never been modified.
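A minimal sketch of this idea follows, with a Python dict standing in for the disk and a list standing in for the Undo Log. The helper names `steal_write` and `rollback` and the record layout are assumptions made for illustration, not any real database's format.

```python
def steal_write(disk, undo_log, txid, key, new_value):
    """Write uncommitted data to disk, logging the old value first."""
    undo_log.append((txid, key, disk[key], new_value))  # old value, then new
    disk[key] = new_value

def rollback(disk, undo_log, txid):
    """Restore the old values written by `txid`, newest record first."""
    for rec_txid, key, old_value, _new_value in reversed(undo_log):
        if rec_txid == txid:
            disk[key] = old_value

disk = {"buyer": 500, "merchant": 0}
undo_log = []

# Transaction 7 writes to disk before it has committed.
steal_write(disk, undo_log, 7, "buyer", 450)
steal_write(disk, undo_log, 7, "buyer", 400)
steal_write(disk, undo_log, 7, "merchant", 100)
dirty = dict(disk)  # what is on disk while the transaction is still open

# The transaction is abandoned, so its early writes are undone.
rollback(disk, undo_log, 7)
```

Walking the log backwards matters when the same key is written more than once: undoing 400 back to 450 and then 450 back to 500 ends at the original value, while the forward order would leave 450 behind.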

The Undo Log also has another function: implementing multi-version concurrency control (MVCC). When a row being read is locked by another transaction, the reader can obtain the row's previous data from the Undo Log, so a version of the row is still available to read.
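The following is a deliberately simplified sketch of reading an older row version from undo records. Real engines use more elaborate visibility rules. Here each row keeps its current value, the id of the transaction that wrote it, and a list of earlier versions; the field names and the helpers `update` and `read_committed` are hypothetical.

```python
def update(row, txid, value):
    """Overwrite the row, pushing the previous version onto its undo chain."""
    row["undo"].append((row["txid"], row["value"]))
    row["txid"], row["value"] = txid, value

def read_committed(row, committed):
    """Return the newest value whose writer is in the `committed` set."""
    if row["txid"] in committed:
        return row["value"]
    for txid, value in reversed(row["undo"]):  # newest old version first
        if txid in committed:
            return value
    return None  # no committed version exists

row = {"value": 100, "txid": 1, "undo": []}
committed = {1}

update(row, 2, 70)  # transaction 2 changes the row but has not committed
seen_before_commit = read_committed(row, committed)

committed.add(2)    # transaction 2 commits
seen_after_commit = read_committed(row, committed)
```

The reader never waits for transaction 2: while that transaction is open it is served the old value from the undo chain, and it sees the new value only after the commit.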

Summary

The difference between the Redo Log (redo log) and the Undo Log (rollback log) is not that deep; the names can be taken literally.

The Redo Log is used to restore data after a system crash, letting the database redo, according to the log, the work that was not finished. With the Redo Log, previously committed records are not lost even if the database crashes and restarts. This capability is called crash-safe.

The Undo Log is for rollback. Once data may be written before the transaction is committed, a transaction that ends up being rolled back rather than committed, or a system crash, turns the data written in advance into dirty data. The Undo Log is then used to restore it.

Writing the log before writing the data to disk is called Write-Ahead Logging (WAL). WAL improves performance at the cost of more complexity, but the results are very good. MySQL, SQLite, PostgreSQL, SQL Server, and other databases all implement a WAL mechanism.
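As one concrete illustration, SQLite exposes a write-ahead log as an opt-in journal mode, switched on with `PRAGMA journal_mode=WAL`. The sketch below uses Python's built-in sqlite3 module; the file name `demo.db` and the table `t` are arbitrary choices for the example. SQLite's WAL file has its own design and should not be read as a direct implementation of the Redo Log and Undo Log scheme described in this article.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")  # WAL mode needs a file-backed database
    conn = sqlite3.connect(path)

    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()  # in WAL mode the commit is appended to the log file first

    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
```

If the file system cannot support WAL, the pragma leaves the previous journal mode in place and returns its name instead, so checking the returned value is a reasonable habit.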

Origin blog.csdn.net/zhanyd/article/details/122582031