How transaction atomicity, consistency, and durability are implemented

Preface

Everyone knows that transactions have four characteristics:

  • Atomicity

    Atomicity means that the whole database transaction is an indivisible unit of work. The transaction succeeds only if every database operation in it succeeds. If any SQL statement in the transaction fails, the statements that already succeeded must be undone as well, and the database must return to the state it was in before the transaction started.

  • Consistency

    Consistency means that a transaction moves the database from one consistent state to the next. Before the transaction starts and after it ends, the integrity constraints of the database are intact.

  • Isolation

    The effects of a transaction are not visible to other transactions until it commits. This is achieved by locking.

  • Durability

    Once a transaction is committed, its result is permanent. Even after a failure such as a crash, the database can recover the committed data.
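
The atomicity and consistency properties above can be observed in a few lines. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of InnoDB, purely so that it runs without a server; the account table, the names and the amounts are made up for illustration. The second transfer fails on its second statement, and the first statement, which had already succeeded, is cancelled too.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint if src is too poor.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both statements succeed
failed = transfer(conn, "alice", "bob", 500)  # second statement fails
print(ok, failed, balances(conn))
```

After the failed transfer, bob's balance is back to what it was before that transaction started, and the CHECK constraint still holds for every row.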

For how isolation is implemented, see my other article: A brief analysis of InnoDB locks and transactions

This article covers how the other three properties are achieved: through the database's redo and undo logs. Unless stated otherwise, everything here refers to the InnoDB storage engine of MySQL.

Implementation

Basic knowledge

  1. The redo log records the physical changes made to each page.
  2. The undo log differs from the redo log: the redo log is a physical log, while the undo log is a logical log. Roughly speaking, when a record is deleted, the undo log stores a corresponding insert, and vice versa; when a record is updated, it stores a corresponding reverse update.
  3. LSN stands for log sequence number. In the InnoDB storage engine an LSN occupies 8 bytes. Its value increases as the log is written, so each update operation in a transaction produces a new LSN. The LSN is stored not only in the redo log but also in the data pages.
  4. A checkpoint is the point at which dirty pages are written to disk.
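
To make point 2 concrete, here is a small sketch of what "logical" means for undo: each operation is paired with its inverse, and rolling back means applying the inverses in reverse order. RowOp, undo_record and apply_op are hypothetical names invented for this illustration; InnoDB's real undo records have a different, more compact layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowOp:
    kind: str                  # "insert", "delete" or "update"
    key: int
    old: Optional[str] = None  # row value before the operation
    new: Optional[str] = None  # row value after the operation


def undo_record(op: RowOp) -> RowOp:
    """Return the logical inverse of an operation."""
    if op.kind == "insert":
        return RowOp("delete", op.key, old=op.new)
    if op.kind == "delete":
        return RowOp("insert", op.key, new=op.old)
    if op.kind == "update":
        return RowOp("update", op.key, old=op.new, new=op.old)
    raise ValueError(f"unknown operation: {op.kind}")


def apply_op(table: dict, op: RowOp) -> None:
    if op.kind == "delete":
        del table[op.key]
    else:  # insert and update both store the new value
        table[op.key] = op.new


table = {1: "a"}
ops = [
    RowOp("update", 1, old="a", new="b"),
    RowOp("insert", 2, new="c"),
    RowOp("delete", 1, old="b"),
]

undo_log = []
for op in ops:
    undo_log.append(undo_record(op))  # record the inverse before changing data
    apply_op(table, op)
after = dict(table)

for inverse in reversed(undo_log):    # rollback: newest change first
    apply_op(table, inverse)
print(after, table)
```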

Overall process

Both logs and data are modified in the buffer and then synchronized to the disk.

  1. After the transaction starts, if the data to be modified is not in the buffer pool, it is read from disk into the buffer pool.

  2. Before the data in memory is modified, the original data is recorded in the undo log.

  3. The data page in memory is modified and the LSN is recorded in the in-memory data page; call it data_in_buffer_lsn for now.

  4. At (almost) the same time as the data page is modified, a redo record is written to the redo log in the buffer together with the corresponding LSN; call it redo_log_in_buffer_lsn for now.

  5. Log flushing and data flushing

    The log is flushed:

    • When a commit is issued

    • Once per second

    • When more than half of the log buffer is in use

    • At a checkpoint

    Data is flushed:

    • At a checkpoint

    The log is flushed more often than the data. In addition, even during a checkpoint, InnoDB ensures that the log is written before the data. This approach is called Write-Ahead Logging (WAL). The InnoDB storage engine guarantees the integrity of transactions by writing the log first.

    Write-ahead logging can guarantee integrity because both the log and the data pages carry LSNs, so the order of the changes recorded in the log and in the data pages can be compared through the LSN. The log file is the real core.
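
Steps 1 to 5 and the write-ahead rule can be put together in a toy model. This is only a sketch under strong simplifying assumptions: MiniEngine and all of its members are hypothetical names, a "page" holds a single value, and the LSN grows by 1 per change, whereas in InnoDB it grows with the amount of log written. The one thing it tries to show faithfully is that a page never reaches disk before the redo records describing it.

```python
class MiniEngine:
    """Toy model: changes happen in memory first, then reach 'disk'."""

    def __init__(self):
        self.lsn = 0            # last LSN handed out
        self.buffer_pages = {}  # page_id -> (value, page_lsn), in memory
        self.disk_pages = {}    # page_id -> (value, page_lsn), on disk
        self.log_buffer = []    # redo records (lsn, page_id, value) in memory
        self.log_on_disk = []   # redo records already flushed
        self.undo_log = []      # (page_id, old_value)

    def flushed_log_lsn(self):
        return self.log_on_disk[-1][0] if self.log_on_disk else 0

    def update(self, page_id, value):
        # Step 1: load the page into memory if it is not there yet.
        if page_id not in self.buffer_pages:
            self.buffer_pages[page_id] = self.disk_pages.get(page_id, (None, 0))
        # Step 2: remember the old value before touching the page.
        old_value, _ = self.buffer_pages[page_id]
        self.undo_log.append((page_id, old_value))
        # Step 3: modify the in-memory page and stamp it with the new LSN.
        self.lsn += 1
        self.buffer_pages[page_id] = (value, self.lsn)
        # Step 4: append the redo record, with the same LSN, to the log buffer.
        self.log_buffer.append((self.lsn, page_id, value))

    def flush_log(self):
        self.log_on_disk.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self):
        self.flush_log()  # a commit flushes the log, not the data pages

    def flush_page(self, page_id):
        value, page_lsn = self.buffer_pages[page_id]
        # Write-ahead rule: the log must cover the page before the page is written.
        if page_lsn > self.flushed_log_lsn():
            self.flush_log()
        self.disk_pages[page_id] = (value, page_lsn)


engine = MiniEngine()
engine.update("p1", "x")
engine.update("p2", "y")
engine.flush_page("p2")  # forces the log to disk first
print(engine.flushed_log_lsn(), engine.disk_pages)
```

Comparing a page's LSN with the flushed log LSN is all that flush_page needs in order to decide whether the log has to go first, which is the role the article assigns to the LSN.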

Example

Source: Detailed analysis of MySQL transaction log (redo log and undo log)

[Figure: timeline of the LSN values at positions ① to ⑨]
In the figure above, the horizontal lines represent, from top to bottom: the time axis, the LSN recorded in the data page in the buffer (data_in_buffer_lsn), the LSN recorded in the data page on disk (data_page_on_disk_lsn), the LSN recorded in the redo log in the buffer (redo_log_in_buffer_lsn), the LSN recorded in the redo log file on disk (redo_log_on_disk_lsn), and the LSN recorded by the checkpoint (checkpoint_lsn).

Assume that at the beginning (12:00:00) all log pages and data pages have been flushed and the checkpoint LSN has been recorded. At this point all the LSNs are identical.

Suppose a transaction starts now and immediately performs an update. When the update finishes, the data page and the redo log in the buffer record the new LSN, assumed here to be 110. If you run show engine innodb status at this point to view the LSN values, which is position ① in the figure, the result is:

log sequence number(110) > log flushed up to(100) = pages flushed up to = last checkpoint at
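
To check such relationships on a real server, the four values can be pulled out of the LOG section of the status output. The sketch below assumes the section layout printed by MySQL 5.7 (newer versions print additional lines); the sample text is hand-written with this article's illustrative numbers, not captured from a server, and parse_lsns is a hypothetical helper.

```python
import re

# Hand-written sample in the assumed MySQL 5.7 layout, using the values at ①.
SAMPLE_LOG_SECTION = """\
---
LOG
---
Log sequence number 110
Log flushed up to   100
Pages flushed up to 100
Last checkpoint at  100
"""

FIELDS = (
    "Log sequence number",
    "Log flushed up to",
    "Pages flushed up to",
    "Last checkpoint at",
)


def parse_lsns(status_text):
    """Extract the four LSN values from the LOG section text."""
    values = {}
    for field in FIELDS:
        match = re.search(
            rf"^{re.escape(field)}\s+(\d+)", status_text, re.MULTILINE
        )
        if match:
            values[field] = int(match.group(1))
    return values


lsns = parse_lsns(SAMPLE_LOG_SECTION)
print(lsns)
```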

Then a delete statement is executed and the LSN increases to 150. At 12:00:01 a redo log flushing rule is triggered (one of the rules is the periodic flush, whose interval is controlled by innodb_flush_log_at_timeout and defaults to 1 second). The LSN in the redo log file on disk is then updated to match the LSN of the redo log in the buffer, so both equal 150. Running show engine innodb status at this point, position ② in the figure, gives:

log sequence number(150) = log flushed up to > pages flushed up to(100) = last checkpoint at

After that, an update statement is executed and the LSN in the buffer increases to 300, which is position ③ in the figure.

Suppose a checkpoint then occurs, position ④ in the figure. As mentioned earlier, a checkpoint triggers flushing of both data pages and log pages, but this takes some time to complete. Until the data page flushing is finished, the checkpoint LSN is still the LSN of the previous checkpoint, while the LSNs of the data pages and log pages on disk have already increased, namely:

log sequence number > log flushed up to and pages flushed up to > last checkpoint at

However, the relative order of log flushed up to and pages flushed up to cannot be determined at this moment, because log flushing may be faster than, equal to, or slower than data flushing. The checkpoint mechanism nevertheless ensures that data flushing never gets ahead of log flushing: when data flushing overtakes log flushing, it is paused until the log flushing progress has passed it.

When the data page and log page are flushed, that is, when the position ⑤ is reached, all LSNs are equal to 300.

As time passes, 12:00:02 is reached, position ⑥ in the figure, and the log flushing rule is triggered again. But the log LSN in the buffer already equals the log LSN on disk, so no log flush is performed; show engine innodb status shows all LSNs equal at this point.

Then an insert statement is executed; assume the LSN in the buffer increases to 800, which is position ⑦ in the figure. At this point the relationship among the LSNs is the same as at position ①: the LSN in the buffer is ahead, and the other LSNs are still equal to each other.

A commit is then executed, position ⑧. By default, a commit triggers log flushing but not data flushing, so show engine innodb status shows:

log sequence number = log flushed up to > pages flushed up to = last checkpoint at

Finally, as time passes, another checkpoint occurs, position ⑨ in the figure. This checkpoint does not trigger log flushing, because the log LSN was already synchronized before the checkpoint. Assuming the data flushing is so fast this time that it completes within an instant and the intermediate state cannot be observed, show engine innodb status shows all LSNs equal.
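
The whole walk-through can be replayed as a tiny simulation. This is a sketch of the article's simplified model only: the Lsns class and its methods are hypothetical, the starting value 100 and the values 110, 150, 300 and 800 are the ones assumed in the text, and the intermediate state at ④ (checkpoint in progress) is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Lsns:
    log_sequence_number: int = 100  # LSN in the buffer
    log_flushed_up_to: int = 100    # redo log on disk
    pages_flushed_up_to: int = 100  # data pages on disk
    last_checkpoint_at: int = 100

    def write(self, new_lsn):
        """A DML statement advances only the LSN in the buffer."""
        self.log_sequence_number = new_lsn

    def flush_log(self):
        """Per-second timer or commit: the log on disk catches up."""
        self.log_flushed_up_to = self.log_sequence_number

    def checkpoint(self):
        """Log first, then data pages, then the checkpoint LSN."""
        self.flush_log()
        self.pages_flushed_up_to = self.log_flushed_up_to
        self.last_checkpoint_at = self.pages_flushed_up_to

    def snapshot(self):
        return (
            self.log_sequence_number,
            self.log_flushed_up_to,
            self.pages_flushed_up_to,
            self.last_checkpoint_at,
        )


state = Lsns()
timeline = {}

state.write(110)                    # ① update
timeline[1] = state.snapshot()
state.write(150)                    # delete
state.flush_log()                   # ② timer flush at 12:00:01
timeline[2] = state.snapshot()
state.write(300)                    # ③ update
timeline[3] = state.snapshot()
state.checkpoint()                  # ④ starts, ⑤ finished
timeline[5] = state.snapshot()
state.flush_log()                   # ⑥ timer fires, nothing to flush
timeline[6] = state.snapshot()
state.write(800)                    # ⑦ insert
timeline[7] = state.snapshot()
state.flush_log()                   # ⑧ commit flushes the log only
timeline[8] = state.snapshot()
state.checkpoint()                  # ⑨ checkpoint
timeline[9] = state.snapshot()

for position, values in timeline.items():
    print(position, values)
```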

Recovery

The recovery strategy of MySQL is:

  1. During recovery, first replay all changes recorded in the redo log, including those of uncommitted transactions.
  2. Then roll back the uncommitted transactions using the undo log.

Through this strategy, the atomicity, consistency and durability of the transaction are guaranteed.

In addition, because of the checkpoint mechanism, recovery only needs to start from the checkpoint position.
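
The two phases, and the role of the checkpoint, can be sketched as follows. This is a toy model, not InnoDB's recovery code: recover is a hypothetical function, a page holds a single value, and for brevity each log record carries both the old and the new value, whereas InnoDB keeps undo information separately from the redo log.

```python
def recover(disk_pages, log, checkpoint_lsn, committed_txns):
    """Redo from the checkpoint, then undo uncommitted transactions.

    disk_pages: page_id -> (value, page_lsn) as found on disk after the crash
    log:        list of (lsn, txn_id, page_id, old_value, new_value), by LSN
    """
    pages = dict(disk_pages)

    # Phase 1: redo every change after the checkpoint, committed or not.
    for lsn, txn_id, page_id, old_value, new_value in log:
        if lsn <= checkpoint_lsn:
            continue  # already on disk when the checkpoint was taken
        _, page_lsn = pages.get(page_id, (None, 0))
        if lsn > page_lsn:  # the page LSN tells whether the change is missing
            pages[page_id] = (new_value, lsn)

    # Phase 2: undo uncommitted transactions, newest change first.
    for lsn, txn_id, page_id, old_value, new_value in reversed(log):
        if txn_id not in committed_txns:
            pages[page_id] = (old_value, pages[page_id][1])

    return {page_id: value for page_id, (value, _) in pages.items()}


log = [
    (110, "T1", "p1", "a", "b"),
    (150, "T2", "p2", "x", "y"),
    (300, "T1", "p1", "b", "c"),
]
committed = {"T1"}  # T2 never committed before the crash

# Case 1: nothing was flushed after the checkpoint at LSN 100.
result = recover({"p1": ("a", 100), "p2": ("x", 100)}, log, 100, committed)

# Case 2: page p2 had already been flushed with T2's uncommitted change.
result_flushed = recover({"p1": ("a", 100), "p2": ("y", 150)}, log, 100, committed)
print(result, result_flushed)
```

In both cases the committed changes of T1 survive and the change made by T2 disappears, which is the behaviour the strategy above is meant to guarantee.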

Reflections

  1. A lot of content on the Internet may be inaccurate. Even this article of mine may contain inaccuracies. The best way to solve this problem is to read the source code.
  2. Different technologies may be implemented differently, but the core principles are often shared, and these core principles are generally built on the basis of knowledge on
  3. Knowledge can be learned at different depths. After reading "MySQL Technical Insider: InnoDB Storage Engine", you get a general grasp of how to use MySQL, but the understanding is not deep. Writing these two articles deepened my understanding of MySQL somewhat; going deeper really requires reading the source code, which feels like it takes a lot more time. So you need to know what level of understanding you need and spend your limited time on the most cost-effective things. Choosing is very important.

References

  1. In-depth study of MySQL transactions: the realization principle of ACID features
  2. Detailed analysis of MySQL transaction logs (redo log and undo log)
  3. https://blog.csdn.net/suerge_storm/article/details/90484944
  4. https://blog.csdn.net/qq_41151659/article/details/99559397

At last

If you like my article, you can follow my public account (Programmer Mala Tang)

Origin blog.csdn.net/shida219/article/details/106970517