This article walks through how MySQL implements the four ACID properties of transactions: what each property means and how it is realized under the hood.

1. What are the four characteristics of a transaction

  • Atomicity (or indivisibility)
  • Consistency
  • Isolation
  • Durability

Next, we will analyze each of the four properties and its underlying implementation principle.
Before describing the four properties themselves, a little background knowledge:

1. Logical architecture and storage engine

The logical architecture of the MySQL server can be divided into three layers:

① The first layer (connection layer): responsible for client connections, authorization, authentication, etc.

② The second layer (server layer): responsible for parsing, optimizing, and caching query statements, implementing built-in functions, etc.

③ The third layer (storage engine): responsible for storing and reading the data in the database. The MySQL server layer does not manage transactions; transactions are implemented by the storage engines. The engines that support transactions include InnoDB and NDB Cluster, of which InnoDB is by far the most widely used.
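
As a quick check, you can ask the server itself which storage engines it ships with and which of them support transactions (the output depends on your MySQL build and version):

```sql
-- List the available storage engines; the "Transactions" column shows
-- which engines support transactions (YES for InnoDB, NO for MyISAM, etc.).
SHOW ENGINES;
```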

We will introduce the four characteristics in detail:

1. Atomicity

1. Definition

Atomicity means that a transaction is an indivisible unit of work: the operations in it either all take effect or none of them do. If any SQL statement in the transaction fails to execute, the statements that have already executed must also be rolled back, and the database returns to the state it was in before the transaction started.
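
A minimal sketch of atomicity in action, assuming a hypothetical `account` table with columns (id, owner, balance):

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE owner = 'zhangsan';
UPDATE account SET balance = balance + 100 WHERE owner = 'lisi';
-- If anything fails before COMMIT (an error, a crash, or an explicit ROLLBACK),
-- the statement that already executed is undone as well:
ROLLBACK;   -- the data returns to its pre-transaction state
-- Only COMMIT would make the whole unit of work take effect.
```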

2. Implementation principle: undo log

Before looking at the undo log in detail, a word about the transaction-related logs that exist in MySQL. MySQL has several kinds of logs, including the binary log, the error log, and the query log. In addition, InnoDB provides two logs of its own: the undo log (rollback log) and the redo log. The undo log is the key guarantee of the atomicity and consistency of the data, while the redo log is used to guarantee the durability of transactions.

Undo log

The undo log is the key guarantee of a transaction's atomicity: it makes it possible to roll back every SQL statement that has already executed successfully inside the transaction. The workflow is roughly as follows: whenever a transaction modifies data in the database, InnoDB first generates the corresponding undo log records; if the transaction then fails or a rollback is triggered, the information in the undo log is used to restore the data to the values it had before the modification.

The undo log is a logical log. When a rollback is triggered or a transaction fails, InnoDB performs the opposite operation for each change recorded in the undo log: an earlier INSERT is undone by a DELETE, an earlier DELETE by an INSERT, and an earlier UPDATE by an UPDATE that writes the previous values back.

2. Durability

1. Definition

Durability means that once a transaction has been committed, its changes to the database are permanent; subsequent operations or failures must not have any effect on them.

2. Implementation principle

Redo log

Before looking at how durability is implemented, let's first look at the background of the redo log and why it is necessary:

Reads and writes against the database ultimately operate on data stored on disk, and performing disk I/O for every access is very inefficient. For this reason InnoDB provides a caching layer, the Buffer Pool, which keeps copies of some of the database's pages in memory and acts as a buffer for the data on disk. When data needs to be read, InnoDB first looks for it in the Buffer Pool; only if it is not found there is it read from disk and then placed into the Buffer Pool. Writes likewise go to the Buffer Pool first, and the modified pages are flushed to disk periodically (this process is called flushing dirty pages).

However, alongside the speed-up comes a risk: if the database crashes while the Buffer Pool still holds data, or modified data, that has not yet been flushed to disk, that data is lost and durability can no longer be guaranteed. The redo log was introduced to solve exactly this problem. It is another InnoDB log, and it works as follows: before data is written into the Buffer Pool, or before data already in the Buffer Pool is modified, the operation is first recorded in the redo log; when the transaction commits, the redo log is flushed to disk with the fsync interface. If the database then goes down, on restart it reads the redo log and uses it to restore the data. The redo log uses WAL (write-ahead logging): every change is written to the redo log before the data in the Buffer Pool is written or modified, which ensures that data is not lost when MySQL goes down. This is what guarantees durability.
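
The moment at which the redo log is flushed to disk is configurable. A quick way to inspect the relevant settings (these are standard InnoDB system variables):

```sql
-- 1 (the default) writes and fsyncs the redo log at every commit, giving full
-- durability; 0 and 2 flush roughly once per second, trading a small loss
-- window for throughput.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- Size of the redo log files (newer 8.0 versions expose innodb_redo_log_capacity instead):
SHOW VARIABLES LIKE 'innodb_log_file_size';
```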

Since the redo log is itself written to disk when the transaction commits, why is this more efficient than flushing the modified data from the Buffer Pool straight to the database files?

Mainly because of the following two aspects:

1. Writing the redo log is sequential I/O, whereas flushing Buffer Pool pages to their locations in the data files is random I/O; the read/write positions are effectively random, which is slower than sequential I/O.

2. The Buffer Pool writes data in units of whole pages; MySQL's page size is generally 16 KB, so once anything on a page is modified the entire page has to be written out. The redo log, by contrast, writes only what actually changed: just the newly added or modified data, which greatly reduces wasted I/O (the default page size can be verified as shown below).
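
The 16 KB figure is InnoDB's default page size and can be confirmed directly:

```sql
SHOW VARIABLES LIKE 'innodb_page_size';   -- typically 16384 (bytes)
```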

3. Isolation

1. Definition:

Isolation is about how transactions interact with one another. It means that the operations inside a transaction are isolated from other transactions: in a concurrent environment, transactions do not interfere with each other. Perfectly strict isolation corresponds to the Serializable isolation level, but for performance reasons serializable is rarely used in practice.
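
For reference, the isolation level can be inspected and changed per session (the variable name differs slightly between MySQL versions):

```sql
SELECT @@transaction_isolation;   -- MySQL 8.0; on 5.7 use @@tx_isolation
-- The strictest level, rarely used in practice for performance reasons:
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- InnoDB's default:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
```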

2. Implementation principle

Isolation aims at concurrent transactions not affecting one another. In day-to-day use the interactions that matter most are between writes and between a write and a read:

1. The impact of one write operation on another write operation: solved by the lock mechanism.

2. The impact of one write operation on a concurrent read operation: solved by the MVCC mechanism.

1. Lock mechanism: for writes, only one transaction at a time is allowed to write to the same piece of data. InnoDB's lock mechanism can be understood roughly as follows: before a transaction writes the data, it must first acquire the lock on it; only then can it perform the write. Any other transaction that wants the same lock must wait until the current transaction releases it, either by rolling back or by committing its write.

Row locks and table locks: by granularity, locks can be divided into row locks, table locks, and granularities in between. A table lock locks the entire table while a transaction operates on its data, so concurrency is poor; a row lock locks only the rows being operated on, so concurrency is good. Because creating, checking, and destroying locks consumes resources, a table lock generally saves some resources compared with row locks, but given business and performance requirements row locks are usually used. Different storage engines support table locks and row locks differently; InnoDB supports both row locks and table locks.
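
A sketch of write-write isolation through row locks, reusing the hypothetical `account` table from above and two client sessions:

```sql
-- Session 1
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;  -- takes an exclusive row lock on id = 1

-- Session 2 (at the same time)
START TRANSACTION;
UPDATE account SET balance = balance + 50 WHERE id = 2;   -- a different row: proceeds immediately (row-level locking)
UPDATE account SET balance = balance + 50 WHERE id = 1;   -- blocks until session 1 commits or rolls back
```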

For the isolation levels of transactions and the problems that can arise at each level, I recommend my other article, "In-depth understanding of transaction isolation"; I won't repeat that discussion here.

2. MVCC mechanism: InnoDB's default isolation level is Repeatable Read (RR). In general RR cannot prevent phantom reads, but the RR implemented by InnoDB can. To solve dirty reads, non-repeatable reads, and phantom reads, InnoDB uses MVCC. MVCC stands for Multi-Version Concurrency Control, a multi-version concurrency control protocol. Its defining characteristic is that at the same moment different transactions may read different data (different versions of the same rows); for example, at time T5 transaction A and transaction C can each read a different version of the data.

The biggest advantage of MVCC is that reads need no read locks, so reads and writes do not conflict. InnoDB's MVCC allows multiple versions of a row to coexist, and it is built mainly on the following structures:

1. Hidden columns: every row in the database carries hidden columns, among them the id of the transaction that last modified the row and a roll pointer to the corresponding undo log record.

2. Undo-log-based version chain: the hidden column of each row contains a pointer to its undo log record, and each undo log record in turn points to an earlier version, so together they form an undo log version chain.

3. ReadView: with the hidden columns and the version chain, a row can be restored to an earlier version, but which version should actually be read is decided by a ReadView. A ReadView is a snapshot that a transaction (say, transaction A) takes of the transaction system (trx_sys) at a certain moment; during later reads, the transaction id stored with each row version is compared against this snapshot to decide whether that version is visible to the ReadView, i.e. visible to transaction A.

The main content in trx_sys and the rules for judging visibility are as follows:

low_limit_id: the id that the system would assign to the next transaction at the moment the ReadView is generated. If a row version's transaction id is greater than or equal to low_limit_id, it is not visible to this ReadView.

up_limit_id: the smallest id among the transactions that were active when the ReadView was generated. If a row version's transaction id is less than up_limit_id, it is visible to this ReadView.

rw_trx_ids: the list of ids of the transactions that were active when the ReadView was generated. If a row version's transaction id lies between up_limit_id and low_limit_id, it is checked against rw_trx_ids: if the id is in the list, that transaction was still active when the ReadView was generated and the version is not visible; if it is not in the list, the transaction had already committed, so the version is visible to the ReadView.
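
The set of currently active transactions that a ReadView snapshots can be observed through information_schema (an illustrative query, not how InnoDB builds ReadViews internally):

```sql
SELECT trx_id, trx_state, trx_started
FROM information_schema.INNODB_TRX;   -- one row per transaction currently active inside InnoDB
```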

3. How does MVCC avoid dirty reads, non-repeatable reads, and phantom reads?
3.1 Dirty read:

When transaction A reads zhangsan's balance at T3, it generates a ReadView. Since transaction B is still active and has not committed at that point, its transaction id must be in the ReadView's rw_trx_ids, so by the rules introduced above B's modification is not visible to the ReadView. Transaction A therefore follows the roll pointer to the previous version in the undo log and reads a balance of 100 for zhangsan. In this way transaction A avoids a dirty read.
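
The same scenario expressed with two sessions against the hypothetical `account` table (zhangsan's committed balance is 100):

```sql
-- Session B (transaction B)
START TRANSACTION;
UPDATE account SET balance = 200 WHERE owner = 'zhangsan';  -- modified but NOT committed

-- Session A (transaction A), under REPEATABLE READ
START TRANSACTION;
SELECT balance FROM account WHERE owner = 'zhangsan';
-- Returns 100: B is still in rw_trx_ids, so A follows the undo-log version
-- chain back to the last committed value instead of reading B's dirty data.
```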

3.2 Non-repeatable read

Before transaction A reads zhangsan's balance at T2, it generates a ReadView. For transaction B there are two possible situations: either it has already started but not yet committed, in which case its transaction id is in the ReadView's rw_trx_ids; or it has not started yet, in which case its transaction id will be greater than or equal to the ReadView's low_limit_id. In either case, by the rules introduced earlier, B's modification is not visible to the ReadView.

When transaction A reads zhangsan's balance again at T5, it judges the visibility of the data using the ReadView generated at T2, so B's modification is again judged invisible. Transaction A therefore follows the roll pointer to the earlier version in the undo log and again reads a balance of 100, thus avoiding a non-repeatable read.
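
As a two-session sketch (same hypothetical table, RR isolation level):

```sql
-- Session A (transaction A)
START TRANSACTION;
SELECT balance FROM account WHERE owner = 'zhangsan';  -- T2: returns 100; the ReadView is created here

-- Session B (transaction B)
START TRANSACTION;
UPDATE account SET balance = 200 WHERE owner = 'zhangsan';
COMMIT;                                                 -- B commits between A's two reads

-- Session A, later
SELECT balance FROM account WHERE owner = 'zhangsan';  -- T5: still returns 100
-- Under RR, A keeps using the ReadView created at T2, so B's committed change stays invisible.
```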

3.3 Phantom read

MVCC avoids phantom reads in much the same way as it avoids non-repeatable reads.

Before transaction A reads the balances of the users with 0 < id < 5 at T2, it generates a ReadView. For transaction B there are again two possible situations: either it has started but not yet committed, in which case its transaction id is in the ReadView's rw_trx_ids; or it has not started yet, in which case its transaction id will be greater than or equal to the ReadView's low_limit_id. In either case, by the rules introduced earlier, B's modification is not visible to the ReadView.

When transaction A reads the balances of the users with 0 < id < 5 again at T5, it judges visibility using the ReadView generated at T2, so B's modification is judged invisible. For the newly inserted row lisi (id = 2), transaction A follows the roll pointer to the previous version in the undo log and finds that the row did not exist, thereby avoiding a phantom read.

A locking read locks the queried rows (with a shared or exclusive lock) at query time. Because of how locks work, while one transaction holds a lock on the data, other transactions cannot write it, so dirty reads and non-repeatable reads are avoided. Avoiding phantom reads additionally requires the next-key lock. A next-key lock is a kind of row lock equivalent to a record lock plus a gap lock: it locks not only the record itself (the record-lock part) but also a range around it (the gap-lock part). Locking reads can therefore avoid dirty reads, non-repeatable reads, and phantom reads, which guarantees isolation.
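
A sketch of a locking read blocking a phantom insert via next-key locks, assuming a hypothetical `user` table with an indexed integer `id` column:

```sql
-- Session A
START TRANSACTION;
SELECT * FROM `user` WHERE id > 0 AND id < 5 FOR UPDATE;
-- Next-key locks cover the matching rows and the gaps between them in the index.

-- Session B
INSERT INTO `user` (id, name) VALUES (2, 'lisi');  -- blocks until session A commits or rolls back,
                                                   -- so A cannot see a phantom row on a second read
```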

4. Consistency

1. Concept: consistency means that executing a transaction does not break the integrity constraints of the database; the data is in a legal state both before and after the transaction runs.

2. Implementation: consistency is the ultimate goal of the database. Atomicity, isolation, and durability all exist in order to satisfy consistency. Besides the guarantees provided at the database level, consistency also has to be guaranteed at the application level.

Measures to achieve consistency:

1. Rely on atomicity, durability, and isolation to ensure consistency; if these three properties cannot be guaranteed, consistency cannot be guaranteed either.

2. The database itself provides guarantees: for example, string data cannot be inserted into an integer column, and a string may not exceed the maximum length of its column.

3. Guarantees at the application level: for example, if a transfer only deducts the sender's balance without increasing the receiver's balance, no database feature, however perfect, can keep the state consistent (a well-formed version is sketched after this list).
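
A minimal sketch of keeping the transfer consistent at the application level, again assuming the hypothetical `account` table:

```sql
START TRANSACTION;
UPDATE account SET balance = balance - 100
 WHERE owner = 'zhangsan' AND balance >= 100;   -- application-level rule: no negative balance
-- The application must check the affected-row count here and ROLLBACK if it is 0.
UPDATE account SET balance = balance + 100 WHERE owner = 'lisi';
COMMIT;   -- both legs of the transfer succeed or fail together
```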

5. Summary:

  • Atomicity: the statements in a transaction either all execute or none of them do; this is the core property that defines a transaction. Implemented mainly on top of the undo log.
  • Durability: guarantees that committed data will not be lost to crashes or other failures. Implemented mainly on top of the redo log.
  • Isolation: ensures, as far as possible, that a transaction's execution is not affected by other transactions. InnoDB's default isolation level is RR, implemented mainly with the lock mechanism (including next-key locks) and MVCC (hidden columns, the undo-log-based version chain, and ReadViews).
  • Consistency: the ultimate goal pursued by transactions; achieving it requires guarantees both at the database level and at the application level.

Source: blog.csdn.net/m0_65431718/article/details/129847676