Momo Interviewer: Tell me about your understanding of transactions and locks in MySQL?

Transactions and locks are core features of MySQL, and they are also a frequent and difficult interview topic. This article introduces the concepts behind transactions and locks and explains how they are implemented, so that you come away with a deeper understanding of both.

What is a transaction

Wikipedia defines a transaction as a logical unit of work in a database management system (DBMS), consisting of a finite sequence of database operations.

Four characteristics of transactions

A transaction has four key properties: Atomicity, Consistency, Isolation, and Durability (ACID).

  • Atomicity: A series of operations on the database either all succeed or all fail; partial success is impossible. Take a money transfer as an example: the balance of one account decreases and the balance of the other increases, and these two operations must succeed or fail together.

  • Consistency: The integrity constraints of the database are not violated, so the data is in a valid state both before and after the transaction executes. The constraints may be the database's own, such as uniqueness or field-length constraints, or business rules from a real scenario, such as the requirement in the transfer above that the amount debited from one account equals the amount credited to the other.

  • Isolation: Concurrent transactions are isolated from each other and do not interfere with each other. The ultimate goal of isolation is also to ensure consistency.

  • Durability: Once a transaction commits successfully, its changes to the database are saved permanently and are not lost, even if the system fails afterwards.
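
The transfer scenario used to explain atomicity can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the account table, the account names, and the transfer helper are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
# The account table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates take effect or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(conn, "alice", "bob", 500) fails on the debit, and the earlier credit to bob is rolled back with it, so the balances stay unchanged.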

Transaction states

Depending on the stage of execution, a transaction is in one of the following 5 states:

  • Active: The database operations of the transaction are being executed.

  • Partially committed: The last operation in the transaction has completed, but the changes have not yet been flushed to disk.

  • Failed: The transaction was active or partially committed and cannot continue because of an error.

  • Aborted: The transaction was in the failed state, the rollback has completed, and the data has been restored to its state before the transaction started.

  • Committed: The transaction was in the partially committed state and the modified data has been synchronized to disk.
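
The five states and the transitions between them can be written down as a small state machine. This is a minimal sketch derived from the descriptions above; the names TxState, TRANSITIONS, and advance are hypothetical and are not part of MySQL.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Allowed transitions, taken from the state descriptions above.
TRANSITIONS = {
    TxState.ACTIVE: {TxState.PARTIALLY_COMMITTED, TxState.FAILED},
    TxState.PARTIALLY_COMMITTED: {TxState.COMMITTED, TxState.FAILED},
    TxState.FAILED: {TxState.ABORTED},
    TxState.ABORTED: set(),      # terminal state
    TxState.COMMITTED: set(),    # terminal state
}


def advance(current, target):
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Note that committed and aborted are terminal: a committed transaction can never move back to active, which matches the durability property.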

Transaction isolation level

As mentioned earlier, transactions must be isolated from each other. The simplest way to achieve isolation is to forbid concurrency and execute transactions one at a time, but this performs very poorly. To balance isolation against performance, the database supports several isolation levels.

To make the following sections easier to follow, we first create a sample table named hero.

CREATE TABLE hero (
    number INT,
    name VARCHAR(100),
    country VARCHAR(100),
    PRIMARY KEY (number)
) ENGINE=InnoDB CHARSET=utf8;

Problems encountered in concurrent execution of transactions

When transactions execute concurrently without any control, the following four types of problems may occur:

  • Dirty Write: A transaction modifies data that another transaction has written but not yet committed.

Suppose Session A and Session B each start a transaction. The transaction in Session B first updates the name column of the record whose number is 1 to 'Guan Yu', and the transaction in Session A then updates the name column of the same record to 'Zhang Fei'. If the transaction in Session B is later rolled back, the update made in Session A is lost as well. This phenomenon is called a dirty write.

  • Dirty Read: A transaction reads data that another transaction has written but not yet committed.

Suppose Session A and Session B each start a transaction. The transaction in Session B first updates the name column of the record whose number is 1 to 'Guan Yu', and the transaction in Session A then queries that record and gets 'Guan Yu' as the value of name. If the transaction in Session B is later rolled back, the transaction in Session A has read data that never existed. This phenomenon is called a dirty read.

  • Non-Repeatable Read: During its execution, a transaction reads data that other transactions have since committed, so two reads of the same record return different results.

Suppose the transaction in Session A reads the record whose number is 1 several times, while Session B commits several implicit transactions in between (with autocommit enabled, MySQL wraps each insert, delete, or update statement in its own transaction). Each of these transactions modifies the name column of that record. If the transaction in Session A sees the latest value after each commit, its reads return different results. This phenomenon is called a non-repeatable read.

  • Phantom Read: During its execution, a transaction reads rows newly inserted by other transactions, so two reads with the same condition return different result sets.

Suppose the transaction in Session A first queries the hero table with the condition number > 0 and gets the record whose name is 'Liu Bei'. An implicit transaction in Session B then inserts a new record into the hero table and commits. When the transaction in Session A queries the hero table again with the same condition number > 0, the result set contains the record inserted by Session B. This phenomenon is called a phantom read.

The difference between a non-repeatable read and a phantom read is that a non-repeatable read sees data modified or deleted by other transactions, while a phantom read sees data newly inserted by other transactions.

Dirty writes are so serious that every isolation level must prevent them. Dirty reads, non-repeatable reads, and phantom reads are all read consistency problems: two reads within the same transaction return inconsistent results.

Four isolation levels

The SQL standard defines four isolation levels to address the read consistency problems above. Each level rules out a different set of problems.

  • READ UNCOMMITTED: uncommitted data can be read.

  • READ COMMITTED: only committed data can be read.

  • REPEATABLE READ: repeated reads within a transaction return the same data.

  • SERIALIZABLE: transactions behave as if they were executed serially.

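The read consistency problems that may occur under each isolation level in the SQL standard are as follows:

  • READ UNCOMMITTED: dirty reads, non-repeatable reads, and phantom reads may all occur.

  • READ COMMITTED: non-repeatable reads and phantom reads may occur; dirty reads may not.

  • REPEATABLE READ: phantom reads may occur; dirty reads and non-repeatable reads may not.

  • SERIALIZABLE: none of the three problems may occur.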

InnoDB supports all four isolation levels, and its behavior is basically consistent with the definitions in the SQL standard. The higher the isolation level, the lower the transaction concurrency. The main difference is that InnoDB already largely prevents phantom reads at the REPEATABLE READ level, which is why InnoDB uses REPEATABLE READ as its default isolation level.

MVCC

MVCC stands for Multi-Version Concurrency Control. Simply put, it solves the read consistency problem under concurrent access by maintaining historical versions of the data.

Version chain

In InnoDB, each row carries two hidden columns that matter for MVCC: the transaction id (trx_id) and the rollback pointer (roll_pointer).

  1. trx_id: transaction id. Every time a row is modified, the id of the modifying transaction is written to the hidden trx_id column.

  2. roll_pointer: rollback pointer. Every time a row is modified, the address of the corresponding undo log entry is written to the hidden roll_pointer column.

Suppose the hero table contains only one row, and the id of the transaction that inserted it is 80. The trx_id of that record is then 80.

Now suppose two transactions with transaction ids 100 and 200 each perform UPDATE operations on this record.

Each change first writes the old version of the record to the undo log, and the roll_pointer of the record is set to the address of that undo log entry. The successive versions of the record are therefore linked into a version chain, and the head node of the version chain is the latest value of the record.

ReadView

If the isolation level is READ UNCOMMITTED, a transaction can simply read the latest version in the version chain. If it is SERIALIZABLE, reads are performed with locks, so read inconsistency cannot arise. Under READ COMMITTED or REPEATABLE READ, however, InnoDB must walk the version chain and check each version in turn until it finds one that is visible to the current transaction (if no visible version is found, the record does not exist as far as this transaction is concerned). InnoDB implements this check with a ReadView, which mainly contains the following 4 items:

  • m_ids: the list of transaction ids of the read-write transactions that are active in the system when the ReadView is generated.

  • min_trx_id: the smallest transaction id among the read-write transactions that are active when the ReadView is generated, that is, the smallest value in m_ids.

  • max_trx_id: the id that the system would assign to the next transaction when the ReadView is generated.

  • creator_trx_id: the id of the transaction that generated the ReadView.

With a ReadView, we can decide whether a given version of a record is visible to the current transaction by applying the following checks in order.

  • If the trx_id of the version equals creator_trx_id in the ReadView, the current transaction is reading a record that it modified itself, so the version is visible.

  • If the trx_id of the version is less than min_trx_id in the ReadView, the transaction that produced the version committed before the ReadView was generated, so the version is visible.

  • If the trx_id of the version is greater than or equal to max_trx_id in the ReadView, the transaction that produced the version started after the ReadView was generated, so the version is not visible.

  • If the trx_id of the version lies between min_trx_id and max_trx_id, check whether it is in the m_ids list. If it is, the transaction that produced the version was still active when the ReadView was generated, so the version is not visible. If it is not, that transaction had already committed when the ReadView was generated, so the version is visible.
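
The four checks above translate almost directly into code. The following is a minimal sketch, not InnoDB's actual implementation: the ReadView class and its is_visible method are hypothetical names, and the transaction ids in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    m_ids: frozenset        # ids of read-write transactions active at creation
    min_trx_id: int         # smallest id in m_ids
    max_trx_id: int         # id the next transaction would receive
    creator_trx_id: int     # id of the transaction that created this view

    def is_visible(self, trx_id):
        """Apply the four visibility checks in the order given above."""
        if trx_id == self.creator_trx_id:
            return True                  # our own modification
        if trx_id < self.min_trx_id:
            return True                  # committed before the view was created
        if trx_id >= self.max_trx_id:
            return False                 # started after the view was created
        return trx_id not in self.m_ids  # visible only if already committed


# Hypothetical snapshot: transactions 100, 200 and 250 are active,
# and transaction 250 is the one creating the view.
view = ReadView(m_ids=frozenset({100, 200, 250}),
                min_trx_id=100, max_trx_id=251, creator_trx_id=250)
```

With this view, a version written by transaction 80 is visible, one written by 100 is not, one written by 150 is visible (it committed before the view was created), and one written by 251 is not.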

In MySQL, a major difference between the READ COMMITTED and REPEATABLE READ isolation levels is when they generate a ReadView. READ COMMITTED generates a new ReadView before every read, so each read can see data that other transactions have committed in the meantime. REPEATABLE READ generates a ReadView only on the first read and reuses it afterwards, so the results of subsequent reads stay consistent.
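
The effect of ReadView timing can be simulated with a tiny version chain. This is a sketch under assumptions: Version, make_read_view, is_visible, and snapshot_read are hypothetical helpers, the reader is assumed to be a read-only transaction with id 0, and the scenario (transaction 100 changes the name from 'Liu Bei' to 'Guan Yu' and then commits) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    name: str
    trx_id: int
    prev: Optional["Version"] = None   # plays the role of roll_pointer


def make_read_view(active_ids, next_trx_id, creator_trx_id=0):
    """Build a ReadView as a plain dict (creator id 0 is an assumption)."""
    active = set(active_ids)
    return {
        "m_ids": active,
        "min_trx_id": min(active, default=next_trx_id),
        "max_trx_id": next_trx_id,
        "creator_trx_id": creator_trx_id,
    }


def is_visible(view, trx_id):
    if trx_id == view["creator_trx_id"] or trx_id < view["min_trx_id"]:
        return True
    if trx_id >= view["max_trx_id"]:
        return False
    return trx_id not in view["m_ids"]


def snapshot_read(head, view):
    """Walk the version chain from the newest version to the first visible one."""
    version = head
    while version is not None:
        if is_visible(view, version.trx_id):
            return version.name
        version = version.prev
    return None   # no visible version: the record does not exist for this reader


# Version chain: transaction 80 inserted the row, transaction 100 updated it.
head = Version("Guan Yu", 100, prev=Version("Liu Bei", 80))

# First read: transaction 100 is still active.
first_view = make_read_view({100}, next_trx_id=101)
first_read = snapshot_read(head, first_view)

# Transaction 100 commits, then the reader reads again.
rc_second_read = snapshot_read(head, make_read_view(set(), next_trx_id=101))  # new view
rr_second_read = snapshot_read(head, first_view)                              # reused view
```

Under the READ COMMITTED behavior the second read builds a fresh view and sees the committed update, while under the REPEATABLE READ behavior the reused view still hides it.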

Locks

Concurrent access by transactions to the same data falls into three cases: read-read, write-write, and read-write.

  • Read-read means that concurrent transactions read the same row at the same time. Since both transactions only read and have no effect on the record, concurrent reads are always allowed.

  • Write-write means that concurrent transactions modify the same row at the same time. This can lead to dirty writes, which are never allowed, so it has to be handled with locks: before a transaction modifies a record it first locks that record. If the lock is granted, the transaction continues; otherwise it waits in a queue. The lock is released automatically when the transaction commits or rolls back.

  • Read-write means that one transaction reads a row while another writes it. In this case dirty reads, non-repeatable reads, and phantom reads may occur. The best solution is to use multi-version concurrency control (MVCC) for reads and locks for writes.

Lock granularity

According to the range of data they cover, locks can be divided into row-level locks and table-level locks.

  1. Row-level lock: acts on individual rows, so the lock granularity is small.

  2. Table-level lock: acts on the entire table, so the lock granularity is large.

Classification of locks

So that read-read access is unaffected while write-write and read-write access can block each other, MySQL uses the idea of read locks and write locks, which it calls shared locks and exclusive locks:

1. Shared locks: S locks for short. When a transaction wants to read a record with a lock, it must first acquire the S lock of the record. An S lock can be held by multiple transactions at the same time. We can add an S lock manually with select ... lock in share mode;.

2. Exclusive locks: X locks for short. When a transaction wants to change a record, it must first acquire the X lock of the record. An X lock can be held by only one transaction at a time. There are two ways to acquire an X lock. The first is automatic: inserting, deleting, or updating data adds an X lock by default. The second is manual: we can use select ... for update; to add an X lock to a row.

Note also that if one transaction already holds an S lock on a row, another transaction cannot acquire an X lock on that row, and vice versa.

In addition to shared locks and exclusive locks, MySQL also has intention locks, which are maintained by the database itself. Before a shared lock is added to a row, the database automatically adds an intention shared lock (IS lock) to the table; before an exclusive lock is added to a row, it automatically adds an intention exclusive lock (IX lock) to the table. An intention lock can be regarded as a table-level marker for the S and X locks held on rows. It lets the database determine quickly whether any record in the table is locked, without checking every record, which makes table-level locking more efficient. For example, a request for a table-level X lock cannot be granted while any row in the table holds a row-level X lock or S lock, and the intention locks on the table reveal this directly.
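
The way S, X, IS, and IX locks interact at the table level can be summarized as a compatibility matrix. The sketch below encodes the matrix given in the MySQL reference manual for InnoDB; the names COMPATIBLE and can_grant are hypothetical, and the code only models the decision, not real lock queues.

```python
# Table-level compatibility: COMPATIBLE[requested][held] is True when the
# requested lock can coexist with a lock already held by another transaction.
COMPATIBLE = {
    "X":  {"X": False, "IX": False, "S": False, "IS": False},
    "IX": {"X": False, "IX": True,  "S": False, "IS": True},
    "S":  {"X": False, "IX": False, "S": True,  "IS": True},
    "IS": {"X": False, "IX": True,  "S": True,  "IS": True},
}


def can_grant(requested, held_by_others):
    """Return True if requested is compatible with every lock held by others."""
    return all(COMPATIBLE[requested][held] for held in held_by_others)
```

For example, can_grant("X", ["IS"]) is False: a table-level X lock has to wait as soon as another transaction holds even an intention lock, which is exactly the shortcut that intention locks provide. Intention locks themselves never block each other, so can_grant("IX", ["IX", "IS"]) is True.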

Table-level locks in InnoDB

Table-level locks in InnoDB mainly include the intention shared lock (IS lock), the intention exclusive lock (IX lock), and the auto-increment lock (AUTO-INC lock). IS and IX locks were introduced above, so here we focus on the AUTO-INC lock.

If a column has the AUTO_INCREMENT attribute, we do not need to specify a value for it when inserting; the system assigns increasing values automatically. There are two main mechanisms for assigning values to AUTO_INCREMENT columns:

1. AUTO-INC lock: A table-level AUTO-INC lock is acquired when the insert statement starts and released as soon as the statement finishes. This method is generally used when the number of rows to insert cannot be determined before the statement runs, as with INSERT ... SELECT statements.

2. Lightweight lock: The insert statement acquires a lightweight lock only while it generates the AUTO_INCREMENT values and releases it as soon as the values have been generated. This method is generally used when the number of rows to insert is known before the statement runs. It avoids locking the table and improves insert performance.

The choice is controlled by innodb_autoinc_lock_mode. With the value 1 (the default before MySQL 8.0), MySQL selects between the two mechanisms according to the actual scenario. The value 0 forces the AUTO-INC lock for every insert, and the value 2 (the default since MySQL 8.0) forces the lightweight lock.

Row-level locks in InnoDB

As mentioned earlier, MVCC can solve dirty reads, non-repeatable reads, and phantom reads, but only for ordinary select statements. A read performed through MVCC is called a snapshot read, and every ordinary SELECT statement is a snapshot read under the READ COMMITTED and REPEATABLE READ isolation levels. Besides snapshot reads there are locking reads, which lock the records they read; dirty reads, non-repeatable reads, and phantom reads still have to be prevented for locking reads. Since these locks are placed on records, they are all row-level locks.

InnoDB implements row locks by locking index entries. If a locking query cannot use an index, it scans the entire clustered index and locks every record, which is equivalent to locking the table. Depending on the range they cover, row locks take the form of Record Locks, Gap Locks, or Next-Key Locks. Suppose there is a table t whose primary key is id, and we insert 4 rows with primary key values 1, 4, 7, and 10. Next, we use the clustered index as an example to introduce the three forms of row lock.

  • Record Locks: A record is the data actually stored in the clustered index. For example, 1, 4, 7, and 10 above are all records.

A record lock locks a single record directly. When an equality query on a unique index (a unique secondary index or the clustered index) exactly matches one record, that record is locked. For example, select * from t where id = 4 for update; locks the record with id = 4.

  • Gap Locks: A gap is the logical space between two adjacent records that holds no data, such as (1,4) and (4,7) above.

Similarly, a gap lock locks one or more of these gaps. When an equality query or a range query does not hit any record, the corresponding gap is locked. For example, select * from t where id = 3 for update; or select * from t where id > 1 and id < 4 for update; locks the interval (1,4).

  • Next-Key Locks: A next-key lock covers the left-open, right-closed interval made up of a gap plus the record to its right, such as (1,4] and (4,7] above.

A next-key lock is the combination of a record lock and a gap lock: in addition to locking the record itself, it also locks the gap in front of the record. When a range query hits some of the records, the corresponding key ranges are locked. Note that the locked range may also include the next-key interval to the right of the last matching record. For example, select * from t where id > 5 and id <= 7 for update; locks (4,7] and, because the scan continues to the next record, (7,10]; the exact range can vary between MySQL versions. The default row lock type in InnoDB is the next-key lock. With a unique index, when an equality query matches a record, the next-key lock degenerates into a record lock; when no record is matched, it degenerates into a gap lock.
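
The interval arithmetic behind gap locks and next-key locks can be illustrated with the example keys 1, 4, 7, and 10. This is only a sketch of how the intervals are derived from the existing keys; gap_containing and next_key_intervals are hypothetical helpers, and the code does not reproduce InnoDB's actual locking decisions.

```python
import bisect
import math

KEYS = [1, 4, 7, 10]   # primary key values of table t from the example above


def gap_containing(keys, value):
    """Return the open interval (lo, hi) around value, or None if value is a key."""
    i = bisect.bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return None                      # value is an existing record, not a gap
    lo = keys[i - 1] if i > 0 else -math.inf
    hi = keys[i] if i < len(keys) else math.inf
    return (lo, hi)


def next_key_intervals(keys):
    """Return (lo, hi) pairs, each read as the interval (lo, hi] ending at a key."""
    bounds = [-math.inf] + list(keys)
    return [(bounds[i], bounds[i + 1]) for i in range(len(keys))]
```

Here gap_containing(KEYS, 3) returns (1, 4), matching the gap locked by the id = 3 query above, and next_key_intervals(KEYS) contains (1, 4) and (4, 7), to be read as (1,4] and (4,7].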

Gap locks and next-key locks both exist to solve the phantom read problem. Under the READ COMMITTED isolation level, gap locks and next-key locks are not used for ordinary searches and index scans, so they do not prevent phantom reads there.

Origin blog.51cto.com/14975073/2608574