Database System Transaction and Concurrency Control

1. Transactions

A transaction is a set of operations that satisfies the ACID properties. A transaction can be made permanent with Commit, or undone with Rollback.

ACID

1) Atomicity

A transaction is treated as an indivisible minimum unit: either all of its operations are committed successfully, or all of them are rolled back on failure.

Rollback can be implemented with the undo log (Undo Log), which records the modifications performed by the transaction; during rollback, these modifications are applied in reverse.
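A minimal sketch of atomicity via rollback, assuming a hypothetical table accounts(id, balance):

START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- Something goes wrong: undo the modification; InnoDB replays the undo log in reverse.
ROLLBACK;
SELECT balance FROM accounts WHERE id = 1;   -- unchanged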

2) Consistency

The database remains in a consistent state before and after a transaction executes. In a consistent state, all transactions reading the same data obtain the same result.

3) Isolation

Modifications made by one transaction are not visible to other transactions until they are finally committed.

4) Durability

Once a transaction is committed, its modifications will be permanently saved to the database. Even if the system crashes, the results of transaction execution cannot be lost.

In the event of a system crash, the redo log (Redo Log) can be used for recovery to achieve durability. Unlike the undo log, which records logical changes to the data, the redo log records physical changes to data pages.


The concept of the ACID properties of transactions is simple, but they are not easy to understand, mainly because the four properties are not parallel, independent characteristics:

  • Only when consistency is satisfied is the result of transaction execution correct.
  • Without concurrency, transactions execute serially and isolation holds automatically; in that case, as long as atomicity is satisfied, consistency is satisfied.
  • With concurrency, multiple transactions execute in parallel; to satisfy consistency, transactions must satisfy not only atomicity but also isolation.
  • Transactions satisfy durability in order to cope with system crashes.

AUTOCOMMIT

MySQL uses autocommit mode by default. That is, if you do not explicitly start a transaction with START TRANSACTION, each statement is treated as its own transaction and committed automatically.
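A minimal sketch of the difference, using a hypothetical table t(id, x):

-- With autocommit enabled (the default), each statement is its own transaction:
SELECT @@autocommit;                      -- 1
UPDATE t SET x = 'a' WHERE id = 1;        -- committed immediately

-- Grouping several statements into one explicit transaction instead:
START TRANSACTION;
UPDATE t SET x = 'b' WHERE id = 1;
UPDATE t SET x = 'c' WHERE id = 2;
COMMIT;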

 

2. Concurrency and consistency issues

In a concurrent environment, it is difficult to guarantee the isolation of transactions, so there will be many concurrent consistency problems.

2.1 Lost Modifications

A lost modification (lost update) means that the update of one transaction is overwritten by the update of another transaction; it is also common in everyday scenarios. For example, transactions T1 and T2 both modify the same data: T1 modifies it first and commits, T2 modifies it afterwards, and T2's modification overwrites T1's.
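A minimal timeline sketch, assuming a hypothetical accounts(id, balance) table whose balance starts at 100:

-- T1: SELECT balance FROM accounts WHERE id = 1;        -- reads 100
-- T2: SELECT balance FROM accounts WHERE id = 1;        -- also reads 100
-- T1: UPDATE accounts SET balance = 150 WHERE id = 1; COMMIT;
-- T2: UPDATE accounts SET balance = 120 WHERE id = 1; COMMIT;
-- Final balance is 120: T1's update to 150 has been lost.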

 

2.2 Dirty reads

A dirty read means that one transaction can read data that another transaction has modified but not yet committed. For example: T1 modifies a row but does not commit, and T2 then reads that row. If T1 rolls back the modification, the data read by T2 is dirty data.

2.3 Non-repeatable read

A non-repeatable read means that the same data is read multiple times within one transaction with differing results. Before the transaction ends, another transaction accesses the same data and modifies it; because of that modification, the two reads in the first transaction may be inconsistent. For example: T2 reads a row, T1 modifies it and commits, and when T2 reads the row again, the result differs from the first read.

 

2.4 Phantom read

A phantom read is essentially a form of non-repeatable read. T1 reads the data in a certain range, T2 inserts new rows into that range, and when T1 reads the range again, the result differs from the first read.
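A minimal timeline sketch, assuming a hypothetical table t with an integer column c:

-- T1: SELECT COUNT(*) FROM t WHERE c BETWEEN 10 AND 20;   -- returns 2
-- T2: INSERT INTO t(c) VALUES (15); COMMIT;
-- T1: SELECT COUNT(*) FROM t WHERE c BETWEEN 10 AND 20;   -- returns 3: a phantom row has appeared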


The main cause of concurrent inconsistency is that transaction isolation is violated, and the solution is to ensure isolation through concurrency control. Concurrency control can be implemented with locking, but locking operations have to be controlled by the user, which is quite complex. Database management systems therefore provide transaction isolation levels, which let users deal with concurrency consistency problems in a simpler way.

3. Locking

3.1 Lock granularity

MySQL provides two lock granularities: row-level locks and table-level locks.

You should try to lock only the part of the data that needs to be modified, not all resources. The smaller the amount of locked data, the smaller the possibility of lock contention, and the higher the degree of concurrency of the system.

However, lock management itself consumes resources: acquiring locks, releasing locks, and checking lock status all add system overhead. Therefore, the smaller the lock granularity, the greater the system overhead.

When choosing the lock granularity, a trade-off needs to be made between locking overhead and the degree of concurrency.

3.2 Lock types

1) Read-write lock

  • Exclusive lock (X lock), also known as a write lock.
  • Shared lock (S lock), also known as a read lock.

There are two rules:

  • If transaction T holds an X lock on data object A, T can both read and update A. While the lock is held, no other transaction can place any lock on A.
  • If transaction T holds an S lock on data object A, T can read A but cannot update it. While the lock is held, other transactions can also place S locks on A, but cannot place X locks.

The compatibility relationship of the two locks is as follows (whether a new lock request on A is granted when another transaction already holds a lock on A):

              X held      S held
  X request   conflict    conflict
  S request   conflict    compatible

3.3 Intention lock

Multiple-granularity locking can be supported more easily using intention locks (Intention Locks).

When both row-level locks and table-level locks exist, if transaction T wants to place an X lock on table A, it must check whether any other transaction has locked table A itself or any row in table A. Checking every row of table A is very time-consuming.

Intention locks add IX and IS locks on top of the existing X and S locks. IX and IS are table-level locks that indicate a transaction intends to place an X lock or an S lock on some data row in the table. There are two rules:

  • Before a transaction acquires an S lock on a data row, it must first acquire an IS lock (or a stronger lock) on the table;
  • Before a transaction acquires an X lock on a data row, it must first acquire an IX lock on the table.

With intention locks, when transaction T wants to place an X lock on table A, it only needs to check whether other transactions hold an X/IX/S/IS lock on table A. If so, some other transaction is using the table or some row in it, and T's table-level X lock request fails.

The compatibility relationships of the table-level locks are as follows:

        X           IX          S           IS
  X     conflict    conflict    conflict    conflict
  IX    conflict    compatible  conflict    compatible
  S     conflict    conflict    compatible  compatible
  IS    conflict    compatible  compatible  compatible

The explanation is as follows:

  • Any IS/IX locks are compatible with each other, because they only indicate the intention to lock rows in the table, not an actual lock on the data.
  • The compatibility relationships here apply to table-level locks, and a table-level IX lock is compatible with row-level X locks, so two transactions can each place an X lock on a different data row. (Transaction T1 wants an X lock on data row R1 and transaction T2 wants an X lock on data row R2 of the same table; both transactions must first acquire an IX lock on the table. IX locks are compatible with each other, and a table-level IX lock is also compatible with row-level X locks, so both transactions can lock successfully and modify two data rows in the same table.)
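As a rough illustration (assuming MySQL 8.0, where the locks held by open transactions are exposed in performance_schema.data_locks), an ordinary UPDATE first takes a table-level IX lock and then a row-level X lock:

-- Session A:
START TRANSACTION;
UPDATE t SET x = 'b' WHERE id = 1;

-- Session B: inspect the locks while session A is still open
SELECT object_name, lock_type, lock_mode
FROM performance_schema.data_locks;
-- Typically shows a TABLE lock in mode IX on t and a RECORD lock in mode X,REC_NOT_GAP on the row.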


3.4 Locking protocols

1) Three-level locking protocol

Level-1 locking protocol

When transaction T wants to modify data A, it must first acquire an X lock on A, and the lock is not released until T ends.

This solves the lost modification problem, because two transactions cannot modify the same data at the same time, so one transaction's modification will not be overwritten.

Level-2 locking protocol

On top of level 1, a transaction must acquire an S lock before reading data A and release the S lock immediately after reading.

This solves the dirty read problem, because if another transaction is modifying data A it holds an X lock under the level-1 protocol, so the S lock cannot be acquired and the uncommitted data cannot be read.

Level-3 locking protocol

On top of level 2, the S lock acquired for reading data A cannot be released until the end of the transaction.

This solves the non-repeatable read problem, because while A is being read other transactions cannot acquire an X lock on A, which prevents the data from changing between reads.

2) Two-phase locking protocol

Locking and unlocking are performed in two phases: a growing phase in which the transaction only acquires locks, and a shrinking phase in which it only releases them.

A serializable schedule means that, through concurrency control, the result of executing transactions concurrently is the same as executing them serially in some order. Serially executed transactions do not interfere with each other, so no concurrency consistency problems arise.

Following the two-phase locking protocol is a sufficient condition for a serializable schedule. For example, the following sequence satisfies the two-phase locking protocol and is therefore serializable:

lock-x(A)...lock-s(B)...lock-s(C)...unlock(A)...unlock(C)...unlock(B)

But it is not a necessary condition. For example, the following sequence does not satisfy the two-phase locking protocol, yet it still produces a serializable schedule:

lock-x(A)...unlock(A)...lock-s(B)...unlock(B)...lock-s(C)...unlock(C)

MySQL implicit and explicit locking

MySQL's InnoDB storage engine uses the two-phase locking protocol: it acquires locks automatically when needed according to the isolation level and releases all locks at the same time when the transaction ends. This is called implicit locking.

InnoDB can also use specific statements for explicit locking:

SELECT ... LOCK IN SHARE MODE;
SELECT ... FOR UPDATE;

4. Isolation level

Read uncommitted (READ UNCOMMITTED)

Modifications in a transaction, even if not committed, are visible to other transactions.

Read committed (READ COMMITTED)

A transaction can only read changes made by committed transactions. In other words, changes made by one transaction are not visible to other transactions until they are committed.

Repeatable read (REPEATABLE READ)

It is guaranteed that the result of reading the same data multiple times in the same transaction is the same.

Serializable (SERIALIZABLE)

Force transactions to be executed serially, so that multiple transactions do not interfere with each other, and there will be no concurrent consistency problems.

This isolation level needs to be implemented with locking, because a locking mechanism is required to force conflicting transactions to execute one after another, that is, to guarantee serial execution.
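The isolation level can be checked and changed per session or per transaction; a sketch (in MySQL 8.0 the system variable is named transaction_isolation, in older versions tx_isolation):

SELECT @@transaction_isolation;                           -- InnoDB's default is REPEATABLE-READ
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;   -- for the current session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;             -- for the next transaction only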

 

 5. Multi-version concurrency control

Multi-Version Concurrency Control (MVCC) is the concrete way MySQL's InnoDB storage engine implements its isolation levels; it is used to implement the read committed and repeatable read isolation levels. The read uncommitted isolation level always reads the latest data row, its requirements are very low, and it does not need MVCC. The serializable isolation level needs to lock every row that is read, which cannot be achieved with MVCC alone.

5.1 Basic idea

As mentioned in the locking section, locking can solve the concurrency consistency problems that arise when multiple transactions execute at the same time. In practice, read operations usually far outnumber write operations, so read-write locks were introduced to avoid unnecessary locking; for example, read-read access is not mutually exclusive. With read-write locks, reads and writes are still mutually exclusive, whereas MVCC uses the idea of multiple versions: write operations update the latest version snapshot while read operations read an older version snapshot, so they are not mutually exclusive. This is similar to copy-on-write (CopyOnWrite).

In MVCC, transaction modification operations (DELETE, INSERT, UPDATE) will add a new version snapshot for the data row.

The most fundamental reason for dirty reads and non-repeatable reads is that a transaction reads uncommitted modifications of other transactions. When a transaction reads, in order to solve the problem of dirty reads and non-repeatable reads, MVCC stipulates that only committed snapshots can be read. Of course, a transaction can read its own uncommitted snapshot, which is not considered a dirty read.

5.2 Version number

  • System version number SYS_ID: It is an incremental number, and the system version number will be automatically incremented every time a new transaction is started.
  • Transaction version number TRX_ID: The system version number at the beginning of the transaction.

5.3 Undo log

The "multi-version" in MVCC refers to snapshots of multiple versions. The snapshots are stored in the undo log, and all snapshots of a data row are linked together through the rollback pointer ROLL_PTR.

For example, create a table t in MySQL that contains a primary key id and a field x. First insert a data row, then perform two update operations on that row.

INSERT INTO t(id, x) VALUES(1, "a");
UPDATE t SET x="b" WHERE id=1;
UPDATE t SET x="c" WHERE id=1;

Because START TRANSACTION was not used to execute the statements above as a single transaction, MySQL's AUTOCOMMIT mechanism treats each statement as its own transaction, so the operations above involve three transactions in total. In addition to the transaction version number TRX_ID and the operation itself, each snapshot also records a one-bit DEL field, which marks whether the row has been deleted.

An INSERT, UPDATE or DELETE creates an undo record and writes the transaction version number TRX_ID into it. DELETE can be regarded as a special UPDATE that additionally sets the DEL field to 1.
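Conceptually, after the three statements above, the version chain of the row with id = 1 might look like this (the TRX_ID values 1-3 are only illustrative):

current row in the table:   TRX_ID = 3, DEL = 0, x = "c"   -- ROLL_PTR points to ->
snapshot in the undo log:   TRX_ID = 2, DEL = 0, x = "b"   -- ROLL_PTR points to ->
snapshot in the undo log:   TRX_ID = 1, DEL = 0, x = "a"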

5.4 ReadView

MVCC maintains a ReadView structure, which mainly contains the list of currently uncommitted transactions TRX_IDs {TRX_ID_1, TRX_ID_2, ...}, together with the minimum value TRX_ID_MIN and the maximum value TRX_ID_MAX of that list.

During a SELECT, the TRX_ID of a data row snapshot is compared with TRX_ID_MIN and TRX_ID_MAX to decide whether the snapshot can be used:

  • TRX_ID < TRX_ID_MIN: the snapshot was created before all currently uncommitted transactions, so it can be used.
  • TRX_ID > TRX_ID_MAX: the snapshot was created after the current transaction started, so it cannot be used.
  • TRX_ID_MIN <= TRX_ID <= TRX_ID_MAX: the decision depends on the isolation level:
    • Read committed: if TRX_ID is in the TRX_IDs list, the transaction that created the snapshot has not committed yet, so the snapshot cannot be used; otherwise it has committed and the snapshot can be used.
    • Repeatable read: the snapshot cannot be used in either case. If it could be used, another transaction could also read the data row, modify it and commit, and the value read by the current transaction would change, i.e., a non-repeatable read would occur.

If a data row snapshot cannot be used, follow the rollback pointer ROLL_PTR of the undo log to the next snapshot and apply the same judgment again.
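A minimal sketch of the resulting behaviour under REPEATABLE READ, using the table t from section 5.3 (the values are illustrative):

-- T1: START TRANSACTION;
-- T1: SELECT x FROM t WHERE id = 1;                     -- returns "c"
-- T2: UPDATE t SET x = 'd' WHERE id = 1; COMMIT;
-- T1: SELECT x FROM t WHERE id = 1;                     -- still returns "c": T2's committed change is not
--     visible, because T1 keeps reading an older snapshot through the undo chain.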

5.5 Snapshot read and current read

Snapshot read

A SELECT under MVCC reads data from a snapshot and does not need to acquire locks.

SELECT * FROM table ...;

Current read

Operations that modify the database (INSERT, UPDATE, DELETE) still need to acquire locks in order to read the latest data. So MVCC does not eliminate locking entirely; it only avoids the locking that a plain SELECT would otherwise require.

When performing a SELECT, you can also explicitly request a lock. The first statement below acquires an S lock, the second an X lock.

SELECT * FROM table WHERE ? lock in share mode;
SELECT * FROM table WHERE ? for update;

6. Next-Key Locks

Next-Key Locks is a lock implementation of MySQL's InnoDB storage engine.

MVCC cannot solve the phantom read problem, and Next-Key Locks exists to solve this problem. Under the repeatable read (REPEATABLE READ) isolation level, using MVCC + Next-Key Locks can solve the phantom read problem.

Record Locks

A record lock locks the index entry of a record, not the record itself. If the table has no index, InnoDB automatically creates a hidden clustered index, so record locks can still be used.

Gap Locks

A gap lock locks the gaps between index entries, but not the entries themselves. For example, while one transaction is executing the following statement, other transactions cannot insert a row with c = 15 into table t:

SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;

Next-Key Locks

A next-key lock is a combination of a record lock and a gap lock: it locks both the index entry of a record and the gap before it, i.e., a left-open, right-closed interval. For example, if an index contains the values 10, 11, 13 and 20, the possible next-key lock intervals are:

(-∞, 10]
(10, 11]
(11, 13]
(13, 20]
(20, +∞)
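A rough sketch of how this prevents phantom reads under REPEATABLE READ, using the same hypothetical table t with an index on column c:

-- T1: START TRANSACTION;
-- T1: SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;   -- takes next-key locks on the scanned records and gaps
-- T2: INSERT INTO t(c) VALUES (15);                           -- blocks until T1 commits or rolls back
-- T1: SELECT c FROM t WHERE c BETWEEN 10 AND 20 FOR UPDATE;   -- same result as the first read: no phantoms
-- T1: COMMIT;                                                 -- now T2's INSERT can proceed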
