MySQL transaction and lock knowledge (1): MVCC

Preface

After reading this series of articles, you can master:

  • The characteristics of transactions and problems caused by transaction concurrency
  • Solution to the problem of transaction read consistency
  • Principle of MVCC
  • The classification of locks, the principle of row locks, and the algorithm of row locks

Note: this article uses MySQL 5.7 (any minor version), storage engine InnoDB, and transaction isolation level RR. If you use a different MySQL version, your results may differ from those shown here. You can check your environment with:

select version();
show variables like 'engine%';
show global variables like 'tx_isolation';

1. What is a database transaction?

1.1 Typical scenarios of transactions

Many times we need transactions because we want multiple database operations to either all succeed or all fail. For example, when a customer places an order, the order table, funds table, logistics table, and so on are all modified, and these changes need to execute in one transaction.

Many students first encounter a classic case when studying database transactions: the bank transfer. If we simplify an intra-bank transfer to one account's balance decreasing while another account's balance increases, then these two actions must succeed or fail together; otherwise the bank's books will not balance.

Another example is 12306's connecting-ticket feature: the two tickets must be purchased successfully together, since buying only the first leg or only the second leg is meaningless.

1.2 Definition of transaction

Wikipedia's definition: a transaction is a logical unit of work in a database management system (DBMS), consisting of a finite sequence of database operations.

There are two key points. First, a "logical unit" means it is the smallest unit of work in the database and cannot be divided. Second, it may contain one DML statement or a series of them (insert, delete, update).
(A single DDL statement (create, drop) or DCL statement (grant, revoke) also runs in a transaction.)

1.3 Which storage engines support transactions

Not every database, and not every storage engine, supports transactions; transaction support is a feature. Among the storage engines MySQL supports, which ones provide transactions?

Apart from NDB, which is a clustered engine, only InnoDB supports transactions, and this is an important reason why it became the default storage engine.
Why does transaction support make InnoDB stand out, and exactly what guarantees do transactions provide?

1.4 Four characteristics of transaction

  1. Atomicity: as we just said, a transaction cannot be divided. Our series of operations on the database either all succeed or all fail; partial success or partial failure is impossible. Taking the transfer scenario as an example, a decrease in one account's balance must correspond to an increase in another account's balance.
    All succeeding is the simple case. The question is: if the earlier operations have succeeded and a later operation fails, how can they all fail? At that point we must roll back.
    Atomicity is implemented in InnoDB through the undo log, which records the value of the data before modification (a logical log). Once an exception occurs, the undo log can be used to roll back.
  2. Isolation: once we have transactions, many of them will operate on the same table or even the same row at the same time, which inevitably produces concurrency and interference. Isolation means that concurrent transactions operating on the same tables or rows should be transparent to one another and not interfere. For example, two people each transfer 100 to Xiao Ming in two separate transactions; both read Xiao Ming's balance as 1000, each adds 100 on top of 1000, and the final result is 1100 instead of 1200 — the data is now wrong.
    How does InnoDB achieve isolation? We will analyze this in detail later.
  3. Durability: whatever operations we perform on the database — inserts, deletes, updates — once the transaction commits successfully, the result is permanent; a power failure, crash, or unexpected restart of the database cannot revert it to the previous state.
    How is durability achieved? Recall what makes InnoDB crash recovery (crash-safe) possible.
    Durability is implemented through the redo log and the double write buffer. When we modify data, it is first written to the buffer pool in memory, and the redo log is written at the same time. If an exception occurs before the pages are flushed to disk, the redo log can be replayed after restart to write the changes to disk, guaranteeing durability. Of course, successful recovery requires that the data pages themselves are intact and complete, which is what the double write buffer guarantees.
    Note that atomicity, isolation, and durability ultimately all serve consistency.
  4. Consistency: the database's integrity constraints are not violated; the data is in a legal state both before and after the transaction executes.
    The database itself provides some constraints — for example, primary keys must be unique and field lengths must meet their definitions. There is also user-defined integrity.
    For example, in the transfer scenario, suppose account A's balance is 0 and a transfer of 1000 yuan "succeeds", leaving A's balance at -1000. Atomicity is satisfied, but we know a debit card balance cannot be negative, so consistency is violated. User-defined integrity is usually enforced in application code.
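To make the rollback behind atomicity concrete, here is a minimal sketch of a transfer, assuming a hypothetical `account` table (not part of the original article):

```sql
-- Hypothetical table for illustration.
create table account (id int primary key, balance int);
insert into account values (1, 1000), (2, 0);

begin;
update account set balance = balance - 100 where id = 1;
update account set balance = balance + 100 where id = 2;
-- Suppose something goes wrong here: the undo log lets InnoDB revert both updates.
rollback;

select balance from account where id = 1;  -- still 1000: neither update survived
```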

1.5 When will transactions appear in the database

When I execute an update statement like this one, is there a transaction?

update student set sname = '二营长' where id = 1;

In fact, it not only automatically opens a transaction but also automatically commits it, which is why the change ends up on disk.
This is the first way a transaction starts: any insert, delete, or update statement automatically opens a transaction — a single-SQL transaction. Note that every transaction is assigned a number; this number is an integer and increases monotonically.
If you want to put multiple SQL statements in one transaction, you must start the transaction manually.
There are two ways to manually start a transaction:

  • One is begin;
  • The other is start transaction;

So how to end a transaction?
There are two ways to end the transaction:

  • The first is to roll it back with rollback; the transaction ends.
  • The second is to commit it with commit; the transaction ends.

There is an autocommit parameter in InnoDB (at two levels: session level and global level).

show variables like 'autocommit';

Its default value is ON. What does autocommit mean? Whether to commit automatically. If it is true/ON, every data-manipulating statement we run is automatically committed as its own transaction.
If we set autocommit to false/OFF, transactions must be ended manually, with rollback or commit.
There is one more case: a transaction also ends when the client's connection is disconnected.
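The ways of starting and ending a transaction described above can be sketched in one session, reusing the `student` table from the earlier example:

```sql
-- autocommit = ON (the default): each statement is its own transaction.
update student set sname = '二营长' where id = 1;   -- auto-begin, auto-commit

-- Manually started transaction, ended with rollback:
begin;
update student set sname = 'A' where id = 1;
rollback;                                           -- change discarded

-- Manually started transaction, ended with commit:
start transaction;
update student set sname = 'B' where id = 1;
commit;                                             -- change persisted
```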

1.6 What are the problems caused by transaction concurrency?

  • Dirty read:
    Suppose we have two transactions, one with transaction number 1010 and the other with transaction number 1011. The first transaction queries a row with the condition where id = 1 and gets back name = Xiaoming, age = 16. The second transaction then operates on the same row: it runs an update statement that changes the age of the row with id = 1 to 18 — but note, it has not committed.
    At this point the first transaction executes the same query again and finds the data has changed: the age it reads is now 18. Within one transaction, the two reads are inconsistent because another transaction modified the data without committing. This concurrency problem is called a dirty read.
    In the transfer case, if our first transaction acts on an uncommitted balance it read from the second transaction, and the second transaction then rolls back, the data becomes inconsistent.

  • Non-repeatable read:
    Again there are two transactions. The first transaction reads the row with id = 1. Then the second transaction executes an update on that row and commits the change. When the first transaction reads again, it sees data committed by the other transaction, so its two reads disagree — is age 16 or 18? This situation, where a transaction reads data committed by other transactions and its two reads become inconsistent, is called a non-repeatable read.

  • Phantom read:
    In the first transaction we execute a range query, and at that moment only one row satisfies the condition. The second transaction then inserts a row and commits it — important: a row is inserted. When the first transaction queries again, it finds an extra row, as if a phantom had suddenly appeared. This inconsistency between two reads in a transaction, caused by other transactions inserting data, is called a phantom read.

What is the biggest difference between non-repeatable reads and phantom reads?
Read inconsistency caused by updates or deletes is a non-repeatable read; read inconsistency caused by inserts is a phantom read.
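All three problems can be reproduced with two parallel client sessions. The following is a minimal sketch, assuming a hypothetical `t_user` table with columns (id, age) and an initial row (1, 16); the isolation levels noted are the lowest ones at which each problem appears.

```sql
-- 1. Dirty read (session A runs at READ UNCOMMITTED).
-- Session A:
set session transaction isolation level read uncommitted;
begin;
select age from t_user where id = 1;     -- returns 16
-- Session B (no commit):
begin;
update t_user set age = 18 where id = 1;
-- Session A reads B's uncommitted change:
select age from t_user where id = 1;     -- returns 18 (dirty read); B then rolls back

-- 2. Non-repeatable read (session A runs at READ COMMITTED).
-- Session A:
set session transaction isolation level read committed;
begin;
select age from t_user where id = 1;     -- returns 16
-- Session B (committed this time):
update t_user set age = 18 where id = 1;
-- Session A:
select age from t_user where id = 1;     -- returns 18: two reads disagree

-- 3. Phantom read (SQL92 REPEATABLE READ; InnoDB's RR actually prevents this).
-- Session A:
begin;
select count(*) from t_user where id > 0;   -- returns 1
-- Session B:
insert into t_user values (2, 20);          -- a new row in A's range, committed
-- Session A:
select count(*) from t_user where id > 0;   -- under SQL92 RR may return 2: a "phantom"
```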

Summary:
We have now covered the three major problems caused by transaction concurrency; let's summarize. Dirty reads, non-repeatable reads, and phantom reads are all problems of database read consistency: in each case, two reads within one transaction disagree.
Read consistency must be solved by a transaction isolation mechanism that the database itself provides. Just as when we go to a restaurant the facilities and hygiene are the restaurant's responsibility, when we use a database, isolation is the database's responsibility.

1.7 SQL92 standard

The American National Standards Institute (ANSI) developed a SQL standard that recommends database vendors provide certain transaction isolation levels to solve transaction concurrency problems. There are many versions of this standard; the best-known is SQL92.

Let's take a look at the official website of the SQL92 standard.

Level              P1 (dirty read)   P2 (non-repeatable read)   P3 (phantom read)
READ UNCOMMITTED   Possible          Possible                   Possible
READ COMMITTED     Not Possible      Possible                   Possible
REPEATABLE READ    Not Possible      Not Possible               Possible
SERIALIZABLE       Not Possible      Not Possible               Not Possible

The standard contains a table defining four isolation levels. P1, P2, and P3 are the three transaction concurrency problems: dirty read, non-repeatable read, and phantom read. Possible means the problem may occur at that isolation level — in other words, it is not solved; Not Possible means that level solves the problem.

We analyze in detail how these 4 isolation levels are defined.

  • The first isolation level is Read Uncommitted (RU): a transaction can read other transactions' uncommitted data, so dirty reads can occur. It solves none of the problems.
  • The second isolation level is Read Committed (RC): a transaction can only read data that other transactions have committed, never data they have not committed. It solves dirty reads, but non-repeatable reads can still occur.
  • The third isolation level is Repeatable Read (RR): it solves non-repeatable reads, meaning that reading the same data multiple times within one transaction yields the same result. At this level, however, the standard does not require phantom reads to be solved.
  • The fourth is Serializable: at this level all transactions execute serially — operations on the data queue up, with no transaction concurrency — so it solves all of the problems.

SQL to change the transaction isolation level:

set global transaction isolation level read uncommitted;
set global transaction isolation level read committed;
set global transaction isolation level repeatable read;
set global transaction isolation level serializable;

This is the SQL92 standard, but different vendors and storage engines implement it differently. Oracle, for example, supports only two levels: RC (Read Committed) and Serializable. What about InnoDB, MySQL's transactional storage engine?

1.8 MySQL InnoDB support for isolation level

Transaction isolation level   P1 (dirty read)   P2 (non-repeatable read)   P3 (phantom read)
READ UNCOMMITTED              Possible          Possible                   Possible
READ COMMITTED                Not Possible      Possible                   Possible
REPEATABLE READ               Not Possible      Not Possible               Not Possible (in InnoDB)
SERIALIZABLE                  Not Possible      Not Possible               Not Possible

The four isolation levels InnoDB supports are exactly those defined by SQL92 (the higher the level, the lower the transaction concurrency), with one difference: InnoDB already solves the phantom read problem at the RR level. In other words, there is no need to resort to Serializable to solve every problem; InnoDB both guarantees data consistency and supports high concurrency. This is why InnoDB uses RR as its default transaction isolation level.

1.9 Two implementation solutions to solve read consistency

If we want to solve read consistency — guaranteeing that two reads within a transaction return the same result — and thus achieve transaction isolation, what can we do? Broadly speaking, there are two classes of solutions.

1.9.1 LBCC (Lock Based Concurrency Control)

The first: since we must guarantee that the two reads agree, when I read the data I lock the rows I am operating on and forbid other transactions from modifying them.

If transaction isolation were based on locks alone — a row being read could never be modified by anyone else — concurrent reads and writes would be impossible, and since most applications read far more than they write, this would greatly hurt performance.

So we have another solution.

1.9.2 MVCC (Multi Version Concurrency Control)

If we want the two reads within a transaction to agree, we can create a backup — a snapshot — of the data before it is modified, and later read from the snapshot.

Principles of MVCC:

Data versions a transaction CAN see:

  1. Modifications by transactions that committed before this transaction's first query
  2. Modifications made by this transaction itself

Data versions a transaction CANNOT see:

  1. Modifications by transactions created after this transaction's first query (their transaction ID is greater than mine)
  2. Modifications by active (uncommitted) transactions

The effect of MVCC: I can see the data that existed before my transaction started, even if it is modified or deleted later; and data added after my transaction started, I cannot see.
That is why we call it a snapshot: no matter what other transactions insert, delete, or update, this transaction only ever sees the data versions visible at its first query.

Question: how is this snapshot implemented? Does it take up extra storage space?
Let's analyze the principle of MVCC. First, InnoDB numbers its transactions, and the numbers keep increasing.

InnoDB adds two hidden fields to every row:
DB_TRX_ID (6 bytes): the transaction ID. When a row is inserted, or modified into a new version, it records the current transaction ID.
DB_ROLL_PTR (7 bytes): the rollback pointer. (In this simplified model we treat it as a "delete version": when a row is deleted, or becomes an old version, it records the current transaction ID; it is empty while the row is neither modified nor deleted.)

(1) The first transaction initializes (inserts) the data:

Transaction 1
begin;
insert into mvcctest values (NULL, 'Xiao Ming');
insert into mvcctest values (NULL, 'Lao Wang');
commit;

At this point the create version of both rows is the current transaction ID (assume the transaction number is 1), and the delete version is empty:

id   name        Create version   Delete version
1    Xiao Ming   1                NULL
2    Lao Wang    1                NULL

(2) The second transaction executes its first query and reads the two original rows; its transaction ID is 2:

Transaction 2
begin;
select * from mvcctest; -- first query

(3) The third transaction, insert data:

Transaction 3
begin;
insert into mvcctest values (NULL, 'Lao Zhang');
commit;

Now the data contains one more row, 'Lao Zhang', whose create version is the current transaction number, 3:

id   name        Create version   Delete version
1    Xiao Ming   1                NULL
2    Lao Wang    1                NULL
3    Lao Zhang   3                NULL

(4) The second transaction, execute the second query:

Transaction 2
select * from mvcctest; -- second query

According to MVCC's rules, data inserted after my transaction started cannot be seen: the create version of 'Lao Zhang' (3) is greater than 2, so only two rows are found.

(5) The fourth transaction deletes data — the record with id = 2 ('Lao Wang'):

Transaction 4
begin;
delete from mvcctest where id = 2;
commit;

Now the delete version of 'Lao Wang' is recorded as the current transaction ID, 4; the other rows are unchanged:

id   name        Create version   Delete version
1    Xiao Ming   1                NULL
2    Lao Wang    1                4
3    Lao Zhang   3                NULL

(6) In the second transaction, execute the third query:

Transaction 2
select * from mvcctest; -- third query

Search rule: only rows whose create version is less than or equal to the current transaction ID, and whose delete version is greater than the current transaction ID (or NULL), are visible. That is, rows deleted after my transaction started can still be seen. So 'Lao Wang' can still be found, and the query still returns the same two rows.

(7) The fifth transaction, perform the update operation, the transaction ID of this transaction is 5:

Transaction 5
begin;
update mvcctest set name = 'China No. 1 Handsome' where id = 1;
commit;

When the data is updated, the old version's delete version is recorded as the current transaction ID, 5 (via undo/rollback),
and a new row is generated whose create version is the current transaction ID, 5:

id   name                   Create version   Delete version
1    Xiao Ming              1                5
2    Lao Wang               1                4
3    Lao Zhang              3                NULL
1    China No. 1 Handsome   5                NULL

(8) The second transaction, execute the fourth query:

Transaction 2
select * from mvcctest; -- fourth query

Search rule: only rows whose create version is less than or equal to the current transaction ID, and whose delete version is greater than the current transaction ID (or NULL), are visible.

The updated row 'China No. 1 Handsome' has a create version greater than 2, meaning it was created after my transaction started, so it cannot be seen. The old row 'Xiao Ming' has a delete version (5) greater than 2, meaning it was deleted after my transaction started, so it can still be seen — and the same holds for 'Lao Wang' (delete version 4). So the fourth query still returns the original two rows.
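The walkthrough above can be condensed into a tiny simulation. This is a toy model of the article's simplified create-version/delete-version rules, not InnoDB's actual implementation; the function and data layout are invented for illustration.

```python
# Toy model of the simplified MVCC rules from the walkthrough:
# a row is visible to transaction `trx_id` iff
#   create_version <= trx_id  AND  (delete_version is NULL or > trx_id).

def visible_rows(rows, trx_id):
    """Rows (id, name) that a transaction with this id can see."""
    return [
        (rid, name)
        for rid, name, created, deleted in rows
        if created <= trx_id and (deleted is None or deleted > trx_id)
    ]

# Table state after all five transactions in the walkthrough:
rows = [
    (1, "Xiao Ming",            1, 5),     # old version, superseded by trx 5's update
    (2, "Lao Wang",             1, 4),     # deleted by trx 4
    (3, "Lao Zhang",            3, None),  # inserted by trx 3
    (1, "China No. 1 Handsome", 5, None),  # new version written by trx 5
]

# Transaction 2 still sees exactly the two original rows, every time:
print(visible_rows(rows, 2))   # [(1, 'Xiao Ming'), (2, 'Lao Wang')]
# A later transaction (id 6) sees the current data instead:
print(visible_rows(rows, 6))   # [(3, 'Lao Zhang'), (1, 'China No. 1 Handsome')]
```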

From this demonstration we can see that, thanks to version-number control, the data seen by the first transaction's queries never changes, no matter what other transactions insert, update, or delete. This is the effect of MVCC. Of course, this is a simplified model.

Question 1: in InnoDB, where are the old versions of a row stored?

The answer: in the undo log. Since a row may be modified many times, its undo log records form a chain — the undo log chain. The DB_ROLL_PTR field mentioned earlier is in fact a pointer into this undo log chain.

Question 2: transactions started at different points in time get different data when they walk the undo log chain. Within the chain, how does a transaction decide which version it should read?

Following the MVCC rules above, it must make a series of comparisons based on transaction IDs. So we need a data structure that records this transaction's ID, the IDs of the active transactions, and the current maximum transaction ID in the system. This data structure is called the Read View (visibility view); every transaction maintains its own Read View.

A Read View contains:

  • m_ids: the list of transaction ids of the read-write transactions active in the system when the ReadView is generated
  • min_trx_id: the smallest transaction id among those active read-write transactions, i.e. the minimum of m_ids
  • max_trx_id: the id the system would assign to the next transaction at the moment the ReadView is generated
  • creator_trx_id: the transaction id of the transaction that generated this ReadView

With this data structure, a transaction judges visibility as follows:

  1. Judge starting from the row's version, walking older versions via the undo log chain as needed.
  2. If the version's trx_id = creator_trx_id, it is this transaction's own modification: visible.
  3. If the version's trx_id < min_trx_id (smaller than every uncommitted transaction's id), the version was committed before the ReadView was generated: visible.
  4. If the version's trx_id >= max_trx_id (the next transaction id), the version was created by a transaction started after the ReadView was generated: not visible.
  5. If the version's trx_id is between min_trx_id and max_trx_id, check m_ids: if it is in the list, not visible; if not, visible.
  6. If the current version is not visible, move to the next (older) version in the undo log chain.
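These six rules can be sketched as a small function. This is an illustrative model of the comparisons only, with invented names; real InnoDB reads these fields from its ReadView structure and walks versions via DB_ROLL_PTR.

```python
# Sketch of the Read View visibility rules above (simplified model).

def version_visible(trx_id, creator_trx_id, min_trx_id, max_trx_id, m_ids):
    """Can a ReadView see a data version written by transaction `trx_id`?"""
    if trx_id == creator_trx_id:      # rule 2: our own modification
        return True
    if trx_id < min_trx_id:           # rule 3: committed before the ReadView was made
        return True
    if trx_id >= max_trx_id:          # rule 4: started after the ReadView was made
        return False
    return trx_id not in m_ids        # rule 5: active (uncommitted) -> invisible

# ReadView of a transaction with id 5: transactions 3 and 7 are still active,
# and the next id the system would assign is 8.
view = dict(creator_trx_id=5, min_trx_id=3, max_trx_id=8, m_ids={3, 7})

print(version_visible(5, **view))   # True  - our own change
print(version_visible(2, **view))   # True  - committed before the view
print(version_visible(9, **view))   # False - started after the view
print(version_visible(7, **view))   # False - still active
print(version_visible(4, **view))   # True  - between min and max, not active
```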

Note:
Under RR (Repeatable Read), the Read View is created at the transaction's first query. Under RC (Read Committed), a new Read View is created for every query in the transaction.
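This difference can be observed directly — a sketch, again using the hypothetical `t_user` table with an initial row (1, 16):

```sql
-- Session A, under RR: the Read View is fixed at the first query.
set session transaction isolation level repeatable read;
begin;
select age from t_user where id = 1;   -- returns 16; Read View created here

-- Session B commits a change:
update t_user set age = 18 where id = 1;

-- Session A:
select age from t_user where id = 1;   -- still 16: the same Read View is reused

-- Under READ COMMITTED, session A's second select would build a new
-- Read View and return 18 instead.
```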

Oracle, PostgreSQL, and other databases also have MVCC implementations.
Note that in InnoDB, MVCC and locks are used together; the two schemes are not mutually exclusive.
The first class of solutions is locks. How do locks achieve read consistency?

That is the topic of the next article:
MySQL transaction and lock knowledge (2): MySQL locks


Origin blog.csdn.net/nonage_bread/article/details/113028396