Locks are the key to understanding isolation levels

Primer

When it comes to database isolation levels, we all seem to know them, yet we often cannot pin down the real difference between one level and the next. I read many articles on the Internet and thought I understood the topic at the time, but I forgot it after a while and then had to spend a lot of time understanding it again.

In the end, I still didn't understand.

After reading "Phoenix Architecture: Building a Reliable Large-Scale Distributed System" by Zhou Zhiming, I finally understood: the key to isolation levels is understanding database locks.

Different isolation levels are really just different combinations of locks.

Knowing this, we no longer have to memorize the isolation levels by rote; we can deduce them from the lock combinations.

Let's take a look at the three types of locks in the database.

Three locks

  • Write lock (also called exclusive lock, abbreviated X-Lock): if data has a write lock on it, only the transaction holding that lock can write the data. While the write lock is held, no other transaction can write the data or place a read lock on it.
  • Read lock (also called shared lock, abbreviated S-Lock): multiple transactions can hold read locks on the same data at the same time. Data that has a read lock cannot be given a write lock, so other transactions cannot write it, but they can still read it. If a transaction is the only one holding a read lock on the data, it may upgrade that lock directly to a write lock and then write the data.
  • Range lock: an exclusive lock placed on a whole range, so that no data in the range can be written, which also means no new rows can be inserted into the range. The following statement is a typical example of taking a range lock:
SELECT * FROM phones WHERE price < 100 FOR UPDATE;
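
The compatibility rules for read and write locks can be written down as a small table. The following is only a minimal Python sketch of those rules, not how any particular database implements its lock manager; the names COMPATIBLE, can_grant and can_upgrade are hypothetical and exist only for this illustration.

```python
# Toy model of the read/write lock rules described above.
# "S" = read (shared) lock, "X" = write (exclusive) lock.
# Keys are (lock already held by another transaction, lock being requested).
COMPATIBLE = {
    ("S", "S"): True,   # several transactions may hold read locks together
    ("S", "X"): False,  # a read lock blocks another transaction's write lock
    ("X", "S"): False,  # a write lock blocks another transaction's read lock
    ("X", "X"): False,  # a write lock blocks another transaction's write lock
}


def can_grant(requested, held_by_others):
    """Return True if `requested` is compatible with every lock held by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


def can_upgrade(other_readers):
    """A read lock may become a write lock only if no other transaction holds a read lock."""
    return other_readers == 0


print(can_grant("S", ["S", "S"]))  # read locks coexist
print(can_grant("X", ["S"]))       # the writer has to wait for the reader
print(can_grant("S", ["X"]))       # the reader has to wait for the writer
print(can_upgrade(0))              # sole reader may upgrade to a write lock
```

Only the pair read lock + read lock is compatible; every combination that involves a write lock has to wait.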

Isolation levels

Transactions have four characteristics: atomicity (Atomicity), consistency (Consistency), isolation (Isolation), and durability (Durability), together known as ACID.

An isolation level is a classification of the degree of isolation.

Isolation ensures that the data read and written by each transaction is independent of other transactions and is not affected by them. It follows from this definition that isolation is closely tied to concurrency: if there were no concurrency and all transactions ran serially, no isolation would be needed, or rather such access would be naturally isolated. In reality, however, concurrency is unavoidable, so how do we access data safely under concurrency?

Why do we need isolation levels at all, instead of executing every transaction serially? Serializable access provides the highest level of isolation: adding read locks, write locks, and range locks to all the data a transaction reads and writes guarantees that transactions are completely isolated and do not affect each other.

Obviously, the efficiency of doing so is too low, and the database must consider performance issues.

The problems that isolation levels deal with arise when, while one transaction is reading data, another transaction modifies that data. This is the isolation problem of "one transaction reads + another transaction writes".

In other words, I am in the middle of reading, the data is changed by someone else, and I end up reading wrong data. Four isolation levels exist to address this problem.

They are read uncommitted, read committed, repeatable read, and serializable.

We will use MySQL 8.0 to run a few experiments to verify this. First, prepare the table and data:

-- Create the table
create table phones(
    id int not null auto_increment,
    model varchar(30) not null default '',
    price int not null default 0,
    primary key(id)
) engine=InnoDB default charset=utf8mb4;
-- Insert data
insert into phones(model, price) values('001', 1800), ('002', 2100), ('003', 3000);

READ UNCOMMITTED (read uncommitted)

Lock combination: write lock

Read uncommitted (READ UNCOMMITTED) means that changes made by a transaction can be seen by other transactions before that transaction has committed.

At the read uncommitted level, changes made in a transaction are visible to other transactions even if they are not committed. Transactions can read uncommitted data, which is called "dirty read" (Dirty Read), because it is likely to read dirty data in the middle of the process, rather than the final data.

For example, transaction T1 first reads the phone's price as 1800. Transaction T2 then changes the price to 2000 but does not commit. T1 reads the price again and gets 2000. Finally T2 rolls back and the price returns to 1800.

The price of 2000 that T1 read in its second query is therefore dirty data.

select price from phones where id = 1;  -- Time 1, transaction T1: reads price=1800
update phones set price = 2000 where id = 1; -- Time 2, transaction T2: sets price=2000 (note: no commit here)
select price from phones where id = 1;  -- Time 3, transaction T1: reads price=2000
rollback; -- Time 4, transaction T2: rolls back, price=1800

Because read uncommitted takes no read lock when reading data, T2's uncommitted change is immediately visible to T1. This is a case where one transaction is affected by another and isolation is broken.

The read committed isolation level solves the "dirty read" problem.

At the read committed level, a read lock must be acquired before reading data. Since T2 holds a write lock on the row, T1's second query cannot obtain the read lock, so it blocks until T2 commits or rolls back, and only then returns a result.
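
The difference between reading with and without a read lock can be sketched in a few lines of Python. This is a toy model of the lock-based explanation above, not MySQL's actual implementation; the class Row, its methods, and the string "blocked" (standing in for a query that would wait) are assumptions made for the illustration.

```python
class Row:
    """One row that is modified in place and protected only by locks (no versions)."""

    def __init__(self, value):
        self.value = value      # value currently stored, possibly uncommitted
        self.committed = value  # last committed value, restored on rollback
        self.writer = None      # transaction holding the write lock, if any

    def write(self, txn, value):
        if self.writer not in (None, txn):
            raise RuntimeError("blocked by another transaction's write lock")
        self.writer = txn       # the write lock is kept until commit or rollback
        self.value = value

    def rollback(self, txn):
        if self.writer == txn:
            self.value = self.committed
            self.writer = None

    def read(self, txn, take_read_lock):
        # A read lock cannot be granted while another transaction holds the write lock.
        if take_read_lock and self.writer not in (None, txn):
            return "blocked"
        return self.value


price = Row(1800)
price.write("T2", 2000)                                  # T2 updates but does not commit
dirty = price.read("T1", take_read_lock=False)           # read uncommitted: no read lock
guarded = price.read("T1", take_read_lock=True)          # read committed: needs a read lock
price.rollback("T2")
after_rollback = price.read("T1", take_read_lock=True)
print(dirty, guarded, after_rollback)  # 2000 blocked 1800
```

Without a read lock T1 sees the uncommitted 2000; with a read lock it has to wait and only ever sees 1800.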

Let's verify the "dirty read" problem. Open two terminals, terminal A and terminal B.

1. Set the transaction isolation level of terminal A to read uncommitted.

-- Set the isolation level to read uncommitted
set session transaction isolation level read uncommitted;

2. Set the transaction isolation level of terminal B to read uncommitted, start a transaction, and change the price of the model 001 phone to 2000.

-- Set the isolation level to read uncommitted
set session transaction isolation level read uncommitted;
-- Start a transaction
start transaction;
-- Change the price
update phones set price = 2000 where id = 1;

Terminal B has started a transaction and changed the price of model 001 to 2000. Note that terminal B has not committed yet, but terminal A already reads the new price.

Terminal B now rolls back the transaction, and the price of model 001 returns to 1800. The modification is undone, and the price has effectively never changed.

rollback;

The price of 2000 read in the middle of terminal A is dirty data, which is called "dirty read".

READ COMMITTED (read committed)

Lock combination: write lock + read lock (released after the read operation is completed)

Read committed (READ COMMITTED) means that the changes made by a transaction become visible to other transactions only after it commits.

The default isolation level of the Oracle database system is read committed.

Read committed means that a transaction can only read data that other transactions have already committed, hence the name. This level is also described as non-repeatable read, because running the same query twice may give different results.

Let's look at an example. Suppose the price of the phone with id 1 is 1800 at the start.

The first query returns 1800. Before the second query runs, transaction T2 changes the price to 2000 and commits, so the second query returns 2000. The two reads return different data, which is called a non-repeatable read.

select price from phones where id = 1;  -- Time 1, transaction T1: reads price=1800
update phones set price = 2000 where id = 1; commit; -- Time 2, transaction T2: sets price=2000
select price from phones where id = 1; commit;  -- Time 3, transaction T1: reads price=2000

Non-repeatable reads happen because, at the read committed level, a read lock is taken when reading data but released as soon as the query finishes. Since the read lock does not span the whole transaction, it cannot prevent the data that was read from changing: transaction T2 can step in, take a write lock, and modify the data.

The repeatable read isolation level solves the non-repeatable read problem.

At the repeatable read level, the read lock taken by transaction T1 is not released right after the read but held for the whole transaction. Transaction T2 therefore cannot obtain the write lock, and its update blocks until T1 commits or rolls back.
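
The effect of releasing the read lock early versus holding it can be sketched as follows. This is a simplified Python model of the lock-based explanation, with hypothetical names (Row, two_reads, hold_until_commit); write locks are left out to keep it short, and a blocked write is modeled as a write that simply does not happen before T1 finishes.

```python
class Row:
    """A row guarded by read locks only; enough to show the two read-lock lifetimes."""

    def __init__(self, value):
        self.value = value
        self.readers = set()            # transactions currently holding a read lock

    def read(self, txn, hold_until_commit):
        self.readers.add(txn)           # acquire the read lock
        value = self.value
        if not hold_until_commit:       # read committed: release right after the read
            self.readers.discard(txn)
        return value

    def write(self, txn, value):
        if self.readers - {txn}:
            return False                # blocked by another transaction's read lock
        self.value = value
        return True

    def commit(self, txn):
        self.readers.discard(txn)       # all locks are released at commit


def two_reads(hold_until_commit):
    row = Row(1800)
    first = row.read("T1", hold_until_commit)
    row.write("T2", 2000)               # T2 tries to update between T1's two reads
    second = row.read("T1", hold_until_commit)
    row.commit("T1")
    return first, second


print(two_reads(hold_until_commit=False))  # (1800, 2000): non-repeatable read
print(two_reads(hold_until_commit=True))   # (1800, 1800): T2 had to wait
```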

Let's reproduce the non-repeatable read phenomenon:

1. Set the transaction isolation level of terminal A to read committed.

-- Set the isolation level to read committed
set session transaction isolation level read committed;

2. Set the transaction isolation level of terminal B to read committed, start a transaction, and change the price of the model 001 phone to 2000.

-- Set the isolation level to read committed
set session transaction isolation level read committed;
-- Start a transaction
start transaction;
-- Change the price
update phones set price = 2000 where id = 1;

Terminal B has changed the price from 1800 to 2000, but until terminal B commits, terminal A still sees the price of model 001 as 1800. The dirty read problem is solved.

After terminal B commits, terminal A sees the new price of 2000. Because terminal A gets different data before and after terminal B's commit, this is a non-repeatable read.

REPEATABLE READ (repeatable read)

Lock combination: write lock + read lock (continuous)

Repeatable read (REPEATABLE READ) means that the data seen during the execution of a transaction is always consistent with the data seen when the transaction is started.

The default transaction isolation level of the InnoDB storage engine in MySQL is repeatable read.

Repeatable read adds read locks and write locks to the data involved in the transaction and holds them until the transaction ends, but it does not add range locks. This level guarantees that reading the same record several times within one transaction gives the same result. Where repeatable read is weaker than serializable is the phantom read problem (Phantom Read), which it cannot solve. A phantom read occurs when a transaction reads a range of records, another transaction inserts a new record into that range, and the first transaction reads the range again and finds an extra row, the phantom row.

For example, we have an e-commerce website. To count the number of mobile phones whose price is less than 2,000 yuan, execute the following SQL statement:

select count(1) from phones where price < 2000;  -- Time 1, transaction T1
insert into phones(model, price) values('004', 1800); -- Time 2, transaction T2
select count(1) from phones where price < 2000;  -- Time 3, transaction T1

Transaction T1 executes two queries, but between the two queries, another transaction inserts a mobile phone whose price is less than 2000 in the database. Since the repeatable read isolation level does not add a range lock to prohibit the insertion of new data in the range, the new data is successfully inserted between the two query intervals.

Finally, we found that the results of the two queries were different. The second query had one more line than the first, which resulted in phantom rows. This is the problem of phantom reading.
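
How a range lock closes this gap can be sketched with a toy table in Python. The class Table and the function count_twice are hypothetical names for this illustration only, and a blocked insert is modeled as an insert that does not happen while the range lock is held; real databases lock index ranges rather than storing predicates like this.

```python
class Table:
    """A list of prices plus the ranges that running transactions have locked."""

    def __init__(self, prices):
        self.prices = list(prices)
        self.range_locks = []           # predicates describing locked ranges

    def count_below(self, limit, range_lock):
        if range_lock:                  # serializable: lock the range that was read
            self.range_locks.append(lambda price: price < limit)
        return sum(1 for price in self.prices if price < limit)

    def insert(self, price):
        if any(locked(price) for locked in self.range_locks):
            return False                # blocked: the new row falls into a locked range
        self.prices.append(price)
        return True


def count_twice(range_lock):
    table = Table([1800, 2100, 3000])
    first = table.count_below(2000, range_lock)    # T1, first count
    table.insert(1800)                             # T2 inserts a phone priced below 2000
    second = table.count_below(2000, range_lock)   # T1, second count
    return first, second


print(count_twice(range_lock=False))  # (1, 2): a phantom row appears
print(count_twice(range_lock=True))   # (1, 1): the insert is blocked
```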

Similarly, let's verify the problem of phantom reading:

1. Set the transaction isolation level of terminal A to repeatable read, start a transaction, and query the data.

-- Set the isolation level to repeatable read
set session transaction isolation level repeatable read;
-- Start a transaction
start transaction;
-- Query the data
select * from phones;

2. Set the transaction isolation level of terminal B to repeatable read, start a transaction, and set the price of the model 001 phone to 1900.

-- Set the isolation level to repeatable read
set session transaction isolation level repeatable read;
-- Start a transaction
start transaction;
-- Set the price of the phone with id 1 to 1900
update phones set price = 1900 where id = 1;
-- Commit the transaction
commit;
-- Query the data
select * from phones;

In the query result of terminal A, the price of mobile phone 001 is still 2,000 yuan, and there is no non-repeatable read problem, which shows that the transaction isolation level of repeatable read solves the problem of non-repeatable read.

However, terminal B has clearly changed the price of phone 001 to 1900 and committed, while terminal A still sees 2000. If terminal A modifies the price now, will the result be wrong?

Let's modify the price in terminal A and look at the result:

-- Add 50 to the price of the phone with id 1
update phones set price = price + 50 where id = 1;
-- Query the data
select * from phones;

Although terminal A reads the price of phone 001 as 2000, the update works on the latest value, 1900, so the price becomes 1950.

The repeatable read isolation level uses the MVCC (Multi-Version Concurrency Control) mechanism. A query (select) does not update the version number and is a snapshot read, whereas operations that change data in the table (insert, update, delete) update the version number and are current reads.
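
The difference between a snapshot read and a current read can be sketched with a versioned value. This Python model is an assumption-laden simplification: the class VersionedRow and the commit sequence numbers are hypothetical, and InnoDB's real mechanism (read views and the undo log) is more involved.

```python
class VersionedRow:
    """Keeps every committed value together with the sequence number of its commit."""

    def __init__(self, value):
        self.versions = [(0, value)]    # (commit sequence number, value)

    def commit_write(self, seq, value):
        self.versions.append((seq, value))

    def snapshot_read(self, snapshot_seq):
        # Snapshot read: newest value committed no later than the reader's snapshot.
        visible = [v for v in self.versions if v[0] <= snapshot_seq]
        return max(visible, key=lambda v: v[0])[1]

    def current_read(self):
        # Current read: newest committed value, regardless of the snapshot.
        return max(self.versions, key=lambda v: v[0])[1]


price = VersionedRow(2000)
snapshot_of_a = 0                       # terminal A took its snapshot before B's commit
price.commit_write(1, 1900)             # terminal B sets the price to 1900 and commits

seen_by_select = price.snapshot_read(snapshot_of_a)  # what A's select returns
updated = price.current_read() + 50                  # what A's "price + 50" is based on
print(seen_by_select, updated)  # 2000 1950
```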

Next, let's look at the phantom read problem. Start a transaction in terminal B and insert a row.

-- Start a transaction
start transaction;
-- Insert a row
insert into phones(model, price) values('004', 1960);
-- Commit the transaction
commit;
-- Query the data
select * from phones;

At this point terminal A does not see a phantom row. After terminal A updates a row and commits, the row inserted by terminal B shows up in its query as a phantom row:

-- Subtract 50 from the price of the phone with id 1
update phones set price = price - 50 where id = 1;
-- Query the data
select * from phones;
-- Commit the transaction
commit;
-- Query the data
select * from phones;

SERIALIZABLE (serializable)

Lock combination: write lock + read lock + range lock

Serializable (SERIALIZABLE), as the name implies, means that for the same row, a "write" takes a write lock and a "read" takes a read lock. When a read lock and a write lock conflict, the transaction that arrives later must wait for the earlier transaction to finish before it can continue.

Serializable is the highest isolation level. It forces transactions to execute serially and locks every row that is read. This is the safest mode and completely avoids reading wrong data, but it can cause many timeouts and heavy lock contention, which seriously hurts performance.

Modern databases cannot ignore performance. Concurrency control theory tells us that isolation and concurrency pull against each other: the higher the degree of isolation, the lower the throughput of concurrent access. Modern databases therefore offer isolation levels other than serializable and let users choose among them. The underlying purpose is to let users adjust how the database locks data, and so strike a balance between isolation and throughput.

Let's see how serialization avoids phantom reading:

1. Set the transaction isolation level of terminal A to serializable, start a transaction, and query the data.

-- Set the isolation level to serializable
set session transaction isolation level serializable;
-- Start a transaction
start transaction;
-- Query the data
select * from phones;

2. Set the transaction isolation level of terminal B to serializable, start a transaction, and update the data.

-- Set the isolation level to serializable
set session transaction isolation level serializable;
-- Start a transaction
start transaction;
-- Update the data
update phones set price = 2000 where id = 1;

When terminal B tries to update the row with id 1 in the phones table, it is blocked. Once the lock wait times out, the error "ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction" is raised. In this way, phantom reads are avoided.

MVCC

Besides all being implementable with locks, the four isolation levels above have something else in common: phantom reads, non-repeatable reads, and dirty reads all arise because a transaction that is reading data is affected by another transaction that is writing data, which breaks isolation. For this "one transaction reads + another transaction writes" isolation problem, a lock-free optimization called "Multi-Version Concurrency Control" (MVCC) has been widely adopted by mainstream commercial databases in recent years.

MVCC is a read optimization strategy; "lock-free" here means specifically that no lock is needed when reading. The basic idea of MVCC is that a modification never overwrites the previous data directly; instead, a new version is created and coexists with the old one, so that reads need no locks at all. "Version" is the key word in that sentence. You can think of the version as two invisible fields on every row in the database: CREATE_VERSION and DELETE_VERSION. Both fields hold transaction IDs, and a transaction ID is a globally, strictly increasing value. Data is then written according to the following rules.

Insert data

CREATE_VERSION records the transaction ID for inserting data, and DELETE_VERSION is undefined.

Insert a piece of data into the phones table, and assume that the two version numbers of MVCC are create_version and delete_version: create_version represents the version number of the created row; delete_version represents the version number of the deleted row. In order to better display the effect, add another field trans_id describing the transaction version number.

The SQL statement for inserting data is as follows:

insert into phones(model, price) values('001', 2000);

Modify data

Treat a modification as a combination of "delete the old data and insert the new data". First copy the original row. On the original row, DELETE_VERSION records the ID of the modifying transaction, and CREATE_VERSION keeps the transaction ID it already had. On the new copy, CREATE_VERSION records the ID of the modifying transaction, and DELETE_VERSION is undefined.

When performing a modification operation, the MVCC mechanism first copies the original data, sets the value of the price field to 2100, and then sets the value of the create_version field to the version number of the current system, while the value of the delete_version field is undefined.

In addition, the MVCC mechanism will also set the value of the delete_version field of the original row to the current system version number to indicate that the original row is deleted.

The SQL statement to modify the data is as follows:

update phones set price = 2100 where id = 1;

Note that the original row is copied to the Undo Log.

Delete data

DELETE_VERSION records the transaction ID for deleting data, and CREATE_VERSION is the transaction ID of the previous version.

delete from phones where id = 1;

When deleting a data row in the data table, the MVCC mechanism will write the version number of the current system into the deleted version field delete_version of the deleted data row, so as to identify that the current data row has been deleted.
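
The three write rules (insert, modify, delete) can be sketched together in Python. This is a minimal model that stores each version as a dictionary with the two fields described in this section; the function names and the transaction IDs 1, 2 and 3 are made up for the example, and a real engine would keep old versions in its undo log rather than in a list.

```python
def insert_row(versions, txn_id, price):
    """INSERT: CREATE_VERSION is the inserting transaction, DELETE_VERSION is undefined."""
    versions.append({"price": price, "create_version": txn_id, "delete_version": None})


def delete_row(versions, txn_id):
    """DELETE: stamp the live version with the ID of the deleting transaction."""
    for version in versions:
        if version["delete_version"] is None:
            version["delete_version"] = txn_id


def update_row(versions, txn_id, price):
    """UPDATE: mark the old version as deleted, then insert a new version."""
    delete_row(versions, txn_id)
    insert_row(versions, txn_id, price)


history = []                   # every version of the row with id = 1
insert_row(history, 1, 2000)   # transaction 1 inserts the row
update_row(history, 2, 2100)   # transaction 2 changes the price
delete_row(history, 3)         # transaction 3 deletes the row
for version in history:
    print(version)
```

After these three steps the history holds two versions: price 2000 created by transaction 1 and deleted by transaction 2, and price 2100 created by transaction 2 and deleted by transaction 3.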

At this point, if another transaction wants to read the changed data, it will decide which version of the data should be read according to the isolation level.

  • Isolation level repeatable read: always read a record whose CREATE_VERSION is less than or equal to the current transaction ID. If several versions still qualify, take the newest one (the one with the largest transaction ID).
  • Isolation level read committed: always take the latest version, that is, the most recently committed version of the record.
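
These two read rules can be sketched in Python as follows. The sketch follows the simplified rules exactly as stated in the list, using hypothetical function names and made-up transaction IDs; real engines such as InnoDB additionally track which transactions were still active when the snapshot was taken.

```python
def read_repeatable(versions, txn_id):
    """Repeatable read: newest version whose CREATE_VERSION <= the reader's transaction ID."""
    visible = [v for v in versions if v["create_version"] <= txn_id]
    return max(visible, key=lambda v: v["create_version"])["price"]


def read_committed(versions, committed_ids):
    """Read committed: newest version created by a transaction that has already committed."""
    visible = [v for v in versions if v["create_version"] in committed_ids]
    return max(visible, key=lambda v: v["create_version"])["price"]


# Two versions of one row: inserted by transaction 1, modified by transaction 3.
versions = [
    {"price": 2000, "create_version": 1, "delete_version": 3},
    {"price": 2100, "create_version": 3, "delete_version": None},
]

print(read_repeatable(versions, txn_id=2))             # 2000: transaction 3 is newer than the reader
print(read_committed(versions, committed_ids={1}))     # 2000: transaction 3 has not committed yet
print(read_committed(versions, committed_ids={1, 3}))  # 2100: transaction 3 has committed
```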

The other two isolation levels do not need MVCC. Read uncommitted modifies the original data in place, and other transactions see the change immediately when they read, so no version fields are needed. Serializable by definition blocks the read operations of other transactions, while MVCC is a lock-free optimization for reads, so the two are naturally not used together.

Summary

The "key" to understanding isolation levels is locks: different isolation levels are just different combinations of write locks, read locks, and range locks.

The four isolation levels, from strictest to weakest, correspond to four lock combinations, from most locks to fewest:

  • SERIALIZABLE (serializable): write lock + read lock + range lock
  • REPEATABLE READ (repeatable read): write lock + read lock (continuous)
  • READ COMMITTED (read committed): write lock + read lock (released after the read operation is completed)
  • READ UNCOMMITTED (read uncommitted): write lock
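
To show that the anomalies really can be deduced from the lock combination, here is a short Python sketch. The dictionary LEVELS, the lock labels ("short read", "long read", "range") and the function anomalies are names invented for this illustration; the deduction rules are the ones argued in this article.

```python
# Lock combination of each isolation level, as listed above.
# "short read" = read lock released right after the read,
# "long read"  = read lock held until the transaction ends.
LEVELS = {
    "SERIALIZABLE": {"write", "long read", "range"},
    "REPEATABLE READ": {"write", "long read"},
    "READ COMMITTED": {"write", "short read"},
    "READ UNCOMMITTED": {"write"},
}


def anomalies(locks):
    """Deduce which read anomalies remain possible with the given set of locks."""
    possible = []
    if not locks & {"short read", "long read"}:
        possible.append("dirty read")           # no read lock at all
    if "long read" not in locks:
        possible.append("non-repeatable read")  # read lock not held to the end
    if "range" not in locks:
        possible.append("phantom read")         # nothing stops inserts into the range
    return possible


for level, locks in LEVELS.items():
    print(level, anomalies(locks))
```

Each lock that is added removes exactly one anomaly, which is why the levels do not need to be memorized separately.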

No need to memorize by rote, isn't it much easier for us to understand the isolation level from the perspective of locks?

Locks happen to be the key to understanding isolation levels.

References:
"Phoenix Architecture: Building a Reliable Large-Scale Distributed System"
"In-depth Understanding of Distributed Transactions: Principles and Practices"


Origin blog.csdn.net/zhanyd/article/details/123207954