Same MySQL select, same conditions: why are the results different?

Suppose there is the following table in the current database.

[Figure: original state of the user table]

As usual, everything below assumes the InnoDB engine at its default isolation level, repeatable read.

[Figure: the same select returns different results]

As you can see, thread 1 reads the rows with age >= 3. The first read returns 1 row, which is the original state. After that, thread 2 changes the age field of the row with id=2 to 3.

Thread 1 then reads two more times. One read still returns the original single row, while the other returns two rows. The only difference between them is whether for update is added.

The conditions are identical every time, so why are the results not the same?

Doesn't repeatable read require every read to return the same content?

To answer this question, I need to start with how Pangu created the world.

Sorry.

Got lost there.

Let's start with how a transaction is rolled back.

How is transaction rollback implemented?

When we execute a transaction, it generally looks like this:

begin;
operation 1;
operation 2;
operation 3;
xxxxx
....
commit;

Before the transaction is committed, various operations are performed, and they can contain all kinds of logic.

Any logic that executes may report an error.

Recall the A in a transaction's ACID properties, atomicity: the whole transaction is a single unit that either succeeds together or fails together.

[Figure: ACID]

If it fails, the half-executed transaction needs to be able to return to the state before the transaction started, which is rollback.

The code that executes the transaction is written roughly like this.

begin;
try:
    operation 1;
    operation 2;
    operation 3;
    xxxxx
    ....
    commit;
except Exception:
    rollback;
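The pseudocode above can be turned into something runnable. Here is a minimal sketch, assuming an in-memory SQLite database from Python's standard library as a stand-in for MySQL; the table layout and the function name rename_then_fail are made up for illustration. The second statement fails on purpose, so the earlier update is undone by the rollback.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names are illustrative.
# isolation_level=None turns off implicit transactions so we issue BEGIN ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO user VALUES (1, 'Xiaobai')")


def rename_then_fail(conn):
    """Run two operations in one transaction; the second one fails."""
    conn.execute("BEGIN")
    try:
        # operation 1: succeeds
        conn.execute("UPDATE user SET name = 'Xiaobai debug' WHERE id = 1")
        # operation 2: violates NOT NULL and raises sqlite3.IntegrityError
        conn.execute("INSERT INTO user VALUES (2, NULL)")
        conn.execute("COMMIT")
    except sqlite3.Error:
        # undo operation 1 as well, so the transaction fails as a whole
        conn.execute("ROLLBACK")


rename_then_fail(conn)
name = conn.execute("SELECT name FROM user WHERE id = 1").fetchone()[0]
print(name)  # Xiaobai
```

The name is back to its original value even though the update itself succeeded, which is exactly the "fail together" half of atomicity.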

If executing rollback can return to the state before the transaction started, MySQL must know what the affected rows looked like before the transaction.

How does the database do it?

This is where the undo log comes in. It records what a row of data looked like before the transaction modified it.

For example, if the name field of the row with id=1 is updated from "Xiaobai" to "Xiaobai debug", an undo log is added to record the previous data.

[Figure: the undo log records the previous data]

Since many transactions can execute concurrently, there may be many undo logs. The transaction id (trx_id) is added to each undo log to indicate which transaction generated it.

The undo logs are organized as a linked list: a pointer (roll_pointer) is added to each undo log to point to the previous undo log, forming a version chain.

[Figure: undo log version chain]

With this version chain, when a transaction fails halfway through, it is rolled back by following the version chain back to the state before the transaction started.
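To make the version chain concrete, here is a small Python sketch. It is a simplification under stated assumptions, not InnoDB's actual storage layout: every version is modeled as one node, and the class names Version and Row are invented for this example. Only trx_id and roll_pointer come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One snapshot of a row (hypothetical model of an undo log entry)."""
    data: dict
    trx_id: int                                # transaction that wrote this version
    roll_pointer: Optional["Version"] = None   # link to the previous, older version


class Row:
    def __init__(self, data, trx_id):
        self.head = Version(dict(data), trx_id)

    def update(self, changes, trx_id):
        # The old head stays reachable through roll_pointer.
        new_data = {**self.head.data, **changes}
        self.head = Version(new_data, trx_id, roll_pointer=self.head)

    def rollback(self, trx_id):
        # Walk back past every version written by the failed transaction.
        while self.head.trx_id == trx_id and self.head.roll_pointer is not None:
            self.head = self.head.roll_pointer


row = Row({"id": 1, "name": "Xiaobai"}, trx_id=1)
row.update({"name": "Xiaobai debug"}, trx_id=2)
after_update = row.head.data["name"]      # "Xiaobai debug"
row.rollback(trx_id=2)
after_rollback = row.head.data["name"]    # "Xiaobai"
print(after_update, "->", after_rollback)
```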

What are current read and snapshot read

With the undo log version chain above, we can see that the latest data is at the head, and everything after it is an older version of the data. Whether it is the latest version or an old one, we call it a data snapshot.

A current read reads the head of the version chain, that is, the latest data.

A snapshot read reads one of the snapshots in the version chain. Of course, if that snapshot happens to be the head, the result of the snapshot read is the same as a current read.

[Figure: current read and snapshot read]

The ordinary select statements we usually execute, such as the following, are snapshot reads.

select * from user where phone_no=2;

Special select statements, such as a select with lock in share mode or for update appended, are current reads.

In addition, insert, update, and delete are all write operations. A write must operate on the latest data, so it also triggers a current read.

So here comes the question.

A current read reads the head of the version chain. While a current read is executing, could another transaction generate a newer snapshot that replaces the head, so that the read does not return the latest data?

The answer is no. Whether it is a (special) read such as select ... for update or a write such as insert or update, the row is locked. Undo log snapshots are only generated by write operations, and a write must acquire the lock before it runs. So the write blocks until the lock taken by the current read is released, and only then can it update the version chain.

Read view

A large number of transactions can execute concurrently in the database. Each transaction is assigned a transaction id, and the ids are incrementing: the newer the transaction, the larger the id.

In the undo log version chain of a row, each undo log also carries a transaction id (trx_id), which is the id of the transaction that created that undo log.

Not all transactions generate undo logs, so the version chain of a row only contains the ids of some transactions. However, any transaction may access the version chain of that row. And although there are many undo log snapshots on the version chain, not all of them can be read: for some of them, the transaction that created them has not committed yet and may still fail and roll back at any time.

So the question becomes: when a transaction reads the undo log version chain with a snapshot read, which snapshots can it read? And which snapshot should it read?

This is where the concept of the read view comes in. It is like a sliding window with upper and lower boundaries.

All transactions in the database are either committed or uncommitted. Uncommitted transactions are still in progress; they are the so-called active transactions. The ids of all active transactions form m_ids. The smallest id among them is the lower boundary of the read view, called min_trx_id.

At the moment the read view is generated, the largest transaction id among all transactions, plus 1, is the upper boundary of the read view, called max_trx_id.
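As a rough sketch of these definitions, a read view can be built from two inputs: the ids of the active transactions and the largest transaction id seen so far. The class ReadView, the function create_read_view, and the field creator_trx_id are names chosen for this example; only m_ids, min_trx_id, and max_trx_id come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    creator_trx_id: int   # the transaction that owns this read view
    m_ids: frozenset      # ids of all transactions active at creation time
    min_trx_id: int       # lower boundary: smallest active id
    max_trx_id: int       # upper boundary: largest known id + 1


def create_read_view(creator_trx_id, active_trx_ids, largest_trx_id):
    """Build a read view; the creator itself counts as an active transaction."""
    m_ids = frozenset(active_trx_ids) | {creator_trx_id}
    return ReadView(creator_trx_id, m_ids, min(m_ids), largest_trx_id + 1)


# Transactions 3 and 5 are still running; transaction 6 was the last to start
# and has already committed.
view = create_read_view(creator_trx_id=5, active_trx_ids=[3, 5], largest_trx_id=6)
print(view.min_trx_id, view.max_trx_id)  # 3 7
```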

Too many concepts, a bit messy? That's okay, keep reading, examples are coming.

Which snapshots a transaction can read

With this basic information, let's look at which snapshots a transaction can read under a read view.

Remember one major premise: a transaction can only read undo log data generated by itself (whether or not it has committed), or data that other transactions have committed.

Now that the transaction (let's call it transaction A) has a read view, we can lay that read view over any undo log version chain. It divides the version chain into several parts.

[Figure: read view]

  • trx_id of version chain snapshot < min_trx_id of read view

    As described above, the m_ids of the read view are the ids of all active transactions in the database, and the smallest of them, min_trx_id, is the lower boundary of the read view. Because transaction ids increase over time, if the trx_id of a snapshot is smaller than min_trx_id, it must belong to an inactive (committed) transaction, so the snapshot can be read by transaction A.

  • trx_id of version chain snapshot >= max_trx_id of read view

    max_trx_id is generated at the moment transaction A creates the read view, and it is greater than every transaction id the database knows at that time. So if a snapshot on the version chain has a trx_id greater than or equal to max_trx_id, the snapshot is beyond what transaction A "knows about", and it should not be read.

  • min_trx_id of read view <= trx_id of version chain snapshot < max_trx_id of read view

      • If the trx_id of the snapshot is the id of transaction A itself, the snapshot was generated by A and can be read whether or not A has committed.

      • If the trx_id of the snapshot is in the active transaction list m_ids (and is not A's own id), that transaction has not committed, so transaction A cannot read it.

      • Otherwise, the snapshot belongs to a committed transaction and can be read with confidence.
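The rules above fit in one small function. This is a sketch of the decision as the text describes it, not InnoDB source code; the function name is_visible and the dict layout of the read view are assumptions for this example.

```python
def is_visible(trx_id, view):
    """Can the owner of `view` read a snapshot written by transaction `trx_id`?"""
    if trx_id == view["creator_trx_id"]:
        return True                      # own changes, committed or not
    if trx_id < view["min_trx_id"]:
        return True                      # committed before every active transaction
    if trx_id >= view["max_trx_id"]:
        return False                     # started after the read view was created
    return trx_id not in view["m_ids"]   # in the window: visible only if committed


# Hypothetical read view of transaction 5, with transactions 3 and 5 active.
view = {"creator_trx_id": 5, "m_ids": {3, 5}, "min_trx_id": 3, "max_trx_id": 7}

results = [is_visible(t, view) for t in (2, 3, 4, 5, 7)]
print(results)  # [True, False, True, True, False]
```

Transaction 4 is the interesting case: its id lies inside the window, but it is not in m_ids, so it had already committed when the read view was created.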

Which snapshot the transaction will read

As described above, within the visible range of the read view a transaction may be able to read many snapshots. With so many versions, which one will it read?

The transaction traverses the undo log version chain from the head and compares the trx_id of each undo log against the boundaries of its own read view. It stops at the first snapshot whose trx_id is less than max_trx_id and that also meets one of these conditions:

  • The snapshot was generated by the transaction itself. Committed or not, it is read.

  • The snapshot was generated by another transaction that has already committed. It is read.

For example, in the figure below, undo log 1 is the first snapshot with a trx_id below max_trx_id, and its transaction has committed, so it is read.

[Figure: read view and undo log version chain]
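The traversal can be sketched in a few lines of Python. The version chain is modeled as a plain list ordered from newest to oldest, and first_visible is a name invented for this example; the trx_id values are made up and do not correspond to the figure.

```python
def first_visible(chain, view):
    """Return the data of the first readable snapshot, walking from the head.

    chain: list of (trx_id, data) tuples, newest first.
    view:  dict with creator_trx_id, m_ids and max_trx_id.
    """
    for trx_id, data in chain:
        own = trx_id == view["creator_trx_id"]
        committed_in_time = (
            trx_id < view["max_trx_id"] and trx_id not in view["m_ids"]
        )
        if own or committed_in_time:
            return data
    return None  # no version of this row is visible to the transaction


view = {"creator_trx_id": 5, "m_ids": {3, 5}, "max_trx_id": 7}
chain = [(8, "v3"), (4, "v2"), (1, "v1")]   # head first

picked = first_visible(chain, view)
print(picked)  # v2
```

The head (trx_id=8) is skipped because it was written after the read view was created, so the walk stops at the next snapshot, written by the committed transaction 4.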

What is MVCC

As described above, the database maintains a multi-snapshot undo log version chain, and each transaction decides which undo log snapshot to read according to its own read view. Ideally, every transaction reads its own snapshot and runs its logic against it, and only touches the latest row data when writing, so reads and writes are separated. Compared with keeping a single copy of each row with no snapshots, this resolves read-write conflicts better, so the database's concurrency performance is also better. This is the MVCC so often asked about in interviews. Its full name is Multi-Version Concurrency Control.

[Figure: MVCC]

How are the four isolation levels implemented?

An earlier article of mine left a question at the end: how are the four isolation levels implemented?

After knowing the undo log version chain and MVCC , let's go back and look at this problem.

[Figure: the four isolation levels]

Read uncommitted reads the latest data every time, regardless of whether the transaction that wrote the row has committed. The implementation is simple: just read the head of the undo log version chain (the latest snapshot) every time.

Unlike read uncommitted, the read committed and repeatable read isolation levels are implemented on top of MVCC's read view. Conversely, MVCC only appears at these two isolation levels.

Under read committed, a new read view is generated every time an ordinary select is executed, and that latest read view is used to walk the version chain of a row, snapshot by snapshot, to find the first suitable version. This way, each read sees the latest data committed by other transactions.

Under repeatable read, a transaction generates a read view only when the first ordinary select is executed, and reuses that read view no matter how many ordinary selects follow. Every read is judged by the same standard, so the data read stays the same.

Serializable aims to make concurrent transactions look as if they were executed by a single thread, and its implementation is also simple. Like read uncommitted, a transaction under serializable only reads the head of the undo log version chain, that is, the latest snapshot. In addition, even an ordinary select places a read lock on that latest snapshot. Other transactions that want to write must wait for the read lock to be released. All transactions operating on this row block and wait for locks and are processed one by one, which is equivalent to single-threaded processing.
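The only difference between read committed and repeatable read described above is when the read view is created. Here is a toy comparison in Python; the transaction ids, the dict-based transaction objects, and the function names make_view and plain_select are all assumptions for this sketch.

```python
def make_view(creator, active_ids, largest_id):
    return {"creator": creator, "m_ids": set(active_ids), "max_trx_id": largest_id + 1}


def plain_select(chain, txn, active_ids, largest_id):
    """Snapshot read of one row; chain is a list of (trx_id, value), newest first."""
    # READ COMMITTED: fresh read view on every select.
    # REPEATABLE READ: create the read view once, then keep reusing it.
    if txn["level"] == "READ COMMITTED" or txn["view"] is None:
        txn["view"] = make_view(txn["id"], active_ids, largest_id)
    view = txn["view"]
    for trx_id, value in chain:
        if trx_id == view["creator"] or (
            trx_id < view["max_trx_id"] and trx_id not in view["m_ids"]
        ):
            return value
    return None


chain = [(1, 2)]  # age=2, written by the committed transaction 1
rc = {"id": 2, "level": "READ COMMITTED", "view": None}
rr = {"id": 3, "level": "REPEATABLE READ", "view": None}

before = (plain_select(chain, rc, [2, 3], 3), plain_select(chain, rr, [2, 3], 3))

chain.insert(0, (4, 3))  # transaction 4 sets age=3 and commits

after = (plain_select(chain, rc, [2, 3], 4), plain_select(chain, rr, [2, 3], 4))
print(before, after)  # (2, 2) (3, 2)
```

After transaction 4 commits, the read committed transaction sees the new value because its fresh read view has max_trx_id=5, while the repeatable read transaction keeps its old read view with max_trx_id=4 and still sees age=2.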

Look at the example at the beginning of the article

We use the concepts mentioned above to go back to the example at the beginning of the article and sort it out.

[Figure: original state of the user table]

We assume that the three rows initially in the table were all inserted by the transaction with trx_id=1.

So the table looks like this at the beginning, with only one snapshot per row. Note that the trx_id in each snapshot holds the id of the transaction that created it, which is the transaction 1 just mentioned. The roll_pointer should originally point to the undo log generated by the insert; for simplicity it is written here as null (the insert undo log can be cleaned up after the transaction commits).

[Figure: original trx information of the user table]

The figure below is the same one from the beginning of the article, repeated here so you don't need to scroll back.

[Figure: the same select returns different results]

When thread 1 starts its transaction, we assume its trx_id=2. Its first ordinary select is a snapshot read, and under repeatable read this generates a read view. At this point the database has only one active transaction, so m_ids=[2]. The smallest id in m_ids is min_trx_id=2. max_trx_id is the largest transaction id currently in the database (its own, so 2) plus 1, that is, max_trx_id=3.

[Figure: read view of thread 1's transaction]

Thread 1's transaction now uses this read view to read the table.

The trx_id=1 of all three rows is smaller than min_trx_id=2, so they are in the visible range and the snapshots of all three rows can be read. The rows that meet the condition (age>=3) are returned, and there is one.

Next, thread 2's transaction, which we assume has trx_id=3, performs the update and generates a new undo log snapshot.

[Figure: user table after the update adds an undo log]

Thread 1 then executes an ordinary select for the second time, which is again a snapshot read. Because the isolation level is repeatable read, the previous read view is reused. Focus on the row with id=2 and traverse its version chain: the first snapshot has trx_id=3 >= the read view's max_trx_id=3, so it is not readable; the next snapshot has trx_id=1 < min_trx_id=2, so it is readable. The row with id=2 therefore still reads as age=2, not the updated age=3, and the snapshot read still returns only one row matching age>=3.

On its third read, thread 1 executes select ... for update, which is a current read. It reads the latest snapshot of each row directly from the head of the version chain, so it sees id=2 with age=3, and the final result contains 2 rows matching age>=3.
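The whole walkthrough can be replayed in a short Python simulation. The ages of the rows with id=1 and id=3 are assumptions (the text only fixes id=2 at age=2 and says exactly one row matches at the start), and the helper names are invented for this sketch.

```python
# Each row maps to its version chain, newest first; a version is (trx_id, age).
# Assumed initial data: only the row with id=3 has age >= 3.
table = {1: [(1, 1)], 2: [(1, 2)], 3: [(1, 3)]}

# Read view created by thread 1's first ordinary select (its trx_id is 2).
view = {"creator": 2, "m_ids": {2}, "min_trx_id": 2, "max_trx_id": 3}


def visible(trx_id):
    if trx_id == view["creator"] or trx_id < view["min_trx_id"]:
        return True
    if trx_id >= view["max_trx_id"]:
        return False
    return trx_id not in view["m_ids"]


def snapshot_read(min_age):
    """Ordinary select: use the first visible version of every row."""
    ids = []
    for row_id, chain in table.items():
        for trx_id, age in chain:
            if visible(trx_id):
                if age >= min_age:
                    ids.append(row_id)
                break
    return ids


def current_read(min_age):
    """select ... for update: always use the head of the version chain."""
    return [row_id for row_id, chain in table.items() if chain[0][1] >= min_age]


first = snapshot_read(3)        # [3]
table[2].insert(0, (3, 3))      # thread 2 (trx_id=3) sets age=3 for id=2
second = snapshot_read(3)       # [3], the read view is reused
third = current_read(3)         # [2, 3]
print(len(first), len(second), len(third))  # 1 1 2
```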

In short, we see different results because snapshot reads and current reads follow different rules for reading data.

By now it should be clear that when repeatable read promises the same data on every read, the "read" it refers to is the snapshot read.

If the interviewer asks you next time, will the data read every time under the repeatable read isolation level be the same?

You should know how to answer, right?

Summary

  • A transaction is rolled back through the undo log, which is how the atomicity of the transaction is achieved.

  • The undo logs generated by multiple transactions form a version chain. For a snapshot read, the transaction decides which snapshot to read based on its read view. For a current read, the transaction directly reads the latest snapshot version.

  • MySQL's InnoDB engine improves read-write concurrency through MVCC.

Follow the WeChat official account: [Xiaobai debug]

Origin blog.csdn.net/ilini/article/details/124464707