What is MVCC?

MVCC stands for Multi-Version Concurrency Control. It is typically used in databases and similar systems.


Multiversion concurrency control (MCC or MVCC) is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and in programming languages to implement transactional memory.


Without concurrency control, users who read and write data at the same time may see inconsistent data. For example, in a bank transfer from account A to account B, suppose the money has been deducted from A but not yet added to B. A user who checks the balances at that moment will think the money has vanished into thin air. MySQL's isolation levels exist to solve this kind of problem, and isolation can be achieved with different concurrency control methods. For the problem above, a simple approach is to serialize read and write operations: while a transfer is in progress, checking the accounts is not allowed. This solves the problem, but it is crude and inefficient. Compared with serialized concurrency control, MVCC has the advantage that reads and writes do not affect each other. For the read-heavy, write-light workloads typical of the modern Internet, this gives significantly higher performance.


MVCC implements concurrency control by keeping multiple versions of the data. When a piece of data needs to be updated, a storage system that implements MVCC does not immediately overwrite the original data with the new data; instead, it creates a new version of the record. In most database systems, storage is divided into a Data Part and an Undo Log: the Data Part stores data from committed transactions, and the Undo Log stores older versions of the data. Keeping multiple versions separates reads from writes: a read only needs an earlier version of the data, so it does not conflict with writes, which greatly improves performance.

Each time a record is updated, its old version is also recorded for rollback. Multiple versions of the same record can therefore exist in the system at once, which is what multi-version concurrency control (MVCC) in a database means.
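The idea of appending versions instead of overwriting can be sketched in a few lines of Python. This is a minimal illustration, not how any real database stores data: the names VersionedStore, write, and read are made up for this example, the balances and logical times are invented, and writes are assumed to arrive in increasing commit-time order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Toy multi-version store (hypothetical, for illustration only)."""

    # key -> list of (commit_time, value), oldest first
    versions: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def write(self, key: str, value: int, commit_time: int) -> None:
        # Append a new version; the old one stays readable.
        self.versions.setdefault(key, []).append((commit_time, value))

    def read(self, key: str, snapshot_time: int) -> Optional[int]:
        # Walk from newest to oldest and return the first version
        # committed at or before the reader's snapshot time.
        for commit_time, value in reversed(self.versions.get(key, [])):
            if commit_time <= snapshot_time:
                return value
        return None


store = VersionedStore()
store.write("A", 100, commit_time=0)  # initial balance of account A (invented)
store.write("A", 50, commit_time=1)   # balance after a transfer out of A

old_balance = store.read("A", snapshot_time=0)  # reader that started at time 0
new_balance = store.read("A", snapshot_time=1)  # reader that started at time 1
```

A reader with snapshot time 0 still gets the old balance, while a reader at time 1 gets the new one, and neither has to wait for the writer.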


The effect of MVCC

Suppose MVCC identifies data versions by time. At Time = 1, the state of the database is as follows:


Time    Record A                  Record B
0       "Record A when time=0"    "Record B when time=0"
1       "Record A when time=1"

At this point the system actually stores three records: two versions of Record A (at time 0 and at time 1) and one version of Record B. If a transaction starts at Time = 0, the data it reads is:


Record A                  Record B
"Record A when time=0"    "Record B when time=0"

If the transaction starts at Time = 1, the data it reads is:


Record A                  Record B
"Record A when time=1"    "Record B when time=0"

As the cases above show, a reading transaction can only see, for each piece of data, the latest version at or before its own version. Now suppose that at Time = 2 a transaction, Transaction X, inserts Record C and updates Record B, but has not yet committed. The state of the database is then:


Time                 Record A                  Record B                  Record C
0                    "Record A when time=0"    "Record B when time=0"
1                    "Record A when time=1"
2 (not committed)                              "Record B when time=2"    "Record C when time=2"

What will other transactions read at this point? Other reading transactions see the latest committed version of the system, which is still the state at Time = 1, so they do not read the data written by Transaction X. The data they read is still:


Record A                  Record B
"Record A when time=1"    "Record B when time=0"

With this versioning mechanism, another transaction never reads intermediate results such as Record B and Record C while Transaction X is still updating them, because other transactions continue to see the system in its Time = 1 state.
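The walkthrough above can be replayed with a small Python sketch. It assumes a simplified model in which every version carries a time and a committed flag; the names table, snapshot_read, and snapshot are invented for this illustration and do not correspond to any real database API.

```python
from typing import Dict, List, Optional, Tuple

# Each version is (time, value, committed). Illustrative model only.
Version = Tuple[int, str, bool]

table: Dict[str, List[Version]] = {
    "A": [(0, "Record A when time=0", True), (1, "Record A when time=1", True)],
    "B": [(0, "Record B when time=0", True), (2, "Record B when time=2", False)],
    "C": [(2, "Record C when time=2", False)],  # inserted by Transaction X
}


def snapshot_read(key: str, snapshot_time: int) -> Optional[str]:
    """Return the newest committed version at or before snapshot_time."""
    visible = [v for v in table[key] if v[2] and v[0] <= snapshot_time]
    if not visible:
        return None
    return max(visible, key=lambda v: v[0])[1]


def snapshot(snapshot_time: int) -> Dict[str, str]:
    """Collect every record that is visible at snapshot_time."""
    rows: Dict[str, str] = {}
    for key in table:
        value = snapshot_read(key, snapshot_time)
        if value is not None:
            rows[key] = value
    return rows


at_time_0 = snapshot(0)
at_time_1 = snapshot(1)
at_time_2 = snapshot(2)  # Transaction X has not committed yet
```

In this model a reader at Time = 2 gets the same rows as a reader at Time = 1, and Record C does not appear at all until Transaction X commits.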


As for exactly which version of the data each transaction should see, that is determined by the system implementing MVCC; below I introduce MySQL's MVCC implementation. Besides the rule that the data a read sees must have a version less than or equal to the system's current committed version, a write transaction's version must be greater than the current committed version. Here is a question to think about: if several write or update transactions start at Time = 2 and try to commit at the same time, one of them is bound to find that the database is already in the Time = 2 state.

So what should that transaction do? You can think about it.
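One possible answer, offered here only as an assumption and not as what any particular database does, is "first committer wins": the transaction that finds the version already advanced aborts and can be retried. The sketch below uses a single database-wide version counter, which is much coarser than real systems; Database, Transaction, and ConflictError are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when another transaction committed first."""


class Database:
    def __init__(self) -> None:
        self.committed_version = 1  # the system is at Time = 1
        self.data = {}

    def begin(self) -> "Transaction":
        return Transaction(self, self.committed_version)


class Transaction:
    def __init__(self, db: Database, start_version: int) -> None:
        self.db = db
        self.start_version = start_version
        self.writes = {}  # buffered until commit

    def write(self, key: str, value: str) -> None:
        self.writes[key] = value

    def commit(self) -> int:
        # If the committed version moved since this transaction began,
        # someone else already produced the next version: abort.
        if self.db.committed_version != self.start_version:
            raise ConflictError("version changed; retry the transaction")
        self.db.data.update(self.writes)
        self.db.committed_version += 1
        return self.db.committed_version


db = Database()
first, second = db.begin(), db.begin()
first.write("B", "written by first")
second.write("B", "written by second")
new_version = first.commit()  # moves the database to version 2
try:
    second.commit()
    second_committed = True
except ConflictError:
    second_committed = False
```

The losing transaction could then restart from the new version and try again.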


MVCC in MySQL

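MySQL's InnoDB engine supports several transaction isolation levels, and the RR (Repeatable Read) level is implemented with MVCC. In MySQL, the MVCC version is the transaction ID. First, let's look at how a row record is stored in InnoDB. Besides the basic row data there are some extra fields; here we mainly introduce the ones related to MVCC, DATA_TRX_ID and DATA_ROLL_PTR. The initial state of a table is as follows: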


Primary Key    Time         Name    DATA_TRX_ID    DATA_ROLL_PTR
               2018-4-28    Huan                   NULL

For ease of explanation, the DATA_TRX_ID and DATA_ROLL_PTR values stored in the table are mock values:


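DATA_TRX_ID: the ID of the transaction that most recently updated this record. Every time the database starts a transaction the transaction ID increases, so each transaction gets a different ID.

DATA_ROLL_PTR: a pointer to the older version of the data in the Undo Log, which is what supports transaction rollback.

The initial record cannot be rolled back, so its DATA_ROLL_PTR is empty.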
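The two fields can be modelled with a small Python sketch. It is only a conceptual model, not InnoDB's on-disk format: RowVersion, update, and rollback are hypothetical names, the transaction IDs are mock values, and "Huan-v2" is an invented new value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    """One version of a row, with the two hidden fields described above."""

    name: str
    data_trx_id: int                       # transaction that last wrote this version
    data_roll_ptr: Optional["RowVersion"]  # older version in the undo log, if any


def update(row: RowVersion, new_name: str, trx_id: int) -> RowVersion:
    # The old version is kept (conceptually, in the undo log) and the
    # new version points back to it through data_roll_ptr.
    return RowVersion(new_name, trx_id, data_roll_ptr=row)


def rollback(row: RowVersion) -> RowVersion:
    # Rolling back restores the version the roll pointer refers to.
    if row.data_roll_ptr is None:
        raise ValueError("the initial version has nothing to roll back to")
    return row.data_roll_ptr


initial = RowVersion("Huan", data_trx_id=1, data_roll_ptr=None)  # mock ID
updated = update(initial, "Huan-v2", trx_id=2)                   # invented value
restored = rollback(updated)
```

Rolling back the updated row returns the initial version, while rolling back the initial version fails because its roll pointer is empty.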


Now transaction A (transaction ID: 2) starts and updates the record, but has not yet committed. The current data is then:


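[Figure: the current record after transaction A's update, linked through DATA_ROLL_PTR to the old version in the Undo Log]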


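As you can see, the old data is stored in the Undo Log and linked from the current record through DATA_ROLL_PTR. So if another transaction wants to read this data, what will it read? Suppose another transaction B starts after transaction A (transaction ID: 3). We said at the start that InnoDB's MVCC is based on transaction IDs. If that were the whole story, then because B's transaction ID is larger than A's, B could read A's uncommitted data, which clearly contradicts the definition of InnoDB RR. In fact, when a transaction reads, deciding which version of a record to read involves fairly complex logic, not a simple comparison with the transaction ID on the record. Let the ID of the reading transaction be read_id, the transaction ID currently stored on the record be tid, and the largest and smallest transaction IDs among the currently uncommitted transactions (those in Read_View) be max_tid and min_tid. The visibility check then proceeds as follows: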


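Using the figure above (which was put together from several blog posts online and may not match the details of MySQL's actual logic), let's analyze the case mentioned earlier. Transaction B does not satisfy the condition read_id=tid||tid<min_tid, and the record currently has a DATA_ROLL_PTR, so what transaction B actually reads is the record in the Undo Log: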


Primary Key    Time         Name    DATA_TRX_ID    DATA_ROLL_PTR
               2018-4-28    Huan                   NULL
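The part of the check that the text spells out can be sketched in Python. This is deliberately incomplete: it applies only the condition read_id=tid||tid<min_tid and otherwise follows DATA_ROLL_PTR, it does not reproduce the max_tid branch of the missing figure, and it is not MySQL's actual algorithm. RowVersion and visible_version are hypothetical names, and the transaction ID 1 on the original record is a mock value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    name: str
    tid: int                                  # plays the role of DATA_TRX_ID
    roll_ptr: Optional["RowVersion"] = None   # plays the role of DATA_ROLL_PTR


def visible_version(row: RowVersion, read_id: int, min_tid: int) -> Optional[RowVersion]:
    """Walk the undo chain until a version passes the simplified check."""
    version: Optional[RowVersion] = row
    while version is not None:
        # Visible if the reader wrote it itself, or if it was written
        # before the oldest transaction that is still uncommitted.
        if version.tid == read_id or version.tid < min_tid:
            return version
        version = version.roll_ptr
    return None


original = RowVersion("Huan", tid=1)
current = RowVersion("Huan (updated by A)", tid=2, roll_ptr=original)

# Transaction A (ID 2) is uncommitted, so min_tid is 2.
seen_by_b = visible_version(current, read_id=3, min_tid=2)
seen_by_a = visible_version(current, read_id=2, min_tid=2)
```

Under these assumptions transaction B falls through to the Undo Log version, while transaction A still sees its own uncommitted update.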

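Note that MySQL's MVCC differs from MVCC in theory. MySQL allows only one transaction at a time to operate on a given row, so operations on that row are effectively serialized. In other words, the useful versions of a record are only the current record and one Undo Log record. This is a pessimistic-locking approach, whereas MVCC by definition is an optimistic-locking approach in which a record can have many versions at a given moment.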