Starting from 0, handwriting MySQL transactions

To say it up front: the learning value of handwriting a MySQL from scratch

A reader with 7 years of experience, once mentored by Nien, landed a 40K monthly salary largely on the strength of his MySQL proficiency.

Starting from 0, the learning value of handwriting a MySQL lies in:

  • You gain a deep understanding of MySQL's internal mechanisms and principles. MySQL is an absolute focus, and an absolute difficulty, of interviews.
  • You learn to use and optimize MySQL far better.
  • It sharpens your programming and problem-solving skills.
  • It makes a high-quality wheel (from-scratch) project for your resume. Note: a high-quality resume wheel project.

Many readers' projects are quite unimpressive, and wheel projects are in extremely short supply. So, here comes the wheel project.


Handwritten DB architecture design:

Nien's style: Before you start writing code, do the architecture first

Functionally, the handwritten DB system architecture is divided into the following modules:

  • Data Manager (DM)
  • Transaction Manager (TM)
  • Version Manager (VM)
  • Table Manager (TBM)
  • Index Manager (IM)

The handwritten DB architecture design diagram is as follows:

Starting from 0, handwriting MySQL transactions

A transaction is a coherent sequence of operations in an application; all of them must complete successfully, otherwise every change made along the way is undone.

That is, transactions are atomic: the operations in a transaction either all succeed or none take effect.

A transaction can end in two ways. When every step executes successfully, the transaction is committed.

If any step fails, a rollback occurs, undoing all operations back to the state at the beginning of the transaction.

The statements that define a transaction are as follows:

(1) BEGIN TRANSACTION : The transaction starts.

(2) END TRANSACTION : The transaction ends.

(3) COMMIT : Transaction commit. This operation marks a successful end of the transaction and notifies the transaction manager that all of the transaction's updates can now be committed, i.e. retained permanently.

(4) ROLLBACK : Transaction rollback. This operation marks an unsuccessful end of the transaction. It notifies the transaction manager that a failure occurred, that the database may be in an inconsistent state, and that all of the transaction's updates must be rolled back, i.e. undone.
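To make these semantics concrete, here is a minimal in-memory sketch (a hypothetical toy class, not MYDB's or MySQL's implementation): BEGIN takes a working copy, COMMIT publishes it, ROLLBACK discards it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store illustrating BEGIN/COMMIT/ROLLBACK semantics.
public class MiniTxn {
    private final Map<String, String> committed = new HashMap<>(); // durable state
    private Map<String, String> working = null;                    // uncommitted changes

    public void begin()                 { working = new HashMap<>(committed); }
    public void put(String k, String v) { working.put(k, v); }     // operate on the copy
    public void commit()                { committed.clear(); committed.putAll(working); working = null; }
    public void rollback()              { working = null; }        // discard all changes
    public String get(String k)         { return committed.get(k); }
}
```

A rolled-back write is never visible afterwards, while a committed one is, which is exactly the all-or-nothing behavior described above.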

The 5 states of a transaction

A transaction is the basic execution unit of a database; if a transaction executes successfully, the database moves from one consistent state to another.

A transaction can be in one of the following 5 states:

  • Active state : the initial state; the transaction is in this state while it is executing.
  • Partially committed state : entered once the last statement of the operation sequence has executed. Although the transaction has fully executed at this point, its actual output may still reside in memory, and a hardware failure may still occur before the transaction completes; therefore the partially committed state does not yet mean the transaction has succeeded.
  • Failed state : a hardware or logic error prevents the transaction from continuing normally; a failed transaction must be rolled back, after which it enters the aborted state.
  • Aborted state : the transaction has been rolled back and the database restored to the state it was in before the transaction started.
  • Committed state : the transaction completed successfully. Only a transaction in the committed state can be said to have been committed.

If for some reason a transaction fails to execute successfully but has already modified the database, the database may be left in an inconsistent state, and the changes made by the transaction must be undone (rolled back).

Transitions between the 5 states of a transaction

  • BEGIN TRANSACTION : starts running a transaction, putting it in the active state.
  • END TRANSACTION : indicates that all read and write operations of the transaction are complete; the transaction enters the partially committed state, and the effect of all its operations is stored in the database.
  • COMMIT : marks that the transaction has completed successfully and that the effect of all its operations has been safely stored in the database; the transaction enters the committed state and finishes.
  • ABORT : marks that the transaction has entered the failed state; the system cancels the effect of all its operations on the database and on other transactions, and the transaction finishes.
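The transitions above can be sketched as a tiny state machine (the enum and event names here are illustrative, not MYDB's):

```java
public class TxnState {
    public enum State { ACTIVE, PARTIALLY_COMMITTED, COMMITTED, FAILED, ABORTED }

    // Maps (state, event) to the next state per the transitions above;
    // throws on an illegal transition.
    public static State next(State s, String event) {
        switch (event) {
            case "END":    if (s == State.ACTIVE) return State.PARTIALLY_COMMITTED; break;
            case "COMMIT": if (s == State.PARTIALLY_COMMITTED) return State.COMMITTED; break;
            case "FAIL":   if (s == State.ACTIVE || s == State.PARTIALLY_COMMITTED) return State.FAILED; break;
            case "ABORT":  if (s == State.FAILED) return State.ABORTED; break;
        }
        throw new IllegalStateException(s + " cannot handle " + event);
    }
}
```

COMMITTED and ABORTED are terminal: no event leads out of them, matching the diagram.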

How to manage the state of the transaction?

In the handwritten database MYDB, every transaction has an XID that uniquely identifies it.

Transaction XIDs start from 1, increment automatically, and never repeat.

The transaction manager TM maintains the state of the transaction by maintaining the XID file, and provides an interface for other modules to query the state of a certain transaction.

XID 0 is reserved as the super transaction (Super Transaction).

When operations need to run without opening a transaction, they can use XID 0. The status of the transaction with XID 0 is always committed.

Each transaction is in one of three states:

  • active : in progress, not yet finished
  • committed : committed
  • aborted : rolled back

It is defined as follows:

// The three transaction states
// active
private static final byte FIELD_TRAN_ACTIVE    = 0;
// committed
private static final byte FIELD_TRAN_COMMITTED = 1;
// aborted (rolled back)
private static final byte FIELD_TRAN_ABORTED   = 2;

The XID file allocates one byte to each transaction to store its status.

In addition, an 8-byte number at the head of the XID file records the number of transactions the file manages.

Therefore, the status of transaction xid is stored in the file at byte (xid - 1) + 8; the xid - 1 is because the status of XID 0 (the Super XID) never needs to be recorded. For example, with a 1-byte status field, the status of xid = 3 lives at byte 8 + (3 - 1) = 10.

TransactionManager provides several interfaces for other modules to create transactions and query transaction status. The interface methods are as follows:

// start a transaction
long begin();
// commit a transaction
void commit(long xid);
// abort (roll back) a transaction
void abort(long xid);
// is the transaction active?
boolean isActive(long xid);
// is the transaction committed?
boolean isCommitted(long xid);
// is the transaction aborted?
boolean isAborted(long xid);
// close the transaction manager (TM)
void close();

create the xid file

We need to create an xid file and construct a TM object; the implementation is as follows:

public static TransactionManagerImpl create(String path) {
    File f = new File(path + TransactionManagerImpl.XID_SUFFIX);
    try {
        if(!f.createNewFile()) {
            Panic.panic(Error.FileExistsException);
        }
    } catch (Exception e) {
        Panic.panic(e);
    }
    if(!f.canRead() || !f.canWrite()) {
        Panic.panic(Error.FileCannotRWException);
    }

    FileChannel fc = null;
    RandomAccessFile raf = null;
    try {
        raf = new RandomAccessFile(f, "rw");
        fc = raf.getChannel();
    } catch (FileNotFoundException e) {
        Panic.panic(e);
    }

    // write an empty XID file header
    ByteBuffer buf = ByteBuffer.wrap(new byte[TransactionManagerImpl.LEN_XID_HEADER_LENGTH]);
    try {
        // When creating the XID file from scratch, write an empty header,
        // i.e. set xidCounter to 0; otherwise the later validation fails.
        fc.position(0);
        fc.write(buf);
    } catch (IOException e) {
        Panic.panic(e);
    }

    return new TransactionManagerImpl(raf, fc);
}

// create a TM from an existing xid file
public static TransactionManagerImpl open(String path) {
    File f = new File(path + TransactionManagerImpl.XID_SUFFIX);
    if(!f.exists()) {
        Panic.panic(Error.FileNotExistsException);
    }
    if(!f.canRead() || !f.canWrite()) {
        Panic.panic(Error.FileCannotRWException);
    }

    FileChannel fc = null;
    RandomAccessFile raf = null;
    try {
        // RandomAccessFile provides read/write access to the data file
        raf = new RandomAccessFile(f, "rw");
        // the unique FileChannel associated with this file
        fc = raf.getChannel();
    } catch (FileNotFoundException e) {
        Panic.panic(e);
    }

    return new TransactionManagerImpl(raf, fc);
}

define constants

Now look at TransactionManagerImpl, the implementation class of the TransactionManager interface. First define some necessary constants:

// length of the XID file header
static final int LEN_XID_HEADER_LENGTH = 8;
// bytes occupied by each transaction's status
private static final int XID_FIELD_SIZE = 1;

// The three transaction states
// active
private static final byte FIELD_TRAN_ACTIVE    = 0;
// committed
private static final byte FIELD_TRAN_COMMITTED = 1;
// aborted (rolled back)
private static final byte FIELD_TRAN_ABORTED   = 2;

// the super transaction, always in the committed state
public static final long SUPER_XID = 0;

// XID file suffix
static final String XID_SUFFIX = ".xid";

private RandomAccessFile file;
private FileChannel fc;
private long xidCounter;
// explicit lock
private Lock counterLock;

FileChannel reads and writes files in NIO style. It provides channel-based access to a file: the position() method seeks to any position in the file before an operation, and files can be mapped into direct memory to speed up access to large files. For more on Java NIO, see <<Java High Concurrency Core Programming (Volume 1)>>.
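A minimal, self-contained sketch of this positioned access pattern (a hypothetical helper class, using only the standard java.nio API, mirroring how the XID header is written and re-read):

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

public class ChannelDemo {
    // Writes an 8-byte counter at offset 0 (like the XID header), forces it
    // to disk, then seeks back and reads it with the same channel.
    public static long writeAndReadHeader(Path p, long counter) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "rw");
             FileChannel fc = raf.getChannel()) {
            ByteBuffer out = ByteBuffer.allocate(8);
            out.putLong(counter);
            out.flip();                   // switch the buffer from write to read mode
            fc.position(0);               // seek to the header
            fc.write(out);
            fc.force(false);              // flush file data (not metadata) to disk

            ByteBuffer in = ByteBuffer.allocate(8);
            fc.position(0);
            fc.read(in);
            in.flip();
            return in.getLong();
        }
    }
}
```

Note the flip() calls: a ByteBuffer must be flipped between filling it and draining it, which is also why the MYDB code wraps pre-sized arrays.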

Verify that the XID file is legal

After the constructor creates a TransactionManager, it must first validate the XID file to make sure it is legal.

The check is simple: derive the file's theoretical length from the 8-byte counter in the header and compare it with the actual length. If they differ, the XID file is considered invalid.

TransactionManagerImpl(RandomAccessFile raf, FileChannel fc) {
    this.file = raf;
    this.fc = fc;
    // explicit lock
    counterLock = new ReentrantLock();
    checkXIDCounter();
}

/**
 * Check that the XID file is valid:
 * read xidCounter from the file header, compute the theoretical file length
 * from it, and compare with the actual length.
 */
private void checkXIDCounter() {
    long fileLen = 0;
    try {
        fileLen = file.length();
    } catch (IOException e1) {
        Panic.panic(Error.BadXIDFileException);
    }
    if(fileLen < LEN_XID_HEADER_LENGTH) {
        // If validation fails, force a shutdown via panic.
        // Errors in these base modules are all handled this way:
        // an unrecoverable error leaves no option but to stop.
        Panic.panic(Error.BadXIDFileException);
    }

    // allocate a buffer for the header with the static allocate() method
    ByteBuffer buf = ByteBuffer.allocate(LEN_XID_HEADER_LENGTH);
    try {
        fc.position(0);
        fc.read(buf);
    } catch (IOException e) {
        Panic.panic(e);
    }
    // the first 8 bytes of the file hold the transaction count
    this.xidCounter = Parser.parseLong(buf.array());
    // position in the xid file corresponding to transaction xidCounter + 1
    long end = getXidPosition(this.xidCounter + 1);
    if(end != fileLen) {
        // validation failed: force shutdown via panic
        Panic.panic(Error.BadXIDFileException);
    }
}

The lock used is the reentrant lock ReentrantLock, the basic explicit-lock implementation provided by the JUC package. ReentrantLock implements the Lock interface. It has the same concurrency and memory semantics as synchronized, but adds advanced features such as timed and interruptible lock acquisition. For more on ReentrantLock, see <<Java High Concurrency Core Programming (Volume 2)>>.
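A small sketch of the timed acquisition mentioned above (a hypothetical helper; only the standard java.util.concurrent API is used):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockDemo {
    // Timed acquisition -- one of the features synchronized lacks.
    // Returns false instead of blocking forever.
    public static boolean timedIncrement(ReentrantLock lock, long[] counter)
            throws InterruptedException {
        if (lock.tryLock(100, TimeUnit.MILLISECONDS)) { // give up after 100 ms
            try {
                counter[0]++;
                return true;
            } finally {
                lock.unlock();                          // always release in finally
            }
        }
        return false;                                   // lock not acquired in time
    }
}
```

The try/finally shape is the same discipline the MYDB code uses around counterLock.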

Theoretical file length = the 8-byte header + (bytes per transaction status) × (number of transactions).

The 8 bytes at the start of the xid file (recording the number of transactions) give the theoretical file length, which is compared with the actual length; if they differ, the XID file is considered invalid. This check runs every time a TM object is created.

If the check fails, the panic method forces a shutdown. Errors in these basic modules are all handled this way: unrecoverable errors leave no option but to stop.

To obtain the offset of the xid state in the file, the getXidPosition() method is implemented as follows:

// position of transaction xid's status in the xid file
private long getXidPosition(long xid) {
    return LEN_XID_HEADER_LENGTH + (xid-1)*XID_FIELD_SIZE;
}

open transaction

begin() opens a transaction, initializes its structure, and stores it in activeTransaction for checks and snapshot use:

/**
 * begin() opens a transaction, records the transactions active at the time it
 * is created, and stores it in activeTransaction for checks and snapshot use.
 * @param level isolation level
 * @return the new transaction's xid
 */
@Override
public long begin(int level) {
    lock.lock();
    try {
        long xid = tm.begin();
        // activeTransaction: transactions active when this one is created;
        // if level != 0 they go into t's snapshot
        Transaction t = Transaction.newTransaction(xid, level, activeTransaction);
        activeTransaction.put(xid, t);
        return xid;
    } finally {
        lock.unlock();
    }
}

change transaction state

The offset of transaction xid = the 8-byte header + (bytes occupied by one transaction's status) × (xid - 1);

Compute, from the transaction's xid, the offset at which its status is recorded, then overwrite the status.

The specific implementation is as follows:

// update the status of transaction xid to status
private void updateXID(long xid, byte status) {
    long offset = getXidPosition(xid);
    // one status byte per transaction
    byte[] tmp = new byte[XID_FIELD_SIZE];
    tmp[0] = status;
    ByteBuffer buf = ByteBuffer.wrap(tmp);
    try {
        fc.position(offset);
        fc.write(buf);
    } catch (IOException e) {
        Panic.panic(e);
    }
    try {
        // flush the data (but not file metadata) to disk
        fc.force(false);
    } catch (IOException e) {
        Panic.panic(e);
    }
}

Both commit() and abort() delegate to this method:

// commit transaction xid
public void commit(long xid) {
    updateXID(xid, FIELD_TRAN_COMMITTED);
}

// roll back transaction xid
public void abort(long xid) {
    updateXID(xid, FIELD_TRAN_ABORTED);
}

Update the XID header

Every time a transaction is created, the transaction count recorded in the file header must be incremented:

// increment the XID counter and update the XID header
private void incrXIDCounter() {
    xidCounter ++;
    ByteBuffer buf = ByteBuffer.wrap(Parser.long2Byte(xidCounter));
    // a ByteBuffer has a cursor (position), a limit, and a capacity
    try {
        fc.position(0);
        fc.write(buf);
    } catch (IOException e) {
        Panic.panic(e);
    }
    try {
        fc.force(false);
    } catch (IOException e) {
        Panic.panic(e);
    }
}

Judging the transaction status

From the xid, compute the offset of the transaction's status byte → read the status byte → compare it with the expected status.

// check whether transaction xid is in the given status
private boolean checkXID(long xid, byte status) {
    long offset = getXidPosition(xid);
    ByteBuffer buf = ByteBuffer.wrap(new byte[XID_FIELD_SIZE]);
    try {
        fc.position(offset);
        fc.read(buf);
    } catch (IOException e) {
        Panic.panic(e);
    }
    return buf.array()[0] == status;
}

// is the transaction active?
public boolean isActive(long xid) {
    if(xid == SUPER_XID) return false;
    return checkXID(xid, FIELD_TRAN_ACTIVE);
}

// is the transaction committed?
public boolean isCommitted(long xid) {
    if(xid == SUPER_XID) return true;
    return checkXID(xid, FIELD_TRAN_COMMITTED);
}

// is the transaction aborted?
public boolean isAborted(long xid) {
    if(xid == SUPER_XID) return false;
    return checkXID(xid, FIELD_TRAN_ABORTED);
}

Close TM

// close the TM
public void close() {
    try {
        fc.close();
        file.close();
    } catch (IOException e) {
        Panic.panic(e);
    }
}

Two-phase locks implement transaction operations

Introduction to two-phase locking (2PL)

Transaction scheduling includes serial scheduling and parallel scheduling. First, some concepts:

  • Concurrency control : in a system shared by many users, multiple users may operate on the same data at the same time.
  • Scheduling : the execution order of transactions.
  • Serial scheduling : transactions execute one after another in sequence; the operations of the next transaction start only after all operations of the current one have finished. Any serial schedule yields a correct result.

  • Parallel scheduling : multiple transactions are processed concurrently by time-sharing. A parallel schedule may yield wrong results and produce inconsistent states, including lost updates, non-repeatable reads, and dirty reads.
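The first anomaly, the lost update, can be reproduced deterministically by hand-interleaving two read-modify-write "transactions" (a toy simulation of one bad schedule, not real concurrent execution):

```java
public class LostUpdate {
    // Deterministically interleaves two read-modify-write "transactions":
    // both read the same snapshot before either writes, so one update is lost.
    public static int interleaved(int x) {
        int readByT1 = x;      // T1 reads x
        int readByT2 = x;      // T2 reads x (before T1 writes back)
        x = readByT1 + 1;      // T1 writes x+1
        x = readByT2 + 1;      // T2 overwrites with its own x+1 -> T1's update is lost
        return x;
    }

    // The serial schedule: T1 runs to completion, then T2.
    public static int serial(int x) {
        x = x + 1;             // T1
        x = x + 1;             // T2
        return x;
    }
}
```

Starting from 0, the interleaved schedule ends at 1 while the serial schedule ends at 2: one increment was lost. Locking exists precisely to forbid such schedules.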

Parallel Scheduling of Transactions

Within a transaction, execution divides into a locking (lock) phase and an unlocking (unlock) phase; that is, all lock operations come before all unlock operations, as shown in the figure below:

Two phases of transaction locking and unlocking

In practice, SQL is endlessly varied and the number of statements is unknown in advance, so it is hard for the database to determine where the locking phase ends and the unlocking phase begins. S2PL (Strict 2PL) was therefore introduced: within a transaction, unlocking happens only at commit or rollback, and all other time belongs to the locking phase. 2PL was introduced to guarantee transaction isolation, i.e. that concurrently executing transactions are equivalent to some serial execution.

The two phases of 2PL

The first phase acquires locks and is called the growing (expansion) phase: only lock operations happen here. Before reading any data, the transaction must request and obtain an S lock; before writing, it must request and obtain an X lock. If a lock request fails, the transaction enters the waiting state and does not continue until the lock is granted. Once taken, a lock is not released in this phase.

The second phase releases locks and is called the shrinking (contraction) phase: as soon as the transaction releases its first lock, it enters this phase, during which it may only unlock and may no longer acquire any lock.
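These two phases can be sketched with a small guard object (a simplified illustration built on ReentrantLock, not MYDB's lock table): locks only accumulate during the growing phase, and under strict 2PL they are all released together at commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

public class TwoPhaseTxn {
    private final Deque<ReentrantLock> held = new ArrayDeque<>();
    private boolean shrinking = false;

    // Growing phase: locks may be acquired but none released.
    public void acquire(ReentrantLock l) {
        if (shrinking) throw new IllegalStateException("2PL violation: lock after unlock");
        l.lock();
        held.push(l);
    }

    // Strict 2PL: the shrinking phase happens entirely at commit/abort,
    // releasing every held lock at once.
    public void commit() {
        shrinking = true;
        while (!held.isEmpty()) held.pop().unlock();
    }
}
```

Trying to acquire after commit throws, which is exactly the "no lock after the first unlock" rule.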

Two-phase locking (2PL) transaction implementation

Transactions in MySQL are implicit by default: for insert, update, and delete operations, the database automatically starts, commits, or rolls back the transaction.

Whether implicit transactions are enabled is controlled by the autocommit variable.

So transactions divide into implicit transactions and explicit transactions.

Implicit transactions are opened, committed, or rolled back automatically, e.g. for insert, update, and delete statements; MySQL controls the entire lifecycle.

Explicit transactions must be opened, committed, or rolled back manually and are controlled by the developer.

Handwritten two-phase lock (2PL) transaction

Before implementing transaction isolation levels, we need to discuss the Version Manager (VM).

VM serializes the schedule based on the two-phase locking protocol and implements MVCC to eliminate read-write blocking.

It implements two isolation levels. VM is the transaction and data-version management core of the handwritten database MYDB.

transaction storage

MYDB uses the Entry class to maintain the structure of a record.

Although MVCC is multi-version in theory, this VM implementation provides no Update operation; updates to fields are implemented by the upper-layer table and field manager (TBM).

So in the VM implementation, each record has only one version.

A record is stored in one DataItem, so an Entry only needs to hold a reference to a DataItem:

public class Entry {

    private static final int OF_XMIN = 0;
    private static final int OF_XMAX = OF_XMIN+8;
    private static final int OF_DATA = OF_XMAX+8;

    private long uid;
    private DataItem dataItem;
    private VersionManager vm;

    public static Entry newEntry(VersionManager vm, DataItem dataItem, long uid) {
        Entry entry = new Entry();
        entry.uid = uid;
        entry.dataItem = dataItem;
        entry.vm = vm;
        return entry;
    }

    public static Entry loadEntry(VersionManager vm, long uid) throws Exception {
        DataItem di = ((VersionManagerImpl)vm).dm.read(uid);
        return newEntry(vm, di, uid);
    }

    public void release() {
        ((VersionManagerImpl)vm).releaseEntry(this);
    }

    public void remove() {
        dataItem.release();
    }
}

The data format stored in an Entry is specified as follows:

[XMIN]   [XMAX]   [DATA]
8 bytes  8 bytes

XMIN is the XID of the transaction that created the record (version), XMAX the XID of the transaction that deleted it, and DATA the data the record holds.

According to this structure, the wrapEntryRaw() method called when creating a record is as follows:

public static byte[] wrapEntryRaw(long xid, byte[] data) {
    byte[] xmin = Parser.long2Byte(xid);
    byte[] xmax = new byte[8];
    return Bytes.concat(xmin, xmax, data);
}

Similarly, to get the data a record holds, parse it according to this structure:

// return the contents as a copy
public byte[] data() {
    dataItem.rLock();
    try {
        SubArray sa = dataItem.data();
        byte[] data = new byte[sa.end - sa.start - OF_DATA];
        System.arraycopy(sa.raw, sa.start+OF_DATA, data, 0, data.length);
        return data;
    } finally {
        dataItem.rUnLock();
    }
}

Here the data is returned as a copy. Modifying it must follow a fixed protocol: call before() before modifying, call unBefore() to undo a modification, and call after() once the modification is complete.

The point of this flow is to save the pre-image and write the log in time; DM guarantees that modifications to a DataItem are atomic.

@Override
public void before() {
    wLock.lock();
    pg.setDirty(true);
    System.arraycopy(raw.raw, raw.start, oldRaw, 0, oldRaw.length);
}

@Override
public void unBefore() {
    System.arraycopy(oldRaw, 0, raw.raw, raw.start, oldRaw.length);
    wLock.unlock();
}

@Override
public void after(long xid) {
    dm.logDataItem(xid, this);
    wLock.unlock();
}

Setting XMAX shows why modification has to follow these rules: once XMAX is set, the version becomes invisible to every transaction after XMAX, which is equivalent to deleting it. The setXmax code is as follows:

public void setXmax(long xid) {
    dataItem.before();
    try {
        SubArray sa = dataItem.data();
        System.arraycopy(Parser.long2Byte(xid), 0, sa.raw, sa.start+OF_XMAX, 8);
    } finally {
        dataItem.after(xid);
    }
}

open transaction

begin() records, each time a transaction is opened, the structure of the transactions currently active and stores the new transaction in activeTransaction:

@Override
public long begin(int level) {
    lock.lock();
    try {
        long xid = tm.begin();
        Transaction t = Transaction.newTransaction(xid, level, activeTransaction);
        activeTransaction.put(xid, t);
        return xid;
    } finally {
        lock.unlock();
    }
}

commit transaction

The commit() method commits a transaction: it mainly frees the related structures, releases the held locks, updates the state in TM, and removes the transaction from activeTransaction.

@Override
public void commit(long xid) throws Exception {
    lock.lock();
    Transaction t = activeTransaction.get(xid);
    lock.unlock();

    try {
        if(t.err != null) {
            throw t.err;
        }
    } catch(NullPointerException n) {
        System.out.println(xid);
        System.out.println(activeTransaction.keySet());
        Panic.panic(n);
    }

    lock.lock();
    activeTransaction.remove(xid);
    lock.unlock();

    lt.remove(xid);
    tm.commit(xid);
}

rollback transaction

There are two ways to abort a transaction: manual and automatic.

Manual means calling the abort() method. Automatic means the transaction is rolled back automatically when a deadlock is detected in it, or when a version skip occurs:

/**
 * roll back transaction xid
 * @param xid
 */
@Override
public void abort(long xid) {
    internAbort(xid, false);
}

private void internAbort(long xid, boolean autoAborted) {
    lock.lock();
    Transaction t = activeTransaction.get(xid);
    // manual rollback
    if(!autoAborted) {
        activeTransaction.remove(xid);
    }
    lock.unlock();

    // automatic rollback
    if(t.autoAborted) return;
    lt.remove(xid);
    tm.abort(xid);
}

delete transaction

When a transaction commits or aborts, it releases all the locks it holds and removes itself from the wait graph.

It then selects an xid from the waiting queue to take over each uid it was holding.

Unlocking just means unlocking the Lock object, so the waiting business thread can acquire the lock and continue executing.

public void remove(long xid) {
    lock.lock();
    try {
        List<Long> l = x2u.get(xid);
        if(l != null) {
            while(l.size() > 0) {
                Long uid = l.remove(0);
                selectNewXID(uid);
            }
        }
        waitU.remove(xid);
        x2u.remove(xid);
        waitLock.remove(xid);
    } finally {
        lock.unlock();
    }
}

// pick an xid from the waiting queue to take over uid
private void selectNewXID(long uid) {
    u2x.remove(uid);
    List<Long> l = wait.get(uid);
    if(l == null) return;
    assert l.size() > 0;

    while(l.size() > 0) {
        long xid = l.remove(0);
        if(!waitLock.containsKey(xid)) {
            continue;
        } else {
            u2x.put(uid, xid);
            Lock lo = waitLock.remove(xid);
            waitU.remove(xid);
            lo.unlock();
            break;
        }
    }

    if(l.size() == 0) wait.remove(uid);
}

insert data

insert() wraps the data into an Entry and hands it to DM for insertion:

@Override
public long insert(long xid, byte[] data) throws Exception {
    lock.lock();
    Transaction t = activeTransaction.get(xid);
    lock.unlock();

    if(t.err != null) {
        throw t.err;
    }

    byte[] raw = Entry.wrapEntryRaw(xid, data);
    return dm.insert(xid, raw);
}

read transaction

The read() method reads an entry, judging its visibility according to the isolation level:

@Override
public byte[] read(long xid, long uid) throws Exception {
    lock.lock();
    // the snapshot data of the current transaction xid at read time
    Transaction t = activeTransaction.get(xid);
    lock.unlock();

    if(t.err != null) {
        throw t.err;
    }

    Entry entry = null;
    try {
        // locate the DataItem to read via its uid
        entry = super.get(uid);
    } catch(Exception e) {
        if(e == Error.NullEntryException) {
            return null;
        } else {
            throw e;
        }
    }
    try {
        if(Visibility.isVisible(tm, t, entry)) {
            return entry.data();
        } else {
            return null;
        }
    } finally {
        entry.release();
    }
}
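As a hedged sketch of what a check like Visibility.isVisible can do at the read-committed level (the Set stands in for TM's isCommitted; this illustration is consistent with the XMIN/XMAX format above, but it is not MYDB's exact code):

```java
import java.util.Set;

public class ReadCommitted {
    // A version stamped (xmin, xmax) is visible to transaction xid if it was
    // created by xid itself and not deleted, or created by a committed
    // transaction and not deleted by another committed transaction.
    public static boolean visible(long xid, long xmin, long xmax, Set<Long> committed) {
        if (xmin == xid && xmax == 0) return true;      // created by me, not deleted
        if (committed.contains(xmin)) {                 // creator committed
            if (xmax == 0) return true;                 // never deleted
            if (xmax != xid && !committed.contains(xmax)) return true; // deleter not committed yet
        }
        return false;
    }
}
```

An uncommitted creator hides the version; a committed deleter hides it too, which is exactly why setXmax amounts to deletion.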

ACID properties of transactions

A transaction is a unit of program execution that accesses and updates data items in a database. The purpose of a transaction is all-or-nothing: either all of its modifications happen, or none do.

In most scenarios, the application only needs to operate a single database, and the transaction in this case is called a local transaction (Local Transaction).

The ACID properties of local transactions are directly supported by the database.

To implement local transactions, MySQL does a great deal of work: rollback logs, redo logs, MVCC, read-write locks, and more.

For the InnoDB storage engine, the default transaction isolation level is Repeatable Read, and it fully satisfies the ACID properties of transactions.

ACID is an acronym for four words: Atomicity, Consistency, Isolation, and Durability.

InnoDB guarantees the ACID characteristics of transactions through logs and locks:

  • The database lock mechanism guarantees the isolation of transactions;
  • The Redo Log (redo log) guarantees the durability of transactions;
  • The Undo Log (undo log) guarantees the atomicity and consistency of transactions;

Atomicity

A transaction must be an atomic sequence of operations: the operations in a transaction either all succeed or none execute. If any one fails, the whole transaction is rolled back; only when every operation succeeds is the whole transaction considered a success.

Before any data is modified, it is first backed up to the Undo Log, and only then is the data changed. If an error occurs, or the user executes a Rollback statement, the system uses the backup in the Undo Log to restore the data to its state before the transaction started, guaranteeing atomicity.

Consistency

The execution of the transaction cannot destroy the integrity and consistency of the database data, and the database must be in a consistent state before and after the transaction is executed.

Consistency includes two aspects, namely constraint consistency and data consistency;

  • Constraint Consistency: Constraints such as foreign key, check (not supported by mysql), unique index, etc. specified when creating the table structure.
  • Data consistency: It is a comprehensive regulation, because it is the result of atomicity, persistence, and isolation, rather than relying solely on a certain technology.

Isolation

In a concurrent situation, concurrent transactions are isolated from each other, and the execution of one transaction cannot be interfered by other transactions.

That is, when different transactions operate on the same data concurrently, each transaction has its own complete data space, that is, the operations and data used within a transaction are isolated from other concurrent transactions, and the concurrently executed transactions cannot interfere with each other .

Durability

Durability, also called permanence, means that once a transaction commits, its changes to the corresponding data in the database are permanent. Even if the system crashes or the machine goes down, as long as the database can be restarted, it can be restored to the state at the moment the transaction successfully ended.

The Redo Log records a backup of the new data. Before a transaction commits, only the Redo Log needs to be persisted; the data itself does not. If the system crashes, the data may not have been persisted, but the Redo Log has, so the system can restore all data to the pre-crash state from the Redo Log's contents. This is how the Redo Log guarantees durability.

Summary

Data integrity mainly reflects the consistency property. That integrity is guaranteed by atomicity, isolation, and durability, and these three properties are in turn guaranteed by the Redo Log and Undo Log. The relationship between the ACID properties is shown in the figure below:

Deadlock

In MySQL, the two-phase locking protocol (2PL) consists of a growing phase and a shrinking phase. In the growing phase a transaction may acquire locks but cannot release any; in the shrinking phase it may release existing locks but cannot acquire new ones. This protocol carries a risk of deadlock.

For example: while T1' is in the growing phase, it acquires the read lock on Y and reads Y. It then wants to acquire the write lock on X, but finds that T2's read lock already holds X, and T2' in turn wants to acquire the write lock on Y. In short, T1' will not release Y unless it gets X, and T2' will not release X unless it gets Y, so both are stuck waiting and a deadlock is formed.

2PL blocks a transaction until the thread holding the lock releases it. This waiting relationship can be abstracted as a directed edge; for example, if Tj is waiting for Ti, it can be written as Tj --> Ti. All these directed edges together form a graph. To detect deadlock, you only need to check whether this graph contains a cycle.

Deadlock detection

Create a LockTable object and maintain this graph in memory. The maintenance structure is as follows:

public class LockTable {

    private Map<Long, List<Long>> x2u;  // UIDs of resources already held by each XID
    private Map<Long, Long> u2x;        // the XID that currently holds each UID
    private Map<Long, List<Long>> wait; // XIDs waiting for each UID
    private Map<Long, Lock> waitLock;   // lock object for each waiting XID
    private Map<Long, Long> waitU;      // the UID each XID is waiting for
    ......
}

Every time a wait occurs, we try to add an edge to the graph and run deadlock detection. If a deadlock is detected, the edge is removed, the wait is not allowed, and the transaction is aborted.

// Returns null if no waiting is needed, otherwise returns a lock object.
// Throws an exception if waiting would cause a deadlock.
public Lock add(long xid, long uid) throws Exception {
    lock.lock();
    try {
        // If the uid is already in the list of resources held by this xid,
        // there is no deadlock and no need to wait.
        if(isInList(x2u, xid, uid)) {
            return null;
        }
        // A brand-new uid: record it in u2x and x2u; no deadlock, no waiting.
        if(!u2x.containsKey(uid)) {
            u2x.put(uid, xid);
            putIntoList(x2u, xid, uid);
            return null;
        }
        // From here on, waiting is required:
        // this xid joins the transactions waiting for the uid to be released.
        waitU.put(xid, uid);
        putIntoList(wait, uid, xid);
        // Adding this edge causes a deadlock.
        if(hasDeadLock()) {
            // Remove the edge from the wait lists and abort.
            waitU.remove(xid);
            removeFromList(wait, uid, xid);
            throw Error.DeadlockException;
        }
        // No deadlock: return an already-locked Lock so the caller blocks on it.
        Lock l = new ReentrantLock();
        l.lock();
        waitLock.put(xid, l);
        return l;

    } finally {
        lock.unlock();
    }
}

The caller invokes add. If no waiting is needed, it proceeds to the next step; otherwise add returns an already-locked Lock object. The caller then tries to acquire that lock, which blocks the thread until the resource is granted.

try {
    l = lt.add(xid, uid);
} catch(Exception e) {
    t.err = Error.ConcurrentUpdateException;
    internAbort(xid, true);
    t.autoAborted = true;
    throw t.err;
}
if(l != null) {
    // Block until the resource is granted, then release immediately.
    l.lock();
    l.unlock();
}

Judging deadlock

The algorithm for finding a cycle in a graph is a depth-first search. Note that the graph is not necessarily connected. The idea is to give every node a visit stamp, initially unset; then traverse all nodes, using each unstamped node as the root of a depth-first search, and stamp all nodes reached in that search with the same number — different connected components get different numbers. If, while traversing one component, we reach a node that already carries the current stamp, a cycle has been found.

The specific implementation of judging deadlock is as follows:

private boolean hasDeadLock() {
    xidStamp = new HashMap<>();
    stamp = 1;
    System.out.println("xid已经持有哪些uid x2u="+x2u); // which uids each xid already holds
    System.out.println("uid正在被哪个xid占用 u2x="+u2x); // which xid occupies each uid

    // Start a search from every xid that already holds locks.
    for(long xid : x2u.keySet()) {
        Integer s = xidStamp.get(xid);
        if(s != null && s > 0) {
            continue; // already visited in an earlier search
        }
        stamp ++;
        System.out.println("xid"+xid+"的stamp是"+s);
        System.out.println("进入深搜");
        if(dfs(xid)) {
            return true;
        }
    }
    return false;
}

private boolean dfs(long xid) {
    Integer stp = xidStamp.get(xid);
    // Meeting a node already stamped with the current number means a cycle.
    if(stp != null && stp == stamp) {
        return true;
    }
    // A node stamped in an earlier search belongs to another component: no cycle here.
    if(stp != null && stp < stamp) {
        return false;
    }
    // Stamp this resource-holding transaction with the current search number.
    xidStamp.put(xid, stamp);
    System.out.println("xidStamp找不到该xid,加入后xidStamp变为"+xidStamp);
    // The uid this transaction is waiting for.
    Long uid = waitU.get(xid);
    System.out.println("xid"+xid+"正在等待的uid是"+uid);
    if(uid == null){
        System.out.println("未成环,退出深搜");
        // xid is waiting for nothing: no deadlock along this path.
        return false;
    }
    // Which xid occupies the uid that this xid needs?
    Long x = u2x.get(uid);
    System.out.println("xid"+xid+"需要的uid被"+"xid"+x+"占用了");
    System.out.println("=====再次进入深搜"+"xid"+x+"====");
    assert x != null;
    return dfs(x);
}

When a transaction commits or aborts, it releases all locks it holds and removes itself from the wait graph.

public void remove(long xid) {
    lock.lock();
    try {
        // Release every resource this xid holds, handing each to a waiter.
        List<Long> l = x2u.get(xid);
        if(l != null) {
            while(l.size() > 0) {
                Long uid = l.remove(0);
                selectNewXID(uid);
            }
        }
        waitU.remove(xid);
        x2u.remove(xid);
        waitLock.remove(xid);

    } finally {
        lock.unlock();
    }
}

The while loop releases all locks on resources held by this thread, which can be acquired by waiting threads:

// Pick a waiting xid from the queue to take over the uid.
private void selectNewXID(long uid) {
    u2x.remove(uid);
    List<Long> l = wait.get(uid);
    if(l == null) return;
    assert l.size() > 0;

    while(l.size() > 0) {
        long xid = l.remove(0);
        if(!waitLock.containsKey(xid)) {
            // This xid is no longer waiting (e.g. it was aborted); skip it.
            continue;
        } else {
            u2x.put(uid, xid);
            Lock lo = waitLock.remove(xid);
            waitU.remove(xid);
            lo.unlock(); // wake the blocked business thread
            break;
        }
    }

    if(l.size() == 0) wait.remove(uid);
}

Trying candidates from the head of the List keeps the lock fair. Unlocking here only unlocks the Lock object, which wakes the business thread blocked on it; that thread now holds the resource and can continue executing.

The test code is as follows:

public static void main(String[] args) throws Exception {
    LockTable lock = new LockTable();
    lock.add(1L,3L);
    lock.add(2L,4L);
    lock.add(3L,5L);
    lock.add(1L,4L);

    System.out.println("+++++++++++++++++++++++");
    lock.add(2L,5L);
    System.out.println("++++++++++++++++");
    lock.add(3L,3L);
    System.out.println(lock.hasDeadLock());
}

The execution results are as follows:

xid已经持有哪些uid x2u={1=[3], 2=[4], 3=[5]}
uid正在被哪个xid占用 u2x={3=1, 4=2, 5=3}
xid1的stamp是null
进入深搜
xidStamp找不到该xid,加入后xidStamp变为{1=2}
xid1正在等待的uid是4
xid1需要的uid被xid2占用了
=====再次进入深搜xid2====
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2}
xid2正在等待的uid是null
未成环,退出深搜
xid3的stamp是null
进入深搜
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2, 3=3}
xid3正在等待的uid是null
未成环,退出深搜
+++++++++++++++++++++++
xid已经持有哪些uid x2u={1=[3], 2=[4], 3=[5]}
uid正在被哪个xid占用 u2x={3=1, 4=2, 5=3}
xid1的stamp是null
进入深搜
xidStamp找不到该xid,加入后xidStamp变为{1=2}
xid1正在等待的uid是4
xid1需要的uid被xid2占用了
=====再次进入深搜xid2====
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2}
xid2正在等待的uid是5
xid2需要的uid被xid3占用了
=====再次进入深搜xid3====
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2, 3=2}
xid3正在等待的uid是null
未成环,退出深搜
++++++++++++++++
xid已经持有哪些uid x2u={1=[3], 2=[4], 3=[5]}
uid正在被哪个xid占用 u2x={3=1, 4=2, 5=3}
xid1的stamp是null
进入深搜
xidStamp找不到该xid,加入后xidStamp变为{1=2}
xid1正在等待的uid是4
xid1需要的uid被xid2占用了
=====再次进入深搜xid2====
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2}
xid2正在等待的uid是5
xid2需要的uid被xid3占用了
=====再次进入深搜xid3====
xidStamp找不到该xid,加入后xidStamp变为{1=2, 2=2, 3=2}
xid3正在等待的uid是3
xid3需要的uid被xid1占用了
=====再次进入深搜xid1====

How to do transaction isolation in high concurrency scenarios

Functional Analysis of Transaction Isolation Level

Locks and multi-version concurrency control (MVCC) are used to ensure isolation. InnoDB supports four isolation levels, from lowest to highest:

  • Read uncommitted (READ UNCOMMITTED)
  • READ COMMITTED
  • Repeatable read (REPEATABLE READ)
  • Serializable (SERIALIZABLE)

Read uncommitted (READ UNCOMMITTED)

Under the read uncommitted isolation level, dirty reads are allowed: the data read may still be rolled back, so the transaction may read data that never actually takes effect. This level is therefore prone to the dirty-read problem.

A dirty read means that one transaction can read data that another transaction has modified but not yet committed; simply put, it reads dirty data. "Dirty data" here refers to a row record that a transaction has modified in the buffer pool without committing yet.

If dirty data is read, one transaction can see uncommitted data of another, which obviously violates the isolation property of the database.

Dirty reads can only occur when the isolation level is read uncommitted. In production environments most databases are set to at least read committed, so the probability of dirty reads occurring in production is very small.

READ COMMITTED

Only committed data may be read. That is, while transaction A is incrementing n from 0 to 10, transaction B cannot see any intermediate value of n; once A commits, B sees only 10.

Under the read committed isolation level, dirty reads are forbidden, but non-repeatable reads are allowed.

Read committed satisfies the simple definition of isolation: a transaction can only see changes made by committed transactions.

The default isolation level of Oracle and SQL Server is read committed.

The read committed isolation level is prone to the non-repeatable read problem. A non-repeatable read occurs when the same data is read multiple times within one transaction with different results.

Transaction A reads the same data several times, but transaction B updates and commits that data between A's reads, so A gets inconsistent results for the same query.

The difference between dirty reads and non-repeatable reads is: Dirty reads read uncommitted data, while non-repeatable reads read committed data, but it violates the principle of database transaction consistency.

Under normal circumstances non-repeatable reads are acceptable, because only committed data is read, which in itself causes no big problem. Oracle and SQL Server set their default transaction isolation level to read committed, and the RC level allows the non-repeatable read phenomenon. MySQL's default isolation level is RR, which avoids non-repeatable reads and uses the Next-Key Lock algorithm to avoid phantom reads as well.

The difference between non-repeatable read and phantom read is:

  • The key point of a non-repeatable read is modification: under the same conditions, the value read the first time differs when read again.
  • The key point of a phantom read is insertion or deletion: under the same conditions, the number of records read the first and second time differs.

Repeatable read (REPEATABLE READ)

It guarantees that when the same data is read multiple times within a transaction, its value is consistent with that at the start of the transaction.

Under the repeatable read isolation level, dirty reads and non-repeatable reads are prohibited, but phantom reads may still occur.

A phantom read means the number of rows returned by two identical queries within one transaction is inconsistent. For example, one transaction queries several rows of data while another transaction inserts new rows at that moment; on its next query, the first transaction finds rows that were not there before.
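In the same schedule notation used elsewhere in this article, a phantom read can be sketched like this (the range query and the inserted row are illustrative):

T1 begin
R1(range)  // T1 queries id BETWEEN 1 AND 10, gets 2 rows
T2 begin
I2(X)      // T2 inserts a new row with id=5 into that range
T2 commit
R1(range)  // T1 runs the same query again and now gets 3 rows — a "phantom" row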

The default isolation level of MySQL is repeatable read. To view the transaction isolation level of the current database (before MySQL 8.0; since 8.0 the variable has been renamed to transaction_isolation), the commands are as follows:

show variables like 'tx_isolation';
select @@tx_isolation;

Set the transaction isolation level:

set tx_isolation='READ-UNCOMMITTED';
set tx_isolation='READ-COMMITTED';
set tx_isolation='REPEATABLE-READ';
set tx_isolation='SERIALIZABLE';

In theory, repeatable read still leaves another thorny problem: phantom reads. A phantom read occurs when the current transaction reads a range of rows, another transaction inserts new rows into that range, and the first transaction then reads the range again and finds new "phantom" rows. The InnoDB storage engine addresses this problem through the multiversion concurrency control (MVCC, Multiversion Concurrency Control) mechanism.

Serializable (SERIALIZABLE)

The strictest level: all transactions must be executed serially and cannot run concurrently.

It solves the phantom read problem by enforcing the ordering of transactions so that they cannot conflict with each other.

In other words, a shared lock is placed on every row that is read.

Serializable can cause massive timeouts and lock contention.

Summary:

Isolation level      Dirty read   Non-repeatable read   Phantom read
Read uncommitted     possible     possible              possible
Read committed       impossible   possible              possible
Repeatable read      impossible   impossible            possible
Serializable         impossible   impossible            impossible

MVCC

The full name of MVCC is Multi-Version Concurrency Control; its principle is similar to copy-on-write.

Problems caused by concurrent transactions

read-read

That is, concurrent transactions read the same record one after another.

Because reading a record does not affect it in any way, concurrent reads of the same record pose no safety problem and require no concurrency control.

write-write

That is, concurrent transactions successively modify the same record.

If concurrent transactions are allowed to read the same record and then modify it based on the old value, the modification made by the earlier transaction will be overwritten by the later one; this is the commit-overwrite problem.

In another case, two concurrent transactions modify the same record one after another, and one rolls back after the other has committed, so the committed modification is lost because of the rollback; this is the rollback-overwrite problem.

Both cases are lost-update problems, so write-write access needs concurrency control.
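The lost-update scenario can be reproduced with a tiny Java sketch. This is a deterministic, sequential simulation (the class name and values are hypothetical, not real transactions): both "transactions" read the old value, and the second write silently overwrites the first.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the lost-update (commit-overwrite) problem described above.
public class LostUpdateSketch {
    public static int runWithoutLocking() {
        Map<String, Integer> table = new HashMap<>();
        table.put("balance", 100);

        int readByT1 = table.get("balance"); // T1 reads 100
        int readByT2 = table.get("balance"); // T2 also reads 100 (no lock is held)

        table.put("balance", readByT1 + 10); // T1 writes 110
        table.put("balance", readByT2 + 20); // T2 writes 120, overwriting T1's +10

        return table.get("balance");
    }

    public static void main(String[] args) {
        // 130 would be correct if both updates survived; T1's update is lost.
        System.out.println(runWithoutLocking()); // 120
    }
}
```

With write locks (or an atomic read-modify-write), T2 would be forced to re-read after T1's write and the result would be 130.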

write-read or read-write

That is, two concurrent transactions perform read and write operations on the same record respectively.

If a transaction reads a record modified by another transaction that has not yet committed, there is a dirty-read problem;

If we instead allow a transaction to read only data modified by committed transactions, then the data this transaction reads before and after another transaction commits its modification differs, which means a non-repeatable read has occurred;

If a transaction finds some records by a condition, and another transaction then inserts records into the table, the original transaction finds that re-running the same query returns results inconsistent with the first query, which means a phantom read has occurred.

For these write-read and read-write problems, MySQL's InnoDB implements MVCC to handle read-write conflicts better: even under concurrent reads and writes, reads need not take locks, achieving "non-blocking concurrent reads".

So, to sum up, the reason MVCC is needed is that databases usually use locks to achieve isolation. The most primitive locks, once a resource is locked, forbid any other thread from accessing it. But many applications are read-heavy: data is read far more often than it is modified, and mutual exclusion between readers is unnecessary. Hence the read-write lock: read locks do not exclude each other, while a write lock excludes both other write locks and read locks. This greatly improves the concurrency of the system.
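Java's ReentrantReadWriteLock exhibits exactly the compatibility rules just described, which a small probe can demonstrate (the helper class is ours, written for illustration only):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: read locks are compatible with each other; a write lock is not
// compatible with an existing read lock.
public class ReadWriteLockSketch {
    public static boolean[] probe() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock(); // this thread holds a read lock

        final boolean[] results = new boolean[2];
        Thread other = new Thread(() -> {
            results[0] = rw.readLock().tryLock();   // another reader: allowed
            if (results[0]) rw.readLock().unlock();
            results[1] = rw.writeLock().tryLock();  // a writer: refused
            if (results[1]) rw.writeLock().unlock();
        });
        other.start();
        other.join();
        rw.readLock().unlock();
        return results; // [true, false]
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = probe();
        System.out.println("read-read compatible: " + r[0]
                + ", read-write compatible: " + r[1]);
    }
}
```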

Later, people found that concurrent reading alone was not enough and proposed a way to avoid read-write conflicts as well: save the data as a snapshot at read time, so that read locks no longer conflict with write locks and different transaction sessions each see their own specific version of the data. Of course, "snapshot" is a conceptual model; different databases may implement it in different ways.

Under the MVCC protocol, every read operation sees a consistent snapshot, and such snapshot reads require no locks, so non-blocking reads are possible. MVCC allows data to keep multiple versions; the version number can be a timestamp or a globally incrementing transaction ID. At the same point in time, different transactions see different versions of the data.

Before diving into MVCC, we need to clarify two definitions:

  • Current read : reads the latest version of the data, and while reading must ensure that no other concurrent transaction modifies it, so the current read locks the records it reads. For example: select ... lock in share mode (shared lock), select ... for update / update / insert / delete (exclusive lock).
  • Snapshot read : every modification stores a snapshot record in the undo log, and a snapshot read reads some version from the undo log. The advantage is that no lock is needed; the disadvantage is that the data read may not be the latest version. Ordinary queries are snapshot reads, for example: select * from t_user where id=1; — under MVCC, plain queries are all snapshot reads.

The implementation principle of MVCC

MVCC in MySQL is mainly implemented through hidden fields in row records (hidden primary key row_id, transaction ID trx_id, rollback pointer roll_pointer), undo log (version chain), and ReadView (consistent read view).

hidden field

In MySQL, in addition to custom fields, there are some hidden fields in each row of records:

  • row_id : When the database table does not define a primary key, InnoDB will generate a clustered index with row_id as the primary key.
  • trx_id : The transaction ID records the transaction id of the newly added/recently modified record, and the transaction id is self-incrementing.
  • roll_pointer : The rollback pointer points to the previous version of the current record (in the undo log).

ReadView

A ReadView consistency view consists mainly of two parts: the array of IDs of all uncommitted transactions, and the largest transaction ID already created. For example: [100,200],300 — transactions 100 and 200 are currently uncommitted, while 300 is the largest transaction already created (and committed). A ReadView is created when a SELECT statement executes, but the generation strategy differs between the read committed and repeatable read levels: under read committed a new ReadView is generated for every SELECT, whereas under repeatable read a ReadView is generated only for the first SELECT, and subsequent SELECTs keep using it (even if an UPDATE happens in between).

ReadView is a "read view" that MVCC generates when snapshotting data. There are 4 more important variables in ReadView:

  • m_ids : Active transaction id list, the transaction id list of all active (that is, uncommitted) transactions in the current system.
  • min_trx_id : The smallest transaction id in m_ids.
  • max_trx_id : the id the system will assign to the next transaction when the ReadView is generated (note: not the largest id in m_ids), i.e. the current largest transaction id + 1.
  • creator_trx_id : The transaction id of the transaction that generated this ReadView.

version chain

When modifying data, the modified page content will be recorded in the redo log (in order to restore the operation of the database after the database restarts), and the original snapshot of the data will be recorded in the undo log (for rolling back the transaction). The undo log has two functions. In addition to being used to roll back transactions, it is also used to implement MVCC.

Use a simple example to draw the logic diagram of the undo log version chain used in MVCC:

When transaction 100 (trx_id=100) executes insert into t_user values(1,'bend',30); after:

mysql version chain

When transaction 102 (trx_id=102) executes update t_user set name='Li Si' where id=1; After:

mysql version chain

When transaction 102 (trx_id=102) executes update t_user set name='Wang Wu' where id=1; after:

mysql version chain

The comparison rules of the specific version chain are as follows. First, take out the transaction ID of the first version at the top of the version chain and start to compare one by one:

specific version chain

(where min_id refers to min_trx_id, the smallest id in the uncommitted-transaction array of the ReadView, and max_id refers to max_trx_id, the next transaction id to be allocated)

Which version of data can be read when a transaction performs snapshot reading, the rules of ReadView are as follows:

(1) When the trx_id recorded in the version chain is equal to the current transaction id (trx_id = creator_trx_id) , it means that this version in the version chain is modified by the current transaction, so the snapshot record is visible to the current transaction.

(2) When the trx_id recorded in the version chain is less than the minimum id of the active transaction (trx_id < min_trx_id) , it means that the record in the version chain has been submitted, so the snapshot record is visible to the current transaction.

(3) When the trx_id recorded in the version chain is greater than the next transaction id to be allocated (trx_id > max_trx_id) , the snapshot record is not visible to the current transaction.

(4) When the trx_id recorded in the version chain is greater than or equal to the minimum active transaction id and smaller than the next transaction id to be allocated (min_trx_id <= trx_id < max_trx_id): if the trx_id is in the active transaction id list m_ids, the transaction that modified the record had not yet committed when the ReadView was generated, so the snapshot record is not visible to the current transaction; otherwise it is visible.

When a transaction snapshot-reads the record with id=1 (select * from t_user where id=1), it walks the version chain starting from the newest record and checks these four conditions in turn, until some version's snapshot is visible to the current transaction; otherwise it moves on to the previous version.
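The four rules above can be condensed into a small visibility check. This is a hypothetical sketch, not InnoDB code (ReadViewSketch and isVisible are our names); the sample ids follow the earlier [100,200],300 example, taking max_trx_id as 301 (the next id to be allocated) and the creator as a hypothetical transaction 201:

```java
import java.util.Set;

// Sketch of the four ReadView visibility rules described in the text.
public class ReadViewSketch {
    final Set<Long> mIds;    // m_ids: active (uncommitted) transaction ids
    final long minTrxId;     // min_trx_id: smallest id in m_ids
    final long maxTrxId;     // max_trx_id: next id to be allocated
    final long creatorTrxId; // creator_trx_id: transaction that generated this view

    ReadViewSketch(Set<Long> mIds, long minTrxId, long maxTrxId, long creatorTrxId) {
        this.mIds = mIds; this.minTrxId = minTrxId;
        this.maxTrxId = maxTrxId; this.creatorTrxId = creatorTrxId;
    }

    boolean isVisible(long trxId) {
        if (trxId == creatorTrxId) return true; // (1) the current transaction's own change
        if (trxId < minTrxId) return true;      // (2) committed before the view was made
        if (trxId >= maxTrxId) return false;    // (3) started after the view was made
        return !mIds.contains(trxId);           // (4) visible only if not still active
    }

    public static void main(String[] args) {
        ReadViewSketch rv = new ReadViewSketch(Set.of(100L, 200L), 100L, 301L, 201L);
        System.out.println(rv.isVisible(50L));  // true: committed long before the view
        System.out.println(rv.isVisible(100L)); // false: still active
        System.out.println(rv.isVisible(300L)); // true: in range but already committed
        System.out.println(rv.isVisible(301L)); // false: not yet started
    }
}
```

Walking the version chain then means calling isVisible on each version's trx_id from newest to oldest and returning the first visible version.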

MVCC mainly solves the dirty-read problem of the RU level and the non-repeatable-read problem of the RC level, so MVCC only takes effect under the RC (solving dirty reads) and RR (solving non-repeatable reads) isolation levels; that is, MySQL generates ReadViews for snapshot reads only under RC and RR.

The difference is that under RC, every snapshot read generates a fresh ReadView, while under RR only the first snapshot read in the transaction generates one, and later snapshot reads reuse it.

MySQL reduces the blocking probability of transactions through MVCC.

For example: T1 wants to update the value of record X, so T1 needs to acquire the lock of X first, and then update, that is, create a new version of X, assuming it is x3.

Assuming T1 has not yet released X's lock and T2 wants to read X, T2 will not block: MYDB returns an older version of X, such as x2. The final result is then equivalent to T2 executing before T1, and the schedule is still serializable. If X has no older version, T2 can only wait for T1 to release the lock.

So it just reduces the probability.

Handwritten transaction isolation levels

If the latest version of a record is locked, when another transaction wants to modify or read this record, MYDB will return an older version of the data.

At this time, it can be considered that the latest locked version is invisible to another transaction.

Read committed

That is, when a transaction reads data, it can only read the data generated by the committed transaction.

Version visibility is related to transaction isolation.

The lowest degree of transaction isolation supported by MYDB is "Read Committed", that is, when a transaction reads data, it can only read the data generated by the committed transaction. The benefits of ensuring the lowest read commit have been explained in Chapter 4 (preventing cascading rollbacks from conflicting with commit semantics).

MYDB implements read committed by maintaining two variables for each version, the aforementioned XMIN and XMAX:

  • XMIN : the number of the transaction that created this version
  • XMAX : the number of the transaction that deleted this version

XMIN is filled in when a version is created, and XMAX is filled in when the version is deleted or superseded by a new version.

The XMAX variable also explains why the DM layer provides no delete operation: to delete a version you only need to set its XMAX, after which the version is invisible to every transaction started after XMAX, which is equivalent to deletion.

Under read commit, the visibility logic of the version to the transaction is as follows:

(XMIN == Ti and                             // created by Ti and
    XMAX == NULL                            // not yet deleted
)
or                                          // or
(XMIN is committed and                      // created by a committed transaction and
    (XMAX == NULL or                        // not yet deleted, or
    (XMAX != Ti and XMAX is not committed)  // deleted by an uncommitted transaction
))

If the condition is true, the version is visible to Ti. Then to get the version suitable for Ti, you only need to start from the latest version and check the visibility forward one by one. If it is true, you can return directly.

The following method determines whether a record is visible to transaction t:

private static boolean readCommitted(TransactionManager tm, Transaction t, Entry e) {
    long xid = t.xid;
    long xmin = e.getXmin();
    long xmax = e.getXmax();
    // Created by t itself and not yet deleted.
    if(xmin == xid && xmax == 0) return true;

    // Created by a committed transaction...
    if(tm.isCommitted(xmin)) {
        // ...and not yet deleted,
        if(xmax == 0) return true;
        // ...or deleted by another transaction that has not committed.
        if(xmax != xid) {
            if(!tm.isCommitted(xmax)) {
                return true;
            }
        }
    }
    return false;
}

Repeatable read

A non-repeatable read causes a transaction to get different results when reading the same data item more than once during its execution. In the following schedule, X has an initial value of 0:


T1 begin
R1(X) // T1 reads 0
T2 begin
U2(X) // T2 updates X to 1
T2 commit
R1(X) // T1 reads 1

It can be seen that T1 reads X twice, and the reading results are different. If you want to avoid this situation, you need to introduce a stricter isolation level, that is, repeatable read.

The problem arises because T1's second read sees the committed modification made by T2 (the modifying transaction is recorded as the XMAX of the old version and the XMIN of the new version). So we can stipulate that a transaction may only read data versions produced by transactions that had already ended when it started.

Under this rule, the transaction must ignore two kinds of data:

(1) data of transactions that began after this transaction began;

(2) data of transactions that were still active when this transaction began.

The first case is decided simply by comparing transaction IDs. For the second, transaction Ti must record the set of currently active transactions SP(Ti) when it begins. If a version's XMIN is in SP(Ti), that version should be invisible to Ti.

Therefore, the judgment logic of repeatable read is as follows:

(XMIN == Ti and                 // created by Ti and
 XMAX == NULL                   // not yet deleted
)
or                              // or
(XMIN is committed and          // created by a committed transaction and
 XMIN < XID and                 // that transaction began before Ti and
 XMIN is not in SP(Ti) and      // committed before Ti began, and
 (XMAX == NULL or               // not yet deleted, or
  (XMAX != Ti and               // deleted by another transaction, but one that
   (XMAX is not committed or    // has not committed, or
    XMAX > Ti or                // began after Ti began, or
    XMAX is in SP(Ti)           // had not committed when Ti began
))))

Therefore, a structure is needed to abstract a transaction and hold its snapshot data (the set of transactions still active when this transaction was created):

// VM's abstraction of a transaction
public class Transaction {
    public long xid;
    public int level;
    public Map<Long, Boolean> snapshot;
    public Exception err;
    public boolean autoAborted;

    // transaction id, isolation level, currently active transactions
    public static Transaction newTransaction(long xid, int level, Map<Long, Transaction> active) {
        Transaction t = new Transaction();
        t.xid = xid;
        t.level = level;
        if(level != 0) {
            // Repeatable read needs the snapshot; read committed (level 0) does not.
            t.snapshot = new HashMap<>();
            for(Long x : active.keySet()) {
                t.snapshot.put(x, true);
            }
        }
        return t;
    }

    public boolean isInSnapshot(long xid) {
        if(xid == TransactionManagerImpl.SUPER_XID) {
            return false;
        }
        return snapshot.containsKey(xid);
    }
}

The active parameter of the factory method holds all currently active transactions.

Therefore, under the isolation level of repeatable read, the judgment of whether a version is visible to the transaction is as follows:

private static boolean repeatableRead(TransactionManager tm, Transaction t, Entry e) {
    long xid = t.xid;
    long xmin = e.getXmin();
    long xmax = e.getXmax();
    // The version was created by t itself and has not been deleted
    if(xmin == xid && xmax == 0) return true;

    // The creator committed before t began and was not active at that time
    if(tm.isCommitted(xmin) && xmin < xid && !t.isInSnapshot(xmin)) {
        if(xmax == 0) return true;   // not deleted
        if(xmax != xid) {            // deleted, but not by t itself
            // The deletion is invisible to t: the deleter is uncommitted,
            // began after t, or was still active when t began
            if(!tm.isCommitted(xmax) || xmax > xid || t.isInSnapshot(xmax)) {
                return true;
            }
        }
    }
    return false;
}
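The check above can be exercised with concrete numbers. The following standalone sketch replaces `TransactionManager` and `Entry` with a hard-coded committed set and plain `xmin`/`xmax` values; all names here (`RRVisibilityDemo`, the chosen xids) are illustrative, not part of the project.

```java
import java.util.Set;

// Illustrative walk-through of the repeatable-read visibility check.
public class RRVisibilityDemo {
    // committed XIDs; everything else is treated as uncommitted
    static Set<Long> committed = Set.of(1L, 2L, 4L);

    static boolean isCommitted(long xid) { return committed.contains(xid); }

    static boolean repeatableRead(long txid, Set<Long> snapshot, long xmin, long xmax) {
        if (xmin == txid && xmax == 0) return true;              // own undeleted write
        if (isCommitted(xmin) && xmin < txid && !snapshot.contains(xmin)) {
            if (xmax == 0) return true;                          // not deleted
            if (xmax != txid) {
                if (!isCommitted(xmax) || xmax > txid || snapshot.contains(xmax)) {
                    return true;                                 // deletion invisible
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Transaction t has xid 5; xid 4 was still active when t began
        Set<Long> snap = Set.of(4L);
        // created by xid 2, committed before t began, never deleted -> visible
        System.out.println(repeatableRead(5, snap, 2, 0));   // true
        // created by xid 4: committed, but still active when t began -> invisible
        System.out.println(repeatableRead(5, snap, 4, 0));   // false
        // created by 2, deleted by 4 (in snapshot): deletion invisible -> still visible
        System.out.println(repeatableRead(5, snap, 2, 4));   // true
    }
}
```

Note how the snapshot overrides the commit status: xid 4 has committed, yet its writes stay invisible to t because it was active when t began. This is exactly what makes repeated reads stable.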

Version jumping

To see the version jumping problem, consider the following schedule. Assume X initially has only version x0, and that T1 and T2 both run at the repeatable read isolation level:

T1 begin
T2 begin
R1(X) // T1 reads x0
R2(X) // T2 reads x0
U1(X) // T1 updates X to x1
T1 commit
U2(X) // T2 updates X to x2
T2 commit

In practice this schedule executes without error, but it is not logically correct.

T1 updates X from x0 to x1, which is fine. But T2 then updates X from x0 to x2, skipping over the x1 version entirely.

Read committed allows version jumps, while repeatable read does not.

The idea for solving version jumps: if Ti wants to modify X, but X has already been modified by a transaction Tj that is invisible to Ti, then Ti is required to roll back.

The MVCC implementation makes undoing (rolling back) a transaction cheap: we only need to mark the transaction as aborted.

By the visibility rules of the previous chapter, each transaction only sees data produced by committed transactions (and its own), so data produced by an aborted transaction has no effect on other transactions; it is as if the transaction never existed. When a version jump is detected, the current transaction is therefore aborted automatically:

if(Visibility.isVersionSkip(tm, t, entry)) {
    System.out.println("Version jump detected, rolling back automatically");
    t.err = Error.ConcurrentUpdateException;
    internAbort(xid, true);
    t.autoAborted = true;
    throw t.err;
}

We summarized earlier the two situations in which Tj is invisible to Ti:

(1) XID(Tj) > XID(Ti): Tj began after Ti began, or

(2) Tj ∈ SP(Ti): Tj was still active (created but not yet committed) when Ti began.

The version jump check takes the latest version of the data item X to be modified and checks whether the transaction that last modified it (its XMAX) has committed but is invisible to the current transaction; if so, modifying X now would skip a version. The implementation is as follows:

public static boolean isVersionSkip(TransactionManager tm, Transaction t, Entry e) {
    long xmax = e.getXmax();
    if(t.level == 0) {
        // Read committed allows version jumps
        return false;
    } else {
        // A jump occurs if the last modifier committed but is invisible to t:
        // it began after t, or was still active when t began
        return tm.isCommitted(xmax) && (xmax > t.xid || t.isInSnapshot(xmax));
    }
}
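The same check can be demonstrated standalone. In this sketch a hard-coded committed set stands in for `TransactionManager`, and a plain `xmax` value stands in for `Entry.getXmax()`; the class name and xids are illustrative assumptions.

```java
import java.util.Set;

// Illustrative sketch of the version jump check under stubbed dependencies.
public class VersionSkipDemo {
    static Set<Long> committed = Set.of(3L, 6L);

    static boolean isCommitted(long xid) { return committed.contains(xid); }

    // level 0 = read committed (jumps allowed); otherwise repeatable read
    static boolean isVersionSkip(int level, long txid, Set<Long> snapshot, long xmax) {
        if (level == 0) return false;
        // Jump: the modifier committed but is invisible to the current transaction
        return isCommitted(xmax) && (xmax > txid || snapshot.contains(xmax));
    }

    public static void main(String[] args) {
        // t has xid 5; xid 6 began after t and committed, then modified the entry
        System.out.println(isVersionSkip(1, 5, Set.of(), 6));  // true: t must roll back
        // xid 3 committed before t began: its version is visible, no jump
        System.out.println(isVersionSkip(1, 5, Set.of(), 3));  // false
        // read committed never reports a jump
        System.out.println(isVersionSkip(0, 5, Set.of(), 6));  // false
    }
}
```

The first case is exactly the T1/T2 schedule above: the invisible-but-committed modifier (T1) forces the later writer (T2) to abort instead of producing x2 from the stale x0.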

About the authors:

First author: Mark, senior big data architect and Java architect with nearly 20 years of experience in Java and big data architecture and development. Senior architecture mentor who has successfully guided many intermediate and senior Java engineers into architect positions.

Second author: Nien, senior system architect, veteran IT writer, and well-known blogger. Over the past 20 years he has worked on architecture research, system architecture, system analysis, and core code development in high-performance web platforms, high-performance communication, high-performance search, and data mining. Senior architecture mentor who has successfully guided many intermediate and senior Java engineers into architect positions.

A final word:

Continuous iteration and continuous upgrading are the tenets of the Nien team.

Continuous iteration and continuous upgrading are also the soul of "Starting from 0, Handwriting MySQL".

More real interview questions will be collected over time. If you run into interview problems, you can also come to Nien's community, "Technical Freedom Circle (formerly Crazy Maker Circle)", to discuss them and ask for help.

Our goal is to create the world's best "Handwritten MySQL" interview collection.

The realization path of technical freedom PDF:

Realize your architectural freedom:

" Have a thorough understanding of the 8-figure-1 template, everyone can do the architecture "

" 10Wqps review platform, how to structure it? This is what station B does! ! ! "

" Alibaba Second-Round Interview: How to Optimize Performance for Tens of Millions or Billions of Rows? Textbook-Level Answers Are Here "

" Peak 21WQps, 100 million DAU, how is the small game "Sheep a Sheep" structured? "

" How to Schedule 10-Billion-Level Orders: A Big Factory's Superb Solution "

" Two Big Factory 10 Billion-Level Red Envelope Architecture Scheme "

… more architecture articles, being added

Realize your responsive freedom:

" Responsive Bible: 10W Words, Realize Spring Responsive Programming Freedom "

This is the old version of " Flux, Mono, Reactor Combat (the most complete in history) "

Realize your spring cloud freedom:

" Spring cloud Alibaba Study Bible "

" Sharding-JDBC underlying principle and core practice (the most complete in history) "

" Get it done in one article: the chaotic relationship between SpringBoot, SLF4j, Log4j, Logback, and Netty (the most complete in history) "

Realize your linux freedom:

" Linux Commands Encyclopedia: 2W More Words, One Time to Realize Linux Freedom "

Realize your online freedom:

" Detailed explanation of TCP protocol (the most complete in history) "

" Three Network Tables: ARP Table, MAC Table, Routing Table, Realize Your Network Freedom! "

Realize your distributed lock freedom:

" Redis Distributed Lock (Illustration - Second Understanding - The Most Complete in History) "

" Zookeeper Distributed Lock - Diagram - Second Understanding "

Realize your king component freedom:

" King of the Queue: Disruptor Principles, Architecture, and Source Code Penetration "

" The King of Cache: Caffeine Source Code, Architecture, and Principles (the most complete in history, 10W super long text) "

" The King of Cache: The Use of Caffeine (The Most Complete in History) "

" Java Agent probe, bytecode enhanced ByteBuddy (the most complete in history) "

Realize your interview-question freedom:

4000 pages of "Nien's Java Interview Collection", 40 topics

Please go to the following "Technical Freedom Circle" official account to get the PDF file update of Nien's architecture notes and interview questions↓↓↓

Origin blog.csdn.net/crazymakercircle/article/details/131297906