InnoDB MVCC implementation principle and source code analysis

1. Principle introduction

Multi-version concurrency control (MVCC) is one of the main ways MySQL achieves high performance. An ordinary SELECT takes no locks; it reads the appropriate version of each row through MVCC, which avoids repeatedly locking the data. In InnoDB, the RC (READ COMMITTED) and RR (REPEATABLE READ) isolation levels implement MVCC with a consistent read view: a snapshot of the transaction system taken at a certain moment that records the IDs of all active read-write transactions. Each later read compares the transaction ID stored in a row with the transaction IDs in the snapshot to decide whether that row version is visible.

2. InnoDB data row structure

In addition to the user-defined columns, the row structure contains 3 system columns: DATA_ROW_ID, DATA_TRX_ID and DATA_ROLL_PTR. If the table defines no primary key (and has no non-NULL unique index that InnoDB can use in its place), DATA_ROW_ID is used as the primary key column; otherwise the row structure has no DATA_ROW_ID column. Of the other two:

    DATA_TRX_ID: The ID of the transaction that last modified this row.

    DATA_ROLL_PTR: A pointer to the undo log record in the rollback segment that holds the previous version of the row.

The MVCC implementation relies chiefly on these two fields.

3. READ-VIEW principle process

4. Interpretation of READ-VIEW

1) A read view is bound to an SQL statement and is created or fetched before the statement executes. At the RR isolation level, the first SELECT of the transaction creates the read view and every later statement reuses it; at the RC isolation level, every SELECT creates a new one.

2) read view structure

struct read_view_t{
	ulint		type;	/*!< VIEW_NORMAL, VIEW_HIGH_GRANULARITY */
	undo_no_t	undo_no;/*!< 0 or if type is
				VIEW_HIGH_GRANULARITY
				transaction undo_no when this high-granularity
				consistent read view was created */
	trx_id_t	low_limit_no;
				/*!< The view does not need to see the undo
				logs for transactions whose transaction number
				is strictly smaller (<) than this value: they
				can be removed in purge if not needed by other
				views */
	trx_id_t	low_limit_id;
				/*!< The read should not see any transaction
				with trx id >= this value. In other words,
				this is the "high water mark". */
	trx_id_t	up_limit_id;
				/*!< The read should see all trx ids which
				are strictly smaller (<) than this value.
				In other words,
				this is the "low water mark". */
	ulint		n_trx_ids;
				/*!< Number of cells in the trx_ids array */
	trx_id_t*	trx_ids;/*!< Additional trx ids which the read should
				not see: typically, these are the read-write
				active transactions at the time when the read
				is serialized, except the reading transaction
				itself; the trx ids in this array are in a
				descending order. These trx_ids should be
				between the "low" and "high" water marks,
				that is, up_limit_id and low_limit_id. */
	trx_id_t	creator_trx_id;
				/*!< trx id of creating transaction, or
				0 used in purge */
	UT_LIST_NODE_T(read_view_t) view_list;
				/*!< List of read views in trx_sys */
};

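The 3 key members are {low_limit_id, up_limit_id, trx_ids}.

    low_limit_id: The high water mark. When the read view is created, it is set to the next transaction ID that the transaction system will assign, that is, a value greater than every transaction ID allocated so far. Changes made by transactions whose ID is greater than or equal to it are not visible.

    up_limit_id: The low water mark. It is the smallest transaction ID in the active read-write transaction list when the read view is created; if no other read-write transaction is active, it equals low_limit_id. Changes made by transactions with a smaller ID are visible.

    trx_ids: The IDs of all transactions in the active read-write transaction list when the read view is created, excluding the creating transaction itself.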
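How the three members relate to the active transaction list can be sketched as follows. This is a simplified model, not InnoDB source: sketch_view_t, sketch_view_open and SKETCH_MAX_ACTIVE are hypothetical names, the active read-write list is passed in as a plain array sorted in descending order, and max_trx_id stands for the next ID the transaction system would assign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t trx_id_t;

#define SKETCH_MAX_ACTIVE 16	/* hypothetical fixed capacity for this sketch */

/* Simplified stand-in for read_view_t: only the members discussed above. */
typedef struct {
	trx_id_t	low_limit_id;	/* high water mark */
	trx_id_t	up_limit_id;	/* low water mark */
	size_t		n_trx_ids;
	trx_id_t	trx_ids[SKETCH_MAX_ACTIVE];	/* descending order */
	trx_id_t	creator_trx_id;
} sketch_view_t;

/* Build a view from the active read-write list (descending ids) and the
next transaction id to be assigned. Hypothetical helper. */
static sketch_view_t
sketch_view_open(const trx_id_t* active, size_t n_active,
		 trx_id_t max_trx_id, trx_id_t creator_trx_id)
{
	sketch_view_t	view = {0};
	size_t		i;

	assert(n_active <= SKETCH_MAX_ACTIVE);

	view.creator_trx_id = creator_trx_id;
	view.low_limit_id = max_trx_id;

	/* Copy every active id except the creator's own. */
	for (i = 0; i < n_active; i++) {
		if (active[i] != creator_trx_id) {
			view.trx_ids[view.n_trx_ids++] = active[i];
		}
	}

	/* The array is descending, so the smallest active id is last.
	With no other active transaction, both marks coincide. */
	view.up_limit_id = view.n_trx_ids > 0
		? view.trx_ids[view.n_trx_ids - 1]
		: view.low_limit_id;

	return(view);
}
```

For example, with active transactions {9, 7, 5}, a next ID of 10 and transaction 7 as the creator, the sketch yields low_limit_id = 10, up_limit_id = 5 and trx_ids = {9, 5}.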

3) At isolation levels less than or equal to RC, read_view_close_for_mysql is called after each SQL statement to remove the read view from the transaction. When the next SQL statement starts, it finds that trx->read_view is NULL and creates a new read view. At the RR isolation level, the read view is not removed when a statement ends, so the next statement reuses the existing one. Every statement in the transaction therefore reads through the same read view, which is what makes reads repeatable.
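The difference between the two levels comes down to when the view pointer is reset. The following is a minimal sketch of that lifecycle under simplifying assumptions; all names (sketch_trx_t, sketch_stmt_start, sketch_stmt_end, sketch_run_selects, the ISO_ constants) are hypothetical and do not mirror the real InnoDB functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical isolation levels, ordered so that "<= RC" can be tested. */
typedef enum { ISO_READ_COMMITTED, ISO_REPEATABLE_READ } iso_level_t;

typedef struct {
	unsigned long	snapshot_no;	/* which snapshot this view holds */
} sketch_view_t;

typedef struct {
	iso_level_t	isolation;
	sketch_view_t*	read_view;	/* NULL when no view is assigned */
	sketch_view_t	view_storage;
	unsigned long	views_opened;	/* how many views were created */
} sketch_trx_t;

/* Before a statement: create a view only if the transaction has none. */
static void
sketch_stmt_start(sketch_trx_t* trx)
{
	assert(trx != NULL);
	if (trx->read_view == NULL) {
		trx->views_opened++;
		trx->view_storage.snapshot_no = trx->views_opened;
		trx->read_view = &trx->view_storage;
	}
}

/* After a statement: at RC or below, drop the view so that the next
statement has to create a fresh one. At RR the view is kept. */
static void
sketch_stmt_end(sketch_trx_t* trx)
{
	assert(trx != NULL);
	if (trx->isolation <= ISO_READ_COMMITTED) {
		trx->read_view = NULL;
	}
}

/* Run n consistent reads and report how many views were created. */
static unsigned long
sketch_run_selects(sketch_trx_t* trx, unsigned n)
{
	unsigned	i;

	for (i = 0; i < n; i++) {
		sketch_stmt_start(trx);
		/* ... the read would use trx->read_view here ... */
		sketch_stmt_end(trx);
	}
	return(trx->views_opened);
}
```

In this model, three SELECTs in an RC transaction create three views, while three SELECTs in an RR transaction create one.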

4) Visibility is judged differently for the clustered index and for secondary indexes. Clustered index:

    DATA_TRX_ID < view->up_limit_id: The transaction that modified the record had already committed when the read view was created, so the record is visible.

    DATA_TRX_ID >= view->low_limit_id: The record was modified by a transaction that started after the read view was created, so the record is not visible.

    view->up_limit_id <= DATA_TRX_ID < view->low_limit_id: Check whether the trx_id is in the array of active read-write transactions (trx_ids). If it is, the record is not visible to the current read view; otherwise it is visible.

   Secondary index:

    A secondary index in InnoDB stores only the trx_id of the last transaction that updated the page, not a trx_id for each record. When a query uses a secondary index and the trx_id of the page is less than view->up_limit_id, all records on the page are known to be visible to the current view. Otherwise the query has to go back to the clustered index to judge visibility.
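The rules in item 4) can be written as two small predicates. This is a sketch under the assumptions stated in the text, not the InnoDB implementation: sketch_view_t, sketch_view_sees_trx_id and sketch_sec_page_all_visible are hypothetical names, and the search of trx_ids is a plain linear scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t trx_id_t;

/* Simplified stand-in for read_view_t. */
typedef struct {
	trx_id_t	low_limit_id;	/* high water mark */
	trx_id_t	up_limit_id;	/* low water mark */
	size_t		n_trx_ids;
	const trx_id_t*	trx_ids;	/* active ids, creator excluded */
} sketch_view_t;

/* Clustered index: is a record last modified by trx_id visible? */
static bool
sketch_view_sees_trx_id(const sketch_view_t* view, trx_id_t trx_id)
{
	size_t	i;

	assert(view != NULL);

	if (trx_id < view->up_limit_id) {
		return(true);	/* committed before the view was created */
	}
	if (trx_id >= view->low_limit_id) {
		return(false);	/* started after the view was created */
	}
	/* Between the marks: invisible only if still active at creation. */
	for (i = 0; i < view->n_trx_ids; i++) {
		if (view->trx_ids[i] == trx_id) {
			return(false);
		}
	}
	return(true);
}

/* Secondary index: the page-level shortcut. A false result does not mean
"invisible"; it means the clustered index record must be checked. */
static bool
sketch_sec_page_all_visible(const sketch_view_t* view,
			    trx_id_t page_max_trx_id)
{
	assert(view != NULL);
	return(page_max_trx_id < view->up_limit_id);
}
```

Because the creating transaction is left out of trx_ids, its own changes fall through the loop and are reported as visible.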

5) If the record is not visible to the view, InnoDB follows the record's DB_ROLL_PTR pointer back through its undo log history, reconstructing earlier versions until it reaches the version that is visible to the current view.
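The walk along the version chain can be modelled as a linked list of row versions. The sketch below is a deliberate simplification: InnoDB rebuilds each older version from undo log records rather than keeping whole copies, and the names sketch_version_t, sketch_view_sees and sketch_visible_version are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t trx_id_t;

typedef struct {
	trx_id_t	low_limit_id;
	trx_id_t	up_limit_id;
	size_t		n_trx_ids;
	const trx_id_t*	trx_ids;
} sketch_view_t;

/* One version of a row; prev plays the role of the roll pointer. */
typedef struct sketch_version {
	trx_id_t			trx_id;	/* transaction that wrote it */
	int				value;	/* the row data, reduced to an int */
	const struct sketch_version*	prev;	/* older version, or NULL */
} sketch_version_t;

/* Same visibility rule as for clustered index records. */
static bool
sketch_view_sees(const sketch_view_t* view, trx_id_t trx_id)
{
	size_t	i;

	if (trx_id < view->up_limit_id) {
		return(true);
	}
	if (trx_id >= view->low_limit_id) {
		return(false);
	}
	for (i = 0; i < view->n_trx_ids; i++) {
		if (view->trx_ids[i] == trx_id) {
			return(false);
		}
	}
	return(true);
}

/* Step back from the newest version until one is visible to the view.
Returns NULL if no version is visible (the row did not exist yet). */
static const sketch_version_t*
sketch_visible_version(const sketch_view_t* view,
		       const sketch_version_t* rec)
{
	assert(view != NULL);

	while (rec != NULL && !sketch_view_sees(view, rec->trx_id)) {
		rec = rec->prev;
	}
	return(rec);
}
```

Two views created at different times can therefore stop at different versions of the same row.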

6) After a START TRANSACTION or BEGIN statement is executed, the InnoDB layer has not yet allocated a transaction ID, rollback segment or read view for the transaction, nor has it placed the transaction in the read-write transaction list. This work is deferred until the first SQL statement, which calls the function trx_start_low to complete it. Keep this in mind when reasoning about transaction IDs and read views.
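The deferred start described in item 6) can be illustrated with a toy model. Everything here is hypothetical (sketch_trx_t, sketch_trx_sys_t, sketch_begin, sketch_exec_statement); the model only mirrors the idea that resources are assigned on first use, and it leaves out rollback segments and read views.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t trx_id_t;

typedef struct {
	trx_id_t	id;		/* 0 until the transaction really starts */
	bool		started;
} sketch_trx_t;

typedef struct {
	trx_id_t	max_trx_id;	/* next id to hand out */
	size_t		n_active;	/* size of the read-write list */
} sketch_trx_sys_t;

/* BEGIN / START TRANSACTION: nothing is allocated yet. */
static void
sketch_begin(sketch_trx_t* trx)
{
	assert(trx != NULL);
	trx->id = 0;
	trx->started = false;
}

/* Each statement: the first one performs the real start, in the role
that the text assigns to trx_start_low; later ones change nothing. */
static void
sketch_exec_statement(sketch_trx_sys_t* sys, sketch_trx_t* trx)
{
	assert(sys != NULL && trx != NULL);

	if (!trx->started) {
		trx->id = sys->max_trx_id++;
		sys->n_active++;
		trx->started = true;
	}
	/* ... the statement itself would run here ... */
}
```

One consequence in this model: a transaction that issues BEGIN first but runs its first statement later receives the larger ID, so ordering by BEGIN time and ordering by transaction ID can differ.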



