This article walks you through the undo log of MySQL

1. The need for transaction rollback

When we learned about transactions earlier, we said that a transaction must guarantee atomicity, that is, the operations in a transaction are either all done or not done at all. But sometimes a transaction runs into situations partway through, such as:

  • Case 1: Various errors may be encountered during transaction execution, such as errors in the server itself, operating system errors, or even a sudden power failure.
  • Case 2: The programmer can manually enter a ROLLBACK statement during transaction execution to end the current transaction.

Both situations cause the transaction to end partway through, but many things may already have been modified during its execution. To guarantee the atomicity of the transaction, we need to change things back to their original state. This process is called rollback. It creates the impression that the transaction did nothing at all, so the atomicity requirement is met.

It's like playing cards with friends when we were young: taking back a move is a very typical rollback operation. For example, if you play a pair of threes, taking the move back means picking that pair of threes up again. Rollback in the database is similar. If you insert a record, the rollback operation is to delete that record; if you update a record, the rollback operation is to update it back to the old value; if you delete a record, the rollback operation is to insert the record again. It seems so simple.

From the above description, we can already sense that whenever we want to change a record (a change here can be an INSERT, DELETE, or UPDATE), we need to keep something in reserve: write down everything needed for rollback. For example:

  • When you insert a record, you must at least write down the primary key value of this record. When you roll back later, you only need to delete the record corresponding to the primary key value.
  • When you delete a record, you must at least write down the contents of this record, so that when you roll back later, you can insert a record composed of these contents back into the table.
  • If you modify a record, you must at least record the old value before modifying this record, so that you can update this record to the old value when you roll back later

These things that the database records for rollback are called undo logs. One thing to note is that since a query operation (SELECT) does not modify any user records, no undo log needs to be recorded when a query is executed. In real InnoDB, the undo log is not as simple as described above, and the format of the undo log differs between types of operations, but let's set these easily confused details aside for a moment and first look at what the transaction id is.
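The bookkeeping described in this section can be sketched with a toy in-memory table. This is only an illustration of the idea (write down just enough to reverse each change); the class name ToyTable and its methods are invented here and have nothing to do with how InnoDB actually lays out undo logs.

```python
class ToyTable:
    """Toy table that records just enough information to undo each change."""

    def __init__(self):
        self.rows = {}   # primary key -> dict of column values
        self.undo = []   # undo entries of the current transaction, oldest first

    def insert(self, pk, row):
        # To undo an insert, the primary key is enough.
        self.undo.append(("insert", pk, None))
        self.rows[pk] = dict(row)

    def delete(self, pk):
        # To undo a delete, the whole old record is needed.
        self.undo.append(("delete", pk, self.rows[pk]))
        del self.rows[pk]

    def update(self, pk, **changes):
        # To undo an update, the old values of the changed columns are needed.
        old_values = {col: self.rows[pk][col] for col in changes}
        self.undo.append(("update", pk, old_values))
        self.rows[pk].update(changes)

    def rollback(self):
        # Reverse the changes in the opposite order they were made.
        while self.undo:
            kind, pk, saved = self.undo.pop()
            if kind == "insert":
                del self.rows[pk]
            elif kind == "delete":
                self.rows[pk] = saved
            else:
                self.rows[pk].update(saved)
```

Calling insert, update, and delete in any order and then rollback leaves rows exactly as it was before the first change, which is the impression of "nothing happened" that atomicity asks for.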

2. Transaction id

2.1 When to assign an id to a transaction

As we said earlier in the introduction to transactions, a transaction can be a read-only transaction or a read-write transaction:

  • We can start a read-only transaction with the START TRANSACTION READ ONLY statement. In a read-only transaction, we cannot insert into, delete from, or update ordinary tables (tables that other transactions can also access), but we can insert into, delete from, and update temporary tables.
  • We can start a read-write transaction with the START TRANSACTION READ WRITE statement; a transaction started with BEGIN or START TRANSACTION is a read-write transaction by default. In a read-write transaction, we can insert into, delete from, update, and query tables.

If a transaction inserts into, deletes from, or updates a table during its execution, the InnoDB storage engine assigns it a unique transaction id, as follows:

  • For a read-only transaction, a transaction id is assigned only when it first inserts into, deletes from, or updates a user-created temporary table; otherwise no transaction id is assigned.

    Tip:
    As we said earlier, when using EXPLAIN to analyze the query plan of a statement, you will sometimes see Using temporary in the Extra column, which indicates that an internal temporary table will be used when executing the query. This so-called internal temporary table is not the same as a user temporary table that we create manually with CREATE TEMPORARY TABLE. When a transaction is rolled back, there is no need to roll back the internal temporary tables used while executing SELECT statements, and using an internal temporary table in a SELECT statement does not cause a transaction id to be assigned.

  • For a read-write transaction, a transaction id is assigned only when it first inserts into, deletes from, or updates a table (including user-created temporary tables); otherwise no transaction id is assigned.

Sometimes we start a read-write transaction, but it contains only query statements and executes no insert, delete, or update statements. Such a transaction is not assigned a transaction id.

After all this talk, what is the transaction id actually used for? That stays a secret for now, and we will cover it step by step later. For now, just remember that a transaction is assigned a unique transaction id only when it makes changes to records in a table.
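The assignment rules of this subsection can be condensed into a small function. It is a sketch of the rules as stated above, not InnoDB code; the function name and the table kind labels "ordinary" and "user_temporary" are invented for the example.

```python
WRITE_OPS = {"INSERT", "DELETE", "UPDATE"}


def is_assigned_trx_id(read_only, statements):
    """Return True if the transaction would be assigned a transaction id.

    statements is a list of (operation, table_kind) pairs, where table_kind
    is "ordinary" or "user_temporary" (labels invented for this sketch).
    """
    for operation, table_kind in statements:
        if operation not in WRITE_OPS:
            continue  # queries never trigger the assignment of an id
        if read_only and table_kind == "ordinary":
            raise ValueError("a read-only transaction cannot change ordinary tables")
        return True  # first change to a table: this is where an id is assigned
    return False


# A read-write transaction that only queries gets no transaction id.
print(is_assigned_trx_id(False, [("SELECT", "ordinary")]))
# A read-only transaction gets one once it changes a user temporary table.
print(is_assigned_trx_id(True, [("INSERT", "user_temporary")]))
```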

2.2 How the transaction id is generated

This transaction id is essentially a number, and its allocation strategy is roughly the same as that of the hidden row_id column we mentioned earlier (the column InnoDB creates automatically when the user defines neither a primary key nor a UNIQUE key for the table). The specific strategy is as follows:

  • The server maintains a global variable in memory. Whenever a transaction needs a transaction id, the value of the variable is assigned to the transaction as its transaction id, and the variable is incremented by 1.
  • Whenever the value of this variable is a multiple of 256, it is flushed to an attribute called Max Trx ID in page number 5 of the system tablespace. This attribute occupies 8 bytes of storage space.
  • The next time the system restarts, it loads the Max Trx ID attribute into memory, adds 256 to the value, and assigns the result to the global variable mentioned above (because the value of the global variable at the last shutdown may have been greater than the Max Trx ID attribute value).

This ensures that the transaction id value assigned throughout the system is an increasing number. The transaction that is assigned an id first gets a smaller transaction id, and the transaction that is assigned an id later gets a larger transaction id.
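The three rules above can be modeled in a few lines. This is a minimal sketch under the assumption that the flush happens at the moment a value that is a multiple of 256 is handed out; the class and attribute names are invented, and persisted_max_trx_id merely stands in for the Max Trx ID attribute on disk.

```python
class TrxIdAllocator:
    """Toy model of the transaction id allocation strategy described above."""

    FLUSH_INTERVAL = 256

    def __init__(self, persisted_max_trx_id=0):
        # Stand-in for the Max Trx ID attribute in the system tablespace.
        self.persisted_max_trx_id = persisted_max_trx_id
        # On startup, skip ahead by 256: the in-memory counter of the previous
        # run may have advanced past the value that was last flushed.
        self.next_trx_id = persisted_max_trx_id + self.FLUSH_INTERVAL

    def allocate(self):
        # Persist the counter whenever it reaches a multiple of 256.
        if self.next_trx_id % self.FLUSH_INTERVAL == 0:
            self.persisted_max_trx_id = self.next_trx_id
        trx_id = self.next_trx_id
        self.next_trx_id += 1
        return trx_id
```

Because at most 255 ids can be handed out after the last flush, restarting from the persisted value plus 256 never reuses an id, at the cost of leaving gaps in the sequence.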

2.3 trx_id hidden column

When we learned the InnoDB record row format, we emphasized that in addition to the complete user data, clustered index records automatically get hidden columns named trx_id and roll_pointer, and if the user has defined neither a primary key nor a UNIQUE key in the table, a hidden column named row_id is added as well. So the actual structure of a record on a page looks like this:

[Figure: structure of a clustered index record, including the hidden columns]

The trx_id column is actually easy to understand: it is the transaction id of the transaction containing the statement that changed the clustered index record (the change here can be an INSERT, DELETE, or UPDATE operation). As for the roll_pointer hidden column, we will analyze it later.

3. Format of undo log

To implement transaction atomicity, the InnoDB storage engine needs to write down the corresponding undo log before it actually inserts, deletes, or updates a record. Generally, each change to a record corresponds to one undo log, but some update operations may correspond to 2 undo logs, which we will talk about later. During its execution, a transaction may insert, delete, or update several records, which means that many undo logs need to be recorded. These undo logs are numbered starting from 0; in order of generation they are called undo log No. 0, undo log No. 1, ..., undo log No. n. This number is also called the undo no.

These undo logs are recorded in pages of type FIL_PAGE_UNDO_LOG (the corresponding hexadecimal number is 0x0002; readers who have forgotten what a page type is should go back to the earlier chapters). These pages can be allocated from the system tablespace, or from a tablespace dedicated to storing undo logs, the so-called undo tablespace. We will talk later about how pages for storing undo logs are allocated. For now, let's look at what kind of undo logs different operations generate. To keep the story moving, let's first create a table named demo18:

mysql> CREATE TABLE demo18 (
    id INT NOT NULL,
    key1 VARCHAR(100),
    col VARCHAR(100),
    PRIMARY KEY (id),
    KEY idx_key1 (key1)
    )Engine=InnoDB CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.06 sec)

There are 3 columns in this table: the id column is the primary key, we have created a secondary index on the key1 column, and the col column is an ordinary column. As we mentioned when introducing the InnoDB data dictionary, each table is assigned a unique table id. We can look up the table id of a table through the innodb_tables table in the information_schema system database. Now let's check the table id of demo18:

mysql> SELECT * FROM information_schema.innodb_tables WHERE name = 'testdb/demo18';
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
| TABLE_ID | NAME          | FLAG | N_COLS | SPACE | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE | INSTANT_COLS | TOTAL_ROW_VERSIONS |
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
|     1128 | testdb/demo18 |   33 |      6 |    66 | Dynamic    |             0 | Single     |            0 |                  0 |
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
1 row in set (0.00 sec)

As can be seen from the query results, the table id of the demo18 table is 1128. Remember this value; we will use it later.

3.1 The undo log corresponding to the INSERT operation

As we said before, inserting a record into a table is either an optimistic insert or a pessimistic insert, but either way the final result is that the record is placed in a data page. To roll back the insert operation, we just delete this record, which means that the corresponding undo log mainly needs to record the primary key information of the record. For this purpose InnoDB designed an undo log of type TRX_UNDO_INSERT_REC, whose complete structure is shown in the following figure:

[Figure: structure of an undo log of type TRX_UNDO_INSERT_REC]

Based on the figure, we emphasize a few points:

  • Within a transaction, undo no increases from 0; that is, as long as the transaction has not committed, each undo log generated gets an undo no one greater than the previous one.

  • If the primary key of the record contains only one column, the TRX_UNDO_INSERT_REC undo log only needs to record the storage space size and the actual value of that column. If the primary key contains multiple columns, the storage space size and the actual value of every column need to be recorded (in the figure, len represents the size of the storage space the column occupies and value represents the actual value of the column).

    Tip:
    When we insert a record into a table, we actually insert a record into the clustered index and into every secondary index. However, when recording undo logs, we only need to consider the insert into the clustered index, because clustered index records and secondary index records are in one-to-one correspondence. To roll back the insert, we only need to know the primary key information of the record and then perform the corresponding delete according to it; that delete also removes the corresponding records from all secondary indexes. The undo logs for the DELETE and UPDATE operations discussed later are likewise for clustered index records, and we will not emphasize this again.

Now we insert two records into demo18:

mysql> BEGIN;  # Explicitly start a transaction; assume its transaction id is 100
Query OK, 0 rows affected (0.00 sec)

mysql> # Insert two records
mysql> INSERT INTO demo18(id, key1, col) VALUES (1, 'AWM', '狙击枪'), (2, 'M416', '步枪');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

Because the primary key of the record contains only one column, id, the corresponding undo log only needs to record the length of the storage space occupied by the id column of the inserted record (the id column is of type INT, which occupies 4 bytes) and its actual value. In this example two records are inserted, so two undo logs of type TRX_UNDO_INSERT_REC are generated:

  • In the first undo log, undo no is 0, the length of the storage space occupied by the record's primary key is 4, and the actual value is 1. Draw a schematic like this:
    [Figure: first TRX_UNDO_INSERT_REC undo log, with undo no 0 and primary key <4, 1>]
  • In the second undo log, undo no is 1, the length of the storage space occupied by the record's primary key is 4, and the actual value is 2. Draw a schematic like this:
    [Figure: second TRX_UNDO_INSERT_REC undo log, with undo no 1 and primary key <4, 2>]

Compared with the first undo log, only the undo no and the primary key column information are different.
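As a rough model of the two undo logs just described, the following sketch numbers undo logs per transaction and stores one <len, value> pair per primary key column. The class names are invented, and including table_id as a field is an assumption of this sketch (suggested by the earlier remark that the table id 1128 will be needed later); the real record has more fields than shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple

INT_SIZE = 4  # an INT column occupies 4 bytes


@dataclass
class InsertUndoRecord:
    undo_no: int
    table_id: int                        # assumed field, see the lead-in
    primary_key: List[Tuple[int, int]]   # one (len, value) pair per key column


class ToyTransaction:
    def __init__(self, trx_id):
        self.trx_id = trx_id
        self.undo_logs = []

    def log_insert(self, table_id, pk_value):
        # undo no starts at 0 and grows by 1 with every undo log generated.
        record = InsertUndoRecord(
            undo_no=len(self.undo_logs),
            table_id=table_id,
            primary_key=[(INT_SIZE, pk_value)],
        )
        self.undo_logs.append(record)
        return record


trx = ToyTransaction(trx_id=100)
first = trx.log_insert(table_id=1128, pk_value=1)
second = trx.log_insert(table_id=1128, pk_value=2)
print(first)
print(second)
```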

The meaning of roll_pointer hidden column

It's time to lift the veil on roll_pointer. This field, which takes up 7 bytes, is actually not mysterious at all: it is essentially a pointer to the undo log corresponding to the record. For example, we inserted 2 records into the demo18 table above, and each record has a corresponding undo log. Records are stored in pages of type FIL_PAGE_INDEX (the data pages we have been talking about all along), and undo logs are stored in pages of type FIL_PAGE_UNDO_LOG. The effect is shown in the figure:

[Figure: the roll_pointer of each record in a FIL_PAGE_INDEX page pointing to its undo log in a FIL_PAGE_UNDO_LOG page]

The figure also shows more intuitively that roll_pointer is essentially a pointer to the undo log corresponding to the record. The specific meaning of each of these 7 bytes will be explained in detail after we talk about how pages for storing undo logs are allocated.

3.2 Undo log corresponding to DELETE operation

We know that the records inserted into a page form a singly linked list through the next_record attribute in the record header; we call this the normal record list. As we said when discussing the data page structure, deleted records also form a linked list through the next_record attribute in the record header, but the storage space occupied by the records in this list can be reused, so this list is also called the garbage list. The Page Header section has an attribute called PAGE_FREE, which points to the head node of the garbage list made up of deleted records. To keep the story moving, let's draw a picture first, assuming that the records in some page are currently distributed like this (these are not records of the demo18 table, just an arbitrary example):

[Figure: a page whose normal record list holds 3 records and whose garbage list holds 2 deleted records]

To keep the focus, this simplified diagram shows only the delete_mask flag of each record. As the figure shows, the normal record list contains 3 normal records and the garbage list contains 2 deleted records, whose storage space can be reused. The value of the PAGE_FREE attribute in the Page Header of the page is a pointer to the head node of the garbage list. Suppose we use a DELETE statement to delete the last record in the normal record list. The deletion actually goes through two phases:

  • Phase 1: Only set the delete_mask flag of the record to 1 and leave everything else unchanged (in fact, the values of the record's trx_id and roll_pointer hidden columns are also modified). InnoDB calls this phase delete mark. The process is drawn like this:

    [Figure: the last record in the normal record list with its delete_mask set to 1]

    It can be seen that the delete_mask value of the last record in the normal record list has been set to 1, but the record has not been added to the garbage list. In other words, the record is now in an intermediate state, and it stays in this intermediate state until the transaction containing the delete statement commits.

    Tip:
    Why is there such a strange intermediate state? It mainly exists to implement a feature called MVCC, which we will introduce later.

  • Phase 2: After the transaction containing the delete statement commits, a dedicated thread truly deletes the record. True deletion means removing the record from the normal record list and adding it to the garbage list, and then adjusting some other information of the page, such as the number of user records in the page (PAGE_N_RECS), the position of the last inserted record (PAGE_LAST_INSERT), the pointer to the head node of the garbage list (PAGE_FREE), the number of reusable bytes in the page (PAGE_GARBAGE), and some page directory information. InnoDB calls this phase purge.

    After phase 2 completes, the record is truly deleted, and the storage space it occupied can be reused. It is drawn like this:

    [Figure: the deleted record moved to the head of the garbage list]

    Comparing with the figure, note one more point: when a deleted record is added to the garbage list, it is added at the head node of the list, and the value of the PAGE_FREE attribute is modified accordingly.

    Tip:
    The Page Header of the page has a PAGE_GARBAGE attribute, which records the total number of bytes of reusable storage space in the current page. Whenever a deleted record is added to the garbage list, the storage space occupied by that record is added to the value of PAGE_GARBAGE. PAGE_FREE points to the head node of the garbage list. Whenever a new record is inserted, InnoDB first checks whether the storage space of the deleted record at the head node pointed to by PAGE_FREE is large enough for the new record. If not, it directly allocates new space on the page for the record (yes, you read that right: it does not traverse the whole garbage list looking for a node that fits). If it is large enough, the storage space of that deleted record is reused directly, and PAGE_FREE is pointed to the next deleted record in the garbage list.
    But there is a problem here. If the newly inserted record needs less storage space than the head node of the garbage list occupies, part of the head node's storage space goes unused. This unused part is called fragment space. Will these fragments never be used again? Not quite. Their size is still counted in the PAGE_GARBAGE attribute, and they are not reused until the page is almost full. When the page is almost full and another record is inserted for which the page can no longer allocate a complete record's worth of space, InnoDB first checks whether PAGE_GARBAGE plus the remaining free space can hold the record. If so, InnoDB reorganizes the records in the page: it opens a temporary page, inserts the records of the page into it one by one (inserting in sequence produces no fragments), and then copies the content of the temporary page back to this page, which frees up the fragment space. Obviously, reorganizing the records in a page costs performance.
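The space reuse rules from the tip above can be sketched as follows. This is a toy model that tracks only record sizes; the class name and the returned strings are invented, and reducing PAGE_GARBAGE by the size of the new record (so that the leftover fragment stays counted) is this sketch's reading of the description, not a statement about InnoDB's code.

```python
class ToyPage:
    """Tracks only sizes: enough to show how the garbage list is reused."""

    def __init__(self, free_space):
        self.free_space = free_space  # bytes on the page never used so far
        self.garbage_list = []        # sizes of purged records; index 0 is the head (PAGE_FREE)
        self.page_garbage = 0         # PAGE_GARBAGE: reusable bytes, fragments included

    def purge(self, record_size):
        # A purged record is added at the head of the garbage list.
        self.garbage_list.insert(0, record_size)
        self.page_garbage += record_size

    def insert(self, record_size):
        # Only the head node is examined; the list is never traversed.
        if self.garbage_list and self.garbage_list[0] >= record_size:
            self.garbage_list.pop(0)
            # Any leftover bytes of the head node remain counted as fragments.
            self.page_garbage -= record_size
            return "reuse head"
        if self.free_space >= record_size:
            self.free_space -= record_size
            return "new space"
        if self.free_space + self.page_garbage >= record_size:
            return "reorganize"
        return "no room"


page = ToyPage(free_space=50)
page.purge(40)
page.purge(10)          # garbage list is now [10, 40]
print(page.insert(30))  # the head (10) is too small, so new space is used
```

Note that in the usage lines the 40-byte node further down the list would have fit the 30-byte record, but it is not considered because only the head node is checked.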

From the above description, we can also see that before the transaction containing the delete statement commits, only phase 1, the delete mark phase, takes place (after the commit no rollback is needed, so rollback only has to deal with the effects of phase 1 of the delete operation). For this purpose InnoDB designed an undo log of type TRX_UNDO_DEL_MARK_REC, whose complete structure is shown in the following figure:

[Figure: structure of an undo log of type TRX_UNDO_DEL_MARK_REC]

My god, there are so many attributes in this one! (In fact, most of them have already been introduced above.) Yes, there are a lot, but don't worry: if you can't remember them, don't force yourself; they are all listed here just to make them familiar. Please get past the fear of dense diagrams first, and then look over the attributes of the TRX_UNDO_DEL_MARK_REC undo log above, paying special attention to these points:

  • Before performing the delete mark operation on a record, the old values of the record's trx_id and roll_pointer hidden columns need to be recorded in the corresponding undo log; these are the old trx_id and old roll_pointer attributes shown in the figure. The advantage is that the undo log corresponding to the record before this modification can be found through the old roll_pointer attribute of this undo log. For example, in one transaction we first insert a record and then delete it. The schematic diagram of this process is as follows:

    [Figure: the undo log of the delete mark operation linked through old roll_pointer to the undo log of the INSERT operation]

  • It can be seen from the figure that after the delete mark operation is performed, its undo log and the undo log of the INSERT operation form a linked list. This is very interesting. This linked list is called the version chain. Its use is not yet apparent; after we have covered the undo log of the UPDATE operation, the power of this so-called version chain will gradually become clear.

  • Unlike the undo log of type TRX_UNDO_INSERT_REC, the undo log of type TRX_UNDO_DEL_MARK_REC has an additional section holding the information of each index column. That is, if a column is included in some index, its information is recorded in this section: the position of the column in the record (pos), the storage space the column occupies (len), and the actual value of the column (value). So this section is essentially a list of <pos, len, value> entries. This information is mainly used in phase 2, the true deletion of the intermediate-state record after the transaction commits, that is, in the purge phase. How it is used can be ignored for now.

With the introduction done, let's now delete a record in the transaction above whose transaction id is 100, for example the record whose id is 1:

mysql> DELETE FROM demo18 WHERE id = 1;
Query OK, 1 row affected (0.01 sec)

The structure of the undo log corresponding to this delete mark operation is as follows:

[Figure: the TRX_UNDO_DEL_MARK_REC undo log generated by deleting the record with id 1]

Based on this figure, note the following points:

  • Because this undo log is the 3rd undo log generated in the transaction with id 100, its undo no is 2.

  • When the delete mark operation is performed on the record, the value of the record's trx_id hidden column is 100 (that is, the most recent modification of the record happened in this same transaction), so 100 is filled into the old trx_id attribute. Then the value of the record's roll_pointer hidden column is taken and filled into the old roll_pointer attribute, so that the undo log generated when the record was last changed can be found through the old roll_pointer attribute.

  • The demo18 table has 2 indexes: the clustered index and the secondary index idx_key1. For every column included in an index, the position of the column in the record (pos), the storage space it occupies (len), and its actual value (value) need to be stored in the undo log.

    • The primary key has only one column, id, and the information stored in the undo log is:

      • pos: The id column is the primary key, that is, the first column of the record, so its pos value is 0. pos takes 1 byte to store.

      • len: The type of the id column is INT, which occupies 4 bytes, so the value of len is 4. len takes 1 byte to store.

      • value: The value of the id column in the deleted record is 1, so value is 1. value takes 4 bytes to store.

      • Draw a picture to demonstrate it like this:
        [Figure: the <pos, len, value> entry of the id column]

      • So for the id column, the final stored result is <0, 4, 1>, and storing this information takes 1 + 1 + 4 = 6 bytes.

    • idx_key1 has only one column, key1, and the information stored in the undo log is:

      • pos: The key1 column comes after the id, trx_id, and roll_pointer columns, so its pos value is 3. pos takes 1 byte to store.

      • len: The type of the key1 column is VARCHAR(100) with the utf8 character set. The content actually stored in the deleted record is AWM, which occupies 3 bytes, so the value of len is 3. len takes 1 byte to store.

      • value: The value of the key1 column in the deleted record is AWM, so value is AWM. value takes 3 bytes to store.

      • Draw a picture to demonstrate it like this:
        [Figure: the <pos, len, value> entry of the key1 column]

      • So for the key1 column, the final stored result is <3, 3, 'AWM'>, and storing this information takes 1 + 1 + 3 = 5 bytes.

    As can be seen from the above, <0, 4, 1> and <3, 3, 'AWM'> occupy 11 bytes in total. The index_col_info len attribute itself occupies 2 bytes, so the total is 13 bytes, and the number 13 is filled into the index_col_info len attribute.
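The byte arithmetic above can be checked with a short encoder. It follows the sizes given in the text (1 byte for pos, 1 byte for len, 2 bytes for index_col_info len); the function name is invented, and the big-endian byte order and the plain 4-byte encoding of the INT value are simplifying assumptions of this sketch.

```python
def encode_index_col_info(columns):
    """Encode a list of (pos, value_bytes) pairs as described in the text."""
    body = b""
    for pos, value in columns:
        # 1 byte for pos, 1 byte for len, then the value itself.
        body += bytes([pos, len(value)]) + value
    # index_col_info len counts its own 2 bytes plus all <pos, len, value> entries.
    total_len = 2 + len(body)
    return total_len.to_bytes(2, "big") + body


id_value = (1).to_bytes(4, "big")      # INT column, 4 bytes
key1_value = "AWM".encode("utf-8")     # 3 bytes

info = encode_index_col_info([(0, id_value), (3, key1_value)])
print(len(info))                        # total size of the section in bytes
print(int.from_bytes(info[:2], "big"))  # value stored in index_col_info len
```

Both printed numbers come out as 13, matching the 6 + 5 + 2 bytes worked out above.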

3.3 Undo log corresponding to UPDATE operation

When executing an UPDATE statement, InnoDB handles the two cases of updating the primary key and not updating the primary key completely differently.

3.3.1 The case where the primary key is not updated

When the primary key is not updated, there are two sub-cases: the storage space occupied by the updated columns either stays the same or changes.

In-place update

When updating a record, if for every updated column the storage space occupied after the update is the same as before the update, an in-place update can be performed: the values of the corresponding columns are modified directly in the original record. To repeat, the storage space of every updated column must be the same before and after the update. If any updated column occupies more or less storage space after the update than before, an in-place update is not possible. For example, the demo18 table has a record whose id value is 2, and the sizes of its columns are shown in the figure (because the utf8 character set is used, the two characters of '步枪' (rifle) occupy 6 bytes):

[Figure: storage space occupied by each column of the record with id 2]

Suppose we have an UPDATE statement like this:

UPDATE demo18 SET key1 = 'P92', col = '手枪' WHERE id = 2;

In this UPDATE statement, the col column is updated from '步枪' (rifle) to '手枪' (pistol), occupying 6 bytes both before and after, so its storage space does not change. The key1 column, however, is updated from M416 to P92, that is, from 4 bytes to 3 bytes, which does not meet the condition for an in-place update, so an in-place update cannot be performed. But if the UPDATE statement looks like this:

UPDATE demo18 SET key1 = 'M249', col = '机枪'  WHERE id = 2;

Since the storage space occupied by each updated column is the same before and after the update, such a statement can perform an in-place update.
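The in-place condition can be checked mechanically by comparing encoded lengths. This is a minimal sketch that handles only string columns and assumes the values are stored as their UTF-8 encoding; the function name is invented for the example.

```python
def can_update_in_place(old_row, new_values, charset="utf-8"):
    """True if every updated column keeps exactly the same storage size."""
    for column, new_value in new_values.items():
        old_size = len(old_row[column].encode(charset))
        new_size = len(new_value.encode(charset))
        if old_size != new_size:
            return False
    return True


row = {"key1": "M416", "col": "步枪"}

# key1 shrinks from 4 bytes to 3 bytes, so this update cannot be done in place.
print(can_update_in_place(row, {"key1": "P92", "col": "手枪"}))
# Every updated column keeps its size, so this update can be done in place.
print(can_update_in_place(row, {"key1": "M249", "col": "机枪"}))
```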

Delete old records first, then insert new ones

When the primary key is not updated, if the storage space occupied by any updated column differs before and after the update, the old record must first be deleted from the clustered index page, and then a new record is created from the updated column values and inserted into the page.

Please note that the deletion here is not a delete mark operation but a true deletion: the record is removed from the normal record list and added to the garbage list, and the corresponding statistics of the page (such as PAGE_FREE and PAGE_GARBAGE) are modified. However, the true deletion here is not done by the dedicated thread that performs purge for DELETE statements, as discussed above; it is performed synchronously by the user thread. Right after the true deletion, the new record created from the updated column values is inserted.

If the storage space needed by the new record does not exceed the space occupied by the old record, the storage space of the old record that was just added to the garbage list can be reused directly. Otherwise, a new piece of space has to be allocated in the page for the new record. If the page has no space available, a page split is required before the new record is inserted.

For an UPDATE that does not update the primary key (covering both the in-place update and the delete-then-insert approach above), InnoDB designed an undo log of type TRX_UNDO_UPD_EXIST_REC, whose complete structure is as follows:

[Figure: structure of an undo log of type TRX_UNDO_UPD_EXIST_REC]

Most of the attributes are similar to those of the TRX_UNDO_DEL_MARK_REC undo log we have already introduced, but we still need to pay attention to the following points:

  • The n_updated attribute indicates how many columns this UPDATE statement updates. The <pos, old_len, old_value> entries that follow give, for each updated column, its position in the record, the storage space it occupied before the update, and its actual value before the update.
  • If the columns updated by the UPDATE statement include an index column, the section with the information of each index column is also added; otherwise this section is not added.

Now let's continue by updating a record in the transaction above whose transaction id is 100, for example the record whose id is 2:

BEGIN;  # Explicitly start a transaction; assume its transaction id is 100
# Insert two records
INSERT INTO demo18(id, key1, col) VALUES (1, 'AWM', '狙击枪'), (2, 'M416', '步枪');

# Delete one record
DELETE FROM demo18 WHERE id = 1;
# Update one record
UPDATE demo18 SET key1 = 'M249', col = '机枪' WHERE id = 2;

The sizes of the columns updated by this UPDATE statement do not change, so it can be executed as an in-place update. Before the page record is actually changed, an undo log of type TRX_UNDO_UPD_EXIST_REC is recorded first, which looks like this:

[Figure: the TRX_UNDO_UPD_EXIST_REC undo log generated by updating the record with id 2]

With this figure, note the following points:

  • Because this undo log is the 4th undo log generated in the transaction whose id is 100, its undo no is 3.
  • The roll_pointer recorded in this undo log points to the undo log whose undo no is 1, that is, the undo log generated when the record with primary key value 2 was inserted, which is the undo log generated the last time the record was changed.
  • Since this UPDATE statement updates the value of the index column key1, the information of each index column needs to be recorded, that is, the pre-update information of the primary key and the key1 column is filled in.
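The chains formed by the four undo logs of transaction 100 can be sketched with plain object references in place of the 7-byte roll_pointer. The class and function names are invented for this illustration; only the undo log type names and undo no values come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UndoRec:
    undo_no: int
    kind: str
    old_roll_pointer: Optional["UndoRec"] = None  # previous undo log of the same record


@dataclass
class Row:
    pk: int
    roll_pointer: Optional[UndoRec] = None  # newest undo log of this record


def log_change(row, undo_no, kind):
    # The new undo log remembers the row's previous roll_pointer,
    # and the row is then pointed at the new undo log.
    undo = UndoRec(undo_no, kind, old_roll_pointer=row.roll_pointer)
    row.roll_pointer = undo
    return undo


def version_chain(row):
    chain = []
    undo = row.roll_pointer
    while undo is not None:
        chain.append((undo.undo_no, undo.kind))
        undo = undo.old_roll_pointer
    return chain


row1, row2 = Row(pk=1), Row(pk=2)
log_change(row1, 0, "TRX_UNDO_INSERT_REC")     # INSERT id = 1
log_change(row2, 1, "TRX_UNDO_INSERT_REC")     # INSERT id = 2
log_change(row1, 2, "TRX_UNDO_DEL_MARK_REC")   # DELETE id = 1
log_change(row2, 3, "TRX_UNDO_UPD_EXIST_REC")  # UPDATE id = 2
print(version_chain(row1))
print(version_chain(row2))
```

Walking from a record's roll_pointer therefore visits its changes from newest to oldest, which is the shape of the version chain mentioned earlier.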

3.3.2 The case of updating the primary key

In the clustered index, records are linked into a singly linked list ordered by primary key value. If we update the primary key value of a record, the position of that record in the clustered index changes. For example, if you update a record's primary key value from 1 to 10000 and many records have primary key values between 1 and 10000, the old and new positions may be very far apart in the clustered index, possibly separated by many pages. When an UPDATE statement updates a record's primary key value, InnoDB processes the clustered index in two steps:

  • Perform a delete mark operation on the old record.

    Important: this is a delete mark operation! That is, before the transaction containing the UPDATE statement commits, only a delete mark operation is performed on the old record. After the transaction commits, a dedicated thread performs the purge operation and adds the record to the garbage list. This must be distinguished from the case described above in which the primary key value is not updated and the old record is actually deleted first and the new record is then inserted!

    Tip:
    The old record is only delete-marked because other transactions may be accessing it at the same time. If it were actually deleted and added to the garbage list, other transactions could no longer access it. This capability is the so-called MVCC, which we will discuss in detail in later chapters.

  • Create a new record from the updated column values and insert it into the clustered index (the insert position must be located again).

    Since the primary key value of the updated record has changed, the record's position in the clustered index must be located again before it is inserted.

When an UPDATE statement updates a record's primary key value, an undo log of type TRX_UNDO_DEL_MARK_REC is recorded before the delete mark operation on the old record, and an undo log of type TRX_UNDO_INSERT_REC is recorded later when the new record is inserted. In other words, each change to the primary key value of a record produces two undo logs. We have already covered the format of these logs above, so we won't repeat it here.

4. General Linked List Structure

Several linked lists are used in the process of writing undo logs, and many of them share the same node structure, as shown in the figure:

insert image description here
Within a given table space, we can uniquely locate a node by the page number of a page and an offset within that page. Together, these two pieces of information act as a pointer to the node. So:

  • The combination of Pre Node Page Number and Pre Node Offset is a pointer to the previous node.
  • The combination of Next Node Page Number and Next Node Offset is a pointer to the next node.

The whole List Node takes up 12 bytes of storage space. To manage linked lists better, InnoDB also defines a base node structure, which stores the head node, the tail node, and the length of the linked list. The structure of the base node is as follows:

insert image description here
Where:

  • List Length indicates how many nodes the linked list contains.
  • The combination of First Node Page Number and First Node Offset is a pointer to the head node of the linked list.
  • The combination of Last Node Page Number and Last Node Offset is a pointer to the tail node of the linked list.

The whole List Base Node takes up 16 bytes of storage space. A linked list built from these two structures, List Base Node and List Node, looks like this:

insert image description here
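
The two structures above can be sketched in a few lines of Python. This is only an illustration: the individual field widths (a 4-byte page number plus a 2-byte offset per pointer, and a 4-byte list length) are assumptions chosen to add up to the 12-byte and 16-byte totals stated above, and the helper names are hypothetical.

```python
import struct

# A node "pointer" is (page number, offset within the page).
# Assumed widths: 4-byte page number + 2-byte offset = 6 bytes per pointer.
LIST_NODE = struct.Struct(">IHIH")        # prev page, prev offset, next page, next offset
LIST_BASE_NODE = struct.Struct(">IIHIH")  # length, first page, first offset, last page, last offset


def pack_list_node(prev, nxt):
    """Serialize a List Node from two (page_number, offset) pointers."""
    return LIST_NODE.pack(prev[0], prev[1], nxt[0], nxt[1])


def pack_base_node(length, first, last):
    """Serialize a List Base Node: list length plus head and tail pointers."""
    return LIST_BASE_NODE.pack(length, first[0], first[1], last[0], last[1])


# Made-up page numbers and offsets, purely for illustration.
node = pack_list_node(prev=(3, 56), nxt=(7, 56))
base = pack_base_node(length=2, first=(3, 56), last=(7, 56))
print(len(node), len(base))  # 12 16
```

The point of the sketch is only that a (page number, offset) pair is enough to act as a pointer inside one table space, so no in-memory addresses need to be stored on disk.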

5. FIL_PAGE_UNDO_LOG page

When we discussed table spaces earlier, we said that a table space is made up of many pages, each 16KB by default. These pages come in different types. For example, pages of type FIL_PAGE_INDEX store clustered and secondary indexes, pages of type FIL_PAGE_TYPE_FSP_HDR store table space header information, and there are various other types. One of them, FIL_PAGE_UNDO_LOG, is dedicated to storing undo logs. The general structure of this type of page is shown in the following figure (taking the default 16KB size as an example):

insert image description here
A page of type FIL_PAGE_UNDO_LOG is simply called an Undo page. The File Header and File Trailer in the picture above are structures common to all page types. We have covered them many times, so we won't repeat them here. The Undo Page Header is unique to Undo pages, so let's look at its structure:

insert image description here
The meaning of each attribute is as follows:

  • TRX_UNDO_PAGE_TYPE: What kind of undo logs are going to be stored on this page.
    We introduced several types of undo logs earlier, which can be divided into two categories:

    • TRX_UNDO_INSERT (represented by decimal 1): undo logs of type TRX_UNDO_INSERT_REC belong to this category. They are generally generated by INSERT statements, and are also generated when an UPDATE statement updates the primary key.

    • TRX_UNDO_UPDATE (represented by decimal 2): all undo log types other than TRX_UNDO_INSERT_REC belong to this category, such as the TRX_UNDO_DEL_MARK_REC and TRX_UNDO_UPD_EXIST_REC types mentioned earlier. Undo logs generated by DELETE and UPDATE statements generally belong to this category.

    The TRX_UNDO_PAGE_TYPE attribute can take only the two values above, and marks which category of undo logs the page stores. Undo logs of different categories cannot be stored together. For example, if the TRX_UNDO_PAGE_TYPE value of an Undo page is TRX_UNDO_INSERT, the page can only store undo logs of type TRX_UNDO_INSERT_REC, and other types of undo logs cannot be placed on it.

    Tip:
    Undo logs are divided into two categories because undo logs of type TRX_UNDO_INSERT_REC can be deleted directly after the transaction commits, while the other types also have to serve the so-called MVCC and cannot be deleted directly, so the two must be handled differently. If this passage is confusing, there is no need to reread it. For now it is enough to know that undo logs fall into two categories. We will explain the details later.

  • TRX_UNDO_PAGE_START: Indicates where undo logs start being stored in the current page, that is, the starting offset of the first undo log in this page.

  • TRX_UNDO_PAGE_FREE: The counterpart of TRX_UNDO_PAGE_START above. It indicates the offset at which the last undo log stored in the current page ends, that is, the position from which new undo logs can continue to be written.

    Suppose 3 undo logs have now been written to the page. TRX_UNDO_PAGE_START and TRX_UNDO_PAGE_FREE then look like this:
    insert image description here
    Of course, when no undo log has been written yet, TRX_UNDO_PAGE_START and TRX_UNDO_PAGE_FREE have the same value.

  • TRX_UNDO_PAGE_NODE: Represents a List Node structure (the ordinary linked list node we just described above). This attribute will be used shortly.
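
How TRX_UNDO_PAGE_START and TRX_UNDO_PAGE_FREE move as undo logs are written can be modelled with a small Python sketch. The class name, the starting offset, and the log sizes are all made up for illustration; only the category values 1 and 2 and the behaviour of the two offsets come from the description above.

```python
from dataclasses import dataclass

TRX_UNDO_INSERT = 1  # page stores TRX_UNDO_INSERT_REC logs only
TRX_UNDO_UPDATE = 2  # page stores every other undo log type


@dataclass
class UndoPage:
    """Toy model of the Undo Page Header fields (hypothetical class)."""
    page_type: int
    start: int  # TRX_UNDO_PAGE_START
    free: int   # TRX_UNDO_PAGE_FREE

    @classmethod
    def empty(cls, page_type, first_log_offset):
        # With no undo log written yet, START and FREE are equal.
        return cls(page_type, first_log_offset, first_log_offset)

    def append(self, log_category, log_size):
        """Write one undo log at FREE and advance FREE past it."""
        if log_category != self.page_type:
            raise ValueError("undo logs of different categories cannot share a page")
        offset = self.free
        self.free += log_size
        return offset


page = UndoPage.empty(TRX_UNDO_INSERT, first_log_offset=100)  # 100 is a made-up offset
for size in (20, 25, 30):  # three undo logs of made-up sizes
    page.append(TRX_UNDO_INSERT, size)
print(page.start, page.free)  # 100 175
```

START stays where the first undo log begins, while FREE always points just past the last log written.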

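6. Undo page linked list

6.1 Undo page linked list in a single transaction

A transaction may contain multiple statements, a single statement may change several records, and 1 or 2 undo logs must be recorded before each record is changed. A transaction may therefore generate many undo logs during execution. These logs may not fit in one page and must be spread over multiple pages, which are linked into a list through the TRX_UNDO_PAGE_NODE attribute introduced above:

insert image description here
Take a look at the picture above. The first Undo page in the linked list is generally called the first undo page, because besides the Undo Page Header it also records some other management information. The remaining Undo pages are called normal undo pages.

A transaction may mix INSERT, DELETE, and UPDATE statements during execution, which means that different types of undo logs are generated. But as we said earlier, an Undo page stores either only undo logs of the TRX_UNDO_INSERT category or only undo logs of the TRX_UNDO_UPDATE category, never a mix. A transaction may therefore need 2 Undo page linked lists during execution, one called the insert undo linked list and the other called the update undo linked list. A schematic looks like this: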

insert image description here
In addition, InnoDB stipulates that the undo logs generated by changes to records of ordinary tables and of temporary tables are recorded separately (explained later), so a transaction has at most 4 linked lists made up of Undo pages:
insert image description here
Of course, these 4 linked lists are not all allocated when the transaction starts. They are allocated on demand, according to the following strategy:

  • When the transaction has just started, no Undo page linked list is allocated.
  • When the transaction inserts a record into an ordinary table or updates the primary key of a record in an ordinary table, an insert undo linked list for ordinary tables is allocated to it.
  • When the transaction deletes or updates records in an ordinary table, an update undo linked list for ordinary tables is allocated to it.
  • When the transaction inserts a record into a temporary table or updates the primary key of a record in a temporary table, an insert undo linked list for temporary tables is allocated to it.
  • When the transaction deletes or updates records in a temporary table, an update undo linked list for temporary tables is allocated to it.
    In short: a list is allocated only when it is needed, and not otherwise.

6.2 Undo page linked list in multiple transactions

To make writing undo logs as efficient as possible, the undo logs generated by different transactions are written to different Undo page linked lists. For example, suppose there are two transactions with transaction ids 1 and 2, which we call trx 1 and trx 2, and that during their execution:

  • trx 1 performs a DELETE operation on an ordinary table and INSERT and UPDATE operations on a temporary table.
    InnoDB allocates 3 linked lists for trx 1:
    • An update undo linked list for ordinary tables.
    • An insert undo linked list for temporary tables.
    • An update undo linked list for temporary tables.
  • trx 2 performs INSERT, UPDATE, and DELETE operations on ordinary tables, but makes no changes to temporary tables.
    InnoDB allocates 2 linked lists for trx 2:
    • An insert undo linked list for ordinary tables.
    • An update undo linked list for ordinary tables.

To sum up, while trx 1 and trx 2 execute, InnoDB needs to allocate a total of 5 Undo page linked lists for these two transactions. Drawn as a picture:

insert image description here
If there are more transactions, it means that more Undo page linked lists may be generated.
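
The on-demand allocation rule, applied to the trx 1 and trx 2 example, can be sketched as follows. This is a toy model, not InnoDB code: the Transaction class and its change method are hypothetical names, and a set of (table kind, category) pairs stands in for the real linked lists.

```python
class Transaction:
    """Tracks which Undo page linked lists a transaction has been given."""

    def __init__(self, trx_id):
        self.trx_id = trx_id
        self.undo_lists = set()  # nothing is allocated when the transaction starts

    def change(self, statement, temporary=False, updates_primary_key=False):
        table = "temporary" if temporary else "ordinary"
        needed = set()
        if statement == "INSERT":
            needed.add("insert undo")
        elif statement == "DELETE":
            needed.add("update undo")
        elif statement == "UPDATE":
            needed.add("update undo")
            if updates_primary_key:
                # delete mark on the old record plus insert of the new record
                needed.add("insert undo")
        else:
            raise ValueError(statement)
        for category in needed:
            self.undo_lists.add((table, category))  # allocated on first use only


trx1 = Transaction(1)
trx1.change("DELETE")
trx1.change("INSERT", temporary=True)
trx1.change("UPDATE", temporary=True)

trx2 = Transaction(2)
for stmt in ("INSERT", "UPDATE", "DELETE"):
    trx2.change(stmt)

print(len(trx1.undo_lists), len(trx2.undo_lists))  # 3 2
print(len(trx1.undo_lists) + len(trx2.undo_lists))  # 5
```

Because the lists are keyed by table kind and category, a transaction can never hold more than 4 of them, however many statements it runs.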

7. The specific writing process of the undo log

7.1 The concept of segment (Segment)

If you read the chapter on table spaces carefully, the concept of a segment should be familiar, since we spent a lot of space on it. Simply put, a segment is a logical concept, essentially made up of some scattered pages and some complete extents. For example, a B+ tree index is divided into two segments, a leaf node segment and a non-leaf node segment, so that leaf nodes are stored together as far as possible and non-leaf nodes are stored together as far as possible. Each segment corresponds to an INODE Entry structure, which describes the segment: its ID, the base nodes of the various linked lists in the segment, the page numbers of its scattered pages, and so on (see the chapter on table spaces for the meaning of each attribute in this structure). We also said before that, in order to locate an INODE Entry, InnoDB defines a Segment Header structure:

insert image description here
The whole Segment Header occupies 10 bytes, and the meaning of each attribute is as follows:

  • Space ID of the INODE Entry: ID of the table space where the INODE Entry structure is located.

  • Page Number of the INODE Entry: The page number of the INODE Entry structure.

  • Byte Offset of the INODE Ent: The offset of the INODE Entry structure in this page

Knowing the table space ID, the page number, and the offset within the page, we can uniquely locate an INODE Entry.

Tip:
The concepts around segments are explained in detail in the chapter on table spaces. They are mentioned here only to refresh your memory. If anything is unclear, go back and reread that chapter carefully.
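
As a small illustration of how the 10-byte Segment Header acts as a pointer to an INODE Entry, here is a Python sketch. The field widths (4-byte space ID, 4-byte page number, 2-byte offset) are an assumption that matches the 10-byte total, and the values packed below are made up.

```python
import struct

# Assumed layout: 4-byte space ID, 4-byte page number, 2-byte offset = 10 bytes.
SEGMENT_HEADER = struct.Struct(">IIH")


def parse_segment_header(raw):
    """Decode the three fields that together locate an INODE Entry."""
    space_id, page_no, offset = SEGMENT_HEADER.unpack(raw)
    return {"space_id": space_id, "page_no": page_no, "offset": offset}


raw = SEGMENT_HEADER.pack(0, 2, 50)  # made-up values
print(SEGMENT_HEADER.size)           # 10
print(parse_segment_header(raw))
```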

7.2 Undo Log Segment Header

InnoDB stipulates that each Undo page linked list corresponds to a segment, called an Undo Log Segment. That is, the pages in the linked list are all requested from this segment. For this reason, the first page of an Undo page linked list, the first undo page mentioned above, contains a part called the Undo Log Segment Header. This part holds the segment header information of the segment corresponding to the linked list, along with some other information about the segment. So the first page of an Undo page linked list actually looks like this:

insert image description here
You can see that the first page of an Undo page linked list has one more part than a normal Undo page: the Undo Log Segment Header. Let's look at its structure:

insert image description here
The meaning of each attribute is as follows:

  • TRX_UNDO_STATE: The state of this Undo page linked list. An Undo Log Segment can be in one of the following states:

    • TRX_UNDO_ACTIVE: Active state, that is, an active transaction is writing undo logs to this segment.

    • TRX_UNDO_CACHED: The cached state. The Undo page linked list in this state is waiting to be reused by other transactions.

    • TRX_UNDO_TO_FREE: For the insert undo linked list, if the linked list cannot be reused after its corresponding transaction commits, it will be in this state.

    • TRX_UNDO_TO_PURGE: For the update undo linked list, if the linked list cannot be reused after its corresponding transaction commits, it will be in this state.

    • TRX_UNDO_PREPARED: Contains undo logs generated by transactions in the PREPARE phase

    Tip:
    When and how an Undo page linked list is reused will be discussed in detail later. The PREPARE phase of a transaction only appears in so-called distributed transactions. This book does not cover distributed transactions further, so you can ignore this state for now.

  • TRX_UNDO_LAST_LOG: The position of the last Undo Log Header in this Undo page linked list.

  • TRX_UNDO_FSEG_HEADER: The Segment Header information of the segment corresponding to this Undo page linked list (that is, the 10-byte structure introduced in the previous section, through which the segment's INODE Entry can be found).

  • TRX_UNDO_PAGE_LIST: The base node of the Undo page linked list.

    We said above that the Undo Page Header of an Undo page has a 12-byte TRX_UNDO_PAGE_NODE attribute, which represents a List Node structure. Every Undo page contains an Undo Page Header, and these pages can be linked into a list through this attribute. The TRX_UNDO_PAGE_LIST attribute represents the base node of this linked list. Of course, the base node exists only in the first page of the Undo page linked list, that is, in the first undo page.

7.3 Undo Log Header

The way a transaction writes undo logs to an Undo page is simple and blunt: it writes them straight in, one immediately after another, with no gaps between them. When an Undo page is full, the transaction requests a new page from the segment, inserts it into the Undo page linked list, and continues writing in the new page. InnoDB treats the undo logs that one transaction writes to one Undo page linked list as a group. For example, trx 1 introduced above is allocated 3 Undo page linked lists, so it writes 3 groups of undo logs, and trx 2 is allocated 2 Undo page linked lists, so it writes 2 groups of undo logs. Whenever a group of undo logs is written, some attributes about the group are recorded in front of it. InnoDB calls the place where these attributes are stored the Undo Log Header. So before any undo log is actually written to the first page of an Undo page linked list, that page is filled with three parts: the Undo Page Header, the Undo Log Segment Header, and the Undo Log Header, as shown in the figure:

insert image description here
The specific structure of the Undo Log Header is as follows:

insert image description here
There are a lot of attributes again, let's take a look at what they all mean:

  • TRX_UNDO_TRX_ID: The id of the transaction that generated this group of undo logs.

  • TRX_UNDO_TRX_NO: A sequence number generated when the transaction commits, used to mark the commit order of transactions (transactions that commit earlier get smaller numbers, and those that commit later get larger ones).

  • TRX_UNDO_DEL_MARKS: Marks whether this group of undo logs contains undo logs generated by delete mark operations.

  • TRX_UNDO_LOG_START: Indicates the page offset of the first undo log in this group of undo logs.

  • TRX_UNDO_XID_EXISTS: Whether this group of undo logs contains XID information.

  • TRX_UNDO_DICT_TRANS: Mark whether this group of undo logs is generated by DDL statements.

  • TRX_UNDO_TABLE_ID: If TRX_UNDO_DICT_TRANS is true, then this attribute indicates the table id of the table operated by the DDL statement.

  • TRX_UNDO_NEXT_LOG: The offset in the page where the next set of undo logs starts.

  • TRX_UNDO_PREV_LOG: The offset in the page where the undo logs of the previous group start.

    Tip:
    Generally speaking, an Undo page linked list stores only one group of undo logs, generated during the execution of one transaction. In some cases, however, after a transaction commits, a transaction started later may reuse the Undo page linked list, so that one Undo page may store several groups of undo logs. TRX_UNDO_NEXT_LOG and TRX_UNDO_PREV_LOG mark the offsets within the page of the next group and the previous group of undo logs. When and how an Undo page linked list is reused will be explained in detail later. For now, just understand the meaning of these two attributes.

  • TRX_UNDO_HISTORY_NODE: A 12-byte List Node structure, representing a node in a linked list called the History list.

7.4 Summary

For an Undo page linked list that has not been reused, the first page of the list, the first undo page, is filled with three parts (Undo Page Header, Undo Log Segment Header, and Undo Log Header) before any undo log is written, and only then are undo logs written to it. The other pages, the normal undo pages, are filled with only the Undo Page Header before undo logs are written. The List Base Node of the linked list is stored in the Undo Log Segment Header of the first undo page, and the List Node information is stored in the Undo Page Header of every Undo page. A schematic of an Undo page linked list therefore looks like this:

insert image description here
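
The writing process summarized above (headers first, undo logs written back to back, a new page linked in when the current one is full) can be modelled roughly in Python. Every size constant below is an assumption made for illustration and is not taken from this article, and UndoPageList is a hypothetical class, not InnoDB's implementation.

```python
PAGE_SIZE = 16 * 1024

# Assumed sizes in bytes, for illustration only.
FILE_HEADER = 38
FILE_TRAILER = 8
UNDO_PAGE_HEADER = 18
UNDO_LOG_SEGMENT_HEADER = 30
UNDO_LOG_HEADER = 186


class UndoPageList:
    """Toy model of one Undo page linked list holding one group of undo logs."""

    def __init__(self):
        # first undo page: all three headers are filled in before any undo log
        first_free = (FILE_HEADER + UNDO_PAGE_HEADER
                      + UNDO_LOG_SEGMENT_HEADER + UNDO_LOG_HEADER)
        self.pages = [{"kind": "first undo page", "free": first_free, "logs": 0}]

    def write(self, log_size):
        page = self.pages[-1]
        if page["free"] + log_size > PAGE_SIZE - FILE_TRAILER:
            # Page is full: take a new page from the segment and link it in.
            # A normal undo page only carries the Undo Page Header.
            page = {"kind": "normal undo page",
                    "free": FILE_HEADER + UNDO_PAGE_HEADER, "logs": 0}
            self.pages.append(page)
        page["free"] += log_size  # logs are written back to back
        page["logs"] += 1


chain = UndoPageList()
for _ in range(400):
    chain.write(100)  # 400 undo logs of a made-up 100 bytes each
print([(p["kind"], p["logs"]) for p in chain.pages])
```

With these assumed sizes the first undo page holds slightly fewer logs than a normal undo page, because it also carries the Undo Log Segment Header and the Undo Log Header.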

8. Reuse the Undo page

We said earlier that, to improve the performance of concurrent transactions writing undo logs, InnoDB allocates separate Undo page linked lists to each transaction (up to 4 per transaction). But this causes a problem. Most transactions modify only one or a few records during execution, so a given Undo page linked list receives very few undo logs, which occupy only a little storage space. Wouldn't it be wasteful to create a new Undo page linked list (even one with a single page) every time a transaction starts, just to store such a tiny amount of undo logs? It is indeed wasteful, so InnoDB reuses a transaction's Undo page linked list after the transaction commits, in some cases. The conditions under which an Undo page linked list can be reused are simple:

  • The linked list contains only one Undo page.
    If a transaction generates a very large number of undo logs during execution, it may request many pages and add them to the Undo page linked list. If the whole list were reused after the transaction commits, then even if the new transaction wrote only a few undo logs to it, the list would still have to maintain many pages. Pages that are not needed could not be used by other transactions, which creates another kind of waste. InnoDB therefore stipulates that an Undo page linked list can be reused by the next transaction only if it contains exactly one Undo page.

  • The space already used in that Undo page is less than 3/4 of the whole page.
    As we said earlier, Undo page linked lists are divided into insert undo linked lists and update undo linked lists according to the category of undo logs they store. The two kinds of list are also reused in different ways. Let's look at each in turn.

    • insert undo linked list

      An insert undo linked list stores only undo logs of type TRX_UNDO_INSERT_REC. This type of undo log is useless after the transaction commits and can be cleared. So when the insert undo linked list of a committed transaction is reused (the list has only one page), the group of undo logs written by the previous transaction can simply be overwritten, and the group of undo logs for the new transaction is written from the beginning, as shown in the figure below:

      insert image description here
      As shown in the figure, suppose a transaction uses an insert undo linked list, and by the time it commits it has written only 3 undo logs to the list, which has requested only one Undo page. If the space used in that page at this moment is less than 3/4 of the page size, the next transaction can reuse this insert undo linked list (which has only one page). When a new transaction reuses the insert undo linked list, the old group of undo logs is simply overwritten and a new group of undo logs is written.

    • update undo linked list
      After a transaction commits, the undo logs in its update undo linked list cannot be deleted immediately (these logs are used for MVCC, which we will talk about later). So if a later transaction wants to reuse the update undo linked list, it cannot overwrite the undo logs written by the previous transaction. The result is that several groups of undo logs are written in the same Undo page, which looks like this:
      insert image description here
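
The two reuse conditions and the different treatment of the two list categories can be summarized in a short Python sketch. The function names are hypothetical and the "groups" are plain strings standing in for groups of undo logs. Only the rules themselves come from the text above.

```python
PAGE_SIZE = 16 * 1024


def can_reuse(page_count, used_bytes, page_size=PAGE_SIZE):
    """Both conditions must hold: a single page, with less than 3/4 of it used."""
    return page_count == 1 and used_bytes * 4 < page_size * 3


def reuse(category, existing_groups, new_group):
    """Return the groups of undo logs held by the page after it is reused."""
    if category == "insert undo":
        # The old group is useless once its transaction commits: overwrite it.
        return [new_group]
    if category == "update undo":
        # Old groups may still serve MVCC: keep them and append the new group.
        return existing_groups + [new_group]
    raise ValueError(category)


print(can_reuse(page_count=1, used_bytes=1000))   # True
print(can_reuse(page_count=2, used_bytes=1000))   # False
print(can_reuse(page_count=1, used_bytes=12288))  # False: exactly 3/4 is not "less than"
print(reuse("insert undo", ["old group"], "new group"))  # ['new group']
print(reuse("update undo", ["old group"], "new group"))  # ['old group', 'new group']
```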

9. Rollback segment

9.1 The concept of rollback segment

We now know that a transaction can be allocated up to 4 Undo page linked lists during execution, and that different transactions hold different Undo page linked lists at the same moment, so many Undo page linked lists can exist in the system at the same time. To manage these lists better, InnoDB defines a page called the Rollback Segment Header, which stores the page number of the first undo page of each Undo page linked list. These page numbers are called undo slots. We can think of it this way: each Undo page linked list is like a class, and the first undo page of the list is like the class monitor. Once you find the monitor, you can find the other students in the class (who correspond to the normal undo pages). Sometimes the school needs to pass a message to the classes and calls all the monitors to a conference room. The Rollback Segment Header is that conference room.

Let's take a look at what this Rollback Segment Header page looks like (taking the default 16KB as an example):

insert image description here
InnoDB stipulates that each Rollback Segment Header page corresponds to a segment, called a Rollback Segment, that is, a rollback segment. Unlike the segments introduced before, a Rollback Segment actually contains only one page (perhaps because the InnoDB designers felt that allocating pages for any purpose requires requesting a segment first, or because, although the Rollback Segment in the current version of MySQL has only one page, later versions might add more).

Now that we understand what a Rollback Segment is, let's look at the meaning of each part of the Rollback Segment Header page:

  • TRX_RSEG_MAX_SIZE: The maximum total number of Undo pages across all Undo page linked lists managed by this Rollback Segment. In other words, the sum of the numbers of Undo pages in all Undo page linked lists of this Rollback Segment cannot exceed the value of TRX_RSEG_MAX_SIZE.

    The value of this property is infinite by default, that is, we can write as many Undo pages as we want.

    Tip:
    Infinity is actually just an exaggeration. The largest number that can be represented by 4 bytes is 0xFFFFFFFF, but we will see later that the number 0xFFFFFFFF has a special purpose, so the actual value of TRX_RSEG_MAX_SIZE is 0xFFFFFFFE.

  • TRX_RSEG_HISTORY_SIZE: The number of pages occupied by the History linked list.

  • TRX_RSEG_HISTORY: The base node of the History linked list.

  • TRX_RSEG_FSEG_HEADER: The 10-byte Segment Header structure corresponding to this Rollback Segment, through which the segment's INODE Entry can be found.

  • TRX_RSEG_UNDO_SLOTS: The set of page numbers of the first undo page of each Undo page linked list, that is, the set of undo slots.

    A page number occupies 4 bytes. For a 16KB page, the TRX_RSEG_UNDO_SLOTS part stores 1024 undo slots, occupying 1024 × 4 = 4096 bytes in total.

9.2 Apply for the Undo page linked list from the rollback segment

Initially, since no Undo page linked list has been allocated to any transaction, every undo slot in a Rollback Segment Header page is set to a special value, FIL_NULL (hexadecimal 0xFFFFFFFF), indicating that the undo slot does not point to any page.

As time goes by, transactions start needing Undo page linked lists. The search starts from the first undo slot of the rollback segment and checks whether its value is FIL_NULL:

  • If it is FIL_NULL, a new segment (an Undo Log Segment) is created in the table space, a page is requested from the segment to serve as the first undo page of an Undo page linked list, and the value of the undo slot is set to the page number of that page. The undo slot is now assigned to this transaction.

  • If it is not FIL_NULL, the undo slot already points to an Undo page linked list, that is, it is already occupied by another transaction. In that case, move on to the next undo slot, check whether its value is FIL_NULL, and repeat the steps above.

A Rollback Segment Header page contains 1024 undo slots. If none of the 1024 undo slots has the value FIL_NULL, all 1024 undo slots are taken (assigned to some transaction). A new transaction then cannot obtain a new Undo page linked list, so it is rolled back and an error is reported to the user:

Too many active concurrent transactions

When users see this error, they can choose to run the transaction again (other transactions may have committed in the meantime, so that an Undo page linked list can be allocated to it).
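
The slot search described above amounts to a linear scan for the first FIL_NULL entry. A minimal Python sketch follows, with a plain list standing in for the TRX_RSEG_UNDO_SLOTS area. The function name, the exception class, and the page numbers are hypothetical.

```python
FIL_NULL = 0xFFFFFFFF
SLOT_COUNT = 1024


class TooManyTransactions(Exception):
    pass


def allocate_undo_slot(slots, new_first_page_no):
    """Claim the first free undo slot and point it at a new first undo page."""
    for i, value in enumerate(slots):
        if value == FIL_NULL:
            slots[i] = new_first_page_no
            return i
    # Every slot is taken: the transaction is rolled back with this error.
    raise TooManyTransactions("Too many active concurrent transactions")


slots = [FIL_NULL] * SLOT_COUNT  # initially no slot points to any page
print(allocate_undo_slot(slots, new_first_page_no=300))  # 0
print(allocate_undo_slot(slots, new_first_page_no=301))  # 1
```

This sketch deliberately leaves out the creation of the Undo Log Segment and the cached lists discussed later. It only shows how FIL_NULL marks a free slot.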

When a transaction commits, the undo slot it occupies meets one of two fates:

  • The Undo page linked list that the undo slot points to meets the conditions for reuse (as described above, the list occupies only one page and the used space is less than 3/4 of the page).

    The undo slot then enters the cached state. InnoDB stipulates that the TRX_UNDO_STATE attribute of the Undo page linked list (which lives in the Undo Log Segment Header of the first undo page) is set to TRX_UNDO_CACHED.

    Cached undo slots are added to a linked list. Which list depends on the type of the corresponding Undo page linked list:

    • If the corresponding Undo page linked list is an insert undo linked list, the undo slot is added to the insert undo cached list.

    • If the corresponding Undo page linked list is an update undo linked list, the undo slot is added to the update undo cached list.

    Each rollback segment has both of these cached lists. When a new transaction needs an undo slot, it first looks in the corresponding cached list. If there is no cached undo slot, it then searches the Rollback Segment Header page of the rollback segment.

  • The Undo page linked list that the undo slot points to does not meet the conditions for reuse. The Undo page linked list is then handled according to its type:

    • If the corresponding Undo page linked list is an insert undo linked list, the TRX_UNDO_STATE attribute of the list is set to TRX_UNDO_TO_FREE, the segment corresponding to the list is released (meaning the pages in the segment can be used for other purposes), and the value of the undo slot is set to FIL_NULL.

    • If the corresponding Undo page linked list is an update undo linked list, the TRX_UNDO_STATE attribute of the list is set to TRX_UNDO_TO_PURGE, the value of the undo slot is set to FIL_NULL, and the group of undo logs written by this transaction is placed in the so-called History linked list (note that the segment corresponding to the Undo page linked list is not released here, because these undo logs are still needed).
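
The commit-time decision above can be sketched as a single Python function. The dictionaries stand in for the rollback segment and the Undo page linked list. Their keys, the function name, and the sample values are hypothetical, while the state names and the branching follow the description above.

```python
FIL_NULL = 0xFFFFFFFF
PAGE_SIZE = 16 * 1024


def on_commit(rseg, slot_index, undo_list):
    """Decide the fate of an undo slot when its transaction commits."""
    reusable = (undo_list["page_count"] == 1
                and undo_list["used_bytes"] * 4 < PAGE_SIZE * 3)
    if reusable:
        # The slot keeps its page number and waits in a cached list.
        undo_list["state"] = "TRX_UNDO_CACHED"
        key = "insert_cached" if undo_list["category"] == "insert undo" else "update_cached"
        rseg[key].append(slot_index)
    elif undo_list["category"] == "insert undo":
        undo_list["state"] = "TRX_UNDO_TO_FREE"
        undo_list["segment_released"] = True  # pages can be used for other purposes
        rseg["slots"][slot_index] = FIL_NULL
    else:
        undo_list["state"] = "TRX_UNDO_TO_PURGE"
        rseg["slots"][slot_index] = FIL_NULL
        rseg["history"].append(undo_list)  # segment is kept: logs are still needed
    return undo_list["state"]


rseg = {"slots": [300, 301, 302], "insert_cached": [], "update_cached": [], "history": []}
small_insert = {"category": "insert undo", "page_count": 1, "used_bytes": 400}
big_insert = {"category": "insert undo", "page_count": 3, "used_bytes": 16000}
big_update = {"category": "update undo", "page_count": 5, "used_bytes": 16000}

print(on_commit(rseg, 0, small_insert))  # TRX_UNDO_CACHED
print(on_commit(rseg, 1, big_insert))    # TRX_UNDO_TO_FREE
print(on_commit(rseg, 2, big_update))    # TRX_UNDO_TO_PURGE
```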

9.3 Multiple rollback segments

We said that a transaction is allocated at most 4 Undo page linked lists during execution, yet a rollback segment has only 1024 undo slots, which is clearly on the small side. Even if we assume that a read-write transaction is allocated only 1 Undo page linked list, 1024 undo slots can support only 1024 read-write transactions executing at the same time, and no more. It is as if the conference room could seat only 1024 monitors at once. If several thousand people arrive for a meeting at the same time, the latecomers have nowhere to sit and must wait for those ahead of them to finish before they can go in.

It is said that InnoDB really did have only one rollback segment early in its development, but the problem was recognized later. How can it be solved? If there are not enough conference rooms, build more. So InnoDB defined 128 rollback segments in one go, which gives 128 × 1024 = 131072 undo slots. Assuming a read-write transaction is allocated only 1 Undo page linked list, 131072 read-write transactions can execute concurrently (I have never seen that many transactions running concurrently on one machine).

Each rollback segment corresponds to one Rollback Segment Header page. With 128 rollback segments there are 128 Rollback Segment Header pages, and the addresses of these pages must be stored somewhere. InnoDB therefore keeps 128 cells of 8 bytes each in a certain area of page 5 of the system table space:

insert image description here
Each 8-byte grid is constructed like this:

insert image description here
As shown, each 8-byte grid actually consists of two parts:

  • 4 bytes in size Space ID, representing the ID of a tablespace.

  • 4 bytes in size Page number, representing a page number.

That is to say, each 8-byte size 格子is equivalent to a pointer, pointing to a certain page in a certain table space, and these pages are Rollback Segment Header. One thing to note here is that to locate a Rollback Segment Header, you need to know the corresponding tablespace ID, which means that different rollback segments may be distributed in different tablespaces.

So from the description above we can roughly see that page 5 of the system tablespace stores the addresses of 128 Rollback Segment Header pages, and each Rollback Segment Header corresponds to one rollback segment. Each Rollback Segment Header page in turn contains 1024 undo slots, and each undo slot corresponds to one Undo page linked list. Let's draw a diagram:

[Figure: page 5 of the system tablespace pointing to 128 Rollback Segment Header pages, each containing 1024 undo slots]

Everything looks much clearer once the picture is drawn.

9.4 Classification of Rollback Segments

Let's number the 128 rollback segments. The initial rollback segment is called rollback segment No. 0, and then increments successively. The last rollback segment is called rollback segment No. 127. The 128 rollback segments can be divided into two categories:

  • Rollback segments No. 0 and No. 33~127 form one category. Among them, rollback segment No. 0 must be in the system tablespace (that is, the Rollback Segment Header page for rollback segment No. 0 must be in the system tablespace), while rollback segments No. 33~127 can be either in the system tablespace or in an undo tablespace that you configure yourself. We will talk about how to configure this later.

    If a transaction needs to allocate an Undo page linked list because it changed records of an ordinary table, the corresponding undo slot must be allocated from a rollback segment in this category.

  • Rollback segments No. 1~32 form the other category. These rollback segments must be in the temporary tablespace (the ibtmp1 file in the data directory).

    If a transaction needs to allocate an Undo page linked list because it changed records of a temporary table, the corresponding undo slot must be allocated from a rollback segment in this category.

That is to say, if a transaction changes both records of an ordinary table and records of a temporary table during execution, 2 rollback segments must be assigned to this transaction, and the corresponding undo slots are then allocated in each of the two rollback segments.
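The numbering rules above can be written down as a small lookup. This is only a sketch of the classification as described in this section; the function names and the string labels are chosen here for illustration.

```python
def segment_category(number: int) -> str:
    """Classify a rollback segment number (0..127) as described in this section."""
    if not 0 <= number <= 127:
        raise ValueError("rollback segment numbers run from 0 to 127")
    # No. 1~32 serve temporary tables; No. 0 and No. 33~127 serve ordinary tables.
    return "temporary" if 1 <= number <= 32 else "ordinary"


def possible_tablespaces(number: int) -> list[str]:
    """List the tablespaces a given rollback segment may live in."""
    if segment_category(number) == "temporary":
        return ["temporary tablespace"]
    if number == 0:
        return ["system tablespace"]
    # No. 33~127: system tablespace by default, or a configured undo tablespace.
    return ["system tablespace", "undo tablespace"]
```

For example, `segment_category(32)` gives `"temporary"` while `segment_category(33)` gives `"ordinary"`, which is the boundary between the two categories.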

You may be wondering: why distinguish different kinds of rollback segments for ordinary tables and temporary tables? This has to start with the Undo page itself. What we call an Undo page is really shorthand for a page of type FIL_PAGE_UNDO_LOG, which is, after all, just an ordinary page. As we said before, the corresponding redo log must be written before a page is modified, so that after a crash and restart the system can be restored to its pre-crash state. Writing undo logs into an Undo page is itself a page write. For this reason, InnoDB's designers defined many types of redo log, such as MLOG_UNDO_HDR_CREATE, MLOG_UNDO_INSERT, MLOG_UNDO_INIT, and so on. That is to say, any change we make to an Undo page records a redo log of the corresponding type. But for temporary tables, the undo logs generated by modifying them only need to be valid while the system is running. If the system crashes, there is no need to restore the pages holding these undo logs on restart, so there is no need to record the corresponding redo log when writing Undo pages for temporary tables. To summarize the reason for dividing rollback segments by table type: modifying Undo pages in the rollback segments for ordinary tables requires recording the corresponding redo logs, while modifying Undo pages in the rollback segments for temporary tables does not.

Tip:
If we only change records of ordinary tables, the transaction is assigned only a rollback segment for ordinary tables, not one for temporary tables. But if we only change records of temporary tables, the transaction is assigned both a rollback segment for ordinary tables and a rollback segment for temporary tables (although assigning a rollback segment does not immediately allocate an undo slot; an undo slot in the rollback segment is allocated only when an Undo page linked list is really needed).

9.5 Detailed process of allocating Undo page linked list for transaction

We have covered a lot of concepts above, and you are probably feeling a little dizzy. Next, let's take a transaction that changes records of an ordinary table as an example and walk through the complete process of allocating an Undo page linked list during transaction execution.

  • Before a transaction changes records of an ordinary table for the first time, it first goes to page 5 of the system tablespace to be assigned a rollback segment (in effect, it obtains the address of a Rollback Segment Header page). Once a rollback segment has been assigned to the transaction, later changes to ordinary table records in the same transaction do not trigger another assignment.

    Rollback segments are assigned using the legendary round-robin method. For example, if the current transaction is assigned rollback segment No. 0, the next transaction is assigned rollback segment No. 33, the one after that rollback segment No. 34, and so on. Put simply, the rollback segments are handed out to different transactions in turn (it's that simple and crude, nothing more to say).

  • After the rollback segment is assigned, first check whether either of the segment's two cached linked lists holds a cached undo slot. For example, if the transaction performs an INSERT, look in the insert undo cached linked list of the rollback segment; if it performs a DELETE, look in the update undo cached linked list. If a cached undo slot exists, assign it to the transaction.

  • If no cached undo slot is available, an available undo slot has to be found in the Rollback Segment Header page and assigned to the current transaction.

    How to find an available undo slot in the Rollback Segment Header page was also described above: start from undo slot 0. If its value is FIL_NULL, the undo slot is free, so assign it to the current transaction. Otherwise check whether undo slot 1 meets the condition, and so on, up to the last undo slot. If none of the 1024 undo slots has the value FIL_NULL, an error is reported (this generally does not happen)~

  • Once an available undo slot has been found: if it came from a cached linked list, its Undo Log Segment has already been allocated. Otherwise a new Undo Log Segment has to be allocated, and a page is then requested from that Undo Log Segment to serve as the first undo page of the Undo page linked list.

  • The transaction can then write undo logs into the Undo page linked list obtained above!
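The steps above can be sketched as a toy model in Python. This is not InnoDB's implementation: the `RollbackSegment` class, its field names, and the `first_page_no` argument are hypothetical, every non-INSERT operation is assumed to use the update undo cached list, and cached entries are modeled simply as slot indexes whose Undo Log Segment already exists.

```python
from dataclasses import dataclass, field
from itertools import cycle

FIL_NULL = 0xFFFFFFFF      # value that marks an undo slot as free
SLOTS_PER_SEGMENT = 1024

# Rollback segments usable for ordinary tables: No. 0 and No. 33~127.
ORDINARY_SEGMENT_NUMBERS = [0] + list(range(33, 128))


@dataclass
class RollbackSegment:
    number: int
    slots: list = field(default_factory=lambda: [FIL_NULL] * SLOTS_PER_SEGMENT)
    insert_undo_cached: list = field(default_factory=list)  # cached slot indexes
    update_undo_cached: list = field(default_factory=list)  # cached slot indexes

    def allocate_slot(self, operation: str, first_page_no: int) -> tuple[int, bool]:
        """Return (slot index, whether a cached Undo Log Segment was reused)."""
        cached = (self.insert_undo_cached if operation == "INSERT"
                  else self.update_undo_cached)
        if cached:
            # A cached undo slot already has its Undo Log Segment.
            return cached.pop(0), True
        # Otherwise scan from slot 0 for the first FIL_NULL entry.
        for index, value in enumerate(self.slots):
            if value == FIL_NULL:
                # Stand-in for allocating a new Undo Log Segment and its first page.
                self.slots[index] = first_page_no
                return index, False
        raise RuntimeError("all 1024 undo slots are in use")


# Round-robin assignment of rollback segments to successive transactions.
picker = cycle(ORDINARY_SEGMENT_NUMBERS)
first_three = [next(picker) for _ in range(3)]  # segments for three transactions

segment = RollbackSegment(number=first_three[0])
slot_a = segment.allocate_slot("INSERT", first_page_no=100)
slot_b = segment.allocate_slot("INSERT", first_page_no=101)
segment.update_undo_cached.append(7)  # pretend slot 7 was cached earlier
slot_c = segment.allocate_slot("DELETE", first_page_no=102)
```

Checking the cached lists before scanning the slot array is what lets a transaction skip allocating a fresh Undo Log Segment whenever a reusable one is available.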

The steps for changing records of a temporary table are the same as those described above, so I won't repeat them here. It is worth stressing again, though, that if a transaction changes both records of an ordinary table and records of a temporary table during execution, 2 rollback segments must be assigned to this transaction. Also, different transactions executing concurrently can be assigned the same rollback segment, as long as they are allocated different undo slots.

9.6 Rollback segment related configuration

9.6.1 Configure the number of rollback segments

We said earlier that there are 128 rollback segments in the system in total. In fact, this is only the default value. The number of rollback segments can be configured with the startup parameter innodb_rollback_segments, whose valid range is 1~128. This parameter does not affect the number of rollback segments for temporary tables, though, which is always 32. That is to say:

  • If we set innodb_rollback_segments to 1, only 1 rollback segment is available for ordinary tables, but 32 are still available for temporary tables.

  • If we set innodb_rollback_segments to a value between 2~33, the effect is the same as setting it to 1.

  • If we set innodb_rollback_segments to a value greater than 33, the number of rollback segments available for ordinary tables is that value minus 32.
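The three rules can be condensed into one small function. This is just a restatement of the bullets above in Python; the function name is made up for illustration.

```python
TEMP_TABLE_SEGMENTS = 32  # always 32, regardless of the setting


def ordinary_segments_available(innodb_rollback_segments: int) -> int:
    """Number of rollback segments usable for ordinary tables for a given setting."""
    if not 1 <= innodb_rollback_segments <= 128:
        raise ValueError("innodb_rollback_segments must be in 1..128")
    if innodb_rollback_segments <= 33:
        # Values 1~33 all leave only rollback segment No. 0 for ordinary tables.
        return 1
    return innodb_rollback_segments - TEMP_TABLE_SEGMENTS
```

With the default of 128 this yields 96, which matches rollback segment No. 0 plus the 95 segments numbered 33~127.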

9.6.2 Configuring the undo tablespace

By default, the rollback segments set up for ordinary tables (No. 0 and No. 33~127) are all allocated in the system tablespace. Rollback segment No. 0 always stays in the system tablespace, but rollback segments No. 33~127 can be placed in custom undo tablespaces through configuration. This can only be configured when the system is initialized (when the data directory is created), and it cannot be changed once initialization is complete. Let's look at the relevant startup parameters:

  • innodb_undo_directory specifies the directory where the undo tablespaces are located. If this parameter is not specified, the undo tablespaces are placed in the data directory by default.

  • innodb_undo_tablespaces defines the number of undo tablespaces. The default value of this parameter is 0, meaning that no undo tablespace is created.

    Rollback segments No. 33~127 can be evenly distributed to different undo tablespaces.

Tip:
If we specify that undo tablespaces should be created when the system is initialized, rollback segment No. 0 in the system tablespace becomes unavailable.

For example, if we set innodb_rollback_segments to 35 and innodb_undo_tablespaces to 2 when initializing the system, rollback segments No. 33 and No. 34 are each placed in a separate undo tablespace.
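As a rough illustration of "evenly distributed", the sketch below hands segment numbers to undo tablespaces in turn. The round-robin order is an assumption made here, not something this article specifies, and the function name and the 1-based tablespace numbering are hypothetical.

```python
def distribute_segments(segment_numbers, undo_tablespaces: int) -> dict[int, int]:
    """Map each rollback segment number to an undo tablespace number (1-based).

    Assumption: segments are handed out to the undo tablespaces in turn.
    """
    if undo_tablespaces < 1:
        raise ValueError("at least one undo tablespace is required")
    return {
        number: 1 + position % undo_tablespaces
        for position, number in enumerate(segment_numbers)
    }


# innodb_rollback_segments = 35 leaves segments No. 33 and No. 34 to distribute.
placement = distribute_segments(range(33, 35), undo_tablespaces=2)
```

Under this assumption the two segments from the example end up in different undo tablespaces, consistent with the description above.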

One benefit of setting up undo tablespaces is that when a file in an undo tablespace grows large enough, it can be automatically truncated into a small file. The system tablespace, by contrast, can only keep growing and cannot be truncated.


Origin blog.csdn.net/liang921119/article/details/130905213