Table of contents
- 1. The need for transaction rollback
- 2. Transaction id
- 3. Format of undo log
- 4. General Linked List Structure
- 5. FIL_PAGE_UNDO_LOG page
- 6. Undo page linked list
- 7. The specific writing process of the undo log
- 8. Reuse the Undo page
- 9. Rollback segment
1. The need for transaction rollback
When we learned about transactions earlier, we said that a transaction must guarantee atomicity, that is, the operations in a transaction are either all performed or none of them are. But a transaction can run into situations such as:
- Case 1: Various errors may occur during transaction execution, such as errors in the server itself, operating system errors, or even a sudden power failure.
- Case 2: The programmer can manually issue a ROLLBACK statement during transaction execution to end the current transaction.
Both situations cause the transaction to end halfway through, yet many things may already have been modified by then. To guarantee atomicity, we need to change everything back to its original state. This process is called rollback. It gives the impression that the transaction did nothing at all, which is exactly what the atomicity requirement asks for.
It's like taking back a move when we played cards with friends as children, which is a very typical rollback operation. For example, if you played a pair of threes, taking the move back means picking the pair of threes up again. Rollback in a database is similar: if you inserted a record, the rollback operation deletes that record; if you updated a record, the rollback operation updates the record back to its old value; if you deleted a record, the rollback operation inserts the record again. It seems simple enough.
From the description above, we can already sense that whenever we change a record (the change can be an INSERT, DELETE, or UPDATE), we need to keep something in reserve: we must write down everything needed for the rollback. For example:
- When you insert a record, you must at least write down the primary key value of the record, so that a later rollback only needs to delete the record with that primary key value.
- When you delete a record, you must at least write down the contents of the record, so that a later rollback can insert a record made up of those contents back into the table.
- When you modify a record, you must at least write down the old values from before the modification, so that a later rollback can update the record back to its old values.
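The three rules above can be sketched with a toy in-memory table. This is only an illustration of the idea, not how InnoDB stores anything: the names `table`, `undo_log`, `insert`, `delete`, `update`, and `rollback` are all made up for this example.

```python
# Toy "table" keyed by primary key; undo_log holds what is needed to reverse each change.
table = {1: "old"}
undo_log = []

def insert(pk, row):
    undo_log.append(("insert", pk))             # the primary key is enough to undo an insert
    table[pk] = row

def delete(pk):
    undo_log.append(("delete", pk, table[pk]))  # keep the full contents of the deleted row
    del table[pk]

def update(pk, new_row):
    undo_log.append(("update", pk, table[pk]))  # keep the old value
    table[pk] = new_row

def rollback():
    # Reverse the changes, most recent first.
    while undo_log:
        entry = undo_log.pop()
        if entry[0] == "insert":
            del table[entry[1]]
        else:  # "delete" and "update" both restore the saved old row
            table[entry[1]] = entry[2]

insert(2, "new")
update(1, "changed")
delete(2)
rollback()
print(table)  # {1: 'old'}
```

Note that the entries are replayed in reverse order, which is why the log is treated as a stack here.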
The information the database records for the purpose of rollback is called the undo log. One thing to note is that since a query (SELECT) does not modify any user records, no undo log needs to be recorded when a query is executed. In the real InnoDB, the undo log is not as simple as described above, and different types of operations generate undo logs in different formats. Let's set these easily confused details aside for a moment and first look at what a transaction id is.
2. Transaction id
2.1 When to assign an id to a transaction
As we said earlier when introducing transactions, a transaction can be a read-only transaction or a read-write transaction:
- We can start a read-only transaction with the START TRANSACTION READ ONLY statement. In a read-only transaction we cannot insert, delete, or update rows in ordinary tables (tables that other transactions can also access), but we can insert, delete, and update rows in temporary tables.
- We can start a read-write transaction with the START TRANSACTION READ WRITE statement; a transaction started with BEGIN or START TRANSACTION is a read-write transaction by default. In a read-write transaction we can insert, delete, update, and query rows in tables.
If insert, delete, or update operations are performed on a table during the execution of a transaction, the InnoDB storage engine assigns the transaction a unique transaction id, as follows:
- For a read-only transaction, a transaction id is assigned only when the transaction first inserts, deletes, or updates rows in a user-created temporary table; otherwise no transaction id is assigned.
  Tip: As we said earlier, when you analyze the query plan of a statement with EXPLAIN, you sometimes see Using temporary in the Extra column, which indicates that an internal temporary table will be used when executing the query. This internal temporary table is not the same as a user temporary table created manually with CREATE TEMPORARY TABLE. When the transaction is rolled back, the internal temporary tables used while executing a SELECT statement do not need to be rolled back, and using such an internal temporary table does not cause a transaction id to be assigned.
- For a read-write transaction, a transaction id is assigned only when the transaction first inserts, deletes, or updates rows in a table (including user-created temporary tables); otherwise no transaction id is assigned.
Sometimes we start a read-write transaction, but the transaction contains only query statements and never executes an insert, delete, or update. Such a transaction is not assigned a transaction id.
So what is the transaction id used for? We will keep that a secret for now and explain it step by step later. For the moment, just remember that a transaction is assigned a unique transaction id only when it changes records in a table.
2.2 How the transaction id is generated
The transaction id is essentially a number. Its allocation strategy is roughly the same as that of the hidden row_id column we mentioned earlier (the column InnoDB creates automatically when the user defines neither a primary key nor a UNIQUE key for the table). The specific strategy is as follows:
- The server maintains a global variable in memory. Whenever a transaction needs a transaction id, the current value of the variable is assigned to the transaction as its id, and the variable is then incremented by 1.
- Whenever the value of this variable is a multiple of 256, the value is flushed to an attribute called Max Trx ID in page number 5 of the system tablespace. This attribute occupies 8 bytes of storage.
- The next time the system restarts, it loads the Max Trx ID attribute into memory, adds 256 to it, and assigns the result to the global variable mentioned above (because the value the global variable held before shutdown may have been greater than the stored Max Trx ID value).
This ensures that the transaction ids assigned across the whole system are increasing numbers. A transaction that is assigned an id earlier gets a smaller transaction id, and a transaction that is assigned an id later gets a larger one.
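The allocation strategy can be sketched as a small class. This is a minimal model under assumptions: the class name `TrxIdAllocator`, its methods, and the choice to flush at the moment a multiple of 256 is handed out are all illustrative; `persisted_max_trx_id` merely stands in for the on-disk Max Trx ID attribute.

```python
class TrxIdAllocator:
    FLUSH_INTERVAL = 256

    def __init__(self, start, persisted_max_trx_id=0):
        self.next_trx_id = start                          # the in-memory global variable
        self.persisted_max_trx_id = persisted_max_trx_id  # stand-in for Max Trx ID on disk

    @classmethod
    def after_restart(cls, persisted_max_trx_id):
        # On restart the in-memory counter is lost, so skip ahead by 256 to be
        # sure no id handed out before the restart can be issued again.
        return cls(persisted_max_trx_id + cls.FLUSH_INTERVAL, persisted_max_trx_id)

    def allocate(self):
        trx_id = self.next_trx_id
        if trx_id % self.FLUSH_INTERVAL == 0:
            self.persisted_max_trx_id = trx_id            # "flush" to the system tablespace
        self.next_trx_id += 1
        return trx_id


allocator = TrxIdAllocator(start=250)
ids = [allocator.allocate() for _ in range(10)]  # hands out 250 .. 259
print(ids[-1], allocator.persisted_max_trx_id)   # 259 256

# Simulate a restart: only the persisted value survives.
restarted = TrxIdAllocator.after_restart(allocator.persisted_max_trx_id)
print(restarted.allocate())                      # 512
```

The restart hands out 512, which is larger than 259, the last id issued before the restart, so ids keep increasing even though the exact in-memory counter was lost.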
2.3 trx_id hidden column
When we learned about the InnoDB row format, we emphasized that, in addition to the complete user data, clustered index records automatically get hidden columns named trx_id and roll_pointer. If the user defines neither a primary key nor a UNIQUE key for the table, a hidden column named row_id is added as well. So the actual structure of a record on a page looks like this:
The trx_id column is easy to understand: it holds the transaction id of the transaction containing the statement that changed the clustered index record (the change can be an INSERT, DELETE, or UPDATE operation). As for the roll_pointer hidden column, we will analyze it later.
3. Format of undo log
To implement transaction atomicity, the InnoDB storage engine needs to write down the corresponding undo log before it actually inserts, deletes, or updates a record. Generally, each change to a record corresponds to one undo log, but some update operations may correspond to 2 undo logs, which we will talk about later. During the execution of a transaction, several records may be inserted, deleted, or updated, so many undo logs need to be recorded. These logs are numbered from 0 in the order they are generated: undo log No. 0, undo log No. 1, ..., undo log No. n. This number is also called the undo no.
These undo logs are recorded in pages of type FIL_PAGE_UNDO_LOG (the corresponding hexadecimal value is 0x0002; readers who have forgotten what a page type is should revisit the earlier chapters). These pages can be allocated from the system tablespace, or from a tablespace dedicated to storing undo logs, the so-called undo tablespace. We will talk later about how pages for storing undo logs are allocated. For now, let's look at what kinds of undo logs the different operations generate. To keep the story flowing, let's first create a table named demo18:
mysql> CREATE TABLE demo18 (
id INT NOT NULL,
key1 VARCHAR(100),
col VARCHAR(100),
PRIMARY KEY (id),
KEY idx_key1 (key1)
)Engine=InnoDB CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.06 sec)
There are 3 columns in this table: the id column is the primary key, the key1 column has a secondary index, and the col column is an ordinary column. As we mentioned when introducing the InnoDB data dictionary, each table is assigned a unique table id. We can look up the table id of a table through the innodb_tables table in the information_schema system database. Now let's check the table id of demo18:
mysql> SELECT * FROM information_schema.innodb_tables WHERE name = 'testdb/demo18';
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
| TABLE_ID | NAME | FLAG | N_COLS | SPACE | ROW_FORMAT | ZIP_PAGE_SIZE | SPACE_TYPE | INSTANT_COLS | TOTAL_ROW_VERSIONS |
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
| 1128 | testdb/demo18 | 33 | 6 | 66 | Dynamic | 0 | Single | 0 | 0 |
+----------+---------------+------+--------+-------+------------+---------------+------------+--------------+--------------------+
1 row in set (0.00 sec)
As the query result shows, the table id of the demo18 table is 1128. Remember this value; we will use it later.
3.1 The undo log corresponding to the INSERT operation
As we said before, inserting a record into a table can be an optimistic insert or a pessimistic insert, but either way the end result is that the record is placed in a data page. To roll back the insert, we just delete the record, which means the corresponding undo log mainly needs to record the primary key information of the record. InnoDB therefore designed an undo log of type TRX_UNDO_INSERT_REC, whose complete structure is shown in the following figure:
According to the diagram, we emphasize a few points:
- The undo no increases from 0 within a transaction. That is, as long as the transaction has not committed, each newly generated undo log gets an undo no that is 1 greater than the previous one.
- If the primary key of the record contains only one column, the TRX_UNDO_INSERT_REC undo log only needs to record the storage size and actual value of that column. If the primary key contains multiple columns, the storage size and actual value of every column must be recorded (in the figure, len is the storage size of a column and value is its actual value).
Tip:
When we insert a record into a table, we actually insert a record into the clustered index and into every secondary index. When recording undo logs, however, we only need to consider the insert into the clustered index, because clustered index records and secondary index records are in one-to-one correspondence. To roll back the insert, we only need to know the primary key information of the record and then perform the corresponding delete by primary key; that delete also removes the corresponding records from all secondary indexes. The undo logs for the DELETE and UPDATE operations described later are likewise for clustered index records, and we will not emphasize this again.
Now we insert two records into demo18:
mysql> BEGIN; # Explicitly start a transaction; assume its transaction id is 100
Query OK, 0 rows affected (0.00 sec)
mysql> # Insert two records
mysql> INSERT INTO demo18(id, key1, col) VALUES (1, 'AWM', '狙击枪'), (2, 'M416', '步枪');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
Because the primary key of the record contains only the id column, the corresponding undo log only needs to record the storage length of the id column of the inserted record (id is of type INT, which occupies 4 bytes) and its actual value. In this example two records are inserted, so two undo logs of type TRX_UNDO_INSERT_REC are generated:
- The first undo log has undo no 0; the storage length of the record's primary key is 4 and the actual value is 1. Draw a schematic like this:
- The second undo log has undo no 1; the storage length of the record's primary key is 4 and the actual value is 2. Draw a schematic like this:
Compared with the first undo log, only the undo no and the primary key column information differ.
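The two undo logs just described can be modeled roughly as follows. This is a simplified sketch, not the real on-disk layout of TRX_UNDO_INSERT_REC: the names `InsertUndoRec` and `Transaction` are invented for the example, and the INT value is written as plain 4-byte big-endian, ignoring InnoDB's actual integer encoding.

```python
from dataclasses import dataclass

@dataclass
class InsertUndoRec:
    undo_no: int      # position of this log within its transaction, starting at 0
    table_id: int
    pk_columns: list  # one (len, value) pair per primary key column

class Transaction:
    def __init__(self, trx_id):
        self.trx_id = trx_id
        self.undo_logs = []

    def log_insert(self, table_id, pk_values):
        # Every primary key column here is an INT, stored in 4 bytes.
        columns = [(4, value.to_bytes(4, "big")) for value in pk_values]
        record = InsertUndoRec(len(self.undo_logs), table_id, columns)
        self.undo_logs.append(record)
        return record

trx = Transaction(trx_id=100)
trx.log_insert(table_id=1128, pk_values=[1])  # row with id = 1
trx.log_insert(table_id=1128, pk_values=[2])  # row with id = 2

for record in trx.undo_logs:
    length, value = record.pk_columns[0]
    print(record.undo_no, length, int.from_bytes(value, "big"))
# 0 4 1
# 1 4 2
```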
The meaning of roll_pointer hidden column
It's time to lift the veil on roll_pointer. This field, which occupies 7 bytes, is not mysterious at all: it is essentially a pointer to the undo log that corresponds to the record. For example, we inserted 2 records into the demo18 table above, and each record has a corresponding undo log. Records are stored in pages of type FIL_PAGE_INDEX (the data pages we have been talking about all along), while undo logs are stored in pages of type FIL_PAGE_UNDO_LOG. The effect is shown in the figure:
The figure makes it more intuitive that roll_pointer is essentially a pointer to the undo log corresponding to the record. The specific meaning of each of the 7 bytes of roll_pointer will be explained in detail after we talk about how pages for storing undo logs are allocated.
3.2 Undo log corresponding to DELETE operation
We know that the records inserted into a page form a singly linked list through the next_record attribute in the record header. We call this list the normal record list. As we said when discussing the data page structure, deleted records also form a linked list through the next_record attribute, and the storage space occupied by the records in that list can be reused, so it is also called the garbage list. The Page Header has an attribute called PAGE_FREE, which points to the head node of the garbage list made up of deleted records. To keep the story flowing, let's draw a picture first, assuming the records in some page are currently distributed like this (these are not records of the demo18 table, just an arbitrary example):
To keep the focus on the topic, this simplified schematic shows only the delete_mask flag of each record. The figure shows that the normal record list contains 3 normal records and the garbage list contains 2 deleted records, whose storage space can be reused. The value of the PAGE_FREE attribute in the Page Header is a pointer to the head node of the garbage list. Suppose we now use a DELETE statement to delete the last record in the normal record list. The deletion actually goes through two phases:
- Phase 1: Only the delete_mask flag of the record is set to 1, and nothing else is modified (in fact, the values of the record's trx_id and roll_pointer hidden columns are modified as well). InnoDB calls this phase delete mark. The process is drawn like this:
  It can be seen that the delete_mask of the last record in the normal record list has been set to 1, but the record has not been added to the garbage list. In other words, the record is now in an intermediate state, and it stays in this intermediate state until the transaction containing the delete statement commits.
  Tip: Why is there such a strange intermediate state? It exists mainly to implement a feature called MVCC, which will be introduced later.
- Phase 2: After the transaction containing the delete statement commits, a dedicated thread truly deletes the record. True deletion means removing the record from the normal record list and adding it to the garbage list, and then adjusting other information in the page, such as the number of user records in the page (PAGE_N_RECS), the position of the last inserted record (PAGE_LAST_INSERT), the pointer to the head node of the garbage list (PAGE_FREE), the number of reusable bytes in the page (PAGE_GARBAGE), and some page directory information. InnoDB calls this phase purge.
  After phase 2 completes, the record is truly deleted and the storage space it occupied can be reused. It is drawn like this:
  Comparing with the figure, note one more point: a deleted record is added at the head of the garbage list, and the value of the PAGE_FREE attribute is modified accordingly.
Tip:
The Page Header has a PAGE_GARBAGE attribute, which records the total number of bytes of reusable storage space in the current page. Whenever a deleted record is added to the garbage list, the storage space occupied by that record is added to PAGE_GARBAGE. PAGE_FREE points to the head node of the garbage list. Whenever a new record is inserted, InnoDB first checks whether the storage space of the deleted record at the head node pointed to by PAGE_FREE is large enough for the new record. If not, it directly allocates new space in the page for the record (yes, you read that right: it does not traverse the garbage list looking for a node that could hold the new record). If the space is large enough, it reuses the storage space of that deleted record and points PAGE_FREE at the next deleted record in the garbage list. There is one problem: if the new record is smaller than the space of the head node, part of that space is left unused. This leftover is called fragmented space. Is fragmented space lost forever? No. Its size stays counted in the PAGE_GARBAGE attribute, and it is not reused until the page is almost full. When the page is nearly full and a further insert cannot get space for a complete record, InnoDB first checks whether PAGE_GARBAGE plus the remaining free space would be enough to hold the record. If so, InnoDB tries to reorganize the records in the page: it allocates a temporary page, inserts the records of the page into it one by one (inserting in sequence produces no fragments), and then copies the contents of the temporary page back to the original page, which frees up the fragmented space (obviously, reorganizing the records in a page is relatively expensive).
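The interplay of delete mark, purge, PAGE_FREE, and PAGE_GARBAGE can be sketched with a toy page model. Everything here is an assumption made for illustration: the `Page` class, its fields, and the byte sizes are invented, Python lists stand in for the linked lists, and page reorganization and splitting are not modeled.

```python
class Page:
    def __init__(self, free_space):
        self.records = []             # normal record list
        self.garbage = []             # garbage list; index 0 is the head PAGE_FREE points to
        self.page_garbage = 0         # PAGE_GARBAGE: reusable bytes in the page
        self.free_space = free_space  # bytes never allocated so far

    def insert(self, key, size):
        if self.garbage and self.garbage[0]["size"] >= size:
            self.garbage.pop(0)        # reuse the head; PAGE_FREE moves to the next node
            self.page_garbage -= size  # any leftover fragment stays counted as garbage
        elif self.free_space >= size:
            self.free_space -= size    # only the head is checked, the list is never scanned
        else:
            raise MemoryError("page full: reorganize or split (not modeled)")
        self.records.append({"key": key, "size": size, "delete_mask": 0})

    def delete_mark(self, key):
        # Phase 1: flip the flag only; the record stays in the normal record list.
        for record in self.records:
            if record["key"] == key:
                record["delete_mask"] = 1

    def purge(self, key):
        # Phase 2: move the record to the head of the garbage list.
        record = next(r for r in self.records if r["key"] == key)
        self.records.remove(record)
        self.garbage.insert(0, record)
        self.page_garbage += record["size"]


page = Page(free_space=100)
page.insert(1, 30)
page.insert(2, 40)
page.delete_mark(2)  # intermediate state
page.purge(2)        # 40 reusable bytes at the head of the garbage list
page.insert(3, 25)   # fits in the head node, leaving a 15-byte fragment
page.insert(4, 20)   # garbage list is empty now, so fresh space is used
print(page.page_garbage, page.free_space)  # 15 10
```

The final PAGE_GARBAGE value of 15 is exactly the fragment left over when the 25-byte record reused the 40-byte slot.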
From the description above, we can also see that before the transaction containing the delete statement commits, only phase 1, the delete mark phase, is carried out (after the commit there is nothing left to roll back, so we only need to consider rolling back the effects of phase 1). For this purpose InnoDB designed an undo log of type TRX_UNDO_DEL_MARK_REC, whose complete structure is shown in the following figure:
There are a lot of attributes here (the meaning of most of them has been introduced above). Don't force yourself to remember them all; they are listed only so that you become familiar with them. When looking through the attributes of the TRX_UNDO_DEL_MARK_REC undo log, pay special attention to these points:
- Before the delete mark operation is performed on a record, the old values of the record's trx_id and roll_pointer hidden columns are saved in the corresponding undo log, as the old trx_id and old roll_pointer attributes shown in our figure. The advantage is that the undo log of the record's previous modification can be found through the old roll_pointer of this undo log. For example, in one transaction we first insert a record and then delete it. The schematic diagram of this process is as follows:
- It can be seen from the figure that after the delete mark operation, its undo log and the undo log of the INSERT operation form a linked list. This linked list is called the version chain. Its usefulness is not apparent yet; it will gradually become clear after we cover the undo log of the UPDATE operation.
- Unlike an undo log of type TRX_UNDO_INSERT_REC, an undo log of type TRX_UNDO_DEL_MARK_REC has an additional index column information part. If a column is included in any index, its information is recorded in this part: the position of the column in the record (pos), the storage space occupied by the column (len), and the actual value of the column (value). So the index column information is essentially a list of <pos, len, value> entries. This information is mainly used in the second phase, the true deletion of the intermediate-state record after the transaction commits, that is, in the purge phase. How it is used can be ignored for now.
With the introduction finished, let's continue in the transaction with id 100 above and delete a record, for example the record with id 1:
mysql> DELETE FROM demo18 WHERE id = 1;
Query OK, 1 row affected (0.01 sec)
The structure of the undo log corresponding to this delete mark operation is as follows:
According to this figure, we have to pay attention to the following points:
- Because this is the 3rd undo log generated in the transaction with id 100, its undo no is 2.
- When the delete mark operation is performed on the record, the value of the record's trx_id hidden column is 100 (that is, the most recent modification of the record happened in this same transaction), so 100 is filled into the old trx_id attribute. The value of the record's roll_pointer hidden column is then copied into the old roll_pointer attribute, so that the undo log generated by the record's previous change can be found through old roll_pointer.
- The demo18 table has 2 indexes: the clustered index and the secondary index idx_key1. For every column that is included in an index, its position in the record (pos), the storage space it occupies (len), and its actual value (value) must be stored in the undo log.
  - For the primary key, there is only the id column, and the information stored in the undo log is:
    - pos: the id column is the primary key, that is, the first column of the record, so its pos value is 0. pos takes 1 byte to store.
    - len: the id column is of type INT, which occupies 4 bytes, so the value of len is 4. len takes 1 byte to store.
    - value: the value of the id column in the deleted record is 1, so value is 1. value takes 4 bytes to store.
    Draw a picture to demonstrate it like this:
    So for the id column, the stored result is <0, 4, 1>, and storing it takes 1 + 1 + 4 = 6 bytes.
  - For idx_key1, there is only the key1 column, and the information stored in the undo log is:
    - pos: the key1 column comes after the id, trx_id, and roll_pointer columns, so its pos value is 3. pos takes 1 byte to store.
    - len: the key1 column is of type VARCHAR(100) with the utf8 character set. The content actually stored in the deleted record is AWM, which occupies 3 bytes, so the value of len is 3. len takes 1 byte to store.
    - value: the value of the key1 column in the deleted record is AWM, so value is AWM. value takes 3 bytes to store.
    Draw a picture to demonstrate it like this:
    So for the key1 column, the stored result is <3, 3, 'AWM'>, and storing it takes 1 + 1 + 3 = 5 bytes.
  - As can be seen from the above, <0, 4, 1> and <3, 3, 'AWM'> occupy 11 bytes in total. The index_col_info len attribute itself occupies 2 bytes, so the whole part takes 13 bytes, and the number 13 is filled into the index_col_info len attribute.
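The byte arithmetic above can be checked with a short sketch. It follows the simplified sizes given in the text (1 byte for pos, 1 byte for len, 2 bytes for index_col_info len); the function name `encode_index_col_info` and the big-endian encoding of the INT value are assumptions for illustration, not InnoDB's actual serialization.

```python
def encode_index_col_info(columns):
    """Encode a list of (pos, value_bytes) as <pos, len, value> entries with a 2-byte length prefix."""
    body = b""
    for pos, value in columns:
        body += bytes([pos, len(value)]) + value  # 1 byte pos, 1 byte len, then the value
    total_len = 2 + len(body)                     # the length field counts its own 2 bytes
    return total_len.to_bytes(2, "big") + body

info = encode_index_col_info([
    (0, (1).to_bytes(4, "big")),  # id column: <0, 4, 1>
    (3, "AWM".encode("utf-8")),   # key1 column: <3, 3, 'AWM'>
])
print(len(info), int.from_bytes(info[:2], "big"))  # 13 13
```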
3.3 Undo log corresponding to UPDATE operation
When executing an UPDATE statement, InnoDB handles the two cases of updating the primary key and not updating the primary key completely differently.
3.3.1 The case where the primary key is not updated
When the primary key is not updated, there are two sub-cases: the storage space occupied by the updated columns stays the same, or it changes.
In-place update
When updating a record, if every updated column occupies the same amount of storage space after the update as before it, an in-place update can be performed, that is, the values of the corresponding columns are modified directly in the original record. To repeat: every updated column must occupy the same amount of storage before and after the update. If any updated column occupies more or less storage after the update than before, an in-place update cannot be performed. For example, the demo18 table has a record with id value 2, and the sizes of its columns are shown in the figure (because the utf8 character set is used, the two characters '步枪' (rifle) occupy 6 bytes):
Suppose we have an UPDATE statement like this:
UPDATE demo18 SET key1 = 'P92', col = '手枪' WHERE id = 2;
In this UPDATE statement, the col column is updated from '步枪' (rifle) to '手枪' (pistol), which occupies 6 bytes both before and after, so its storage space does not change. The key1 column, however, is updated from M416 to P92, that is, from 4 bytes to 3 bytes. This does not meet the condition for an in-place update, so an in-place update cannot be performed. But if the UPDATE statement looks like this:
UPDATE demo18 SET key1 = 'M249', col = '机枪' WHERE id = 2;
Since the storage space occupied by each updated column is the same before and after the update, such a statement can perform an in-place update.
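The in-place condition can be expressed as a small check. This is a sketch under assumptions: `can_update_in_place` is a hypothetical helper, it only handles string columns, and it measures storage with Python's UTF-8 encoder, which gives the same byte counts as MySQL's utf8 for the characters used here.

```python
def can_update_in_place(old_row, new_values, charset="utf-8"):
    """Return True only if every updated column keeps exactly the same byte length."""
    for column, new_value in new_values.items():
        old_size = len(old_row[column].encode(charset))
        new_size = len(new_value.encode(charset))
        if old_size != new_size:
            return False
    return True

row = {"key1": "M416", "col": "步枪"}  # the record with id = 2

first = can_update_in_place(row, {"key1": "P92", "col": "手枪"})    # key1 shrinks from 4 to 3 bytes
second = can_update_in_place(row, {"key1": "M249", "col": "机枪"})  # all sizes unchanged
print(first, second)  # False True
```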
Delete old records first, then insert new ones
When the primary key is not updated, if the storage space occupied by any updated column differs before and after the update, the old record must first be deleted from the clustered index page, and then a new record is created from the updated column values and inserted into the page.
Please note that the deletion we are talking about here is not a delete mark operation but a true deletion: the record is removed from the normal record list and added to the garbage list, and the corresponding statistics of the page (such as PAGE_FREE and PAGE_GARBAGE) are modified. However, this true deletion is not carried out by the dedicated thread that performs the purge for a DELETE statement; it is performed synchronously by the user thread. After the true deletion, the new record created from the updated column values is inserted.
If the storage space occupied by the newly created record does not exceed the space occupied by the old record, the storage space of the old record that was just added to the garbage list can be reused directly. Otherwise, a new piece of space must be allocated in the page for the new record. If the page has no space available, a page split is required before the new record is inserted.
For an UPDATE that does not update the primary key (covering both the in-place update and the delete-then-insert case described above), InnoDB designed an undo log of type TRX_UNDO_UPD_EXIST_REC. Its complete structure is as follows:
Most of its attributes are similar to those of the TRX_UNDO_DEL_MARK_REC undo log we have already introduced, but we still need to pay attention to the following points:
- The n_updated attribute indicates how many columns this UPDATE statement updates. The <pos, old_len, old_value> entries that follow give, for each updated column, its position in the record, the storage space it occupied before the update, and its actual value before the update.
- If the columns updated by the UPDATE statement include an index column, the index column information part is added as well; otherwise this part is not added.
Now let's continue in the transaction with id 100 above and update a record, for example the record with id 2:
BEGIN; # Explicitly start a transaction; assume its transaction id is 100
# Insert two records
INSERT INTO demo18(id, key1, col) VALUES (1, 'AWM', '狙击枪'), (2, 'M416', '步枪');
# Delete one record
DELETE FROM demo18 WHERE id = 1;
# Update one record
UPDATE demo18 SET key1 = 'M249', col = '机枪' WHERE id = 2;
The sizes of the columns updated by this UPDATE statement do not change, so it can be executed as an in-place update. Before the page record is actually changed, an undo log of type TRX_UNDO_UPD_EXIST_REC is recorded first, which looks like this:
With this picture, let's pay attention to the following points:
- Because this is the 4th undo log generated in the transaction with id 100, its undo no is 3.
- The roll_pointer of this log points to the log with undo no 1, which is the undo log generated when the record with primary key value 2 was inserted, that is, the undo log generated by the previous change to this record.
- Since this UPDATE statement updates the value of the index column key1, the index column information must be recorded, that is, the pre-update information of the primary key and the key1 column is filled in.
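The four undo logs generated so far by transaction 100, and the version chains they form, can be modeled as follows. This is a simplification made for illustration: roll pointers are represented by the undo no they point to rather than by real 7-byte addresses, and the dictionary layout is invented for the example.

```python
# Undo logs of transaction 100, keyed by undo no.
undo_logs = {
    0: {"type": "TRX_UNDO_INSERT_REC",    "pk": 1, "old_roll_pointer": None},
    1: {"type": "TRX_UNDO_INSERT_REC",    "pk": 2, "old_roll_pointer": None},
    2: {"type": "TRX_UNDO_DEL_MARK_REC",  "pk": 1, "old_roll_pointer": 0},
    3: {"type": "TRX_UNDO_UPD_EXIST_REC", "pk": 2, "old_roll_pointer": 1},
}

# roll_pointer hidden column of each clustered index record: its latest undo log.
record_roll_pointer = {1: 2, 2: 3}

def version_chain(pk):
    """Follow roll_pointer, then old roll_pointer, from newest undo log to oldest."""
    chain = []
    pointer = record_roll_pointer[pk]
    while pointer is not None:
        chain.append(pointer)
        pointer = undo_logs[pointer]["old_roll_pointer"]
    return chain

print(version_chain(2))  # [3, 1]
print(version_chain(1))  # [2, 0]
```

For the record with id 2, the chain runs from the update log (undo no 3) back to the insert log (undo no 1), matching the description above.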
3.3.2 The case of updating the primary key
In the clustered index, records are linked into a singly linked list in order of primary key value. If we update the primary key value of a record, the position of the record in the clustered index changes. For example, if you update the primary key value of a record from 1 to 10000 and there are very many records with primary key values between 1 and 10000, the old and new positions may be very far apart in the clustered index, possibly separated by many pages. When an UPDATE statement updates the primary key value of a record, InnoDB processes the clustered index in two steps:
- Perform a delete mark operation on the old record.
Important:
This is a delete mark operation! That is, before the transaction containing the UPDATE statement commits, only a delete mark operation is performed on the old record; after the transaction commits, a dedicated thread performs the purge operation and adds the record to the garbage list. This must be distinguished from the case described above where the primary key value is not updated: there, the old record is really deleted first and then the new record is inserted!
Tip:
The reason the old record is only delete-marked is that other transactions may access this record at the same time. If it were truly deleted and added to the garbage list, other transactions would no longer be able to access it. This capability is the so-called MVCC, which we will discuss in detail in later chapters.
- Create a new record from the updated column values and insert it into the clustered index (its insertion position must be located again).
Since the primary key value of the updated record has changed, the position of this record must be located again in the clustered index before it is inserted.
For the case where an UPDATE statement updates the primary key value of a record, an undo log of type TRX_UNDO_DEL_MARK_REC is recorded before the delete mark operation on the record; later, when the new record is inserted, an undo log of type TRX_UNDO_INSERT_REC is recorded. That is, each time the primary key value of a record is changed, two undo logs are recorded. We have covered the formats of these logs above, so we won't repeat them here.
4. General Linked List Structure
Multiple linked lists are used in the process of writing undo logs, and many of them share the same node structure, as shown in the figure:
Within a table space, the position of a node can be uniquely identified by a page number and an offset within that page. Together these two pieces of information act as a pointer to the node. So:
- The combination of Pre Node Page Number and Pre Node Offset is a pointer to the previous node.
- The combination of Next Node Page Number and Next Node Offset is a pointer to the next node.
The whole List Node takes up 12 bytes of storage. To manage linked lists better, InnoDB also defines a base node structure, which stores the head node, the tail node, and the length of the linked list. The structure of the base node is as follows:
Where:
- List Length indicates how many nodes the linked list has.
- The combination of First Node Page Number and First Node Offset is a pointer to the head node of the linked list.
- The combination of Last Node Page Number and Last Node Offset is a pointer to the tail node of the linked list.
The whole List Base Node takes up 16 bytes of storage. A linked list built from the List Base Node and List Node structures looks like this:
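The two structures above can be sketched as fixed-size binary records. This is only an illustration, not InnoDB's actual code: it assumes each pointer is a 4-byte page number plus a 2-byte in-page offset and that List Length is 4 bytes (which matches the 12-byte and 16-byte totals given above), and it assumes big-endian byte order. The function names are hypothetical.

```python
import struct

# Assumed layout: every pointer = 4-byte page number + 2-byte in-page offset.
# ">" means big-endian with no padding (an assumption of this sketch).
LIST_NODE = struct.Struct(">IHIH")        # prev page, prev offset, next page, next offset
LIST_BASE_NODE = struct.Struct(">IIHIH")  # length, first page, first offset, last page, last offset


def pack_list_node(prev, next_):
    """prev and next_ are (page_number, offset) tuples."""
    return LIST_NODE.pack(prev[0], prev[1], next_[0], next_[1])


def unpack_list_node(raw):
    """Return ((prev_page, prev_offset), (next_page, next_offset))."""
    prev_page, prev_off, next_page, next_off = LIST_NODE.unpack(raw)
    return (prev_page, prev_off), (next_page, next_off)


def pack_base_node(length, first, last):
    """first and last are (page_number, offset) tuples for head and tail."""
    return LIST_BASE_NODE.pack(length, first[0], first[1], last[0], last[1])


node = pack_list_node((3, 120), (7, 64))
base = pack_base_node(2, (3, 120), (7, 64))
print(len(node), len(base))  # prints "12 16"
```

The sizes printed match the 12 bytes of a List Node and the 16 bytes of a List Base Node described above.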
Five, FIL_PAGE_UNDO_LOG page
When we discussed the table space earlier, we said that a table space is made up of many pages, with a default page size of 16KB. These pages come in different types. For example, pages of type FIL_PAGE_INDEX store clustered and secondary indexes, and pages of type FIL_PAGE_TYPE_FSP_HDR store table space header information. Among the various other types, one is called FIL_PAGE_UNDO_LOG; pages of this type are used specifically to store undo logs. The general structure of this type of page is shown in the following figure (taking the default 16KB size as an example):
A page of type FIL_PAGE_UNDO_LOG is simply called an Undo page. The File Header and File Trailer in the figure above are structures common to all page types. We have covered them many times, so we won't repeat them here. The Undo Page Header is unique to Undo pages. Let's take a look at its structure:
The meaning of each attribute is as follows:
- TRX_UNDO_PAGE_TYPE: which kind of undo logs this page is going to store.
We introduced several types of undo logs earlier. They fall into two categories:
  - TRX_UNDO_INSERT (decimal 1): undo logs of type TRX_UNDO_INSERT_REC belong to this category. They are generally produced by INSERT statements, and also by UPDATE statements that update the primary key.
  - TRX_UNDO_UPDATE (decimal 2): all undo logs other than type TRX_UNDO_INSERT_REC belong to this category, such as the TRX_UNDO_DEL_MARK_REC and TRX_UNDO_UPD_EXIST_REC types mentioned earlier. The undo logs produced by DELETE and UPDATE statements generally belong to this category.
The TRX_UNDO_PAGE_TYPE attribute can only take one of the two values above, marking which category of undo logs the page stores. Undo logs of different categories cannot be stored together. For example, if the TRX_UNDO_PAGE_TYPE value of an Undo page is TRX_UNDO_INSERT, the page can only store undo logs of type TRX_UNDO_INSERT_REC, and other types of undo logs cannot be placed in it.
Tip:
The undo logs are divided into two categories because undo logs of type TRX_UNDO_INSERT_REC can be deleted directly once the transaction commits, whereas other types of undo logs also serve the so-called MVCC and cannot be deleted directly, so the two are handled differently. If this passage confuses you, there is no need to reread it; for now you only need to know that undo logs fall into two categories. We will explain the details later.
- TRX_UNDO_PAGE_START: the offset in the current page at which undo logs start to be stored, that is, the starting offset of the first undo log in this page.
- TRX_UNDO_PAGE_FREE: the counterpart of TRX_UNDO_PAGE_START. It is the offset at which the last undo log stored in the current page ends, that is, new undo logs can be written starting from this position.
Suppose 3 undo logs have been written to the page; then TRX_UNDO_PAGE_START and TRX_UNDO_PAGE_FREE look like this:
Of course, at the very beginning, when no undo log has been written yet, TRX_UNDO_PAGE_START and TRX_UNDO_PAGE_FREE have the same value.
- TRX_UNDO_PAGE_NODE: a List Node structure (the ordinary linked-list node we just described). We will use this attribute shortly.
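How the three attributes TRX_UNDO_PAGE_TYPE, TRX_UNDO_PAGE_START, and TRX_UNDO_PAGE_FREE interact can be sketched with a small model. This is a simplified illustration under stated assumptions, not InnoDB's implementation: the `UndoPage` class, the starting offset 56, and the 20-byte log size are hypothetical, and the sketch ignores the File Trailer at the end of the page.

```python
TRX_UNDO_INSERT = 1  # category of TRX_UNDO_INSERT_REC logs
TRX_UNDO_UPDATE = 2  # category of all other undo log types


class UndoPage:
    """Toy model of the bookkeeping fields in an Undo Page Header."""

    def __init__(self, page_type, first_log_offset, page_size=16 * 1024):
        self.page_type = page_type          # TRX_UNDO_PAGE_TYPE
        self.page_start = first_log_offset  # TRX_UNDO_PAGE_START
        self.page_free = first_log_offset   # TRX_UNDO_PAGE_FREE (same as start when empty)
        self.page_size = page_size

    def append(self, category, log_size):
        """Write one undo log of log_size bytes; return False if the page is full."""
        if category != self.page_type:
            raise ValueError("undo log category does not match TRX_UNDO_PAGE_TYPE")
        if self.page_free + log_size > self.page_size:
            return False
        self.page_free += log_size  # the next log starts where this one ends
        return True


page = UndoPage(TRX_UNDO_INSERT, first_log_offset=56)  # 56 is an illustrative offset
for _ in range(3):
    page.append(TRX_UNDO_INSERT, 20)
print(page.page_start, page.page_free)  # prints "56 116"
```

After three writes TRX_UNDO_PAGE_START still points at the first log, while TRX_UNDO_PAGE_FREE has moved to the end of the third one. Appending a TRX_UNDO_UPDATE log to this page raises an error, mirroring the rule that the two categories are never mixed in one page.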
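Six, Undo page linked list
6.1 Undo page linked list in a single transaction
Because a transaction may contain multiple statements, and one statement may change several records, and 1 or 2 undo logs must be recorded before each record is changed, a transaction may generate many undo logs during its execution. These logs may not fit in one page and have to be placed in multiple pages, which are linked into a list through the TRX_UNDO_PAGE_NODE attribute introduced above:
Look at the figure above. In general, the first Undo page in the linked list is called the first undo page, because besides the Undo Page Header it also records some other management information. The remaining Undo pages are called normal undo pages.
During the execution of a transaction, INSERT, DELETE, and UPDATE statements may be mixed, which means different types of undo logs are generated. But as we said earlier, a given Undo page stores either only undo logs of the TRX_UNDO_INSERT category or only undo logs of the TRX_UNDO_UPDATE category; they cannot be mixed. So a transaction may need 2 Undo page linked lists during its execution, one called the insert undo linked list and the other called the update undo linked list. A schematic diagram looks like this: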
In addition, InnoDB requires that the undo logs generated when records of ordinary tables and of temporary tables are changed be recorded separately (explained later), so a transaction has at most 4 linked lists made of Undo pages:
Of course, these 4 linked lists are not all allocated to a transaction when it starts; they are allocated on demand. The specific allocation strategy is as follows:
- When the transaction has just started, no Undo page linked list is allocated.
- When the transaction inserts a record into an ordinary table or updates the primary key of a record during execution, an insert undo linked list for ordinary tables is allocated to it.
- When the transaction deletes or updates records in an ordinary table during execution, an update undo linked list for ordinary tables is allocated to it.
- When the transaction inserts a record into a temporary table or updates the primary key of a record during execution, an insert undo linked list for temporary tables is allocated to it.
- When the transaction deletes or updates records in a temporary table during execution, an update undo linked list for temporary tables is allocated to it.
In short: allocate a list when it is needed, and do not allocate it when it is not.
6.2 Undo page linked list in multiple transactions
To make writing undo logs as efficient as possible, the undo logs generated by different transactions are written to different Undo page linked lists. For example, suppose there are two transactions with transaction ids 1 and 2, which we call trx 1 and trx 2, and that during their execution:
- trx 1 performs a DELETE on an ordinary table and INSERT and UPDATE operations on a temporary table. InnoDB allocates 3 linked lists to trx 1:
  - an update undo linked list for ordinary tables;
  - an insert undo linked list for temporary tables;
  - an update undo linked list for temporary tables.
- trx 2 performs INSERT, UPDATE, and DELETE operations on ordinary tables and makes no changes to temporary tables. InnoDB allocates 2 linked lists to trx 2:
  - an insert undo linked list for ordinary tables;
  - an update undo linked list for ordinary tables.
To sum up, during the execution of trx 1 and trx 2, InnoDB needs to allocate a total of 5 Undo page linked lists for these two transactions, as shown in the figure:
If there are more transactions, more Undo page linked lists may be created.
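The on-demand allocation rule from section 6.1 and the trx 1 / trx 2 example can be checked with a small model. This is a sketch under stated assumptions rather than InnoDB's logic: the `Transaction` class and its method names are hypothetical, each linked list is modeled as a plain Python list, and an UPDATE that changes the primary key is modeled as producing one log in each category, following section 3.3.2.

```python
class Transaction:
    """Toy model: Undo page linked lists are created lazily, at most 4 per transaction."""

    def __init__(self, trx_id):
        self.trx_id = trx_id
        # Key: (table kind, undo category); value: the undo logs in that list.
        self.undo_lists = {}

    def record_change(self, table_kind, operation, updates_primary_key=False):
        if table_kind not in ("ordinary", "temporary"):
            raise ValueError(table_kind)
        if operation == "INSERT":
            categories = ["insert"]
        elif operation == "DELETE":
            categories = ["update"]
        elif operation == "UPDATE":
            # Updating the primary key delete-marks the old record (update category)
            # and inserts a new record (insert category).
            categories = ["update", "insert"] if updates_primary_key else ["update"]
        else:
            raise ValueError(operation)
        for category in categories:
            # setdefault allocates the list only the first time it is needed.
            self.undo_lists.setdefault((table_kind, category), []).append(operation)


trx1 = Transaction(1)
trx1.record_change("ordinary", "DELETE")
trx1.record_change("temporary", "INSERT")
trx1.record_change("temporary", "UPDATE")

trx2 = Transaction(2)
for op in ("INSERT", "UPDATE", "DELETE"):
    trx2.record_change("ordinary", op)

print(len(trx1.undo_lists), len(trx2.undo_lists))  # prints "3 2"
```

The two transactions end up with 3 and 2 lists, 5 in total, which matches the count given above.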
7. The specific writing process of the undo log
7.1 The concept of segment (Segment)
If you have read the chapter on the table space carefully, you should remember the concept of a segment, to which we devoted a lot of space. Simply put, a segment is a logical concept, made up of some scattered pages and some complete extents. For example, a B+ tree index is divided into two segments, a leaf node segment and a non-leaf node segment, so that leaf nodes are stored together as much as possible and non-leaf nodes are stored together as much as possible. Each segment corresponds to an INODE Entry structure, which describes the segment: its ID, the base nodes of the various linked lists in the segment, the page numbers of the scattered pages, and so on (see the table space chapter for the meaning of each attribute). We also said that, in order to locate an INODE Entry, InnoDB designed a Segment Header structure:
The whole Segment Header occupies 10 bytes, and the meaning of each attribute is as follows:
- Space ID of the INODE Entry: the ID of the table space that contains the INODE Entry structure.
- Page Number of the INODE Entry: the number of the page that contains the INODE Entry structure.
- Byte Offset of the INODE Ent: the offset of the INODE Entry structure within that page.
Knowing the table space ID, the page number, and the offset within the page, we can uniquely locate an INODE Entry.
Tip:
The segment concepts in this part are explained in detail in the chapter on the table space. They are mentioned here only to refresh your memory. If anything is unclear, go back and reread that chapter carefully.
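The 10-byte Segment Header can be sketched as a packed record. This is only an illustration: the split into a 4-byte space ID, a 4-byte page number, and a 2-byte offset is an assumption that matches the 10-byte total above, big-endian byte order is also assumed, and the helper names and sample values are hypothetical.

```python
import struct

# Assumed field widths: space ID (4 bytes), page number (4 bytes), byte offset (2 bytes).
SEGMENT_HEADER = struct.Struct(">IIH")


def pack_segment_header(space_id, page_no, byte_offset):
    """Encode the location of an INODE Entry."""
    return SEGMENT_HEADER.pack(space_id, page_no, byte_offset)


def unpack_segment_header(raw):
    """Decode (space_id, page_no, byte_offset) from the packed bytes."""
    return SEGMENT_HEADER.unpack(raw)


raw = pack_segment_header(space_id=0, page_no=2, byte_offset=242)  # illustrative values
print(len(raw), unpack_segment_header(raw))  # prints "10 (0, 2, 242)"
```

The three decoded values are exactly what is needed to find the INODE Entry: which table space, which page, and where in that page.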
7.2 Undo Log Segment Header
InnoDB stipulates that each Undo page linked list corresponds to a segment, called an Undo Log Segment; that is, the pages in the linked list are all allocated from this segment. Therefore, the first page of the Undo page linked list, the first undo page mentioned above, contains a part called the Undo Log Segment Header, which holds the segment header of the segment corresponding to the linked list and other information about the segment. So the first page of an Undo page linked list actually looks like this:
As you can see, the first page of an Undo page linked list has an Undo Log Segment Header in addition to what a normal page has. Let's take a look at its structure:
The meaning of each attribute is as follows:
- TRX_UNDO_STATE: the state of this Undo page linked list. An Undo Log Segment can be in one of the following states:
  - TRX_UNDO_ACTIVE: active, that is, an active transaction is writing undo logs to this segment.
  - TRX_UNDO_CACHED: cached. An Undo page linked list in this state is waiting to be reused by another transaction.
  - TRX_UNDO_TO_FREE: for an insert undo linked list, the state it enters if it cannot be reused after its transaction commits.
  - TRX_UNDO_TO_PURGE: for an update undo linked list, the state it enters if it cannot be reused after its transaction commits.
  - TRX_UNDO_PREPARED: the segment contains undo logs generated by a transaction in the PREPARE phase.
Tip:
When and how an Undo page linked list is reused will be discussed in detail later. The PREPARE phase of a transaction only appears in so-called distributed transactions, which this book does not cover further, so you can ignore this state for now.
- TRX_UNDO_LAST_LOG: the position of the last Undo Log Header in this Undo page linked list.
- TRX_UNDO_FSEG_HEADER: the Segment Header of the segment corresponding to this Undo page linked list (the 10-byte structure introduced in the previous section, through which the INODE Entry of the segment can be found).
- TRX_UNDO_PAGE_LIST: the base node of the Undo page linked list.
We said above that the Undo Page Header of an Undo page has a 12-byte TRX_UNDO_PAGE_NODE attribute, which represents a List Node structure. Every Undo page contains an Undo Page Header, and the pages can be linked into a list through this attribute. The TRX_UNDO_PAGE_LIST attribute represents the base node of this list. This base node exists only in the first page of the Undo page linked list, that is, in the first undo page.
Undo Log Header
The way a transaction writes undo logs to an Undo page is simple and blunt: it writes them straight in, one immediately after another, so the undo logs sit right next to each other. When an Undo page is full, a new page is allocated from the segment, inserted into the Undo page linked list, and writing continues in the new page. InnoDB treats the undo logs written into one Undo page linked list by the same transaction as a group. For example, trx 1 above is allocated 3 Undo page linked lists, so it writes 3 groups of undo logs; trx 2 is allocated 2 Undo page linked lists, so it writes 2 groups of undo logs. Each time a group of undo logs is written, some attributes of the group are recorded in front of it. InnoDB calls the place where these attributes are stored the Undo Log Header. Therefore, before any undo log is actually written to the first page of an Undo page linked list, the page is first filled with three parts: the Undo Page Header, the Undo Log Segment Header, and the Undo Log Header, as shown in the figure:
The specific structure of the Undo Log Header is as follows:
There are a lot of attributes again. Let's take a look at what they mean:
- TRX_UNDO_TRX_ID: the id of the transaction that generated this group of undo logs.
- TRX_UNDO_TRX_NO: a sequence number generated when the transaction commits, marking the commit order of transactions (a transaction that commits earlier gets a smaller number, and one that commits later gets a larger number).
- TRX_UNDO_DEL_MARKS: whether this group of undo logs contains undo logs generated by delete mark operations.
- TRX_UNDO_LOG_START: the offset in the page of the first undo log in this group.
- TRX_UNDO_XID_EXISTS: whether this group of undo logs contains XID information.
- TRX_UNDO_DICT_TRANS: whether this group of undo logs was generated by a DDL statement.
- TRX_UNDO_TABLE_ID: if TRX_UNDO_DICT_TRANS is true, the table id of the table operated on by the DDL statement.
- TRX_UNDO_NEXT_LOG: the offset in the page at which the next group of undo logs starts.
- TRX_UNDO_PREV_LOG: the offset in the page at which the previous group of undo logs starts.
Tip:
Generally, an Undo page linked list stores only one group of undo logs, generated by one transaction. In some cases, however, after a transaction commits, a later transaction may reuse this Undo page linked list, so one Undo page may end up holding several groups of undo logs. TRX_UNDO_NEXT_LOG and TRX_UNDO_PREV_LOG mark the in-page offsets of the next and previous groups. When and how the Undo page linked list is reused will be explained in detail later. For now, it is enough to understand what these two attributes mean.
- TRX_UNDO_HISTORY_NODE: a 12-byte List Node structure, representing a node of the so-called History linked list.
Summary
For an Undo page linked list that has not been reused, the first page of the list, the first undo page, is filled with three parts before any undo log is actually written: the Undo Page Header, the Undo Log Segment Header, and the Undo Log Header. Only then are undo logs written. The other pages, the normal undo pages, are filled only with the Undo Page Header before undo logs are written. The List Base Node of the linked list is stored in the Undo Log Segment Header of the first undo page, and the List Node information is stored in the Undo Page Header of each Undo page. So a schematic diagram of an Undo page linked list looks like this:
8. Reuse the Undo page
We said earlier that, to improve the performance of concurrent transactions writing undo logs, InnoDB allocates separate Undo page linked lists to each transaction (up to 4 per transaction). But this causes a problem. Most transactions modify only one or a few records, so for a given Undo page linked list only a few undo logs are generated, occupying very little storage space. Wouldn't it be wasteful to create a new Undo page linked list (even one with a single page) for so few undo logs every time a transaction starts? It would indeed, so InnoDB reuses a transaction's Undo page linked list in some cases after the transaction commits. The conditions under which an Undo page linked list can be reused are simple:
- The linked list contains only one Undo page.
If a transaction generates a very large number of undo logs during execution, it may allocate many pages and add them to its Undo page linked list. If the whole list were reused after the transaction commits, then even if the new transaction writes only a few undo logs to it, the list would still have to keep all those pages, and pages the new transaction does not use could not be used by other transactions either, which is another kind of waste. Therefore InnoDB stipulates that an Undo page linked list can be reused by the next transaction only when it contains a single Undo page.
- The space already used in that Undo page is less than 3/4 of the whole page.
As we said earlier, Undo page linked lists are divided into insert undo linked lists and update undo linked lists according to the category of undo logs they store. The two kinds are reused in different ways:
  - insert undo linked list
  An insert undo linked list stores only undo logs of type TRX_UNDO_INSERT_REC. Undo logs of this type are useless once the transaction commits and can be cleared. So when the insert undo linked list of a committed transaction (which has only one page) is reused, the group of undo logs written by the previous transaction can simply be overwritten, and the new transaction's group of undo logs is written from the start, as shown in the figure below:
  As shown in the figure, suppose a transaction uses an insert undo linked list and by the time it commits it has written only 3 undo logs to the list, which has allocated only one Undo page. If the space used in that page is less than 3/4 of the page size, the next transaction can reuse this insert undo linked list. When a new transaction reuses it, the old group of undo logs is overwritten directly by the new group.
  - update undo linked list
  After a transaction commits, the undo logs in its update undo linked list cannot be deleted immediately (they are used for MVCC, which we will discuss later). So if a later transaction reuses the update undo linked list, it cannot overwrite the undo logs written by the previous transaction. This amounts to writing multiple groups of undo logs in the same Undo page, and the effect looks like this:
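The two reuse conditions and the different reuse strategies can be put into a short sketch. It is a simplified model, not InnoDB's code: the function names are hypothetical, a group of undo logs is modeled as a Python list, and "used space" is passed in as a plain byte count.

```python
def can_reuse(page_count, used_bytes, page_size=16 * 1024):
    """An Undo page linked list is reusable only if it has a single page
    and less than 3/4 of that page is already used."""
    return page_count == 1 and used_bytes * 4 < page_size * 3


def reuse(kind, existing_groups, new_group):
    """Return the groups of undo logs held by the page after reuse."""
    if kind == "insert":
        # Old insert undo logs are useless after commit: overwrite them.
        return [new_group]
    if kind == "update":
        # Old update undo logs still serve MVCC: append the new group after them.
        return existing_groups + [new_group]
    raise ValueError(kind)


print(can_reuse(1, 4000))   # True: one page, well under 3/4 used
print(can_reuse(1, 12288))  # False: exactly 3/4 of a 16KB page is not "less than"
print(can_reuse(3, 4000))   # False: more than one page in the list
print(reuse("insert", [["old1", "old2"]], ["new1"]))
print(reuse("update", [["old1", "old2"]], ["new1"]))
```

The last two lines show the difference: the reused insert undo page holds only the new group, while the reused update undo page holds the old group followed by the new one.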
Nine, rollback segment
9.1 The concept of rollback segment
We now know that a transaction can be allocated up to 4 Undo page linked lists during execution, and that at any moment different transactions own different Undo page linked lists, so many Undo page linked lists can exist in the system at the same time. To manage these lists better, InnoDB designed a page called the Rollback Segment Header, which stores the page number of the first undo page of each Undo page linked list. These page numbers are called undo slots. We can think of it this way: each Undo page linked list is like a class, and the first undo page of the list is like the class monitor. Once you find the monitor, you can find the other students in the class (the other students are the normal undo pages). Sometimes the school needs to pass a message to the classes and calls all the monitors to a conference room; the Rollback Segment Header is like that conference room.
Let's take a look at what a Rollback Segment Header page looks like (taking the default 16KB as an example):
InnoDB stipulates that each Rollback Segment Header page corresponds to a segment, called the Rollback Segment. Unlike the segments we introduced before, a Rollback Segment actually contains only one page (perhaps the InnoDB designers felt that a segment must be created before pages can be allocated for a given purpose, or that although the Rollback Segment has only one page in the current MySQL version, more pages might be added in later versions).
Now that we know what a Rollback Segment is, let's look at the meaning of each part of the Rollback Segment Header page:
- TRX_RSEG_MAX_SIZE: the maximum total number of Undo pages in all the Undo page linked lists managed by this Rollback Segment. In other words, the sum of the numbers of Undo pages in all Undo page linked lists of this Rollback Segment cannot exceed the value of TRX_RSEG_MAX_SIZE.
The value of this attribute is unlimited by default, that is, we can write as many Undo pages as we want.
Tip:
Unlimited is an exaggeration. The largest number that 4 bytes can represent is 0xFFFFFFFF, but as we will see later, 0xFFFFFFFF has a special purpose, so the actual value of TRX_RSEG_MAX_SIZE is 0xFFFFFFFE.
- TRX_RSEG_HISTORY_SIZE: the number of pages occupied by the History linked list.
- TRX_RSEG_HISTORY: the base node of the History linked list.
- TRX_RSEG_FSEG_HEADER: the 10-byte Segment Header of the segment corresponding to this Rollback Segment, through which the INODE Entry of the segment can be found.
- TRX_RSEG_UNDO_SLOTS: the collection of page numbers of the first undo page of each Undo page linked list, that is, the collection of undo slots.
A page number occupies 4 bytes. For a 16KB page, the TRX_RSEG_UNDO_SLOTS part stores a total of 1024 undo slots, so it takes 1024 × 4 = 4096 bytes.
9.2 Apply for the Undo page linked list from the rollback segment
Initially, since no Undo page linked list has been allocated to any transaction, every undo slot in the Rollback Segment Header page is set to a special value, FIL_NULL (hexadecimal 0xFFFFFFFF), indicating that the undo slot does not point to any page.
As time goes by, transactions start needing Undo page linked lists. The search starts from the first undo slot of the rollback segment and checks whether the slot's value is FIL_NULL:
- If it is FIL_NULL, a new segment (an Undo Log Segment) is created in the table space, a page is allocated from the segment as the first undo page of the Undo page linked list, and the undo slot is set to the page number of that page, which means the undo slot is now assigned to this transaction.
- If it is not FIL_NULL, the undo slot already points to an Undo page linked list, that is, it is occupied by another transaction. The search moves on to the next undo slot, checks whether its value is FIL_NULL, and repeats the steps above.
A Rollback Segment Header page contains 1024 undo slots. If none of the 1024 undo slots is FIL_NULL, all 1024 undo slots are taken (assigned to some transaction). A new transaction can then no longer obtain a new Undo page linked list, so the transaction is rolled back and an error is reported to the user:
Too many active concurrent transactions
When users see this error, they can re-execute the transaction (other transactions may have committed by then, so the transaction can be allocated an Undo page linked list).
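The slot search described above can be sketched as a linear scan over the 1024 undo slots. This is an illustrative model only: the function and exception names are hypothetical, the slots are a Python list, and `new_first_page_no` stands in for the page number obtained after creating an Undo Log Segment and allocating its first undo page.

```python
FIL_NULL = 0xFFFFFFFF  # special value: the undo slot points to no page
SLOT_COUNT = 1024      # undo slots in one Rollback Segment Header page


class TooManyTransactions(Exception):
    """Raised when every undo slot is already taken."""


def allocate_slot(slots, new_first_page_no):
    """Scan from the first undo slot and take the first one equal to FIL_NULL.

    Returns the index of the slot that was assigned.
    """
    for index, value in enumerate(slots):
        if value == FIL_NULL:
            slots[index] = new_first_page_no  # slot now points to the first undo page
            return index
    raise TooManyTransactions("Too many active concurrent transactions")


slots = [FIL_NULL] * SLOT_COUNT
print(allocate_slot(slots, 100))  # prints 0: the first slot was free
print(allocate_slot(slots, 101))  # prints 1: slot 0 is taken, so the scan moves on
```

Once all 1024 entries hold a real page number, the next call raises the error, which corresponds to the transaction being rolled back with the message shown above.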
When a transaction commits, the undo slot it occupies has one of two fates:
- If the Undo page linked list that the undo slot points to meets the conditions for reuse (as described above, the list occupies only one page and the used space is less than 3/4 of the page):
The undo slot enters the cached state, and InnoDB stipulates that the TRX_UNDO_STATE attribute of the Undo page linked list (in the Undo Log Segment Header of the first undo page) is set to TRX_UNDO_CACHED.
Cached undo slots are added to a linked list, and which list depends on the type of the corresponding Undo page linked list:
  - If the corresponding Undo page linked list is an insert undo linked list, the undo slot is added to the insert undo cached linked list.
  - If the corresponding Undo page linked list is an update undo linked list, the undo slot is added to the update undo cached linked list.
A rollback segment has these two cached linked lists. When a new transaction needs an undo slot, it first looks in the corresponding cached linked list; if no cached undo slot is available, it then looks in the Rollback Segment Header page of the rollback segment.
- If the Undo page linked list that the undo slot points to does not meet the conditions for reuse, the list is handled differently depending on its type:
  - If it is an insert undo linked list, its TRX_UNDO_STATE attribute is set to TRX_UNDO_TO_FREE, the segment corresponding to the Undo page linked list is then released (so the pages in the segment can be used for other purposes), and the undo slot is set to FIL_NULL.
  - If it is an update undo linked list, its TRX_UNDO_STATE attribute is set to TRX_UNDO_TO_PURGE, the undo slot is set to FIL_NULL, and the group of undo logs written by this transaction is placed in the so-called History linked list (note that the segment corresponding to the Undo page linked list is not released here, because these undo logs are still needed).
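The two fates of an undo slot at commit time can be summarized in a small decision function. This is a sketch under stated assumptions, not InnoDB's implementation: the rollback segment is modeled as a dictionary, the cached lists, the History linked list, and the released segments are plain Python lists, and all names other than the TRX_UNDO_* states and FIL_NULL are hypothetical.

```python
FIL_NULL = 0xFFFFFFFF


def new_rollback_segment(slot_count=1024):
    """Toy rollback segment: undo slots plus the lists mentioned in the text."""
    return {
        "slots": [FIL_NULL] * slot_count,
        "cached": {"insert": [], "update": []},  # insert/update undo cached lists
        "history": [],                            # stands in for the History linked list
        "released_segments": [],                  # segments given back after commit
    }


def on_commit(rseg, slot_index, kind, page_count, used_bytes, page_size=16 * 1024):
    """Return the new TRX_UNDO_STATE of the Undo page linked list behind the slot."""
    if kind not in ("insert", "update"):
        raise ValueError(kind)
    reusable = page_count == 1 and used_bytes * 4 < page_size * 3
    if reusable:
        rseg["cached"][kind].append(slot_index)  # slot stays set and waits for reuse
        return "TRX_UNDO_CACHED"
    first_page_no = rseg["slots"][slot_index]
    rseg["slots"][slot_index] = FIL_NULL  # the slot becomes free again
    if kind == "insert":
        rseg["released_segments"].append(first_page_no)  # segment is released
        return "TRX_UNDO_TO_FREE"
    rseg["history"].append(first_page_no)  # logs kept for MVCC until purge
    return "TRX_UNDO_TO_PURGE"


rseg = new_rollback_segment()
rseg["slots"][0:3] = [100, 200, 300]
print(on_commit(rseg, 0, "insert", page_count=1, used_bytes=500))    # TRX_UNDO_CACHED
print(on_commit(rseg, 1, "insert", page_count=4, used_bytes=500))    # TRX_UNDO_TO_FREE
print(on_commit(rseg, 2, "update", page_count=1, used_bytes=16000))  # TRX_UNDO_TO_PURGE
```

Only the cached slot keeps its page number. The other two slots are reset to FIL_NULL, with the insert undo segment released and the update undo logs handed to the History list.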
9.3 Multiple rollback segments
We said that a transaction can be allocated at most 4 Undo page linked lists during execution, but a rollback segment has only 1024 undo slots, which is clearly not many. Even assuming a read-write transaction is allocated only 1 Undo page linked list, 1024 undo slots can support only 1024 read-write transactions executing at the same time; any more would fail. It is like a conference room that can seat only 1024 monitors at once: if thousands of people show up for a meeting at the same time, some monitors will have nowhere to sit and must wait for those ahead of them to finish before going in.
It is said that in its early days InnoDB indeed had only one rollback segment, but the InnoDB designers later recognized this problem. How to solve it? If there are not enough conference rooms, build more. So InnoDB defined 128 rollback segments in one go, which gives 128 × 1024 = 131072 undo slots. Assuming a read-write transaction is allocated only 1 Undo page linked list, 131072 read-write transactions can execute concurrently (I have never seen that many transactions running concurrently on one machine).
Each rollback segment corresponds to one Rollback Segment Header page, so 128 rollback segments mean 128 Rollback Segment Header pages. The addresses of these pages must be stored somewhere! So a region of page 5 of the system table space contains 128 8-byte cells:
Each 8-byte cell is structured like this:
As shown, each 8-byte cell consists of two parts:
- A 4-byte Space ID, representing the ID of a table space.
- A 4-byte Page number, representing a page number.
That is, each 8-byte cell acts as a pointer to a page in a table space, and that page is a Rollback Segment Header page. Note that locating a Rollback Segment Header requires the table space ID, which means different rollback segments may be located in different table spaces.
From the description above, we can roughly see that page 5 of the system table space stores the addresses of 128 Rollback Segment Header pages, each of which corresponds to a rollback segment. Each Rollback Segment Header page in turn contains 1024 undo slots, and each undo slot corresponds to an Undo page linked list. Let's draw a diagram:
Things are much clearer once the picture is drawn.
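The region of 128 cells in page 5 can be sketched as an array of 8-byte records, each holding a 4-byte Space ID and a 4-byte Page number as described above. This is only an illustration: big-endian byte order is assumed, the helper name is hypothetical, and the sample space IDs and page numbers are made up.

```python
import struct

RSEG_COUNT = 128              # rollback segments, one cell each
SLOTS_PER_RSEG = 1024         # undo slots per Rollback Segment Header page
CELL = struct.Struct(">II")   # Space ID (4 bytes) + Page number (4 bytes)


def parse_cells(raw):
    """Decode the 128 (space_id, page_no) pointers to Rollback Segment Header pages."""
    return [CELL.unpack_from(raw, i * CELL.size) for i in range(RSEG_COUNT)]


# Build sample data: every rollback segment in table space 0, pages 6, 7, 8, ...
raw = b"".join(CELL.pack(0, 6 + i) for i in range(RSEG_COUNT))
cells = parse_cells(raw)

print(len(raw))                     # prints 1024: 128 cells of 8 bytes
print(cells[0], cells[127])         # prints "(0, 6) (0, 133)"
print(RSEG_COUNT * SLOTS_PER_RSEG)  # prints 131072 undo slots in total
```

Following one cell leads to a Rollback Segment Header page, and each of its 1024 undo slots leads in turn to the first undo page of an Undo page linked list.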
9.4 Classification of Rollback Segments
Let's number the 128 rollback segments. The initial rollback segment is called rollback segment No. 0, and then increments successively. The last rollback segment is called rollback segment No. 127. The 128 rollback segments can be divided into two categories:
- Rollback segments No. 0 and No. 33~127 belong to one category. Rollback segment No. 0 must be in the system tablespace (that is, the Rollback Segment Header page of rollback segment No. 0 must be in the system tablespace), while rollback segments No. 33~127 can be either in the system tablespace or in undo tablespaces that you configure yourself; we will talk about how to configure them later. If a transaction needs to allocate an Undo page linked list because it changes records of an ordinary table, the corresponding undo slot must be allocated from this category of rollback segments.
- Rollback segments No. 1~32 belong to the other category. These rollback segments must be in the temporary tablespace (corresponding to the ibtmp1 file in the data directory). If a transaction needs to allocate an Undo page linked list because it changes records of a temporary table, the corresponding undo slot must be allocated from this category of rollback segments.
That is to say, if a transaction changes both records of an ordinary table and records of a temporary table during execution, it needs to be assigned 2 rollback segments, and an undo slot is then allocated in each of the two rollback segments.
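The numbering rule above can be written down as a tiny Python sketch. The function name `rollback_segment_category` and the labels "ordinary" and "temporary" are made up for illustration and are not InnoDB identifiers.

```python
def rollback_segment_category(number: int) -> str:
    """Classify a rollback segment number (0~127) by the kind of table it serves."""
    if not 0 <= number <= 127:
        raise ValueError("rollback segment numbers run from 0 to 127")
    if 1 <= number <= 32:
        return "temporary"  # lives in the temporary tablespace (ibtmp1)
    return "ordinary"       # No. 0 and No. 33~127


# Count how many segments fall into each category.
ordinary_count = sum(rollback_segment_category(n) == "ordinary" for n in range(128))
temporary_count = sum(rollback_segment_category(n) == "temporary" for n in range(128))
```

Under this rule, 96 of the 128 rollback segments serve ordinary tables and 32 serve temporary tables.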
You may be wondering why rollback segments are divided into different categories for ordinary tables and temporary tables. This has to start with the Undo page itself. What we call an Undo page is just shorthand for a page of type FIL_PAGE_UNDO_LOG; after all, it is also an ordinary page. As we said before, the corresponding redo log must be written before a page is modified, so that when the system crashes and restarts, it can be restored to the state before the crash. Writing undo logs into an Undo page is itself a page modification. For this reason, the designers of InnoDB defined many types of redo log for it, such as MLOG_UNDO_HDR_CREATE, MLOG_UNDO_INSERT, MLOG_UNDO_INIT, and so on. That is to say, any change we make to an Undo page records a redo log of the corresponding type. But for temporary tables, the undo logs generated by modifying temporary tables only need to be valid while the system is running. If the system crashes, the Undo pages holding these undo logs do not need to be restored on restart, so there is no need to record redo logs when writing Undo pages for temporary tables. To summarize the reason for dividing rollback segments into different categories for ordinary tables and temporary tables: modifying Undo pages in rollback segments for ordinary tables requires recording the corresponding redo logs, while modifying Undo pages in rollback segments for temporary tables does not.
Tip:
If we only change records of ordinary tables, the transaction is assigned only a rollback segment for ordinary tables, and no rollback segment for temporary tables. But if we only change records of temporary tables, the transaction is assigned both a rollback segment for ordinary tables and a rollback segment for temporary tables (although assigning a rollback segment does not immediately allocate an undo slot; an undo slot in the rollback segment is allocated only when an Undo page linked list is really needed).
9.5 Detailed process of allocating Undo page linked list for transaction
A lot of concepts have been mentioned above, and you may be feeling a little dizzy. Next, let's take a transaction that changes records of an ordinary table as an example and walk through the complete process of allocating an Undo page linked list during transaction execution.
- Before the transaction changes records of an ordinary table for the first time, it is first assigned a rollback segment from page 5 of the system tablespace (in fact, it obtains the address of a Rollback Segment Header page). Once a rollback segment has been assigned to the transaction, later changes to records of ordinary tables in the same transaction do not trigger another assignment. Rollback segments are assigned in the legendary round-robin fashion. For example, if the current transaction is assigned rollback segment No. 0, the next transaction is assigned rollback segment No. 33, the one after that rollback segment No. 34, and so on. To put it simply, the rollback segments are handed out to different transactions in turn (it's so simple and crude that there's nothing more to say).
- After the rollback segment is assigned, first check whether either of the rollback segment's two cached linked lists holds a cached undo slot. For example, if the transaction performs an INSERT operation, check the insert undo cached linked list of the rollback segment; if the transaction performs a DELETE operation, check the update undo cached linked list of the rollback segment. If there is a cached undo slot, assign it to the transaction.
- If there is no cached undo slot available, find a free undo slot in the Rollback Segment Header page and allocate it to the current transaction. The way to find a free undo slot in the Rollback Segment Header page was mentioned above: start from undo slot No. 0; if its value is FIL_NULL, the undo slot is free and is assigned to the current transaction; otherwise check whether undo slot No. 1 satisfies the condition, and so on, up to the last undo slot. If none of the 1024 undo slots has the value FIL_NULL, an error is reported (generally this will not happen).
- After a usable undo slot is found, if it came from a cached linked list, its Undo Log Segment has already been allocated; otherwise a new Undo Log Segment must be allocated, and a page is then requested from that Undo Log Segment as the first undo page of the Undo page linked list.
- The transaction can then write undo logs into the Undo page linked list obtained above!
The steps for changing records of a temporary table are the same as those described above, so I won't go into details here. However, it needs to be emphasized again that if a transaction changes both records of an ordinary table and records of a temporary table during execution, it needs to be assigned 2 rollback segments. In fact, different transactions executing concurrently can also be assigned the same rollback segment, as long as they are assigned different undo slots.
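The allocation steps above can be sketched in Python as follows. This is only a simplified model, not InnoDB's actual code: the class and function names are invented for illustration, the cached linked lists are modeled as plain lists of slot indexes, and the value 0xFFFFFFFF for FIL_NULL is an assumption.

```python
FIL_NULL = 0xFFFFFFFF   # assumed sentinel value that marks a free undo slot
UNDO_SLOT_COUNT = 1024  # undo slots per Rollback Segment Header page


class RollbackSegment:
    def __init__(self, number):
        self.number = number
        self.slots = [FIL_NULL] * UNDO_SLOT_COUNT   # all undo slots start free
        self.cached = {"insert": [], "update": []}  # the two cached lists


def pick_rollback_segment(segments, cursor):
    """Round-robin: return the segment at the cursor and the advanced cursor."""
    rseg = segments[cursor % len(segments)]
    return rseg, (cursor + 1) % len(segments)


def allocate_undo_slot(rseg, kind, new_first_page_no):
    """Return (slot_index, from_cache); kind is "insert" or "update"."""
    cached = rseg.cached[kind]
    if cached:
        # A cached undo slot already has its Undo Log Segment allocated.
        return cached.pop(0), True
    for index, value in enumerate(rseg.slots):
        if value == FIL_NULL:
            # Free slot: record the first undo page of the new linked list.
            rseg.slots[index] = new_first_page_no
            return index, False
    raise RuntimeError("no free undo slot in this rollback segment")


# Rollback segments for ordinary tables: No. 0 and No. 33~127.
ordinary = [RollbackSegment(n) for n in [0, *range(33, 128)]]
cursor = 0
rseg_a, cursor = pick_rollback_segment(ordinary, cursor)
rseg_b, cursor = pick_rollback_segment(ordinary, cursor)
first = allocate_undo_slot(rseg_a, "insert", new_first_page_no=100)
second = allocate_undo_slot(rseg_a, "update", new_first_page_no=101)
```

Here `rseg_a` is rollback segment No. 0 and `rseg_b` is No. 33, matching the round-robin example. The two allocations on `rseg_a` land in different undo slots, which is the same reason concurrent transactions can share one rollback segment. The page numbers 100 and 101 are arbitrary placeholders.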
9.6 Rollback segment related configuration
9.6.1 Configure the number of rollback segments
We said earlier that there are a total of 128 rollback segments in the system. In fact, this is only the default value. We can configure the number of rollback segments through the startup parameter innodb_rollback_segments. The configurable range is 1~128. But this parameter does not affect the number of rollback segments for temporary tables, which is always 32. That is to say:
- If we set innodb_rollback_segments to 1, only 1 rollback segment is available for ordinary tables, but there are still 32 rollback segments available for temporary tables.
- If we set innodb_rollback_segments to a number between 2~33, the effect is the same as setting it to 1.
- If we set innodb_rollback_segments to a number greater than 33, the number of rollback segments available for ordinary tables is that value minus 32.
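The three cases above boil down to a small calculation, sketched here in Python. The function name `ordinary_rollback_segments` is hypothetical. It only restates the rule described in this section and does not query a real server.

```python
TEMP_ROLLBACK_SEGMENTS = 32  # always 32, regardless of the setting


def ordinary_rollback_segments(innodb_rollback_segments: int) -> int:
    """Number of rollback segments usable for ordinary tables under the rule above."""
    if not 1 <= innodb_rollback_segments <= 128:
        raise ValueError("innodb_rollback_segments must be in the range 1~128")
    if innodb_rollback_segments <= 33:
        return 1  # values 1~33 all behave like 1
    return innodb_rollback_segments - TEMP_ROLLBACK_SEGMENTS


default_count = ordinary_rollback_segments(128)
small_count = ordinary_rollback_segments(20)
```

With the default value of 128, this gives 96 rollback segments for ordinary tables, which is rollback segment No. 0 plus No. 33~127.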
9.6.2 Configuring the undo tablespace
By default, the rollback segments set up for ordinary tables (rollback segments No. 0 and No. 33~127) are all allocated in the system tablespace. Rollback segment No. 0 is always in the system tablespace, but rollback segments No. 33~127 can be placed in custom undo tablespaces through configuration. However, this configuration can only be made when the system is initialized (when the data directory is created). Once initialization is complete, it cannot be changed. Let's take a look at the relevant startup parameters:
- innodb_undo_directory specifies the directory where the undo tablespaces are located. If this parameter is not specified, the undo tablespaces are placed in the data directory by default.
- innodb_undo_tablespaces defines the number of undo tablespaces. The default value of this parameter is 0, indicating that no undo tablespace is created. Rollback segments No. 33~127 can be evenly distributed across the different undo tablespaces.
Tip:
If we specify that undo tablespaces are to be created when the system is initialized, rollback segment No. 0 in the system tablespace becomes unavailable.
For example, if we specify innodb_rollback_segments as 35 and innodb_undo_tablespaces as 2 when initializing the system, rollback segments No. 33 and No. 34 will each be placed in one undo tablespace.
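The example with innodb_rollback_segments set to 35 and innodb_undo_tablespaces set to 2 can be reproduced with a short Python sketch. The text only says the rollback segments are "evenly distributed", so the round-robin placement is an assumption. The function name and the tablespace numbering 1..N are also hypothetical.

```python
def distribute_rollback_segments(innodb_rollback_segments: int,
                                 innodb_undo_tablespaces: int) -> dict:
    """Map each undo tablespace (numbered 1..N here) to the rollback segments it holds."""
    if innodb_undo_tablespaces == 0:
        return {}  # no undo tablespace is created
    layout = {t: [] for t in range(1, innodb_undo_tablespaces + 1)}
    # Rollback segments for ordinary tables other than No. 0 start at No. 33.
    for i, segment in enumerate(range(33, innodb_rollback_segments)):
        # Assumed placement: hand segments out to the tablespaces in turn.
        layout[1 + i % innodb_undo_tablespaces].append(segment)
    return layout


layout = distribute_rollback_segments(35, 2)
```

For this input, each of the two undo tablespaces ends up with exactly one rollback segment, No. 33 in one and No. 34 in the other.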
One of the benefits of setting up undo tablespaces is that when the file of an undo tablespace grows large enough, the undo tablespace can be automatically truncated into a small file. The system tablespace, by contrast, can only keep growing and cannot be truncated.