MySQL storage engine: InnoDB data page structure

1. Data page structure

The 16KB of storage space in a data page is divided into several parts, each with its own function. An InnoDB data page has roughly 7 parts: File Header, Page Header, Infimum + Supremum, User Records, Free Space, Page Directory, and File Trailer. Some parts occupy a fixed number of bytes; the size of the others varies.

Storage of records

Among the 7 parts of the page, the records we store ourselves are written to the User Records part in the row format we specify. When a page is first created, however, it has no User Records part. Each time we insert a record, a record-sized chunk of space is taken from the Free Space part (the storage space not yet used) and assigned to the User Records part. Once all of Free Space has been taken over by User Records, the page is used up, and inserting further records requires allocating a new page.
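The hand-over from Free Space to User Records can be sketched in a few lines. This is only a toy model: the `Page` class and its fields are hypothetical names, and it ignores the fact that a real page also spends space on the other parts as records arrive.

```python
class Page:
    """Toy model of a page that only tracks Free Space and User Records."""

    def __init__(self, free_space: int) -> None:
        self.free_space = free_space   # bytes not yet used
        self.user_records = 0          # bytes already handed to User Records

    def insert(self, record_size: int) -> bool:
        """Take record_size bytes from Free Space; False means the page is full."""
        if record_size > self.free_space:
            return False               # caller has to allocate a new page
        self.free_space -= record_size
        self.user_records += record_size
        return True
```

When `insert` returns False, the caller would create a new `Page` and retry there, which corresponds to applying for a new page in the text above.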

Record header information

Name            Size (bits)  Description
reserved bit 1  1            Unused
reserved bit 2  1            Unused
delete_mask     1            Marks whether the record has been deleted
min_rec_mask    1            Set on the smallest record in the non-leaf nodes of each level of the B+ tree
n_owned         4            The number of records owned by the current record
heap_no         13           The position of the current record in the record heap
record_type     3            The type of the current record: 0 is a normal record, 1 is a B+ tree non-leaf node record, 2 is the smallest record, 3 is the largest record
next_record     16           The relative position of the next record
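The fields above add up to 40 bits, that is, 5 bytes. As a rough sketch, the header can be decoded like this, assuming the fields are packed in table order from the most significant bit, in big-endian byte order, and that next_record is read as a signed 16-bit value. `RecordHeader` and `parse_record_header` are hypothetical names used only for illustration.

```python
import struct
from typing import NamedTuple


class RecordHeader(NamedTuple):
    delete_mask: int
    min_rec_mask: int
    n_owned: int
    heap_no: int
    record_type: int
    next_record: int


def parse_record_header(raw: bytes) -> RecordHeader:
    """Decode the 5-byte record header (bit layout assumed as in the table)."""
    if len(raw) != 5:
        raise ValueError("a record header is exactly 5 bytes")
    # 1 byte of flags + n_owned, 2 bytes of heap_no + record_type,
    # 2 bytes of next_record (treated as signed: it can point backwards).
    flags, heap_and_type, next_record = struct.unpack(">BHh", raw)
    return RecordHeader(
        delete_mask=(flags >> 5) & 1,      # bit after the two reserved bits
        min_rec_mask=(flags >> 4) & 1,
        n_owned=flags & 0x0F,              # low 4 bits of the first byte
        heap_no=heap_and_type >> 3,        # high 13 bits
        record_type=heap_and_type & 0x07,  # low 3 bits
        next_record=next_record,
    )
```

For example, the bytes `00 00 10 00 20` decode to heap_no 2, record_type 0 and next_record 32, which is what a first user record could look like.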

For example, create a table and insert four records:

CREATE TABLE page_demo(
  c1 INT,
  c2 INT,
  c3 VARCHAR(10000),
  PRIMARY KEY (c1)
) CHARSET=ascii ROW_FORMAT=Compact;

INSERT INTO page_demo VALUES(1, 100, 'aaaa'), (2, 200, 'bbbb'), (3, 300, 'cccc'),(4, 400, 'dddd');


Records are stored back to back in User Records with no gaps between them. Diagrams of the page usually draw each record on its own line only to make them easier to read.

  • delete_mask
    Marks whether the current record has been deleted; it occupies 1 bit. A value of 0 means the record has not been deleted, and 1 means it has been deleted.

Deleted records are not removed from disk immediately, because removing them would require rearranging the other records on disk, which costs performance. Instead they are only marked as deleted, and all deleted records form a so-called garbage linked list. The space occupied by the records in this list is called reusable space: when new records are later inserted into the table, they may overwrite the storage occupied by these deleted records.

  • heap_no
    Indicates the position of the current record in this page.
  1. The positions of the 4 records we inserted are 2, 3, 4 and 5.
  2. Positions 0 and 1 belong to two records that InnoDB automatically adds to every page. Since we did not insert them ourselves, they are sometimes called pseudo-records or virtual records. One of them represents the smallest record and the other represents the largest record.
  3. For a complete record, comparing records means comparing their primary keys. For example, the primary key values of the 4 records we inserted are 1, 2, 3 and 4, so these 4 records increase in that order.
  4. Each pseudo-record consists of a 5-byte record header and a fixed 8-byte body.
    Since these two records are not defined by us, they are not stored in the User Records part of the page; they are kept separately in a part called Infimum + Supremum.
  • record_type
    Indicates the type of the current record. There are 4 types in total: 0 for ordinary records, 1 for B+ tree non-leaf node records, 2 for the smallest record, and 3 for the largest record.

The records we insert ourselves are ordinary records, so their record_type is 0, while the record_type values of the smallest and largest records are 2 and 3 respectively.

  • next_record
    The address offset from the real data of the current record to the real data of the next record.

"Next record" does not mean the next record in insertion order, but the next record in ascending primary key order. By convention, the record after the Infimum record (the smallest record) is the user record with the smallest primary key in the page, and the record after the user record with the largest primary key is the Supremum record (the largest record). The records therefore form a singly linked list in ascending primary key order.
However we insert, delete, or modify records in the page, InnoDB always maintains this singly linked list, with the nodes linked in ascending primary key order. For example, if we delete the record with primary key 2, the record drops out of the list but its storage space is not reclaimed. If we then insert this record into the table again, InnoDB does not allocate new storage for it; it directly reuses the space of the deleted record. (When a page holds several deleted records, their next_record attributes chain them into a garbage list so that this space can be reused later.)
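The linked list can be sketched with relative offsets as follows. All page offsets in the usage note are made-up illustrative values, `Record`, `walk` and `delete` are hypothetical names, and setting the deleted record's next_record to 0 is a simplification (the garbage list is not modelled).

```python
from dataclasses import dataclass


@dataclass
class Record:
    key: object           # primary key, or a label for a pseudo-record
    next_record: int      # relative offset to the next record, 0 = end of list
    delete_mask: int = 0


def walk(page: dict, start: int) -> list:
    """Follow next_record from the record at `start` and collect the keys."""
    keys, offset = [], start
    while True:
        rec = page[offset]
        keys.append(rec.key)
        if rec.next_record == 0:
            return keys
        offset += rec.next_record   # relative, so it may also be negative


def delete(page: dict, start: int, key) -> bool:
    """Mark a record as deleted and unlink it; its space stays in the page."""
    prev_off = start
    while page[prev_off].next_record != 0:
        cur_off = prev_off + page[prev_off].next_record
        cur = page[cur_off]
        if cur.key == key:
            page[prev_off].next_record += cur.next_record  # skip over it
            cur.delete_mask = 1
            cur.next_record = 0     # simplification: no garbage list here
            return True
        prev_off = cur_off
    return False
```

With Infimum at offset 99, Supremum at 112 and four user records at 128, 160, 192 and 224, `walk(page, 99)` visits the records in primary key order, and after `delete(page, 99, 2)` the record with key 2 is skipped while still occupying offset 160.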

Q: Why does next_record point to the position between the record header information and the real data, rather than to the start of the whole record, where the record's extra information begins?
A: Because from this position, reading to the left gives the record header information and reading to the right gives the real data. In addition, the variable-length field length list and the NULL value list are stored in reverse order, so the fields at the front of the record end up closer in memory to their corresponding length information, which may improve the cache hit rate.

2. Detailed explanation of page structure

1. Page Directory

When we query a record:

SELECT * FROM page_demo WHERE c1 = 3;

The naive way: start at the Infimum record (the smallest record) and follow the linked list. Because the records in the list are arranged in ascending primary key order, the search can stop as soon as a node's primary key is greater than the one being searched for. But if a page stores a very large number of records, this lookup is still expensive.

When we want to find something in a book, we usually look at the table of contents first, find the page number for the content we need, and then turn to that page. InnoDB builds a similar directory for the records in a page:

  1. Divide all normal records (including the largest and smallest records, excluding records marked as deleted) into groups.
  2. The n_owned attribute in the header of the last record of each group (that is, the largest record in the group) indicates how many records that record owns, that is, how many records are in the group.
  3. The address offset of the last record of each group is extracted and stored in order near the end of the page, in the part called Page Directory. Each of these address offsets is called a slot, so the page directory is made up of slots.

For example, the page of the page_demo table now holds 6 normal records (the 4 we inserted plus the smallest and largest records), and InnoDB divides them into two groups: the first group contains only the smallest record, and the second contains the remaining 5 records.

The n_owned attribute in the headers of the smallest and largest records:

  • The smallest record has an n_owned value of 1: the group that ends with it contains only 1 record, the smallest record itself.
  • The largest record has an n_owned value of 5: the group that ends with it contains 5 records, namely the largest record itself and the 4 records we inserted.
    [Figure: logical view of the records and the page directory, with the address offsets drawn as arrows]

Rule: the group containing the smallest record can have only 1 record, the group containing the largest record can have 1 to 8 records, and every other group can have 4 to 8 records.

  • Initially a data page contains only two records, the smallest record and the largest record, which belong to two separate groups.
  • Each time a record is inserted, InnoDB finds in the page directory the slot whose primary key value is greater than that of the new record with the smallest difference, and increments the n_owned value of the record that slot points to by 1, indicating that the group has gained a record. This continues until the group contains 8 records.
  • When a record is inserted into a group that already has 8 records, the group is split into two groups, one with 4 records and the other with 5. This adds a slot to the page directory to record the offset of the largest record in the new group.
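The grouping rules above can be simulated with plain lists. This is a minimal sketch, not InnoDB's actual code: each inner list stands for one group, its length plays the role of n_owned, and it assumes that on a split the lower 4 records form the new group. All names are hypothetical.

```python
import bisect

# Stand-ins for the two pseudo-records: smaller / larger than any real key.
INFIMUM, SUPREMUM = float("-inf"), float("inf")


def new_page() -> list:
    """A fresh page: two groups holding only the smallest and largest record."""
    return [[INFIMUM], [SUPREMUM]]


def insert(groups: list, key) -> None:
    """Add key to the first group whose largest record is greater than key."""
    idx = next(i for i, group in enumerate(groups) if group[-1] > key)
    bisect.insort(groups[idx], key)
    if len(groups[idx]) > 8:               # group had 8 records and gained one
        full = groups[idx]
        groups[idx:idx + 1] = [full[:4], full[4:]]   # split into 4 + 5


def n_owned(groups: list) -> list:
    """n_owned of the last record of each group, in slot order."""
    return [len(group) for group in groups]
```

Inserting keys 1 to 4 into a fresh page gives the two groups of 1 and 5 records described earlier; continuing up to key 12 triggers two splits and leaves four groups.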

The process of finding a record with a specified primary key value in a data page:

  1. Use binary search over the page directory to determine the slot the record belongs to, and find the record with the smallest primary key value in that slot's group.
  2. Traverse the records in that group through each record's next_record attribute.
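The two steps might look like this in code. As a simplifying assumption, each group is a sorted list of primary keys and iterating over the list stands in for following next_record; `find_slot` and `find_record` are hypothetical names.

```python
# Stand-ins for the two pseudo-records.
INFIMUM, SUPREMUM = float("-inf"), float("inf")


def find_slot(groups: list, key) -> int:
    """Step 1: binary search for the first slot whose largest key is >= key."""
    lo, hi = 0, len(groups) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if groups[mid][-1] < key:   # whole group is smaller: look to the right
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_record(groups: list, key):
    """Step 2: scan the chosen group; return its slot index, or None."""
    slot = find_slot(groups, key)
    for candidate in groups[slot]:  # stands in for walking next_record
        if candidate == key:
            return slot
        if candidate > key:         # passed the position where key would be
            break
    return None
```

With the groups `[[INFIMUM], [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12, SUPREMUM]]`, looking up key 3 inspects two slots and then at most four records, instead of walking the whole list from the smallest record.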

2. Page Header

Page Header is the second part of the page structure. It occupies a fixed 56 bytes and is dedicated to storing various pieces of status information about the data page.

  • PAGE_DIRECTION
    If the primary key value of a newly inserted record is greater than that of the previously inserted record, we say the insertion direction is right; otherwise it is left. PAGE_DIRECTION is the state that records the insertion direction of the last record.
  • PAGE_N_DIRECTION
    When new records are inserted in the same direction several times in a row, InnoDB keeps a count of the consecutive records inserted in that direction, and PAGE_N_DIRECTION holds this count. If the direction of the latest insertion changes, the count is cleared and starts again.
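A small tracker illustrates how the two states evolve. This is a sketch of the description above only, not of InnoDB's real bookkeeping: the string values "right" and "left", the class name, and restarting the count at 1 after a direction change are all assumptions.

```python
class DirectionTracker:
    """Tracks the last insertion direction and how long it has been kept."""

    def __init__(self) -> None:
        self.last_key = None
        self.direction = None    # plays the role of PAGE_DIRECTION
        self.n_direction = 0     # plays the role of PAGE_N_DIRECTION

    def insert(self, key) -> None:
        if self.last_key is not None:
            direction = "right" if key > self.last_key else "left"
            if direction == self.direction:
                self.n_direction += 1      # same direction again
            else:
                self.direction = direction
                self.n_direction = 1       # direction changed: count restarts
        self.last_key = key
```

Inserting 1, 2, 3, 4 in that order leaves the direction at "right" with a count of 3; inserting 0 afterwards flips it to "left" and restarts the count.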

3. File Header

Page Header records status information specific to data pages.

File Header, by contrast, is common to all page types: pages of every type have File Header as their first part, which describes information that applies to any kind of page.

  • FIL_PAGE_SPACE_OR_CHKSUM
    The checksum of the current page.

A checksum is a relatively short value computed by some algorithm from a very long byte string to represent it. Before comparing two long byte strings, we compare their checksums first: if the checksums differ, the two byte strings must differ, which saves the cost of comparing the long byte strings directly.

  • FIL_PAGE_OFFSET
    Every page has its own page number, and InnoDB uses the page number to uniquely locate a page.
  • FIL_PAGE_TYPE
    The type of the current page.

InnoDB divides pages into different types for different purposes. The data pages introduced above are the pages that store records, but there are many other page types. (The type of a data page is FIL_PAGE_INDEX, the so-called index page.)

  • FIL_PAGE_PREV and FIL_PAGE_NEXT
    InnoDB stores data in units of pages. Data of one kind can take up a great deal of space, and InnoDB may not be able to allocate that much contiguous storage at once. When the data is stored in multiple non-contiguous pages, the pages need to be linked. FIL_PAGE_PREV and FIL_PAGE_NEXT hold the page numbers of the previous and next pages of this page respectively, forming a doubly linked list that chains many pages together without requiring them to be physically adjacent.

Not all page types use the previous and next page attributes, but data pages (FIL_PAGE_INDEX) do, so all data pages form a doubly linked list.
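To make the fields above concrete, here is a rough sketch that reads them from the start of a page and follows FIL_PAGE_NEXT. The byte offsets, the big-endian encoding, the FIL_PAGE_INDEX value 0x45BF and the FIL_NULL marker 0xFFFFFFFF are taken from the commonly documented InnoDB layout rather than from this article, so treat them as assumptions; the function names are hypothetical.

```python
import struct

FIL_NULL = 0xFFFFFFFF     # assumed marker for "no previous / next page"
FIL_PAGE_INDEX = 0x45BF   # assumed numeric value of the data page type


def parse_file_header(page: bytes) -> dict:
    """Read the File Header fields discussed above from the start of a page."""
    # Assumed layout: checksum, page number, prev, next as four 4-byte ints.
    checksum, page_no, prev_no, next_no = struct.unpack_from(">IIII", page, 0)
    # Assumed: the 2-byte page type sits at byte offset 24.
    (page_type,) = struct.unpack_from(">H", page, 24)
    return {"checksum": checksum, "page_no": page_no,
            "prev": prev_no, "next": next_no, "type": page_type}


def walk_pages(pages: dict, first_page_no: int) -> list:
    """Follow FIL_PAGE_NEXT from one page to the end of the list."""
    order, page_no = [], first_page_no
    while page_no != FIL_NULL:
        order.append(page_no)
        page_no = parse_file_header(pages[page_no])["next"]
    return order
```

Here `pages` is a hypothetical mapping from page number to raw page bytes. The page numbers in a chain do not need to be consecutive, which is the point of linking the pages through the header.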

4. File Trailer

The InnoDB storage engine stores data on disk, but disks are slow, so data is loaded into memory in units of pages for processing. If a page is modified in memory, the change must be synchronized to disk at some point after the modification.

What if the synchronization is interrupted midway (for example by a power failure)? To detect whether a page was written completely, InnoDB adds a File Trailer part at the end of every page, consisting of 8 bytes:

  • The first 4 bytes hold the checksum of the page.
    This corresponds to the checksum in File Header. Whenever a page is modified in memory, its checksum is computed before the page is synchronized. Because File Header is at the front of the page, its checksum reaches the disk first; when the page has been written completely, the checksum is also written at the end of the page. If the synchronization succeeds, the checksums at the head and tail of the page are the same. If power is lost halfway through, the checksum in File Header describes the modified page while the checksum in File Trailer describes the original page, and the mismatch shows that the synchronization failed midway.
  • The last 4 bytes hold the log sequence number (LSN) corresponding to the last modification of the page.
    This part is also used to verify the integrity of the page.

To ensure the integrity of pages synchronized from memory to disk, the checksum of the page data and the LSN of the last modification are stored at both the head and the tail of the page. If the head and tail checksums or LSN values do not match, something went wrong during synchronization.
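The head and tail comparison can be demonstrated with a toy page. Everything about the layout here is an assumption made for illustration: the page is just an 8-byte stamp (checksum plus LSN), the payload, and the same stamp again, and `zlib.crc32` stands in for whatever checksum algorithm InnoDB really uses.

```python
import struct
import zlib


def build_page(payload: bytes, lsn: int) -> bytes:
    """Toy page: [checksum, LSN] stamp, payload, then the same stamp again."""
    stamp = struct.pack(">II", zlib.crc32(payload), lsn & 0xFFFFFFFF)
    return stamp + payload + stamp


def is_complete(page: bytes) -> bool:
    """A page was written completely if its head and tail stamps agree."""
    return page[:8] == page[-8:]


def torn_write(old: bytes, new: bytes, written: int) -> bytes:
    """Simulate power loss after only `written` bytes of `new` reached disk."""
    return new[:written] + old[written:]
```

If `old` and `new` are two versions of the same page, `torn_write(old, new, 40)` carries the new stamp at the head and the old stamp at the tail, so `is_complete` returns False for it.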

File Trailer is similar to File Header and is common to all types of pages.

3. Summary

Data pages form a doubly linked list, and the records inside each data page form a singly linked list. Each data page builds a page directory for the records it stores, so when searching for a record by primary key, binary search over the page directory quickly locates the corresponding slot, and traversing the records in that slot's group then finds the specified record.
The pages in this list (page a, page b, page c ... page n) need not be physically adjacent; they only need to be linked through the doubly linked list.

Origin blog.csdn.net/myjess/article/details/115529704