MySQL advanced study summary 8: InnoDB data storage structure overview (pages, internal page structure, row formats)

1. The storage structure of the database: page

Since InnoDB is the default storage engine of MySQL, this article mainly analyzes the data storage structure of the InnoDB storage engine.

1.1 The basic unit of interaction between disk and memory: page

InnoDB divides data into pages, and the default page size in InnoDB is 16KB.

The page is the basic unit of interaction between disk and memory: whether one row or many rows are read, the whole page containing those rows is loaded. Therefore, the page (Page) is the basic unit in which the database manages storage space, and the smallest unit of database I/O.

1.2 Overview of page structure

Pages (page a, page b, page c, ..., page n) do not have to be physically contiguous on disk; they are linked to one another through a doubly linked list.

The records in each data page form a singly linked list in ascending primary key order, and each data page builds a page directory for the records it stores. When searching for a record by primary key, binary search over the page directory quickly locates the corresponding slot, and then the records in the group belonging to that slot are traversed to find the target record.
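To make the page-level linking concrete, here is a minimal Python sketch. It assumes a toy model in which each page is a dict entry keyed by its page number, holding the previous/next page numbers and an already-sorted list of primary keys; the `Page` class, `scan_in_order`, and the page numbers are hypothetical. Following the `next` links yields the keys in order even though the page numbers (physical positions) are not in order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    page_no: int
    prev: Optional[int]  # page number of the previous page (None at the head)
    next: Optional[int]  # page number of the next page (None at the tail)
    records: List[int] = field(default_factory=list)  # primary keys, ascending within the page


def scan_in_order(pages: Dict[int, Page], first_page_no: int) -> List[int]:
    """Follow the next links from the first page and collect every key in order."""
    keys: List[int] = []
    page_no: Optional[int] = first_page_no
    while page_no is not None:
        page = pages[page_no]
        keys.extend(page.records)
        page_no = page.next
    return keys


# Hypothetical page numbers: the logical order is 7 -> 3 -> 12, not the physical order.
pages = {
    3: Page(3, prev=7, next=12, records=[4, 5, 6]),
    7: Page(7, prev=None, next=3, records=[1, 2, 3]),
    12: Page(12, prev=3, next=None, records=[7, 8]),
}
print(scan_in_order(pages, first_page_no=7))  # [1, 2, 3, 4, 5, 6, 7, 8]
```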


1.3 Page size

Different database management systems have different page sizes.

For example, in MySQL's InnoDB storage engine, the default page size is 16KB:

mysql> show variables like '%innodb_page_size';
+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.21 sec)

The page size in SQL Server is 8KB. In Oracle, the term "block" is used to represent "page". The block sizes supported by Oracle are: 2KB, 4KB, 8KB, 16KB, 32KB, 64KB.

1.4 Structures above the page

In the database, there are also the concepts of Extent, Segment, and Tablespace. The relationship between them is as follows:

  1. An extent (Extent) is the storage structure one level above the page. In the InnoDB storage engine, an extent consists of 64 consecutive pages. Because the default page size in InnoDB is 16KB, an extent is 64 * 16KB = 1MB.
  2. A segment (Segment) consists of one or more extents, which need not be adjacent to each other. The segment is the unit of allocation in the database, and different types of database objects live in different segments. When a table or an index is created, a corresponding segment is created: a table segment for a table and an index segment for an index.
  3. A tablespace (Tablespace) is a logical container, and the objects stored in it are segments. A database consists of one or more tablespaces, which, in terms of management, can be divided into the system tablespace, user tablespaces, undo tablespaces, temporary tablespaces, and so on.
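The extent arithmetic can be checked with a small Python sketch. It assumes the mapping documented for the `innodb_page_size` option (1MB extents for 4KB, 8KB and 16KB pages; 2MB for 32KB pages; 4MB for 64KB pages); the function name `extent_info` is made up for illustration.

```python
from typing import Tuple

KB = 1024
MB = 1024 * KB

# Extent size for each supported innodb_page_size (per the MySQL documentation).
EXTENT_SIZE = {4 * KB: 1 * MB, 8 * KB: 1 * MB, 16 * KB: 1 * MB,
               32 * KB: 2 * MB, 64 * KB: 4 * MB}


def extent_info(page_size: int) -> Tuple[int, int]:
    """Return (extent size in bytes, pages per extent) for a given page size."""
    if page_size not in EXTENT_SIZE:
        raise ValueError(f"unsupported innodb_page_size: {page_size}")
    extent_size = EXTENT_SIZE[page_size]
    return extent_size, extent_size // page_size


print(extent_info(16 * KB))  # (1048576, 64): 64 * 16KB = 1MB
```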

2. The internal structure of the page

Divided by type, common pages include data pages (which store B+ tree nodes), system pages, undo pages, transaction data pages, and so on. The data page is the one we deal with most often.

The 16KB storage space of the data page is divided into seven parts, namely:

  1. File Header
  2. Page Header
  3. Maximum and minimum records (Infimum + Supremum)
  4. User Records
  5. Free Space
  6. Page Directory
  7. File Trailer

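The functions of these seven parts are as follows:

  • File Header (38 bytes): general information about the page
  • Page Header (56 bytes): status information specific to the data page
  • Infimum + Supremum (26 bytes): two virtual records, the minimum record and the maximum record
  • User Records (variable size): the records actually stored
  • Free Space (variable size): space in the page that is not yet used
  • Page Directory (variable size): the relative positions of some of the records in the page
  • File Trailer (8 bytes): used to verify that the page is complete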
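As a rough budget, the fixed-size parts of a 16KB page can be added up in Python. The sizes used are 38 bytes (File Header), 56 bytes (Page Header), 26 bytes (the two 13-byte virtual records Infimum and Supremum in the Compact format) and 8 bytes (File Trailer). The sketch simply assumes that everything else is shared by User Records, Free Space and Page Directory.

```python
PAGE_SIZE = 16 * 1024  # default innodb_page_size

# Fixed-size parts of a data page, in bytes.
FIXED_PARTS = {
    "File Header": 38,
    "Page Header": 56,
    "Infimum + Supremum": 26,  # two 13-byte virtual records
    "File Trailer": 8,
}

fixed = sum(FIXED_PARTS.values())
variable = PAGE_SIZE - fixed  # shared by User Records, Free Space and Page Directory
print(fixed, variable)  # 128 16256
```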

2.1 Part 1: File Header, File Trailer

First, the two general parts found in every page: the File Header and the File Trailer.

2.1.1 File Header: File Header

Function: describes general information common to all kinds of pages (such as the page number and the numbers of the previous and next pages)
Size: 38 bytes
1) FIL_PAGE_OFFSET
Each page has its own page number, through which InnoDB can uniquely locate the page.
2) FIL_PAGE_TYPE
The type of the current page, for example FIL_PAGE_INDEX for index (data) pages or FIL_PAGE_UNDO_LOG for undo log pages.
3) FIL_PAGE_PREV, FIL_PAGE_NEXT
The page numbers of the previous page and the next page, respectively. Thanks to these links, pages only need to be logically contiguous, not physically contiguous.
4) FIL_PAGE_SPACE_OR_CHKSUM
The checksum of the current page.

What is a checksum?
For a very long byte string, an algorithm computes a relatively short value that represents it; this short value is the checksum.
Its purpose: to compare two long byte strings, compare their checksums first. If the checksums differ, the two strings must differ, which saves the cost of comparing the long strings directly. (Equal checksums only mean the strings are very likely identical.)

Both the File Header and the File Trailer contain this checksum, which lets the database verify whether a page is complete.

The InnoDB storage engine loads data into memory page by page for processing, then synchronizes the modified pages back to disk. If power fails or something else goes wrong halfway through a synchronization, the page is only partially written.
InnoDB can then compare the checksum in the File Trailer with the checksum in the File Header. If the two values differ, the page transfer went wrong and must be handled accordingly.
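The idea can be simulated in Python under the simplified model described above, in which the same checksum is written at both ends of the page. This is an assumption for illustration: InnoDB's real on-disk layout and checksum algorithm (configurable through `innodb_checksum_algorithm`) differ in detail. The sketch uses `zlib.crc32` as the checksum and models a torn write as only the first part of the new page image reaching disk; `build_page`, `is_complete` and `torn_write` are hypothetical helpers.

```python
import zlib

BODY_SIZE = 16384 - 8  # hypothetical layout: 4-byte checksum at each end of a 16KB page


def build_page(body: bytes) -> bytes:
    """Build a page image whose header and trailer both carry the body's CRC-32."""
    assert len(body) == BODY_SIZE
    checksum = zlib.crc32(body).to_bytes(4, "big")
    return checksum + body + checksum


def is_complete(page: bytes) -> bool:
    """Treat the page as intact when the header and trailer checksums agree."""
    return page[:4] == page[-4:]


def torn_write(old: bytes, new: bytes, written: int) -> bytes:
    """Simulate a power failure after only the first `written` bytes reached disk."""
    return new[:written] + old[written:]


old_page = build_page(b"A" * BODY_SIZE)
new_page = build_page(b"B" * BODY_SIZE)
print(is_complete(new_page))                              # True
print(is_complete(torn_write(old_page, new_page, 8192)))  # False
```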
5) FIL_PAGE_LSN
The log sequence number (LSN) corresponding to the last modification of the page.

2.2 Part 2: Free Space, User Records, Maximum and Minimum Records

2.2.1 Free space: Free Space

Records are stored in the User Records part according to the specified row format.

Whenever a record is inserted, space of the record's size is taken from Free Space and assigned to User Records. Once Free Space is used up, inserting further records requires a new page.

2.2.2 User Records: User Records

Records in User Records are placed one after another in the specified row format and are linked to each other into a singly linked list.

Next, let's look at the record header information in the row format of each record.

2.2.2.1 Row Format - Record Header Information

First, create a table for demonstration: page_demo, with 3 columns, using the Compact row format.

A data record stored in the Compact row format therefore contains the following information:

  • the variable-length field length list
  • the NULL value list
  • the record header information
  • the values of each column (the real data of the record)

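The following explains the record header information in the row format and how it helps MySQL manage user records.
The record header occupies 5 bytes (40 bits) and contains the following attributes:

  • reserved bit 1 (1 bit): not used
  • reserved bit 2 (1 bit): not used
  • delete_mask (1 bit): whether the record is deleted
  • min_rec_mask (1 bit): set on the minimum record of each level of non-leaf nodes of the B+ tree
  • n_owned (4 bits): the number of records owned by the current record
  • heap_no (13 bits): the position of the current record in the page
  • record_type (3 bits): the type of the current record
  • next_record (16 bits): the relative position of the next record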
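The 40-bit layout of the Compact record header can be illustrated with a Python sketch that packs and unpacks the fields from the most significant bit down: 2 reserved bits, delete_mask (1), min_rec_mask (1), n_owned (4), heap_no (13), record_type (3), next_record (16). The names `reserved1`/`reserved2` and the helper functions are hypothetical. `next_record` is treated as unsigned here for simplicity, whereas InnoDB stores it as a relative offset that can be negative.

```python
from typing import Dict, List, Tuple

# (field name, width in bits), from the most significant bit to the least.
HEADER_FIELDS: List[Tuple[str, int]] = [
    ("reserved1", 1),
    ("reserved2", 1),
    ("delete_mask", 1),
    ("min_rec_mask", 1),
    ("n_owned", 4),
    ("heap_no", 13),
    ("record_type", 3),
    ("next_record", 16),
]
assert sum(width for _, width in HEADER_FIELDS) == 40  # 5 bytes


def pack_header(values: Dict[str, int]) -> bytes:
    """Pack the header fields (missing ones default to 0) into 5 big-endian bytes."""
    word = 0
    for name, width in HEADER_FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word.to_bytes(5, "big")


def unpack_header(raw: bytes) -> Dict[str, int]:
    """Split 5 header bytes back into named fields, starting from the lowest bits."""
    word = int.from_bytes(raw, "big")
    fields: Dict[str, int] = {}
    for name, width in reversed(HEADER_FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# A deleted user record at heap position 2 whose next record is 32 bytes further on.
print(pack_header({"delete_mask": 1, "heap_no": 2, "next_record": 32}).hex())  # 2000100020
```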

Suppose 4 records are now inserted into page_demo. The attributes in their record headers are explained below:
1) The attribute delete_mask
marks whether the current record is deleted, occupying 1 binary bit:

  • A value of 0: indicates that the record has not been deleted
  • A value of 1: the record is deleted

Why is a deleted record still stored in the page?
Because if deleted records were physically removed, the remaining records would have to be rearranged, which costs performance.
So records are only marked as deleted, and all deleted records form a so-called garbage list.
The space occupied by the records in this list is called reusable space. If new records are inserted into the table later, they may overwrite the storage space of these deleted records.
2) min_rec_mask
The minimum record in each level of non-leaf nodes of the B+ tree has this flag set to 1.
3) record_type
Indicates the type of the current record; there are 4 types in total:

  • 0 means common user data record
  • 1 indicates B+ tree non-leaf node records
  • 2 means the minimum record
  • 3 means the maximum record

4) heap_no
Indicates the position of the current record in the page.

Note that no user record has heap_no 0 or 1. These two positions belong to two records that InnoDB inserts automatically into every page, called pseudo records or virtual records: one represents the minimum record and the other represents the maximum record.
5) n_owned
The header of the last record in each group of the page directory stores the total number of records in that group in its n_owned field; the other records have n_owned = 0.
6) next_record
This attribute represents the address offset from the real data of the current record to the real data of the next record.

Note that the next record is not the next record in insertion order, but the next record in ascending primary key order.

By convention, the next record of the Infimum record (the minimum record) is the user record with the smallest primary key in the page, and the next record of the user record with the largest primary key in the page is the Supremum record (the maximum record).
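A small Python sketch can show how relative next_record offsets turn records stored in insertion order into a list sorted by primary key. The addresses are hypothetical (Infimum at 0, Supremum at 16, user records 32 bytes apart in insertion order), and offsets may be negative because the next record in key order can sit earlier in the page. `build_chain` and `walk` are made-up names.

```python
from typing import Dict, List


def build_chain(addr_by_key: Dict[int, int], infimum: int, supremum: int) -> Dict[int, int]:
    """Return next_record offsets (address -> relative offset) linking
    Infimum -> user records in ascending key order -> Supremum."""
    ordered = [infimum] + [addr_by_key[k] for k in sorted(addr_by_key)] + [supremum]
    return {cur: nxt - cur for cur, nxt in zip(ordered, ordered[1:])}


def walk(next_offset: Dict[int, int], key_by_addr: Dict[int, int],
         infimum: int, supremum: int) -> List[int]:
    """Start at Infimum, follow the offsets, and stop on reaching Supremum."""
    keys: List[int] = []
    addr = infimum + next_offset[infimum]
    while addr != supremum:
        keys.append(key_by_addr[addr])
        addr += next_offset[addr]
    return keys


INFIMUM, SUPREMUM = 0, 16
insert_order = [3, 1, 4, 2]  # records are placed in the page in insertion order
addr_by_key = {k: 32 + 32 * i for i, k in enumerate(insert_order)}
key_by_addr = {a: k for k, a in addr_by_key.items()}
offsets = build_chain(addr_by_key, INFIMUM, SUPREMUM)
print(walk(offsets, key_by_addr, INFIMUM, SUPREMUM))  # [1, 2, 3, 4]
```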


2.2.3 Minimum and maximum records: Infimum + Supremum

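Every page contains two virtual records that InnoDB creates automatically and that are not user data: Infimum, the minimum record, and Supremum, the maximum record. Each consists of a 5-byte record header plus an 8-byte fixed part holding the word "infimum" or "supremum", so together they occupy 26 bytes. Their record_type values are 2 and 3 and their heap_no values are 0 and 1, respectively. Infimum forms a group on its own, and Supremum is the last record of the last group.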

2.3 Part 3: Page Directory, Page Header

2.3.1 Page Directory: Page Directory

Why do you need a page directory?

In a page, records are stored as a singly linked list. A singly linked list makes insertion and deletion convenient, but lookups are inefficient: in the worst case, every node in the list must be traversed.
Therefore, the page structure includes a dedicated Page Directory that acts as a directory for the records, so that they can be searched with binary search.

The page directory is organized for binary search as follows:

  1. All records are divided into several groups. These include the minimum record and the maximum record, but not records marked as deleted.
  2. Group 1, the group containing the minimum record, has only one record. The last group, containing the maximum record, has 1 to 8 records. Each remaining group has 4 to 8 records.
  3. The header of the last record in each group stores the total number of records in the group as its n_owned field.
  4. The page directory stores the address offsets of the last record of each group, in order. Each such offset is called a slot, and each slot acts as a pointer to the last record of a different group.
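The grouping rules and the slot lookup can be sketched in Python. This is a simplified model, assuming the records are given as sorted integer keys, with -inf and +inf standing in for Infimum and Supremum. `build_groups` fills the middle groups with 4 records each, which is just one way to satisfy the 4 to 8 rule (InnoDB itself splits groups as records are inserted). `lookup` uses `bisect` for the binary search over the slots and then scans the chosen group linearly; both names are hypothetical.

```python
import bisect
import math
from typing import List, Optional

INF_KEY, SUP_KEY = -math.inf, math.inf  # stand-ins for Infimum and Supremum


def build_groups(user_keys: List[int]) -> List[list]:
    """Infimum alone, middle groups of 4, last group (ending with Supremum) of 1..8."""
    groups: List[list] = [[INF_KEY]]
    rest: list = sorted(user_keys) + [SUP_KEY]
    while len(rest) > 8:
        groups.append(rest[:4])
        rest = rest[4:]
    groups.append(rest)
    return groups


def lookup(groups: List[list], target: int) -> Optional[int]:
    slots = [g[-1] for g in groups]        # each slot points at its group's last record
    i = bisect.bisect_left(slots, target)  # binary search: first slot whose key >= target
    for key in groups[i]:                  # linear scan inside that group
        if key == target:
            return target
    return None


groups = build_groups(list(range(1, 19)))
print([len(g) for g in groups])  # n_owned per group: [1, 4, 4, 4, 7]
print(lookup(groups, 10))        # 10
```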


2.3.2 Page Header: Page Header

The Page Header stores status information about the records in the data page, for example how many records the page holds, the address of the first record, and how many slots the page directory contains. It occupies a fixed 56 bytes dedicated to this status information.

3. Row format

Data is usually inserted into a table row by row, and the way these records are stored on disk is called the row format or record format.

The InnoDB storage engine provides four row formats: Compact, Dynamic, Compressed, and Redundant. You can view MySQL's default row format with the following statement:

mysql> select @@innodb_default_row_format;
+-----------------------------+
| @@innodb_default_row_format |
+-----------------------------+
| dynamic                     |
+-----------------------------+
1 row in set (0.00 sec)

You can also view the row format used by a specific table:

SHOW TABLE STATUS LIKE 'table_name'\G

3.1 Compact row format

In MySQL 5.1, the default row format is Compact. A complete record is divided into two parts: the extra information of the record (the variable-length field length list, the NULL value list, and the record header information) and the real data of the record.

3.1.1 Variable length field length list

MySQL supports variable-length data types such as VARCHAR(M), VARBINARY(M), TEXT, and BLOB. Columns of these types are called variable-length fields. Because the number of bytes they store is not fixed, the number of bytes the data actually occupies must be stored along with the real data.

In the Compact row format, the byte lengths of the real data of all variable-length fields are stored at the beginning of the record, forming the variable-length field length list.
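Here is a Python sketch of how the length list could be encoded for one record. It relies on Compact-format rules that go beyond the text above: the lengths are stored in reverse column order, NULL columns get no entry, and a length takes 1 byte when the column's maximum byte size is at most 255 or the actual length is at most 127, otherwise 2 bytes. The bit layout of the 2-byte form is simplified, and `encode_varlen_list` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def encode_varlen_list(columns: List[Tuple[int, Optional[bytes]]]) -> bytes:
    """columns: (max_bytes, value) for each variable-length column in table order,
    where max_bytes = M * (max bytes per character) and value is None for NULL."""
    out = bytearray()
    for max_bytes, value in reversed(columns):  # reverse column order
        if value is None:
            continue  # NULL columns are recorded in the NULL value list instead
        n = len(value)
        if max_bytes <= 255 or n <= 127:
            out.append(n)  # 1-byte length
        else:
            if n >= 1 << 14:
                raise ValueError("length too large for this simplified 2-byte form")
            # 2-byte length; the high bit of the first byte marks the 2-byte form
            out += bytes([0x80 | (n >> 8), n & 0xFF])
    return bytes(out)


print(encode_varlen_list([(10, b"aaa"), (10, b"bb"), (10, b"d")]).hex())  # 010203
```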


3.1.2 List of NULL values

The Compact row format manages all columns that allow NULL together and records which of them are NULL in the NULL value list, one bit per nullable column. If no column in the table allows NULL, the NULL value list does not exist.

  • When the value of the binary bit is 1, it means that the value of the column is NULL;
  • When the value of the binary bit is 0, it means that the value of the column is not NULL;
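A Python sketch of building the NULL value list for one record follows. It assumes the standard Compact layout: one bit per nullable column, the first nullable column in the lowest bit (so reading the bits from high to low lists the columns in reverse order), and zero padding up to a whole number of bytes. `encode_null_list` is a hypothetical helper, and NOT NULL columns are simply left out of its input.

```python
from typing import List, Optional


def encode_null_list(nullable_values: List[Optional[object]]) -> bytes:
    """Values of the nullable columns in table order; None means NULL."""
    if not nullable_values:
        return b""  # no nullable columns: the NULL value list does not exist
    bits = 0
    for i, value in enumerate(nullable_values):
        if value is None:
            bits |= 1 << i  # 1 = NULL, 0 = not NULL
    n_bytes = (len(nullable_values) + 7) // 8  # pad to whole bytes with high zero bits
    return bits.to_bytes(n_bytes, "big")


# Three nullable columns: the first is 'a', the second and third are NULL.
print(encode_null_list(["a", None, None]).hex())  # 06 (binary 00000110)
```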


3.1.3 Record header information

The record header information was described in detail above (2.2.2.1 Row Format - Record Header Information); please refer to that part.

3.1.4 Real data of the record

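In addition to the columns we define ourselves, the real data of a record also contains 3 hidden columns:

  • DB_ROW_ID (6 bytes): the row ID that uniquely identifies a record; added only when the table has neither a user-defined primary key nor a NOT NULL UNIQUE key
  • DB_TRX_ID (6 bytes): the transaction ID
  • DB_ROLL_PTR (7 bytes): the roll pointer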

3.2 Dynamic and Compressed row formats

3.2.1 Row overflow

The InnoDB storage engine can store part of the data of a record outside the data page that holds the record.

The columns of a row (excluding TEXT and BLOB) can occupy at most 65535 bytes in total. A long VARCHAR column needs 2 bytes in the variable-length field length list, plus 1 byte in the NULL value list if the column allows NULL. So with the single-byte ascii charset, a NOT NULL VARCHAR column can be at most 65533 bytes:

CREATE TABLE varchar_size_demo(
    c VARCHAR(65533) not null
) CHARSET=ascii ROW_FORMAT=Compact;

Since a page is generally 16KB, that is, 16384 bytes, a single page cannot hold even one such row. This phenomenon is called row overflow.
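The arithmetic above can be written as a tiny Python function, assuming a table with a single long VARCHAR(M) column (so its length entry takes 2 bytes) and no TEXT/BLOB columns; `max_varchar_chars` is a made-up name. It also confirms that the resulting row is larger than a 16KB page.

```python
MAX_ROW_SIZE = 65535  # bytes shared by all columns of a row (TEXT/BLOB excluded)
PAGE_SIZE = 16384


def max_varchar_chars(max_bytes_per_char: int, nullable: bool) -> int:
    """Largest M for a table whose only column is VARCHAR(M)."""
    length_prefix = 2                 # entry in the variable-length field length list
    null_list = 1 if nullable else 0  # one byte of NULL value list if NULL is allowed
    return (MAX_ROW_SIZE - length_prefix - null_list) // max_bytes_per_char


m = max_varchar_chars(1, nullable=False)  # ascii, NOT NULL
print(m, m > PAGE_SIZE)                   # 65533 True
```

With a nullable column the limit drops to 65532, and with a 4-byte-per-character charset such as utf8mb4 it becomes 16383 characters.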

3.2.2 Dynamic and Compressed row formats

The row formats handle row overflow differently.

In the Compact and Redundant row formats, only part of the column's data (a 768-byte prefix) is stored in the record's real data; the rest is spread across several other pages, and 20 bytes in the real data store the address pointing to those pages, so that the pages holding the remaining data can be found.

The Compressed and Dynamic row formats use complete overflow for BLOB-like data: when a column overflows, the data page stores only a 20-byte pointer, and the actual data is stored entirely in off-page (overflow) pages.
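The difference between the two overflow strategies can be summarized in a Python sketch. It assumes the commonly documented sizes: Compact and Redundant keep a 768-byte prefix of an overflowing column in the record plus a 20-byte pointer, while Dynamic and Compressed keep only the 20-byte pointer. Whether a column overflows at all is taken as an input rather than computed, and `inline_bytes` is a hypothetical name.

```python
PREFIX_BYTES = 768  # prefix kept inline by COMPACT/REDUNDANT for an overflowing column
POINTER_BYTES = 20  # pointer to the overflow (off-page) pages


def inline_bytes(row_format: str, value_len: int, overflows: bool) -> int:
    """Bytes of a long column kept in the record itself."""
    fmt = row_format.upper()
    if fmt not in ("COMPACT", "REDUNDANT", "DYNAMIC", "COMPRESSED"):
        raise ValueError(f"unknown row format: {row_format}")
    if not overflows:
        return value_len  # the whole value stays in the data page
    if fmt in ("COMPACT", "REDUNDANT"):
        return PREFIX_BYTES + POINTER_BYTES  # prefix plus pointer; the rest is off-page
    return POINTER_BYTES  # complete overflow: only the pointer stays inline


print(inline_bytes("Compact", 65533, True), inline_bytes("Dynamic", 65533, True))  # 788 20
```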

Another feature of the Compressed row format is that its pages are compressed with the zlib algorithm, which makes it effective for storing long data such as BLOB, TEXT, and VARCHAR values.

4. Extent, segment, fragment extent

4.1 Why are there extents?

The pages in each level of a B+ tree form a doubly linked list. If storage space were allocated page by page, two pages adjacent in the list could be physically far apart, so a range query would perform random I/O. If logically adjacent pages are also physically adjacent, range queries can use sequential I/O, which is much faster.

This is why the concept of an extent is introduced: an extent is 64 physically consecutive pages. Because InnoDB's page size is 16KB, an extent is 64 * 16KB = 1MB.

When a table holds a large amount of data, space for an index is no longer allocated page by page but extent by extent, and with a lot of data several consecutive extents may even be allocated at once. This may waste a little space (if the data cannot fill a whole extent), but from a performance point of view it eliminates a lot of random I/O, which is well worth it.

4.2 Why are there segments?

The previous article explained how the B+ tree is used to find user records, which are ultimately stored in the leaf nodes, so a range query scans the leaf nodes. If leaf and non-leaf nodes were not distinguished and were stored in the same extents, the leaf pages would no longer be contiguous and the benefit of extents would be greatly reduced.

Therefore, InnoDB treats leaf nodes and non-leaf nodes differently and gives each its own extents. The set of extents storing leaf nodes counts as one segment, and the set of extents storing non-leaf nodes counts as another segment.

Besides the leaf node segment and the non-leaf node segment of an index, InnoDB also defines segments for storing special data, such as the rollback segment. Common segments are therefore the data segment, the index segment, and the rollback segment: the data segment holds the leaf nodes of the B+ tree, and the index segment holds its non-leaf nodes.

4.3 Why are there fragment extents?

By default, a table using the InnoDB storage engine has only one clustered index, and one index produces 2 segments. Since a segment requests storage space in units of extents and an extent occupies 1MB (64 * 16KB = 1024KB) by default, even a table that stores only a few records would take 2MB, which is wasteful.

To address this, InnoDB introduced the concept of a fragment extent. In a fragment extent, the pages do not all store data of the same segment: some pages may belong to segment A, some to segment B, and some to no segment at all. A fragment extent belongs directly to the tablespace, not to any segment.

Therefore, the strategy for allocating storage space for a segment is as follows:

  1. When data is first inserted into the table, the segment is allocated storage space one page at a time from fragment extents;
  2. Once a segment has occupied 32 pages in fragment extents, it is allocated storage space in units of complete extents.
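The two-step strategy can be sketched in Python. The threshold of 32 fragment pages comes from the text; the 64 pages per extent assumes the default 16KB page size. `allocate` is a hypothetical helper that only counts pages and extents rather than tracking real extent states.

```python
from typing import Dict

FRAGMENT_PAGE_LIMIT = 32  # pages a segment takes from fragment extents first
PAGES_PER_EXTENT = 64     # assumes 16KB pages


def allocate(pages_needed: int) -> Dict[str, int]:
    """Count how a segment's first `pages_needed` pages would be obtained."""
    frag_pages = min(pages_needed, FRAGMENT_PAGE_LIMIT)
    remaining = pages_needed - frag_pages
    full_extents = -(-remaining // PAGES_PER_EXTENT)  # ceiling division
    return {"fragment_pages": frag_pages, "extents": full_extents}


print(allocate(3))   # {'fragment_pages': 3, 'extents': 0}
print(allocate(97))  # {'fragment_pages': 32, 'extents': 2}
```

Under this model, a segment that only needs 3 pages uses 3 fragment pages (48KB) instead of a whole 1MB extent.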

4.4 Types of extents

Extents can be roughly divided into four types:

  • Free extent (FREE): none of its pages are in use yet
  • Fragment extent with free space (FREE_FRAG): a fragment extent that still has free pages
  • Fragment extent without free space (FULL_FRAG): a fragment extent whose pages are all in use
  • Extent attached to a segment (FSEG): an extent belonging to a particular segment, such as the leaf node segment or the non-leaf node segment of an index

Extents in the FREE, FREE_FRAG, and FULL_FRAG states belong directly to the tablespace, while FSEG extents are attached to a segment.

5. Tablespace

A tablespace is a logical container, and the objects stored in it are segments. A tablespace can contain one or more segments, but a segment belongs to only one tablespace. A database consists of one or more tablespaces, which, in terms of management, can be divided into the system tablespace (System Tablespace), file-per-table tablespaces (File-per-table Tablespace), undo tablespaces (Undo Tablespace), temporary tablespaces (Temporary Tablespace), and so on.

5.1 Independent table space

With an independent (file-per-table) tablespace, each table has its own tablespace, in which its data and index information are stored. Such tablespaces can be migrated between different databases.

The .ibd file of a newly created table occupies only 96KB (6 pages of 16KB). The tablespace is very small at first because the table contains no data yet; as data is added, the number of pages in the tablespace file gradually grows.


Origin blog.csdn.net/xueping_wu/article/details/125469409