InnoDB storage engine: storage structure

  • Table structure definition file

     MySQL stores data in tables, and each table has corresponding files. Whatever storage engine a table uses, MySQL (in versions before 8.0) keeps a file with the suffix .frm that records the table's structure definition. The .frm file is also used to store view definitions: if the user creates a view named v_a, a corresponding v_a.frm file is generated to record the definition of that view.

To view the storage location of MySQL data files, run SHOW VARIABLES LIKE 'datadir';

  • Table space file (InnoDB)

  Under the default configuration, InnoDB creates a 10MB file named ibdata1 (it can grow automatically) and uses it as the default tablespace file. The user can configure it through innodb_data_file_path. Once innodb_data_file_path is set, the data of all tables based on the InnoDB storage engine is recorded in this shared tablespace. If the parameter innodb_file_per_table is set, each InnoDB table gets its own independent tablespace, named table_name.ibd. Note that a separate tablespace file stores only the table's data, indexes, and insert buffer bitmap pages. Everything else (rollback information, insert buffer index pages, system transaction information, the doublewrite buffer, and so on) is still stored in the default tablespace.

The table space is composed of segments, extents, and pages.

  • segment 

     Common segments include the data segment (leaf nodes of the B+ tree), the index segment (non-leaf nodes of the B+ tree), the rollback segment, and so on.

  • Extent

   An extent is a space composed of consecutive pages, and the size of an extent is 1MB in any case. By default, the page size of the InnoDB storage engine is 16KB, so an extent contains 64 consecutive pages.

  • page

  A page is the smallest unit of InnoDB disk management. The default page size is 16KB. You can set the page size to 4KB, 8KB, or 16KB through the innodb_page_size parameter.
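The relationship between page size and extent size is simple arithmetic. The short Python sketch below just divides the fixed 1MB extent by each page size mentioned above. The function name is our own and is not part of MySQL.

```python
EXTENT_SIZE = 1024 * 1024  # an extent is 1MB, as stated above


def pages_per_extent(page_size_kb: int) -> int:
    """Number of consecutive pages that fit in one extent."""
    return EXTENT_SIZE // (page_size_kb * 1024)


for size_kb in (4, 8, 16):
    print(f"{size_kb}KB pages -> {pages_per_extent(size_kb)} pages per extent")
```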

In the InnoDB storage engine, common page types are:

  1. Data page (B-tree Node)
  2. Undo log page (Undo Log Page)
  3. System page (System Page)
  4. Transaction system page (Transaction System Page)
  5. Insert buffer bitmap page (Insert Buffer Bitmap)
  6. Insert buffer free list page (Insert Buffer Free List)
  7. Uncompressed binary large object page (Uncompressed BLOB Page)
  8. Compressed binary large object page (Compressed BLOB Page)
  • Row

The InnoDB storage engine provides the Compact and Redundant formats for storing record data. The Redundant format is kept for compatibility with previous versions, and the default row format is Compact. We can check the row format used by a table with SHOW TABLE STATUS LIKE 'table_name'.

  We can specify the row format in the statement to create or modify the table:

CREATE TABLE table_name (column_definitions) ROW_FORMAT=row_format_name
    
ALTER TABLE table_name ROW_FORMAT=row_format_name

  • Compact row record format

 

  1. Variable length field length list     

         We know that MySQL supports some variable-length data types, such as VARCHAR(M), VARBINARY(M), the various TEXT types, and the various BLOB types. Columns with these data types are called variable-length fields. The number of bytes stored in such a field is not fixed, so when the real data is stored, the number of bytes it occupies has to be stored along with it, otherwise the MySQL server could not tell where the value ends. The storage space occupied by a variable-length field is therefore divided into two parts:

    1. The real data content
    2. The number of bytes occupied
In the Compact row format, the byte lengths of the real data of all variable-length fields are stored at the beginning of the record, forming the variable-length field length list. The lengths are stored in the reverse order of the columns. We emphasize once again: in reverse order. Each length is stored as follows:

      If the length of the column is less than 255 bytes, it is represented by 1 byte.

      If it is greater than 255 bytes, it is represented by 2 bytes.

The length of a variable-length field never needs more than 2 bytes, because the maximum length of the VARCHAR type in MySQL is limited to 65535. (Not all records have a variable-length field length list. For example, if none of the columns in the table has a variable-length data type, this part is not needed.)
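To make the reverse ordering concrete, here is a minimal Python sketch that builds a length list from the values of the variable-length columns. It follows only the simplified rule given above (1 byte for short values, 2 bytes otherwise). Treating a length of exactly 255 as 1 byte and writing 2-byte lengths big-endian are our assumptions, and the real InnoDB encoding has more cases, so take this as an illustration and not the engine's actual code.

```python
def length_list(values):
    """Build a simplified variable-length field length list.

    values: the byte strings of the variable-length columns, in column order.
    """
    out = bytearray()
    for value in reversed(values):           # lengths are stored in reverse column order
        n = len(value)
        if n > 65535:
            raise ValueError("length does not fit in 2 bytes")
        if n <= 255:
            out.append(n)                    # short value: 1 byte (assumption for exactly 255)
        else:
            out += n.to_bytes(2, "big")      # long value: 2 bytes (byte order is an assumption)
    return bytes(out)


# Three variable-length columns holding 4, 3 and 1 bytes of data.
print(length_list([b"aaaa", b"bbb", b"d"]).hex())
```

For the three columns above the list reads 01 03 04: the length of the last column comes first.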

2. NULL flag

   If no column of the table is allowed to store NULL, the NULL value list does not exist. Otherwise, each column that allows NULL gets one binary bit, and the bits are arranged in the reverse order of the columns. The meaning of each bit is as follows:
          When the bit value is 1, the value of the column is NULL.
          When the bit value is 0, the value of the column is not NULL.
MySQL stipulates that the NULL value list must occupy a whole number of bytes. If the number of bits used is not a whole number of bytes, the high bits are padded with 0. If 9 columns of a table are allowed to be NULL, the NULL value list of a record needs 2 bytes.
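The NULL value list described above can be sketched in a few lines of Python. The function below takes one flag per nullable column (in column order), puts the first column in the lowest bit so that the bits end up in reverse column order, and pads the high bits with 0 up to a whole number of bytes. The function name and the big-endian byte layout are our assumptions for illustration.

```python
def null_bitmap(is_null_flags):
    """Build the NULL value list for one record.

    is_null_flags: one bool per nullable column, in column order.
    """
    value = 0
    for i, is_null in enumerate(is_null_flags):
        if is_null:
            value |= 1 << i                  # first nullable column -> lowest bit
    nbytes = (len(is_null_flags) + 7) // 8   # round up to a whole number of bytes
    return value.to_bytes(nbytes, "big")     # unused high bits stay 0


# Three nullable columns, the 2nd and 3rd are NULL: bits 00000110.
print(null_bitmap([False, True, True]).hex())
# Nine nullable columns need 2 bytes.
print(len(null_bitmap([False] * 9)))
```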

3. Record header information 

It occupies a fixed 5 bytes (40 bits).

  •  Hidden column

In addition to the user-defined data columns, InnoDB adds hidden columns to each row.

In particular:

transaction_id (DB_TRX_ID): records the ID of the transaction that last modified the row.

roll_pointer (DB_ROLL_PTR): points to the undo information of the current record.

Both are related to row rollback and the MVCC implementation (described later).

  • Redundant row format

  

  • Field length offset list

     Note that the Compact row format begins with a variable-length field length list, while the Redundant row format begins with a field length offset list. The two names differ in two ways:

       1. The words "variable-length" are missing, which means the Redundant row format stores the length information of all columns in the record (including hidden columns) in the field length offset list, in reverse order.

        2. The word "offset" is added, which means the length of a column value is not read as directly as in the Compact row format. It is calculated as the difference between two adjacent offset values.
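The "difference between two adjacent values" can be shown with a small Python sketch. We assume here that each entry is the offset at which a column's value ends, counted from the start of the record's data, and that the entries are stored in reverse column order as described above. The sample offsets are made up for illustration.

```python
def column_lengths(stored_offsets):
    """Recover column lengths from a field length offset list.

    stored_offsets: end offsets as stored in the record, i.e. in reverse
    column order (an assumption about what each entry means).
    """
    lengths = []
    previous_end = 0
    for end in reversed(stored_offsets):     # back to column order
        lengths.append(end - previous_end)   # length = difference of adjacent offsets
        previous_end = end
    return lengths


# Illustrative offset list for a record with six columns.
print(column_lengths([0x25, 0x24, 0x1A, 0x11, 0x0A, 0x06]))
```

Read in column order, the offsets are 6, 10, 17, 26, 36 and 37, so the columns occupy 6, 4, 7, 9, 10 and 1 bytes.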

  • Record header information

The record header information in the Redundant row format occupies 6 bytes (48 binary bits). The meanings of these bits are as follows:

  •    delete_mask

    This attribute marks whether the current record has been deleted. It occupies 1 binary bit: 0 means the record has not been deleted, and 1 means the record has been deleted.

       Deleted records are not removed from the disk immediately, because rearranging the other records on disk after a removal would cost performance. A deleted record is therefore only marked. All deleted records form a so-called garbage linked list, and the space occupied by the records in this list is called reusable space. If new records are later inserted into the table, they may overwrite the storage space of these deleted records. (In fact, if data is inserted in ascending index order, the data stays compact. If data is inserted in random order, the data pages of the index may be split and the space cannot be fully utilized.)

      If you want to reclaim the disk space of deleted data, you can rebuild the table with the ALTER TABLE A ENGINE=InnoDB command. MySQL automatically creates a temporary table B, imports the data of A into B, then drops the old table A and renames B to A.

  • min_rec_mask

    This mark is set on the smallest record of each level of non-leaf nodes of the B+ tree. If the min_rec_mask value of a record is 0, the record is not the smallest record of a non-leaf node of the B+ tree.

  • n_owned

  InnoDB divides all normal records (including the smallest and largest records, excluding records marked as deleted) into several groups. The n_owned attribute in the header of the last record of each group (that is, the largest record in the group) indicates how many records that record owns, that is, how many records are in the group. The address offset of the last record of each group is extracted and stored in sequence near the end of the page. This area is the Page Directory. The address offsets in the page directory are called slots, so the page directory is made up of slots.

  • heap_no

    This attribute indicates the position of the current record in this page. User records start from 2, because 0 and 1 belong to the smallest record and the largest record respectively, namely Infimum and Supremum.

  • record_type

      This attribute represents the type of the current record. There are 4 types in total: 0 means an ordinary record, 1 means a B+ tree non-leaf node record, 2 means the smallest record, and 3 means the largest record. The records we insert ourselves are ordinary records, so their record_type is 0, while the record_type values of the smallest record and the largest record are 2 and 3 respectively.

  • next_record

      It represents the address offset from the real data of the current record to the real data of the next record. For example, if the next_record value of the first record is 32, the real data of the next record starts 32 bytes after the address of the real data of the first record. This is in effect a linked list: the next record can be found from any record. Note that "next record" does not mean the next record in insertion order, but the next record in ascending order of primary key value. It is also stipulated that the next record of the Infimum record (the smallest record) is the user record with the smallest primary key value on this page, and the next record of the user record with the largest primary key value on this page is the Supremum record (the largest record).
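The linked list formed by next_record can be walked with a few lines of Python. In the sketch below a page is modeled as a dictionary from a record's offset to a (key, next_record) pair. All offsets and keys are made up for illustration, and next_record is treated as a signed relative offset, which is our reading of the description above.

```python
def scan_in_key_order(records, infimum_at):
    """Follow next_record from Infimum to Supremum and collect the user keys."""
    keys = []
    position = infimum_at
    while True:
        key, next_record = records[position]
        if key == "supremum":
            break
        if key != "infimum":
            keys.append(key)
        position += next_record              # relative jump to the next record
    return keys


# Records were inserted in the order 3, 1, 2, so they sit at increasing offsets
# in that order, but the chain still visits them by ascending primary key.
page = {
    99: ("infimum", 61),     # 99 + 61 = 160, the record with key 1
    112: ("supremum", 0),
    128: (3, -16),           # 128 - 16 = 112, the Supremum record
    160: (1, 32),            # 160 + 32 = 192, the record with key 2
    192: (2, -64),           # 192 - 64 = 128, the record with key 3
}
print(scan_in_key_order(page, infimum_at=99))
```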

 

  • InnoDB data page structure

 The InnoDB data page consists of the following 7 parts:

  • File Header (file header, 38 bytes)
  • Page Header (page header, 56 bytes)
  • Infimum and Supremum Records (the smallest record and the largest record)
  • User Records (user records, that is, row records)
  • Free Space
  • Page Directory
  • File Trailer (file trailer information, 8 bytes)

 

  • File Header

FIL_PAGE_SPACE_OR_CHKSUM

This represents the checksum of the current page. What is a checksum? For a very long byte string, we use some algorithm to compute a relatively short value that represents it. This short value is called the checksum. Before comparing two long byte strings, we first compare their checksums. If the checksums differ, the two byte strings must be different, so the time-consuming direct comparison of the two long byte strings can be skipped.

FIL_PAGE_OFFSET

Each page has a separate page number, just like your ID number. InnoDB can uniquely locate a page through the page number.

  • Page Header

Name                Size      Description
PAGE_N_DIR_SLOTS    2 bytes   Number of slots in the page directory
PAGE_HEAP_TOP       2 bytes   The smallest address of the unused space, that is, Free Space starts after this address
PAGE_N_HEAP         2 bytes   Number of records on this page (including the smallest and largest records and records marked as deleted)
PAGE_FREE           2 bytes   Address of the first record marked as deleted (the deleted records also form a singly linked list through next_record, and the records in this list can be reused)
PAGE_GARBAGE        2 bytes   Number of bytes occupied by deleted records
PAGE_LAST_INSERT    2 bytes   Position of the most recently inserted record
PAGE_DIRECTION      2 bytes   Direction of record insertion
PAGE_N_DIRECTION    2 bytes   Number of records inserted consecutively in one direction
PAGE_N_RECS         2 bytes   Number of records on this page (excluding the smallest and largest records and records marked as deleted)
PAGE_MAX_TRX_ID     8 bytes   Maximum transaction ID that modified the current page; this value is only defined in secondary indexes
PAGE_LEVEL          2 bytes   Level of the current page in the B+ tree
PAGE_INDEX_ID       8 bytes   Index ID, indicating which index the current page belongs to
PAGE_BTR_SEG_LEAF   10 bytes  Header information of the leaf segment of the B+ tree; only defined on the root page of the B+ tree
PAGE_BTR_SEG_TOP    10 bytes  Header information of the non-leaf segment of the B+ tree; only defined on the root page of the B+ tree
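The field sizes in the table add up to exactly the 56 bytes of the Page Header, which we can check with Python's struct module. The sketch below unpacks 56 raw bytes into named fields in the order listed in the table. Reading the integers as big-endian, and the lower-case field names, are our assumptions for illustration. The two 10-byte segment headers are left as raw bytes.

```python
import struct
from collections import namedtuple

# 9 two-byte fields, an 8-byte ID, a 2-byte level, an 8-byte ID, two 10-byte blobs.
PAGE_HEADER_FORMAT = ">9HQHQ10s10s"          # ">" = big-endian, no padding (assumed)

PageHeader = namedtuple("PageHeader", [
    "n_dir_slots", "heap_top", "n_heap", "free", "garbage", "last_insert",
    "direction", "n_direction", "n_recs", "max_trx_id", "level", "index_id",
    "btr_seg_leaf", "btr_seg_top",
])


def parse_page_header(raw: bytes) -> PageHeader:
    """Split the 56 bytes of a Page Header into the fields from the table."""
    return PageHeader._make(struct.unpack(PAGE_HEADER_FORMAT, raw))


print(struct.calcsize(PAGE_HEADER_FORMAT))   # total size of all fields

# A made-up header: 2 slots, 2 heap records, index ID 42, everything else zero.
sample = struct.pack(PAGE_HEADER_FORMAT, 2, 120, 2, 0, 0, 0, 0, 0, 0,
                     0, 0, 42, b"\x00" * 10, b"\x00" * 10)
header = parse_page_header(sample)
print(header.n_dir_slots, header.index_id)
```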
  • Slot

      The Page Directory stores the relative positions of records (note that what is stored here is the position relative to the page, not an offset). These record pointers are sometimes called Slots or Directory Slots.

      InnoDB has rules for the number of records in each group: the group containing the smallest record can only have 1 record, the group containing the largest record can have 1 to 8 records, and every other group can have 4 to 8 records. Grouping therefore proceeds in the following steps:

  1. Initially, a data page contains only two records, the smallest record and the largest record, and they belong to two separate groups.
  2. After each insert, the page directory is searched for the slot whose primary key value is greater than that of the new record and closest to it. The n_owned value of the record that slot points to is then increased by 1, meaning one more record has joined the group. This continues until the group holds 8 records.
  3. When a record is inserted into a group that already holds 8 records, the group is split into two groups, one with 4 records and the other with 5. This process adds a slot to the page directory to record the offset of the largest record in the new group.
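The three steps above can be simulated with a short Python sketch. Each group is a sorted list whose last element is the record that owns the slot, and Infimum and Supremum are modeled as negative and positive infinity. Which half gets 4 records and which gets 5 is not stated above, so giving the lower half 4 is our assumption, as are all the names used here.

```python
import bisect
import math

INFIMUM, SUPREMUM = -math.inf, math.inf


def new_page():
    """Step 1: only the smallest and largest records, each in its own group."""
    return [[INFIMUM], [SUPREMUM]]


def insert(groups, key):
    """Steps 2 and 3: add key to the right group, splitting a full group."""
    # Find the first slot whose owner (largest record) is greater than key.
    for i in range(1, len(groups)):
        if groups[i][-1] > key:
            break
    group = groups[i]
    if len(group) == 8:
        merged = sorted(group + [key])
        groups[i:i + 1] = [merged[:4], merged[4:]]   # 4 and 5 records, one new slot
    else:
        bisect.insort(group, key)


def n_owned(groups):
    """n_owned of each slot's owner record, in slot order."""
    return [len(group) for group in groups]


page = new_page()
for key in range(1, 13):
    insert(page, key)
    print(key, n_owned(page))
```

Running it shows the Supremum group growing to 8 records, then splitting into groups of 4 and 5 when the 8th and the 12th user records arrive.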

 

  • File Trailer

We know that the InnoDB storage engine stores data on disk, but the disk is slow, so data is loaded into memory for processing in units of pages. If a page is modified in memory, the modified data has to be synchronized to disk at some point. But what if the power is cut when the synchronization is only halfway done? To check whether a page is complete (that is, whether a synchronization stopped halfway), InnoDB adds a File Trailer at the end of each page. This part is 8 bytes long and can be divided into 2 smaller parts:

The first 4 bytes represent the checksum of the page

This part corresponds to the checksum in the File Header. Whenever a page is modified in memory, its checksum must be calculated before synchronization. Because the File Header is at the front of the page, the checksum there is written to disk first. When the page has been written completely, the checksum is also written to the end of the page. If the synchronization completes successfully, the checksums at the beginning and the end of the page are the same. If the power is cut after only half of the page is written, the checksum in the File Header represents the modified page while the checksum in the File Trailer represents the original page. A difference between the two means something went wrong in the middle of the synchronization.

The last 4 bytes represent the log sequence number (LSN) of the last modification of the page

This part is also used to verify the integrity of the page, but we have not explained what LSN means so far, so you can ignore this attribute for now.

Like the File Header, the File Trailer is common to all types of pages.
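The half-written page check can be sketched in Python. The layout below is deliberately simplified: a 4-byte checksum at the very start stands in for the File Header, and the last 8 bytes are the File Trailer (checksum copy plus the low 4 bytes of an LSN). We use zlib.crc32 as a stand-in checksum, which is not necessarily the algorithm InnoDB uses, and all function names are our own.

```python
import struct
import zlib

PAGE_SIZE = 16 * 1024


def build_page(body: bytes, lsn: int) -> bytes:
    """Lay out a simplified page: checksum + body + checksum + LSN."""
    body = body.ljust(PAGE_SIZE - 12, b"\x00")
    checksum = struct.pack(">I", zlib.crc32(body))   # stand-in checksum
    return checksum + body + checksum + struct.pack(">I", lsn & 0xFFFFFFFF)


def is_torn(page: bytes) -> bool:
    """A page is incomplete if the header and trailer checksums disagree."""
    return page[:4] != page[-8:-4]


old_page = build_page(b"old row data", lsn=100)
new_page_image = build_page(b"new row data", lsn=101)

# Simulate a power cut: only the first half of the new page reached the disk.
half = PAGE_SIZE // 2
on_disk = new_page_image[:half] + old_page[half:]

print(is_torn(old_page), is_torn(new_page_image), is_torn(on_disk))
```

The header checksum of the half-written page comes from the new image and the trailer checksum from the old one, so the mismatch reveals the interrupted write.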

Origin blog.csdn.net/u014608280/article/details/99066173