MySQL Internals: InnoDB Storage Engine, Study Notes for Chapter 5: Indexes and Algorithms

If there are too many indexes, the application's performance suffers (every insert must also update the indexes and write them to disk, increasing disk I/O). If there are too few indexes, query performance suffers. A balance has to be found.

InnoDB supports B+ tree indexes and hash indexes. InnoDB's hash index is adaptive: the engine builds hash indexes for a table on its own, based on how the table is used, and this cannot be controlled manually. The B+ tree index is the most common and most effective index in today's relational databases for quickly locating data by key value, and its structure is similar to that of a binary tree. The B in B+ tree stands for balance, not binary: the B+ tree evolved from the earliest balanced binary trees, but it is not a binary tree.

A B+ tree index does not find the specific row for a given key value; it finds the page on which the row is stored. MySQL then reads that page into memory and searches for the row within the page.

On average, binary search is more efficient than a sequential scan. The slots in each page's Page Directory are stored in primary key order, and a specific record within a page is located by binary search over the Page Directory.

Definition of a balanced binary tree: it is first of all a binary search tree, and in addition, for every node the heights of its left and right subtrees differ by at most 1.

The search performance of a balanced binary tree is close to the best possible. Reaching the optimum would require building an optimal binary tree, but constructing and maintaining an optimal binary tree takes a great deal of work, so in practice a balanced binary tree is enough.

Lookups in a balanced binary tree are very fast, but keeping the tree balanced is expensive: after an insert or update, one or more left or right rotations are usually needed to restore balance (the figures showing the example tree and its rotations are omitted here). In the book's example a single left rotation rebalances the tree after an insert; sometimes several rotations are needed.

In a B+ tree, all records are stored in the leaf nodes, at the same level, sorted by key value, and the leaf nodes are linked together by pointers. The example B+ tree in the book has a height of 2, each page holds 4 records, and the fan-out is 5 (figure omitted).
Insertion into a B+ tree must keep the records in the leaf nodes sorted after the insert. For the B+ tree in Figure 5-6, inserting the key value 28 finds that neither the Leaf Page nor the Index Page is full, so the record is simply inserted in place (figure omitted). Next, insert the row with key value 70. The Leaf Page is already full but the Index Page is not; if the record were placed straight into the Leaf Page it would hold 50, 55, 60, 65, 70, so the leaf node is split around the middle value 60 (figure omitted; the pointers between leaf nodes are not drawn). Then insert a row with key value 95. Now both the Leaf Page and the Index Page are full, and two splits are needed (figure omitted, again without the leaf-node pointers). Whatever happens, the B+ tree always stays balanced, but newly inserted key values may require a large number of page splits to keep it balanced. Since the B+ tree is used mainly on disk, a page split means disk operations, so page splits should be avoided where possible; for this reason the B+ tree provides a rotation operation.

B+ tree rotation occurs when a Leaf Page is full but its left or right sibling page is not. In that case the B+ tree does not split the page right away; instead it moves records over to a sibling page, usually checking the left sibling first. So when the row with key value 70 is inserted into the tree of Figure 5-7, the tree rotates rather than splits (figure omitted). Rotation reduces the number of page split operations, and the height of the B+ tree remains 2.

The B+ tree uses a fill factor to control how the tree changes on deletion, and 50% is the smallest value the fill factor can be set to. A delete operation must also leave the records in the leaf nodes in sorted order.

Assume that the fill factor of the delete operation is 50%.

Deleting the record with key value 70 from the B+ tree in Figure 5-9 is straightforward: the record is simply removed (figure omitted). Next, delete the record with key value 25. This value also appears in the Index Page; after the record is deleted, 25's right sibling at the leaf level takes its place in the Index Page (figure omitted). Then delete the row with key value 60. After this deletion the fill factor of the Leaf Page drops below 50%, so a merge is required; likewise, once the related record is removed from the Index Page, the Index Page also has to be merged (figure omitted).

The essence of a B+ tree index is the implementation of the B+ tree inside the database. One characteristic of B+ tree indexes in a database is their high fan-out, so the height of a B+ tree in a database is generally only 2 to 3 levels.

B+ tree indexes in the database are divided into clustered indexes and secondary (auxiliary) indexes. Internally both are B+ trees, and both are height-balanced.

An InnoDB table is an index-organized table. The clustered index builds a B+ tree on the primary key of the table, and the leaf nodes store the full row records; for this reason the leaf nodes of the clustered index are also called data pages. The data pages are linked to one another through a doubly linked list.

Because the actual data pages can only be sorted according to one B+ tree, each table can have only one clustered index. The query optimizer tends to prefer the clustered index, because it lets us find the data directly in the leaf nodes of the index.

To see this concretely, create a table whose rows are deliberately so wide that each page can hold only two records:

CREATE TABLE t (
    a    INT NOT NULL PRIMARY KEY,
    b    VARCHAR(8000)
);

Insert data:

INSERT INTO t
SELECT 1, REPEAT('a', 7000);

INSERT INTO t
SELECT 2, REPEAT('a', 7000);

INSERT INTO t
SELECT 3, REPEAT('a', 7000);

INSERT INTO t
SELECT 4, REPEAT('a', 7000);

The resulting B+ tree is shown in the book (figure omitted). Many database documents say that a clustered index stores data in physical order on disk, but if the clustered index had to keep the physical records in a particular order, the maintenance cost would be extremely high. A clustered index is continuous logically, not physically.

Because the clustered index defines the logical order of the data, range queries on the primary key are fast: for such a range query, the range of pages can be determined from the nodes above the leaf level, and then the data pages are read (EXPLAIN screenshot omitted). Note that the rows column in the EXPLAIN output gives an estimate of the number of rows returned, not an exact value.
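For instance, a primary key range query on the small table t above can be examined like this (a sketch; the exact plan output depends on the MySQL version):

-- range scan on the clustered index (primary key a)
EXPLAIN SELECT * FROM t WHERE a >= 1 AND a <= 3;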

The leaf nodes of a secondary index (non-clustered index) do not contain the full row data. Besides the key values, each index row in the leaf level contains a bookmark that tells the InnoDB engine where to find the corresponding row; in InnoDB, that bookmark is the clustered index key of the row.

A secondary index does not affect how the data is organized in the clustered index, and a table can have multiple secondary indexes. To find data through a secondary index, InnoDB first walks the secondary index to obtain the primary key stored in its leaf node, and then fetches the complete row through the clustered (primary key) index.

Other databases, such as SQL Server, also support tables that are not index-organized, called heap tables, which are similar to MySQL's MyISAM engine in how data is inserted. All indexes on a heap table are non-clustered, and the heap table has no clustering key; the bookmark is a row identifier that locates the physical row on disk in a format such as "file number:page number:slot number".

A non-clustered index on a heap table does not need to go through the clustered index via the primary key. In a read-only workload, a non-clustered index whose bookmark is a row identifier may be faster than one whose bookmark is a primary key. However, when the table undergoes DML operations such as inserts, deletes, and updates, an index that bookmarks a row identifier may have to keep updating the data page position the identifier points to, and its overhead can then exceed that of an index that bookmarks the primary key. For sorting and range scans, an index-organized table can find all the pages it needs through the intermediate nodes of the B+ tree, which a heap table cannot do; on the other hand, databases generally use read-ahead to avoid many discrete reads. So whether a heap table or an index-organized table is faster depends on the specific workload.

View the index on the table:

SHOW INDEX FROM tableName;

Add a column c to table t, and create a non-clustered index on this column:

ALTER TABLE t
ADD c INT NOT NULL;
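The statement above only adds the column; the index on c itself would be created with a statement along the following lines (the index name idx_c is my assumption, since the text does not show the statement that creates it):

ALTER TABLE t
ADD KEY idx_c (c);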

When a NOT NULL column is added to a table that already contains data, the existing rows are given a default value for that column, such as 0 for numeric types or the empty string for character types.

The ALTER TABLE syntax for creating and dropping indexes:

ALTER TABLE tbl_name
| ADD {INDEX|KEY} [index_name]
  [index_type] (index_col_name, ...) [index_option] ...

ALTER TABLE tbl_name
| DROP PRIMARY KEY
| DROP {INDEX|KEY} index_name

Create and delete indexes with CREATE/DROP INDEX:

CREATE [UNIQUE] INDEX index_name
[index_type]
ON tbl_name (index_col_name, ...)

DROP INDEX index_name ON tbl_name
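For example, the prefix index on column b that is created with ALTER TABLE in the next paragraph could just as well be created and dropped with this syntax:

CREATE INDEX idx_b ON t (b(100));

DROP INDEX idx_b ON t;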

Sometimes only a prefix of a column needs to be indexed. For example, column b of the previously created table t is VARCHAR(8000), but we can index just its first 100 bytes:

ALTER TABLE t
ADD KEY idx_b(b(100));
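To judge whether 100 bytes is a reasonable prefix length, one common check (my sketch, not a query from the book) is to compare the selectivity of the prefix with that of the full column:

-- the closer the two ratios are, the less selectivity the 100-byte prefix loses
SELECT COUNT(DISTINCT LEFT(b, 100)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT b) / COUNT(*)            AS full_selectivity
FROM t;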

A long-standing complaint about MySQL concerns adding or dropping an index: MySQL first creates a new temporary table, imports the data into it, drops the original table, and then renames the temporary table to the original name, so adding or dropping an index on a large table is very slow. Starting with the InnoDB Plugin, a technique called Fast Index Creation is supported. It applies only to secondary indexes; creating or dropping a primary key still requires rebuilding the table. For a secondary index, InnoDB places an S (shared) lock on the table, so the table does not have to be rebuilt while the index is created, but the table is read-only during that time. Dropping a secondary index this way only requires marking the space used by the index as available in the InnoDB engine's internal view and removing the definition of the index from MySQL's internal view of the table.

A joint index refers to an index with multiple columns. Create a joint index on the table t:

ALTER TABLE t
ADD KEY idx_a_b(a, c);

Now analyze the indexes on the table with SHOW INDEX (the output screenshots are omitted). There are four indexes on the table: the primary key, the index on column c, the prefix index on the first 100 bytes of column b, and the joint index (which occupies two rows of the output). The meaning of the fields in the result:
1. Table: The name of the table the index belongs to.
2. Non_unique: Whether the index is non-unique; the primary key shows 0, because the primary key must be unique.
3. Key_name: The index name, which can be used with DROP INDEX.
4. Seq_in_index: The position of the column within the index. Values other than 1 appear only for joint indexes such as idx_a_b.
5. Column_name: The indexed column.
6. Collation: How the column is stored in the index. The value can be 'A' or NULL. B+ tree indexes always show A, meaning the values are sorted; with the HEAP (MEMORY) engine and a hash index it shows NULL, because a hash index stores data by hash bucket rather than in sorted order.
7. Cardinality: An estimate of the number of unique values in the index. Cardinality divided by the number of rows in the table should be as close to 1 as possible; if it is very small, consider whether the index should be dropped.
8. Sub_part: Whether only a prefix of the column is indexed. For idx_b it shows 100, meaning only the first 100 bytes of column b are indexed; it is NULL when the whole column is indexed.
9. Packed: How the key is packed (compressed); NULL means it is not packed.
10. Null: Whether the indexed column can contain NULL values; Yes if it can, otherwise empty.
11. Index_type: The index type. InnoDB only supports B+ tree indexes, so this shows BTREE.
12. Comment: Comment on the index.

The optimizer decides whether to use an index based on its Cardinality value, but this value is not updated in real time, because that would be too expensive. To update it, run the following command:

ANALYZE TABLE t;

However, there are some problems with this command, and the results may be different on each system.

After the server has been running for a while, Cardinality may become NULL. You may then find that an index exists but is not used, or that EXPLAIN on two identical statements gives different results, one using the index and the other doing a full table scan. In such cases it is best to run ANALYZE TABLE.
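A quick way to check that the statistics were refreshed, as a small sketch reusing table t from above:

ANALYZE TABLE t;

-- Cardinality in the output should now be populated again
SHOW INDEX FROM t;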

Columns with a wide range of values and almost no duplicates (that is, highly selective columns) are well suited to B+ tree indexes. But if a column is highly selective and the rows fetched still make up most of the table, MySQL will not use the B+ tree index.
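Selectivity can be estimated directly. For the member table discussed below, a check might look like this (my sketch, not a query from the book):

-- a ratio close to 1 means usernick is highly selective
SELECT COUNT(DISTINCT usernick) / COUNT(*) AS selectivity
FROM member;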

Here is an example. The member table has about 5 million rows, and there is a unique index on the usernick column. When searching for the user whose usernick is 'David', the execution plan uses the index (screenshot omitted), because usernick is highly selective and the query fetches only a few rows of the table. For another statement that also filters on usernick, however, the key column of the EXPLAIN output shows NULL, meaning no index is used, because the rows it fetches make up a large part of the table (screenshots omitted).

Next, look at the execution plans of two date-based searches on registdate (screenshots omitted). Although the two dates differ by only one day, the execution plans differ. For the second statement the optimizer could use idx_regdate, yet it chooses a full table scan instead. The optimizer uses the row estimate shown in the EXPLAIN rows field; if the estimate exceeds a certain threshold (the author guesses about 20% of the rows in the table), it chooses a full table scan rather than the B+ tree index. But the estimated row count is inexact and can lead the optimizer astray. For example, we can force the use of the index:

SELECT id, userid, sex, registdate
INTO OUTFILE 'a'
FROM member
FORCE INDEX(idx_regdate) 
WHERE registdate < '2006-04-24';

Running it with the forced index, and then again without forcing it, and comparing the two runs (screenshots omitted) shows that the optimizer's choice is not always correct.

Sequential read means reading blocks on the disk in order; random read means the blocks accessed are not contiguous, so the disk head must keep moving. One of the bottlenecks of traditional mechanical disks is their low random read speed.

The following compares read performance, measured with sysbench, under the RAID controller's two cache policies: write through (each write updates the cache and the disk at the same time) and write back (a write updates only the cache, and the disk is written later, when the cached data is about to be replaced; RAID cards generally have their own battery backup, so data loss on power failure is not a concern). The benchmark figures are omitted; in both cases sequential read performance is clearly higher than random read performance.

Sequential read for a B+ tree means reading the required rows in order along the linked list of leaf nodes. This is only logically sequential: on the physical disk the reads may still be random, but the data on disk is relatively sequential, because an extent consists of 64 contiguous pages.

A random read generally means accessing a secondary index leaf node to obtain the primary key and then fetching the row through the primary key index; the resulting disk accesses are random.

When a single query reads too much of a table, random reads increase sharply for a non-clustered index, and even for the clustered index the data on disk is not necessarily read sequentially; since random reads perform far worse than sequential reads, a full table scan is used instead of an index when a large amount of data is selected.

To improve read performance, the InnoDB engine uses read-ahead (prefetch) technology: a single I/O request reads multiple pages into the buffer pool, on the prediction that the prefetched pages will be accessed soon, whereas a traditional I/O request reads only one page at a time. Given the low IOPS of traditional mechanical disks, read-ahead can greatly improve read performance.

The InnoDB engine's two read-ahead methods:
1. Random read-ahead: when 13 pages of an extent (64 pages) are in the buffer pool and sit at the front end of the LRU list (that is, the pages are accessed frequently), all pages of that extent are read into memory.
2. Linear read-ahead: if 24 pages of an extent are accessed sequentially, all pages of the next extent are read ahead.

However, MySQL's read-ahead has sometimes caused performance degradation, and the InnoDB engine officially removed random read-ahead as of InnoDB Plugin 1.0.4. Linear read-ahead was kept, and the innodb_read_ahead_threshold parameter (default 56) was added: it specifies how many pages of an extent must be accessed sequentially before InnoDB prefetches all pages of the next extent.
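To inspect or tune the threshold at runtime, something like the following can be used (a sketch; innodb_read_ahead_threshold is a dynamic global variable in versions that have it):

SHOW VARIABLES LIKE 'innodb_read_ahead_threshold';

-- require 32 sequentially accessed pages in an extent before linear read-ahead triggers
SET GLOBAL innodb_read_ahead_threshold = 32;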

The internal structure of a solid-state drive is very different from that of a traditional mechanical disk, and its random read performance is qualitatively better, so the optimizer's 20% rule of thumb may no longer be accurate. As solid-state drives become widespread, databases will keep improving their optimizations in this area.

The leaf nodes of a secondary index contain the primary key but not the complete row data. The InnoDB engine always checks whether the required data can be obtained from the leaf nodes of the secondary index alone, as in the following example:

CREATE TABLE t (
    a   INT          NOT NULL,
    b   VARCHAR(20),
    PRIMARY KEY(a), 
    key(b)
);

INSERT INTO t
SELECT 1, 'kangaroo';

INSERT INTO t
SELECT 2, 'dolphin';

INSERT INTO t
SELECT 3, 'dragon';

INSERT INTO t
SELECT 4, 'antelope';

If we now execute SELECT * FROM t;, many people expect the rows to come back in primary key order, but the actual result is ordered according to the secondary index on b (result screenshots omitted). The reason is that the secondary index on b also contains the value of the primary key a, so all of the columns can be obtained from the secondary index alone, without touching the primary key index; and since a secondary index page usually holds more records than a clustered index page, the optimizer chooses the secondary index. Running EXPLAIN on this SQL statement confirms it (screenshot omitted):
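A minimal sketch of that check (the exact output columns depend on the MySQL version; the point is that the key column shows the secondary index on b rather than PRIMARY):

-- the optimizer can answer SELECT * entirely from the secondary index on b,
-- because that index contains both b and the primary key a
EXPLAIN SELECT * FROM t;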
If you want the result sorted by column a (for example, by adding ORDER BY a), the optimizer will walk the primary key directly and avoid an extra sort on a (screenshot omitted). Alternatively, you can force the use of the primary key:

SELECT *
FROM t
FORCE KEY(PRIMARY);

Running it returns the rows in primary key order (screenshot omitted).

A joint index is essentially still a B+ tree, except that it has more than one key column. Suppose there is a joint index on two integer columns (a, b); then the records are stored in the B+ tree sorted by (a, b) (figure omitted). A query with the condition WHERE a = xxx AND b = xxx can use this index, and so can a query with only WHERE a = xxx. But a query with only WHERE b = xxx cannot use it, because the values of b in the leaf nodes are not sorted on their own.
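A minimal sketch of this leftmost-prefix rule, using a hypothetical table xy that is not part of the book's examples:

CREATE TABLE xy (
    a INT NOT NULL,
    b INT NOT NULL,
    KEY idx_a_b (a, b)
);

-- both of these can use idx_a_b, because the leading column a is constrained
EXPLAIN SELECT * FROM xy WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM xy WHERE a = 1;

-- this one cannot use idx_a_b: b alone is not sorted in the leaf nodes
EXPLAIN SELECT * FROM xy WHERE b = 2;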

A second benefit of a joint index is that the second key column is already sorted. For example, suppose we query a user's purchases and want them in chronological order; using the joint index avoids an extra sort, because the data in the leaf nodes of the index is already sorted:

CREATE TABLE buy_log (
    userid       INT UNSIGNED    NOT NULL,
    buy_date     DATE
);

INSERT INTO buy_log
VALUES(1, '2009-01-01');

INSERT INTO buy_log
VALUES(2, '2009-01-01');

INSERT INTO buy_log
VALUES(3, '2009-01-01');

INSERT INTO buy_log
VALUES(1, '2009-02-01');

INSERT INTO buy_log
VALUES(3, '2009-02-01');

INSERT INTO buy_log
VALUES(1, '2009-03-01');

INSERT INTO buy_log
VALUES(1, '2009-04-01');

ALTER TABLE buy_log
ADD KEY(userid);

ALTER TABLE buy_log
ADD KEY(userid, buy_date);

As shown above, two indexes containing the userid column have been created. For a query that filters only on userid, EXPLAIN lists both indexes in possible_keys, but the optimizer chooses the single-column userid index, because its leaf nodes hold only one key value and a page can therefore contain more records (screenshot omitted). For a query that retrieves the last three purchase records of userid 1, both indexes are again candidates, but the optimizer chooses the joint index, because buy_date is already sorted within it. If the single-column userid index is forced instead, EXPLAIN shows Using filesort in the Extra column (screenshots omitted).
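The queries discussed above are roughly the following (a reconstruction based on the text, since the original statements appear only in screenshots):

-- filter on userid only: the optimizer picks the single-column userid index
EXPLAIN SELECT * FROM buy_log WHERE userid = 2;

-- last three purchases of userid 1: the optimizer picks the joint (userid, buy_date) index
EXPLAIN SELECT * FROM buy_log
WHERE userid = 1
ORDER BY buy_date DESC
LIMIT 3;

-- forcing the single-column index brings back the extra sort (Using filesort)
EXPLAIN SELECT * FROM buy_log FORCE INDEX(userid)
WHERE userid = 1
ORDER BY buy_date DESC
LIMIT 3;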
Despite its name, filesort does not necessarily mean the sort is done in a file; it means an extra sort operation is performed. To verify this, first check the current sort counters, then execute the statement that forces the single-column index, and check the counters again: they have increased (screenshots omitted). If the joint index is used instead, the sort counters do not increase.
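The sort counters can be inspected with the standard sort status variables; for example (the book's screenshots are omitted):

-- Sort_rows, Sort_range, Sort_scan and Sort_merge_passes count sort activity
SHOW SESSION STATUS LIKE 'Sort%';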

The adaptive hash index in the InnoDB engine uses a hash table data structure. Keys are mapped into the hash table by a hash function; collisions are generally resolved by chaining, with records that hash to the same slot placed in a linked list.

A hash function should avoid collisions as much as possible. Databases generally convert the key into a natural number and then use the division method of hashing: h(k) = k mod m, that is, key k is mapped to one of the m slots by taking the remainder of k divided by m.

The InnoDB engine resolves hash index collisions with chaining and uses the division method as its hash function. To convert a page into a natural number, the tablespace ID is shifted left by 20 bits and the page's offset within the tablespace is added; this value is then hashed into a slot of the hash table by the division method.

The adaptive hash index is created and used by the database itself; the DBA cannot intervene. When the innodb_adaptive_hash_index parameter is enabled in the configuration file, a hash table with innodb_buffer_pool_size/256 slots is created automatically at database startup. For example, with a buffer pool of 10MB, InnoDB creates an adaptive hash table with 10M/256 = 40960 slots at startup.

The adaptive hash index is very fast for equality (dictionary-style) lookups, but it is of no help for range lookups.

View the current adaptive hash index usage:

SHOW ENGINE innodb STATUS;

Running it (output screenshot omitted) shows the size and usage of the adaptive hash index, as well as how many searches per second used the adaptive hash index and how many did not.

The adaptive hash index is enabled by default.
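The setting can be inspected or switched off at runtime; a sketch (innodb_adaptive_hash_index is a dynamic server variable):

SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- turn the adaptive hash index off if it does not help the workload
SET GLOBAL innodb_adaptive_hash_index = OFF;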

Source: blog.csdn.net/tus00000/article/details/113487456