MySQL index tuning

1. MySQL index tuning

1.1 Explanation

Before getting into optimization, remember: do not take any "absolute truth" about optimization on faith. Verify your assumptions about execution plans and response times by testing in real business scenarios.

1.2 Optimization direction

[Figure: the four dimensions of database optimization]

As the figure above shows, database optimization can be divided into four dimensions: hardware, system configuration, database table structure, and SQL and indexes.

**Hardware:** CPU, memory, storage, network equipment, etc.

**System configuration:** server operating system, database service parameters, etc.

**Database table structure:** high availability, database and table sharding, read-write separation, storage engine, table design, etc.

**SQL and indexes:** SQL statements, index usage, etc.

  • By optimization cost: hardware > system configuration > database table structure > SQL and indexes
  • By optimization effect: hardware < system configuration < database table structure < SQL and indexes

This article covers SQL and indexes, which have the lowest optimization cost and the greatest effect.

1.3 Data page

A B+Tree is a balanced search tree designed for disks and other external storage devices. The InnoDB storage engine uses a B+Tree to implement its index structure.

Before looking at the B+Tree structure, first understand these two concepts:

**Disk:** The system reads data from disk into memory in units of disk blocks; all data located in the same disk block is read at once.

**Data page:** InnoDB uses the page as the basic unit of storage-space management, and a page is also the basic unit for reading data from disk into memory. The default page size in InnoDB is 16KB; it can also be set to other sizes such as 4KB or 8KB.

1.3.1 Data page structure

The process from creating a new data page to inserting data into it:

[Figure: a new data page being filled with records]

When all of the Free Space has been used up and a new record is inserted, a new page is requested.

**User Records:** The structure of a record is shown in the figure

[Figure: structure of a record]

| Name | Size (bits) | Description |
| --- | --- | --- |
| Reserved bit | 1 | Not used yet |
| delete_mask | 1 | Marks whether the record has been deleted |
| min_rec_mask | 1 | Set on the smallest record in each non-leaf node of the B+ tree |
| n_owned | 4 | Number of records in the group owned by this record |
| heap_no | 13 | Position of this record in the record heap |
| record_type | 3 | Type of this record: 0 is a normal record, 1 is a B+ tree non-leaf node record, 2 is the smallest record, 3 is the largest record |
| next_record | 16 | Relative position of the next record |

**Page Directory:** a directory for managing multiple User Records

  1. Divide all normal records, together with the Infimum record (the smallest record) and the Supremum record (the largest record), into groups.
  2. The header of the last record in each group (the record with the largest key value in the group) stores, in n_owned, the number of records in that group.
  3. The address offset of the last record of each group is stored in order in a directory. This directory is the Page Directory, and each address offset in it is called a slot.

The structure of Infimum + Supremum + User Records + Page Directory is as follows:

[Figure: Infimum, Supremum, User Records and Page Directory in a data page]

Using the data in the figure above, simulate the search for primary key 6 within a data page (binary-search the slots, then traverse the group that the slot points to):

  1. Calculate the position of the middle slot: (0 + 3) / 2 = 1. The primary key value of slot 1 is 4, and 4 is smaller than 6.

    Set low = 1; high = 3 is unchanged.

  2. Recalculate the position of the middle slot: (1 + 3) / 2 = 2. The primary key value of slot 2 is 8, and 8 is greater than 6.

    low = 1 is unchanged; set high = 2.

  3. Because high - low = 1, the record with primary key 6 must be in the group of slot 2. Slot 1 gives the largest record of its group, primary key 4; the next_record of that record points to the record with primary key 5, the first record in the group of slot 2. Traverse the group of slot 2 from there to find the record with primary key 6.
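The three steps above can be sketched in a few lines of Python. This is only an illustration: the record layout (keys 1 to 11 plus Infimum and Supremum), the slot boundaries, and the name `find_in_page` are assumptions chosen to match the numbers in the walkthrough, not InnoDB's actual on-disk format.

```python
import math

INFIMUM, SUPREMUM = -math.inf, math.inf

# Records of one page in singly-linked-list order (ascending key).
# Assumed layout: Infimum, user records 1..11, Supremum.
records = [INFIMUM, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, SUPREMUM]

# Each slot holds the position of the LAST (largest) record of its group:
# slot 0 -> Infimum, slot 1 -> key 4, slot 2 -> key 8, slot 3 -> Supremum.
slots = [0, 4, 8, 12]


def find_in_page(key):
    """Return (slot, position) of `key` in the page, or None if absent."""
    low, high = 0, len(slots) - 1
    # Binary search over the slots until low and high are adjacent.
    while high - low > 1:
        mid = (low + high) // 2
        if records[slots[mid]] < key:
            low = mid
        else:
            high = mid
    # The group of slot `high` starts right after the last record of
    # slot `low`; stepping forward here stands in for next_record.
    pos = slots[low] + 1
    while pos <= slots[high]:
        if records[pos] == key:
            return high, pos
        pos += 1
    return None


print(find_in_page(6))
```

Searching for 6 ends in the group of slot 2, as in the walkthrough; a key that is not on the page, such as 12, returns None.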

1.4 B+Tree data structure

The data pages are linked together into a doubly linked list, and the records within each data page form a singly linked list in ascending order of primary key. Each data page builds a Page Directory for the records it stores. When searching for a record by primary key, binary search on the Page Directory quickly locates the corresponding slot, and traversing the records in that slot's group then finds the specified record.

Schematic diagram:

[Figure: data pages linked into a doubly linked list]

1.4.1 Search without index

  • Using the primary key as the search condition: suppose you are looking for the record with primary key 2. Within a single data page, binary search on the Page Directory quickly locates the slot, and traversing the slot's group finds the record. But if there are 1000 data pages and the record you want (say, primary key 10000) is in page 500, you have to load page 0 through page 500 one after another, roughly 500 I/Os, to reach it. (Solution: the primary index)
  • Using a non-primary key as the search condition: a data page has no Page Directory for non-primary-key columns, so binary search cannot be used to locate a slot. You can only start from the smallest record, traverse every record in the singly linked list, and check whether each one meets the search condition. This search is obviously very inefficient. (Solution: an auxiliary index)

1.4.2 Primary Index

[Figure: primary index B+Tree]

**Primary index:** The key is the primary key id, and the data is the full row.

Example SQL: select * from table where id = 20;

Search process:

1) Read the root node page0 and load it from disk into memory. Binary-search the slots and traverse the group to find the pointer p1.

2) Read page1 and load it into memory. Binary-search the slots and traverse the group to find the pointer p5.

3) Read page5 and load it into memory. Binary-search the slots and traverse the group to find the record with key = 20.

InnoDB is designed to keep the root node resident in memory. For a B+Tree with a height of 3, this means that finding the row for a given key value takes only 1 to 3 disk I/O operations.
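As a rough sketch of this descent, the following Python models a height-3 tree as a dictionary of pages and counts how many pages have to be read from "disk". The page contents, the `PAGES` structure, and the `cached` parameter are all hypothetical; `bisect_right` stands in for the slot binary search inside each page.

```python
from bisect import bisect_right

# Hypothetical pages. Non-leaf entries are (smallest key in child, child page no).
PAGES = {
    0: {"leaf": False, "entries": [(1, 1), (40, 2)]},            # root
    1: {"leaf": False, "entries": [(1, 3), (10, 4), (20, 5)]},
    2: {"leaf": False, "entries": [(40, 6), (50, 7)]},
    3: {"leaf": True, "rows": {1: "row 1", 5: "row 5"}},
    4: {"leaf": True, "rows": {10: "row 10", 15: "row 15"}},
    5: {"leaf": True, "rows": {20: "row 20", 25: "row 25"}},
    6: {"leaf": True, "rows": {40: "row 40", 45: "row 45"}},
    7: {"leaf": True, "rows": {50: "row 50", 55: "row 55"}},
}


def lookup(key, cached=(0,)):
    """Descend from the root; return (row or None, number of disk reads)."""
    disk_reads = 0
    page_no = 0
    while True:
        if page_no not in cached:
            disk_reads += 1  # page is not in memory, so it costs one I/O
        page = PAGES[page_no]
        if page["leaf"]:
            return page["rows"].get(key), disk_reads
        min_keys = [k for k, _ in page["entries"]]
        # Pick the last child whose smallest key is <= the search key.
        idx = max(bisect_right(min_keys, key) - 1, 0)
        page_no = page["entries"][idx][1]


print(lookup(20))             # root already cached
print(lookup(20, cached=()))  # nothing cached
```

With the root cached, the lookup of key 20 visits page0, page1 and page5 but only counts two reads; with nothing cached it counts three, which is the upper end of the range quoted above.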

A data page is 16KB by default. The primary key of a typical table is an INT (4 bytes) or a BIGINT (8 bytes), and a pointer is generally also 4 or 8 bytes. In other words, one page (one node of the B+Tree) stores approximately 16 × 1024 B / (8 B + 8 B) = 1024 key values.

B+Tree primary index with a depth of 3: 1024 × 1024 × 100 ≈ 100 million rows (here 100 is taken as the approximate number of full rows in one leaf page)

B+Tree auxiliary index with a depth of 3: 1024 × 1024 × 1024 ≈ 1 billion entries
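The arithmetic can be checked with a short Python sketch. The constants mirror the numbers in the text; `ROWS_PER_LEAF = 100` is an assumption implied by the factor of 100 above, and page header overhead is ignored, so the results are estimates only.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size in bytes
KEY_BYTES = 8           # BIGINT key
POINTER_BYTES = 8       # pointer to a child page
ROWS_PER_LEAF = 100     # assumption: about 100 full rows fit in one leaf page

# Number of (key, pointer) pairs that fit in one page.
fanout = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)


def primary_capacity(height):
    """Rows reachable in a primary index: non-leaf levels fan out, leaves hold rows."""
    return fanout ** (height - 1) * ROWS_PER_LEAF


def secondary_capacity(height):
    """Entries in an auxiliary index whose leaves hold small (key, id) pairs."""
    return fanout ** height


print(fanout)
print(primary_capacity(3))
print(secondary_capacity(3))
```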

1.4.3 Auxiliary Index

[Figure: auxiliary index B+Tree]

**Auxiliary index:** The key is a non-primary-key field, and the data is the primary key id of the row.

Example SQL: select id from table where key = 4;

  1. Read the root node page0 and load it from disk into memory. Binary-search the slots and traverse the group to find the pointer p1.
  2. Read page1 and load it into memory. Binary-search the slots in the page directory and traverse the group to find p4. However, since the key is not constrained to be unique, key = 4 may exist in more than one data page; because 1 < 4 < 20, the matching records are stored in p3 and p4.
  3. Read page3 and page4 into memory and find the records with key = 4 using the same search rule as above.

Why does a secondary index record store the primary key id as its data?

1. The size of a data page is limited. If the data stored with each key is large, fewer key values fit in one data page, so searching the same amount of data requires loading more data pages and therefore more I/O.

2. If every index stored full rows, each additional B+Tree would need its own copy of all the user records, which wastes storage space.

1.5 MySQL optimization in practice

1.5.1 limit keyword optimization

The user table has 5 million rows. One feature is a simple paging query, with the following SQL:

select * from user where age > 45 limit page, size;

The larger page is, the slower the query. For example, with page = 3000000 and size = 10, this SQL already takes seconds. Is there a way to optimize it?

The reason:

  1. Suppose the table has only a primary key index. The data is on disk, and we only need rows 3000000 to 3000000 + 10, but the execution engine does not know where the 3000000th matching row is. So this SQL scans the table and checks each row against the condition until 3000000 + 10 matching rows have been loaded into memory, then discards the first 3000000 and stops.
  2. Suppose the table has a primary key index and a secondary index on age. The data is on disk, and this SQL looks up primary key ids through the age index. Because the records in B+Tree data pages (nodes) are ordered by key value, it is easy to reach the data page where age > 45 begins and then load 3000000 + 10 ids. Because we select * rather than id, each of those 3000000 + 10 ids is then looked up again in the primary key index to fetch the full row. Only then is the limit applied. (PS: limit is applied last; this follows from the execution order of SQL clauses.)

**Optimization direction:** Consider using the secondary index to reduce I/O.

select u1.* from user u1 right join (select id from user where age > 45 limit 3000000, 10) u2 on u1.id = u2.id;

Looking at the driving-table statement on its own:

select id from user where age > 45 limit 3000000, 10; 

This SQL still reads 3000000 + 10 entries, but it reads them from the non-primary-key index, that is, the index on age. Compared with the original SQL, a leaf page of a non-primary-key index holds more entries than a leaf page of the primary key index, so reading the same number of entries loads fewer data pages. Because only id is selected, there is also no lookup back into the primary key index for the 3000000 discarded entries. After that, only the 10 resulting ids are joined to the user table through the primary key, which is very fast.
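To see that the rewritten query returns the same page as the plain one, here is a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL, so it demonstrates only that the two forms are equivalent, not MySQL's performance. The table contents, the index name `idx_user_age`, and the added ORDER BY (needed so that both queries define the same page deterministically) are assumptions for the demo.

```python
import sqlite3


def build_demo_db(rows=1000):
    """Create an in-memory user table with an index on age."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO user (id, name, age) VALUES (?, ?, ?)",
        [(i, f"user{i}", 20 + i % 50) for i in range(1, rows + 1)],
    )
    conn.execute("CREATE INDEX idx_user_age ON user (age)")
    return conn


# Plain paging: fetch full rows, then skip `offset` of them.
PLAIN = """
SELECT * FROM user WHERE age > 45
ORDER BY age, id LIMIT ? OFFSET ?
"""

# Deferred join: page over ids only, then fetch the full rows for that page.
DEFERRED = """
SELECT u1.* FROM user u1
JOIN (SELECT id FROM user WHERE age > 45
      ORDER BY age, id LIMIT ? OFFSET ?) u2 ON u1.id = u2.id
ORDER BY u1.age, u1.id
"""


def fetch_page(conn, sql, offset, size):
    return conn.execute(sql, (size, offset)).fetchall()


conn = build_demo_db()
plain_rows = fetch_page(conn, PLAIN, 50, 10)
deferred_rows = fetch_page(conn, DEFERRED, 50, 10)
print(plain_rows == deferred_rows)
```

On a real MySQL instance, comparing the EXPLAIN output and timings of the two statements against your own data is the way to confirm the benefit.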

1.5.2 in subquery optimization

The user table has 5 million rows, and the table1 table has three rows (uid: 1, 2, 3). The SQL is as follows:

select * from user where id in (select uid from table1);

How efficiently does it execute, and is there a way to optimize it?

Intuitively, we expect the statement inside in to run first and return the three uid values (1, 2, 3), followed by a primary key index lookup on the user table. That would be very fast, and it is the query plan we want.

In MySQL 5.5: first analyze the statement with explain extended, then execute SHOW WARNINGS; the SQL that is actually executed is as follows

SELECT `database_name`.`user`.`id` AS `id`,`database_name`.`user`.`name` AS `name`,`database_name`.`user`.`age` AS `age`
FROM `database_name`.`user` WHERE <in_optimizer>(`database_name`.`user`.`id`,<EXISTS>(<primary_index_lookup>(<CACHE>(`database_name`.`user`.`id`) IN table1 ON PRIMARY)))

In other words, the execution engine rewrites the in statement as an exists statement. Analyzing this SQL: it first performs a full scan of the user table, loading 5 million rows, and then matches each user id against table1. The 5 million rows in the user table result in 5 million lookups against table1. This SQL is very slow in version 5.5.

In MySQL 5.7:

As before, first execute explain extended select * from user where id in (select uid from table1);

Then execute SHOW WARNINGS;

The SQL that is actually executed is as follows


That is to say, in version 5.7 the execution engine rewrites the in subquery as a join. From this SQL it can be seen that the table inside in becomes the driving table and the table outside in becomes the driven table. This join order matches the query plan we expected.
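The equivalence between the in form and a join driven by the small table can be checked with a minimal Python sketch. It uses the standard-library `sqlite3` module rather than MySQL, and the hand-written join is an assumption about the general shape of such a rewrite, not the optimizer's actual output. The two forms agree here because uid is unique in table1; with duplicate uid values a plain join would return duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE table1 (uid INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO user (id, name, age) VALUES (?, ?, ?)",
    [(i, f"user{i}", 20 + i % 50) for i in range(1, 101)],
)
conn.executemany("INSERT INTO table1 (uid) VALUES (?)", [(1,), (2,), (3,)])

# The query as written in the article.
IN_FORM = "SELECT * FROM user WHERE id IN (SELECT uid FROM table1) ORDER BY id"

# Hand-written join with table1 as the driving table and user as the driven table.
JOIN_FORM = """
SELECT u.* FROM table1 t
JOIN user u ON u.id = t.uid
ORDER BY u.id
"""

in_rows = conn.execute(IN_FORM).fetchall()
join_rows = conn.execute(JOIN_FORM).fetchall()
print(in_rows == join_rows)
```

On a MySQL version that still produces the exists form, writing the join by hand like this is one way to get the plan described above, but check the result with EXPLAIN on your own data.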

1.5.4 Range search analysis

The SQL is as follows:

  1. SELECT * FROM t_class WHERE id <= 6: what is its execution plan?
    First search the primary key index to reach the record with the smallest id, id = 1. Then follow the singly linked list through the data page to read the records 2, 3, 4, 5, 6, 7 in turn, returning each matching record to the server. When id = 7 is found not to satisfy the condition, the search terminates and the result set is complete.
  2. SELECT * FROM t_class WHERE id >= 6: what is its execution plan?
    First search the primary key index for the record with id = 6, or, if it does not exist, the record with the smallest id greater than 6. Then, starting from that record, follow the linked list forward to read all the remaining records.
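Both range scans can be sketched in Python over a toy list of leaf pages. The `LEAF_PAGES` contents and the function names are hypothetical; moving from one inner list to the next stands in for following the links between pages, and `bisect_right` stands in for the index lookup that finds the starting page.

```python
from bisect import bisect_right

# Hypothetical leaf pages of the primary index, each sorted by id.
LEAF_PAGES = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def scan_le(upper):
    """id <= upper: start at the smallest record, stop at the first id that fails."""
    result = []
    for page in LEAF_PAGES:
        for record_id in page:
            if record_id > upper:
                return result  # e.g. id = 7 fails for upper = 6, so stop here
            result.append(record_id)
    return result


def scan_ge(lower):
    """id >= lower: locate the starting page, then read forward to the end."""
    first_ids = [page[0] for page in LEAF_PAGES]
    start = max(bisect_right(first_ids, lower) - 1, 0)
    result = []
    for page in LEAF_PAGES[start:]:
        result.extend(record_id for record_id in page if record_id >= lower)
    return result


print(scan_le(6))
print(scan_ge(6))
```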


Origin blog.csdn.net/weixin_44981707/article/details/108506087