MySQL, part five: index principles and slow query optimization

1. Introduction

1. Why do you need an index?

    In a typical application the read-to-write ratio is about 10:1, and inserts and ordinary updates rarely cause performance problems. In production, the problems we meet most often, and the ones most likely to go wrong, are complex query operations, so optimizing query statements is clearly the top priority. When it comes to speeding up queries, indexes have to be mentioned.

2. What is an index?

    An index, also called a "key" in MySQL, is a data structure that the storage engine uses to find records quickly. Indexes are critical to good performance, and their impact grows as the amount of data in the table increases.

    Index optimization is probably the most effective way to improve query performance; a good index can easily improve a query by orders of magnitude. An index works like the phonetic index of a dictionary: to look up a word without it, you would have to search page by page through hundreds of pages.

Notice:

    Indexing is an important aspect of application design and development. If there are too many indexes, the performance of the application may suffer. Too few indexes will have an impact on query performance. Finding a balance is critical to application performance.

2. The principle of indexing

1. The principle of indexing

The purpose of an index is to improve query efficiency, just like the table of contents we use to look something up in a book: first locate the chapter, then a subsection under that chapter, and then the page number. Similar examples are looking up a word in a dictionary, or looking up a train or flight number.

The essence is to filter out the final result by repeatedly narrowing the range of data to be examined, and at the same time to turn random events into sequential ones. In other words, with an index we can always use the same lookup method to locate data.

2. Disk IO and read-ahead

Here is a brief introduction to disk IO and read-ahead. Reading data from a disk relies on mechanical motion. The time for each read can be divided into three parts: seek time, rotational delay, and transfer time.

    The seek time is the time the disk arm needs to move to the specified track; on mainstream disks it is generally below 5 ms.

    The rotational delay depends on the disk speed we often hear about. For example, a 7200 rpm disk rotates 7200 times per minute, that is 120 times per second, and the average rotational delay is half a revolution: 1/120/2 s = 4.17 ms.

    The transfer time is the time needed to read data from or write data to the disk. It is usually a few tenths of a millisecond and is negligible compared with the first two. The time for one disk access, that is, one disk IO, is therefore about 5 + 4.17 = roughly 9 ms.

That sounds acceptable, but consider that a 500-MIPS (Million Instructions Per Second) machine executes 500 million instructions per second, because instructions run at electronic speed. In other words, in the time of one IO the machine could execute about 4.5 million instructions. A database often holds hundreds of thousands, millions, or even tens of millions of rows; paying 9 milliseconds for each access is obviously a disaster.
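The arithmetic above can be reproduced with a short sketch. It only uses the figures quoted in the text (5 ms seek time, 7200 rpm, 500 MIPS); the function names are illustrative and not part of any library.

```python
# Estimate the cost of one random disk IO from the figures quoted in the text.
SEEK_MS = 5.0   # seek time, upper bound for mainstream disks
RPM = 7200      # rotational speed of the disk
MIPS = 500      # CPU speed in million instructions per second


def rotational_delay_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    revolutions_per_second = rpm / 60
    return 1000 / revolutions_per_second / 2


def instructions_per_io(io_ms, mips):
    """Number of instructions the CPU could run while one IO completes."""
    return mips * 1_000_000 * io_ms / 1000


delay = rotational_delay_ms(RPM)
print(f"rotational delay: {delay:.2f} ms")
print(f"one IO (seek + rotation): {SEEK_MS + delay:.2f} ms")
print(f"instructions during a 9 ms IO: {instructions_per_io(9, MIPS):,.0f}")
```

The transfer time is left out on purpose, since the text treats it as negligible.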

Because disk IO is so expensive, the operating system applies an optimization: on each IO it reads into the memory buffer not only the data at the requested disk address but also the adjacent data. This follows the principle of locality, which says that when a program accesses data at one address, the adjacent data is likely to be accessed soon afterwards. The data read by one IO is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB, so reading any data within one page costs only one IO. This is very helpful when designing the data structure of an index.

3. The data structure of the index

[Figure: a B+ tree made of disk blocks, each holding data items and pointers]

1. B+ tree

The figure above shows a B+ tree. For the formal definition, see a B+ tree reference; only the key points are covered here. Each light blue block is called a disk block. Each disk block contains several data items (shown in dark blue) and pointers. For example, disk block 1 contains data items 17 and 35 and pointers P1, P2 and P3: P1 points to the disk block with values less than 17, P2 to the block with values between 17 and 35, and P3 to the block with values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data; they only store data items that guide the search direction. For example, 17 and 35 do not actually exist in the data table.

2. The search process of a B+ tree

    As shown in the figure, to find data item 29, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The time spent in memory is so short compared with a disk IO that it can be ignored. Disk block 3 is then loaded into memory through the disk address held in P2 of disk block 1, which is the second IO. Since 29 lies between 26 and 30, pointer P2 of disk block 3 is selected, and disk block 8 is loaded through it, which is the third IO. A binary search in memory then finds 29 and the query ends, with three IOs in total. In practice a 3-level B+ tree can represent millions of rows. If a lookup among millions of rows needs only three IOs, the performance gain is huge. Without an index, each data item would cost one IO, so millions of IOs would be needed, which is obviously very expensive.
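The search for 29 can be traced with a small simulation. This is a sketch only: blocks 1, 3 and 8 follow the description in the text, while the contents of the other blocks are an assumed reconstruction of the figure, chosen to match the listed leaf values. Each block load is counted as one IO.

```python
from bisect import bisect_right

# Disk blocks keyed by block number. Inner blocks have "children" (P1, P2, P3);
# leaf blocks only have "keys". A child of None means no block hangs off that
# pointer. Everything except blocks 1, 3 and 8 is an assumed layout.
BLOCKS = {
    1: {"keys": [17, 35], "children": [2, 3, 4]},
    2: {"keys": [8, 12], "children": [5, 6, 7]},
    3: {"keys": [26, 30], "children": [None, 8, None]},
    4: {"keys": [65, 87], "children": [9, 10, 11]},
    5: {"keys": [3, 5]},
    6: {"keys": [9, 10]},
    7: {"keys": [13, 15]},
    8: {"keys": [28, 29]},
    9: {"keys": [36, 60]},
    10: {"keys": [75, 79]},
    11: {"keys": [90, 99]},
}


def search(key, root=1):
    """Return (found, io_count) for a lookup that starts at the root block."""
    block_id, ios = root, 0
    while block_id is not None:
        node = BLOCKS[block_id]      # loading a block = one simulated disk IO
        ios += 1
        if "children" not in node:   # leaf block: the real data lives here
            return key in node["keys"], ios
        # Binary search inside the block picks the pointer to follow.
        block_id = node["children"][bisect_right(node["keys"], key)]
    return False, ios


print(search(29))  # follows block 1 -> block 3 -> block 8
print(search(17))  # 17 only guides the search; it is not stored in a leaf
```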

3. B+ tree properties

    1) The index field should be as small as possible. From the analysis above, the number of IOs depends on the height h of the B+ tree. If the table holds N rows and each disk block holds m data items, then h = log_(m+1)(N). For a fixed N, the larger m is, the smaller h is. Since m = size of a disk block / size of a data item, and the size of a disk block is the size of a data page and therefore fixed, smaller data items mean more items per block and a lower tree. This is why each data item, that is, the index field, should be as small as possible; for example, int takes 4 bytes, half of the 8 bytes of bigint. It is also why the B+ tree keeps the real data in the leaf nodes rather than in the inner nodes: if rows were stored in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. In the extreme case of one data item per block, the formula gives h = log_2(N), the height of a binary tree.
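The relation h = log_(m+1)(N) can be explored with a small calculation. This is a simplified sketch: it assumes a 16 KB block (InnoDB's default page size) and ignores pointers and page headers, so the numbers are only indicative. The item sizes in the loop are arbitrary examples.

```python
def items_per_block(block_bytes, item_bytes):
    """m = size of the disk block / size of one data item."""
    return block_bytes // item_bytes


def tree_height(n_rows, m):
    """Smallest h with (m + 1) ** h >= n_rows, i.e. h = ceil(log_(m+1) N)."""
    height, capacity = 0, 1
    while capacity < n_rows:
        capacity *= m + 1
        height += 1
    return height


BLOCK_BYTES = 16 * 1024   # assumed block size
N = 100_000_000           # assumed number of rows

for item_bytes in (4, 8, 256, 4096):
    m = items_per_block(BLOCK_BYTES, item_bytes)
    print(f"item={item_bytes:>4} B  m={m:>4}  height={tree_height(N, m)}")
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers are not misrounded.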

    2) The leftmost matching feature of the index. When the data item of the B+ tree is a composite structure such as (name, age, sex), the B+ tree builds the search tree by comparing the columns from left to right. When data such as (Zhang San, 20, F) is looked up, the tree first compares name to decide the search direction; if the names are equal, it compares age and then sex, and finally reaches the data. But when data without a name, such as (20, F), is looked up, the tree does not know which node to visit next, because name is the first comparison key used when the tree was built, and the search must start from name to know where to go. When data such as (Zhang San, F) is looked up, the tree can use name to choose the direction, but the next field, age, is missing, so it can only find all rows whose name is Zhang San and then filter those whose sex is F. This is a very important property: the leftmost matching feature of the index.
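The leftmost matching behaviour can be imitated with a sorted list of tuples standing in for the composite index. This is a sketch with made-up rows, not how InnoDB is implemented: a binary search works whenever the leading columns are given, and falls back to a full scan when name is missing.

```python
# A composite index on (name, age, sex), modelled as a sorted list of tuples.
# All rows are hypothetical sample data.
INDEX = sorted([
    ("Li Si", 20, "F"),
    ("Wang Wu", 25, "M"),
    ("Zhang San", 20, "F"),
    ("Zhang San", 30, "M"),
])


def prefix_range(index, prefix):
    """Binary-search the rows whose leading columns equal `prefix`."""
    k = len(prefix)
    lo, hi = 0, len(index)
    while lo < hi:                    # first row with leading columns >= prefix
        mid = (lo + hi) // 2
        if index[mid][:k] < prefix:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(index)
    while lo < hi:                    # first row with leading columns > prefix
        mid = (lo + hi) // 2
        if index[mid][:k] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return index[start:lo]


def full_scan(index, age, sex):
    """Without name there is no ordering to exploit: check every row."""
    return [row for row in index if row[1] == age and row[2] == sex]


exact = prefix_range(INDEX, ("Zhang San", 20, "F"))    # all three columns
by_name = prefix_range(INDEX, ("Zhang San",))          # name only
name_then_sex = [row for row in by_name if row[2] == "F"]  # filter after the range
no_name = full_scan(INDEX, 20, "F")                    # (20, F): no leading column
print(exact, name_then_sex, no_name, sep="\n")
```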


4. Clustered index and auxiliary index

In a database, the height of a B+ tree is generally 2 to 4 levels, which means that finding the row record for a given key value takes at most 2 to 4 IOs. That is not bad: an ordinary mechanical hard disk can do at least 100 IOs per second, so 2 to 4 IOs mean a query time of only 0.02 to 0.04 seconds. The B+ tree indexes in a database can be divided into clustered indexes and auxiliary (secondary) indexes.

    What clustered and auxiliary indexes have in common: both are B+ trees internally, that is, height-balanced, with all the data stored in the leaf nodes.

    The difference between a clustered index and an auxiliary index: whether the leaf node stores the information of the whole row.

1. Clustered index

InnoDB tables are index-organized tables, that is, the data in the table is stored in primary key order. The clustered index builds a B+ tree on the primary key of each table, and its leaf nodes store the row records of the entire table, which is why the leaf nodes of the clustered index are also called data pages. This property means that the data of an index-organized table is itself part of the index. As in any B+ tree, the data pages are linked to each other through a doubly linked list.

    If no primary key is defined, MySQL takes the first unique index (UNIQUE) whose columns are all non-null (NOT NULL) as the primary key, and InnoDB uses it as the clustered index.

    If there is no such index, InnoDB generates a hidden six-byte ID value itself and uses it as the clustered index.

Since the actual data pages can only be sorted according to one B+ tree, each table can have only one clustered index. In many cases the query optimizer prefers the clustered index, because it finds the data directly on the leaf nodes of the B+ tree. In addition, because the clustered index defines the logical order of the data, it gives particularly fast access for range queries.

    1) The first benefit of a clustered index: sorting and range lookups on the primary key are very fast, and the data in the leaf node is exactly the data the user wants. For example, to query the last 10 users in a table, the last data page can be found quickly because the leaf pages of the B+ tree are linked by a doubly linked list, and the 10 records are read from it.

    2) The second benefit of a clustered index: range queries. To find data within a certain range of the primary key, the page range can be obtained from the intermediate nodes just above the leaf nodes, and the data pages are then read directly.

2. Auxiliary index

Apart from the clustered index, all other indexes on a table are auxiliary indexes (also called secondary or non-clustered indexes). The difference from the clustered index is that the leaf nodes of an auxiliary index do not contain all the data of the row. In addition to the key value, each index entry in a leaf node contains a bookmark, which tells the InnoDB storage engine where to find the row data that corresponds to the entry. Since InnoDB tables are index-organized, the bookmark of an InnoDB auxiliary index is the clustered index key of the corresponding row.

    Auxiliary indexes do not affect how data is organized in the clustered index, so a table can have several auxiliary indexes but only one clustered index. When looking up data through an auxiliary index, InnoDB traverses the auxiliary index, obtains the primary key from the leaf-level entry, and then uses the primary key (clustered) index to find the complete row record.

    For example, when searching an auxiliary index tree of height 3, the tree must be traversed through 3 levels to find the primary key. If the clustered index tree also has height 3, it must then be traversed through 3 more levels to find the page that contains the complete row. In total, 6 logical IOs are needed to reach the final data page.
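The two-step lookup can be sketched with two dictionaries, one for each index. This is only a model under stated assumptions: the rows are made up, the dictionaries stand in for B+ trees, and the IO count simply adds the two tree heights as in the example above.

```python
# Clustered index: primary key -> full row (the leaf pages hold whole rows).
CLUSTERED = {
    1: {"id": 1, "name": "Zhang San", "age": 20},
    2: {"id": 2, "name": "Li Si", "age": 25},
}

# Auxiliary index on name: key value -> primary key (the "bookmark").
IDX_NAME = {row["name"]: pk for pk, row in CLUSTERED.items()}


def find_by_name(name, auxiliary_height=3, clustered_height=3):
    """Return (row, logical_io_count) for a lookup through the auxiliary index."""
    ios = auxiliary_height            # descend the auxiliary index tree
    pk = IDX_NAME.get(name)
    if pk is None:
        return None, ios              # no match: the clustered index is not visited
    ios += clustered_height           # descend the clustered index tree with the pk
    return CLUSTERED[pk], ios


row, ios = find_by_name("Li Si")
print(row, ios)
```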


5. MySQL index management

1. Function

1) The function of an index is to speed up lookups

2) PRIMARY KEY, UNIQUE, and joint unique keys in MySQL are also indexes. Besides speeding up lookups, these indexes also act as constraints

2. Commonly used MySQL indexes

Ordinary index INDEX: speeds up lookups

Unique index:

    - Primary key index PRIMARY KEY: speeds up lookups + constraint (not null, no duplicates)

    - Unique index UNIQUE: speeds up lookups + constraint (no duplicates)

Joint index:

    - PRIMARY KEY(id,name): joint primary key index

    - UNIQUE(id,name): joint unique index

    - INDEX(id,name): joint ordinary index

3. The two major index types are hash and btree (the index type can be specified when creating the indexes above)

    Hash index: fast for single-value lookups, slow for range queries

    Btree index: a B+ tree; the amount of data it can hold grows exponentially with the number of levels (we use it because it is InnoDB's default)
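The trade-off between the two types can be illustrated with a dictionary (unordered, like a hash index) and a sorted list (ordered, like the leaf level of a B+ tree). This is a sketch on a hypothetical column of ages, not a model of any storage engine.

```python
from bisect import bisect_left, bisect_right

# Hypothetical column values; the list position serves as the row id.
AGES = [31, 18, 25, 42, 25, 19]

# Hash-style index: value -> row ids, with no ordering between values.
hash_index = {}
for row_id, age in enumerate(AGES):
    hash_index.setdefault(age, []).append(row_id)

# Btree-style index: (value, row id) pairs kept in sorted order.
sorted_index = sorted((age, row_id) for row_id, age in enumerate(AGES))


def hash_equal(age):
    """Single-value lookup: one direct probe."""
    return hash_index.get(age, [])


def hash_range(low, high):
    """Range query on a hash index: every key has to be examined."""
    return sorted(r for age, rows in hash_index.items()
                  if low <= age <= high for r in rows)


def sorted_range(low, high):
    """Range query on a sorted index: two binary searches, then one slice."""
    start = bisect_left(sorted_index, (low, -1))
    end = bisect_right(sorted_index, (high, len(AGES)))
    return sorted(r for _, r in sorted_index[start:end])


print(hash_equal(25), hash_range(19, 30), sorted_range(19, 30))
```

Both range functions return the same rows; the difference is that the sorted structure only touches the matching slice.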

Different storage engines support different index types

    InnoDB supports transactions, supports row-level locking, supports B-tree, Full-text and other indexes, but does not support Hash indexes;

    MyISAM does not support transactions, supports table-level locking, supports indexes such as B-tree and Full-text, and does not support Hash indexes;

    Memory does not support transactions, supports table-level locking, supports indexes such as B-tree and Hash, and does not support Full-text indexes;

    NDB supports transactions, supports row-level locking, supports Hash indexes, but does not support indexes such as B-tree and Full-text;

    Archive does not support transactions, supports row-level locking, and does not support indexes such as B-tree, Hash, and Full-text;

4. Syntax for creating/deleting indexes

Method 1: When creating the table

      CREATE TABLE table_name (
                field_name1 data_type [integrity constraints...],
                field_name2 data_type [integrity constraints...],
                [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
                [index_name] (field_name[(length)] [ASC | DESC])
                );

Method 2: CREATE INDEX on an existing table

        CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
                     ON table_name (field_name[(length)] [ASC | DESC]);

Method 3: ALTER TABLE on an existing table

        ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
                             index_name (field_name[(length)] [ASC | DESC]);

Drop index: DROP INDEX index_name ON table_name;
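To see what the templates look like once filled in, the sketch below builds concrete statements as strings. The helper functions, the table name employee and the index name idx_employee_name are all hypothetical; the generated text follows the templates above and is not sent to any server.

```python
def column_spec(column, length=None, order=None):
    """Render field_name[(length)] [ASC | DESC]."""
    spec = column
    if length is not None:
        spec += f"({length})"
    if order is not None:
        spec += f" {order}"
    return spec


def create_index(table, index, columns, kind=None):
    """Method 2: CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX ... ON ..."""
    kind_part = f"{kind} " if kind else ""
    return f"CREATE {kind_part}INDEX {index} ON {table} ({', '.join(columns)});"


def add_index(table, index, columns, kind=None):
    """Method 3: ALTER TABLE ... ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX ..."""
    kind_part = f"{kind} " if kind else ""
    return f"ALTER TABLE {table} ADD {kind_part}INDEX {index} ({', '.join(columns)});"


def drop_index(table, index):
    """DROP INDEX index_name ON table_name."""
    return f"DROP INDEX {index} ON {table};"


print(create_index("employee", "idx_employee_name", [column_spec("name", 10)], "UNIQUE"))
print(add_index("employee", "idx_name_age", [column_spec("name"), column_spec("age", order="DESC")]))
print(drop_index("employee", "idx_employee_name"))
```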


6. Index test


7. Correct use of indexes


8. Joint index and covering index

1. Joint index

A joint index is an index built on several columns of a table. It is created in the same way as a single-column index, except that it has several index columns.

2. Covering index

The InnoDB storage engine supports covering indexes (also called index coverage): the queried columns can be read from an auxiliary index alone, without looking up the record in the clustered index.
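Whether an index covers a query comes down to a set comparison. The sketch below relies on the fact described earlier that an InnoDB auxiliary index stores the primary key as its bookmark, so the primary key columns are available from the index as well. The function and the column names are hypothetical.

```python
def is_covering(selected_columns, index_columns, primary_key_columns):
    """True if every selected column can be read from the auxiliary index alone."""
    available = set(index_columns) | set(primary_key_columns)
    return set(selected_columns) <= available


# Auxiliary index on (name, age); primary key (id).
print(is_covering(["id", "name"], ["name", "age"], ["id"]))     # index is enough
print(is_covering(["name", "email"], ["name", "age"], ["id"]))  # needs the clustered index
```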


9. The query optimization tool: EXPLAIN


10. The basic steps of slow query optimization

0. Run the query first to confirm that it is really slow, and set SQL_NO_CACHE so that the query cache does not hide the real cost

1. Query each table on its own with its WHERE conditions and lock onto the table that returns the fewest records. That is, apply the WHERE clauses of the statement to each table, start from the table that returns the fewest rows, and query each field of that table separately to see which field has the highest selectivity

2. Run EXPLAIN to view the execution plan and check whether it matches the expectation from step 1 (the query should start from the table that returns fewer records)

3. For SQL statements of the form ORDER BY ... LIMIT, let the table being sorted be queried first

4. Understand the usage scenario on the business side

5. When adding indexes, follow the principles of index building

6. Observe the result; if it does not meet expectations, start the analysis again from step 0
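The "degree of discrimination" in step 1 is usually measured as selectivity: the number of distinct values in a column divided by the number of rows. A minimal sketch on made-up rows follows; in practice the same ratio would be computed on the real table.

```python
def selectivity(rows, column):
    """Distinct values / total rows; values close to 1.0 discriminate best."""
    values = [row[column] for row in rows]
    return len(set(values)) / len(values) if values else 0.0


# Hypothetical sample rows.
ROWS = [
    {"name": "Zhang San", "sex": "F", "city": "Beijing"},
    {"name": "Li Si", "sex": "M", "city": "Beijing"},
    {"name": "Wang Wu", "sex": "M", "city": "Shanghai"},
    {"name": "Zhao Liu", "sex": "F", "city": "Beijing"},
]

for column in ("name", "sex", "city"):
    print(column, selectivity(ROWS, column))
```

A column such as sex, with only a few distinct values, filters out few rows and is a poor leading column for an index.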


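11. Slow log management

The slow log is controlled by:

    - the execution time threshold: statements that run longer than 10 seconds are logged (long_query_time)

    - statements that do not use an index (log_queries_not_using_indexes)

    - the log file path (slow_query_log_file)

Configuration:

    - In memory (takes effect at runtime)

        show variables like '%query%';
        show variables like '%queries%';
        set global variable_name = value;

    - In the configuration file

        mysqld --defaults-file='E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\my-default.ini'

    my.conf content:

        slow_query_log = ON
        slow_query_log_file = D:/....

    Note: after modifying the configuration file, the service must be restarted for the change to take effect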

