Learning MySQL Indexes Together, Part 2: High-Performance Indexing Strategies

In the previous article we looked at the data structures behind MySQL indexes and at how the MyISAM and InnoDB storage engines implement B-Tree indexes. Now that the mechanism behind indexes is clear, today Strong Brother will talk about optimizing MySQL indexes.

But before that we need to understand some concepts:

  • The three-star system

  • Index selectivity

  • B-Tree index limitations

The three-star system

First, let's look at the "three-star system" for judging whether an index suits a query: an index earns one star if it places the relevant rows next to each other; a second star if the rows in the index are sorted in the order the query needs; and a third star if the index contains all the columns the query needs.

With the three-star system we can better judge the quality of an index.

Index selectivity

Index selectivity is the ratio of the number of distinct index values (also called the cardinality) to the total number of rows in the table (rows). It ranges from 1/rows to 1. The higher the selectivity, the more efficient the index, because a highly selective index lets MySQL filter out more rows during a lookup. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance. By contrast, if we index a gender column, the selectivity is very low and the index brings little benefit.
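To make the ratio concrete, here is a minimal Python sketch of the definition above. The column names and sample values are made up for illustration; in practice the numbers come from the table itself.

```python
def selectivity(values):
    """Distinct values divided by total rows (0.0 for an empty column)."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical columns of a six-row table.
user_id = [1, 2, 3, 4, 5, 6]               # unique column
gender = ["M", "F", "M", "F", "M", "F"]    # only two distinct values

unique_ratio = selectivity(user_id)
gender_ratio = selectivity(gender)
print(unique_ratio, gender_ratio)
```

The unique column reaches the maximum of 1, while the gender column stays at 2/6 however many rows are added with the same two values.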

B-Tree index limitations

From the previous article on B-Tree indexes we know that the index tree is ordered, which also imposes some limitations:

  • If a lookup does not start from the leftmost column of the index, the index cannot be used.

  • Columns in the index cannot be skipped. If a query leaves out a column in the middle, only the columns before the gap can be matched.

  • If the query has a range condition on a column, none of the columns to its right can use the index to optimize the lookup.
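The three limitations can be expressed as one small rule. The Python sketch below is a simplified model, not how the MySQL optimizer is implemented: it only walks the index columns from left to right and stops at the first gap or range condition. The function name and the column names are hypothetical.

```python
def usable_index_columns(index_columns, predicates):
    """Return the index columns a B-Tree lookup can use in this simplified model.

    predicates maps a column name to "eq" (equality) or "range" (<, >, BETWEEN, ...).
    """
    used = []
    for column in index_columns:
        kind = predicates.get(column)
        if kind is None:
            break               # gap: this column and everything after it is unusable
        used.append(column)
        if kind == "range":
            break               # columns to the right of a range condition are not used
    return used


index = ["last_name", "first_name", "birth_date"]

print(usable_index_columns(index, {"first_name": "eq"}))
print(usable_index_columns(index, {"last_name": "eq", "birth_date": "eq"}))
print(usable_index_columns(
    index, {"last_name": "eq", "first_name": "range", "birth_date": "eq"}))
```

The first call uses no index column because the leftmost column is missing, the second stops at the skipped first_name, and the third stops after the range condition on first_name.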

Next, let's talk about some high-performance indexing strategies:

Indexing by selectivity

Based on the index selectivity described above, we can identify two situations in which creating an index is not recommended:

  1. The table holds little data. For a table with only a few hundred or a couple of thousand rows there is no need to build an index; a full table scan is fine. How many rows count as "a lot" is a matter of opinion; in my personal experience, about 5,000 rows is the dividing line.

  2. The index has low selectivity. When selectivity is low, the number of rows the storage engine returns through the index is almost the same as the total number of rows, so the extra index lookup costs more than it saves and a full table scan is faster.

Prefix indexes

Sometimes the column to be indexed is very long, which makes the index large and slow. In that case you can use a prefix index: build the index on a prefix of the column instead of the whole column, which saves a lot of index space and makes the index more efficient. For example, for a column of length 11 (stored value: performance) we can index a prefix of length 7 (index value: perform).

However, a prefix index also reduces selectivity: if the prefix is too short, the selectivity is low and the index loses its value. So the prefix must be long enough to keep selectivity high, but not so long that the index becomes too large.

We can compute the selectivity of an indexed column with the following statements:

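SELECT COUNT(DISTINCT index_column) / COUNT(*) FROM table_name; -- single-column index
SELECT COUNT(DISTINCT LEFT(index_column, prefix_length)) / COUNT(*) FROM table_name; -- single-column prefix index
SELECT COUNT(DISTINCT index_column1, index_column2) / COUNT(*) FROM table_name; -- multi-column index (avoids CONCAT merging pairs such as 'ab','c' and 'a','bc')
SELECT COUNT(DISTINCT index_column1, LEFT(index_column2, prefix_length)) / COUNT(*) FROM table_name; -- multi-column prefix index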

When the selectivity is high enough and the prefix is not too long, we can create the prefix index.
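The trade-off between prefix length and selectivity can be explored with a short Python sketch. It assumes a non-empty list of strings; the sample city names and the 0.9 target ratio are arbitrary choices for illustration, not values from MySQL.

```python
def prefix_selectivity(values, length):
    """Selectivity of the column when only the first `length` characters are indexed."""
    return len({value[:length] for value in values}) / len(values)


def shortest_good_prefix(values, target_ratio=0.9):
    """Smallest prefix length whose selectivity reaches target_ratio of the full column's."""
    full = len(set(values)) / len(values)
    longest = max(len(value) for value in values)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target_ratio * full:
            return length
    return longest


# Hypothetical column values (assumption: non-empty list of strings).
cities = ["london", "longview", "lisbon", "lima", "lagos", "lahore", "london", "lima"]

for length in range(1, 5):
    print(length, prefix_selectivity(cities, length))
print(shortest_good_prefix(cities))
```

With this sample a prefix of 3 still merges "london" and "longview", so the sketch settles on a prefix of 4, where the prefix selectivity equals that of the full column.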

Choosing an appropriate index column order

In a multi-column B-Tree index, the column order means that the index is sorted first by the leftmost column, then by the second column, and so on. The index can therefore be scanned in ascending or descending order to satisfy ORDER BY, GROUP BY and DISTINCT clauses that exactly match the column order. This makes the column order of a multi-column index critical.

How should we choose the column order? Based on the selectivity discussed above, a reasonable rule of thumb follows: put the most selective column first in the index.

When sorting and grouping do not need to be considered, putting the most selective column first is usually a good choice, because the index then serves only to optimize lookups in the WHERE clause. However, performance depends not only on the selectivity of the indexed columns as a whole, but also on the specific values used in the query conditions, that is, on the distribution of values.
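The following Python sketch shows both halves of that advice on made-up data: ordering columns by average selectivity, and then checking how individual values are distributed. The table contents and column names are hypothetical.

```python
from collections import Counter


def selectivity(values):
    """Distinct values divided by total rows."""
    return len(set(values)) / len(values)


# Hypothetical (staff_id, customer_id) pairs.
rows = [
    (1, 101), (1, 102), (2, 103), (1, 104),
    (2, 105), (1, 101), (2, 106), (1, 107),
]
columns = {
    "staff_id": [row[0] for row in rows],
    "customer_id": [row[1] for row in rows],
}

# Rule of thumb: most selective column first.
order = sorted(columns, key=lambda name: selectivity(columns[name]), reverse=True)
print(order)

# The average hides the skew: count how many rows each specific value matches.
staff_counts = Counter(columns["staff_id"])
print(staff_counts.most_common())
```

The average selectivity puts customer_id first, but the per-value counts are what tell you how many rows a particular query condition will actually touch.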

Problems arise when a particular condition value matches far more rows than normal. For example, some applications have a special administrator account that differs from ordinary accounts: every other user in the system is a friend of this account. Its huge friend list can easily cause performance problems on the site's servers. In this case, even with an index, fetching this user's friend list retrieves almost every row in the table, so the index is basically useless.

The solution is to change the application code so that it recognizes these special users and does not run such queries for them, rather than relying on the index.

Prefer auto-increment primary keys in InnoDB

As the previous article explained, the InnoDB engine in MySQL uses a B-Tree clustered index: the data records themselves are stored in the leaf nodes of the primary index. This requires the records within a leaf node (the size of one memory page or disk page) to be stored in primary key order. So whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key, and when a page reaches its load factor (15/16 by default in InnoDB), a new page (node) is opened.

If the table uses an auto-increment primary key, each new record is appended right after the last record of the current index node, and when a page fills up a new page is opened automatically.

This produces a compact index structure that is filled almost sequentially. Since no existing data has to be moved on insert, it is very efficient and adds little overhead for index maintenance.

If the primary key is not auto-incremented (for example an ID card number or a student number), the key values arrive in essentially random order, so each new record has to be inserted somewhere in the middle of an existing index page.

MySQL then has to move data to put the new record in the right position. The target page may even have been written back to disk and evicted from the cache, in which case it must be read back from disk, which adds considerable overhead. The frequent moves and page splits also cause heavy fragmentation and leave the index structure less compact, so the table later has to be rebuilt with OPTIMIZE TABLE to repack the pages. Therefore, whenever possible, define an auto-increment column as the primary key on InnoDB tables.
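The difference between the two insert patterns can be seen in a toy simulation. This Python sketch is only a rough model: pages are plain sorted lists with a made-up capacity of 4 rows, a full page is split in half, and InnoDB's real page format, 15/16 fill factor and split heuristics are all ignored.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=4):
    """Insert keys into ordered pages; return (page_count, split_count)."""
    pages = [[]]        # pages are kept in key order; each page is a sorted list
    splits = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        first_keys = [page[0] for page in pages[1:]]
        index = bisect.bisect_right(first_keys, key)
        page = pages[index]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif index == len(pages) - 1 and key > page[-1]:
            pages.append([key])             # append at the end: just open a new page
        else:
            middle = len(page) // 2         # split: move the upper half to a new page
            new_page = page[middle:]
            del page[middle:]
            pages.insert(index + 1, new_page)
            splits += 1
            bisect.insort(new_page if key >= new_page[0] else page, key)
    return len(pages), splits


sequential_keys = list(range(1, 101))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)

sequential_pages, sequential_splits = simulate_inserts(sequential_keys)
random_pages, random_splits = simulate_inserts(random_keys)
print("sequential:", sequential_pages, "pages,", sequential_splits, "splits")
print("random:    ", random_pages, "pages,", random_splits, "splits")
```

Sequential keys only ever append, so every page ends up full and no split occurs. Keys arriving out of order force splits that leave half-empty pages behind, which is the fragmentation described above.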

Of course, the discussion above ignores the problem of duplicate auto-increment ids in a distributed database system. That can be solved with a dedicated database table that generates increasing ids, with Redis, or with the snowflake algorithm. In short, once we understand the B-Tree clustered index, we can use that knowledge to design a reasonably efficient primary key.

Using a deferred join to get partial index coverage

The previous article also introduced the concept of a "covering index": when the columns of an index cover all the fields in the SELECT, the query uses a covering index. A covering index lets MySQL get the data it needs from the index alone, without going back to the table, and because the index is ordered, it is very efficient for I/O-intensive range queries.

In practice, though, we rarely get a fully covering index: the SELECT usually needs more columns than the index contains, so the covering index cannot be used. Is there still a way to benefit from one?

This is where a deferred join helps. In the first stage, a subquery in the FROM clause uses a covering index to find the ids (or another unique field) of the matching rows; the outer query then uses those ids to fetch all the columns it needs. The query as a whole is still not covered by the index, but this is better than not using a covering index at all.
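The shape of such a query is sketched below. To keep the example runnable it uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only demonstrates the structure of the query and that both forms return the same rows; it says nothing about MySQL's execution plan or timings. The film table, its columns and the idx_actor_title index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE film (id INTEGER PRIMARY KEY, actor TEXT, title TEXT, description TEXT)"
)
conn.execute("CREATE INDEX idx_actor_title ON film (actor, title)")
conn.executemany(
    "INSERT INTO film (actor, title, description) VALUES (?, ?, ?)",
    [("SEAN", f"TITLE {i:03d}", f"long text {i}") for i in range(200)],
)

# Direct form: every skipped row is read in full before being thrown away.
direct = conn.execute(
    "SELECT * FROM film WHERE actor = 'SEAN' ORDER BY title LIMIT 5 OFFSET 100"
).fetchall()

# Deferred join: the subquery touches only (actor, title, id), which the index
# can supply; the outer query fetches full rows for the five ids that survive.
deferred = conn.execute(
    """
    SELECT film.*
    FROM film
    JOIN (
        SELECT id
        FROM film
        WHERE actor = 'SEAN'
        ORDER BY title
        LIMIT 5 OFFSET 100
    ) AS page ON page.id = film.id
    ORDER BY film.title
    """
).fetchall()

print(deferred)
```

The same statement shape works in MySQL; whether it pays off there depends on the data and should be checked with EXPLAIN on the real table.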

Strong Brother should also mention that MySQL 5.6 brought a significant improvement to the storage engine API called "index condition pushdown". This feature greatly improves queries of this kind. The next article will cover it, so stay tuned.

Using index scans for sorting

Scanning the index itself is fast, because it only requires moving from one index record to the next. However, if the index does not cover all the columns the query needs, MySQL has to go back to the table for the corresponding row every time it scans an index record. This is essentially random I/O, so reading data in index order is usually slower than a sequential full table scan, especially for I/O-intensive workloads.

MySQL can use an index to sort the results only when the index column order exactly matches the ORDER BY clause and all columns are sorted in the same direction (all ascending or all descending). If the query joins multiple tables, the index can be used for sorting only when every column in the ORDER BY clause refers to the first table. The ORDER BY clause must also satisfy the leftmost-prefix requirement of the index; otherwise MySQL has to perform a sort operation and cannot use the index for sorting.

There is one case where the ORDER BY clause does not have to satisfy the leftmost-prefix requirement: when the leading columns are constants. If the WHERE clause or a JOIN clause specifies constants for those columns, they fill the gap in the index.
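These rules can be written down as a small check. The Python sketch below is a simplified model rather than MySQL's actual logic: it assumes an index whose columns are all ascending, treats mixed directions as unsortable by index, and only lets constants stand in for leading index columns. The function and column names are hypothetical.

```python
def can_sort_with_index(index_columns, order_by, constant_columns=()):
    """Simplified check: can the index return rows already in ORDER BY order?

    order_by is a list of (column, direction) pairs with direction "ASC" or "DESC".
    constant_columns are columns fixed to a constant by WHERE or JOIN conditions.
    """
    if len({direction for _, direction in order_by}) > 1:
        return False                    # mixed directions need a sort in this model
    wanted = [column for column, _ in order_by]
    position = 0
    for column in index_columns:
        if position == len(wanted):
            break
        if column == wanted[position]:
            position += 1
        elif position == 0 and column in constant_columns:
            continue                    # a constant leading column fills the gap
        else:
            return False
    return position == len(wanted)


index = ["rental_date", "inventory_id", "customer_id"]
by_inventory = [("inventory_id", "ASC"), ("customer_id", "ASC")]

print(can_sort_with_index(index, [("rental_date", "ASC"), ("inventory_id", "ASC")]))
print(can_sort_with_index(index, by_inventory))
print(can_sort_with_index(index, by_inventory, constant_columns={"rental_date"}))
print(can_sort_with_index(index, [("rental_date", "ASC"), ("inventory_id", "DESC")]))
```

The second call fails the leftmost-prefix requirement, and the third succeeds only because rental_date is pinned to a constant.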

If you do need to sort in different directions, one trick is to store a reversed or negated form of the column value, for example the negation of a numeric value. Sorting on the stored value then gives that column the opposite direction from the other columns, which is the order you ultimately want.
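For the numeric case the trick looks like this in a small Python sketch. The names and scores are made up; the sorted list plays the role of an index on (name, negated score) scanned in one direction.

```python
# Hypothetical (name, score) rows; the goal is name ascending, score descending.
rows = [("alice", 30), ("bob", 25), ("alice", 45), ("bob", 40)]

# Store the negated score, then sort both stored columns in ascending order,
# which is what a single-direction scan of an index on (name, neg_score) yields.
stored = sorted((name, -score) for name, score in rows)

# Negate again when reading to recover the original values.
result = [(name, -neg_score) for name, neg_score in stored]
print(result)  # [('alice', 45), ('alice', 30), ('bob', 40), ('bob', 25)]
```

The price of the trick is that every reader and writer of the column has to remember the stored value is negated.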

Summary

From the above we can see how important it is to understand the data structures behind MySQL indexes when creating and optimizing them. Once we understand the underlying details we can work with more confidence, instead of guessing blindly or relying on other people's heuristics and rules of thumb to solve the problems we run into.

Don't envy flashy moves; cultivate your inner strength, and you too will soon become an expert.


Origin blog.csdn.net/seanxwq/article/details/93663081