Principle and optimization of MySQL indexes

Foreword

People online often say that learning MySQL comes down to two topics: indexes and transactions. After studying MySQL recently, I think there are really three: indexes, queries, and transactions, where "queries" mainly means query optimization, that is, writing efficient SQL statements.

This article records what I learned about MySQL indexes along the way. It is mostly my understanding of, and some extensions to, the book "High Performance MySQL".

What is an index?

An index is a data structure that the storage engine uses to find records quickly.

This is the official definition of a MySQL index. As it says, an index is a data structure, but how should we understand it? A common analogy is a book's table of contents. We are all in the habit of using it: when we get a book, we look at the table of contents first, and when we want to find something, we look it up there, note the page number of the relevant section, and turn to that page. Without an index (the table of contents), we would have to search the book page by page.

In MySQL, suppose we have a table with the following records:

id  name       age
1   huyan      10
2   huiui      18
3   lumingfei  20
4   chuzihang  15
5   ninth      21

If we want to find the name of the person whose age is 15, then without an index we can only scan all the rows and compare them one by one, so the time complexity is O(n).

But suppose that, while inserting data, we also maintain an additional array that stores the age field in sorted order. We get the following array:

[10,15,18,20,21]
 |  |  |  |  |
[x1,x4,x2,x3,x5]

The x values in the second row stand for the locations of the rows on disk. Now, to find the name of the 15-year-old, we can run a binary search on the sorted array. As we all know, binary search has a time complexity of O(log n). Once the entry is found, we fetch the real data from the stored location.

PS: MySQL indexes do not use an array; they use a B+ tree (discussed later). The array is used here only because it is easier to understand.
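
As a rough MySQL counterpart to the array example, the sketch below builds the sample table and adds an index on age. The table name user_demo and the index name idx_age are assumptions made for illustration. On a table this small the optimizer may still prefer a full scan, so treat the EXPLAIN output as something to inspect rather than a guaranteed result.

```sql
-- Hypothetical demo table mirroring the sample rows above.
CREATE TABLE user_demo (
    id   INT PRIMARY KEY,
    name VARCHAR(32) NOT NULL,
    age  INT NOT NULL
) ENGINE = InnoDB;

INSERT INTO user_demo (id, name, age) VALUES
    (1, 'huyan', 10),
    (2, 'huiui', 18),
    (3, 'lumingfei', 20),
    (4, 'chuzihang', 15),
    (5, 'ninth', 21);

-- Without an index on age, this lookup has to examine every row.
EXPLAIN SELECT name FROM user_demo WHERE age = 15;

-- Keep the age values sorted in a secondary index (idx_age is an assumed name).
ALTER TABLE user_demo ADD INDEX idx_age (age);

-- The same lookup can now search the index instead of scanning the table.
EXPLAIN SELECT name FROM user_demo WHERE age = 15;
```

Comparing the type, key, and rows columns of the two EXPLAIN results is usually enough to see whether the index is being used.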

What can indexes do for us?

As mentioned above, an index helps us find data quickly. In addition, because index values are stored in sorted order, an index can help with ORDER BY operations. And because the index also stores the actual column values, some queries can be answered from the index alone (this is the idea of a covering index, discussed later).

To summarize, the advantages of indexes (as listed in "High Performance MySQL") are:

  • They reduce the amount of data a query has to scan (speeding up queries).
  • They reduce server-side sorting and temporary tables (speeding up GROUP BY, ORDER BY, and similar operations).
  • They turn random I/O into sequential I/O (speeding up queries).

What are the disadvantages of indexes?

First, index data has to be stored too, so an index takes up additional storage space. Second, the index has to be maintained on every insert, update, and delete, which adds time overhead.

To sum up:

  • Indexes take up disk space or memory.
  • Indexes slow down insert, update, and delete operations.

In fact, within a certain data range (as long as there are not too many indexes), the overhead of an index is much smaller than the benefit it brings, but we should still avoid abusing indexes.

What types of indexes are there?

In MySQL, indexes are implemented not at the server layer but by the storage engines, so different storage engines implement indexes differently. InnoDB, currently the most widely used storage engine, uses B+ tree indexes, so most of the time that is what we mean when we say "index".

MySQL mainly has the following kinds of indexes:

  • B-tree index / B+ tree index
  • Hash index
  • Spatial data index
  • Full-text index

This article covers only B-tree and B+ tree indexes.

B-tree and B+ tree indexes

The principles of the B-tree and B+ tree data structures are not explained in great detail here. Interested readers can refer to other articles on the subject or look them up on Google.

B-tree

A B-tree is a balanced multiway search tree. A B-tree of order m has the following properties:

  1. The root node has at least two children.
  2. Each internal node contains k-1 elements and k children, where m/2 <= k <= m (with m/2 rounded up).
  3. Each leaf node contains k-1 elements, where m/2 <= k <= m.
  4. All leaf nodes are at the same level.
  5. The elements in each node are sorted in ascending order, and the k-1 elements of a node divide the value ranges covered by its k children.

This may be a bit hard to follow. A B-tree can be thought of as a shorter, fatter binary search tree.

B+ tree

The B+ tree is an enhanced version of the B-tree. It adds the following restrictions on top of the B-tree:

  1. Internal nodes do not store data and serve only as an index, which means every value that appears in a non-leaf node is also stored in a leaf node.
  2. Leaf nodes are linked to one another in key order.

What benefits does this bring?

  1. Because internal nodes do not store data, each node can hold more index entries, which reduces the number of disk I/O operations per lookup.
  2. Because internal nodes do not store data, every lookup ends at a leaf node, and all leaf nodes are at the same level, so query performance is more stable.
  3. Because all the leaf nodes form an ordered linked list, range queries are convenient.
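
To make the third point a little more concrete, here is a small sketch of a range query that can walk the linked leaf nodes in order. The table age_range_demo and the index idx_age are hypothetical names, and whether the optimizer actually picks the index depends on table size and statistics.

```sql
-- Hypothetical demo table with a secondary index on age.
CREATE TABLE age_range_demo (
    id   INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    age  INT NOT NULL,
    INDEX idx_age (age)
) ENGINE = InnoDB;

INSERT INTO age_range_demo (name, age) VALUES
    ('huyan', 10), ('huiui', 18), ('lumingfei', 20),
    ('chuzihang', 15), ('ninth', 21);

-- A range condition on the indexed column: the engine can locate age = 15 in the
-- B+ tree and then follow the leaf-node links until age exceeds 20.
-- Index entries are already sorted by age, so ORDER BY age needs no extra sort
-- when the index is used.
EXPLAIN
SELECT name, age
FROM age_range_demo
WHERE age BETWEEN 15 AND 20
ORDER BY age;
```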

How to create high-performance indexes?

Index optimization and query optimization are generally inseparable, so this section may also include some query optimization content.

Prefix indexes and index selectivity

If you want to index a long string column, consider a prefix index. Before formally introducing prefix indexes, let's look at roughly how an index is used. When the database looks something up through an index, it usually goes through the following steps:

  1. Find the matching value in the index's B+ tree, for example the record whose school name is Kassel College, and obtain the address of that row on disk.
  2. Use that address to read the row from disk and obtain all of its column values.

So if, among all school names, "Kassel" alone is enough to identify this row, can an index built on just "Kassel" be as effective as an index built on "Kassel College"?

The answer is yes, and using "Kassel" can reduce the size of the index to about 60% of the original. This is what a prefix index does.

Prefix index: when indexing a long string, you can index only its leading characters. This saves a lot of index space and so makes the index more efficient, but it also lowers the index's selectivity.

Index selectivity: the number of distinct values divided by the total number of values. Selectivity ranges from 0 to 1; the highest value belongs to a unique column, so a unique index with no duplicate values is the most efficient.

In general, though, a reasonably long prefix of a string already has fairly good selectivity, which we can compute with the following statement:

select 
    count(distinct left(school_name,3))/count(*) as sch3, 
    count(distinct left(school_name,4))/count(*) as sch4,
    count(distinct left(school_name,5))/count(*) as sch5,
    count(distinct school_name)/count(*) as original
from 
    user;

Here original is the selectivity of the full column, and sch3, sch4, and sch5 are the selectivities when the first 3, 4, and 5 characters of the column are used as the index. Keep increasing the prefix length; once the selectivity is close to original, that length is a suitable prefix length. (This is generally true, but there are exceptions: when the data is very unevenly distributed, the prefix index can perform very badly for particular values.)

After finding the right length, you can create the prefix index: alter table user add index sch_pre3(school_name(3));
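
The whole procedure can be sketched end to end as follows. The table school_prefix_demo, its rows, and the index name sch_pre2 are all made up for illustration; the school names other than Kassel College are invented sample data.

```sql
-- Hypothetical table; only school_name matters for this example.
CREATE TABLE school_prefix_demo (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    school_name VARCHAR(64) NOT NULL
) ENGINE = InnoDB;

INSERT INTO school_prefix_demo (school_name) VALUES
    ('Kassel College'), ('Kassel College'), ('Kyoto Academy'),
    ('Kiev Institute'), ('Shilan High School');

-- Compare the selectivity of candidate prefix lengths with the full column.
SELECT
    COUNT(DISTINCT LEFT(school_name, 1)) / COUNT(*) AS sch1,
    COUNT(DISTINCT LEFT(school_name, 2)) / COUNT(*) AS sch2,
    COUNT(DISTINCT school_name) / COUNT(*)          AS original
FROM school_prefix_demo;

-- Index only the first 2 characters of school_name.
ALTER TABLE school_prefix_demo ADD INDEX sch_pre2 (school_name(2));

-- The Sub_part column reports how many leading characters are indexed.
SHOW INDEX FROM school_prefix_demo;
```

With these five rows, a 2-character prefix already distinguishes four values out of five, the same as the full column, while a 1-character prefix distinguishes only two. Real data needs the same comparison on the real table.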

Note: prefix indexes are hard to combine with covering indexes. Just this morning I tried to optimize an index this way and failed; the specific reason is explained below, after covering indexes have been introduced.

Joint index

We usually need indexes on several columns, because we have many different queries. We can either create several independent indexes or create one joint (multi-column) index. Most of the time a joint index is the better choice.

Suppose we want to run this statement: select * from user where school_name = 'Kassel' and age > 20, and we have created two separate indexes, one on school_name and one on age. We expect the query to use both indexes, but checking with the explain command shows that this is not necessarily the case. It is a somewhat mysterious process, and personally I have not fully figured it out.

In theory, MySQL has supported index merge, that is, using two indexes at the same time, since version 5.0. But the optimizer does not necessarily agree: it may estimate that searching two B+ trees costs more than searching one index and then filtering the resulting rows, and so choose only one index. (I ran a similar test on five of my own tables, and only one index was used each time.)
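
One way to experiment with this is sketched below. As far as I understand the MySQL documentation, the intersection form of index merge is only considered when each secondary index is matched by equality on all of its columns, which would explain why a range condition such as age > 20 ends up using a single index. The table merge_demo and the index names are assumptions, and the table should be loaded with a realistic amount of data before the EXPLAIN output means much.

```sql
-- Hypothetical table with two independent single-column indexes.
CREATE TABLE merge_demo (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    name        VARCHAR(32) NOT NULL,
    school_name VARCHAR(64) NOT NULL,
    age         INT NOT NULL,
    INDEX idx_school (school_name),
    INDEX idx_age (age)
) ENGINE = InnoDB;

-- Range condition on age: typically at most one of the two indexes shows up in `key`.
EXPLAIN SELECT * FROM merge_demo WHERE school_name = 'Kassel' AND age > 20;

-- Equality on both columns: the optimizer may choose type = index_merge here,
-- but it is still free to use a single index if it estimates that to be cheaper.
EXPLAIN SELECT * FROM merge_demo WHERE school_name = 'Kassel' AND age = 20;
```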

The syntax for creating a joint index: alter table user add index school_age (school, age).

When using a joint index, one very important rule is that its columns can only be matched as a leftmost prefix. For example, with the school_age joint index above, a query on age alone cannot use it; that is, select * from user where age = 20 does not hit the joint index.

Ignoring any particular queries, we should put the most selective column first in a joint index. In practice, however, we more often work backwards from the queries to the index, so that a fixed set of queries hits the index as often as possible. After all, the purpose of creating an index is to speed up queries.

So a joint index is mostly optimized for specific statements, and there is no universal rule.

The leftmost prefix principle

MySQL can use an index on a column when that column's data is ordered. Suppose we have created the school_age index and the sample data is as follows:

school  age
a       12
b       12
b       14
b       15
c       1

In this data, the school field is fully ordered, so a query on school can use the index.

Across the whole table, the age field is not ordered, so it cannot use the index directly. When is age ordered, then? When school is matched to a single value: for the three rows where school = 'b', age is ordered, so the age part of the index can be used. This is the leftmost prefix principle.

In addition, the leftmost prefix can contain only one range condition. For example, select * from user where school > 'a' and select * from user where school = 'a' and age > 12 can both make full use of the index, but for select * from user where school > 'a' and age > 12, only the school part of the index can be used. This follows from the reasoning above: when school is matched by a range, MySQL cannot assume that age is ordered. For instance, the range on school matches the four rows with school b and c, and across those rows age is not ordered, so the later columns of the index cannot be used.
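
The three cases can be checked with EXPLAIN, as in the sketch below. The table leftmost_demo and the name values are hypothetical; the extra name column is there only so that select * cannot be answered from the index alone. On a five-row table the optimizer may choose differently, so this is an experiment to run rather than a guaranteed result.

```sql
-- Hypothetical table reproducing the sample data above.
CREATE TABLE leftmost_demo (
    id     INT PRIMARY KEY AUTO_INCREMENT,
    name   VARCHAR(32) NOT NULL,   -- extra column so SELECT * is not covered by the index
    school VARCHAR(32) NOT NULL,
    age    INT NOT NULL,
    INDEX school_age (school, age)
) ENGINE = InnoDB;

INSERT INTO leftmost_demo (name, school, age) VALUES
    ('n1', 'a', 12), ('n2', 'b', 12), ('n3', 'b', 14),
    ('n4', 'b', 15), ('n5', 'c', 1);

-- 1. Only age: not a leftmost prefix, so school_age cannot be used for the lookup.
EXPLAIN SELECT * FROM leftmost_demo WHERE age = 20;

-- 2. Equality on school, range on age: both index columns can be used.
EXPLAIN SELECT * FROM leftmost_demo WHERE school = 'a' AND age > 12;

-- 3. Range on school, range on age: only the school part narrows the index search.
EXPLAIN SELECT * FROM leftmost_demo WHERE school > 'a' AND age > 12;
```

When school_age is chosen, comparing the key_len column between the second and third statements should show how many bytes of the index are actually used for the search.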

Clustered index

A clustered index is not an index type but a way of storing data. In InnoDB's clustered index, the index and the data rows are stored in the same data structure.

Because the actual rows can be stored in only one order, a table can have only one clustered index. InnoDB uses the primary key as the clustered index. If there is no primary key, it picks a unique non-null index instead, and if there is none of those either, it generates an implicit primary key to cluster on. Why is InnoDB so insistent on having a clustered index? Because the rows of a table must be stored on disk in exactly one order, so one is required.

This is also why auto-increment primary keys are recommended for InnoDB: an auto-increment key is increasing and contiguous, so each insert only needs to append data at the end. Imagine using a UUID as the primary key instead: every insert has to find the right position in primary key order, insert the row there, and move existing data so that rows stay in primary key order, which is very costly.

For the same reason, what the leaf nodes of the other indexes store as "data" is not the physical address of the row but the row's primary key. After finding the primary key, MySQL does a second lookup in the primary key index to get the row.
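
A small way to observe this is sketched below, assuming a hypothetical table secondary_pk_demo with an index idx_age. If the secondary index really carries the primary key, a query that asks only for id and age should be answerable from idx_age alone.

```sql
-- Hypothetical table: id is the clustered (primary key) index, idx_age is a secondary index.
CREATE TABLE secondary_pk_demo (
    id   INT PRIMARY KEY,
    name VARCHAR(32) NOT NULL,
    age  INT NOT NULL,
    INDEX idx_age (age)
) ENGINE = InnoDB;

-- Only id and age are requested. Both live in the idx_age leaf entries
-- (age as the key, id as the stored primary key), so no second lookup is needed;
-- Extra is expected to show "Using index".
EXPLAIN SELECT id, age FROM secondary_pk_demo WHERE age = 15;

-- name is stored only in the clustered index, so after finding matching ids in
-- idx_age, InnoDB has to look each row up again by primary key.
EXPLAIN SELECT name FROM secondary_pk_demo WHERE age = 15;
```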

The difference between a clustered index and a non-clustered index can be illustrated with a simple example:

When we pick up a book, the table of contents is the primary key, a clustered index, because content that is contiguous in the table of contents is also contiguous in the body. When we want to read the chapter "A Grand Escape Facing the Sun", we just find its page number in the table of contents, say 459, and turn to that page to read the text.

A non-clustered index is more like the index of proper nouns in the appendix at the back of the book (a secondary index). When you look up "Bondarev", the appendix tells you that the term appears in the chapter "A Grand Escape Facing the Sun"; you then have to go to the table of contents (the primary key index) again to find the page number.

Covering index

When an index contains (or covers) the values of all the fields a query needs, we call it a covering index.

Imagine the following query:

select 
  school_name,age
from  
  user
where 
  school_name = '金色莺尾花学院'

This statement looks up the school name and age of rows by school name. From the lookup steps described earlier, we know that after finding the matching values in the index, MySQL normally has to do another lookup by primary key to fetch the full row, and then pick out the columns to return. But if the index already contains every column that has to be returned, there is no need to go back to the table at all. In addition, an index is usually much smaller than the actual data, so a covering index can greatly reduce the amount of data loaded from disk.
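
Whether a query is served by a covering index can be checked in the Extra column of EXPLAIN. The sketch below assumes a hypothetical table covering_demo with a joint index on (school_name, age).

```sql
-- Hypothetical table with a joint index that contains both selected columns.
CREATE TABLE covering_demo (
    id          INT PRIMARY KEY AUTO_INCREMENT,
    name        VARCHAR(32) NOT NULL,
    school_name VARCHAR(64) NOT NULL,
    age         INT NOT NULL,
    INDEX school_age (school_name, age)
) ENGINE = InnoDB;

-- Every selected column is part of school_age, so Extra is expected to show "Using index".
EXPLAIN
SELECT school_name, age
FROM covering_demo
WHERE school_name = '金色莺尾花学院';

-- name is not in the index, so the same filter now needs a lookup back to the table.
EXPLAIN
SELECT school_name, age, name
FROM covering_demo
WHERE school_name = '金色莺尾花学院';
```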

Why can a prefix index not be used as a covering index?

A prefix index uses a prefix to stand in for the real value. The prefix may be almost as selective as the full value, but MySQL still cannot know the real value from it. For example, Alibaba and Alimama are identical in their first 2 characters. To make sure that a query for Alibaba does not return Alimama, MySQL has to go back to the table, fetch the row, and filter again with an exact match.

Therefore a covering index cannot be used on a prefix-indexed column. This is the conclusion I reached after a morning of testing.
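
This can be reproduced with the two names from the example. The table prefix_cover_demo, the column company_name, and the index name pre2 are assumptions for illustration only.

```sql
-- Hypothetical table whose only secondary index is a 2-character prefix index.
CREATE TABLE prefix_cover_demo (
    id           INT PRIMARY KEY AUTO_INCREMENT,
    company_name VARCHAR(64) NOT NULL,
    INDEX pre2 (company_name(2))
) ENGINE = InnoDB;

INSERT INTO prefix_cover_demo (company_name) VALUES ('Alibaba'), ('Alimama');

-- Both rows share the indexed prefix 'Al', so the index alone cannot tell them apart.
-- MySQL has to read the full row to compare the whole value; Extra is expected to
-- show "Using where" rather than "Using index", even though only the indexed
-- column is selected.
EXPLAIN SELECT company_name FROM prefix_cover_demo WHERE company_name = 'Alibaba';
```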

Delete redundant and duplicate indexes

Some indexes are never used by any query, yet they still add overhead to every insert. Such indexes should be deleted promptly.
For example, creating an ordinary index on the primary key again serves no purpose at all.
As another example, if the joint index school_age already exists, a separate index on school is unnecessary: by the leftmost prefix principle, school_age fully serves queries on school alone, so the separate index can be removed.
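
A minimal sketch of the second case follows. The table redundant_demo and the index name idx_school are hypothetical. The two sys schema views ship with MySQL 5.7 and later; the unused-index view relies on performance_schema statistics collected since the server started, so its output is a hint, not proof that an index is safe to drop.

```sql
-- Hypothetical table with a joint index and a redundant single-column index.
CREATE TABLE redundant_demo (
    id     INT PRIMARY KEY AUTO_INCREMENT,
    school VARCHAR(32) NOT NULL,
    age    INT NOT NULL,
    INDEX school_age (school, age),
    INDEX idx_school (school)        -- redundant: a leftmost prefix of school_age
) ENGINE = InnoDB;

-- Views that can help spot candidates in the current database.
SELECT * FROM sys.schema_redundant_indexes WHERE table_schema = DATABASE();
SELECT * FROM sys.schema_unused_indexes WHERE object_schema = DATABASE();

-- Drop the single-column index; school_age still serves queries on school.
ALTER TABLE redundant_demo DROP INDEX idx_school;
```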

How to view information about indexes?

Index information

In MySQL you can use show index from table_name to list the indexes on a table, or use show create table table_name to view the table's creation statement, which includes the index definitions.

Index size

In version 5.0 and later, we can query information_schema.TABLES to get more detailed information about tables.

The meaning of each field of this table is as follows:

Field            Meaning
Table_catalog    Catalog in which the table is registered
Table_schema     Name of the database the table belongs to
Table_name       Table name
Table_type       Table type (for example, SYSTEM VIEW)
Engine           Storage engine used (for example, MyISAM)
Version          Version; the default value is 10
Row_format       Row format (for example, Compact)
Table_rows       Number of rows stored in the table
Avg_row_length   Average row length
Data_length      Data length
Max_data_length  Maximum data length
Index_length     Index length
Data_free        Fragmented (free) space
Auto_increment   Current value of the auto-increment primary key
Create_time      Table creation time
Update_time      Table update time
Check_time       Table check time
TABLE_COLLATION  Character set and collation of the table
Checksum         Checksum
Create_options   Creation options
Table_comment    Table comment

We can get detailed information with queries such as:

-- Total size of all indexes on the current MySQL server (in MB; the raw values are in bytes)
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size' FROM information_schema.TABLES;
-- Total index size of one database
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size' FROM information_schema.TABLES WHERE table_schema = 'XXX';
-- Index size of one table
SELECT CONCAT(ROUND(SUM(index_length)/(1024*1024), 2), ' MB') AS 'Total Index Size' FROM information_schema.TABLES WHERE table_schema = 'yyyy' AND table_name = 'xxxxx';
-- Summary of data size and index size for each table in a database
SELECT CONCAT(table_schema, '.', table_name) AS 'Table Name',
       CONCAT(ROUND(table_rows/1000000, 4), 'M') AS 'Number of Rows',
       CONCAT(ROUND(data_length/(1024*1024*1024), 4), 'G') AS 'Data Size',
       CONCAT(ROUND(index_length/(1024*1024*1024), 4), 'G') AS 'Index Size',
       CONCAT(ROUND((data_length+index_length)/(1024*1024*1024), 4), 'G') AS 'Total'
FROM information_schema.TABLES
WHERE table_schema LIKE 'xxxxx';

All of the columns of the TABLES table can be queried this way, including other metadata about each table, but since that is outside the topic of this article, no further examples are given here.

The data in this table is cached, so after changing a table's indexes it is best to run analyze table xxxx before querying it. MySQL also refreshes these statistics on its own when a table changes substantially (note: when its size changes by more than 1/16, or after 2 billion rows have been inserted).

Index fragmentation

Creating and deleting index entries inevitably produces index fragmentation, and of course data fragmentation as well. We can run optimize table xxx to reorganize the indexes and data. For storage engines that do not support this command, a no-op alter statement can trigger the same rebuild, for example changing the table's storage engine to the engine it already uses: alter table xxxx engine=innodb.

Reference article

Book "High Performance MySQL (third edition)"

Origin blog.csdn.net/wanghao112956/article/details/91040590