MySQL Index Tuning: An In-Depth Analysis of the Execution Plan

MySQL Index Optimization
Why are the SQL queries you write slow? Why do the indexes you build so often fail to take effect? In this chapter you will learn the causes of MySQL performance degradation, what an index is and how it works, how to create one, how to use the explain command, and what each field of the explain output means. The goal is to help you understand, analyze, and use indexes so that you can write faster SQL. What are you waiting for? Roll up your sleeves and let's get to work!

Case studies
Let's first take a brief look at the difference between non-relational and relational databases.

MongoDB is one kind of NoSQL database. NoSQL stands for "Not only SQL" and refers to non-relational databases. They are characterized by high performance, strong scalability, and a flexible data model, and they stand out in high-concurrency scenarios. However, NoSQL is only a complement to relational databases: it still lags behind them in data consistency, data security, and support for complex queries.

MySQL is a relational database with strong query capabilities, high data consistency, high data security, and support for secondary indexes. Its performance is somewhat lower than MongoDB's, though, and once a table holds more than a million rows, slow queries become likely. At that point you need to analyze why the query is slow. Usually the cause is a badly written SQL statement, a missing index, or an index that fails to take effect.

The ERP system at my company mainly uses MongoDB (the NoSQL database closest to a relational one), followed by Redis, with MySQL making up only a small part. We are now using MySQL again thanks to Alibaba's Qimen and Jushita systems. Since the number of orders already exceeds one million, analyzing MySQL performance has become all the more important.

Let's start with two simple examples. The role and meaning of each parameter are explained in detail later.
Note: the SQL used here is on GitHub. If you like it, feel free to give it a star, ha ha. https://github.com/ITDragonBlog/daydayup/tree/master/MySQL/

Scene One: importing orders, avoiding duplicates by transaction number
Business logic: when importing orders, to avoid importing the same order twice, we usually query the database by transaction number to check whether the order already exists.
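
To follow along without the repository script, here is a minimal sketch of a table that is consistent with the output shown in this article. The column names come from the query result; the column types, lengths, and character set are assumptions, chosen so that the key_len values reported by explain further down (453, 5, 68) would come out the same. The authoritative definition is the script in the GitHub repository.

```sql
-- Hypothetical definition: column names come from the query output in this
-- article, column types and lengths are assumptions.
CREATE TABLE itdragon_order_list (
    id               BIGINT NOT NULL AUTO_INCREMENT,
    transaction_id   VARCHAR(150) DEFAULT NULL,  -- transaction number
    gross            DOUBLE       DEFAULT NULL,
    net              DOUBLE       DEFAULT NULL,
    stock_id         INT          DEFAULT NULL,
    order_status     INT          DEFAULT NULL,
    descript         VARCHAR(255) DEFAULT NULL,
    finance_descript VARCHAR(255) DEFAULT NULL,
    create_type      VARCHAR(100) DEFAULT NULL,
    order_level      INT          DEFAULT NULL,
    input_user       VARCHAR(20)  DEFAULT NULL,
    input_date       VARCHAR(20)  DEFAULT NULL,  -- stored as text, e.g. '2017-08-18 17:01:49'
    PRIMARY KEY (id)
) ENGINE = InnoDB AUTO_INCREMENT = 10000 DEFAULT CHARSET = utf8;

-- One sample row matching the first query result in this article.
INSERT INTO itdragon_order_list
    (transaction_id, gross, net, stock_id, order_status, descript,
     finance_descript, create_type, order_level, input_user, input_date)
VALUES
    ('81X97310V32236260E', 6.6, 6.13, 1, 10, 'ok',
     'ok', 'auto', 1, 'itdragon', '2017-08-18 17:01:49');
```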

The most basic sql statement

mysql> select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| id    | transaction_id     | gross | net  | stock_id | order_status | descript | finance_descript | create_type | order_level | input_user | input_date          |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+
| 10000 | 81X97310V32236260E |   6.6 | 6.13 |        1 |           10 | ok       | ok               | auto        |           1 | itdragon   | 2017-08-18 17:01:49 |
+-------+--------------------+-------+------+----------+--------------+----------+------------------+-------------+-------------+------------+---------------------+

mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+-------------+

The query itself is fine, and it runs without any problem in the test environment. Once the feature goes live, however, slow queries hit you head-on. Millions of orders, and a full table scan? Seriously?
How do you know that a statement does a full table scan? The explain command shows clearly how MySQL processes a SQL statement. The fields printed above mean:
id: the query sequence number, here 1.
select_type: the query type, here a simple query, i.e. a plain select without union or subqueries.
table: the table is itdragon_order_list.
partitions: no partitions.
type: the access type; ALL means a full table scan.
possible_keys: the indexes that might be used, here NULL.
key: the index actually used, here NULL.
key_len: the index length, NULL of course.
ref: NULL, meaning no column or constant is used together with key.
Extra: a where filter is applied.
Because the table holds only three rows, rows and filtered tell us little here. The key point is that type is ALL. A full table scan is the worst-performing access type: if the table held millions of rows, the query would be painfully slow without an index.

Initial optimization: create an index on transaction_id

mysql> create unique index idx_order_transaID on itdragon_order_list (transaction_id);
mysql> explain select * from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const |    1 |      100 | NULL  |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------+

The index created here is a unique index, not an ordinary (non-unique) index.

With a unique index, type is const, which means the row can be found with a single index lookup: MySQL finds the value, stops scanning, and returns the result.

With an ordinary index, type is ref, which means a non-unique index scan: after finding a match, MySQL has to keep reading the following index entries until it reaches one that no longer matches. (That output is not shown here.) const therefore does at least as well as ref, and since the business logic guarantees that a transaction number is unique, creating a unique index is the reasonable choice.
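
The ref case described above can be reproduced with a short sketch like the following. It assumes you are working on a test copy of the table, since it temporarily replaces the unique index with an ordinary one of the same name. The type column is expected to show ref instead of const, though the exact plan can vary with the MySQL version and table statistics.

```sql
-- Swap the unique index for an ordinary (non-unique) index.
DROP INDEX idx_order_transaID ON itdragon_order_list;
CREATE INDEX idx_order_transaID ON itdragon_order_list (transaction_id);

-- With a non-unique index, the equality lookup is expected to report type = ref.
EXPLAIN SELECT * FROM itdragon_order_list
WHERE transaction_id = '81X97310V32236260E';

-- Restore the unique index used in the rest of the article.
DROP INDEX idx_order_transaID ON itdragon_order_list;
CREATE UNIQUE INDEX idx_order_transaID ON itdragon_order_list (transaction_id);
```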

Further optimization: covering index
mysql> explain select transaction_id from itdragon_order_list where transaction_id = "81X97310V32236260E";
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys      | key                | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | const | idx_order_transaID | idx_order_transaID | 453     | const |    1 |      100 | Using index |
+----+-------------+---------------------+------------+-------+--------------------+--------------------+---------+-------+------+----------+-------------+

After select * from is changed to select transaction_id from, Extra shows Using index, which means the query uses a covering index. That is very good news: the statement performs well. If Extra shows Using filesort (an extra sorting pass) or Using temporary (a temporary table) instead, the SQL should be optimized right away.

According to the business logic, returning only transaction_id is enough to decide whether the order already exists.

Scene Two: order management page, sorted by order level and order entry time
Business logic: orders with a higher level and a longer time since entry are processed first.

Since this is sorting, order by comes to mind first, and a dreaded Using filesort is waiting for you.

The most basic sql statement

mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |      100 | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

First, the full table scan is unreasonable. On top of that, the query uses a file sort (Using filesort), which slows it down even more.
Before MySQL 4.1, filesort used a two-pass algorithm; because it read from disk twice, the I/O took too long. Later versions added an optimized single-pass algorithm. In essence it trades space for time, but if there is too much data for the sort buffer, it leads to multiple rounds of I/O and performs even worse. Rather than asking your operations colleagues to change the MySQL configuration, you are better off building the index yourself.
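
If you do want to see how the sort buffer is configured before deciding, the following sketch reads the relevant settings without changing anything. sort_buffer_size and max_length_for_sort_data are standard MySQL system variables, and Sort_merge_passes is a standard status counter; note that newer 8.0 releases may ignore max_length_for_sort_data.

```sql
-- Size of the per-session buffer used by filesort.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Row-size threshold that older versions use to choose between the
-- two-pass and single-pass filesort algorithms.
SHOW VARIABLES LIKE 'max_length_for_sort_data';

-- Number of merge passes filesort has had to perform; a steadily growing
-- value suggests sorts are not fitting in the sort buffer.
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';
```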

Initial optimization: create a composite index on order_level, input_date
mysql> create index idx_order_levelDate on itdragon_order_list (order_level,input_date);
mysql> explain select * from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table               | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |      100 | Using filesort |
+----+-------------+---------------------+------------+------+---------------+------+---------+------+------+----------+----------------+

After creating the composite index you may be surprised: the plan is exactly the same as before the index existed??? Still a full table scan, still a file sort. Is the index not taking effect? Or did index creation fail? Let's look at the output of the following query.

mysql> explain select order_level,input_date from itdragon_order_list order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra       |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL |    3 |      100 | Using index |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------------+

Here select * from has been replaced by select order_level, input_date from. type improves from ALL to index, which means a full index scan (the whole index is scanned), and Extra shows that a covering index is used. But that's no good!!!! Retrieval is fast, but the result contains only the two fields order_level and input_date. How are our business colleagues supposed to use that? Should we build a composite index over every field?
MySQL is not that dumb: you can use force index to force it to use a specific index. Just add force index(idx_order_levelDate) to the original SQL statement.

mysql> explain select * from itdragon_order_list force index(idx_order_levelDate) order by order_level,input_date;
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
| id | select_type | table               | partitions | type  | possible_keys | key                 | key_len | ref  | rows | filtered | Extra |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | index | NULL          | idx_order_levelDate | 68      | NULL |    3 |      100 | NULL  |
+----+-------------+---------------------+------------+-------+---------------+---------------------+---------+------+------+----------+-------+

Optimizing again: does the order level really need to be sorted?
In fact, sorting by order level means little, and neither does adding order level to the index, because order_level may take only four values: low, medium, high, and rush. For a field whose values repeat so heavily and are evenly distributed, sorting and indexing do not help much.

Can we fix order_level to a single value first and then sort by input_date? If the improvement is obvious, we can recommend this query to our business colleagues.

mysql> explain select * from itdragon_order_list where order_level=3 order by input_date;
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table               | partitions | type | possible_keys       | key                 | key_len | ref   | rows | filtered | Extra                 |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+
|  1 | SIMPLE      | itdragon_order_list | NULL       | ref  | idx_order_levelDate | idx_order_levelDate | 5       | const |    1 |      100 | Using index condition |
+----+-------------+---------------------+------------+------+---------------------+---------------------+---------+-------+------+----------+-----------------------+

Compared with the previous SQL, type improves from index to ref (non-unique index scan). key_len drops from 68 to 5, which shows that only the order_level part of the index is used for the lookup. ref is a constant. Using index condition in Extra means Index Condition Pushdown: the storage engine evaluates the conditions that can be checked against the index before it reads the full row. Overall, this SQL performs far better than the previous one.

These two cases are only a warm-up. One rule must be kept firmly in mind: optimization is based on the business logic. You must never change the business logic on your own just for the sake of optimization. Of course, if the business logic can be changed, that is best.

Introduction to Indexes
Official definition: an index is a data structure that helps MySQL retrieve data efficiently.
You are probably curious: why is an index a data structure, and how does it speed up queries? Let's use the most familiar structure, a binary search tree, to analyze how an index works. Each node holds an index key value and a pointer to the row containing it, so a lookup follows a few comparisons down the tree instead of scanning every row.

Advantages of creating an index:

  1. Faster data retrieval and lower database I/O cost: an index narrows the number of rows the query has to examine, which speeds up the search.
  2. Lower sorting cost and lower CPU consumption: an index searches quickly because its data is already sorted. If the query happens to sort by the indexed field, the sorting cost drops sharply.

Disadvantages of creating an index:

  1. It takes up storage space: an index is in effect a table of its own that records the primary key and the indexed fields, and it is generally stored on disk as an index file.
  2. It slows down table updates: when the table data changes, the corresponding index must be updated as well, which reduces update speed. Otherwise the physical data the index points to could be wrong, which is one of the causes of index failure.
  3. Good indexes are hard to build: an index is not designed in a day, and it does not stay the same forever. You need to keep creating the best indexes according to user behavior and the specific business logic.
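
To put a number on the storage cost, a sketch like the following compares data size and index size for the article's table. information_schema.tables and its data_length and index_length columns are standard MySQL; the figures are estimates maintained by the storage engine, not exact byte counts.

```sql
-- Approximate data and index size of the table in the current database.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024, 2) AS data_mb,
       ROUND(index_length / 1024 / 1024, 2) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name   = 'itdragon_order_list';
```

For InnoDB, data_length covers the clustered (primary key) index together with the row data, and index_length covers the secondary indexes.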

Index classification
When we talk about indexes, we usually mean indexes organized as a BTree (a multi-way search tree). These include clustered indexes, secondary indexes, composite indexes, prefix indexes, and unique indexes, which are collectively called indexes. Besides the B+ tree, there are also hash indexes and others.

Single-column index: the index contains only one column; a table can have several single-column indexes.
Unique index: the values of the indexed column must be unique, but NULLs are allowed.
Composite index: the index contains multiple columns.
In real development, composite indexes are recommended, and it is best not to create more than five indexes on a single table.

Basic syntax:
Create:

create [unique] index indexName on tableName (columnName…)
alter table tableName add [unique] index [indexName] (columnName…)

Drop:

drop index indexName on tableName

View:

show index from tableName
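
Applied to the article's table, the syntax looks roughly like this. The index names idx_order_status and idx_order_inputUser are hypothetical and exist only for this illustration; the columns are taken from the query output shown earlier.

```sql
-- Create an ordinary index with CREATE INDEX.
CREATE INDEX idx_order_status ON itdragon_order_list (order_status);

-- Create an ordinary index with ALTER TABLE.
ALTER TABLE itdragon_order_list ADD INDEX idx_order_inputUser (input_user);

-- List all indexes on the table, including the two just created.
SHOW INDEX FROM itdragon_order_list;

-- Drop the illustration indexes again.
DROP INDEX idx_order_status ON itdragon_order_list;
DROP INDEX idx_order_inputUser ON itdragon_order_list;
```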

When an index is needed:

  1. Primary keys, which automatically get a unique index
  2. Fields that are frequently used as query conditions
  3. Fields that are frequently used for sorting, grouping, and statistics
  4. Fields used to join other tables, i.e. foreign key relationships

When not to create an index:

  1. The table has too few rows; data below the million-row level does not need an index
  2. Tables with frequent inserts, updates, and deletes
  3. Fields whose values are heavily repeated and evenly distributed, such as true/false
  4. Fields that are updated frequently
  5. Fields that are never used in where conditions
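
One way to check whether a field is "heavily repeated" is to measure its selectivity, the ratio of distinct values to total rows. This is a rough sketch rather than a rule from the article: a value near 1 means almost every row is distinct (a good index candidate), while a value near 0 means few distinct values (usually a poor candidate on its own).

```sql
-- Selectivity of two columns of the article's table.
SELECT COUNT(DISTINCT order_level)    / COUNT(*) AS order_level_selectivity,
       COUNT(DISTINCT transaction_id) / COUNT(*) AS transaction_id_selectivity
FROM itdragon_order_list;
```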

Performance Analysis
MySQL's own bottlenecks
MySQL's own performance problems include insufficient disk space, excessive disk I/O, and low-performance server hardware.

  1. CPU: CPU saturation generally occurs when data is loaded into memory or read from disk
  2. IO: a disk I/O bottleneck occurs when the data to be loaded is much larger than the memory capacity
  3. Server hardware: use top, free, iostat, and vmstat to check the performance state of the system

Analyzing SQL statements with explain
The explain keyword simulates how the optimizer executes a SQL query, so you can see how MySQL processes your SQL statement.

+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----+---------+------+------+----------+-------+

id
The sequence number of the select query. It is a set of numbers, possibly repeated, that indicates the order in which the parts of the statement are executed. There are three typical cases:
First: all ids are the same, and execution goes from top to bottom.
Second: all ids are different, and the larger the id, the earlier it is executed.
Third: some ids are the same and some are different. Larger ids are executed first, and rows with the same id are executed from top to bottom.

select_type
The type of the select query, mainly used to distinguish ordinary queries, union queries, and nested complex queries.
simple: a simple select query that contains no subqueries or union.
primary: if the query contains any complex subparts, the outermost query is marked primary.
subquery: a subquery contained in the select list or the where clause.
derived: a subquery contained in the from list is marked derived. MySQL executes these subqueries recursively and puts the results in a temporary table.
union: if a second select appears after union, it is marked union; if the union is contained in a subquery of the from clause, the outer select is marked derived.
union result: the select that fetches the result from the union temporary table.
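
Two small queries against the article's table illustrate how id and select_type show up together. This is a sketch: the row layout described in the comments is what MySQL typically prints, but details can differ between versions.

```sql
-- An uncorrelated subquery in the where clause: the outer query is
-- typically id 1 / PRIMARY and the inner query id 2 / SUBQUERY.
EXPLAIN SELECT * FROM itdragon_order_list
WHERE gross > (SELECT AVG(gross) FROM itdragon_order_list);

-- A union: the first select is id 1 / PRIMARY, the second id 2 / UNION,
-- and a final row with select_type UNION RESULT reads the merged result.
EXPLAIN
SELECT id FROM itdragon_order_list WHERE order_level = 1
UNION
SELECT id FROM itdragon_order_list WHERE order_level = 3;
```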

partitions
The partitions the query uses. For example, to compute a company's order totals over ten years, the data can be split into ten partitions, one per year. This can greatly improve query efficiency.
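
As a sketch of the per-year idea, the following uses a hypothetical table order_by_year (it is not part of the article's schema) with a DATE column and only a few partitions for brevity. When the where clause restricts the date range, the partitions column of explain should list only a subset of the partitions instead of all of them.

```sql
-- Hypothetical table, partitioned by the year of order_date.
CREATE TABLE order_by_year (
    id         BIGINT NOT NULL,
    order_date DATE   NOT NULL,
    gross      DOUBLE,
    -- every unique key, including the primary key, must contain
    -- the partitioning column
    PRIMARY KEY (id, order_date)
)
PARTITION BY RANGE (YEAR(order_date)) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Only partitions that can contain 2017 rows need to be read.
EXPLAIN SELECT SUM(gross) FROM order_by_year
WHERE order_date BETWEEN '2017-01-01' AND '2017-12-31';
```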

type
This is a very important field: the access type. The common values are all, index, range, ref, eq_ref, const, system, and null. Ordered from best to worst performance: system > const > eq_ref > ref > range > index > all.

For a Java programmer, making sure a query reaches at least the range level, and ideally ref, is the mark of a good and responsible programmer.

all: (full table scan) A full table scan is without doubt the worst. With millions of rows or more, a full table scan is very slow.

index: (full index scan) Scanning the whole index is generally better than all; the index is usually much smaller than the table data, so reading it is faster than reading the whole table.

range: retrieves only rows in a given range, using an index to select them. Because the range is narrower, it is faster than a full table scan or a full index scan. It typically appears with between, in, >, < and similar conditions.

ref: a non-unique index scan. It is also an index access, and it returns all rows that match a single value. For example, querying all colleagues in the R&D team: the match is not unique but a set of rows.

eq_ref: a unique index scan. For each index key, exactly one row in the table matches. For example, querying the company's CEO: the match can only be a single row.

const: the row can be found with a single index lookup. const is used when comparing a primary key or unique index with a constant. Because only one row matches, it is very fast. If the primary key appears in the where clause, MySQL can convert the query into a constant.

system: the table has only one row (a system table). This is a special case of const that rarely occurs in practice; it is enough to know it exists.
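
The following sketch shows queries that typically produce some of these access types on the article's table. It assumes the index idx_order_levelDate created earlier still exists, and the table itdragon_order_item with its order_id column is hypothetical. On very small tables the optimizer may pick a different plan, so treat the comments as expectations, not guarantees.

```sql
-- const: equality on the primary key with a constant.
EXPLAIN SELECT * FROM itdragon_order_list WHERE id = 10000;

-- range: a bounded scan on the primary key.
EXPLAIN SELECT * FROM itdragon_order_list WHERE id BETWEEN 10000 AND 10001;

-- ref: equality on the leading column of the non-unique index
-- idx_order_levelDate.
EXPLAIN SELECT * FROM itdragon_order_list WHERE order_level = 3;

-- eq_ref: appears in joins. itdragon_order_item and order_id are
-- hypothetical; each item row matches exactly one order by primary key.
EXPLAIN SELECT i.*, o.transaction_id
FROM itdragon_order_item i
JOIN itdragon_order_list o ON o.id = i.order_id;
```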

possible_keys
Shows the indexes the query might use (one, several, or NULL). They are not necessarily used by the query; for reference only.

key
Shows the index the query actually uses. NULL means no index is used.

key_len
Shows the number of bytes used in the index; from key_len you can work out how much of the index the query uses. Without losing precision, the shorter the index length the better. The value shown is the maximum possible length of the index fields, not the actual length in use: key_len is calculated from the table definition, not retrieved from the table data.
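
The key_len values in this article can be reproduced with simple arithmetic. The sketch below assumes column definitions that are not stated in the article: transaction_id as a nullable VARCHAR(150), input_date as a nullable VARCHAR(20), order_level as a nullable INT, and the utf8 character set (up to 3 bytes per character). A VARCHAR adds 2 bytes for its length prefix, and a nullable column adds 1 byte for the NULL flag.

```sql
SELECT 150 * 3 + 2 + 1            AS key_len_transaction_id,  -- 453
       4 + 1                      AS key_len_order_level,     -- 5
       (4 + 1) + (20 * 3 + 2 + 1) AS key_len_level_and_date;  -- 68
```

Under these assumptions, a key_len of 5 on idx_order_levelDate means only the order_level part is used for the lookup, while 68 means both columns are covered.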

ref
Shows which column or constant is compared with the index column to look up values.

rows
A rough estimate, based on table statistics and index selection, of the number of rows that must be read to find the required records. The smaller the value, the better.

Extra
Using filesort: MySQL sorts the data with an extra sorting pass instead of reading rows in index order. A sort that MySQL cannot complete with an index is called a "file sort". When this appears, optimize the SQL right away.

Using temporary: a temporary table is used to hold intermediate results. It is common with order by and group by. When this appears, optimize the SQL right away as well.

Using index: the select uses a covering index and avoids accessing the table's data rows, which is efficient! If Using where appears at the same time, the index is used to look up index key values. If Using where does not appear, the index is only used to read data, not to perform a lookup.

Covering index: the selected columns can be obtained from the index alone without reading the data rows. In other words, MySQL can return the fields in the select list from the index without going back to the data file.

Using index condition: a feature added in version 5.6 (Index Condition Pushdown). When an index is used, the storage engine checks the parts of the where condition that involve indexed columns against the index entry first and reads the full row only if they match.

Using where: a where filter is applied.

Using join buffer: the join buffer is used.

Impossible WHERE: the where clause is always false, so no rows can be selected.

Distinct: optimizes the distinct operation by stopping the search for the same value once the first matching row is found.
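
Two of these Extra values are easy to provoke on the article's table. This is a sketch; the exact wording of Extra can vary between MySQL versions.

```sql
-- A condition that can never be true: Extra is expected to show
-- "Impossible WHERE".
EXPLAIN SELECT * FROM itdragon_order_list WHERE 1 = 0;

-- Grouping on a column without an index typically shows "Using temporary"
-- (and, on some versions, "Using filesort" as well).
EXPLAIN SELECT create_type, COUNT(*)
FROM itdragon_order_list
GROUP BY create_type;
```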

filtered
A percentage that is used together with rows to estimate the size of the result set passed from the preceding table in the query execution plan (QEP), which determines the number of iterations of the join. Let the small table drive the large table to reduce the number of join iterations.
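
As a worked example, take the first explain output in this article, where rows is 3 and filtered is 33.33. Multiplying the two gives the optimizer's estimate of how many rows survive the where filter:

```sql
-- rows * filtered / 100: roughly 1 row is expected to remain.
SELECT 3 * 33.33 / 100 AS estimated_rows_after_filter;
```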

From the explain fields introduced above, we can learn:

  1. The order in which tables are read (id)
  2. The access type of each read operation (type)
  3. Which indexes are actually used (key)
  4. The references between tables (ref)
  5. How many rows of each table the optimizer expects to examine (rows)

Causes of performance degradation
From the programmer's point of view

  1. The query is badly written
  2. No index was built, the index is poorly designed, or the index fails to take effect
  3. The query joins too many tables

From the server's point of view

  1. The server is short of disk space
  2. The server tuning parameters are set unreasonably

Summary

  1. An index is a data structure that is sorted and supports fast lookup. Its purpose is to improve query efficiency.
  2. After an index is created, queries become faster but updates become slower.
  3. Performance degradation is very often caused by an index that fails to take effect.
  4. Index creation principle: create indexes on frequently queried fields, not on frequently updated data.
  5. Frequently updating indexed fields or physically deleting table data can easily cause index failure.
  6. Make good use of explain to analyze SQL statements.
  7. Besides optimizing SQL statements, you can also optimize the table design: query a single table where possible to reduce joins between tables, and design archive tables.
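
A minimal sketch of the archive-table idea follows. The table name itdragon_order_archive and the cutoff date are hypothetical, and the comparison assumes input_date is stored as text in the 'YYYY-MM-DD HH:MM:SS' format seen in the query output, so that string order matches chronological order.

```sql
-- Create an empty archive table with the same columns and indexes.
CREATE TABLE itdragon_order_archive LIKE itdragon_order_list;

-- Move old orders in one transaction so a failure leaves both tables consistent.
START TRANSACTION;

INSERT INTO itdragon_order_archive
SELECT * FROM itdragon_order_list
WHERE input_date < '2017-01-01 00:00:00';

DELETE FROM itdragon_order_list
WHERE input_date < '2017-01-01 00:00:00';

COMMIT;
```

Keeping the active table small this way means that both full scans and index maintenance touch fewer rows.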

That wraps up this analysis of MySQL index optimization. If you spot anything wrong, please point it out. If you found it useful, feel free to give it a like.


Origin blog.csdn.net/WANXT1024/article/details/104379816