Understand SQL indexes and optimization in ten minutes

Index concept and function

An index is a structure that keeps records ordered by one or more columns chosen in advance, which greatly improves query speed (much like looking up a character by pinyin or by stroke count in a Chinese dictionary).

The main purpose of an index is to speed up data lookups and improve database performance.

MySQL index types

From the perspective of physical storage, indexes can be divided into clustered indexes and non-clustered indexes.

 

1. Clustered index

The clustered index determines the physical ordering of data on the disk, and a table can only have one clustered index.

The logical order of the key values in the index determines the physical order of the corresponding rows in the table. Put another way: rows that are adjacent in the index are also stored next to each other on the storage medium.

  • If a primary key is defined, then this primary key is used as the clustered index.
  • If no primary key is defined, then the first unique index whose columns are all not null is used as the clustered index.
  • If there is no primary key and no suitable unique index, then InnoDB generates a hidden primary key as the clustered index. The hidden primary key is a 6-byte column whose value increases automatically as rows are inserted.

The InnoDB engine gives every table a clustered index, and the rows are stored in the order of that index. With an auto-increment primary key, new rows are always appended at the end, so inserts never force the existing rows of the clustered index to be reordered. Reordering the clustered index costs a great deal of disk I/O.
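As a minimal sketch of the primary-key case, the table below declares an auto-increment primary key, so InnoDB clusters the rows on id and appends each new row after the previous one. The table name doc_pk_demo and its columns are assumptions made purely for illustration.

```sql
-- Hypothetical table: names and column types are illustrative only.
CREATE TABLE doc_pk_demo (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  title       VARCHAR(100)    NOT NULL,
  status      TINYINT         NOT NULL DEFAULT 0,
  create_time DATETIME        NOT NULL,
  PRIMARY KEY (id)            -- InnoDB uses the primary key as the clustered index
) ENGINE=InnoDB;

-- id values are assigned in increasing order, so rows are appended at the end
-- of the clustered index instead of being inserted in the middle.
INSERT INTO doc_pk_demo (title, status, create_time)
VALUES ('first', 1, NOW()),
       ('second', 1, NOW());
```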

2. Non-clustered index

A non-clustered index does not determine the physical ordering of the data on the disk. It contains only the indexed column values and a row locator. The row locator can be thought of as a pointer into the clustered index (in InnoDB it is the row's primary key value), and the row data is found by following it.

From a logical point of view, the index can be divided into the following categories.

  • Ordinary index: the most basic index, with no restrictions.

  • Unique index: similar to an ordinary index, except that the values of the indexed column must be unique; null values are allowed. For a composite index, the combination of column values must be unique.

  • Primary key index: a special unique index that uniquely identifies a record in the table. Null values are not allowed. It is usually created through a primary key constraint. The relationship between the primary key and the clustered index is detailed in Question 4 in "Detailed Problem Solving".

  • Joint index (also called compound index): an index built on multiple fields, which speeds up queries that filter on several conditions at once.

  • Full-text index: in older versions of MySQL, the built-in full-text index could only be used on tables whose storage engine is MyISAM. Starting with MySQL 5.6, InnoDB also supports full-text indexes. By default, MySQL does not support Chinese full-text search. You can support Chinese by extending MySQL with a Chinese full-text search component, or by providing a corresponding English index table for the Chinese content table.
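To make the categories concrete, here is a hedged sketch that declares one index of each logical type on a single table. The table article_demo and every column and index name in it are hypothetical.

```sql
-- Hypothetical table showing each logical index type in one definition.
CREATE TABLE article_demo (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  slug       VARCHAR(100)    NOT NULL,
  author_id  BIGINT UNSIGNED NOT NULL,
  title      VARCHAR(200)    NOT NULL,
  body       TEXT            NOT NULL,
  created_at DATETIME        NOT NULL,
  PRIMARY KEY (id),                                -- primary key index
  UNIQUE KEY uk_slug (slug),                       -- unique index
  KEY idx_created_at (created_at),                 -- ordinary index
  KEY idx_author_created (author_id, created_at),  -- joint (compound) index
  FULLTEXT KEY ft_title_body (title, body)         -- full-text index (InnoDB: MySQL 5.6 or later)
) ENGINE=InnoDB;
```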

 

MySQL index optimization rules

The MySQL index can be optimized through the following rules.

1. A like pattern with a leading wildcard cannot use the index.

For example, the following SQL statement cannot use an index.

select * from doc where title like '%XX'

A pattern without a leading wildcard can use the index, as in the following SQL statement.

select * from doc where title like 'XX%'

Page searches must never use a leading wildcard ('%XX') or wildcards on both sides ('%XX%'). If that kind of matching is required, use a search engine instead.
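One way to check this on your own data is to compare the two patterns with EXPLAIN. The sketch below assumes a hypothetical doc_like_demo table with an index on title; the plans noted in the comments are what is typically seen on a table with a realistic number of rows, and the optimizer may choose differently on very small tables.

```sql
-- Hypothetical table with an index on title.
CREATE TABLE doc_like_demo (
  id    INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(100) NOT NULL,
  body  TEXT,
  KEY idx_title (title)
) ENGINE=InnoDB;

-- Leading wildcard: the index order gives no starting point, typically type = ALL.
EXPLAIN SELECT * FROM doc_like_demo WHERE title LIKE '%XX';

-- Fixed prefix: the index can be searched as a range, typically type = range.
EXPLAIN SELECT * FROM doc_like_demo WHERE title LIKE 'XX%';
```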

2. union, in, and or can all hit the index; in is recommended.

  • union: can hit the index.

The sample code is as follows:

select * from doc where status=1

union all

select * from doc where status=2

This tells MySQL directly how to execute the query, so it costs the least CPU, but queries are not usually written this way.

  • in: can hit the index.

The sample code is as follows:

select * from doc where status in (1, 2)

Optimizing this query costs more CPU than union all, but the difference is negligible. In general, in is the recommended form.

  • or: newer versions of MySQL can hit the index.

The sample code is as follows:

select * from doc where status = 1 or status = 2

Optimizing this query costs more CPU than in, so frequent use of or is not recommended.

3. Negative conditions cannot use indexes; where possible, rewrite them as in queries.

Negative conditions include: !=, <>, not in, not exists, not like, etc.

For example, the following code:

select * from doc where status != 1 and status != 2

Assuming status only takes the values 0 to 4, this can be rewritten as an in query:

select * from doc where status in (0,3,4)

4. The principle of the leftmost prefix of the joint index (also called the leftmost query)

  • If a joint index is established on the three fields (a, b, c), it can speed up queries on a, on (a, b), and on (a, b, c).

For example, a login feature might run the following query.

select uid, login_time from user where login_name=? and passwd=?

A joint index of (login_name, passwd) can be established.

Because there is almost no single-condition query requirement for passwd in the business, but there are many single-condition query requirements for login_name, a joint index of (login_name, passwd) can be established instead of (passwd, login_name).

  • When building a joint index, put the most selective field on the far left.

  • If the (a, b) joint index is established, there is no need to create a separate index on a. In the same way, if the (a, b, c) joint index is established, there is no need to create separate indexes on a or (a, b).

  • When the conditions mix equality and non-equality comparisons, put the equality columns first when building the index. For example, for where a>? and b=?, b must be placed first in the index even if a is more selective.

  • The leftmost query requirement does not mean that the where order of the SQL statement should be consistent with the joint index.

The following SQL statement can also hit the (login_name, passwd) joint index.

select uid, login_time from user where passwd=? and login_name=?

However, it is good practice to write the where conditions in the same order as the joint index.
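The leftmost prefix rule can be explored with a few EXPLAIN statements. This is only a sketch on a hypothetical table leftmost_demo with a joint index on (a, b, c); the comments describe what the rule predicts, and the actual plan depends on data volume and MySQL version.

```sql
-- Hypothetical table; column d is deliberately left out of the index
-- so that SELECT * cannot be answered from the index alone.
CREATE TABLE leftmost_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  INT NOT NULL,
  KEY idx_a_b_c (a, b, c)
) ENGINE=InnoDB;

-- Uses the prefix (a).
EXPLAIN SELECT * FROM leftmost_demo WHERE a = 1;

-- Uses the prefix (a, b); the order of conditions in where does not matter.
EXPLAIN SELECT * FROM leftmost_demo WHERE b = 2 AND a = 1;

-- Uses the full index (a, b, c).
EXPLAIN SELECT * FROM leftmost_demo WHERE a = 1 AND b = 2 AND c = 3;

-- No condition on a: the index cannot be searched from its leftmost column.
EXPLAIN SELECT * FROM leftmost_demo WHERE b = 2 AND c = 3;
```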

5. Indexes can be used for range columns (the combined index must be the leftmost prefix).

  • Range conditions are: <, <=, >, >=, between, etc.

  • Indexes can be used for range columns (in a joint index, the range column must be part of the leftmost prefix), but the columns after the range column cannot use the index. An index can serve at most one range column, so if the query has range conditions on two columns, the index cannot be used for both.

If there is a joint index (emp_no, title, from_date), then emp_no in the following SQL can use the index, but title and from_date cannot.

select * from employees.titles where emp_no < 10010 and title='Senior Engineer' and from_date between '1986-01-01' and '1986-12-31'
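As a rough sketch of why column order matters here, the statements below compare a range-first index with an equality-first index on a hypothetical titles_demo table (table and index names are assumptions). In the EXPLAIN output, a larger key_len suggests that more index columns took part in narrowing the search.

```sql
-- Hypothetical copy of the relevant columns, with two competing indexes.
CREATE TABLE titles_demo (
  id        INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
  emp_no    INT         NOT NULL,
  title     VARCHAR(50) NOT NULL,
  from_date DATE        NOT NULL,
  KEY idx_emp_title_from (emp_no, title, from_date),
  KEY idx_title_emp (title, emp_no)
) ENGINE=InnoDB;

-- Range column first: only emp_no narrows the index search.
EXPLAIN SELECT * FROM titles_demo FORCE INDEX (idx_emp_title_from)
WHERE emp_no < 10010 AND title = 'Senior Engineer';

-- Equality column first: title and emp_no can both narrow the search.
EXPLAIN SELECT * FROM titles_demo FORCE INDEX (idx_title_emp)
WHERE emp_no < 10010 AND title = 'Senior Engineer';
```

FORCE INDEX is used only to make the comparison explicit; in normal queries the optimizer picks the index itself.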

6. Put calculations in the business layer instead of the database layer.

  • Calculations on the field cannot hit the index.

For example, the following SQL statement.

select * from doc where YEAR(create_time) <= '2016'

Even if create_time is indexed, the entire table is scanned. Compute the boundary value instead and compare the bare column against it, as follows:

select * from doc where create_time < '2017-01-01'

  • Put calculations on the business layer.

This not only saves the CPU of the database, but also helps the query cache (which exists only in MySQL versions before 8.0).

For example, the following SQL statement:

select * from `order` where date <= CURDATE()

Can be optimized as:

select * from `order` where date <= '2018-01-24 12:00:00'

The optimized SQL spares the database's CPU across repeated calls, and the query cache can only be hit when the incoming SQL text is identical.
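For the YEAR(create_time) case, the difference can be checked with EXPLAIN. The sketch below uses a hypothetical doc_time_demo table. Every create_time in or before 2016 is strictly earlier than '2017-01-01', so the second query returns the same rows while leaving the column bare.

```sql
-- Hypothetical table with an index on create_time.
CREATE TABLE doc_time_demo (
  id          INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title       VARCHAR(100) NOT NULL,
  create_time DATETIME     NOT NULL,
  KEY idx_create_time (create_time)
) ENGINE=InnoDB;

-- The function wraps the column, so idx_create_time cannot be searched by value.
EXPLAIN SELECT * FROM doc_time_demo WHERE YEAR(create_time) <= 2016;

-- The bare column is compared with a precomputed boundary, so a range scan is possible.
EXPLAIN SELECT * FROM doc_time_demo WHERE create_time < '2017-01-01';
```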

7. Implicit type conversion causes a full table scan.

If the phone field is of varchar type, the following SQL cannot hit the index.

select * from user where phone=13800001234

Can be optimized as:

select * from user where phone='13800001234'
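A hedged way to see the effect is to run both forms through EXPLAIN on a hypothetical user_phone_demo table (the name and columns are assumptions). With the numeric literal, MySQL has to convert phone for every row before comparing, so the index on phone is typically not used.

```sql
-- Hypothetical table: phone is stored as a string and indexed.
CREATE TABLE user_phone_demo (
  id    INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
  phone VARCHAR(20) NOT NULL,
  KEY idx_phone (phone)
) ENGINE=InnoDB;

-- Number compared with a varchar column: typically key = NULL (full scan).
EXPLAIN SELECT * FROM user_phone_demo WHERE phone = 13800001234;

-- String compared with a varchar column: typically key = idx_phone.
EXPLAIN SELECT * FROM user_phone_demo WHERE phone = '13800001234';
```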

8. Avoid building indexes on fields that are updated frequently or whose values have low selectivity.

  • Every update has to modify the B+ tree, so indexing frequently updated fields greatly reduces database performance.

  • A field such as "gender" has very few distinct values. Indexing it is pointless: the index cannot filter the data effectively, and performance is close to that of a full table scan.

  • As a rule of thumb, an index is worth creating when selectivity is above 80%; selectivity can be calculated as count(distinct(column name))/count(*).
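The selectivity formula translates directly into a query. The sketch below runs it on a hypothetical member_demo table with four sample rows, all invented for illustration: gender has 2 distinct values over 4 rows, while login_name has 4 distinct values over 4 rows.

```sql
-- Hypothetical table and sample rows.
CREATE TABLE member_demo (
  id         INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
  gender     TINYINT     NOT NULL,
  login_name VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

INSERT INTO member_demo (gender, login_name)
VALUES (0, 'alice'), (1, 'bob'), (0, 'carol'), (1, 'dave');

-- Selectivity = distinct values / total rows, computed per candidate column.
SELECT COUNT(DISTINCT gender)     / COUNT(*) AS gender_selectivity,
       COUNT(DISTINCT login_name) / COUNT(*) AS login_name_selectivity
FROM member_demo;
```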

9. Use the covering index to perform query operations to avoid returning to the table.

If every queried column can be read from the index itself, there is no need to follow the row locator and fetch the row. In other words, "the queried columns must be covered by the index", which speeds up the query.

For example, a login feature might run the following query.

select uid, login_time from user where login_name=? and passwd=?

A joint index of (login_name, passwd, login_time) can be established. Since login_time is part of the index, and uid (assuming it is the primary key) is stored in every index entry as the row locator, the query does not need to go to the row to obtain data, which speeds it up.
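A sketch of the login query against a covering index follows. The table user_login_demo, its columns, and the literal values are assumptions; uid is declared as the primary key so that InnoDB stores it in every secondary index entry. When a query is covered, the Extra column of EXPLAIN typically shows "Using index".

```sql
-- Hypothetical login table with a covering index for the login query.
CREATE TABLE user_login_demo (
  uid        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  login_name VARCHAR(50)     NOT NULL,
  passwd     CHAR(64)        NOT NULL,
  login_time DATETIME        NOT NULL,
  nickname   VARCHAR(50)     NOT NULL,
  KEY idx_login_cover (login_name, passwd, login_time)
) ENGINE=InnoDB;

-- Covered: uid and login_time are both available inside idx_login_cover.
EXPLAIN SELECT uid, login_time
FROM user_login_demo
WHERE login_name = 'alice' AND passwd = 'example-hash';

-- Not covered: nickname is not in the index, so each match needs a row lookup.
EXPLAIN SELECT uid, login_time, nickname
FROM user_login_demo
WHERE login_name = 'alice' AND passwd = 'example-hash';
```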

10. If there are order by and group by scenarios, please pay attention to the orderliness of the index.

  • Make the last order by field part of the composite index, placed at the end of the index column order, to avoid a filesort that would hurt query performance.

  • For example, for the statement where a=? and b=? order by c, a joint index (a, b, c) can be established.

  • If there is a range search on the index, the index order can no longer be used for sorting. For example, with WHERE a>10 ORDER BY b, the index (a, b) cannot provide the sort.
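Both situations can be compared with EXPLAIN on a hypothetical sort_demo table (names are assumptions). When the index cannot supply the order, the Extra column typically shows "Using filesort".

```sql
-- Hypothetical table with a joint index on (a, b, c).
CREATE TABLE sort_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  INT NOT NULL,
  KEY idx_a_b_c (a, b, c)
) ENGINE=InnoDB;

-- Equality on a and b: matching index entries are already ordered by c.
EXPLAIN SELECT * FROM sort_demo WHERE a = 1 AND b = 2 ORDER BY c;

-- Range on a: entries are ordered by a first, so b is not globally sorted.
EXPLAIN SELECT * FROM sort_demo WHERE a > 10 ORDER BY b;
```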

11. Use short index (also called prefix index) to optimize the index.

A prefix index uses a prefix of the column, rather than the entire column, as the index key. With a suitable prefix length, the selectivity of the prefix index comes close to that of a full-column index, while the shorter key reduces both the size of the index file and its maintenance overhead. The selectivity of a prefix index can be calculated as count(distinct left(column name, index length))/count(*).

A prefix index balances index size against query speed, but it cannot be used for ORDER BY or GROUP BY operations, nor as a covering index (an index that itself contains all the data the query needs, so the data file is no longer accessed). In many cases it is unnecessary to index the whole field, and the index length can be chosen according to how distinctive the actual text is.

For example, the following SQL statement:

SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido';

We can create an index: (first_name, last_name(4)).
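A minimal sketch of the workflow: measure how selective a 4-character prefix is compared with the full column, then create the prefix index. The table employees_demo is a hypothetical stand-in for employees.employees, and its column types are assumptions.

```sql
-- Hypothetical stand-in for the employees table.
CREATE TABLE employees_demo (
  emp_no     INT         NOT NULL PRIMARY KEY,
  first_name VARCHAR(14) NOT NULL,
  last_name  VARCHAR(16) NOT NULL
) ENGINE=InnoDB;

-- Compare prefix selectivity with full-column selectivity (run this on real data).
SELECT COUNT(DISTINCT LEFT(last_name, 4)) / COUNT(*) AS prefix_4_selectivity,
       COUNT(DISTINCT last_name)          / COUNT(*) AS full_selectivity
FROM employees_demo;

-- If the two values are close, index only the first 4 characters of last_name.
ALTER TABLE employees_demo
  ADD INDEX idx_first_last4 (first_name, last_name(4));
```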

12. Declare indexed columns as not null.

MySQL does store null values in its indexes, but null needs special handling in comparisons: a unique index admits several null rows, and a condition such as column != 'x' never matches a null row. If the column is allowed to be null, you may get "unexpected" result sets. Therefore, please use the not null constraint and a default value.

13. Use deferred joins or subqueries to optimize very deep pagination.

MySQL does not skip the offset rows. It fetches offset+N rows, discards the first offset rows, and returns the remaining N. When the offset is very large this is very inefficient, so either cap the total number of pages that can be returned, or rewrite the SQL for pages beyond a certain threshold.

The example is as follows: first quickly locate the range of ids that is needed, then join back to the table:

select a.* from table1 a, (select id from table1 where <condition> limit 100000, 20) b where a.id=b.id
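Here is the same pattern written out on a hypothetical doc_page_demo table, with status = 1 standing in for the filter condition (both are assumptions). The inner query walks only the narrow idx_status index, which in InnoDB also carries the primary key id, and the outer join fetches full rows for just the 20 ids that survive.

```sql
-- Hypothetical table with a wide body column and a narrow index on status.
CREATE TABLE doc_page_demo (
  id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status TINYINT         NOT NULL,
  title  VARCHAR(100)    NOT NULL,
  body   TEXT,
  KEY idx_status (status)
) ENGINE=InnoDB;

-- Plain form: reads and discards 100000 full rows before returning 20.
SELECT * FROM doc_page_demo
WHERE status = 1
ORDER BY id
LIMIT 100000, 20;

-- Deferred join: the offset is skipped inside the index, then 20 rows are fetched.
SELECT d.*
FROM doc_page_demo AS d
JOIN (
  SELECT id FROM doc_page_demo
  WHERE status = 1
  ORDER BY id
  LIMIT 100000, 20
) AS page ON d.id = page.id
ORDER BY d.id;
```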

14. Fields with unique characteristics in business, even if it is a combination of multiple fields, must build a unique index.

Do not assume that a unique index slows down inserts. The loss is negligible, while the gain in lookup speed is obvious. In addition, even if the application layer performs very thorough validation, as long as there is no unique index, Murphy's law says dirty data will eventually be generated.
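As an illustration, suppose the business rule is that a user may claim a given coupon only once. The table user_coupon_demo and its columns are hypothetical; the point is that the database rejects the duplicate even if an application-level check is missed.

```sql
-- Hypothetical table: the pair (user_id, coupon_id) must be unique.
CREATE TABLE user_coupon_demo (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id   BIGINT UNSIGNED NOT NULL,
  coupon_id BIGINT UNSIGNED NOT NULL,
  UNIQUE KEY uk_user_coupon (user_id, coupon_id)
) ENGINE=InnoDB;

INSERT INTO user_coupon_demo (user_id, coupon_id) VALUES (1, 100);

-- Repeating the same plain INSERT would fail with a duplicate-key error (error 1062).
-- INSERT IGNORE skips the duplicate row and reports a warning instead of an error.
INSERT IGNORE INTO user_coupon_demo (user_id, coupon_id) VALUES (1, 100);
```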

15. It is best not to join more than three tables.

The data types of the joined fields must be identical. When querying across multiple tables, make sure the joined fields are indexed.

16. If you know that only one result is returned, limit 1 can improve efficiency.

For example, the following SQL statement:

select * from user where login_name=?

Can be optimized as:

select * from user where login_name=? limit 1

You know there is only one result, but the database does not, so tell it explicitly and let it stop moving the cursor as soon as the row is found.

17. SQL performance target for the type column of explain: reach at least the range level, aim for the ref level, and const is best.

  • const: at most one row in a single table matches (primary key or unique index lookup), and the data can be read during the optimization phase.

  • ref: a lookup on an ordinary (non-unique) index.

  • range: a range search on the index.

  • index: the whole index is scanned from start to end, which is very slow.
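The queries below are a sketch of the kind of statement that usually produces each type value. The table explain_type_demo is hypothetical, and the types in the comments are what is typically reported on a table with a realistic amount of data; on an empty or tiny table the optimizer may report something else.

```sql
-- Hypothetical table with a primary key and one ordinary index.
CREATE TABLE explain_type_demo (
  id     INT          NOT NULL AUTO_INCREMENT PRIMARY KEY,
  status TINYINT      NOT NULL,
  note   VARCHAR(100) NOT NULL,
  KEY idx_status (status)
) ENGINE=InnoDB;

-- Primary key equality: typically type = const.
EXPLAIN SELECT * FROM explain_type_demo WHERE id = 1;

-- Equality on an ordinary index: typically type = ref.
EXPLAIN SELECT * FROM explain_type_demo WHERE status = 1;

-- Range on an indexed column: typically type = range.
EXPLAIN SELECT * FROM explain_type_demo WHERE id BETWEEN 10 AND 20;

-- No filter, but only indexed columns are read: typically type = index.
EXPLAIN SELECT status FROM explain_type_demo;
```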

18. Keep the number of indexes on a single table within 5.

19. A single index must not contain more than 5 fields.

When there are more than 5 fields, it can no longer effectively filter the data.

20. Avoid the following misconceptions when creating an index

  • "The more indexes the better": believing that every query needs its own index.

  • "Better none than too many": believing that indexes consume space and seriously slow down updates and inserts.

  • Resisting unique indexes: believing that business uniqueness should be enforced at the application layer through "check before insert".

  • Premature optimization: starting to optimize before understanding the system.

About the use of the leftmost prefix

Two points are worth noting:

  • The leftmost prefix matching principle is very important: MySQL keeps matching index columns to the right until it meets a range condition (>, <, between, like), and then stops. For a = 1 and b = 2 and c > 3 and d = 4, an index created in the order (a, b, c, d) is not used for d. With an index of (a, b, d, c), all four columns can use it, and the order of a, b, and d can be adjusted arbitrarily.
  • = and in conditions can appear in any order. For a = 1 and b = 2 and c = 3, an (a, b, c) index can be created with the columns in any order, and MySQL's query optimizer will rearrange the conditions into a form the index can use.
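The first point can be sketched with two competing indexes on a hypothetical prefix_order_demo table (table and index names are assumptions). In the EXPLAIN output, a larger key_len for the second query would suggest that more index columns were used for the search.

```sql
-- Hypothetical table with both column orders indexed for comparison.
CREATE TABLE prefix_order_demo (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  INT NOT NULL,
  e  INT NOT NULL,
  KEY idx_a_b_c_d (a, b, c, d),
  KEY idx_a_b_d_c (a, b, d, c)
) ENGINE=InnoDB;

-- Range column c sits before d: matching stops at c, so d cannot narrow the search.
EXPLAIN SELECT * FROM prefix_order_demo FORCE INDEX (idx_a_b_c_d)
WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- Equality columns a, b, d come first and the range column c is last: all four are usable.
EXPLAIN SELECT * FROM prefix_order_demo FORCE INDEX (idx_a_b_d_c)
WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;
```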

Origin blog.csdn.net/qq_27828675/article/details/102621726