Some questions about indexing applications

What is an index: a data structure that speeds up retrieval

Advantages and disadvantages of indexes

Advantages of indexes:

1. After an index is created, the speed of retrieving data improves dramatically (if the index is used correctly); the larger the data volume, the more obvious the benefit

2. Indexes can also be used to speed up grouping and sorting

3. By creating a unique index, the uniqueness of the data is guaranteed without adding other constraints (one index gives you both the index and the uniqueness guarantee)

4. When joining tables, creating indexes on the primary-key and foreign-key fields brings a very obvious performance improvement.

Disadvantages of indexes:

1. An index creates additional structures on disk, so extra space is needed to store the index data and disk usage increases.

2. When writing data, the index structure must also be maintained: inserting, deleting, and updating data requires extra operations on the index.

3. Maintaining the index during writes takes extra time, so the efficiency of write SQL drops and performance decreases.

Advantages and disadvantages of various indexes

primary key index

First of all, a field that is too long should not be used as the primary key, and the primary key field must be unique

So we generally use a self-incrementing ID as the primary key, which satisfies both requirements

But why don't we use a UUID as the primary key? It also meets these requirements

Because the InnoDB storage engine uses a B+ tree as the data structure for the index, and a B+ tree keeps its nodes in sorted order (for details, refer to a B+ tree article). A UUID, however, is random: when we insert it into the index, the insert lands at a random position, and random inserts can break the existing shape of the B+ tree (page splits). Since the leaf nodes of the primary key index contain all the row data, moving nodes means moving data, so the cost of this data migration is high.

But an auto-increment ID does not have this problem: all newly inserted rows are appended at the end.

So the primary key gains one more requirement: it should be ordered
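
For example, a minimal sketch of the two choices (the table and column names here are hypothetical):

CREATE TABLE t_order_auto (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- new rows are appended at the end of the B+ tree
    order_no VARCHAR(32) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE t_order_uuid (
    id CHAR(36) NOT NULL PRIMARY KEY,  -- e.g. filled with UUID(): random values, random insert positions, page splits
    order_no VARCHAR(32) NOT NULL
) ENGINE = InnoDB;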

joint index

A joint index can achieve index coverage, reducing the table-lookup (back-to-table) operations of the secondary index to improve retrieval speed, and it can also speed up grouping and sorting

But suppose, for example, we create a joint index on the fields (a, b, c)

Then consider these three statements:

SELECT * FROM table WHERE a = 1 and b = 1 and c = 1;

SELECT * FROM table WHERE a = 1 and c = 1;

SELECT * FROM table WHERE b = 1 and c = 1; 

Only the third statement cannot use the joint index at all (the second one can still use the index for a, just not for c)

Because the query conditions must match the leftmost prefix of the index: the query here must include the first indexed column a for the joint index to be used

Also, the joint index stops matching when it encounters a range query (>, <). For example:

select * from t_user  where age > 20 and reward = 100000;

age can use the joint index, but reward can't
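
For example, assuming the joint index here is on (age, reward), a minimal sketch:

CREATE INDEX idx_age_reward ON t_user (age, reward);
-- age > 20 is a range condition, so only the age part of the index is used; reward is filtered afterwards
SELECT * FROM t_user WHERE age > 20 AND reward = 100000;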

prefix index

A prefix index uses only the first few characters of a field to build the index. Compared with an ordinary index it saves storage space; when the amount of data is large enough, the savings are considerable.

However, the index nodes of a prefix index do not store the complete field value, so MySQL cannot use it for ORDER BY or GROUP BY, nor can it perform operations such as covering scans through it.
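
A minimal sketch of creating a prefix index (table and column names are hypothetical), indexing only the first 10 characters of the column:

ALTER TABLE t_user ADD INDEX idx_name_prefix (name(10));  -- stores only the first 10 characters of name in the index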

full text index

InnoDB full-text indexes were introduced in MySQL 5.6 and later

Full-text indexing is a technique for finding arbitrary content inside large texts stored in the database

Suppose we search for MySQL on Baidu and look at the search results: the highlighted parts are where the keyword matched.

This is the effect of a full-text index: the text to be searched is broken into tokens (word roots) and then looked up in the full text.

Doesn't this feel similar to fuzzy matching (it is like one big fuzzy match)? Although LIKE fuzzy matching can achieve the same effect, as the table grows and the data increases, its performance drops significantly. A full-text index solves this problem: it can replace the LIKE '%...%' syntax for fuzzy queries, and its performance is N times faster than LIKE '%...%'.
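
A minimal sketch of creating and using an InnoDB full-text index (table and column names are hypothetical):

ALTER TABLE t_article ADD FULLTEXT INDEX ft_content (content);
-- MATCH ... AGAINST uses the full-text index instead of LIKE '%...%'
SELECT * FROM t_article WHERE MATCH(content) AGAINST('MySQL index');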

Disadvantages of full-text indexing:

① Because the full-text index is implemented with word segmentation, after a full-text index is created on a field, MySQL performs word segmentation on that field and also stores the segmentation results in the index, so the full-text index file is extra large!

② Because the full-text index segments every field value, segmentation takes time after a value is modified, so the full-text index is not automatically updated immediately after the field data changes. We then need to write a stored procedure and call it manually to update the data in the full-text index.

③ Besides the above two points, the biggest flaw of the full-text index is that its support for Chinese is not friendly enough. English words can be split directly by spaces and punctuation, but Chinese cannot: in a word, it is too broad and profound to segment a piece of text accurately. Therefore, full-text indexes have some accuracy problems when searching Chinese text.

unique index

A unique index guarantees the uniqueness of the data. When searching, it returns at most one row and stops looking further, while an ordinary index keeps scanning until the condition no longer matches, so the unique index retrieves slightly faster (in practice not much faster, because a read loads a whole data page into memory at a time, and searching within memory is very fast; for details see "ordinary index VS unique index").

when inserting

When the current data page has been loaded into memory

Unique index: first find the insertion position to see if there is any conflict, and then insert if there is no conflict

Normal index: direct insertion

The difference is only a CPU operation, which is negligible

But if the current data page has not been loaded into memory, it is different: the unique index cannot use the change buffer, because it must read the page from disk first to check for conflicts, so each insert requires an I/O operation, and we know that I/O operations are expensive. An ordinary index can simply record the change in the change buffer and apply it later.
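
A minimal sketch of creating a unique index (table and column names are hypothetical):

ALTER TABLE t_user ADD UNIQUE INDEX uk_id_card (id_card);  -- guarantees uniqueness, but inserts cannot use the change buffer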

hash index

(I haven't really used it, but it's better to know than not to know, right?)

A hash index is an index whose data structure is a hash table, but most people have had little contact with it, since the B+ tree structure is used by default when creating an index. In terms of pure query speed, however, the hash index is the well-deserved leader in MySQL! Because an index with a hash structure stores the index field values in a hash table, querying data by that field only requires one hash calculation to locate the data.

But the fatal problem of the hash structure is that it is unordered, that is, you cannot sort or group by the fields of a hash index.

Therefore, if you are sure a table will never need sorting on a field, you can choose the hash structure as the index data structure where appropriate, and it can bring unexpected performance benefits
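
Note that InnoDB does not support user-defined hash indexes (it only maintains an internal adaptive hash index); an explicit hash index can be declared with the MEMORY engine, for example. A minimal sketch (table and column names are hypothetical):

CREATE TABLE t_session (
    session_id CHAR(32) NOT NULL,
    user_id BIGINT,
    INDEX idx_session USING HASH (session_id)  -- equality lookups need only one hash calculation; no sorting or range scans
) ENGINE = MEMORY;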

index failure

It simply means that the index that was built is not used, and the whole table is scanned directly instead.

left or left-and-right fuzzy matching

The index B+ tree is stored in order by index value, so comparisons can only be made by prefix. With a left fuzzy match or a left-and-right fuzzy match, the prefix is unknown, so the index cannot be used.
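
For example, assuming an ordinary index on name:

SELECT * FROM t_user WHERE name LIKE 'zhang%';   -- prefix is known, the index can be used
SELECT * FROM t_user WHERE name LIKE '%zhang';   -- left fuzzy match, full table scan
SELECT * FROM t_user WHERE name LIKE '%zhang%';  -- left and right fuzzy match, full table scan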

using a function

Suppose we have built an index on the name column and then use WHERE LENGTH(name) = 6 in the query. The index will not be used. The reason is simple: we built the index on name itself, not on the function result.

But MySQL now actually has functional indexes: if we create an index on LENGTH(name), then this query can use the index.
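
A minimal sketch of such a functional index (available since MySQL 8.0.13; table name is hypothetical, note the double parentheses):

CREATE INDEX idx_name_length ON t_user ((LENGTH(name)));
SELECT * FROM t_user WHERE LENGTH(name) = 6;  -- can now use idx_name_length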

using expressions

The reason for the failure here is the same as with functions. For example, WHERE id + 1 = 10

But if you write WHERE id = 10 - 1 instead, the index can be used, because MySQL does not bother to transform the expression for us, so we have to rewrite it ourselves

implicit type conversion

For example, we defined value as VARCHAR(20) at the beginning but forgot to write the single quotes when querying

For example, with WHERE value = 60, MySQL will think you are comparing against an int, and the index fails

On the other hand, if value is defined as INT and you add single quotes in the query (value = '60'), the index does not fail

Why? Because when MySQL encounters a comparison between a string and a number, it automatically converts the string to a number and then compares. And this conversion calls a function.

That is, the first method is equivalent to  

where CAST(value AS SIGNED) = 60;

That is, index invalidation caused by using a function has occurred

while the second way

where value = CAST(60 AS SIGNED);

This time the column we filter on does not have a function applied to it, so the index still works

Joint index non-leftmost match

This was covered in the joint index section above

using OR

If the condition column before the OR is an index column, but the condition column after the OR is not an index column, the index will fail.
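
For example, assuming id is indexed but age is not:

SELECT * FROM t_user WHERE id = 1 OR age = 18;  -- age has no index, so the whole query falls back to a full table scan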

Comparison of different field values

If we use WHERE name = 'zhangsan', the index does not fail, but

if we use WHERE name_student = name_teacher, it fails: you cannot compare two fields (columns) to each other and still use the index (even if indexes exist on both name columns)

Reverse range operation causes index invalidation

Generally speaking, if the SQL uses a forward range query, such as >, <, BETWEEN, LIKE, IN and similar operations, the index works normally. But if the SQL performs a reverse range operation, such as NOT IN, NOT LIKE, IS NOT NULL, !=, <> and the like, there will be problems.

However, although IS NULL is a forward query, the index can also fail there, and IS NOT NULL fails as well.

Index invalidation caused by the MySQL optimizer

Sometimes the large number of I/O operations on the secondary index (caused by table lookups) makes it slower than a direct full table scan, and MySQL automatically chooses the lower-cost plan.

For example: SELECT * FROM table WHERE a > 1000000 and b < 500000;

Suppose this query matches many rows (estimated at tens of thousands). With the index, the process is:

(When the joint index encounters > or <, matching stops, i.e. a uses the index and b does not.)

Check whether a meets the condition, then use that row's id to go back to the table, fetch b, and check it.

If it does not use the index, it simply scans every row sequentially. Since whole data pages are loaded at a time, this avoids a large number of random I/O operations, and the cost is greatly reduced.

Some index optimization behavior

index coverage

When all the data to be queried is contained in the joint index, the values are taken directly from the index, avoiding the extra table lookup operations

For example, SELECT name, age FROM table WHERE ...: if name and age are both included in the joint index, the query does not follow the normal secondary-index logic (find the corresponding primary key, then use the primary key to fetch the whole row)
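
A small sketch, assuming a joint index on (name, age):

SELECT name, age FROM t_user WHERE name = 'zhangsan';  -- covered by the index, no table lookup needed
SELECT * FROM t_user WHERE name = 'zhangsan';          -- selects columns outside the index, so it must go back to the table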

index pushdown

That is, the work of filtering data at the Server layer is pushed down to the storage engine layer

For example:

SELECT * FROM table where a  > 500 and b = 20;

This SQL will use the index only partially: only the condition on a applies to the index

Without index pushdown, the execution process is: first find all entries with a > 500, then go back to the table for each one to check whether b equals 20, and finally return the matching rows to the user

With index pushdown, the entries with a > 500 are found and the b = 20 condition is checked inside the joint index itself, filtering out the ones that do not match,

and only then does it go back to the table for the remaining rows, which reduces the number of table lookup operations
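
Index condition pushdown is on by default since MySQL 5.6 and can be toggled through the optimizer switch; when it is applied, EXPLAIN shows "Using index condition" in the Extra column. A small sketch (the table name is hypothetical):

SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM t_example WHERE a > 500 AND b = 20;  -- Extra: Using index condition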


MRR (Multi-Range Read) mechanism

Multi-Range Read, or MRR for short, is a performance optimization introduced in MySQL 5.6 together with index pushdown. So what does MRR optimize?

Generally speaking, in real business we should try to reduce the number of table lookups (and the I/O they cause) by relying on index coverage, but in many cases we still have to go back to the table to fetch data. Table lookups obviously lead to a large amount of disk I/O, and worse, much of it is discrete (random) I/O. Let's look at an example.

SELECT * FROM `zz_student_score` WHERE `score` BETWEEN 0 AND 59;

The SQL above is very simple: it queries the information of all students with failing grades from the student score table. Assuming there is an ordinary index on the score field, think about it: what is the execution process of this SQL?

  • ① First find the node with score 0 on the score index, then take its ID back to the table to fetch the information of the students with a score of 0.
  • ② Go back to the score index, find all the nodes with score 1, and again go back to the table to fetch the information of the students with a score of 1.
  • ③ Go back to the score index again and continue with the nodes with score 2......
  • ④ Repeat this process over and over until the information of all students with scores 0~59 has been fetched.

Now suppose the rows with scores 0~5 are located on page page_01 of disk space, the rows with scores 5~10 are on page page_02, and the rows with scores 10~15 are on page_01 again. When returning to the table in this order, the query keeps switching back and forth between the two pages. But the lookups for scores 0~5 and 10~15 could be merged so that page_01 is read only once, which reduces the number of I/O operations and avoids discrete I/O at the same time.

The MRR mechanism mainly solves this problem: for table lookups from a secondary index, it reduces discrete I/O and converts random I/O into sequential I/O, thereby improving query efficiency.

In the MRR mechanism, the IDs found in the secondary index are first placed in the read_rnd_buffer buffer. When all the index retrieval work is complete, or when the data in the buffer reaches read_rnd_buffer_size, MySQL sorts the IDs in the buffer to obtain an ordered ID set (rest_sort), and finally goes to the clustered/primary key index in that order to fetch the rows back from the table.
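
MRR can also be controlled through the optimizer switch, and the buffer mentioned above is sized by read_rnd_buffer_size; when MRR is applied, EXPLAIN shows "Using MRR" in the Extra column. A small sketch:

SET optimizer_switch = 'mrr=on,mrr_cost_based=off';  -- force MRR instead of letting the cost model decide
SET read_rnd_buffer_size = 2 * 1024 * 1024;          -- 2 MB buffer for the IDs to be sorted (session scope)
EXPLAIN SELECT * FROM zz_student_score WHERE score BETWEEN 0 AND 59;  -- Extra: Using MRR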

Index Skip Scan

It has too many restrictions, so just understanding the idea is enough

Suppose we built a joint index on (a, b, c) and run

SELECT * FROM table WHERE b = 1 and c = 1;

Originally, this SQL should not be able to use the index, but MySQL helps us "skip" a and forces the index to be used

In fact, the MySQL optimizer automatically deduplicates the values of the first field in the joint index, then prepends each distinct value to the query conditions and searches again
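
Index skip scan exists since MySQL 8.0.13 and is controlled by the skip_scan optimizer switch; among its many restrictions, the query must reference only columns that are in the index. A small sketch (the table name is hypothetical, joint index on (a, b, c)):

SET optimizer_switch = 'skip_scan=on';
EXPLAIN SELECT b, c FROM t_example WHERE b = 1 AND c = 1;  -- Extra may show: Using index for skip scan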

Advice on using indexes

This part follows the original text at https://juejin.cn/post/7149074488649318431

  • ① Fields that are frequently used as query conditions should be considered for indexing as appropriate.
  • ② The primary and foreign keys of a table, or the fields used to join tables, must be indexed, because this can greatly improve the performance of join queries.
  • ③ Fields to be indexed should generally have sufficiently distinct values (high selectivity), so as to improve the retrieval efficiency of the index.
  • ④ The values of indexed fields should not be too long. If a longer field needs to be indexed, a prefix index can be chosen.
  • ⑤ When creating a joint index, follow the leftmost prefix principle and combine the fields in order of priority.
  • ⑥ Fields that are often used for range queries, sorting, and grouping should be indexed, because the index is ordered and can speed up sorting.
  • ⑦ For a unique index, if it is confirmed that the field will not be used for sorting, its structure can be changed to a Hash structure.
  • ⑧ Prefer a joint index over several single-column indexes; one joint index is more efficient than multiple single-column index queries.

At the same time, in addition to the above-mentioned principles of indexing, there are some points to pay attention to when indexing:

❶ Fields whose values ​​are often added, deleted, or modified are not suitable for indexing, because the index structure needs to be maintained after each change.

❷ A field with a large number of duplicate values is not suitable for indexing, such as the gender field in the earlier example. (However, a low-selectivity field with skewed data can still benefit: for example, three status values are repetitive in principle, but if one status is very rare while the others are very common, an index can still be used; the cost-based optimizer (CBO) handles this automatically.)

❸ Indexes cannot participate in calculations, so fields that are often queried with functions are not suitable for indexing.

❹ The number of indexes in a single table is not the more the better; generally it should be kept to around 3 at most and should not exceed 5.

❺ When building a joint index, you must consider the priority, and the field with the highest query frequency should be placed first.

❻ When the data in the table is small, the index should not be established, because when the amount of data is not large, the maintenance of the index will cost more.

❼ When the field values ​​of the index are out of order, it is not recommended to build an index, because it will cause page splitting, especially the primary key index.

Although the first eight points do not all have to be followed, the last seven situations must be avoided

Origin blog.csdn.net/chara9885/article/details/131624475