What is an index: a data structure that speeds up retrieval
Advantages and disadvantages of indexes
Advantages of indexes:
1. Once an index is in place, lookups become much faster (as long as the index is actually used), and the larger the table, the more noticeable the gain.
2. Grouping and sorting can use an index to avoid extra sorting work.
3. A unique index guarantees the uniqueness of the data with no additional constraint needed: creating the index and enforcing uniqueness happen in one step.
4. When joining tables, indexing the primary key and foreign key columns can bring a very noticeable performance improvement.
Disadvantages of indexes:
1. An index is stored in files on local disk, so it needs extra space and disk usage goes up.
2. Writes must also maintain the index structure: every insert, delete, and update requires additional operations on the index.
3. This index maintenance takes extra time, so SQL write statements become slower and write performance drops.
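As a rough illustration of these trade-offs, the sketch below creates a small table with a unique index and an ordinary index, then looks at the space the indexes take. The table name demo_idx_cost and its columns are made up for this example and do not come from the original text.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE demo_idx_cost (
  id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  name  VARCHAR(64)  NOT NULL,
  email VARCHAR(128) NOT NULL,
  age   INT          NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_email (email),  -- unique index: also enforces uniqueness
  KEY idx_age (age)             -- ordinary secondary index
) ENGINE = InnoDB;

-- Space cost: bytes used by the row data versus the secondary indexes.
SELECT table_name, data_length, index_length
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'demo_idx_cost';

-- Read benefit: on a populated table the plan is expected to use idx_age
-- instead of a full table scan (type = ALL).
EXPLAIN SELECT * FROM demo_idx_cost WHERE age = 30;
```

Every extra KEY in the definition is one more structure that each INSERT, UPDATE, and DELETE has to maintain.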
Advantages and disadvantages of various indexes
primary key index
First, a column that is too long should not be used as the primary key, and the primary key column must be unique.
So we generally use an auto-increment ID as the primary key, which meets both requirements.
But why not use a UUID as the primary key? It is also unique.
Because the InnoDB storage engine stores indexes as B+ trees, and a B+ tree is built on binary search, that is, its nodes are kept in order (the B+ tree post covers the details). A UUID, however, is random, so building the index means inserting at random positions, and a random insert can disrupt the existing shape of the B+ tree (for example by splitting a page). The leaf nodes of the primary key index contain the full row data, so moving a node means moving the rows themselves, and this data migration is expensive.
An auto-increment ID does not have this problem, because every newly inserted row is placed at the end. So the primary key gains one more requirement: it should be ordered.
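The two choices can be put side by side in a minimal sketch. The table names demo_order_auto and demo_order_uuid are hypothetical; the point is only where new rows land in the clustered index.

```sql
-- Ordered primary key: new rows are appended at the right edge
-- of the clustered index.
CREATE TABLE demo_order_auto (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  payload VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

INSERT INTO demo_order_auto (payload) VALUES ('first'), ('second');

-- UUID primary key: the generated strings do not increase monotonically,
-- so new rows can land in the middle of the clustered index and may
-- trigger page splits. The key is also much longer (36 characters).
CREATE TABLE demo_order_uuid (
  id      CHAR(36) NOT NULL,
  payload VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
) ENGINE = InnoDB;

INSERT INTO demo_order_uuid (id, payload)
VALUES (UUID(), 'first'), (UUID(), 'second');
```

Because every secondary index stores the primary key value as its row pointer, the longer key also makes every secondary index larger.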
joint index
A joint index can act as a covering index, which avoids back-to-table lookups from the secondary index and speeds up retrieval; it can also speed up grouping and sorting.
But suppose we create a joint index on columns (a, b, c) of a table t
and then run three statements:
SELECT * FROM t WHERE a = 1 AND b = 1 AND c = 1;
SELECT * FROM t WHERE a = 1 AND c = 1;
SELECT * FROM t WHERE b = 1 AND c = 1;
Only the third statement cannot use the joint index (the second can use it, but only for the a part).
This is because the query conditions must include the leftmost column of the index: here the WHERE clause must contain a condition on a to use the joint index. The order in which the conditions are written does not matter.
A joint index also stops matching further columns once it meets a range condition (>, <). For example, with a joint index on (age, reward):
select * from t_user where age > 20 and reward = 100000;
age can use the joint index for the lookup, but reward cannot.
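One way to check the leftmost-prefix behaviour is to compare EXPLAIN output for each query shape. This is a sketch under assumptions: the table demo_abc and the index name idx_a_b_c are invented for illustration, and the plan the optimizer actually picks depends on the data and the MySQL version.

```sql
-- Hypothetical table with a joint index on (a, b, c).
CREATE TABLE demo_abc (
  id   INT NOT NULL AUTO_INCREMENT,
  a    INT NOT NULL,
  b    INT NOT NULL,
  c    INT NOT NULL,
  note VARCHAR(32),
  PRIMARY KEY (id),
  KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- All three key parts can be used for the lookup.
EXPLAIN SELECT * FROM demo_abc WHERE a = 1 AND b = 1 AND c = 1;

-- Only the a part can be used for the lookup (b is missing).
EXPLAIN SELECT * FROM demo_abc WHERE a = 1 AND c = 1;

-- No condition on the leftmost column: no ordinary index lookup.
EXPLAIN SELECT * FROM demo_abc WHERE b = 1 AND c = 1;

-- Range on a: matching stops there, so b is not used for the lookup.
EXPLAIN SELECT * FROM demo_abc WHERE a > 1 AND b = 1;
```

The key and key_len columns of the EXPLAIN output show which index was chosen and how many bytes of it were used, which is a quick way to tell how many key parts took part in the lookup.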
prefix index
A prefix index is built on only the first few characters of a column. Compared with an ordinary index it saves storage space, and when the table is large enough the savings are considerable.
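A minimal sketch of a prefix index follows. The table demo_article, the index name, and the prefix length of 10 are all assumptions chosen for illustration; a suitable length depends on the actual data.

```sql
-- Hypothetical table: index only the first 10 characters of title.
CREATE TABLE demo_article (
  id    INT NOT NULL AUTO_INCREMENT,
  title VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_title_prefix (title(10))
) ENGINE = InnoDB;

-- Rough way to judge a prefix length: compare how selective the prefix is
-- with how selective the full column is (both return NULL on an empty table).
SELECT COUNT(DISTINCT LEFT(title, 10)) / COUNT(*) AS prefix_selectivity,
       COUNT(DISTINCT title) / COUNT(*)           AS full_selectivity
FROM demo_article;
```

The closer prefix_selectivity is to full_selectivity, the less filtering power is lost by indexing only the prefix.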
However, the index nodes do not store the complete value of the column, so MySQL cannot use a prefix index to carry out ORDER BY or GROUP BY, and it cannot serve as a covering index either.
full text index
InnoDB supports full-text indexes in MySQL 5.6 and later.
Full-text indexing is a technique for finding arbitrary content in large texts stored in the database.
Suppose we search for MYSQL on Baidu and look at the search results: the parts highlighted in red are where the keyword matched.
This is what a full-text index does. The text of the column being searched is split into tokens, and queries are matched against those tokens.
Doesn't this feel similar to fuzzy matching (like fuzzy matching on a larger scale)? Fuzzy matching can achieve the same effect, but as the table grows its performance drops significantly. A full-text index addresses this problem: it can replace the LIKE '%...%' syntax for fuzzy queries, and it can be many times faster than LIKE '%...%'.
Disadvantages of full-text indexing:
① A full-text index is built on word segmentation. After a full-text index is created on a column, MySQL segments the column values into words and stores the segmentation results in the index, so the full-text index files are very large.
② Because every column value has to be segmented, updating the index after a change takes time, and the full-text index is not brought fully up to date the moment a column value is modified. Some extra maintenance is needed, for example a stored procedure or scheduled job that refreshes the index.
③ Beyond these two points, the biggest weakness of the full-text index is its poor support for Chinese. English can be split into words directly on punctuation and spaces, but Chinese has no such delimiters, so a piece of text cannot be segmented precisely. As a result, full-text search over Chinese has some accuracy problems. (MySQL 5.7.6 and later ship an ngram parser for this case.)
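The difference between a LIKE scan and a full-text lookup can be sketched as follows. The table demo_post and its sample rows are hypothetical and only serve the illustration.

```sql
-- Hypothetical table with a full-text index on body.
CREATE TABLE demo_post (
  id   INT NOT NULL AUTO_INCREMENT,
  body TEXT NOT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY ft_body (body)
) ENGINE = InnoDB;

INSERT INTO demo_post (body)
VALUES ('MySQL index tuning notes'), ('PostgreSQL vacuum notes');

-- Fuzzy match with a leading wildcard: every row has to be examined.
SELECT id FROM demo_post WHERE body LIKE '%MySQL%';

-- Full-text search: the token is looked up in the full-text index.
SELECT id FROM demo_post
WHERE MATCH(body) AGAINST('MySQL' IN NATURAL LANGUAGE MODE);

-- For Chinese text, the ngram parser (MySQL 5.7.6+) can be tried instead:
-- ALTER TABLE demo_post ADD FULLTEXT KEY ft_body_ngram (body) WITH PARSER ngram;
```

Note that the two queries are not strictly equivalent: LIKE matches any substring, while MATCH ... AGAINST matches whole tokens produced by the parser.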
unique index
A unique index guarantees the uniqueness of the data. On a lookup it returns as soon as it finds the one matching row, while an ordinary index keeps scanning until it meets the first row that does not satisfy the condition. So a unique index lookup is faster, though in practice not by much: reads bring a whole data page into memory at a time, and the extra comparison in memory is very cheap (see the post "ordinary index vs unique index" for details).
When inserting:
If the target data page is already in memory:
Unique index: find the insert position, check for a conflict, and insert if there is none
Ordinary index: insert directly
The difference is one extra check, which costs only a little CPU time.
If the target data page is not in memory:
Ordinary index: record the change in the change buffer and return, without reading the page
Unique index: the page must be read from disk to check for a conflict, so the change buffer cannot be used. Each such insert therefore needs an I/O operation, and we know that I/O operations are expensive.
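The conflict check that a unique index performs on every insert can be seen in a small sketch. The table demo_account and its data are invented for this example.

```sql
-- Hypothetical table with one unique index and one ordinary index.
CREATE TABLE demo_account (
  id    INT NOT NULL AUTO_INCREMENT,
  email VARCHAR(128) NOT NULL,
  city  VARCHAR(64)  NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_email (email),
  KEY idx_city (city)
) ENGINE = InnoDB;

INSERT INTO demo_account (email, city) VALUES ('a@example.com', 'Beijing');

-- Ordinary index: a repeated city value is accepted without any check.
INSERT INTO demo_account (email, city) VALUES ('b@example.com', 'Beijing');

-- Unique index: the existing entry must be consulted first, and this
-- statement is rejected with a duplicate-key error (error 1062).
INSERT INTO demo_account (email, city) VALUES ('a@example.com', 'Shanghai');

-- The change buffer behaviour is controlled by this server variable.
SHOW VARIABLES LIKE 'innodb_change_buffering';
```

The last statement only reports the current setting; its default value differs between MySQL versions.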
hash index
(I haven't used it myself, but it's better to know than not to know, right?)
A hash index is an index whose data structure is a hash table. Most people have little contact with it, since the B+ tree structure is used by default when creating an index. But when it comes to equality lookup speed, the hash index is the well-deserved leader in MySQL: it stores the indexed column values in a hash table, so a query on that column needs only one hash calculation to locate the data.
The fatal problem of the hash structure is that it is unordered, so it cannot support sorting, grouping, or range queries on the indexed column.
Therefore, if you are sure a table will never need sorted access on that column, a hash index can be a reasonable choice and may bring a noticeable performance benefit. Note that explicit hash indexes are supported by the MEMORY engine; InnoDB creates its ordinary indexes as B+ trees even when HASH is requested.
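A hash index can be declared explicitly on a MEMORY table, as in the sketch below. The table demo_session and its columns are hypothetical.

```sql
-- Hypothetical in-memory table with an explicit hash index.
CREATE TABLE demo_session (
  token   CHAR(32) NOT NULL,
  user_id INT      NOT NULL,
  KEY idx_token USING HASH (token)
) ENGINE = MEMORY;

INSERT INTO demo_session (token, user_id)
VALUES ('0123456789abcdef0123456789abcdef', 42);

-- Equality lookup: a good fit for the hash index.
EXPLAIN SELECT user_id FROM demo_session
WHERE token = '0123456789abcdef0123456789abcdef';

-- Range condition or sorting: the hash index cannot help here.
EXPLAIN SELECT user_id FROM demo_session
WHERE token > '0' ORDER BY token;
```

Data in a MEMORY table is lost when the server restarts, which is part of the trade-off when choosing this engine for its hash indexes.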
index failure
In short, the index exists but is not used, and the query falls back to a full table scan.
left or left-and-right fuzzy matching
The index B+ tree is stored in order of the index value and can only be searched by prefix. With a left fuzzy match (LIKE '%x') or a left-and-right fuzzy match (LIKE '%x%'), the prefix is unknown, so the ordered structure cannot be used.
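The three LIKE shapes can be compared with EXPLAIN. The table demo_person and the index idx_name are assumptions made for this sketch.

```sql
-- Hypothetical table with an ordinary index on name.
CREATE TABLE demo_person (
  id   INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(64) NOT NULL,
  bio  VARCHAR(255),
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE = InnoDB;

-- Right fuzzy match: the prefix 'zhang' is known, so idx_name can be
-- used as a range scan.
EXPLAIN SELECT * FROM demo_person WHERE name LIKE 'zhang%';

-- Left fuzzy match: the prefix is unknown, so idx_name cannot narrow
-- the search.
EXPLAIN SELECT * FROM demo_person WHERE name LIKE '%san';

-- Left-and-right fuzzy match: same problem.
EXPLAIN SELECT * FROM demo_person WHERE name LIKE '%ang%';
```

On a populated table the first plan is expected to show type = range, while the other two typically show type = ALL.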
use function
Suppose we built an index on the name column and then query with WHERE LENGTH(name) = 6. The index will not be used. The reason is simple: the index stores the values of name, not the results of the function.
However, MySQL 8.0.13 and later support functional indexes. If we create an index on LENGTH(name), the query can use that index.
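A functional index on LENGTH(name) could look like the following sketch, which assumes MySQL 8.0.13 or later. The table demo_member and the index names are hypothetical.

```sql
-- Hypothetical table with an ordinary index on name.
CREATE TABLE demo_member (
  id    INT NOT NULL AUTO_INCREMENT,
  name  VARCHAR(64) NOT NULL,
  notes VARCHAR(255),
  PRIMARY KEY (id),
  KEY idx_name (name)
) ENGINE = InnoDB;

-- idx_name stores name values, not LENGTH(name), so it cannot serve
-- this condition as a lookup.
EXPLAIN SELECT * FROM demo_member WHERE LENGTH(name) = 6;

-- Functional index: note the double parentheses around the expression.
ALTER TABLE demo_member ADD INDEX idx_name_len ((LENGTH(name)));

-- The same query can now be matched against idx_name_len.
EXPLAIN SELECT * FROM demo_member WHERE LENGTH(name) = 6;
```

The expression in the query has to match the indexed expression for the optimizer to consider the functional index.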
use expressions
The reason is the same as for functions. For example, WHERE id + 1 = 10 cannot use the index.
But WHERE id = 10 - 1 can, so we should do the rearrangement ourselves, because MySQL does not rewrite the expression for us.
implicit type conversion
For example, we defined value VARCHAR(20) but forgot the single quotation marks when querying.
With WHERE value = 60, MySQL treats the constant as a number, and the index fails.
Conversely, if value is defined as INT and the query uses value = '60' with single quotes, the index does not fail.
Why? When MySQL compares a string with a number, it automatically converts the string to a number and then compares. This conversion behaves like a function call.
That is, the first case is roughly equivalent to
where CAST(value AS signed int) = 60;
so the function is applied to the indexed column, and the index fails for the same reason as in the function case,
while the second case is roughly equivalent to
where value = CAST('60' AS signed int);
Here the function is applied to the constant, not to the column being compared, so the index still works.
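Both directions of the conversion can be checked with EXPLAIN. The table demo_conv, its columns code and qty, and the index names are made up for this sketch.

```sql
-- Hypothetical table: one indexed string column, one indexed integer column.
CREATE TABLE demo_conv (
  id   INT NOT NULL AUTO_INCREMENT,
  code VARCHAR(20) NOT NULL,
  qty  INT NOT NULL,
  note VARCHAR(32),
  PRIMARY KEY (id),
  KEY idx_code (code),
  KEY idx_qty (qty)
) ENGINE = InnoDB;

-- String column compared with a number: the column side is converted,
-- so idx_code cannot be used for a lookup.
EXPLAIN SELECT * FROM demo_conv WHERE code = 60;

-- Quoted constant: no conversion on the column, idx_code is usable.
EXPLAIN SELECT * FROM demo_conv WHERE code = '60';

-- Integer column compared with a string: only the constant is converted,
-- so idx_qty is still usable.
EXPLAIN SELECT * FROM demo_conv WHERE qty = '60';
```

For the first query MySQL usually lists idx_code under possible_keys but leaves key empty.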
Joint index non-leftmost match
Covered in the joint index section above.
using OR
If the column in the condition before the OR is indexed but the column in the condition after the OR is not, the index will fail.
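A small sketch of the OR case follows. The table demo_or and its indexes are hypothetical, and the plans described in the comments are what is typically seen rather than a guarantee.

```sql
-- Hypothetical table: a is indexed, b is not.
CREATE TABLE demo_or (
  id INT NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a (a)
) ENGINE = InnoDB;

-- Rows matching b = 2 can only be found by scanning the table, so using
-- idx_a for the a = 1 side would not avoid the scan.
EXPLAIN SELECT * FROM demo_or WHERE a = 1 OR b = 2;

-- One possible fix: index b as well. The optimizer may then combine
-- both indexes (index merge) instead of scanning the whole table.
ALTER TABLE demo_or ADD INDEX idx_b (b);
EXPLAIN SELECT * FROM demo_or WHERE a = 1 OR b = 2;
```

Another common rewrite is to split the query into two SELECT statements joined with UNION, one per condition.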
Comparison of different field values
A condition such as WHERE name = 'zhangsan' does not make the index fail, but
a condition such as WHERE name_student = name_teacher does. Two columns of the same table cannot be compared through the index (even if both columns are indexed).
Reverse range operation causes index invalidation
Generally speaking, if the SQL is a forward range query, using operations such as >, <, BETWEEN, LIKE, IN and so on, the index works normally. But if the SQL performs a reverse range operation, such as NOT IN, NOT LIKE, IS NOT NULL, !=, <> and so on, the index is likely to fail.
IS NULL is a forward query, yet it can fail to use the index too, just like IS NOT NULL. In these cases the outcome depends on how many rows the optimizer expects to match.
Index invalidation caused by MYSQL optimizer
Sometimes the large number of I/O operations caused by back-to-table lookups from a secondary index is slower than a direct full table scan, and MySQL automatically picks the lower-cost plan.
For example: SELECT * FROM t WHERE a > 1000000 AND b < 500000;
Suppose many rows match (say, tens of thousands). Using the index, the process would be:
(a joint index stops matching at > or <, so a uses the index and b does not)
check whether a meets the condition in the index, then go back to the table with the id of each entry to fetch b and check it.
Without the index, MySQL
traverses the rows sequentially. Because each data page is loaded into memory once and all of its rows are checked there, far fewer I/O operations are needed and the cost is much lower.
Some index optimization behavior
index coverage
When all the columns a query needs are contained in the joint index, the values are read directly from the index, which avoids back-to-table lookups.
For example, SELECT name, age FROM t WHERE ...: if name and age are both in the joint index, the query does not follow the normal secondary index logic (find the primary key in the index, then use the primary key to fetch the whole record).
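Index coverage shows up in the Extra column of EXPLAIN. In this sketch the table demo_cover and the index idx_name_age are hypothetical.

```sql
-- Hypothetical table with a joint index on (name, age).
CREATE TABLE demo_cover (
  id      INT NOT NULL AUTO_INCREMENT,
  name    VARCHAR(64)  NOT NULL,
  age     INT          NOT NULL,
  address VARCHAR(128) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_name_age (name, age)
) ENGINE = InnoDB;

-- Covered: every selected column is in idx_name_age, so Extra is
-- expected to show "Using index" and no back-to-table lookup is needed.
EXPLAIN SELECT name, age FROM demo_cover WHERE name = 'zhangsan';

-- Not covered: address is only in the clustered index, so each match
-- needs a back-to-table lookup through the primary key.
EXPLAIN SELECT name, age, address FROM demo_cover WHERE name = 'zhangsan';
```

In InnoDB the primary key is stored in every secondary index entry, so selecting id in addition to name and age would still be covered.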
index pushdown
That is, part of the filtering work done at the Server layer is pushed down to the storage engine layer.
For example:
SELECT * FROM t WHERE a > 500 AND b = 20;
With a joint index on (a, b), this SQL uses the index only partially: only the a part is used for the lookup.
Without index pushdown, the execution process is to find all index entries with a > 500, go back to the table for each of them, check whether b equals 20, and return the matching rows to the user.
With index pushdown, the engine finds the entries with a > 500, checks b = 20 inside the joint index, and filters out the entries that do not match.
Only the remaining entries go back to the table to fetch the row data, which reduces the number of back-to-table operations.
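Index pushdown can be switched off and on per session to compare the two plans. The table demo_icp and the index idx_a_b are assumptions for this sketch, and the Extra values in the comments are what is expected when the optimizer chooses idx_a_b on a table with enough rows.

```sql
-- Hypothetical table with a joint index on (a, b).
CREATE TABLE demo_icp (
  id   INT NOT NULL AUTO_INCREMENT,
  a    INT NOT NULL,
  b    INT NOT NULL,
  note VARCHAR(32),
  PRIMARY KEY (id),
  KEY idx_a_b (a, b)
) ENGINE = InnoDB;

-- Pushdown off: b = 20 is checked by the Server layer after each
-- back-to-table lookup (Extra: "Using where").
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN SELECT * FROM demo_icp WHERE a > 500 AND b = 20;

-- Pushdown on (the default): b = 20 is checked inside the index before
-- going back to the table (Extra: "Using index condition").
SET optimizer_switch = 'index_condition_pushdown=on';
EXPLAIN SELECT * FROM demo_icp WHERE a > 500 AND b = 20;
```

SET optimizer_switch without GLOBAL changes the setting only for the current session.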
MRR (Multi-Range Read) mechanism
The Multi-Range Read mechanism, MRR for short, is another performance optimization introduced in MySQL 5.6 together with index pushdown. So what does MRR optimize?
Generally speaking, in real workloads we should use covering indexes to reduce the number of back-to-table operations as much as possible. But in many cases we still have to go back to the table to fetch data, and going back to the table obviously causes a lot of disk IO. More seriously, much of it is discrete (random) IO. Let's look at an example.
SELECT * FROM `zz_student_score` WHERE `score` BETWEEN 0 AND 59;
The job of this SQL is simple: it queries the information of every student with a failing score in the student score table. Assuming there is an ordinary index on the score column, what is the execution process of this SQL?
- ① First find the entries with score 0 in the index on the score column, then take their IDs back to the table to fetch the student information for score 0.
- ② Go back to the score index, find all entries with score 1, and go back to the table again to fetch the student information for score 1.
- ③ Go back to the score index again and find all entries with score 2 ...
- ④ Repeat this process until the student information for all scores in 0~59 has been fetched.
Now suppose the table data for scores 0~5 is on page_01 of the disk space, the data for scores 5~10 is on page_02, and the data for scores 10~15 is on page_01 again. Going back to the table in index order means switching back and forth between page_01 and page_02. But the reads for 0~5 and 10~15 could be merged so that page_01 is read only once, which reduces the number of IO operations and avoids discrete IO at the same time.
The MRR mechanism exists mainly to solve this problem. For back-to-table queries from a secondary index, it reduces discrete IO and converts random IO into sequential IO, thereby improving query efficiency.
With MRR, the IDs found in the secondary index are placed in the read_rnd_buffer buffer. Once all index retrieval is finished, or once the data in the buffer reaches read_rnd_buffer_size, MySQL sorts the data in the buffer to obtain an ordered ID set (rest_sort), and finally goes to the clustered (primary key) index in that order to fetch the rows.
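Whether MRR is used for a query can be checked with EXPLAIN. The sketch below assumes a table shaped like the zz_student_score example; the name demo_student_score and its columns are invented here, since the original table definition is not shown.

```sql
-- Hypothetical stand-in for the student score table.
CREATE TABLE demo_student_score (
  id           INT NOT NULL AUTO_INCREMENT,
  student_name VARCHAR(64) NOT NULL,
  score        INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_score (score)
) ENGINE = InnoDB;

-- Size of the buffer in which the primary key values are collected.
SHOW VARIABLES LIKE 'read_rnd_buffer_size';

-- By default MRR is cost based, and the optimizer often decides against it.
-- Turning the cost check off makes MRR the preferred choice.
SET optimizer_switch = 'mrr=on,mrr_cost_based=off';

-- When MRR is applied, the Extra column contains "Using MRR".
EXPLAIN SELECT * FROM demo_student_score WHERE score BETWEEN 0 AND 59;

-- Restore the default behaviour for the session.
SET optimizer_switch = 'mrr=on,mrr_cost_based=on';
```

A larger read_rnd_buffer_size lets more primary key values be sorted in one batch, at the cost of more memory per session.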
Index Skip Scan
It has many restrictions, so a general understanding is enough.
Suppose we built a joint index on (a, b, c) of a table t:
SELECT * FROM t WHERE b = 1 AND c = 1;
Normally this SQL could not use the index, but MySQL (8.0.13 and later) can "skip" a for us and still use the index.
In effect, the MySQL optimizer takes the distinct values of the first column in the joint index and then runs the lookup once per distinct value, combining each value with the remaining conditions.
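A sketch for observing skip scan follows, assuming MySQL 8.0.13 or later. The table demo_skip is hypothetical, and whether the optimizer actually picks skip scan depends on the statistics: it tends to do so only when the skipped column has few distinct values.

```sql
-- Hypothetical table with a joint index on (a, b, c).
CREATE TABLE demo_skip (
  id INT NOT NULL AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_a_b_c (a, b, c)
) ENGINE = InnoDB;

-- Skip scan is controlled by an optimizer switch (on by default).
SET optimizer_switch = 'skip_scan=on';

-- Refresh statistics so the optimizer knows how many distinct a values exist.
ANALYZE TABLE demo_skip;

-- One of the restrictions: the query may reference only columns that are
-- in the index, so the columns are listed instead of using SELECT *.
-- When skip scan is chosen, Extra shows "Using index for skip scan".
EXPLAIN SELECT a, b, c FROM demo_skip WHERE b > 40;
```

Conceptually the plan behaves like running the b condition once for each distinct value of a and combining the results.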
Advice on using indexes
The following is quoted from https://juejin.cn/post/7149074488649318431
- ① Columns that are frequently used as query conditions should be considered for indexing as appropriate.
- ② The primary and foreign keys of a table, or the columns used to join tables, should be indexed, because this greatly improves join performance.
- ③ An indexed column should generally have highly distinct values, which improves the retrieval efficiency of the index.
- ④ The values of an indexed column should not be too long. If a longer column needs to be indexed, consider a prefix index.
- ⑤ When building a joint index, follow the leftmost prefix principle and order the columns by priority.
- ⑥ Columns that are often used in range conditions, sorting, and grouping should be indexed, because an index is ordered and can speed up sorting.
- ⑦ For a unique index, if it is certain that the column will never be used for sorting, the structure can be changed to a Hash structure.
- ⑧ Prefer a joint index over several single-column indexes. A joint index is more efficient than querying through multiple single-column indexes.
Besides the principles above, there are also some points to watch out for when building indexes:
❶ Columns whose values are frequently inserted, deleted, or modified are not suitable for indexing, because the index structure has to be maintained after every change.
❷ A column with a large number of duplicate values is not suitable for indexing, such as the gender column in the previous example. (Low selectivity with skewed data can still benefit, however. For example, a status column with three values is repetitive in principle, but if one status is very rare and another is very common, an index can still help for the rare value, since the cost-based optimizer (CBO) chooses automatically.)
❸ An indexed column cannot take part in calculations, so columns that are often queried through functions are not suitable for indexing.
❹ More indexes on a table is not always better. Generally keep the number around 3, and do not exceed 5 at most.
❺ When building a joint index, priority must be considered, and the column with the highest query frequency should be placed first.
❻ When a table holds little data, an index should not be created, because with a small amount of data the cost of maintaining the index outweighs the benefit.
❼ When the values of a column are unordered, indexing it is not recommended, because it causes page splits. This applies especially to the primary key index.
Although the first eight points do not have to be followed, the last seven points must be avoided.