10,000 words + 30 pictures to explain the concepts and principles of MySQL indexes in an all-round way

By the way, this article mainly explains the InnoDB storage engine.

  1. index classification

Indexes can be classified along several different dimensions

1. Divided by the data structure used
  • B+ tree index

  • Hash index

2. Divided by the physical storage structure
  • clustered index

  • Nonclustered Index (Secondary Index)

Clustered indexes and non-clustered indexes will be emphasized later.

3. Divided by index characteristics
  • primary key index

  • unique index

  • normal index

  • full text index

4. Divide by the number of fields
  • single column index

  • joint index

  2. index data structure

2.1 preparation

To make the rest of the article easier to follow, I have prepared a user table here, and all the examples in the article are based on this table

CREATE TABLE `user` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `age` int(10) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

2.2 Hash index

Hash indexes are not very practical, mainly because InnoDB, the most commonly used storage engine, does not support explicitly created Hash indexes; it only supports adaptive Hash indexes.

Although you can explicitly declare a Hash index in InnoDB with a SQL statement, it does not actually take effect

If you create a Hash index on the name field and then run show index from <table name>, you will find that it is actually a B+ tree

In the storage engine, the Memory engine supports Hash index

A Hash index is a bit like the underlying data structure of HashMap in Java. It also has many slots that store key-value pairs: the key is the value of the index column, and the value is the row pointer of the data. The data can then be found through the row pointer

Assume that the user table now uses the Memory storage engine, a Hash index is created on the name field, and three rows are inserted into the table

The Hash index will perform Hash calculation on the value of the index column name, and then find the corresponding slot, as shown in the figure below

When the Hash value of the name field is the same, that is, a Hash conflict, a linked list will be formed. For example, if there are two data of name = Zhang San, a linked list will be formed.

After that, if you want to check the data of name = Li Si, you only need to perform Hash calculation on Li Si, find the corresponding slot, traverse the linked list, take out the row pointer corresponding to name = Li Si, and then search for the corresponding data according to the row pointer.
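
The lookup described above can be sketched in a few lines of Python. This is only a toy model, not how the Memory engine is actually implemented: the class name `ToyHashIndex`, the slot count, and the integer row pointers are all assumptions made for illustration.

```python
# Toy hash index: a fixed number of slots, each holding a chain of
# (key, row_pointer) pairs. Purely illustrative; names and sizes are made up.

class ToyHashIndex:
    def __init__(self, slot_count=8):
        self.slots = [[] for _ in range(slot_count)]

    def _slot(self, key):
        # Hash the indexed column value to pick a slot.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, row_pointer):
        # Colliding or duplicate keys are appended to the same chain.
        self._slot(key).append((key, row_pointer))

    def lookup(self, key):
        # Walk the chain and collect the row pointers whose key matches.
        return [ptr for k, ptr in self._slot(key) if k == key]


index = ToyHashIndex()
index.insert("Zhang San", 0)   # row pointers are hypothetical integers
index.insert("Li Si", 1)
index.insert("Zhang San", 2)

print(index.lookup("Li Si"))      # row pointers for name = Li Si
print(index.lookup("Zhang San"))  # both rows with name = Zhang San
```

Because the slot is chosen by hash value, neighbouring names end up in unrelated slots, which is why this structure helps with equality lookups but not with ranges or sorting.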

Hash index advantages and disadvantages

  • Hash indexes can only be used for equality comparisons, but for those the query efficiency is very high

  • Range queries are not supported, and sorting is not supported because the distribution of indexed columns is unordered

2.3 B+ tree

B + tree is the most used data structure in mysql index, which will not be introduced here, but will be introduced in the next section.

In addition to Hash and B + tree, there are other indexes such as full-text index, which will not be discussed here

  3. clustered index

3.1 Data page data storage

We know that the data we insert into a table is eventually persisted to disk. To make this data easier to manage, InnoDB introduces the concept of pages. It divides the data into multiple pages, and the default size of each page is 16KB. We can call such a page a data page.

When we insert a piece of data, the data will be stored in the data page, as shown in the figure below

As data is continuously inserted into the data page, the rows are sorted by primary key (if the table has no primary key, one is generated automatically) and form a singly linked list

In addition to storing the data we insert, the data page also reserves some space for additional information. There are many kinds of additional information; I will introduce each one as we come across it

3.2 Data search for a single data page

Since the data will be stored in the data page, how to check the data from the data page?

Suppose now you need to locate the data of this record with id=2 in the data page, how to quickly locate it?

There is a stupid way to traverse the linked list from the beginning, judge whether the id is equal to 2, and if it is equal to 2, just take out the data.

Although this method is feasible, if a data page stores a lot of data, dozens or hundreds of rows, traversing them like this every time is far too much trouble

So mysql thought of a good way, which is to group these data

Assuming that 12 pieces of data are stored in the data page, the entire grouping is roughly as shown in the figure below

For convenience, I only marked the id value here and omitted the values of the other fields

Here I assume that every 4 rows form a group, so there are 3 groups in the picture. After grouping, MySQL takes the largest id value in each group, which is 4, 8, and 12 in the picture, and saves them in a dedicated place in the data page. This is one of the pieces of additional information mentioned above, and it is called the page directory

Suppose you now query the data with id=6. You only need a binary search on the page directory to find that 6 lies between 4 and 8. Since 4 and 8 are the largest ids of their groups, id=6 must be in the group whose largest id is 8. Then go to that group, traverse its rows, and check whether each id is equal to 6.

Since mysql stipulates that the number of data items in each group is about 4~8, it must be much faster than traversing the data of the entire data page
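
As a rough illustration of the page directory idea, here is a minimal Python sketch. It assumes the simplified layout from the figure (12 ids, groups of exactly 4); the names `page_directory` and `find_in_page` are made up for this example and are not InnoDB terms.

```python
from bisect import bisect_left

# Records in one data page, kept in primary-key order (ids only).
records = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
GROUP_SIZE = 4  # simplified: the article's figure uses groups of 4

# Split into groups; the page directory keeps the largest id of each group.
groups = [records[i:i + GROUP_SIZE] for i in range(0, len(records), GROUP_SIZE)]
page_directory = [group[-1] for group in groups]


def find_in_page(target_id):
    # Binary search the directory for the first group whose largest id >= target.
    slot = bisect_left(page_directory, target_id)
    if slot == len(page_directory):
        return None  # larger than every id in the page
    # Then walk only that one group instead of the whole page.
    for record_id in groups[slot]:
        if record_id == target_id:
            return record_id
    return None
```

For id=6 the binary search lands on the second directory entry (8), and only the four records of that group are traversed.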

In fact, I have simplified the grouping described above a little, but this does not get in the way of understanding

3.3 Data lookup in multiple data pages

When we continuously insert data into the table, the space occupied by the data will continue to increase, but the size of a data page is fixed. When a data page cannot store data, a data page will be recreated to store data

To distinguish the pages, MySQL assigns a page number to each data page, which is stored in the additional-information area. The additional information also stores the positions of the previous and next data pages, so the data pages form a doubly linked list

The page number of data page 2 is 2, and the page number of data page 3 is 3. Here, for the convenience of understanding, I directly write the number of data pages.

MySQL also stipulates that the largest id stored in the previous data page is smaller than the smallest id stored in the next data page, so the data is sorted by id across all data pages.

Now, if there are multiple data pages, what should we do when we need to find the data with id=5?

Of course, the above stupid method can still be used, that is, to traverse from the first data page, and then traverse the data in each data page, and finally find the data with id=5.

But if you think about it carefully, this stupid method is equivalent to a full table scan, which definitely won't work.

So how to optimize it?

MySQL's optimization idea here is similar to the earlier one for searching within a single data page

It takes the smallest id in each data page and puts it into a separate data page. This data page does not store the data we actually insert; it only stores each smallest id and the page number of the data page where that id is located, as shown in the figure

To make the picture fuller, I added data page 4, which also stores data

Data page 5 is the newly extracted page. It stores the smallest id and the corresponding page number of each of the three data pages below it

It is very convenient to find the data with id=5 at this time, roughly divided into the following steps:

  • From data page 5 directly based on binary search, found between 4-7

  • Since 4 and 7 are the smallest ids of the data pages, the data with id=5 must be on the data page with id=4 (because the smallest id of the data page with id=7 is 7),

  • Next, go to the page number of data page 2 corresponding to id=4 to find data page 2

  • Then search for data according to the process of searching from a single data page based on the primary key id of the data mentioned above

In this way, it is possible to find data between multiple data pages according to the primary key id
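
The two-step search (directory page first, then the data page) can be sketched as follows. The page numbers and the ids stored on each page are hypothetical values chosen for this sketch, and `find_page` and `find_row` are made-up helper names.

```python
from bisect import bisect_right

# Hypothetical data pages: page number -> ids stored on that page (sorted).
data_pages = {2: [1, 2, 3], 3: [4, 5, 6], 4: [7, 8, 9]}

# Directory page: (smallest id on the page, page number), sorted by id.
directory = sorted((ids[0], page_no) for page_no, ids in data_pages.items())
min_ids = [min_id for min_id, _ in directory]


def find_page(target_id):
    # The target lives on the last page whose smallest id is <= target.
    pos = bisect_right(min_ids, target_id) - 1
    if pos < 0:
        return None  # smaller than every id in the table
    return directory[pos][1]


def find_row(target_id):
    # Step 1: pick the data page; step 2: search inside that page.
    page_no = find_page(target_id)
    if page_no is None or target_id not in data_pages[page_no]:
        return None
    return page_no, target_id
```

With these made-up pages, `find_row(5)` first picks the page whose smallest id is 4 and then finds id=5 inside it, without touching the other data pages.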

3.4 Clustered Index

As the amount of data keeps increasing, the number of data pages keeps increasing, and data page 5 holds more and more entries. But each data page is 16KB by default, so data page 5 will also be split into multiple data pages, as shown below

Data page 10 acts the same as data page 5

At this time, if you want to search for the data with id=5, should you go to data page 5 for binary search or go to data page 10 for binary search?

The stupid way is to traverse, but it is really unnecessary, mysql will extract the id of the smallest data stored in data page 5 and data page 10 and the corresponding data page number, and put them separately into a data page, as shown in the figure below

Data page 11 is the newly extracted data page. It stores id=1 with the page number of data page 5, and id=10 with the page number of data page 10

And this is the B+ tree.

Generally speaking, the B + tree of mysql database can hold tens of millions of data in three layers.
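
The "tens of millions in three layers" figure can be checked with a back-of-the-envelope calculation. The sizes below are assumptions, not values from this article: an 8-byte BIGINT key, a 6-byte child-page pointer, roughly 1KB per row, and no page header overhead.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size, from the text
KEY_BYTES = 8           # assumption: BIGINT primary key
POINTER_BYTES = 6       # assumption: size of a child-page pointer
ROW_BYTES = 1024        # assumption: roughly 1 KB per row

# How many (key, pointer) entries fit in one non-leaf page,
# and how many rows fit in one leaf page.
entries_per_internal_page = PAGE_SIZE // (KEY_BYTES + POINTER_BYTES)
rows_per_leaf_page = PAGE_SIZE // ROW_BYTES

# Three levels: root page -> internal pages -> leaf pages.
rows_in_three_levels = entries_per_internal_page ** 2 * rows_per_leaf_page
print(entries_per_internal_page, rows_per_leaf_page, rows_in_three_levels)
```

Under these assumptions a three-level tree holds a little over 20 million rows; smaller rows or smaller keys push the number higher.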

At this time, the search for data with id=5 is roughly divided into the following steps:

  • In data page 11, use binary search to locate the entry that covers id=5, which corresponds to data page 5

  • Then go to data page 5 and locate data page 3 according to id=5 binary search

  • Then go to data page 3 to find data according to id=5. The specific logic has been mentioned many times before.

In this way, the data can be found successfully.

A B+ tree whose leaf nodes store the actually inserted data, and whose non-leaf nodes store record ids and the corresponding data page numbers, is called a clustered index.

So for the InnoDB storage engine, the data itself is stored in a B + tree.

  4. secondary index

A secondary index is also called a non-clustered index, which itself is a B + tree. A secondary index corresponds to a B + tree, but the data stored in the secondary index B + tree is different from that of the clustered index.

As mentioned earlier in the clustered index, leaf nodes store the data we insert into the database, and non-leaf nodes store the primary key id of the data and the corresponding data page number.

The leaf nodes of the secondary index store the data of the index column and the corresponding primary key id, and the non-leaf nodes also store the page number of the data page in addition to the data and id of the index column.

The data pages mentioned above are actually also index pages. Because their leaf nodes store the actual table data, I called them data pages. From here on, since we are talking about indexes proper, I will call the pages of a secondary index index pages. Just keep in mind that they are the same kind of page; only the stored data differs.

4.1 Single column index

Suppose we now add a common non-unique index to the name field, then name is the index column, and the name index is also a single-column index

At this time, if three pieces of data are inserted into the table, the data stored in the leaf node of the name index is as shown in the figure below

MySQL sorts by the value of the name field. Here I assume that Zhang San is ranked before Li Si. When the values of the index column are the same, entries are sorted by id, so the index is always kept in order by the value of the index column.

Some readers may wonder: can the Chinese text stored in the name field be sorted?

The answer is yes. MySQL supports many collations, and we can specify one when creating a database or table. The string ordering used in the rest of this article is chosen arbitrarily by me, so the actual ordering may differ.

The data search for a single index column is the same as the clustered index mentioned above, and the data is also grouped, and then the data can be searched in a single index column according to the binary search.

When the data keeps growing and one index page can no longer hold it, multiple index pages are used, and the index pages form a doubly linked list

When the number of index pages continues to increase, in order to facilitate the search for data in different index pages, an index page will also be extracted. In addition to storing the id in the page, it will also store the value of the index column corresponding to the id

When the data becomes more and more, it will be extracted and a three-layer B + tree will be formed, so I won't draw it here.

4.2 Joint Index

In addition to the single-column index, the joint index is actually the same, except that the data stored in the index page has more index columns.

For example, to create a joint index on name and age, a single index page is shown in the figure

First sort by name, then sort by age if the name is the same, if there are other columns, and so on, and finally sort by id.

Compared with the index with only name field, the index page stores one more index column.

The final B + tree is simplified as shown in the figure below

4.3 Summary

In fact, it can be seen from the above analysis that the main differences between the clustered index and the non-clustered index are as follows

  • The leaf nodes of the clustered index store the values of all columns, and the leaf nodes of the non-clustered index only store the values of the index columns and the primary key id

  • The data of the clustered index is sorted by id, and the data of the non-clustered index is sorted by the index column

  • The non-leaf nodes of the clustered index store the primary key id and page number, and the non-leaf nodes of the non-clustered index store the index column, primary key id, and page number

Since the latter index tree will be used frequently, for your convenience, I inserted the corresponding data in the table based on the data of the above index tree, and the sql is at the end of the article

In reality, the index B+ tree may not be ordered exactly as in my pictures, but this does not get in the way of understanding.

  5. return to the table

After talking about the secondary index, let's talk about how to use the secondary index to find data.

Assume that an index has been created on the name field and that the table holds the rows from the example above. Here is the figure again

So how should the following sql be executed?

select * from `user` where name = '赵六';

Since the query condition is name = '赵六', the name index will be used

The whole process is roughly divided into the following steps:

  • Start the binary search from the top index page, which is the index page 113 in our figure. If there is another layer above the index page 113, start the binary search from the upper layer

  • On index page 113, find that 赵六 (Zhao Liu) lies between 王五 (Wang Wu) and 刘七 (Liu Qi), then go to index page 111, which corresponds to 王五, to search for 赵六

  • On index page 111, find the first 赵六 record, that is, the one with id=4

  • Because it is select *, the other fields are needed as well. So the other fields are looked up in the clustered index by id=4, using the search process described several times above. This lookup in the clustered index by id=4 is called returning to the table

  • Since the index is non-unique, the value 赵六 may be repeated, so the search continues along the linked list on index page 111. If the next name is still 赵六, it returns to the table again with that id, and so on, until the name is no longer 赵六. In the figure, there are actually two such records

From the search process above we can see what returning to the table means: first look up the primary key id in the secondary index according to the field value in the query condition, and then use that id to look up the values of the other fields in the clustered index.
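
To make the two lookups concrete, here is a small Python sketch of "secondary index first, then the clustered index". The sample rows (including the ages and cities) are hypothetical, names are written in pinyin, and Python's string ordering stands in for a MySQL collation. The function name `select_all_by_name` is made up for this example.

```python
from bisect import bisect_left

# Clustered index: primary key id -> full row (hypothetical sample rows).
clustered = {
    4: {"id": 4, "name": "Zhao Liu", "age": 22, "city": "Hangzhou"},
    6: {"id": 6, "name": "Zhao Liu", "age": 30, "city": "Shanghai"},
    7: {"id": 7, "name": "Liu Qi", "age": 18, "city": "Beijing"},
}

# Secondary index on name: (name, id) pairs sorted by name, then by id.
name_index = sorted((row["name"], row["id"]) for row in clustered.values())


def select_all_by_name(name):
    rows = []
    # Binary search for the first index entry with this name ...
    pos = bisect_left(name_index, (name, 0))
    # ... then walk forward while the name still matches (non-unique index).
    while pos < len(name_index) and name_index[pos][0] == name:
        primary_key = name_index[pos][1]
        rows.append(clustered[primary_key])  # the "return to the table" step
        pos += 1
    return rows
```

Each matching index entry costs one extra lookup in `clustered`, which is the cost the rest of the article keeps trying to reduce.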

  6. covering index

The previous section said that when select * from user where name = '赵六'; is executed, the primary key ids matching name = '赵六' are first found in the index page, and then the values of the other fields are fetched from the clustered index by returning to the table.

So what happens when the following sql is executed?

select id from `user` where name = '赵六';

This time the selected fields change from select * to select id, and the query condition remains unchanged, so the name index is used again

The process starts the same way as before. But after finding the primary key ids for name = '赵六' in the index page, it turns out that the id field requested by the SQL has already been found. Since the id is already in hand, there is no need to return to the table.

The situation in which all the queried fields are contained in the index is called a covering index: the index columns cover the queried fields.

With a covering index, returning to the table is reduced or avoided entirely, so the query is faster and performs better.

Therefore, in daily development, try not to select *, and check what you need. If there is a covering index, the query will be much faster.

  7. index pushdown

Assuming that a joint index of name and age is established for the table now, for the convenience of understanding, I will take the previous figure again

Next, execute the following sql

select * from `user` where name > '王五' and age > 22;

Before MySQL 5.6 (not including 5.6), the execution steps of this SQL are roughly as follows:

  • First, use binary search to locate the first record with name > '王五', that is, Zhao Liu with id=4

  • After that, it will return to the table according to id=4, search the data in other fields with id=4 in the clustered index, and then judge whether the age in the data is greater than 22, if it is, it means that it is the data we need to find, otherwise it is not

  • Then follow the linked list, continue to traverse, and then return to the table once when a record is found, and then judge the age, and so on until the end

So, as the figure shows, the whole search performs 5 back-to-table operations: two for Zhao Liu, two for Liu Qi, and one for Wang Jiu. In the end only Zhao Liu with id=6 meets the conditions; the ages of the other rows do not match.

There is nothing wrong with this execution, but you may have noticed that there is no need to return to the table so many times: the index diagram above already shows that the only row matching name > '王五' and age > 22 is Zhao Liu with id=6

So starting with MySQL 5.6, the way the age > 22 condition is checked was optimized

As before, find Zhao Liu with id=4, but do not return to the table to check the age. Because the index entry already contains the age value, check directly against the age in the index whether it is greater than 22. Only if it is greater, return to the table to fetch the remaining fields (because it is select *). Then continue along the linked list in the same way until the end

After this optimization, the number of back-to-table operations becomes 1, far fewer than the previous 5.

This optimization is called index pushdown, and its purpose is to reduce the number of times the table has to be revisited.

The name index pushdown has to do with where the age > 22 check is executed, which I won't go into here.
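
The difference between the two strategies can be simulated by counting back-to-table operations. This is only a sketch: the name ordering follows the arbitrary collation assumed in this article, most ages are made up so that the counts match the text (only id=6 passes age > 22), and the function `query` is a hypothetical helper, not a MySQL API.

```python
from bisect import bisect_right

# Name order assumed by the article's figure (an arbitrary collation).
ORDER = ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu", "Liu Qi", "Wang Jiu"]
RANK = {name: i for i, name in enumerate(ORDER)}

# Hypothetical rows as (id, name, age); most ages are invented for the sketch.
rows = [(1, "Li Si", 20), (2, "Zhang San", 18), (3, "Zhang San", 23),
        (4, "Zhao Liu", 22), (5, "Wang Wu", 25), (6, "Zhao Liu", 28),
        (7, "Liu Qi", 16), (8, "Liu Qi", 19), (9, "Wang Jiu", 9)]
clustered = {row[0]: row for row in rows}

# Joint index (name, age) leaf entries: sorted by name rank, age, then id.
joint_index = sorted((RANK[name], age, row_id) for row_id, name, age in rows)


def query(min_name, min_age, pushdown):
    """name > min_name AND age > min_age; returns (matching ids, table lookups)."""
    # Binary search for the first entry whose name sorts after min_name.
    start = bisect_right(joint_index, (RANK[min_name], float("inf"), float("inf")))
    result, lookups = [], 0
    for _, age, row_id in joint_index[start:]:
        if pushdown and age <= min_age:
            continue  # filtered using the age stored in the index entry
        lookups += 1  # return to the table
        if clustered[row_id][2] > min_age:
            result.append(row_id)
    return result, lookups


print(query("Wang Wu", 22, pushdown=False))
print(query("Wang Wu", 22, pushdown=True))
```

Both calls return the same row, but without pushdown every entry after Wang Wu triggers a table lookup, while with pushdown only the entry that already passes the age check does.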

  8. index merge

Index merge is an index optimization mechanism introduced in MySQL 5.1. In earlier MySQL versions, multiple query conditions in one SQL statement could only use one index. With index merge, multiple indexes are scanned and the scan results are merged

The results are merged in one of the following three ways:

  • Take the intersection (intersect)

  • Take the union (union)

  • Union after sorting (sort-union)

To keep the earlier indexes from interfering with the demonstration, drop all previous indexes, and then create secondary indexes idx_name and idx_age on name and age respectively

8.1 Intersection (intersect)

When the following sql is executed, the intersection will appear

select * from `user` where name = '赵六' and age= 22;

View execution plan

The type is index_merge, both possible_keys and key are idx_name and idx_age, which indicates that index merge is used, and Extra shows Using intersect(idx_age,idx_name), where intersect means intersection.

The whole process is roughly like this: fetch the matching primary key ids from idx_name and idx_age respectively, then take the intersection of the two sets of ids. The ids in the intersection must satisfy name = '赵六' and age = 22 at the same time (think about it carefully). Finally, return to the table with the ids in the intersection

However, to use the intersection form of index merge, the primary key ids returned by each index must be sorted, so that the intersection can be taken quickly

For example, the following sql cannot use intersection index merge

select * from `user` where name = '赵六' and age > 22;

Only the name index can be used, because the ids found by age > 22 are not in order. I mentioned how index entries are sorted when introducing indexes.

So the conditions for using index merge are fairly strict.
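
Why sorted ids matter can be seen from a simple two-pointer intersection, which needs only one pass when both lists are in ascending order. This is a sketch of the general technique, not MySQL's actual code; the function name and the sample id lists are hypothetical.

```python
def intersect_sorted(ids_a, ids_b):
    """Intersect two ascending id lists in one pass (two-pointer merge)."""
    i = j = 0
    common = []
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            common.append(ids_a[i])
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            i += 1  # ids_a is behind, advance it
        else:
            j += 1  # ids_b is behind, advance it
    return common


# Hypothetical ids returned by each index for an equality condition.
ids_from_idx_name = [4, 6]      # name = 'Zhao Liu'
ids_from_idx_age = [4, 5, 11]   # age = 22
print(intersect_sorted(ids_from_idx_name, ids_from_idx_age))
```

For an equality condition, all matching entries share the same index column value, so they are stored in id order. For a range such as age > 22, ids are ordered only within each age value, so the combined list is not sorted and this single pass no longer works.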

8.2 Union

Taking the union means replacing and in the previous example with or

select * from `user` where name = '赵六' or age = 22;

The execution is similar to before: search the respective indexes according to the conditions, take the union of the ids found and remove duplicates, and then return to the table

Similarly, taking the union also requires that the primary key ids returned by each index are sorted. If the condition is changed to age > 22, the union can no longer be taken directly

select * from `user` where name = '赵六' or age > 22;

8.3 Union after sorting (sort-union)

Although taking the union requires the primary key ids returned by each index to be sorted, MySQL handles the unsorted case automatically: it sorts the primary key ids first and then takes the union. This is called sort-union.
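
A minimal sketch of the idea, assuming the ids have already been read from each index: sort first, then merge and drop duplicates. The helper names and the sample id lists are hypothetical, and the real implementation inside MySQL is more involved.

```python
from heapq import merge


def union_sorted(ids_a, ids_b):
    """Merge two ascending id lists and drop duplicates."""
    merged = []
    for row_id in merge(ids_a, ids_b):
        if not merged or merged[-1] != row_id:
            merged.append(row_id)
    return merged


def sort_union(ids_a, ids_b):
    # sort-union: sort each id list first, then take the union as usual.
    return union_sorted(sorted(ids_a), sorted(ids_b))


ids_from_idx_name = [4, 6]     # equality condition: already in id order
ids_from_idx_age = [3, 6, 5]   # range condition: ordered by age, not by id
print(sort_union(ids_from_idx_name, ids_from_idx_age))
```

The extra sort is the price paid for being able to use both indexes when one of the conditions is a range.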

For example, the sql mentioned above that cannot directly take the union is in line with the situation of taking the union after sorting (sort-union)

select * from `user` where name = '赵六' or age > 22;

  9. How does mysql choose indexes

In daily production, a table may have multiple indexes, so how does mysql determine which index to use when executing SQL, or scan the entire table?

When mysql selects the index, it will judge according to the cost of using the index

The cost of a sql execution is roughly divided into two parts

  • IO cost: these pages all live on disk, so before anything can be checked they must be loaded into memory. MySQL stipulates that the cost of loading one page is 1.0

  • CPU cost: besides the IO cost there is the cost of evaluating conditions. For example, in the earlier example you have to check whether each loaded row satisfies name = '赵六'. MySQL stipulates that the cost of checking one row is 0.2

9.1 Full table scan cost calculation

For a full table scan, the cost calculation is roughly as follows

MySQL keeps statistics about the table. These statistics are approximate, not exact. You can view them with show table status like '<table name>'

For example, they include the number of rows in the table (rows) and the number of bytes occupied by the clustered index (data_length). Since a page is 16KB by default, you can estimate the number of data pages as data_length/1024/16.

So the cost of full table scan is calculated like this

rows * 0.2 + data_length/1024/16 * 1.0
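
The formula above translates directly into a small function. The two constants come from the text; the function name and the sample statistics (9 rows in a single 16KB page) are assumptions made for this sketch.

```python
IO_COST_PER_PAGE = 1.0   # cost of loading one page, as stated in the text
CPU_COST_PER_ROW = 0.2   # cost of checking one row, as stated in the text
PAGE_BYTES = 16 * 1024   # default page size


def full_scan_cost(rows, data_length):
    # Approximate page count from the clustered index size in bytes.
    pages = data_length / PAGE_BYTES
    return rows * CPU_COST_PER_ROW + pages * IO_COST_PER_PAGE


# Hypothetical statistics: 9 rows stored in one 16 KB page.
print(full_scan_cost(rows=9, data_length=16384))
```

Since rows and data_length are only estimates, the result is an estimate too, which is worth remembering when the optimizer's choice looks surprising.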

9.2 Secondary index + return table cost calculation

The cost calculation of secondary index + table return is more complicated. The cost depends on two parts: the number of scanning intervals and the number of times the table is returned to

In order to facilitate the description of the scanning interval, here I will take the above picture again

select * from `user` where name = '赵六';

Look at the picture!

The query condition name = '赵六'will generate a scanning interval, from Zhao Liu with id=4 to Zhao Liu with id=6

For another example, if the query condition is name > '赵六', then a scanning interval will be generated from Liu Qi with id=7 to the end of the data (Wang Jiu with id=9)

For another example, if the query condition is name < '李四' or name > '赵六', two scanning intervals are generated: one from Zhang San with id=2 to Zhang San with id=3, and the other from Liu Qi with id=7 to the end of the data

So the scan interval means the record interval that meets the query conditions

When calculating the cost of the secondary index, mysql stipulates that the cost of reading a range is the same as the IO cost of reading a page, both of which are 1.0

Once the intervals are known, MySQL estimates from the statistics how many rows they contain. These rows have to be read, and the cost of reading them is roughly number of rows * 0.2

So the cost of going through the secondary index is number of intervals * 1.0 + number of rows * 0.2

Afterwards, these data need to be returned to the table (if necessary), and mysql stipulates that the IO cost of each return to the table is the same as that of reading a page, which is also 1.0

When returning to the table, it is necessary to judge the remaining query conditions for the data retrieved from the clustered index, which is the CPU cost, which is roughly the number of entries * 0.2

So the cost of returning to the table is roughly number of rows * 1.0 + number of rows * 0.2

So the approximate cost of secondary index + table return is number of intervals * 1.0 + number of rows * 0.2 + number of rows * 1.0 + number of rows * 0.2

Once the cost of each index and the cost of a full table scan have been calculated, MySQL chooses the option with the lowest cost
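
Putting the pieces together, the comparison can be sketched like this. The constants are the ones given in the text; the function names, the plan names, and the sample statistics (100,000 rows in 500 pages, one interval matching 2 rows) are hypothetical.

```python
RANGE_COST = 1.0         # per scan interval, as stated in the text
IO_COST_PER_PAGE = 1.0   # per back-to-table lookup, as stated in the text
CPU_COST_PER_ROW = 0.2   # per row checked, as stated in the text


def index_with_lookup_cost(intervals, rows):
    # Cost of reading the secondary index itself.
    index_part = intervals * RANGE_COST + rows * CPU_COST_PER_ROW
    # Cost of returning to the table for every row found.
    lookup_part = rows * IO_COST_PER_PAGE + rows * CPU_COST_PER_ROW
    return index_part + lookup_part


def choose_plan(costs):
    # costs: plan name -> estimated cost; pick the cheapest plan.
    return min(costs, key=costs.get)


costs = {
    # Hypothetical table statistics: 100,000 rows in 500 pages.
    "full_scan": 100_000 * CPU_COST_PER_ROW + 500 * IO_COST_PER_PAGE,
    # Hypothetical estimate: one interval containing 2 rows.
    "idx_name": index_with_lookup_cost(intervals=1, rows=2),
}
print(choose_plan(costs))
```

When an index matches a large share of the table, the per-row lookup cost adds up quickly and the full scan can come out cheaper, which is one reason an index sometimes goes unused.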

MySQL applies some fine-tuning to the cost calculated above, but the adjustment is very small, so I omit it here. This is only a general introduction to the cost rules; the real calculation is more complicated, for example for multi-table queries. Interested readers can consult the relevant material

9.3 Summary

In general, this section is mainly to let you understand one thing, when mysql selects the index, it will calculate the cost of using each index according to the statistics and cost calculation rules, and then choose the index with the lowest cost to execute query

  10. index failure

In daily development, you must have more or less encountered the problem of index failure. Here I summarize several common index failure scenarios.

For the convenience of explanation, here I will take the picture again

10.1 Not matching the leftmost prefix principle

That is, the leftmost column comes first

When a query does not follow the leftmost prefix principle, the index fails

  • For example, the index fails if a like pattern starts with %, or if the query on a joint index does not use the first index column.
  • A joint index on name, age behaves as if two indexes were created, namely (name) and (name, age). Therefore a query on age alone cannot use the index. MySQL keeps matching columns to the right until it encounters a range condition (>, <, between, like), where it stops matching

For example, with the joint index on name and age, when select * from user where age > 22; is executed, using the index would require scanning the entire index. The index is sorted by name first and by age only within the same name, so over the whole index age is unordered. The figure shows this too: the ages 18, 23 ... 9 are out of order, so binary search cannot determine on which index page the entries with age > 22 start.

Therefore, to use the index, MySQL would have to scan the entire index, check the entries one by one, and then return to the table, which costs a lot of performance. It is cheaper to scan the clustered index directly, that is, to do a full table scan.
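
The ordering argument can be checked on a tiny example. The entries below are hypothetical, names are in pinyin, and Python's string ordering stands in for a MySQL collation.

```python
from bisect import bisect_left

# Joint index (name, age) entries in index order; the values are hypothetical.
joint_index = [("Li Si", 20), ("Liu Qi", 16), ("Liu Qi", 19),
               ("Wang Wu", 25), ("Zhao Liu", 22), ("Zhao Liu", 28)]

names = [name for name, _ in joint_index]
ages = [age for _, age in joint_index]

# The leading column is sorted, so binary search on name works.
names_sorted = names == sorted(names)
first_zhao = bisect_left(names, "Zhao Liu")

# age on its own is not sorted across the whole index, so a condition on
# age alone cannot be located by binary search and must check every entry.
ages_sorted = ages == sorted(ages)
matches_by_scan = [entry for entry in joint_index if entry[1] > 22]

print(names_sorted, ages_sorted, first_zhao, matches_by_scan)
```

Within a single name the ages are in order, which is exactly why the second column only helps once the first column has been pinned down.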

10.2 Indexed columns are calculated

For example +1, abs(), floor(), etc.

Performing expression calculations or applying functions to an indexed column also causes the index to fail

This is mainly because the index stores the original values of the indexed field, as the earlier figures show. Once the value is transformed by a function, the index can no longer be used to look it up
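
A short sketch of why a function breaks the lookup: the index is ordered by the stored values, not by the computed ones. The column values here are hypothetical.

```python
from bisect import bisect_left

# Index entries store the original column values, kept in sorted order.
indexed_values = [-5, -2, 1, 3, 4]   # hypothetical column values

# Searching for a raw value can use binary search on the sorted index.
position_of_3 = bisect_left(indexed_values, 3)

# After applying a function, the stored order no longer matches the order
# of the computed values, so binary search on abs(value) is not valid.
computed = [abs(v) for v in indexed_values]
computed_is_sorted = computed == sorted(computed)

# The only safe way to evaluate abs(value) = 2 is to check every entry.
matches = [v for v in indexed_values if abs(v) == 2]

print(position_of_3, computed_is_sorted, matches)
```

Rewriting the condition so that the bare column stays on one side of the comparison keeps the stored order usable.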

10.3 Implicit conversions

When the index column has an implicit conversion, the index may become invalid

For example, mysql stipulates that when a string is compared with a number, it will first convert the string into a number and then compare it. As for how to convert a string into a number, mysql has its own rules

For example, an implicit conversion occurs when I execute the following sql

select * from `user` where name = 9527;

The name field is of type varchar, while 9527 without quotation marks is a number. MySQL first converts the value of the name field to a number according to its rules and then compares it with 9527. Because the field value has been converted, the name index fails

In the execution plan, a type of ALL means that no index is used, so the index has failed.

But suppose you create an index on age now and execute the following sql

select * from `user` where age = '22';

At this time, the age index will not be invalid, mainly because of the sentence mentioned earlier:

When a string is compared with a number, the string will be converted to a number before comparison

Therefore, '22' is implicitly converted to a number and then compared with age. The age field itself is not converted, so the index does not fail.

So, implicit conversions may invalidate the index.

10.4 mysql statistical data error is large

Large errors in MySQL's statistics can also cause index failure. As mentioned earlier, MySQL calculates the cost of using an index from statistics, so when the statistics are far off, the calculated costs are far off too. The actual cost of using the index may be small while the calculated cost is large, so the index is not chosen

When this happens, you can run analyze table <table name>; MySQL will recollect the statistics, and the index can be chosen again

  11. Indexing Principles

11.1 The number of single table indexes should not be too much

  • From the above analysis, we know that each index corresponds to a B + tree, and the leaf nodes store the full amount of data in the index column. Once the number of indexes is large, it will occupy a large amount of disk space

  • At the same time, as mentioned earlier, the index cost will be calculated before the query. Once there are many indexes, the number of calculations will be large, which may also waste performance.

11.2 Fields that often appear after where should be indexed

Needless to say, indexes exist to speed up queries. If there is no suitable index, a full table scan is performed. For InnoDB, a full table scan starts from the first leaf node of the clustered index and follows the linked list, checking record by record whether the data meets the query conditions.
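To see the difference, the same filter can be examined before and after an index is added. This is only a sketch: idx_city is a hypothetical index name, and on a table as small as the sample one the optimizer may still prefer a full scan.

```sql
-- Without an index on city, this filter has to scan the clustered index.
EXPLAIN SELECT * FROM `user` WHERE city = '杭州';

-- idx_city is a hypothetical name chosen for this example.
CREATE INDEX idx_city ON `user` (`city`);

-- Run the same EXPLAIN again and compare the type and key columns.
EXPLAIN SELECT * FROM `user` WHERE city = '杭州';
```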

11.3 Fields after order by and group by can be indexed

For example, the following sql

select * from `user` where name = '赵六' order by age asc;

This statement queries the rows with name = '赵六' and sorts them by age, so a joint index can be created on name and age.

In case you don't remember the index tree, here it is again.

Looking at the index tree, you can see that among the entries with name = '赵六', age is already sorted (the sorting rules were covered in the earlier introduction to indexes), so the age column of the index can be used for sorting.
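A short sketch of how this could be checked follows. The name idx_name_age is an assumption for the joint index. If you already created an equivalent index earlier, skip the first statement.

```sql
-- idx_name_age is an assumed name for the joint index on (name, age).
CREATE INDEX idx_name_age ON `user` (`name`, `age`);

-- If the index order is used for sorting, the Extra column
-- should not contain "Using filesort".
EXPLAIN SELECT * FROM `user` WHERE name = '赵六' ORDER BY age ASC;
```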

11.4 Frequently updated fields should not be indexed

Because an index is sorted by the values of the indexed columns, when the indexed field is updated frequently, the index records must be moved within the index pages frequently to keep the index in order.

Take the joint index on name and age as an example.

Suppose the name of the row with id=9 is changed from 王九 to 赵六. The changed record then has to be moved on the index page to the position between 王五 and the 赵六 record with id=4. To keep the index in order, records with the same name are sorted by age, and the age of id=9 is 9, the smallest, so it is placed first among the 赵六 records.

Therefore, building an index for frequently updated fields will increase the cost of maintaining the index.
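The update described above can be reproduced on the sample data as follows. This sketch assumes the rows listed at the end of the article have been loaded, and that a joint index on name and age exists so that the change also has to be applied to the secondary index.

```sql
-- Rename the row with id = 9; its entry in the (name, age) index must be
-- removed from its old position and inserted at the new one.
UPDATE `user` SET `name` = '赵六' WHERE `id` = 9;

-- In index order, the row with age 9 now comes first among the 赵六 entries,
-- followed by the rows with id 4 and id 6.
SELECT `name`, `age`, `id`
FROM `user`
WHERE `name` = '赵六'
ORDER BY `age` ASC;
```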

11.5 Choose highly differentiated fields for indexing

This is because, if the discrimination is low, the index effect is not good.

For example, suppose there is a gender field sex, which is either male or female. If sex is indexed, assuming that male ranks before female, the data on the index page is roughly arranged as follows:

Only 6 rows are drawn here. Assuming there are 100,000 rows, the arrangement continues in the same way, with male rows first and female rows after.

If you then use the sex index to query the rows with sex = male, and male and female rows each make up half of the data, about 50,000 records are scanned. If a return to the table is also required, the cost calculation rules will show that the cost is huge, worse than a direct full table scan.

So choose a field with a high degree of discrimination as an index
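A common way to estimate the degree of discrimination is the ratio of distinct values to total rows: the closer to 1, the better the column suits an index. The sample table has no sex column, so the sketch below uses name and city instead. With the sample data, name has 6 distinct values out of 9 rows and city has 3 out of 9.

```sql
-- Ratio of distinct values to total rows for two candidate columns.
SELECT
  COUNT(DISTINCT `name`) / COUNT(*) AS name_selectivity,
  COUNT(DISTINCT `city`) / COUNT(*) AS city_selectivity
FROM `user`;
```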

  12. Summary

This brings the article to an end. Here is a review of what it covered.

It first covered clustered and non-clustered indexes, then MySQL's optimizations for some common queries, such as covering indexes and index pushdown, both of which reduce the number of returns to the table and the performance cost they cause. It then described how MySQL selects indexes, and finally introduced the scenarios in which indexes fail and the principles for building indexes.

Finally, I hope this article is helpful to you!

The SQL for the table data is as follows:

INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (1, '李四', 20, '杭州');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (2, '张三', 18, '北京');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (3, '张三', 23, '上海');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (4, '赵六', 22, '杭州');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (5, '王五', 19, '北京');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (6, '赵六', 24, '上海');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (7, '刘七', 20, '上海');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (8, '刘七', 22, '上海');
INSERT INTO `user` (`id`, `name`, `age`, `city`) VALUES (9, '王九', 9, '杭州');



Origin blog.csdn.net/agonie201218/article/details/131825718