Prefix indexes: balancing performance and space


The last few articles have all been about indexes. Today, let's take a look at prefix indexes.

1. What is a prefix index

Simply put, a prefix index is an index built on the first few characters of a text column (you specify how many characters when creating the index). An index built this way is smaller, so queries against it are faster. It is somewhat like creating a function-based index on a leading substring of a column in Oracle, except that MySQL matches against the prefix index automatically at query time, so you don't need to call LEFT in the query.

So why not index the entire column? Usually a prefix index is chosen because the column's values are long and indexing all of them isn't necessary. A prefix index uses only the leading characters of a column, which saves index space and can also improve index efficiency. The obvious downside is that it reduces the selectivity of the index.

This brings up another concept: what is index selectivity?

2. What is index selectivity

Index selectivity is the ratio of the number of distinct index values (also called the cardinality) to the total number of records in the table, and its value lies in the range [0, 1]. The higher the selectivity of an index, the more efficient the query, because a highly selective index lets MySQL filter out more rows when searching.

Some readers may ask: is higher index selectivity always better? Of course not! The highest possible selectivity is 1. An index with a selectivity of 1 behaves like a unique index: the search condition locates exactly one row. Performance is best in that case, but such an index also takes the most space, which defeats our original purpose in creating a prefix index.

The reason we create a prefix index instead of a unique index in the first place is to strike a balance between index performance and index space. We want a prefix long enough to keep selectivity high (so that a query doesn't need to scan many rows), but we also want the index not to take up too much storage.

So how do we choose an appropriate prefix length? The prefix should be long enough that the selectivity of the prefix index is close to that of an index on the entire column, i.e. the cardinality of the prefix should be close to the cardinality of the full column.

First, we can get full column selectivity through the following SQL:

SELECT COUNT(DISTINCT column_name) / COUNT(*) FROM table_name;

Then, the selectivity of a prefix of a given length can be obtained with the following SQL:

SELECT COUNT(DISTINCT LEFT(column_name, prefix_length)) / COUNT(*) FROM table_name;

When running the SQL above, try different values of prefix_length until the result is approximately equal to the full-column selectivity. The shortest prefix_length that gets there is the best choice.
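To make the two formulas concrete, here is a minimal Python sketch that mirrors COUNT(DISTINCT ...) / COUNT(*) on an in-memory list. The function name selectivity and the sample city names are made up for illustration and are not data from this article; it also ignores NULL handling, which the SQL version has to deal with.

```python
def selectivity(values, prefix_length=None):
    """Ratio of distinct values (or distinct prefixes) to the number of rows."""
    if not values:
        return 0.0
    if prefix_length is None:
        keys = set(values)                           # COUNT(DISTINCT column_name)
    else:
        keys = {v[:prefix_length] for v in values}   # COUNT(DISTINCT LEFT(...))
    return len(keys) / len(values)                   # divided by COUNT(*)


# Hypothetical sample column, chosen so that many values share a prefix.
cities = ["london", "londonderry", "long beach", "los angeles", "lisbon", "lima"]

full = selectivity(cities)
by_length = {n: selectivity(cities, n) for n in range(1, 8)}

print(f"full column: {full:.2f}")
for n, s in by_length.items():
    print(f"prefix_length={n}: {s:.2f}")
```

In this toy list the selectivity climbs as the prefix gets longer and only matches the full-column value once the prefix is long enough to tell "london" and "londonderry" apart, which is exactly the trade-off you tune with prefix_length.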

3. Create a prefix index

3.1 A small case

As an example, let's create a prefix index.

The sample data Songge uses here is a test script found on the Internet. It contains more than 3 million rows (300W+), which is plenty for SQL tuning experiments. Reply mysql-data-samples in the backend of the official account to get the download link for the script.

Let's take a general look at this table structure:

This table has a user_uuid field, and that is the field we will work with.

You have probably used Git. Unlike SVN, a version identifier in Git is a hash string rather than a number. In practice, for example when you want to roll back to a version, you don't need to type the complete hash. The first few characters are enough, because they already identify the version.

The user_uuid field in this table works the same way. If we want to index user_uuid, there is no need to index the complete string; indexing part of it is enough.

If that is still unclear, here is an example. Suppose I want to query by the user_uuid field, but I don't want to write the complete user_uuid in the condition; I only write the leading part, which is enough to pick out the record I want. Let's look at the following SQL:

As you can see, I only need to give part of user_uuid to uniquely identify a record.

Of course, Songge tested the SQL above, and the condition '39352f%' cannot be made any shorter. If it were shorter, two or more records would be found.

From the above example, we can see that if the user_uuid field is indexed, it may not be necessary to index the complete string, but only a part of the prefix string.

So how many leading characters should we index? This is not something to decide off the top of your head; it has to be calculated. Let's read on.

3.2 Prefix index

First, let's look at the selectivity of the user_uuid full-column index through the following SQL:

SELECT COUNT(DISTINCT user_uuid) / COUNT(*) FROM system_user;

As you can see, the result is 1. A full-column selectivity of 1 means that every value in this column is unique.

Next, let's try a few different values of prefix_length and see how selective each one is.

Songge tested 5 different values of prefix_length here. Let's take a look at their selectivity:

The selectivity for 8 and 9 is the same, because the ninth character of every UUID string is a hyphen (-). A 9-character prefix therefore distinguishes rows no better than an 8-character one.

When prefix_length is 10, the selectivity is already 1. This means that among these 3 million-plus rows (300W+), a query on the user_uuid field only needs the first ten characters to uniquely locate a specific record.
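The same search for a good prefix_length can be sketched in Python. This is only an illustration under assumptions: the UUIDs are generated randomly with a fixed seed rather than taken from the system_user table, and the helper names prefix_selectivity and shortest_prefix are hypothetical. The exact length it finds will differ from the 10 measured on the real data.

```python
import random
import uuid


def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the number of rows."""
    return len({v[:length] for v in values}) / len(values)


def shortest_prefix(values, target):
    """Smallest prefix length whose selectivity reaches target, or None."""
    longest = max(len(v) for v in values)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target:
            return length
    return None


rng = random.Random(42)  # fixed seed so repeated runs give the same data
uuids = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(10_000)]

full = len(set(uuids)) / len(uuids)   # full-column selectivity
best = shortest_prefix(uuids, full)

for length in (4, 6, 8, 9, 10):
    print(length, prefix_selectivity(uuids, length))
print("shortest prefix reaching full-column selectivity:", best)
```

Whatever the data, lengths 8 and 9 always print the same value here, because character 9 is the hyphen in every UUID. That is why a prefix length of 9 never buys anything over 8.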

So what are we waiting for? Let's create the prefix index now:

alter table system_user add index user_uuid_index(user_uuid(10));

Check out the prefix index just created:

show index from system_user;

As you can see, the second line is the prefix index we just created.

Next, let's look at the execution plan of the following query to see whether it uses the index:

select * from system_user where user_uuid='39352f81-165e-4405-9715-75fcdf7f7068';

As you can see, this prefix index has been used.

The specific search process is as follows:

  1. In the user_uuid_index index, find the first record whose value is 39352f81-1 (the first ten characters of user_uuid).
  2. Since user_uuid_index is a secondary index, its leaf nodes store primary key values; the primary key id obtained here is 1.
  3. Go back to the table with that primary key id, find the complete row with id 1 in the primary key index, and return it to the server layer.
  4. The server layer checks whether the row's user_uuid is 39352f81-165e-4405-9715-75fcdf7f7068 (which is why the Extra column of the execution plan shows Using where).
    1. If not, the row is discarded.
    2. If so, the record is added to the result set.
  5. The entries in the index leaf nodes are linked in order, so continue from the first match, read the next entry, and repeat steps 2, 3, and 4 until the value read from user_uuid_index is no longer 39352f81-1. At that point the loop ends.

If the selectivity of the prefix index is 1, step 5 stops right away, because the next index entry no longer matches. If the selectivity is less than 1, step 5 may have to loop over several entries.
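The five steps above can be simulated in a few lines of Python. This is a rough model, not how InnoDB is actually implemented: a dict stands in for the primary key index, a sorted list stands in for user_uuid_index, and the rows with id 2 and id 3 are invented (row 2 deliberately shares the 10-character prefix so that step 5 has something to do).

```python
from bisect import bisect_left

PREFIX_LEN = 10  # same length as user_uuid(10) in the ALTER TABLE statement

# Stand-in for the primary key index: id -> full row. Rows 2 and 3 are made up.
rows = {
    1: {"id": 1, "user_uuid": "39352f81-165e-4405-9715-75fcdf7f7068"},
    2: {"id": 2, "user_uuid": "39352f81-1aaa-4bbb-8ccc-000000000000"},
    3: {"id": 3, "user_uuid": "7c1e0b52-90ab-4cde-8f01-23456789abcd"},
}

# Stand-in for the secondary index: sorted (prefix, primary key) entries.
prefix_index = sorted((r["user_uuid"][:PREFIX_LEN], pk) for pk, r in rows.items())


def lookup(user_uuid):
    """Return (matching rows, number of trips back to the table)."""
    wanted = user_uuid[:PREFIX_LEN]
    results, table_reads = [], 0
    # Step 1: locate the first index entry whose prefix equals the wanted one.
    pos = bisect_left(prefix_index, (wanted,))
    while pos < len(prefix_index) and prefix_index[pos][0] == wanted:
        pk = prefix_index[pos][1]           # step 2: primary key from the entry
        row = rows[pk]                      # step 3: go back to the table
        table_reads += 1
        if row["user_uuid"] == user_uuid:   # step 4: compare the full value
            results.append(row)
        pos += 1                            # step 5: move on to the next entry
    return results, table_reads


found, reads = lookup("39352f81-165e-4405-9715-75fcdf7f7068")
print(found, reads)
```

With the contrived duplicate prefix, the lookup goes back to the table twice to return one row. If every prefix were unique, as in the article's data with a length of 10, each lookup would need only one trip.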

From this case you can see that we save space while keeping lookups efficient.
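For a rough feel of the space saving, here is a back-of-the-envelope calculation in Python. It rests on assumptions that are not from the article: exactly 3,000,000 rows (the article only says 300W+), one byte per UUID character, and it counts only the raw key characters, ignoring primary key pointers, page overhead, and fill factor. Real index sizes will differ.

```python
ROWS = 3_000_000     # assumption: the article only says "300W+" rows
UUID_CHARS = 36      # 32 hex digits plus 4 hyphens
PREFIX_CHARS = 10    # user_uuid(10)
BYTES_PER_CHAR = 1   # assumption: UUID characters are ASCII, one byte each


def raw_key_bytes(rows, chars, bytes_per_char=BYTES_PER_CHAR):
    """Bytes needed just for the indexed characters, with no overhead."""
    return rows * chars * bytes_per_char


full_bytes = raw_key_bytes(ROWS, UUID_CHARS)
prefix_bytes = raw_key_bytes(ROWS, PREFIX_CHARS)
saved_fraction = 1 - prefix_bytes / full_bytes

print(f"full column:    {full_bytes / 1e6:.0f} MB of key data")
print(f"10-char prefix: {prefix_bytes / 1e6:.0f} MB of key data")
print(f"saved:          {saved_fraction:.0%}")
```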

3.3 A question

Now that we are using a prefix index, there is one problem to be aware of. Consider the following query:

select user_uuid from system_user where user_uuid='39352f81-165e-4405-9715-75fcdf7f7068';

This time it is select user_uuid rather than select *. From Songge's previous article ("It's time to check whether you're using indexes the right way!"), you would expect a covering index to be used here. Let's take a look at the execution plan:

Hey, what happened to the covering index? (Note that Extra shows Using where.)

Think about it: the prefix index does not store the complete value of the user_uuid field, so MySQL must go back to the table to get the required data. Therefore, with a prefix index you can't use a covering index.
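The rule can be written down as a tiny Python check. This is a simplified model of my own, not MySQL's optimizer logic: can_cover is a hypothetical helper, an index is described as a mapping from column name to prefix length (None meaning the whole column is indexed), and it ignores the primary key columns that InnoDB also keeps in secondary indexes.

```python
def can_cover(referenced_columns, index_columns):
    """True only if every column the query touches is fully stored in the index.

    index_columns maps column name -> prefix length, or None for a full column.
    """
    return all(
        col in index_columns and index_columns[col] is None
        for col in referenced_columns
    )


# The query in this section touches only user_uuid (in SELECT and in WHERE).
query_columns = ["user_uuid"]

print(can_cover(query_columns, {"user_uuid": 10}))    # prefix index on 10 chars
print(can_cover(query_columns, {"user_uuid": None}))  # index on the full column
```

The first call returns False and the second returns True: only an index on the full column holds enough of user_uuid to answer the query without going back to the table.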

4. Summary

Well, that's the prefix index. Use it according to the actual needs of your own projects. That's all for today; we'll cover the rest later~
