MySQL Index Usage Strategies

Preface:

I recently started reading "High Performance MySQL". I haven't finished it yet, but I think it is very well written. The chapter on indexes helped me a lot, both with creating indexes and with understanding how MySQL indexes work. These are my notes on indexing, so I can come back to them when I run into problems.

1. Advantages of indexes:

If you are not familiar with the basic concepts of MySQL indexes, you can read two of my earlier posts:
"MySQL clustered index and non-clustered index" and "B-tree and B+tree in plain language".

1.1 An index greatly reduces the number of rows the server needs to scan.
1.2 An index can help the server avoid sorting and temporary tables.
1.3 An index can turn random I/O into sequential I/O.

2. Strategies for using indexes

2.1 Independent column

An independent (isolated) column means the indexed column appears on its own in the query. It must not be part of an expression or an argument to a function, or MySQL cannot use the index.
Incorrect examples:

1. select user_id from user where user_id + 1 = 7;  -- this can simply be written as user_id = 6, and then the index is used
2. select * ... where to_days(current_date) - to_days(date_col) <= 10;  -- write it as date_col >= date_sub(current_date, interval 10 day) instead

Note: The column after WHERE in these examples may well be indexed. The index is still not used, because the column is operated on. In the first case it becomes part of an expression such as user_id + 1. In the second case it is passed to a function.
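To see the effect without a MySQL server, the sketch below uses Python's built-in sqlite3 module as a stand-in. SQLite is a different engine, and its EXPLAIN QUERY PLAN output format differs from MySQL's EXPLAIN. Its planner follows the same rule here, though: it can search an index only when the indexed column stands alone. The table, index name, and data are hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_user_id ON user (user_id)")
conn.executemany("INSERT INTO user (user_id, name) VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

# Column inside an expression: the planner cannot seek into the index.
expr_plan = plan(conn, "SELECT user_id FROM user WHERE user_id + 1 = 7")
# Independent column: the planner can search the index directly.
bare_plan = plan(conn, "SELECT user_id FROM user WHERE user_id = 6")
print(expr_plan)
print(bare_plan)
```

The first plan reports a SCAN, meaning every row or index entry is read and tested. The second reports a SEARCH that seeks directly into idx_user_id.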

2.2 Prefix index and index selectivity

2.2.1

When a column we need to query is long, we can create a prefix index, which indexes only the first few characters of the column. For example, for the value alibabayushishidadao we can index just the first seven characters, alibaba, and use this prefix to look up the matching rows. This raises a question: how do we decide the prefix length?
This is where index selectivity comes in. Index selectivity is the ratio of the number of distinct indexed values to the total number of rows T, so it ranges from 1/T to 1. The higher the selectivity, the more efficient the query, because a more selective index lets MySQL filter out more rows during a lookup. A unique index has a selectivity of 1, the best possible.
So we need a prefix that is long enough to keep selectivity high, yet short enough to keep the index small.
Example:

select count(distinct phone)/count(*) as full_col,
count(distinct left(phone, 3))/count(*) as prefix3,
count(distinct left(phone, 5))/count(*) as prefix5,
count(distinct left(phone, 7))/count(*) as prefix7
from `table`;

Suppose we want a prefix index on the phone column. The query above computes the selectivity of each candidate prefix length. Choose the shortest prefix whose selectivity is closest to that of the complete phone column, which count(distinct phone)/count(*) gives.
Replacing the full-column index with that prefix index shortens the index and improves query efficiency.
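The same comparison can be sketched in plain Python, which makes the selection rule explicit. This is a minimal sketch. The sample phone numbers are made up, and the 0.01 tolerance for "close enough to the full column" is an arbitrary assumption you would tune for your own data.

```python
def selectivity(values, prefix_len=None):
    """Distinct values / total rows, optionally using only the left prefix_len characters."""
    if not values:
        raise ValueError("need at least one value")
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


def shortest_good_prefix(values, candidates, tolerance=0.01):
    """Shortest candidate length whose selectivity is within `tolerance` of the full column."""
    full = selectivity(values)
    for n in sorted(candidates):
        if full - selectivity(values, n) <= tolerance:
            return n
    return None  # no candidate is close enough: index the full column instead


phones = ["13800001111", "13800002222", "13912345678",
          "13912349999", "15000001234", "15055556666"]
for n in (3, 5, 7, 9):
    print(n, round(selectivity(phones, n), 3))
print("full", selectivity(phones))
print("chosen prefix:", shortest_good_prefix(phones, [3, 5, 7, 9]))  # chosen prefix: 9
```

In this tiny sample, only the 9-character prefix reaches the full column's selectivity of 1.0, so that is the length chosen.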

2.2.2 Create a prefix index:

alter table user add key (user_name(7));  -- 7 is the length of the prefix index

2.2.3 Disadvantages of prefix indexes:

1. A prefix index does not contain the full column value, so MySQL cannot use it for GROUP BY or ORDER BY.
2. For the same reason, a prefix index cannot be used as a covering index.

2.2.4 Note: Prefix indexes are also useful for long unique identifiers (such as long hexadecimal IDs), where a short prefix is often selective enough. Sometimes you need a suffix index instead. MySQL does not support suffix indexes natively, but you can store the value reversed and build a prefix index on the reversed column.
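The reversed-storage trick works because a suffix of the original value is a prefix of the reversed value, and a sorted index can answer prefix searches with a range scan. The sketch below models the index as a sorted Python list searched with bisect. The email-domain use case and the sample addresses are hypothetical.

```python
import bisect


def build_reversed_index(values):
    """Sorted (reversed_value, original) pairs, like a B-tree on the reversed column."""
    return sorted((v[::-1], v) for v in values)


def find_by_suffix(index, suffix):
    """Find values ending with `suffix` via a prefix range scan on the reversed keys."""
    rev = suffix[::-1]
    i = bisect.bisect_left(index, (rev,))  # first entry >= the reversed suffix
    matches = []
    # Entries sharing the prefix are contiguous in sorted order, so stop at the first miss.
    while i < len(index) and index[i][0].startswith(rev):
        matches.append(index[i][1])
        i += 1
    return matches


emails = ["alice@gmail.com", "bob@qq.com", "carol@gmail.com", "dave@163.com"]
idx = build_reversed_index(emails)
print(find_by_suffix(idx, "@gmail.com"))  # ['alice@gmail.com', 'carol@gmail.com']
```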

2.3 Multi-column index

If type=index_merge shows up in EXPLAIN output, reconsider whether your indexes are designed sensibly. A single multi-column index may serve the query better.
Index merging usually consumes a lot of CPU and memory. More importantly, the optimizer does not include these costs in the query cost; it only counts random page reads.

2.4 Choose the appropriate index order

There is no fixed rule for the column order of an index; choose it based on how the index is actually used.
As a rule of thumb, put frequently used, highly selective columns first. Put columns used in range conditions, or rarely used columns, last.
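One way to make this rule of thumb concrete is to rank equality-filtered columns by measured selectivity and put range columns at the end. This is only a heuristic sketch. It ignores how often each column is used, which you would still weigh by hand, and the column names and rows are hypothetical.

```python
def selectivity(rows, col):
    """Distinct values of `col` divided by the number of rows."""
    values = [row[col] for row in rows]
    return len(set(values)) / len(values)


def suggest_index_order(rows, equality_cols, range_cols):
    """Equality columns by descending selectivity, then range columns last."""
    ranked = sorted(equality_cols, key=lambda c: selectivity(rows, c), reverse=True)
    return ranked + list(range_cols)


rows = [
    {"status": 1, "user_id": 10, "created_at": 1},
    {"status": 1, "user_id": 11, "created_at": 2},
    {"status": 2, "user_id": 12, "created_at": 3},
    {"status": 2, "user_id": 13, "created_at": 4},
]
print(suggest_index_order(rows, ["status", "user_id"], ["created_at"]))
# ['user_id', 'status', 'created_at']
```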

2.5 Clustered index

2.5.1

The primary key should preferably be an auto-increment id. Each new row is then simply appended to the end of the clustered index, and when the current data page is full, InnoDB just starts filling a new page. An unordered primary key such as a UUID behaves differently. InnoDB stores rows in primary key order, so when a newly inserted UUID is smaller than existing keys, existing rows have to be moved to make room for it. Handling this when the data page is already full costs even more. The result is frequent page splits, which fragment the data, so the table takes up more space than one with an auto-increment primary key.
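The difference can be illustrated with a toy model of clustered-index leaf pages. This is a simplified sketch, not InnoDB's real algorithm. Each page holds a fixed number of keys (4 here). A key that lands in the middle of a full page splits it in half. A new largest key simply starts a fresh page, loosely mirroring the append behavior described above. The scrambled sequence (i * 37) % 101 is an assumption standing in for UUID-like unordered keys.

```python
import bisect


def simulate_inserts(keys, page_capacity=4):
    """Toy clustered-index model. Returns (page_count, middle_splits)."""
    pages = [[]]  # leaf pages in key order; each page is a sorted list of keys
    middle_splits = 0
    for key in keys:
        # Target page: the last page whose smallest key is <= key (default: first page).
        idx = 0
        for i, candidate in enumerate(pages):
            if candidate and candidate[0] <= key:
                idx = i
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= page_capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # New largest key overflowing the last page: just start a new page.
            pages.append([page.pop()])
        else:
            # Key landed inside a full page: split the page in half.
            middle_splits += 1
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
    return len(pages), middle_splits


sequential = list(range(1, 101))                     # auto-increment style ids
scrambled = [(i * 37) % 101 for i in range(1, 101)]  # same ids in unordered, UUID-like order
print(simulate_inserts(sequential))  # (25, 0)
print(simulate_inserts(scrambled))
```

Sequential ids fill every page completely with no middle splits. The scrambled order triggers splits and leaves partially filled pages, so the same 100 keys occupy more pages.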

2.5.2

Disadvantage of an auto-increment primary key: under high concurrency it can cause contention. Every insert has to obtain the next, largest auto-increment id, so all inserting threads compete for the upper end of the key range, and concurrent inserts keep moving that upper bound.

2.5.3

MySQL cannot evaluate arbitrary LIKE patterns inside the index. A LIKE pattern with a fixed leading prefix (such as 'abc%') can use the index, because it can be converted into a simple range comparison. A pattern that starts with a wildcard, such as '%xxx%', cannot use the index, because the storage engine has no way to compare a leading wildcard against the index entries.
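The prefix-to-range conversion can be sketched with a sorted Python list standing in for the index. LIKE 'ali%' becomes col >= 'ali' AND col < 'alj', which is two bisect seeks. '%ba%' has no starting point, so every entry must be examined. This is a simplified model that assumes a non-empty prefix without wildcard characters and plain code-point ordering; MySQL compares according to the column's collation. The sample values are hypothetical.

```python
import bisect


def like_prefix_as_range(sorted_keys, prefix):
    """col LIKE 'prefix%'  ->  col >= prefix AND col < (prefix with last char bumped)."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # 'ali' -> 'alj'
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]


def like_contains_scan(sorted_keys, needle):
    """col LIKE '%needle%': no usable start point, so every key is checked."""
    matches = [k for k in sorted_keys if needle in k]
    return matches, len(sorted_keys)  # (matches, number of keys examined)


names = sorted(["alibaba", "alipay", "amazon", "baidu", "tencent", "aliyun"])
print(like_prefix_as_range(names, "ali"))  # ['alibaba', 'alipay', 'aliyun']
print(like_contains_scan(names, "ba"))     # (['alibaba', 'baidu'], 6)
```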

Tip: select sum(description = 3), sum(category_type = 2) from shop_page_field; counts how many rows have a particular value in each column, because in MySQL a comparison evaluates to 1 when true and 0 when false. It is more convenient to write than separate count queries, but I don't know how the performance compares.
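The trick can be checked quickly with Python's sqlite3 module. Like MySQL, SQLite evaluates a comparison to 1 or 0, or to NULL when the column is NULL, and SUM skips NULLs. The rows below are made up. The second query shows the equivalent COUNT(CASE ...) form for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shop_page_field (description INTEGER, category_type INTEGER)")
conn.executemany("INSERT INTO shop_page_field VALUES (?, ?)",
                 [(3, 2), (3, 1), (1, 2), (3, 2), (None, 2)])

# Each comparison yields 1/0 (NULL for the NULL row), so SUM counts the matches.
sum_style = conn.execute(
    "SELECT SUM(description = 3), SUM(category_type = 2) FROM shop_page_field").fetchone()
# Equivalent conditional COUNT: CASE yields NULL for non-matches, which COUNT ignores.
count_style = conn.execute(
    "SELECT COUNT(CASE WHEN description = 3 THEN 1 END), "
    "COUNT(CASE WHEN category_type = 2 THEN 1 END) FROM shop_page_field").fetchone()
print(sum_style, count_style)  # (3, 4) (3, 4)
```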

2.6 Covering index

2.6.1 Benefits of a covering index:

1. Index entries are usually much smaller than full rows, so reading only the index is faster.
2. For InnoDB, if all the data a query needs can be read directly from a secondary index, there is no need to go back to the clustered index, so the second lookup is avoided.
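A covering index is easy to spot in a query plan. In MySQL's EXPLAIN it shows up as "Using index" in the Extra column. As a runnable stand-in, the sketch below uses SQLite through Python's sqlite3 module. Its EXPLAIN QUERY PLAN reports USING COVERING INDEX when a query can be answered from the index alone. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "status INTEGER, note TEXT)")
conn.execute("CREATE INDEX idx_user_status ON orders (user_id, status)")


def plan(sql):
    """Detail text of SQLite's EXPLAIN QUERY PLAN (last column of each row)."""
    return " | ".join(str(r[-1]) for r in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Only indexed columns are requested: the index alone can answer the query.
covered = plan("SELECT user_id, status FROM orders WHERE user_id = 1")
# SELECT * also needs `note`, so each match requires a lookup into the table.
not_covered = plan("SELECT * FROM orders WHERE user_id = 1")
print(covered)
print(not_covered)
```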

2.7 Unused indexes

In Percona Server or MariaDB, enable the userstat server variable (it is off by default) and let the server run for a while. Then query INFORMATION_SCHEMA.INDEX_STATISTICS to see how often each index is used. An index that is never used can be dropped.

2.8 Indexes and locks

InnoDB locks can be as fine-grained as a single row; it supports both row-level and table-level locks. Row locks are placed on index records, which is how InnoDB can lock exactly the rows a statement touches.
If no suitable index exists, an operation such as UPDATE has to scan and lock every row it examines, which effectively locks the whole table. Keep this in mind.

2.9 Indexing and sorting

If a column is often used for sorting, it is best to index it. Without a suitable index, EXPLAIN shows "Using filesort" in the Extra column. MySQL calls this a file sort, although it does not necessarily use disk files. After you add the index, the filesort no longer appears.
Suppose we have an index on (A, B, C).
For each query below, can the index be used for the sort?

(1) select * from table where A = 'a' order by B, C; (index used for sorting)
(2) select * from table where A = 'a' order by B; (index used for sorting)
(3) select * from table where A = 'a' order by A, B; (index used for sorting)
(4) select * from table where A = 'a' order by C; (index used for the condition on A, but not for sorting by C)
(5) select * from table where A = 'a' order by B, D; (not used for sorting: D is not a column in the index)
(6) select * from table where A > 'a' order by B, C; (not used for sorting: the condition on A is a range)
(7) select * from table where A = 'a' and B in ('b1', 'b2') order by C; (not used for sorting: B in (...) also counts as a range here)
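The pattern behind these seven cases can be written as a small checker. First, drop ORDER BY columns that are fixed by an equality condition. The remaining ORDER BY columns must then follow the index columns in order, and only equality-fixed columns may be skipped. This is a simplified sketch of the rule that assumes all sort directions are ascending; MySQL's optimizer handles more cases than this.

```python
def index_can_sort(index_cols, equality_cols, order_by):
    """Can an index on index_cols return rows already in order_by order?

    equality_cols: columns fixed by `col = const` in WHERE. Range and IN
    conditions are deliberately not counted, matching cases (6) and (7).
    """
    needed = [c for c in order_by if c not in equality_cols]
    pos = 0
    for col in index_cols:
        if pos == len(needed):
            break
        if col == needed[pos]:
            pos += 1       # next ORDER BY column matches the next index column
        elif col in equality_cols:
            continue       # a column fixed to a constant can be skipped
        else:
            return False   # a gap in the index order breaks the sort
    return pos == len(needed)


IDX = ("A", "B", "C")
CASES = {
    1: ({"A"}, ["B", "C"]),
    2: ({"A"}, ["B"]),
    3: ({"A"}, ["A", "B"]),
    4: ({"A"}, ["C"]),
    5: ({"A"}, ["B", "D"]),
    6: (set(), ["B", "C"]),  # A > 'a' is a range, not an equality
    7: ({"A"}, ["C"]),       # B IN (...) treated like a range for sorting
}
results = {n: index_can_sort(IDX, eq, order) for n, (eq, order) in CASES.items()}
print(results)
# {1: True, 2: True, 3: True, 4: False, 5: False, 6: False, 7: False}
```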

2.10 Other optimization strategies

When the queried values are long strings such as URLs, a B-tree index on them is not very efficient. URLs are usually long, so the index is large and lookups are slower.
In that case we can compute a CRC32 or CRC64 hash of the URL and store it in its own column.
CRC32/CRC64 values can collide, however, so the query must also include the original URL:
select * from url_table where url_hash = 1342134234 and url = 'http://www.baidu.com';
MySQL first uses url_hash to find candidate rows quickly. Collisions are possible, but the lookup is fast. It then filters the candidates by the actual url value, so the query performs well.
url_hash still uses an ordinary B-tree index, but looking up a small integer is much faster than looking up the long URL directly. This effectively works as a pseudo-hash index.

Note: CRC64 and FNV64() are not built into MySQL; they require installing an additional plugin. MySQL does provide CRC32() and MD5() natively. If such a plugin is not installed, the application can compute a hash such as MD5 when writing the data and store that value.
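The pseudo-hash pattern can be sketched end to end with Python's zlib.crc32, computed in application code as the note suggests, and SQLite standing in for MySQL. The table layout mirrors url_table above, but the id column, index name, and helper functions are hypothetical. The lookup seeks on the integer url_hash index and then compares the full URL, so a CRC32 collision cannot return the wrong row.

```python
import sqlite3
import zlib


def url_hash(url: str) -> int:
    """CRC32 of the URL; zlib.crc32 returns an unsigned 32-bit value, like MySQL's CRC32()."""
    return zlib.crc32(url.encode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE url_table (id INTEGER PRIMARY KEY, url TEXT, url_hash INTEGER)")
conn.execute("CREATE INDEX idx_url_hash ON url_table (url_hash)")


def save(url):
    """Store the URL together with its precomputed hash."""
    conn.execute("INSERT INTO url_table (url, url_hash) VALUES (?, ?)", (url, url_hash(url)))


def lookup(url):
    """Seek on the short integer index, then compare the full URL to discard collisions."""
    return conn.execute(
        "SELECT id, url FROM url_table WHERE url_hash = ? AND url = ?",
        (url_hash(url), url)).fetchall()


save("http://www.baidu.com")
save("http://www.example.com")
print(lookup("http://www.baidu.com"))  # [(1, 'http://www.baidu.com')]
```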


Source: blog.csdn.net/sc9018181134/article/details/104887125