PHP Interview Essentials | MySQL Index Usage Strategy and Optimization

MySQL optimization falls into two main areas: schema optimization and query optimization.

The high-performance indexing strategies discussed in this article belong mainly to schema optimization. Everything here builds on the theory of how indexes work. Once you understand the mechanism behind the index, choosing a high-performance strategy becomes a matter of reasoning, and the logic behind each strategy becomes clear.

1. The sample database

To discuss indexing strategies, we need an example database with a reasonable amount of data. This article uses employees, one of the sample databases provided in the official MySQL documentation. It is moderately complex and holds a fairly large amount of data. The figure below is its ER diagram (taken from the official MySQL manual):

[Figure: ER diagram of the employees sample database]

2. The principle of the leftmost prefix and related optimization

The first condition for using indexes efficiently is knowing which queries will actually use an index. This comes down to the "leftmost prefix principle" of B+Tree indexes, which the examples below illustrate.

First, the concept of a joint index. So far we have assumed that an index refers to a single column. In fact, an index in MySQL can refer to multiple columns in a fixed order. Such an index is called a joint index.

In general, a joint index is an ordered tuple <a1, a2, …, an>, where each element is a column of the table. A strict definition would require relational algebra, but going deep into relational algebra would be tedious, so no strict definition is given here. A single-column index can be regarded as the special case of a joint index with exactly one element.

Take the employees.titles table as an example. First, check which indexes it has:

[Figure: indexes defined on employees.titles]
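As a rough sketch, the index list can be obtained with a statement like the one below. It assumes the employees sample database is loaded under its default name; the exact rows returned depend on the MySQL version and on how the dataset was imported.

```sql
-- List every index on the titles table, one row per indexed column.
-- Seq_in_index gives the position of a column inside a joint index.
SHOW INDEX FROM employees.titles;
```

The later cases refer to the PRIMARY index of this table and to its columns emp_no, title and from_date, so the rows with Key_name = PRIMARY are the ones to look at.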

3. EXPLAIN

In daily work we sometimes enable the slow query log to record SQL statements that take a long time to execute. Finding these statements is only the first step. We then usually run the EXPLAIN command on them to view the execution plan and check whether a statement uses an index or performs a full table scan.

EXPLAIN also gives insight into MySQL's cost-based optimizer: it shows details of the access strategies the optimizer may consider, and which strategy it expects to adopt when the statement runs.

The EXPLAIN output has 10 columns: id, select_type, table, type, possible_keys, key, key_len, ref, rows, and Extra. (MySQL 5.7 and later also show partitions and filtered.)

Summary description:

  • id: the identifier of the SELECT
  • select_type: the type of the query
  • table: the table that the output row refers to
  • type: the access (join) type for the table. Common values, roughly from worst to best:
      • ALL: full table scan
      • index: full scan of the index, in index order
      • range: index range scan
      • ref: lookup on a non-unique index (neither primary key nor unique), so several rows may match
      • eq_ref: lookup on a primary key or unique index, typically when joining tables, so at most one row matches
      • const: the primary key is compared with a constant in the WHERE clause, so the MySQL optimizer can treat the result as a constant
  • possible_keys: the indexes that could be used for the query
  • key: the index actually used
  • key_len: the number of bytes of the index that are used
  • ref: the columns or constants compared against the index
  • rows: the estimated number of rows to be scanned
  • Extra: additional information about how the query is executed
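The workflow described at the start of this section (log slow statements, then inspect them with EXPLAIN) can be sketched roughly as follows. The 1-second threshold is an arbitrary value picked for illustration, and changing global variables assumes an account with administrative privileges.

```sql
-- Check whether the slow query log is on and where it is written.
SHOW VARIABLES LIKE 'slow_query_log%';

-- Turn it on for the running server and log statements slower than 1 second.
-- The threshold of 1 is an assumption made for this sketch.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- Prefix a statement taken from the log with EXPLAIN to see its plan
-- instead of executing it.
EXPLAIN SELECT * FROM employees.titles WHERE emp_no = 10001;
```

A global change to long_query_time only affects connections opened afterwards, so reconnect before testing.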

4. Specific cases

Case 1: Full column match

[Figure: EXPLAIN output for the query below]

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND title = 'Senior Engineer' AND from_date='1986-06-26';

Clearly, the index can be used when every column in the index is matched exactly (exact matching here means "=" or "IN"). Note that an index is in theory sensitive to column order, but MySQL's query optimizer automatically reorders the conditions in the WHERE clause to fit a suitable index. If we reverse the order of the conditions in WHERE, for example, the result is the same.
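To check the point about condition order, a query along these lines can be compared with the one above. It is only a sketch that reuses the same literals in reverse order.

```sql
-- Same three equality conditions as the query above, written in reverse order.
-- The optimizer is expected to match them to the index columns regardless.
EXPLAIN SELECT * FROM employees.titles
WHERE from_date = '1986-06-26'
  AND title = 'Senior Engineer'
  AND emp_no = '10001';
```

Comparing the key and key_len columns of the two plans is a quick way to confirm that they are the same.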

Case 2: Leftmost prefix match

[Figure: EXPLAIN output for the query below]

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001';

When the query conditions exactly match one or more columns at the left of the index, such as <emp_no> or <emp_no, title>, the index can be used, but only in part: the leftmost prefix formed by the conditions. According to the analysis result, the query above uses the PRIMARY index, but key_len is 4, which shows that only the first column of the index is used.

Case 3: Exact matches on index columns, but a column in the middle is missing

[Figure: EXPLAIN output for the query below]

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10001' AND from_date='1986-06-26';

Index usage here is the same as in Case 2. Because title is not provided, the query uses only the first column of the index. from_date is also in the index, but without title it cannot be joined to the left prefix, so the results must be scanned and filtered on from_date (in this example the emp_no value matches a single row, so there is in effect nothing to scan).

If you want from_date to use the index as well, instead of being filtered by WHERE, you can add an auxiliary index <emp_no, from_date>, and the query above will use it. Alternatively, you can use an optimization method called "isolation column" to fill the "hole" between emp_no and from_date.
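A minimal sketch of the auxiliary-index option might look like this. The index name emp_no_from_date is made up for the example, and the final statement removes the index again so that the remaining cases see the table's original indexes.

```sql
-- Hypothetical auxiliary index on exactly the two columns in the WHERE clause.
-- The name emp_no_from_date is invented for this example.
ALTER TABLE employees.titles
  ADD INDEX emp_no_from_date (emp_no, from_date);

-- Re-check the plan; key and key_len show whether the new index was chosen.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no = '10001' AND from_date = '1986-06-26';

-- Remove the index again to restore the original state of the table.
ALTER TABLE employees.titles DROP INDEX emp_no_from_date;
```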

First, look at how many distinct values title has:

[Figure: distinct values of title in employees.titles]

There are only 7. When the column that forms the "hole" has this few values, you can consider using "IN" to fill the "hole" so that the conditions form a leftmost prefix:

[Figure: EXPLAIN output for the query with title filled in by IN]
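A sketch of this hole-filling technique follows. The seven title strings are the values I would expect from the standard employees dataset and should be treated as an assumption; confirm them with the first statement before relying on the second.

```sql
-- Step 1: list the distinct titles (7 of them according to the text).
SELECT DISTINCT title FROM employees.titles;

-- Step 2: enumerate every title with IN so that emp_no, title and from_date
-- together form a leftmost prefix of the index.
-- The literal values below are assumed, not taken from the article.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no = '10001'
  AND title IN ('Senior Engineer', 'Staff', 'Engineer', 'Senior Staff',
                'Assistant Engineer', 'Technique Leader', 'Manager')
  AND from_date = '1986-06-26';
```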

This time key_len is 56, indicating that the index is fully used. However, type and rows show that IN actually executes a range query, and 7 keys are checked here.

Performance improves a little after "filling the hole". If many rows remain after filtering on emp_no, the advantage of this approach becomes more obvious. Of course, if title has too many distinct values, filling the hole is not appropriate and an auxiliary index must be created.

Case 4: The query condition does not specify the first column of the index

[Figure: EXPLAIN output for a query that omits the first index column]

Because the condition is not a leftmost prefix, a query like this clearly cannot use the index.
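A query of roughly the following shape falls into this case. The date literal is reused from the earlier examples and is otherwise arbitrary.

```sql
-- Filters only on from_date and gives no condition on emp_no,
-- so the conditions do not form a leftmost prefix of the index.
EXPLAIN SELECT * FROM employees.titles
WHERE from_date = '1986-06-26';
```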

Case 5: Matching the prefix string of a column

[Figure: EXPLAIN output for a prefix-string match]

The index can be used in this case. What matters is where the wildcard % appears: if it is at the start of the pattern, the index cannot be used; if it appears anywhere else, the index can still be used for the literal prefix before it, although depending on the situation only part of the index may be used.
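The difference between the two wildcard positions can be tried with a pair of queries like these. The patterns are illustrative choices, not taken from the original screenshots.

```sql
-- Trailing wildcard: the literal prefix 'Senior' can be looked up in the index.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no = '10001' AND title LIKE 'Senior%';

-- Leading wildcard: there is no literal prefix, so the title condition
-- cannot narrow the index lookup.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no = '10001' AND title LIKE '%Engineer';
```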

Case 6: Range query

[Figure: EXPLAIN output for a range query]

A range column can use the index (it must be part of the leftmost prefix), but the columns after the range column cannot. In addition, the index serves at most one range column, so if the query has two range columns, the index cannot be used for both.
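Both situations can be sketched with queries of the following shape. The bounds and literals are assumptions chosen to fit the sample data.

```sql
-- One range column: emp_no is the first index column, so the range can use
-- the index, but the condition on title that follows it cannot extend the lookup.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no < '10010' AND title = 'Senior Engineer';

-- Two range columns: emp_no and from_date are both ranges.
-- Only the first range is expected to be served by the index.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no < '10010'
  AND title = 'Senior Engineer'
  AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
```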

[Figure: EXPLAIN output for a query with two range columns]

As you can see, the index cannot help with the second range condition. One quirk of MySQL deserves a note here: EXPLAIN alone may not let you tell a range match from a multi-value match, because both are shown as range in the type column. Also, using "BETWEEN" does not necessarily mean a range query, as in the following query:

[Figure: EXPLAIN output for a query using BETWEEN on emp_no]
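A query of the kind meant here might look like the sketch below. The bounds are assumed values; the point is only that emp_no is restricted with BETWEEN over a small set of integers.

```sql
-- BETWEEN on emp_no plus a second BETWEEN on from_date.
-- This article treats the emp_no condition as a multi-value match
-- rather than a true range; check key_len in the plan to see how much
-- of the index is used.
EXPLAIN SELECT * FROM employees.titles
WHERE emp_no BETWEEN '10001' AND '10010'
  AND title = 'Senior Engineer'
  AND from_date BETWEEN '1986-01-01' AND '1986-12-31';
```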

It looks as if two range conditions are used, but the "BETWEEN" on emp_no is in effect equivalent to "IN", so emp_no is actually a multi-value exact match. You can see that this query uses all three columns of the index. It is therefore important to distinguish carefully between multi-value matching and range matching in MySQL, or MySQL's behavior will seem confusing.

Case 7: Index selection and prefix indexes

Since an index can speed up queries, should we build one whenever a query could use it? The answer is no. An index speeds up queries, but it has a cost: the index file itself consumes storage space, and the index adds overhead to inserting, deleting, and updating records. MySQL also spends resources maintaining the index at run time. So more indexes are not always better. In general, building an index is not recommended in two situations.

The first is when the table has relatively few records, such as one or two thousand or even just a few hundred. There is no need for an index; simply let the query do a full table scan. As for how many records count as many, opinions differ. My personal rule of thumb uses 2000 as the dividing line: below 2000 records you can consider not indexing, and above 2000 you can consider indexing as appropriate.

The other situation where indexing is not recommended is low index selectivity. Index selectivity is the ratio of the number of distinct index values (also called cardinality) to the number of records in the table (#T):

Index Selectivity = Cardinality / #T

Selectivity lies in the range (0, 1]. The higher the selectivity, the more valuable the index, which follows from the nature of B+Tree. Take the employees.titles table used above: if the title field is often queried on its own, is it worth building an index on it? Let's look at its selectivity:

[Figure: selectivity of title in employees.titles]

The selectivity of title is below 0.0001 (the exact value is 0.00001579), so there is really no need to build a separate index for it.
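The selectivity formula translates directly into SQL. The statements below are a sketch; the column aliases are names chosen for this example.

```sql
-- Selectivity = number of distinct values / number of rows.
SELECT COUNT(DISTINCT title) / COUNT(*) AS selectivity
FROM employees.titles;

-- The two inputs separately, which makes the exact ratio easy to compute by hand.
SELECT COUNT(DISTINCT title) AS cardinality, COUNT(*) AS total_rows
FROM employees.titles;
```

The first result is typically displayed with only a few decimal places, so the second statement can be useful when the ratio is very small.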

An index optimization strategy related to selectivity is the prefix index, which uses a prefix of the column rather than the whole column as the index key. With a suitable prefix length, the selectivity of the prefix index can approach that of the full-column index, while the shorter key reduces the size of the index file and the maintenance overhead. The following uses the employees.employees table to show how to choose and use a prefix index.

From the sample database diagram, we can see that the employees table has only one index. To search for a person by name, we can only scan the entire table. If we search for employees by name frequently, this is clearly inefficient, so we can consider building an index. There are two options, <first_name> or <first_name, last_name>. Look at the selectivity of the two indexes:

[Figure: selectivity of <first_name> and of <first_name, last_name>]
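The two candidates can be compared with statements along these lines. Using CONCAT to stand in for the column pair is a simplification made for this sketch.

```sql
-- Candidate 1: index on first_name alone.
SELECT COUNT(DISTINCT first_name) / COUNT(*) AS selectivity
FROM employees.employees;

-- Candidate 2: joint index on first_name and last_name.
-- CONCAT approximates the pair of columns as a single value.
SELECT COUNT(DISTINCT CONCAT(first_name, last_name)) / COUNT(*) AS selectivity
FROM employees.employees;
```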

Clearly the selectivity of <first_name> is too low, while that of <first_name, last_name> is very good. But first_name and last_name together have a length of 30. Is there a way to balance length and selectivity? Consider building the index on first_name plus the first few characters of last_name, and look at its selectivity:

[Figure: selectivity of first_name combined with a prefix of last_name]
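A sketch of the corresponding selectivity check is shown below. The prefix length of 4 matches the index that the text goes on to create; other lengths can be tried the same way.

```sql
-- Selectivity of first_name combined with the first 4 characters of last_name.
-- Change the 4 to another length to see how selectivity responds.
SELECT COUNT(DISTINCT CONCAT(first_name, LEFT(last_name, 4))) / COUNT(*) AS selectivity
FROM employees.employees;
```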

At this point the selectivity is ideal, and the length of this index is only 18, well below the 30 of <first_name, last_name>. We build this prefix index:

ALTER TABLE employees.employees ADD INDEX `first_name_last_name4` (first_name, last_name(4));

Now run the query by name again and compare the results with those from before the index was added. The performance improvement is significant: the query is more than 120 times faster.

A prefix index balances index size and query speed, but it has drawbacks: it cannot be used for ORDER BY and GROUP BY operations, nor can it serve as a covering index (that is, an index that itself contains all the data the query needs, so the data file is no longer accessed).

Origin blog.csdn.net/weixin_49163826/article/details/108760331