Use composite indexes well, and you can improve performance by more than 10x!

Hello everyone, I am Piao Miao!

In an interview you will usually be asked "What is an index?", and you can surely blurt out: an index is a data structure that speeds up queries. An index can speed up queries because it keeps the data sorted as rows are inserted.

In real business we run into many complex scenarios, such as queries on multiple columns. In that case you may need to create an index made up of several columns, for example a composite index on columns a and b. But creating an (a, b) index and creating a (b, a) index are completely different things.

Today, let's talk about composite indexes, which come up constantly in real business, and feel their power together. (The indexes in this article are B+ tree indexes, the "short and fat" tree: shallow, with many keys per node.)

Composite index

A composite index is a B+ tree index built on multiple columns. It works exactly like a single-column B+ tree index, except that a single-column index sorts by one column while a composite index sorts by several.

As the figure above shows, a composite index only changes the sort key from one column to several; it is still essentially a B+ tree index. But be aware that composite indexes such as (a, b) and (b, a) produce completely different sort orders.
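To make the difference between (a, b) and (b, a) concrete, here is a minimal Python sketch that models the leaf level of a composite index as a sorted list of key tuples. It assumes (a simplification, not how InnoDB lays out pages) that lexicographic tuple comparison matches the index order: compare the first column, and only on ties the second. The sample rows and the helper `index_order` are made up for illustration.

```python
# Toy model: the leaf level of a composite index is its key tuples in sorted order.
# Python compares tuples lexicographically, like a composite index compares keys.
rows = [
    {"a": 2, "b": 1},
    {"a": 1, "b": 3},
    {"a": 2, "b": 0},
    {"a": 1, "b": 2},
]

def index_order(rows, cols):
    """Return the key tuples in the order a composite index on `cols` would store them."""
    return sorted(tuple(r[c] for c in cols) for r in rows)

ab = index_order(rows, ("a", "b"))
ba = index_order(rows, ("b", "a"))
print(ab)  # [(1, 2), (1, 3), (2, 0), (2, 1)]
print(ba)  # [(0, 2), (1, 2), (2, 1), (3, 1)]
```

In the (a, b) order the b values read 2, 3, 0, 1: they are sorted only within each value of a, never across the whole index.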

Suppose we have the following table test, with a composite index union_index created on it:

create table test
(
    id       int auto_increment primary key,
    name     varchar(50) null,
    workcode varchar(50) null,
    age      int         null
);

create index union_index on test (name, workcode);

For the composite index (name, workcode), because its entries are sorted by name and then by workcode, it can optimize the following two queries:

select * from test where name = 'zhang';
select * from test where name = 'zhang' and workcode = '20190169';

Note that the order in which name and workcode appear after WHERE does not matter: even if you write where workcode = '20190169' and name = 'zhang', the composite index (name, workcode) can still be used.
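The lookup these two queries rely on can be sketched in Python as "seek, then scan": binary-search to the first entry matching the leading columns, then walk forward while they still match. The sample entries, the `(name, workcode, id)` tuple layout and the helper names `leftmost_prefix` and `prefix_lookup` are hypothetical. The sketch also shows that the order of the conditions in the WHERE clause does not change the prefix that gets built.

```python
import bisect

INDEX_COLS = ("name", "workcode")

# Hypothetical leaf entries of the (name, workcode) index. Like an InnoDB
# secondary index entry, each one also carries the primary key (last element).
index = sorted([
    ("li", "20190001", 4),
    ("zhang", "20190169", 1),
    ("wang", "20190169", 2),
    ("zhang", "20180001", 3),
    ("zhang", "20200002", 5),
])

def leftmost_prefix(conds):
    """Collect equality values in index-column order, stopping at the first gap.
    The order in which the conditions were written in WHERE is irrelevant."""
    prefix = []
    for col in INDEX_COLS:
        if col not in conds:
            break
        prefix.append(conds[col])
    return tuple(prefix)

def prefix_lookup(index, prefix):
    """Seek to the first entry >= prefix, then scan while the leading columns match."""
    n = len(prefix)
    pos = bisect.bisect_left(index, prefix)
    hits = []
    while pos < len(index) and index[pos][:n] == prefix:
        hits.append(index[pos])
        pos += 1
    return hits

by_name = prefix_lookup(index, leftmost_prefix({"name": "zhang"}))
by_both = prefix_lookup(index, leftmost_prefix({"workcode": "20190169", "name": "zhang"}))
print(by_name)
print(by_both)
```

Writing workcode first in the condition dict still yields the prefix ("zhang", "20190169"), mirroring the point that WHERE order does not matter.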


However, the following SQL cannot use the composite index (name, workcode), because the (name, workcode) order does not give a (workcode, name) order: workcode is sorted only within each name, not across the whole index.

select * from test where workcode = '20190169';
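The same kind of toy model shows why workcode alone does not help: across the whole (name, workcode) index the workcode column is not sorted, so there is nothing to binary-search and the only fallback is to examine every entry. This is a hypothetical in-memory sketch with made-up data and a made-up helper `scan_for_workcode`, not a description of how the MySQL optimizer decides between a full index scan and a full table scan.

```python
# Hypothetical leaf entries of the (name, workcode) index, in index order.
index = sorted([
    ("zhang", "20190169", 1),
    ("wang", "20180001", 2),
    ("zhang", "20170005", 3),
    ("li", "20190169", 4),
])

# workcode is sorted only within each name, not across the whole index.
workcodes = [entry[1] for entry in index]
workcode_sorted = workcodes == sorted(workcodes)

def scan_for_workcode(index, workcode):
    """No leftmost prefix on name, so every entry has to be examined."""
    hits, examined = [], 0
    for entry in index:
        examined += 1
        if entry[1] == workcode:
            hits.append(entry)
    return hits, examined

hits, examined = scan_for_workcode(index, "20190169")
print(workcodes, workcode_sorted)
print(hits, examined)
```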

In addition, because the index is sorted by (name, workcode), the entries for a given name are already ordered by workcode, so the following SQL can still use the composite index (name, workcode) to improve query efficiency:

select * from test where name = 'zhang' order by workcode;
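The ORDER BY case follows from the same model: once the seek has found the run of entries with name = 'zhang', that run is already ordered by workcode, so the rows can be returned in the order they are read, with no separate sort step. The data and the helper `where_name_order_by_workcode` are hypothetical; this is a sketch of the idea, not MySQL's implementation.

```python
import bisect

# Hypothetical leaf entries of the (name, workcode) index: (name, workcode, id).
index = sorted([
    ("zhang", "20200002", 5),
    ("li", "20190001", 4),
    ("zhang", "20180001", 3),
    ("wang", "20190169", 2),
    ("zhang", "20190169", 1),
])

def where_name_order_by_workcode(index, name):
    """WHERE name = ? ORDER BY workcode, answered by a seek plus a forward scan."""
    pos = bisect.bisect_left(index, (name,))
    rows = []
    while pos < len(index) and index[pos][0] == name:
        rows.append(index[pos])  # read in index order, i.e. already by workcode
        pos += 1
    return rows  # no sorted() call needed

rows = where_name_order_by_workcode(index, "zhang")
print([r[1] for r in rows])  # ['20180001', '20190169', '20200002']
```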

For the same reason, the (name, workcode) order does not give a (workcode, name) order, so the following SQL cannot use the composite index (name, workcode):

select * from test where workcode = '20190169' order by name;

At this point you have mastered the basics of composite indexes. Next, let's look at how to design composite indexes correctly in real business scenarios.

Index design in business practice

Avoiding extra sorting

In real business scenarios you will often query by some column and then display the results in reverse chronological order.

For example, in the Weibo business, a user's feed queries the posts the user subscribes to by user ID and then displays them newest first; in e-commerce, a user's order page queries the user's orders by user ID and then displays them in reverse order of purchase time.

Next, let's look at a real business opportunity table from our production system. The fields have been simplified to keep only a few key ones, and more than 700,000 rows were initialized directly for testing.

CREATE TABLE t_opp_base
(
    id                  int            primary key auto_increment,
    opp_code            varchar(50)    NOT NULL,  -- business opportunity code
    opp_name            varchar(200)   NOT NULL,
    principal_user      varchar(50)    NOT NULL,  -- person in charge
    opp_status          char(1)        NOT NULL,
    opp_amount          decimal(15,2)  NOT NULL,
    opp_date            date           NOT NULL,
    opp_priority        char(15)       NOT NULL,
    remark              varchar(79)    NOT NULL,
    KEY `idx_opp_code` (opp_code),
    KEY `idx_principal_user` (principal_user)
);

Where:

  • The field id is a primary key of type INT;

  • The fields opp_code and principal_user each have a single-column index, because many queries filter on them;

  • The fields opp_date, opp_status, opp_amount and opp_priority hold the basic details of a business opportunity: its date, its current status, its total value and its priority, respectively.

With this table in place, when a user views the business opportunities that javadaily is responsible for, sorted by opportunity date, the following SQL can be used:

select * from t_opp_base where principal_user = 'javadaily' order by opp_date DESC;

However, with the index design of the table above, the index idx_principal_user only sorts the column principal_user, so after fetching the user's rows an additional sort is required to produce the result. You can confirm this by checking the execution plan with EXPLAIN:

 

The execution plan above shows that the SQL statement does use the index idx_principal_user, but Using filesort in the Extra column indicates that an additional sort is needed to produce the final result.

Since the column principal_user is indexed, the statement above will not run particularly slowly. Under high concurrency, however, every execution has to sort, which noticeably hurts business performance: CPU load goes up and QPS goes down.

To solve this problem, the best approach is to have the results come out already sorted by opp_date, so that no additional sort is needed.

So we create a new composite index, idx_principal_oppdate, on the table t_opp_base, on the columns (principal_user, opp_date):

create index idx_principal_oppdate
    on t_opp_base (principal_user,opp_date);

Now run the same SQL as before, which lists the business opportunities the person in charge is responsible for, sorted by time. The execution plan is:


In this way we eliminate Using filesort and improve execution efficiency.
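If you want to reproduce the effect without a MySQL instance, here is a sketch using Python's built-in sqlite3 module. It is a stand-in, not a reproduction of the plans above: SQLite is not MySQL, its EXPLAIN QUERY PLAN output looks different, and it reports an extra sort as "USE TEMP B-TREE FOR ORDER BY" rather than Using filesort. The trimmed-down table with SQLite column types is an assumption.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database with a trimmed t_opp_base.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_opp_base (
        id             INTEGER PRIMARY KEY,
        principal_user TEXT NOT NULL,
        opp_date       TEXT NOT NULL,
        opp_amount     NUMERIC NOT NULL
    )""")
conn.execute("CREATE INDEX idx_principal_user ON t_opp_base (principal_user)")

QUERY = ("SELECT * FROM t_opp_base WHERE principal_user = 'javadaily' "
         "ORDER BY opp_date DESC")

def plan(sql):
    """Return the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # single-column index: SQLite adds a temp B-tree sort step
conn.execute("CREATE INDEX idx_principal_oppdate "
             "ON t_opp_base (principal_user, opp_date)")
after = plan(QUERY)   # composite index: rows come out in opp_date order, no sort step

print(before)
print(after)
```

On an empty table SQLite plans from its default statistics, which is enough here because the only question is whether a sort step appears in the plan.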

Covering indexes: avoiding returning to the table

Basic concept:

When a query goes through a secondary index, it first reads the primary key value from the secondary index, then searches the primary key index with that value, and only then locates the complete row. This process is called returning to the table. However, since the leaf nodes of a secondary composite index contain both the index key values and the primary key value, if every queried field is already in the leaf nodes of the secondary index, the result can be returned directly without returning to the table. This optimization, avoiding table lookups by means of a composite index, is called index coverage (a covering index).
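The returning-to-the-table process can be modeled in a few lines of Python. This is a toy model under stated assumptions, not InnoDB internals: a dict stands in for the primary key (clustered) index, and each secondary index entry holds its key columns followed by the id. The hypothetical helper `query` counts how often it has to go back to the "table"; the three sample rows are made up, and for simplicity it filters all entries instead of seeking.

```python
# Toy clustered index: primary key id -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "principal_user": "javadaily", "opp_date": "2021-06-01", "opp_amount": 100},
    2: {"id": 2, "principal_user": "javadaily", "opp_date": "2021-06-03", "opp_amount": 250},
    3: {"id": 3, "principal_user": "other", "opp_date": "2021-06-02", "opp_amount": 75},
}

def build_index(cols):
    """Secondary index leaf entries: the key columns followed by the primary key."""
    return sorted(tuple(row[c] for c in cols) + (row["id"],) for row in clustered.values())

def query(index_cols, wanted, user):
    """SELECT <wanted> WHERE principal_user = user; returns (rows, table lookups)."""
    rows, lookups = [], 0
    for entry in build_index(index_cols):
        if entry[0] != user:  # simplification: filter instead of a B+ tree seek
            continue
        in_index = dict(zip(index_cols, entry))
        if all(c in in_index for c in wanted):
            rows.append(tuple(in_index[c] for c in wanted))  # covered by the index
        else:
            lookups += 1  # return to the table via the primary key
            full_row = clustered[entry[-1]]
            rows.append(tuple(full_row[c] for c in wanted))
    return rows, lookups

WANTED = ("principal_user", "opp_date", "opp_amount")
rows_a, lookups_a = query(("principal_user", "opp_date"), WANTED, "javadaily")
rows_b, lookups_b = query(("principal_user", "opp_date", "opp_amount"), WANTED, "javadaily")
print(lookups_a, lookups_b)
```

Both indexes return the same rows; only the number of table lookups differs, and that number grows with every matching entry.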

For example, consider the following SQL:

select principal_user, opp_date, opp_amount from t_opp_base where principal_user = 'javadaily';

View its execution plan:

-> Index lookup on t_opp_base using idx_principal_oppdate (principal_user='javadaily')  (cost=312.51 rows=321) (actual time=0.452..0.908 rows=321 loops=1)

The execution plan shows that the previously created composite index idx_principal_oppdate is used. However, the leaf nodes of this index contain only the values of (principal_user, opp_date, id), not opp_amount, so for each entry the query has to return to the table through id to fetch the corresponding opp_amount.

The execution plan shows an execution cost of 312.51. (cost=312.51 is the optimizer's estimated cost for this SQL. You don't need to worry about the unit of cost; just remember that the smaller the cost, the lower the overhead and the faster the execution.)

To avoid returning to the table, you can apply the covering index technique and create a composite index on (principal_user, opp_date, opp_amount):

alter table t_opp_base add index
 idx_principal_oppdate_amount(principal_user,opp_date,opp_amount);

Check the execution plan again:

-> Index lookup on t_opp_base using idx_principal_oppdate_amount (principal_user='javadaily')  (cost=41.52 rows=321) (actual time=0.149..0.337 rows=321 loops=1)

The execution cost has dropped significantly, from 312.51 to 41.52, and the execution efficiency has been greatly improved.

 

You can see that the execution plan selects the index idx_principal_oppdate_amount, and the Extra column shows Using index, which means the covering index is used.

The SQL above returns 321 rows in total, which means that before the covering index was used, it had to return to the table 321 times: for every entry read from the secondary index, the field opp_amount had to be fetched through the primary key. With the covering index there is no need to return to the table at all, which saves those 321 lookups and explains why the execution cost dropped so much.
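The same before/after comparison can be sketched with Python's built-in sqlite3 module, again as a stand-in for MySQL rather than a reproduction of the plans above. SQLite marks index-only access with the words COVERING INDEX in EXPLAIN QUERY PLAN, which plays roughly the role of Using index in MySQL's Extra column. The trimmed-down table and the SQLite column types are assumptions.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database with a trimmed t_opp_base.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_opp_base (
        id             INTEGER PRIMARY KEY,
        principal_user TEXT NOT NULL,
        opp_date       TEXT NOT NULL,
        opp_amount     NUMERIC NOT NULL
    )""")
conn.execute("CREATE INDEX idx_principal_oppdate "
             "ON t_opp_base (principal_user, opp_date)")

QUERY = ("SELECT principal_user, opp_date, opp_amount FROM t_opp_base "
         "WHERE principal_user = 'javadaily'")

def plan(sql):
    """Return the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # opp_amount is missing from the index: table lookups needed
conn.execute("CREATE INDEX idx_principal_oppdate_amount "
             "ON t_opp_base (principal_user, opp_date, opp_amount)")
after = plan(QUERY)   # every selected column is in the index: covering index

print(before)
print(after)
```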

Next, let's look at this SQL:

select principal_user, sum(opp_amount) from t_opp_base group by principal_user;

This SQL groups the data by the person in charge of each business opportunity and sums the total value of the opportunities each person is responsible for, so that each person in charge can be evaluated.
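Why a covering index helps this aggregation can be sketched in Python: if the index entries are already sorted by principal_user and also contain opp_amount, the sum per person can be computed in one sequential pass over the index without touching the table. This illustrates the idea with made-up entries and a hypothetical helper `sum_amount_by_user`; it is not a claim about the exact GROUP BY algorithm MySQL picks.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical leaf entries of idx_principal_oppdate_amount:
# (principal_user, opp_date, opp_amount, id), kept sorted by the key columns.
index_entries = sorted([
    ("javadaily", "2021-06-01", 100, 1),
    ("alice", "2021-05-20", 40, 4),
    ("javadaily", "2021-06-03", 250, 2),
    ("alice", "2021-05-21", 60, 5),
    ("bob", "2021-06-02", 75, 3),
])

def sum_amount_by_user(entries):
    """GROUP BY principal_user over entries already sorted by principal_user.
    Equal keys are adjacent, so one sequential pass suffices, and both needed
    columns (principal_user, opp_amount) are in each entry: no table lookups."""
    return [(user, sum(e[2] for e in group))
            for user, group in groupby(entries, key=itemgetter(0))]

print(sum_amount_by_user(index_entries))
# [('alice', 100), ('bob', 75), ('javadaily', 350)]
```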

To let you feel the power of index coverage directly, I first drop the previously created index idx_principal_oppdate_amount:

ALTER TABLE t_opp_base
drop INDEX idx_principal_oppdate_amount;

View its execution plan:

 

 

As you can see, the optimizer chooses the index idx_principal_oppdate for this SQL, but because that index does not contain the field opp_amount, the query has to return to the table. Judging from the rows estimate, it returns to the table about 717,912 times. You can also see that the execution cost is 76850.31 and the execution time is 10.9 seconds.

Then we add the composite index idx_principal_oppdate_amount back:

alter table t_opp_base add index
 idx_principal_oppdate_amount(principal_user,opp_date,opp_amount);

Check the execution plan again:

 

This time the execution plan uses the composite index idx_principal_oppdate_amount, and the Using index note shows that the covering index is in effect. The execution time drops to 1.74 s, a large improvement in SQL performance.

 

That is the power of covering indexes, and this is with only about 700,000 rows in the t_opp_base table. The more rows t_opp_base holds, the more table lookups there are to save, and the more obvious the performance gain from a covering index becomes.
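For readers who want to try this without MySQL, the GROUP BY case can also be checked with Python's sqlite3 module. As before, SQLite is only a stand-in: it does not report costs or timings like the plans above, but its EXPLAIN QUERY PLAN shows whether the aggregation is answered from a COVERING INDEX. The simplified table layout is an assumption.

```python
import sqlite3

# Stand-in for MySQL: an in-memory SQLite database with a trimmed t_opp_base.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t_opp_base (
        id             INTEGER PRIMARY KEY,
        principal_user TEXT NOT NULL,
        opp_date       TEXT NOT NULL,
        opp_amount     NUMERIC NOT NULL
    )""")
conn.execute("CREATE INDEX idx_principal_oppdate "
             "ON t_opp_base (principal_user, opp_date)")

QUERY = ("SELECT principal_user, sum(opp_amount) FROM t_opp_base "
         "GROUP BY principal_user")

def plan(sql):
    """Return the human-readable detail column (the last one) of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(QUERY)  # idx_principal_oppdate lacks opp_amount, so it cannot cover
conn.execute("CREATE INDEX idx_principal_oppdate_amount "
             "ON t_opp_base (principal_user, opp_date, opp_amount)")
after = plan(QUERY)   # expected: a scan of the covering index, already ordered by user

print(before)
print(after)
```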

Summary

A composite index is also a B+ tree; the only difference is that its key consists of multiple columns. A composite index can be either a primary key index or a secondary index. Composite indexes have three main advantages:

  • They serve multiple query conditions: an (a, b) index serves both WHERE a = ? and WHERE a = ? AND b = ?;

  • They let SQL avoid an additional sort, which improves performance, for queries such as WHERE a = ? ORDER BY b;

  • Because a composite index contains multiple columns, it can act as a covering index and improve SQL query performance. Used well, covering indexes can easily improve performance by 10 times.

Well, that's all for today's article. I hope it helps you create composite indexes sensibly in real projects and improve query efficiency. Finally, I'm Piao Miao Jam, an architect who writes code and a programmer who does architecture, and I look forward to your follow. See you next time!

This article is shared from the WeChat public account - JAVA Rizhilu (javadaily).

Origin my.oschina.net/u/1388595/blog/5136464