MySQL indexing: lessons from a slow query optimization

The slow query log

 

MySQL indexes are a double-edged sword. Used properly, they bring great performance improvements to a system; used improperly, they can have disastrous consequences. The most frightening part is that the damage is hard to detect early on: only as the data volume grows and a business peak approaches does the problem suddenly surface.

 

A system I am in charge of ran into exactly this problem this week. Thankfully it was handled in time, without catastrophic consequences. I am taking time today to record what happened and to remind myself for the future: new team members must learn the basic rules for writing SQL, and every release must go through code review.

 

This system serves a new business and had been running for some time. As the 618 promotion approached, traffic increased and the problem surfaced. While working overtime on Wednesday night, I suddenly received a message from the DBA: the CPU utilization of the server hosting our MySQL instance had risen sharply over the past 20 minutes, approaching 90% (the server hosts multiple databases, one per application). The incident came out of nowhere, so I immediately asked the DBA for the slow query log to analyze. The log is as follows (business information has been redacted):

[slow query log screenshot, business details redacted]

As can be seen from the slow query log, the most expensive statement is the SELECT against xxx_pc_act_profile, which executed 7618 times within 26 minutes (time range: 2017-05-31 20:20:02 to 20:46:04), averaging about 113 ms per execution, which had reached an intolerable level. And, unfortunately, the table does belong to our system.

 

Problem-solving steps

 

Step 1: Stop the service

Since the MySQL server also hosts databases for other applications, the first decision, made to keep other businesses from being affected, was to immediately stop our subsystem's service (weighing the scope of impact). Observing the MySQL server again, CPU usage had returned to normal, which further confirmed that this business was indeed the cause.

 

Step 2: Preliminary determination of indexing problems

The problematic statement is a SELECT:

      SELECT
          xxx, xxx, xxx, xxx
      FROM xxx_pc_act_profile
      WHERE start_time <= '2017-05-31 20:30:00'
        AND end_time >= '2017-05-31 20:30:00'
        AND valid_flag = 1
        AND status = 1
        AND brandIds = '94924'
      ORDER BY weight DESC

 

As you can see, the WHERE clause carries many conditions, plus an ORDER BY. For a SELECT statement causing this kind of performance problem, you can be 99% sure the indexes are not set up properly.

 

Step 3: Analyze the execution plan and index hits

View the execution plan: EXPLAIN SELECT xxx FROM xxx_pc_act_profile WHERE xxx;

The plan showed that the query hit the index 'idx_status', which looked like an index on a status field. Further checking confirmed that status is indeed a status flag (0: normal, 1: offline).
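
For reference, here is a minimal sketch of that check (the statement mirrors the redacted query above; the plan fields shown are illustrative placeholders, not the actual production output):

      EXPLAIN SELECT xxx, xxx, xxx, xxx
      FROM xxx_pc_act_profile
      WHERE start_time <= '2017-05-31 20:30:00'
        AND end_time >= '2017-05-31 20:30:00'
        AND valid_flag = 1
        AND status = 1
        AND brandIds = '94924'
      ORDER BY weight DESC\G

      -- Illustrative shape of the output:
      --          type: ref
      -- possible_keys: idx_start_time,idx_end_time,idx_status,idx_valid_flag,...
      --           key: idx_status              <-- the low-cardinality index won
      --          rows: <a large fraction of the table>
      --         Extra: Using where; Using filesort

A key of idx_status combined with a large rows estimate and "Using filesort" is exactly the signature of this problem.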

At this point the root cause was located: an index had been wrongly created on a "low-cardinality column".

 

Step 4: Examine the table's index definitions

CREATE TABLE `xxx_pc_act_profile` (

  -- omit fields

  PRIMARY KEY (`id`),

  KEY `idx_url` (`url`),

  KEY `idx_third_cate` (`third_cate`),

  KEY `idx_start_time` (`start_time`),

  KEY `idx_end_time` (`end_time`),

  KEY `idx_status` (`status`),

  KEY `idx_valid_flag` (`valid_flag`),

  KEY `idx_pre_cate_level` (`pre_cate_level`),

  KEY `idx_confirm_flag` (`confirm_flag`),

  KEY `idx_last_publish_date` (`last_publish_date`),

  KEY `idx_valid_query` (`start_time`,`end_time`,`status`,`valid_flag`)

) ENGINE=InnoDB COMMENT='xxx activity portrait table';

 

Seeing which indexes had been created, I was amazed. A preliminary list of problems:

1. Too many indexes have been created (each ordinary index is a B-tree that takes its own storage space).

2. Indexes are created on low-cardinality columns, such as status and valid_flag.

3. Indexes are created directly on string columns, such as third_cate.

4. An index is created on a useless field: the url index serves no purpose at all.

Several other problems were found as well.
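
For problems 1 and 2, a quick way to confirm that an index sits on a low-cardinality column is to compare its Cardinality in SHOW INDEX with the table's row count, or to measure selectivity directly (a sketch, using the table above):

      SHOW INDEX FROM xxx_pc_act_profile;  -- compare the Cardinality column with the row count

      -- Selectivity close to 0 means almost every row shares the same value:
      SELECT COUNT(DISTINCT status) / COUNT(*) AS selectivity
      FROM xxx_pc_act_profile;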

 

At this point I suddenly realized that I bore much of the responsibility: training for new colleagues usually stops at Java coding, especially now that most businesses have a Redis cache shielding the upper layer. We had never organized any study of basic SQL standards, and the pre-release code review did not cover SQL.

 

The basic principles of MySQL index creation

 

Using this negative example as a textbook, here is a quick summary of the precautions (basic principles) to observe when creating an index:

1. Do not create indexes on low-cardinality columns: they waste index storage space without improving query efficiency.

2. Try not to build indexes on frequently modified fields; doing so raises the cost of writes and increases the probability of deadlocks. In this example, no index is added to the weight field.

3. Delete redundant indexes. Any index that is never used should be removed to avoid wasting space; the url index in this example is useless.

4. Do not create too many indexes: every insert must also maintain each index, so too many indexes degrade insert performance. After optimization, this table kept only two indexes.

5. Do not create an index on a column that allows NULL. If a value can be NULL, it is better to replace it with a constant such as 1 or -1. In this example, the start_time and end_time fields were changed to NOT NULL.

6. For a multi-condition query, do not create a separate index per condition field; create a composite index instead, because MySQL will generally use only one index per table for the query.

7. When creating a composite index, mind the leftmost-prefix matching rule and try to maximize reusability: a composite index index(a, b, c) effectively also serves as index(a) and index(a, b) (see the sketch after this list).

8. In a composite index, put the most selective column first (if this conflicts with the reusability consideration of point 7, balance the two according to the business).
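
A minimal sketch of points 6 to 8 combined, using columns from the query above (the column order here is illustrative; the real choice must come from the business queries):

      -- One composite index instead of one index per condition; the most
      -- selective equality column first, the range column last:
      ALTER TABLE xxx_pc_act_profile
        ADD INDEX idx_brand_status_time (brandIds, status, start_time);

      -- Leftmost-prefix matching means this one index also serves:
      --   WHERE brandIds = ?
      --   WHERE brandIds = ? AND status = ?
      -- but not:
      --   WHERE status = ? AND start_time <= ?   (skips the leftmost column)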

 

After consulting other references, a few more points:

 

9. It is best to use an auto-increment primary key to keep inserts sequential (InnoDB's primary key is a B+tree by default, and the row data lives in the same tree as the index); do not use UUIDs, hashes, MD5 values, etc.

10. Use foreign keys sparingly: they make two tables affect each other whenever data changes. Enforce the relationship in business logic wherever possible.

11. Do not use LIKE with a leading wildcard, which makes the index unusable; a trailing-wildcard LIKE such as 'xxx%' can still use it.

12. When creating indexes on string columns, prefer prefix indexes. The prefix length should balance selectivity against the index's storage cost according to the specific business (see the sketch after this list).

13. Avoid NOT IN and NOT LIKE, which make the index unusable; NOT IN can usually be rewritten as NOT EXISTS. It is best to have an index on the columns used in IN and OR conditions.
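
A sketch of points 11 and 12, using the third_cate column from the table above (the prefix length 10 is an assumed value that must be tuned per business):

      -- Prefix index: index only the first 10 characters of the string column.
      ALTER TABLE xxx_pc_act_profile ADD INDEX idx_third_cate (third_cate(10));

      -- Can use the index (trailing wildcard):
      SELECT xxx FROM xxx_pc_act_profile WHERE third_cate LIKE 'abc%';

      -- Cannot use the index (leading wildcard):
      SELECT xxx FROM xxx_pc_act_profile WHERE third_cate LIKE '%abc';

      -- How close does a 10-character prefix get to full-column selectivity?
      SELECT COUNT(DISTINCT LEFT(third_cate, 10)) / COUNT(DISTINCT third_cate)
      FROM xxx_pc_act_profile;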

 

(P.S. This is a summary by an ordinary Java developer, not a DBA; I hope DBA experts will fill in whatever is incomplete.)

 

In this incident, the indexes were optimized according to the principles above: all of the previous indexes were removed, and only two composite indexes were created based on the specific business (other queries can reuse prefixes of these composite indexes).

(P.S. The actual repair steps: create a new table with the same fields, create the new indexes on it, then synchronize the data from the old table into the new one.)
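
A minimal sketch of that repair, assuming the table and index names above (a real migration also has to deal with writes arriving during the copy, e.g. with an online schema-change tool):

      -- 1. New table with the same fields; CREATE TABLE ... LIKE copies the
      --    old indexes too, so drop them and add only the lean composite ones:
      CREATE TABLE xxx_pc_act_profile_new LIKE xxx_pc_act_profile;
      ALTER TABLE xxx_pc_act_profile_new
        DROP INDEX idx_url, DROP INDEX idx_third_cate, DROP INDEX idx_start_time,
        DROP INDEX idx_end_time, DROP INDEX idx_status, DROP INDEX idx_valid_flag,
        DROP INDEX idx_pre_cate_level, DROP INDEX idx_confirm_flag,
        DROP INDEX idx_last_publish_date, DROP INDEX idx_valid_query;
      -- (add the two new composite indexes here, per the business queries)

      -- 2. Copy the data across:
      INSERT INTO xxx_pc_act_profile_new SELECT * FROM xxx_pc_act_profile;

      -- 3. Swap the tables:
      RENAME TABLE xxx_pc_act_profile TO xxx_pc_act_profile_old,
                   xxx_pc_act_profile_new TO xxx_pc_act_profile;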

 

With that, the problem was solved. Some business data was lost in the meantime, but fortunately this is a peripheral system and the loss stayed within a controllable range.

 

A new problem: Duplicate entry for key 'PRIMARY'

 

While fixing the index problem above, I kept feeling that the way this table's primary key is created would cause trouble. By the primary-key principle above ("it is best to use an auto-increment primary key"), the primary key of this table does not satisfy the rule:

  PRIMARY KEY (`id`),

 

After the fix, we kept checking the logs from time to time, and sure enough there was a new discovery. The log occasionally reported errors:

Duplicate entry 'xxx' for key 'PRIMARY'

 

The problem is obvious: when inserting data, the primary key id 'xxx' already exists, so a duplicate-key write conflict is reported. I immediately pulled up the code to analyze it. Here the primary key id is actually the primary key of another table X: the business queries table X, processes the records that meet its conditions, and inserts them into this new table. The code logic is as follows:

 

// ------ begin Spring transaction (code omitted) ------
XxxPcActProfile oldInfo = xxxDao.getById(newInfo.getId()); // first check whether a record with this primary key exists
if (oldInfo == null) {
    xxxDao.insert(newInfo);  // does not exist: insert
} else {
    xxxDao.update(newInfo);  // already exists: update
}
// ------ commit Spring transaction (code omitted) ------

 

At first glance there is nothing wrong with this code: inside a transaction, how could the insert hit a "primary key conflict"?

In fact, do not be misled by the transaction: the initial SELECT is a plain non-locking read, so under high concurrency two transactions can both conclude the row does not exist. Let's trace two insert requests carrying the same id:

 

     Step   Transaction 1                        Transaction 2
     1      Check whether id 123 exists          Check whether id 123 exists
     2      Result: does not exist, so insert    Result: does not exist, so insert
     3      Insert the record with id 123        Wait (blocked by Transaction 1)
     4      Insert completed                     Insert fails: id 123 already exists
     5      Close the transaction                Close the transaction

 

Because MySQL's insert is an indivisible, atomic operation, one of the two inserts must wait for the other to complete. The later one then finds the key already present, which produces the "primary key conflict" exception above.

 

The consequence of this bug: when the row turns out to exist already, the code should fall through to the update, but instead the conflicting insert throws an exception and the update never runs, so the modified data is lost. This happens most often under high concurrency.

 

The final solution: use an insert-or-update-on-duplicate statement, which solves the problem. The syntax is as follows:

 

INSERT INTO table (xx,xx,xx) VALUES (xx,xx,xx) ON DUPLICATE KEY UPDATE ....
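
A concrete sketch with hypothetical columns (name and weight are placeholders, not the actual production columns):

      -- The whole check-then-insert-or-update collapses into one atomic statement:
      INSERT INTO xxx_pc_act_profile (id, name, weight)
      VALUES (123, 'xxx', 10)
      ON DUPLICATE KEY UPDATE
          name   = VALUES(name),    -- duplicate primary key: update instead
          weight = VALUES(weight);

Because the conflict is resolved inside a single statement, the race between the two transactions traced above disappears.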

 

With that, the problem was solved. But the scare lingers: it is imperative to strengthen SQL writing standards and to put every piece of SQL through code review.

 

 

 
