Basic ideas of SQL tuning
SQL Tuning for an Operations System Serving Tens of Millions of Users
This article is a summary of the course "Take You from Zero to MySQL Optimization Master in Practice".
At an Internet company, the operations system needs to filter out a large set of users and then push messages to them. The SQL executed is as follows:
users stores the core user data, such as id, name, and nickname.
users_extent_info stores extended user information, such as home address, hobbies, and last login time.
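The original statement is not preserved here; based on the description, it likely had the following shape (the column user_id and the filter value are assumptions inferred from the context):

```sql
-- Hypothetical sketch of the push-filtering query described above;
-- the exact columns and the xxxx placeholder are assumptions.
SELECT id, name
FROM users
WHERE id IN (
    SELECT user_id
    FROM users_extent_info
    WHERE latest_login_time < xxxx
);
```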
Of course, we first have to count the number of matching users, so we execute the following SQL:
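A plausible sketch of that count statement, assuming the same subquery structure as the push query (column names and the xxxx placeholder are assumptions):

```sql
-- Hypothetical count query matching the description.
SELECT COUNT(id)
FROM users
WHERE id IN (
    SELECT user_id
    FROM users_extent_info
    WHERE latest_login_time < xxxx
);
```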
Against a large table with tens of millions of rows, the SQL above takes tens of seconds to run. The corresponding execution plan is as follows:
First, look at the third row of the execution plan: the select_type is MATERIALIZED, indicating that the subquery has been materialized.
Then note that the first and second rows of the execution plan have the same id value, indicating that the materialized table is joined with the users table.
Since it is a join, when MySQL generated the execution plan it automatically optimized the ordinary IN clause into a semi-join based join operation.
Both the driving table and the driven table of this join are full table scans, which is the reason for the poor performance.
Let's verify this idea. First execute set optimizer_switch='semijoin=off' to turn off semi-join optimization, then run explain again: the execution plan returns to a normal state.
That is, there is a SUBQUERY subquery scanned with a range scan, and a PRIMARY main query that looks up rows directly via the clustered primary-key index. Running the SQL again, performance improves by dozens of times, down to hundreds of milliseconds.
Of course, this setting cannot be changed casually in a production environment, so instead we rewrite the SQL so that semi-join optimization is not triggered.
An OR condition is added to the original statement, but the OR condition can never be true, because no row has a latest_login_time less than -1. With the OR condition added, the statement no longer meets the requirements for semi-join optimization, so MySQL does not apply it and falls back to a normal subquery.
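The rewrite likely looked something like this (a sketch assuming the count query above; the xxxx placeholder and exact parenthesization are assumptions):

```sql
-- Sketch of the rewrite: the added OR branch can never match
-- (no latest_login_time is less than -1), but its presence stops
-- MySQL from converting the IN clause into a semi-join.
SELECT COUNT(id)
FROM users
WHERE id IN (
        SELECT user_id
        FROM users_extent_info
        WHERE latest_login_time < xxxx
    )
    OR (
        id IN (
            SELECT user_id
            FROM users_extent_info
            WHERE latest_login_time < -1
        )
    );
```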
SQL Tuning Practice for a Billion-Row Commodity System
A slow query was found in the database monitoring system.
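The statement itself is not reproduced here; based on the description below, it likely had this shape (the table name products, the column names, and the LIMIT values are assumptions):

```sql
-- Hypothetical shape of the slow query: filter by category and
-- subcategory, sort by id descending, then paginate.
SELECT *
FROM products
WHERE category = 'xx'
  AND sub_category = 'xx'
ORDER BY id DESC
LIMIT xx, xx;
```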
This is a very simple statement: it filters by product category and subcategory, sorts by id in descending order, and finally paginates. Yet it takes tens of seconds to execute.
The database connections were basically all occupied by slow queries: each connection had to spend tens of seconds on one SQL statement before it could execute the next, so the database was effectively unusable.
In theory, with the category index the query should be very fast, so let's run explain on it.
Our index_category appears in possible_keys, but the query does not actually use this index; it uses PRIMARY instead.
Using the force index syntax to force the SQL to use the index you specify, the statement now takes only about 100 ms when executed again; performance recovers instantly.
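Applied to the sketch of the slow query above, the fix would look roughly like this (index_category is the index named in the text; the other names are assumptions):

```sql
-- Same query with FORCE INDEX, steering the optimizer away from
-- the PRIMARY scan it chose on its own.
SELECT *
FROM products FORCE INDEX (index_category)
WHERE category = 'xx'
  AND sub_category = 'xx'
ORDER BY id DESC
LIMIT xx, xx;
```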
SQL Tuning Practice for a Billion-Row Order Comment System
The SQL is a pagination query on the comments table.
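A plausible sketch of that query, using the conditions quoted below (the LIMIT offset is illustrative; deep pagination like this is what makes the query slow):

```sql
-- Hypothetical deep-pagination query on the comments table.
SELECT *
FROM comments
WHERE product_id = 'xx'
  AND is_good_comment = '1'
ORDER BY id DESC
LIMIT 100000, 20;
```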
The two conditions where product_id = 'xx' and is_good_comment = '1' are not covered by a composite index, so a large number of table lookups (back-to-table operations) inevitably occur, which is extremely time-consuming.
Rewriting the above SQL completely changes its execution plan. The subquery in parentheses executes first: it scans the PRIMARY clustered index in reverse id order and filters out the rows matching the conditions where product_id = 'xx' and is_good_comment = '1'.
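The rewrite described here likely took the following derived-table form (a sketch; the LIMIT values mirror the hypothetical original above):

```sql
-- Sketch of the rewrite: the derived table pages over ids only,
-- walking the clustered index in reverse, then joins back to
-- fetch the full rows for just the final 20 ids.
SELECT *
FROM comments a
JOIN (
    SELECT id
    FROM comments
    WHERE product_id = 'xx'
      AND is_good_comment = '1'
    ORDER BY id DESC
    LIMIT 100000, 20
) b ON a.id = b.id;
```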
In the execution plan you will see that the subquery's result set becomes a temporary table; MySQL does a full scan of it to get the 20 rows, then for each of those 20 rows it looks up the complete row in the clustered index by id. That's it.
Summary
The IN clause was optimized into a semi-join, and the resulting two-table join executed inefficiently; rewriting the SQL turned it back into a SUBQUERY subquery.
The SQL statement did not use the right index; use force index to force the index to be used.
Deep pagination caused a large number of back-to-table operations; rewrite the statement as a derived-table query.
Reference blog
[1]https://mp.weixin.qq.com/s/2ATCvniADrxyb0MhV5k3EQ