MySQL slow query analysis and optimization

What is a slow query?

A slow query is, as the name suggests, a query that executes slowly. When a SQL statement's execution time exceeds the threshold set by the long_query_time parameter (10 seconds by default), it is considered a slow query and needs to be optimized. Slow queries are recorded in the slow query log.

The slow query log is disabled by default. If you need to optimize SQL statements, you can enable it, which makes it easy to see which statements need optimization.

Slow query configuration

Taking the MySQL database as an example, the slow query feature is disabled by default. When the slow query switch is turned on and an executed SQL statement reaches the configured threshold, the statement is written to the slow query log.

  • slow query log

    Check whether the slow query log is enabled:
    show variables like 'slow_query_log';

    Enable the slow query log:
    set global slow_query_log = 1;  -- or ON

    Disable the slow query log:
    set global slow_query_log = 0;  -- or OFF
    
  • Whether to log queries that do not use indexes

    Check whether queries not using indexes are recorded in the slow query log:
    show variables like 'log_queries_not_using_indexes';

    Enable logging of queries that do not use indexes:
    set global log_queries_not_using_indexes = 1;  -- or ON

    Disable logging of queries that do not use indexes:
    set global log_queries_not_using_indexes = 0;  -- or OFF
    
  • Slow query time threshold

    Check how many seconds a query must exceed to be recorded in the slow query log:
    show variables like 'long_query_time';

    Record queries that take longer than X seconds:
    set global long_query_time = X;
    
  • The parameter settings above only affect the running server and become invalid after MySQL restarts. To make them take effect permanently, you must modify the configuration file my.cnf.
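
    For example, a my.cnf fragment that persists these settings might look like the following sketch (the 2-second threshold is an assumption; the log path reuses the one shown later in this article, so adjust both to your installation):

    ```ini
    [mysqld]
    # Enable the slow query log and keep it across restarts
    slow_query_log = 1
    slow_query_log_file = /data/mysql/data/dcbi-3306/log/slow.log
    # Log statements that run longer than 2 seconds (assumed value)
    long_query_time = 2
    # Also log statements that do not use any index
    log_queries_not_using_indexes = 1
    ```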

  • slow query log path

    Check the path of the MySQL slow query log:
    show variables like 'slow_query_log_file%';
    
Slow query log analysis: mysqldumpslow tool

Taking MySQL as an example, the mysqldumpslow tool is generally used to analyze the slow query log and find the slow SQL statements.

To list the 10 SELECT statements with the longest execution time:

    mysqldumpslow -s t -t 10 -g 'select' /data/mysql/data/dcbi-3306/log/slow.log

Here -s t sorts by query time, -t 10 keeps only the top 10 entries, and -g 'select' filters for statements containing "select".

Each entry of the output contains the following fields:

  • Count: how many times this SQL statement was executed
  • Time: execution time; the value in brackets is the cumulative time
  • Lock: lock wait time; the value in brackets is the cumulative time
  • Rows: number of records returned; the value in brackets is the cumulative count

With such a clear analysis of the slow query log, we can handle slow SQL statements in a more targeted and faster way, going straight to the corresponding place in the program and optimizing the code to avoid the slow query.

exists, not exists, in, and not in
  • In MySQL, an IN subquery is executed as a hash join between the outer table and the inner table, and the subquery in an IN condition must return exactly one column.

  • An EXISTS statement loops over the outer table, querying the inner table once for each outer row.

  • It is widely believed that EXISTS is more efficient than IN, but this claim is inaccurate; it depends on the situation.

    • If the two tables being queried are of similar size, there is little difference between using in and exists.

    • If one of the two tables is small and the other is large, use EXISTS when the subquery table is large and IN when the subquery table is small:

select * from small where title in (select title from big);

is not as efficient as

select * from small where exists (select title from big where big.title = small.title);

not in and not exists
  • If a query uses NOT IN, both the inner and outer tables are scanned in full and no index is used;
  • a NOT EXISTS subquery can still use indexes on the table, so regardless of table size, NOT EXISTS is faster than NOT IN;
  • NOT IN is essentially a range (!=) comparison, which cannot use any index: every record in table A must be checked against table B to see whether it exists there.
in and exists
  • When the table queried in the subquery (in parentheses) is large, EXISTS is more efficient;
  • when that table is small, IN is more efficient.
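
As a sketch, the two forms can be compared on your own data with EXPLAIN (small and big are the hypothetical tables from the example above):

```sql
-- Compare the execution plans of the two equivalent queries:
explain select * from small where title in (select title from big);

explain select * from small
 where exists (select 1 from big where big.title = small.title);
```

The plan output shows which table is scanned first and whether an index is used, so you can verify which form is cheaper for your actual table sizes.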
index failure

The cause of a slow query is almost always the SQL statement itself, typically because it scans a large amount of data, uses no index, or its index fails to take effect. The following are common situations in which an index fails:

  • Query statement using the LIKE keyword

    • In a query statement using the LIKE keyword, the index does not take effect if the matching pattern starts with "%"; the index is used only when "%" is not in the first position.
  • Query statement using multi-column index

    • MySQL can create an index across multiple fields; a single index can include up to 16 fields. A multi-column index is used only when the query condition includes the first of those fields, the so-called leftmost matching principle.

    • The leftmost matching principle: matching starts from the leftmost index column and proceeds through any consecutive prefix of the index; matching stops as soon as a range condition (>, <, between, like) is encountered.

      • For example, with an index created on (a, b), the condition b = 2 alone cannot use the index;
      • but a = 1 and b = 2 (or b = 2 and a = 1) can, because the optimizer automatically adjusts the order of a and b;
      • another example: with an index on (a, b, c, d), the condition a = 1 and b = 2 and c > 3 and d = 4 cannot use the index for d, because c is a range query and the fields after it stop matching.
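
The leftmost-prefix examples above can be tried directly with EXPLAIN. A minimal sketch, assuming a hypothetical table t:

```sql
-- Hypothetical table to illustrate the leftmost-prefix rule:
create table t (a int, b int, c int, d int, key idx_abcd (a, b, c, d));

explain select * from t where b = 2;
-- cannot use idx_abcd: a, the leftmost column, is missing

explain select * from t where a = 1 and b = 2;
-- uses idx_abcd on the (a, b) prefix

explain select * from t where a = 1 and b = 2 and c > 3 and d = 4;
-- uses idx_abcd only up to c; d is not matched because c is a range condition
```

The key_len column in the EXPLAIN output reveals how many index columns were actually matched in each case.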
SQL statement optimization
  • Query statements should avoid full table scans where possible. First consider building indexes on the columns in the WHERE and ORDER BY clauses, but note that each SQL statement generally uses at most one index per table.

    • Building too many indexes adds overhead to inserts and updates; also try to avoid indexing fields with low selectivity;
    • you can prefix a query with the EXPLAIN keyword to view its execution plan and check whether it uses an index;
  • Try to use EXISTS and NOT EXISTS instead of IN and NOT IN, because the latter may cause the optimizer to abandon indexes and fall back to a full table scan;

  • In the WHERE clause, try to avoid the following operations on a field:

    • NULL comparisons, because a NULL judgment leads to a full table scan;
    • OR as a connecting condition, because it also causes a full table scan;
    • the != or <> operators, which also cause a full table scan;
    • LIKE "%abc%" or LIKE "%abc", which cause a full table scan, whereas LIKE "abc%" can use an index;
    • expressions on the field, which cause a full table scan;
    • functions applied to the field, which also result in a full table scan.
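
A sketch of the last two points, rewriting an index-defeating predicate into an index-friendly one (the orders table and its columns are hypothetical; assume created_at and order_no are indexed):

```sql
-- A function on the indexed column defeats the index:
select * from orders where date(created_at) = '2023-01-01';

-- Rewriting it as a range on the bare column lets the index be used:
select * from orders
 where created_at >= '2023-01-01' and created_at < '2023-01-02';

-- A leading wildcard defeats the index:
select * from orders where order_no like '%123';

-- A prefix match can use the index:
select * from orders where order_no like 'ORD123%';
```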
  • When using the UNION operator, consider whether UNION ALL can be used instead. UNION sorts the merged results and removes duplicate records, while UNION ALL simply concatenates and returns the results; for applications that do not need deduplication, UNION ALL can greatly improve performance;

  • Try to avoid using "*" in SELECT statements, because "*" is expanded into the column names of all columns during SQL parsing, a lookup against the data dictionary that carries a certain overhead;

  • In the WHERE clause, write table join conditions before other conditions; since the WHERE clause is analyzed from back to front, put the condition that filters out the most records at the end of the WHERE clause;

  • If there is a joint index such as index(a,b,c) on the database table, the order of appearance of the condition fields in the Where clause should be consistent with the order of appearance of the index fields, otherwise the joint index cannot be used ;

  • The order of tables in the FROM clause also affects performance. The FROM clause is parsed from back to front, so the table written last is processed first; put the table with fewer records at the end as the base table, and when three or more tables are joined, use the intersection table (the one referenced by the others) as the base table;

  • Try to use the >= operator instead of the > operator

    • For example, the following SQL statement

      select dbInstanceIdentifier from DBInstance where id > 3;
      
    • This statement should be replaced with

select dbInstanceIdentifier from DBInstance where id >= 4;
      
    • The two statements return the same results, but their performance differs; the latter is more efficient.

    • This is because the former first locates the record equal to 3 and then scans onward, while the latter positions directly on the record equal to 4.

Table structure optimization

This mainly refers to how to build indexes correctly: unreasonable indexes lead to full table scans, while too many indexes bring performance overhead for inserts and updates;

  • First of all, it must be clear that each SQL statement can only use one index at most. If there are multiple indexes that can be used, the system will select an index to execute according to the execution cost;

  • For InnoDB tables, if the user does not specify a primary key, the system automatically generates a hidden primary key column, but this auto-generated column has several problems:

    • insufficient performance, as it cannot be read through the cache;
    • insufficient concurrency, as all tables without a primary key share one global auto-increment counter.
    • Therefore, for every InnoDB table, specify the primary key explicitly when creating the table.
  • Do not create indexes for fields with little distinction;

  • Build only one kind of index on a field; there is no need to create both a UNIQUE index and a plain INDEX on the same field.

  • For large text fields or BLOB fields, do not create indexes;

  • The join field of the join query should be indexed;

  • Sorting fields generally need to be indexed;

  • Group statistics fields generally need to be indexed;
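
A sketch of the three cases above, using hypothetical tables and column names:

```sql
-- Join field: index the column used to join orders to users
create index idx_order_user on orders (user_id);

-- Sorting field: an index on created_at lets ORDER BY created_at avoid a filesort
create index idx_order_created on orders (created_at);

-- Group statistics field: an index on status speeds up GROUP BY status
create index idx_order_status on orders (status);
```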

  • Use joint indexes correctly; the first field of a joint index can also be used on its own.

    • For example, the following joint index:

      index(userID,dbInstanceID)
      
    • The query statement can use the index

      select dbInstanceIdentifier from DBInstance where userID=? 
      
    • But the following statement cannot use the index

      select dbInstanceIdentifier from DBInstance where dbInstanceID=?
      
  • Indexes are mainly useful for tables with many records. Suppose every query on a table DBInstance has a userID condition, and this field already distinguishes records well, that is, there are not many records per userID; then the table only needs an index on userID.

    • Even when other condition fields are used, since each userID matches only a few records, indexing those other fields would have essentially no effect;
    • at the same time, this avoids the insert and update overhead that too many indexes would bring.

Origin blog.csdn.net/s_frozen/article/details/129031350