MySQL Performance Optimization Tips

1. Background

The company's projects recently gained some new features. After they went live, I found that some of the list pages took a long time to load. The cause was that the new features reuse the interfaces of older features, and the SQL statements behind those old interfaces join 5 or 6 tables and are not written carefully, so MySQL cannot use its indexes and falls back to full table scans. The colleague originally in charge of optimization was on leave, so the job of optimizing these queries fell to me. I solved the problem after consulting SQL optimization material online. Here I record and summarize MySQL query optimization techniques from a global perspective.

2. Optimization ideas

A slow query does not necessarily mean that the SQL statement is badly written. We first need to find the source of the problem so that we can "prescribe the right medicine". The following flowchart shows the overall approach to MySQL optimization:

[Figure: MySQL optimization flowchart]

As the figure shows, slow queries have many possible causes, for example: a cache expiring during a period of high concurrency, so that the MySQL server is overwhelmed or even crashes; poorly written SQL statements; MySQL server parameter settings; and hardware limits on MySQL performance.

3. View the status value of the MySQL server running

If the system does not handle many concurrent requests and queries are still slow, you can skip this step and go straight to SQL statement tuning.

Run the command:

show status

The output is long, so it is not reproduced here. In it, we mainly care about the values of "Queries", "Threads_connected" and "Threads_running": the number of statements executed, the number of open connections, and the number of threads that are actively running.

We can monitor these status values by running the following script:

#!/bin/bash
# Append "Queries Threads_connected Threads_running" to status.txt about once per second.
while true
do
    mysqladmin -uroot -p"password" ext | awk '/Queries/{q=$4}/Threads_connected/{c=$4}/Threads_running/{r=$4}END{printf("%d %d %d\n",q,c,r)}' >> status.txt
    sleep 1
done

Run the script for 24 hours, then use awk on the contents of status.txt to calculate the number of requests the MySQL server handled per second. Queries is a cumulative counter, so the per-second figure is the difference between consecutive samples:

awk 'NR>1{printf("%d %d %d\n",$1-last,$2,$3)}{last=$1}' status.txt

Copy the result into Excel and chart it to see whether the data is periodic.
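
For a quick look before charting, the same subtraction can be done in a few lines of Python. This is a minimal sketch that assumes status.txt holds one "Queries Threads_connected Threads_running" sample per line, as written by the script above; the function name per_second_rates is made up for illustration.

```python
def per_second_rates(lines):
    """Turn cumulative 'Queries connected running' samples into per-interval rows.

    Queries is a cumulative counter, so each output row holds the difference
    from the previous sample; the first sample only serves as the baseline.
    """
    rows = []
    last = None
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        queries, connected, running = (int(p) for p in parts)
        if last is not None:
            rows.append((queries - last, connected, running))
        last = queries
    return rows


if __name__ == "__main__":
    sample = ["1000 5 1", "1040 6 2", "1100 6 3"]
    for row in per_second_rates(sample):
        print(*row)
```

To run it on real data, pass the open file instead of the sample list, for example per_second_rates(open("status.txt")).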

If the data does change periodically, then, as the figure above indicates, the cache expiration strategy needs to be changed.

For example:

Pick the cache expiration time at random from the values [3, 6, 9]. This staggers the expirations so that entries do not all expire at once, and saves some memory.

During peak periods, some of the requests are then served by cache entries that have not yet expired while the rest go to MySQL, which reduces the load on the MySQL server.
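
The idea of staggering expirations can be sketched as follows. This is only an illustration: the function name pick_ttl and the constant TTL_CHOICES are assumptions, and the unit of the values (the article does not state one) is left to the caller.

```python
import random

# Candidate expiration times taken from the example above; the unit is not
# specified in the article and is up to the caller.
TTL_CHOICES = (3, 6, 9)


def pick_ttl(rng=random):
    """Return one of the candidate TTLs at random so that cache entries
    written at the same moment do not all expire together."""
    return rng.choice(TTL_CHOICES)


if __name__ == "__main__":
    # Ten entries cached at the same time get a mix of expiration times.
    print([pick_ttl() for _ in range(10)])
```

The value returned by pick_ttl would be passed as the expiration time whenever an entry is written to the cache.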

4. Obtain the SQL statement that needs to be optimized

4.1 Method 1: View running threads

Run the command:

show processlist

Return result:

mysql> show processlist;
+----+------+-----------+------+---------+------+----------+------------------+
| Id | User | Host      | db   | Command | Time | State    | Info             |
+----+------+-----------+------+---------+------+----------+------------------+
|  9 | root | localhost | test | Query   |    0 | starting | show processlist |
+----+------+-----------+------+---------+------+----------+------------------+
1 row in set (0.00 sec)

From the output we can see which command or SQL statement each thread is executing and how long it has been running. In a real system the result will contain many rows.

The State column is the key to judging performance. If any of the following values appears, the SQL statement in that row needs to be optimized:

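Converting HEAP to MyISAM # the result set is too large and is being moved to disk; serious
Creating tmp table # creating a temporary table; serious
Copying to tmp table on disk # copying an in-memory temporary table to disk; serious
Locked # locked by another query; serious
logging slow query # writing to the slow query log
Sorting result # sorting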

The State field has many possible values; to learn more, see the link provided at the end of the article.

4.2 Method 2: Enable slow query log

Add the following parameters below the [mysqld] line in the configuration file my.cnf:

slow_query_log = 1
slow_query_log_file=/var/lib/mysql/slow-query.log
long_query_time = 2

log_queries_not_using_indexes = 1

Here, slow_query_log = 1 enables the slow query log; slow_query_log_file sets where the log is stored; long_query_time = 2 logs every query that takes longer than 2 seconds; and log_queries_not_using_indexes = 1 also logs SQL statements that do not use an index.

Note: Do not choose the slow_query_log_file path arbitrarily, or the MySQL server may not have permission to write the log file to that directory. Copying the path above is recommended.

After saving the file, restart the MySQL service. The slow-query.log file is created in the /var/lib/mysql/ directory. Connect to the MySQL server and run the following commands to check the configuration:

show variables like 'slow_query%';

show variables like 'long_query_time';

Test the slow query log:

mysql> select sleep(2);
+----------+
| sleep(2) |
+----------+
|        0 |
+----------+
1 row in set (2.00 sec)

Open the slow query log file:

[root@localhost mysql]# vim /var/lib/mysql/slow-query.log
/usr/sbin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with:
Tcp port: 0  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument
# Time: 2017-10-05T04:39:11.408964Z
# User@Host: root[root] @ localhost []  Id:     3
# Query_time: 2.001395  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0
use test;
SET timestamp=1507178351;
select sleep(2);

We can see that the statement we just ran, which took 2 seconds, has been recorded.

The slow query log records every slow statement, but the log is dense and hard to read, so we need a tool to pick out the statements of interest.

MySQL provides the mysqldumpslow tool for analyzing the log. Run mysqldumpslow --help to see its usage.

Common parameters are as follows:

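    -s: sort order, followed by one of:
        c: count (number of occurrences)
        l: lock time
        r: rows sent
        t: query time
        al: average lock time
        ar: average rows sent
        at: average query time
    -t: return only the first N entries
    -g: followed by a regular expression to filter on; case-insensitive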

Examples:

The 10 SQL statements that return the most rows:
mysqldumpslow -s r -t 10 /var/lib/mysql/slow-query.log

The 10 SQL statements that occur most often:
mysqldumpslow -s c -t 10 /var/lib/mysql/slow-query.log

The first 10 statements, sorted by query time, that contain a left join:
mysqldumpslow -s t -t 10 -g "left join" /var/lib/mysql/slow-query.log
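
When mysqldumpslow is not flexible enough, the statistics line of each log entry can also be pulled out with a short script. The sketch below rests on an assumption: it only understands the "# Query_time: ..." line format shown in the log excerpt above. The function name parse_stats_line is made up for illustration.

```python
import re

# Matches the statistics line of a slow query log entry, e.g.
# "# Query_time: 2.001395  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0"
STATS_RE = re.compile(
    r"^# Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_stats_line(line):
    """Return the four statistics as a dict, or None if the line does not match."""
    m = STATS_RE.match(line)
    if m is None:
        return None
    return {
        "query_time": float(m.group("query_time")),
        "lock_time": float(m.group("lock_time")),
        "rows_sent": int(m.group("rows_sent")),
        "rows_examined": int(m.group("rows_examined")),
    }


if __name__ == "__main__":
    line = "# Query_time: 2.001395  Lock_time: 0.000000 Rows_sent: 1  Rows_examined: 0"
    print(parse_stats_line(line))
```

Entries whose rows_examined is far larger than rows_sent are natural candidates to look at first.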

5. Analyze the SQL statement

5.1 Method 1: explain

Once the problematic SQL has been identified, we can use MySQL's explain to view its execution plan (tables involved, the order in which tables are read, index usage, and so on).

Usage:

explain select * from category;

Return result:

mysql> explain select * from category;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table    | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | category | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL  |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

Field explanation:
1) id: the sequence number of the select. Rows with the same id are executed from top to bottom; among different ids, the larger the id, the higher the priority and the earlier it is executed.

2) select_type: the type of the select, with the following values:

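simple: a simple query with no subqueries or union
primary: the outermost query when the statement contains complex subqueries
subquery: a subquery in the select list or the where clause
derived: a subquery in the from list; MySQL executes these subqueries recursively and puts the results in a temporary table
union: the second or later select in a union. If the union is inside a subquery in the from clause, the outer select is marked derived
union result: the select that reads the result of a union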

3) table: the table that this row refers to

4) partitions: the partitions that match

5) type: the access (join) type of the table. The values, ordered from best to worst performance, are:

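system: the table has only one row; equivalent to a system table
const: found with a single index lookup; matches exactly one row
eq_ref: unique index scan; for each index key, exactly one row in the table matches. Common with primary key or unique index lookups
ref: non-unique index scan; returns all rows that match a single value. Used when an indexed column is compared with the = or <=> operator
range: retrieves only the rows in a given range, using an index to select them. Typical with between, > and <
index: scans the whole index tree only
ALL: full table scan; the worst performance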

Note: The first five types are the ideal cases of index usage. A query should normally be optimized to at least the range level, and preferably to ref.

6) possible_keys: the indexes that MySQL could use to find rows in this table. If the value is NULL, no suitable index exists, and creating one may improve performance

7) key: the index that MySQL actually uses. If NULL, no index is used

8) key_len: the number of bytes of the index that are used, from which you can work out how much of the index the query uses. It is the maximum possible length of the indexed fields, not the actual length of the data. Shorter is better, as long as accuracy is not lost

9) ref: the column or constant that is compared against the index named in key

10) rows: an estimate, based on table statistics and the chosen index, of the number of rows that must be read to find the result. The smaller the value, the better

11) filtered: the estimated percentage of the rows read that remain after the table conditions are applied. The larger the value, the better

12) extra: additional information that does not fit in the other columns but is very important. Common values are as follows:

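using filesort: MySQL sorts the rows with an extra sorting pass instead of reading them in index order. If this appears, the SQL should be optimized
using temporary: a temporary table holds intermediate results. Common with order by and group by. If this appears, the SQL should be optimized
using index: the select uses a covering index and avoids reading the table rows; efficient
using where: a where clause restricts which rows are returned
using join buffer: a join buffer is used
distinct: after the first match is found, MySQL stops searching for more rows for the current row combination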

Note: If either of the first two values appears, the SQL statement must be optimized.
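
The rules of thumb in this section can be collected into a small checker. This is a sketch under stated assumptions: each explain row is passed in as a dict keyed by the column names shown above (how the rows are fetched is left out), and the function name explain_warnings is hypothetical.

```python
def explain_warnings(row):
    """Return a list of warnings for one row of explain output.

    `row` is a dict keyed by the explain column names (type, key, Extra, ...).
    The checks follow the notes above: a full table scan, no index in use,
    and the two Extra values that call for optimization.
    """
    warnings = []
    if (row.get("type") or "").upper() == "ALL":
        warnings.append("full table scan (type = ALL)")
    if row.get("key") is None:
        warnings.append("no index used (key is NULL)")
    extra = (row.get("Extra") or "").lower()
    for marker in ("using filesort", "using temporary"):
        if marker in extra:
            warnings.append("Extra contains '%s'" % marker)
    return warnings


if __name__ == "__main__":
    # The row returned for "explain select * from category" above.
    row = {"table": "category", "type": "ALL", "key": None, "Extra": None}
    for w in explain_warnings(row):
        print(w)
```

A row that produces no warnings is not necessarily fast; the rows and filtered columns still need to be read by hand.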

5.2 Method 2: Profiling

Use profiling to get detailed information on the resources a SQL statement consumes (the cost of each execution step).

5.2.1 Check whether profiling is enabled

select @@profiling;

Return result:

mysql> select @@profiling;
+-------------+
| @@profiling |
+-------------+
|           0 |
+-------------+
1 row in set, 1 warning (0.00 sec)

0 means off; 1 means on.

5.2.2 Enable profiling

set profiling = 1;  

Return result:

mysql> set profiling = 1;  
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> select @@profiling;
+-------------+
| @@profiling |
+-------------+
|           1 |
+-------------+
1 row in set, 1 warning (0.00 sec)

When the connection is closed, profiling is automatically reset to off.

5.2.3 View the list of executed SQL

show profiles;

Return result:

mysql> show profiles;
+----------+------------+------------------------------+
| Query_ID | Duration   | Query                        |
+----------+------------+------------------------------+
|        1 | 0.00062925 | select @@profiling           |
|        2 | 0.00094150 | show tables                  |
|        3 | 0.00119125 | show databases               |
|        4 | 0.00029750 | SELECT DATABASE()            |
|        5 | 0.00025975 | show databases               |
|        6 | 0.00023050 | show tables                  |
|        7 | 0.00042000 | show tables                  |
|        8 | 0.00260675 | desc role                    |
|        9 | 0.00074900 | select name,is_key from role |
+----------+------------+------------------------------+
9 rows in set, 1 warning (0.00 sec)

show profiles only lists statements executed after profiling was enabled, so run some SQL statements first.

5.2.4 Query the execution details of the specified ID

show profile for query Query_ID;

Return result:

mysql> show profile for query 9;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000207 |
| checking permissions | 0.000010 |
| Opening tables       | 0.000042 |
| init                 | 0.000050 |
| System lock          | 0.000012 |
| optimizing           | 0.000003 |
| statistics           | 0.000011 |
| preparing            | 0.000011 |
| executing            | 0.000002 |
| Sending data         | 0.000362 |
| end                  | 0.000006 |
| query end            | 0.000006 |
| closing tables       | 0.000006 |
| freeing items        | 0.000011 |
| cleaning up          | 0.000013 |
+----------------------+----------+
15 rows in set, 1 warning (0.00 sec)

Each row is one state the statement passed through and how long it lasted. The Status column uses the same values as the State column of show processlist, so the points to optimize are the same as those described above.

For the possible values of the Status field, again see the link at the end.

5.2.5 Get CPU, Block IO and other information

show profile block io,cpu for query Query_ID;

show profile cpu,block io,memory,swaps,context switches,source for query Query_ID;

show profile all for query Query_ID;

6. Optimization methods

This section covers query optimization, index usage and table structure design.

6.1 Query optimization

1) Avoid SELECT *; query only the fields you need.

2) Let the small table drive the large table, that is, let the small data set drive the large data set. Take two tables, A and B, that are related by the id field.

When B's data set is smaller than A's, use in rather than exists. With in, table B is queried first and then table A:
select * from A where id in (select id from B)

When A's data set is smaller than B's, use exists rather than in. With exists, table A is queried first and then table B:
select * from A where exists (select 1 from B where B.id = A.id)

3) In some cases a join can replace a subquery, because with a join MySQL does not need to create a temporary table in memory for the subquery result.

4) Appropriately add redundant fields to reduce table associations.

5) Use indexes sensibly (described below). For example, index the fields used for sorting and grouping to avoid filesort.

6.2 Index usage

6.2.1 Scenarios suitable for using indexes

1) The primary key automatically creates a unique index

2) Fields that are frequently used as query conditions

3) Fields associated with other tables in the query

4) Fields to sort in the query

5) Statistics or grouping fields in the query

6.2.2 Scenarios where indexing is not suitable

1) Frequently updated fields

2) Fields not used in where conditions

3) Too few table records

4) Tables that are frequently added, deleted, and modified

5) Fields with few distinct values (highly repetitive data)

6.2.3 Principles of Index Creation and Use

1) Single-table query: create an index on the column used as the query condition

2) Multi-table query: for a left join, add the index to the join field of the right table; for a right join, add it to the join field of the left table

3) Do not perform any operation on an indexed column (calculations, functions, type conversions)

4) Do not use != or <> (not equal) on an indexed column

5) Indexed columns should preferably not contain NULL, and conditions on them should avoid is null and is not null

6) When the indexed field is a string type, wrap the value in the query condition in single quotes ('') to avoid an implicit type conversion

Violating the above principles may cause the index not to be used; use the explain command to check the specific case

6.2.4 Index Failure Conditions

Besides violating the principles above, the following conditions also prevent an index from being used:

1) A fuzzy (like) query whose pattern starts with %

2) Conditions joined with or where one side is not indexed, such as: field 1 (not indexed) or field 2 (indexed)

3) A composite index whose first column is not used in the query

Taking a composite index on fields a, b, c, index(a,b,c), as an example:

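+-----------------------------------------+-----------------------------------------+
| Statement                               | Is the index used?                      |
+-----------------------------------------+-----------------------------------------+
| where a = 1                             | Yes, field a is used                    |
| where a = 1 and b = 2                   | Yes, fields a and b are used            |
| where a = 1 and b = 2 and c = 3         | Yes, all fields are used                |
| where b = 2 or where c = 3              | No                                      |
| where a = 1 and c = 3                   | Field a is used, field c is not         |
| where a = 1 and b > 2 and c = 3         | Fields a and b are used, field c is not |
| where a = 1 and b like 'xxx%' and c = 3 | Fields a and b are used, field c is not |
+-----------------------------------------+-----------------------------------------+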
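
The table follows the leftmost-prefix rule, which can be written down as a small model. This sketch is a deliberate simplification and not MySQL's optimizer: it only looks at conditions combined with and, treats equality, range (>, <, between) and like 'xxx%' conditions as described in the table, and the function name usable_index_columns is invented for illustration.

```python
def usable_index_columns(index_columns, conditions):
    """Return the index columns a query can use under the leftmost-prefix rule.

    index_columns: columns of the composite index, in order, e.g. ["a", "b", "c"].
    conditions: dict mapping a column to "eq" (equality) or "range"
                (>, <, between, or like with a fixed prefix such as 'xxx%').

    Columns are consumed from left to right. The walk stops at the first
    column without a condition, and after the first range condition.
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break  # gap in the prefix: later columns cannot be used
        used.append(column)
        if kind == "range":
            break  # columns after a range condition cannot be used
    return used


if __name__ == "__main__":
    index = ["a", "b", "c"]
    print(usable_index_columns(index, {"a": "eq"}))
    print(usable_index_columns(index, {"b": "eq"}))
    print(usable_index_columns(index, {"a": "eq", "c": "eq"}))
    print(usable_index_columns(index, {"a": "eq", "b": "range", "c": "eq"}))
```

The four calls correspond to rows of the table; the real plan should still be confirmed with explain.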

6.3 Database table structure design

6.3.1 Select the appropriate data type

1) Use the smallest data type that can hold the data

2) Use simple data types. MySQL handles int more easily than varchar

3) Where the range allows, use tinyint, smallint or mediumint instead of int

4) Define fields as not null wherever possible, because nullable columns need extra storage for the NULL flag and make indexes and comparisons more complicated

5) Use the text type as little as possible; when it is unavoidable, consider moving it into a separate table

6) Prefer timestamp to datetime where its range (1970 to 2038) is sufficient, since it takes less storage

7) A single table should not have too many fields; within 20 is recommended

6.3.2 Splitting of Tables

When the amount of data is very large and query optimization alone cannot fix slow queries, we can consider splitting tables to reduce the amount of data in each one and so improve query efficiency.

1) Vertical split: move some of a table's columns into different tables. For example, put the frequently accessed fields of the user table in one table and the less frequently used fields in another. When inserting data, use a transaction to keep the two tables consistent.

2) Horizontal split: split by row. For example, take the user ID modulo 10 and distribute the user data evenly across 10 user tables numbered 0 to 9. Queries locate the data by the same rule.
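
The routing rule for the horizontal split can be sketched like this. The table name prefix user_ and the function name user_table_for are assumptions made for the example; the article only specifies the modulo-10 rule.

```python
SHARD_COUNT = 10  # ten user tables, numbered 0 to 9


def user_table_for(user_id, prefix="user_"):
    """Return the name of the table that holds the given user.

    The same rule must be used for inserts and for queries, otherwise
    a row written to one table would be looked up in another.
    """
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return "%s%d" % (prefix, user_id % SHARD_COUNT)


if __name__ == "__main__":
    for uid in (7, 10, 12345):
        print(uid, "->", user_table_for(uid))
```

Keeping the rule in one function means the number of tables is defined in a single place, although changing it later still requires moving existing rows.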

6.3.3 Read-write separation

Generally speaking, a database is "read-heavy and write-light": most of the load comes from the large number of read operations. We can therefore use a database cluster in which one database acts as the master and handles writes, while the others act as slaves and handle reads. This eases the load on the database.
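
On the application side, read-write separation comes down to choosing a connection by statement type. The sketch below assumes one master and a list of slaves identified by plain host names (the names are placeholders), picks a slave at random for reads, and ignores replication lag and transactions, which a real setup has to consider. The function name choose_host is hypothetical.

```python
import random

READ_PREFIXES = ("select", "show", "explain", "desc")


def choose_host(sql, master, slaves, rng=random):
    """Route read statements to a slave and everything else to the master.

    Falls back to the master when no slave is configured.
    """
    words = sql.split(None, 1)
    first_word = words[0].lower() if words else ""
    if first_word in READ_PREFIXES and slaves:
        return rng.choice(slaves)
    return master


if __name__ == "__main__":
    master = "db-master"  # placeholder host names
    slaves = ["db-slave-1", "db-slave-2"]
    print(choose_host("select * from user_3 where id = 13", master, slaves))
    print(choose_host("update user_3 set name = 'x' where id = 13", master, slaves))
```

In practice this decision is often delegated to a proxy or to the database driver rather than written by hand.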

7. Server parameter tuning

7.1 Memory related

sort_buffer_size: size of the sort buffer

join_buffer_size: size of the buffer used for joins

read_buffer_size: size of the buffer allocated for a full table scan

7.2 IO related

innodb_log_file_size: size of each transaction log file

innodb_log_files_in_group: number of transaction log files

innodb_log_buffer_size: size of the transaction log buffer

innodb_flush_log_at_trx_commit: transaction log flush strategy, with the following values:

0: the log is written to the cache and flushed to disk once per second

1: the log is written to the cache and flushed to disk at every transaction commit

2: the log is written to the cache at every transaction commit and flushed to disk once per second

7.3 Security related

expire_logs_days: number of days after which binlog files are cleaned up automatically

max_allowed_packet: maximum size of a packet that MySQL can receive

skip_name_resolve: disables DNS lookups

read_only: prevents users without the SUPER privilege from writing

skip_slave_start: prevents slave replication from starting automatically when the server starts

7.4 Others

max_connections: maximum number of connections allowed

tmp_table_size: maximum size of an in-memory temporary table

max_heap_table_size: maximum size of a memory table

I have not used these parameters to tune a MySQL server myself. For details and their effect on performance, refer to the material at the end of the article or search on Baidu.

8. Hardware purchase and parameter optimization

Hardware performance directly determines the performance of a MySQL database: a hardware bottleneck limits both its speed and its efficiency.

As software developers, we mainly focus on software optimization; the hardware optimizations below are for general awareness only.

8.1 Memory related

Memory I/O is much faster than disk I/O. Adding memory lets the system use larger buffers and keep data in memory longer, which reduces disk I/O.

8.2 Disk I/O Related

1) Use SSD or PCIe SSD devices, which improve IOPS by at least hundreds of times, or even 10,000 times

2) Buy a RAID card equipped with cache and BBU modules, which can significantly increase IOPS

3) Use RAID-10 instead of RAID-5 whenever possible

8.3 CPU configuration

In the server's BIOS settings, adjust the following:

1) Select Performance Per Watt Optimized (DAPC) mode to maximize CPU performance

2) Turn off options such as C1E and C States to improve CPU efficiency

3) Set Memory Frequency to Maximum Performance

9. References

 


Origin http://43.154.161.224:23101/article/api/json?id=325049000&siteId=291194637