MySQL 中的 SQL 查询性能调优

通过 MySQL 中的索引加速 SQL 查询。安装、分析查询并使用存储过程以获得最佳结果。

        在本文中，我们将了解索引表列如何帮助提高 SQL 查询的快速响应时间。我们将介绍安装 MySQL、创建存储过程、分析查询以及了解索引的影响的步骤。
        我在Ubuntu上使用了 MySQL 版本 8 。另外，我使用Dbeavor工具作为 MySQL 客户端来连接 MySQL 服务器。那么我们一起来学习一下吧。
        我使用 MySQL 进行演示；然而，这个概念在所有其他数据库中也保持不变。

1.按照下面的方式，我们可以安装MySQL并使用root用户访问它。该MySQL实例仅用于测试；因此，我使用了一个简单的密码。

$ sudo apt install mysql-server
$ sudo systemctl start mysql.service
$ sudo mysql
mysql> SET GLOBAL validate_password.policy = 0;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
mysql> exit
$ mysql -uroot -ppassword

2. 创建一个数据库并使用它。

mysql> create database testdb;

mysql> show databases;

mysql> use testdb;

3. 创建两个表employee1 和employee2。这里，employee1 没有主键，employee 2 有主键。

mysql> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.02 sec

mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| employee1        |
| employee2        |
+------------------+
2 rows in set (0.00 sec)

4. 现在，如果我们检查每个表的索引，我们会发现employee2 表的id 列已经有一个索引，因为它是主键。

mysql> SHOW INDEXES FROM employee1 \G;
Empty set (0.00 sec)

ERROR: 
No query specified

mysql> SHOW INDEXES FROM employee2 \G;
*************************** 1. row ***************************
        Table: employee2
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.00 sec)

ERROR: 
No query specified

5. 现在，创建一个存储过程以在两个表中插入批量数据。我们在每个表中插入 20000 条记录。然后我们可以使用CALL procedure-name命令调用存储过程。

mysql> 

CREATE PROCEDURE testdb.BulkInsert()
BEGIN
		DECLARE i INT DEFAULT 1;
truncate table employee1;
truncate table employee2;
WHILE (i <= 20000) DO
    INSERT INTO testdb.employee1 (id, FirstName, Address) VALUES(i, CONCAT("user","-",i), CONCAT("address","-",i));
    INSERT INTO testdb.employee2 (id,FirstName, Address) VALUES(i,CONCAT("user","-",i), CONCAT("address","-",i));    
   SET i = i+1;
END WHILE;
END

mysql> CALL testdb.BulkInsert() ;

mysql> SELECT COUNT(*) from employee1 e ;
COUNT(*)|
--------+
    20000|
    

mysql> SELECT COUNT(*) from employee2 e ;
COUNT(*)|
--------+
    20000|

6. 现在，如果我们选择任何随机 id 的记录，我们会发现 employee1 表的响应很慢，因为它没有任何索引。

mysql> select * from employee2 where id = 15433;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL     | user-15433 | address-15433 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)

mysql> select * from employee1 where id = 15433;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL     | user-15433 | address-15433 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)

mysql> select * from employee1 where id = 19728;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL     | user-19728 | address-19728 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)

mysql> select * from employee2 where id = 19728;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL     | user-19728 | address-19728 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)

mysql> select * from employee1 where id = 3456;
+------+----------+-----------+--------------+---------+
| id   | LastName | FirstName | Address      | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL     | user-3456 | address-3456 | NULL    |
+------+----------+-----------+--------------+---------+
1 row in set (0.04 sec)

mysql> select * from employee2 where id = 3456;
+------+----------+-----------+--------------+---------+
| id   | LastName | FirstName | Address      | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL     | user-3456 | address-3456 | NULL    |
+------+----------+-----------+--------------+---------+
1 row in set (0.00 sec)

7. 现在检查命令EXPLAIN ANALYZE的输出。该命令实际上执行查询并计划查询、对其进行检测并执行它，同时计算行数并测量执行计划中各个点所花费的时间。
这里我们发现employee1执行了表扫描，这意味着扫描或搜索整个表以获取输出。我们也称其为表的完整扫描。

mysql> explain analyze select * from employee1 where id = 3456;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.id = 3456)  (cost=1989 rows=1965) (actual time=5.24..29.3 rows=1 loops=1)
    -> Table scan on employee1  (cost=1989 rows=19651) (actual time=0.0504..27.3 rows=20000 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

# Here is detailed explanation from ChatGPT.

Filter: (employee1.id = 3456): This indicates that there is a filter operation being performed on the "employee1" table, and only rows where the "id" column has a value of 3456 will be selected.

(cost=1989 rows=1965) (actual time=5.3..31.9 rows=1 loops=1): This part provides some performance-related information about the query execution:

cost=1989: It represents the cost estimate for the entire query execution. Cost is a relative measure of how much computational effort is required to execute the query.

rows=1965: It indicates the estimated number of rows that will be processed in this part of the query.

actual time=5.3..31.9: This shows the actual time taken for this part of the query to execute, which is measured in milliseconds.

rows=1 loops=1: The number of times this part of the query is executed in a loop.
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.034..29.7 rows=20000 loops=1): This part shows that a table scan is being performed on the "employee1" table:

Table scan: This means that the database is scanning the entire "employee1" table to find the rows that match the filter condition.

cost=1989: The cost estimate for this table scan operation.

rows=19651: The estimated number of rows in the "employee1" table.

actual time=0.034..29.7: The actual time taken for the table scan operation, measured in milliseconds.

rows=20000 loops=1: The number of times this table scan operation is executed in a loop.

Overall, this query plan suggests that the database is executing a query that filters the "employee1" table to only return rows where the "id" column is equal to 3456. 
The table scan operation reads a total of 20,000 rows to find the matching row(s) and has an estimated cost of 1989 units. 
The actual execution time is 5.3 to 31.9 milliseconds, depending on the number of rows that match the filter condition.

8. 对于表employee2，我们发现只搜索了一行，并获取了结果。因此，如果表中有大量记录，我们将观察到 SQL 查询的响应时间有相当大的改善。

mysql> explain analyze select * from employee2 where id = 3456;
+---------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                           |
+---------------------------------------------------------------------------------------------------+
| -> Rows fetched before execution  (cost=0..0 rows=1) (actual time=110e-6..190e-6 rows=1 loops=1)
 |
+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

# As per ChatGPT explanation of this query plan is :

Rows fetched before execution: This part indicates that the database is fetching some data before the main query is executed.

(cost=0..0 rows=1): The cost estimate for this operation is 0 units, and it expects to fetch only one row.

(actual time=110e-6..190e-6 rows=1 loops=1): This provides the actual time taken for the data fetching operation:

actual time=110e-6..190e-6: The actual time range for the fetching operation, measured in microseconds (µs).

rows=1: The number of rows fetched.

loops=1: The number of times this data fetching operation is executed in a loop.

Overall, this part of the query plan indicates that the database is fetching a single row before executing the main query. 
The actual time taken for this data fetching operation is in the range of 110 to 190 microseconds. This preliminary data fetch might be related to obtaining some essential information or parameters needed for the subsequent execution of the main query.

9. 现在，让我们让它变得更有趣。让我们分析一下当我们在两个表上搜索非索引列FirstName的记录时的查询计划。从输出中我们发现表扫描是为了搜索记录，这需要相当长的时间来获取数据。

mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456')  (cost=2036 rows=2012) (actual time=15.7..24 rows=1 loops=1)
    -> Table scan on employee2  (cost=2036 rows=20115) (actual time=0.0733..17.8 rows=20000 loops=1)
 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.FirstName = 'user-13456')  (cost=1989 rows=1965) (actual time=23.7..35.2 rows=1 loops=1)
    -> Table scan on employee1  (cost=1989 rows=19651) (actual time=0.0439..28.9 rows=20000 loops=1)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

10. 现在，让我们在employee1 表上为FirstName 列创建一个索引。

mysql> CREATE INDEX index1 ON employee1 (FirstName);
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show indexes from employee1 \G;
*************************** 1. row ***************************
        Table: employee1
   Non_unique: 1
     Key_name: index1
 Seq_in_index: 1
  Column_name: FirstName
    Collation: A
  Cardinality: 19651
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.01 sec)

ERROR: 
No query specified

11. 现在，当我们搜索列 FirstName 的单个记录时，让我们再次检查两个表的查询计划。我们发现employee1很快就给出了响应，只有1行需要搜索，并且当使用FirstName列上的索引时，对employee1表进行了索引查找。但对于employee2来说，响应时间很大，需要搜索所有20000行才能得到响应。

mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+-------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on employee1 using index1 (FirstName='user-13456')  (cost=0.35 rows=1) (actual time=0.0594..0.0669 rows=1 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456')  (cost=2036 rows=2012) (actual time=15.7..23.5 rows=1 loops=1)
    -> Table scan on employee2  (cost=2036 rows=20115) (actual time=0.075..17.5 rows=20000 loops=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

就是这样，伙计们。本文将帮助我们了解索引对表的影响。如何使用解释分析命令分析查询。此外，还学习如何设置 MySQL 以及如何编写用于批量插入的存储过程。

MySQL 中的 SQL 查询性能调优

通过 MySQL 中的索引加速 SQL 查询。安装、分析查询并使用存储过程以获得最佳结果。

猜你喜欢