SQL Query Performance Tuning in MySQL

        Speed ​​up SQL queries with indexes in MySQL. Install, analyze queries and use stored procedures for best results.

        In this article, we'll see how indexing table columns can help improve the fast response time of SQL queries. We'll walk through the steps to install MySQL, create stored procedures, analyze queries, and understand the impact of indexes.
        I am using MySQL version 8 on Ubuntu. In addition, I use the Dbeavor tool as a MySQL client to connect to the MySQL server. So let's learn together.
        I use MySQL for demonstration; however, the concept remains the same in all other databases.

1. In the following way, we can install MySQL and access it with root user. This MySQL instance is for testing only; therefore, I use a simple password.

$ sudo apt install mysql-server
$ sudo systemctl start mysql.service
$ sudo mysql
mysql> SET GLOBAL validate_password.policy = 0;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
mysql> exit
$ mysql -uroot -ppassword

2. Create a database and use it.

mysql> create database testdb;

mysql> show databases;

mysql> use testdb;

3. Create two tables employee1 and employee2. Here, employee1 has no primary key and employee 2 has a primary key.

mysql> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.02 sec

mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| employee1        |
| employee2        |
+------------------+
2 rows in set (0.00 sec)

4. Now, if we check the indexes of each table, we will find that the id column of the employee2 table already has an index because it is the primary key.

mysql> SHOW INDEXES FROM employee1 \G;
Empty set (0.00 sec)

ERROR: 
No query specified

mysql> SHOW INDEXES FROM employee2 \G;
*************************** 1. row ***************************
        Table: employee2
   Non_unique: 0
     Key_name: PRIMARY
 Seq_in_index: 1
  Column_name: id
    Collation: A
  Cardinality: 0
     Sub_part: NULL
       Packed: NULL
         Null: 
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.00 sec)

ERROR: 
No query specified

5. Now, create a stored procedure to insert bulk data in both tables. We insert 20000 records in each table. Then we can call the stored procedure using the CALL procedure-name command.

mysql> 

CREATE PROCEDURE testdb.BulkInsert()
BEGIN
		DECLARE i INT DEFAULT 1;
truncate table employee1;
truncate table employee2;
WHILE (i <= 20000) DO
    INSERT INTO testdb.employee1 (id, FirstName, Address) VALUES(i, CONCAT("user","-",i), CONCAT("address","-",i));
    INSERT INTO testdb.employee2 (id,FirstName, Address) VALUES(i,CONCAT("user","-",i), CONCAT("address","-",i));    
   SET i = i+1;
END WHILE;
END

mysql> CALL testdb.BulkInsert() ;

mysql> SELECT COUNT(*) from employee1 e ;
COUNT(*)|
--------+
    20000|
    

mysql> SELECT COUNT(*) from employee2 e ;
COUNT(*)|
--------+
    20000|

6. Now, if we select any record with random id, we will find that the employee1 table is slow to respond because it does not have any indexes.

mysql> select * from employee2 where id = 15433;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL     | user-15433 | address-15433 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)

mysql> select * from employee1 where id = 15433;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL     | user-15433 | address-15433 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)

mysql> select * from employee1 where id = 19728;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL     | user-19728 | address-19728 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)

mysql> select * from employee2 where id = 19728;
+-------+----------+------------+---------------+---------+
| id    | LastName | FirstName  | Address       | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL     | user-19728 | address-19728 | NULL    |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)

mysql> select * from employee1 where id = 3456;
+------+----------+-----------+--------------+---------+
| id   | LastName | FirstName | Address      | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL     | user-3456 | address-3456 | NULL    |
+------+----------+-----------+--------------+---------+
1 row in set (0.04 sec)

mysql> select * from employee2 where id = 3456;
+------+----------+-----------+--------------+---------+
| id   | LastName | FirstName | Address      | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL     | user-3456 | address-3456 | NULL    |
+------+----------+-----------+--------------+---------+
1 row in set (0.00 sec)

7. Now examine the output of the command EXPLAIN ANALYZE. This command actually executes the query and plans it, instrumenting it and executing it, counting rows and measuring the time taken at various points in the execution plan.
Here we find that employee1 performed a table scan, which means scanning or searching the entire table for output. We also call this a full scan of the table.

mysql> explain analyze select * from employee1 where id = 3456;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.id = 3456)  (cost=1989 rows=1965) (actual time=5.24..29.3 rows=1 loops=1)
    -> Table scan on employee1  (cost=1989 rows=19651) (actual time=0.0504..27.3 rows=20000 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

# Here is detailed explanation from ChatGPT.

Filter: (employee1.id = 3456): This indicates that there is a filter operation being performed on the "employee1" table, and only rows where the "id" column has a value of 3456 will be selected.

(cost=1989 rows=1965) (actual time=5.3..31.9 rows=1 loops=1): This part provides some performance-related information about the query execution:

cost=1989: It represents the cost estimate for the entire query execution. Cost is a relative measure of how much computational effort is required to execute the query.

rows=1965: It indicates the estimated number of rows that will be processed in this part of the query.

actual time=5.3..31.9: This shows the actual time taken for this part of the query to execute, which is measured in milliseconds.

rows=1 loops=1: The number of times this part of the query is executed in a loop.
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.034..29.7 rows=20000 loops=1): This part shows that a table scan is being performed on the "employee1" table:

Table scan: This means that the database is scanning the entire "employee1" table to find the rows that match the filter condition.

cost=1989: The cost estimate for this table scan operation.

rows=19651: The estimated number of rows in the "employee1" table.

actual time=0.034..29.7: The actual time taken for the table scan operation, measured in milliseconds.

rows=20000 loops=1: The number of times this table scan operation is executed in a loop.

Overall, this query plan suggests that the database is executing a query that filters the "employee1" table to only return rows where the "id" column is equal to 3456. 
The table scan operation reads a total of 20,000 rows to find the matching row(s) and has an estimated cost of 1989 units. 
The actual execution time is 5.3 to 31.9 milliseconds, depending on the number of rows that match the filter condition.

8. For the table employee2, we found that only one row was searched and got the result. Therefore, if the table has a large number of records, we will observe a considerable improvement in the response time of the SQL query.

mysql> explain analyze select * from employee2 where id = 3456;
+---------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                           |
+---------------------------------------------------------------------------------------------------+
| -> Rows fetched before execution  (cost=0..0 rows=1) (actual time=110e-6..190e-6 rows=1 loops=1)
 |
+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

# As per ChatGPT explanation of this query plan is :

Rows fetched before execution: This part indicates that the database is fetching some data before the main query is executed.

(cost=0..0 rows=1): The cost estimate for this operation is 0 units, and it expects to fetch only one row.

(actual time=110e-6..190e-6 rows=1 loops=1): This provides the actual time taken for the data fetching operation:

actual time=110e-6..190e-6: The actual time range for the fetching operation, measured in microseconds (µs).

rows=1: The number of rows fetched.

loops=1: The number of times this data fetching operation is executed in a loop.

Overall, this part of the query plan indicates that the database is fetching a single row before executing the main query. 
The actual time taken for this data fetching operation is in the range of 110 to 190 microseconds. This preliminary data fetch might be related to obtaining some essential information or parameters needed for the subsequent execution of the main query.

9. Now, let's make it more interesting. Let's analyze the query plan when we search for records of the non-indexed column FirstName on both tables. From the output we found that the table scan is for searching records and it takes quite a long time to fetch the data.

mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456')  (cost=2036 rows=2012) (actual time=15.7..24 rows=1 loops=1)
    -> Table scan on employee2  (cost=2036 rows=20115) (actual time=0.0733..17.8 rows=20000 loops=1)
 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.FirstName = 'user-13456')  (cost=1989 rows=1965) (actual time=23.7..35.2 rows=1 loops=1)
    -> Table scan on employee1  (cost=1989 rows=19651) (actual time=0.0439..28.9 rows=20000 loops=1)
 |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)

10. Now, let's create an index on the employee1 table for the FirstName column. 

mysql> CREATE INDEX index1 ON employee1 (FirstName);
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show indexes from employee1 \G;
*************************** 1. row ***************************
        Table: employee1
   Non_unique: 1
     Key_name: index1
 Seq_in_index: 1
  Column_name: FirstName
    Collation: A
  Cardinality: 19651
     Sub_part: NULL
       Packed: NULL
         Null: YES
   Index_type: BTREE
      Comment: 
Index_comment: 
      Visible: YES
   Expression: NULL
1 row in set (0.01 sec)

ERROR: 
No query specified

11. Now, let's check the query plan for both tables again when we search for a single record of the column FirstName. We found that employee1 responded quickly, only 1 row needed to be searched, and when using the index on the FirstName column, an index lookup was done on the employee1 table. But for employee2 the response time is huge and need to search all 20000 rows to get a response.

mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+-------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on employee1 using index1 (FirstName='user-13456')  (cost=0.35 rows=1) (actual time=0.0594..0.0669 rows=1 loops=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)


mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456')  (cost=2036 rows=2012) (actual time=15.7..23.5 rows=1 loops=1)
    -> Table scan on employee2  (cost=2036 rows=20115) (actual time=0.075..17.5 rows=20000 loops=1)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

That's it, folks. This article will help us understand the impact of indexes on tables. How to analyze queries using the explain analyze command. Also, learn how to set up MySQL and write stored procedures for bulk inserts.

Guess you like

Origin blog.csdn.net/qq_28245905/article/details/132333751