Optimizing SELECT statements in MySQL

Summary:

This article covers how to optimize SELECT statements in MySQL: an overview of database performance optimization, WHERE clause optimization, range optimization, hash join optimization, condition pushdown, and the nested-loop join algorithms. Database performance depends on the software structure and on keeping CPU and I/O work to a minimum and executing it efficiently. WHERE clause optimization is about writing conditions that the optimizer can evaluate cheaply while keeping queries readable. Range optimization describes the conditions under which the optimizer uses the range access method. For joins without a usable index, newer MySQL versions use a hash join instead of the block nested-loop join algorithm, which improves query speed.

Introduction:

MySQL is a widely used relational database management system, and its performance is often critical to the applications built on it. Optimizing SELECT statements is a key part of improving database performance. This article explores several optimization techniques, including WHERE clause optimization, range optimization, and hash join optimization. By improving how queries are written and how the database is structured, we can significantly reduce the response time of a MySQL database.

1. Overview of MySQL performance optimization

Becoming proficient at database performance optimization takes continued study and practice. Along the way you learn how the database works internally and master techniques such as index optimization, query optimization, cache configuration, and transaction management. That understanding lets you locate and resolve performance bottlenecks accurately and improve the overall efficiency of the system.

At a more advanced level, you also use performance measurement tools such as profilers to measure and analyze the health of the database, catch potential performance problems early, and make targeted improvements that result in a more efficient and stable database service.

Suppose we have a simple database table employees containing an employee's name (name), position (position), age (age), and hire date (hire_date).

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  position VARCHAR(50),
  age INT,
  hire_date DATE
);

Now we want to query the employees who are 30 or older and sort them by hire date in ascending order. We'll start by optimizing the query.

  • Before query optimization
SELECT * 
FROM employees
WHERE age >= 30
ORDER BY hire_date;
  • After query optimization: list the columns explicitly
SELECT id, name, position, age, hire_date
FROM employees
WHERE age >= 30
ORDER BY hire_date;

Notes:

1. Both queries return the same result. Listing the columns explicitly (id, name, position, age, hire_date) documents what the query needs and keeps the result stable if columns are added to the table later. In this example the list names every column, so the amount of data transferred is the same; the saving comes when you leave out columns you do not need.

2. Add appropriate indexes: columns that are used frequently in conditions and sorting, such as age and hire_date in this case, are candidates for indexes that speed up the query. For example:

CREATE INDEX idx_age ON employees (age);
CREATE INDEX idx_hire_date ON employees (hire_date);

3. Use appropriate data types: a well-chosen data type reduces storage space and improves query performance. Use the smallest data type that fits the data, for example INT rather than VARCHAR for numeric values.

4. Avoid full table scans: use indexes wherever possible to locate data. Full table scans are one of the main causes of poor database performance, especially on large tables.

5. Maintain the database regularly: rebuilding indexes, compacting tables, cleaning up logs, and similar tasks help the database keep performing well.
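
One way to check points 2, 4, and 5 in practice is to ask MySQL for the execution plan and to run the standard maintenance statements. The following is a minimal sketch, assuming the employees table and the two indexes above exist; the plan you see depends on your data and MySQL version.

```sql
-- Show how MySQL plans to execute the query.
-- In the output, type = ALL means a full table scan,
-- and key names the index that was chosen (NULL if none).
EXPLAIN
SELECT id, name, position, age, hire_date
FROM employees
WHERE age >= 30
ORDER BY hire_date;

-- Refresh the index statistics that the optimizer relies on.
ANALYZE TABLE employees;

-- Reorganize the table and its indexes to reclaim unused space.
OPTIMIZE TABLE employees;
```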

2. WHERE clause optimization

Several optimizations can be applied to WHERE clauses to improve query performance. The same principles apply to DELETE and UPDATE statements that contain WHERE clauses. Note that the MySQL optimizer is constantly evolving and performs many optimizations on its own, so the following examples cover only some of them.

Example:

Suppose we have a simple database table employees, containing the employee's name (name), position (position), age (age) and date of entry (hire_date).

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  position VARCHAR(50),
  age INT,
  hire_date DATE
);

Use indexes to optimize queries:

  • Before query optimization: no index is used, full table scan
SELECT * 
FROM employees
WHERE age >= 30 AND position = 'Manager';
  • After query optimization: use indexes on age and position columns
SELECT * 
FROM employees
WHERE age >= 30 AND position = 'Manager';

Explanation: The two queries are identical; what changes is the table. For the first query, no index exists on the age and position columns, so the database performs a full table scan to find the matching records. For the second query, indexes have been created on the age and position columns, and the database can use them to locate the matching records quickly.
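
The index definitions that separate "before" from "after" are not shown above. A possible version is sketched here; the index names idx_position and idx_position_age are assumptions made for illustration, and the composite index is an alternative that the original text does not mention.

```sql
-- Single-column indexes on the two filtered columns
-- (skip idx_age if it already exists from section 1).
CREATE INDEX idx_age ON employees (age);
CREATE INDEX idx_position ON employees (position);

-- Alternative: one composite index with the equality column first
-- and the range column second.
CREATE INDEX idx_position_age ON employees (position, age);

-- Check which index, if any, the optimizer picks.
EXPLAIN
SELECT *
FROM employees
WHERE age >= 30 AND position = 'Manager';
```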

Avoid using functions:

  • Before query optimization: a function is applied to the hire_date column
SELECT * 
FROM employees
WHERE YEAR(hire_date) = 2023;
  • After query optimization: avoid using functions
SELECT * 
FROM employees
WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01';

Explanation: In the first query, the YEAR() function is applied to the hire_date column, so an index on hire_date cannot be used to narrow down the rows. In the second query, we filter on a date range directly, so the database can use an index on hire_date to optimize the query.
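
To compare the two forms yourself, run both through EXPLAIN. This sketch assumes an index on hire_date (named idx_hire_date, as in section 1); the exact plan depends on the data.

```sql
-- Skip this statement if the index already exists.
CREATE INDEX idx_hire_date ON employees (hire_date);

-- Function on the column: the index cannot be used for a range lookup,
-- so the plan typically shows type = ALL.
EXPLAIN
SELECT * FROM employees
WHERE YEAR(hire_date) = 2023;

-- Plain range on the column: the plan can show type = range
-- with key = idx_hire_date.
EXPLAIN
SELECT * FROM employees
WHERE hire_date >= '2023-01-01' AND hire_date < '2024-01-01';
```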

Make the grouping of logical operators explicit:

  • Before: the grouping depends on operator precedence
SELECT * 
FROM employees
WHERE age >= 30 OR position = 'Manager' AND hire_date >= '2023-01-01';
  • After: parentheses state the grouping explicitly
SELECT * 
FROM employees
WHERE age >= 30 OR (position = 'Manager' AND hire_date >= '2023-01-01');

Explanation: AND has higher precedence than OR, so MySQL evaluates the first query exactly like the second and both return the same rows. The parentheses do not change performance. They make the intended grouping visible, which prevents a reader or a later edit from assuming that the OR is evaluated first, as in (age >= 30 OR position = 'Manager') AND hire_date >= '2023-01-01', which would return different rows.
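
The precedence rule can be checked with constants, without touching any table. In this small sketch the column aliases are only labels chosen for illustration.

```sql
SELECT
  1 OR 0 AND 0   AS no_parentheses,  -- parsed as 1 OR (0 AND 0), result 1
  1 OR (0 AND 0) AS and_first,       -- result 1
  (1 OR 0) AND 0 AS or_first;        -- result 0
```

The first two columns agree, which is why the query without parentheses and the query with parentheses around the AND return the same rows.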

Make sure the values in WHERE conditions match the column data types:

  • Before query optimization: compare an integer column with a string
SELECT * 
FROM employees
WHERE age = '30';
  • After query optimization: use a literal of the column's data type
SELECT * 
FROM employees
WHERE age = 30;

Explanation: In the first query, the string '30' is compared with the integer column age, so MySQL has to convert the string to a number. For a constant this happens once and an index on age can still be used, so the cost is small. The opposite case is the expensive one: when a string column is compared with a number, MySQL converts the column value of every row and cannot use an index on that column. Using literals of the column's own type, as in the second query, avoids both problems.

3. Range optimization

The range access method uses a single index to retrieve a subset of table rows that fall within one or several intervals of index values. It can be used with single-part or multi-part indexes. Each case is described below with an example.

1. Range access method for single-part indexes:

  • When a query has a range condition on a column with a single-part index, using operators such as BETWEEN, <, or >, the optimizer can use the range access method to locate the rows that meet the condition.
  • This method selects the rows whose index values lie in a continuous range, which avoids a full table scan and improves query efficiency.

Example:

Suppose we have a simple database table employees with a single-part index on the age column:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  position VARCHAR(50),
  age INT,
  hire_date DATE,
  INDEX idx_age (age)
);

The query uses the range access method:

  • Without an index on age: full table scan
SELECT * 
FROM employees
WHERE age BETWEEN 30 AND 40;
  • With idx_age in place: the same query uses the index for range access
SELECT * 
FROM employees
WHERE age BETWEEN 30 AND 40;

With idx_age in place, the MySQL optimizer can locate the rows with an age between 30 and 40 through the index instead of scanning the whole table. If the range matches a large share of the table, the optimizer may still prefer a full scan.
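
Because the two queries are textually identical, the difference only shows up in the execution plan. The sketch below assumes the table definition with idx_age from this section; IGNORE INDEX is used here only to reproduce the "before" situation for comparison.

```sql
-- With the index available: look for type = range and key = idx_age.
EXPLAIN
SELECT *
FROM employees
WHERE age BETWEEN 30 AND 40;

-- Same query with the index hidden from the optimizer:
-- the plan falls back to a full table scan (type = ALL).
EXPLAIN
SELECT *
FROM employees IGNORE INDEX (idx_age)
WHERE age BETWEEN 30 AND 40;
```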

2. Range access method for multi-part indexes:

  • When a query uses a multi-part (composite) index and has conditions on several of the indexed columns, the optimizer can also use the range access method.
  • The conditions must start at the first column of the index. Equality conditions on the leading columns followed by a range condition on the next column define a single interval in the index, which locates the matching rows precisely.

Example:

Suppose we create a multipart index on the position and hire_date columns:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  position VARCHAR(50),
  age INT,
  hire_date DATE,
  INDEX idx_position_hire_date (position, hire_date)
);

The query uses the range access method:

  • Without the composite index: full table scan
SELECT * 
FROM employees
WHERE position = 'Manager' AND hire_date >= '2023-01-01';
  • With idx_position_hire_date in place: the same query uses the index for range access
SELECT * 
FROM employees
WHERE position = 'Manager' AND hire_date >= '2023-01-01';

With the index in place, the MySQL optimizer can use idx_position_hire_date to locate the rows whose position is 'Manager' and whose hire_date is greater than or equal to '2023-01-01', without scanning the whole table.
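
The importance of the leading index column can be seen by comparing two plans. This is a sketch that assumes the table definition with idx_position_hire_date shown above; actual plans depend on data volume.

```sql
-- Equality on the first index column plus a range on the second:
-- the plan can show type = range with key = idx_position_hire_date.
EXPLAIN
SELECT *
FROM employees
WHERE position = 'Manager' AND hire_date >= '2023-01-01';

-- Condition on the second index column only: the leading column is
-- unconstrained, so this typically ends up as a full table scan.
EXPLAIN
SELECT *
FROM employees
WHERE hire_date >= '2023-01-01';
```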

3. Equality range optimization for many-valued comparisons:

  • When an indexed column is compared with many values, for example col IN (val1, ..., valN) or col = val1 OR ... OR col = valN, each value is an equality range. The optimizer estimates the cost of reading the rows for every range, using either index dives or index statistics; the eq_range_index_dive_limit system variable controls which estimate it uses.
  • Several ranges on the same column combined with OR are also handled by the range access method: overlapping ranges are merged, and disjoint ranges are scanned one after another.

Example:

Suppose we have a simple database table employees with a single-part index on the age column:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  position VARCHAR(50),
  age INT,
  hire_date DATE,
  INDEX idx_age (age)
);

The query uses the range access method:

  • Two disjoint ranges: idx_age can be used, with one scan per interval
SELECT * 
FROM employees
WHERE age BETWEEN 30 AND 40 OR age BETWEEN 50 AND 60;
  • One wider range: also uses idx_age, but it is not equivalent to the query above, because it additionally returns ages 41 to 49
SELECT * 
FROM employees
WHERE age BETWEEN 30 AND 60;

The optimizer does not turn the first query into the second. It keeps the two intervals and uses idx_age to read the rows aged 30 to 40 and 50 to 60, without scanning the whole table. Rewrite the condition as a single range only when the wider result set is what you actually want.
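
A many-valued equality comparison in the sense of the MySQL manual looks like the IN list below. The sketch assumes the employees table with idx_age from this example; the listed ages are arbitrary illustration values.

```sql
-- Number of equality ranges up to which the optimizer uses index dives
-- for its row estimates; above it, index statistics are used instead.
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';

-- Three equality ranges on the indexed column.
-- The plan can show type = range with key = idx_age.
EXPLAIN
SELECT *
FROM employees
WHERE age IN (30, 35, 40);
```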

4. Skip scan range access method:

  • In some cases a query has no condition on the first part of a multi-part index, only on a later part. Since MySQL 8.0.13 the optimizer can then use the skip scan range access method: it performs a range scan for each distinct value of the leading index part and skips the index entries that cannot qualify, which reads less data than a full index scan.

5. Range optimization for row constructor expressions:

  • When a row constructor, such as (col1, col2), is used in a range condition, for example (col1, col2) IN (('a', 'b'), ('c', 'd')), the MySQL optimizer can apply the range access method to it, provided the columns form a prefix of an index.

Limiting memory usage for range optimization: the range optimizer needs memory to build and evaluate the intervals. The range_optimizer_max_mem_size system variable limits this memory (0 means no limit). If the limit would be exceeded, the optimizer abandons the range plan for that statement and considers other plans, including a full table scan, and issues a warning.
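
The three points above can be tried out as follows. This is a sketch under assumptions: the table t_skip and its data are invented for illustration, the row constructor query assumes the idx_position_hire_date index from the multi-part example, and whether skip scan is actually chosen depends on the optimizer's cost estimate for your data.

```sql
-- Skip scan (MySQL 8.0.13 or later): composite key (f1, f2),
-- condition only on f2, and only indexed columns selected.
CREATE TABLE t_skip (
  f1 INT NOT NULL,
  f2 INT NOT NULL,
  PRIMARY KEY (f1, f2)
);
INSERT INTO t_skip VALUES
  (1, 10), (1, 20), (1, 50), (1, 60),
  (2, 10), (2, 20), (2, 50), (2, 60);

-- If skip scan is chosen, the Extra column reports
-- "Using index for skip scan".
EXPLAIN
SELECT f1, f2 FROM t_skip WHERE f2 > 40;

-- Row constructor in a range condition on (position, hire_date).
EXPLAIN
SELECT *
FROM employees
WHERE (position, hire_date) IN (('Manager', '2023-01-01'),
                                ('Engineer', '2023-06-01'));

-- Memory limit for the range optimizer, in bytes (0 = no limit).
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';
```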

These are some of the situations in which the MySQL optimizer uses the range access method. The optimizer's behavior can change between MySQL versions, so test and tune performance against your own queries and data.

4. Hash join optimization

MySQL 8.0.18 introduced an important optimization: for a query in which each join has an equality join condition and no index can be applied to the join conditions, MySQL uses the hash join algorithm. It replaces the block nested-loop join algorithm (Block Nested-Loop Join) used in earlier versions of MySQL and improves query performance. Starting with MySQL 8.0.20, block nested loop is no longer used, and a hash join is employed wherever a block nested loop would have been used before.

Hash join is a join algorithm for combining two datasets. When MySQL finds that a query joins two tables, the join condition is an equality condition (such as ON t1.c1 = t2.c1), and no index can be used for it, it chooses a hash join.

The basic principle of the hash join algorithm is as follows:

1. Build phase: build an in-memory hash table from the smaller of the two inputs, with the value of the join column as the key and the row data as the value.

2. Probe phase: scan the larger input and, for each row, look up the value of its join column in the hash table. If there is a match, combine the row with the matching rows from the hash table and add the combination to the result set.

Because hash table lookups are fast, the hash join algorithm is usually more efficient than the block nested-loop algorithm, especially when the two tables differ widely in size.

Example using hash join:

  • Create two simple tables (no index on the join column id)
CREATE TABLE t1 (id INT, name VARCHAR(50));
CREATE TABLE t2 (id INT, age INT);
  • Insert some data
INSERT INTO t1 (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO t2 (id, age) VALUES (1, 25), (2, 30), (4, 40);
  • Join the tables on an equality condition
SELECT *
FROM t1
JOIN t2 ON t1.id = t2.id;

In the above example, the join condition is an equality condition and neither id column is indexed, so MySQL 8.0.18 or later can execute the query with a hash join. If id were the primary key of either table, the optimizer would use that index for lookups instead of a hash join.
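
To see whether a hash join is really chosen, EXPLAIN FORMAT=TREE prints the plan including the join algorithm. The sketch below is self-contained and assumes MySQL 8.0.18 or later; the table names hj_left and hj_right are invented for illustration, and their join column c1 deliberately has no index.

```sql
CREATE TABLE hj_left  (c1 INT, name VARCHAR(50));
CREATE TABLE hj_right (c1 INT, age INT);

INSERT INTO hj_left  (c1, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');
INSERT INTO hj_right (c1, age)  VALUES (1, 25), (2, 30), (4, 40);

-- When a hash join is used, the tree output contains a line
-- that mentions "hash join" together with the join condition.
EXPLAIN FORMAT=TREE
SELECT *
FROM hj_left
JOIN hj_right ON hj_left.c1 = hj_right.c1;
```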

A hash join keeps its hash table in memory, so it can need a lot of memory when the joined tables are large. The memory a hash join may use is limited by the join_buffer_size system variable; when the hash table does not fit, MySQL spills to files on disk, which is slower. If the query involves a large amount of data, you may need to raise join_buffer_size to keep the hash join fast and stable.
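
The buffer size can be inspected and changed per session. The value in this sketch is an arbitrary illustration, not a recommendation; a suitable size depends on your workload and available memory.

```sql
-- Current limit in bytes.
SHOW VARIABLES LIKE 'join_buffer_size';

-- Raise the limit to 4 MiB for the current session only.
SET SESSION join_buffer_size = 4194304;
```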

In general, hash join is an important optimization introduced in MySQL 8.0.18. It can significantly improve query performance in some cases, especially for joins with equality conditions for which no index is available. In practice, performance testing and tuning against your own queries and data are still required.

5. Engine condition pushdown optimization

This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is "pushed down" to the storage engine for evaluation. It can be used only by MySQL's NDB storage engine. For NDB Cluster, it eliminates the need to send nonmatching rows over the network between the cluster's data nodes and the MySQL server that issued the query, and can speed up such queries by a factor of 5 to 10 in some cases.

NDB is MySQL's cluster storage engine, designed for high-availability, high-performance distributed environments. Condition pushdown is particularly effective there: for queries that compare nonindexed columns directly with constants, evaluating the conditions in the storage engine layer avoids transmitting nonmatching rows, which significantly reduces network communication overhead and the volume of data transferred.

Example:

Suppose we have a MySQL cluster using the NDB storage engine, and a simple table employees with the following fields:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT,
  department VARCHAR(50)
);

Now, we want to find out the employees whose department is "Sales" and whose age is greater than or equal to 30 years old. We use the following query:

SELECT *
FROM employees
WHERE department = 'Sales' AND age >= 30;

For this query, the optimization is called "condition pushdown" (Condition Pushdown). Both department and age are nonindexed columns compared with constants, so the conditions are pushed to the NDB storage engine layer and evaluated on the data nodes. Only the matching rows are sent to the MySQL server over the network, which is where the speedup comes from.

Specifically, condition pushdown works as follows:

1. The MySQL server receives a query whose WHERE clause compares nonindexed columns directly with constants.

2. The MySQL optimizer decides whether the query is suitable for condition pushdown. For the NDB storage engine, if the query contains suitable conditions, the optimizer pushes them to the NDB storage engine layer.

3. The NDB storage engine evaluates the pushed conditions directly on the data nodes and returns only the qualifying rows to the MySQL server.

4. Because nonmatching rows are never transmitted, only the data that meets the query conditions travels back to the MySQL server, which greatly reduces network communication overhead and the amount of data transferred.

Because the condition is pushed down into the NDB storage engine, this performance improvement is available only when the NDB storage engine is used.

Condition pushdown does not apply to all types of queries. It mainly targets direct comparisons between nonindexed columns and constants and may have no effect on other queries. If you use the NDB storage engine, check whether condition pushdown is applied by looking at the execution plan, and evaluate its effect with performance tests.
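
Checking for condition pushdown could look like the following sketch. It assumes an NDB Cluster installation and that the employees table from this section was created with the NDB storage engine; the exact wording of the EXPLAIN output varies between versions.

```sql
-- The optimizer_switch value lists engine_condition_pushdown=on
-- when the optimization is enabled (this is the default).
SELECT @@optimizer_switch;

-- Enable it explicitly for the current session if needed.
SET optimizer_switch = 'engine_condition_pushdown=on';

-- On an NDB table, the Extra column mentions a pushed condition
-- when the WHERE conditions are evaluated on the data nodes.
EXPLAIN
SELECT *
FROM employees
WHERE department = 'Sales' AND age >= 30;
```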

6. Index condition push-down optimization

Index Condition Pushdown (ICP) is an optimization in MySQL for the case where an index is used to retrieve rows from a table. It can significantly improve the performance of queries whose conditions involve indexed columns.

Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server, which then evaluates the WHERE condition for those rows.

With ICP enabled, if parts of the WHERE condition can be evaluated using only columns from the index, the MySQL server pushes these parts down to the storage engine. The storage engine evaluates them against the index entry and reads the full table row only if the entry qualifies. This reduces both the number of times the storage engine accesses the base table and the number of rows the MySQL server has to process.

Two index-based techniques are worth distinguishing here:

1. Index condition filtering (Index Condition Pushdown, ICP):

  • When parts of the WHERE condition refer only to columns in the index being used, MySQL pushes them down to the storage engine layer, so that the storage engine can filter at the index level and the MySQL server has less data to process.

2. Covering index (Covering Index):

  • When all columns a query needs are already contained in the index, MySQL can answer the query from the index alone and avoid accessing the table's data rows. This reduces I/O because all needed data comes from the index. It is a separate optimization from ICP: when the index covers the query, there are no full rows to avoid reading.

Example:

Suppose we have a simple table employees with the following fields:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT,
  department VARCHAR(50),
  salary INT,
  INDEX idx_department (department),
  INDEX idx_age_salary (age, salary)
);

Now, we want to query the employees whose department is "Sales" and whose age is greater than or equal to 30 years old, and only need to return the two columns of id and name.

SELECT id, name
FROM employees
WHERE department = 'Sales' AND age >= 30;

In this case the optimizer will typically use idx_department to find the index entries for department = 'Sales'. That condition is the index lookup itself, not a pushed-down filter. The remaining condition age >= 30 cannot be pushed down with this index, because age is not one of its columns, so the server reads each candidate row and checks the age there; id and name are then taken from the rows that qualify. ICP comes into play when the remaining condition uses only columns of the chosen index. With idx_age_salary, for example, a query that filters on both age and salary can scan the age range in the index and let the storage engine check salary against the index entry before it reads the full row.

ICP is effective only for certain types of queries, and its benefit varies with the database structure, data volume, and query complexity. You can evaluate its impact on query performance by looking at execution plans and running performance tests.
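
The effect of ICP can be observed by switching it off and on for a session. This sketch assumes the employees table with idx_age_salary from this section; the salary threshold 5000 is an invented illustration value, and the optimizer may choose a different plan on very small tables.

```sql
-- ICP is enabled by default; set it explicitly for clarity.
SET optimizer_switch = 'index_condition_pushdown=on';

-- Both conditions use columns of idx_age_salary. When the index is
-- chosen and ICP applies, Extra shows "Using index condition".
EXPLAIN
SELECT id, name
FROM employees
WHERE age >= 30 AND salary > 5000;

-- With ICP off, the same plan shows "Using where" instead: the server
-- filters after the full rows have been read.
SET optimizer_switch = 'index_condition_pushdown=off';
EXPLAIN
SELECT id, name
FROM employees
WHERE age >= 30 AND salary > 5000;

-- Restore the default.
SET optimizer_switch = 'index_condition_pushdown=on';
```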

7. Nested loop join algorithm

MySQL executes joins between tables using a nested-loop algorithm or variations on it. A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time and passes each row to a nested loop that processes the next table in the join. This process is repeated as many times as there are tables remaining to be joined.

There are two main variants: the block nested-loop join algorithm and the simple nested-loop join algorithm (NLJ).

1. Block nested-loop join algorithm (Block Nested-Loop Join, BNL):

  • In the block nested-loop join algorithm, MySQL buffers rows read from the outer table in a block of memory (the join buffer), then scans the inner table and compares each of its rows with all rows in the buffer. Buffering reduces the number of times the inner table has to be read.

  • Matching row combinations are returned as results. The process is repeated until all rows of the outer table have been processed.

2. Simple nested-loop join algorithm (Nested-Loop Join, NLJ):

  • The simple nested-loop join algorithm is similar to the block nested-loop join algorithm, but it does not read rows into a buffer first. It is the most basic nested-loop join algorithm.

  • The NLJ algorithm works row by row: it reads rows from one table one at a time, then passes each row to a nested loop that processes the next table in the join.

  • This process is repeated until all rows of the outer table have been processed.

Which algorithm is used depends on the MySQL optimizer, which selects a join algorithm according to the query conditions, table sizes, available indexes, and other factors.

The join type determines which rows appear in the result. MySQL supports inner joins (INNER JOIN), left joins (LEFT JOIN), right joins (RIGHT JOIN), and cross joins (CROSS JOIN). It does not support FULL JOIN, which has to be emulated, for example with a UNION of a left join and a right join. The join type also affects the optimizer's choices: for outer joins, the order in which the tables can be read is more constrained than for inner joins.

The nested-loop join algorithm can perform poorly in some cases, especially when the joined tables contain a large amount of data. In practice, indexes on the join columns are the main way to speed up joins, and a sensible join order and join type help the optimizer choose a better plan. Inspecting execution plans and running performance tests are also effective ways to optimize join operations.

The following example demonstrates the nested-loop join algorithm (Nested-Loop Join, NLJ) in MySQL.

Suppose we have two tables students and scores, which contain student information and student performance information respectively.

  • Student information table
CREATE TABLE students (
  student_id INT PRIMARY KEY,
  name VARCHAR(50),
  age INT
);

  • Student scores table
CREATE TABLE scores (
  student_id INT,
  subject VARCHAR(50),
  score INT
);
  • Insert some data
INSERT INTO students (student_id, name, age)
VALUES (1, 'Alice', 20),
       (2, 'Bob', 22),
       (3, 'Charlie', 21);
INSERT INTO scores (student_id, subject, score)
VALUES (1, 'Math', 85),
       (1, 'Science', 78),
       (2, 'Math', 92),
       (2, 'Science', 80),
       (3, 'Math', 88);

Now, we want to query for each student's name, age, and math score.

SELECT s.name, s.age, sc.score
FROM students s
JOIN scores sc ON s.student_id = sc.student_id
WHERE sc.subject = 'Math';

Conceptually, this query can be executed as a nested-loop join (NLJ): MySQL reads one table row by row and, for each row, looks up the matching rows in the other table, returning the combinations that satisfy the conditions. Which table drives the loop is up to the optimizer. Because students.student_id is the primary key and scores has no index, MySQL may well scan scores, keep the 'Math' rows, and look up each student by primary key, or use a hash join on MySQL 8.0.18 and later.

The query in the example returns results similar to the following:

+---------+-----+-------+
| name    | age | score |
+---------+-----+-------+
| Alice   | 20  | 85    |
| Bob     | 22  | 92    |
| Charlie | 21  | 88    |
+---------+-----+-------+

Note that this is just a simple example; queries and data volumes in real applications are usually more complex. The MySQL optimizer chooses the join algorithm case by case and does not always use a nested-loop join. For complex queries, the final execution plan may involve multiple tables and multiple join operations. Looking at execution plans and running performance tests shows which join algorithm and optimization strategy MySQL chooses in real-world scenarios.
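
For the students and scores example, the plan and the effect of an index on the join column can be checked as sketched below. The index name idx_scores_student is an assumption made for illustration; the plans depend on data volume and MySQL version.

```sql
-- Shows the join order (one output row per table, in read order)
-- and the access type used for each table.
EXPLAIN
SELECT s.name, s.age, sc.score
FROM students s
JOIN scores sc ON s.student_id = sc.student_id
WHERE sc.subject = 'Math';

-- An index on the join column (plus the filtered column) lets a
-- nested-loop join look up a student's scores directly instead of
-- scanning the scores table.
CREATE INDEX idx_scores_student ON scores (student_id, subject);

-- Run EXPLAIN again to see whether the plan changes.
EXPLAIN
SELECT s.name, s.age, sc.score
FROM students s
JOIN scores sc ON s.student_id = sc.student_id
WHERE sc.subject = 'Math';
```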

8. Nested join optimization (JOIN)

A join (Join) is an SQL operation that combines data from multiple tables. Tables are joined with the JOIN clause to create a result set that contains the related data from these tables.

MySQL's join syntax also supports nested joins, in which a parenthesized join expression is used as one operand of another join. Together with chaining several JOIN clauses, this makes it possible to relate more than two tables through multiple join conditions.

In general, the basic form of join syntax is as follows:

SELECT *
FROM table1
JOIN table2 ON table1.column = table2.column;

In the above example, JOIN without a qualifier is an inner join (INNER JOIN): it connects table1 and table2 on their column values, and only rows satisfying the join condition are returned in the result set.

If you need to add another table to the join, you can chain a further JOIN clause. For example:

SELECT *
FROM table1
JOIN table2 ON table1.column = table2.column
JOIN table3 ON table2.column = table3.column;

In this example the tables table1, table2 and table3 are joined in a chain. The join conditions are table1.column = table2.column and table2.column = table3.column in turn. For inner joins, this gives the same result as nesting the second join in parentheses.

MySQL's join syntax supports several types of joins: inner join (INNER JOIN), left join (LEFT JOIN), right join (RIGHT JOIN), and cross join (CROSS JOIN). FULL JOIN is not supported in MySQL. Different types of joins produce different result sets.

Joins over many tables increase query complexity, especially when the join conditions are complex. In practice, join only the tables you need to avoid the performance degradation caused by too many table joins. Suitable indexes on the join columns and well-chosen query conditions are also key to good join performance.
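
A nested join in the stricter sense uses parentheses, and the difference matters once outer joins are involved. The sketch below reuses the placeholder names table1, table2, table3, and column from the examples above.

```sql
-- Nested join: table2 and table3 are joined first, and the combined
-- result is then left-joined to table1. Every row of table1 is kept;
-- the other columns are NULL when no table2/table3 pair matches.
SELECT *
FROM table1
LEFT JOIN (table2 JOIN table3 ON table2.column = table3.column)
  ON table1.column = table2.column;
```

With inner joins only, the parentheses can be dropped without changing the result; with outer joins, removing or moving them can change which rows are returned.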

Summary:

Performance optimization of MySQL databases is critical to application efficiency and user experience. This article discussed several aspects of optimizing SELECT statements, including improving the WHERE clause, range optimization, hash joins as a replacement for the block nested-loop join algorithm, condition pushdown, and nested-loop joins. Developers and database administrators should study how the MySQL optimizer works and choose optimization measures that fit their specific situation. Well-written queries and a sound index design can significantly improve query performance and better meet user needs.

Origin blog.csdn.net/qq_42055933/article/details/132032373