How to tune MySQL? 7 deadly tricks to speed up slow SQL by 100 times

Foreword:

In the reader community (50+ groups) of Nien, the 40-year-old architect, many members struggle to get offers, or to get good ones.

Nien often helps optimize projects and resumes, and digs out technical highlights for everyone. Java tuning is an important way to strengthen your resume.

The problem is that many readers lack even the basics of tuning, let alone an understanding of high-concurrency scenarios.

In fact, whether it is tuning or high-concurrency scenarios, we need to solve some basic problems, such as:

100 million users, each generating 10 operations per day on average, implies more than 5,000 concurrent users on average, a peak of 50,000 to 100,000 concurrent users, and a peak QPS above 250,000. How do you stress-test this? How do you tune it?

For architects and advanced developers, tuning is the core content, and stress testing is the most important part of internal skills.

Drawing on senior architecture experience and industry cases, the Nien team has compiled a series of "Java Tuning Bible" PDF e-books. Five parts are planned:

(1) Tuning Bible 1: Mastering JMeter distributed load testing from scratch, with a 100k+ QPS load-testing practice (completed)

(2) Tuning Bible 2: from 70s to 20ms, a 3,500x performance optimization case, with a solution everyone can use (completed)

(3) Tuning Bible 4: How to tune MySQL? 7 deadly tricks to speed up slow SQL by 100 times, achieving MySQL tuning freedom (this article)

(4) Tuning Bible 3: Mastering JVM tuning from scratch, achieving JVM tuning freedom (in writing)

(5) Tuning Bible 5: Mastering Linux and Tomcat tuning from scratch, achieving infrastructure tuning freedom (in writing)

The above articles will be published in succession on the "Technology Free Circle" official account. The complete "Java Tuning Bible" PDF is available from Nien.

Background of MySQL tuning:

MySQL is a relational database management system widely used in applications of all sizes and types. In actual database applications, we often face various performance bottlenecks and challenges.

When database performance degrades, applications see longer response times and lower throughput, and the system may even crash.

Therefore, tuning MySQL is a key part to ensure the efficient operation of the database system.

In the process of actually using MySQL, we may encounter a series of problems, such as:

  1. Degraded query performance: Some query statements execute slowly, slowing application response times and degrading the user experience.
  2. Concurrent access problems: When many users access the database at the same time, problems such as lock contention and deadlock may occur, degrading system performance.
  3. Improper database configuration: MySQL's default configuration may not suit a specific application; parameters need appropriate adjustment for better performance.
  4. Hardware and operating system limitations: The database server's hardware and OS resources may become a bottleneck that limits database performance.
  5. Storage engine selection: Choosing an appropriate storage engine is crucial for certain application scenarios; different storage engines have different performance characteristics.

Facing these problems, we need to take a series of tuning measures to improve the performance and scalability of the MySQL database.

This article will discuss the key steps and techniques of MySQL tuning through actual cases, and help you understand how to identify and solve common performance problems, so as to optimize the performance of the database system.

In the following sections, we use real cases to examine the specific techniques and practices of monitoring and diagnosis, query optimization, database configuration tuning, and hardware and operating system optimization, to help readers fully understand the actual process of MySQL tuning and ultimately improve the performance and stability of their applications.

Seven deadly tricks to speed up "slow SQL" by 100 times

MySQL query optimization and tuning are common problems in production, and they are also common interview questions. Nien's team has summarized a large number of production cases and practical experience, and extracted seven deadly tricks to speed up "slow SQL" by 100 times:

Step 1: Index optimization:

  • Design indexes sensibly: based on query conditions and access patterns, design appropriate indexes, including single-column, composite, and unique indexes.
  • Avoid too many indexes: excess indexes increase data-maintenance overhead and slow down write operations.
  • Maintain indexes regularly: drop indexes that are no longer used, and rebuild or reorganize fragmented indexes to keep them efficient.
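
The effect of a well-matched composite index can be seen in a self-contained sketch. It uses SQLite through Python's standard sqlite3 module purely so it runs anywhere; MySQL's CREATE INDEX and EXPLAIN behave analogously, and the table, columns, and data below are invented for illustration:

```python
import sqlite3

# In-memory database standing in for a MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, status TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders (user_id, status, created_at) VALUES (?, ?, ?)",
    [(i % 100, "PAID" if i % 3 else "NEW", f"2023-01-{i % 28 + 1:02d}") for i in range(1000)])

query = "SELECT * FROM orders WHERE user_id = 42 AND status = 'PAID'"

# Without an index, the planner must scan the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Composite index matching the query's equality conditions.
conn.execute("CREATE INDEX idx_orders_user_status ON orders (user_id, status)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before[0][-1])  # e.g. a full "SCAN" of orders
print(plan_after[0][-1])   # e.g. a "SEARCH ... USING INDEX idx_orders_user_status"
```

The same before/after check with EXPLAIN is how you would verify an index in MySQL.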

Step 2: Covering indexes:

  • Use covering indexes to avoid table lookups: if all queried columns are contained in the index, the query can avoid accessing the table's data rows entirely, improving query performance.
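
Whether a query is satisfied entirely from the index is visible in the execution plan. A sketch using SQLite via Python's sqlite3 (a stand-in for MySQL, where EXPLAIN shows "Using index" for covered queries; table and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category_id INT, price REAL, name TEXT)")
conn.executemany(
    "INSERT INTO products (category_id, price, name) VALUES (?, ?, ?)",
    [(i % 20, float(i), f"p{i}") for i in range(500)])

# The index contains every column the query touches:
# category_id (the filter) and price (the output).
conn.execute("CREATE INDEX idx_cat_price ON products (category_id, price)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT price FROM products WHERE category_id = 7").fetchall()
print(plan[0][-1])  # SQLite reports a COVERING INDEX: no table row is read
```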

Step 3: Index condition pushdown:

  • Use Index Condition Pushdown (ICP) to optimize queries: MySQL 5.6+ supports ICP, which filters some WHERE conditions at the index level, reducing table lookups and improving query efficiency.

Step 4: Subquery optimization:

  • Avoid heavy use of subqueries: too many subqueries increase the complexity and cost of a query. Performance can often be improved by rewriting subqueries as join queries or materializing them into temporary tables.
  • Use the appropriate subquery type: select the right subquery form for the scenario, such as scalar subqueries, correlated subqueries, or EXISTS subqueries.
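
The subquery-to-join rewrite can be verified in a runnable sketch (SQLite via Python's sqlite3 as a stand-in for MySQL; schema and data invented), showing both forms return identical results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, amount REAL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "SH" if i % 2 else "BJ") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(i % 100 + 1, i * 1.0) for i in range(300)])

# IN-subquery form.
sub = conn.execute("""
    SELECT id FROM orders
    WHERE user_id IN (SELECT id FROM users WHERE city = 'SH')
    ORDER BY id""").fetchall()

# Equivalent JOIN form -- usually easier for the optimizer to plan well.
join = conn.execute("""
    SELECT o.id FROM orders o
    JOIN users u ON u.id = o.user_id
    WHERE u.city = 'SH'
    ORDER BY o.id""").fetchall()

assert sub == join  # identical result sets, since users.id is unique
```

The rewrite is safe here because the join key is a primary key, so the join cannot duplicate order rows.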

Step 5: Sorting optimization:

  • Avoid unnecessary sorting in queries: If the query does not require sorting results, you can adjust query conditions or index design to avoid sorting operations and improve query performance.
  • Optimize sorting operations: For queries that require sorting, you can optimize sorting operations by designing reasonable indexes, increasing the size of the sorting buffer, and adjusting the sorting algorithm.
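
How an index on the sort column removes the explicit sort step can be seen in a small sketch (SQLite via Python's sqlite3 as a stand-in for MySQL; names and data invented). SQLite reports the extra sort as a temp B-tree, much as MySQL's EXPLAIN reports "Using filesort":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, ts TEXT, msg TEXT)")
conn.executemany("INSERT INTO logs (ts, msg) VALUES (?, ?)",
                 [(f"2023-{i % 12 + 1:02d}-01", f"m{i}") for i in range(200)])

q = "EXPLAIN QUERY PLAN SELECT ts, msg FROM logs ORDER BY ts"

# Without an index on ts, an explicit sort step (temp B-tree) is needed.
before = [row[-1] for row in conn.execute(q)]

conn.execute("CREATE INDEX idx_logs_ts ON logs (ts)")
# With the index, rows are read in ts order and the sort step disappears.
after = [row[-1] for row in conn.execute(q)]

print(before)  # includes a 'USE TEMP B-TREE FOR ORDER BY' step
print(after)
```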

Step 6: Use of query cache:

  • Use the query cache judiciously: the query cache stores query results and can improve read performance, but under high concurrency and frequent updates its invalidation overhead often hurts more than it helps, so configure it according to the workload. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this trick applies only to older versions.

Step 7: SQL statement optimization:

  • Optimize the writing of query statements: choose reasonable query methods, use correct keywords and functions, and reduce unnecessary calculations and data operations.
  • Avoid full table scans: try to use indexes to cover queries, avoid full table scan operations, and reduce IO overhead and query time.
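
One common cause of full scans is wrapping an indexed column in a function, which hides it from the index. A sketch (SQLite via Python's sqlite3; MySQL behaves the same way, and the table and data are invented) contrasts a non-sargable predicate with an equivalent range rewrite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [(f"user{i:04d}",) for i in range(1000)])
conn.execute("CREATE INDEX idx_users_name ON users (name)")

# Wrapping the column in a function hides it from the index -> a full scan.
bad = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE substr(name, 1, 4) = 'user'").fetchall()

# Rewriting as a range predicate on the bare column lets the index work.
good = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name >= 'user' AND name < 'uses'").fetchall()

print(bad[0][-1])   # a full scan (of the table or index)
print(good[0][-1])  # an index range search
```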

Nien's team has refined the seven deadly tricks to speed up "slow SQL" by 100 times. Let's look at a comprehensive tuning case below.

Case background:

In this article, we will take an online e-commerce website as an example to discuss a practical case of MySQL tuning. As a popular online shopping platform, the e-commerce website has a large number of users browsing products and placing orders every day.

However, as the number of users increased and the transaction volume increased, their database performance encountered some challenges.

The database of the e-commerce website mainly stores important data such as product information, user information, and orders. As the business grows, the database grows with it, and the number of records and indexes per table increases sharply, bringing a series of performance problems with it.

First, when users search or browse products, some query statements execute slowly, prolonging website response times. The user experience degrades, which can even drive users away.

Secondly, the website will experience high concurrency during a certain period of time, such as during promotional activities. A large number of users accessing the database at the same time may lead to concurrent access problems such as lock competition and deadlock, which will affect the performance and availability of the system.

In addition, the configuration of hardware resources and operating system parameters of the database server may also become a performance bottleneck. The original hardware and operating system settings cannot meet the requirements of the current database load, resulting in the inability to fully utilize the database performance.

In the face of these challenges, MySQL tuning is necessary to improve database performance and scalability. We can solve performance bottlenecks and concurrent access problems by optimizing query statements, adjusting database configuration parameters, optimizing storage engines and hardware, and ensuring that the database can support growing business needs stably and efficiently.

In the following content, we will introduce in detail the tuning steps and practical experience adopted by the e-commerce website, and show how to improve database performance, optimize query efficiency, and achieve fast and reliable online shopping experience.

The MySQL tuning process mainly includes:

  1. Database configuration tuning
  2. Hardware and OS tuning
  3. SQL statement query optimization

Let's start with database configuration tuning.

Database configuration tuning

Database configuration is an important part of MySQL tuning. By properly adjusting database parameter settings, performance and resource utilization can be significantly improved.

In this chapter, we'll explore some key database configuration tuning strategies and provide corresponding examples of actual scripts or commands.

Adjust database buffer size and thread pool settings

1. InnoDB buffer pool size (innodb_buffer_pool_size):

For e-commerce website platforms, we recommend setting this parameter to 70%-80% of total memory on a dedicated database server.

For example, if the server has 16GB of memory, you can set the size of the buffer pool to 12GB-14GB to ensure that most of the data and indexes can be cached in memory to improve read performance.

Example (innodb_buffer_pool_size can be resized online from MySQL 5.7 onward):

mysql> SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
+-------------------------+-----------+
| Variable_name           | Value     |
+-------------------------+-----------+
| innodb_buffer_pool_size | 134217728 |
+-------------------------+-----------+
1 row in set (0.00 sec)

mysql> SET GLOBAL innodb_buffer_pool_size = 12*1024*1024*1024;
Query OK, 0 rows affected (0.00 sec)

2. Query cache settings (query_cache_type, query_cache_size):

In the e-commerce website platform, updates are frequent, so we recommend disabling the query cache (on MySQL 8.0 the query cache has been removed and these variables no longer exist).

Query caching may bring more performance burdens in environments with high concurrency and frequent updates.

Example:

mysql> SHOW VARIABLES LIKE 'query_cache%';
+------------------------------+---------+
| Variable_name                | Value   |
+------------------------------+---------+
| query_cache_limit            | 1048576 |
| query_cache_min_res_unit     | 4096    |
| query_cache_size             | 1048576 |
| query_cache_type             | OFF     |
| query_cache_wlock_invalidate | OFF     |
+------------------------------+---------+
5 rows in set (0.00 sec)

Query cache settings in the MySQL configuration file:

[mysqld]
query_cache_type = 0
query_cache_size = 0

3. Thread pool settings (thread_pool_size, thread_pool_max_threads):

For the high concurrent access of the e-commerce website platform, we recommend sizing the thread pool and its maximum thread count according to the expected number of concurrent connections and the system load. Note that the thread pool is a plugin shipped with MySQL Enterprise Edition, Percona Server, and MariaDB; the MySQL community edition does not include it.

You can first set a smaller value according to the actual situation, and then make dynamic adjustments by monitoring the system load and connection pool status.

Example:

SHOW VARIABLES LIKE 'thread_pool%';

Make thread pool settings in the MySQL configuration file:

[mysqld]
thread_pool_size = 100
thread_pool_max_threads = 200

Optimize the parameter configuration of the InnoDB storage engine

1. Log configuration (innodb_log_file_size, innodb_log_buffer_size):

For e-commerce website platforms, we recommend adjusting the log file size and buffer size according to the log writing speed and transaction processing volume of the database.

In general, the log file size should be large enough to reduce the frequency of log switching, while the buffer size should be able to accommodate a certain number of log writes.

Example:

mysql> SHOW VARIABLES LIKE 'innodb_log_file%';
+---------------------------+----------+
| Variable_name             | Value    |
+---------------------------+----------+
| innodb_log_file_size      | 50331648 |
| innodb_log_files_in_group | 2        |
+---------------------------+----------+
2 rows in set (0.00 sec)

Note that innodb_log_file_size cannot be changed at runtime with SET GLOBAL (and innodb_log_buffer_size only became dynamic in MySQL 8.0); changes to these variables require a server restart.

Make permanent settings in the MySQL configuration file:

[mysqld]
innodb_log_file_size = 1G
innodb_log_buffer_size = 32M

2. Lock configuration (innodb_lock_wait_timeout):

For concurrent access to the e-commerce website platform, we recommend setting the lock wait timeout to a shorter value to reduce long lock waits.

In general, you can set the lock wait timeout to a few seconds.

Example:

mysql> SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+
1 row in set (0.00 sec)

Adjust the lock wait timeout:

SET GLOBAL innodb_lock_wait_timeout = 5;

Make permanent settings in the MySQL configuration file:

[mysqld]
innodb_lock_wait_timeout = 5

3. Auto-increment lock mode (innodb_autoinc_lock_mode): For the high-concurrency environment of the e-commerce website platform, consider setting the auto-increment lock mode to 2, i.e. interleaved mode (the default in MySQL 8.0). This reduces lock contention on auto-increment columns and improves insert concurrency, but it is only safe when the binary log is row-based.

Example:

mysql> SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_autoinc_lock_mode | 1     |
+--------------------------+-------+
1 row in set (0.00 sec)

innodb_autoinc_lock_mode is read-only at runtime and cannot be changed with SET GLOBAL; set it in the configuration file and restart the server.

Make permanent settings in the MySQL configuration file:

[mysqld]
innodb_autoinc_lock_mode = 2

Adjust logging and locking strategies to improve concurrency performance

1. Transaction isolation level (transaction-isolation): For e-commerce website platforms, it is recommended to set the transaction isolation level to "READ-COMMITTED" to balance the requirements of concurrency performance and data consistency.

Example:

mysql> SELECT @@global.tx_isolation;
+-----------------------+
| @@global.tx_isolation |
+-----------------------+
| REPEATABLE-READ       |
+-----------------------+
1 row in set, 1 warning (0.00 sec)

Set the transaction isolation level (on MySQL 8.0, use the transaction_isolation variable; tx_isolation has been removed):

mysql> SET GLOBAL tx_isolation = 'READ-COMMITTED';
Query OK, 0 rows affected, 1 warning (0.00 sec)

Make permanent settings in the MySQL configuration file:

[mysqld]
transaction-isolation = READ-COMMITTED

2. Concurrency control strategy (innodb_thread_concurrency):

According to the number of concurrent connections and system load of the e-commerce website platform, adjust the concurrency control strategy appropriately.

You can leave innodb_thread_concurrency at 0 (the default), which disables the concurrency limit and lets InnoDB schedule threads freely; set a nonzero cap only if monitoring shows thread contention under heavy load.

Example:

mysql> SHOW VARIABLES LIKE 'innodb_thread_concurrency';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| innodb_thread_concurrency | 0     |
+---------------------------+-------+
1 row in set (0.00 sec)

Configure thread concurrency control settings in the MySQL configuration file:

[mysqld]
innodb_thread_concurrency = 0

3. Locking optimization: For frequently updated tables on the e-commerce website platform, use row-level locking (as in InnoDB) rather than table-level locking to reduce lock conflicts and improve concurrency. Note that InnoDB takes row locks on index records, so a statement without a usable index may lock far more rows than intended; adding appropriate indexes to such tables further improves concurrency.

Through careful database configuration tuning, we can maximize the performance and reliability of the MySQL database, thereby ensuring the high concurrent access and transaction processing capabilities of the e-commerce website platform.

In practical applications, it is recommended to conduct performance testing and evaluation according to specific business needs and system conditions, and to conduct regular tuning and optimization checks.

In the following chapters, we will delve into the key steps of hardware and operating system optimization to comprehensively improve the performance and reliability of MySQL databases.

Hardware and OS optimization

Optimization of hardware and operating systems is also critical to the performance and stability of MySQL databases.

In the e-commerce website platform, we can improve system performance through the following configuration and optimization strategies to meet the requirements of high concurrent access and massive data processing.

Hardware Configuration

Hardware configuration is determined by specific needs and budget, so it varies from case to case. However, based on the characteristics of the e-commerce website platform described above, a reasonable reference configuration is:

1. Memory (RAM):

Allocate enough memory to hold the database's cache and index data.

For medium to large e-commerce website platforms, it is recommended to choose at least 64GB to 128GB of memory.

For larger platforms, higher capacities of memory may be required.

2. Storage device:

Choose high-performance storage devices to provide fast data read and write capabilities.

For data files and log files, it is recommended to use a solid-state drive (SSD) or NVMe SSD to improve data access speed.

For larger data volumes, consider using a RAID array for better performance and redundancy.

3. CPU:

Choose a multi-core CPU to improve concurrent processing capabilities.

For large-scale e-commerce website platforms, it is recommended to choose high-performance server-grade CPUs, such as the Intel Xeon or AMD EPYC series, with a generous core count and high clock frequency.

4. Network bandwidth:

Make sure there is enough network bandwidth to handle a large number of concurrent requests and data transfers.

For e-commerce website platforms, it is recommended to choose a high-speed network connection, such as Gigabit Ethernet or a higher-speed network.

Please note that this is only a rough hardware configuration reference, and specific hardware requirements should be considered comprehensively based on actual conditions, budget, and performance requirements.

Of course, medium and large e-commerce platforms generally deploy MySQL as a cluster, and cluster deployment also needs to consider the following aspects:

  1. Data sharding: distribute the database's data across multiple nodes, each responsible for part of the data. When configuring distributed MySQL, decide the sharding strategy and rules, such as sharding by user ID, geographic location, or another business-related key.
  2. Master-slave replication: configure master-slave replication on each node to replicate and synchronize data. The master handles writes and the slaves handle reads; replication improves the availability and scalability of the system.
  3. Load balancing: use a load balancer to distribute requests across database nodes, deciding where to send each request based on load, node status, and other policies. Common load balancers include Nginx and HAProxy.
  4. High availability: configure failover and automatic failback so the system remains available when a node fails; this can be built on master-slave replication plus heartbeat monitoring.
  5. Security: configure access control, encrypted transport, firewall rules, and other measures to protect the security and integrity of data.
  6. Network communication: keep inter-node communication smooth and low-latency, for example with high-speed links and tuned network settings.
  7. Monitoring and performance optimization: deploy monitoring and tuning tools, such as Prometheus and Grafana, for real-time monitoring and analysis of the distributed deployment.

The above are the general configuration principles of distributed MySQL servers. The exact configuration depends on application requirements, data volume, budget, and performance requirements.
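
The user-ID sharding rule from item 1 above can be sketched in a few lines. The modulo rule, shard count, and connection strings below are illustrative assumptions, not a fixed design:

```python
# Hypothetical shard count and DSNs, for illustration only.
SHARD_COUNT = 4
SHARD_DSNS = [f"mysql://db-node-{i}/shop" for i in range(SHARD_COUNT)]

def shard_for(user_id: int) -> str:
    """Route a user's rows to one node: the same user always lands on the same shard."""
    return SHARD_DSNS[user_id % SHARD_COUNT]

# All of one user's data stays together; different users spread across nodes.
assert shard_for(42) == shard_for(42)
print(shard_for(7))    # mysql://db-node-3/shop
print(shard_for(100))  # mysql://db-node-0/shop
```

A modulo rule is simple but makes resharding costly; consistent hashing or range sharding are common alternatives when the node count must change.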

Operating system configuration

The operating system can be configured to suit MySQL. Below are some common configuration items, with their descriptions, purposes, and the scripts and commands used to apply them:

1. File descriptor limit:

A file descriptor is an identifier used by the operating system to keep track of open files.

Adjusting the file descriptor limit can improve MySQL's ability to handle concurrent connections and file operations.

File configuration method:

  • Edit the /etc/security/limits.conf file and add the following configuration:
    * soft nofile 65536
    * hard nofile 65536
    

Command configuration method:

  • Set soft limits:
    ulimit -n 65536
    
  • Set a hard limit:
    ulimit -Hn 65536
    

Explanation:

MySQL uses file descriptors to manage database files, log files, etc.

By increasing the file descriptor limit, MySQL can open more files at the same time, improving the ability of concurrent connections and file operations.

2. Network parameter tuning:

Adjusting network parameters can improve the network communication performance of the database.

Configuration method:

  • Edit the /etc/sysctl.conf file and add the following configuration:

    net.core.somaxconn = 65535
    net.ipv4.tcp_max_syn_backlog = 65535
    net.ipv4.tcp_tw_reuse = 1
    
  • Reload configuration file:

    sysctl -p
    

Explanation:

  • net.core.somaxconn: the maximum length of the listen backlog queue for accepted connections; increasing it lets the MySQL server queue more simultaneous connection attempts.
  • net.ipv4.tcp_max_syn_backlog: the maximum number of half-open connections (SYN received, handshake not yet complete) the kernel will hold; increasing it accommodates larger connection bursts.
  • net.ipv4.tcp_tw_reuse: allows reusing sockets in TIME-WAIT state for new outbound connections, reducing TIME-WAIT buildup and freeing system resources.

3. Time synchronization:

Time synchronization ensures the time accuracy of the database server and prevents data problems caused by time out of synchronization.

Configuration method:

  • Install and configure the NTP (Network Time Protocol) service to synchronize the server time with the standard time.

Install command:

  • Install NTP service (Ubuntu as an example):
    apt-get install ntp
    
  • Start the NTP service:
    service ntp start
    

Explanation:

Time synchronization is to ensure time consistency between the database server and other servers and clients.

Many functions and logging of MySQL rely on accurate time. By installing and configuring the NTP service, the database server can be synchronized with the standard time source to avoid data problems caused by time out of synchronization.

The above are some common configuration items for optimizing MySQL at the operating system level. Note that specific configurations may vary by OS version and distribution. When configuring, it is recommended to refer to official documents and operating system best practice guidelines, and make appropriate adjustments based on actual needs.

Next, we continue to optimize queries at the SQL level.

Query optimization

The first step in query optimization is to understand the query optimizer. The query optimizer is a key component of MySQL: it analyzes query statements and generates the optimal execution plan.

The query optimizer evaluates different execution plans based on factors such as query complexity, table statistics, and indexes, and selects the execution plan with the lowest cost to execute the query.

The working principle and related concepts of the query optimizer are as follows:

  1. The workflow of the query optimizer:
    • Parse the query statement: The optimizer first parses the query statement and converts it into an internal query tree or logical expression.
    • Query rewriting: The optimizer may rewrite the query to improve its structure and conditions.
    • Query optimization: The optimizer generates candidate execution plans based on statistics, indexes, and other relevant information, and evaluates the cost of each plan.
    • Select the optimal execution plan: The optimizer selects the plan with the lowest cost and generates its execution instructions.
    • Execute the query: MySQL's execution engine runs the query according to the chosen plan and returns the result.
  2. The optimization process of the query optimizer:
    • Query estimation: The optimizer estimates the size of the result set from statistics and query conditions to decide which execution plan to use.
    • Index selection: The optimizer decides whether to use an index, and which one, based on the selectivity of the index and of its columns.
    • Join order selection: For queries involving multiple tables, the optimizer chooses a join order that reduces the size of intermediate result sets and the cost of join operations.
    • Subquery optimization: The optimizer attempts to convert subqueries into joins or apply other techniques to reduce how often subqueries execute and what they cost.
    • Query rewriting: The optimizer may rewrite the query into an equivalent form that executes more efficiently.
  3. Use of statistical information:
    • Table statistics: The optimizer uses table statistics, such as the number of rows and the number of unique values in a column, to estimate the selectivity and cost of a query.
    • Index statistics: The optimizer uses index statistics, such as index selectivity and average data page size, to evaluate the cost of using an index.
    • Updating statistics: Statistics drift as data changes, so they may need to be refreshed regularly to keep query optimization accurate.
  4. Influencing factors of the query optimizer:
    • Query complexity: The more complex the query, the more execution plans the optimizer must consider, and the more optimization time and effort it takes.
    • Data distribution: The distribution of the data affects the optimizer's index and join-order decisions; different distributions may lead to different execution plans.

Only by understanding the logic involved in the query optimizer can we better grasp the related methods of query optimization.

The main means of query optimization

SQL query optimization involves many aspects. The following is a detailed description of some common SQL query optimization techniques and related aspects:

Optimize the query statement:

  • Use appropriate SQL statements: choose statements suited to the query's needs and avoid redundant or overly complex operations.
  • Reduce the amount of data returned: select only the required columns to cut network transfer and result-set processing overhead.

Create appropriate indexes:

A MySQL index is a data structure used to speed up data retrieval and improve query performance.

It is similar to the directory of a book. By sorting and storing according to the value of one or more columns, the database can locate and access specific data rows faster.

The types of indexes are as follows:

  • B-Tree index: the B-Tree (balanced tree) is MySQL's most common index type. It stores index entries in a tree structure and supports fast range and point lookups.
  • Hash index: a hash index maps the indexed value to a hash bucket. It works for equality lookups but not for range queries or sorting; in MySQL, explicit hash indexes are supported only by the MEMORY (and NDB) storage engines.
  • Full-text index: a full-text index supports full-text search over text columns, providing more advanced text search capabilities.

How indexes are created and viewed:

  • Create an index: use the CREATE INDEX statement; you can choose a single-column or composite index and specify the index type and other options.
  • View indexes: use the SHOW INDEX statement to view a table's index information, including index name, column names, type, and so on.
  • Modify and delete indexes: use ALTER TABLE (or DROP INDEX) to modify or remove existing indexes.
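
These operations can be tried in a self-contained sketch (SQLite via Python's sqlite3; the MySQL equivalents are CREATE INDEX, SHOW INDEX FROM t, and ALTER TABLE t DROP INDEX; the table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b INT, c TEXT)")

# Create: a single-column index and a unique composite index.
conn.execute("CREATE INDEX idx_a ON t (a)")
conn.execute("CREATE UNIQUE INDEX idx_b_c ON t (b, c)")

# View: SQLite's equivalent of SHOW INDEX FROM t.
names = [row[1] for row in conn.execute("PRAGMA index_list('t')")]
print(sorted(names))  # ['idx_a', 'idx_b_c']

# Delete: MySQL uses ALTER TABLE t DROP INDEX idx_a; SQLite uses DROP INDEX.
conn.execute("DROP INDEX idx_a")
names = [row[1] for row in conn.execute("PRAGMA index_list('t')")]
assert names == ["idx_b_c"]
```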

When creating an index, the following principles can be followed to ensure the validity and performance of the index:

  1. Choose the right columns: pick columns that frequently appear in query conditions, joins, and sorting as index columns; these are used most often for data retrieval and filtering, so indexing them improves query performance.
  2. Consider column selectivity: prefer highly selective columns, i.e. those with many unique values. Highly selective columns narrow the search range faster and improve query efficiency.
  3. Avoid too many indexes: excess indexes increase the cost of inserts, updates, and deletes and occupy more storage; weigh the number of indexes against the performance gained, and avoid creating unnecessary ones.
  4. Consider composite indexes: when several columns frequently appear together in query conditions or joins, a composite index can cover them with a single index and improve query performance.
  5. Index length: for string columns, specify an appropriate prefix length (e.g. INDEX (name(10)) in MySQL) to reduce the storage the index occupies and its maintenance cost.
  6. Consider the sorting requirements of queries: if a query often sorts its results, including the sort columns in an index can avoid temporary tables and sort operations, improving query performance.
  7. Monitor and evaluate the index: after creating an index, evaluate its effect with performance monitoring tools and query analysis, and adjust or optimize it according to actual performance.
  8. Maintain indexes regularly: as data and query patterns evolve, the effectiveness of an index changes; rebuild or optimize indexes periodically to keep them efficient.

In general, index creation requires comprehensive consideration of query mode, data characteristics, and performance requirements.

By properly designing and using indexes, you can speed up queries, reduce the overhead of data scans, and improve the overall performance of your database.

When creating and managing indexes, it is necessary to select the appropriate index type, index columns, and index quantity according to actual needs and query modes, so as to achieve the best query optimization effect.

Optimize the data model and table structure:

  • Normalized data model : follow the specifications of database design, eliminate data redundancy, and improve query efficiency.
  • Reasonably divide tables and partitions : Divide large tables into smaller tables or use partitioning technology to improve query efficiency and data maintenance performance.

Monitor and analyze query performance:

  • Use performance monitoring tools : monitor database performance indicators, such as query response time, lock waiting time, etc., and discover performance bottlenecks in time.
  • Analyze the execution plan : use the EXPLAIN statement to analyze the execution plan of the query, check the index usage and performance bottlenecks, and optimize the query statement and index design.

Regular maintenance and optimization:

  • Regularly collect statistical information : By collecting table statistical information, optimize the decision of the query optimizer and improve the accuracy and performance of the query plan.
  • Rebuild indexes regularly : When the index fragmentation is serious, rebuild the index regularly to improve the efficiency of the index.

SQL query optimization is a comprehensive work, which needs to comprehensively consider database structure, index design, query statement, system configuration and other aspects. By continuously optimizing query performance, the response speed of the database and the overall performance of the system can be improved.

Actual cases of e-commerce scenarios

Preparation

Before demonstrating the cases, we need to prepare some data.

In an e-commerce platform, the core entities are users, products, and orders.

Therefore, we create three corresponding tables and initialize a large amount of data in batches.

Among them, the simple design of the table structure is as follows:

CREATE TABLE `my_customer` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL DEFAULT '' COMMENT 'name',
  `age` int(3) DEFAULT '20' COMMENT 'age',
  `gender` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'gender 0-female 1-male',
  `phone` varchar(20) DEFAULT '' COMMENT 'phone',
  `address` varchar(100) DEFAULT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `my_customer_name_IDX` (`name`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='customer';

CREATE TABLE `my_order` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `product_id` int(11) NOT NULL,
  `quantity` int(11) NOT NULL DEFAULT '1' COMMENT 'quantity',
  `total_price` int(11) NOT NULL DEFAULT '1' COMMENT 'total price',
  `order_status` smallint(5) unsigned NOT NULL DEFAULT '0' COMMENT 'order status 0-unpaid 1-paid 2-in delivery 3-received',
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='order';

CREATE TABLE `my_product` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(100) NOT NULL COMMENT 'product name',
  `type` int(11) NOT NULL DEFAULT '1' COMMENT 'type 1-clothes 2-food 3-books',
  `brand` varchar(100) DEFAULT '' COMMENT 'brand',
  `shop_id` int(11) NOT NULL DEFAULT '1' COMMENT 'shop ID',
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='product';

Among them, the user data volume is 1 million, the product data volume is 100,000, and the order data volume is nearly 10 million.
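Bulk data like this is easiest to generate with batched inserts. Below is a minimal, scaled-down sketch of the batching pattern in Python, using the standard-library sqlite3 purely as a stand-in for a MySQL driver (with MySQL you would use the same executemany pattern through a connector). The table shape loosely mirrors my_customer; all names and row counts here are illustrative:

```python
import random
import sqlite3

# In-memory SQLite stands in for MySQL; only the batching pattern matters.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE my_customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    phone TEXT)""")

BATCH = 1000   # insert rows in batches instead of one by one
TOTAL = 10_000 # scaled down from the 1,000,000 rows used in the article

rows = []
for i in range(TOTAL):
    # Hypothetical test data: sequential names, random 11-digit phone numbers.
    rows.append((f"user_{i}", f"1{random.randint(10**9, 10**10 - 1)}"))
    if len(rows) == BATCH:
        conn.executemany("INSERT INTO my_customer (name, phone) VALUES (?, ?)", rows)
        rows.clear()
if rows:  # flush the final partial batch
    conn.executemany("INSERT INTO my_customer (name, phone) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM my_customer").fetchone()[0]
print(count)  # 10000
```

Batching amortizes the per-statement round-trip and parsing cost, which is why seeding millions of rows one INSERT at a time is dramatically slower.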

Next, we analyze and optimize based on the common query scenarios of the actual e-commerce platform.

Scenario 1: User Search

E-commerce background management systems usually need to search for relevant user information based on user names, mobile phone numbers, and addresses.

Common query SQL statements are as follows:

select * from `my_customer` where phone like '%157%'

We use EXPLAIN to analyze how the SQL statement executes.

MySQL's EXPLAIN is a very useful tool that analyzes and explains the execution plan of a query statement, helping developers optimize query performance. Executing the EXPLAIN command returns detailed information about the execution plan, including the following fields:

  1. id: A unique identifier for the query, used to identify the different steps or subqueries of each query.
  2. select_type: Query type, indicating how the query is classified. Common values include:
    • SIMPLE: A simple query containing no subqueries or UNIONs.
    • PRIMARY: The outermost query.
    • SUBQUERY: A subquery.
    • DERIVED: A query for a derived table.
    • UNION: A UNION query.
    • UNION RESULT: The result of a UNION query.
    • DEPENDENT UNION: A UNION query that depends on the outer query.
  3. table: Table name, indicating the table name or alias of the table involved in the query.
  4. partitions: Partition information, if the query involves a partition table, the partition information will be displayed.
  5. type: Access type, indicating how MySQL looks up rows. Common values include (roughly from worst to best):
    • ALL: Full table scan; the entire table must be read.
    • index: Full index scan; the whole index is read instead of the table data.
    • range: Index range scan.
    • ref: Lookup via a non-unique index or a leftmost prefix of an index.
    • eq_ref: Lookup via a unique index, at most one row per combination of rows from the preceding tables.
    • const: At most one matching row, read once at the start of the query.
    • system: The table has only one row.
    • NULL: MySQL resolves the query during optimization without accessing any table.
  6. possible_keys: Indexes that may be used, indicating the list of indexes that may be used by the query.
  7. key: The index actually used, indicating the index actually used by the query.
  8. key_len: The length of the index actually used, in bytes.
  9. ref: The columns or constants compared against the index to select rows.
  10. rows: The estimated number of rows the query will examine.
  11. filtered: The estimated percentage of examined rows that remain after the table conditions are applied.
  12. Extra: Additional information about the query, such as sorting, temporary tables, or index-only access.

By analyzing the EXPLAIN output, you can understand the query's execution plan, access method, and potential performance problems. Based on the fields in the output, query statements, index design, and database configuration can be optimized to improve query performance and efficiency.

mysql> explain select * from `my_customer` where phone like '%157%';
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | my_customer | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 995164 |    11.11 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

We can see that in the execution plan, the type field is ALL, meaning a full table scan, which results in low query efficiency and long execution time.

First of all, we should consider adding an index to the query field, such as the phone field.

mysql> CREATE INDEX my_customer_phone_IDX USING BTREE ON store.my_customer (phone);

Note that a fuzzy match with a leading % prevents the index from being used.

You can change the query condition to a prefix match with only a trailing %, for example:

select * from `my_customer` where phone like '157%';
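The effect of the leading wildcard can be demonstrated with a tiny experiment. The sketch below uses Python's sqlite3 purely as a stand-in for MySQL (SQLite's EXPLAIN QUERY PLAN plays the role of MySQL's EXPLAIN; the table and index names mirror the article, but the setup is illustrative): the '%157%' pattern forces a scan, while '157%' can seek through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # lets SQLite use the index for LIKE 'x%'
conn.execute("CREATE TABLE my_customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
conn.execute("CREATE INDEX my_customer_phone_idx ON my_customer (phone)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable "detail" column.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)

leading = plan("SELECT * FROM my_customer WHERE phone LIKE '%157%'")
prefix  = plan("SELECT * FROM my_customer WHERE phone LIKE '157%'")
print(leading)  # a SCAN: the index cannot help with a leading wildcard
print(prefix)   # a SEARCH: index seek on the '157' prefix
```

The same principle holds in MySQL: a B-tree index is ordered by the column's prefix, so only a pattern anchored at the start of the string can narrow the search range.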

Next, use the explain command to view the execution plan again:

mysql> explain select * from `my_customer` where phone like '157%';
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+-----------------------+
| id | select_type | table       | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | my_customer | NULL       | range | my_customer_phone_IDX | my_customer_phone_IDX | 83      | NULL | 103520 |   100.00 | Using index condition |
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

We can see that the my_customer_phone_IDX index is actually used during execution. Compared with the full table scan, the estimated number of rows scanned is only a little over 100,000.

In actual development, SELECT * should be avoided: select only the required fields instead of using the wildcard *. Selecting only the necessary fields reduces data transfer and memory overhead and improves query performance.

For example, suppose we only need to query the user's id and name by mobile phone number.

Then, the SQL should be rewritten as follows:

select id, name from `my_customer` where phone like '157%';

So here, can the current SQL statement be further optimized?

The answer is yes.

First, we need to understand the concept of a back-to-table query (回表查询).

A back-to-table query happens when a query uses a non-covering (secondary) index: not all columns required by the query are present in the index, so MySQL must go back to the clustered index (the table data) to fetch the missing columns.

In a back-to-table query, MySQL first locates the index entries matching the query condition in the secondary index, then uses the primary-key value stored in those entries to look up the full rows in the clustered index. This involves two lookups: one in the secondary index and one in the clustered index. Compared with a covering-index query, a back-to-table query therefore performs extra reads, increasing the query's cost and response time.

Back-to-table queries can noticeably hurt performance, especially with large data volumes and high query concurrency. To reduce this overhead, consider optimizing with a covering index (覆盖索引).

A covering index is an index that contains all the columns required by a query, so there is no need to go back to the clustered index or data pages. In other words, the covering index can directly provide the data the query needs, improving query performance and efficiency.

An ordinary secondary index stores only the indexed columns plus the primary-key value used to locate the full row. When executing a query, MySQL first finds the matching entries in the index, then fetches any missing columns via the clustered index; this second step is the back-to-table lookup. A covering index avoids it entirely, because the index itself already contains every column the query needs.

The benefits of covering indexes are mainly reflected in the following aspects:

  1. Improve query performance : Since the covering index can directly provide the data required by the query, it reduces the random access of the disk and additional query operations back to the table, thus speeding up the execution speed of the query.
  2. Reduce disk I/O : back-to-table queries require additional disk read operations, while covering indexes can reduce disk I/O operations and reduce system disk load.
  3. Reduce memory consumption : Covering indexes can reduce the amount of data that needs to be loaded into memory, save memory usage, and improve query efficiency.

To create a covering index, you need to choose the appropriate index columns to include all the columns involved in the query statement. This requires comprehensive consideration of factors such as query requirements, data column selectivity, and index size. It should be noted that creating too many covering indexes may increase index maintenance costs and storage space usage.

In short, the covering index is an optimization method. By including all the data columns required by the query through the index, it avoids querying back to the table and improves the performance and efficiency of the query. Using covering indexes allows you to optimize queries where appropriate, but there is a trade-off in the design and maintenance costs of the indexes.
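To make the covering-index idea concrete, here is a small sketch using Python's sqlite3 as a stand-in for MySQL (SQLite reports covering-index use directly in its query plan). The composite (phone, name) index mirrors the one the article builds for my_customer; the setup is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA case_sensitive_like = ON")  # enable index use for LIKE 'x%'
conn.execute("CREATE TABLE my_customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
# Composite index on (phone, name): together with the implicit rowid (id),
# it contains every column the query below needs.
conn.execute("CREATE INDEX my_customer_phone_idx ON my_customer (phone, name)")

rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, name FROM my_customer WHERE phone LIKE '157%'"
).fetchall()
detail = " ".join(r[-1] for r in rows)
print(detail)  # the plan reports a COVERING INDEX: no table lookup is needed
```

This is the same effect MySQL signals with "Using index" in the Extra column: the query is answered entirely from the index, with no back-to-table lookup.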

Here, we recreate the my_customer_phone_IDX index; the script is as follows:

CREATE INDEX my_customer_phone_IDX USING BTREE ON store.my_customer (phone,name);

Use the explain command to view the execution plan once more:

mysql> explain select id, name from `my_customer` where phone like '157%';
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
| id | select_type | table       | partitions | type  | possible_keys         | key                   | key_len | ref  | rows   | filtered | Extra                    |
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
|  1 | SIMPLE      | my_customer | NULL       | range | my_customer_phone_IDX | my_customer_phone_IDX | 83      | NULL | 100018 |   100.00 | Using where; Using index |
+----+-------------+-------------+------------+-------+-----------------------+-----------------------+---------+------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

Here, we can see that the Extra field contains Using index, indicating that the covering index took effect: no back-to-table lookup occurs, and the query time is greatly reduced.

Similarly, if the SQL is as follows,

select count(name) from `my_customer` where phone like '157%';

the covering index also takes effect, since phone, name, and the primary key id are all present in the index.

Scenario 2: Order query

Regardless of whether it is on the user's App side or in the e-commerce background, there are scenarios for order inquiries.

For example, we need to query the order of the corresponding brand according to the brand,

We first add a brand field as an index to the product table:

CREATE INDEX my_product_brand_IDX USING BTREE ON store.my_product (brand);

Let's first give a common query SQL:

select * from my_order mo  where product_id  in (select id from my_product mp  where brand  = 'Apple');

The sql query takes nearly 6000ms, check the execution plan:

mysql> explain select * from my_order mo  where product_id  in (select id from my_product mp  where brand  = 'Apple');
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------+---------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys                | key     | key_len | ref                 | rows    | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------+---------+----------+-------------+
|  1 | SIMPLE      | mo    | NULL       | ALL    | NULL                         | NULL    | NULL    | NULL                | 7529130 |   100.00 | NULL        |
|  1 | SIMPLE      | mp    | NULL       | eq_ref | PRIMARY,my_product_brand_IDX | PRIMARY | 4       | store.mo.product_id |       1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------------------+---------+---------+---------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

You can see that there are two execution plans, in which the query of the order table uses a full table scan,

Let's add an index on the order table's product_id field:

CREATE INDEX my_order_product_id_IDX USING BTREE ON store.my_order (product_id);

Check the execution plan again:

mysql> explain select * from my_order mo  where product_id  in (select id from my_product mp  where brand  = 'Apple');
+----+-------------+-------+------------+------+------------------------------+-------------------------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                | key                     | key_len | ref         | rows | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------------+-------------------------+---------+-------------+------+----------+-------------+
|  1 | SIMPLE      | mp    | NULL       | ref  | PRIMARY,my_product_brand_IDX | my_product_brand_IDX    | 403     | const       | 1027 |   100.00 | Using index |
|  1 | SIMPLE      | mo    | NULL       | ref  | my_order_product_id_IDX      | my_order_product_id_IDX | 4       | store.mp.id |   75 |   100.00 | NULL        |
+----+-------------+-------+------------+------+------------------------------+-------------------------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

Here, we can see that the product table now uses the brand index and the order table uses the new product_id index, which greatly speeds up the query.

Although subqueries fulfill the query requirements in the current situation, using subqueries may cause some performance problems, so when optimizing queries, it is generally not recommended to rely too much on subqueries. Here are some reasons:

  1. Executing extra queries : Efficiency suffers because, when executing subqueries, MySQL may need to create temporary tables to hold the intermediate results and delete them after the query completes, so subqueries pay the additional cost of creating and destroying temporary tables.
  2. Poor readability and maintainability : Complex nested subqueries can make query statements difficult to understand and maintain. Subqueries often require an understanding of the nesting hierarchy and the relationships between individual subqueries, making query statements lengthy and difficult to read.
  3. Lack of optimization flexibility : The database optimizer has relatively weak optimization capabilities when processing subqueries. It is difficult for the optimizer to fully optimize complex nested subqueries, and may not be able to select the best execution plan, resulting in performance degradation.
  4. May cause performance problems : Subqueries may cause full table scans or creation of temporary tables, increasing the I/O burden and memory consumption of the system. Especially when subqueries involve large amounts of data or involve multi-table associations, performance problems may be more apparent.

For subqueries that can be replaced by join queries (JOIN) or other more efficient methods, it is generally recommended to use more concise and efficient query methods. Join queries can make better use of indexes and optimize execution plans, while providing better readability and maintainability.

However, subqueries are not recommended in all cases. In some specific scenarios, subqueries are a reasonable choice, such as when you need to perform existence checks or nest aggregate functions in the query. When using subqueries, it is necessary to comprehensively consider the trade-offs of performance, readability, and maintainability according to the actual situation to ensure the best query results.

Here, we should rewrite the SQL statement as a connection query (JOIN),

SELECT mo.id as orderId, mo.customer_id as customerId, mp.name as productName, mo.order_status as orderStatus FROM my_order mo JOIN my_product mp ON mo.product_id = mp.id WHERE mp.brand = 'Apple';

Although a multi-table join (JOIN) is one of the most common query patterns, its efficiency is hard to guarantee once the amount of data involved in the join is large. In that case, it is strongly recommended to fetch data from single tables via indexes, and then join and merge the data at the application layer.

The advantages of association at the application layer are as follows:

  1. Improve caching efficiency : Applications can conveniently cache the result objects of single-table queries. By splitting the associated query, when the data in the associated table changes, the query cache will not be affected, thereby improving the efficiency of the cache.
  2. Reduced lock contention : Splitting queries can reduce lock contention. When executing a single query, only a single table is involved, which reduces lock conflicts and improves concurrency performance.
  3. Ease of database splitting : performing associated queries at the application layer makes it easier to split the database, providing high performance and scalability.
  4. Improve query efficiency : When using IN() instead of associated queries, MySQL can query in the order of IDs, which may be more efficient than random associated queries.
  5. Reduce redundant record query : application layer association query means that each record only needs to be queried once, while association query in the database may require repeated access to some data. Therefore, this refactoring also reduces network and memory overhead.
  6. Hash association is more efficient : application layer association is equivalent to implementing hash association in the application, instead of using MySQL's nested loop association. In some scenarios, hash associations are much more efficient.
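A minimal sketch of the application-layer association described above, again using Python's sqlite3 as a stand-in for MySQL (table names follow the article; the data is made up): fetch the matching product IDs by brand, fetch the orders with an IN() list, then hash-join in code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_product (id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE INDEX my_product_brand_idx ON my_product (brand);
CREATE TABLE my_order (id INTEGER PRIMARY KEY, product_id INTEGER, total_price INTEGER);
CREATE INDEX my_order_product_id_idx ON my_order (product_id);
INSERT INTO my_product VALUES (1, 'iPhone', 'Apple'), (2, 'MacBook', 'Apple'), (3, 'Galaxy', 'Samsung');
INSERT INTO my_order VALUES (10, 1, 5000), (11, 3, 4000), (12, 2, 9000);
""")

# Step 1: single-table, indexed lookup of the product ids for the brand.
product_ids = [r[0] for r in conn.execute(
    "SELECT id FROM my_product WHERE brand = ?", ("Apple",))]

# Step 2: single-table fetch of orders via IN(), queried in id order.
placeholders = ",".join("?" * len(product_ids))
orders = conn.execute(
    f"SELECT id, product_id, total_price FROM my_order "
    f"WHERE product_id IN ({placeholders})", product_ids).fetchall()

# Step 3: hash association in the application, instead of a DB-side join.
name_by_id = dict(conn.execute("SELECT id, name FROM my_product"))
result = [(oid, name_by_id[pid], price) for oid, pid, price in orders]
print(result)
```

Each step touches only one table through an index, which is what makes the single-table results easy to cache and the database easy to shard later.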

Conversely, the reason why JOIN is not recommended:

  1. Performance pressure on large-scale tables : When the amount of data in a table reaches millions, using JOIN may lead to performance degradation.
  2. Distributed sub-database and sub-table : Cross-database JOIN is not recommended, because the current MySQL distributed middleware does not support cross-database JOIN well.
  3. The complexity of table structure modification : It is relatively easy to modify a single table query, but modifying the SQL statement of JOIN is more complicated and the maintenance cost is higher.

Of course, join is also beneficial in some scenarios.

For example, paging queries: requirements that would otherwise need a JOIN can be paginated conveniently by using the secondary table's fields as query conditions, taking the matching keys from the secondary table as the result set, and then querying the main table with IN().

Scenario 3: Paging query

Then, let's look at the optimization in the case of paging queries:

A typical pagination query statement is as follows:

SELECT mo.id as orderId, mo.customer_id as customerId, mo.order_status as orderStatus FROM my_order mo where mo.order_status  = 1  order by mo.id asc limit 1000000, 10

LIMIT is the most commonly used pagination method. When the statement above executes, it is equivalent to traversing the first 1,000,010 rows, returning rows 1,000,001 through 1,000,010, and discarding the first 1,000,000. The larger the offset, the lower the query performance, so a plain LIMIT is only suitable for paginated queries over small offsets.

mysql> explain SELECT mo.id as orderId, mo.customer_id as customerId, mo.order_status as orderStatus FROM my_order mo where mo.order_status  = 1  order by mo.id asc limit 1000000, 10;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | mo    | NULL       | index | NULL          | PRIMARY | 4       | NULL | 1000010 |    10.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

So how should we optimize?

We can use the primary-key index to optimize. For example, if paging has reached the 1,000,000th row and that row's order ID is 3997806, then every order ID on the next page is greater than 3997806.

The sql statement can be rewritten as:

SELECT mo.id as orderId, mo.customer_id as customerId, mo.order_status as orderStatus FROM my_order mo  inner join (select id from my_order  where id > 3997806 and order_status  = 1 limit 100) mo2 on mo.id = mo2.id order by mo.id asc

The execution of SQL statements is reduced from 10s to 100ms, which is nearly 100 times higher.

We can look at the execution plan,

mysql> explain SELECT mo.id as orderId, mo.customer_id as customerId, mo.order_status as orderStatus FROM my_order mo  inner join (select id from my_order  where id > 3997806 and order_status  = 1 limit 100) mo2 on mo.id = mo2.id order by mo.id asc;
+----+-------------+------------+------------+--------+---------------+---------+---------+--------+---------+----------+---------------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref    | rows    | filtered | Extra                           |
+----+-------------+------------+------------+--------+---------------+---------+---------+--------+---------+----------+---------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL   |     100 |   100.00 | Using temporary; Using filesort |
|  1 | PRIMARY     | mo         | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | mo2.id |       1 |   100.00 | NULL                            |
|  2 | DERIVED     | my_order   | NULL       | range  | PRIMARY       | PRIMARY | 4       | NULL   | 3764565 |    10.00 | Using where                     |
+----+-------------+------------+------------+--------+---------------+---------+---------+--------+---------+----------+---------------------------------+
3 rows in set, 1 warning (0.00 sec)

From the query plan, we can see that the subquery first obtains up to 100 order IDs via the primary-key index, and the outer query then fetches the row details only for those IDs. There is no need to scan millions of rows, sort them, and discard everything before the requested page.
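The keyset idea behind this rewrite can be sketched generically (Python's sqlite3 as a stand-in for MySQL, with small made-up data): instead of an ever-growing OFFSET, remember the last id of the previous page and seek past it via the primary key; both approaches return the same page:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_order (id INTEGER PRIMARY KEY, order_status INTEGER)")
# ids 1..1000; odd ids get order_status = 1
conn.executemany("INSERT INTO my_order VALUES (?, ?)",
                 [(i, i % 2) for i in range(1, 1001)])

PAGE = 10
# OFFSET pagination: the engine walks past and discards the first 400 matches.
offset_page = conn.execute(
    "SELECT id FROM my_order WHERE order_status = 1 "
    "ORDER BY id LIMIT ? OFFSET ?", (PAGE, 400)).fetchall()

# Keyset pagination: the last id of the previous page (799 here, remembered
# by the application) lets us seek directly past it using the primary key.
last_seen_id = 799
keyset_page = conn.execute(
    "SELECT id FROM my_order WHERE id > ? AND order_status = 1 "
    "ORDER BY id LIMIT ?", (last_seen_id, PAGE)).fetchall()

print(offset_page == keyset_page)  # True
```

The keyset variant does a single index seek regardless of how deep the page is, which is why the rewritten SQL above drops from seconds to milliseconds.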

Scenario 4: Order Statistics Report Query

Next, let's look at the fourth scenario, order statistics.

E-commerce platforms often need to count order data from multiple dimensions, such as the number of orders, the total amount of orders, the ranking of popular products, and so on.

Suppose here that we need to query the number of orders and the total order amount of different commodities,

The first version of sql given looks like this:

select mo.product_id , count(*) as num , sum(mo.total_price) from my_order mo group by mo.product_id 

The execution part is as follows:

|      99995 |  65 |               21528 |
|      99996 |  85 |               24549 |
|      99997 |  75 |               23156 |
|      99998 |  89 |               27123 |
|      99999 |  90 |               24190 |
|     100000 |  79 |               26625 |
+------------+-----+---------------------+
100000 rows in set (1 min 48.82 sec)

Grouping over all 100,000 products took nearly two minutes (1 min 48.82 sec).

For group statistics query, the following are some optimization ideas:

  1. Use appropriate indexes : To support grouping and statistical operations, consider creating appropriate indexes. Optimization ideas include:
    • Create indexes for grouping fields and statistical fields to improve the efficiency of grouping and aggregation operations.
    • Consider a covering index, that is, the index contains all the required fields to avoid querying back to the table.
    • For different query scenarios and conditions, select the appropriate index type (such as B-tree index, hash index, etc.).
  2. Cache the result set : For frequent group statistics queries, you can consider caching the result set to avoid recalculation every time. Optimization ideas include:
    • Use caching technology (such as Redis) to store result sets so that statistics can be retrieved quickly.
    • Set an appropriate cache invalidation strategy, and perform regular updates or manual updates according to the update frequency of the data.
  3. Pre-aggregated data : For large data volumes and complex statistical queries, you can consider pre-computing and storing aggregation results to reduce the amount of calculations during querying. Optimization ideas include:
    • Create regular or real-time pre-aggregation tasks and store statistical results in specific tables.
    • Get the results directly from the pre-aggregated table when querying, avoiding repeated calculations and grouping operations.
  4. Set grouping fields reasonably : For grouping statistics query, the selection of grouping fields will affect query performance. Optimization ideas include:
    • Try to select fields with high cardinality (more different values) as grouping fields to reduce the number of groups and calculations.
    • Avoid using too many complex expressions or functions as grouping fields in queries to reduce calculation overhead.
  5. Consider parallel computing : For large-scale data grouping statistics query, you can consider using parallel computing to improve query efficiency. Optimization ideas include:
    • Split the query task into multiple parallel subtasks, each processing a different subset of data.
    • Parallel queries are supported using parallel computing frameworks or database engines to speed up queries and increase throughput.

Of course, specific optimization strategies need to be selected and adjusted according to specific business scenarios and data characteristics.

Because we have already indexed the product_id field in the previous scenario, we can follow the third and fifth suggestions above: compute the statistics in segments (optionally in parallel) and aggregate the partial results at the application layer.

The idea is that each SQL statement counts only a subset of the products, for example:

select mo.product_id , count(*) as num , sum(mo.total_price) from my_order mo  where mo.product_id between 1000 and 2000 group by mo.product_id;

Here, only orders whose product IDs fall in the range [1000, 2000] are counted; we can run the same query over different ranges to cover all products.

After switching to segmented statistics, we can look at the execution efficiency:

|       1997 |  91 |               27524 |
|       1998 |  54 |               14298 |
|       1999 |  74 |               24560 |
|       2000 |  68 |               23343 |
+------------+-----+---------------------+
1001 rows in set (1.26 sec)

mysql> explain select mo.product_id , count(*) as num , sum(mo.total_price) from my_order mo  where mo.product_id between 1000 and 2000 group by mo.product_id
;
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type  | possible_keys           | key                     | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | mo    | NULL       | range | my_order_product_id_IDX | my_order_product_id_IDX | 4       | NULL | 147998 |   100.00 | Using index condition; Using temporary; Using filesort  |
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

It can be seen that the my_order_product_id_IDX index is used here to speed up the query. In addition, due to the reduction in the amount of data, the time spent on sorting and statistics is also greatly reduced.

In addition, the Extra field of the execution plan contains Using filesort, which indicates that MySQL also sorts the result during grouping by default (this implicit GROUP BY sort applies to MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly).

If sorting is not required, we should state this explicitly with ORDER BY NULL, modifying the SQL to look like this:

select mo.product_id , count(*) as num , sum(mo.total_price) from my_order mo  where mo.product_id between 1000 and 2000 group by mo.product_id order by null

MySQL performance monitoring and alarm

MySQL performance monitoring and alerting are key to keeping the database stable and well tuned. Monitoring lets us track the database's key indicators in real time and raise alerts as soon as something abnormal happens, so that troubleshooting and performance optimization can start promptly.

The following are some commonly used MySQL performance monitoring indicators, tools and usage instructions:

  1. Monitoring indicators:
    • Query performance : including slow query, query response time, query throughput, etc.
    • Connection and concurrent performance : including the number of concurrent connections, thread pool usage, connection requests, etc.
    • Cache hit rates : including the InnoDB buffer pool hit rate (and, on MySQL 5.7 and earlier, the query cache hit rate; the query cache was removed in MySQL 8.0).
    • Lock waiting and deadlock conditions : including lock waiting time, deadlock times, etc.
    • Disk IO performance : including read and write speed, disk utilization, IO waiting time, etc.
    • Master-slave replication status : including delay time, synchronization status, etc.
  2. Monitoring tools:
    • MySQL Enterprise Monitor : An official commercial monitoring tool that provides comprehensive performance monitoring and alerting capabilities.
    • Percona Monitoring and Management (PMM) : A free and open source MySQL monitoring and management tool that provides rich performance metrics and alerting capabilities.
    • Nagios : A widely used open source monitoring tool that can monitor various indicators of MySQL through plug-in extensions.
    • Zabbix : Another open source monitoring tool that supports MySQL's performance monitoring and alerting capabilities.
    • Prometheus : An open source system monitoring and alerting tool that can monitor MySQL performance indicators through plug-ins or Exporters.
  3. Instructions for use:
    • Configure the monitoring tool : install and configure the chosen tool according to its documentation, including specifying the MySQL instances to monitor and setting alert thresholds.
    • Select the appropriate indicators : based on actual needs and concerns, choose the indicators to monitor and set suitable alert thresholds.
    • Regularly collect and analyze data : the monitoring tool periodically collects MySQL performance data and stores it; the data can be viewed and analyzed through the tool's interface or API.
    • Set alert rules : using the tool's rule engine, define alert rules that fire when an indicator exceeds its threshold.
    • Troubleshoot and optimize : when an alert arrives, investigate promptly, locate the problem, and take corresponding measures, such as adjusting parameter configuration, optimizing query statements, or adding hardware resources.
    • Evaluate and adjust regularly : periodically re-evaluate database performance and adjust as needed, including adding monitoring indicators, tuning alert thresholds, and refining the monitoring tool's configuration.
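The "set alert rules" step can be sketched in a few lines. In practice a tool such as PMM or Prometheus evaluates these rules; the Python sketch below only illustrates the idea. The metric names mimic counters a collector could derive from SHOW GLOBAL STATUS, and the thresholds are illustrative assumptions, not recommended values.

```python
# Threshold-based alert rules: metric name -> upper limit.
# Names mimic SHOW GLOBAL STATUS counters; thresholds are made up.
ALERT_RULES = {
    "Threads_connected": 500,      # concurrent connections
    "Slow_queries_per_min": 10,    # slow-query rate
    "Innodb_row_lock_waits": 100,  # lock waits since last sample
}

def evaluate_alerts(sample, rules=ALERT_RULES):
    """Returns the list of (metric, value, threshold) tuples
    for every metric in the sample that breaches its rule."""
    return [(name, sample[name], limit)
            for name, limit in rules.items()
            if sample.get(name, 0) > limit]

sample = {"Threads_connected": 620, "Slow_queries_per_min": 3,
          "Innodb_row_lock_waits": 40}
print(evaluate_alerts(sample))  # only Threads_connected breaches
```

A real deployment would sample the metrics on a schedule, route breaches to a notification channel, and usually require the breach to persist for several samples before firing, to avoid flapping alerts.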

Through the comprehensive use of monitoring tools and appropriate indicators, you can monitor MySQL performance in real time and discover potential problems in advance. Taking timely measures for troubleshooting and performance optimization can ensure the stability and high-performance operation of the database.

MySQL tuning summary

MySQL tuning is a continuous optimization process that involves multiple aspects of work, including parameter configuration, index optimization, query optimization, and hardware operating system optimization.

When tuning, it is necessary to comprehensively consider various factors according to the actual configuration of the server and the requirements of the application layer, and optimize based on actual scenarios.

The tuning ideas can be summarized as follows:

  1. Monitoring and evaluation : First of all, MySQL needs to be monitored and evaluated to understand the performance bottleneck of the database and the cause of the bottleneck. Issues such as slow queries, high loads, and resource bottlenecks can be identified through monitoring tools and performance assessment reports.
  2. Parameter configuration optimization : According to the results of monitoring and evaluation, the parameter configuration of MySQL is adjusted in a targeted manner. Reasonably configure parameters such as buffer size, number of concurrent connections, and query cache to make full use of system resources and improve performance.
  3. Index optimization : Design and optimize appropriate indexes by analyzing query statements and data access patterns. Avoid creating too many indexes, and ensure that critical queries can use indexes to speed up query operations.
  4. Query optimization : optimize query statements to avoid full table scans and unnecessary join operations. Use technical means such as appropriate query conditions, JOIN statements, and subqueries to improve query performance.
  5. Hardware and operating system optimization : optimize the hardware and operating system according to the server's actual configuration and the OS's characteristics: allocate memory and disk space sensibly, and tune the file system and network configuration to improve the database's operating efficiency.

The tuning work needs to be closely combined with the actual configuration of the server and the application layer. Different application scenarios and business requirements may require different tuning strategies and focus areas. Therefore, in the tuning process, it is necessary to work closely with application developers to understand the characteristics and requirements of the application, so as to ensure the effectiveness and feasibility of the tuning scheme.

Tuning is an ongoing process that requires regular monitoring and evaluation of system performance, and optimization and adjustments based on actual conditions. Constantly optimizing the MySQL database can improve system performance and stability, and provide users with better experience and response speed.

The Java Tuning Bible Iteration Planning

All the articles and PDF e-books of the Nien team are continuously iterative, and the latest version is always the most complete.


Recommended reading

" Starting from 0, handwriting MySQL transactions "

" The Most Complete Hadoop Interview Questions in History: Nien Big Data Interview Collection Topic 1 "

" Tencent is too ruthless: 4 billion QQ accounts, give you 1G memory, how to deduplicate? "

" Meituan One Side: After OOM, Will JVM Exit? Why? "

4000 pages of "Nien's Java Interview Collection", 40 topics

Please go to the following [Technical Freedom Circle] to pick up the PDF files of Nien’s architecture notes and interview questions↓↓↓


Origin blog.csdn.net/crazymakercircle/article/details/131316025