MySQL from Entry to Proficiency [Advanced]: 20 SQL Optimization Laws + 10 DBA Experiences



0. Preface

In modern software applications, databases have always played a vital role. Although the use of NoSQL and NewSQL databases keeps growing, SQL databases (Structured Query Language databases) such as MySQL, Oracle, PostgreSQL, and SQL Server are still widely used across business domains. SQL query optimization is therefore a necessary skill for every developer and database administrator (DBA).

SQL query optimization is a complex process that includes writing SQL statements properly, designing the database structure sensibly, using indexes scientifically, tuning the configuration, and matching data volume to the hardware environment. Almost every link in the chain can affect query efficiency.

In this article, we interpret 20 mainstream SQL optimization laws and suggestions. Don't get me wrong: these laws are not specific technical operations but guiding principles and ways of thinking, applicable not just to one particular database but to most application scenarios.

These laws will help you understand when, where, and how to optimize SQL. They reveal the essence of SQL optimization in plain language and lead you to appreciate the charm of concise, efficient, and maintainable SQL statements.

Before starting this optimization journey, we should keep one mindset: optimization is not a silver bullet; every optimization is a solution to a specific problem in a specific scenario. So beyond understanding and mastering these laws, what matters more is integrating them into your hands-on experience, making them the "inner strength" of your SQL design and optimization. Don't chase so-called "absolute optimization"; stay close to business needs and balance performance against development efficiency.

OK, let's start this journey and explore the mysteries of SQL optimization together.

1. 20 Laws of SQL Optimization

Translated, it can also be called 20 optimization principles, or SQL writing principles.
The knowledge and practice of SQL optimization accumulated gradually from the experience of many experts in programming and database management. There may not be one specific source for these 20 laws; they were most likely distilled from the combined knowledge and experience of many experienced database administrators and developers.

Although these optimization laws are widely accepted and used, in specific applications they need to be applied flexibly, in combination with the specific database type (MySQL, SQL Server, etc.), data structures, and query requirements. Database optimization is also a dynamic, continuous process involving many factors, including but not limited to data changes, hardware changes, system upgrades, and changes in business requirements.

1.1. Only Retrieve The Data You Really Need

Avoid SELECT *; select only the fields you actually need, to reduce the data-transfer load.

This item is very straightforward, and there is nothing to interpret. This is what the teacher told me in the first class of SQL.
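To make the point concrete, here is a minimal runnable sketch (using SQLite via Python as a stand-in for MySQL; table and column names are invented for illustration): selecting only the needed column returns one value per row instead of the whole row.

```python
import sqlite3

# SELECT * drags every column over the wire; selecting only what we
# need keeps the payload small.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, Name TEXT, Age INTEGER, Bio TEXT)")
conn.execute("INSERT INTO Students VALUES (1, 'Ann', 18, 'a long biography ...')")

wide_row = conn.execute("SELECT * FROM Students").fetchone()       # 4 columns
narrow_row = conn.execute("SELECT Name FROM Students").fetchone()  # 1 column

assert len(wide_row) == 4
assert narrow_row == ("Ann",)
```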

1.2. Be Aware of The Index

Develop an effective index strategy, such as building indexes on WHERE-condition or JOIN fields; this can dramatically improve query efficiency.

Let's interpret this law; it is what most SQL optimization posts cover, and it matches most people's practical experience.
Effective use of indexes is key to improving query performance. A well-applied index strategy can cut a query's execution time from hours to seconds.
Some key points for developing an effective indexing strategy:

  1. Create indexes for fields in the WHERE clauses of common queries: if a field often appears in a query's WHERE clause, building an index on it can greatly improve query efficiency.

    For example:

    CREATE INDEX idx_students_age ON Students(Age);
    
  2. Create indexes for fields that are often used in JOIN operations: if the fields used in a JOIN are not indexed, the database may need a full table search to find matching rows, which is very inefficient. Indexing these fields can significantly improve JOIN performance.

    For example:

    CREATE INDEX idx_orders_customer_id ON Orders(CustomerID);
    
  3. Use composite indexes to optimize multi-column queries: if your queries often search on several columns together, consider creating a composite index that includes those columns.

    For example:

    CREATE INDEX idx_students_age_name ON Students(Age, Name);
    
  4. Avoid over-indexing : Not all columns need to be indexed. Over-indexing can lead to poor performance for insert, update, and delete operations, because each time these operations occur, the index needs to be updated. We need to find the right balance to only index important, commonly used query paths.
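The effect of point 1 can be observed directly in the execution plan. A minimal sketch below uses SQLite via Python as a stand-in for MySQL (the plan text differs from MySQL's EXPLAIN, but the before/after contrast is the same): without the index the planner scans the table; after `CREATE INDEX idx_students_age` it searches the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(i, f"s{i}", 15 + i % 10) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable detail in the last column
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM Students WHERE Age = 18")   # full table scan
conn.execute("CREATE INDEX idx_students_age ON Students(Age)")
after = plan("SELECT * FROM Students WHERE Age = 18")    # index search

assert "SCAN" in before
assert "USING INDEX idx_students_age" in after
```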

1.3. Use Joins Carefully

JOIN queries are usually more efficient than subqueries, but mind the JOIN order; the right order can shrink the intermediate result sets.

In SQL query optimization, JOIN operations are usually more efficient than subqueries, because a JOIN can compare and filter data as it is read, while a subquery usually has to materialize an intermediate result set first and then process it further, which can cause efficiency problems on large data volumes.

When processing JOIN queries, the order of the JOINs is also critical. A good rule of thumb is to join the tables with fewer (filtered) rows first, which greatly reduces the size of the intermediate result set and thereby improves query efficiency.

MySQL's optimizer tries to find the best JOIN order, but it does not always succeed, especially in complex queries. Therefore, manually arranging the JOIN order, or using STRAIGHT_JOIN to force MySQL to join in the written order, can sometimes help performance.

For example the following query:

    SELECT * FROM Orders 
    STRAIGHT_JOIN Customers ON Orders.CustomerID = Customers.CustomerID 
    WHERE Customers.Country = 'USA' 

Here STRAIGHT_JOIN forces MySQL to read Orders first and then join Customers in the written order, rather than letting the optimizer reorder the tables; the filter on Customers.Country is applied as the join proceeds. When you know the data distribution better than the optimizer does, pinning the order like this can improve query efficiency.

Optimizing JOIN queries is a process that requires experience and skill, and the order of JOINs is only part of it. We also need to pay attention to other aspects, such as reasonable use of indexes, avoiding full table scans, etc., in order to comprehensively improve query performance.

1.4. Avoid Using NOT IN

Avoid NOT IN where possible, because NOT IN tends to cause a full-table traversal; consider replacing it with NOT EXISTS.

Avoiding NOT IN is an effective strategy for improving query performance. The NOT IN clause often triggers a full table scan: even if an index exists on the field, the SQL engine frequently cannot use it effectively, which seriously hurts query performance.

This is because the database cannot predict the values in the NOT IN list and has to scan the full table to make sure no possible result is missed.

Although NOT IN and NOT EXISTS can complete similar query tasks, in actual use, we need to pay attention to their performance differences and choose the optimal solution based on actual needs.

Let's think about it Consider the following query:

SELECT * FROM Students WHERE Age NOT IN (18, 19);

A moment's thought shows that this query must traverse the whole table, checking whether each row's Age value is in the list, which consumes a lot of resources.

And NOT EXISTS is an alternative method that can optimize the performance problems caused by NOT IN. NOT EXISTS generally performs better than NOT IN, especially when the correlated subquery returns a large number of rows.

The NOT EXISTS clause is used to determine that a row of the main query has no matching row in the subquery. For each outer row, the subquery is evaluated only until a matching row is found; as soon as one is found, processing stops for that row and the engine knows the row "exists" (so NOT EXISTS rejects it).

NOT EXISTS behaves as an anti-join (the negation of a semi-join): it only cares whether a matching row exists, not how many there are. Once the first matching row is found in the subquery, the search stops, and that is exactly how it gains its efficiency.

NOT EXISTS is relatively more efficient for the following reasons:

1. It has a shorter test cycle: it stops checking as soon as the first matching row is found, which means the engine can end the scan early.

2. Database optimizers are usually good at optimizing EXISTS-style queries.

3. NOT EXISTS can usually use an index, whereas NOT IN often cannot.

This is not easy to benchmark: in my own tests the difference between the two was basically negligible, perhaps because my dataset was too small. In simple cases NOT IN and NOT EXISTS may produce roughly the same result and plan, but in terms of performance NOT EXISTS is usually the safer choice. Let's follow this rule of thumb for now; I'll verify it more rigorously when I have time.

Rewriting the above query with NOT EXISTS (using a derived table rather than the VALUES table constructor, which MySQL only gained in 8.0.19), it might look like this:

SELECT s.* FROM Students s 
WHERE NOT EXISTS (
    SELECT 1 FROM (SELECT 18 AS Age UNION ALL SELECT 19) v
    WHERE s.Age = v.Age
);
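To sanity-check this pattern, here is a runnable sketch (SQLite via Python standing in for MySQL; the value list is expressed as a UNION ALL derived table) confirming that the NOT EXISTS form returns the same rows as the original NOT IN query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Ann", 18), ("Bob", 19), ("Cid", 20), ("Dee", 21)])

not_in = conn.execute(
    "SELECT Name FROM Students WHERE Age NOT IN (18, 19) ORDER BY Name"
).fetchall()

not_exists = conn.execute("""
    SELECT s.Name FROM Students s
    WHERE NOT EXISTS (
        SELECT 1 FROM (SELECT 18 AS Age UNION ALL SELECT 19) v
        WHERE s.Age = v.Age)
    ORDER BY s.Name
""").fetchall()

# Both forms exclude exactly the 18- and 19-year-olds
assert not_in == not_exists == [("Cid",), ("Dee",)]
```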

1.5. Use UNION ALL Instead of UNION

UNION must deduplicate, which costs relatively more. If the data contains no duplicates by default, prefer UNION ALL.

UNION and UNION ALL are both used to merge the result sets of two or more SELECT statements. Although both can do this, they differ significantly in how they are processed and in performance.

UNION performs deduplication after merging the result sets, meaning it removes duplicate rows. To do this, the database must do extra work, such as sorting or hashing, which can cost a substantial amount of resources, especially with large data volumes.

For example:

SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;

The above query returns the combined rows from both tables with duplicates removed.

In contrast, UNION ALL simply merges the two result sets together without removing duplicate rows. This makes UNION ALL faster than UNION because it requires no extra processing to remove duplicate rows.

For example:

SELECT column_name(s) FROM table1
UNION ALL
SELECT column_name(s) FROM table2;

The above query returns all rows from both tables, including duplicates.

Therefore, if you know or can ensure that the two result sets contain no duplicate values, you should prefer UNION ALL. However, if you must guarantee that the values in the result set are unique, use UNION and accept the relatively poorer performance.
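A minimal sketch (SQLite via Python standing in for MySQL) makes the difference visible: with one overlapping value, UNION returns three rows while UNION ALL returns four.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v INTEGER)")
conn.execute("CREATE TABLE t2 (v INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,)])

union = conn.execute("SELECT v FROM t1 UNION SELECT v FROM t2").fetchall()
union_all = conn.execute("SELECT v FROM t1 UNION ALL SELECT v FROM t2").fetchall()

assert len(union) == 3      # the shared value 2 appears once: dedup work done
assert len(union_all) == 4  # 2 appears twice: plain concatenation, no dedup
```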

1.6. Mind the NULLs

NULL can defeat indexes and needs special handling.

In databases, the treatment of NULL values requires special attention, because NULL has some unique properties in database behavior. Especially where indexing and query performance are concerned, NULL may behave unexpectedly.

  1. In indexes, NULL values may be treated specially by the database engine. Some systems (such as MySQL's MyISAM and InnoDB engines) include NULL values in the index, while others (Oracle's B-tree indexes, for example) do not. On such systems, searching for rows where a field is NULL may cause a full table scan, because the database cannot use the index for the query. For example, the following query may not use an index in some database systems:

    SELECT * FROM Students WHERE Age IS NULL;
    
  2. In comparison operations, NULL also has its particularities. In SQL, NULL represents an unknown value, so any comparison with it, including =, <, >, <>, etc., yields an unknown result, i.e. NULL. Even comparing NULL with = yields NULL, not TRUE or FALSE. This means the following query may not return the expected results:

    SELECT * FROM Students WHERE Age = NULL;
    

The above query will return no results in most database systems because NULL is not equivalent to any value, including itself.

We should be very careful when dealing with fields containing NULL values. If possible, avoid NULL values in indexed fields, so queries can use the index effectively. Also, comparisons with NULL should use IS NULL or IS NOT NULL, not = or <>.

Let's also go over some common ways to handle NULL values here. Although it is slightly off topic, this comes up constantly at work:

  1. To test for a NULL value, use IS NULL or IS NOT NULL to check whether the field is NULL:

    SELECT * FROM table_name WHERE column_name IS NULL;
    
  2. In queries, use the COALESCE or IFNULL function to replace NULL values with a specific value:

    SELECT column_name, COALESCE(column_name, 'N/A') AS new_column FROM table_name;
    
  3. Be extra careful with calculations on fields that may contain NULL, because any arithmetic involving NULL yields NULL. Use IFNULL (or ISNULL, depending on the database) to handle this. For example:

    SELECT column1, column2, IFNULL(column1 * column2, 0) AS result FROM table_name;
    
  4. Mind the sorting of NULL values. Where NULLs land by default varies by database (in MySQL, NULLs sort first under ASC); some databases let you pin their position with NULLS FIRST or NULLS LAST in the ORDER BY clause (MySQL does not support this syntax directly):

    SELECT column_name FROM table_name ORDER BY column_name ASC NULLS FIRST;

Care needs to be taken to avoid incorrect results when handling NULL values, so carefully consider how NULL values ​​are handled when writing queries or performing calculations.
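The pitfalls above can be demonstrated in a few lines (SQLite via Python standing in for MySQL): "= NULL" matches nothing, IS NULL works, COALESCE substitutes a value, and arithmetic with NULL yields NULL unless wrapped in IFNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT, Age INTEGER)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Ann", 18), ("Bob", None)])

eq_null = conn.execute("SELECT Name FROM Students WHERE Age = NULL").fetchall()
is_null = conn.execute("SELECT Name FROM Students WHERE Age IS NULL").fetchall()
coalesced = conn.execute(
    "SELECT Name, COALESCE(Age, -1) FROM Students ORDER BY Name").fetchall()
calc = conn.execute(
    "SELECT IFNULL(Age * 2, 0) FROM Students WHERE Name = 'Bob'").fetchone()

assert eq_null == []                          # = NULL never matches, not even NULL
assert is_null == [("Bob",)]                  # IS NULL is the correct test
assert coalesced == [("Ann", 18), ("Bob", -1)]
assert calc == (0,)                           # NULL * 2 is NULL, replaced by 0
```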

1.7. Do Not Use Functions in Predicates

Functions in predicates defeat indexes; avoid them whenever possible.

In plain terms: don't apply functions to the column on the left side of a query condition. An example makes this intuitive. A function in the predicate invalidates the index and hurts query performance, so avoid it where you can. Choosing an appropriate database design and query form helps both performance and efficiency.

A function operates on its input and returns an output. So when a function is used in a predicate, the database actually has to calculate the result of the function for each row before it can decide whether the predicate condition is met. This increases the amount of computation required to process each row and prevents the database from taking advantage of indexes to speed up operations

Consider the following example:

SELECT * 
FROM Customers 
WHERE MONTH(BirthDate) = 7;

The above query searches for all customers whose birthday is in July, but it cannot use an index on BirthDate, because the database must compute MONTH(BirthDate) for every row in the table before deciding whether the condition holds.

How to optimize it? One way is to transform the condition into a form that can take advantage of the index. For example, we might rewrite the above query as:

SELECT * 
FROM Customers 
WHERE BirthDate >= '2021-07-01' 
AND BirthDate < '2021-08-01';

Note that the range rewrite only matches July of a single year; the original MONTH() condition matched July of any year. For further optimization, we can sometimes foresee this requirement at design time, for example by maintaining a separate (generated) birth-month column with its own index, which avoids any per-query computation and handles the any-year case cleanly.
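The plan difference is easy to observe. A sketch below uses SQLite via Python (where MONTH() is written as strftime('%m', ...)): wrapping the indexed column in a function forces a scan, while the equivalent range predicate can use the index on BirthDate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER, BirthDate TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(i, f"2021-{1 + i % 12:02d}-15") for i in range(1200)])
conn.execute("CREATE INDEX idx_customers_birthdate ON Customers(BirthDate)")

def plan(sql):
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

with_fn = plan("SELECT * FROM Customers WHERE strftime('%m', BirthDate) = '07'")
with_range = plan("SELECT * FROM Customers "
                  "WHERE BirthDate >= '2021-07-01' AND BirthDate < '2021-08-01'")

assert "SCAN" in with_fn                                    # function defeats the index
assert "USING INDEX idx_customers_birthdate" in with_range  # range predicate uses it
```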

1.8. The More SQL Statements, the Slower

Minimize the number of SQL statements; one complex SQL is usually faster than several simple ones.

In fact, I think this law is very straightforward and barely needs discussion. Anyone with some development experience should know what happens during the execution of a single SQL statement. For the sake of beginners, I'll go over it roughly.


Why does one complex SQL statement often execute faster than multiple simple SQL statements?


The reason is that when the database processes a query, it not only needs to execute the statement itself, but also needs to process many other operations related to it, such as parsing the query, compiling the query, establishing and canceling the database connection, and so on. These steps are required for every SQL statement, so executing multiple independent SQL statements causes the overhead of these extra operations to accumulate.
When you execute one complex SQL query instead, the parsing and compilation overhead of that query may be larger, but it is paid only once. All the needed information can then be fetched from the database in one go, without multiple round trips between the application and the database.


Let's illustrate with an example, assuming that there is an e-commerce application that needs to obtain all product details of a specific order. Two strategies can be used

  1. Multiple simple SQL queries : first query for the IDs of all the products in the order, then do a separate query for each product ID to get their details.

    SELECT product_id FROM Orders WHERE order_id = 123;
    SELECT * FROM Products WHERE product_id = 1;
    SELECT * FROM Products WHERE product_id = 2;
    ...
    SELECT * FROM Products WHERE product_id = N;
    
  2. A complex SQL query : use the connection operation (JOIN) to collect all the necessary information in one query.

    SELECT Orders.order_id, Products.* 
    FROM Orders 
    JOIN Products ON Orders.product_id = Products.product_id
    WHERE Orders.order_id = 123;
    

Although the latter query statement is more complex, it may execute much faster than the former approach, because it avoids the overhead of multiple queries and database connections. Ha, this law may be me over-interpreting a bit; feel free to push back. It summarizes the majority of scenarios; under special circumstances multiple queries can be more efficient than a joined query, but only occasionally.

At the same time, a complex query also helps keep the database's processing logic atomic, which is very important, especially in a concurrent environment.

Remember, don't overcorrect. I'm not encouraging everyone to write giant SQL; it depends on the scenario. Although large SQL queries may take more time to write and debug, they usually beat multiple small queries in execution speed and efficiency. This also covers the mountain-of-garbage code some students write that executes SQL inside nested loops, haha.
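The two strategies above can be sketched side by side (SQLite via Python standing in for MySQL; the toy schema follows the article's example): both return the same product details, but the N+1 approach issues one statement per product while the JOIN issues a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER, product_id INTEGER)")
conn.execute("CREATE TABLE Products (product_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(123, 1), (123, 2)])
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(1, "pen"), (2, "ink")])

# Strategy 1: N+1 simple queries (1 for the IDs + one per product)
ids = [r[0] for r in conn.execute(
    "SELECT product_id FROM Orders WHERE order_id = 123")]
n_plus_1 = [conn.execute(
    "SELECT name FROM Products WHERE product_id = ?", (pid,)).fetchone()[0]
    for pid in ids]

# Strategy 2: one JOIN gathers everything in a single statement
joined = [r[0] for r in conn.execute(
    "SELECT p.name FROM Orders o "
    "JOIN Products p ON o.product_id = p.product_id "
    "WHERE o.order_id = 123")]

assert sorted(n_plus_1) == sorted(joined) == ["ink", "pen"]
```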

1.9. Use the Database for What It Is Built For

Push computation into the database where possible; unless the operation on the result set is especially complex,
the database usually executes it faster than application code.

How to understand this one? Honestly, I don't fully agree. Our company's SQL guidelines forbid implementing business logic in SQL computation and forbid hand-rolling functions, stored procedures, or triggers, and they treat these as poison for good reason: there is a history of blood and tears behind the rule. The most direct pain is that when something goes wrong, you don't know where it came from, with no chance to debug online or check logs. Even project handover and database migrations (say, MySQL to Oracle or PostgreSQL) turn into disasters. Sigh, the more I say the more I want to cry. Students working on banking systems can ignore this; the piles of reports and batch jobs there are historical legacy, nothing to argue about, so follow your company's norms and what your predecessors left behind.
Still, if we force an interpretation, we can argue from the other side:

  1. Database systems are designed to handle large amounts of data efficiently. They can take advantage of technologies such as indexes, query optimizers, parallel processing, etc. to speed up data processing, which are not available in ordinary applications. Therefore, it is usually a good strategy to use the computing power of the database as much as possible.

  2. When computing is performed in the database, the data does not need to be transferred over the network to the application, which can greatly reduce network latency and the overhead of data serialization and deserialization. The processing of data inside the database can also use optimization methods such as parallel computing and memory management of the database.

For example, if you want to calculate the total price of an order, you can ask the database to perform the following query:

SELECT SUM(price * quantity) 
FROM OrderItems 
WHERE order_id = 1234;

Instead of fetching all the line items back into the app and then calculating the total price in the app.

Again, although in most cases letting the database do the calculation is better, there are exceptions. For some complex computations, such as those involving intricate algorithms or business rules, the database may not be the best place to handle the logic. Moreover, piling heavy computational load onto the database can affect its performance and stability. So weigh the options and find a reasonable balance: in general, let the database do what it is good at, processing and computing data, while the application handles business logic and user interaction.
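The order-total example can be sketched as follows (SQLite via Python standing in for MySQL; the OrderItems schema follows the article's example): the aggregation happens inside the database and only a single number crosses to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderItems (order_id INTEGER, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)",
                 [(1234, 2.5, 4), (1234, 1.0, 3), (9999, 9.9, 1)])

# One round trip, one scalar result, instead of shipping every line item
(total,) = conn.execute(
    "SELECT SUM(price * quantity) FROM OrderItems WHERE order_id = 1234"
).fetchone()

assert total == 13.0   # 2.5*4 + 1.0*3, computed inside the database
```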

1.10. Use LIMIT to Sample Result Data

Use LIMIT to take a sample of the data for analysis; it lets the database stop producing rows early and improves efficiency.

There is little to interpret here. The LIMIT keyword caps the number of rows in a SQL query result. Using it to grab a sample of the data for analysis can save a lot of time and system resources, since the database can stop as soon as enough rows are produced instead of materializing the full result. It is especially useful when only a portion of the data is needed for initial analysis or a quick preview.
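A minimal sketch (SQLite via Python standing in for MySQL; the Events table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (id INTEGER)")
conn.executemany("INSERT INTO Events VALUES (?)", [(i,) for i in range(10000)])

# Grab a small sample instead of all 10,000 rows
sample = conn.execute("SELECT id FROM Events LIMIT 5").fetchall()

assert len(sample) == 5
```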

1.11. Optimize Like Statements

Optimize LIKE statements with a leading %, which otherwise cause full table scans.

Let's interpret this law; the leftmost-prefix matching principle you see in most SQL optimization posts is closely related. In SQL queries, the LIKE statement is often used for pattern matching. But when the search pattern begins with a percent sign (%), the index becomes unusable and a full table scan is triggered, as in the following query:

SELECT * FROM Users WHERE name LIKE '%Smith';

The query searches the Users table for all users whose name ends with Smith; however, since the pattern starts with a percent sign (%), the database cannot use the index (if one exists) and has to traverse the entire table, examining every row.

To improve query performance, avoid the leading percent sign whenever possible. If we know how the search pattern begins, we should write that prefix literally. The following query searches for all users whose name starts with Smith; since the pattern no longer begins with a percent sign, the database can use the index (if one exists), which improves query performance:

SELECT * FROM Users WHERE name LIKE 'Smith%';

A friend from the community told me about an interview question: if the business genuinely requires a leading-% fuzzy query, how do you optimize it? He was stumped, because all he remembered was "a leading % can't use the index", so he assumed nothing could be done. That is not the case; where the craft is good enough, there is always a way out. There are two further optimization methods: one is full-text search, and the other is storing a reversed copy of the field to avoid the full table scan. I think the interviewer was probing depth of understanding rather than rote memorization. Both methods can really help when a leading % cannot be avoided; let's look at them.

  1. Use full-text search: full-text search is a powerful technique for efficient searching in text data. Not only can it ignore a leading % during the search, it can also return a relevance score, which a plain LIKE query cannot. In MySQL, you can use the MATCH ... AGAINST statement for full-text search. Note that not all database systems support full-text search, and it requires setting up full-text indexes on the relevant columns.

  2. Use reversed fields: This is another technique to optimize for leading %. The basic idea is to keep a reverse copy of a field in the database. For example, if you frequently need to search for usernames that end with a certain string, you can create a new column to hold a reverse copy of the usernames, and do a "begins with the given string" search on that column. In this way, the index can be used for searching. For example:

    -- Original query: cannot use an index
    SELECT * FROM Users WHERE username LIKE '%son';
    
    -- Query on the reversed field: can use an index
    SELECT * FROM Users WHERE reversed_username LIKE 'nos%';  -- assuming reversed_username stores username reversed
    
These are common ways to optimize around full table scans, but they come with prerequisites: both require groundwork in the database design phase (setting up full-text indexes or creating reversed fields). The key to optimization is therefore a deep understanding of the specific application requirements, so you can choose the most suitable strategy.

Although leading-percent pattern matching gives us powerful search capabilities, its performance impact deserves attention. When designing the database and writing queries, try to optimize LIKE statements to avoid full table scans.
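The reversed-field trick can be sketched as follows (SQLite via Python standing in for MySQL; the reversed copy is computed in the application, since SQLite has no built-in REVERSE function, and in MySQL you could use REVERSE() or a generated column). A '%son' suffix search becomes a 'nos%' prefix search on reversed_username, which an index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (username TEXT, reversed_username TEXT)")
for name in ["jackson", "wilson", "smith"]:
    conn.execute("INSERT INTO Users VALUES (?, ?)", (name, name[::-1]))
conn.execute("CREATE INDEX idx_users_rev ON Users(reversed_username)")

# Suffix search rewritten as an index-friendly prefix search
hits = conn.execute(
    "SELECT username FROM Users "
    "WHERE reversed_username LIKE 'nos%' ORDER BY username"
).fetchall()

assert hits == [("jackson",), ("wilson",)]   # names ending in 'son'
```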

1.12. Mind the Column DataTypes

Mind column data types; avoid performance degradation caused by implicit type conversion.

In practice, you have probably hit both situations. One is a field-type mismatch silently defeating the index. The other is a character-set mismatch: one table defined as utf8, the other as utf8mb4. The latter is maddening; no matter how hard you search, the problem is hard to find. A lesson of blood and tears.

Data type mismatch

When columns of different data types are compared in WHERE conditions or JOIN operations, the query may perform implicit conversions. Such a conversion can make an index unusable, so the query cannot be optimized with it, and in some cases may even yield wrong results. The classic MySQL trap is a string column compared against a number: every stored value must be converted before the comparison, so the index on that column cannot be used.

SELECT * 
FROM Orders 
WHERE order_id = 12345;  -- 假设order_id是VARCHAR类型:每行都要转换,索引失效

(Comparing an integer column with the string '12345' is usually harmless, since only the constant is converted.)

Dataset encoding mismatch

Dataset encoding mismatches are also a common problem, especially when dealing with multilingual or text containing special characters. Different tables may use different character set encodings, such as utf8 and utf8mb4, when the two perform data interaction, unexpected problems may occur. For example, you may encounter garbled characters, missing characters, or degraded query performance.

1.13. Use Transactions

Use transactions where necessary; they keep data consistent and make data-modification operations efficient.

No interpretation needed here: this is baseline knowledge every developer should have. If someone can't use transactions correctly where they are needed, there is no point optimizing SQL, since even correctness isn't guaranteed, let alone performance. When daily business code is full of correctness problems that still need fixing, worrying about performance first is a fantasy.
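The classic transfer example can be sketched as follows (SQLite via Python standing in for MySQL; the Accounts table and the simulated failure are invented for illustration): wrapped in a transaction, the transfer either fully applies or fully rolls back, so no money is lost when the second update fails.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(amount, fail=False):
    conn.execute("UPDATE Accounts SET balance = balance - ? WHERE name = 'a'", (amount,))
    if fail:
        raise RuntimeError("simulated failure before crediting 'b'")
    conn.execute("UPDATE Accounts SET balance = balance + ? WHERE name = 'b'", (amount,))

try:
    transfer(30, fail=True)
    conn.commit()
except RuntimeError:
    conn.rollback()   # undo the debit: neither side of the transfer applies

balances = dict(conn.execute("SELECT name, balance FROM Accounts"))
assert balances == {"a": 100, "b": 0}   # money neither lost nor created
```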

1.14. Use Prepared Statements

Use prepared statements to prevent SQL injection and improve performance.

There is nothing much to add here; prepared statements were covered back when I was learning JDBC.
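Still, a tiny sketch shows why this matters (SQLite via Python standing in for MySQL; the hostile input is the classic textbook example): the parameter placeholder sends the string as plain data, so the `' OR '1'='1` trick matches nothing instead of dumping the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (name TEXT)")
conn.executemany("INSERT INTO Users VALUES (?)", [("alice",), ("bob",)])

hostile = "x' OR '1'='1"

# Unsafe string concatenation: the injected condition matches every row
unsafe = conn.execute(
    "SELECT name FROM Users WHERE name = '" + hostile + "'").fetchall()

# Prepared/parameterized statement: the value is treated as one literal
safe = conn.execute("SELECT name FROM Users WHERE name = ?", (hostile,)).fetchall()

assert len(unsafe) == 2   # injection succeeded
assert safe == []         # injection neutralized
```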

1.15. Leverage Partial Indexes

Leverage partial indexes, e.g. a partial index on a gender or status column.

Actually, there is a caveat to discuss here. MySQL does not support creating partial indexes, so even when you know the data has a specific pattern or distribution, you cannot build an index covering only that subset.
Although we've learned this law, unfortunately MySQL does not yet support partial indexes (Partial Index): in MySQL, every index indexes every row of the table, and there is no way to exclude some records.

We can still find ways to achieve a similar effect. For example, consider creating a new table that holds only the active users and indexing that table, or a view for query convenience (note that a plain MySQL view cannot itself be indexed). You can then query the new table or view instead of the original big one, which indirectly improves query efficiency: the roundabout route still reaches the goal.
For example, the view ActiveUsers below contains only active users; when querying for active users, you query this view instead of the original Users table.

CREATE VIEW ActiveUsers AS 
SELECT * 
FROM Users 
WHERE isActive = 1;

Although this method cannot really implement partial indexes, and may lead to data redundancy or data inconsistency in some cases, it is a feasible method to achieve similar partial index functions in MySQL. If your application scenario is suitable for this method, you can consider using it.

In addition, some other database systems, such as PostgreSQL and SQLite, support partial indexes. If your application truly requires them and you are not tied to a particular database system, you can consider switching to one that supports them.
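For a taste of what MySQL is missing, here is a real partial index in SQLite (PostgreSQL uses the same `CREATE INDEX ... WHERE` syntax; the Users/isActive schema follows the article's example): the index covers only active rows, and the planner uses it when the query's condition implies the index's WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER, isActive INTEGER)")
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [(i, 1 if i % 100 == 0 else 0) for i in range(5000)])

# Partial index: only rows with isActive = 1 are indexed
conn.execute("CREATE INDEX idx_active_users ON Users(id) WHERE isActive = 1")

plan = " ".join(r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM Users WHERE isActive = 1 AND id = 300"))

# The query's WHERE implies the index's condition, so the partial index is used
assert "idx_active_users" in plan
```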

You can also consider using some other strategies to replace partial indexes, for example, you can create a new table containing only the rows that need to be indexed. Alternatively, for boolean fields, try using bitmaps to store and query data.

There are always more methods than difficulties. There is no optimal solution, and we can look for alternatives that are close to the optimal solution.

1.16. Use Materialized Views

Use materialized views to save computation and improve efficiency.

Unfortunately, MySQL does not support materialized views; currently Oracle and PostgreSQL do. What we learn from these SQL optimization rules is not limited to MySQL.

A materialized view is a special kind of database view that physically stores its query results. Compared with a conventional view, it only needs to synchronize when the underlying data is updated, which can greatly improve data-query efficiency.

Not only in MySQL, but in most databases, the view is a virtual table. When we query it, we actually execute the corresponding SELECT statement, generate a temporary table, and return the result. This means that every time a view is queried, a query calculation needs to be performed in the background, including operations such as joining, filtering, and sorting.

However, when a materialized view is created, the SELECT statement is executed and the results are stored. In this way, when we query the materialized view, we actually query the calculated result set, which avoids repeated calculations and greatly improves query efficiency, especially suitable for scenarios with large amounts of data and complex queries.
For example, the statement below creates a materialized view called SalesSummary containing the total sales for each product. When we need the sales summary, we simply query this materialized view.

CREATE MATERIALIZED VIEW SalesSummary AS
SELECT product_id, SUM(sales) total_sales
FROM Sales 
GROUP BY product_id;

The advantages come with a flip side. The data in a materialized view is not real-time: when the original table (Sales in the example above) changes, the materialized view is not updated immediately. We have to refresh it manually with the REFRESH command, or use tools or database features that maintain it automatically.

In some database systems, such as Oracle and PostgreSQL, materialized views are a built-in feature. In MySQL, however, we have to simulate them by other means, for example by creating a regular table as the "view" and writing a trigger or using the event scheduler to update it when the source data changes; I strongly advise against that route, though, as it has too many pitfalls.
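A safer simulation is a plain summary table with an explicit refresh step, the manual equivalent of REFRESH MATERIALIZED VIEW. A sketch below (SQLite via Python standing in for MySQL; the Sales/SalesSummary schema follows the article's example) also shows the staleness caveat: after the base table changes, the summary is wrong until refreshed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product_id INTEGER, sales INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [(1, 10), (1, 5), (2, 7)])
conn.execute("CREATE TABLE SalesSummary (product_id INTEGER, total_sales INTEGER)")

def refresh_sales_summary():
    # manual equivalent of REFRESH MATERIALIZED VIEW
    conn.execute("DELETE FROM SalesSummary")
    conn.execute("INSERT INTO SalesSummary "
                 "SELECT product_id, SUM(sales) FROM Sales GROUP BY product_id")

refresh_sales_summary()
before = dict(conn.execute("SELECT * FROM SalesSummary"))

conn.execute("INSERT INTO Sales VALUES (2, 3)")    # base table changes...
stale = dict(conn.execute("SELECT * FROM SalesSummary"))
refresh_sales_summary()                            # ...summary is stale until refreshed
after = dict(conn.execute("SELECT * FROM SalesSummary"))

assert before == {1: 15, 2: 7}
assert stale == before           # not real-time, as noted above
assert after == {1: 15, 2: 10}
```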

1.17. Use Appropriate Isolation Level

Choose an appropriate transaction isolation level to balance concurrency performance and data consistency.

I won't go into details on this one, but everyone should understand it.
The SQL standard defines four transaction isolation levels; from low to high they are: Read Uncommitted, Read Committed, Repeatable Read, and Serializable. Each level represents a different balance point, progressively eliminating the three classes of problems: dirty reads, non-repeatable reads, and phantom reads. Choosing an appropriate isolation level requires trade-offs based on business requirements and system load. If consistency is critical, choose a higher isolation level; if performance matters more, a lower level may suffice. Raising the isolation level reduces data inconsistency, but it may also reduce concurrency, because higher levels require more complex lock management and may cause transactions to wait.
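To make the trade-off concrete, the sketch below uses Python's sqlite3 in WAL mode, where a read transaction sees a consistent snapshot — behavior comparable to Repeatable Read. This is an illustration only: sqlite3 stands in for MySQL, whose levels are actually set with `SET TRANSACTION ISOLATION LEVEL ...`.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file; WAL mode lets a writer
# commit while a reader holds a consistent snapshot.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
reader = sqlite3.connect(db_path, isolation_level=None)  # autocommit; BEGIN issued manually
reader.execute("PRAGMA journal_mode=WAL")
reader.execute("CREATE TABLE account (balance INTEGER)")
reader.execute("INSERT INTO account VALUES (100)")
writer = sqlite3.connect(db_path, isolation_level=None)

reader.execute("BEGIN")
first = reader.execute("SELECT balance FROM account").fetchone()[0]   # 100
writer.execute("UPDATE account SET balance = 50")                     # commits immediately
second = reader.execute("SELECT balance FROM account").fetchone()[0]  # still 100: repeatable read
reader.execute("COMMIT")
third = reader.execute("SELECT balance FROM account").fetchone()[0]   # 50: new snapshot
```

The reader never sees the concurrent update inside its transaction — the consistency guarantee that higher isolation levels buy, at the cost of extra bookkeeping.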

18. Analyze Your Data

Deeply understand your data to find the most suitable optimization strategy. ==This can also be read as: analyze your SQL execution plans.==

Only by deeply understanding your data can you find out the most suitable optimization strategy.

data analysis

When you're trying to optimize a critical query, it's important to first understand the general picture of the data being processed by the query. You need to know how many rows of data there are in the table, the approximate size of the returned result set, and the distribution of these data and other basic information. Databases usually provide some built-in functions or tools to help you perform data analysis.
No matter which database system we use, it usually provides built-in tools and functions that help us understand and analyze our data: the amount of data, its distribution, its trends, and so on. There are also open-source and commercial third-party tools, such as pgAdmin, Toad, and MySQL Workbench, which provide graphical interfaces and integrate many functions, such as query analysis, index optimization suggestions, and performance monitoring — all very helpful for database performance optimization.

Data analysis tools provided by the database:

  1. MySQL:

    • SHOW TABLE STATUS: This command can provide general information about the table, including the number of rows, the size of the data, and so on.
    • EXPLAIN: This command can analyze the execution plan of the SQL query to help understand and optimize the query statement. We will write an article about the result analysis of the Explain command later.
  2. PostgreSQL:

    • EXPLAIN: PostgreSQL also provides an EXPLAIN command, used to analyze the execution plan of a query.
    • pg_stat_* views: this series of views provides statistics about table and index usage. For example, the pg_stat_user_tables view contains access statistics for user tables.
    • pg_stat_activity: this view shows the active sessions in the current system and can be used to understand the activity of the database.
  3. Oracle:

    • DBMS_STATS: this package provides a set of procedures and functions for collecting and managing database statistics.
    • EXPLAIN PLAN: Oracle's EXPLAIN PLAN command displays the execution plan of a SQL query.
    • V$ views: this series of views provides a lot of information about the internal workings of the database, including performance, wait events, and so on.

Query Plan Analysis

Viewing and understanding a query's execution plan is a key step in optimizing a SQL query. The query plan can tell you how the database will execute your query, including how it will access the data in the database (such as a full table scan or using an index), what operations need to be performed (such as sorting or joining), and so on. Through the query plan, you can find the bottleneck of your query, so as to find out the direction of optimization.
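The workflow is the same in every database: run the query through the planner, read the plan, and look for full scans. The sketch below uses Python's sqlite3 and its EXPLAIN QUERY PLAN command (an analog of MySQL's EXPLAIN; the `plan` helper exists only for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM users WHERE email = 'user42@example.com'"
before = plan(query)   # full table scan: the plan contains "SCAN"
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)    # index lookup: the plan mentions idx_users_email
```

Reading `before` versus `after` shows exactly the bottleneck-spotting the text describes: the same query goes from scanning every row to a direct index search.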

index design

For most relational databases, indexes are a key tool for improving query performance. Properly designing and using indexes lets the database find the data you need faster. However, indexes are not a panacea: they occupy additional storage space and increase the cost of data inserts and updates. Therefore, you need to design appropriate indexes according to your data and query characteristics.

database features

Different databases have their own advantages and features. A deep understanding of the characteristics of the database you are using can help you find an optimization strategy that works for you. For example, you might be able to take advantage of features such as partitioned tables, data compression, or parallel queries to improve query performance.
This point is very important in practice. Our company's IoT vehicle location and track data is stored in MySQL on the business side, and it still meets query requirements, because the business generally queries the track data of one vehicle or one order within the last half year, and a single query range does not exceed 7 days. So we used MySQL's partitioned-table feature to partition the GPS positioning data by shard key and GPSTime; even with hundreds of millions of rows in a single table, it comfortably meets business needs.
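In MySQL itself the partitioning would be declared once, e.g. with something like `PARTITION BY RANGE COLUMNS(gps_time)`, and pruning then happens transparently. The Python/sqlite3 sketch below only illustrates the underlying idea — routing time-series rows into per-month tables so that a bounded query touches a small slice of the data (all table and function names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_for(month):                       # month like "2023-07"
    return "gps_" + month.replace("-", "_")

def insert_point(conn, vehicle_id, gps_time, lat, lon):
    # Route each row to the table for its month, creating it on demand.
    t = table_for(gps_time[:7])
    conn.execute(f"CREATE TABLE IF NOT EXISTS {t} "
                 "(vehicle_id INTEGER, gps_time TEXT, lat REAL, lon REAL)")
    conn.execute(f"INSERT INTO {t} VALUES (?, ?, ?, ?)",
                 (vehicle_id, gps_time, lat, lon))

def query_month(conn, vehicle_id, month):
    # A time-bounded query only ever touches one small "partition".
    rows = conn.execute(f"SELECT gps_time FROM {table_for(month)} "
                        "WHERE vehicle_id = ?", (vehicle_id,))
    return [r[0] for r in rows]

insert_point(conn, 7, "2023-07-01 08:00:00", 31.23, 121.47)
insert_point(conn, 7, "2023-07-02 09:30:00", 31.24, 121.48)
insert_point(conn, 7, "2023-08-01 10:00:00", 31.25, 121.49)
july = query_month(conn, 7, "2023-07")      # only the July slice is read
```

Native partitioning is strongly preferable in production; the point of the sketch is just why it works — each query scans one month's data instead of the whole history.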

business understanding

Last but not least, understanding your business needs is paramount. No matter how optimized the technology is, it must meet the business needs as the goal. Understanding your business needs, including data life cycle, data change frequency, query complexity, etc., are the basis for database optimization.

19. Avoid Too Many Joins in a Single Query

Do not use too many JOINs in one query; it is recommended to keep them within 5.
I believe this is an empirical value. I checked, and found no requirement about the number of JOINs in the SQL standard. Just remember it as a rule of thumb — there is no need to delve deeper.
Here are a few suggestions:

  1. If your query includes a large number of JOINs, you can try to put part of the JOIN operation into a subquery, or create a temporary table first, which can simplify the complexity of the query.
  2. Create indexes for commonly used JOIN fields, so that the database can find matching rows more quickly.
  3. Optimize the order of JOIN. When relational databases process JOIN queries, the choice of table order may affect query performance. Especially when your query contains multiple JOINs, or the JOIN table sizes vary widely, changing the order of JOINs may have a significant impact on performance.
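Suggestion 2 above can be seen directly in a query plan. The sqlite3 sketch below (illustrative only; table names are invented) shows the planner resolving a JOIN through an index on the join column instead of rescanning the inner table for every outer row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_code TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
""")
conn.executemany("INSERT INTO customers (code, name) VALUES (?, ?)",
                 [(f"C{i}", f"customer {i}") for i in range(500)])
conn.executemany("INSERT INTO orders (customer_code) VALUES (?)",
                 [(f"C{i % 500}",) for i in range(2000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT o.id, c.name FROM orders o "
         "JOIN customers c ON c.code = o.customer_code")

# Without an index, SQLite must scan the inner table (or build a one-off
# automatic index) for the join; with an index, each probe is a direct lookup.
conn.execute("CREATE INDEX idx_customers_code ON customers (code)")
after = plan(query)    # the plan mentions idx_customers_code
```

The same diagnosis applies in MySQL: when EXPLAIN shows the inner table of a JOIN being scanned, an index on the join column is usually the first fix to try.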

20. Do Regular Performance Tuning

==Translated plainly, this means: perform regular performance tuning — it is a continuous process.== As database data keeps accumulating, the performance bottleneck of a single MySQL table will eventually appear; once data reaches a certain scale, query performance can degrade sharply. So optimize regularly and archive historical data. If necessary, consider optimizing your data structures, including data partitioning, splitting databases and tables, using appropriate data types, and avoiding data redundancy. Regularly check and analyze the slow query log, find the SQL statements that take a long time to execute or consume large amounts of resources, and optimize them.
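The archiving step can be sketched in a few lines. Below is a hedged example using Python's sqlite3 (`orders` and `orders_archive` are example names): rows older than a cutoff are copied to an archive table and removed from the live table inside a single transaction, so the data is never lost or duplicated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO orders (created_at) VALUES
        ('2022-01-10'), ('2022-06-01'), ('2023-03-15'), ('2023-08-20');
""")

def archive_before(conn, cutoff):
    # Copy then delete inside one transaction: either both succeed or neither.
    with conn:
        conn.execute("INSERT INTO orders_archive "
                     "SELECT * FROM orders WHERE created_at < ?", (cutoff,))
        conn.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))

archive_before(conn, "2023-01-01")
live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]              # 2
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]  # 2
```

Run on a schedule, this keeps the hot table small so indexes stay compact and queries stay fast, while historical rows remain available in the archive.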

Summary

The above is my interpretation of the 20 SQL optimization laws. In practical applications, choose the appropriate method according to your specific needs. In daily development, cultivate the awareness of continuously improving the performance of your SQL statements until it becomes a basic skill — the ideal state is to write efficient SQL from the start, not to fix it afterwards. As always: sometimes a small change is all it takes to bring a big improvement.

10 DBA experiences

  1. Learn and understand the data you are dealing with : you need to know where the data comes from, where it is used, how it is used, etc.

  2. Design the database : The design of the database is like building a house. You must first draw a good picture, determine how many rooms are needed, how the rooms are distributed, etc., to avoid trouble later.

  3. Regularly back up data : Accidents will always happen, and backups are your safety net. Also give the data frequent "physical checks", and if a problem is found, "operate" in time.

  4. Regularly optimize the database : In order to make the data "run faster", it must be optimized regularly, such as trimming redundant data branches, adding grease to the roads that are often run, and so on.

  5. Keep an eye on the dynamics of the database : In order to find problems and provide timely "remedy", you have to keep an eye on the various situations of the database, such as checking how much data it has eaten, or what data it is processing.

  6. Know the database tools you use : The tools you work with may have many useful functions. Learn and use them, just like a carpenter needs to know all the tools in his hands.

  7. Follow security rules : Protecting data is like protecting your baby, you have to put them in a safe place to prevent bad people or things from getting close to them.

  8. Keep learning new things : New data technologies and new ways to use them will make your job easier, so keep learning and absorb new knowledge like a sponge.

  9. Record what you did : After each operation of the database, it is best to write down what you did, like writing a diary. This makes it easy to find when you need it.

  10. Be prepared to deal with emergencies : This is very important. Even larger companies have run into automatic backup failures, emergency restores, and the like. Problems you never expected will always arise, so you need a backup plan: standby equipment, emergency documentation, and more.

Origin blog.csdn.net/wangshuai6707/article/details/132571224