A brief discussion on SQL optimization tips | JD Cloud technical team

To understand how to optimize SQL, it helps to first review how MySQL executes a query.

(1) The client sends a query statement to the server;

(2) The server first checks the query cache; on a cache hit, it immediately returns the cached result;

(3) On a cache miss, MySQL parses the SQL statement by keyword and generates a parse tree, validating the statement against MySQL grammar. For example, it checks whether a keyword is misspelled or used in the wrong place;

(4) The preprocessor then checks whether the parse tree is semantically valid according to MySQL's rules, such as whether the referenced tables and columns exist; it also resolves names and aliases, and finally verifies permissions;

(5) The optimizer produces an execution plan, and the execution engine follows it, calling the storage engine through its API to read the data;

(6) The result is returned to the client and stored in the query cache.



Common strategies for SQL statement performance optimization

1. Create indexes on the columns involved in WHERE and ORDER BY

To optimize a query, try to avoid full table scans; the first step is to consider creating indexes on the columns involved in WHERE and ORDER BY.
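As a minimal sketch (the table and column names are hypothetical), a single index can support both the filter and the sort:

```sql
-- index covering the WHERE filter and the ORDER BY sort
CREATE INDEX idx_user_status_created ON user (status, created_at);

-- this query can now locate rows by status and read them
-- in created_at order straight from the index
SELECT id FROM user WHERE status = 1 ORDER BY created_at;
```

Putting the equality-filtered column first lets the same index serve both the filtering and the ordering.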

2. Use default values instead of NULL in WHERE

Try to avoid NULL checks on columns in the WHERE clause. NULL is the default for a column when a table is created, but most of the time you should declare columns NOT NULL and use a special value such as 0 or -1 as the default.

There are four reasons why default values are recommended over NULL in WHERE clauses:

(1) Using IS NULL or IS NOT NULL does not by itself prevent index use; whether the index is used depends on the MySQL version and the query cost;

(2) If the MySQL optimizer estimates that using the index costs more than not using it, it abandons the index. Conditions such as !=, <>, IS NULL, and IS NOT NULL are often considered to render an index ineffective for this reason;

(3) In practice, it is usually the high estimated query cost that makes the optimizer abandon the index;

(4) Replacing NULL with a default value often makes the condition indexable, and it also makes the meaning of the data clearer.
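A small sketch of the idea (table and column names are hypothetical): declare the column NOT NULL with a default, then filter on the default value instead of NULL:

```sql
-- 0 stands in for "no score yet" instead of NULL
CREATE TABLE student (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    score INT NOT NULL DEFAULT 0
);
CREATE INDEX idx_score ON student (score);

-- instead of: SELECT id FROM student WHERE score IS NULL
SELECT id FROM student WHERE score = 0;
```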

3. Use the != or <> operators with caution.

MySQL uses indexes only for the following operators: <, <=, =, >, >=, BETWEEN, IN, and sometimes LIKE.

Therefore, try to avoid using the != or <> operators in the WHERE clause; they may cause a full table scan.

4. Use OR with caution to connect conditions

Using OR to connect conditions may invalidate the index and result in a full table scan, so try to avoid it in the WHERE clause; otherwise the engine may give up on the index and scan the whole table.

Queries can be combined using UNION:

select id from t where num = 10
union all
select id from t where num = 20

The key question is whether indexes are used: the speed of either form depends entirely on that. A multi-condition OR may fail to use the index at all, while after the rewrite each branch of the UNION ALL can match an index on its own, so the UNION ALL form often performs better.

5. Use IN and NOT IN with caution

IN and NOT IN should also be used with caution, otherwise they may lead to a full table scan. For a continuous range of values, use BETWEEN instead of IN: select id from t where num between 1 and 3.

6. Use leading-wildcard LIKE '%…' with caution

For fuzzy matching, LIKE is the operator programmers reach for first, but it can easily invalidate the index.

For example:

select id from t where name like '%abc%'
select id from t where name like '%abc'

do not use the index, while

select id from t where name like 'abc%'

does.

Therefore:

First, avoid fuzzy queries where possible. If you must use one, prefer a right fuzzy query (like '...%'), which can use the index. A left fuzzy query (like '%...') cannot use the index directly, but you can store the reversed string with an index on it and rewrite the condition into the like '...%' form. A fully fuzzy query (like '%...%') cannot be optimized with a B-tree index at all; if it is unavoidable, use a dedicated search engine such as ElasticSearch. Note: if left fuzzy search is a hard requirement, an ElasticSearch + HBase architecture is generally recommended.
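The reverse-column trick mentioned above can be sketched as follows, assuming MySQL 5.7+ (for generated columns) and hypothetical table and column names:

```sql
-- keep a reversed copy of name, maintained by MySQL itself
ALTER TABLE t
    ADD COLUMN name_rev VARCHAR(64)
        GENERATED ALWAYS AS (REVERSE(name)) STORED;
CREATE INDEX idx_name_rev ON t (name_rev);

-- the left fuzzy query "name LIKE '%abc'" becomes a right fuzzy
-- query on the reversed column: name_rev LIKE 'cba%'
SELECT id FROM t WHERE name_rev LIKE REVERSE('%abc');
```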

7. Using parameters in the WHERE condition will cause a full table scan.

The following statement will perform a full table scan:

select id from t where num=@num

Because SQL resolves local variables only at run time, while the optimizer must choose an access plan at compile time, the variable's value is still unknown when the plan is built and cannot be used as input for index selection.

So you can force the query to use the index instead (in MySQL, with FORCE INDEX; index_name below is a placeholder for the actual index name):

select id from t force index(index_name) where num = @num

8. Using EXISTS instead of IN is a good choice

Many times it is a good choice to use exists instead of in:

select num from a where num in (select num from b)

can be replaced with:

select num from a where exists (select 1 from b where b.num = a.num)

9. More indexes are not always better

Although indexes can improve the efficiency of the corresponding SELECTs, they also reduce the efficiency of INSERT and UPDATE, because every index must be maintained when rows are inserted or updated. How to build indexes therefore needs careful, case-by-case consideration.

It is best not to have more than 6 indexes on a table. If there are too many, you should consider whether it is necessary to build indexes on some columns that are not commonly used.

10. Use numeric fields as much as possible

(1) The engine compares strings character by character when processing queries and joins;

(2) For numeric types, a single comparison is enough;

(3) Character fields therefore reduce query and join performance and increase storage overhead;

Therefore: use numeric fields wherever possible. If a field contains only numeric information, try not to design it as a character field.

11. Use varchar and nvarchar instead of char and nchar as much as possible

(1) A varchar field is stored at the actual length of its content, so it takes less space and saves storage;

(2) A char field is stored at its declared size and is padded with spaces when the value is shorter;

(3) And for queries, searching within a smaller field is noticeably more efficient.

14. Try not to use select * in queries; list specific fields instead

It is best not to write select * from t; replace "*" with a specific list of fields, and do not return any field you do not use.

Disadvantages of select *:

(1) It adds a lot of unnecessary overhead: CPU, IO, memory, and network bandwidth;

(2) It prevents the use of a covering index;

(3) It increases the likelihood of extra lookups back to the base table;

(4) When the table structure changes, the calling code also has to change;

(5) Query efficiency is lower.

15. Precalculate the results to be queried

Pre-calculate the results that will be queried and store them in a table; at query time, SELECT the stored result instead of computing it on the fly.
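One common shape of this idea is a summary table refreshed ahead of time (the table and column names below are hypothetical):

```sql
-- summary table maintained by a periodic job
CREATE TABLE daily_order_stats (
    stat_date   DATE NOT NULL PRIMARY KEY,
    order_count INT  NOT NULL
);

-- refresh step (run once per day), instead of aggregating at query time
REPLACE INTO daily_order_stats (stat_date, order_count)
SELECT DATE(created_at), COUNT(*)
FROM orders
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
GROUP BY DATE(created_at);

-- query time is now a cheap point lookup
SELECT order_count FROM daily_order_stats WHERE stat_date = '2023-12-01';
```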

16. The value that appears most frequently after IN is placed first.

If IN must be used, then in the list of values after IN, put the value that matches most often at the front and the one that matches least often at the end, to reduce the number of comparisons.

17. Try to use EXISTS instead of select count(1) to determine whether a record exists.

Use COUNT only when you actually need the number of rows in the table; for a pure existence check, stopping at the first match is cheaper. (In InnoDB, COUNT(1) and COUNT(*) are handled the same way, so there is no performance reason to prefer one over the other.)
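A minimal sketch of the existence check (table and column names hypothetical):

```sql
-- instead of: SELECT COUNT(1) FROM t WHERE num = 10
-- stop scanning as soon as one matching row is found
SELECT EXISTS (SELECT 1 FROM t WHERE num = 10) AS has_row;

-- equivalent form
SELECT 1 FROM t WHERE num = 10 LIMIT 1;
```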

18. Use batch insert or batch update

When there is a batch of inserts or updates, use batch inserts or batch updates instead of updating records one by one.

(1) Row-by-row submission

INSERT INTO user (id, username) VALUES (1, 'xx');
INSERT INTO user (id, username) VALUES (2, 'yy');

(2) Batch submission

INSERT INTO user (id, username) VALUES (1, 'xx'), (2, 'yy');

By default, each statement runs under its own transaction control, so inserting rows one by one opens and commits a transaction per row, while the batch form opens and commits only one transaction. The efficiency gain becomes significant as the volume grows, even if it is hardly visible for a handful of rows.

19. Filter out unnecessary records before GROUP BY

To improve the efficiency of GROUP BY statements, you can filter out unnecessary records before GROUP BY.

The following two queries return the same results, but the second one is significantly faster.

Inefficient:

SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'

Efficient:

SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB

20. Avoid deadlocks

Always access tables in the same order in your stored procedures and triggers; keep transactions as short as possible and minimize the amount of data they touch; never wait for user input inside a transaction.

21. Index creation rules:

The primary key and foreign key of the table must have indexes;

Tables whose row count exceeds 300 should have indexes;

Tables that are frequently joined with other tables should have indexes on the join columns;

Fields that frequently appear in WHERE clauses, especially those in large tables, should be indexed;

Indexes should be built on highly selective fields;

Indexes should be built on small fields. Do not build indexes on large text fields or even very long fields;

Creating a composite index requires careful analysis; consider whether single-field indexes would serve instead;

Correctly select the main column field in the composite index, which is generally a field with better selectivity;

Do the fields of a composite index often appear together, ANDed, in the WHERE clause? Are single-field queries on them rare or absent? If so, a composite index makes sense; otherwise, consider single-field indexes;

If the fields contained in a composite index also often appear alone in the WHERE clause, break it into multiple single-field indexes;

If the composite index contains more than 3 fields, carefully consider the necessity and consider reducing the number of composite fields;

If there are both single-field indexes and composite indexes on these fields, you can generally delete the composite index;

For tables that undergo frequent data operations, do not create too many indexes; delete useless indexes to avoid negative impact on the execution plan;

Each index created on the table will increase storage overhead, and the index will also increase processing overhead for insert, delete, and update operations.

In addition, too many compound indexes generally add no value when single-field indexes already exist; on the contrary, they slow down inserts and deletes, and the negative impact is even greater for frequently updated tables. Also, try not to index columns that contain a large number of duplicate values.
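The leftmost-prefix behavior behind several of the rules above can be sketched like this (table and column names hypothetical):

```sql
CREATE INDEX idx_status_created ON orders (status, created_at);

-- can use the index (leftmost prefix matched):
SELECT id FROM orders WHERE status = 2 AND created_at > '2024-01-01';
SELECT id FROM orders WHERE status = 2;

-- generally cannot use this index (leading column missing):
SELECT id FROM orders WHERE created_at > '2024-01-01';
```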

22. When writing SQL statements, the use of spaces should be minimized.

The query cache does not normalize whitespace. Therefore, minimize the use of spaces when writing SQL statements, especially leading and trailing spaces (the query cache does not automatically trim them).

23. Each table is set with an ID as its primary key.

We should set an ID as the primary key for each table in the database, preferably an INT type (UNSIGNED recommended) with the AUTO_INCREMENT flag set.
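A minimal table definition following this rule (table and column names hypothetical):

```sql
CREATE TABLE article (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    PRIMARY KEY (id)
);
```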

24. Use explain to analyze your SQL execution plan

(1) type

system: The table has only one row and is basically not used;

const: The table has at most one matching row; this is typically triggered when filtering on the primary key or a unique index;

eq_ref: For each combination of rows from the previous table, read one row from this table. This is probably the best join type, other than the const type;

ref: For each combination of rows from the previous table, all rows with matching index values will be read from this table;

range: Retrieve only a given range of rows, using an index to select rows. When using =, <>, >, >=, <, <=, IS NULL, <=>, BETWEEN or IN operators, you can use range when comparing key columns with constants;

index: This join type is the same as ALL, except that only the index tree is scanned. This is usually faster than ALL because index files are usually smaller than data files;

all: full table scan;

Performance ranking: system > const > eq_ref > ref > range > index > all. In practical SQL optimization, aim to reach at least the ref or range level.

(2) Commonly used Extra keywords

Using index: Only obtain information from the index tree, without the need to query back to the table;

Using where: A WHERE clause is used to restrict which rows are matched against the next table or sent to the client. Unless you specifically intend to fetch or examine all rows of the table, something may be wrong with the query if the Extra value is not Using where while the join type is ALL or index;

Using temporary: To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens when the query contains GROUP BY and ORDER BY clauses that list the columns differently;
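A typical EXPLAIN session might look like this (table, column, and index names hypothetical); the columns worth reading first are type, key, rows, and Extra:

```sql
EXPLAIN SELECT id FROM t WHERE num = 10;

-- with an index on num, expect something like:
--   type: ref, key: idx_num, Extra: Using index
-- without one, expect:
--   type: ALL (full table scan), rows: roughly the total row count
```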

25. Use LIMIT 1 when there is only one row of data. 

Sometimes when you query a table, you already know there will be at most one matching row, but the engine does not, so it may keep scanning (for example, to position a cursor or to count the records returned).

In this case, adding LIMIT 1 can increase performance.

In this way, the MySQL database engine will stop searching after finding a piece of data, instead of continuing to search for the next piece of data that matches the record.
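A tiny sketch (table and column names hypothetical):

```sql
-- username is known to be unique in practice, but not declared so;
-- LIMIT 1 lets the scan stop at the first match
SELECT id FROM user WHERE username = 'xx' LIMIT 1;
```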

26. Turn large DELETE, UPDATE, and INSERT queries into multiple small queries

Writing a single SQL statement dozens or hundreds of lines long may look impressive, but for better performance and finer control over the data, break it into multiple small queries.
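For large deletes, one common pattern is to work in chunks, repeating until no more rows are affected (the table, column, and cutoff below are hypothetical):

```sql
-- run repeatedly until it reports "0 rows affected";
-- each chunk is a short transaction, so locks and the undo log stay small
DELETE FROM login_log
WHERE created_at < '2020-01-01'
LIMIT 10000;
```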

27. Reasonably divide the tables and try to control the size of the data in a single table. It is recommended to control it within 5 million.

Five million rows is not a hard limit for MySQL, but beyond that, modifying the table structure, backups, and recovery all become significantly harder.

You can use historical data archiving (applied to log data), sub-database sub-table (applied to business data) and other means to control the amount of data.

Author: Jingdong Technology Liang Fawen

Source: JD Cloud Developer Community. Please indicate the source when reprinting.
