Performance Optimization of SQL Statements

This article covers 52 strategies for optimizing the performance of SQL statements.

1. To optimize the query, you should try to avoid full table scanning. First, consider building indexes on the columns involved in WHERE and ORDER BY.


2. Try to avoid testing fields for NULL in the WHERE clause. NULL is the default for columns when a table is created, but most of the time you should declare columns NOT NULL, or use a special value such as 0 or -1 as the default.
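
As a sketch, assuming a hypothetical table t whose num column is declared NOT NULL DEFAULT 0, the NULL test can be rewritten:

```sql
-- Inefficient: an IS NULL test may prevent the index on num from being used
select id from t where num is null;

-- With num declared NOT NULL DEFAULT 0, query the sentinel value instead:
select id from t where num = 0;
```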


3. Try to avoid using != or <> operators in the WHERE clause. MySQL only uses indexes for the following operators: <, <=, =, >, >=, BETWEEN, IN, and sometimes LIKE.


4. Try to avoid using OR in the WHERE clause to connect conditions; otherwise the engine may give up using the index and perform a full table scan. You can use UNION to combine the queries:

	select id from t where num = 10 
	union all select id from t where num = 20

5. IN and NOT IN should also be used with caution; otherwise they may cause a full table scan. For continuous values, use BETWEEN instead of IN:

	select id from t where num between 1 and 3

6. The following queries will also result in a full table scan, because a leading wildcard prevents index use. To improve efficiency, consider full-text search.

	select id from t where name like '%abc%'
	select id from t where name like '%abc'

In the following case, the index can be used:

	select id from t where name like 'abc%'

7. Using a variable as a parameter in the WHERE clause can also cause a full table scan, because the optimizer cannot resolve the variable's value until run time and so cannot choose an index-based plan for it.
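
A SQL Server-flavored sketch of the problem, against a hypothetical table t:

```sql
-- The variable's value is unknown when the plan is compiled, so the
-- optimizer may not pick the index on num and fall back to a table scan:
declare @num int;
set @num = 100;
select id from t where num = @num;

-- One workaround (SQL Server) is to force recompilation at run time,
-- when the actual value is known:
select id from t where num = @num option (recompile);
```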


8. Try to avoid performing expression or function operations on fields in the WHERE clause, as this prevents the index on the field from being used.


9. In many cases, using EXISTS instead of IN is a good choice:

	select num from a where num in(select num from b)

Replace it with the following statement:

	select num from a where exists(select 1 from b where num = a.num)

10. Although an index can improve the efficiency of the corresponding SELECT, it also reduces the efficiency of INSERT and UPDATE, because the index may have to be maintained or rebuilt on each INSERT and UPDATE. How to build indexes therefore needs careful consideration, depending on the specific situation. It is best not to have more than 6 indexes on a table; if there are more, consider whether the indexes on infrequently used columns are really necessary.


11. Avoid updating clustered index data columns as much as possible, because the order of clustered index data columns is the physical storage order of table records. Once the value of this column is changed, the order of the entire table records will be adjusted, which will consume considerable resources. If the application system needs to update the clustered index data columns frequently, then it needs to consider whether the index should be built as a clustered index.


12. Try to use numeric fields. If a field contains only numeric information, try not to design it as a character type; this would reduce query and join performance and increase storage overhead.


13. Use varchar and nvarchar instead of char and nchar where possible. First, variable-length fields take less storage space; second, for queries, searching within a smaller field is obviously more efficient.


14. It is best not to return all columns:

	select * from t

Replace " * " with a list of specific fields, and don't return any fields that are not used.


15. Try to avoid returning a large amount of data to the client. If the amount of data is too large, you should consider whether the corresponding demand is reasonable.


16. Use the alias of the table (Alias): When connecting multiple tables in the SQL statement, please use the alias of the table and prefix the alias with each Column. This reduces parsing time and reduces syntax errors caused by Column ambiguity.


17. Use a "temporary table" to hold intermediate results:
An important way to simplify SQL statements is to use temporary tables to hold intermediate results, but the benefits go beyond simplification. With intermediate results stored in a temporary table, subsequent queries run against tempdb. This avoids scanning the main table multiple times during program execution, and it greatly reduces blocking from "shared locks" and "update locks", improving concurrency.
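
A SQL Server-style sketch, using a hypothetical orders table:

```sql
-- Materialize the intermediate result once in tempdb...
select customer_id, sum(amount) as total
into #totals
from orders
group by customer_id;

-- ...then query the small temporary table instead of rescanning orders
select customer_id, total from #totals where total > 1000;

drop table #totals;
```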


18. NOLOCK can be added to some SQL query statements to improve concurrency, because reads and writes otherwise block each other. For some queries, NOLOCK allows reading while writing is in progress; the disadvantage is that uncommitted dirty data may be read.
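
For example, as a SQL Server table hint (dirty reads are the trade-off), against a hypothetical table t:

```sql
-- Read without taking shared locks; writers are not blocked,
-- but uncommitted (dirty) rows may be returned
select id, num from t with (nolock) where num = 10;
```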


19. Common simplification rules are as follows:
Have no more than 5 table joins (JOIN), and consider using temporary tables or table variables to store intermediate results. Use subqueries sparingly, and do not nest views too deeply; generally, no more than 2 levels of nesting is best.


20. Pre-calculate results that will be queried and store them in a table, then SELECT them at query time. This was the most important method before SQL Server 7.0, for example in the calculation of hospitalization fees in hospitals.


21. An OR clause can be decomposed into multiple queries connected with UNION. Their speed depends only on whether indexes are used. If the query needs a composite index, UNION ALL is more efficient. When multiple OR conditions prevent index use, rewrite them as a UNION so each branch can match an index. The key question is whether indexes are used.


22. In the list of values after IN, put the most frequent value first and the least frequent value last, to reduce the number of comparisons.


23. Try to put data processing on the server to reduce network overhead, for example by using stored procedures.
A stored procedure is SQL that has been compiled, optimized, organized into an execution plan, and stored in the database; it is a collection of control-flow statements, so it naturally runs fast. Dynamic SQL that is executed repeatedly can use a temporary stored procedure, which is placed in tempdb.


24. When the server has enough memory, configure the number of threads to the maximum number of connections + 5 to maximize efficiency. Otherwise, configure the number of threads below the maximum number of connections and let SQL Server's thread pool handle the load; if the thread count were still set to the maximum number of connections + 5, server performance would be seriously damaged.


25. Keep the written order of join conditions consistent with the query association:

	select a.personMemberID, * from a, b 
	where a.personMemberID = b.referenceid and a.personMemberID = 'JCNPRH39681' 
	-- (A = B, B = 'number')

	select a.personMemberID, * from a, b 
	where a.personMemberID = b.referenceid and a.personMemberID = 'JCNPRH39681' 
	and b.referenceid = 'JCNPRH39681' 
	-- (A = B, B = 'number', A = 'number')

	select a.personMemberID, * from a, b 
	where b.referenceid = 'JCNPRH39681' and a.personMemberID = 'JCNPRH39681' 
	-- (B = 'number', A = 'number')

26. Use EXISTS instead of SELECT COUNT(1) to check whether records exist. The COUNT function should only be used when you actually need to count all the rows in a table, and COUNT(1) is more efficient than COUNT(*).
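
A sketch against hypothetical customers and orders tables:

```sql
-- Inefficient: counts every matching row just to test existence
select count(1) from orders where customer_id = 100;

-- Better: EXISTS stops at the first matching row
select id from customers c
where exists (select 1 from orders o where o.customer_id = c.id);
```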
27. Try to use ">=" instead of ">".

28. Specifications for the use of indexes:

  • The creation of indexes should be considered in combination with the application. It is recommended that large OLTP tables have no more than 6 indexes;
  • Use indexed fields as query conditions as much as possible, especially the clustered index. If necessary, a specific index can be forced with index index_name;
  • Avoid performing table scan when querying large tables, and consider creating new indexes if necessary;
  • When using an index field as a condition, if the index is a joint index, the first field in the index must be used as the condition to ensure that the system uses the index, otherwise the index will not be used;
  • Pay attention to the maintenance of the index, rebuild the index periodically, and recompile the stored procedure.

29. The columns in the following SQL statements have proper indexes, but execution is still very slow:

	SELECT * FROM record WHERE substring(card_no, 1, 4) = '5378' -- 13 seconds
	SELECT * FROM record WHERE amount/30 < 1000 -- 11 seconds
	SELECT * FROM record WHERE convert(char(10), date, 112) = '19991201' -- 10 seconds

Analysis:

Any operation on a column in the WHERE clause is evaluated row by row at run time, so the query must do a table scan without using the index on that column.

If these results are available at query compilation time, they can be optimized by the SQL optimizer, using indexes, and avoiding table searches, so rewrite the SQL as follows:

	SELECT * FROM record WHERE card_no like '5378%' -- < 1 second
	SELECT * FROM record WHERE amount < 1000*30 -- < 1 second
	SELECT * FROM record WHERE date = '1999/12/01' -- < 1 second

30. When there is a batch of inserts or updates to be processed, use batch inserts or batch updates, and never update records one by one.


31. In all stored procedures, if plain SQL statements can do the job, I never use loops to implement them.

For example: to list every day of the previous month, I will use connect by to recursively query, and I will never use a loop from the first day of the previous month to the last day.
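
A hedged Oracle sketch of the CONNECT BY approach described above:

```sql
-- Every day of the previous month, generated set-based rather than in a loop
select trunc(add_months(sysdate, -1), 'MM') + level - 1 as day
from dual
connect by level <= last_day(add_months(sysdate, -1))
                    - trunc(add_months(sysdate, -1), 'MM') + 1;
```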


32. Choose the most efficient order of table names (valid only in the rule-based optimizer):
Oracle's parser processes the table names in the FROM clause from right to left, so the table written last in the FROM clause (the driving table) is processed first. When the FROM clause contains multiple tables, choose the table with the fewest records as the base table.

If there are more than 3 tables connected to the query, you need to select the intersection table as the base table, and the intersection table refers to the table that is referenced by other tables.


33. To improve the efficiency of the GROUP BY statement, you can filter out unnecessary records before the GROUP BY. The following two queries return the same results, but the second is significantly faster.

Inefficient:

	SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'PRESIDENT' OR JOB = 'MANAGER'

Efficient:

	SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'PRESIDENT' OR JOB = 'MANAGER' GROUP BY JOB

34. Write SQL statements in uppercase, because Oracle always parses the statement first, converting lowercase letters to uppercase, before executing it.


35. Use aliases. Aliasing is a practical technique in large databases: give table names and column names a one-letter alias in the query. The claim is that this can be 1.5 times faster than writing out full joined-table names.


36. Avoid deadlocks: always access tables in the same order in your stored procedures and triggers; keep transactions as short as possible and reduce the amount of data involved in each transaction; never wait for user input within a transaction.


37. Avoid using temporary tables unless necessary; use table variables instead. Most of the time (99%), table variables reside in memory and are therefore faster. Temporary tables reside in the tempdb database, so operations on them require cross-database communication and are naturally slower.
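
A SQL Server sketch, against a hypothetical table t:

```sql
-- A table variable instead of a temp table, for a small result set
declare @hot table (id int primary key, num int);

insert into @hot
select id, num from t where num > 100;

select id, num from @hot;
```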


38. It is best not to use triggers:

  • Triggering a trigger and executing a trigger event itself is a resource-consuming process;
  • If it can be implemented using constraints, try not to use triggers;
  • Do not use the same trigger for different trigger events (Insert, Update, Delete);
  • Do not use transactional code in triggers

39. Index creation rules:

  • The primary key and foreign key of the table must have an index;
  • Tables with more than 300 rows should have indexes;
  • For tables that are often joined with other tables, indexes should be established on the join fields;
  • The fields that often appear in the WHERE clause, especially the fields of large tables, should be indexed;
  • Indexes should be built on highly selective fields;
  • Indexes should be built on small fields, and do not build indexes for large text fields or even super-long fields;
  • The establishment of a composite index needs to be carefully analyzed, and try to use a single-field index instead;
  • Correctly select the main column field in the composite index, which is generally a field with better selectivity;
  • Do several fields of a compound index often appear in the WHERE clause in an AND manner at the same time? Are there little to no single-field queries? If yes, you can build a composite index; otherwise, consider a single-field index;
  • If the fields contained in the composite index often appear alone in the WHERE clause, it is decomposed into multiple single-field indexes;
  • If the composite index contains more than 3 fields, carefully consider its necessity and consider reducing the composite fields;
  • If there is both a single-field index and a compound index on these fields, the compound index can generally be deleted;
  • For tables with frequent data operations, do not create too many indexes;
  • Delete useless indexes to avoid negative impact on the execution plan;
  • Each index created on a table increases storage overhead, and indexes also increase processing overhead for insert, delete, and update operations. In addition, too many composite indexes are generally useless when single-field indexes already exist; on the contrary, they reduce the performance of data insertion and deletion, and for frequently updated tables the negative impact is even greater.
  • Try not to index a field in the database that contains a large number of duplicate values.

40. MySQL query optimization summary:

Use the slow query log to find slow queries, use the execution plan to determine whether the query is running properly, and always test your queries to see if they are running optimally.

Performance will change over time. Avoid using COUNT(*) on an entire table; it may lock the whole table. Keep queries consistent so that subsequent similar queries can use the query cache. Use GROUP BY instead of DISTINCT when appropriate. Use indexed columns in WHERE, GROUP BY, and ORDER BY clauses. Keep indexes simple, and don't include the same column in multiple indexes.

Sometimes MySQL will use the wrong index; in that case, use USE INDEX. Check for problems when running with SQL_MODE = STRICT. For an indexed field with fewer than 5 distinct values, use LIMIT instead of OR in a UNION.

To avoid a SELECT before an update, use INSERT ... ON DUPLICATE KEY UPDATE or INSERT IGNORE; do not implement it with UPDATE, and do not use MAX. Note that ORDER BY with LIMIT M, N can actually slow a query down in some cases, even on indexed fields; use it sparingly. Use UNION in the WHERE clause instead of a subquery. After restarting MySQL, remember to warm up the database so that data is in memory and queries are fast. Consider persistent connections instead of multiple connections to reduce overhead.

Benchmark queries under the server's real load; sometimes a simple query can affect other queries. When load increases on the server, use SHOW PROCESSLIST to find slow and problematic queries. Test all suspicious queries in a development environment against mirrored production data.


41. MySQL backup process:

  • Back up from a secondary replica server;
  • Stop replication while backups are in progress, to avoid inconsistencies in data dependencies and foreign key constraints;
  • Completely stop MySQL and make the backup from the database files;
  • If you use mysqldump for backup, also back up the binary log files, to ensure replication is not interrupted;
  • Don't trust LVM snapshots; they are likely to create data inconsistencies that will cause trouble later;
  • For easier single-table recovery, export data table by table, if the data is isolated from other tables;
  • Use --opt when running mysqldump;
  • Check and optimize tables before backup;
  • For faster import, temporarily disable foreign key constraints during import;
  • For faster import, temporarily disable uniqueness checks during import;
  • Calculate the size of the database, tables, and indexes after each backup, to be able to monitor data growth;
  • Monitor replication instances for errors and latency with automated scheduled scripts;
  • Perform backups regularly.

42. The query cache does not normalize whitespace. Therefore, when writing SQL statements, minimize the use of extra spaces, especially at the beginning and end of the SQL, because the query cache does not strip leading and trailing spaces.


43. Is it appropriate to split a member table using mid as the sharding key?

In general business requirements, the username is basically the query key, so normally you should hash the username and take the modulus to split the table.
As for partitioning, MySQL's partition feature already does this and is transparent to the code; implementing it at the code level seems unreasonable.


44. Every table in the database should have an ID as its primary key, preferably an INT type (UNSIGNED is recommended) with the AUTO_INCREMENT flag set.
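
A minimal MySQL sketch (the table and column names are illustrative):

```sql
create table member (
    id   int unsigned not null auto_increment,
    name varchar(64)  not null,
    primary key (id)
) engine = InnoDB;
```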


45. Use SET NOCOUNT ON at the beginning and SET NOCOUNT OFF at the end of all stored procedures and triggers. There is no need to send a DONE_IN_PROC message to the client after every statement in a stored procedure or trigger.


46. MySQL queries can use the high-speed query cache. This is one of the effective MySQL tuning methods for improving database performance: when the same query is executed multiple times, fetching the result from the cache is much faster than reading it from the database again.


47. Use EXPLAIN SELECT to inspect how a query runs:

The EXPLAIN keyword shows you how MySQL processes your SQL statement. It can help you analyze performance bottlenecks in a query or a table structure. EXPLAIN's output also tells you how indexes and primary keys are used, and how the table is searched and sorted.
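
For example, against a hypothetical table t:

```sql
explain select id from t where num = 10;

-- Columns worth checking in the output:
--   type           access type; ALL means a full table scan
--   possible_keys  indexes the optimizer considered
--   key            the index actually chosen
--   rows           estimated number of rows to examine
```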


48. When only one row of data is needed, use LIMIT 1:

When you query a table and already know there will be only one result, the engine may still keep scanning, fetching through a cursor or checking the count of matching records.

In this case, adding LIMIT 1 improves performance: the MySQL engine stops searching after finding the first matching row instead of continuing to look for the next match.
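
For example, assuming a hypothetical member table where username is unique:

```sql
-- username is known to be unique, so stop after the first match
select id from member where username = 'alice' limit 1;
```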


49. Select the appropriate storage engine for the table

  • MyISAM : the application is mainly read and insert operations with only a few updates and deletes, and the integrity and concurrency requirements of transactions are not very high.
  • InnoDB : transaction processing, with data consistency required under concurrency. Besides inserts and queries, there are many updates and deletes. (InnoDB effectively reduces locking caused by deletes and updates.)
    For transactional InnoDB tables, the main factor hurting speed is that AUTOCOMMIT is on by default and the program does not explicitly call BEGIN to start a transaction, so every insert is committed automatically, which seriously hurts speed. Call BEGIN before executing the SQL so that multiple statements form one transaction (even with autocommit enabled); this greatly improves performance.
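
A sketch of the explicit-transaction pattern, against a hypothetical table t:

```sql
-- Autocommit: each insert is its own transaction
insert into t (id, num) values (1, 10);

-- Explicit transaction: many inserts, one commit
begin;
insert into t (id, num) values (2, 20);
insert into t (id, num) values (3, 30);
commit;
```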

50. Optimize the data type of the table and select the appropriate data type:

Principle : Smaller is usually better, simple is good, all fields must have default values, try to avoid NULL.

For example: When designing a database table, use a smaller integer type as much as possible to occupy a smaller disk space. (mediumint is more suitable than int)

For time fields, compare datetime and timestamp: datetime occupies 8 bytes while timestamp occupies 4, half as much. Moreover, timestamp's range is 1970 to 2037, which is suitable for update times.

MySQL can well support the access of large amounts of data, but generally speaking, the smaller the table in the database, the faster the query executed on it. Therefore, when creating a table, in order to obtain better performance, we can set the width of the fields in the table as small as possible.

For example: when defining the zip code field, if it is set to char (255), it will obviously add unnecessary space to the database. Even using the varchar type is redundant, since char(6) will do the job just fine.

For some text fields, such as "province" or "gender", we can define them as ENUM type. Because in MySQL, ENUM type is treated as numerical data, and numerical data is processed much faster than text type.
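
A minimal MySQL sketch (table and values are illustrative):

```sql
-- ENUM stores the value as a small integer internally,
-- which is processed faster than text
create table person (
    name   varchar(64) not null,
    gender enum('male', 'female') not null
);
```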


51. String data types: understand the differences between char, varchar, and text, and choose accordingly. char is fixed-length, varchar is variable-length, and text is for large blocks of text.


52. Any operation on a column leads to a table scan; this includes database functions, calculation expressions, and so on. When querying, move the operation to the right-hand side of the equals sign as much as possible.


The law of good things: Everything will be a good thing in the end, if it is not a good thing, it means that it is not the end yet.

Origin blog.csdn.net/Cike___/article/details/113928791