Practical tips! SQL performance optimization and writing high-quality SQL statements

When we write SQL statements, we usually care about whether they return the right results. But do we really pay attention to how efficiently they execute, and to whether they are written in a disciplined way?

The following is a summary of practical tips from real-world development. I hope it helps!

1. LIMIT pagination optimization

When the offset is very large, LIMIT becomes very inefficient.

SELECT id FROM A LIMIT 1000,10    -- fast

SELECT id FROM A LIMIT 90000,10   -- very slow

Option one:

select id from A order by id limit 90000,10;

Combined with ORDER BY, this is fast: about 0.04 seconds, because the index on the primary key id is used. Whether an index can be used still depends on the business logic, of course; this is just a reminder to take care when paginating.

Option two:

select id from A where id between 90001 and 90010 order by id;

This returns the same page as the LIMIT query only when the id values are contiguous and start at 1.
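
Here is a minimal sketch of the two options, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; timings and query plans differ between engines). The table contents and the helper names page_by_offset and page_by_range are made up for illustration. It only checks which rows come back: the id-range form matches the LIMIT page while the ids have no gaps, and stops matching once a row is deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO A (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 1001)],  # ids 1..1000, no gaps
)


def page_by_offset(offset, size):
    """Option one: ORDER BY the primary key, then skip `offset` rows."""
    cur = conn.execute(
        "SELECT id FROM A ORDER BY id LIMIT ? OFFSET ?", (size, offset)
    )
    return [row[0] for row in cur]


def page_by_range(offset, size):
    """Option two: jump straight to an id range (assumes ids are 1..N, no gaps)."""
    cur = conn.execute(
        "SELECT id FROM A WHERE id BETWEEN ? AND ? ORDER BY id",
        (offset + 1, offset + size),
    )
    return [row[0] for row in cur]


same_without_gaps = page_by_offset(900, 10) == page_by_range(900, 10)

conn.execute("DELETE FROM A WHERE id = 5")  # introduce a gap in the ids
same_with_gap = page_by_offset(900, 10) == page_by_range(900, 10)

print(same_without_gaps, same_with_gap)
```

So option two is only safe when the business logic guarantees gap-free ids, or when a shifted page is acceptable.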

2. Use LIMIT 1 or TOP 1 to fetch a single row

Some business queries (for example, sorting by a field in descending order and taking the maximum) can use LIMIT 1 or TOP 1 to stop the database from scanning the rest of the table or index.

Counterexample

SELECT id FROM A WHERE name LIKE 'abc%'

Positive example

SELECT id FROM A WHERE name LIKE 'abc%' LIMIT 1

3. Never use select * from table under any circumstances. Replace "*" with a specific list of columns, do not return columns you do not use, and avoid full scans!

4. Batch insert optimization

Counterexample

INSERT into person(name,age) values('A',24);
INSERT into person(name,age) values('B',24);
INSERT into person(name,age) values('C',24);

Positive example

INSERT into person(name,age) values('A',24),('B',24),('C',24);
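
From application code, the same idea might look like the sketch below. It uses Python's built-in sqlite3 module in place of MySQL (an assumption), and insert_batch is a hypothetical helper name. It builds one multi-row INSERT with placeholders instead of sending one statement per row.

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert all (name, age) rows with a single multi-row INSERT.

    `rows` must be a non-empty list of 2-tuples.
    """
    placeholders = ", ".join(["(?, ?)"] * len(rows))
    params = [value for row in rows for value in row]  # flatten the tuples
    conn.execute(
        f"INSERT INTO person (name, age) VALUES {placeholders}", params
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
insert_batch(conn, [("A", 24), ("B", 24), ("C", 24)])

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(row_count)
```

Databases cap the size of a single statement, so very large batches are usually split into chunks.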

Optimizing SQL statements comes down to using indexes properly. A mistake we often make in development is to scan a whole table, which both hurts performance and wastes time!

5. LIKE statement optimization

Counterexample

SELECT id FROM A WHERE name like '%abc%'

Because "abc" is preceded by "%", the query has to scan the whole table. Unless it is really necessary (the fuzzy query must match abc anywhere in the value), do not put % in front of the keyword.

Positive example

SELECT id FROM A WHERE name like 'abc%'
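
One way to see the difference is to ask the database for its query plan. The sketch below does this with Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN, as a stand-in for MySQL's EXPLAIN (an assumption; the plan text differs between engines and versions). The index name idx_a_name and the helper plan are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT)")
# SQLite's LIKE is case-insensitive by default, so the index needs
# NOCASE collation before it can serve a prefix search.
conn.execute("CREATE INDEX idx_a_name ON A (name COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO A (name) VALUES (?)", [("abc1",), ("xabcx",), ("abd",)]
)


def plan(sql):
    """Return the human-readable query plan SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail text is the last column


prefix_plan = plan("SELECT id FROM A WHERE name LIKE 'abc%'")
wildcard_plan = plan("SELECT id FROM A WHERE name LIKE '%abc%'")

print(prefix_plan)    # expected to be an index SEARCH on idx_a_name
print(wildcard_plan)  # expected to be a full SCAN
```

The prefix pattern can be turned into a range on the index, while the leading "%" leaves the engine nothing to seek on.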

6. Optimizing IN and NOT IN in the WHERE clause

Use IN and NOT IN with caution in SQL statements! With IN or NOT IN the index may be abandoned, resulting in a full table scan.

Option one: replace IN with BETWEEN

Counterexample

SELECT id FROM A WHERE num in (1,2,3) 

Positive example

SELECT id FROM A WHERE num between 1 and 3

Option two: replace IN with EXISTS

Note: there is a bonus section on EXISTS versus IN at the end of the article.

Counterexample

SELECT id FROM A WHERE num in (select num from B)

Positive example

SELECT id FROM A WHERE exists (select 1 from B where B.num = A.num)

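Option three: replace IN with LEFT JOIN

Counterexample

SELECT id FROM A WHERE num in (select num from B)

Positive example

SELECT A.id FROM A LEFT JOIN B ON A.num = B.num WHERE B.num IS NOT NULL

This matches the IN version only when B.num is unique; otherwise rows of A can appear more than once.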

7. Optimizing OR in the WHERE clause

Replacing OR with UNION or UNION ALL usually gives better results. When the WHERE clause uses the OR keyword, the index may be abandoned.

Counterexample

SELECT id FROM A WHERE num = 10 or num = 20

Positive example

SELECT id FROM A WHERE num = 10 union all SELECT id FROM A WHERE num=20

8. Optimizing IS NULL and IS NOT NULL in the WHERE clause

Counterexample

SELECT id FROM A WHERE num IS NULL

Testing with IS NULL or IS NOT NULL in the WHERE clause can cause the index to be abandoned and the whole table to be scanned.

Positive example

Optimize by giving num a default value of 0 so that the num column never contains NULL. IS NULL is used heavily in SQL for real business scenarios, so take care to avoid full table scans.

SELECT id FROM A WHERE num=0

9. Optimizing expressions on columns in the WHERE clause

Do not apply functions, arithmetic operations, or other expressions to the column on the left side of "=" in the WHERE clause; otherwise the system may not be able to use the index properly.

  • 1
SELECT id FROM A WHERE datediff(day,createdate,'2019-11-30')=0

Optimized to:

SELECT id FROM A WHERE createdate>='2019-11-30' and createdate<'2019-12-01'

  • 2
SELECT id FROM A WHERE year(addate) <2020

Optimized to:

SELECT id FROM A where addate<'2020-01-01'
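
The effect of wrapping a column in a function can be checked with a query plan. This is a rough sketch using Python's built-in sqlite3 module instead of MySQL (an assumption): SQLite has no year() function, so strftime('%Y', ...) plays that role here, and dates are stored as 'YYYY-MM-DD' text. The index name idx_a_addate and the helpers plan and ids are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, addate TEXT)")
conn.execute("CREATE INDEX idx_a_addate ON A (addate)")
conn.executemany(
    "INSERT INTO A (addate) VALUES (?)",
    [("2018-05-05",), ("2019-11-30",), ("2020-03-01",)],
)


def plan(sql):
    """Return the human-readable query plan SQLite reports for `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail text is the last column


def ids(sql):
    """Run `sql` and return the selected ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# Function applied to the column: the index on addate cannot be used to seek.
wrapped = "SELECT id FROM A WHERE strftime('%Y', addate) < '2020'"
# Bare column compared with a constant: the index can be used as a range.
bare = "SELECT id FROM A WHERE addate < '2020-01-01'"

wrapped_plan = plan(wrapped)
bare_plan = plan(bare)
same_rows = ids(wrapped) == ids(bare)

print(wrapped_plan)
print(bare_plan)
print(same_rows)
```

Both queries return the same rows; only the rewritten one lets the engine seek on the index.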

10. Sorting and index usage

A MySQL query generally uses only one index per table, so if the WHERE clause has already used an index, the columns in ORDER BY will not use one. So do not add a sort operation when the database's default order already meets the requirement.

Try not to sort on multiple columns; if you must, it is best to create a composite index on those columns.

11. Use UNION ALL instead of UNION where possible

The main difference between UNION and UNION ALL is that UNION has to merge the two (or more) result sets and then filter them for uniqueness. That involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when we know the result sets cannot contain duplicates, or we do not care about duplicates, use UNION ALL instead of UNION.
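
A small sketch of the difference in results, again using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption). The tables A and B and their values are made up; the value 3 appears in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (num INTEGER)")
conn.execute("CREATE TABLE B (num INTEGER)")
conn.executemany("INSERT INTO A (num) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO B (num) VALUES (?)", [(3,), (4,)])

# UNION removes duplicates across the combined result set.
union_rows = conn.execute(
    "SELECT num FROM A UNION SELECT num FROM B"
).fetchall()

# UNION ALL simply concatenates the two result sets.
union_all_rows = conn.execute(
    "SELECT num FROM A UNION ALL SELECT num FROM B"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

The extra row in the UNION ALL result is the duplicate 3, which UNION had to find and drop.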

12. INNER JOIN, LEFT JOIN, RIGHT JOIN, and subqueries

  • First: an inner join is also called an equi-join; left and right joins are outer joins.
SELECT A.id,A.name,B.id,B.name FROM A LEFT JOIN B ON A.id = B.id;

SELECT A.id,A.name,B.id,B.name FROM A RIGHT JOIN B ON A.id = B.id;

SELECT A.id,A.name,B.id,B.name FROM A INNER JOIN B ON A.id = B.id;

It has been confirmed in many ways that inner join performs faster, because an inner join is an equi-join and may return relatively few rows. But remember that some statements use an equi-join implicitly, such as:

SELECT A.id,A.name,B.id,B.name FROM A,B WHERE A.id = B.id;

Recommendation: wherever an inner join will do, use an inner join.

  • Second: subqueries perform worse than outer joins, so replace subqueries with outer joins wherever possible.

Counterexample

MySQL first performs a full table scan on the outer table A and then runs the subquery once per row based on uuid. If the outer table is large, query performance will be even worse than in this example.

SELECT * FROM A WHERE EXISTS (SELECT * FROM B WHERE id>=3000 AND A.uuid=B.uuid);

Execution time: 2s

Positive example

SELECT * FROM A INNER JOIN B ON A.uuid=B.uuid WHERE B.uuid>=3000;

Execution time: less than 1s

  • Third: when using JOIN, the small result set should drive the large one

With left join, keep the result set of the left table as small as possible; if there are filter conditions, apply them to the left side first. The reverse applies to right join. For example:

Counterexample

SELECT * FROM A LEFT JOIN B ON A.id=B.ref_id WHERE A.id>10

Positive example

SELECT * FROM (SELECT * FROM A WHERE id>10) T1 LEFT JOIN B ON T1.id=B.ref_id;

13. EXISTS instead of IN

Counterexample

SELECT * from A WHERE id in ( SELECT id from B )

Positive example

SELECT * from A WHERE EXISTS ( SELECT 1 from B WHERE A.id = B.id )

Analysis:

IN does its comparison by traversing in memory.

EXISTS has to query the database, so when table B holds a large amount of data, EXISTS is more efficient than IN.

IN () is executed only once. It caches all the id values of table B, then checks whether each id in table A equals an id in table B. If it does, the record from table A is added to the result set. This continues until every record in table A has been traversed.

The execution flow of IN () is like the following code:

    // Pseudocode: how IN () is evaluated
    List resultSet = {};

    Array A = (select * from A);
    Array B = (select id from B);   // executed once, then cached

    for (int i = 0; i < A.length; i++) {
        for (int j = 0; j < B.length; j++) {
            if (A[i].id == B[j].id) {
                resultSet.add(A[i]);
                break;
            }
        }
    }
    return resultSet;

As can be seen, IN () is not suitable when table B holds a lot of data, because it traverses all of table B's data.

For example: table A has 10,000 records and table B has 1,000,000 records, so there may be up to 10,000 * 1,000,000 comparisons, which is very inefficient.

Another example: table A has 10,000 records and table B has 100 records, so there are at most 10,000 * 100 comparisons. The number of comparisons drops sharply and efficiency improves greatly.

Conclusion: IN () suits the case where table B holds less data than table A.

EXISTS () is executed A.length times. The execution flow is like the following code:

    // Pseudocode: how EXISTS () is evaluated
    List resultSet = {};
    Array A = (select * from A);
    for (int i = 0; i < A.length; i++) {
        // run "select 1 from B where B.id = A.id" and check whether a row comes back
        if (exists(A[i].id)) {
            resultSet.add(A[i]);
        }
    }
    return resultSet;

When table B holds more data than table A, EXISTS () is the better fit, because it does not need that much traversal; it only needs to run one more query each time.

For example: table A has 10,000 records and table B has 1,000,000 records, so EXISTS () is executed 10,000 times to check whether each id in table A equals an id in table B.

For example: table A has 10,000 records and table B has 100,000,000 records. EXISTS () is still executed 10,000 times, because it only runs A.length times. So the more data table B holds, the more EXISTS () pays off.

Another example: table A has 10,000 records and table B has 100 records. EXISTS () is still executed 10,000 times, so it is better to use IN () and traverse 10,000 * 100 times, because IN () traverses in memory while EXISTS () has to query the database.

As we all know, querying the database costs more, while memory is comparatively fast.

Conclusion: EXISTS () suits the case where table B holds more data than table A.
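
The two pseudocode flows above can be turned into a small counting experiment. This is only a sketch of the article's simplified model, not of how a real optimizer works: in_style counts in-memory comparisons, and exists_style counts one lookup per row of A, with a Python set standing in for an indexed query against B (an assumption). All names and sizes are made up for illustration.

```python
def in_style(a_ids, b_ids):
    """Model of IN (): compare each A id against the cached B ids."""
    comparisons = 0
    result = []
    for a in a_ids:
        for b in b_ids:
            comparisons += 1
            if a == b:
                result.append(a)
                break  # stop at the first match, as in the pseudocode
    return result, comparisons


def exists_style(a_ids, b_index):
    """Model of EXISTS (): one lookup against B per row of A."""
    lookups = 0
    result = []
    for a in a_ids:
        lookups += 1  # stands for "select 1 from B where B.id = A.id"
        if a in b_index:
            result.append(a)
    return result, lookups


a_ids = list(range(1, 101))    # table A: 100 rows
b_ids = list(range(51, 1051))  # table B: 1000 rows, overlapping A on 51..100

in_result, comparisons = in_style(a_ids, b_ids)
exists_result, lookups = exists_style(a_ids, set(b_ids))

print(in_result == exists_result)
print(comparisons, lookups)
```

With B ten times larger than A, the IN model performs tens of thousands of comparisons while the EXISTS model performs one lookup per row of A. Whether that wins in practice depends on how expensive each lookup is, which is exactly the trade-off described above.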


Origin juejin.im/post/5e0f5eec5188253a9d4a436f