MySQL statement optimization experience

SQL statement optimization and indexing

 

Note: Make good use of EXPLAIN to view the execution plan of a SQL statement

EXPLAIN select * from es_order o where EXISTS (select * from es_member m where o.member_id = m.member_id );

The EXPLAIN output shows that the subquery depends on the outer query: the outer query yields rows=2226, and for each of those rows the subquery is executed again as a new query.
So the query time is about 0.8s.

Using a join query instead:

EXPLAIN select * from es_order o left JOIN es_member m ON  m.member_id=o.member_id ;

Both tables are now read as simple queries with no dependent subquery, so the query time is about 0.004s.

   Reference: http://lopez.iteye.com/admin/blogs/2292546


 

1. Inner join, left join, right join, and subqueries

 

A. An inner join is usually written as an equijoin; left join and right join are outer joins.

 

SELECT A.id,A.name,B.id,B.name FROM A LEFT JOIN B ON A.id =B.id;

 

SELECT A.id,A.name,B.id,B.name FROM A RIGHT JOIN B ON A.id = B.id;

 

SELECT A.id,A.name,B.id,B.name FROM A INNER JOIN B ON A.id = B.id;

 

Various sources indicate that inner join tends to be faster, because it is an equijoin and may return fewer rows. But remember that some statements use an equijoin implicitly, such as:

 

SELECT A.id,A.name,B.id,B.name FROM A,B WHERE A.id = B.id;

 

Recommendation: when a query can be expressed as an inner join, use inner join.

 

b. Subqueries tend to be slower than joins. Try to replace subqueries with joins.

 

  Select * from A where exists (select * from B where id>=3000 and A.uuid=B.uuid);

 

Table A holds on the order of 100,000 rows and table B on the order of a million. The query takes about 2 seconds on a local machine. EXPLAIN shows that the subquery is a correlated subquery (DEPENDENT SUBQUERY): MySQL first does a full scan of the outer table A and then executes the subquery once per row, by uuid. If the outer table is large, query performance will be even worse than this.

 

  A simple optimization is to replace the subquery with an inner join, changing the statement to:

 

   Select * from A inner join B using(uuid) where B.id>=3000;

 

  In testing, this statement executes in less than a second.

 

c. When using ON and WHERE together, remember the order in which they are applied, for example:

 

SELECT A.id,A.name,B.id,B.name FROM A LEFT JOIN B ON A.id =B.id WHERE B.NAME='XXX';

 

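The ON condition is applied first, while the join is being built, and decides which rows of table B are matched. WHERE then filters the rows produced by the join. In this example, WHERE B.NAME='XXX' also discards the rows of A that have no match in B, because B.NAME is NULL for those rows.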

 

But here's a reminder: in a left join, the conditions in ON can only filter rows of table B; every row of table A is still returned. Such as:

 

SELECT A.id,A.name,B.id,B.name FROM A LEFT JOIN B ON A.id =B.id;

 

Every record of table A is returned (once per matching row of B, or once with NULLs when nothing matches); the condition after ON only filters the records of table B, whereas

 

SELECT A.id,A.name,B.id,B.name FROM A ,B WHERE A.id = B.id

 

returns only the rows of the Cartesian product that satisfy the condition A.id = B.id.
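The difference between a condition in ON and the same condition in WHERE can be checked on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example runs on its own; the LEFT JOIN semantics shown are standard SQL). The tables A and B and their rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO B VALUES (1, 'XXX'), (2, 'other');
""")

# Condition in ON: it only decides which B rows match; every A row is kept.
on_rows = conn.execute("""
    SELECT A.id, B.name FROM A
    LEFT JOIN B ON A.id = B.id AND B.name = 'XXX'
    ORDER BY A.id
""").fetchall()

# Condition in WHERE: it filters the joined rows, dropping the NULL-filled ones.
where_rows = conn.execute("""
    SELECT A.id, B.name FROM A
    LEFT JOIN B ON A.id = B.id
    WHERE B.name = 'XXX'
    ORDER BY A.id
""").fetchall()

print(on_rows)     # [(1, 'XXX'), (2, None), (3, None)]
print(where_rows)  # [(1, 'XXX')]
```

With the condition in ON, all three rows of A survive; with the condition in WHERE, the left join behaves like an inner join.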

 

D. When using JOIN, let the small result set drive the large one: for a left join the left table's result should be as small as possible, and any condition that can be applied to the left table should be applied there first (for a right join it is the reverse). Also consider splitting a large multi-table query into several simpler queries, since multi-table queries are inefficient and prone to locking tables and blocking. Such as:

 

Select * from A left join B on A.id=B.ref_id where B.ref_id>10;

 

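It can be optimized to: select * from (select * from A where id>10) T1 left join B on T1.id=B.ref_id; (Note that this form also returns the rows of A with id>10 that have no match in B; if those are not wanted, use an inner join.)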

 

2. Build indexes to speed up query performance.

 

A. When building a composite index, if a field used in the where condition is part of the composite index, it is best to put that field at the leftmost position of the index, so that the index can be used and the query is faster.

 

b. Make sure the columns used in a join have the same type: the fields that relate table A and table B must be of the same type, and both must be indexed, so that both tables can use an index. If the types differ, at least one table cannot use its index.

 

c. Indexes can be created not only on primary and unique keys but on any other column, and a LIKE condition on an indexed column can use the index.

 

For example: select * from A where name like 'xxx%';

 

This SQL will use the index on name (provided that an index exists on name), while the following statement will not use the index:

 

Select * from A where name like '%xxx';

 

Because '%' matches any sequence of characters, a pattern with a leading wildcard such as '%xxx' gives the index no prefix to search on, so the index cannot be used.

 

D. Compound Index

 

For example, there is a statement like this: select * from users where area='beijing' and age=22;

 

If we create separate indexes on area and age, MySQL will generally use only one of them for this query. That is already much more efficient than a full table scan without indexes, but a composite index on the two columns (area, age) is more efficient still. If we create a composite index on (area, age, salary), it effectively provides three indexes: (area, age, salary), (area, age) and (area). This is known as the leftmost prefix property. Therefore, when creating a composite index, put the column most often used as a constraint leftmost, followed by the others in decreasing order of use.
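The leftmost prefix behaviour can be observed from a query plan. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN instead of MySQL's EXPLAIN, and the index name idx_area_age_salary and the extra name column are hypothetical. The plan text differs between engines, but the same rule applies.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); table layout is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT,
                        area TEXT, age INTEGER, salary INTEGER);
    CREATE INDEX idx_area_age_salary ON users (area, age, salary);
""")

def plan(sql):
    """Return the query plan text SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

# Constrains the leftmost columns (area, age): the composite index is usable.
prefix_plan = plan("SELECT * FROM users WHERE area = 'beijing' AND age = 22")

# Skips the leftmost column: the composite index cannot be searched.
no_prefix_plan = plan("SELECT * FROM users WHERE salary = 5000")

print(prefix_plan)
print(no_prefix_plan)
```

The first plan should mention the composite index, while the second falls back to scanning the table.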

 

E. Avoid NULL values in indexed columns

 

It is often said that a column containing NULL values is not included in the index. For MySQL this is not accurate: InnoDB and MyISAM indexes do store NULL values, and a condition such as col IS NULL can use an index. Nullable columns still add overhead, because every value has to be tested for NULL and extra storage is needed, and they complicate comparisons on the column. So when designing the database, declare fields NOT NULL where possible instead of letting the default value be NULL.

 

F. Use short indexes

 

When indexing a string column, specify a prefix length if possible. For example, if you have a CHAR(255) column and most values are unique within the first 10 or 20 characters, do not index the entire column. Short indexes not only improve query speed but also save disk space and I/O operations.

 

g. Indexes and sorting

 

A MySQL query generally uses only one index per table, so if the where clause already uses an index, the column in the order by will not use another one. Therefore, do not sort at all when the default order returned by the database already meets the requirement, and avoid sorting on multiple columns; if that is necessary, it is best to create a composite index on those columns.

 

3. Optimize limit paging on tables with tens of millions of rows.

 

A. We usually use limit, such as:

 

Select * from A order by id limit 1,10;

 

When the table holds little data, this causes no performance problem. But once the offset reaches tens of millions, such as:

 

Select * from A order by id limit 10000000,10;

 

Although only 10 records are returned, the performance is unacceptable, because MySQL has to read and discard the first 10,000,000 rows. This is also why persistence layer frameworks such as Hibernate and iBATIS run into performance problems on large tables, unless the framework optimizes paging for such tables.

 

b. In the above situation, we can use another statement to optimize, such as:

 

Select * from A where id>=(Select id from A order by id limit 10000000,1) order by id limit 10;

 

It is indeed much faster, provided the id field is indexed. This may still not be optimal; in fact, it can also be written like this:

 

Select * from A where id between 10000000 and 10000010;

 

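This is more efficient, but it only returns the same page of rows when the id values are consecutive with no gaps (note also that between includes both end values).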

 

c. Limit optimization:
The SQL issued by a scheduled task looks like this:
select * from table where status=0 limit 29800,200;
select * from table where status=0 limit 30000,200;
select * from table where status=0 limit 30200,200;
Limit paging at this depth is very slow.
Tuning method: filter on id instead of using a growing offset, so that the offset stays at 0 (here 200 and 400 stand for the largest id returned by the previous batch):
select * from table where status=0 and id>0 order by id limit 0,200;
select * from table where status=0 and id>200 order by id limit 0,200;
select * from table where status=0 and id>400 order by id limit 0,200;
After tuning, the effect is very significant, and the SQL executes with almost no delay.
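The tuning method above, remembering the last id seen instead of increasing the offset, can be sketched as a small loop. This is a minimal illustration, not the author's code: it uses Python's built-in sqlite3 module in place of MySQL, and the table name tasks, the helper names and the generated data are hypothetical.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status INTEGER)")
# 1000 rows; every third row is already processed (status = 1).
conn.executemany(
    "INSERT INTO tasks (id, status) VALUES (?, ?)",
    [(i, 1 if i % 3 == 0 else 0) for i in range(1, 1001)],
)

def fetch_batch(last_id, size):
    """Return the next `size` pending rows whose id is greater than last_id."""
    return conn.execute(
        "SELECT id, status FROM tasks "
        "WHERE status = 0 AND id > ? ORDER BY id LIMIT ?",
        (last_id, size),
    ).fetchall()

def offset_batch(offset, size):
    """The offset-based form, kept only to compare results."""
    return conn.execute(
        "SELECT id, status FROM tasks "
        "WHERE status = 0 ORDER BY id LIMIT ? OFFSET ?",
        (size, offset),
    ).fetchall()

# Walk the table batch by batch, remembering the last id that was seen.
seen, last_id = [], 0
while True:
    batch = fetch_batch(last_id, 200)
    if not batch:
        break
    seen.extend(batch)
    last_id = batch[-1][0]  # lower bound for the next batch

print(len(seen))
```

Both forms return the same rows for a given page; the difference is that the id-based form lets the engine seek directly to the start of the batch through the primary key.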

 

 

4. Try to avoid SELECT *

 

A. The more data is read from the table, the slower the query becomes. It increases disk operation time, and when the database server is separate from the web server you may also see long network delays, simply because data is being transferred between servers unnecessarily.

(The main consideration is to save memory on the application server.)

 

5. Try not to use ORDER BY RAND()

 

 A. If you really need to return results in random order, there are better ways to do it. With ORDER BY RAND(), MySQL may have to call RAND() for every row in the table, which consumes CPU, only to give you a single row back.

 

 

 

6. Use limit 1 to get unique rows

 

 A. Sometimes you know in advance that you only need one row, for example when looking up a unique record. In that case use limit 1, so that the database engine stops at the first match instead of continuing to scan the whole table or index, such as:

 

Select * from A where name like '%xxx' limit 1;

 

In this way, as soon as the query finds a record matching '%xxx', the engine stops scanning the table or index.

 

 

 

7. Minimize sorting

 

A. Sorting consumes considerable CPU resources, so reducing sorting can noticeably improve SQL response time, especially when the cache hit rate is high and I/O is not the bottleneck.

 

 

 

8. Minimize OR

 

 A. When several conditions in the where clause are combined with "or", the MySQL optimizer does not optimize the execution plan very well. Combined with MySQL's layered architecture of SQL layer and storage engine, this results in relatively low performance, so it is often better to use union all (or union when necessary) instead of "or".

 

 

 

9. Try to use union all instead of union

 

 A. The difference between union and union all is that union has to merge the two (or more) result sets and then remove duplicates, which involves sorting, adds a lot of CPU work, and increases resource consumption and latency. So when we can confirm that duplicates are impossible, or we do not care about them, use union all instead of union.
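The difference in results between union and union all is easy to see on two small tables. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the set semantics are the same), and the tables t1 and t2 are made up for illustration.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (v INTEGER);
    CREATE TABLE t2 (v INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (3), (4);
""")

# UNION removes duplicates across the combined result set.
union_rows = conn.execute(
    "SELECT v FROM t1 UNION SELECT v FROM t2 ORDER BY v").fetchall()

# UNION ALL simply concatenates the result sets, duplicates included.
union_all_rows = conn.execute(
    "SELECT v FROM t1 UNION ALL SELECT v FROM t2 ORDER BY v").fetchall()

print(union_rows)      # [(1,), (2,), (3,), (4,)]
print(union_all_rows)  # [(1,), (2,), (3,), (3,), (4,)]
```

When the two inputs cannot overlap, both statements return the same rows and union all avoids the duplicate-removal step.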

 

10. Avoid type conversions

 

A. The "type conversion" mentioned here is the conversion that occurs when the type of a column in the where clause does not match the type of the incoming parameter. If the column itself is converted, whether implicitly or through a conversion function, MySQL cannot use the index on it. If a conversion is unavoidable, apply it to the incoming parameter.

 

 

 

11. Don’t operate on columns

 

A. For example, select * from users where YEAR(adddate)<2007; applies the function to every row, which prevents the index from being used and forces a full table scan, so we can change it to:

 

Select * from users where adddate<'2007-01-01';
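The effect of wrapping an indexed column in a function can be checked by comparing results and query plans. The sketch below is an illustration under stated assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, strftime('%Y', ...) plays the role of YEAR(), and the index name idx_users_adddate and the sample rows are hypothetical.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, adddate TEXT);
    CREATE INDEX idx_users_adddate ON users (adddate);
    INSERT INTO users (name, adddate) VALUES
        ('u1', '2005-03-01'), ('u2', '2006-12-31'),
        ('u3', '2007-01-01'), ('u4', '2008-07-15');
""")

# Function applied to the column (SQLite substitute for YEAR(adddate) < 2007).
wrapped = ("SELECT name FROM users "
           "WHERE CAST(strftime('%Y', adddate) AS INTEGER) < 2007")
# Bare column compared with a constant: the index on adddate can be searched.
bare = "SELECT name FROM users WHERE adddate < '2007-01-01'"

def plan(sql):
    """Return the query plan text SQLite reports for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

wrapped_rows = sorted(conn.execute(wrapped).fetchall())
bare_rows = sorted(conn.execute(bare).fetchall())

print(wrapped_rows == bare_rows)
print(plan(wrapped))
print(plan(bare))
```

Both statements select the same users, but only the plan for the second one should mention the index.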

 

 

 

12. Try not to use NOT IN and <> operations

 

A. NOT IN and <> generally cannot use an index effectively and tend to cause a full table scan. NOT IN can be replaced by NOT EXISTS, and id<>3 can be written as id>3 or id<3; if NOT EXISTS wraps a subquery, convert it to an outer join or an equijoin where possible, depending on the business logic of the specific SQL.

 

b. Convert NOT IN to LEFT JOIN such as:

 

SELECT * FROM customerinfo WHERE CustomerID NOT IN (SELECT CustomerID FROM salesinfo);

 

Optimized:

 

SELECT * FROM customerinfo LEFT JOIN salesinfo ON customerinfo.CustomerID=salesinfo.CustomerID WHERE salesinfo.CustomerID IS NULL;
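That the LEFT JOIN ... IS NULL form selects the same customers as NOT IN can be verified on sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the SaleID and name columns and the rows are hypothetical. It also assumes salesinfo.CustomerID is declared NOT NULL: if the subquery could return a NULL, NOT IN would match no rows at all and the two forms would no longer agree.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerinfo (CustomerID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salesinfo (SaleID INTEGER PRIMARY KEY,
                            CustomerID INTEGER NOT NULL);
    INSERT INTO customerinfo VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
    INSERT INTO salesinfo (CustomerID) VALUES (1), (1), (3);
""")

# Customers without any sale, written with NOT IN.
not_in_rows = conn.execute("""
    SELECT CustomerID FROM customerinfo
    WHERE CustomerID NOT IN (SELECT CustomerID FROM salesinfo)
""").fetchall()

# The same question as an anti-join: keep only rows with no match in salesinfo.
left_join_rows = conn.execute("""
    SELECT customerinfo.CustomerID FROM customerinfo
    LEFT JOIN salesinfo ON customerinfo.CustomerID = salesinfo.CustomerID
    WHERE salesinfo.CustomerID IS NULL
""").fetchall()

print(not_in_rows)     # [(2,)]
print(left_join_rows)  # [(2,)]
```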

 

Without using exists:
select * from es_order o,es_member m  where  o.member_id = m.member_id ;
 


Using exists:
select * from es_order o where EXISTS (select * from es_member m where o.member_id = m.member_id );

 

 

 

13. Use bulk inserts to save interaction (preferably using stored procedures)

 

A. Try to use a single multi-row statement: insert into users(username,password) values('test1','pass1'), ('test2','pass2'), ('test3','pass3');
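From application code, a multi-row insert like the one above is usually built with placeholders rather than by pasting values into the SQL string. The sketch below is a minimal illustration using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption); the "?" placeholder style belongs to sqlite3, and MySQL drivers may use a different one.

```python
import sqlite3

# SQLite in-memory database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")

rows = [("test1", "pass1"), ("test2", "pass2"), ("test3", "pass3")]

# Build one multi-row INSERT: "(?, ?), (?, ?), (?, ?)" plus flattened parameters.
placeholders = ", ".join(["(?, ?)"] * len(rows))
params = [value for row in rows for value in row]
sql = "INSERT INTO users (username, password) VALUES " + placeholders

# A single statement, and so a single round trip, inserts all rows.
conn.execute(sql, params)
conn.commit()

stored = conn.execute(
    "SELECT username, password FROM users ORDER BY username").fetchall()
print(stored)
```

For very large batches it is reasonable to split the rows into chunks, since engines limit statement size and the number of bound parameters.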

 

 

14. Create a view for queries related to multiple tables

A. Joining multiple tables can cause performance problems. We can create a view over the tables instead. This keeps operations simple and improves data security, since through the view users can only query and modify the specified data. It also improves the logical independence of the tables: the view can shield users from changes in the underlying table structure.

demo:

 

Create view V_term
as
select MOBILE,CERTID,GROUPID,'0' as TERMTYPE from carterm
union all select MOBILE,USERNAME,GROUPID,'1' as TERMTYPE from mobileterm
union all select MOBILE,USERNAME,GROUPID,'2' as TERMTYPE from handsetterm;
 
-- usage
select * from V_term;

 

15. Some additional optimization suggestions

Here are a few common schema design specifications and SQL usage suggestions that help improve MySQL efficiency:

1. Give every InnoDB table an auto-increment column with no business meaning as its primary key. This holds for most scenarios. Purely read-only InnoDB tables are rare; if you do have one, it is better to use TokuDB.

2. Provided the field length meets the requirements, choose the smallest length possible. In addition, add NOT NULL constraints to fields wherever possible, which can improve performance to a certain extent;

3. Avoid TEXT/BLOB types as much as possible. If they are really necessary, split them into a separate sub-table rather than keeping them in the main table, to avoid poor read performance with SELECT *.

4. When reading data, select only the required columns instead of using SELECT * every time, to avoid serious random-read problems, especially when TEXT/BLOB columns are read;

6. Under normal circumstances sub-queries perform relatively poorly; it is recommended to rewrite them as JOINs;

7. When querying multiple tables, the types of the associated fields should be as consistent as possible, and the fields must be indexed;

8. When querying multiple tables, use the table with a small result set (note that this refers to the filtered result set, not necessarily the full table with a small amount of data) as the driving table;

9. When multiple tables are joined and the result is sorted, the sort field must be in the driving table, otherwise the sort column cannot use an index;

10. Prefer composite indexes over multiple independent indexes; in particular, do not create independent indexes on columns whose cardinality is too small (for example, fewer than 255 distinct values in the column);

11. For paging-style SQL, it is recommended to first join on the primary key and then return the result set, which is much more efficient;

 
