High Performance MySQL (study notes): query performance optimization 2

Restructuring queries

One complex query or multiple simple queries

Traditional database design advice stresses doing as much work as possible in the database layer, on the reasoning that network communication, query parsing, and optimization are expensive. That reasoning does not hold for MySQL. MySQL is designed so that connecting and disconnecting are lightweight, and it returns small query results very efficiently. Modern networks are also much faster than they used to be, in both bandwidth and latency; even over a gigabit network, MySQL can easily handle more than 2,000 queries per second.

Splitting a query

Sometimes a large query is best handled by "divide and conquer": split it into smaller queries that each do the same job on a small part of the data and return only a small part of the result. Deleting old data is a typical case: add LIMIT 10000 to the DELETE and run it in a loop until no more rows are affected. Each statement then holds its locks for a shorter time, and pausing briefly between batches spreads what would have been a single burst of load over a longer period, reducing the impact on the server.
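The batch-delete loop can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for a MySQL connection, the messages table and the delete_in_batches helper are hypothetical, and the DELETE ... WHERE id IN (SELECT ... LIMIT ?) form replaces MySQL's DELETE ... LIMIT, which stock SQLite builds do not accept.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=10000, pause_seconds=0.0):
    """Delete rows older than `cutoff` in small batches; return the batch count."""
    batches = 0
    while True:
        # SQLite stand-in for MySQL's "DELETE ... WHERE created < ? LIMIT n".
        cur = conn.execute(
            "DELETE FROM messages WHERE id IN ("
            "SELECT id FROM messages WHERE created < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:
            break  # nothing left to delete
        batches += 1
        time.sleep(pause_seconds)  # give the server room between batches
    return batches


# Hypothetical sample data: 100 rows with created = 0..99.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany(
    "INSERT INTO messages (created) VALUES (?)", [(day,) for day in range(100)]
)
batches_run = delete_in_batches(conn, cutoff=90, batch_size=25)
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Committing after every batch is what keeps each lock short; the pause length is a tuning knob and would normally be chosen by watching the load on the server.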

Decomposing a join

Many high-performance applications decompose joins: they run a single-table query against each table and then join the results in the application.

For example:

SELECT * FROM tag JOIN a ON … JOIN b ON … WHERE …

can be decomposed into:

SELECT * FROM tag WHERE …

SELECT * FROM a WHERE …

SELECT * FROM b WHERE … IN (…)

The advantages of doing this are:

1. Caching becomes more efficient. Many applications can conveniently cache the result objects that correspond to single-table queries.

2. Executing the queries individually can reduce lock contention.

3. Doing the join in the application layer makes it easier to split the database across servers, and so to achieve high performance and scalability.

4. The queries themselves can be more efficient. Replacing the join with an IN() list lets MySQL look up rows in ID order, which may be more efficient than the random access of a join.

5. Redundant row accesses are reduced. Joining in the application means each row is fetched only once, whereas a join in the database may read the same data repeatedly. Seen this way, the restructuring may also reduce network traffic and memory consumption.

6. Going further, this amounts to doing a hash join in the application instead of MySQL's nested-loop join. In some scenarios a hash join is much more efficient, and in those cases restructuring the query so that the join happens in the application is the better choice.
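As a rough sketch of point 6, the following Python joins two single-table query results through a dictionary, which is a hash join done in the application. Everything here is illustrative: sqlite3 stands in for MySQL, and the post table and its columns are hypothetical (the notes above only name the tables tag, a, and b).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, tag_id INTEGER, title TEXT);
    INSERT INTO tag VALUES (1, 'mysql'), (2, 'linux');
    INSERT INTO post VALUES (10, 1, 'indexes'), (11, 1, 'joins'), (12, 2, 'shell');
""")

# Query 1: single-table lookup; an application could cache this result.
tags = conn.execute("SELECT id, name FROM tag WHERE name = ?", ("mysql",)).fetchall()

# Build a hash table on the join key, as a hash join would.
tag_by_id = {tag_id: name for tag_id, name in tags}

# Query 2: fetch the matching rows of the second table with IN (...).
placeholders = ",".join("?" * len(tag_by_id))
posts = conn.execute(
    f"SELECT id, tag_id, title FROM post WHERE tag_id IN ({placeholders}) ORDER BY id",
    list(tag_by_id),
).fetchall()

# Join in the application: probe the hash table once per post row.
joined = [(tag_by_id[tag_id], title) for _, tag_id, title in posts]
```

Each tag row is read from the database once and then reused from the dictionary for every matching post, which is the effect described in point 5.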

Fundamentals of Query Execution

MySQL executes a query in the following steps:

1. The client sends the query to the server.

2. The server checks the query cache first. If the query hits the cache, the server immediately returns the stored result; otherwise it goes on to the next stage.

3. The server parses and preprocesses the SQL, and the optimizer then generates the corresponding execution plan.

4. Following the execution plan generated by the optimizer, MySQL calls the storage engine API to execute the query.

5. The server returns the result to the client.

MySQL Client/Server Communication Protocol

MySQL client/server communication is half-duplex, which means that at any given moment either the server is sending data to the client or the client is sending data to the server, never both at once. It also means that a message cannot be, and does not need to be, cut into small pieces that are sent independently.

The server's response, by contrast, usually contains a lot of data. Once the server starts responding to a request, the client must receive the entire result set; it cannot fetch just the first few rows and then tell the server to stop sending the rest. This is why a LIMIT clause should be added to a query whenever it is appropriate.
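The effect of LIMIT on how much data the client has to receive can be illustrated with a small Python sketch. Note the assumptions: sqlite3 runs in-process and has no wire protocol, so this only compares how many rows are handed to the client, not real network behavior, and the events table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events (id) VALUES (?)", [(i,) for i in range(1, 1001)])

# Without LIMIT: the whole result set is produced and handed to the client,
# which then throws most of it away.
all_rows = conn.execute("SELECT id FROM events ORDER BY id").fetchall()
wanted_wasteful = all_rows[:10]

# With LIMIT: only the rows that are actually needed are produced.
wanted_limited = conn.execute(
    "SELECT id FROM events ORDER BY id LIMIT 10"
).fetchall()
```

Both approaches end up with the same ten rows, but the first one makes the client receive all 1,000 rows to get them.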

Query states

Every MySQL connection, or thread, is in some state at any given time, which indicates what MySQL is currently doing. The state can be viewed with SHOW FULL PROCESSLIST, and it changes many times over the life cycle of a query. The states are:

Sleep: The thread is waiting for the client to send a new request.

Query: The thread is executing a query or sending results back to the client.

Locked: The thread is waiting for a table lock at the MySQL server layer. Locks implemented by the storage engine, such as InnoDB row locks, do not show up in the thread state. This state is typical for MyISAM.

Analyzing and statistics: The thread is collecting storage engine statistics and generating the query execution plan.

Copying to tmp table [on disk]: The thread is executing the query and copying the result set to a temporary table, usually for a GROUP BY, a filesort, or a UNION. If the state ends with "on disk", MySQL is converting an in-memory temporary table to an on-disk table.

Sorting result: The thread is sorting a result set.

Sending data: The thread may be passing data between stages of the query, generating the result set, or returning data to the client.

Knowing these states helps you judge where a query is spending most of its time.

Query cache

Before parsing a query, MySQL checks whether it hits the query cache, if the cache is enabled. The check is a case-sensitive hash lookup. If the query differs from a cached query by even a single byte, it will not match the cached result.
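The byte-for-byte matching can be pictured with a toy cache in Python. This is a simplified sketch, not MySQL's implementation: the choice of SHA-256, the cached_query helper, and the fake_run function are all assumptions made for illustration, and the real cache key covers more than the statement text.

```python
import hashlib

query_cache = {}


def cache_key(sql):
    # Hash the exact bytes of the statement: no trimming, no case folding.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def cached_query(sql, run_query):
    """Return (result, was_hit); run the query only on a cache miss."""
    key = cache_key(sql)
    if key in query_cache:
        return query_cache[key], True
    result = run_query(sql)
    query_cache[key] = result
    return result, False


calls = []


def fake_run(sql):
    # Hypothetical stand-in for parsing, optimizing, and executing the query.
    calls.append(sql)
    return [("row",)]


_, hit1 = cached_query("SELECT * FROM t", fake_run)
_, hit2 = cached_query("SELECT * FROM t", fake_run)
_, hit3 = cached_query("select * from t", fake_run)  # differs only in case
```

The third call misses even though the statement is logically identical, which is why applications that rely on such a cache tend to generate their SQL text consistently.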

The query optimization process

The next step in the life cycle of a query turns the SQL into an execution plan, which MySQL then follows when interacting with the storage engine. This step has several sub-stages: parsing the SQL, preprocessing, and optimizing the execution plan.

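Parser and preprocessor

The parser validates the query against MySQL's grammar rules and builds a parse tree: for example, it checks that the keywords are valid and in the correct order and that quotation marks are matched. The preprocessor then checks what the parser cannot: that the tables and columns exist, and that names and aliases resolve without ambiguity.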

Query optimizer

MySQL uses a cost-based optimizer: it tries to predict the cost of the query under various execution plans and chooses the cheapest. The unit of cost was originally a single random read of a 4 KB data page, and later further factors were introduced to estimate the cost of operations. The estimate is generally based on statistics: the number of pages in each table or index, the cardinality of the indexes, the length of rows and keys, and the key distribution. The optimizer does not take any level of caching into account when estimating the cost; it assumes that every read requires a disk I/O.

Optimizations fall into two basic types: static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause into an equivalent form by applying simple algebraic rules. Static optimizations do not depend on specific values, such as a constant in the WHERE clause; once performed they remain valid, even when the query is executed again with different values. They are comparable to compile-time optimization.

Dynamic optimizations depend on the context of the query, such as the value in the WHERE clause or the number of rows that match an index entry, and must be re-evaluated every time the query is executed.

Here are some of the types of optimizations that MySQL can perform:

Reordering joins

Tables do not always have to be joined in the order specified in the query. Determining the best join order is an important part of the optimizer's work.

Converting outer joins to inner joins

An OUTER JOIN does not always have to be executed as an outer join. The WHERE clause and the table schema may make an outer join equivalent to an inner join, in which case MySQL can rewrite it.

Applying equivalence rules

MySQL applies equivalence transformations to simplify and canonicalize expressions. For example, (5=5 AND a>5) is rewritten as a>5, and (a<b AND b=c) AND a=5 is rewritten as b>5 AND b=c AND a=5.

Optimizing COUNT(), MIN(), and MAX()

Indexes and column nullability often help MySQL optimize these expressions. For example, to find the minimum value of a column that is leftmost in a B-Tree index, MySQL only needs to read the first entry in the index. Likewise, to find the maximum it only needs to read the last entry.

Evaluating and reducing constant expressions

When MySQL detects that an expression can be reduced to a constant, it always treats the expression as a constant during optimization. If the WHERE clause applies a constant condition to a primary key or unique index, MySQL can look up the value at the very start of the query and treat the result as constants for the rest of the query. For example, when the id column has a primary key index, the optimizer knows the lookup will return at most one row, and the table's access type is const.

Covering index scan

When an index contains all the columns the query needs, MySQL can return the data from the index alone, without reading the corresponding rows.

Subquery optimization

MySQL can convert some subqueries into more efficient forms, reducing the number of queries and data accesses.

Early termination

MySQL can stop processing a query, or a step within it, as soon as the query's requirements have been satisfied. A LIMIT clause is the typical case.

Equality propagation

If two columns are related by an equality, as in a JOIN condition, MySQL can propagate a WHERE clause on one column to the other. For example:

SELECT film.film_id FROM sakila.film INNER JOIN sakila.film_actor USING(film_id) WHERE film.film_id > 500;

MySQL knows that the condition on film_id applies to both the film table and the film_actor table.

IN() list comparisons

MySQL sorts the values in an IN() list and uses a binary search to check whether a value is in the list. That is an O(log n) operation in the size of the list, whereas the equivalent series of OR conditions is O(n). With a large number of values in the IN() list, MySQL's approach is faster.
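The difference between the two lookup strategies for an IN() list can be shown in a few lines of Python. This is a minimal sketch of the idea using the standard bisect module, not MySQL's internal code; the helper names and the sample values are made up for illustration.

```python
from bisect import bisect_left


def make_in_list(values):
    """Sort the list once so every later check can use binary search."""
    return sorted(values)


def in_sorted_list(sorted_values, candidate):
    # O(log n): binary search for the leftmost position of candidate.
    pos = bisect_left(sorted_values, candidate)
    return pos < len(sorted_values) and sorted_values[pos] == candidate


def in_or_chain(values, candidate):
    # O(n): equivalent to "col = v1 OR col = v2 OR ...".
    return any(candidate == v for v in values)


raw_values = [42, 7, 19, 3, 88, 61]
in_list = make_in_list(raw_values)
found = in_sorted_list(in_list, 19)
missing = in_sorted_list(in_list, 20)
```

The sort is paid once per query, while the membership check runs once per candidate row, so the saving grows with both the list length and the number of rows examined.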

Statistics for data and indexes

The server layer, which contains the query optimizer, does not store statistics on data and indexes. Statistics are the job of the storage engines, and different storage engines may keep different kinds of statistics.

When the optimizer generates an execution plan, it asks the storage engines for the relevant statistics: the number of pages in each table or index, the cardinality of each table's indexes, the length of rows and keys, the key distribution, and so on. The optimizer uses this information to choose the best execution plan.

How MySQL executes joins

MySQL uses the term "join" more broadly than a query over two tables: it considers every query a join. Take a UNION query: MySQL first puts the results of each individual query into a temporary table and then reads the temporary table back to complete the UNION. Because every query is a join in MySQL's terms, reading from the temporary table is a join too.

MySQL executes every join as a nested-loop join. It loops over the first table fetching one row at a time, then runs a nested loop over the next table to find rows that match, and continues until it has found a matching row in every table. It then builds and returns a row from the columns named in the query. MySQL tries to find all the matching rows in the last table; when it cannot find any more there, it backtracks one table and looks for more rows there, and so on. The following query and pseudocode illustrate a nested-loop join:

SELECT tbl1.col1, tbl2.col2 FROM tbl1 INNER JOIN tbl2 USING(col3) WHERE tbl1.col1 IN(5,6);

outer_iter = iterator over tbl1 where col1 IN(5,6)
outer_row = outer_iter.next
while outer_row
  inner_iter = iterator over tbl2 where col3 = outer_row.col3
  inner_row = inner_iter.next
  while inner_row
    output [ outer_row.col1, inner_row.col2 ]
    inner_row = inner_iter.next
  end
  outer_row = outer_iter.next
end
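The nested-loop pseudocode above can be turned into runnable Python roughly as follows. This is a sketch for illustration: the tables are plain lists of dictionaries with made-up rows, and the inner loop scans the whole list where MySQL would normally use an index lookup on col3.

```python
def nested_loop_join(tbl1, tbl2):
    """Yield (col1, col2) for tbl1 JOIN tbl2 USING (col3) WHERE col1 IN (5, 6)."""
    for outer_row in tbl1:
        if outer_row["col1"] not in (5, 6):
            continue  # the outer iterator only visits rows matching the WHERE clause
        for inner_row in tbl2:
            if inner_row["col3"] == outer_row["col3"]:
                yield outer_row["col1"], inner_row["col2"]
        # inner table exhausted: backtrack to the next outer row


# Hypothetical sample rows.
tbl1 = [
    {"col1": 5, "col3": "a"},
    {"col1": 6, "col3": "b"},
    {"col1": 7, "col3": "a"},
]
tbl2 = [
    {"col2": "x", "col3": "a"},
    {"col2": "y", "col3": "b"},
    {"col2": "z", "col3": "a"},
]
result = list(nested_loop_join(tbl1, tbl2))
```

The row with col1 = 7 never reaches the inner loop, and each surviving outer row triggers one full pass over the inner table before the outer loop advances.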

When MySQL encounters a subquery in the from clause, it first executes the subquery and puts the result into a temporary table, and then treats the temporary table as a normal table.

Execution plan

MySQL generates a tree of instructions for the query, and then executes that tree through the storage engine to produce the result. The final execution plan contains all the information needed to reconstruct the query. MySQL always executes a query by starting from one table and proceeding through nested loops and backtracking until all the tables are joined, so its execution plan takes the form of a left-deep tree.

Join optimizer

The most important part of the MySQL optimizer is the join optimizer, which decides the order in which multiple tables are joined. A multi-table join can usually be performed in several different orders that all produce the same result; the join optimizer estimates the cost of the different orders and chooses the cheapest. For example, in a query with several INNER JOINs, MySQL may reorder the joins so that fewer nested-loop iterations and backtracking operations are needed.
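The idea of costing every join order and keeping the cheapest can be sketched with a deliberately crude model in Python. All of it is assumption: the cost formula, the per-table row estimates, and the function names are invented for illustration, and a real optimizer's estimates depend on which tables have already been joined, which this toy ignores.

```python
from itertools import permutations


def plan_cost(order, rows_per_lookup):
    """Estimated rows examined when joining tables in the given order."""
    cost = 0
    rows_so_far = 1
    for table in order:
        rows_so_far *= rows_per_lookup[table]  # rows reaching this nesting level
        cost += rows_so_far
    return cost


def cheapest_join_order(rows_per_lookup):
    # Exhaustive search: fine for a handful of tables, factorial in general.
    return min(
        permutations(rows_per_lookup),
        key=lambda order: plan_cost(order, rows_per_lookup),
    )


# Hypothetical estimates: rows read from each table per row of the previous one.
estimates = {"film": 1000, "film_actor": 5, "actor": 1}
best = cheapest_join_order(estimates)
worst_cost = plan_cost(("film", "film_actor", "actor"), estimates)
best_cost = plan_cost(best, estimates)
```

Under this model, starting from the table that yields the fewest rows keeps every deeper loop small, which is the same intuition behind fewer nested-loop iterations and less backtracking.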

Sort optimization

When an index cannot be used to produce sorted output, MySQL must sort the rows itself. It sorts in memory if the amount of data is small and uses disk if it is large, but it calls the process a filesort in both cases, even when the sort is done entirely in memory without any files on disk.

MySQL has two filesort algorithms:

Two-pass sort

Read the row pointers and the ORDER BY columns, sort them, and then read the required rows in sorted order.

Single-pass sort

Read all the columns the query needs, sort the rows by the ORDER BY columns, and return the sorted result directly. The rows do not have to be read from the table a second time, which is a big improvement for I/O-bound workloads: the algorithm reads all the data in a single sequential pass, with no random I/O.
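The two algorithms differ only in what gets sorted, which a short Python sketch can make concrete. The table dictionary, its sample rows, and both function names are hypothetical; the dictionary lookup in the second pass stands in for the random reads that real storage would need.

```python
table = {  # row pointer -> full row; a stand-in for rows stored on disk
    101: {"name": "carol", "age": 41, "city": "Oslo"},
    102: {"name": "alice", "age": 29, "city": "Lima"},
    103: {"name": "bob", "age": 35, "city": "Kyiv"},
}


def two_pass_sort(table, sort_col, wanted_cols):
    # Pass 1: read only the sort column and the row pointer, then sort them.
    keys = sorted((row[sort_col], ptr) for ptr, row in table.items())
    # Pass 2: re-read each full row in sorted order (random I/O on real storage).
    return [tuple(table[ptr][c] for c in wanted_cols) for _, ptr in keys]


def single_pass_sort(table, sort_col, wanted_cols):
    # One pass: read the sort column plus every needed column, then sort.
    buffered = [
        (row[sort_col], tuple(row[c] for c in wanted_cols))
        for row in table.values()
    ]
    buffered.sort(key=lambda item: item[0])
    return [cols for _, cols in buffered]


by_two_pass = two_pass_sort(table, "age", ["name", "city"])
by_single_pass = single_pass_sort(table, "age", ["name", "city"])
```

The results are identical; the trade-off is that the single-pass version carries every needed column through the sort, so it uses more sort memory per row in exchange for not revisiting the table.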

When a join needs sorting, MySQL handles the filesort in one of two ways. If every column in the ORDER BY clause comes from the first table in the join order, MySQL sorts that table first and then proceeds with the join; EXPLAIN then shows "Using filesort" in the Extra column. Otherwise, MySQL stores the join result in a temporary table and sorts it after the join finishes; EXPLAIN then shows "Using temporary; Using filesort". If the query has a LIMIT, it is applied only after the sort, so even when few rows are returned, the temporary table and the amount of data to sort can still be very large.

Query execution engine

The parsing and optimization stages produce the query's execution plan, and MySQL's query execution engine follows that plan step by step to complete the query. The plan is a data structure, not executable bytecode.

Returning the result to the client

MySQL returns results to the client incrementally. For example, as soon as the server has processed the last table in the join and generated the first row, it can begin sending the result set to the client.

This has two benefits: the server does not have to hold the whole result set in memory, so large results do not consume much memory, and the client starts receiving rows as early as possible.
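Incremental delivery behaves much like a Python generator, which the following sketch uses as an analogy. It is not how the server is implemented; the stream_join function and the sample rows are made up for illustration.

```python
def stream_join(outer_rows, inner_rows):
    """Yield each joined row as soon as it is produced, without buffering."""
    for outer in outer_rows:
        for inner in inner_rows:
            if outer["id"] == inner["outer_id"]:
                yield outer["name"], inner["value"]


# Hypothetical sample rows.
outer_rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
inner_rows = [{"outer_id": 1, "value": 10}, {"outer_id": 2, "value": 20}]

rows = stream_join(outer_rows, inner_rows)
first = next(rows)  # available before the rest of the join has run
rest = list(rows)   # the remaining rows are produced on demand
```

The producer never holds more than the current row, and the consumer gets the first row without waiting for the last one.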


