Basic optimizations of the MySQL query optimizer

For a SQL statement, the query optimizer first checks whether the statement can be converted into a JOIN, and then optimizes the JOIN.

Optimization is divided into the following steps:

  1. Condition optimization

  2. Calculate the cost of a full table scan

  3. Find all the indexes that can be used

  4. For each index, calculate the cost of the different access methods

  5. Pick the index and access method with the lowest cost

Enabling the query optimizer trace log

- Turn it on:
SET optimizer_trace = "enabled=on";
- Execute the SQL statement
- View the log information:
SELECT * FROM information_schema.OPTIMIZER_TRACE;
- Turn it off:
SET optimizer_trace = "enabled=off";

Constant propagation (constant_propagation)

a = 1 AND b > a

The condition above can be converted to:

a = 1 AND b > 1

Equality propagation (equality_propagation)

a = b and b = c and c = 5

The condition above can be converted to:

a = 5 and b = 5 and c = 5

Trivial condition removal (trivial_condition_removal)

a = 1 and 1 = 1

The condition above can be converted to:

a = 1
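
The three rewrites above can be illustrated with a small sketch. It is only a toy model, not MySQL's internal implementation: conditions are assumed to be (left, operator, right) triples joined by AND, column names are strings and constants are integers, and the names OPS, is_col and simplify are made up for this example.

```python
import operator

# Comparison operators supported by this toy model.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def is_col(x):
    # Assumption of this sketch: strings are column names, integers are constants.
    return isinstance(x, str)

def simplify(conds):
    """Apply constant/equality propagation, then drop trivially true conditions."""
    conds = list(conds)
    changed = True
    while changed:
        changed = False
        # Columns currently known to equal a constant, e.g. ("c", "=", 5).
        known = {l: r for l, op, r in conds
                 if op == "=" and is_col(l) and not is_col(r)}
        rewritten = []
        for l, op, r in conds:
            if is_col(r) and r in known:
                # b > a with a = 1 becomes b > 1
                r = known[r]
                changed = True
            elif op == "=" and is_col(l) and is_col(r) and l in known:
                # a = b with a = 1 becomes b = 1
                l, r = r, known[l]
                changed = True
            rewritten.append((l, op, r))
        conds = rewritten
    # Remove conditions that compare two constants and are always true (1 = 1).
    return [(l, op, r) for l, op, r in conds
            if is_col(l) or is_col(r) or not OPS[op](l, r)]

print(simplify([("a", "=", 1), ("b", ">", "a")]))
print(simplify([("a", "=", "b"), ("b", "=", "c"), ("c", "=", 5)]))
print(simplify([("a", "=", 1), (1, "=", 1)]))
```

The loop repeats until no more substitutions are possible, which is why a = b is only resolved after b = c has been turned into b = 5.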

Cost-based optimization

A query can be executed in different ways: it may use an index, or it may use a full table scan. The query optimizer chooses the plan with the lowest cost to execute the query.

I/O cost

The InnoDB storage engine stores both data and indexes on disk. When we query records in a table, the data or index pages must first be loaded from disk into memory before they can be processed. The time spent loading pages from disk into memory is called the I/O cost.

For InnoDB, the cost of reading one page is defined as 1.0 by default, and the cost of reading a record and checking whether it matches the search conditions (the CPU cost) is 0.2 by default.

Cost-based optimization steps

Before a single-table query is actually executed, the MySQL query optimizer finds all the plans that could be used to execute the statement, compares them, and picks the one with the lowest cost. This lowest-cost plan is called the execution plan. Only then does it call the interfaces provided by the storage engine to actually execute the query.

Below we analyze these steps with an example. The single-table query is as follows:

select * from employees.titles where emp_no > '10101' and emp_no < '20000' and to_date = '1991-10-10';

1. Identify all the indexes that might be used, based on the search conditions

  • emp_no > '10101': this search condition can use the primary key index PRIMARY.
  • to_date = '1991-10-10': this search condition can use the secondary index idx_titles_to_date.

To sum up, the indexes that the query above might use (the possible keys) are only PRIMARY and idx_titles_to_date.

2. Calculate the cost of a full table scan

For the InnoDB storage engine, a full table scan means comparing the records of the clustered index one by one against the given search conditions and adding the matching records to the result set. This requires loading the pages of the clustered index into memory and then checking whether each record matches the search conditions. Because query cost = I/O cost + CPU cost, calculating the cost of a full table scan requires two pieces of information:

  1. The number of pages occupied by the clustered index

  2. The number of records in the table

MySQL maintains a set of statistics for each table. The SHOW TABLE STATUS statement displays them.

Rows

The number of records in the table. For tables using the MyISAM storage engine this value is exact; for tables using the InnoDB storage engine it is an estimate.

Data_length

The number of bytes of storage space occupied by the table. For tables using the MyISAM storage engine this is the size of the data file; for tables using the InnoDB storage engine it is the size of the storage space occupied by the clustered index, so it can be calculated as:

Data_length = number of pages in the clustered index x size of each page

Our titles table uses the default page size of 16KB, and SHOW TABLE STATUS reports a Data_length of 20,512,768, so we can work backwards to the number of pages in the clustered index:

Number of pages in the clustered index = Data_length ÷ 16 ÷ 1024 = 20512768 ÷ 16 ÷ 1024 = 1252

We now have estimates of the number of pages occupied by the clustered index and of the number of records in the table, so we can calculate the cost of a full table scan. In the real cost calculation MySQL applies some additional fine-tuning.
I/O cost: 1252 x 1.0 = 1252. Here 1252 is the number of pages occupied by the clustered index, and 1.0 is the constant cost of loading one page.
CPU cost: 442070 x 0.2 = 88414. Here 442070 is the number of records in the table (for InnoDB this is an estimate), and 0.2 is the constant cost of accessing one record.
Total cost: 1252 + 88414 = 89666.
In summary, the total cost of a full table scan of titles is 89,666.
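
The arithmetic above can be reproduced with a short sketch. The function and constant names are made up for this example. The constants 1.0 and 0.2 and the inputs are the figures quoted in the text, and the fine-tuning MySQL applies in practice is deliberately left out.

```python
PAGE_SIZE = 16 * 1024      # default InnoDB page size in bytes (16KB)
IO_COST_PER_PAGE = 1.0     # cost constant for loading one page
CPU_COST_PER_ROW = 0.2     # cost constant for reading and checking one record

def full_scan_cost(data_length, rows):
    """Return (pages, io_cost, cpu_cost, total_cost) for a full table scan."""
    pages = data_length // PAGE_SIZE
    io_cost = pages * IO_COST_PER_PAGE
    cpu_cost = rows * CPU_COST_PER_ROW
    return pages, io_cost, cpu_cost, io_cost + cpu_cost

# Data_length and Rows as reported for the titles table in the text.
pages, io_cost, cpu_cost, total = full_scan_cost(20512768, 442070)
print(pages, io_cost, round(cpu_cost, 1), round(total, 1))
```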

The records of a table are stored in the leaf nodes of the B+ tree of the clustered index, linked together as a doubly linked list, so once we reach the leftmost leaf node from the root we can read all the records by following the chain of leaf nodes. In other words, a full table scan does not actually need to visit some of the internal nodes of the B+ tree. However, when calculating the cost of a full table scan MySQL uses the number of pages of the clustered index directly as the basis for the I/O cost, without distinguishing between internal nodes and leaf nodes.

Calculating the cost of using PRIMARY

The key issue in calculating the cost of PRIMARY is estimating how many records in the B+ tree of the primary key index satisfy the corresponding WHERE conditions.

Number of range intervals

When we query records through an index, whether with =, IN, > or <, we need to identify a range in the index. No matter how many pages that range actually occupies, the query optimizer crudely assumes that the I/O cost of reading one range interval of an index is the same as that of reading one page.

In this example PRIMARY has only one range interval, (10101, 20000), so the I/O cost of accessing the index for this range interval is:

1 x 1.0 = 1.0

Estimated number of records within the range

The optimizer needs to work out how many records a range interval of an index contains. In this example that means calculating how many records PRIMARY contains in the range interval (10101, 20000). The calculation proceeds as follows:

Step 1: Using the condition emp_no > 10101, access the B+ tree of the PRIMARY index and find the first record that satisfies the condition. We call this record the leftmost record of the interval.

Step 2: Using the condition emp_no < 20000, find the last record that satisfies the condition in the B+ tree of the PRIMARY index. We call this record the rightmost record of the interval.

Step 3: If the leftmost and rightmost records of the interval are not too far apart (no more than 10 pages apart), the number of records satisfying emp_no > '10101' and emp_no < '20000' can be counted exactly. Otherwise, read only 10 pages to the right starting from the leftmost record, calculate the average number of records per page, and multiply that average by the number of pages between the leftmost and rightmost records. This raises the question of how to estimate the number of pages between the leftmost and rightmost records: it is calculated from the number of directory entry records that lie between them in the corresponding parent node.
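
Step 3 can be sketched as follows. This is a simplified model rather than the storage engine's code: the function name and its parameters are hypothetical, the per-page record counts are assumed to be supplied by the caller (at least one page), and pages_in_range stands for the page count derived from the parent node.

```python
def estimate_rows_in_range(page_record_counts, pages_in_range, sample_limit=10):
    """Estimate how many records lie between the leftmost and rightmost records.

    page_record_counts: record counts of the pages read from the leftmost
                        record towards the right (hypothetical input).
    pages_in_range:     number of pages between the two boundary records.
    """
    if pages_in_range <= sample_limit:
        # Close enough: count the records on every page exactly.
        return sum(page_record_counts[:pages_in_range])
    # Too far apart: sample the first pages and extrapolate from the average.
    sample = page_record_counts[:sample_limit]
    average = sum(sample) / len(sample)
    return average * pages_in_range

print(estimate_rows_in_range([100, 80, 90], 3))   # exact count over 3 pages
print(estimate_rows_in_range([100, 120] * 5, 50)) # extrapolated from 10 pages
```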

With the steps above the number of records in the PRIMARY range can be estimated. The CPU cost of reading these records is 26808 x 0.2 = 5361.6, where 26808 is the estimated number of records to be read and 0.2 is the constant cost of reading one record.

Total cost of using PRIMARY

Total cost = I/O cost of accessing the range + CPU cost of filtering the data = 1 + 5361.6 = 5362.6
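
As a quick check of this figure, here is a minimal sketch using the same constants as the text. The function and constant names are made up for illustration.

```python
IO_COST_PER_RANGE = 1.0   # one range interval is charged like one page read
CPU_COST_PER_ROW = 0.2    # constant cost of reading one record

def range_scan_cost(num_ranges, estimated_rows):
    """Cost of scanning the given number of range intervals on the clustered index."""
    io_cost = num_ranges * IO_COST_PER_RANGE
    cpu_cost = estimated_rows * CPU_COST_PER_ROW
    return io_cost + cpu_cost

# One range interval (10101, 20000) with an estimated 26808 records.
print(round(range_scan_cost(1, 26808), 1))
```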

Calculating the cost of using idx_titles_to_date

A query through a secondary index has to go back to the table (the clustered index) to fetch the full records, so the cost of using a secondary index must include the cost of going back to the table. Going back to the table is equivalent to executing the following SQL:

select * from employees.titles where primary_key_column in (primary key value 1, primary key value 2, ..., primary key value N);

So query cost = cost of querying the secondary index idx_titles_to_date + cost of going back to the table

Compare the costs and choose the best plan

Choose the index with the lowest cost.
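
The final comparison is simply a minimum over the candidate plans. The sketch below uses the two totals worked out in the text. The cost of idx_titles_to_date is not worked out in the text, so it is left out here, and the plan labels and function name are hypothetical.

```python
def choose_plan(costs):
    """Return the (plan, cost) pair with the lowest cost."""
    return min(costs.items(), key=lambda item: item[1])

costs = {
    "full table scan": 89666.0,     # total from the full table scan calculation
    "PRIMARY range scan": 5362.6,   # total from the PRIMARY calculation
}
print(choose_plan(costs))
```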

Cost calculation based on index statistics

Sometimes a query executed through an index has many single-point ranges. The IN operator, for example, easily produces many single-point ranges, as in the following query:

select * from employees.titles where to_date in ('a','b','c','d', ..., 'e');

Obviously, the index this query may use is idx_titles_to_date. Because this is not a unique secondary index, we cannot know in advance how many secondary index records correspond to each single-point range, so this has to be calculated. The calculation was introduced above: first find the leftmost and rightmost records of the interval in the B+ tree of the index, then work out how many records lie between them (when there are few records this can be counted exactly, when there are many it can only be estimated). Calculating the number of index records in a range interval by accessing the B+ tree of the index directly in this way is called an index dive.

With only a few single-point ranges, using index dives to calculate the number of records for each range is not a problem. But with many of them, say 20,000, the query optimizer would have to perform 20,000 index dives, which wastes performance. MySQL therefore provides the system variable eq_range_index_dive_limit. Its value can be checked with SHOW VARIABLES LIKE '%dive%'; the default is 200.

That is, if the IN list has fewer than 200 values, index dives are used to calculate the number of records for each single-point range. If it has 200 or more, index dives are not used and the so-called index statistics are used for the estimate instead. Just as it maintains statistics for each table, MySQL also maintains statistics for each index of a table, which can be viewed with SHOW INDEX FROM table_name.

The Cardinality attribute represents the number of distinct values in the indexed column. For example, in a table of 10,000 rows, a Cardinality of 10000 for an indexed column means that the column has no duplicate values, while a Cardinality of 1 means that all the values in the column are the same. Note, however, that for the InnoDB storage engine the Cardinality shown by SHOW INDEX is an estimate, not an exact value. From this attribute, the number of records matching the values in the IN list can be estimated as follows:

  1) Take the Rows value shown by SHOW TABLE STATUS, i.e. the number of records in the table.

  2) Take the Cardinality attribute shown by SHOW INDEX.

  3) From these two values, calculate the average number of times a single value is repeated in the indexed column: Rows / Cardinality

  4) The total number of records that have to be looked up in the table is then: number of values in the IN list x Rows / Cardinality
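
The decision and the estimate can be sketched together as follows. The function names are hypothetical, the limit of 200 is the default quoted in the text, and the Rows and Cardinality figures in the usage line are invented purely for illustration.

```python
def uses_index_dive(num_in_values, dive_limit=200):
    """Index dives are used only while the IN list is shorter than the limit."""
    return num_in_values < dive_limit

def estimate_rows_from_statistics(num_in_values, table_rows, cardinality):
    """Estimate via index statistics: IN values x (Rows / Cardinality)."""
    average_repeats = table_rows / cardinality
    return num_in_values * average_repeats

# Hypothetical figures: 20,000 IN values, 10,000 rows, 100 distinct values.
if not uses_index_dive(20000):
    print(estimate_rows_from_statistics(20000, 10000, 100))
```

Because the estimate relies on an average, it can be far off when the values in the IN list are much more or much less frequent than a typical value in the column.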

NULL value handling

When counting the distinct values of a column, how NULL is treated affects the query optimizer. There are three ways to interpret NULL:

  1) NULL represents an undetermined value, so each NULL is unique, and every NULL should be counted as a separate value when counting the distinct values of the column.

  2) NULL means "no value" in business terms, so all NULLs mean the same thing and should be counted as a single value when counting the distinct values of the column.

  3) NULL is completely meaningless and should be ignored when counting the distinct values of the column.

InnoDB provides a system variable for this:

show global variables like '%innodb_stats_method%';

This variable has three possible values:

  1. nulls_equal: all NULL values are considered equal. This is the default value of innodb_stats_method. If an indexed column contains a particularly large number of NULL values, this method makes the optimizer think that the average number of repetitions of a value in the column is particularly large, so it tends not to use the index.

  2. nulls_unequal: all NULL values are considered unequal. If an indexed column contains a particularly large number of NULL values, this method makes the optimizer think that the average number of repetitions of a value in the column is particularly small, so it tends to use the index.

  3. nulls_ignored: NULL values are simply ignored.

The best approach is to avoid storing NULL values in indexed columns.
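
To see how the three settings change the numbers, here is a simplified model. It is not InnoDB's actual sampling code: it counts distinct values in a small in-memory list, with None standing in for NULL, and the function name is made up for this example.

```python
def null_aware_stats(values, method):
    """Return (cardinality, average repetitions per value) under one method."""
    non_null = [v for v in values if v is not None]
    null_count = len(values) - len(non_null)
    distinct = len(set(non_null))
    if method == "nulls_equal":
        # All NULLs together count as one distinct value.
        cardinality = distinct + (1 if null_count else 0)
        counted_rows = len(values)
    elif method == "nulls_unequal":
        # Every NULL counts as its own distinct value.
        cardinality = distinct + null_count
        counted_rows = len(values)
    elif method == "nulls_ignored":
        # NULLs are left out of the statistics entirely.
        cardinality = distinct
        counted_rows = len(non_null)
    else:
        raise ValueError(f"unknown method: {method}")
    if cardinality == 0:
        return 0, 0.0
    return cardinality, counted_rows / cardinality

column = [1, 2, None, None, None, None, None, None]
for method in ("nulls_equal", "nulls_unequal", "nulls_ignored"):
    print(method, null_aware_stats(column, method))
```

With many NULLs, nulls_equal yields the highest average repetition count, which matches the description above of the optimizer becoming reluctant to use the index.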

Statistics

InnoDB provides two ways of storing statistics:

• Statistics stored on disk.
• Statistics stored in memory. These are cleared when the server shuts down.

MySQL provides the system variable innodb_stats_persistent to control which of the two is used. Before MySQL 5.6.6 the default value of innodb_stats_persistent was OFF, meaning that InnoDB statistics were stored in memory by default. From that version on the default is ON, meaning that statistics are stored on disk by default.

However, InnoDB collects and stores statistics per table, which means that the statistics of some tables (and of their indexes) can be stored on disk while those of other tables are stored in memory. We can specify how the statistics of a table are stored by setting the STATS_PERSISTENT attribute when creating or altering the table.

1. Disk-based persistent statistics

When we choose to store the statistics of a table and its indexes on disk, they are actually stored in two tables:

• innodb_table_stats stores statistics about tables. Each record holds the statistics of one table.
• innodb_index_stats stores statistics about indexes. Each record holds one statistic of one index.

2. Periodically updating statistics

• The system variable innodb_stats_auto_recalc determines whether the server recalculates statistics automatically. Its default value is ON, so the feature is enabled by default. Each table maintains a counter of the number of records inserted, deleted and updated. If the number of changed records exceeds 10% of the size of the table and automatic recalculation is enabled, the server recalculates the statistics and updates the innodb_table_stats and innodb_index_stats tables. Automatic recalculation is asynchronous, however: even when more than 10% of the records have changed, the recalculation does not happen immediately and may be delayed by a few seconds.

• If the innodb_stats_auto_recalc system variable is OFF, we can recalculate the statistics manually with the ANALYZE TABLE statement: ANALYZE TABLE single_table;
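
The trigger condition described above amounts to a simple threshold check. The sketch below only illustrates that rule. The function name is hypothetical, and the real recalculation runs asynchronously inside the server.

```python
def needs_recalc(changed_rows, table_rows, auto_recalc=True):
    """True when more than 10% of the table has changed and auto recalc is ON."""
    # Integer arithmetic avoids floating point issues at the exact 10% boundary.
    return auto_recalc and changed_rows * 10 > table_rows

print(needs_recalc(1001, 10000))                     # just over 10%
print(needs_recalc(1000, 10000))                     # exactly 10%, not more
print(needs_recalc(5000, 10000, auto_recalc=False))  # feature switched off
```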

3. Controlling the execution plan

Index Hints

• USE INDEX: restricts the set of indexes that MySQL considers. When a table has many indexes, MySQL considers all of them when choosing an index. Sometimes we want MySQL to consider only a few of them rather than all, which is done by adding USE INDEX to the query.

• IGNORE INDEX: specifies the indexes that MySQL should not use.

• FORCE INDEX: makes MySQL use a particular index. Since MySQL usually uses only one index per table in a query, FORCE INDEX is the way to force it to use the one we want.

The basic syntax:

SELECT * FROM table1 USE|IGNORE|FORCE INDEX (col1_index,col2_index) WHERE col1=1 AND col2=2 AND col3=3

Origin www.cnblogs.com/tongxuping/p/12329828.html