Optimization of database indexes and SQL processing

To design a good index, you must first understand how the database server processes SQL statements. This article describes several concepts that are important for index design and optimization.

Predicate

A predicate is a conditional expression. The WHERE clause of an SQL statement consists of one or more predicates.

WHERE   SEX = 'M'
        AND 
        (WEIGHT > 90
        OR
        HEIGHT > 190)

The WHERE clause above contains three simple predicates:

  • SEX = 'M'
  • WEIGHT > 90
  • HEIGHT > 190

It can also be viewed as containing two compound predicates:

  • WEIGHT > 90 OR HEIGHT > 190
  • SEX = 'M' AND (WEIGHT > 90 OR HEIGHT > 190)

 

Optimizer and access path

One advantage of relational databases is that users do not need to know how the data is accessed. The access path is determined by a component of the DBMS, namely the optimizer. The optimizer is the heart of SQL processing.

Here is a simple diagram of the logical structure of a MySQL server as an example.

In the figure we can see where the optimizer sits.

 

Before an SQL statement is actually executed, the optimizer must first decide how to access the data. MySQL, for example, parses the query into a parse tree and then applies various optimizations to it, including choosing a suitable index and deciding the order in which the tables are read.

Predicate expressions are the main starting point for index design. If an index satisfies all the predicate expressions of a SELECT query, the optimizer is likely to build an efficient access path.
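As a small illustration of an optimizer choosing an access path from the predicates, the sketch below asks a database for its plan. It uses SQLite only because it ships with Python; this is a stand-in for MySQL's own EXPLAIN, and the table layout and the index name idx_cust_sex_height are assumptions made up for this example.

```python
import sqlite3

# SQLite stands in for MySQL here; the schema and index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUST (CNO INTEGER PRIMARY KEY, LNAME TEXT, SEX TEXT, HEIGHT INTEGER)"
)
conn.execute("CREATE INDEX idx_cust_sex_height ON CUST (SEX, HEIGHT)")

# Ask the optimizer which access path it would use for these predicates.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT LNAME FROM CUST WHERE SEX = 'M' AND HEIGHT > 190"
).fetchall()

# The human-readable description is the last column of each plan row.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

The exact wording of the plan differs between SQLite versions, so the sketch only prints it. The point is that the chosen index follows from the predicates in the WHERE clause.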

 

Index slices and matching columns

Suppose the index is organized as a B+ tree. For the predicate expression WHERE A > 100 AND A < 110, the query ends up at the range of leaf nodes shown in the figure below:

 

The left side of the figure shows a narrow segment of the index, which we call an index slice. This slice is scanned sequentially. For each index row whose value lies between 100 and 110, the corresponding table row is fetched from the table with a synchronous read, unless the page is already in the buffer pool.

The cost of an access path therefore depends largely on the thickness of the index slice, that is, on the range determined by the predicate expression. The thicker the slice, the more index pages must be scanned and the more index rows must be processed. The biggest cost, however, comes from the additional synchronous reads on the table: each table page read may take 10 ms of I/O. Conversely, a narrower index slice reduces the number of synchronous table reads.
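To get a feel for how the thickness of the scanned index segment drives cost, here is a rough back-of-the-envelope sketch. It assumes the worst case, in which every index row in the slice triggers one synchronous table read of 10 ms and nothing is found in the buffer pool. The function name and the row counts are hypothetical.

```python
def worst_case_table_read_ms(slice_rows, ms_per_read=10):
    """Upper bound on table read time: one synchronous read per index row in the slice.

    Assumes no table page is already in the buffer pool (worst case).
    """
    return slice_rows * ms_per_read


# A thin slice, e.g. 9 index rows with values between 100 and 110.
thin_ms = worst_case_table_read_ms(9)

# A thick slice of 500,000 index rows.
thick_ms = worst_case_table_read_ms(500_000)

print(thin_ms, "ms versus", thick_ms / 1000, "seconds")
```

Real systems will usually do better than this bound thanks to caching, but the ratio between a thin and a thick slice stays the same.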

 

Index filtering and filtering columns

Not every index column can define the size of the index slice. Sometimes a column appears both in the WHERE clause and in the index but still cannot take part in defining the slice. For example, suppose a table has a composite index (A, B, C, D) and the SQL statement contains:

WHERE   A = :A
        AND
        B > :B
        AND
        C = :C

We need to determine which predicates in the WHERE clause define the size of the index slice:

  1. Take the index columns in order and check whether the WHERE clause contains at least one sufficiently simple predicate for the column. If it does, the column is a matching column. If it does not, this column and all index columns after it are non-matching columns.
  2. If the predicate is a range predicate, all remaining index columns are non-matching columns.
  3. Any index column after the last matching column that has a sufficiently simple predicate is a filtering column.

Applying this method to the example: column A appears in an equality predicate, which is sufficiently simple, so A is a matching column. Column B appears in a range predicate and is also a matching column. Column C comes after B in the index, so it cannot define the index slice (it cannot make the slice narrower), but it can still take part in index filtering. In other words, the size of the index slice is defined by columns A and B and not by column C, but before the table is accessed the index rows can still be filtered on column C, which avoids unnecessary table accesses. Column C is a filtering column, and it is just as important as columns A and B.
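The three rules above can be sketched as a small function. This is only an illustration of the classification, not how any particular optimizer is implemented. The function name and the "eq"/"range" labels are assumptions for this example, and every predicate passed in is assumed to be sufficiently simple.

```python
def classify_index_columns(index_columns, predicates):
    """Split index columns into matching and filtering columns.

    index_columns: column names in index order, e.g. ["A", "B", "C", "D"].
    predicates: maps a column name to "eq" or "range" (hypothetical labels).
    """
    matching, filtering = [], []
    still_matching = True
    for column in index_columns:
        kind = predicates.get(column)
        if still_matching:
            if kind is None:
                # Rule 1: no predicate, so this and all later columns do not match.
                still_matching = False
            else:
                matching.append(column)
                if kind == "range":
                    # Rule 2: a range predicate ends the run of matching columns.
                    still_matching = False
        elif kind is not None:
            # Rule 3: later columns with a predicate can still filter index rows.
            filtering.append(column)
    return matching, filtering


# WHERE A = :A AND B > :B AND C = :C on index (A, B, C, D)
print(classify_index_columns(["A", "B", "C", "D"], {"A": "eq", "B": "range", "C": "eq"}))
```

For the example in the text this yields A and B as matching columns and C as a filtering column. Column D has no predicate and plays neither role.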

To sum up:

The WHERE clause above has two matching columns, A and B, which define the index slice to be scanned. In addition, column C acts as a filtering column. The table is therefore accessed only for rows that satisfy all three predicates.

If the predicate on column B were an equality predicate, all three columns would be matching columns.

If the predicate on column A were removed, the index slice would be the entire index, and columns B and C would be used for filtering only.

 

Filter factor

The filter factor describes the selectivity of a predicate, i.e., the proportion of table rows that satisfy the predicate. It depends mainly on the distribution of the column values.

The filter factor is calculated as:

number of rows in the result set / number of rows in the table

For example, suppose a user table has a SEX column. Each time a female user is added, the filter factor of SEX = 'F' becomes larger.

If 70% of the rows in the table are male, the filter factor of SEX = 'M' is 70% and that of SEX = 'F' is 30%. The worst-case filter factor of the SEX column is 70%, and the average filter factor is 50%.

If the ratio of males to females is one to one, both the worst-case and the average filter factor of the SEX column are 50%.
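The numbers above can be reproduced with a short sketch. The sample data is made up (70 male and 30 female rows), and the function name is hypothetical. The average filter factor of an equality predicate is taken as 1 divided by the number of distinct values, which is what averaging the per-value filter factors gives.

```python
from collections import Counter


def filter_factors(values):
    """Filter factor of `column = value` for every distinct value in the column."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical SEX column with 70% male rows.
sex_column = ["M"] * 70 + ["F"] * 30

factors = filter_factors(sex_column)
worst_case = max(factors.values())  # the least selective value
average = 1 / len(factors)          # mean of the per-value filter factors

print(factors, worst_case, average)
```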

 

 

When assessing whether an index is suitable, the worst-case filter factor matters more than the average filter factor. The worst case corresponds to the worst input: with that input, a query using a particular index takes the longest time.

 

Filter factors for combined predicates

How, then, do we calculate the filter factor of a combined predicate?

If the columns in the predicates are uncorrelated, the filter factor of the combined predicate can be derived from the filter factors of the individual predicates.

Uncorrelated means that the values of the two predicate columns are independent of each other. For example, suppose a user table has the two columns "province" and "city". Predicates on these two columns are correlated, because a user's city must lie in the user's province. CITY and BD (birthday), on the other hand, are uncorrelated.

For the combined predicate CITY = :CITY AND BD = :BD, the filter factor is therefore the product of the filter factors of the predicates CITY = :CITY and BD = :BD.

If column CITY has 2,000 distinct values and column BD has 2,700 distinct values, the filter factor of the combined predicate is 1/2000 * 1/2700. The column combination [CITY, BD] then has 5.4 million distinct values in total.

For correlated columns, the number of distinct combinations is much smaller than this.
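The multiplication for uncorrelated columns can be checked with exact fractions. This sketch assumes that the values of each column are evenly distributed, so that the filter factor of an equality predicate is 1 divided by the number of distinct values. The function name is hypothetical.

```python
from fractions import Fraction


def combined_filter_factor(*factors):
    """Filter factor of AND-ed predicates, valid only for uncorrelated columns."""
    result = Fraction(1)
    for factor in factors:
        result *= factor
    return result


city_ff = Fraction(1, 2000)  # CITY has 2,000 distinct values
bd_ff = Fraction(1, 2700)    # BD has 2,700 distinct values

combined = combined_filter_factor(city_ff, bd_ff)
distinct_combinations = 1 / combined

print(combined, distinct_combinations)
```

For correlated columns such as province and city this product would underestimate the real filter factor, which is why the combined predicate has to be assessed as a whole.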

When designing an index, we need to treat the combined predicates of the SQL statement as a whole when assessing the filter factor.

 

Impact of the filter factor on index design

The size of the index slice that has to be scanned clearly has a decisive influence on the performance of the access path. The smaller the filter factor, the thinner the index slice and the fewer table accesses are needed.

Suppose the table has a composite index (MAKE, MODEL, YEAR).

For the SQL statement:

SELECT PRICE, COLOR, DEALERNO
FROM CAR
WHERE   MAKE = :MAKE
        AND
        MODEL = :MODEL
ORDER BY PRICE

MAKE and MODEL are matching columns. If the filter factor of the combined predicate is 0.1%, the index slice that has to be accessed is 0.1% of the entire index.

 

For the following SQL statement, the same index is less suitable:

SELECT PRICE, COLOR, DEALERNO
FROM AUTO
WHERE   MAKE = :MAKE
        AND
        YEAR = :YEAR

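Because of the leftmost-prefix rule of composite indexes, MAKE is the only matching column. Its filter factor is 1%, so the index slice is relatively large.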
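The difference between the two queries above can be made concrete with a small sketch of the leftmost-prefix rule for equality predicates. The index size of 1,000,000 rows is an assumption chosen for illustration, and the function name is hypothetical. The filter factors 0.1% and 1% are the ones given in the text.

```python
def matching_prefix(index_columns, equality_columns):
    """Leading index columns that all have an equality predicate."""
    matched = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first gap ends the matching prefix
        matched.append(column)
    return matched


index = ("MAKE", "MODEL", "YEAR")
index_rows = 1_000_000  # hypothetical index size

# Query 1: MAKE = :MAKE AND MODEL = :MODEL, combined filter factor 0.1%
query1_match = matching_prefix(index, {"MAKE", "MODEL"})
query1_slice = index_rows // 1000

# Query 2: MAKE = :MAKE AND YEAR = :YEAR, only MAKE matches, filter factor 1%
query2_match = matching_prefix(index, {"MAKE", "YEAR"})
query2_slice = index_rows // 100

print(query1_match, query1_slice)
print(query2_match, query2_slice)
```

Under these assumptions the second query scans a slice ten times as thick. By the rules for filtering columns, YEAR could still be used to filter index rows inside that slice, but it cannot make the slice itself thinner.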

 

SQL statement:

SELECT LNAME, FNAME, CNO
FROM CUST
WHERE   SEX='M'
        AND
        (WEIGHT > 90
        OR
        HEIGHT > 190)
ORDER BY LNAME, FNAME

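This SQL statement looks for men who meet certain size requirements. Here the only matching predicate is the one on SEX, whose filter factor is normally 50%. If the table has 1 million rows, the index slice contains 500,000 rows, which is a very thick index slice.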

 

Exercise

Consider how to design the best index for each of the following two SQL statements:

SELECT LNAME, FNAME, CNO
FROM CUST
WHERE   SEX = 'M'
        AND
        HEIGHT > 190
ORDER BY LNAME, FNAME

SELECT LNAME, FNAME, CNO
FROM CUST
WHERE   SEX = 'M'
        AND
        (WEIGHT > 90
        OR
        HEIGHT > 190)
ORDER BY LNAME, FNAME

Origin www.cnblogs.com/yuanrw/p/11373975.html