SQLite database query optimization

SQLite is a typical embedded DBMS. It has many advantages: it is lightweight and compiles to a small binary. One reason is that its query optimization is relatively simple and relies only on the index mechanism. Based on an analysis of SQLite's query optimization and a study of its source code, I summarize SQLite query optimization as follows.

First, the factors that affect query performance:

1. The number of rows retrieved from the table; the fewer, the better.

2. Whether the result has to be sorted.

3. Whether an index can be used.

4. The form of the query statement.

Second, several query optimization conversions

1. For a single column of a single table, suppose there are several terms of the form column = expr that are connected by the OR operator, for example: x = expr1 OR expr2 = x OR x = expr3. Because OR cannot be optimized with an index in SQLite, these terms can be converted into a single term with the IN operator: x IN (expr1, expr2, expr3). This form can use an index, and the effect is noticeable. If there is no index at all, however, the OR form executes slightly faster than the IN form.

2. If the operator of a term is BETWEEN, SQLite likewise cannot optimize it with an index directly, so an equivalent conversion is applied: a BETWEEN b AND c is converted into (a BETWEEN b AND c) AND (a >= b) AND (a <= c). Here (a >= b) and (a <= c) are dynamically generated terms that are set as children of the (a BETWEEN b AND c) term. If the BETWEEN term has already been coded, the child terms are ignored; if an available index lets the child terms satisfy the condition, the parent BETWEEN term is ignored.

3. If the operator of a term is LIKE, the following conversion is applied: x LIKE 'abc%' is converted into x >= 'abc' AND x < 'abd'. LIKE itself cannot be optimized with an index in SQLite. If an index exists, the converted and unconverted forms therefore differ greatly, because the index has no effect on LIKE. If no index exists, LIKE is still less efficient than the converted form.
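The three conversions above can be checked for equivalence with a minimal sketch that uses Python's standard sqlite3 module. The table t, its column x and the sample values are assumptions made up for illustration. The data is lowercase only, because LIKE is case-insensitive for ASCII letters by default and the range form is equivalent to it only under that assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")          # hypothetical table
conn.execute("CREATE INDEX t_x ON t (x)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("abc",), ("abcd",), ("abd",), ("abz",), ("b",), ("xyz",)],
)


def rows(sql):
    """Run a query and return its single-column result as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))


# (original form, converted form) pairs taken from the three items above
pairs = [
    ("SELECT x FROM t WHERE x = 'abc' OR 'b' = x OR x = 'xyz'",
     "SELECT x FROM t WHERE x IN ('abc', 'b', 'xyz')"),
    ("SELECT x FROM t WHERE x BETWEEN 'abc' AND 'abd'",
     "SELECT x FROM t WHERE x >= 'abc' AND x <= 'abd'"),
    ("SELECT x FROM t WHERE x LIKE 'abc%'",
     "SELECT x FROM t WHERE x >= 'abc' AND x < 'abd'"),
]

results = [(rows(before), rows(after)) for before, after in pairs]
for before_rows, after_rows in results:
    print(before_rows == after_rows, before_rows)
```

Each pair returns the same rows on this data, so the conversions change only how the condition can be evaluated, not what it selects.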

Third, processing of several (compound) query statements

1. The query has the form: <selectA> <operator> <selectB> ORDER BY <orderbylist>

Execution method: <operator> is one of UNION ALL, UNION, EXCEPT, or INTERSECT. The statement is executed by first running and sorting selectA and selectB, and then scanning the two results together. For the four operators above, execution is divided into seven subroutines:

outA: put a row of the selectA result into the final result set

outB: put a row of the selectB result into the final result set (only for UNION ALL and UNION; the other operators never put a row of selectB into the final result set)

AltB: called when the current row of selectA is less than the current row of selectB

AeqB: called when the current row of selectA is equal to the current row of selectB

AgtB: called when the current row of selectA is greater than the current row of selectB

EofA: called when the selectA result has been fully traversed

EofB: called when the selectB result has been fully traversed

The four operators execute the subroutines as follows:

          UNION ALL      UNION          EXCEPT         INTERSECT
  AltB:   outA, nextA    outA, nextA    outA, nextA    nextA
  AeqB:   outA, nextA    nextA          nextA          outA, nextA
  AgtB:   outB, nextB    outB, nextB    nextB          nextB
  EofA:   outB, nextB    outB, nextB    halt           halt
  EofB:   outA, nextA    outA, nextA    outA, nextA    halt
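The merge described by the seven subroutines can be sketched in Python as follows. This is an illustration of the table, not SQLite's actual code. The function name merge_compound and the use of plain lists are assumptions. The sketch also assumes that for every operator except UNION ALL the output step skips a row equal to the row output just before it, which the sorted inputs make easy to detect.

```python
def merge_compound(rows_a, rows_b, op):
    """Merge two result sets for UNION ALL, UNION, EXCEPT or INTERSECT."""
    a, b = sorted(rows_a), sorted(rows_b)   # both inputs are sorted first
    out = []

    def emit(row):
        # outA / outB: assumed to drop a repeat of the previous output row
        # for every operator except UNION ALL.
        if op != "UNION ALL" and out and out[-1] == row:
            return
        out.append(row)

    i = j = 0
    while i < len(a) or j < len(b):
        if i >= len(a):                      # EofA
            if op not in ("UNION ALL", "UNION"):
                break                        # halt
            emit(b[j]); j += 1               # outB, nextB
        elif j >= len(b):                    # EofB
            if op == "INTERSECT":
                break                        # halt
            emit(a[i]); i += 1               # outA, nextA
        elif a[i] < b[j]:                    # AltB
            if op != "INTERSECT":
                emit(a[i])                   # outA
            i += 1                           # nextA
        elif a[i] == b[j]:                   # AeqB
            if op in ("UNION ALL", "INTERSECT"):
                emit(a[i])                   # outA
            i += 1                           # nextA
        else:                                # AgtB
            if op in ("UNION ALL", "UNION"):
                emit(b[j])                   # outB
            j += 1                           # nextB
    return out


A, B = [1, 2, 2, 3], [2, 3, 4]
for operator in ("UNION ALL", "UNION", "EXCEPT", "INTERSECT"):
    print(operator, merge_compound(A, B, operator))
```

Because both inputs are sorted, one pass over the two results is enough for all four operators; only the action taken in each of the five comparison states differs.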

2. Where possible, a query that uses DISTINCT can be converted into a query that uses GROUP BY, because GROUP BY can sometimes use an index, while DISTINCT does not use one.
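A small sketch of the two equivalent forms, using Python's sqlite3 module. The orders table, its columns and its index are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")  # hypothetical
conn.execute("CREATE INDEX orders_customer ON orders (customer)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("bob", 7), ("ann", 9), ("cid", 1), ("bob", 2)],
)

# The same set of customers, once with DISTINCT and once with GROUP BY.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer").fetchall()
group_rows = conn.execute(
    "SELECT customer FROM orders GROUP BY customer ORDER BY customer").fetchall()

print(distinct_rows == group_rows, group_rows)
```

Both statements return each customer once; whether one form is actually faster depends on the SQLite version and the available indexes.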

Fourth, subquery flattening

Example: SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5

By default, such a statement is executed by running the inner query first and storing its result in a temporary table, and then running the outer query against that table. The data is therefore processed twice. In addition, the temporary table has no index, so the outer query cannot be optimized. If the statement above is flattened, it becomes: SELECT x+y AS a FROM t1 WHERE z<100 AND a>5. The result is clearly the same as above, but the data only needs to be queried once, and if there is an index on table t1, a full table scan can be avoided as well.
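The nested statement and its flattened form can be compared with a minimal sketch that uses Python's sqlite3 module. The sample rows in t1 are assumptions made up for illustration. The flattened statement relies on SQLite accepting the result-column alias a in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER, y INTEGER, z INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 2, 10), (3, 4, 50), (5, 6, 99), (7, 8, 100), (0, 1, 5)],  # sample data
)

nested = "SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5"
flattened = "SELECT x+y AS a FROM t1 WHERE z<100 AND a>5"

nested_rows = sorted(conn.execute(nested).fetchall())
flattened_rows = sorted(conn.execute(flattened).fetchall())

print(nested_rows == flattened_rows, flattened_rows)
```

Both statements select the same rows; the flattened one simply expresses the two filters as a single pass over t1.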

Conditions under which the flattening optimization can be applied:

1. The subquery and the outer query do not both use aggregate functions.

2. The subquery does not use aggregate functions, or the outer query is not a join.

3. The subquery is not the right operand of a left outer join.

4. The subquery is not DISTINCT, or the outer query is not a join.

5. The subquery is not DISTINCT, or the outer query does not use aggregate functions.

6. The subquery does not use aggregate functions, or the outer query is not DISTINCT.

7. The subquery has a FROM clause.

8. The subquery does not use LIMIT, or the outer query is not a join.

9. The subquery does not use LIMIT, or the outer query does not use aggregate functions.

10. The subquery does not use aggregate functions, or the outer query does not use LIMIT.

11. The subquery and the outer query do not both have an ORDER BY clause.

12. The subquery and the outer query do not both use LIMIT.

13. The subquery does not use OFFSET.

14. The outer query is not part of a compound query, or the subquery does not have both an ORDER BY and a LIMIT clause.

15. The outer query does not use aggregate functions, or the subquery does not contain ORDER BY.

16. Flattening of compound subqueries: the subquery is not a compound query, or it is a UNION ALL compound query made up entirely of non-aggregate queries, and its parent query is not itself part of a compound query, is not an aggregate or DISTINCT query, and has no other tables or subqueries in its FROM clause. The parent query and the subquery may contain WHERE clauses. Subject to conditions 11, 12 and 13 above, they may also contain ORDER BY, LIMIT and OFFSET clauses.

Example:

SELECT a+1 FROM (
    SELECT x FROM tab
    UNION ALL
    SELECT y FROM tab
    UNION ALL
    SELECT abs(z*2) FROM tab2
) WHERE a!=5 ORDER BY 1

is converted into:

SELECT x+1 FROM tab WHERE x+1!=5
UNION ALL
SELECT y+1 FROM tab WHERE y+1!=5
UNION ALL
SELECT abs(z*2)+1 FROM tab2 WHERE abs(z*2)+1!=5
ORDER BY 1

17. If the subquery is a compound query, then every term of the parent query's ORDER BY clause must be a simple reference to a column of the subquery.

18. The subquery does not use LIMIT, or the outer query does not have a WHERE clause.

Subquery flattening is implemented by a dedicated function:

static int flattenSubquery(
  Parse *pParse,       /* Parsing context */
  Select *p,           /* The parent or outer SELECT statement */
  int iFrom,           /* Index in p->pSrc->a[] of the inner subquery */
  int isAgg,           /* True if outer SELECT uses aggregate functions */
  int subqueryIsAgg    /* True if the subquery uses aggregate functions */
)

It is implemented in the file select.c. Clearly, for a more complex query, flattening the statement optimizes the query as long as the conditions above are met. If a suitable index also exists, the effect is even better.

Fifth, join queries

Before the query result is returned, the related rows of every table must be joined. SQLite implements this with nested loops. In earlier versions, the leftmost table was the outermost loop and the rightmost table the innermost loop. When joining two or more tables, a table that has an index should therefore be placed in the inner loop, that is, at the end of the FROM clause. For each row selected from the preceding tables, the matching rows in the following table have to be found. With an index this is fast; without one the whole table has to be traversed, which is very inefficient. In newer versions this optimization is performed automatically.
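The effect of an index on the inner loop can be illustrated with a small Python sketch. It is a toy model, not SQLite's implementation: rows are (key, payload) tuples, a dict stands in for the index, and the names nested_loop_join, build_index, customers and orders are made up for the example.

```python
def build_index(rows):
    """Map each join key to the rows that carry it (stand-in for an index)."""
    index = {}
    for key, payload in rows:
        index.setdefault(key, []).append((key, payload))
    return index


def nested_loop_join(outer_rows, inner_rows, inner_index=None):
    """Join two row lists on equal keys.

    Returns (joined_rows, inner_rows_visited) so the work done by the
    inner loop can be compared with and without an index.
    """
    joined = []
    visited = 0
    for key, outer_payload in outer_rows:              # outer loop
        if inner_index is not None:
            candidates = inner_index.get(key, [])      # index lookup
        else:
            candidates = inner_rows                    # full scan
        for inner_key, inner_payload in candidates:    # inner loop
            visited += 1
            if inner_key == key:
                joined.append((key, outer_payload, inner_payload))
    return joined, visited


customers = [(1, "ann"), (2, "bob"), (3, "cid")]
orders = [(1, "book"), (3, "pen"), (1, "lamp"), (2, "desk")]

scan_result, scan_visited = nested_loop_join(customers, orders)
index_result, index_visited = nested_loop_join(
    customers, orders, build_index(orders))

print(scan_result == index_result, scan_visited, index_visited)
```

Without the index the inner table is traversed once per outer row; with it, only the matching rows are visited, which is why the indexed table belongs in the inner loop.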

The optimization method is as follows:

For each table in the query, the statistics of the table's indexes are examined. The cost is first initialized to SQLITE_BIG_DBL (a constant defined by the system):

1) If there is no index, check whether the query has a condition on the rowid of the table:

1. If there is a term rowid = EXPR, the cost estimate for the table is returned immediately: the cost is 0, the number of rows returned is 1, and the estimate for this table is complete.

2. If there is no rowid = EXPR but there is rowid IN (...), and the IN operand is a list, the number of rows returned is the number N of elements in the list, and the cost is estimated as N log N.

3. If the IN operand is not a list but the result of a subquery, the size of the subquery result is not known, so a value can only be guessed: the number of rows returned is taken to be 100 and the cost 200.

4. If the query is a range on rowid, the matching rows are estimated to be one third of all rows (the total number of rows is estimated at 1,000,000), and the estimated cost equals that number of rows.

5. If the query also requires sorting, the sorting cost N log N is added.

6. If the cost obtained is less than the lowest cost so far, the lowest cost is updated; otherwise it is left unchanged.

2) If the WHERE clause contains the OR operator, the terms connected by OR are analyzed separately:

1. If a term is itself made up of terms connected by AND, the terms connected by AND are analyzed in turn.

2. If a term has the form X <op> <expr>, that term is analyzed.

3. The total cost of the whole OR operation is then calculated.

4. If the query requires sorting, the sorting cost N log N is added to the total cost above.

5. If the cost obtained is less than the lowest cost so far, the lowest cost is updated; otherwise it is left unchanged.

3) If there are indexes, the statistics of each index of the table are examined. For each index:

1. For each column of the index, look for a WHERE term that can use it (the operator must be = or IN (...)). If no such term is found, the loop over the columns of this index ends. If one is found, the operator of the term is checked: for =, there is no extra cost; for IN (sub-select), the extra cost factor inMultiplier is estimated at 25; for IN (list), the extra factor is N, where N is the number of elements in the list.

2. The total number of rows in the query result and the total cost are then calculated:

nRow = pProbe->aiRowEst[i] * inMultiplier;   /* estimated number of rows */

cost = nRow * estLog(inMultiplier);   /* estimated cost */

3. If there is no term whose operator is = or IN (...) but there is a range condition, the number of rows is estimated at nRow / 3 and the cost at cost / 3.

4. Likewise, if the query requires sorting, N log N is added to the total cost above.

5. If the cost obtained is less than the lowest cost so far, the lowest cost is updated; otherwise it is left unchanged.

4) The process above gives the total cost of querying one table (the sum of the individual costs above). The same is then done for the second table, and so on, until the cost of every table in the FROM clause has been calculated. The table with the lowest cost is then chosen for the next level of the nested loop, and repeating this selection yields the nesting order of the whole nested loop. This order is the optimal one, which achieves the purpose of the optimization.

5) The nesting order of the loops is therefore not necessarily the same as the order of the tables in the FROM clause, because the optimizer rearranges the order according to the indexes during execution.
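The estimates in steps 1) and 3) can be written down as a small Python sketch. It follows the description in this article only and is not SQLite's source code. The helper est_log, the value used for SQLITE_BIG_DBL, and all function and parameter names are assumptions made for illustration.

```python
import math

SQLITE_BIG_DBL = 1e99   # stand-in value for the constant named above


def est_log(n):
    """Rough logarithm for the N log N terms (assumed helper, at least 1)."""
    return math.log2(n) if n > 2 else 1.0


def with_sort(n_row, cost, needs_sort):
    """Add the N log N sorting cost when the result must be sorted."""
    return cost + n_row * est_log(n_row) if needs_sort else cost


def rowid_cost(kind, list_len=0, total_rows=1_000_000, needs_sort=False):
    """Return (rows, cost) for a condition on rowid, following step 1)."""
    if kind == "eq":                 # rowid = EXPR: one row, no cost
        return 1, 0.0
    if kind == "in_list":            # rowid IN (list): N rows, N log N
        n_row, cost = list_len, list_len * est_log(list_len)
    elif kind == "in_select":        # rowid IN (subquery): guessed values
        n_row, cost = 100, 200.0
    elif kind == "range":            # rowid range: one third of all rows
        n_row = total_rows / 3
        cost = n_row
    else:
        raise ValueError(kind)
    return n_row, with_sort(n_row, cost, needs_sort)


def index_cost(row_est, in_multiplier=1, is_range=False, needs_sort=False):
    """Return (rows, cost) for one index, following step 3)."""
    n_row = row_est * in_multiplier
    cost = n_row * est_log(in_multiplier)
    if is_range:                     # range condition: divide both by 3
        n_row, cost = n_row / 3, cost / 3
    return n_row, with_sort(n_row, cost, needs_sort)


def cheapest(candidates):
    """Keep the candidate whose cost is lower than the lowest seen so far."""
    best_name, best_cost = None, SQLITE_BIG_DBL
    for name, (_, cost) in candidates.items():
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


candidates = {
    "rowid range": rowid_cost("range", total_rows=9000),
    "index =": index_cost(row_est=10),
    "index IN (4 values)": index_cost(row_est=10, in_multiplier=4),
}
best = cheapest(candidates)
print(best)
```

The sketch keeps the number of rows and the cost as separate values, as the two source lines quoted in step 3) do, and starts the comparison from SQLITE_BIG_DBL so that any real candidate replaces it.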

Sixth, indexes

SQLite has the following kinds of index:

1) Single-column index

2) Multi-column index

3) Unique index

4) A primary key declared as INTEGER PRIMARY KEY: the table is kept sorted by this column by default, so although no index is created for it in the data dictionary, it works like an index. Creating a separate index on such a primary key therefore wastes space and brings no benefit.
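The four kinds can be seen side by side in a short sketch with Python's sqlite3 module. The person table and the index names are hypothetical. The query against sqlite_master lists the indexes that SQLite records for the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id    INTEGER PRIMARY KEY,   -- no separate index is recorded for this
    name  TEXT,
    city  TEXT,
    email TEXT
);
CREATE INDEX person_name ON person (name);               -- single-column
CREATE INDEX person_city_name ON person (city, name);    -- multi-column
CREATE UNIQUE INDEX person_email ON person (email);      -- unique
""")

index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'person'"
    )
)
print(index_names)
```

Only the three explicitly created indexes are listed; the INTEGER PRIMARY KEY column does not add an entry of its own.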

Notes on using indexes:

1) A small table does not need an index.

2) If a table is frequently inserted into or updated, use indexes sparingly.

3) Do not create too many indexes on a table. With too many indexes, SQLite may not choose the best one when it executes a query; one solution is to create a clustered index.

When an index can be used:

1) With the operators =, >, <, IN, and so on.

2) The operators BETWEEN, LIKE and OR cannot use an index directly, so queries that use them should be rewritten.

For example, BETWEEN: SELECT * FROM mytable WHERE myfield BETWEEN 10 and 20;

This should be converted to:

SELECT * FROM mytable WHERE myfield >= 10 AND myfield <= 20;

If there is an index on myfield, it can now be used, which greatly improves the speed.

Another example, LIKE: SELECT * FROM mytable WHERE myfield LIKE 'sql%';

This should be converted to:

SELECT * FROM mytable WHERE myfield >= 'sql' AND myfield < 'sqm';

If there is an index on myfield, it can now be used, which greatly improves the speed.

Another example, OR: SELECT * FROM mytable WHERE myfield = 'abc' OR myfield = 'xyz';

This should be converted to:

SELECT * FROM mytable WHERE myfield IN ('abc', 'xyz');

If there is an index on myfield, it can now be used, which greatly improves the speed.

3) Some conditions cannot use an index at all, so the whole table has to be traversed, for example:

SELECT * FROM mytable WHERE myfield % 2 = 1;

SELECT * FROM mytable WHERE substr(myfield, 1, 1) = 'w';

SELECT * FROM mytable WHERE length(myfield) < 5;
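Whether a particular statement can use an index can be checked with EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module; the id column, the index name idx_myfield and the helper query_plan are assumptions for illustration. The exact wording of the plan differs between SQLite versions, so the output is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, myfield TEXT)")
conn.execute("CREATE INDEX idx_myfield ON mytable (myfield)")


def query_plan(sql):
    """Return the detail text of every EXPLAIN QUERY PLAN row, joined."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[3] for row in plan_rows)   # 4th column is the detail


# A range condition on the indexed column.
plan_range = query_plan(
    "SELECT * FROM mytable WHERE myfield >= 'sql' AND myfield < 'sqm'")

# A condition that wraps the column in a function call.
plan_func = query_plan(
    "SELECT * FROM mytable WHERE length(myfield) < 5")

print(plan_range)
print(plan_func)
```

A detail line that begins with SEARCH generally indicates that an index or the rowid is used to locate rows, while a line that begins with SCAN indicates that every row is visited.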

Reproduced from: https://www.cnblogs.com/kevinGao/archive/2012/06/18/2555414.html

Origin blog.csdn.net/weixin_34137799/article/details/93342846