Today I read a thread on ITPUB that discusses how to optimize a SQL statement:
Original Post Address: http://www.itpub.net/viewthread.php?tid=1015964&extra=&page=1
First, the problem
The statement to optimize:
CREATE TABLE aa_001 (ip VARCHAR2(28), name VARCHAR2(10), password VARCHAR2(30));
SELECT * FROM aa_001 WHERE ip IN (1, 2, 3) ORDER BY name DESC;
The table currently holds about ten million rows, and the number of values in the IN list is not fixed.
That is the statement to optimize and the situation it runs in.
Many people replied in the thread: some said it cannot be optimized, some said to replace IN with EXISTS, and some suggested an index on ip or a composite index on (ip, name).
Second, the question
Can a statement like this be optimized, and if so, how? That is what this post discusses.
Third, analysis of the problem
1, The table holds more than 10 million rows.
2, The number of values in the IN list is not fixed.
3.1 Analysis of data distribution
The original poster did not say how the data in the ip column is distributed. The possibilities are:
1, The ip values are unique or nearly unique (few duplicates).
2, The ip values are unevenly distributed (skewed): some values repeat many times, others only a few times.
3, The ip values are fairly evenly distributed but heavily duplicated: the column consists mostly of repeated values (for example, perhaps only a few thousand distinct ip values in total).
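Which of the three cases applies can be checked directly against the table. The queries below are a minimal sketch using only the table and column from the original statement; the aliases (total_rows, distinct_ips, per_ip, and so on) are names chosen here for illustration. They are written to run on both Oracle and SQL Server, but on a table of ten million rows they are full scans and should be run off-peak.

```sql
-- How many rows and how many distinct ip values are there?
-- distinct_ips close to total_rows suggests case 1 (nearly unique).
SELECT COUNT(*)           AS total_rows,
       COUNT(DISTINCT ip) AS distinct_ips
FROM aa_001;

-- How uneven is the number of rows per ip value?
-- A large gap between min and max suggests case 2 (skewed);
-- a uniformly large count per value suggests case 3.
SELECT MIN(row_count) AS min_rows_per_ip,
       MAX(row_count) AS max_rows_per_ip,
       AVG(row_count) AS avg_rows_per_ip
FROM (SELECT ip, COUNT(*) AS row_count
      FROM aa_001
      GROUP BY ip) per_ip;
```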
Solving the problem:
1, For the first distribution, create an index on the ip column. Then no matter how many rows the table has or how many values the IN list contains, the query is fast.
2, For the second distribution, an index on the ip column works poorly. Because the data is skewed, some lookups will be fast and others slow.
3, For the third distribution, an index on the ip column will definitely be slow.
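For case 1, the index could be created as sketched below. The index name idx_aa_001_ip is hypothetical. One assumption worth stating: ip is a character column, so the sketch compares it with string literals. In Oracle, comparing a VARCHAR2 column with numeric literals such as IN (1, 2, 3) makes the database convert the column to a number, which generally prevents an ordinary index on ip from being used.

```sql
-- Ordinary (non-clustered / B-tree) index on ip; the name is illustrative.
CREATE INDEX idx_aa_001_ip ON aa_001 (ip);

-- String literals match the column's character type,
-- so the index on ip remains usable for the lookup.
SELECT *
FROM aa_001
WHERE ip IN ('1', '2', '3')
ORDER BY name DESC;
```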
Note: ORDER BY name DESC is applied after the rows have been retrieved. The rows are fetched first and then sorted, not sorted before retrieval.
In cases 2 and 3, because a large number of rows must be fetched, the optimizer may choose a table scan rather than an index seek, since at that volume a scan is cheaper than looking the rows up through the index. The query is then slow, especially under high concurrency.
So how should cases 2 and 3 be handled? Is changing IN to EXISTS the answer? No: with the optimizers in SQL Server 2005 and Oracle, IN and the equivalent EXISTS perform the same. An ordinary index is inefficient in these cases. A clustered index on the ip column, however, is more efficient. Let us run a test in SQL Server 2005.
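As an aside before the test, here is what the IN and EXISTS forms look like side by side. This is a sketch only: the helper table wanted_ip is hypothetical and stands in for wherever the search values come from. Both queries ask the same question (a semi-join against the value list), which is why a modern optimizer typically gives them the same plan; comparing the actual execution plans on your own system is the way to confirm it.

```sql
-- Hypothetical helper table holding the values to search for.
CREATE TABLE wanted_ip (ip VARCHAR(28) PRIMARY KEY);
INSERT INTO wanted_ip (ip) VALUES ('1');
INSERT INTO wanted_ip (ip) VALUES ('2');
INSERT INTO wanted_ip (ip) VALUES ('3');

-- IN form
SELECT t.*
FROM aa_001 t
WHERE t.ip IN (SELECT w.ip FROM wanted_ip w)
ORDER BY t.name DESC;

-- EXISTS form: logically equivalent to the IN form above
SELECT t.*
FROM aa_001 t
WHERE EXISTS (SELECT 1 FROM wanted_ip w WHERE w.ip = t.ip)
ORDER BY t.name DESC;
```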
Table: [dbo].[zping.com], with about 200 million rows. It contains the columns Userid, id, RuleId, and others. Following the case above, we run a similar query:
SELECT * FROM [dbo].[zping.com]
WHERE Userid IN ('402881410ca47925010cb329c7670ffb', '402881ba0d5dc94e010d5dced05a0008',
                 '4028814111a735e90111a77fa8e30384')
ORDER BY RuleId DESC
First we look at how the Userid data is distributed by executing the following statement:
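The statement itself is not preserved in this copy of the post. A query of the following kind would produce such a distribution; it is an assumption about what was run, not the author's original text, and row_count is an alias chosen here.

```sql
-- Number of rows per Userid, largest groups first.
SELECT Userid, COUNT(*) AS row_count
FROM [dbo].[zping.com]
GROUP BY Userid
ORDER BY row_count DESC;
```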
The result: there are 379 values in total, with row counts ranging from 1 to 150,000, so the data distribution is clearly skewed. The figure showed part of the result.
If we create a nonclustered index on ip (Userid in this test), it is inefficient. Even if the query is forced to use the index, performance is very poor, and you will find that the I/O is many times higher than for a table scan. In this situation the only option is to build a clustered index on ip. Then look at the results.
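The comparison described above could be reproduced along these lines in T-SQL. This is a sketch under assumptions: the index names ix_zping_userid and cix_zping_userid are hypothetical, the SELECT list is assumed to be *, and the table is assumed to have no existing clustered index (SQL Server allows only one per table).

```sql
SET STATISTICS IO ON;  -- report logical/physical reads per query

-- Attempt 1: nonclustered index, forced with an index hint.
CREATE NONCLUSTERED INDEX ix_zping_userid
    ON [dbo].[zping.com] (Userid);

SELECT *
FROM [dbo].[zping.com] WITH (INDEX(ix_zping_userid))
WHERE Userid IN ('402881410ca47925010cb329c7670ffb',
                 '402881ba0d5dc94e010d5dced05a0008',
                 '4028814111a735e90111a77fa8e30384')
ORDER BY RuleId DESC;

DROP INDEX ix_zping_userid ON [dbo].[zping.com];

-- Attempt 2: clustered index, no hint needed.
CREATE CLUSTERED INDEX cix_zping_userid
    ON [dbo].[zping.com] (Userid);

SELECT *
FROM [dbo].[zping.com]
WHERE Userid IN ('402881410ca47925010cb329c7670ffb',
                 '402881ba0d5dc94e010d5dced05a0008',
                 '4028814111a735e90111a77fa8e30384')
ORDER BY RuleId DESC;
```

Comparing the "logical reads" figures printed for the two SELECT statements shows the difference in I/O between the two approaches.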
The query now uses a clustered index seek.
Take a look at the I/O statistics returned for the query:
Table '[zping.com]'. Scan count 8, logical reads 5877, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The query returns 150,000 rows with fewer than 6,000 logical reads, which is very efficient. Because 150,000 rows have to be sorted, the sort accounts for 51% of the query cost. Of course, you could build a composite clustered index on (Userid, RuleId) to improve performance further, but it makes DML more expensive to maintain, so it is not recommended.
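For reference, the composite alternative mentioned above would look roughly like this. The index name is hypothetical, and the statement assumes the table has no other clustered index at the time it is run.

```sql
-- Composite clustered index: rows are stored ordered by Userid, then RuleId.
-- Every INSERT/UPDATE/DELETE must maintain this order, which is the
-- extra DML cost the text warns about.
CREATE CLUSTERED INDEX cix_zping_userid_ruleid
    ON [dbo].[zping.com] (Userid, RuleId);
```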
As the test above shows, the optimization solutions are:
Data distribution 1: create an index on ip.
Data distributions 2 and 3: build a clustered index on the ip column.
Reproduced from: https://www.cnblogs.com/flysun0311/archive/2012/08/28/2659721.html