Troubleshooting and Optimizing a Puzzling SQL Performance Problem

Author: Daiqiu Long. Reprinted from: the Oracle all-in-one machine user group


About the Author

Daiqiu Long has more than eight years of experience optimizing the Oracle databases behind core systems in the telecommunications, insurance, and taxation industries, and has an extensive industry service background. With a deep understanding of the Oracle database, the author specializes in database troubleshooting and database performance tuning.

 

Background:

A customer reported that one SQL statement had relatively high logical reads and needed to be optimized. The customer also provided an AWR report, in which several of the top SQL statements showed similar problems.

SQL_ID: g4nbv7twn23fw, cost: 3,000 logical reads per execution, 400,000 executions per hour

SELECT * FROM (SELECT XX.*, ROWNUM AS RN FROM (select count(*) from PARTY_CERT P inner join CUSTOMER C on P.PARTY_ID = C.PARTY_ID and C.STATUS_CD = '1100' where P.PARTY_ID in (:1 ) and P.STATUS_CD in (:2 ) and P.IS_DEFAULT = '1') XX WHERE ROWNUM <= 1000 ) XXX WHERE RN > 0

Analysis:

Substituting the bind variable values into the SQL and running it produced only 9 logical reads, which does not match the AWR report.
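The article does not say how the bind values were obtained. As one possible approach, sampled bind values for a cursor can be read from V$SQL_BIND_CAPTURE. This is only a sketch: Oracle captures binds periodically, so the view shows samples rather than the values of every execution, which matters when only some values are expensive.

```sql
-- Sampled bind values for the statement under investigation.
-- V$SQL_BIND_CAPTURE holds periodic samples only, not every execution.
SELECT name,
       position,
       datatype_string,
       value_string,
       last_captured
FROM   v$sql_bind_capture
WHERE  sql_id = 'g4nbv7twn23fw'
ORDER  BY child_number, position;
```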

 



Some readers may think the performance problem is the Cartesian product at ID = 5 of the execution plan, but as it turned out, that is not where the problem lies. The analysis stalled at this point, but the ASH views might offer a clue.


ASH analysis shows that most of the performance overhead is in step 9 of the execution plan: the table lookup on table C (CUSTOMER) that follows the index access.
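A minimal sketch of this kind of ASH breakdown, assuming the samples are still in memory (older samples live in DBA_HIST_ACTIVE_SESS_HISTORY) and that the Diagnostics Pack is licensed. The exact query the author ran is not shown in the article.

```sql
-- Count ASH samples per execution plan line for one SQL_ID.
-- The plan line with the most samples is where most of the time is spent.
SELECT sql_plan_line_id,
       sql_plan_operation,
       sql_plan_options,
       COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sql_id = 'g4nbv7twn23fw'
GROUP  BY sql_plan_line_id, sql_plan_operation, sql_plan_options
ORDER  BY samples DESC;
```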

 

The SQL uses two columns of table C: C.PARTY_ID and C.STATUS_CD. The existing index covers only PARTY_ID, so the table is visited just to read the STATUS_CD column.

 

The recommendation is to create a composite index on C (PARTY_ID, STATUS_CD) so that the table lookup is avoided.
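As a sketch, the recommended index could be created as follows. The index name is hypothetical; the article gives only the table and the column list.

```sql
-- Composite index covering both columns the query reads from CUSTOMER,
-- so the join and the STATUS_CD filter can be resolved from the index alone.
-- idx_customer_party_status is a hypothetical name.
CREATE INDEX idx_customer_party_status
    ON customer (party_id, status_cd);
```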

 

The optimization recommendation for this SQL is therefore to create that index.

 

After the composite index was created, several later AWR reports showed the SQL averaging 330 logical reads per execution.

 

Can the optimization go further?

Before the optimization, running the SQL with the bind values substituted took 9 logical reads, yet the AWR report showed an average of 3,000. Could uneven data distribution be the cause?

 

Investigation showed that the PARTY_ID column of table C has a selectivity of 98%. Troubleshooting continued with the bind variable values in mind.

Figure: One value occurs 100,000 times in the table; every other value occurs only once.
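A distribution check of this kind can be sketched as below. The query is illustrative and not taken from the article; it lists the most frequent PARTY_ID values in CUSTOMER.

```sql
-- Top 10 most frequent PARTY_ID values in CUSTOMER.
-- A single value with a very large count signals skewed data.
SELECT *
FROM  (SELECT party_id,
              COUNT(*) AS cnt
       FROM   customer
       GROUP  BY party_id
       ORDER  BY cnt DESC)
WHERE  ROWNUM <= 10;
```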

 

When PARTY_ID = 15151723602037, the table lookup has to be performed 100,000 times. With this value substituted into the SQL, it takes 7,770 logical reads per execution. It is this value that pulls the average logical reads up to around 3,000. A recommendation for this problem has already been given above. Can it be optimized further?

 

Discussion: The following explores what can be done without creating a new index.

Since the data is unevenly distributed, can performance be improved by gathering a histogram? The answer is no.

 

This was tested in a test environment.

(A test table, CUSTOMER_test, was created, all the data was imported, the relevant index was built, and a histogram was gathered.) When the SQL was then executed, its efficiency was even worse: 150,000 logical reads per execution.
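A sketch of the histogram step, under stated assumptions: the test table lives in the current schema, the histogram is gathered on PARTY_ID (the article does not name the column), and 254 buckets are requested. None of these details are given in the article.

```sql
-- Gather table statistics with a histogram on PARTY_ID only.
-- Assumptions: table owned by the current user, histogram column is PARTY_ID.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'CUSTOMER_TEST',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS PARTY_ID SIZE 254',
    cascade    => TRUE);
END;
/

-- Confirm which columns now carry a histogram.
SELECT column_name, num_distinct, histogram, num_buckets
FROM   user_tab_col_statistics
WHERE  table_name = 'CUSTOMER_TEST';
```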


 

Back to the SQL. On analysis, it only needs a count(1), a pure counting statistic, so a semi-join is worth considering.

 

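Whether the join can be replaced by a semi-join has to be confirmed with the business, because an inner join counts one row per matching CUSTOMER row while a semi-join counts each PARTY_CERT row at most once, so the two can return different counts. (The business side is not discussed here, only how to optimize for this data distribution.) With data this unevenly distributed, the semi-join performs better.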

 

The rewritten SQL (with the most frequent value substituted in):

SELECT * FROM (SELECT XX.*, ROWNUM AS RN  FROM (select count(1)
from CUST_YC_APP.PARTY_CERT P where P.PARTY_ID in (15151723602037)
And P.PARTY_ID in( select C.PARTY_ID from  CUSTOMER_test  C
where C.STATUS_CD = '1100' )  and P.STATUS_CD in ('1000')  
and P.IS_DEFAULT = '1') XX WHERE ROWNUM <= 1000) XXX WHERE RN > 0;

 

Without hints, the plan uses a full table scan: 1,286 logical reads per execution.

 

With the hint /*+ nl_sj index(C) */ added: 9 logical reads per execution.
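The article does not show where the hint was placed. As a sketch, NL_SJ is specified inside the subquery it applies to, so the hinted statement might look like the following. The pagination wrapper is left out here for brevity.

```sql
-- Semi-join rewrite with the hint placed in the subquery block.
-- nl_sj requests a nested loops semi-join; index(C) requests index access on C.
SELECT count(1)
FROM   CUST_YC_APP.PARTY_CERT P
WHERE  P.PARTY_ID IN (15151723602037)
AND    P.PARTY_ID IN (SELECT /*+ nl_sj index(C) */ C.PARTY_ID
                      FROM   CUSTOMER_test C
                      WHERE  C.STATUS_CD = '1100')
AND    P.STATUS_CD IN ('1000')
AND    P.IS_DEFAULT = '1';
```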

 

The SQL does not choose the best execution plan on its own; it only does so when forced by hints.

How can it be made to choose the best execution plan automatically?

 

  • Delete the histogram.

After the histogram is deleted, P.PARTY_ID in (15151723602037) still matches a large number of rows, but the CBO estimates just one row and goes straight to a hash join. (Sometimes, combining this predicate with C.PARTY_ID = P.PARTY_ID, it infers C.PARTY_ID = 15151723602037 and joins by Cartesian product, similar to the plan at the beginning.) This is not the best execution plan either.

 

With the histogram gathered, the plan takes the index path; with the histogram deleted, it uses a hash join or a Cartesian product. Neither is a semi-join.
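One way to delete only the histogram while keeping the other column statistics, sketched under the assumption of Oracle 11g or later (where the col_stat_type parameter is available) and of a test table owned by the current user. The article does not show the command that was actually used.

```sql
-- Remove just the histogram on PARTY_ID; base column statistics are kept.
-- Assumptions: Oracle 11g or later, table owned by the current user.
BEGIN
  DBMS_STATS.DELETE_COLUMN_STATS(
    ownname       => USER,
    tabname       => 'CUSTOMER_TEST',
    colname       => 'PARTY_ID',
    col_stat_type => 'HISTOGRAM');
END;
/
```

Re-gathering the column with SIZE 1 in method_opt is an alternative that also leaves the column without a histogram.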

 

It seems we are stuck.

 

  • Set the selectivity statistics manually.

The goal is to help the CBO estimate the rows returned from table P. The accuracy requirement is low: as long as the estimate is only a few rows, the CBO will lean toward the semi-join.

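DBMS_STATS.set_column_stats(colname =>'PARTY_ID',distcnt => 1645919);
(The ownname and tabname arguments, which the procedure requires, are omitted here. 1645919 is roughly 30% of the total row count.)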

Test the SQL and see whether it now chooses the best execution plan on its own.

 

The execution plan does indeed use a nested loops semi-join with index access. This is the best execution plan found so far: 9 logical reads per execution.
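A plan like this can be checked with DBMS_XPLAN, sketched below. The gather_plan_statistics hint and the 'ALLSTATS LAST' format are assumptions about how to verify the result, not the method shown in the article. In SQL*Plus, SET SERVEROUTPUT OFF is needed so that the last cursor is the test statement.

```sql
-- Run the statement once with rowsource statistics enabled.
SELECT /*+ gather_plan_statistics */ count(1)
FROM   CUST_YC_APP.PARTY_CERT P
WHERE  P.PARTY_ID IN (15151723602037)
AND    P.PARTY_ID IN (SELECT C.PARTY_ID
                      FROM   CUSTOMER_test C
                      WHERE  C.STATUS_CD = '1100')
AND    P.STATUS_CD IN ('1000')
AND    P.IS_DEFAULT = '1';

-- Show the actual plan of the last statement executed in this session.
-- The Buffers column reports logical reads per plan line.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```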


Summary:

Six points are worth noting from the analysis and optimization of this SQL:

 

  • The Cartesian product join is not the performance bottleneck.

  • The data distribution is skewed: one heavily repeated value causes a very large number of table lookups after the index access.

  • Given this data distribution, rewriting the SQL as a semi-join reduces the cost significantly.

  • Because of the skewed distribution, gathering a histogram leads to a full table scan of the large table when testing with the skewed value, while not gathering one leads to a hash join. Neither is the nested loops semi-join we want.

  • Setting the statistics manually both pins the index scan (for this table, the index scan is the most efficient access path whatever the data) and produces the best join method, the nested loops semi-join.

  • The optimization finally implemented was the most straightforward solution, the composite index, rather than the options explored here of rewriting the SQL text and setting statistics. The results were still quite good.


Origin blog.csdn.net/yangjianrong1985/article/details/102493808