[Oracle] Optimizer RBO and CBO

1. What is an optimizer?

The optimizer is a core subsystem built into the Oracle database; you can think of it as a core module or core functional component of the database. Its purpose is to produce the execution plan for a target SQL statement.

There are two types of optimizers in Oracle:

  • RBO: short for Rule-Based Optimizer, the rule-based optimizer
  • CBO: short for Cost-Based Optimizer, the cost-based optimizer

RBO's judgment principle is a set of built-in rules, hard-coded into the Oracle database code. RBO selects one of the target SQL's many possible execution paths according to these rules and uses it as the execution plan. CBO's judgment principle, by contrast, is cost: CBO chooses the execution path with the smallest cost value among the target SQL's many possible execution paths as its execution plan. The cost value of each execution path is calculated from the statistics of the objects involved in the target SQL, such as tables, indexes, and columns.

The SQL statement execution process in the Oracle database can be represented by Figure 1-1

2. Rule-based optimizer RBO

Oracle no longer supports RBO after 10g, but the RBO code has never been removed from the database, which means we can still use RBO by changing the optimizer mode.

In its code, Oracle assigns a rank to each type of execution path in advance. There are 15 ranks in total, from rank 1 to rank 15, and Oracle considers a lower rank to mean higher execution efficiency. When deciding the execution path of the target SQL, if there is more than one candidate, RBO picks the execution path with the lowest rank value among the SQL's many possible execution paths as its execution plan.

For OLTP SQL statements, access via ROWID is the most efficient path, while a full table scan is the least efficient. Correspondingly, RBO's built-in rank 1 execution path is "single row by rowid" (accessing a single row of data through its rowid), and the rank 15 execution path is "full table scan".
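
As a rough sketch of this rank mechanism, RBO's choice can be modeled as picking the minimum rank. Only rank 1 and rank 15 come from the description above; the middle rank below is an illustrative assumption, not Oracle's actual table:

```python
# Toy model of RBO's rank-based plan choice. Only rank 1 ("single row
# by rowid") and rank 15 ("full table scan") are fixed by the text;
# the middle entry is an assumed example rank.
RBO_RANKS = {
    "single row by rowid": 1,   # rank 1: most efficient
    "index range scan": 9,      # assumed middle rank (illustrative)
    "full table scan": 15,      # rank 15: least efficient
}

def rbo_choose(candidate_paths):
    """RBO picks the candidate access path with the lowest rank value."""
    return min(candidate_paths, key=lambda p: RBO_RANKS[p])

plan = rbo_choose(["full table scan", "index range scan"])
```

Note that the data volume and distribution play no role here at all; only the fixed rank numbers decide.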

2.1 How does RBO choose when two columns each have an index?

(1) Create a table and establish an index

-- Create the test table
create table emp_tmp as select * from emp;
-- Create the indexes
create index idx_mgr_tmp on emp_tmp(mgr);
create index idx_deptno_tmp on emp_tmp(deptno);

(2) Switch to RBO mode and verify which index is selected by RBO

-- Switch to RBO mode
SQL> alter session set optimizer_mode=rule;

Session altered.

-- Generate the execution plan
/*
Quick reference for autotrace:
1. set autotrace off                  -- no autotrace report (the default)
2. set autotrace on explain           -- show only the optimizer execution path report
3. set autotrace on statistics        -- show only the execution statistics
4. set autotrace on                   -- show both the execution plan and the statistics
5. set autotrace traceonly            -- same as 4, but suppresses the query output
6. set autotrace traceonly explain    -- like explain plan: produces the plan without executing the statement
7. set autotrace traceonly statistics -- statistics only, without the query output
*/
SQL> set autotrace traceonly explain

-- Run the SQL statement
select * from emp_tmp where mgr > 100 and deptno > 100;

The output execution plan is as follows:

From the plan above, we can see that RBO chose the index IDX_DEPTNO_TMP and did not use the index IDX_MGR_TMP.

2.2 Adjust RBO plan

For the case in 2.1, suppose we find that the index IDX_MGR_TMP is actually more efficient than the index IDX_DEPTNO_TMP. How can we make RBO follow our advice?

2.2.1 Writing equivalent SQL

We can apply a transformation to the deptno column, for example deptno + 0, so that the IDX_DEPTNO_TMP index can no longer be used:

-- Equivalent SQL
select * from emp_tmp where mgr > 100 and deptno + 0 > 100;

The output execution plan is as follows:

From the plan above, we can see that the optimizer now obediently uses IDX_MGR_TMP. There is also another method, described in 2.2.2.

2.2.2 Modify the cache order of objects in the data dictionary

We created the index IDX_MGR_TMP first and IDX_DEPTNO_TMP second, so IDX_MGR_TMP was cached in the data dictionary cache first, followed by IDX_DEPTNO_TMP. Because the later-cached entry wins (like bricks in a wall, the last one laid sits on top), RBO chose the index IDX_DEPTNO_TMP. So we can simply drop the index IDX_MGR_TMP and rebuild it, making IDX_MGR_TMP the latecomer:

-- Drop the index
drop index IDX_MGR_TMP;

-- Rebuild the index
create index idx_mgr_tmp on emp_tmp(mgr);

-- Query again
select * from emp_tmp where mgr > 100 and deptno > 100;

The output execution plan is as follows:

Sure enough, it is possible to steer RBO's index selection by changing the caching order of objects in the data dictionary cache.

2.2.3 Changing the table order in multi-table joins

If the target SQL has two or more execution paths with the same rank value, you can adjust its execution plan by changing the order in which the objects involved appear in the SQL text. This usually applies when the target SQL joins multiple tables. When two or more execution paths share the same rank value, RBO determines the driving table and the driven table in right-to-left order of the FROM clause, and chooses the execution plan accordingly. So if we change the order in which the objects appear in the SQL text, we also change which table drives the join and which is driven, and thereby adjust the SQL execution plan.
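
This right-to-left rule can be sketched as a toy model (an illustration of the behavior described above, not Oracle's actual code):

```python
def rbo_driving_table(from_list, path_ranks):
    """Toy model of RBO's driving-table choice.
    from_list: table names in the order they appear in the FROM clause.
    path_ranks: rank of each table's best access path (lower is better)."""
    if len(set(path_ranks.values())) == 1:
        # All candidate paths tie on rank: the rightmost table drives.
        return from_list[-1]
    # Otherwise drive from the table with the worse (higher) rank,
    # probing the better-ranked table (e.g. via its index) on the inner side.
    return max(from_list, key=lambda t: path_ranks[t])

# Equal ranks: emp_tmp1 is rightmost in the FROM clause, so it drives.
driver = rbo_driving_table(["emp_tmp", "emp_tmp1"],
                           {"emp_tmp": 15, "emp_tmp1": 15})
```

In the tie case, swapping the two names in `from_list` flips the driving table; when the ranks differ, the order in the SQL text no longer matters.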

-- Create a second table
create table emp_tmp1 as select * from emp;

-- Run the SQL
select t1.mgr,t2.deptno from emp_tmp t1,emp_tmp1 t2 where t1.empno=t2.empno;

The output execution plan is as follows:

As can be seen from the plan above, the table emp_tmp1 appears on the right in the SQL text, so it is the driving table and emp_tmp is the driven table. The execution plan is a sort merge join.
Strictly speaking, a sort merge join has no notion of driving and driven tables; the terms are applied to it here only for ease of explanation.

But this conclusion has a premise: the target SQL must have two or more execution paths with the same rank value. If the rank values differ, RBO chooses the execution plan based solely on rank.

Let's join the emp table to the emp_tmp table and see what happens. The emp table has a primary key index on empno.

select t1.mgr,t2.deptno from emp t1,emp_tmp t2 where t1.empno=t2.empno;

This time the execution plan is a nested loops join, and the driving table is emp_tmp.

Now let's look at the execution plan after swapping the positions of the emp and emp_tmp tables.

As you can see, nothing changed. This verifies the conclusion: when RBO can choose the execution plan purely from the rank values of the candidate execution paths, rearranging the objects in the SQL text has no effect on the plan.

3. Cost-based optimizer CBO

RBO has obvious defects. For example, many good features of Oracle are not supported under RBO, and plans generated by RBO are hard to adjust. Those are still minor; the biggest criticism is that RBO's rules are hard-coded into the database and take no account of the actual sizes or actual data distribution of the objects involved in the target SQL. As soon as the rules do not fit the actual objects, the execution plan RBO generates is no longer optimal.
An example:

select * from emp where deptno=20

Suppose there is a single-column B-tree index named IDX_DEPTNO_TMP on deptno of the emp table. Under RBO, no matter how large EMP is, and regardless of the distribution of the DEPTNO column, Oracle will always first go to the index and then fetch the matching records from EMP by rowid. It will never full-scan EMP, because for RBO the rank value of a full table scan is higher than that of an index range scan.

This behavior of RBO causes no problem when the data volume is small, or when it is large but few records match. But if the table is large and, say, three-quarters of the records have deptno = 20, then scanning the index first and then visiting the table for each match is obviously slower than a full table scan.

Because of these shortcomings of RBO, Oracle introduced CBO starting with Oracle 7. When CBO chooses the execution path of a target SQL, its sole judgment principle is cost: CBO picks, among the statement's many possible execution paths, the one with the smallest cost value as its execution plan. The cost value of each execution path is calculated from the statistics of the objects involved in the target SQL, such as tables, indexes, and columns.

The cost in the Oracle database is essentially an estimate of the I/O, CPU, and network resources required to execute the target SQL.

Special note:
When calculating the cost of an execution path, Oracle does not necessarily carry the calculation through to the end. As soon as it finds that the partial cost computed so far already exceeds the smallest cost value seen up to that point, it immediately abandons the cost calculation for the current execution path and moves on to the next one. This continues until every possible execution path of the target SQL has been considered or a predefined threshold on the number of execution paths to evaluate is reached.
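
This early-abandonment behavior is essentially branch-and-bound pruning; a minimal sketch of the idea:

```python
def choose_plan(paths, step_costs):
    """Sketch of CBO's pruning: accumulate a path's per-step costs, but
    abandon the path as soon as the running total reaches the best
    (smallest) total cost found so far."""
    best_path, best_cost = None, float("inf")
    for path in paths:
        running = 0
        for cost in step_costs[path]:
            running += cost
            if running >= best_cost:   # partial cost already worse: prune
                break
        else:                          # no break: full cost was computed
            if running < best_cost:
                best_path, best_cost = path, running
    return best_path, best_cost

plan, cost = choose_plan(
    ["full scan", "index scan"],
    {"full scan": [40, 40], "index scan": [5, 3, 2]},
)
```

Here the "full scan" path is costed in full (80) only because it is evaluated first; had "index scan" (total 10) come first, the full scan would have been pruned after its very first step.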

3.1 Cardinality (set potential)

Cardinality is a concept specific to CBO, sometimes rendered in Chinese sources as "set potential". It refers to the number of records contained in a set; put plainly, it is the number of rows in a result set.

Cardinality actually represents an estimate of the number of records produced by a specific execution step of the target SQL. When applied to the whole statement, Cardinality represents an estimate of the number of records in the final execution result.

Cardinality is closely tied to the cost estimate, because the I/O resources consumed in producing a result set can be viewed as growing with the number of records. Therefore, the larger the Cardinality of an execution step, the larger its cost value, and the larger the total cost of the execution path that contains it.

3.2 Selectivity

Selectivity is also a concept specific to CBO. It is the ratio of the number of records returned after the specified predicate conditions are applied to the number of records in the original result set before any predicate is applied.

Selectivity ranges from 0 to 1; the smaller the value, the more selective the predicate.

Selectivity is also closely tied to the cost value, because the larger the selectivity, the larger the Cardinality of the returned result set, and therefore the larger the estimated cost.

In fact, CBO uses selectivity to estimate Cardinality. Let Original Cardinality denote the number of records in the original result set with no predicate applied, and Computed Cardinality the number of records in the result set returned after applying the specified predicates. The formula is:

Computed Cardinality = Original Cardinality *  selectivity

Although the selectivity calculation looks simple, the actual computation is quite involved, and each specific case has its own formula. In the particular case where the target column has no histogram and no NULL values, the selectivity for an equality query on that column is computed as:

selectivity = 1/NUM_DISTINCT
-- NUM_DISTINCT is the number of distinct values in the target column
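
The two formulas above combine directly; a small sketch using the numbers from the case that follows:

```python
def selectivity_equality(num_distinct):
    """Equality-predicate selectivity when the target column has no
    histogram and no NULL values: 1 / NUM_DISTINCT."""
    return 1.0 / num_distinct

def computed_cardinality(original_cardinality, selectivity):
    """Computed Cardinality = Original Cardinality * selectivity."""
    return original_cardinality * selectivity

# 4 rows, 4 distinct sal values, predicate sal = 10000:
rows = computed_cardinality(4, selectivity_equality(4))   # -> 1.0
```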

Case:

-- Create the test table
create table person(id int,name varchar2(40),sal number,addr varchar2(100));
insert into person values(1,'Jack',10000,'China');
insert into person values(2,'Tom',20000,'China');
insert into person values(3,'Alice',30000,'China');
insert into person values(4,'Json',50000,'China');


SQL> alter table person modify (sal not null);

Table altered.

SQL> create index idx_person_sal on person(sal);

Index created.
SQL> select count(1) from person;

  COUNT(1)
----------
        4


SQL> select count(distinct sal) from person;

COUNT(DISTINCTSAL)
------------------
                 4
                 
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'PERSON',estimate_percent=>100,cascade=>true,method_opt=>'for all columns size 1',no_invalidate => false);

PL/SQL procedure successfully completed.   

SQL> set linesize 800
SQL> set pagesize 900

SQL> set autotrace traceonly explain
SQL> select * from person where sal=10000;

The output execution plan is as follows:

The column Cost (%CPU) appears in the execution plan above, which tells us that CBO was used when the SQL was parsed. Here the Rows column is the Cardinality value for each execution step in the plan, and the Cost (%CPU) column records the cost value for each step.

As the output shows, the plan now uses an index range scan on IDX_PERSON_SAL, and the Rows value for the step with Id = 2 is 1, meaning CBO estimated the Cardinality of that step as 1. Likewise, the Cardinality for Id = 0 is also 1.

So how did CBO calculate these two values?

As mentioned above, when the target column has no histogram and no NULL values, the selectivity for an equality query on it is 1/4 here. Then, according to the formula

Computed Cardinality = Original Cardinality *  selectivity

it is easy to see that the number of records returned by the where condition is 4 * (1/4) = 1. So the Cardinality of the Id = 2 step is 1, and because the whole query has only this one where condition, the Cardinality of the final result is also 1.

Now let's set every value of the SAL column to 10000 and test again.

SQL> update person set sal=10000;

4 rows updated.
SQL> commit;

Commit complete.

-- Regather the statistics
SQL> exec dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'PERSON',estimate_percent=>100,cascade=>true,method_opt=>'for all columns size 1',no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> select * from person where sal=10000;

The output execution plan is as follows:

As the plan shows, the Cardinality value has changed from 1 to 4. This is easy to understand: the number of distinct values in the SAL column has dropped to 1, so the selectivity of an equality query on SAL goes from 1/4 to 1/1, and the Cardinality of the execution step, and of the final result, becomes 4 * (1/1) = 4.

Now, what if the person table held 10 million rows, all with sal = 10000? How would CBO choose then?

There is no need to actually insert 10 million rows; what matters is to make the table's statistics say 10 million. (CBO computes cost entirely from the statistics of the objects involved in the target SQL, so it is enough to change the statistics of the table person and the index IDX_PERSON_SAL.)

-- Adjust the table statistics to claim 10 million rows
SQL> exec dbms_stats.set_table_stats(ownname=>'SCOTT',tabname=>'PERSON',numrows=>10000000,no_invalidate => false);

PL/SQL procedure successfully completed.
-- Set the statistic for the number of leaf blocks of index IDX_PERSON_SAL to 100,000
SQL> exec dbms_stats.set_index_stats(ownname=>'SCOTT',indname=>'IDX_PERSON_SAL',numlblks=>100000,no_invalidate => false);

PL/SQL procedure successfully completed.
SQL> select * from person where sal=10000;

The output execution plan is as follows:

As the plan shows, the Cardinality value has become 10M (10 million). In other words, in this extreme situation CBO abandons the index and performs a full table scan.
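
A rough back-of-the-envelope model shows why: with selectivity 1, an index range scan must visit essentially every leaf block and then the table once per matching row, while a full scan reads each table block only once. The cost formulas below are simplified illustrations (the rows-per-block figure is an assumption), not Oracle's real costing:

```python
def index_access_cost(leaf_blocks, selectivity, table_rows):
    """Crude model: read the matching fraction of leaf blocks, then
    visit the table once per matching row (poorly clustered index)."""
    return leaf_blocks * selectivity + table_rows * selectivity

def full_scan_cost(table_rows, rows_per_block=100):
    """Crude model: read every table block once (assumes 100 rows per
    block and ignores multiblock I/O)."""
    return table_rows / rows_per_block

idx = index_access_cost(100_000, 1.0, 10_000_000)   # ~10.1 million visits
full = full_scan_cost(10_000_000)                   # 100,000 block reads
# full < idx, so the full table scan wins in this extreme case
```

With a selective predicate the comparison flips, which is exactly why CBO's answer depends on the statistics rather than on a fixed rule.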

Now let's switch the optimizer back to RBO and look at the result.

alter session set optimizer_mode=rule;
SQL> select * from person where sal=10000;

As the plan shows, under RBO the index is still chosen; RBO is simply not as smart as CBO here.

3.3 Transitivity

Transitivity is a concept specific to CBO. The first thing CBO does during query transformation is a possible simple rewrite of the original target SQL. This simple equivalent rewrite based on transitivity applies only to CBO; RBO does no such thing.

3.3.1 Simple predicate transitivity

t1.c1 = t2.c1 and t1.c1=10 is equivalent to t1.c1 = t2.c1 and t1.c1=10 and t2.c1=10

3.3.2 Join predicate transitivity

t1.c1=t2.c1 and t2.c1=t3.c1 is equivalent to t1.c1=t2.c1 and t2.c1=t3.c1 and t1.c1=t3.c1

3.3.3 Outer join predicate transitivity

t1.c1=t2.c1(+) and t1.c1=10 is equivalent to t1.c1=t2.c1(+) and t1.c1=10 and t2.c1(+)=10
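
All three rewrites derive new predicates by transitive closure over equalities; a sketch of the underlying idea (outer-join subtleties ignored):

```python
def transitive_closure(equalities):
    """Derive every equality implied by transitivity, union-find style.
    Terms are plain strings such as 't1.c1' or the literal '10'."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in equalities:
        parent[find(a)] = find(b)   # merge the two equivalence classes

    groups = {}
    for term in parent:
        groups.setdefault(find(term), []).append(term)

    implied = set()
    for members in groups.values():
        members.sort()
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                implied.add((members[i], members[j]))
    return implied

# t1.c1 = t2.c1 and t1.c1 = 10 also implies t2.c1 = 10:
preds = transitive_closure({("t1.c1", "t2.c1"), ("t1.c1", "10")})
```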

Case test:

-- Create two tables
create table t1(c1 number,c2 varchar2(10));
create table t2(c1 number,c2 varchar2(10));

-- Create an index on column c1 of table t2
create index idx_t2 on t2(c1);

-- Insert test data
insert into t1 values(10,'aaa');
insert into t1 values(11,'bbb');
insert into t1 values(12,'ccc');
insert into t1 values(13,'ddd');

insert into t2 values(10,'aaa');
insert into t2 values(11,'bbb');
insert into t2 values(12,'ccc');
insert into t2 values(13,'ddd');

-- Switch to the CBO optimizer
alter session set optimizer_mode=all_rows;

select t1.c1,t2.c1 from t1,t2 where t1.c1=t2.c1 and t1.c1=10

The output execution plan is:

As the plan shows, although our SQL has no simple predicate on column c1 of table t2, Oracle still uses the index on the t2 table. The access predicate for Id = 4 is 4 - access("T2"."C1"=10), which does not appear anywhere in our SQL text. This shows the predicate condition was equivalently rewritten; the SQL was rewritten as:

select t1.c1,t2.c1 from t1,t2 where t1.c1=t2.c1 and t1.c1=10 and t2.c1=10

3.4 Limitations

CBO was born to cure the congenital defects of RBO, and as Oracle versions have evolved, CBO has become ever smarter and more powerful. But that does not make CBO perfect. Its main defects are:

(1) By default, CBO assumes that the columns appearing in the WHERE clause of the target SQL are independent of one another, with no correlation.

(2) CBO will assume that all target SQLs are executed separately and do not interfere with each other.

The index leaf blocks and data blocks that the target SQL needs may already have been cached in the Buffer Cache by previously executed SQL, so no physical I/O against disk is needed; the data can be read straight from the cache. Because CBO computes cost values as if each statement runs in isolation, without considering caching, it may overestimate the cost of the relevant indexes and thus pick the wrong execution plan.

(3) CBO has many restrictions on histogram statistics

(4) CBO may miss the correct execution plan when parsing target SQL that joins multiple tables.
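
Limitation (1) matters in practice: for AND-ed predicates CBO multiplies the individual selectivities, which understates the combined selectivity whenever the columns are correlated. A minimal illustration (the month/season figures below are made-up assumptions):

```python
def combined_selectivity_independent(selectivities):
    """CBO's default model: AND-ed predicates are independent, so their
    selectivities simply multiply."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result

# Correlated columns, e.g. month = 12 AND season = 'winter':
# the true combined selectivity is about 1/12 (the month implies the
# season), but the independence assumption yields 1/48, underestimating
# the row count by a factor of 4.
est = combined_selectivity_independent([1/12, 1/4])
```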

Origin www.cnblogs.com/OliverQin/p/12723891.html