oracle connect by的cardinality预估基数问题

转载自老熊的三分地:http://www.laoxiong.net/category/oracle/oraperf

本文不讲述connect by 如何使用,只是提出在使用树形查询时需要考虑的一个有关于性能方面的问题。
这里提到的问题,主要是Oracle优化器在评估connect by 语句的cardinality时,存在的缺陷,下面将举例说明。
在这个例子中所使用的表,是一个真实的生产系统中的表,BSS_ORG:

SQL> desc bss_org  
 名称                         是否为空? 类型  
 ---------------------------- -------- --------------  
  
 BSS_ORG_ID                   NOT NULL NUMBER(9)  
 NAME                         NOT NULL VARCHAR2(64)  
 BSS_PARENT_ORG_ID                     NUMBER(9)  
 BSS_ORG_LEVEL_ID             NOT NULL NUMBER(3)  
 STATE                        NOT NULL VARCHAR2(3)  
 STATE_DATE                            DATE  
 BSS_ORG_CODE                          VARCHAR2(15)  

在这个BSS_ORG表中,BSS_ORG_ID是主键,BSS_PARENT_ORG_ID与BSS_ORG_ID形成上下层级关系。这个表的统计信息如下:

Table                   Number                 Empty Average    Chain Average Global  
Name                   of Rows   Blocks       Blocks   Space    Count Row Len Stats  
--------------- -------------- -------- ------------ ------- -------- ------- ------  
BSS_ORG                  5,739       52            0       0        0      53 YES  
  
Column                    Column                       Distinct            Number  
Name                      Details                        Values   Density Buckets  
------------------------- ------------------------ ------------ --------- ------- --  
BSS_ORG_ID                NUMBER(9,0) NOT NULL            5,739   .000174       1  
NAME                      VARCHAR2(64) NOT NULL           5,034   .000361     200  
BSS_PARENT_ORG_ID         NUMBER(9,0)                       905   .002189     200  
BSS_ORG_LEVEL_ID          NUMBER(3,0) NOT NULL                6   .000087       6  
STATE                     VARCHAR2(3) NOT NULL                2   .000087       2  
STATE_DATE                DATE                            1,624   .001434     200  
BSS_ORG_CODE              VARCHAR2(15)                    5,639   .000179     200  

下面的测试,是在Oracle 11.1.0.6 for Windows版本下进行的测试。在Oracle 9i、Oracle 10g下测试的结果与Oracle 11g下测试的结果是相符的。当然这里谈到的问题是cardinality,因此在三个版本下,SQL的执行计划可能有所不同,但最终的结论是一致的。(BTW:从10g开始,connect by语句有一个新的执行步骤,称为CONNECT BY NO FILTERING,对应的Hint是no_connect_by_filtering)。

SQL> explain plan for  
  2  SELECT   bss_org_id  
  3        FROM bss_org t  
  4  START WITH bss_org_id = 1  
  5  CONNECT BY bss_parent_org_id = PRIOR bss_org_id;  
  
已解释。  
  

SQL> @showplan  
  
PLAN_TABLE_OUTPUT  
  
---------------------------------------------------------------------------------------------------  
| Id  | Operation                               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                        |         |  5739 | 80346 |    16   (0)| 00:00:01 |  
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|         |       |       |            |          |  
|   2 |   TABLE ACCESS FULL                     | BSS_ORG |  5739 | 80346 |    16   (0)| 00:00:01 |  
---------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  
       filter("BSS_ORG_ID"=1)  
  
SQL> explain plan for  
  2  SELECT   bss_org_id  
  3        FROM bss_org t  
  4  START WITH bss_org_id = 832044754  
  5  CONNECT BY bss_parent_org_id = PRIOR bss_org_id;  
  
已解释。  
  
SQL> @showplan  
  
---------------------------------------------------------------------------------------------------  
| Id  | Operation                               | Name    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                        |         |  5739 | 80346 |    16   (0)| 00:00:01 |  
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|         |       |       |            |          |  
|   2 |   TABLE ACCESS FULL                     | BSS_ORG |  5739 | 80346 |    16   (0)| 00:00:01 |  
---------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  
       filter("BSS_ORG_ID"=832044754)  
上面的2条SQL,第1条实际应该返回的行数为5739,第2条SQL实际应该返回的行数为4,
但是从执行计划上看,Oracle优化器评估的行数均为5739。

SQL> create index bss_org_idx1 on bss_org(bss_parent_org_id);  
  
索引已创建。  
  
SQL> explain plan for  
  2  SELECT   *  
  3        FROM bss_org t  
  4  START WITH bss_org_id = 1  
  5  CONNECT BY bss_parent_org_id = PRIOR bss_org_id;  
  
已解释。  
  
SQL> @showplan  
  
----------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |  
----------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |              |     6 |   318 |     4   (0)| 00:00:01 |  
|*  1 |  CONNECT BY WITH FILTERING    |              |       |       |            |          |  
|   2 |   TABLE ACCESS BY INDEX ROWID | BSS_ORG      |     1 |    53 |     2   (0)| 00:00:01 |  
|*  3 |    INDEX UNIQUE SCAN          | PK_BSS_ORG   |     1 |       |     1   (0)| 00:00:01 |  
|   4 |   NESTED LOOPS                |              |       |       |            |          |  
|   5 |    CONNECT BY PUMP            |              |       |       |            |          |  
|   6 |    TABLE ACCESS BY INDEX ROWID| BSS_ORG      |     6 |   318 |     4   (0)| 00:00:01 |  
|*  7 |     INDEX RANGE SCAN          | BSS_ORG_IDX1 |     6 |       |     1   (0)| 00:00:01 |  
----------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  
   3 - access("BSS_ORG_ID"=1)  
   7 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  
  
SQL> explain plan for  
  2  SELECT   *  
  3        FROM bss_org t  
  4  START WITH bss_org_id = 832044754  
  5  CONNECT BY bss_parent_org_id = PRIOR bss_org_id;  
  
已解释。  
  
SQL> @showplan  
  
----------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |  
----------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |              |     6 |   318 |     4   (0)| 00:00:01 |  
|*  1 |  CONNECT BY WITH FILTERING    |              |       |       |            |          |  
|   2 |   TABLE ACCESS BY INDEX ROWID | BSS_ORG      |     1 |    53 |     2   (0)| 00:00:01 |  
|*  3 |    INDEX UNIQUE SCAN          | PK_BSS_ORG   |     1 |       |     1   (0)| 00:00:01 |  
|   4 |   NESTED LOOPS                |              |       |       |            |          |  
|   5 |    CONNECT BY PUMP            |              |       |       |            |          |  
|   6 |    TABLE ACCESS BY INDEX ROWID| BSS_ORG      |     6 |   318 |     4   (0)| 00:00:01 |  
|*  7 |     INDEX RANGE SCAN          | BSS_ORG_IDX1 |     6 |       |     1   (0)| 00:00:01 |  
----------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  
   3 - access("BSS_ORG_ID"=832044754)  
   7 - access("BSS_PARENT_ORG_ID"=PRIOR "BSS_ORG_ID")  

这里在BSS_PARENT_ORG_ID列上建一个索引,是为了使执行计划与9i、10g下的一致。
这2条SQL返回的结果行数,与前面的2条SQL一样,分别是5739和4,但是从执行计划上看,Oracle优化器评估出来的行数都是6。
从前面的两个测试来看,优化器评估出来的SQL返回的行数要么是5739(表BSS_ORG的总行数),要么是6(总行数/BSS_PARENT_ORG_ID的Disctint Values)。但无论如何,随着不同的start with条件,这个行数(cardinality)与实际返回的结果行数可能会存在非常大的差异。如果仅仅是测试中这样一个简单的SQL,实际上不会有什么问题,很容易出现问题的地方在于,一个复杂的SQL中,有类似于测试SQL这样的子查询,这样使得表连接的评估出现很大的偏差,这样容易引起非常大的性能问题。
在9i下,如果BSS_PARENT_ORG_ID上如果没有索引,那么最后一个测试SQL的执行计划如下:

SQL> explain plan for
2 SELECT *
3 FROM bss_org t
4 START WITH bss_org_id = 832044754
5 CONNECT BY bss_parent_org_id =
6 PRIOR bss_org_id
7 ;

已解释。

SQL> @showplan  
  
----------------------------------------------------------------------------  
| Id  | Operation                    |  Name       | Rows  | Bytes | Cost  |  
----------------------------------------------------------------------------  
|   0 | SELECT STATEMENT             |             |  5739 |   297K|     9 |  
|*  1 |  CONNECT BY WITH FILTERING   |             |       |       |       |  
|   2 |   TABLE ACCESS BY INDEX ROWID| BSS_ORG     |       |       |       |  
|*  3 |    INDEX UNIQUE SCAN         | PK_BSS_ORG  |     1 |     7 |     1 |  
|   4 |   HASH JOIN                  |             |       |       |       |  
|   5 |    CONNECT BY PUMP           |             |       |       |       |  
|   6 |    TABLE ACCESS FULL         | BSS_ORG     |  5739 |   297K|     9 |  
|   7 |   TABLE ACCESS FULL          | BSS_ORG     |  5739 |   297K|     9 |  
----------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
  
   1 - filter("T"."BSS_ORG_ID"=832044754)  
   3 - access("T"."BSS_ORG_ID"=832044754) 

这里9i的优化器评估出来的cardinality为5739,而11g与此同样的执行计划,评估的cardinality是6。
前段时间就遇上由于connect by语句引起的性能问题。数据库为Oracle 9208,开始由于bss_org表的bss_parent_org_id列上没有索引,导致connect by部分得到的cardinality为5739,结果SQL性能非常差,在bss_parent_org_id上建索引后,执行计划改变,connect by 部分得到的cardinality为6,SQL性能大幅提升。

猜你喜欢

转载自blog.csdn.net/u011165335/article/details/82819455