SQL Optimization, Chapter 5: Table Join Methods

References:

The main references for this blog series are the teaching notes of Ran Naigang, database instructor at CUUG; "SQL Optimization Core Ideas" (by Luo Bingsen, Huang Chao and Zhong Jiao); and "PostgreSQL Inside: Query Optimization in Depth" (by Zhang Shujie). They are listed in no particular order.

 

In my view, SQL optimization comes down to two moves. The first is proper use of indexes, which is simple and quick. The second is changing the table join method, which is sometimes complicated: much of the time we are wrestling with outer joins, semi-joins, anti-joins and scalar subqueries.

The following is the experimental environment for this chapter.

drop table tests1;

drop table tests2;

create table tests1 as select  D.* from dba_objects D;

create table tests2 as select * from tests1;

create index ix_tests1_id on tests1(object_id);

create index ix_tests2_id on tests2(object_id);

1 Nested loop join

A nested loop join treats the driving table as the outer table: for each row taken from it, the inner table is probed once. This join method can handle both equi-joins and non-equi-joins, but it is efficient only when the following conditions hold:

First, the final result set is small;

Second, the lookup into the inner table goes through an index.

select t1.OWNER,t2.OBJECT_NAME

from tests1 t1,tests2 t2

where t1.OWNER='SCOTT'

and t1.OBJECT_ID=t2.OBJECT_ID;
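To make the mechanics concrete, here is a minimal Python sketch of a nested loop join with an indexed inner table. It is an illustration under simplifying assumptions, not how the database implements the operation: the rows are small hand-made dictionaries, and the helper names `build_index` and `nested_loop_join` are hypothetical, with a dictionary standing in for the index on tests2(object_id).

```python
from collections import defaultdict

def build_index(rows, key):
    """Map each key value to the rows holding it (a stand-in for an index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

def nested_loop_join(outer_rows, inner_index, key):
    """For each outer (driving) row, probe the inner table once via its index."""
    result = []
    for outer in outer_rows:
        for inner in inner_index.get(outer[key], []):
            result.append((outer, inner))
    return result

# Hand-made sample data (assumption: not taken from dba_objects).
tests1 = [
    {"OWNER": "SCOTT", "OBJECT_ID": 1},
    {"OWNER": "SYS", "OBJECT_ID": 2},
    {"OWNER": "SCOTT", "OBJECT_ID": 3},
]
tests2 = [
    {"OBJECT_ID": 1, "OBJECT_NAME": "EMP"},
    {"OBJECT_ID": 2, "OBJECT_NAME": "DUAL"},
    {"OBJECT_ID": 3, "OBJECT_NAME": "DEPT"},
]

# The OWNER filter keeps the driving set small, so few probes are needed.
driving = [r for r in tests1 if r["OWNER"] == "SCOTT"]
index = build_index(tests2, "OBJECT_ID")
nl_rows = [(o["OWNER"], i["OBJECT_NAME"])
           for o, i in nested_loop_join(driving, index, "OBJECT_ID")]
```

The cost is roughly one index probe per driving row, which is why a small driving set and an indexed inner table matter so much for this method.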

2 Hash join

A hash join applies a hash function to the join column of the driving table and groups its rows into buckets. The join column of the driven table is then hashed with the same function and the rows are matched. Both the driving table and the driven table are therefore scanned once each, which makes this method suitable for queries that return a large amount of data.

select t1.OWNER,t2.OBJECT_NAME

from tests1 t1,tests2 t2

where t1.OWNER='SYS'

and t1.OBJECT_ID=t2.OBJECT_ID;
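The build and probe phases can be sketched in Python as follows. This is a simplified in-memory illustration, assuming hand-made rows and a hypothetical helper name `hash_join`; a real database also has to deal with memory limits and spilling to disk, which the sketch ignores.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Hash the driving (build) rows into buckets, then probe with the driven rows."""
    buckets = defaultdict(list)
    for row in build_rows:                      # one pass over the driving table
        buckets[hash(row[key])].append(row)
    result = []
    for probe in probe_rows:                    # one pass over the driven table
        for build in buckets.get(hash(probe[key]), []):
            if build[key] == probe[key]:        # recheck: distinct keys may share a hash
                result.append((build, probe))
    return result

# Hand-made sample data (assumption: not taken from dba_objects).
tests1 = [
    {"OWNER": "SYS", "OBJECT_ID": 1},
    {"OWNER": "SYS", "OBJECT_ID": 2},
    {"OWNER": "SCOTT", "OBJECT_ID": 3},
]
tests2 = [
    {"OBJECT_ID": 1, "OBJECT_NAME": "DUAL"},
    {"OBJECT_ID": 2, "OBJECT_NAME": "TAB$"},
    {"OBJECT_ID": 3, "OBJECT_NAME": "EMP"},
]

driving = [r for r in tests1 if r["OWNER"] == "SYS"]
hash_rows = [(b["OWNER"], p["OBJECT_NAME"])
             for b, p in hash_join(driving, tests2, "OBJECT_ID")]
```

Each input is read exactly once, so the work grows with the size of the two inputs rather than with the number of probes, which fits the large-result case described above.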

3 Merge Join

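Both tables are sorted and then joined. As I understand it, this method first requires a full sort, so it is costly and its use cases are fairly limited; it may perform well in some non-equi-join scenarios. It is not worth studying in great depth,

and the semi-join and anti-join variants of Merge Join are not discussed later either.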

select * from emp e ,dept d where e.DEPTNO>d.DEPTNO;
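As an illustration of why sorted inputs help with a greater-than join condition, here is a small Python sketch of a sort-merge join on `left key > right key`. It is only a sketch under assumptions: the function name `merge_join_greater` and the sample rows are hypothetical, and the database's actual merge join implementation is not shown in this chapter.

```python
def merge_join_greater(left_rows, right_rows, key):
    """Return pairs (l, r) with l[key] > r[key], using two sorted inputs."""
    left_sorted = sorted(left_rows, key=lambda r: r[key])
    right_sorted = sorted(right_rows, key=lambda r: r[key])
    result = []
    boundary = 0  # right_sorted[:boundary] all have a key below the current left key
    for left in left_sorted:
        # Left keys only grow, so the boundary never has to move backwards.
        while boundary < len(right_sorted) and right_sorted[boundary][key] < left[key]:
            boundary += 1
        for right in right_sorted[:boundary]:
            result.append((left, right))
    return result

# Hand-made sample data (assumption).
emp = [
    {"ENAME": "KING", "DEPTNO": 10},
    {"ENAME": "SMITH", "DEPTNO": 20},
    {"ENAME": "ALLEN", "DEPTNO": 30},
]
dept = [
    {"DNAME": "ACCOUNTING", "DEPTNO": 10},
    {"DNAME": "RESEARCH", "DEPTNO": 20},
    {"DNAME": "SALES", "DEPTNO": 30},
    {"DNAME": "OPERATIONS", "DEPTNO": 40},
]

merge_rows = [(e["ENAME"], d["DNAME"])
              for e, d in merge_join_greater(emp, dept, "DEPTNO")]
```

The two sorts are the expensive part; once both inputs are ordered, the matching range on the right side is found by moving a single pointer forward.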

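4 Semi-join

Nested loop joins and hash joins each have their own semi-join variant.

A semi-join matches the driving table against the driven table on the join column; as soon as a match is found, only the driving table's row is returned. As a result, when the join column of the driven table contains duplicate values, the row count does not multiply after the join, because only driving-table data is returned. In the execution plans of the queries below, SEMI is the keyword that marks a semi-join.

Nested loop semi-join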

select t1.OWNER

from tests1 t1

where t1.OWNER='SCOTT'

  and exists(select 1

             from tests2 t2

             where t1.OBJECT_ID=t2.OBJECT_ID);

Hash semi-join

select t1.OWNER

from tests1 t1

where t1.OWNER='SYS'

  and exists(select 1

             from tests2 t2

             where t1.OBJECT_ID=t2.OBJECT_ID);

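The basic tuning approach for semi-joins is the same as for nested loop joins and hash joins; the query transformations involved are discussed in Chapter 7.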
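The "stop at the first match, return only the driving row" behaviour can be sketched in Python. Both functions below are simplified illustrations with hypothetical names and hand-made data; the point is only that duplicate join values in the driven table do not multiply the output.

```python
def nested_loop_semi_join(driving_rows, driven_rows, key):
    """Return each driving row at most once, stopping at its first match."""
    result = []
    for outer in driving_rows:
        for inner in driven_rows:
            if inner[key] == outer[key]:
                result.append(outer)
                break  # one match is enough; do not look for more
    return result

def hash_semi_join(driving_rows, driven_rows, key):
    """Same result, but the driven table is reduced to a set of join keys first."""
    driven_keys = {row[key] for row in driven_rows}
    return [row for row in driving_rows if row[key] in driven_keys]

# Hand-made sample data (assumption); OBJECT_ID 1 appears twice in tests2.
tests1 = [
    {"OWNER": "SCOTT", "OBJECT_ID": 1},
    {"OWNER": "SCOTT", "OBJECT_ID": 2},
]
tests2 = [
    {"OBJECT_ID": 1},
    {"OBJECT_ID": 1},
    {"OBJECT_ID": 3},
]

semi_nl = nested_loop_semi_join(tests1, tests2, "OBJECT_ID")
semi_hash = hash_semi_join(tests1, tests2, "OBJECT_ID")
```

An ordinary inner join on this data would return the row with OBJECT_ID 1 twice; both semi-join sketches return it once.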

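5 Anti-join

Nested loop joins and hash joins each have their own anti-join variant.

An anti-join matches the driving table against the driven table on the join column; a driving-table row is returned only when it matches nothing at all in the driven table, so the row count does not multiply after the join. In the execution plans of the queries below, ANTI is the keyword that marks an anti-join.

Nested loop anti-join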

select t1.OWNER

from tests1 t1

where t1.OWNER='SCOTT'

  and not exists(select 1

             from tests2 t2

             where t1.OBJECT_ID=t2.OBJECT_ID);

Hash anti-join

select t1.OWNER

from tests1 t1

where t1.OWNER='SYS'

  and not exists(select 1

             from tests2 t2

             where t1.OBJECT_ID=t2.OBJECT_ID);

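The basic tuning approach for anti-joins is the same as for nested loop joins and hash joins; the query transformations involved are discussed in Chapter 7.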
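An anti-join is the mirror image of a semi-join, which a short Python sketch can show. The function names and data are hypothetical, and the sketch assumes the join column contains no NULL values, so it models the `not exists` form used in the queries above and nothing more.

```python
def nested_loop_anti_join(driving_rows, driven_rows, key):
    """Return a driving row only if no driven row matches it."""
    result = []
    for outer in driving_rows:
        if not any(inner[key] == outer[key] for inner in driven_rows):
            result.append(outer)
    return result

def hash_anti_join(driving_rows, driven_rows, key):
    """Same result, with the driven table reduced to a set of join keys first."""
    driven_keys = {row[key] for row in driven_rows}
    return [row for row in driving_rows if row[key] not in driven_keys]

# Hand-made sample data (assumption: no NULL join keys).
tests1 = [
    {"OWNER": "SCOTT", "OBJECT_ID": 1},
    {"OWNER": "SCOTT", "OBJECT_ID": 2},
]
tests2 = [
    {"OBJECT_ID": 1},
    {"OBJECT_ID": 1},
    {"OBJECT_ID": 3},
]

anti_nl = nested_loop_anti_join(tests1, tests2, "OBJECT_ID")
anti_hash = hash_anti_join(tests1, tests2, "OBJECT_ID")
```

Because `any` stops at the first match, a driving row is rejected as soon as one matching driven row is found, and a row that survives is returned exactly once.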

 

6 Filter

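Strictly speaking, this is not a join between two tables or between two result sets; it is a join between a result set and a query. The driving side of a Filter is a result set, but the driven side cannot be turned into an independent or fixed result set. The operation works much like a nested loop.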

select t1.OWNER

from tests1 t1

where t1.OWNER='SCOTT'

  and not exists(select 1

                 from tests2 t2

                 where t1.OBJECT_ID=t2.OBJECT_ID

                    or t1.OBJECT_ID>5000);

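In the execution plan of the query above, the condition or t1.OBJECT_ID>5000 inside the subquery references the main table, so the subquery cannot form an independent result set and the plan can only use Filter.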

select t1.OWNER

from tests1 t1

where t1.OWNER='SCOTT'

  and not exists(select 1

                 from tests2 t2

                 where t1.OBJECT_ID=t2.OBJECT_ID

                    and rownum>0);

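In the execution plan of the query above, the subquery contains rownum, so for each value passed in from the driving table the subquery may produce a different result set (rownum is a pseudocolumn assigned according to the query result), and again the plan can only use Filter.

Filter is what remains when, because of the way the SQL is written, the optimizer cannot rewrite the query. The optimizer does not like it, and neither should the person doing the tuning; the tuning approach is explained in Chapter 7.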
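The first Filter query above (the one with `or t1.OBJECT_ID>5000`) can be modelled in Python to show why the subquery has to be re-evaluated for every driving row. This is a rough sketch with hypothetical names and hand-made data; it ignores any caching of subquery results that a real database may apply.

```python
# Hand-made sample data (assumption).
tests1 = [
    {"OWNER": "SCOTT", "OBJECT_ID": 1},
    {"OWNER": "SCOTT", "OBJECT_ID": 7},
    {"OWNER": "SCOTT", "OBJECT_ID": 6000},
    {"OWNER": "SYS", "OBJECT_ID": 8},
]
tests2 = [
    {"OBJECT_ID": 1},
    {"OBJECT_ID": 2},
]

def subquery_exists(t1_row):
    """Evaluate the correlated subquery for one driving row.

    The OR branch depends on t1_row itself, so the set of matching tests2
    rows cannot be computed once up front and reused as a fixed result set.
    """
    return any(
        t2["OBJECT_ID"] == t1_row["OBJECT_ID"] or t1_row["OBJECT_ID"] > 5000
        for t2 in tests2
    )

def filter_not_exists(driving_rows, subquery):
    """Keep a driving row only when the subquery returns no row for it."""
    return [row for row in driving_rows if not subquery(row)]

driving = [r for r in tests1 if r["OWNER"] == "SCOTT"]
filter_rows = filter_not_exists(driving, subquery_exists)
```

The shape is the same as a nested loop: one subquery execution per driving row, with no way to switch to a hash-based strategy.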

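7 Hints

During tuning we sometimes need a hint to change the execution plan. Hints should be used with caution, however: a hint pins the execution plan, which hinders the optimizer from producing a better one, in particular by preventing it from changing the plan as the data changes.

7.1 Hint syntax

SELECT/INSERT/DELETE/UPDATE /*+ index(T IX_T)*/, where the + follows /* immediately and is followed by a space.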

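7.2 Commonly used hints

/*+ index (emp,emp_idx) */        Tell the optimizer to use the index.

/*+ use_nl(emp,dept)*/         Use a nested loop join.

/*+ use_merge(emp,dept) */    Use a sort merge join.

/*+ use_hash(emp,dept)*/    Use a hash join.

/*+ leading(emp) */               Choose emp as the driving table.

/*+ ordered */                      Join the tables in the order they are listed in the from clause.

/*+ parallel */                      Use parallel query.

(The hint forms above are taken from the teaching notes of Ran Naigang of CUUG.)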


Origin blog.csdn.net/songjian1104/article/details/91349949