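References:
The main references for this blog series are the database teaching notes of CUUG instructor Ran Naigang, 《SQL优化核心思想》 (Core Ideas of SQL Optimization) by Luo Bingsen, Huang Chao and Zhong Jiao, and 《PostgreSQL技术内幕:查询优化深度探索》 (PostgreSQL Internals: An In-Depth Exploration of Query Optimization) by Zhang Shujie, listed in no particular order.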
9.1 Extracting Common Subexpressions
9.1.1 Data Script
drop table test1;
drop table test2;
create table test1 as select * from dba_objects;
create table test2 as select * from dba_objects;
insert into test2 select * from test2;
-- Repeat this self-insert on test2 until the table has enough rows
commit;
drop index ix_test1;
drop index ix_test2;
create index ix_test1_01 on test1(owner);
create index ix_test1_02 on test1(object_id);
9.1.2 Rewrite Process
(1) Before the rewrite
select '1' as seq, count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join (select * from test2 t where t.OBJECT_ID<1000 and t.OWNER='OUTLN') t2
on t1.OBJECT_ID=t2.OBJECT_ID
where t1.OBJECT_ID<1000
union
select '2', count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join (select * from test2 t where t.OBJECT_ID<1000 and t.OBJECT_TYPE='EDITION') t2
on t1.OBJECT_ID=t2.OBJECT_ID
where t1.OBJECT_ID<1000;
Before the rewrite, both physical reads and logical reads are very high.
(2) First rewrite: extract the common subexpression so that the large table test2 is scanned fewer times
with tmp as
(select t.*
from test2 t
where t.OBJECT_ID<1000
and rownum>0)
select '1' as seq,count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join tmp t2
on t1.OBJECT_ID=t2.OBJECT_ID
and t2.OWNER='OUTLN'
where t1.OBJECT_ID<1000
union
select '2', count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join tmp t2
on t1.OBJECT_ID=t2.OBJECT_ID
and t2.OBJECT_TYPE='EDITION'
where t1.OBJECT_ID<1000;
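The execution plan shows that, contrary to expectations, the query actually runs longer after the common subexpression is extracted.
The cause is 1,515 physical writes. The WITH result set is large, so it has to be written to disk, and that increases the elapsed time.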
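Before weighing cost, it helps to confirm that the first rewrite returns the same result. Moving a condition on the right-hand table of a LEFT JOIN from an inline view into the ON clause does not change the result, because in both forms unmatched test1 rows are kept with NULLs on the test2 side.

The sketch below checks this in SQLite through Python's standard sqlite3 module. It makes the following assumptions:
- It uses a tiny hypothetical dataset in place of dba_objects.
- Oracle's `rownum>0` is dropped because SQLite has no ROWNUM.
- It checks correctness only and says nothing about Oracle's I/O behaviour.

```python
import sqlite3

# Hypothetical miniature rows standing in for dba_objects: (owner, object_id, object_type).
ROWS = [
    ("SYS", 10, "TABLE"),
    ("OUTLN", 20, "TABLE"),
    ("OUTLN", 30, "INDEX"),
    ("SYS", 40, "EDITION"),
    ("PUBLIC", 50, "SYNONYM"),
    ("OUTLN", 1500, "TABLE"),  # excluded by object_id < 1000
]

ORIGINAL = """
select '1' as seq, count(t1.object_id), count(t2.object_id)
from test1 t1
left join (select * from test2 t where t.object_id < 1000 and t.owner = 'OUTLN') t2
  on t1.object_id = t2.object_id
where t1.object_id < 1000
union
select '2', count(t1.object_id), count(t2.object_id)
from test1 t1
left join (select * from test2 t where t.object_id < 1000 and t.object_type = 'EDITION') t2
  on t1.object_id = t2.object_id
where t1.object_id < 1000
"""

# First rewrite: the shared predicate goes into the WITH clause and the
# branch-specific predicates move into each LEFT JOIN's ON clause.
# Oracle's "and rownum > 0" is omitted because SQLite has no ROWNUM.
REWRITE1 = """
with tmp as (select t.* from test2 t where t.object_id < 1000)
select '1' as seq, count(t1.object_id), count(t2.object_id)
from test1 t1
left join tmp t2 on t1.object_id = t2.object_id and t2.owner = 'OUTLN'
where t1.object_id < 1000
union
select '2', count(t1.object_id), count(t2.object_id)
from test1 t1
left join tmp t2 on t1.object_id = t2.object_id and t2.object_type = 'EDITION'
where t1.object_id < 1000
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    for name in ("test1", "test2"):
        conn.execute(f"create table {name} (owner text, object_id integer, object_type text)")
        conn.executemany(f"insert into {name} values (?, ?, ?)", ROWS)
    # One self-insert doubles test2, mimicking "insert into test2 select * from test2".
    conn.execute("insert into test2 select * from test2")
    return conn

conn = build_db()
before = sorted(conn.execute(ORIGINAL).fetchall())
after = sorted(conn.execute(REWRITE1).fetchall())
print(before)           # [('1', 7, 4), ('2', 6, 2)]
print(after == before)  # True
```

The results are sorted in Python because UNION does not guarantee an output order.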
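(3) Second rewrite: move all of test2's filter conditions into the WITH clause to cut the number of rows, and select only the columns that are actually needed. Attacking row count and column width together shrinks the memory the WITH result set needs.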
with tmp as
(select t.OWNER, t.OBJECT_ID,t.OBJECT_TYPE
from test2 t
where t.OBJECT_ID<1000
and (t.OWNER='OUTLN' or t.OBJECT_TYPE='EDITION'))
select '1' as seq, count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join tmp t2
on t1.OBJECT_ID=t2.OBJECT_ID
and t2.OWNER='OUTLN'
where t1.OBJECT_ID<1000
union
select '2', count(t1.OBJECT_ID),count(t2.OBJECT_ID)
from test1 t1
left join tmp t2
on t1.OBJECT_ID=t2.OBJECT_ID
and t2.OBJECT_TYPE='EDITION'
where t1.OBJECT_ID<1000;
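The execution plan shows that logical reads and physical reads both drop sharply. Although 4 physical writes remain, performance nearly doubles.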
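A similar sketch measures the WITH result set of each rewrite as (rows, columns) and confirms that the narrower one still produces the same counts. It uses the same assumptions as before:
- SQLite via sqlite3.
- Hypothetical miniature rows, with one extra column so the projection has something to remove.
- No ROWNUM.

The OR of the two branch predicates is what keeps the narrowing safe: every row that either UNION branch could match is still in tmp.

```python
import sqlite3

# Hypothetical miniature rows: (owner, object_name, object_id, object_type).
ROWS = [
    ("SYS", "T_A", 10, "TABLE"),
    ("OUTLN", "OL$", 20, "TABLE"),
    ("OUTLN", "OL$NAME", 30, "INDEX"),
    ("SYS", "ORA$BASE", 40, "EDITION"),
    ("PUBLIC", "DUAL", 50, "SYNONYM"),
]

# CTE body of the first rewrite: every column, only the shared predicate.
WIDE_CTE = "select t.* from test2 t where t.object_id < 1000"
# CTE body of the second rewrite: needed columns only, both branch predicates OR-ed in.
NARROW_CTE = (
    "select t.owner, t.object_id, t.object_type from test2 t "
    "where t.object_id < 1000 and (t.owner = 'OUTLN' or t.object_type = 'EDITION')"
)

MAIN = """
select '1' as seq, count(t1.object_id), count(t2.object_id)
from test1 t1
left join tmp t2 on t1.object_id = t2.object_id and t2.owner = 'OUTLN'
where t1.object_id < 1000
union
select '2', count(t1.object_id), count(t2.object_id)
from test1 t1
left join tmp t2 on t1.object_id = t2.object_id and t2.object_type = 'EDITION'
where t1.object_id < 1000
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    for name in ("test1", "test2"):
        conn.execute(
            f"create table {name} (owner text, object_name text, object_id integer, object_type text)"
        )
        conn.executemany(f"insert into {name} values (?, ?, ?, ?)", ROWS)
    conn.execute("insert into test2 select * from test2")  # one self-insert doubles test2
    return conn

def cte_shape(conn, body):
    """Return (row count, column count) of a CTE body, i.e. its selection and projection."""
    cur = conn.execute(body)
    return len(cur.fetchall()), len(cur.description)

conn = build_db()
wide_shape = cte_shape(conn, WIDE_CTE)
narrow_shape = cte_shape(conn, NARROW_CTE)
wide_result = sorted(conn.execute(f"with tmp as ({WIDE_CTE}) {MAIN}").fetchall())
narrow_result = sorted(conn.execute(f"with tmp as ({NARROW_CTE}) {MAIN}").fetchall())
print(wide_shape, narrow_shape)      # (10, 4) (6, 3)
print(wide_result == narrow_result)  # True
```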
9.1.3 Conclusions
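(1) When a large table is referenced several times and its I/O cost is high, extracting the common subexpression can improve SQL performance;
(2) Caveat: when extracting into a WITH clause, shrink its result set as much as possible through both the filter conditions (selection) and the selected columns (projection). This keeps the WITH result set from growing large enough to cause physical writes or other performance problems.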
9.2 Using WITH to Improve FILTER
9.2.1 Data Script
drop table test1;
drop table test2;
create table test1 as select * from dba_objects;
create table test2 as select * from dba_objects;
insert into test2 select * from test2;
insert into test2 select * from test2;
commit;
create index ix_test1_01 on test1(object_id);
create index ix_test2_01 on test2(object_id);
9.2.2 Rewrite Process
(1) Before the rewrite
EXPLAIN PLAN FOR
select count(*)
from test1 t1
where exists (select 1
from test2 t2
where t1.owner=t2.owner
and t1.object_type = t2.object_type
and t1.OBJECT_NAME=t2.OBJECT_NAME
group by t2.owner,t2.object_type,OBJECT_NAME
having count(*)>30)
and t1.object_id>70000;
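test2 is a large table, and the execution plan uses a FILTER join. In the worst case, the subquery runs once for every row the main query produces, and each run is a full scan of the large table test2. Oracle can cache FILTER results per distinct combination of correlated values, but that helps only when those values repeat. The I/O cost alone kills performance.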
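Two points need explaining here:
① There are at least two ways to improve this.
The first, which is the focus of this section, is a WITH rewrite that eliminates the FILTER.
The second treats the subquery as a table join. Based on the selectivity of t2.owner, t2.object_type and OBJECT_NAME and on the final result size, you create an index so that the FILTER join can use it. This approach is analyzed in this series under <SQL Optimization Methods> <Chapter 2: All About FILTER Joins>;
② FILTER joins are not useless. Sometimes we should go easy on them rather than killing every one we see. This is also analyzed in <SQL Optimization Methods> <Chapter 2: All About FILTER Joins>.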
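The cost gap described above can be sketched by counting rows scanned. The Python below is a toy model, not Oracle's executor:
- Each key is a hypothetical stand-in for an (owner, object_type, object_name) tuple.
- `filter_plan` re-scans the whole inner list for every outer row with no caching, which is the worst case.
- `with_plan` aggregates once and then probes a hash set, roughly what the WITH rewrite lets the optimizer do.

```python
from collections import Counter

def filter_plan(outer_keys, inner_keys, threshold):
    """Worst-case FILTER: for every outer row, scan all of inner_keys,
    count rows with the same key, and test count(*) > threshold."""
    scanned = 0
    hits = 0
    for key in outer_keys:
        scanned += len(inner_keys)  # one full scan of the big table per outer row
        if sum(1 for k in inner_keys if k == key) > threshold:
            hits += 1
    return hits, scanned

def with_plan(outer_keys, inner_keys, threshold):
    """WITH rewrite: one scan with GROUP BY/HAVING, then a hash semi join."""
    counts = Counter(inner_keys)                           # single scan + GROUP BY
    tmp = {k for k, c in counts.items() if c > threshold}  # HAVING count(*) > threshold
    hits = sum(1 for key in outer_keys if key in tmp)      # one hash probe per outer row
    return hits, len(inner_keys)

inner = ["A"] * 31 + ["B"] * 30 + ["C"] * 40  # hypothetical big table: 101 rows
outer = ["A", "B", "C", "D"] * 25             # hypothetical main query: 100 rows
print(filter_plan(outer, inner, 30))  # (50, 10100)
print(with_plan(outer, inner, 30))    # (50, 101)
```

Both plans find the same 50 qualifying rows. The FILTER model scans the inner table 100 times, while the WITH model scans it once.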
(2) After the rewrite
with tmp as
(select t2.owner,t2.object_type ,t2.OBJECT_NAME
from test2 t2
where rownum>0
group by t2.owner,t2.object_type,OBJECT_NAME
having count(*)>30)
select count(*)
from test1 t1
where exists (select 1
from tmp t2
where t1.owner=t2.owner
and t1.object_type = t2.object_type
and t1.OBJECT_NAME=t2.OBJECT_NAME)
and t1.object_id>70000;
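As in section 9.1, a quick SQLite check can confirm that moving the grouped subquery into a WITH clause preserves the count. It uses Python's sqlite3 and hypothetical rows, and omits `rownum>0` because SQLite has no ROWNUM. Only names A and C occur more than 30 times in this test2, so both queries should return 2.

```python
import sqlite3

ORIGINAL = """
select count(*)
from test1 t1
where exists (select 1
              from test2 t2
              where t1.owner = t2.owner
                and t1.object_type = t2.object_type
                and t1.object_name = t2.object_name
              group by t2.owner, t2.object_type, t2.object_name
              having count(*) > 30)
  and t1.object_id > 70000
"""

# Rewrite: aggregate test2 once in a WITH clause ("where rownum > 0" is Oracle-only, omitted).
REWRITE = """
with tmp as
 (select t2.owner, t2.object_type, t2.object_name
  from test2 t2
  group by t2.owner, t2.object_type, t2.object_name
  having count(*) > 30)
select count(*)
from test1 t1
where exists (select 1
              from tmp t2
              where t1.owner = t2.owner
                and t1.object_type = t2.object_type
                and t1.object_name = t2.object_name)
  and t1.object_id > 70000
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    for name in ("test1", "test2"):
        conn.execute(f"create table {name} "
                     "(owner text, object_type text, object_name text, object_id integer)")
    # Hypothetical test1: names A-D above 70000, plus one A row the id filter drops.
    conn.executemany("insert into test1 values ('SYS', 'TABLE', ?, ?)",
                     [("A", 70001), ("B", 70002), ("C", 70003), ("D", 70004), ("A", 100)])
    # Hypothetical test2: A occurs 31 times, B 30 times, C 40 times, D never.
    dup_counts = {"A": 31, "B": 30, "C": 40}
    conn.executemany("insert into test2 values ('SYS', 'TABLE', ?, ?)",
                     [(name, i) for name, n in dup_counts.items() for i in range(n)])
    return conn

conn = build_db()
before = conn.execute(ORIGINAL).fetchone()[0]
after = conn.execute(REWRITE).fetchone()[0]
print(before, after)  # 2 2
```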
9.2.3 Conclusion
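If a FILTER join needs improving, pull the subquery out into a WITH clause to eliminate the FILTER join, and let the main query join the WITH result with a HASH join. It must be a hash join, not a nested loop, because a join against the WITH result cannot use an index. A nested loop would therefore have to scan the WITH result once per outer row.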
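The reason the conclusion insists on a hash join can be shown with another toy cost count. In this sketch, `tmp` is a hypothetical stand-in for the WITH result, which has no index. A nested-loop semi join walks it for every outer row, while a hash semi join builds a hash table once and then probes it. The comparison counts are an illustrative metric, not Oracle statistics.

```python
def nested_loop_semi_join(outer_keys, tmp_rows):
    """Nested loop against an unindexed WITH result: scan tmp for every outer row."""
    comparisons = 0
    matched = 0
    for key in outer_keys:
        for row in tmp_rows:
            comparisons += 1
            if row == key:
                matched += 1
                break  # semi join: one match per outer row is enough
    return matched, comparisons

def hash_semi_join(outer_keys, tmp_rows):
    """Hash semi join: build a hash table on tmp once, then probe it per outer row."""
    build = set(tmp_rows)
    work = len(tmp_rows)  # build phase touches each tmp row once
    matched = 0
    for key in outer_keys:
        work += 1  # one hash probe
        if key in build:
            matched += 1
    return matched, work

outer = list(range(1000))      # hypothetical main-query keys
tmp = list(range(0, 1000, 2))  # hypothetical WITH result: 500 keys, no index
print(nested_loop_semi_join(outer, tmp))  # (500, 375250)
print(hash_semi_join(outer, tmp))         # (500, 1500)
```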