Implicit conversion between IN and EXISTS, seen through one SQL statement

Copyright notice: this is an original article; please credit the source when reposting. https://blog.csdn.net/weixin_39004901/article/details/89012645

I ran into a SQL statement that taught me something about how the optimizer implicitly converts between IN and EXISTS.

SELECT DISTINCT
    t1.differ_code     AS differCode,
    t1.relative_code   AS relativeCode,
    t2.deli_store_code AS deliStoreCode,
    t2.re_store_code   AS reStoreCode,
    t2.order_type      AS orderType
FROM a t1
JOIN b t2
    ON t1.relative_code = t2.order_code
JOIN c t3
    ON t1.differ_code = t3.differ_code
WHERE t1.differ_order_status = 1
  AND t2.deli_store_code IN (
        SELECT store_code
        FROM d
        WHERE distributor_code = '123'
      )
  AND t3.deal_object LIKE 'out%'
  AND t3.deal_type = 'add'
  AND t3.deal_num > 0
  AND t1.differ_code NOT IN (
        SELECT relative_code
        FROM e
        WHERE method = 'abc'
          AND response_flag LIKE '%jkl%'
      )

This article focuses on the following subquery:

AND t1.differ_code NOT IN (
      SELECT relative_code
      FROM e
      WHERE method = 'abc'
        AND response_flag LIKE '%jkl%'
    )

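For this subquery, the first thing to check is of course whether there is an index on method. The SHOW INDEX output for e shows that there is not: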

+------------------+------------+--------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name                       | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+--------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| e                |          0 | PRIMARY                        |            1 | id            | A         |    54098212 |     NULL | NULL   |      | BTREE      |         |               |
| e                |          1 | relative_code                  |            1 | relative_code | A         |      149442 |     NULL | NULL   | YES  | BTREE      |         |               |
+------------------+------------+--------------------------------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

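By the usual reasoning, given the indexes on e, this subquery has to do a full table scan. Unfortunately, e is also a large table: a rough estimate puts it at around 50 million rows: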

mysql> show table status like 'e'\G
*************************** 1. row ***************************
           Name: e
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 54098213
 Avg_row_length: 258
    Data_length: 13998489600
Max_data_length: 0
   Index_length: 6451806208
      Data_free: 5242880
 Auto_increment: 46898108
    Create_time: 2018-10-01 06:02:20
    Update_time: NULL
     Check_time: NULL
      Collation: utf8mb4_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec) 
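
To put those numbers in more readable units, here is a minimal sketch that only re-uses the values printed above. The variable names are my own, and Rows is an InnoDB estimate rather than an exact count, so treat the results as approximate.

```python
# Values copied from the SHOW TABLE STATUS output above.
rows = 54098213            # InnoDB estimate, not an exact count
data_length = 13998489600  # bytes allocated to the clustered index (table data)
index_length = 6451806208  # bytes allocated to the secondary indexes

GIB = 1024 ** 3

# Integer division; agrees with the Avg_row_length of 258 shown above.
avg_row_length = data_length // rows
data_gib = data_length / GIB
index_gib = index_length / GIB

print(f"avg row length ~ {avg_row_length} bytes")
print(f"data  ~ {data_gib:.2f} GiB")
print(f"index ~ {index_gib:.2f} GiB")
```

So a full scan of e would have to read roughly 13 GiB of table data, which fits with the standalone subquery never finishing.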

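Running the subquery on its own took so long that it never returned. Yet the complete SQL from this article, surprisingly, came back in a few seconds. So let's look at the execution plan: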

+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+------------------------------------+
| id | select_type        | table                                | type        | possible_keys               | key           | key_len | ref                                 | rows | Extra                              |
+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+------------------------------------+
|  1 | PRIMARY            | t1                                   | ALL         | PRIMARY,idx_relative_code   | NULL          | NULL    | NULL                                | 1369 | Using where; Using temporary       |
|  1 | PRIMARY            | t2                                   | eq_ref      | PRIMARY,idx_deli_store_code | PRIMARY       | 98      | test.t1.relative_code               |    1 | Using where                        |
|  1 | PRIMARY            | d                                    | eq_ref      | PRIMARY,idx_store_code      | PRIMARY       | 124     | test.t2.deli_store_code,const       |    1 | Using where; Using index; Distinct |
|  1 | PRIMARY            | t3                                   | ref         | PRIMARY                     | PRIMARY       | 98      | test.t1.differ_code                 |    2 | Using where; Distinct              |
|  3 | DEPENDENT SUBQUERY | e                                    | ref_or_null | relative_code               | relative_code | 203     | func                                |  724 | Using where                        |
+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+------------------------------------+
5 rows in set (0.00 sec)

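The plan shows that e actually uses the index on relative_code, which is odd, because the subquery has no predicate on relative_code at all. In addition, ref shows func and rows shows only 724 rows examined; that is only a rough statistic, but it is a long way from the 50-million-plus rows of the expected full table scan. Has the optimizer rewritten the query? The plan above came back without any warnings, so let's run it again with EXPLAIN EXTENDED: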

+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+----------+------------------------------------+
| id | select_type        | table                                | type        | possible_keys               | key           | key_len | ref                                 | rows | filtered | Extra                              |
+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+----------+------------------------------------+
|  1 | PRIMARY            | t1                                   | ALL         | PRIMARY,idx_relative_code   | NULL          | NULL    | NULL                                | 1369 |   100.00 | Using where; Using temporary       |
|  1 | PRIMARY            | t2                                   | eq_ref      | PRIMARY,idx_deli_store_code | PRIMARY       | 98      |  test.t1.relative_code              |    1 |   100.00 | Using where                        |
|  1 | PRIMARY            | d                                    | eq_ref      | PRIMARY,idx_store_code      | PRIMARY       | 124     |  test.t2.deli_store_code,const      |    1 |   100.00 | Using where; Using index; Distinct |
|  1 | PRIMARY            | t3                                   | ref         | PRIMARY                     | PRIMARY       | 98      |  test.t1.differ_code                |    2 |   100.00 | Using where; Distinct              |
|  3 | DEPENDENT SUBQUERY | e                                    | ref_or_null | relative_code               | relative_code | 203     | func                                |  724 |   100.00 | Using where                        |
+----+--------------------+--------------------------------------+-------------+-----------------------------+---------------+---------+-------------------------------------+------+----------+------------------------------------+
5 rows in set, 1 warning (0.02 sec) 

mysql> show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select distinct `test`.`t1`.`differ_code` AS `differCode`,`test`.`t1`.`relative_code` AS `relativeCode`,`test`.`t2`.`deli_store_code` AS `deliStoreCode`,`test`.`t2`.`re_store_code` AS `reStoreCode`,`test`.`t2`.`order_type` AS `orderType` from `test`.`d` join `test`.`mall_erp_differ_order` `t1` join `test`.`mall_erp_order_extend` `t2` join `test`.`mall_erp_differ_order_detail` `t3` where ((`test`.`t2`.`order_code` = `test`.`t1`.`relative_code`) and (`test`.`t3`.`differ_code` = `test`.`t1`.`differ_code`) and (`test`.`d`.`store_code` = `test`.`t2`.`deli_store_code`) and (`test`.`t1`.`differ_order_status` = 1) and (`test`.`t3`.`deal_object` like 'out%') and (`test`.`t3`.`deal_type` = 'add') and (`test`.`t3`.`deal_num` > 0) and (not(<in_optimizer>(`test`.`t1`.`differ_code`,<exists>(/* select#3 */ select 1 from `test`.`e` where ((`test`.`e`.`method` = 'syscDiffOutOrders') and (`test`.`e`.`response_flag` like '%WMS_ACCEPT%') and ((convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`) or isnull(`test`.`e`.`relative_code`))) having <is_not_null_test>(`test`.`e`.`relative_code`))))) and (`test`.`d`.`distributor_code` = '61272'))
1 row in set (0.00 sec)

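So the optimizer did rewrite the SQL. Note that the output uses the real table names and literals: mall_erp_differ_order, mall_erp_order_extend and mall_erp_differ_order_detail are the tables aliased t1, t2 and t3 (a, b and c in the original statement), and 'syscDiffOutOrders', '%WMS_ACCEPT%' and '61272' correspond to 'abc', '%jkl%' and '123'. Formatted a little, the rewritten SQL looks like this: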

select
      distinct `test`.`t1`.`differ_code` AS `differCode`,
      `test`.`t1`.`relative_code` AS `relativeCode`,
      `test`.`t2`.`deli_store_code` AS `deliStoreCode`,
      `test`.`t2`.`re_store_code` AS `reStoreCode`,
      `test`.`t2`.`order_type` AS `orderType`
from `test`.`d`
join `test`.`mall_erp_differ_order` `t1`
join `test`.`mall_erp_order_extend` `t2`
join `test`.`mall_erp_differ_order_detail` `t3`
where (
      (`test`.`t2`.`order_code` = `test`.`t1`.`relative_code`)
  and (`test`.`t3`.`differ_code` = `test`.`t1`.`differ_code`)
  and (`test`.`d`.`store_code` = `test`.`t2`.`deli_store_code`)
  and (`test`.`t1`.`differ_order_status` = 1)
  and (`test`.`t3`.`deal_object` like 'out%')
  and (`test`.`t3`.`deal_type` = 'add')
  and (`test`.`t3`.`deal_num` > 0)
  and (
        not(
            <in_optimizer>(
                `test`.`t1`.`differ_code`,
                <exists>(
                    /* select#3 */ select 1 from `test`.`e`
                    where (
                          (`test`.`e`.`method` = 'syscDiffOutOrders')
                      and (`test`.`e`.`response_flag` like '%WMS_ACCEPT%')
                      and (
                            (convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`)
                            or isnull(`test`.`e`.`relative_code`)
                          )
                    )
                    having <is_not_null_test>(`test`.`e`.`relative_code`)
                )
            )
        )
      )
  and (`test`.`d`.`distributor_code` = '61272')
)

It turns out the optimizer converted the NOT IN subquery into the following correlated EXISTS subquery:

not(
    <in_optimizer>(
        `test`.`t1`.`differ_code`,
        <exists>(
            /* select#3 */ select 1 from `test`.`e`
            where (
                  (`test`.`e`.`method` = 'syscDiffOutOrders')
              and (`test`.`e`.`response_flag` like '%WMS_ACCEPT%')
              and (
                    (convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`)
                    or isnull(`test`.`e`.`relative_code`)
                  )
            )
            having <is_not_null_test>(`test`.`e`.`relative_code`)
        )
    )
)

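As we can see, the correlation condition of the correlated subquery is:
convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`
Once the subquery is converted to EXISTS, the order in which the subquery and the outer query execute changes.
With the original NOT IN subquery, one would expect the subquery to run first and produce a result set, which is then compared against the outer query.
After the optimizer's internal rewrite, NOT IN has become NOT EXISTS, so the order is: the outer query first fixes one differ_code, and then the subquery compares it against e.relative_code. Because e.relative_code is indexed, and the convert() is applied to the outer value t1.differ_code rather than to the indexed column, each probe can use that index, which explains why the plan uses the index on relative_code. As for func in the ref column, the most likely reason is that the corresponding columns in t1 and e use different character sets, so an implicit convert() is applied, and the value matched against e.relative_code is the result of a function.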
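
The two shapes of the query can be compared side by side. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database instead of MySQL, the rows are made up, the index name idx_e_relative_code is hypothetical, and e contains no NULL relative_code. It does not reproduce MySQL's execution plan; it only shows that the correlated NOT EXISTS form returns the same rows as NOT IN in this case, and that its equality on e.relative_code is the kind of predicate an index on that column can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (differ_code TEXT PRIMARY KEY);
    CREATE TABLE e  (id INTEGER PRIMARY KEY,
                     relative_code TEXT,
                     method TEXT,
                     response_flag TEXT);
    CREATE INDEX idx_e_relative_code ON e (relative_code);

    INSERT INTO t1 VALUES ('D1'), ('D2'), ('D3');
    INSERT INTO e (relative_code, method, response_flag) VALUES
        ('D1', 'abc',   'xx jkl xx'),   -- passes both filters
        ('D2', 'other', 'xx jkl xx'),   -- wrong method
        ('D3', 'abc',   'no match');    -- wrong response_flag
""")

# Original shape: uncorrelated NOT IN subquery.
not_in_sql = """
    SELECT differ_code FROM t1
    WHERE differ_code NOT IN (
        SELECT relative_code FROM e
        WHERE method = 'abc' AND response_flag LIKE '%jkl%')
    ORDER BY differ_code
"""

# Rewritten shape: correlated NOT EXISTS, probing e once per outer row.
not_exists_sql = """
    SELECT differ_code FROM t1
    WHERE NOT EXISTS (
        SELECT 1 FROM e
        WHERE method = 'abc' AND response_flag LIKE '%jkl%'
          AND e.relative_code = t1.differ_code)
    ORDER BY differ_code
"""

not_in_rows = [row[0] for row in conn.execute(not_in_sql)]
not_exists_rows = [row[0] for row in conn.execute(not_exists_sql)]
print(not_in_rows, not_exists_rows)
```

In the correlated form the filters on method and response_flag are only evaluated on the few rows of e that share the current differ_code, instead of on every row of the table.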

Before wrapping up, there is one more side note. Look at the following condition after the conversion:

and (
(convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`)
or isnull(`test`.`e`.`relative_code`)
)

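After the conversion to a correlated subquery, besides the condition convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code`, there is one more condition, isnull(`test`.`e`.`relative_code`). Why?
Let's first look at the logic of the original SQL: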

t1.differ_code NOT IN (
      SELECT relative_code
      FROM e

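t1.differ_code must not be among the relative_code values returned by the subquery. Those values may be concrete data, but they may also include NULL. Under NOT IN semantics, a row qualifies only if differ_code equals none of the returned values and none of them is NULL: when there is no match but a NULL is present, NOT IN evaluates to NULL rather than true, and the row is filtered out.
After the conversion, the comparison convert(<cache>(`test`.`t1`.`differ_code`) using utf8mb4) = `test`.`e`.`relative_code` only establishes whether t1.differ_code and e.relative_code match. When e.relative_code is NULL, comparing with = returns NULL, which the WHERE clause treats as false, so the EXISTS would find no row and NOT EXISTS would let the outer row through. In other words, rows where e.relative_code is NULL would count as satisfying the condition, which clearly contradicts the logic of the original SQL. That is why the rewritten subquery has to carry the extra condition or isnull(`test`.`e`.`relative_code`). It is also why the plan shows the access type ref_or_null for e: the index on relative_code is searched both for the given value and for NULL.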
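
The NULL behaviour is easy to check on a toy data set. This is a minimal sketch, again using in-memory SQLite rather than MySQL, with made-up rows and a helper name (codes) of my own; it relies only on the standard three-valued logic that both engines follow for NOT IN. It compares the original NOT IN, a naive NOT EXISTS with only the equality, and a NOT EXISTS that also carries the IS NULL branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (differ_code TEXT);
    CREATE TABLE e  (relative_code TEXT);
    INSERT INTO t1 VALUES ('D1'), ('D2');
    INSERT INTO e  VALUES ('D1'), (NULL);   -- one match for D1, plus a NULL
""")

def codes(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Original form: the NULL in the subquery result makes 'D2' NOT IN (...) NULL.
not_in = codes("""
    SELECT differ_code FROM t1
    WHERE differ_code NOT IN (SELECT relative_code FROM e)
""")

# Naive rewrite: equality only, so the NULL row is silently ignored.
naive = codes("""
    SELECT differ_code FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM e
                      WHERE e.relative_code = t1.differ_code)
""")

# Rewrite with the extra IS NULL branch, mirroring "or isnull(...)".
with_null = codes("""
    SELECT differ_code FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM e
                      WHERE e.relative_code = t1.differ_code
                         OR e.relative_code IS NULL)
""")

print(not_in, naive, with_null)
```

With this data the naive rewrite lets D2 through while the original NOT IN returns nothing; adding the IS NULL branch brings the rewritten query back in line with the original.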

