导读:
作者:郑松华,知数堂SQL优化班老师,网名:骑龟的兔子
一想到你在关注我就忍不住有点紧张
已经很久没写文章了,今天写篇文章祝大家圣诞快乐!
今天给大家分享下,关于not in ,not exists相关的文章。其实这个可以归纳为exists to in的一类,mysql以前的版本中对in和exists的处理是完全不一样的,直到8.0.16版本。
16版本之前in可以优化成semi join那些关于semi join的几个优化,如:loosescan=on,firstmatch=on,duplicateweedout=on,materialization=on,都是对于in的 , exists只有一种运算就是DEPENDENT SUBQUERY,直到16版本开始exists可以和in等价了,也可以享受到semi join,这回最新版本中mysqlservertiam.com网站中新出现了关于anti join的文章。
Antijoin in MySQL 8 mysqlserverteam.com
我们可以简单理解为not exists和not in在直白点就是子查询变成半连接了,下面我举个5.7和8.0.18版本同一种sql的两种不同的执行计划
首先5.7
--------------
mysql Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using EditLine wrapper
Connection id: 2
Current database: employees
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.7.14-log MySQL Community Server (GPL)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /tmp/mysql3306.sock
Uptime: 5 min 4 sec
Threads: 1 Questions: 21 Slow queries: 0 Opens: 116 Flush tables: 1 Open tables: 109 Queries per second avg: 0.069
--------------
[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| 1 | PRIMARY | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
如上述5.7.14版本中not exists执行计划是DEPENDENT SUBQUERY
我们在8.0.18 版本中看下同一个sql:
[email protected]>[employees]>\s
--------------
/usr/local/mysql8/bin/mysql Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)
Connection id: 7
Current database: employees
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 8.0.18 MySQL Community Server - GPL
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: utf8mb4
Db characterset: utf8mb4
Client characterset: utf8mb4
Conn. characterset: utf8mb4
UNIX socket: /tmp/mysql3308.sock
Uptime: 19 min 24 sec
Threads: 1 Questions: 46 Slow queries: 0 Opens: 175 Flush tables: 3 Open tables: 95 Queries per second avg: 0.039
[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
我们可以看到变成了anti join
[email protected]>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join (cost=12.25 rows=10)
-> Table scan on t (cost=1.25 rows=10)
-> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.01 rows=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
因为5.7中没有8.0.18版本新增的explain format=tree查看执行计划的方法,我们比较show warnings\G
5.7
[email protected]>[employees]>show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`))))
2 rows in set (0.00 sec)
8.0
[email protected]>[employees]>show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true
2 rows in set (0.00 sec)
可以看出5.7中是还是not exists而8.0.18版本中变成了anti join,那有人问了 DEPENDENT SUBQUERY也好,anti join也好到底有啥用?
简单来说DEPENDENT SUBQUERY类似于函数调用 ,需要1个重要的前提:必须接受参数 !也就是说只能是nested loop join方式。但是anti join就不一定了它也可以是nested loop join方式,也可以是hash join方式;而hash join就是 8.0.18版本的新特性!
关于hash join的文章 可以看如下:
hash join mysqlserverteam.com
而且DEPENDENT SUBQUERY随着外层表的结果集的数据量的增大而执行时间增大,以前如果碰到这种极端的问题,我们可以利用 left join来解决 ,具体的方法可以看我的公开课。
现在随着新版的anti join这些我们就可以省略了。
最后我想说的是,优化不能离开版本,离开版本谈优化,是不行的。
还有在这次的8.0.18 版本新引入的
explain format=tree / analyze 查看执行计划的方式中发现了一个很严重的问题
[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.t.emp_no | 1 | 100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
[email protected]>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join (cost=12.17 rows=10)
-> Table scan on t (cost=1.25 rows=10)
-> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no) (cost=1.00 rows=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
如上所述,可以看到发生了nested loop anti-join,现在,我要把这个改成如下:
[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
| 1 | SIMPLE | t | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | NULL |
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299246 | 100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
[email protected]>[employees]>desc analyze select * from t_group t where not exists (select 1 from employees e ignore index(pri) where e.emp_no = t.emp_no) ;
ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'
如上述所示,这回desc会出现执行计划,但是desc analyze就会发生错误!
我的推测这个有可能是一个bug ! 或者是还没来得及支持,毕竟新特性,希望以后版本中更改。
水平有限,如有错误,欢迎纠正。
谢谢大家~ 欢迎转发~
我是知数堂SQL 优化班老师~ ^^
如有关于SQL优化方面疑问和一起交流的请加:
高性能MySQL,SQL优化群
并且 @兔子@知数堂SQL优化
欢迎加入 知数堂大家庭。
扫码加入MySQL技术Q群
(群号:650149401)
点“在看”给我一朵小黄花