MySQL 8 Anti-Join 几点总结

导读

作者:郑松华知数堂SQL优化班老师,网名:骑龟的兔子

一想到你在关注我就忍不住有点紧张

已经很久没写文章了,今天写篇文章祝大家圣诞快乐!

今天给大家分享下,关于not in ,not exists相关的文章。其实这个可以归纳为exists to in的一类,mysql以前的版本中对in和exists的处理是完全不一样的,直到8.0.16版本。

16版本之前in可以优化成semi join那些关于semi join的几个优化,如:loosescan=on,firstmatch=on,duplicateweedout=on,materialization=on,都是对于in的 , exists只有一种运算就是DEPENDENT SUBQUERY,直到16版本开始exists可以和in等价了,也可以享受到semi join,这回最新版本中mysqlservertiam.com网站中新出现了关于anti join的文章。

Antijoin in MySQL 8 mysqlserverteam.com

我们可以简单理解为not exists和not in在直白点就是子查询变成半连接了,下面我举个5.7和8.0.18版本同一种sql的两种不同的执行计划

首先5.7

--------------
mysql  Ver 14.14 Distrib 5.7.14, for linux-glibc2.5 (x86_64) using  EditLine wrapper

Connection id:          2
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         5.7.14-log MySQL Community Server (GPL)
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8
Db     characterset:    utf8
Client characterset:    utf8
Conn.  characterset:    utf8
UNIX socket:            /tmp/mysql3306.sock
Uptime:                 5 min 4 sec

Threads: 1  Questions: 21  Slow queries: 0  Opens: 116  Flush tables: 1  Open tables: 109  Queries per second avg: 0.069
--------------

[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+
|  1 | PRIMARY            | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+-------------+

如上述5.7.14版本中not exists执行计划是DEPENDENT SUBQUERY

我们在8.0.18 版本中看下同一个sql:

[email protected]>[employees]>\s
--------------
/usr/local/mysql8/bin/mysql  Ver 8.0.18 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL)

Connection id:          7
Current database:       employees
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server version:         8.0.18 MySQL Community Server - GPL
Protocol version:       10
Connection:             Localhost via UNIX socket
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8mb4
Conn.  characterset:    utf8mb4
UNIX socket:            /tmp/mysql3308.sock
Uptime:                 19 min 24 sec

Threads: 1  Questions: 46  Slow queries: 0  Opens: 175  Flush tables: 3  Open tables: 95  Queries per second avg: 0.039

[email protected]>[employees]>desc select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+

我们可以看到变成了anti join

[email protected]>[employees]>desc format=tree  select * from t_group t where not exists (select 1 from employees e where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.25 rows=10)
    -> Table scan on t  (cost=1.25 rows=10)
    -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.01 rows=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

因为5.7中没有8.0.18版本新增的explain format=tree查看执行计划的方法,我们比较show warnings\G

5.7

[email protected]>[employees]>show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` where (not(exists(/* select#2 */ select 1 from `employees`.`employees` `e` where (`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`))))
2 rows in set (0.00 sec)

8.0

[email protected]>[employees]>show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'employees.t.emp_no' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `employees`.`t`.`emp_no` AS `emp_no`,`employees`.`t`.`dept_no` AS `dept_no`,`employees`.`t`.`from_date` AS `from_date`,`employees`.`t`.`to_date` AS `to_date` from `employees`.`t_group` `t` anti join (`employees`.`employees` `e`) on((`employees`.`e`.`emp_no` = `employees`.`t`.`emp_no`)) where true
2 rows in set (0.00 sec)

可以看出5.7中是还是not exists而8.0.18版本中变成了anti join,那有人问了 DEPENDENT SUBQUERY也好,anti join也好到底有啥用?

简单来说DEPENDENT SUBQUERY类似于函数调用 ,需要1个重要的前提:必须接受参数 !也就是说只能是nested loop join方式。但是anti join就不一定了它也可以是nested loop join方式,也可以是hash join方式;而hash join就是 8.0.18版本的新特性!

关于hash join的文章 可以看如下:

hash join mysqlserverteam.com

而且DEPENDENT SUBQUERY随着外层表的结果集的数据量的增大而执行时间增大,以前如果碰到这种极端的问题,我们可以利用 left join来解决 ,具体的方法可以看我的公开课。

现在随着新版的anti join这些我们就可以省略了。

最后我想说的是,优化不能离开版本,离开版本谈优化,是不行的。

还有在这次的8.0.18 版本新引入的

explain format=tree / analyze 查看执行计划的方式中发现了一个很严重的问题

[email protected]>[employees]>desc  select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref                | rows | filtered | Extra                                |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL    | NULL          | NULL    | NULL    | NULL               |   10 |   100.00 | NULL                                 |
|  1 | SIMPLE      | e     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.t.emp_no |    1 |   100.00 | Using where; Not exists; Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------+------+----------+--------------------------------------+

[email protected]>[employees]>desc format=tree select * from t_group t where not exists (select 1 from employees e   where e.emp_no = t.emp_no) ;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop anti-join  (cost=12.17 rows=10)
    -> Table scan on t  (cost=1.25 rows=10)
    -> Single-row index lookup on e using PRIMARY (emp_no=t.emp_no)  (cost=1.00 rows=1)
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

如上所述,可以看到发生了nested loop anti-join,现在,我要把这个改成如下:

[email protected]>[employees]>desc  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra                                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL |     10 |   100.00 | NULL                                                           |
|  1 | SIMPLE      | e     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299246 |   100.00 | Using where; Not exists; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------------------+

[email protected]>[employees]>desc analyze  select * from t_group t where not exists (select 1 from employees e ignore index(pri)  where e.emp_no = t.emp_no) ;
ERROR 1235 (42000): This version of MySQL doesn't yet support 'EXPLAIN ANALYZE on this query'

如上述所示,这回desc会出现执行计划,但是desc analyze就会发生错误!

我的推测这个有可能是一个bug ! 或者是还没来得及支持,毕竟新特性,希望以后版本中更改。


水平有限,如有错误,欢迎纠正。

谢谢大家~ 欢迎转发~

我是知数堂SQL 优化班老师~ ^^

如有关于SQL优化方面疑问和一起交流的请加:

高性能MySQL,SQL优化群

并且 @兔子@知数堂SQL优化

欢迎加入 知数堂大家庭。

扫码加入MySQL技术Q群

(群号:650149401)

   

点“在看”给我一朵小黄花

发布了254 篇原创文章 · 获赞 91 · 访问量 68万+

猜你喜欢

转载自blog.csdn.net/n88Lpo/article/details/103725035