MySQL 8.0 New Feature: Hash Join

Overview & background

MySQL has long been criticized for lacking a hash join. The recently released 8.0.18 finally ships this feature, which is welcome news. I have sometimes wondered why MySQL went so long without it. One likely reason is that MySQL is mostly used for simple OLTP workloads, predominantly in Internet applications, where the need was not that pressing. Another may be that development used to rely entirely on the community, which limited the pace of evolution; since Oracle acquired MySQL, releases have evolved noticeably faster.

The hash join algorithm itself is not complicated. The harder part is probably on the optimizer side: when choosing an execution plan, it has to decide whether to use a hash join at all and which table should be the outer table and which the inner table. Either way, hash join now exists, and the optimizer has one more option when picking a join algorithm. MySQL has always been pragmatic, and I believe this enhancement also answers some of the doubts: certain features were missing not because the team could not build them well, but because other things had higher priority.

Before 8.0.18, MySQL supported only nested loop join algorithms. The simplest is Simple Nested Loop Join, and MySQL optimized it in several ways, implementing Block Nested Loop Join, Index Nested Loop Join, Batched Key Access, and so on. To some extent these optimizations eased the need for a hash join. A separate section later in this article covers these MySQL join optimizations; hash join comes first.

Hash Join algorithm

Put simply, the nested loop join algorithm is a double loop: it scans the outer (driving) table and, for each outer row, scans the inner table, checks whether the join condition holds, and decides whether to emit the row to the parent execution node. Algorithmically, the complexity is M * N. Hash join is an optimization for equi-joins. The basic idea is to load the outer table into memory and build a hash table on it, so that a single pass over the inner table completes the join and outputs the matching rows. If all the data fits in memory, so much the better, and the logic is simple; this kind of join is usually called CHJ (Classic Hash Join), and MariaDB had already implemented it earlier. If the data does not fit in memory, it has to be loaded and joined in batches. The implementations of these variants are described below.
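To make the M * N cost concrete, here is a minimal Python sketch of a simple nested loop join. It is only an illustration, not MySQL's implementation; the table contents and the column names `country` and `name` are made up for the example.

```python
def nested_loop_join(outer, inner, key):
    """Simple nested loop join: compare every outer row with every inner row."""
    result = []
    comparisons = 0
    for o in outer:          # one pass over the outer (driving) table
        for i in inner:      # full scan of the inner table for each outer row
            comparisons += 1
            if o[key] == i[key]:
                result.append({**o, **i})
    return result, comparisons


# Hypothetical sample data
countries = [
    {"country_id": 1, "country": "Norway"},
    {"country_id": 2, "country": "Japan"},
]
persons = [
    {"country_id": 1, "name": "Ola"},
    {"country_id": 2, "name": "Aiko"},
    {"country_id": 1, "name": "Kari"},
]

rows, comparisons = nested_loop_join(countries, persons, "country_id")
print(len(rows), comparisons)  # 3 6
```

With 2 outer rows and 3 inner rows the join condition is evaluated 2 * 3 = 6 times, whatever the number of matches.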

In-Memory Join(CHJ)

A hash join generally consists of two phases: the build phase, which creates the hash table, and the probe phase, which probes it.

1) Build phase

Scan the outer table and create a hash table, using the join condition as the key and the columns the query needs as the value. This raises the question of which table to pick as the outer table. The choice is made mainly by estimating the sizes of the two tables (result sets) taking part in the join and picking the smaller one, so that the hash table is more likely to fit in the limited memory.

2) Probe phase

Once the hash table is built, scan the inner table row by row. For each inner row, compute the hash value of the join condition and look it up in the hash table; if there is a match, output the row, otherwise skip it. When all inner rows have been scanned, the whole process is finished. The process is shown in a figure from the official MySQL blog.


In the figure, the left side is the build phase and the right side is the probe phase. country_id is the equi-join condition, the countries table is the outer table, and the persons table is the inner table.
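The two phases can be sketched in a few lines of Python. This is a simplified illustration under the assumption that the build side fits in memory; the sample rows and the column names `country` and `name` are hypothetical, and a Python dict stands in for the hash table.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, key):
    # Build phase: hash the (smaller) outer input on the join key.
    table = defaultdict(list)
    for b in build_rows:
        table[b[key]].append(b)

    # Probe phase: a single pass over the inner input.
    result = []
    for p in probe_rows:
        for b in table.get(p[key], []):   # no match means the row is skipped
            result.append({**b, **p})
    return result


# Hypothetical sample data: countries is the outer (build) table,
# persons is the inner (probe) table.
countries = [
    {"country_id": 1, "country": "Norway"},
    {"country_id": 2, "country": "Japan"},
]
persons = [
    {"country_id": 1, "name": "Ola"},
    {"country_id": 2, "name": "Aiko"},
    {"country_id": 3, "name": "Nobody"},
]

joined = hash_join(countries, persons, "country_id")
for row in joined:
    print(row["name"], row["country"])
```

Each input is scanned exactly once, so the work is roughly proportional to M + N instead of M * N.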

On-Disk Hash Join

The limitation of CHJ is that the whole outer table must fit in memory. In MySQL, the memory a join may use is controlled by the join_buffer_size parameter. If the join needs more memory than join_buffer_size, plain CHJ cannot cope: the only option is to split the outer table into segments, run the build phase on each segment in turn, and scan the inner table once per segment for the probe phase. If the outer table is split into N segments, the inner table is scanned N times. That is clearly a weak approach.

In MySQL 8.0, if the join needs more memory than join_buffer_size, the build phase first partitions the outer table with a hash function and writes the resulting partitions to temporary files on disk. The probe phase then partitions the inner table with the same hash function. Because both sides use the same hash function, rows with the same key (the same join condition value) necessarily end up in the partition with the same number. Next, CHJ is run on each pair of outer and inner partitions with the same number, and once all partitions have been processed the join is complete. The cost of this algorithm is that each of the outer and inner tables is read twice and written once. Compared with scanning the inner table N times, this is the better approach.


In the figure, the upper left shows the partitioning of the outer table, the upper right shows the partitioning of the inner table, and the bottom shows the build + probe process over the partitions.
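The partition-then-join idea can be sketched as follows. This is an illustration only: Python's built-in `hash`, JSON-lines temporary files, and the function names are assumptions made for the example and say nothing about MySQL's actual hash function or chunk file format.

```python
import json
import os
import tempfile
from collections import defaultdict


def partition_to_disk(rows, key, num_parts, directory, prefix):
    """Split rows into num_parts files by hashing the join key."""
    paths = [os.path.join(directory, f"{prefix}_{n}.jsonl") for n in range(num_parts)]
    files = [open(p, "w") for p in paths]
    try:
        for row in rows:
            n = hash(row[key]) % num_parts   # same function for both tables
            files[n].write(json.dumps(row) + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def read_rows(path):
    with open(path) as f:
        for line in f:
            yield json.loads(line)


def on_disk_hash_join(outer, inner, key, num_parts=4):
    result = []
    with tempfile.TemporaryDirectory() as tmp:
        outer_parts = partition_to_disk(outer, key, num_parts, tmp, "outer")
        inner_parts = partition_to_disk(inner, key, num_parts, tmp, "inner")
        # Equal keys land in partitions with the same number,
        # so each pair of partitions can be joined independently.
        for outer_path, inner_path in zip(outer_parts, inner_parts):
            table = defaultdict(list)
            for o in read_rows(outer_path):      # build on one outer partition
                table[o[key]].append(o)
            for i in read_rows(inner_path):      # probe with the matching inner partition
                for o in table.get(i[key], []):
                    result.append({**o, **i})
    return result
```

Every row is read once from its source, written once to a partition file, and read once more during the per-partition join, which matches the "two reads, one write" cost described above.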

Grace Hash Join

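Mainstream databases such as Oracle, SQL Server, and PostgreSQL have supported hash join for a long time, and their join algorithms are all similar. This section introduces the Grace Hash Join algorithm used by Oracle. The overall process is much like MySQL's hash join, with one main difference. When join_buffer_size is insufficient, MySQL partitions the outer table and then runs CHJ on each partition. In extreme cases, however, the data is unevenly distributed and a large share of it hashes into the same bucket, so that join_buffer_size is still too small even after partitioning. MySQL handles this by reading as many rows of the partition as fit, building a hash table from them, and probing with the corresponding inner-table partition. After one batch is processed, it clears the hash table and repeats until all rows of the partition have been handled. This is the same logic CHJ uses when join_buffer_size is insufficient.

When Grace Hash Join hits this situation, it keeps partitioning with a second round of hashing until a hash table fits in memory. There is still an extreme case here: if all input rows have the same join key, no amount of rehashing can separate them, and Grace Hash Join degenerates to the same handling as MySQL.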
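The recursive re-partitioning and its fallback can be sketched as below. This is a simplified in-memory model, not Oracle's or MySQL's code: lists stand in for partition files, `max_rows` stands in for join_buffer_size, the per-level hash is obtained by hashing a `(level, key)` tuple, and the distinct-key check and `max_depth` cap are assumptions added to keep the example terminating.

```python
from collections import defaultdict


def build_and_probe(build_rows, probe_rows, key):
    table = defaultdict(list)
    for b in build_rows:
        table[b[key]].append(b)
    return [{**b, **p} for p in probe_rows for b in table.get(p[key], [])]


def chunked_join(build_rows, probe_rows, key, max_rows):
    """Fallback: build from max_rows rows at a time, probing the whole partition each time."""
    result = []
    for start in range(0, len(build_rows), max_rows):
        result += build_and_probe(build_rows[start:start + max_rows], probe_rows, key)
    return result


def grace_hash_join(build_rows, probe_rows, key, max_rows,
                    fanout=4, level=0, max_depth=8):
    if len(build_rows) <= max_rows:
        return build_and_probe(build_rows, probe_rows, key)
    same_key = len({b[key] for b in build_rows}) == 1
    if same_key or level >= max_depth:
        # Rehashing cannot split identical keys: degrade to chunked processing.
        return chunked_join(build_rows, probe_rows, key, max_rows)
    build_parts = defaultdict(list)
    probe_parts = defaultdict(list)
    for b in build_rows:   # a different hash per level via the (level, key) tuple
        build_parts[hash((level, b[key])) % fanout].append(b)
    for p in probe_rows:
        probe_parts[hash((level, p[key])) % fanout].append(p)
    result = []
    for n, part in build_parts.items():
        result += grace_hash_join(part, probe_parts.get(n, []), key,
                                  max_rows, fanout, level + 1, max_depth)
    return result
```

A skewed input with many rows sharing one key exercises both paths: the distinct keys are split by rehashing, while the hot key ends up in `chunked_join`.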

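Hybrid Hash Join

The difference from Grace Hash Join is that if the cache can hold enough partition data, as much as possible is kept in memory. There is then no need to do what Grace Hash Join does, which is to strictly read every partition into memory, write it out to external storage, and read it back into memory again for the build phase. This is an optimization for the case where memory is relatively plentiful compared with the partitions, and its purpose is to reduce disk read and write IO. OceanBase's hash join currently uses this approach.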
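A rough sketch of the idea, assuming a fixed number of partitions of which the first few are kept resident in memory. This is an illustrative model and not OceanBase's implementation: lists stand in for spilled files, and `resident_parts` is a hypothetical knob representing how many partitions the cache can hold.

```python
from collections import defaultdict


def hybrid_hash_join(build_rows, probe_rows, key, num_parts, resident_parts):
    resident = defaultdict(list)        # hash table for partitions kept in memory
    spilled_build = defaultdict(list)   # stand-in for partitions written to disk
    spilled_probe = defaultdict(list)
    result = []

    for b in build_rows:
        n = hash(b[key]) % num_parts
        if n < resident_parts:
            resident[b[key]].append(b)  # built directly, never written out
        else:
            spilled_build[n].append(b)

    for p in probe_rows:
        n = hash(p[key]) % num_parts
        if n < resident_parts:
            for b in resident.get(p[key], []):  # joined on the fly
                result.append({**b, **p})
        else:
            spilled_probe[n].append(p)

    # Spilled partitions are joined afterwards, as in Grace Hash Join.
    for n, part in spilled_build.items():
        table = defaultdict(list)
        for b in part:
            table[b[key]].append(b)
        for p in spilled_probe.get(n, []):
            for b in table.get(p[key], []):
                result.append({**b, **p})

    spilled_rows = sum(len(v) for v in spilled_build.values()) \
        + sum(len(v) for v in spilled_probe.values())
    return result, spilled_rows
```

With `resident_parts=0` every row is spilled, as in Grace Hash Join; raising it lowers `spilled_rows`, which is the disk IO the optimization tries to save.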

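MySQL Join algorithm optimizations

Before MySQL 8.0.18, that is, for a very long time, MySQL had no hash join, and its main join algorithm was nested loop join. Simple Nested Loop Join is obviously inefficient: it performs N full scans of the inner table, so the actual complexity is N * M, where N is the number of rows in the outer table and M is the number of rows in the inner table, representing the cost of one inner scan. MySQL therefore made several optimizations to Simple Nested Loop Join. The figures in this section all come from the Internet.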

BlockNestLoopJoin(BNLJ)

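MySQL uses batching: it caches as many outer rows as fit in join_buffer_size, and on each pass over the inner table every inner row is checked against the whole batch. This reduces the number of inner-table scans, and if the inner table is large it indirectly relieves read IO pressure.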
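A minimal sketch of the batching idea, where `block_size` is a hypothetical stand-in for the number of outer rows that fit in join_buffer_size:

```python
def block_nested_loop_join(outer, inner, key, block_size):
    result = []
    inner_scans = 0
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]  # outer rows held in the join buffer
        inner_scans += 1
        for i in inner:                          # one inner scan per block, not per row
            for o in block:
                if o[key] == i[key]:
                    result.append({**o, **i})
    return result, inner_scans


outer = [{"id": k, "o": k} for k in range(10)]
inner = [{"id": k, "i": k * 2} for k in range(0, 10, 2)]

rows, scans = block_nested_loop_join(outer, inner, "id", block_size=4)
print(len(rows), scans)  # 5 3
```

The number of row comparisons is unchanged; what drops is the number of passes over the inner table, from 10 (one per outer row) to 3 (one per block).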


IndexNestLoopJoin(INLJ)

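If the join condition on the inner table is indexed, there is no need to scan the whole inner table for each outer row: a single B-tree lookup is enough, and the overall time complexity drops to N * O(log M). By comparison, a hash join does one hash table search per outer row, but the hash table also has a build cost and has to deal with insufficient memory, so it is not necessarily better than INLJ.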
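The lookup pattern can be sketched with a sorted list and binary search standing in for the B-tree. This is an assumption made for illustration; a real B-tree index is a disk-based structure, but the O(log M) search per outer row is the same idea.

```python
import bisect


def build_index(rows, key):
    """Sorted (key value, row position) pairs: a stand-in for a B-tree index."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))


def index_nested_loop_join(outer, inner, key, index):
    result = []
    for o in outer:
        # Binary search for the first index entry with this key: O(log M).
        pos = bisect.bisect_left(index, (o[key], -1))
        while pos < len(index) and index[pos][0] == o[key]:
            result.append({**o, **inner[index[pos][1]]})
            pos += 1
    return result


# Hypothetical sample data
countries = [
    {"country_id": 1, "country": "Norway"},
    {"country_id": 2, "country": "Japan"},
]
persons = [
    {"country_id": 1, "name": "Ola"},
    {"country_id": 2, "name": "Aiko"},
    {"country_id": 1, "name": "Kari"},
]

index = build_index(persons, "country_id")
joined = index_nested_loop_join(countries, persons, "country_id", index)
print([r["name"] for r in joined])  # ['Ola', 'Kari', 'Aiko']
```

Unlike the hash table in a hash join, the index is assumed to exist already, so the join itself pays no build cost.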

Batched Key Access

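Index Nested Loop Join uses the index on the join condition and matches rows through B-tree lookups, which reduces the cost of scanning the inner table. If the join condition is on a non-primary-key column, however, this means a large number of lookups back into the primary key index and a lot of random IO. The BKA optimization sorts a batch of matching entries by primary key, so that the lookups back into the table are roughly ordered from the primary key's point of view, which reduces the cost of random IO. BKA actually relies on the MRR (Multi-Range Read) feature: before accessing the data, the primary keys are sorted, and only then are the rows read. The size of the buffer used for sorting primary keys is controlled by the read_rnd_buffer_size parameter.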
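The sort-before-fetch step can be sketched like this. The data structures are assumptions for illustration: `secondary_index` maps a join key to a list of primary keys, `inner_by_pk` maps a primary key to a row, and `batch_size` is a hypothetical stand-in for the buffer limits.

```python
def batched_key_access_join(outer, inner_by_pk, secondary_index, key, batch_size):
    result = []
    access_order = []   # primary keys in the order the rows are fetched
    for start in range(0, len(outer), batch_size):
        batch = outer[start:start + batch_size]
        # Secondary index lookups yield primary keys, not full rows.
        pending = [(pk, o) for o in batch for pk in secondary_index.get(o[key], [])]
        # MRR step: sort by primary key so the row fetches are nearly sequential.
        pending.sort(key=lambda pair: pair[0])
        for pk, o in pending:
            access_order.append(pk)
            result.append({**o, **inner_by_pk[pk]})
    return result, access_order


# Hypothetical sample data: persons keyed by primary key person_id
inner_by_pk = {
    10: {"person_id": 10, "country_id": 2, "name": "Aiko"},
    11: {"person_id": 11, "country_id": 1, "name": "Ola"},
    12: {"person_id": 12, "country_id": 2, "name": "Ken"},
    13: {"person_id": 13, "country_id": 1, "name": "Kari"},
}
secondary_index = {1: [11, 13], 2: [10, 12]}
countries = [
    {"country_id": 1, "country": "Norway"},
    {"country_id": 2, "country": "Japan"},
]

rows, order = batched_key_access_join(countries, inner_by_pk, secondary_index,
                                      "country_id", batch_size=2)
print(order)  # [10, 11, 12, 13]
```

Without the sort, the same batch would be fetched in the order 11, 13, 10, 12, jumping back and forth across the primary key range.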


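Summary

Since MySQL 8.0, the server layer code has been heavily refactored. The optimizer still lags well behind Oracle's, but it keeps improving. Hash join support gives the MySQL optimizer more choices, and SQL execution paths can be better as a result, especially for equi-joins. MySQL had made several join optimizations before, such as BNLJ, INLJ, and BKA, but they cannot replace hash join. A good database should have a rich set of basic capabilities, use the optimizer to work out which one suits the scenario, and then apply that capability to answer the request as efficiently as possible.

References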

https://en.wikipedia.org/wiki/Hash_join

https://mysqlserverteam.com/hash-join-in-mysql-8/

https://dev.mysql.com/worklog/task/?id=2241

https://www.cnblogs.com/qixinbo/p/10524142.html

https://zhuanlan.zhihu.com/p/35040231

Origin www.cnblogs.com/cchust/p/11961851.html