Mysql-InnoDB存储引擎中-join原理

本文翻译自mysql性能调优及架构设计优化部分。主要是做个记录和增加学习的广度。

实现算法：

Nested Loop Join。本质上和我们平时使用时的直观感受一致，选定一张驱动表作为循环基础数据，然后循环该结果集中的数据作为过滤条件到下一张表内查询数据，然后合并结果。如果有多个join，则将前面的结果集作为循环条件，到后一个表中查询数据。

书中的例子：

select m.subject msg_subject, c.content msg_content  
from user_group g,group_message m,group_message_content c  
where g.user_id = 1  
and m.group_id = g.group_id  
and c.group_msg_id = m.id

userid,group_id,group_msg_id都存在索引。

*************************** 1. row ***************************  
id: 1  
select_type: SIMPLE  
table: g  
type: ref  
possible_keys: user_group_gid_ind,user_group_uid_ind,user_group_gid_uid_ind  
key: user_group_uid_ind  
key_len: 4  
ref: const  
rows: 2  
Extra:  
*************************** 2. row ***************************  
id: 1  
select_type: SIMPLE  
table: m  
type: ref  
possible_keys: PRIMARY,idx_group_message_gid_uid  
key: idx_group_message_gid_uid  
key_len: 4  
ref: g.group_id  
rows: 3  
Extra:  
*************************** 3. row ***************************  
id: 1  
select_type: SIMPLE  
table: c  
type: ref  
possible_keys: idx_group_message_content_msg_id  
key: idx_group_message_content_msg_id  
key_len: 4  
ref: m.id  
rows: 2  
Extra:

伪代码如下：

for each record g_rec in table user_group that g_rec.user_id=1{
     for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{
          for each record c_rec in group_message_content that c_rec.group_msg_id=m_rec.id
                pass the (g_rec.user_id, m_rec.subject, c_rec.content) row
          combination to output;
      }
}

如果去掉group_msg_id字段的索引，执行计划就不一样了。

*************************** 1. row ***************************  
id: 1  
select_type: SIMPLE  
table: g  
type: ref  
possible_keys: user_group_uid_ind  
key: user_group_uid_ind  
key_len: 4  
ref: const  
rows: 2  
Extra:  
*************************** 2. row ***************************  
id: 1  
select_type: SIMPLE  
table: m  
type: ref  
possible_keys: PRIMARY,idx_group_message_gid_uid  
key: idx_group_message_gid_uid  
key_len: 4  
ref: g.group_id  
rows: 3  
Extra:  
*************************** 3. row ***************************  
id: 1  
select_type: SIMPLE  
table: c  
type: ALL  
possible_keys: NULL  
key: NULL  
key_len: NULL  
ref: NULL  
rows: 96  
Extra:Using where;Using join buffer

最后一张表的索引删除了，就只能全表扫描，因此使用到了join buffer，其实也就是缓存。注意到，前两张表的数据放到了缓存内，然后通过这个缓存再和第三张表进行查询。这是因为第三张表需要全表扫描了，这时候没有必要每一次都关联。。。

for each record g_rec in table user_group{
      for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{
           put (g_rec, m_rec) into the buffer
           if (buffer is full)
                 flush_buffer();
      }
}
flush_buffer(){
      for each record c_rec in group_message_content that c_rec.group_msg_id = c_rec.id{
            for each record in the buffer
                 pass (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output;
      }
      empty the buffer;
}

join语句的优化：

1.小结果集驱动大结果集

2.优先优化内层循环。

3.对join字段建立索引。

扫描二维码关注公众号，回复： 1306863 查看本文章

4.当无法建立索引时，需要注意join Buffer Size（暂时不清楚，没遇到具体场景）

Mysql-InnoDB存储引擎中-join原理

猜你喜欢