本文翻译自mysql性能调优及架构设计 优化部分。主要是做个记录和增加学习的广度。
实现算法:
Nested Loop Join。本质上和我们平时使用时的直观感受一致,选定一张驱动表作为循环基础数据,然后循环该结果集中的数据作为过滤条件到下一张表内查询数据,然后合并结果。如果有多个join,则将前面的结果集作为循环条件,到后一个表中查询数据。
书中的例子:
select m.subject msg_subject, c.content msg_content from user_group g,group_message m,group_message_content c where g.user_id = 1 and m.group_id = g.group_id and c.group_msg_id = m.id
userid,group_id,group_msg_id都存在索引。
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: user_group_gid_ind,user_group_uid_ind,user_group_gid_uid_ind key: user_group_uid_ind key_len: 4 ref: const rows: 2 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: m type: ref possible_keys: PRIMARY,idx_group_message_gid_uid key: idx_group_message_gid_uid key_len: 4 ref: g.group_id rows: 3 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: c type: ref possible_keys: idx_group_message_content_msg_id key: idx_group_message_content_msg_id key_len: 4 ref: m.id rows: 2 Extra:
伪代码如下:
for each record g_rec in table user_group that g_rec.user_id=1{ for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{ for each record c_rec in group_message_content that c_rec.group_msg_id=m_rec.id pass the (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output; } }
如果去掉group_msg_id字段的索引,执行计划就不一样了。
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: g type: ref possible_keys: user_group_uid_ind key: user_group_uid_ind key_len: 4 ref: const rows: 2 Extra: *************************** 2. row *************************** id: 1 select_type: SIMPLE table: m type: ref possible_keys: PRIMARY,idx_group_message_gid_uid key: idx_group_message_gid_uid key_len: 4 ref: g.group_id rows: 3 Extra: *************************** 3. row *************************** id: 1 select_type: SIMPLE table: c type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 96 Extra:Using where;Using join buffer最后一张表的索引删除了,就只能全表扫描,因此使用到了join buffer,其实也就是缓存。注意到,前两张表的数据放到了缓存内,然后通过这个缓存再和第三张表进行查询。这是因为第三张表需要全表扫描了,这时候没有必要每一次都关联。。。
for each record g_rec in table user_group{ for each record m_rec in group_message that m_rec.group_id=g_rec.group_id{ put (g_rec, m_rec) into the buffer if (buffer is full) flush_buffer(); } } flush_buffer(){ for each record c_rec in group_message_content that c_rec.group_msg_id = c_rec.id{ for each record in the buffer pass (g_rec.user_id, m_rec.subject, c_rec.content) row combination to output; } empty the buffer; }
join语句的优化:
1.小结果集驱动大结果集
2.优先优化内层循环。
3.对join字段建立索引。
扫描二维码关注公众号,回复:
1306863 查看本文章
4.当无法建立索引时,需要注意join Buffer Size(暂时不清楚,没遇到具体场景)