The execution process of MySQL JOIN (1)

Happy moment

　　Me: Hey, boss lady, do you have iced black tea?
　　Proprietress: Yes.
　　Me: How much for a bottle?
　　Proprietress: Three.
　　Me: I'll take a bottle. Here, three.
　　Proprietress: Here you go, your iced black tea.
　　Me: Are you kidding? I asked for iced black tea, so why are you handing me a bottle cap?
　　Proprietress: It's a "one more bottle" winning cap. My store is sold out, so go next door to redeem it.

Problem background

　　Regarding MySQL JOIN, have you ever thought about its execution process, or doubted your own understanding of it (confidence and self-reflection!)? If you are not sure how to check, try answering the following questions.

　　Selection of the driving table

　　　　How does MySQL choose the driving table? Does it simply take the first table from left to right?

　　The order of a multi-table join

　　　　Suppose we have 3 tables, A, B, and C, and the following SQL:

-- Pseudo SQL, cannot be executed directly
A LEFT JOIN B ON B.aId = A.id
LEFT JOIN C ON C.aId = A.id
WHERE A.name = '666' AND B.state = 1 AND C.create_time > '2019-11-22 12:12:30'

　　　　Are A and B joined first and that combined result then joined with C, or are A, B, and C all joined first and the filters applied afterwards, or are both wrong and MySQL does something else?

　　When ON and WHERE take effect

　　　　I happened upon a blog post with the following description

Figure 1, taken from "Mysql-JOIN Detailed Explanation"

　　　　After reading it, I felt as if I had discovered a new world: so that is the execution order of JOIN! (It did not overturn my previous understanding, because I had never thought about the question before; I was simply pleased to have picked up a new skill.) But the more I thought about it, the more wrong it seemed, as if I had learned the wrong skill (I didn't learn it at level 6!)

　　　　If the two tables each hold tens of millions of rows, their Cartesian product would be unimaginably large! In other words, the order shown in Figure 1 is open to question, and so is the point at which ON and WHERE take effect.

　　If you already know all of the above, feel free to move on and let me show off in peace; if you are not entirely clear on these questions, please take a seat, because I am about to start showing off.

Prerequisite preparation

　　Before the lecture proper, let me hand out some peanuts, melon seeds, and beer. If you are going to show off, you need the right atmosphere. (Hey author, you fraud, are you selling snacks?)

　　Driving table

　　　　What is a driving table? It is the first table processed in a multi-table join query, also called the base table; its records are then used to look up the other tables. The choice of driving table follows one principle: provided the final result set is not affected, the table with the smallest result set is preferred as the driving table. This principle is not easy to apply. The size of a result set can perhaps be estimated, but whether a choice leaves the final result set unaffected is hard to judge. Still, there are some rules of thumb:

LEFT JOIN generally uses the left table as the driving table (RIGHT JOIN generally uses the right table), and INNER JOIN generally uses the table with the smaller result set as the driving table. If you are still in doubt, use EXPLAIN to find the driving table: the first table in its output is the driving table.
Do you think EXPLAIN is always accurate? The execution plan may change when the query is actually executed!

These rules, EXPLAIN in particular, hold in the vast majority of cases

　　　　In some cases the query optimizer rewrites a LEFT JOIN as an INNER JOIN. Here "result set" means the records left in a table after filtering, not all the records in the table; if there is no filter condition, it is all the records in the table

　　　　For more information, please see: Execution details of Mysql multi-table join query (1)

　　Flow chart of SQL execution

　　　　What does MySQL do when we send it a request?

SQL execution path, taken from "High Performance MySQL"

　　　　As you can see, the execution plan is the output result of the query optimizer, and the execution engine queries the data according to the execution plan

　　Data preparation

　　　　MySQL 5.7.1, InnoDB engine; table creation SQL and data initialization SQL

　　Single table query

　　　　The single-table query process is easier to understand, roughly as follows

　　　　I won't go into detail about single-table queries. They mainly involve three things: the clustered index, covering indexes, and back-to-table lookups. Once you know these 3 points, the picture above is easy to understand (if you don't, look them up quickly, or you will be embarrassed when it shows!).

Join algorithms

　　MySQL's join algorithms are a family of algorithms derived from the nested-loop algorithm. Which one is selected depends on the conditions.

When the join uses an index, there are two algorithms: Index Nested-Loop join and Batched Key Access join;
when the join does not use an index, there are two algorithms: Simple Nested-Loop join and Block Nested-Loop join.

　　Simple Nested-Loop

　　　　Simple nested loop, SNL for short; every record of the driving table is compared with every record of the driven table, one by one

　　　　This algorithm is simple and crude, but it performs terribly: its time complexity is n (the number of records per table) to the power of m (the number of tables). MySQL therefore avoids it, and it does not appear in join queries. Even when there is no WHERE condition and no index on the join key in ON, this algorithm is not used
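　　　　To make the cost concrete, here is a minimal Python sketch of a simple nested loop over two in-memory lists. It is only an illustration of the idea, not MySQL's implementation; the function name, the sample rows, and the comparison counter are assumptions introduced for this example.

```python
def simple_nested_loop_join(outer_rows, inner_rows, key):
    """Compare every outer row with every inner row and count the comparisons."""
    matches = []
    comparisons = 0
    for outer in outer_rows:        # driving table
        for inner in inner_rows:    # driven table, rescanned in full for each outer row
            comparisons += 1
            if outer[key] == inner[key]:
                matches.append((outer, inner))
    return matches, comparisons


# Hypothetical sample rows, standing in for tbl_user and tbl_user_login_log
users = [{"user_name": name} for name in ("alice", "bob", "carol")]
logins = [{"user_name": name} for name in ("bob", "bob", "dave", "alice")]

matches, comparisons = simple_nested_loop_join(users, logins, "user_name")
print(len(matches), comparisons)
```

　　　　With 3 driving rows and 4 driven rows, the loop always performs 3 * 4 = 12 comparisons, no matter how few rows actually match.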

　　Block Nested-Loop

　　　　Block nested loop join, BNL for short, is an optimization of SNL; it caches multiple records of the driving table in the Join Buffer at once, and then matches each record read by the inner loop against the buffered records in a batch

　　　　Each row read in the inner loop is compared with all the records in the buffer, which reduces the number of reads of the driven table. For example, suppose the driving table has 30 records and the driven table has 50. Without a Join Buffer, the inner loop reads 30 * 50 = 1500 rows from the driven table. With a Join Buffer that can hold 10 records, the inner loop reads 30/10 * 50 = 150 rows, so the number of reads of the driven table drops by an order of magnitude.

　　　　When the driven table has no index on the join key and no index on the WHERE filter condition, this algorithm is often used to complete the join, as shown below
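　　　　The arithmetic above can be checked with a small Python sketch. It assumes a buffer measured in rows (a simplification: MySQL sizes the real join buffer in bytes), and the function name and sample data are hypothetical.

```python
def block_nested_loop_join(outer_rows, inner_rows, key, buffer_size):
    """Join in chunks of buffer_size outer rows; count rows read from the driven table."""
    matches = []
    inner_rows_read = 0
    for start in range(0, len(outer_rows), buffer_size):
        join_buffer = outer_rows[start:start + buffer_size]  # cache a batch of driving rows
        for inner in inner_rows:                             # one scan of the driven table per batch
            inner_rows_read += 1
            for outer in join_buffer:                        # compare against every buffered row
                if outer[key] == inner[key]:
                    matches.append((outer, inner))
    return matches, inner_rows_read


driving = [{"id": i} for i in range(30)]        # 30 driving-table records
driven = [{"id": i % 30} for i in range(50)]    # 50 driven-table records

# A buffer of one row behaves like the unbuffered case
_, reads_unbuffered = block_nested_loop_join(driving, driven, "id", 1)
bnl_matches, reads_buffered = block_nested_loop_join(driving, driven, "id", 10)
print(reads_unbuffered, reads_buffered)
```

　　　　Note that the number of row comparisons is still 30 * 50 in both runs; what the buffer saves is the repeated reading of the driven table.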

　　Index Nested-Loop

　　　　Index nested loop, INL for short, is a join algorithm based on an index of the driven table; each record of the driving table is matched against the driven table's index, which avoids comparing it with every record of the driven table and so reduces the number of matches against the driven table. The approximate flow is shown in the figure below

　　　　Let's look at a real case. First add an index to tbl_user_login_log: ALTER TABLE tbl_user_login_log ADD INDEX idx_user_name (user_name); then look at the join execution plan

　　　　You can see that the index on tbl_user_login_log is in effect. Let's read on

　　　　Something interesting happened: the driving table became tbl_user_login_log and tbl_user became the driven table. tbl_user_login_log is filtered through its index to produce a result set, and that result set is then matched with tbl_user using the BNL algorithm. This is MySQL's optimization at work: after index filtering, tbl_user_login_log yields fewer records than tbl_user has, so tbl_user_login_log is chosen as the driving table, and the rest follows naturally. Doesn't MySQL feel powerful?
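　　　　The matching step described above can be sketched in Python, with a dictionary standing in for the driven table's index. This is an assumption made for illustration (a real InnoDB index is a B+ tree), and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the rows that carry it (a stand-in for a secondary index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loop_join(outer_rows, inner_index, key):
    """For each driving row, probe the index instead of scanning the driven table."""
    matches = []
    lookups = 0
    for outer in outer_rows:
        lookups += 1
        for inner in inner_index.get(outer[key], []):  # only rows with a matching key
            matches.append((outer, inner))
    return matches, lookups


# Hypothetical sample rows, standing in for tbl_user and tbl_user_login_log
users = [{"user_name": name} for name in ("alice", "bob", "carol")]
logins = [{"user_name": name} for name in ("bob", "bob", "dave", "alice")]

login_index = build_index(logins, "user_name")
inl_matches, lookups = index_nested_loop_join(users, login_index, "user_name")
print(len(inl_matches), lookups)
```

　　　　Each driving row costs one index probe, so the work grows with the size of the driving table rather than with the product of the two table sizes.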

　　Batched Key Access

　　　　Batched Key Access, BKA for short, is an optimization of the INL algorithm;

　　　　BKA's optimization of INL is similar to BNL's optimization of SNL, but there are differences. For reasons of space, BKA is left to the next article; I hope you will forgive me! If you really can't, come and hit me!

Summary

　　1. MySQL has a set of algorithms for choosing the driving table, and those who are interested can study them in depth; the more reliable way to determine the driving table is to use EXPLAIN

　　2. Tables are not joined two at a time, with the combined result then joined to the third table. Instead, one record of the driving table is carried all the way through: after it has been matched against all the joined tables, the next record of the driving table is taken and the process repeats;

　　3. MySQL's join algorithms are based on the nested-loop algorithm, and different derived algorithms are used in different situations

　　4. ON and WHERE will be explained in detail in the next article; in the meantime, think about how they differ and when each takes effect
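　　Point 2 above can be illustrated with a minimal Python sketch of three tables A, B, and C joined on aId. It assumes inner-join semantics for simplicity (the NULL-extended rows of a LEFT JOIN are not modeled), and the function name and sample rows are hypothetical.

```python
def pipelined_join(a_rows, b_rows, c_rows):
    """Carry each A row through B and then C before taking the next A row."""
    results = []
    for a in a_rows:                      # driving table
        for b in b_rows:
            if b["aId"] != a["id"]:
                continue
            for c in c_rows:              # reached only for (A, B) pairs that matched
                if c["aId"] == a["id"]:
                    results.append((a["id"], b["name"], c["name"]))
    return results


table_a = [{"id": 1}, {"id": 2}]
table_b = [{"aId": 1, "name": "b1"}, {"aId": 1, "name": "b2"}, {"aId": 2, "name": "b3"}]
table_c = [{"aId": 1, "name": "c1"}, {"aId": 2, "name": "c2"}, {"aId": 2, "name": "c3"}]

results = pipelined_join(table_a, table_b, table_c)
order_of_a = [a_id for a_id, _, _ in results]
print(order_of_a)
```

　　Every output row for the first record of A is produced before the second record of A is touched, and no intermediate A-B result set is ever built in full.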