[MySQL] Block Nested-Loop Join (BNL) for MySQL Performance Optimization

1 Introduction
  I believe many developers and DBAs have been dissatisfied with the way MySQL handles multi-table joins, or with their performance. DBAs tend to resist queries containing joins that developers submit, and usually recommend splitting the joins into separate queries to avoid the performance problems joins can cause, even though this increases the number of network round trips between the application and the DB.
Before version 5.5, MySQL supported only one way of joining tables, the nested loop. If the joined tables hold a large amount of data, the join takes a very long time to execute. In versions after 5.5, MySQL optimizes nested-loop execution by introducing the BNL algorithm. This article introduces two join algorithms: Nested-Loop Join (NLJ) and Block Nested-Loop Join (BNL).

2 Principles
2.1 Nested Loop Join Algorithm

  NLJ algorithm: the result set of the driving (outer) table serves as the base data of the loop. Rows are read from that result set one at a time, each row is used as the filter condition for looking up rows in the next table, and the results are merged. In a multi-table join, the result set of the preceding tables becomes the loop data: each row is fetched and matched against the next table in the join, and the final result set is returned to the client.
Pseudocode for a nested-loop join of three tables t1, t2, and t3:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions,
      send to client
    }
  }
}
  Because the simple nested-loop algorithm passes only one row at a time into the inner loop, the inner loop runs once for every row in the outer loop's result set. Let Rn be the number of rows in the outer table R and Sn the number of rows in the inner table S. When the join column of the inner table is indexed, the scan cost is O(Rn); without an index, it is O(Rn*Sn). If the inner table S has many rows, a Simple Nested-Loop Join scans it many times and performs very poorly.
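To make the cost concrete, here is a minimal Python sketch of a two-table simple nested-loop join without an index. It is only an illustration of the idea, not MySQL's implementation; the function name `nested_loop_join`, the in-memory lists of dicts, and the sample data are all assumptions made for this example.

```python
def nested_loop_join(outer_rows, inner_rows, key):
    """Simple nested-loop join on `key`; also counts full inner-table scans."""
    results = []
    inner_scans = 0
    for outer in outer_rows:
        inner_scans += 1  # one full pass over the inner table per outer row
        for inner in inner_rows:
            if outer[key] == inner[key]:
                results.append((outer, inner))
    return results, inner_scans


# Hypothetical sample data: 100 outer rows, 20 inner rows.
nlj_outer = [{"id": i} for i in range(100)]
nlj_inner = [{"id": i, "val": i * 2} for i in range(0, 100, 5)]

nlj_rows, nlj_scans = nested_loop_join(nlj_outer, nlj_inner, "id")
print(len(nlj_rows), nlj_scans)
```

With 100 outer rows the inner list is walked 100 times, which is the O(Rn*Sn) behavior described above.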

2.2 Block Nested-Loop Join Algorithm
BNL algorithm: rows from the outer loop's result set are stored in the join buffer, and each row of the inner loop is compared against all the rows in the buffer at once, which reduces the number of times the inner loop has to run.
For example, suppose the result set of the outer loop has 100 rows. With the NLJ algorithm, the inner table is scanned 100 times. With the BNL algorithm, 10 rows read from the outer table are first placed in the join buffer, and the inner table is then scanned against those 10 rows, so each pass over the inner table handles 10 outer rows at once. The inner table is therefore scanned only 10 times instead of 100, a reduction of 9/10. The BNL algorithm can thus significantly reduce the number of scans of the inner table.
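The 100-row example can be reproduced with a small Python sketch. This is a simplified model under stated assumptions: the buffer capacity is given as a row count (`buffer_rows`, a hypothetical parameter), whereas MySQL sizes the join buffer in bytes through join_buffer_size, and the function name and data are invented for illustration.

```python
def block_nested_loop_join(outer_rows, inner_rows, key, buffer_rows):
    """Block nested-loop join on `key`; also counts full inner-table scans."""
    results = []
    inner_scans = 0
    buffer = []

    def flush():
        # Scan the inner table once and compare each inner row
        # against every outer row currently held in the buffer.
        nonlocal inner_scans
        inner_scans += 1
        for inner in inner_rows:
            for outer in buffer:
                if outer[key] == inner[key]:
                    results.append((outer, inner))
        buffer.clear()

    for outer in outer_rows:
        buffer.append(outer)
        if len(buffer) == buffer_rows:  # buffer is full
            flush()
    if buffer:  # leftover rows that did not fill the buffer
        flush()
    return results, inner_scans


# Hypothetical sample data: 100 outer rows, buffer holds 10 rows.
bnl_outer = [{"id": i} for i in range(100)]
bnl_inner = [{"id": i, "val": i * 2} for i in range(0, 100, 5)]

bnl_rows, bnl_scans = block_nested_loop_join(bnl_outer, bnl_inner, "id", 10)
print(len(bnl_rows), bnl_scans)
```

The number of row comparisons is unchanged; what drops is the number of passes over the inner table, from 100 to 10 in this setup.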
For the three-table query in section 2.1, the join proceeds as follows when a join buffer is used:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions,
          send to client
        }
      }
      empty buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions,
      send to client
    }
  }
}
If S is the size of each t1, t2 combination stored in the join buffer and C is the number of such combinations, then the number of times table t3 is scanned is:
(S * C)/join_buffer_size + 1
The number of t3 scans decreases as join_buffer_size increases, up to the point where the join buffer can hold all t1, t2 combinations. Beyond that point, a larger join buffer does not make the query any faster.
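As a rough illustration of the formula, the following Python sketch evaluates it for a few buffer sizes. It assumes the division is integer division and ignores any per-row overhead inside the buffer; the function name `estimated_t3_scans` and the sample numbers (20-byte combinations, 100,000 combinations) are made up for this example.

```python
def estimated_t3_scans(s_bytes, c_combinations, join_buffer_size):
    """Estimate t3 scans as (S * C) / join_buffer_size + 1.

    Assumption: integer division, no per-row overhead in the buffer.
    """
    return (s_bytes * c_combinations) // join_buffer_size + 1


# Hypothetical workload: S = 20 bytes per combination, C = 100,000 combinations.
S_BYTES = 20
C_COMBINATIONS = 100_000

for size in (128 * 1024, 256 * 1024, 1024 * 1024, 4 * 1024 * 1024, 8 * 1024 * 1024):
    print(size, estimated_t3_scans(S_BYTES, C_COMBINATIONS, size))
```

Once the buffer is larger than S * C (about 2 MB here), the estimate stays at a single scan, which matches the statement that growing the buffer further brings no extra benefit.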

2.3 Key points about how MySQL uses the join buffer
  1. The join_buffer_size variable determines the size of each join buffer.
  2. A join buffer can be used only when the join type is ALL, index, or range.
  3. A buffer is allocated for each join that can be buffered, so a single query may end up using multiple join buffers.
  4. No join buffer is allocated for the first nonconstant table, even if its scan type is ALL or index.
  5. The join buffer is allocated before the join is executed and released after the query finishes.
  6. Only the columns the join needs are stored in the join buffer, not entire rows.

3 How to use
  In version 5.6 and later, the block_nested_loop flag of the optimizer_switch system variable controls whether the optimizer uses BNL. It is on by default. If it is set to off, the optimizer chooses the NLJ algorithm when selecting a join method.
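The value of optimizer_switch is a comma-separated list of flag=on/off pairs, so checking the flag from application code can be sketched as below. This is only a sketch: the helper `parse_optimizer_switch` is hypothetical, the sample string is a shortened, made-up stand-in for what `SELECT @@optimizer_switch` returns on a real server, and the SQL statement is kept as a plain string rather than executed.

```python
def parse_optimizer_switch(value):
    """Turn 'a=on,b=off' into {'a': True, 'b': False}."""
    flags = {}
    for item in value.split(","):
        name, _, state = item.strip().partition("=")
        flags[name] = (state == "on")
    return flags


# Shortened, made-up sample; a real server returns many more flags.
sample_switch = "index_merge=on,block_nested_loop=on,batched_key_access=off"

switch_flags = parse_optimizer_switch(sample_switch)
print(switch_flags["block_nested_loop"])

# Statement that would turn BNL off for the current session (not executed here).
DISABLE_BNL_SQL = "SET optimizer_switch = 'block_nested_loop=off';"
```

Running the statement in a session changes only that flag and leaves the other optimizer_switch flags as they were.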

4 References
     In version 5.6, BNL also supports outer joins and semi-joins, and is related to other features such as Batched Key Access (BKA). I will cover those optimizations in a later article.
    "Nested-Loop Join Algorithms"
    "Block Nested-Loop and Batched Key Access Joins"
    "mysql's join buffer"
