MySQL join algorithms: Nested-Loop Join, Index Nested-Loop Join, Block Nested-Loop Join, and BKA

1. Nested-Loop Join

MySQL executes joins with the Nested-Loop Join family of algorithms, in which the rows of one table are looped over and matched against the rows of another.

For example:
select * from t1 inner join t2 on t1.id=t2.tid
(1) t1 is the outer table, also called the driving table.
(2) t2 is the inner table, also called the driven table.

In MySQL, Nested-Loop Join comes in three variants:

  • Simple Nested-Loop Join: SNLJ, simple nested loop join
  • Index Nested-Loop Join: INLJ, index nested loop join
  • Block Nested-Loop Join: BNLJ, block nested loop join (uses a join buffer)

When choosing a join algorithm, the optimizer in principle prefers them in the following order, first checking whether INLJ can be used, then BNLJ:
Index Nested-Loop Join > Block Nested-Loop Join > Simple Nested-Loop Join

2. Simple Nested-Loop Join

  1. Simple nested loop join is a plain, brute-force nested loop. If table1 has 10,000 rows and table2 has 10,000 rows, the number of comparisons = 10,000 * 10,000 = 100 million, so the query will be very slow.
  2. MySQL therefore optimizes further with two derived NLJ algorithms, Index Nested-Loop Join and Block Nested-Loop Join. When executing a join query, MySQL picks one of the two depending on the situation.
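
The brute-force loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MySQL's actual implementation; the function name `simple_nested_loop_join` and the sample tables are made up for the example.

```python
def simple_nested_loop_join(outer_rows, inner_rows, outer_key, inner_key):
    """Compare every outer row with every inner row (hypothetical helper)."""
    matches = []
    comparisons = 0
    for outer in outer_rows:        # outer (driving) table
        for inner in inner_rows:    # full scan of the inner (driven) table per outer row
            comparisons += 1
            if outer[outer_key] == inner[inner_key]:
                matches.append((outer, inner))
    return matches, comparisons


# Made-up sample data: t1 has 100 rows, t2 has 200 rows.
t1 = [{"id": i} for i in range(1, 101)]
t2 = [{"tid": i % 100 + 1, "val": i} for i in range(200)]

matches, comparisons = simple_nested_loop_join(t1, t2, "id", "tid")
print(len(matches), comparisons)
```

The comparison count is always rows(outer) * rows(inner), regardless of how many rows actually match.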

3. Index Nested-Loop Join (reduces the number of comparisons against inner table rows)

  1. Index nested loop join performs the join through an index on the inner table. For each outer row, the join condition value is looked up directly in the inner table's index, which avoids comparing against every inner row. The index lookup reduces the number of comparisons against the inner table and greatly improves join performance:

Original number of matches = number of rows in the outer table * number of rows in the inner table
Optimized number of matches = number of rows in the outer table * height of the inner table's index

  2. Usage scenario: Index Nested-Loop Join can be used only when the join column of the inner table has an index.
  3. If that index is a secondary index and the query also returns other columns of the inner table, MySQL has to go back to the inner table to fetch them, which adds some IO.
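
As a rough sketch of the same idea in Python: a sorted list searched with `bisect` stands in for the B+tree index here (an assumption made for the example; MySQL's real index structure is different), and `build_index` and `index_nested_loop_join` are hypothetical names.

```python
from bisect import bisect_left


def build_index(rows, key):
    """Sorted (key value, row position) pairs, standing in for a secondary index."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))


def index_nested_loop_join(outer_rows, inner_rows, inner_index, outer_key):
    """For each outer row, seek into the inner index instead of scanning the inner table."""
    matches = []
    for outer in outer_rows:
        value = outer[outer_key]
        # Binary search to the first index entry whose key equals `value`.
        i = bisect_left(inner_index, (value, -1))
        while i < len(inner_index) and inner_index[i][0] == value:
            pos = inner_index[i][1]
            # Fetching the full row by position mimics going back to the table.
            matches.append((outer, inner_rows[pos]))
            i += 1
    return matches


# Made-up sample data: t1 has 100 rows, t2 has 200 rows.
t1 = [{"id": i} for i in range(1, 101)]
t2 = [{"tid": i % 100 + 1, "val": i} for i in range(200)]

t2_index = build_index(t2, "tid")
matches = index_nested_loop_join(t1, t2, t2_index, "id")
print(len(matches))
```

Each outer row now costs one index seek plus the matching entries, rather than a full pass over t2. The `inner_rows[pos]` access is the extra lookup that a covering index would avoid.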

4. Block Nested-Loop Join (reduces the number of scans of the inner table)

  1. Block nested loop join buffers many outer rows at a time: the columns the query needs are stored in the Join Buffer, and each row read from the inner table is compared against the whole buffer in one batch. This reduces the number of inner loops, because one pass over the inner table serves every outer row in the Join Buffer.
  2. When Index Nested-Loop Join cannot be used, Block Nested-Loop Join is used by default. (Since MySQL 8.0.20 a hash join is used in its place.)
  3. What is Join Buffer?
    (1) The Join Buffer stores every column of the outer table that the query needs, not only the join columns.
    (2) Its size is set with join_buffer_size.
    (3) The default join_buffer_size is 256K. Before MySQL 5.1.22 the maximum was 4G-1; later versions can allocate a Join Buffer larger than 4G on 64-bit operating systems.
    (4) Block Nested-Loop Join requires block_nested_loop in the optimizer_switch setting to be on, which it is by default.
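
A minimal Python sketch of the buffering idea follows. The parameter `buffer_rows` is a simplification invented for the example: it counts rows, whereas the real join_buffer_size is measured in bytes. The function name is hypothetical as well.

```python
def block_nested_loop_join(outer_rows, inner_rows, outer_key, inner_key, buffer_rows):
    """Join in batches: one scan of the inner table per buffer of outer rows."""
    matches = []
    inner_scans = 0
    for start in range(0, len(outer_rows), buffer_rows):
        # Fill the "join buffer" with the next batch of outer rows.
        join_buffer = outer_rows[start:start + buffer_rows]
        inner_scans += 1
        for inner in inner_rows:        # one pass over the inner table
            for outer in join_buffer:   # compared against the whole buffer
                if outer[outer_key] == inner[inner_key]:
                    matches.append((outer, inner))
    return matches, inner_scans


# Made-up sample data: t1 has 100 rows, t2 has 200 rows.
t1 = [{"id": i} for i in range(1, 101)]
t2 = [{"tid": i % 100 + 1, "val": i} for i in range(200)]

matches, inner_scans = block_nested_loop_join(t1, t2, "id", "tid", buffer_rows=25)
print(len(matches), inner_scans)
```

The number of row comparisons is the same as in the simple nested loop. What shrinks is the number of times the inner table is read: one scan per buffer of outer rows instead of one scan per outer row.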

5. BKA

The working steps of the Batched Key Access Join algorithm are as follows:

1) Put the relevant columns of the outer table into the Join Buffer.

2) Send the keys (index key values) to the Multi-Range Read (MRR) interface in batches.

3) Multi-Range Read (MRR) sorts the received keys by their corresponding ROWID, then reads the data in that order.

4) Return the result set to the client.
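
The four steps can be sketched in Python as below. This is a toy model under stated assumptions: list positions play the role of ROWIDs, a dictionary plays the role of the inner table's index, and `build_index`, `bka_join`, and `buffer_rows` are names invented for the example rather than anything in MySQL.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the row ids (list positions) that contain it."""
    index = defaultdict(list)
    for rid, row in enumerate(rows):
        index[row[key]].append(rid)
    return index


def bka_join(outer_rows, inner_rows, inner_index, outer_key, inner_key, buffer_rows):
    matches = []
    for start in range(0, len(outer_rows), buffer_rows):
        # Step 1: put a batch of outer rows into the join buffer.
        join_buffer = outer_rows[start:start + buffer_rows]
        buffered = defaultdict(list)
        for outer in join_buffer:
            buffered[outer[outer_key]].append(outer)
        # Step 2: hand the batch of keys to the MRR stand-in.
        keys = list(buffered)
        # Step 3: resolve keys to row ids, sort by row id, read rows in that order.
        row_ids = sorted(rid for k in keys for rid in inner_index.get(k, []))
        for rid in row_ids:
            inner = inner_rows[rid]
            for outer in buffered[inner[inner_key]]:
                matches.append((outer, inner))
    # Step 4: return the result set.
    return matches


# Made-up sample data.
t1 = [{"id": 3}, {"id": 1}, {"id": 2}]
t2 = [{"tid": 2, "val": "a"}, {"tid": 3, "val": "b"},
      {"tid": 2, "val": "c"}, {"tid": 9, "val": "d"}]

matches = bka_join(t1, t2, build_index(t2, "tid"), "id", "tid", buffer_rows=2)
print([(o["id"], i["val"]) for o, i in matches])
```

Sorting the row ids before reading is the point of step 3: the inner table is then read in storage order within each batch instead of in the random order of the keys.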

6. How to optimize Join speed

  1. Let the small result set drive the large one, to reduce the number of outer-loop iterations:
    If the join columns of both the small and the large result set are indexed, MySQL will itself choose the small result set as the driving table for an inner join. Because the cost of an index lookup is roughly fixed, fewer outer-loop iterations mean a faster join.
  2. Add an index on the join condition: aim for INLJ to reduce the number of comparisons against the inner table.
  3. Increase join_buffer_size: with BNLJ, the more outer rows are buffered at a time, the fewer outer-loop batches (and therefore passes over the inner table) are needed.
  4. Avoid selecting unnecessary fields:
    (1) With BNLJ, the fewer fields, the more rows fit in the join buffer, and the fewer outer-loop batches are needed;
    (2) With INLJ, if no lookup back to the table is needed, that is, a covering index is used, the join may be faster. (Unverified, just an inference)
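
Points 3 and 4(1) can be put into rough numbers. The estimate below is a simplified model that assumes every buffered row takes a fixed number of bytes and ignores any per-row overhead; `inner_table_scans` and its parameters are hypothetical names for illustration.

```python
import math


def inner_table_scans(outer_rows, bytes_per_buffered_row, join_buffer_size=256 * 1024):
    """Estimate how many passes over the inner table BNLJ needs (simplified model)."""
    rows_per_buffer = max(1, join_buffer_size // bytes_per_buffered_row)
    return math.ceil(outer_rows / rows_per_buffer)


# 100,000 outer rows, default 256K buffer, 200 bytes of needed columns per row.
print(inner_table_scans(100_000, 200))
# Same query selecting fewer fields: 50 bytes per buffered row.
print(inner_table_scans(100_000, 50))
# Same 200-byte rows with a 1M buffer.
print(inner_table_scans(100_000, 200, join_buffer_size=1024 * 1024))
```

Under this model, either shrinking the buffered row or enlarging the buffer cuts the number of inner-table passes by roughly the same factor.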

Origin blog.csdn.net/qq_42000661/article/details/108578997