Nested Loops, Hash Join, and Sort Merge Join: differences among the three join methods

Original text: https://blog.csdn.net/tianlesoftware/article/details/5826546

 

 

1. NESTED LOOP:

A nested loop join is the better choice when the subset of data being joined is small. In a nested loop, the outer table drives the inner table: for each row returned by the outer table, the inner table is searched for matching rows. The result set returned by the whole query therefore should not be too large (more than about 10,000 rows is unsuitable). Use the table that returns the smaller subset as the outer table (by default, the CBO treats the outer table as the driving table), and make sure the join column of the inner table is indexed. You can use the ORDERED hint to change the CBO's default driving table, and USE_NL(table_name1 table_name2) to force the CBO to perform a nested loop join.

        

Nested loops are generally used when the joined table has an index and the index is highly selective.

 

Steps: Choose a driving table (the outer table); the other table is the inner table. Each row of the driving table is joined to its matching rows in the inner table, much like a nested loop in code. This method suits a relatively small row set from the driving table (<10000 rows) and requires an efficient access path (an index) on the inner table. Join order matters a great deal: the driving table's row set must be small. In return, nested loops give the fastest response time for returning the first rows of the result set.
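The steps above can be sketched in Python. This is only an illustration of the idea, not how Oracle implements it: the function name `nested_loop_join`, the dictionary standing in for the index on the inner table, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict

def nested_loop_join(outer_rows, inner_rows, key):
    """Join two lists of dicts on `key`, driving from outer_rows."""
    # Stand-in for an index on the inner table's join column.
    inner_index = defaultdict(list)
    for row in inner_rows:
        inner_index[row[key]].append(row)

    result = []
    for outer in outer_rows:                           # one pass over the driving table
        for inner in inner_index.get(outer[key], []):  # one index lookup per outer row
            result.append({**inner, **outer})
    return result

# Hypothetical sample data: a small driving table and an indexed inner table.
employees = [
    {"emp": "King", "job_id": "AD_PRES"},
    {"emp": "Kochhar", "job_id": "AD_VP"},
]
jobs = [
    {"job_id": "AD_PRES", "title": "President"},
    {"job_id": "AD_VP", "title": "Vice President"},
    {"job_id": "IT_PROG", "title": "Programmer"},
]
joined = nested_loop_join(employees, jobs, "job_id")
```

The number of index lookups equals the number of rows in the driving table, which is why a small driving table matters so much here.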

cost = outer access cost + (inner access cost * outer cardinality)
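To see how quickly this formula grows with the size of the driving table, here is a small arithmetic sketch. The function name and the input numbers are made up for illustration; they are not taken from any real plan, and the real optimizer's costing involves more than this one line.

```python
def nested_loop_cost(outer_access_cost, inner_access_cost, outer_cardinality):
    # cost = outer access cost + (inner access cost * outer cardinality)
    return outer_access_cost + inner_access_cost * outer_cardinality

# Hypothetical inputs: same access costs, different driving-table sizes.
small_driver = nested_loop_cost(5, 2, 100)      # 5 + 2 * 100
large_driver = nested_loop_cost(5, 2, 10_000)   # 5 + 2 * 10000
```

With these assumed inputs, the inner access cost is paid once per outer row, so the driving table's cardinality dominates the total.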

 

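--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 2 | NESTED LOOPS | | 3 | 141 | 7 (15)|
| 3 | TABLE ACCESS FULL | EMPLOYEES | 3 | 60 | 4 (25)|
| 4 | TABLE ACCESS BY INDEX ROWID| JOBS | 19 | 513 | 2 (50)|
| 5 | INDEX UNIQUE SCAN | JOB_ID_PK | 1 | | |
--------------------------------------------------------------------------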

EMPLOYEES is the outer table, JOBS is the inner table.

 

 

2. HASH JOIN:

Hash join is the method the CBO commonly uses to join large data sets. The optimizer takes the smaller of the two tables (or row sources) and builds a hash table in memory on the join key; it then scans the larger table and probes the hash table to find the matching rows.
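The build and probe phases can be sketched in a few lines of Python. The function name `hash_join` and the sample rows are assumptions for illustration; a Python dictionary plays the role of the in-memory hash table.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, key):
    """Equi-join two lists of dicts; build_rows should be the smaller input."""
    # Build phase: hash the smaller input on the join key.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row[key]].append(row)

    # Probe phase: scan the larger input once and look up each join key.
    result = []
    for probe in probe_rows:
        for build in hash_table.get(probe[key], []):
            result.append({**build, **probe})
    return result

# Hypothetical sample data.
orders = [{"order_id": 1, "customer": "A"}, {"order_id": 2, "customer": "B"}]
order_items = [
    {"order_id": 1, "item": "pen"},
    {"order_id": 1, "item": "ink"},
    {"order_id": 2, "item": "pad"},
    {"order_id": 3, "item": "cap"},
]
joined = hash_join(orders, order_items, "order_id")
```

Each input is read exactly once, which is why this method suits large row sets as long as the build side fits in memory.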

This approach works well when the smaller table fits entirely in memory, so that the total cost is the sum of the costs of accessing the two tables. When the table is too large to fit in memory, the optimizer splits it into several partitions, and the partitions that do not fit in memory are written to temporary segments on disk. In that case, larger temporary segments are needed to maximize I/O performance.
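The partitioning idea can be sketched as follows. This is a simplified illustration, not Oracle's actual algorithm: the function name `partitioned_hash_join` and the `num_partitions` parameter are assumptions, and plain Python lists stand in for the partitions that would be written to temporary segments on disk.

```python
from collections import defaultdict

def partitioned_hash_join(build_rows, probe_rows, key, num_partitions=4):
    # Partition phase: route rows of both inputs by the hash of the join key,
    # so matching rows always land in the same partition number.
    build_parts = defaultdict(list)
    probe_parts = defaultdict(list)
    for row in build_rows:
        build_parts[hash(row[key]) % num_partitions].append(row)
    for row in probe_rows:
        probe_parts[hash(row[key]) % num_partitions].append(row)

    # Join phase: handle one partition pair at a time, so only one
    # partition's hash table has to be held in memory at once.
    result = []
    for p in range(num_partitions):
        table = defaultdict(list)
        for row in build_parts.get(p, []):
            table[row[key]].append(row)
        for probe in probe_parts.get(p, []):
            for build in table.get(probe[key], []):
                result.append({**build, **probe})
    return result

# Hypothetical sample data: keys 0..7 on the build side, 0..9 on the probe side.
build = [{"k": i, "b": i * 10} for i in range(8)]
probe = [{"k": i % 10, "p": i} for i in range(20)]
joined = partitioned_hash_join(build, probe, "k")
```

The extra pass that writes and rereads each partition is the cost the text refers to when the build side does not fit in memory.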

You can also use the USE_HASH(table_name1 table_name2) hint to force a hash join. If you use hash joins, the HASH_AREA_SIZE initialization parameter must be large enough. On 9i, Oracle recommends automatic SQL work area management instead: set WORKAREA_SIZE_POLICY to AUTO and then tune PGA_AGGREGATE_TARGET.

        

Hash join is used when the two tables differ greatly in size.

 

Steps: Build a hash table on the join key in memory from the smaller of the two tables, then scan the other table, hash its join key as well, and probe the hash table to check whether each row joins. This method suits relatively large row sets. Note that if the hash table is too large to build in memory in one pass, it is split into several partitions that are written to temporary segments on disk; the extra writes reduce efficiency.

cost = (outer access cost * # of hash partitions) + inner access cost
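As a rough arithmetic check of this formula, the sketch below plugs in the access costs shown in the plan that follows (4 for ORDERS and 4 for ORDER_ITEMS). The function name is made up, and treating the in-memory case as a single hash partition is an assumption, as is the three-partition case.

```python
def hash_join_cost(outer_access_cost, inner_access_cost, hash_partitions=1):
    # cost = (outer access cost * # of hash partitions) + inner access cost
    return outer_access_cost * hash_partitions + inner_access_cost

in_memory = hash_join_cost(4, 4)                    # assumed single partition
spilled = hash_join_cost(4, 4, hash_partitions=3)   # hypothetical spill to 3 partitions
```

Under the single-partition assumption the formula gives 8, which agrees with the HASH JOIN cost in the plan; more partitions multiply the outer access cost.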

 

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 665 | 13300 | 8 (25)|
| 1 | HASH JOIN | | 665 | 13300 | 8 (25)|
| 2 | TABLE ACCESS FULL | ORDERS | 105 | 840 | 4 (25)|
| 3 | TABLE ACCESS FULL | ORDER_ITEMS | 665 | 7980 | 4 (25)|
--------------------------------------------------------------------------

ORDERS is used to build the hash table; ORDER_ITEMS is scanned to probe it.

 

 

3. SORT MERGE JOIN:

In general, a hash join performs better than a sort merge join. However, if the row sources are already sorted, no additional sort is needed for the sort merge join, and in that case it can outperform the hash join. You can use USE_MERGE(table_name1 table_name2) to force a sort merge join.

        

Sort Merge join is used when there is no index and the data is already sorted.

 

cost = outer access cost + inner access cost + sort costs (if a sort is needed)

 

Steps: Sort both tables on the join key, then merge the two sorted row sets. Typically, this join method is used only when:

1. RBO mode

2. Non-equality join conditions (>, <, >=, <=)

3. HASH_JOIN_ENABLED=false

4. The data source is sorted
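The sort and merge steps can be sketched in Python for the equality case. The function name `sort_merge_join` and the sample rows are assumptions for illustration; if both inputs were already sorted on the join key, the two `sorted` calls would be the part that could be skipped.

```python
def sort_merge_join(left_rows, right_rows, key):
    """Equi-join two lists of dicts by sorting both on `key` and merging."""
    # Sort phase: order both inputs on the join key.
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])

    # Merge phase: advance two cursors through the sorted inputs.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row having this key with the whole right run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result

# Hypothetical sample data with duplicate keys on both sides.
left = [{"k": 2, "l": "b"}, {"k": 1, "l": "a"}, {"k": 2, "l": "c"}]
right = [{"k": 2, "r": "x"}, {"k": 3, "r": "y"}, {"k": 2, "r": "z"}]
joined = sort_merge_join(left, right, "k")
```

Neither input needs an index, but both must be sorted, which is where most of the work goes when the data does not arrive in order.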

 

 

4. Comparison of how the three joins work:

     

A hash join works by hashing one table (usually the smaller one) and storing its column data in a hash table; it then reads rows from the other table, hashes them, and looks up the hash table for matching values.

        

Nested loops work by reading rows from one table and accessing the other table (usually through an index) to find matches. Nested loops are most efficient when one of the joined tables is relatively small.

 

A merge join first sorts both joined tables on the join columns, then reads rows from each sorted set and matches them against the other. Because of the extra sorting, a merge join consumes more resources. Generally speaking, where a merge join can be used, a hash join can perform better.
