Merge Join vs. Hash Join vs. Nested Loop

This is an original article. When reprinting, be sure to place the following paragraph at the beginning of the article and retain the hyperlink.
This article is forwarded from Technology World; the original link is http://www.jasongj.com/2015/03/07/Join1/

Introduction to Nested Loop, Hash Join, and Merge Join

  • Nested Loop:
    Nested Loop is the better choice when the subset of data being joined is small. It scans one table (the outer table) and, for each row read, looks up matching rows in the other table (the inner table) using the index on the join field. If the join field has no index, the query optimizer generally will not choose Nested Loop. The inner table is usually a large table with an index, and it is driven by the outer table (also called the "driving table"). The outer table is usually small, not merely small relative to the other tables but also small in absolute row count, and it does not need an index. Because every row returned by the outer table must be looked up in the inner table, the result set returned by the whole query should not be too large (more than 10,000 rows is unsuitable).
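
The lookup pattern described above can be sketched in Python. This is only an illustration under stated assumptions, not how any particular database implements it: the sorted list of (key, position) pairs stands in for a B-tree index on the inner table, and the names `build_sorted_index`, `nested_loop_join`, `departments`, and `employees` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(rows, key):
    """Stand-in for a B-tree index: sorted (key value, row position) pairs."""
    return sorted((row[key], pos) for pos, row in enumerate(rows))


def nested_loop_join(outer_rows, inner_rows, inner_index, key):
    """Scan the outer (driving) table; probe the inner table's index per row."""
    index_keys = [k for k, _ in inner_index]
    result = []
    for outer in outer_rows:  # one pass over the small driving table
        # Index range scan: locate all index entries equal to the join key.
        lo = bisect_left(index_keys, outer[key])
        hi = bisect_right(index_keys, outer[key])
        for _, pos in inner_index[lo:hi]:
            result.append({**outer, **inner_rows[pos]})
    return result


# Hypothetical sample data: a small driving table and an indexed inner table.
departments = [{"dept_id": 1, "dept": "R&D"}, {"dept_id": 2, "dept": "Sales"}]
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cid", "dept_id": 1},
]
employee_index = build_sorted_index(employees, "dept_id")
rows = nested_loop_join(departments, employees, employee_index, "dept_id")
```

The cost is one index lookup per outer row, which is why the approach suits a small driving table and a small result set.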

  • Hash Join:
    Hash Join is the usual way to join large data sets. The optimizer takes the smaller of the two tables, builds a hash table in memory on the join key, then scans the larger table and probes the hash table to find the matching rows.
    This method works best when the smaller table fits entirely in memory, so that the total cost is the sum of the costs of accessing the two tables. When the table is too large to fit entirely in memory, the optimizer splits it into several partitions and writes the partitions that do not fit to temporary segments on disk; a large temporary segment is then needed to maximize I/O performance. Hash Join works well for large tables without indexes and in parallel query environments, where it gives the best performance. Many people describe it as the heavy lifter among joins. Hash Join applies only to equi-joins (such as WHERE A.COL3 = B.COL4), which follows from how hashing works.
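
The build and probe phases can be sketched in Python as follows. This is a minimal in-memory illustration only: it assumes the smaller input fits in memory and leaves out the partitioning to disk described above. The function name `hash_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(left_rows, right_rows, key):
    """Equi-join two row lists by building a hash table on the smaller one."""
    # Pick the smaller input as the build side, the larger as the probe side.
    if len(left_rows) <= len(right_rows):
        build_rows, probe_rows = left_rows, right_rows
    else:
        build_rows, probe_rows = right_rows, left_rows

    # Build phase: hash the build side on the join key.
    hash_table = defaultdict(list)
    for row in build_rows:
        hash_table[row[key]].append(row)

    # Probe phase: scan the larger input once and look up each key.
    result = []
    for probe in probe_rows:
        for match in hash_table.get(probe[key], []):
            result.append({**match, **probe})
    return result


# Hypothetical sample data.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 2},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 3},
]
joined = hash_join(customers, orders, "cust_id")
```

Because the lookup relies on equal keys hashing to the same bucket, the sketch also shows why this method cannot serve conditions such as > or <.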

  • Merge Join:
    Hash Join generally performs better than Sort Merge Join. However, if both tables are already sorted on the join key, no further sorting is needed when performing the sort merge join, and Merge Join will then outperform Hash Join. A Merge Join is usually divided into three steps:
      1. Perform a full table access on each table being joined.
      2. Sort the results of the full table access.
      3. Perform the merge join to merge the sorted results.
    Merge Join performs better than Nested Loop when a full table scan is preferable to an index range scan followed by table access. When a table is extremely small or extremely large, a full table access may be more efficient than an index range scan. Almost all of the cost of Merge Join lies in the first two steps. Merge Join can also be used for non-equi joins (>, <, >=, <=, but not !=, that is, <>).
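
The three steps can be sketched in Python for the equi-join case. This is an illustrative sketch, not a database's actual implementation; the function name `sort_merge_join` and the sample rows are hypothetical, and the non-equi variants are not covered.

```python
def sort_merge_join(left_rows, right_rows, key):
    """Equi-join via sort then merge; skip the sort if inputs arrive sorted."""
    # Steps 1 and 2: read every row of both inputs and sort on the join key.
    left = sorted(left_rows, key=lambda r: r[key])
    right = sorted(right_rows, key=lambda r: r[key])

    # Step 3: advance two cursors through the sorted inputs in lockstep.
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against the whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Hypothetical sample data with duplicate and unmatched keys.
left_table = [{"k": 3, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}, {"k": 2, "a": "w"}]
right_table = [{"k": 2, "b": "p"}, {"k": 4, "b": "q"}, {"k": 2, "b": "r"}, {"k": 1, "b": "s"}]
merged = sort_merge_join(left_table, right_table, "k")
```

The merge step itself is a single linear pass, which is consistent with the point above that almost all of the cost lies in the scan and sort steps.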

Comparison of Nested Loop, Hash Join, and Merge Join

| Category | Nested Loop | Hash Join | Merge Join |
| --- | --- | --- | --- |
| Conditions of use | Any condition | Equi-join (=) | Equi-join or non-equi join (>, <, =, >=, <=), except '<>' |
| Related resources | CPU, disk I/O | Memory, temporary space | Memory, temporary space |
| Features | Relatively efficient when there is a highly selective index or a restrictive search; can return the first result quickly. | More effective than Nested Loop when an index is missing or the index conditions are ambiguous. Usually faster than Merge Join. Highly efficient in a data warehouse environment when the tables hold a large number of records. | More effective than Nested Loop when an index is missing or the index conditions are ambiguous. More effective than Hash Join for non-equi joins. |
| Disadvantages | Very inefficient when an index is missing or the query conditions are not restrictive enough; inefficient when the tables hold a large number of records. | Building the hash table requires a large amount of memory. The first result is returned slowly. | All tables need to be sorted. It is designed for optimal throughput and does not return data until all results are found. |

Origin blog.csdn.net/qq_32323239/article/details/103635931