In-depth understanding of Oracle: Nested Loop Join and Sort Merge Join, with an explanation of the three major table join methods

        The essence of relational database technology is to store data in normalized relational tables and to retrieve and process information through table joins and various kinds of indexes. This article walks through Oracle's three major table join methods.

        In earlier versions, Oracle provided the nested-loop join. Joining two tables this way is equivalent to a double loop. Assuming the two tables have m and n rows respectively, the time complexity is O(m*n) if the inner loop is a full table scan and O(m*log n) if the inner loop is an index scan. A hash join, by contrast, builds a hash table in one pass over one table and probes it in one pass over the other, so its time complexity is roughly O(m+n). This is why, from 10g onward, the optimizer usually prefers hash join for large equijoins.

        For all three join methods, we can use a hint to force the optimizer's choice: use_hash, use_nl or use_merge, written right after the SELECT keyword, for example /*+ use_nl(t1 t2) */.

1. Overview of the Three Join Methods

1.nested loop

        Take one record from table A and traverse table B to find the matching records; then take the next record from table A and traverse table B again. This is a double loop.
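The double loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Oracle implements it: the tables are plain lists of dicts, and the names nested_loop_join, t1, t2, col1 and col2 are chosen for the example.

```python
def nested_loop_join(table_a, table_b, key_a, key_b):
    """Join two lists of dicts by comparing every pair of rows."""
    result = []
    for row_a in table_a:          # outer loop: one row of A at a time
        for row_b in table_b:      # inner loop: full traversal of B
            if row_a[key_a] == row_b[key_b]:
                result.append((row_a, row_b))
    return result


# Hypothetical sample data
t1 = [{"col1": 1, "name": "a"}, {"col1": 2, "name": "b"}, {"col1": 3, "name": "c"}]
t2 = [{"col2": 2, "val": "x"}, {"col2": 3, "val": "y"}, {"col2": 3, "val": "z"}]

pairs = nested_loop_join(t1, t2, "col1", "col2")
for row_a, row_b in pairs:
    print(row_a["name"], row_b["val"])
```

With m rows in A and n rows in B the inner comparison runs m*n times, which is where the O(m*n) cost of an unindexed nested loop comes from.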

2.hash join

        Build a hash table from table A on the join key, then read the records of table B one by one, compute the hash value of each record's join key, and use it to look up the matching records of table A in the hash table.
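A minimal Python sketch of the build and probe phases follows. It assumes both tables fit in memory and uses a dict as the hash table; the function name hash_join and the sample rows are illustrative, and Oracle's real implementation (partitioning, spilling to disk) is considerably more involved.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equijoin: hash the build table once, then probe it row by row."""
    # Build phase: one pass over table A
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[build_key]].append(row)

    # Probe phase: one pass over table B
    result = []
    for row in probe_rows:
        for match in buckets.get(row[probe_key], []):
            result.append((match, row))
    return result


# Hypothetical sample data
table_a = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
table_b = [{"id": 2, "val": "x"}, {"id": 2, "val": "y"}, {"id": 5, "val": "z"}]

joined = hash_join(table_a, table_b, "id", "id")
for row_a, row_b in joined:
    print(row_a["name"], row_b["val"])
```

Each table is read only once, so the work grows with m+n rather than m*n. The sketch also shows why this method only applies to equality join conditions: a hash lookup can only find equal keys.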

3.sort merge join

        Sort tables A and B on the join key, then merge the two sorted sets and keep the rows that satisfy the join condition.

 

2. Details of the Join Methods

1.Nested Loop Join

a. Execution principle

        For example:

 

select t1.*,t2.* from t1,t2 where t1.col1=t2.col2;

 

        The access mechanism is as follows:

 

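begin
  for i in (select * from t1) loop                        -- outer loop: driving table
    for j in (select * from t2 where col2 = i.col1) loop  -- inner loop: probe the driven table
      dbms_output.put_line(i.col1 || ' = ' || j.col2);    -- display results
    end loop;
  end loop;
end;
/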

 

        This works like a nested loop in a program: for each iteration of the outer loop, execution enters the inner loop; when the inner loop terminates, the outer loop advances to its next row and enters the inner loop again. When the outer loop terminates, the join is complete.

b. The steps are as follows

        ① Determine the driving (outer) table

        ② Designate the other table as the inner (driven) table

        ③ For each row of the driving table, access the rows of the driven table that satisfy the join condition

c. The execution plan is roughly as follows

NESTED LOOPS
  outer_loop
  inner_loop

        When the optimizer mode is FIRST_ROWS, execution plans often contain a large number of NESTED LOOPS operations. A nested loop can return each matching row to the user as soon as it is found, without first caching any data, which is one of its strengths.

d. Usage scenarios

        Nested loop join is generally used when the joined table has an index on the join column and the index selectivity is good (that is, the ratio of distinct values to rows is close to 1). In other words, the record set of the driving table should be relatively small (<10000 rows) and the inner table needs an efficient access method (an index). Note that the join order is very important: the record set of the driving table must be small. Under these conditions the nested loop gives the fastest response time for returning the result set.

e. Relationship with index

        Nested loops and indexes are like twin brothers and generally need to be considered and designed together. This follows from how the optimizer executes the join. For example, take two tables, one with 10 records and one with 10 million records. With the small table as the driving table, the cost is 10 * (the cost of looking up one record in the large table through the index). If the 10-million-row table has no index, each of the 10 lookups becomes a full scan of the large table, and the cost rises accordingly.
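The effect of the index can be made concrete with a small Python sketch. It is a scaled-down illustration (10 driving rows against 100,000 inner rows instead of 10 million), and the "index" is simply a sorted list searched with bisect, which is an assumption of the example rather than a model of Oracle's B-tree. The function names are hypothetical.

```python
import bisect


def join_with_full_scan(driving_keys, inner_keys):
    """Nested loop with no index: every probe scans the whole inner table."""
    visited, matches = 0, []
    for key in driving_keys:
        for inner in inner_keys:
            visited += 1
            if inner == key:
                matches.append((key, inner))
    return matches, visited


def join_with_index(driving_keys, sorted_inner_keys):
    """Nested loop with a sorted 'index': binary search, then read matches only."""
    visited, matches = 0, []
    for key in driving_keys:
        # About log2(n) search steps per probe; these are not counted as row visits
        pos = bisect.bisect_left(sorted_inner_keys, key)
        while pos < len(sorted_inner_keys) and sorted_inner_keys[pos] == key:
            visited += 1
            matches.append((key, sorted_inner_keys[pos]))
            pos += 1
    return matches, visited


small = list(range(0, 100_000, 10_000))   # 10 driving rows
large = list(range(100_000))              # 100,000 inner rows, already sorted

full_matches, full_visited = join_with_full_scan(small, large)
idx_matches, idx_visited = join_with_index(small, large)
print(full_visited, idx_visited)
```

Both variants return the same 10 matches, but the unindexed version visits 10 * 100,000 inner rows while the indexed version visits only the 10 rows that actually match.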

        Therefore, when joining multiple tables, check whether an index is needed on the join column of the driven table, or whether a composite index is needed on the join column together with the other filter columns of that table.

 

2.Sort Merge Join

a. Execution principle

        For example:

select t1.*,t2.* from t1,t2 where t1.id=t2.id;

        The access mechanism is as follows:

        Access t1 and sort it by t1.id, where id is the join column; call the sorted result t1_1.

        Access t2 and sort it by t2.id; call the sorted result t2_1. Then join on t1_1.id = t2_1.id by comparing and merging the two sorted sets alternately. Which table drives the join does not matter.
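The sort and merge steps can be sketched in Python as follows. This is a simplified in-memory illustration under the assumption that both inputs are lists of dicts joined on the same column name; the function name sort_merge_join and the sample rows are made up for the example.

```python
def sort_merge_join(rows1, rows2, key):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(rows1, key=lambda r: r[key])
    right = sorted(rows2, key=lambda r: r[key])

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1                      # left key too small: advance left
        elif lk > rk:
            j += 1                      # right key too small: advance right
        else:
            # Find the run of right rows sharing this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


# Hypothetical sample data, deliberately unsorted
t1 = [{"id": 3, "a": "c"}, {"id": 1, "a": "a"}, {"id": 2, "a": "b"}, {"id": 2, "a": "b2"}]
t2 = [{"id": 2, "b": "x"}, {"id": 4, "b": "w"}, {"id": 2, "b": "y"}, {"id": 3, "b": "z"}]

merged = sort_merge_join(t1, t2, "id")
for r1, r2 in merged:
    print(r1["a"], r2["b"])
```

Neither input plays the role of a driving table: both cursors only move forward. If the inputs already arrive sorted on the join key, the two sorted() calls are the part that could be skipped.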

b. Usage scenarios

        Although hash join has largely replaced sort merge join, sort merge join is still worth considering when the server's CPU and memory resources are very tight, because hash join requires more resources than sort merge join, especially CPU.

        The 10g sql tuning documentation says:

        On the other hand, sort-merge joins can perform better than hash joins if both of the following conditions are met:

        The row sources are already sorted. 

        A sort operation does not have to be done.

        Therefore, sort merge join is most likely to be used when there is no usable index and the data is already sorted.

 

Article source: http://www.2cto.com/database/201301/186885.html
