Shanghai Tengke Education Dameng Database Training: Dameng SQL Optimization Basics (Part 2)

In the previous article, "Dameng SQL Optimization Basics", we covered the common SQL operators that appear in single-table queries. This time we look at the operators used in multi-table queries.

01

Multi-table join operators

In multi-table join queries, the join operators we may encounter fall into the following categories:

NEST LOOP: nested loop join

HASH JOIN: hash join

INDEX JOIN: index join

MERGE JOIN: merge join

A query often involves more than one table, and those tables are related to one another; these operators come into play whenever the database processes such a query. Here we only look at the two-table case; queries with more tables work analogously.

 

02

Set up a test environment

Create the test tables and insert data

create table t1(id varchar);

create table t2(id varchar);

insert into t1 values('AMEE'),('AMSS'),('BURNING'),('ABED'),('CHALICE');

insert into t2 values('AAAA'),('AAAA'),('BBBB'),('CCCC'),('DDDD'),('AAME'),('AMEE'),('EEEE');

 

03

Related tests

NEST LOOP INNER JOIN: The most basic join method. Each row of one table is combined with every row of the other table to form a large intermediate result set (a Cartesian product), and the rows that satisfy the join condition are then filtered out of it.

 

The /*+ENABLE_HASH_JOIN(0)*/ in the SQL statement is an optimizer hint. Here it is a statement-level hint that dynamically overrides the corresponding INI parameter, and it means that hash join is disabled.

Here T1 has 5 rows and T2 has 8 rows. NEST LOOP JOIN unconditionally combines the two tables into a 5 * 8 = 40-row intermediate table, then checks each of those 40 rows in turn and keeps only those where T1.ID = T2.ID (the SLCT operator).

It is easy to see that this method is generally undesirable: if T1 and T2 are large, the intermediate table becomes very large and the filter condition in the upper operator has to be evaluated a great many times. As for the output, the result set is ordered by the index used to scan the left table (T1).
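To make the 5 * 8 = 40 arithmetic concrete, here is a minimal Python sketch of the "Cartesian product, then filter" behavior described above, run on the article's test data. It is an illustration of the technique only, not Dameng's implementation; the function name nested_loop_join is hypothetical, and each table is modeled as a plain list of ID strings.

```python
# Test data from the article: each table has a single ID column.
T1 = ["AMEE", "AMSS", "BURNING", "ABED", "CHALICE"]
T2 = ["AAAA", "AAAA", "BBBB", "CCCC", "DDDD", "AAME", "AMEE", "EEEE"]


def nested_loop_join(left, right):
    """Hypothetical sketch: pair every left row with every right row, then filter."""
    # Step 1: build the full Cartesian product (5 * 8 = 40 rows here).
    product = [(l, r) for l in left for r in right]
    # Step 2: evaluate the join predicate T1.ID = T2.ID on every combined row,
    # which plays the role of the SLCT (filter) operator above the join.
    matches = [(l, r) for (l, r) in product if l == r]
    return product, matches


if __name__ == "__main__":
    product, matches = nested_loop_join(T1, T2)
    print(len(product))  # 40
    print(matches)       # [('AMEE', 'AMEE')]
```

The outer loop walks T1 in its scan order, so the matches come out in T1's order, which matches the ordering remark above. The filter runs once per combined row, which is why the cost grows with the product of the two table sizes.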

 

HASH JOIN: When no index is available, the usual way to process a join is to build a HASH table from the join column of one table, then probe it with the rows of the other table and return the rows that match.

The plan generally takes the following form:

HASH JOIN is selected by default for an equi-join between two tables. The join column of one table is used as the hash key to build a HASH table, and the join column of the other table is used to probe that HASH table for matching records. Because each probe locates matching rows directly instead of comparing against every row of the other table, HASH JOIN is much more efficient than NEST LOOP on large data volumes. The work consists of three main parts:

1. A full table scan of the left and right tables (T1, T2)

2. Building the HASH table (the cost depends on the computational complexity of the hash function)

3. Probing the HASH table with each row of the right table (T2)

Since all output is produced while the right table is being scanned, the output of HASH JOIN is ordered by the index used to scan the right table.
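The build/probe steps and the resulting output order can be sketched as follows. This is a minimal Python illustration, assuming (as the ordering remark implies) that the left table T1 is the build side and the right table T2 is the probe side; hash_join is a hypothetical name and a Python dict stands in for the database's HASH table.

```python
from collections import defaultdict

# Test data from the article.
T1 = ["AMEE", "AMSS", "BURNING", "ABED", "CHALICE"]
T2 = ["AAAA", "AAAA", "BBBB", "CCCC", "DDDD", "AAME", "AMEE", "EEEE"]


def hash_join(build_rows, probe_rows):
    """Hypothetical sketch: build a hash table on one input, probe with the other."""
    # Build phase: hash the join column of the build (left) table.
    table = defaultdict(list)
    for row in build_rows:
        table[row].append(row)
    # Probe phase: scan the probe (right) table once; each row costs a single
    # hash lookup instead of a comparison against every build row.
    result = []
    for row in probe_rows:
        for match in table.get(row, []):
            result.append((match, row))
    return result


if __name__ == "__main__":
    print(hash_join(T1, T2))                   # [('AMEE', 'AMEE')]
    print(hash_join(["B", "A"], ["A", "B"]))   # [('A', 'A'), ('B', 'B')]
```

The second call shows the ordering point: results are emitted while the probe side is scanned, so they follow the right input's order ("A" before "B") rather than the build side's.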

 

INDEX JOIN: Rows are read from one table, and for each of them a range scan is performed on the other table to find the matching rows. This requires an index on the join column of the right table.

This approach is essentially equivalent to running a statement such as select * from t2 where id = ? against the right table (T2) N times, where N is the number of rows in the left table T1. Its cost therefore depends on the number of rows each such statement returns and on the number of rows in T1; if both are small, this is the ideal join method. The output of this join is ordered by the index used in T1's base-table operator.
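Below is a minimal Python sketch of the "one index range scan per left row" idea. It assumes a sorted list of (key, row position) pairs as a stand-in for an index on T2.ID, searched with the standard bisect module; build_index and index_join are hypothetical names, and the probe counter simply makes the N = 5 lookups visible.

```python
import bisect

# Test data from the article.
T1 = ["AMEE", "AMSS", "BURNING", "ABED", "CHALICE"]
T2 = ["AAAA", "AAAA", "BBBB", "CCCC", "DDDD", "AAME", "AMEE", "EEEE"]


def build_index(rows):
    """Hypothetical stand-in for an index on the join column: sorted (key, pos) pairs."""
    return sorted((key, pos) for pos, key in enumerate(rows))


def index_join(left, right_rows, index):
    """For each left row, range-scan [key, key] in the right table's index."""
    result = []
    probes = 0
    for key in left:
        probes += 1  # one "select * from t2 where id = ?" per left row
        # (key,) sorts before every (key, pos), so this finds the first entry >= key.
        i = bisect.bisect_left(index, (key,))
        while i < len(index) and index[i][0] == key:
            result.append((key, right_rows[index[i][1]]))
            i += 1
    return result, probes


if __name__ == "__main__":
    rows, probes = index_join(T1, T2, build_index(T2))
    print(rows)    # [('AMEE', 'AMEE')]
    print(probes)  # 5
```

Because the outer loop follows the left input, the output keeps T1's order, and the total work is roughly N probes times the size of each range scan, which is why the method shines when both are small.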

 

MERGE JOIN: Both tables are scanned through their indexes, and the two inputs are merged in index order.

It requires two ordered index scans (SSCN) at the same time and outputs the values that satisfy the condition to the result set, which is more efficient than NEST LOOP. The output here is ordered by the index of T1.
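The merge step can be sketched in Python as a single forward pass over two already-sorted inputs. In this minimal illustration, sorted() stands in for the two ordered index scans (SSCN); merge_join is a hypothetical name, and the inner loops handle runs of duplicate keys on both sides.

```python
# Test data from the article.
T1 = ["AMEE", "AMSS", "BURNING", "ABED", "CHALICE"]
T2 = ["AAAA", "AAAA", "BBBB", "CCCC", "DDDD", "AAME", "AMEE", "EEEE"]


def merge_join(left_sorted, right_sorted):
    """Hypothetical sketch: merge two inputs already sorted on the join key."""
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        l, r = left_sorted[i], right_sorted[j]
        if l < r:
            i += 1  # left key is smaller: advance the left cursor
        elif l > r:
            j += 1  # right key is smaller: advance the right cursor
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end] == l:
                j_end += 1
            # Pair every left row with this key against that whole run.
            while i < len(left_sorted) and left_sorted[i] == l:
                for k in range(j, j_end):
                    result.append((left_sorted[i], right_sorted[k]))
                i += 1
            j = j_end
    return result


if __name__ == "__main__":
    # sorted() simulates reading each table through an ordered index.
    print(merge_join(sorted(T1), sorted(T2)))  # [('AMEE', 'AMEE')]
```

Each cursor only moves forward, so the merge touches every row of each input once, and the output follows the sort order of the two index scans.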

 

SPL: After one table outputs a row, that row is passed to the other table for evaluation, and a result is output whenever the condition is met.

 

With the two tables here, we see that T1 is scanned first to obtain its data, and each resulting row is then passed to T2 for filtering (SEEK I_TEST2 scan_range[var1,var1]). In the two-table case this processing method is essentially similar to INDEX JOIN, but in more complex situations where INDEX JOIN cannot be used, it helps improve processing efficiency.
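The distinguishing idea of SPL is that the inner side is a sub-plan parameterized by var1 and rows stream out as soon as they match. Here is a minimal Python sketch of that shape, assuming a sorted (key, position) list as a stand-in for the index I_TEST2 on T2.ID; seek_i_test2 and spl are hypothetical names, and the generator pipeline is an illustration, not Dameng's executor.

```python
import bisect
from typing import Callable, Iterable, Iterator, Tuple

# Test data from the article.
T1 = ["AMEE", "AMSS", "BURNING", "ABED", "CHALICE"]
T2 = ["AAAA", "AAAA", "BBBB", "CCCC", "DDDD", "AAME", "AMEE", "EEEE"]

# Hypothetical stand-in for the index I_TEST2 on T2.ID: sorted (key, pos) pairs.
I_TEST2 = sorted((key, pos) for pos, key in enumerate(T2))


def seek_i_test2(var1: str) -> Iterator[str]:
    """Inner sub-plan: SEEK I_TEST2 scan_range[var1, var1]."""
    i = bisect.bisect_left(I_TEST2, (var1,))
    while i < len(I_TEST2) and I_TEST2[i][0] == var1:
        yield T2[I_TEST2[i][1]]
        i += 1


def spl(outer: Iterable[str],
        subplan: Callable[[str], Iterator[str]]) -> Iterator[Tuple[str, str]]:
    """Bind each outer row to var1, run the sub-plan, emit matches immediately."""
    for row in outer:
        for inner_row in subplan(row):
            yield (row, inner_row)


if __name__ == "__main__":
    print(list(spl(T1, seek_i_test2)))  # [('AMEE', 'AMEE')]
```

Because spl only knows how to bind var1 and call a function, any sub-plan with that signature could be plugged in, which mirrors why this operator still applies in cases where a plain INDEX JOIN cannot be used.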


Source: blog.csdn.net/qq_42726883/article/details/108487406