Understanding Oracle in depth: what you should know about driving tables

        Consider an example that compares two dictionaries.

        One dictionary (dict a) has an index; the other (dict b) does not. Suppose you want to find which words beginning with "a" the two dictionaries share and which they do not. How would you go about the comparison?

        A reasonable approach is to start with dict b, the one without an index: find the pages where the words beginning with "a" start, and then, for each word, use the index of dict a to look up the corresponding entry.

        Would it be just as efficient the other way around?
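        To make the question concrete, here is a minimal sketch of the dictionary analogy. The word lists, the function names and the way "reads" are counted are all assumptions made for illustration; a Python set stands in for the index of dict a, and an unordered list stands in for dict b.

```python
# Toy model of the two dictionaries; the word lists are made up for illustration.
dict_a_index = {"angle", "apple", "arrow", "banana", "cherry"}          # indexed: direct lookup
dict_b_pages = ["banana", "apple", "delta", "arrow", "acorn", "cedar"]  # no index: must be scanned


def drive_from_b(a_index, b_pages):
    """Scan dict b once; probe dict a's index for each word starting with 'a'."""
    reads, common = 0, []
    for word in b_pages:
        reads += 1                      # read one entry of dict b
        if word.startswith("a"):
            reads += 1                  # one index lookup in dict a
            if word in a_index:
                common.append(word)
    return sorted(common), reads


def drive_from_a(a_index, b_pages):
    """Start from dict a; each 'a' word forces a scan of dict b, which has no index."""
    reads, common = 0, []
    for word in sorted(a_index):
        if not word.startswith("a"):
            continue
        reads += 1                      # read one entry of dict a
        for other in b_pages:           # scan dict b until found or exhausted
            reads += 1
            if other == word:
                common.append(word)
                break
    return sorted(common), reads


common_b, reads_b = drive_from_b(dict_a_index, dict_b_pages)
common_a, reads_a = drive_from_a(dict_a_index, dict_b_pages)
print(common_b, reads_b)
print(common_a, reads_a)
```

        Both directions find the same shared words, but starting from the indexed dictionary needs more reads even on this tiny input, and the gap widens as dict b grows, because every word taken from dict a triggers another scan of dict b.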

        In plain terms, the driving table is the table the join starts from, and choosing a good driving table is half the battle in optimization. For example:

select * from a,b where a.id = b.id and a.name = 'Megrain' and b.gender = 'female';

        If tables a and b are of the same order of magnitude, table a is clearly the better driving table, because the condition on name filters out far more rows than the condition on gender. The ideal execution plan therefore scans table a first and then joins to table b with a nested loop.
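        The effect of filter selectivity on a nested loop can be sketched as follows. This is a simplified model, not Oracle's actual algorithm: the data (1,000 rows per table, a single row named 'Megrain', half the rows 'female') and all function names are hypothetical, and a Python dict stands in for an index on id.

```python
# Hypothetical data: 1,000 rows per table, one row named 'Megrain', half the rows 'female'.
table_a = [{"id": i, "name": "Megrain" if i == 8 else f"name{i}"} for i in range(1, 1001)]
table_b = [{"id": i, "gender": "female" if i % 2 == 0 else "male"} for i in range(1, 1001)]


def is_megrain(row):
    return row["name"] == "Megrain"


def is_female(row):
    return row["gender"] == "female"


def nested_loop(outer, outer_pred, inner, inner_pred):
    """Filter the driving (outer) table, then probe the inner table by id once per surviving row."""
    inner_by_id = {row["id"]: row for row in inner}   # stands in for an index on id
    probes, matches = 0, []
    for row in outer:
        if not outer_pred(row):
            continue
        probes += 1                                   # one index probe into the inner table
        hit = inner_by_id.get(row["id"])
        if hit is not None and inner_pred(hit):
            matches.append(row["id"])
    return matches, probes


ids_a, probes_a = nested_loop(table_a, is_megrain, table_b, is_female)  # table a drives
ids_b, probes_b = nested_loop(table_b, is_female, table_a, is_megrain)  # table b drives
print(ids_a, probes_a)
print(ids_b, probes_b)
```

        With this made-up data, both orders return the same row, but driving from table a probes the inner table once, while driving from table b probes it once for every 'female' row.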

        Generally speaking, when two tables are joined, choose the smaller table as the driving table; when three or more tables are joined, choose the table that takes part in the most joins.

        Nothing is absolute, however. Consider the following case.

        There are two tables, T1 and T2. Assume T1 stores basic information such as a name and an ID, so each row is short, while T2 also has an ID column plus some description and remark columns, so each row may be one or two thousand bytes long.

        With a nested loop join, T1 occupies relatively few blocks and counts as the small table, while T2 occupies several times as many blocks and counts as the large table. It would seem that T1 should be the driving table, but in this case it is actually better to drive from the large table T2.

        Someone unfamiliar with database administration or development, and unaware of this, wrote the following statement in order to force the small table T1 to be the driving table:

select /*+ ordered use_nl(t1,t2) */ * from t2,t1 where t1.id=t2.id;

        This statement later caused performance problems. Once you understand how the nested loop works, you find that using the large table as the driving table really is better here.

        The underlying problem is this: people say that a nested loop needs the small table as its driving table, and if you do not know why, you end up simply repeating what others say. You then know only the surface and not the substance, which can easily distort your judgment on similar issues later.

        The driving table is also called the outer table. The concept applies only to nested loop joins and hash joins: the driving table is the table that drives the query. Under the CBO, the optimizer chooses the driving table automatically based on cost, and the order of the tables in the FROM clause is irrelevant (unless a hint such as ORDERED says otherwise).

        A table is usually suitable as the driving table when its selectivity is high (the ratio of distinct values in a column to the number of rows is high), when the WHERE clause places more restrictions on it, and when it returns fewer rows.
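        The ratio mentioned above can be computed directly. The following is a small sketch with made-up column values; the function name selectivity and the sample data are assumptions for illustration, and the optimizer's own statistics are not modelled here.

```python
def selectivity(column_values):
    """Ratio of distinct values to total rows; closer to 1.0 means more selective."""
    return len(set(column_values)) / len(column_values)


# Made-up columns: every name is distinct, while gender has only two values.
names = [f"name{i}" for i in range(1000)]
genders = ["female" if i % 2 == 0 else "male" for i in range(1000)]

print(selectivity(names))     # 1.0
print(selectivity(genders))   # 0.002
```

        By this measure, an equality condition on name narrows the result far more than one on gender, which is why the table filtered on name was the better driving table in the earlier query.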

        In fact, a large table is sometimes used as the driving table. As long as the WHERE clause restricts the large table heavily and few rows are returned from it, the large table is also suitable as the driving table.

        Always keep in mind that what matters is the returned result set (the rows of a table that survive filtering, or the rows produced by joining some tables): the smaller result set is the one that should drive. In other words, the join should be driven from whichever side returns fewer rows. In an execution plan, the table listed directly beneath the NESTED LOOPS or HASH JOIN operation is the driving table; that is, the upper of the two tables in the plan is the driving table.

        The following are a few rules of thumb for quickly identifying the driving table; they will not hold in every case:

        (1) The columns used in the join condition should be indexed, and the WHERE clause should be written so that indexes can be used as much as possible rather than bypassed.

        (2) Joins should be driven from the side that returns fewer rows.

        (3) If tables A and B are joined and the length of table A is much greater than that of table B, it is recommended to drive from the larger table A.

        (4) If the WHERE clause contains selective conditions, such as WHERE No=20, put the most selective condition at the end of the expression.

        (5) If only one table has an index and the other does not, the table without the index is usually used as the driving table. For example, if the No column of table A is indexed but the No column of table B is not, table B should be the driving table and table A the driven table.

The so-called driving table under RULE

        Take the nested loop join as an example:

        If neither table has an index on the join column (in which case a sort merge or hash join is usually chosen), the driving table is the later table in the FROM clause;

        If one table has an index and the other does not, the driving table is the one without the index, regardless of order;

        If both tables have indexes, the driving table is the later table.

        So under RULE, table order really only needs to be considered when both tables have an index on the join column: put the small table last and the large table first. (Which order is actually better depends on factors such as the number of qualifying rows and the data distribution, so actual testing should have the final say.) Under the CBO, the order makes no difference.
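        The three cases above can be written down as a small decision function. This is only a restatement of the rules as the article gives them, not Oracle code; the function name rule_driving_table and its parameters are hypothetical.

```python
def rule_driving_table(from_order, indexed):
    """Pick the driving table for a two-table join under the rules stated above.

    from_order: the two table names in the order they appear in the FROM clause.
    indexed:    the set of tables that have an index on the join column.
    """
    first, last = from_order
    first_indexed = first in indexed
    last_indexed = last in indexed
    if first_indexed != last_indexed:
        # Exactly one table is indexed: the table without the index drives.
        return last if first_indexed else first
    # Neither or both are indexed: the later table in FROM drives.
    return last


print(rule_driving_table(("a", "b"), set()))        # no indexes
print(rule_driving_table(("a", "b"), {"a"}))        # only a indexed
print(rule_driving_table(("a", "b"), {"b"}))        # only b indexed
print(rule_driving_table(("a", "b"), {"a", "b"}))   # both indexed
```

        Only the first and last cases depend on the order of the tables in FROM, which is why the order matters under RULE mainly when both tables are indexed on the join column.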

        The key is to understand the execution plan, not to remember the rules.

        For example, suppose a join returns one row and involves two tables, one with 10 rows and one with 10 million rows, and both tables have an index on the join column.

        With the small table as the driving table, the cost is 10 * (the cost of fetching one row from the large table through its index).

        With the large table as the driving table, the cost is 10 million * (the cost of fetching one row from the small table through its index).

        Fetching one row through the index of the 10-row table usually costs 3 blocks: 2 for the index and 1 for the table. For the 10-million-row table, the index may take 4 blocks and the table 1. Which plan is faster is obvious.
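        The arithmetic can be spelled out with the block counts given above. This is a back-of-the-envelope sketch: the variable names are made up, and, like the text, it counts only the index probes into the inner table and ignores the cost of reading the driving table itself.

```python
SMALL_ROWS = 10
LARGE_ROWS = 10_000_000

# Blocks read per index probe, using the figures from the text.
PROBE_SMALL_TABLE = 2 + 1   # 2 index blocks + 1 table block
PROBE_LARGE_TABLE = 4 + 1   # 4 index blocks + 1 table block

# Small table drives: one probe into the large table per small-table row.
cost_small_drives = SMALL_ROWS * PROBE_LARGE_TABLE
# Large table drives: one probe into the small table per large-table row.
cost_large_drives = LARGE_ROWS * PROBE_SMALL_TABLE

print(cost_small_drives)                        # 50
print(cost_large_drives)                        # 30000000
print(cost_large_drives // cost_small_drives)   # 600000
```

        Under these assumptions, driving from the small table reads 50 blocks, while driving from the large table reads 30 million, a difference of a factor of 600,000.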

 

Article source: http://www.2cto.com/database/201301/186606.html
