Using Histograms

1. What is a histogram

When a table or index is analyzed, a histogram records how the data in a column is distributed. With this information the cost-based optimizer can decide to use an index for conditions that return a small number of rows, and to avoid the index for conditions that return many rows. Histograms are not limited to indexed columns: one can be built on any column of a table.

The main reason to build a histogram is to help the optimizer plan well when the data in a table is heavily skewed. For example, if one or two values make up most of the data in the table (data skew), the related index may not help reduce the I/O needed to satisfy the query. Creating a histogram lets the cost-based optimizer know when using an index is appropriate, and when the values in the WHERE clause mean the query will return 80% of the records in the table.

2. When to use histograms

It is recommended to use the histogram in the following situations:

When the WHERE clause references a column whose value distribution is significantly skewed: when the skew is pronounced, different values in the WHERE clause should lead the optimizer to choose different execution plans. In this case a histogram helps the optimizer pick the right execution path. (Note: if no query references the column, creating a histogram on it is pointless. This is a common mistake: many DBAs create histograms on skewed columns even though no query references them.)

When column values lead to incorrect estimates: this usually happens in multi-table joins. Suppose we have a five-table join whose result set is only 10 rows. Oracle tries to join the tables in an order that keeps the first intermediate result set (its cardinality) as small as possible, because a query that carries less load in its intermediate result sets runs faster. To minimize the intermediate results, the optimizer tries to estimate the cardinality of each result set while the SQL is being parsed. Histograms on skewed columns greatly help the optimizer make the right decisions. If the optimizer misjudges the size of the intermediate result sets, it may choose a suboptimal join method. Adding a histogram on the column therefore often gives the optimizer the information it needs to choose the best join method.
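One way to spot the misestimates described above is to compare the optimizer's estimated row counts with the actual ones. The following is only a sketch: the tables ORDERS and CUSTOMERS and their columns are hypothetical, and it assumes SQL*Plus, Oracle 10g or later, and a session that is allowed to read the V$ views used by DBMS_XPLAN.DISPLAY_CURSOR.

```sql
-- ORDERS, CUSTOMERS and their columns are hypothetical, for illustration only.
-- DISPLAY_CURSOR(NULL, ...) reports the last statement of the session, so
-- server output is switched off to keep DBMS_OUTPUT calls from replacing it.
SET SERVEROUTPUT OFF

-- The hint makes Oracle record actual row counts for each plan step.
SELECT /*+ gather_plan_statistics */ COUNT(*)
  FROM orders o
  JOIN customers c ON c.customer_id = o.customer_id
 WHERE o.status = 'OPEN';

-- Show the plan with estimated (E-Rows) and actual (A-Rows) counts side by side.
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

A large gap between E-Rows and A-Rows on a step that filters a skewed column suggests that a histogram on that column may help.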

3. Types of histograms

Oracle uses histograms to improve the accuracy of selectivity and cardinality estimates for non-uniformly distributed data. It uses two different strategies to generate a histogram: one for data sets with few distinct values, the other for data sets with many distinct values. Oracle generates a frequency histogram in the first case and a height-balanced histogram in the second. Typically, when the number of buckets is less than the column's NUM_DISTINCT, the result is a HEIGHT BALANCED histogram, and when the number of buckets is at least NUM_DISTINCT, the result is a FREQUENCY histogram.
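To see which kind of histogram a column ended up with, the column statistics can be queried after gathering. This is a minimal sketch that assumes the OBJ table from the experiment in section 5 lives in the current schema and that its statistics have already been gathered.

```sql
-- Compare the distinct-value count with the bucket count for each column.
-- HISTOGRAM shows NONE, FREQUENCY or HEIGHT BALANCED (newer releases can
-- report further types).
SELECT column_name,
       num_distinct,
       num_buckets,
       histogram
  FROM user_tab_col_statistics
 WHERE table_name = 'OBJ'
 ORDER BY column_name;
```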

4. How to generate a histogram

When generating a histogram you specify a size, which is the number of buckets. Each bucket holds information about column values and the number of rows.

EXECUTE DBMS_STATS.GATHER_TABLE_STATS('scott','company',METHOD_OPT =>'FOR COLUMNS SIZE 10 company_code');

The preceding statement creates a ten-bucket histogram on the COMPANY table, as shown in Figure 2-2. The values of the COMPANY_CODE column are divided into the ten buckets as displayed in the figure. This example shows that a large share (80 percent) of the company_code values equal 1430. As is also shown in the figure, most of the width-balanced buckets contain only 3 rows; a single bucket contains 73 rows. In the height-balanced version of this distribution, each bucket has the same number of rows and most of the bucket endpoints are '1430', reflecting the skewed distribution of the data.
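The buckets created by the statement above can be inspected in the dictionary. The sketch below assumes the SCOTT.COMPANY table from the example exists, that its statistics were gathered as shown, and that the current user can see it through ALL_TAB_HISTOGRAMS.

```sql
-- One row per stored endpoint. The difference between consecutive
-- ENDPOINT_NUMBER values shows how much weight an endpoint carries:
--   height-balanced histogram: number of buckets the value spans
--   frequency histogram:       number of rows with that value
SELECT endpoint_number,
       endpoint_value,
       endpoint_number
         - LAG(endpoint_number, 1, 0) OVER (ORDER BY endpoint_number) AS weight
  FROM all_tab_histograms
 WHERE owner = 'SCOTT'
   AND table_name = 'COMPANY'
   AND column_name = 'COMPANY_CODE'
 ORDER BY endpoint_number;
```

In a height-balanced histogram, an endpoint value that spans several buckets is a popular value. With the distribution described above, 1430 would be expected to stand out in this way.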

5. Experiment

5.1.1 Create an experiment table

SQL> create table obj as select * from dba_objects;
SQL> create index obj_id_idx on obj(object_id) online nologging;
SQL> SELECT MAX(object_id), MIN(object_id) FROM obj;
MAX(OBJECT_ID) MIN(OBJECT_ID)
-------------- --------------
         58410              2
-- Create a non-uniform data distribution
SQL> UPDATE   obj
        SET   object_id = 1000
      WHERE   object_id > 100 AND object_id < 54000;
SQL> commit;
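Before gathering statistics, it can be useful to confirm that the update really produced the intended skew. A minimal sketch, assuming the OBJ table created above:

```sql
-- List the five most frequent object_id values with their share of all rows.
SELECT *
  FROM (SELECT object_id,
               COUNT(*) AS row_count,
               ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 2) AS pct_of_rows
          FROM obj
         GROUP BY object_id
         ORDER BY row_count DESC)
 WHERE ROWNUM <= 5;
```

The value 1000 should appear first with a very large share of the rows, while every other value accounts for only a tiny fraction.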

5.1.2 Create a histogram

BEGIN
   DBMS_STATS.gather_table_stats(cascade            =>TRUE,
                                  degree             =>2,
                                  estimate_percent   =>100,
                                  force              =>TRUE,
                                  ownname            =>'FUNG',
                                  tabname            =>'OBJ');
END;
/

In gather_table_stats, method_opt defaults to FOR ALL COLUMNS SIZE AUTO, so histogram statistics are gathered as well (the default depends on the Oracle version). degree specifies the degree of parallelism, which can be chosen according to the number of CPUs on the host. estimate_percent specifies the sampling rate: with automatic sampling Oracle decides the sample rate itself and builds the histogram from the sampled data, but the rate can also be set manually. For example, estimate_percent => 20 sets the sampling ratio to 20%. cascade => TRUE also gathers statistics on the table's indexes. In older releases this parameter defaults to FALSE, so dbms_stats does not gather index statistics for the table by default; from 10g on the default is DBMS_STATS.AUTO_CASCADE, which lets Oracle decide.
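The parameters discussed above can also be set explicitly rather than left to their defaults. The following sketch reuses the FUNG schema and OBJ table from the experiment; the bucket count of 254 is an illustrative choice, not a value taken from the experiment.

```sql
BEGIN
   DBMS_STATS.gather_table_stats(
      ownname          => 'FUNG',
      tabname          => 'OBJ',
      -- let Oracle choose the sample size instead of a fixed percentage
      estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
      -- request a histogram on OBJECT_ID only, with up to 254 buckets
      method_opt       => 'FOR COLUMNS object_id SIZE 254',
      -- gather statistics on the table's indexes as well
      cascade          => TRUE
   );
END;
/
```

Because method_opt names a single column here, column statistics for the other columns are not gathered by this call. Use FOR ALL COLUMNS when statistics for the whole table are needed.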

- Note the distribution of ENDPOINT_NUMBER and ENDPOINT_VALUE:

SQL> SELECT   *
  FROM   user_histograms
 WHERE   table_name = 'OBJ' AND column_name = 'OBJECT_ID';

SQL> SELECT COLUMN_NAME, HISTOGRAM FROM USER_TAB_COLS WHERE TABLE_NAME = 'OBJ' AND column_name = 'OBJECT_ID';
COLUMN_NAME          HISTOGRAM
-------------------- ---------------
OBJECT_ID            HEIGHT BALANCED

5.1.3 Execution plan with the histogram

SQL> select object_name from obj where object_id = 100;

The CBO chooses an index range scan.

SQL> select object_name from obj where object_id = 1000;

From the histogram statistics, Oracle knows that the value 1000 accounts for more than 80% of the object_id values, so it chooses a full table scan.
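The plans for the two queries can be displayed with EXPLAIN PLAN and DBMS_XPLAN. This is a sketch that assumes the OBJ table and index from the experiment and a usable plan table (present by default in recent releases).

```sql
-- Rare value: an index range scan on OBJ_ID_IDX is expected.
EXPLAIN PLAN FOR
SELECT object_name FROM obj WHERE object_id = 100;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Dominant value: a full table scan is expected once the histogram exists.
EXPLAIN PLAN FOR
SELECT object_name FROM obj WHERE object_id = 1000;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

The Rows column of each plan shows the optimizer's cardinality estimate, which is where the effect of the histogram becomes visible.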

5.1.4 Execution plan after removing the histogram

BEGIN
   DBMS_STATS.gather_table_stats(
      cascade            =>TRUE,
      degree             =>2,
      estimate_percent   =>100,
      force              =>TRUE,
      ownname            =>'FUNG',
      tabname            =>'OBJ',
      method_opt         =>'FOR ALL COLUMNS SIZE 1'
   );
END;
/

The histogram is removed by setting method_opt to FOR ALL COLUMNS SIZE 1.

Test result:

SQL> SELECT COLUMN_NAME, HISTOGRAM FROM USER_TAB_COLS WHERE TABLE_NAME = 'OBJ' AND column_name = 'OBJECT_ID';
COLUMN_NAME          HISTOGRAM
-------------------- ---------------
OBJECT_ID            NONE

The histogram information has been deleted.
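If only one column's histogram should be removed while other column statistics are left alone, method_opt can name that column. A sketch under the same assumptions as the experiment (schema FUNG, table OBJ):

```sql
BEGIN
   DBMS_STATS.gather_table_stats(
      ownname          => 'FUNG',
      tabname          => 'OBJ',
      estimate_percent => 100,
      -- SIZE 1 means a single bucket, i.e. no histogram, for OBJECT_ID only
      method_opt       => 'FOR COLUMNS object_id SIZE 1',
      cascade          => TRUE
   );
END;
/
```

Running the USER_TAB_COLS query again afterwards should show NONE in the HISTOGRAM column for OBJECT_ID.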

For object_id = 100 the plan is still an index range scan, which is correct. Next, look at the execution plan for object_id = 1000:

A full table scan is obviously the right choice here, but without the histogram Oracle still naively uses an index scan.
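A rough way to see why the index still looks attractive: without a histogram the optimizer assumes the values are evenly distributed, so for an equality predicate it estimates roughly num_rows / num_distinct rows per value. The query below is a simplified sketch of that arithmetic. It ignores NULLs and other adjustments the optimizer makes, and assumes statistics exist for OBJ in the current schema.

```sql
-- Approximate rows-per-value estimate under the uniform-distribution assumption.
SELECT t.num_rows,
       c.num_distinct,
       ROUND(t.num_rows / c.num_distinct) AS est_rows_per_value
  FROM user_tables t
  JOIN user_tab_col_statistics c
    ON c.table_name = t.table_name
 WHERE t.table_name = 'OBJ'
   AND c.column_name = 'OBJECT_ID';
```

With thousands of distinct values left in OBJECT_ID, this estimate is only a handful of rows, even for the value 1000 that actually covers most of the table.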

Original: Large column using a histogram (Histograms)