SQL Tuning Tips: Statistics


Statistics are like scouts in a war: if the intelligence is poor, the war is lost. In the same way, if a table's statistics are not gathered correctly or are not refreshed in time, the execution plan of a SQL statement goes wrong and the statement performs badly. Statistics are gathered so that the optimizer can choose the best execution plan and query the table's data at the lowest cost.

Statistics are mainly divided into table statistics, column statistics, index statistics, system statistics, data dictionary statistics, and dynamic performance view base table statistics.

System statistics, data dictionary statistics, and statistics on the base tables of the dynamic performance views are outside the scope of this article, which focuses on table statistics, column statistics, and index statistics.

The statistical information of the table mainly includes the total number of rows in the table (num_rows), the number of blocks in the table (blocks), and the average row length (avg_row_len). We can obtain the statistical information of the table by querying the data dictionary DBA_TABLES.

Now we create a test table T_STATS.
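A minimal sketch of this step. It assumes T_STATS is a copy of DBA_OBJECTS (the columns used later in the article, such as OBJECT_ID, OWNER, and EDITION_NAME, suggest this) and that the session is connected as a hypothetical SCOTT user that can read the DBA views.

```sql
-- Assumption: run as a hypothetical SCOTT user; the source view is inferred
-- from the column names the article uses later, not stated by the article.
CREATE TABLE t_stats AS
SELECT * FROM dba_objects;
```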

Let's look at the commonly used table statistics for T_STATS.
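A query along these lines returns the three table statistics named above. The owner SCOTT is an assumption; substitute the schema that actually holds T_STATS.

```sql
-- Table-level statistics: row count, block count, average row length.
-- 'SCOTT' is a hypothetical owner.
SELECT owner, table_name, num_rows, blocks, avg_row_len
  FROM dba_tables
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS';
```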


Because T_STATS is a newly created table and no statistics have been gathered on it yet, the statistics columns returned from DBA_TABLES are empty (NULL).

Now let's collect statistics for the table T_STATS.
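A minimal sketch of the gathering call, assuming the hypothetical SCOTT schema and a 100% sample (consistent with the sample rate the article reports later). The individual parameters are discussed further down.

```sql
-- Gather table statistics with a full (100%) sample.
-- 'SCOTT' is a hypothetical owner; other parameters keep their defaults.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    estimate_percent => 100);
END;
/
```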

Let's look at the statistics of the table again.


From the query, we can see that the table T_STATS has 72,674 rows and 1,061 data blocks, and that the average row length is 97 bytes.

The statistics of the column mainly include the cardinality of the column, the number of null values in the column, and the data distribution (histogram) of the column. We can view the statistics of the column through the data dictionary DBA_TAB_COL_STATISTICS.

Now we look at the commonly used column statistics of the table T_STATS.
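One way to write this query, selecting the column name, cardinality, NULL count, bucket count, and histogram type in that order. The owner SCOTT is an assumption.

```sql
-- Column-level statistics for T_STATS.
-- num_distinct is the column's cardinality; 'SCOTT' is a hypothetical owner.
SELECT column_name, num_distinct, num_nulls, num_buckets, histogram
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS';
```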


In the query above, the first column is the column name, the second is the column's cardinality, the third is the number of NULL values in the column, the fourth is the number of histogram buckets, and the last is the histogram type.

At work, we often use the script below to view table and column statistics.
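A sketch of such a combined script, not necessarily the authors' own. It joins DBA_TAB_COL_STATISTICS to DBA_TABLES and derives selectivity as cardinality divided by the row count, which is an assumed definition here; the owner SCOTT is also an assumption.

```sql
-- Table and column statistics side by side.
-- selectivity_pct = num_distinct / num_rows * 100 (assumed definition).
-- NULLIF avoids a divide-by-zero error on empty tables.
SELECT a.column_name,
       b.num_rows,
       a.num_distinct AS cardinality,
       ROUND(a.num_distinct / NULLIF(b.num_rows, 0) * 100, 2) AS selectivity_pct,
       a.num_nulls,
       a.histogram,
       a.num_buckets
  FROM dba_tab_col_statistics a
  JOIN dba_tables b
    ON a.owner = b.owner
   AND a.table_name = b.table_name
 WHERE a.owner = 'SCOTT'
   AND a.table_name = 'T_STATS';
```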


The statistics of the index mainly include the index blevel (index height -1), the number of leaf blocks (leaf_blocks), and the clustering factor (clustering_factor). We can view the statistics of the index through the data dictionary DBA_INDEXES.

We create an index on the OBJECT_ID column.
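For example, with a hypothetical index name:

```sql
-- idx_t_stats_id is a hypothetical name chosen for illustration.
CREATE INDEX idx_t_stats_id ON t_stats (object_id);
```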

When an index is created, the statistics of the index are automatically collected. Run the following script to view the statistics of the index.
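A query of roughly this shape shows the three index statistics mentioned above. The table owner SCOTT is an assumption.

```sql
-- Index statistics: blevel (height - 1), leaf block count, clustering factor.
-- 'SCOTT' is a hypothetical table owner.
SELECT owner, index_name, table_name, blevel, leaf_blocks, clustering_factor
  FROM dba_indexes
 WHERE table_owner = 'SCOTT'
   AND table_name = 'T_STATS';
```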

Later chapters explain in detail how table statistics, column statistics, and index statistics are used in cost calculation.

Important parameter settings for statistics

We usually use the script below to collect table and index statistics.
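A sketch of such a script, covering the parameters explained in this section. All values are illustrative assumptions (hypothetical SCOTT schema, 100% sample); `cascade => TRUE` asks DBMS_STATS to gather statistics on the table's indexes as well.

```sql
-- Gather table and index statistics in one call.
-- All values below are illustrative assumptions, not recommendations.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',                      -- table owner
    tabname          => 'T_STATS',                    -- table name
    granularity      => 'AUTO',                       -- only matters for partitioned tables
    estimate_percent => 100,                          -- sampling rate in percent
    method_opt       => 'for all columns size auto',  -- histogram strategy
    cascade          => TRUE);                        -- also gather index statistics
END;
/
```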

ownname indicates the owner of the table and is not case-sensitive.

tabname represents the table name, which is not case-sensitive.

granularity sets the granularity at which statistics are gathered and applies only to partitioned tables. The default, AUTO, lets Oracle decide how to gather statistics for a partitioned table based on its partition type. We generally leave this option at AUTO, the database default, so it is omitted from the scripts that follow.

estimate_percent represents the sampling rate, in the range of 0.000001 to 100.

We generally sample 100% for tables smaller than 1 GB: because the table is small, even a 100% sample is fairly fast. Small tables sometimes have unevenly distributed data, and without 100% sampling the statistics may be inaccurate, so we recommend 100% sampling for small tables.

For tables between 1 GB and 5 GB we generally sample 50%, and for tables larger than 5 GB, 30%. If a table is especially large, at tens or even hundreds of GB, we recommend partitioning it first and then gathering statistics on each partition separately.

In general, in order to ensure more accurate statistics, we recommend that the sampling rate should not be lower than 30%.

We can view the sample rate of the table using the script below.
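A sketch of such a script. It derives the sample rate from the SAMPLE_SIZE and NUM_ROWS columns of DBA_TABLES; the owner SCOTT and the alias names are assumptions.

```sql
-- Sample rate = rows analyzed / estimated total rows * 100.
-- 'SCOTT' is a hypothetical owner; NULLIF guards against num_rows = 0.
SELECT owner,
       table_name,
       num_rows,
       sample_size,
       ROUND(sample_size / NULLIF(num_rows, 0) * 100) AS sample_pct
  FROM dba_tables
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS';
```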

From the above query we can see that the table T_STATS is 100% sampled. Now we set the sample rate to 30%.
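A minimal sketch of regathering at 30% and then checking the result, again assuming the hypothetical SCOTT schema.

```sql
-- Regather with a 30% sample ('SCOTT' is a hypothetical owner).
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    estimate_percent => 30);
END;
/

-- Compare the estimated row count with the number of rows actually analyzed.
SELECT num_rows, sample_size
  FROM dba_tables
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS';
```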


From the query above we can see that the sampling rate is 30% and that the total number of rows in the table is estimated at 73,067, whereas the table actually contains 72,674 rows. With a 30% sampling rate, 21,920 rows were analyzed, and the estimated row count is round(21920*100/30), which is 73,067.
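The extrapolation can be checked directly with the figures quoted in the text:

```sql
-- 21920 sampled rows scaled up from a 30% sample: 2192000 / 30 = 73066.67
SELECT ROUND(21920 * 100 / 30) AS estimated_num_rows
  FROM dual;
-- returns 73067
```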

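Unless a table is small, there is no need to sample it at 100%, because the table is constantly subject to DML and its data is always changing.

method_opt controls the strategy for gathering histograms.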

method_opt => 'for all columns size 1'

means that no histograms are gathered on any column, as shown below.
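A minimal sketch of this call. The owner SCOTT and the 100% sample are assumptions; only method_opt is the point here.

```sql
-- SIZE 1 means a single bucket per column, that is, no histogram.
-- 'SCOTT' and estimate_percent => 100 are illustrative assumptions.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    estimate_percent => 100,
    method_opt       => 'for all columns size 1');
END;
/
```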

We view the histogram information.
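One way to inspect the result, assuming the hypothetical SCOTT schema. A column without a histogram reports HISTOGRAM = 'NONE' with a single bucket.

```sql
-- Histogram type and bucket count per column.
-- 'SCOTT' is a hypothetical owner.
SELECT column_name, num_distinct, num_buckets, histogram
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS'
 ORDER BY column_name;
```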


From the query above we can see that no histograms were gathered on any column.

method_opt => 'for all columns size skewonly'

means that Oracle decides automatically, for every column in the table, whether to gather a histogram, as shown below.
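A minimal sketch of this call, with the same assumptions as before (hypothetical SCOTT schema, 100% sample).

```sql
-- SKEWONLY considers every column in the table as a histogram candidate,
-- whether or not it is ever used in a where condition.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    estimate_percent => 100,
    method_opt       => 'for all columns size skewonly');
END;
/
```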

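We view the histogram information, as shown below.


From the query above we can see that histograms were gathered on every column except OBJECT_ID and EDITION_NAME. The EDITION_NAME column is entirely NULL, so a histogram is unnecessary. The OBJECT_ID column has a selectivity of 100%, so a histogram is unnecessary there as well.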

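In real work, never use method_opt => 'for all columns size skewonly' to gather histograms. Not every column in a table appears in a where condition, and a histogram on a column that never appears in a where condition serves no purpose.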

method_opt => 'for all columns size auto'

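means that Oracle decides automatically whether to gather a histogram on the columns that have appeared in a where condition.

Now we delete the histograms on all columns of the table.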

We run the SQL below so that the owner column is used in a where condition.
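Any statement that filters on the column will do. A minimal example, where the literal 'SYS' is an arbitrary assumption:

```sql
-- Use OWNER in a predicate so that its usage is recorded.
-- The literal 'SYS' is an arbitrary illustrative value.
SELECT COUNT(*)
  FROM t_stats
 WHERE owner = 'SYS';
```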

Next we flush the database monitoring information.
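This can be done with the DBMS_STATS procedure below, which writes the in-memory monitoring information to the data dictionary. It assumes the session has the privileges needed to run it.

```sql
-- Flush in-memory monitoring information to the dictionary.
BEGIN
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;
/
```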

We gather statistics on the table using method_opt => 'for all columns size auto'.

Then we view the histogram information.


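From the query above we can see that Oracle automatically gathered a histogram on the owner column.

Consider this: if a column with high selectivity is used in a where condition, will a histogram be gathered on it automatically? Now we put the OBJECT_NAME column in a where condition.

Then we flush the database monitoring information.

We gather statistics.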

We check whether a histogram was gathered on the OBJECT_NAME column.
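A sketch of the check, restricted to the one column of interest. The owner SCOTT is an assumption.

```sql
-- HISTOGRAM = 'NONE' means no histogram exists on the column.
-- 'SCOTT' is a hypothetical owner.
SELECT column_name, num_distinct, num_buckets, histogram
  FROM dba_tab_col_statistics
 WHERE owner = 'SCOTT'
   AND table_name = 'T_STATS'
   AND column_name = 'OBJECT_NAME';
```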

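From the query above we can see that no histogram was gathered on the OBJECT_NAME column. Clearly, the AUTO method of gathering histograms is quite intelligent. The default value of method_opt is 'for all columns size auto'.

method_opt => 'for all columns size repeat'

means that histograms are gathered again only on the columns that currently have them.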
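A minimal sketch of a REPEAT gather, under the same assumptions as the earlier calls (hypothetical SCOTT schema, 100% sample).

```sql
-- REPEAT regathers histograms only on columns that already have one.
-- 'SCOTT' and estimate_percent => 100 are illustrative assumptions.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SCOTT',
    tabname          => 'T_STATS',
    estimate_percent => 100,
    method_opt       => 'for all columns size repeat');
END;
/
```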


This article is excerpted from 《SQL优化核心思想》 (Core Ideas of SQL Optimization), by 罗炳森, 黄超, and 钟侥.
