Oracle 12c New Features (3): Attribute Clustering

Oracle 12c new features: http://docs.oracle.com/database/121/NEWFT/chapter12102.htm#NEWFT499

1.3 Attribute Clustering

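Attribute clustering is a table-level directive that clusters data in close physical proximity based on the content of certain columns. The directive applies to any kind of direct-path operation, such as a bulk insert or a move operation.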

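Storing data that logically belongs together in close physical proximity can greatly reduce the amount of data to be processed and can improve compression ratios. (In short, it improves data-processing performance.)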

The Oracle Database Data Warehousing Guide describes attribute clustering as follows:


About Attribute Clustering

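An attribute-clustered table stores data in close proximity on disk, in an ordered way, based on the values of a certain set of columns in the table or a set of columns in other tables.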

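You can cluster according to the linear order of specified columns, or by using a function that permits multidimensional clustering (also known as interleaved clustering). Attribute clustering improves the effectiveness of zone maps, Exadata Storage Indexes, and In-memory min/max pruning. Queries that qualify the clustered columns access only the clustered regions. When attribute clustering is defined on a partitioned table, the clustering applies to all partitions.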

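Attribute clustering is a directive property of a table. It is not enforced for every DML operation; it affects only direct-path inserts, data movement, and table creation. Conventional DML operations on the table are not affected by attribute clustering. This means that any clustering of the data is performed only on the current working data set. This is analogous to applying an ORDER BY clause manually, for example as part of a CTAS (CREATE TABLE AS SELECT) operation.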
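The load-time behavior described above can be illustrated with a minimal Python sketch. This is a toy model and not Oracle's implementation: the ClusteredTable class and its method names are hypothetical, and a plain list stands in for the table's storage order. It only shows that a direct-path load orders the batch being loaded, while conventional inserts and rows that are already stored are left as they are.

```python
class ClusteredTable:
    """Toy model of a table with a linear clustering directive (hypothetical)."""

    def __init__(self, clustering_columns):
        self.clustering_columns = clustering_columns
        self.rows = []  # stands in for the on-disk row order

    def _clustering_key(self, row):
        return tuple(row[col] for col in self.clustering_columns)

    def conventional_insert(self, row):
        # Conventional DML: the row is appended as-is, no clustering applied.
        self.rows.append(row)

    def direct_path_load(self, batch):
        # Direct-path load: only the incoming batch (the current working
        # data set) is ordered; rows already stored are not rearranged.
        self.rows.extend(sorted(batch, key=self._clustering_key))


sales = ClusteredTable(("cust_id", "prod_id"))
sales.conventional_insert({"cust_id": 9, "prod_id": 1})
sales.conventional_insert({"cust_id": 3, "prod_id": 2})
sales.direct_path_load([
    {"cust_id": 5, "prod_id": 2},
    {"cust_id": 5, "prod_id": 1},
    {"cust_id": 1, "prod_id": 7},
])
stored_order = [(r["cust_id"], r["prod_id"]) for r in sales.rows]
print(stored_order)
```

The two conventionally inserted rows keep their arrival order, and only the three loaded rows are sorted by (cust_id, prod_id) among themselves.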

Data can be clustered in the following ways:

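1. Clustering based on one or more columns of the table on which attribute clustering is defined.

2. Clustering based on one or more columns that are joined with the table on which attribute clustering is defined. Clustering based on joined columns is called join attribute clustering. The tables should be connected through a primary key-foreign key relationship, but the foreign key does not have to be enforced.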

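Because star queries typically qualify dimension hierarchies, it can be beneficial to cluster a fact table based on columns (attributes) of one or more dimension tables. With join attribute clustering, you can join one or more dimension tables with a fact table and then cluster the fact table data by dimension hierarchy columns. To cluster a fact table on columns from one or more dimension tables, the join to each dimension table must be on a primary key or unique key of that dimension table. In the context of star queries, join attribute clustering is also known as hierarchical clustering, because the table data is clustered by dimension hierarchies, each made up of an ordered list of hierarchical columns (for example, the country, state, and city columns forming a location hierarchy).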

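Note: In contrast with Oracle Table Clusters, join attribute clustered tables do not store data from a group of tables in the same database blocks. For example, consider an attribute-clustered table sales joined with a dimension table products. The sales table contains only rows from the sales table, but the ordering of those rows is based on the values of the columns joined from the products table. The appropriate join is executed during data movement, direct-path insert, and CTAS operations.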
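A minimal Python sketch of the idea behind join attribute clustering follows. It is an illustration under stated assumptions, not Oracle's mechanism: the products and sales data are made up, and join_cluster is a hypothetical helper. The fact rows are ordered by dimension attributes reached through the join key, yet the result still holds only fact-table rows.

```python
# Hypothetical dimension table: prod_id -> (category, subcategory).
products = {
    1: ("Electronics", "Phones"),
    2: ("Clothing", "Shirts"),
    3: ("Electronics", "Cameras"),
}

# Hypothetical fact table rows in arrival order.
sales = [
    {"sale_id": 101, "prod_id": 2, "amount": 20},
    {"sale_id": 102, "prod_id": 1, "amount": 500},
    {"sale_id": 103, "prod_id": 3, "amount": 300},
    {"sale_id": 104, "prod_id": 2, "amount": 25},
]


def join_cluster(fact_rows, dimension, key):
    """Order fact rows by the dimension attributes looked up through `key`."""
    # The lookup plays the role of the join; only fact rows are returned.
    return sorted(fact_rows, key=lambda row: dimension[row[key]])


clustered_sales = join_cluster(sales, products, "prod_id")
clustered_ids = [row["sale_id"] for row in clustered_sales]
print(clustered_ids)
```

Rows for the same category and subcategory end up next to each other, while no column from products is stored in the clustered rows.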

Types of Attribute Clustering

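Attribute clustering is a user-defined table directive that provides data clustering on one or more columns of a table. The directive can be specified when the table is created or altered.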

Oracle Database provides the following types of attribute clustering:

1. Attribute Clustering with Linear Ordering

2. Attribute Clustering with Interleaved Ordering

Regardless of the type of attribute clustering used, you can cluster data either on a single table or by joining multiple tables (join attribute clustering).

Attribute Clustering with Linear Ordering

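Linear ordering stores data according to the order of the specified columns. This is the default type of clustering. For example, linear ordering on the (prod_id, channel_id) columns of the SALES table sorts the data by prod_id first and then by channel_id. The ordered data is stored on disk so that rows with similar clustering column values are in close proximity.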


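Linear ordering can be defined on a single table or on multiple tables that are connected through a primary key-foreign key relationship.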

Use the CLUSTERING ... BY LINEAR ORDER directive to perform attribute clustering based on the order of the specified columns.

Attribute clustering with linear ordering works best in the following scenarios:

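1. Queries on a single table specify a prefix of the columns included in the CLUSTERING clause.

For example, if queries on sales often specify either a customer ID or a combination of customer ID and product ID, you could cluster the data in the table using the column order cust_id, prod_id.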

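2. The columns specified in the CLUSTERING clause have an acceptable level of cardinality.

The potential data reduction that can be obtained in the scenarios described in "Advantages of Attribute-Clustered Tables" increases in direct proportion to the data reduction obtained from a predicate on a column.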

Linear ordering combined with zone maps is very effective at reducing I/O.
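To make the interaction between linear ordering and min/max pruning concrete, here is a small Python sketch. It is a simplified model and makes several assumptions: a "zone" is just a fixed-size slice of a row list, the zone map keeps one (min, max) pair per column per zone, and the 16 (prod_id, channel_id) rows are invented. The function names build_zone_map and zones_to_scan are hypothetical and do not correspond to any Oracle API.

```python
ZONE_SIZE = 4  # rows per zone; arbitrary value chosen for illustration


def build_zone_map(rows, zone_size=ZONE_SIZE):
    """Return one (min, max) pair per column for each zone of rows."""
    zone_map = []
    for start in range(0, len(rows), zone_size):
        zone = rows[start:start + zone_size]
        columns = list(zip(*zone))  # transpose rows into columns
        zone_map.append([(min(col), max(col)) for col in columns])
    return zone_map


def zones_to_scan(zone_map, column, value):
    """Count zones whose min/max range could contain `value`."""
    return sum(1 for zone in zone_map
               if zone[column][0] <= value <= zone[column][1])


# 16 hypothetical (prod_id, channel_id) rows in a scrambled arrival order.
arrival = [(((i * 7) % 16) // 4, ((i * 7) % 16) % 4) for i in range(16)]
clustered = sorted(arrival)  # linear order: prod_id first, then channel_id

PROD_ID, CHANNEL_ID = 0, 1
unclustered_scan = zones_to_scan(build_zone_map(arrival), PROD_ID, 2)
prefix_scan = zones_to_scan(build_zone_map(clustered), PROD_ID, 2)
non_prefix_scan = zones_to_scan(build_zone_map(clustered), CHANNEL_ID, 2)
print(unclustered_scan, prefix_scan, non_prefix_scan)
```

In this toy data set, a predicate on prod_id has to read all four zones before clustering and a single zone afterwards. A predicate on channel_id alone, which is not a prefix of the clustering columns, still has to read every zone.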

Attribute Clustering with Interleaved Ordering

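Interleaved ordering uses a special multidimensional clustering technique based on Z-order curve fitting. It maps multiple column attribute values (multidimensional data points) to a single one-dimensional value while preserving the multidimensional locality of the column values (data points). Interleaved ordering is supported on single tables and on multiple tables. Unlike linear ordering, this method does not require the leading columns of the clustering definition to be specified in order to gain the I/O pruning benefits described in "Advantages of Attribute-Clustered Tables".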

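Columns can be used individually or grouped together as column groups. Each individual column or column group constitutes one of the multidimensional data points in the cluster. Grouped columns are enclosed in parentheses '( )' and must follow the dimensional hierarchy from the coarsest to the finest level of granularity, for example (product_category, product_subcategory).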

Use the CLUSTERING ... BY INTERLEAVED ORDER directive to perform clustering by interleaved ordering.

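Interleaved ordering is most beneficial for SQL operations with varying predicates on multiple columns. This is often the case for star queries against a dimensional model, where the query predicates are on dimension tables and the number of predicates varies. Columns in dimension tables often contain a hierarchy, such as the product category and subcategory hierarchy. In this case, the fact table is clustered on the dimension columns that form a hierarchy. This is the reason join attribute clustering for star schemas is sometimes referred to as hierarchical clustering. For example, if queries on sales specify columns from different dimensions, you could cluster the data in the sales table according to columns in these dimensions.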

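Interleaved ordering combined with zone maps is very effective at I/O pruning for star schema queries. In addition, interleaved ordering used with zone maps provides efficient I/O pruning for queries and improves compression, because identical column values are close to each other and can be compressed easily.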
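The Z-order idea can be sketched in a few lines of Python. This is a generic bit-interleaving (Morton code) illustration and an assumption on my part; the text does not specify the exact function Oracle uses. The 4 x 4 grid of (category_id, country_id) values, the zone size of four rows, and the helper names z_value and zones_to_scan are all hypothetical. The sketch compares how many zones a min/max check would have to read under linear and interleaved ordering.

```python
def z_value(x, y, bits=2):
    """Interleave the low `bits` bits of x and y into a single Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)  # bits of x go to odd positions
        z |= ((y >> i) & 1) << (2 * i)      # bits of y go to even positions
    return z


def zones_to_scan(rows, column, value, zone_size=4):
    """Count fixed-size zones whose min/max range on `column` could hold `value`."""
    count = 0
    for start in range(0, len(rows), zone_size):
        values = [row[column] for row in rows[start:start + zone_size]]
        if min(values) <= value <= max(values):
            count += 1
    return count


# Hypothetical 4 x 4 grid of (category_id, country_id) pairs.
grid = [(x, y) for x in range(4) for y in range(4)]
linear = sorted(grid)                                   # category, then country
interleaved = sorted(grid, key=lambda p: z_value(*p))   # Z-order

linear_scans = (zones_to_scan(linear, 0, 2), zones_to_scan(linear, 1, 2))
interleaved_scans = (zones_to_scan(interleaved, 0, 2),
                     zones_to_scan(interleaved, 1, 2))
print(linear_scans, interleaved_scans)
```

On this grid, linear ordering prunes very well on the leading column (one zone) and not at all on the second column (four zones), whereas the interleaved order reads two of the four zones for a predicate on either column. That trade-off is the reason interleaved ordering suits queries whose predicates vary across columns.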

Example: Attribute Clustered Table

An example of how a clustered table looks is illustrated in Figure 12-1. Assume you have a table sales with columns (category, country). The table on the left is clustered using linear ordering, and the table on the right is clustered using interleaved ordering. Observe that, in the interleaved-ordered table, there are contiguous regions on disk that contain data with a given category and country.

Figure 12-1 Attribute-Clustered Tables


Guidelines for Using Attribute Clustering

Consider the following guidelines when defining an attribute-clustered table:

  • Use attribute clustering in combination with zone maps to facilitate zone pruning and its associated I/O reduction.

  • Consider large tables that are frequently queried with predicates on medium to low cardinality columns. (The lower the cardinality, the more duplicated elements in a column. Thus, a column with the lowest possible cardinality would have the same value for every row. SQL databases use cardinality to help determine the optimal query plan for a given query.)

  • Consider fact tables that are frequently queried by dimensional hierarchies.

  • For a partitioned table, consider including columns that correlate with partition keys (to facilitate zone map partition pruning).

  • For linear ordering, list columns in prefix-to-suffix order.

  • Group together columns that form a dimensional hierarchy. This constitutes a column group. Within each column group, list columns in order of coarsest to finest granularity.

  • If there are more than four dimension tables, include the dimensions that are most commonly specified with filters. Limit the number of dimensions to two or three for better clustering effect.

  • Consider using attribute clustering instead of indexes on low to medium cardinality columns.

  • If the primary key of a dimension table is composed of dimension hierarchy values (for example, the primary key is made up of year, quarter, month, day values), use the corresponding foreign key as the clustering column instead of the dimension hierarchy.

Advantages of Attribute-Clustered Tables

  • Eliminates storage costs associated with using indexes

  • Enables the accessing of clustered regions rather than performing random I/O or full table scans when used in conjunction with zone maps

  • Provides I/O reduction when used in conjunction with any of the following:

    • Oracle Exadata Storage Indexes

    • Oracle In-memory min/max pruning

    • Zone maps

    Attribute clustering provides data clustering based on the attributes that are used as filter predicates. Because both Exadata Storage Indexes and Oracle In-memory min/max pruning track the minimum and maximum values of columns stored in each physical region, clustering reduces the I/O required to access data.

    I/O pruning using zone maps can significantly reduce I/O costs and CPU cost of table scans and index scans.

  • Enables clustering of fact tables based on dimension columns in star schemas

    Techniques such as traditional table clusters do not provide for ordering by columns of other tables. In star schemas, most queries qualify dimension tables and not fact tables, so clustering by fact table columns is not effective. Oracle Database supports clustering on columns in dimension tables.

  • Improves data compression ratios and in this way indirectly improves table scan costs

    Compression can be improved because, with clustering, there is a high probability that clustered columns with the same values are close to each other on disk, hence the database can more easily compress them.

  • Minimizes table lookup and single block I/O operations for index range scan operations when the attribute clustering is on the index selection criteria.

  • Enables I/O reduction in OLTP applications for queries that qualify a prefix of the clustering columns and use attribute clustering with linear order

  • Enables I/O reduction on a subset of the clustering columns for attribute clustering with interleaved ordering

    If table data is ordered on multiple columns, as in an index-organized table, then a query must specify a prefix of the columns to gain I/O savings. In contrast, a BY INTERLEAVED table permits queries to benefit from I/O pruning when they specify columns from multiple tables in a non-prefix order.

About Defining Attribute Clustering for Tables

Attribute clustering information is part of the table metadata. You can define attribute clustering for a table either when the table is first created or subsequently, by altering the table definition.

Use the CLUSTERING clause of the CREATE TABLE statement to define attribute clustering for a table. The type of attribute clustering is specified by including BY LINEAR ORDER or BY INTERLEAVED ORDER.

As part of the table definition, you can specify that attribute clustering must be performed when the following operations are triggered:

  • Direct-path insert operations

    Set the ON LOAD option to YES to specify that attribute clustering must be performed during direct-path insert operations. This includes MERGE operations with implied direct loads using hints.

  • Data movement operations

    Set the ON DATA MOVEMENT option to YES to specify that clustering must be performed during data movement operations. This includes online table redefinition and the following partition operations: MOVE, MERGE, SPLIT, and COALESCE.

The ON LOAD and ON DATA MOVEMENT options can be included in a CREATE TABLE or ALTER TABLE statement. If neither YES ON LOAD nor YES ON DATA MOVEMENT is specified, then clustering is not enforced automatically.

It will serve only as metadata defining natural clustering of the table that may be used later for zone map creation. In this case, it is up to the user to enforce clustering during loads.

For details on the advantages and on defining attribute clustering, see:
http://docs.oracle.com/database/121/DWHSG/attcluster.htm#DWHSG8932



Reposted from blog.csdn.net/handan725/article/details/52723922