Oracle Ask Tom Partitioning Learning Series: Partitioning Tutorial for Developers

Oracle Partitioning: A Step-by-Step Introduction for Developers is one of the Oracle Database developer courses.


Development with Oracle Partitioning

Partitioning in a database mirrors the way we handle large tasks in the real world. When a task gets too big to tackle in one hit, whether it is trimming a tree, taking a long drive, or washing the dishes, we can split it into smaller pieces to make it more manageable. It can be as simple as trying to relocate one's catalog of Oracle technical books!

video

In this video, the course author, Connor McDonald, shows how his book collection could be divided up by publisher, by year, or by Oracle Database version...

Partitioning is that same thought applied to data stored in the database. As the demands to store more and more data increase, the performance of operations against those large tables can suffer. Applying the Pareto principle to the storage of data, typically only a subset of the entire dataset is actively worked upon to satisfy the day-to-day needs of business users. Using the Partitioning option in Oracle Database, data can be segmented into smaller, more manageable chunks, which makes maintenance tasks easier for database administrators and also gives scope for better application performance via more efficient execution of the queries issued by application developers.

Pareto principle

The Pareto principle states that for many outcomes, roughly 80% of consequences come from 20% of causes (the "vital few"). Other names for this principle are the 80/20 rule, the law of the vital few, or the principle of factor sparsity.

Partitioning Options

There are different types of partitioning options available to cater for specific business requirements. There might be a requirement to break up SALES data by calendar year. Or there might be information being gathered on popular SPORTS, such as the NBA, NFL, MLB, and NHL (basketball, football, baseball, and hockey), that will be kept separate because minimal cross-sport queries will ever be run. Or a table of cell phone CALLS might simply be so large that it needs to be evenly scattered across smaller segments to keep them at a manageable size. All such options are possible with the Oracle Partitioning option.

There are other partitioning strategies as well for more esoteric requirements, including partitioning strategies between tables linked by referential integrity, and multi-dimensional forms of partitioning (partitions of partitions).

Getting started with Partitioning

Get an Environment

To get started, just go to Oracle's free service at livesql.oracle.com. This service allows you to run SQL and create database objects without requiring any software other than your browser. There are also hundreds of sample scripts and tutorials on a wide variety of topics to help you learn about Oracle Database.

Oracle Partitioning is also available as a fully supported feature of all Oracle Database Cloud Service and on-premises Oracle Database Enterprise Edition. Here's a quick primer on LiveSQL from Oracle Vice President Mike Hichwa:

video

Oracle LiveSQL Characteristics:

  • Free: LiveSQL is completely free for any use. Signing up is free, fast and easy.
  • Scripting: The SQL you write in LiveSQL can be metadata tagged, saved, shared with others or with the entire Oracle community.
  • Tutorials: LiveSQL includes hundreds of tutorials written by Oracle Corporation internal and external experts to help you become productive quickly.
  • Latest Versions: LiveSQL runs on the latest versions of Oracle Database, so you can safely test new features before upgrading your own systems.

The following command gets the current version of the database:

select * from v$version;


A First Look at Partitioning Syntax

Perhaps the most common example of partitioning in Oracle databases is dividing a large table into partitions based on a time attribute. The largest tables in your applications are often a time-based record of critical business activities: sales transactions for a retail business, mobile phone calls for a telecommunications business, or deposits and withdrawals for a banking institution. In all of these cases there are a couple of common elements, namely that each transaction (row) in the table has a time stamp of when the transaction occurred, and that the volume of such transactions is typically high, making the table large in a short period of time. Partitioning is a natural fit for such tables, because queries often only want to peruse time-based subsets of the data, for example, transactions for the current month or the current week. Also, breaking the large table into smaller, more manageably sized pieces is useful for administrators from a maintenance perspective. Time is a continuous (analog) measurement, and thus a large table would be segmented into time-based ranges, hence the term used for this operation is range partitioning. This video walks you through the creation of a simple range-partitioned table.

video

Key points:

  • Partition by range: The keyword to define that a table is partitioned is PARTITION BY, which follows the normal table column definitions. The BY RANGE clause specifies the type of partitioning scheme the table will use (see the sketch after this list).
  • Upper bounds only: Range partitioning does not specify lower bounds, only upper bounds. The lower bound is implicitly defined by the upper bound of the previous partition. A partition can contain values up to but not including the value specified in the VALUES LESS THAN clause. (In other words, a row belongs to a partition when it is greater than or equal to the lower boundary and less than the upper boundary.)
  • At least 1 partition: After the PARTITION BY clause, there must always be at least one partition defined. (This is true for range partitioning; some other types, such as automatic list partitioning, relax the requirement in a few cases.)
  • USER_TAB_PARTITIONS: The data dictionary keeps track of all defined partitions. USER_TAB_PARTITIONS displays one row for each defined partition of each partitioned table in the schema.
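
As a minimal sketch of the syntax described above (the ORDERS table and its columns are hypothetical, not part of the course material), a range-partitioned table and the matching dictionary query might look like this:

create table orders
(
    order_id   number(10)   not null,
    order_date date         not null,
    amount     number(12,2) not null
)
partition by range (order_date)
(
    partition p_2023 values less than (date '2024-01-01'),
    partition p_2024 values less than (date '2025-01-01')
);

-- One row per defined partition
select partition_name, high_value
from   user_tab_partitions
where  table_name = 'ORDERS';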

Performance Benefits

Even with just a simple range partitioning example, we have enough tools at our disposal to examine the potential performance benefits of partitioning a table. When SQL queries consume too much I/O, often the only resolution considered is to create indexes on the table. The premise of indexing is simple: locate data more quickly and avoid scanning data unnecessarily. Partitioning a table takes the same approach via a concept known as "pruning" or "elimination". If a query on a partitioned table contains appropriately phrased predicates that include the partitioning column(s), the optimizer can generate an execution plan that bypasses those partitions that, by definition, could not contain data relevant to the query. The following video shows a demonstration of this, including a comparison of the cost of partition pruning versus a conventional indexing strategy.

video

Key points:

  • Partition pruning: If the optimizer can eliminate partitions from consideration, query performance can improve significantly. In a later video, you'll see how to interpret the optimizer execution plan output to determine whether partition elimination was performed for a given SQL query.
  • Generate test data: You can use the DUAL technique from the video to generate arbitrary test data for any table, partitioned or otherwise. As the video advises, keep the number of rows generated by a single DUAL CONNECT BY query within the tens of thousands, and use Cartesian joins if you need to scale up. See Tanel Poder's blog post on how these queries affect PGA memory and why you should not take them to extremes.
  • Index reduction: In some cases, partitioning a table allows existing indexes to be merged or dropped, which can reduce overall database size and improve insert, update, and delete performance.

A walk-through of this comparison follows:
create table SALES
(
  tstamp    timestamp(6) not null,
  sales_id  number(10) not null,
  amount    number(12, 2) not null 
);

-- The 6 after TIMESTAMP is the fractional-seconds precision; 6 is the default.

insert into sales
select
    timestamp '2010-01-01 00:00:00' +
    numtodsinterval(rownum*5, 'SECOND'),
    rownum,
    dbms_random.value(1,20)
from
    (select 1 from dual connect by level <= 10000),
    (select 1 from dual connect by level <= 10000)
where rownum <= 6000000;

commit;

-- 6,000,000 rows x 5 seconds is roughly 347.22 days, so all timestamps fall within 2010, the latest in mid-December.

set autotrace on
select max(amount)
from sales
where tstamp >= timestamp '2010-06-01 00:00:00'
and tstamp <= timestamp '2010-08-01 00:00:00';

The output is as follows; note that consistent gets is 19326:

MAX(AMOUNT)
-----------
         20


Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |  5265   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |            |          |
|*  2 |   TABLE ACCESS FULL| SALES |  1062K|    26M|  5265   (1)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TSTAMP">=TIMESTAMP' 2010-06-01 00:00:00.000000000' AND
              "TSTAMP"<=TIMESTAMP' 2010-08-01 00:00:00.000000000')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
         41  recursive calls
         13  db block gets
      19326  consistent gets
          2  physical reads
       2576  redo size
        553  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

Create an index:

create index sales_ix on sales(tstamp);

Query again; the execution plan and statistics are as follows. The index does not help, and the optimizer still chooses a full table scan, because the two-month range matches roughly a million of the six million rows and an index range scan over that many rows would cost more than scanning the table:

MAX(AMOUNT)
-----------
         20


Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |  5265   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |            |          |
|*  2 |   TABLE ACCESS FULL| SALES |  1062K|    26M|  5265   (1)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("TSTAMP">=TIMESTAMP' 2010-06-01 00:00:00.000000000' AND
              "TSTAMP"<=TIMESTAMP' 2010-08-01 00:00:00.000000000')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      19038  consistent gets
          0  physical reads
          0  redo size
        553  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Now convert it to a partitioned table:

alter table sales
modify partition by range (tstamp)
(
    partition p00 values less than (timestamp '2010-01-01 00:00:00'),
    partition p01 values less than (timestamp '2010-02-01 00:00:00'),
    partition p02 values less than (timestamp '2010-03-01 00:00:00'),
    partition p03 values less than (timestamp '2010-04-01 00:00:00'),
    partition p04 values less than (timestamp '2010-05-01 00:00:00'),
    partition p05 values less than (timestamp '2010-06-01 00:00:00'),
    partition p06 values less than (timestamp '2010-07-01 00:00:00'),
    partition p07 values less than (timestamp '2010-08-01 00:00:00'),
    partition p08 values less than (timestamp '2010-09-01 00:00:00'),
    partition p09 values less than (timestamp '2010-10-01 00:00:00'),
    partition p10 values less than (timestamp '2010-11-01 00:00:00'),
    partition p11 values less than (timestamp '2010-12-01 00:00:00'),
    partition p12 values less than (timestamp '2011-01-01 00:00:00')
);

Execute the query again and partition pruning takes effect; consistent gets drops to 5075:

---------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       |     1 |    15 |  1415   (1)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE           |       |     1 |    15 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR|       |  1054K|    15M|  1415   (1)| 00:00:01 |     7 |     9 |
|*  3 |    TABLE ACCESS FULL      | SALES |  1054K|    15M|  1415   (1)| 00:00:01 |     7 |     9 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"<=TIMESTAMP' 2010-08-01 00:00:00.000000000')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       5075  consistent gets
          0  physical reads
          0  redo size
        553  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Finally, clean up the table:

drop table sales purge;

If you find the range partitioning syntax above verbose, you can use interval partitioning, which is an extension of range partitioning. (Re-create and re-populate the SALES table as before, then convert it:)

alter table sales
modify partition by range (tstamp) interval (numtoyminterval(1, 'MONTH'))
(partition p00 values less than (timestamp '2010-01-01 00:00:00'));

View partition information:

-- For an interval-partitioned table, PARTITION_COUNT always shows 1048575
col TABLE_NAME for a20
col name for a20
col column_name for a20
set lines 140
col PARTITION_NAME for a20
col HIGH_VALUE for a40
set pages 9999

select TABLE_NAME, PARTITIONING_TYPE, PARTITION_COUNT, STATUS from USER_PART_TABLES;

TABLE_NAME           PARTITION PARTITION_COUNT STATUS
-------------------- --------- --------------- --------
SALES                RANGE             1048575 VALID

exec dbms_stats.gather_table_stats(null, 'SALES');

-- Statistics must be gathered before NUM_ROWS is populated
select PARTITION_NAME, HIGH_VALUE, NUM_ROWS from USER_TAB_PARTITIONS where TABLE_NAME='SALES';

PARTITION_NAME       HIGH_VALUE                                 NUM_ROWS
-------------------- ---------------------------------------- ----------
P00                  TIMESTAMP' 2010-01-01 00:00:00'                   0
SYS_P27320           TIMESTAMP' 2010-02-01 00:00:00'              535679
SYS_P27321           TIMESTAMP' 2010-03-01 00:00:00'              483840
SYS_P27322           TIMESTAMP' 2010-04-01 00:00:00'              535680
SYS_P27323           TIMESTAMP' 2010-05-01 00:00:00'              518400
SYS_P27324           TIMESTAMP' 2010-06-01 00:00:00'              535680
SYS_P27325           TIMESTAMP' 2010-07-01 00:00:00'              518400
SYS_P27326           TIMESTAMP' 2010-08-01 00:00:00'              535680
SYS_P27327           TIMESTAMP' 2010-09-01 00:00:00'              535680
SYS_P27328           TIMESTAMP' 2010-10-01 00:00:00'              518400
SYS_P27329           TIMESTAMP' 2010-11-01 00:00:00'              535680
SYS_P27330           TIMESTAMP' 2010-12-01 00:00:00'              518400
SYS_P27331           TIMESTAMP' 2011-01-01 00:00:00'              228481

13 rows selected.

select * from USER_PART_KEY_COLUMNS;

NAME                 OBJEC COLUMN_NAME          COLUMN_POSITION COLLATED_COLUMN_ID
-------------------- ----- -------------------- --------------- ------------------
SALES                TABLE TSTAMP                             1

Multi-column Range Partitioning

Multiple columns can be specified as the partition key; the order of the columns is important.

video

create table SALES_DATA
(
    yyyy number(4) not null,
    mm number(2) not null,
    sales_id varchar2(10) not null,
    amount number(10, 2)
)
partition by range (yyyy, mm)
(
    partition p2010_q1 values less than (2010, 04),
    partition p2010_q2 values less than (2010, 07),
    partition p2010_q3 values less than (2010, 10),
    partition p2010_q4 values less than (2011, 01),
    partition p2011_q1 values less than (2011, 04),
    partition p2011_q2 values less than (2011, 07),
    partition p2011_q3 values less than (2011, 10),
    partition p2011_q4 values less than (2012, 01)
);

insert into sales_data values(2010, 03, 'Shoes', 27.10);
insert into sales_data values(2010, 02, 'Belt', 17.99);
insert into sales_data values(2010, 04, 'Hat', 42.40);
insert into sales_data values(2010, 09, 'Coffee', 3.50);
insert into sales_data values(2010, 10, 'Biscuits', 2.60);

exec dbms_stats.gather_table_stats('', 'SALES_DATA');

select partition_name, num_rows
from user_tab_partitions
where table_name = 'SALES_DATA';

The output is:

PARTITION_NAME	NUM_ROWS
P2010_Q1	2
P2010_Q2	1
P2010_Q3	1
P2010_Q4	1
P2011_Q1	0
P2011_Q2	0
P2011_Q3	0
P2011_Q4	0

8 rows selected.
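
As a quick illustration (this query is not part of the original demo), supplying predicates on both partition key columns typically lets the optimizer prune down to the single quarterly partition P2010_Q1:

select sum(amount)
from   sales_data
where  yyyy = 2010
and    mm   = 3;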

Let's look at another example:

create table mobile_phone
(
    start_day date not null,
    end_day date not null,
    account_id varchar2(10) not null,
    calls number(10)
)
partition by range(start_day, end_day)
(
    partition p01 values less than (date '2010-02-01', date '2010-03-01'), 
    partition p02 values less than (date '2010-03-01', date '2010-04-01'), 
    partition p03 values less than (date '2010-04-01', date '2010-05-01') 
);

insert into mobile_phone values('07-FEB-2010', '12-FEB-2010', 'Acct#1', 100);
insert into mobile_phone values('12-FEB-2010', '13-APR-2010', 'Acct#1', 175);
exec dbms_stats.gather_table_stats('', 'MOBILE_PHONE');

select partition_name, num_rows
from user_tab_partitions
where table_name = 'MOBILE_PHONE';

The output is:

PARTITION_NAME	NUM_ROWS
P01	0
P02	2
P03	0

3 rows selected.

All data goes into the 2nd partition, which is obviously not what we want.

Key points:

  • Tie-breaker, not multidimensional: The second and subsequent columns in the partition definition act only as "tie-breaker" values. When inserting rows into a multi-column range-partitioned table, the first column of the partition key determines the partition where the row is stored; only when the first column alone cannot decide is the second partition column consulted as a tie-breaker, and so on. Multiple columns in the partition key do not form a 'matrix' or 'n-dimensional' structure.
  • Storing dates as numbers: One of the examples in the video uses the NUMBER data type to store date-based information. As the video states, this is generally a poor design choice. See Richard Foote's post on storing dates for more examples of why you might want to reconsider this approach in your database.
  • Dictionary views: In the same way that USER_TABLES contains various columns reflecting current optimizer statistics, USER_TAB_PARTITIONS contains the same information at the partition level. You can use NUM_ROWS, BLOCKS, and so on to get per-partition volumes, accurate only as of the time and granularity at which statistics were gathered with DBMS_STATS.

The difference between the two examples is that the partition key in the first example is a point in time, whereas in the second it is a time period, and the periods overlap, which is where the problem arises.
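
One way around this, sketched here as an assumption rather than taken from the video, is to partition only on the column that marks the start of the period, so every call maps to the month in which it began:

create table mobile_phone_v2
(
    start_day  date         not null,
    end_day    date         not null,
    account_id varchar2(10) not null,
    calls      number(10)
)
partition by range (start_day)
(
    partition p01 values less than (date '2010-02-01'),
    partition p02 values less than (date '2010-03-01'),
    partition p03 values less than (date '2010-04-01')
);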

Hash Partitioning

Sometimes partitioning is not a logical segmentation of a table based on its attributes like range partitioning does. When any database table grows large, it becomes more difficult for administrators to manage because maintaining such tables often results in longer downtime for business applications. Hash partitioning allows you to partition a table into equal-sized chunks based on a hash function applied to one or more columns of the table.

Creating a hash partitioned table is easy, but the number of hash partitions specified is critical.

video

Hash partitioning is very important for DBAs, and hash-partitioned indexes are very important for developers.

Key points:

  • Power of 2: To keep partition sizes close to equal, the number of partitions should be a power of 2.
  • ORA_HASH: The hashing algorithm is not documented, but it has been observed that the ORA_HASH function returns results consistent with the way rows are distributed by hash partitioning.
  • Splitting: Partitions can be split using the ALTER TABLE ... SPLIT PARTITION command, a task usually performed by the database administrator. Splitting a partition is a resource-intensive activity because the entire partition's data may be moved.

create table T
(
    x number(10)
) partition by hash(x)
partitions 8;

select partition_name
from user_tab_partitions
where table_name = 'T';

The output is:

PARTITION_NAME
SYS_P492050
SYS_P492051
SYS_P492052
SYS_P492053
SYS_P492054
SYS_P492055
SYS_P492056
SYS_P492057

8 rows selected.

Insert 100,000 rows:

insert into T
select level from dual 
connect by level <= 100000;

View rowid:

select rowid from T where rownum = 1;

The output is as follows; the ROWID encodes the data object ID of the partition segment in which the row is stored:

ROWID
AJUIk+ADDAAABSTAAA

Therefore, the actual number of rows in each partition can be obtained through the following SQL:

select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

The output is as follows; the data is spread fairly evenly:

PTN_OBJ	COUNT(*)
156272962	12342
156272963	12381
156272965	12382
156272961	12508
156272959	12575
156272960	12581
156272958	12603
156272964	12628

8 rows selected.

This can be verified with the ORA_HASH function, where the second argument of 7 means buckets 0 through 7:

select ora_hash(x, 7), count(*)
from t
group by ora_hash(x, 7)
order by 2;

The output is as follows:

ORA_HASH(X,7)	COUNT(*)
4	12342
5	12381
7	12382
3	12508
1	12575
2	12581
0	12603
6	12628

8 rows selected.

The number of hash partitions should be a power of 2, such as 2, 4, 8, 16..., otherwise the distribution may be uneven.

drop table t purge;

create table T
(
    x number(10)
) partition by hash(x)
partitions 5;

insert into T
select level from dual 
connect by level <= 100000;

select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

It can be seen that the distribution is not uniform:

   PTN_OBJ   COUNT(*)
---------- ----------
    107682      12342
    107678      12603
    107681      24890
    107679      24956
    107680      25209

5 rows selected.

How do we go from 5 partitions to 8? Consider the naive approach first: rehashing everything into 8 buckets would move most rows (nearly 88% in this case):

select count(*) from T
where ora_hash(x, 4) != ora_hash(x, 7);

COUNT(*)
87564

select count(*) from T
where ora_hash(x, 4) != ora_hash(x, 5);

COUNT(*)
83224

Oracle adopts a more intelligent algorithm. First look at the change from 5 partitions to 6:

alter table T add partition;

select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

PTN_OBJ	COUNT(*)
156275768	12342
156276447	12381
156276446	12575
156275764	12603
156275767	24890
156275766	25209

6 rows selected.

This shows that adding a hash partition is equivalent to splitting one existing hash partition: in this example, the partition with 24956 rows was split into partitions of 12381 and 12575 rows.

Repeat the process twice more, and the data distribution becomes balanced again:

alter table T add partition;
select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

PTN_OBJ	COUNT(*)
156275768	12342
156276447	12381
156276446	12575
156276791	12581
156275764	12603
156276792	12628
156275767	24890

7 rows selected.

alter table T add partition;
select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

PTN_OBJ	COUNT(*)
156275768	12342
156276447	12381
156276843	12382
156276842	12508
156276446	12575
156276791	12581
156275764	12603
156276792	12628

8 rows selected.

Hash partitions can only be added one at a time. Adding a partition is effectively a split of one existing partition, and the data in that partition is rehashed.

The reduction of hash partitions is achieved through the coalesce operation.

select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

   PTN_OBJ   COUNT(*)
---------- ----------
    107682      12342
    107684      12381
    107688      12382
    107687      12508
    107683      12575
    107685      12581
    107678      12603
    107686      12628

8 rows selected.

alter table T COALESCE PARTITION;

-- The database automatically chose partitions 107687 and 107688 to coalesce (12508 + 12382 = 24890 rows in the new segment 107689)
select dbms_rowid.rowid_object(rowid) ptn_obj, count(*)
from T
group by dbms_rowid.rowid_object(rowid)
order by 2;

   PTN_OBJ   COUNT(*)
---------- ----------
    107682      12342
    107684      12381
    107683      12575
    107685      12581
    107678      12603
    107686      12628
    107689      24890

7 rows selected.

List Partitioning

Range partitioning, as the name suggests, is about carving up data that is analog in nature, that is, a continuous range of values. This is why dates are a natural candidate for a range-based partitioning scheme.
But sometimes the column you might want to partition on contains a discrete set of values, which is when LIST partitioning is the best solution. Creating a LIST-partitioned table requires nominating the discrete values for each partition, or relying on the AUTOMATIC clause introduced in 12c Release 2.

video

List partitioning suits situations where there are only a few distinct values, range partitioning would be cumbersome, and hash partitioning would distribute the values unevenly.

Key points:

  • One or more values: A single partition can contain one or more discrete values.
  • Default: A "catch-all" partition can be defined using the DEFAULT keyword in the VALUES clause. Null values also go into this partition.

Example:

create table sports
(
    sport varchar2(3)
);

insert into sports values('NHL');
insert into sports values('MLB');
insert into sports values('NBA');
insert into sports values('NFL');

alter table sports modify partition by hash(sport) partitions 4;

select sport, ora_hash(sport, 3) hash from sports;

The output is as follows; the distribution is not uniform:

SPORT	HASH
NHL		0
NFL		0
MLB		1
NBA		1

4 rows selected.

Convert the table directly to list partitioning:

alter table sports modify partition by list(sport)
(
    partition NHL values ('NHL'),
    partition MLB values ('MLB'),
    partition NBA values ('NBA'),
    partition NFL values ('NFL')
);

alter table sports add partition OTHERS values ('LAC', 'TEN');

insert into sports values('XYZ');

The error is reported as:

ORA-14400: inserted partition key does not map to any partition ORA-06512: at "SYS.DBMS_SQL", line 1721

Just add a default partition:

alter table sports add partition ALL_OTHERS values (DEFAULT);

insert into sports values('XYZ');

Query a specific partition by name:

select * from sports partition(ALL_OTHERS);
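
For completeness, here is a minimal sketch (not from the video) of creating a list-partitioned table directly, with one partition holding several discrete values and a DEFAULT catch-all partition:

create table sports2
(
    sport varchar2(3) not null
)
partition by list (sport)
(
    partition us_major values ('NHL', 'MLB', 'NBA', 'NFL'),
    partition others   values (DEFAULT)
);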

Partitions of Partitions

This is composite partitioning. Up to two levels are supported; the allowed combinations are [Interval | Range | List | Hash]-[Range | List | Hash].

Even once partitioned, a table may still be extremely large. Or the partitioning scheme may result in partitions that are not equally sized on disk. For example, archiving off older data might mean that historical partitions are far smaller than the current partitions (this assumes the business keeps growing, so this year's data outweighs last year's). Equally sized partitions can be beneficial in particular when it comes to performing operations in parallel on a per-partition basis.
Partitions can be segmented further into subpartitions. Each partition can have its own subpartitioning scheme, or all partitions can share a common scheme.

Key points:

  • Flexibility: The subpartitioning scheme can be different from the parent partitioning scheme.
  • Per-partition definition: Each partition can have its own subpartition definition, and partitions without subpartitions are also valid. That is, within the same table, some partitions may have subpartitions while others do not, and different partitions can use different subpartitioning strategies.
  • Ordering: The SUBPARTITION_POSITION column in USER_TAB_SUBPARTITIONS gives the relative position of a subpartition within its parent partition.
  • Templates: Subpartition templates make it easy to apply a common scheme to all partitions and reduce the size of DDL scripts (see the sketch after this list).
  • Logical versus physical: The presence of subpartitions determines whether a partition is a physical segment on disk or just a logical collection of physical subpartition segments. A partition with no subpartitions is a physical segment; a partition with subpartitions is only a logical container, because the data is actually stored in the subpartition segments.
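
A minimal sketch of a composite range-hash table using a subpartition template; the WEB_HITS table and its columns are hypothetical:

-- Range partitions by month, each hash-subpartitioned into 4 subpartitions
-- via a common template
create table web_hits
(
    hit_time   date          not null,
    session_id number(12)    not null,
    url        varchar2(200)
)
partition by range (hit_time)
subpartition by hash (session_id)
subpartition template
(
    subpartition sp1,
    subpartition sp2,
    subpartition sp3,
    subpartition sp4
)
(
    partition p_jan2010 values less than (date '2010-02-01'),
    partition p_feb2010 values less than (date '2010-03-01')
);

select partition_name, subpartition_name, subpartition_position
from   user_tab_subpartitions
where  table_name = 'WEB_HITS';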

Interval Partitions

For incoming data, a partition must already exist that the incoming partitioning key would map to. If a partition does not exist, transactions will fail with ORA-14400. In earlier versions of Oracle Database, this meant that administrators had to take extreme care to ensure that partitions for future data were defined or risk application outages. Interval partitioning (introduced in Oracle Database 11g) removes this risk by automatically creating new partitions on a partitioned table as required when new data arrives.

Interval-partitioned tables differ from range-partitioned tables in that logical "gaps" in the partition coverage are permitted.

Interval partitioning is a type of range partitioning.

video

Key points:

  • Automatic naming: Because partitions are created dynamically, a system-generated name is assigned to each new partition. They can be renamed to meet existing naming standards.
  • Gaps are allowed: The width of every interval partition is fixed by the interval. Unlike range partitioning, the upper and lower bounds come from the interval rather than from adjacent partitions. In other words, range partitions are contiguous and finite, whereas interval partitions extend indefinitely and gaps (intervals with no data, and hence no partition) are allowed in between.
  • FOR syntax: If you don't know the name of an interval partition, you can reference it with the PARTITION FOR (key-value) syntax.

With plain range partitioning, the set of partitions established in the DDL is finite, so partitions must be created before new data arrives. For example, before New Year's Eve a partition for the new year's data must be added, otherwise inserts will fail, and that is a burden on the DBA. An automatic mechanism is needed, and that is interval partitioning.

create table sales
(
    tstamp  date    not null,
    empno   number(10)  not null,
    ename   varchar(2)  not null,
    deptno  varchar(2)  not null
)
partition by range (tstamp)
interval (numtoyminterval(1, 'YEAR'))
(
    partition p00 values less than (DATE '2010-01-01')
);

select partition_name, high_value from user_tab_partitions
where table_name = 'SALES';

PARTITION_NAME	HIGH_VALUE
--------------  ----------
P00				TO_DATE(' 2010-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')

insert into sales values (to_date('12-DEC-2011'), 100, 'ME', 'EA');

select partition_name, high_value from user_tab_partitions
where table_name = 'SALES';

PARTITION_NAME	HIGH_VALUE
--------------  ----------
P00				TO_DATE(' 2010-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
SYS_P492130		TO_DATE(' 2012-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')

The system automatically creates a partition whose name starts with SYS_P and can be renamed later.

The PARTITION FOR syntax references a partition by a data value rather than by name:

alter table SALES move partition for (DATE '2011-02-01');

select * from SALES partition for (DATE '2011-02-01');

PARTITION FOR is generic syntax, not limited to interval partitions. It is also available for hash partitioning, alongside the usual named-partition syntax:

select count(*) from T partition for (100);

select * from T partition (<partition_name>);

Converting to Interval Partitions

An existing range-partitioned table can be converted into an interval-partitioned table with a simple command, requiring no downtime or data migration. The existing partitions remain defined as "range partitions", whilst new partitions created dynamically are "interval partitions". Thus a table that was not initially created as an interval-partitioned table becomes a hybrid containing both types of partitions. Because intervals are defined as an offset from an initial fixed range partition boundary, you cannot drop all of the range partitions in an interval-partitioned table; at least one must always remain. In Oracle Database 12c Release 2, the database will automatically convert the oldest interval partition into a range partition if no initial range partition boundary can be found.

video

Key points:

  • Conversion: A range-partitioned table can be converted to an interval-partitioned table simply by specifying the desired interval with ALTER TABLE ... SET INTERVAL.
  • Mixed partitions: On converted tables, the INTERVAL column in USER_TAB_PARTITIONS indicates whether each partition is a range partition or an interval partition.
  • Minimum range bound: At least one range partition must remain in the table. Prior to Oracle Database 12c Release 2, re-running the SET INTERVAL command would mark existing interval partitions as range partitions.

Example:

create table SALES
(
  tstamp    timestamp(6) not null,
  sales_id  number(10) not null,
  amount    number(12, 2) not null 
);

alter table sales
modify partition by range (tstamp)
(
    partition p00 values less than (timestamp '2010-01-01 00:00:00'),
    partition p01 values less than (timestamp '2010-02-01 00:00:00')
);

insert into sales values(timestamp '2009-12-02 00:00:00', 100, 100);
insert into sales values(timestamp '2010-01-02 00:00:00', 200, 200);

alter table sales set interval(numtoyminterval(1, 'MONTH'));

insert into sales values(timestamp '2010-02-02 00:00:00', 300, 300);

col pname for a20
col high_value for a40

select
    partition_name pname,
    partition_position pos, 
    high_value,
    interval
from user_tab_partitions
where table_name = 'SALES';

The output is:

PNAME                       POS HIGH_VALUE                               INT
-------------------- ---------- ---------------------------------------- ---
P00                           1 TIMESTAMP' 2010-01-01 00:00:00'          NO
P01                           2 TIMESTAMP' 2010-02-01 00:00:00'          NO
SYS_P27344                    3 TIMESTAMP' 2010-03-01 00:00:00'          YES

3 rows selected.

If INTERVAL is NO, the partition is a range partition; YES means an interval partition.

The upper bound of a range partition serves as the anchor from which subsequent interval partitions are calculated, which is why, in principle, you cannot drop all of the range partitions. This restriction has since been relaxed: from 12.2 onward, if you drop all range partitions, the remaining oldest interval partition is automatically converted to a range partition:

alter table sales drop partition p00;
alter table sales drop partition p01;
col pname for a20
col high_value for a40

select
    partition_name pname,
    partition_position pos, 
    high_value,
    interval
from user_tab_partitions
where table_name = 'SALES';

PNAME                       POS HIGH_VALUE                               INT
-------------------- ---------- ---------------------------------------- ---
SYS_P27344                    1 TIMESTAMP' 2010-03-01 00:00:00'          NO

Interval Partitions for Lists

INTERVAL partitioning is great for range-partitioned tables because it avoids the need for regular maintenance by a DBA to ensure that a partition exists for every incoming value. From Oracle Database 12c Release 2 onwards, the same facility is available for LIST-partitioned tables: legal data values that have not been defined as partition keys automatically create partitions on the fly.

video

Key points:

  • One static partition: You must still define at least one static partition.
  • Null values allowed: You can still define a partition to hold null values if desired.
  • Use judiciously: Automatic list partitioning works best with a limited set of distinct values. Don't fall into the trap of creating millions of partitions (that is, too many distinct values).

Automatic list partitioning is suitable for a limited number of distinct values, such as provinces, municipalities, or states, but not for very few values, such as gender.

create table people(
    id int,
    name varchar2(100),
    gender varchar2(2)
);


alter table people
modify partition by list(gender) automatic
(
    partition male values ('M'),
    partition female values ('F')
);

insert into people values(100, 'VOID', 'U');

select partition_name from user_tab_partitions
where table_name = 'PEOPLE';

The output is:

PARTITION_NAME
--------------
FEMALE
MALE
SYS_P492154

3 rows selected.
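
To confirm that automatic list partitioning is enabled, the partitioning dictionary can be checked. The AUTOLIST column below is my assumption of where this is exposed in 12.2 and later; verify against your release:

-- AUTOLIST = 'YES' indicates the table creates list partitions automatically
select table_name, partitioning_type, autolist
from   user_part_tables
where  table_name = 'PEOPLE';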

Reference Partitioning

Tables in a relational database do not work in isolation; we link them via declarative referential integrity. For this reason, a large table might not contain the column we wish to partition it by. For example, a SALES table may be partitioned by SALES_DATE, but a child table, say SALES_ITEMS, may only have a foreign key back to the parent SALES table, and thus no SALES_DATE column to partition on. Reference partitioning can be used to handle these more complex designs.
(SALES_ITEMS will typically hold more rows than SALES because of the many-to-one relationship.)

video

Key points:

  • Declarative: The foreign key on the child table defines how the child is partitioned; the child inherits the parent's partitioning scheme.
  • Cascading truncation: The CASCADE option of the TRUNCATE command can be used to bypass the usual foreign key checks when a parent and its children need to be truncated together (a sketch follows the example below).
  • Strong coupling: Because the data is so tightly coupled, reference-partitioned tables carry some restrictions.

Example:

create table PARENT
(
    dte date    not null,
    pk  number(10)  not null,
    pad char(10)
)
partition by range(dte)
(
    partition p1 values less than (to_date('01-JAN-2010')),
    partition p2 values less than (to_date('01-FEB-2010')),
    partition p3 values less than (to_date('01-MAR-2010')),
    partition p4 values less than (to_date('01-APR-2010')),
    partition p5 values less than (to_date('01-MAY-2010')),
    partition p6 values less than (to_date('01-JUN-2010')),
    partition p7 values less than (to_date('01-JUL-2010')),
    partition p8 values less than (to_date('01-AUG-2010')),
    partition p9 values less than (to_date('01-SEP-2010'))
);

alter table PARENT add primary key(pk);

create table CHILD
(
    p number(10) not null,
    c number(10) not null,
    constraint CHILD_FK foreign key (p) references PARENT(pk)
    on delete cascade
)
partition by reference (CHILD_FK);

insert into PARENT select to_date('01-JAN-2010')+rownum, rownum, rownum
from dual connect by level <= 100;

insert into CHILD select rownum, rownum
from dual connect by level <= 100;

exec dbms_stats.gather_table_stats(null, 'CHILD');
select partition_name, num_rows from user_tab_partitions
where table_name = 'CHILD';

The output is as follows; the child table is partitioned along with its parent:

PARTITION_NAME	NUM_ROWS
P1	0
P2	30
P3	28
P4	31
P5	11
P6	0
P7	0
P8	0
P9	0
9 rows selected.
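
As a sketch of the cascading truncation mentioned in the key points (this is not part of the original demo and assumes 12c or later, where TRUNCATE can cascade along an ON DELETE CASCADE foreign key):

-- Truncate one parent partition together with the matching child partition
alter table PARENT truncate partition p3 cascade;

-- Or truncate parent and child entirely in one statement
truncate table PARENT cascade;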

Indexes on Partitioned Tables

An index normally points to a row in a physical segment (a table). But a partitioned table consists of multiple physical segments, so the index structures and the data contained within the index need to be slightly different for partitioned tables. If the index spans all partitions, what happens when we perform maintenance on a single partition, such as dropping it? What happens to the index entries that reference the now dropped partition?

video

Key points:

  • Global: A global index spans all partitions, which means each index entry carries an extended ROWID identifying the partition (data object) as well as the position of the row within that partition.
  • Maintenance: Altering table partitions (truncating, splitting, merging, dropping, and so on) can mark global indexes UNUSABLE. By default no error is raised when this happens; your query plans simply stop using the index. The UPDATE INDEXES clause can be beneficial here.
  • Asynchronous: In 12c and later, DROP and TRUNCATE PARTITION have less impact on global indexes because index cleanup is done in the background. This is useful for bulk data deletion.

If a partitioned table uses a traditional, non-partitioned index, that is, a global index, then dropping a partition leaves the index in an UNUSABLE state.
You can have the index maintained while the partition is dropped:

alter table <table_name> drop partition <partition_name> update indexes;
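
A minimal sketch of the difference, using a hypothetical table T_GI; the exact outcome depends on version and on whether the dropped partition holds rows:

create table t_gi
(
    dt date       not null,
    id number(10) not null
)
partition by range (dt)
(
    partition p1 values less than (date '2010-02-01'),
    partition p2 values less than (date '2010-03-01'),
    partition p3 values less than (date '2010-04-01')
);

insert into t_gi
select date '2010-01-01' + mod(rownum, 80), rownum
from dual connect by level <= 1000;
commit;

create index t_gi_ix on t_gi(id);    -- non-partitioned, therefore a global index

alter table t_gi drop partition p1;  -- no UPDATE INDEXES clause

select index_name, status
from   user_indexes
where  index_name = 'T_GI_IX';       -- typically reports UNUSABLE here

alter index t_gi_ix rebuild;         -- make the index usable again

alter table t_gi drop partition p2 update indexes;  -- index stays usable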

The skip_unusable_indexes parameter is TRUE by default, so queries silently skip any unusable index:

SQL> show parameter skip

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
skip_unusable_indexes                boolean     TRUE

A global index entry must record not only the row's location within a segment but also which partition segment the row lives in. This is the extended ROWID.

ROWID is a pseudocolumn, and an extended ROWID takes up 10 bytes:

SQL> select rowid, vsize(rowid), dump(rowid) from part where rownum <2;

ROWID              VSIZE(ROWID)
------------------ ------------
DUMP(ROWID)
--------------------------------------------------------------------------------
AAASNxAAMAAAAGDAAA           10
Typ=69 Len=10: 0,1,35,113,3,0,1,131,0,0
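
The pieces encoded in an extended ROWID can be pulled apart with the DBMS_ROWID package (a sketch against the same PART table; the column aliases are mine):

select rowid,
       dbms_rowid.rowid_object(rowid)        data_object_id,
       dbms_rowid.rowid_relative_fno(rowid)  relative_fno,
       dbms_rowid.rowid_block_number(rowid)  block_no,
       dbms_rowid.rowid_row_number(rowid)    row_no
from   part
where  rownum < 2;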

Locally Partitioned Indexes

An index can be equipartitioned with its underlying table. Such an index is known as a local index (as opposed to a global index). Local indexes can have significant benefits when it comes to Information Lifecycle Management (ILM). Because table partition operations such as TRUNCATE and DROP are isolated to a single index partition, the remainder of the index remains available and never needs unnecessary maintenance. Local indexes also lend themselves to easier partition exchange, which is a useful technique for archiving old data or introducing new data into a partitioned table with zero downtime.

A global index is an index whose segments are not tied to individual table partitions: one index segment can map rows from many table partition segments. A locally partitioned index (or local index) has a one-to-one correspondence between index partition segments and table partition segments; in effect, the index is partitioned the same way as the table.

If a table partition is dropped, the corresponding local index partition is dropped with it; the other partitions and their index partitions are unaffected.

create table sales
(
    tstamp  date    not null,
    empno   number(10)  not null,
    amount  number(10)  not null
)
partition by range (tstamp)
interval (numtoyminterval(1, 'YEAR'))
(
    partition p2009 values less than (DATE '2010-01-01'),
    partition p2010 values less than (DATE '2011-01-01'),
    partition p2011 values less than (DATE '2012-01-01'),
    partition p2012 values less than (DATE '2013-01-01')
);

create index sales_ix on sales(tstamp) local;

insert into sales values(DATE '2010-01-01', 100, 100);
insert into sales values(DATE '2011-01-01', 200, 200);
insert into sales values(DATE '2012-01-01', 300, 300);

alter table sales drop partition p2009;

select partition_name, status from user_ind_partitions
where index_name = 'SALES_IX';

The output is as follows, you can see that after the partition is deleted, the indexes of other partitions are still available:

PARTITION_NAME	STATUS
P2010			USABLE
P2011			USABLE
P2012			USABLE

3 rows selected.

With local indexes, ILM becomes more convenient:


alter table sales move partition p2010 compress;

alter table sales move partition p2010 tablespace ts_archive;

-- With the tablespace read only, RMAN does not need to back it up repeatedly
alter tablespace ts_lowperformance read only;

alter table sales modify partition p2010 unusable local indexes;

alter table sales modify partition p2010 indexing off;

Indexing is often switched off for older partitions because that old data is mostly read by analytic queries that scan it anyway. It is equally possible to switch indexing off for the current partition, which can make sense when using Database In-Memory.

Partition exchange is not covered here.

Local versus Global Indexes

The previous two videos discuss several apparent limitations of global indexes and plenty of advantages of local indexes. This seems to suggest that local indexes should always be used in preference to global indexes. This is not the case, and each indexing technique should be employed to best match the application requirements you have. A local index can be the absolutely wrong choice for particular kinds of queries. However, sometimes you may need to choose some design compromises if you want to get the best of both worlds - partition independence but still with solid declarative database design.

This section describes how to choose between local and global indexes. Both have their own applicable scenarios.

video

Key points:

  • Read multiplier: Take care that index lookups do not degrade because a poor choice of local index partitioning strategy forces every index partition to be probed.
  • Design tradeoffs: Sometimes it can be beneficial to bring the partition key into the physical design of the primary key, even if the logical database design does not require it.
  • 12c and above: Asynchronous cleanup of global indexes makes the distinction between local and global indexes less stark.

The choice of a local or global index depends on whether partition pruning is in effect, that is, whether the partition key is included in the predicate.

For example, with a local index, if the query has no predicate on the partition key and the indexed column is not the partition key, every index partition must be probed, which is expensive, especially for composite-partitioned tables. In that case a global index is the better fit.

Local indexes still come into their own when the filter predicate does include the partition key. Local indexes are a natural fit for ILM, while global indexes give fast lookups; which to use depends on the application scenario.
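
A minimal sketch of the read-multiplier problem, reusing the SALES table and its EMPNO column from the previous section; the index names are hypothetical. With a local index on EMPNO and no TSTAMP predicate, every index partition typically has to be probed (PARTITION RANGE ALL above the index scan in the plan), whereas a global index is a single structure and needs only one probe:

-- Local index on a non-partition-key column
create index sales_emp_lix on sales(empno) local;

-- No partition key in the predicate: every index partition must be probed
select amount from sales where empno = 200;

-- A global (non-partitioned) index avoids the per-partition probes
drop index sales_emp_lix;
create index sales_emp_gix on sales(empno);

select amount from sales where empno = 200;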

Special Indexing Use-cases

This tutorial series has covered the common partitioning strategies adopted by developers to build successful applications on large volumes of data. However, there are niche cases that also need to be considered. For example, you can have a partitioned index on a table that is not itself partitioned. Similarly, the index partitioning strategy might not align with the partitioning strategy of the underlying table. Such examples are typically rare; however, there is one important case where hash partitioning an index is critical to achieving extreme levels of OLTP performance.

video

Key points:

  • Flexibility: The partitioning of an index need not match the partitioning of the table it is based on, although such mismatches are rare in practice.
  • Contention: Hash partitioning an index is an effective way to spread activity across multiple segments and reduce contention for hot data blocks.
  • The GLOBAL keyword: Although most partitioning syntax is simply PARTITION BY, to partition an index you must prefix it with the GLOBAL keyword.

For a non-partitioned table there is no concept of a local index, but an index on it can still be partitioned; the syntax is as follows:

create index idx1 on sales(id) 
GLOBAL PARTITION BY ...

Another special use case is the hash-partitioned index.
With high-frequency inserts where the key values arrive sequentially, writes concentrate on the right-hand (leading) leaf block of the index, which causes contention.
If a hash-partitioned index is used, the hot spot is spread out (with 8 partitions there are 8 leading leaf blocks instead of one):

create index idx1 on sales(txn_id)
GLOBAL PARTITION BY HASH(txn_id) partitions 8;

Querying Partitioned Tables

Perhaps the most attractive benefit of partitioning is the performance improvement that can be achieved when querying partitioned tables. Partition pruning is the term used when the predicates of a query are such that the entire table does not need to be scanned, only a subset of its partitions. Detecting partition pruning relies on understanding the execution plan for a given query.

video

Key points:

  • Pstart/Pstop: The Pstart and Pstop columns in the execution plan show the range of partitions scanned. When KEY is shown, the pruning decision is made at execution time rather than at parse time.
  • Bind variables: Partition pruning applies to bind variables as well as to literal values in partition key predicates.
  • Interval: Because an interval-partitioned table logically defines all 1048575 possible partitions (the upper limit on the number of partitions), the Pstart/Pstop values can be misleading.
  • Powerful: Pruning can happen with equality predicates, range predicates, IN-list predicates, and many other permutations.

Query efficiency = required data / scanned data

drop table DEMO purge;

create table DEMO
(
    tstamp timestamp not null,
    empno   number(10) not null,
    ename   varchar2(10) not null,
    deptno  varchar2(10) not null
)
partition by range(tstamp)
(
    partition p00 values less than (timestamp '2010-01-01 00:00:00'),
    partition p01 values less than (timestamp '2010-02-01 00:00:00'),
    partition p02 values less than (timestamp '2010-03-01 00:00:00'),
    partition p03 values less than (timestamp '2010-04-01 00:00:00'),
    partition p04 values less than (timestamp '2010-05-01 00:00:00'),
    partition p05 values less than (timestamp '2010-06-01 00:00:00'),
    partition p06 values less than (timestamp '2010-07-01 00:00:00'),
    partition p07 values less than (timestamp '2010-08-01 00:00:00'),
    partition p08 values less than (timestamp '2010-09-01 00:00:00'),
    partition p09 values less than (timestamp '2010-10-01 00:00:00'),
    partition p10 values less than (timestamp '2010-11-01 00:00:00'),
    partition p11 values less than (timestamp '2010-12-01 00:00:00'),
    partition p12 values less than (timestamp '2011-01-01 00:00:00')
);

insert /*+ APPEND */ into DEMO
select 
    trunc(date '2010-01-01', 'YYYY') + mod(rownum, 360),
    rownum,
    rownum,
    mod(rownum, 1000)
from dual
connect by level <= 1000000;

commit;

Looking at the execution plan, PARTITION RANGE SINGLE means that only a single partition was scanned:

set lines 140
set autotrace on
select count(*) from DEMO where tstamp = to_date('01-JUN-2010');

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |    11 |    91   (2)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE         |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|      |  2778 | 30558 |    91   (2)| 00:00:01 |     7 |     7 |
|*  3 |    TABLE ACCESS FULL    | DEMO |  2778 | 30558 |    91   (2)| 00:00:01 |     7 |     7 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"=TIMESTAMP' 2010-06-01 00:00:00')

The following SQL cannot use partition pruning because a function is applied to the partition key, so the plan shows PARTITION RANGE ALL:

set lines 140
set autotrace on
select count(*) from DEMO where trunc(tstamp) = to_date('01-JUN-2010');

---------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    11 |  1176   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|      | 10000 |   107K|  1176   (4)| 00:00:01 |     1 |    13 |
|*  3 |    TABLE ACCESS FULL | DEMO | 10000 |   107K|  1176   (4)| 00:00:01 |     1 |    13 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(TRUNC(INTERNAL_FUNCTION("TSTAMP"))=TO_DATE(' 2010-06-01 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))

Bind variables can also benefit from partition pruning; KEY indicates that the pruning decision is made at run time rather than at parse time:

set lines 140
set autotrace on

SQL> variable b1 varchar2(20);
SQL> begin
  2  :b1 := '01-JUN-2010';
  3  end;
  4  /

PL/SQL procedure successfully completed.

SQL> print b1

B1
--------------------------------------------------------------------------------------------
01-JUN-2010

SQL> select count(*) from DEMO where tstamp = :b1;

  COUNT(*)
----------
         0


Execution Plan
----------------------------------------------------------
Plan hash value: 1642956652

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |    11 |    91   (2)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE         |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|      |  2778 | 30558 |    91   (2)| 00:00:01 |   KEY |   KEY |
|*  3 |    TABLE ACCESS FULL    | DEMO |  2778 | 30558 |    91   (2)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"=TO_TIMESTAMP(:B1))

The PARTITION RANGE OR scenario, shown as KEY(OR) in the Pstart/Pstop columns:

select count(*) from DEMO
where 
	tstamp between to_date('12-JAN-2010') and to_date('07-FEB-2010')
or 
	tstamp between to_date('03-JUN-2010') and to_date('06-AUG-2010');

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    11 |     3   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE     |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE OR|      |   247K|  2658K|     3   (0)| 00:00:01 |KEY(OR)|KEY(OR)|
|*  3 |    TABLE ACCESS FULL| DEMO |   247K|  2658K|     3   (0)| 00:00:01 |KEY(OR)|KEY(OR)|
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP">=TIMESTAMP' 2010-06-03 00:00:00' AND "TSTAMP"<=TIMESTAMP'
              2010-08-06 00:00:00' OR "TSTAMP"<=TIMESTAMP' 2010-02-07 00:00:00' AND
              "TSTAMP">=TIMESTAMP' 2010-01-12 00:00:00')

The PARTITION RANGE INLIST scenario, shown as KEY(I):

select count(*) from DEMO
where 
	tstamp in (
	to_date('12-JAN-2010'),
	to_date('07-FEB-2010')
	);

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |    11 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE         |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE INLIST|      |  5556 | 61116 |     2   (0)| 00:00:01 |KEY(I) |KEY(I) |
|*  3 |    TABLE ACCESS FULL    | DEMO |  5556 | 61116 |     2   (0)| 00:00:01 |KEY(I) |KEY(I) |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"=TIMESTAMP' 2010-01-12 00:00:00' OR "TSTAMP"=TIMESTAMP'
              2010-02-07 00:00:00')

Now look at the execution plan when the queried value does not fall within any defined partition:

select count(*) from DEMO where tstamp = to_date('01-JUN-2022');
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |     1 |    11 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |      |     1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE EMPTY|      |     1 |    11 |     2   (0)| 00:00:01 |INVALID|INVALID|
|*  3 |    TABLE ACCESS FULL   | DEMO |     1 |    11 |     2   (0)| 00:00:01 |INVALID|INVALID|
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"=TIMESTAMP' 2022-06-01 00:00:00')


Interval partitioning is special: Pstop shows 1048575 (1024 * 1024 - 1), which is the maximum number of partitions a table can have:

drop table demo purge;

create table DEMO
(
    tstamp timestamp not null,
    empno   number(10) not null,
    ename   varchar2(10) not null,
    deptno  varchar2(10) not null
)
partition by range(tstamp)
interval (numtoyminterval(1, 'MONTH'))
(
	partition p00 values less than
	(timestamp '2010-01-01 00:00:00')
);

select * from demo;

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 2349549400

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    40 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL|      |     1 |    40 |     2   (0)| 00:00:01 |     1 |1048575|
|   2 |   TABLE ACCESS FULL | DEMO |     1 |    40 |     2   (0)| 00:00:01 |     1 |1048575|
--------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
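
The same 1048575 ceiling shows up in the data dictionary; as a sketch:

-- For an interval-partitioned table the dictionary reports the maximum
-- possible partition count (1048575), not the number created so far.
select table_name, partitioning_type, interval, partition_count
from   user_part_tables
where  table_name = 'DEMO';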

-- Finally, look at the execution plan when the queried value is not in any existing partition
select count(*) from DEMO where tstamp = to_date('01-JUN-2022');

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE         |      |     1 |    13 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|      |     1 |    13 |     2   (0)| 00:00:01 |   151 |   151 |
|*  3 |    TABLE ACCESS FULL    | DEMO |     1 |    13 |     2   (0)| 00:00:01 |   151 |   151 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("TSTAMP"=TIMESTAMP' 2022-06-01 00:00:00')

Partition Queries with Joins

Partitioned tables are rarely queried in isolation, so retaining the benefits of partition pruning when a partitioned table is joined to another table, or is involved in a subquery, is a critical feature of the Oracle Database. By utilising a memory structure known as a Bloom filter, the database can quickly identify which partitions are needed to satisfy a given query.

video

Key points:

  • BF prefix
    A BF[digit] entry in the execution plan, e.g. BF0001, indicates that a Bloom filter is being used (a sketch of such a join follows at the end of this section).
  • Efficient
    Bloom filters use memory-efficient structures to identify which partitions can be pruned.
  • Negatives/Positives
    Bloom filters can produce false positives, so the join may still process slightly more data than strictly necessary, but false negatives are impossible, so your query will never return an incorrect result.

The definition of Bloom Filter in Wikipedia:

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not

To illustrate false positives and false negatives, the author gives a movie-ticket example: the website says tickets are available but in fact there are none; that is a false positive. The reverse, the site reporting a show as sold out when seats are actually available, would be a false negative.
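
To see a Bloom filter in action, you need a join in which a small, filtered table is joined to the partitioned table on its partition key. The sketch below is illustrative only: the TIME_DIM table, its qtr column, and the '2010Q2' filter are all hypothetical, but this is the shape of query whose plan can show a PART JOIN FILTER CREATE step and a :BF0000 name in the Pstart/Pstop columns.

-- Hypothetical calendar dimension keyed on the DEMO partition key.
create table TIME_DIM
(
    tstamp  timestamp primary key,
    qtr     varchar2(6)
);

-- The optimizer can build a Bloom filter from the TIME_DIM rows that
-- survive the qtr filter and use it to prune DEMO partitions at run time.
select count(*)
from   TIME_DIM t
       join DEMO d on d.tstamp = t.tstamp
where  t.qtr = '2010Q2';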

Other Performance Opportunities

When two tables have identical partitioning definitions, the database can take advantage of that knowledge. If we had tables of animals, common sense tells us we will never match breeds between the dogs partition in one table and the cats partition in the other. A join should be able to detect that two disparate partitions could never contain a matching row and eliminate them from the join operation. This is known as a partition-wise join.

Key points:

  • Identical
    The partition definitions must be perfectly aligned for a full partition-wise join to occur.
  • Reference
    Reference partitioning is a perfect candidate for a partition-wise join, since the partitions of the two related tables are, by definition, the same.
  • Plan hierarchy
    Knowing where the JOIN and PARTITION lines sit in the execution plan lets you detect a partition-wise join (a sketch follows after the socks example below).

video

In the video, the author gives an example of pairing socks, but I didn't quite follow it.

select ... from SOCKS n, SOCKS s
where n.style_size = s.style_size;
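
A full partition-wise join needs both tables to be partitioned the same way on the join key. The sketch below uses hypothetical ORDERS_PW and CUSTOMERS_PW tables; with both hash partitioned into the same number of partitions on cust_id, the plan for the join can show the PARTITION HASH line above the HASH JOIN line, which is the tell-tale sign of a full partition-wise join.

-- Two tables hash partitioned identically on the join key (cust_id).
create table ORDERS_PW
(
    order_id number,
    cust_id  number
)
partition by hash (cust_id) partitions 8;

create table CUSTOMERS_PW
(
    cust_id   number,
    cust_name varchar2(40)
)
partition by hash (cust_id) partitions 8;

-- Each ORDERS_PW partition only ever needs to be joined to the matching
-- CUSTOMERS_PW partition, so the join can proceed partition pair by pair.
select count(*)
from   ORDERS_PW o
       join CUSTOMERS_PW c on c.cust_id = o.cust_id;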

After two days, I have finally finished this course.

Origin blog.csdn.net/stevensxiao/article/details/127836162