APPROX_COUNT_DISTINCT: fast approximate distinct-value counting, a new performance optimization feature in Oracle 12c

 

The APPROX_COUNT_DISTINCT function was added in Oracle 11g to speed up distinct-value counting when the DBMS_STATS package gathers statistics, but it was left undocumented. It is documented in Oracle 12c, so we can now use it freely in our applications.

1. Basic usage

In previous database versions, we would count unique values like this.

SELECT COUNT(DISTINCT c_name) AS nm_cnt

FROM   test;

 

 NM_CNT

----------

     58172

 

1 row selected.

 

SQL>

The query returns an exact distinct count based on Oracle's read consistency model. That is, it sees committed data plus any uncommitted changes made by the current session.

In contrast, the new APPROX_COUNT_DISTINCT function does not give an exact result. It returns an estimate that deviates somewhat from the exact count. In the example below it returns 56789 against the exact 58172, about 2.4% lower.

SELECT APPROX_COUNT_DISTINCT(c_name) AS nm_cnt

FROM   test;

 

 NM_CNT

----------

     56789

 

1 row selected.

 

SQL>
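To put a number on the deviation, the two counts can be compared in one statement. The following is a minimal sketch against the same test table and c_name column. The subquery aliases e and a and the column names exact_cnt, approx_cnt and pct_diff are illustrative and not part of the original example.

```sql
-- Compare the exact and approximate distinct counts side by side.
WITH e AS (SELECT COUNT(DISTINCT c_name) AS cnt FROM test),
     a AS (SELECT APPROX_COUNT_DISTINCT(c_name) AS cnt FROM test)
SELECT e.cnt AS exact_cnt,
       a.cnt AS approx_cnt,
       -- Relative difference in percent; NULLIF avoids dividing by zero on an empty table.
       ROUND(100 * ABS(a.cnt - e.cnt) / NULLIF(e.cnt, 0), 2) AS pct_diff
FROM   e CROSS JOIN a;
```

With the counts shown above (58172 and 56789), pct_diff would come out at 2.38.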

This function can be used in grouped queries.

SELECT tablespace_name,APPROX_COUNT_DISTINCT(table_name) AS tab_count

FROM   user_tables

GROUP BY tablespace_name

ORDER BY tablespace_name;

 

TABLESPACE_NAME                 TAB_COUNT

------------------------------ ----------

SYSAUX                                 78

SYSTEM                                 22

USERS                                   7

                                       48

 

4 rows selected.

 

SQL>

2. Performance

In the example below, we can see a difference in performance between the two methods, but on a table of this size it is not particularly large.

SET TIMING ON

 

SELECT COUNT(DISTINCT c_name) AS nm_cnt

FROM   test;

 

 NM_CNT

----------

     58172

 

1 row selected.

 

Elapsed: 00:00:02.39

SQL>

 

 

SELECT APPROX_COUNT_DISTINCT(c_name) AS nm_cnt

FROM   test;

 

 NM_CNT

----------

     56789

 

1 row selected.

 

Elapsed: 00:00:02.00

SQL>

In fact, the APPROX_COUNT_DISTINCT function is intended for much larger workloads. Below, we create a much larger table.

DROP TABLE test PURGE;

 

CREATE TABLE test AS

SELECT level AS data

FROM  dual

CONNECT BY level <= 10000;

 

INSERT /*+ APPEND */ INTO test

SELECT a.data FROM test a

CROSS JOIN test b;

 

COMMIT;

 

EXEC DBMS_STATS.gather_table_stats('Test','Test');

The table now holds just over 100 million rows (10,000 from the initial load plus 10,000 × 10,000 from the cross join), but only 10,000 unique values. Here the performance of the two methods differs considerably.
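The row count follows from the two statements above: 10,000 rows from the CREATE TABLE plus 10,000 × 10,000 rows from the cross join. A quick sanity check, as a sketch (the alias row_count is illustrative):

```sql
-- 10,000 initial rows + 100,000,000 cross-join rows = 100,010,000 expected
SELECT COUNT(*) AS row_count
FROM   test;
```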

SET TIMING ON

 

SELECT COUNT(DISTINCT data) AS data_count

FROM  test;

 

DATA_COUNT

----------

    10000

 

1 row selected.

 

Elapsed: 00:00:19.66

SQL>

 

 

SELECT APPROX_COUNT_DISTINCT(data) AS data_count

FROM  test;

 

DATA_COUNT

----------

     10030

 

1 row selected.

 

Elapsed: 00:00:10.46

SQL>

Testing shows that the exact method consumes more and more time and resources as the amount of data increases, while the time and resources consumed by APPROX_COUNT_DISTINCT grow far less. In the test above, the approximate query ran in roughly half the time (10.46 seconds versus 19.66 seconds), and its result of 10030 was within 0.3% of the exact count of 10000.
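One way to see where the time goes is to compare the execution plans of the two statements. The sketch below uses the standard EXPLAIN PLAN statement together with DBMS_XPLAN.DISPLAY. The plan output is omitted here because it depends on the database version and configuration.

```sql
-- Plan for the exact count
EXPLAIN PLAN FOR
SELECT COUNT(DISTINCT data) AS data_count
FROM   test;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Plan for the approximate count
EXPLAIN PLAN FOR
SELECT APPROX_COUNT_DISTINCT(data) AS data_count
FROM   test;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing the aggregation steps in the two plans, and any temporary space they need, should help explain why the gap widens as the data volume grows. Results will vary by environment, so it is worth repeating the comparison on your own data.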

 
