In Oracle 11g, the APPROX_COUNT_DISTINCT function was added to speed up the unique value counts that the DBMS_STATS package performs when it gathers statistics, but the function was left undocumented. It is included in the Oracle 12c documentation, so we can now use it freely in our applications.
1. Basic usage
In previous database versions, if we wanted to count unique values, we would do it like this.
SELECT COUNT(DISTINCT c_name) AS nm_cnt
FROM test;
NM_CNT
----------
58172
1 row selected.
SQL>
The query returns an exact count of unique values under Oracle's read consistency model. That is, it sees all committed data plus any uncommitted changes made by the current session.
In contrast, the new APPROX_COUNT_DISTINCT function does not return an exact result. It returns an approximation that may deviate from the exact count.
SELECT APPROX_COUNT_DISTINCT(c_name) AS nm_cnt
FROM test;
NM_CNT
----------
56789
1 row selected.
SQL>
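To put the deviation between the two sample outputs above into numbers, the short Python sketch below computes the relative error. The helper name `relative_error` is made up for illustration. The two counts are taken from the outputs shown above and will differ on other data.

```python
def relative_error(exact: int, approx: int) -> float:
    """Return the absolute deviation of approx from exact, as a fraction of exact."""
    return abs(approx - exact) / exact

exact_cnt = 58172   # COUNT(DISTINCT c_name) from the sample output
approx_cnt = 56789  # APPROX_COUNT_DISTINCT(c_name) from the sample output

error_pct = round(100 * relative_error(exact_cnt, approx_cnt), 2)
print(f"deviation: {error_pct}%")  # prints "deviation: 2.38%"
```

For this sample the approximate count is about 2.4% below the exact one.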
This function can be used in grouped queries.
SELECT tablespace_name, APPROX_COUNT_DISTINCT(table_name) AS tab_count
FROM user_tables
GROUP BY tablespace_name
ORDER BY tablespace_name;
TABLESPACE_NAME                 TAB_COUNT
------------------------------ ----------
SYSAUX                                 78
SYSTEM                                 22
USERS                                   7
                                       48
4 rows selected.
SQL>
2. Performance
In the example below, we can see a difference in performance between the two methods, but it is not particularly large.
SET TIMING ON
SELECT COUNT(DISTINCT c_name) AS nm_cnt
FROM test;
NM_CNT
----------
58172
1 row selected.
Elapsed: 00:00:02.39
SQL>
SELECT APPROX_COUNT_DISTINCT(c_name) AS nm_cnt
FROM test;
NM_CNT
----------
56789
1 row selected.
Elapsed: 00:00:02.00
SQL>
In fact, the APPROX_COUNT_DISTINCT function is meant for much larger workloads. Below, we create a much larger table.
DROP TABLE test PURGE;
CREATE TABLE test AS
SELECT level AS data
FROM dual
CONNECT BY level <= 10000;
INSERT /*+ APPEND */ INTO test
SELECT a.data FROM test a
CROSS JOIN test b;
COMMIT;
EXEC DBMS_STATS.gather_table_stats('Test','Test');
The table now holds more than 100 million rows (the original 10,000 plus the 10,000 x 10,000 rows added by the cross join) with 10,000 unique values. We will see that the performance of the two methods is quite different.
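The size of the test table follows directly from the statements above. As a quick sanity check, the Python sketch below redoes the arithmetic. The variable names are illustrative only.

```python
base_rows = 10_000                        # rows created by CONNECT BY level <= 10000
cross_join_rows = base_rows * base_rows   # rows added by the self CROSS JOIN insert
total_rows = base_rows + cross_join_rows  # rows in the table after COMMIT

# The insert only repeats values that already exist, so the number of
# distinct values stays at the original 10,000.
distinct_values = base_rows

print(f"{total_rows:,} rows, {distinct_values:,} distinct values")
# prints "100,010,000 rows, 10,000 distinct values"
```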
SET TIMING ON
SELECT COUNT(DISTINCT data) AS data_count
FROM test;
DATA_COUNT
----------
10000
1 row selected.
Elapsed: 00:00:19.66
SQL>
SELECT APPROX_COUNT_DISTINCT(data) AS data_count
FROM test;
DATA_COUNT
----------
10030
1 row selected.
Elapsed: 00:00:10.46
SQL>
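The two timed runs above can be summarised as a speedup factor and a deviation. The Python sketch below does that arithmetic. The helper `to_seconds` is a made-up name, and the figures are the ones from this particular test run, so other systems and data will give different numbers.

```python
def to_seconds(elapsed: str) -> float:
    """Convert an 'HH:MM:SS.ff' elapsed-time string to seconds."""
    hours, minutes, seconds = elapsed.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

exact_time = to_seconds("00:00:19.66")   # COUNT(DISTINCT data)
approx_time = to_seconds("00:00:10.46")  # APPROX_COUNT_DISTINCT(data)

speedup = round(exact_time / approx_time, 2)
error_pct = round(100 * abs(10030 - 10000) / 10000, 2)

print(f"speedup: {speedup}x, deviation: {error_pct}%")
# prints "speedup: 1.88x, deviation: 0.3%"
```

In this run the approximate count finished in roughly half the time and was off by 0.3%.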
Testing shows that COUNT(DISTINCT ...) consumes more and more time and resources as the amount of data grows, while APPROX_COUNT_DISTINCT consumes noticeably less for the same data, so its advantage becomes more visible on larger tables.