Study notes: extended GROUP BY for report development (Chapter 5 of "Breaking the Tip of the Sword: Oracle Developer Art")
A plain GROUP BY makes multi-dimensional analysis hard and rarely meets the complex reporting requirements of real production systems. Without the extended GROUP BY features, you would have to glue several GROUP BY queries together with UNION. That SQL is complex to write and inefficient to run.
1 ROLLUP for multi-level summaries
ROLLUP first performs the regular grouping. It then works through the column list from right to left, producing progressively higher-level subtotals and finally a grand total. The order of the ROLLUP columns therefore matters.
With n columns, ROLLUP produces n + 1 grouping levels.
A partial ROLLUP can eliminate unwanted subtotals and the grand total.
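The n + 1 rule can be made concrete with a minimal Python sketch that lists the grouping sets a ROLLUP expands into. It models only the semantics, not how Oracle executes the query. The helper name `rollup` is hypothetical.

```python
def rollup(*cols):
    """Expand ROLLUP(c1, ..., cn) into its n + 1 grouping sets.

    Columns are removed one at a time from the right; the final empty
    tuple () stands for the grand total.
    """
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


print(rollup("dname", "job"))
# [('dname', 'job'), ('dname',), ()]
```

Reversing the arguments produces different grouping sets, which is why column order matters for ROLLUP.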
example
[oracle@localhost ~]$ sqlplus scott/tiger;

SQL*Plus: Release 11.2.0.4.0 Production on Mon Mar 23 10:31:24 2020

Copyright (c) 1982, 2013, Oracle. All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

10:31:24 SCOTT@edw> set autotrace on
10:31:30 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.job);

DNAME          JOB          SUM_SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
SALES                          9400
RESEARCH       CLERK           1900
RESEARCH       ANALYST         6000
RESEARCH       MANAGER         2975
RESEARCH                      10875
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
ACCOUNTING                     8750
                              29025

13 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3067950682

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |    14 |   392 |     7  (29)| 00:00:01 |
|   1 |  SORT GROUP BY ROLLUP         |         |    14 |   392 |     7  (29)| 00:00:01 |
|   2 |   MERGE JOIN                  |         |    14 |   392 |     6  (17)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |
|*  5 |    SORT JOIN                  |         |    14 |   210 |     4  (25)| 00:00:01 |
|   6 |     TABLE ACCESS FULL         | EMP     |    14 |   210 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("A"."DEPTNO"="B"."DEPTNO")
       filter("A"."DEPTNO"="B"."DEPTNO")


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        913  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         13  rows processed

10:31:34 SCOTT@edw>
The plan shows that DEPT and EMP are each scanned only once. An equivalent query written with UNION would scan them repeatedly, which is less efficient.
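The difference between one scan and repeated scans can be modeled in Python. This is a hypothetical illustration, not Oracle's algorithm.

- `union_all_rollup` scans the rows once per grouping set, as the UNION ALL rewrite would.
- `single_pass_rollup` updates all three levels of ROLLUP(dname, job) during a single scan.

The rows are the 14 SCOTT.EMP rows (dname, job, sal) that reproduce the sums in the transcript. None marks a rolled-up column, which assumes the data itself contains no NULLs.

```python
from collections import defaultdict

# (dname, job, sal) for the 14 rows of SCOTT.EMP joined to DEPT
ROWS = [
    ("RESEARCH", "CLERK", 800), ("SALES", "SALESMAN", 1600),
    ("SALES", "SALESMAN", 1250), ("RESEARCH", "MANAGER", 2975),
    ("SALES", "SALESMAN", 1250), ("SALES", "MANAGER", 2850),
    ("ACCOUNTING", "MANAGER", 2450), ("RESEARCH", "ANALYST", 3000),
    ("ACCOUNTING", "PRESIDENT", 5000), ("SALES", "SALESMAN", 1500),
    ("RESEARCH", "CLERK", 1100), ("SALES", "CLERK", 950),
    ("RESEARCH", "ANALYST", 3000), ("ACCOUNTING", "CLERK", 1300),
]


def union_all_rollup(rows):
    """GROUP BY dname, job UNION ALL GROUP BY dname UNION ALL grand total:
    one full pass over the rows per grouping set."""
    sums, scans = defaultdict(int), 0
    for keep in (2, 1, 0):  # number of leading columns kept
        scans += 1
        for dname, job, sal in rows:
            key = (dname, job)[:keep] + (None,) * (2 - keep)
            sums[key] += sal
    return dict(sums), scans


def single_pass_rollup(rows):
    """ROLLUP(dname, job): every level is updated during one pass."""
    sums = defaultdict(int)
    for dname, job, sal in rows:
        sums[(dname, job)] += sal    # regular grouping
        sums[(dname, None)] += sal   # subtotal per department
        sums[(None, None)] += sal    # grand total
    return dict(sums), 1


many, many_scans = union_all_rollup(ROWS)
one, one_scan = single_pass_rollup(ROWS)
print(len(one), one[(None, None)], many_scans, one_scan)  # 13 29025 3 1
```

Both functions return the same 13 groups. Only the number of passes over the input differs.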
The execution plan also contains an implicit SORT GROUP BY ROLLUP operation, which is why the results come back sorted. You should still add an explicit ORDER BY when order matters, because the implicit sort does not necessarily match business requirements.
Partial ROLLUP
With the hint expand_gset_to_union, the optimizer rewrites ROLLUP into the equivalent UNION ALL operations. GROUPING SETS and CUBE can be rewritten the same way.
For a partial ROLLUP, put only the columns that need subtotals inside ROLLUP and list the other columns as ordinary GROUP BY columns. No grand total is produced in that case.
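A partial ROLLUP prepends the fixed GROUP BY columns to every grouping set of the ROLLUP. The Python sketch below evaluates GROUP BY hire_year, dname, ROLLUP(job) over the SCOTT rows. It models semantics only, and the helpers `rollup` and `evaluate` are hypothetical. It yields the 17 rows of the example below and no grand-total row.

```python
from collections import defaultdict

COLUMNS = ("hire_year", "dname", "job")
# (hire_year, dname, job, sal) for the 14 SCOTT.EMP rows
ROWS = [
    ("1980", "RESEARCH", "CLERK", 800), ("1981", "SALES", "SALESMAN", 1600),
    ("1981", "SALES", "SALESMAN", 1250), ("1981", "RESEARCH", "MANAGER", 2975),
    ("1981", "SALES", "SALESMAN", 1250), ("1981", "SALES", "MANAGER", 2850),
    ("1981", "ACCOUNTING", "MANAGER", 2450), ("1987", "RESEARCH", "ANALYST", 3000),
    ("1981", "ACCOUNTING", "PRESIDENT", 5000), ("1981", "SALES", "SALESMAN", 1500),
    ("1987", "RESEARCH", "CLERK", 1100), ("1981", "SALES", "CLERK", 950),
    ("1981", "RESEARCH", "ANALYST", 3000), ("1982", "ACCOUNTING", "CLERK", 1300),
]


def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def evaluate(rows, grouping_sets):
    """SUM(sal) for each grouping set, results concatenated (UNION ALL).
    Columns outside the current grouping set are reported as None."""
    out = []
    for gset in grouping_sets:
        sums = defaultdict(int)
        for row in rows:
            key = tuple(v if c in gset else None for c, v in zip(COLUMNS, row))
            sums[key] += row[-1]
        out.extend(sums.items())
    return out


# GROUP BY hire_year, dname, ROLLUP(job): fixed columns + each ROLLUP set
partial_sets = [("hire_year", "dname") + s for s in rollup("job")]
result = evaluate(ROWS, partial_sets)
print(partial_sets)  # [('hire_year', 'dname', 'job'), ('hire_year', 'dname')]
print(len(result))   # 17
```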
example
10:31:34 SCOTT@edw> set autotrace off
10:43:49 SCOTT@edw> SELECT to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY to_char(b.hiredate,'yyyy'),a.dname,ROLLUP(b.job);

HIRE DNAME          JOB          SUM_SAL
---- -------------- --------- ----------
1980 RESEARCH       CLERK            800
1980 RESEARCH                        800
1981 SALES          CLERK            950
1981 SALES          MANAGER         2850
1981 SALES          SALESMAN        5600
1981 SALES                          9400
1981 RESEARCH       ANALYST         3000
1981 RESEARCH       MANAGER         2975
1981 RESEARCH                       5975
1981 ACCOUNTING     MANAGER         2450
1981 ACCOUNTING     PRESIDENT       5000
1981 ACCOUNTING                     7450
1982 ACCOUNTING     CLERK           1300
1982 ACCOUNTING                     1300
1987 RESEARCH       CLERK           1100
1987 RESEARCH       ANALYST         3000
1987 RESEARCH                       4100

17 rows selected.

Elapsed: 00:00:00.01
10:43:53 SCOTT@edw>
2 CUBE for cross-tab reports
CUBE produces more detailed statistics, so data can be analyzed along different dimensions to build cross-tab reports. CUBE generates every combination of its columns:
- the grand total, which uses none of the n columns
- subtotals over every subset of 1 to n - 1 columns
- the regular grouping over all n columns
Because CUBE contains all possible combinations, its result does not depend on the column order. Column order affects only the implicit default sort, which does not matter if you do not care about ordering.
Each additional CUBE column doubles the number of groupings, so the result can grow exponentially: n columns give 2^n groupings.
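The 2^n rule and the order independence can be checked with a short Python sketch. The helper `cube` is hypothetical and models only which grouping sets CUBE produces.

```python
from itertools import combinations


def cube(*cols):
    """Expand CUBE(c1, ..., cn) into all 2**n grouping sets, largest first."""
    return [combo
            for k in range(len(cols), -1, -1)
            for combo in combinations(cols, k)]


print(cube("dname", "job"))
# [('dname', 'job'), ('dname',), ('job',), ()]

# Column order changes only the listing order, not the sets themselves
same = ({frozenset(s) for s in cube("dname", "job")}
        == {frozenset(s) for s in cube("job", "dname")})
print(same, len(cube("a", "b", "c", "d")))  # True 16
```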
The syntax is the same as for ROLLUP. Example:
11:02:40 SCOTT@edw> set autotrace on
11:02:48 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY CUBE(a.dname,b.job);

DNAME          JOB          SUM_SAL
-------------- --------- ----------
                              29025
               CLERK           4150
               ANALYST         6000
               MANAGER         8275
               SALESMAN        5600
               PRESIDENT       5000
SALES                          9400
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
RESEARCH                      10875
RESEARCH       CLERK           1900
RESEARCH       ANALYST         6000
RESEARCH       MANAGER         2975
ACCOUNTING                     8750
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000

18 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2382666110

-------------------------------------------------------------------------------------------
| Id  | Operation                       | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |         |    14 |   392 |     7  (29)| 00:00:01 |
|   1 |  SORT GROUP BY                  |         |    14 |   392 |     7  (29)| 00:00:01 |
|   2 |   GENERATE CUBE                 |         |    14 |   392 |     7  (29)| 00:00:01 |
|   3 |    SORT GROUP BY                |         |    14 |   392 |     7  (29)| 00:00:01 |
|   4 |     MERGE JOIN                  |         |    14 |   392 |     6  (17)| 00:00:01 |
|   5 |      TABLE ACCESS BY INDEX ROWID| DEPT    |     4 |    52 |     2   (0)| 00:00:01 |
|   6 |       INDEX FULL SCAN           | PK_DEPT |     4 |       |     1   (0)| 00:00:01 |
|*  7 |      SORT JOIN                  |         |    14 |   210 |     4  (25)| 00:00:01 |
|   8 |       TABLE ACCESS FULL         | EMP     |    14 |   210 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("A"."DEPTNO"="B"."DEPTNO")
       filter("A"."DEPTNO"="B"."DEPTNO")


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1175  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         18  rows processed

11:02:52 SCOTT@edw>
The execution plan (SORT GROUP BY above GENERATE CUBE) shows that the result is sorted.
Partial CUBE, example:
11:06:24 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,CUBE(b.job);

DNAME          JOB          SUM_SAL
-------------- --------- ----------
SALES                          9400
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
RESEARCH                      10875
RESEARCH       CLERK           1900
RESEARCH       ANALYST         6000
RESEARCH       MANAGER         2975
ACCOUNTING                     8750
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000

12 rows selected.

Elapsed: 00:00:00.00
11:06:26 SCOTT@edw>
3 GROUPING SETS for subtotals only
ROLLUP and CUBE produce regular groupings, subtotals, and a grand total. GROUPING SETS produces only the subtotals for the dimensions you specify, so n columns give n groupings.
GROUPING SETS(a, b, c) is equivalent to the UNION ALL of GROUP BY a, GROUP BY b, and GROUP BY c.
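The UNION ALL equivalence can be modeled in Python. In the sketch below, each grouping set is aggregated separately and the results are concatenated. It uses hypothetical helpers and the SCOTT rows, and models semantics only. GROUPING SETS(hire_year, dname, job) should give 4 + 3 + 5 = 12 rows, matching the example that follows.

```python
from collections import defaultdict

COLUMNS = ("hire_year", "dname", "job")
# (hire_year, dname, job, sal) for the 14 SCOTT.EMP rows
ROWS = [
    ("1980", "RESEARCH", "CLERK", 800), ("1981", "SALES", "SALESMAN", 1600),
    ("1981", "SALES", "SALESMAN", 1250), ("1981", "RESEARCH", "MANAGER", 2975),
    ("1981", "SALES", "SALESMAN", 1250), ("1981", "SALES", "MANAGER", 2850),
    ("1981", "ACCOUNTING", "MANAGER", 2450), ("1987", "RESEARCH", "ANALYST", 3000),
    ("1981", "ACCOUNTING", "PRESIDENT", 5000), ("1981", "SALES", "SALESMAN", 1500),
    ("1987", "RESEARCH", "CLERK", 1100), ("1981", "SALES", "CLERK", 950),
    ("1981", "RESEARCH", "ANALYST", 3000), ("1982", "ACCOUNTING", "CLERK", 1300),
]


def group_by(rows, gset):
    """One plain GROUP BY over the columns in gset, SUM(sal)."""
    sums = defaultdict(int)
    for row in rows:
        key = tuple(v if c in gset else None for c, v in zip(COLUMNS, row))
        sums[key] += row[-1]
    return list(sums.items())


# GROUPING SETS(hire_year, dname, job) == GROUP BY hire_year
#   UNION ALL GROUP BY dname UNION ALL GROUP BY job
result = []
for gset in [("hire_year",), ("dname",), ("job",)]:
    result.extend(group_by(ROWS, gset))
print(len(result))  # 12
```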
example
11:06:26 SCOTT@edw> set autotrace on
11:12:33 SCOTT@edw> SELECT to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING SETS( to_char(b.hiredate,'yyyy'),a.dname,b.job);

HIRE DNAME          JOB          SUM_SAL
---- -------------- --------- ----------
                    CLERK           4150
                    SALESMAN        5600
                    PRESIDENT       5000
                    MANAGER         8275
                    ANALYST         6000
     ACCOUNTING                     8750
     RESEARCH                      10875
     SALES                          9400
1987                                4100
1980                                 800
1982                                1300
1981                               22825

12 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2825031421

------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                           |    14 |   448 |    17  (24)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION     |                           |       |       |            |          |
|   2 |   LOAD AS SELECT               | SYS_TEMP_0FD9D660D_29B9BB |       |       |            |          |
|   3 |    MERGE JOIN                  |                           |    14 |   504 |     6  (17)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| DEPT                      |     4 |    52 |     2   (0)| 00:00:01 |
|   5 |      INDEX FULL SCAN           | PK_DEPT                   |     4 |       |     1   (0)| 00:00:01 |
|*  6 |     SORT JOIN                  |                           |    14 |   322 |     4  (25)| 00:00:01 |
|   7 |      TABLE ACCESS FULL         | EMP                       |    14 |   322 |     3   (0)| 00:00:01 |
|   8 |   LOAD AS SELECT               | SYS_TEMP_0FD9D660E_29B9BB |       |       |            |          |
|   9 |    HASH GROUP BY               |                           |     5 |    60 |     3  (34)| 00:00:01 |
|  10 |     TABLE ACCESS FULL          | SYS_TEMP_0FD9D660D_29B9BB |    14 |   168 |     2   (0)| 00:00:01 |
|  11 |   LOAD AS SELECT               | SYS_TEMP_0FD9D660E_29B9BB |       |       |            |          |
|  12 |    HASH GROUP BY               |                           |     4 |    56 |     3  (34)| 00:00:01 |
|  13 |     TABLE ACCESS FULL          | SYS_TEMP_0FD9D660D_29B9BB |    14 |   196 |     2   (0)| 00:00:01 |
|  14 |   LOAD AS SELECT               | SYS_TEMP_0FD9D660E_29B9BB |       |       |            |          |
|  15 |    HASH GROUP BY               |                           |     1 |     8 |     3  (34)| 00:00:01 |
|  16 |     TABLE ACCESS FULL          | SYS_TEMP_0FD9D660D_29B9BB |    14 |   112 |     2   (0)| 00:00:01 |
|  17 |   VIEW                         |                           |     5 |   160 |     2   (0)| 00:00:01 |
|  18 |    TABLE ACCESS FULL           | SYS_TEMP_0FD9D660E_29B9BB |     5 |    60 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - access("SYS_TBL_$2$"."DEPTNO"="SYS_TBL_$1$"."DEPTNO")
       filter("SYS_TBL_$2$"."DEPTNO"="SYS_TBL_$1$"."DEPTNO")


Statistics
----------------------------------------------------------
         23  recursive calls
         33  db block gets
         39  consistent gets
          4  physical reads
       2172  redo size
        962  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         12  rows processed

11:12:36 SCOTT@edw>
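The execution plan shows no implicit sort. The optimizer materializes the join in a temporary table and aggregates each grouping set with HASH GROUP BY, so the output is unordered and does not depend on the order of the columns.
GROUPING SETS can likewise be used in a partial grouping. Example: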
11:12:36 SCOTT@edw> set autotrace off
11:17:03 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,GROUPING SETS(to_char(b.hiredate,'yyyy'),b.job);

DNAME          HIRE JOB          SUM_SAL
-------------- ---- --------- ----------
SALES               MANAGER         2850
SALES               CLERK            950
ACCOUNTING          MANAGER         2450
ACCOUNTING          PRESIDENT       5000
ACCOUNTING          CLERK           1300
RESEARCH            MANAGER         2975
SALES               SALESMAN        5600
RESEARCH            ANALYST         6000
RESEARCH            CLERK           1900
RESEARCH       1981                 5975
SALES          1981                 9400
RESEARCH       1987                 4100
ACCOUNTING     1981                 7450
ACCOUNTING     1982                 1300
RESEARCH       1980                  800

15 rows selected.

Elapsed: 00:00:00.01
11:17:05 SCOTT@edw>
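Note that the meaning changes considerably here. Every grouping set is combined with dname, so the result is GROUP BY dname, hire_year UNION ALL GROUP BY dname, job, rather than one subtotal per column.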
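The change in meaning is easiest to see by listing the grouping sets. This tiny Python sketch models the expansion only. The helper `partial_grouping_sets` is hypothetical.

```python
def partial_grouping_sets(fixed, *gsets):
    """GROUP BY fixed..., GROUPING SETS(g1, g2, ...): the fixed columns are
    prepended to every grouping set."""
    return [tuple(fixed) + tuple(g) for g in gsets]


plain = [("hire_year",), ("job",)]   # GROUPING SETS(hire_year, job) alone
partial = partial_grouping_sets(("dname",), ("hire_year",), ("job",))
print(plain)    # [('hire_year',), ('job',)]
print(partial)  # [('dname', 'hire_year'), ('dname', 'job')]
```

For the SCOTT data this gives 6 department/year groups plus 9 department/job groups, which accounts for the 15 rows in the example.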
CUBE and ROLLUP as arguments of GROUPING SETS
GROUPING SETS produces only the individual groupings and no grand total. If a total is needed, ROLLUP or CUBE can be passed as an argument. Example:
11:23:59 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING sets(rollup(a.dname),ROLLUP(b.job));

DNAME          JOB          SUM_SAL
-------------- --------- ----------
               CLERK           4150
               SALESMAN        5600
               PRESIDENT       5000
               MANAGER         8275
               ANALYST         6000
ACCOUNTING                     8750
RESEARCH                      10875
SALES                          9400
                              29025
                              29025

10 rows selected.

Elapsed: 00:00:00.02
11:24:02 SCOTT@edw>
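The problem is that two grand-total rows are produced. Each ROLLUP or CUBE argument of GROUPING SETS contributes all of its own grouping sets, as if combined with UNION ALL. ROLLUP(a.dname) yields (dname) and (), and ROLLUP(b.job) yields (job) and (), so the grand total () appears twice. Thinking in terms of the equivalent UNION ALL makes this easy to understand.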
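The duplicate can be reproduced by expanding the nested arguments in Python. This is a semantic model only, and the helpers `rollup` and `grouping_sets` are hypothetical.

```python
def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def grouping_sets(*args):
    """Each argument is a list of grouping sets (e.g. from rollup());
    GROUPING SETS concatenates them, UNION ALL style, keeping duplicates."""
    return [s for arg in args for s in arg]


sets = grouping_sets(rollup("dname"), rollup("job"))
print(sets)  # [('dname',), (), ('job',), ()]
```

With 3 departments and 5 jobs this gives 3 + 1 + 5 + 1 = 10 rows, including two grand totals.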
Duplicates could be removed with DISTINCT, but there is a dedicated function for this. GROUP_ID identifies duplicate groupings so that they can be eliminated, which is not the same thing as DISTINCT.
ROLLUP and CUBE can also be mixed as arguments. They can be combined with the other extended features as well, such as partial grouping, composite columns, and concatenated groupings.
ROLLUP and CUBE do not accept GROUPING SETS as an argument, and ROLLUP and CUBE cannot take each other as arguments either.
4 Composite columns, concatenated groupings, and repeated-column groupings
Composite columns and concatenated groupings are very useful in complex reports, and are worth considering when regular grouping cannot meet the requirements.
- Composite columns remove unnecessary groupings while keeping the subtotals and totals you want.
- Concatenated groupings take the Cartesian product of the individual groupings, producing finer-grained grouping levels.
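A composite column treats several columns as a single unit that is kept or rolled up as a whole. The following comparison shows the difference:
ROLLUP(a, b, c) gives (a, b, c), (a, b), (a), ()
ROLLUP(a, (b, c)) gives (a, b, c), (a), ()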
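The comparison can be reproduced with a ROLLUP model that accepts composite units. This is a hypothetical Python helper that models semantics only. A tuple argument plays the role of a parenthesized composite column.

```python
def rollup(*units):
    """ROLLUP over units; a unit is a column name or a tuple of names
    (a composite column), which is kept or dropped as a whole."""
    def flatten(us):
        cols = []
        for u in us:
            cols.extend(u if isinstance(u, tuple) else (u,))
        return tuple(cols)
    return [flatten(units[:k]) for k in range(len(units), -1, -1)]


print(rollup("a", "b", "c"))
# [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
print(rollup("a", ("b", "c")))
# [('a', 'b', 'c'), ('a',), ()]
```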
Concatenated groupings are more powerful still. GROUP BY may contain several ROLLUP, CUBE, and GROUPING SETS operations, which yields more grouping levels and finer reports and can meet very complex requirements. The rule is the same whether the concatenated groupings are of the same type or of different types: the final grouping levels are the Cartesian product of the grouping levels of each grouping. For example, ROLLUP(a, b), ROLLUP(c) produces 3 * 2 = 6 grouping levels.
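The Cartesian-product rule can be sketched in Python with `itertools.product`. This models semantics only, and the helpers `rollup` and `concatenate` are hypothetical.

```python
from itertools import product


def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def concatenate(*groupings):
    """GROUP BY g1, g2, ...: pick one grouping set from each grouping in
    every possible way (Cartesian product) and merge the columns."""
    return [tuple(col for gset in combo for col in gset)
            for combo in product(*groupings)]


sets = concatenate(rollup("a", "b"), rollup("c"))
print(len(sets))  # 6
print(sets)
# [('a', 'b', 'c'), ('a', 'b'), ('a', 'c'), ('a',), ('c',), ()]
```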
Repeated columns: GROUP BY allows duplicate columns, for example GROUP BY ROLLUP(a, (a, b)) or GROUP BY a, ROLLUP(a, b).
Composite columns
example
14:48:13 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY rollup(a.dname,(to_char(b.hiredate,'yyyy'),b.job));

DNAME          HIRE JOB          SUM_SAL
-------------- ---- --------- ----------
SALES          1981 CLERK            950
SALES          1981 MANAGER         2850
SALES          1981 SALESMAN        5600
SALES                               9400
RESEARCH       1980 CLERK            800
RESEARCH       1981 ANALYST         3000
RESEARCH       1981 MANAGER         2975
RESEARCH       1987 CLERK           1100
RESEARCH       1987 ANALYST         3000
RESEARCH                           10875
ACCOUNTING     1981 MANAGER         2450
ACCOUNTING     1981 PRESIDENT       5000
ACCOUNTING     1982 CLERK           1300
ACCOUNTING                          8750
                                   29025

15 rows selected.

Elapsed: 00:00:00.00
14:48:16 SCOTT@edw>
A similar effect can be achieved with a partial ROLLUP that uses composite columns, or with a partial CUBE, plus an added grand total.
That approach is cumbersome, though. When you need to exclude some of the totals and subtotals that CUBE or ROLLUP produce, use GROUPING_ID or GROUPING instead.
ROLLUP and CUBE can each be converted into equivalent GROUPING SETS.
The reverse is also possible, but rarely worthwhile.
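The conversion is mechanical, as this Python sketch shows. It prints the GROUPING SETS clause that is equivalent to a ROLLUP or CUBE, relying on Oracle's empty set () for the grand total. The helpers `rollup`, `cube`, and `as_grouping_sets` are hypothetical and model semantics only.

```python
from itertools import combinations


def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube(*cols):
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def as_grouping_sets(sets):
    """Render grouping sets as an equivalent GROUPING SETS clause;
    () is the grand total."""
    return "GROUPING SETS(" + ", ".join(
        "(" + ", ".join(s) + ")" for s in sets) + ")"


print(as_grouping_sets(rollup("a.dname", "b.job")))
# GROUPING SETS((a.dname, b.job), (a.dname), ())
print(as_grouping_sets(cube("a.dname", "b.job")))
# GROUPING SETS((a.dname, b.job), (a.dname), (b.job), ())
```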
Concatenated groupings
example
14:48:16 SCOTT@edw> SELECT a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY rollup(a.dname,b.job),ROLLUP(to_char(b.hiredate,'yyyy'));

DNAME          HIRE JOB          SUM_SAL
-------------- ---- --------- ----------
SALES               CLERK            950
SALES               MANAGER         2850
SALES               SALESMAN        5600
SALES                               9400
RESEARCH            CLERK           1900
RESEARCH            ANALYST         6000
RESEARCH            MANAGER         2975
RESEARCH                           10875
ACCOUNTING          CLERK           1300
ACCOUNTING          MANAGER         2450
ACCOUNTING          PRESIDENT       5000
ACCOUNTING                          8750
                                   29025
RESEARCH       1980 CLERK            800
RESEARCH       1980                  800
               1980                  800
SALES          1981 CLERK            950
SALES          1981 MANAGER         2850
SALES          1981 SALESMAN        5600
SALES          1981                 9400
RESEARCH       1981 ANALYST         3000
RESEARCH       1981 MANAGER         2975
RESEARCH       1981                 5975
ACCOUNTING     1981 MANAGER         2450
ACCOUNTING     1981 PRESIDENT       5000
ACCOUNTING     1981                 7450
               1981                22825
ACCOUNTING     1982 CLERK           1300
ACCOUNTING     1982                 1300
               1982                 1300
RESEARCH       1987 CLERK           1100
RESEARCH       1987 ANALYST         3000
RESEARCH       1987                 4100
               1987                 4100

34 rows selected.

Elapsed: 00:00:00.01
14:57:57 SCOTT@edw>
This is equivalent to the Cartesian product of the two ROLLUPs: 3 * 2 = 6 grouping levels.
Once concatenated groupings are understood, CUBE can be rewritten with ROLLUP. For example, CUBE(a, b, c) is equivalent to ROLLUP(a), ROLLUP(b), ROLLUP(c). Converting ROLLUP or GROUPING SETS into CUBE, however, is generally of little use.
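The CUBE-as-ROLLUPs identity can be checked mechanically in Python. This sketch compares grouping sets while ignoring column order within each set. It models semantics only, and all helper names are hypothetical.

```python
from itertools import combinations, product


def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube(*cols):
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


def concatenate(*groupings):
    """Cartesian product of groupings, merging the chosen sets' columns."""
    return [tuple(col for gset in combo for col in gset)
            for combo in product(*groupings)]


def as_set(sets):
    """Compare grouping sets ignoring column order inside each set."""
    return {frozenset(s) for s in sets}


lhs = cube("a", "b", "c")
rhs = concatenate(rollup("a"), rollup("b"), rollup("c"))
print(len(lhs), len(rhs), as_set(lhs) == as_set(rhs))  # 8 8 True
```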
Concatenated groupings usually combine groupings of the same type. Concatenating different types is rarely done.
Repeated-column grouping
example
14:57:57 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY a.dname,ROLLUP(a.dname,b.job);

DNAME          JOB          SUM_SAL
-------------- --------- ----------
SALES          CLERK            950
SALES          MANAGER         2850
SALES          SALESMAN        5600
RESEARCH       CLERK           1900
RESEARCH       ANALYST         6000
RESEARCH       MANAGER         2975
ACCOUNTING     CLERK           1300
ACCOUNTING     MANAGER         2450
ACCOUNTING     PRESIDENT       5000
SALES                          9400
RESEARCH                      10875
ACCOUNTING                     8750
SALES                          9400
RESEARCH                      10875
ACCOUNTING                     8750

15 rows selected.

Elapsed: 00:00:00.00
15:07:14 SCOTT@edw>
This example has no practical meaning. It only illustrates that the syntax is allowed.
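The duplicated department subtotals can be explained by expanding the grouping sets. In this Python sketch, GROUP BY a.dname is concatenated with ROLLUP(a.dname, b.job), and a repeated column inside one set is collapsed, since GROUP BY a, a groups the same way as GROUP BY a. It models semantics only, and the helper names are hypothetical.

```python
from itertools import product


def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def concatenate_dedup(*groupings):
    """Cartesian product of groupings; a column repeated within one
    grouping set is kept only once."""
    out = []
    for combo in product(*groupings):
        cols = []
        for gset in combo:
            for col in gset:
                if col not in cols:
                    cols.append(col)
        out.append(tuple(cols))
    return out


sets = concatenate_dedup([("dname",)], rollup("dname", "job"))
print(sets)  # [('dname', 'job'), ('dname',), ('dname',)]
```

The set (dname) appears twice, so the output contains 9 detail rows plus the 3 department subtotals twice, for 15 rows.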
5 Three functions for extended grouping: GROUPING, GROUPING_ID, GROUP_ID
The three functions GROUPING, GROUPING_ID, and GROUP_ID play an important role in producing meaningful reports and in filtering and sorting results, and they are common in complex report queries.
Note that the arguments of GROUPING and GROUPING_ID cannot be composite columns.
GROUPING: used to produce meaningful report labels
GROUPING_ID: used to filter and sort the results
GROUP_ID: used to discard duplicate groupings
GROUPING function
In the result of an extended GROUP BY, NULL marks a subtotal or total. But what if the data already contains NULLs? The GROUPING function resolves this ambiguity:
It accepts exactly one argument, a column from ROLLUP, CUBE, or GROUPING SETS. A GROUP BY column outside these three constructs is also accepted, but the result is then always 0 and therefore meaningless.
GROUPING returns 1 when the column has been rolled up into a total or subtotal in that row, and 0 otherwise. This distinguishes subtotal NULLs from NULLs in the original data. It is often used together with DECODE. It can also identify the grouping level so that rows can be filtered, but that gets tedious, and GROUPING_ID is usually used instead.
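The Python sketch below models this. Each ROLLUP(dname, mgr) group carries its own GROUPING flags next to the column values, which keeps KING's genuine NULL manager apart from the department subtotal. The rows are the SCOTT (dname, mgr, sal) values behind the example below. The helpers `rollup_with_grouping` and `label` are hypothetical, and `label` only mimics the DECODE pattern.

```python
from collections import defaultdict

# (dname, mgr, sal) for the 14 SCOTT.EMP rows; KING has no manager (NULL)
ROWS = [
    ("RESEARCH", 7902, 800), ("SALES", 7698, 1600), ("SALES", 7698, 1250),
    ("RESEARCH", 7839, 2975), ("SALES", 7698, 1250), ("SALES", 7839, 2850),
    ("ACCOUNTING", 7839, 2450), ("RESEARCH", 7566, 3000),
    ("ACCOUNTING", None, 5000), ("SALES", 7698, 1500),
    ("RESEARCH", 7788, 1100), ("SALES", 7698, 950),
    ("RESEARCH", 7566, 3000), ("ACCOUNTING", 7782, 1300),
]


def rollup_with_grouping(rows):
    """ROLLUP(dname, mgr); each key also carries the GROUPING flags:
    (dname, mgr, GROUPING(dname), GROUPING(mgr))."""
    sums = defaultdict(int)
    for dname, mgr, sal in rows:
        sums[(dname, mgr, 0, 0)] += sal    # detail row: mgr may be a real NULL
        sums[(dname, None, 0, 1)] += sal   # department subtotal: mgr rolled up
        sums[(None, None, 1, 1)] += sal    # grand total
    return dict(sums)


def label(dname, mgr, g_dname, g_mgr):
    """Like DECODE(GROUPING(col), 1, 'All ...', col)."""
    return ("All departments" if g_dname else dname,
            "All managers" if g_mgr else mgr)


for key, total in rollup_with_grouping(ROWS).items():
    print(*label(*key), total)
```

Without the flags, the keys ("ACCOUNTING", None) for KING's row and for the ACCOUNTING subtotal would collide.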
example
15:34:01 SCOTT@edw> SELECT decode(GROUPING(a.dname),1,'All departments',a.dname) dname,decode(grouping(b.mgr),1,'All managers',b.mgr) mgr,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.mgr);

DNAME           MGR                                         SUM_SAL
--------------- ---------------------------------------- ----------
SALES           7698                                           6550
SALES           7839                                           2850
SALES           All managers                                   9400
RESEARCH        7566                                           6000
RESEARCH        7788                                           1100
RESEARCH        7839                                           2975
RESEARCH        7902                                            800
RESEARCH        All managers                                  10875
ACCOUNTING                                                     5000
ACCOUNTING      7782                                           1300
ACCOUNTING      7839                                           2450
ACCOUNTING      All managers                                   8750
All departments All managers                                  29025

13 rows selected.

Elapsed: 00:00:00.01
15:34:12 SCOTT@edw>
GROUPING_ID function
GROUPING_ID is used to filter and sort the results by grouping level.
It accepts multiple arguments, which are columns from ROLLUP, CUBE, or GROUPING SETS. It evaluates each argument from left to right, giving 0 if the column is a grouping column in that row and 1 if it has been rolled up into a total or subtotal. The bits are concatenated into a binary number called the bit vector, and its decimal value is the result, which identifies the grouping level. For example, with CUBE(a, b), GROUPING_ID(a, b) takes the following values:
0 (binary 00): grouped by a and b
1 (binary 01): grouped by a only, subtotal over b
2 (binary 10): grouped by b only, subtotal over a
3 (binary 11): grand total
The advantage of GROUPING_ID is that it computes the grouping level over several columns in one call.
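The bit-vector calculation fits in a few lines of Python. In this sketch a grouping set stands in for "which columns are not rolled up in this row". The helper `grouping_id` is hypothetical and models the semantics only.

```python
def grouping_id(grouping_set, *args):
    """Model of GROUPING_ID(args...): for each argument, left to right,
    append bit 1 if the column is rolled up (absent from the grouping set)
    or 0 if it is a grouping column; return the decimal value."""
    value = 0
    for col in args:
        value = (value << 1) | (0 if col in grouping_set else 1)
    return value


cube_sets = [("a", "b"), ("a",), ("b",), ()]
print([grouping_id(s, "a", "b") for s in cube_sets])  # [0, 1, 2, 3]

rollup_sets = [("dname", "mgr", "job"), ("dname", "mgr"), ("dname",), ()]
print([grouping_id(s, "dname", "mgr", "job") for s in rollup_sets])  # [0, 1, 3, 7]
```

For ROLLUP(dname, mgr, job) the levels are 0, 1, 3, and 7. That is why HAVING grouping_id(a.dname, b.mgr, b.job) IN (0, 7) in the next example keeps only the detail rows and the grand total.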
example
15:46:26 SCOTT@edw> SELECT a.dname,b.mgr,b.job,SUM(b.sal) sum_sal FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY ROLLUP(a.dname,b.mgr,b.job) HAVING grouping_id(a.dname,b.mgr,b.job) IN (0,7);

DNAME                 MGR JOB          SUM_SAL
-------------- ---------- --------- ----------
SALES                7698 CLERK            950
SALES                7698 SALESMAN        5600
SALES                7839 MANAGER         2850
RESEARCH             7566 ANALYST         6000
RESEARCH             7788 CLERK           1100
RESEARCH             7839 MANAGER         2975
RESEARCH             7902 CLERK            800
ACCOUNTING                PRESIDENT       5000
ACCOUNTING           7782 CLERK           1300
ACCOUNTING           7839 MANAGER         2450
                                         29025

11 rows selected.

Elapsed: 00:00:00.00
15:46:29 SCOTT@edw>
GROUP_ID function
GROUP_ID takes no arguments. Because extended GROUP BY allows complex grouping operations, a complex report may compute the same grouping more than once. GROUP_ID tells these duplicate groupings apart: it returns 0 for the first occurrence and increases by 1 for each repeat. Selecting GROUP_ID has little meaning on its own, so it is usually placed in a HAVING clause to exclude double-counted rows.
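The numbering rule can be modeled in Python by counting how often each grouping set has already appeared. This sketch models the semantics only, and the helpers `rollup` and `with_group_id` are hypothetical.

```python
def rollup(*cols):
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def with_group_id(sets):
    """Pair each grouping set with its GROUP_ID(): 0 for the first
    occurrence, incremented by 1 for each repeat of the same set."""
    seen = {}
    out = []
    for s in sets:
        key = frozenset(s)
        out.append((s, seen.get(key, 0)))
        seen[key] = seen.get(key, 0) + 1
    return out


# GROUPING SETS(ROLLUP(dname), ROLLUP(job))
tagged = with_group_id(rollup("dname") + rollup("job"))
print(tagged)
# [(('dname',), 0), ((), 0), (('job',), 0), ((), 1)]

kept = [s for s, gid in tagged if gid == 0]   # HAVING group_id() = 0
print(kept)  # [('dname',), (), ('job',)]
```

Only the second grand total gets GROUP_ID 1. Filtering it out leaves 3 + 1 + 5 = 9 rows, matching the example below.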
example
15:46:29 SCOTT@edw> SELECT a.dname,b.job,SUM(b.sal) sum_sal,group_id() gi FROM dept a,emp b WHERE a.deptno=b.deptno GROUP BY GROUPING SETS(ROLLUP(a.dname),ROLLUP(b.job)) HAVING group_id()=0;

DNAME          JOB          SUM_SAL         GI
-------------- --------- ---------- ----------
               CLERK           4150          0
               SALESMAN        5600          0
               PRESIDENT       5000          0
               MANAGER         8275          0
               ANALYST         6000          0
ACCOUNTING                     8750          0
RESEARCH                      10875          0
SALES                          9400          0
                              29025          0

9 rows selected.

Elapsed: 00:00:00.01
15:55:55 SCOTT@edw>