Series Article Directory
[SQL Development Practical Skills] Series (1): Things that have to be said about SQL
[SQL Development Practical Skills] Series (2): Simple single-table queries
[SQL Development Practical Skills] Series (3): Sorting in SQL
[SQL Development Practical Skills] Series (4): Precautions for UNION ALL with empty strings, and UNION vs. OR, seen from the execution plan
[SQL Development Practical Skills] Series (5): The efficiency of IN, EXISTS and INNER JOIN seen from the execution plan: judge by scenario instead of memorizing online conclusions
[SQL Development Practical Skills] Series (6): The efficiency of NOT IN, NOT EXISTS and LEFT JOIN seen from the execution plan, and why inner and outer join conditions must not be misplaced
[SQL Development Practical Skills] Series (7): Comparing the differing rows, and their record counts, between two tables that contain duplicate data
[SQL Development Practical Skills] Series (8): Flexible ways to insert data, restricting inserts more flexibly than constraints can, and inserting into multiple tables with one INSERT statement
[SQL Development Practical Skills] Series (9): An UPDATE accidentally sets other columns to NULL? Rewrite the UPDATE as a MERGE! Plus five ways to delete duplicate data
[SQL Development Practical Skills] Series (10): Splitting strings, replacing strings, and counting occurrences of a substring
[SQL Development Practical Skills] Series (11): A few cases covering the commonly used functions translate, regexp_replace, listagg, wmsys.wm_concat, substr and regexp_substr
[SQL Development Practical Skills] Series (12): Three questions (how to deduplicate the letters of a string and sort them alphabetically? How to identify which strings contain digits? How to turn delimited data into a multi-valued IN list?)
[SQL Development Practical Skills] Series (13): Common aggregate functions, and using the execution plan to see how sum() over() accumulates employee salaries
[SQL Development Practical Skills] Series (14): Calculating the balance after each purchase, cumulative sums of bank transactions, and the top three salaries in each department
[SQL Development Practical Skills] Series (15): Finding the rows that hold the extreme values, and quick totals with max/min() keep() over(), first_value, last_value and ratio_to_report
[SQL Development Practical Skills] Series (16): Date and time operations in the data warehouse (basic): differences in days, months, years, hours, minutes and seconds, and interval arithmetic
[SQL Development Practical Skills] Series (17): Date and time operations in the data warehouse (basic): working days between two dates, how often each weekday occurs in a year, and the number of days between a record and the next
[SQL Development Practical Skills] Series (18): Date and time operations in the data warehouse (advanced): INTERVAL, EXTRACT, leap-year checks and week calculations
[SQL Development Practical Skills] Series (19): Date and time operations in the data warehouse (advanced): printing the calendar of the current month or year with one SQL, and finding the dates of the first and last given weekday in a month
[SQL Development Practical Skills] Series (20): Date and time operations in the data warehouse (advanced): quarter start and end times, and counting over discontinuous time data
[SQL Development Practical Skills] Series (21): Date and time operations in the data warehouse (advanced): identifying overlapping date ranges, and summarizing data in 10-minute intervals
[SQL Development Practical Skills] Series (22): Data warehouse reporting ☞ must analytic functions be fast? Implementing result-set paging and interleaved sampling
[SQL Development Practical Skills] Series (23): Data warehouse reporting ☞ deduplicating data permutations and finding the records containing the maximum and minimum values, with the execution plan again proving that analytic-function performance is not necessarily high
[SQL Development Practical Skills] Series (24): Data warehouse reporting ☞ "rows to columns" and "columns to rows" explained in detail through cases and execution plans
[SQL Development Practical Skills] Series (25): Data warehouse reporting ☞ showing duplicate rows in the result set only once, an efficient way to compute departmental salary differences, and fast data grouping
[SQL Development Practical Skills] Series (26): Data warehouse reporting ☞ how ROLLUP and UNION ALL each produce group totals, and how to identify which rows are the summary rows
[SQL Development Practical Skills] Series (27): Data warehouse reporting ☞ how analytic functions open windows over moving ranges, and printing the nine-times-nine multiplication table with one SQL
[SQL Development Practical Skills] Series (28): Data warehouse reporting ☞ personnel distribution, and simultaneous aggregation over different groups (partitions)
Article Directory
Foreword
This article covers: showing the distribution of personnel across jobs via row-to-column conversion (each job displayed as a column, each employee as a row), pitfalls to watch for when chaining row-to-column conversions, and using execution plans to examine simultaneous aggregation over different groups (partitions), where the requirement is to list the number of employees per department and per job alongside the detail rows of the employee table!
I am writing the [SQL development practical skills] series as a review of old knowledge. After all, SQL development is important and fundamental in data analysis scenarios, and interviews often ask about SQL development and tuning experience. I believe that by the time I finish this series I will have gained something, and you will be able to face SQL interviews with ease~
1. Distribution of personnel across jobs
Now there is a requirement: each job must be displayed as a column and each employee as a row; when an employee holds the job, display 是 ("yes"), otherwise leave the cell empty!
How do we meet this requirement?
We can use the PIVOT clause, grouping by employee and job and setting the matching cell to 是:
SQL> select * from (select ename,job from emp)
2 pivot(
3 max('是')
4 for job in(
5 'ANALYST' as ANALYST,
6 'CLERK' as CLERK,
7 'MANAGER' as MANAGER,
8 'PRESIDENT' as PRESIDENT,
9 'SALESMAN' as SALESMAN
10 )
11 );
ENAME ANALYST CLERK MANAGER PRESIDENT SALESMAN
---------- ------- ----- ------- --------- --------
ADAMS 是
ALLEN 是
BLAKE 是
CLARK 是
FORD 是
JAMES 是
JONES 是
KING 是
MARTIN 是
MILLER 是
SCOTT 是
SMITH 是
TURNER 是
WARD 是
14 rows selected
This statement is equivalent to GROUP BY ename, job.
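For comparison, the same result can be produced without PIVOT by grouping on ename and picking each job out with MAX(CASE ...). This rewrite is a sketch of my own against the standard SCOTT.EMP table, not part of the original series:

```sql
-- Equivalent rewrite without PIVOT: one row per employee,
-- one MAX(CASE ...) column per job. The grouping that PIVOT
-- performs implicitly is written out explicitly here.
select ename,
       max(case when job = 'ANALYST'   then '是' end) as analyst,
       max(case when job = 'CLERK'     then '是' end) as clerk,
       max(case when job = 'MANAGER'   then '是' end) as manager,
       max(case when job = 'PRESIDENT' then '是' end) as president,
       max(case when job = 'SALESMAN'  then '是' end) as salesman
  from emp
 group by ename
 order by ename;
```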
2. Create a sparse matrix
To raise the difficulty of the problem above, the requirement now is: the matching cell should directly show the employee's name, and a distribution across departments is added as well. Because the data is not being summarized, PIVOT can still handle it. The query is as follows:
SQL>
SQL> select *
2 from (select empno, ename, ename as ename2, job, deptno from emp)
3 pivot(max(ename)
4 for deptno in(10 as d10, 20 as d20, 30 as d30))
5 pivot(max(ename2)
6 for job in('ANALYST' as ANALYST,
7 'CLERK' as CLERK,
8 'MANAGER' as MANAGER,
9 'PRESIDENT' as PRESIDENT,
10 'SALESMAN' as SALESMAN
11 ));
EMPNO D10 D20 D30 ANALYST CLERK MANAGER PRESIDENT SALESMAN
----- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
7900 JAMES JAMES
7369 SMITH SMITH
7499 ALLEN ALLEN
7521 WARD WARD
7566 JONES JONES
7654 MARTIN MARTIN
7698 BLAKE BLAKE
7782 CLARK CLARK
7788 SCOTT SCOTT
7839 KING KING
7844 TURNER TURNER
7876 ADAMS ADAMS
7902 FORD FORD
7934 MILLER MILLER
14 rows selected
Note: if the data needs to be summarized, do not use this double-PIVOT approach, because the query is actually equivalent to nesting one PIVOT clause inside another.
A previous article in this series used a count(case when ...) statement, as follows:
SQL>
SQL> select count(case
2 when deptno = 10 then
3 ename
4 end) as deptno_10,
5 count(case
6 when deptno = 20 then
7 ename
8 end) as deptno_20,
9 count(case
10 when deptno = 30 then
11 ename
12 end) as deptno_30,
13 count(case
14 when job = 'ANALYST' then
15 job
16 end) as ANALYST,
17 count(case
18 when job = 'CLERK' then
19 job
20 end) as CLERK,
21 count(case
22 when job = 'MANAGER' then
23 job
24 end) as MANAGER,
25 count(case
26 when job = 'PRESIDENT' then
27 job
28 end) as PRESIDENT,
29 count(case
30 when job = 'SALESMAN' then
31 job
32 end) as SALESMAN
33 from emp;
DEPTNO_10 DEPTNO_20 DEPTNO_30 ANALYST CLERK MANAGER PRESIDENT SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
3 5 6 2 4 3 1 4
Let's try rewriting it with PIVOT and see what happens. The PIVOT rewrite is as follows:
SQL>
SQL> select *
2 from (select ename, ename as ename2, job, deptno from emp)
3 pivot(count(ename)
4 for deptno in(10 as d10, 20 as d20, 30 as d30))
5 pivot(count(ename2)
6 for job in('ANALYST' as ANALYST,
7 'CLERK' as CLERK,
8 'MANAGER' as MANAGER,
9 'PRESIDENT' as PRESIDENT,
10 'SALESMAN' as SALESMAN
11 ));
D10 D20 D30 ANALYST CLERK MANAGER PRESIDENT SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
0 0 1 0 1 1 0 4
0 1 0 2 2 1 0 0
1 0 0 0 1 1 1 0
SQL>
As you can see, the data does not match the case when results. Below, the query is broken into nested steps for analysis.
Nesting, step one:
SQL> with t as (
2 select *
3 from (select ename, ename as ename2, job, deptno from emp)
4 pivot(count(ename)
5 for deptno in(10 as d10, 20 as d20, 30 as d30))
6 )
7 select * from t;
ENAME2 JOB D10 D20 D30
---------- --------- ---------- ---------- ----------
FORD ANALYST 0 1 0
KING PRESIDENT 1 0 0
WARD SALESMAN 0 0 1
ADAMS CLERK 0 1 0
ALLEN SALESMAN 0 0 1
BLAKE MANAGER 0 0 1
CLARK MANAGER 1 0 0
JAMES CLERK 0 0 1
JONES MANAGER 0 1 0
SCOTT ANALYST 0 1 0
SMITH CLERK 0 1 0
MARTIN SALESMAN 0 0 1
MILLER CLERK 1 0 0
TURNER SALESMAN 0 0 1
14 rows selected
The first step is equivalent to GROUP BY ename2, job.
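The implicit grouping can be made explicit. The following is a sketch of my own (against SCOTT.EMP) of what the first pivot step does:

```sql
-- Every column of the pivot input that is neither aggregated
-- nor named in the FOR clause (here ENAME2 and JOB) becomes an
-- implicit GROUP BY key; COUNT fills D10/D20/D30 with 0 or 1.
select ename2, job,
       count(case when deptno = 10 then ename end) as d10,
       count(case when deptno = 20 then ename end) as d20,
       count(case when deptno = 30 then ename end) as d30
  from (select ename, ename as ename2, job, deptno from emp)
 group by ename2, job;
```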
Nesting, step two:
SQL> with t as
2 (select *
3 from (select ename, ename as ename2, job, deptno from emp)
4 pivot(count(ename)
5 for deptno in(10 as d10, 20 as d20, 30 as d30)))
6 select *
7 from t
8 pivot (count(ename2) for job in('ANALYST' as ANALYST,
9 'CLERK' as CLERK,
10 'MANAGER' as MANAGER,
11 'PRESIDENT' as PRESIDENT,
12 'SALESMAN' as SALESMAN));
D10 D20 D30 ANALYST CLERK MANAGER PRESIDENT SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
0 0 1 0 1 1 0 4
0 1 0 2 2 1 0 0
1 0 0 0 1 1 1 0
SQL>
Because the columns returned by the first step are (ENAME2, JOB, D10, D20, D30), once (ENAME2, JOB) is consumed by the second pivot, what remains is (D10, D20, D30). So the second step is equivalent to GROUP BY d10, d20, d30.
But what we want is the count over the emp table grouped by job and, separately, grouped by department: two independent groupings of emp placed side by side, not a grouped count computed on top of the first grouping's result.
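If the goal is simply both sets of counts from one pass over emp, one alternative sketch (an assumption of mine, not from the original text) is GROUPING SETS, which computes each grouping independently instead of nesting one on top of the other:

```sql
-- Two independent groupings of EMP in a single scan: rows with
-- JOB null carry the per-department counts, and rows with
-- DEPTNO null carry the per-job counts.
select deptno, job, count(*) as cnt
  from emp
 group by grouping sets ((deptno), (job));
```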
3. Simultaneous aggregation of different groups and partitions
Now there is a requirement: list the number of employees per department and per job alongside the detail rows of the employee table.
Before analytic functions, this kind of requirement needed self-joins:
SQL> with t as
2 (select count(*) as cnt from emp),
3 t1 as
4 (select deptno, count(*) as dcnt from emp group by deptno),
5 t2 as
6 (select job, count(*) as jcnt from emp group by job)
7 select emp.ename,
8 emp.deptno,
9 t1.dcnt,
10 emp.job,
11 t2.jcnt,
12 (select * from t) as cnt
13 from emp
14 inner join t1
15 on (emp.deptno = t1.deptno)
16 inner join t2
17 on (emp.job = t2.job);
ENAME DEPTNO DCNT JOB JCNT CNT
---------- ------ ---------- --------- ---------- ----------
FORD 20 5 ANALYST 2 14
SCOTT 20 5 ANALYST 2 14
MILLER 10 3 CLERK 4 14
JAMES 30 6 CLERK 4 14
ADAMS 20 5 CLERK 4 14
SMITH 20 5 CLERK 4 14
CLARK 10 3 MANAGER 3 14
BLAKE 30 6 MANAGER 3 14
JONES 20 5 MANAGER 3 14
KING 10 3 PRESIDENT 1 14
TURNER 30 6 SALESMAN 4 14
MARTIN 30 6 SALESMAN 4 14
WARD 30 6 SALESMAN 4 14
ALLEN 30 6 SALESMAN 4 14
14 rows selected
SQL>
Take a look at the execution plan:
Plan Hash Value :
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 868 | 12 | 00:00:01 |
| 1 | VIEW | | 1 | 13 | 1 | 00:00:01 |
| 2 | SORT AGGREGATE | | 1 | | | |
| 3 | INDEX FULL SCAN | IDX_EMPNO | 15 | | 1 | 00:00:01 |
| * 4 | HASH JOIN | | 14 | 868 | 11 | 00:00:01 |
| * 5 | HASH JOIN | | 13 | 559 | 7 | 00:00:01 |
| 6 | VIEW | | 3 | 78 | 4 | 00:00:01 |
| 7 | SORT GROUP BY | | 3 | 9 | 4 | 00:00:01 |
| 8 | TABLE ACCESS FULL | EMP | 15 | 45 | 3 | 00:00:01 |
| * 9 | TABLE ACCESS FULL | EMP | 13 | 221 | 3 | 00:00:01 |
| 10 | VIEW | | 5 | 95 | 4 | 00:00:01 |
| 11 | SORT GROUP BY | | 5 | 40 | 4 | 00:00:01 |
| 12 | TABLE ACCESS FULL | EMP | 15 | 120 | 3 | 00:00:01 |
------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 4 - access("EMP"."JOB"="T2"."JOB")
* 5 - access("EMP"."DEPTNO"="T1"."DEPTNO")
* 9 - filter("EMP"."JOB" IS NOT NULL AND "EMP"."DEPTNO" IS NOT NULL)
This version is more complicated, and it accesses the emp table four times (one of the four accesses goes through the index I built, which is why an INDEX FULL SCAN appears in the plan).
If you rewrite it with analytic functions instead, the statement is much simpler:
SQL> select emp.ename,
2 emp.deptno,
3 count(*) over(partition by deptno) dcnt,
4 emp.job,
5 count(*) over(partition by job) jcnt,
6 count(*) over() as cnt
7 from emp
8 ;
ENAME DEPTNO DCNT JOB JCNT CNT
---------- ------ ---------- --------- ---------- ----------
MILLER 10 3 CLERK 4 14
KING 10 3 PRESIDENT 1 14
CLARK 10 3 MANAGER 3 14
SMITH 20 5 CLERK 4 14
SCOTT 20 5 ANALYST 2 14
ADAMS 20 5 CLERK 4 14
FORD 20 5 ANALYST 2 14
JONES 20 5 MANAGER 3 14
WARD 30 6 SALESMAN 4 14
MARTIN 30 6 SALESMAN 4 14
TURNER 30 6 SALESMAN 4 14
ALLEN 30 6 SALESMAN 4 14
JAMES 30 6 CLERK 4 14
BLAKE 30 6 MANAGER 3 14
14 rows selected
Look at the execution plan:
Plan Hash Value : 4086863039
----------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 15 | 255 | 5 | 00:00:01 |
| 1 | WINDOW SORT | | 15 | 255 | 5 | 00:00:01 |
| 2 | WINDOW SORT | | 15 | 255 | 5 | 00:00:01 |
| 3 | TABLE ACCESS FULL | EMP | 15 | 255 | 3 | 00:00:01 |
----------------------------------------------------------------------
From the perspective of the execution plan, the table is scanned once.
But didn't I spend the previous two articles insisting that analytic functions should be used with caution? Why am I recommending them here?
When you find the same table being accessed multiple times, try rewriting the query with analytic functions and measure how efficient the rewrite is. If, by analyzing the execution plan as I did here, you can conclude that the performance improvement is obvious, then of course analytic functions suit your scenario. And the most important point of all: don't forget to verify the data after rewriting! This is crucial.
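One way to check the data is to MINUS the two result sets against each other; if the rewrite is equivalent, both directions return zero rows. This check is a sketch of my own built from the two queries above:

```sql
-- Old self-join version MINUS new analytic version: any row
-- printed here is a discrepancy. Run the MINUS in the other
-- direction as well before trusting the rewrite.
with old_q as
 (select e.ename, e.deptno, t1.dcnt, e.job, t2.jcnt,
         (select count(*) from emp) as cnt
    from emp e
    join (select deptno, count(*) as dcnt from emp group by deptno) t1
      on e.deptno = t1.deptno
    join (select job, count(*) as jcnt from emp group by job) t2
      on e.job = t2.job),
new_q as
 (select ename, deptno,
         count(*) over(partition by deptno) as dcnt,
         job,
         count(*) over(partition by job) as jcnt,
         count(*) over() as cnt
    from emp)
select * from old_q
minus
select * from new_q;
```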
Summary
This article covered: showing the distribution of personnel across jobs via row-to-column conversion (each job displayed as a column, each employee as a row), pitfalls to watch for when chaining row-to-column conversions, and using execution plans to examine simultaneous aggregation over different groups (partitions), where the requirement is to list the number of employees per department and per job alongside the detail rows of the employee table.