[SQL Development Practical Skills] Series (28): Data Warehouse Report Scenario ☞ Personnel distribution and how to achieve simultaneous aggregation of different groups (partitions)

Series Article Directory

[SQL Development Practical Skills] Series (1): Those things that have to be said about SQL
[SQL Development Practical Skills] Series (2): Simple single-table queries
[SQL Development Practical Skills] Series (3): Those things about SQL sorting
[SQL Development Practical Skills] Series (4): Precautions for using UNION ALL with empty strings, and UNION vs. OR, seen from the execution plan
[SQL Development Practical Skills] Series (5): The efficiency of IN, EXISTS and INNER JOIN seen from the execution plan — divide by scenario instead of memorizing online conclusions
[SQL Development Practical Skills] Series (6): The efficiency of NOT IN, NOT EXISTS and LEFT JOIN seen from the execution plan, and remember not to misplace inner and outer join conditions
[SQL Development Practical Skills] Series (7): How to compare the differing data and corresponding record counts of two tables when duplicate data exists
[SQL Development Practical Skills] Series (8): How to insert data, restricting inserts more flexibly than constraints, and how one INSERT statement can insert into multiple tables at the same time
[SQL Development Practical Skills] Series (9): An UPDATE accidentally sets other columns to empty? Rewrite the UPDATE with MERGE! Plus five ways to delete duplicate data
[SQL Development Practical Skills] Series (10): Splitting strings, replacing strings, and counting occurrences of strings
[SQL Development Practical Skills] Series (11): A few cases covering the commonly used functions translate | regexp_replace | listagg | wmsys.wm_concat | substr | regexp_substr
[SQL Development Practical Skills] Series (12): Three questions (how to sort a string's letters alphabetically after deduplication? How to identify which strings contain numbers? How to convert delimited data into a multi-valued IN list?)
[SQL Development Practical Skills] Series (13): Common aggregate functions & accumulating employee wages with sum() over(), seen through the execution plan
[SQL Development Practical Skills] Series (14): Calculating the balance after consumption, cumulative sums of bank transactions, and the top three salaries in each department
[SQL Development Practical Skills] Series (15): Finding the rows holding the extreme values and quick sums with max/min() keep() over(), first_value, last_value, ratio_to_report
[SQL Development Practical Skills] Series (16): Time type operations in the data warehouse (basic): differences in days, months, years, hours, minutes and seconds, and time interval calculations
[SQL Development Practical Skills] Series (17): Time type operations in the data warehouse (basic): determining the number of working days between two dates, counting the occurrences of each weekday in a year, and finding the number of days between the current record and the next record
[SQL Development Practical Skills] Series (18): Time type operations in the data warehouse (advanced): INTERVAL, EXTRACT, how to determine whether a year is a leap year, and week calculations
[SQL Development Practical Skills] Series (19): Time type operations in the data warehouse (advanced): how to print the calendar of the current month or year with one SQL, and how to determine the first and last day of each week in a month
[SQL Development Practical Skills] Series (20): Time type operations in the data warehouse (advanced): obtaining quarter start and end times and how to count discontinuous time data
[SQL Development Practical Skills] Series (21): Time type operations in the data warehouse (advanced): identifying overlapping date ranges and summarizing data at specified 10-minute intervals
[SQL Development Practical Skills] Series (22): Data warehouse report scenario ☞ Are analytic functions necessarily fast? A chat about implementing result set paging and interlaced sampling
[SQL Development Practical Skills] Series (23): Data warehouse report scenario ☞ How to deduplicate data permutations and find the records containing the maximum and minimum values? Using the execution plan again to show that analytic function performance is not necessarily high
[SQL Development Practical Skills] Series (24): Data warehouse report scenario ☞ A detailed explanation of "rows to columns" and "columns to rows" through cases and execution plans
[SQL Development Practical Skills] Series (25): Data warehouse report scenario ☞ Displaying duplicate rows in the result set only once, an efficient way to calculate department salary differences, and how to quickly group data
[SQL Development Practical Skills] Series (26): Data warehouse report scenario ☞ How ROLLUP and UNION ALL each perform group totals and how to identify which rows are summary rows
[SQL Development Practical Skills] Series (27): Data warehouse report scenario ☞ Explaining the windowing principle of analytic functions in detail through aggregation over moving ranges, and how to print the nine-nines multiplication table with one SQL
[SQL Development Practical Skills] Series (28): Data warehouse report scenario ☞ Personnel distribution and how to achieve simultaneous aggregation of different groups (partitions)



Foreword

The main content of this article: using row-to-column conversion to show the distribution of personnel across jobs (each job displayed as a column, each employee as a row), the issues to watch out for when chaining row-to-column conversions, and how to aggregate over different groups (partitions) at the same time, verified through the execution plan — the requirement being to list the employee counts per department and per job alongside the detail rows of the employee table.
I write the [SQL Development Practical Skills] series as a review of old knowledge. After all, SQL development is important and fundamental in data analysis scenarios, and interviews often ask about SQL development and tuning experience. I believe that by the time I finish this series I will have gained something myself, and you will be able to face SQL interviews with ease~


1. Distribution of personnel across jobs

Now there is a requirement: each job should be displayed as a column and each employee as a row; when an employee holds that job, the cell shows '是' (yes), otherwise it is left empty.
How do we meet this requirement?
We can use the PIVOT clause, grouping by employee and spreading the jobs into columns, and fill each matching position with '是':

SQL> select * from (select ename,job from emp)
  2  pivot(
  3  max('是')
  4  for job in(
  5    'ANALYST' as ANALYST,
  6    'CLERK' as CLERK,
  7    'MANAGER' as MANAGER,
  8    'PRESIDENT' as PRESIDENT,
  9    'SALESMAN' as SALESMAN
 10    )
 11  );

ENAME      ANALYST CLERK MANAGER PRESIDENT SALESMAN
---------- ------- ----- ------- --------- --------
ADAMS              是                      
ALLEN                                      是
BLAKE                    是                
CLARK                    是                
FORD       是                              
JAMES              是                      
JONES                    是                
KING                             是        
MARTIN                                     是
MILLER             是                      
SCOTT      是                              
SMITH              是                      
TURNER                                     是
WARD                                       是

14 rows selected

This statement is equivalent to grouping by ename and job: the PIVOT groups by the remaining column ename and spreads the job values into columns.
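If you prefer to see that grouping written out by hand, here is a minimal equivalent sketch (assuming the standard SCOTT-schema EMP demo table used throughout this article) that uses conditional MAX instead of PIVOT:

-- Hand-written equivalent of the PIVOT query above (a sketch against the EMP demo table)
select ename,
       max(case when job = 'ANALYST'   then '是' end) as analyst,
       max(case when job = 'CLERK'     then '是' end) as clerk,
       max(case when job = 'MANAGER'   then '是' end) as manager,
       max(case when job = 'PRESIDENT' then '是' end) as president,
       max(case when job = 'SALESMAN'  then '是' end) as salesman
  from emp
 group by ename
 order by ename;

Each CASE expression returns '是' only for the employee's own job, and MAX collapses each ename group to a single row, which is exactly what the PIVOT does.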

2. Create a sparse matrix

Let's increase the difficulty of the above problem. The requirement now is: show the employee's name directly in the matching position, and add the distribution across departments as well. Because the data is not being summarized, PIVOT can still handle it. The query is as follows:

SQL> 
SQL> select *
  2    from (select empno, ename, ename as ename2, job, deptno from emp)
  3  pivot(max(ename)
  4     for deptno in(10 as d10, 20 as d20, 30 as d30))
  5  pivot(max(ename2)
  6     for job in('ANALYST' as ANALYST,
  7                'CLERK' as CLERK,
  8                'MANAGER' as MANAGER,
  9                'PRESIDENT' as PRESIDENT,
 10                'SALESMAN' as SALESMAN
 11                ));

EMPNO D10        D20        D30        ANALYST    CLERK      MANAGER    PRESIDENT  SALESMAN
----- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
 7900                       JAMES                 JAMES                            
 7369            SMITH                            SMITH                            
 7499                       ALLEN                                                  ALLEN
 7521                       WARD                                                   WARD
 7566            JONES                                       JONES                 
 7654                       MARTIN                                                 MARTIN
 7698                       BLAKE                            BLAKE                 
 7782 CLARK                                                  CLARK                 
 7788            SCOTT                 SCOTT                                       
 7839 KING                                                              KING       
 7844                       TURNER                                                 TURNER
 7876            ADAMS                            ADAMS                            
 7902            FORD                  FORD                                        
 7934 MILLER                                      MILLER                           

14 rows selected

Note: if the data needs to be summarized, do not use this double-PIVOT approach, because the query is actually equivalent to nesting one PIVOT clause inside the other.
The previous article had a count + case when statement, as follows:

SQL> 
SQL> select count(case
  2                 when deptno = 10 then
  3                  ename
  4               end) as deptno_10,
  5         count(case
  6                 when deptno = 20 then
  7                  ename
  8               end) as deptno_20,
  9         count(case
 10                 when deptno = 30 then
 11                  ename
 12               end) as deptno_30,
 13         count(case
 14                 when job = 'ANALYST' then
 15                  job
 16               end) as ANALYST,
 17         count(case
 18                 when job = 'CLERK' then
 19                  job
 20               end) as CLERK,
 21         count(case
 22                 when job = 'MANAGER' then
 23                  job
 24               end) as MANAGER,
 25         count(case
 26                 when job = 'PRESIDENT' then
 27                  job
 28               end) as PRESIDENT,
 29         count(case
 30                 when job = 'SALESMAN' then
 31                  job
 32               end) as SALESMAN
 33    from emp;

 DEPTNO_10  DEPTNO_20  DEPTNO_30    ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         3          5          6          2          4          3          1          4

Let's try to rewrite it with PIVOT and see what happens. The PIVOT version of the statement is as follows:

SQL> 
SQL>  select *
  2     from (select  ename, ename as ename2, job, deptno from emp)
  3   pivot(count(ename)
  4      for deptno in(10 as d10, 20 as d20, 30 as d30))
  5   pivot(count(ename2)
  6      for job in('ANALYST' as ANALYST,
  7                 'CLERK' as CLERK,
  8                 'MANAGER' as MANAGER,
  9                 'PRESIDENT' as PRESIDENT,
 10                 'SALESMAN' as SALESMAN
 11                 ));

       D10        D20        D30    ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         0          0          1          0          1          1          0          4
         0          1          0          2          2          1          0          0
         1          0          0          0          1          1          1          0

SQL> 

As you can see, the result is inconsistent with the case when version. Below we break the query into its nested steps for analysis.
Nesting, first step:

SQL> with t as (
  2  select *
  3     from (select  ename, ename as ename2, job, deptno from emp)
  4   pivot(count(ename)
  5      for deptno in(10 as d10, 20 as d20, 30 as d30))
  6  )
  7  select * from t;

ENAME2     JOB              D10        D20        D30
---------- --------- ---------- ---------- ----------
FORD       ANALYST            0          1          0
KING       PRESIDENT          1          0          0
WARD       SALESMAN           0          0          1
ADAMS      CLERK              0          1          0
ALLEN      SALESMAN           0          0          1
BLAKE      MANAGER            0          0          1
CLARK      MANAGER            1          0          0
JAMES      CLERK              0          0          1
JONES      MANAGER            0          1          0
SCOTT      ANALYST            0          1          0
SMITH      CLERK              0          1          0
MARTIN     SALESMAN           0          0          1
MILLER     CLERK              1          0          0
TURNER     SALESMAN           0          0          1

14 rows selected

The first step is equivalent to group by ename2, job.
Nesting, second step:

SQL> with t as
  2   (select *
  3      from (select ename, ename as ename2, job, deptno from emp)
  4    pivot(count(ename)
  5       for deptno in(10 as d10, 20 as d20, 30 as d30)))
  6  select *
  7    from t
  8  pivot (count(ename2) for job in('ANALYST' as ANALYST,
  9                             'CLERK' as CLERK,
 10                             'MANAGER' as MANAGER,
 11                             'PRESIDENT' as PRESIDENT,
 12                             'SALESMAN' as SALESMAN));

       D10        D20        D30    ANALYST      CLERK    MANAGER  PRESIDENT   SALESMAN
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
         0          0          1          0          1          1          0          4
         0          1          0          2          2          1          0          0
         1          0          0          0          1          1          1          0

SQL> 

Because the columns returned by the first step are (ENAME2, JOB, D10, D20, D30), once (ENAME2, JOB) are consumed by the second PIVOT, what remains is (D10, D20, D30). So the second step is equivalent to group by D10, D20, D30.
But what we want are the counts of the emp table grouped by job and, separately, grouped by department — two independent groupings of the same emp rows — not a further grouped count on top of the result of step 1.
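To make the mismatch concrete, here is a hand-written sketch of what the nested double PIVOT actually computes (again against the EMP demo table): the step-1 result is re-grouped by D10, D20, D30, which is why the output has three rows, one per department pattern, instead of a single row of totals.

-- A sketch of the nested double PIVOT written out by hand:
-- step 1 groups by (ename2, job); step 2 re-groups that result by (d10, d20, d30).
with t as
 (select ename as ename2,
         job,
         count(case when deptno = 10 then ename end) as d10,
         count(case when deptno = 20 then ename end) as d20,
         count(case when deptno = 30 then ename end) as d30
    from emp
   group by ename, job)
select d10, d20, d30,
       count(case when job = 'ANALYST'   then ename2 end) as analyst,
       count(case when job = 'CLERK'     then ename2 end) as clerk,
       count(case when job = 'MANAGER'   then ename2 end) as manager,
       count(case when job = 'PRESIDENT' then ename2 end) as president,
       count(case when job = 'SALESMAN'  then ename2 end) as salesman
  from t
 group by d10, d20, d30;

If what you actually want is one row of totals per department and per job, the count + case when statement above (or the analytic functions in the next section) is the right tool.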

3. Simultaneous aggregation of different groups and partitions

Now there is a requirement: It is required to list the number of employees in the department and position in the detailed data of the employee table.

Before analytic functions, this kind of requirement had to be written by joining the table back to aggregated copies of itself:

SQL> with t as
  2   (select count(*) as cnt from emp),
  3  t1 as
  4   (select deptno, count(*) as dcnt from emp group by deptno),
  5  t2 as
  6   (select job, count(*) as jcnt from emp group by job)
  7  select emp.ename,
  8         emp.deptno,
  9         t1.dcnt,
 10         emp.job,
 11         t2.jcnt,
 12         (select * from t) as cnt
 13    from emp
 14   inner join t1
 15      on (emp.deptno = t1.deptno)
 16   inner join t2
 17      on (emp.job = t2.job);

ENAME      DEPTNO       DCNT JOB             JCNT        CNT
---------- ------ ---------- --------- ---------- ----------
FORD           20          5 ANALYST            2         14
SCOTT          20          5 ANALYST            2         14
MILLER         10          3 CLERK              4         14
JAMES          30          6 CLERK              4         14
ADAMS          20          5 CLERK              4         14
SMITH          20          5 CLERK              4         14
CLARK          10          3 MANAGER            3         14
BLAKE          30          6 MANAGER            3         14
JONES          20          5 MANAGER            3         14
KING           10          3 PRESIDENT          1         14
TURNER         30          6 SALESMAN           4         14
MARTIN         30          6 SALESMAN           4         14
WARD           30          6 SALESMAN           4         14
ALLEN          30          6 SALESMAN           4         14

14 rows selected


SQL> 

Take a look at the execution plan:

 Plan Hash Value  : 

------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows | Bytes | Cost | Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |   14 |   868 |   12 | 00:00:01 |
|   1 |   VIEW                  |           |    1 |    13 |    1 | 00:00:01 |
|   2 |    SORT AGGREGATE       |           |    1 |       |      |          |
|   3 |     INDEX FULL SCAN     | IDX_EMPNO |   15 |       |    1 | 00:00:01 |
| * 4 |   HASH JOIN             |           |   14 |   868 |   11 | 00:00:01 |
| * 5 |    HASH JOIN            |           |   13 |   559 |    7 | 00:00:01 |
|   6 |     VIEW                |           |    3 |    78 |    4 | 00:00:01 |
|   7 |      SORT GROUP BY      |           |    3 |     9 |    4 | 00:00:01 |
|   8 |       TABLE ACCESS FULL | EMP       |   15 |    45 |    3 | 00:00:01 |
| * 9 |     TABLE ACCESS FULL   | EMP       |   13 |   221 |    3 | 00:00:01 |
|  10 |    VIEW                 |           |    5 |    95 |    4 | 00:00:01 |
|  11 |     SORT GROUP BY       |           |    5 |    40 |    4 | 00:00:01 |
|  12 |      TABLE ACCESS FULL  | EMP       |   15 |   120 |    3 | 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------
* 4 - access("EMP"."JOB"="T2"."JOB")
* 5 - access("EMP"."DEPTNO"="T1"."DEPTNO")
* 9 - filter("EMP"."JOB" IS NOT NULL AND "EMP"."DEPTNO" IS NOT NULL)

This way of writing is more complicated, and it accesses the emp table four times (since I built an index, one of those accesses goes through the index rather than the table itself).
If you use analytic functions instead, the statement is much simpler:

SQL> select emp.ename,
  2         emp.deptno,
  3         count(*) over(partition by deptno) dcnt,
  4         emp.job,
  5         count(*) over(partition by job) jcnt,
  6         count(*) over() as cnt
  7    from emp
  8  ;

ENAME      DEPTNO       DCNT JOB             JCNT        CNT
---------- ------ ---------- --------- ---------- ----------
MILLER         10          3 CLERK              4         14
KING           10          3 PRESIDENT          1         14
CLARK          10          3 MANAGER            3         14
SMITH          20          5 CLERK              4         14
SCOTT          20          5 ANALYST            2         14
ADAMS          20          5 CLERK              4         14
FORD           20          5 ANALYST            2         14
JONES          20          5 MANAGER            3         14
WARD           30          6 SALESMAN           4         14
MARTIN         30          6 SALESMAN           4         14
TURNER         30          6 SALESMAN           4         14
ALLEN          30          6 SALESMAN           4         14
JAMES          30          6 CLERK              4         14
BLAKE          30          6 MANAGER            3         14

14 rows selected

Look at the execution plan:

 Plan Hash Value  : 4086863039 

----------------------------------------------------------------------
| Id | Operation             | Name | Rows | Bytes | Cost | Time     |
----------------------------------------------------------------------
|  0 | SELECT STATEMENT      |      |   15 |   255 |    5 | 00:00:01 |
|  1 |   WINDOW SORT         |      |   15 |   255 |    5 | 00:00:01 |
|  2 |    WINDOW SORT        |      |   15 |   255 |    5 | 00:00:01 |
|  3 |     TABLE ACCESS FULL | EMP  |   15 |   255 |    3 | 00:00:01 |
----------------------------------------------------------------------

From the execution plan's point of view, the table is scanned only once.
But didn't the previous two articles keep telling you to be cautious with analytic functions? Why am I recommending them here?
When you run into a situation like this, where the same table is accessed multiple times, try to see whether the query can be rewritten with analytic functions and how efficient the rewrite is. If, as here, analyzing the execution plan shows an obvious performance improvement, then of course analytic functions suit your scenario. And the most important point of all: don't forget to verify the data after rewriting! This is extremely important.
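As a minimal sketch of that verification step (using the EMP demo table and the two queries above), you can compare the two result sets with MINUS in both directions; both counts should be 0:

-- A sketch of verifying the rewrite: old_way is the self-join version,
-- new_way is the analytic-function version; both MINUS directions should be empty.
with old_way as
 (select emp.ename,
         emp.deptno,
         t1.dcnt,
         emp.job,
         t2.jcnt,
         (select count(*) from emp) as cnt
    from emp
   inner join (select deptno, count(*) as dcnt from emp group by deptno) t1
      on emp.deptno = t1.deptno
   inner join (select job, count(*) as jcnt from emp group by job) t2
      on emp.job = t2.job),
new_way as
 (select ename,
         deptno,
         count(*) over(partition by deptno) as dcnt,
         job,
         count(*) over(partition by job) as jcnt,
         count(*) over() as cnt
    from emp)
select (select count(*)
          from (select * from old_way minus select * from new_way)) as in_old_not_in_new,
       (select count(*)
          from (select * from new_way minus select * from old_way)) as in_new_not_in_old
  from dual;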


Summary

The main content of this article: using row-to-column conversion to show the distribution of personnel across jobs (each job displayed as a column, each employee as a row), the issues to watch out for when chaining row-to-column conversions, and how to aggregate over different groups (partitions) at the same time — listing the employee counts per department and per job alongside the detail rows of the employee table — verified through the execution plan.
