Foreword

This article covers one requirement: grade the employees in emp by salary, then show the top three salary ranks in one column, the next three ranks in a second column, and all remaining ranks in a third column. Two row-to-column methods are given, CASE WHEN and PIVOT, and the case also shows how important implicit (hidden) column information is.
The [SQL Development Practical Skills] series is the author's review of old knowledge. SQL development is important and fundamental in data analysis work, and interviews often ask about SQL development and tuning experience. I expect to gain something from writing this series, and I hope it helps you face SQL interviews with ease.

1. Rank the results and convert them into columns

The requirement again: grade the rows in emp by salary, with the top three salary ranks shown in one column, the next three ranks in a second column, and the rest in a third column.
The solution proceeds in four steps:

1. Generate a serial number

Rows with the same salary (for example 3000) must get the same number, and a tie must not consume the position that follows it, so use dense_rank to generate the serial number:

with t as (
select ename,sal,dense_rank()over(order by sal desc) as rn
from emp
)
select * from t;
ENAME            SAL         RN
---------- --------- ----------
KING         5000.00          1
FORD         3000.00          2
SCOTT        3000.00          2
JONES        2975.00          3
BLAKE        2850.00          4
CLARK        2450.00          5
ALLEN        1600.00          6
TURNER       1500.00          7
MILLER       1300.00          8
WARD         1250.00          9
MARTIN       1250.00          9
ADAMS        1100.00         10
JAMES         950.00         11
SMITH         800.00         12

14 rows selected

The query results show that FORD and SCOTT both have a sal of 3000 and share the serial number rn = 2, and that JONES follows immediately with rn = 3.
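To see why dense_rank fits here, it can help to compare it with row_number and rank side by side. The following is a minimal sketch, assuming an Oracle database; emp_sample is a hypothetical four-row stand-in whose values are copied from the output above, not a real table.

```sql
-- emp_sample is a hypothetical inline data set, not part of the original schema
with emp_sample as (
  select 'KING' as ename, 5000 as sal from dual union all
  select 'FORD',  3000 from dual union all
  select 'SCOTT', 3000 from dual union all
  select 'JONES', 2975 from dual
)
select ename,
       sal,
       row_number() over (order by sal desc) as by_row_number, -- tied rows get different numbers
       rank()       over (order by sal desc) as by_rank,       -- tied rows share a number, the next number is skipped
       dense_rank() over (order by sal desc) as by_dense_rank  -- tied rows share a number, no gap follows
  from emp_sample
 order by sal desc, ename;
```

With this data, rank gives JONES the number 4 because the tie between FORD and SCOTT uses up position 3, while dense_rank gives JONES the number 3. Only dense_rank keeps "the top three ranks" meaningful when salaries are tied.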

2. Split the data into tiers (the thresholds used here are arbitrary)

The data above can be divided into three tiers with CASE WHEN:

with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp)
select t.*,
       case
         when rn <= 3 then
          1
         when rn <= 6 then
          2
         else
          3
       end as new_rn
  from t;
ENAME            SAL         RN     NEW_RN
---------- --------- ---------- ----------
KING         5000.00          1          1
FORD         3000.00          2          1
SCOTT        3000.00          2          1
JONES        2975.00          3          1
BLAKE        2850.00          4          2
CLARK        2450.00          5          2
ALLEN        1600.00          6          2
TURNER       1500.00          7          3
MILLER       1300.00          8          3
WARD         1250.00          9          3
MARTIN       1250.00          9          3
ADAMS        1100.00         10          3
JAMES         950.00         11          3
SMITH         800.00         12          3

14 rows selected
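Because the tiers here are all three ranks wide, the same split could also be computed arithmetically. This is only an alternative sketch, assuming the same emp table and an Oracle database; the CASE WHEN form above stays the more flexible choice when the thresholds are uneven.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp)
select t.*,
       -- ceil(rn / 3) maps ranks 1-3 to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4, ...
       -- least(..., 3) folds every tier beyond the third into tier 3
       least(ceil(rn / 3), 3) as new_rn
  from t
 order by rn;
```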

3. Regenerate serial numbers within each of the three tiers

This way, rows from different tiers that share a serial number can be collapsed into a single row when rows are converted to columns:

with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t)
select t1.*, row_number() over(partition by new_rn order by sal) as flag
  from t1;
ENAME            SAL         RN     NEW_RN       FLAG
---------- --------- ---------- ---------- ----------
JONES        2975.00          3          1          1
FORD         3000.00          2          1          2
SCOTT        3000.00          2          1          3
KING         5000.00          1          1          4
ALLEN        1600.00          6          2          1
CLARK        2450.00          5          2          2
BLAKE        2850.00          4          2          3
SMITH         800.00         12          3          1
JAMES         950.00         11          3          2
ADAMS        1100.00         10          3          3
MARTIN       1250.00          9          3          4
WARD         1250.00          9          3          5
MILLER       1300.00          8          3          6
TURNER       1500.00          7          3          7

14 rows selected
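One detail worth noting: row_number with order by sal alone does not define which of two tied rows (FORD and SCOTT, or MARTIN and WARD) comes first, so their flag values could swap between runs. A minimal sketch of a deterministic variant follows, assuming that ename is an acceptable tie-breaker; that choice is an assumption made here, not part of the original requirement.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t)
-- ename is added as a second sort key so that tied salaries always get the same flag
select t1.*, row_number() over(partition by new_rn order by sal, ename) as flag
  from t1
 order by new_rn, flag;
```

With the emp data shown above, this produces the same flag values as the listing, since FORD sorts before SCOTT and MARTIN before WARD.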

4. Perform "row-to-column" conversion according to the last generated "grouping" column

with t as --1. Rank the data by salary
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as --2. Split the data into three tiers by rank
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t),
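t2 as --3. Renumber the rows within each tier so that rows with the same number can be aggregated into one row
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
--4. Convert rows to columns (第一档 / 第二档 / 第三档 = tier 1 / tier 2 / tier 3)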
select max(case new_rn
             when 1 then
              ename || '(' || sal || ')'
           end) as 第一档,
       max(case new_rn
             when 2 then
              ename || '(' || sal || ')'
           end) as 第二档,
       max(case new_rn
             when 3 then
              ename || '(' || sal || ')'
           end) as 第三档
  from t2
 group by flag
 order by flag;
第一档                                                                           第二档                                                                           第三档
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- --------------------------------------------------------------------------------
JONES(2975)                                                                      ALLEN(1600)                                                                      SMITH(800)
FORD(3000)                                                                       CLARK(2450)                                                                      JAMES(950)
SCOTT(3000)                                                                      BLAKE(2850)                                                                      ADAMS(1100)
KING(5000)                                                                                                                                                        MARTIN(1250)
                                                                                                                                                                  WARD(1250)
                                                                                                                                                                  MILLER(1300)
                                                                                                                                                                  TURNER(1500)

7 rows selected
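The reason max(case ...) with group by flag works may be easier to see before the aggregation is applied. The sketch below uses a hypothetical hand-written subset, t2_sample, with labels copied from the result above; the column names tier_1 to tier_3 are also just illustrative.

```sql
-- t2_sample is a hypothetical stand-in for a few rows of t2
with t2_sample as (
  select 'JONES(2975)' as label, 1 as new_rn, 1 as flag from dual union all
  select 'ALLEN(1600)',  2, 1 from dual union all
  select 'SMITH(800)',   3, 1 from dual union all
  select 'KING(5000)',   1, 4 from dual union all
  select 'MARTIN(1250)', 3, 4 from dual
)
select flag,
       case new_rn when 1 then label end as tier_1, -- NULL unless the row belongs to tier 1
       case new_rn when 2 then label end as tier_2, -- NULL unless the row belongs to tier 2
       case new_rn when 3 then label end as tier_3  -- NULL unless the row belongs to tier 3
  from t2_sample
 order by flag, new_rn;
```

Each row carries a value in exactly one of the three columns. Since max ignores NULLs, grouping by flag collapses the rows of one flag into a single row, and a tier that has no row for that flag (tier 2 for flag 4 here) simply stays empty.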

The query above uses CASE WHEN. The same result can be written with PIVOT:

with t as --1. Rank the data by salary
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as --2. Split the data into three tiers by rank
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t),
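t2 as --3. Renumber the rows within each tier so that rows with the same number can be aggregated into one row
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
--4. Convert rows to columns (第一档 / 第二档 / 第三档 = tier 1 / tier 2 / tier 3)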
select max(第一档), max(第二档), max(第三档)
  from (select ename || '(' || sal || ')' as enames, new_rn, flag from t2)
pivot (max(enames) for new_rn in(1 as 第一档,
                            2 as 第二档,
                            3 as 第三档
                            ))
 group by flag
 order by flag;
MAX(第一档)                                                                      MAX(第二档)                                                                      MAX(第三档)
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- --------------------------------------------------------------------------------
JONES(2975)                                                                      ALLEN(1600)                                                                      SMITH(800)
FORD(3000)                                                                       CLARK(2450)                                                                      JAMES(950)
SCOTT(3000)                                                                      BLAKE(2850)                                                                      ADAMS(1100)
KING(5000)                                                                                                                                                        MARTIN(1250)
                                                                                                                                                                  WARD(1250)
                                                                                                                                                                  MILLER(1300)
                                                                                                                                                                  TURNER(1500)

7 rows selected
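In Oracle, PIVOT already groups implicitly by every column of its source that is not referenced inside the pivot clause, which here is only flag. Under that assumption the outer max and group by are not strictly needed, and the query could be shortened as in the following sketch. It has not been run against the author's database, so treat it as a variant to try rather than a verified replacement.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t),
t2 as
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
-- flag is the only column left outside the pivot clause, so PIVOT returns one row per flag
select 第一档, 第二档, 第三档
  from (select ename || '(' || sal || ')' as enames, new_rn, flag from t2)
pivot (max(enames) for new_rn in (1 as 第一档,
                                  2 as 第二档,
                                  3 as 第三档))
 order by flag;
```

The column headers would then be the plain aliases instead of MAX(...), and the inner subquery still matters: it restricts the pivot source to enames, new_rn and flag so that no other column takes part in the implicit grouping.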

The row numbers generated by sorting are implicit information, and this kind of implicit information is used in many complex queries. Once you know which implicit information a query needs, you are halfway to the solution!

[SQL Development Practical Skills] Series (34): Data Warehouse Report Scenario ☞ How to grade data and convert rows into columns