[SQL Development Practical Skills] Series (34): Data Warehouse Report Scenario ☞ How to classify data into tiers and convert them into parallel columns

Series Article Directory

[SQL development practical skills] series (1): those things that have to be said about SQL
[SQL development practical skills] series (2): simple single table query
[SQL development practical skills] series (3): those things about SQL sorting
[SQL Development Practical Skills] Series (4): Discuss the Precautions for Using UNION ALL and Empty String & UNION and OR from the Execution Plan
[SQL Development Practical Skills] Series (5): Look at the Efficiency of IN, EXISTS and INNER JOIN from the Execution Plan , we need to divide the scenarios and don’t memorize the online conclusion
[SQL Development Practical Skills] series (6): Look at the efficiency of NOT IN, NOT EXISTS and LEFT JOIN from the execution plan, and remember that internal and external association conditions should not be misplaced
[SQL Development Practical Skills] Series (7): How to compare the differing data, and the corresponding record counts, in two tables that contain duplicate data
[SQL Development Practical Skills] Series (8): How to insert data, how to restrict inserts more flexibly than with constraints, and how one insert statement can insert into multiple tables at the same time
[SQL development practical skills] series (9): An update mistakenly updates other column data to be empty? Merge rewrite update! Give you five ways to delete duplicate data!
[SQL Development Practical Skills] Series (10): Starting from splitting strings, replacing strings, and counting the number of occurrences of strings
[SQL Development Practical Skills] Series (11): Take a few cases to talk about translate|regexp_replace| listagg|wmsys.wm_concat|substr|regexp_substr Commonly used functions
[SQL development practical skills] series (12): Three questions (how to sort the strings in alphabetical order after deduplicating the letters of the string? How to identify which strings contain numbers? How to convert delimited data into a multivalued IN list?)
[SQL Development Practical Skills] Series (13): Discuss common aggregate functions & see sum() over () through the execution plan to accumulate employee wages
[SQL Development Practical Skills] Series (14): Calculate the balance after consumption &Calculate the cumulative sum of bank turnover & calculate the top three employees in each department's salary
[SQL development practical skills] series (15): Find the data of the row where the extreme value is located and quickly calculate the sum of max/min() keep() over(), first_value, last_value, ratio_to_report
[SQL development practical skills] series (16): time type operations in data warehouse (primary): day, month, year, hour, minute, second difference and time interval calculation
[SQL development practical skills] series (17): time type operations in data warehouse (primary): determine the number of working days between two dates, calculate the number of occurrences of each day of the week in a year, and determine the number of days between the current record and the next record
[SQL development practical skills] series (18): time type operations in data warehouse (advanced): INTERVAL, EXTRACT, how to determine whether a year is a leap year, and the calculation of the week
[SQL development practical skills] series (19): time type operations in data warehouse (advanced): how to print the calendar of the current month or year with one SQL statement? How to determine the date of the first and last day of the week in a month?
[SQL Development Practical Skills] Series (20): Time Type Operations in Data Warehouse (Advanced) Obtain Quarter Start and End Time and How to Count Discontinuous Time Data
[SQL Development Practical Skills] Series (21): Data Time type operations in the warehouse (advanced) Identify overlapping date ranges, and summarize data at specified 10-minute intervals
[SQL development practical skills] series (22): Data warehouse report scenario ☞ Are analytic functions always fast? A discussion of result-set paging and every-other-row sampling
[SQL Development Practical Skills] Series (23): Data Warehouse Report Scenario ☞ How to de-duplicate permutations of data and how to find the records containing the maximum and minimum values? The execution plan shows again that analytic functions are not necessarily faster
[SQL development practical skills] series (24): Data warehouse report scenario ☞ Detailed explanation of "row to column" and "column to row" through cases and execution plans
[SQL development practical skills] series (25): Data warehouse report scenario ☞ Display duplicate data in the result set only once, an efficient way to calculate the salary difference within a department, and how to quickly group data
[SQL development practical skills] series (26): Data warehouse report scenario ☞ chat How ROLLUP and UNION ALL perform group totals respectively and how to identify which rows are the result rows for summary
[SQL Development Practical Skills] Series (27): Data Warehouse Report Scenario ☞ The windowing principle of analytic functions explained in detail through moving-range aggregation, and how to print the 9x9 multiplication table with one SQL statement
[SQL development practical skills] series (28): Data warehouse report scenario ☞ personnel distribution and how to achieve simultaneous aggregation of different groups (partitions)
[SQL development practical skills] series (29): Data warehouse report scenario ☞ simple tree (hierarchical) query and how to determine the root node, branch node and leaf node
[SQL development practical skills] series (30): Data warehouse report scenario ☞ tree How are (hierarchical) queries sorted? And how to correctly use the where condition in the tree query
[SQL development practical skills] series (31): Data warehouse report scenario ☞ Hierarchical query How to query only a certain branch of the tree structure? How to cut off a branch?
[SQL Development Practical Skills] Series (32): Data Warehouse Report Scenario ☞ Deduplicate the value in a field in the table
[SQL Development Practical Skills] Series (33): Data Warehouse Report Scenario ☞ Extract the elements of strings from non-fixed positions and search for data that meets conditions such as letters first and numbers next
[SQL development practical skills] series (34): data warehouse report scenario ☞How to classify data and convert them into columns in parallel



Foreword

This article works through one requirement: rank the employees in emp by salary, put the three highest salary ranks in one column, the next three ranks in a second column, and all remaining employees in a third column. Two row-to-column methods are given: case when and pivot. The case also shows why implicit (hidden) column information matters.
I write this [SQL Development Practical Skills] series as a review of familiar material. SQL development is a basic and important skill in data analysis, and interviews often ask about SQL development and tuning experience. I expect to learn something from writing the series, and I hope it helps you face SQL interviews with more confidence.


1. Rank the results and convert them into columns

The requirement: rank the employees in emp by salary, put the three highest salary ranks in one column, the next three ranks in a second column, and all remaining employees in a third column.
The solution takes four steps:

1. Generate serial number

Rows with the same salary (for example, 3000) should receive the same number, and a tie should not use up the following number, so use dense_rank to generate the serial number:

with t as (
select ename,sal,dense_rank()over(order by sal desc) as rn
from emp
)
select * from t;
ENAME            SAL         RN
---------- --------- ----------
KING         5000.00          1
FORD         3000.00          2
SCOTT        3000.00          2
JONES        2975.00          3
BLAKE        2850.00          4
CLARK        2450.00          5
ALLEN        1600.00          6
TURNER       1500.00          7
MILLER       1300.00          8
WARD         1250.00          9
MARTIN       1250.00          9
ADAMS        1100.00         10
JAMES         950.00         11
SMITH         800.00         12

14 rows selected

The query results show that FORD and SCOTT both have a sal of 3000 and share the serial number rn = 2, while JONES, the next salary down, gets rn = 3 with no gap.
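To see why dense_rank fits here, it can help to put the three numbering functions side by side. The sketch below is an illustration, not part of the original solution; emp_sample is a hypothetical inline data set holding only the four highest-paid rows, so it runs without the emp table.

```sql
-- Assumption: emp_sample is a made-up inline sample, not the real emp table.
with emp_sample as (
  select 'KING' as ename, 5000 as sal from dual union all
  select 'FORD',  3000 from dual union all
  select 'SCOTT', 3000 from dual union all
  select 'JONES', 2975 from dual
)
select ename,
       sal,
       row_number() over(order by sal desc) as rn_row_number, -- ties get different numbers
       rank()       over(order by sal desc) as rn_rank,       -- ties share a number, the next number is skipped
       dense_rank() over(order by sal desc) as rn_dense_rank  -- ties share a number, no number is skipped
  from emp_sample
 order by sal desc, ename;
```

For JONES, rank() returns 4 because the tie at 3000 consumes position 3, while dense_rank() returns 3. Only dense_rank keeps "the three highest salary ranks" meaningful when salaries tie.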

2. Divide the data into tiers by rule (the thresholds here are chosen arbitrarily)

Divide the data above into three tiers, which can be done with CASE WHEN:

with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp)
select t.*,
       case
         when rn <= 3 then
          1
         when rn <= 6 then
          2
         else
          3
       end as new_rn
  from t;
ENAME            SAL         RN     NEW_RN
---------- --------- ---------- ----------
KING         5000.00          1          1
FORD         3000.00          2          1
SCOTT        3000.00          2          1
JONES        2975.00          3          1
BLAKE        2850.00          4          2
CLARK        2450.00          5          2
ALLEN        1600.00          6          2
TURNER       1500.00          7          3
MILLER       1300.00          8          3
WARD         1250.00          9          3
MARTIN       1250.00          9          3
ADAMS        1100.00         10          3
JAMES         950.00         11          3
SMITH         800.00         12          3

14 rows selected
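A quick sanity check on the tiers is to count the rows in each one. This is a minimal sketch that assumes the same emp table as above; the cnt, min_sal and max_sal aliases are mine.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t)
-- One row per tier: how many employees it holds and its salary range
select new_rn,
       count(*) as cnt,
       min(sal) as min_sal,
       max(sal) as max_sal
  from t1
 group by new_rn
 order by new_rn;
```

With the data shown above, the tiers hold 4, 3 and 7 rows. The largest tier (7 rows) determines how many rows the final side-by-side report will have.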

3. Regenerate serial numbers within each of the three tiers

Each tier gets its own numbering starting from 1, so that rows with the same serial number can be merged into a single row when the rows are converted into columns:

with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t)
select t1.*, row_number() over(partition by new_rn order by sal) as flag
  from t1;
ENAME            SAL         RN     NEW_RN       FLAG
---------- --------- ---------- ---------- ----------
JONES        2975.00          3          1          1
FORD         3000.00          2          1          2
SCOTT        3000.00          2          1          3
KING         5000.00          1          1          4
ALLEN        1600.00          6          2          1
CLARK        2450.00          5          2          2
BLAKE        2850.00          4          2          3
SMITH         800.00         12          3          1
JAMES         950.00         11          3          2
ADAMS        1100.00         10          3          3
MARTIN       1250.00          9          3          4
WARD         1250.00          9          3          5
MILLER       1300.00          8          3          6
TURNER       1500.00          7          3          7

14 rows selected
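Because the numbering above orders by sal only, tied salaries (FORD/SCOTT, MARTIN/WARD) may be numbered in either order from one run to the next. The variation below is a sketch, not the article's original query: it adds ename as a tiebreaker and sorts highest salary first within each tier, which are my assumptions about what a report might want.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t)
select t1.*,
       -- sal desc: highest salary gets flag 1 within each tier
       -- ename:    tiebreaker so that equal salaries are numbered deterministically
       row_number() over(partition by new_rn order by sal desc, ename) as flag
  from t1
 order by new_rn, flag;
```

With this ordering, tier 1 is numbered KING, FORD, SCOTT, JONES from 1 to 4.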

4. Perform "row-to-column" conversion according to the last generated "grouping" column

with t as --1. Rank the data by salary
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as --2. Split the data into three tiers by rank
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t),
t2 as --3. Renumber the rows within each tier so that rows with the same number can be aggregated into one output row
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
--4. Convert rows to columns (the aliases 第一档, 第二档, 第三档 mean tier 1, tier 2, tier 3)
select max(case new_rn
             when 1 then
              ename || '(' || sal || ')'
           end) as 第一档,
       max(case new_rn
             when 2 then
              ename || '(' || sal || ')'
           end) as 第二档,
       max(case new_rn
             when 3 then
              ename || '(' || sal || ')'
           end) as 第三档
  from t2
 group by flag
 order by flag;
第一档                                                                           第二档                                                                           第三档
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- --------------------------------------------------------------------------------
JONES(2975)                                                                      ALLEN(1600)                                                                      SMITH(800)
FORD(3000)                                                                       CLARK(2450)                                                                      JAMES(950)
SCOTT(3000)                                                                      BLAKE(2850)                                                                      ADAMS(1100)
KING(5000)                                                                                                                                                        MARTIN(1250)
                                                                                                                                                                  WARD(1250)
                                                                                                                                                                  MILLER(1300)
                                                                                                                                                                  TURNER(1500)

7 rows selected
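The max() in this query is not really choosing a maximum. Within one flag group, each tier contributes at most one row, so max() simply returns that single non-null value. A hedged way to confirm this assumption is the check below, which should return no rows; the cnt alias is mine.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t),
t2 as
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
-- Any (tier, flag) pair that appears more than once would make max() drop data
select new_rn, flag, count(*) as cnt
  from t2
 group by new_rn, flag
having count(*) > 1;
```

row_number() is unique within each partition, so no (new_rn, flag) pair can repeat. If the numbering were produced by rank() or dense_rank() instead, ties would collide and max() would silently keep only one name.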

The query above uses case when. The same result can be written with pivot:

with t as --1. Rank the data by salary
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as --2. Split the data into three tiers by rank
 (select t.*,
         case
           when rn <= 3 then
            1
           when rn <= 6 then
            2
           else
            3
         end as new_rn
    from t),
t2 as --3. Renumber the rows within each tier so that rows with the same number can be aggregated into one output row
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
--4. Convert rows to columns (the aliases 第一档, 第二档, 第三档 mean tier 1, tier 2, tier 3)
select max(第一档), max(第二档), max(第三档)
  from (select ename || '(' || sal || ')' as enames, new_rn, flag from t2)
pivot (max(enames) for new_rn in(1 as 第一档,
                            2 as 第二档,
                            3 as 第三档
                            ))
 group by flag
 order by flag;
MAX(第一档)                                                                      MAX(第二档)                                                                      MAX(第三档)
-------------------------------------------------------------------------------- -------------------------------------------------------------------------------- --------------------------------------------------------------------------------
JONES(2975)                                                                      ALLEN(1600)                                                                      SMITH(800)
FORD(3000)                                                                       CLARK(2450)                                                                      JAMES(950)
SCOTT(3000)                                                                      BLAKE(2850)                                                                      ADAMS(1100)
KING(5000)                                                                                                                                                        MARTIN(1250)
                                                                                                                                                                  WARD(1250)
                                                                                                                                                                  MILLER(1300)
                                                                                                                                                                  TURNER(1500)

7 rows selected
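Oracle's PIVOT already groups implicitly by every column that the pivot clause does not reference, which here is only flag. Under that rule the outer max() and group by are optional, so the query can be sketched more compactly as below. The ASCII aliases tier_1, tier_2 and tier_3 are my own renaming, used to avoid depending on the database character set.

```sql
with t as
 (select ename, sal, dense_rank() over(order by sal desc) as rn from emp),
t1 as
 (select t.*,
         case
           when rn <= 3 then 1
           when rn <= 6 then 2
           else 3
         end as new_rn
    from t),
t2 as
 (select t1.*, row_number() over(partition by new_rn order by sal) as flag
    from t1)
select tier_1, tier_2, tier_3
  -- The inline view exposes only enames, new_rn and flag,
  -- so flag is the only column left for PIVOT to group by.
  from (select ename || '(' || sal || ')' as enames, new_rn, flag from t2)
pivot (max(enames) for new_rn in (1 as tier_1,
                                  2 as tier_2,
                                  3 as tier_3))
 order by flag;
```

The inline view matters: if it also selected ename or sal, those columns would join the implicit grouping and each employee would land on a separate row. The order by flag makes the row order explicit rather than leaving it to the execution plan.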

The row numbers generated after sorting are implicit information: they are not stored in the table, yet complex queries often depend on them. For this kind of query, once you know which piece of implicit information you need, you are halfway there!


Summary

This article worked through one requirement: rank the employees in emp by salary, put the three highest salary ranks in one column, the next three ranks in a second column, and all remaining employees in a third column. Two row-to-column methods were given, case when and pivot, and the case shows why implicit (hidden) column information such as generated row numbers matters.

Origin blog.csdn.net/qq_28356739/article/details/129901671