To follow this article, you need a working understanding of three Hive features. Each one is covered in an earlier post:
1. WITH table_name AS ()
https://blog.csdn.net/u010003835/article/details/105399470
2. FROM table_name (INSERT INTO table_name SELECT a, b)
https://blog.csdn.net/u010003835/article/details/105400140
3. ROLLUP / CUBE / GROUPING SETS (extensions of GROUP BY, not window functions)
https://blog.csdn.net/u010003835/article/details/105353510
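As a quick refresher, the three constructs can be sketched as follows. This is only a minimal illustration: the tables src, dst_a and dst_b and their columns are hypothetical and do not appear elsewhere in this article.

```sql
-- Hypothetical tables, for illustration only:
--   src(a STRING, b STRING, v BIGINT)
--   dst_a(a STRING, total_v BIGINT)
--   dst_b(b STRING, total_v BIGINT)

-- 1. WITH ... AS: name a subquery and use it in the statement that follows.
WITH big AS (
    SELECT a, b, v FROM src WHERE v > 10
)
SELECT a, SUM(v) AS total_v
FROM big
GROUP BY a;

-- 2. FROM ... INSERT: read the source once and write to several tables.
FROM src
INSERT OVERWRITE TABLE dst_a
    SELECT a, SUM(v) GROUP BY a
INSERT OVERWRITE TABLE dst_b
    SELECT b, SUM(v) GROUP BY b;

-- 3. WITH ROLLUP: one GROUP BY that also emits subtotals and a grand total.
SELECT a, b, SUM(v) AS total_v
FROM src
GROUP BY a, b WITH ROLLUP;
```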
Suppose we have the following scenario.
Background:
We have salary data for multiple employees across multiple companies and departments, and we need to total the salaries along several dimensions.
Each dimension has its own result table, so the totals must be written to several result tables in the same pass.
First, we create the source table and the four result tables.
use data_warehouse_test;
CREATE TABLE IF NOT EXISTS datacube_salary_org (
    company_name STRING COMMENT 'company name'
    ,dep_name STRING COMMENT 'department name'
    ,user_id BIGINT COMMENT 'user id'
    ,user_name STRING COMMENT 'user name'
    ,salary DECIMAL(10,2) COMMENT 'salary'
    ,create_time DATE COMMENT 'creation time'
    ,update_time DATE COMMENT 'update time'
)
PARTITIONED BY(
    pt STRING COMMENT 'data partition'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;
CREATE TABLE IF NOT EXISTS datacube_salary_basic_aggr(
    company_name STRING COMMENT 'company name'
    ,dep_name STRING COMMENT 'department name'
    ,user_id BIGINT COMMENT 'user id'
    ,salary DECIMAL(10,2) COMMENT 'salary'
)
STORED AS ORC
;
CREATE TABLE IF NOT EXISTS datacube_salary_dep_aggr(
    company_name STRING COMMENT 'company name'
    ,dep_name STRING COMMENT 'department name'
    ,total_salary DECIMAL(10,2) COMMENT 'salary'
)
STORED AS ORC
;
CREATE TABLE IF NOT EXISTS datacube_salary_company_aggr(
    company_name STRING COMMENT 'company name'
    ,total_salary DECIMAL(10,2) COMMENT 'salary'
)
STORED AS ORC
;
CREATE TABLE IF NOT EXISTS datacube_salary_total_aggr(
    total_salary DECIMAL(10,2) COMMENT 'salary'
)
STORED AS ORC
;
Create a text file containing the following comma-separated rows (the LOAD command further down reads it as org_data.txt):
s.zh,engineer,1,szh,28000.0,2020-04-07,2020-04-07
s.zh,engineer,2,zyq,26000.0,2020-04-03,2020-04-03
s.zh,tester,3,gkm,20000.0,2020-04-07,2020-04-07
x.qx,finance,4,pip,13400.0,2020-04-07,2020-04-07
x.qx,finance,5,kip,24500.0,2020-04-07,2020-04-07
x.qx,finance,6,zxxc,13000.0,2020-04-07,2020-04-07
x.qx,kiccp,7,xsz,8600.0,2020-04-07,2020-04-07
Create the partition and load the file
Create the partition:
ALTER TABLE datacube_salary_org ADD PARTITION (pt = '20200405');
General syntax of LOAD DATA:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Load the data:
LOAD DATA LOCAL INPATH '/opt/hive/my_script/data_warehouse_test/rollup_table/org_data.txt' OVERWRITE INTO TABLE datacube_salary_org PARTITION (pt = '20200405');
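Before aggregating, it can be worth confirming that the load landed in the expected partition. The check below is a small sketch, not part of the original walkthrough. If the file was loaded as shown, the partition list should include pt=20200405 and the query should return the seven rows of the file.

```sql
-- List the partitions of the source table.
SHOW PARTITIONS datacube_salary_org;

-- Read back the rows that were just loaded.
SELECT company_name, dep_name, user_id, user_name, salary
FROM datacube_salary_org
WHERE pt = '20200405'
ORDER BY user_id;
```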
Now we combine the three features listed above:
1. WITH table_name AS ()
2. FROM table_name (INSERT INTO table_name SELECT a, b)
3. ROLLUP / CUBE / GROUPING SETS
The data is organized as company -> department -> employee -> salary.
A single SQL statement aggregates salary at four levels (employee, department, company, and overall) and writes each level to its own result table.
-- Aggregate once with ROLLUP, then route each level to its own table.
-- The grouping__id filters (7, 3, 1, 0) assume the numbering used before Hive 2.3.0.
WITH tmp_mid AS (
    SELECT
        grouping__id
        ,company_name
        ,dep_name
        ,user_id
        ,SUM(salary) AS total_salary
    FROM datacube_salary_org
    -- must match the partition that was loaded above
    WHERE pt = '20200405'
    GROUP BY
        company_name
        ,dep_name
        ,user_id
    WITH ROLLUP
)
FROM tmp_mid
-- employee level: company, department, and user are all grouped
INSERT OVERWRITE TABLE datacube_salary_basic_aggr
SELECT
    company_name
    ,dep_name
    ,user_id
    ,total_salary
WHERE grouping__id = 7
-- department level
INSERT OVERWRITE TABLE datacube_salary_dep_aggr
SELECT
    company_name
    ,dep_name
    ,total_salary
WHERE grouping__id = 3
-- company level
INSERT OVERWRITE TABLE datacube_salary_company_aggr
SELECT
    company_name
    ,total_salary
WHERE grouping__id = 1
-- overall total
INSERT OVERWRITE TABLE datacube_salary_total_aggr
SELECT
    total_salary
WHERE grouping__id = 0
;
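The numeric value of grouping__id for each rollup level depends on the Hive version: the numbering changed in Hive 2.3.0, where the same four levels come out as 0, 1, 3 and 7 instead of 7, 3, 1 and 0. A cautious way to confirm which values your cluster produces is to look at the rollup output directly before relying on the filters. The query below is a sketch for that purpose. The aliases gid and levels are introduced here for illustration, and the partition value is the one loaded earlier.

```sql
-- Inspect which grouping__id value corresponds to which rollup level.
WITH levels AS (
    SELECT
        grouping__id AS gid
        ,company_name
        ,dep_name
        ,user_id
        ,SUM(salary) AS total_salary
    FROM datacube_salary_org
    WHERE pt = '20200405'
    GROUP BY
        company_name
        ,dep_name
        ,user_id
    WITH ROLLUP
)
SELECT gid, company_name, dep_name, user_id, total_salary
FROM levels
ORDER BY gid, company_name, dep_name, user_id;
```

Rows where user_id, dep_name and company_name are all non-NULL belong to the employee level, and the single row where all three are NULL is the overall total. Use the gid values shown for those rows in the WHERE clauses of the multi-insert statement.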
The result tables are:
datacube_salary_basic_aggr: salary at the base level (company, department, employee)
datacube_salary_dep_aggr: salary totals per company and department
datacube_salary_company_aggr: salary totals per company
datacube_salary_total_aggr: overall salary total
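To check the outcome, the result tables can be read back and compared with sums worked out by hand from the seven sample rows. The figures in the comments below are those hand-computed sums, not captured query output. They only apply if the statement read the loaded partition and the grouping__id filters match your Hive version.

```sql
-- Department level. Hand-computed from the sample rows:
--   s.zh / engineer = 28000 + 26000         = 54000
--   s.zh / tester   = 20000
--   x.qx / finance  = 13400 + 24500 + 13000 = 50900
--   x.qx / kiccp    = 8600
SELECT company_name, dep_name, total_salary
FROM datacube_salary_dep_aggr;

-- Company level. Hand-computed: s.zh = 74000, x.qx = 59500
SELECT company_name, total_salary
FROM datacube_salary_company_aggr;

-- Overall total. Hand-computed: 74000 + 59500 = 133500
SELECT total_salary
FROM datacube_salary_total_aggr;
```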