Hive: combining WITH temporary tables, FROM ... INSERT multi-table insertion, and ROLLUP / CUBE / GROUPING SETS aggregation to optimize statistics-writing logic

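1. WITH table_name AS (...): a common table expression used as a temporary table

2. FROM table_name INSERT INTO table_name SELECT a, b: multi-table insertion from a single source

3. ROLLUP / CUBE / GROUPING SETS: GROUP BY extensions for multi-level aggregation (not window functions)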

 

To follow this article, you need a basic understanding of the three points above. Each is covered in an earlier post:

 

1. WITH table_name AS (...)

https://blog.csdn.net/u010003835/article/details/105399470

 

2. FROM table_name INSERT INTO table_name SELECT a, b

https://blog.csdn.net/u010003835/article/details/105400140

 

3. ROLLUP / CUBE / GROUPING SETS

https://blog.csdn.net/u010003835/article/details/105353510

 

 

Suppose we have the following scenario.

Background:

We have salary data for many employees across multiple companies and departments, and we need to total the salaries along several dimensions.

We also have several result tables, one per dimension, so the aggregated data has to be written to all of them as part of the same job.

 

First we create the source table and the four result tables:

use data_warehouse_test;
 
 
CREATE TABLE IF NOT EXISTS datacube_salary_org (
 company_name STRING COMMENT 'Company name'
 ,dep_name STRING COMMENT 'Department name'
 ,user_id BIGINT COMMENT 'User id'
 ,user_name STRING COMMENT 'User name'
 ,salary DECIMAL(10,2) COMMENT 'Salary'
 ,create_time DATE COMMENT 'Creation time'
 ,update_time DATE COMMENT 'Modification time'
) 
PARTITIONED BY(
 pt STRING COMMENT 'Data partition'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
;
 
 
CREATE TABLE IF NOT EXISTS datacube_salary_basic_aggr(
 company_name STRING COMMENT 'Company name'
 ,dep_name STRING COMMENT 'Department name'
 ,user_id BIGINT COMMENT 'User id'
 ,salary DECIMAL(10,2) COMMENT 'Salary'
)
STORED AS ORC
;
 
 
CREATE TABLE IF NOT EXISTS datacube_salary_dep_aggr(
 company_name STRING COMMENT 'Company name'
 ,dep_name STRING COMMENT 'Department name'
 ,total_salary DECIMAL(10,2) COMMENT 'Salary'
)
STORED AS ORC
;
 
 
CREATE TABLE IF NOT EXISTS datacube_salary_company_aggr(
 company_name STRING COMMENT 'Company name'
 ,total_salary DECIMAL(10,2) COMMENT 'Salary'
)
STORED AS ORC
;
 
 
CREATE TABLE IF NOT EXISTS datacube_salary_total_aggr(
 total_salary DECIMAL(10,2) COMMENT 'Salary'
)
STORED AS ORC
;

 

Create a text file (org_data.txt in the LOAD statement further down) with the following contents:

s.zh,engineer,1,szh,28000.0,2020-04-07,2020-04-07
s.zh,engineer,2,zyq,26000.0,2020-04-03,2020-04-03
s.zh,tester,3,gkm,20000.0,2020-04-07,2020-04-07
x.qx,finance,4,pip,13400.0,2020-04-07,2020-04-07
x.qx,finance,5,kip,24500.0,2020-04-07,2020-04-07
x.qx,finance,6,zxxc,13000.0,2020-04-07,2020-04-07
x.qx,kiccp,7,xsz,8600.0,2020-04-07,2020-04-07

Create the partition and load the file

Create the partition:

ALTER TABLE datacube_salary_org ADD PARTITION (pt = '20200405');

General syntax of LOAD DATA:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]  

Load the data:

LOAD DATA LOCAL INPATH '/opt/hive/my_script/data_warehouse_test/rollup_table/org_data.txt'  OVERWRITE INTO TABLE datacube_salary_org PARTITION (pt = '20200405');
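
Before going further, it can help to confirm that the partition exists and that the rows were parsed as expected. The following is only a minimal check, assuming the table and the pt = '20200405' partition created above; the row_cnt alias is our own naming.

```sql
-- List the partitions of the source table. pt=20200405 should appear.
SHOW PARTITIONS datacube_salary_org;

-- Count the rows in the loaded partition (the sample file has 7 lines).
SELECT COUNT(*) AS row_cnt
FROM datacube_salary_org
WHERE pt = '20200405';

-- Look at a few rows to confirm the comma delimiter split the columns correctly.
SELECT
 company_name
 ,dep_name
 ,user_id
 ,user_name
 ,salary
 ,create_time
FROM datacube_salary_org
WHERE pt = '20200405'
LIMIT 10;
```

If the columns come back as NULL, the usual suspect is a delimiter in the file that does not match the one declared in the table definition.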

 

 

 

Now we combine the three techniques above:

1. WITH table_name AS (...)

2. FROM table_name INSERT INTO table_name SELECT a, b

3. ROLLUP / CUBE / GROUPING SETS

The data is organized as a hierarchy: company -> department -> employee -> salary

We compute all four levels (employee, department, company, overall) in a single SQL statement and write each level to its own result table:

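WITH tmp_mid AS (
-- Aggregate once at every ROLLUP level. grouping__id tells the levels apart.
SELECT
 grouping__id
 ,company_name
 ,dep_name
 ,user_id
 ,SUM(salary) AS total_salary
FROM datacube_salary_org
-- Read the partition that was created and loaded above.
WHERE pt = '20200405'
GROUP BY 
 company_name
 ,dep_name
 ,user_id 
 WITH ROLLUP
)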

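-- A single FROM clause feeds all four inserts. Each insert keeps one ROLLUP level.
-- The grouping__id constants below follow the numbering used by Hive releases
-- before 2.3.0. Newer releases number the levels differently, so verify the
-- values on your version before relying on them.
FROM tmp_mid

-- Employee level: grouped by company, department and user
INSERT OVERWRITE TABLE datacube_salary_basic_aggr
SELECT
 company_name
 ,dep_name
 ,user_id
 ,total_salary 
WHERE grouping__id = 7

-- Department level: grouped by company and department
INSERT OVERWRITE TABLE datacube_salary_dep_aggr
SELECT 
 company_name
 ,dep_name
 ,total_salary 
WHERE grouping__id = 3

-- Company level: grouped by company only
INSERT OVERWRITE TABLE datacube_salary_company_aggr
SELECT
 company_name
 ,total_salary
WHERE grouping__id = 1

-- Overall level: the grand total
INSERT OVERWRITE TABLE datacube_salary_total_aggr
SELECT
 total_salary
WHERE grouping__id = 0
;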
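
The constants 7, 3, 1 and 0 in the WHERE clauses depend on how the Hive release in use numbers grouping__id. As far as we know, the numbering changed in Hive 2.3.0, where a fully grouped row gets 0 and the grand total gets the largest value, so it is worth inspecting the mapping before running the multi-insert. The sketch below runs the same ROLLUP on its own, assuming the partition loaded earlier, and returns every aggregated row together with its grouping__id.

```sql
-- Run the ROLLUP alone to see which grouping__id each level receives.
-- How to read the output:
--   all three columns non-NULL       -> employee level
--   user_id is NULL                  -> department level
--   dep_name and user_id are NULL    -> company level
--   all three columns NULL           -> overall total
SELECT
 grouping__id
 ,company_name
 ,dep_name
 ,user_id
 ,SUM(salary) AS total_salary
FROM datacube_salary_org
WHERE pt = '20200405'
GROUP BY
 company_name
 ,dep_name
 ,user_id
 WITH ROLLUP
;
```

If the values printed for the four levels are not 7, 3, 1 and 0, change the constants in the multi-insert to match before running it.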

 

The four result tables are:

datacube_salary_basic_aggr: base-level salary statistics (company, department, employee)

datacube_salary_dep_aggr: salary statistics per company and department

datacube_salary_company_aggr: salary statistics per company

datacube_salary_total_aggr: overall salary total
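
Because every level is derived from the same source rows, the salary totals recomputed from the four result tables should agree. The following is a quick cross-check, assuming the multi-insert has already run successfully; the agg_level labels and the salary_sum alias are our own and exist only for display.

```sql
-- Recompute the grand total from each result table.
-- The four salary_sum values should be identical. If one of them is NULL,
-- the matching grouping__id filter probably selected no rows.
SELECT
 t.agg_level
 ,t.salary_sum
FROM (
 SELECT 'employee' AS agg_level, SUM(salary) AS salary_sum
 FROM datacube_salary_basic_aggr
 UNION ALL
 SELECT 'department' AS agg_level, SUM(total_salary) AS salary_sum
 FROM datacube_salary_dep_aggr
 UNION ALL
 SELECT 'company' AS agg_level, SUM(total_salary) AS salary_sum
 FROM datacube_salary_company_aggr
 UNION ALL
 SELECT 'overall' AS agg_level, SUM(total_salary) AS salary_sum
 FROM datacube_salary_total_aggr
) t
;
```

For the sample file, the seven salaries add up to 133500, so that is the figure each row should show.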

 

 

 

 


Origin blog.csdn.net/u010003835/article/details/105394136