Hive data analysis (5): GROUPING SETS, GROUPING__ID, CUBE, ROLLUP (window function series)

Problem solved: an ordinary GROUP BY aggregates along only one combination of dimensions. Aggregating along several combinations normally means writing one query per combination and joining them with UNION ALL. The GROUPING SETS family of functions does this in a single query. For example, GROUPING SETS (month, day, (month, day)) produces three aggregations: by month, by day, and by (month, day).
Overview:
GROUPING SETS, GROUPING__ID, CUBE, ROLLUP
These functions are typically used in OLAP analysis for metrics that are not additive, i.e. that cannot be obtained by summing finer-grained results, and that must be computed while drilling down and rolling up across dimensions, such as UV (unique visitor) counts per minute, hour, day, and month.
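To see why UV is not additive, here is a small Python sketch over the sample rows of cookie5.txt listed below, hard-coded as tuples. The helper name `uv_by` is illustrative only and not part of Hive. Summing the per-day UVs overcounts April, because several cookies visit on more than one day.

```python
# Rows of cookie5.txt as (month, day, cookieid) tuples, copied from the sample data
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"),
    ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"),
    ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"),
    ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"),
    ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"),
    ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"),
    ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"),
    ("2015-04", "2015-04-16", "cookie1"),
]


def uv_by(rows, key):
    """COUNT(DISTINCT cookieid) grouped by key(row); returns {group: uv}."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), set()).add(row[2])
    return {group: len(cookies) for group, cookies in groups.items()}


daily = uv_by(ROWS, key=lambda r: (r[0], r[1]))  # UV per (month, day)
monthly = uv_by(ROWS, key=lambda r: r[0])         # UV per month, computed directly

# Wrong way: add up the daily UVs within each month
summed = {}
for (month, _day), uv in daily.items():
    summed[month] = summed.get(month, 0) + uv

print(summed)   # {'2015-03': 5, '2015-04': 9}
print(monthly)  # {'2015-03': 5, '2015-04': 6}
```

The monthly UV therefore has to be computed from the raw rows, which is exactly what a separate grouping set does.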
Sample data, cookie5.txt:

2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
drop table if exists cookie5;
create table cookie5(month string, day string, cookieid string) 
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie5.txt" into table cookie5;
select * from cookie5;


GROUPING SETS aggregates over several different combinations of dimensions within one GROUP BY query. The result is equivalent to a UNION ALL of ordinary GROUP BY queries, one per dimension set. GROUPING__ID indicates which grouping set each result row belongs to.

Example
Aggregate separately by month and by day:

 select 
  month,
  day,
  count(distinct cookieid) as uv,
  GROUPING__ID
from cookie5 
group by month,day 
grouping sets (month,day) 
order by GROUPING__ID;

Equivalent to

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
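The UNION ALL equivalence above can be checked without a Hive cluster. The Python sketch below is an illustration only: the function name `grouping_sets` is made up, and `None` stands in for Hive's NULL. It runs one GROUP BY per grouping set and concatenates the results, which is what the rewritten query does.

```python
# Rows of cookie5.txt as (month, day, cookieid) tuples
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"),
    ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"),
    ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"),
    ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"),
    ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"),
    ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"),
    ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"),
    ("2015-04", "2015-04-16", "cookie1"),
]
KEYS = ("month", "day")  # the GROUP BY column list


def grouping_sets(rows, sets):
    """Emulate GROUP BY month, day GROUPING SETS (...) with COUNT(DISTINCT cookieid).

    Each grouping set becomes its own GROUP BY; columns outside the set are
    emitted as None (NULL), and the per-set results are concatenated (UNION ALL).
    """
    result = []
    for gset in sets:
        groups = {}
        for month, day, cookie in rows:
            values = {"month": month, "day": day}
            key = tuple(values[k] if k in gset else None for k in KEYS)
            groups.setdefault(key, set()).add(cookie)
        result.extend(key + (len(cookies),) for key, cookies in groups.items())
    return result


for row in grouping_sets(ROWS, [("month",), ("day",)]):
    print(row)
```

The month rows come out as ('2015-03', None, 5) and ('2015-04', None, 6), followed by one row per day. Passing `[("month",), ("day",), ("month", "day")]` reproduces the three-set query of the next example. Row order here follows first appearance in the data, whereas the Hive query orders by GROUPING__ID.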


Explanation of the result:

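Column 1 (month): the month value for rows produced by grouping on month; NULL in rows grouped by day.

Column 2 (day): the day value for rows produced by grouping on day; NULL in rows grouped by month.

Column 3 (uv): the number of distinct cookieid values in that month or day group.

Column 4 (GROUPING__ID): which grouping set the row belongs to.
      For the grouping conditions month, day in GROUPING SETS, 1 stands for month and 2 stands for day.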
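Where 1 and 2 come from: GROUPING__ID is a bit vector over the GROUP BY columns (month, day). The Python sketch below models the two conventions as described in the Hive documentation; the function names are hypothetical, so check the behavior of your own Hive version. Before Hive 2.3.0, a bit is 1 when the column is present in the grouping set and the first GROUP BY column is the lowest bit. From 2.3.0 on, a bit is 1 when the column is aggregated away (absent from the set) and the first column is the highest bit, in line with the SQL-standard GROUPING function.

```python
KEYS = ("month", "day")  # GROUP BY month, day


def grouping_id_before_2_3(gset, keys=KEYS):
    """Pre-2.3.0 convention: bit i is set when keys[i] is IN the grouping set;
    the first GROUP BY column is the least significant bit."""
    return sum(1 << i for i, k in enumerate(keys) if k in gset)


def grouping_id_from_2_3(gset, keys=KEYS):
    """2.3.0+ convention: a bit is set when the column is NOT in the grouping
    set (aggregated away); the first GROUP BY column is the most significant bit."""
    n = len(keys)
    return sum(1 << (n - 1 - i) for i, k in enumerate(keys) if k not in gset)


for gset in [("month",), ("day",), ("month", "day"), ()]:
    print(gset, grouping_id_before_2_3(gset), grouping_id_from_2_3(gset))
```

Both conventions give 1 for month and 2 for day, so the values above hold on either side of the change. They differ for the (month, day) set (3 versus 0) and for the empty grand-total set (0 versus 3).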

Another example

SELECT  month, day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID 
FROM cookie5 
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID;

Equivalent to

-- GROUPING__ID values 1/2/3 follow the pre-2.3.0 convention; Hive 2.3.0 and later return 0 for the (month, day) rows
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month 
UNION ALL 
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL 
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day

So GROUPING SETS (month, day, (month, day)) performs three aggregations:
first: month
second: day
third: (month, day)
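The overview also lists CUBE and ROLLUP. In Hive, `GROUP BY month, day WITH CUBE` is shorthand for the grouping sets over every subset of the GROUP BY columns. `GROUP BY month, day WITH ROLLUP` is shorthand for the prefixes of the column list, down to the empty set () that yields the grand total. A small Python sketch, with hypothetical helper names, that lists the grouping sets each one expands to:

```python
from itertools import combinations


def cube(keys):
    """CUBE(k1..kn): every subset of the keys, i.e. 2**n grouping sets,
    listed from the full set down to the empty set ()."""
    return [c for r in range(len(keys), -1, -1) for c in combinations(keys, r)]


def rollup(keys):
    """ROLLUP(k1..kn): the prefixes (k1..kn), (k1..kn-1), ..., ()."""
    return [tuple(keys[:r]) for r in range(len(keys), -1, -1)]


print(cube(("month", "day")))    # [('month', 'day'), ('month',), ('day',), ()]
print(rollup(("month", "day")))  # [('month', 'day'), ('month',), ()]
```

Compared with the three sets above, WITH CUBE adds the grand-total set (), while WITH ROLLUP drops the day-only set.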

Source: blog.csdn.net/weixin_42177380/article/details/90809586