Problem solved: An ordinary GROUP BY aggregates over only one combination of dimensions. To aggregate over several combinations you would otherwise have to UNION ALL several queries, whereas the grouping-set functions do it in a single query. For example, GROUPING SETS (month, day, (month, day)) requests three aggregations: by month, by day, and by (month, day).
Overview:
GROUPING SETS, GROUPING__ID, CUBE, ROLLUP
These functions are typically used in OLAP analysis for metrics that cannot simply be summed up and that must be computed while drilling up and down across dimensions, for example UV counts by hour, by day, and by month.
cookie5.txt
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
drop table if exists cookie5;
create table cookie5(month string, day string, cookieid string)
row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie5.txt" into table cookie5;
select * from cookie5;
GROUPING SETS aggregates by several combinations of dimensions within a single GROUP BY query. The result is equivalent to the UNION ALL of the GROUP BY results for each combination. GROUPING__ID shows which grouping set each result row belongs to.
Try it out
Aggregate by month and by day separately
select
month,
day,
count(distinct cookieid) as uv,
GROUPING__ID
from cookie5
group by month,day
grouping sets (month,day)
order by GROUPING__ID;
Equivalent to
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
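Explanation of the result
The first column is the month the rows were grouped by.
The second column is the day the rows were grouped by.
The third column is the number of distinct cookieid values in the group, whether the group is a month or a day.
The fourth column, GROUPING__ID, shows which grouping set the row belongs to.
With the grouping sets (month, day), 1 stands for month and 2 stands for day.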
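To check the columns described above against the sample data, the following is a minimal Python sketch that imitates the query as a UNION ALL of one GROUP BY per grouping set. It is not Hive code: the names ROWS, uv_by and grouping_sets are made up for this illustration, and GROUPING__ID is left out so that only month, day and uv are shown.

```python
from collections import defaultdict

# Rows of cookie5.txt as (month, day, cookieid) tuples.
ROWS = [tuple(line.split(",")) for line in """\
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
""".splitlines()]

COLUMNS = ("month", "day")  # the GROUP BY columns, in order


def uv_by(rows, keys):
    """One GROUP BY: distinct cookieid count per group.

    Columns that are not part of `keys` are reported as None (NULL).
    """
    groups = defaultdict(set)
    for month, day, cookieid in rows:
        record = {"month": month, "day": day}
        group = tuple(record[c] if c in keys else None for c in COLUMNS)
        groups[group].add(cookieid)
    return {group: len(cookies) for group, cookies in groups.items()}


def grouping_sets(rows, sets):
    """Hypothetical helper: UNION ALL of one GROUP BY per grouping set."""
    result = []
    for keys in sets:
        per_group = uv_by(rows, keys)
        # Sort within each grouping set only to make the output stable.
        for group in sorted(per_group, key=lambda g: tuple(v or "" for v in g)):
            month, day = group
            result.append((month, day, per_group[group]))
    return result


if __name__ == "__main__":
    # Same grouping sets as the query: (month) and (day).
    for row in grouping_sets(ROWS, [("month",), ("day",)]):
        print(row)
```

Run as a script, this prints two month rows (uv 5 for 2015-03 and 6 for 2015-04) followed by six day rows, each with NULL (None) in the column that was not grouped on.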
Another example
SELECT month, day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM cookie5
GROUP BY month,day
GROUPING SETS (month,day,(month,day))
ORDER BY GROUPING__ID;
Equivalent to
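SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM cookie5 GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM cookie5 GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM cookie5 GROUP BY month,day
The values 1, 2, 3 follow the bit-vector numbering Hive used before version 2.3.0. From Hive 2.3.0 on, the (month, day) rows carry GROUPING__ID 0 instead of 3.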
GROUPING SETS (month, day, (month, day)) therefore performs three aggregations:
First: month
Second: day
Third: (month, day)
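How GROUPING__ID is numbered for these three grouping sets depends on the Hive version. The sketch below is a Python illustration, not Hive code, and the function names are made up for this example. It assumes the two numbering schemes described in the Hive documentation: before 2.3.0, a bit is set when a GROUP BY column is part of the grouping set, with the first column as the lowest bit. From 2.3.0 on, a bit is set when the column is aggregated away, with the first column as the highest bit.

```python
GROUP_BY = ("month", "day")  # column order of the GROUP BY clause


def grouping_id_legacy(group_by, grouping_set):
    """Assumed pre-2.3.0 numbering: bit i is 1 if column i IS in the set."""
    return sum(1 << i for i, col in enumerate(group_by) if col in grouping_set)


def grouping_id_standard(group_by, grouping_set):
    """Assumed 2.3.0+ numbering: bit is 1 if the column is NOT in the set.

    The first GROUP BY column is the most significant bit.
    """
    n = len(group_by)
    return sum(
        1 << (n - 1 - i)
        for i, col in enumerate(group_by)
        if col not in grouping_set
    )


if __name__ == "__main__":
    for grouping_set in [("month",), ("day",), ("month", "day")]:
        print(
            grouping_set,
            grouping_id_legacy(GROUP_BY, grouping_set),
            grouping_id_standard(GROUP_BY, grouping_set),
        )
```

With two GROUP BY columns, both schemes give 1 for month and 2 for day. They differ only for (month, day), which comes out as 3 under the older numbering and 0 under the newer one.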