This article is based on hive1.1.0-cdh5.12.1.
Grouping data into groups of N
Sample data:
+-----+
| id  |
+-----+
| 0   |
| 1   |
| 2   |
| 3   |
| 4   |
| 5   |
| 6   |
| 7   |
+-----+
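Requirement: assign each id to a tumbling window of length N (N > 1) and compute the window interval for each row (closed on the left, open on the right).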
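Reference result for N = 3 (groups are numbered from 1 here):
+-----+--------+----------+
| id  | group  | window   |
+-----+--------+----------+
| 0   | 1      | [0-3)    |
| 1   | 1      | [0-3)    |
| 2   | 1      | [0-3)    |
| 3   | 2      | [3-6)    |
| 4   | 2      | [3-6)    |
| 5   | 2      | [3-6)    |
| 6   | 3      | [6-9)    |
| 7   | 3      | [6-9)    |
+-----+--------+----------+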
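The arithmetic behind this table can be written down compactly. The symbols k, s and e are notation introduced here for illustration only, and the formulas assume that id is a non-negative integer:

```latex
\[
k = \left\lfloor \frac{id}{N} \right\rfloor, \qquad
s = k \cdot N, \qquad
e = (k + 1) \cdot N, \qquad
\text{window} = [s, e)
\]
\[
N = 3,\ id = 4:\quad
k = \lfloor 4/3 \rfloor = 1,\quad
s = 3,\quad
e = 6,\quad
\text{window} = [3, 6),\quad
\text{group} = k + 1 = 2
\]
```

Here k is a 0-based window index. The group number shown in the table is k + 1.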
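Reference implementation (group_id is 0-based here, so add 1 to get the group numbers shown in the reference result):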
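set hivevar:win_size=3; -- window size N
with t as
(
select i as id
from (select 7 as days ) t
LATERAL VIEW
posexplode(split(repeat(',',days),',')) pe as i, x
) -- build a test table with ids 0..7 (i is the element position, x is the unused empty string)
------------------------SQL-------------------------------
select id
,floor(id/win_size) as group_id -- divide by the window size and round down to get the 0-based group id
,floor(id/win_size)*win_size as win_start -- [group id] * [window size] = [window start]
,(floor(id/win_size)+1)*win_size as win_end -- [window start] + [window size] = ([group id] + 1) * [window size] = [window end (exclusive)]
,printf('[%d-%d)',floor(id/win_size)*win_size,(floor(id/win_size)+1)*win_size ) as `window` -- format the window interval with printf; backticks because newer Hive releases treat window as a reserved keyword
from
(select id,${hivevar:win_size} as win_size from t) t1;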
+-----+-----------+------------+----------+----------+--+
| id | group_id | win_start | win_end | window |
+-----+-----------+------------+----------+----------+--+
| 0 | 0 | 0 | 3 | [0-3) |
| 1 | 0 | 0 | 3 | [0-3) |
| 2 | 0 | 0 | 3 | [0-3) |
| 3 | 1 | 3 | 6 | [3-6) |
| 4 | 1 | 3 | 6 | [3-6) |
| 5 | 1 | 3 | 6 | [3-6) |
| 6 | 2 | 6 | 9 | [6-9) |
| 7 | 2 | 6 | 9 | [6-9) |
+-----+-----------+------------+----------+----------+--+
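The output above numbers groups from 0, while the reference result numbers them from 1. A minimal sketch of a variant that should reproduce the reference columns is shown below. It assumes the same generated test table and the same hivevar. The column names group_no and win_range and the subquery aliases d, t1 and t2 are hypothetical names chosen for this example. The group id is computed once in a subquery so the floor expression is not repeated.

```sql
-- window size N, same variable as in the reference implementation
set hivevar:win_size=3;

with t as
(
  select i as id
  from (select 7 as days) d
  lateral view posexplode(split(repeat(',', days), ',')) pe as i, x
) -- same generated test table: ids 0..7
select id
      ,group_id + 1 as group_no -- 1-based group number (hypothetical column name)
      ,printf('[%d-%d)', group_id * win_size, (group_id + 1) * win_size) as win_range -- half-open interval [start-end)
from
(
  select id
        ,win_size
        ,floor(id / win_size) as group_id -- 0-based group id, computed once
  from (select id, ${hivevar:win_size} as win_size from t) t1
) t2;
```

With win_size=3, id 4 should land in group_no 2 with win_range [3-6), matching the reference result.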