Hive分析函数拓展(一)

数据:

cookie.txt

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4

建表语句:

create database if not exists myhive;
use myhive;
drop table if exists cookie;
create table cookie(cookieid string, createtime string, pv int) row format delimited fields terminated by ',';
load data local inpath "/home/hadoop/cookie.txt" into table cookie;
select * from cookie;

SQL:

SELECT cookieid,createtime,pv,
    SUM(pv) OVER (PARTITION BY cookieid ORDER BY createtime) AS pv1,                                 							 -- 默认为从起点到当前行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,		
    									  --从起点到当前行,结果同pv1
    SUM(pv) OVER(PARTITION BY cookieid) AS pv3,                                                                                 --分组内所有行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,                    
    									  --当前行+往前3行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,                  
    									  --当前行+往前3行+往后1行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6		
    									  --当前行+往后所有行
FROM cookie;

结果:

			pv    pv1     pv2      pv3      pv4    pv5     pv6
cookie1 2015-04-10      1       1       1       26      1       6       26
cookie1 2015-04-11      5       6       6       26      6       13      25
cookie1 2015-04-12      7       13      13      26      13      16      20
cookie1 2015-04-13      3       16      16      26      16      18      13
cookie1 2015-04-14      2       18      18      26      17      21      10
cookie1 2015-04-15      4       22      22      26      16      20      8
cookie1 2015-04-16      4       26      26      26      13      13      4

分析:

pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

总结:

  • 如果不指定ROWS BETWEEN,默认为从起点到当前行
  • 如果不指定ORDER BY,则将分组内所有值累加
  • ROWS BETWEEN也叫WINDOW子句
    • PRECEDING:往前
    • FOLLOWING:往后
    • CURRENT ROW:当前行
    • UNBOUNDED:起点
    • UNBOUNDED PRECEDING:表示从前面的起点
    • UNBOUNDED FOLLOWING:表示到后面的终点

其他AVG,MIN,MAX,和SUM用法一样

发布了101 篇原创文章 · 获赞 265 · 访问量 1万+

猜你喜欢

转载自blog.csdn.net/a805814077/article/details/103167334
今日推荐