Hive: group by a column and get the latest n rows in each group

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/toto1297488504/article/details/85100542
hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

Explanation:
ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn
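This partitions the rows by advertId and, inside each partition, sorts them by addTime in descending order.
ROW_NUMBER() then assigns row numbers according to the partitioning and ordering given in OVER(PARTITION BY advertId ORDER BY addTime desc). Numbering restarts at 1 in every partition, so the row with rn = 1 is the most recent row for each advertId. To keep the latest n rows per group instead of only the latest one, filter on t.rn <= n rather than t.rn = 1.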
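The semantics described above can be sketched in plain Python. This is a minimal emulation, not how Hive executes the query. The function name `row_number` and the sample rows are hypothetical, and `addTime` is assumed to be an ISO-formatted string so that string order matches time order.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_key, order_key, descending=True):
    """Emulate ROW_NUMBER() OVER (PARTITION BY partition_key ORDER BY order_key DESC).

    rows: list of dicts. Returns new dicts with an extra "rn" field.
    """
    # Sort by the partition column so groupby sees each partition contiguously.
    ordered = sorted(rows, key=itemgetter(partition_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        # Inside one partition, order by the ordering column (newest first).
        group_rows = sorted(group, key=itemgetter(order_key), reverse=descending)
        # Numbering restarts at 1 for every partition.
        for rn, row in enumerate(group_rows, start=1):
            result.append({**row, "rn": rn})
    return result


# Hypothetical sample data (not from the original post).
rows = [
    {"advertId": 1, "exposureNum": 100, "addTime": "2018-12-18 09:00:00"},
    {"advertId": 1, "exposureNum": 150, "addTime": "2018-12-18 10:00:00"},
    {"advertId": 2, "exposureNum": 80, "addTime": "2018-12-18 08:00:00"},
]
numbered = row_number(rows, "advertId", "addTime")
# Equivalent of the outer "where t.rn=1": keep the newest row per advertId.
latest = [(r["advertId"], r["exposureNum"]) for r in numbered if r["rn"] == 1]
print(latest)  # [(1, 150), (2, 80)]
```

If two rows in a partition share the same addTime, Hive does not guarantee which one gets rn = 1. This sketch simply keeps input order for ties, because Python's sort is stable.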

Here, the inner query is:

select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2

The SQL above produces a result similar to:
[Image: output of the inner query]
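Since the screenshot is not available, here is a hedged way to see what the inner query returns. It runs the same window-function SQL against an in-memory SQLite database through Python's `sqlite3` module. This assumes SQLite 3.25 or newer, which is the first version with `ROW_NUMBER() OVER (...)`. The rows are hypothetical, and the column types are assumptions. The trailing `ORDER BY advertId, rn` is added only to make the printed order deterministic. The original Hive query has no such clause.

```python
import sqlite3

# In-memory SQLite stands in for Hive; the window-function syntax is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_advert_flow_money "
    "(advertId INTEGER, exposureNum INTEGER, addTime TEXT, ftype INTEGER)"
)
# Hypothetical sample rows, not data from the original post.
conn.executemany(
    "INSERT INTO tb_advert_flow_money VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2018-12-18 09:00:00", 2),
        (1, 150, "2018-12-18 10:00:00", 2),
        (1, 999, "2018-12-18 11:00:00", 1),  # dropped by WHERE ftype = 2
        (2, 80, "2018-12-18 08:00:00", 2),
    ],
)
inner_rows = conn.execute(
    "SELECT advertId, exposureNum, "
    "ROW_NUMBER() OVER (PARTITION BY advertId ORDER BY addTime DESC) AS rn "
    "FROM tb_advert_flow_money WHERE ftype = 2 "
    "ORDER BY advertId, rn"  # added only for a stable printout
).fetchall()
for row in inner_rows:
    print(row)
```

Each advertId gets its own sequence 1, 2, ..., with the newest addTime numbered 1.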

Then running:

hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

produces the following data (only the most recent row for each advertId):
[Image: output of the final query]
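The command above keeps only rn = 1. Changing the outer filter to rn <= n returns the latest n rows per group, which is what the title describes. Below is a minimal sketch of that generalisation. It again uses SQLite (3.25+ assumed) as a stand-in for Hive. The helper name `latest_n_per_advert` and the sample rows are hypothetical. In Hive itself you would write `where t.rn <= 3` (or whatever n you need) in the `hive -e` command.

```python
import sqlite3


def latest_n_per_advert(conn, n):
    """Return the n most recent (advertId, exposureNum) rows for each advertId."""
    sql = (
        "SELECT t.advertId, t.exposureNum FROM ("
        "  SELECT advertId, exposureNum, "
        "         ROW_NUMBER() OVER (PARTITION BY advertId ORDER BY addTime DESC) AS rn "
        "  FROM tb_advert_flow_money WHERE ftype = 2"
        ") t WHERE t.rn <= ? "  # rn <= n instead of rn = 1
        "ORDER BY t.advertId, t.rn"  # deterministic output for this demo
    )
    return conn.execute(sql, (n,)).fetchall()


# Hypothetical table and data standing in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_advert_flow_money "
    "(advertId INTEGER, exposureNum INTEGER, addTime TEXT, ftype INTEGER)"
)
conn.executemany(
    "INSERT INTO tb_advert_flow_money VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2018-12-18 09:00:00", 2),
        (1, 150, "2018-12-18 10:00:00", 2),
        (1, 120, "2018-12-18 11:00:00", 2),
        (2, 80, "2018-12-18 08:00:00", 2),
    ],
)
print(latest_n_per_advert(conn, 1))  # [(1, 120), (2, 80)]
print(latest_n_per_advert(conn, 2))  # [(1, 120), (1, 150), (2, 80)]
```

Groups with fewer than n rows, like advertId 2 here, simply return all of their rows.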

