Copyright notice: This is an original post by the author, released under the CC 4.0 BY-SA license. Please include a link to the original source and this notice when reposting.
These days all I can think about is studying. Don't ask why; I wake up broke every morning. Alright, time to sneak in some blogging during work hours.
SQL — Question 1: an aggregate-function problem
Let's first go over the most basic concepts in Hive SQL. They can feel obscure, but they are genuinely useful.
My personal take:
SQL has two core operations: (1) GROUP BY, whose essence is aggregation; (2) JOIN, whose essence is deciding which columns to join on.
Note: GROUP BY and OVER can actually appear in the same SELECT, but the window function is evaluated after grouping, so it can only see the grouped keys and aggregated columns.
Why do we aggregate at all?
Example: a table of students from three classes, with columns name, score, and class. Without aggregating,
can I write this?
select name, score from student group by class
No. Why not?
After grouping, each group still contains many rows, so the engine does not know which name or which score to pick;
GROUP BY class collapses each class into a single output row, and for a bare column there is no single value to take.
So every selected item must either appear after GROUP BY,
or be an aggregate column (e.g. sum(score), the total score of each class),
and an aggregate column is produced by an aggregate (UDAF) function.
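The ambiguity is easy to see in a few lines of Python. This is a minimal sketch over a hypothetical in-memory student table (the names and scores are invented for illustration), not Hive itself:

```python
from collections import defaultdict

# hypothetical student rows: (name, score, class)
students = [("tom", 90, 1), ("jerry", 95, 1), ("anna", 88, 2)]

# GROUP BY class collapses each class into one output row
groups = defaultdict(list)
for name, score, clazz in students:
    groups[clazz].append((name, score))

# class 1 still holds two rows, so a bare `name` or `score` is ambiguous,
# but an aggregate over the group is well-defined:
totals = {clazz: sum(score for _, score in rows)
          for clazz, rows in groups.items()}
print(totals)  # {1: 185, 2: 88}
```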
Case 1: an aggregate-function problem
Example: for each class, find the name + score + class of the highest-scoring student.
Approaches:
1. SQL — GROUP BY + JOIN
2. RDD
3. SQL — window function
A window function is one-to-many: the "one" is whatever you PARTITION BY, and it produces one value per row, stored in a new column.
GROUP BY is many-to-one: it merges many rows into one.
1. GROUP BY + JOIN
1) Subquery a: select class, max(score) score from student group by class
(1) The subquery cannot contain name; to satisfy the requirement, join the subquery back to the original table and read name from the joined result.
2)
select b.name, a.class, a.score from student b
join
(select class, max(score) score from student group by class) a
on
b.class = a.class and a.score = b.score
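For intuition, here is the same GROUP BY + JOIN plan sketched in plain Python over made-up student rows (the names and scores are invented for illustration):

```python
# made-up student rows: (name, score, class)
students = [
    ("tom",   90, 1), ("jerry", 95, 1),
    ("anna",  88, 2), ("bob",   88, 2),
]

# subquery a: select class, max(score) score from student group by class
max_per_class = {}
for name, score, clazz in students:
    if clazz not in max_per_class or score > max_per_class[clazz]:
        max_per_class[clazz] = score

# join back on (class, score) to recover the name; note that just like
# the SQL join, a tie on the max score keeps every tied student
top_students = [(name, clazz, score)
                for name, score, clazz in students
                if max_per_class[clazz] == score]
print(top_students)  # [('jerry', 1, 95), ('anna', 2, 88), ('bob', 2, 88)]
```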
2. Window function
Building on the problem above: append one more column to the table. Which column? First partition the rows by class,
then sort each class's scores from high to low, and use the resulting rank as the new column.
Then there is no need for the GROUP BY above; you can SELECT the answer directly by filtering that column. That is a window function.
select name, class, score, rank() over(partition by class order by score desc) rank from student
This statement exists only to produce that last column. OVER is the windowing clause. How does it open the window?
PARTITION BY splits the rows first (i.e. what to window over: within one class rank() runs once, within another class
it runs again), and rank() requires an ORDER BY; the result is a new column named rank.
Executing the statement yields a new table with the extra rank column. Suppose that table is aa.
Once aa exists, simply run
select name, class, score from aa where rank = 1
and the requirement is met.
rank() is not the only window function available; there are many others.
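The window-function plan can be sketched the same way. A minimal Python emulation of rank() over (partition by class order by score desc), using the same made-up rows for illustration:

```python
from itertools import groupby

# made-up student rows: (name, score, class)
students = [("tom", 90, 1), ("jerry", 95, 1), ("anna", 88, 2), ("bob", 88, 2)]

# rank() over (partition by class order by score desc)
ranked = []
ordered = sorted(students, key=lambda s: (s[2], -s[1]))  # partition, then order
for clazz, grp in groupby(ordered, key=lambda s: s[2]):
    grp = list(grp)
    for i, (name, score, _) in enumerate(grp):
        # rank(): a tie shares the previous rank, otherwise rank = row position
        rank = ranked[-1][3] if i and score == grp[i - 1][1] else i + 1
        ranked.append((name, clazz, score, rank))

# where rank = 1
winners = [(n, c, s) for n, c, s, r in ranked if r == 1]
print(winners)  # [('jerry', 1, 95), ('anna', 2, 88), ('bob', 2, 88)]
```

Note that rank() keeps both tied students in class 2, exactly like the join-based solution.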
SQL Problem 01
Approach first, then the results.
SQL1: the table ods_domain_traffic_info
domain time traffic(T)
gifshow.com 2019/01/01 5
yy.com 2019/01/01 4
huya.com 2019/01/01 1
gifshow.com 2019/01/20 6
gifshow.com 2019/02/01 8
yy.com 2019/01/20 5
gifshow.com 2019/02/02 7
Requirement: compute each domain's cumulative monthly traffic, in a single SQL statement.
Expected result:
domain month traffics totals
gifshow.com 2019-01 11 11
gifshow.com 2019-02 15 26
yy.com 2019-01 9 9
huya.com 2019-01 1 1
Approach:
The result needs, for each domain and month, traffics and totals.
Two steps:
1. domain + month + traffics
a.
month: extracted from time
traffics: sum(traffic)
b.
group by (domain, month) + sum(traffic) ===> gives domain, month, traffics
2. From domain, month, traffics, the goal is domain, month, traffics, totals
a. totals is a newly generated column, which points to a window function over()
PARTITION BY what? ORDER BY what?
From the expected result: partition by domain, order by month, with sum(traffics) in front of over()
And that's it.
My thought process while writing the SQL:
1. Per-domain, per-month totals ==> domain, month, traffics
tmp:
select
domain, substr(regexp_replace(time,"/","-"),1,7) as month,
sum(traffic) as traffics
from ods_domain_traffic_info
group by domain, substr(regexp_replace(time,"/","-"),1,7)
2. The goal is totals, a newly generated column (one-to-many): use a window function, partitioned by domain, ordered by month, then sum
result:
select
domain, month, traffics,
sum(traffics) over(partition by domain order by month) as totals
from tmp;
Combined:
select
domain, month, traffics,
sum(traffics) over(partition by domain order by month) as totals
from (
select
domain, substr(regexp_replace(time,"/","-"),1,7) as month,
sum(traffic) as traffics
from ods_domain_traffic_info
group by domain, substr(regexp_replace(time,"/","-"),1,7)
) as tmp;
Note: I indented the SQL with tabs for readability; if you want to run it, strip the tabs first.
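To sanity-check the combined query, here is a small Python re-implementation of both steps over the sample rows from the table above (a pure emulation that only mirrors what the SQL does):

```python
from collections import defaultdict

# sample rows from the post: (domain, time, traffic)
rows = [
    ("gifshow.com", "2019/01/01", 5),
    ("yy.com",      "2019/01/01", 4),
    ("huya.com",    "2019/01/01", 1),
    ("gifshow.com", "2019/01/20", 6),
    ("gifshow.com", "2019/02/01", 8),
    ("yy.com",      "2019/01/20", 5),
    ("gifshow.com", "2019/02/02", 7),
]

# step 1: group by (domain, month) + sum(traffic)
monthly = defaultdict(int)
for domain, time, traffic in rows:
    month = time.replace("/", "-")[:7]   # substr(regexp_replace(time,"/","-"),1,7)
    monthly[(domain, month)] += traffic

# step 2: sum(traffics) over (partition by domain order by month)
result, running = [], defaultdict(int)
for (domain, month), traffics in sorted(monthly.items()):
    running[domain] += traffics          # cumulative sum within the domain partition
    result.append((domain, month, traffics, running[domain]))

for row in result:                       # matches the expected result table
    print(row)
```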
Results:
1. domain + time + traffic ---> domain + month + traffics
Per-domain, per-month totals ==> domain, month, traffics
tmp :
select
domain,substr(regexp_replace(time,"/","-"),1,7) as month,
sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
0: jdbc:hive2://hadoop101:10000> select
. . . . . . . . . . . . . . . .> domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
. . . . . . . . . . . . . . . .> from ods_domain_traffic_info
. . . . . . . . . . . . . . . .> group by domain,substr(regexp_replace(time,"/","-"),1,7);
INFO : Compiling command(queryId=double_happy_20190917140000_1dab3570-5d32-449f-ab8f-4c896be57622): select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:domain, type:string, comment:null), FieldSchema(name:month, type:string, comment:null), FieldSchema(name:traffics, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917140000_1dab3570-5d32-449f-ab8f-4c896be57622); Time taken: 0.498 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917140000_1dab3570-5d32-449f-ab8f-4c896be57622): select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
INFO : Query ID = double_happy_20190917140000_1dab3570-5d32-449f-ab8f-4c896be57622
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0001, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0001/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0001
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:00:23,046 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:00:28,411 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.44 sec
INFO : 2019-09-17 14:00:34,894 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.51 sec
INFO : MapReduce Total cumulative CPU time: 3 seconds 510 msec
INFO : Ended Job = job_1568699800773_0001
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.51 sec HDFS Read: 9195 HDFS Write: 82 SUCCESS
INFO : Total MapReduce CPU Time Spent: 3 seconds 510 msec
INFO : Completed executing command(queryId=double_happy_20190917140000_1dab3570-5d32-449f-ab8f-4c896be57622); Time taken: 23.04 seconds
INFO : OK
+--------------+----------+-----------+--+
| domain | month | traffics |
+--------------+----------+-----------+--+
| gifshow.com | 2019-01 | 11 |
| gifshow.com | 2019-02 | 15 |
| huya.com | 2019-01 | 1 |
| yy.com | 2019-01 | 9 |
+--------------+----------+-----------+--+
2. The goal is totals, a newly generated column (one-to-many): use a window function, partitioned by domain, ordered by month, then sum
0: jdbc:hive2://hadoop101:10000> select
. . . . . . . . . . . . . . . .> domain,month,traffics,
. . . . . . . . . . . . . . . .> sum(traffics)over(partition by domain order by month) as totals
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select
. . . . . . . . . . . . . . . .> domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
. . . . . . . . . . . . . . . .> from ods_domain_traffic_info
. . . . . . . . . . . . . . . .> group by domain,substr(regexp_replace(time,"/","-"),1,7)
. . . . . . . . . . . . . . . .> ) as tmp;
INFO : Compiling command(queryId=double_happy_20190917140202_f7fde489-9ae9-4ab5-85fd-591382b70891): select
domain,month,traffics,
sum(traffics)over(partition by domain order by month) as totals
from(
select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
) as tmp
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:domain, type:string, comment:null), FieldSchema(name:month, type:string, comment:null), FieldSchema(name:traffics, type:bigint, comment:null), FieldSchema(name:totals, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917140202_f7fde489-9ae9-4ab5-85fd-591382b70891); Time taken: 0.156 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917140202_f7fde489-9ae9-4ab5-85fd-591382b70891): select
domain,month,traffics,
sum(traffics)over(partition by domain order by month) as totals
from(
select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
) as tmp
INFO : Query ID = double_happy_20190917140202_f7fde489-9ae9-4ab5-85fd-591382b70891
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0002, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0002/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0002
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:02:38,123 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:02:44,384 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.64 sec
INFO : 2019-09-17 14:02:50,678 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.26 sec
INFO : MapReduce Total cumulative CPU time: 4 seconds 260 msec
INFO : Ended Job = job_1568699800773_0002
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.26 sec HDFS Read: 11518 HDFS Write: 92 SUCCESS
INFO : Total MapReduce CPU Time Spent: 4 seconds 260 msec
INFO : Completed executing command(queryId=double_happy_20190917140202_f7fde489-9ae9-4ab5-85fd-591382b70891); Time taken: 21.812 seconds
INFO : OK
+--------------+----------+-----------+---------+--+
| domain | month | traffics | totals |
+--------------+----------+-----------+---------+--+
| gifshow.com | 2019-01 | 11 | 11 |
| gifshow.com | 2019-02 | 15 | 26 |
| huya.com | 2019-01 | 1 | 1 |
| yy.com | 2019-01 | 9 | 9 |
+--------------+----------+-----------+---------+--+
It can also be done like this, without generating a separate new column:
0: jdbc:hive2://hadoop101:10000> select
. . . . . . . . . . . . . . . .> domain,month,
. . . . . . . . . . . . . . . .> sum(traffics)over(partition by domain order by month) as traffics
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select
. . . . . . . . . . . . . . . .> domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
. . . . . . . . . . . . . . . .> from ods_domain_traffic_info
. . . . . . . . . . . . . . . .> group by domain,substr(regexp_replace(time,"/","-"),1,7)
. . . . . . . . . . . . . . . .> ) as tmp;
INFO : Compiling command(queryId=double_happy_20190917140505_20c693e7-e3af-4233-966a-94cdfc50388d): select
domain,month,
sum(traffics)over(partition by domain order by month) as traffics
from(
select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
) as tmp
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:domain, type:string, comment:null), FieldSchema(name:month, type:string, comment:null), FieldSchema(name:traffics, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917140505_20c693e7-e3af-4233-966a-94cdfc50388d); Time taken: 0.05 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917140505_20c693e7-e3af-4233-966a-94cdfc50388d): select
domain,month,
sum(traffics)over(partition by domain order by month) as traffics
from(
select
domain,substr(regexp_replace(time,"/","-"),1,7) as month, sum(traffic) as traffics
from ods_domain_traffic_info
group by domain,substr(regexp_replace(time,"/","-"),1,7)
) as tmp
INFO : Query ID = double_happy_20190917140505_20c693e7-e3af-4233-966a-94cdfc50388d
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0003, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0003/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0003
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:05:17,501 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:05:22,722 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.47 sec
INFO : 2019-09-17 14:05:29,989 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.97 sec
INFO : MapReduce Total cumulative CPU time: 3 seconds 970 msec
INFO : Ended Job = job_1568699800773_0003
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.97 sec HDFS Read: 11443 HDFS Write: 82 SUCCESS
INFO : Total MapReduce CPU Time Spent: 3 seconds 970 msec
INFO : Completed executing command(queryId=double_happy_20190917140505_20c693e7-e3af-4233-966a-94cdfc50388d); Time taken: 21.456 seconds
INFO : OK
+--------------+----------+-----------+--+
| domain | month | traffics |
+--------------+----------+-----------+--+
| gifshow.com | 2019-01 | 11 |
| gifshow.com | 2019-02 | 26 |
| huya.com | 2019-01 | 1 |
| yy.com | 2019-01 | 9 |
+--------------+----------+-----------+--+
SQL Problem 02
SQL2:
uid pid
user1 a
user2 b
user1 c
user2 c
user3 c
user3 c
1) uv ==> uid cnt — presumably, for each pid, how many uids visited (count the uids)? Or is it the other way around??
2) For each product, the top-3 users' info ==> pid uid cnt
Approach:
(1) Requirement 1 is ambiguous, so let's do both readings.
How many uids per pid? ==> pid + uid
1. group by pid + count(distinct uid) — deduplication is needed
How many pids does each uid visit? ==> uid + pid
a. group by uid + count(distinct pid) — deduplicate pid
This one is simple: group by one column and count the distinct values of the other.
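Both readings are easy to emulate in Python on the six sample rows; a quick sketch where count(distinct ...) becomes a set:

```python
from collections import defaultdict

# sample rows from the post: (uid, pid)
rows = [
    ("user1", "a"), ("user2", "b"), ("user1", "c"),
    ("user2", "c"), ("user3", "c"), ("user3", "c"),
]

# reading 1 — group by pid + count(distinct uid): distinct visitors per product
uids_per_pid = defaultdict(set)
for uid, pid in rows:
    uids_per_pid[pid].add(uid)
pid_uv = {pid: len(uids) for pid, uids in uids_per_pid.items()}
print(pid_uv)   # {'a': 1, 'b': 1, 'c': 3}

# reading 2 — group by uid + count(distinct pid): distinct products per user
pids_per_uid = defaultdict(set)
for uid, pid in rows:
    pids_per_uid[uid].add(pid)
uid_cnt = {uid: len(pids) for uid, pids in pids_per_uid.items()}
print(uid_cnt)  # {'user1': 2, 'user2': 2, 'user3': 1}
```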
Result (I only generated 3 pids — a, b, c — and 100 users):
0: jdbc:hive2://hadoop101:10000> select uid ,count(distinct(pid)) as cnt
. . . . . . . . . . . . . . . .> from ods_uid_pid_info
. . . . . . . . . . . . . . . .> group by uid;
INFO : Compiling command(queryId=double_happy_20190917141414_3c87306d-3ebd-49b1-9371-8fdba5629b97): select uid ,count(distinct(pid)) as cnt
from ods_uid_pid_info
group by uid
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:uid, type:string, comment:null), FieldSchema(name:cnt, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917141414_3c87306d-3ebd-49b1-9371-8fdba5629b97); Time taken: 0.055 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917141414_3c87306d-3ebd-49b1-9371-8fdba5629b97): select uid ,count(distinct(pid)) as cnt
from ods_uid_pid_info
group by uid
INFO : Query ID = double_happy_20190917141414_3c87306d-3ebd-49b1-9371-8fdba5629b97
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0004, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0004/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0004
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:14:52,471 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:14:57,670 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
INFO : 2019-09-17 14:15:03,926 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.69 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 690 msec
INFO : Ended Job = job_1568699800773_0004
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.69 sec HDFS Read: 17780 HDFS Write: 846 SUCCESS
INFO : Total MapReduce CPU Time Spent: 2 seconds 690 msec
INFO : Completed executing command(queryId=double_happy_20190917141414_3c87306d-3ebd-49b1-9371-8fdba5629b97); Time taken: 19.383 seconds
INFO : OK
+----------+------+--+
| uid | cnt |
+----------+------+--+
| user0 | 1 |
| user1 | 3 |
| user10 | 3 |
| user100 | 3 |
| user11 | 3 |
| user12 | 3 |
| user13 | 3 |
| user14 | 3 |
| user15 | 3 |
| user16 | 3 |
| user17 | 3 |
| user18 | 1 |
| user19 | 2 |
| user2 | 3 |
| user20 | 3 |
| user21 | 3 |
| user22 | 3 |
| user24 | 3 |
| user25 | 3 |
| user26 | 3 |
| user27 | 3 |
| user28 | 3 |
| user29 | 3 |
| user3 | 3 |
| user30 | 3 |
| user31 | 2 |
| user32 | 3 |
| user33 | 3 |
| user34 | 3 |
| user36 | 2 |
| user37 | 3 |
| user38 | 3 |
| user39 | 1 |
| user4 | 3 |
| user41 | 3 |
| user42 | 1 |
| user43 | 1 |
| user44 | 3 |
| user45 | 3 |
| user46 | 3 |
| user47 | 2 |
| user48 | 3 |
| user49 | 2 |
| user5 | 3 |
| user50 | 3 |
| user51 | 3 |
| user52 | 3 |
| user54 | 3 |
| user55 | 3 |
| user57 | 1 |
| user58 | 3 |
| user59 | 2 |
| user6 | 3 |
| user60 | 1 |
| user61 | 3 |
| user62 | 3 |
| user63 | 3 |
| user64 | 3 |
| user65 | 2 |
| user66 | 3 |
| user67 | 3 |
| user68 | 1 |
| user69 | 2 |
| user7 | 3 |
| user70 | 3 |
| user71 | 3 |
| user72 | 1 |
| user73 | 3 |
| user74 | 3 |
| user75 | 3 |
| user76 | 2 |
| user77 | 3 |
| user78 | 3 |
| user79 | 3 |
| user8 | 3 |
| user80 | 3 |
| user81 | 3 |
| user82 | 1 |
| user83 | 3 |
| user84 | 1 |
| user85 | 3 |
| user86 | 3 |
| user87 | 3 |
| user88 | 3 |
| user9 | 3 |
| user90 | 3 |
| user91 | 3 |
| user92 | 3 |
| user93 | 2 |
| user94 | 3 |
| user95 | 3 |
| user96 | 3 |
| user97 | 3 |
| user98 | 3 |
| user99 | 3 |
+----------+------+--+
(2) For each product, the top-3 users' info ==> pid uid cnt
Approach:
1. pid uid cnt top3
Meaning: per pid and per uid, the visit count, then take the top 3.
Two steps:
step1: visit count per pid per uid
step2: take the top 3 based on step1
step1: visit count per pid per uid
group by (pid, uid) + count(uid) (GROUP BY deduplicates the keys — easy to see)
tmp:
select pid, uid, count(uid) as count
from ods_uid_pid_info
group by pid, uid
step2: from step1's pid, uid, count, take the top 3
Goal Top3 => based on step1, partition by pid, order by count, generating a new rank column (windowing: one-to-many)
This uses a window: rank() or row_number() + over.
partition by what? order by what?
We want each product's top-3 users,
so: partition by pid order by count (note: this count is the per-pid, per-uid count from step1)
result_tmp:
select pid, uid, count, rank() over(partition by pid order by count desc) as rank
from tmp; -- ties share a rank
result_tmp:
select pid, uid, count, row_number() over(partition by pid order by count desc) as rank
from tmp; -- no ties
step3: based on step2, filter where rank <= 3
result (either variant):
select pid, uid, count
from result_tmp
where rank <= 3;
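The difference between the rank() and row_number() variants shows up on ties. A small Python emulation of step1, the rank() variant, and the rank <= 3 filter, over the six sample rows:

```python
from collections import Counter

# sample rows from the post: (uid, pid)
rows = [
    ("user1", "a"), ("user2", "b"), ("user1", "c"),
    ("user2", "c"), ("user3", "c"), ("user3", "c"),
]

# step1: group by (pid, uid) + count(uid)
counts = Counter((pid, uid) for uid, pid in rows)

# step2 + step3: rank() over (partition by pid order by count desc), rank <= 3
top3 = {}
for p in sorted({pid for pid, _ in counts}):
    part = sorted(((c, uid) for (pid, uid), c in counts.items() if pid == p),
                  reverse=True)
    rank = 0
    top3[p] = []
    for i, (c, uid) in enumerate(part):
        if i == 0 or c < part[i - 1][0]:
            rank = i + 1    # ties share a rank; row_number() would always use i+1
        if rank <= 3:
            top3[p].append((uid, c, rank))
print(top3["c"])  # [('user3', 2, 1), ('user2', 1, 2), ('user1', 1, 2)]
```

With row_number() the two tied users for pid c would get ranks 2 and 3 instead of sharing rank 2.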
step1 demo: visit count per pid per uid
0: jdbc:hive2://hadoop101:10000> select pid,uid,count(uid) as count
. . . . . . . . . . . . . . . .> from ods_uid_pid_info
. . . . . . . . . . . . . . . .> group by pid,uid ;
INFO : Compiling command(queryId=double_happy_20190917144747_4703dd87-bc51-4175-8551-b4ffcc78ed11): select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:pid, type:string, comment:null), FieldSchema(name:uid, type:string, comment:null), FieldSchema(name:count, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917144747_4703dd87-bc51-4175-8551-b4ffcc78ed11); Time taken: 0.046 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917144747_4703dd87-bc51-4175-8551-b4ffcc78ed11): select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
INFO : Query ID = double_happy_20190917144747_4703dd87-bc51-4175-8551-b4ffcc78ed11
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0007, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0007/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0007
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:47:44,611 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:47:49,855 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.07 sec
INFO : 2019-09-17 14:47:56,080 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.46 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 460 msec
INFO : Ended Job = job_1568699800773_0007
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.46 sec HDFS Read: 17957 HDFS Write: 2767 SUCCESS
INFO : Total MapReduce CPU Time Spent: 2 seconds 460 msec
INFO : Completed executing command(queryId=double_happy_20190917144747_4703dd87-bc51-4175-8551-b4ffcc78ed11); Time taken: 19.001 seconds
INFO : OK
+------+----------+--------+--+
| pid | uid | count |
+------+----------+--------+--+
| a | user1 | 4 |
| a | user10 | 2 |
| a | user100 | 5 |
| a | user11 | 8 |
| a | user12 | 4 |
| a | user13 | 3 |
| a | user14 | 6 |
| a | user15 | 11 |
| a | user16 | 6 |
| a | user17 | 5 |
| a | user19 | 2 |
| a | user2 | 2 |
| a | user20 | 3 |
| a | user21 | 1 |
| a | user22 | 5 |
| a | user24 | 7 |
| a | user25 | 4 |
| a | user26 | 6 |
| a | user27 | 2 |
| a | user28 | 1 |
| a | user29 | 4 |
| a | user3 | 3 |
| a | user30 | 4 |
| a | user31 | 5 |
| a | user32 | 4 |
| a | user33 | 2 |
| a | user34 | 1 |
| a | user36 | 1 |
| a | user37 | 7 |
| a | user38 | 5 |
| a | user4 | 3 |
| a | user41 | 1 |
| a | user43 | 1 |
| a | user44 | 2 |
| a | user45 | 3 |
| a | user46 | 2 |
| a | user47 | 2 |
| a | user48 | 2 |
| a | user49 | 3 |
| a | user5 | 10 |
| a | user50 | 3 |
| a | user51 | 4 |
| a | user52 | 1 |
| a | user54 | 9 |
| a | user55 | 6 |
| a | user58 | 1 |
| a | user59 | 3 |
| a | user6 | 3 |
| a | user61 | 2 |
| a | user62 | 11 |
| a | user63 | 3 |
| a | user64 | 4 |
| a | user65 | 2 |
| a | user66 | 3 |
| a | user67 | 2 |
| a | user69 | 2 |
| a | user7 | 4 |
| a | user70 | 3 |
| a | user71 | 6 |
| a | user72 | 2 |
| a | user73 | 6 |
| a | user74 | 2 |
| a | user75 | 2 |
| a | user76 | 2 |
| a | user77 | 3 |
| a | user78 | 7 |
| a | user79 | 7 |
| a | user8 | 1 |
| a | user80 | 2 |
| a | user81 | 6 |
| a | user82 | 1 |
| a | user83 | 5 |
| a | user85 | 3 |
| a | user86 | 5 |
| a | user87 | 8 |
| a | user88 | 5 |
| a | user9 | 1 |
| a | user90 | 2 |
| a | user91 | 4 |
| a | user92 | 2 |
| a | user93 | 1 |
| a | user94 | 6 |
| a | user95 | 2 |
| a | user96 | 6 |
| a | user97 | 6 |
| a | user98 | 3 |
| a | user99 | 4 |
| b | user0 | 2 |
| b | user1 | 7 |
| b | user10 | 1 |
| b | user100 | 4 |
| b | user11 | 6 |
| b | user12 | 3 |
| b | user13 | 4 |
| b | user14 | 4 |
| b | user15 | 9 |
| b | user16 | 6 |
| b | user17 | 4 |
| b | user19 | 6 |
| b | user2 | 5 |
| b | user20 | 1 |
| b | user21 | 8 |
| b | user22 | 8 |
| b | user24 | 2 |
| b | user25 | 1 |
| b | user26 | 7 |
| b | user27 | 2 |
| b | user28 | 2 |
| b | user29 | 4 |
| b | user3 | 8 |
| b | user30 | 3 |
| b | user32 | 3 |
| b | user33 | 2 |
| b | user34 | 4 |
| b | user36 | 1 |
| b | user37 | 10 |
| b | user38 | 3 |
| b | user4 | 5 |
| b | user41 | 1 |
| b | user44 | 1 |
| b | user45 | 1 |
| b | user46 | 2 |
| b | user47 | 1 |
| b | user48 | 7 |
| b | user49 | 2 |
| b | user5 | 8 |
| b | user50 | 6 |
| b | user51 | 7 |
| b | user52 | 3 |
| b | user54 | 2 |
| b | user55 | 1 |
| b | user58 | 5 |
| b | user59 | 1 |
| b | user6 | 1 |
| b | user60 | 4 |
| b | user61 | 10 |
| b | user62 | 4 |
| b | user63 | 4 |
| b | user64 | 1 |
| b | user65 | 1 |
| b | user66 | 7 |
| b | user67 | 1 |
| b | user68 | 1 |
| b | user69 | 8 |
| b | user7 | 8 |
| b | user70 | 3 |
| b | user71 | 4 |
| b | user73 | 5 |
| b | user74 | 1 |
| b | user75 | 3 |
| b | user77 | 8 |
| b | user78 | 2 |
| b | user79 | 9 |
| b | user8 | 1 |
| b | user80 | 6 |
| b | user81 | 6 |
| b | user83 | 3 |
| b | user84 | 4 |
| b | user85 | 6 |
| b | user86 | 5 |
| b | user87 | 6 |
| b | user88 | 4 |
| b | user9 | 1 |
| b | user90 | 4 |
| b | user91 | 2 |
| b | user92 | 5 |
| b | user94 | 3 |
| b | user95 | 3 |
| b | user96 | 4 |
| b | user97 | 7 |
| b | user98 | 3 |
| b | user99 | 3 |
| c | user1 | 5 |
| c | user10 | 7 |
| c | user100 | 2 |
| c | user11 | 4 |
| c | user12 | 5 |
| c | user13 | 3 |
| c | user14 | 2 |
| c | user15 | 6 |
| c | user16 | 3 |
| c | user17 | 3 |
| c | user18 | 1 |
| c | user2 | 2 |
| c | user20 | 2 |
| c | user21 | 4 |
| c | user22 | 4 |
| c | user24 | 4 |
| c | user25 | 4 |
| c | user26 | 6 |
| c | user27 | 4 |
| c | user28 | 1 |
| c | user29 | 7 |
| c | user3 | 3 |
| c | user30 | 1 |
| c | user31 | 1 |
| c | user32 | 3 |
| c | user33 | 1 |
| c | user34 | 2 |
| c | user37 | 6 |
| c | user38 | 6 |
| c | user39 | 2 |
| c | user4 | 9 |
| c | user41 | 2 |
| c | user42 | 2 |
| c | user44 | 1 |
| c | user45 | 3 |
| c | user46 | 3 |
| c | user48 | 6 |
| c | user5 | 9 |
| c | user50 | 6 |
| c | user51 | 8 |
| c | user52 | 4 |
| c | user54 | 3 |
| c | user55 | 2 |
| c | user57 | 1 |
| c | user58 | 2 |
| c | user6 | 1 |
| c | user61 | 6 |
| c | user62 | 4 |
| c | user63 | 3 |
| c | user64 | 2 |
| c | user66 | 13 |
| c | user67 | 4 |
| c | user7 | 10 |
| c | user70 | 5 |
| c | user71 | 6 |
| c | user73 | 7 |
| c | user74 | 4 |
| c | user75 | 4 |
| c | user76 | 1 |
| c | user77 | 6 |
| c | user78 | 3 |
| c | user79 | 7 |
| c | user8 | 2 |
| c | user80 | 2 |
| c | user81 | 11 |
| c | user83 | 2 |
| c | user85 | 2 |
| c | user86 | 5 |
| c | user87 | 4 |
| c | user88 | 4 |
| c | user9 | 1 |
| c | user90 | 4 |
| c | user91 | 2 |
| c | user92 | 4 |
| c | user93 | 1 |
| c | user94 | 12 |
| c | user95 | 3 |
| c | user96 | 4 |
| c | user97 | 6 |
| c | user98 | 3 |
| c | user99 | 5 |
+------+----------+--------+--+
step2: Goal Top3 => based on step1, partition by pid, order by count desc, generating a new rank column
0: jdbc:hive2://hadoop101:10000> select pid,uid,count, rank()over(partition by pid order by count desc) as rank
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select pid,uid,count(uid) as count
. . . . . . . . . . . . . . . .> from ods_uid_pid_info
. . . . . . . . . . . . . . . .> group by pid,uid
. . . . . . . . . . . . . . . .> )as tmp;
INFO : Compiling command(queryId=double_happy_20190917144949_8fd22282-a9be-4e95-aa2a-b98121f6354d): select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:pid, type:string, comment:null), FieldSchema(name:uid, type:string, comment:null), FieldSchema(name:count, type:bigint, comment:null), FieldSchema(name:rank, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917144949_8fd22282-a9be-4e95-aa2a-b98121f6354d); Time taken: 0.054 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917144949_8fd22282-a9be-4e95-aa2a-b98121f6354d): select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
INFO : Query ID = double_happy_20190917144949_8fd22282-a9be-4e95-aa2a-b98121f6354d
INFO : Total jobs = 2
INFO : Launching Job 1 out of 2
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0008, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0008/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0008
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:49:58,539 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:50:03,879 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
INFO : 2019-09-17 14:50:10,134 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.98 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 980 msec
INFO : Ended Job = job_1568699800773_0008
INFO : Launching Job 2 out of 2
INFO : Starting task [Stage-2:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0009, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0009/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0009
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:50:17,507 Stage-2 map = 0%, reduce = 0%
INFO : 2019-09-17 14:50:23,732 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.94 sec
INFO : 2019-09-17 14:50:29,976 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.39 sec
INFO : MapReduce Total cumulative CPU time: 3 seconds 390 msec
INFO : Ended Job = job_1568699800773_0009
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.98 sec HDFS Read: 17068 HDFS Write: 6962 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.39 sec HDFS Read: 14419 HDFS Write: 3494 SUCCESS
INFO : Total MapReduce CPU Time Spent: 6 seconds 370 msec
INFO : Completed executing command(queryId=double_happy_20190917144949_8fd22282-a9be-4e95-aa2a-b98121f6354d); Time taken: 39.409 seconds
INFO : OK
+------+----------+--------+-------+--+
| pid | uid | count | rank |
+------+----------+--------+-------+--+
| a | user15 | 11 | 1 |
| a | user62 | 11 | 1 |
| a | user5 | 10 | 3 |
| a | user54 | 9 | 4 |
| a | user87 | 8 | 5 |
| a | user11 | 8 | 5 |
| a | user37 | 7 | 7 |
| a | user24 | 7 | 7 |
| a | user78 | 7 | 7 |
| a | user79 | 7 | 7 |
| a | user55 | 6 | 11 |
| a | user96 | 6 | 11 |
| a | user94 | 6 | 11 |
| a | user16 | 6 | 11 |
| a | user81 | 6 | 11 |
| a | user26 | 6 | 11 |
| a | user97 | 6 | 11 |
| a | user71 | 6 | 11 |
| a | user14 | 6 | 11 |
| a | user73 | 6 | 11 |
| a | user86 | 5 | 21 |
| a | user38 | 5 | 21 |
| a | user88 | 5 | 21 |
| a | user22 | 5 | 21 |
| a | user83 | 5 | 21 |
| a | user31 | 5 | 21 |
| a | user100 | 5 | 21 |
| a | user17 | 5 | 21 |
| a | user12 | 4 | 29 |
| a | user7 | 4 | 29 |
| a | user25 | 4 | 29 |
| a | user29 | 4 | 29 |
| a | user30 | 4 | 29 |
| a | user32 | 4 | 29 |
| a | user99 | 4 | 29 |
| a | user91 | 4 | 29 |
| a | user51 | 4 | 29 |
| a | user64 | 4 | 29 |
| a | user1 | 4 | 29 |
| a | user4 | 3 | 40 |
| a | user50 | 3 | 40 |
| a | user66 | 3 | 40 |
| a | user3 | 3 | 40 |
| a | user85 | 3 | 40 |
| a | user77 | 3 | 40 |
| a | user98 | 3 | 40 |
| a | user59 | 3 | 40 |
| a | user6 | 3 | 40 |
| a | user20 | 3 | 40 |
| a | user70 | 3 | 40 |
| a | user63 | 3 | 40 |
| a | user45 | 3 | 40 |
| a | user13 | 3 | 40 |
| a | user49 | 3 | 40 |
| a | user65 | 2 | 55 |
| a | user19 | 2 | 55 |
| a | user2 | 2 | 55 |
| a | user27 | 2 | 55 |
| a | user33 | 2 | 55 |
| a | user44 | 2 | 55 |
| a | user46 | 2 | 55 |
| a | user47 | 2 | 55 |
| a | user48 | 2 | 55 |
| a | user61 | 2 | 55 |
| a | user67 | 2 | 55 |
| a | user69 | 2 | 55 |
| a | user72 | 2 | 55 |
| a | user74 | 2 | 55 |
| a | user75 | 2 | 55 |
| a | user76 | 2 | 55 |
| a | user80 | 2 | 55 |
| a | user90 | 2 | 55 |
| a | user92 | 2 | 55 |
| a | user95 | 2 | 55 |
| a | user10 | 2 | 55 |
| a | user52 | 1 | 76 |
| a | user9 | 1 | 76 |
| a | user41 | 1 | 76 |
| a | user93 | 1 | 76 |
| a | user36 | 1 | 76 |
| a | user34 | 1 | 76 |
| a | user28 | 1 | 76 |
| a | user21 | 1 | 76 |
| a | user43 | 1 | 76 |
| a | user82 | 1 | 76 |
| a | user8 | 1 | 76 |
| a | user58 | 1 | 76 |
| b | user37 | 10 | 1 |
| b | user61 | 10 | 1 |
| b | user15 | 9 | 3 |
| b | user79 | 9 | 3 |
| b | user7 | 8 | 5 |
| b | user21 | 8 | 5 |
| b | user22 | 8 | 5 |
| b | user69 | 8 | 5 |
| b | user5 | 8 | 5 |
| b | user3 | 8 | 5 |
| b | user77 | 8 | 5 |
| b | user51 | 7 | 12 |
| b | user48 | 7 | 12 |
| b | user26 | 7 | 12 |
| b | user97 | 7 | 12 |
| b | user66 | 7 | 12 |
| b | user1 | 7 | 12 |
| b | user50 | 6 | 18 |
| b | user87 | 6 | 18 |
| b | user85 | 6 | 18 |
| b | user81 | 6 | 18 |
| b | user80 | 6 | 18 |
| b | user19 | 6 | 18 |
| b | user16 | 6 | 18 |
| b | user11 | 6 | 18 |
| b | user2 | 5 | 26 |
| b | user92 | 5 | 26 |
| b | user58 | 5 | 26 |
| b | user73 | 5 | 26 |
| b | user4 | 5 | 26 |
| b | user86 | 5 | 26 |
| b | user88 | 4 | 32 |
| b | user71 | 4 | 32 |
| b | user29 | 4 | 32 |
| b | user84 | 4 | 32 |
| b | user13 | 4 | 32 |
| b | user17 | 4 | 32 |
| b | user100 | 4 | 32 |
| b | user34 | 4 | 32 |
| b | user14 | 4 | 32 |
| b | user96 | 4 | 32 |
| b | user90 | 4 | 32 |
| b | user63 | 4 | 32 |
| b | user62 | 4 | 32 |
| b | user60 | 4 | 32 |
| b | user12 | 3 | 46 |
| b | user99 | 3 | 46 |
| b | user94 | 3 | 46 |
| b | user95 | 3 | 46 |
| b | user98 | 3 | 46 |
| b | user75 | 3 | 46 |
| b | user83 | 3 | 46 |
| b | user52 | 3 | 46 |
| b | user32 | 3 | 46 |
| b | user30 | 3 | 46 |
| b | user70 | 3 | 46 |
| b | user38 | 3 | 46 |
| b | user54 | 2 | 58 |
| b | user28 | 2 | 58 |
| b | user27 | 2 | 58 |
| b | user49 | 2 | 58 |
| b | user46 | 2 | 58 |
| b | user24 | 2 | 58 |
| b | user33 | 2 | 58 |
| b | user0 | 2 | 58 |
| b | user91 | 2 | 58 |
| b | user78 | 2 | 58 |
| b | user67 | 1 | 68 |
| b | user65 | 1 | 68 |
| b | user64 | 1 | 68 |
| b | user6 | 1 | 68 |
| b | user59 | 1 | 68 |
| b | user55 | 1 | 68 |
| b | user74 | 1 | 68 |
| b | user8 | 1 | 68 |
| b | user47 | 1 | 68 |
| b | user45 | 1 | 68 |
| b | user44 | 1 | 68 |
| b | user41 | 1 | 68 |
| b | user36 | 1 | 68 |
| b | user25 | 1 | 68 |
| b | user9 | 1 | 68 |
| b | user20 | 1 | 68 |
| b | user10 | 1 | 68 |
| b | user68 | 1 | 68 |
| c | user66 | 13 | 1 |
| c | user94 | 12 | 2 |
| c | user81 | 11 | 3 |
| c | user7 | 10 | 4 |
| c | user5 | 9 | 5 |
| c | user4 | 9 | 5 |
| c | user51 | 8 | 7 |
| c | user79 | 7 | 8 |
| c | user10 | 7 | 8 |
| c | user73 | 7 | 8 |
| c | user29 | 7 | 8 |
| c | user38 | 6 | 12 |
| c | user37 | 6 | 12 |
| c | user97 | 6 | 12 |
| c | user15 | 6 | 12 |
| c | user77 | 6 | 12 |
| c | user61 | 6 | 12 |
| c | user50 | 6 | 12 |
| c | user26 | 6 | 12 |
| c | user48 | 6 | 12 |
| c | user71 | 6 | 12 |
| c | user99 | 5 | 22 |
| c | user12 | 5 | 22 |
| c | user1 | 5 | 22 |
| c | user70 | 5 | 22 |
| c | user86 | 5 | 22 |
| c | user75 | 4 | 27 |
| c | user87 | 4 | 27 |
| c | user74 | 4 | 27 |
| c | user67 | 4 | 27 |
| c | user21 | 4 | 27 |
| c | user62 | 4 | 27 |
| c | user88 | 4 | 27 |
| c | user96 | 4 | 27 |
| c | user92 | 4 | 27 |
| c | user90 | 4 | 27 |
| c | user27 | 4 | 27 |
| c | user25 | 4 | 27 |
| c | user24 | 4 | 27 |
| c | user22 | 4 | 27 |
| c | user52 | 4 | 27 |
| c | user11 | 4 | 27 |
| c | user63 | 3 | 43 |
| c | user45 | 3 | 43 |
| c | user95 | 3 | 43 |
| c | user46 | 3 | 43 |
| c | user32 | 3 | 43 |
| c | user54 | 3 | 43 |
| c | user3 | 3 | 43 |
| c | user98 | 3 | 43 |
| c | user17 | 3 | 43 |
| c | user16 | 3 | 43 |
| c | user78 | 3 | 43 |
| c | user13 | 3 | 43 |
| c | user83 | 2 | 55 |
| c | user85 | 2 | 55 |
| c | user58 | 2 | 55 |
| c | user55 | 2 | 55 |
| c | user42 | 2 | 55 |
| c | user100 | 2 | 55 |
| c | user91 | 2 | 55 |
| c | user14 | 2 | 55 |
| c | user8 | 2 | 55 |
| c | user80 | 2 | 55 |
| c | user39 | 2 | 55 |
| c | user64 | 2 | 55 |
| c | user20 | 2 | 55 |
| c | user41 | 2 | 55 |
| c | user2 | 2 | 55 |
| c | user34 | 2 | 55 |
| c | user18 | 1 | 71 |
| c | user30 | 1 | 71 |
| c | user28 | 1 | 71 |
| c | user44 | 1 | 71 |
| c | user57 | 1 | 71 |
| c | user6 | 1 | 71 |
| c | user76 | 1 | 71 |
| c | user9 | 1 | 71 |
| c | user93 | 1 | 71 |
| c | user31 | 1 | 71 |
| c | user33 | 1 | 71 |
+------+----------+--------+-------+--+
step3: based on step 2, filter with where rank <= 3
Putting the steps together:
Result: two variants (with and without the rank column)
select pid, uid, count
from (
    select pid, uid, count, rank() over (partition by pid order by count desc) as rank
    from (
        select pid, uid, count(uid) as count
        from ods_uid_pid_info
        group by pid, uid
    ) as tmp
) as result_tmp
where rank <= 3;
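Note why the window function sits one subquery below the filter: SQL evaluates window functions after WHERE, so neither Hive nor standard SQL lets you write `where rank() over (...) <= 3` directly; the rank has to be materialized as a column first and filtered one level up. As a sketch, the same query can also be written with a CTE (assuming Hive 0.13+ for WITH support); the nesting around the window function is still required:

```sql
-- Sketch: CTE form of the same top-3 query (assumes Hive 0.13+).
-- The WHERE on rnk must still live one level above the rank() call.
with counted as (
    select pid, uid, count(uid) as cnt
    from ods_uid_pid_info
    group by pid, uid
)
select pid, uid, cnt
from (
    select pid, uid, cnt,
           rank() over (partition by pid order by cnt desc) as rnk
    from counted
) ranked
where rnk <= 3;
```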
Result:
0: jdbc:hive2://hadoop101:10000> select pid,uid,count
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select pid,uid,count, rank()over(partition by pid order by count desc) as rank
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select pid,uid,count(uid) as count
. . . . . . . . . . . . . . . .> from ods_uid_pid_info
. . . . . . . . . . . . . . . .> group by pid,uid
. . . . . . . . . . . . . . . .> )as tmp
. . . . . . . . . . . . . . . .> )as result_tmp
. . . . . . . . . . . . . . . .> where rank<=3;
INFO : Compiling command(queryId=double_happy_20190917143131_3e5d7235-f034-4022-bea0-b86d6464a437): select pid,uid,count
from(
select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
)as result_tmp
where rank<=3
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:pid, type:string, comment:null), FieldSchema(name:uid, type:string, comment:null), FieldSchema(name:count, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917143131_3e5d7235-f034-4022-bea0-b86d6464a437); Time taken: 0.096 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917143131_3e5d7235-f034-4022-bea0-b86d6464a437): select pid,uid,count
from(
select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
)as result_tmp
where rank<=3
INFO : Query ID = double_happy_20190917143131_3e5d7235-f034-4022-bea0-b86d6464a437
INFO : Total jobs = 2
INFO : Launching Job 1 out of 2
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0005, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0005/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0005
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:31:30,465 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:31:34,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.17 sec
INFO : 2019-09-17 14:31:40,960 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.51 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 510 msec
INFO : Ended Job = job_1568699800773_0005
INFO : Launching Job 2 out of 2
INFO : Starting task [Stage-2:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0006, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0006/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0006
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:31:49,210 Stage-2 map = 0%, reduce = 0%
INFO : 2019-09-17 14:31:54,497 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.17 sec
INFO : 2019-09-17 14:32:00,785 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.48 sec
INFO : MapReduce Total cumulative CPU time: 3 seconds 480 msec
INFO : Ended Job = job_1568699800773_0006
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.51 sec HDFS Read: 17079 HDFS Write: 6962 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.48 sec HDFS Read: 14767 HDFS Write: 117 SUCCESS
INFO : Total MapReduce CPU Time Spent: 5 seconds 990 msec
INFO : Completed executing command(queryId=double_happy_20190917143131_3e5d7235-f034-4022-bea0-b86d6464a437); Time taken: 37.896 seconds
INFO : OK
+------+---------+--------+--+
| pid | uid | count |
+------+---------+--------+--+
| a | user15 | 11 |
| a | user62 | 11 |
| a | user5 | 10 |
| b | user37 | 10 |
| b | user61 | 10 |
| b | user15 | 9 |
| b | user79 | 9 |
| c | user66 | 13 |
| c | user94 | 12 |
| c | user81 | 11 |
+------+---------+--------+--+
Including the rank column makes the ties easier to see:
0: jdbc:hive2://hadoop101:10000> select pid,uid,count,rank
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select pid,uid,count, rank()over(partition by pid order by count desc) as rank
. . . . . . . . . . . . . . . .> from(
. . . . . . . . . . . . . . . .> select pid,uid,count(uid) as count
. . . . . . . . . . . . . . . .> from ods_uid_pid_info
. . . . . . . . . . . . . . . .> group by pid,uid
. . . . . . . . . . . . . . . .> )as tmp
. . . . . . . . . . . . . . . .> )as result_tmp
. . . . . . . . . . . . . . . .> where rank<=3;
INFO : Compiling command(queryId=double_happy_20190917145252_c7075e67-87d3-4c1d-9d14-0944dfcdbdca): select pid,uid,count,rank
from(
select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
)as result_tmp
where rank<=3
INFO : Semantic Analysis Completed
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:pid, type:string, comment:null), FieldSchema(name:uid, type:string, comment:null), FieldSchema(name:count, type:bigint, comment:null), FieldSchema(name:rank, type:int, comment:null)], properties:null)
INFO : Completed compiling command(queryId=double_happy_20190917145252_c7075e67-87d3-4c1d-9d14-0944dfcdbdca); Time taken: 0.047 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
INFO : Executing command(queryId=double_happy_20190917145252_c7075e67-87d3-4c1d-9d14-0944dfcdbdca): select pid,uid,count,rank
from(
select pid,uid,count, rank()over(partition by pid order by count desc) as rank
from(
select pid,uid,count(uid) as count
from ods_uid_pid_info
group by pid,uid
)as tmp
)as result_tmp
where rank<=3
INFO : Query ID = double_happy_20190917145252_c7075e67-87d3-4c1d-9d14-0944dfcdbdca
INFO : Total jobs = 2
INFO : Launching Job 1 out of 2
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0010, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0010/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0010
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:52:46,500 Stage-1 map = 0%, reduce = 0%
INFO : 2019-09-17 14:52:50,688 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.06 sec
INFO : 2019-09-17 14:52:55,872 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.8 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 800 msec
INFO : Ended Job = job_1568699800773_0010
INFO : Launching Job 2 out of 2
INFO : Starting task [Stage-2:MAPRED] in serial mode
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO : set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO : set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO : set mapreduce.job.reduces=<number>
INFO : Starting Job = job_1568699800773_0011, Tracking URL = http://hadoop101:8088/proxy/application_1568699800773_0011/
INFO : Kill Command = /home/double_happy/app/hadoop/bin/hadoop job -kill job_1568699800773_0011
INFO : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
INFO : 2019-09-17 14:53:03,511 Stage-2 map = 0%, reduce = 0%
INFO : 2019-09-17 14:53:09,752 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
INFO : 2019-09-17 14:53:15,983 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 3.4 sec
INFO : MapReduce Total cumulative CPU time: 3 seconds 400 msec
INFO : Ended Job = job_1568699800773_0011
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.8 sec HDFS Read: 17080 HDFS Write: 6962 SUCCESS
INFO : Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 3.4 sec HDFS Read: 14832 HDFS Write: 137 SUCCESS
INFO : Total MapReduce CPU Time Spent: 6 seconds 200 msec
INFO : Completed executing command(queryId=double_happy_20190917145252_c7075e67-87d3-4c1d-9d14-0944dfcdbdca); Time taken: 36.913 seconds
INFO : OK
+------+---------+--------+-------+--+
| pid | uid | count | rank |
+------+---------+--------+-------+--+
| a | user15 | 11 | 1 |
| a | user62 | 11 | 1 |
| a | user5 | 10 | 3 |
| b | user37 | 10 | 1 |
| b | user61 | 10 | 1 |
| b | user15 | 9 | 3 |
| b | user79 | 9 | 3 |
| c | user66 | 13 | 1 |
| c | user94 | 12 | 2 |
| c | user81 | 11 | 3 |
+------+---------+--------+-------+--+
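Notice how rank() handles ties in this output: for pid=a, user15 and user62 share rank 1, and the next row jumps straight to rank 3, so "rank <= 3" can return more than three rows per group. If that is not what the requirement wants, the fix is to swap the window function; a sketch of the three options over the same aggregated subquery:

```sql
-- Sketch: comparing the three ranking functions on the same window.
-- rank():       1, 1, 3  (gaps after ties)
-- dense_rank(): 1, 1, 2  (no gaps)
-- row_number(): 1, 2, 3  (ties broken arbitrarily -> exactly N rows per group)
select pid, uid, cnt,
       rank()       over (partition by pid order by cnt desc) as rnk,
       dense_rank() over (partition by pid order by cnt desc) as drnk,
       row_number() over (partition by pid order by cnt desc) as rn
from (
    select pid, uid, count(uid) as cnt
    from ods_uid_pid_info
    group by pid, uid
) tmp;
```

Filtering on rn <= 3 guarantees exactly three rows per pid, while drnk <= 3 keeps every user in the top three distinct counts.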