Hive参数传递与相关函数

1. Hive的参数传递

1.1 Hive命令行

查看hive命令的参数

[hadoop@node03 ~]$ hive -help

语法结构：

hive [-hiveconf x=y]* [<-i filename>]* [<-f filename>|<-e query-string>] [-S]

-i 从文件初始化HQL。
-e从命令行执行指定的HQL
-f 执行HQL脚本
-v 输出执行的HQL语句到控制台
-p <port> connect to Hive Server on port number
-hiveconf x=y Use this to set hive/hadoop configuration variables. 设置hive运行时候的参数配置

1.2 Hive参数配置方式

对于一般参数，有以下三种设定方式：

配置文件 hive-site.xml
命令行参数启动hive客户端的时候可以设置参数
参数声明进入客户端以后设置的一些参数 set

（1）配置文件：

Hive的配置文件包括
- 用户自定义配置文件：$HIVE_CONF_DIR/hive-site.xml
- 默认配置文件：$HIVE_CONF_DIR/hive-default.xml
用户自定义配置会覆盖默认配置。
Hive的配置会覆盖Hadoop的配置。配置文件的设定对本机启动的所有Hive进程都有效。

（2）命令行参数：

启动Hive（客户端或Server方式）时，可以在命令行添加-hiveconf param=value来设定参数，例如：

bin/hive --hiveconf hive.root.logger=INFO,console

这一设定只对本次启动的Session（对于Server方式启动，则是所有请求的Sessions）有效。

（3）参数声明：

可以在HQL中使用SET关键字设定参数，例如：

-- 设置mr中reduce个数
set mapreduce.job.reduces=100;

这一设定的作用域也是session级的。

上述三种设定方式的优先级依次递增。

即参数声明覆盖命令行参数，命令行参数覆盖配置文件设定。即：参数声明 > 命令行参数 > 配置文件参数（hive）
注意某些系统级的参数，例如log4j相关的设定，必须用前两种方式设定，因为那些参数的读取在Session建立以前已经完成了。

1.3 使用变量传递参数

在hive当中我们一般可以使用hivevar或者hiveconf来进行参数的传递。

（1）hiveconf使用说明

hiveconf用于定义HIVE执行上下文的属性(配置参数)，可覆盖覆盖hive-site.xml（hive-default.xml）中的参数值，如用户执行目录、日志打印级别、执行队列等。例如我们可以使用hiveconf来覆盖我们的hive属性配置；
hiveconf变量取值必须要使用hiveconf作为前缀参数，具体格式如下:
bin/hive --hiveconf "mapred.job.queue.name=root.default"

（2）hivevar使用说明

hivevar用于定义HIVE运行时的变量替换，类似于JAVA中的“PreparedStatement”，与${key}配合使用或者与 ${hivevar:key}；
对于hivevar取值可以不使用前缀hivevar，具体格式如下：

-- 使用前缀: ${hivevar:key}
-- 不使用前缀: ${key}
hive --hivevar name=zhangsan

（3）define使用说明

define与hivevar用途完全一样，简写为"-d"

define写法：

hive --hiveconf "mapred.job.queue.name=root.default" -d my="202003" --database myhive

-- 执行SQL
hive > select * from myhive.score where concat(year, month) = ${my} limit 5;

（4）hiveconf与hivevar使用实战

需求：hive当中执行以下hql语句，并将'201807'、'80'、'03'用参数的形式全部都传递进去。

select * from student left join score on student.s_id = score.s_id where score.month = '201807' and score.s_score > 80 and score.c_id = 03;

第一步：创建student表并加载数据

hive (myhive)> create external table student
> (s_id string,

> s_name string,

> s_birth string,

>s_sex string)

> row format delimited
>fields terminated by '\t';

第二步：加载数据到student表

hive (myhive)> load data local inpath '/install/hivedatas/student.csv' overwrite into table student;

第三步：定义hive脚本

开发hql脚本，并使用hiveconf和hivevar进行参数传入；
node03执行以下命令定义hql脚本：
vim hivevariable.hql

use myhive;
select * from student left join score on student.s_id = score.s_id where score.month = ${hiveconf:month} and score.s_score > ${hivevar:s_score} and score.c_id = ${c_id};

第四步：调用hive脚本并传递参数

student、score表内容如下：

node03执行以下命令：

hive --hiveconf month=202003 --hivevar s_score=80 --hivevar c_id=03 -f /install/hivedatas/hivevariable.hql

2. Hive的常用函数

2.1 系统内置函数

（1）查看系统自带的函数
hive> show functions;
（2）显示自带的函数的用法
hive> desc function upper;
（3）详细显示自带的函数的用法
hive> desc function extended upper;

2.2 数值计算函数

（1）取整函数: round

语法: round(double a)
返回值: BIGINT
说明: 返回double类型的整数值部分（遵循四舍五入）
hive> select round(3.1415926) from tableName;

（2）指定精度取整函数: round

语法: round(double a, int d)
返回值: DOUBLE
说明: 返回指定精度d的double类型；d为小数点个数
hive> select round(3.1415926, 4) from tableName;返回值为3.1416

（3）向下取整函数: floor

语法: floor(double a)
返回值: BIGINT
说明: 返回等于或者小于该double变量的最大的整数
hive> select floor(3.1415926) from tableName;返回3

（4）向上取整函数: ceil

语法: ceil(double a)
返回值: BIGINT
说明: 返回等于或者大于该double变量的最小的整数
hive> select ceil(3.1415926) from tableName;返回4

（5）向上取整函数: ceiling

语法: ceiling(double a)
返回值: BIGINT
说明: 与ceil功能相同
hive> select ceiling(3.1415926) from tableName;返回4

（6）取随机数函数: rand

语法: rand(), rand(int seed)
返回值: double
说明: 返回一个0到1范围内的随机数。如果指定种子seed，则会等到一个稳定的随机数序列；
hive> select rand() from tableName;
0.5577432776034763

2.3日期函数

（1）UNIX时间戳转日期函数: from_unixtime

语法: from_unixtime(bigint unixtime[, string format])
返回值: string
说明: 转化UNIX时间戳（从1970-01-01 00:00:00 UTC到指定时间的秒数）到当前时区的时间格式；前面的unixtime为秒数

hive> select from_unixtime(1323308943, 'yyyyMMdd') from tableName;返回20111208

（2）日期转周函数: weekofyear

语法: weekofyear (string date)
返回值: int
说明: 返回日期在当前的周数。

hive> select weekofyear('2011-12-08 10:03:01') from tableName;

（3）日期比较函数: datediff

语法: datediff(string enddate, string startdate)
返回值: int
说明: 返回结束日期减去开始日期的天数。
hive> select datediff('2012-12-08','2012-05-09') from tableName;

（4）日期增加函数: date_add

语法: date_add(string startdate, int days)
返回值: string
说明: 返回开始日期startdate增加days天后的日期。
hive> select date_add('2012-12-08',10) from tableName;

（5）日期减少函数: date_sub

语法: date_sub (string startdate, int days)
返回值: string
说明: 返回开始日期startdate减少days天后的日期。
hive> select date_sub('2012-12-08',10) from tableName;

2.4 条件函数

（1）If函数: if

语法: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
返回值: T
说明: 当条件testCondition为TRUE时，返回valueTrue；否则返回valueFalseOrNull
hive> select if(1=2,100,200) from tableName;
200

（2）非空查找函数: COALESCE

语法: COALESCE(T v1, T v2, …)
返回值: T
说明: 返回参数中的第一个非空值；如果所有值都为NULL，那么返回NULL
hive> select COALESCE(null,'100','50') from tableName;
100

（3）条件判断函数：CASE

语法: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
返回值: T
说明：如果a等于b，那么返回c；如果a等于d，那么返回e；否则返回f
hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from tableName;
mary

（4）条件判断函数：CASE

语法: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
返回值: T
说明：如果a为TRUE,则返回b；如果c为TRUE，则返回d；否则返回e
hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;
mary

2.5 字符串函数

（1）字符串连接函数：concat

语法: concat(string A, string B…)
返回值: string
说明：返回输入字符串连接后的结果，支持任意个输入字符串
hive> select concat('abc','def','gh') from tableName;
abcdefgh

（2）字符串连接并指定字符串分隔符：concat_ws

语法: concat_ws(string SEP, string A, string B…)
返回值: string
说明：返回输入字符串连接后的结果，SEP表示各个字符串间的分隔符
hive> select concat_ws(',','abc','def','gh') from tableName;
abc,def,gh

（3）字符串截取函数：substr

语法: substr(string A, int start), substring(string A, int start)
返回值: string
说明：返回字符串A从start位置到结尾的字符串
hive> select substr('abcde',3) from tableName;
cde
hive> select substr('abcde',-1) from tableName; （和ORACLE相同）
e

（4）字符串截取函数：substr, substring

语法: substr(string A, int start, int len),substring(string A, int start, int len)
返回值: string
说明：返回字符串A从start位置开始，长度为len的字符串
hive> select substr('abcde',3,2) from tableName;
cd

（5）去空格函数：trim

语法: trim(string A)
返回值: string
说明：去除字符串两边的空格
hive> select trim(' ab c ') from tableName;
ab c

（6）url解析函数 parse_url

语法: parse_url(string urlString, string partToExtract [, string keyToExtract])
返回值: string
说明：返回URL中指定的部分。partToExtract的有效值为：HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.
hive> select parse_url ('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') from tableName;
www.tableName.com

（7）json解析 get_json_object

语法: get_json_object(string json_string, string path)
返回值: string
说明：解析json的字符串json_string,返回path指定的内容。如果输入的json字符串无效，那么返回NULL。
hive> select get_json_object('{"store":{"fruit":\[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner') from tableName;
返回：amy（若最后一项改为$.email，则返回amy@only_for_json_udf_test.net）

（8）分割字符串函数: split

语法: split(string str, string pat)
返回值: array
说明: 按照pat字符串分割str，会返回分割后的字符串数组
hive> select split('abtcdtef','t') from tableName;
["ab","cd","ef"]

2.6 集合统计函数

（1）个数统计函数: count

语法: count(*), count(expr), count(DISTINCT expr[, expr_.])
返回值：Int
说明: count(*)统计检索出的行的个数，包括NULL值的行；count(expr)返回指定字段的非空值的个数；count(DISTINCT expr[, expr_.])返回指定字段的不同的非空值的个数；
hive> select count(*) from tableName;
hive> select count(distinct t) from tableName;

（2）总和统计函数: sum

语法: sum(col), sum(DISTINCT col)
返回值: double
说明: sum(col)统计结果集中col的相加的结果；sum(DISTINCT col)统计结果中col不同值相加的结果；
select sum(t) from tableName;

2.7 复合类型构建函数

（1）Map类型构建: map

语法: map (key1, value1, key2, value2, …)
说明：根据输入的key和value对构建map类型
第一步：建表
create table score_map(name string, score map<string, int>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';
第二步：创建数据内容如下并加载数据
cd /kkb/install/hivedatas/
vim score_map.txt

zhangsan 数学:80,语文:89,英语:95
lisi 语文:60,数学:80,英语:99
第三步：加载数据到hive表当中去
load data local inpath '/install/hivedatas/score_map.txt' overwrite into table score_map;
第四步： map结构数据访问：
-- 获取所有的value：
select name,map_values(score) from score_map;

-- 获取所有的key：
select name,map_keys(score) from score_map;

-- 按照key来进行获取value值
select name,score["数学"] from score_map;

-- 查看map元素个数
select name,size(score) from score_map;

-- 构建一个map
select map(1, 'zs', 2, 'lisi');

（2）Struct类型构建: struct

语法: struct(val1, val2, val3, …)
说明：根据输入的参数构建结构体struct类型，似于C语言中的结构体，内部数据通过X.X来获取，假设我数据格式是这样的，电影ABC，有1254人评价过，打分为7.4分；
第一步：创建struct表
create table movie_score(name string, info struct<number:int,score:float>)
row format delimited fields terminated by "\t"
collection items terminated by ":";
第二步：创建数据并加载数据
cd /kkb/install/hivedatas/
vim struct.txt

-- 电影ABC，有1254人评价过，打分为7.4分
ABC   1254:7.4
DEF   256:4.9
XYZ   456:5.4
第三步：加载数据
load data local inpath '/install/hivedatas/struct.txt' overwrite into table movie_score;
第四步： hive当中查询数据
hive> select * from movie_score;
hive> select info.number, info.score from movie_score;
OK
1254 7.4
256 4.9
456 5.4

-- 构建一个struct
select struct(1, 'anzhulababy', 'moon', 1.68);

（3）Array类型构建: array

语法: array(val1, val2, …)
说明：根据输入的参数构建数组array类型
第一步：创建表

hive> create table person(name string, work_locations array<string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

第二步：加载数据到person表当中去

cd /kkb/install/hivedatas/
vim person.txt

-- 数据内容格式如下
biansutao beijing,shanghai,tianjin,hangzhou
linan changchun,chengdu,wuhan

第三步：加载数据

hive > load data local inpath '/kkb/install/hivedatas/person.txt' overwrite into table person;

第四步：查询所有数据数据

hive > select * from person;

-- 按照下表索引进行查询
hive > select work_locations[0] from person;

-- 查询所有集合数据
hive > select work_locations from person;

-- 查询元素个数
hive > select size(work_locations) from person;

-- 构建array
select array(1, 2, 1);
select array(1, 'a', 1.0)

2.8 复杂类型长度统计函数

（1）Map类型长度函数: size(Map<k .V>)

语法: size(Map<k .V>)
返回值: int
说明: 返回map类型的长度
hive> select size(map(1, 'zs', 2, 'anzhulababy')) from tableName;
2

（2）array类型长度函数: size(Array<T>)

语法: size(Array<T>)
返回值: int
说明: 返回array类型的长度
hive> select size(t) from arr_table2;
4

（3）类型转换函数

类型转换函数: cast
语法: cast(expr as <type>)
返回值: Expected "=" to follow "type"
说明: 返回转换后的数据类型
hive> select cast('1' as bigint) from tableName;
1

2.9 行转列

（1）相关函数说明

CONCAT(string A/col, string B/col…)：返回输入字符串连接后的结果，支持任意个输入字符串;
CONCAT_WS(separator, str1, str2,...)：它是一个特殊形式的 CONCAT()。
- 第一个参数剩余参数间的分隔符。分隔符可以是与剩余参数一样的字符串。如果分隔符是 NULL，返回值也将为 NULL。
- 这个函数会跳过分隔符参数后的任何 NULL 和空字符串。分隔符将被加到被连接的字符串之间;
COLLECT_SET(col)：函数只接受基本数据类型，它的主要作用是将某字段的值进行去重汇总，产生array类型字段。

（2）数据准备

name	constellation	blood_type
孙悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
猪八戒	白羊座	A
按住那baby	射手座	A

（3）需求

把星座和血型一样的人归类到一起。结果如下：

射手座,A 老王|按住那baby
白羊座,A 孙悟空|猪八戒
白羊座,B 宋宋

（4）创建表数据文件

node03服务器执行以下命令创建文件，注意数据使用\t进行分割：

cd /install/hivedatas
vim constellation.txt

孙悟空   白羊座   A
老王   射手座   A
宋宋   白羊座   B
猪八戒   白羊座   A
按住那baby   射手座   A

（5）创建hive表并导入数据

创建hive表并加载数据：

hive (hive_explode)> create table person_info(name string, constellation string, blood_type string) row format delimited fields terminated by "\t";

加载数据

hive (hive_explode)> load data local inpath '/install/hivedatas/constellation.txt' into table person_info;

（6）按需求查询数据

分析需求：

（1）星座和血型归并在一起（base），并以“，”分隔；故此处可以采用concat函数连接血型和星座，两者之间以“，”区分

分组依据分别是人名和星座血型；

此时sql为：(select name,concat(contellation,",",blood_type) base from person_info ) t1

（2）相同血型相同星座的人名归并在一起（name），以“|”分隔；此处可以采用concat_ws函数连接人名，两者之间以“|”区分；重复的人名得去重汇总，采用concat_set函数。

此时的sql语句为：select t1.base,concat_ws('|',collect_et(t1.name))

(3)完整的sql语句为：

select t1.base, concat_ws('|', collect_set(t1.name)) as name
from
(select name, concat(constellation, "," , blood_type) as base from person_info) as t1
group by t1.base;

2.10 列转行

（1）函数说明

EXPLODE(col)：将hive一列中复杂的array或者map结构拆分成多行。
LATERAL VIEW
- 用法：LATERAL VIEW udtf(expression) tableAlias AS columnAlias
- 解释：用于和split, explode等UDTF一起使用，它能够将一列数据拆成多行数据，在此基础上可以对拆分后的数据进行聚合。
- 相当于拆分后的结果存入一个虚拟的中间表之中；

（2）数据准备

数据内容如下，字段之间都是使用\t进行分割
cd /install/hivedatas

vim movie.txt

《疑犯追踪》   悬疑,动作,科幻,剧情
《Lie to me》   悬疑,警匪,动作,心理,剧情
《战狼2》   战争,动作,灾难

（3）需求

将电影分类中的数组数据展开。结果如下：

《疑犯追踪》   悬疑
《疑犯追踪》   动作
《疑犯追踪》   科幻
《疑犯追踪》   剧情
《Lie to me》   悬疑
《Lie to me》   警匪
《Lie to me》   动作
《Lie to me》   心理
《Lie to me》   剧情
《战狼2》战争
《战狼2》动作
《战狼2》灾难

（4）创建hive表并导入数据

创建hive表

hive (hive_explode)> create table movie_info(movie string, category array<string>)
row format delimited fields terminated by "\t"
collection items terminated by ",";

加载数据

load data local inpath "/kkb/install/hivedatas/movie.txt" into table movie_info;

（5）按需求查询数据

select movie, category_name from movie_info
lateral view explode(category) table_tmp as category_name;

2.11 使用explode拆分json字符串

（1）需求：现在有一些数据格式如下：

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

其中字段与字段之间的分隔符是 |
我们要解析得到所有的monthSales对应的值为以下这一列（行转列）

4900
2090
6987

用get_json_object来获取key为monthSales的数据：

（2）第一步：创建hive表

hive (hive_explode)> create table hive_explode.explode_lateral_view (area string, goods_id string, sale_info string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS textfile;

（3）第二步：准备数据并加载数据

准备数据如下

cd /install/hivedatas

vim explode_json

加载数据到hive表当中去

hive (hive_explode)> load data local inpath '/install/hivedatas/explode_json' overwrite into table hive_explode.explode_lateral_view;

（4）第三步：使用explode拆分Array

hive (hive_explode)> select explode(split(goods_id, ',')) as goods_id from hive_explode.explode_lateral_view;

（5）第四步：使用explode拆解Map

hive (hive_explode)> select explode(split(area, ',')) as area from hive_explode.explode_lateral_view;

（6）第五步：拆解json字段

hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as sale_info from hive_explode.explode_lateral_view;

（7）用get_json_object来获取key为monthSales的数据：

此处必须配合LATERAL VIEW使用；因为get_json_object不支持多个字段。
lateral view用于和split、explode等UDTF一起使用的，能将一行数据拆分成多行数据；
在此基础上可以对拆分的数据进行聚合；
lateral view首先为原始表的每行调用UDTF，UDTF会把一行拆分成一行或者多行，lateral view在把结果组合，产生一个支持别名表的虚拟表。
配合lateral view查询多个字段；
如：hive (hive_explode)> select goods_id2, sale_info from explode_lateral_view
LATERAL VIEW explode(split(goods_id, ','))goods as goods_id2;

其中LATERAL VIEW explode(split(goods_id,','))goods相当于一个虚拟表，与原表explode_lateral_view笛卡尔积关联。

也可以多重使用，如下，也是三个表笛卡尔积的结果；
最终，获取key为monthSales的数据的sql语句为：

select

get_json_object(concat('{',sale_info_1,'}'),'$.source') as source,

get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,

get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as userCount,

get_json_object(concat('{',sale_info_1,'}'),'$.score') as score
from explode_lateral_view
LATERAL VIEW explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{'))sale_info as sale_info_1;

总结：

Lateral View通常和UDTF一起出现，为了解决UDTF不允许在select字段的问题。
Multiple Lateral View可以实现类似笛卡尔乘积。
Outer关键字可以把不输出的UDTF的空结果，输出成NULL，防止丢失数据。

2.13 分析函数—分组求topN

1、分析函数的作用

对于一些比较复杂的数据求取过程，我们可能就要用到分析函数
分析函数主要用于==分组求topN或者求取百分比，或者进行数据的切片==等等，我们都可以使用分析函数来解决

2、常用的分析函数

（1）ROW_NUMBER()：

从1开始，按照顺序，给分组内的记录加序列；
- 比如，按照pv降序排列，生成分组内每天的pv名次,ROW_NUMBER()的应用场景非常多
- 再比如，获取分组内排序第一的记录;
- 获取一个session中的第一条refer等。

（2）RANK() ：

生成数据项在分组中的排名，排名相等会在名次中留下空位

（3）DENSE_RANK() ：

生成数据项在分组中的排名，排名相等会在名次中不会留下空位

（4）CUME_DIST ：

小于等于当前值的行数/分组内总行数。比如，统计小于等于当前薪水的人数，所占总人数的比例

（5）PERCENT_RANK ：

分组内当前行的RANK值/分组内总行数

（6）NTILE(n) ：

用于将分组数据按照顺序切分成n片，返回当前切片值
如果切片不均匀，默认增加第一个切片的分布。
NTILE不支持ROWS BETWEEN，比如 NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)；

3、需求描述

现有数据内容格式如下，分别对应三个字段，cookieid，createtime ，pv
求取每个cookie访问pv前三名的数据记录，其实就是分组求topN，求取每组当中的前三个值

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7

第一步：创建数据库表

在hive当中创建数据库表

CREATE EXTERNAL TABLE cookie_pv (
cookieid string,
createtime string,
pv INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

第二步：准备数据并加载

node03执行以下命令，创建数据，并加载到hive表当中去
cd /install/hivedatas
vim cookiepv.txt

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,2
cookie2,2015-04-11,3
cookie2,2015-04-12,5
cookie2,2015-04-13,6
cookie2,2015-04-14,3
cookie2,2015-04-15,9
cookie2,2015-04-16,7
加载数据到hive表当中去
load data local inpath '/install/hivedatas/cookiepv.txt' overwrite into table cookie_pv;

第三步：使用分析函数来求取每个cookie访问PV的前三条记录

select * from (
SELECT cookieid,createtime,pv,
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3
FROM cookie_pv
) temp where temp.rn1 <= 3;

fengge18306

发布了31 篇原创文章 · 获赞 20 · 访问量 3076

私信关注