Array type
Create the table and load the data. When creating the table, declare the field as an array type (here loaction array<string>) and use COLLECTION ITEMS TERMINATED BY ',' to specify the separator between array elements.
hive (wzj)> create table hive_array(
> name string,
> loaction array<string>)
> row format delimited fields terminated by '\t' collection items terminated by ',';
OK
Time taken: 0.426 seconds
hive (wzj)> load data local inpath '/home/wzj/data/hive_array.txt' overwrite into table hive_array;
Loading data to table wzj.hive_array
Table wzj.hive_array stats: [numFiles=1, totalSize=77]
OK
Time taken: 0.98 seconds
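Given the declared delimiters (tab between fields, comma between array items) and the query results below, /home/wzj/data/hive_array.txt presumably looks like this (reconstructed from the output, not the original file):

```
pk	beijing,shanghai,tianjin,hangzhou
jepson	changchu,chengdu,wuhan,beijing
```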
0: jdbc:hive2://hadoop001:10000/data_hive> select * from hive_array;
INFO : OK
+------------------+----------------------------------------------+--+
| hive_array.name  | hive_array.loaction                          |
+------------------+----------------------------------------------+--+
| pk               | ["beijing","shanghai","tianjin","hangzhou"]  |
| jepson           | ["changchu","chengdu","wuhan","beijing"]     |
+------------------+----------------------------------------------+--+
2 rows selected (0.385 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select name,loaction[0],loaction[2],size(loaction) from hive_array;
INFO : OK
+---------+-----------+----------+------+--+
| name    | _c1       | _c2      | _c3  |
+---------+-----------+----------+------+--+
| pk      | beijing   | tianjin  | 4    |
| jepson  | changchu  | wuhan    | 4    |
+---------+-----------+----------+------+--+
2 rows selected (0.201 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive>
Using the array_contains function in a WHERE clause:
0: jdbc:hive2://hadoop001:10000/data_hive> select * from hive_array where array_contains(loaction,'wuhan');
INFO : OK
+------------------+-------------------------------------------+--+
| hive_array.name  | hive_array.loaction                       |
+------------------+-------------------------------------------+--+
| jepson           | ["changchu","chengdu","wuhan","beijing"]  |
+------------------+-------------------------------------------+--+
1 row selected (0.238 seconds)
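The semantics of the array accessors used above (zero-based indexing, size(), array_contains()) can be sketched in Python. This is a rough analogy for illustration, not Hive's implementation:

```python
# Rough Python analogy of Hive's array accessors (not Hive itself).
rows = [
    ("pk", ["beijing", "shanghai", "tianjin", "hangzhou"]),
    ("jepson", ["changchu", "chengdu", "wuhan", "beijing"]),
]

# loaction[0], loaction[2], size(loaction) -- Hive arrays are zero-indexed
for name, loaction in rows:
    print(name, loaction[0], loaction[2], len(loaction))

# where array_contains(loaction, 'wuhan')
matches = [name for name, loaction in rows if "wuhan" in loaction]
print(matches)  # ['jepson']
```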
Map type
Create the table and load the data. Declare the Map field as members map<string,string>, then specify the delimiter between map entries (collection items terminated by '#') and the delimiter between each key and its value (map keys terminated by ':').
0: jdbc:hive2://hadoop001:10000/data_hive> create table hive_map( id int, name string,members map<string,string>, age int)row format delimited fields terminated by ',' collection items terminated by '#' map keys terminated by ':';
INFO : OK
No rows affected (0.289 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> load data local inpath '/home/wzj/data/hive_map.txt' into table hive_map;
INFO : OK
No rows affected (0.564 seconds)
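Given the delimiters (',' between fields, '#' between map entries, ':' between key and value) and the rows selected below, /home/wzj/data/hive_map.txt presumably looks like this (reconstructed from the output):

```
1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26
```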
0: jdbc:hive2://hadoop001:10000/data_hive> select * from hive_map;
INFO : OK
+--------------+----------------+---------------------------------------------------------------+---------------+--+
| hive_map.id  | hive_map.name  | hive_map.members                                              | hive_map.age  |
+--------------+----------------+---------------------------------------------------------------+---------------+--+
| 1            | zhangsan       | {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} | 28            |
| 2            | lisi           | {"father":"mayun","mother":"huangyi","brother":"guanyu"}      | 22            |
| 3            | wangwu         | {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} | 29            |
| 4            | mayun          | {"father":"mayongzhen","mother":"angelababy"}                 | 26            |
+--------------+----------------+---------------------------------------------------------------+---------------+--+
4 rows selected (0.173 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select name,members['father'] as father,members['mother'] as mother from hive_map;
INFO : OK
+-----------+--------------+-------------+--+
| name      | father       | mother      |
+-----------+--------------+-------------+--+
| zhangsan  | xiaoming     | xiaohuang   |
| lisi      | mayun        | huangyi     |
| wangwu    | wangjianlin  | ruhua       |
| mayun     | mayongzhen   | angelababy  |
+-----------+--------------+-------------+--+
4 rows selected (0.23 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select map_keys(members),map_values(members) from hive_map;
INFO : OK
+--------------------------------+-------------------------------------+--+
| _c0                            | _c1                                 |
+--------------------------------+-------------------------------------+--+
| ["father","mother","brother"]  | ["xiaoming","xiaohuang","xiaoxu"]   |
| ["father","mother","brother"]  | ["mayun","huangyi","guanyu"]        |
| ["father","mother","sister"]   | ["wangjianlin","ruhua","jingtian"]  |
| ["father","mother"]            | ["mayongzhen","angelababy"]         |
+--------------------------------+-------------------------------------+--+
4 rows selected (0.176 seconds)
Using array_contains on map_keys, find the people who have a brother and output who that brother is:
0: jdbc:hive2://hadoop001:10000/data_hive> select id,name,members['brother'] brother from hive_map where array_contains(map_keys(members),'brother');
INFO : OK
+-----+-----------+----------+--+
| id  | name      | brother  |
+-----+-----------+----------+--+
| 1   | zhangsan  | xiaoxu   |
| 2   | lisi      | guanyu   |
+-----+-----------+----------+--+
2 rows selected (0.175 seconds)
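The map accessors used above (members['father'], map_keys, map_values, and the array_contains filter) behave roughly like Python dict operations. A rough analogy, not Hive's implementation:

```python
# Rough Python analogy of Hive's map accessors (not Hive itself).
rows = [
    (1, "zhangsan", {"father": "xiaoming", "mother": "xiaohuang", "brother": "xiaoxu"}),
    (2, "lisi", {"father": "mayun", "mother": "huangyi", "brother": "guanyu"}),
    (3, "wangwu", {"father": "wangjianlin", "mother": "ruhua", "sister": "jingtian"}),
    (4, "mayun", {"father": "mayongzhen", "mother": "angelababy"}),
]

# map_keys(members) / map_values(members)
for _, _, members in rows:
    print(list(members.keys()), list(members.values()))

# where array_contains(map_keys(members), 'brother')
brothers = [(i, name, m["brother"]) for i, name, m in rows if "brother" in m]
print(brothers)  # [(1, 'zhangsan', 'xiaoxu'), (2, 'lisi', 'guanyu')]
```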
0: jdbc:hive2://hadoop001:10000/data_hive>
Struct type
0: jdbc:hive2://hadoop001:10000/data_hive> create table hive_struct(
. . . . . . . . . . . . . . . . . . . . .> id string,
. . . . . . . . . . . . . . . . . . . . .> info struct<name:string,age:int>
. . . . . . . . . . . . . . . . . . . . .> ) row format delimited fields terminated by '#'
. . . . . . . . . . . . . . . . . . . . .> collection items terminated by ':' ;
INFO : OK
No rows affected (0.166 seconds)
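Given the delimiters ('#' between fields, ':' between struct members) and the rows below, /home/wzj/data/hive_struct.txt presumably looks like this (reconstructed from the output):

```
192.168.1.1#zhangsan:40
192.168.1.2#lisi:50
192.168.1.3#wangwu:60
192.168.1.4#zhaoliu:70
```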
0: jdbc:hive2://hadoop001:10000/data_hive> load data local inpath '/home/wzj/data/hive_struct.txt' into table hive_struct;
INFO : OK
0: jdbc:hive2://hadoop001:10000/data_hive> select * from hive_struct;
INFO : OK
+-----------------+-------------------------------+--+
| hive_struct.id  | hive_struct.info              |
+-----------------+-------------------------------+--+
| 192.168.1.1     | {"name":"zhangsan","age":40}  |
| 192.168.1.2     | {"name":"lisi","age":50}      |
| 192.168.1.3     | {"name":"wangwu","age":60}    |
| 192.168.1.4     | {"name":"zhaoliu","age":70}   |
+-----------------+-------------------------------+--+
4 rows selected (0.131 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select id,info.name,info.age from hive_struct;
INFO : OK
+--------------+-----------+------+--+
| id           | name      | age  |
+--------------+-----------+------+--+
| 192.168.1.1  | zhangsan  | 40   |
| 192.168.1.2  | lisi      | 50   |
| 192.168.1.3  | wangwu    | 60   |
| 192.168.1.4  | zhaoliu   | 70   |
+--------------+-----------+------+--+
4 rows selected (0.161 seconds)
A practical example: ad list and click log
0: jdbc:hive2://hadoop001:10000/data_hive> create table ad_list(
. . . . . . . . . . . . . . . . . . . . .> ad_id string,
. . . . . . . . . . . . . . . . . . . . .> url string,
. . . . . . . . . . . . . . . . . . . . .> catalogs string
. . . . . . . . . . . . . . . . . . . . .> ) row format delimited fields terminated by '\t';
INFO : OK
No rows affected (0.129 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> load data local inpath '/home/wzj/data/ad_list.txt' into table ad_list;
INFO : OK
No rows affected (0.384 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> create table click_log(
. . . . . . . . . . . . . . . . . . . . .> cookie_id string,
. . . . . . . . . . . . . . . . . . . . .> ad_id string,
. . . . . . . . . . . . . . . . . . . . .> time string
. . . . . . . . . . . . . . . . . . . . .> ) row format delimited fields terminated by '\t';
INFO : OK
No rows affected (0.147 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> load data local inpath '/home/wzj/data/click_log.txt' into table click_log;
INFO : OK
No rows affected (0.354 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select * from click_log;
INFO : OK
+----------------------+------------------+-----------------------------+--+
| click_log.cookie_id  | click_log.ad_id  | click_log.time              |
+----------------------+------------------+-----------------------------+--+
| 11                   | ad_101           | 2014-05-01 06:01:12.334+01  |
| 22                   | ad_102           | 2014-05-01 07:28:12.342+01  |
| 33                   | ad_103           | 2014-05-01 07:50:12.33+01   |
| 11                   | ad_104           | 2014-05-01 09:27:12.33+01   |
| 22                   | ad_103           | 2014-05-01 09:03:12.324+01  |
| 33                   | ad_102           | 2014-05-02 19:10:12.343+01  |
| 11                   | ad_101           | 2014-05-02 09:07:12.344+01  |
| 35                   | ad_105           | 2014-05-03 11:07:12.339+01  |
| 22                   | ad_104           | 2014-05-03 12:59:12.743+01  |
| 77                   | ad_103           | 2014-05-03 18:04:12.355+01  |
| 99                   | ad_102           | 2014-05-04 00:36:39.713+01  |
| 33                   | ad_101           | 2014-05-04 19:10:12.343+01  |
| 11                   | ad_101           | 2014-05-05 09:07:12.344+01  |
| 35                   | ad_102           | 2014-05-05 11:07:12.339+01  |
| 22                   | ad_103           | 2014-05-05 12:59:12.743+01  |
| 77                   | ad_104           | 2014-05-05 18:04:12.355+01  |
| 99                   | ad_105           | 2014-05-05 20:36:39.713+01  |
+----------------------+------------------+-----------------------------+--+
17 rows selected (0.179 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select * from ad_list;
INFO : OK
+----------------+------------------------+--------------------------------------+--+
| ad_list.ad_id  | ad_list.url            | ad_list.catalogs                     |
+----------------+------------------------+--------------------------------------+--+
| ad_101         | http://www.google.com  | catalog8|catalog1                    |
| ad_102         | http://www.sohu.com    | catalog6|catalog3                    |
| ad_103         | http://www.baidu.com   | catalog7                             |
| ad_104         | http://www.qq.com      | catalog5|catalog1|catalog4|catalog9  |
| ad_105         | http://sina.com        | NULL                                 |
+----------------+------------------------+--------------------------------------+--+
5 rows selected (0.145 seconds)
Collect the distinct ad ids each cookie visited with collect_set (use the collect_list function instead if deduplication is not needed):
0: jdbc:hive2://hadoop001:10000/data_hive> select cookie_id,collect_set(ad_id) from click_log group by cookie_id;
INFO : OK
+------------+-------------------------------+--+
| cookie_id  | _c1                           |
+------------+-------------------------------+--+
| 11         | ["ad_101","ad_104"]           |
| 22         | ["ad_102","ad_103","ad_104"]  |
| 33         | ["ad_103","ad_102","ad_101"]  |
| 35         | ["ad_105","ad_102"]           |
| 77         | ["ad_103","ad_104"]           |
| 99         | ["ad_102","ad_105"]           |
+------------+-------------------------------+--+
6 rows selected (41.735 seconds)
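collect_set deduplicates while aggregating, whereas collect_list keeps duplicates. The difference can be sketched in Python (a rough analogy; Hive guarantees no particular element order):

```python
# Rough Python analogy of collect_set vs collect_list (not Hive itself).
clicks = [("11", "ad_101"), ("11", "ad_104"), ("11", "ad_101"), ("22", "ad_102")]

# group by cookie_id
groups = {}
for cookie_id, ad_id in clicks:
    groups.setdefault(cookie_id, []).append(ad_id)

collect_list = {k: v for k, v in groups.items()}                      # keeps duplicates
collect_set = {k: list(dict.fromkeys(v)) for k, v in groups.items()}  # dedups

print(collect_list["11"])  # ['ad_101', 'ad_104', 'ad_101']
print(collect_set["11"])   # ['ad_101', 'ad_104']
```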
0: jdbc:hive2://hadoop001:10000/data_hive> select click.cookie_id,click.ad_id,click.amount,ad_list.catalogs from
. . . . . . . . . . . . . . . . . . . . .> (select cookie_id,ad_id ,count(1) amount from click_log group by cookie_id,ad_id) click
. . . . . . . . . . . . . . . . . . . . .> join ad_list
. . . . . . . . . . . . . . . . . . . . .> on ad_list.ad_id = click.ad_id;
INFO : OK
+------------------+--------------+---------------+--------------------------------------+--+
| click.cookie_id  | click.ad_id  | click.amount  | ad_list.catalogs                     |
+------------------+--------------+---------------+--------------------------------------+--+
| 11               | ad_101       | 3             | catalog8|catalog1                    |
| 11               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 22               | ad_102       | 1             | catalog6|catalog3                    |
| 22               | ad_103       | 2             | catalog7                             |
| 22               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 33               | ad_101       | 1             | catalog8|catalog1                    |
| 33               | ad_102       | 1             | catalog6|catalog3                    |
| 33               | ad_103       | 1             | catalog7                             |
| 35               | ad_102       | 1             | catalog6|catalog3                    |
| 35               | ad_105       | 1             | NULL                                 |
| 77               | ad_103       | 1             | catalog7                             |
| 77               | ad_104       | 1             | catalog5|catalog1|catalog4|catalog9  |
| 99               | ad_102       | 1             | catalog6|catalog3                    |
| 99               | ad_105       | 1             | NULL                                 |
+------------------+--------------+---------------+--------------------------------------+--+
14 rows selected (52.864 seconds)
0: jdbc:hive2://hadoop001:10000/data_hive> select ad_id,catalog from ad_list lateral view outer explode(split(catalogs,'\\|')) t as catalog;
INFO : OK
+---------+-----------+--+
| ad_id   | catalog   |
+---------+-----------+--+
| ad_101  | catalog8  |
| ad_101  | catalog1  |
| ad_102  | catalog6  |
| ad_102  | catalog3  |
| ad_103  | catalog7  |
| ad_104  | catalog5  |
| ad_104  | catalog1  |
| ad_104  | catalog4  |
| ad_104  | catalog9  |
| ad_105  | NULL      |
+---------+-----------+--+
10 rows selected (0.145 seconds)
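lateral view explode(split(catalogs,'\\|')) splits the pipe-delimited string into one output row per catalog; the outer keyword keeps rows whose catalogs is NULL (emitting a NULL catalog for ad_105 instead of dropping the row). A rough Python analogy, not Hive's implementation:

```python
# Rough Python analogy of LATERAL VIEW OUTER EXPLODE(SPLIT(...)) (not Hive itself).
ad_list = [
    ("ad_101", "catalog8|catalog1"),
    ("ad_103", "catalog7"),
    ("ad_105", None),  # NULL catalogs
]

exploded = []
for ad_id, catalogs in ad_list:
    parts = catalogs.split("|") if catalogs is not None else []
    if parts:
        for catalog in parts:
            exploded.append((ad_id, catalog))  # one row per catalog
    else:
        # OUTER: keep the row with a NULL catalog instead of dropping it
        exploded.append((ad_id, None))

for row in exploded:
    print(row)
```
Without outer, a plain lateral view explode would drop the ad_105 row entirely, which is why the transcript above still shows ad_105 with a NULL catalog.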