All join operations in Hive
Prepare the data
a.txt
1,a
2,b
3,c
4,d
7,y
8,u
b.txt
2,bb
3,cc
7,yy
9,pp
Create the tables and load the data
create table a(
id int,
name string
)
row format delimited fields terminated by ',';
load data local inpath '/hive/a.txt' overwrite into table a;
+-------+---------+
| a.id  | a.name  |
+-------+---------+
| 1     | a       |
| 2     | b       |
| 3     | c       |
| 4     | d       |
| 7     | y       |
| 8     | u       |
+-------+---------+
create table b(
id int,
name string
)
row format delimited fields terminated by ',';
load data local inpath '/hive/b.txt' into table b;
+-------+---------+
| b.id  | b.name  |
+-------+---------+
| 2     | bb      |
| 3     | cc      |
| 7     | yy      |
| 9     | pp      |
+-------+---------+
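Note: in Hive, the join key should be specified in the on clause, not in the where clause. If the key appears only in where, Hive may first compute the Cartesian product of the two tables and then filter it.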
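The two forms contrasted in the note can be sketched as follows, using the tables a and b created above. Both queries return the same rows; they differ only in where the join key is declared. The alias b_name is made up for this illustration, and whether the second form is really planned as a Cartesian product depends on the Hive version and optimizer settings, so checking the plan with explain is the safer way to find out.

```sql
-- Preferred: the join key is declared in the on clause.
select a.id, a.name, b.name as b_name   -- b_name is an illustrative alias
from a
join b on a.id = b.id;

-- Discouraged: no on clause, the key appears only in where.
-- Hive may plan this as a cross join followed by a filter.
select a.id, a.name, b.name as b_name
from a
join b
where a.id = b.id;

-- Print the execution plan to see which join strategy was chosen.
explain
select a.id, a.name, b.name as b_name
from a
join b on a.id = b.id;
```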
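1. Inner join
A bare join is an inner join by default, so the inner keyword can be omitted. It returns only the rows whose key exists in both tables. Here a and b are joined on id, so only the ids present in both tables are output.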
select * from a join b on a.id = b.id;
+-------+---------+-------+---------+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 7     | y       | 7     | yy      |
+-------+---------+-------+---------+
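2. Left join
The table on the left of the keyword is the driving table. Every one of its rows is returned, and the columns of the right table are NULL where no match is found. Here the result has as many rows as a because each id matches at most one row in b; a left row that matched several right rows would appear once per match.
select * from a left join b on a.id = b.id;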
+-------+---------+-------+---------+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
+-------+---------+-------+---------+
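A common use of the NULL padding is to find the rows of a that have no partner in b. The sketch below keeps the join key in on and filters the padded rows in where. This is one place where a condition on the right table belongs in where on purpose, because it has to be applied after the join has produced the NULLs.

```sql
-- Rows of a whose id does not occur in b (an "anti join").
select a.id, a.name
from a
left join b on a.id = b.id
where b.id is null;

-- With the sample data this returns the ids 1, 4 and 8
-- (names a, d, u); row order is not guaranteed.
```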
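3. Right join
The table on the right of the keyword is the driving table. Every one of its rows is returned, and the columns of the left table are NULL where no match is found. Here the result has as many rows as b because each id matches at most one row in a; a right row that matched several left rows would appear once per match.
select * from a right join b on a.id = b.id;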
+-------+---------+-------+---------+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 7     | y       | 7     | yy      |
| NULL  | NULL    | 9     | pp      |
+-------+---------+-------+---------+
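A right join can always be rewritten as a left join with the two tables swapped. The sketch below should return the same rows as the right join above; the column aliases (a_id, a_name, b_id, b_name) are made up here only to keep the output columns in the same order and unambiguous.

```sql
-- Same rows as "a right join b", written as a left join driven by b.
select a.id   as a_id,
       a.name as a_name,
       b.id   as b_id,
       b.name as b_name
from b
left join a on a.id = b.id;
```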
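4. Full join
Returns all rows of both tables. Rows that match on the key are combined into one row, and for rows without a match the columns of the other table are NULL.
select * from a full join b on a.id = b.id;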
+-------+---------+-------+---------+
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
| NULL  | NULL    | 9     | pp      |
+-------+---------+-------+---------+
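Because either a.id or b.id can be NULL in a full join, it is often convenient to merge the two into a single key column. A minimal sketch using the built-in coalesce function, with illustrative aliases (id, a_name, b_name) that are not part of the original tables:

```sql
-- One key column for the whole result: take a.id if present, else b.id.
select coalesce(a.id, b.id) as id,
       a.name               as a_name,
       b.name               as b_name
from a
full join b on a.id = b.id;

-- With the sample data the id column contains 1, 2, 3, 4, 7, 8 and 9;
-- a_name is NULL for id 9 and b_name is NULL for ids 1, 4 and 8.
```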
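5. Left semi join
left semi join treats the table before the keyword as the driving table and returns only those of its rows whose on key also exists in the other table. It behaves like an inner join that outputs only the left table's columns, except that a left row is returned at most once even if several right rows match. The right table may be referenced only in the on clause, not in select or where.
select * from a left semi join b on a.id = b.id;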
+-------+---------+
| a.id  | a.name  |
+-------+---------+
| 2     | b       |
| 3     | c       |
| 7     | y       |
+-------+---------+
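A left semi join expresses the same thing as an in or exists subquery. The two sketches below should return the same rows as the query above, assuming a Hive version that supports subqueries in the where clause.

```sql
-- Equivalent formulation with in.
select a.id, a.name
from a
where a.id in (select b.id from b);

-- Equivalent formulation with a correlated exists.
select a.id, a.name
from a
where exists (select 1 from b where b.id = a.id);
```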