Join operations in Hive

Prepare the data

a.txt

1,a
2,b
3,c
4,d
7,y
8,u

b.txt

2,bb
3,cc
7,yy
9,pp

Create the tables and load the data

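-- Table a: comma-separated text with an id and a name per line
create table a(
  id   int,
  name string
)
row format delimited fields terminated by ',';
-- overwrite replaces any rows already in the table
load data local inpath '/hive/a.txt' overwrite into table a;

select * from a;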

+-------+---------+
| a.id  | a.name  |
+-------+---------+
| 1     | a       |
| 2     | b       |
| 3     | c       |
| 4     | d       |
| 7     | y       |
| 8     | u       |
+-------+---------+

-- Table b: same layout as table a
create table b(
  id   int,
  name string
)
row format delimited fields terminated by ',';
load data local inpath '/hive/b.txt' into table b;

select * from b;

+-------+---------+
| b.id  | b.name  |
+-------+---------+
| 2     | bb      |
| 3     | cc      |
| 7     | yy      |
| 9     | pp      |
+-------+---------+ 

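Note: in Hive, put the join condition in the on clause rather than in where. If the join key appears only in where, Hive may first build the Cartesian product of the two tables and only then filter it, which is far more expensive.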
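
To make the note concrete, the sketch below writes the same match between tables a and b in both styles. It assumes the two tables created earlier. Whether the second form really runs as a Cartesian product depends on the Hive version and optimizer settings, so explain is used here to inspect each plan instead of guessing.

```sql
-- Join key in on: the condition is part of the join itself.
explain
select * from a join b on a.id = b.id;

-- Join key only in where: the form the note warns about.
-- Compare this plan with the one above before relying on it.
explain
select * from a cross join b where a.id = b.id;
```

Without the explain prefix, both queries return the same matching rows; only the work needed to produce them may differ.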

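1. Inner join (inner join)

The join keyword performs an inner join by default, so inner can be omitted. It returns only the rows whose key exists in both tables. Here tables a and b are joined on id, so the output contains the ids the two tables have in common.

select * from a join b on a.id = b.id;

+-------+---------+-------+---------+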
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 7     | y       | 7     | yy      |
+-------+---------+-------+---------+
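
As a small illustration of the point that inner is optional, the query below spells the keyword out and names the columns explicitly. The aliases a_name and b_name are chosen here only for readability; they are not part of the original example.

```sql
-- Same matching rows as "a join b"; inner is written out
-- and the two name columns get distinct aliases.
select a.id,
       a.name as a_name,
       b.name as b_name
from a
inner join b on a.id = b.id;
```

With the sample data this yields three rows, for ids 2, 3 and 7.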

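2. Left join (left join)

The table on the left is the main table. Every row of the main table is kept, and the columns of the other table are filled with NULL where no match is found. Because b.id is unique in this example, the result has exactly as many rows as table a; if a row of a matched several rows of b, it would appear once per match.

select * from a left join b on a.id = b.id;

+-------+---------+-------+---------+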
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
+-------+---------+-------+---------+
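
The NULL padding of a left join is also a common way to find the rows that have no partner at all. The sketch below is one such use, assuming the same tables a and b: it keeps only the rows of a whose id does not occur in b.

```sql
-- Rows of a without a match in b:
-- after the left join, unmatched rows carry NULL in b.id.
select a.id, a.name
from a
left join b on a.id = b.id
where b.id is null;
```

With the sample data this returns the rows with ids 1, 4 and 8. The filter on b.id belongs in where here, because it has to run after the join has produced the NULLs.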

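3. Right join (right join)

The table on the right is the main table. Every row of the main table is kept, and the columns of the other table are filled with NULL where no match is found. Because a.id is unique in this example, the result has exactly as many rows as table b.

select * from a right join b on a.id = b.id;

+-------+---------+-------+---------+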
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 7     | y       | 7     | yy      |
| NULL  | NULL    | 9     | pp      |
+-------+---------+-------+---------+
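
A right join can always be rewritten as a left join with the two tables swapped. The sketch below shows that rewrite for the query above; the column aliases are illustrative and are listed explicitly so that the columns of a still come first.

```sql
-- Equivalent to "a right join b": b is now the left (main) table.
select a.id   as a_id,
       a.name as a_name,
       b.id   as b_id,
       b.name as b_name
from b
left join a on a.id = b.id;
```

With the sample data this again gives four rows, and the row for id 9 has NULL in a_id and a_name.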

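4. Full join (full join)

Returns the union of the rows of both tables: matching rows are combined, and rows that exist on only one side are kept with the other side's columns set to NULL.

select * from a full join b on a.id = b.id;

+-------+---------+-------+---------+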
| a.id  | a.name  | b.id  | b.name  |
+-------+---------+-------+---------+
| 1     | a       | NULL  | NULL    |
| 2     | b       | 2     | bb      |
| 3     | c       | 3     | cc      |
| 4     | d       | NULL  | NULL    |
| 7     | y       | 7     | yy      |
| 8     | u       | NULL  | NULL    |
| NULL  | NULL    | 9     | pp      |
+-------+---------+-------+---------+
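
Since either id column can be NULL after a full join, a single key column is often more convenient. The sketch below, assuming the same tables, uses coalesce to take whichever id is present; the output column names are chosen here for illustration.

```sql
-- coalesce returns its first non-NULL argument,
-- so id is filled whichever side the row came from.
select coalesce(a.id, b.id) as id,
       a.name as a_name,
       b.name as b_name
from a
full join b on a.id = b.id;
```

With the sample data this returns seven rows, one for each of the ids 1, 2, 3, 4, 7, 8 and 9.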

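5. Left semi join (left semi join)

left semi join treats the table before the keyword as the main table and returns the rows of that table whose on-condition key also exists in the other table. It is roughly an inner join that returns only the main table's columns, except that each row of the main table is returned at most once, however many rows match on the right.

select * from a left semi join b on a.id = b.id;

+-------+---------+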
| a.id  | a.name  |
+-------+---------+
| 2     | b       |
| 3     | c       |
| 7     | y       |
+-------+---------+
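
A left semi join expresses the same filter as an in subquery. The sketch below shows that form for comparison, assuming the same tables; subqueries of this kind in where are supported from Hive 0.13 on.

```sql
-- Same rows as "a left semi join b on a.id = b.id".
select a.id, a.name
from a
where a.id in (select b.id from b);

-- According to the Hive documentation, a left semi join may reference
-- the right-hand table only in the on clause, so b's columns cannot
-- appear in its select list or where clause.
```

If columns from b are needed in the output, an inner join is the form to use instead.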
 
Reposted from blog.csdn.net/weixin_45896475/article/details/104012391