目前hive不支持 in或not in 中包含查询子句的语法,所以只能通过left join实现。
假设有一个用户浏览商品表skim,和一个用户购买商品表buy。如下
skim表
userId | itemId | time |
---|---|---|
001 | 342 | 2015-05-08 |
002 | 382 | 2015-05-09 |
003 | 458 | 2015-05-09 |
004 | 468 | 2015-05-09 |
buy表
userId | itemId | time |
---|---|---|
001 | 342 | 2015-05-07 |
002 | 382 | 2015-05-08 |
003 | 458 | 2015-05-09 |
005 | 325 | 2015-05-09 |
select * from skim left outer join buy
on skim.userId = buy .userId and skim.itemId = buy .itemId
结果如下:
skim.userId | skim.itemId | skim.time | buy.userId | buy.itemId | buy.time |
---|---|---|---|---|---|
001 | 342 | 2015-05-08 | 001 | 342 | 2015-05-08 |
002 | 382 | 2015-05-09 | 002 | 382 | 2015-05-09 |
003 | 458 | 2015-05-09 | 003 | 458 | 2015-05-09 |
004 | 468 | 2015-05-09 |
IN
如果要查询在skim表中并且也在buy表中的信息,需要用in查询,hive sql如下:
select skim.userId , skim.itemId from skim left outer join buy
on skim.userId = buy .userId and skim.itemId = buy .itemId where buy .userId is not null;
结果如下:
userId | itemId |
---|---|
001 | 342 |
002 | 382 |
003 | 458 |
NOT IN
如果要查询在skim表中并且不也在buy表中的信息,需要用not in查询,hive sql如下:
select skim.userId, skim.itemId from skim left outer join buy
on skim.userId=buy .userId and skim.itemId=buy .itemId where buy .userId is null;
结果如下:
userId | itemId |
---|---|
004 | 468 |