Optimizing NOT IN in Hive

For example, given two tables A and B, find the values of the ID field that exist in table A but not in table B.

          Table A has 130,000 rows in total, 30,000 after deduplication.

          Table B has 20,000 rows in total and is indexed.

Method One

NOT IN: easy to understand, but inefficient. Time: 1.395s

select distinct A.id from A where A.id not in(select id from B)

Method Two

LEFT JOIN ... ON, then keep the rows where B.ID IS NULL. Time: 0.739s

select A.ID from A left join B on A.ID=B.ID where B.ID is null

Method Three

Correlated COUNT subquery: the fastest of the three. Time: 0.57s

select * from  A where (select count(1) as num from B where A.ID = B.ID) = 0
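The three queries above can be compared on a small data set to see what each one returns. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Hive, and the table contents are made up for the example. It does not reproduce the timings quoted above, which depend on the engine and the data.

```python
import sqlite3

# Hypothetical toy data: A contains a duplicate ID, B holds a subset of A's IDs.
# SQLite is used here only as a stand-in engine; it is not Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO A VALUES (1), (2), (2), (3), (4);
    INSERT INTO B VALUES (1), (3);
""")

# The three statements from the article, unchanged apart from casing.
queries = {
    "not_in": "SELECT DISTINCT A.ID FROM A WHERE A.ID NOT IN (SELECT ID FROM B)",
    "left_join": "SELECT A.ID FROM A LEFT JOIN B ON A.ID = B.ID WHERE B.ID IS NULL",
    "count_zero": "SELECT * FROM A WHERE (SELECT COUNT(1) AS num FROM B WHERE A.ID = B.ID) = 0",
}

# Run each query and collect the returned IDs in sorted order.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in queries.items()
}

for name, ids in results.items():
    print(name, ids)
```

On this toy data all three methods find the same set of IDs (2 and 4). Method One returns each ID once because of its DISTINCT, while Methods Two and Three return one row per matching row of A, so a DISTINCT is needed there as well if a deduplicated list is wanted.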


Origin www.cnblogs.com/starzy/p/11146056.html