Explanation of LEFT SEMI JOIN in Hive

Introduction


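LEFT SEMI JOIN (left semi-join) is a more efficient way to express IN/EXISTS subqueries.

Hive versions before 0.13 did not support IN/EXISTS subqueries in the WHERE clause, so such subqueries had to be rewritten with LEFT SEMI JOIN. Later versions accept the subquery form, but the LEFT SEMI JOIN rewrite remains valid.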


Example
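
A minimal sketch of the subquery form, assuming two hypothetical tables a(key, value) and b(key); the table and column names are illustrative and not taken from a real schema:

```sql
-- Assumed tables: a(key, value) and b(key); names are illustrative.
-- Keep the rows of a whose key appears at least once in b.
SELECT a.key, a.value
FROM a
WHERE a.key IN (SELECT b.key FROM b);
```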



can be rewritten as
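
A minimal sketch of the LEFT SEMI JOIN form, assuming two hypothetical tables a(key, value) and b(key); the table and column names are illustrative only:

```sql
-- Assumed tables: a(key, value) and b(key); names are illustrative.
-- Returns each row of a that has at least one matching key in b.
SELECT a.key, a.value
FROM a
LEFT SEMI JOIN b ON (a.key = b.key);
```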



Features


1. The main restriction of LEFT SEMI JOIN is that the right-hand table may only be referenced in the ON clause. Any filter on the right table must go there; it cannot appear in the WHERE clause, the SELECT clause, or anywhere else.

2. LEFT SEMI JOIN passes only the join key of the right table to the map stage, so the final SELECT list can only contain columns from the left table.

3. LEFT SEMI JOIN behaves like an IN (keySet) check. When the right table contains duplicate records for a key, the left row is emitted once and the remaining matches are skipped, whereas a regular JOIN keeps traversing every match. As a result, with duplicate values in the right table, LEFT SEMI JOIN produces one row per left row while JOIN produces several, which is also why LEFT SEMI JOIN tends to perform better.
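
The three points above can be sketched as follows, assuming hypothetical tables a(key, value) and b(key, dt). The dt column and the date literal are made up for illustration:

```sql
-- Assumed tables: a(key, value) and b(key, dt); names are illustrative.

-- 1. A filter on the right table b belongs in the ON clause.
SELECT a.key, a.value
FROM a
LEFT SEMI JOIN b ON (a.key = b.key AND b.dt = '2018-01-01');

-- Rejected: b is referenced outside the ON clause.
-- SELECT a.key, a.value
-- FROM a LEFT SEMI JOIN b ON (a.key = b.key)
-- WHERE b.dt = '2018-01-01';

-- 2. Only columns of the left table a may be selected.
-- Rejected: SELECT a.key, b.dt FROM a LEFT SEMI JOIN b ON (a.key = b.key);

-- 3. Duplicates: suppose a holds key = 1 once and b holds key = 1 twice.
-- The semi join returns that row of a once.
SELECT a.key FROM a LEFT SEMI JOIN b ON (a.key = b.key);
-- An inner join returns it once per matching row in b, so twice here.
SELECT a.key FROM a JOIN b ON (a.key = b.key);
```

If columns from the right table are needed in the result, a regular JOIN (with deduplication of the right side where appropriate) is the usual alternative.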


