Introduction
LEFT SEMI JOIN (left semi-join) is an efficient way to implement IN/EXISTS subquery semantics.
Hive versions before 0.13 do not support IN/EXISTS subqueries in the WHERE clause, so in those versions you can rewrite such subqueries with LEFT SEMI JOIN.
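Example
A query that filters the left table with an IN subquery on a key can be rewritten as a LEFT SEMI JOIN on that same key.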
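The original queries for this example are not preserved in the text, so the following is only a minimal sketch of the rewrite. The tables a and b and the columns key and value are hypothetical names chosen for illustration.

```sql
-- Subquery form (hypothetical tables a and b)
SELECT a.key, a.value
FROM a
WHERE a.key IN (SELECT b.key FROM b);

-- Equivalent LEFT SEMI JOIN form
SELECT a.key, a.value
FROM a
LEFT SEMI JOIN b ON (a.key = b.key);
```

Both forms return the rows of a whose key has at least one match in b.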
Features
1. The main limitation of LEFT SEMI JOIN is that the right-hand table may be referenced only in the ON clause. Any filter on the right table must be written there; the right table cannot be referenced in the WHERE clause, the SELECT clause, or anywhere else.
2. A LEFT SEMI JOIN passes only the join key of the right table to the map stage, so the final SELECT list can contain only columns from the left table.
3. A LEFT SEMI JOIN has IN (keySet) semantics: for each row of the left table, matching stops at the first match in the right table and any further duplicates are skipped, whereas a regular JOIN keeps traversing all matches. As a result, when the right table contains duplicate keys, LEFT SEMI JOIN produces one row per matching left row while JOIN produces one row per match. This is also why LEFT SEMI JOIN performs better.
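The three points above can be sketched as follows. The tables orders and customers and all column names are assumptions made up for illustration; they do not come from the original text.

```sql
-- Hypothetical tables: orders(id, customer_id, amount), customers(id, country)

-- Point 1: a filter on the right table goes in the ON clause.
SELECT o.id, o.amount
FROM orders o
LEFT SEMI JOIN customers c
  ON (o.customer_id = c.id AND c.country = 'US');

-- Points 1 and 2: these forms are expected to be rejected, because the
-- right table c is referenced outside the ON clause.
-- SELECT o.id, c.country
-- FROM orders o LEFT SEMI JOIN customers c ON (o.customer_id = c.id);
--
-- SELECT o.id
-- FROM orders o LEFT SEMI JOIN customers c ON (o.customer_id = c.id)
-- WHERE c.country = 'US';

-- Point 3: if customers contains the same id twice, the semi join still
-- returns each matching order once, while the inner join returns it twice.
SELECT o.id
FROM orders o
LEFT SEMI JOIN customers c ON (o.customer_id = c.id);

SELECT o.id
FROM orders o
JOIN customers c ON (o.customer_id = c.id);
```

If columns from the right table are needed in the result, a regular JOIN (with de-duplication of the right table where necessary) is the appropriate choice instead.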