Hive performance optimization: joining a large table with a small table

When a large table is joined with a small table, a map join (mapjoin) is much faster than an ordinary (common) join, and it can also avoid data skew. The basic principle: when the small table holds only a small amount of data, it is loaded into the memory of every task that executes the join, which speeds up the join.
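Why a map join sidesteps skew can be sketched with a toy Python model (not Hive code). A common join shuffles rows to reducers by join key, so every row of a "hot" key lands on the same reducer; a map join instead splits the large table into input chunks regardless of key. The function names, the reducer/mapper counts and the sample keys below are hypothetical.

```python
from collections import Counter

def reducer_loads(keys, num_reducers):
    """Rows each reducer receives when rows are shuffled by join key (common join)."""
    loads = Counter()
    for k in keys:
        # every row with the same key is sent to the same reducer
        loads[hash(k) % num_reducers] += 1
    return [loads[r] for r in range(num_reducers)]

def mapper_loads(keys, num_mappers):
    """Rows each mapper reads when the large table is split into even chunks (map join)."""
    base, extra = divmod(len(keys), num_mappers)
    return [base + (1 if i < extra else 0) for i in range(num_mappers)]

large_keys = [1] * 90 + list(range(2, 12))  # hypothetical: key 1 is "hot" (90 of 100 rows)
print(reducer_loads(large_keys, 4))  # [2, 92, 3, 3]
print(mapper_loads(large_keys, 4))   # [25, 25, 25, 25]
```

With integer keys the toy partitioner is deterministic: one reducer receives 92 of the 100 rows, while each of the four mappers reads 25.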

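In an ordinary join between a large table and a small table, put the small table first (on the left of the JOIN). Hive buffers the earlier table in memory and streams the later one, so the small table is the one that gets cached.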
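The effect of join order can be sketched as a toy reducer in Python, assuming the behavior described in the Hive join documentation: in each join stage the last table is streamed and the earlier tables are buffered. The function `common_join` and the sample data are hypothetical; the point is how many rows must be held in memory for a single key.

```python
from collections import defaultdict

def common_join(left, right):
    """Toy reduce-side inner join: buffer `left` rows per key, stream `right` rows.

    left, right: lists of (key, value). A real reducer holds only one key's
    group at a time, so `peak` reports the largest buffered group.
    Returns (joined_rows, peak).
    """
    buffered = defaultdict(list)
    for key, value in left:
        buffered[key].append(value)  # earlier table: kept in memory
    peak = max((len(v) for v in buffered.values()), default=0)

    joined = []
    for key, value in right:  # last table: streamed row by row, never buffered
        for left_value in buffered.get(key, []):
            joined.append((key, left_value, value))
    return joined, peak
```

For a hypothetical small table of 2 rows and a large table of 1,500 rows, `common_join(small, large)` buffers at most 1 row per key, while `common_join(large, small)` produces the same 1,500 matches but buffers up to 1,000 rows for one key.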

A map join loads the small table into memory and, on the map side, matches the rows of the large table against it one by one, so the reduce phase is skipped entirely.
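The map-side mechanism is a broadcast hash join, sketched below in Python as a toy model. In Hive itself a map join is requested with the `/*+ MAPJOIN(...) */` hint or chosen automatically when `hive.auto.convert.join` is enabled; the function `map_join`, its arguments and the sample rows here are hypothetical.

```python
def map_join(small_table, large_table_splits):
    """Toy map-side (broadcast hash) inner join.

    small_table: list of (key, value) rows, small enough to fit in memory.
    large_table_splits: list of input splits (one per mapper), each a list of
    (key, value) rows from the large table.
    """
    # Build an in-memory hash table from the small table. Hive ships a copy
    # to every mapper; in this toy all "mappers" share one dict.
    lookup = {}
    for key, value in small_table:
        lookup.setdefault(key, []).append(value)

    output = []
    for split in large_table_splits:  # each iteration plays one mapper
        for key, value in split:      # stream the large table row by row
            for small_value in lookup.get(key, []):
                output.append((key, small_value, value))
    return output  # no shuffle or reduce: map output is the final result

print(map_join([(1, "north"), (2, "south")], [[(1, 10), (2, 20)], [(1, 30), (3, 40)]]))
# [(1, 'north', 10), (2, 'south', 20), (1, 'north', 30)]
```

The row with key 3 has no match in the small table and is dropped, as in an inner join.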


Source: www.cnblogs.com/18800105616a/p/11454117.html