The difference between sort by and order by in Hive

order by performs a global sort on the input, so it uses only one reducer (multiple reducers cannot guarantee a global order). As a result, the computation takes longer when the input is large.

sort by is not a global sort; it sorts the data before it enters each reducer.
Therefore, if you sort with sort by and set mapred.reduce.tasks>1, sort by only guarantees that the output of each reducer is ordered, not that the overall result is globally ordered.
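
To make the difference concrete, here is a small Python sketch that imitates the two behaviors. It is only a toy simulation and does not call Hive: the function names order_by and sort_by are made up for illustration, and dealing rows to reducers round-robin is an assumption (how Hive actually distributes rows is not described above).

```python
# Toy simulation only; it does not call Hive.
# Assumption: rows are dealt to reducers round-robin, purely for illustration.

def order_by(rows):
    """Imitate ORDER BY: every row goes to a single reducer, which sorts them all."""
    return sorted(rows)


def sort_by(rows, num_reducers):
    """Imitate SORT BY: each reducer sorts only the rows it receives."""
    buckets = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(bucket) for bucket in buckets]


rows = [5, 3, 9, 1, 7, 2]

global_result = order_by(rows)
per_reducer = sort_by(rows, num_reducers=2)
# Joining the reducer outputs back to back, as if reading the output files in turn.
concatenated = [value for part in per_reducer for value in part]

print(global_result)  # [1, 2, 3, 5, 7, 9]
print(per_reducer)    # [[5, 7, 9], [1, 2, 3]]
print(concatenated)   # [5, 7, 9, 1, 2, 3]
```

Each inner list is ordered, but the joined result is not, which is the "ordered per reducer, not globally" behavior described above. With num_reducers=1 the two functions produce the same ordering.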




Origin blog.csdn.net/frdevolcqzyxynjds/article/details/131856333