order by
It performs a global sort on the input, so there is only one reducer (multiple reducers cannot guarantee global ordering), which leads to long computation times when the input is large.
sort by
It is not a global sort; it only sorts the data before it enters each reducer.
Therefore, if you sort with sort by and set mapred.reduce.tasks>1, then sort by only guarantees that the output of each reducer is sorted; it does not guarantee a global order.
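The difference can be illustrated with a small Python simulation. This is only a sketch, not Hive code: the function names `sort_by` and `order_by` are hypothetical, and the round-robin assignment of rows to reducers is an assumption made for illustration rather than how Hive actually distributes rows.

```python
def sort_by(rows, num_reducers):
    """Simulate sort by: spread rows over reducers, then sort inside each one."""
    partitions = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        # Round-robin is an assumed stand-in for the real row distribution.
        partitions[i % num_reducers].append(row)
    return [sorted(p) for p in partitions]


def order_by(rows):
    """Simulate order by: a single reducer sees and sorts every row."""
    return sorted(rows)


rows = [5, 3, 8, 1, 9, 2]
per_reducer = sort_by(rows, num_reducers=2)
# Reading the reducer outputs one after another gives the final result.
concatenated = [x for part in per_reducer for x in part]

print(per_reducer)     # [[5, 8, 9], [1, 2, 3]]
print(concatenated)    # [5, 8, 9, 1, 2, 3]
print(order_by(rows))  # [1, 2, 3, 5, 8, 9]
```

Each reducer's output is sorted, but the concatenated result is not. With a single reducer the two approaches give the same result, which is why order by pays for its global ordering with a single-reducer bottleneck.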
See you next time, bye!