The implementation mechanism of ReduceTask in MapReduce

The Reduce side is roughly divided into three phases: copy, sort, and reduce, with the emphasis on the first two. The copy phase contains an eventFetcher thread that obtains the list of completed maps, while Fetcher threads copy the data. During this process two merge threads are started, inMemoryMerger and onDiskMerger, which merge in-memory data to disk and merge on-disk files, respectively. Once all the data has been copied, the copy phase is complete and the sort phase starts. The sort phase mainly performs the finalMerge operation and is a pure sorting stage; when it finishes, the reduce phase begins and calls the user-defined reduce function to process the data.
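
As a rough sketch of where the copy and merge behaviour described above is tuned, the snippet below sets a few of the shuffle-related properties defined in mapred-default.xml. The values are illustrative only, and the defaults quoted in the comments are those of recent Hadoop 2.x/3.x releases, so verify them against your version.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuningExample {
    public static Job newJob() throws Exception {
        Configuration conf = new Configuration();
        // Number of parallel Fetcher threads pulling map outputs (default 5).
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);
        // Fraction of the reducer heap used to buffer copied map outputs (default 0.70).
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
        // Buffer usage ratio at which the memory-to-disk merge is triggered (default 0.66).
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);
        // Maximum number of streams merged at once in a disk-to-disk merge (default 10).
        conf.setInt("mapreduce.task.io.sort.factor", 10);
        return Job.getInstance(conf, "shuffle-tuning-example");
    }
}
```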

Detailed steps:

 

  1. Copy phase: simply pulls data. The Reduce process starts several data-copy threads (Fetcher) which request, over HTTP, the output files that belong to this reducer from each MapTask.
  2. Merge phase: the merge here works like the merge on the map side, except that the buffer holds values copied from different map outputs. The copied data is first put into an in-memory buffer, whose size is more flexible here than on the map side. Merging takes three forms: memory-to-memory, memory-to-disk, and disk-to-disk. The first form is not enabled by default. When the amount of data in memory reaches a certain threshold, the memory-to-disk merge starts. As on the map side, this is a spill-to-disk process; if a Combiner is configured it is applied here as well, and many spill files are generated on disk. This second merge mode keeps running until there is no more map-side data, and then the third, disk-to-disk merge is started to produce the final file.
  3. After the scattered data have been merged into one large dataset, the merged data are also sorted.
  4. For the sorted key-value pairs, the reduce method is called: key-value pairs with equal keys go to a single call of the reduce method, each call produces zero or more key-value pairs, and finally these outputs are written to a file on HDFS (see the Reducer sketch after this list).
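
To make step 4 concrete, here is a minimal word-count style Reducer (the class name SumReducer and the value type are placeholders for illustration). All values sharing one key arrive in a single reduce() call, and context.write() emits the results, which the framework writes to the job's output directory on HDFS through the configured OutputFormat.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // One call per key: every value with this key is in the iterable.
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        // Zero or more output pairs may be written per call; here exactly one.
        context.write(key, result);
    }
}
```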


Origin www.cnblogs.com/TiePiHeTao/p/5d35cf700d18c6ad01323b3f4093e99c.html