Several key points in the MapReduce process


   Data flow in MapReduce
   (1) The simplest process: map - reduce
   (2) With a custom partitioner that sends the map results to a specified reducer: map - partition - reduce
   (3) With a local reduce added as an optimization: map - combine (local reduce) - partition - reduce
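The three flows above can be sketched in a few lines of plain Python. This is only an illustrative word count that runs in one process, not Hadoop code: the function names (map_fn, combine, partition, run_job) and the toy partitioner are assumptions made for the example.

```python
from collections import defaultdict

def map_fn(line):
    # Emit (word, 1) for every word in the input line.
    return [(word, 1) for word in line.split()]

def combine(pairs):
    # Local reduce on a single mapper's output: sum the counts per key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def partition(key, num_reducers):
    # Toy deterministic partitioner (assumption): sum of character codes.
    return sum(ord(c) for c in key) % num_reducers

def run_job(splits, num_reducers=2):
    # One list of (key, value) pairs per reducer.
    reducer_inputs = [[] for _ in range(num_reducers)]
    for split in splits:  # one simulated mapper per input split
        mapped = [pair for line in split for pair in map_fn(line)]
        for key, value in combine(mapped):  # the combiner shrinks the map output
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    results = {}
    for pairs in reducer_inputs:  # reduce: sum the values per key
        for key, value in sorted(pairs):
            results[key] = results.get(key, 0) + value
    return results

splits = [["a b a", "b c"], ["a c c"]]
counts = run_job(splits)
```

Removing the call to combine gives flow (2), and using a single reducer makes the partition step trivial, which gives flow (1).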

The output of the map function is first placed in an in-memory buffer, also called the ring buffer. When the buffer reaches 80% full, a spill is triggered and the buffered data is written to disk. While the spill is being written, the buffer generally continues to accept map output, so the whole process works as a pipeline.
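A minimal sketch of the 80% spill rule follows. It is a simplification under stated assumptions: the buffer counts records rather than bytes, the spill happens synchronously instead of in a background thread, and the class name SpillBuffer and its methods are hypothetical.

```python
class SpillBuffer:
    """Toy model of the map-side buffer (assumption: sized in records, not bytes)."""

    def __init__(self, capacity=10, threshold=0.8):
        self.limit = int(capacity * threshold)  # spill once this many records are held
        self.records = []
        self.spills = []  # each entry stands in for one sorted spill file on disk

    def collect(self, key, value):
        # Called for every map output record.
        self.records.append((key, value))
        if len(self.records) >= self.limit:
            self.spill()

    def spill(self):
        # Sort the buffered records by key and "write" them out as one spill.
        if self.records:
            self.spills.append(sorted(self.records))
            self.records = []

    def close(self):
        # Flush whatever is left when the map task finishes.
        self.spill()
        return self.spills

buf = SpillBuffer(capacity=10, threshold=0.8)
for i in range(20):
    buf.collect(f"k{19 - i:02d}", i)  # keys arrive in descending order
spills = buf.close()
```

With 20 records and a limit of 8, the sketch produces three spills, each sorted by key. A larger capacity means fewer spills for the same amount of map output.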

Map output is first written to the buffer and sorted there, and is written to disk only once it reaches a certain size. Writing every record to disk individually would be too expensive to be practical. The size of the buffer is therefore an important aspect of tuning: when the intermediate results of the map are very large, it is worth increasing the buffer appropriately. Writing the buffer to disk when it is full (80%) is called spilling (overflow writing). If a combiner is used, the combiner process runs before the data is written to disk.

Must all map processing be completed before reduce is executed?
Answer: No. Reduce performs preprocessing first: it preprocesses the output of the map nodes that have already finished, for example by sorting the data, and starts the real reduce calculation only when all the data has arrived.
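The reduce-side preprocessing can be pictured as merging the already-sorted outputs of the finished map tasks and only then applying the reduce function. The sketch below uses Python's heapq.merge for the merge step; the function name merge_and_reduce and the summing reducer are assumptions for illustration, not how Hadoop implements it internally.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_and_reduce(sorted_runs):
    # Preprocessing: merge the sorted map outputs into one stream ordered by key.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    # Real reduce: runs once all data is available, one call per key.
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(merged, key=itemgetter(0))
    }

# Each inner list stands in for the sorted output of one finished map task.
runs = [
    [("a", 2), ("b", 2), ("c", 1)],
    [("a", 1), ("c", 2)],
]
totals = merge_and_reduce(runs)
```

Because every run is already sorted, the merge never needs to re-sort the full data set, which is why the sorting work can start before the last map task finishes.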

The purpose of partitioning is to determine, based on the key, which Reducer a Mapper's output record is sent to for processing. Grouping is easier to understand: it also depends on the record's key. Within the same partition, records with the same key belong to the same group.
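A hash-based partitioner is one common way to make this decision. The sketch below is modeled loosely on that idea: zlib.crc32 stands in for the key's hash code (Python's built-in hash for strings changes between runs), and the function name get_partition is chosen for the example.

```python
import zlib

def get_partition(key, num_reducers):
    # Stable hash of the key (assumption: crc32 as a stand-in hash function).
    key_hash = zlib.crc32(key.encode("utf-8"))
    # Mask to a non-negative value, then map it onto the reducer range.
    return (key_hash & 0x7FFFFFFF) % num_reducers

records = [("apple", 1), ("pear", 1), ("apple", 1)]
assigned = [get_partition(key, 3) for key, _ in records]
```

Both "apple" records receive the same partition number, so they reach the same reducer, where they form one group.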

For ordinary keys, as long as the key is the same, the corresponding values are sent to the same reducer.
For compound keys of the form TextPair<key1,key2>, partitioning on key1 places all records with the same key1 in the same partition. Within that partition, records whose key2 differs are still divided into different groups.
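The compound-key case can be sketched as follows, with a Python tuple (key1, key2) standing in for TextPair. The helper names partition_by_first and shuffle, the toy partitioner, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import groupby

def partition_by_first(pair_key, num_reducers):
    # Only key1 decides the partition (toy hash: sum of character codes).
    key1, _ = pair_key
    return sum(ord(c) for c in key1) % num_reducers

def shuffle(records, num_reducers):
    partitions = defaultdict(list)
    for pair_key, value in records:
        pid = partition_by_first(pair_key, num_reducers)
        partitions[pid].append((pair_key, value))
    grouped = {}
    for pid, recs in partitions.items():
        recs.sort(key=lambda r: r[0])  # sort by the full (key1, key2) pair
        # Group on the full key: same key1 with a different key2 is a different group.
        grouped[pid] = [
            (key, [value for _, value in group])
            for key, group in groupby(recs, key=lambda r: r[0])
        ]
    return grouped

records = [
    (("2015", "b"), 3),
    (("2015", "a"), 1),
    (("2016", "a"), 7),
    (("2015", "a"), 2),
]
groups = shuffle(records, num_reducers=4)
```

Here every record with key1 "2015" lands in the same partition, but ("2015", "a") and ("2015", "b") are handed to the reduce step as two separate groups.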

