The shuffle process of the Hadoop MapReduce model

Shuffle is at the heart of the MapReduce programming model. It refers to the process that carries the key/value pairs emitted by a map task to the input accepted by the reduce tasks. This stage is often described as "where the magic happens" and is what guarantees that a MapReduce job runs smoothly. The official diagram describes it as follows:

[Figure: the official MapReduce shuffle diagram]

Let's first analyze the operation of the map side:

In this diagram, the map side's input comes from a block stored locally. Each call to the map function receives the byte offset and one line of the block. After the map function processes it, the resulting key/value data is written to an in-memory buffer. Once the data is in memory, the partition step runs: the key is hashed, and this determines which reducer all records with the same key will be sent to.

The buffer is typically 100 MB. When its contents exceed a certain threshold, the Hadoop framework writes part of the data out to a file on disk; this process is called spill (overflow writing). It is carried out by a separate thread and does not block the map function from producing more output. Before the spilled data reaches the disk, it is sorted by key; this step is called sort.

During the map phase many records with the same key are produced. In WordCount, for example, one map task may emit many key/value pairs such as ('is', 1); suppose there are 100 identical pairs. Writing them all straight to the file not only takes up external storage but also adds I/O time and hurts efficiency. So if the user has configured a combiner, it runs at this point and merges the pairs that share a key into one, here ('is', 100).

In the figure above, all the data destined for the same reducer sits contiguously inside each spill file: when a spill is written, Hadoop groups the records of the same partition together. When several spill files exist, they are merged into a single file; this step is called merge. During the merge, all values belonging to the same key are collected together into one group, and if a combiner is set, it is applied again here.
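To make the partition and combine steps above concrete, here is a minimal WordCount-style sketch against the standard org.apache.hadoop.mapreduce API; class names such as WordCountMapper, WordPartitioner and SumReducer are illustrative, not taken from the original post. The partitioner reproduces the default hash-modulo rule that sends equal keys to the same reducer, and the reducer class doubles as the combiner, so many ('is', 1) pairs collapse into a single ('is', 100) before they are spilled.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

// Emits (word, 1) for every token in the input line; the offset/line pair
// is exactly the (key, value) the map task receives from its local block.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // goes into the in-memory sort buffer
        }
    }
}

// Same hash-modulo rule the default HashPartitioner applies: records with
// equal keys always land in the same reduce partition.
class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

// Used both as the combiner (applied during spill/merge) and as the final
// reducer, turning e.g. one hundred ("is", 1) pairs into a single ("is", 100).
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```

In a driver, these classes would be wired up with job.setMapperClass(WordCountMapper.class), job.setCombinerClass(SumReducer.class), job.setPartitionerClass(WordPartitioner.class) and job.setReducerClass(SumReducer.class).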

Let's take a look at the operation of the reduce side:

As soon as map tasks start finishing (the reduce side does not have to wait for every map to complete), the reduce side fetches the map-side output files over HTTP. Because the map output files are stored by partition, each reducer only obtains the data belonging to its own partition. The fetched data is first held in memory, and while files are still being pulled, reduce keeps performing merge operations. There are three kinds of merge:

1. Memory to memory

2. Memory to disk

3. Disk to disk

The first kind is not enabled by default. While data is being pulled, if the memory in use exceeds a threshold, a spill thread is started here as well, and the merging during this period works much like it does on the map side.
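The thresholds mentioned above, on both the map side (the roughly 100 MB sort buffer and its spill trigger) and the reduce side (the fetch buffer and the point at which the memory-to-disk merge starts), are controlled by job configuration. The sketch below uses commonly cited Hadoop 2.x property names with their usual defaults; names and defaults can differ between Hadoop versions, and the memory-to-memory merge is switched on by a separate property not shown here, so treat this purely as an illustration rather than tuning advice.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Map side: size of the in-memory sort buffer (MB) and the fill
        // ratio at which the spill thread starts writing to disk.
        conf.setInt("mapreduce.task.io.sort.mb", 100);
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);

        // Reduce side: parallel fetch threads, the share of the reduce JVM
        // heap used to hold fetched map outputs, and the fill ratio that
        // triggers the memory-to-disk merge.
        conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 5);
        conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.70f);
        conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.66f);

        // How many files are combined at once during a disk-to-disk merge.
        conf.setInt("mapreduce.task.io.sort.factor", 10);

        Job job = Job.getInstance(conf, "shuffle tuning sketch");
        // ... mapper/combiner/partitioner/reducer wiring as in the earlier sketch ...
    }
}
```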

Once all the map outputs have been copied and spilled, the third kind of merge combines the multiple spill files into one. That final file is the input source of reduce, and with it the entire shuffle ends.
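That final disk-to-disk step is, at its core, a k-way merge of files that are each already sorted by key. The following self-contained sketch (invented names, not Hadoop's internal Merger code) shows the idea: a priority queue repeatedly takes the smallest current key across all spills, so several sorted spill files collapse into one sorted output.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Not Hadoop's internal merge implementation; just an illustration of the
// k-way merge that turns several key-sorted spill files into one sorted output.
public class KWayMergeSketch {

    // A (key, value) record as it would appear inside a spill file.
    static final class Entry {
        final String key;
        final int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    // Wraps one sorted spill so the heap can compare the current head records.
    static final class SpillCursor {
        final Iterator<Entry> it;
        Entry head;
        SpillCursor(List<Entry> spill) { it = spill.iterator(); advance(); }
        void advance() { head = it.hasNext() ? it.next() : null; }
    }

    // Each inner list must already be sorted by key, as spill files are.
    static List<Entry> merge(List<List<Entry>> sortedSpills) {
        PriorityQueue<SpillCursor> heap =
                new PriorityQueue<>(Comparator.comparing((SpillCursor c) -> c.head.key));
        for (List<Entry> spill : sortedSpills) {
            SpillCursor cursor = new SpillCursor(spill);
            if (cursor.head != null) {
                heap.add(cursor);
            }
        }
        List<Entry> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            SpillCursor smallest = heap.poll();   // spill holding the smallest current key
            merged.add(smallest.head);
            smallest.advance();
            if (smallest.head != null) {
                heap.add(smallest);               // re-queue with its next record
            }
        }
        return merged;                            // one fully key-sorted "final file"
    }
}
```

Because records that share a key come out adjacent, reduce can see each key exactly once with all of its values grouped together.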

Reference for this blog: http://langyu.iteye.com/blog/992916

