A detailed explanation of how MapReduce works (repost)

Reprinted from: http://weixiaolu.iteye.com/blog/1474172

 

Foreword: 

Some time ago our cloud computing group studied Hadoop together. Everyone pitched in, learned a lot, and got a lot out of it. But once the semester started, everyone got busy with their own things and the cloud computing work went quiet for a while. Haha~ Recently, though, at Boss Hu's urging, our cloud computing team has come back to life, and I hope everyone keeps pushing ahead under the slogan "Cloud in hand, follow me". Consider this post a small testimony to our team "restarting cloud computing", and I hope more good articles will follow. Tang Shuai, Liang Zai, Mr. Xie... let's do it!

Hehe, now let's get to the topic. This article mainly analyzes the following two points:
Contents:
1. MapReduce job running process
2. Shuffle and sorting process in Map and Reduce tasks
 

Main text:

1. MapReduce job running process


The following is a schematic diagram of the process, which I drew in Visio 2010:

[Figure: the flow of a MapReduce job among the client, the JobTracker, and the TaskTrackers]

Process analysis:


1. Start a job on the client side.


2. Request a Job ID from the JobTracker.


3. The client copies the resources needed to run the job to HDFS, including the JAR file containing the packaged MapReduce program, the configuration file, and the input split information computed by the client. These files are stored in a folder that the JobTracker creates specifically for this job, named after the job's Job ID. The JAR file is replicated 10 times by default (controlled by the mapred.submit.replication property), and the input split information tells the JobTracker how many map tasks should be started for the job.
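To make this client-side submission concrete, here is a minimal sketch (mine, not from the original post) of a driver written against the org.apache.hadoop.mapreduce API of that era. The class name MinimalJobDriver is purely illustrative, and the built-in identity Mapper and Reducer are used just to keep the sketch self-contained; the point is to show where the JAR, the configuration, and the input paths from which the splits are computed come from.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinimalJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Step 3: the submitted JAR is replicated across HDFS, 10 copies by default.
        conf.setInt("mapred.submit.replication", 10);

        Job job = new Job(conf, "minimal-job");    // step 2: a Job ID is obtained from the JobTracker
        job.setJarByClass(MinimalJobDriver.class); // this JAR is copied into the job's folder on HDFS
        job.setMapperClass(Mapper.class);          // identity map, only to keep the sketch self-contained
        job.setReducerClass(Reducer.class);        // identity reduce
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // The input paths are what the client turns into input splits;
        // one map task will be started per split.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}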


4. After the JobTracker receives the job, it places it in a job queue and waits for the job scheduler to schedule it (doesn't this look a lot like process scheduling in an operating system, heh). When the job scheduler picks the job according to its own scheduling algorithm, it creates one map task per input split and assigns each map task to a TaskTracker for execution. Each TaskTracker has a fixed number of map slots and reduce slots, set according to the host's number of cores and amount of memory. One point worth stressing is that map tasks are not assigned to TaskTrackers at random; there is a notion of data locality here: a map task is assigned to a TaskTracker that holds the data block the map will process, and the program JAR is copied to that TaskTracker to run there. This is what is meant by "moving the computation rather than the data". Data locality is not considered when assigning reduce tasks.
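As a hedged illustration of the data-locality idea, the sketch below (again my own, not the author's) asks an input format for its splits and prints the hosts that hold each split's data block; this is exactly the location information the scheduler consults when it tries to place a map task on a TaskTracker that already has the data.

import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitLocations {
    public static void main(String[] args) throws Exception {
        // Build a throwaway job description just to compute the input splits
        // for the given path (args[0] is assumed to be an HDFS input directory).
        Job job = new Job(new Configuration(), "split-locations");
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Each split records which hosts hold its data block.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        for (InputSplit split : splits) {
            System.out.println(split + " -> " + Arrays.toString(split.getLocations()));
        }
    }
}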


5. The TaskTracker sends a heartbeat to the JobTracker at regular intervals to tell the JobTracker that it is still alive; the heartbeat also carries plenty of information, such as the progress of the map tasks it is currently running. When the JobTracker receives the completion message for the job's last task, it marks the job as "successful". When the JobClient queries the status, it learns that the job has completed and displays a message to the user.
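This heartbeat-driven status reporting can be observed from the client. The sketch below (only an illustration, using the older org.apache.hadoop.mapred JobClient API) submits a job asynchronously and polls the progress figures that travel from the TaskTrackers to the JobTracker via heartbeats and from there back to the client.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class PollingSubmit {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PollingSubmit.class);
        conf.setJobName("polling-submit");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient client = new JobClient(conf);
        RunningJob running = client.submitJob(conf);  // returns immediately, unlike waitForCompletion
        while (!running.isComplete()) {
            // Progress figures reach the client via the JobTracker, which in turn
            // learned them from the TaskTracker heartbeats described above.
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    running.mapProgress() * 100, running.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(running.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}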

The above analyzes how MapReduce works at the level of the client, the JobTracker, and the TaskTracker. Now let's go into a little more detail and look at it from the level of the map and reduce tasks.

2. Shuffle and sorting process in Map and Reduce tasks


Here again is the schematic diagram of the process that I drew in Visio:

[Figure: the shuffle and sort flow on the map side and the reduce side]

Process analysis:

Map side:

1. Each input split is processed by one map task. By default, the size of one HDFS block (64 MB by default) is taken as one split, although the block size can of course be configured. The map output is first placed in a circular in-memory buffer (100 MB by default, controlled by the io.sort.mb property). When the buffer is about to overflow (at 80% of the buffer size by default, controlled by the io.sort.spill.percent property), a spill file is created on the local file system and the data in the buffer is written to that file.
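If you want to adjust these two map-side buffer properties, they can be set through the ordinary Configuration API. The small sketch below only echoes the default values quoted in the text; it is not a tuning recommendation.

import org.apache.hadoop.conf.Configuration;

public class MapSpillTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Size in MB of the circular in-memory buffer that holds map output.
        conf.setInt("io.sort.mb", 100);
        // Fraction of that buffer that may fill before a spill to local disk begins.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        System.out.println("io.sort.mb            = " + conf.getInt("io.sort.mb", 100));
        System.out.println("io.sort.spill.percent = " + conf.getFloat("io.sort.spill.percent", 0.80f));
    }
}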

2. Before writing to disk, the thread first divides the data into partitions, the same number of partitions as there are reduce tasks, so that each reduce task corresponds to the data of one partition. This avoids the awkward situation where some reduce tasks are handed a large amount of data while others receive very little, or even none. Partitioning is essentially hashing the data. The data in each partition is then sorted, and if a Combiner has been set, the sorted result is run through the combine step, so that as little data as possible is written to disk.
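To make the partition-then-combine idea concrete, here is a hedged word-count-style sketch (mine, not from the original post). HashPartitioner is the partitioner Hadoop uses by default, and the reducer class can double as the combiner because summing counts is associative.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCountPieces {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE); // each (word, 1) record is later hashed into a partition
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    // Wiring the pieces into a Job (assuming `job` was created as in the driver sketch above).
    public static void configure(Job job) {
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);         // combine sorted map output before each spill
        job.setPartitionerClass(HashPartitioner.class); // hash(key) mod numReduceTasks, Hadoop's default
        job.setReducerClass(SumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
    }
}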

3. By the time the map task writes its last record, there may be many spill files, and these files need to be merged. Sorting and combining are performed repeatedly during the merge, with two goals: to minimize the amount of data written to disk each time, and to minimize the amount of data transferred over the network in the copy phase that follows. The end result is a single partitioned and sorted file. To reduce the amount of data sent over the network, the map output can also be compressed, simply by setting mapred.compress.map.output to true.
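Map output compression can be enabled either by setting the property directly or, in the old org.apache.hadoop.mapred API, through the JobConf setters shown in this small sketch; the choice of GzipCodec here is only an example.

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

public class MapOutputCompression {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Equivalent to setting mapred.compress.map.output=true.
        conf.setCompressMapOutput(true);
        // Codec used for the compressed map output; Gzip is just one possibility.
        conf.setMapOutputCompressorClass(GzipCodec.class);
        System.out.println("compress map output: " + conf.getCompressMapOutput());
    }
}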

4. The data in each partition is copied to the corresponding reduce task. Some people may ask: how does the data in a partition know which reduce task it belongs to? In fact, the map task stays in contact with its parent TaskTracker the whole time, and the TaskTracker keeps a heartbeat with the JobTracker, so the JobTracker holds a global view of the entire cluster. A reduce task only needs to obtain the locations of the corresponding map outputs from the JobTracker.

That covers the map side. So what exactly is shuffle? "Shuffle" literally means to shuffle a deck of cards. Look at it this way: the data produced by a map task is hashed into partitions and dealt out to different reduce tasks. Isn't that a process of shuffling the data? Ha ha.

Reduce side: 

1. The reduce task receives data from different map tasks, and the data coming from each map task is sorted. If the amount of data received on the reduce side is fairly small, it is kept directly in memory (the buffer size is controlled by the mapred.job.shuffle.input.buffer.percent property, which specifies the fraction of the heap used for this purpose). If the amount of data exceeds a certain fraction of that buffer (determined by mapred.job.shuffle.merge.percent), the data is merged and then spilled to disk.
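A small sketch of how these two reduce-side shuffle properties could be set through the Configuration API; the values shown here are illustrative, not recommendations.

import org.apache.hadoop.conf.Configuration;

public class ShuffleBufferTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Fraction of the reduce task's heap used to hold copied map output in memory.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // Once the in-memory map outputs fill this fraction of that buffer,
        // they are merged and spilled to disk.
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
        System.out.println(conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f));
        System.out.println(conf.getFloat("mapred.job.shuffle.merge.percent", 0.66f));
    }
}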

2. As the spill files accumulate, background threads merge them into a larger sorted file in order to save time in later merges. In fact, whether on the map side or the reduce side, MapReduce performs sorting and merging over and over again; now I finally understand why some people say that sorting is the soul of Hadoop.
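One related knob, not mentioned in the original post but directly tied to these merge passes, is io.sort.factor, which controls how many spill files or streams are merged in a single pass on both the map and the reduce side; a tiny sketch with the usual Hadoop 1.x default:

import org.apache.hadoop.conf.Configuration;

public class MergeFactor {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // How many spill files / streams are merged in a single pass,
        // on both the map side and the reduce side.
        conf.setInt("io.sort.factor", 10);
        System.out.println("io.sort.factor = " + conf.getInt("io.sort.factor", 10));
    }
}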

3. Many intermediate files are written to disk during the merging, but MapReduce tries to write as little data to disk as possible, and the result of the final merge is not written to disk at all; it is fed directly to the reduce function.

At this point the working principle of MapReduce has finally been covered, but I will continue to study it in depth, so please keep an eye on my follow-up Hadoop-related posts.
