What values are stored in the key and value during the Hadoop MapReduce process?

Reprinted from: https://www.cnblogs.com/gaopeng527/p/5436820.html

Taking WordCount as an example, the process is easiest to understand by looking at the figures:

(1) InputFormat divides the files to be processed on HDFS into splits and reads each split line by line. Since the test files are small, each file forms a single split, and each line of a file becomes one <key, value> pair, as shown in Figure 4-1. This step is completed automatically by the MapReduce framework. The key is the offset of the line within the file and the value is the text of the line. The offset also counts the line-break characters, so it differs between Windows (\r\n) and Linux (\n) files.

Each file is processed line by line. The figure shows two files, each with two lines, and the key of each line is the offset of its first character. The first line naturally starts at offset 0. "hello world" consists of 10 letters plus 1 space, 11 characters in total, and the line break counts as one more, so the second line starts at offset 12 (it would be 13 with a Windows \r\n line ending).

 

Figure 4-1 Segmentation process
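The offset arithmetic in step (1) can be checked with a small sketch. It is plain Java, not Hadoop code: the class `LineOffsets` and its `read` method are hypothetical, and the sketch assumes that the key is the byte offset at which each line starts (for ASCII text this equals the character count used above). The second line "hello hadoop" is an invented example, since the figure's text is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of line-based input reading (not Hadoop's own reader):
// key = byte offset where the line starts, value = the line text without its line ending.
class LineOffsets {

    static List<Map.Entry<Long, String>> read(String fileContent) {
        byte[] bytes = fileContent.getBytes(StandardCharsets.UTF_8);
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0; // byte offset of the current line's first character
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                // Drop a preceding '\r' from the value (Windows "\r\n" endings),
                // but the next line still starts after both terminator bytes.
                int end = (i > start && bytes[i - 1] == '\r') ? i - 1 : i;
                records.add(new SimpleEntry<>((long) start,
                        new String(bytes, start, end - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < bytes.length) { // final line without a trailing line break
            records.add(new SimpleEntry<>((long) start,
                    new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Linux "\n": "hello world" is 11 bytes + 1 terminator, so the second line starts at 12.
        System.out.println(read("hello world\nhello hadoop\n"));     // [0=hello world, 12=hello hadoop]
        // Windows "\r\n": 11 bytes + 2 terminator bytes, so the second line starts at 13.
        System.out.println(read("hello world\r\nhello hadoop\r\n")); // [0=hello world, 13=hello hadoop]
    }
}
```

The terminator bytes are excluded from the value but still advance the next key, which is exactly why the offsets depend on the platform's line endings.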

 

(2) The <key, value> pairs produced by splitting are passed to the user-defined map method, which processes them into new <key, value> pairs, as shown in Figure 4-2.

In this user-defined map program, each line is split on spaces (" "), and each resulting word is emitted with a value of 1, so every value output by the map step is 1.

Figure 4-2 Execute the map method
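The map logic described in step (2) can be sketched in plain Java. The class `WordCountMap` is hypothetical and does not use Hadoop's `Mapper` API; in a real job this logic would sit inside the `map` method of a `Mapper` subclass. The input lines are invented examples.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the WordCount map logic (not Hadoop's Mapper API).
class WordCountMap {

    // Input: key = line offset (unused by WordCount), value = line text.
    // Output: one <word, 1> pair per word, splitting the line on " ".
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) { // consecutive spaces would otherwise yield empty words
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical lines of one input file, with their offsets as keys.
        System.out.println(map(0L, "hello world"));   // [hello=1, world=1]
        System.out.println(map(12L, "hello hadoop")); // [hello=1, hadoop=1]
    }
}
```

The offset key is ignored on purpose: WordCount only needs the line text, and the new key is the word itself.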

 

(3) After receiving the <key, value> pairs output by the map method, the Mapper sorts them by key and then runs the Combine step, which sums the values of pairs with the same key to produce the Mapper's final output, as shown in Figure 4-3.

 

Figure 4-3 Map-side sorting and Combine process
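Step (3) amounts to "sort by key, then add up the values of adjacent equal keys". The sketch below models that in plain Java; `MapSideCombine`, `kv`, and `sortAndCombine` are hypothetical names, and the map output it uses is invented. In Hadoop's own WordCount example the combiner is the same summing class as the reducer, configured on the job; here it is simply a function.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of WordCount's map-side sort + Combine (not Hadoop's API).
class MapSideCombine {

    static Map.Entry<String, Integer> kv(String k, int v) {
        return new SimpleEntry<>(k, v);
    }

    // Sorts one Mapper's output by key, then sums the values of adjacent equal keys.
    static List<Map.Entry<String, Integer>> sortAndCombine(List<Map.Entry<String, Integer>> mapOutput) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(mapOutput);
        sorted.sort(Map.Entry.comparingByKey()); // sort by key
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sorted) {
            int last = combined.size() - 1;
            if (last >= 0 && combined.get(last).getKey().equals(e.getKey())) {
                // Same key as the previous record: accumulate its value (Combine).
                Map.Entry<String, Integer> prev = combined.get(last);
                prev.setValue(prev.getValue() + e.getValue());
            } else {
                // New key: copy the entry so the caller's list is not modified.
                combined.add(kv(e.getKey(), e.getValue()));
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical map output for one split holding "hello world" and "hello hadoop".
        List<Map.Entry<String, Integer>> mapOutput =
                Arrays.asList(kv("hello", 1), kv("world", 1), kv("hello", 1), kv("hadoop", 1));
        System.out.println(sortAndCombine(mapOutput)); // [hadoop=1, hello=2, world=1]
    }
}
```

Because the records are sorted first, equal keys are adjacent, so a single linear pass is enough to combine them.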

 

(4) The Reducer first sorts the data received from the Mappers, grouping the values of each key together, and then passes each group to the user-defined reduce method, which produces new <key, value> pairs that form the output of WordCount, as shown in Figure 4-4.

 

Figure 4-4 Sorting and output results on the Reduce side
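The Reduce side of step (4) can be modeled the same way. This is a simplification under stated assumptions: `ReduceSide` and its methods are hypothetical, a `TreeMap` stands in for Hadoop's merge of the sorted Mapper outputs, and the two Mapper outputs (one per input file) are invented data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the Reduce side of WordCount (not Hadoop's Reducer API).
class ReduceSide {

    static Map.Entry<String, Integer> kv(String k, int v) {
        return new SimpleEntry<>(k, v);
    }

    // Shuffle/sort: gather the combined output of every Mapper and group
    // the values by key; the TreeMap keeps the keys in sorted order.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> e : output) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return grouped;
    }

    // User-defined reduce logic for WordCount: called once per key with all its counts.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs shuffle/sort followed by reduce and returns the final <word, count> pairs.
    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffleAndSort(mapperOutputs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical combined outputs of two Mappers (one per input file).
        List<Map.Entry<String, Integer>> mapper1 = Arrays.asList(kv("hadoop", 1), kv("hello", 2), kv("world", 1));
        List<Map.Entry<String, Integer>> mapper2 = Arrays.asList(kv("hadoop", 2), kv("hello", 1), kv("mapreduce", 1));
        List<List<Map.Entry<String, Integer>>> all = Arrays.asList(mapper1, mapper2);
        System.out.println(shuffleAndSort(all)); // {hadoop=[1, 2], hello=[2, 1], mapreduce=[1], world=[1]}
        System.out.println(run(all));            // {hadoop=3, hello=3, mapreduce=1, world=1}
    }
}
```

The intermediate print shows what the reduce method actually receives: one key together with the list of all its values, which it collapses into a single <word, count> pair.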

