A close read of how Hadoop MapReduce works

Introduction: After an extra day off over the Mid-Autumn holiday, I picked Hadoop back up this evening, but I feel a little uneasy because I do not know how to approach the next stage of studying it.

As soon as I opened the computer, I decided:

1. First, get a solid grasp of the basic principles of Hadoop MapReduce and Yarn. People online say that drawing diagrams while memorizing works well, so I will do that.

2. When there is time, go over more of the basics of Java and Python to firm them up.

3. Start learning Hive and Spark.

 

Main text:

How does MapReduce divide and conquer?

 Map stage:

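 a. Split the input data (Split): the data is read line by line to produce a series of (key / value) pairs.

Note: the number of splits is allocated according to the input files. With line-by-line text input, the key is the line's starting offset in the file, so the carriage return / line-break characters are counted in it.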

 

 b. Run the user-defined map method.

 

 c. The (key, value) pairs output by the Mapper are sorted by key, and then the combine step runs, accumulating the values that share the same key.

Note 1: combine does not replace reduce, but it can cut down the amount of data transferred between map and reduce.

Note 2: there are two more steps between map and combine: collect and spill.

         collect: after the map method has processed the data, it generally calls OutputCollector.collect() to gather the results; the (key / value) pairs are assigned to partitions internally and written into a ring buffer.

         spill: when the ring buffer fills up, MapReduce writes the data to the local disk, producing temporary files.
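
To make collect and spill from Note 2 easier to picture, here is a minimal pure-Python sketch. It is not Hadoop code: the class name `RingBufferSketch`, the byte-sum partitioner, the record-count capacity, and the spill file names are all my own assumptions for illustration. Real Hadoop sizes the buffer in bytes and begins spilling at a configurable fill threshold.

```python
import os
import tempfile


class RingBufferSketch:
    """Toy stand-in for the map-side buffer: collect() buffers, spill() flushes."""

    def __init__(self, capacity, num_partitions, spill_dir):
        self.capacity = capacity              # assumption: counted in records, not bytes
        self.num_partitions = num_partitions
        self.spill_dir = spill_dir
        self.records = []                     # (partition, key, value) triples
        self.spill_files = []

    def collect(self, key, value):
        # Pick a partition with a simple deterministic hash, then buffer the record.
        partition = sum(key.encode("utf-8")) % self.num_partitions
        self.records.append((partition, key, value))
        if len(self.records) >= self.capacity:
            self.spill()

    def spill(self):
        if not self.records:
            return
        # Sort by partition, then by key, before writing the temporary file.
        self.records.sort(key=lambda r: (r[0], r[1]))
        path = os.path.join(self.spill_dir, f"spill{len(self.spill_files)}.out")
        with open(path, "w", encoding="utf-8") as f:
            for partition, key, value in self.records:
                f.write(f"{partition}\t{key}\t{value}\n")
        self.spill_files.append(path)
        self.records = []


def run_spill_demo():
    """Collect five records into a 3-record buffer and return each spill file's lines."""
    with tempfile.TemporaryDirectory() as tmp:
        buf = RingBufferSketch(capacity=3, num_partitions=2, spill_dir=tmp)
        for word in ["b", "a", "c", "a", "b"]:
            buf.collect(word, 1)
        buf.spill()  # flush whatever is left when the map task ends
        contents = []
        for path in buf.spill_files:
            with open(path, encoding="utf-8") as f:
                contents.append(f.read().splitlines())
        return contents
```

With a capacity of 3, the five records end up in two temporary files, each sorted by partition and then by key.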

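Steps a to c of the Map stage can also be sketched in plain Python. This is only a model of the flow, under the assumption of a word-count job; `read_records`, `map_fn`, `sort_and_combine`, and `run_map_task` are hypothetical names, not Hadoop APIs.

```python
from itertools import groupby


def read_records(data: bytes):
    """Step a: yield (byte offset, line text) pairs, one per input line."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)  # line-break bytes count toward the next offset


def map_fn(offset, line):
    """Step b: a hypothetical user-defined map that emits (word, 1)."""
    for word in line.split():
        yield word, 1


def sort_and_combine(pairs):
    """Step c: sort by key, then sum the values of equal keys locally."""
    ordered = sorted(pairs, key=lambda kv: kv[0])
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=lambda kv: kv[0])
    ]


def run_map_task(data: bytes):
    """Run steps a to c on one in-memory split and return both views."""
    records = list(read_records(data))
    mapped = [kv for offset, line in records for kv in map_fn(offset, line)]
    return records, sort_and_combine(mapped)
```

For example, `run_map_task(b"a b a\nb c\n")` reads two records keyed by offsets 0 and 6. The combine step shrinks five mapped pairs to three, which is the reduction in transferred data that Note 1 describes.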

Reduce stage:

The values output by the Map phase are passed to the user-defined reduce function, which produces new (key / value) pairs as the final result.
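
As a small illustration, a user-defined reduce function can be modeled like this in plain Python. The summing logic assumes a word-count style job, and `reduce_fn` and `run_reduce` are hypothetical names rather than Hadoop APIs.

```python
def reduce_fn(key, values):
    """Hypothetical user-defined reduce: sum all values seen for one key."""
    yield key, sum(values)


def run_reduce(grouped):
    """Call reduce_fn once per key, where grouped is a list of (key, [values])."""
    output = []
    for key, values in grouped:
        output.extend(reduce_fn(key, values))
    return output
```

Each call sees one key together with all of the values collected for it, and emits the new (key / value) pair.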

 

 The Reduce stage has 5 steps: shuffle (copy) - merge - sort - reduce (run the function) - write (write the result)
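
The five steps can be traced in a rough Python sketch. It assumes each map task's output is a dict from partition number to a key-sorted list of pairs and that the job is a word count. The function names are mine, and an in-memory `io.StringIO` stands in for the real output file.

```python
import heapq
import io
from itertools import groupby


def shuffle(map_outputs, partition):
    """Shuffle (copy): fetch this reducer's partition from every map output."""
    return [output.get(partition, []) for output in map_outputs]


def merge_sorted(segments):
    """Merge + sort: segments are already key-sorted, so a k-way merge keeps order."""
    return heapq.merge(*segments, key=lambda kv: kv[0])


def reduce_and_write(merged, sink):
    """Reduce: sum the values per key. Write: one tab-separated line per key."""
    for key, group in groupby(merged, key=lambda kv: kv[0]):
        total = sum(value for _, value in group)
        sink.write(f"{key}\t{total}\n")


def run_reducer(map_outputs, partition):
    """Run all five steps for one reducer and return the written text."""
    sink = io.StringIO()  # stands in for the reducer's output file
    reduce_and_write(merge_sorted(shuffle(map_outputs, partition)), sink)
    return sink.getvalue()
```

Because every copied segment is already sorted on the map side, the merge only has to interleave them, and the reduce function then sees each key's values together.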

 

Origin www.cnblogs.com/CQ-LQJ/p/11525286.html