Introduction: I had an extra rest day in the middle of the week, so this evening I picked Hadoop back up. Still, I feel a little uneasy, because I'm not sure how to plan the rest of my Hadoop study.
So I opened my computer and decided on a plan:
1. First, get a solid grasp of the basic principles of Hadoop MapReduce and YARN. While reading material online, take notes as I go; drawing diagrams works especially well;
2. When time allows, review more Java and Python fundamentals to firm up the basics;
3. Start learning Hive and Spark.
Main text:
How does MapReduce divide and conquer?
Map phase:
a. Split the input data (Split): each split is read line by line, producing a series of (key, value) pairs.
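Note: the number of splits depends on the input files. Each file yields at least one split, and a file larger than the split size (by default, the HDFS block size) yields several. With the default line-oriented input, the key is the byte offset of each line, and that offset counts the carriage-return/newline characters of earlier lines. The value is the line's text.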
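The splitting and record-reading step can be pictured with a small in-memory Python sketch. It is not Hadoop's API. `make_splits` and `read_records` are hypothetical names, and the sketch assumes every file is smaller than the split size, so each file becomes exactly one split.

```python
def make_splits(files):
    """Hypothetical helper: assumes each file is smaller than the split size,
    so every file becomes exactly one split (name, contents)."""
    return list(files.items())


def read_records(split_bytes):
    """Yield (byte_offset, line_text) pairs, similar to line-oriented input.

    The offset counts every byte of the earlier lines, including their
    line terminators; the terminator itself is stripped from the value.
    """
    offset = 0
    for raw_line in split_bytes.splitlines(keepends=True):
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


splits = make_splits({"a.txt": b"hello world\r\nhello hadoop\n", "b.txt": b"bye\n"})
for name, data in splits:
    print(name, list(read_records(data)))
```

In `a.txt`, the second line's key is 13: "hello world" is 11 bytes, and the `\r\n` after it adds 2 more.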
b. Run the user-defined map method on each (key, value) pair.
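As an illustration, here is a user-defined map method for the classic word-count job, written as a plain Python generator instead of a Hadoop `Mapper` subclass. The name `word_count_map` is hypothetical. It ignores the offset key and emits `(word, 1)` for every word in the line.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    # key: byte offset of the line (not needed for word count)
    # value: the text of the line
    for word in value.split():
        yield word, 1


records = [(0, "hello world"), (13, "hello hadoop")]
pairs = [kv for k, v in records for kv in word_count_map(k, v)]
print(pairs)
```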
c. The (key, value) pairs output by the Mapper are sorted by key, and then the combine step runs, which accumulates the values of pairs that share the same key.
Note 1: combine does not replace reduce, but it can cut down the amount of data transferred between map and reduce.
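The sort-then-combine step can be sketched in Python with `sorted` and `itertools.groupby`. This is a toy model, and `sort_and_combine` is a hypothetical name. Summing is a safe combine here because addition is associative and commutative. Hadoop may run a combiner zero or more times, which is one reason it cannot stand in for reduce.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_combine(pairs):
    """Sort map output by key, then sum the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


map_output = [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]
combined = sort_and_combine(map_output)
print(combined)
```

Four pairs shrink to three, and that is the data saving Note 1 refers to.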
Note 2: between map and combine there are two more processes, collect and spill:
collect: once the map method has processed the data, it typically calls OutputCollector.collect() to collect the results. Inside that call, each (key, value) pair is assigned to a partition and written into a ring (circular) buffer.
spill: when the ring buffer fills up (more precisely, when it reaches a spill threshold, 80% by default), MapReduce writes the buffered data to local disk as temporary files.
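The collect/spill cycle can be modeled with a small Python class. This is a rough sketch under simplifying assumptions, and `MapOutputBuffer` is a hypothetical class, not Hadoop's. The real buffer is a byte-level circular array that spills on a background thread while map keeps writing. Here, capacity counts records, spilling is synchronous, and each "temporary file" is just a sorted list. Partitioning uses `zlib.crc32` modulo the partition count for reproducibility. Hadoop's default partitioner works in a similar spirit but uses the key's Java `hashCode()`.

```python
import zlib


class MapOutputBuffer:
    """Toy model of map-side collect/spill (hypothetical, not Hadoop's API)."""

    def __init__(self, num_partitions, capacity=10, spill_percent=0.8):
        self.num_partitions = num_partitions
        # spill once the buffer holds this many records
        self.threshold = max(1, round(capacity * spill_percent))
        self.buffer = []   # in-memory (partition, key, value) records
        self.spills = []   # each entry stands in for one temp file on local disk

    def partition(self, key):
        # deterministic hash partitioning
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def collect(self, key, value):
        # collect: tag the pair with its partition and append it to the buffer
        self.buffer.append((self.partition(key), key, value))
        if len(self.buffer) >= self.threshold:
            self.spill()

    def spill(self):
        # spill: sort by (partition, key) and "write" the run to disk
        if self.buffer:
            self.spills.append(sorted(self.buffer, key=lambda rec: (rec[0], rec[1])))
            self.buffer = []

    def flush(self):
        # when the map task ends, spill whatever is left in memory
        self.spill()


buf = MapOutputBuffer(num_partitions=2, capacity=5)
for word in "a b c d e f g h i j".split():
    buf.collect(word, 1)
buf.flush()
print(len(buf.spills), [len(run) for run in buf.spills])
```

With a capacity of 5 and an 80% threshold, the ten records produce three spill files holding 4, 4 and 2 records.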
Reduce phase:
The user-defined reduce function processes the values output by the Map phase, grouped by key, and emits new (key, value) pairs as the final result.
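For word count, the user-defined reduce function receives one key plus all of that key's values and adds them up. Below is a minimal Python version. `word_count_reduce` is a hypothetical name, not Hadoop's `Reducer` class.

```python
from typing import Iterable, Tuple


def word_count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # the framework calls reduce once per key, with all values for that key
    return key, sum(values)


# e.g. "hello" was counted 2 times by one map task and 1 time by another
print(word_count_reduce("hello", [2, 1]))
```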
The Reduce phase has five steps: shuffle (copy), merge, sort, reduce (run the function), and write (write the result).
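The five reduce-side steps can be strung together in one Python function. It is an in-memory simplification and not how Hadoop is implemented: `run_reduce_task` and `sum_reduce` are hypothetical names. Each map task's output is modeled as a dict from partition id to a key-sorted list of pairs. Because the fetched runs are already sorted, merge and sort collapse into a single `heapq.merge`. Results are written as tab-separated lines, like the default text output.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def sum_reduce(key, values):
    return key, sum(values)


def run_reduce_task(partition_id, map_outputs, reduce_fn):
    """map_outputs: one dict per map task, partition id -> key-sorted (key, value) list."""
    # 1. shuffle (copy): fetch this reducer's partition from every map task
    fetched = [out.get(partition_id, []) for out in map_outputs]
    # 2-3. merge + sort: a k-way merge of sorted runs yields one sorted stream
    merged = heapq.merge(*fetched, key=itemgetter(0))
    # 4. reduce: call reduce_fn once per key with all of that key's values
    results = [
        reduce_fn(key, (value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]
    # 5. write: format each result as a tab-separated line
    return [f"{key}\t{value}" for key, value in results]


map_outputs = [
    {0: [("hadoop", 1), ("hello", 2)]},
    {0: [("hello", 1), ("spark", 1)]},
]
print(run_reduce_task(0, map_outputs, sum_reduce))
```

In a real job, the copy step fetches over the network and large merges go through several on-disk rounds. The sketch keeps only the order of the steps.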