Common methods for optimizing Hadoop jobs

Once a MapReduce application has been written and is working, a common problem follows: the job runs too slowly. When that happens, the following aspects can be tuned to speed the job up.

(1) Through the JobTracker web interface, check how many mappers the job uses and the average running time of each mapper. If the mappers finish too quickly (for example, each one runs for only about 10 seconds), the mappers are not being used well: task startup overhead dominates, so reduce the number of mappers and let each mapper run longer. How long a mapper runs depends on the format of its input data, so the adjustment is made through the mapper's input format.
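
As a rough illustration of item (1), the sketch below estimates how the mapper count changes when the split size is raised. It assumes the split size follows the max(minSize, min(maxSize, blockSize)) rule used by FileInputFormat in the newer MapReduce API; the class name, file size, and block size are made up for the example. In a real job the minimum split size is usually raised through the `mapred.min.split.size` property (`mapreduce.input.fileinputformat.split.minsize` in later releases).

```java
// Sketch: estimate how many map tasks an input file produces for a given split size.
// All sizes are hypothetical examples, not values from a real cluster.
class SplitEstimate {
    // Mirrors the max(minSize, min(maxSize, blockSize)) rule.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough estimate: one mapper per split, rounding up.
    // Ignores the small slack Hadoop allows on the last split.
    static long estimateMappers(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 10L * 1024L * mb; // hypothetical 10 GB input
        long blockSize = 64L * mb;        // hypothetical 64 MB HDFS block

        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long largerSplit = computeSplitSize(blockSize, 256L * mb, Long.MAX_VALUE);

        System.out.println(estimateMappers(fileSize, defaultSplit)); // prints 160
        System.out.println(estimateMappers(fileSize, largerSplit));  // prints 40
    }
}
```

With four times fewer mappers, each one processes four times as much data, so the fixed startup cost is spread over a longer run.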

(2) Through the JobTracker web interface, check the number of reducers the job uses. The number of reducers should be slightly smaller than the number of reduce task slots in the cluster, so that all reducers finish in the same wave and no slot ends up having to process a second reduce task because of dynamic scheduling.
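
The rule in item (2) can be sketched as a small calculation. The 95% factor is a commonly cited rule of thumb and an assumption here, not a value from this article; the class name and cluster numbers are likewise hypothetical. In a real job the result would be passed to `setNumReduceTasks`.

```java
// Sketch: choose a reducer count slightly below the total number of reduce slots,
// so that every reducer can be scheduled in a single wave.
class ReducerCount {
    static int suggestReducers(int nodes, int reduceSlotsPerNode) {
        int totalSlots = nodes * reduceSlotsPerNode;
        // Integer arithmetic avoids floating-point rounding surprises.
        // The 95% factor is an assumed rule of thumb.
        return Math.max(1, totalSlots * 95 / 100);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 20 worker nodes with 2 reduce slots each.
        System.out.println(suggestReducers(20, 2)); // prints 38
    }
}
```

Leaving a couple of slots free also gives the scheduler room to rerun a failed reducer without starting a second wave.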

(3) Check whether the combiner is used sensibly. A combiner reduces the amount of data transferred during the shuffle, and with less network traffic the job naturally runs faster. Use it with caution, though, and decide case by case: for a job that computes an average, for example, avoid a plain averaging combiner, because averaging partial averages can deviate substantially from the true result.
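
The averaging pitfall in item (3) can be shown without Hadoop at all. The sketch below uses two hypothetical mappers with made-up values. It compares a naive average of per-mapper averages with merging (sum, count) pairs, which is one common way to keep a combiner correct for averages.

```java
// Sketch: why a plain averaging combiner is wrong, and how (sum, count) pairs fix it.
class AverageCombinerDemo {
    // Partial result that is safe to merge: a running sum and a count.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        double average() {
            return (double) sum / count;
        }
    }

    // What a combiner could emit for one mapper's values.
    static SumCount summarize(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return new SumCount(sum, values.length);
    }

    public static void main(String[] args) {
        SumCount a = summarize(new long[] {1, 2, 3, 4}); // hypothetical mapper A
        SumCount b = summarize(new long[] {100});        // hypothetical mapper B

        double naive = (a.average() + b.average()) / 2;  // average of averages
        double correct = a.merge(b).average();           // merged sum / merged count

        System.out.println(naive);   // prints 51.25
        System.out.println(correct); // prints 22.0
    }
}
```

The naive result is off by more than a factor of two because it weights a mapper with one record the same as a mapper with four.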

(4) Similar to (3), another way to reduce network traffic is to compress the map output. Compressed output is smaller, which also eases the pressure on the network.
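
To give a feel for the effect in item (4), the sketch below gzips a batch of synthetic key/value lines shaped like word-count map output and compares sizes. It uses `java.util.zip` only as a stand-in. In Hadoop itself, map output compression is switched on through job configuration (the `mapred.compress.map.output` property, or `mapreduce.map.output.compress` in later releases), and the codec chosen there may differ from gzip. The class name and sample data are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Sketch: repetitive intermediate records shrink noticeably when compressed.
class MapOutputCompressionDemo {
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Synthetic "word<TAB>1" records, as a word-count mapper might emit.
    static byte[] sampleMapOutput(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("word").append(i % 50).append("\t1\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleMapOutput(10_000);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes: " + raw.length);
        System.out.println("compressed bytes: " + compressed.length);
    }
}
```

Compression costs CPU time on both the map and reduce side, so the gain depends on whether the job is bound by network or by CPU.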

(5) To make the job's sorting more efficient, you can use custom serialization with a custom comparator, but make sure the comparator implements RawComparator.
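
The point of RawComparator in item (5) is that keys are compared directly on their serialized bytes, without building objects first. The sketch below is a plain-Java stand-in, not a Hadoop class: it borrows the parameter layout of `RawComparator.compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2)` and assumes each key is a 4-byte big-endian int. The class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;

// Sketch: compare two serialized int keys straight from their byte slices.
class RawIntComparatorSketch {
    // Each key is described by (buffer, start offset, length), as in RawComparator.
    // The lengths are unused because the key size is fixed at 4 bytes here.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Decode a 4-byte big-endian int without allocating any object.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
                | ((bytes[start + 1] & 0xff) << 16)
                | ((bytes[start + 2] & 0xff) << 8)
                | (bytes[start + 3] & 0xff);
    }

    // Helper for the demo: serialize an int in big-endian order.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        byte[] small = serialize(3);
        byte[] large = serialize(42);
        System.out.println(compare(small, 0, 4, large, 0, 4) < 0); // prints true
    }
}
```

Decoding the int rather than comparing bytes lexicographically keeps negative keys in the right order.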

(6) Finally, the shuffle can be tuned: adjusting some of its memory management parameters can make up for remaining performance shortfalls.

This write-up is fairly rough; details will be filled in over time.
