MapReduce job

The MapReduce job reads data from several tables in the database, filters it in Java according to the business rules, and writes the matching results to HDFS. When the job was submitted for debugging from Eclipse, a Java heap space exception was thrown every time in the Reduce phase. This is clearly a heap memory overflow, so Sanxian took a careful look at the business code. In the Reduce phase the job queries the database, and the tables involved return roughly 500,000 rows in total; because that volume is not huge, the results are not read in pages. Once read, the rows are wrapped in a Map collection and stay in memory for a while during business processing. The reduce memory configured in the original mapred-site.xml was fairly small, so it simply needs to be increased.
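For reference, here is a minimal sketch of the kind of reducer that creates this pressure on the heap; the JDBC URL, table name, column names and class name are placeholders for illustration, not the actual business code:

// Sketch only: a reducer that buffers a whole table in memory, the way the
// business code did. All names and the query are hypothetical placeholders.
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LookupReducer extends Reducer<Text, Text, Text, Text> {

    // ~500,000 rows end up in this map and stay there while the reducer runs,
    // which is what pushes a small default heap over its limit.
    private final Map<String, String> lookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://db-host/biz", "user", "pass");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, payload FROM some_table")) { // no paging
            while (rs.next()) {
                lookup.put(rs.getString("id"), rs.getString("payload"));
            }
        } catch (Exception e) {
            throw new IOException("failed to load lookup table", e);
        }
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String extra = lookup.get(key.toString());
        if (extra != null) {                       // business filter: keep matching rows only
            for (Text v : values) {
                context.write(key, new Text(v + "\t" + extra));
            }
        }
    }
}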



<property>
    <name>mapreduce.map.memory.mb</name>
    <value>215</value>
</property>
<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx215M</value>
</property>

<property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>1024</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx1024M</value>
</property>
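If you do not want to change mapred-site.xml for the whole cluster, the same properties can also be set per job in the driver. A minimal sketch, assuming a hypothetical driver class and job name; the JVM heap (-Xmx) is kept a bit below the container size:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobDriver {                              // placeholder driver class
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask YARN for a 1 GB reduce container and give the reduce JVM a heap
        // that fits inside it.
        conf.set("mapreduce.reduce.memory.mb", "1024");
        conf.set("mapreduce.reduce.java.opts", "-Xmx900m");

        Job job = Job.getInstance(conf, "db-filter-job"); // job name is a placeholder
        // ... set mapper, reducer, input and output paths as usual ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}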




Several important memory-control parameters in Hadoop 2.2 (a sample yarn-site.xml sketch follows the list):

YARN:
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.vmem-pmem-ratio
yarn.nodemanager.resource.memory-mb

MapReduce:
Map memory:
mapreduce.map.java.opts
mapreduce.map.memory.mb
Reduce memory:
mapreduce.reduce.java.opts
mapreduce.reduce.memory.mb
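On the YARN side these limits normally live in yarn-site.xml. The values below are only an illustration and should be sized to the actual memory of your NodeManager hosts:

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>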




If an exception like the following occurs:

Container [pid=17645,containerID=container_1415210272486_0013_01_000004] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.6 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1415210272486_0013_01_000004 :




You can adjust yarn.nodemanager.vmem-pmem-ratio (the default is 2.1), or try increasing the number of reduce tasks so that each one handles less data. This ratio governs how much virtual memory a container may use: when the virtual memory YARN measures for a container exceeds the ratio times its mapreduce.map.memory.mb or mapreduce.reduce.memory.mb from mapred-site.xml, the exception shown above is thrown. Both of those settings default to 1024 MB, and in this case the virtual memory YARN calculated for the running environment turned out to be larger than 1024 * 2.1 MB, so the NodeManager daemon killed the container, which in turn failed the whole MR job. Raising the ratio avoids this exception; how much to raise it can be decided according to the actual situation.
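For example, the ratio can be raised in yarn-site.xml (3.0 below is only an illustrative value):

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>3.0</value>
</property>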





As a final note, here is the script that configures Hadoop's environment variables, java.sh:

export PATH=.:$PATH

export FSE_HOME="/home/search/fse2"
export FSE_CONF_DIR=$FSE_HOME/conf
export PATH=$PATH:$FSE_HOME/bin

user="search"
export JAVA_HOME="/usr/local/jdk"
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin

# Define HADOOP_HOME before it is used in PATH and CLASSPATH
export HADOOP_HOME=/home/search/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOME/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME

#export HADOOP_HOME=/home/$user/hadoop
#export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
#export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib
#export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export ANT_HOME=/usr/local/ant
export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
export PATH=$PATH:$ANT_HOME/bin

export MAVEN_HOME="/usr/local/maven"
export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
export PATH=$PATH:$MAVEN_HOME/bin
