The purpose of the MapReduce job is to read data from several tables in the database, filter it in Java according to the business rules, and write the matching results to HDFS. When the job was submitted from Eclipse for debugging, a Java heap space exception kept being thrown in the Reduce phase. This exception is clearly caused by heap memory overflow. Sanxian then looked closely at the business code: in the Reduce phase, several tables are queried from the database, returning roughly 500,000 rows in total. Because the volume did not seem large, the query was not paginated; after being read, the rows are wrapped in a Map collection and stay in memory for a while during business processing. The reduce memory configured in the original mapred-site.xml was relatively small, so it only needs to be increased here.
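The access pattern described above can be sketched as follows. This is a minimal, self-contained illustration, not the original code: `queryAllRows` is a hypothetical stand-in for the JDBC query, and the class name is made up. The point is that the entire result set is resident in the reducer's heap at once, which is why a small `-Xmx` overflows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the pattern described above: all ~500,000 rows are pulled into
// an in-memory Map in one pass, so the reducer's heap must hold the whole
// result set for the duration of the business logic.
public class LoadAllRows {
    // Hypothetical stand-in for the un-paginated JDBC query.
    static List<String[]> queryAllRows(int n) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new String[] {"key" + i, "value" + i});
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        for (String[] row : queryAllRows(500_000)) {
            cache.put(row[0], row[1]); // the entire table stays resident in the heap
        }
        System.out.println("rows cached: " + cache.size());
    }
}
```

If the data cannot be paginated, the only remedy is what the post does next: give the reduce container (and its JVM heap) more memory.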
- <property>
- <name>mapreduce.map.memory.mb</name>
- <value>215</value>
- </property>
- <property>
- <name>mapreduce.map.java.opts</name>
- <value>-Xmx215M</value>
- </property>
- <property>
- <name>mapreduce.reduce.memory.mb</name>
- <value>1024</value>
- </property>
- <property>
- <name>mapreduce.reduce.java.opts</name>
- <value>-Xmx1024M</value>
- </property>
Several important memory-control parameters in Hadoop 2.2:
- YARN
- yarn.scheduler.minimum-allocation-mb
- yarn.scheduler.maximum-allocation-mb
- yarn.nodemanager.vmem-pmem-ratio
- yarn.nodemanager.resource.memory-mb
- MapReduce
- Map Memory
- mapreduce.map.java.opts
- mapreduce.map.memory.mb
- Reduce Memory
- mapreduce.reduce.java.opts
- mapreduce.reduce.memory.mb
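These parameters interact: each map or reduce container request is rounded up by the scheduler to a multiple of `yarn.scheduler.minimum-allocation-mb`, capped by `yarn.scheduler.maximum-allocation-mb`, and all containers on a node must fit within `yarn.nodemanager.resource.memory-mb`. A small sketch of the rounding (the 1024 MB minimum allocation is an assumed default, not taken from the original post):

```java
// Illustrates how YARN rounds a container request up to a multiple of
// yarn.scheduler.minimum-allocation-mb (assumed 1024 MB here).
public class ContainerRounding {
    static int roundUp(int requestMb, int minAllocMb) {
        return ((requestMb + minAllocMb - 1) / minAllocMb) * minAllocMb;
    }

    public static void main(String[] args) {
        System.out.println(roundUp(215, 1024));  // a 215 MB map request still gets a 1024 MB container
        System.out.println(roundUp(1500, 1024)); // a 1500 MB request is rounded up to 2048 MB
    }
}
```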
If an exception like the following occurs:
- Container [pid=17645,containerID=container_1415210272486_0013_01_000004] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.6 GB of 2.1 GB virtual memory used. Killing container.
- Dump of the process-tree for container_1415210272486_0013_01_000004 :
You can raise yarn.nodemanager.vmem-pmem-ratio (the default is 2.1), or try increasing the number of reduce tasks. This ratio limits how much virtual memory a container may use: when the virtual memory YARN measures for a container exceeds the mapreduce.map.memory.mb or mapreduce.reduce.memory.mb configured in mapred-site.xml multiplied by the ratio, the exception shown above occurs. The default for mapreduce.map.memory.mb and mapreduce.reduce.memory.mb is 1024 MB, so when YARN measured the container's virtual memory at runtime and found it larger than 1024 × 2.1 MB, the NodeManager daemon killed the container, which caused the whole MR job to fail. Simply increasing this ratio avoids the exception; how much to increase it can be decided according to the specific situation.
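The arithmetic matches the log above: a 1024 MB (1 GB) container with the default 2.1 ratio gets a virtual memory ceiling of about 2150 MB, which is the "2.1 GB virtual memory" limit the container was killed against.

```java
// Reproduces the virtual memory ceiling from the error message:
// vmem limit = container size (MB) * yarn.nodemanager.vmem-pmem-ratio.
public class VmemLimit {
    static double vmemLimitMb(int containerMb, double ratio) {
        return containerMb * ratio;
    }

    public static void main(String[] args) {
        // 1024 MB container * 2.1 default ratio ~= 2150 MB, i.e. ~2.1 GB.
        System.out.printf("virtual memory limit: %.1f MB%n", vmemLimitMb(1024, 2.1));
    }
}
```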
As a final note, here is the environment-variable script (java.sh) used for this Hadoop setup:
- export PATH=.:$PATH
- export FSE_HOME="/home/search/fse2"
- export FSE_CONF_DIR=$FSE_HOME/conf
- export PATH=$PATH:$FSE_HOME/bin
- user="search"
- export JAVA_HOME="/usr/local/jdk"
- export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
- export PATH=$PATH:$JAVA_HOME/bin
- export HADOOP_HOME=/home/search/hadoop
- export HADOOP_MAPRED_HOME=$HADOOP_HOME
- export HADOOP_COMMON_HOME=$HADOOP_HOME
- export HADOOP_HDFS_HOME=$HADOOP_HOME
- export YARN_HOME=$HADOOP_HOME
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- export CLASSPATH=.:$CLASSPATH:$HADOOP_COMMON_HOME:$HADOOP_COMMON_HOME/lib:$HADOOP_MAPRED_HOME:$HADOOP_HDFS_HOME
- #export HADOOP_HOME=/home/$user/hadoop
- #export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- #export CLASSPATH=.:$CLASSPATH:$HADOOP_HOME/lib
- #export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
- export ANT_HOME=/usr/local/ant
- export CLASSPATH=$CLASSPATH:$ANT_HOME/lib
- export PATH=$PATH:$ANT_HOME/bin
- export MAVEN_HOME="/usr/local/maven"
- export CLASSPATH=$CLASSPATH:$MAVEN_HOME/lib
- export PATH=$PATH:$MAVEN_HOME/bin