Memory settings in Hadoop YARN

When running a very simple Spark program on Hadoop with YARN, the following error is reported:
java.lang.IllegalStateException: Spark context stopped while waiting for backend 
    at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:614) 
    at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:169) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:567) 
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313) 

I then searched Baidu, and several posts said the cause was an incompatibility between Java 8 and Hadoop 2.7.3 YARN, which led to memory overflow and abnormal termination of the program. The suggested fix is to modify the yarn-site.xml configuration file under the Hadoop directory and add the following properties:

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>false</value> 
</property> 
 
<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value> 
</property> 

Then restart YARN and run Spark in --master yarn mode again.
After modifying the configuration file as described above, Spark did start normally, and the abnormal exit of the word-segmentation action that had occurred two days earlier no longer happened. I then looked into the two configuration items, yarn.nodemanager.pmem-check-enabled and yarn.nodemanager.vmem-check-enabled, and found that their default values were indeed behind the abnormal exits encountered before.
First, yarn.nodemanager.pmem-check-enabled controls whether a thread is started to check the amount of physical memory each task is using; if a task exceeds its allocated amount, it is killed immediately. The default is true, so when my word-segmentation action ran and used more physical memory than it had been allocated, it was killed.
yarn.nodemanager.vmem-check-enabled controls whether a thread is started to check the amount of virtual memory each task is using; if a task exceeds its allocated amount, it is also killed immediately. The default is likewise true, so when the action used more virtual memory than allowed, it was killed as well.
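
For reference, the two checks correspond to the following entries in YARN's default configuration (a sketch of what yarn-default.xml ships with; both values default to true):

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name> 
    <value>true</value> 
</property> 

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>true</value> 
</property> 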
YARN's default configuration also contains the following parameters:
yarn.nodemanager.vmem-pmem-ratio is the maximum amount of virtual memory a task may use for every 1 MB of physical memory it uses; the default is 2.1. yarn.nodemanager.resource.memory-mb is the total amount of physical memory on the node that YARN may use; the default is 8192 (MB). However, the server hosting my YARN node actually has only 4 GB of memory. Because I had not changed this total-available-physical-memory value, memory was exhausted when the action executed and the Spark task was killed.
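
Given that, a less drastic alternative to turning the checks off is to tell YARN how much memory the node really has and to give containers more virtual-memory headroom. The snippet below is only a sketch for a 4 GB machine like the one described here; the concrete values (3072 MB for YARN, a ratio of 4) are illustrative assumptions, not taken from the original setup:

<property> 
    <!-- illustrative: on a 4 GB server, leave roughly 1 GB for the OS and other daemons --> 
    <name>yarn.nodemanager.resource.memory-mb</name> 
    <value>3072</value> 
</property> 

<property> 
    <!-- illustrative: allow more virtual memory per MB of physical memory than the 2.1 default --> 
    <name>yarn.nodemanager.vmem-pmem-ratio</name> 
    <value>4</value> 
</property> 

With the node's real capacity declared, YARN no longer schedules containers against 8 GB that does not exist, and the larger ratio keeps the virtual-memory check from killing tasks whose virtual-memory footprint is well above their physical usage.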

Therefore, when running Spark in --master yarn mode, first adjust YARN's memory-related parameters to match the actual hardware; otherwise there will be unexpected consequences.
