Using spark.shuffle.service.enabled

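spark.shuffle.service.enabled controls whether Spark uses the external shuffle service, a long-running auxiliary service (on YARN it runs inside each NodeManager) that serves shuffle files on behalf of executors and is meant to make shuffle more robust and efficient. The default is false, which means the service is not used and every executor has to serve its own shuffle output.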

What the service adds is a second way to reach shuffle data: when an executor cannot serve its data to others, because it is stuck in a long GC pause or for some other reason, the other executors fetch that data from the auxiliary service instead. The main reason it is usually left off is that it depends on a separately deployed shuffle service, such as the External Shuffle Service that runs inside the YARN NodeManager.

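To use it in Spark on YARN mode, add the following properties to yarn-site.xml. If yarn.nodemanager.aux-services already lists other services (for example mapreduce_shuffle), append spark_shuffle to the comma-separated value rather than replacing it.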

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
    <name>spark.shuffle.service.port</name>
    <value>7337</value>
</property>
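
A quick way to sanity-check these properties is to parse the file and look for the three entries. The following is only a minimal sketch using the Python standard library: the helper names (read_hadoop_properties, check_spark_shuffle) and the embedded sample file are made up for illustration, and the sample assumes the properties sit inside the usual <configuration> root element of yarn-site.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a yarn-site.xml, embedded so the sketch is self-contained.
YARN_SITE_XML = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
  <property>
    <name>spark.shuffle.service.port</name>
    <value>7337</value>
  </property>
</configuration>"""

EXPECTED_CLASS = "org.apache.spark.network.yarn.YarnShuffleService"


def read_hadoop_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_spark_shuffle(props):
    """Return a list of problems found; an empty list means nothing obvious is wrong."""
    problems = []
    services = [s.strip() for s in props.get("yarn.nodemanager.aux-services", "").split(",")]
    if "spark_shuffle" not in services:
        problems.append("spark_shuffle is missing from yarn.nodemanager.aux-services")
    if props.get("yarn.nodemanager.aux-services.spark_shuffle.class") != EXPECTED_CLASS:
        problems.append("spark_shuffle.class is not set to YarnShuffleService")
    # 7337 is the default port, so a missing entry is treated as 7337 here.
    if not props.get("spark.shuffle.service.port", "7337").isdigit():
        problems.append("spark.shuffle.service.port is not a number")
    return problems


if __name__ == "__main__":
    print(check_spark_shuffle(read_hadoop_properties(YARN_SITE_XML)))
```

In practice you would read the real file from your Hadoop configuration directory instead of the embedded string. The check only looks at the text of the file; it does not tell you whether the NodeManager actually loaded the service.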

On the Spark side, add the following configuration to spark-defaults.conf

spark.shuffle.service.enabled   true
spark.shuffle.service.port      7337

The first setting turns the external shuffle service on, and the second is the port the service listens on. The port must be the same as the one configured in yarn-site.xml.
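
Since the port has to agree on both sides, it can be handy to read the two Spark settings back out of spark-defaults.conf and compare them with the NodeManager value. The sketch below is an assumption-laden illustration, not Spark's own loader: read_spark_defaults and shuffle_service_settings are hypothetical helpers, and the parsing is simplified to "key, then whitespace or =, then value" with # comments.

```python
import re

# Hypothetical sample of a spark-defaults.conf, embedded so the sketch is self-contained.
SPARK_DEFAULTS = """\
# external shuffle service
spark.shuffle.service.enabled   true
spark.shuffle.service.port      7337
"""


def read_spark_defaults(text):
    """Parse 'key value' or 'key=value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def shuffle_service_settings(conf):
    """Return (enabled, port), falling back to Spark's defaults: false and 7337."""
    enabled = conf.get("spark.shuffle.service.enabled", "false").lower() == "true"
    port = int(conf.get("spark.shuffle.service.port", "7337"))
    return enabled, port


if __name__ == "__main__":
    enabled, port = shuffle_service_settings(read_spark_defaults(SPARK_DEFAULTS))
    nodemanager_port = 7337  # assumed value taken from yarn-site.xml
    print("enabled:", enabled, "ports match:", port == nodemanager_port)
```

If the two ports differ, executors will try to register with a shuffle service on a port where nothing is listening, so this is worth checking before submitting a job.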

Origin blog.csdn.net/dudadudadd/article/details/114698915