Spark performance optimization and fault handling (4): Spark JVM tuning

For JVM tuning, you should first be clear that both full GC and minor GC cause the JVM's worker threads to pause, i.e. a stop-the-world event.

1. Reduce the memory ratio of cache operations

1.1 Static memory management mechanism

Under Spark's static memory management mechanism, heap memory is divided into two regions, Storage and Execution. Storage is mainly used to cache RDD data and broadcast data, while Execution is mainly used to hold intermediate data generated during the shuffle process. By default Storage takes 60% of the heap and Execution takes 20%, and the two regions are completely independent of each other.

In general, the Storage region serves cache operations. In some cases, however, cache memory is not under much pressure while tasks create many objects in their operator functions; because the Execution region is relatively small, this leads to frequent minor GC or even frequent full GC, which repeatedly pauses Spark and has a great impact on performance.

In the Spark UI, you can view the running status of each stage, including each task's running time, GC time, and so on. If GC is too frequent or takes too long, consider lowering the Storage memory fraction so that tasks have more memory available when executing operator functions.

The size of the Storage region can be specified with spark.storage.memoryFraction. The default is 0.6, i.e. 60%, and it can be lowered step by step as needed.

val conf = new SparkConf().set("spark.storage.memoryFraction", "0.4")  // default is 0.6
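For reference, here is a minimal sketch of how both legacy fractions could be adjusted together; the keys spark.memory.useLegacyMode and spark.shuffle.memoryFraction belong to the legacy static memory manager, and the values shown are illustrative assumptions rather than recommendations:

import org.apache.spark.SparkConf

// Sketch: shrink Storage and enlarge the shuffle/Execution region
// under static memory management (Spark 1.6+ requires useLegacyMode for this).
val conf = new SparkConf()
  .set("spark.memory.useLegacyMode", "true")    // opt into the static memory manager
  .set("spark.storage.memoryFraction", "0.4")   // Storage, default 0.6
  .set("spark.shuffle.memoryFraction", "0.3")   // shuffle/Execution, default 0.2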

1.2 Unified memory management mechanism

Under Spark's unified memory management mechanism, heap memory is also divided into two parts, Storage and Execution. Storage is mainly used to cache data, and Execution is mainly used to hold intermediate data generated during the shuffle process. Together they form the unified memory region, and Storage and Execution each start with 50% of it. Thanks to the dynamic occupancy mechanism, when the shuffle process needs more memory it automatically borrows from the Storage region, so no manual tuning is normally required.
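For completeness, here is a minimal sketch of the knobs that govern unified memory, spark.memory.fraction and spark.memory.storageFraction; the values shown are the defaults in recent Spark versions, and because of the dynamic occupancy mechanism they rarely need to be touched:

import org.apache.spark.SparkConf

// Sketch only: unified memory settings, normally left at their defaults.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.6")          // share of (heap - 300MB) used as unified memory
  .set("spark.memory.storageFraction", "0.5")   // portion of unified memory reserved for Storage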

2. Adjust Executor off-heap memory

Executor's off-heap memory is mainly used for the program's shared libraries, Perm space, thread stacks, memory-mapped files, and similar purposes, or for memory allocated directly in a C-like manner.

Sometimes, when a Spark job processes a very large amount of data, in the range of hundreds of millions of records, it will report errors from time to time such as "shuffle output file cannot find", "executor lost", "task lost", or "out of memory". This may be because the Executor's off-heap memory is insufficient, causing the Executor to overflow its memory while running.

When the tasks of a stage run, they may need to pull shuffle map output files from some Executors; if such an Executor has already died from a memory overflow, its associated BlockManager is gone as well, which produces errors such as "shuffle output file cannot find", "executor lost", "task lost", or "out of memory". In this case, consider increasing the Executor's off-heap memory to avoid these errors; a larger off-heap allocation also brings a certain performance improvement.

By default, the Executor's off-heap memory is capped at roughly 300 MB. In a real production environment, when processing massive amounts of data, this limit causes problems and the Spark job crashes repeatedly and cannot run. In that case, raise this parameter to at least 1 GB, or even 2 GB or 4 GB.

The Executor off-heap memory must be configured in the spark-submit script, as follows:

--conf spark.yarn.executor.memoryOverhead=2048

With this parameter configured, some JVM OOM problems can be avoided, and at the same time the performance of the overall Spark job can be improved.
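For context, here is a minimal spark-submit sketch with this flag in place; the class name, jar path, and resource sizes are illustrative assumptions, and on Spark 2.3+ the key spark.executor.memoryOverhead can be used instead:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 50 \
  --executor-memory 6g \
  --executor-cores 3 \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --class com.example.MySparkApp \
  my-spark-app.jar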

3. Adjust the connection waiting time

While a Spark job runs, an Executor first tries to get a block of data from its own local BlockManager. If the local BlockManager does not have it, it connects remotely, through the TransferService, to the BlockManager of an Executor on another node to fetch the data.

If a task creates a large number of objects, or a very large object, while running, it takes up a lot of memory and triggers frequent garbage collection. Garbage collection pauses all of the worker threads; in other words, once GC starts, the Spark Executor process stops working and cannot respond. Since there is no response, remote peers cannot establish a network connection to it, and the connection times out.

In a production environment, errors such as "file not found" or "file lost" are sometimes encountered. In that case it is very likely that the Executor's BlockManager could not establish a connection while pulling data, and after exceeding the default connection wait time of 60s the pull is declared a failure. If the data still cannot be pulled after repeated attempts, the Spark job may crash. This situation may also cause the DAGScheduler to resubmit several stages and the TaskScheduler to resubmit several tasks, which greatly prolongs the running time of the Spark job.

In this case, consider increasing the connection timeout. The connection wait time needs to be set in the spark-submit script, as follows:

--conf spark.core.connection.ack.wait.timeout=300

After increasing the connection wait time, you can usually avoid some of the "XX file pull failed" and "XX file lost" errors.
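Putting the last two sections together, here is a hedged sketch of how these flags might sit side by side in one spark-submit invocation (the application details are again illustrative):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 6g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.core.connection.ack.wait.timeout=300 \
  --class com.example.MySparkApp \
  my-spark-app.jar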
