New feature of Spark 3.4.0 -- UI supports storage in RocksDB

Background

By default, Spark keeps all event information and all data required by the UI in memory. In a client/server (C/S) deployment, where Spark runs as a long-lived server, this can lead to OOM. It is also the cause of the problem behind a PR the author submitted earlier: Multi sparkSession should share single SQLAppStatusStore.

Analysis

As stated in "Better Spark UI scalability and Driver stability for large applications", the current Spark UI and Spark History Server (SHS) have the following problems:

  1. All UI information stored by Spark is kept in memory (the data structure is InMemoryStore). This takes up a lot of memory, can cause driver OOM, and affects the stability of Spark.
  2. The Spark UI retains only a limited number of SQL entries, so if you want to go back and investigate after a job has finished, the older entries are already gone.
  3. The Spark History Server must parse every Spark event from the JSON-format event log. Uncompressed event logs in particular can be very large, so SHS startup can take a long time.

With RocksDB introduced as the storage backend, the memory required by the driver can be reduced, and the new protobuf serializer can greatly speed up reading and writing Spark events.
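
As a rough illustration of how these options might be switched on, the sketch below collects the relevant settings and renders them for spark-submit and for spark-defaults.conf. It assumes the Spark 3.4 configuration keys spark.ui.store.path (disk-backed store for the live UI), spark.history.store.path and spark.history.store.serializer (for the SHS). The directory paths and helper function names are hypothetical and chosen only for this example, so check the keys against the configuration docs of your Spark version before relying on them.

```python
# Sketch only: gathers settings that move UI state out of driver memory.
# Helper names and directory paths are hypothetical placeholders.


def live_ui_conf(store_path):
    """Settings for a running application (driver-side live UI)."""
    return {
        # Assumption: when this is set, the live UI keeps its data in a
        # disk-backed store under this directory instead of InMemoryStore.
        "spark.ui.store.path": store_path,
    }


def history_server_conf(store_path):
    """Settings for the Spark History Server (SHS)."""
    return {
        # Local directory where the SHS caches application data on disk.
        "spark.history.store.path": store_path,
        # Assumption: PROTOBUF selects the protobuf serializer (default is JSON).
        "spark.history.store.serializer": "PROTOBUF",
    }


def to_submit_args(conf):
    """Render a conf dict as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


def to_properties(conf):
    """Render a conf dict as spark-defaults.conf lines (key, space, value)."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    submit = ["spark-submit"] + to_submit_args(live_ui_conf("/tmp/spark-ui-store"))
    print(" ".join(submit))
    print(to_properties(history_server_conf("/tmp/spark-history-store")))
```

The live UI setting belongs to the application itself, so it is passed at submit time, while the two history settings belong in the configuration of the SHS process.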

Origin blog.csdn.net/monkeyboy_tech/article/details/131491218