What should I do if I don't know how to allocate resources for a Spark job?

A few days ago, several friends asked me about resource allocation for Spark jobs: when submitting a job, how many resources should be allocated? My answer was "it depends on experience." But if you think about it carefully, "experience" says nothing at all; there must be some methodology or line of reasoning behind it. Hence this article. When I sat down to write it, I honestly didn't know where to start, so I searched the Internet to see how others had answered. Coincidentally, I found that someone had asked exactly this question three years ago.


After reading the replies under that question, I felt I could follow them, but I wasn't sure my friends would, so I decided to explain the reasoning in detail.


First of all, the Spark official website provides some hardware-level suggestions; see https://spark.apache.org/docs/latest/hardware-provisioning.html.

But that page does not explain in a fine-grained way how to allocate resources at the application level. It does, however, point to the directions that matter, namely memory and CPU.

First, let's review the Spark parameters involved in these two areas (a configuration sketch follows the list):

--num-executors (the spark.executor.instances parameter in Spark SQL)

This parameter sets the total number of executors used by the job.

--driver-memory (the spark.driver.memory parameter in Spark SQL)

This parameter sets the memory given to the driver. The driver usually does not consume many resources, so there is no need to give it much.

--executor-memory (the spark.executor.memory parameter in Spark SQL)

This parameter sets the memory occupied by each executor.

--executor-cores (the spark.executor.cores parameter in Spark SQL)

This parameter sets the number of tasks that can run in parallel within one executor (note the difference between parallelism and concurrency!).
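For reference, here is a minimal sketch of how the same settings can be passed programmatically through a PySpark SparkSession. The values are illustrative placeholders, not recommendations, and driver memory in particular normally has to be supplied at submit time, because the driver JVM is already running by the time the builder executes:

```python
from pyspark.sql import SparkSession

# Illustrative placeholder values only -- how to size them properly is the
# topic of the rest of this article.
spark = (
    SparkSession.builder
    .appName("resource-demo")                  # hypothetical application name
    .config("spark.executor.instances", "4")   # equivalent to --num-executors
    .config("spark.driver.memory", "2g")       # equivalent to --driver-memory,
                                               # normally set at submit time
    .config("spark.executor.memory", "8g")     # equivalent to --executor-memory
    .config("spark.executor.cores", "4")       # equivalent to --executor-cores
    .getOrCreate()
)
```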

Usually, the total memory consumed by a Spark job >= spark.executor.instances * spark.executor.memory;

and the number of CPU cores occupied >= spark.executor.instances * spark.executor.cores.

The reason "greater than or equal" is used here is that the ApplicationMaster also takes up resources! Speaking of the ApplicationMaster, we need to review two parameters on the YARN side:

yarn.nodemanager.resource.memory-mb

This parameter is the maximum amount of memory a NodeManager can make available to YARN.

yarn.nodemanager.resource.cpu-vcores

This parameter is the maximum number of virtual CPU cores a NodeManager can make available to YARN.

Generally, in a real production environment, the total resources YARN can allocate have already been fixed, and there may even be a dedicated queue for Spark jobs (this also depends on which YARN scheduler is used: fair, FIFO, or capacity).

Let's work through an example. Suppose we have a bare cluster of 6 nodes, each configured with 16 cores and 64 GB of memory (in reality, node configurations are often uneven).

We cannot give all of that to YARN; some of it has to be reserved for the operating system. Reserving 1 core and 1 GB per node (the bare minimum), the resources each node can offer to YARN are:

yarn.nodemanager.resource.memory-mb=63g

yarn.nodemanager.resource.cpu-vcores=15

The cluster totals are therefore all_memory = 63g * 6 = 378g and all_cpu = 15c * 6 = 90c.

Assume each executor runs 5 tasks in parallel. Then we need 90 / 5 = 18 executors, but one of them has to be reserved for the ApplicationMaster, so --num-executors = 17. We just calculated that 18 executors are needed across 6 nodes, which is 3 executors per node on average; each node has 63g, so each executor can occupy 63g / 3 = 21g of memory.

Note that an executor does not give all 21g to the job: part of it is reserved for JVM overheads and off-heap buffers to keep things stable, which is the so-called off-heap memory. This overhead is set through the spark.executor.memoryOverhead parameter, whose value is max(384MB, 0.1 * executorMemory). Here that is max(384MB, 0.1 * 21g) ≈ 2g, so the heap memory is 21 - 2 = 19g.
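To make the arithmetic repeatable, here is a small Python sketch that mirrors the walk-through above. It is not an official formula: the function and its rounding choices are mine, and the rounding deliberately follows the loose whole-gigabyte rounding used in the text.

```python
import math

def size_executors(nodes, mem_per_node_gb, cores_per_node,
                   reserved_mem_gb=1, reserved_cores=1, cores_per_executor=5):
    """Rough executor sizing, mirroring the walk-through above."""
    # Resources each NodeManager can offer to YARN after the OS reservation.
    yarn_mem_per_node = mem_per_node_gb - reserved_mem_gb   # 64 - 1 = 63g
    yarn_cores_per_node = cores_per_node - reserved_cores   # 16 - 1 = 15c

    total_mem = yarn_mem_per_node * nodes                   # 63g * 6 = 378g
    total_cores = yarn_cores_per_node * nodes               # 15c * 6 = 90c

    total_executors = total_cores // cores_per_executor     # 90 / 5 = 18
    num_executors = total_executors - 1                     # leave one for the AppMaster

    mem_per_executor = total_mem / total_executors          # 378 / 18 = 21g per executor

    # spark.executor.memoryOverhead = max(384MB, 10% of executor memory),
    # rounded down to whole gigabytes as in the text.
    overhead_gb = max(0.384, math.floor(0.10 * mem_per_executor))   # ~2g
    heap_gb = mem_per_executor - overhead_gb                        # ~19g

    return {
        "--num-executors": num_executors,
        "--executor-cores": cores_per_executor,
        "--executor-memory (g)": math.floor(heap_gb),
    }

print(size_executors(nodes=6, mem_per_node_gb=64, cores_per_node=16))
# {'--num-executors': 17, '--executor-cores': 5, '--executor-memory (g)': 19}

print(size_executors(nodes=6, mem_per_node_gb=64, cores_per_node=16,
                     cores_per_executor=9))
# {'--num-executors': 9, '--executor-cores': 9, '--executor-memory (g)': 34}
```

The second call reproduces the parallelism-of-9 column in the table below.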

That is to say, when submitting the job (the example here assumes all cluster resources are given to your job!), the parameter values cannot exceed the following:

--executor-cores / spark.executor.cores = 5
--executor-memory / spark.executor.memory = 19g
--num-executors / spark.executor.instances = 17

The table below summarizes the calculation; a second column repeats it with a per-executor parallelism of 9:

| | Parallelism of 5 per executor | Parallelism of 9 per executor |
| --- | --- | --- |
| Number of nodes | 6 | 6 |
| Memory per node | 64g | 64g |
| CPU per node | 16c | 16c |
| Max memory YARN can allocate | 63g*6=378g | 63g*6=378g |
| Max CPU YARN can allocate | 15c*6=90c | 15c*6=90c |
| Assumed parallelism per executor | 5 | 9 |
| Total number of executors | 90c/5=18 | 90c/9=10 |
| Executors actually working | 18-1=17 | 10-1=9 |
| Executors per node | 18/6=3 | 10/6≈2 |
| Memory per executor | 63g/3=21g | 378g/10≈37g |
| Off-heap overhead | max(384mb, 0.1*21g)≈2g | max(384mb, 0.1*37g)≈3g |
| Memory for actual work | 21g-2g=19g | 37g-3g=34g |

Some friends may wonder why the parallelism of each executor was set to 5. It can certainly be increased. Raising the per-executor parallelism raises the number of tasks that can run at the same time, which reduces the number of scheduling rounds: for example, with 100 tasks and 20 concurrent task slots across the executors, the job runs in 5 waves; with 50 concurrent slots, only 2 waves are needed. The trade-off is that each executor then needs to hold more memory (see the sketch below).
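As a quick illustration of the "waves" arithmetic, here is a hypothetical helper; the executor counts in the two calls are chosen only to reproduce the 20-slot and 50-slot cases mentioned above.

```python
import math

def scheduling_waves(num_tasks, num_executors, cores_per_executor):
    # Each executor runs cores_per_executor tasks at a time, so the job is
    # worked through in "waves" of concurrent task slots.
    slots = num_executors * cores_per_executor
    return math.ceil(num_tasks / slots)

print(scheduling_waves(100, num_executors=4, cores_per_executor=5))   # 20 slots -> 5 waves
print(scheduling_waves(100, num_executors=10, cores_per_executor=5))  # 50 slots -> 2 waves
```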

In fact, one of the evaluation methods mentioned in the forum thread above is to work backwards from the number of tasks. The official tuning guide suggests 2 to 3 tasks per CPU core.


Suppose there are 100 tasks; then at most about 50 cores are needed. Continuing with the example above, it then depends on whether you spread the work evenly across every node or concentrate it on a few nodes. Which approach to take also depends on the amount of data you actually process and the complexity of the computation (network cost, data locality, and the load on each node all have to be considered). Usually the work is spread evenly, in which case each node uses roughly 8 cores. How many executors should each node then start: one, or two? And how much memory should each executor get? That depends on whether your code does more computation or more caching, which is where the memory model comes in. If you are not familiar with it, see my earlier article: Tungsten On Spark - Memory Model Design.
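As a back-of-the-envelope version of that inference, assuming the 2-3 tasks-per-core guideline (the helper below is mine, not part of any Spark API):

```python
import math

def cores_from_tasks(num_tasks, tasks_per_core):
    # Work backwards from the expected task count to a rough core budget.
    return math.ceil(num_tasks / tasks_per_core)

print(cores_from_tasks(100, tasks_per_core=2))  # 50 cores (upper end of the estimate)
print(cores_from_tasks(100, tasks_per_core=3))  # 34 cores (lower end)
```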

Even though we have estimated the resource allocation above, that does not mean no further tuning is needed; otherwise, why would there be so many optimization articles out there? You also have to be considerate of team resources: you cannot take everything for yourself, or how would anyone else get their work done?

So you usually have to trim the numbers you estimated, which means making constant adjustments. This is why so many people, when asked how to allocate resources, answer "experience"!

In my view, there really is no formula you can simply plug numbers into.

If you have a better methodology for allocating resources, or if something here is wrong, please contact me and share it with everyone.

If you run into other problems, feel free to reach out~

References:

  • Spark official website: https://spark.apache.org/docs/latest/hardware-provisioning.html

  • https://bbs.csdn.net/topics/392153088

