How does Presto ensure that job memory does not conflict or overflow?

Abstract: Presto is a purely in-memory computing engine, so how does it ensure that a job's computation does not overflow the memory available to it? This article examines the mechanism in depth.

This article is shared from the HUAWEI CLOUD community post "How does presto ensure that the job memory will not conflict and overflow? In-depth analysis of presto memory management mechanism", author: breakDawn.

First, Presto divides memory into the following three memory pools:

System Pool

The System Pool is the system memory pool, reserved for the system itself and for buffers. By default, 40% of the memory space is reserved for it.

In other words, Presto appears to treat the buffers between jobs as shared, system-level resources rather than as memory belonging to any single job.

General Pool

The General Pool is the general memory pool, used to allocate memory to each query at runtime. Most queries use the General Pool.

Reserved Pool

The Reserved Pool holds memory in reserve for a very large job that may appear suddenly. That is, the query with the largest memory usage uses the Reserved Pool.

The size of the Reserved Pool equals the maximum memory a single query may use on one machine; the default is 10% of the memory space.
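The split described above can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function name split_node_memory and its parameters are hypothetical, and the idea that the General Pool receives whatever remains after the System Pool and Reserved Pool are carved out is an assumption, not something the article states.

```python
def split_node_memory(total_bytes, system_percent=40, reserved_percent=10):
    """Split one worker's memory into the three pools.

    The 40% and 10% defaults are the figures quoted in the article.
    Assumption: the General Pool gets the remainder.
    """
    system = total_bytes * system_percent // 100
    reserved = total_bytes * reserved_percent // 100
    general = total_bytes - system - reserved  # assumed: remainder goes to General
    return {"system": system, "reserved": reserved, "general": general}


# Example: a worker with 100 GiB of memory.
GIB = 1024 ** 3
pools = split_node_memory(100 * GIB)
for name, size in pools.items():
    print(f"{name}: {size // GIB} GiB")
```

Under these assumptions, half of the memory is left for ordinary queries in the General Pool.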

Before the physical plan is actually executed, memory is requested from systemMemoryPool; this covers temporary data structures, transmission buffers, and so on.

While the physical plan is executing, each Operator type requests memory as it needs it. For example, the aggregationOperator uses the getEstimatedSize() method to estimate the memory it requires.

This memory comes from either reservedMemoryPool or generalMemoryPool. Which pool is used depends on whether the current query is the one consuming the most memory.

Q: Why is a Reserved Pool needed, and why is it available to only one job?

Suppose there were no Reserved Pool. When many queries are running and memory is almost fully occupied, a query that needs a lot of memory starts to run.

At that moment there is not enough memory for it, so the query is suspended, waiting for memory to become available.

However, each small query that finishes frees only a little space, and new small queries keep arriving. Because a small query needs little memory, it easily finds enough free space and gets in first. The large query therefore stays suspended indefinitely and starves.

To prevent this starvation, space must be set aside for running a large-memory query. The size of the reserved space equals the maximum memory a query is allowed to use. Every second, Presto picks the query with the largest memory usage and allows it to use the Reserved Pool, so that this query is never left without memory to run.

The election mechanism of the Reserved Pool (how the query that uses the Reserved Pool is chosen)

Presto memory management is divided into two parts:

1. Query memory (job memory) management

A query is divided into many tasks. For each task, a thread loops to fetch the task's status, including the memory the task is using, and these per-task figures are aggregated into the memory used by the query.

If the aggregated memory of a query exceeds a certain size, the query is forcibly terminated.
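The aggregation step just described can be sketched as follows. This is a minimal illustration, not Presto's implementation: the function names, the (query_id, task_id, bytes_used) report shape, and the single max_bytes_per_query limit are all hypothetical.

```python
from collections import defaultdict


def aggregate_query_memory(task_reports):
    """Sum per-task memory into per-query memory.

    task_reports: iterable of (query_id, task_id, bytes_used) tuples,
    a hypothetical shape for the status each task reports.
    """
    totals = defaultdict(int)
    for query_id, _task_id, bytes_used in task_reports:
        totals[query_id] += bytes_used
    return dict(totals)


def queries_over_limit(totals, max_bytes_per_query):
    """Return the ids of queries whose aggregated memory exceeds the limit."""
    return sorted(q for q, used in totals.items() if used > max_bytes_per_query)


reports = [
    ("q1", "q1.0", 300),
    ("q1", "q1.1", 500),
    ("q2", "q2.0", 200),
]
totals = aggregate_query_memory(reports)
print(totals)
print(queries_over_limit(totals, max_bytes_per_query=600))
```

Here q1 would be terminated because its tasks together exceed the limit, even though no single task does.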

2. Machine memory management

The coordinator has a thread that periodically polls each machine to check its current memory status.

Once the query memory and machine memory information has been aggregated, the coordinator selects the query with the largest memory usage and assigns it to the Reserved Pool.
Memory management is driven by the coordinator: every second it makes this decision and designates one query that may use reserved memory on all machines.
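The election can be sketched as a single pass over the polled node reports. This is a simplified illustration under assumptions: the function name, the report shape (worker name mapped to per-query bytes), and the tie-breaking by query id are hypothetical and are not taken from Presto's source.

```python
def elect_reserved_query(node_reports):
    """Pick the query with the largest cluster-wide memory usage.

    node_reports: {worker_name: {query_id: bytes_used}}, a hypothetical
    shape for what the coordinator collects when it polls each worker.
    Returns the winning query id, or None if nothing is running.
    """
    cluster_usage = {}
    for per_query in node_reports.values():
        for query_id, used in per_query.items():
            cluster_usage[query_id] = cluster_usage.get(query_id, 0) + used
    if not cluster_usage:
        return None
    # Assumption: ties are broken by query id so the choice is deterministic.
    return max(cluster_usage, key=lambda q: (cluster_usage[q], q))


reports = {
    "worker-1": {"q1": 5, "q2": 8},
    "worker-2": {"q1": 6},
}
winner = elect_reserved_query(reports)
# The same query is designated on every worker, including ones where it is not the largest.
assignment = {worker: winner for worker in reports}
print(winner, assignment)
```

In this example q1 wins with a cluster-wide total of 11, although q2 is the larger query on worker-1; the decision is made once for the whole cluster rather than per machine.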

Q: If the query does not run on a certain machine, isn't the memory reserved on that machine wasted? Why not pick the largest task on each machine separately?

The reason is again to avoid deadlock. Suppose the choice were made per machine. A query could hold reserved memory on some machines and make quick progress there, yet on another machine its task might rank only second, with that node's Reserved Pool occupied by a different large job. The task on that node would be stuck and unable to continue, so the query as a whole could never finish.

So the primary goal is to ensure that the largest job currently known to the coordinator finishes as soon as possible.

How to kill unnecessary queries when memory is low?

Each submitted job has a session-level setting, query_max_memory, which is the maximum memory allowed for that query. If polling finds that the query's memory exceeds this limit, the query is killed.

There is also a session-level setting, resource_overcommit.

If it is set to true, the job is not killed even if its memory temporarily exceeds the limit specified for a single job, but it is still killed if the cluster as a whole runs short of memory.

  • How insufficient cluster memory is determined:
    If the memory pool on any worker node has insufficient memory (that is, the node is blocked), a memory overflow is considered to have occurred.
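The kill rules in this section can be summarized as a small decision function. This sketch follows only what the article states and leaves out anything else a real cluster might do; the function name and parameters are hypothetical, with query_max_memory and resource_overcommit standing in for the session settings named above.

```python
def should_kill_query(used_bytes, query_max_memory, resource_overcommit, blocked_nodes):
    """Decide whether a query should be killed during a polling round.

    blocked_nodes: number of worker nodes whose memory pool is exhausted;
    per the article, any blocked node means the cluster is out of memory.
    """
    if used_bytes <= query_max_memory:
        return False  # within its limit: nothing to do under these rules
    if not resource_overcommit:
        return True   # over the limit and not allowed to overcommit
    # Overcommit is tolerated only while the cluster still has memory.
    return blocked_nodes > 0


print(should_kill_query(120, 100, resource_overcommit=False, blocked_nodes=0))
print(should_kill_query(120, 100, resource_overcommit=True, blocked_nodes=0))
print(should_kill_query(120, 100, resource_overcommit=True, blocked_nodes=1))
```

The three calls cover the cases described: an over-limit query is killed by default, survives with resource_overcommit while the cluster has headroom, and is killed once a worker node becomes blocked.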

 

