What to do after installing Presto

As we know, Presto is known for its excellent query speed. Thanks to its MPP architecture it can query data in Hive quickly, and it can be extended through Connectors; Connectors currently exist for MySQL, MongoDB, Cassandra, Hive and a range of other databases. It is a common SQL-on-Hadoop solution. The question we look at today is: once we choose Presto as our query engine, what do we need to take into account?

Presto performance tuning and stability

Problems with Presto

  1. The Coordinator is a single point of failure (common workarounds: IP failover, dynamic access through an nginx proxy, etc.)
  2. Large queries easily run out of memory (OOM); version 0.186+ supports spilling to disk, which we have not verified
  3. There is no fault tolerance and no retry mechanism
  4. Presto deployment environments are complex, and in the MPP architecture a single problematic machine can affect the whole cluster
  5. Presto's concurrency capacity is limited

Tuning Strategy

  1. Deploy multiple Coordinators to avoid the single point of failure, and wrap Presto in an upper-layer service that tracks submitted queries instead of letting clients connect directly over JDBC
  2. If retries are needed, perform them in that tracking layer (this requires knowing the job's status)
  3. Allocate the workers' memory-related parameters sensibly to avoid OOM
  4. Enable Presto's built-in resource queues, build query queues that fit the business scenarios, and control the number of concurrent queries so that high-priority tasks complete efficiently
  5. Build a monitoring system for Presto that tracks cluster status and raises timely alerts, and adjust the Presto cluster size dynamically
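Item 2 can be sketched as follows. This is a minimal sketch, assuming a hypothetical tracking layer that exposes `submit(sql)` (returns a query id) and `get_state(query_id)` (returns the query state); both are placeholders, not a Presto or JDBC API. It resubmits only queries that ended in FAILED, which is why the job status has to be known first.

```python
import time
from typing import Callable

# FINISHED and FAILED are Presto's terminal query states; how the hypothetical
# tracking layer reports them is an assumption.
TERMINAL_STATES = {"FINISHED", "FAILED"}


def run_with_retry(
    submit: Callable[[str], str],      # hypothetical: submits SQL, returns a query id
    get_state: Callable[[str], str],   # hypothetical: returns the current query state
    sql: str,
    max_attempts: int = 3,
    poll_interval_s: float = 1.0,
    backoff_s: float = 2.0,
) -> str:
    """Run sql through the tracking layer, retrying only queries that ended in FAILED."""
    for attempt in range(1, max_attempts + 1):
        query_id = submit(sql)
        state = get_state(query_id)
        # Poll until the query reaches a terminal state.
        while state not in TERMINAL_STATES:
            time.sleep(poll_interval_s)
            state = get_state(query_id)
        if state == "FINISHED":
            return query_id
        if attempt < max_attempts:
            time.sleep(backoff_s * attempt)  # linear backoff before resubmitting
    raise RuntimeError(f"query failed after {max_attempts} attempts")
```

Retrying like this is only safe for idempotent queries such as SELECTs; an INSERT that failed halfway may need cleanup before it is resubmitted.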

Memory Tuning

Presto's memory is divided into three pools: GENERAL_POOL, RESERVED_POOL and SYSTEM_POOL.

  • SYSTEM_POOL is memory reserved for the system, i.e. the memory a worker needs for initialization and for executing tasks. It defaults to Xmx * 0.4 and can be set with resources.reserved-system-memory.
  • RESERVED_POOL is the memory for the largest query: Presto sets this region aside for the query that currently uses the most memory. It defaults to Xmx * 0.1 and is configured by query.max-memory-per-node.
  • GENERAL_POOL is the memory for all other queries, i.e. every query except the largest one. Its size is Xmx - SYSTEM_POOL - RESERVED_POOL.
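The pool arithmetic above can be checked with a small helper. This is a sketch assuming the defaults stated in the text (0.4 and 0.1 of Xmx) and sizes in whole megabytes; `presto_pool_sizes` and `PoolSizes` are hypothetical names, since Presto computes these pools internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolSizes:
    system_mb: int
    reserved_mb: int
    general_mb: int


def presto_pool_sizes(
    xmx_mb: int,
    reserved_system_memory_mb: Optional[int] = None,  # resources.reserved-system-memory
    max_memory_per_node_mb: Optional[int] = None,     # query.max-memory-per-node
) -> PoolSizes:
    """Split a worker heap into SYSTEM_POOL, RESERVED_POOL and GENERAL_POOL."""
    # Defaults from the text: SYSTEM_POOL = Xmx * 0.4, RESERVED_POOL = Xmx * 0.1.
    # Integer arithmetic avoids floating-point rounding on MB values.
    system = reserved_system_memory_mb if reserved_system_memory_mb is not None else xmx_mb * 4 // 10
    reserved = max_memory_per_node_mb if max_memory_per_node_mb is not None else xmx_mb // 10
    # GENERAL_POOL gets whatever is left: Xmx - SYSTEM_POOL - RESERVED_POOL.
    general = xmx_mb - system - reserved
    if general <= 0:
        raise ValueError("SYSTEM_POOL + RESERVED_POOL leave no room for GENERAL_POOL")
    return PoolSizes(system, reserved, general)
```

For a 10240 MB heap with default settings this gives a 4096 MB SYSTEM_POOL, a 1024 MB RESERVED_POOL and a 5120 MB GENERAL_POOL.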

The overall memory configuration depends on the following factors:

  1. The amount of data and the complexity of user queries (this determines how much memory a query needs)
  2. The concurrency of user queries (this determines how large the JVM heap needs to be)

Note: simply increasing RESERVED_POOL will not solve Presto query problems, because most of the time RESERVED_POOL does not take part in computation. It is used only when both of the following hold, and even then by only one query:

  1. A node in GENERAL_POOL is blocked, i.e. it has run out of memory
  2. RESERVED_POOL is not already in use
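The two conditions can be expressed as a small decision function. This is a simplified model, not Presto's actual code: the function name and its inputs are hypothetical, and choosing the largest query follows the description of RESERVED_POOL as the pool for the biggest query.

```python
from typing import Dict, Optional


def pick_query_for_reserved_pool(
    general_pool_has_blocked_node: bool,
    reserved_pool_query: Optional[str],   # id of the query already in RESERVED_POOL, if any
    query_memory_mb: Dict[str, int],      # hypothetical: total reservation per query id
) -> Optional[str]:
    """Return the query id to move into RESERVED_POOL, or None if nothing moves."""
    # Condition 1: some node's GENERAL_POOL is blocked (out of memory).
    if not general_pool_has_blocked_node:
        return None
    # Condition 2: RESERVED_POOL is free; it can hold only one query at a time.
    if reserved_pool_query is not None:
        return None
    if not query_memory_mb:
        return None
    # Promote the single largest query.
    return max(query_memory_mb, key=query_memory_mb.__getitem__)
```

This also shows why a larger RESERVED_POOL helps only one query: however big the pool is, at most one query is ever promoted into it.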

Therefore all three values need to be configured sensibly. If concurrency is relatively high, keep SYSTEM_POOL at its default or make it slightly larger, and increase RESERVED_POOL slightly, to about one eighth of the heap.
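The recommendation can be turned into config.properties lines with a small generator. This is a sketch assuming the one-eighth ratio from the text and the default SYSTEM_POOL share of 0.4; `suggest_memory_properties` is a hypothetical helper, and the two property names are the ones this article mentions, so check them against your Presto version.

```python
def suggest_memory_properties(xmx_gb: int, reserved_fraction: float = 1 / 8) -> str:
    """Render config.properties lines for a worker heap of xmx_gb gigabytes.

    Keeps SYSTEM_POOL at its default Xmx * 0.4 and raises RESERVED_POOL
    (query.max-memory-per-node) to roughly reserved_fraction of the heap.
    """
    xmx_mb = xmx_gb * 1024
    system_mb = xmx_mb * 4 // 10
    reserved_mb = int(xmx_mb * reserved_fraction)
    # Whatever remains is GENERAL_POOL; refuse settings that leave nothing.
    if xmx_mb - system_mb - reserved_mb <= 0:
        raise ValueError("no memory left for GENERAL_POOL")
    lines = [
        f"resources.reserved-system-memory={system_mb}MB",
        f"query.max-memory-per-node={reserved_mb}MB",
    ]
    return "\n".join(lines)
```

For a 64 GB heap this suggests 26214MB for resources.reserved-system-memory and 8192MB for query.max-memory-per-node.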

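To deal with JVM OOM problems, add the following G1 options to Presto's jvm.config. They keep 15% of the heap free as a reserve, start concurrent marking earlier (at 40% heap occupancy instead of the default 45%), and use 8 threads for concurrent GC work: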

-XX:G1ReservePercent=15
-XX:InitiatingHeapOccupancyPercent=40
-XX:ConcGCThreads=8

Presto monitoring

Presto's built-in monitoring page shows only the current state of the cluster and some recent queries, which does not meet our needs. The following information needs to be collected:

  1. Basic query information (status, memory usage, total elapsed time, error messages, etc.)
  2. Query performance information (time spent in each step, input and output data volumes, and so on, broken down by stage and by the tasks within each stage)
  3. Alerts on anomalies
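Once the basic query information is collected, alerting can start from simple threshold rules. This is a minimal sketch, assuming a hypothetical `QueryRecord` assembled by whatever collector you use; the field names and thresholds are illustrative, not Presto's own data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryRecord:
    # Hypothetical record built by a collector; field names are illustrative.
    query_id: str
    state: str
    peak_memory_mb: int
    elapsed_s: float
    error: Optional[str] = None


def find_alerts(records: List[QueryRecord], memory_limit_mb: int, time_limit_s: float) -> List[str]:
    """Return human-readable alerts for failed, memory-heavy or slow queries."""
    alerts = []
    for r in records:
        if r.state == "FAILED":
            alerts.append(f"{r.query_id}: failed ({r.error or 'unknown error'})")
        if r.peak_memory_mb > memory_limit_mb:
            alerts.append(f"{r.query_id}: peak memory {r.peak_memory_mb} MB > {memory_limit_mb} MB")
        if r.elapsed_s > time_limit_s:
            alerts.append(f"{r.query_id}: ran {r.elapsed_s:.0f} s > {time_limit_s:.0f} s")
    return alerts
```

Keeping the rules as plain functions over collected records makes it easy to add per-stage rules later, once stage and task performance data is collected as well.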

Planned Presto optimizations

  1. Limit the maximum number of partitions a query on a partitioned table may scan
  2. Limit the maximum number of splits a single query can generate, to prevent it from consuming excessive compute resources
  3. Automatically detect and kill long-running queries
  4. Throttle Presto queries (limit queries that scan more than xx of data)
  5. Enable Presto resource queues
  6. Unify the query engine
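Items 1 to 4 share one pattern: compare a query's figures against a limit and reject or kill it when it goes over. A minimal sketch, assuming a hypothetical monitoring layer that can report partitions, splits, scanned bytes and running time per query; all class and function names here are illustrative, and the limits are left to the caller. Presto also has a `query.max-run-time` property that may be enough for item 3 on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStats:
    # Hypothetical per-query figures gathered by a monitoring layer.
    query_id: str
    partitions: int
    splits: int
    scanned_bytes: int
    running_s: float


@dataclass
class Limits:
    max_partitions: int
    max_splits: int
    max_scanned_bytes: int
    max_running_s: float


def violations(q: QueryStats, limits: Limits) -> List[str]:
    """List which limits a query breaks; an empty list means it may continue."""
    broken = []
    if q.partitions > limits.max_partitions:
        broken.append("partitions")
    if q.splits > limits.max_splits:
        broken.append("splits")
    if q.scanned_bytes > limits.max_scanned_bytes:
        broken.append("scanned_bytes")
    if q.running_s > limits.max_running_s:
        broken.append("running_time")
    return broken


def queries_to_kill(queries: List[QueryStats], limits: Limits) -> List[str]:
    """Ids of queries that break at least one limit (candidates for killing)."""
    return [q.query_id for q in queries if violations(q, limits)]
```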

Memory limits and management in the current version of Presto

Single-node dimension
  1. Each time a query requests memory from GENERAL_POOL, Presto checks whether the query's memory usage exceeds the per-node maximum. If it does, the query fails with the error "Query exceeded local memory limit of x". This stops Presto from granting unlimited memory, and only the current query fails. In addition, if a node has no GENERAL_POOL memory left and no reclaimable (revocable) memory, the node is considered a blocked node.
  2. RESERVED_POOL can be thought of as the pool for the single largest query. Any query that satisfies the GENERAL_POOL memory-limit policy will also satisfy the RESERVED_POOL policy, because RESERVED_POOL reuses the GENERAL_POOL policy.
  3. In the current version RESERVED_POOL has no memory limit of its own, so when concurrency is very high and the amount of scanned data is very large, there is a small probability of OOM problems. With Resource Groups and sensible memory settings, however, OOM problems can largely be avoided.
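The single-node check can be modelled as per-allocation accounting. This is a simplified model, not Presto's implementation: `QueryNodeMemory`, `ExceededLocalMemoryLimit` and `is_blocked_node` are hypothetical names, and the exact error wording differs between Presto versions.

```python
class ExceededLocalMemoryLimit(Exception):
    pass


class QueryNodeMemory:
    """Simplified per-node memory accounting for one query in GENERAL_POOL."""

    def __init__(self, local_limit_bytes: int):
        self.local_limit_bytes = local_limit_bytes  # per-node cap for this query
        self.reserved_bytes = 0

    def reserve(self, nbytes: int) -> None:
        # Checked on every allocation: only this query fails, instead of it
        # taking unlimited memory from the node.
        if self.reserved_bytes + nbytes > self.local_limit_bytes:
            raise ExceededLocalMemoryLimit(
                f"Query exceeded local memory limit of {self.local_limit_bytes} bytes")
        self.reserved_bytes += nbytes


def is_blocked_node(free_general_bytes: int, revocable_bytes: int) -> bool:
    """A node is blocked when it has no free GENERAL_POOL memory and nothing to reclaim."""
    return free_general_bytes <= 0 and revocable_bytes <= 0
```

A failed reservation leaves the query's existing reservation unchanged, so the node keeps running other queries normally.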

Cluster dimension

Presto considers the cluster to be out of memory when both of the following hold:

  • GENERAL_POOL has a blocked node
  • RESERVED_POOL is already in use

When the cluster is determined to be out of memory, Presto manages memory in one of two ways:
  1. It goes through every query and checks whether the total memory currently consumed by that query exceeds query.max-memory (configured in config.properties). Any query that exceeds it is failed.
  2. If query.max-memory is not set appropriately (for example, its value is very large), the first method may still not have resolved the situation after the delay (5 minutes by default, set by query.low-memory-killer.delay). Presto then falls back to the second method, in which the LowMemoryKillerPolicy decides which query to kill. There are two policies, total-reservation and total-reservation-on-blocked-nodes: total-reservation kills the query that uses the most memory across the whole cluster, while total-reservation-on-blocked-nodes kills the query that uses the most memory on the nodes that are out of memory (blocked).
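Both methods can be modelled in a few lines. This is a simplified sketch, not Presto's code: the function names and input shapes are hypothetical, while the two policy strings are the names given above (the values of query.low-memory-killer.policy).

```python
from typing import Dict, Optional, Set


def cluster_out_of_memory(general_has_blocked_node: bool, reserved_pool_in_use: bool) -> bool:
    # Both conditions must hold.
    return general_has_blocked_node and reserved_pool_in_use


def queries_over_max_memory(total_mb: Dict[str, int], query_max_memory_mb: int) -> Set[str]:
    """Method 1: every query whose cluster-wide memory exceeds query.max-memory is failed."""
    return {qid for qid, mb in total_mb.items() if mb > query_max_memory_mb}


def pick_query_to_kill(
    policy: str,
    per_node_mb: Dict[str, Dict[str, int]],  # query id -> {node -> MB reserved}
    blocked_nodes: Set[str],
) -> Optional[str]:
    """Method 2: choose a victim under the named LowMemoryKillerPolicy."""
    if policy == "total-reservation":
        # Largest total reservation across the whole cluster.
        score = {q: sum(nodes.values()) for q, nodes in per_node_mb.items()}
    elif policy == "total-reservation-on-blocked-nodes":
        # Largest reservation counted only on blocked nodes.
        score = {q: sum(mb for n, mb in nodes.items() if n in blocked_nodes)
                 for q, nodes in per_node_mb.items()}
    else:
        raise ValueError(f"unknown policy: {policy}")
    if not score or max(score.values()) == 0:
        return None
    return max(score, key=score.__getitem__)
```

The two policies can pick different victims: a query spread thinly over many healthy nodes may have the largest total while contributing little to the blocked node.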

Resource Groups

Resource Groups can be seen as Presto's implementation of weak resource limits and isolation. For each group you can specify the queue size, the concurrency and the amount of memory it may use. Setting a reasonable hardConcurrencyLimit (maximum number of concurrent queries), softMemoryLimit (maximum memory usage) and maxQueued (queue length) for each group reduces interference between different business workloads on the one hand and, on the other, avoids OOM problems with high probability. In addition, by making good use of users and groups, plus some secondary development, Presto can support multiple users sharing the same permissions as well as group-based authentication.
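How the three knobs interact can be illustrated with a toy admission model. This is not Presto's scheduler: the `ResourceGroup` class and its methods are hypothetical, and the field names only mirror hardConcurrencyLimit, maxQueued and softMemoryLimit (real groups are defined in Presto's resource group configuration, for example a JSON file for the file-based manager).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Set


@dataclass
class ResourceGroup:
    """Toy admission model using the three knobs described above."""
    name: str
    hard_concurrency_limit: int   # hardConcurrencyLimit
    max_queued: int               # maxQueued
    soft_memory_limit_mb: int     # softMemoryLimit, expressed in MB here
    running: Set[str] = field(default_factory=set)
    queued: Deque[str] = field(default_factory=deque)
    memory_mb: int = 0            # current memory usage of the group's queries

    def _can_run(self) -> bool:
        # New queries start only below both the concurrency and memory limits.
        return len(self.running) < self.hard_concurrency_limit and self.memory_mb < self.soft_memory_limit_mb

    def submit(self, query_id: str) -> str:
        """Return "RUNNING", "QUEUED" or "REJECTED" for a new query."""
        if self._can_run():
            self.running.add(query_id)
            return "RUNNING"
        if len(self.queued) < self.max_queued:
            self.queued.append(query_id)
            return "QUEUED"
        return "REJECTED"  # queue full

    def set_memory_usage(self, mb: int) -> None:
        # Updated from observed usage; above softMemoryLimit new queries wait.
        self.memory_mb = mb

    def finish(self, query_id: str) -> None:
        """Mark a query done and start queued queries while limits allow."""
        self.running.discard(query_id)
        while self.queued and self._can_run():
            self.running.add(self.queued.popleft())
```

In this model a group that is over its memory limit queues new work rather than failing it, which is what keeps one workload's memory pressure from spilling into OOMs for the others.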

Reference:
http://armsword.com/2018/05/22/the-memory-management-and-tuning-experience-of-presto/

Source: www.cnblogs.com/jixin/p/11234861.html