A walkthrough of HetuEngine: resource planning and data source docking

This article is shared from the HUAWEI CLOUD community post "[Leading you to play with HetuEngine] (3) HetuEngine resource planning", author: HetuEngine nine-level endorsement.

HetuEngine supports resource planning in two dimensions: service-layer role instances and computing instances. In high-concurrency scenarios, it also supports load sharing and balancing by starting multiple computing instances. Together, these meet the resource planning requirements of a wide range of business scenarios.

1. HetuEngine role instance resource planning

HetuEngine can manage computing instances as a service through the service layer. The role instances of the service layer include HSBroker, HSConsole, HSFabric, and QAS.

Role instance parameters can be adjusted in the HetuEngine service-layer configuration, as shown in the figure below.

2. Compute instance resource planning

A HetuEngine computing instance is a memory-based computing engine that runs in Yarn containers. It generally consists of one or two Coordinators and N Workers. The Coordinator is the management node: it receives and parses SQL, generates and optimizes execution plans, assigns tasks, and schedules resources. If a computing instance needs high availability, two Coordinators must be deployed. A Worker is a working node that pulls data from data sources in parallel and performs distributed SQL computation, among other capabilities. Starting from version 8.2.1, HetuEngine supports multiple computing instances within a single tenant.

The relationship among the Yarn tenant queue, the HetuEngine computing instance, and the instance's Coordinators and Workers is shown in the following figure:

Schematic diagram of Yarn resource pool allocation (AM is Yarn's ApplicationMaster)

HetuEngine lets you manage computing instances on the HSConsole interface and configure each computing instance differently, as shown in the figure below:

It also supports adding custom parameter configurations at the computing instance level when creating a computing instance:

HetuEngine Computing Instance Selection and Memory Configuration Suggestions

A HetuEngine computing instance is a SQL query engine that computes purely in memory. For performance, therefore, computing instances should be given as much memory as possible.

Because HetuEngine computing instances run in on-Yarn mode, both the Coordinator and the Workers run on Yarn NodeManager nodes.

Coordinator and Worker resource configuration recommendations

Deploying two Coordinators is recommended; the number of Workers depends on the available resources.

• The memory configuration requirements for the Coordinator and Worker are:

1. yarn.scheduler.maximum-allocation-mb must be greater than the Coordinator/Worker container memory, which in turn must be greater than the JVM memory.

2. It is recommended that yarn.scheduler.maximum-allocation-mb be 90% of the node's physical memory, that the Coordinator/Worker container memory be smaller than yarn.scheduler.maximum-allocation-mb, and that the JVM memory be 80% of the Coordinator/Worker container memory.

3. It is recommended to deploy one container per node, to avoid memory fragmentation and wasted resources.

4. The total memory used by the Coordinators, Workers, and AM cannot exceed the maximum memory available to the tenant.
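The memory rules above can be turned into a quick pre-deployment check. The following Python sketch is only an illustration: the names `MemoryPlan` and `check_memory` are hypothetical, it assumes Coordinator and Worker containers have the same size (in line with the recommendation to keep their configurations consistent), and it applies the 90% and 80% ratios from rule 2. It is not a HetuEngine or Yarn tool.

```python
from dataclasses import dataclass

# Ratios taken from recommendation 2 above.
YARN_MAX_ALLOC_RATIO = 0.9  # yarn.scheduler.maximum-allocation-mb vs. node physical memory
JVM_RATIO = 0.8             # JVM memory vs. Coordinator/Worker container memory


@dataclass
class MemoryPlan:
    """Hypothetical sizing plan; Coordinator and Worker containers share one size."""
    node_physical_mb: int
    container_mb: int    # Coordinator/Worker container memory
    coordinators: int
    workers: int
    am_mb: int           # Yarn ApplicationMaster container memory
    tenant_max_mb: int   # maximum memory available to the tenant

    @property
    def yarn_max_alloc_mb(self) -> int:
        return int(self.node_physical_mb * YARN_MAX_ALLOC_RATIO)

    @property
    def jvm_mb(self) -> int:
        return int(self.container_mb * JVM_RATIO)


def check_memory(plan: MemoryPlan) -> list[str]:
    """Return a list of violated rules (empty if the plan passes)."""
    problems = []
    # Rule 1: maximum-allocation-mb > container memory > JVM memory.
    if not (plan.yarn_max_alloc_mb > plan.container_mb > plan.jvm_mb):
        problems.append("need maximum-allocation-mb > container memory > JVM memory")
    # Rule 4: Coordinators + Workers + AM must fit in the tenant's memory.
    total = (plan.coordinators + plan.workers) * plan.container_mb + plan.am_mb
    if total > plan.tenant_max_mb:
        problems.append(f"total {total} MB exceeds tenant limit {plan.tenant_max_mb} MB")
    return problems
```

For example, a node with 262144 MB (256 GB) of physical memory gives a maximum-allocation-mb of 235929 MB under the 90% rule; a 225280 MB container then leaves 180224 MB for the JVM.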

• The vCPU configuration requirements for the Coordinator and Worker are:

1. yarn.scheduler.maximum-allocation-vcores must be greater than the vcores of the Coordinator and Worker.

2. It is recommended that the Coordinator and Worker vcores be 2 to 10 less than yarn.scheduler.maximum-allocation-vcores.

3. The total vcores used by the Coordinators, Workers, and AM cannot exceed the maximum vcores available to the tenant.
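The CPU rules can be checked the same way. This is a hypothetical helper, not part of HetuEngine; like the memory sketch, it assumes Coordinator and Worker containers request the same number of vcores, and it treats the 2-to-10 headroom in rule 2 as a recommendation that is reported alongside the hard limits.

```python
def check_vcores(max_alloc_vcores: int, container_vcores: int, coordinators: int,
                 workers: int, am_vcores: int, tenant_max_vcores: int) -> list[str]:
    """Check a hypothetical vcore plan against the three CPU rules above."""
    problems = []
    # Rule 1: maximum-allocation-vcores must be greater than the container vcores.
    if container_vcores >= max_alloc_vcores:
        problems.append("maximum-allocation-vcores must exceed container vcores")
    # Rule 2 (recommendation): leave 2 to 10 vcores of headroom.
    headroom = max_alloc_vcores - container_vcores
    if not 2 <= headroom <= 10:
        problems.append(f"recommended headroom is 2 to 10 vcores, got {headroom}")
    # Rule 3: Coordinators + Workers + AM must fit in the tenant's vcores.
    total = (coordinators + workers) * container_vcores + am_vcores
    if total > tenant_max_vcores:
        problems.append(f"total {total} vcores exceeds tenant limit {tenant_max_vcores}")
    return problems
```

With 64 vcores per container slot, 60-vcore containers leave 4 vcores of headroom, which falls inside the recommended range.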

Example of Queue Resource Planning Configuration

Compute Instance Size Estimation

Roughly estimate the size of the computing instance and the number of Workers based on the volume of business data.

Yarn parameters and computing instance memory configuration

  • Yarn parameter adjustment

Adjust the Yarn parameters that cap the cores and memory of a single container (yarn.scheduler.maximum-allocation-vcores and yarn.scheduler.maximum-allocation-mb) so that they accommodate the estimated computing instance size. These parameters are modified at the Yarn service level.

  • Compute Instance Memory Adjustment

HetuEngine configuration (it is recommended that the Coordinator (CN) and Worker configurations be consistent): on the HSConsole page, select the computing instance, click "Configure", and modify the settings in the pop-up window, as shown in the figure below:

3. Recommended multi-instance configuration under high concurrency

It is recommended to keep the concurrency of a single HetuEngine computing instance below 50. In high-concurrency scenarios, start multiple computing instances to share the load and avoid significant performance degradation. HetuEngine supports two ways to run multiple computing instances: the single-tenant single-instance mode and the single-tenant multi-instance mode.
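As a rough planning aid, the number of computing instances needed for a given peak concurrency can be estimated from the per-instance limit. The sketch below is hypothetical: it reads "below 50" strictly (at most 49 concurrent queries per instance) and assumes the load is spread evenly across instances, which real workloads may not do.

```python
import math


def instances_needed(peak_concurrency: int, limit_per_instance: int = 50) -> int:
    """Minimum number of computing instances so that each stays strictly below
    limit_per_instance concurrent queries, assuming an even spread of load."""
    if peak_concurrency < 0 or limit_per_instance < 2:
        raise ValueError("invalid arguments")
    per_instance = limit_per_instance - 1  # "below 50" means at most 49 each
    return max(1, math.ceil(peak_concurrency / per_instance))
```

Under this reading, a peak of 100 concurrent queries calls for three instances rather than two, because two instances would each carry 50 queries.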

Method 1: Single-tenant single-instance deployment mode.

Resources are divided into multiple resource pools; each tenant exclusively occupies one resource pool and launches one computing instance. For example, resources can be divided into three resource pools, default, online, and offline, used by the default, online, and offline tenants respectively. Each tenant starts one computing instance, and different services are submitted to different resource queues:

Method 2: Single-tenant multi-instance deployment mode.

After version 320, HetuEngine supports starting multiple computing instances within a single tenant through configuration. As shown in the figure below, different services are submitted to queues in the same tenant, and HetuEngine automatically balances the load among the computing instances in that tenant.
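The article does not describe which balancing policy HetuEngine uses internally. Purely to illustrate the idea of sharing load among several instances in one tenant, here is a generic least-loaded dispatcher in Python; the class name and the instance IDs are hypothetical, and this is not HetuEngine's actual algorithm.

```python
class LeastLoadedDispatcher:
    """Toy dispatcher: send each new query to the instance with the fewest running queries."""

    def __init__(self, instance_ids: list[str]) -> None:
        if not instance_ids:
            raise ValueError("need at least one computing instance")
        self.running = {iid: 0 for iid in instance_ids}

    def submit(self) -> str:
        # Pick the least-loaded instance; ties are broken by instance ID for determinism.
        target = min(self.running, key=lambda iid: (self.running[iid], iid))
        self.running[target] += 1
        return target

    def finish(self, instance_id: str) -> None:
        # Record that a query on instance_id has completed.
        if self.running[instance_id] == 0:
            raise ValueError(f"no running queries on {instance_id}")
        self.running[instance_id] -= 1
```

Least-loaded selection is just one common policy; round-robin or weighted schemes are equally plausible designs.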

4. HetuEngine data source docking

HetuEngine supports fast joint queries across sources (multiple data sources such as Hive, HBase, GaussDB (DWS), Elasticsearch, and ClickHouse) and across domains (multiple regions or data centers). It is especially well suited to interactive, fast queries on Hive and Hudi data in Hadoop clusters (FusionInsight MRS). This chapter introduces HetuEngine's data source docking capabilities and how to use them in practice.

Data source docking overview

HetuEngine data source docking currently provides the following capabilities:

1. Docking with various data sources such as Hive, HBase, GaussDB (DWS), Elasticsearch, ClickHouse, Hudi, and IoTDB, as well as with HetuEngine in other domains (cross-domain docking).

2. Fast joint queries across multiple data sources, with visual data source configuration and management pages. Users can quickly add data sources through the HSConsole interface and configure each one differently.

3. Data sources take effect dynamically, without restarting the computing instance.

4. Data source pushdown.

Multiple data source docking

The data sources supported by the current version of HetuEngine are shown in Table 1

Visual data source management interface

HetuEngine supports fast joint queries across multiple data sources and provides visual data source configuration and management pages. Users can quickly add data sources through the HSConsole interface and configure each one differently. An example is shown in the figure below.

Custom configuration for the corresponding data source can be added in the "Custom Configuration" area at the bottom of the page.

The data source takes effect dynamically

Operations such as adding, configuring, and deleting data sources, whether performed on the HSConsole interface or through the HSConsole REST API, take effect dynamically without restarting the computing instance.

By default, data source changes take effect dynamically after 60 seconds. To change this interval, add the following parameter to the custom configuration of the computing instance, for example:

catalog.scanner-interval=120s
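To reason about how long a data source change may take to become visible, the sketch below models the interval as a periodic rescan of catalogs. That model is an assumption suggested by the parameter name, not something the article documents, and so is the set of duration suffixes accepted by the hypothetical `parse_duration` helper.

```python
import math
import re

# Duration suffixes assumed for this sketch ("120s", "2m", ...).
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0, "d": 86400.0}


def parse_duration(text: str) -> float:
    """Convert a duration string such as '120s' into seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|s|m|h|d)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def effective_at(change_time: float, interval: float, first_scan: float = 0.0) -> float:
    """Time of the first periodic scan at or after change_time (assumed scan model)."""
    scans = math.ceil((change_time - first_scan) / interval)
    return first_scan + max(scans, 0) * interval
```

Under this model, the worst-case wait is one full interval, so raising the value from 60s to 120s trades less frequent scanning for changes that can take up to twice as long to appear.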

Data source calculation pushdown

HetuEngine supports query pushdown, which pushes queries, or parts of queries, down to the connected data sources. This means that specific predicates, aggregate functions, and other operations can be passed to the underlying database or file system for processing. Query pushdown brings the following benefits:

  1. Improve overall query performance.

  2. Reduce network traffic between HetuEngine and data sources.

  3. Reduce the load on remote data sources.
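To make the network-traffic benefit concrete, the following toy Python model compares filtering after transfer with filtering at the source. `ToySource` and the sample rows are invented for illustration; a real connector decides which operations it can actually push down.

```python
from typing import Callable, Iterator, Optional

Row = dict
Predicate = Callable[[Row], bool]


class ToySource:
    """Stand-in for a remote data source that counts the rows it ships to the engine."""

    def __init__(self, rows: list[Row]) -> None:
        self.rows = rows
        self.rows_sent = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterator[Row]:
        # With pushdown, the source evaluates the predicate before sending rows.
        for row in self.rows:
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


def query_without_pushdown(source: ToySource, predicate: Predicate) -> list[Row]:
    # Every row crosses the network; the engine filters afterwards.
    return [row for row in source.scan() if predicate(row)]


def query_with_pushdown(source: ToySource, predicate: Predicate) -> list[Row]:
    # Only matching rows cross the network.
    return list(source.scan(predicate))
```

With 100 rows of which 25 match, both functions return the same 25 rows, but the pushdown version ships 25 rows instead of 100.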

The extent of HetuEngine's query pushdown support depends on the specific connector and on the underlying data source or storage system that the connector accesses.

Origin my.oschina.net/u/4526289/blog/10098800