[Big Data Hive] 20. Hive tuning-related configuration and viewing execution plans with Explain

1 Yarn resource configuration

  Several YARN parameters related to CPU, memory, and other resources need to be adjusted.
(1) yarn.nodemanager.resource.memory-mb
  sets the amount of memory that a NodeManager node allocates to Containers. It depends on the total memory of the node where the NodeManager runs and on the other services the node hosts, and is generally set to between 1/2 and 2/3 of total memory. On a node with 4 GB of memory, for example, 2048 MB (2 GB) falls in that range.

<!-- Memory the NodeManager node allocates to Containers; set to 2 GB -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>

(2) yarn.nodemanager.resource.cpu-vcores
  sets the number of CPU cores that a NodeManager node allocates to Containers. It depends on the total number of CPU cores of the node where the NodeManager runs and on the other services the node hosts. As a rule of thumb, allocate about 4 GB of memory per core.

<!-- Number of CPU cores the NodeManager node allocates to Containers -->
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
</property>

(3) yarn.scheduler.maximum-allocation-mb
  The maximum amount of memory that a single Container can use; it can be increased slightly if individual tasks need more memory.

<!-- Maximum memory a single Container can use -->
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1536</value>
</property>

(4) yarn.scheduler.minimum-allocation-mb
  The minimum amount of memory that a single Container can use; requests below this value are rounded up to it.

<!-- Minimum memory a single Container can use -->
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
</property>

Modify the yarn-site.xml file accordingly:

<!-- Memory the NodeManager node allocates to Containers; set to 2 GB -->
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
</property>

<!-- Number of CPU cores the NodeManager node allocates to Containers -->
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
</property>

<!-- Maximum memory a single Container can use -->
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>1536</value>
</property>

<!-- Minimum memory a single Container can use -->
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
</property>

Save the file, distribute it to the other nodes, and restart YARN:

xsync yarn-site.xml
stop-yarn.sh
start-yarn.sh

2 MapReduce configuration

  MapReduce resource configuration mainly covers the memory and CPU cores of the Map Task and the Reduce Task.

(1) mapreduce.map.memory.mb
  The amount of container memory requested by a single Map Task; the default is 1024 MB. The value must lie within the range defined by yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb.

  This parameter should be tuned per computing task. In Hive it can be set directly for each SQL statement, as follows:

set mapreduce.map.memory.mb=1536;

(2) mapreduce.map.cpu.vcores
  The number of container CPU cores requested by a single Map Task; the default value is 1.
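
  In Hive this too can be overridden per SQL statement; the value below is only illustrative:

set mapreduce.map.cpu.vcores=2;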

(3) mapreduce.reduce.memory.mb
  The container container memory size applied by a single Reduce Task, and its default value is 1024. The value also cannot exceed the range specified by yarn.scheduler.maximum-allocation-mb and yarn.scheduler.minimum-allocation-mb.
  This parameter needs to be configured individually according to different computing tasks. In hive, you can directly use the following method to configure each SQL statement individually:

set mapreduce.reduce.memory.mb=1536;

(4) mapreduce.reduce.cpu.vcores
  The number of container CPU cores requested by a single Reduce Task; the default value is 1.
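
  As with the Map side, this can be overridden per SQL statement in Hive; the value below is only illustrative:

set mapreduce.reduce.cpu.vcores=2;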

3 Using Explain to view the execution plan

3.1 Overview of the execution plan

  Put simply, the execution plan shows how many MapReduce jobs a SQL statement is ultimately translated into, what is done on the Map side, and what is done on the Reduce side.
  The execution plan displayed by Explain consists of a series of Stages (a SQL statement is divided into several Stages). There are dependencies between Stages, and each Stage corresponds to a MapReduce job, a file system operation, or the like.
  If a Stage corresponds to a MapReduce job, the computation logic of the Map side and the Reduce side is described by its Map Operator Tree and Reduce Operator Tree respectively. An Operator Tree is composed of a series of Operators, and each Operator represents a single logical operation, such as a TableScan Operator, Select Operator, or Join Operator.

For example, in one job's execution plan:
  Stage-0 depends on Stage-1: Stage-1 is a MapReduce job containing a Map Operator Tree and a Reduce Operator Tree, while Stage-0 is a Fetch operation that returns the result to the client.
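
  An abbreviated sketch of what such a plan looks like (the exact operators and fields vary with the query and the Hive version):

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            ...
      Reduce Operator Tree:
        Group By Operator
          ...
  Stage: Stage-0
    Fetch Operator
      ...
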
Common Operators and their functions are as follows:

TableScan: table scan operation; the first operation on the Map side is usually a table scan.

Select Operator: selection (projection) of columns.

Group By Operator: group aggregation operation.

Reduce Output Operator: outputs data to the Reduce side.

Filter Operator: filter operations, such as where and having.

Join Operator: join operation.

File Output Operator: file output operation.

Fetch Operator: the client-side operation that fetches the result data.

3.2 Basic syntax

explain [formatted|extended|dependency] query_sql;

(1) formatted: outputs the execution plan as a JSON string.
(2) extended: outputs extra information in the execution plan, usually details such as the files read and written and temporary file directories.
(3) dependency: outputs the tables or partitions read by the execution plan.
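
  For example, to view the plan of a simple aggregation (the table and column names here are hypothetical):

explain
select deptno, count(*) as cnt
from emp
group by deptno;

  Running explain extended on the same statement additionally prints physical details such as file paths.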

Origin: blog.csdn.net/qq_18625571/article/details/131197316