Installing the Tez execution engine for Hive 2

1. Introduction to Tez

The Apache Tez project aims to build an application framework for processing data as a complex directed acyclic graph (DAG) of tasks. It is currently built on top of Apache Hadoop YARN.
Tez has two main design themes:
Empowering end users through:
  - Expressive dataflow definition APIs
  - A flexible input-processor-output runtime model
  - Data type agnosticism
  - Simplified deployment
Execution performance:
  - Performance gains over MapReduce
  - Optimal resource management
  - Plan reconfiguration at runtime
  - Dynamic physical dataflow decisions

By letting projects such as Apache Hive and Apache Pig run a complex DAG of tasks, Tez can process in a single Tez job data that previously required multiple MapReduce (MR) jobs, as shown below.

image-20201025185909626

2. Installation Guide

1) Download the Tez binary package from http://tez.apache.org, or from the Huawei Cloud mirror in China: https://mirrors.huaweicloud.com/apache/tez/0.9.2/

2) Copy apache-tez-0.9.2-bin.tar.gz to the /software directory on the houda host

3) Extract the installation package to /opt
[root@houda share]# tar -zxvf /software/apache-tez-0.9.2-bin.tar.gz -C /opt/
4) Rename the extracted directory
[root@houda share]# mv /opt/apache-tez-0.9.2-bin /opt/tez

5) Upload tez.tar.gz to the /tez directory of HDFS

[root@houda opt]# cd /opt/tez/share/
[root@houda share]# hadoop fs -mkdir /tez
[root@houda share]# hadoop fs -put ./tez.tar.gz /tez
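6) To avoid conflicts with the logging jars of Hadoop and Hive, delete the SLF4J log4j binding jar that ships with Tez. The path is given in full because the relative path tez/lib does not exist under /opt/tez/share.
[root@houda share]# rm -f /opt/tez/lib/slf4j-log4j12-1.7.10.jar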
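
To see which SLF4J binding jars are present before or after the cleanup, something like the following could be used. It is a minimal sketch: the function name list_slf4j_bindings is hypothetical, and the two file-name patterns are assumptions based only on the jar names that appear in this guide.

```shell
# List jars in the given directories whose names look like SLF4J bindings.
# The name patterns are an assumption taken from the jars named in this guide.
list_slf4j_bindings() {
  for dir in "$@"; do
    # Skip directories that do not exist instead of failing.
    [ -d "$dir" ] || continue
    find "$dir" -maxdepth 1 -type f \
      \( -name 'slf4j-log4j12-*.jar' -o -name 'log4j-slf4j-impl-*.jar' \) | sort
  done
}

# Example (paths taken from this guide):
# list_slf4j_bindings /opt/tez/lib /opt/hive/lib /opt/hadoop-2.7.6/share/hadoop/common/lib
```

After the deletion, /opt/tez/lib should no longer appear in the output; the Hive and Hadoop bindings remain, which matches the SLF4J notice in the test output at the end of this guide.
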

7) Create the tez-site.xml file in the Hadoop configuration directory

[root@houda share]# vim $HADOOP_HOME/etc/hadoop/tez-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://houda:9000/tez/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
  <property>
    <description>Enable Tez to use the Timeline Server for History Logging</description>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
  </property>
</configuration>
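
A common mistake at this point is a tez.lib.uris value that does not match the path uploaded to HDFS. The sketch below reads a property value back out of the file so it can be checked. The function name get_site_property is hypothetical, and the parsing is line-based: it assumes each <name> and <value> sits on its own line with <value> after <name>, as in the file above, and is not a real XML parser.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style *-site.xml.
# Line-based sketch: assumes <name> and <value> are on separate lines and
# that <value> comes after <name> inside the same <property>.
get_site_property() {
  file="$1"; name="$2"
  awk -v name="$name" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, "")
      print; exit
    }
    /<\/property>/ { found = 0 }
  ' "$file"
}

# Example: confirm that the archive referenced by tez.lib.uris exists in HDFS.
# hadoop fs -ls "$(get_site_property "$HADOOP_HOME/etc/hadoop/tez-site.xml" tez.lib.uris)"
```
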
8) Edit the hadoop-env.sh script and append the following configuration at the end
export TEZ_CONF_DIR=/opt/hadoop-2.7.6/etc/hadoop
export TEZ_JARS=/opt/tez
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
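
To confirm that the Tez entries actually reach the Hadoop classpath, the output of the hadoop classpath command can be inspected. The helper below is a small sketch; the name classpath_contains is hypothetical, and it simply checks whether any colon-separated entry contains the given string.

```shell
# Succeed if any entry of a colon-separated classpath contains the given string.
classpath_contains() {
  classpath="$1"; needle="$2"
  printf '%s\n' "$classpath" | tr ':' '\n' | grep -q -F -- "$needle"
}

# Example (run in a new shell so the edited hadoop-env.sh is picked up):
# classpath_contains "$(hadoop classpath)" /opt/tez && echo "Tez jars are on the classpath"
```
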
9) Set the NodeManager resource configuration in yarn-site.xml
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>22528</value>
    <description>Memory available on each node, in MB</description>
</property>

<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1500</value>
    <description>Minimum memory a single task can request; the default is 1024 MB</description>
</property>

<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>16384</value>
    <description>Maximum memory a single task can request; the default is 8192 MB</description>
</property>
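
The three values are related: the minimum allocation should not exceed the maximum, and a request larger than a node's memory can never be placed on that node. The sketch below checks this and shows roughly how many containers fit on one node. The function name check_yarn_memory is hypothetical, and the counts assume containers sized exactly at the minimum or maximum allocation; vcores and scheduler rounding are ignored.

```shell
# Sanity-check the three YARN memory settings (all in MB) and print how many
# containers of the minimum and maximum size fit on one node.
check_yarn_memory() {
  node_mb="$1"; min_mb="$2"; max_mb="$3"
  if [ "$min_mb" -gt "$max_mb" ] || [ "$max_mb" -gt "$node_mb" ]; then
    echo "invalid: expected min <= max <= node memory" >&2
    return 1
  fi
  # Integer division: any remainder is memory that cannot hold another container.
  echo "containers per node at minimum size: $((node_mb / min_mb))"
  echo "containers per node at maximum size: $((node_mb / max_mb))"
}

# Values from the yarn-site.xml above; prints 15 and 1.
check_yarn_memory 22528 1500 16384
```
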
10) Set Tez as the execution engine in hive-site.xml
[root@houda share]# vim /opt/hive/conf/hive-site.xml
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
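
The engine can also be chosen for a single invocation without editing hive-site.xml, using the Hive CLI options --hiveconf and -e. This is handy for comparing the same query on mr and tez. The wrapper below is a sketch: the function name run_with_engine and the HIVE_BIN override are hypothetical and exist only to keep the example short and testable.

```shell
# Run one query with a specific execution engine for this invocation only.
# HIVE_BIN is a hypothetical override; by default the hive command is used.
run_with_engine() {
  engine="$1"; query="$2"
  "${HIVE_BIN:-hive}" --hiveconf "hive.execution.engine=${engine}" -e "$query"
}

# Example (table name taken from the test query in this guide):
# run_with_engine mr  'select count(*) from default.emp;'
# run_with_engine tez 'select count(*) from default.emp;'
```
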
11) Restart the Hadoop services and run a test query
[root@houda share]# stop-all.sh && start-all.sh
[root@houda share]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.6/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j2.properties Async: true
hive (default)> select count(*) from default.emp;
Query ID = root_20201025200104_58fc10de-25ac-4acc-8d11-24fe0b0c7f0c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1603626670053_0003)
----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.63 s
----------------------------------------------------------------------------------------------
OK
_c0
14
Time taken: 6.419 seconds, Fetched: 1 row(s)
12) If the query completes and the progress output lists Tez vertices (Map 1, Reducer 2) as shown above, the configuration has taken effect.

Origin blog.csdn.net/weixin_38620636/article/details/130404941