Hive Study Notes 2: Hive on Tez

Environment

Component  Version
hadoop     2.6.5
hive       2.3.6
tez        0.8.5

Tez requires particular Hadoop versions. Tez 0.8 and later requires Hadoop 2.6 or later, and Tez 0.9 and later requires Hadoop 2.7 or later.
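You can check the version rule above before downloading anything. The sketch below encodes only the two rules stated here. The function names are hypothetical, and it assumes a `sort` that supports `-V` (version sort, as in GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: check a Tez/Hadoop pairing against the two rules stated above.
# Hypothetical helpers; assumes sort(1) supports -V (GNU coreutils).

# Succeeds when dotted version $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Returns 0 if compatible, 1 if not, 2 if the Tez line is not covered by the rules.
tez_hadoop_compatible() {
  local tez="$1" hadoop="$2"
  if version_ge "$tez" 0.9; then
    version_ge "$hadoop" 2.7      # Tez >= 0.9 needs Hadoop >= 2.7
  elif version_ge "$tez" 0.8; then
    version_ge "$hadoop" 2.6      # Tez >= 0.8 needs Hadoop >= 2.6
  else
    return 2                      # older Tez: not covered by this note
  fi
}
```

For the pairing used in this post, `tez_hadoop_compatible 0.8.5 2.6.5 && echo ok` succeeds.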

Download, install, and configure Tez

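  1. Download the matching Tez release, e.g. apache-tez-0.8.5-bin.tar.gz, from the Tsinghua mirror. Extract it into /usr/local/src and create a symlink to it (/usr/local/src/tez, used as TEZ_HOME below). The Tez website describes getting Tez by building it from source. Compiling is slow, so this post uses the prebuilt package apache-tez-0.8.5-bin.tar.gz instead.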
  2. Create a directory on HDFS and copy the Tez tarball into it. The tez.tar.gz archive is in ${TEZ_HOME}/share. Run the commands below from /usr/local/src.
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put tez/share/tez.tar.gz /apps/tez

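Re-running the upload after an upgrade fails if the file already exists in HDFS. Below is a sketch of an idempotent version. It assumes `hdfs` is on the PATH, and `upload_tez_tarball` is a hypothetical helper name. `-put -f` (overwrite) and `-test -e` (existence check) are standard `hdfs dfs` options.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): upload tez.tar.gz to HDFS and confirm it landed.
# $1 = local archive, e.g. ${TEZ_HOME}/share/tez.tar.gz
# $2 = HDFS directory that tez.lib.uris will point at, e.g. /apps/tez
upload_tez_tarball() {
  local src="$1" dir="$2"
  if [ ! -f "$src" ]; then
    echo "local archive not found: $src" >&2
    return 1
  fi
  hdfs dfs -mkdir -p "$dir" || return 1
  # -f overwrites an existing copy, so the step can be repeated safely
  hdfs dfs -put -f "$src" "$dir/" || return 1
  # -test -e exits 0 only if the file now exists in HDFS
  hdfs dfs -test -e "$dir/$(basename "$src")"
}
```

Usage: `upload_tez_tarball "${TEZ_HOME}/share/tez.tar.gz" /apps/tez`.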

  3. Create a tez-site.xml file in ${HADOOP_HOME}/etc/hadoop with the content below.
    Set the tez.lib.uris property to the HDFS path of the tez.tar.gz uploaded above. Then copy tez-site.xml to ${HADOOP_HOME}/etc/hadoop on every node.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
    </property>
</configuration>
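You can script both writing the file and copying it to every node. The sketch below makes a few assumptions:

- Passwordless `scp` works between nodes.
- ${HADOOP_HOME} is the same path on every node.
- The host names given to `distribute_tez_site` are placeholders.
- Both function names are hypothetical.

Note the quoted `'EOF'`. Without the quotes, the shell would try to expand `${fs.defaultFS}` and fail. The text must reach Hadoop literally, so that Hadoop's own configuration-variable expansion can resolve it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): write tez-site.xml and push it to other nodes.

# $1 = config directory, normally ${HADOOP_HOME}/etc/hadoop
write_tez_site() {
  # Quoted 'EOF' disables shell expansion, leaving ${fs.defaultFS} for Hadoop.
  cat > "$1/tez-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
    </property>
</configuration>
EOF
}

# $1 = config directory (same path on every node);
# remaining args = host names (placeholders, e.g. slave1 slave2)
distribute_tez_site() {
  local dir="$1" host
  shift
  for host in "$@"; do
    scp "$dir/tez-site.xml" "$host:$dir/" || return 1
  done
}
```

Usage: `write_tez_site "$HADOOP_HOME/etc/hadoop" && distribute_tez_site "$HADOOP_HOME/etc/hadoop" slave1 slave2`.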

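  4. On every node, set the following environment variables so that the Hadoop classpath includes the Tez libraries. Put them in /etc/profile, which is reloaded later when switching back to Tez.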
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS=${TEZ_HOME}/*:${TEZ_HOME}/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
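To confirm the variables took effect in a given shell, you can search the classpath for the two Tez entries. `classpath_has_tez` is a hypothetical helper. It looks for exactly the two wildcard entries that TEZ_JARS builds above, compared as literal strings.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed only if the classpath string contains
# both Tez entries exported above, ${TEZ_HOME}/* and ${TEZ_HOME}/lib/*,
# matched literally (the * is not expanded).
# $1 = classpath string, $2 = TEZ_HOME
classpath_has_tez() {
  local cp=":$1:" tez_home="$2"
  case "$cp" in
    *":$tez_home/*:"*) ;;
    *) return 1 ;;
  esac
  case "$cp" in
    *":$tez_home/lib/*:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `classpath_has_tez "$HADOOP_CLASSPATH" "$TEZ_HOME" && echo "Tez is on HADOOP_CLASSPATH"`.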
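  5. Configure the Tez UI
    The Tez UI depends on the YARN Timeline Server. Before Hadoop 2.4, the only job-history service was the MapReduce-specific Job History Server, which lets users query information about completed jobs. As YARN came to host more computing frameworks, such as Spark and Tez, those engines also needed job-monitoring tools. The Hadoop developers therefore built a more general history server: the YARN Timeline Server.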

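Add the following to yarn-site.xml (inside <configuration>) to configure the YARN Timeline Server. For more detailed settings, see the TimelineServer documentation.
Pay particular attention to yarn.timeline-service.hostname: it must be the host on which the Timeline Server is started. I start it on master, so the value here is master. The official default is 0.0.0.0. With that value, the DAG master reports an error saying it cannot find the Timeline Server. Setting the real hostname fixes this.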

<!--configurations for timelineserver-->
    <property>
        <name>yarn.timeline-service.hostname</name>
        <value>master</value>
    </property>
    <property>
        <description>Address for the Timeline server to start the RPC server.</description>
        <name>yarn.timeline-service.address</name>
        <value>${yarn.timeline-service.hostname}:10200</value>
    </property>
    
    <property>
        <description>The http address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.address</name>
        <value>${yarn.timeline-service.hostname}:8188</value>
    </property>
    
    <property>
        <description>The https address of the Timeline service web application.</description>
        <name>yarn.timeline-service.webapp.https.address</name>
        <value>${yarn.timeline-service.hostname}:8190</value>
    </property>
    
    <property>
        <description>Handler thread count to serve the client RPC requests.</description>
        <name>yarn.timeline-service.handler-thread-count</name>
        <value>10</value>
    </property>
    
    <property>
        <description>The max number of applications could be fetched by using REST API
         or application history protocol and shown in timeline server web ui. Defaults
         to `10000`.</description>
        <name>yarn.timeline-service.generic-application-history.max-applications</name>
        <value>10000</value>
    </property>
    
    <property>
        <description>Enables cross-origin support (CORS) for web services where
        cross-origin web response headers are needed. For example, javascript making
        a web services request to the timeline server.</description>
        <name>yarn.timeline-service.http-cross-origin.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Comma separated list of origins that are allowed for web
        services needing cross-origin (CORS) support. Wildcards (*) and patterns
        allowed</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
        <value>*</value>
    </property>
    
    <property>
        <description>Comma separated list of methods that are allowed for web
        services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
        <value>GET,POST,HEAD</value>
    </property>
    
    <property>
        <description>Comma separated list of headers that are allowed for web
        services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
        <value>X-Requested-With,Content-Type,Accept,Origin</value>
    </property>
    
    <property>
        <description>The number of seconds a pre-flighted request can be cached
        for web services needing cross-origin (CORS) support.</description>
        <name>yarn.timeline-service.http-cross-origin.max-age</name>
        <value>1800</value>
    </property>
    
    <property>
        <description>Indicate to ResourceManager as well as clients whether
        history-service is enabled or not. If enabled, ResourceManager starts
        recording historical data that Timeline service can consume. Similarly,
        clients can redirect to the history service when applications
        finish if this is enabled.</description>
        <name>yarn.timeline-service.generic-application-history.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Store class name for history store, defaulting to file system
        store</description>
        <name>yarn.timeline-service.generic-application-history.store-class</name>
        <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
    </property>

    <property>
        <description>Indicate to clients whether Timeline service is enabled or not.
        If enabled, the TimelineClient library used by end-users will post entities
        and events to the Timeline server.</description>
        <name>yarn.timeline-service.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Store class name for timeline store.</description>
        <name>yarn.timeline-service.store-class</name>
        <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
    </property>
    
    <property>
        <description>Enable age off of timeline store data.</description>
        <name>yarn.timeline-service.ttl-enable</name>
        <value>true</value>
    </property>

    <property>
        <description>Publish YARN information to Timeline Server</description>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>true</value>
    </property>
    
    <property>
        <description>Time to live for timeline store data in milliseconds.</description>
        <name>yarn.timeline-service.ttl-ms</name>
        <value>604800000</value>
    </property>
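You can catch the 0.0.0.0 pitfall described above before starting any services. The sketch below reads a property out of a Hadoop-style XML file with awk. It assumes each `<name>` is followed by its `<value>`, either on the same line or a later one, as in the snippets in this post. It is not a general XML parser, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers). Line-based, not a real XML parser.

# Print the value of property $2 from Hadoop-style XML file $1:
# find the <name> line, then take the first <value> at or after it.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Fail if yarn.timeline-service.hostname is absent or 0.0.0.0
# (0.0.0.0 is also what applies when the property is absent).
check_timeline_host() {
  local host
  host=$(get_prop "$1" yarn.timeline-service.hostname)
  if [ -z "$host" ] || [ "$host" = "0.0.0.0" ]; then
    echo "yarn.timeline-service.hostname is '${host:-unset}'; use the Timeline Server's real hostname" >&2
    return 1
  fi
  echo "$host"
}
```

Usage: `check_timeline_host "$HADOOP_HOME/etc/hadoop/yarn-site.xml"` prints `master` for the configuration above.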

Add the following to tez-site.xml (inside <configuration>) to configure the Tez UI:

<property>
    <name>tez.history.logging.service.class</name>
    <value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
    <name>tez.tez-ui.history-url.base</name>
    <value>http://master:8080/tez-ui/</value>
</property>
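Tomcat serves a WAR named `tez-ui.war` under the context path `/tez-ui/`. That is why the next step renames the WAR: its name must match tez.tez-ui.history-url.base. The sketch below derives the WAR name from that URL and copies the file. The helper names are hypothetical, and it handles single-segment context paths only. TEZ_HOME and TOMCAT_HOME are passed in as arguments.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): keep the deployed WAR name consistent
# with tez.tez-ui.history-url.base.

# http://master:8080/tez-ui/ -> tez-ui.war (single-segment context paths only)
war_name_from_url() {
  local path="${1#*://}"   # drop the scheme:   master:8080/tez-ui/
  path="${path#*/}"        # drop host:port:    tez-ui/
  path="${path%/}"         # drop trailing /:   tez-ui
  echo "${path}.war"
}

# $1 = TEZ_HOME, $2 = TOMCAT_HOME, $3 = history-url.base, $4 = Tez version
deploy_tez_ui() {
  local src="$1/tez-ui-$4.war"
  if [ ! -f "$src" ]; then
    echo "WAR not found: $src" >&2
    return 1
  fi
  cp "$src" "$2/webapps/$(war_name_from_url "$3")"
}
```

Usage: `deploy_tez_ui "$TEZ_HOME" "$TOMCAT_HOME" http://master:8080/tez-ui/ 0.8.5`.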
  6. Install Tomcat on one node, e.g. master. Copy ${TEZ_HOME}/tez-ui-0.8.5.war into ${TOMCAT_HOME}/webapps/ and rename it tez-ui.war. This name matches the tez.tez-ui.history-url.base value in tez-site.xml above.
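  7. If Tomcat is not installed on the node that runs the YARN Timeline Server, edit tez-ui/scripts/configs.js (under ${TOMCAT_HOME}/webapps/ once Tomcat has unpacked the WAR). Set timelineBaseUrl and RMWebUrl to the correct addresses.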
  8. Edit hive-site.xml and set the execution engine to tez:
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
    <description/>
</property>
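  9. After editing the files, start the HDFS and YARN clusters, the Timeline Server, and Tomcat:
start-dfs.sh
start-yarn.sh
# in Hadoop 2.x, "yarn-daemon.sh start historyserver" is a deprecated alias for the next line
yarn-daemon.sh start timelineserver
${TOMCAT_HOME}/bin/startup.sh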
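To confirm everything started, check `jps` on each node. The sketch below reports which expected JVM processes are missing from a `jps` listing. It assumes the usual short names that `jps` shows: NameNode and ResourceManager, ApplicationHistoryServer for the Timeline Server, and Bootstrap for Tomcat. Check these against your own output. Which daemons run on which node depends on your cluster layout, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): print expected process names absent from a jps listing.
# $1 = output of `jps` (lines like "12345 NameNode"); remaining args = names to expect
missing_daemons() {
  local listing="$1" name missing=""
  shift
  for name in "$@"; do
    # compare against the second column only, as an exact full-line match
    if ! printf '%s\n' "$listing" | awk '{print $2}' | grep -qx -- "$name"; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
}
```

On master, `missing_daemons "$(jps)" NameNode ResourceManager ApplicationHistoryServer Bootstrap` prints nothing when all four are running.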

Testing Hive on Tez

After you run an HQL statement in Hive, the query is executed by Tez. From the application's entry in the YARN UI, you can click through to the Tez UI.
By default, each application's history data is stored under yarn.timeline-service.leveldb-timeline-store.path, whose default value is ${hadoop.tmp.dir}/yarn/timeline.
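To locate that directory on the Timeline Server host, note that hadoop.tmp.dir itself defaults to /tmp/hadoop-${user.name}. The sketch below resolves the default store path. It also converts the yarn.timeline-service.ttl-ms value set earlier (604800000) into days. The helper names are hypothetical. The path shown is only the default; setting yarn.timeline-service.leveldb-timeline-store.path overrides it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): default LevelDB timeline store path and TTL in days.

# $1 = hadoop.tmp.dir if set in core-site.xml (empty = Hadoop default)
# $2 = user running the Timeline Server
timeline_store_path() {
  local tmp="${1:-/tmp/hadoop-$2}"
  echo "$tmp/yarn/timeline"
}

# yarn.timeline-service.ttl-ms -> whole days (ms / 1000 / 86400)
ttl_days() {
  echo $(( $1 / 1000 / 86400 ))
}
```

For example, `du -sh "$(timeline_store_path "" root)"` shows how much space the store uses. `ttl_days 604800000` prints 7, so with ttl-enable set to true, entries older than a week are aged off.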

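To go back to Hive on MR:

- Use unset to clear the Tez environment variables and HADOOP_CLASSPATH in the current session.
- Change the execution engine in hive-site.xml.
- Restart the HiveServer2 service and reconnect with beeline.

To use Hive on Tez again:

- Run source /etc/profile to reload the Tez environment variables and HADOOP_CLASSPATH.
- Change the execution engine in hive-site.xml back.
- Restart HiveServer2 and reconnect with beeline.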

unset HADOOP_CLASSPATH
unset TEZ_CONF_DIR
unset TEZ_HOME
unset TEZ_JARS

beeline -u jdbc:hive2://master:10000 -n root --hiveconf hive.execution.engine=mr

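Switching directly to the MR engine without these steps may fail with errors such as NoSuchFieldError, a clear sign of a version incompatibility.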
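A single sed range can make the hive-site.xml edit in either direction. The sketch assumes the property is laid out as in the hive-site.xml snippet above: `<name>` on its own line, then `<value>`, before `</property>`. `set_hive_engine` is a hypothetical helper. It writes through a temporary file instead of using `sed -i`, because that flag behaves differently in GNU and BSD sed. You still need to restart HiveServer2 afterwards.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): switch hive.execution.engine between mr and tez.
# Assumes a multi-line <property> block like the one shown for hive-site.xml.
# $1 = path to hive-site.xml, $2 = mr | tez
set_hive_engine() {
  local file="$1" engine="$2"
  case "$engine" in
    mr|tez) ;;
    *) echo "unsupported engine: $engine" >&2; return 1 ;;
  esac
  # Within the range from the engine's <name> line to the next </property>,
  # replace whatever <value> holds with the requested engine.
  sed "/<name>hive\.execution\.engine<\/name>/,/<\/property>/ s|<value>[^<]*</value>|<value>$engine</value>|" \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage, assuming hive-site.xml lives in $HIVE_HOME/conf: `set_hive_engine "$HIVE_HOME/conf/hive-site.xml" mr`, then restart HiveServer2.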

References

TimelineServer
Tez-install
tez-ui


Reposted from blog.csdn.net/qq_23120963/article/details/104604707