Environment
| Component | Version |
|---|---|
| hadoop | 2.6.5 |
| hive | 2.3.6 |
| tez | 0.8.5 |
Tez places requirements on the Hadoop version: Tez 0.8 and later need Hadoop 2.6 or later, and Tez 0.9 and later need Hadoop 2.7 or later.
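The version rule above can be checked mechanically before installing anything. The following is a minimal sketch that encodes only the two rules stated here; the function names are made up for illustration, and versions are assumed to have at least a `major.minor` form.

```shell
# version_ge A B: return 0 if dotted version A >= B (compares major, then minor).
version_ge() {
    a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
    b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}
    [ "$a_major" -gt "$b_major" ] && return 0
    [ "$a_major" -lt "$b_major" ] && return 1
    [ "$a_minor" -ge "$b_minor" ]
}

# tez_min_hadoop TEZ_VERSION: print the minimum Hadoop version,
# using only the two rules stated in this article.
tez_min_hadoop() {
    if version_ge "$1" 0.9; then
        echo 2.7
    elif version_ge "$1" 0.8; then
        echo 2.6
    else
        echo unknown
    fi
}

# tez_hadoop_compatible TEZ_VERSION HADOOP_VERSION: return 0 if the pair is allowed.
tez_hadoop_compatible() {
    min=$(tez_min_hadoop "$1")
    [ "$min" != unknown ] && version_ge "$2" "$min"
}

tez_hadoop_compatible 0.8.5 2.6.5 && echo "tez 0.8.5 + hadoop 2.6.5: ok"
tez_hadoop_compatible 0.9.2 2.6.5 || echo "tez 0.9.2 + hadoop 2.6.5: hadoop too old"
```

The combination used in this article (Tez 0.8.5 on Hadoop 2.6.5) passes the check, while a 0.9.x release on the same Hadoop would not.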
Download, install, and configure Tez
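- Download the Tez release that matches your Hadoop version from the Tsinghua mirror, for example `apache-tez-0.8.5-bin.tar.gz`, extract it under `/usr/local/src`, and create a symbolic link to it, as shown in the figure below. The official Tez site describes obtaining Tez by building from source, but because that build is slow, this guide uses the prebuilt package `apache-tez-0.8.5-bin.tar.gz` directly.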
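The extract-and-link step might look like the sketch below. `install_tez` is a hypothetical helper, not part of Tez, and it assumes the archive contains a single top-level directory (such as `apache-tez-0.8.5-bin/`). The download itself is left out because the exact mirror URL is not given here.

```shell
# install_tez TARBALL DEST_DIR
# Extracts the binary tarball into DEST_DIR and points DEST_DIR/tez at it.
install_tez() {
    tarball=$1
    dest=$2
    # Name of the top-level directory inside the archive
    # (assumes every entry lives under one directory).
    top=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
    tar -xzf "$tarball" -C "$dest" || return 1
    # -sfn replaces an existing link instead of creating a link inside it.
    ln -sfn "$dest/$top" "$dest/tez"
}

# Example with the paths used in this article:
# install_tez apache-tez-0.8.5-bin.tar.gz /usr/local/src
```

Linking to a version-neutral name such as `/usr/local/src/tez` means `TEZ_HOME` does not have to change when a different Tez release is installed later.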
- Create a directory on HDFS and copy the Tez tarball into it. The `tez.tar.gz` archive is located in `${TEZ_HOME}/share`.
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -put tez/share/tez.tar.gz /apps/tez
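- Create `tez-site.xml` in `${HADOOP_HOME}/etc/hadoop` with the content below. Set the `tez.lib.uris` property to the path of the `tez.tar.gz` that was just uploaded to HDFS. When the file is complete, copy `tez-site.xml` to `${HADOOP_HOME}/etc/hadoop` on every node.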
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/apps/tez/tez.tar.gz</value>
</property>
</configuration>
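Copying `tez-site.xml` to every node can be scripted. The sketch below is one possible approach, assuming passwordless `scp` between nodes; `sync_conf` is a hypothetical helper, `slave1` and `slave2` are placeholder hostnames, and the fallback value for `HADOOP_HOME` is an assumption. With `DRY_RUN=1` it only prints the commands it would run.

```shell
# sync_conf FILE HOST...
# Copies FILE to the same path on every HOST.
# With DRY_RUN=1 the scp commands are printed instead of executed.
sync_conf() {
    file=$1
    shift
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scp $file $host:$file"
        else
            scp "$file" "$host:$file" || return 1
        fi
    done
}

# Assumed install location; adjust to your cluster.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/src/hadoop}

# slave1 and slave2 are hypothetical hostnames.
DRY_RUN=1
sync_conf "$HADOOP_HOME/etc/hadoop/tez-site.xml" slave1 slave2
```

Setting `DRY_RUN=0` performs the actual copy once the printed commands look right.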
- On every node, set the Hadoop classpath environment variables so that they include the Tez libraries.
export TEZ_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export TEZ_HOME=/usr/local/src/tez
export TEZ_JARS=${TEZ_HOME}/*:${TEZ_HOME}/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
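After exporting these variables, it can help to confirm that the Tez entries really ended up in `HADOOP_CLASSPATH`. The following is a small sketch with hypothetical helper names; it only inspects the environment variables set above and does not contact the cluster.

```shell
# classpath_has ENTRY: return 0 if ENTRY is one of the colon-separated
# elements of HADOOP_CLASSPATH. The entry is matched literally, so a
# trailing "/*" is not expanded by the shell.
classpath_has() {
    case ":$HADOOP_CLASSPATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_tez_classpath: report which of the three Tez entries are present.
check_tez_classpath() {
    for entry in "$TEZ_CONF_DIR" "$TEZ_HOME/*" "$TEZ_HOME/lib/*"; do
        if classpath_has "$entry"; then
            echo "ok:      $entry"
        else
            echo "missing: $entry"
        fi
    done
}

# Usage, in a shell where the exports above have been applied:
# check_tez_classpath
```

On a node with Hadoop installed, `hadoop classpath` prints the effective classpath and can be used as a second check.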
- Configure the Tez UI
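The Tez UI depends on the YARN Timeline Server. Before Hadoop 2.4, job monitoring existed only for MapReduce, through the Job History Server, which lets users query information about completed jobs. As more computing frameworks such as Spark and Tez were integrated with YARN, they needed job monitoring as well, so the Hadoop developers built a more general history service: the YARN Timeline Server.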
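Add the following to `yarn-site.xml` to configure the YARN Timeline Server. See the TimelineServer documentation for more detailed settings.
Note in particular that `yarn.timeline-service.hostname` must be set to the address of the node that runs the TimelineServer service. For example, I start it on the machine `master`, so the value here is `master`. The default in the official documentation is 0.0.0.0, and with that value the Tez DAG AppMaster fails with an error saying it cannot find the TimelineServer. Changing it to the real hostname fixes this.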
<!--configurations for timelineserver-->
<property>
<name>yarn.timeline-service.hostname</name>
<value>master</value>
</property>
<property>
<description>Address for the Timeline server to start the RPC server.</description>
<name>yarn.timeline-service.address</name>
<value>${yarn.timeline-service.hostname}:10200</value>
</property>
<property>
<description>The http address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.address</name>
<value>${yarn.timeline-service.hostname}:8188</value>
</property>
<property>
<description>The https address of the Timeline service web application.</description>
<name>yarn.timeline-service.webapp.https.address</name>
<value>${yarn.timeline-service.hostname}:8190</value>
</property>
<property>
<description>Handler thread count to serve the client RPC requests.</description>
<name>yarn.timeline-service.handler-thread-count</name>
<value>10</value>
</property>
<property>
<description>The max number of applications could be fetched by using REST API
or application history protocol and shown in timeline server web ui. Defaults
to `10000`.</description>
<name>yarn.timeline-service.generic-application-history.max-applications</name>
<value>10000</value>
</property>
<property>
<description>Enables cross-origin support (CORS) for web services where
cross-origin web response headers are needed. For example, javascript making
a web services request to the timeline server.</description>
<name>yarn.timeline-service.http-cross-origin.enabled</name>
<value>true</value>
</property>
<property>
<description>Comma separated list of origins that are allowed for web
services needing cross-origin (CORS) support. Wildcards (*) and patterns
allowed</description>
<name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
<value>*</value>
</property>
<property>
<description>Comma separated list of methods that are allowed for web
services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
<value>GET,POST,HEAD</value>
</property>
<property>
<description>Comma separated list of headers that are allowed for web
services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
<value>X-Requested-With,Content-Type,Accept,Origin</value>
</property>
<property>
<description>The number of seconds a pre-flighted request can be cached
for web services needing cross-origin (CORS) support.</description>
<name>yarn.timeline-service.http-cross-origin.max-age</name>
<value>1800</value>
</property>
<property>
<description>Indicate to ResourceManager as well as clients whether
history-service is enabled or not. If enabled, ResourceManager starts
recording historical data that Timeline service can consume. Similarly,
clients can redirect to the history service when applications
finish if this is enabled.</description>
<name>yarn.timeline-service.generic-application-history.enabled</name>
<value>true</value>
</property>
<property>
<description>Store class name for history store, defaulting to file system
store</description>
<name>yarn.timeline-service.generic-application-history.store-class</name>
<value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
</property>
<property>
<description>Indicate to clients whether Timeline service is enabled or not.
If enabled, the TimelineClient library used by end-users will post entities
and events to the Timeline server.</description>
<name>yarn.timeline-service.enabled</name>
<value>true</value>
</property>
<property>
<description>Store class name for timeline store.</description>
<name>yarn.timeline-service.store-class</name>
<value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
</property>
<property>
<description>Enable age off of timeline store data.</description>
<name>yarn.timeline-service.ttl-enable</name>
<value>true</value>
</property>
<property>
<description>Publish YARN information to Timeline Server</description>
<name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
<value>true</value>
</property>
<property>
<description>Time to live for timeline store data in milliseconds.</description>
<name>yarn.timeline-service.ttl-ms</name>
<value>604800000</value>
</property>
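The `yarn.timeline-service.ttl-ms` value above is easier to read once converted to days. The arithmetic below is a quick sketch for checking it; `days_to_ms` is a hypothetical helper for deriving a value for a different retention period.

```shell
# Value of yarn.timeline-service.ttl-ms from the configuration above.
ttl_ms=604800000

# ms -> s -> min -> h -> days
ttl_days=$((ttl_ms / 1000 / 60 / 60 / 24))
echo "yarn.timeline-service.ttl-ms = $ttl_ms ms = $ttl_days days"

# days_to_ms DAYS: print the ttl-ms value for a retention period in days.
days_to_ms() {
    echo $(($1 * 24 * 60 * 60 * 1000))
}

days_to_ms 30
```

With `yarn.timeline-service.ttl-enable` set to `true`, timeline data older than this period is aged off, so the configuration above keeps seven days of history.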
Add the following to `tez-site.xml` to configure the Tez UI:
<property>
<name>tez.history.logging.service.class</name>
<value>org.apache.tez.dag.history.logging.ats.ATSHistoryLoggingService</value>
</property>
<property>
<name>tez.tez-ui.history-url.base</name>
<value>http://master:8080/tez-ui/</value>
</property>
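- Install Tomcat on one node, for example `master`, copy `${TEZ_HOME}/tez-ui-0.8.5.war` to `${TOMCAT_HOME}/webapps/`, and rename it to `tez-ui.war`, as shown in the figure below. This name matches the value of the `tez.tez-ui.history-url.base` property in `tez-site.xml` above.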
- If Tomcat is not installed on the node that runs the YARN Timeline Server, edit `tez-ui/scripts/configs.js` and set `timelineBaseUrl` and `RMWebUrl` to the correct addresses.
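The WAR copy-and-rename step can be written as a small function. This is a sketch only; `deploy_tez_ui` is a hypothetical helper, and it assumes `TEZ_HOME` and `TOMCAT_HOME` point at the Tez and Tomcat install directories.

```shell
# deploy_tez_ui TEZ_HOME TOMCAT_HOME [VERSION]
# Copies the Tez UI WAR into Tomcat's webapps directory as tez-ui.war.
deploy_tez_ui() {
    war="$1/tez-ui-${3:-0.8.5}.war"
    if [ ! -f "$war" ]; then
        echo "not found: $war" >&2
        return 1
    fi
    # Tomcat derives the context path from the WAR file name,
    # so tez-ui.war is served under /tez-ui.
    cp "$war" "$2/webapps/tez-ui.war"
}

# Usage:
# deploy_tez_ui "$TEZ_HOME" "$TOMCAT_HOME"
```

The rename matters because the path segment in `tez.tez-ui.history-url.base` (`/tez-ui/`) has to match the context path under which Tomcat serves the application.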
- Edit `hive-site.xml` and set the execution engine to tez, as shown below.
<property>
<name>hive.execution.engine</name>
<value>tez</value>
<description/>
</property>
- After editing these files, start the HDFS cluster, the YARN cluster, the Timeline Server, and Tomcat.
start-dfs.sh
start-yarn.sh
yarn-daemon.sh start historyserver
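Once the services are started, one way to confirm that the Timeline Server is reachable is to query the root of its REST API on the web port configured earlier (8188). The helpers below are hypothetical names used for illustration, and the check assumes `curl` is installed on the machine running it.

```shell
# timeline_url HOST [PORT]: print the Timeline Server REST root.
timeline_url() {
    echo "http://$1:${2:-8188}/ws/v1/timeline/"
}

# check_timeline HOST [PORT]: return 0 if the REST root answers
# with a successful HTTP status within 5 seconds.
check_timeline() {
    curl -sf --max-time 5 "$(timeline_url "$@")" >/dev/null
}

# Run on a machine that can reach the cluster:
# check_timeline master && echo "Timeline Server is up"
```

If this check fails from the Tomcat host, the Tez UI will not be able to load DAG history either, so it is worth running before debugging the UI itself.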
Testing Hive on Tez
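After you run an HQL statement in Hive, the result shown in the figure below appears, and you can click through from the YARN UI to the Tez UI.
By default, the history files for each application are stored in the directory given by `yarn.timeline-service.leveldb-timeline-store.path`, whose default value is `${hadoop.tmp.dir}/yarn/timeline`.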
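To fall back to Hive on MR, use `unset` to remove the Tez-related environment variables and `HADOOP_CLASSPATH` from the current session, change the execution engine in `hive-site.xml`, then restart the hiveserver2 service and reconnect with `beeline`.
To switch back to Hive on Tez, run `source /etc/profile` to reload the Tez-related environment variables and `HADOOP_CLASSPATH`, change the execution engine in `hive-site.xml`, then restart the hiveserver2 service and reconnect with `beeline`.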
unset HADOOP_CLASSPATH
unset TEZ_CONF_DIR
unset TEZ_HOME
unset TEZ_JARS
beeline -u jdbc:hive2://master:10000 -n root --hiveconf hive.execution.engine=mr
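If you switch directly to the MR engine without following the steps above, you may hit errors such as a NoSuchField error, which is a clear sign of a version incompatibility.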
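The environment part of the switch can be wrapped in two shell functions so that it is harder to forget a variable. This is a sketch under the assumption that the Tez exports live in `/etc/profile`, as described above; `use_mr` and `use_tez` are hypothetical names, and they do not edit `hive-site.xml` or restart hiveserver2, which still has to be done separately.

```shell
# use_mr: drop the Tez-related variables from the current session.
use_mr() {
    unset HADOOP_CLASSPATH TEZ_CONF_DIR TEZ_HOME TEZ_JARS
}

# use_tez [PROFILE]: reload the Tez-related variables.
# PROFILE defaults to /etc/profile, where this article keeps the exports.
use_tez() {
    . "${1:-/etc/profile}"
}

# Usage (source this file first so the functions affect your shell):
# use_mr    # then set hive.execution.engine to mr and restart hiveserver2
# use_tez   # then set hive.execution.engine to tez and restart hiveserver2
```

Because the functions change environment variables, they only take effect when defined and called in the same shell that later starts hiveserver2, not when run as a separate script.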