Hive on Spark on YARN Configuration Notes

Version Compatibility

Before starting the configuration, make sure that the Hive, Spark, and Hadoop versions are compatible. To check, download the Hive source code: the pom.xml file in the source root records the Spark version that the Hive release was built against (search for spark.version) and, likewise, the Hadoop version (search for hadoop.version). Note that only the major and minor version numbers need to match. For example, hive-3.1.1 is built against Spark 2.3.0 and Hadoop 3.1.0, so installing the stable spark-2.3.3 release works fine, and the same reasoning applies to the Hadoop version.
For Hive/Spark version compatibility, you can also refer to:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started
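
A quick way to look both versions up, assuming the Hive source has been unpacked to apache-hive-3.1.1-src (adjust the path for your release):

# Print the Spark and Hadoop versions this Hive release was built against
grep -m1 '<spark.version>' apache-hive-3.1.1-src/pom.xml
grep -m1 '<hadoop.version>' apache-hive-3.1.1-src/pom.xml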

Spark Installation (building from source is recommended)

Here we take building from source as the example.

Building from source (using spark-2.3.3 and hadoop-3.1.0 as the example)

Download the spark-2.3.3 source release, unpack it, and run the following command from the source root to build it (note: build on a Linux host with internet access, with Maven 3.3.9 or later installed):

./dev/make-distribution.sh --name "h310-without-hive" --tgz "-Pyarn,hadoop-3.1,parquet-provided,orc-provided" -Dhadoop.version=3.1.0

When the build finishes, the package “spark-2.3.3-bin-h310-without-hive.tgz” is generated in the source root. For the Spark on YARN deployment steps, see:
https://blog.csdn.net/wangkai_123456/article/details/87348161#3Spark_25
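
One way to install the resulting package (the /usr/local/spark path is an assumption, chosen here to match the spark.home value used later in hive-site.xml):

# Unpack the build and point a stable path at it
tar -zxf spark-2.3.3-bin-h310-without-hive.tgz -C /usr/local/
ln -s /usr/local/spark-2.3.3-bin-h310-without-hive /usr/local/spark
export SPARK_HOME=/usr/local/spark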

YARN Configuration

Add the following to ${HADOOP_HOME}/etc/hadoop/yarn-site.xml to switch YARN to the Fair Scheduler, which the Hive on Spark guide recommends so that queries get a fair share of cluster resources:

   <property>
       <name>yarn.resourcemanager.scheduler.class</name>
       <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
   </property>
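
The scheduler change takes effect only after YARN is restarted, for example:

# Restart YARN so the Fair Scheduler is picked up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh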

Hive Configuration

To add the Spark dependency to Hive

Prior to Hive 2.2.0, link the spark-assembly jar to HIVE_HOME/lib.
Since Hive 2.2.0, Hive on Spark runs with Spark 2.0.0 and above, which doesn’t have an assembly jar.
To run with YARN mode (either yarn-client or yarn-cluster), link the following jars to HIVE_HOME/lib (see the sketch after this list).
scala-library
spark-core
spark-network-common
To run with LOCAL mode (for debugging only), link the following jars in addition to those above to HIVE_HOME/lib.
chill-java chill jackson-module-paranamer jackson-module-scala jersey-container-servlet-core
jersey-server json4s-ast kryo-shaded minlog scala-xml spark-launcher
spark-network-shuffle spark-unsafe xbean-asm5-shaded
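
A minimal sketch of the YARN-mode linking step, assuming SPARK_HOME and HIVE_HOME are set; the globs avoid hard-coding the Scala/Spark version suffixes of a spark-2.3.3 build:

# Link the three jars Hive on Spark needs in YARN mode
ln -s $SPARK_HOME/jars/scala-library-*.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-core_*.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-network-common_*.jar $HIVE_HOME/lib/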

Configure Hive execution engine to use Spark

Add the following to ${HIVE_HOME}/conf/hive-site.xml:

     <property>
         <name>hive.execution.engine</name>
         <value>spark</value>
     </property>
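
The engine can also be switched per session instead of globally; from the Hive CLI or Beeline:

-- Use Spark for this session only
set hive.execution.engine=spark;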

Configure Spark-application configs for Hive

Add the following to ${HIVE_HOME}/conf/hive-site.xml as well:

     <property>
         <name>spark.home</name>
         <value>/usr/local/spark</value>
     </property>
     <property>
         <name>spark.master</name>
         <value>yarn-cluster</value>
     </property>
     <property>
         <name>hive.spark.client.channel.log.level</name>
         <value>WARN</value>
     </property>
     <property>
         <name>spark.eventLog.enabled</name>
         <value>true</value>
     </property>
     <property>
         <name>spark.eventLog.dir</name>
         <value>hdfs://hadoopSvr1:8020/user/hive/tmp/sparkeventlog</value> 
     </property>
     <property>
         <name>spark.executor.memory</name>
         <value>1g</value>
     </property>
     <property>
         <name>spark.executor.cores</name>
         <value>2</value>
     </property>
     <property>
         <name>spark.executor.instances</name>
         <value>6</value>
     </property>
     <property>
         <name>spark.yarn.executor.memoryOverhead</name>
         <value>150m</value>
     </property>
     <property>
         <name>spark.driver.memory</name>
         <value>4g</value>
     </property>
     <property>
         <name>spark.yarn.driver.memoryOverhead</name>
         <value>400m</value>
     </property>
     <property>
         <name>spark.serializer</name>
         <value>org.apache.spark.serializer.KryoSerializer</value>
     </property>
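
Note that the spark.eventLog.dir directory must already exist in HDFS before the first query runs (Spark fails to start otherwise), so create it up front, matching the value above:

hdfs dfs -mkdir -p /user/hive/tmp/sparkeventlog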

Allow YARN to cache the necessary Spark dependency jars on the nodes, so that they do not need to be distributed each time an application runs.

Upload all the jars under $SPARK_HOME/jars to an HDFS directory (for example hdfs://hadoopSvr1:8020/spark-jars), then add the following to ${HIVE_HOME}/conf/hive-site.xml:

<property>
  <name>spark.yarn.jars</name>
  <value>hdfs://hadoopSvr1:8020/spark-jars/*</value>
</property>
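
The upload step itself, sketched with plain HDFS shell commands (the /spark-jars path matches the property above):

# Publish the Spark jars once; YARN caches them on each node
hdfs dfs -mkdir -p /spark-jars
hdfs dfs -put $SPARK_HOME/jars/*.jar /spark-jars/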

Spark Configuration (optional)

See: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark:+Getting+Started
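
Finally, a quick smoke test from the Hive CLI (the table name test is a placeholder; any small table will do). If everything is wired up correctly, the query should appear as a Spark application in the YARN ResourceManager UI:

select count(*) from test;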


Reposted from blog.csdn.net/wangkai_123456/article/details/88135386