Integrating Spark 2.x with HDP 2.4


Integration steps

1. Download Spark 2.3 from the official downloads page: http://spark.apache.org/downloads.html

2. Upload the Spark 2.3 package to the machine where it will be installed, then unpack and rename it:

cd  /usr/hdp/2.4.0.0-169

tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz

mv spark-2.3.0-bin-hadoop2.7  spark2

3. Change the owner and group of the spark2 directory:

chown -R root:root spark2

 

 

4. Create symlinks that point at the actual spark2 directory, so the /usr/hdp/current/* paths used below resolve:

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-client

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-historyserver

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-thriftserver
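Note that `ln -s` takes the target first and the link name second. The pattern can be sketched in a scratch directory (the temp paths stand in for the real HDP ones):

```shell
# Demonstrate the link direction used above: ln -s TARGET LINK_NAME.
# A sketch in a temp dir; the real target is /usr/hdp/2.4.0.0-169/spark2.
tmp=$(mktemp -d)
mkdir "$tmp/spark2"                       # stands in for the install dir
ln -s "$tmp/spark2" "$tmp/spark2-client"  # link name resolves to the directory
readlink "$tmp/spark2-client"             # prints the spark2 path
rm -rf "$tmp"
```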

 

5. Go into spark2 and prepare the configuration files in the conf directory:

cd conf

cp spark-env.sh.template spark-env.sh

cp spark-defaults.conf.template spark-defaults.conf

6. Edit spark-env.sh (vi spark-env.sh) and append the following at the end of the file:

# Alternate conf dir. (Default: ${SPARK_HOME}/conf)

export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}

 

# Where log files are stored. (Default: ${SPARK_HOME}/logs)

#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs

export SPARK_LOG_DIR=/var/log/spark2

 

# Where the pid file is stored. (Default: /tmp)

export SPARK_PID_DIR=/var/run/spark2

 

# Memory for Master, Worker and history server (default: 1024MB)

export SPARK_DAEMON_MEMORY=1024m

 

# A string representing this instance of spark.(Default: $USER)

SPARK_IDENT_STRING=$USER

 

# The scheduling priority for daemons. (Default: 0)

SPARK_NICENESS=0

 

export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}

 

# The java implementation to use.

export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
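The `${VAR:-default}` expansions above keep any value that is already exported and fall back to the HDP path otherwise; a quick demonstration:

```shell
# ${VAR:-default}: use the existing value if set, else the fallback.
unset SPARK_CONF_DIR
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /usr/hdp/current/spark2-historyserver/conf

SPARK_CONF_DIR=/tmp/my-conf
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /tmp/my-conf
```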

7. Edit spark-defaults.conf (vi spark-defaults.conf) and append the following at the end of the file:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.eventLog.dir hdfs:///spark2-history

spark.eventLog.enabled true

 

# Required: setting this parameter to 'false' turns off ATS timeline server for Spark

spark.hadoop.yarn.timeline-service.enabled false

 

spark.driver.extraJavaOptions -Dhdp.version=2.4.0.0-169

spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.0.0-169

 

spark.history.fs.logDirectory hdfs:///spark2-history

#spark.history.kerberos.keytab none

#spark.history.kerberos.principal none

#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider

#spark.history.ui.port 18080

 

spark.yarn.containerLauncherMaxThreads 25

spark.yarn.driver.memoryOverhead 200

spark.yarn.executor.memoryOverhead 200

#spark.yarn.historyServer.address sandbox.hortonworks.com:18080

spark.yarn.max.executor.failures 3

spark.yarn.preserve.staging.files false

spark.yarn.queue default

spark.yarn.scheduler.heartbeat.interval-ms 5000

spark.yarn.submit.file.replication 3

spark.ui.port 4041
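Both `spark.eventLog.dir` and `spark.history.fs.logDirectory` point at `hdfs:///spark2-history`, which must exist in HDFS before jobs write to it. A sketch of creating it (the `spark:hadoop` ownership is an assumption — adjust to your cluster), guarded so it is a no-op on machines without an HDFS client:

```shell
# Create the event-log directory referenced by spark-defaults.conf.
# The spark:hadoop ownership is an assumed convention; adjust as needed.
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfs -mkdir -p /spark2-history
  sudo -u hdfs hdfs dfs -chown spark:hadoop /spark2-history
else
  echo "no HDFS client here; run on a cluster node"
fi
```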

8. In the Ambari UI, adjust the YARN memory parameters:

yarn.scheduler.maximum-allocation-mb = 2500MB

yarn.nodemanager.resource.memory-mb = 7800MB
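Each executor container asks YARN for the executor memory plus the overhead set in spark-defaults.conf; a quick arithmetic check that the test job below fits under the maximum allocation:

```shell
# Container request = executor memory + memoryOverhead (512m + 200m here);
# it must stay under yarn.scheduler.maximum-allocation-mb (2500 MB above).
executor_mb=512
overhead_mb=200
max_alloc_mb=2500
request_mb=$((executor_mb + overhead_mb))
[ "$request_mb" -le "$max_alloc_mb" ] && echo "${request_mb}MB fits under ${max_alloc_mb}MB"
# -> 712MB fits under 2500MB
```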

 

 

9. Test the Spark2 integration on HDP

Submit a test job:

 

export SPARK_MAJOR_VERSION=2

  

./bin/spark-submit \

    --class org.apache.spark.examples.SparkPi \

    --master yarn --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10


 

./bin/spark-submit \

    --class org.apache.spark.examples.SparkTC \

    --master yarn --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10
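Besides the Ambari UI, finished applications can also be listed from the command line. A sketch, guarded so it only runs where the Hadoop YARN client is actually installed:

```shell
# List finished YARN applications; SparkPi/SparkTC should appear after the
# submits above. The hadoop check avoids mistaking the JS package manager
# "yarn" for the Hadoop CLI.
if command -v yarn >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  yarn application -list -appStates FINISHED
else
  echo "no YARN client here; check the Ambari YARN UI instead"
fi
```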

10. Check the job's run status on the Ambari YARN page.

Reference:

https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.html

P.S.:

If you run into HDFS write-permission problems, switch to a user with write access, or disable permission checking (not recommended for production):

dfs.permissions.enabled=false
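Rather than disabling permissions cluster-wide, giving the submitting user an HDFS home directory usually resolves the write error. A hedged sketch (the user root is assumed, matching the install above), guarded for machines without an HDFS client:

```shell
# Create an HDFS home directory for the submitting user (root assumed here)
# instead of turning off permission checks cluster-wide.
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfs -mkdir -p /user/root
  sudo -u hdfs hdfs dfs -chown root:root /user/root
else
  echo "no HDFS client here; run on a NameNode or client host"
fi
```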

 

 

Over

                                        2018.6.11


Reposted from blog.csdn.net/liuxiangke0210/article/details/80653699