Integrating Spark 2.x with HDP 2.4


Integration steps

1. Download Spark 2.3 from the official downloads page: http://spark.apache.org/downloads.html

2. Upload the Spark 2.3 package to the machine where it will be installed, then unpack and rename it:

cd  /usr/hdp/2.4.0.0-169

tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz

mv spark-2.3.0-bin-hadoop2.7  spark2

3. Change the owner and group of the spark2 directory:

chown -R root:root spark2

 

 

4. Create symlinks that point at the actual spark2 directory, so the /usr/hdp/current/* paths used below resolve:

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-client

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-historyserver

ln -s /usr/hdp/2.4.0.0-169/spark2 /usr/hdp/current/spark2-thriftserver
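Note that `ln -s` takes the target first and the link name second. The pattern can be sketched in a scratch directory (the temp paths stand in for the real HDP ones):

```shell
# Demonstrate the link direction used above: ln -s TARGET LINK_NAME.
# A sketch in a temp dir; the real target is /usr/hdp/2.4.0.0-169/spark2.
tmp=$(mktemp -d)
mkdir "$tmp/spark2"                       # stands in for the install dir
ln -s "$tmp/spark2" "$tmp/spark2-client"  # link name resolves to the directory
readlink "$tmp/spark2-client"             # prints the spark2 path
rm -rf "$tmp"
```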

 

5. Go into spark2 and prepare the configuration files in the conf directory:

cd conf

cp spark-env.sh.template spark-env.sh

cp spark-defaults.conf.template spark-defaults.conf

6. Edit spark-env.sh (vi spark-env.sh) and append the following at the end of the file:

# Alternate conf dir. (Default: ${SPARK_HOME}/conf)

export SPARK_CONF_DIR=${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}

 

# Where log files are stored. (Default: ${SPARK_HOME}/logs)

#export SPARK_LOG_DIR=${SPARK_HOME:-/usr/hdp/current/spark2-historyserver}/logs

export SPARK_LOG_DIR=/var/log/spark2

 

# Where the pid file is stored. (Default: /tmp)

export SPARK_PID_DIR=/var/run/spark2

 

# Memory for Master, Worker and history server (default: 1024MB)

export SPARK_DAEMON_MEMORY=1024m

 

# A string representing this instance of spark.(Default: $USER)

SPARK_IDENT_STRING=$USER

 

# The scheduling priority for daemons. (Default: 0)

SPARK_NICENESS=0

 

export HADOOP_HOME=${HADOOP_HOME:-/usr/hdp/current/hadoop-client}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/usr/hdp/current/hadoop-client/conf}

 

# The java implementation to use.

export JAVA_HOME=/usr/jdk64/jdk1.8.0_60
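The `${VAR:-default}` expansions above keep any value that is already exported and fall back to the HDP path otherwise; a quick demonstration:

```shell
# ${VAR:-default}: use the existing value if set, else the fallback.
unset SPARK_CONF_DIR
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /usr/hdp/current/spark2-historyserver/conf

SPARK_CONF_DIR=/tmp/my-conf
echo "${SPARK_CONF_DIR:-/usr/hdp/current/spark2-historyserver/conf}"
# -> /tmp/my-conf
```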

7. Edit spark-defaults.conf (vi spark-defaults.conf) and append the following at the end of the file:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native

spark.eventLog.dir hdfs:///spark2-history

spark.eventLog.enabled true

 

# Required: setting this parameter to 'false' turns off ATS timeline server for Spark

spark.hadoop.yarn.timeline-service.enabled false

 

spark.driver.extraJavaOptions -Dhdp.version=2.4.0.0-169

spark.yarn.am.extraJavaOptions -Dhdp.version=2.4.0.0-169

 

spark.history.fs.logDirectory hdfs:///spark2-history

#spark.history.kerberos.keytab none

#spark.history.kerberos.principal none

#spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider

#spark.history.ui.port 18080

 

spark.yarn.containerLauncherMaxThreads 25

spark.yarn.driver.memoryOverhead 200

spark.yarn.executor.memoryOverhead 200

#spark.yarn.historyServer.address sandbox.hortonworks.com:18080

spark.yarn.max.executor.failures 3

spark.yarn.preserve.staging.files false

spark.yarn.queue default

spark.yarn.scheduler.heartbeat.interval-ms 5000

spark.yarn.submit.file.replication 3

spark.ui.port 4041
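Both `spark.eventLog.dir` and `spark.history.fs.logDirectory` point at `hdfs:///spark2-history`, which must exist in HDFS before jobs write to it. A sketch of creating it (the `spark:hadoop` ownership is an assumption — adjust to your cluster), guarded so it is a no-op on machines without an HDFS client:

```shell
# Create the event-log directory referenced by spark-defaults.conf.
# The spark:hadoop ownership is an assumed convention; adjust as needed.
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfs -mkdir -p /spark2-history
  sudo -u hdfs hdfs dfs -chown spark:hadoop /spark2-history
else
  echo "no HDFS client here; run on a cluster node"
fi
```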

8. In the Ambari UI, adjust the YARN memory parameters:

yarn.scheduler.maximum-allocation-mb = 2500MB

yarn.nodemanager.resource.memory-mb = 7800MB
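Each executor container asks YARN for the executor memory plus the overhead set in spark-defaults.conf; a quick arithmetic check that the test job below fits under the maximum allocation:

```shell
# Container request = executor memory + memoryOverhead (512m + 200m here);
# it must stay under yarn.scheduler.maximum-allocation-mb (2500 MB above).
executor_mb=512
overhead_mb=200
max_alloc_mb=2500
request_mb=$((executor_mb + overhead_mb))
[ "$request_mb" -le "$max_alloc_mb" ] && echo "${request_mb}MB fits under ${max_alloc_mb}MB"
# -> 712MB fits under 2500MB
```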

 

 

9. Test the Spark2 integration on HDP

Submit a test job:

 

export SPARK_MAJOR_VERSION=2

  

./bin/spark-submit \

    --class org.apache.spark.examples.SparkPi \

    --master yarn --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10


 

./bin/spark-submit \

    --class org.apache.spark.examples.SparkTC \

    --master yarn --deploy-mode client \

    --num-executors 3 \

    --driver-memory 512m \

    --executor-memory 512m \

    --executor-cores 1 \

    examples/jars/spark-examples*.jar 10
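Besides the Ambari UI, finished applications can also be listed from the command line. A sketch, guarded so it only runs where the Hadoop YARN client is actually installed:

```shell
# List finished YARN applications; SparkPi/SparkTC should appear after the
# submits above. The hadoop check avoids mistaking the JS package manager
# "yarn" for the Hadoop CLI.
if command -v yarn >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
  yarn application -list -appStates FINISHED
else
  echo "no YARN client here; check the Ambari YARN UI instead"
fi
```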

10. Check the job's run status on the Ambari YARN page.

Reference:

https://community.hortonworks.com/articles/53029/how-to-install-and-run-spark-20-on-hdp-25-sandbox.html

P.S.:

If you run into HDFS write-permission problems, switch to a user with write access, or disable permission checking (not recommended for production):

dfs.permissions.enabled=false
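Rather than disabling permissions cluster-wide, giving the submitting user an HDFS home directory usually resolves the write error. A hedged sketch (the user root is assumed, matching the install above), guarded for machines without an HDFS client:

```shell
# Create an HDFS home directory for the submitting user (root assumed here)
# instead of turning off permission checks cluster-wide.
if command -v hdfs >/dev/null 2>&1; then
  sudo -u hdfs hdfs dfs -mkdir -p /user/root
  sudo -u hdfs hdfs dfs -chown root:root /user/root
else
  echo "no HDFS client here; run on a NameNode or client host"
fi
```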

 

 

Over

                                        2018.6.11


Reposted from blog.csdn.net/liuxiangke0210/article/details/80653699