Spark and Hive are deployed on the same servers.
1 hive-site.xml
From hive-site.xml you can see that Spark talks to the Hive metastore over the Thrift protocol:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://bwsc79:9083,thrift://bwsc80:9083,thrift://bwsc81:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
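A quick smoke test confirms the remote metastore is reachable. A minimal sketch, assuming nc is installed and the hive client is on the PATH (see the /etc/profile section below):
# probe the metastore Thrift port on all three nodes
for h in bwsc79 bwsc80 bwsc81; do nc -z $h 9083 && echo "$h metastore reachable"; done
# the hive CLI reads the same hive.metastore.uris, so a trivial query exercises the connection
hive -e "show databases;"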
2 hbase-site.xml
As the configuration below shows, Spark's hbase-site.xml is copied over from the HBase deployment; see section 3.1.1, HBase environment preparation.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://bwsc65:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/appdata/zookeeper/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>bwsc65,bwsc66,bwsc67</value>
  </property>
  <property>
    <name>hbase.block.data.cachecompressed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.regionserver.codecs</name>
    <value>snappy</value>
  </property>
</configuration>
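Since hbase.regionserver.codecs lists snappy, a region server will refuse to start when the codec is unavailable. HBase ships a small utility for checking this; a sketch, assuming the hbase launcher is on the PATH:
# writes a test file with the given codec and prints SUCCESS if the native snappy libraries load
hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/snappy-test snappy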
3 /etc/profile and spark-env.sh
vi /etc/profile
export HIVE_HOME=/application/hive
export PATH=$PATH:$HIVE_HOME/bin
export SPARK_CLUSTER_HOME=/application/spark
source /etc/profile
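A quick sanity check that the new variables took effect in the current shell:
echo $HIVE_HOME $SPARK_CLUSTER_HOME
# hive should now resolve through the extended PATH
which hive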
Note that although Hadoop's jars are configured onto Spark's classpath (SPARK_DIST_CLASSPATH, needed for HDFS access), scheduling here is Spark's own standalone mode with ZooKeeper-based master recovery, not YARN; the ZooKeeper ensemble used is the real one the Hadoop/HBase cluster already depends on (bwsc65-bwsc67). The following goes in spark-env.sh:
# native Hadoop libraries (snappy, etc.) for the JVM to load
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native"
# put the real Hadoop cluster's jars on Spark's classpath (HDFS access)
export SPARK_DIST_CLASSPATH=$(/application/hadoop/bin/hadoop classpath)
# cluster hosts run sshd on a non-default port
export SPARK_SSH_OPTS="-p 52113"
# standalone master HA: leader election and recovery state kept in ZooKeeper
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bwsc65:2181,bwsc66:2181,bwsc67:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_LOG_DIR=/appdata/spark-cluster/logs
# resources offered by each worker
export SPARK_WORKER_CORES=6
export SPARK_WORKER_MEMORY=6144m
export SPARK_WORKER_PORT=9090
# periodically clean up finished applications' work directories
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
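Once the masters are running, ZooKeeper can be queried to confirm that HA state is actually being written under spark.deploy.zookeeper.dir. A sketch, assuming zkCli.sh from the cluster's ZooKeeper installation is available:
# the /spark znode should contain children such as leader_election and master_status
zkCli.sh -server bwsc65:2181 ls /spark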
4 spark-defaults.conf
spark.master spark://bwsc79:6066,bwsc80:6066,bwsc81:6066
spark.submit.deployMode cluster
spark.eventLog.enabled false
spark.eventLog.dir hdfs://bwsc65:9000/spark-cluster/logs
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.jars $SPARK_HOME/lib/*.jar
#spark.cores.max 1
spark.deploy.defaultCores 2
spark.driver.cores 1
spark.driver.memory 512m
spark.driver.supervise true
#number of executors per app = spark.cores.max / spark.executor.cores
spark.executor.cores 1
spark.executor.memory 512m
#scheduler
spark.scheduler.mode FAIR
spark.task.maxFailures 4
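With these defaults in place, a submission only needs the application itself: master, deploy mode, and driver/executor sizing all come from spark-defaults.conf, and the spark:// URL on port 6066 routes the job through the standalone REST submission gateway. A minimal sketch using the bundled SparkPi example (the examples jar path varies between Spark releases, so adjust it):
$SPARK_CLUSTER_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_CLUSTER_HOME/lib/spark-examples-*.jar 100
Because spark.driver.supervise is true, the master restarts the driver if it exits abnormally.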
5 slaves
The same on all three nodes:
bwsc79
bwsc80
bwsc81
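With slaves identical everywhere, the cluster can be brought up from one node: start-all.sh starts a local master and SSHes to each host in slaves (using the SPARK_SSH_OPTS port 52113 set above) to start the workers. The standby masters for ZooKeeper failover are started by hand on the other two nodes:
# on e.g. bwsc79
$SPARK_CLUSTER_HOME/sbin/start-all.sh
# on bwsc80 and bwsc81
$SPARK_CLUSTER_HOME/sbin/start-master.sh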