I. Cluster Setup: Spark ×3 + Hadoop ×1
-
Linux preparation
(1) Install the CentOS6.5-minimal.iso image
(2) Configure the IP address
Temporary configuration:
1. ifconfig eth0 192.168.109.191 (192, 193 on the other two nodes)
2. ping 192.168.109.19x (the node itself)
3. Edit /etc/hosts and add the IP-to-hostname mappings
4. ping {hostname} (the node itself)
Permanent configuration:
5. vi /etc/sysconfig/network-scripts/ifcfg-eth0
   DEVICE=eth0
   TYPE=Ethernet
   ONBOOT=yes
   NM_CONTROLLED=yes
   BOOTPROTO=static
   IPADDR=192.168.109.19x
   NETMASK=255.255.255.0
   GATEWAY=192.168.109.2
6. service network restart
(3) Disable the firewall
7. service iptables stop
8. chkconfig iptables off
9. vi /etc/selinux/config and set SELINUX=disabled
(4) Configure the DNS server
10. vi /etc/resolv.conf and set nameserver 114.114.114.114
11. Verify with ping
(5) Upload CentOs6-Base-163.repo to /etc/yum.repos.d/
12. Set gpgcheck=0 everywhere in the repo file
13. yum clean all
14. yum makecache
15. yum install telnet
(6) Install the JDK
16. Upload jdk.rpm to /usr/local/myspark
17. rpm -ivh jdk.rpm
18. Configure the environment variables: vi /root/.bashrc
    export JAVA_HOME=/usr/java/latest
    export PATH=$PATH:$JAVA_HOME/bin
19. source .bashrc
20. Verify with java -version
-
Configure the full hosts mapping (every node's IP/hostname pair in /etc/hosts on every node)
-
Configure passwordless SSH login
(1) On each of the three hosts, enable passwordless SSH to the host itself: run ssh-keygen -t rsa and press Enter at every prompt; by default the generated RSA key pair lands in /root/.ssh. Then publish the public key: cp id_rsa.pub authorized_keys
(2) Passwordless SSH between the three hosts
ssh-copy-id -i sparkx appends the local public key to the authorized_keys file on host sparkx
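The key exchange above can be sketched as a small loop. This is a hedged sketch only: it assumes the hostnames spark1..spark3 from this guide and that ssh-keygen has already been run on each host. It only prints the commands so the plan can be reviewed first; pipe the output to sh on each host to execute it.

```shell
# Print the ssh-copy-id commands needed on ONE host to reach all three nodes.
# Review the output, then run:  plan_ssh_copy | sh
plan_ssh_copy() {
  for h in spark1 spark2 spark3; do
    # ssh-copy-id appends /root/.ssh/id_rsa.pub to $h's authorized_keys
    echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h"
  done
}
plan_ssh_copy
```

Repeating this on each of the three hosts gives full pairwise passwordless login, including each host to itself.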
-
Set up Hadoop (HDFS + YARN)
(1) Upload hadoop-xxx.tar.gz to /usr/local/myspark/
(2) Extract it, delete the archive, and rename the directory to hadoop
(3) Configure the Hadoop environment variables: vi .bashrc
export HADOOP_HOME=/usr/local/myspark/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
(4) Edit the Hadoop configuration files (under .../hadoop/etc/hadoop)

<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://spark1:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/data/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/data/datanode</value>
  </property>
  <property>
    <name>dfs.tmp.dir</name>
    <value>/usr/local/data/tmp</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml -->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>spark1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

<!-- slaves -->
spark1
spark2
spark3
(5) Replicate this setup on the other two machines: use scp to copy the hadoop directory and the .bashrc file from spark1, and create the corresponding data directories on each node. Remember to run source .bashrc so the variables take effect immediately.
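Step (5) can be sketched as the loop below, assuming the hostnames and the /usr/local/data paths configured above, and that passwordless SSH is already in place. It only prints the scp/ssh commands for review; pipe the output to sh on spark1 to execute.

```shell
# Print the commands that distribute hadoop + .bashrc from spark1 and create
# the data directories named in hdfs-site.xml. Run:  plan_hadoop_dist | sh
plan_hadoop_dist() {
  for h in spark2 spark3; do
    echo "scp -r /usr/local/myspark/hadoop root@$h:/usr/local/myspark/"
    echo "scp /root/.bashrc root@$h:/root/"
    # Create the namenode/datanode/tmp dirs remotely and reload the profile
    echo "ssh root@$h 'mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp; source /root/.bashrc'"
  done
}
plan_hadoop_dist
```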
-
Starting the HDFS and YARN clusters
(1) Format HDFS: on the master node spark1, run hdfs namenode -format
(2) Start the HDFS cluster: on spark1, run start-dfs.sh
(3) Verify HDFS with jps and the web UI on port 50070
(4) Start the YARN cluster: on spark1, run start-yarn.sh
(5) Verify YARN with jps and the web UI on port 8088
-
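Beyond jps and the web UIs, a quick functional check is to write a file into HDFS and run the wordcount example that ships with Hadoop on YARN. This sketch assumes the defaults from this guide; the examples jar name varies by Hadoop version, hence the wildcard, and the /test paths are illustrative only. It prints the commands for review rather than running them.

```shell
# Print an HDFS + YARN smoke-test command sequence for spark1.
# Review the output, then run:  plan_smoke_test | sh
plan_smoke_test() {
  echo "hdfs dfs -mkdir -p /test"
  echo "hdfs dfs -put /etc/hosts /test/hosts.txt"
  echo "hdfs dfs -ls /test"
  # Runs on YARN because mapreduce.framework.name=yarn was set above
  echo "yarn jar \$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /test/hosts.txt /test/out"
  echo "hdfs dfs -cat /test/out/part-r-00000"
}
plan_smoke_test
```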
Set up Hive
(1) Upload apache-hive-xxx.tar.gz to /usr/local/myspark
(2) Extract it, delete the archive, and rename the directory to hive
(3) Configure the Hive environment variables: vi .bashrc
export HIVE_HOME=/usr/local/myspark/hive
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
(4) Install MySQL as Hive's metastore database
1. yum install -y mysql-server
2. service mysqld start
3. chkconfig mysqld on
4. yum install -y mysql-connector-java
5. cp /usr/share/java/mysql-connector-java.jar /usr/local/myspark/hive/lib/
6. Log in to MySQL
   (1) Create the metastore database:
       CREATE DATABASE IF NOT EXISTS hive_metadata;
   (2) Grant privileges:
       GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'%' IDENTIFIED BY 'hive';
       GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
       GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'spark1' IDENTIFIED BY 'hive';
   (3) EXIT
7. Edit the configuration files (under .../hive/conf)
   (1) cp hive-default.xml.template hive-site.xml
   (2) vi hive-site.xml and set:
       * javax.jdo.option.ConnectionURL
       * javax.jdo.option.ConnectionDriverName
       * javax.jdo.option.ConnectionUserName
       * javax.jdo.option.ConnectionPassword
       * hive.metastore.warehouse.dir
   (3) mv hive-env.sh.template hive-env.sh
   (4) In .../hive/bin, vi hive-config.sh and add:
       export JAVA_HOME=/usr/java/latest
       export HIVE_HOME=/usr/local/myspark/hive
       export HADOOP_HOME=/usr/local/myspark/hadoop
   (5) [Hive 2.x] Initialize the metastore schema before the first start:
       .../hive/bin/schematool -dbType mysql -initSchema
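A short end-to-end check for the Hive install, assuming the schema initialization above has run: creating a table exercises the MySQL metastore, and the table should then appear under the warehouse directory in HDFS. The table name smoke_test and the warehouse path are illustrative assumptions. The sketch prints the commands for review.

```shell
# Print a Hive metastore smoke-test sequence. Run:  plan_hive_check | sh
plan_hive_check() {
  # A successful CREATE TABLE proves Hive can reach hive_metadata in MySQL
  echo "hive -e 'CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);'"
  echo "hive -e 'SHOW TABLES;'"
  # The table directory should now exist under hive.metastore.warehouse.dir
  echo "hdfs dfs -ls /user/hive/warehouse"
}
plan_hive_check
```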
-
Set up ZooKeeper
(1) Upload zookeeper-xxx.tar.gz to /usr/local/myspark
(2) Extract it, delete the archive, and rename the directory to zookeeper
(3) Configure the ZooKeeper environment variables:
export ZOOKEEPER_HOME=/usr/local/myspark/zookeeper
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
(4) Edit the configuration file (under .../zookeeper/conf)
1. mv zoo_sample.cfg zoo.cfg
2. vi zoo.cfg
   dataDir=/usr/local/myspark/zookeeper/data  (must point at the directory holding myid, created in step (5))
   server.0=spark1:2888:3888
   server.1=spark2:2888:3888
   server.2=spark3:2888:3888
(5) Set up the working directory
3. mkdir data
4. vim data/myid and write this server's id, 0
(6) Copy zookeeper to the other nodes with scp; change myid accordingly on each node (1 on spark2, 2 on spark3)
(7) Start ZooKeeper on all three nodes: zkServer.sh start
(8) Check the status with zkServer.sh status
-
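Steps (6)-(8) can be sketched as the loops below, assuming the paths and hostnames from this guide; each node's myid (0/1/2) must match the server.N lines in zoo.cfg. The sketch prints the commands for review; pipe the output to sh on spark1 to execute.

```shell
# Print the commands to distribute zookeeper, set each myid, and start/check
# every node. Run:  plan_zk_rollout | sh
plan_zk_rollout() {
  id=1
  for h in spark2 spark3; do
    echo "scp -r /usr/local/myspark/zookeeper root@$h:/usr/local/myspark/"
    # myid must differ per node and match server.N in zoo.cfg
    echo "ssh root@$h 'echo $id > /usr/local/myspark/zookeeper/data/myid'"
    id=$((id + 1))
  done
  for h in spark1 spark2 spark3; do
    echo "ssh root@$h '/usr/local/myspark/zookeeper/bin/zkServer.sh start'"
    echo "ssh root@$h '/usr/local/myspark/zookeeper/bin/zkServer.sh status'"
  done
}
plan_zk_rollout
```

With all three nodes up, zkServer.sh status should report one leader and two followers.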
Install Scala
(1) Upload scala-xxx.tgz to /usr/local/myspark
(2) Extract it, delete the archive, and rename the directory to scala
(3) Configure the Scala environment variables:
export SCALA_HOME=/usr/local/myspark/scala
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
(4) Use scp to copy scala and ~/.bashrc to the other nodes
-
Set up Kafka
(1) Upload kafka-xxx.tgz and the slf4j archive to /usr/local/myspark
(2) Extract them, delete the archives, and rename the Kafka directory to kafka
(3) Edit the Kafka configuration file (.../kafka/config/server.properties):
broker.id=0  (1, 2, ... on the other brokers)
zookeeper.connect=192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181
(4) Extract slf4j-nop-xxx.jar and copy it to .../kafka/libs/
(5) Use scp to copy kafka to the other nodes, changing broker.id in config/server.properties on each
(6) From the Kafka home directory on every node, start the broker:
nohup bin/kafka-server-start.sh config/server.properties &
(7) Basic command-line test (note: the producer talks to the brokers on port 9092, not to ZooKeeper on 2181)
bin/kafka-topics.sh --zookeeper 192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
bin/kafka-console-producer.sh --broker-list 192.168.109.191:9092,192.168.109.192:9092,192.168.109.193:9092 --topic TestTopic
bin/kafka-console-consumer.sh --zookeeper 192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181 --topic TestTopic --from-beginning
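One more sanity check before producing messages: every broker that started successfully registers an ephemeral node under /brokers/ids in ZooKeeper. This sketch assumes the zookeeper-shell.sh script bundled with the Kafka distribution; it prints the commands for review.

```shell
# Print commands that list the registered Kafka brokers in ZooKeeper.
# Run from the Kafka home directory:  plan_broker_check | sh
plan_broker_check() {
  # Should list all configured broker.id values, e.g. [0, 1, 2]
  echo "bin/zookeeper-shell.sh 192.168.109.191:2181 ls /brokers/ids"
  # Inspect one broker's registration details (host, port, timestamp)
  echo "bin/zookeeper-shell.sh 192.168.109.191:2181 get /brokers/ids/0"
}
plan_broker_check
```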
-
Set up the Spark cluster
(1) Upload spark-xxx.tar.gz to /usr/local/myspark
(2) Extract it, delete the archive, and rename the directory to spark
(3) Configure the Spark environment variables:
export SPARK_HOME=/usr/local/myspark/spark
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
(4) Edit the Spark configuration files under conf/
mv spark-env.sh.template spark-env.sh
vi spark-env.sh
export JAVA_HOME=/usr/java/latest
export SCALA_HOME=/usr/local/myspark/scala
export SPARK_MASTER_IP=192.168.109.191
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/myspark/hadoop/etc/hadoop
mv slaves.template slaves
vi slaves
spark2
spark3
# Note: Spark computations are memory-hungry, so the master node is not listed as a worker
(5) Use scp to copy spark and ~/.bashrc to the other two nodes
(6) Start Spark: sbin/start-all.sh
(7) Built-in web UI at spark1:8080
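A standard first job to confirm the standalone cluster works is the SparkPi example shipped with every Spark release. The jar location is an assumption: Spark 1.x puts it under lib/, Spark 2.x under examples/jars/, hence the wildcard. The sketch prints the command for review; run it on spark1 and look for a "Pi is roughly ..." line in the output.

```shell
# Print a spark-submit command for the bundled SparkPi example.
# Review the output, then run:  plan_spark_pi | sh
plan_spark_pi() {
  echo "spark-submit --class org.apache.spark.examples.SparkPi \\"
  # Submit to the standalone master configured via SPARK_MASTER_IP
  echo "  --master spark://spark1:7077 \\"
  # Adjust to examples/jars/spark-examples_*.jar on Spark 2.x
  echo "  \$SPARK_HOME/lib/spark-examples-*.jar 100"
}
plan_spark_pi
```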