[Note Migration][Spark][1] Spark Environment Setup

Copyright notice: collected by Bro_Rabbit for study only. https://blog.csdn.net/weixin_38240095/article/details/83515284

I. Cluster Setup: Spark*3 + Hadoop*1

  1. Linux preparation
    (1) Install from the CentOS6.5-minimal.iso image
    (2) Configure the IP address

    Temporary configuration:
    1. ifconfig eth0 192.168.109.191	(192 and 193 on the other two nodes)
    2. ping 192.168.109.19x (the node itself)
    3. Edit /etc/hosts and add the IP-to-hostname mapping
    4. ping {hostname} (the node itself)
    
    Permanent configuration:
    5. vi /etc/sysconfig/network-scripts/ifcfg-eth0
    	DEVICE=eth0
    	TYPE=Ethernet
    	ONBOOT=yes
    	NM_CONTROLLED=yes
    	BOOTPROTO=static
    	IPADDR=192.168.109.19x
    	NETMASK=255.255.255.0
    	GATEWAY=192.168.109.2
    6. service network restart
    

    (3) Disable the firewall

    7. service iptables stop
    8. chkconfig iptables off
    9. vi /etc/selinux/config and set SELINUX=disabled
    

    (4) Configure the DNS server

    10. vi /etc/resolv.conf and set nameserver 114.114.114.114
    11. Verify with ping
    

    (5) Upload CentOS6-Base-163.repo to /etc/yum.repos.d/

    12. Set gpgcheck=0 everywhere in the repo file (one way to do this is sketched below)
    13. yum clean all
    14. yum makecache
    15. yum install telnet
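
    A minimal way to flip every gpgcheck flag at once, assuming the file was saved as /etc/yum.repos.d/CentOS6-Base-163.repo (adjust the filename if yours differs):

    # Disable GPG checking for every section of the 163 mirror repo file
    sed -i 's/gpgcheck=1/gpgcheck=0/g' /etc/yum.repos.d/CentOS6-Base-163.repo
    # Confirm no section still enforces GPG checks
    grep gpgcheck /etc/yum.repos.d/CentOS6-Base-163.repo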
    

    (6) Configure the JDK

    16. Upload jdk.rpm to /usr/local/myspark
    17. rpm -ivh jdk.rpm
    18. Configure the environment variables: vi /root/.bashrc
    		export JAVA_HOME=/usr/java/latest
    		export PATH=$PATH:$JAVA_HOME/bin
    19. source .bashrc
    20. Verify with java -version
    
  2. Configure the full node mapping in /etc/hosts on every host (example below)
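
    A sketch of that mapping, assuming the three hosts are named spark1/spark2/spark3 with the IPs used above (adjust to your own addressing plan); the same entries go into /etc/hosts on every node:

    # /etc/hosts (identical on all three nodes)
    192.168.109.191 spark1
    192.168.109.192 spark2
    192.168.109.193 spark3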

  3. Configure passwordless SSH login
    (1) On each of the three hosts, set up passwordless SSH to itself

    ssh-keygen -t rsa, pressing Enter at each prompt to accept the defaults; the generated RSA key pair is placed under /root/.ssh
    cp id_rsa.pub authorized_keys (run inside /root/.ssh) to register the public key as authorized_keys
    

    (2) Passwordless SSH between the three hosts

    ssh-copy-id -i sparkx appends this host's public key to the authorized_keys file on sparkx (a quick check follows below)
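
    A minimal check from spark1, assuming the hostnames above; every command should run without a password prompt:

    # Each node should answer with its hostname and the current time, no password asked
    for host in spark1 spark2 spark3; do
        ssh $host "hostname; date"
    done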
    
  4. Set up Hadoop (HDFS + YARN)
    (1) Upload hadoop-xxx.tar.gz to /usr/local/myspark/
    (2) Extract it, delete the archive, and rename the directory to hadoop
    (3) Configure the Hadoop environment variables: vi .bashrc

    export HADOOP_HOME=/usr/local/myspark/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    (4) Edit the Hadoop configuration files

    <!-- core-site.xml -->
    <configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://spark1:9000</value>
        </property>
    </configuration>
    
    <!-- hdfs-site.xml -->
    <configuration>
        <property>
                <name>dfs.name.dir</name>
                <value>/usr/local/data/namenode</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>/usr/local/data/datanode</value>
        </property>
        <property>
                <name>dfs.tmp.dir</name>
                <value>/usr/local/data/tmp</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
    </configuration>
    
    <!-- mapred-site.xml -->
    <configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
    </configuration>
    
    <!-- yarn-site.xml -->
    <configuration>
    <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>spark1</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    
    <!-- slaves -->
    spark1
    spark2
    spark3
    
    

    (5) Set up the other two machines in the cluster with the same configuration: use scp to copy the hadoop directory and the .bashrc file from spark1, create the matching data directories, and then run source .bashrc on each node so the variables take effect immediately (see the sketch below).
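
    A rough sketch of that distribution, run from spark1 and assuming the hostnames and paths used above:

    # Push Hadoop and the shell profile from spark1 to the other nodes
    scp -r /usr/local/myspark/hadoop root@spark2:/usr/local/myspark/
    scp -r /usr/local/myspark/hadoop root@spark3:/usr/local/myspark/
    scp /root/.bashrc root@spark2:/root/
    scp /root/.bashrc root@spark3:/root/
    # Create the data directories expected by hdfs-site.xml
    ssh spark2 "mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp"
    ssh spark3 "mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp"
    # Finally log in to spark2 and spark3 and run: source /root/.bashrc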

  5. Start the HDFS and YARN clusters
    (1) Format HDFS: run hdfs namenode -format on the master node spark1
    (2) Start the HDFS cluster: run start-dfs.sh on the master node spark1
    (3) Verify HDFS with jps and the web UI on port 50070
    (4) Start the YARN cluster: run start-yarn.sh on the master node spark1
    (5) Verify YARN with jps and the web UI on port 8088 (a quick smoke test is sketched below)
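
    A quick sanity check from spark1, assuming the layout above (NameNode/ResourceManager on spark1, a DataNode and NodeManager on every host listed in slaves):

    jps                        # expect NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager on spark1
    hdfs dfsadmin -report      # all three DataNodes should be reported as live
    hdfs dfs -mkdir /test && hdfs dfs -ls /    # simple write/read smoke test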

  6. Set up Hive
    (1) Upload apache-hive-xxx.tar.gz to /usr/local/myspark
    (2) Extract it, delete the archive, and rename the directory to hive
    (3) Configure the Hive environment variables: vi .bashrc

    export HIVE_HOME=/usr/local/myspark/hive
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
    

    (4) Install MySQL as the Hive metastore database

    1. yum install -y mysql-server
    2. service mysqld start
    3. chkconfig mysqld on
    4. yum install -y mysql-connector-java
    5. cp /usr/share/java/mysql-connector-java.jar /usr/local/myspark/hive/lib/
    6. Log in to MySQL
        (1) Create the metastore database
            CREATE DATABASE IF NOT EXISTS hive_metadata;
        (2) Grant privileges
            GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'%' IDENTIFIED BY 'hive';
            GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
            GRANT ALL PRIVILEGES ON hive_metadata.* TO 'hive'@'spark1' IDENTIFIED BY 'hive';
        (3) Exit
            EXIT
    7. Edit the configuration files (under .../hive/conf)
        (1) cp hive-default.xml.template hive-site.xml
        (2) vi hive-site.xml and set the following properties (a sample sketch follows this item)
    		* javax.jdo.option.ConnectionURL
    		* javax.jdo.option.ConnectionDriverName
    		* javax.jdo.option.ConnectionUserName
    		* javax.jdo.option.ConnectionPassword
    		* hive.metastore.warehouse.dir
    	(3) mv hive-env.sh.template hive-env.sh
    	(4) [.../hive/bin] vi hive-config.sh
    		export JAVA_HOME=/usr/java/latest
    		export 	HIVE_HOME=/usr/local/myspark/hive
    		export HADOOP_HOME=/usr/local/myspark/hadoop
        (5) [Hive 2.x] Initialize the metastore before the first start: .../hive/bin/schematool -dbType mysql -initSchema
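
    A sketch of those five hive-site.xml entries, assuming MySQL runs on spark1 with the hive/hive account granted above and the conventional /user/hive/warehouse directory (the URL options and warehouse path are assumptions, adjust as needed):

    <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://spark1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
    </property>
    <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>hive</value>
    </property>
    <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
    </property>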
    
  7. Set up ZooKeeper
    (1) Upload zookeeper-xxx.tar.gz to /usr/local/myspark
    (2) Extract it, delete the archive, and rename the directory to zookeeper
    (3) Configure the ZooKeeper environment variables

    export ZOOKEEPER_HOME=/usr/local/myspark/zookeeper
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
    

    (4) Edit the configuration file (under .../zookeeper/conf)

    1. mv zoo_sample.cfg	zoo.cfg
    2. vi zoo.cfg
        dataDir=/usr/local/myspark/zookeeper/data
    	server.0=spark1:2888:3888
    	server.1=spark2:2888:3888
    	server.2=spark3:2888:3888
    

    (5) Set up the data directory

    3. mkdir data	(under /usr/local/myspark/zookeeper, i.e. the dataDir configured above)
    4. vim data/myid and write the server ID 0
    

    (6) Copy ZooKeeper to the other nodes via scp, changing myid on each to match its server.N entry in zoo.cfg (1 on spark2, 2 on spark3); see the sketch below
    (7) Start it on all three nodes: zkServer.sh start
    (8) Check the status with zkServer.sh status (one node should report leader, the others follower)
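
    A rough sketch of steps (6) through (8), run from spark1 and assuming the hostnames above with ~/.bashrc already distributed:

    # Copy the ZooKeeper install to the other two nodes and set their server IDs
    scp -r /usr/local/myspark/zookeeper root@spark2:/usr/local/myspark/
    scp -r /usr/local/myspark/zookeeper root@spark3:/usr/local/myspark/
    ssh spark2 "echo 1 > /usr/local/myspark/zookeeper/data/myid"
    ssh spark3 "echo 2 > /usr/local/myspark/zookeeper/data/myid"
    # Start the ensemble and check who is leader/follower
    for host in spark1 spark2 spark3; do
        ssh $host "/usr/local/myspark/zookeeper/bin/zkServer.sh start"
    done
    for host in spark1 spark2 spark3; do
        ssh $host "/usr/local/myspark/zookeeper/bin/zkServer.sh status"
    done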

  8. Install Scala
    (1) Upload scala-xxx.tgz to /usr/local/myspark
    (2) Extract it, delete the archive, and rename the directory to scala
    (3) Configure the Scala environment variables

    export SCALA_HOME=/usr/local/myspark/scala
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin
    

    (4) Use scp to copy scala and ~/.bashrc to the other nodes

  9. Set up Kafka
    (1) Upload kafka-xxx.tgz and slf4j to /usr/local/myspark
    (2) Extract them, delete the archives, and rename the Kafka directory to kafka
    (3) Edit the Kafka configuration file (.../kafka/config/server.properties)

    broker.id=0	(0 on this node, 1 and 2 on the other nodes)
    zookeeper.connect=192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181
    

    (4) Extract slf4j-nop-xxx.jar and copy it into .../kafka/libs/
    (5) Use scp to copy Kafka to the other nodes and change broker.id in config/server.properties on each
    (6) Start Kafka from its home directory on every node

    nohup bin/kafka-server-start.sh config/server.properties &
    

    (7) Test with the basic command-line tools (the producer talks to the brokers on port 9092, while the topic and consumer tools here go through ZooKeeper on port 2181)

    bin/kafka-topics.sh --zookeeper 192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
    
    bin/kafka-console-producer.sh --broker-list 192.168.109.191:9092,192.168.109.192:9092,192.168.109.193:9092 --topic TestTopic
    
    bin/kafka-console-consumer.sh --zookeeper 192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181 --topic TestTopic --from-beginning
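
    Lines typed into the producer terminal should show up in the consumer terminal. To inspect the partition layout (leader and replicas) of the test topic, the same topics tool can describe it:

    bin/kafka-topics.sh --zookeeper 192.168.109.191:2181,192.168.109.192:2181,192.168.109.193:2181 --topic TestTopic --describe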
    
  10. Set up the Spark cluster
    (1) Upload spark-xxx.tar.gz to /usr/local/myspark
    (2) Extract it, delete the archive, and rename the directory to spark
    (3) Configure the Spark environment variables

    export SPARK_HOME=/usr/local/myspark/spark
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    

    (4) Edit the Spark configuration files conf/spark-env.sh and conf/slaves

    mv spark-env.sh.template spark-env.sh
    vi spark-env.sh
    
    export JAVA_HOME=/usr/java/latest
    export SCALA_HOME=/usr/local/myspark/scala
    export SPARK_MASTER_IP=192.168.109.191
    export SPARK_WORKER_MEMORY=1g
    export HADOOP_CONF_DIR=/usr/local/myspark/hadoop/etc/hadoop
    
    mv slaves.template slaves
    vi slaves
    
    spark2
    spark3
    # Note: Spark needs plenty of memory for computation, so the master node is not configured as a worker
    

    (5) Use scp to copy spark and ~/.bashrc to the other two nodes
    (6) Start Spark (from the Spark home directory on the master)

    sbin/start-all.sh
    

    (7) Built-in web UI at spark1:8080 (a quick job test is sketched below)
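
    A quick end-to-end check against the standalone master, assuming the master URL spark://192.168.109.191:7077 and the bundled SparkPi example (the example jar's location and name vary by Spark version, so treat the path as a placeholder):

    # Submit the bundled Pi estimator to the standalone cluster
    bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master spark://192.168.109.191:7077 \
        examples/jars/spark-examples_*.jar 100
    # Or open an interactive shell attached to the cluster
    bin/spark-shell --master spark://192.168.109.191:7077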
