Building an Enterprise CDH Big Data Platform from Scratch (Part 2) -- Setting Up a CDH 5.3.6 Cluster

I. Installing hadoop-2.5.0-cdh5.3.6
----------------------------------------------
1. Download the installation package from http://archive.cloudera.com/cdh5/cdh/5/
2. Unpack the Hadoop tarball:
       tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz
3. Move the Hadoop directory under /soft and create a symlink:
       mv /package/hadoop-2.5.0-cdh5.3.6 /soft
       ln -s /soft/hadoop-2.5.0-cdh5.3.6/ /soft/hadoop
4. Configure the Hadoop environment variables:
       nano ~/.bashrc
           export HADOOP_HOME=/soft/hadoop
           export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
       source ~/.bashrc
5. Create the /data/hadoop directory.
6. Edit core-site.xml:
       <property>
           <name>fs.default.name</name>
           <value>hdfs://s101:9000</value>
       </property>
7. Edit hdfs-site.xml:
       <property>
           <name>dfs.name.dir</name>
           <value>/data/hadoop/namenode</value>
       </property>
       <property>
           <name>dfs.data.dir</name>
           <value>/data/hadoop/datanode</value>
       </property>
       <property>
           <name>dfs.tmp.dir</name>
           <value>/data/hadoop/tmp</value>
       </property>
       <property>
           <name>dfs.replication</name>
           <value>3</value>
       </property>
       <property>
           <name>dfs.namenode.secondary.http-address</name>
           <value>s105:9000</value>
       </property>
8. Edit mapred-site.xml:
       <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
       </property>
9. Edit yarn-site.xml:
       <property>
           <name>yarn.resourcemanager.hostname</name>
           <value>s101</value>
       </property>
       <property>
           <name>yarn.nodemanager.aux-services</name>
           <value>mapreduce_shuffle</value>
       </property>
10. Configure slaves:
       s102
       s103
       s104

II. Writing the helper shell scripts
---------------------------------------------
[xcall.sh]
    #!/bin/bash
    params=$@
    for (( i = 101; i <= 105; i = i + 1 )) ; do
        tput setaf 2
        echo ============= s$i =============
        tput setaf 7
        ssh -4 s$i "source /etc/profile ; $params"
    done

[xcopy.sh]
    #!/bin/bash
    #################
    # xcopy
    #################
    # fewer than 1 argument: complain and quit
    if [ $# -lt 1 ]
    then
        echo no args!
        exit
    fi
    # get first argument
    arg1=$1
    # get current user name
    cuser=`whoami`
    # get file name
    fname=`basename $arg1`
    # get dir; if dir = "." resolve it to the absolute path
    dir=`dirname $arg1`
    if [ "$dir" = "." ]
    then
        dir=`pwd`
    fi
    for (( i = 102; i <= 105; i = i + 1 ))
    do
        echo ---- copying $arg1 to s$i ----
        if [ -d $arg1 ]
        then
            scp -r $arg1 $cuser@s$i:$dir
        else
            scp $arg1 $cuser@s$i:$dir
        fi
        echo
    done

[xrm.sh]
    #!/bin/bash
    #################
    # xrm
    #################
    # fewer than 1 argument: complain and quit
    if [ $# -lt 1 ]
    then
        echo no args!
        exit
    fi
    # get first argument
    arg1=$1
    # get current user name
    cuser=`whoami`
    # get file name
    fname=`basename $arg1`
    # get dir; if dir = "." resolve it to the absolute path
    dir=`dirname $arg1`
    if [ "$dir" = "." ]
    then
        dir=`pwd`
    fi
    echo ---- removing $arg1 from localhost ----
    rm -rf $arg1
    echo
    for (( i = 102; i <= 105; i = i + 1 ))
    do
        echo ---- removing $arg1 from s$i ----
        ssh s$i rm -rf $dir/$fname
        echo
    done
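Before relying on these scripts in the next part, a quick smoke test helps (a sketch; it assumes the three scripts are executable and on the PATH of s101, and that passwordless SSH to s102-s105 is already configured; /tmp/xtest.txt is just a throwaway example file):

    chmod +x xcall.sh xcopy.sh xrm.sh
    # every host s101-s105 should print its own hostname
    xcall.sh "hostname"
    # copy a throwaway file to s102-s105, then delete it everywhere
    echo test > /tmp/xtest.txt
    xcopy.sh /tmp/xtest.txt
    xrm.sh /tmp/xtest.txt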
III. Copying Hadoop to the other machines
------------------------------------------------
1. Copy the hadoop directory with xcopy.sh.
2. Copy the environment variable file.
3. Create the /data/hadoop directory on the other machines:
       xcall.sh "mkdir /data/hadoop"
4. Start the Hadoop cluster:
   a. Format the namenode by running the following on s101: hdfs namenode -format
   b. Start the HDFS cluster: start-dfs.sh
   c. Verify the startup: jps, and the web UI on port 50070.

IV. Installing hive-0.13.1-cdh5.3.6
--------------------------------------------
1. Upload hive-0.13.1-cdh5.3.6.tar.gz (provided with the course) to s101 with WinSCP.
2. Unpack the Hive tarball: tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz
3. Rename the Hive directory: mv hive-0.13.1-cdh5.3.6 hive
4. Configure the Hive environment variables:
       nano /etc/profile
           export HIVE_HOME=/soft/hive
           export PATH=$PATH:$HIVE_HOME/bin
       source /etc/profile
5. Edit the configuration files.
   [hive-site.xml]
       mv hive-default.xml.template hive-site.xml
       <property>
           <name>javax.jdo.option.ConnectionURL</name>
           <value>jdbc:mysql://s101:3306/hive_metadata?createDatabaseIfNotExist=true</value>
       </property>
       <property>
           <name>javax.jdo.option.ConnectionDriverName</name>
           <value>com.mysql.jdbc.Driver</value>
       </property>
       <property>
           <name>javax.jdo.option.ConnectionUserName</name>
           <value>hive</value>
       </property>
       <property>
           <name>javax.jdo.option.ConnectionPassword</name>
           <value>hive</value>
       </property>
   [hive-env.sh and hive-config.sh]
       mv hive-env.sh.template hive-env.sh
       nano /soft/hive/bin/hive-config.sh
           export JAVA_HOME=/soft/jdk
           export HIVE_HOME=/soft/hive
           export HADOOP_HOME=/soft/hadoop
6. Verify that Hive was installed successfully.

V. Installing MySQL on centos-1 with yum
-------------------------------------------------
1. Install the MySQL yum repository on centos-1. CentOS 7 does not ship a MySQL yum repository, so download and install the repo rpm first:
       wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
       rpm -ivh mysql-community-release-el7-5.noarch.rpm
2. Install mysql-server with yum, start it, and enable it at boot:
       yum install -y mysql-server
       service mysqld start
       chkconfig mysqld on
3. Install the MySQL connector with yum:
       yum install -y mysql-connector-java
4. Copy the MySQL connector into Hive's lib directory:
       cp /usr/share/java/mysql-connector-java-5.1.17.jar /usr/local/hive/lib
5. Create the Hive metadata database in MySQL, create the hive account, and grant it privileges:
       create database if not exists hive_metadata;
       grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
       grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
       grant all privileges on hive_metadata.* to 'hive'@'s101' identified by 'hive';
       flush privileges;
       use hive_metadata;
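With MySQL in place, it is worth confirming that Hive can actually reach the new metastore before continuing (a sketch; it assumes the grants above took effect and the connector jar is in Hive's lib directory; metastore_test is just an illustrative table name):

    # the hive account should be able to log in and see hive_metadata
    mysql -u hive -phive -e "show databases;"
    # creating and dropping a table forces Hive to populate its metadata tables in MySQL
    hive -e "create table metastore_test(id int); show tables; drop table metastore_test;"
    # the metadata tables (DBS, TBLS, ...) should now exist in hive_metadata
    mysql -u hive -phive -e "use hive_metadata; show tables;"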
VI. Setting up ZooKeeper
--------------------------------------
1. Install ZooKeeper:
   a. Copy zookeeper-3.4.5-cdh5.3.6.tar.gz (provided with the course) to /usr/local on sparkproject1 with WinSCP.
   b. Unpack the tarball: tar -zxvf zookeeper-3.4.5-cdh5.3.6.tar.gz
   c. Rename the ZooKeeper directory: mv zookeeper-3.4.5-cdh5.3.6 zk
   d. Configure the ZooKeeper environment variables:
          vi ~/.bashrc
              export ZOOKEEPER_HOME=/soft/zk
              export PATH=$PATH:$ZOOKEEPER_HOME/bin
          source ~/.bashrc
2. Configure ZooKeeper:
       cd zk/conf
       mv zoo_sample.cfg zoo.cfg
       vi zoo.cfg
   Change:
       dataDir=/usr/local/zk/data
   Add:
       server.0=sparkproject1:2888:3888
       server.1=sparkproject2:2888:3888
       server.2=sparkproject3:2888:3888
3. Set the ZooKeeper node id:
       cd zk
       mkdir data
       cd data
       vi myid
   and write 0 into it.
4. Build out the ZooKeeper cluster:
   a. Configure ZooKeeper on the other two nodes following the steps above; copying zk and ~/.bashrc to sparkproject2 and sparkproject3 with scp is enough.
   b. The only difference is the node id: set it to 1 on sparkproject2 and 2 on sparkproject3.
5. Start the ZooKeeper cluster:
   a. Run zkServer.sh start on each of the three machines.
   b. Check the ZooKeeper status with zkServer.sh status; there should be one leader and two followers.
   c. Run jps and confirm that a QuorumPeerMain process exists on all three nodes.

VII. Installing Scala
-------------------------------------
1. Copy scala-2.11.4.tgz (provided with the course) to /usr/local on sparkproject1 with WinSCP.
2. Unpack the tarball: tar -zxvf scala-2.11.4.tgz
3. Rename the Scala directory: mv scala-2.11.4 scala
4. Configure the Scala environment variables:
       vi ~/.bashrc
           export SCALA_HOME=/usr/local/scala
           export PATH=$PATH:$SCALA_HOME/bin
       source ~/.bashrc
5. Check that Scala installed successfully: scala -version
6. Install Scala on sparkproject2 and sparkproject3 following the same steps; copying scala and ~/.bashrc to the other two machines with scp is enough.

VIII. Installing Kafka
-------------------------------------
1. Copy kafka_2.9.2-0.8.1.tgz (provided with the course) to /usr/local on sparkproject1 with WinSCP.
2. Unpack the tarball: tar -zxvf kafka_2.9.2-0.8.1.tgz
3. Rename the Kafka directory: mv kafka_2.9.2-0.8.1 kafka
4. Configure Kafka:
       vi /usr/local/kafka/config/server.properties
   broker.id: an increasing integer (0, 1, 2) that uniquely identifies each broker in the cluster
       zookeeper.connect=192.168.1.105:2181,192.168.1.106:2181,192.168.1.107:2181
5. Install slf4j:
   Upload slf4j-1.7.6.zip (provided with the course) to /usr/local, then:
       unzip slf4j-1.7.6.zip
   Copy slf4j-nop-1.7.6.jar from the slf4j directory into Kafka's libs directory.
6. Start the Kafka cluster:
   a. Work around the Kafka "Unrecognized VM option 'UseCompressedOops'" problem:
          vi /usr/local/kafka/bin/kafka-run-class.sh
      The file contains:
          if [ -z "$KAFKA_JVM_PERFORMANCE_OPTS" ]; then
              KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true"
          fi
      Remove -XX:+UseCompressedOops from that line.
   b. In the kafka directory on each of the three machines, run:
          nohup /soft/kafka/bin/kafka-server-start.sh /soft/kafka/config/server.properties &
   c. Check with jps that the startup succeeded.
7. Test the Kafka cluster with the basic commands to verify the setup:
       /soft/kafka/bin/kafka-topics.sh --zookeeper s101:2181,s102:2181,s103:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
       /soft/kafka/bin/kafka-console-producer.sh --broker-list s101:9092,s102:9092,s103:9092 --topic TestTopic
       /soft/kafka/bin/kafka-console-consumer.sh --zookeeper s101:2181,s102:2181,s103:2181 --topic TestTopic --from-beginning

IX. Installing Flume
------------------------------------------------
1. Install Flume:
   a. Copy flume-ng-1.5.0-cdh5.3.6.tar.gz (provided with the course) to /usr/local on sparkproject1 with WinSCP.
   b. Unpack the tarball: tar -zxvf flume-ng-1.5.0-cdh5.3.6.tar.gz
   c. Rename the Flume directory: mv apache-flume-1.5.0-cdh5.3.6-bin flume
   d. Configure the Flume environment variables:
          vi ~/.bashrc
              export FLUME_HOME=/usr/local/flume
              export FLUME_CONF_DIR=$FLUME_HOME/conf
              export PATH=$PATH:$FLUME_HOME/bin
          source ~/.bashrc
2. Configure Flume:
       vi /usr/local/flume/conf/flume-conf.properties
       # agent1 is the agent name
       agent1.sources=source1
       agent1.sinks=sink1
       agent1.channels=channel1
       # configure source1
       agent1.sources.source1.type=spooldir
       agent1.sources.source1.spoolDir=/usr/local/logs
       agent1.sources.source1.channels=channel1
       agent1.sources.source1.fileHeader = false
       agent1.sources.source1.interceptors = i1
       agent1.sources.source1.interceptors.i1.type = timestamp
       # configure channel1
       agent1.channels.channel1.type=file
       agent1.channels.channel1.checkpointDir=/usr/local/logs_tmp_cp
       agent1.channels.channel1.dataDirs=/usr/local/logs_tmp
       # configure sink1
       agent1.sinks.sink1.type=hdfs
       agent1.sinks.sink1.hdfs.path=hdfs://s101:9000/logs
       agent1.sinks.sink1.hdfs.fileType=DataStream
       agent1.sinks.sink1.hdfs.writeFormat=TEXT
       agent1.sinks.sink1.hdfs.rollInterval=1
       agent1.sinks.sink1.channel=channel1
       agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d
3. Test Flume (a concrete example follows this section):
       flume-ng agent -n agent1 -c conf -f /soft/flume/conf/flume-conf.properties -Dflume.root.logger=DEBUG,console
   Create a file and move it into /usr/local/logs; Flume automatically uploads it to the /logs directory on HDFS.
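A concrete version of the test in step 3 above (a sketch; it assumes the agent from step 3 is already running, and that /usr/local/logs exists -- the spooling-directory source fails at startup if it does not):

    mkdir -p /usr/local/logs
    # write the sample file elsewhere first, then move it in, so the
    # spooldir source never sees a half-written file
    echo "hello flume $(date)" > /tmp/sample.log
    mv /tmp/sample.log /usr/local/logs/
    # within a few seconds a file prefixed with today's date (hdfs.filePrefix=%Y-%m-%d)
    # should appear under /logs on HDFS
    hdfs dfs -ls /logs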
X. Installing Spark
-----------------------------------------
1. Install Spark:
   a. Upload spark-1.5.1-bin-hadoop2.4.tgz to /usr/local with WinSCP.
   b. Unpack the Spark tarball: tar -zxvf spark-1.5.1-bin-hadoop2.4.tgz
   c. Rename the Spark directory: mv spark-1.5.1-bin-hadoop2.4 spark
   d. Configure the Spark environment variables:
          vi ~/.bashrc
              export SPARK_HOME=/usr/local/spark
              export PATH=$PATH:$SPARK_HOME/bin
              export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
          source ~/.bashrc
2. Edit the spark-env.sh file:
   a. cd /usr/local/spark/conf
   b. cp spark-env.sh.template spark-env.sh
   c. vi spark-env.sh
          export JAVA_HOME=/soft/jdk
          export SCALA_HOME=/soft/scala
          export HADOOP_HOME=/soft/hadoop
          export HADOOP_CONF_DIR=/soft/hadoop/etc/hadoop
3. Submit a Spark job in yarn-client mode:
       /soft/spark/bin/spark-submit \
           --class org.apache.spark.examples.JavaSparkPi \
           --master yarn-client \
           --num-executors 1 \
           --driver-memory 10m \
           --executor-memory 10m \
           --executor-cores 1 \
           /soft/spark/lib/spark-examples-1.5.1-hadoop2.4.0.jar
4. Submit a Spark job in yarn-cluster mode:
       /soft/spark/bin/spark-submit \
           --class org.apache.spark.examples.JavaSparkPi \
           --master yarn-cluster \
           --num-executors 1 \
           --driver-memory 10m \
           --executor-memory 10m \
           --executor-cores 1 \
           /soft/spark/lib/spark-examples-1.5.1-hadoop2.4.0.jar
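In yarn-cluster mode the driver runs inside YARN, so the "Pi is roughly ..." line does not appear on the local console. One way to find it (a sketch; it assumes YARN log aggregation is enabled, and the application id below is only a placeholder -- substitute the id printed by spark-submit):

    # list finished applications to find the application id
    yarn application -list -appStates FINISHED
    # pull the aggregated logs and look for the result
    yarn logs -applicationId application_1547000000000_0001 | grep "Pi is roughly"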
Reprinted from blog.csdn.net/xcvbxv01/article/details/86314954