Big Data: Building a Hadoop Cluster

Hadoop environment setup

Note: three important URLs
    Hadoop download: https://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/
    Hadoop official site: hadoop.apache.org
    Hadoop setup guide: https://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/SingleCluster.html

Hadoop 2.x Deployment

        * Local Mode
        * Distributed Mode
        * Pseudo-distributed
                One machine runs all of the daemons,
                including the slave-node daemons DataNode and NodeManager
        * Fully distributed
                Multiple slave nodes:
                DataNodes
                NodeManagers
        Configuration file
                $HADOOP_HOME/etc/hadoop/slaves

  1. Download the software: jdk1.8 and hadoop-2.5.0 (both Linux builds; choose the 32- or 64-bit package to match your Linux installation). If the servers have Internet access, the tarball can also be fetched directly, as shown below.
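
    For example, the Hadoop tarball can be pulled straight from the Apache archive, skipping the FileZilla transfer in the next step (the JDK itself still has to be downloaded from Oracle's site):

    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.5.0/hadoop-2.5.0.tar.gz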

  2. Copy the software to /apps on the Linux machines with FileZilla.

  3. Install the JDK

    Configure the environment variables (Linux commands):
    vi /etc/profile
    Append at the end of the file:
    export JAVA_HOME=/apps/java/jdk1.8
    export PATH=$PATH:$JAVA_HOME/bin
    Save and exit, then run source /etc/profile
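
    A quick sanity check that the JDK is wired up correctly:

    java -version          # should report version 1.8
    echo $JAVA_HOME        # should print /apps/java/jdk1.8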
    
  4. Install Hadoop 2.5.0 (it requires JDK 1.7 or later); a sketch of the unpacking step follows.
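
    A minimal sketch of the unpacking step, assuming the tarball was uploaded to /apps and Hadoop should end up at /apps/hadoop/hadoop-2.5.0 (the path the configs below rely on):

    mkdir -p /apps/hadoop
    tar -zxvf /apps/hadoop-2.5.0.tar.gz -C /apps/hadoop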

  5. The three machines' specs

IP address      Hostname   Memory   CPUs
192.168.0.129   CentOS1    1.5 GB   1
192.168.0.130   CentOS2    1 GB     1
192.168.0.131   CentOS3    1 GB     1
  6. Configure the hostname mappings

    /etc/hosts (on Windows, the hosts file lives at C:\Windows\System32\drivers\etc; update it there too)
    	192.168.0.129		www.shy1.com	CentOS1
    	192.168.0.130		www.shy2.com	CentOS2
    	192.168.0.131		www.shy3.com	CentOS3
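
    A quick check that the mappings resolve, run from any of the three machines:

    	ping -c 1 www.shy1.com
    	ping -c 1 CentOS3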
    
  7. Deployment plan for the cluster across the three machines

                  CentOS1              CentOS2              CentOS3
    HDFS          NameNode                                  SecondaryNameNode
                  DataNode             DataNode             DataNode
    YARN                               ResourceManager
                  NodeManager          NodeManager          NodeManager
    MapReduce     JobHistoryServer
    
  8. Configuration files

    • hdfs
      • hadoop-env.sh
        export JAVA_HOME=/apps/java/jdk1.8

      • core-site.xml

         <configuration>
             <property>
                 <name>fs.defaultFS</name>
                 <value>hdfs://www.shy1.com:8020</value>
             </property>
             <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/apps/hadoop/hadoop-2.5.0/data/tmp</value>
             </property>
             <property>
                 <name>fs.trash.interval</name>
                 <value>420</value>
             </property>
         </configuration>
        
      • hdfs-site.xml

         <configuration>
             <property>
                 <name>dfs.namenode.secondary.http-address</name>
                 <value>www.shy3.com:50090</value>
             </property>
         </configuration>
        
      • slaves

         CentOS1
         CentOS2
         CentOS3
        
    • yarn
      • yarn-env.sh
        export JAVA_HOME=/apps/java/jdk1.8

      • yarn-site.xml

         <configuration>
             <property>
                 <name>yarn.resourcemanager.hostname</name>
                 <value>CentOS2</value>
             </property>
             <property>
                 <name>yarn.nodemanager.aux-services</name>
                 <value>mapreduce_shuffle</value>
             </property>
             <property>
                 <name>yarn.nodemanager.resource.memory-mb</name>
                 <value>4096</value>
             </property>
             <property>
                 <name>yarn.nodemanager.resource.cpu-vcores</name>
                 <value>4</value>
             </property>
             <property>
                 <name>yarn.log-aggregation-enable</name>
                 <value>true</value>
             </property>
             <property>
                 <name>yarn.log-aggregation.retain-seconds</name>
                 <value>640800</value>
             </property>
         </configuration>
        
      • slaves

         CentOS1
         CentOS2
         CentOS3
        
    • mapreduce
      • mapred-env.sh
        export JAVA_HOME=/apps/java/jdk1.8

      • mapred-site.xml

         <configuration>
             <property>
                 <name>mapreduce.framework.name</name>
                 <value>yarn</value>
             </property>
             <property>
                 <name>mapreduce.jobhistory.address</name>
                 <value>www.shy1.com:10020</value>
             </property>
             <property>
                 <name>mapreduce.jobhistory.webapp.address</name>
                 <value>www.shy1.com:19888</value>
             </property>
         </configuration>
        

  9. Copy this configured Hadoop directory to the other two machines, as sketched below; the environment setup is then complete.
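
    A minimal sketch of the distribution step (assuming the root account and the same /apps layout on every machine; ssh-copy-id also sets up the passwordless SSH that start-dfs.sh and start-yarn.sh rely on):

    ssh-keygen -t rsa                 # run once on CentOS1, accepting the defaults
    ssh-copy-id root@CentOS2
    ssh-copy-id root@CentOS3
    scp -r /apps/hadoop/hadoop-2.5.0 root@CentOS2:/apps/hadoop/
    scp -r /apps/hadoop/hadoop-2.5.0 root@CentOS3:/apps/hadoop/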
  10. Test the cluster

* Basic tests
		Start the services (from the Hadoop home directory):
			On the very first startup, format the NameNode once (on CentOS1 only):
				bin/hdfs namenode -format
			Start the HDFS file system first:
				sbin/start-dfs.sh
			Then start YARN (on the second machine, i.e. the one the ResourceManager runs on):
				sbin/start-yarn.sh
			Finally, start the JobHistoryServer:
				sbin/mr-jobhistory-daemon.sh start historyserver
			Run jps to check that every daemon came up; start any missing one by hand.
			Open a browser and visit the NameNode host on port 50070. If the page is unreachable, disable the firewall on all of the virtual machines, as sketched below.
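
		A minimal sketch for disabling the firewall (assuming CentOS 6, where the service is iptables; CentOS 7 uses firewalld instead):

			service iptables stop         # stop the firewall immediately
			chkconfig iptables off        # keep it disabled across reboots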
		With the services up, verify they are actually usable with a few simple operations:
		* hdfs
			read/write operations:
			bin/hdfs dfs -mkdir -p /user/beifeng/tmp/conf
			bin/hdfs dfs -put etc/hadoop/*-site.xml /user/beifeng/tmp/conf
			bin/hdfs dfs -text /user/beifeng/tmp/conf/core-site.xml
		* yarn
			run a jar
		* mapreduce
			bin/yarn jar share/hadoop/mapreduce/hadoop*example*.jar wordcount /user/beifeng/mapreduce/wordcount/input /user/beifeng/mapreduce/wordcount/output
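
		The wordcount input directory must exist and contain files before the job is submitted; a minimal end-to-end sketch, reusing a config file as sample input (part-r-00000 is the standard reducer output file name):

			# before the job: create and populate the input directory
			bin/hdfs dfs -mkdir -p /user/beifeng/mapreduce/wordcount/input
			bin/hdfs dfs -put etc/hadoop/slaves /user/beifeng/mapreduce/wordcount/input
			# after the job: read the word counts back out
			bin/hdfs dfs -cat /user/beifeng/mapreduce/wordcount/output/part-r-00000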

Startup methods

* Start each service component individually
	* hdfs
		hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
	* yarn
		yarn-daemon.sh start|stop resourcemanager|nodemanager
	* mapreduce
		mr-jobhistory-daemon.sh start|stop historyserver
* Start each module separately
	* hdfs
		start-dfs.sh
		stop-dfs.sh
	* yarn
		start-yarn.sh
		stop-yarn.sh
* Start everything at once
	* start-all.sh
	* stop-all.sh
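
With the whole cluster up, jps on each machine should show the daemons from the deployment plan in step 7 (plus Jps itself; process IDs will differ):

	# CentOS1: NameNode, DataNode, NodeManager, JobHistoryServer
	# CentOS2: DataNode, ResourceManager, NodeManager
	# CentOS3: DataNode, SecondaryNameNode, NodeManager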

Reposted from blog.csdn.net/qq_40395687/article/details/84942939