Hadoop environment construction

1. Stand-alone mode

       1. Create a new virtual machine

             Install the CentOS system; the root user exists by default

        2. Create a new user

               useradd centos, then set its password with passwd centos

               Grant the centos user sudo privileges in /etc/sudoers: make the file temporarily writable with chmod -v u+w /etc/sudoers, add the line centos ALL=(ALL) ALL,

               then restore the permission with chmod -v u-w /etc/sudoers. The centos user now prefixes privileged commands with sudo (see the sketch below).
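
               A minimal sketch of this step, run as root (appending with echo stands in for editing /etc/sudoers by hand):

                   useradd centos                                # create the user
                   passwd centos                                 # set its password
                   chmod -v u+w /etc/sudoers                     # temporarily allow editing
                   echo 'centos ALL=(ALL) ALL' >> /etc/sudoers   # grant sudo to centos
                   chmod -v u-w /etc/sudoers                     # restore the original permission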

         3. Configure the network and set the hostname

                 vi /etc/sysconfig/network-scripts/ifcfg-eth0 , set ONBOOT=yes , then verify connectivity with the ping command
                 vi /etc/hostname : set the hostname; if the hostname file does not exist, create it yourself
                 vi /etc/hosts : map hostnames to IP addresses (see the sketch below)
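
                 A minimal sketch of the three edits (the IP address and the hostname s0 are placeholders taken from later in this guide):

                     # /etc/sysconfig/network-scripts/ifcfg-eth0: bring the interface up at boot
                     ONBOOT=yes
                     # /etc/hostname: this machine's hostname
                     s0
                     # /etc/hosts: map the hostname to this machine's IP
                     192.168.176.137   s0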

             4. Install jdk and hadoop

                 Download xshell and xftp to connect to the virtual machine, and transfer the jdk and hadoop installation packages to the virtual machine

                 Extraction: tar -xzvf (extracting the archive is the installation), then move the extracted directories to their own locations: the JDK to /usr/java and Hadoop to /opt/ (see the sketch below)
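
                 A minimal sketch, assuming the archives are named jdk-7u80-linux-x64.tar.gz and hadoop-2.7.5.tar.gz (hypothetical file names; use whatever packages you uploaded):

                     tar -xzvf jdk-7u80-linux-x64.tar.gz     # extract the JDK
                     tar -xzvf hadoop-2.7.5.tar.gz           # extract Hadoop
                     sudo mkdir -p /usr/java                 # target directory for the JDK
                     sudo mv jdk1.7.0_80 /usr/java/          # matches JAVA_HOME below
                     sudo mv hadoop-2.7.5 /opt/              # matches HADOOP_HOME below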

              5. Configure the environment /etc/profile

                 export JAVA_HOME=/usr/java/jdk1.7.0_80
                 export PATH=$JAVA_HOME/bin:$PATH

                 export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

                 export HADOOP_HOME=/opt/hadoop-2.7.5
                 export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

                 To make the configuration take effect, run the source command: source /etc/profile

              6. hadoop fs -ls lists local files, i.e. Hadoop is running in stand-alone (local) mode; see the verification sketch below
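
                 A quick verification sketch for steps 5 and 6:

                     source /etc/profile    # reload the environment
                     java -version          # should report java version 1.7.0_80
                     hadoop version         # should report Hadoop 2.7.5
                     hadoop fs -ls /        # with no HDFS configured yet, this lists the local / directory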


2. Pseudo-distributed mode

     1. Working from the stand-alone setup, under /opt/hadoop-2.7.5/etc/, copy the configuration directory for pseudo-distributed use: cp -R hadoop hadoop_virtual

           Rename the original hadoop directory, then point the cluster at the copy with a symlink: ln -s hadoop_virtual hadoop (Hadoop reads the hadoop directory by default, so the link makes it use the pseudo-distributed configuration)
       2. Edit the JAVA_HOME value in /opt/hadoop-2.7.5/etc/hadoop_virtual/hadoop-env.sh: export JAVA_HOME=/usr/java/jdk1.7.0_80
       3. Configure the files  under /opt/hadoop-2.7.5/etc/hadoop_virtual
          vi core-site.xml : (choose your own path for hadoop.tmp.dir; the directory must exist with permission 777) --- I did not set this when I built the environment, and no error was reported!

                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://localhost/</value>
                </property>
                <property>
                          <name>hadoop.tmp.dir</name>
                          <value>/usr/hadoop_tmp</value>
                </property>

         vi hdfs-site.xml

                     <property>
                            <name>dfs.replication</name>
                            <value>1</value>
                      </property>

           vi mapred-site.xml

                       <property>
                                <name>mapreduce.framework.name</name>
                                <value>yarn</value>
                      </property>

            vi yarn-site.xml
                     <property>
                            <name>yarn.resourcemanager.hostname</name>
                            <value>localhost</value>
                    </property>
                    <property>
                            <name>yarn.nodemanager.aux-services</name>
                            <value>mapreduce_shuffle</value>
                     </property>
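
             Each of the property blocks above goes inside the <configuration> root element of its file; for example, core-site.xml for this step would look roughly like:

                <configuration>
                        <property>
                                <name>fs.defaultFS</name>
                                <value>hdfs://localhost/</value>
                        </property>
                        <property>
                                <name>hadoop.tmp.dir</name>
                                <value>/usr/hadoop_tmp</value>
                        </property>
                </configuration>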
   

             4. Configure password-free login ssh

               ssh-keygen -t rsa : generate the public/private key pair
              the keys are stored under ~/.ssh (check with ls ~/.ssh); append the public key to authorized_keys under /home/centos/.ssh/ : cat id_rsa.pub >> authorized_keys
              set the file permissions required for password-free login: chmod 700 ~/.ssh and chmod 600 ~/.ssh/authorized_keys (see the sketch below)
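
             A minimal sketch of the whole step, run as the centos user:

                 ssh-keygen -t rsa                                 # generate the key pair (accept the defaults)
                 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize this key for the local account
                 chmod 700 ~/.ssh
                 chmod 600 ~/.ssh/authorized_keys
                 ssh localhost                                     # should log in without prompting for a password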
              
             5. Format HDFS and restart   

                hdfs namenode -format

             6. Start the Hadoop cluster

                 /opt/hadoop-2.7.5/sbin/start-all.sh , verify with the jps command (see the sketch below)
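
                 A sketch of steps 5 and 6 with the expected verification:

                     hdfs namenode -format                   # format HDFS (first run only)
                     /opt/hadoop-2.7.5/sbin/start-all.sh     # start HDFS and YARN
                     jps                                     # should list NameNode, DataNode, SecondaryNameNode,
                                                             # ResourceManager and NodeManager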


3. Fully distributed mode   

       1. Working from the pseudo-distributed setup, under /opt/hadoop-2.7.5/etc/, copy the configuration directory for fully distributed use: cp -R hadoop_virtual hadoop_cluster

          Remove the existing hadoop symlink and re-point it at the new directory: ln -s hadoop_cluster hadoop (I only keep separate copies so the three modes can coexist and stay distinguishable; instead of copying and re-pointing, you can simply keep editing the original configuration, which ends up as the fully distributed environment). A sketch of the switch follows below.
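
          A minimal sketch of the switch, run under /opt/hadoop-2.7.5/etc/:

              cp -R hadoop_virtual hadoop_cluster    # copy the pseudo-distributed configuration
              rm hadoop                              # remove the old symlink (it pointed to hadoop_virtual)
              ln -s hadoop_cluster hadoop            # point hadoop at the fully distributed configuration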

       2. Clone three virtual machines (s1, s2, s3) and configure the network and hostname on each (cloning changes the MAC address, so the IP changes as well; the fix is explained at the end)

       3. Configure /etc/hosts on the host (s0) so that all machines can reach each other by name; verify with ping

              192.168.176.137     s0
              192.168.176.138     s1
              192.168.176.139     s2
              192.168.176.140     s3

         4. Remotely copy /etc/hosts from the host (s0) to each machine:

             sudo scp /etc/hosts centos@s1:/etc/ (repeat for s2 and s3); verify with ssh s1 that the hosts can log in to each other without a password (see the sketch below)
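
             A sketch of the copy, assuming the centos user exists on every node and password-free ssh is already in place:

                 for host in s1 s2 s3; do
                     sudo scp /etc/hosts centos@$host:/etc/    # push the shared hosts file to each node
                 done
                 ssh s1                                        # should log in without a password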

       5. On the host, configure the files under /opt/hadoop-2.7.5/etc/hadoop_cluster (similar to the pseudo-distributed setup: s0 is the master node running the namenode, s1 and s2 are slave nodes running datanodes, and s3 keeps a copy of the namenode)

          vi core-site.xml  :   

                <property>
                        <name>fs.defaultFS</name>
                        <value>hdfs://s0/</value>
                </property>
                <property>
                          <name>hadoop.tmp.dir</name>
                          <value>/usr/hadoop_tmp</value>
                </property>

         vi  hdfs-site.xml 

                     <property>
                            <name>dfs.replication</name>
                            <value>2</value>
                      </property>

           vi  mapred-site.xml : 

                       <property>
                                <name>mapreduce.framework.name</name>
                                <value>yarn</value>
                      </property>

            vi  yarn-site.xml 
                     <property>
                            <name>yarn.resourcemanager.hostname</name>
                            <value>s0</value>
                    </property>
                    <property>
                            <name>yarn.nodemanager.aux-services</name>
                            <value>mapreduce_shuffle</value>
                     </property>

           

             6. Configure the slaves file under /opt/hadoop-2.7.5/etc/hadoop_cluster on the host (s0) as:

                              s1
                              s2 

             7. Remotely copy /opt/hadoop-2.7.5/etc/hadoop_cluster from the host (s0) to each machine:

                   sudo scp -r hadoop_cluster centos@s3:/opt/hadoop-2.7.5/etc/ (an existing folder will be overwritten); ssh to each machine to check that the files have changed (see the sketch below)
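
                   A sketch of the copy to all nodes, run from /opt/hadoop-2.7.5/etc/ on s0:

                       for host in s1 s2 s3; do
                           sudo scp -r hadoop_cluster centos@$host:/opt/hadoop-2.7.5/etc/    # push the cluster configuration
                       done
                       ssh s1 ls /opt/hadoop-2.7.5/etc/hadoop_cluster    # spot-check one node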

            

            8. Format HDFS and restart

                hdfs namenode -format

            9. Start the Hadoop cluster

                 /opt/hadoop-2.7.5/sbin/start-all.sh , verify with the jps command (the output differs from the pseudo-distributed case: s0 shows the namenode-related processes, while s1 and s2 show the datanode-related ones); see the sketch below
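
                 A sketch of the verification and the process names one would typically expect on each node:

                     /opt/hadoop-2.7.5/sbin/start-all.sh    # run on s0
                     jps                                    # on s0: NameNode and ResourceManager (namenode-related)
                     ssh s1 jps                             # on s1 and s2: DataNode and NodeManager (datanode-related)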




