Hadoop 2.8.0 fully distributed mode of operation

1. Prepare the virtual machines

  Prepare three machines (install JDK 8, turn off the firewall, configure a static IP, set the host name)

  Modify the /etc/hosts file:

192.168.138.102 hadoop102
192.168.138.103 hadoop103
192.168.138.104 hadoop104
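
    A minimal sketch of that prep on each node, assuming CentOS 7 (suggested by the el7 JDK path used later in this document):

systemctl stop firewalld            # turn off the firewall now
systemctl disable firewalld         # keep it off after reboot
hostnamectl set-hostname hadoop102  # use hadoop103 / hadoop104 on the other nodes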

2. Write a cluster distribution script xsync

  2.1 scp (secure copy)

    2.1.1 scp definition

      scp copies data between servers;

    2.1.2 The basic syntax

scp -r $pdir/$fname $user@hadoop$host:$pdir/$fname
# command | recursive | file path/name to copy | destination user@host:destination path/name

    2.1.3 Operating on a third host, copy the /opt/module/hadoop directory on 192.168.138.55 to 192.168.138.66:

scp -r 192.168.138.55:/opt/module/hadoop 192.168.138.66:/opt/module

    2.1.4 On the 192.168.138.77 host, pull the software under /opt/module on the 192.168.138.55 server to 192.168.138.77:

scp -r 192.168.138.55:/opt/module/hadoop 192.168.138.77:/opt/module

  2.2 rsync remote synchronization tool

    rsync is a remote synchronization tool, used mainly for backup and mirroring. Its advantages are speed, avoiding copies of identical content, and support for symbolic links;

    Difference between rsync and scp: copying files with rsync is faster than with scp; rsync updates only the files that differ, while scp copies every file over;

    2.2.1 View the rsync manual

man rsync | more

    2.2.2 The basic syntax

rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
# command | options | file path/name to copy | destination user@host:destination path

    2.2.3 Option description

      -r  copy directories recursively
      -v  verbose output
      -l  copy symbolic links

    2.2.4 Synchronize the /opt/software directory on machine 192.168.138.55 to /opt/software on server 192.168.138.66:

rsync -rvl /opt/software/* 192.168.138.66:/opt/software/

    2.2.5 Plain copy:

rsync -rvl /opt/module 192.168.138.77:/opt/

    2.2.6 Script implementation

#!/bin/bash
# 1. Get the number of arguments; exit if there are none
pcount=$#
if ((pcount==0)); then
    echo no args;
    exit;
fi

# 2. Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

# 3. Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

# 4. Get the current user name
user=`whoami`

# 5. Loop over the hosts hadoop102 to hadoop104
for ((host=102; host<105; host++)); do
    echo --------------------- hadoop$host ----------------
    rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
done

    2.2.7 Give the xsync script execute permission

chmod 777 xsync

    2.2.8 execute the script file

./xsync xsync 

3. Cluster configuration

  1. Cluster Deployment Planning

            hadoop102      hadoop103          hadoop104

    HDFS    NameNode       DataNode           SecondaryNameNode
            DataNode                          DataNode

    YARN    NodeManager    ResourceManager    NodeManager
                           NodeManager

  2. Configure the cluster (three machines)

    2.1 core-site.xml configuration file [in the /opt/module/hadoop/etc/hadoop directory]

<configuration>
     <!-- NameNode address and port -->
     <property>
          <name>fs.defaultFS</name>
          <value>hdfs://hadoop102:9000</value>
     </property>
     <!-- Directory for files generated while Hadoop runs -->
     <property>
          <name>hadoop.tmp.dir</name>
          <value>/opt/module/hadoop/data/tmp</value>
     </property>
</configuration>

    2.2 hdfs-site.xml configuration file

<configuration>
     <!-- Number of HDFS replicas -->
     <property>
         <name>dfs.replication</name>
         <value>3</value>
     </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:50090</value>
    </property>
</configuration>

    2.3 hadoop-env.sh configuration file

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64

    2.4 yarn-env.sh configuration file

# some Java parameters
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64

    2.5 yarn-site.xml configuration file

<configuration>
<!-- Site specific YARN configuration properties -->
     <!-- How the Reducer fetches data -->
     <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
     </property>
    <!-- Address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
</configuration>

    2.6 mapred-env.sh configuration file

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64

    2.7 mapred-site.xml configuration file

<configuration>
<!-- Run MapReduce on YARN -->
     <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
     </property>
</configuration>

    2.8 Distribute the configured Hadoop configuration files across the cluster

./xsync /opt/module/hadoop/

4. Start the cluster one daemon at a time (single-point start)

  4.1 If this is the first time the cluster is started, the NameNode must be formatted first

bin/hdfs namenode -format

    

  4.2 Start the NameNode on hadoop102

hadoop-daemon.sh start namenode

  4.3 Start a DataNode on each of hadoop102, hadoop103, and hadoop104

hadoop-daemon.sh start datanode

  4.4 Start the SecondaryNameNode on hadoop104

hadoop-daemon.sh start secondarynamenode
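
  On each node, jps can confirm which daemons are running; for example on hadoop102 (process ids illustrative):

jps
# 3461 NameNode
# 3608 DataNode
# 3715 Jps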

  4.5 Access

    Open http://192.168.138.102:50070 in a browser to check that the NameNode is up.

  4.6 In a fully distributed Hadoop environment, the DataNodes start normally but are not displayed on the NameNode web page

    Solutions are as follows:

      1. Check that /etc/hosts contains the hostname-to-IP mappings of all slave nodes;

      2. Modify hdfs-site.xml on the NameNode machine, add the configuration below, and then restart the DataNodes

<property>
     <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
     <value>false</value>
</property>

5. SSH passwordless login configuration

  5.1 Configuring ssh

    5.1.1 ssh basic principles

      SSH can guarantee security because it uses public-key encryption, as follows:

        1. The remote host receives the user's login request and sends its own public key to the user;

        2. The user encrypts the login password with this public key and sends it back;

        3. The remote host decrypts the password with its own private key; if the password is correct, the user is allowed to log in;

    5.1.2 The basic syntax

      If the user name is java and the remote host name is linux, the command is as follows:

        $ssh java@linux

      SSH's default port is 22, i.e. your login request is sent to port 22 of the remote host. The -p parameter changes the port, for example to port 88:

        $ssh -p 88 java@linux

      Note: if the error "ssh: Could not resolve hostname linux: Name or service not known" appears, the host name linux is unknown to the name server; add the host and its corresponding IP to /etc/hosts:

        192.168.138.102  linux

  5.2 Passwordless key configuration

    5.2.1 The principle of passwordless login

      When the master acts as a client and wants to connect to the slave with passwordless public-key authentication, the master must generate a key pair consisting of a public key and a private key, and then copy the public key to the slave. When the master later connects to the slave via ssh, the slave generates a random number, encrypts it with the master's public key, and sends it to the master. The master decrypts it with its private key and returns the decrypted number to the slave; once the slave confirms the number is correct, it allows the master to connect. This is the public-key authentication process, during which the user never has to enter a password manually.

    5.2.2 Generate a passwordless key pair on the master host (hadoop102)

ssh-keygen -t rsa

      When prompted for a save path after running the command, just press Enter to accept the default path;

      The generated key pair, id_rsa (private key) and id_rsa.pub (public key), is stored by default in the user's '~/.ssh' directory;

      

    5.2.3 View the key pair

cd ~/.ssh
ls

      

    5.2.4 Send the public key from the master (hadoop102) node to the remote host

ssh-copy-id hadoop103

      

      Check on hadoop103 whether the transfer succeeded (the key should now be in ~/.ssh/authorized_keys);
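
      Assuming the three-node layout above, the key should likewise be copied to hadoop104 and to hadoop102 itself (so start-dfs.sh can ssh into the local machine without a password):

ssh-copy-id hadoop102
ssh-copy-id hadoop104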

      

    5.2.5 Test passwordless login

ssh hadoop103

      

  5.3 Files in the .ssh folder explained

    (1) known_hosts: records the public keys of computers ssh has visited
    (2) id_rsa: the generated private key
    (3) id_rsa.pub: the generated public key
    (4) authorized_keys: stores the public keys authorized for passwordless login to this server

6. Start the whole cluster at once

  6.1 Configure the slaves file

cd /opt/module/hadoop/etc/hadoop/
vim slaves

      Add all DataNode host names (no trailing spaces or blank lines are allowed in this file):

hadoop102
hadoop103
hadoop104

    Once configured, distribute the file to the other nodes

./xsync /opt/module/hadoop/etc/hadoop/

  6.2 Start the cluster

    6.2.1 If this is the first start, format the NameNode first

bin/hdfs namenode -format

    6.2.2 Start HDFS on the hadoop102 machine

start-dfs.sh
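
      If startup succeeds, jps on each machine should show the daemons from the deployment plan in section 3:

        hadoop102: NameNode, DataNode
        hadoop103: DataNode
        hadoop104: SecondaryNameNode, DataNode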

      

      

      

      

    6.2.3 Start YARN on the hadoop103 machine

start-yarn.sh
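
      jps should now additionally show the YARN daemons from the deployment plan:

        hadoop102: NodeManager
        hadoop103: ResourceManager, NodeManager
        hadoop104: NodeManager

      Note that start-yarn.sh must be run on hadoop103, where the ResourceManager is configured; running it elsewhere leaves the ResourceManager unstarted.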

      

       

      

      

    6.2.4 Access

      HDFS:http://192.168.138.102:50070

        

      YARN:http://192.168.138.103:8088

        

7. Run a MapReduce program to test the cluster

  7.1 Create a folder named wcinput under the hadoop directory

mkdir wcinput

  7.2 Create a wc.input file in the wcinput folder and edit it

cd wcinput
touch wc.input
vim wc.input
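
    Any sample text works as wordcount input; for example:

hadoop yarn
hadoop mapreduce
hadoop hdfs
yarn hadoop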

    

  7.3 Return to the /opt/module/hadoop directory

  7.4 Upload the input folder to HDFS

hadoop fs -put wcinput /
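
    Optionally confirm the upload with the standard HDFS shell listing:

hadoop fs -ls /wcinput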

    

  7.5 Run the MapReduce program

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /wcinput /wcoutput

    

  7.6 View the results

hadoop fs -cat /wcoutput/*
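
    With the sample wc.input shown above, the output would look like:

hadoop	4
hdfs	1
mapreduce	1
yarn	2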

8. Summary of ways to start/stop the cluster

  8.1 Start/stop each component one by one

    8.1.1 Start/stop HDFS components individually

hadoop-daemon.sh  start|stop  namenode|datanode|secondarynamenode

    8.1.2 Start/stop YARN components individually

yarn-daemon.sh  start|stop  resourcemanager|nodemanager

  8.2 Start/stop each module as a whole (requires ssh to be configured); the common approach

    8.2.1 Start/stop HDFS as a whole

start-dfs.sh 
stop-dfs.sh

    8.2.2 Start/stop YARN as a whole

start-yarn.sh 
stop-yarn.sh

9. Cluster time synchronization

  Approach: designate one machine as the time server; all other machines synchronize with it on a schedule, for example once every ten minutes.

  9.1 Check whether ntp is installed

rpm -qa|grep ntp
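
    If the command prints nothing, ntp is not installed; on a yum-based system it can be installed with:

yum install -y ntp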

    

  9.2 Check whether the ntpd service is running

service ntpd status

    

    If it is running, stop it first, then perform the steps below;

  9.3 Modify the ntp configuration file

vim /etc/ntp.conf

    9.3.1 Change 1 (authorize all machines on the 192.168.1.0 subnet to query and synchronize time from this machine)

# change
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# to
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

    9.3.2 Change 2 (the cluster is on a LAN and should not use external time sources): comment out the default upstream servers

#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

    9.3.3 Addition 3 (even if this node loses its network connection, it can still serve time to the other nodes in the cluster)

server 127.127.1.0
fudge 127.127.1.0 stratum 10

  9.4 Modify the /etc/sysconfig/ntpd file

vim /etc/sysconfig/ntpd

    Add the following line (synchronize the hardware clock with the system time)

SYNC_HWCLOCK=yes

  9.5 Restart the ntpd service

service ntpd start

  9.6 Set the ntpd service to start on boot
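
    With the SysV-style service management used above, this is typically done with:

chkconfig ntpd on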

    9.6.1 On the other machines, configure a cron job that synchronizes with the time server every minute

crontab -e

      Add the following entry:

*/1 * * * * /usr/sbin/ntpdate hadoop102

    9.6.2 Change the time on any machine (for testing)

date -s "2017-9-11 11:11:11"

    9.6.3 After one minute, check whether that machine has synchronized with the time server

date

      

 
