Hadoop fully distributed cluster setup

Building Hadoop in fully distributed mode

Recommendations (pitfalls encountered):

  1. If the host operating system is Linux (I use deepin and installed two virtual machines), make sure the usernames are configured consistently. Mine were not, and the cluster would not start, because when the Hadoop master node starts the slaves it uses the master machine's username by default. So make sure all three machines use the same username!!

  2. Debugging MapReduce programs on Windows is a bit troublesome; you need to install winutils.

  3. Build one machine completely first (environment and configuration included), then clone it or copy everything over with scp (see the sketch after this list).

  4. If you want ZooKeeper and Kafka as well, they can be configured on the same machine directly. If you do not install them now, you can transfer them later with scp, which is quite quick.

  5. After cloning is finished, remember to modify the hostname!

  6. The whole process was a bit rough, and some details I ran into have been forgotten and not recorded. Suggestions are welcome; if you find problems, please let me know and I will correct them promptly.
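
A minimal sketch of recommendation 3, copying a fully configured machine's environment to another node with scp (the hostnames and paths below are examples only; adjust them to your own layout):

    # copy the unpacked JDK and Hadoop directories from the configured machine to a slave
    # (the target directories must already exist on the slave)
    $ scp -r /usr/local/jdk1.8.0 root@slave1:/usr/local/
    $ scp -r /home/whr/workbench/hadoop root@slave1:/home/whr/workbench/
    # copy the environment variables too, then run `source /etc/profile` on the slave
    $ scp /etc/profile root@slave1:/etc/profile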

Virtual machine installation

  1. Install two or more CentOS-7 virtual machines in VMware (I installed two here, as slaves, because the operating system I use myself is deepin)

  2. Transferring files to the virtual machines: I use scp, which is quick. On Windows, xshell with sftp is recommended.

  3. Leave the virtual machines' network configuration on the default NAT, and use the CentOS minimal installation

Configuration:

  1. Turn off the firewall

    $ systemctl status firewalld             # view firewall state
    $ systemctl stop firewalld               # stop the firewall temporarily
    $ systemctl disable firewalld            # prevent the firewall from starting on boot
  2. Disable SELinux

    Security-Enhanced Linux, abbreviated SELinux, is a Linux kernel module and a security subsystem of Linux.

    $ vim /etc/selinux/config
    # change SELINUX=enforcing to SELINUX=disabled
  3. Install ntp time synchronization service

    $ yum install -y ntp
    # set ntpd to start on boot
    $ systemctl enable ntpd.service
    # start the service
    $ systemctl start ntpd
    $ systemctl status ntpd    # check whether it is running

    ntpd may fail to start because of a conflict with chronyd:

    $ systemctl disable chronyd    # turn off chronyd, then start ntpd again and it should work

  4. Modify the hostname and configure a static IP

    (After cloning, these are also the things to modify: change the hostname and adjust the IP as needed; a hostname sketch follows at the end of this step)

    Step one: modify the /etc/sysconfig/network-scripts/ifcfg-xxx file

    The main parameters to change are:

    BOOTPROTO="static"
    ONBOOT="yes"
    IPADDR="172.16.125.128"
    NETMASK="255.255.255.0"
    GATEWAY="172.16.125.2"

    Step two: modify the contents of the /etc/sysconfig/network file

    The gateway and DNS are the same as above:

    # Created by anaconda
    GATEWAY=172.16.125.2
    DNS=172.16.125.2

    Step three: restart the network

    service network restart
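
    The hostname itself can be changed like this (a minimal sketch; "master" is just an example, use each machine's own name):

    $ hostnamectl set-hostname master    # or edit /etc/hostname and reboot
    $ hostname                           # verify the new name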
  5. Configuring ssh

    so that the master can log in to the slave nodes without a password

    $ ssh-keygen -t rsa

    The ~/.ssh directory now contains the following three files:

    -rw-r--r--. 1 root root  392 Sep 26 21:05 authorized_keys    # authorized keys
    -rw-------. 1 root root 1679 Sep 26 20:57 id_rsa             # private key
    -rw-r--r--. 1 root root  393 Sep 26 20:57 id_rsa.pub         # public key
    On the master, append the public keys of the three machines to authorized_keys. Command:
    $ sudo cat id_rsa.pub >> authorized_keys

    Then put the master's authorized_keys into the ~/.ssh directory of the other machines:

    $ sudo scp authorized_keys root@slave1:~/.ssh    # adjust user@host to your own slave node; repeat for slave2

    Modify the permissions of authorized_keys. Command:

    $ chmod 644 authorized_keys
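
    As an alternative to the manual copy above (not the method used in this post), ssh-copy-id can install the public key on each slave in one step:

    $ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave2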
    Test whether it was successful:

    Run ssh host2 and enter the username and password, then exit; run ssh host2 again and, if it goes straight into the system without a password, it was successful.

    If the following appears when logging in with ssh:

     The authenticity of host 'hadoop2 (192.168.238.130)' can't be established 

    then you need to modify the /etc/ssh/ssh_config configuration file and add the following two lines:

    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
  6. Finally: install and configure the JDK and Hadoop (a sketch of the environment variables follows; the Hadoop configuration files are below)
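
    A minimal sketch of the environment variables (paths are examples only; point them at wherever you unpacked the JDK and Hadoop):

    # append to /etc/profile, then run `source /etc/profile`
    export JAVA_HOME=/usr/local/jdk1.8.0           # example path
    export HADOOP_HOME=/home/whr/workbench/hadoop  # example path
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    # also set JAVA_HOME explicitly in $HADOOP_HOME/etc/hadoop/hadoop-env.sh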

    core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>        <!-- default file system location -->
            <value>hdfs://master:9000/</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>      <!-- hadoop working directory; namenode and datanode data -->
            <value>/home/whr/workbench/hadoop/data/</value>
        </property>
    </configuration>

    hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>     <!-- number of replicas -->
            <value>2</value>
        </property>
        <property>                           <!-- location of the secondary namenode, configured on one of the child nodes -->
            <name>dfs.namenode.secondary.http-address</name>
            <value>slave1:50090</value>
        </property>
    </configuration>

    mapred-site.xml

    <configuration>
        <property>                           <!-- run mapreduce on the yarn cluster so it is distributed -->
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop1</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>

    slaves configuration file

    slave1
    slave2

    masters configuration file (if present; the CDH version I use does not have this file)

    master
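
    If a configuration file changes later, the etc/hadoop directory can be pushed to the slaves again instead of re-cloning (a sketch; it assumes Hadoop sits at the same path on every node):

    $ scp -r $HADOOP_HOME/etc/hadoop root@slave1:$HADOOP_HOME/etc/
    $ scp -r $HADOOP_HOME/etc/hadoop root@slave2:$HADOOP_HOME/etc/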

Cloning

  1. Cloning in VMware is straightforward; nothing special to say

  2. After cloning is complete, modify the hostname in the /etc/hostname file!

  3. Add the IP address mappings of the three machines to /etc/hosts (see the sketch below)
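
    A sketch of the mapping (the IPs are examples; use the static IPs you actually assigned, and keep the names consistent with the slaves file):

    172.16.125.128  master
    172.16.125.129  slave1
    172.16.125.130  slave2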

Start the cluster

  1. Format the namenode (I formatted after cloning)

    $ hadoop namenode -format
    # if the following line appears at the end, the format succeeded; you can also see that the final status code is 0
    Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
  2. Start the cluster: go to the sbin directory (or add the sbin directory to the PATH environment variable first, then there is no need to change into it)

start-dfs.sh
start-yarn.sh
  3. The jps command

    After everything has started, check the results (ssh into the slaves from the master to check them):

    You can see that the NameNode on the master, the two DataNodes, and one SecondaryNameNode have all started

    Under yarn, the ResourceManager and the NodeManagers on the two slaves have also finished starting
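
    Roughly what jps shows (a sketch; the PIDs will differ, and which slave carries the SecondaryNameNode depends on hdfs-site.xml):

    # on the master
    $ jps
    2131 NameNode
    2547 ResourceManager
    2903 Jps

    # on slave1
    $ jps
    1875 DataNode
    1990 NodeManager
    2088 SecondaryNameNode
    2210 Jps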

  

  4. The HDFS web UI can then be accessed in a browser through port 50070 (e.g. http://master:50070)

You can run the mapreduce example program:

$ hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar pi 5 5
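
A couple of optional sanity checks after the job finishes (a sketch; it assumes the Hadoop binaries are on the PATH):

    $ hdfs dfs -ls /                                 # browse HDFS
    $ yarn application -list -appStates FINISHED     # list finished applications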

Hadoop configuration file reference:

https://www.cnblogs.com/xhy-shine/p/10530729.html

 

 
