Hadoop learning: building the basic environment

Purpose

Based on VMware Workstation 10.0 + CentOS 7 + Hadoop 3.2.0, build a Hadoop cluster environment on virtual machines, with a total of four nodes: one master node and three slave nodes.

Procedure

 

Step One: Create the virtual machine and install the system

Install VMware Workstation 10 and download the CentOS 7 image file to your computer in advance. The concrete installation steps will not be repeated here; instead, here are a few points that need attention while installing the system:

  1. Select minimal installation
  2. The network is disabled by default; enable it during installation
  3. There is no need to install VMware Tools
  4. Remember the root password you set, as well as the account and password you create

If the network was not enabled during installation, you can enable it manually after the system is installed:

# Edit the NIC configuration file 
vi /etc/sysconfig/network-scripts/ifcfg-ens33

Add or modify ONBOOT="yes" in the configuration file; as the name implies, this brings the NIC up at boot. Then restart the network service:

service network restart
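For reference, after the edit the relevant part of ifcfg-ens33 might look roughly like the sketch below; this is only an illustration, and the device name, UUID, and addressing mode on your machine may differ (DHCP is assumed here):

# /etc/sysconfig/network-scripts/ifcfg-ens33 (excerpt, sketch only)
TYPE="Ethernet"
BOOTPROTO="dhcp"        # or configure a static address here instead
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"            # bring this NIC up at boot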

Step Two: Log in to the virtual machines from the host with SecureCRT

Inside VMware, switching back and forth between the virtual machines and the host is too much trouble, so here we use SecureCRT to log in to the virtual machines over SSH, which is much more comfortable to work with.
The configuration is very simple: find a portable (green) version of SecureCRT online and install it on the host, create a new Session, keep the default SSH2 protocol, fill in the IP of the virtual machine you want to connect to as the Hostname, then enter the account and password to log in to the virtual machine.
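If you are not sure which IP to fill in as the Hostname, one option is to check it inside the virtual machine first; the ip command is normally present even on a minimal installation:

# Show the address assigned to the NIC; the "inet" line is the IP to enter in SecureCRT
ip addr show ens33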

Step Three: Basic configuration

  1. Create a hadoop_user account for operating Hadoop
    # Switch to the root user
    su
    # Create the user
    useradd hadoop_user
    # Set a password for the hadoop_user user
    passwd hadoop_user
  2. Install the JDK
    2.1 Copy and extract the JDK
    I originally wanted to download the JDK with wget, but unfortunately Oracle now requires signing in to download it, so downloading directly inside the virtual machine had to be abandoned.
    Instead, download the JDK on the host, then send the compressed JDK package to the virtual machine with SecureFX.
    # Switch to the root user 
    su 
    # Create a directory to hold the JDK 
    mkdir -p /opt/software/jdk/ 
    # Extract the JDK into the directory just created 
    tar zxvf /home/zhq/jdk-8u211-linux-x64.tar.gz -C /opt/software/jdk/ 
    # Open the profile to add the JDK to the environment variables 
    vi /etc/profile
    2.2 Add the JDK environment variables
    Add the following lines to the end of the profile file, then save and exit (a quick verification appears after this list):
    export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211
    export PATH=$PATH:$JAVA_HOME/bin
    # Make the environment variable configuration take effect 
    source /etc/profile 
    # Change the owner of the JDK directory to hadoop_user for later use 
    chown -R hadoop_user:hadoop_user /opt/software/jdk/
    

      

  3. Install net-tools
    Since we chose minimal installation, some common tools are missing from the system, so we install them manually here.
    su
    yum install -y net-tools
  4. Configure the hostname and hosts
    Set the hostname to master:
    hostnamectl set-hostname master

    Configure hosts (the three hosts slave1, slave2, and slave3 do not exist yet; their IPs below are estimated from VMware's habit of assigning virtual IP addresses incrementally. If the IPs found in Step Five turn out to differ, come back here and adjust accordingly.)
    # Edit the hosts file 
    su 
    vi /etc/hosts
    Add the following to the end of the hosts file:
    192.168.212.132  master
    192.168.212.133  slave1
    192.168.212.134  slave2
    192.168.212.135  slave3
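Before moving on, here are two optional sanity checks for items 2 and 4 above; this is only a sketch, and the version string and hostnames depend on your own setup:

# Re-read the profile, then confirm the JDK is visible on the PATH
source /etc/profile
java -version        # should report a 1.8.0_211 build
# After the slaves are cloned in Step Five, confirm that a hosts entry resolves
ping -c 2 slave1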

Step Four: Hadoop configuration

  1. Install Hadoop

    1.1 Copy and extract Hadoop
    As before, use SecureFX to send the Hadoop compressed package to the master virtual machine. After the transfer completes, perform the following operations in the virtual machine:

    # Switch to the root user 
    su 
    # Create a directory to hold Hadoop 
    mkdir -p /opt/software/hadoop/ 
    # Extract Hadoop into the directory just created 
    tar zxvf /home/zhq/hadoop-3.2.0.tar.gz -C /opt/software/hadoop/ 
    # Open the profile to add Hadoop to the environment variables 
    vi /etc/profile
    1.2 Add the Hadoop environment variables
    Add the following lines to the end of the profile file, then save and exit:
    export HADOOP_HOME=/opt/software/hadoop/hadoop-3.2.0
    export PATH=$PATH:$HADOOP_HOME/bin
    # Make the environment variable configuration take effect 
    source /etc/profile 
    # Change the owner of the Hadoop directory to hadoop_user for later use 
    chown -R hadoop_user:hadoop_user /opt/software/hadoop/
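    As a quick optional check that the PATH entry works (a sketch):
    # Re-read the profile and print the Hadoop version
    source /etc/profile
    hadoop version        # should report Hadoop 3.2.0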
  2. Configure the Hadoop parameters
    Note: master acts as the namenode, and some of the items below only matter on the datanodes, so strictly they could be left out here. However, to avoid modifying each cloned cluster node one by one later, the redundant items are written here as well; the redundant portions are tentatively marked in italics.
    2.1 Set JAVA_HOME in hadoop-env.sh and yarn-env.sh
    Add the following line to the end of both files (one way to do this is sketched just below):

    export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211
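    A convenient way to append the line, assuming Hadoop was extracted to the path used above (the configuration files live under etc/hadoop in the extracted directory):
    # Append the JAVA_HOME line to both files
    cd /opt/software/hadoop/hadoop-3.2.0/etc/hadoop
    echo 'export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211' >> hadoop-env.sh
    echo 'export JAVA_HOME=/opt/software/jdk/jdk1.8.0_211' >> yarn-env.sh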
    

    2.2 Edit core-site.xml and add the following configuration item, which specifies that the cluster's file system is HDFS and that the namenode is master.

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master:9000</value>
    </property>

    2.3 Edit hdfs-site.xml and add the following configuration items, which set the number of HDFS replicas to 2, the namenode's metadata storage path to /hdfs_storage/name/, and the datanode's local storage path to /hdfs_storage/data/ (PS: remember to create the directories used here manually in advance; one way is sketched after the configuration below).

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hdfs_storage/name/</value>
    </property>
    
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hdfs_storage/data/</value>
    </property>
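    Since the note above says these directories must exist beforehand, one way to create them is shown below; it is a sketch to be run as root on every node (once the slaves exist), and the chown assumes the daemons will run as hadoop_user, as set up earlier:
    su
    mkdir -p /hdfs_storage/name /hdfs_storage/data
    # Give the storage tree to hadoop_user so the HDFS daemons can write to it
    chown -R hadoop_user:hadoop_user /hdfs_storage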

    2.4 Edit mapred-site.xml and add the following configuration items, which specify that MapReduce jobs run on the YARN framework, that the job history server address is master:10020, and that the job history webapp address is master:19888.

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>master:10020</value>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>master:19888</value>
    </property>
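    The job history addresses above only take effect while the history server daemon is running; in Hadoop 3.x it can be started later on master (optional, run as hadoop_user from the Hadoop installation directory):
    bin/mapred --daemon start historyserver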

    2.5 Edit yarn-site.xml and add the following configuration items, which set the ResourceManager address to master:8032, the ResourceManager scheduler address to master:8030, the ResourceManager resource-tracker address to master:8031, the ResourceManager admin address to master:8033, the ResourceManager webapp address to master:8088, and the auxiliary shuffle service used by MapReduce programs to mapreduce_shuffle.

    <property>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>master:8030</value>
    </property>
    
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>master:8031</value>
    </property>
    
    <property>
      <name>yarn.resourcemanager.address</name>
      <value>master:8032</value>
    </property>
    
    <property>
      <name>yarn.resourcemanager.admin.address</name>
      <value>master:8033</value>
    </property>
    
    <property>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>master:8088</value>
    </property>
    
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
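    One point the steps above do not mention: for start-dfs.sh and start-yarn.sh to start daemons on the slaves, Hadoop 3.x reads the slave hostnames from the workers file under etc/hadoop. A minimal version matching the hosts planned earlier would be (a sketch; adjust to your actual hostnames):
    # /opt/software/hadoop/hadoop-3.2.0/etc/hadoop/workers
    slave1
    slave2
    slave3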

      

Step Five: Clone the remaining hosts in the cluster

Here we use VMware's virtual machine clone function to create slave1, slave2, and slave3 from the master.
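On each cloned machine you will typically want to give it its own hostname and check the IP it actually received, so the hosts file from Step Three can be corrected if needed. A sketch for the first clone (repeat with slave2 and slave3 on the other clones):

# Run on the first clone
hostnamectl set-hostname slave1
# Check the IP actually assigned; if it differs from the estimate, adjust /etc/hosts on every node
ip addr show ens33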

Step Six: Configure passwordless SSH login

  1. Generate an RSA key pair

    Log in to master as hadoop_user, create the .ssh directory under /home/hadoop_user/, and create a key pair:

    cd /home/hadoop_user/ 
    # Create the .ssh directory 
    mkdir .ssh 
    cd .ssh 
    # Create an RSA key pair for SSH 
    ssh-keygen -t rsa
    # Press Enter all the way through; this creates the key pair files id_rsa and id_rsa.pub in the .ssh directory.

     

  2. Copy the public key to slave1 and add it to the trusted keys
    On master, do the following:
    scp id_rsa.pub hadoop_user@slave1:/home/hadoop_user/ 
    # Enter the password of the hadoop_user account on slave1; the copy then succeeds

    On slave1, do the following:

    cd /home/hadoop_user/
    mkdir .ssh 
    # Add master's public key to the trusted list 
    cat id_rsa.pub >> .ssh/authorized_keys 
    # Adjust the file permissions 
    chmod 700 .ssh 
    chmod 600 .ssh/authorized_keys

     

  3. Test whether passwordless login works
    On master, enter  ssh slave1 . If you can log in to slave1 without being asked for a password, it succeeded; otherwise check for missing steps.
  4. Repeat 2 and 3 so that the remaining hosts can also be logged in to without a password (a one-command alternative is sketched right after this list).
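As an aside, the manual copy in steps 2 and 3 can usually be done in a single command with ssh-copy-id, which normally ships with the OpenSSH client on CentOS 7; run it on master as hadoop_user for each slave:

# Appends master's public key to slave1's authorized_keys and fixes permissions
ssh-copy-id hadoop_user@slave1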

Step Seven: Verify the environment setup

  1. Format HDFS
    Before first use, HDFS must be formatted once; only then can the daemons be started successfully. Run the following on master, from the Hadoop installation directory:
    bin/hdfs namenode -format

    If no error message appears, the format succeeded.

  2. Start HDFS
    Execute the following command (on master, from the Hadoop directory) to start HDFS:
    sbin/start-dfs.sh

    The corresponding stop command is:

    sbin/stop-dfs.sh

    If the start command executes without error, you can then run the  jps  command to view the Java processes: the master will show the NameNode and SecondaryNameNode processes, and each slave node will show a DataNode process.

  3. Start YARN
    Execute the following command (on master, from the Hadoop directory) to start YARN:
    sbin/start-yarn.sh

    The corresponding stop command is:

    sbin/stop-yarn.sh

    If the start command executes without error, you can then run the jps command to view the Java processes: the master will show the ResourceManager process, and each slave node will show a NodeManager process.

Step Eight: Firewall configuration

Several port numbers are used in the configuration above. CentOS has the firewall enabled by default, so we need to configure the firewall to open these ports, otherwise the corresponding web pages will not be reachable. The commands are as follows:

firewall-cmd --zone=public --add-port=8030/tcp --permanent
firewall-cmd --zone=public --add-port=8031/tcp --permanent
firewall-cmd --zone=public --add-port=8032/tcp --permanent
firewall-cmd --zone=public --add-port=8033/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=10020/tcp --permanent
firewall-cmd --zone=public --add-port=19888/tcp --permanent

firewall-cmd --reload
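To confirm the rules took effect (optional):

# List the ports currently opened in the public zone
firewall-cmd --zone=public --list-ports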

After finishing, access master:8088 in the browser; if you can reach the cluster information page, everything is OK.
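As a final optional smoke test of HDFS itself, you can create and list a directory (a sketch, run as hadoop_user on master; the directory name is just an example):

# Create a directory in HDFS and list the root to confirm reads and writes work
hdfs dfs -mkdir /smoke_test
hdfs dfs -ls /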

 

Summary

After the eight steps above, a fully distributed Hadoop cluster is set up! Finally, here is the cluster setup guide from the official Hadoop website: https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-common/ClusterSetup.html . If your English is up to it, studying the official documentation is also recommended.

