Hadoop distributed cluster construction

Environment

VMware virtual machines on Windows, running CentOS, are used to build a three-node Hadoop distributed cluster.

Download the required packages (the JDK and Hadoop tarballs).

1. Create a hadoop user

    useradd -m hadoop -s /bin/bash   # create a new user named hadoop

    passwd hadoop                    # set a password for the user
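
Later steps run sudo as the hadoop user, so the user usually needs sudo privileges. A minimal way to grant them on CentOS, assuming the wheel group is enabled in sudoers:

    usermod -aG wheel hadoop   # add hadoop to the wheel (sudo) group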

2. Modify network information (static IP)

  • Modify the hostname and the hosts file. If the hostname in /etc/hosts on the machine does not match the actual hostname, change /etc/hosts to match the current hostname (see the example after this list):

    sudo vi /etc/sysconfig/network  # change the hostname
    
    sudo vi /etc/hosts  # map each node's hostname to its IP address
    
  • After modifying, save and exit, then reboot:
    reboot
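
A minimal example of the two files on the Master node; the hostnames follow the Master/Slave1 naming used in the rest of this article, and the IP addresses are placeholders that must be replaced with your own (add one line per additional Slave):

    # /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=Master

    # /etc/hosts (the same entries on every node)
    192.168.1.100   Master
    192.168.1.101   Slave1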

3. Install SSH and configure SSH passwordless login

  • Use ssh-keygen to generate a key and add it to the authorized keys (run as the hadoop user; the files live in ~/.ssh):
    ssh-keygen -t rsa   # there will be prompts, just press Enter

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # add the key to the authorized list
    
    chmod 600 ~/.ssh/authorized_keys    # fix the file permissions
    
  • Then, on the Master node, transfer the public key to the Slave node:

    scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/
    
  • On the Slave1 node, add the received key to its own authorized keys:

    mkdir ~/.ssh       # create the folder if it does not exist; skip this if it already exists
    
    cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
    
    chmod 600 ~/.ssh/authorized_keys
    
    rm ~/id_rsa.pub    # the copied key can be deleted once it has been added
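
If the key exchange worked, the Master should now be able to log in to the Slave without a password:

    ssh Slave1   # should open a shell on Slave1 without asking for a password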
    

4. Disable the firewall on CentOS

    sudo service iptables stop   # stop the firewall service

    sudo chkconfig iptables off  # keep the firewall from starting at boot, so it does not need to be stopped manually
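
A quick way to confirm the firewall is stopped and disabled at boot:

    sudo service iptables status      # should report that the firewall is not running
    sudo chkconfig --list iptables    # every runlevel should show "off"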

5. Install the Java environment

    sudo tar -zxvf ~/downloads/jdk-7u91-linux-x64.tar.gz -C /usr/local  # extract to /usr/local
  • Configure the JAVA_HOME environment variable:
    vi ~/.bashrc

  • Append the following and save:

    export JAVA_HOME=/usr/local/jdk1.7.0_91
    
    export PATH=$JAVA_HOME/bin:$PATH
    

    source ~/.bashrc   # make the variable settings take effect

  • After setting the variables, check that they are correct:

    java -version
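
If the variables are correct, the JDK under $JAVA_HOME should be the one found on the PATH; a quick check:

    echo $JAVA_HOME                # should print /usr/local/jdk1.7.0_91
    $JAVA_HOME/bin/java -version   # should report the same version as java -version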
    

6. Install Hadoop 2 on the Master node

    sudo tar -zxvf ~/downloads/hadoop-2.6.1.tar.gz -C /usr/local   # extract to /usr/local

    sudo mv /usr/local/hadoop-2.6.1 /usr/local/hadoop              # rename the folder to hadoop

    sudo chown -R hadoop:hadoop /usr/local/hadoop                  # give ownership to the hadoop user

  • Before using Hadoop, we also need to set the Hadoop environment variables:

    vi ~/.bashrc
    
  • Add the following at the end of the file:
    # Hadoop Environment Variables

    export HADOOP_HOME=/usr/local/hadoop
    
    export HADOOP_INSTALL=$HADOOP_HOME
    
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    
    export YARN_HOME=$HADOOP_HOME
    
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    
    export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
    

    source ~/.bashrc   # make the settings take effect
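
The hadoop command should now be on the PATH; a quick check:

    hadoop version   # should report Hadoop 2.6.1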

7. Modify the Hadoop configuration files

The configuration files are under /usr/local/hadoop/etc/hadoop. Modify core-site.xml first (vi ./etc/hadoop/core-site.xml):

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://Master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
</configuration>

The file hdfs-site.xml: dfs.replication is generally set to 3, but here it is set to 2 to match the number of Slave (DataNode) machines in this cluster:

<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>Master:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
</configuration>

The file mapred-site.xml may need to be created first by renaming the template (the default file name is mapred-site.xml.template).
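
A typical rename command, assuming the configuration directory /usr/local/hadoop/etc/hadoop:

    cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

Then modify the configuration as follows: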

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>Master:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>Master:19888</value>
        </property>
</configuration>

The file yarn-site.xml:

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>Master</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
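
The slaves file in the same directory should also list, one hostname per line, the machines that will run the DataNode and NodeManager daemons. Only Slave1 is named in this article; Slave2 below is an assumed hostname for the third machine:

    Slave1
    Slave2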

8. After the configuration is complete, copy the /usr/local/hadoop folder on the Master to each node. Execute on the Master node:

If the Master previously ran in pseudo-distributed mode, delete the files that are no longer needed first:

    cd /usr/local
    sudo rm -r ./hadoop/tmp      # delete the Hadoop temporary files

    sudo rm -r ./hadoop/logs/*   # delete the log files

Pack and send:

    tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress first, then copy

    scp ~/hadoop.master.tar.gz hadoop@Slave1:/home/hadoop

Execute on the Slave1 node:

    sudo rm -r /usr/local/hadoop    # delete the old installation (if it exists)

    sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local

    sudo chown -R hadoop /usr/local/hadoop

For the first startup, you need to format the NameNode on the Master node first:

    hdfs namenode -format       # only needed the first time; do not run it again afterwards

Start the Hadoop cluster:

    /usr/local/hadoop/sbin/start-all.sh
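
start-all.sh still works but is deprecated in Hadoop 2.x; the daemons can also be started separately on the Master, with the JobHistory server needed for the addresses configured in mapred-site.xml:

    /usr/local/hadoop/sbin/start-dfs.sh
    /usr/local/hadoop/sbin/start-yarn.sh
    /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver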

Verify the cluster with jps.

On the Master node:

    jps
    2986 SecondaryNameNode
    3143 ResourceManager
    2791 NameNode
    3212 Jps

On the Slave nodes:

    jps
    1950 Jps
    1831 NodeManager
    1725 DataNode

Problems encountered and solutions

No NameNode process

  • Check whether the IP reported by ifconfig is consistent with the entries in /etc/hosts.

No DataNode process

  • Delete everything under hadoop/tmp, reformat HDFS, and start Hadoop again. This works but loses all data and is not recommended.

This problem occurs because the clusterID in /usr/local/hadoop/tmp/dfs/data/current/VERSION is inconsistent with the clusterID in /usr/local/hadoop/tmp/dfs/name/current/VERSION; make the clusterID under data match the one under name.
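
A sketch of the fix, keeping the paths used in this article (the name directory lives on the Master, the data directory on the affected Slave):

    grep clusterID /usr/local/hadoop/tmp/dfs/name/current/VERSION   # note the clusterID on the NameNode side
    grep clusterID /usr/local/hadoop/tmp/dfs/data/current/VERSION   # compare it on the DataNode side
    vi /usr/local/hadoop/tmp/dfs/data/current/VERSION               # change clusterID to match the name directory, then restart Hadoop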
