2. Hadoop tutorial: building a distributed cluster environment (detailed)

Preface

This article only covers the installation and deployment of Apache Hadoop 2.x; follow-up articles will cover the architecture of Hadoop 2.x, how its modules work together, and the technical details. Installation is not the goal in itself; the goal is to understand Hadoop through the installation process.

1. Deployment environment

The fully distributed mode uses multiple real Linux hosts to deploy Hadoop, planning the Linux cluster so that the individual Hadoop modules run on different machines.
The following demonstrates the distributed deployment of a Hadoop cluster consisting of 1 master node and 2 slave nodes.

(1) Preparation

Prepare three Linux servers:
192.168.2.110 (master node)
192.168.2.111 (slave node)
192.168.2.112 (slave node)
Close the firewall.
Install JDK and configure environment variables.
It is assumed that the JDK is already installed and the environment variables are configured; those steps are not repeated here.

// java path on 192.168.2.110:
/usr/local/jdk/jdk1.8.0_192/bin/java

// java path on 192.168.2.111:
/usr/local/server/jdk1.8.0_171/bin/java

// java path on 192.168.2.112:
/usr/local/server/jdk1.8.0_171/bin/java

To check whether the configuration succeeded, print the environment variable:

echo $JAVA_HOME

(1) Synchronize the time on all three servers:

yum install ntpdate
ntpdate cn.pool.ntp.org
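If you want the clocks to stay synchronized afterwards, one common approach (not part of the original steps) is a cron entry. A minimal sketch, assuming a CentOS-style setup with ntpdate installed at /usr/sbin/ntpdate:

# resync against cn.pool.ntp.org every 30 minutes (illustrative schedule)
crontab -l 2>/dev/null | { cat; echo "*/30 * * * * /usr/sbin/ntpdate cn.pool.ntp.org"; } | crontab -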

(2) Configure the host name on all three servers:

vim /etc/sysconfig/network
// /etc/sysconfig/network on 192.168.2.110:
NETWORKING=yes
HOSTNAME=node-1

// /etc/sysconfig/network on 192.168.2.111:
NETWORKING=yes
HOSTNAME=node-2

// /etc/sysconfig/network on 192.168.2.112:
NETWORKING=yes
HOSTNAME=node-3
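The HOSTNAME value in /etc/sysconfig/network only takes effect after a reboot (this is the CentOS 6 way of setting it). To apply the new name to the running system immediately, you can additionally run, for example:

hostname node-1    # on 192.168.2.110 (use node-2 / node-3 on the other two servers)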

(3) Configure the IP-to-hostname mapping on all three servers:

vim /etc/hosts
// /etc/hosts should contain the same three mappings on every server,
// so that each node can resolve every other node by name:
192.168.2.110 node-1
192.168.2.111 node-2
192.168.2.112 node-3
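A quick check that the mappings work (not in the original) is to resolve each name; every command below should get replies from the matching 192.168.2.x address:

ping -c 1 node-1
ping -c 1 node-2
ping -c 1 node-3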

(4) Configure SSH password-free login on the master node (performed on 192.168.2.110).
There are many ways on the Internet to configure password-free SSH for Hadoop, and the choice can be bewildering. All roads lead to Rome; the blogger has distilled the simplest configuration method:

ssh-keygen -t rsa              # press Enter at every prompt
//cat id_rsa.pub >> authorized_keys  # add the key to authorized_keys (optional)
//chmod 600 ./authorized_keys    # adjust the file permissions (optional)
ssh-copy-id node-1
ssh -o StrictHostKeyChecking=no node-1
ssh-copy-id node-2
ssh -o StrictHostKeyChecking=no node-2
ssh-copy-id node-3
ssh -o StrictHostKeyChecking=no node-3

ssh-copy-id: appends the public key just generated to the authorized_keys file on the remote machine.
ssh -o StrictHostKeyChecking=no: the first time you connect to a server, SSH asks you to confirm its host key; this can interrupt automated tasks on the initial connection, and the same happens if the contents of ~/.ssh/known_hosts have been emptied. Setting StrictHostKeyChecking=no makes the SSH client accept the new host key automatically on the first connection.

If the error "The authenticity of host 'node-2 (10.0.0.8)' can't be established." is reported, don't panic; just run the following command again:

ssh -o StrictHostKeyChecking=no node-2

If the error "Permissions 0644 for '/root/.ssh/id_rsa' are too open" is reported, it means the permissions on /root/.ssh/id_rsa are too permissive; run the following command to tighten them:

chmod 0600 /root/.ssh/id_rsa
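Once the keys are distributed, a quick sanity check (not in the original) is to log in to every node from 192.168.2.110; each command should print the remote host name without asking for a password:

for h in node-1 node-2 node-3; do ssh $h hostname; done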

(2) Configuration work

Upload hadoop-2.8.5 to the /home/neijiang/ directory (the path can be customized). All of the following configuration is done on the master node; after it is finished, copy the master node's hadoop-2.8.5 directory to every slave node. The configuration files below live under /home/neijiang/hadoop-2.8.5/etc/hadoop/.
(Special note: the Hadoop environment variables are configured in /etc/profile, not inside the hadoop-2.8.5 directory, so they must be configured manually on every slave node as well.)
(1) Configure hadoop-env.sh

vim hadoop-env.sh
// hadoop-env.sh on 192.168.2.110
// (JAVA_HOME must point to the actual JDK installation path on each machine):
export JAVA_HOME=/usr/local/jdk1.8.0_171

(2) Modify the configuration file core-site.xml

// core-site.xml on 192.168.2.110:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/neijiang/hadoop-2.8.5/hadoopData</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node-1:9000</value>
    </property>
</configuration>
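fs.defaultFS tells HDFS clients where the NameNode listens, so the full URI can usually be omitted. For example, once the cluster is running (see section (3)), the two commands below are equivalent:

hadoop fs -ls hdfs://node-1:9000/
hadoop fs -ls /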

(3) Modify the configuration file hdfs-site.xml:

// hdfs-site.xml on 192.168.2.110:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node-2:50090</value>
    </property>
</configuration>
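dfs.replication=3 means every HDFS block is stored on three DataNodes, which matches the three hosts listed in the slaves file below. After the cluster is up, you can confirm the replication status with, for example:

hdfs fsck / -files -blocks    # prints files, their blocks, and replication health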

(4) Rename mapred-site.xml.template and modify the resulting mapred-site.xml:

mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
// mapred-site.xml on 192.168.2.110:
<configuration>
<!-- Specify the runtime framework for MapReduce; here it is set to yarn (the default is local) -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
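With mapreduce.framework.name set to yarn, MapReduce jobs are submitted to YARN instead of running locally. A quick way to verify this after startup (illustrative, run on the master node) is the example jar shipped with Hadoop 2.8.5:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar pi 2 10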

(5) Modify the configuration file yarn-site.xml:

// yarn-site.xml on 192.168.2.110:
<configuration>
<!-- Address of the YARN master (ResourceManager) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node-1</value>
    </property>
<!-- Auxiliary service run on each NodeManager; it must be set to mapreduce_shuffle for MapReduce programs to run -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

(6) Modify the slaves file, which lists the host names of the worker nodes:

vim slaves
// slaves on 192.168.2.110:
// delete the original localhost entry and write the following:
node-1
node-2
node-3

(7) Add hadoop to environment variables

vim /etc/profile
// append to /etc/profile on 192.168.2.110:
export HADOOP_HOME=/home/neijiang/hadoop-2.8.5/  # points to the Hadoop extraction path
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
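Reload the profile so the new variables take effect in the current shell, then check that the hadoop command is found:

source /etc/profile
hadoop version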

(8) Copy the configured hadoop-2.8.5 root directory from the master node 192.168.2.110 to all slave nodes:

scp -r /home/neijiang/hadoop-2.8.5/ root@node-2:/home/neijiang/
scp -r /home/neijiang/hadoop-2.8.5/ root@node-3:/home/neijiang/

(9) Manually configure the same Hadoop environment variables on all slave nodes (append to /etc/profile on node-2 and node-3):

export HADOOP_HOME=/home/neijiang/hadoop-2.8.5  # points to the Hadoop extraction path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
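One way to append these two lines on both slaves from node-1 (a sketch, not from the original article; it relies on the password-free SSH configured earlier):

for h in node-2 node-3; do
  ssh $h "echo 'export HADOOP_HOME=/home/neijiang/hadoop-2.8.5' >> /etc/profile"
  ssh $h "echo 'export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin' >> /etc/profile"
done

Afterwards run source /etc/profile on each slave (or simply log in again) for the variables to take effect.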

(3) Start work

(1) About formatting HDFS:
Formatting is required only before the first startup. The essence of formatting is to initialize the file system and create the files and directories HDFS needs.
Once the cluster has been formatted and started successfully, it must not be formatted again.
The format operation must be performed on the machine that hosts the HDFS master node (NameNode), i.e. node-1.
Format command: hadoop namenode -format (in Hadoop 2.x the non-deprecated form is hdfs namenode -format)
(2) Startup
The start/stop scripts are in /home/neijiang/hadoop-2.8.5/sbin and should be run on the master node (node-1):

  start HDFS:  start-dfs.sh
  start YARN:  start-yarn.sh
  // together equivalent to: start-all.sh
  stop HDFS:   stop-dfs.sh
  stop YARN:   stop-yarn.sh
  // together equivalent to: stop-all.sh

You can run the jps command on each node to see which processes have started.
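With the configuration above, roughly the following processes should appear (an expectation derived from the config files, not guaranteed output):

# node-1: NameNode, ResourceManager, DataNode, NodeManager
# node-2: SecondaryNameNode, DataNode, NodeManager
# node-3: DataNode, NodeManager
jps

You can also open the default web UIs in a browser: http://node-1:50070 for HDFS (NameNode) and http://node-1:8088 for YARN (ResourceManager).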
If you found this article helpful, please like, follow, and favorite!

Origin blog.csdn.net/xgb2018/article/details/109329168