Quickly Building a Hadoop Environment (Step-by-Step Pseudo-Distributed and Fully Distributed Versions)

Table of contents

1. Initial preparation:

2. Environment setup:

1. Modify the IP and host name of the second and third hosts

2. Use Xftp to pass in the JDK and hadoop installation packages and install them

Ⅰ. First delete the openJDK that comes with centos7:

Ⅱ. Install JDK and configure JAVA environment

Ⅲ. Install Hadoop and configure the environment

IV. Host file configuration, password-free login

Ⅴ. File transfer and environment configuration of the remaining two machines

Ⅵ. Hadoop cluster startup and HDFS, YARN cluster UI information status query

3. Fully distributed Hadoop framework


1. Initial preparation:

1. Create one Linux virtual machine and clone it twice, then change the hostname and IP address of each clone as described below. (Alternatively, create three virtual machines and set the IP address and hostname in the installer. The author uses the first method here.)

2. Connect the Linux virtual machines to the network in NAT mode and confirm that ping succeeds. (If Linux is not installed or networked yet, look up how to do that yourself.)

3. Resource download address

2. Environment setup:

IP address              host name

192.168.1.111 bigdata111 #The first host
192.168.1.112 bigdata112 #The second host
192.168.1.113 bigdata113 #The third host

1. Modify the IP and host name of the second and third hosts

After creating the first bigdata111 host, clone two hosts, then connect them to the network and open the terminal of the second host.

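su root
hostname                  # view the hostname
vim /etc/hostname         # change the hostname
ip addr                   # or ifconfig: view the host IP
cd /etc/sysconfig/network-scripts
vi ifcfg-ens33            # edit the network configuration file:

BOOTPROTO=static          # changed from dhcp
ONBOOT=yes                # changed from no
IPADDR=192.168.1.112
NETMASK=255.255.255.0
GATEWAY=192.168.1.2       # example value: open the Virtual Network Editor from the Edit menu, select NAT mode and click NAT Settings to see the gateway; it must match the virtual machine's gateway and be in the same subnet as IPADDR
DNS1=8.8.8.8

systemctl restart network
ip addr                   # or ifconfig: check that the change took effect
# Change the hostname and IP address on the bigdata113 host in the same way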

If Xshell cannot connect after the IP address is changed, the usual cause is that the new IP address is not in the same subnet as the virtual machine's virtual network.
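The subnet condition above can be checked mechanically. The following is a minimal sketch, assuming plain dotted-quad IPv4 addresses; the function names `ip_to_int` and `same_subnet` are made up for this illustration, and the addresses are only examples.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell, so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# Succeed (exit 0) when both addresses share a network under the given netmask.
same_subnet() {
    [ "$(( $(ip_to_int "$1") & $(ip_to_int "$3") ))" -eq \
      "$(( $(ip_to_int "$2") & $(ip_to_int "$3") ))" ]
}

# Example: compare the host IP used in this article with two candidate gateways.
for gw in 192.168.1.2 192.168.80.2; do
    if same_subnet 192.168.1.112 "$gw" 255.255.255.0; then
        echo "$gw: same subnet as 192.168.1.112"
    else
        echo "$gw: different subnet from 192.168.1.112"
    fi
done
```

A gateway reported as being in a different subnet cannot be reached from the host, which is one way a static configuration ends up without connectivity.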

2. Use Xftp to pass in the JDK and hadoop installation packages and install them

Ⅰ. First delete the openJDK that comes with centos7:

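Method 1:
rpm -qa | grep java                  # list the installed Java packages
sudo rpm -e <package-name>           # uninstall each unwanted JDK package
which java                           # show the JDK path
rm -rf <JDK-path>                    # delete it manually, then remove the related settings from /etc/profile
Method 2:
yum list installed | grep java       # list the JDK packages that ship with Linux
yum -y remove java-1.8.0-openjdk*    # remove all bundled JDK packages
yum -y remove tzdata-java.noarch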

Ⅱ. Install JDK and configure JAVA environment

mkdir -p /export/data
mkdir -p /export/servers
mkdir -p /export/software

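# Use Xftp to upload the two installation packages mentioned above (JDK and Hadoop) to the /export/software folder
tar -zxvf /export/software/jdk-8u181-linux-x64.tar.gz -C /export/servers/
cd /export/servers
mv jdk1.8.0_181 jdk

vi /etc/profile
#tip: append the following to the end of the configuration file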
export JAVA_HOME=/export/servers/jdk
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile
java -version
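Because /etc/profile is edited by hand several times in this guide, the same export line can easily be appended twice. Below is a minimal sketch of an idempotent append, assuming a hypothetical helper named `append_once`; it is demonstrated on a temporary file rather than the real /etc/profile.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstrate on a temporary file instead of /etc/profile.
profile=$(mktemp)
append_once "$profile" 'export JAVA_HOME=/export/servers/jdk'
# Single quotes keep $PATH and $JAVA_HOME literal so they expand at login time.
append_once "$profile" 'export PATH=$PATH:$JAVA_HOME/bin'
append_once "$profile" 'export JAVA_HOME=/export/servers/jdk'   # duplicate, skipped
cat "$profile"
rm -f "$profile"
```

To apply it for real, the file argument would be /etc/profile and the script would need to run as root.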

Ⅲ. Install Hadoop and configure the environment

tar -zxvf /export/software/hadoop-2.7.2.tar.gz -C /export/servers/
vi /etc/profile    # press Shift+G to jump to the end of the file

#tip: append the following to the end of the file
export HADOOP_HOME=/export/servers/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
hadoop version
cd /export/servers/hadoop-2.7.2/etc/hadoop/


vi hadoop-env.sh
#tip: find the corresponding line and add this setting
export JAVA_HOME=/export/servers/jdk


vi core-site.xml
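#tip: the garbled-looking part in the figure below is commented-out code; it can be deleted without any effect
<configuration>
    <!-- Sets the Hadoop file system, specified by a URI -->
    <property>
        <name>fs.defaultFS</name>
        <!-- Places the NameNode on the bigdata111 machine -->
        <value>hdfs://bigdata111:9000</value>
    </property>
    <!-- Hadoop temporary directory; the default is /tmp/hadoop-${user.name} -->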
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.2/tmp</value>
    </property>
</configuration>


vi hdfs-site.xml
<configuration>
    <!-- Number of replicas HDFS keeps for each block -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Host and port of the Secondary NameNode's HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata112:50090</value>
    </property>
</configuration>
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
    <!-- Framework that MapReduce runs on; set to YARN here, the default is local -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>


vi yarn-site.xml
<configuration>
    <!-- Address of the YARN cluster manager (ResourceManager) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata111</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>


#tip: delete localhost from the file and add the hostnames of the master node and the worker nodes
#tip: for example, the master node bigdata111 plus bigdata112 and bigdata113
vi slaves
bigdata111
bigdata112
bigdata113
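Before the configuration is copied to the other nodes, it can help to confirm that each file holds the intended value. On a machine where Hadoop is installed, `hdfs getconf -confKey fs.defaultFS` prints the effective setting. The sketch below is a rough stand-in that reads a file directly with awk; the function name `get_prop` is made up for this illustration, and it assumes one tag per line as in the listings above, so it is not a general XML parser.

```shell
#!/bin/sh
# get_prop FILE NAME: print the <value> that follows <name>NAME</name> in FILE.
get_prop() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Demonstrate on a temporary copy of the core-site.xml content shown above.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- NameNode address -->
        <value>hdfs://bigdata111:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.2/tmp</value>
    </property>
</configuration>
EOF

get_prop "$conf" fs.defaultFS     # prints hdfs://bigdata111:9000
get_prop "$conf" hadoop.tmp.dir   # prints /export/servers/hadoop-2.7.2/tmp
rm -f "$conf"
```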

IV. Host file configuration, password-free login

Configure the hosts file; every host needs it
vi /etc/hosts

192.168.1.111 bigdata111
192.168.1.112 bigdata112
192.168.1.113 bigdata113

scp /etc/hosts bigdata112:/etc/hosts
scp /etc/hosts bigdata113:/etc/hosts

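On Windows, open C:\Windows\System32\drivers\etc\hosts. Create a file named hosts (without an extension) on the desktop, copy the contents of the original Windows hosts file into it, and append the entries written to /etc/hosts above. Then copy the desktop file back over the original; Windows warns that the file already exists, and you continue with administrator permission to overwrite it.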

Passwordless login (run this whole procedure once on every host, three times in total)
ssh-keygen -t rsa
cd /root
ll -a
cd .ssh/
ssh-copy-id bigdata111
ssh-copy-id bigdata112
ssh-copy-id bigdata113

ssh bigdata112    # or ssh bigdata111: check that the hosts can reach each other

Ⅴ. File transfer and environment configuration of the remaining two machines

On bigdata111, send the two installation packages to 112 and 113
scp /export/software/jdk-8u181-linux-x64.tar.gz bigdata112:/export/software/
scp /export/software/jdk-8u181-linux-x64.tar.gz bigdata113:/export/software/
scp /export/software/hadoop-2.7.2.tar.gz bigdata112:/export/software/
scp /export/software/hadoop-2.7.2.tar.gz bigdata113:/export/software/

Install the two packages on 112 and 113 respectively
tar -zxvf /export/software/jdk-8u181-linux-x64.tar.gz -C /export/servers/
tar -zxvf /export/software/hadoop-2.7.2.tar.gz -C /export/servers/

Send the environment variable file from 111 to 112 and 113
scp /etc/profile bigdata112:/etc/profile
scp /etc/profile bigdata113:/etc/profile
scp -r /export/ bigdata112:/
scp -r /export/ bigdata113:/

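#tip: go back to the bigdata112 and bigdata113 nodes and run the command below so the environment variables take effect
source /etc/profile
Then run java -version and hadoop version on 112 and 113 to check that the installation succeeded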
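The repeated scp commands in this step can be folded into a loop over the worker hosts. The following is a minimal sketch, not the author's script: the names `WORKERS`, `run`, `push` and `DRY_RUN` are assumptions introduced here, and it starts in preview mode so nothing is copied until `DRY_RUN` is turned off.

```shell
#!/bin/sh
# Hosts that receive the files (taken from the host table in this article).
WORKERS="bigdata112 bigdata113"

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# push PATH: copy PATH into the same parent directory on every worker.
push() {
    dir=$(dirname "$1")
    [ "$dir" = "/" ] || dir="$dir/"
    for host in $WORKERS; do
        run scp -r "$1" "$host:$dir"
    done
}

# Preview only; set DRY_RUN=0 to actually copy.
DRY_RUN=1
push /etc/profile
push /export
```

Real copying relies on the passwordless SSH login configured earlier; without it, each scp call prompts for a password.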

Ⅵ. Hadoop cluster startup and HDFS, YARN cluster UI information status query

hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps    # lists only processes written in Java
systemctl stop firewalld.service     # stop the firewall
systemctl disable firewalld.service  # keep the firewall from starting at boot
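Reading jps output by eye on three machines is error-prone, so the check can be scripted. This is a minimal sketch, assuming the usual two-column jps output (PID, then class name); the function name `missing_daemons` and the sample PIDs are made up for illustration.

```shell
#!/bin/sh
# missing_daemons "NAME NAME ...": read jps output on stdin and print each
# expected daemon name that does not appear in it.
missing_daemons() {
    out=$(cat)
    for name in $1; do
        # Compare the whole second column so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$out" | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
            || echo "$name"
    done
}

# Sample jps output (the PIDs are invented).
sample='2481 NameNode
2610 DataNode
2902 ResourceManager
3011 NodeManager
3350 Jps'

# Daemons expected on bigdata111 with the configuration above.
expected="NameNode DataNode ResourceManager NodeManager"
missing=$(printf '%s\n' "$sample" | missing_daemons "$expected")
if [ -z "$missing" ]; then
    echo "all expected daemons are running"
else
    echo "missing: $missing"
fi
```

On a real node the sample would be replaced by a pipe from the tool itself, for example `jps | missing_daemons "NameNode DataNode"`.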

Check Hadoop's running status through the web UI: from Windows, visit http://bigdata111:50070 to view the HDFS cluster status and http://bigdata111:8088 to view the YARN cluster status.

To verify the cluster, find the PID of the DataNode in the jps output and kill it with kill -9 PID. Because of the cluster's heartbeat mechanism, a Dead Node appears in the HDFS web UI after about 10 minutes. At this point, the pseudo-distributed Hadoop framework is installed.
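The delay of roughly 10 minutes follows from two HDFS settings. The NameNode declares a DataNode dead after twice the heartbeat recheck interval plus ten heartbeat intervals. The sketch below works this out, assuming the Hadoop 2.x defaults for both properties (neither is overridden in the configuration files above).

```shell
#!/bin/sh
# Hadoop 2.x defaults from hdfs-default.xml (assumed unchanged here).
recheck_ms=300000      # dfs.namenode.heartbeat.recheck-interval, in milliseconds
heartbeat_s=3          # dfs.heartbeat.interval, in seconds

# Dead-node timeout: 2 * recheck interval + 10 * heartbeat interval.
timeout_s=$(( 2 * recheck_ms / 1000 + 10 * heartbeat_s ))
echo "dead-node timeout: ${timeout_s}s ($(( timeout_s / 60 )) min $(( timeout_s % 60 )) s)"
```

With these defaults the result is 630 seconds, that is 10 minutes 30 seconds.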

3. Fully distributed Hadoop framework

The node allocation is as follows:

Service             bigdata111  bigdata112  bigdata113
NameNode            yes         -           -
Secondary NameNode  -           yes         -
DataNode            yes         yes         yes
ResourceManager     -           -           yes
NodeManager         yes         yes         yes
JobHistoryServer    -           -           yes

Once the pseudo-distributed configuration above works, you only need to modify a few configuration files on bigdata111 and then distribute them to bigdata112 and bigdata113.

cd /export/servers/hadoop-2.7.2/etc/hadoop/


vi core-site.xml
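#tip: the garbled-looking part in the figure below is commented-out code; it can be deleted without any effect
<configuration>
    <!-- Sets the Hadoop file system, specified by a URI -->
    <property>
        <name>fs.defaultFS</name>
        <!-- Places the NameNode on the bigdata111 machine -->
        <value>hdfs://bigdata111:9000</value>
    </property>
    <!-- Hadoop temporary directory; the default is /tmp/hadoop-${user.name} -->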
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.2/tmp</value>
    </property>
</configuration>

vi hdfs-site.xml
<configuration>
    <!-- Number of replicas HDFS keeps for each block -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Host and port of the Secondary NameNode's HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>bigdata112:50090</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>bigdata111:50070</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>

vi mapred-site.xml
<configuration>
    <!-- Framework that MapReduce runs on; set to YARN here, the default is local -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>bigdata113:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>bigdata113:19888</value>
    </property>
</configuration>

vi yarn-site.xml
<configuration>
    <!-- Address of the YARN cluster manager (ResourceManager) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>bigdata113</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

scp -r /export/ bigdata112:/
scp -r /export/ bigdata113:/

Problem: after running hdfs namenode -format and then start-all.sh, jps shows no DataNode. This happens because the data left over from the earlier pseudo-distributed installation conflicts with the newly formatted NameNode: formatting generates a new cluster ID, while the DataNodes still hold the old one. For the same reason, avoid reformatting the NameNode once the cluster has been running; otherwise the DataNodes can no longer find a matching cluster ID.

Solution: cd /export/servers/hadoop-2.7.2/tmp/dfs/ and run ls; you will find a data directory. Delete it with rm -rf data, then run hdfs namenode -format and start-all.sh again. Note that the data directory must be deleted on all three machines!
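Before deleting anything, the mismatch can be confirmed by comparing the clusterID lines in the VERSION files that the NameNode and DataNode write. This is a minimal sketch, assuming dfs.namenode.name.dir and dfs.datanode.data.dir are left at their defaults under the hadoop.tmp.dir configured in this article; the helper name `cluster_id` is made up here. Both files exist together only on a host that runs a NameNode and a DataNode, which is bigdata111 in this layout.

```shell
#!/bin/sh
# cluster_id FILE: print the clusterID recorded in a VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Default storage locations below hadoop.tmp.dir (an assumption, see above).
tmp_dir=/export/servers/hadoop-2.7.2/tmp
nn_version=$tmp_dir/dfs/name/current/VERSION
dn_version=$tmp_dir/dfs/data/current/VERSION

if [ -f "$nn_version" ] && [ -f "$dn_version" ]; then
    if [ "$(cluster_id "$nn_version")" = "$(cluster_id "$dn_version")" ]; then
        echo "clusterID matches"
    else
        echo "clusterID mismatch: the DataNode will not join this NameNode"
    fi
else
    echo "VERSION files not found under $tmp_dir/dfs"
fi
```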

Accessing the web pages: because the ResourceManager is assigned to bigdata113 here, the pages to visit are:

bigdata111:50070 and bigdata113:8088
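One detail matters with this layout: in Hadoop 2.x, start-yarn.sh launches the ResourceManager on the machine where the script runs, so it is safest to start YARN from bigdata113 rather than from bigdata111. The sketch below shows one possible start order under that assumption. It is not the author's procedure; the names `remote`, `start_cluster` and `DRY_RUN` are invented for this example, and it only prints the commands until `DRY_RUN` is set to 0.

```shell
#!/bin/sh
# Role placement taken from the node allocation in this section.
NAMENODE_HOST=bigdata111
YARN_HOST=bigdata113        # ResourceManager and JobHistoryServer
SBIN=/export/servers/hadoop-2.7.2/sbin

# remote HOST CMD...: print the ssh command when DRY_RUN=1, otherwise run it.
remote() {
    host=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $host $*"
    else
        ssh "$host" "$@"
    fi
}

start_cluster() {
    remote "$NAMENODE_HOST" "$SBIN/start-dfs.sh"
    remote "$YARN_HOST" "$SBIN/start-yarn.sh"
    remote "$YARN_HOST" "$SBIN/mr-jobhistory-daemon.sh" start historyserver
}

DRY_RUN=1   # preview only; set to 0 to start the daemons
start_cluster
```

The JobHistoryServer is not started by start-dfs.sh or start-yarn.sh, which is why it gets its own command; its web page is then served on the bigdata113:19888 address configured in mapred-site.xml.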

 


Origin blog.csdn.net/weixin_56115549/article/details/126611995