Building Hadoop on Linux

Preface

This post walks through building Hadoop 2.8.3 on CentOS 7 in some detail. It records the errors I ran into (and some reported online) and how they were solved. The build may not be too difficult, but this was my first attempt, I hit plenty of mistakes, and it took a night and an afternoon to succeed. I'm recording it in the hope that it helps other learning partners.

Preparation

Java environment jdk1.8


hadoop2.8.3


Modify hostname


Turn off firewall


Create user hadoop

Start

  1. Modify the virtual machine network configuration to NAT and set the virtual machine subnet

Modification method: search locally for the Virtual Network Editor and open it


  2. Here, modify the default subnet segment; mine is 192.168.100 (my previous network segment was 192.168.85). Remember the network segment's address here, since it is needed when configuring the IP inside the virtual machine below.


  3. After modifying the subnet IP address, the corresponding IP configured inside the virtual machine must be in this network segment, as in the sketch below.

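For reference, a minimal static-IP sketch for CentOS 7, assuming the NIC is named ens33, the subnet is 192.168.100.0/24, and VMware's NAT gateway sits at .2; the .10 host address is also an assumption, so adjust everything to your own setup:

vim /etc/sysconfig/network-scripts/ifcfg-ens33

# key entries to set (ens33 and the .10 address are assumptions)
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.100.10
NETMASK=255.255.255.0
GATEWAY=192.168.100.2
DNS1=192.168.100.2

systemctl restart network    # apply the change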

  4. Modify hostname

There are two ways to modify the hostname: a temporary change, which is lost after a reboot, and a permanent one.

First check the host name of the machine:

[root@master ~]# hostname


This shows my already-modified hostname; before you modify it, yours will probably be localhost.localdomain.

Here I suggest that you use temporary modification.

[root@master ~]# hostname master

For permanent modification, please refer to: https://www.cnblogs.com/zhangwuji/p/7487856.html
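On CentOS 7 you can also make the change permanent with the standard hostnamectl tool:

[root@master ~]# hostnamectl set-hostname master
[root@master ~]# hostname    # verify the new name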

  5. Turn off firewall

For reference, here is the shutdown method for CentOS 7: https://blog.csdn.net/ytangdigl/article/details/79796961
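In short, CentOS 7 uses firewalld, which systemctl can stop now and keep off across reboots (run as root):

systemctl stop firewalld       # stop the firewall immediately
systemctl disable firewalld    # prevent it from starting at boot
systemctl status firewalld     # verify it is inactive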

  • When turning off the firewall here, also turn off SELinux. SELinux is a sub-security mechanism of Linux.
vim /etc/sysconfig/selinux

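Inside that file, the relevant line is SELINUX; disabling it takes effect after a reboot, while setenforce 0 switches to permissive mode for the current session (both are standard SELinux controls):

# in /etc/sysconfig/selinux, change the line to:
SELINUX=disabled
# to relax enforcement immediately, without rebooting:
setenforce 0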

  6. Create user hadoop and grant root privileges.

These are basic Linux commands and relatively simple; you can look them up on Baidu if needed.

  • Because I have already created hadoop, here I will create hadoop1.
[root@master ~]# useradd hadoop1
[root@master ~]# passwd hadoop1


  • Give the hadoop1 user sudo permission. After granting it, we prefix privileged commands with sudo.
[root@master ~]# vim /etc/sudoers

Add the following entry (the root line should already exist; add the hadoop1 line below it):

root    ALL=(ALL)       ALL
hadoop1 ALL=(root) NOPASSWD:ALL
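A safer way to edit this file is visudo, which validates the syntax before saving; afterwards, as root, you can confirm what hadoop1 may run:

visudo               # edits /etc/sudoers with a syntax check
sudo -l -U hadoop1   # list hadoop1's sudo privileges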


  • View current user
[root@master ~]# cat /etc/passwd


  • Test the user: su hadoop1


  7. Install jdk
  • Installing the JDK is relatively simple, so I won't go into much detail here; just be sure to remember the JDK installation path.
  • Uninstall any existing JDK
  • Check whether Java software is installed:
rpm -qa | grep java
  • If the installed version is lower than 1.7, uninstall the JDK:
sudo rpm -e <package-name>
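As a one-shot sketch, assuming you want to remove every matching package (--nodeps skips dependency checks, so use it with care):

rpm -qa | grep -i java | xargs -r sudo rpm -e --nodeps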
  • Place the JDK installation package in the virtual machine and unzip it. Here we create a directory to hold the software:
mkdir -p /opt/module
Put the unpacked JDK into this directory so you don't forget where Java is installed.
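For example (the tarball name jdk-8u144-linux-x64.tar.gz is an assumption that matches the jdk1.8.0_144 path used below):

tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module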
  • Configure JDK environment variables
Open the /etc/profile file:
sudo vi /etc/profile
Append the JDK path at the end of the profile file:
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
export PATH=$PATH:$JAVA_HOME/bin
  • Let the modified file take effect
source /etc/profile
  • Test whether the JDK is installed successfully
java -version


  8. Install hadoop

Before installing Hadoop, make sure that your JDK installation is correct; otherwise Hadoop cannot run normally.

  • download

https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/

  • Unzip Hadoop into /opt/module as above.
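A sketch of the download and unpack steps, using the 2.8.3 tarball from the Apache archive:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.8.3/hadoop-2.8.3.tar.gz
tar -zxvf hadoop-2.8.3.tar.gz -C /opt/module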
  • Open the /etc/profile file
sudo vi /etc/profile
Append the Hadoop path at the end of the profile file (shift+g jumps to the end):
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.8.3 (change this to your Hadoop version)
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
  • Exit after saving
  • Let the modified file take effect
source /etc/profile
  • Test whether the installation is successful
hadoop version (note: there is no dash before "version")


Running Hadoop

  • Hadoop's operating modes include: local (standalone) mode, pseudo-distributed mode, and fully distributed mode
  • Local mode and pseudo-distributed mode are mainly used here.

Local (standalone) mode

  • Official WordCount case
  • Create a folder wcinput under the hadoop-2.8.3 directory
mkdir wcinput
  • Create a wc.input file under the wcinput file
cd wcinput
touch wc.input
  • Edit the wc.input file and give it the following contents:
hadoop yarn
hadoop mapreduce
atguigu
atguigu
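Equivalently, you can create the file in one shot with a heredoc:

cat > wc.input <<EOF
hadoop yarn
hadoop mapreduce
atguigu
atguigu
EOF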
  • Go back to the Hadoop directory /opt/module/hadoop-2.8.3
  • execute program
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount wcinput wcoutput

To explain what this command means: it runs a Hadoop program. The jar package is an example bundled with Hadoop, wordcount selects the WordCount case, wcinput is the input folder, and wcoutput is the folder the results are written to.

  • View Results
cat wcoutput/part-r-00000
atguigu 2
hadoop  2
mapreduce       1
yarn    1

The local run succeeded.

Pseudo-distributed mode

  • Steps:
  • Configure the cluster
  • Start the cluster and test adding, deleting, and viewing files
  • Run the WordCount case
  1. Configure the cluster

Configuration: in hadoop-env.sh, add the JAVA_HOME path

export JAVA_HOME=/opt/module/jdk1.8.0_144
  2. Configure core-site.xml
<!-- Set the address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>  <!-- replace master with your hostname -->
</property>

<!-- Set the storage directory for files Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.8.3/data/tmp</value>  <!-- this folder stores data generated by Hadoop runs -->
</property>
  3. Configure hdfs-site.xml
<!-- Set the number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
  4. Start the cluster
  • Format the NameNode (format it only at the first startup, not every time). Pay attention here: do not format frequently, since repeated formatting easily breaks the ID correspondence between the NameNode and the DataNodes.
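The standard Hadoop 2.x format command, run from the Hadoop home directory, is:

bin/hdfs namenode -format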
  • Start the NameNode
sbin/hadoop-daemon.sh start namenode
  • Start DataNode
sbin/hadoop-daemon.sh start datanode
  5. Start SecondaryNameNode
sbin/hadoop-daemon.sh start secondarynamenode
  6. Use the jps command to check whether everything started successfully; if the processes are listed, startup succeeded.
Running jps should show something like:
3034 NameNode
3233 Jps
3193 SecondaryNameNode
3110 DataNode
which indicates success.
  7. After startup succeeds, you can check whether it is working in the browser

http://master:50070/dfshealth.html#tab-overview

Note: if you cannot open the page, see the following post for troubleshooting: http://www.cnblogs.com/zlslch/p/6604189.html


Operating the cluster (HDFS)

  1. Create an input folder on the HDFS file system
bin/hdfs dfs -mkdir -p /user/atguigu/input


  2. Upload the test file content to the file system; here we upload the wc.input file we created earlier
bin/hdfs dfs -put wcinput/wc.input /user/atguigu/input
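You can verify that the upload landed with a standard HDFS listing:

bin/hdfs dfs -ls /user/atguigu/input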


  3. Run the MapReduce program
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /user/atguigu/input/ /user/atguigu/output

After the job succeeds, two files (part-r-00000 and _SUCCESS) will appear in the output directory, indicating that execution succeeded.
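You can also inspect the results from the command line:

bin/hdfs dfs -ls /user/atguigu/output
bin/hdfs dfs -cat /user/atguigu/output/part-r-00000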

  4. Delete the output result: we will output to this folder again later, and if it already exists the job will fail, so delete it first.
hdfs dfs -rm -r /user/atguigu/output

Start YARN and run the MapReduce program

  • Steps:
  • Configure the cluster to run MR on YARN
  • Start the cluster and test adding, deleting, and viewing files
  • Run the WordCount case on YARN
  1. Configure the cluster: in yarn-env.sh, add a statement at the end of the file
export JAVA_HOME=/opt/module/jdk1.8.0_144 (change the jdk path to your version)
  2. Configure yarn-site.xml
<!-- How the Reducer fetches data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- Set the address of YARN's ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
  3. Configure mapred-env.sh; also add JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_144
  4. Configure mapred-site.xml (rename mapred-site.xml.template to mapred-site.xml)
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
Add:
<!-- Run MR on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
  5. Start the cluster
  • Before starting, you must ensure that the NameNode and DataNode have been started, use jps to check whether they are started
  1. Start the ResourceManager (if it is already running from earlier, you can skip this)
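If it is not running yet, the Hadoop 2.x daemon script starts it the same way as the NodeManager below:

sbin/yarn-daemon.sh start resourcemanager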
  2. Start NodeManager
sbin/yarn-daemon.sh start nodemanager
  6. View the YARN web page in the browser: http://master:8088/cluster


  7. Run the MapReduce program. Delete the earlier output directory first to prevent a conflict, or change the output folder name, for example to output1.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /user/atguigu/input  /user/atguigu/output


This indicates success; you can also check it on the web side.


Success.

Problems

  1. Here I mainly explain a few of the bigger errors I encountered with Hadoop. The biggest one, which made me rebuild several times over, was getting stuck at the very last step.

I resolved this by referring to this: https://blog.csdn.net/wjw498279281/article/details/80317424

  2. Also, the HDFS file upload at the end failed. This was mainly because the firewall was not actually turned off: I had turned it off under root and then switched to the hadoop user, yet the upload still failed.

The solution for this: https://blog.csdn.net/qq_44702847/article/details/105403388

  3. There were some other mistakes, but it was the first problem that kept me stuck at the last step.

Conclusion

This was my first time building Hadoop, so I'm recording it here. After setting it up, I can say I have gained a little understanding of how Hadoop is built.

Origin blog.csdn.net/qq_44762290/article/details/110137505