1. Prepare the environment
CentOS 7.4
Hadoop 3.2.1 (http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz)
JDK 1.8.x
2. Configure environment variables
Command: vi /etc/profile
#hadoop
export HADOOP_HOME=/opt/module/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
Command: :wq (save and quit vi)
Command: source /etc/profile (reloads the file so the new variables take effect)
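After sourcing the profile, a quick shell check confirms the variables expand as intended; a minimal sketch, assuming the install path used above:

```shell
# Re-declare the exports from /etc/profile and verify the expansion
export HADOOP_HOME=/opt/module/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
echo "$HADOOP_CONF_DIR"   # prints /opt/module/hadoop-3.2.1/etc/hadoop
```

Once the tarball is unpacked at that location, `hadoop version` should also resolve from any directory.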
3. Create the directories
Run each of the following commands:
mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data
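Under bash, the six commands above can be collapsed into a single `mkdir -p` with brace expansion, which produces the same directory tree:

```shell
# Create tmp, var, and the dfs name/data directories in one step;
# -p also creates the parent /root/hadoop as needed
mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}
```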
4. Modify the configuration files under etc/hadoop
(1) Modify core-site.xml
Add the following properties inside the <configuration> node:
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node180:9000</value>
</property>
(2) Modify hdfs-site.xml
Add the following properties inside the <configuration> node:
<property>
<!-- Master node (NameNode) address -->
<name>dfs.namenode.http-address</name>
<value>node180:50070</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/root/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/root/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>Disable permission checking.</description>
</property>
With dfs.permissions set to false, HDFS does not check permissions on the files it stores, which makes experimenting easier. To guard against accidental deletion, set it back to true, or simply delete the property node, since the default is true.
(3) Modify mapred-site.xml
Add the following properties inside the <configuration> node:
<!-- Run MapReduce on YARN (by default it runs locally) -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.1</value>
</property>
(4) Modify yarn-site.xml
Add the following properties inside the <configuration> node:
<!-- Site specific YARN configuration properties -->
<property>
<description>The address of the YARN master (ResourceManager)</description>
<name>yarn.resourcemanager.hostname</name>
<value>node180</value>
</property>
<!-- Auxiliary service that runs on the NodeManager. It must be set to mapreduce_shuffle before MapReduce jobs can run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--
<property>
<description>Memory available on each node in MB; the default is 8192 MB</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
</property>
-->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Note: yarn.nodemanager.vmem-check-enabled set to false disables the virtual-memory check. This is useful when installing on a virtual machine and makes the later steps less likely to fail. On a physical machine with plenty of memory, the property can be removed.
(5) Modify the workers file
Its contents should read:
node180
node181
node182
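The file can also be written in one step; a sketch assuming the install path used throughout this guide (the mkdir -p is only there so the sketch runs standalone; the directory already exists in a real install):

```shell
# Write the three worker hostnames into etc/hadoop/workers
mkdir -p /opt/module/hadoop-3.2.1/etc/hadoop
cat > /opt/module/hadoop-3.2.1/etc/hadoop/workers <<'EOF'
node180
node181
node182
EOF
```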
(6) Modify hadoop-env.sh, mapred-env.sh, and yarn-env.sh
Add the JDK path to each file:
# jdk
export JAVA_HOME="/opt/module/jdk1.8.0_161"
5. Modify the scripts under sbin
(1) Modify start-dfs.sh and stop-dfs.sh
Add at the top of each file:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
(2) Modify start-yarn.sh and stop-yarn.sh
Add at the top of each file:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
6. Synchronize the files to each node
(1) Synchronize the hadoop folder:
scp -r hadoop-3.2.1/ [email protected]:/opt/module
scp -r hadoop-3.2.1/ [email protected]:/opt/module
(2) Synchronize the data folder:
scp -r /root/hadoop/ [email protected]:/root
scp -r /root/hadoop/ [email protected]:/root
7. Start Hadoop
(1) Initialize HDFS on the NameNode
Open the folder: cd /opt/module/hadoop-3.2.1/bin
Execute the command: ./hadoop namenode -format
(2) Start the cluster on the NameNode
Open the folder: cd /opt/module/hadoop-3.2.1/sbin
Execute the command: ./start-all.sh
8. Test Hadoop
References:
https://blog.csdn.net/weixin_38763887/article/details/79157652
https://blog.csdn.net/s1078229131/article/details/93846369
Open: http://192.168.0.180:50070/
Open: http://192.168.0.180:8088/
9. Run a test analysis
Create a folder: hdfs dfs -mkdir -p /user/root
Upload a text file, wc.txt, to the Hadoop server.
Execute the command: hdfs dfs -put /root/wc.txt
Run the wordcount example: hadoop jar /opt/module/hadoop-3.2.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount wc.txt wcount
View the results: hdfs dfs -cat wcount/*
The job output lists each distinct word alongside its count.
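What wordcount computes can be previewed locally with standard shell tools. A sketch with a hypothetical two-line wc.txt (the contents of the real input file will of course differ):

```shell
# Hypothetical sample input
cat > /tmp/wc.txt <<'EOF'
hello hadoop
hello world
EOF
# Tally each word locally, analogous to the wordcount job's result
tr -s ' ' '\n' < /tmp/wc.txt | sort | uniq -c
# prints:  1 hadoop
#          2 hello
#          1 world
```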