Background
This post records the installation and deployment of Hadoop 2.5.0 (CDH 5.3.6) on CentOS 7.
Steps
1. Create a cdh directory and extract the Hadoop tarball into it
#mkdir cdh
#tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C cdh
2. Switch to the etc/hadoop directory under the Hadoop unpack directory and modify seven files: hadoop-env.sh, mapred-env.sh, mapred-site.xml.template, hdfs-site.xml, yarn-site.xml, core-site.xml and slaves.
In the two env files, only JAVA_HOME needs to be changed (JDK 8 on my CentOS is installed in the /home/szc/jdk8_64 directory, so JAVA_HOME is set to /home/szc/jdk8_64):
export JAVA_HOME=/home/szc/jdk8_64
After modifying the mapred-site.xml.template file as follows, rename it to mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.57.141:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.57.141:19888</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.57.141:50091</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.57.141</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
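The retain value above is simply one week expressed in seconds; a quick sanity check in the shell:

```shell
# yarn.log-aggregation.retain-seconds = 604800, i.e. aggregated logs
# are kept for one week: 7 days * 24 hours * 3600 seconds
echo $((7 * 24 * 3600))   # prints 604800
```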
core-site.xml (replace szc in the two hadoop.proxyuser.szc.* properties with your own user name; the directory given in hadoop.tmp.dir must also be created by yourself):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.57.141:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/szc/cdh/hadoop-2.5.0-cdh5.3.6/data/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.szc.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.szc.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
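The hadoop.tmp.dir directory does not exist after unpacking, so create it before formatting HDFS. A minimal sketch, assuming the unpack location from step 1 under the home directory (adjust the path to your own user name):

```shell
# Create the hadoop.tmp.dir directory from core-site.xml ahead of time.
# The path mirrors the value configured above; -p also creates parents
# and does nothing if the directory already exists.
HADOOP_HOME="$HOME/cdh/hadoop-2.5.0-cdh5.3.6"
mkdir -p "$HADOOP_HOME/data/tmp"
```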
slaves
192.168.57.141
All IP addresses above are the CentOS machine's own IP.
3. Format HDFS
Switch to the bin directory under the Hadoop unpack directory and run:
#hdfs namenode -format
The screenshot after completion is as follows
4. Start the corresponding processes
Switch to the sbin directory under the Hadoop unpack directory and run start-dfs.sh and start-yarn.sh to start HDFS and YARN, then run the following command to start the JobHistory server:
./mr-jobhistory-daemon.sh start historyserver
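To confirm that everything came up, jps can be used; on a single-node setup like this one, the daemon names below are what I would expect to see, based on the services configured above:

```shell
# List the running JVM processes; expect NameNode, DataNode,
# SecondaryNameNode, ResourceManager, NodeManager and JobHistoryServer.
jps
```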
5. View the cluster web UI from a Windows browser
First open port 50070:
[root@localhost sbin]# firewall-cmd --add-port=50070/tcp --permanent
success
[root@localhost sbin]# firewall-cmd --reload
success
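Port 50070 only covers the NameNode UI. To also reach the YARN and JobHistory web pages from Windows, the same firewall-cmd pattern applies to their ports; a sketch (19888 is the jobhistory webapp address configured earlier, and 8088 is the ResourceManager web UI default):

```shell
# Open the remaining web UI ports the same way (run as root).
for p in 8088 19888; do
    firewall-cmd --add-port=${p}/tcp --permanent
done
firewall-cmd --reload
```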
Then enter the CentOS IP followed by :50070 in a Windows browser; after pressing Enter, the following page is displayed
At this point, the Hadoop deployment is complete.
Conclusion
That's all; thanks for reading.