Pseudo-distributed deployment of the Hadoop big data analytics platform, with a shell script for automated deployment

Environment: a single VMware virtual machine with IP 172.16.193.200.

Part 1: Manual deployment

1. Turn off the firewall and SELinux

systemctl stop firewalld
sed -i '/SELINUX/s/enforcing/disabled/g' /etc/selinux/config
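
Note that systemctl stop firewalld only lasts until the next reboot, and the SELinux edit in the config file only takes effect after a reboot. If you also want the firewall disabled permanently and SELinux switched to permissive for the current session, two extra commands (not part of the original steps) do that:

systemctl disable firewalld
setenforce 0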

2. Set the hostname

hostnamectl set-hostname huatec01

3. Install JDK 1.8

https://blog.csdn.net/weixin_44571270/article/details/102939666
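
The linked post covers the JDK installation in detail. For completeness, here is a minimal sketch consistent with the JAVA_HOME path used below; the archive name jdk-8u141-linux-x64.tar.gz is an assumption about which build was downloaded:

# assumes the JDK archive has been downloaded from Oracle or a mirror
tar xvf jdk-8u141-linux-x64.tar.gz -C /usr/local/   # unpacks to /usr/local/jdk1.8.0_141
echo 'export JAVA_HOME=/usr/local/jdk1.8.0_141' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile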

4. Download Hadoop

wget http://mirrors.hust.edu.cn/apache/hadoop/core/hadoop-2.8.5/hadoop-2.8.5.tar.gz
mv hadoop-2.8.5.tar.gz /usr/local
cd /usr/local
tar xvf hadoop-2.8.5.tar.gz
mv hadoop-2.8.5 hadoop
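
As a quick sanity check that the tree unpacked correctly, you can print the Hadoop version:

/usr/local/hadoop/bin/hadoop version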

5. Modify the Hadoop configuration files

cd /usr/local/hadoop/etc/hadoop/
  • hadoop-env.sh
  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
(1)hadoop-env.sh

This file configures the Hadoop runtime environment. Hadoop needs the JDK to run, so we change the value of export JAVA_HOME to the path of the JDK we installed, as follows:

export JAVA_HOME=/usr/local/jdk1.8.0_141
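
The same edit can also be applied non-interactively with sed, which is what the automation script in Part 2 does; a sketch for this JDK path:

sed -i 's#^export JAVA_HOME=.*#export JAVA_HOME=/usr/local/jdk1.8.0_141#' /usr/local/hadoop/etc/hadoop/hadoop-env.sh
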
(2)core-site.xml

This is the Hadoop core configuration file. Its contents are as follows:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://huatec01:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/huatec/hadoop-2.8.5/tmp</value>
        </property>
</configuration>

In the code above we configure two properties. The first specifies the communication address of the HDFS NameNode; here we point it at huatec01. The second specifies the directory where Hadoop stores the files it generates at runtime; we do not need to create this directory ourselves, because it is created automatically when Hadoop is formatted.

(3)hdfs-site.xml

This file is the HDFS core configuration file. Its contents are as follows:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>

The default replication factor in a Hadoop cluster is 3, but since this is a pseudo-distributed installation on a single node there is no need to keep three copies; we simply change the value of this property to 1. A fully distributed cluster would use three or more nodes.

(4)mapred-site.xml

This file does not exist by default, but a template file mapred-site.xml.template does. We rename the template to mapred-site.xml and then edit it. It is the MapReduce core configuration file; its contents are as follows:

mv mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
     <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

We configure this property because, from Hadoop 2.0 onward, MapReduce runs on the YARN architecture, so this has to be declared explicitly.

(5)yarn-site.xml

This file is the YARN framework configuration file. In it we specify the hostname of our ResourceManager node and the auxiliary service of the NodeManager. Its contents are as follows:

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>huatec01</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

In the code above we configure two properties. The first specifies the address of the ResourceManager; because this is a single-node deployment, we set it to huatec01. The second specifies how reducers obtain data (via the shuffle service).

6. Format the file system

On first startup we need to run hdfs namenode -format to format the file system, and then start HDFS and YARN; on subsequent startups we can start HDFS and YARN directly.

/usr/local/hadoop/bin/hdfs namenode -format
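
The format step prints a long log; a simple way to confirm it succeeded, mirroring the check the automation script in Part 2 performs, is to test the exit status right afterwards:

if [ $? -ne 0 ]; then echo "HDFS format failed"; fi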

7. Start HDFS and YARN

cd /usr/local/hadoop/sbin/
sh start-dfs.sh
sh start-yarn.sh

# stop hadoop
sh stop-dfs.sh
sh stop-yarn.sh
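
Note: start-dfs.sh and start-yarn.sh log in to the node over SSH, so they may prompt for a password unless passwordless SSH to the local host is configured. A minimal sketch, assuming root and the default key paths:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys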

8. View the Hadoop processes

jps
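
If all daemons started correctly, jps should list roughly the following processes (the PIDs here are illustrative):

2131 NameNode
2262 DataNode
2443 SecondaryNameNode
2601 ResourceManager
2712 NodeManager
3018 Jps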

Access 172.16.193.200:50070 to view the HDFS management interface.
Access 172.16.193.200:8088 to enter the MapReduce (YARN) management interface.
Deployment successful!

Part 2: Automated pseudo-distributed Hadoop deployment

#!/bin/bash
#authored by WuJinCheng
#This shell script is written in 2020.3.17
JDK=jdk1.8.tar.gz
HADOOP_HOME=/usr/local/hadoop
#------ initialize the installation environment ----
installWget(){
	echo -e '\033[31m --------------- Initializing the installation environment... ------------ \033[0m'
	sed -i '/SELINUX/s/enforcing/disabled/g' /etc/selinux/config
	systemctl stop firewalld
	wget -V
	if [ $? -ne 0 ];then
		echo -e '\033[31m Downloading the wget tool... \033[0m'
		yum install wget -y
	fi
}
#------JDK install----
installJDK(){
	ls /usr/local/|grep 'jdk'
	if [ $? -ne 0 ];then
		echo -e '\033[31m --------------- Downloading the JDK package... ------------ \033[0m'
		wget http://www.wujincheng.xyz/$JDK
		if [ $? -ne 0 ]
			then
				exit 1
		fi
		mv $JDK /usr/local/
		cd /usr/local/
		tar xvf $JDK
		mv jdk1.8.0_141/ jdk1.8
		ls /usr/local/|grep 'jdk1.8'
		if [ $? -ne 0 ];then
			echo -e '\033[31m JDK installation failed! \033[0m'
			exit 1
		fi
		echo -e '\033[31m JDK installed successfully! \033[0m'
	fi
}
JDKPATH(){
	echo -e '\033[31m --------------- Configuring environment variables... ------------ \033[0m'
	grep -q "export JAVA_HOME=" /etc/profile
	if [ $? -ne 0 ];then
		echo 'export JAVA_HOME=/usr/local/jdk1.8'>>/etc/profile
		echo 'export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib'>>/etc/profile
		echo 'export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin'>>/etc/profile
		source /etc/profile
	fi
}
#------hadoop install----
installHadoop(){
	hostnamectl set-hostname huatec01
	ls /usr/local/|grep "hadoop*"
	if [ $? -ne 0 ];then
		echo -e '\033[31m --------------- Downloading the hadoop package... ------------ \033[0m'
		wget http://mirrors.hust.edu.cn/apache/hadoop/core/hadoop-2.8.5/hadoop-2.8.5.tar.gz
		if [ $? -ne 0 ]
			then
				exit 2
		fi
		mv hadoop-2.8.5.tar.gz /usr/local
		cd /usr/local
		tar xvf hadoop-2.8.5.tar.gz
		mv hadoop-2.8.5 hadoop
	fi
}
#------hadoop conf----
hadoopenv(){
	sed -i '/export JAVA_HOME=/s#${JAVA_HOME}#/usr/local/jdk1.8#g' /usr/local/hadoop/etc/hadoop/hadoop-env.sh
}
coresite(){
echo '<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://huatec01:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/huatec/hadoop-2.8.5/tmp</value>
        </property>
</configuration>'>$HADOOP_HOME/etc/hadoop/core-site.xml
}
hdfssite(){
echo '<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
</configuration>'>$HADOOP_HOME/etc/hadoop/hdfs-site.xml
}

mapredsite(){
mv $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
echo '<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>'>$HADOOP_HOME/etc/hadoop/mapred-site.xml
}

yarnsite(){
echo '<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>huatec01</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
<!-- Site specific YARN configuration properties -->

</configuration>'>$HADOOP_HOME/etc/hadoop/yarn-site.xml
}
#------ hdfs format ----
hdfsFormat(){
	echo -e '\033[31m ----------------- Formatting HDFS... ------------ \033[0m'
	$HADOOP_HOME/bin/hdfs namenode -format
	if [ $? -ne 0 ];then
		echo -e '\033[31m Hadoop HDFS format failed! \033[0m'
		exit 1
	fi
	echo -e '\033[31m Hadoop HDFS format succeeded! \033[0m'
}
install(){
	installWget
	installJDK
	JDKPATH
	installHadoop
	hadoopenv
	coresite
	hdfssite
	mapredsite
	yarnsite
	hdfsFormat
}
case $1 in
	install)
		install
		echo -e '\033[31m Hadoop automated deployment complete! \033[0m'
	;;
	JDK)
		installWget
		installJDK
		JDKPATH
	;;
	Hadoop)
		installWget
		installHadoop
		echo -e '\033[31m Hadoop installation complete! \033[0m'
	;;
	confHadoop)
		hadoopenv
		coresite
		hdfssite
		mapredsite
		yarnsite
		echo -e '\033[31m Hadoop configuration complete! \033[0m'
	;;
	format)
		hdfsFormat
	;;
	*)
		echo "Usage:$0 {install|JDK|Hadoop|confHadoop|format}";;
	esac
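
To use the script, save it (for example as hadoop_deploy.sh; the file name is an assumption) and invoke it with one of the subcommands handled by the case statement:

sh hadoop_deploy.sh install      # full run: wget, JDK, hadoop, config files, HDFS format
sh hadoop_deploy.sh JDK          # install the JDK only
sh hadoop_deploy.sh Hadoop       # download and unpack hadoop only
sh hadoop_deploy.sh confHadoop   # rewrite the hadoop configuration files only
sh hadoop_deploy.sh format       # format HDFS only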

Part 3: Troubleshooting

1. At the seventh step, starting Hadoop may fail with a "logon denied" error.
Pinging our host shows that huatec01 is being resolved as an external domain name. So:

vim /etc/resolv.conf

Correct the resolver configuration and the error goes away.
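
The original screenshots are not available, but a common fix in this situation (an assumption here, not necessarily the exact change the screenshot showed) is to make the hostname resolve locally by mapping it in /etc/hosts:

echo '172.16.193.200 huatec01' >> /etc/hosts
ping -c 1 huatec01   # should now resolve to the local machine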

2. The DataNode process starts, but no DataNode appears in the Hadoop web interface.
This is caused by inconsistent clusterIDs from formatting repeatedly in the sixth step.

cd /tmp/hadoop-root/dfs/

The clusterID in the VERSION file under name/current and the clusterID in the VERSION file under data/current do not match. Edit them to be consistent, then restart the Hadoop services.
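
A sketch of the check and fix; the paths assume the default hadoop.tmp.dir location (/tmp/hadoop-root), so adjust them if hadoop.tmp.dir points elsewhere:

grep clusterID /tmp/hadoop-root/dfs/name/current/VERSION
grep clusterID /tmp/hadoop-root/dfs/data/current/VERSION
# copy the NameNode clusterID into the DataNode VERSION file, then restart HDFS
NN_ID=$(grep clusterID /tmp/hadoop-root/dfs/name/current/VERSION | cut -d= -f2)
sed -i "s/clusterID=.*/clusterID=${NN_ID}/" /tmp/hadoop-root/dfs/data/current/VERSION
sh /usr/local/hadoop/sbin/stop-dfs.sh
sh /usr/local/hadoop/sbin/start-dfs.sh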
