Building a Hadoop 2.7.1 cluster on Linux, step by step (using three machines as an example)

1. Basic environment

Before installing Hadoop on Linux, you need to install two programs:

1.1 Installation Instructions

1. JDK 1.6 or higher (this article uses JDK 1.7). The JDK that ships with Red Hat is usually not the one you want; remove it and install the version you need.

2. SSH (Secure Shell). MobaXterm Personal is recommended as the SSH client (full-featured and easy to use).
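In addition, the start-all.sh script used later logs in to the worker machines over SSH, so passwordless SSH from redHat1 to redHat2 and redHat3 is normally set up as well. A minimal sketch, assuming the cluster runs as root (the user name is an assumption, not stated in the original):

ssh-keygen -t rsa                # on redHat1, accept the defaults and an empty passphrase
ssh-copy-id root@redHat2         # push the public key to redHat2
ssh-copy-id root@redHat3         # push the public key to redHat3
ssh redHat2 hostname             # should print redHat2 without asking for a password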

2. Host configuration

Since the cluster built here consists of three machines, the hosts file on each machine needs to be adjusted: edit /etc/hosts and configure the mapping between hostnames and IP addresses. The command is as follows:

vim /etc/hosts

If you don't have enough permissions, you can switch the user to root.

Add the same hostname-to-IP mappings to /etc/hosts on all three machines.

The hostnames of the three servers can be set to redHat1, redHat2 and redHat3 with the hostname command.
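The host entries themselves were lost from this copy of the article; the lines below are a sketch. 192.168.92.140 appears later in the article as redHat1's address, while the other two addresses are placeholders that must be replaced with the real IPs of redHat2 and redHat3:

192.168.92.140  redHat1    # master: NameNode / ResourceManager (address taken from later in the article)
192.168.92.141  redHat2    # placeholder IP - replace with the real address of redHat2
192.168.92.142  redHat3    # placeholder IP - replace with the real address of redHat3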

3. Hadoop installation and configuration

3.1 Create a file directory

For ease of management, create the following directories on redHat1 for the NameNode, DataNode and temporary files of HDFS:

/data/hdfs/name

/data/hdfs/data

/data/hdfs/tmp

Then copy these directories to the same location on redHat2 and redHat3 with the scp command.
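A minimal sketch of these two steps (paths follow the article; run the mkdir on redHat1 first):

mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp    # on redHat1
scp -r /data/hdfs redHat2:/data                            # copy the (empty) tree to redHat2
scp -r /data/hdfs redHat3:/data                            # and to redHat3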

3.2 Download

First, go to the Apache website to download Hadoop and pick one of the suggested mirrors. I chose version hadoop-2.7.1 and downloaded it into the /data directory on redHat1 with the following command:

wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz

Then extract hadoop-2.7.1.tar.gz into the /data directory with the following command:

tar -zxvf hadoop-2.7.1.tar.gz

3.3 Configure environment variables

Go back to the /data directory and configure the hadoop environment variables with the following commands:

vim /etc/profile

Add the Hadoop environment variables to the end of /etc/profile.
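The exact lines were not preserved in this copy of the article; a typical setup for the paths used here looks like the following (the JDK path is a placeholder and must match your actual installation):

export JAVA_HOME=/usr/java/jdk1.7.0_79    # placeholder JDK location
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin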

To make the hadoop environment variables take effect immediately, execute the following command:

source /etc/profile

Then run the hadoop command; if the usage help is printed, the configuration has taken effect.

hadoop

3.4 Hadoop configuration

Enter the configuration directory of hadoop-2.7.1:

cd /data/hadoop-2.7.1/etc/hadoop

Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves files in sequence.

3.4.1 Modify core-site.xml

vim core-site.xml

 

<configuration>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>file:/data/hdfs/tmp</value>
   <description>A base for other temporary directories.</description>
 </property>
 <property>
   <name>io.file.buffer.size</name>
   <value>131072</value>
 </property>
 <property>
   <name>fs.default.name</name>
   <value>hdfs://redHat1:9000</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.hosts</name>
   <value>*</value>
 </property>
 <property>
   <name>hadoop.proxyuser.root.groups</name>
   <value>*</value>
 </property>
</configuration>

Note: the value of hadoop.tmp.dir should point to the tmp directory created earlier (/data/hdfs/tmp). fs.default.name is the older, deprecated name of fs.defaultFS; it still works in Hadoop 2.7.1.

 

3.4.2 Modify hdfs-site.xml

vim hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
 
 <!-- Put site-specific property overrides in this file. -->
 
<configuration>
 <property>
   <name>dfs.replication</name>
   <value>2</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/data/hdfs/name</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/data/hdfs/data</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>redHat1:9001</value>
 </property>
 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>
 <property>
   <name>dfs.permissions</name>
   <value>false</value>
 </property>
</configuration>

Note: the values of dfs.namenode.name.dir and dfs.datanode.data.dir should point to the name and data directories created earlier.

 

3.4.3 Modify mapred-site.xml

mapred-site.xml does not exist by default; copy it from the template first:

cp mapred-site.xml.template mapred-site.xml

vim  mapred-site.xml

 

   

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 <property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
 </property>
</configuration>
 

3.4.4 Modify yarn-site.xml

vim yarn-site.xml

 

<?xml version="1.0"?>
 <!--
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
 
     http://www.apache.org/licenses/LICENSE-2.0
 
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License. See accompanying LICENSE file.
 -->
<configuration>

<!-- Site specific YARN configuration properties -->
 <property>
   <name>yarn.resourcemanager.address</name>
   <value>redHat1:18040</value>
 </property>
 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>redHat1:18030</value>
 </property>
 <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>redHat1:18088</value>
 </property>
 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>redHat1:18025</value>
 </property>
 <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>redHat1:18141</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>

3.4.5 Modify /data/hadoop-2.7.1/etc/hadoop/slaves

Delete the original localhost entry and replace it with the hostnames of the worker nodes:

vim /data/hadoop-2.7.1/etc/hadoop/slaves
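The original worker list was not preserved here; given the three-node layout and dfs.replication=2, it would typically contain the two worker hostnames (add redHat1 as well if it should also run a DataNode):

redHat2
redHat3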

Finally, copy the entire hadoop-2.7.1 folder and its subfolders into the same directory for redHat2 and redHat3 using scp:

scp -r /data/hadoop-2.7.1 redHat2:/data

scp -r /data/hadoop-2.7.1 redHat3:/data

4. Running Hadoop

On redHat1, start all of the daemons from the sbin directory of the Hadoop installation:

cd /data/hadoop-2.7.1/sbin

sh ./start-all.sh
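Note: on a brand-new cluster the NameNode has to be formatted once on redHat1 before the very first start (the troubleshooting section below uses bin/hadoop namenode -format, the older form of the same command):

/data/hadoop-2.7.1/bin/hdfs namenode -format    # run once, before the first start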

Check the cluster status:

/data/hadoop-2.7.1/bin/hdfs dfsadmin -report
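You can also run jps on each machine to see which daemons came up; with the configuration above, the expected picture is roughly:

jps    # on redHat1: NameNode, SecondaryNameNode, ResourceManager
jps    # on redHat2 and redHat3: DataNode, NodeManager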

Check the YARN web UI (port 18088 comes from yarn.resourcemanager.webapp.address):

http://192.168.92.140:18088/cluster/cluster

Check the HDFS web UI:

http://192.168.92.140:50070/dfshealth.html#tab-overview

 

Key points: problems encountered while configuring and running Hadoop

1. JAVA_HOME is not set

An error indicating that JAVA_HOME is not set is reported at startup.

In that case, edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and set JAVA_HOME there.

Write it as an absolute path; do not rely on the default line that reads it from the environment automatically.
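A sketch of the line to add in hadoop-env.sh (the JDK directory is a placeholder; use wherever your JDK 1.7 is actually installed):

export JAVA_HOME=/usr/java/jdk1.7.0_79    # must be an absolute path, not ${JAVA_HOME}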

2. FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)

 

Reason:

The data files in the local dfs.data.dir directory are inconsistent with the NameNode's metadata (typically because the NameNode has been reformatted), so the DataNode is rejected by the NameNode.

Solution:

1. Delete all files in the dfs.namenode.name.dir and dfs.datanode.data.dir directories

2. Modify hosts

 cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost

3. Reformat the NameNode: bin/hadoop namenode -format

4. Restart the cluster.
