1. Basic environment
Before installing Hadoop on Linux, you need to install two things:
1.1 Installation instructions
1. JDK 1.6 or higher (this article uses JDK 1.7). The JDK bundled with Red Hat is generally not suitable; remove it and install the version you need.
2. SSH (Secure Shell). As a client, MobaXterm_Personal is recommended (full-featured and easy to use).
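Removing the bundled JDK can be sketched as follows on Red Hat. The package name below is only an example; query first and substitute whatever your system actually reports:

```shell
# list any JDK packages that shipped with the OS
rpm -qa | grep -i -E 'jdk|java'

# remove the unwanted ones (example package name; use what the query printed)
rpm -e --nodeps java-1.6.0-openjdk
```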
2. Host configuration
Since my Hadoop cluster consists of three machines, the hosts file on each machine must be adjusted to map hostnames to IP addresses. Edit /etc/hosts with the following command:
vim /etc/hosts
If you don't have enough permissions, you can switch the user to root.
Add the same host entries to all three machines.
The server names can be changed to redHat1, redHat2 and redHat3 with the hostname command.
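The exact entries were not shown; a minimal sketch, assuming the three machines sit on the 192.168.92.0/24 subnet used later in this article (replace with your machines' real IPs), would be:

```
192.168.92.140 redHat1
192.168.92.141 redHat2
192.168.92.142 redHat3
```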
3. Hadoop installation and configuration
3.1 Create a file directory
For ease of management, create directories on redHat1 for the HDFS NameNode data, DataNode data, and temporary files:
/data/hdfs/name
/data/hdfs/data
/data/hdfs/tmp
Then copy these directories to the same directory of redHat2 and redHat3 through the scp command.
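These steps can be sketched with the commands below, assuming passwordless SSH to redHat2 and redHat3 is already set up:

```shell
# create the NameNode, DataNode and temp directories on redHat1
mkdir -p /data/hdfs/name /data/hdfs/data /data/hdfs/tmp

# replicate the same layout to the other two nodes
scp -r /data/hdfs redHat2:/data
scp -r /data/hdfs redHat3:/data
```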
3.2 Download
First, download Hadoop from the Apache official site, choosing one of the recommended mirrors. I chose hadoop-2.7.1 and downloaded it into the /data directory on redHat1 with the following command:
wget http://archive.apache.org/dist/hadoop/core/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Then extract hadoop-2.7.1.tar.gz into the /data directory with the following command:
tar -zxvf hadoop-2.7.1.tar.gz
3.3 Configure environment variables
Go back to the /data directory and configure the hadoop environment variables with the following commands:
vim /etc/profile
Add the following to /etc/profile
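The exact lines added to /etc/profile were not shown; a minimal sketch, assuming Hadoop was unpacked to /data/hadoop-2.7.1 as above, is:

```shell
# /etc/profile additions (paths assume the layout used in this article)
export HADOOP_HOME=/data/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```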
To make the hadoop environment variables take effect immediately, execute the following command:
source /etc/profile
Then run the hadoop command; if usage information is printed, the configuration has taken effect:
hadoop
3.4 Hadoop configuration
Enter the configuration directory of hadoop-2.7.1:
cd /data/hadoop-2.7.1/etc/hadoop
Modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and slaves files in sequence.
3.4.1 Modify core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://redHat1:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
Note: hadoop.tmp.dir must point to the tmp directory created earlier.
3.4.2 Modify hdfs-site.xml
vim hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/data/hdfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/data/hdfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>redHat1:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Note: dfs.namenode.name.dir and dfs.datanode.data.dir must point to the name and data directories created earlier.
3.4.3 Modify mapred-site.xml
Copy the template to generate xml, the command is as follows:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3.4.4 Modify yarn-site.xml
vim yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>redHat1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>redHat1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>redHat1:18088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>redHat1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>redHat1:18141</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Note: the aux-services value must be mapreduce_shuffle (with an underscore). The older mapreduce.shuffle spelling is rejected by Hadoop 2.x, which only allows letters, digits and underscores in service names.
3.4.5 Modify slaves
vim /data/hadoop-2.7.1/etc/hadoop/slaves
Delete the original localhost entry and replace it with the hostnames of the slave nodes.
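The replacement content was not shown; assuming redHat2 and redHat3 act as the DataNode/NodeManager machines, the slaves file would simply list them, one hostname per line:

```
redHat2
redHat3
```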
Finally, copy the entire hadoop-2.7.1 folder and its subfolders into the same directory for redHat2 and redHat3 using scp:
scp -r /data/hadoop-2.7.1 redHat2:/data
scp -r /data/hadoop-2.7.1 redHat3:/data
4. Running Hadoop
Before the first start, format the NameNode on redHat1:
/data/hadoop-2.7.1/bin/hdfs namenode -format
Then start all daemons:
sh /data/hadoop-2.7.1/sbin/start-all.sh
Check the cluster status:
/data/hadoop-2.7.1/bin/hdfs dfsadmin -report
Test yarn:
http://192.168.92.140:18088/cluster/cluster
View HDFS (NameNode web UI):
http://192.168.92.140:50070/dfshealth.html#tab-overview
Focus: problems encountered while configuring and running Hadoop
1. JAVA_HOME not set
At startup, Hadoop reports an error that JAVA_HOME is not set.
Edit /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and add the JAVA_HOME path there.
Write it as an absolute path; do not rely on the ${JAVA_HOME} variable being picked up automatically, since it is not inherited when the daemons are started over SSH.
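A sketch of the fix, assuming the JDK was installed under /usr/java/jdk1.7.0_79 (adjust to your actual installation path):

```shell
# in /data/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
# replace the default "export JAVA_HOME=${JAVA_HOME}" line with an absolute path
export JAVA_HOME=/usr/java/jdk1.7.0_79
```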
2. FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-336454126-127.0.0.1-1419216478581 (storage id DS-445205871-127.0.0.1-50010-1419216613930) service to /192.168.149.128:9000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException): Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445205871-127.0.0.1-50010-1419216613930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-41993190-ade1-486c-8fe1-395c1d6f5739;nsid=1679060915;c=0)
Cause:
The data files in the local dfs.data.dir directory are inconsistent with the namenode's metadata, so the datanode is not accepted by the namenode.
Solution:
1. Delete all files in the dfs.namenode.name.dir and dfs.datanode.data.dir directories.
2. Modify /etc/hosts:
cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.149.128 localhost
3. Reformat the NameNode: bin/hadoop namenode -format
4. Restart the cluster.