Installing Hadoop in Fully Distributed Mode on CentOS 7 -- Hadoop Installation Series, Part 2

Copyright notice: This is an original post by the blogger and may not be reproduced without permission. https://blog.csdn.net/firehadoop/article/details/68981887

I. Experiment Objectives

     1. Learn how to install Hadoop in fully distributed cluster mode

     2. Learn how to sidestep the various pitfalls that come up during a Hadoop installation

II. Experiment Environment

    The network host configuration of the three machines is as follows:

192.168.10.166 master

192.168.10.167 slave01

192.168.10.168 slave02

Let's review the installation prerequisites:

  1. Three CentOS 7 servers are installed; the kernel version is Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

  2. The hadoop group is created, and a hadoop user is created and added to it

  3. The /etc/hosts and /etc/hostname files are modified

  4. The JDK is installed and the installation verified

  5. Passwordless SSH login is set up among the three hosts in the cluster, so that master can log in to slave01 and slave02 without a password

For the detailed environment and these prerequisite steps, see the previous post:

http://blog.csdn.net/firehadoop/article/details/68953541
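Prerequisites 3 through 5 can be spot-checked from master before continuing (a minimal sketch; the hostnames assume the /etc/hosts entries above):

--these should print the remote hostnames without prompting for a password
ssh hadoop@slave01 hostname
ssh hadoop@slave02 hostname

--and the JDK check from prerequisite 4
java -version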

III. Experiment Steps

 1. Modify the hadoop user's environment variables on master, then copy the file to slave01 and slave02

[hadoop@master ~]$ vi .bashrc

export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

--Copy the configuration file to slave01 and slave02

[hadoop@master ~]$ scp .bashrc hadoop@slave01:/home/hadoop/
.bashrc                                       100%  418     0.4KB/s   00:00    
[hadoop@master ~]$ scp .bashrc hadoop@slave02:/home/hadoop/
.bashrc                                       100%  418     0.4KB/s   00:00    
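The new variables only take effect at the next login; to apply and verify them in the current shell (a quick sketch):

source ~/.bashrc
echo $JAVA_HOME $HADOOP_HOME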


 2. As the hadoop user, extract the Hadoop archive under /home/hadoop and run the shell commands below to deploy the installation to /home/hadoop/bigdata/hadoop

[hadoop@master ~]$ tar -zxf hadoop-2.7.3.tar.gz 

mkdir bigdata

mv hadoop-2.7.3 bigdata/

cd bigdata/

mv hadoop-2.7.3 hadoop
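To sanity-check the deployment (a sketch, assuming the .bashrc from step 1 has been sourced so PATH picks up $HADOOP_HOME/bin):

ls $HADOOP_HOME/bin/hadoop
hadoop version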


3. Stop the firewall on all three machines in the cluster, and disable the firewall service so it is not loaded again at the next boot

--If the firewall is left running, then after the Hadoop cluster starts, hadoop commands will only work on master; running them on any other node will fail.

--Releases before CentOS 7 stopped the firewall with the commands below; CentOS 7 manages services with the systemctl tool, which covers what service and chkconfig used to do

service iptables stop

/etc/init.d/iptables stop

--Stop the firewall on master

[hadoop@master ~]$ systemctl stop firewalld.service
[hadoop@master ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.

--Stop the firewall on slave01

[hadoop@slave01 ~]$ systemctl stop firewalld.service
[hadoop@slave01 ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.

--Stop the firewall on slave02

[hadoop@slave02 ~]$ systemctl stop firewalld.service
[hadoop@slave02 ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.
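On each of the three nodes you can confirm that the service is both stopped and disabled (a sketch; the two commands should print inactive and disabled respectively):

systemctl is-active firewalld.service
systemctl is-enabled firewalld.service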


4. On master, edit the Hadoop configuration file core-site.xml

vim /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <!-- set node temp dir -->
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/bigdata/data/hadoop/tmp</value>
  </property>
</configuration>
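Hadoop will create hadoop.tmp.dir on demand, but pre-creating it as the hadoop user avoids permission surprises later (an optional precaution, not part of the original steps):

mkdir -p /home/hadoop/bigdata/data/hadoop/tmp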

--core-site.xml parameter reference from the official Hadoop documentation

| Parameter | Value | Notes |
| --- | --- | --- |
| fs.defaultFS | NameNode URI | hdfs://host:port/ |
| io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles. |

5. On master, configure the parameters in hdfs-site.xml

--Configure which machine in the cluster hosts the SecondaryNameNode. In this example it shares the master host with the NameNode; this is not recommended in a production environment, where it should run on a different server from master;

--Configure the local filesystem locations where the DataNode and NameNode actually store their data;

--Configure the number of replicas kept for each HDFS block; this example uses 3 replicas;

vim /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <!-- specify the SecondaryNameNode address -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
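The effective value of any key can be read back with hdfs getconf, which is handy for catching typos in these files (a sketch; it reads the local configuration, so it works even before the daemons start):

hdfs getconf -confKey dfs.replication
hdfs getconf -confKey dfs.namenode.name.dir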

--The official Hadoop documentation lists further NameNode options as well, such as the block size and which DataNode hosts to include or exclude

| Parameter | Value | Notes |
| --- | --- | --- |
| dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently. | If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
| dfs.hosts / dfs.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowable datanodes. |
| dfs.blocksize | 268435456 | HDFS blocksize of 256MB for large file-systems. |
| dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from large number of DataNodes. |

6. Edit the configuration file mapred-site.xml
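Note that a stock Hadoop 2.7.3 distribution ships this file only as a template; if mapred-site.xml does not exist yet, create it from the template first:

cd /home/hadoop/bigdata/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml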

vim /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  </property>
</configuration>

--According to the official Hadoop documentation, this file configures how the MapReduce computation framework is scheduled; here we plug in the YARN resource scheduling framework;

--Since YARN is a resource scheduling framework, all resource tuning for map, reduce, shuffle, and tasks is configured in this file; the resources in question come down to low-level JVM resource management;

| Parameter | Value | Notes |
| --- | --- | --- |
| mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
| mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| mapreduce.map.java.opts | -Xmx1024M | Larger heap-size for child jvms of maps. |
| mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| mapreduce.reduce.java.opts | -Xmx2560M | Larger heap-size for child jvms of reduces. |
| mapreduce.task.io.sort.mb | 512 | Higher memory-limit while sorting data for efficiency. |
| mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
| mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from very large number of maps. |

7. Edit the configuration file yarn-site.xml

--This step configures the ResourceManager and NodeManager nodes

vim /home/hadoop/bigdata/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
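Once the cluster is started in step 11, you can confirm from master that both NodeManagers registered with this ResourceManager (a sketch; run it only after the daemons are up):

yarn node -list

--it should report slave01 and slave02 in RUNNING state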

--The official Hadoop documentation has a great many parameters for tuning ResourceManager and NodeManager resources in the YARN framework. In an experimental environment the minimal settings above are enough to get things running, but in a production environment the parameters below deserve close study;

  • Configurations for ResourceManager:

| Parameter | Value | Notes |
| --- | --- | --- |
| yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.hostname | ResourceManager host. | host Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components. |
| yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler |
| yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
| yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the Resource Manager. | In MBs |
| yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |
  • Configurations for NodeManager:

| Parameter | Value | Notes |
| --- | --- | --- |
| yarn.nodemanager.resource.memory-mb | Resource i.e. available physical memory, in MB, for given NodeManager | Defines total available resources on the NodeManager to be made available to running containers |
| yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
| yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk i/o. |
| yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk i/o. |
| yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled. |
| yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications. |

8. Edit the slaves file

[hadoop@master hadoop]$ vim slaves 
slave01
slave02

--The slaves file lets the helper scripts run commands on many hosts at once. It is not used for any of the Java-based Hadoop configuration. To use this feature, SSH trust must be established for the account used to run Hadoop.

9. Copy the entire /home/hadoop/bigdata/hadoop directory on master to the corresponding directory on slave01 and slave02

--Copy the hadoop files to slave01

--Important: you must first log in to slave01 and manually create the bigdata directory under /home/hadoop/; otherwise the hadoop directory level under bigdata will be lost after the copy

[hadoop@master hadoop]$ scp -r /home/hadoop/bigdata/hadoop/ hadoop@slave01:/home/hadoop/bigdata

--The same applies to slave02: first log in and create the bigdata directory under /home/hadoop/, or the hadoop directory level under bigdata will be lost after the copy

[hadoop@master hadoop]$ scp -r /home/hadoop/bigdata/hadoop/ hadoop@slave02:/home/hadoop/bigdata
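The manual-mkdir pitfall can also be avoided by creating the target directory over SSH before copying (a sketch, using the passwordless login established earlier):

ssh hadoop@slave01 'mkdir -p /home/hadoop/bigdata'
ssh hadoop@slave02 'mkdir -p /home/hadoop/bigdata'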


10. Initialize the NameNode filesystem

--The NameNode must be formatted first, or it will not start successfully

[hadoop@master sbin]$  hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


17/04/03 05:44:41 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.10.166
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.3

17/04/03 05:44:41 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/04/03 05:44:41 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-a36eb93b-a2f3-482e-b3c8-8507c2aeca07
17/04/03 05:44:42 INFO namenode.FSNamesystem: No KeyProvider found.
17/04/03 05:44:42 INFO namenode.FSNamesystem: fsLock is fair:true
17/04/03 05:44:42 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/04/03 05:44:42 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
17/04/03 05:44:42 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/04/03 05:44:42 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Apr 03 05:44:42
17/04/03 05:44:42 INFO util.GSet: Computing capacity for map BlocksMap
17/04/03 05:44:42 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:42 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
17/04/03 05:44:42 INFO util.GSet: capacity      = 2^21 = 2097152 entries
17/04/03 05:44:42 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/04/03 05:44:42 INFO blockmanagement.BlockManager: defaultReplication         = 3
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxReplication             = 512
17/04/03 05:44:42 INFO blockmanagement.BlockManager: minReplication             = 1
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
17/04/03 05:44:42 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/04/03 05:44:42 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
17/04/03 05:44:42 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
17/04/03 05:44:42 INFO namenode.FSNamesystem: supergroup          = supergroup
17/04/03 05:44:42 INFO namenode.FSNamesystem: isPermissionEnabled = true
17/04/03 05:44:42 INFO namenode.FSNamesystem: HA Enabled: false
17/04/03 05:44:42 INFO namenode.FSNamesystem: Append Enabled: true
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map INodeMap
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^20 = 1048576 entries
17/04/03 05:44:43 INFO namenode.FSDirectory: ACLs enabled? false
17/04/03 05:44:43 INFO namenode.FSDirectory: XAttrs enabled? true
17/04/03 05:44:43 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
17/04/03 05:44:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map cachedBlocks
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^18 = 262144 entries
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
17/04/03 05:44:43 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/04/03 05:44:43 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^15 = 32768 entries
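If the format succeeded, the NameNode storage directory configured in hdfs-site.xml now exists and holds the new cluster metadata; a quick on-disk check (a sketch):

ls /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current
cat /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current/VERSION

The clusterID printed from the VERSION file will matter again in the troubleshooting section below.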


11. Start Hadoop and verify that the installation and configuration succeeded

--Enter the Hadoop sbin directory and run the start script

[hadoop@master hadoop]$ cd /home/hadoop/bigdata/hadoop/sbin
[hadoop@master sbin]$ sh start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-namenode-master.out
slave01: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave01: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
slave02: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave02.out

--Verify that the master node is running correctly

[hadoop@master sbin]$ jps -m
22996 SecondaryNameNode
23158 ResourceManager
22778 NameNode
23420 Jps -m

--Verify that the slave01 node is running correctly

[hadoop@slave01 current]$ jps -m
16377 NodeManager
16250 DataNode
16506 Jps -m

--Verify that the slave02 node is running correctly

[hadoop@slave02 current]$ jps -m
59138 NodeManager
59011 DataNode
59275 Jps -m

--Check the HDFS status of every node in the cluster

[hadoop@slave01 ~]$ hadoop dfsadmin -report

Configured Capacity: 38002491392 (35.39 GB)
Present Capacity: 27343609856 (25.47 GB)
DFS Remaining: 27343589376 (25.47 GB)
DFS Used: 20480 (20 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (2):    --here we can clearly see that both DataNodes are alive

Name: 192.168.10.167:50010 (slave01)
Hostname: slave01
Decommission Status : Normal
Configured Capacity: 19001245696 (17.70 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 5321039872 (4.96 GB)
DFS Remaining: 13680193536 (12.74 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 03 18:06:47 PDT 2017

Name: 192.168.10.168:50010 (slave02)
Hostname: slave02
Decommission Status : Normal
Configured Capacity: 19001245696 (17.70 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 5337841664 (4.97 GB)
DFS Remaining: 13663395840 (12.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 71.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 03 18:06:48 PDT 2017

--Verify through Hadoop's web interfaces

ResourceManager: http://master:8088/

HDFS NameNode: http://master:50070/
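Beyond the web UIs, a short end-to-end smoke test exercises HDFS from the command line (a sketch; /tmp-test is an arbitrary path chosen here for illustration):

hdfs dfs -mkdir /tmp-test
hdfs dfs -put /etc/hosts /tmp-test/
hdfs dfs -ls /tmp-test
hdfs dfs -rm -r /tmp-test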


IV. Troubleshooting

1. The NameNode on master keeps failing to start

Inspecting the logs/hadoop-hadoop-namenode-master.log file under the Hadoop directory shows the cause: the NameNode was never formatted, so its configured storage directory does not exist

2017-04-02 14:58:10,052 WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hadoop/bigdata/data/hadoop/hdfs/namenode does not exist
2017-04-02 14:58:10,053 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hadoop/bigdata/data/hadoop/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2017-04-02 14:58:10,090 INFO org.mortbay.log: Stopped [email protected]:50070
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2017-04-02 14:58:10,091 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

2. Formatting the NameNode multiple times left the DataNodes with metadata that no longer matches the NameNode, so the DataNodes cannot start

cat /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current/VERSION

--Note the clusterID line and copy it

clusterID=CID-a36eb93b-a2f3-482e-b3c8-8507c2aeca07

--Log in to slave01

vim /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION

You will find that its clusterID differs from the value on master above; overwrite it by pasting in the value copied from master;

--Log in to slave02 and perform the same overwrite as on slave01;

vim /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION
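Instead of editing each VERSION file by hand, the clusterID can be pushed out from master in one step (a sketch, relying on the passwordless SSH set up earlier; the paths match the dfs.namenode.name.dir and dfs.datanode.data.dir configured above):

# read the authoritative clusterID line from the NameNode's VERSION file
CID=$(grep clusterID /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current/VERSION)
# overwrite the clusterID line on every DataNode
for h in slave01 slave02; do
  ssh hadoop@$h "sed -i 's/^clusterID=.*/$CID/' /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION"
done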

Cause:

I fell into this pit because after installing the cluster I started it without formatting the NameNode, so the NameNode would not start while the DataNodes would. I then formatted the NameNode but forgot to stop the firewall, which left the other machines in the cluster unusable, so I formatted the NameNode twice more and finally stopped the firewall. At that point the NameNode started, but the DataNodes refused to come up no matter what.

This problem is generally caused by formatting the NameNode two or more times, and there are two ways to fix it.

The first is to delete all of the DataNode data (that is, delete /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION on every DataNode in the cluster), then run hadoop namenode -format and restart the cluster.

The second is to edit the clusterID in /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION on every DataNode so that it matches the value on master.

3. The firewall was left running, so only master can use hadoop commands

--The firewall on master had not been stopped. The firewall on master must be stopped; stopping it on slave01 and slave02 is recommended as well

[hadoop@slave02 current]$ hadoop fs -ls /
ls: No Route to Host from  slave02/192.168.10.168 to master:9000 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost

Check for the following four error conditions:

  • The hostname of the remote machine is wrong in the configuration files
  • The client's host table /etc/hosts has an invalid IPAddress for the target host.
  • The DNS server's host table has an invalid IPAddress for the target host.
  • The client's routing tables (In Linux, iptables) are wrong.

V. Summary

     Installing Hadoop 2.7.3 on a cluster is a process that has to be followed step by step, and many of the problems encountered along the way were self-inflicted, caused by unfamiliarity with the installation flow. Looking back, the flow is:

   1. Confirm the Java environment is installed across the cluster;

   2. Confirm the hadoop group and user are created and the /etc/hosts and /etc/hostname files are modified across the cluster;

   3. Confirm passwordless login from master to slave01 and slave02;

   4. Confirm the hadoop user's .bashrc environment file is modified across the cluster;

   5. Confirm the firewall is stopped on all three servers and its service disabled at boot;

   6. Extract the Hadoop 2.7.3 archive on master and arrange the installation directory;

   7. In the configuration directory /home/hadoop/bigdata/hadoop/etc/hadoop, modify the following five configuration files:

        core-site.xml

        hdfs-site.xml

        mapred-site.xml

        yarn-site.xml

        slaves

   8. Log in to slave01 and slave02 as the hadoop user and create the bigdata directory under /home/hadoop;

   9. Use scp with -r (recursive directory copy) to copy everything under /home/hadoop/bigdata/hadoop on master into /home/hadoop/bigdata on slave01 and slave02;

   10. Initialize the HDFS filesystem on the NameNode;

   11. Enter /home/hadoop/bigdata/hadoop/sbin and start the Hadoop services with sh start-all.sh.

