CentOS 7 Hadoop installation (fully distributed)

Hadoop cluster installation modes
 
1) Stand-alone mode
Simply unpack the archive; no configuration is required. Mainly used for testing code. There is no distributed file system.
 
2) Pseudo-distributed
Has the form of a fully distributed cluster, but all processes are configured on a single node. There is a distributed file system, but it spans only that one node.
 
3) Fully distributed
Consists of one master node and several slave nodes. There is only one namenode, on the master (in a real production environment the namenode generally runs alone on its own node). The namenode stores the metadata describing the data held on the datanodes: which data lives on which datanode, who uploaded it, and so on. The datanodes do the real work and are responsible for storing the data. In fully distributed mode, if the namenode goes down the entire cluster becomes unusable. This single point of failure is the major drawback of fully distributed mode, so it is generally not used in production.
 
4) High availability
The cluster can keep providing service 24/7; this relies on ZooKeeper. The fully distributed architecture is one master with multiple slaves; the high-availability architecture is multiple masters with multiple slaves. That is, a high-availability cluster has at least two namenodes, but only one of them serves at a time. The serving namenode is called active; the others are hot backups, called standby, whose metadata is kept exactly in sync with the active's. When the active goes down, a standby immediately switches to active. If the failed namenode later recovers, it rejoins only as a standby. This architecture still has a flaw: only one namenode is active at a time, so if the cluster is very large (i.e., there is too much metadata), the single active namenode can easily be overwhelmed.
 
5) Federation
The same cluster can have multiple namenodes, and multiple of them can be active at the same time. These namenodes share all the datanodes in the cluster, and each namenode manages only part of the data on those datanodes. However, federation by itself still has single points of failure: if one active namenode goes down, the data it manages becomes inaccessible. Therefore, the usual practice is to build the cluster in "federation + high availability" mode.

 

Fully distributed installation

 

1. Cluster planning

Host name   IP               HDFS                         YARN
hadoop01    192.168.220.141  namenode, datanode           nodeManager
hadoop02    192.168.220.142  secondarynamenode, datanode  nodeManager
hadoop03    192.168.220.143  datanode                     resourceManager, nodeManager


  

2. Modify the hostname and /etc/hosts
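A minimal sketch of this step, using the host names and IPs from the cluster plan above (run the hostnamectl line on each node with that node's own name):

# On each node, set its hostname (example shown for hadoop01)
hostnamectl set-hostname hadoop01

# On every node, map the cluster host names in /etc/hosts
cat >> /etc/hosts <<EOF
192.168.220.141 hadoop01
192.168.220.142 hadoop02
192.168.220.143 hadoop03
EOF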

 

3. Generate the SSH key
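For passwordless SSH between the nodes, a key pair is generated on every node; a sketch:

# Generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa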

 

4. Append the public keys to authorized_keys
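One common way to do this is ssh-copy-id, which appends the local public key to the remote ~/.ssh/authorized_keys (assuming the same user account exists on all three nodes):

# Run on each node, copying its key to every node, itself included
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03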

 

5. Test the passwordless login
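A quick check; each ssh should now log in without asking for a password:

ssh hadoop02
exit
ssh hadoop03
exit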

 

6. Configuration files
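A minimal sketch of the environment setup this step covers, assuming JDK 1.8 and Hadoop 2.7.7 unpacked under /opt/modules (both paths and versions are assumptions; adjust to your actual install locations):

# Append to /etc/profile on every node, then run: source /etc/profile
export JAVA_HOME=/opt/modules/jdk1.8.0_221        # assumed JDK path
export HADOOP_HOME=/opt/modules/hadoop-2.7.7      # assumed Hadoop path
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Also set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_221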

 

 

7. Cluster configuration
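A minimal sketch of the main Hadoop config files under $HADOOP_HOME/etc/hadoop, matching the cluster plan above. The port numbers are common Hadoop 2.x defaults and the data directory is an assumption; copy the edited files to all three nodes.

# core-site.xml -- the default file system points at the namenode (hadoop01)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.7.7/data</value>  <!-- assumed path -->
  </property>
</configuration>

# hdfs-site.xml -- 3 replicas; secondarynamenode on hadoop02
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop02:50090</value>
  </property>
</configuration>

# yarn-site.xml -- resourcemanager on hadoop03
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop03</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

# slaves -- one datanode/nodemanager host per line
hadoop01
hadoop02
hadoop03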

 

 

8. Format the file system
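Formatting is done once, on the namenode host only (hadoop01 in the plan above):

# Run on hadoop01 only; reformatting an existing cluster destroys HDFS metadata
hdfs namenode -format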

 

 

9. Start Hadoop
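A sketch of the start-up under the plan above: HDFS is started from the namenode host and YARN from the resourcemanager host, since start-yarn.sh starts the resourcemanager on the local machine:

# On hadoop01: starts namenode, datanodes, and secondarynamenode
start-dfs.sh

# On hadoop03: starts resourcemanager and nodemanagers
start-yarn.sh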

 

10. Check the processes
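If everything started, jps on each node should show roughly the daemons from the cluster plan:

jps   # hadoop01: NameNode, DataNode, NodeManager
jps   # hadoop02: SecondaryNameNode, DataNode, NodeManager
jps   # hadoop03: ResourceManager, DataNode, NodeManager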

 

11. Test whether HDFS works normally
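A simple smoke test (the /test directory name is arbitrary):

# Create a directory, upload a file, and list it back
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test
hdfs dfs -ls /test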

 

Origin: www.cnblogs.com/caoxb/p/11280425.html