Hadoop (2) - Fully distributed installation, Hadoop high availability


1. Fully distributed installation

After reformatting, the old data directories do not necessarily have to be deleted, as long as the clusterID and namespaceID stay consistent between the NameNode and the DataNodes.

Pseudo-distributed mode puts all role processes on the single node node06; fully distributed mode spreads the roles across different nodes.

In the previous setup all role processes ran on the same node (hadoop0); in a real deployment the NameNode should have its own dedicated server.

  1. All environments must have a JDK;
    verify with jps.
  2. Synchronize the time of all servers;
    check the host aliases: cat /etc/hosts must contain the IP mappings so the nodes can ping each other by name;
    cat /etc/sysconfig/selinux to check that SELinux is disabled (these checks are sketched right after this list);
    for fully distributed password-free login: whichever node is the master / management node must distribute its own public key file to the others.
    The password-free login here involves one master node and the other three slave nodes.
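
A minimal sketch of these preparation checks, run on every node (the ntpdate command and the NTP server address are assumptions; any time-sync method will do):

jps                              # confirm the JDK is installed and jps is on the PATH
date                             # compare the clocks of the servers
ntpdate ntp1.aliyun.com          # (assumed) synchronize the clock against an NTP server
cat /etc/hosts                   # every node's IP-to-hostname mapping must be present
cat /etc/sysconfig/selinux       # SELINUX should be set to disabled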

Step 1: Go back to the home directory and check whether the hidden .ssh directory exists. (If it does not, it needs to be created.)

cd 
ll -a
(check whether the .ssh directory exists)
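
If the .ssh directory does not exist yet, generating a key pair will create it; a sketch using the DSA key type that the id_dsa file below assumes:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa            # generate the key pair with an empty passphrase
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys     # let the node log into itself without a password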

Step 2: Enter the .ssh directory and distribute the public key with the scp command:

On the node06 server:
cd .ssh/
scp id_dsa.pub node07:`pwd`/node06.pub
(node06 is the master node; node07 is the node the key is distributed to)
On the node07 server:
cd .ssh/
cat node06.pub >> authorized_keys

Holding the public key means that from the node06 server you can now log in without a password via ssh node07.

With this foundation you can build the fully distributed cluster. If you have not built the pseudo-distributed setup before, you need to modify hadoop-env.sh first and hard-code JAVA_HOME; otherwise startup will complain that the JVM cannot be found.
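
A sketch of the hadoop-env.sh change (the JDK path below is an assumption; use the actual path on your machines):

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_181    # hard-code the JDK location so remote starts can find the JVM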
core-site.xml also needs to be modified:
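A sketch of what core-site.xml could look like for this setup (the port is an assumption; the full directory matches the path used for formatting later in this article):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node06:9000</value>        <!-- NameNode on node06; port assumed -->
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/sxt/hadoop/full</value>      <!-- where the formatted metadata will live -->
  </property>
</configuration>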
In addition, hdfs-site.xml needs to be modified:
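A sketch of hdfs-site.xml for the fully distributed (non-HA) setup; the replication factor and the SecondaryNameNode host are assumptions:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>                                     <!-- assumed replication factor -->
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node07:50090</value>                          <!-- assumed: SecondaryNameNode on node07 -->
  </property>
</configuration>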
The slaves file also needs to be modified to list the slave (DataNode) nodes:
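A sketch of the slaves file, assuming the three DataNodes are node07, node08 and node09 (one hostname per line; node06 is deliberately left out):

node07
node08
node09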
If node06 is accidentally put in the first row, the master and a slave end up on the same machine; in that case the DataNode role has to be moved off the master node.

Distribution operation:

First, check that each server has the same directory layout.
Then distribute the corresponding files.
scp -r sxt/ node07:`pwd`
Detail: note where the environment variables are configured. So far only node06 has them, and they need to be configured successfully on every node.
The distribution of profile environment variables is realized through scp:
scp /etc/profile node07:/etc/
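
A sketch that distributes the Hadoop directory and /etc/profile to every slave node in one loop (the installation path and the slave hostnames are assumptions):

for n in node07 node08 node09; do
  scp -r /opt/sxt/hadoop-2.7.4 $n:/opt/sxt/    # copy the Hadoop installation (path assumed)
  scp /etc/profile $n:/etc/                    # copy the environment variables
done
# afterwards run `source /etc/profile` on each slave node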

The basic conditions for starting the cluster are now met, but before the cluster is started for the first time the NameNode formatting operation must be performed.

Execute hdfs namenode -format on hadoop0 (node06) to format. Formatting writes the metadata into the location of the full directory defined in core-site.xml.

Formatting is only done on the master node, not on the other nodes; the other nodes get their directories when the cluster starts.

You can go into the full folder (cd /var/sxt/hadoop/full/dfs) to inspect it, then start the cluster through start-dfs.sh. After startup completes, check the processes on each node with jps.
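
The whole start-up sequence in one place (format exactly once, on node06 only):

hdfs namenode -format      # on node06 only; writes the metadata under /var/sxt/hadoop/full
start-dfs.sh               # starts the NameNode, the DataNodes and the SecondaryNameNode
jps                        # run on every node to verify which role processes came up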

On the master node (hadoop0, i.e. node06), jps shows only the NameNode; the DataNode processes show up on the slave nodes.
Suppose one of the nodes has a problem. What should you do?

  • Check the logs under hadoop-2.7.4/logs: the master node (hadoop0 / node06) only has NameNode logs, while the other nodes (hadoop1, hadoop2, hadoop3) have DataNode logs.
  • View the last 100 lines of a log: tail -100 hadoop-root-datanode-hadoop1.log
  • When files are uploaded they are cut strictly according to the configured block size, e.g. hdfs dfs -D dfs.blocksize=1048576 -put test.txt

2. High availability

The NameNode in Hadoop 1.x is the weak point: a single NameNode controls many DataNodes and is the only management node for the entire cluster. Once the NameNode goes down, the whole cluster becomes unavailable.

The above is the single point of failure problem. The NameNode keeps the metadata in memory and does not constantly interact with the disk. Once the cluster reaches a large scale, a single node controlling and maintaining the metadata of the whole cluster may run out of capacity and limit the performance of the entire cluster. This is the single point bottleneck problem.

F–Federation;
HA–high availability

HA: there can be multiple master nodes, but they do not serve at the same time; it is a backup mechanism. When the active node goes down, the standby takes over, so a single point of downtime no longer brings the whole cluster down.

Federation: multiple master nodes provide service at the same time; it exists to expand the capacity of the NameNode.

The Hadoop 2.0 master-slave architecture model:

In the architecture diagram, automatic switching appears at the top and manual switching at the bottom. The ZooKeeper framework is used in the automatic setup to coordinate and plan the cluster.

With two NameNode nodes, if one is to be able to replace the other, their data must be synchronized. Synchronizing the metadata really means synchronizing the descriptions of the data blocks stored below.

The block (location) data is called dynamic, and the offsets and file metadata are called static.

There are two ways of synchronization, static and dynamic.

Why is it called dynamic? Because the DataNodes report it to the NameNode. The DataNodes now report to both NameNodes at the same time, so the dynamic block information goes from a single report to a multi-party report, and this part of the metadata stays synchronized between the two NameNodes.

The key question is: how is the static metadata synchronized?

One option is socket communication, hard-coded between the two NameNodes. But this method needs an ACK confirmation mechanism: the sender must get feedback and faces the dilemma of whether to wait for confirmation. If the other machine is broken, confirmation never arrives. This is strong consistency, and it turns the single point into a blocking role.

The client role only keeps in touch with the active namenode.

How does Hadoop do it? Edits are written into the log file and confirmed by the NameNode (this was the earlier approach).

Let a new server store the current log file (the operating commands issued by the client), and the other NameNode reads from that log to achieve synchronization. This is the NFS approach.

The disadvantage of this approach is that there is still a single point of failure, so the JournalNode technique was derived from it.

A cluster of log server nodes (JournalNodes) helps the cluster servers synchronize their data. Three servers receive the log files at the same time, and all three receive the same content. Why should all three keep the same content? For fear that one of them breaks; it is an insurance measure.

The reason for multiple servers: any single one of them may go down. But multiple servers raise a new question: is it necessary for all three servers to confirm that the write succeeded? That kind of strong consistency is possible in relational databases, but not feasible in a cluster.

Therefore, once one of the three fails, there must be a lower tolerance limit: with 3 servers, one is allowed to be down. Clusters of odd size are generally used. This is weak consistency.

This involves the CAP theorem.

The active NameNode writes its data into the JournalNode cluster, and the standby synchronizes that data (doing the persistence work), so there is no SecondaryNameNode.

The data is divided into two parts: the dynamic part, which the DataNodes report to both NameNodes; and the static metadata, which is synchronized by writing to and reading from the JournalNode cluster.

Manual switch
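
For a manual switch Hadoop ships the hdfs haadmin tool; a sketch, assuming the two NameNodes are configured under the logical IDs nn1 and nn2:

hdfs haadmin -getServiceState nn1      # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
hdfs haadmin -failover nn1 nn2         # hand the active role over from nn1 to nn2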


Automatic switching

Automatic switching must rely on a ZooKeeper cluster, a distributed coordination system that coordinates the running state of other big data clusters.

ZooKeeper plays the core management role here.

Zookeeper is the best architecture for distributed coordination.

A separate physical process is started alongside each NameNode, called the FailoverController (ZKFC), the failover control process.

How does ZooKeeper complete the master-slave switch? At the beginning both NameNodes are healthy. ZooKeeper provides an election mechanism: each NameNode applies to ZooKeeper, and whichever registers first becomes the master (active) node.

ZooKeeper can be understood as a small database whose purpose is coordination: registering a node means creating a node path, a znode.

Under that path there is the node's registration information. ZooKeeper maintains this through a parent-child directory tree. As long as the registration exists, it means the master node is still holding it.

Registration means creating a node in the ZooKeeper cluster.

Event monitoring: ZooKeeper and the NameNode stay in touch at all times, through the ZKFC. If the node suddenly goes down, this is detected because the ZKFC process runs two components: a health monitor and an elector. The ZKFC monitors the health of its NameNode at all times; its job is to let the NameNode take part in the election and to watch its health.

After the problem is discovered, ZooKeeper learns of the event and notifies the standby node. The standby has entrusted ZooKeeper to watch for such events; when the ZooKeeper cluster captures the event it informs the standby, and what the standby does is take over the authority and register itself as the master node. This behavior is called a callback.

The callback function belongs to each client (in this case, the ZKFC).

The key behaviors are registration (node creation) and notification.
ZooKeeper maintains the following sequence of operations:

  1. After registration succeeds, the ZKFC process observes the health of the master node at all times;
  2. Once it is found unhealthy, ZooKeeper is notified through the election mechanism that the identity of this elector has changed;
  3. After the event occurs on the node, ZooKeeper informs the standby, which regards itself as the new active (but it cannot switch over directly yet);
  4. The standby first forcibly turns the old active into standby, and only then turns itself into the active.

Only one active NameNode is allowed at any given time.


Federation

In the federation diagram, three NameNodes run in parallel at the same time, and the underlying DataNodes divide the work according to the storage model described below.

The role of federation:
The memory of a single NameNode is limited, but the amount of underlying data can be very large, so the limitations of a single node become apparent.

Federation form one:
Why federation is needed: the cluster is unbalanced. Multiple NameNode nodes work in parallel and jointly use the underlying storage space.

Federation form two:
Add NameNode nodes when the original memory space cannot be expanded;

Federation form three:
When the DataNodes are too many for one NameNode, multiple NameNodes operate and manage them jointly in the form of federation.

In the first form each NameNode handles a different business and the storage space is merged; in the third form all of them handle the same business.
(Remember the scenario where the boss divides up the hosts.) If there are many NameNode nodes, there is also a drawback for the client: it has to remember which NameNode handles which business.

Such a drawback is usually handled in enterprises by providing a service platform interface on top of the federated NameNode storage. The interface mechanism classifies the operations: the client contacts the service platform, first finds the platform's big-data interface, and these interfaces take care of classifying and storing data on the underlying DataNodes.

There is another problem: every server carries the risk of downtime. Horizontally, federation expands the storage capacity of the server cluster; vertically, each machine must therefore be made highly available. The purpose of high availability is to keep the master node's information synchronized.
Setting up federation is not the focus here. What the NameNode stores is metadata, which contains information such as offsets.

Build HA high-availability cluster

The two NameNode nodes must be able to switch over automatically, so there must be password-free SSH between node06 and node07 in both directions.

Generate a key pair (on node07):
Append the public key to the node's own authorized_keys file.
Distribute it to the .ssh directory on node06 (renaming the file to avoid overwriting node06's own key):
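A sketch of the whole exchange (node07 generates its own key, trusts itself, and hands its public key to node06 under a new name):

On node07:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/id_dsa.pub node06:~/.ssh/node07.pub    # renamed so node06's own key is not overwritten
On node06:
cat ~/.ssh/node07.pub >> ~/.ssh/authorized_keys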

dfs.nameservices defines a logical name, which then has to be associated with the configured NameNode information.

High-availability construction theory

Version 2.0 mainly solves the single point of failure problem of version 1.0 through high-availability HA and federation.

Read the document;

dfs.nameservices:
the logical service name that HDFS is addressed by;

dfs.namenode.rpc-address: used to locate the physical machines for remote (RPC) service calls.

dfs.namenode.http-address: serves the web UI, allowing a browser to access the cluster.

dfs.namenode.shared.edits.dir: the shared edits location, i.e. the address of the JournalNode cluster.

dfs.ha.fencing.methods: means state isolation. This configuration isolates (fences) the node that has failed so that the surviving node can take over.

Example: there are two nodes, one is the ANN (active NameNode) and the other the SBNN (standby NameNode). When the ANN fails, the election mechanism notifies the ZooKeeper cluster, ZooKeeper encapsulates the event and hands it to the SBNN. When the SBNN gets it, it does not simply promote itself; it first forcibly converts the state of the ANN to make it an inactive node.

This conversion is the function of dfs.ha.fencing.methods.

The file involved in the private key operation is id_dsa.

The commands: generate a key file, then append the public key to the authorized_keys file.
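
Putting these properties together, a sketch of the HA part of hdfs-site.xml (the nameservice name mycluster, the NameNode IDs nn1/nn2, the JournalNode hosts and the ports are assumptions; the private key path matches the id_dsa file discussed above):

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node06:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node07:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node06:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node07:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node07:8485;node08:8485;node09:8485/mycluster</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_dsa</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>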

fs.defaultFS: points to the NameNode master entry; it is modified in core-site.xml.
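A sketch of the matching core-site.xml changes (mycluster and the ZooKeeper hosts are the same assumptions as above):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>                       <!-- the logical nameservice, not a single host -->
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>node07:2181,node08:2181,node09:2181</value>    <!-- the ZooKeeper cluster for automatic failover -->
</property>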
ZooKeeper sits outside the HDFS system itself; it is a separate component, and its startup and shutdown have nothing to do with the startup and shutdown of the current cluster.

In the ZooKeeper cluster, each NameNode has a session bound to it.

In web development a session is used as the unique identifier of a visitor. HTTP is a stateless protocol: the server cannot recognize a visitor on the next visit. To solve this, the server provides the session, which the client keeps via a cookie, so that different visitors can be told apart.

Each NameNode node is registered in ZooKeeper, and each NameNode corresponds to a session. A session has a life cycle: once the connection between the node and the cluster is interrupted, the znode bound to it is destroyed as well.

ZooKeeper provides three things: registration, event monitoring, and callbacks. The callback is a function of the client; in fact the callback is implemented by the ZKFC.
The ZKFC maintains and monitors the status of the NameNode.

ZooKeeper is a separate cluster, which should run on 3-5 nodes, detached from the Hadoop cluster itself. Add the corresponding parameter configuration to hdfs-site.xml to complete automatic failover.

When the ZooKeeper cluster is built it needs to know, first, how many servers participate in the cluster, and second, the ID of each server (the server id). This is tied to ZooKeeper's election mechanism: since it is a master-slave structure, a leader must be determined at the start, and during the election the server with the highest id wins.
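
A sketch of that bookkeeping (the hostnames and the data directory are assumptions): zoo.cfg lists every participating server, and each server writes its own id into the myid file in its data directory:

# conf/zoo.cfg
dataDir=/var/sxt/zk
clientPort=2181
server.1=node07:2888:3888
server.2=node08:2888:3888
server.3=node09:2888:3888

# on node07 (use 2 on node08, 3 on node09)
echo 1 > /var/sxt/zk/myid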


Then the configuration is distributed to the other nodes.
