A follow-up on distributed website architecture: an analysis of ZooKeeper technology

http://www.cnblogs.com/sharpxiajun/archive/2013/06/02/3113923.html

ZooKeeper began as a sub-project of Hadoop. Although it grew out of Hadoop, I have found that it is increasingly used outside the Hadoop ecosystem as a building block for distributed frameworks. Today I want to talk about ZooKeeper. This article does not cover how to use ZooKeeper. Instead it covers what ZooKeeper is used for in practice, which kinds of applications can take advantage of it, and finally what role it can play in a distributed website architecture.

  ZooKeeper is a highly reliable coordination system for large distributed systems. From this definition we know two things: ZooKeeper is a coordination system, and what it coordinates is a distributed system. Why do distributed systems need a coordination system? The reason is as follows:

  Developing distributed systems is difficult, and the difficulty shows up mainly as "partial failure". Partial failure means that when a message is sent between two nodes and the network fails, the sender cannot know whether the receiver got the message. The cause can be complex: the message may have arrived before the network error occurred, it may never have arrived, or the receiver's process may have died. The only way for the sender to learn what really happened is to reconnect to the receiver and ask it. This is the "partial failure" problem in distributed system development.

  ZooKeeper is a framework for dealing with "partial failure" in distributed systems. It does not let a distributed system avoid partial failures. What it does is let the system handle them correctly when they occur, so that the system as a whole keeps running normally.

  Now I want to talk about the practical application scenarios of ZooKeeper:

  Scenario 1: A group of servers provides a service to clients (for example, the server side of the distributed website I built earlier is a cluster of four servers that serves the front-end cluster). On every request, we want the client to be able to find one server in the cluster that can provide the service it needs. For this to work, our program must hold a list of the servers in the group, and every client request picks a server from that list. Obviously the list cannot be stored on a single node, otherwise the whole cluster fails when that node goes down. We want the list to be highly available. The high-availability solution is to store the list in a distributed way and let the servers that store it manage it together. If one of the servers storing the list fails, the others can take over immediately and remove the failed server from the list, so that it drops out of the cluster. None of these operations are performed by the failed server. They are performed by the healthy servers in the cluster. This is an active distributed data structure: it can change the state of its data items on its own when external conditions change. ZooKeeper provides this service. It is called the unified naming service, and it is very similar to the JNDI service in Java EE.
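
  The idea in Scenario 1 can be sketched without a real ZooKeeper cluster. The following is only an in-memory simulation, not the ZooKeeper API: the class name `ServiceRegistry`, the addresses, and the session ids are all hypothetical. It assumes each server's list entry is tied to that server's session, so the entry disappears when the session ends, which is roughly how ephemeral nodes are used for this purpose.

```python
import random


class ServiceRegistry:
    """In-memory stand-in for a highly available list of live servers.

    Hypothetical illustration only; a real deployment would keep this
    list in ZooKeeper rather than in one process's memory.
    """

    def __init__(self):
        # Maps a server address to the session that registered it.
        self._servers = {}

    def register(self, address, session_id):
        """A server announces itself; the entry is tied to its session."""
        self._servers[address] = session_id

    def expire_session(self, session_id):
        """Drop every entry owned by a dead session."""
        self._servers = {
            addr: sid for addr, sid in self._servers.items() if sid != session_id
        }

    def live_servers(self):
        """Return the addresses of all servers currently registered."""
        return sorted(self._servers)

    def pick(self, rng=random):
        """Choose one live server for a client request."""
        servers = self.live_servers()
        if not servers:
            raise LookupError("no live servers registered")
        return rng.choice(servers)


registry = ServiceRegistry()
for i in range(1, 5):
    registry.register(f"10.0.0.{i}:8080", session_id=i)

registry.expire_session(3)  # server 3 crashes, so its entry is removed
print(registry.live_servers())
```

  The point of the sketch is that the failed server takes no part in its own removal: the registry drops the entry when the session ends, and clients calling `pick()` never see it again.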

  Scenario 2: Distributed lock service. Consider a distributed system that operates on data: it reads the data, analyzes it, and finally modifies it. In a distributed system these steps may be spread across different nodes in the cluster, which raises the problem of keeping the data consistent throughout the operation. If consistency is lost, we get an incorrect result. In a single-machine system consistency is easy to achieve, but in a distributed system it is much harder: the operations run in independent processes on different servers, and their intermediate results are passed over the network. ZooKeeper provides a lock service to solve this problem, so that we can guarantee the consistency of data operations across the cluster.
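
  One common way to build such a lock on a coordination service is to have every client take a numbered place in a queue and let the lowest number hold the lock. The following is a minimal in-memory sketch of that idea only, assuming a single process. `LockQueue` and the node names are hypothetical and this is not ZooKeeper's API.

```python
import itertools


class LockQueue:
    """In-memory sketch of a 'lowest sequence number wins' lock."""

    def __init__(self):
        self._counter = itertools.count()
        self._waiting = {}  # sequence number -> client name

    def request(self, client):
        """Join the queue and return the client's sequence number."""
        seq = next(self._counter)
        self._waiting[seq] = client
        return seq

    def holder(self):
        """The client with the smallest sequence number holds the lock."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq):
        """Release the lock or leave the queue (also models a crashed client)."""
        self._waiting.pop(seq, None)


lock = LockQueue()
a = lock.request("node-a")
b = lock.request("node-b")
c = lock.request("node-c")

# Each node does its work only while it is the holder, then releases.
order = []
for seq in (a, b, c):
    order.append(lock.holder())
    lock.release(seq)

print(order)
```

  Because only one client at a time has the smallest number, the read, analyze, and modify steps of different nodes cannot interleave.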

  Scenario 3: Configuration management. In a distributed system we deploy the same service application to n servers, and the configuration files on those servers are identical (for example, in the distributed website framework I designed there are 4 servers on the server side, all running the same program with the same configuration file). If a configuration option changes, we have to change these files one by one. With only a few servers this is not much trouble. With a large number of servers, such as the Hadoop clusters of some large Internet companies with thousands of machines, changing configuration options becomes tedious and dangerous. This is where ZooKeeper comes in handy. We can use ZooKeeper as a highly available configuration store and hand this job over to it. We copy the cluster's configuration into a node in ZooKeeper's file system, and every server in the distributed system watches that node. As soon as the configuration changes, each server receives a notification from ZooKeeper and synchronizes its configuration from ZooKeeper. ZooKeeper also guarantees that each update is atomic, so every server's configuration is updated correctly.
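
  The notify-then-reload cycle described in Scenario 3 can be sketched as follows. This is an in-memory simulation under stated assumptions, not ZooKeeper's API: `ConfigStore`, `Server`, and the `db_host` setting are hypothetical names. The sketch assumes watches fire only once, so each server registers again whenever it re-reads the configuration.

```python
class ConfigStore:
    """In-memory sketch of a config node with change notifications."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._watchers = []

    def get(self, watcher=None):
        """Read the config; optionally register a callback for the next change."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self._value, self._version

    def set(self, value):
        """Store a new config, bump the version, and notify the watchers."""
        self._value = value
        self._version += 1
        # Watches are one-shot here: clear the list before notifying,
        # and let each server register again when it re-reads.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


class Server:
    """A server that keeps a local copy of the shared configuration."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.reload()

    def reload(self):
        # Re-read the config and re-arm the watch in the same call.
        self.config, self.version = self.store.get(watcher=self.reload)


store = ConfigStore({"db_host": "10.0.0.10"})
servers = [Server(f"web-{i}", store) for i in range(1, 5)]

store.set({"db_host": "10.0.0.20"})  # one write, every server picks it up
print([(s.name, s.config["db_host"], s.version) for s in servers])
```

  Reading and re-registering in a single call matters in this sketch: it means no change can slip in between the read and the next watch.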

  Scenario 4: Fault recovery for distributed systems. Cluster management is hard, and adding ZooKeeper to a distributed system makes it much easier. The most troublesome part of cluster management is handling node failures. ZooKeeper lets the cluster select a healthy node as the master. The master knows the running status of every server in the cluster, and when a node fails, the master notifies the other servers so that computing tasks can be redistributed among the remaining nodes. ZooKeeper itself detects that a node has gone away. With logic built on top of that, the system can also work out what kind of fault occurred, then either repair it automatically if it is repairable or report the cause to the system administrator, who can quickly locate the problem and fix the node. You may still have a question: what if the master itself fails? ZooKeeper takes this into account as well. It has a leader election algorithm, so the master can be chosen dynamically. When the master fails, a new master can be elected right away to manage the cluster.
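
  Master failover can be illustrated with a small in-memory sketch. It uses a simple "earliest joiner leads" rule, which is one common way to elect a master on top of a coordination service. It is not ZooKeeper's own internal election algorithm and not its API, and the class `Election` and the server names are hypothetical.

```python
class Election:
    """In-memory sketch: the node that joined earliest is the master."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # node name -> sequence number

    def join(self, node):
        """Add a node as a candidate with the next sequence number."""
        self._candidates[node] = self._next_seq
        self._next_seq += 1

    def fail(self, node):
        """Remove a node whose failure has been detected."""
        self._candidates.pop(node, None)

    def leader(self):
        """Return the live node with the smallest sequence number."""
        if not self._candidates:
            return None
        return min(self._candidates, key=self._candidates.get)


election = Election()
for node in ("server-1", "server-2", "server-3", "server-4"):
    election.join(node)

first = election.leader()
election.fail(first)        # the master crashes
second = election.leader()  # the next node in line takes over
election.join(first)        # the repaired node rejoins at the back of the line
print(first, second, election.leader())
```

  A node that comes back after a failure does not displace the current master, which avoids needless changes of leadership.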

  Now I want to talk about the characteristics of ZooKeeper:

ZooKeeper is a streamlined file system. In this respect it resembles Hadoop's file system, but ZooKeeper's file system manages small files, while Hadoop's manages large files.
ZooKeeper provides a rich set of building blocks from which many coordination data structures and protocols can be implemented. For example: distributed queues, distributed locks, and leader election among a group of peer nodes.
ZooKeeper is highly available and is itself very stable. A distributed cluster can rely on a ZooKeeper cluster for its management and use ZooKeeper to avoid single points of failure.
ZooKeeper uses a loosely coupled interaction model. This is most obvious when ZooKeeper provides distributed locks. ZooKeeper can serve as a rendezvous mechanism that lets participating processes discover and interact with each other without knowing anything about the other processes (or the network). The participants do not even have to exist at the same time: as long as one process leaves a message in ZooKeeper, another process can read it after the first has ended, which decouples the nodes from one another.
ZooKeeper provides a shared repository for the cluster. The cluster can read and write shared information in one central place, so each node does not have to implement the sharing logic itself, which makes distributed systems easier to develop.
ZooKeeper's design follows the observer pattern. It stores and manages the data that everyone cares about and accepts registrations from observers. When the state of that data changes, ZooKeeper notifies the observers registered for it, and they react accordingly. This is how a Master/Slave-like management model is implemented in the cluster.
  As you can see, ZooKeeper is a great help in developing distributed systems and can make them more robust and efficient.

  Not long ago I joined my department's Hadoop interest group and installed Hadoop, MapReduce, Hive and HBase in the test environment. HBase requires ZooKeeper to be installed first, so I began by installing ZooKeeper on all four servers. A colleague then pointed out that installing it on four machines gives the same result as installing it on three. The reason is that ZooKeeper can only provide service while more than half of its machines are available. More than half of three machines is two, so a three-machine cluster survives one failure. More than half of four machines is three, so a four-machine cluster also survives only one failure. Three servers therefore give the same fault tolerance as four, which is why ZooKeeper is usually installed on an odd number of servers. While learning Hadoop I found ZooKeeper the hardest sub-project to understand. The reason is not that it is technically complex, but that I was confused about what it is for. That is why my first article on Hadoop technology starts with ZooKeeper and does not discuss the technical implementation. I start from ZooKeeper's application scenarios instead, because I think that once you understand where ZooKeeper is applied, studying ZooKeeper itself becomes much more effective.
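
  The majority rule is easy to check with a few lines of arithmetic. The helper names below are hypothetical and the code only illustrates the rule. A cluster of n servers needs strictly more than half of them available, so 3 servers need 2 and 4 servers need 3, and both tolerate exactly one failure.

```python
def quorum(n):
    """Smallest number of servers that is strictly more than half of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can be down while a majority is still up."""
    return n - quorum(n)


# Print cluster size, required majority, and tolerated failures.
for n in range(3, 8):
    print(n, quorum(n), tolerated_failures(n))
```

  Going from an odd size to the next even size adds a machine without adding any fault tolerance, which is why odd sizes are the usual choice.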

  The reason I want to talk about ZooKeeper today is to supplement the distributed website framework from my last article. Although I designed the website architecture as a distributed structure and built a simple fault handling mechanism into it, namely a heartbeat mechanism, it still cannot deal with a single point of failure in the cluster. If a server breaks, clients will still try to connect to it, which blocks some requests and wastes server resources. For now, though, I do not want to modify my framework, because I feel that adding a ZooKeeper service to the existing servers would hurt the website's efficiency. It would be worth considering if we had a separate server cluster on which to deploy ZooKeeper, but server resources are too precious for that to be likely. Fortunately our department has noticed the same problem. It is going to develop a powerful remote call framework that separates out cluster management and communication management and provides them centrally as an efficient, highly available service. Once the department's remote framework is finished and our website adopts it, I think the website will be more stable and efficient.
