ZooKeeper: Principles and Introduction

ZooKeeper Overview

What is ZooKeeper

  • ZooKeeper is an open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and an important component of the big data ecosystem. It acts as a cluster manager: it monitors the state of each node in the cluster and decides the next operation based on the feedback the nodes submit. In the end it gives users an easy-to-use interface and a system that is efficient, functionally sound, and stable.

  • It is middleware that provides consistent coordination services for distributed applications

 What ZooKeeper provides

  • File system
    • ZooKeeper provides a multi-level namespace of nodes (each node is called a znode). Unlike a file system, where only file nodes can hold data and directory nodes cannot, every znode can have data associated with it. To ensure low latency and high throughput, ZooKeeper keeps this tree structure in memory, which makes it unsuitable for storing large amounts of data: each node can store at most 1 MB.
  • Notification mechanism
    • A client registers a watcher on a znode. When the znode changes, ZooKeeper notifies the client, and the client can adjust its behavior according to the change.
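
The two features above can be illustrated with a minimal in-memory sketch. It is not the ZooKeeper client API: the class `ZNodeTree` and its methods are hypothetical names chosen for this example, and the watcher is modeled as one-shot (it fires once and must be registered again), which is how standard ZooKeeper watches behave.

```python
class ZNodeTree:
    """In-memory stand-in for a znode tree (hypothetical, not the ZooKeeper API)."""

    MAX_DATA_BYTES = 1024 * 1024  # the 1 MB per-node limit described above

    def __init__(self):
        self._data = {"/": b""}  # path -> bytes stored at that znode
        self._watchers = {}      # path -> callbacks waiting for the next change

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent!r} does not exist")
        if path in self._data:
            raise KeyError(f"{path!r} already exists")
        self._check_size(data)
        self._data[path] = data

    def get(self, path, watcher=None):
        """Return the node's data; optionally register a one-shot watcher."""
        data = self._data[path]
        if watcher is not None:
            self._watchers.setdefault(path, []).append(watcher)
        return data

    def set(self, path, data):
        if path not in self._data:
            raise KeyError(f"{path!r} does not exist")
        self._check_size(data)
        self._data[path] = data
        # Fire each watcher once, then forget it; clients re-register if needed.
        for callback in self._watchers.pop(path, []):
            callback(path)

    def _check_size(self, data):
        if len(data) > self.MAX_DATA_BYTES:
            raise ValueError("znode data exceeds the per-node limit")


tree = ZNodeTree()
tree.create("/config", b"shared settings")   # a parent node can hold data too
tree.create("/config/redis", b"host=a")
changed = []
tree.get("/config/redis", watcher=changed.append)
tree.set("/config/redis", b"host=b")         # notifies the watcher
tree.set("/config/redis", b"host=c")         # no watcher left, so no notification
```

The callback receives only the path, so a client that wants the new value reads the node again and, if it still cares about later changes, registers a new watcher in the same call.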

 What is a distributed system

  • Many computers form a single whole that presents one consistent face to the outside and handles the same requests

  • The computers can communicate with each other internally (REST / RPC)

  • A request from a client, and the server's response to it, pass through multiple computers

  • Exhibit 1

  • Exhibit 2

Problems in distributed systems

  • Dynamic service registration and discovery. To support high concurrency, OrderService is deployed as four instances. Each client maintains a list of service providers, but the list is static (hard-coded in a configuration file). If the providers change, for example some machines go down or new OrderService instances are added, the client does not know. To get the latest list of provider URLs, the configuration file must be updated manually, which is very inconvenient.

    • Problem: the client and the service providers are tightly coupled

    • Solution: decouple them by adding a middle layer, a registration center (registry), which stores the names of the available services and their URLs. Services first register themselves with the registry. When a client queries, it only needs to give a name, and the registry returns a URL. Before accessing a service, every client asks the registry for the latest address.

    • The registry can be a tree structure: each service has several child nodes, and each child node represents one instance of the service.

    • The registry establishes a session directly with each service instance and requires the instance to send heartbeats periodically. If no heartbeat arrives within a given time, the instance is considered down and is removed.
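
The registry described above can be sketched in a few lines. This is a simplified illustration rather than how ZooKeeper implements sessions: the `Registry` class, the 10-second timeout, and the example URLs are assumptions made for the sketch, and time is passed in explicitly so the behavior is deterministic.

```python
class Registry:
    """Toy service registry with heartbeat expiry (hypothetical, for illustration)."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout   # seconds without a heartbeat before removal
        self._last_seen = {}     # (service name, url) -> time of last heartbeat

    def register(self, service, url, now):
        self._last_seen[(service, url)] = now

    def heartbeat(self, service, url, now):
        # Only registered instances can refresh their session.
        if (service, url) in self._last_seen:
            self._last_seen[(service, url)] = now

    def lookup(self, service, now):
        """Return the URLs of all instances still considered alive."""
        self._expire(now)
        return sorted(url for (name, url) in self._last_seen if name == service)

    def _expire(self, now):
        dead = [key for key, seen in self._last_seen.items()
                if now - seen > self.timeout]
        for key in dead:
            del self._last_seen[key]


registry = Registry(timeout=10.0)
registry.register("OrderService", "http://10.0.0.1:8080", now=0.0)
registry.register("OrderService", "http://10.0.0.2:8080", now=0.0)
registry.heartbeat("OrderService", "http://10.0.0.1:8080", now=8.0)
alive = registry.lookup("OrderService", now=12.0)  # only the first instance remains
```

Clients ask by name and never hard-code provider URLs, which is the decoupling the solution above calls for.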

  • Job coordination

    • Three jobs with the same function are deployed on three different machines, but only one may run at a time. If the running job goes down, a new master must be elected from the remaining two to continue the work

    • So the three jobs need to coordinate with each other

      • Use a shared database table. Relying on the fact that the database rejects primary key conflicts, let the three jobs insert the same row into the table; whoever succeeds is the master. The disadvantage is that if the job that won mastership goes down, its record remains forever and the other jobs can never insert theirs. A periodic refresh mechanism must be added.

      • After starting, each job registers with the registry, that is, it creates a node in the tree; whoever succeeds is the master (the registry must guarantee that the node can be created successfully only once).

      • Then, if the node is deleted, a new round of competition begins.
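
The "create once, winner is master" idea can be sketched as follows. The names `ElectionNode` and `elect` are hypothetical and the node lives in local memory; a real registry would have to enforce the create-once guarantee across machines.

```python
class ElectionNode:
    """A node that can be created only once (hypothetical stand-in for the registry)."""

    def __init__(self):
        self.owner = None  # id of the job that created the node, or None

    def try_create(self, job_id):
        """Return True only for the first job to create the node."""
        if self.owner is None:
            self.owner = job_id
            return True
        return False

    def delete(self):
        """Remove the node, as happens when the master goes down."""
        self.owner = None


def elect(node, job_ids):
    """Let every job try to create the node and return the resulting master."""
    for job_id in job_ids:
        node.try_create(job_id)
    return node.owner


node = ElectionNode()
master = elect(node, ["job-1", "job-2", "job-3"])  # job-1 creates the node first
node.delete()                                      # the master goes down
new_master = elect(node, ["job-2", "job-3"])       # the remaining jobs compete again
```

Unlike the database-table approach, a node that disappears together with its owner leaves no stale record behind, so no periodic refresh is needed to unblock the other jobs.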

  • Distributed lock: different systems running on multiple machines operate on the same resource

    • Use the master election approach: let everyone compete, and whoever wins creates a /distribute_lock node, deletes it after finishing, and lets everyone compete again. The disadvantage is that one system may win repeatedly, which is not fair enough.

    • Have each system create a child node under /distribute_lock in the registry and number the children. Each system checks its own number, and the one with the smallest number is considered to hold the lock. For example, with child nodes process_01, process_02, and process_03, system 1 holds the lock

    • After system 1 finishes its operation, it deletes process_01 and creates a new node, process_04. process_02 is now the smallest, so system 2 is considered to hold the lock.

    • After system 2 finishes, it likewise deletes process_02 and creates a new node. Now process_03 is the smallest, so system 3 can hold the lock.
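
The numbered-node scheme above can be sketched like this. `LockQueue` is a hypothetical in-memory model, not a ZooKeeper recipe implementation; it only shows how "smallest number holds the lock" gives each system its turn.

```python
import itertools


class LockQueue:
    """Numbered child nodes under a lock path (hypothetical in-memory model)."""

    def __init__(self, prefix="process_"):
        self._prefix = prefix
        self._counter = itertools.count(1)  # numbers only ever increase
        self._nodes = {}                    # sequence number -> owning system

    def enqueue(self, owner):
        """Create the next numbered node for `owner` and return its name."""
        seq = next(self._counter)
        self._nodes[seq] = owner
        return f"{self._prefix}{seq:02d}"

    def holder(self):
        """The system whose node has the smallest number holds the lock."""
        if not self._nodes:
            return None
        return self._nodes[min(self._nodes)]

    def release(self, node_name):
        """Delete a node, for example when its owner has finished its work."""
        del self._nodes[int(node_name[len(self._prefix):])]


queue = LockQueue()
n1 = queue.enqueue("system-1")   # process_01
n2 = queue.enqueue("system-2")   # process_02
n3 = queue.enqueue("system-3")   # process_03
first = queue.holder()           # system-1 has the smallest number
queue.release(n1)
n4 = queue.enqueue("system-1")   # system-1 rejoins at the back as process_04
second = queue.holder()          # now system-2 holds the lock
```

Because a system that releases the lock rejoins with a larger number, it cannot win again before the systems already waiting, which addresses the fairness problem of the plain election approach.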

  • Registry availability

    • If the registry is a single machine, the entire system goes down when it fails, so multiple machines are needed to ensure high availability. This raises new problems: the tree structure must be synchronized across the machines, communication timeouts must be handled, and strong consistency of the tree structure across the machines must be guaranteed.

What ZooKeeper is used for

  • Master election: after the master node goes down, a slave node takes over its work, and ZooKeeper guarantees that this node is the only master. This is the so-called leader mode, which keeps the cluster highly available
  • Unified configuration management: the configuration only needs to be deployed to a single server, and the same configuration file is then synchronized to all other servers. This is used especially often in cloud computing (for example, changing a Redis configuration uniformly)
  • Data publish and subscribe, similar to a message queue (MQ)
  • Distributed locks: in a distributed environment, different processes compete for resources, similar to locks among multiple processes
  • Cluster management: ensures strong consistency of the data in the cluster

ZooKeeper's characteristics

  • Consistency: data is consistent, and data is stored in order and in batches
  • Atomicity: a transaction either succeeds or fails as a whole
  • Single view: whichever node in the cluster a client connects to, it sees the same data
  • Reliability: the state of every ZooKeeper operation is saved on the server
  • Timeliness: a client's view of the data on the ZooKeeper server is up to date within a bounded time

Reposted from: https://www.cnblogs.com/xinyonghu/p/11031729.html

Origin www.cnblogs.com/diandianquanquan/p/12555047.html