ZooKeeper: Principles and Introduction

About ZooKeeper

1.1 What is ZooKeeper

  • ZooKeeper is a distributed, open-source coordination service for distributed applications. It is often described as an open-source implementation of Google's Chubby and is an important component of the big data ecosystem. It acts as a cluster manager: it monitors the state of every node in the cluster and decides on the next appropriate action based on the feedback the nodes submit. The result offered to users is a simple, easy-to-use interface on top of a system that is efficient, functionally robust, and stable.

  • It is middleware that provides consistent coordination services for distributed applications.

1.2 What ZooKeeper provides

  • File system
    • ZooKeeper provides a hierarchical namespace of nodes (each node is called a znode). Unlike a file system, every znode can have data associated with it, whereas in a file system only file nodes store data and directory nodes cannot. To ensure low latency and high throughput, ZooKeeper keeps this tree structure in memory. As a consequence it cannot be used to store large amounts of data: each znode is limited to 1 MB of data.
  • Notification mechanism
    • A client registers a watcher on a znode. When the znode changes, ZooKeeper notifies the client, and the client can then adjust its behavior according to the change.
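
The two ideas above (a tree of data-carrying nodes, plus change notification) can be illustrated with a small in-memory sketch. This is a toy model, not the ZooKeeper client API: the class `ZNodeTree`, its methods, and the `/config/redis` path are all hypothetical names chosen for illustration. It assumes one-shot watchers, which is how classic ZooKeeper watches behave: a watcher fires once and must be registered again to see later changes.

```python
from typing import Callable, Dict, List

MAX_DATA_BYTES = 1024 * 1024  # the 1 MB per-znode limit described in the text


class ZNodeTree:
    """Toy in-memory znode tree with one-shot watchers (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {"/": b""}
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}

    @staticmethod
    def _check_size(data: bytes) -> None:
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("znode data exceeds 1 MB")

    def create(self, path: str, data: bytes = b"") -> None:
        # A node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._data:
            raise KeyError(f"{path} already exists")
        self._check_size(data)
        self._data[path] = data

    def get(self, path: str, watcher: Callable[[str], None] = None) -> bytes:
        # Reading a node may optionally leave a watcher on it.
        if watcher is not None:
            self._watchers.setdefault(path, []).append(watcher)
        return self._data[path]

    def set(self, path: str, data: bytes) -> None:
        if path not in self._data:
            raise KeyError(f"{path} does not exist")
        self._check_size(data)
        self._data[path] = data
        # Watchers are removed before firing, so each one fires at most once.
        for notify in self._watchers.pop(path, []):
            notify(path)


tree = ZNodeTree()
tree.create("/config")
tree.create("/config/redis", b"maxmemory=1gb")

seen = []


def on_change(path: str) -> None:
    # The notification only says which node changed; the client re-reads it.
    seen.append((path, tree.get(path)))


tree.get("/config/redis", watcher=on_change)
tree.set("/config/redis", b"maxmemory=2gb")  # fires on_change
tree.set("/config/redis", b"maxmemory=4gb")  # no watcher left, no notification
print(seen)  # [('/config/redis', b'maxmemory=2gb')]
```

Note that the second update goes unnoticed because the watcher was not registered again; a client that wants continuous notifications would re-register inside its callback.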

1.3 What is a distributed system

  • Many computers form a single whole that presents a unified front and handles the same requests together
  • The computers communicate with each other internally (REST / RPC)
  • A request from a client may pass through several computers before the server side returns a response


1.4 Problems in distributed systems

  • Dynamic service registration and discovery. To support high concurrency, OrderService is deployed as four instances. Each client maintains a list of service providers, but this list is static (hard-coded in a configuration file). If the providers change, for example some machines go down or a new OrderService instance is added, the client has no way of knowing. To get the latest list of provider URLs, the configuration file must be updated by hand, which is very inconvenient.

    • Problem: clients and service providers are tightly coupled.

    • Solution: decouple them by adding a middle layer, a registry, which stores the names of the available services and their URLs. Services first register themselves with the registry. When a client queries, it only supplies a name and the registry returns a URL. Before accessing a service, every client must ask the registry for the latest address.

    • The registry can be a tree structure: each service has several child nodes, and each child node represents one instance of the service.

    • The registry establishes a session directly with each service instance and requires the instance to send heartbeats periodically. If no heartbeat arrives within a given time, the instance is considered dead and is removed.
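
A minimal sketch of such a registry follows, assuming an in-memory map from service name to instance URLs and a fixed heartbeat timeout. The class `Registry`, the 10-second timeout, and the `http://10.0.0.x:8080` addresses are hypothetical. Time is passed in explicitly so the example is deterministic; a real registry would use a clock and a background sweep.

```python
from typing import Dict, List


class Registry:
    """Toy service registry: service name -> {instance URL: time of last heartbeat}."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self._services: Dict[str, Dict[str, float]] = {}

    def register(self, name: str, url: str, now: float) -> None:
        # Registering counts as the first heartbeat.
        self._services.setdefault(name, {})[url] = now

    def heartbeat(self, name: str, url: str, now: float) -> None:
        # Heartbeats from unknown instances are ignored.
        if url in self._services.get(name, {}):
            self._services[name][url] = now

    def lookup(self, name: str, now: float) -> List[str]:
        # Drop instances whose last heartbeat is older than the timeout,
        # then return the URLs that are still considered alive.
        instances = self._services.get(name, {})
        expired = [u for u, last in instances.items() if now - last > self.timeout]
        for url in expired:
            del instances[url]
        return sorted(instances)


registry = Registry(timeout=10.0)

# Four OrderService instances register at t=0 (hypothetical addresses).
for i in range(1, 5):
    registry.register("OrderService", f"http://10.0.0.{i}:8080", now=0.0)

# Instances 1 to 3 keep sending heartbeats; instance 4 has gone down.
for i in range(1, 4):
    registry.heartbeat("OrderService", f"http://10.0.0.{i}:8080", now=8.0)

# At t=12 the client asks for the current list and gets only the live instances.
print(registry.lookup("OrderService", now=12.0))
```

The client never stores the list itself; it asks by name each time, which is what removes the tight coupling described above.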

  • Job coordination

    • Three jobs with the same function are deployed on three different machines, and only one of them may run at a time. If that one goes down, a new Master must be elected from the remaining two to continue the work.

    • So these three jobs need to coordinate with each other.

      • Use a shared database table. We know that database primary keys cannot conflict, so let the three jobs insert the same row into the table: whichever succeeds is the Master. The drawback is that if the job that won the Master role dies, the record stays forever and the other jobs can never insert. A periodic update mechanism must be added on top.

      • Have each job register with the registry after it starts, that is, create a tree node. Whichever succeeds is the Master (the registry must guarantee that the creation can succeed only once).

      • This way, if the node is deleted, a new round of competition begins.
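
The registry-based election can be sketched as follows. This is an in-memory illustration only: `ElectionRegistry`, `elect`, the `/master` path, and the job names are hypothetical. It assumes the registry removes a node when its owner goes away, which in ZooKeeper corresponds to an ephemeral znode that disappears when its creator's session ends.

```python
from typing import Dict, List, Optional


class ElectionRegistry:
    """Toy registry in which each path can be created successfully only once."""

    def __init__(self) -> None:
        self._nodes: Dict[str, str] = {}  # path -> owner

    def try_create(self, path: str, owner: str) -> bool:
        # Only the first creator wins; everyone else gets False.
        if path in self._nodes:
            return False
        self._nodes[path] = owner
        return True

    def remove_owner(self, owner: str) -> None:
        # Models the owner crashing: all of its nodes are deleted.
        for path in [p for p, o in self._nodes.items() if o == owner]:
            del self._nodes[path]

    def owner_of(self, path: str) -> Optional[str]:
        return self._nodes.get(path)


def elect(registry: ElectionRegistry, jobs: List[str], path: str = "/master") -> Optional[str]:
    """Every job tries to create the same node; whoever succeeds is the Master."""
    for job in jobs:
        registry.try_create(path, job)
    return registry.owner_of(path)


registry = ElectionRegistry()
jobs = ["job-a", "job-b", "job-c"]

master = elect(registry, jobs)
print(master)  # job-a

# The Master goes down, its node is deleted, and the survivors compete again.
registry.remove_owner(master)
survivors = [j for j in jobs if j != master]
new_master = elect(registry, survivors)
print(new_master)  # job-b
```

Compared with the shared database table, the stale record problem disappears because the node is removed together with its owner.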

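  • Distributed locks: different systems running on multiple machines operate on the same resource

    • Use the Master election approach: everyone competes, and whoever wins creates a /distribute_lock node, deletes it after it has finished reading, and lets everyone compete again. The drawback is that one system may win many times in a row, which is not fair.

    • Have each system create a child node under /distribute_lock in the registry, and number the child nodes. Each system checks its own number, and whoever has the smallest number is considered to hold the lock. For example, with child nodes process_01, process_02, and process_03, system 1 holds the lock.

    • When system 1 has finished its operation, it deletes process_01 and creates a new node, process_04. Now process_02 is the smallest, so system 2 is considered to hold the lock.

    • When system 2 has finished, it likewise deletes process_02 and creates a new node. At that point process_03 is the smallest and can hold the lock.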
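
The numbered-node scheme can be sketched as a small queue. Again this is an in-memory illustration under stated assumptions: `LockQueue` and its methods are hypothetical names, and the sequence numbers are handed out by a simple counter standing in for the registry. The node names follow the `process_NN` pattern from the text.

```python
from typing import Dict, Optional


class LockQueue:
    """Toy model of numbered child nodes under /distribute_lock."""

    def __init__(self) -> None:
        self._next_seq = 1
        self._children: Dict[int, str] = {}  # sequence number -> system name

    def enqueue(self, system: str) -> int:
        # The registry assigns a strictly increasing number to each new child.
        seq = self._next_seq
        self._next_seq += 1
        self._children[seq] = system
        return seq

    def holder(self) -> Optional[str]:
        # The system owning the smallest number holds the lock.
        if not self._children:
            return None
        return self._children[min(self._children)]

    def release(self, seq: int) -> None:
        del self._children[seq]

    @staticmethod
    def node_name(seq: int) -> str:
        return f"process_{seq:02d}"


queue = LockQueue()
seq1 = queue.enqueue("system 1")  # process_01
seq2 = queue.enqueue("system 2")  # process_02
seq3 = queue.enqueue("system 3")  # process_03

order = [queue.holder()]          # system 1 holds the lock

queue.release(seq1)               # system 1 is done and deletes process_01
seq4 = queue.enqueue("system 1")  # and queues up again as process_04
order.append(queue.holder())      # system 2 holds the lock

queue.release(seq2)
queue.enqueue("system 2")
order.append(queue.holder())      # system 3 holds the lock

print(order)  # ['system 1', 'system 2', 'system 3']
```

Because a system that releases the lock rejoins at the back of the queue, no system can take the lock twice in a row while others are waiting, which addresses the fairness problem of the election-style lock.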

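  • High availability of the registry

    • If the registry runs on a single machine, the whole system goes down as soon as that machine fails. Multiple machines are therefore needed to ensure high availability. This raises new problems: the tree structure has to be synchronized across machines, so what happens when communication times out, and how is strong consistency of the tree structure across machines guaranteed?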

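1.5 What ZooKeeper is used for

  • Master election: when the master node goes down, a slave node takes over, and it is guaranteed that this node is unique. This is the so-called leader pattern, and it keeps the cluster highly available
  • Unified configuration file management: a configuration file only needs to be deployed to one server, and the same file is then synchronized to all other servers. This is used very often in cloud computing (for example, after changing a shared redis configuration)
  • Data publish and subscribe, similar to a message queue (MQ)
  • Distributed locks: different processes in a distributed environment compete for resources, similar to locks among multiple processes
  • Cluster management: guarantees strong consistency of data across the cluster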

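1.6 ZooKeeper's characteristics

  • Consistency: data consistency; data is written in order, in batches
  • Atomicity: a transaction either succeeds completely or fails completely
  • Single view: whichever zk node in the cluster a client connects to, it sees the same data
  • Reliability: the state of every operation on zk is saved on the server side
  • Timeliness: a client's view of the zk server data is guaranteed to be up to date within a bounded time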

Origin www.cnblogs.com/xinyonghu/p/11031729.html