Taobao distributed configuration management service Diamond

In a distributed environment, many instances of the same type of service are often deployed. These instances share configuration, and configuration management services emerged to make that configuration easier to maintain. With such a service, the configuration of all these application instances can be managed in one place. The application scenario can be summarized as:

One application of ZooKeeper is distributed configuration management (see "Design and implementation of a ZooKeeper-based configuration information storage scheme"). Baidu also has a similar implementation: disconf.

Diamond is an implementation of a distributed configuration management service open sourced by Taobao. Diamond is essentially a web application written in Java, and its external interfaces are based on the HTTP protocol. When reading the code, you can start with the controller that implements each interface.

Distributed Configuration Management

The essence of distributed configuration management is basically a publish-subscribe model. The application that uses the configuration is the subscriber, and the configuration management service is the party that pushes data to it. This is summarized in the following figure:

Here the client also includes the administrator who publishes data to the configuration management service, which can be understood as adding or updating data. The configuration management service notifying subscribers of that data can be understood as pushing.

The configuration management service usually ships a client library, and the application interacts with the service through that library. In the actual implementation the client library may actively pull data, but to the application it generally looks like event notification.
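To make the "pull inside, notify outside" idea concrete, here is a minimal sketch. The class `ConfigClient`, its methods, and the `fetch` callable are hypothetical names chosen for illustration and are not Diamond's client API; a real library would run the pull on a timer thread rather than on demand.

```python
from typing import Callable, Dict, List, Optional

Listener = Callable[[str, str], None]


class ConfigClient:
    """Hypothetical client library: pulls internally, notifies listeners."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # how to pull one key from the service (assumed)
        self._listeners: Dict[str, List[Listener]] = {}
        self._last_seen: Dict[str, Optional[str]] = {}

    def subscribe(self, key: str, listener: Listener) -> None:
        """Register interest in a key; only subscribed keys are ever pulled."""
        self._listeners.setdefault(key, []).append(listener)

    def poll_once(self) -> None:
        """One pull round over the subscribed keys."""
        for key, listeners in self._listeners.items():
            value = self._fetch(key)
            if value is not None and value != self._last_seen.get(key):
                self._last_seen[key] = value
                for listener in listeners:
                    listener(key, value)  # the application only sees this event


# Usage: a dict stands in for the configuration management service.
store = {"db.url": "jdbc:mysql://10.0.0.1/app"}
received = []
client = ConfigClient(store.get)
client.subscribe("db.url", lambda k, v: received.append((k, v)))
client.poll_once()  # first pull: the listener fires
client.poll_once()  # value unchanged: no notification
store["db.url"] = "jdbc:mysql://10.0.0.2/app"
client.poll_once()  # value changed: the listener fires again
```

The application code never calls the pull itself; it only registers a listener, which is why the mechanism feels like push even though it is implemented as pull.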

The data in Diamond is a simple key-value structure. The application subscribes to data by key, and data it has not subscribed to is of course never pushed to it. Data is further divided by type into aggregated and non-aggregated. Because there may be many publishers across the distributed environment, several of them may publish data under the same key. If the data is aggregated, the data from all of these publishers is merged together; if it is non-aggregated, overwriting occurs.
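The difference between the two data types can be sketched as follows. This is only an in-memory model under stated assumptions: `ConfigStore`, `publisher_id`, and the merge rule (pieces joined line by line in publisher-id order) are illustrative and are not taken from Diamond's code.

```python
from typing import Dict, Optional


class ConfigStore:
    """Hypothetical in-memory model of aggregated vs. non-aggregated keys."""

    def __init__(self) -> None:
        self._plain: Dict[str, str] = {}
        # key -> {publisher_id: content}; publisher_id is an illustrative name
        self._aggregated: Dict[str, Dict[str, str]] = {}

    def publish(self, key: str, content: str) -> None:
        """Non-aggregated: the newest content replaces whatever was there."""
        self._plain[key] = content

    def publish_aggregated(self, key: str, publisher_id: str, content: str) -> None:
        """Aggregated: each publisher owns one piece under the same key."""
        self._aggregated.setdefault(key, {})[publisher_id] = content

    def get(self, key: str) -> Optional[str]:
        pieces = self._aggregated.get(key)
        if pieces:
            # Assumed merge rule: one piece per line, ordered by publisher id.
            return "\n".join(pieces[p] for p in sorted(pieces))
        return self._plain.get(key)


store = ConfigStore()
store.publish("timeout", "3000")
store.publish("timeout", "5000")  # overwrites "3000"
store.publish_aggregated("routes", "app-b", "b=10.0.0.2")
store.publish_aggregated("routes", "app-a", "a=10.0.0.1")
overwritten = store.get("timeout")
merged = store.get("routes")
```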

The source of the data may be manually entered through the management terminal, or it may be automatically entered by other services through the push interface of the configuration management service.

Architecture and Implementation

The Diamond service is a cluster: a cooperative cluster with no single point of failure. As shown in the figure:

The figure can be divided into the following parts:

Sync between services

Each instance of the Diamond service cluster can provide the complete service to the outside world, which means that each instance holds all of the data maintained by the cluster. Diamond guarantees this in two ways:

  • Every instance has the addresses of the other instances. When data changes on any instance, the change is first written to MySQL, and then all other instances are notified to pull the data from MySQL (DumpService::dump). This process pulls only the changed data.
  • After an instance starts, it performs a full data pull from MySQL (DumpAllProcessor) at a long interval (several hours).
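The two mechanisms above can be sketched roughly as follows. A plain dict stands in for MySQL, and `Instance`, `dump`, `dump_all`, and `write` are simplified, hypothetical counterparts of the real classes rather than Diamond's implementation.

```python
from typing import Dict, List


class Instance:
    """Hypothetical Diamond-like server; `db` stands in for MySQL."""

    def __init__(self, name: str, db: Dict[str, str]) -> None:
        self.name = name
        self._db = db
        self.local: Dict[str, str] = {}  # this instance's own copy of the data

    def dump(self, key: str) -> None:
        """Incremental pull: refresh only the key that changed."""
        if key in self._db:
            self.local[key] = self._db[key]
        else:
            self.local.pop(key, None)

    def dump_all(self) -> None:
        """Full pull: replace the local copy with everything in the database."""
        self.local = dict(self._db)


def write(db: Dict[str, str], cluster: List[Instance], key: str, value: str) -> None:
    """Persist first, then tell every instance to pull the changed key."""
    db[key] = value
    for instance in cluster:
        instance.dump(key)


db: Dict[str, str] = {}
a, b = Instance("a", db), Instance("b", db)
write(db, [a, b], "db.url", "v1")
after_notify = b.local.get("db.url")

db["cache.size"] = "128"  # simulate a change whose notification was lost
before_full = b.local.get("cache.size")
b.dump_all()  # the periodic full pull repairs the gap
after_full = b.local.get("cache.size")
```

The periodic full pull acts as a safety net: even if a notification never arrives, an instance catches up at the next full pull.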

For consistency of implementation, the instances that are notified actually include the notifying instance itself. Taking the case where the server receives a request to add aggregated data as an example, the processing flow is roughly as follows:

    DatumController::addDatum // /datum.do?method=addDatum
        PersistService::addAggrConfigInfo
        MergeDatumService::addMergeTask // Add a MergeDataTask for asynchronous processing

    MergeTaskProcessor::process
        PersistService::insertOrUpdate
            EventDispatcher.fireEvent(new ConfigDataChangeEvent(...)) // Dispatch a ConfigDataChangeEvent

    NotifyService::onEvent // Receive the event and process it
        TaskManager::addTask(..., new NotifyTask(...)) // So when data changes, a NotifyTask is eventually created

    // NotifyTask is also processed asynchronously
    NotifyTaskProcessor::process
        foreach server in serverList // includes itself
            notifyToDump // Call /notify.do?method=notifyConfigInfo to update the changed data from MySQL
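The shape of this flow (an event is fired, a task is queued, and a processor later notifies every server) can be sketched as below. The class names echo the call trace, but the bodies are hypothetical simplifications: `process_pending()` plays the role of the asynchronous worker, and the "notification" is only recorded rather than sent over HTTP.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class EventDispatcher:
    """Fans a data-change event out to registered listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def fire_event(self, changed_key: str) -> None:
        for listener in self._listeners:
            listener(changed_key)


class TaskManager:
    """Queues tasks; process_pending() stands in for the worker thread."""

    def __init__(self, processor: Callable[[str], None]) -> None:
        self._processor = processor
        self._tasks: Deque[str] = deque()

    def add_task(self, task: str) -> None:
        self._tasks.append(task)

    def process_pending(self) -> None:
        while self._tasks:
            self._processor(self._tasks.popleft())


server_list = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # includes this instance
notified: List[Tuple[str, str]] = []


def notify_all_servers(changed_key: str) -> None:
    for server in server_list:
        # A real server would issue an HTTP request here; we only record it.
        notified.append((server, changed_key))


task_manager = TaskManager(notify_all_servers)
dispatcher = EventDispatcher()
dispatcher.add_listener(task_manager.add_task)  # event -> queued notify task

dispatcher.fire_event("db.url")
queued_only = list(notified)  # nothing has been processed yet
task_manager.process_pending()
```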

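Although Diamond removes the single point of failure among its own servers, the problem is pushed down to MySQL. However, given Diamond's role as a configuration management service, its data volume is small by MySQL standards, so the availability of the whole service can be guaranteed to a reasonable degree.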

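Data consistency

Because the Diamond servers have no master, any instance can read and write data, so writes to the same key may conflict. Consistency here appears to be ensured through MySQL. For every client write request, Diamond delivers the write to MySQL and then notifies all Diamond instances in the cluster (including itself) to pull the data from MySQL. Of course, a pull may not observe every individual write, so what is provided is eventual consistency.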

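Diamond does not keep the data in memory, but it does store it in local files. Client read operations are served directly from the data in these local files.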
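Serving reads from local files could look like the minimal sketch below. `LocalFileCache` and its one-file-per-key layout are assumptions made for illustration (keys are assumed to be safe to use as file names); Diamond's actual on-disk layout is not described here.

```python
import tempfile
from pathlib import Path
from typing import Optional


class LocalFileCache:
    """Hypothetical per-key file cache: one file per key under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def _path(self, key: str) -> Path:
        return self._root / key  # assumes the key is a valid file name

    def dump(self, key: str, content: str) -> None:
        """Called after pulling from the database: write the value to disk."""
        self._path(key).write_text(content, encoding="utf-8")

    def read(self, key: str) -> Optional[str]:
        """Serve a client read straight from the local file, if present."""
        path = self._path(key)
        return path.read_text(encoding="utf-8") if path.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    cache = LocalFileCache(Path(tmp))
    cache.dump("db.url", "jdbc:mysql://10.0.0.1/app")
    hit = cache.read("db.url")
    miss = cache.read("unknown.key")
```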

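Service instance list

The list of Diamond service instances is static data: the address of each instance is stored directly on a web server. Both the Diamond servers and the clients fetch the instance list from that web server.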

On the client side, once the list has been fetched, a node is chosen at random (ServerListManager.java), and all subsequent requests are sent to that node.
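A minimal sketch of "pick one node at random, then stick with it" is shown below. `ServerPicker` is a hypothetical name and not the code in ServerListManager.java, and the addresses are made up for the example.

```python
import random
from typing import Optional, Sequence


class ServerPicker:
    """Hypothetical helper: choose one server at random and keep using it."""

    def __init__(self, servers: Sequence[str], rng: Optional[random.Random] = None) -> None:
        if not servers:
            raise ValueError("server list is empty")
        self._rng = rng or random.Random()
        # The choice is made once, when the list is first obtained.
        self._current = self._rng.choice(list(servers))

    def current(self) -> str:
        """Every later request goes to the same node."""
        return self._current


servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
picker = ServerPicker(servers)
first = picker.current()
second = picker.current()
```

Because each client picks independently, load spreads across the instances without any coordination between clients.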

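Data synchronization

The client library pulls data from the server at a fixed time interval (ClientWorker::ClientWorker, ClientWorker::checkServerConfigInfo). Only data that the application cares about is pulled. In addition, to push data promptly, Diamond also uses long polling, which is really a way of working around the limitations of the HTTP protocol. If the whole service were built on a custom TCP-based protocol, with clients holding persistent connections to the servers, these problems would not arise.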
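The idea behind long polling is that the server holds a request open until the data changes or a timeout expires, instead of answering "no change" immediately. The sketch below models this in a single process with `threading.Condition`; `LongPollServer` and its methods are hypothetical, and no HTTP is involved.

```python
import threading
from typing import Dict, Optional


class LongPollServer:
    """Hypothetical in-process model of a long-polling endpoint."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._cond = threading.Condition()

    def publish(self, key: str, value: str) -> None:
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()  # wake up any held requests

    def poll(self, key: str, known_value: Optional[str], timeout: float) -> Optional[str]:
        """Block until the value differs from known_value; None on timeout."""
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._data.get(key) != known_value, timeout=timeout
            )
            return self._data.get(key) if changed else None


server = LongPollServer()
server.publish("db.url", "v1")

# Another thread publishes a new value shortly after the client starts waiting.
threading.Timer(0.05, server.publish, args=("db.url", "v2")).start()
updated = server.poll("db.url", known_value="v1", timeout=2.0)
unchanged = server.poll("db.url", known_value="v2", timeout=0.05)
```

The first call returns as soon as the new value is published rather than after a full polling interval; the second call times out because nothing changed.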

Data changes

Many operations in Diamond check whether data has changed. Changes are identified by the MD5 value of the data.
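MD5-based change detection can be sketched as follows. `md5_of` and `ChangeDetector` are hypothetical helpers; the only real API used is `hashlib.md5` from the Python standard library.

```python
import hashlib
from typing import Dict


def md5_of(content: str) -> str:
    """Hex MD5 digest of the UTF-8 encoded content."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


class ChangeDetector:
    """Remembers the last digest per key and reports whether content changed."""

    def __init__(self) -> None:
        self._digests: Dict[str, str] = {}

    def has_changed(self, key: str, content: str) -> bool:
        digest = md5_of(content)
        if self._digests.get(key) == digest:
            return False
        self._digests[key] = digest
        return True


detector = ChangeDetector()
first_seen = detector.has_changed("db.url", "jdbc:mysql://10.0.0.1/app")
same_again = detector.has_changed("db.url", "jdbc:mysql://10.0.0.1/app")
modified = detector.has_changed("db.url", "jdbc:mysql://10.0.0.2/app")
```

Comparing fixed-length digests means a client and server can agree on whether anything changed without transferring the full value.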

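Disaster tolerance

Throughout the Diamond system, several roles keep their own caches to improve disaster tolerance, as summarized in the following figure:

When any one role fails, the client can still serve the application layer as far as possible.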
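As one illustration of this idea on the client side only, the sketch below keeps the last successfully fetched value and falls back to it when the server cannot be reached. `SnapshotClient`, the in-memory snapshot, and the use of `ConnectionError` are assumptions for the example; which caches each role actually keeps is not detailed here.

```python
from typing import Callable, Dict, Optional


class SnapshotClient:
    """Hypothetical client that falls back to its last known values."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._snapshot: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        try:
            value = self._fetch(key)
        except ConnectionError:
            # Server unreachable: serve the cached copy, if there is one.
            return self._snapshot.get(key)
        self._snapshot[key] = value
        return value


state = {"up": True}  # toggled below to simulate a server outage


def fetch_from_server(key: str) -> str:
    if not state["up"]:
        raise ConnectionError("server unreachable")
    return "jdbc:mysql://10.0.0.1/app"


client = SnapshotClient(fetch_from_server)
fresh = client.get("db.url")
state["up"] = False
cached = client.get("db.url")
never_seen = client.get("other.key")
```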

References

http://blog.csdn.net/kevinlynx/article/details/40017109
