[Hadoop in China 2011] Inside Alipay's Fur Seal: Real-Time Search Queries Are No Longer a Dream

http://storage.it168.com/a2011/1203/1283/000001283153.shtml

 

  As the nation's largest third-party payment platform, Alipay generates an enormous volume of data every day. This data matters both to individual users and to Alipay itself. Typically, individual users query their transaction history, which is stored in HBase, or CTU risk-control data items. In addition, Alipay's use of Hadoop is relatively mature, including the Dolphin one-stop resource service system built on it and the Pig-based visual self-service queries offered to users. These tools and applications form the core of Alipay's ADC architecture, and within it the service that most affects the user experience is Fur Seal real-time search.

 

 

  As the architecture diagram shows, Fur Seal real-time search, Blue Whale stream computing, Dolphin massive-scale computing, Swordfish massive data queries, Starfish distributed data mining, and the Octopus data distribution center together form the ADC architecture, which provides users with basic services for massive data. All of these are built on, or closely related to, Hadoop.

      Note: open source products to focus on going forward: mass storage: Hadoop, HBase, Greenplum, DFS; massive computing: Hive, Pig, Mahout.

               Hadoop, HBase, Hive, Pig, Mahout, and Solr are the main focus.

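   As the figure above shows, Fur Seal real-time search over massive data is built on the integration of HBase and Solr, giving users real-time queries and full-text search over data at the scale of hundreds of billions of records. This is a top priority in Alipay's business and one of its basic services, so query results must be returned in real time. Faced with such a volume of data, how did Alipay's technical team do it? Below, Alipay architect Jiang Jie reveals the technology inside Fur Seal real-time search.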

Alipay architect Jiang Jie (alias Pingyuanjun)

Topology diagram of how Fur Seal real-time search works

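   The Fur Seal system (ARSC, Alipay Realtime Search Cluster) is Alipay's real-time search cluster platform. It was built through secondary development on open source technologies including Hadoop, HBase, ZooKeeper, Solr, and Zoie. It was created to solve several problems: existing databases cannot support search or full-text search over massive data; databases have trouble with dynamic schema extension; HBase cannot support multi-dimensional retrieval; and ordinary search engines cannot make updated data searchable in real time.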
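To make the multi-dimensional retrieval problem concrete, here is a minimal in-memory sketch of the general pattern of pairing a row-key store with a field index. It is an illustration only, not ARSC's actual code: `RowStore` stands in for an HBase table, `FieldIndex` stands in for a Solr index, and every class, function, and field name is hypothetical.

```python
from collections import defaultdict


class RowStore:
    """Hypothetical stand-in for an HBase table: lookup by row key only."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, record):
        self._rows[row_key] = dict(record)

    def get(self, row_key):
        return self._rows.get(row_key)


class FieldIndex:
    """Hypothetical stand-in for a Solr index: (field, value) -> row keys."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, record):
        for field, value in record.items():
            self._postings[(field, value)].add(row_key)

    def search(self, **criteria):
        # Intersect the posting sets of every requested field=value pair.
        sets = [self._postings.get(pair, set()) for pair in criteria.items()]
        return set.intersection(*sets) if sets else set()


def write(store, index, row_key, record):
    """Write the record to the store and index all of its fields."""
    store.put(row_key, record)
    index.add(row_key, record)


def query(store, index, **criteria):
    """Resolve field criteria to row keys, then fetch the full records."""
    return [store.get(key) for key in sorted(index.search(**criteria))]


store, index = RowStore(), FieldIndex()
write(store, index, "t1", {"buyer": "alice", "status": "paid"})
write(store, index, "t2", {"buyer": "alice", "status": "refunded"})
write(store, index, "t3", {"buyer": "bob", "status": "paid"})
results = query(store, index, buyer="alice", status="paid")
```

The store alone can only answer "give me row t1"; the index is what lets a query combine several fields. The sketch ignores updates to existing rows, which would leave stale postings behind.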

Fur Seal system logical architecture

Fur Seal cluster architecture

▲ Fur Seal functional modules

▲ Fur Seal node diagram

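   The main role of an ARSC Node is to receive client input efficiently and to synchronize HBase Record data into the Solr index (acting as a WAL), while also buffering bursts of highly concurrent data. It works as follows: the client queries ZK; ZK obtains the list of ARSC Nodes and returns it to the client; an ARSC Node receives the client's CRUD request, persists the data to the MQ-shard table through the MQ Handler module, and writes it to the MQ in-memory cache through the same module; if the in-memory cache is full, it starts writing to the local disk; finally, it returns a response to the client.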
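The write path described above can be sketched roughly as follows. This is a toy model under stated assumptions, not ARSC code: a Python list stands in for the MQ-shard table, a deque for the MQ in-memory cache, a dict for the ZK registry, and the names `ArscNodeSketch`, `handle`, and `mq-spill.jsonl` are all hypothetical. The article does not say how a client chooses among the nodes ZK returns, so the sketch simply takes the first one.

```python
import json
import os
import tempfile
from collections import deque


class ArscNodeSketch:
    """Toy model of the ARSC Node write path (all names are hypothetical)."""

    def __init__(self, cache_capacity, spill_dir):
        self.durable_log = []   # stand-in for the MQ-shard table
        self.cache = deque()    # stand-in for the MQ in-memory cache
        self.cache_capacity = cache_capacity
        self.spill_path = os.path.join(spill_dir, "mq-spill.jsonl")

    def handle(self, op, row_key, record=None):
        message = {"op": op, "row_key": row_key, "record": record}
        # 1. Persist first, so the message is not lost (the WAL role).
        self.durable_log.append(message)
        # 2. Buffer in memory; 3. if the cache is full, write to local disk.
        if len(self.cache) < self.cache_capacity:
            self.cache.append(message)
            where = "memory"
        else:
            with open(self.spill_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(message) + "\n")
            where = "disk"
        # 4. Return to the client.
        return {"ok": True, "buffered_in": where}


with tempfile.TemporaryDirectory() as spill_dir:
    # Stand-in for the node list that the client obtains from ZK.
    registry = {"arsc_nodes": [ArscNodeSketch(2, spill_dir)]}
    node = registry["arsc_nodes"][0]  # selection policy is an assumption
    acks = [node.handle("create", f"t{i}", {"amount": i}) for i in range(3)]
    with open(node.spill_path, encoding="utf-8") as f:
        spilled = [json.loads(line) for line in f]
```

With a cache capacity of 2, the first two requests are buffered in memory and the third spills to disk, while all three are recorded in the durable log before the client is acknowledged.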

 

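  The main role of a Solr Node is to receive the data sent by ARSC Nodes and build a real-time index from it in order to provide real-time search. Its workflow is as follows: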

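  Solr Core receives the data pushed from the MQ and saves it into in-memory index A (B is empty). In-memory index A is updated immediately after each document is added, which guarantees real-time behavior, and in-memory index A and the on-disk index (Disk) serve searches together. When the number of documents in A reaches a certain threshold, A must be merged with the on-disk index. At that point in-memory index B is created, and every document added during the merge goes into B. A, B, and the Disk index serve searches together (note: documents in A are not indexed twice, which guarantees index consistency). Once A and the Disk index have been merged, the original index A becomes null, B is renamed A, and the Disk index is reopened to serve searches (Disk index = A + old Disk index).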
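The A/B swap can be illustrated with a small sketch. It is a simplified model under stated assumptions, not the Zoie or ARSC implementation: plain dicts stand in for the two in-memory indexes and the disk index, the merge is split into explicit `begin_merge` and `finish_merge` steps so that writes arriving mid-merge can be shown, and all names are hypothetical.

```python
class RealtimeIndexSketch:
    """Toy model of the A/B in-memory index swap (names are hypothetical)."""

    def __init__(self, merge_threshold):
        self.merge_threshold = merge_threshold
        self.mem_a = {}     # in-memory index A: doc_id -> text
        self.mem_b = None   # in-memory index B: exists only during a merge
        self.disk = {}      # stand-in for the on-disk index

    def add(self, doc_id, text):
        # During a merge new documents go to B; otherwise they go to A.
        target = self.mem_b if self.mem_b is not None else self.mem_a
        target[doc_id] = text

    def needs_merge(self):
        return self.mem_b is None and len(self.mem_a) >= self.merge_threshold

    def begin_merge(self):
        self.mem_b = {}

    def finish_merge(self):
        if self.mem_b is None:
            raise RuntimeError("no merge in progress")
        merged = dict(self.disk)
        merged.update(self.mem_a)
        self.disk = merged                         # Disk = A + old Disk
        self.mem_a, self.mem_b = self.mem_b, None  # A dropped, B renamed A

    def search(self, term):
        # A, B, and Disk serve searches together; a set removes duplicates.
        hits = set()
        for segment in (self.disk, self.mem_a, self.mem_b or {}):
            hits.update(d for d, text in segment.items() if term in text.split())
        return sorted(hits)


idx = RealtimeIndexSketch(merge_threshold=2)
idx.add(1, "alipay payment")
idx.add(2, "refund payment")
if idx.needs_merge():
    idx.begin_merge()
idx.add(3, "payment pending")       # arrives mid-merge, lands in B
during = idx.search("payment")
idx.finish_merge()
after = idx.search("payment")
```

Search results are the same during and after the merge, which is the point of the design: a document is always visible in exactly one of A, B, or Disk, so the merge never creates a window in which recent writes disappear.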

 

  The advantage of Fur Seal real-time search is that it makes data searchable as soon as it is updated. It provides real-time multi-dimensional search across numeric-value search, enumeration search, full-text search, and other types, and it also supports asynchronous queries and batch SQL-like queries. It is also very flexible to scale: performance can be extended dynamically, capacity scales linearly, and the system offers dynamic load balancing and dynamic schema extension.

Reproduced from: https://www.cnblogs.com/licheng/archive/2011/12/05/2276397.html
