500 Million Daily Queries: Why Does JD Daojia's Order Center Pair MySQL with Elasticsearch?

Reading time: about 8 minutes.
Source: Jingdong technology subscription account (ID: jingdongjishu)
Author: Zhang sir
 
The order center is a core business system of JD Daojia. Whether from external merchant- and order-facing business or from internal downstream systems, the volume of order query calls is very large, making order data heavily read and lightly written.

We store order data in MySQL, but it is clearly not advisable to serve such a large query volume from the DB alone. Moreover, MySQL's support for some complex queries is not friendly enough, so the order center system uses Elasticsearch to carry the main pressure of order queries.

As a powerful distributed search engine, Elasticsearch supports near-real-time storage and search of data, and it plays a huge role in JD Daojia's order system. The order center's ES clusters currently store one billion documents and handle an average of 500 million queries per day.

With the rapid growth of JD Daojia's business in recent years, the order center's ES deployment scheme has kept evolving. Today it has developed into a pair of real-time, mutually backing clusters, which well protects the read/write stability of ES. Below we share some of that history and the pitfalls encountered along the way.

The evolution of the ES cluster architecture

1. The initial stage

In the initial stage, the order center's ES was like a blank sheet of paper: there was basically no deployment plan, and many settings were left at cluster defaults. The whole cluster was deployed on the group's elastic cloud, and the ES cluster's nodes and machine placement were rather chaotic. Viewed at the cluster level, the ES cluster also had a single-point-of-failure problem, which was obviously unacceptable for the order center business.

2. The cluster isolation stage

Like many other businesses, the ES cluster was deployed in a co-located (mixed) manner. But since the order center's ES stores online order data, other systems on the co-located cluster would occasionally preempt large amounts of resources, causing the order center's ES service to malfunction.

Obviously, any impact on the stability of order queries is intolerable. So first, the cluster nodes with heavy resource contention were moved out of the elastic cloud hosting the order center's ES, and the ES cluster's condition improved slightly. But as cluster data kept growing, the elastic cloud configuration could no longer satisfy the ES cluster. To achieve complete physical isolation, the order center's ES cluster was finally deployed onto high-spec physical machines, and ES cluster performance improved.

3. The node and replica tuning stage

ES performance is strongly tied to hardware resources. When the ES cluster was deployed on dedicated physical machines, a single cluster node did not occupy a whole machine's resources, so nodes running on the same physical machine still contended for resources. In this situation, to let a single ES node use machine resources to the greatest extent, each ES node was deployed on its own physical machine.

But then the question arises again: what if a single node becomes the bottleneck? How should we optimize further?

By ES's query principle, when a request arrives for a given shard number and no specific copy is designated (via the preference parameter), the request is load-balanced across the nodes holding the copies of that shard. The cluster's default replica configuration is one primary plus one replica. For this situation we considered expanding the replicas: changing the default one-primary-one-replica to one-primary-two-replicas, while adding the corresponding physical machines.

(Figure: schematic of the order center's ES cluster setup)

As shown, the whole setup load-balances external requests through a VIP:

The cluster has one set of primary shards and two sets of replica shards (one primary, two replicas). Requests forwarded from the gateway nodes are balanced by round-robin across the nodes holding the data before landing on one. Adding replicas and expanding machines in this way increases the cluster's throughput and thereby improves its overall query performance.
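The round-robin balancing described above can be sketched as follows. This is a hypothetical, minimal model, not JD's actual gateway code; the node names and the two-shard layout are invented for illustration:

```python
from itertools import cycle

# Hypothetical copy layout: each shard number maps to the nodes holding
# its primary and two replicas (the one-primary-two-replicas setup).
shard_copies = {
    0: ["node-a", "node-b", "node-c"],
    1: ["node-d", "node-e", "node-f"],
}

# One round-robin iterator per shard, mimicking how a coordinating node
# balances requests across copies when no `preference` is set.
_balancers = {shard: cycle(nodes) for shard, nodes in shard_copies.items()}

def pick_copy(shard: int) -> str:
    """Return the next copy (node) of the shard to serve a query."""
    return next(_balancers[shard])
```

With three copies per shard, each node sees only a third of that shard's query load, which is exactly why adding a replica set raised throughput.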

(Figure: the order center ES cluster's performance at each stage, visually showing the significant improvement after each round of optimization.)

Of course, the number of shards and replicas cannot be increased without limit. At this stage we explored further how to select an appropriate shard count. The shard count can be understood like MySQL's database and table sharding; the order center's ES queries currently fall into two categories: single-ID queries and paged queries.

 

The more shards, the greater the cluster's capacity for horizontal scaling, and routing a query by order ID to a specific shard can greatly improve performance; but the aggregation step of paged queries degrades. The fewer the shards, the smaller the cluster's horizontal scalability and the worse single-ID query performance, but paged query performance improves.
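Single-ID routing works because ES maps a routing value deterministically to one shard, in the spirit of `shard = hash(routing) % number_of_shards`. A minimal sketch, with crc32 standing in for ES's actual murmur3 hash and an illustrative shard count of 6:

```python
import zlib

NUM_SHARDS = 6  # illustrative; the real count was chosen by stress testing

def route_shard(order_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value (here the order ID) to a shard number.

    Real ES hashes the routing value with murmur3; crc32 stands in here
    only to show that the same ID always lands on the same shard, so a
    single-ID query touches exactly one shard instead of all of them.
    """
    return zlib.crc32(order_id.encode()) % num_shards
```

A paged query without routing must fan out to all `NUM_SHARDS` shards and merge their results, which is why more shards help ID lookups but hurt pagination.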

So how do we balance the shard count against the existing business query mix? We did many rounds of stress-test tuning and chose a shard count that gave good cluster performance.

4. The primary/standby cluster stage

By this point, the order center's ES cluster had taken initial shape. But because the order center business has high timeliness requirements, its demands on ES query stability are also high: if a node in the cluster fails, the query service is affected, and in turn the entire order production flow. Such failures are clearly fatal, so to cope with them our initial idea was to add a standby cluster, so that when the primary cluster fails, query traffic can be degraded to the standby in real time.

So how should the standby cluster be built? How is data synchronized between primary and standby? What data should the standby cluster store?

Considering that ES had no good primary/standby solution at the time, and in order to better control ES data writes, we adopted business-level dual writes to set up the primary and standby clusters: each time a business operation needs to write ES data, it writes the primary cluster synchronously, then writes the standby cluster asynchronously. Also, most ES query traffic comes from orders of the last few days, and the order center database already has an archiving mechanism that moves orders closed before a specified number of days into a historical order database.

So we added logic to the archiving mechanism to delete the corresponding documents from the standby cluster, keeping the order data stored in the newly built standby cluster consistent in volume with the order center's online database. We also added a traffic-control switch, implemented with ZooKeeper, to the query service, ensuring query traffic can be degraded to the standby cluster in real time. With this, the order center's primary/standby clusters were complete, and the stability of the ES query service improved greatly.
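The dual-write plus archiving flow can be sketched as below. This is a hypothetical model: the cluster clients, method names, and thread-pool-based async write are assumptions for illustration, not the order center's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

class DualWriter:
    """Sketch of business-level dual writes: write the primary cluster
    synchronously, then the standby cluster asynchronously."""

    def __init__(self, primary, standby):
        self.primary = primary    # stand-in for the primary ES client
        self.standby = standby    # stand-in for the standby ES client
        self._pool = ThreadPoolExecutor(max_workers=4)

    def index_order(self, order_id, doc):
        self.primary.index(order_id, doc)                # synchronous write
        # Asynchronous write: the business call returns without waiting.
        return self._pool.submit(self.standby.index, order_id, doc)

    def archive_order(self, order_id):
        # The archiving job also deletes the document from the standby
        # cluster, keeping it aligned with the online (non-archived) DB.
        self.standby.delete(order_id)
```

The effect is that the standby cluster only ever holds the recent, non-archived orders that serve most query traffic.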

5. Today: two real-time, mutually backing clusters

During this period, the primary cluster was running the fairly old ES 1.7, while stable ES releases had already iterated to 6.x. The new versions not only bring major performance optimizations but also offer useful new features, so we upgraded the primary cluster directly from 1.7 to 6.x.

The cluster upgrade process was tedious and long. It had to guarantee zero impact on online business, upgrading smoothly and imperceptibly. And since ES does not support migrating data across multiple major versions from 1.7 to 6.x, the primary cluster had to be upgraded by rebuilding its indexes; we won't detail the upgrade process here.

Unavailability is unavoidable while the primary cluster is upgrading, but for the order center's ES query service that is not allowed. So during the upgrade, the standby cluster temporarily stepped in as primary to carry all online ES queries, ensuring the upgrade did not affect normal online service. At the same time, we re-planned and redefined the two clusters for online business and re-divided the online query traffic they carry.

The standby cluster stores the last few days of hot online data, a data scale far smaller than the primary's, about one tenth of the primary cluster's document count. With less data and the same deployment scale, the standby cluster outperforms the primary.

In real online scenarios, most query traffic indeed comes from hot data, so the standby cluster carries these hot-data queries, and it gradually evolved into a hot-data cluster. The former primary cluster stores the full data set and supports the remaining smaller share of query traffic, mainly special-scenario queries that must search all orders, plus the order center system's internal queries; the primary gradually evolved into a cold-data cluster.

Meanwhile, the standby cluster gained one-click degradation to the primary cluster; the two clusters are equally important, and each can degrade to the other. The dual-write strategy was also refined: given clusters A and B, in normal operation write the primary (cluster A) synchronously and the standby (cluster B) asynchronously. When cluster A fails, write B (now primary) synchronously and A (now standby) asynchronously.
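The symmetric role swap can be modeled minimally as below (a hypothetical sketch; the cluster names A and B follow the text):

```python
class ClusterPair:
    """Sketch of the symmetric failover strategy: each cluster can
    degrade to the other by swapping the sync/async write roles."""

    def __init__(self):
        # Normal operation: A takes synchronous (primary) writes,
        # B takes asynchronous (standby) writes.
        self.roles = {"sync": "A", "async": "B"}

    def degrade(self, failed: str):
        """One-click degradation: if the failed cluster currently takes
        synchronous writes, swap roles so the other cluster becomes
        the synchronous target."""
        if self.roles["sync"] == failed:
            self.roles["sync"], self.roles["async"] = (
                self.roles["async"], self.roles["sync"])
```

Because the strategy is symmetric, degrading twice returns the pair to its original roles once the failed cluster recovers.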

The synchronization scheme for ES order data

Synchronizing MySQL data into ES can be roughly summarized into two schemes:

  • Scheme 1: listen to MySQL's binlog, parse it, and synchronize the data into the ES cluster.
  • Scheme 2: write the data into the ES cluster directly through the ES API.

Given the business particularities of the order system's ES service, with its high real-time requirements on order data, listening to the binlog is effectively asynchronous replication and may introduce significant latency. Scheme 1 is essentially similar to Scheme 2 but introduces a new system and raises maintenance costs. So the order center's ES writes order data directly through the ES API; this approach is simple and flexible, and satisfies the order center's need to synchronize data into ES well.

Since ES order data is synchronized by writing within the business flow, when creating or updating a document fails, retrying inline would inevitably hurt the response time of normal business operations.

So each business operation updates ES only once. If an error or exception occurs, a remedial task is inserted into the database; a worker scans these tasks in near real time and re-updates ES, taking the database's order data as the source of truth. This compensation mechanism guarantees eventual consistency between the ES data and the database's order data.
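The compensation mechanism can be sketched as follows (function names and task shape are hypothetical; the real worker scans a database table rather than an in-memory list):

```python
def write_order(es_index, db_insert_task, order_id, doc):
    """Write ES exactly once; on failure, record a remedial task instead
    of retrying inline, which would slow the business operation."""
    try:
        es_index(order_id, doc)
    except Exception:
        # The failed write becomes a task for the compensation worker.
        db_insert_task({"order_id": order_id})

def compensation_worker(tasks, load_order_from_db, es_index):
    """Re-push failed orders to ES, treating the DB as source of truth:
    the document is re-read from the database, not from the failed call."""
    for task in tasks:
        es_index(task["order_id"], load_order_from_db(task["order_id"]))
```

Re-reading from the database (rather than replaying the original payload) is what makes the result eventually consistent even if the order changed again after the failure.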

Some pitfalls encountered

 
1. Queries with high real-time requirements go to the DB

Those familiar with ES's write mechanism may know that newly added documents are collected into the indexing buffer and then written to the filesystem cache; once in the filesystem cache, they can be indexed and searched like any other file.

However, by default, the move from the indexing buffer to the filesystem cache (the refresh operation) happens automatically per shard once per second. This is why ES is called near-real-time search rather than real-time: document changes are not immediately visible to search, but become visible within one second.

The order system's ES currently uses the default refresh configuration, so businesses with strict real-time requirements on order data query the database directly, guaranteeing data accuracy.
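For reference, this refresh cadence is controlled per index by the standard ES setting `index.refresh_interval`; a sketch of the settings body (the order center keeps the 1s default, so this is illustrative only):

```python
import json

# `index.refresh_interval` controls how often the indexing buffer is
# refreshed into the searchable filesystem cache. "1s" is the default;
# raising it trades search freshness for indexing throughput.
settings = {"index": {"refresh_interval": "1s"}}

# The body that would be sent to the index settings endpoint.
body = json.dumps(settings)
```

Since the team keeps the default rather than forcing refreshes, queries that cannot tolerate the up-to-one-second lag simply bypass ES.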

2. Avoid deep pagination queries

ES paging supports the from and size parameters. At query time, every shard must build a priority queue of length from + size and return it to the gateway (coordinating) node, which then sorts these priority queues to find the correct size documents.

Suppose an index has six primary shards, from is 10000, and size is 10: each shard must produce 10010 results, the gateway node merges 60060 results, and finally the 10 matching documents are found.

Clearly, when from is large enough, even if no OOM occurs, CPU, bandwidth, and more are affected, dragging down the whole cluster's performance. So deep pagination should be avoided and used as little as possible.
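The arithmetic above generalizes directly; a small helper makes the cost visible:

```python
def deep_paging_cost(shards: int, frm: int, size: int):
    """Return (per-shard priority-queue length, total entries the
    gateway node must merge) for a from/size paged query."""
    per_shard = frm + size
    return per_shard, shards * per_shard

# The example from the text: 6 primary shards, from=10000, size=10.
per_shard, merged = deep_paging_cost(6, 10_000, 10)
```

The merged total grows linearly in both from and the shard count, while the useful output stays at size documents, which is why deep pages are so expensive.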

3. FieldData and Doc Values

FieldData

Occasional query timeouts appeared online; by debugging the query statements, we traced the cause to sorting. In ES 1.x, sorting uses the FieldData structure. FieldData occupies JVM heap memory, and since JVM memory is limited, a threshold is set on the FieldData cache.

When space runs out, the least-recently-used (LRU) algorithm evicts FieldData while new FieldData is loaded into the cache; that loading consumes system resources and takes a long time. As a result, the affected query's response time spiked, sometimes even hurting the whole cluster's performance. The solution to this problem is to use doc values.

Doc Values

Doc values are a columnar data storage structure, quite similar to FieldData, but they are stored in Lucene files, i.e., they do not occupy JVM heap. As ES versions iterated, doc values became more stable than FieldData, and they have been the default since 2.x.
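A sketch of what a 6.x order-index mapping might look like. The field names are hypothetical; `doc_values` defaults to true for keyword, date, and numeric fields, and is spelled out here only for illustration:

```python
# Hypothetical mapping fragment for an order index on ES 6.x.
# Sorting on `created_at` then uses the on-disk columnar doc-values
# structure instead of heap-resident FieldData.
mapping = {
    "properties": {
        "order_id":   {"type": "keyword"},
        "created_at": {"type": "date", "doc_values": True},
    }
}
```

Keeping sort and aggregation data on disk is what removed the heap-eviction stalls the team saw under ES 1.x.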

Summary

 
Rapid architecture iteration stems from rapid business growth. It is precisely the high-speed development of the Daojia business in recent years that has kept driving the optimization and upgrading of the order center's architecture. There is no best architecture, only the most suitable one. We believe that in a few more years the order center's architecture will look different again, but greater throughput, better performance, and stronger stability will always be what the order center system pursues.

 

·END·

Originally published on the WeChat public account 「程序员的成长之路」 (The Programmer's Path to Growth).

 


Origin www.cnblogs.com/gdjk/p/11495870.html