Applications of RDS 5.7 Three-Node Enterprise Edition at Alibaba

  RDS 5.7 Three-Node Enterprise Edition is a distributed relational database developed in-house by Alibaba. It is fully compatible with MySQL and provides strong consistency at global scale. It has been used at large scale within the group since 2017 and currently serves not only the core Tmall/Taobao transaction systems but is also heavily used by other Alibaba business units, including Ant, Cainiao, DingTalk, Hema, Youku, Amap, and Koubei. In this article we look at the concrete application scenarios of RDS 5.7 Three-Node Enterprise Edition.

I. Core strengths of RDS 5.7 Three-Node Enterprise Edition

  The previous part introduced the data consistency solution of RDS 5.7 Three-Node Enterprise Edition. This chapter gives a brief summary of its core strengths.

1. Strong data consistency

  RDS 5.7 Three-Node Enterprise Edition integrates X-Paxos on top of AliSQL. With X-Paxos, a distributed consensus algorithm, the product solves the data consistency problem at the functional level. In the AliSQL era, consistency depended on the surrounding ecosystem; with RDS 5.7 Three-Node Enterprise Edition it is solved inside the product in an integrated way.

2. Performance optimizations in RDS 5.7 Three-Node Enterprise Edition

It is well known that strong consistency tends to lower performance. Let's look at the optimizations made in RDS 5.7 Three-Node Enterprise Edition to improve throughput and performance.

1). Integrated log design

  The Consensus log of RDS 5.7 Three-Node Enterprise Edition merges the original binlog and relay log while retaining the original MySQL binlog transaction log format. The first benefit of the integrated log is that one fewer log has to be written. The second is that log scans can use sequential I/O, which is a clear improvement for the log lookups and reads that the Paxos algorithm requires.

2). Asynchronous transaction commit

  First, X-Paxos is implemented entirely on multithreading: a single Paxos partition can make full use of multiple threads, and all tasks are run by generic workers, which removes the CPU bottleneck. On this basis, we split the transaction flow in the service layer into stages (process the request → synchronize the log → wait for the transaction to commit), with different stages executed by different threads in the thread pool. With the X-Paxos multithreaded model and the asynchronous service layer, system throughput improves greatly. This matters most in cross-region or cross-city scenarios, where high network latency turns the thread pool into a bottleneck: in practice, most threads in the pool end up waiting for log synchronization acknowledgments from nodes in other regions, and the server does not have enough threads left to handle client requests.
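A back-of-the-envelope sketch of why blocked threads cap throughput, and why releasing the thread during the wait helps. The pool size, the wait time, and the per-transaction CPU time are hypothetical inputs chosen only to show the shape of the effect; they are not measurements from X-Paxos.

```python
def max_tps(threads, hold_ms):
    """Upper bound on transactions per second for a fixed-size thread pool.

    Each transaction occupies one thread for hold_ms milliseconds, so one
    thread completes at most 1000 / hold_ms transactions per second.
    """
    return threads * 1000 / hold_ms


# Hypothetical inputs (assumptions, not figures from the article).
POOL_SIZE = 100          # worker threads in the pool
REMOTE_WAIT_MS = 40      # time a thread would block waiting for remote acks
CPU_WORK_MS = 1          # time a thread is actually busy per transaction

# Blocking model: the thread is held for CPU work plus the remote wait.
blocking = max_tps(POOL_SIZE, CPU_WORK_MS + REMOTE_WAIT_MS)

# Staged model: the thread is released while the log is being synchronized.
staged = max_tps(POOL_SIZE, CPU_WORK_MS)

print(f"blocking: {blocking:.0f} TPS, staged: {staged:.0f} TPS")
```

In this simplified model the ceiling is set by how long each thread is held, which is why moving the wait out of the worker thread raises it.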

3). Batching & Pipelining

  X-Paxos went through extensive experimentation and testing on protocol optimizations for high-latency networks. Building on existing academic results, we applied Batching and Pipelining and designed and implemented an adaptive communication mode that delivers high throughput on both high-latency and low-latency networks, which greatly improves X-Paxos performance. Once pipelining is introduced, log entries can arrive out of order, and this must be handled. In cross-region scenarios the window is larger, so reordering becomes more likely. X-Paxos implements an efficient out-of-order processing module that hides the reordering from the underlying log and stores out-of-order log entries efficiently.
  These optimizations ensure that, whether deployed across regions, across units, or across multiple data centers in one city, the write performance of RDS 5.7 Three-Node Enterprise Edition shows no significant degradation compared with single-node mode (without strong consistency). The performance work is what allows the product to solve the consistency problem in practice, and nearly two years of rollout within the group have also verified its reliability and completeness.
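The out-of-order handling that pipelining requires can be illustrated with a small reorder buffer: entries are accepted in any arrival order and released only once they are contiguous. This is a minimal sketch of the general technique, not the X-Paxos module itself; the class and method names are hypothetical.

```python
class ReorderBuffer:
    """Accepts log entries in any order and releases them in index order."""

    def __init__(self, next_index=1):
        self.next_index = next_index   # lowest index not yet released
        self.pending = {}              # index -> entry, for early arrivals

    def receive(self, index, entry):
        """Store one entry and return the entries that are now contiguous."""
        if index < self.next_index or index in self.pending:
            return []                  # duplicate or already released
        self.pending[index] = entry
        released = []
        while self.next_index in self.pending:
            released.append(self.pending.pop(self.next_index))
            self.next_index += 1
        return released


buf = ReorderBuffer()
# Entries 2 and 3 arrive before 1; entry 5 arrives before 4.
out = [buf.receive(i, f"entry-{i}") for i in (2, 3, 1, 5, 4)]
for step in out:
    print(step)
```

Early entries wait in `pending` until the gap before them is filled, so the consumer of the buffer only ever sees entries in index order.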

3. Multiple roles and dynamic membership changes

  Beyond the consistency solution, RDS 5.7 Three-Node Enterprise Edition also draws on the rich feature set of X-Paxos to gain flexibility, in the following respects:
  1) Operations-friendly: nodes can be added and removed online, and the Leader role can be transferred online;
  2) Majority policy and weighted leader election: businesses can configure these to match their deployment characteristics;
  3) Node customization: in classic Multi-Paxos implementations, each node typically combines the Proposer, Acceptor, and Learner functions, so every node is a fully functional node. RDS 5.7 Three-Node Enterprise Edition lets these functions be configured independently: the three functions of a Paxos node are separated and recombined into nodes with different roles, as follows:
image
  4) SDK mode: we packaged the Learner role into an SDK. Through the SDK, downstream systems can be connected to RDS 5.7 Three-Node Enterprise Edition quickly, forming a complete closed-loop ecosystem. For example, based on the subscriber feature we can implement real-time log backup, real-time incremental subscription for downstream consumers, and similar functions. The following is a real cluster running in production.
image
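To make the role split in item 3) concrete, the sketch below computes how far the log is committed when only some nodes vote. It assumes the usual Paxos reading that a Learner receives the log but does not count toward the majority; the node names and log indexes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    votes: bool        # True if the node counts toward the majority
    match_index: int   # highest log index known to be stored on this node


def commit_index(nodes):
    """Highest log index stored on a majority of the voting nodes."""
    voters = sorted((n.match_index for n in nodes if n.votes), reverse=True)
    if not voters:
        raise ValueError("cluster has no voting nodes")
    majority = len(voters) // 2 + 1
    # The majority-th highest index is held by at least `majority` voters.
    return voters[majority - 1]


cluster = [
    Node("leader", votes=True, match_index=120),
    Node("follower-1", votes=True, match_index=118),
    Node("follower-2", votes=True, match_index=110),
    Node("learner", votes=False, match_index=90),   # subscribes, never votes
]
committed = commit_index(cluster)
print(committed)
```

Because the learner is excluded from the quorum, a slow subscriber can lag behind without holding back commits on the voting nodes.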

II. Common deployment modes of RDS 5.7 Three-Node Enterprise Edition

  We now know that RDS 5.7 Three-Node Enterprise Edition delivers both strong consistency and high performance, and also offers businesses rich and flexible deployment modes. Below we look at several deployment modes used at Alibaba.

1. Same-city cross-data-center mode

Deployment diagram

image

Features

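1). Data-center-level disaster recovery with zero data loss and failover on the order of 10 seconds;
2). The deployment uses two data replicas plus one log node (no data, and the lowest specification), so the cost increase over a primary/standby setup is very limited;
3). With backups connected through the SDK, RPO < 1 second.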

This is the mode most commonly used within the group. It can be extended in various ways into the modes below (every extended mode keeps the backup RPO < 1 second capability).

2. Cross-region high-performance mode

Deployment diagram

image

Features

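1). On top of data-center-level disaster recovery, it can take over traffic across regions.
2). High performance: strong synchronization is reached within the same city, so write performance is much better than with cross-region strong synchronization.
3). Extensibility: in this mode, when the South China region fails, the South China business reads from North China;
   if same-city disaster recovery is needed, a learner node can be added in the South China region, and adding it has no impact on write performance.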

3. Cross-region strong synchronization mode

Deployment diagram

image

Features

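1). True cross-region strong consistency: if any one city becomes unavailable, cluster availability is unaffected and no data is lost.
2). Flexible switchover policy: switchover priorities can be set for same-city nodes and cross-region nodes.
3). Flexible scaling: when facing demands such as large promotions, the deployment can be switched dynamically from cross-region strong synchronization mode to cross-region high-performance mode, gaining higher performance while keeping data-center-level disaster recovery.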
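The switchover priority in item 2) can be pictured as a weighted choice among healthy, up-to-date nodes. The sketch below is an assumption for illustration only: the selection rule, the weight values, and the names are hypothetical, and the actual election logic of X-Paxos is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    weight: int          # higher weight = preferred as the new leader
    last_log_index: int  # highest log index stored on this node
    reachable: bool      # False if the node or its city is down


def pick_leader(candidates, committed_index):
    """Return the preferred candidate's name, or None if nobody qualifies."""
    eligible = [c for c in candidates
                if c.reachable and c.last_log_index >= committed_index]
    if not eligible:
        return None
    # Prefer the highest weight; break ties by the longest log.
    best = max(eligible, key=lambda c: (c.weight, c.last_log_index))
    return best.name


nodes = [
    Candidate("same-city-follower", weight=9, last_log_index=500, reachable=True),
    Candidate("cross-region-follower", weight=5, last_log_index=500, reachable=True),
]
normal = pick_leader(nodes, committed_index=500)

nodes[0].reachable = False          # simulate losing the whole city
city_down = pick_leader(nodes, committed_index=500)
print(normal, city_down)
```

With these weights a data-center failure stays within the city, and only a city-wide failure moves the leader to another region.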

III. Business application cases

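  Having seen the deployment modes commonly used inside Alibaba, we now look at application cases that combine RDS 5.7 Three-Node Enterprise Edition with specific businesses. The structure we use most is the same-city cross-data-center deployment mode, which brings strong consistency to the business at minimal cost. At first it was adopted only by business lines with extremely high consistency requirements, such as finance and settlement. As the rollout progressed and the product matured, it became the default deployment mode for same-city disaster recovery.
  In earlier unitization and regionalization projects, synchronization between units relied entirely on DTS. With RDS 5.7 Three-Node Enterprise Edition and its cross-unit performance optimizations, cross-region synchronization can be completed directly inside the database. Using the multi-role capability, Youku dynamically added a Learner node in the Hong Kong unit to support "Youku going overseas". Centralized unit architectures have likewise evolved to the cross-region high-performance mode. Next we introduce an extreme optimization case under the cross-region high-performance mode.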

1. Transaction inventory plan for large promotions

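  Transaction inventory is one of the core systems on the transaction path and handles inventory deduction for Taobao, Tmall, travel, Damai, and other businesses. Readers familiar with Alibaba's unitization know that transactions are partitioned by buyer, so that multiple units can all serve buyers. Inventory relates to both buyers and sellers, so it uses a centralized unitized architecture. For the database architecture we therefore chose the cross-region high-performance mode of RDS 5.7 Three-Node Enterprise Edition. Below is how we responded to the extreme traffic of Double 11.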

1). One-click switch to cascading replication mode

  Let's first look at the day-to-day cascading mode for inventory, shown below:
image
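  In this mode the leader has to send logs to 5 nodes. During stress testing we found that when the leader reached 60,000 TPS, contention on the hot buffer lock caused a performance bottleneck. After unit synchronization was moved to the central follower node, the leader's RT dropped sharply. Below are the cascading topology used for large promotions and a comparison of the stress test results.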
image
image

2). Weak consistency mode

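  The parameter weak_consensus_mode turns on the weak consistency mode for synchronization in RDS 5.7 Three-Node Enterprise Edition. Once it is on, a transaction commit no longer waits for a majority to be reached. Of course, data consistency then falls back to the level of primary/standby mode, so for now we enable it only during the peak at the start of a large promotion and restore the parameter once the peak has passed.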
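3). Hotspot updates and replication performance under hotspot updates
  Hotspot updates have always been a hard problem for databases: row-lock contention inside the engine makes it difficult to raise throughput. For RDS 5.7 Three-Node Enterprise Edition, the long network paths of cross-region scenarios make matters worse, because commits take longer and transactions therefore hold row locks significantly longer. To solve this, RDS 5.7 Three-Node Enterprise Edition optimized replication on top of the existing AliSQL hotspot feature, improving hotspot update performance 200-fold while keeping data strongly consistent.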
image
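  As the figure above shows, the basic idea behind hot-row updates in RDS 5.7 Three-Node Enterprise Edition is to merge the updates that multiple transactions make to the same row. So that a batch of update transactions can commit together, the product adds a new row lock type, the hotspot row lock. Under a hotspot row lock, hotspot update transactions are compatible with one another. To keep data consistent, the logs of hotspot update transactions in the same batch are marked with a special flag, and based on these flags the logs of the whole batch are packed into a single network packet for synchronization across the cluster, so that these transactions commit or roll back atomically. In addition, to make log replay more efficient, the update log entries for the hot row within each batch are also merged, which keeps the latency of the inventory units and DTS under control.
  Two consecutive Double 11 events, with their high traffic and performance demands, have verified the stability and extreme performance of RDS 5.7 Three-Node Enterprise Edition.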
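The merging idea for hot rows can be sketched as collapsing a batch of per-transaction deltas into one log entry that is replayed with a single write. This is an illustration of the principle only; the data structures and names are hypothetical and do not reflect the actual log format.

```python
from dataclasses import dataclass


@dataclass
class MergedUpdate:
    row_id: str
    delta: int        # net change for the whole batch
    txn_ids: tuple    # transactions covered by this single log entry


def merge_hot_row_updates(row_id, updates):
    """Collapse (txn_id, delta) pairs that all target row_id into one entry."""
    updates = list(updates)
    return MergedUpdate(
        row_id=row_id,
        delta=sum(delta for _, delta in updates),
        txn_ids=tuple(txn_id for txn_id, _ in updates),
    )


def replay(stock, merged):
    """Apply one merged entry: one write instead of len(txn_ids) writes."""
    return {**stock, merged.row_id: stock[merged.row_id] + merged.delta}


# Three transactions deduct stock from the same hot row in one batch.
batch = [("t1", -1), ("t2", -2), ("t3", -1)]
merged = merge_hot_row_updates("sku-1", batch)
stock_after = replay({"sku-1": 100}, merged)
print(merged.delta, stock_after)
```

Because the batch travels and replays as one unit, either every transaction in it takes effect on a replica or none does.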

2. Cainiao electronic waybill remote disaster recovery plan

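  In the era of primary/standby replication, we only had to worry about consistency between primary and standby data when the primary crashed. With the unitized architecture built on DTS links, the data consistency problem grew much larger: how is data consistency across units guaranteed? How should HA switch over for remote disaster recovery in a unitized architecture? How are DTS positions coordinated? Let's look at the solution used in the remote disaster recovery scenario of the Cainiao electronic waybill.
  The Cainiao electronic waybill is a logistics waybill service that Cainiao Network provides to merchants together with courier companies. When merchants ship through software from an ISV, they obtain parcel logistics information through the electronic waybill, print it, and hand the parcel to the courier company for pickup and delivery. The waybill volume is huge and the availability requirement is very high (five nines): if the service fails, large numbers of merchants cannot ship, which seriously affects logistics timeliness and can even have social impact. The electronic waybill also demands strong data consistency: one of its core functions is to give logistics providers a unique waybill serial number that is used throughout the logistics lifecycle, and a scrambled serial number may mean that parcels cannot be picked up or that tracking details get mixed up. We therefore translate the business needs into technical requirements: strong data consistency and cross-region disaster recovery. For resource cost reasons, the business must also be active-active rather than merely deployed in multiple locations.
  At the database layer we therefore adopted the cross-region strong synchronization mode of RDS 5.7 Three-Node Enterprise Edition. In this mode, application nodes write to the leader, and the leader strongly synchronizes each write to the remote follower nodes, which guarantees strong data consistency across regions. For cross-region writes to the leader, the business uses a customized connection pool, batched SQL submission, and similar techniques to minimize remote network time and keep the active-active path available. Application reads use a read-local policy. For cost reasons, we also changed the East China follower node to a Log node. The deployment is shown below: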
image
Next we focus on the optimizations that reduce network time:

1). Dedicated connection pool for transactions

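  The database middleware the system originally used managed its connection pool with auto_commit set to true, so every transaction incurred DB round trips to set and then reset auto_commit. Given the higher cross-region network latency, the business maintains its own dedicated transaction connection pool whose connections keep auto_commit set to false at all times. For transaction access we also use special hints (COMMIT_ON_SUCCESS, ROLLBACK_ON_FAIL) to avoid the network round trip of a separate commit or rollback. This reduces network time at the transaction level to a minimum.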

2). Batching multiple SQL requests within a transaction

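  By default the MySQL JDBC driver does not batch: when the business layer issues multiple SQL requests, the driver splits them up and sends them to the server serially. With cross-region writes, each SQL statement adds about 40 ms of cross-city latency. We therefore adjusted the JDBC driver configuration at the business layer, using rewriteBatchedStatements + allowMultiQueries, so that all write statements in a transaction are sent to the DB server in a single batch, eliminating the network overhead of serial execution.
With these two business-layer optimizations, we cut the cross-region transaction write time from 270 ms to 70 ms and the number of interactions with the center from 6 to 1.
In this way, combining RDS 5.7 Three-Node Enterprise Edition with business-side optimization, the Cainiao electronic waybill business achieved the following: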
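The quoted figures fit a simple linear model in which each interaction with the center costs one cross-city round trip. The sketch below checks that arithmetic. The 30 ms of non-network time is inferred from the quoted numbers and is an assumption; it is not stated in the article.

```python
CROSS_CITY_RTT_MS = 40   # per-statement cross-city cost quoted in the text


def txn_latency_ms(round_trips, local_ms):
    """Transaction write time = network round trips + everything else."""
    return round_trips * CROSS_CITY_RTT_MS + local_ms


# Assumption: derive the non-network time from the "before" figures
# (270 ms with 6 interactions). This residual is inferred, not measured.
local_ms = 270 - 6 * CROSS_CITY_RTT_MS

before = txn_latency_ms(6, local_ms)   # 6 serial interactions
after = txn_latency_ms(1, local_ms)    # 1 batched interaction
print(local_ms, before, after)
```

Under this model, removing five round trips accounts for the entire 200 ms improvement, which suggests the saving comes from the network rather than from faster execution on the server.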
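  1). Disaster recovery drills: for same-city disaster recovery, the leader switches preferentially to the same-city (North China) follower node according to our weight configuration. When the North China region fails, the remaining South China and East China nodes elect a new leader and switch to the South China follower. Because node data is kept strongly consistent, switching leaders carries no data quality risk in this architecture. In an actual drill, switching from the central leader to the remote follower node took 23 s.
  2). Active-active performance: after the application-layer network optimizations above, the electronic waybill cluster splits traffic active-active between the North China center and the South China unit, with the South China cluster taking more than 20% of the serial-number allocation traffic. With the active-active RT optimization enabled, the RT of the South China serial-number interface stays stable at about 90 ms, less than 30 ms behind the North China center, and the system success rate stays above 99%.
  The above covers the extreme optimization case for the Double 11 inventory scenario and the extreme optimization plan for the Cainiao electronic waybill under a cross-region strong consistency and cross-region disaster recovery architecture. These are extreme optimizations for extreme situations. For most businesses, the common deployment modes of RDS 5.7 Three-Node Enterprise Edition are entirely sufficient.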

Summary

  This series revolves around the theme of "de-IOE" (moving off IBM, Oracle, and EMC) and Alibaba's database solutions for data consistency. In the MySQL primary/standby era, we relied on patch features and the surrounding ecosystem to ensure data consistency as far as possible across stand-alone instances, primary/standby clusters, and unit architectures. In the era of RDS 5.7 Three-Node Enterprise Edition, we rely on its powerful features and its performance optimizations for strong consistency, so that the database itself takes full responsibility for data consistency. At the same time, its flexible deployment modes have brought a new upgrade to Alibaba's database architecture. Now that RDS 5.7 Three-Node Enterprise Edition is officially available on the cloud as the financial three-node (5.7) product, we look forward to bringing more technical dividends to more users. As it is a new database product, you are welcome to contact us through the cloud database expert service if you have any questions.

Origin yq.aliyun.com/articles/708759