How to improve the performance of the Jingdong shopping cart by 30%

This article describes how, as business complexity has grown, the Jingdong retail shopping cart team has applied the spirit of craftsmanship to improve system performance and user experience through a fully asynchronous transformation. It presents the overall plan for making the shopping cart platform fully asynchronous, along with the problems encountered during implementation and how they were solved. Points worth particular attention are multi-page parallelism, fine-grained control of paging, and preserving exception information from the underlying RPC layer.



1. Background


The shopping cart faces the following challenges:
1) New business: As business forms diversify, the shopping cart keeps supporting new businesses, and the number of external interfaces it depends on keeps growing;
2) Sinking: Some interfaces previously called by the front end have been moved down into the shopping cart center;
3) Front-loading: Many businesses from the checkout (settlement) process, such as coupons and Jingdou, are now loaded in advance in the shopping cart;
4) Expansion: To improve the user experience, the number of items the shopping cart can hold keeps increasing.
Together, these increase the number of RPC interfaces and paging calls the shopping cart depends on. As the entry point of the transaction process, the shopping cart also carries a large amount of traffic. Given this business complexity, improving performance while preserving the user experience has become a major challenge for the shopping cart.


2. Fully asynchronous transformation plan


Adding server resources can ease the problem to some extent, but at a significant cost, which also runs counter to the spirit of craftsmanship. Can performance be improved by technical means instead? Our analysis showed that an asynchronous transformation is an effective way to solve this problem.

1) Parallelization of different RPCs
The shopping cart depends on dozens of interfaces with complex dependencies among them. The first step is to map out these dependencies and identify which calls can run in parallel. The original code is then split into two parts: issuing the asynchronous RPC request and processing its result. Following the dependency graph, RPCs are executed in parallel as far as possible, which shortens the time spent waiting for asynchronous responses in the result-processing stage and thereby improves performance.
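As an illustration of the split between issuing requests and processing results, here is a minimal Java sketch. The three RPCs (`queryPrice`, `queryStock`, `queryPromotion`) are hypothetical and simulated with `CompletableFuture.supplyAsync` plus a short sleep; they are not real shopping cart or JSF interfaces. Price and stock are independent, so both are issued immediately, while the promotion call depends on the price and is chained with `thenCompose`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical cart RPCs, simulated with a short sleep; they stand in for real async RPC stubs.
final class CartRpcs {
    // Daemon threads so the simulation never keeps the JVM alive.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "cart-rpc");
        t.setDaemon(true);
        return t;
    });

    static CompletableFuture<Integer> queryPrice(String sku) {
        return CompletableFuture.supplyAsync(() -> { pause(100); return 999; }, POOL);
    }

    static CompletableFuture<Integer> queryStock(String sku) {
        return CompletableFuture.supplyAsync(() -> { pause(100); return 5; }, POOL);
    }

    // Needs the price as input, so it can only be issued after queryPrice completes.
    static CompletableFuture<Integer> queryPromotion(String sku, int price) {
        return CompletableFuture.supplyAsync(() -> { pause(100); return price / 10; }, POOL);
    }

    private static void pause(long ms) {
        try {
            TimeUnit.MILLISECONDS.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

final class CartAssembler {
    static String build(String sku) {
        // Request stage: issue every call whose inputs are ready, without blocking.
        CompletableFuture<Integer> price = CartRpcs.queryPrice(sku);
        CompletableFuture<Integer> stock = CartRpcs.queryStock(sku);   // independent: runs alongside price
        CompletableFuture<Integer> promo = price.thenCompose(p -> CartRpcs.queryPromotion(sku, p));

        // Result-processing stage: join() only waits for whatever is still outstanding.
        return "price=" + price.join() + ",stock=" + stock.join() + ",discount=" + promo.join();
    }
}
```

Because each simulated call sleeps for about 100 ms, `CartAssembler.build("sku-1")` should finish in roughly 200 ms (price and stock overlap, then the promotion call runs) instead of about 300 ms if the three calls were made one after another.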
2) Parallel multi-page calls for batch interfaces
Most interfaces the shopping cart depends on are batch interfaces that limit the amount of data per call, so the data has to be split into multiple paged calls, and these pages can themselves run in parallel. The transformation wraps this in an asynchronous paging tool so that the business layer is unaware of the paging logic: the tool automatically splits data that exceeds an interface's limit into multiple pages called in parallel, improving the response time of each individual interface.
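The paging idea can be sketched as a small helper that cuts a key list into pages of at most `maxPerCall` keys and issues every page immediately. The names `AsyncPager`, `PageCall` and the `batchRpc` function are hypothetical; `batchRpc` stands in for whatever asynchronous batch interface is being called. Pages are cut at a fixed size here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// One page of a batch call: the keys it covers plus its pending result, kept separate
// so later steps (retry, monitoring) can still tell the pages apart.
final class PageCall<K, V> {
    final int pageIndex;
    final List<K> keys;
    final CompletableFuture<List<V>> future;

    PageCall(int pageIndex, List<K> keys, CompletableFuture<List<V>> future) {
        this.pageIndex = pageIndex;
        this.keys = keys;
        this.future = future;
    }
}

final class AsyncPager {
    // Splits keys into pages of at most maxPerCall and issues all pages at once.
    // batchRpc stands in for an async batch interface; the business layer just passes keys.
    static <K, V> List<PageCall<K, V>> callInPages(
            List<K> keys, int maxPerCall, Function<List<K>, CompletableFuture<List<V>>> batchRpc) {
        if (maxPerCall <= 0) {
            throw new IllegalArgumentException("maxPerCall must be positive");
        }
        List<PageCall<K, V>> pages = new ArrayList<>();
        int index = 0;
        for (int from = 0; from < keys.size(); from += maxPerCall) {
            List<K> slice = new ArrayList<>(keys.subList(from, Math.min(from + maxPerCall, keys.size())));
            pages.add(new PageCall<>(index++, slice, batchRpc.apply(slice)));
        }
        return pages;
    }
}
```

Each page keeps its own future instead of being merged into one, which leaves room for the per-page retry and exception reporting discussed in the next section.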
3) JSF asynchronous calls at the bottom layer
Asynchronous calls are built on JSF, JD.com's RPC framework. Version 1.7.5 or later is recommended; it supports CompletableFuture.


3. Problems and solutions


The overall plan for the asynchronous transformation is not complicated, but many detailed problems came up during implementation:

1) Exception retries need finer control
With synchronous calls, a call that timed out was simply retried. After switching to asynchronous calls this no longer works: issuing the call usually reports no error, so the retry has to happen in the result-processing stage, after getting the asynchronous response times out.
In addition, when multiple pages run in parallel and one page request times out, only that page should be retried. Because paging is encapsulated in the bottom layer, the upper-layer business code cannot tell which page timed out when it fetches the data. The context of each call must therefore be saved in a wrapper class during the asynchronous call and returned to the business layer together with the result, so that when a Get times out, the failed page can be retried on its own.
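A minimal sketch of single-page retry, assuming a hypothetical wrapper `RetryablePage` that keeps each page's keys next to its future. The timeout only becomes visible at Get time, so the retry is issued from the result-processing loop, and only for the page that timed out. For simplicity this sketch uses a fixed per-page timeout and retries at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Wrapper returned per page: the keys are the "scene information" needed to re-issue it.
final class RetryablePage<K, V> {
    final List<K> keys;
    final CompletableFuture<List<V>> future;

    RetryablePage(List<K> keys, CompletableFuture<List<V>> future) {
        this.keys = keys;
        this.future = future;
    }
}

final class PageRetry {
    // Collects pages in order. Issuing the call raised no error, so a timeout only shows up
    // here, at Get time; only the page that timed out is retried, once.
    static <K, V> List<V> collect(List<RetryablePage<K, V>> pages, long timeoutMs,
                                  Function<List<K>, CompletableFuture<List<V>>> batchRpc) {
        List<V> merged = new ArrayList<>();
        for (RetryablePage<K, V> page : pages) {
            try {
                merged.addAll(await(page.future, timeoutMs));
            } catch (TimeoutException first) {
                try {
                    merged.addAll(await(batchRpc.apply(page.keys), timeoutMs));
                } catch (TimeoutException again) {
                    throw new CompletionException("page timed out after retry: " + page.keys, again);
                }
            }
        }
        return merged;
    }

    // Get with a deadline; only TimeoutException stays checked, other failures become unchecked.
    private static <T> T await(CompletableFuture<T> future, long timeoutMs) throws TimeoutException {
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new CompletionException(e);
        } catch (ExecutionException e) {
            throw new CompletionException(e.getCause());
        }
    }
}
```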
Not every exception warrants a retry. For example, a call rejected by rate limiting must not be retried. The underlying tooling therefore filters out rate-limit exceptions automatically, and also supports custom rules.
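The retry filter can be expressed as a list of "do not retry" rules. In this sketch, `RateLimitException` is a hypothetical exception type standing in for whatever the RPC framework actually throws on rate limiting; the built-in rule rejects it, and `neverRetryWhen` is a hypothetical hook for custom rules. The cause chain is checked as well, because asynchronous failures often arrive wrapped in `ExecutionException` or `CompletionException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical exception for a provider-side rate-limit rejection; the real type depends on the RPC framework.
class RateLimitException extends RuntimeException {
    RateLimitException(String message) {
        super(message);
    }
}

final class RetryPolicy {
    private static final int MAX_CAUSE_DEPTH = 16;   // guards against cyclic cause chains
    private final List<Predicate<Throwable>> noRetryRules = new ArrayList<>();

    RetryPolicy() {
        // Built-in rule: a rate-limited call is never retried, since retrying only adds load.
        noRetryRules.add(t -> t instanceof RateLimitException);
    }

    // Hook for business-specific rules on top of the built-in filter.
    RetryPolicy neverRetryWhen(Predicate<Throwable> rule) {
        noRetryRules.add(rule);
        return this;
    }

    // Retry unless some rule matches the error or anything in its cause chain.
    boolean shouldRetry(Throwable error) {
        Throwable t = error;
        for (int depth = 0; t != null && depth < MAX_CAUSE_DEPTH; depth++, t = t.getCause()) {
            for (Predicate<Throwable> rule : noRetryRules) {
                if (rule.test(t)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

For example, `new RetryPolicy().neverRetryWhen(t -> t instanceof IllegalArgumentException)` would additionally skip retries for malformed requests, while timeouts stay retryable.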
2) Asynchronous RPC monitoring is more complex
Latency monitoring of the underlying RPCs has to be split into two parts: the start time is recorded when the page call is issued, and the end time is recorded when the asynchronous result arrives. If the call fails or the Get times out, the call must be marked as failed. Retries also need their latency recorded, with normal calls and retry calls recorded separately.
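The split start/end recording maps naturally onto `whenComplete`. The sketch below uses a hypothetical in-memory `RpcMetrics` class rather than any real monitoring API: the clock starts when the page call is issued and stops when the asynchronous result arrives; a failure is counted when the future completes exceptionally or when the result-processing stage reports a Get timeout; and normal calls and retries are kept under separate keys.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Minimal in-memory counters; a production system would report to its monitoring platform.
final class RpcMetrics {
    final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    final Map<String, LongAdder> failures = new ConcurrentHashMap<>();
    final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    // Start time is taken when the page call is issued; end time when the async result arrives.
    <T> CompletableFuture<T> track(String method, boolean isRetry, Supplier<CompletableFuture<T>> call) {
        String metricKey = key(method, isRetry);
        long start = System.nanoTime();
        CompletableFuture<T> future;
        try {
            future = call.get();
        } catch (RuntimeException e) {            // the call failed before it even went async
            future = CompletableFuture.failedFuture(e);
        }
        future.whenComplete((value, error) -> {
            add(calls, metricKey, 1);
            add(totalNanos, metricKey, System.nanoTime() - start);
            if (error != null) {
                add(failures, metricKey, 1);
            }
        });
        return future;
    }

    // Called from the result-processing stage when Get gives up on a page.
    void recordGetTimeout(String method, boolean isRetry) {
        add(failures, key(method, isRetry), 1);
    }

    long count(Map<String, LongAdder> metric, String metricKey) {
        LongAdder adder = metric.get(metricKey);
        return adder == null ? 0 : adder.sum();
    }

    // Normal calls and retries are recorded under separate keys.
    private static String key(String method, boolean isRetry) {
        return method + (isRetry ? ".retry" : ".normal");
    }

    private static void add(Map<String, LongAdder> metric, String metricKey, long delta) {
        metric.computeIfAbsent(metricKey, k -> new LongAdder()).add(delta);
    }
}
```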
Besides RPC latency, the waiting time of Get in the result-processing stage must also be monitored, since this is the time that actually affects application performance. And because the bottom layer uses paged calls, the number of business-level calls differs from the number of underlying RPC calls.
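To monitor the Get wait separately from RPC latency, the time can be measured around the loop that collects a call's pages. `GetWaitMonitor` is a hypothetical helper; it also counts business calls and underlying page RPCs separately, since one business call may fan out into several pages. For brevity it uses `join()` without a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

// Tracks the time the business thread is actually blocked in Get, per logical call.
final class GetWaitMonitor {
    final AtomicLong businessCalls = new AtomicLong();
    final AtomicLong rpcPages = new AtomicLong();
    final AtomicLong waitNanos = new AtomicLong();

    <T> List<T> awaitAll(List<CompletableFuture<T>> pageFutures) {
        long start = System.nanoTime();
        try {
            List<T> results = new ArrayList<>();
            for (CompletableFuture<T> page : pageFutures) {
                results.add(page.join());
            }
            return results;
        } finally {
            businessCalls.incrementAndGet();              // one business-level call ...
            rpcPages.addAndGet(pageFutures.size());       // ... backed by N underlying page RPCs
            waitNanos.addAndGet(System.nanoTime() - start);
        }
    }
}
```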
3) Paged asynchronous results must not be merged, or the information about the failing Provider is lost
The results of the underlying asynchronous calls must be returned to the upper layer unchanged, through the wrapper class. Besides enabling the single-page retry described above, this is needed because the asynchronous result must be retained so that, when a page times out, the Provider that timed out can still be logged. The Provider information depends on the JSF framework's JSFCompletableFuture; if the results were merged at the bottom layer, this information would be lost.
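The sketch below illustrates why the page futures are passed through untouched. `ProviderAwareFuture` is a hypothetical stand-in for a future that remembers which provider served the call, the role the text assigns to JSF's `JSFCompletableFuture`, whose real API is not reproduced here. Because the hypothetical wrapper `PagedResult` keeps every page future, the provider of a timed-out page can still be reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a framework future that remembers which provider served the call.
class ProviderAwareFuture<T> extends CompletableFuture<T> {
    final String provider;

    ProviderAwareFuture(String provider) {
        this.provider = provider;
    }
}

// Wrapper handed to the business layer: the page futures are passed through untouched.
final class PagedResult<T> {
    final List<ProviderAwareFuture<List<T>>> pages;

    PagedResult(List<ProviderAwareFuture<List<T>>> pages) {
        this.pages = pages;
    }

    // Collects pages in order and reports the provider of every page that timed out.
    // Timed-out pages are skipped here; a real tool would retry them instead.
    List<T> collect(long timeoutMs, List<String> timedOutProviders) {
        List<T> merged = new ArrayList<>();
        for (ProviderAwareFuture<List<T>> page : pages) {
            try {
                merged.addAll(page.get(timeoutMs, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                timedOutProviders.add(page.provider);   // only possible because the page future survived
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            } catch (ExecutionException e) {
                throw new CompletionException(e.getCause());
            }
        }
        return merged;
    }
}
```

If the pages were instead merged with `CompletableFuture.allOf`, the caller would only hold a plain `CompletableFuture<Void>`, which has no place to carry the provider of an individual page.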
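4) The Get timeout of each page must be controlled separately
The paging call process is shown in the figure above. During result processing, the Get timeout of each page must be controlled separately: results are retrieved sequentially, so when fetching a later page, the time already spent waiting for earlier pages must be counted, ensuring that the total time spent retrieving results does not exceed the maximum timeout of a single page. The formula is:
timeout = RPC timeout > (current time - async call start time) ? RPC timeout - (current time - async call start time) : 0
In other words, each Get waits only for what remains of the RPC timeout, measured from the start of the asynchronous call, and never for a negative amount of time.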
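The formula translates directly into code. In this sketch, `PageTimeout` is a hypothetical helper: `remainingMs` is the formula itself, and `collect` applies it before each sequential Get, so the whole collection loop is bounded by one RPC timeout measured from when the asynchronous calls were issued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PageTimeout {
    // timeout = rpcTimeout > elapsed ? rpcTimeout - elapsed : 0,
    // where elapsed is measured from when the asynchronous page calls were issued.
    static long remainingMs(long rpcTimeoutMs, long asyncStartMs, long nowMs) {
        long elapsed = nowMs - asyncStartMs;
        return rpcTimeoutMs > elapsed ? rpcTimeoutMs - elapsed : 0;
    }

    // Gets pages one after another; each Get only waits for what is left of the shared budget,
    // so collecting all pages never takes longer than a single page's RPC timeout.
    static <T> List<T> collect(List<CompletableFuture<T>> pages, long rpcTimeoutMs, long asyncStartMs) {
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> page : pages) {
            long budget = remainingMs(rpcTimeoutMs, asyncStartMs, System.currentTimeMillis());
            try {
                // A zero budget still returns pages that have already completed.
                results.add(page.get(budget, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                throw new CompletionException("page missed the remaining budget of " + budget + " ms", e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new CompletionException(e);
            } catch (ExecutionException e) {
                throw new CompletionException(e.getCause());
            }
        }
        return results;
    }
}
```

For example, with a 100 ms RPC timeout, a Get issued 30 ms after the asynchronous calls started waits at most 70 ms, and one issued 100 ms or more after the start waits 0 ms, which still succeeds if that page has already completed.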
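5) Page balancing
To avoid data skew in which the last page carries very little data, the requested data should be split evenly across pages. Since the pages run in parallel, the overall latency is governed by the largest page, so evening out page sizes maximizes the performance of the whole request.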
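One way to balance pages is to keep the minimum number of pages, ceil(n / maxPerPage), and spread the items so that page sizes differ by at most one. `BalancedPaging` is a hypothetical helper showing this; it is one reasonable balancing rule, not necessarily the exact one used in the shopping cart.

```java
import java.util.ArrayList;
import java.util.List;

final class BalancedPaging {
    // Uses the fewest pages the limit allows, ceil(n / maxPerPage), and spreads the items
    // so that page sizes differ by at most one.
    static List<Integer> pageSizes(int n, int maxPerPage) {
        if (n < 0 || maxPerPage <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and maxPerPage > 0");
        }
        List<Integer> sizes = new ArrayList<>();
        if (n == 0) {
            return sizes;
        }
        int pages = (n - 1) / maxPerPage + 1;   // ceil(n / maxPerPage) without overflow
        int base = n / pages;
        int extra = n % pages;                  // the first `extra` pages take one more item
        for (int i = 0; i < pages; i++) {
            sizes.add(base + (i < extra ? 1 : 0));
        }
        return sizes;
    }

    // Cuts an actual key list according to the balanced sizes.
    static <K> List<List<K>> split(List<K> keys, int maxPerPage) {
        List<List<K>> pages = new ArrayList<>();
        int from = 0;
        for (int size : pageSizes(keys.size(), maxPerPage)) {
            pages.add(new ArrayList<>(keys.subList(from, from + size)));
            from += size;
        }
        return pages;
    }
}
```

For 101 items and a limit of 50, this yields pages of 34, 34 and 33 items instead of 50, 50 and 1.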


4. Benefits


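After the transformation, the latency of the shopping cart's core interfaces dropped by 30%, preserving the user experience and saving a large amount of server resources. When new RPC interfaces are added later, they have little impact on shopping cart performance as long as they are not on the critical path of the call topology. In addition, as cart capacity grows, the impact on performance is now relatively small, except for the few interfaces that cannot be called in pages.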


This article is shared from the WeChat official account JD Cloud Developers (JDT_Developers).
Origin my.oschina.net/u/4090830/blog/9020116