How Xianyu effectively shortened message processing time

Background

With the rapid growth of its user base, Xianyu's IM has faced unprecedented challenges. After years of business iteration, the layering of the client-side IM code is no longer clear, and data synchronization problems that used to be hidden in messaging have been amplified as the number of users grows.

The specific process is as follows: the server needs to synchronize data packets to the client. The server divides packets into different data domains according to the packet's business type, and each packet carries a unique, consecutive sequence number within its domain. After a packet is delivered to the client and successfully consumed, the client records the currently synchronized version number of each data domain. The next synchronization then starts from the locally stored domain version number and continues pushing data to the client.
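As a rough sketch (all class, field, and method names below are illustrative assumptions, not the real Xianyu protocol), the client's per-domain sync cursor and the incremental pull it drives might look like this in Kotlin:

```kotlin
// Hypothetical per-domain sync cursor; names are assumptions for illustration.
data class SyncCursor(
    val domainId: String,      // which data domain this cursor tracks
    var syncedVersion: Long    // version of the last packet successfully consumed
)

class SyncCursorStore {
    private val cursors = mutableMapOf<String, SyncCursor>()

    // Build an incremental pull request: ask the server for everything
    // after the locally recorded version of each domain.
    fun buildPullRequest(): Map<String, Long> =
        cursors.mapValues { (_, c) -> c.syncedVersion + 1 }

    fun versionOf(domainId: String): Long =
        cursors[domainId]?.syncedVersion ?: 0L

    // After a packet is consumed successfully, advance the domain cursor.
    fun advance(domainId: String, version: Long) {
        val cursor = cursors.getOrPut(domainId) { SyncCursor(domainId, 0L) }
        if (version > cursor.syncedVersion) cursor.syncedVersion = version
    }
}
```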

Of course, users are not always online waiting for messages, so the client uses a combination of push and pull to keep data synchronized:

  • When the user is online, ACCS pushes the latest data to the client in real time. (ACCS is Taobao Wireless's full-duplex, low-latency, high-security channel service for developers.)

  • When the App starts after being offline, the client pulls the data it missed while offline, based on its local data domain version numbers.

  • When a gap ("black hole") appears in the received data (i.e. packet versions within a domain are not consecutive), a data synchronization pull is triggered, as sketched below.
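A minimal sketch of that third case, assuming a packet simply carries its domain and version (the function and parameter names are hypothetical): the client accepts a packet only if its version directly follows the local cursor, and otherwise triggers a pull for the missing range.

```kotlin
// Hypothetical gap ("black hole") check; a real implementation would also
// buffer out-of-order packets instead of discarding them.
fun onPacketArrived(
    domainId: String,
    version: Long,
    localVersion: Long,
    triggerDomainPull: (domainId: String, fromVersion: Long) -> Unit
): Boolean = when {
    version <= localVersion -> false             // duplicate or stale, ignore
    version == localVersion + 1 -> true          // consecutive, safe to consume
    else -> {                                    // gap detected: versions were skipped
        triggerDomainPull(domainId, localVersion + 1)
        false                                    // hold off until the pull fills the gap
    }
}
```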

Analysis

The existing synchronization strategy can basically guarantee IM data synchronization, but it also carries some hidden problems:

  1. When data is pushed intensively within a short period, multiple data domain synchronizations are triggered in quick succession. If the data returned by a domain synchronization is problematic, yet another round of synchronization is triggered, wasting network resources. Redundant packets and invalid data occupy resources that should go to processing valid content, wasting CPU and memory.

  2. The server has no visibility into whether the client consumed the packets of a data domain normally; it can only passively return data based on the domain information the client reports.

  3. The split between data collection, message body parsing, and storage logic is not clear enough, making it impossible to A/B test a replacement of any single layer of the code.

In response to these problems, Xianyu IM was restructured into layers and the data synchronization layer was extracted. Beyond serving IM's own data synchronization, the hope is that, as its stability improves, this layer can also empower other business scenarios.

This article focuses on some of the solutions used for client-side IM data synchronization in Xianyu.

Data synchronization optimization

Split & layer

On the server side, after the business party produces a data packet, the current data domain information is attached to it, and the data is then pushed to the client through the data synchronization layer.

On the client side, after a data packet is received, the data domain information determines which business party should consume it. Once the packet is confirmed to be complete and consecutive within its data domain, the data body is unpacked and delivered to the business party for consumption, and the result of that consumption is acknowledged.

Extracting the data synchronization layer encapsulates the packing, unpacking, verification, and retry steps of data synchronization. The upper-level business only needs to declare which data domains it wants to monitor; when those domains have new data, it receives that data for consumption and no longer needs to care whether the packets are complete.

The business layer only needs to care about the protocol it exposes to the business, the data layer only needs to care about the protocol it packs on the data side, and the network layer is responsible for the actual data transmission.

  1. Align the data layer's transmission protocol so it describes the data domain information of the current packet body.

  2. Separate message processing, merging, and storage into data consumers.

  3. Make upstream and downstream depend on abstractions, removing dependence on concrete implementations (a sketch of such an abstraction follows this list).
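One way to picture the abstraction that both sides depend on (point 3 above) is a small consumer interface plus a registration API. Everything below is an illustrative sketch with assumed names, not Xianyu's actual code:

```kotlin
// Hypothetical consumer abstraction: business code implements this and never
// touches packing, continuity checks, or retries itself.
interface DataConsumer {
    val domainIds: Set<String>                               // data domains this consumer monitors
    fun onData(domainId: String, body: ByteArray): Boolean   // true = consumed successfully
}

// Hypothetical registration surface of the data synchronization layer.
interface DataSyncLayer {
    fun register(consumer: DataConsumer)                     // one consumer may watch many domains
    fun unregister(consumer: DataConsumer)
}
```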

Data layer structure model

Based on separating out the data model and systematizing the solutions to the problems above, the data synchronization layer is split into the following architecture:

Step 1: The ACCS long-lived connection is established when the App starts, ensuring the push channel is available, and a data pull is triggered based on the current local data domain information.

Step 2: Each data consumer registers its consumer information and the data domains it needs to monitor; this is a one-to-many relationship.

Step 3: When new data arrives at the client, the packet is put into the buffer pool of its data domain, and data reading is restarted once the batch of data has been gathered.

Step 4: According to the current data domain priorities, the highest-priority packet is popped and its domain version is checked against the consumer's expectation. If it matches, the packet is unpacked and handed to the consumer; if not, the domain information triggers an incremental data domain synchronization pull.

Step 5: While a domain synchronization pull is in progress, data reading is blocked. Packets arriving over ACCS during this time continue to accumulate in the corresponding data domain queue. Once the pull result comes back, the packets are sorted, deduplicated, and merged into the domain queue, and data reading is reactivated.

Step 6: After a packet body is correctly consumed, the local domain information is updated and the correctly processed domain information is reported to the server over the uplink channel.
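Putting steps 3-6 together, a heavily simplified dispatch loop might look like the sketch below. Class and method names are assumptions made for illustration; threading, error handling, and the real ACCS/uplink APIs are omitted.

```kotlin
// Hypothetical, heavily simplified dispatcher covering steps 3-6.
class DomainDispatcher(
    private val cursors: SyncCursorStore,                     // from the earlier sketch
    private val consumersByDomain: Map<String, DataConsumer>, // step 2 registrations
    private val ackToServer: (domainId: String, version: Long) -> Unit
) {
    // Step 3: per-domain buffer pool, keyed by packet version (see "Sorting strategy" below).
    private val buffers = mutableMapOf<String, MutableMap<Long, ByteArray>>()
    private val domainPriority = mutableMapOf<String, Int>()  // higher value = more urgent
    @Volatile var blockedForPull = false                      // step 5: reading blocked during a pull

    fun onPacket(domainId: String, version: Long, priority: Int, body: ByteArray) {
        domainPriority[domainId] = priority
        buffers.getOrPut(domainId) { mutableMapOf() }[version] = body
        if (!blockedForPull) drain()
    }

    // Step 4: pick the highest-priority domain with pending packets and consume
    // every packet whose version directly follows the local cursor.
    fun drain() {
        val domainId = buffers.entries
            .filter { it.value.isNotEmpty() }
            .maxByOrNull { domainPriority[it.key] ?: 0 }?.key ?: return
        val queue = buffers.getValue(domainId)
        val consumer = consumersByDomain[domainId] ?: return
        var next = cursors.versionOf(domainId) + 1
        while (true) {
            val body = queue.remove(next) ?: break            // gap: wait for the domain pull
            if (!consumer.onData(domainId, body)) break       // consumption failed: stop for now
            cursors.advance(domainId, next)                   // step 6: update local domain info...
            ackToServer(domainId, next)                       // ...and ack over the uplink channel
            next++
        }
    }
}
```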

Data Domain Synchronization Protocol

The data carried for domain synchronization does not need to be large, but it must clearly describe the packet's content (a sketch follows the list):

  • The target user's ID, used to verify that the packet is addressed correctly

  • The data domain ID and its priority information

  • The current packet's version within its data domain
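In Kotlin-style pseudocode, such a packet header might carry nothing more than the fields above (the names are assumptions for illustration, not the real protocol):

```kotlin
// Hypothetical packet header for domain synchronization; field names are illustrative.
data class DomainPacketHeader(
    val targetUserId: String,   // verify the packet is addressed to the current user
    val domainId: String,       // which data domain the packet belongs to
    val domainPriority: Int,    // current priority of that domain
    val version: Long           // the packet's version within its domain
)
```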

Sorting strategy

When gathering domain data, a sort is required whether we sort on write or search on read, and the best achievable time complexity is O(log n). In practice, however, within a data domain the packet Version numbers are consecutive and unique with no gaps, so the Version of the last stably consumed data body plus one is exactly the Version of the next packet. Version is therefore used as the key of a Map: this reduces the time complexity, and a packet arriving later with the same Version simply overwrites the earlier content.
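A small sketch of this idea (hypothetical names): because versions are consecutive, the next packet to read is always lastConsumedVersion + 1, so a plain map keyed by Version gives O(1) insertion and lookup, and a duplicate Version overwrites the earlier body.

```kotlin
// Hypothetical per-domain buffer keyed by Version instead of a sorted structure.
class DomainBuffer {
    private val packets = HashMap<Long, ByteArray>()   // Version -> packet body
    var lastConsumedVersion = 0L
        private set

    // O(1) insert; a duplicate Version simply overwrites the earlier content.
    fun put(version: Long, body: ByteArray) {
        packets[version] = body
    }

    // O(1) lookup: the next readable packet is always lastConsumedVersion + 1.
    fun pollNext(): ByteArray? {
        val body = packets.remove(lastConsumedVersion + 1) ?: return null
        lastConsumedVersion++
        return body
    }
}
```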

Some problems & solutions

Balancing multiple data sources with single-point consumption

Whenever a data packet for the current user is generated, it is pushed to the client over ACCS if the long-lived ACCS connection exists. If the App has been in the background for a while or has been killed, the ACCS connection is disconnected, and the packet can only be delivered offline to the user's notification panel. So whenever the App becomes active again, it must trigger a data synchronization from the server based on the locally stored data domain information.

Packets reach the client mainly through ACCS push and through pulls during domain synchronization, but each packet is consumed by a single consumer determined by the data domain it belongs to; that is, only one packet can be consumed at a time.

In stress testing, when the server intensively pushes packets to the client over ACCS in a short time, the packets arrive out of order. The discontinuous domain versions trigger new domain synchronizations, causing the same packet to reach the client multiple times over two different channels and wasting traffic.

While a domain synchronization is in progress, new packets generated at that moment are still pushed to the client; their data bodies are valid and need to be consumed correctly.

Solutions to these problems:

An intermediate data layer is inserted between data acquisition and data consumption. When a domain synchronization is triggered, data reading is blocked and packets pushed over ACCS are stored in a transfer station. Once the domain pull returns, the data is merged before the reading process is restarted.
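A minimal sketch of this transfer station, under the same assumptions as the earlier snippets: while a domain pull is in flight, ACCS packets are parked; when the pull returns, the two sources are merged (duplicates collapse on Version) and reading resumes.

```kotlin
// Hypothetical "transfer station" that reconciles ACCS push with a domain pull.
class DomainTransferStation(private val buffer: DomainBuffer) {
    private val parked = mutableMapOf<Long, ByteArray>()   // packets pushed during the pull
    @Volatile private var pulling = false

    fun onAccsPacket(version: Long, body: ByteArray) {
        if (pulling) parked[version] = body else buffer.put(version, body)
    }

    fun beginDomainPull() { pulling = true }                // block data reading

    fun onPullResult(pulled: Map<Long, ByteArray>) {
        // Merge pulled data first, then the packets parked during the pull;
        // identical Versions overwrite each other, which deduplicates the two channels.
        pulled.forEach { (v, body) -> buffer.put(v, body) }
        parked.forEach { (v, body) -> buffer.put(v, body) }
        parked.clear()
        pulling = false                                     // reactivate data reading
    }
}
```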

Data domain priority

Packets pushed to the client are divided by business priority. Packets generated by user-to-user chat have a higher priority than packets carrying operational messages, so they should reach the client faster and the packets of high-priority data domains need to be consumed first. The priority of a data domain also needs to be adjusted dynamically as it changes over time.

Solutions to this problem:

Each data domain has its own queue, and packets in higher-priority queues are read and consumed first.

The domain information carried in each packet body can be marked with the domain's current priority; when a domain's priority changes, the packet consumption priority strategy can be adjusted accordingly.
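As a sketch (again with hypothetical names), the reading side keeps one queue per domain and always drains the highest-priority domain that currently has data; when a packet reports a new priority for its domain, the mapping is updated before the next read.

```kotlin
// Hypothetical selection of which domain queue to drain next, with priorities
// that can be updated from the domain info carried in each packet body.
class PriorityScheduler {
    private val priorities = mutableMapOf<String, Int>()    // domainId -> current priority

    fun updatePriority(domainId: String, priority: Int) {   // called when a packet reports a change
        priorities[domainId] = priority
    }

    // Among the domains that currently have buffered packets, pick the one
    // with the highest priority to read and consume first.
    fun nextDomain(nonEmptyDomains: Collection<String>): String? =
        nonEmptyDomains.maxByOrNull { priorities[it] ?: 0 }
}
```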

Optimization effect

Beyond clarifying the layered structure, the data synchronization layer can now be easily decoupled from the services it depends on, and each link is pluggable. In data synchronization, the gains in message consumption time and traffic are most obvious under the stress test scenario.

Stress test scenario: push 100 out-of-order data packets within 500ms

Message processing time (from receipt to on-screen display) shortened by 31%

Traffic consumption (the cumulative size of packets ultimately pulled to the client) reduced by 35%

Follow-up plan

Improved data synchronization layer capabilities

The goal of data synchronization is not only to ensure that packets arrive at the client intact, but also to reduce data pulls as much as possible while keeping synchronization stable, so that every data fetch is useful. The data synchronization layer will next focus on further optimizing the effective data rate and the arrival rate.

  • Dynamically and intelligently adjust the data synchronization priority strategy according to the scenario.

  • Block long-connection push during pulls so that only push mode or pull mode is active at any given time, further reducing redundant packet pushes.

Xianyu IM client-side overall architecture upgrade

The upgraded data synchronization strategy mainly improves Xianyu IM's capabilities. With data synchronization layered out, the next step is to streamline message processing so that every stage can be monitored and traced, improving the rate at which IM packets are correctly parsed and stored.

  • With the data source side separated, subsequent IM refactoring will gradually split out message processing as well.

  • Report the key stages of message processing and build a complete monitoring system so that problems are discovered before users complain.

  • Dynamically self-check message integrity to minimize data compensation and backfilling.

Note: The upgrade will be available in the mid-November version, so stay tuned
