Building and applying a long connection component on Baidu's iOS client

Author | Baidu Message Center Team

Overview

Over the past decade, with the rapid development of mobile technology, mobile applications have become the main way people access the Internet, and they carry more and more services and functions. This places higher demands on the efficiency and stability of communication between the mobile client and the server. To achieve more efficient real-time communication and data synchronization, the long connection (persistent connection) has become a key technology. By keeping a persistent connection between client and server, the two sides can exchange data in real time while avoiding the cost of repeatedly establishing and tearing down connections, which improves user experience, service stability, and reliability.

This article discusses the practice of long connection technology on the mobile client, focusing on the technology choices and overall architecture behind Baidu's iOS long connection SDK. Using an IM (instant messaging) case, it also shows how persistent connections provide solutions for similar business scenarios in mobile applications.

The article has five main parts. The first gives an overview of long connection technology: its definition, a comparison with short connections, and common applications in the mobile Internet. The second briefly introduces Baidu's long connection service, including the background of its construction and the core flow of the service. The third focuses on the challenges and solutions in building the Baidu iOS long connection SDK, including protocol selection, DNS resolution optimization, and the design of the keep-alive mechanism. The fourth takes the IM instant messaging scenario as an example and shows how the SDK provides request forwarding and server-initiated push for business use. The last part summarizes the article and looks ahead to the future of persistent connections in mobile applications.

The full text is 7193 words, and the expected reading time is 18 minutes.

01 Introduction to long connection

1.1 Understanding long connections

A long connection, also called a persistent connection, is a connection over which multiple data packets can be sent in succession. While the connection is held open, if no data packets are being sent, both sides need to send link detection packets to confirm that the connection is still alive.

picture

1.2 Comparison of long connection and short connection

picture

1.3 Application of persistent connection in the field of mobile Internet

Persistent connections are widely used in the mobile Internet and play a key role in features such as real-time communication and message push. For example, common instant messaging apps such as WeChat and QQ meet the need for instant message delivery by maintaining a long connection between client and server. Online games, location services, and news push also use long connections to deliver updates or messages to users in real time. As long as the user is online, they can receive the latest information wherever they are, which greatly improves the user experience.

In general, long connections are a good choice for applications that communicate frequently and have strong requirements for real-time performance and data transmission efficiency.

02 Introduction to Baidu Long Connection Service

2.1 The background of building a unified long connection

Previously, each business on Baidu's mobile clients operated and maintained its own long connection. Construction and maintenance costs were high, and little could be reused. The plan was therefore to build a unified long connection component with high concurrency, low latency, and high reach. Such a component supports business access more flexibly and efficiently and can be offered as an independent long connection service to every app in the Baidu family, meeting each business's needs while improving service quality and reducing resource costs.

2.2 Main flow chart of persistent connection service

The Baidu persistent connection service has two parts: the persistent connection SDK on the client side and the persistent connection access layer on the server side. The access layer in turn contains an access control module and an access module, which are responsible for managing persistent connections and forwarding business data. The following figure shows the connection establishment and heartbeat keep-alive process, the business login and post-login push process, and the final disconnection process triggered by the SDK. The rest of the article discusses in more detail how the persistent connection SDK is implemented on iOS and how it is used by businesses in Baidu APP.

picture

03 Baidu's iOS solution to build a persistent connection SDK

3.1 An overview of the challenges of establishing a long connection on the client side

Building a complete persistent connection SDK on the client from scratch involves many technical considerations, including but not limited to: creating and maintaining connections; choosing network protocols; using encrypted transmission and verifying data sources to keep the connection secure; choosing a data format and compression to reduce payload size and improve transmission efficiency; and handling errors and exceptions. Developers need to choose the best implementation for their actual situation. The core work breaks down into the following two parts:

1. Connection creation: designing the complete connection establishment process and choosing the network protocol. Core metrics such as the connection success rate and connection latency need to be considered in the design;

2. Connection maintenance: successfully establishing the connection is only the first step. Both sides then need to maintain it so that communication can continue. This includes sending heartbeat packets regularly to keep the connection alive when no data has been exchanged for a long time, and reconnecting promptly after a disconnection to bring the connection back online.

3.2 Core Logic 1: Connection Creation

Establishing a persistent connection between the client and the server is the first thing the persistent connection SDK does. All business data transmission, uplink and downlink, depends on the connection having been established successfully. Establishing the connection is not a single simple operation but a staged process. This section discusses the technical points and implementation choices that had to be settled before designing and developing the connection establishment module, and then the architecture of the complete connection establishment process in Baidu's iOS persistent connection SDK.

3.2.1 Challenge ①: Protocol selection

The problem: UDP or TCP

In network programming, which transport-layer protocol to use is a basic but perennially debated question. UDP and TCP each have their own application scenarios: TCP provides reliable data transmission, while UDP offers higher transmission efficiency. The differences between the two are not covered here. Which protocol to choose depends on the overall application scenario, development cost, and deployment and operating costs.

Solution: TCP-based, while exploring the potential of QUIC with a small share of traffic

There are two sets of data transmission solutions in the Baidu iOS long connection SDK:

Solution 1: In the early stage of building the SDK, based on a survey of mature industry solutions and on development cost and ease of maintenance, the first solution was written with reference to the CocoaAsyncSocket framework. It is developed natively on sockets, uses TCP, supports TLS/SSL secure transmission, and is thread-safe. This solution is mature, easy to use, and has a high connection success rate. At present, 90% of user traffic on the Baidu APP iOS client goes through the long connection logic implemented by this solution.

Solution 2: Stable network transmission generally runs over TCP. However, as network infrastructure has improved, some problems in TCP's design have been exposed. Because TCP is implemented in the operating system kernel and in middlebox firmware, it is very hard to change significantly, and problems such as the long handshake during connection establishment and head-of-line blocking have not been well resolved. This led us to consider new possibilities, and the persistent connection SDK later introduced a second solution based on QUIC. QUIC is built on UDP and provides reliable transmission. Compared with HTTP/2 + TCP + TLS, QUIC has several advantages: it cuts the time spent on the TCP three-way handshake and the TLS handshake, improves congestion control, multiplexes streams without head-of-line blocking, and supports connection migration. The Baidu iOS persistent connection SDK currently implements QUIC through NWConnection. QUIC is relatively new, which also means there is more room for optimization in the engineering implementation. The second solution is still in a small-traffic experiment, and a lot of optimization work remains. Judging from the data collected during the rollout so far, the QUIC implementation performs better on both the connection success rate and connection latency.

3.2.2 Challenge ②: DNS resolution optimization

The problem: persistent DNS issues on domestic mobile networks

Because of domain name caching, resolution forwarding, and NAT at the LocalDNS recursive egress, the LocalDNS of domestic ISPs is prone to problems: DNS hijacking that makes services unavailable, and inaccurate DNS scheduling that degrades performance. The efficiency and accuracy of DNS resolution directly affect the quality of connection establishment, which in turn affects the company's business.

Solution: HTTPDNS

Therefore, the Baidu iOS long connection SDK uses the current mainstream industry solution, HTTPDNS, in place of LocalDNS resolution. HTTPDNS talks to the DNS server over HTTP, bypassing the operator's LocalDNS service, which effectively prevents domain name hijacking and improves resolution efficiency.

3.2.3 Complete solution: the overall process of long-term connection establishment on the Baidu iOS terminal

Connection timing

Baidu APP maintains a unified set of system events and life cycle events that components can listen to. Given the business characteristics of Baidu APP, the iOS persistent connection SDK triggers connection establishment after the environment-setup-complete event, that is, after the data needed for app startup, such as home page resources, has been loaded.

The complete process of establishing a connection

The following figure shows the four stages of connection establishment in the persistent connection SDK:

Get Token:

  • What the Token means: In the narrow sense, the Token is the access-token returned by the persistent connection access control module. It is later sent to the persistent connection access layer with the login request, and the access layer uses it to authenticate against the access control module. In the broad sense, data such as the transport protocol and access point is delivered along with the Token request, including but not limited to: whether the connection uses QUIC or TCP, whether IPv6 is preferred, the connection domain name and port, and the small-traffic switch for log management.

  • How the Token is obtained: the local cache is checked first, and a network request is sent only when the cache holds no valid data. The request is a short connection request based on NSURLSession. The Token is cached on both the server and the client and has an expiration time. If the Token has expired, this shows up as a failure of the login request in stage 4 of the figure below. The local Token cache is then cleared and a new connection attempt is triggered.

DNS domain name resolution: As mentioned above, the persistent connection SDK uses HTTPDNS instead of LocalDNS to prevent DNS hijacking and improve resolution efficiency. In addition, in the iOS Release environment, a local cache is kept to make resolution faster. When an HTTPDNS result is returned, the local cache is updated. The next connection attempt uses the cache first, and a network request is sent only if the cached entry is invalid.

Establishing a Socket connection: This stage involves the choice of transport protocol. As described earlier, the iOS persistent connection SDK is currently running a small-traffic experiment in which 10% of users connect over QUIC and 90% connect over TCP.

Long connection login request: The request carries the access-token obtained in the Get Token stage to complete authentication. A successful response means the whole connection establishment process is complete, and the business layer can start communicating over the long connection. If the login returns an error, a reconnection is triggered.

After these four stages complete successfully, the SDK keeps sending heartbeat packets on the link to keep the connection alive. Until an abnormal disconnection, or an active disconnection triggered by events such as the app moving to the background, the connection stays online and provides a channel for data transmission.

picture
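The four stages can be sketched as a small pipeline. This is only an illustrative sketch in Python, not the SDK's actual code: the class name `ConnectionPipeline`, the injected callables, and the TTL values are all hypothetical, and the retry limit is an assumption added so the example terminates. It shows the cache-first lookup for the Token and the DNS result, and the rule that a failed login clears the Token cache before the next attempt.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedValue:
    value: str
    expires_at: float  # absolute expiry time, in seconds

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class ConnectionPipeline:
    """Hypothetical four-stage connect flow: token, DNS, socket, login."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],         # returns (token, ttl)
        resolve_httpdns: Callable[[str], Tuple[str, float]],  # returns (ip, ttl)
        open_socket: Callable[[str], object],                 # returns a connection
        login: Callable[[object, str], bool],                 # True on success
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch_token = fetch_token
        self.resolve_httpdns = resolve_httpdns
        self.open_socket = open_socket
        self.login = login
        self.clock = clock
        self.token_cache: Optional[CachedValue] = None
        self.dns_cache: Dict[str, CachedValue] = {}

    def _token(self) -> str:
        # Stage 1: use the cached token unless it is missing or expired.
        now = self.clock()
        if self.token_cache is None or not self.token_cache.is_valid(now):
            token, ttl = self.fetch_token()
            self.token_cache = CachedValue(token, now + ttl)
        return self.token_cache.value

    def _resolve(self, host: str) -> str:
        # Stage 2: use the cached HTTPDNS answer unless it is missing or expired.
        now = self.clock()
        cached = self.dns_cache.get(host)
        if cached is None or not cached.is_valid(now):
            ip, ttl = self.resolve_httpdns(host)
            cached = CachedValue(ip, now + ttl)
            self.dns_cache[host] = cached
        return cached.value

    def connect(self, host: str, max_attempts: int = 2) -> Optional[object]:
        for _ in range(max_attempts):
            token = self._token()
            ip = self._resolve(host)
            conn = self.open_socket(ip)      # Stage 3: socket (TCP or QUIC)
            if self.login(conn, token):      # Stage 4: login with the token
                return conn
            # A login failure may mean the token expired: clear it and retry.
            self.token_cache = None
        return None
```

Injecting the network calls as plain functions keeps the sketch testable without a real server; in a real client these would be asynchronous.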

3.3 Core Logic 2: Connection Maintenance

3.3.1 Significance of maintaining connections

The previous section described the whole process of establishing a persistent connection. Once established, the connection is available, and services built on it can communicate normally. Keeping the connection available after that is just as important. If the connection is not maintained well, the server keeps pushing downlink notifications over a connection that has already failed and the client never receives them. This wastes resources, and the failure to re-establish the connection in time causes business losses.

3.3.2 Solutions for maintaining long connections

To handle the main causes of disconnection, the persistent connection SDK has mechanisms to keep the connection stable. They fall into two categories: heartbeat keep-alive and reconnection after disconnection.

picture

Solution ①: Heartbeat keep-alive

Definition of heartbeat keep-alive: Long connections are usually kept alive with an application-layer heartbeat, and a reconnection is performed when a heartbeat packet times out or reports an error. A heartbeat generally means that one end (usually the client) sends a custom instruction to the other end (usually the server) at regular intervals so that each side can tell whether the other is still alive. Because it is sent at a fixed interval, like a heartbeat, the mechanism is called heartbeat keep-alive.

Baidu iOS long connection SDK heartbeat keep-alive mechanism: After the login request succeeds, the returned data is parsed. If the server specifies a heartbeat interval, heartbeat packets are sent at that interval to keep the connection alive. If no interval is specified, the client defaults to sending a heartbeat every 60 s. The specific heartbeat keep-alive process is shown in the figure below.

picture
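The interval rule and the timeout-driven reconnection can be sketched as follows. This is a minimal Python sketch, not the SDK's code: the field name `heartbeat_interval`, the `HeartbeatMonitor` class, and the ack timeout value are hypothetical. Only the 60 s default and the "server-specified interval wins" rule come from the text above.

```python
DEFAULT_HEARTBEAT_SECONDS = 60  # client default when the server gives no interval


def heartbeat_interval(login_response: dict) -> int:
    """Pick the server-specified interval if present and positive, else 60 s."""
    interval = login_response.get("heartbeat_interval")  # hypothetical field name
    if isinstance(interval, (int, float)) and not isinstance(interval, bool):
        if interval > 0:
            return int(interval)
    return DEFAULT_HEARTBEAT_SECONDS


class HeartbeatMonitor:
    """Decides, on each timer tick, whether to send, wait, or reconnect."""

    def __init__(self, interval: float, timeout: float) -> None:
        self.interval = interval
        self.timeout = timeout    # assumed: how long to wait for the ack
        self.last_sent = None     # send time of the unacknowledged heartbeat
        self.next_due = 0.0

    def start(self, now: float) -> None:
        self.last_sent = None
        self.next_due = now + self.interval

    def tick(self, now: float) -> str:
        # An outstanding heartbeat that was never acknowledged means the link
        # is presumed dead, so the caller should reconnect.
        if self.last_sent is not None and now - self.last_sent >= self.timeout:
            return "reconnect"
        if self.last_sent is None and now >= self.next_due:
            self.last_sent = now
            return "send"
        return "idle"

    def on_ack(self, now: float) -> None:
        # The server answered: schedule the next heartbeat one interval later.
        self.last_sent = None
        self.next_due = now + self.interval
```

Passing the current time into `tick` rather than reading a clock inside it keeps the decision logic deterministic and easy to test.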

Solution ②: Disconnect and reconnect

Principle of reconnection after disconnection: In scenarios where the long connection may have been dropped (the app returning from the background to the foreground, changes in network status, and so on), the SDK checks whether the connection is still available and triggers the reconnection mechanism promptly when it is not.

Baidu iOS long connection SDK reconnection mechanism: The specific triggers for reconnection are shown in the figure below. Internally, the iOS long connection SDK maintains a serial queue and a single, unified record of the connection state, so duplicate connection attempts do not occur.

picture
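One way to picture how a single state record prevents duplicate connection attempts is the sketch below. It is written in Python under stated assumptions: the `ReconnectController` class, the state names, and the trigger strings are hypothetical, and a lock stands in for the SDK's serial queue.

```python
import threading
from enum import Enum
from typing import Callable


class State(Enum):
    DISCONNECTED = "disconnected"
    CONNECTING = "connecting"
    CONNECTED = "connected"


class ReconnectController:
    """Funnels every reconnect trigger through one guarded state record."""

    def __init__(self, start_connect: Callable[[str], None]) -> None:
        self._lock = threading.Lock()  # stands in for the serial queue
        self._state = State.DISCONNECTED
        self._start_connect = start_connect

    def on_trigger(self, reason: str) -> bool:
        """Called on foreground re-entry, network change, heartbeat timeout, etc.

        Returns True only if this trigger actually started a connection.
        """
        with self._lock:
            if self._state is not State.DISCONNECTED:
                return False  # already connecting or connected: ignore
            self._state = State.CONNECTING
        self._start_connect(reason)
        return True

    def on_connect_result(self, success: bool) -> None:
        with self._lock:
            self._state = State.CONNECTED if success else State.DISCONNECTED

    def on_disconnected(self) -> None:
        with self._lock:
            self._state = State.DISCONNECTED
```

Because the state check and the transition to `CONNECTING` happen under the same lock, two triggers that arrive at nearly the same time cannot both start a connection.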

04 Application and practice of long connection in Baidu APP

4.1 Implementation of persistent connection in Baidu APP

A persistent connection is a full-duplex connection between the client and the server. Once established, it can provide services such as request forwarding and server-initiated push to business parties. In Baidu APP, many features depend on long connections: real-time consulting services such as online medical consultation, college entrance examination application advice, and emotional and psychological counseling; real-time user communication such as sending live-stream bullet comments, joining the fan group chats of big-V (verified influencer) accounts, and private messaging with friends; and the basic capability that lets the cloud push configuration to the client promptly while the user is online. The persistent connection gives each business a stable and convenient channel for exchanging data with its own server.

The figure below is a schematic of the long connection and the businesses built on it in Baidu APP. The complete persistent connection module includes two parts: the persistent connection SDK on the client side and the persistent connection access layer on the server side. Acting as an intermediate channel between each business and its own server, it handles connection establishment and keep-alive as well as the two-way exchange of data between each business client and its server. The following sections focus on the practice of persistent connections in the IMSDK real-time chat scenario.

picture

4.2 The Practice of Long Connections in IM Instant Messaging Scenarios

4.2.1 Background introduction

IMSDK is the client SDK for in-app instant messaging that Baidu Message Center built for Baidu APP and other Baidu products. It covers a variety of user communication scenarios, including private chat, group chat, chat rooms, and live-stream bullet comments. It also helps businesses deliver push notifications to users and establishes communication channels between the B side and the C side. At present, the main functions, such as pulling the session list, pulling messages, sending messages, and marking messages as read, are all implemented over long connections. This section uses two common IM scenarios, a user sending a message and a user receiving a new message notification, to show how the data forwarding and server-initiated push capabilities of persistent connections are put to use in a business setting.

4.2.2 Practice 1: Users send messages in real-time chat scenarios

Practice scenario

In a real-time chat scenario, the user sends a message to a friend in the chat box. If the message fails to send, the application usually shows a red exclamation mark next to the message bubble. Internet users will be very familiar with this. Technically, the business client sends a request to its own server, and the server returns the result to the client. This is a typical scenario of frequent point-to-point communication and is well suited to long connections. The persistent connection SDK exposes a wrapped long connection request class. When an external business party such as IMSDK makes an uplink request, it creates an instance of this class, assigns the parameters and data required for the uplink to the instance, sets a callback closure to receive and handle the response data and result, and finally sends the request. The business does not need to deal with data transmission or forwarding. The long connection acts as the path between the business client and its server and handles this process as a black box.

Technical difficulties

For the persistent connection SDK, the most important and most complex point on this path is that the uplink requests and downlink notifications of all business parties run concurrently, so the SDK has to manage the data flow in an orderly way. Uplink requests are the write stream and downlink data is the read stream. The following briefly describes how the read and write streams are managed and how a request is matched with its response data.

Technical implementation

The persistent connection SDK maintains two queues for reading and writing data, a read queue and a write queue, plus a cache pool used to match request instances with their response data. When a business party sends an uplink request, the request task is added to the write queue, and if the stream is writable at that moment, writing is triggered as well. Once the socket connection is established, the task at the head of the write queue is taken out and written to the stream. After each write, the SDK checks whether the write queue is empty and, if not, continues with the next task at the head until the queue is empty. Likewise, once the socket connection is established, a read task is added to the read queue. If the stream is readable, the read task at the head of the queue is taken out and reading begins. After a successful read, another read task is added to the read queue, so reading continues in a loop.
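A minimal sketch of the two queues might look like the following. It is written in Python for brevity and is not the SDK's implementation: `WriteQueue`, `ReadLoop`, and their method names are hypothetical, and the socket is replaced by plain callables so the flow is easy to follow.

```python
from collections import deque
from typing import Callable, Deque


class WriteQueue:
    """Buffers uplink payloads until the socket is writable, then drains FIFO."""

    def __init__(self, write_to_socket: Callable[[bytes], None]) -> None:
        self._pending: Deque[bytes] = deque()
        self._write = write_to_socket
        self._writable = False

    def enqueue(self, payload: bytes) -> None:
        # Every uplink request first joins the queue; it is written at once
        # only when the stream is already writable.
        self._pending.append(payload)
        if self._writable:
            self._drain()

    def on_socket_connected(self) -> None:
        self._writable = True
        self._drain()

    def on_socket_closed(self) -> None:
        self._writable = False

    def _drain(self) -> None:
        # Keep taking the head task until the queue is empty.
        while self._pending:
            self._write(self._pending.popleft())


class ReadLoop:
    """Keeps one read task armed for as long as the socket is connected."""

    def __init__(self, on_frame: Callable[[bytes], None]) -> None:
        self._tasks: Deque[str] = deque()
        self._on_frame = on_frame

    def on_socket_connected(self) -> None:
        self._tasks.append("read")

    def on_bytes_available(self, frame: bytes) -> bool:
        if not self._tasks:
            return False           # no read task armed: nothing consumes the data
        self._tasks.popleft()
        self._on_frame(frame)
        self._tasks.append("read")  # re-arm so reading continues in a loop
        return True
```

Requests enqueued before the connection is ready are not lost; they are written in order as soon as `on_socket_connected` fires.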

The downlink data read from the stream is matched to its request using a unique key composed of serviceId (business number) + methodId (long connection request method number) + the timestamp at which the request was initiated. The key locates the corresponding request in the cache, and the request's callback returns the result to the caller. Once a request has been called back, its instance is removed from the cache, so memory is released promptly and a request is never called back more than once.

picture
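The matching step can be illustrated with a small cache keyed by the three fields. This Python sketch is an assumption-laden illustration rather than the SDK's code: `PendingRequests` and its method names are hypothetical, and the key is modelled as a tuple instead of whatever encoding the SDK actually uses.

```python
from typing import Callable, Dict, Tuple

RequestKey = Tuple[int, int, int]  # (serviceId, methodId, request timestamp)


class PendingRequests:
    """Cache of in-flight requests; each callback fires at most once."""

    def __init__(self) -> None:
        self._pending: Dict[RequestKey, Callable[[bytes], None]] = {}

    def register(
        self,
        service_id: int,
        method_id: int,
        timestamp_ms: int,
        callback: Callable[[bytes], None],
    ) -> None:
        self._pending[(service_id, method_id, timestamp_ms)] = callback

    def dispatch(
        self, service_id: int, method_id: int, timestamp_ms: int, payload: bytes
    ) -> bool:
        # pop() removes the entry before the callback runs, so a duplicate
        # response for the same key finds nothing and is dropped.
        callback = self._pending.pop((service_id, method_id, timestamp_ms), None)
        if callback is None:
            return False
        callback(payload)
        return True

    def __len__(self) -> int:
        return len(self._pending)
```

Removing the entry with `pop` before invoking the callback is what gives the "at most one callback per request" guarantee in this sketch.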

4.2.3 Practice 2: User receives new message notification in real-time chat scene

Practice scenario

In a real-time chat scenario, how does a user receive notifications of new messages sent by other users? The answer is downlink notifications from the server to the client. Persistent connections not only forward uplink requests for business clients but also provide server-initiated push. In the IM service, for example, receiving and fetching messages in real time relies on new message notifications pushed down by the IM server. How do these notifications reach IMSDK? The process is similar to the uplink request process described in the previous section.

Technical implementation

During initialization of IMSDK's persistent connection management class, the downlink notification methods it needs to receive are registered. Registration here actually means sending several uplink long connection requests, each with a corresponding serviceID (business number) and methodID (number of the notification method to register). Unlike the requests in the previous section, these requests are not removed from the SDK's request cache after response data arrives. They stay there for a long time. Whenever data for a given methodID is read from the stream, the corresponding request is found in the cache and the downlink data is passed to IMSDK. In this way, as long as the long connection is online, the business party receives downlink notifications from its server in real time.
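The contrast with one-shot requests can be shown with a registry whose entries are never removed on delivery. This is a hedged Python sketch, not the SDK's API: `NotificationRegistry` is a hypothetical name, and keying by the (serviceID, methodID) pair is an assumption, since the text only says the lookup uses the methodID.

```python
from typing import Callable, Dict, Tuple


class NotificationRegistry:
    """Long-lived handlers for server-initiated pushes."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[int, int], Callable[[bytes], None]] = {}

    def register(
        self, service_id: int, method_id: int, handler: Callable[[bytes], None]
    ) -> None:
        self._handlers[(service_id, method_id)] = handler

    def dispatch(self, service_id: int, method_id: int, payload: bytes) -> bool:
        # get(), not pop(): the handler stays registered so that every later
        # push with the same method number is delivered as well.
        handler = self._handlers.get((service_id, method_id))
        if handler is None:
            return False
        handler(payload)
        return True
```

For example, a handler registered once at initialization receives every later push:

```python
received = []
registry = NotificationRegistry()
registry.register(2, 196, received.append)  # hypothetical numbers
registry.dispatch(2, 196, b"new message 1")
registry.dispatch(2, 196, b"new message 2")
# received now holds both payloads, in arrival order
```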

05 Epilogue

The core of a persistent connection service can be roughly divided into three processes: connection establishment, connection maintenance, and data transmission. This article presented some of the challenges and solutions in building such a service and, drawing on the instant messaging scenario in Baidu APP, briefly introduced the overall architecture of the persistent connection SDK on Baidu's iOS client.

On mobile, persistent connection technology has broad prospects. As high-speed mobile networks such as 5G and 6G develop, mobile applications will be able to use persistent connections more efficiently and exchange data in a more real-time and efficient way. This opens up more room for scenarios with a strong demand for real-time data exchange, such as the Internet of Things, smart homes, virtual reality, and augmented reality, where long connections will play an even more important role.

——END——

