Featured Cases | Zhihu Unveils Its High-Performance Long-Connection Gateway for Tens of Millions of Users

 

Real-time responsiveness is always exciting: the "typing…" indicator in WeChat, your teammates' instant reactions in Honor of Kings, the stream of "666" everyone sends in a live-stream barrage. All of these rely on long-connection technology.

 

Almost every Internet company has a persistent-connection system, used for news alerts, instant messaging, push notifications, live barrage, games, shared location, stock quotes, and similar scenarios. Once a company reaches a certain scale and its business scenarios grow more complex, multiple businesses are likely to need the persistent-connection system at the same time.

 

Building a separate long-connection system for each business would sharply increase R&D and maintenance costs, waste infrastructure, raise client power consumption, and prevent reuse of existing experience. A shared persistent-connection system, on the other hand, must reconcile the different systems' requirements for authentication, authorization, data isolation, protocol extension, and message-delivery guarantees. Protocols must also stay forward compatible across iterations, and the shared system makes capacity management harder.

 

After more than a year of development and evolution, serving several internal and external apps, onboarding more than a dozen long-connection businesses with different needs and forms, sustaining millions of concurrently online devices, and weathering sudden large-scale message bursts, we have distilled a general gateway solution for persistent-connection systems that solves the various problems encountered when multiple businesses share long connections.

 

The Zhihu persistent-connection gateway is dedicated to decoupling business data, distributing messages efficiently, solving capacity problems, and providing a degree of message-reliability guarantee.

 

How do we design the communication protocol?

 

Business decoupling

 

A long-connection gateway that supports multiple businesses actually connects to many clients and many business backends at the same time. It is a many-to-many relationship, with each client communicating with the gateway over a single long connection.

Such a many-to-many system must avoid tight coupling in its design. The business side's logic is also adjusted dynamically; if business protocols and logic were coupled into the gateway, all businesses would become entangled with one another, and protocol upgrades and maintenance would be extremely difficult.

 

So we adopted the classic publish-subscribe model to decouple the persistent-connection gateway from both the client and the business backend. The two sides only need to agree on a topic, and can then publish and subscribe to each other freely. The transmitted message is pure binary data; the gateway does not need to care about the business's protocol specification or serialization format.
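As a minimal illustration of this decoupling, here is a sketch (in Python, with class and method names of our own choosing, not Zhihu's actual code) of a topic bus that forwards opaque byte payloads without ever inspecting them:

```python
# Illustrative sketch: the gateway only matches topics and forwards opaque
# bytes; serialization is entirely the business side's concern.
from collections import defaultdict
from typing import Callable, Dict, List


class PubSubGateway:
    """Minimal topic bus: subscribers register a callback per topic."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        # The payload is pure binary data; the gateway never parses it.
        handlers = self._subs.get(topic, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)  # how many subscribers received the message


received = []
gw = PubSubGateway()
gw.subscribe("live/165218", received.append)
delivered = gw.publish("live/165218", b'{"danmaku": "666"}')
```

Because the gateway sees only topic strings and byte payloads, businesses can change their own message formats without touching the gateway.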

Access control

 

Publish-subscribe decouples the gateway's implementation from the business side, but we still need to control clients' permission to publish and subscribe to topics, to prevent intentional or accidental data pollution and unauthorized access.

 

Suppose a lecturer is presenting in Zhihu Live channel 165218. When a client enters the room and tries to subscribe to channel 165218's topic, the Zhihu Live backend needs to determine whether the current user has paid. Permission in this case is very dynamic: a user may subscribe after paying, but not before. Only the Zhihu Live backend knows the permission state; the gateway cannot judge it independently.

 

Therefore we designed a callback-based authorization mechanism into the ACL rules: subscribe and publish actions on Live-related topics can be configured to be forwarded, via HTTP callback, to Live's backend service for a decision.

At the same time, from our observation of internal businesses, in most scenarios a business only needs a private per-user topic through which the current user receives notifications or messages from the server; authorizing every such subscription through a callback would be very cumbersome.

 

Therefore we designed topic template variables in the ACL rules to lower the business side's access cost. A business can configure its subscribable topics to contain a placeholder for the connected user's name, meaning users are allowed to subscribe or send messages only to their own topics.

With such a rule, the gateway can quickly determine on its own whether a client may subscribe or send messages to a topic, without consulting the business side.
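The two ACL mechanisms above might look roughly like the following sketch. The rule format, the `{username}` placeholder syntax, and the `PAID_USERS` stub standing in for the Live backend's HTTP callback are all illustrative assumptions, not Zhihu's actual configuration:

```python
# Illustrative ACL check: template rules are decided locally by the gateway;
# callback rules defer to the business backend (stubbed here as a lambda
# instead of a real HTTP call).
PAID_USERS = {"alice"}  # stand-in for the Live backend's payment records


def check_acl(rules, username, topic, action):
    for rule in rules:
        if rule["action"] != action:
            continue
        template = rule.get("template")
        if template is not None and template.format(username=username) == topic:
            return True  # gateway decides alone, no business round trip
        if rule.get("match") == topic and "callback" in rule:
            return rule["callback"](username, topic)  # e.g. HTTP callback
    return False


RULES = [
    # Each user may only subscribe to their own private topic.
    {"action": "subscribe", "template": "notice/{username}"},
    # Live channel subscriptions are decided by the Live backend.
    {"action": "subscribe", "match": "live/165218",
     "callback": lambda user, topic: user in PAID_USERS},
]
```

A template rule never leaves the gateway process, which is exactly why it is so much cheaper than the callback path.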

 

Message Reliability Guarantee

 

As the hub of message transmission, the gateway connects to the business backend and the client simultaneously, and must keep messages reliable while forwarding them.

 

TCP only guarantees order and reliability during transmission. When the TCP connection state becomes abnormal, the client's receiving logic fails, or the client crashes, in-flight messages are lost.

 

To ensure that messages sent downstream or uploaded upstream are processed by the peer, we implemented receipts and retransmission. For important business messages, the client must return a receipt after receiving and processing the message correctly. The gateway temporarily stores messages the client has not yet acknowledged, and keeps resending until the client's receipt arrives.

For high-traffic server-side businesses, having the server return a receipt to the gateway for every single message is inefficient, so we also provide a message-queue-based send-and-receive path, described in detail in the publish-subscribe section below.
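A minimal sketch of the receipt-and-retransmission idea follows. The class name, retry limit, and timeout hook are our own assumptions; the article does not specify them:

```python
# Illustrative receipt/retransmit store: a message stays in the pending map
# until the client ACKs it; a timeout triggers a resend, up to a retry limit.
class PendingStore:
    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.pending = {}  # msg_id -> [payload, send_attempts]

    def send(self, msg_id, payload, transport) -> None:
        self.pending[msg_id] = [payload, 0]
        self._transmit(msg_id, transport)

    def _transmit(self, msg_id, transport) -> None:
        payload, attempts = self.pending[msg_id]
        self.pending[msg_id][1] = attempts + 1
        transport(msg_id, payload)

    def on_timeout(self, msg_id, transport) -> bool:
        """Resend an unacknowledged message; give up past the retry limit."""
        if msg_id not in self.pending:
            return False
        if self.pending[msg_id][1] >= self.max_retries:
            del self.pending[msg_id]  # escalate or drop after N attempts
            return False
        self._transmit(msg_id, transport)
        return True

    def on_ack(self, msg_id) -> None:
        self.pending.pop(msg_id, None)  # receipt arrived: stop retrying


wire = []
store = PendingStore(max_retries=3)
store.send(7, b"important", lambda msg_id, payload: wire.append(msg_id))
```

In the real gateway the timeout would be driven by a timer wheel or scheduler rather than an explicit `on_timeout` call.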

 

In designing the communication protocol we referred to the MQTT specification, extended its authentication and authorization design, achieved isolation and decoupling of business messages, and guaranteed a degree of transmission reliability. The protocol also stays largely compatible with MQTT, so existing MQTT clients can be used directly, lowering the business side's access cost.

 

How do we design the system architecture?

 

When designing the overall architecture of the project, our priorities are:

 

  • Reliability

  • Horizontal scalability

  • Dependent component maturity

 

Simple is trustworthy.

 

To ensure reliability, we did not follow traditional persistent-connection systems in folding data storage, computation, message routing, and other components into one large distributed system of our own to maintain, which would increase implementation and maintenance complexity. Instead, we separated these concerns and delegated storage and message routing to specialized systems, keeping each component's function as simple and clear as possible.

 

We also need the ability to scale horizontally and quickly. Marketing campaigns on the Internet can cause the number of connections to surge, and in a publish-subscribe system the number of delivered messages grows linearly with the number of topic subscribers, which multiplies the storage pressure of the unacknowledged messages the gateway holds for clients. By splitting out each component and minimizing in-process state, we can deploy the services in containers and use them to scale horizontally, quickly and almost without limit.

 

The final designed system architecture is as follows:

The system consists of four main components:

 

1. Access layer, implemented with OpenResty, responsible for connection load balancing and session affinity

2. Long-connection Broker, deployed in containers, responsible for protocol parsing, authentication and authorization, sessions, and publish-subscribe logic

3. Redis, which persists session data

4. Kafka message queue, which distributes messages to Brokers or to business parties

 

Kafka and Redis are mature basic components widely used in the industry. Both have been platformized and containerized inside Zhihu, and can be scaled out in minutes.

 

How do we build a long connection gateway?

 

Access layer

 

OpenResty is a widely adopted Nginx distribution with Lua scripting support, offering excellent flexibility, stability, and performance. We chose it when evaluating options for the access layer.

 

The access layer is the tier closest to the user, and it needs to do two things:

 

1. Load balancing: keep the number of connections roughly balanced across the long-connection Broker instances

2. Session affinity: make a given client always connect to the same Broker, which underpins message-delivery reliability

 

Many algorithms can implement load balancing; random selection and various hash schemes all work reasonably well. Session affinity is the troublesome part.

 

A common layer-4 load-balancing strategy is consistent hashing on the connection's source IP. While the number of nodes stays constant, the same IP always hashes to the same Broker, and even when the node count changes slightly, most connections still find their previous node.

 

We used the source-IP hashing strategy at first, but it has two main drawbacks:

 

1. The distribution is uneven: some source IPs are the NAT egress of large LANs carrying huge numbers of connections, which unbalances the connection counts across Brokers.

2. Clients cannot be identified precisely: when a mobile client disconnects and switches networks, its IP changes and it may not reconnect to its previous Broker.

 

We therefore turned to layer-7 load balancing, running consistent hashing on a unique client identifier: the distribution is more even, and routing stays correct after a network switch. The conventional way to do this is to fully parse the communication protocol and forward packets accordingly, which is costly and adds the risk of protocol-parsing errors.

 

In the end we used Nginx's preread mechanism to implement layer-7 load balancing. It is minimally intrusive to the long-connection Broker's implementation, and the access layer's resource overhead stays small.

 

When Nginx accepts a connection, it can be told to pre-read the connection's initial data into a preread buffer. By parsing the first packet the client sends, we extract the client ID from the preread buffer, then run consistent hashing on that ID to pick a fixed Broker.
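The consistent-hashing step could be sketched as follows. The article does not disclose the actual hash function or virtual-node count, so MD5 and 100 vnodes per Broker here are assumptions for illustration:

```python
# Illustrative hash ring: each Broker contributes many virtual nodes; a
# client ID is routed to the first vnode clockwise from its own hash, so a
# given client always lands on the same Broker while the ring is unchanged.
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, brokers, vnodes: int = 100) -> None:
        self._ring = sorted(
            (_hash(f"{broker}#{i}"), broker)
            for broker in brokers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    def pick(self, client_id: str) -> str:
        """Route a client to the first virtual node clockwise from its hash."""
        idx = bisect.bisect(self._keys, _hash(client_id)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["broker-1", "broker-2", "broker-3"])
```

Hashing on the client ID rather than the source IP is what keeps routing stable when a mobile client switches networks and its IP changes.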

 

Publish and subscribe

 

We introduced Kafka, a message queue widely used in the industry, as the hub for internal message transmission. Some of the reasons were mentioned earlier:

 

  • Reduce the internal state of the long-connection Broker, so Brokers can scale without pressure

  • Kafka is already platformized inside Zhihu and supports horizontal scaling

 

Some other reasons are:

 

  • The message queue absorbs peaks, preventing sudden bursts of upstream or downstream messages from overwhelming the system

  • Kafka is already widely used to move data between business systems, reducing the cost of integrating with business parties

 

Peak shaving with a message queue is self-explanatory; let's look at how Kafka helps us integrate with the business side.

 

(1) Publish

 

The long-connection Broker publishes messages to Kafka topics according to its routing configuration, and also consumes Kafka according to its subscription configuration, sending the consumed messages to subscribing clients. Because routing rules and subscription rules are configured independently, four combinations are possible:

 

1. Messages are routed to a Kafka topic but never consumed, suitable for data-reporting scenarios.

2. Messages are routed to a Kafka topic and consumed from it, a common instant-messaging scenario.

3. Messages are consumed and distributed directly from a Kafka topic, used for pure server push.

4. Messages are routed to one topic and then consumed from another, used when messages need filtering or preprocessing.

This routing scheme is highly flexible and covers the message-routing needs of almost every scenario. And because publish-subscribe is built on Kafka, message reliability is guaranteed even when processing large volumes of data.
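Assuming a simple table-driven configuration (the table format here is invented for illustration, not Zhihu's actual config), the four cases fall out of filling in the two directions independently:

```python
# Illustrative routing tables: the uplink table (client topic -> Kafka topic)
# and the downlink table (Kafka topic -> client topic) are configured
# separately, yielding the four cases described above.
ROUTING = {                                 # uplink
    "metrics/report": "kafka.report",      # case 1: routed, never consumed
    "chat/room1": "kafka.chat",            # case 2: routed and consumed
    "moderated/room2": "kafka.raw",        # case 4: preprocessed elsewhere
}
SUBSCRIPTION = {                            # downlink
    "kafka.chat": "chat/room1",            # case 2
    "kafka.push": "push/all",              # case 3: pure server push
    "kafka.filtered": "moderated/room2",   # case 4: consumed from another topic
}


def uplink(client_topic):
    """Kafka topic a client message is routed to, or None."""
    return ROUTING.get(client_topic)


def downlink(kafka_topic):
    """Client topic a consumed Kafka message is fanned out to, or None."""
    return SUBSCRIPTION.get(kafka_topic)
```

Case 4 is visible here as the asymmetry: uplink goes to `kafka.raw`, but the Broker consumes from `kafka.filtered` after some external job has filtered the stream.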

 

(2) Subscribe

 

When the long-connection Broker consumes a message from a Kafka topic, it looks up the local subscription relationships and distributes the message to the matching client sessions.

 

We initially stored clients' subscription relationships in a single HashMap. When a client subscribed to a topic, we put its session object into the map keyed by the topic; when checking a message's subscriptions, we could fetch the sessions directly by topic.

 

Because this subscription map is a shared object, concurrent subscribes and unsubscribes all operate on it. To prevent concurrent writes we put a lock around the HashMap, but contention on this global lock was severe and seriously hurt performance.

 

In the end we refined the lock granularity through sharding, spreading the lock contention out.

 

We create a few hundred HashMaps locally; to access a key, we hash it and take the modulo to select one of the maps, then operate on that map. The single global lock is thus split across hundreds of maps, which greatly reduces contention and improves overall performance.
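A sketch of this lock-striping technique (in Python for illustration; the Broker itself is presumably not written in Python, and the shard count of 256 is an arbitrary choice):

```python
# Illustrative striped subscription store: N small maps, each guarded by its
# own lock, instead of one global lock over a single map.
import threading


class ShardedSubscriptions:
    def __init__(self, shards: int = 256) -> None:
        self._shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard(self, topic: str):
        # Hash + modulo picks the shard; contention is confined to one shard.
        return self._shards[hash(topic) % len(self._shards)]

    def subscribe(self, topic: str, session) -> None:
        data, lock = self._shard(topic)
        with lock:
            data.setdefault(topic, set()).add(session)

    def unsubscribe(self, topic: str, session) -> None:
        data, lock = self._shard(topic)
        with lock:
            data.get(topic, set()).discard(session)

    def subscribers(self, topic: str):
        data, lock = self._shard(topic)
        with lock:
            return set(data.get(topic, ()))
```

Two threads touching different topics now contend only if their topics happen to hash into the same shard, which with hundreds of shards is rare.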

 

Session

 

(1) Persistence

 

After a message is distributed to a Session object, the Session controls its delivery.

 

The Session checks whether the message belongs to an important topic. If so, it marks the message as QoS level 1, stores it in the client's unacknowledged-message queue in Redis, and then sends it to the client. Once the client ACKs the message, the Session deletes it from the unacknowledged queue.
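A sketch of this QoS-1 path follows. A plain dict stands in for Redis here, and the topic set, key naming, and method names are invented for illustration; real code would use an actual Redis client so the queue survives a Broker crash:

```python
# Illustrative QoS-1 delivery: important messages are persisted to the
# client's unacknowledged queue BEFORE sending, then removed on ACK.
class Session:
    IMPORTANT_TOPICS = {"chat/room1"}  # illustrative set of QoS-1 topics

    def __init__(self, client_id, store, transport):
        self.key = f"unacked:{client_id}"  # per-client queue key (our naming)
        self.store = store                 # dict standing in for Redis
        self.transport = transport
        self.store.setdefault(self.key, {})

    def deliver(self, msg_id, topic, payload):
        qos = 1 if topic in self.IMPORTANT_TOPICS else 0
        if qos == 1:
            # Persist before sending, so a crash cannot lose the message.
            self.store[self.key][msg_id] = payload
        self.transport(msg_id, payload)
        return qos

    def on_ack(self, msg_id):
        self.store[self.key].pop(msg_id, None)  # ACK received: forget it


store = {}
sent = []
session = Session("client-1", store, lambda msg_id, payload: sent.append(msg_id))
```

Because the queue lives outside the Broker process, a client that reconnects to a different Broker can rebuild its session from the same key.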

 

Some industry solutions keep this list in memory, but that data cannot be migrated during scale-out or scale-in. Others maintain distributed in-memory storage inside the persistent-connection cluster, which increases implementation complexity.

 

We keep the unacknowledged-message queue in external persistent storage, so that after a single Broker goes down, a client reconnecting to a different Broker can still recover its session data. This also lightens the burden of scaling out and in.

 

(2) Sliding window

 

Each QoS 1 message must be transmitted, processed by the client, and acknowledged with an ACK before delivery is complete, and that round trip takes time. With a large message volume, waiting out a full round trip per message before sending the next cannot make full use of the channel's bandwidth.

 

To keep sending efficient, we designed a parallel sending mechanism modeled on TCP's sliding window: we set a threshold, the sending window, meaning that up to that many messages may be in flight on the channel awaiting acknowledgment at the same time.

Our application-layer sliding window differs somewhat from TCP's.

 

Within TCP's sliding window, IP packets are not guaranteed to arrive in order. Our communication runs on top of TCP, so the business messages inside our window are ordered; only abnormal situations, such as a broken connection state or faulty client logic, can put some of a window's messages out of order.

 

Because TCP guarantees in-order delivery, no per-message retry is needed on the normal sending path; unacknowledged messages in the window are resent only after the client reconnects. The receiving end also keeps a window-sized deduplication buffer, ensuring the business side never receives duplicate messages.
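The application-layer window and the receiver's dedup buffer can be sketched as follows (the window size, class names, and message-id scheme are illustrative assumptions):

```python
# Illustrative sliding-window sender plus window-sized dedup receiver: up to
# `window` messages may be in flight awaiting ACK; each ACK frees a slot.
from collections import OrderedDict, deque


class WindowSender:
    def __init__(self, window: int = 4) -> None:
        self.window = window
        self.in_flight = OrderedDict()  # msg_id -> payload, in send order
        self.queue = deque()            # submitted but not yet sent

    def submit(self, msg_id, payload, transport):
        self.queue.append((msg_id, payload))
        self._pump(transport)

    def _pump(self, transport):
        # Send while there is room in the window.
        while self.queue and len(self.in_flight) < self.window:
            msg_id, payload = self.queue.popleft()
            self.in_flight[msg_id] = payload
            transport(msg_id, payload)

    def on_ack(self, msg_id, transport):
        self.in_flight.pop(msg_id, None)
        self._pump(transport)  # an ACK frees a slot: send the next message


class DedupReceiver:
    def __init__(self, window: int = 4) -> None:
        self.seen = deque(maxlen=window)  # window-sized dedup buffer

    def receive(self, msg_id) -> bool:
        """Return True if the message is new to the business side."""
        if msg_id in self.seen:
            return False
        self.seen.append(msg_id)
        return True


wire = []
sender = WindowSender(window=2)
for i in range(5):
    sender.submit(i, b"msg", lambda m, p: wire.append(m))
```

With a window of 2, only the first two messages go out immediately; each subsequent message is released by an incoming ACK, keeping the pipe full without unbounded in-flight state.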

 

This TCP-based sliding window preserves message order while greatly improving transmission throughput.

 

Final thoughts

 

The Infrastructure team is responsible for Zhihu's traffic entry points and internal infrastructure. Externally, we hold the front line against massive traffic; internally, we provide rock-solid infrastructure for every business. Every request from the outside world and every call on the intranet is closely tied to our systems.

 

Article source: https://zhuanlan.zhihu.com/p/66807833

 

Want to see more Zhihu cases after reading this article? On July 6-7, at the 43rd MPD Workshop in Beijing, we have invited Zhihu test architect Wang Shouyu to give a three-hour in-depth talk:

 

Course summary: The QA team's main goal is to ensure the quality of product delivery. In practice we run into many common problems: poor test quality, heavy manual testing, and frequent production failures. In this topic, Zhihu's QA team presents a new answer: achieving high-quality products by building a quality culture in which every employee cares about quality and participates in quality assurance.

 

Beyond Zhihu, front-line experts and technologists from Alibaba, Baidu, Tencent, Sina, Qunar, Didi, NetEase, VIPKID, and other companies will share through talks, hands-on exercises, group discussions, and inter-group PK sessions, speaking through real cases, recreating real work scenarios, and delivering practical skills.
