WebSocket Cluster Solution Summary

Problem

Suppose we have a chat application where the client communicates with the server in real time via WebSocket. In a stand-alone environment, all WebSocket connections are handled by a single server.

Stand-alone scenario

After user A and user B establish connections with the web server, user A sends a message to the server, and the server pushes it to user B. In a stand-alone system, all users connect to the same server, and all sessions are stored on that server.

But as the number of users increases, we need to scale the application into a WebSocket cluster to provide better performance and scalability.

Cluster scenario

Once the application becomes a cluster, user A may connect to node 1 while user B connects to node 2. User A sends a message to the server, but node 1 holds no connection to user B and can no longer push the message to them.

A common problem in WebSocket clusters is connection state synchronization. When a node accepts a connection request from a client and establishes a WebSocket connection with it, the other nodes also need to know that this connection exists. That way, when another node receives a message for that client, it can route the message to the node that holds the client's connection.

If connection state is not synchronized, messages may be misrouted or lost. For example, suppose a client connects to node A, and nodes B and C do not know that this connection exists. When node B receives a message and tries to push it to the client, it fails because it does not know the client is connected to node A.
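The failure described above can be reproduced with a minimal sketch. The `Node` class, its `sessions` dictionary, and the list used as an outbox are all hypothetical stand-ins for a real server and its WebSocket connections; no networking is involved.

```python
class Node:
    """A server node that only knows about its own WebSocket connections."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # user id -> list standing in for the socket's outbox

    def connect(self, user_id):
        self.sessions[user_id] = []

    def push(self, user_id, message):
        """Deliver to a locally connected user; return False if the user is unknown here."""
        outbox = self.sessions.get(user_id)
        if outbox is None:
            return False
        outbox.append(message)
        return True


node1, node2 = Node("node1"), Node("node2")
node1.connect("A")
node2.connect("B")

# User A's message arrives at node 1, which has no session for user B.
delivered = node1.push("B", "hello from A")
print(delivered)  # False
```

Node 2 holds the connection to user B but never hears about the message, which is the gap the solutions below try to close.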

In this case, clients may miss important messages or receive pushes inconsistently. Connection state must also be synchronized when a client disconnects. If a node still believes a client is active after it has disconnected, the node may keep trying to push messages to it, wasting resources and bandwidth.

To solve this problem, the cluster needs a mechanism that keeps connection state consistent and synchronized across nodes, so that messages are delivered and pushed to the right client.

Solutions

Solution 1: Session sharing (not feasible)

Sharing sessions is not a suitable way to solve connection state synchronization in a WebSocket cluster. With HTTP, a shared session (for example, session data stored in Redis) can solve clustering problems, but this approach does not work for WebSocket.

A WebSocket session is different from an HTTP session. It is state tied to a live connection, not state tied to requests as an HTTP session is. A WebSocket connection cannot be shared between servers, so a WebSocket session cannot be placed in shared storage.

In other words, an HTTP short connection is stateless, whereas a WebSocket long connection is stateful and bound to the node it is connected to.

Solution 2: Load Balancer (Stateful Routing)

In WebSocket clusters, stateful routing using a load balancer is a common solution. The load balancer can distribute client connection requests to different nodes in the cluster to achieve load balancing and high availability.

User A and user B establish connections with node 1, and user C establishes a connection with node 2. User A sends a message to the server, and because both users are on node 1, the server can push it to user B.

Fixed parameter hashing: This strategy hashes a specific parameter in the request (such as a meeting ID) and routes requests with the same hash result to the same node. This ensures that the same session, or a set of related connections, is always routed to the same node. The advantage is that related connections stay together on one node. The disadvantage is that node load can become unbalanced, because some parameter values may concentrate a large number of connections on certain nodes.

Here is an example Nginx configuration showing how to load balance WebSocket connections with a fixed parameter hash:

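http {
    upstream websocket_backend {
        # $arg_meeting_id is the meeting_id query parameter; connections with
        # the same value go to the same backend (consistent hashing).
        hash $arg_meeting_id consistent;
        server backend1.example.com:8080;
        server backend2.example.com:8080;
        server backend3.example.com:8080;
    }

    server {
        listen 80;

        location /websocket {
            proxy_pass http://websocket_backend;
            # WebSocket needs HTTP/1.1 and the Upgrade handshake headers.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
        }
    }
}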
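
To make the routing rule and its load-imbalance drawback concrete, here is a minimal Python sketch. It uses a plain modulo hash, which is a simplification and not the consistent-hashing algorithm Nginx applies with the `consistent` flag. The function name `pick_backend` and the meeting sizes are assumptions made up for illustration.

```python
import hashlib
from collections import Counter

BACKENDS = [
    "backend1.example.com:8080",
    "backend2.example.com:8080",
    "backend3.example.com:8080",
]


def pick_backend(meeting_id, backends=BACKENDS):
    """Map a meeting ID to one backend; the same ID always maps to the same backend."""
    digest = hashlib.sha256(meeting_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical meetings: id -> number of participants.
meetings = {"standup": 5, "all-hands": 500, "retro": 8}

load = Counter()
for meeting_id, participants in meetings.items():
    load[pick_backend(meeting_id)] += participants

# Whichever backend receives "all-hands" carries at least 500 connections.
print(load)
```

Every participant of one meeting lands on the same backend, so a single large meeting dominates that backend's load regardless of how evenly the meeting IDs themselves are spread.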

Solution 3: Broadcast Mechanism (Asynchronous Mode, Recommended)

Broadcast each message to every service node in the cluster so that it reaches all connected clients. This mechanism can be implemented with message queues or broadcast protocols. Note, however, that broadcasting may increase network load and processing cost, because every node handles every message.

To remove the restriction that sender and receiver must be on the same server, the message can be announced to all servers as a broadcast. This can be achieved with the publish-subscribe pattern of messaging middleware: the message is sent to the middleware, which delivers it to every subscribed server. Any server that subscribes receives the notification.

Either of the following two methods can be used to implement this:

  1. Redis publish/subscribe (Pub/Sub): With Redis as the message middleware, the publisher publishes the message to a specific channel, and subscribers to that channel receive it.
  2. Message queue broadcasting: Use a message queue that supports broadcasting, such as RabbitMQ (for example, with a fanout exchange) or Apache Kafka (where each server would typically use its own consumer group so that every server sees every message). Publishers send messages to a specific topic, and all consumers subscribed to the topic receive them.

The above schemes decouple the sending and receiving of messages, and realize cross-server message broadcasting through message middleware. In this way, no matter which server the connected clients are distributed on, they can receive broadcast messages.
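
The publish-subscribe flow can be sketched as follows. `InMemoryBroker` is a hypothetical, synchronous stand-in for Redis Pub/Sub or a message queue, and the channel name `"chat"` and the message fields are assumptions. A real deployment would deliver messages asynchronously over the network.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for the middleware: every subscriber of a channel gets every message."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for callback in self.subscribers[channel]:
            callback(message)


class Node:
    """A server node that publishes outgoing messages and filters incoming broadcasts."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.sessions = {}  # user id -> list standing in for the socket's outbox
        broker.subscribe("chat", self.on_broadcast)

    def connect(self, user_id):
        self.sessions[user_id] = []

    def send(self, sender, recipient, text):
        # The node does not need to know where the recipient is connected.
        self.broker.publish("chat", {"from": sender, "to": recipient, "text": text})

    def on_broadcast(self, message):
        # Every node receives the message; only the node holding the session delivers it.
        outbox = self.sessions.get(message["to"])
        if outbox is not None:
            outbox.append(message["text"])


broker = InMemoryBroker()
node1, node2 = Node("node1", broker), Node("node2", broker)
node1.connect("A")
node2.connect("B")

node1.send("A", "B", "hello from A")
print(node2.sessions["B"])  # ['hello from A']
```

Node 1 also receives the broadcast and discards it, which illustrates the extra processing cost mentioned earlier: each message is handled by every node, not only the one that needs it.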

Solution 4: Route forwarding (synchronous mode)

In a WebSocket cluster, route forwarding can be used to deliver group messages. The core idea is that each service node maintains a routing table recording which client is connected to which service node. If an entry is not found locally, the node looks it up in Redis. When a message is sent, it is forwarded according to the routing table to the node that holds the client's connection, and that node sends the message to the client.

The specific implementation of this scheme requires the following steps:

  1. Connection routing table: Each service node maintains a routing table that records client connection information, including a client identifier (such as a client ID) and the corresponding service node. If an entry is missing locally, the node checks Redis.
  2. Routing protocol: Implement a protocol for passing routing information and messages between service nodes in the cluster. This can use HTTP, or use the WebSocket protocol with one node acting as a client of another.
  3. Route forwarding: When a message needs to be sent, the sender's service node uses the routing table to determine which service node each target client is connected to, and forwards the message to that node.
  4. Message sending: The service node that receives the forwarded message sends it to the connected client, so that the message reaches each client.

With route forwarding, the load of group messages is spread across the service nodes, so no single node has to process a large volume of messages. The scheme also ensures that each message is routed and forwarded to the node that actually holds the target client's connection.
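
The four steps can be sketched in a single process. Everything here is an assumption for illustration: the shared `registry` dictionary stands in for Redis, the `cluster` dictionary stands in for node addresses, and a direct call to `deliver` stands in for the HTTP or WebSocket hop between nodes.

```python
class Node:
    """A server node with a local routing table backed by a shared registry."""

    def __init__(self, name, registry, cluster):
        self.name = name
        self.registry = registry  # shared: user id -> node name (stand-in for Redis)
        self.cluster = cluster    # node name -> Node (stand-in for network addresses)
        self.route_cache = {}     # local routing table
        self.sessions = {}        # user id -> list standing in for the socket's outbox
        cluster[name] = self

    def connect(self, user_id):
        self.sessions[user_id] = []
        self.registry[user_id] = self.name

    def disconnect(self, user_id):
        self.sessions.pop(user_id, None)
        if self.registry.get(user_id) == self.name:
            del self.registry[user_id]

    def lookup(self, user_id):
        """Check the local table first, then fall back to the shared registry."""
        owner = self.route_cache.get(user_id)
        if owner is None:
            owner = self.registry.get(user_id)
            if owner is not None:
                self.route_cache[user_id] = owner
        return owner

    def send(self, recipient, text):
        owner = self.lookup(recipient)
        if owner is None:
            return False
        if owner == self.name:
            return self.deliver(recipient, text)
        # Forward to the owning node; a real system would make a network call here.
        delivered = self.cluster[owner].deliver(recipient, text)
        if not delivered:
            self.route_cache.pop(recipient, None)  # cached route was stale
        return delivered

    def deliver(self, recipient, text):
        outbox = self.sessions.get(recipient)
        if outbox is None:
            return False
        outbox.append(text)
        return True


registry, cluster = {}, {}
node1 = Node("node1", registry, cluster)
node2 = Node("node2", registry, cluster)
node1.connect("A")
node2.connect("B")

print(node1.send("B", "hello from A"))  # True
print(node2.sessions["B"])              # ['hello from A']
```

Unlike the broadcast approach, only the node that owns the connection handles the message. The cost is that the routing data must stay accurate, which is why the sketch drops a cached route when forwarding fails.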

Solution 5: High availability (multi-active redundancy)

In a WebSocket cluster, a multi-active redundancy scheme can be used to improve availability and fault tolerance. The core idea is to run multiple servers at the same time and let each client connect to several of them simultaneously. When a message needs to be sent, each server sends it to the clients connected to it.

The specific steps to implement the scheme are as follows:

  1. Multiple servers: Configure multiple servers, each running the same application, to form a cluster. These servers can be in different physical locations or hosted by different cloud providers.
  2. Multiple connections: Each client establishes connections with multiple servers at the same time.
  3. Message sending: When a message needs to be sent, each server sends the message to the clients connected to it. This means that each server needs to maintain its own list of connections and send messages to connected clients.
  4. Fault-tolerant processing: If a server fails or disconnects, other servers can still continue to send messages to clients.
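
The steps above can be sketched as follows. The `Server` and `Client` classes and the `broadcast` helper are hypothetical. The client-side de-duplication by message ID is an assumption that the scheme as described does not mention, added here because a client connected to several servers would otherwise see each message more than once.

```python
class Server:
    """A server that pushes every message to its own list of connected clients."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.clients = []

    def accept(self, client):
        self.clients.append(client)

    def push(self, message_id, text):
        if not self.alive:
            return  # a failed server sends nothing
        for client in self.clients:
            client.receive(message_id, text)


class Client:
    """A client connected to several servers at once."""

    def __init__(self, servers):
        self.seen_ids = set()
        self.inbox = []
        for server in servers:
            server.accept(self)

    def receive(self, message_id, text):
        # Assumption: drop copies of a message that already arrived via another server.
        if message_id in self.seen_ids:
            return
        self.seen_ids.add(message_id)
        self.inbox.append(text)


def broadcast(servers, message_id, text):
    """Each server sends the message to the clients connected to it."""
    for server in servers:
        server.push(message_id, text)


servers = [Server("s1"), Server("s2")]
client = Client(servers)

broadcast(servers, 1, "first")
servers[0].alive = False  # simulate a server failure
broadcast(servers, 2, "second")

print(client.inbox)  # ['first', 'second']
```

The second message still arrives through the surviving server, at the price of every client holding several connections and every message being sent several times while all servers are healthy.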

Origin blog.csdn.net/abu935009066/article/details/131402043