Solving Tomcat WebSocket concurrency problems (4)

Another problem

After the last round of optimization the program seemed to run fine, but recently I found this error in the log:

java.lang.IllegalStateException: The remote endpoint was in state [TEXT_PARTIAL_WRITING] which is an invalid state for called method
        at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.checkState(WsRemoteEndpointImplBase.java:1224)
        at org.apache.tomcat.websocket.WsRemoteEndpointImplBase$StateMachine.textPartialStart(WsRemoteEndpointImplBase.java:1182)
        at org.apache.tomcat.websocket.WsRemoteEndpointImplBase.sendPartialString(WsRemoteEndpointImplBase.java:222)
        at org.apache.tomcat.websocket.WsRemoteEndpointBasic.sendText(WsRemoteEndpointBasic.java:49)
        at org.springframework.web.socket.adapter.standard.StandardWebSocketSession.sendTextMessage(StandardWebSocketSession.java:203)
        at org.springframework.web.socket.adapter.AbstractWebSocketSession.sendMessage(AbstractWebSocketSession.java:101)

Why does this happen? Aren't sends on the same session already synchronized?

Look carefully: this differs from the previous error report. This time the state is TEXT_PARTIAL_WRITING. Where does this state come from, and why does it cause an error? Let's follow the exception stack trace into the code.


    // StandardWebSocketSession (Spring)
    @Override
    protected void sendTextMessage(TextMessage message) throws IOException {
        getNativeSession().getBasicRemote().sendText(message.getPayload(), message.isLast());
    }

    // WsRemoteEndpointBasic (Tomcat)
    @Override
    public void sendText(String fragment, boolean isLast) throws IOException {
        base.sendPartialString(fragment, isLast);
    }

    // WsRemoteEndpointImplBase (Tomcat)
    public void sendPartialString(String fragment, boolean isLast)
            throws IOException {
        if (fragment == null) {
            throw new IllegalArgumentException(sm.getString("wsRemoteEndpoint.nullData"));
        }
        stateMachine.textPartialStart();
        sendMessageBlock(CharBuffer.wrap(fragment), isLast);
    }

    // WsRemoteEndpointImplBase.StateMachine (Tomcat)
    public synchronized void textPartialStart() {
        checkState(State.OPEN, State.TEXT_PARTIAL_READY);
        state = State.TEXT_PARTIAL_WRITING;
    }

    // WsRemoteEndpointImplBase (Tomcat)
    void sendMessageBlock(CharBuffer part, boolean last) throws IOException {
        long timeoutExpiry = getTimeoutExpiry();
        boolean isDone = false;
        while (!isDone) {
            encoderBuffer.clear();
            CoderResult cr = encoder.encode(part, encoderBuffer, true);
            if (cr.isError()) {
                throw new IllegalArgumentException(cr.toString());
            }
            isDone = !cr.isOverflow();
            encoderBuffer.flip();
            sendMessageBlock(Constants.OPCODE_TEXT, encoderBuffer, last && isDone, timeoutExpiry);
        }
        // Not reached if the send above throws, so the state is not reset.
        stateMachine.complete(last);
    }
    

It turns out that StandardWebSocketSession.sendTextMessage ends up calling sendPartialString. Before sending, this method checks that the state is OPEN or TEXT_PARTIAL_READY; once the check passes, it sets the state to TEXT_PARTIAL_WRITING. After sending, stateMachine.complete resets the state to TEXT_PARTIAL_READY or OPEN, depending on whether the message is finished (parameter last = false or true, respectively).

In theory, all messages for a session are sent on the same thread (the JobHandler), and every state change goes through stateMachine, so nothing should go wrong. But suppose the connection is closed because of a network failure while sendMessageBlock is running, and the send throws an exception. Then the reset call stateMachine.complete(last) is never executed and the state stays at TEXT_PARTIAL_WRITING. If the queue still holds unsent messages for this session, the next task fails with the exception above as soon as it reaches checkState(State.OPEN, State.TEXT_PARTIAL_READY).
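
The failure mode can be reproduced without Tomcat. The following is a minimal sketch, not Tomcat's actual code: MiniEndpoint, trySend and the failNextWrite flag are hypothetical names introduced for illustration, and only the three states discussed here are modeled.

```java
import java.io.IOException;

// Simplified model of the state handling described above; not Tomcat's real classes.
class MiniEndpoint {
    enum State { OPEN, TEXT_PARTIAL_READY, TEXT_PARTIAL_WRITING }

    private State state = State.OPEN;
    // Hypothetical switch that simulates a network failure during the next write.
    boolean failNextWrite = false;

    synchronized State getState() {
        return state;
    }

    private synchronized void textPartialStart() {
        if (state != State.OPEN && state != State.TEXT_PARTIAL_READY) {
            throw new IllegalStateException(
                    "The remote endpoint was in state [" + state + "]");
        }
        state = State.TEXT_PARTIAL_WRITING;
    }

    private synchronized void complete(boolean last) {
        state = last ? State.OPEN : State.TEXT_PARTIAL_READY;
    }

    void sendPartialString(String fragment, boolean isLast) throws IOException {
        textPartialStart();
        if (failNextWrite) {
            // The exception skips complete(), so the state is never reset.
            throw new IOException("connection closed");
        }
        complete(isLast);
    }

    // Sends a fragment and reports the outcome as a string, for demonstration only.
    String trySend(String fragment, boolean isLast) {
        try {
            sendPartialString(fragment, isLast);
            return "sent";
        } catch (IOException | IllegalStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        MiniEndpoint ep = new MiniEndpoint();
        System.out.println(ep.trySend("a", true)); // normal send
        ep.failNextWrite = true;
        System.out.println(ep.trySend("b", true)); // write fails mid-send
        ep.failNextWrite = false;
        System.out.println(ep.trySend("c", true)); // next queued message hits the stale state
        System.out.println(ep.getState());
    }
}
```

In this model the third send fails with IllegalStateException even though the simulated network is healthy again, because the state is still TEXT_PARTIAL_WRITING from the interrupted send.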

Solution

The problem is not serious: the connection is already closed, so the message could not have been delivered anyway. Still, first, the error log is painful for anyone who is obsessive about clean logs, and second, processing the doomed messages wastes some resources.

The solution I can think of is to catch this exception at the upper layer, log it at trace/debug level and otherwise ignore it, and to hold the session through a WeakReference wherever the program keeps it for a long time. However, the gap between the session becoming invalid and being garbage-collected will still cause some waste.
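
As a rough sketch of that idea, under stated assumptions: TextSender is a hypothetical stand-in for the real session type, QuietSender is a made-up wrapper name, and java.util.logging is used only to keep the example self-contained (FINE plays the role of the debug level).

```java
import java.io.IOException;
import java.lang.ref.WeakReference;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a WebSocket session; only what the sketch needs.
interface TextSender {
    void sendText(String text) throws IOException;
}

// Holds a session weakly and swallows send failures caused by a dead connection.
class QuietSender {
    private static final Logger LOG = Logger.getLogger(QuietSender.class.getName());
    private final WeakReference<TextSender> ref;

    QuietSender(TextSender session) {
        this.ref = new WeakReference<>(session);
    }

    // Returns true only if the message was handed to the session without error.
    boolean send(String text) {
        TextSender session = ref.get();
        if (session == null) {
            return false; // session already collected, nothing to do
        }
        try {
            session.sendText(text);
            return true;
        } catch (IOException | IllegalStateException e) {
            // Expected once the connection is gone; keep it out of the error log.
            LOG.log(Level.FINE, "dropping message for closed session", e);
            return false;
        }
    }
}
```

Catching IllegalStateException this broadly could also hide genuine programming errors, so a real implementation might want to narrow the handling to the closed-connection case.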

Seen this way, the best approach is to wrap the session in the ConcurrentWebSocketSessionDecorator provided by Spring. Because the message queue is then held by the session itself, once an exception is thrown and the session becomes invalid, the remaining unsent messages are discarded along with it, which avoids wasting memory and CPU. The big names earn their reputation... no more swaggering from me!

Supplement (March 06, 2018)

Today I thought of something to add. Compared with ConcurrentWebSocketSessionDecorator, my approach is actually not a total loss.

In my implementation, messages are sent asynchronously. The thread handling the user request does not block: it puts the message on the queue and returns immediately, because the actual sending is done by the JobHandler thread. With ConcurrentWebSocketSessionDecorator, by contrast, sending blocks the calling thread, and that is the HTTP thread handling the user request.

Consider an extreme case. User thread A gets the lock and starts sending a message. Another thread B puts its message into the buffer, tries to get the lock, fails, and returns. When A finishes its message, it finds the buffer is not empty, takes B's message out, and sends that too. Then thread C comes in, puts its message into the buffer, cannot get the lock, and returns. When A comes back, it again finds something in the buffer... In this way thread A turns into a dedicated message-sending thread, even though it is a thread for handling a user request and may have other work waiting further down. All of that gets delayed, doesn't it?
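
The scenario can be replayed deterministically with a small model. This is a simplified sketch of a "buffer first, then tryLock and flush" sender, not Spring's actual implementation: FlushingSender and its members are hypothetical names, and the two latches exist only to hold thread A inside its first write while B and C arrive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of a "buffer, then tryLock and flush" sender; not Spring's code.
class FlushingSender {
    private final Queue<String> buffer = new ConcurrentLinkedQueue<>();
    private final ReentrantLock flushLock = new ReentrantLock();
    private final CountDownLatch firstWriteStarted = new CountDownLatch(1);
    private final CountDownLatch firstWriteMayFinish = new CountDownLatch(1);
    // "message@thread" for each message written; only modified under flushLock.
    final List<String> written = new ArrayList<>();

    void sendMessage(String message) {
        buffer.add(message);
        do {
            if (!flushLock.tryLock()) {
                return; // another thread is flushing and will pick this message up
            }
            try {
                String next;
                while ((next = buffer.poll()) != null) {
                    write(next);
                }
            } finally {
                flushLock.unlock();
            }
        } while (!buffer.isEmpty());
    }

    // Pretends to write to the socket; the very first write is held up on purpose.
    private void write(String message) {
        if (firstWriteStarted.getCount() > 0) {
            firstWriteStarted.countDown();
            await(firstWriteMayFinish);
        }
        written.add(message + "@" + Thread.currentThread().getName());
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void runAndJoin(Runnable task, String name) {
        Thread thread = new Thread(task, name);
        thread.start();
        join(thread);
    }

    // Replays the scenario: A is mid-write while B and C each enqueue a message.
    static List<String> demo() {
        FlushingSender sender = new FlushingSender();
        Thread a = new Thread(() -> sender.sendMessage("m1"), "A");
        a.start();
        await(sender.firstWriteStarted);                 // A now holds the flush lock
        runAndJoin(() -> sender.sendMessage("m2"), "B"); // returns without sending
        runAndJoin(() -> sender.sendMessage("m3"), "C"); // returns without sending
        sender.firstWriteMayFinish.countDown();          // let A continue
        join(a);
        return sender.written;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [m1@A, m2@A, m3@A]
    }
}
```

All three messages are written by thread A, while B and C return almost immediately. That is good for B and C, but it is exactly the delay described here for whatever else A still had to do.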

Of course, this is an extreme situation, used only to show that my implementation avoids it. As the old saying goes, there is no silver bullet; it all depends on the scenario.

