Tracing a problem: the server always closes the persistent connection under the QA load-testing tool

1. Background

Recently QA was load-testing a single module (not a full-link load test), and the TCP connection between the thrift/HTTP client and the server was always closed after a single request.
The server log showed an i/o timeout.
The root cause turned out to be: the QPS was too small, so no request was sent for a long stretch of time, and the server actively closed the socket.

2. Checking the thrift framework source code

Pseudo-code for how the thrift framework handles a request:

// server listens
for {
    // accept one TCP connection from a client
    client := server.accept()

    // start a dedicated goroutine to handle this client's subsequent requests
    go func() {
        inputTrans := xxx(client)
        outputTrans := xxx(client)
        if inputTrans != nil {
            defer inputTrans.close() // on return, close the read side
        }
        if outputTrans != nil {
            defer outputTrans.close() // on return, close the write side
        }

        // a separate goroutine writes the results back (synchronized via a channel)
        go func() {
            for {
                // on timeout or any other error, return
            }
        }()

        // read loop: the server reads requests (the two loops synchronize via a channel)
        for {
            // on timeout or any other error, return
        }
    }()
}

As can be seen:

  • the thrift framework does not multiplex i/o the way an epoll-style reactor does; instead it starts a separate goroutine for each client
  • in the code above, reading from and writing to the TCP connection are handled separately and synchronized via a channel
  • on a client timeout or any other error, the server actively closes the TCP connection (the timeouts are set with SetDeadline, SetReadDeadline, and SetWriteDeadline); see the sketch after this list
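
A minimal sketch of how such per-read/per-write deadlines typically look on a raw net.Conn in Go (the port, the timeout values, and the echo-style handling are illustrative assumptions, not the actual thrift implementation):

package main

import (
    "net"
    "time"
)

// handleConn reads requests in a loop; if the client stays silent longer
// than readTimeout, Read fails with an i/o timeout and the function returns,
// closing the connection: the self-protection behavior described above.
func handleConn(conn net.Conn) {
    defer conn.Close()

    const readTimeout = 30 * time.Second // assumed value
    const writeTimeout = 5 * time.Second // assumed value

    buf := make([]byte, 4096)
    for {
        // refresh the read deadline before every read
        conn.SetReadDeadline(time.Now().Add(readTimeout))
        n, err := conn.Read(buf)
        if err != nil { // includes "i/o timeout"
            return
        }

        // refresh the write deadline before every write
        conn.SetWriteDeadline(time.Now().Add(writeTimeout))
        if _, err := conn.Write(buf[:n]); err != nil {
            return
        }
    }
}

func main() {
    ln, err := net.Listen("tcp", ":9090")
    if err != nil {
        panic(err)
    }
    for {
        conn, err := ln.Accept()
        if err != nil {
            continue
        }
        go handleConn(conn)
    }
}

When the client stays silent past readTimeout, Read returns an "i/o timeout" error and the handler closes the connection, which is exactly the log line mentioned in the background section.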

3. Two possible timeout situations

3.1 server write timeout

A load-testing tool is usually only responsible for sending requests; it generally does not read the responses.
If a raw socket is connected directly to the server's thrift port and only sends data without ever receiving, the server's send buffer can fill up and the response data can no longer be sent out, so the server's write deadline expires. A sketch of such a misbehaving client follows below.
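
A minimal sketch of that failure mode, assuming a hypothetical service on 127.0.0.1:9090 that writes responses back: the client keeps sending but never calls Read, so its receive buffer fills, TCP flow control shrinks the window to zero, the server's send buffer backs up, and the server's write deadline eventually fires.

package main

import (
    "net"
    "time"
)

func main() {
    // hypothetical server address, for illustration only
    conn, err := net.Dial("tcp", "127.0.0.1:9090")
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    payload := make([]byte, 64*1024)
    for {
        // send only, never conn.Read(): the client's receive buffer fills,
        // the TCP window goes to zero, and the server can no longer flush
        // its send buffer, so its write deadline eventually expires.
        if _, err := conn.Write(payload); err != nil {
            // the server has closed the connection from its side
            return
        }
        time.Sleep(10 * time.Millisecond)
    }
}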

3.2 server read timeout

The client's QPS is too small, so the interval between two requests is longer than the read timeout configured on the server; a short sketch follows below.
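
A minimal client-side sketch of this case (the 60-second pause and the address are assumptions; the only point is that the idle gap exceeds the server's read deadline):

package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:9090") // hypothetical server address
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    for {
        if _, err := conn.Write([]byte("ping\n")); err != nil {
            // once the idle gap exceeded the server's read deadline, the
            // server closed the connection, so writes start failing as soon
            // as the client sees the reset.
            fmt.Println("connection closed by server:", err)
            return
        }
        // low QPS: the pause between requests is longer than the server's
        // read timeout, so the server hits "i/o timeout" on Read and closes
        // the connection before the next request arrives.
        time.Sleep(60 * time.Second)
    }
}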

4. Summary

This is not really a question of long connections versus short connections; the close is a server-side self-protection mechanism.
A server accepts a large number of client connections, and a pile of idle, useless connections not only ties up server resources but, in more serious cases, can prevent other clients from connecting to the server at all.
The problem can be solved in other ways:

  • client and server agree on a custom heartbeat protocol: the client periodically sends a no-op command to keep the connection alive
  • do what the redis client does: when the connection turns out to be closed, reconnect and resend (connect first, then resend the request); a sketch of both ideas follows this list
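
A minimal sketch combining both ideas, assuming a hypothetical plain-text protocol in which "noop\n" is the agreed keep-alive command (the command, the interval, and the address are all illustrative assumptions):

package main

import (
    "net"
    "time"
)

const (
    addr              = "127.0.0.1:9090" // hypothetical server address
    heartbeatInterval = 10 * time.Second // must stay below the server's read timeout
)

// dialOrPanic is a small helper for the sketch.
func dialOrPanic() net.Conn {
    c, err := net.Dial("tcp", addr)
    if err != nil {
        panic(err)
    }
    return c
}

func main() {
    conn := dialOrPanic()
    defer func() { conn.Close() }()

    ticker := time.NewTicker(heartbeatInterval)
    defer ticker.Stop()

    // Heartbeat loop: periodically send the agreed no-op command so the
    // server's read deadline never expires between real requests.
    for range ticker.C {
        if _, err := conn.Write([]byte("noop\n")); err != nil {
            // Redis-client style recovery: if the server closed the
            // connection anyway, reconnect first, then resend.
            conn.Close()
            conn = dialOrPanic()
            if _, err := conn.Write([]byte("noop\n")); err != nil {
                return
            }
        }
    }
}

The heartbeat interval just has to stay below the server's read timeout; the reconnect branch mirrors what the redis client does when it finds the connection already closed.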

Origin www.cnblogs.com/xudong-bupt/p/11334282.html